AMD AMD-K6-2/400 User Guide - Page 237

Write Allocate, not exist in the L2 cache, the processor performs a 32-byte burst

Page 237 highlights

23542A/0-September 2000 Preliminary Information AMD-K6™-2E+ Embedded Processor Data Sheet 9.8 Chapter 9 L1 instruction-cache lines and L2 cache lines are replaced using a Least Recently Used (LRU) algorithm. If a line replacement is required, lines are replaced when read cache misses occur. The L1 data cache uses a slightly different approach to line replacement. If a miss occurs, and a replacement is required, lines are replaced by using a Least Recently Allocated (LRA) algorithm. Write Allocate Write allocate, if enabled, occurs when the processor has a pending memory write cycle to a cacheable line and the line does not currently reside in the L1 data cache. If the line does not exist in the L2 cache, the processor performs a 32-byte burst read cycle on the system bus to fetch the data-cache line addressed by the pending write cycle. If the line does exist in the L2 cache, the data is supplied directly from the L2 cache, in which case a system bus cycle is not executed. The data associated with the pending write cycle is merged with the recently-allocated data-cache line and stored in the processor's L1 data cache. If the data-cache line was fetched from memory (because of a L2 cache miss), the data is stored, without modification, in the L2 cache. The final MESI state of the cache lines depends on the state of the WB/WT# and PWT signals during the burst read cycle and the subsequent L1 data cache write hit (See Table 39 on page 221 to determine the cache-line states and the access types following a cache write miss). If the L1 data cache line is stored in the modified state, then the same cache line is stored in the L2 cache in the exclusive state. If the L1 data cache line is stored in the shared state, then the same cache line is stored in the L2 cache in the shared state. If a data-cache line fetch from memory is attempted because the write allocate misses the L2 cache, and KEN# is sampled negated, the processor does not perform an allocation. In this case, the pending write cycle is executed as a single write cycle on the system bus. During write allocates that miss the L2 cache, a 32-byte burst read cycle is executed in place of a non-burst write cycle. While the burst read cycle generally takes longer to execute than the non-burst write cycle, performance gains are realized on subsequent write cycle hits to the write-allocated cache line. Cache Organization 215

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
  • 100
  • 101
  • 102
  • 103
  • 104
  • 105
  • 106
  • 107
  • 108
  • 109
  • 110
  • 111
  • 112
  • 113
  • 114
  • 115
  • 116
  • 117
  • 118
  • 119
  • 120
  • 121
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • 131
  • 132
  • 133
  • 134
  • 135
  • 136
  • 137
  • 138
  • 139
  • 140
  • 141
  • 142
  • 143
  • 144
  • 145
  • 146
  • 147
  • 148
  • 149
  • 150
  • 151
  • 152
  • 153
  • 154
  • 155
  • 156
  • 157
  • 158
  • 159
  • 160
  • 161
  • 162
  • 163
  • 164
  • 165
  • 166
  • 167
  • 168
  • 169
  • 170
  • 171
  • 172
  • 173
  • 174
  • 175
  • 176
  • 177
  • 178
  • 179
  • 180
  • 181
  • 182
  • 183
  • 184
  • 185
  • 186
  • 187
  • 188
  • 189
  • 190
  • 191
  • 192
  • 193
  • 194
  • 195
  • 196
  • 197
  • 198
  • 199
  • 200
  • 201
  • 202
  • 203
  • 204
  • 205
  • 206
  • 207
  • 208
  • 209
  • 210
  • 211
  • 212
  • 213
  • 214
  • 215
  • 216
  • 217
  • 218
  • 219
  • 220
  • 221
  • 222
  • 223
  • 224
  • 225
  • 226
  • 227
  • 228
  • 229
  • 230
  • 231
  • 232
  • 233
  • 234
  • 235
  • 236
  • 237
  • 238
  • 239
  • 240
  • 241
  • 242
  • 243
  • 244
  • 245
  • 246
  • 247
  • 248
  • 249
  • 250
  • 251
  • 252
  • 253
  • 254
  • 255
  • 256
  • 257
  • 258
  • 259
  • 260
  • 261
  • 262
  • 263
  • 264
  • 265
  • 266
  • 267
  • 268
  • 269
  • 270
  • 271
  • 272
  • 273
  • 274
  • 275
  • 276
  • 277
  • 278
  • 279
  • 280
  • 281
  • 282
  • 283
  • 284
  • 285
  • 286
  • 287
  • 288
  • 289
  • 290
  • 291
  • 292
  • 293
  • 294
  • 295
  • 296
  • 297
  • 298
  • 299
  • 300
  • 301
  • 302
  • 303
  • 304
  • 305
  • 306
  • 307
  • 308
  • 309
  • 310
  • 311
  • 312
  • 313
  • 314
  • 315
  • 316
  • 317
  • 318
  • 319
  • 320
  • 321
  • 322
  • 323
  • 324
  • 325
  • 326
  • 327
  • 328
  • 329
  • 330
  • 331
  • 332
  • 333
  • 334
  • 335
  • 336
  • 337
  • 338
  • 339
  • 340
  • 341
  • 342
  • 343
  • 344
  • 345
  • 346
  • 347
  • 348
  • 349
  • 350
  • 351
  • 352
  • 353
  • 354
  • 355
  • 356
  • 357
  • 358
  • 359
  • 360
  • 361
  • 362
  • 363
  • 364
  • 365
  • 366
  • 367
  • 368

Chapter 9
Cache Organization
215
23542A/0—September 2000
AMD-K6™-2E+ Embedded Processor Data Sheet
Preliminary Information
L1 instruction-cache lines and L2 cache lines are replaced using
a Least Recently Used (LRU) algorithm. If a line replacement is
required, lines are replaced when read cache misses occur.
The L1 data cache uses a slightly different approach to line
replacement. If a miss occurs, and a replacement is required,
lines are replaced by using a Least Recently Allocated (LRA)
algorithm.
9.8
Write Allocate
Write allocate, if enabled, occurs when the processor has a
pending memory write cycle to a cacheable line and the line
does not currently reside in the L1 data cache. If the line does
not exist in the L2 cache, the processor performs a 32-byte burst
read cycle on the system bus to fetch the data-cache line
addressed by the pending write cycle. If the line does exist in
the L2 cache, the data is supplied directly from the L2 cache, in
which case a system bus cycle is not executed. The data
associated with the pending write cycle is merged with the
recently-allocated data-cache line and stored in the processor’s
L1 data cache. If the data-cache line was fetched from memory
(because of a L2 cache miss), the data is stored, without
modification, in the L2 cache. The final MESI state of the cache
lines depends on the state of the WB/WT# and PWT signals
during the burst read cycle and the subsequent L1 data cache
write hit (See Table 39 on page 221 to determine the cache-line
states and the access types following a cache write miss). If the
L1 data cache line is stored in the modified state, then the same
cache line is stored in the L2 cache in the exclusive state. If the
L1 data cache line is stored in the shared state, then the same
cache line is stored in the L2 cache in the shared state.
If a data-cache line fetch from memory is attempted because
the write allocate misses the L2 cache, and KEN# is sampled
negated, the processor does not perform an allocation. In this
case, the pending write cycle is executed as a single write cycle
on the system bus.
During write allocates that miss the L2 cache, a 32-byte burst
read cycle is executed in place of a non-burst write cycle. While
the burst read cycle generally takes longer to execute than the
non-burst write cycle, performance gains are realized on
subsequent write cycle hits to the write-allocated cache line.