AMD AMD-K6-2/400 User Guide - Page 47

BranchPrediction Logic, Branch History Table

Page 47 highlights

23542A/0-September 2000 Preliminary Information AMD-K6™-2E+ Embedded Processor Data Sheet 2.6 Branch-Prediction Logic Sophisticated branch logic that can minimize or hide the impact of changes in program flow is designed into the AMD-K6-2E+ processor. Branches in x86 code fit into two categories: s Unconditional branches always change program flow (that is, the branches are always taken) s Conditional branches may or may not divert program flow (that is, the branches are taken or not-taken). When a conditional branch is not taken, the processor simply continues decoding and executing the next instructions in memory. Branch History Table Typical applications have up to 10% of unconditional branches and another 10% to 20% conditional branches. The AMD-K6-2E+ processor branch logic has been designed to handle this type of program behavior and to minimize its negative effects on instruction execution, such as stalls due to delayed instruction fetching and the draining of the processor pipeline. The branch logic contains an 8192-entry branch history table, a 16-entry by 16-byte branch target cache, a 16-entry return address stack, and a branch execution unit. The AMD-K6-2E+ processor handles unconditional branches without any penalty by redirecting instruction fetching to the target address of the unconditional branch. However, conditional branches require the use of the dynamic branch-prediction mechanism built into the AMD-K6-2E+ processor. A two-level adaptive history algorithm is implemented in an 8192-entry branch history table. This table stores executed branch information, predicts individual branches, and predicts the behavior of groups of branches. To accommodate the large branch history table, the AMD-K6-2E+ processor does not store predicted target addresses. Instead, the branch target addresses are calculated on-the-fly using ALUs during the decode stage. The adders calculate all possible target addresses before the instructions are fully decoded and the processor chooses which addresses are valid. Chapter 2 Internal Architecture 25

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
  • 100
  • 101
  • 102
  • 103
  • 104
  • 105
  • 106
  • 107
  • 108
  • 109
  • 110
  • 111
  • 112
  • 113
  • 114
  • 115
  • 116
  • 117
  • 118
  • 119
  • 120
  • 121
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • 131
  • 132
  • 133
  • 134
  • 135
  • 136
  • 137
  • 138
  • 139
  • 140
  • 141
  • 142
  • 143
  • 144
  • 145
  • 146
  • 147
  • 148
  • 149
  • 150
  • 151
  • 152
  • 153
  • 154
  • 155
  • 156
  • 157
  • 158
  • 159
  • 160
  • 161
  • 162
  • 163
  • 164
  • 165
  • 166
  • 167
  • 168
  • 169
  • 170
  • 171
  • 172
  • 173
  • 174
  • 175
  • 176
  • 177
  • 178
  • 179
  • 180
  • 181
  • 182
  • 183
  • 184
  • 185
  • 186
  • 187
  • 188
  • 189
  • 190
  • 191
  • 192
  • 193
  • 194
  • 195
  • 196
  • 197
  • 198
  • 199
  • 200
  • 201
  • 202
  • 203
  • 204
  • 205
  • 206
  • 207
  • 208
  • 209
  • 210
  • 211
  • 212
  • 213
  • 214
  • 215
  • 216
  • 217
  • 218
  • 219
  • 220
  • 221
  • 222
  • 223
  • 224
  • 225
  • 226
  • 227
  • 228
  • 229
  • 230
  • 231
  • 232
  • 233
  • 234
  • 235
  • 236
  • 237
  • 238
  • 239
  • 240
  • 241
  • 242
  • 243
  • 244
  • 245
  • 246
  • 247
  • 248
  • 249
  • 250
  • 251
  • 252
  • 253
  • 254
  • 255
  • 256
  • 257
  • 258
  • 259
  • 260
  • 261
  • 262
  • 263
  • 264
  • 265
  • 266
  • 267
  • 268
  • 269
  • 270
  • 271
  • 272
  • 273
  • 274
  • 275
  • 276
  • 277
  • 278
  • 279
  • 280
  • 281
  • 282
  • 283
  • 284
  • 285
  • 286
  • 287
  • 288
  • 289
  • 290
  • 291
  • 292
  • 293
  • 294
  • 295
  • 296
  • 297
  • 298
  • 299
  • 300
  • 301
  • 302
  • 303
  • 304
  • 305
  • 306
  • 307
  • 308
  • 309
  • 310
  • 311
  • 312
  • 313
  • 314
  • 315
  • 316
  • 317
  • 318
  • 319
  • 320
  • 321
  • 322
  • 323
  • 324
  • 325
  • 326
  • 327
  • 328
  • 329
  • 330
  • 331
  • 332
  • 333
  • 334
  • 335
  • 336
  • 337
  • 338
  • 339
  • 340
  • 341
  • 342
  • 343
  • 344
  • 345
  • 346
  • 347
  • 348
  • 349
  • 350
  • 351
  • 352
  • 353
  • 354
  • 355
  • 356
  • 357
  • 358
  • 359
  • 360
  • 361
  • 362
  • 363
  • 364
  • 365
  • 366
  • 367
  • 368

Chapter 2
Internal Architecture
25
23542A/0—September 2000
AMD-K6™-2E+ Embedded Processor Data Sheet
Preliminary Information
2.6
Branch-Prediction Logic
Sophisticated branch logic that can minimize or hide the impact
of changes in program flow is designed into the AMD-K6-2E+
processor. Branches in x86 code fit into two categories:
Unconditional branches
always change program flow (that is,
the branches are always taken)
C
onditional branches
may or may not divert program flow
(that is, the branches are taken or not-taken). When a
conditional branch is not taken, the processor simply
continues decoding and executing the next instructions in
memory.
Typical applications have up to 10% of unconditional branches
and another 10% to 20% conditional branches. The
AMD-K6-2E+ processor branch logic has been designed to
handle this type of program behavior and to minimize its
negative effects on instruction execution, such as stalls due to
delayed instruction fetching and the draining of the processor
pipeline. The branch logic contains an 8192-entry branch
history table, a 16-entry by 16-byte branch target cache, a
16-entry return address stack, and a branch execution unit.
Branch History Table
The AMD-K6-2E+ processor handles unconditional branches
without any penalty by redirecting instruction fetching to the
target address of the unconditional branch. However,
conditional branches require the use of the dynamic
branch-prediction mechanism built into the AMD-K6-2E+
processor.
A two-level adaptive history algorithm is implemented in an
8192-entry branch history table. This table stores executed
branch information, predicts individual branches, and predicts
the behavior of groups of branches.
To accommodate the large branch history table, the
AMD-K6-2E+ processor does not store predicted target
addresses. Instead, the branch target addresses are calculated
on-the-fly using ALUs during the decode stage. The adders
calculate all possible target addresses before the instructions
are fully decoded and the processor chooses which addresses
are valid.