AMD AMD-K6-2/400 User Guide - Page 47
Branch-Prediction Logic, Branch History Table
23542A/0-September 2000 Preliminary Information AMD-K6™-2E+ Embedded Processor Data Sheet

2.6 Branch-Prediction Logic

Sophisticated branch logic that can minimize or hide the impact of changes in program flow is designed into the AMD-K6-2E+ processor. Branches in x86 code fit into two categories:

- Unconditional branches always change program flow (that is, the branches are always taken)
- Conditional branches may or may not divert program flow (that is, the branches are taken or not-taken). When a conditional branch is not taken, the processor simply continues decoding and executing the next instructions in memory.

Branch History Table

Typical applications have up to 10% of unconditional branches and another 10% to 20% conditional branches. The AMD-K6-2E+ processor branch logic has been designed to handle this type of program behavior and to minimize its negative effects on instruction execution, such as stalls due to delayed instruction fetching and the draining of the processor pipeline. The branch logic contains an 8192-entry branch history table, a 16-entry by 16-byte branch target cache, a 16-entry return address stack, and a branch execution unit.

The AMD-K6-2E+ processor handles unconditional branches without any penalty by redirecting instruction fetching to the target address of the unconditional branch. However, conditional branches require the use of the dynamic branch-prediction mechanism built into the AMD-K6-2E+ processor. A two-level adaptive history algorithm is implemented in an 8192-entry branch history table. This table stores executed branch information, predicts individual branches, and predicts the behavior of groups of branches.

To accommodate the large branch history table, the AMD-K6-2E+ processor does not store predicted target addresses. Instead, the branch target addresses are calculated on-the-fly using ALUs during the decode stage.
The adders calculate all possible target addresses before the instructions are fully decoded, and the processor chooses which addresses are valid.

Chapter 2 Internal Architecture 25
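
The general idea of a two-level adaptive history algorithm can be illustrated with a small software model. The sketch below is not the AMD-K6-2E+ implementation: only the 8192-entry table size is taken from the text above. The history length (HISTORY_BITS), the XOR index hash (a gshare-style scheme chosen here for simplicity), the 2-bit saturating counters, and all class and function names are assumptions made for illustration.

```python
"""Illustrative two-level adaptive branch predictor (gshare-style indexing).

Only TABLE_ENTRIES comes from the data sheet. The history length, the index
hash, and the 2-bit counters are assumptions, not documented K6-2E+ details.
"""

TABLE_ENTRIES = 8192                    # table size stated in the data sheet
HISTORY_BITS = 9                        # assumption: history length is not documented
INDEX_MASK = TABLE_ENTRIES - 1
HISTORY_MASK = (1 << HISTORY_BITS) - 1


class TwoLevelPredictor:
    def __init__(self):
        # Second level: 2-bit saturating counters, starting at 1 (weakly not-taken).
        self.counters = [1] * TABLE_ENTRIES
        # First level: shift register of recent branch outcomes (1 = taken).
        self.history = 0

    def _index(self, branch_address):
        # Assumed hash: XOR the branch address with the recent-outcome history.
        return (branch_address ^ self.history) & INDEX_MASK

    def predict(self, branch_address):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        # Train the counter that produced the prediction, then record the outcome.
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & HISTORY_MASK


def run_pattern(pattern, address=0x401000, repeats=50):
    """Replay a repeating taken/not-taken pattern and return the hit rate."""
    predictor = TwoLevelPredictor()
    hits = total = 0
    for _ in range(repeats):
        for taken in pattern:
            hits += predictor.predict(address) == taken
            predictor.update(address, taken)
            total += 1
    return hits / total


if __name__ == "__main__":
    # A loop-like branch: taken, taken, not-taken, repeated.
    print(run_pattern([True, True, False]))
```

In this model, a single counter per branch could never track the taken, taken, not-taken pattern exactly, because the same counter would see conflicting outcomes. Indexing the table with the recent outcome history gives each position in the pattern its own counter, so the pattern is predicted correctly after a short warm-up.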