AMD AMD-K6-2/500AFX Data Sheet - Page 29
21850J/0-February 2000 Preliminary Information AMD-K6®-2 Processor Data Sheet

The AMD-K6-2 processor implements a two-level branch prediction scheme based on an 8192-entry branch history table. The branch history table stores prediction information that is used for predicting conditional branches. Because the branch history table does not store predicted target addresses, special address ALUs calculate target addresses on-the-fly during instruction decode. The branch target cache augments predicted branch performance by avoiding a one-clock cache-fetch penalty. It does this by supplying the first 16 bytes of target instructions to the decoders when branches are predicted. The return address stack is a unique device specifically designed for optimizing CALL and RETURN pairs. In summary, the AMD-K6-2 processor uses dynamic branch logic to minimize delays due to the branch instructions that are common in x86 software.

3DNow!™ Technology. AMD has taken a lead role in improving the multimedia and 3D capabilities of the x86 processor family with the introduction of 3DNow! technology, which uses a packed, single-precision, floating-point data format and Single Instruction Multiple Data (SIMD) operations based on the MMX technology model.

2.3 Cache, Instruction Prefetch, and Predecode Bits

The writeback level-one cache on the AMD-K6-2 processor is organized as a separate 32-Kbyte instruction cache and a 32-Kbyte data cache with two-way set associativity. The cache line size is 32 bytes, and lines are prefetched from main memory using an efficient pipelined burst transaction. As the instruction cache is filled, each instruction byte is analyzed for instruction boundaries using predecoding logic. Predecoding annotates each instruction byte with 5 bits of information that later enable the decoders to efficiently decode multiple instructions simultaneously.
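
The cache figures quoted above imply a few derived quantities that are easy to check by hand. The following Python sketch is an illustration only, not part of the data sheet: it takes the 32-Kbyte size, two-way associativity, 32-byte line size, and 5 predecode bits per byte from the text, and the function names are hypothetical.

```python
# Figures taken from the text above; everything derived below is plain arithmetic.
CACHE_BYTES = 32 * 1024        # size of the instruction cache (the data cache is the same size)
WAYS = 2                       # two-way set associative
LINE_BYTES = 32                # cache line size
PREDECODE_BITS_PER_BYTE = 5    # predecode annotation per instruction byte


def cache_geometry(cache_bytes=CACHE_BYTES, ways=WAYS, line_bytes=LINE_BYTES):
    """Return the number of cache lines in total and per way (hypothetical helper)."""
    total_lines = cache_bytes // line_bytes
    lines_per_way = total_lines // ways
    return {"total_lines": total_lines, "lines_per_way": lines_per_way}


def predecode_storage_bytes(cache_bytes=CACHE_BYTES,
                            bits_per_byte=PREDECODE_BITS_PER_BYTE):
    """Return the storage needed to hold predecode bits for every cached instruction byte."""
    return cache_bytes * bits_per_byte // 8


if __name__ == "__main__":
    print(cache_geometry())
    print(predecode_storage_bytes() // 1024, "Kbytes of predecode bits")
```

Under these figures, each cache holds 1024 lines (512 per way), and the predecode bits for a full instruction cache occupy 20 Kbytes in addition to the 32 Kbytes of instruction bytes.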
The processor cache design takes advantage of a sectored organization (see Figure 2 on page 10). Each sector consists of 64 bytes configured as two 32-byte cache lines. The two cache lines of a sector share a common tag but have separate pairs of MESI (Modified, Exclusive, Shared, Invalid) bits that track the state of each cache line.

Chapter 2 Internal Architecture 9
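
The sectored organization can be modeled in a few lines. The Python sketch below is a simplified illustration under stated assumptions, not the processor's documented behavior: it assumes conventional power-of-two indexing (which gives 256 sets for a 32-Kbyte, two-way cache with 64-byte sectors), and the bit assignment in `split_address`, the `Sector` class, and its `fill` and `hit` methods are all hypothetical names introduced here. Only the sector size, line size, shared tag, and per-line MESI state come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

LINE_BYTES = 32                                   # from the text
LINES_PER_SECTOR = 2                              # from the text
SECTOR_BYTES = LINE_BYTES * LINES_PER_SECTOR      # 64 bytes, from the text
CACHE_BYTES = 32 * 1024
WAYS = 2
NUM_SETS = CACHE_BYTES // (WAYS * SECTOR_BYTES)   # 256, assuming one sector per way per set


class MESI(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def split_address(addr: int):
    """Split an address into (tag, set index, line within sector, byte offset).

    The field layout is an assumption made for illustration.
    """
    byte_offset = addr % LINE_BYTES
    line_in_sector = (addr // LINE_BYTES) % LINES_PER_SECTOR
    set_index = (addr // SECTOR_BYTES) % NUM_SETS
    tag = addr // (SECTOR_BYTES * NUM_SETS)
    return tag, set_index, line_in_sector, byte_offset


@dataclass
class Sector:
    """One sector: a single shared tag plus an independent MESI state per line."""
    tag: Optional[int] = None
    states: List[MESI] = field(
        default_factory=lambda: [MESI.INVALID] * LINES_PER_SECTOR)

    def fill(self, tag: int, line: int, state: MESI) -> None:
        # A new tag replaces the whole sector, so both lines start out invalid.
        if self.tag != tag:
            self.tag = tag
            self.states = [MESI.INVALID] * LINES_PER_SECTOR
        self.states[line] = state

    def hit(self, tag: int, line: int) -> bool:
        # A hit needs a matching tag AND a valid state for that specific line.
        return self.tag == tag and self.states[line] is not MESI.INVALID
```

Because the tag is shared but the state bits are not, filling one line of a sector leaves its neighbor invalid: after `fill(tag, 0, MESI.EXCLUSIVE)`, `hit(tag, 0)` is true while `hit(tag, 1)` is still false until that line is fetched as well.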