AMD AMD-K6-2/400 User Guide - Page 39
Instruction Fetch and Decode, Prefetching, Predecode Bits, Instruction Fetch
View all AMD AMD-K6-2/400 manuals
Add to My Manuals
Save this manual to your list of manuals |
Page 39 highlights
23542A/0-September 2000 Preliminary Information AMD-K6™-2E+ Embedded Processor Data Sheet Prefetching Predecode Bits The AMD-K6-2E+ processor conditionally performs cache prefetching, which results in the filling of the required cache line first, and a prefetch of the second cache line making up the other half of the sector. From the perspective of the external bus, the two cache-line fills typically appear as two 32-byte burst read cycles occurring back-to-back or, if allowed, as pipelined cycles. The 3DNow! technology includes an instruction called PREFETCH that allows a cache line to be prefetched into the L1 data cache and the L2 cache. The PREFETCH instruction format is defined in Table 15, "3DNow!™ Instructions," on page 89. For more detailed information, see the 3DNow!™ Technology Manual, order# 21928. Decoding x86 instructions is particularly difficult because the instructions are variable-length and can be from 1 to 15 bytes long. Predecode logic supplies the five predecode bits that are associated with each instruction byte. The predecode bits indicate the number of bytes to the start of the next x86 instruction. The predecode bits are stored in an extended instruction cache alongside each x86 instruction byte as shown in Figure 2 on page 16. The predecode bits are passed with the instruction bytes to the decoders where they assist with parallel x86 instruction decoding. 2.3 Instruction Fetch and Decode Instruction Fetch The processor can fetch up to 16 bytes per clock out of the L1 instruction cache or branch target cache. The fetched information is placed into a 16-byte instruction buffer that feeds directly into the decoders (see Figure 3 on page 18). Fetching can occur along a single execution stream with up to seven outstanding branches taken. The instruction fetch logic is capable of retrieving any 16 contiguous bytes of information within a 32-byte boundary. There is no additional penalty when the 16 bytes of instructions lie across a cache line boundary. The instruction bytes are loaded into the instruction buffer as they are consumed by the decoders. Although instructions can be consumed with byte granularity, the instruction buffer is managed on a memory-aligned word Chapter 2 Internal Architecture 17