AMD AMD-K6-2/400 User Guide - Page 39

Instruction Fetch and Decode, Prefetching, Predecode Bits, Instruction Fetch

Prefetching
The AMD-K6-2E+ processor conditionally performs cache prefetching: it fills the required cache line first and then prefetches the second cache line that makes up the other half of the sector. From the perspective of the external bus, the two cache-line fills typically appear as two 32-byte burst read cycles occurring back-to-back or, if allowed, as pipelined cycles.
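This sector arrangement can be modeled in software. The short C sketch below is purely illustrative: it assumes the 32-byte line and two-line (64-byte) sector implied by the burst-read description above, and simply computes which two line addresses a single miss causes the processor to fill.

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE   32u   /* one cache line, filled by a 32-byte burst read */
#define SECTOR_SIZE 64u   /* a sector is two adjacent 32-byte lines */

/* Given a miss address, return the line that is filled first (the required
 * line) and the companion line that is conditionally prefetched. */
static void sector_fill_addresses(uint32_t miss_addr,
                                  uint32_t *required_line,
                                  uint32_t *prefetched_line)
{
    *required_line   = miss_addr & ~(LINE_SIZE - 1);  /* align to 32 bytes */
    *prefetched_line = *required_line ^ LINE_SIZE;    /* other half of the sector */
}

int main(void)
{
    uint32_t req, pre;
    sector_fill_addresses(0x00001234u, &req, &pre);
    printf("required line:   0x%08x\n", (unsigned)req);  /* 0x00001220 */
    printf("prefetched line: 0x%08x\n", (unsigned)pre);  /* 0x00001200 */
    return 0;
}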
The 3DNow! technology includes an instruction called PREFETCH that allows a cache line to be prefetched into the L1 data cache and the L2 cache. The PREFETCH instruction format is defined in Table 15, “3DNow!™ Instructions,” on page 89. For more detailed information, see the 3DNow!™ Technology Manual, order# 21928.
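As a usage illustration (not taken from the data sheet), PREFETCH can be issued from C through GCC-style inline assembly on x86. The loop below is a minimal sketch: the buffer, the 64-byte prefetch distance, and the function name are invented, and a real program would first confirm 3DNow! support (for example, via CPUID) before executing the instruction.

#include <stddef.h>
#include <stdio.h>

/* Sum a byte buffer while hinting the processor to start loading data that
 * will be needed shortly.  PREFETCH is a 3DNow! instruction, so this code
 * must only run after 3DNow! support has been verified. */
static long sum_with_prefetch(const unsigned char *buf, size_t len)
{
    long sum = 0;
    for (size_t i = 0; i < len; i++) {
        if (i + 64 < len) {
            /* Hint: bring buf[i + 64] into the L1 data cache. */
            __asm__ volatile("prefetch %0" : : "m"(buf[i + 64]));
        }
        sum += buf[i];
    }
    return sum;
}

int main(void)
{
    unsigned char data[256];
    for (int i = 0; i < 256; i++)
        data[i] = (unsigned char)i;
    printf("sum = %ld\n", sum_with_prefetch(data, sizeof data));
    return 0;
}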
Predecode Bits
Decoding x86 instructions is particularly difficult because the instructions are variable-length and can be from 1 to 15 bytes long. Predecode logic supplies five predecode bits associated with each instruction byte. These bits indicate the number of bytes to the start of the next x86 instruction. The predecode bits are stored in an extended instruction cache alongside each x86 instruction byte, as shown in Figure 2 on page 16, and are passed with the instruction bytes to the decoders, where they assist with parallel x86 instruction decoding.
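The predecode information can be pictured with a small software model. The C sketch below is only illustrative: the instruction lengths and buffer size are invented, and real predecode logic derives the lengths itself while the line is brought into the cache. It annotates each instruction byte with the distance to the start of the next instruction, which is the quantity the predecode bits encode.

#include <stdio.h>

/* For each byte of a code buffer, record how many bytes remain until the
 * next x86 instruction starts, given the length of each instruction. */
static void build_predecode(const unsigned char lengths[], int num_insns,
                            unsigned char bytes_to_next[], int buf_len)
{
    int pos = 0;
    for (int i = 0; i < num_insns && pos < buf_len; i++) {
        for (int off = 0; off < lengths[i] && pos < buf_len; off++, pos++) {
            /* From this byte, the next instruction begins
             * lengths[i] - off bytes away. */
            bytes_to_next[pos] = (unsigned char)(lengths[i] - off);
        }
    }
}

int main(void)
{
    /* Hypothetical stream: a 1-byte, a 3-byte, and a 2-byte instruction. */
    const unsigned char lengths[] = { 1, 3, 2 };
    unsigned char annot[6] = { 0 };

    build_predecode(lengths, 3, annot, 6);
    for (int i = 0; i < 6; i++)
        printf("byte %d: %u bytes to next instruction start\n", i, annot[i]);
    return 0;
}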
2.3 Instruction Fetch and Decode
Instruction Fetch
The processor can fetch up to 16 bytes per clock out of the L1 instruction cache or branch target cache. The fetched information is placed into a 16-byte instruction buffer that feeds directly into the decoders (see Figure 3 on page 18). Fetching can occur along a single execution stream with up to seven outstanding branches taken.

The instruction fetch logic is capable of retrieving any 16 contiguous bytes of information within a 32-byte boundary. There is no additional penalty when the 16 bytes of instructions lie across a cache line boundary. The instruction bytes are loaded into the instruction buffer as they are consumed by the decoders.
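The fetch-window behavior can be sketched with a trivial model. The C fragment below is not from the data sheet; it simply uses the 16-byte window and 32-byte line size stated above to show which bytes one fetch would deliver and whether that window happens to span two cache lines (which, as noted, carries no extra penalty).

#include <stdint.h>
#include <stdio.h>

#define FETCH_BYTES 16u   /* bytes delivered to the instruction buffer per clock */
#define LINE_BYTES  32u   /* L1 instruction-cache line size */

/* Model one fetch: report the 16-byte window starting at fetch_addr and
 * whether it crosses a 32-byte cache-line boundary. */
static void model_fetch(uint32_t fetch_addr)
{
    uint32_t first_line = fetch_addr / LINE_BYTES;
    uint32_t last_line  = (fetch_addr + FETCH_BYTES - 1) / LINE_BYTES;

    printf("fetch 0x%08x..0x%08x %s\n",
           (unsigned)fetch_addr, (unsigned)(fetch_addr + FETCH_BYTES - 1),
           first_line != last_line ? "(spans two cache lines)"
                                   : "(within one cache line)");
}

int main(void)
{
    model_fetch(0x00401000u);   /* aligned: stays within one 32-byte line */
    model_fetch(0x00401018u);   /* offset 0x18: spans a line boundary */
    return 0;
}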
Although instructions can be consumed with byte granularity, the instruction buffer is managed on a memory-aligned word