AMD AMD-K6-2/500AFX Data Sheet - Page 29


Section	Page
Contents	3
List of Figures	11
List of Tables	15
Revision History	19
1 AMD-K6®-2 Processor	21
1.1 Super7™ Platform Initiative	23
Super7™ Platform Enhancements	23
Super7™ Platform Advantages	24
2 Internal Architecture	25
2.1 Introduction	25
2.2 AMD-K6®-2 Processor Microarchitecture Overview	25
Enhanced RISC86® Microarchitecture	26
2.3 Cache, Instruction Prefetch, and Predecode Bits	29
Cache	29
Prefetching	30
Predecode Bits	30
2.4 Instruction Fetch and Decode	31
Instruction Fetch	31
Instruction Decode	32
2.5 Centralized Scheduler	34
2.6 Execution Units	35
Register X and Y Pipelines	36
2.7 Branch-Prediction Logic	37
Branch History Table	38
Branch Target Cache	38
Return Address Stack	38
Branch Execution Unit	39
3 Software Environment	41
3.1 Registers	41
General-Purpose Registers	42
Integer Data Types	43
Segment Registers	44
Segment Usage	44
Instruction Pointer	45
Floating-Point Registers	45
Floating-Point Register Data Types	48
MMX™/3DNow!™ Registers	49
MMX™ Data Types	49
3DNow!™ Data Types	50
EFLAGS Register	51
Control Registers	52
Debug Registers	54
Model-Specific Registers (MSR)	57
Memory Management Registers	60
Task State Segment	62
Paging	63
Descriptors and Gates	66
Exceptions and Interrupts	69
3.2 AMD-K6®-2 Processor Model 8/[F:8] Registers	70
Extended Feature Enable Register (EFER)–Model 8/[F:8]	70
Write Handling Control Register (WHCR)–Model 8/[F:8]	71
UC/WC Cacheability Control Register (UWCCR)	72
Processor State Observability Register (PSOR)	73
Page Flush/Invalidate Register (PFIR)	73
3.3 Instructions Supported by the AMD-K6®-2 Processor	74
4 Signal Descriptions	103
4.1 Signal Terminology	103
4.2 A20M# (Address Bit 20 Mask)	105
4.3 A[31:3] (Address Bus)	106
4.4 ADS# (Address Strobe)	107
4.5 ADSC# (Address Strobe Copy)	107
4.6 AHOLD (Address Hold)	108
4.7 AP (Address Parity)	109
4.8 APCHK# (Address Parity Check)	110
4.9 BE[7:0]# (Byte Enables)	111
4.10 BF[2:0] (Bus Frequency)	112
4.11 BOFF# (Backoff)	113
4.12 BRDY# (Burst Ready)	114
4.13 BRDYC# (Burst Ready Copy)	115
4.14 BREQ (Bus Request)	116
4.15 CACHE# (Cacheable Access)	116
4.16 CLK (Clock)	117
4.17 D/C# (Data/Code)	117
4.18 D[63:0] (Data Bus)	118
4.19 DP[7:0] (Data Parity)	119
4.20 EADS# (External Address Strobe)	120
4.21 EWBE# (External Write Buffer Empty)	121
4.22 FERR# (Floating-Point Error)	122
4.23 FLUSH# (Cache Flush)	123
4.24 HIT# (Inquire Cycle Hit)	124
4.25 HITM# (Inquire Cycle Hit To Modified Line)	124
4.26 HLDA (Hold Acknowledge)	125
4.27 HOLD (Bus Hold Request)	125
4.28 IGNNE# (Ignore Numeric Exception)	126
4.29 INIT (Initialization)	127
4.30 INTR (Maskable Interrupt)	128
4.31 INV (Invalidation Request)	128
4.32 KEN# (Cache Enable)	129
4.33 LOCK# (Bus Lock)	130
4.34 M/IO# (Memory or I/O)	131
4.35 NA# (Next Address)	132
4.36 NMI (Non-Maskable Interrupt)	132
4.37 PCD (Page Cache Disable)	133
4.38 PCHK# (Parity Check)	134
4.39 PWT (Page Writethrough)	135
4.40 RESET (Reset)	136
4.41 RSVD (Reserved)	136
4.42 SCYC (Split Cycle)	137
4.43 SMI# (System Management Interrupt)	137
4.44 SMIACT# (System Management Interrupt Active)	138
4.45 STPCLK# (Stop Clock)	139
4.46 TCK (Test Clock)	139
4.47 TDI (Test Data Input)	140
4.48 TDO (Test Data Output)	140
4.49 TMS (Test Mode Select)	140
4.50 TRST# (Test Reset)	141
4.51 VCC2DET (VCC2 Detect)	141
4.52 VCC2H/L# (VCC2 High/Low)	141
4.53 W/R# (Write/Read)	142
4.54 WB/WT# (Writeback or Writethrough)	143
5 Bus Cycles	147
5.1 Timing Diagrams	147
5.2 Bus State Machine Diagram	149
Idle	150
Address	150
Data	150
Data-NA# Requested	150
Pipeline Address	150
Pipeline Data	151
Transition	151
5.3 Memory Reads and Writes	152
Single-Transfer Memory Read and Write	152
Misaligned Single-Transfer Memory Read and Write	154
Burst Reads and Pipelined Burst Reads	156
Burst Writeback	158
5.4 I/O Read and Write	160
Basic I/O Read and Write	160
Misaligned I/O Read and Write	161
5.5 Inquire and Bus Arbitration Cycles	162
Hold and Hold Acknowledge Cycle	162
HOLD-Initiated Inquire Hit to Shared or Exclusive Line	164
HOLD-Initiated Inquire Hit to Modified Line	166
AHOLD-Initiated Inquire Miss	168
AHOLD-Initiated Inquire Hit to Shared or Exclusive Line	170
AHOLD-Initiated Inquire Hit to Modified Line	172
AHOLD Restriction	174
Bus Backoff (BOFF#)	176
Locked Cycles	178
Basic Locked Operation	178
Locked Operation with BOFF# Intervention	180
Interrupt Acknowledge	182
5.6 Special Bus Cycles	184
Basic Special Bus Cycle	184
Shutdown Cycle	186
Stop Grant and Stop Clock States	187
INIT-Initiated Transition from Protected Mode to Real Mode	190
6 Power-on Configuration and Initialization	193
6.1 Signals Sampled During the Falling Transition of RESET	193
FLUSH#	193
BF[2:0]	193
BRDYC#	193
6.2 RESET Requirements	194
6.3 State of Processor After RESET	194
Output Signals	194
Registers	194
6.4 State of Processor After INIT	197
7 Cache Organization	199
7.1 MESI States in the Data Cache	200
7.2 Predecode Bits	200
7.3 Cache Operation	201
Cache-Related Signals	203
7.4 Cache Disabling and Flushing	203
7.5 Cache-Line Fills	204
7.6 Cache-Line Replacements	205
7.7 Write Allocate	206
Write to a Cacheable Page	206
Write to a Sector	207
Write Allocate Limit	207
Write Allocate Logic Mechanisms and Conditions	209
7.8 Prefetching	212
Hardware Prefetching	212
Software Prefetching	212
7.9 Cache States	212
7.10 Cache Coherency	214
Inquire Cycles	214
Internal Snooping	214
FLUSH#	215
PFIR	215
WBINVD and INVD	216
Cache-Line Replacement	216
Cache Snooping	218
7.11 Writethrough versus Writeback Coherency States	219
7.12 A20M# Masking of Cache Accesses	219
8 Write Merge Buffer	221
8.1 EWBE Control	221
8.2 Memory Type Range Registers	223
UC/WC Cacheability Control Register (UWCCR)	223
9 Floating-Point and Multimedia Execution Units	227
9.1 Floating-Point Execution Unit	227
Handling Floating-Point Exceptions	227
External Logic Support of Floating-Point Exceptions	227
9.2 Multimedia and 3DNow!™ Execution Units	229
9.3 Floating-Point and MMX™/3DNow!™ Instruction Compatibility	229
Registers	229
Exceptions	229
FERR# and IGNNE#	229
10 System Management Mode (SMM)	231
10.1 Overview	231
10.2 SMM Operating Mode and Default Register Values	231
10.3 SMM State-Save Area	234
10.4 SMM Revision Identifier	236
10.5 SMM Base Address	237
10.6 Halt Restart Slot	237
10.7 I/O Trap Dword	238
10.8 I/O Trap Restart Slot	239
10.9 Exceptions, Interrupts, and Debug in SMM	240
11 Test and Debug	241
11.1 Built-In Self-Test (BIST)	241
11.2 Tri-State Test Mode	242
11.3 Boundary-Scan Test Access Port (TAP)	243
Test Access Port	243
TAP Signals	243
TAP Registers	244
TAP Instructions	251
TAP Controller State Machine	252
11.4 L1 Cache Inhibit	255
Purpose	255
11.5 Debug	256
Debug Registers	256
Debug Exceptions	261
12 Clock Control	263
12.1 Halt State	264
Enter Halt State	264
Exit Halt State	264
12.2 Stop Grant State	265
Enter Stop Grant State	265
Exit Stop Grant State	265
12.3 Stop Grant Inquire State	266
Enter Stop Grant Inquire State	266
Exit Stop Grant Inquire State	266
12.4 Stop Clock State	266
Enter Stop Clock State	266
Exit Stop Clock State	267
13 Power and Grounding	269
13.1 Power Connections	269
13.2 Decoupling Recommendations	270
13.3 Pin Connection Requirements	271
14 Electrical Data	273
14.1 Electrical Data for OPN Suffixes AHX, 400AFQ, and AFR	273
Operating Ranges	273
Absolute Ratings	274
DC Characteristics	274
Power Dissipation	277
14.2 Electrical Data for OPN Suffixes AGR, AFX, and 400AFR	278
Operating Ranges	278
Absolute Ratings	279
DC Characteristics	279
Power Dissipation	282
15 I/O Buffer Characteristics	283
15.1 Selectable Drive Strength	283
15.2 I/O Buffer Model	284
15.3 I/O Model Application Note	285
15.4 I/O Buffer AC and DC Characteristics	285
16 Signal Switching Characteristics	287
16.1 CLK Switching Characteristics	287
16.2 Clock Switching Characteristics for 100-MHz Bus Operation	288
16.3 Clock Switching Characteristics for 66-MHz Bus Operation	288
16.4 Valid Delay, Float, Setup, and Hold Timings	289
16.5 Output Delay Timings for 100-MHz Bus Operation	290
16.6 Input Setup and Hold Timings for 100-MHz Bus Operation	292
16.7 Output Delay Timings for 66-MHz Bus Operation	294
16.8 Input Setup and Hold Timings for 66-MHz Bus Operation	296
16.9 RESET and Test Signal Timing	298
17 Thermal Design	305
17.1 Package Thermal Specifications	305
Heat Dissipation Path	310
Measuring Case Temperature	310
17.2 Layout and Airflow Considerations	311
Voltage Regulator	311
Airflow Management in a System Design	312
18 Pin Description Diagram	315
19 Pin Designations	317
20 Package Specifications	319
20.1 321-Pin Staggered CPGA Package Specification	319
21 Ordering Information	321


AMD-K6®-2 Processor Data Sheet

Preliminary Information

21850J/0-February 2000

Chapter 2 Internal Architecture

The AMD-K6-2 processor implements a two-level branch prediction scheme based on an 8192-entry branch history table. The branch history table stores prediction information that is used for predicting conditional branches. Because the branch history table does not store predicted target addresses, special address ALUs calculate target addresses on-the-fly during instruction decode. The branch target cache improves the performance of predicted branches by avoiding a one-clock cache-fetch penalty: it supplies the first 16 bytes of target instructions to the decoders when a branch is predicted. The return address stack is designed specifically to optimize CALL and RETURN pairs. In summary, the AMD-K6-2 processor uses dynamic branch logic to minimize delays due to the branch instructions that are common in x86 software.
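
As a rough illustration of what a two-level scheme with an 8192-entry table looks like in general, the following Python sketch models a generic predictor in which a global history register (level one) helps select a 2-bit saturating counter (level two). Only the table size comes from the text above. The history length, the index hash, the counter width, and the names `TwoLevelPredictor`, `HISTORY_BITS`, and `_index` are assumptions made for illustration and are not taken from the data sheet.

```python
BHT_ENTRIES = 8192            # table size stated in the data sheet
HISTORY_BITS = 9              # assumption: illustrative global-history length
INDEX_MASK = BHT_ENTRIES - 1  # 13 index bits


class TwoLevelPredictor:
    """Generic two-level predictor sketch (not the K6-2's actual logic)."""

    def __init__(self):
        self.history = 0                    # level 1: recent branch outcomes
        self.counters = [1] * BHT_ENTRIES   # level 2: start weakly not-taken

    def _index(self, branch_addr):
        # Assumption: history shifted left by 4 and XORed with the address.
        return (branch_addr ^ (self.history << 4)) & INDEX_MASK

    def predict(self, branch_addr):
        # Counter values 2 and 3 mean "predict taken".
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        mask = (1 << HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

Note that the table holds only taken/not-taken state, which matches the statement that target addresses are computed separately rather than stored with the prediction.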

3DNow!™ Technology. AMD has taken a lead role in improving the multimedia and 3D capabilities of the x86 processor family with the introduction of 3DNow! technology, which uses a packed, single-precision, floating-point data format and Single Instruction Multiple Data (SIMD) operations based on the MMX technology model.

2.3 Cache, Instruction Prefetch, and Predecode Bits

The writeback level-one cache on the AMD-K6-2 processor is organized as a separate 32-Kbyte instruction cache and a 32-Kbyte data cache, each with two-way set associativity. The cache line size is 32 bytes, and lines are prefetched from main memory using an efficient pipelined burst transaction. As the instruction cache is filled, predecoding logic analyzes each instruction byte for instruction boundaries. Predecoding annotates each instruction byte with 5 bits of information that later enable the decoders to efficiently decode multiple instructions simultaneously.
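
The sizes quoted above imply a few derived figures, which the short Python sketch below works out. The constants are taken from the text. The function name `cache_figures` and the dictionary keys are hypothetical, and the predecode figure is simple arithmetic on the stated 5 bits per byte rather than a number quoted on this page.

```python
CACHE_BYTES = 32 * 1024        # each level-one cache (instruction or data)
WAYS = 2                       # two-way set associative
LINE_BYTES = 32                # cache line size
PREDECODE_BITS_PER_BYTE = 5    # instruction cache only


def cache_figures():
    """Return figures derived from the stated cache parameters."""
    lines_total = CACHE_BYTES // LINE_BYTES
    return {
        "lines_total": lines_total,
        "lines_per_way": lines_total // WAYS,
        # 5 predecode bits accompany every instruction byte.
        "predecode_bytes": CACHE_BYTES * PREDECODE_BITS_PER_BYTE // 8,
    }


if __name__ == "__main__":
    for name, value in cache_figures().items():
        print(f"{name}: {value}")
```

Under these numbers the instruction cache holds 1024 lines, and the predecode bits for a full cache occupy 20 Kbytes in addition to the 32 Kbytes of instruction bytes.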

Cache

The processor cache design takes advantage of a sectored organization (see Figure 2 on page 10). Each sector consists of 64 bytes configured as two 32-byte cache lines. The two cache lines of a sector share a common tag but have separate pairs of MESI (Modified, Exclusive, Shared, Invalid) bits that track the state of each cache line.
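
One way to picture the sectored organization is the Python sketch below: a sector carries a single tag and two independent MESI states, so a lookup hits only if the tag matches and the selected line is not Invalid. The address split shown (tag, set index, line select, byte offset, from high bits to low) is an assumed conventional layout derived from the stated sizes, not something this page specifies, so Figure 2 remains the authority. The names `Sector`, `split_address`, and `hit` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Mesi(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


CACHE_BYTES = 32 * 1024
WAYS = 2
SECTOR_BYTES = 64
LINE_BYTES = 32
LINES_PER_SECTOR = SECTOR_BYTES // LINE_BYTES      # 2
SETS = CACHE_BYTES // (WAYS * SECTOR_BYTES)        # 256 sectors per way


def split_address(addr):
    """Assumed layout: [tag | set index | line select | byte offset]."""
    offset = addr % LINE_BYTES
    line = (addr // LINE_BYTES) % LINES_PER_SECTOR
    index = (addr // SECTOR_BYTES) % SETS
    tag = addr // (SECTOR_BYTES * SETS)
    return tag, index, line, offset


@dataclass
class Sector:
    tag: Optional[int] = None   # one tag shared by both lines
    states: list = field(
        default_factory=lambda: [Mesi.INVALID] * LINES_PER_SECTOR
    )

    def hit(self, tag, line):
        # A hit needs a matching tag and a valid state for that line.
        return self.tag == tag and self.states[line] is not Mesi.INVALID
```

Because the tag is shared, one line of a sector can be valid while its neighbor is still Invalid; the per-line MESI state is what distinguishes the two cases.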