
AMD AMD-K6-2/500AFX Data Sheet - Page 38






AMD-K6®-2 Processor Data Sheet, 21850J/0-February 2000, Preliminary Information

Chapter 2: Internal Architecture

program behavior and its negative effects on instruction execution, such as stalls due to delayed instruction fetching and the draining of the processor pipeline. The branch logic contains an 8192-entry branch history table, a 16-entry by 16-byte branch target cache, a 16-entry return address stack, and a branch execution unit.

Branch History Table

The AMD-K6-2 processor handles unconditional branches without any penalty by redirecting instruction fetching to the target address of the unconditional branch. However, conditional branches require the use of the dynamic branch-prediction mechanism built into the AMD-K6-2 processor. A two-level adaptive history algorithm is implemented in an 8192-entry branch history table. This table stores executed branch information, predicts individual branches, and predicts the behavior of groups of branches. To accommodate the large branch history table, the AMD-K6-2 processor does not store predicted target addresses. Instead, the branch target addresses are calculated on-the-fly using ALUs during the decode stage. The adders calculate all possible target addresses before the instructions are fully decoded and the processor chooses which addresses are valid.
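
As a rough illustration of how a two-level adaptive scheme can work, the following Python sketch models an 8192-entry table of 2-bit saturating counters indexed by a mix of branch address and recent branch outcomes. Only the table size comes from the data sheet. The counter width, the 9-bit history length, the XOR-based index, and every name in the code are assumptions chosen for illustration, not a description of the actual AMD-K6-2 circuit.

```python
class TwoLevelPredictor:
    """Toy two-level adaptive predictor (indexing scheme is an assumption)."""

    TABLE_ENTRIES = 8192   # table size stated in the data sheet
    HISTORY_BITS = 9       # hypothetical global-history length

    def __init__(self):
        # One 2-bit saturating counter per entry, starting at 1 (weakly not taken).
        self.counters = [1] * self.TABLE_ENTRIES
        self.history = 0   # recent outcomes, newest outcome in bit 0

    def _index(self, branch_addr):
        # Hypothetical hash: mix the branch address with the outcome history.
        return (branch_addr ^ (self.history << 4)) % self.TABLE_ENTRIES

    def predict(self, branch_addr):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        """Train the selected counter, then shift the outcome into the history."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        mask = (1 << self.HISTORY_BITS) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def run_pattern(pattern, repeats, branch_addr=0x401000):
    """Replay a repeating taken/not-taken pattern; return the fraction predicted correctly."""
    predictor = TwoLevelPredictor()
    hits = total = 0
    for _ in range(repeats):
        for taken in pattern:
            hits += predictor.predict(branch_addr) == taken
            predictor.update(branch_addr, taken)
            total += 1
    return hits / total
```

Calling `run_pattern([True, True, True, False], 200)` models a loop branch that is taken three times and then falls through. Because the history distinguishes the position within the pattern, this toy model mispredicts only during warm-up, which is the kind of behavior the "groups of branches" wording above suggests.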

Branch Target Cache

To avoid a one-clock cache-fetch penalty when a branch is predicted taken, a built-in branch target cache supplies the first 16 bytes of instructions directly to the instruction buffer (assuming the target address hits this cache). (See Figure 3 on page 11.) The branch target cache is organized as 16 entries of 16 bytes. In total, the branch prediction logic achieves branch prediction rates greater than 95%.
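
The hit-or-miss behavior described above can be sketched in Python as a small lookup structure. The 16-entry by 16-byte geometry comes from the data sheet. The direct-mapped placement, the low-order-bit index, the fill-on-miss policy, and all names are hypothetical, since this page does not say how entries are selected or replaced.

```python
LINE_BYTES = 16   # bytes supplied per hit, from the data sheet
ENTRIES = 16      # number of entries, from the data sheet


class BranchTargetCache:
    """Toy branch target cache; the direct-mapped layout is an assumption."""

    def __init__(self):
        # Each slot holds (target_address, instruction_bytes) or None when empty.
        self.slots = [None] * ENTRIES

    def _slot(self, target_addr):
        # Hypothetical index: low-order bits of the target address.
        return target_addr % ENTRIES

    def lookup(self, target_addr):
        """Return the cached 16 bytes on a hit, or None on a miss."""
        entry = self.slots[self._slot(target_addr)]
        if entry is not None and entry[0] == target_addr:
            return entry[1]
        return None

    def fill(self, target_addr, memory):
        """After a miss, cache the first 16 bytes found at target_addr."""
        data = bytes(memory[target_addr:target_addr + LINE_BYTES])
        self.slots[self._slot(target_addr)] = (target_addr, data)
        return data
```

In this model a hit returns the 16 instruction bytes immediately, standing in for the direct supply to the instruction buffer, while a miss returns `None` and the caller would fall back to the normal cache fetch. The total capacity is 16 × 16 = 256 bytes.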

Return Address Stack

The return address stack is a special device designed to optimize CALL and RET pairs. Software is typically compiled with subroutines that are frequently called from various places in a program. This is usually done to save space. Entry into the subroutine occurs with the execution of a CALL instruction. At that time, the processor pushes the address of the next instruction in memory following the CALL instruction onto the stack (allocated space in memory). When the processor encounters a RET instruction (within or at the end of the subroutine), the branch logic pops the address from the stack and begins fetching from that location. To avoid the latency of main memory accesses during CALL and RET operations, the return address stack caches the pushed addresses.
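
The CALL and RET pairing described above can be modeled with a short Python sketch. The 16-entry depth is taken from the data sheet. The overflow policy (discarding the oldest entry), the behavior on an empty stack, and the class and method names are assumptions made for illustration.

```python
class ReturnAddressStack:
    """Toy 16-entry return address stack; overflow handling is an assumption."""

    DEPTH = 16  # entry count stated in the data sheet

    def __init__(self):
        self.entries = []

    def on_call(self, return_addr):
        """Record the address of the instruction that follows a CALL."""
        if len(self.entries) == self.DEPTH:
            # Hypothetical policy: discard the oldest entry when full.
            self.entries.pop(0)
        self.entries.append(return_addr)

    def on_ret(self):
        """Return the predicted return address, or None if nothing is cached."""
        return self.entries.pop() if self.entries else None
```

With nested calls, `on_ret` hands back addresses in the reverse order of the calls, matching the order in which RET instructions pop the memory stack. When the sketch returns `None`, a real processor would still have the address available from the memory stack, only without the latency saving.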