AMD OS1354WBJ4BGHBOX Optimization Guide - Page 18
Software Optimization Guide for AMD Family 16h Processors 52128 Rev. 1.1 March 2013

2.8 Instruction Fetch and Decode

The AMD Family 16h processor fetches instructions in naturally aligned 32-byte blocks and can perform one instruction-block fetch every cycle. The first two branches in a 64-byte cache line are typically allocated into the same entry of the fetch window tracking structure; each additional branch is allocated into a separate entry.

The fetch unit sends these bytes to the decode unit through a 16-entry Instruction Byte Buffer (IBB) in two 16-byte windows. The IBB acts as a decoupling queue between the fetch/branch-predict unit and the decode unit. The decode unit scans two of these windows in a given cycle and decodes a maximum of two instructions per cycle.

The decode unit also contains a sideband stack optimizer, which tracks the stack-pointer value. This optimization removes the dependencies that arise during chains of PUSH and POP operations on the rSP register and thereby improves the efficiency of the PUSH and POP instructions.

2.9 Integer Unit

The integer unit consists of the following components:
• schedulers
• execution units
• retire control unit

The schedulers feed integer micro-ops to the execution units. The execution units carry out the various types of operations described below. The retire control unit serves as the final arbiter between exception processing and instruction retirement.

2.9.1 Integer Schedulers

The schedulers can receive up to two macro-ops per cycle, where they are broken down into micro-ops. ALU micro-ops are sent to the 20-entry ALU scheduler; load and store micro-ops are sent to the 12-entry address generation unit (AGU) scheduler. Each scheduler can issue up to two micro-ops per cycle. As part of issuing micro-ops for execution, the scheduler tracks operand availability and dependency information.
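As a back-of-the-envelope aid, the per-cycle limits quoted in Sections 2.8 and 2.9.1 (one naturally aligned 32-byte fetch block, two decoded instructions, two macro-ops into the schedulers, and up to two issued micro-ops per scheduler) can be combined into a lower bound on cycles per loop iteration: whichever stage needs the most cycles is the likely bottleneck. The Python sketch below is a simplification under stated assumptions, not AMD's model. `LoopBody`, `fetch_blocks`, and `cycle_lower_bound` are hypothetical names, and the sketch ignores latencies, dependencies, branch-prediction effects, IBB occupancy, and cache misses.

```python
import math
from dataclasses import dataclass

# Per-cycle limits quoted for the Family 16h core (Sections 2.8 and 2.9.1).
FETCH_BLOCK_BYTES = 32     # one naturally aligned 32-byte fetch block per cycle
DECODE_PER_CYCLE = 2       # at most two instructions decoded per cycle
DISPATCH_PER_CYCLE = 2     # schedulers accept up to two macro-ops per cycle
ALU_ISSUE_PER_CYCLE = 2    # ALU scheduler issues up to two micro-ops per cycle
AGU_ISSUE_PER_CYCLE = 2    # AGU scheduler issues up to two micro-ops per cycle


@dataclass
class LoopBody:
    """Hypothetical summary of one loop iteration (illustrative only)."""
    start_addr: int    # address of the first instruction byte
    size_bytes: int    # total encoded length of the loop body
    instructions: int
    macro_ops: int
    alu_uops: int
    agu_uops: int      # load + store address micro-ops


def fetch_blocks(start_addr: int, size_bytes: int) -> int:
    """Number of naturally aligned 32-byte blocks the byte range touches."""
    first = start_addr // FETCH_BLOCK_BYTES
    last = (start_addr + size_bytes - 1) // FETCH_BLOCK_BYTES
    return last - first + 1


def cycle_lower_bound(body: LoopBody) -> dict:
    """Minimum cycles per stage; the largest entry is reported as 'bound'.

    Assumption: stages are independent throughput limits; latency,
    dependencies, branch behaviour and cache misses are ignored.
    """
    bounds = {
        "fetch": fetch_blocks(body.start_addr, body.size_bytes),
        "decode": math.ceil(body.instructions / DECODE_PER_CYCLE),
        "dispatch": math.ceil(body.macro_ops / DISPATCH_PER_CYCLE),
        "alu_issue": math.ceil(body.alu_uops / ALU_ISSUE_PER_CYCLE),
        "agu_issue": math.ceil(body.agu_uops / AGU_ISSUE_PER_CYCLE),
    }
    bounds["bound"] = max(bounds.values())
    return bounds


if __name__ == "__main__":
    body = LoopBody(start_addr=0x1010, size_bytes=24, instructions=6,
                    macro_ops=6, alu_uops=4, agu_uops=3)
    print(cycle_lower_bound(body))
    # {'fetch': 2, 'decode': 3, 'dispatch': 3, 'alu_issue': 2, 'agu_issue': 2, 'bound': 3}
```

The fetch term shows the effect of natural alignment: a 24-byte loop starting at 0x1000 fits in one fetch block, while the same loop starting at 0x1010 straddles two. In this example decode and dispatch still set the bound, but for short loops of compact instructions the placement relative to 32-byte boundaries can become the limiting factor.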
The scheduler also ensures that older micro-ops that have been waiting for operands are executed in a timely manner. Micro-ops can be issued and executed out of order.

2.9.2 Integer Execution Units

The AMD Family 16h processor contains four integer execution pipes. Two ALUs are connected to the ALU scheduler, one of which can also handle integer multiplies and divides. Two AGUs are connected to the AGU scheduler: one dedicated to load address generation (LAGU) and the other dedicated to store address generation (SAGU).

Figure 2 below provides a block diagram of the integer schedulers and execution units for the AMD Family 16h processor core.

18 Microarchitecture of the Family 16h Processor Chapter 2
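Section 2.9.2 adds per-pipe restrictions on top of the scheduler limits: integer multiplies and divides can use only one of the two ALUs, loads only the LAGU, and stores only the SAGU. The minimal Python sketch below shows how those restrictions bound issue throughput, assuming each pipe accepts one micro-op per cycle and every micro-op occupies its pipe for a single cycle. The text does not say which ALU has the multiplier/divider, so placing it on `ALU0` is an assumption, as are the names `PIPES` and `min_issue_cycles`.

```python
import math
from collections import Counter

# Pipe capabilities as described in Section 2.9.2. Assumption: the ALU that
# also handles integer multiplies and divides is modelled as "ALU0".
PIPES = {
    "ALU0": {"alu", "mul", "div"},
    "ALU1": {"alu"},
    "LAGU": {"load"},
    "SAGU": {"store"},
}


def min_issue_cycles(uops: list[str]) -> int:
    """Lower bound on cycles to issue `uops`, one micro-op per pipe per cycle.

    Ignores dependencies and latency, and treats mul/div as occupying their
    pipe for one cycle, which is optimistic, especially for divides.
    """
    counts = Counter(uops)
    unknown = set(counts) - set().union(*PIPES.values())
    if unknown:
        raise ValueError(f"unsupported micro-op kinds: {sorted(unknown)}")
    muldiv = counts["mul"] + counts["div"]
    alu_total = counts["alu"] + muldiv
    return max(
        muldiv,                    # only one ALU accepts mul/div
        math.ceil(alu_total / 2),  # both ALUs share all ALU-class work
        counts["load"],            # single load AGU (LAGU)
        counts["store"],           # single store AGU (SAGU)
    )


if __name__ == "__main__":
    print(min_issue_cycles(["load", "load", "alu", "alu", "mul", "store"]))  # 2
    print(min_issue_cycles(["mul", "mul", "mul", "alu"]))                    # 3
```

In the first call the single load AGU and the two ALUs tie at two cycles; in the second, the one multiply-capable ALU sets the bound even though the other ALU is mostly idle. Real integer divides typically occupy their unit for many cycles, so estimates for divide-heavy code should be treated as optimistic.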