AMD OS1354WBJ4BGHBOX Optimization Guide - Page 10
Instruction, Decomposition, Superscalar, Organization
Software Optimization Guide for AMD Family 16h Processors, 52128 Rev. 1.1, March 2013

2.2 Instruction Decomposition

The AMD Family 16h processor implements the AMD64 instruction set by means of macro-ops (the primary units of work managed by the processor) and micro-ops (the primitive operations executed in the processor's execution units). These operations are designed to include direct support for AMD64 instructions and adhere to the high-performance principles of fixed-length encoding, regularized instruction fields, and a large register set. This enhanced microarchitecture enables higher processor core performance and promotes straightforward extensibility for future designs.

Instructions are marked as fastpath single (one macro-op), fastpath double (two macro-ops), or microcode (greater than 2 macro-ops). Macro-ops can normally contain up to 2 micro-ops. The table below lists some examples showing how instructions are mapped to macro-ops and how these macro-ops are mapped into one or more micro-ops.

Table 1. Typical Instruction Mappings

Instruction            Macro-ops  Micro-ops                        Comments
MOV reg,[mem]          1          1: load                          Fastpath single
MOV [mem],reg          1          1: store                         Fastpath single
MOV [mem],imm          1          2: move-imm, store               Fastpath single
REP MOVS [mem],[mem]   Many       Many                             Microcode
ADD reg,reg            1          1: add                           Fastpath single
ADD reg,[mem]          1          2: load, add                     Fastpath single
ADD [mem],reg          1          2: load/store, add               Fastpath single
MOVAPD [mem],xmm       1          2: store, FP-store-data          Fastpath single
VMOVAPD [mem],ymm      2          4: 2 × {store, FP-store-data}    256b AVX; Fastpath double
ADDPD xmm,xmm          1          1: addpd                         Fastpath single
ADDPD xmm,[mem]        1          2: load, addpd                   Fastpath single
VADDPD ymm,ymm         2          2: 2 × {addpd}                   256b AVX; Fastpath double
VADDPD ymm,[mem]       2          4: 2 × {load, addpd}             256b AVX; Fastpath double

2.3 Superscalar Organization

The AMD Family 16h processor is an out-of-order, two-way superscalar AMD64 processor. It can fetch, decode, and retire up to two AMD64 instructions per cycle.
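As a rough illustration of how the Table 1 mappings and the two-instructions-per-cycle limit can be applied, the Python sketch below totals macro-ops and micro-ops for a short instruction sequence. The names TABLE_1 and summarize are hypothetical and are not part of the guide. REP MOVS is left out because the table gives only "Many" for it. The decode-cycle figure is merely a lower bound derived from the stated limit of two instructions per cycle; it is not a timing prediction.

```python
from math import ceil

# Table 1 entries: instruction form -> (macro-ops, micro-ops, decode class).
# REP MOVS [mem],[mem] is omitted: the table lists it only as "Many".
TABLE_1 = {
    "MOV reg,[mem]":     (1, 1, "fastpath single"),
    "MOV [mem],reg":     (1, 1, "fastpath single"),
    "MOV [mem],imm":     (1, 2, "fastpath single"),
    "ADD reg,reg":       (1, 1, "fastpath single"),
    "ADD reg,[mem]":     (1, 2, "fastpath single"),
    "ADD [mem],reg":     (1, 2, "fastpath single"),
    "MOVAPD [mem],xmm":  (1, 2, "fastpath single"),
    "VMOVAPD [mem],ymm": (2, 4, "fastpath double"),
    "ADDPD xmm,xmm":     (1, 1, "fastpath single"),
    "ADDPD xmm,[mem]":   (1, 2, "fastpath single"),
    "VADDPD ymm,ymm":    (2, 2, "fastpath double"),
    "VADDPD ymm,[mem]":  (2, 4, "fastpath double"),
}


def summarize(sequence):
    """Total the Table 1 costs for a list of instruction forms.

    Hypothetical helper: raises KeyError for forms not listed in Table 1.
    """
    entries = [TABLE_1[form] for form in sequence]
    return {
        "macro_ops": sum(e[0] for e in entries),
        "micro_ops": sum(e[1] for e in entries),
        "fastpath_double": sum(1 for e in entries if e[2] == "fastpath double"),
        # Lower bound only: at most two instructions are decoded per cycle.
        "min_decode_cycles": ceil(len(sequence) / 2),
    }


if __name__ == "__main__":
    loop_body = [
        "MOV reg,[mem]",
        "ADD reg,reg",
        "VADDPD ymm,[mem]",
        "VMOVAPD [mem],ymm",
    ]
    print(summarize(loop_body))
    # Two 128-bit adds versus one 256-bit add, using the table values only.
    print(summarize(["ADDPD xmm,[mem]", "ADDPD xmm,[mem]"]))
    print(summarize(["VADDPD ymm,[mem]"]))
```

By the table values alone, one VADDPD ymm,[mem] accounts for the same number of macro-ops and micro-ops as two ADDPD xmm,[mem] instructions, while occupying one decode slot instead of two.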
The processor uses decoupled execution units to process instructions through fetch/branch-predict, decode, schedule/execute, and retirement pipelines.

The processor can fetch 32 bytes per cycle and can scan two 16-byte instruction windows for up to two instruction decodes per cycle. The decoder marks each instruction as fastpath single, fastpath double, or microcode. The dispatcher can send up to two macro-ops to the retire unit for tracking, as well as sending the corresponding micro-ops to the schedulers. These are upper limits, however. The actual number of bytes fetched or scanned, instructions decoded, or macro-ops dispatched may be lower, depending on a number of factors such as whether instructions can be broken up into 16-byte windows.

The processor uses decoupled independent schedulers, consisting of an integer ALU scheduler, an AGU scheduler, and a floating-point scheduler. These three schedulers can simultaneously issue up to six micro-ops to

Page 10, Chapter 2: Microarchitecture of the Family 16h Processor