AMD OS1354WBJ4BGHBOX Optimization Guide - Page 20
Floating-point, Block, Diagram
UPC - 730143266024
Page 20 highlights
Software Optimization Guide for AMD Family 16h Processors 52128 Rev. 1.1 March 2013

paths are 128 bits wide. As a result, the maximum throughput of both single-precision and double-precision floating-point SSE vector operations has improved by a factor of two over the AMD Family 14h processor.

The floating-point unit (FPU) utilizes a coprocessor model. As such it contains its own scheduler, register files, and renamers and does not share them with the integer units. It can handle dispatch and renaming of 2 floating-point macro-ops per cycle, and the scheduler can issue 1 micro-op per cycle for each pipe. The floating-point scheduler has an 18-entry micro-op capacity. The floating-point retire queue holds up to 44 floating-point micro-ops between dispatch and retire. Any macro-op that has a floating-point micro-op component, and that is dispatched into the integer retire control unit, will be held in the floating-point retire queue until the macro-op retires from the integer retire control unit. Thus a maximum of 44 macro-ops which have floating-point micro-op components can be in-flight in the 64-macro-op in-flight window that the integer retire control unit provides.

Figure 3. Floating-point Unit Block Diagram

The FPU contains a 128-bit floating-point multiply unit (FPM) and a 128-bit floating-point adder unit (FPA). The FPM contains two 76-bit × 27-bit multipliers, which means that double precision (64-bit) and extended precision (80-bit) floating-point multiplication computations require iteration. A few selected floating-point micro-ops, primarily logical/move/shuffle micro-ops, can execute in either the FPM or the FPA. The FPU also contains two 128-bit vector arithmetic / logical units (VALUs) which perform arithmetic and logical operations on AVX, SSE, and legacy MMX packed integer data, and a 128-bit integer multiply unit (VIMUL).
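The capacities quoted above can be turned into a few back-of-the-envelope bounds. The following is a minimal sketch, not part of the guide: the class and function names are hypothetical, and it assumes that each macro-op with a floating-point component occupies exactly one floating-point retire queue entry and that all other in-flight macro-ops only consume slots in the 64-macro-op window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpuLimits:
    """Capacities quoted in the text (names are illustrative, not AMD's)."""
    datapath_bits: int = 128            # width of the floating-point data paths
    dispatch_per_cycle: int = 2         # FP macro-ops dispatched/renamed per cycle
    scheduler_entries: int = 18         # FP scheduler capacity, in micro-ops
    fp_retire_queue_entries: int = 44   # FP retire queue capacity
    retire_window_macro_ops: int = 64   # integer retire control unit window


def lanes_per_vector(limits: FpuLimits, element_bits: int) -> int:
    """Number of elements of the given width that fill one full-width vector."""
    if element_bits <= 0 or limits.datapath_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the data path width")
    return limits.datapath_bits // element_bits


def max_fp_macro_ops_in_flight(limits: FpuLimits, other_macro_ops: int = 0) -> int:
    """Upper bound on in-flight macro-ops that have an FP component.

    Assumption: one FP retire queue entry per such macro-op, and
    `other_macro_ops` non-FP macro-ops already occupy the retire window.
    """
    if not 0 <= other_macro_ops <= limits.retire_window_macro_ops:
        raise ValueError("other_macro_ops must fit in the retire window")
    room_in_window = limits.retire_window_macro_ops - other_macro_ops
    return min(limits.fp_retire_queue_entries, room_in_window)


def min_cycles_to_dispatch(limits: FpuLimits, fp_macro_ops: int) -> int:
    """Lower bound on cycles needed to dispatch the given FP macro-ops."""
    if fp_macro_ops < 0:
        raise ValueError("fp_macro_ops must be non-negative")
    # Ceiling division without importing math.
    return -(-fp_macro_ops // limits.dispatch_per_cycle)
```

Under these assumptions, `lanes_per_vector(FpuLimits(), 32)` gives the four single-precision lanes of a 128-bit SSE vector, and `max_fp_macro_ops_in_flight(FpuLimits(), 30)` shows that the 64-macro-op window, not the 44-entry queue, becomes the limit once more than 20 other macro-ops are in flight.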
The store/convert unit (STC) primarily handles stores (up to 128-bit operand size), floating-point / integer conversions, and integer / floating-point conversions. The register file and bypass network can also accept one 128-bit load per cycle from the load-store unit.

There are two important organizational dimensions to understand with respect to the execution units. The first is the pipeline binding. Pipe 0 contains vector integer ALU 0 (VALU0), the vector integer multiplier (VIMUL), and the floating-point adder (FPA). Pipe 1 contains vector integer ALU 1 (VALU1), the store/convert unit, and

20 Microarchitecture of the Family 16h Processor Chapter 2
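The pipe binding described above can be written down as a small lookup table, which makes it easy to check whether two micro-ops compete for the same pipe. This is only an illustrative sketch: the names `PIPE_UNITS`, `pipe_of`, and `can_issue_same_cycle` are hypothetical, the Pipe 1 list is cut off in this excerpt so only the units actually named are recorded, and the same-cycle check relies solely on the statement that the scheduler issues one micro-op per cycle for each pipe.

```python
from typing import Optional

# Unit-to-pipe binding as stated in the excerpt. The Pipe 1 list is truncated
# in the source text, so any unit not named there is deliberately left out.
PIPE_UNITS = {
    0: ("VALU0", "VIMUL", "FPA"),
    1: ("VALU1", "STC"),
}


def pipe_of(unit: str) -> Optional[int]:
    """Return the pipe a unit is bound to, or None if the excerpt does not say."""
    name = unit.upper()
    for pipe, units in PIPE_UNITS.items():
        if name in units:
            return pipe
    return None


def can_issue_same_cycle(unit_a: str, unit_b: str) -> bool:
    """True if micro-ops for the two units could issue in the same cycle.

    Simplified model: each pipe issues at most one micro-op per cycle, so two
    micro-ops can issue together only when their units sit on different pipes.
    """
    pipe_a, pipe_b = pipe_of(unit_a), pipe_of(unit_b)
    if pipe_a is None or pipe_b is None:
        raise ValueError("pipe binding not given in the excerpt for one of the units")
    return pipe_a != pipe_b
```

For example, `can_issue_same_cycle("VIMUL", "FPA")` is false in this model because both units are bound to Pipe 0, while `can_issue_same_cycle("VALU0", "VALU1")` is true.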