AMD OS1354WBJ4BGHBOX Optimization Guide - Page 9
MOVBE, XSAVE, XSAVEOPT, LZCNT, POPCNT, RDRAND, INVPCID
UPC - 730143266024
Page 9 highlights
52128 Rev. 1.1 March 2013 Software Optimization Guide for AMD Family 16h Processors
• 128-bit and 256-bit single-instruction / multiple-data (SIMD) instructions. The following instruction subsets are supported:
• Streaming SIMD Extensions 1 (SSE1)
• Streaming SIMD Extensions 2 (SSE2)
• Streaming SIMD Extensions 3 (SSE3)
• Supplemental Streaming SIMD Extensions 3 (SSSE3)
• Streaming SIMD Extensions 4a (SSE4a)
• Streaming SIMD Extensions 4.1 (SSE4.1)
• Streaming SIMD Extensions 4.2 (SSE4.2)
• Advanced Vector Extensions (AVX)
• Half-precision floating-point conversion (F16C)
• Carry-less Multiply (CLMUL) instructions
• Advanced Encryption Standard (AES) acceleration instructions
• Bit Manipulation Instructions (BMI)
• Move Big-Endian instruction (MOVBE)
• XSAVE / XSAVEOPT
• LZCNT / POPCNT
• AMD Virtualization™ technology (AMD-V™)
The AMD Family 16h processor does not support the following instruction subsets:
• Fused Multiply/Add instructions (FMA3 / FMA4)
• XOP instructions
• Trailing bit manipulation (TBM) instructions
• Light-weight profiling (LWP) instructions
• Read and write fsbase and gsbase instructions
• RDRAND and INVPCID instructions
The AMD Family 16h processor includes many features designed to improve software performance.
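Because some subsets above are supported and others are not, software usually tests for each one at run time instead of assuming it from the processor family. The following is a minimal sketch of decoding a few of the relevant CPUID feature bits. It assumes the caller has already read the three registers; the bit positions follow the standard CPUID feature-flag layout, while the names `feature_bit`, `FEATURES`, and `has_feature` are hypothetical and not taken from the guide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Which CPUID output register a feature flag lives in (illustrative naming). */
enum cpuid_reg {
    LEAF1_ECX,        /* CPUID function 0000_0001h, ECX */
    LEAF7_EBX,        /* CPUID function 0000_0007h, subleaf 0, EBX */
    EXT1_ECX,         /* CPUID function 8000_0001h, ECX */
    CPUID_REG_COUNT
};

typedef struct {
    const char *name;
    enum cpuid_reg reg;
    unsigned bit;
} feature_bit;

/* A subset of the flags named on this page; SSE1/SSE2 (leaf 1 EDX) are omitted. */
static const feature_bit FEATURES[] = {
    {"SSE3",     LEAF1_ECX,  0}, {"CLMUL",    LEAF1_ECX,  1},
    {"SSSE3",    LEAF1_ECX,  9}, {"FMA3",     LEAF1_ECX, 12},
    {"SSE4.1",   LEAF1_ECX, 19}, {"SSE4.2",   LEAF1_ECX, 20},
    {"MOVBE",    LEAF1_ECX, 22}, {"POPCNT",   LEAF1_ECX, 23},
    {"AES",      LEAF1_ECX, 25}, {"XSAVE",    LEAF1_ECX, 26},
    {"AVX",      LEAF1_ECX, 28}, {"F16C",     LEAF1_ECX, 29},
    {"RDRAND",   LEAF1_ECX, 30},
    {"FSGSBASE", LEAF7_EBX,  0}, {"BMI",      LEAF7_EBX,  3},
    {"INVPCID",  LEAF7_EBX, 10},
    {"LZCNT",    EXT1_ECX,   5}, {"SSE4a",    EXT1_ECX,   6},
    {"XOP",      EXT1_ECX,  11}, {"LWP",      EXT1_ECX,  15},
    {"FMA4",     EXT1_ECX,  16}, {"TBM",      EXT1_ECX,  21},
};

/* Returns 1 if the flag is set, 0 if clear, -1 if the name is not in the table. */
int has_feature(const uint32_t regs[CPUID_REG_COUNT], const char *name)
{
    for (size_t i = 0; i < sizeof FEATURES / sizeof FEATURES[0]; i++) {
        if (strcmp(FEATURES[i].name, name) == 0) {
            assert(FEATURES[i].bit < 32);
            return (int)((regs[FEATURES[i].reg] >> FEATURES[i].bit) & 1u);
        }
    }
    return -1;
}
```

With GCC or Clang on x86, the register values can typically be obtained through `__get_cpuid` and `__get_cpuid_count` from `<cpuid.h>`. Note that a set AVX flag alone is not sufficient to use AVX: the operating system must also have enabled the extended register state, which requires a separate check.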
The microarchitecture provides the following key features:
• Unified 1-2-Mbyte L2 cache shared by up to 4 cores
• Integrated memory controller with memory prefetcher
• 32-Kbyte L1 instruction cache per core
• 32-Kbyte L1 data cache per core
• Prefetchers for L2 cache, L1 data cache, and L1 instruction cache
• Advanced dynamic branch prediction
• 32-byte instruction fetch
• 2-way x86 instruction decoding with sideband stack optimizer
• Dynamic out-of-order scheduling and speculative execution
• Two-way integer execution
• Two-way address generation (1 load and 1 store)
• Two-way 128-bit wide floating-point and packed integer execution
• Integer hardware divider
• Superforwarding
• L1 Instruction TLB of 32 4-Kbyte entries and L1 Data TLB of 40 4-Kbyte entries
• Four fully-symmetric core performance counters
Chapter 2 Microarchitecture of the Family 16h Processor 9
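The L1 TLB sizes listed above can be turned into a rough working-set figure. The sketch below computes how much address space a TLB covers when every entry maps one page (often called TLB "reach", a term not used on this page). The function name `tlb_reach_bytes` is hypothetical, and the calculation assumes 4-Kbyte pages throughout.

```c
#include <stdint.h>

/* Address space covered when each TLB entry maps exactly one page. */
uint64_t tlb_reach_bytes(uint32_t entries, uint32_t page_bytes)
{
    return (uint64_t)entries * page_bytes;
}
```

For the figures on this page, `tlb_reach_bytes(32, 4096)` gives 131072 bytes (128 Kbytes) for the L1 instruction TLB and `tlb_reach_bytes(40, 4096)` gives 163840 bytes (160 Kbytes) for the L1 data TLB. Both exceed the 32-Kbyte L1 caches, so a loop whose code and data fit in L1 cache and are reasonably contiguous would also be expected to fit within the L1 TLBs.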