AMD OS1354WBJ4BGHBOX Optimization Guide - Page 3

Contents - processor


Software Optimization Guide for AMD Family 16h Processors
52128 Rev. 1.1 March 2013

Contents

Revision History ... 6
1 Preface ... 7
2 Microarchitecture of the Family 16h Processor ... 8
  2.1 Features ... 8
  2.2 Instruction Decomposition ... 10
  2.3 Superscalar Organization ... 10
  2.4 Processor Block Diagram ... 11
  2.5 Processor Cache Operations ... 11
    2.5.1 L1 Instruction Cache ... 12
    2.5.2 L1 Data Cache ... 12
    2.5.3 L2 Cache ... 12
  2.6 Memory Address Translation ... 13
    2.6.1 L1 Translation Lookaside Buffers ... 13
    2.6.2 L2 Translation Lookaside Buffers ... 13
    2.6.3 Hardware Page Table Walker ... 13
  2.7 Optimizing Branching ... 13
    2.7.1 Branch Prediction ... 13
    2.7.2 Loop Alignment ... 16
  2.8 Instruction Fetch and Decode ... 18
  2.9 Integer Unit ... 18
    2.9.1 Integer Schedulers ... 18
    2.9.2 Integer Execution Units ... 18
    2.9.3 Retire Control Unit ... 19
  2.10 Floating-Point Unit ... 19
    2.10.1 Denormals ... 21
  2.11 XMM Register Merge Optimization ... 22
  2.12 Load Store Unit ... 23
Appendix A Instruction Latencies ... 24
  A.1 Instruction Latency Assumptions ... 24
  A.2 Spreadsheet Column Descriptions ... 24

