AMD OS1354WBJ4BGHBOX Optimization Guide - Page 18

Instruction, Fetch, Decode, Integer

Page 18 highlights

Software Optimization Guide for AMD Family 16h Processors, 52128 Rev. 1.1, March 2013 (Chapter 2, Microarchitecture of the Family 16h Processor, page 18)

2.8 Instruction Fetch and Decode

The AMD Family 16h processor fetches instructions in 32-byte naturally aligned blocks. The processor can perform an instruction block fetch every cycle. The first two branches in a 64-byte cache line are typically allocated into the same fetch window tracking structure entry. Each additional branch will be allocated into a separate fetch window tracking structure entry.

The fetch unit sends these bytes to the decode unit through a 16-entry Instruction Byte Buffer (IBB) in two 16-byte windows. The IBB acts as a decoupling queue between the fetch/branch-predict unit and the decode unit. The decode unit scans two of these windows in a given cycle, decoding a maximum of two instructions.

The decode unit also contains a sideband stack optimizer, which tracks the stack-pointer value. This optimization removes the dependencies that arise during chains of PUSH and POP operations on the rSP register, and thereby improves the efficiency of the PUSH and POP instructions.

2.9 Integer Unit

The integer unit consists of the following components:

  • schedulers
  • execution units
  • retire control unit

The schedulers feed integer micro-ops to the execution units. The execution units carry out various types of operations further described below. The retire control unit serves as the final arbiter for exception processing versus instruction retirement.

2.9.1 Integer Schedulers

The schedulers can receive up to two macro-ops per cycle, where they are broken down into micro-ops. ALU micro-ops are sent to the 20-entry ALU scheduler. Load and Store micro-ops are sent to the 12-entry address generation unit (AGU) scheduler. Each scheduler can issue up to two micro-ops per cycle. The scheduler tracks operand availability and dependency information as part of its task of issuing micro-ops to be executed. It also assures that older micro-ops which have been waiting for operands are executed in a timely manner. Micro-ops can be issued and executed out-of-order.

2.9.2 Integer Execution Units

The AMD Family 16h processor contains 4 integer execution pipes. There are 2 ALUs connected to the ALU scheduler, one of which can also handle integer multiplies and divides. There are 2 AGUs connected to the AGU scheduler, one AGU dedicated for load address generation handling (LAGU), and the other AGU dedicated for store address generation handling (SAGU). Figure 2 below provides a block diagram of the integer schedulers and execution units for the AMD Family 16h processor core.
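The fetch and decode figures in Section 2.8 lend themselves to a quick back-of-the-envelope check when laying out a hot loop. The sketch below is not from the guide: the function names are hypothetical, and the model is a deliberately optimistic lower bound that assumes one 32-byte aligned block fetched per cycle and at most two instructions decoded per cycle, ignoring branches, cache misses, IBB occupancy, and scheduler stalls.

```python
# Illustrative model only; constants are taken from Section 2.8,
# everything else (names, the max() bound) is an assumption of this sketch.
FETCH_BLOCK_BYTES = 32  # fetch granularity: 32-byte naturally aligned blocks
DECODE_WIDTH = 2        # decode limit: at most two instructions per cycle


def fetch_blocks_spanned(start_addr: int, length: int) -> int:
    """Count the aligned 32-byte blocks touched by [start_addr, start_addr + length)."""
    if length <= 0:
        return 0
    first = start_addr // FETCH_BLOCK_BYTES
    last = (start_addr + length - 1) // FETCH_BLOCK_BYTES
    return last - first + 1


def min_decode_cycles(instruction_count: int) -> int:
    """Lower bound on decode cycles at two instructions per cycle (ceiling division)."""
    return -(-instruction_count // DECODE_WIDTH)


def min_front_end_cycles(start_addr: int, length: int, instruction_count: int) -> int:
    """Optimistic bound: the slower of block fetch and instruction decode."""
    return max(fetch_blocks_spanned(start_addr, length),
               min_decode_cycles(instruction_count))


if __name__ == "__main__":
    # A 30-byte code sequence fits in one fetch block when 32-byte aligned ...
    print(fetch_blocks_spanned(0x1000, 30))  # 1
    # ... but straddles two blocks when it starts 16 bytes into a block.
    print(fetch_blocks_spanned(0x1010, 30))  # 2
    # Seven instructions need at least four decode cycles.
    print(min_decode_cycles(7))              # 4
```

Under these assumptions, aligning a short sequence to a 32-byte boundary can reduce the number of fetch blocks it occupies, while sequences made of many short instructions tend to be bounded by the two-per-cycle decode limit instead. Measured behavior on real hardware may differ from this simple count.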
