Apple M9592LL Technology Overview - Page 8

Eight Double-Precision Floating-Point Units, Four Velocity Engine Units


Technology Overview
Power Mac G5
Eight Double-Precision Floating-Point Units
The PowerPC G5 core contains two double-precision floating-point units, each capable
of performing a multiply and an add at the same time. This means a Power Mac G5
Quad, with four processor cores and a total of eight floating-point units, can complete
up to sixteen 64-bit floating-point operations in a single cycle.
Such immense 64-bit computational power accelerates applications in many fields,
including audio creation, 3D content creation, and scientific visualization and analysis—
resulting in performance levels far beyond those of previous Power Mac generations.
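The per-cycle figure above is simple multiplication, and it can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, the 2.5GHz clock is taken from the "Power Mac G5 Quad 2.5GHz" chart label, and the result is a theoretical ceiling rather than a measured benchmark.

```python
def peak_flops_per_cycle(cores, fpus_per_core=2, ops_per_fpu=2):
    """Upper bound on floating-point operations completed per clock cycle.

    Defaults follow the text: two floating-point units per PowerPC G5 core,
    each able to perform a multiply and an add at the same time.
    """
    return cores * fpus_per_core * ops_per_fpu


def peak_gigaflops(cores, clock_ghz):
    """Theoretical peak in gigaflops: operations per cycle times clock in GHz."""
    return peak_flops_per_cycle(cores) * clock_ghz


quad_per_cycle = peak_flops_per_cycle(cores=4)      # 4 cores x 2 units x 2 ops
quad_peak = peak_gigaflops(cores=4, clock_ghz=2.5)  # assumes every unit is busy every cycle

print(quad_per_cycle)  # 16
print(quad_peak)       # 40.0
```

Real workloads rarely keep every unit busy on every cycle, so measured results such as Linpack are expected to fall below this ceiling.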
Fused multiply-add example
The floating-point units in the PowerPC G5 can complete both a multiply and an add
operation as part of the same machine instruction—accelerating matrix multiplication,
vector dot products, and other scientific computations. Referred to as fused multiply-
add, or “fmadd,” this instruction is considered a building block for data-intensive
floating-point computation.
The following computation can be completed by a fused multiply-add instruction in
one pass through either of the two floating-point units in a PowerPC G5 core:
T = (a * b) + c
On other processors, two instructions are required. The first is a multiply instruction:
U = (a * b)
The product “U” is used by a second instruction, an addition, to complete the
computation:
V = U + c
In processors with comparable clock speeds, the computation of “(a * b) + c” is
completed twice as fast using fused multiply-add. It also delivers a more accurate result,
because round-off occurs just once, in the computation of “T,” while on other
processors round-off occurs twice: in the computation of “U” and in the computation of “V.”
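The accuracy point can be illustrated with a small sketch. It does not execute the PowerPC fmadd instruction; as an assumption for illustration, it emulates the single rounding by computing the exact result with Python's Fraction type and rounding to double precision once. The function names and the input values are hypothetical, chosen so that the two approaches visibly differ.

```python
from fractions import Fraction


def two_step(a, b, c):
    """Separate multiply and add: the result is rounded twice."""
    u = a * b      # U = (a * b), rounded to double precision
    return u + c   # V = U + c, rounded again


def fused(a, b, c):
    """Emulated fused multiply-add: exact arithmetic, then one rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding here
    return float(exact)                              # the only rounding step


# Inputs chosen so the exact product, 1 - 2**-54, is not representable
# in double precision and rounds up to 1.0 in the two-step version.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print(two_step(a, b, c))                  # 0.0: the small term is lost
print(fused(a, b, c) == -(2.0 ** -54))    # True: the small term survives
```

In the two-step version the product is rounded to 1.0 before the addition, so the final answer is 0.0. With a single rounding the exact answer, -2**-54, is preserved.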
Four Velocity Engine Units
A dual-pipelined Velocity Engine in each processor core is optimized with two
independent queues and dedicated 128-bit registers and data paths for efficient
instruction and data flow. This 128-bit vector processing unit accelerates data
manipulation by applying a single instruction to multiple data at the same time, known
as SIMD processing. Originally implemented in the PowerPC G4, the Velocity Engine in the PowerPC
G5 uses the same set of 162 instructions, enabling it to accelerate existing Mac OS X
applications that have been optimized for the Velocity Engine.
Vector processing is useful for transforming large sets of data, such as manipulating an
image or rendering a video effect. For example, when a designer uses a filter to apply
a motion blur to an image, each pixel of the image must be changed according to
the same set of instructions—a highly repetitive processing task. Each Velocity Engine
pipeline speeds up this task by processing up to 128 bits of data, in four 32-bit integers,
eight 16-bit integers, sixteen 8-bit integers, or four 32-bit single-precision floating-point
values, in a single clock cycle. That works out to 16 simultaneous 32-bit floating-point
operations on a Power Mac G5 Quad.
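The lane counts above follow from dividing the 128-bit register width by the element size. The sketch below models that arithmetic and mimics one vector operation on sixteen 8-bit pixel values. It is plain Python for illustration, not Velocity Engine code; the names lanes and brighten are hypothetical, and the saturating add is just one example of a per-pixel operation.

```python
VECTOR_BITS = 128  # register width stated in the text


def lanes(element_bits):
    """Number of elements of the given width that fit in one 128-bit vector."""
    return VECTOR_BITS // element_bits


def brighten(pixels, amount):
    """Model one SIMD step: the same saturating add applied to every lane.

    `pixels` stands in for one vector of sixteen 8-bit values. Hardware
    would process all lanes together; the loop here only models the result.
    """
    assert len(pixels) == lanes(8)
    return [min(255, p + amount) for p in pixels]


print(lanes(32), lanes(16), lanes(8))  # 4 8 16

# Four cores, each handling four 32-bit values per cycle as described above
print(4 * lanes(32))                   # 16

row = [250, 10] + [0] * 14
print(brighten(row, 10)[:2])           # [255, 20]
```

A scalar loop would need sixteen separate additions for the same row of pixels, which is why repetitive per-pixel filters benefit from vector processing.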
Linpack
A measure of a computer’s floating-point
execution performance, the Linpack
benchmark solves a dense system of linear
equations. The Power Mac G5 Quad executed
the double-precision equations 88 percent
faster than the dual 2.7GHz Power Mac G5
and an amazing 626 percent faster than the
dual 1.42GHz Power Mac G4.
Gigaflops
The gigaflops test indicates a system’s
vector processing capability by measuring
the maximum number of floating-point
operations it can perform. With four Velocity
Engine units, the Power Mac G5 Quad
completed the test 85 percent faster than the
dual 2.7GHz Power Mac G5 and 260 percent
faster than the dual 1.42GHz Power Mac G4.
Linpack results
Power Mac G5 Quad 2.5GHz      21 gigaflops
Dual 2.7GHz Power Mac G5      11.1 gigaflops
Dual 1.42GHz Power Mac G4     2.9 gigaflops
Gigaflops test results
Power Mac G5 Quad 2.5GHz      76.6 gigaflops
Dual 2.7GHz Power Mac G5      41.1 gigaflops
Dual 1.42GHz Power Mac G4     21.3 gigaflops
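The "percent faster" figures quoted for both tests can be checked against the chart values with a short calculation. This is a sketch under two assumptions: the pairing of each value with its machine is inferred from the quoted percentages, and the chart values are rounded, so the recomputed percentages may differ from the quoted ones by a point or two. The function name is hypothetical.

```python
def percent_faster(new, old):
    """How much faster `new` is than `old`, as a percentage."""
    return (new / old - 1) * 100


# Chart values in gigaflops: (G5 Quad, dual 2.7GHz G5, dual 1.42GHz G4)
linpack = (21.0, 11.1, 2.9)
gigaflops_test = (76.6, 41.1, 21.3)

for name, (quad, dual_g5, dual_g4) in [("Linpack", linpack),
                                       ("Gigaflops", gigaflops_test)]:
    vs_g5 = percent_faster(quad, dual_g5)
    vs_g4 = percent_faster(quad, dual_g4)
    print(f"{name}: {vs_g5:.0f}% faster than dual G5, {vs_g4:.0f}% faster than dual G4")
```

Note that "N percent faster" corresponds to a speed ratio of 1 + N/100, so 626 percent faster means roughly 7.3 times the throughput.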