HP DL360 The Intel processor roadmap for industry-standard servers technology - Page 10

Extended hyper-pipeline, SSE3 instructions, 64-bit extensions (Intel 64)

Extended hyper-pipeline
In keeping with its history of regularly increasing processor frequencies, Intel extended the
hyper-pipeline from 20 stages (in the earlier Pentium 4 design) to 31 stages. The biggest drawback
to this approach is that, as the pipeline gets longer, each interruption (stall) to the regular flow
of instructions, such as a mispredicted branch, becomes more costly in terms of performance. To
mitigate such stalls, Intel improved the branch-prediction algorithm sufficiently to prevent the
deeper pipeline from causing performance degradation.
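The trade-off between pipeline depth and prediction accuracy can be illustrated with a rough model. The sketch below is not taken from Intel documentation: it assumes that a mispredicted branch costs about one cycle per pipeline stage, and the workload figures (20 percent branches, a 5 percent misprediction rate, an ideal CPI of 1.0) are purely illustrative.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty_cycles):
    """Average cycles per instruction when every mispredicted branch
    costs a fixed number of cycles to refill the pipeline."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


def break_even_mispredict_rate(old_rate, old_stages, new_stages):
    """Misprediction rate at which the deeper pipeline loses no more cycles
    to flushes than the shallower one (penalty assumed equal to depth)."""
    return old_rate * old_stages / new_stages


# Assumed, illustrative workload parameters (not measured values).
BASE_CPI = 1.0
BRANCH_FRACTION = 0.20
OLD_MISPREDICT_RATE = 0.05

for stages in (20, 31):
    cpi = effective_cpi(BASE_CPI, BRANCH_FRACTION, OLD_MISPREDICT_RATE, stages)
    print(f"{stages} stages, same predictor: CPI = {cpi:.3f}")

needed = break_even_mispredict_rate(OLD_MISPREDICT_RATE, 20, 31)
print(f"Misprediction rate needed at 31 stages to break even: {needed:.2%}")
```

Under these assumptions, the 31-stage design needs a noticeably better predictor just to match the stall cost of the 20-stage design, which is why the branch-prediction improvements accompany the longer pipeline.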
SSE3 instructions
The Prescott design added Streaming Single-Instruction-Multiple-Data (SIMD) Extensions 3 (SSE3), also
known as Prescott New Instructions. As in earlier processors, SIMD instructions provide the potential
for improved performance because each instruction operates on multiple data items at the same
time. SSE3 added 13 instructions in three groups: arithmetic, graphics, and HT synchronization.
The arithmetic group consists of one new instruction for converting x87 data into integer format, and
five instructions that simplify the process of performing complex arithmetic. Complex numbers actually
consist of two numbers: a real and an imaginary component. The additional instructions facilitate
complex operations because they are designed to operate on both parts of these complex pairs of
numbers at the same time. Using these instructions also simplifies coding complex arithmetic
operations because fewer instructions are needed to accomplish the goal.
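To show why operating on both parts of a complex pair at once reduces the instruction count, the following sketch models three SSE3 operations in plain Python on four-element lists. The mnemonics MOVSLDUP, MOVSHDUP, and ADDSUBPS are not named in the text above; they are given here as the commonly documented SSE3 instructions of this kind. The helper names and the packed layout [re0, im0, re1, im1] are assumptions made for illustration, and the code is a model of the lane behavior, not real SIMD code.

```python
def movsldup(v):
    """Model of MOVSLDUP: duplicate the even-indexed lanes -> [v0, v0, v2, v2]."""
    return [v[0], v[0], v[2], v[2]]


def movshdup(v):
    """Model of MOVSHDUP: duplicate the odd-indexed lanes -> [v1, v1, v3, v3]."""
    return [v[1], v[1], v[3], v[3]]


def addsubps(a, b):
    """Model of ADDSUBPS: subtract in even lanes, add in odd lanes."""
    return [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]


def swap_pairs(v):
    """Swap the two lanes of each pair (an ordinary shuffle, not an SSE3 addition)."""
    return [v[1], v[0], v[3], v[2]]


def mul(a, b):
    """Lane-wise multiply, as a packed multiply instruction would do."""
    return [x * y for x, y in zip(a, b)]


def complex_mul_packed(a, b):
    """Multiply two pairs of complex numbers packed as [re0, im0, re1, im1]."""
    real_terms = mul(movsldup(a), b)              # [ar*br, ar*bi, ...]
    imag_terms = mul(movshdup(a), swap_pairs(b))  # [ai*bi, ai*br, ...]
    return addsubps(real_terms, imag_terms)       # [ar*br - ai*bi, ar*bi + ai*br, ...]


a = [1.0, 2.0, 3.0, -1.0]  # (1 + 2i) and (3 - 1i)
b = [3.0, 4.0, 0.5, 2.0]   # (3 + 4i) and (0.5 + 2i)
print(complex_mul_packed(a, b))  # [-5.0, 10.0, 3.5, 5.5]
```

The mixed subtract-and-add step is what removes the extra sign-flipping and shuffling that earlier SIMD code needed to combine the real and imaginary products.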
The graphics group contains one instruction for video encoding and four that are specific to graphics
operations. Finally, two instructions facilitate HT operation, for example, by allowing one operational
thread to be moved to a higher priority than another.
64-bit extensions (Intel 64)
In response to market demands, Intel added 64-bit extensions to the x86 architecture of the Xeon,
Xeon MP, and Pentium 4 processors. The key advantage of 64-bit processing is that the system can
address a much larger flat memory space (up to 16 exabytes). Even though 32-bit processors can
address up to 64 GB of physical memory through Physical Address Extension (PAE), each application
still sees a 4 GB address space, so access above the standard 4 GB limit must go through a slow and
cumbersome windowing facility. Due to the complexities of this process, most 32-bit applications
have not made use of the higher address space. Today, few applications require more than 1 or 2 GB
of memory; however, this will eventually change. By adding 64-bit extensions to its x86 processors,
Intel has provided users with the same 64-bit addressing benefit at a much lower cost than if users
were forced to replace both the hardware and software.
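The address-space figures quoted above follow directly from the number of address bits. The short calculation below is only a worked check; the 36-bit width is the physical address size that corresponds to the 64 GB PAE figure, and the function name is chosen here for illustration.

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary)
EIB = 2 ** 60  # bytes in one exabyte (binary)


def addressable_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


print(addressable_bytes(32) // GIB, "GB with 32-bit addresses")          # 4
print(addressable_bytes(36) // GIB, "GB with 36-bit PAE addresses")      # 64
print(addressable_bytes(64) // EIB, "exabytes with 64-bit addresses")    # 16
```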
Even though the larger memory addressing capability is the primary advantage of 64-bit extensions, it
is not the only one. The 64-bit extensions also provide a larger register set, with eight additional
general-purpose registers (GPRs) and 64-bit versions of the existing registers. With a total of 16
GPRs, 64-bit extensions provide additional resources that compilers can use to increase performance.
64-bit Extensions
AMD was first to release 64-bit extensions, called AMD64, with its Opteron
processor in early 2003. Within a year, Intel responded with its own plans to
deliver a similar solution called Extended Memory 64 Technology, or EM64T,
which is broadly compatible with AMD64. In late 2006, Intel began using the
name Intel 64 for its implementation. Intel 64 and AMD64 use the same register
sets and definitions, and the 64-bit instructions are nearly identical. HP expects
that any minor differences will be handled by the OS and compiler, so that the
average application writer or customer should see no differences. New operating
systems are required to make use of 64-bit extensions. Red Hat, SUSE, and
Microsoft provide AMD64 support and Intel 64 support.