HP DL740 HP F8 Architecture Technology Brief - Page 13

Frequency and Full-Speed Cache, Processor and I/O Bus Design, SIMD Instructions



logical processors just as it would in a traditional multiprocessor system. The execution core
processes instructions in an order determined by data dependencies and resource availability
rather than by program order. The processor can therefore execute instructions in whatever
order yields the best overall performance.
For more information, read the HP technology brief entitled “Intel® Hyper-Threading
Technology,” Document Number TC0300306, available on the HP website: www.hp.com.
Frequency and Full-Speed Cache
The Xeon MP processor is now available with operating frequencies of 1.50, 1.90, and
2.00 GHz.
The Xeon MP includes an L2 cache located on the same die as the processor
logic, giving high bandwidth and low latency on a full-speed backside bus. The full-speed
backside bus will enable efficient access to the most frequently used data. The Xeon MP also
includes an integrated level three (L3) cache on the die with size options of 1 or 2 MB.
Processor and I/O Bus Design
The 64-bit Xeon MP bus uses a protocol and cache coherency design similar to those of the
P6 bus. The Xeon MP bus operates at 100 MHz and is quad-pumped: it uses four separate clocks,
or strobes, to transfer data four times within a single clock cycle. This gives an effective
data transfer rate of 400 MT/s and, across the 64-bit (8-byte) data path, a maximum
theoretical bandwidth of 3.2 GB/s.
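
The bus figures above follow from simple arithmetic, which can be checked with a short
Python sketch. The clock rate, pumping factor, and bus width come from the text; the
function and constant names are illustrative only.

```python
# Worked arithmetic for the quad-pumped bus figures quoted in the text.
BUS_CLOCK_HZ = 100_000_000   # 100 MHz base bus clock (from the text)
TRANSFERS_PER_CLOCK = 4      # quad-pumped: four data transfers per clock cycle
BUS_WIDTH_BITS = 64          # 64-bit data path (from the text)


def effective_transfer_rate(clock_hz: int, transfers_per_clock: int) -> int:
    """Return data transfers per second on a multi-pumped bus."""
    return clock_hz * transfers_per_clock


def peak_bandwidth_bytes(clock_hz: int, transfers_per_clock: int, width_bits: int) -> int:
    """Return the theoretical peak bandwidth in bytes per second."""
    bytes_per_transfer = width_bits // 8
    return effective_transfer_rate(clock_hz, transfers_per_clock) * bytes_per_transfer


rate = effective_transfer_rate(BUS_CLOCK_HZ, TRANSFERS_PER_CLOCK)
bandwidth = peak_bandwidth_bytes(BUS_CLOCK_HZ, TRANSFERS_PER_CLOCK, BUS_WIDTH_BITS)

print(f"{rate / 1e6:.0f} MT/s")       # 400 MT/s
print(f"{bandwidth / 1e9:.1f} GB/s")  # 3.2 GB/s
```

This is a theoretical ceiling: it counts only raw data transfers and ignores any bus
overhead, so sustained throughput would be lower.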
SIMD Instructions
The Pentium 4 instruction set includes 76 new instructions, known as Streaming Single
Instruction Multiple Data (SIMD) Extensions 2 (SSE2). These instructions extend the
Streaming SIMD Extensions (SSE) used in Pentium III processors and cover both floating-point
and integer SIMD operations. They enable software developers to deliver higher performance
in applications ranging from 3-D engineering to speech recognition.
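
To illustrate what "single instruction, multiple data" means, the following Python sketch
models one packed integer addition. It is a conceptual model only, not SSE2 code. The
128-bit register width and 32-bit lane size are background assumptions not stated in this
brief, and `packed_add` is a hypothetical helper name.

```python
# Conceptual model of a packed (SIMD) integer add: one operation, several lanes.
REGISTER_BITS = 128   # assumed register width (background knowledge, not from the brief)
LANE_BITS = 32        # assumed lane size for this example


def packed_add(a, b, lane_bits=LANE_BITS):
    """Add two equal-length lists lane by lane, wrapping each lane on overflow."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


lanes = REGISTER_BITS // LANE_BITS   # four 32-bit lanes fit in one 128-bit register

a = [1, 2, 3, 0xFFFFFFFF]
b = [10, 20, 30, 1]
result = packed_add(a, b)

print(lanes)    # 4
print(result)   # [11, 22, 33, 0]  (last lane wraps around independently)
```

A scalar loop would need four separate add instructions for the same work. The benefit of
SIMD is that hardware performs all lanes with one instruction.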
Out-of-order Execution
The Xeon MP processor extends the out-of-order execution model of P6 processors by
providing:
• an extended pipeline with twice as many stages as the current P6 family of
processors (20 versus 10 stages).
• increased depth of speculation through a much deeper buffer. This allows more code
loops to be active in the processor.
• an arithmetic logic unit (ALU) that is double-pumped, that is, clocked at twice the
speed of the processor.
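
The idea of out-of-order execution can be sketched with a toy scheduler in Python. This is
a simplified model, not a description of the Xeon MP core: it assumes unlimited execution
units, tracks only true data dependencies, and requires that each register is written once
and that producers appear before consumers. All names and latencies are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str           # register written by this instruction
    srcs: tuple = ()    # registers read by this instruction
    latency: int = 1    # cycles until the result is available (must be >= 1)


def out_of_order_issue(program):
    """Return {instruction name: cycle in which it issues}."""
    produced = {ins.dest for ins in program}   # registers written inside the program
    ready_at = {}                              # register -> cycle its value is available
    issue_cycle = {}
    pending = list(program)
    cycle = 0
    while pending:
        still_waiting = []
        for ins in pending:
            # A source is ready if it is an initial value (never written here)
            # or if its producer has already finished.
            if all(src not in produced or ready_at.get(src, float("inf")) <= cycle
                   for src in ins.srcs):
                issue_cycle[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
            else:
                still_waiting.append(ins)
        pending = still_waiting
        cycle += 1
    return issue_cycle


program = [
    Instr("load_a", "r1", (), latency=3),   # slow load
    Instr("add_b", "r2", ("r1",)),          # must wait for the load
    Instr("mul_c", "r3", ("r4", "r5")),     # independent of the load
    Instr("sub_d", "r6", ("r3",)),          # depends only on mul_c
]

schedule = out_of_order_issue(program)
print(schedule)   # {'load_a': 0, 'mul_c': 0, 'sub_d': 1, 'add_b': 3}
```

Here `mul_c` and `sub_d` issue ahead of `add_b` even though they come later in program
order, because their operands are ready while `add_b` is still waiting on the load. A
deeper buffer lets the processor look further ahead for such independent work.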
Branch Prediction
In a pipelined architecture such as that used in x86 processors, processor performance is
directly related to how well the processor can predict branching. The Xeon MP processor
employs a new type of cache called a trace cache to improve branch prediction. This cache
stores micro-operations that are already decoded, which helps speed up execution. The
Xeon MP processor also enhances the branch prediction algorithms originally implemented in
the P6 family by effectively combining all currently available prediction schemes.
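
The brief does not detail the prediction schemes involved. As a generic illustration of
dynamic branch prediction, the Python sketch below implements a textbook two-bit saturating
counter predictor. It is not the Xeon MP algorithm, and the class name, branch address, and
trace are hypothetical.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter, one counter per branch address."""

    def __init__(self):
        self.counters = {}   # branch address -> counter in the range 0..3

    def predict(self, pc):
        # Counters 2 and 3 predict "taken"; new branches start at 1 (weakly not taken).
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        counter = self.counters.get(pc, 1)
        self.counters[pc] = min(counter + 1, 3) if taken else max(counter - 1, 0)


def count_correct(predictor, trace):
    """Run the predictor over (address, taken) pairs and count correct predictions."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A loop branch that is taken nine times and then falls through, repeated ten times.
trace = [(0x400, i % 10 != 9) for i in range(100)]
correct = count_correct(TwoBitPredictor(), trace)

print(f"{correct} of {len(trace)} predictions correct")   # 89 of 100 predictions correct
```

Each misprediction forces the pipeline to discard speculative work, so the cost of a wrong
guess grows with pipeline depth. This is why accurate prediction matters more for a
20-stage pipeline than for a 10-stage one.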