HP DL360 The Intel processor roadmap for industry-standard servers technology - Page 5

NetBurst® microarchitecture

The NetBurst-based processor for low-cost, single-processor servers is the Pentium® 4 processor. The original 180nm version of the Pentium 4 was known as Willamette, and the subsequent 130nm version was known as Northwood. NetBurst-based processors intended for multi-processor environments are referred to as Intel® Xeon™ (for two-processor systems) and Xeon MP (for systems using more than two processors).

The NetBurst microarchitecture included the following enhancements:

• Higher bandwidth for instruction fetches
• 256-KB Level 2 (L2) cache with 64-byte cache lines
• NetBurst system bus: a 64-bit, 100-MHz bus capable of providing 3.2 GB/s of bandwidth by double pumping the address and quad pumping the data. The 100-MHz quad pumped data bus is also referred to as a 400-MHz data bus. To provide higher levels of performance, Intel added support for a 533-MHz front side bus to the Pentium 4 and Xeon processors and later added support for 800 MHz to the Pentium 4.
• Integer arithmetic logic unit (ALU) running at twice the clock speed (double data rate)
• Modified floating point unit (FPU)

• Streaming SIMD Extensions 2 (SSE2): 144 new SIMD instructions to manage floating point, application, and multimedia performance.

• Advanced dynamic execution
• Deeper instruction window for out-of-order, speculative execution and improved branch prediction over the P6 dynamic execution core
• Execution trace cache (stores pre-decoded micro-operations)
• Enhanced floating point/multimedia engine
• Hyper-threading (HT) in Xeon processors and Pentium 4 processors (described below)
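
The system bus figures in the list above follow from simple arithmetic: bus width in bytes multiplied by the effective transfer rate. The sketch below reproduces the 3.2 GB/s figure for the 64-bit, 100-MHz quad pumped data bus. The function names are illustrative only, and 1 GB is assumed to mean 10^9 bytes.

```python
def effective_transfer_rate_mhz(base_clock_mhz, transfers_per_clock):
    """Effective data rate: a quad pumped bus moves 4 transfers per base clock."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_gb_per_s(bus_width_bits, effective_rate_mhz):
    """Peak data bandwidth in GB/s, assuming 1 GB = 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mhz * 1e6 / 1e9


# 100-MHz base clock, quad pumped data: the "400-MHz data bus".
rate = effective_transfer_rate_mhz(100, 4)
print(rate, "MHz effective")
print(peak_bandwidth_gb_per_s(64, rate), "GB/s peak")  # 3.2 GB/s, as stated above

# The same formula applied to the nominal 533-MHz and 800-MHz bus ratings.
for nominal_mhz in (533, 800):
    print(nominal_mhz, "MHz:", peak_bandwidth_gb_per_s(64, nominal_mhz), "GB/s peak")
```

These are peak theoretical values; sustained bandwidth in a real system would be lower.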

Hyper-pipeline and clock frequency

One performance-enhancing feature of the NetBurst microarchitecture was its hyper-pipeline, a 20-stage branch-prediction pipeline. Previous 32-bit processors had a 10-stage pipeline. The hyper-pipeline can contain more than 100 instructions at once and can handle up to 48 loads and stores concurrently. The pipeline in a processor is analogous to a factory assembly line where production is split into multiple stages to keep all factory workers busy and to complete multiple stages in parallel. Likewise, the work to execute program code is split into stages to keep the processor busy and allow it to execute more code during each clock cycle. In this case, the processor must complete the operation for each stage within a single clock cycle. The processor can achieve this by splitting the task into smaller tasks and using more (shorter) stages to execute the instructions (Figure 3). Thus, each stage can be completed faster, allowing the processor to have a higher clock frequency. However, it is important to understand that splitting each stage into smaller stages to achieve a higher clock frequency does not mean that more work is being done in the pipeline per clock cycle.
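
The trade-off described above can be sketched with a toy model. It assumes a fixed amount of logic delay per instruction divided evenly across the stages, plus a fixed per-stage latch overhead. Both delay values are hypothetical and chosen only for illustration; they are not measurements of any Intel processor.

```python
# Hypothetical values for illustration only.
TOTAL_LOGIC_DELAY_NS = 10.0  # assumed total work needed to execute one instruction
LATCH_OVERHEAD_NS = 0.1      # assumed fixed overhead added by each pipeline stage


def cycle_time_ns(stages):
    """Each stage must finish within one clock cycle, so the cycle time is
    the per-stage share of the logic delay plus the latch overhead."""
    return TOTAL_LOGIC_DELAY_NS / stages + LATCH_OVERHEAD_NS


def clock_frequency_ghz(stages):
    """Shorter stages allow a shorter cycle and therefore a higher frequency."""
    return 1.0 / cycle_time_ns(stages)


def work_per_cycle_ns(stages):
    """Logic delay handled by one stage in one cycle: it shrinks as stages are added."""
    return TOTAL_LOGIC_DELAY_NS / stages


def instruction_latency_ns(stages):
    """Time for one instruction to pass through every stage of the pipeline."""
    return stages * cycle_time_ns(stages)


for stages in (10, 20):
    print(
        f"{stages} stages: "
        f"{clock_frequency_ghz(stages):.2f} GHz, "
        f"{work_per_cycle_ns(stages):.2f} ns of logic per stage per cycle, "
        f"{instruction_latency_ns(stages):.1f} ns per instruction end to end"
    )
```

Under these assumptions, doubling the stage count from 10 to 20 raises the achievable clock frequency, while the work done by each stage per cycle is halved and the end-to-end latency of a single instruction does not improve.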


