Compaq W8000 Hyper-Threading Technology, New Feature of Intel Xeon Processor - Page 6



Hyper-Threading Technology, New Feature of Intel Xeon Processor White Paper
167T-0202A-WWEN
Figure 3
Figure 4 shows the micro-architecture block diagram of the Intel Pentium 4 processor. The OS
initializes a thread in the same way as in an MP system, and it makes no distinction between a
logical processor and a true physical processor. The two active threads are interleaved at the
instruction fetch and issue stages of the pipeline. (These are the first six blocks on the left in
Figure 4.) The trace cache (decoded instruction cache) has an additional tag bit for each
instruction to represent the thread number (0 or 1). If one thread takes an exception, only the
instructions from that thread are flushed. The rapid execution engine (the Scheduler,
Floating-Point (FP-RF), Integer-RF, and arithmetic logic unit [ALU] stages) is shared
competitively between the two threads with equal priority. This can be an issue when a
low-priority task is scheduled simultaneously with a high-priority, time-critical task, because
the hardware does not favor the time-critical thread. The rapid execution engine is the key to
the performance gain: higher utilization of these resources improves the IPC. Instructions from
both threads are dispatched simultaneously for execution by the processor core, which executes
the two threads concurrently, using out-of-order instruction scheduling to keep as many of its
execution units busy as possible during each clock cycle. The reorder/retire stage again
alternates between the two logical processors to commit state in program order.
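
The front-end and retire behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not a description of the actual hardware: the names `Uop`, `interleave_fetch`, `flush_thread`, and `retire` are hypothetical, strict one-instruction alternation is a simplification, and the competitively shared out-of-order execution engine is not modeled at all. The sketch shows three ideas from the text: each instruction carries a one-bit thread tag, an exception flushes only the instructions tagged with the faulting thread, and retirement alternates between the two logical processors while committing each thread in program order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    thread: int  # hypothetical stand-in for the 1-bit thread tag (0 or 1)
    seq: int     # program-order position within its own thread
    name: str


def interleave_fetch(streams):
    """Alternate between two instruction streams, one instruction per turn.

    If one stream runs dry, the other thread keeps the fetch slot.
    """
    streams = [list(s) for s in streams]
    queues = [deque(s) for s in streams]
    fetched = []
    turn = 0
    while any(queues):
        if not queues[turn]:
            turn ^= 1  # the selected thread has nothing left; use the other
        name = queues[turn].popleft()
        seq = len(streams[turn]) - len(queues[turn]) - 1
        fetched.append(Uop(turn, seq, name))
        turn ^= 1
    return fetched


def flush_thread(in_flight, thread):
    """Drop only the instructions tagged with the faulting thread."""
    return [u for u in in_flight if u.thread != thread]


def retire(completed):
    """Alternate between logical processors; commit each thread in program order."""
    per_thread = [
        deque(sorted((u for u in completed if u.thread == t), key=lambda u: u.seq))
        for t in (0, 1)
    ]
    committed = []
    turn = 0
    while any(per_thread):
        if not per_thread[turn]:
            turn ^= 1
        committed.append(per_thread[turn].popleft())
        turn ^= 1
    return committed


if __name__ == "__main__":
    fetched = interleave_fetch([["a0", "a1", "a2"], ["b0", "b1"]])
    print([u.name for u in fetched])
    # Thread 1 takes an exception: thread 0's instructions survive the flush.
    print([u.name for u in flush_thread(fetched, 1)])
    # Instructions may complete out of order, but retire restores program order.
    print([u.name for u in retire(list(reversed(fetched)))])
```

In this model the thread tag is what makes the selective flush possible: without it, an exception in one thread would force every in-flight instruction to be discarded, including those of the unaffected thread.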