HP ProLiant DL380p DDR3 memory technology - Page 12



Table 3. Theoretical maximum versus measured memory throughput for 2P ProLiant servers

Server                        Theoretical maximum memory bandwidth       Measured maximum memory throughput
Intel-based 2P ProLiant G5    25.6 GB/s (RDIMMs), 38.4 GB/s (FBDIMMs)    12 GB/s
Intel-based 2P ProLiant G6    64 GB/s                                    40 GB/s

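
The gap between the theoretical and measured figures in Table 3 can be expressed as a simple ratio. The following is a minimal sketch, not part of the original brief: the function name `efficiency` and the dictionary layout are illustrative, and pairing the G5 measurement of 12 GB/s with the FBDIMM figure of 38.4 GB/s is an assumption, since the table does not say which DIMM type was measured.

```python
# Figures copied from Table 3, in GB/s.
# Assumption: the G5 measurement is compared against the FBDIMM
# theoretical maximum; the table does not state which DIMM type was measured.
TABLE_3 = {
    "Intel-based 2P ProLiant G5": {"theoretical": 38.4, "measured": 12.0},
    "Intel-based 2P ProLiant G6": {"theoretical": 64.0, "measured": 40.0},
}


def efficiency(theoretical_gbps, measured_gbps):
    """Return measured throughput as a fraction of the theoretical maximum."""
    return measured_gbps / theoretical_gbps


for server, row in TABLE_3.items():
    pct = 100 * efficiency(row["theoretical"], row["measured"])
    print(f"{server}: {pct:.1f}% of theoretical maximum")
```

Under these assumptions the G5 reaches roughly a third of its theoretical maximum, while the G6 reaches a little under two thirds.
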
NUMA architecture also allows the 4P ProLiant G7 servers to have significantly increased memory
bandwidth (Table 4). Measured maximum memory throughput for these systems has not yet been fully
characterized.
Table 4. Theoretical maximum memory throughput for 4P ProLiant servers

Server                        Theoretical maximum memory bandwidth
Intel-based 4P ProLiant G5    38.4 GB/s (FBDIMMs)
Intel-based 4P ProLiant G7    136.4 GB/s
AMD-based 4P ProLiant G6      51.2 GB/s
AMD-based 4P ProLiant G7      169.6 GB/s

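
To put the 4P increase into numbers, the theoretical figures in Table 4 can be compared pairwise. This is an illustrative sketch only: the helper `scaling_factor` and the pairing of each G7 server with the older server from the same vendor are assumptions made for the example.

```python
# Theoretical maximum memory bandwidth from Table 4, in GB/s.
# Assumption: each G7 server is compared with the older 4P server
# from the same processor vendor listed in the table.
TABLE_4 = {
    "Intel": {"older": ("4P ProLiant G5", 38.4), "newer": ("4P ProLiant G7", 136.4)},
    "AMD": {"older": ("4P ProLiant G6", 51.2), "newer": ("4P ProLiant G7", 169.6)},
}


def scaling_factor(older_gbps, newer_gbps):
    """Return how many times larger the newer theoretical bandwidth is."""
    return newer_gbps / older_gbps


for vendor, pair in TABLE_4.items():
    (old_name, old_bw), (new_name, new_bw) = pair["older"], pair["newer"]
    factor = scaling_factor(old_bw, new_bw)
    print(f"{vendor}: {old_name} -> {new_name}: {factor:.2f}x")
```

Both comparisons come out at more than three times the older theoretical bandwidth.
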
DDR3 latency
Memory latency is the time it takes for the processor to receive data from the memory controller
after requesting it. It is an important measure of memory subsystem responsiveness. Retrieving
data from the memory subsystem consists of several steps, each of which consumes time and which
together make up overall latency:
• Time memory request spends in the processor I/O queue and being sent to the memory controller
• Time in the memory controller queue
• Issuing of the Row Address Select (RAS) and Column Address Select (CAS) commands on the
  memory address bus
• Retrieving data from the memory data bus
• Time through the memory controller and I/O bus back to the requesting processor Arithmetic Logic
  Unit (ALU).
The setting of the RAS and CAS lines determines which memory address is accessed. The electrical
properties of DRAM are such that setting each of them requires about 13.5 nanoseconds, a figure
that is roughly the same for both DDR2 and DDR3 memory. This means that 27 to 28 nanoseconds of
memory latency are relatively fixed and cannot be improved. Any improvements in memory latency
must come from elsewhere. DDR3 achieves improvements in latency through its faster data rate and by
using only UDIMMs and RDIMMs.
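
The arithmetic behind the fixed portion of latency can be sketched as follows. The 13.5 ns figure comes from the text above; the cycle counts used in the conversion (9 cycles at a DDR3-1333 data rate, 5 cycles at a DDR2-800 data rate) are commonly quoted timings chosen here as assumptions for illustration and do not appear in the original brief. The function names are likewise hypothetical.

```python
RAS_NS = 13.5  # approximate time to set RAS, from the text
CAS_NS = 13.5  # approximate time to set CAS, from the text


def fixed_latency_ns(ras_ns=RAS_NS, cas_ns=CAS_NS):
    """Return the relatively fixed RAS + CAS portion of memory latency."""
    return ras_ns + cas_ns


def cycles_to_ns(cycles, data_rate_mts):
    """Convert a timing in memory clock cycles to nanoseconds.

    DDR memory transfers data twice per clock, so the clock frequency
    in MHz is half the data rate in MT/s.
    """
    clock_mhz = data_rate_mts / 2
    return cycles * 1000.0 / clock_mhz


print(f"Fixed RAS + CAS latency: {fixed_latency_ns():.1f} ns")
# Assumed example timings, not taken from the brief:
print(f"DDR3-1333, 9 cycles: {cycles_to_ns(9, 1333):.2f} ns")
print(f"DDR2-800, 5 cycles: {cycles_to_ns(5, 800):.2f} ns")
```

With these assumed timings, the faster DDR3 part needs more clock cycles but ends up at nearly the same time in nanoseconds, which is consistent with the statement that this portion of latency is roughly the same for DDR2 and DDR3.
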
There are two different measurements of memory latency for a system: unloaded latency and loaded
latency. Measured when the system is idle, unloaded latency is the fastest possible time in which
data can be retrieved from the memory subsystem. Unloaded latency is determined by the timing and
electrical properties of the memory subsystem. Loaded latency is measured when the memory
subsystem is saturated with memory requests. With loaded latency, many additional factors come into
play, including the number of memory controllers in the memory subsystem, controller efficiency in