Table 7: Maximum memory capacities for HP 2P ProLiant Gen8 servers using different DIMM types

Number of DIMM Slots   DIMM Type   Maximum Capacity   Configuration
24                     UDIMM       128 GB             16 x 8GB 2R
                       RDIMM       384 GB             24 x 16GB 2R
                       LRDIMM      768 GB             24 x 32GB 4R
16                     UDIMM       128 GB             16 x 8GB 2R
                       RDIMM       256 GB             16 x 16GB 2R
                       LRDIMM      512 GB             16 x 32GB 4R
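
As a point of reference, each capacity in Table 7 is simply the number of populated DIMM slots multiplied by the per-DIMM capacity. The short C sketch below is not part of the original paper; it just reproduces that arithmetic for the 24-slot server column, using the slot counts and DIMM sizes from the table above.

    #include <stdio.h>

    /* Illustrative only: recompute the 24-slot-server rows of Table 7 as
     * populated slots x per-DIMM capacity. Values are copied from the table. */
    int main(void)
    {
        struct { const char *dimm; int slots_used; int gb_per_dimm; } rows[] = {
            { "UDIMM  2R  8 GB", 16,  8 },  /* UDIMM configuration populates 16 slots */
            { "RDIMM  2R 16 GB", 24, 16 },
            { "LRDIMM 4R 32 GB", 24, 32 },
        };

        for (int i = 0; i < 3; i++)
            printf("%-16s %2d x %2d GB = %4d GB\n", rows[i].dimm,
                   rows[i].slots_used, rows[i].gb_per_dimm,
                   rows[i].slots_used * rows[i].gb_per_dimm);
        return 0;
    }
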
Optimizing for performance
The two primary measurements of memory subsystem performance are throughput and latency. Latency is
a measure of the time it takes for the memory subsystem to begin to deliver data to the processor core once
the processor has requested it. Throughput measures the total amount of data that the memory subsystem can
transfer to the system processor(s) during a given period.
Factors influencing latency
Unloaded and loaded latencies are measures of the efficiency of the memory subsystem in a server.
Memory latency in servers is usually measured from the time of a read request in the core of a processor
until the data is supplied to that core. This is also called load-to-use. Unloaded latency measures the
latency when the system is idle and represents the lowest latency that you can achieve for memory requests
for a given processor/memory combination. Loaded latency is the latency when the memory subsystem is
saturated with memory requests. Loaded latency will always be greater than unloaded latency.
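
Unloaded (load-to-use) latency can be approximated in software with a pointer-chasing microbenchmark, in which every load depends on the result of the previous one. The C sketch below only illustrates the idea and is not the measurement methodology behind the figures in this paper; the 64 MiB buffer size, the iteration count, and the POSIX clock_gettime timer are assumptions chosen for the example.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N     (64u * 1024 * 1024 / sizeof(void *))  /* ~64 MiB, larger than the caches */
    #define STEPS 10000000L

    int main(void)
    {
        void **buf = malloc(N * sizeof(void *));
        size_t *idx = malloc(N * sizeof(size_t));
        if (!buf || !idx)
            return 1;

        /* Build a random cyclic permutation so the hardware prefetchers
           cannot guess the next address; most loads miss in cache. */
        for (size_t i = 0; i < N; i++)
            idx[i] = i;
        for (size_t i = N - 1; i > 0; i--) {            /* Fisher-Yates shuffle */
            size_t j = (size_t)rand() % (i + 1);
            size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
        }
        for (size_t i = 0; i < N; i++)
            buf[idx[i]] = &buf[idx[(i + 1) % N]];

        /* Chase the pointers: each load depends on the previous one, so the
           average time per step approximates the unloaded load-to-use latency. */
        void **p = &buf[idx[0]];
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < STEPS; i++)
            p = (void **)*p;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        printf("approx. %.1f ns per dependent load (%p)\n", ns / STEPS, (void *)p);

        free(idx);
        free(buf);
        return 0;
    }
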
There are a number of factors that influence memory latency in a system.
• DIMM Speed. Faster DIMM speeds deliver lower latency, particularly loaded latency. Under loaded conditions, the primary contributor to latency is the time memory requests spend in a queue waiting to be executed. The faster the DIMM speed, the more quickly the memory controller can process the queued commands. For example, memory running at 1600 MT/s has about 20% lower loaded latency than memory running at 1333 MT/s.
• Ranks. For the same memory speed and DIMM type, more ranks will result in lower loaded latency. More ranks give the memory controller a greater capability to parallelize the processing of memory requests. This results in shorter request queues and therefore lower latency.
• CAS latency. CAS (Column Address Strobe) latency represents the basic DRAM response time. It is specified as the number of clock cycles (e.g. 6, 7, 11) that the controller must wait after asserting the Column Address signal before data is available on the bus. CAS latency plays a larger role in determining the unloaded latency than loaded latency.
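
Converting a CAS latency from cycles to time only requires the DDR3 I/O clock, which is half the MT/s rating because data transfers on both clock edges. The small C example below works through two assumed speed/CL pairings (DDR3-1600 CL11 and DDR3-1333 CL9); the pairings are illustrative and not taken from this paper.

    #include <stdio.h>

    /* Convert a CAS latency given in clock cycles to nanoseconds.
     * DDR3 transfers data on both clock edges, so the I/O clock in MHz
     * is half the MT/s rating. */
    static double cas_latency_ns(double mega_transfers, int cl_cycles)
    {
        double clock_mhz = mega_transfers / 2.0;   /* e.g. 1600 MT/s -> 800 MHz */
        double cycle_ns  = 1000.0 / clock_mhz;     /* one clock period in ns    */
        return cl_cycles * cycle_ns;
    }

    int main(void)
    {
        /* Example speed/CL pairings chosen for illustration. */
        printf("DDR3-1600, CL11: %.2f ns\n", cas_latency_ns(1600, 11)); /* 13.75 ns  */
        printf("DDR3-1333, CL9 : %.2f ns\n", cas_latency_ns(1333, 9));  /* ~13.50 ns */
        return 0;
    }
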
Figure 5 shows both unloaded and loaded latency numbers for various DDR3 DIMMs when used in a one
DIMM per channel configuration. As this chart illustrates, the idle latency is almost the same for every
DIMM type and capacity. This is because the primary component of idle latency is the memory system
overhead of performing a basic memory read or write operation, which is the same for all DIMM types.