HP DL360 Memory technology evolution: an overview of system memory technologies - Page 13

Memory protection technologies


While Advanced ECC provides memory correction, it does not provide failover capability. Replacing
a failed DIMM requires powering down the system. Taking a server offline for unscheduled
maintenance almost always raises operating costs, both in replacement parts and in lost
revenue while the server is unavailable. Three memory protection technologies offer
failover/backup capability (also known as Memory Failure Recovery) to help meet server availability
goals:
  • Online spare memory mode
  • Mirrored memory mode
  • Lockstep memory mode
Online spare memory mode
In Online Spare mode, a populated channel (or branch) is designated as the spare, which makes it
unavailable for normal use as system memory. If a DIMM in the system channel exceeds a threshold
rate of correctable memory errors, the affected channel is taken offline and the data is copied to the
spare channel. This capability maintains server availability and memory reliability without service
intervention or server interruption. The DIMM that exceeded the error threshold can be replaced at the
administrator’s convenience during a scheduled shutdown. Online Spare memory reduces the chance
of an uncorrectable error bringing down the system; however, it does not fully protect the system
against uncorrectable memory errors.
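The failover sequence described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not HP firmware behavior: the names `Channel` and `OnlineSpareController` are hypothetical, the threshold is modeled as a simple error count rather than a rate, and the value 3 is arbitrary (real thresholds are set by the platform).

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """A memory channel modeled as a plain address-to-value mapping (hypothetical)."""
    name: str
    data: dict = field(default_factory=dict)
    correctable_errors: int = 0
    online: bool = True


class OnlineSpareController:
    """Toy model: fail over to the spare once correctable errors exceed a threshold."""

    def __init__(self, active: Channel, spare: Channel, threshold: int):
        self.active = active
        self.spare = spare
        self.threshold = threshold  # assumed count; real systems use a firmware-defined rate
        self.failed_over = False

    def report_correctable_error(self) -> None:
        """Record one corrected error on the active channel and fail over if needed."""
        if self.failed_over:
            return  # the single spare has already been consumed
        self.active.correctable_errors += 1
        if self.active.correctable_errors > self.threshold:
            self._fail_over()

    def _fail_over(self) -> None:
        # Copy the contents to the spare, then take the degraded channel offline.
        self.spare.data = dict(self.active.data)
        self.active.online = False
        self.active, self.spare = self.spare, self.active
        self.failed_over = True

    def read(self, address: int) -> int:
        """Reads are always served by whichever channel is currently active."""
        return self.active.data[address]


degraded = Channel("channel-0", data={0x10: 42})
standby = Channel("channel-2")
controller = OnlineSpareController(degraded, standby, threshold=3)
for _ in range(4):  # the fourth error exceeds the threshold of 3
    controller.report_correctable_error()
print(controller.active.name, controller.read(0x10))
```

In this model the data stays readable throughout, which mirrors the point of the text: the degraded DIMM is taken out of service without interrupting the server and can be replaced later.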
NOTE
Online Spare memory mode can run on some systems with only
one memory channel populated. However, dual-rank DIMMs
(discussed later in this document) are required for a single-channel
memory configuration. For more information, refer to the
associated server user guide.
In a system with three channels per memory controller, two channels operate normally and the third
channel is the spare. Online Spare mode does not require operating system support or special
software beyond the System BIOS. However, to support messaging and logging at the console along
with messages in HP Systems Insight Manager, the operating system must have system management
and agent support for Advanced Memory Protection. Compared with Advanced ECC alone, Online Spare
mode requires extra DIMMs for the spare memory channel and reduces the memory capacity available
to the system.
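The capacity cost of Online Spare mode is simple arithmetic. The helper below is a rough sketch that assumes every channel is populated with the same capacity and that exactly one channel per memory controller is reserved as the spare. The function name and the 16 GiB figure are illustrative, and the single-channel configuration mentioned in the NOTE is not modeled.

```python
def online_spare_capacity_gib(channels_per_controller: int,
                              gib_per_channel: int,
                              controllers: int = 1) -> int:
    """Usable capacity when one populated channel per controller is the spare."""
    if channels_per_controller < 2:
        raise ValueError("this simple model needs at least two populated channels")
    usable_channels = channels_per_controller - 1  # one channel is held in reserve
    return usable_channels * gib_per_channel * controllers


installed_gib = 3 * 16                           # three channels of 16 GiB each
usable_gib = online_spare_capacity_gib(3, 16)    # two channels remain usable
print(installed_gib, usable_gib)
```

With three equally populated channels, one third of the installed memory is set aside for the spare under these assumptions.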
Mirrored memory mode
Mirrored Memory mode is a fault-tolerant memory option that provides a higher level of availability
than Online Spare mode. Mirrored Memory mode provides full protection against single-bit and
multibit errors.
With Mirrored Memory mode enabled, identical data is written to two channels simultaneously. If a
memory read from one channel returns incorrect data due to an uncorrectable memory error, the
system automatically retrieves the data from the other channel. Mirroring is not lost due to a transient
or soft error in one channel, and operation continues unless errors occur simultaneously at exactly
the same location on a DIMM and its mirrored DIMM, which is highly unlikely. Mirrored Memory mode
reduces the amount of memory available to the operating system by 50 percent because only one of the
two populated channels provides data.
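The write-to-both, read-with-fallback behavior described above can be sketched as follows. This is an illustrative model only: `MirrorChannel`, `MirroredMemory`, and `UncorrectableError` are hypothetical names, and an uncorrectable error is simulated by marking an address as bad rather than by any real ECC check.

```python
class UncorrectableError(Exception):
    """Raised by a channel when a read hits a simulated uncorrectable error."""


class MirrorChannel:
    def __init__(self, name):
        self.name = name
        self.cells = {}
        self.bad_addresses = set()  # addresses that simulate uncorrectable errors

    def write(self, address, value):
        self.cells[address] = value

    def read(self, address):
        if address in self.bad_addresses:
            raise UncorrectableError(f"{self.name}: address {address:#x}")
        return self.cells[address]


class MirroredMemory:
    """Toy model of mirroring: two channels hold identical data."""

    def __init__(self, primary, mirror):
        self.primary = primary
        self.mirror = mirror

    def write(self, address, value):
        # Identical data is written to both channels.
        self.primary.write(address, value)
        self.mirror.write(address, value)

    def read(self, address):
        try:
            return self.primary.read(address)
        except UncorrectableError:
            # Fall back to the mirrored copy. If the mirror fails at the
            # same address as well, the error propagates to the caller.
            return self.mirror.read(address)


memory = MirroredMemory(MirrorChannel("channel-0"), MirrorChannel("channel-1"))
memory.write(0x20, 7)
memory.primary.bad_addresses.add(0x20)  # simulate an uncorrectable error
print(memory.read(0x20))                # still served, from the mirror
```

Only one copy of the data is visible to the caller, which is why the usable capacity in this model is half of what the two channels hold together.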