IBM 71413SU Technical Reference - Page 25


IBM System x3950 M2 and x3850 M2 Technical Introduction
25
• Chipkill™ memory
Chipkill is integrated into the XA-64e chipset, so it does not require special Chipkill DIMMs
and is transparent to the operating system. When combining Chipkill with Memory
ProteXion and Active Memory, the x3850 M2 and x3950 M2 provide very high reliability in
the memory subsystem.
When a memory chip failure occurs, Memory ProteXion transparently handles the
rerouting of data around the failed component as previously described. However, if a
further failure occurs, the Chipkill component in the memory controller reroutes data. The
memory controller provides memory protection similar in concept to disk array striping with
parity, writing the memory bits across multiple memory chips on the DIMM. The controller
is able to reconstruct the missing bit from the failed chip and continue working as usual.
One of these additional failures can be handled for each memory port, for a total of eight
Chipkill recoveries.
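
The comparison to disk array striping with parity can be illustrated with a short sketch. This is not the XA-64e chipset's actual algorithm, which the text does not describe. It shows only the general parity idea: if data is spread across several chips and a parity value is stored alongside it, the contents of one failed chip can be rebuilt from the survivors. The function names and sample bytes are hypothetical.

```python
from functools import reduce

def parity(chunks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def reconstruct(chunks, parity_chunk, failed_index):
    """Rebuild the chunk at failed_index from the surviving chunks and parity."""
    survivors = [c for i, c in enumerate(chunks) if i != failed_index]
    return parity(survivors + [parity_chunk])

# Hypothetical data held by three memory chips (illustrative values only).
chips = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]
p = parity(chips)

# Pretend chip 1 has failed and rebuild its contents from the others.
recovered = reconstruct(chips, p, failed_index=1)
```

Because XOR is its own inverse, combining the parity with every surviving chunk leaves exactly the missing chunk, which is why a single failed chip can be tolerated in this simplified model.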
• Hot-add and hot-swap memory
The x3850 M2 and x3950 M2 support replacing failed DIMMs while the server is
running. This hot-swap support works in conjunction with memory mirroring. The server
also supports adding memory while it is running (hot-add). Adding memory
requires operating system support.
In addition, to maintain the highest levels of system availability, if a memory error is detected
during POST or memory configuration, the server can automatically disable the failing
memory bank and continue operating with reduced memory capacity. You can manually
re-enable the memory bank after the problem is corrected by using the Setup menu in the
BIOS.
Memory mirroring, Chipkill, and Memory ProteXion provide multiple levels of redundancy to
the memory subsystem. Combining Chipkill with Memory ProteXion allows up to two memory
chip failures for each memory port on the x3850 M2 and x3950 M2, for a total of eight failures
sustained.
1. The first failure detected by the Chipkill algorithm on each port does not generate a light
   path diagnostics error because Memory ProteXion recovers from the problem
   automatically.
2. Each memory port could then sustain a second chip failure without shutting down.
3. Provided that memory mirroring is enabled, the third chip failure on that port would send
   the alert and take the DIMM offline, but keep the system running out of the redundant
   memory bank.
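
The escalation in the three steps above can be sketched as a small lookup for one memory port. This is an illustrative model, not firmware behavior: the function name and return labels are hypothetical. The text does not say what happens on a third failure when mirroring is disabled, so that case is reported as unspecified rather than guessed.

```python
def respond_to_chip_failure(failure_number, mirroring_enabled):
    """Return a label for how one memory port handles its Nth chip failure."""
    if failure_number == 1:
        # Memory ProteXion recovers automatically; no light path error is raised.
        return "memory_protexion"
    if failure_number == 2:
        # The port sustains a second chip failure without shutting down.
        return "chipkill"
    if failure_number == 3 and mirroring_enabled:
        # Alert is sent, the DIMM goes offline, and the system keeps running
        # from the redundant memory bank.
        return "mirror_failover"
    # Not covered by the text (assumption: left undefined here).
    return "unspecified"

for n in (1, 2, 3):
    print(n, respond_to_chip_failure(n, mirroring_enabled=True))
```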
Memory mirroring
Memory mirroring is available on the x3850 M2 and x3950 M2 for increased fault tolerance.
Memory mirroring is operating system-independent, because all mirroring activities are
handled by the hardware.
The x3850 M2 and x3950 M2 have four separate memory power buses that each power one
of the four memory cards. Figure 13 on page 26 shows the location of the memory cards
(which are numbered 1 to 4, from left to right) and the DIMM sockets and LEDs on the
memory cards.
Note: Hot-add and hot-swap memory are mutually exclusive:
• Hot-add requires that memory mirroring be disabled.
• Hot-swap requires that memory mirroring be enabled.
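
The mutual exclusion between hot-add and hot-swap memory can be written as two simple checks. This is a sketch for reasoning about configurations only. The function and parameter names are hypothetical and do not correspond to any BIOS or management interface.

```python
def can_hot_swap(mirroring_enabled):
    """Hot-swap of a failed DIMM requires memory mirroring to be enabled."""
    return mirroring_enabled

def can_hot_add(mirroring_enabled, os_supports_hot_add):
    """Hot-add requires mirroring to be disabled and operating system support."""
    return (not mirroring_enabled) and os_supports_hot_add

# With mirroring enabled, only hot-swap is possible; with it disabled, only hot-add.
for mirroring in (True, False):
    print(mirroring, can_hot_swap(mirroring), can_hot_add(mirroring, True))
```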