IBM 86884RX Installation Guide - Page 32



18 IBM eServer xSeries 450 Planning and Installation Guide
• Memory ProteXion
Memory ProteXion, also known as “redundant bit steering”, uses redundant bits
in a data packet to provide backup in the event of a DIMM failure.
Other industry-standard servers currently use 8 bits of each 72-bit data packet
for ECC functions and the remaining 64 bits for data. The x450 instead uses
an advanced ECC algorithm that is based not on bits but on memory symbols.
A symbol is a group of multiple bits; on the x450, each symbol is 4 bits wide.
With two-way interleaved memory, the algorithm needs only three symbols to
perform the same ECC functions, which leaves one symbol free (2 bits on each
DIMM). See Figure 1-10.
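The bit arithmetic behind that spare symbol can be checked with a short sketch. It assumes that a two-way interleaved word is simply two 72-bit packets side by side (one per DIMM); the function and constant names are illustrative and do not come from the guide.

```python
# Bit budget for one two-way interleaved memory word.
# Assumption: the word is two 72-bit packets side by side, one per DIMM.
BITS_PER_PACKET = 72        # 64 data bits + 8 redundant bits per DIMM
DATA_BITS_PER_PACKET = 64
SYMBOL_BITS = 4             # symbol width on the x450
INTERLEAVE = 2              # two-way interleaving pairs two DIMMs
ECC_SYMBOLS_NEEDED = 3      # symbols the ECC algorithm needs, per the text


def symbol_budget():
    """Return the symbol counts for one interleaved word (hypothetical helper)."""
    total_bits = BITS_PER_PACKET * INTERLEAVE                  # 144
    data_bits = DATA_BITS_PER_PACKET * INTERLEAVE              # 128
    redundant_bits = total_bits - data_bits                    # 16
    redundant_symbols = redundant_bits // SYMBOL_BITS          # 4
    spare_symbols = redundant_symbols - ECC_SYMBOLS_NEEDED     # 1
    spare_bits_per_dimm = spare_symbols * SYMBOL_BITS // INTERLEAVE  # 2
    return {
        "data_symbols": data_bits // SYMBOL_BITS,              # 32
        "ecc_symbols": ECC_SYMBOLS_NEEDED,
        "spare_symbols": spare_symbols,
        "spare_bits_per_dimm": spare_bits_per_dimm,
    }


if __name__ == "__main__":
    print(symbol_budget())
```

Under this assumption the 16 redundant bits form four symbols, so using three for ECC leaves exactly one spare symbol, split as 2 bits on each DIMM.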
Figure 1-10 Memory ProteXion
If memory scrubbing detects a chip failure on a DIMM, the memory controller
can reroute data around the failed chip through the spare symbol (similar to
a hot-spare drive in a RAID array). It does this automatically, without
issuing a Predictive Failure Analysis® (PFA) or light path diagnostics alert
to the administrator. After the second DIMM failure, PFA and light path
diagnostics alerts occur on that DIMM as normal.
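The hot-spare analogy can be illustrated with a toy model. This is only a sketch of the idea, not the memory controller's actual logic: the class, its methods, and the assumption that the ECC-corrected value is available at steering time are all hypothetical.

```python
class SteeredWord:
    """Toy model of one memory word with a single spare symbol.

    Logical symbol positions normally map to the same physical slot.
    After a failure, one position can be steered to the spare slot.
    """

    def __init__(self, n_data=32):
        self.spare_slot = n_data            # physical index of the spare symbol
        self.slots = [0] * (n_data + 1)     # physical storage, one 4-bit value each
        self.remap = {}                     # failed position -> spare slot
        self.spare_used = False

    def _physical(self, logical):
        return self.remap.get(logical, logical)

    def write(self, logical, value):
        self.slots[self._physical(logical)] = value & 0xF

    def read(self, logical):
        return self.slots[self._physical(logical)]

    def steer_around(self, failed_slot, recovered_value):
        """Steer a failed position to the spare.

        Returns True if the failure was absorbed silently, or False if the
        spare is already in use and an alert would be needed instead.
        recovered_value is assumed to be the ECC-corrected symbol value.
        """
        if self.spare_used:
            return False
        self.remap[failed_slot] = self.spare_slot
        self.spare_used = True
        self.slots[self.spare_slot] = recovered_value & 0xF
        return True


if __name__ == "__main__":
    word = SteeredWord()
    word.write(5, 0xA)
    print(word.steer_around(5, 0xA))   # first failure: absorbed by the spare
    print(word.read(5))                # data is still readable at position 5
    print(word.steer_around(7, 0x3))   # second failure: no spare left
```

In this model the first failure is handled without any alert, while the second call returns False, which corresponds to the point where PFA and light path diagnostics alerts would be raised.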
• Memory scrubbing
Memory scrubbing is an automatic daily test of all the system memory that
detects and reports memory errors that might be developing before they
cause a server outage.
Memory scrubbing and Memory ProteXion work together, and neither requires
memory mirroring (as described below) to be enabled in order to work
properly.
When a bit error is detected, memory scrubbing determines whether the error
is recoverable. If it is, Memory ProteXion is enabled and the data that was
stored in the damaged locations is rewritten to a new location. The error is
then reported so that preventive maintenance can be performed.
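The decision flow described in this paragraph can be sketched as a simple loop. The callbacks `check`, `is_recoverable`, and `relocate` are hypothetical stand-ins for hardware behavior that the guide does not specify; the sketch only mirrors the order of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class ScrubReport:
    """Errors found during one scrub pass (illustrative structure)."""
    recovered: list = field(default_factory=list)
    unrecoverable: list = field(default_factory=list)


def scrub_pass(locations, check, is_recoverable, relocate):
    """Walk every location and apply the recoverable/unrecoverable decision.

    check(addr)          -> True if the location reads back cleanly
    is_recoverable(addr) -> True if the detected error can be recovered
    relocate(addr)       -> rewrite the data from addr to a new location
    """
    report = ScrubReport()
    for addr in locations:
        if check(addr):
            continue                          # no error at this location
        if is_recoverable(addr):
            relocate(addr)                    # move data off the damaged location
            report.recovered.append(addr)     # reported for preventive maintenance
        else:
            report.unrecoverable.append(addr)
    return report


if __name__ == "__main__":
    # Simulated faults: address 2 is recoverable, address 4 is not.
    faults = {2: True, 4: False}
    moved = []
    result = scrub_pass(range(6), lambda a: a not in faults,
                        lambda a: faults[a], moved.append)
    print(result, moved)
```

Passing the hardware-dependent steps in as callbacks keeps the sketch self-contained and makes the control flow easy to exercise with simulated faults.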
As long as there are enough good locations to allow the proper operation of
the server, no further action is taken other than recording the error in the error
[Figure 1-10 Memory ProteXion: symbol layout with labels S0 to S63, C0 to C5, K1 and K2]