Intel S2600GZ S2600GZ/GL - Page 39

Demand Scrubbing for ECC Memory, 2.4.5.4, Patrol Scrubbing for ECC Memory, 2.4.5.5, Rank

Page 39 highlights

Product Architecture Overview Intel® Server Board S2600GZ/GL TPS 3.2.4.5.3 Demand Scrubbing for ECC Memory Demand scrubbing is the ability to write corrected data back to the memory once a correctable error is detected on a read transaction. This allows for correction of data in memory at detect, and decrease the chances of a second error on the same address accumulating to cause a multi-bit error (MBE) condition. Demand Scrubbing is enabled/disabled (default is enabled) in the Memory Configuration screen in Setup. 3.2.4.5.4 Patrol Scrubbing for ECC Memory Patrol scrubs are intended to ensure that data with a correctable error does not remain in DRAM long enough to stand a significant chance of further corruption to an uncorrectable stage. 3.2.4.5.5 Rank Sparing Mode Rank Sparing Mode enhances the system's RAS capability by "swapping out" failing ranks of DIMMs. Rank Sparing is strictly channel and rank oriented. Each memory channel is a Sparing Domain. For Rank Sparing to be available as a RAS option, there must be 2 or more single rank or dual rank DIMMs, or at least one quad rank DIMM installed on each memory channel. Rank Sparing Mode is enabled/disabled in the Memory RAS and Performance Configuration screen in the Bios Setup Utility When Sparing Mode is operational, for each channel, the largest size memory rank is reserved as a "spare" and is not used during normal operations. The impact on Effective Memory Size is to subtract the sum of the reserved ranks from the total amount of installed memory. Hardware registers count the number of Correctable ECC Errors for each rank of memory on each channel during operations and compare the count against a Correctable Error Threshold. When the correctable error count for a given rank hits the threshold value, that rank is deemed to be "failing", and it triggers a Sparing Fail Over (SFO) event for the channel in which that rank resides. The data in the failing rank is copied to the Spare Rank for that channel, and the Spare Rank replaces the failing rank in the IMC's address translation registers. An SFO Event is logged to the BMC SEL. The failing rank is then disabled, and any further Correctable Errors on that now non-redundant channel will be disregarded. The correctable error that triggered the SFO may be logged to the BMC SEL, if it was the first one to occur in the system. That first correctable error event will be the only one logged for the system. However, since each channel is a Sparing Domain, the correctable error counting continues for other channels which are still in a redundant state. There can be as many SFO Events as there are memory channels with DIMMs installed. 3.2.4.5.6 Mirrored Channel Mode Channel Mirroring Mode gives the best memory RAS capability by maintaining two copies of the data in main memory. If there is an Uncorrectable ECC Error, the channel with the error is disabled and the system continues with the "good" channel, but in a non-redundant configuration. 26 Revision 1.1 Intel order number G24881-004

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
  • 100
  • 101
  • 102
  • 103
  • 104
  • 105
  • 106
  • 107
  • 108
  • 109
  • 110
  • 111
  • 112
  • 113
  • 114
  • 115
  • 116
  • 117
  • 118
  • 119
  • 120
  • 121
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • 131
  • 132
  • 133
  • 134
  • 135
  • 136
  • 137
  • 138
  • 139
  • 140
  • 141
  • 142
  • 143
  • 144
  • 145
  • 146
  • 147
  • 148
  • 149
  • 150
  • 151
  • 152
  • 153
  • 154
  • 155
  • 156
  • 157
  • 158
  • 159
  • 160
  • 161
  • 162
  • 163
  • 164
  • 165
  • 166
  • 167
  • 168
  • 169
  • 170
  • 171
  • 172
  • 173
  • 174
  • 175
  • 176
  • 177
  • 178
  • 179
  • 180
  • 181
  • 182
  • 183
  • 184
  • 185
  • 186
  • 187
  • 188
  • 189
  • 190
  • 191
  • 192
  • 193
  • 194
  • 195
  • 196
  • 197
  • 198
  • 199
  • 200
  • 201
  • 202
  • 203
  • 204
  • 205
  • 206
  • 207
  • 208
  • 209
  • 210
  • 211
  • 212
  • 213
  • 214
  • 215
  • 216
  • 217
  • 218
  • 219
  • 220
  • 221
  • 222
  • 223
  • 224
  • 225
  • 226
  • 227
  • 228
  • 229
  • 230
  • 231
  • 232
  • 233
  • 234
  • 235
  • 236
  • 237
  • 238
  • 239
  • 240
  • 241
  • 242
  • 243
  • 244
  • 245
  • 246
  • 247
  • 248
  • 249
  • 250
  • 251
  • 252
  • 253
  • 254
  • 255
  • 256
  • 257
  • 258
  • 259
  • 260
  • 261
  • 262
  • 263
  • 264

Product Architecture Overview
Intel® Server Board S2600GZ/GL TPS
Revision 1.1
Intel order number G24881-004
26
3.2.4.5.3
Demand Scrubbing for ECC Memory
Demand scrubbing is the ability to write corrected data back to the memory once a correctable
error is detected on a read transaction. This allows for correction of data in memory at detect,
and decrease the chances of a second error on the same address accumulating to cause a
multi-bit error (MBE) condition.
Demand Scrubbing is enabled/disabled (default is enabled) in the Memory Configuration screen
in Setup.
3.2.4.5.4
Patrol Scrubbing for ECC Memory
Patrol scrubs are intended to ensure that data with a correctable error does not remain in DRAM
long enough to stand a significant chance of further corruption to an uncorrectable stage.
3.2.4.5.5
Rank Sparing Mode
Rank Sparing Mode enhances the system’s RAS capability by “swapping out” failing ranks of
DIMMs. Rank Sparing is strictly channel and rank oriented. Each memory channel is a Sparing
Domain.
For Rank Sparing to be available as a RAS option, there must be 2 or more single rank or dual
rank DIMMs, or at least one quad rank DIMM installed on each memory channel.
Rank Sparing Mode is enabled/disabled in the Memory RAS and Performance Configuration
screen in the <F2> Bios Setup Utility
When Sparing Mode is operational, for each channel, the largest size memory rank is reserved
as a “spare” and is not used during normal operations. The impact on Effective Memory S
ize is
to subtract the sum of the reserved ranks from the total amount of installed memory.
Hardware registers count the number of Correctable ECC Errors for each rank of memory on
each channel during operations and compare the count against a Correctable Error Threshold.
When the correctable error count for a given rank hits the threshold value, that rank is deemed
to be “failing”, and it triggers a Sparing Fail Over (SFO) event for the channel in which that rank
resides. The data in the failing rank is copied to the Spare Rank for that channel, and the Spare
Rank replaces the failing rank in the IMC’s address translation registers.
An SFO Event is logged to the BMC SEL. The failing rank is then disabled, and any further
Correctable Errors on that now non-redundant channel will be disregarded.
The correctable error that triggered the SFO may be logged to the BMC SEL, if it was the first
one to occur in the system. That first correctable error event will be the only one logged for the
system. However, since each channel is a Sparing Domain, the correctable error counting
continues for other channels which are still in a redundant state. There can be as many SFO
Events as there are memory channels with DIMMs installed.
3.2.4.5.6
Mirrored Channel Mode
Channel Mirroring Mode gives the best memory RAS capability by maintaining two copies of the
data in main memory. If there is an Uncorrectable ECC Error, the channel with the error is
disabled and the system continues with the “good” channel, but in a non
-redundant
configuration.