Intel SE7525GP2 Product Specification - Page 38

Disabling DIMMs


Functional Architecture
Intel® Server Boards SE7320SP2 and SE7525GP2
Revision 4.0
3.5.4 Disabling DIMMs
The BIOS provides a mechanism to disable a DIMM that is detected to be faulty. A DIMM is
considered faulty if it has either multiple correctable errors or a single uncorrectable error.
Memory errors are logged during runtime and correctable memory errors (CMEs) are counted;
CMEs include both single-bit correctable errors and other correctable memory errors.
When a DIMM is marked as disabled, the disable takes effect only at the next reboot.
At the next system boot, the memory-sizing code reads the recorded state of the DIMMs and
skips sizing those marked as disabled. Because DIMMs are always used in 2-way interleaving,
the whole DIMM pair is disabled. Disabled DIMMs are indicated by an LED next to the DIMM
socket. If all DIMMs in a system have been disabled, the BIOS generates beep codes to indicate
that the system has no usable memory.
Disabled DIMMs/rows may be re-enabled through a BIOS Setup option (Advanced Menu |
Memory Configuration Sub-menu | Memory Retest | change setting to “enabled” | Exit Menu |
Save changes and Exit). A DIMM slot's disabled state is also cleared if the system boots with
no memory in that slot.
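The mark-now, disable-at-next-boot behavior described above can be modeled with a short Python sketch. It is illustrative only and is not the BIOS implementation: the `Dimm` class, the `CME_THRESHOLD` value, the slot numbering, and the pairing rule (slots 0-1, 2-3, ...) are all assumptions made for the example. It also assumes that a pair is usable only when both of its slots are populated.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real per-slot values come from Table 8.
CME_THRESHOLD = 10


@dataclass
class Dimm:
    slot: int
    populated: bool = True
    cme_count: int = 0
    marked_disabled: bool = False  # recorded state, survives a reboot
    active: bool = True            # whether memory sizing included it this boot


def pair_of(slot: int) -> int:
    """Partner slot under 2-way interleaving (assumed pairs: 0-1, 2-3, ...)."""
    return slot ^ 1


def log_error(dimms: dict, slot: int, correctable: bool) -> None:
    """Record a runtime memory error and mark the DIMM if it looks faulty."""
    dimm = dimms[slot]
    if correctable:
        dimm.cme_count += 1
        faulty = dimm.cme_count >= CME_THRESHOLD
    else:
        faulty = True  # a single uncorrectable error is enough
    if faulty:
        dimm.marked_disabled = True  # only takes effect at the next boot


def boot(dimms: dict) -> list:
    """Model memory sizing at boot; return the slots that were sized."""
    # A slot that boots empty loses its disabled mark.
    for dimm in dimms.values():
        if not dimm.populated:
            dimm.marked_disabled = False
    # Skip marked DIMMs, and their partners, because of 2-way interleaving.
    for dimm in dimms.values():
        partner = dimms[pair_of(dimm.slot)]
        pair_marked = dimm.marked_disabled or partner.marked_disabled
        dimm.active = dimm.populated and partner.populated and not pair_marked
    # An empty result corresponds to the "no usable memory" beep-code case.
    return [dimm.slot for dimm in dimms.values() if dimm.active]
```

For example, after an uncorrectable error on slot 2 in a four-slot system, slot 2 stays active until `boot()` runs, at which point both slots 2 and 3 are skipped.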
3.5.4.1 Mechanism for CME/SEC Counter
The expected error rates for DIMMs are stated per gigabyte of memory. This information comes
from three sources:
  • Intel experimental measurements (one and one-half errors per year)
  • Data from a memory component vendor (one error per month)
  • The results from a 10-year study by a major computer manufacturer (four errors per month)
Because the lowest error rate was gathered over a short time and the highest over a long
time, these two numbers are not considered valid and are discarded. The middle rate (one
error per month per gigabyte) is regarded as a more accurate, conservative estimate and is
used to program the threshold registers for single-bit correctable memory errors (SECs).
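As a small worked example of this selection, the three quoted rates can be put on a common scale (errors per gigabyte per year) and the middle one picked with a median. The dictionary keys and the function name `baseline_rate` are made up for illustration; only the three rates come from the text.

```python
from statistics import median

# Errors per gigabyte per year, converted from the three quoted sources.
RATES_PER_GB_PER_YEAR = {
    "intel_measurement": 1.5,    # one and one-half errors per year
    "component_vendor": 1 * 12,  # one error per month
    "ten_year_study": 4 * 12,    # four errors per month
}


def baseline_rate(rates: dict) -> float:
    """Discard the lowest and highest of three rates by taking the median."""
    return median(rates.values())


print(baseline_rate(RATES_PER_GB_PER_YEAR))  # 12, i.e. one error per month
```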
The threshold number must be adjusted in geographical areas with an increased occurrence of
alpha particles, which raises error rates. Geographical effects include high altitudes and
radioactive mineral deposits. Studies have shown that single-bit error rates at altitudes over
10,000 feet are 14 times higher than error rates at sea level. The highest of the three quoted
error rates included various geographical locations.
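To give a sense of the size of the altitude effect, the sketch below applies the quoted factor of 14 as a simple step at 10,000 feet. This is a deliberate simplification: the text gives only that single data point, so the step function, the function name `adjusted_rate`, and its parameters are assumptions and not a documented adjustment procedure.

```python
HIGH_ALTITUDE_FT = 10_000   # altitude quoted in the text
HIGH_ALTITUDE_FACTOR = 14   # quoted increase over the sea-level rate


def adjusted_rate(sea_level_rate: float, altitude_ft: float) -> float:
    """Scale an expected single-bit error rate for a high-altitude site.

    Assumes a step change above 10,000 feet; real rates vary continuously
    with altitude and with local radioactive mineral deposits.
    """
    if altitude_ft > HIGH_ALTITUDE_FT:
        return sea_level_rate * HIGH_ALTITUDE_FACTOR
    return sea_level_rate


# 12 errors per gigabyte per year is the one-error-per-month rate from the text.
print(adjusted_rate(12.0, altitude_ft=12_000))  # 168.0
```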
Table 8 shows the suggested SEC register threshold for various DIMM sizes. The values in the
table include a minimal error residue at one times the expected average error rate. Halving the
time or threshold would result in loss of error count resolution. One register is programmed for
each DIMM slot.
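Because the error rate is stated per gigabyte and one register is programmed per slot, the expected error count for a slot scales with the size of the DIMM installed in it. The sketch below shows only that scaling. It does not reproduce the values in Table 8, and the function names, slot labels, and counting window are hypothetical.

```python
def expected_errors(dimm_size_mb: int, window_months: float,
                    rate_per_gb_per_month: float = 1.0) -> float:
    """Expected single-bit correctable errors for one DIMM over a window.

    The default rate is the middle estimate from the text
    (one error per month per gigabyte).
    """
    return rate_per_gb_per_month * (dimm_size_mb / 1024) * window_months


def per_slot_expectations(slot_sizes_mb: dict, window_months: float) -> dict:
    """One expectation per populated slot; empty slots (size 0) are skipped."""
    return {
        slot: expected_errors(size_mb, window_months)
        for slot, size_mb in slot_sizes_mb.items()
        if size_mb
    }


# Hypothetical population: two 1 GB DIMMs and one empty slot, 12-month window.
print(per_slot_expectations({"1A": 1024, "1B": 1024, "2A": 0}, 12))
```

A larger DIMM has a proportionally higher expected count, which is why the suggested threshold depends on DIMM size.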