Intel SE7525GP2 Product Specification - Page 40


Intel® Server Boards SE7320SP2 and SE7525GP2: Functional Architecture (Revision 4.0, page 28)
3.5.5.1 DRAM ECC – Intel® x4 Single Device Data Correction (x4 SDDC)
The DRAM interface uses two different ECC algorithms. The first is a standard SEC/DED
(single-error correction, double-error detection) ECC computed across a 64-bit data quantity.
The second is a distributed, 144-bit S4EC-D4ED (single 4-bit symbol error correction, double
4-bit symbol error detection) mechanism, which provides x4 SDDC protection for DIMMs built
with x4 devices. Bits from x4 parts are interleaved so that each bit from a given part lands in a
different ECC word. DIMMs built with x8 devices can use the same algorithm but do not receive
x4 SDDC protection, because this method can correct at most four bits, while a failed x8 device
can corrupt up to eight. The algorithm still protects x8 parts better than a standard SEC/DED
implementation. With two memory channels in use, either ECC method can be used with equal
performance; in single-channel mode, only standard SEC/DED is supported.
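To make "standard SEC/DED across a 64-bit data quantity" concrete, the sketch below implements a textbook extended Hamming (72,64) code: 64 data bits protected by 7 Hamming check bits plus one overall parity bit. This is an illustrative assumption only. The specification does not give the MCH's actual check-bit matrix, and the 144-bit S4EC-D4ED code is a different, symbol-based scheme that is not shown here. The names `encode`, `decode`, and `syndrome` are hypothetical.

```python
from __future__ import annotations

# Textbook extended Hamming (72,64) SEC/DED sketch. This is NOT the MCH's
# actual check-bit layout, which this specification does not describe.

N = 72  # codeword bits: index 0 = overall parity, 1..71 = Hamming positions
DATA_POS = [p for p in range(1, N) if p & (p - 1)]  # non-powers of two: 64 slots
CHECK_POS = [1 << i for i in range(7)]              # positions 1, 2, 4, ..., 64
assert len(DATA_POS) == 64


def syndrome(cw: list[int]) -> int:
    """XOR of the positions (1..71) whose bit is set; 0 for a valid codeword."""
    s = 0
    for pos in range(1, N):
        if cw[pos]:
            s ^= pos
    return s


def encode(data: int) -> list[int]:
    """Spread 64 data bits over the codeword and fill in the 8 check bits."""
    if not 0 <= data < 1 << 64:
        raise ValueError("data must fit in 64 bits")
    cw = [0] * N
    for i, pos in enumerate(DATA_POS):
        cw[pos] = (data >> i) & 1
    s = syndrome(cw)                  # check bits are still 0 at this point
    for i, pos in enumerate(CHECK_POS):
        cw[pos] = (s >> i) & 1        # cancels s, so the syndrome becomes 0
    cw[0] = sum(cw[1:]) & 1           # overall parity: total weight is even
    return cw


def decode(cw: list[int]) -> tuple[int | None, str]:
    """Return (data, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    s = syndrome(cw)
    odd = sum(cw) & 1
    if s == 0 and not odd:
        status = "ok"
    elif odd and s < N:
        cw = list(cw)
        cw[s] ^= 1                    # single-bit error at position s (s == 0: parity bit)
        status = "corrected"
    else:
        # Nonzero syndrome with even parity (double-bit error), or a syndrome
        # that points outside the codeword (more errors than SEC/DED handles).
        return None, "uncorrectable"
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= cw[pos] << i
    return data, status
```

Flipping any single bit of the 72-bit word is corrected, while flipping two bits is reported as uncorrectable, which is exactly the SEC/DED guarantee.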
When memory mirroring is enabled, x4 SDDC ECC remains supported in single-channel mode after
the second channel has been disabled during a fail-down phase. Outside of DIMM-mirroring
fail-down, x4 SDDC ECC is not supported in single-channel operation because it would
significantly degrade performance in that configuration.
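The support rules in the two paragraphs above can be summarized as a small decision function. This is a hypothetical helper, not a BIOS or chipset interface. One assumption is labeled in the code: SEC/DED is treated as still available after a mirroring fail-down, since the text says single-channel mode supports it in general.

```python
from enum import Enum


class EccMode(Enum):
    SEC_DED = "SEC/DED"
    X4_SDDC = "x4 SDDC (S4EC-D4ED)"


def supported_ecc_modes(active_channels: int, mirroring_fail_down: bool = False) -> frozenset:
    """Hypothetical helper encoding the ECC support rules stated in this section."""
    if mirroring_fail_down and active_channels != 1:
        raise ValueError("a mirroring fail-down leaves exactly one active channel")
    if active_channels == 2:
        # Dual-channel: either method, with equal performance.
        return frozenset({EccMode.SEC_DED, EccMode.X4_SDDC})
    if active_channels == 1:
        if mirroring_fail_down:
            # Second channel disabled during a mirroring fail-down: x4 SDDC stays
            # available. Assumption: SEC/DED is also still allowed here.
            return frozenset({EccMode.SEC_DED, EccMode.X4_SDDC})
        # Ordinary single-channel operation: SEC/DED only.
        return frozenset({EccMode.SEC_DED})
    raise ValueError("expected 1 or 2 active channels")
```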
3.5.5.2 Integrated Memory Scrub Engine
The Intel® E7320 and Intel E7525 MCHs include an integrated engine that walks the populated
memory space, proactively seeking out soft errors in the memory subsystem. When it finds a
single-bit correctable error, this hardware detects, logs, and corrects the data, unless it detects
an incoming write to the same memory address. For any uncorrectable error it detects, the
scrub engine logs the failure. Both types of errors can be reported through several alternative
mechanisms, selected by configuration. The scrub hardware also performs “demand scrub”
writes when correctable errors are encountered during normal operation (that is, on demand
reads rather than scrub-initiated reads). This functionality provides incremental protection
against soft errors deteriorating over time from correctable to uncorrectable.
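The scrub behavior described above can be modeled in a few lines. This is a hypothetical software sketch, not the MCH's logic: each memory word just records how many of its bits are in error, a SEC/DED-style threshold is assumed (one bad bit is correctable, two or more are not), and the handling of an incoming write is an assumption labeled in the code. The same check runs for the patrol walk and for demand scrub on ordinary reads.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Word:
    bad_bits: int = 0  # hypothetical model: how many bits of this word are in error


@dataclass
class ErrorLog:
    correctable: list[int] = field(default_factory=list)
    uncorrectable: list[int] = field(default_factory=list)


def check_word(addr: int, word: Word, log: ErrorLog, pending_writes: frozenset[int]) -> None:
    """Shared by patrol scrub and demand scrub (SEC/DED-style threshold assumed)."""
    if word.bad_bits == 1:
        log.correctable.append(addr)
        # Assumption: only the corrective write-back is skipped when a write to
        # the same address is incoming; that write replaces the data anyway.
        if addr not in pending_writes:
            word.bad_bits = 0
    elif word.bad_bits >= 2:
        log.uncorrectable.append(addr)  # logged only; the data cannot be repaired


def scrub_pass(memory: list[Word], log: ErrorLog,
               pending_writes: frozenset[int] = frozenset()) -> None:
    """Patrol scrub: walk every populated word in address order."""
    for addr, word in enumerate(memory):
        check_word(addr, word, log, pending_writes)


def demand_read(memory: list[Word], addr: int, log: ErrorLog) -> None:
    """Demand scrub: the same check runs on an ordinary (non-scrub) read."""
    check_word(addr, memory[addr], log, frozenset())
```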
Using this method, an 8 GB system can be completely scrubbed in less than one day. These
scrub writes cause no noticeable degradation of memory bandwidth, although they add latency
to the rare read that is delayed behind a scrub write cycle.
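A quick calculation shows why the bandwidth cost is negligible. The sketch assumes that "8 GB" means 8 GiB, that the scrubber works in 64-byte cache lines, and that a full pass takes exactly 24 hours (the text only says "less than one day", so the real rate is at least this high). `scrub_rate` is a hypothetical helper.

```python
GIB = 1 << 30
SECONDS_PER_DAY = 24 * 60 * 60


def scrub_rate(mem_bytes: int, period_s: float, line_bytes: int = 64) -> dict:
    """Minimum scrub rate needed to cover mem_bytes once per period_s."""
    lines = mem_bytes // line_bytes
    return {
        "lines": lines,
        "lines_per_s": lines / period_s,
        "bytes_per_s": mem_bytes / period_s,
        "interval_us": period_s / lines * 1e6,  # average gap between scrub reads
    }


r = scrub_rate(8 * GIB, SECONDS_PER_DAY)
print(f"{r['lines']:,} lines, {r['lines_per_s']:.0f} lines/s, "
      f"{r['bytes_per_s'] / 1024:.1f} KiB/s, one line every {r['interval_us']:.0f} us")
```

Under these assumptions the engine needs only about 1,553 cache-line reads per second (roughly 97 KiB/s, one line about every 644 µs), a tiny fraction of a DDR channel's bandwidth.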
Note that an uncorrectable error encountered by the memory scrub engine is a “speculative
error.” It carries this designation because no system agent has specifically requested the
corrupt data, and no real error condition exists in the system until one does. The error might
reside in an unmodified page of memory that is simply discarded, rather than written back,
when the page is swapped out to disk. In that case, the speculative error would “vanish” from
the system without ever being consumed and without adverse consequences.
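The "speculative" designation can be illustrated with a small bookkeeping sketch. Everything here is hypothetical: the `SpeculativeErrors` class, the 4 KiB page size, and the two events are assumptions chosen to mirror the paragraph above, not an operating-system or chipset interface. A scrub-found error becomes real only if a demand read consumes it; dropping the clean page that holds it makes it disappear.

```python
class SpeculativeErrors:
    """Hypothetical tracking of scrub-found uncorrectable errors."""

    def __init__(self, page_size: int = 4096):  # 4 KiB page size is an assumption
        self.page_size = page_size
        self.pending = set()  # addresses logged by the scrubber, not yet consumed

    def record(self, addr: int) -> None:
        self.pending.add(addr)

    def on_demand_read(self, addr: int) -> bool:
        """True if this read consumes a corrupt location, turning it into a real error."""
        if addr in self.pending:
            self.pending.discard(addr)
            return True
        return False

    def on_clean_page_dropped(self, page_base: int) -> int:
        """An unmodified page discarded on swap-out takes its pending errors with it."""
        gone = {a for a in self.pending if page_base <= a < page_base + self.page_size}
        self.pending -= gone
        return len(gone)
```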
3.5.5.3 Retry on Uncorrectable Error
The Intel® E7320 and Intel E7525 MCHs include specialized hardware that resubmits a memory
read request when an uncorrectable error is detected. When a demand fetch (as opposed to a
scrub read) encounters an uncorrectable error, as determined by the enabled ECC algorithm,
the memory control hardware resubmits the entire cache-line request to memory once to verify
that the data is actually corrupt. This feature is expected to greatly reduce, or eliminate, the
reporting of false or transient uncorrectable errors in the DRAM array.
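The retry policy reduces to "read, and on an uncorrectable result, re-read the whole line exactly once." In the minimal sketch below, `read_line` is a hypothetical callback standing in for the memory controller: it returns the line's bytes and an uncorrectable-error flag. The status strings are also assumptions; the specification does not say how a successful retry is logged.

```python
from typing import Callable, List, Optional, Tuple

ReadFn = Callable[[int], Tuple[bytes, bool]]  # addr -> (line data, uncorrectable?)


def demand_fetch(read_line: ReadFn, addr: int) -> Tuple[Optional[bytes], str]:
    """Demand read with a single full resubmission on an uncorrectable error."""
    data, ue = read_line(addr)
    if not ue:
        return data, "ok"
    data, ue = read_line(addr)  # one (and only one) retry of the entire cache line
    if not ue:
        return data, "retry_ok"  # transient error: not reported as uncorrectable
    return None, "uncorrectable"  # corruption confirmed by the second read


def make_reader(results: List[Tuple[bytes, bool]]) -> Tuple[ReadFn, List[int]]:
    """Test double: replays canned (data, ue) results and records each call."""
    it = iter(results)
    calls: List[int] = []

    def read(addr: int) -> Tuple[bytes, bool]:
        calls.append(addr)
        return next(it)

    return read, calls
```

For example, a reader that returns an uncorrectable result once and clean data on the retry yields `"retry_ok"`, and the error is never reported.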