HP DL360 Memory technology evolution: an overview of system memory technologies - Page 11


Basic ECC memory
Parity checking detects only single-bit errors. It does not correct memory errors or detect multi-bit
errors. HP introduced error correction code (ECC) memory in 1993 and continues to implement
advanced ECC in all HP ProLiant servers. ECC detects both single-bit and multi-bit errors in a 64-bit
data word, and it corrects single-bit errors.
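The limitation of parity checking can be seen in a minimal Python sketch. The function names and the 8-bit sample value are illustrative assumptions, not taken from HP's documentation.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """Check a word read from memory against the parity bit stored with it."""
    return parity_bit(word) == stored_parity


data = 0b10110010                 # hypothetical 8-bit value written to memory
stored = parity_bit(data)         # parity bit stored alongside the data

single = data ^ 0b00000100        # one bit flipped
double = data ^ 0b00010100        # two bits flipped

print(parity_ok(data, stored))    # True: no error
print(parity_ok(single, stored))  # False: single-bit error detected
print(parity_ok(double, stored))  # True: double-bit error goes unnoticed
```

Parity can only report that something is wrong; it carries no information about which bit changed, so it cannot correct the error.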
ECC stores 8 check bits with each 64-bit data word to permit the recovery of a single-bit error. Every
time data is written to memory, the ECC logic uses a special algorithm to generate these check bits
from the data and stores them with the data as a form of checksum. When data is read from memory, the
logic recalculates the check bits and compares them with the stored values. If they are equal, then
the data is valid and operation continues. If they are different, the data has an error, and the ECC
memory logic isolates the error and reports it to the system. In the case of a single-bit error, the
ECC memory logic can correct the error and output the corrected data so that the system continues to
operate (Figure 8).
Figure 8. ECC logic locating and correcting a single-bit error
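As a rough illustration of the write and read cycle described above, the following Python sketch uses a textbook extended Hamming (SECDED) code with 8 check bits per 64-bit word. This is an assumption made for illustration: the source does not say which code HP's memory controllers use, and the sketch does not model detection of four-bit errors within a single DRAM chip. All names here (encode, decode, CHECK_POS, DATA_POS) are hypothetical.

```python
# Assumed layout: index 0 holds an overall parity bit, indices 1..71 hold a
# Hamming codeword whose power-of-two positions are check bits.
CHECK_POS = [1, 2, 4, 8, 16, 32, 64]
DATA_POS = [p for p in range(1, 72) if p not in CHECK_POS]  # 64 data positions


def encode(data: int) -> list[int]:
    """Write path: build the 72-bit codeword for a 64-bit data word."""
    code = [0] * 72
    for i, pos in enumerate(DATA_POS):
        code[pos] = (data >> i) & 1
    for c in CHECK_POS:
        # Each check bit gives even parity over the positions whose index has bit c set.
        code[c] = sum(code[p] for p in range(1, 72) if p & c) % 2
    code[0] = sum(code) % 2  # overall parity over the other 71 bits
    return code


def decode(code: list[int]) -> tuple[str, int]:
    """Read path: recompute the checks, correct one flipped bit if possible."""
    code = list(code)
    syndrome = 0
    for p in range(1, 72):
        if code[p]:
            syndrome ^= p  # XOR of set positions is 0 for a valid codeword
    odd_overall = sum(code) % 2  # 1 means an odd number of bits flipped
    if syndrome == 0 and odd_overall == 0:
        status = "ok"
    elif odd_overall == 1 and syndrome < 72:
        code[syndrome] ^= 1  # syndrome is the index of the flipped bit
        status = "corrected"
    else:
        status = "uncorrectable"  # e.g. two flipped bits: detected, not corrected
    data = 0
    for i, pos in enumerate(DATA_POS):
        data |= code[pos] << i
    return status, data


word = 0x0123456789ABCDEF
stored = encode(word)

one_flip = list(stored)
one_flip[37] ^= 1
two_flips = list(stored)
two_flips[5] ^= 1
two_flips[60] ^= 1

print(decode(stored)[0])     # ok
print(decode(one_flip)[0])   # corrected
print(decode(two_flips)[0])  # uncorrectable
```

In this sketch a single flipped bit is located by the syndrome and repaired, while two flipped bits leave the overall parity even with a nonzero syndrome, so the error is reported rather than corrected.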
In addition to detecting and correcting single-bit errors, ECC detects (but does not correct) errors of
two random bits and up to four bits within a single DRAM chip. ECC memory responds to these
multi-bit errors by generating a non-maskable interrupt (NMI) that instructs the system to halt to
avoid data corruption. ECC technology has provided adequate protection for many applications. However,
the effectiveness of ECC protection decreases as memory capacity rises. This fact is significant
because of the following factors driving industry-standard servers to support more memory capacity:
  • Operating system support for increasing amounts of memory
  • Availability of low-cost, high-capacity memory modules
  • Server virtualization
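One way to read the capacity argument is as simple probability: if every 64-bit word has some small, independent chance of suffering an uncorrectable multi-bit error, the chance that at least one word in the system is affected grows with the number of words installed. The Python sketch below illustrates that reading only. The per-word probability P_WORD is a made-up placeholder, not a measured or published error rate, and the function name is hypothetical.

```python
import math

P_WORD = 1e-15  # hypothetical per-word probability of an uncorrectable error


def p_any_uncorrectable(capacity_gib: float, p_word: float = P_WORD) -> float:
    """Probability that at least one 64-bit word has an uncorrectable error,
    assuming independent errors: 1 - (1 - p_word) ** words."""
    words = capacity_gib * 2**30 / 8  # number of 8-byte (64-bit) words
    # expm1/log1p keep precision when p_word is tiny.
    return -math.expm1(words * math.log1p(-p_word))


for gib in (1, 16, 256):
    print(f"{gib:>4} GiB: {p_any_uncorrectable(gib):.3e}")
```

With a fixed per-word probability, the result rises almost linearly with capacity while the values stay small, which is the trend the paragraph above describes.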