HP ProLiant SE2170s Memory technology evolution: an overview of system memory - Page 9

Advanced ECC memory, Server virtualization



all HP ProLiant servers. ECC detects both single-bit and multi-bit errors in a 64-bit data word; it corrects
single-bit errors.
ECC adds a block of 8 check bits to each 64-bit data word so that a single-bit error can be recovered. When
data is written to memory, the ECC logic uses an algorithm to generate values called check bits. The
algorithm combines the check bits into a checksum, which it stores with the data. When data is read from
memory, the algorithm recalculates the checksum and compares it with the checksum stored at write time. If
the checksums are equal, the data is valid and operation continues. If they differ, the data has an error,
and the ECC memory logic isolates the error and reports it to the system. In the case of a single-bit error,
the ECC memory logic can correct the error and output the corrected data so that the system continues to
operate (Figure 6).

Figure 6. ECC logic locating and correcting a single-bit error

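The check-bit scheme described above can be sketched in code. The source does not name the algorithm, so this Python example assumes one common choice: an extended Hamming (SECDED) code with 7 positional check bits plus one overall parity bit for each 64-bit word. The names `encode`, `syndrome`, and `decode` are illustrative and are not taken from any HP implementation.

```python
# Assumed layout (not from the source): 72-bit codeword, position 0 holds an
# overall parity bit, positions 1..71 hold a Hamming code whose check bits sit
# at the power-of-two positions. The other 64 positions carry the data.
CODE_LEN = 72
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p not in CHECK_POSITIONS]


def encode(data):
    """Return the 72-bit codeword (list of 0/1) for a 64-bit integer."""
    bits = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    # Each check bit makes the parity of the positions it covers even.
    for c in CHECK_POSITIONS:
        parity = 0
        for pos in range(1, CODE_LEN):
            if pos & c and pos != c:
                parity ^= bits[pos]
        bits[c] = parity
    bits[0] = sum(bits[1:]) % 2  # overall parity over positions 1..71
    return bits


def syndrome(bits):
    """XOR of the positions of all set bits; 0 for an undamaged codeword.

    This equals the difference between stored and recomputed check bits.
    """
    s = 0
    for pos in range(1, CODE_LEN):
        if bits[pos]:
            s ^= pos
    return s


def decode(bits):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    s = syndrome(bits)
    odd_flips = sum(bits) % 2 == 1  # overall parity violated
    if s == 0 and not odd_flips:
        status = "ok"
    elif odd_flips and s < CODE_LEN:
        bits[s] ^= 1  # syndrome is the position of the single bad bit
        status = "corrected"
    else:
        status = "uncorrectable"  # e.g. two flipped bits: detect only
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= bits[pos] << i
    return status, data


word = 0x0123456789ABCDEF
stored = encode(word)
single = list(stored)
single[37] ^= 1          # one bit flips in memory
double = list(stored)
double[5] ^= 1           # two bits flip in memory
double[60] ^= 1
print(decode(stored)[0], decode(single)[0], decode(double)[0])
```

In this sketch a clean word decodes as "ok", a word with one flipped bit as "corrected", and a word with two flipped bits as "uncorrectable", which is the case where a real system would raise the NMI. Detection of wider errors confined to one DRAM chip depends on the specific code used and is not modeled here.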
In addition to correcting single-bit errors, ECC detects, but does not correct, errors in any two bits and
errors of up to four bits within a single DRAM chip. ECC memory responds to these multi-bit errors by
generating a non-maskable interrupt (NMI) that instructs the system to halt. ECC technology has provided
adequate protection for many applications, but its effectiveness decreases as memory capacity rises, because
more installed memory means more opportunities for a multi-bit error. Several trends are driving
manufacturers to build more memory capacity into industry-standard servers:
• Operating system support for increasing amounts of memory
• Availability of low-cost, high-capacity memory modules
• Server virtualization

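The link between capacity and ECC effectiveness can be illustrated with a rough calculation. The Python sketch below assumes that bits flip independently with the same probability, which is a simplification, and the per-bit probability passed in is a hypothetical placeholder rather than a measured rate. The function names are illustrative.

```python
from math import comb

WORD_BITS = 72  # 64 data bits plus 8 check bits per protected word


def p_multi_bit(p_bit, n=WORD_BITS):
    """Probability that one word holds two or more flipped bits.

    Assumes independent flips; summed directly to avoid rounding loss.
    """
    return sum(
        comb(n, k) * p_bit**k * (1 - p_bit) ** (n - k) for k in range(2, n + 1)
    )


def expected_multi_bit_words(capacity_gib, p_bit):
    """Expected number of words with an error that standard ECC cannot correct."""
    words = capacity_gib * 2**30 // 8  # 8 data bytes per 64-bit word
    return words * p_multi_bit(p_bit)


P_BIT = 1e-9  # hypothetical per-bit flip probability, for illustration only
for gib in (4, 8, 16):
    print(gib, expected_multi_bit_words(gib, P_BIT))
```

Under these assumptions the expected count of uncorrectable words grows in direct proportion to installed capacity, which is the effect the text describes.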
Advanced ECC memory
To improve memory protection beyond standard ECC, HP introduced Advanced ECC technology in 1996.
HP and most other server manufacturers use this solution in industry-standard products. Advanced ECC can
correct a multi-bit error that occurs within a single DRAM chip, so it can correct for the complete failure
of one DRAM chip. In Advanced ECC with 4-bit memory devices, each chip contributes four bits of data to the
data word. The four bits from each chip are distributed across four ECC devices (one bit per ECC device), so
that an error in one chip produces at most four separate single-bit errors, each of which the receiving ECC
device can correct. Figure 7 shows how one ECC device receives four data bits from four DRAM chips.
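The bit distribution described above can be sketched as a simple mapping. This Python example is a minimal illustration, not HP's implementation: the chip count `NUM_CHIPS` is a hypothetical value, and the function names are invented for the example. It shows only the routing of bits, not the correction step.

```python
BITS_PER_CHIP = 4     # 4-bit DRAM devices, as in the text
NUM_ECC_DEVICES = 4   # bit j of every chip is routed to ECC device j
NUM_CHIPS = 18        # hypothetical chip count, chosen for illustration


def distribute(chip_outputs):
    """Route chip bits to ECC devices.

    chip_outputs[c][j] is bit j of chip c. The result is one list per ECC
    device, holding exactly one bit from each chip.
    """
    return [
        [chip_outputs[c][j] for c in range(len(chip_outputs))]
        for j in range(NUM_ECC_DEVICES)
    ]


def errors_per_ecc_device(good, bad):
    """Count the differing bits seen by each ECC device."""
    return [
        sum(g != b for g, b in zip(good_word, bad_word))
        for good_word, bad_word in zip(distribute(good), distribute(bad))
    ]


# All chips output zeros, then chip 5 fails and inverts all four of its bits.
good = [[0] * BITS_PER_CHIP for _ in range(NUM_CHIPS)]
bad = [list(chip) for chip in good]
bad[5] = [1] * BITS_PER_CHIP

print(errors_per_ecc_device(good, bad))  # prints [1, 1, 1, 1]
```

Even with every bit of one chip corrupted, each ECC device sees a single-bit error, which is within what single-bit correction can repair.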