HP ProLiant SL165s Memory technology evolution: an overview of system memory t - Page 8

DIMM error detection/correction technologies, The increasing possibility of memory errors


To prevent such memory-related problems, we advise our customers to use only HP-certified DIMMs, which
are available in the memory option kits for each ProLiant server (see the “Importance of using HP-certified
memory modules in ProLiant servers” section).
Another important difference between single-rank and dual-rank DIMMs is cost. Memory costs generally
increase with DRAM density. For example, the cost of an advanced, high-density DRAM chip usually runs
more than twice that of a conventional DRAM chip. Because large-capacity, single-rank DIMMs are
manufactured with higher-density DRAM chips, they typically cost more than dual-rank DIMMs of
comparable capacity.
DIMM error detection/correction technologies
Memory modules are inherently susceptible to memory errors. Each DRAM chip stores data in columns and
rows of capacitors, or memory cells. The DIMM continuously refreshes the cells to preserve the data. The
operating voltage of the memory device determines the level of the electrical charge. If an external event
affects a capacitor’s charge, the data may become incorrect. Such memory errors can cause applications
and operating systems to crash, sometimes resulting in permanent data loss.
Memory errors are classified by the number of bits affected (single-bit or multi-bit) and by the cause
of the error. A 64-bit-wide data bus transports 64 bits at a time. These 64 bits constitute an ECC data word.
An error in one bit of a data word is a single-bit error. An error in more than one bit of a data word is a
multi-bit error.
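The single-bit versus multi-bit distinction can be sketched in a few lines. This is only an illustration of the definition above, not how a memory controller is implemented; the function name `classify_word_error` is hypothetical.

```python
MASK64 = (1 << 64) - 1  # one data word is 64 bits wide


def classify_word_error(written: int, read: int) -> str:
    """Compare a written and a read-back 64-bit data word and classify the result."""
    diff = (written ^ read) & MASK64   # XOR leaves a 1 in every bit position that differs
    flipped = bin(diff).count("1")     # number of affected bits in the word
    if flipped == 0:
        return "no error"
    return "single-bit error" if flipped == 1 else "multi-bit error"
```

Real hardware has no copy of the originally written word to compare against, which is why it relies on extra check bits stored alongside the data.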
Depending on the cause, engineers refer to memory errors as either hard or soft. Hard errors are caused by
broken or defective hardware, such as DRAM defects, bad solder joints, and connector issues, so the
device consistently returns incorrect results. For example, a memory cell may be stuck so that it always
returns a “0” bit, even when a “1” bit is written to it. Soft errors are more prevalent. They occur randomly
when an electrical disturbance near a memory cell alters the charge on the capacitor. A soft error does not
indicate a problem with the memory device, because once the stored data is corrected, the error does not
recur.
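The hard/soft distinction can be illustrated with a toy model of a single memory cell. The class `SimulatedCell` and the function `diagnose` are hypothetical names invented for this sketch, and the single rewrite-and-reread check is a simplification of how real diagnostics work.

```python
class SimulatedCell:
    """Toy one-bit memory cell; stuck_at models a hardware defect (assumption)."""

    def __init__(self, stuck_at=None):
        self.stuck_at = stuck_at  # None = healthy, 0 or 1 = cell stuck at that value
        self.value = 0

    def write(self, bit: int) -> None:
        self.value = bit

    def read(self) -> int:
        # A stuck cell ignores whatever was written to it.
        return self.value if self.stuck_at is None else self.stuck_at

    def disturb(self) -> None:
        """Model a transient electrical disturbance that flips the stored charge."""
        self.value ^= 1


def diagnose(cell: SimulatedCell, expected: int) -> str:
    """Rewrite the correct data and re-read: a recurring error is treated as hard."""
    if cell.read() == expected:
        return "no error"
    cell.write(expected)  # correct the stored data
    return "soft error" if cell.read() == expected else "hard error"
```

In this model a disturbed healthy cell reports a soft error once and then reads correctly, while a stuck cell keeps reporting a hard error no matter how often it is rewritten.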
The increasing possibility of memory errors
Two trends increase the likelihood of memory errors in servers:
  • Expanding memory capacity
  • Increasing storage density
Software vendors are developing increasingly complex, memory-intensive applications. This drives
operating systems to address more memory, which causes manufacturers to expand memory capacity.
Increased memory use increases the possibility of memory errors.
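A rough way to see why capacity matters is the following back-of-the-envelope model, which is not taken from the source. It assumes each of N bits fails independently with the same small probability p over some interval; both symbols are hypothetical.

```latex
P(\text{at least one error}) = 1 - (1 - p)^{N} \approx N p \quad \text{for } N p \ll 1
```

Under these assumptions, doubling the amount of memory in use roughly doubles the chance of seeing at least one error.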
The storage density of the DRAM chips depends on the operating voltage of the memory system. As the size
of memory cells decreases, both DRAM storage density and the memory-cell voltage sensitivity increase.
Initially, industry-standard DIMMs operated at 5 volts. Because of improvements in DRAM storage density,
operating voltage decreased first to 3.3 V, then 2.5 V, and then 1.8 V, allowing memory to run faster and
consume less power. However, with increased memory storage and decreased operating voltage, the
probability of an error rises. Whenever a data bit is misinterpreted and goes uncorrected, the
error can cause an application to crash. The only true protection from memory errors is to use an error
detection or correction protocol. Some protocols only detect errors, while others can both detect
and correct memory errors seamlessly.
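A single parity bit is one example of a detect-only scheme. The sketch below uses even parity over one 64-bit word; the function names are illustrative and are not drawn from any HP firmware or controller interface.

```python
MASK64 = (1 << 64) - 1  # one data word is 64 bits wide


def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total number of 1s (data plus parity) even."""
    return bin(word & MASK64).count("1") % 2


def parity_check(word: int, stored_parity: int) -> bool:
    """True if the word is consistent with the parity bit stored when it was written."""
    return even_parity_bit(word) == stored_parity


word = 0x0123456789ABCDEF
parity = even_parity_bit(word)             # computed and stored at write time

single_flip = word ^ 0b1                   # one bit altered
double_flip = word ^ 0b11                  # two bits altered

print(parity_check(word, parity))          # unmodified word passes
print(parity_check(single_flip, parity))   # single-bit error is detected
print(parity_check(double_flip, parity))   # double-bit error slips through
```

The check reports that something is wrong but not which bit, so it cannot correct the error. Any even number of flipped bits leaves the parity unchanged and goes unnoticed.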
Basic ECC memory
Parity checking detects only single-bit errors. It does not correct memory errors or detect multi-bit errors. HP
introduced Error Correction Code (ECC) memory in 1993 and we continue to implement advanced ECC in