HP C3939A Memory technology evolution: an overview of system memory technologi - Page 10
DIMM error detection/correction technologies
UPC - 088698056431
To prevent this and other memory-related problems, HP urges customers to use only HP-certified DIMMs, which are available in the memory option kits for each ProLiant server (see the "Importance of using HP-certified memory modules in ProLiant servers" section).

Another important difference between single-rank and dual-rank DIMMs is cost. Typically, memory costs increase with DRAM density. For example, the cost of an advanced, high-density DRAM chip is typically more than twice that of a conventional DRAM chip. Because large-capacity single-rank DIMMs are manufactured with higher-density DRAM chips, they typically cost more than dual-rank DIMMs of comparable capacity.

DIMM error detection/correction technologies

Memory modules used in servers are inherently susceptible to memory errors. As described earlier, each DRAM chip stores data in columns and rows of capacitors (memory cells) that must be continuously recharged (refreshed) to preserve the data. The operating voltage of the memory device determines the level of the electrical charge. However, if a capacitor's charge is affected by some external event, the data may become incorrect. Such memory errors can cause applications and operating systems to crash and can result in the permanent loss of business data.

Memory errors are classified by the number of bits affected (single-bit or multi-bit) and by the cause of the error. A 64-bit-wide data bus transports 64 bits at a time. These 64 bits constitute an ECC data word. An error in one bit of a data word is a single-bit error. An error in more than one bit of a data word is a multi-bit error.

Depending on the cause, a memory error is referred to as either a hard or a soft error. A hard error is caused by a broken or defective piece of hardware, so the device consistently returns incorrect results. For example, a memory cell may be stuck so that it always returns a "0" bit, even when a "1" bit is written to it.
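The classification above can be illustrated with a small sketch. It is not taken from the HP brief: the `classify_error` function and the `StuckAtZeroWord` class are hypothetical names, and the model assumes a single 64-bit data word in which one cell is stuck at "0". Comparing the written word with the word read back (XOR, then counting the differing bits) gives the single-bit or multi-bit classification, and repeating the write shows that a hard error returns the same wrong result every time.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def classify_error(written: int, read_back: int) -> str:
    """Classify a 64-bit data word by how many bits differ from what was written."""
    flipped = (written ^ read_back) & WORD_MASK  # 1 bits mark the corrupted positions
    count = bin(flipped).count("1")
    if count == 0:
        return "no error"
    return "single-bit error" if count == 1 else "multi-bit error"


class StuckAtZeroWord:
    """Hypothetical model of one 64-bit word in which some cells are stuck at 0."""

    def __init__(self, stuck_mask: int) -> None:
        self.stuck_mask = stuck_mask & WORD_MASK
        self.value = 0

    def write(self, data: int) -> None:
        # Stuck cells ignore the write and keep holding 0.
        self.value = data & WORD_MASK & ~self.stuck_mask

    def read(self) -> int:
        return self.value


# Assumed scenario: bit 5 of the word is stuck at 0.
word = StuckAtZeroWord(stuck_mask=1 << 5)
pattern = WORD_MASK  # write all 1 bits so the stuck cell is exposed

results = []
for _ in range(3):
    word.write(pattern)
    results.append(classify_error(pattern, word.read()))

print(results)
```

Each of the three write/read cycles reports a single-bit error at the same position, which is the consistent behavior that characterizes a hard error.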
Hard errors can be caused by DRAM defects, bad solder joints, connector issues, and other physical issues.

Soft errors are more prevalent. They occur randomly when an electrical disturbance near a memory cell alters the charge on the capacitor. A soft error does not indicate a problem with a memory device because once the stored data is corrected (for example, by a write to a memory cell), the same error does not recur.

The increasing possibility of memory errors

Two trends increase the likelihood of memory errors in servers: expanding memory capacity and increasing storage density. Software vendors are developing increasingly complex and memory-intensive applications. This drives operating systems to address more memory, which causes manufacturers to expand the memory capacity of the servers. For example, while the HP ProLiant DL585 G2 of 2007 could support a maximum of 128 GB, some of the latest servers now support up to 256 GB of memory. As manufacturers continue to expand the memory capacity of servers, the possibility of memory errors likewise increases.

Two parameters of DRAM are inextricably tied together: the storage density of the DRAM chips and the operating voltage of the memory system. As the size of memory cells decreases, both DRAM storage density and the memory-cell voltage sensitivity increase. Initially, industry-standard DIMMs operated at 5 V. However, due to improvements in DRAM storage density, the operating voltage decreased first to 3.3 V, then 2.5 V, and then 1.8 V to allow memory to run faster and consume less power. Because memory storage density is increasing and operating voltage is decreasing, there is a higher probability that an error may occur.

Whenever a data bit is misinterpreted and not corrected, the error can cause an application to crash. The only true protection from memory errors is to use some sort of memory error detection or correction protocol.
Some protocols can only detect errors, while others can both detect and correct memory problems seamlessly.
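To make the "detect only" case concrete, the following sketch uses a single even-parity bit over a 64-bit word. Parity is chosen here as an assumption, because it is the simplest detection-only scheme; this passage does not name a specific protocol, and the function names are hypothetical. The sketch shows that one parity bit flags a single-bit error but cannot say which bit flipped, so it cannot correct the error, and that an even number of flipped bits goes unnoticed.

```python
WORD_MASK = (1 << 64) - 1


def even_parity(word: int) -> int:
    """Return the parity bit that makes the total number of 1 bits even."""
    return bin(word & WORD_MASK).count("1") % 2


def parity_ok(word: int, stored_parity: int) -> bool:
    """True if the word read back still matches the parity bit stored with it."""
    return even_parity(word) == stored_parity


data = 0x0123_4567_89AB_CDEF      # arbitrary example word
parity = even_parity(data)        # computed when the word is written

one_flip = data ^ (1 << 7)                 # simulated single-bit error
two_flips = data ^ (1 << 7) ^ (1 << 40)    # simulated multi-bit error

print(parity_ok(data, parity))       # True: no error
print(parity_ok(one_flip, parity))   # False: single-bit error detected
print(parity_ok(two_flips, parity))  # True: two flips cancel out, error missed
```

Correcting an error requires extra check bits that identify the failing bit position, which a single parity bit does not provide.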