HP ProLiant DL280 System Memory Troubleshooting Best Practices for HP ProLiant - Page 2



Why Should I Troubleshoot Every Memory Problem?
Accurate diagnosis of system memory problems in ProLiant servers has many advantages, including:
• Prevents unnecessary hardware replacement.
• Prevents the return of parts that test NFF (No Fault Found).
• Prevents server downtime.
Best Practice: Many product issues that result in hardware replacement are preventable or
correctable with a firmware update. HP recommends checking for a firmware update before sending
a part back to HP for replacement. Based on HP ProLiant product return rates, a significant
percentage of all returned hardware products were functioning properly and needed only a firmware
update. Although not all products fall into this category, the server downtime and the time spent
removing, returning, and ultimately replacing hardware might have been avoided if a firmware
flash had been attempted during the troubleshooting process.
How Can I Tell if a Memory Problem has Occurred?
There are many indicators that a problem has occurred within the memory subsystem. HP has several tools
used to identify the status of hardware and software within a system. Using these tools is a good first step in
the process.
When a memory problem is suspected, check one or more of these common places for information:
• The HP System Management Homepage
• HP Systems Insight Manager (HP SIM)
• Server Logs
• DIMM Slot LEDs
IMPORTANT: When a memory error is detected, the firmware illuminates fault LEDs, which are located
next to each DIMM slot on the system board.
If the system isolates the error to a specific slot, only that slot's LED illuminates. However, if the
system can identify only the bank in which the error occurred, and cannot isolate a specific DIMM, all
the LEDs in that bank illuminate.
In addition, if the system cannot identify the bank in which the error occurred, all the LEDs in all
banks illuminate. This makes the task of isolating the failing DIMM more difficult and makes it more
likely that functioning banks of memory will be replaced.
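
The three LED cases described above can be read as a small decision procedure. The following Python sketch is illustrative only and is not an HP tool: the function name, the bank names, the slot numbering, and the result labels are all hypothetical, and the actual bank-to-slot mapping must be taken from the documentation for the specific server.

```python
from typing import Dict, List, Set, Tuple


def classify_led_pattern(banks: Dict[str, List[int]], lit: Set[int]) -> Tuple[str, list]:
    """Map a set of lit DIMM fault LEDs to the level of fault isolation.

    banks: hypothetical mapping of bank name -> slot numbers in that bank.
    lit:   slot numbers whose fault LED is currently illuminated.
    """
    all_slots = {slot for slots in banks.values() for slot in slots}
    unknown = lit - all_slots
    if unknown:
        raise ValueError(f"slots not present in bank layout: {sorted(unknown)}")

    if not lit:
        # No fault LED is lit.
        return ("none", [])
    if len(lit) == 1:
        # Error isolated to one specific slot.
        return ("slot", sorted(lit))
    if lit == all_slots:
        # Every LED in every bank is lit: the bank could not be identified.
        return ("unisolated", [])
    for name, slots in banks.items():
        if lit == set(slots):
            # Every LED in exactly one bank is lit: bank known, DIMM unknown.
            return ("bank", [name])
    # Any other pattern is not covered by the three cases in the text.
    return ("multiple", sorted(lit))


# Assumed layout for illustration: two banks of four slots each.
example_banks = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8]}
print(classify_led_pattern(example_banks, {3}))
print(classify_led_pattern(example_banks, {5, 6, 7, 8}))
```

A "bank" or "unisolated" result means the LEDs alone do not identify the failing DIMM, so the other sources listed above, such as the server logs, should be consulted before any memory is replaced.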