HP Carrier-grade cc3300 Intel Server Management (ISM) Installation and User's Guide - Page 38



Intel Server Management (ISM) Installation and User's Guide
Client SSU (CSSU) Details
38
• Sets the Last Error Update value to During PIC Runtime, indicating the update occurred while the
system was operational
The BIOS stops logging noncritical single-bit errors when the SBE count reaches nine, which prevents
these errors from filling the SEL. Upon system reboot, the OS uses the SEL records, along with the results
of its own memory test, to map out bad memory: it reduces the usable size of the affected memory bank so
that the bad memory element(s) are no longer used. Eliminating hard errors this way is a precaution with two
purposes: it prevents single-bit errors from becoming multiple-bit errors after the system has booted, and it
prevents the same single-bit errors from being detected and logged each time the failed locations are
accessed. Upon reboot, the single-bit error count in the SEL is reset to zero.
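The single-bit error throttling and map-out behavior described above can be modeled as follows. This is a minimal Python sketch, not ISM or BIOS code: the `HypotheticalSel` class, its methods, and the `usable_bank_size` helper are illustrative names, and shrinking a bank so that it ends just below its lowest failing address is only one assumed way to "reduce the usable size of a memory bank."

```python
from dataclasses import dataclass, field

# The guide says logging stops when the SBE count reaches nine.
SBE_LOG_LIMIT = 9


@dataclass
class HypotheticalSel:
    """Toy stand-in for the System Event Log; not a real ISM or IPMI API."""
    records: list = field(default_factory=list)
    sbe_count: int = 0

    def log_sbe(self, dimm: str, address: int) -> bool:
        """Record a single-bit error unless the limit has been reached.

        Returns True if the error was logged, False if it was suppressed.
        """
        if self.sbe_count >= SBE_LOG_LIMIT:
            return False  # stop logging so SBEs cannot fill the SEL
        self.records.append(("SBE", dimm, address))
        self.sbe_count += 1
        return True

    def reboot(self) -> None:
        """On reboot the SBE count is reset to zero; existing records remain."""
        self.sbe_count = 0


def usable_bank_size(bank_base: int, bank_size: int, bad_addresses: list) -> int:
    """Assumed map-out policy: shrink the bank to end just below its lowest bad address.

    `bad_addresses` would combine SEL records with the OS memory-test results.
    """
    in_bank = [a for a in bad_addresses if bank_base <= a < bank_base + bank_size]
    if not in_bank:
        return bank_size  # no failures in this bank, keep its full size
    return min(in_bank) - bank_base
```

For example, twelve consecutive single-bit errors produce only nine SEL records, and a 64 KiB bank with failures at offsets 0x8000 and 0x9000 would be cut to 0x8000 bytes under this assumed policy.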
Multiple-Bit Error (MBE) Handling
If a multiple-bit error occurs, the system generates a System Management Interrupt (SMI) that allows the
BIOS to log information about the error in the SEL, identifying the memory bank in which the error
occurred. However, on some systems, it is not possible to determine the exact memory device that caused a
multiple-bit error.
Because a multiple-bit error is a critical condition, after logging the error the BIOS generates a
non-maskable interrupt (NMI) that halts the system. When the server is rebooted, PIC indicates this error as
a critical condition on the Memory Array and Memory Device in its health branch. The requested event actions
are carried out, and PIC:
• Increments the critical error count on the Sensor Settings tab
• Sets the Memory Device Error Type to MBE on the Sensor Information tab for the Memory Device
• Sets the Last Error Update value to Previous Boot, indicating that the last update occurred during the
previous system boot
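The multiple-bit error path spans two boots: the BIOS logs the error and halts, and PIC updates its view only after the restart. The Python sketch below models that split under stated assumptions. `MemorySensorView`, `bios_handle_mbe`, and `pic_after_reboot` are hypothetical names, the SEL is a plain list of dictionaries, and returning the string "halted" stands in for the NMI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemorySensorView:
    """Hypothetical model of the PIC fields the guide names for one Memory Device."""
    critical_error_count: int = 0           # Sensor Settings tab
    error_type: Optional[str] = None        # Sensor Information tab
    last_error_update: Optional[str] = None


def bios_handle_mbe(sel: list, bank: str) -> str:
    """Sketch of the BIOS path: the SMI handler logs the bank, then an NMI halts."""
    # Only the bank is recorded; the exact device may not be determinable.
    sel.append({"type": "MBE", "bank": bank})
    return "halted"  # placeholder for the NMI that stops the system


def pic_after_reboot(sel: list, view: MemorySensorView) -> MemorySensorView:
    """On the next boot, reflect each MBE record in the list in the sensor view."""
    for record in sel:
        if record["type"] == "MBE":
            view.critical_error_count += 1
            view.error_type = "MBE"
            view.last_error_update = "Previous Boot"
    return view
```

The sketch counts every MBE record in the list it is given, so a real implementation would also need to skip records it had already processed on an earlier boot.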
Comparison of Single-bit Errors to Multiple-bit Errors
Table 4-3 compares the steps taken with single-bit and multiple-bit errors.
Table 4-3 SBE and MBE Comparison

Memory Error Handling                    SBE                 MBE
Generate SMI                             Yes                 Yes
Log information includes                 Exact SIMM or DIMM  Memory bank only
Action after SEL logging                 Continue operation  Stop the system
Indicated by PIC screen changes          Immediately         After the system reboots
Bad memory is mapped out at next reboot  Yes                 Yes (immediately after the failure)
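Because Table 4-3 is a fixed mapping from error type to handling, it can also be expressed as data, which keeps any code that reacts to memory events consistent with the documented behavior. The names in this Python sketch (`Handling`, `TABLE_4_3`, `system_keeps_running`) are illustrative assumptions; only the cell values come from the table.

```python
from typing import NamedTuple


class Handling(NamedTuple):
    """One column of Table 4-3; the field names are our own."""
    generate_smi: bool
    log_granularity: str
    action_after_logging: str
    shown_in_pic: str
    mapped_out_next_reboot: str


# Cell values transcribed from Table 4-3.
TABLE_4_3 = {
    "SBE": Handling(True, "Exact SIMM or DIMM", "Continue operation",
                    "Immediately", "Yes"),
    "MBE": Handling(True, "Memory bank only", "Stop the system",
                    "After the system reboots",
                    "Yes (immediately after the failure)"),
}


def system_keeps_running(kind: str) -> bool:
    """True if the documented action after SEL logging lets the system continue."""
    return TABLE_4_3[kind].action_after_logging == "Continue operation"
```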
PCI Hot-Plug Device
This sensor screen displays information about each PCI hot-plug device installed in a PCI Hot-Plug (PHP) slot.
Power Supply and Power Unit
The Power Supply sensor screen shows information about each power supply.
The Power Unit represents power-supply redundancy. On systems that support power-supply redundancy, PIC
monitors the status of the power supplies in the managed server. The Power Unit sensor screen displays
information about each power unit and its status.
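As an illustration of what a power-unit redundancy state summarizes, the Python sketch below derives one from the states of the individual power supplies. The guide does not define the exact states or how PIC computes them, so the `Redundancy` labels, the `power_unit_status` function, and its `required` parameter (the number of supplies needed to carry the load) are all assumptions.

```python
from enum import Enum


class Redundancy(Enum):
    # Hypothetical labels; the guide does not list the actual Power Unit states.
    REDUNDANT = "Redundant"
    NON_REDUNDANT = "Non-redundant"
    INSUFFICIENT = "Insufficient power"


def power_unit_status(supplies_ok: list, required: int) -> Redundancy:
    """Derive a redundancy state from per-supply health flags.

    `supplies_ok` holds one bool per installed power supply; `required` is the
    assumed number of working supplies needed to power the server.
    """
    working = sum(supplies_ok)  # True counts as 1, False as 0
    if working > required:
        return Redundancy.REDUNDANT      # at least one spare supply
    if working == required:
        return Redundancy.NON_REDUNDANT  # running, but no spare left
    return Redundancy.INSUFFICIENT
```

With three supplies and a load that needs two, losing one supply moves the unit from redundant to non-redundant, and losing a second leaves insufficient power.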