Intel X38ML Product Specification - Page 104



Error Reporting and Handling
Intel® Server Board X38ML
Revision 1.3
Intel order number E15331-006
92
the BMC as PCI Express* Bus Correctable errors. PCI Express* non-fatal and fatal errors are
reported to the BMC as PCI Express* Bus Uncorrectable errors. The system event log for these
errors includes the location of the device reporting an error, the PCI Express* link number, PCI
bus number, PCI device number, and the PCI function number. An NMI is generated for PCI
Express* Uncorrectable errors after they are logged.
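The reporting rule above can be sketched as a small lookup. This is an illustration only: the names `PcieSeverity` and `bmc_category` are hypothetical, and the correctable case assumes that the truncated opening sentence refers to PCI Express* correctable errors. The text mentions an NMI only for uncorrectable errors, so the sketch assumes none is raised otherwise.

```python
from enum import Enum


class PcieSeverity(Enum):
    CORRECTABLE = "correctable"
    NON_FATAL = "non-fatal"
    FATAL = "fatal"


def bmc_category(severity):
    """Return (category reported to the BMC, NMI raised after logging)."""
    if severity is PcieSeverity.CORRECTABLE:
        # Assumption: the truncated first sentence covers correctable errors.
        return ("PCI Express Bus Correctable", False)
    # Non-fatal and fatal errors are both reported as uncorrectable,
    # and an NMI follows once the event has been logged.
    return ("PCI Express Bus Uncorrectable", True)
```

In this sketch a fatal and a non-fatal error are indistinguishable at the BMC category level; only the location fields in the log entry (link, bus, device, function) would tell them apart.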
6.2.2.3 Processor Bus Error
The BIOS enables the error correction and detection capabilities of the processors by setting
the appropriate bits in the processor model-specific registers (MSRs) and in the chipset.
When unrecoverable errors occur on the host processor bus, the asynchronous error handler
(usually SMI) may not execute properly or log the event. The handler records the error in the
system event log only if the system has not experienced a catastrophic failure that compromises
the integrity of the handler.
6.2.2.4 Memory Bus Error
The hardware is programmed to generate an SMI on correctable data errors in the memory
array. The SMI handler records the error and the DIMM location to the system event log.
Uncorrectable errors in the memory array are mapped to the SMI because the BMC cannot
determine the location of the bad DIMM. The uncorrectable errors may have corrupted the
contents of SMRAM. The SMI handler will log the failing DIMM number to the BMC if the
SMRAM contents are still valid. For certain errors, or during early POST, it may not be
possible to isolate the failure to a single DIMM. The format of the data bytes is
described in Section 6.2.3.1.
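The decision flow described in this section might be modeled as follows. This is a simplified sketch, not the BIOS implementation: the function name, the dictionary shape, and the choice to log nothing when SMRAM is no longer valid are assumptions made for illustration. A `dimm` value of `None` stands for a failure that could not be isolated to a single DIMM.

```python
def handle_memory_smi(uncorrectable, dimm, smram_valid=True):
    """Return the entry the SMI handler would log, or None if nothing is logged."""
    if not uncorrectable:
        # Correctable data error: record the error and the DIMM location.
        return {"error": "correctable", "dimm": dimm}
    if not smram_valid:
        # Assumption: with corrupted SMRAM the handler cannot log reliably.
        return None
    # Uncorrectable error with intact SMRAM: log the failing DIMM number.
    return {"error": "uncorrectable", "dimm": dimm}
```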
6.2.2.5 Operating System Watchdog Failure
If an operating system device driver is using the watchdog timer to detect software or hardware
failures and that timer expires, an Asynchronous Reset (ASR) is generated. The ASR is
equivalent to a hard reset. The POST portion of the BIOS can query the BMC for a watchdog
reset event as the system reboots and log it in the SEL.
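This section does not say how the BIOS queries the BMC. One mechanism that standard IPMI offers is the Timer Use Expiration flags byte returned by the Get Watchdog Timer command, which records which timer use has expired. The sketch below decodes that byte. The bit assignments follow the IPMI specification; whether this BIOS relies on them is an assumption, and the function name is hypothetical.

```python
# Bit positions in the IPMI "Timer Use Expiration flags" byte.
EXPIRATION_FLAGS = {
    1: "BIOS FRB2",
    2: "BIOS/POST",
    3: "OS Load",
    4: "SMS/OS",
    5: "OEM",
}


def expired_timer_uses(flags):
    """List the watchdog timer uses whose expiration bit is set."""
    return [name for bit, name in EXPIRATION_FLAGS.items() if flags & (1 << bit)]
```

An operating system driver would normally arm the timer with the SMS/OS use, so a set SMS/OS bit after a reboot would suggest the kind of watchdog reset described above.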
6.2.2.6 Boot Event
The BIOS sends the system date and time to the BMC during POST and logs a boot event.
Software that parses the event log should not treat the boot event as an error.
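A log parser could apply this rule with a small allow-list of informational events. The sketch below assumes that the boot event is logged with the IPMI System Event sensor type (12h) and the OEM System Boot Event offset (01h). That encoding is not stated in this section, so verify it against the actual log before relying on it. The names are hypothetical.

```python
# Assumption: (sensor type, event offset) pairs that are informational only.
INFORMATIONAL_EVENTS = {
    (0x12, 0x01),  # System Event / OEM System Boot Event
}


def is_informational(sensor_type, event_data1):
    """Return True if the event should not be counted as an error."""
    offset = event_data1 & 0x0F  # the low nibble of Event Data 1 is the offset
    return (sensor_type, offset) in INFORMATIONAL_EVENTS
```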
6.2.3 Logging Format Conventions
The BIOS complies with the logging format defined in the IPMI specification. IPMI defines the
use of every byte in an event log entry except two, Event Data 2 and Event Data 3. An event
generator can specify that these bytes contain OEM-specified values. The system BIOS uses
these two bytes to record additional information about the error.
This specification describes the format of the OEM data bytes (Event Data 2 and 3) for the
following errors:
• Memory errors (see Section 6.2.3.1)
• PCI bus errors (see Section 6.2.3.3)
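To show where the two OEM bytes sit, the following sketch splits a 16-byte IPMI system event record and checks whether Event Data 1 marks Event Data 2 and 3 as OEM codes. The record layout and the flag bits follow the IPMI specification. The function name and the returned dictionary are hypothetical, and the board-specific meaning of the OEM bytes is deliberately left undecoded because it is defined in the referenced sections.

```python
import struct

# IPMI system event record: 16 bytes, little-endian multi-byte fields.
_SEL_LAYOUT = struct.Struct("<HBIHBBBBBBB")


def decode_sel_event(record: bytes) -> dict:
    """Split a 16-byte SEL event record and flag OEM use of Event Data 2/3."""
    if len(record) != _SEL_LAYOUT.size:
        raise ValueError("expected a 16-byte SEL record")
    (record_id, record_type, timestamp, generator_id, evm_rev,
     sensor_type, sensor_number, dir_and_type,
     data1, data2, data3) = _SEL_LAYOUT.unpack(record)
    return {
        "record_id": record_id,
        "sensor_type": sensor_type,
        "sensor_number": sensor_number,
        "offset": data1 & 0x0F,
        # Bits [7:6] of Event Data 1 == 10b: Event Data 2 holds an OEM code.
        "data2_is_oem": ((data1 >> 6) & 0x3) == 0b10,
        # Bits [5:4] of Event Data 1 == 10b: Event Data 3 holds an OEM code.
        "data3_is_oem": ((data1 >> 4) & 0x3) == 0b10,
        "event_data2": data2,
        "event_data3": data3,
    }
```

A parser would call `decode_sel_event` on each raw record and interpret `event_data2` and `event_data3` according to Section 6.2.3.1 or 6.2.3.3 only when the corresponding OEM flag is set.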