Intel X38ML Product Specification - Page 103

Error Handling and Logging

Intel® Server Board X38ML

Error Reporting and Handling

Revision 1.3

Intel order number E15331-006

6.2 Error Handling and Logging

This section defines how the system BIOS handles errors, including the role of the BIOS in error handling and the interaction between the BIOS, platform hardware, and server management firmware. It also describes error-logging techniques and defines error codes.

6.2.1 Error Sources and Types

One of the major requirements of server management is to handle system errors correctly and consistently. System errors that can be enabled and disabled individually or as a group fall into the following categories:

- PCI bus
- Memory single- and multi-bit errors
- Sensors
- Errors detected during POST, logged as POST errors

Sensors are managed by the BMC. The BMC can receive event messages from individual sensors and log system events. For more information on BMC-logged errors, see the BMC EPS.
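
The statement that error categories can be enabled and disabled individually or as a group can be pictured as a set of bit flags. The sketch below is only an illustration: the names `ErrorSource`, `enable`, and `disable` are hypothetical and do not come from the BIOS or from this specification.

```python
from enum import Flag, auto


class ErrorSource(Flag):
    """Hypothetical flags, one per error category listed in Section 6.2.1."""
    PCI_BUS = auto()
    MEMORY = auto()    # single- and multi-bit errors
    SENSORS = auto()   # managed by the BMC
    POST = auto()      # errors detected during POST


ALL_SOURCES = (ErrorSource.PCI_BUS | ErrorSource.MEMORY
               | ErrorSource.SENSORS | ErrorSource.POST)


def enable(enabled, sources):
    """Return the enabled set with one source, or a group of sources, added."""
    return enabled | sources


def disable(enabled, sources):
    """Return the enabled set with one source, or a group of sources, removed."""
    return enabled & ~sources


# Disable one category, then a group of categories.
state = disable(ALL_SOURCES, ErrorSource.PCI_BUS)
state = disable(state, ErrorSource.MEMORY | ErrorSource.POST)
print(ErrorSource.SENSORS in state)
```

Passing a single flag changes one category, and passing several flags joined with `|` changes a group in one call.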

6.2.2 Error Logging via SMI Handler

The SMI handler handles and logs system-level events that are not visible to the server management firmware. The SMI handler pre-processes all system errors, including errors that can generate an NMI.

The SMI handler sends a command to the BMC to log the event and provides the data to be logged. For example, the BIOS programs the hardware to generate an SMI on a single-bit memory error and logs the location of the failed DIMM in the system event log. System events handled by the BIOS generate an SMI. After the BIOS finishes logging the error, it asserts the NMI if needed.
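
The ordering described above, log to the BMC first and assert the NMI afterwards only if needed, can be modeled with a small simulation. This is a sketch under stated assumptions, not the BIOS implementation: `SystemEvent`, `FakeBmc`, and `smi_handler` are hypothetical names, the event data is an opaque placeholder rather than the real event format, and the BMC is replaced by an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class SystemEvent:
    source: str       # e.g. "memory" or "pci" (illustrative labels)
    data: bytes       # opaque placeholder; the real data format is not modeled
    needs_nmi: bool   # whether the error must be escalated with an NMI


@dataclass
class FakeBmc:
    """Stand-in for the BMC: keeps logged events in an in-memory list."""
    sel: list = field(default_factory=list)

    def log_event(self, event):
        self.sel.append((event.source, event.data))


def smi_handler(event, bmc, trace):
    """Log the event through the BMC, then assert NMI only if needed."""
    bmc.log_event(event)        # step 1: send the event and its data to the BMC
    trace.append("logged")
    if event.needs_nmi:         # step 2: escalate only after logging is done
        trace.append("nmi")


bmc = FakeBmc()
trace = []
smi_handler(SystemEvent("memory", b"\x00", needs_nmi=False), bmc, trace)
smi_handler(SystemEvent("pci", b"\x01", needs_nmi=True), bmc, trace)
print(trace)
```

Logging before the NMI matters because an NMI may halt or hand control to the operating system, after which the BIOS might not get another chance to record the event.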

6.2.2.1 PCI Bus Error

The PCI bus defines two error pins, PERR# and SERR#, which report PCI parity errors and system errors, respectively. The BIOS can be instructed to enable or disable reporting of PERR# and SERR# through the NMI. Disabling NMI for PERR# and/or SERR# also disables logging of the corresponding event.

In the case of PERR#, the PCI bus master has the option to retry the offending transaction or to report it using SERR#. All other PCI-related errors are reported by SERR#. All PCI-to-PCI bridges are configured to generate an SERR# on the primary interface whenever there is an SERR# on the secondary side, as long as SERR# is enabled in BIOS Setup. The same is true for PERR#. The format of the data bytes is described in Section 6.2.3.3.
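
As a rough illustration of what enabling or disabling PERR# and SERR# means at the register level, the sketch below computes values for the standard PCI Command register and the PCI-to-PCI Bridge Control register. The bit positions are taken from the PCI Local Bus and PCI-to-PCI Bridge specifications, not from this document. How this BIOS actually programs the registers is not described here, so the helper names and the mapping from the Setup options to these bits are assumptions.

```python
# Bit positions from the PCI specifications (not from this document).
CMD_PERR_RESPONSE = 1 << 6    # Command register: Parity Error Response
CMD_SERR_ENABLE = 1 << 8      # Command register: SERR# Enable
BCTL_PERR_RESPONSE = 1 << 0   # Bridge Control: Parity Error Response Enable
BCTL_SERR_ENABLE = 1 << 1     # Bridge Control: SERR# Enable (forward to primary)


def _apply(value, mask, enabled):
    """Set the masked bit when enabled, clear it otherwise (16-bit register)."""
    value = (value | mask) if enabled else (value & ~mask)
    return value & 0xFFFF


def command_register(current, perr_enabled, serr_enabled):
    """Hypothetical helper: new Command register value for a device."""
    value = _apply(current, CMD_PERR_RESPONSE, perr_enabled)
    return _apply(value, CMD_SERR_ENABLE, serr_enabled)


def bridge_control(current, perr_enabled, serr_enabled):
    """Hypothetical helper: new Bridge Control value for a PCI-to-PCI bridge."""
    value = _apply(current, BCTL_PERR_RESPONSE, perr_enabled)
    return _apply(value, BCTL_SERR_ENABLE, serr_enabled)


# Both options enabled, starting from a Command value of 0x0007.
print(hex(command_register(0x0007, True, True)))
# SERR# forwarding turned off on a bridge, parity response left on.
print(hex(bridge_control(0x0003, True, False)))
```

The helpers only compute register values. Reading and writing configuration space, and tying the result to NMI generation and event logging, are outside the scope of the sketch.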

6.2.2.2 PCI Express* Errors

The hardware is programmed to generate an SMI on PCI Express* correctable, uncorrectable non-fatal, and uncorrectable fatal errors. The correctable PCI Express* errors are reported to