Dell PowerEdge 3250 Product Guide (.pdf) - Page 13

SR870BH2 Machine Check Error Handling, Classification of Errors, Error Types - specifications



Intel® Server Platform SR870BH2, Revision 1.1

5. SR870BH2 Machine Check Error Handling
This section gives an overview of the implementation of machine check error handling on the Server Platform SR870BH2. For additional details about Itanium-based system error generation and error handling, refer to the Itanium® Processor Family Error Handling Guide (document number: 249278-002) and the Itanium® System Abstraction Layer Specification (document number: 245359-005). Both documents can be downloaded from the web at www.developer.intel.com.

The goal of the Machine Check Architecture (MCA) is to contain errors and correct as many as possible before they propagate to the network or to permanent storage. If an error cannot be corrected by the hardware or firmware, and the OS cannot handle it, the machine shall be reset. MCA errors include ECC, BINIT, BERR, SERR, and PERR. The BIOS handles these conditions through SAL 3.0-compatible services.
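
The escalation order described above (hardware, then firmware, then the OS, then reset) can be illustrated with a minimal sketch. This is not the SR870BH2 BIOS or SAL implementation. The function name `escalate`, the layer names, and the example policy of which layer handles which error are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical handler type: returns True if that layer resolved the error.
Handler = Callable[[str], bool]


def escalate(error: str, layers: Sequence[Tuple[str, Handler]]) -> str:
    """Offer the error to each layer in order.

    Returns the name of the first layer that handles the error, or
    "reset" if no layer can handle it.
    """
    for name, handler in layers:
        if handler(error):
            return name
    return "reset"


# Illustrative policy only: the mapping of errors to layers is invented
# for this example and is not taken from the platform documentation.
layers = [
    ("hardware", lambda e: e == "single-bit ECC"),
    ("firmware", lambda e: False),
    ("os", lambda e: e == "multi-bit ECC in user page"),
]

if __name__ == "__main__":
    for err in ("single-bit ECC", "multi-bit ECC in user page", "BINIT"):
        print(err, "->", escalate(err, layers))
```

The ordered list keeps the escalation path explicit: an error reaches the reset outcome only after every earlier layer has declined it.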

5.1 Classification of Errors

Error events are classified by the processor and platform into three basic groups. This section provides a summary of the different error types and signaling methods defined by the Itanium Machine Check Architecture (MCA) and implemented in the Server Platform SR870BH2.

5.2 Error Types

  • Fatal: A fatal error is an error where the state has been corrupted and the error may, or may not, be contained. The platform will signal a fatal error when the integrity of the platform or subsystem cannot be determined. These errors cannot be corrected by hardware, firmware, or system software. A reset of the system or subsystem is required.
  • Recoverable/Uncorrectable: An error has been detected that cannot be corrected by hardware or firmware. However, the operating integrity of platform hardware and system state has been maintained. These errors may or may not be recoverable (determined by system software capabilities).
  • Correctable: An error has been detected and corrected by hardware, or by processor/platform firmware.
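
The three error types can be summarized as a small decision procedure. The sketch below is an illustration under stated assumptions, not the platform's actual classification logic. The names `Severity`, `classify`, and `required_action`, and the boolean inputs `corrected`, `integrity_known`, and `os_can_recover`, are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    CORRECTABLE = "correctable"
    RECOVERABLE = "recoverable/uncorrectable"
    FATAL = "fatal"


def classify(corrected: bool, integrity_known: bool) -> Severity:
    """Map two hypothetical status flags to an error type.

    integrity_known: platform/subsystem integrity can be determined.
    corrected: hardware or firmware already corrected the error.
    """
    if not integrity_known:
        # Integrity cannot be determined: signaled as fatal.
        return Severity.FATAL
    if corrected:
        return Severity.CORRECTABLE
    # Not corrected, but hardware integrity and system state are maintained.
    return Severity.RECOVERABLE


def required_action(severity: Severity, os_can_recover: bool = False) -> str:
    """Return "continue" or "reset" for a classified error."""
    if severity is Severity.CORRECTABLE:
        return "continue"
    if severity is Severity.RECOVERABLE and os_can_recover:
        # Recovery depends on system software capabilities.
        return "continue"
    return "reset"


if __name__ == "__main__":
    sev = classify(corrected=False, integrity_known=True)
    print(sev.value, "->", required_action(sev, os_can_recover=True))
```

Checking integrity first reflects the definitions above: once the integrity of the platform cannot be determined, the error is fatal regardless of any other status.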