Intel SE7525GP2 Product Specification - Page 140

Single-bit ECC Error Throttling Prevention



Error Reporting and Handling
Intel® Server Boards SE7320SP2 and SE7525GP2
Revision 4.0
128
6.1.2.6 Boot Event
The BIOS downloads the system date and time to the mBMC during POST and logs a boot
event. This record is informational and does not indicate an error; software that parses the
event log should not treat it as a fault.
6.1.2.7 Logging Format Conventions
The BIOS event log data in the SEL complies with the IPMI specification. IPMI defines the use of
every byte in an event log entry except two, called Event Data 2 and Event Data 3. An event
generator can specify that these bytes contain OEM-specified values. The system BIOS uses
these two bytes to record additional information about the error.
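As an illustration of where these bytes sit, the following minimal sketch pulls the three event data bytes out of a raw SEL entry. It assumes the standard 16-byte IPMI system event record, in which Event Data 1, 2 and 3 are the last three bytes, and it assumes the IPMI convention that bits 7:6 and 5:4 of Event Data 1 equal 10b when bytes 2 and 3 carry OEM codes. The function name `split_event_data` is hypothetical and is not part of the BIOS or of any IPMI library.

```python
def split_event_data(record: bytes) -> dict:
    """Return the three event data bytes of a raw 16-byte SEL event record.

    Assumption: standard IPMI system event record layout, where the last
    three bytes are Event Data 1, Event Data 2 and Event Data 3.
    """
    if len(record) != 16:
        raise ValueError("expected a 16-byte SEL record")

    event_data_1, event_data_2, event_data_3 = record[13], record[14], record[15]

    return {
        "event_data_1": event_data_1,
        "event_data_2": event_data_2,
        "event_data_3": event_data_3,
        # Assumption: 10b in bits 7:6 marks Event Data 2 as an OEM code.
        "byte2_is_oem": ((event_data_1 >> 6) & 0x03) == 0b10,
        # Assumption: 10b in bits 5:4 marks Event Data 3 as an OEM code.
        "byte3_is_oem": ((event_data_1 >> 4) & 0x03) == 0b10,
    }
```

A parser would call `split_event_data` on each raw entry and decode Event Data 2 and 3 as OEM values only when the corresponding flag is set.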
The format of the OEM data bytes (Event Data 2 and Event Data 3) for memory errors, PCI bus
errors and FRB2 errors is described here. This format is supported by all platforms that are IPMI
version 1.0 (or later) compliant.
Bits 3:1 of the generator ID field define the format revision. The system software ID is a 7-bit
quantity. For events covered in this document, the system software IDs are within the range
0x18-0x1F. A system software ID of 0x18 indicates that OEM data bytes 2 and 3 are encoded
using data format scheme revision 0. Note that the system software IDs in the range 0x10-0x1F
are reserved for the SMI handler. The IPMI specification reserves two distinct ranges for the
BIOS and the SMI handler. Because the distinction between the two is not important here, the
BIOS uses the same generator ID values as the SMI handler. Technically, the
FRB2 event is not logged by the SMI handler, but it uses the same generator ID range as
memory errors.
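The generator ID encoding described above can be sketched as follows. This is an illustrative decoder, not BIOS code. It assumes the IPMI layout in which bit 0 of the generator ID byte is 1 when bits 7:1 hold a system software ID, so the format revision in bits 3:1 is the low three bits of that software ID. The names `decode_generator_id` and `BIOS_SOFTWARE_ID_RANGE` are hypothetical.

```python
# System software IDs used for the events covered in this document.
BIOS_SOFTWARE_ID_RANGE = range(0x18, 0x20)  # 0x18 through 0x1F inclusive


def decode_generator_id(generator_id: int) -> dict:
    """Split a generator ID byte into software ID and format revision."""
    low_byte = generator_id & 0xFF

    # Assumption (IPMI layout): bit 0 set means bits 7:1 are a software ID.
    is_software_id = bool(low_byte & 0x01)
    software_id = low_byte >> 1             # 7-bit quantity, bits 7:1
    format_revision = (low_byte >> 1) & 0x07  # bits 3:1 of the generator ID

    return {
        "is_software_id": is_software_id,
        "software_id": software_id,
        "format_revision": format_revision,
        "in_document_range": is_software_id
        and software_id in BIOS_SOFTWARE_ID_RANGE,
    }


# Software ID 0x18 is carried as generator ID byte (0x18 << 1) | 1 = 0x31.
example = decode_generator_id(0x31)
# example["software_id"] == 0x18 and example["format_revision"] == 0
```

Under this layout, software IDs 0x18 through 0x1F map to format revisions 0 through 7, which matches the statement that 0x18 selects revision 0.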
6.1.3 Single-bit ECC Error Throttling Prevention
The system detects, corrects, and logs correctable errors. As long as these errors occur
infrequently, the system should continue to operate without a problem.
Occasionally, correctable errors are caused by a persistent failure of a single component. For
example, a broken data line on a DIMM produces repeated errors until the DIMM is replaced.
Although these errors are correctable, continual calls to the error logger can throttle the system,
preventing any further useful work.
For this reason, the system counts certain types of correctable errors and disables reporting if
they occur too frequently. Correction remains enabled but calls to the error handler are
disabled. This allows the system to continue running, despite a persistent correctable failure.
The BIOS adds an entry to the event log to indicate that logging for that type of error has been
disabled. Such an entry indicates a serious hardware problem that must be repaired at the
earliest possible time.
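The behavior described in this section can be modeled with a small counter per error type. The sketch below is only a model of the idea and is not the BIOS implementation: this page does not state the actual threshold or time window, so `threshold` and `window_seconds` are hypothetical parameters, as are the class name `CorrectableErrorThrottle` and the log entry strings.

```python
from collections import defaultdict, deque


class CorrectableErrorThrottle:
    """Counts correctable errors per type and stops reporting a type
    that occurs too frequently. Correction itself is not modeled."""

    def __init__(self, threshold=10, window_seconds=3600.0):
        self.threshold = threshold            # hypothetical value
        self.window_seconds = window_seconds  # hypothetical value
        self.recent = defaultdict(deque)      # error type -> recent timestamps
        self.disabled = set()                 # error types no longer reported
        self.event_log = []                   # stand-in for the SEL

    def report(self, error_type, now):
        """Handle one correctable error seen at time `now` (seconds)."""
        if error_type in self.disabled:
            # Reporting is off for this type; nothing is logged.
            return "suppressed"

        times = self.recent[error_type]
        times.append(now)
        # Forget errors that fall outside the counting window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()

        if len(times) >= self.threshold:
            # Too frequent: record that logging is disabled, then stop.
            self.disabled.add(error_type)
            self.event_log.append((now, error_type, "logging disabled"))
            return "disabled"

        self.event_log.append((now, error_type, "correctable error"))
        return "logged"
```

With `threshold=3` and `window_seconds=10.0`, three errors of one type at times 0, 1 and 2 produce two ordinary entries followed by a "logging disabled" entry, and a fourth error of that type is suppressed. Errors of other types continue to be logged, which mirrors the statement that reporting is disabled per error type.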