Intel S1200RP Technical Product Specification - Page 58

Fault Resilient Booting FRB, Sensor Monitoring

Page 58 highlights

Platform Management Functional Overview Intel® Server Board S1200V3RP 2. Reversion of temporary test modes for the BMC back to normal operational modes. 3. FP status LED and DIMM fault LEDs may not reflect BIOS detected errors. 6.6 Fault Resilient Booting (FRB) Fault resilient booting (FRB) is a set of BIOS and BMC algorithms and hardware support that allow a multiprocessor system to boot even if the bootstrap processor (BSP) fails. Only FRB2 is supported using watchdog timer commands. FRB2 refers to the FRB algorithm that detects system failures during POST. The BIOS uses the BMC watchdog timer to back up its operation during POST. The BIOS configures the watchdog timer to indicate that the BIOS is using the timer for the FRB2 phase of the boot operation. After the BIOS has identified and saved the BSP information, it sets the FRB2 timer use bit and loads the watchdog timer with the new timeout interval. If the watchdog timer expires while the watchdog use bit is set to FRB2, the BMC (if so configured) logs a watchdog expiration event showing the FRB2 timeout in the event data bytes. The BMC then hard resets the system, assuming the BIOS-selected reset as the watchdog timeout action. The BIOS is responsible for disabling the FRB2 timeout before initiating the option ROM scan and before displaying a request for a boot password. If the processor fails and causes an FRB2 timeout, the BMC resets the system. The BIOS gets the watchdog expiration status from the BMC. If the status shows an expired FRB2 timer, the BIOS enters the failure in the system event log (SEL). In the OEM bytes entry in the SEL, the last POST code generated during the previous boot attempt is written. FRB2 failure is not reflected in the processor status sensor value. The FRB2 failure does not affect the front panel LEDs. 6.7 Sensor Monitoring The BMC monitors system hardware and reports system health. Some of the sensors include those for monitoring:  Component, board, and platform temperatures  Board and platform voltages  System fan presence and tach  Chassis intrusion  Front Panel NMI  Front Panel Power and System Reset Buttons  SMI timeout  Processor errors The information gathered from physical sensors is translated into IPMI sensors as part of the IPMI Sensor Model. The BMC also reports various system state changes by maintaining virtual sensors that are not specifically tied to physical hardware. 46 Revision 1.0

  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12
  • 13
  • 14
  • 15
  • 16
  • 17
  • 18
  • 19
  • 20
  • 21
  • 22
  • 23
  • 24
  • 25
  • 26
  • 27
  • 28
  • 29
  • 30
  • 31
  • 32
  • 33
  • 34
  • 35
  • 36
  • 37
  • 38
  • 39
  • 40
  • 41
  • 42
  • 43
  • 44
  • 45
  • 46
  • 47
  • 48
  • 49
  • 50
  • 51
  • 52
  • 53
  • 54
  • 55
  • 56
  • 57
  • 58
  • 59
  • 60
  • 61
  • 62
  • 63
  • 64
  • 65
  • 66
  • 67
  • 68
  • 69
  • 70
  • 71
  • 72
  • 73
  • 74
  • 75
  • 76
  • 77
  • 78
  • 79
  • 80
  • 81
  • 82
  • 83
  • 84
  • 85
  • 86
  • 87
  • 88
  • 89
  • 90
  • 91
  • 92
  • 93
  • 94
  • 95
  • 96
  • 97
  • 98
  • 99
  • 100
  • 101
  • 102
  • 103
  • 104
  • 105
  • 106
  • 107
  • 108
  • 109
  • 110
  • 111
  • 112
  • 113
  • 114
  • 115
  • 116
  • 117
  • 118
  • 119
  • 120
  • 121
  • 122
  • 123
  • 124
  • 125
  • 126
  • 127
  • 128
  • 129
  • 130
  • 131
  • 132
  • 133
  • 134
  • 135
  • 136
  • 137
  • 138
  • 139
  • 140
  • 141
  • 142
  • 143
  • 144
  • 145
  • 146
  • 147
  • 148
  • 149
  • 150
  • 151
  • 152
  • 153
  • 154
  • 155
  • 156
  • 157
  • 158
  • 159
  • 160
  • 161
  • 162
  • 163
  • 164
  • 165
  • 166
  • 167
  • 168
  • 169
  • 170
  • 171
  • 172
  • 173
  • 174
  • 175
  • 176
  • 177
  • 178
  • 179
  • 180
  • 181
  • 182
  • 183
  • 184
  • 185
  • 186
  • 187
  • 188
  • 189
  • 190
  • 191
  • 192
  • 193
  • 194
  • 195
  • 196
  • 197
  • 198
  • 199
  • 200
  • 201
  • 202
  • 203
  • 204
  • 205
  • 206
  • 207
  • 208
  • 209
  • 210
  • 211
  • 212
  • 213
  • 214
  • 215
  • 216
  • 217
  • 218
  • 219
  • 220
  • 221
  • 222
  • 223
  • 224
  • 225
  • 226
  • 227
  • 228
  • 229
  • 230
  • 231
  • 232
  • 233
  • 234
  • 235
  • 236
  • 237
  • 238
  • 239
  • 240
  • 241
  • 242
  • 243
  • 244
  • 245
  • 246
  • 247
  • 248
  • 249
  • 250
  • 251
  • 252
  • 253
  • 254
  • 255
  • 256
  • 257
  • 258
  • 259
  • 260
  • 261
  • 262

Platform Management Functional Overview
Intel® Server Board S1200V3RP
2.
Reversion of temporary test modes for the BMC back to normal operational modes.
3.
FP status LED and DIMM fault LEDs may not reflect BIOS detected errors.
6.6
Fault Resilient Booting (FRB)
Fault resilient booting (FRB) is a set of BIOS and BMC algorithms and hardware support that
allow a multiprocessor system to boot even if the bootstrap processor (BSP) fails. Only FRB2 is
supported using watchdog timer commands.
FRB2 refers to the FRB algorithm that detects system failures during POST. The BIOS uses the
BMC watchdog timer to back up its operation during POST. The BIOS configures the watchdog
timer to indicate that the BIOS is using the timer for the FRB2 phase of the boot operation.
After the BIOS has identified and saved the BSP information, it sets the FRB2 timer use bit and
loads the watchdog timer with the new timeout interval.
If the watchdog timer expires while the watchdog use bit is set to FRB2, the BMC (if so
configured) logs a watchdog expiration event showing the FRB2 timeout in the event data bytes.
The BMC then hard resets the system, assuming the BIOS-selected reset as the watchdog
timeout action.
The BIOS is responsible for disabling the FRB2 timeout before initiating the option ROM scan
and before displaying a request for a boot password. If the processor fails and causes an FRB2
timeout, the BMC resets the system.
The BIOS gets the watchdog expiration status from the BMC. If the status shows an expired
FRB2 timer, the BIOS enters the failure in the system event log (SEL). In the OEM bytes entry
in the SEL, the last POST code generated during the previous boot attempt is written. FRB2
failure is not reflected in the processor status sensor value.
The FRB2 failure does not affect the front panel LEDs.
6.7
Sensor Monitoring
The BMC monitors system hardware and reports system health. Some of the sensors include
those for monitoring:
Component, board, and platform temperatures
Board and platform voltages
System fan presence and tach
Chassis intrusion
Front Panel NMI
Front Panel Power and System Reset Buttons
SMI timeout
Processor errors
The information gathered from physical sensors is translated into IPMI sensors as part of the
IPMI Sensor Model. The BMC also reports various system state changes by maintaining virtual
sensors that are not specifically tied to physical hardware.
Revision 1.0
46