HP Integrity rx2800 i2 User Service Guide - Page 85

Troubleshooting the server CPU, CPU load order, CPU module behaviors, Customer messaging policy



Troubleshooting the server CPU
The server supports both single- and dual-core CPUs. Each server supports one or two CPU modules.
Each dual-core CPU module contains two individual CPU cores. This results in four physical CPU
cores when two dual-core CPU modules are installed in the server.
Furthermore, each physical CPU core contains logic to support two hardware threads. This results
in up to eight threads, or the equivalent of eight logical CPUs, when two dual-core CPU modules
are installed and enabled in the server.
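The counts above follow from simple multiplication. A minimal sketch of that arithmetic; the function name `logical_cpu_count` and its parameters are illustrative assumptions, not part of any HP tool:

```python
def logical_cpu_count(modules: int, cores_per_module: int,
                      threads_per_core: int = 2) -> int:
    """Return the number of logical CPUs (threads) for a configuration."""
    if modules not in (1, 2):
        raise ValueError("the server supports one or two CPU modules")
    if cores_per_module not in (1, 2):
        raise ValueError("the text describes single- and dual-core CPUs only")
    physical_cores = modules * cores_per_module
    return physical_cores * threads_per_core


# Two dual-core modules: 2 * 2 = 4 physical cores, 4 * 2 = 8 logical CPUs.
print(logical_cpu_count(modules=2, cores_per_module=2))  # 8
```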
CPU load order
For a minimally loaded server, one CPU module must be installed in CPU slot 0 on the system
board, and its threads must be enabled by user action. An additional CPU module of the same
revision is installed in CPU slot 1.
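The load-order rule can be expressed as a small check. This is a sketch under stated assumptions: the function `check_cpu_load_order` and the idea of passing each slot's revision as a string (or `None` for an empty slot) are hypothetical and do not correspond to a real firmware interface.

```python
from typing import List, Optional


def check_cpu_load_order(slot0_rev: Optional[str],
                         slot1_rev: Optional[str]) -> List[str]:
    """Return load-order problems; an empty list means the layout is valid."""
    problems = []
    if slot0_rev is None:
        # The first (or only) CPU module must occupy slot 0.
        problems.append("CPU slot 0 is empty")
    elif slot1_rev is not None and slot1_rev != slot0_rev:
        # A second module must be the same revision as the first.
        problems.append("CPU slot 1 revision does not match slot 0")
    return problems


print(check_cpu_load_order("rev-A", None))     # []
print(check_cpu_load_order(None, "rev-A"))     # ['CPU slot 0 is empty']
```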
CPU module behaviors
Local MCA events can cause a physical CPU core, and one or both of its logical CPUs within that
CPU module, to fail while all other physical CPUs and their logical CPUs continue operating.
A double-bit data cache error in any physical CPU core causes a Global MCA event, which causes
all logical and physical CPUs in the server to fail and reboots the operating system.
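To illustrate the difference in scope between the two event types, the sketch below lists which logical CPUs could be lost. It assumes the worst case for a local MCA (both threads of the affected core fail), and the function name and the `(core, thread)` numbering are purely illustrative.

```python
def cpus_failed_by_mca(scope, failing_core, total_cores, threads_per_core=2):
    """Return the (core, thread) pairs that may fail, assuming the worst case."""
    everything = [(core, thread)
                  for core in range(total_cores)
                  for thread in range(threads_per_core)]
    if scope == "global":
        # A Global MCA takes down every physical and logical CPU.
        return everything
    if scope == "local":
        # A Local MCA is confined to one physical core and its threads.
        return [(core, thread) for core, thread in everything
                if core == failing_core]
    raise ValueError(f"unknown MCA scope: {scope!r}")


print(cpus_failed_by_mca("local", failing_core=1, total_cores=4))
# [(1, 0), (1, 1)]
```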
Customer messaging policy
• A diagnostic LED lights only for physical CPU core errors that are isolated to a specific IPF
CPU module. If there is any uncertainty about a specific CPU, the customer is pointed to
the SEL for any action, and the suspect IPF CPU module CRU LED on the System Insight Display
does not light up.
• For configuration-type errors (for example, when there is no IPF CPU module installed in CPU
slot 0), the CRU LEDs on the diagnostic LED panel light up for all of the IPF CPUs that are
missing.
• No diagnostic messages are reported for single-bit errors that are corrected in the instruction
or data caches during corrected machine check (CMC) events on any physical CPU core.
• Diagnostic messages are reported for CMC events when thresholds are exceeded for single-bit
errors; fatal CPU errors cause global or local MCA events.
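The policy above amounts to a small decision table. A minimal sketch follows; the error-kind labels and the returned strings are invented here for illustration and are not actual firmware messages.

```python
def customer_message(error_kind, isolated_to_module=False,
                     threshold_exceeded=False):
    """Map an error situation to the customer-facing action in the policy."""
    if error_kind == "core_error":
        if isolated_to_module:
            return "light the CPU module CRU LED"
        # Any uncertainty: no LED, point the customer to the SEL instead.
        return "refer customer to the SEL; LED stays off"
    if error_kind == "config_error":
        return "light CRU LEDs for all missing CPU modules"
    if error_kind == "corrected_single_bit":
        if threshold_exceeded:
            return "report diagnostic message"
        return "no diagnostic message"
    raise ValueError(f"unknown error kind: {error_kind!r}")


print(customer_message("core_error", isolated_to_module=True))
print(customer_message("corrected_single_bit"))
```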
Table 35 CPU events that light SID LEDs

| Diagnostic LEDs | Sample IPMI Events | Cause | Source | Notes |
|---|---|---|---|---|
| CPUs | Type E0h, 39d:04d BOOT_DECONFIG_CPU | CPU failed and deconfigured | SFW | This event follows other failed CPUs |
| CPUs | Type E0h, 5823d:26d PFM_CACHE_ERR_PROC | Too many cache errors detected by processor | WIN Agent | Threshold exceeded for cache parity errors on CPU |
| CPUs | Type E0h, 5824d:26d PFM_CORR_ERROR_MEM | Too many corrected errors detected by platform | WIN Agent | Threshold exceeded for cache errors from CPU corrected by ICH10 |
| CPUs | Type 02h, 02h:07h:03h VOLTAGE_DEGRADES_TO_NON_RECOVERABLE | Voltage on CRU is inadequate | BMC | Power Pod voltage is out of range (likely too low) |
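When reading SEL output, the entries of Table 35 can be kept in a lookup structure keyed by event name. The sketch below only transcribes the four sample events from the table; the dictionary layout and the helper `describe_event` are assumptions made for illustration and are not an HP or IPMI API.

```python
# Sample events from Table 35, keyed by event name.
CPU_SID_EVENTS = {
    "BOOT_DECONFIG_CPU": {
        "type": "E0h", "code": "39d:04d", "source": "SFW",
        "cause": "CPU failed and deconfigured",
    },
    "PFM_CACHE_ERR_PROC": {
        "type": "E0h", "code": "5823d:26d", "source": "WIN Agent",
        "cause": "Too many cache errors detected by processor",
    },
    "PFM_CORR_ERROR_MEM": {
        "type": "E0h", "code": "5824d:26d", "source": "WIN Agent",
        "cause": "Too many corrected errors detected by platform",
    },
    "VOLTAGE_DEGRADES_TO_NON_RECOVERABLE": {
        "type": "02h", "code": "02h:07h:03h", "source": "BMC",
        "cause": "Voltage on CRU is inadequate",
    },
}


def describe_event(name):
    """Return a one-line summary for a known event, or a fallback string."""
    event = CPU_SID_EVENTS.get(name)
    if event is None:
        return f"{name}: not listed in Table 35; consult the SEL"
    return f"{name} (Type {event['type']}, {event['code']}): {event['cause']} [{event['source']}]"


print(describe_event("PFM_CACHE_ERR_PROC"))
```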