
HP Integrity rx2800 i2 Server User Service Guide - Page 80

Fault management overview, HP-UX fault management, WBEM indication providers




Fault management overview

The goal of fault management and monitoring is to increase system availability by moving from a reactive fault detection, diagnosis, and repair strategy to a proactive one. The objectives are as follows:

• To detect problems automatically, as close as possible to the time they actually occur.
• To diagnose problems automatically, at the time of detection.
• To automatically report, in understandable text, a description of the problem, its likely causes, the recommended actions to resolve it, and detailed information about it.
• To ensure that tools are available to repair or recover from the fault.

HP-UX fault management

Proactive fault prediction and notification is provided on HP-UX by SysFaultMgmt WBEM indication providers. WBEM provides frameworks for monitoring and reporting events.

SysFaultMgmt WBEM indication providers allow users to monitor the operation of a wide variety of hardware products and alert them immediately if any failure or other unusual event occurs. By using hardware event monitoring, users can virtually eliminate undetected hardware failures that could interrupt system operation or cause data loss.

WBEM indication providers

Hardware monitors are available to monitor the following components (these monitors are distributed free on the OE media):

• Server/fans/environment
• CPU monitor
• UPS monitor*
• FC hub monitor*
• FC switch monitor*
• Memory monitor
• Core electronics components
• Disk drives
• Ha_disk_array

NOTE: No SysFaultMgmt WBEM indication provider is currently available for components followed by an asterisk.
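The component list and the asterisk note can be captured as a small lookup table. The following is an illustrative sketch only: the `MONITORS` table and the `components_without_provider` helper are hypothetical names introduced here, not part of SysFaultMgmt or any HP tool.

```python
# Hypothetical table transcribed from the component list above.
# True means a SysFaultMgmt WBEM indication provider is available;
# False marks the components that carry an asterisk in the list.
MONITORS = {
    "Server/fans/environment": True,
    "CPU monitor": True,
    "UPS monitor": False,
    "FC hub monitor": False,
    "FC switch monitor": False,
    "Memory monitor": True,
    "Core electronics components": True,
    "Disk drives": True,
    "Ha_disk_array": True,
}


def components_without_provider(monitors):
    """Return the component names that have no indication provider."""
    return [name for name, has_provider in monitors.items() if not has_provider]


if __name__ == "__main__":
    for name in components_without_provider(MONITORS):
        print(name)
```

Keeping the availability flag next to each name avoids relying on the asterisk convention when the list is reused elsewhere.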

Errors and reading error logs

Event log definitions

Often the underlying root cause of an MCA event is captured by system or BMC firmware in both the System Event Log (SEL) and the Forward Progress Event Log (FP). These errors can be matched with MCA events by their timestamps. For example, the loss of a CPU VRM might cause a CPU fault. Decoding the MCA error logs alone would only identify the failed CPU as the most likely faulty CRU, while the SEL and FP entries with matching timestamps can point to the VRM as the underlying cause.

Following are some important points to remember about events and event logs:

• Event logs are the equivalent of the old server logs for status or error information output.
• Symbolic names are used in the source code; for example, MC_CACHE_CHECK.
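Matching an MCA event against SEL and FP entries by timestamp can be sketched as a simple time-window filter. This is a minimal sketch under stated assumptions: the record layout (a dictionary with `time`, `log`, and `text` keys), the `entries_near` helper, the 5-second default window, and the sample data are all hypothetical and do not reflect the actual log format.

```python
from datetime import datetime, timedelta


def entries_near(mca_time, log_entries, window_seconds=5):
    """Return the log entries whose timestamp lies within
    window_seconds before or after the MCA event time."""
    window = timedelta(seconds=window_seconds)
    return [entry for entry in log_entries
            if abs(entry["time"] - mca_time) <= window]


if __name__ == "__main__":
    # Made-up sample data for illustration only; real SEL and FP
    # entries carry more fields and a different layout.
    mca_time = datetime(2011, 3, 1, 10, 15, 30)
    sample_log = [
        {"time": datetime(2011, 3, 1, 9, 0, 0), "log": "FP",
         "text": "boot progress"},
        {"time": datetime(2011, 3, 1, 10, 15, 28), "log": "SEL",
         "text": "CPU VRM power lost"},
    ]
    for entry in entries_near(mca_time, sample_log):
        print(entry["log"], entry["time"].isoformat(), entry["text"])
```

The window is symmetric because a firmware-logged cause may be recorded slightly before or after the MCA event itself; a suitable width depends on the clocks involved.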
