HP Cluster Platform Interconnects v2010 HP Cluster Platform InfiniBand Interco - Page 107




9 Postinstallation Troubleshooting and Diagnostics
This chapter provides information on the interconnect firmware's logging and monitoring
functions. Use these functions to perform initial fabric debugging and to confirm device failures
when a problem is indicated by the interconnect or HCA LED status arrays.
This chapter is not a definitive list of cluster diagnostics. For complete diagnostics, see the
Voltaire InfiniBand Fabric Management and Diagnostic Guide and the HP Cluster Platform
InfiniBand Fabric Management and Diagnostic Guide.
Note on Terminology:
An HP Cluster Platform contains both Ethernet network switches (ProCurve switches) and a
system interconnect. In this case, InfiniBand is the system interconnect. However, it is common
usage throughout the industry to also refer to interconnects as switches.
To avoid confusion, the term interconnect is used consistently in the context of an HP cluster
platform. The software that resides on an interconnect often refers to a switch; you will see this
term in command interfaces and in command output. In this context, the term switch is
equivalent to interconnect.
The following quick diagnostics are described:
• Postinstallation troubleshooting (Section 9.1).
• Startup checks (Section 9.2).
• Debugging fabric failure using PM (Section 9.3).
• How to identify a bad leaf or spine port on an ISR 9XXX (Section 9.4).
• Detecting a failed port (Section 9.5).
9.1 Postinstallation Troubleshooting
Startup problems usually originate in a single component, and a faulty component is more
difficult to isolate than a faulty subsystem. When troubleshooting, first test each subsystem in
the ISR 9288 separately; because there are fewer subsystems than components, this narrows the
search quickly. The ISR 9XXX chassis consists of the following subsystems:
• Power supplies, which operate whenever rack power is connected.
• Chassis fan modules, which operate when system power is connected and stop when power
is disconnected.
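The subsystem-first approach can be sketched as a two-level search. This is a hypothetical illustration, not an HP or Voltaire tool: the chassis model, the subsystem and component names, and the test callbacks are all assumptions. Components are tested only when their enclosing subsystem fails, so each healthy subsystem costs a single test.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical chassis model: each subsystem maps to the components it
# contains. Names are illustrative only, not real ISR 9XXX identifiers.
Chassis = Dict[str, List[str]]


def isolate_fault(
    chassis: Chassis,
    subsystem_ok: Callable[[str], bool],
    component_ok: Callable[[str], bool],
) -> Tuple[List[str], int]:
    """Test every subsystem first, then only the components of failing subsystems.

    Returns the faulty components found and the total number of tests run.
    """
    faulty: List[str] = []
    tests = 0
    for subsystem, components in chassis.items():
        tests += 1
        if subsystem_ok(subsystem):
            continue  # healthy subsystem: skip its components entirely
        for component in components:
            tests += 1
            if not component_ok(component):
                faulty.append(component)
    return faulty, tests
```

With three hypothetical power supplies, four fan modules, and one failing fan, this search runs 6 tests (two subsystem tests plus four fan tests) instead of testing all 7 components individually.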
The following are simple checks you can make to determine whether there is a fan problem:
• Listen to the fan modules to confirm that they are operating.
• Check for any obstructions restricting airflow through the ISR 9XXX.
If you determine that a fan is not operating, contact your HP customer service representative.
9.2 Startup Checks for the ISR 9XXX
After making a configuration change, such as replacing a failed module, use the following
procedure as a startup check:
1. Listen for chassis fan operation. If the fans do not operate, they may need to be replaced.
Continue to Step 2 to determine whether the power supplies are operational. If you determine
that the power supplies are functioning normally and that the fans are faulty, contact a
customer service representative. If the ISR 9288 fans do not function properly at initial
startup, contact a customer service representative; there are no installation adjustments
that you can make.
2. Check the power supply LEDs on the rear panel. The power LEDs illuminate as soon as power
is connected to the ISR 9XXX. If the LEDs are not lit, the power supplies may need to be
replaced.
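The two-step startup check can be summarized as a small decision function. This is a hypothetical sketch: the function name and its two boolean inputs (whether the fans are heard running, and whether the rear-panel power LEDs are lit) are assumptions that stand in for the physical observations described in Steps 1 and 2.

```python
from typing import List


def startup_check(fans_running: bool, psu_leds_on: bool) -> List[str]:
    """Hypothetical encoding of the two-step ISR 9XXX startup check.

    fans_running: Step 1 observation (chassis fans heard operating).
    psu_leds_on:  Step 2 observation (power supply LEDs lit on the rear panel).
    Returns the follow-up actions suggested by the procedure (empty if all is well).
    """
    actions: List[str] = []
    # Step 2: the power LEDs should light as soon as power is connected.
    if not psu_leds_on:
        actions.append("power supplies may need to be replaced")
    # Step 1: silent fans while the power supplies look healthy
    # point to the fans themselves.
    if not fans_running and psu_leds_on:
        actions.append("fans are faulty: contact a customer service representative")
    return actions
```

For example, startup_check(False, True) returns only the fan fault action, because the power supplies appear healthy while the fans are silent.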