HP Cluster Platform Interconnects v2010 Quadrics QsNetII Interconnect - Page 149

D Output From the qselantest Utility

Using the qselantest utility is described in Section 12.3. The output from the test is extensive and is described in the following sections:

• Device information (dev_info) is described in Section D.1.
• PCI bus information is described in Section D.2.
• Thread processor information is described in Section D.3.
• SDRAM memory information is described in Section D.4.

The qselantest utility can fail if a faulty component is detected or if the link cable to the interconnect network is badly seated or missing.

If the failing test is one of the qsnet2_dmatest tests, the most likely cause is the link cable connection to the network. Remove, inspect, and reseat the cable at both ends. Make sure that the locking catch is fully seated. If a qsnet2_dmatest run continues to fail, replace the link cable.

If a qsnet2_regtest test fails, the most likely cause is that the QM500 (Elan4) network device driver is not loaded, or that the QM500 PCI adapter is badly seated or broken.

If the qsnet2_tmemtest test fails, there is a problem in the QM500 PCI card's SDRAM memory and you must replace the card.

The QM500 network adapter has 64 Mbytes of local SDRAM. Its memory system can detect and correct a variety of single and multiple bit errors. The device driver logs these errors and also reports them on the console in the following format:

    elan0: ECC memory error
    Address=001e71d0 Syndrome=13 Correctable
    elan0: ECC memory error
    Address=001e71c0 Syndrome=13 Correctable Multiple Errors

An uncorrectable error causes the node to panic. The Multiple Errors message is displayed if another correctable error occurs before the device driver handles a pending error. A low rate of correctable errors is to be expected; approximately 8 per day on a system with 1000 QM500 network adapters is usual. Memory errors are logged in the console files for each node. They can also be seen in the output from the dmesg command.
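When reviewing console files or dmesg output from many nodes, it can help to extract the ECC messages automatically. The following is a minimal Python sketch, not part of the qselantest tooling. It assumes the messages look exactly like the two samples shown above. The names parse_ecc_errors and expected_correctable_per_day are hypothetical. Scaling the quoted rate of 8 errors per day per 1000 adapters linearly to other system sizes is also an assumption.

```python
import re
from collections import Counter

# Pattern modelled only on the two sample console messages in this appendix.
# \s+ lets it match whether a message sits on one line or is wrapped onto two.
ECC_RE = re.compile(
    r"(?P<dev>elan\d+): ECC memory error\s+"
    r"Address=(?P<address>[0-9a-fA-F]+)\s+"
    r"Syndrome=(?P<syndrome>[0-9a-fA-F]+)\s+"
    r"Correctable(?P<multiple>\s+Multiple Errors)?"
)


def parse_ecc_errors(log_text):
    """Return one dict per correctable ECC message found in log_text."""
    events = []
    for match in ECC_RE.finditer(log_text):
        events.append({
            "device": match.group("dev"),
            "address": match.group("address"),    # kept as text, as logged
            "syndrome": match.group("syndrome"),  # kept as text, as logged
            "multiple": match.group("multiple") is not None,
        })
    return events


def errors_per_device(events):
    """Count parsed ECC messages for each device name (for example elan0)."""
    return Counter(event["device"] for event in events)


def expected_correctable_per_day(adapters):
    """Scale the quoted figure (about 8 per day per 1000 adapters).

    Linear scaling is an assumption made for illustration only.
    """
    return 8.0 * adapters / 1000.0


# Sample text copied from the console format shown above.
SAMPLE = """\
elan0: ECC memory error
Address=001e71d0 Syndrome=13 Correctable
elan0: ECC memory error
Address=001e71c0 Syndrome=13 Correctable Multiple Errors
"""

if __name__ == "__main__":
    parsed = parse_ecc_errors(SAMPLE)
    for event in parsed:
        print(event)
    print(dict(errors_per_device(parsed)))
    print(expected_correctable_per_day(128))
```

In practice the input would be the text of a node's console file or captured dmesg output rather than the embedded sample.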
Use the elandebug command to display a summary of device error statistics as follows:

    # /usr/opt/rms/diag/bin/elandebug device_error_stats
    ...
    CorrectableErrors 0(0)
    UncorrectableErrors 0(0)
    MultipleErrors 0(0)

If you are escalating a problem report to a support organization, use the following command to obtain information about the card:

    # cat /proc/qsnet/elan4/device0/vpd
    PN: QM500b
    EC: XX
    SN: K76A7A1BFPN471
    FN: A1
    MN: 0000
    Z0: 686a232b
    MT: 0104-023-112007
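If the output of these two commands is collected from many nodes for a support report, a small script can turn it into structured data. The following Python sketch is illustrative only and assumes the output has exactly the layout of the samples above. The function names parse_error_stats and parse_vpd are hypothetical. This appendix does not say what the value in parentheses after each counter means, so the sketch simply keeps both numbers without interpreting them.

```python
import re

# Matches counter lines such as "CorrectableErrors 0(0)".
# The meaning of the parenthesised number is not given in this appendix.
COUNTER_RE = re.compile(
    r"^(?P<name>\w+Errors)[ \t]+(?P<first>\d+)\((?P<second>\d+)\)[ \t]*$",
    re.MULTILINE,
)


def parse_error_stats(text):
    """Map each counter name to its pair of integers (value, bracketed value)."""
    return {
        match.group("name"): (int(match.group("first")), int(match.group("second")))
        for match in COUNTER_RE.finditer(text)
    }


def parse_vpd(text):
    """Map each 'KEY: value' line of the vpd file output to a dict entry."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Samples copied from the command output shown above.
STATS_SAMPLE = """\
...
CorrectableErrors 0(0)
UncorrectableErrors 0(0)
MultipleErrors 0(0)
"""

VPD_SAMPLE = """\
PN: QM500b
EC: XX
SN: K76A7A1BFPN471
FN: A1
MN: 0000
Z0: 686a232b
MT: 0104-023-112007
"""

if __name__ == "__main__":
    print(parse_error_stats(STATS_SAMPLE))
    print(parse_vpd(VPD_SAMPLE))
```

Keeping the vpd values as strings avoids losing leading zeros in fields such as MN.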
