HP Cluster Platform Interconnects v2010 Quadrics QsNetII Interconnect - Page 133
2. Test the node using qselantest and qsnetcabletest to confirm whether it is functional.
3. Test every other node in the segment using qselantest and qsnetcabletest to verify their integrity. A tool such as pdsh with dshbak is helpful for sorting the diagnostic output.

13.2 Troubleshooting Link Problems

A high occurrence of network errors reported by the qsnetstat or qsneterr diagnostics

Network errors are often caused by a badly seated or failing link component. You can locate the errors using the qsnetstat or qsneterr diagnostic tools. The cause may lie in the link cable, the cable connectors, or the switch cards. If errors are reported at the receiver during data transfer, the cause may also be at the other end of the interconnecting link.

Proceed as follows:

1. Identify the end points of the link reporting errors. An error location is identified by the switch module name, a switch card id, and either a port id or a chip and link. Typical output from the qsnetstat command is as follows:

    Name    B  C:L/Port  State    CRC      Clock      Data     Protocol
    QR1N07  6  07        ULink R  0 (0/0)  342 (8/2)  0 (0/0)  0 (0/0)
    QR0N11  2  0:6       Intnl N  4 (2/0)  0 (0/0)    0 (0/0)  0 (0/0)

   In the first row, the location 07 is a port id; in the second row, the colon in 0:6 indicates a chip:link location. Port ids are used for locations where a link cable connects to a switch card. Chip and link ids are used for internal links, either between chips on the same switch card or between chips on different switch cards connected through the midplane of the interconnect.
2. If the error is contained within a single switch card, replace the card and retest the network with the replacement installed.
3. Errors in links between switch cards that pass through the midplane of a federated node-level interconnect can lie in either of the cards or in the midplane itself. First, reseat both cards connected by the problem link. If this fails to clear the error, replace the QM502, which carries the up-links to the top-level interconnects. If the problem persists, replace the QM501, which carries the down-links to the cluster nodes. If replacing both cards does not fix the problem, the midplane may be at fault and should be replaced.
4. Errors reported at the ports of the switch cards may be due to the card, the port, or the interconnecting cable. Diagnosing the cause may disrupt the availability of the nodes local to the fault. To minimize downtime, configure out the link using the qsctrl -o command; you can then rectify the problem during a scheduled maintenance period. Configure the link back in using qsctrl -i before diagnosis proceeds. Diagnosis starts with the cables and moves on to the switch cards, as follows:
   a. Visually inspect and reseat both ends of the link cable. After reseating, a solid green LED should be lit next to the port and the red LED should not be lit. Use qsnetstat to monitor the link. If the link comes out of reset and the error counts stop incrementing, note its location and mark it as clean. If the link fails to come out of reset after several reseat attempts, replace the cable.

Troubleshooting Nodes and Links 13-3
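As an illustration of step 1, the following Python sketch parses rows in the format of the sample qsnetstat output and classifies each error location as a cable port or an internal chip:link, reporting only links with nonzero error counts. It is a hypothetical helper, not part of the Quadrics tools. It assumes the whitespace-separated layout of that sample: module name, switch card id, location, status field(s), then four counters (CRC, Clock, Data, Protocol), each written as a total followed by a bracketed pair. The meaning of the bracketed pair is not described in this section, so the sketch keeps only the leading total. Real qsnetstat output may differ; check the pattern against your own output before relying on it.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical parser for qsnetstat-style rows, assuming the column layout of
# the sample output in step 1. Not a Quadrics tool; adapt to your real output.

# A counter looks like "342 (8/2)": a total followed by a bracketed pair.
COUNTER = re.compile(r"(\d+)\s+\((\d+/\d+)\)")
COUNTER_NAMES = ("CRC", "Clock", "Data", "Protocol")


@dataclass
class LinkRecord:
    name: str               # switch module name, e.g. QR1N07
    card: str               # switch card id (column B)
    location: str           # port id, or chip:link for internal links
    status: str             # status field(s) before the counters, kept verbatim
    errors: Dict[str, int]  # counter name -> leading total

    @property
    def is_internal(self) -> bool:
        # A colon marks a chip:link location (internal or midplane link);
        # otherwise the location is a port id where a cable connects.
        return ":" in self.location


def parse_line(line: str) -> Optional[LinkRecord]:
    """Parse one data row; return None for the header or unrecognised rows."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[0] == "Name":
        return None
    name, card, location = tokens[:3]
    rest = " ".join(tokens[3:])
    counters = COUNTER.findall(rest)
    if len(counters) != len(COUNTER_NAMES):
        return None
    # Everything before the first counter is treated as the status field.
    status = rest[: COUNTER.search(rest).start()].strip()
    # Keep only the leading total; the bracketed pair is not interpreted.
    errors = {n: int(total) for n, (total, _pair) in zip(COUNTER_NAMES, counters)}
    return LinkRecord(name, card, location, status, errors)


def faulty_links(lines: Iterable[str]) -> Iterator[LinkRecord]:
    """Yield only the links that report at least one nonzero error total."""
    for line in lines:
        record = parse_line(line)
        if record is not None and any(record.errors.values()):
            yield record


SAMPLE = """\
Name    B  C:L/Port  State    CRC      Clock      Data     Protocol
QR1N07  6  07        ULink R  0 (0/0)  342 (8/2)  0 (0/0)  0 (0/0)
QR0N11  2  0:6       Intnl N  4 (2/0)  0 (0/0)    0 (0/0)  0 (0/0)
"""

if __name__ == "__main__":
    for rec in faulty_links(SAMPLE.splitlines()):
        kind = "internal chip:link" if rec.is_internal else "cable port"
        print(rec.name, rec.card, rec.location, kind, rec.errors)
```

With the sample rows, both links are reported: QR1N07 at port 07 with a Clock total of 342, and QR0N11 at chip:link 0:6 with a CRC total of 4. The port/chip:link split mirrors the procedure: port locations lead to step 4, while chip:link locations lead to step 2 or step 3. This sketch cannot tell whether an internal link stays on one card or crosses the midplane, so that decision still has to be made from the interconnect layout.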