HP Cluster Platform Interconnects v2010 Quadrics QsNetII Interconnect - Page 132
QM500 driver unable to determine network position

The QM500 (Elan) driver reports that it is unable to determine network position. The QM500 PCI adapter is found, but the driver is unable to communicate with the network through the card. Proceed as follows:

1. Verify that the card is actually functioning by using qsnelantest. It is possible that the driver is able to communicate only partially with the card. If qselantest fails, it is likely that the card is poorly seated in its PCI connector. Reseat the card.
2. Check the green LED at both ends of the link cable. If the green LEDs are not lit (or are lit at only one end), it is likely that the cable is faulty. Try reseating the cable connections.
3. If reseating the cable connections does not help, swap the cable for a replacement that you know to be good.

Node has an incorrect nodeset

You can determine the nodeset by examining the /proc/elan/device0/nodeset file. An anomalous nodeset can mean either that the QM500 network adapter is malfunctioning intermittently, or that there is a fault in the interconnect network above the problem node. Proceed as follows:

1. Use a tool such as pdsh with dshbak to view the nodeset on every node and collate the returned data. The nodeset information is contained in the procfs, in the text file /proc/qsnet/ep/rail0/nodeset.
2. A contiguous group of nodes with a broken nodeset suggests that the error is in the interconnect network. Run network diagnostics.
3. Isolated nodes with broken nodesets are more likely to indicate a broken or poorly seated QM500 card. Reseat the card.

QM500 (Elan) driver displays unusual messages

Unexpected driver messages might be displayed, such as the following:

Rev A switch detected...
...change in network level.....

You might see these messages in conjunction with a nodeset problem, as described in the preceding troubleshooting symptoms. Proceed as follows:

1. The QM500 network adapter is either faulty or needs to be reseated in its PCI connection. Test the card with a diagnostic and reseat the card.
2. A useful way of detecting nodes with Elan driver problems is to route all syslog kernel messages from the nodes to a log host. Configure this routing in syslog.conf in the node system images. You can then examine the syslog log file on the log host by using the tail command.

Applications receive signal 6 (I/O trap) on the node

Signal 6 indicates a QM500 hardware exception. You can find further information by using edb on the core produced (this is done by default when the exception occurs). Exceptions usually mean that a node is generating hardware errors. Proceed as follows:

1. It is possible that this node is on the receiving end of a hardware error generated elsewhere in the network. Configure the node out of the network by using qsctrl -o. If the exception moves to another node, it is a sign that the node itself is not the cause of the problem.
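
The nodeset triage described under "Node has an incorrect nodeset" (contiguous groups of bad nodes point at the interconnect network, isolated bad nodes point at the card) can be sketched in Python once the per-node nodeset values have been collected. This is a minimal illustration, not part of the Quadrics or HP tooling. It assumes that the input is text in the usual pdsh "hostname: output" form, that hostnames end in a numeric index whose order reflects network adjacency, and that the nodeset reported by the majority of nodes is the correct one. The function names and the sample values are hypothetical.

```python
import re
from collections import Counter


def parse_pdsh_output(text):
    """Turn 'hostname: value' lines into a {hostname: value} dict."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        host, _, value = line.partition(":")
        result[host.strip()] = value.strip()
    return result


def node_index(name):
    """Return the trailing integer of a hostname, e.g. 'n12' -> 12."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"hostname has no numeric suffix: {name!r}")
    return int(match.group(1))


def classify_nodesets(nodesets):
    """Split nodes with anomalous nodesets into contiguous runs and isolated nodes.

    Assumption: the most common nodeset value is the correct one.
    Returns (expected_value, contiguous_runs, isolated_nodes).
    """
    expected, _ = Counter(nodesets.values()).most_common(1)[0]
    bad = sorted(
        (name for name, value in nodesets.items() if value != expected),
        key=node_index,
    )

    # Group bad nodes whose numeric indices are consecutive.
    groups = []
    for name in bad:
        if groups and node_index(name) == node_index(groups[-1][-1]) + 1:
            groups[-1].append(name)
        else:
            groups.append([name])

    runs = [g for g in groups if len(g) > 1]          # suspect the network
    isolated = [g[0] for g in groups if len(g) == 1]  # suspect the card
    return expected, runs, isolated


if __name__ == "__main__":
    # Hypothetical sample data, not real nodeset output.
    sample = """\
n0: [0-7]
n1: [0-7]
n2: [0-1]
n3: [0-1]
n4: [0-7]
n5: [5]
n6: [0-7]
n7: [0-7]
"""
    expected, runs, isolated = classify_nodesets(parse_pdsh_output(sample))
    print("expected nodeset:", expected)
    print("contiguous runs (run network diagnostics):", runs)
    print("isolated nodes (reseat the QM500 card):", isolated)
```

Treating the majority value as correct is only a heuristic. If more than half of the nodes report a broken nodeset, pass in the known-good value instead of deriving it.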