HP Cluster Platform Interconnects v2010 Quadrics QsNetII Interconnect - Page 134
b. On the second occurrence of an error at the same link location, repeat the procedure described in Step a.
c. On the third occurrence of an error at the same location, replace the cable.
d. On the first occurrence of an error at the same location with the new cable, replace one of the switch cards to which the cable is connected. If the errors clear, the replaced switch card was at fault. If the error persists, replace the other switch card.

A link appears to be disconnected

A disconnected link is indicated in the Links in Reset panel in the output from the qsnetstat command. Proceed as follows:

1. Links go into reset if they are the down links to nodes that have been rebooted or powered down. Any uplinks from federated node-level interconnects to top-level interconnects that have been powered off also appear in reset. In these cases no further action is required.
2. Links with high error counts may go into reset. Physically disconnecting the link cable from the switch card port also puts the link into reset. Follow the procedures described in the preceding diagnostic (A high occurrence of network errors seen using the qsnetstat or qsneterr diagnostics).

Poor application performance indicated by nodes hanging or low bandwidth or high latency

Poor performance might be caused by unexpected processes running on the compute nodes of a cluster. If these can be ruled out, a possible cause is a high occurrence of network errors. Proceed as follows:

1. Run qsnetstat to identify the link(s) with high error rates.
2. Examine monitoring log histories to look for a specific event that might have caused the problem or that enables you to identify when the problem started.
3. Use the link location as an argument to the qsctrl command and configure the link out of the network.
4. If application characteristics are restored, diagnose the fault using the process described in the diagnostic (A high occurrence of network errors seen using the qsnetstat or qsneterr diagnostics). If the application is still not behaving as expected, the network is probably not the cause. Escalate the problem to the next level of support.

You suspect that a link cable replacement is required

You suspect that a link cable needs replacement due to high error counts or possible physical damage to the cable (such as exceeding its bend radius, which might cause invisible damage). Proceed as follows:

1. Route a temporary link cable to replace the suspected bad cable and run diagnostics with the new cable in place.
2. Do not replace the original cable until the test is complete. It is possible that the fault is in the port connector or the switch card itself. If the fault persists with the replacement cable then further diagnosis is required.
3. If the fault is not found in the replaced cable then reconnect the original cable and remove the temporary link.

13-4 Troubleshooting Nodes and Links
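
The repeated-error escalation in steps b to d at the top of this page can be read as a small decision table. The Python sketch below is illustrative only and is not part of any Quadrics or HP tool: the LinkHistory record, its field names, and the next_action function are hypothetical names invented for this example. It assumes that Step a (on the preceding page, not reproduced here) is the action for the first occurrence, and it does not call qsnetstat or qsctrl, so the counts must be supplied by the operator.

```python
from dataclasses import dataclass


@dataclass
class LinkHistory:
    """Hypothetical per-link record kept by the operator (not a Quadrics structure)."""
    occurrences_before_swap: int = 0        # errors seen at this link location on the original cable
    cable_replaced: bool = False            # True once step c has been carried out
    error_since_last_change: bool = False   # error seen again after the latest hardware change
    cards_replaced: int = 0                 # switch cards already replaced at this link (0, 1 or 2)


def next_action(h: LinkHistory) -> str:
    """Return the next step of the escalation for one link location."""
    if not h.cable_replaced:
        if h.occurrences_before_swap == 0:
            return "no action"
        if h.occurrences_before_swap < 3:
            # Assumption: Step a covers the first occurrence; step b repeats it on the second.
            return "perform Step a"
        return "replace the cable"              # step c: third occurrence
    if not h.error_since_last_change:
        return "no action"                      # errors cleared: last replaced part was at fault
    if h.cards_replaced == 0:
        return "replace one switch card"        # step d, first card
    if h.cards_replaced == 1:
        return "replace the other switch card"  # step d, error persists
    # Both cards replaced and errors continue: the page gives no further step.
    return "outside the documented steps"


if __name__ == "__main__":
    print(next_action(LinkHistory(occurrences_before_swap=2)))
    print(next_action(LinkHistory(occurrences_before_swap=3)))
```

Keeping the history as an explicit record makes the "same link location" condition visible: the counters apply to one location only and should start from zero for any other link.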