HP Integrity Superdome SX2000 Cluster Installation and Configuration Guide - W - Page 29
Troubleshooting the Cluster, What to Do if Validation Tests Fail
• The NIC and switch redundancy layer is transparent to the IP layer.
• It may use standby, redundant team members to load balance your network traffic and improve performance for transmitted and received packets on the individual cluster node.
• It may use advanced redundancy mechanisms to improve the detection of failures in your network infrastructure, and to provide a proactive response to them. For example, cluster nodes continuously test their connectivity with each other but they cannot detect path failures when there is an external switch upstream. Active Path Failover is an advanced teaming feature that detects such failures, and fails over to a NIC that has a path to an Echo Node device (an external switch upstream).

If you are going to implement NIC teaming in your cluster networks, you should complete the following steps:
1. Plan your network infrastructure according to the cluster demands, taking into account NIC teaming configuration, redundant switches, routers, and so on.
2. Create the teams planned in the previous step for every cluster node.
3. Validate your cluster configuration.
4. Create your cluster.

For more information about NIC teaming issues in clustered environments, see the following document:
http://support.microsoft.com/kb/254101

Troubleshooting the Cluster

What to Do if Validation Tests Fail

In most cases, if any tests in the cluster validation wizard fail, then Microsoft does not consider the solution to be supported. There are exceptions to this rule, such as the case with multi-site (geographically dispersed) clusters where there is no shared storage. In this scenario the expected result of the validation wizard is that the storage tests will fail. This is still a supported solution if the remainder of the tests complete successfully. The type of test that fails is a guideline to the corrective action to take.
For example, if the storage test "List all disks" fails, and subsequent storage tests do not run (because these would also fail), contact the storage vendor to troubleshoot. Similarly, if a network test related to IP addresses fails, consult with your network infrastructure team. Resolving most warnings or errors requires working with internal teams or with a specific hardware vendor. After the issues have been resolved, rerun the cluster validation wizard. To be considered a supported configuration, all tests must run and complete without failures.

Validation Issues for Multi-site or Geographically Dispersed Failover Clusters

Failover cluster solutions that do not have a common shared disk and instead use data replication between nodes might not pass the cluster validation "storage" tests. This is a common configuration in cluster solutions where nodes are stretched across geographic regions. If a cluster solution does not require external storage to fail over from one node to another, it does not need to pass the "storage" tests to be a fully supported solution.

For more information on multi-site or geographically dispersed clusters, see the following white paper:
http://go.microsoft.com/fwlink/?LinkId=112125

Troubleshooting

See the following documents for more information about troubleshooting errors and interpreting system event descriptions in clusters: