HP 2128-M ClusterPack V2.4 Tutorial - Page 126
2.3.6 Restart application from a checkpoint if a Compute Node crashes

If a Compute Node crashes, jobs submitted to an AppRS queue will automatically be restarted on a new node or set of nodes as those resources become available. No user intervention is necessary.

2.3.7 Determine if the application fails to complete

The job state of EXIT is assigned to jobs that end abnormally.

Using the Clusterware Pro V5.1 Web Interface:

From the Jobs tab:
- Review the job states in the Jobs table.
- Use the Previous and Next buttons to view more Jobs.

Using the Clusterware Pro V5.1 CLI:

    % bjobs

References:
- 3.7.8 How do I access the Clusterware Pro V5.1 Web Interface?
- 3.7.9 How do I access the Clusterware Pro V5.1 Command Line Interface?

2.3.8 Check impact on the job if a Compute Node crashes

In the event that a Compute Node crashes or becomes unavailable, it may be desirable to check on jobs that may be affected by the situation.

Using the Clusterware Pro V5.1 CLI:
- List your current and recently finished jobs:
    % bjobs -a
- Request information on a particular job: