HP 1032 ClusterPack V2.4 Tutorial - Page 143
ClusterPack, Application ReStart AppRS Overview
3.4.1 What is AppRS?

AppRS is a collection of software that works in conjunction with Platform Computing's Clusterware™ to provide a fail-over system that preserves the current working directory (CWD) contents of applications in the event of a fail-over.

Many technical applications provide application-level checkpoint/restart facilities in which the application can save and restore its state from a file set. Checkpoint/restart is particularly helpful for long-running applications because it can minimize lost computing time due to computer failure. The usefulness of this capability is diminished, however, by two factors.

First, computer failure frequently leaves the restart files inaccessible. Using a shared file system does not preclude data loss and can introduce performance degradation. Redundant hardware solutions are often financially impractical for large clusters used in technical computing.

Second, applications affected by computer failure generally require human detection and intervention in order to be restarted from restart files. Valuable compute time is often lost between the time that the job fails and the time that a user is made aware of the failure.

Clusterware™ + AppRS provides functionality to migrate and restart applications affected by an unreachable host and to ensure that the content of the CWD of such applications is preserved across a migration.

AppRS is accessed by submitting jobs to AppRS-enabled queues. The names of such queues generally end in "_apprs". A number of utilities are also available for monitoring a job and its files:

- apprs_hist
- apprs_ls
- apprs_clean
- apprs_mpijob

More information is available in the man page or in the HP Application ReStart User's Guide.

% man apprs
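
To make the application-level checkpoint/restart idea described above concrete, the following is a minimal Python sketch of a job that periodically saves its state to a restart file in its working directory and resumes from that file when it is started again. It is not part of AppRS and does not use any AppRS interface; the file name `state.ckpt.json`, the functions `save_checkpoint`, `load_checkpoint` and `run`, and the `fail_at` parameter are all hypothetical names chosen for illustration. AppRS's role, as described above, is to preserve files such as this one across a migration and to restart the job.

```python
import json
import os
import tempfile

CHECKPOINT = "state.ckpt.json"  # hypothetical restart file name, kept in the CWD


def save_checkpoint(state, path=CHECKPOINT):
    """Write the state atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # rename over the old file in one step


def load_checkpoint(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no restart file exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(steps, every=10, path=CHECKPOINT, fail_at=None):
    """Sum the integers 0..steps-1, checkpointing every `every` steps.

    `fail_at` is an illustrative hook that simulates a host failure.
    """
    state = load_checkpoint(path)
    while state["step"] < steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated host failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

In this sketch, a run interrupted between checkpoints loses only the work done since the last save. Calling `run` again with the same restart file continues from the saved step instead of starting from zero, which is why keeping the CWD contents reachable after a host failure matters.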