Compaq ProLiant 1200 Architecting and Deploying High-Availability Solutions - Page 6
Remote Hot Sites (functional locations geographically distant from the primary operations center) are an option if the Recovery Point and Recovery Time for an application are not very critical. An example might be a billing application where the mailing of monthly statements could be delayed with minimal impact on the business.

Electronic Vaulting (a method of electronically storing, managing, and protecting data in a computer "vault" located off-site in a physically secure location) is an option if Recovery Point is more important than Recovery Time; for instance, if losing an indeterminate amount of data is unacceptable, or if historical data needs to be available online for reference. An example might be an inventory application where the most current transactions are recoverable by other means and the application can be restarted where it left off, using the historical data as a basis for inventory status. This is a good example of a data-centric operation.

On-line Hot Backup (data backup that is conducted while the system is in full operation) is necessary if the Recovery Time is more critical than the Recovery Point. A good example is an on-line traffic or production control system where history is not as important as the current state of the situation. In air-traffic control, where the planes were five minutes ago is not as critical as where they are now, because in five minutes each may have moved 50 miles, but in what direction? This is a classic example of a transaction-centric operation.

24 x 365 (continuous availability) is the only viable option where both the Recovery Point and Recovery Time are critical for an application.

Using the criteria of Recovery Point and Recovery Time, which state of availability is right for your organization?

3. What Causes Downtime?
After looking at your information systems, the user community, and the cost of downtime, you can determine the level of availability you need. Now it is time to focus on the events that can have a negative impact on your ability to keep an application, and an organization, up and running.

Component faults due to hardware, software, or interoperability issues. While the industry has come a long way in improving Mean Time Between Failure (MTBF) figures for individual hardware, packaging, and mechanical components, the interdependent nature of today's multivendor and networked solutions makes them vulnerable to hardware, software, and network interoperability problems.

Administrative intervention. Just because it's planned downtime doesn't mean it's not downtime. Management tasks like system maintenance, database backups, index builds, table reorganizations, cache changes, application/operating system updates, system re-configuration, and physical moves may require that a system be brought down. Or the intervention itself may cause a failure.

Building-level incidents. In addition to system problems, disasters affecting a site or building, such as fire, power loss, or flooding, can interrupt service by damaging systems, robbing them of power, or preventing access to them.

Metropolitan area disaster. Disasters such as floods, fires, and blackouts can also affect whole cities, impacting systems located throughout the metropolitan area.

Regional events. Computing can also be interrupted by disasters that affect systems across an even larger region. Hurricanes, earthquakes, or geopolitical disruptions can cause outages over hundreds of square miles.

Do you know the probability of each of these events affecting your operation? Do you know what will happen to your applications, particularly those in the "24 x 365" zone, in each of these cases? Do you know that minimizing the potential negative impact can cost less than the alternative?
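One way to start answering the probability and cost questions above is to weight each event category by how likely it is and how long an outage it would cause. The following is only a minimal sketch, not a method taken from this paper: the class and function names are hypothetical, every probability, duration, and hourly cost is a placeholder to be replaced with your own estimates, and it makes the simplifying assumption that each category occurs at most once per year.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DowntimeEvent:
    """One category of downtime event (hypothetical model)."""
    name: str
    annual_probability: float  # assumed chance of one occurrence per year
    outage_hours: float        # assumed hours of downtime if it occurs


def expected_annual_downtime_hours(events):
    """Sum the probability-weighted outage hours over all categories."""
    return sum(e.annual_probability * e.outage_hours for e in events)


def expected_annual_cost(events, cost_per_hour):
    """Convert expected downtime hours into an expected yearly cost."""
    return expected_annual_downtime_hours(events) * cost_per_hour


# Placeholder figures for illustration only; they are not measured data.
EVENTS = [
    DowntimeEvent("Component fault", 0.50, 4.0),
    DowntimeEvent("Administrative intervention", 1.00, 12.0),
    DowntimeEvent("Building-level incident", 0.05, 24.0),
    DowntimeEvent("Metropolitan area disaster", 0.01, 72.0),
    DowntimeEvent("Regional event", 0.005, 168.0),
]

if __name__ == "__main__":
    hours = expected_annual_downtime_hours(EVENTS)
    cost = expected_annual_cost(EVENTS, cost_per_hour=10_000.0)
    print(f"Expected downtime: {hours:.2f} hours/year")
    print(f"Expected cost:     {cost:,.0f} per year")
```

Comparing the resulting figure against the price of a given availability option gives a rough sense of whether minimizing the impact costs less than absorbing it.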
Understanding these factors is crucial to determining the level of availability required by your organization.

ECG064/1198