Compaq ProLiant 1000 Architecting and Deploying High-Availability Solutions - Page 6

ECG064/1198

Remote Hot Sites (functional locations geographically distant from the primary operations center) are an option if the Recovery Point and Recovery Time for an application are not very critical. An example might be a billing application where the monthly statements could be delayed in mailing with minimal impact on a business.

Electronic Vaulting (a method of electronically storing, managing, and protecting data in a computer "vault" located off-site in a physically secure location) is an option if Recovery Point is more important than Recovery Time; if, for instance, losing an indeterminate amount of data is unacceptable or historical data needs to be available online for reference. An example might be an inventory application where the most current transactions are recoverable by other means and the application can be restarted where it left off using the historical data as a basis for inventory status. This is a good example of a data-centric operation.

On-line Hot Backup (data backup that is conducted while the system is in full operation) is necessary if the Recovery Time is more critical than the Recovery Point. A good example is an on-line traffic or production control system where history is not as important as the current state of the situation. In air-traffic control, where the planes were five minutes ago is not as critical as where they are now, because in five minutes they may have moved 50 miles each, but in what directions? This is a classic example of a transaction-centric operation.

24 x 365 (continuous availability) is the only viable option where both the Recovery Point and Recovery Time are critical for an application.

Using the criteria of Recovery Point and Recovery Time, which state of availability is right for your

organization?
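The four options above can be read as a two-by-two decision table on Recovery Point and Recovery Time. The following is a minimal sketch of that table, not part of the original paper. It assumes each criterion can be reduced to a simple yes/no "critical" flag, which is a simplification of the text's "more important than" wording, and the names `AvailabilityOption` and `choose_option` are hypothetical.

```python
from enum import Enum


class AvailabilityOption(Enum):
    """The four availability options described in the text."""
    REMOTE_HOT_SITE = "Remote Hot Sites"
    ELECTRONIC_VAULTING = "Electronic Vaulting"
    ONLINE_HOT_BACKUP = "On-line Hot Backup"
    CONTINUOUS = "24 x 365"


def choose_option(recovery_point_critical: bool,
                  recovery_time_critical: bool) -> AvailabilityOption:
    """Map the two criteria to an option (assumption: boolean criticality)."""
    if recovery_point_critical and recovery_time_critical:
        # Both matter: continuous availability is the only viable option.
        return AvailabilityOption.CONTINUOUS
    if recovery_point_critical:
        # Data loss matters more than restart speed (data-centric).
        return AvailabilityOption.ELECTRONIC_VAULTING
    if recovery_time_critical:
        # Restart speed matters more than history (transaction-centric).
        return AvailabilityOption.ONLINE_HOT_BACKUP
    # Neither is very critical, e.g. the billing example.
    return AvailabilityOption.REMOTE_HOT_SITE
```

For instance, the air-traffic control example corresponds to `choose_option(False, True)`, and the inventory example to `choose_option(True, False)`.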

3. What Causes Downtime?

After looking at your information systems, the user community, and the cost of downtime, you can

determine the level of availability you need. Now it is time to focus on the events that can have a negative

impact on your ability to keep an application – and an organization – up and running.

Component faults due to hardware, software, or interoperability issues.

While the industry has come a long way in improving Mean Time Between Failures (MTBF) figures for individual hardware, packaging, and

mechanical components, the interdependent nature of today's multivendor and networked solutions makes

them vulnerable to hardware, software, and network interoperability problems.

Administrative intervention.

Just because it's planned downtime doesn't mean it's not downtime.

Management tasks like system maintenance, database backups, index builds, table reorganizations, cache

changes, application/operating system updates, system re-configuration, and a physical move may require

that a system be brought down. Or the intervention itself may cause a failure.

Building-level incidents.

In addition to system problems, disasters affecting a site or building, such as fire,

power loss, or flooding, can interrupt service by damaging systems, robbing them of power, or preventing

access to them.

Metropolitan area disaster.

Disasters, such as floods, fire, and blackouts, can also affect whole cities,

impacting systems located throughout the metropolitan area.

Regional events.

Computing can also be interrupted by disasters that affect systems across an even larger

region. Hurricanes, earthquakes, or geopolitical disruptions can cause outages over hundreds of square

miles.

Do you know the probability of each of these events affecting your operation? Do you know what will

happen to your applications, particularly those in the “24 x 365” zone, in each of these cases? Do you know

that minimizing the potential negative impact can cost less than the alternative? Understanding these

factors is crucial to determining the level of availability required by your organization.
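One way to make these questions concrete is to weigh each event's probability against the outage it would cause. The sketch below is an illustration only and does not come from the original paper. It assumes each event can be summarized by an annual probability and an expected outage length in hours, and that downtime has a flat hourly cost. The function name `expected_annual_downtime_cost` and its parameters are hypothetical.

```python
from typing import Iterable, Tuple

# (event name, annual probability in [0, 1], expected outage in hours)
Event = Tuple[str, float, float]


def expected_annual_downtime_cost(events: Iterable[Event],
                                  cost_per_hour: float) -> float:
    """Sum probability * outage hours * hourly cost over all events.

    Assumption: events are independent and each occurs at most once a year.
    """
    total = 0.0
    for name, probability, outage_hours in events:
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"probability for {name!r} must be in [0, 1]")
        if outage_hours < 0:
            raise ValueError(f"outage hours for {name!r} must be non-negative")
        total += probability * outage_hours * cost_per_hour
    return total
```

Comparing this figure with the yearly cost of a given availability option gives a rough first answer to whether protection costs less than the alternative. Any inputs must come from your own operation, because the paper itself supplies no probabilities or costs.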
