In depth: Planning for disaster recovery

Feb 20, 2001
Auerbach Analysis

Every business faces risk factors that, unmanaged, carry the potential to disrupt operations or, in extreme cases, bring down the organization. By definition, risk contains elements that cannot be controlled. However, the risks faced by your organization can be assessed and managed; and the consequences of an unforeseen event can be controlled through the creation and implementation of disaster recovery plans.

In "Risk and the need for disaster planning," an article from Auerbach Publications, authors Denise Johnson McManus and Houston H. Carr cite historical examples to demonstrate how recent disasters have led companies to recognize the importance of disaster assessment, management, and recovery planning.

The article also outlines the major components of a disaster recovery plan.

Auerbach Publications on TechRepublic
For 40 years, Auerbach Publications has been publishing premier content for IT professionals. You can find many of its enterprise computing articles at TechRepublic. You can read more Auerbach Publications articles by clicking here.

By Denise Johnson McManus and Houston H. Carr

Introduction
Articles abound about the risk to business continuity that is always present due to the erratic actions of Mother Nature. Risk is often equated with external forces (e.g., natural disasters, such as floods, hurricanes, or earthquakes) that present the risk of power disruption, building destruction, or worse. Less obvious is the risk inherent in the adoption of a new computer-based system or the distribution of data processing and data storage across a country or the world via telecommunications networks. One would more readily view the risk in the telecommunications system than the risk associated with the new capability. Furthermore, equal risk may be present in disruption due to labor disputes, labor shortages, a poorly run or missing training program, or a flu epidemic that takes out one-half of the personnel for a week.

This article discusses risk in its more generic or basic form—not its outcome as the result of a fire, flood, or earthquake. Risk is inherent in any organization, in any operation, in any situation where the goal is continuance. There are ways to assess and manage this risk; however, first an examination of the nature of risk is necessary. Then, the reaction to risk will be addressed.

The nature of risk
According to Webster's Dictionary, "risk" is "the possibility of loss or injury; also, the degree of the probability of such loss." The four components of risk are threats, resources, modifying factors, and consequences. Threats are the broad range of forces capable of producing adverse consequences. Resources consist of the assets, people, or earnings potentially affected by threats. Modifying factors are the internal and external factors that influence the probability of a threat becoming a reality or the severity of consequences when the threat materializes. Consequences have to do with the way the threat manifests its effects on the resources and the extent of those effects.
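The four components can be captured in a small data structure. The following is a minimal sketch rather than a method taken from the article: the class name `Risk`, its field names, and the sample figures are hypothetical, and treating modifying factors as simple multipliers on probability and severity is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    """One risk described by the four components (all names are hypothetical)."""
    threat: str                 # force capable of producing adverse consequences
    resource: str               # asset, people, or earnings exposed to the threat
    annual_probability: float   # chance the threat materializes in a year (0 to 1)
    consequence_cost: float     # loss to the resource if the threat materializes
    # Modifying factors, modeled here (an assumption) as multipliers below 1.0
    # that reduce either the probability or the severity.
    probability_factors: list = field(default_factory=list)
    severity_factors: list = field(default_factory=list)

    def expected_annual_loss(self) -> float:
        """Probability times cost, after applying every modifying factor."""
        probability = self.annual_probability
        for factor in self.probability_factors:
            probability *= factor
        cost = self.consequence_cost
        for factor in self.severity_factors:
            cost *= factor
        return probability * cost


# Illustrative numbers only; they do not come from the article.
power_outage = Risk(
    threat="power outage",
    resource="file server",
    annual_probability=0.5,
    consequence_cost=40_000.0,
    severity_factors=[0.25],    # e.g. a UPS that limits the damage
)
print(power_outage.expected_annual_loss())
```

Keeping the modifying factors separate from the raw probability and cost makes it easy to see what each safeguard contributes when it is added or removed.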

Risk becomes loss when there is some adverse change in existing or expected circumstances. Change produces the uncertainty inherent in risk. No one can be sure if and when change will take place, nor can one be certain about the consequences of change. From an organizational standpoint, change may be internal or external. Because internal change is by definition controllable, an organization can respond to the risk associated with internal change in a proactive fashion. For example, the installation of a new management-ordered procedure invokes change. Part of the procedure-creating process should be a contingency plan in case some of the people or resources are temporarily not available.

External change, on the other hand, cannot be controlled by the organization, so the response to it is often reactive. An example would be a new tax law and its financial consequences. To the degree that change can be anticipated, a proactive response is preferred. In any case, and for any risk environment, organizations should prepare for unforeseen incidents through risk assessment and management.

TechRepublic and Auerbach Publications
This article first appeared in the December-January 2001 issue of the Auerbach Information Management Service journal Information Management: Strategy, Systems and Technologies. It appears here under agreement with Auerbach Publications. For information on subscribing to this journal or to see a list of previously published topics, click here. To find out about other Auerbach publications, click here.
Risk assessment and management
In the use of any technology, process, or procedure, someone should determine where unexpected or undesired consequences are likely to occur. Managers must think about their objectives, the systems and procedures they have installed to achieve those objectives, and the weak points in the equipment, staffing, and procedures. When risks are detected and recognized in advance, the adverse consequences will be less catastrophic than when the risks are ignored.

Risk assessment and analysis involve a methodical investigation of the organization, its resources, personnel, procedures, and objectives to determine points of weakness. Having found such points, managers can deliberately control the risk either by passing it to someone else (through insurance or by outsourcing the task) or by strengthening the weak points through changes or added redundancy.

Risk management is the science and art of recognizing the existence of threats, determining their consequences to resources, and applying modifying factors in a cost-effective manner to keep adverse consequences within bounds.
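The phrase "in a cost-effective manner" can be made concrete with a simple comparison. The sketch below is one possible reading and not the authors' procedure: it assumes that a modifying factor is worthwhile when the expected annual loss it removes exceeds its annual cost. The function name, the candidate names, and every figure are hypothetical.

```python
def is_cost_effective(loss_before, loss_after, annual_cost):
    """True when the reduction in expected annual loss exceeds the annual cost."""
    return (loss_before - loss_after) > annual_cost


# Hypothetical figures: expected annual loss with no safeguard in place.
baseline_loss = 20_000.0

# Hypothetical candidates: (name, expected annual loss with it in place, annual cost).
candidates = [
    ("UPS on file server", 5_000.0, 2_000.0),
    ("second data center", 1_000.0, 50_000.0),
]

worthwhile = [
    name
    for name, loss_after, annual_cost in candidates
    if is_cost_effective(baseline_loss, loss_after, annual_cost)
]
print(worthwhile)  # ['UPS on file server']
```

A real assessment would also weigh consequences that are hard to price, such as lost customer confidence, so a comparison like this is a starting point and not a decision rule.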

Hurricanes Hugo and Andrew on the East Coast of the United States, the San Francisco earthquake on the West Coast, and the Chicago/Hinsdale central office fire are well-publicized, significant acts of nature or accidents. Just as significant but somewhat less expected are more common acts of nature and accidents. A severe storm in Florida left 500,000 people (homes and offices) without power. (Any organization or home that used ISDN telephone service or cordless phones was also without voice service because, unlike their analog counterparts, ISDN and cordless telephones in the home or office are not powered from the central office.) This storm was followed by a tornado with less widespread but more severe consequences.

A major snowstorm in the city of Birmingham, Alabama, in early March of 1993, brought more than 13 inches of snow to that southern city, and business halted. The city planners had not prepared for the possibility of a blizzard of this magnitude. Risk assessment would have considered, for example, whether their telecommunications systems needed to function in spite of the snow. Equally important for review is the vulnerability of equipment to water damage from the runoff as the snow melted. Meanwhile, 12,000 miles and five years away, New Zealand suffered a countrywide power outage. Although the country "closes down for each weekend," lack of a recovery plan could have disastrous consequences.

A credit card processing company in Georgia was prepared for Hurricane Opal in 1995, except it failed to account for the lack of telephone service and thus could not call its employees back to work. In a different city and time, a college in Texas placed its academic mainframe computer in the basement of a low-lying building, just above the sanitary sewer level, and the rains came. A commercial timeshare firm knew the risk of low-lying areas for its mainframe in Chicago and placed it on the fifth floor of a 10-story building. Snow came as expected, crushed the roof, and flooded the computer despite its lofty positioning.

Several more examples to support the vital nature of disaster recovery planning are in order. At a major defense contractor's facility in Texas, the entire second-shift operation was halted when a drunk truck driver ran into a utility pole that carried the primary power to the facility. The only light in the office complex was provided by the buttons on the telephones. In the college of business at a major southeastern university, an electrical storm (not a hurricane, just a storm) took out all power to the building and campus. Although this eventuality had been foreseen, the emergency generator did not come online because the battery that powered the starter motor was dead.

A less obvious problem to assess and manage is what to do when someone in an office goes on vacation, is sick, or goes on medical leave. Hopefully, provisions have been made for another person with like skills to take that person's place; that person has been properly trained; and adequate documentation is in place to do the job. What about a labor strike, the flu season, or a computer virus? Snowstorms, hurricanes, flu epidemics, and floods are acts of nature, but labor strikes, computer viruses, and ill-prepared training programs are not. These latter events are seemingly less consequential but more likely to happen.

What about the everyday operations of a network and computers? Mainframe and desktop computers can be halted by a 100-millisecond flicker of the power when there is no uninterruptible power supply (UPS). Does the LAN file server have redundant components to avoid a single point of failure? Does it have a backup server for critical functions? Are there alternate lines from the PBX to the telco's central office in case of an inadvertent line cut by a backhoe? One telecommunications-dependent firm has buried the telecommunications trunks on its premises in deep trenches and then poured concrete over them to protect against such digging. The authors have personal UPS devices on their desktop computers and surge protectors on the telephone lines.
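The questions above amount to a search for single points of failure. As a minimal sketch, assuming a hand-maintained inventory that maps each critical function to the components able to provide it, the check can be automated. The function name, the inventory layout, and the entries are hypothetical.

```python
def single_points_of_failure(inventory):
    """Return, sorted, the functions backed by at most one component."""
    return sorted(
        function
        for function, components in inventory.items()
        if len(components) <= 1
    )


# Hypothetical inventory: critical function -> components that can provide it.
inventory = {
    "desktop power": ["utility feed", "personal UPS"],
    "file service": ["LAN file server"],
    "line to central office": ["trunk A"],
}

for function in single_points_of_failure(inventory):
    print("No redundancy for:", function)
```

With this sample inventory the check flags file service and the line to the central office, which correspond to the backup server and alternate line questions raised above.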

Risk management is the analysis and subsequent actions taken to ensure that the organization can continue to operate under foreseeable adverse conditions, such as illness, labor strikes, hurricanes, earthquakes, fire, power outages, heavy rains, oppressive heat, or flu epidemics. Risk management begins with assessment, which leads to management on a continuous basis. One specific outcome is the creation of a disaster recovery plan in case of a catastrophic occurrence. The plan builds on everyday procedures that allow an organization to recover after a disaster and continue operations. It describes the place, procedures, and resources needed to provide for continued operations. Disaster plans are often referred to as business continuity plans for good reason.

Disaster recovery
A disaster recovery plan is a series of procedures to restore normal operations following a disaster—with maximum speed and minimal impact on operations. A comprehensive plan will include essential information and materials for necessary emergency action.

Planned procedures
Planned procedures are designed to eliminate unnecessary decision-making immediately following the disaster. Disaster recovery planning begins with preventive measures and tests to detect situations that might lead to significant problems. If this planning process is completed, the chance of experiencing a total disaster is lessened. The severity of a disaster determines the level of recovery measures. Disaster classifications are helpful in organizing procedures for a disaster plan. Figure 1 shows some classifications.

Figure 1


Regardless of the importance of the activity, there are nine essential steps for a successful implementation of disaster recovery planning, which are displayed in Figure 2. The first is commitment.

Figure 2
The essential steps of disaster recovery planning


Commitment
The key to beginning a successful disaster recovery plan is to gain commitment from top-level management and the organization. To obtain the required support, the CEO and top managers need to understand the business risk and personal liability if a disaster recovery plan is not developed and a disaster occurs. Although many companies have excuses for not developing a plan, a corporate policy should be mandated requiring disaster recovery planning. The corporate policy would assist in defining the charter for contingency planning, while encouraging cooperation with internal and external staff.

Furthermore, if the financial impact on the business is not enough to win the support of the corporate executives, an analysis of the Foreign Corrupt Practices Act of 1977 should get the required attention and support of the officers. The act deals with the fiduciary responsibilities, or "standard of care," of the officers, which may be judged legally. In the legal publication Corpus Juris Secundum, the "standard of care" is defined as follows: "A director or officer is liable for the loss of corporate assets through his negligence, fraud, or abuse of trust."

However, the most convincing reason for having a business disaster recovery plan is that it simply makes good business sense to have a company protected from a major disaster. Additional reasons to have a recovery plan include a potential for greater profits and reduced liabilities to the company and the employees. Thus, a risk assessment provides a powerful argument for recovery planning. The assessment of current operations tells where the organization is at risk and helps determine the critical areas that require change to protect from the threats. Recovery from a major disaster will be expensive. However, the inability to recover quickly and support primary business functions would be significantly more costly and destructive to the company.

Computer resources are a specific area of concern. Rare is the organization that does not use a computer for daily operations. Many firms today rely fully on real-time processing, if only for credit checks. Statistics indicate that 90 percent of companies whose computers are down for more than five working days will be out of business within a year. Hubert Huschke, Executive Vice President of Union Bank of Switzerland, estimates that a complete breakdown of the company's network for two days could cause the failure of the bank. In this computer-intensive environment, several instances in financial services have been reported where collapses of services for only a few minutes have resulted in losses that could have financed the entire network several times over.

However, these losses can be avoided or greatly lessened if a coherent disaster recovery plan is developed and implemented. "The disaster recovery process generally is much longer than the duration of the disaster itself." The company experiences immediate problems from the disaster and continues to experience difficulties for several months. Financial and functional losses increase rapidly after the onset of an outage. Corrective action must be initiated quickly, and disaster recovery methods should be functioning by the end of the first week, if not the first day, of an outage. Loss of revenues and additional costs rise rapidly and become substantial as the outage continues. The inability to communicate with customers and suppliers is devastating and can prevent the company from staying in business. Therefore, an effective disaster recovery plan directly affects the bottom line: staying in business.

Costs
Costs are a major concern for disaster recovery plans. The costs incurred for disaster recovery include insurance, fees for hot-site backup, stockpiled equipment, supplies, forms, redundant facilities, cold sites, communications networks for recovery purposes, testing, and training and education. Disaster recovery planning costs are calculable and can be budgeted. They can be allocated across many business units and also amortized over many years. When developing the budget, the organization must consider not only the time invested by the team members but also the implementation costs.
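A short worked example shows how allocation and amortization combine. It is a sketch under stated assumptions and not a budgeting method from the article: it assumes straight-line amortization of one-time costs and a split among business units by fixed weights. The function name, the unit names, and all amounts are hypothetical.

```python
def annual_share(one_time_cost, recurring_annual_cost, years, unit_weights):
    """Yearly charge per business unit.

    The one-time cost is spread evenly over `years` (straight-line, an
    assumption), recurring costs are added, and the yearly total is split
    among the units in proportion to their weights.
    """
    yearly_total = one_time_cost / years + recurring_annual_cost
    total_weight = sum(unit_weights.values())
    return {
        unit: yearly_total * weight / total_weight
        for unit, weight in unit_weights.items()
    }


# Hypothetical figures for illustration only.
shares = annual_share(
    one_time_cost=100_000.0,         # e.g. stockpiled equipment
    recurring_annual_cost=30_000.0,  # e.g. hot-site fees, testing, training
    years=5,
    unit_weights={"sales": 2, "operations": 1, "finance": 1},
)
print(shares)
```

Here the yearly total is 100,000 / 5 + 30,000 = 50,000, so sales carries 25,000 and the other two units carry 12,500 each.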

Planning
The process of developing a recovery plan involves management and staff members. Each member of the disaster recovery team has a specific role that is defined in the plan. Disaster recovery planning is a complex process; organizations must utilize a structured approach in determining the scope, collecting the data, performing analysis, developing assumptions, determining recovery tasks, and calculating milestones. The issues displayed in Figure 3 must be considered during the planning process. This highly interactive process requires information from throughout the organization. The plan requires continuous revision. It is out of date whenever a major change occurs in the organization.

Figure 3
Recovery planning issues


The process of building a plan is extremely valuable to the company. Identifying problems and developing a recovery process not only forces the organization to examine the impact of a disaster on the company and the business, but also calls into question its very mode of operation. Thus, the end result should be a plan that can be used for all levels of disaster and, potentially, a change in the way business is conducted. Recovery from a major disaster requires the efficient execution of the numerous small plans that make up the master plan. Recovery managers select the plan, assign responsibility, and coordinate resources to execute the plan.

Conclusion
Many disasters that have occurred in the United States in recent years have driven companies to recognize the importance of disaster assessment, management, and recovery planning. Disaster recovery plans appear to be a cost-effective but underutilized tool. Organizations that have prepared for an extended outage through insurance and a contingency plan reported significantly lower expected loss of revenues, additional costs, and loss of capabilities. In the last 10 years, a major disaster has been reported somewhere in the United States every year, on average. Meanwhile, standard problems occur each month somewhere in the United States: tornadoes in Oklahoma or Texas, severe storms in Florida, heavy rains in California, or a flu epidemic across the eastern seaboard. The size of the disaster is not what determines whether a company stays in business; it is the disaster recovery plan that will determine if the doors will stay open or be closed. "Smart companies make it their business to have a disaster recovery plan in place. If a disaster does strike, being prepared can make the difference between a smooth recovery and a slow terrifying struggle to survive."

Therefore, it falls to the organization to analyze its operations and determine the threats to resources, the modifying factors in place, and the effect that additional resources, procedures, or modifying factors would have on its ability to continue business in case of a disaster. The success of the assessment and disaster plan will be determined by the extent to which planned procedures are in place to eliminate unnecessary decision-making immediately following the disaster.

Notes
  1. Borsi, Robert S. Union Bank of Switzerland: Strategic Options When Outsourcing ATM Services, Harvard Business School Case 9-397-013, Oct 21, 1996, 1–16.
  2. Carr, Houston and Charles A. Snyder. The Management of Telecommunications, Irwin, 1997, Boston, MA.
  3. Howley, Peter A. Disaster Preparedness is Key to Any Telecommunications Plan. Disaster Recovery Journal, 7, April/May/June 1994, 26–32.
  4. Lewis, Steven. Disaster Recovery Planning: Suggestions to Top Management and Information Systems Managers. Journal of Systems Management, 45, May 1994, 28–33.
  5. McGaughey, Ronald E. Jr.; Charles A. Snyder; and Houston H. Carr. Implementing Information Technology for Competitive Advantage: Risk Management Issues, Information & Management, 26, 1994, 273–80.
  6. Powell, Jeanne D. Justifying Contingency Plans, Disaster Recovery Journal, 8, October/November/December 1995, 41–44.
  7. Preston, Kathryn. Disaster Recovery Planning, Industrial Distribution, 83, December 1994, 65.
  8. Seymour, Jim, Y2K v.2: Time for Triage, PC Magazine, June 30, 1998, 93–94.

Denise Johnson McManus and Houston H. Carr are both on the faculty of the department of management, Auburn University.


Copyright © 1999-2001 TechRepublic, Inc.
Visit us at http://www.techrepublic.com/