What is an IT disaster recovery and contingency plan?
Compiled by: Abenet Asmellash, Ethiopia
What is an IT disaster recovery plan?
An IT disaster recovery plan is a roadmap that teams can use to keep systems secure and running in the event of an emergency. It consists primarily of action steps, resources, and official policies that provide guidance. While we can’t prevent events such as hacking or fires from happening, we can create the IT equivalent of “stop, drop, and roll” for when they do.
Having an IT disaster recovery plan makes it easier to keep business going as usual during and after a crisis. Not only does this limit revenue loss, but it also saves money that would otherwise go to costlier last-minute fixes. Instead of waiting until something happens to allocate resources, you can outline both procedures and budget considerations ahead of time.
IT capacity planning also makes it easier for employees across every department to be productive.
The IT team can get right to work when something happens. Meanwhile, all other departments
affected by the disaster can either operate as usual or receive detailed information on what will
likely happen next and when they can get back on track.
Customers benefit too. When a hiccup does happen, clients are less likely to notice it since the
business will be running as usual. If the issue is known and undeniably disruptive, companies
can communicate more effectively with their customers and give them concrete information on
what to expect.
Why do you need an IT disaster recovery plan?
An IT disaster recovery plan is a safeguard against unpredictable emergencies. Disasters such as floods, earthquakes, fires, hurricanes, hardware failure, outdated software, viruses, hackers, ransomware, and even human error can create huge problems for your IT systems. Even though we hope these events never happen, we also recognize that they are a possibility. With an IT disaster recovery plan, you can get back up and running sooner rather than later.
Besides keeping your entire system from being destroyed, IT disaster recovery plans also improve customer relationships. When disaster strikes, not every customer who experiences an interruption in their services will understand. Many might not like the idea of paying full price for a subpar experience, regardless of what caused it. Keeping your systems up and running is the smart way to protect cash flow and keep your audience happy, even in the face of an unpredictable event.
And it’s not just your customers you’ll be helping: your team will thank you too. Working under high pressure to recover from an IT disaster without a plan in place can lead to more stress, mistakes, and disorganization. But if you’ve already thought out a detailed procedure, your team can confidently move forward, think on their feet, and keep up with their day-to-day work while recovering from a major disruptive event.
There’s one other huge advantage to making an IT disaster recovery plan that’s rarely discussed: prevention. Like interest on an investment, the delays caused by an IT disaster compound over time. The longer it takes for your team to create and implement a solution, the longer it takes to get back up and running at full capacity. That means more project delays, customer inquiries, and unexpected recovery issues to address. Having a well-thought-out IT disaster recovery plan in place ahead of time prevents much of this.
What do you need to put in an IT contingency plan?
The best and most proactive move managers can make for their IT department when disaster
strikes is to maintain business continuity. Templates prepared ahead of time that outline steps for
business continuity make it easy to pinpoint vulnerabilities, view resource distribution, and
provide a common touchpoint for all communications.
Some common pitfalls with this include not knowing who is available, who has the right skill set
for the task, and what their bandwidth looks like at any given time. Make sure you have a
method for tracking employee workloads to make your recovery plan template actionable.
Your IT disaster recovery plan template should also offer communication tools. Time is of the essence when issues come up, and there is little margin for error when sharing updates, resources, and approvals during a crisis. These two pressures combined can seriously derail even the best-laid plans, so a solid template and a communication plan that work together are essential.
In order for your team to spring into action, they’ll need a well-thought-out plan. IT contingency plans should be thorough, detailed, and organized. While your specific action steps may vary depending on your team and infrastructure, every IT contingency plan needs the following:
An emergency contact list that includes full names, titles, companies, phone numbers, after-hours contact information, and email addresses.
A notification plan that specifies who will be contacted when specific situations arise, who is responsible for reaching out, and what happens if that person can't be reached.
A communication system for notifying other team members, partners, and customers, including sample messaging approved by legal and compliance.
A set of directions for running a diagnostic and determining the scope of the issue.
A defined list of disaster recovery IT project management roles and responsibilities.
Step-by-step instructions for beginning recovery while maintaining operations, along with key timelines. These must be approved ahead of time by project team leads and stakeholders.
An inventory of relevant IT project management trends to account for, the infrastructure you have and want, and the IT tools your team can draw on.
Physical and digital data storage backups.
Data access and permission settings squared away so that recovery isn’t held up and data remains secure.
A workflow for regularly testing and updating your IT contingency plan.
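The notification-plan item implies an escalation rule: try the responsible person first, then each designated alternate, and know what to do if nobody answers. The following Python sketch shows that rule under stated assumptions. The `Contact` fields mirror the emergency contact list above, while `notify_with_escalation` and its `reachable` callback are hypothetical names standing in for however your team actually places calls or sends pages.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Contact:
    # Fields mirror the emergency contact list described above.
    full_name: str
    title: str
    company: str
    phone: str
    after_hours_phone: str
    email: str


def notify_with_escalation(
    chain: List[Contact],
    reachable: Callable[[Contact], bool],
) -> Optional[Contact]:
    """Return the first contact in `chain` who can be reached, or None.

    `chain` lists the responsible person first, then designated alternates.
    `reachable` is a hypothetical hook: replace it with a real call, page,
    or message step that reports whether the person acknowledged.
    """
    for contact in chain:
        if reachable(contact):
            return contact
    # Nobody answered: the plan itself must define the next step
    # (for example, escalating to management).
    return None


# Usage with placeholder data: the primary contact does not answer.
primary = Contact("Primary Contact", "IT Manager", "Example Co.",
                  "555-0100", "555-0101", "primary@example.com")
alternate = Contact("Alternate Contact", "Systems Lead", "Example Co.",
                    "555-0200", "555-0201", "alternate@example.com")
responder = notify_with_escalation([primary, alternate], lambda c: c is alternate)
print(responder.full_name if responder else "No one reached")
```

Returning None rather than raising keeps the "nobody reachable" case explicit, so the plan can spell out its own fallback.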
Do cloud servers have contingency plans?
Cloud providers do experience emergencies of their own from time to time. Even if they don’t, a provider might stop offering its services one day. While some providers offer solutions for hybrid cloud security, your company is still responsible for finding data storage solutions if something changes.
According to Network World, “This problem is not as severe in a conventional IT model. With a
traditional data center, you own whatever hardware you purchase, so even if the manufacturer
goes out of business, you still have the equipment and can keep using it but may have issues with
support.”
Or, if you are partnered with a newer cloud service provider, it might not offer hardware at all. Either way, it’s good to be prepared with your own contingency plan.
Who should get involved in organizing an IT recovery plan?
IT recovery plans require a team effort. Most companies choose to form a committee of representatives from every department who can speak to what they’ll need in the event of an emergency. Key operations personnel, such as finance team members and the project management office (PMO), should be included along with representatives of other divisions. As with any business continuity plan, managers should organize it, while stakeholders and sponsors should approve it.
What is the worst-case scenario in IT recovery?
The absolute worst-case scenario in IT recovery is a disruption so large that it costs you the business. That can mean anything from losing intellectual property through hacker leaks to losing the servers your day-to-day operations depend on because a powerful thunderstorm sent rain leaking through a warehouse roof. As the experts at Worksighted point out, even ransomware can bankrupt an organization overnight.
Having an IT disaster recovery plan can mean the difference between winning a standoff with hackers and shuttering a decades-old business.
Common IT disasters that ruin businesses
Natural disasters. Damage from hurricanes, floods, fires, and earthquakes to physical server
facilities and offices can disrupt IT.
Power outages. Not having access to data storage can often be extremely costly if the
information isn’t backed up elsewhere.
Hardware failure. Lost data can also be caused by faulty, dirty, or broken hardware.
Outdated software. Missing critical updates can leave servers open to new
vulnerabilities and create gaps between old data and new.
Cyber attacks and data leaks. Whether targeted or not, cyber attacks pose both legal risks and the threat of intellectual property theft.
Lack of testing. Not testing your backup equipment or IT disaster recovery plan can lead
to missteps in an already high-stakes process.
Not having backup. Regardless of who you partner with for storage, your company is
responsible for having a backup if they suddenly shut down operations.
Example: Disaster recovery plan
The objective of a disaster recovery plan is to ensure that you can respond to a disaster or other emergency that affects information systems and minimize the effect on the operation of the business. When you have prepared the information described in this example, store your document in a safe, accessible location off site.
Section 1. Example: Major goals of a disaster recovery plan
Here are the major goals of a disaster recovery plan.
Section 2. Example: Personnel
You can use the tables in this topic to record your data processing personnel. You can include a
copy of the organization chart with your plan.
Section 3. Example: Application profile
You can use the Display Software Resources (DSPSFWRSC) command to complete the table in
this topic.
Section 4. Example: Inventory profile
You can use the Work with Hardware Products (WRKHDWPRD) command to complete the table
in this topic.
Section 5. Information services backup procedures
Use these procedures for information services backup.
Section 6. Disaster recovery procedures
For any disaster recovery plan, these three elements should be addressed.
Section 7. Recovery plan for mobile site
This topic provides information about how to plan your recovery task at a mobile site.
Section 8. Recovery plan for hot site
An alternate hot site plan should provide for an alternative (backup) site. The alternate site has
a backup system for temporary use while the home site is being reestablished.
Section 9. Restoring the entire system
This section describes how to restore the entire system.
Section 10. Rebuilding process
The management team must assess the damage and begin the reconstruction of a new data
center.
Section 11. Testing the disaster recovery plan
In successful contingency planning, it is important to test and evaluate the plan regularly.
Section 12. Disaster site rebuilding
Use this information when rebuilding the disaster site.
Section 13. Record of plan changes
Keep your plan current, and keep records of changes to your configuration, your applications,
and your backup schedules and procedures.
Section 1. Example: Major goals of a disaster recovery plan
Here are the major goals of a disaster recovery plan:
To minimize interruptions to normal operations.
To limit the extent of disruption and damage.
To minimize the economic impact of the interruption.
To establish alternative means of operation in advance.
To train personnel in emergency procedures.
To provide for smooth and rapid restoration of service.
Section 2. Example: Personnel
You can use the tables in this topic to record your data processing personnel. You can include a copy of the organization chart with your plan.
Data processing personnel
Name | Position | Address | Telephone
Section 3. Example: Application profile
You can use the Display Software Resources (DSPSFWRSC) command to complete the table in
this topic.
Application profile
Application name | Critical (Yes/No) | Fixed asset (Yes/No) | Manufacturer | Comments
Comment legend:
1. Runs daily.
2. Runs weekly on ____________.
3. Runs monthly on ____________.
Section 4. Example: Inventory profile
You can use the Work with Hardware Products (WRKHDWPRD) command to complete the
table in this topic.
Inventory profile
Manufacturer | Description | Model | Serial number | Own or leased | Cost
Notes:
1. This list should be audited every ____________ months.
2. This list should include the following items:
o Processing units
o System printer
o Disk units
o Tape and optical devices
o Models
o Controllers
o Workstation controllers
o I/O processors
o Personal computers
o General data communication
o Spare workstations
o Spare displays
o Telephones
o Racks
o Air conditioner or heater
o Humidifier or dehumidifier
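Note 1 leaves the audit interval blank. Assuming an audit falls due on the same calendar day a fixed number of months after the previous one (or on the last day of the month when that day doesn’t exist), a small Python sketch can flag an overdue inventory audit. `add_months`, `next_audit_due`, and `audit_overdue` are hypothetical helper names; the interval is passed in as a parameter rather than filled in.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift `start` by `months`, clamping to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def next_audit_due(last_audit: date, interval_months: int) -> date:
    # interval_months is the value your plan writes into the blank in note 1.
    return add_months(last_audit, interval_months)


def audit_overdue(last_audit: date, interval_months: int, today: date) -> bool:
    # Overdue only after the due date has passed.
    return today > next_audit_due(last_audit, interval_months)


print(next_audit_due(date(2024, 3, 15), 6))
```

Clamping to the month’s last day keeps an audit last done on, say, January 31 from producing an invalid date.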
Miscellaneous inventory
Description | Quantity | Comments
Note: This list should include the following items:
o Tapes
o CDs and DVDs
o PC software
o Emulation packages
o File cabinet contents or documentation
o Language software (such as COBOL and RPG)
o Tape vault contents
o Printer supplies (such as paper and forms)
o Optical media
Section 5. Information services backup procedures
Use these procedures for information services backup.
System i® environment
o Daily, journal receivers are changed at ____________ and at ____________.
o Daily, changed objects in the following libraries and directories are saved at ____________:
____________
____________
____________
____________
____________
____________
____________
____________
The preceding procedure also saves the journals and journal receivers.
o On ____________ at ____________ a complete save of the system is done.
o All save media is stored off-site in a vault at the ____________ location.
Personal Computer
o It is suggested that all personal computers be backed up. Copies of the personal computer files should be uploaded to the System i environment on ____________ (date) at ____________ (time), just before a complete save of the system is done. They are then saved with the normal system save procedure. This provides a more secure backup of personal computer-related systems, where a local area disaster could wipe out important personal computer systems.
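Every time in the System i schedule above is left blank. The following Python sketch turns a filled-in schedule into the list of save jobs due on a given day. It assumes the complete save runs weekly (the plan only says "On ____________") and that the daily changed-object save still runs on that day; the `BackupSchedule` class, the `jobs_for` function, and the times in the usage example are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import List, Tuple


@dataclass
class BackupSchedule:
    receiver_change_times: Tuple[time, time]  # the two daily journal receiver changes
    changed_object_save_time: time            # daily save of changed objects
    full_save_weekday: int                    # assumption: weekly; Monday=0 ... Sunday=6
    full_save_time: time                      # complete save of the system


def jobs_for(day: date, schedule: BackupSchedule) -> List[Tuple[time, str]]:
    """Return the (time, job) pairs due on `day`, sorted by time."""
    jobs = [(t, "Change journal receivers") for t in schedule.receiver_change_times]
    # Per the plan, the daily save also picks up journals and journal receivers.
    jobs.append((schedule.changed_object_save_time, "Save changed objects"))
    if day.weekday() == schedule.full_save_weekday:
        jobs.append((schedule.full_save_time, "Complete system save"))
    return sorted(jobs)


# Placeholder times, not recommendations.
schedule = BackupSchedule(
    receiver_change_times=(time(6, 0), time(18, 0)),
    changed_object_save_time=time(22, 0),
    full_save_weekday=6,  # Sunday
    full_save_time=time(23, 0),
)
for when, job in jobs_for(date(2024, 6, 2), schedule):  # June 2, 2024 is a Sunday
    print(when.strftime("%H:%M"), job)
```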
Section 6. Disaster recovery procedures
For any disaster recovery plan, these three elements should be addressed.
Emergency response procedures
To document the appropriate emergency response to a fire, natural disaster, or any other
activity in order to protect lives and limit damage.
Backup operations procedures
To ensure that essential data processing operational tasks can be conducted after the
disruption.
Recovery actions procedures
To facilitate the rapid restoration of a data processing system following a disaster.
Disaster action checklist
This checklist provides possible initial actions that you might take following a disaster.
Recovery startup procedures for use after an actual disaster
Consider these recovery startup procedures for use after an actual disaster.
Section 8. Recovery plan for hot site
An alternate hot site plan should provide for an alternative (backup) site. The alternate site has a
backup system for temporary use while the home site is being reestablished.
1. Notify ____________ of the nature of the disaster and of the need for a hot site.
2. Request air shipment of modems to ____________ for communications. (See ____________ for
communications for the hot site.)
3. Confirm in writing the telephone notification to ____________ within 48 hours of the telephone
notification.
4. Begin making necessary travel arrangements to the site for the operations team.
5. Confirm that you have enough save media and that it is packed for shipment to restore on the
backup system.
6. Prepare a purchase order to cover the use of the backup system.
7. Review the checklist for all necessary materials before departing to the hot site.
8. Make sure that the disaster recovery team at the disaster site has the necessary information to
begin restoring the site.
9. Provide for travel expenses (cash advance).
10. After arriving at the hot site, contact home base to establish communications procedures.
11. Review materials brought to the hot site for completeness.
12. Start to load the system from the save media.
13. Begin normal operations as soon as possible:
a. Daily jobs
b. Daily saves
c. Weekly saves
14. Plan the schedule to back up the hot-site system in order to restore on the home-base
computer.
Alternate-site system configuration
You can attach the alternate-site system configuration here.
Section 9. Restoring the entire system
This section describes how to restore the entire system.
To get your system back to the way it was before the disaster, use the procedures in Checklist 20:
Recovering your entire system after a complete system loss.
Before you begin: Retrieve the following save media, equipment, and information from the on-site tape vault or the off-site storage location:
If you install from the alternate installation device, you need both your save media and the CD-ROM media containing the Licensed Internal Code.
All save media from the most recent complete save operation
The most recent save media from saving security data (SAVSECDTA or SAVSYS)
The most recent save media from saving your configuration, if necessary
All save media that contains journals and journal receivers that you saved since the most recent
daily save operation
All save media from the most recent daily save operation
PTF list (stored with the most recent complete save media, weekly save media, or both)
Save media list from most recent complete save operation
Save media list from most recent weekly save operation
Save media list from daily saves
History log from the most recent complete save operation
History log from the most recent weekly save operation
History log from the daily save operations
The Installing, upgrading, or deleting i5/OS and related software PDF. You can order a printed
version of this PDF (SC41-5120; feature code 8006) with i5/OS software upgrade orders or new
hardware orders.
The Recovering your system PDF. You can order a printed version of this PDF (SC41-5304;
feature code 8007) with i5/OS software upgrade orders or new hardware orders.
Telephone directory
Modem manual
Tool kit
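Several items in the list above describe which save sets to pull: the most recent complete save, the most recent daily save, and the journal receivers saved since that daily save. Assuming each daily save captures everything changed since the last complete save (so only the newest daily is needed, as the list implies), the selection can be sketched in Python as follows. `SaveSet`, its `kind` labels, and the volume IDs are hypothetical; security-data, configuration, and weekly saves are omitted to keep the example short.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class SaveSet:
    kind: str                    # hypothetical labels: "complete", "daily", or "journal"
    taken_at: datetime
    volume_ids: Tuple[str, ...]


def media_for_restore(history: List[SaveSet]) -> List[SaveSet]:
    """Pick the save sets needed to rebuild the system, oldest first."""
    completes = [s for s in history if s.kind == "complete"]
    if not completes:
        raise ValueError("no complete save found; the whole system cannot be restored")
    last_complete = max(completes, key=lambda s: s.taken_at)
    needed = [last_complete]
    cutoff = last_complete.taken_at

    # Assumption: daily saves are cumulative since the last complete save,
    # so only the newest one taken after it is needed.
    dailies = [s for s in history if s.kind == "daily" and s.taken_at > cutoff]
    if dailies:
        last_daily = max(dailies, key=lambda s: s.taken_at)
        needed.append(last_daily)
        cutoff = last_daily.taken_at

    # Journal receivers saved after the newest save above, in time order.
    journals = sorted(
        (s for s in history if s.kind == "journal" and s.taken_at > cutoff),
        key=lambda s: s.taken_at,
    )
    return needed + journals


history = [
    SaveSet("complete", datetime(2024, 5, 26, 23, 0), ("VOL100",)),
    SaveSet("complete", datetime(2024, 6, 2, 23, 0), ("VOL101", "VOL102")),
    SaveSet("daily", datetime(2024, 6, 3, 22, 0), ("VOL103",)),
    SaveSet("daily", datetime(2024, 6, 4, 22, 0), ("VOL104",)),
    SaveSet("journal", datetime(2024, 6, 4, 18, 0), ("VOL105",)),
    SaveSet("journal", datetime(2024, 6, 5, 6, 0), ("VOL106",)),
]
volumes = [v for s in media_for_restore(history) for v in s.volume_ids]
print(volumes)
```

If your daily saves capture only the changes since the previous daily save, keep every daily save taken after the complete save instead of just the newest.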
Section 10. Rebuilding process
The management team must assess the damage and begin the reconstruction of a new data center.
If the original site must be restored or replaced, the following questions are some of the factors to
consider:
What is the projected availability of all needed computer equipment?
Will it be more effective and efficient to upgrade the computer systems with newer equipment?
What is the estimated time needed for repairs or construction of the data site?
Is there an alternative site that can more readily be upgraded for computer purposes?
After the decision to rebuild the data center has been made, go to Section 12. Disaster site
rebuilding.
Section 11. Testing the disaster recovery plan
In successful contingency planning, it is important to test and evaluate the plan regularly.
Data processing operations are volatile in nature, resulting in frequent changes to equipment, programs, and documentation. These changes make it critical to treat the plan as a living document.
Table 1 should be helpful for conducting a recovery test.
Table 1. Checklist for testing the disaster recovery plan
Item | Yes | No | Applicable | Not applicable | Comments
Conducting a recovery test
1. Select the purpose of the test. What aspects of the plan are being evaluated?
2. Describe the objectives of the test. How will you measure successful achievement of the objectives?
3. Meet with management and explain the test and objectives. Gain their agreement and support.
4. Have management announce the test and the expected completion time.
5. Collect test results at the end of the test period.
6. Evaluate results. Was recovery successful? Why or why not?
7. Determine the implications of the test results. Does successful recovery in a simple case imply successful recovery for all critical jobs in the tolerable outage period?
8. Make suggestions for changes. Call for responses by a given date.
9. Notify other areas of results. Include users and auditors.
10. Change the disaster recovery plan manual as necessary.
Areas to be tested
1. Recovery of individual application systems by using files and documentation stored off-site.
2. Reloading of system save media and performing an initial program load (IPL) by using files and documentation stored off-site.
3. Ability to process on a different computer.
4. Ability of management to determine priority of systems with limited processing.
5. Ability to recover and process successfully without key people.
6. Ability of the plan to clarify areas of responsibility and the chain of command.
7. Effectiveness of security measures and security bypass procedures during the recovery period.
8. Ability to accomplish emergency evacuation and basic first-aid responses.
9. Ability of users of real-time systems to cope with a temporary loss of online information.
10. Ability of users to continue day-to-day operations without applications or jobs that are considered noncritical.
11. Ability to contact the key people or their designated alternates quickly.
12. Ability of data entry personnel to provide the input to critical systems by using alternate sites and different input media.
13. Availability of peripheral equipment and processing, such as printers and scanners.
14. Availability of support equipment, such as air conditioners and dehumidifiers.
15. Availability of support: supplies, transportation, communication.
16. Distribution of output produced at the recovery site.
17. Availability of important forms and paper stock.
18. Ability to adapt the plan to lesser disasters.
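Items 6, 8, and 10 of the checklist call for evaluating results and feeding failures back into the plan. A minimal Python sketch of that bookkeeping might look like the following, assuming each row records the Applicable/Not applicable flag and a Yes/No result. `ChecklistEntry` and `summarize` are hypothetical names, and the sample comment is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ChecklistEntry:
    item: str
    applicable: bool                # Applicable / Not applicable columns
    passed: Optional[bool] = None   # Yes / No columns; None if not yet answered
    comments: str = ""


def summarize(entries: List[ChecklistEntry]) -> Dict[str, object]:
    """Count applicable items and list the ones that still need follow-up."""
    applicable = [e for e in entries if e.applicable]
    return {
        "applicable": len(applicable),
        "passed": sum(1 for e in applicable if e.passed is True),
        # Items answered No: candidates for suggested plan changes (item 8).
        "failed": [e.item for e in applicable if e.passed is False],
        # Applicable items nobody answered: the test is not finished.
        "unanswered": [e.item for e in applicable if e.passed is None],
    }


results = [
    ChecklistEntry("Ability to process on a different computer", True, True),
    ChecklistEntry("Ability to recover and process successfully without key people",
                   True, False, "Hypothetical: restore steps known to one operator only"),
    ChecklistEntry("Ability to accomplish emergency evacuation and basic first-aid responses",
                   False),
    ChecklistEntry("Availability of important forms and paper stock", True),
]
print(summarize(results))
```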
Section 12. Disaster site rebuilding
Use this information when rebuilding the disaster site.
Floor plan of data center.
Determine current hardware needs and possible alternatives.
Data center square footage, power requirements and security requirements.
o Square footage ____________
o Power requirements ____________
o Security requirements: locked area, preferably with a combination lock on one door.
o Floor-to-ceiling studding
o Detectors for high temperature, water, smoke, fire and motion
o Raised floor
Vendors
You can attach the vendor information here.
Floor plan
You can include a copy of the proposed floor plan here.
Section 13. Record of plan changes
Keep your plan current, and keep records of changes to your configuration, your applications,
and your backup schedules and procedures.
You can print a list of your current local hardware by entering the following command:
DSPLCLHDW OUTPUT(*PRINT)
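A configuration change record is easier to keep if you compare each new hardware listing with the last one you recorded. The Python sketch below assumes you have already reduced each listing (for example, from the printed DSPLCLHDW report) to a mapping of serial number to description. Parsing the report itself is not shown, since its layout isn’t described here, and `change_records` and the sample serial numbers are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

Record = Tuple[str, str, str, str]  # (date, change type, serial number, details)


def change_records(recorded: Dict[str, str], current: Dict[str, str],
                   when: date) -> List[Record]:
    """Describe how `current` differs from the last `recorded` inventory."""
    stamp = when.isoformat()
    records: List[Record] = []
    for serial in sorted(current.keys() - recorded.keys()):
        records.append((stamp, "added", serial, current[serial]))
    for serial in sorted(recorded.keys() - current.keys()):
        records.append((stamp, "removed", serial, recorded[serial]))
    for serial in sorted(recorded.keys() & current.keys()):
        if recorded[serial] != current[serial]:
            records.append((stamp, "changed", serial,
                            f"{recorded[serial]} -> {current[serial]}"))
    return records


recorded = {"SN-0001": "Processing unit", "SN-0002": "Tape device"}
current = {"SN-0001": "Processing unit", "SN-0003": "Disk unit"}
for row in change_records(recorded, current, date(2024, 7, 1)):
    print(row)
```

Appending these rows to the plan’s change log, along with who made each change, gives a dated history of configuration changes.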