Linda Robinson, Raleigh
Thanks also to the following IBMers for their invaluable contributions to this
project:
John Gates, Tape Product Manager, Raleigh
Lee Pisarek, Netfinity Technology Lab, Raleigh
Dan Watanabe, Tape and Optics Business Development, Tucson
Comments welcome
Your comments are important to us!
We want our Redbooks to be as helpful as possible. Please send us your comments
about this or other Redbooks in one of the following ways:
• Fax the evaluation form found in “IBM Redbooks review” on page 305 to the fax
number shown on the form.
• Use the online evaluation form found at http://www.redbooks.ibm.com/
• Send your comments in an Internet note to redbook@us.ibm.com
x Netfinity Tape Solutions
several years, depending on legal standards). As a result, backup products
should be able to differentiate between these two types of data, since storage
policies will differ.
Besides a difference in handling this data, the storage device and media will have
specific needs. Since data will be kept for a long time, media lifetime must be very
high, which means you might need tape devices that are backward compatible.
Physical storage is equally important: the media should be kept in an
environmentally controlled, secured area.
Finally, availability of this data should be very high. That is why some sources
suggest keeping a second backup server, entirely identical to the production system,
on standby in a remote location, together with an extra copy of the media.
2.2 Backup methodologies
This section explains the different ways our data will be backed up, what will be
backed up, and where it will go. Different methods exist, each having its
advantages and disadvantages. We will discuss three common ways in which
data is approached by backup programs. When an approach is decided upon, the
next step is to set the backup pattern that will be used. The backup pattern can be
seen as the way the backup program determines how data will be handled over a
certain time period. This leads us to another important factor in backup
operations: continuity. There is a start point, and from then on, reliable backups
must be maintained. This is why backup implementation should be very well
planned before starting.
2.2.1 When will a file be backed up?
2.2.1.1 Full backup
A full backup is simply that: a complete backup of every single file.
It is the start point for every backup implementation. Every file that needs to be
backed up will have to be backed up at least once.
The advantage of such a backup is that files are easily found when needed. Since
full backups include all data on your hard drive, you do not have to search through
several tapes to find the files you need to restore. If you should need to restore
the entire system, all of the most current information can be found on the last
backup tape (or set of tapes).
The disadvantage is that doing nothing but full backups leads to redundancy,
which wastes both media and time. A backup strategy would normally include a
combination of full, incremental, and/or differential backups.
2.2.1.2 Incremental backup
Incremental backups include files that were created or changed since the last
backup (that is, the last full or incremental backup). To achieve this, the status of
each file must be recorded either within the backup software or through the use of
the archive attribute of the files. If no previous backup was made, an incremental
backup is equivalent to a full backup.
Incremental backups make better use of media compared to full backups. Only
files that were created or changed since the last backup are included, so less
backup space is used and less time is required.
Note
The definition of a file change can differ between backup applications. Some
criteria used for marking a file as changed include:
• Data changes
• Location changes
• Attribute changes (last modification or access date, archive bit)
• Security changes
The disadvantage is that multiple tapes are needed to restore a set of files. The
files can be spread over all the tapes used since the last full backup. You may
have to search several tapes to find the file you wish to restore. The backup
software can minimize this by remembering where files are located; however, a
restoration may still require access to all incremental backups.
2.2.1.3 Differential backup
A differential backup includes all files that were created or modified since the last
full backup. Note the difference between incremental and differential: incremental
backups save files changed since the last (incremental or full) backup, whereas
differential backups save files changed since the last full backup. In some
publications, a differential backup is also called a cumulative incremental backup.
The advantages over full backups are that they are quicker and use less media.
The advantage over an incremental backup is that the restore process is more
efficient — at worst, the restore will require only the latest differential backup set
and the latest full backup set, whereas an incremental backup could require all
incremental backup sets and the full backup set.
The disadvantage of differential backups is that they take longer and longer to
perform as the amount of changed data grows. Compared to incremental
backups, differential backups use more time and media: each backup stores
much of the same information again, plus the latest information added or created
since the last full backup.
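The selection rules for the three backup types can be sketched in a few lines. This is a sketch only: the file list and timestamps are invented for the example, and real backup software would read this information from the file system, its own catalog, or the archive attribute.

```python
from datetime import datetime

# Hypothetical file catalog: (name, last-modified timestamp).
files = [
    ("report.doc", datetime(1999, 6, 2)),
    ("budget.xls", datetime(1999, 6, 4)),
    ("notes.txt",  datetime(1999, 6, 1)),
]

def full(files):
    """A full backup takes every single file."""
    return [name for name, _ in files]

def incremental(files, last_backup):
    """Files created or changed since the LAST backup of any kind."""
    return [name for name, mtime in files if mtime > last_backup]

def differential(files, last_full):
    """Files created or changed since the last FULL backup."""
    return [name for name, mtime in files if mtime > last_full]

last_full_backup = datetime(1999, 5, 30)  # Sunday's full backup
last_any_backup = datetime(1999, 6, 3)    # Thursday's incremental

print(incremental(files, last_any_backup))    # ['budget.xls']
print(differential(files, last_full_backup))  # all three files
```

Note how the differential list grows to include everything changed since Sunday, while the incremental list only covers the last day.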
2.2.2 Backup patterns
A backup pattern is the way we will back up our data. Now that we have defined
the different types of backups, the question is how should we combine them?
Tape usage and reuse are important factors, because tape management gets
complicated when dealing with a large number of tapes, and media costs will
rise if we do not reuse tapes.
2.2.2.1 Full/Incremental pattern
The most common way of performing backups is to take full backups on a regular
basis, with incremental backups in between.
To avoid managing too many tapes, the number of incremental backups between
full backups should be kept low. A typical schedule is one full backup every
week, plus five or six incremental backups (one per day) in between. This is
shown graphically in Figure 1 on page 6.
Chapter 2. Strategy 5
This way of performing backups implies:
• One tape (or set of tapes) per day
• Very little data on each tape (except the full backup tapes)
• When performing the second full backup, you ignore all of the previous full
backups, erase the tapes, and send them back to the scratch pool.
The administration of the tapes, inventory and tracking, tape labeling, and
archiving must be done manually in most cases. In addition, each time you do a
full backup, you send all of the data again.
When doing a full restore, you will need to start by restoring the full backup, then
restore the changes using every incremental backup.
Sun Mon Tue Wed Thu Fri Sat
Week 1 F I I I I I I
Week 2 F I I I I I I
F Full backup
I Incremental backup
Figure 1. Tape usage in full/incremental backup pattern
An important factor within each backup pattern is tape usage and reutilization. In
the example above (Figure 1), if in week 2, you need to restore a file that was
backed up in week 1, you will need to have these tapes still available. This means
that the number of tapes needed increases significantly. That is why rotation
schedules are a very important part of tape management. Tape rotation
schedules will provide you with different versions of files, without having a large
number of tapes.
A commonly used tape rotation strategy is the “grandfather-father-son” schedule.
This name reflects the use of three generations of backup tapes: grandfather
tapes, father tapes and son tapes. To explain, let us start our backups.
On Sunday, a full backup is taken to a tape labeled “Week_1”. From Monday to
Saturday, backups are taken to tapes labeled “Monday”, “Tuesday”, etc. The next
Sunday, a full backup is taken to a tape “Week_2”. On Monday, we reuse the
tapes labeled with the names of the days (the same tapes as used in week 1).
These tapes are called the son tapes. For the next two weeks, we take weekly full
backups to separate tapes, and store daily backups on the son tapes. At the end
of the month, this leaves us with four father tapes, labeled “Week_1”, “Week_2”,
“Week_3”, “Week_4”. This gives us the possibility to restore a version of a file that
is one month in age. The last day of the month, a backup is taken to a grandfather
tape, labeled “Month_1”. After this, the “Week_1” through “Week_4” tapes can be
reused to do the weekly full backup.
So, you will have a set of six son tapes reused weekly, a set of four father tapes
reused monthly, and a set of 4 or 12 grandfather tapes (depending on the amount
of time you want to cover).
Figure 2. Grandfather-Father-Son media rotation schedule: daily backups go to
the son set (Monday through Saturday), weekly full backups go to the father set
(Week_1 through Week_4), and monthly full backups go to the grandfather set
(Month_1 and so on).
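The rotation described above can be sketched as a small function that picks the tape label for a given date. This sketch assumes the schedule of this example (a monthly full on the last day of the month, a weekly full on Sundays, daily son tapes otherwise); real rotation schemes vary.

```python
import calendar
from datetime import date

def gfs_label(day):
    """Pick the grandfather-father-son tape label for a given date."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    if day.day == last_day:                          # grandfather tape
        return "Month_%d" % day.month
    if day.weekday() == 6:                           # Sunday: father tape
        return "Week_%d" % ((day.day - 1) // 7 + 1)
    return day.strftime("%A")                        # son tape, reused weekly

# August 1, 1999 was a Sunday:
print(gfs_label(date(1999, 8, 1)))    # Week_1
print(gfs_label(date(1999, 8, 2)))    # Monday
print(gfs_label(date(1999, 8, 31)))   # Month_8
```

When the last day of the month falls on a weekday, the monthly grandfather tape takes precedence, matching the description above.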
2.2.2.2 Full/differential pattern
Another way of performing backups is to take full backups and differential
backups, with incremental backups in between.
In this pattern:
• A full backup saves every file.
• A differential backup saves the files that have changed since the previous full
backup.
• An incremental backup saves the files that have changed since the previous
incremental backup (or the previous differential backup if no previous
incremental backups exist, or the previous full if no previous differentials
exist).
This process reduces the number of tapes to manage because you can discard
your incremental tapes once you have done a differential. You still have to
manage the incremental tapes prior to the differential backup, however.
This way of performing backups implies:
• One tape (or set of tapes) per day
• Very little data on each tape (except the full backup tape)
• More tapes to manage, because you have to keep the full backup tapes, the
differential tapes, and the incremental tapes
Sun Mon Tue Wed Thu Fri Sat
F   I   I   D   I   I   D

F  Full backup
I  Incremental backup (new/changed data since the last backup)
D  Differential backup (the data from the previous days' incremental backups
   plus the new/changed data since the last incremental backup)

Figure 3. Tape usage in the full/differential backup pattern
The advantage of the full/differential pattern over the full/incremental pattern is
that a restore uses the full backup, the latest differential backup, and only the
incremental backups taken since that differential, which requires fewer tapes.
(See 2.2.2.4, “Example” on page 9.)
As in the full/incremental pattern, tape rotation can be implemented to limit the
number of tapes used, while keeping a certain number of versions of each file
over a certain time period.
2.2.2.3 Incremental forever pattern
Since one of the critical factors in any backup is the amount of data that has to be
moved, a way of limiting this amount should be pursued. The best way to do this
is to back up changes only. Using the incremental forever pattern, only
incremental backups are performed. This means that there is no need for regular
full or differential backups. Though the first backup will be an incremental that will
back up everything (so, essentially the same as a full backup), only incremental
backups need to be taken afterwards.
It is clear that this pattern will limit the amount of backed up data, but turns tape
management and usage into a very complex process. That is why you will need a
backup application that is capable of managing these tapes.
A good example of this is tape reusage. Since there is no determined point in
time when tapes can be reused (as we had in the previous two patterns), the
number of tapes can increase dramatically. Therefore, the application should be
able to check tapes and clean them if necessary. This cleanup (or tape
reclamation) should occur when a tape holds backup data that will no longer be
used, since newer versions have been backed up.
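Tape reclamation can be illustrated with a sketch. The catalog contents and the 30% threshold are invented for the example; real backup products track valid data in their own database and choose their own thresholds.

```python
RECLAIM_THRESHOLD = 0.30   # assumed: reclaim once under 30% of data is valid

def tapes_to_reclaim(catalog):
    """Return tapes whose share of still-valid backup data has dropped
    below the threshold; their remaining data can be consolidated onto
    other tapes and the tapes returned to the scratch pool."""
    return sorted(tape for tape, (valid_mb, total_mb) in catalog.items()
                  if total_mb and valid_mb / total_mb < RECLAIM_THRESHOLD)

catalog = {
    "TAPE01": (2000, 10000),   # 20% valid: newer versions superseded the rest
    "TAPE02": (9000, 10000),   # 90% valid: keep as-is
    "TAPE03": (500, 10000),    # 5% valid: reclaim
}
print(tapes_to_reclaim(catalog))   # ['TAPE01', 'TAPE03']
```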
Another point is that when data from different machines is backed up, that data
can be dispersed over a multitude of different tapes. Since mounting a tape is a
slow process, this should be avoided. That is why some applications have a
mechanism that is called collocation. Collocation will try to maintain the data of
one machine on the fewest number of tapes possible. This should mean a
performance gain when restoring, but will slow down the backup in cases where
multiple machines need to back up their data to a single tape drive. Instead of
moving the backup data of both clients to the same tape, the backup program will
try to put the data of both clients on separate tapes. Therefore, the second client
will have to wait until the backup of the first one completes before it can start its
backup. Again, applications have been provided to limit the impact of this (see
2.5.4, “Hierarchical storage” on page 22).
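The benefit of collocation for restores can be put into numbers with a small sketch (the client names, data sizes, and tape capacity are invented): with collocation, a full restore of one client touches only as many tapes as that client's data fills, instead of potentially every tape written since its last full backup.

```python
import math

def tapes_per_restore_collocated(client_sizes_mb, tape_capacity_mb):
    """With collocation each client has its own tape set, so a full
    restore of one client reads ceil(size / capacity) tapes."""
    return {client: math.ceil(size / tape_capacity_mb)
            for client, size in client_sizes_mb.items()}

clients = {"mars": 2500, "venus": 700, "pluto": 1200}
print(tapes_per_restore_collocated(clients, 1000))
# {'mars': 3, 'venus': 1, 'pluto': 2}
```

Without collocation, the 2500 MB of "mars" data could be scattered across every tape in the pool, each of which would have to be mounted in turn.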
2.2.2.4 Example
To make things a bit clearer, let’s look at an example. We have a machine with
20 GB of data, and each day about 5% of this data changes. This means we will
have to back up about 1 GB of data for each incremental backup. The network will
be the determining factor for the data transfer rate (we will assume a 16 Mbps
token-ring network), and we assume backup and restore throughput are equal.
Table 1 shows the time needed for the backup operation and the type of backup
taken each day under the different patterns:

Table 1. Backup operation: time required using specific backup patterns

  Pattern                  Sun     Mon   Tue   Wed    Thu   Fri   Sat
  Full/incremental   Type  Full    Incr  Incr  Incr   Incr  Incr  Incr
  (Figure 1, page 6) Time  10240   512   512   512    512   512   512
  Full/differential  Type  Full    Incr  Incr  Diff   Incr  Incr  Diff
  (Figure 3, page 8) Time  10240   512   512   1536   512   512   1536
  Incremental        Type  Incr    Incr  Incr  Incr   Incr  Incr  Incr
                     Time  512(1)  512   512   512    512   512   512

  Times are in seconds.
  (1) The first incremental backup would take 10240 seconds, but here we
      assume that Sunday's backup is not the first backup.
If we look at the restore operation, we will need to determine the number of tapes
that are required and the time needed to restore the data. Let’s assume that we
have to do a full restore (that is, 20 GB) on Friday (restoring from Thursday’s
backups).
Table 2. Restore operation: total number of tapes and total amount of time required

  Type               Number of tapes               Time (seconds)
  Full/incremental   5 (Sun, Mon, Tue, Wed, Thu)   12288 (10240 + 4 x 512)
  Full/differential  3 (Sun, Wed, Thu)             12288 (10240 + 1536 + 512)
  Incremental        Unknown                       10240
From this we conclude:
• A full restore is fastest with the incremental forever pattern, but the number of
tapes needed is hard to predict.
• Of the patterns with a predictable tape count, the full/differential pattern needs
the fewest tapes.
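The arithmetic behind Table 1 and Table 2 can be reproduced with a few lines, taking the 16 Mbps token-ring figure at its nominal value, as the example does:

```python
DATA_MB = 20 * 1024        # 20 GB of data
CHANGE_RATE = 0.05         # about 5% of the data changes per day
RATE_MB_PER_S = 16 / 8     # 16 Mbps nominal = 2 MB per second

def seconds(mb):
    return mb / RATE_MB_PER_S

full_backup = seconds(DATA_MB)                    # 10240 s
incr_backup = seconds(DATA_MB * CHANGE_RATE)      # 512 s
diff_backup = seconds(DATA_MB * CHANGE_RATE * 3)  # 3 days of changes: 1536 s

# Full restore on Friday, from Thursday's backups:
full_incr_restore = full_backup + 4 * incr_backup            # 12288 s, 5 tapes
full_diff_restore = full_backup + diff_backup + incr_backup  # 12288 s, 3 tapes
incr_forever_restore = full_backup                           # 10240 s

print(full_backup, incr_backup, diff_backup)
print(full_incr_restore, full_diff_restore, incr_forever_restore)
```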
2.3 System and storage topologies
When implementing a backup solution, the first thing to look at is how you are
going to set up your site. Different possibilities exist, each giving some
advantages and disadvantages. For SAN implementations, refer to 2.4, “Storage
area network implementations” on page 14.
The following topology models will be discussed:
• Direct connection
• Single server site
• Two-tier site
• Multi-tier site or branch office model
There is no one “best” solution applicable to every situation. Factors to be
considered when deciding on a backup solution include:
• The network bandwidth available
• The period available for backup activity
• The capabilities of the backup software
• The size and number of machines to be backed up
2.3.1 Direct tape connection
The easiest topology to understand is the one where we connect our tape
device directly to the machine we are going to back up (see Figure 4). One
advantage of this setup is the speed of the link between the data and the backup
device (typically SCSI) compared to the network connection used in other models.
The disadvantages of this model are limited scalability, manageability and
hardware cost (one tape device needed for every machine that requires backup).
This setup can be suited for sites with a limited number of machines that need to
be backed up, or for emergency restores.
Figure 4. Direct tape connection
2.3.2 Single server model
As opposed to the direct connection model, this type of setup is based on a
backup server, connected through a network to the machines that will need to
take a backup. These machines are often referred to as clients, nodes or agents.
The tape device (or other storage media) will be connected to this backup server
(see Figure 5). The advantages of this design are that centralized storage
administration is possible and the number of storage devices is reduced (and
probably the cost).
However, one of the problems here could be the network bandwidth. Since all
data that is backed up needs to go over the network, the throughput is smaller
than what we have using a direct tape connection. Every client that is added will
need some of this bandwidth (see 2.5.2, “Network bandwidth considerations” on
page 20). This bandwidth issue becomes even more important when dealing with
a distributed site. Let’s imagine that one of the machines that needs to be backed
up is located in a different location than the backup server, with only a very slow
link between these two sites. Throughput could diminish in such a way that it
would take longer than 24 hours to back up the remote system. In this case, a
two-tier solution would be better (as discussed in 2.3.3, “Two-tier model” on page
11).
Although not required, it is advised that the machine used as backup server
should be a dedicated machine. The reason for this is that backup and restore
operations would have an impact on this server’s performance. If you included it
in your regular server pool, acting as a file or application server, it could slow
down all operations on the network servers.
Figure 5. Single server model (the machines that need to be backed up connect
over the network to the backup server)
This design is well suited for sites with a limited number of machines. There are
multiple reasons for this. For example, network bandwidth is not unlimited.
Another reason for the limit on clients that a single server will support is that each
session will use resources (processor, memory) on the backup server.
2.3.3 Two-tier model
As discussed in 2.3.2, “Single server model” on page 10, scalability and network
bandwidth are limited when working with a single server site. In a two-tier model,
an intermediate backup server is used as a staging platform (see Figure 6 on
page 12). The advantages are twofold:
1. The backup is done to the first backup server (or source server), which resides
locally (and has a LAN connection), and only then forwarded to the central
backup server (or target server). This can be done asynchronously, so that
communication performance between the first and second-level backup
servers is not critical.
2. The backup completes in a much shorter time as data transmission is not
slowed down by tape drive write speeds. This leads to much shorter backup
windows.
You could also load balance large sites by adding additional source servers.
Figure 6. Two-tier model (source servers with local backups forward them
asynchronously to a target, or central, server)
Figure 7 shows what happens with the data that needs to be backed up. In the
first stage, data is moved to the source server. This happens during the period of
time that we have to take our backups (referred to as backup window; see 2.5.1,
“Scheduling backups” on page 19).
Figure 7. Data movement in a two-tier model. In stage 1, data is backed up to
Storage 1 on the first backup server; this operation should complete during the
normal backup window. In stage 2, data is moved from Storage 1 to Storage 2;
this can happen outside of the backup window.
The specifications of the storage device connected to this source server should
be sufficient to store all the data that is backed up. Typically, it will also be a fast
device (probably a disk drive). In the second stage, data on this storage device is
moved across the network to a second backup server. This normally happens
after stage 1 completes (though not necessarily), and can be done outside
of the normal backup window. The only rule here is that all data from the source
servers must be moved to the target server before the backup window restarts.
This setup gives advantages with regard to scalability, since you can add as many
source servers as you want. However, more intelligent software is required to
manage the transfer of backed up data both in backup mode and in restore mode.
In the case of a restore operation, the user should not need to know on which
backup server the data is.
Another advantage of this server storage hierarchy is that in case of a site
disaster at the source server location, the backups still reside on the target
server. Of course, this advantage will only be true if the target and source servers
are geographically separated, and all backup data has been moved to the central
server.
2.3.4 Multi-tier model
The multi-tier or branch office model is an extension of the two-tier model, but
with another stage added (and you can add even more stages if you wish). The
same advantages and disadvantages can be observed. Scalability goes up, but
so does complexity.
Figure 8. Multi-tier model (branches back up to regional offices, which in turn
back up to a central server)
2.4 Storage area network implementations
In this section, we will introduce some tape storage implementations using a
storage area network architecture. This is not intended to be an introduction to
SAN itself. It is limited to currently supported and tested configurations. For more
information please refer to the following redbooks: Introduction to Storage Area
Networks, SG24-5470 and Storage Area Networks: Tape Future in Fabrics,
SG24-5474.
The IBM definition of a storage area network, or SAN, is a dedicated, centrally
managed, secure information infrastructure, which enables any-to-any
interconnection of servers and storage systems.
A SAN is made up of the following components:
• A Fibre Channel topology
• One or more host nodes
• One or more storage nodes
• Management software
The SAN topology is a combination of components, which can be compared to
those used in local area networks. Examples of such components are hubs,
gateways, switches and routers. The transport media used is Fibre Channel,
which is defined in several ANSI standards. Although the name Fibre Channel
suggests the use of fiber connections, copper wiring is also supported.
topology is used to interconnect nodes. The two types of nodes are host nodes,
such as the FC adapter of a server, and storage nodes. Storage nodes can be
any devices that connect to a SAN.
When looking at the above description, it is clear that many configurations can be
created. However, only a limited number of implementations of the SAN
architecture are currently supported for Netfinity backup solutions. This number
will certainly rise in the future.
2.4.1 Why use SAN for tape storage?
There can be several reasons to use SAN, and the importance of these reasons
will depend on your requirements, such as availability, cost and performance. One
thing that should be noted is that current Netfinity SAN implementations are
limited to tape libraries, and not single tape drives. Besides the lack of tested
implementations using single drives, another important fact is responsible for this
lack of support: the main reason for implementing SAN solutions is tape drive and
media sharing. Both concepts are possible when a media pool is available, and
the tape devices have enough intelligence to share this media between them.
Neither of these concepts is applicable to single tape drives.
When talking about the availability of tape storage, two separate points can be
discussed. The first one is availability of the hardware, meaning the tape library
itself. The second one is the availability of the data backed up to tape. In current
high availability implementations, this data is backed up and stored off-site.
Although this way of working is generally accepted, it also might be a good thing
to automate this. By doing so, a copy of local tapes would be sent to a remote site
without human intervention. Retrieving these copies would also be transparent.
This technique, which is sometimes referred to as automatic vaulting, can be
achieved by using the SAN architecture.
Performance issues can also be addressed using SAN architectures. When using
a client/server backup model (the client backs up the data to the backup server),
all the backup data must pass through the network (LAN or WAN). In some cases,
for example, if the backup client is a big database server, the network can no
longer deliver the throughput that is needed to complete the backup in a certain
time frame. Current solutions would consist of putting a local tape device on the
backup client. Besides the extra cost of additional tape devices, decentralizing
backup management can be difficult to maintain. SAN provides a solution by a
technique called “LAN-free backup”. Here, only the meta data (labeled control
data in Figure 9 on page 16) flows over the LAN, while the actual backup data
moves directly from the client to the storage device connected to the SAN.
Figure 9. SAN-based LAN-free backup (control data flows over the LAN between
the backup client and the backup server; the backup data moves over the SAN
from the client directly to the tape storage node)
Even though this solution is still in an early phase, the next step towards
performance improvement has already been architected. This will be called
“server-free backup”. Here, client-attached SAN storage moves data directly
to the tape storage. Besides having the advantage that most of the data no longer
needs to be backed up through the network, you get an additional performance
gain by bypassing the SCSI interface. Both connections (SCSI and network) have
a lower throughput than the SAN interface, whose nominal throughput is rated at
1 Gbps (future implementations will extend this figure to 4 Gbps). See
2.5.2, “Network bandwidth considerations” on page 20 for network throughput
figures. Compared to SCSI, operating at 40 MBps, FC-AL operates at 100 MBps.
Since FC-AL supports full-duplex communications, the total throughput can go up
to 200 MBps.
Figure 10. SAN server-free backup (control data flows over the LAN; the backup
data moves directly from the client’s SAN-attached storage to the tape storage
node)
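As a rough illustration of the raw interface speeds quoted above (ignoring protocol overhead, drive speed, and everything else that limits real throughput), moving the 20 GB example data set over each interface would take:

```python
DATA_MB = 20 * 1024    # the 20 GB example used earlier in this chapter

SCSI_MB_PER_S = 40     # SCSI, as quoted above
FCAL_MB_PER_S = 100    # FC-AL (up to 200 MB/s when running full duplex)

print(DATA_MB / SCSI_MB_PER_S)   # 512.0 seconds over SCSI
print(DATA_MB / FCAL_MB_PER_S)   # 204.8 seconds over FC-AL
```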
Finally, cost reduction can be an important factor in deciding to move tape
storage from traditional SCSI attachments to SAN attachments. Here, using the
sharing capability of a SAN-connected tape library, two or more systems can use
one library. This limits the investment in expensive hardware, and enables the use
of cheaper storage media (as compared to disk). So, where a traditional
implementation of a tape library would probably cost more than the equivalent in
disk storage, sharing of the library increases the utilization factor and decreases
the cost per amount of storage.
Figure 11. Storage cost: cost versus amount of data for disk and tape. The tape
curve starts at the tape library cost and crosses the disk curve at point x.
Figure 11 is a graph of cost versus amount of data for both tape and disk
storage. As you can see, if the amount of data that needs to be stored is lower
than x, disk storage is cheaper than tape. As the amount of stored data grows
past x, however, the total cost of tape drops below that of disk. To get past this
point, you should increase the volume of data that is stored on tape. One way of
doing this is by sharing the library between different systems.
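The crossover in Figure 11 can be sketched numerically. All three cost figures below are invented for the example (none come from the text): tape carries a high fixed library cost but a lower cost per gigabyte, so past some volume x it becomes the cheaper medium.

```python
DISK_PER_GB = 50.0        # assumed disk cost per GB
TAPE_LIBRARY = 20000.0    # assumed fixed tape library cost
TAPE_PER_GB = 5.0         # assumed tape media cost per GB

def disk_cost(gb):
    return DISK_PER_GB * gb

def tape_cost(gb):
    return TAPE_LIBRARY + TAPE_PER_GB * gb

# Crossover point x, where the two cost curves meet:
x = TAPE_LIBRARY / (DISK_PER_GB - TAPE_PER_GB)
print(round(x, 1))                          # 444.4 GB
print(tape_cost(1000) < disk_cost(1000))    # True: past x, tape is cheaper
```

Sharing the library between several systems raises the stored volume past x, which is exactly the utilization argument made above.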
2.4.2 Fibre Channel attached tape storage
Probably the most straightforward type of SAN implementation of a tape device is
where a tape library is connected to one backup server using fiber. This is
possible because fiber connections, using long-wave technology, can span up
to 10 kilometers. This means that you can
physically separate your tape library from your backup server, which might prove
efficient for disaster recovery or automatic vaulting.
Figure 12 shows the logical setup of such a configuration:
Figure 12. Fibre Channel attached tape storage (the host node attaches through
Fibre Channel; the tape storage node attaches via SCSI)
The above diagram is only a representation of a logical configuration. For
information on the actual hardware and software that can be used to implement
this, see Chapter 4, “SAN equipment” on page 105.
This still leaves the question of how to implement the remote vaulting. Since this
is typically done by using tape copies, a second library should be added. Here, for
example, we could use a local SCSI-connected library.
2.4.3 Tape pooling
A configuration that comes closer to the general idea of storage area networks,
sharing storage across multiple machines, is the tape pooling configuration. Here,
one (or more) tape libraries are connected to several backup servers. This is
done by using a Fibre Channel SAN switch. The main advantage of this type of
installation is the ability to share a (costly) tape library between two or more
backup servers. Although this might look like something that could already be
accomplished in the past, using a library setup in split configuration (the library is
logically split in two, each part using one tape device connected to one backup
system), there are some differences.
The split configuration was a static setup. This means that you connected one
tape drive to one system, the other tape drive to another. If you had a library with
two tape devices, the split setup meant that you created two smaller, independent
libraries. Also the cartridges were assigned to one part of the split library.
In a tape pooling configuration, there is no physical or logical split of the tape
hardware. The entire library is available to both systems. This means that when
one server needs two tape drives for a certain configuration, it will be able to get
them (if they are not being used by another system). Also, the free tapes, or
scratch pool, can be accessed by both systems.
However, the physical media that are in use (meaning that they do not belong to
the scratch pool) cannot be shared. A tape used by one system cannot be read by
another.
Figure 13 shows a tape pooling configuration:
Figure 13. Tape pooling (two host nodes attach through Fibre Channel to the
SAN; the tape storage node attaches via SCSI)
Again, this configuration is just a logical layout. The exact physical layout, and the
necessary hardware and software will be discussed later.
2.5 Performance considerations
Talk to people who use backup software intensively and you will find that one of
their major problems is performance. The reason is as follows: while the
amount of data increases steadily, the time that a machine is available for backup
(which has a performance impact on the machine, and sometimes requires
applications to be quiesced) often gets shorter. That is, more data has to be
moved in a shorter time period. Although hardware and software manufacturers
are continually improving their products to cope with this trend, some parameters
affecting performance are related to the way the backup solution is implemented.
The following topics discuss some of the techniques you can use, as well as
some considerations that might help you determine what performance issues
should be addressed.
2.5.1 Scheduling backups
When thinking about which machines you are going to back up, you will probably
think about file or application servers. Unfortunately, these machines get updates
during the day, and the only time it makes sense to back up these systems is after
hours. The reason for this is that backup products need to access files and back
up valid copies of them. If these files are in use and modified during a backup, the
backup version you have would not be very helpful when restored.
That is why you should determine a period of time in which operations on the
machine that you will back up are minimal, and use this period of time to run your
backup. This period is often referred to as the backup window. You will soon see
that this backup window usually starts sometime late at night, and ends early in
the morning, not exactly the time you or someone else wants to sit beside the
machine starting or stopping backup operations. Luckily, backup programs make
good use of scheduling mechanisms. These schedulers allow you to start a
backup at a certain point in time.
The following points are important when automating your backup processes using
schedulers:
• What will my backup application do in case of errors? Will it continue or stop?
The worst case would be if the application stops and asks for user
intervention. It would be better for the backup application to make every effort
to work around problems, backing up as much of your data as possible.
• Will operations and errors be logged somewhere, so I can check if the
backups were successful?
• If the backup operation takes longer than the defined backup window, will it
continue or stop?
There are different scheduling mechanisms, each with its own advantages and
disadvantages. For more details, please refer to Chapter 5, “Software” on page
123.
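The behavior asked for in the questions above can be sketched as a wrapper around a per-item backup action. This is an illustration only: `backup_one` is a hypothetical callable, and real schedulers are built into the backup product itself.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("backup")

def run_scheduled_backup(items, backup_one, window_seconds):
    """Back up each item, logging failures and continuing instead of
    stopping for user intervention, and giving up cleanly once the
    backup window expires. Returns the list of items not backed up."""
    deadline = time.monotonic() + window_seconds
    failed = []
    for item in items:
        if time.monotonic() > deadline:
            log.warning("backup window expired before %s", item)
            failed.append(item)
            continue
        try:
            backup_one(item)
            log.info("backed up %s", item)
        except Exception as exc:      # log the error and keep going
            log.error("failed to back up %s: %s", item, exc)
            failed.append(item)
    return failed
```

A run over three items where one raises an error would return just that item, while the other two complete and every outcome is logged for later checking.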
2.5.2 Network bandwidth considerations
When implementing a backup solution that backs up data to a backup server over
the network, an important factor is network bandwidth. The reason for this is that
all the data must go over the network. This becomes even more important when
different machines are trying to back up to one server at the same time, since the
amount of data increases. That is why network bandwidth will be one of the
factors when deciding how many machines will be backed up to one backup
server, and which backup window will be needed.
To calculate the time needed for a backup, the following points must be
considered:
• The amount of data that will be backed up
Unfortunately, this number can differ from backup system to backup system.
Let’s say you have a file server with 20 GB of data. When you do a full backup
of this system, it will indeed send 20 GB. But most backup programs also work
with incremental or differential backup algorithms, which only back up
changed data. So, to figure out the amount of data that is backed up in such
an operation, we will have to consider the following points:
• How much data changes between two backups?
• What does “changed” mean to my backup program?
Backup programs will normally also compress data. Unfortunately, the
compression rate is strongly dependent on the type of file you are backing up,
and therefore hard to define. For initial calculations, you could take the worst
case scenario, where no compression would take place.
• Nominal network speed (commonly expressed in Mbps)
This is the published speed of your network. Token-ring, for example, has a
nominal network speed of 4 or 16 Mbps.
• The practical network speed
Since a communication protocol typically adds headers, control data and
acknowledgments to network frames, not all of it will be available for our
backup data. As a rule of thumb, the practical capacity is 50-60% for
token-ring, FDDI or ATM networks, and 35% for Ethernet networks.
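Putting the rule of thumb into numbers: the 20 GB figure and the efficiency factors are the ones given above, with a 55% midpoint assumed for token-ring and, as an illustration, a 100 Mbps Ethernet for comparison.

```python
def backup_hours(data_gb, nominal_mbps, efficiency):
    """Time to move data_gb over a network with the given nominal speed,
    of which only `efficiency` is practically usable for backup data."""
    usable_mb_per_s = nominal_mbps * efficiency / 8
    return data_gb * 1024 / usable_mb_per_s / 3600

# 20 GB over 16 Mbps token-ring at ~55% practical capacity:
print(round(backup_hours(20, 16, 0.55), 1))    # 5.2 hours
# the same 20 GB over 100 Mbps Ethernet at 35%:
print(round(backup_hours(20, 100, 0.35), 1))   # 1.3 hours
```

A calculation like this, repeated per client, gives a first estimate of how many machines fit into a given backup window.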