Data Mobility Group Research Perspective
EMC Avamar: Optimized Backup and Recovery with Source/Global Data De-duplication
Joseph Martins and Walter Purvis
January 9, 2009
Abstract:
Data growth will continue to outpace the growth of IT budgets for the
foreseeable future and IT departments will be expected to manage ever
more data with proportionally fewer resources and staff. Now more
than ever organizations need cost-effective data protection solutions.
Sustainable, reliable access to digital information (office documents,
email, instant messages, online transactions, images, video, and
more) is essential. Disk-based backup solutions that use data
de-duplication technology provide an affordable, sustainable, and more
manageable alternative to traditional tape-based backups.
This DMG Research Perspective examines the data protection
challenges found in remote office and VMware environments, and
presents the advantages of deploying an EMC Avamar solution to meet
those challenges.
Data Protection Challenges
Traditional Tape and Disk Backups
Moving away from tape-based backup and recovery infrastructure is a strategic
imperative, especially for companies with resource-strapped remote offices.
There are several well-known problems with tape-based backup and recovery:
Copyright © 2002-2009 Data Mobility Group, LLC. All Rights Reserved. Reproduction of this publication
without prior written permission is forbidden. Data Mobility Group believes the statements contained herein
are based on accurate and reliable information. However, because information is provided to Data Mobility
Group from various sources, we cannot warrant that this publication is complete and error-free. Data Mobility
Group disclaims all implied warranties, including warranties of merchantability or fitness for a particular purpose.
Data Mobility Group shall have no liability for any direct, incidental, special, or consequential damages
or lost profits. The opinions expressed herein are subject to change without notice.
datamobilitygroup.com 76 Northeastern Blvd. Suite 29A, Nashua NH 03062 Phone: 603.835.6141 Fax: 877.254.4886
• A lack of experienced on-site staff (especially in branch offices)
• Unacceptably slow recovery times when tapes are stored off-site
• Inefficient recovery of small numbers of lost files
• Unreliable processes that are especially prone to human error
• Security concerns due to lost or stolen tapes
• Difficulty and cost of responding to e-discovery requests and compliance-related inquiries
Yet many organizations continue to tolerate the expense and headaches of tape, either because they
are unaware of better disk-based alternatives or they believe that disk-based alternatives are still
too expensive. Some organizations have purchased disk-based virtual tape library (VTL) systems.
VTLs do reduce or eliminate some of the headaches of tape backup management, and they allow
organizations to leverage their investments in Fibre Channel infrastructure and continue using their
existing backup processes. However, in a simple head-to-head total cost of ownership (TCO)
comparison, backing up to disk (without de-duplication) is still considerably more expensive than
backing up to tape.
Storage-Hungry Virtualization
Rapid data growth, backup multiplicity, and highly redundant virtual computing environments
conspire to make a bad situation worse. Driven largely by power, floor space, and manageability
constraints, organizations have embraced server virtualization as a way to consolidate many
physical servers into fewer physical servers running large numbers of virtual machines.
Unfortunately, server virtualization massively increases the resource consumption of traditional
tape and disk backup processes. Running traditional backup software on individual virtual
machines results in resource contention for the underlying physical server’s network bandwidth,
CPU, memory, and disk—making it very difficult to meet shrinking backup windows. And,
running traditional backup software at the host server level consumes quite a large amount of
disk space (for example, copies of VMware’s virtual machine disk (VMDK) files might be 10,
50, 100, or more gigabytes each). Because so few of the files within cloned VMDKs change
on a day-to-day basis, it makes no sense to regularly back up dozens or hundreds of duplicate
copies across identical VMDKs. The costs of network bandwidth and disk storage alone make
this an untenable approach.
The Economics of Data De-duplication
In its broadest sense, de-duplication is the removal of redundant data from a defined set of data.
Unlike traditional data compression methods that are applied to individual files, de-duplication can
be applied across the files in a dataset and across storage devices at both the file and sub-file level,
depending upon the vendor’s offering.
The cost benefit of data de-duplication is undeniable even at ratios of just 10-to-1 or 20-to-1. DMG’s
own year-long research project using EMC Avamar revealed a massive 76-to-1 disk storage
savings over traditional daily full backups and greater than 18-to-1 space savings over traditional
weekly full and daily incremental backups. On virtual servers the space savings across VMDKs
can easily exceed 40-to-1. And, while disk drives are relatively inexpensive, the fully loaded cost
(e.g., energy, floor space, labor, and maintenance) of operating and managing unnecessarily
large disk systems is not. De-duplication squeezes as much capacity as possible out of the fewest
storage devices to minimize costs across the board. In effect, data de-duplication makes
disk-based backup solutions more affordable than tape.
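The ratios quoted above translate directly into disk avoided: at a ratio of R-to-1, only 1/R of the logical backup volume is written, so the saving is 1 − 1/R. The short sketch below (the function names are our own, for illustration only) makes the point that most of the benefit arrives early, which is why even 10-to-1 or 20-to-1 is compelling.

```python
def dedup_ratio(logical_tb: float, stored_tb: float) -> float:
    """Ratio of data protected (logical) to data actually written to disk."""
    if stored_tb <= 0:
        raise ValueError("stored_tb must be positive")
    return logical_tb / stored_tb


def space_saved(ratio: float) -> float:
    """Fraction of disk capacity avoided at a given de-duplication ratio."""
    if ratio < 1:
        raise ValueError("a de-duplication ratio is at least 1-to-1")
    return 1.0 - 1.0 / ratio


# Ratios quoted in this paper; each step up buys less than the one before.
for ratio in (10, 18, 20, 40, 76):
    print(f"{ratio:>2}-to-1 -> {space_saved(ratio):.1%} less disk")
```

At 10-to-1 the disk footprint already shrinks by 90 percent, while moving from 40-to-1 to 76-to-1 adds only about one more percentage point of savings.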
Backup Data De-duplication Approaches
Target vs. Source De-duplication
Backup data de-duplication solutions differ significantly in terms of where they perform the
process of finding redundant data. Generally speaking, data de-duplication can occur at the target
or at the source, depending on the selected vendor solution. Where it occurs determines the
impact on an organization’s ability to meet shrinking backup windows, while leveraging existing
infrastructure and resources.
Target de-duplication products, as their name implies, are typically backup targets for traditional
backup software. Backup data is de-duplicated only when it reaches the target backup hardware
device. This means that all of the data from the backup source, including lots of redundant data, is
sent across the network or virtual infrastructure during daily backup operations. In many situations,
the vast majority of data is redundant and the result is wasted network and disk resources and
unnecessarily lengthy backup processes. In bandwidth-constrained environments—for example,
remote offices attempting to back up to a corporate data center over a WAN, or multiple virtual
servers contending for the same network interface on one physical server—finding enough
bandwidth to complete backups within available backup windows is often unaffordable or
impossible. In addition, long-running backup jobs can deprive applications of needed server
resources and have a negative impact on end-user productivity.
In contrast, EMC Avamar de-duplicates backup data at the source. As a result, redundant
data is eliminated up-front at the start of the backup process (at the client) before any data is
moved across the network. The primary benefits of examining and de-duplicating backup data
at the source are:
• Fast, efficient daily full backups since only unique sub-file data is moved over the network
• Significantly lower resource contention across congested networks and virtual
infrastructure
• Shorter required backup windows due to less data in flight
• Lower operating expenses and the ability to leverage existing network infrastructure
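To make the difference concrete, here is a toy simulation of one client's daily full backups under both approaches. It is a sketch under stated assumptions, not Avamar's protocol: the fixed 4 KB segments, SHA-1 fingerprints, and the in-memory dictionary standing in for the backup server are all hypothetical simplifications (Avamar itself uses variable-length segments, as discussed below).

```python
import hashlib
import random

SEGMENT_SIZE = 4096  # hypothetical fixed size, chosen only to keep the sketch short
HASH_SIZE = 20       # bytes per SHA-1 fingerprint sent by the client


def split(data: bytes) -> list[bytes]:
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


def fingerprint(segment: bytes) -> bytes:
    return hashlib.sha1(segment).digest()


def target_backup(data: bytes, store: dict[bytes, bytes]) -> int:
    """Target de-duplication: ship everything, de-duplicate after it arrives."""
    for seg in split(data):
        store.setdefault(fingerprint(seg), seg)
    return len(data)  # every byte crossed the network


def source_backup(data: bytes, store: dict[bytes, bytes]) -> int:
    """Source de-duplication: send fingerprints first, then only unseen segments."""
    segs = {fingerprint(seg): seg for seg in split(data)}
    missing = [fp for fp in segs if fp not in store]  # models the server's reply
    for fp in missing:
        store[fp] = segs[fp]
    return len(segs) * HASH_SIZE + sum(len(segs[fp]) for fp in missing)


day1 = random.Random(42).randbytes(1_000_000)  # a client's data on day 1
edited = bytearray(day1)
edited[5000] ^= 0xFF                           # one byte changes overnight
day2 = bytes(edited)

target_store: dict[bytes, bytes] = {}
source_store: dict[bytes, bytes] = {}
target_sent = [target_backup(d, target_store) for d in (day1, day2)]
source_sent = [source_backup(d, source_store) for d in (day1, day2)]
print("target de-dup, bytes sent per daily full:", target_sent)
print("source de-dup, bytes sent per daily full:", source_sent)
```

Both approaches end up storing the same unique segments; the difference is that the target approach moves the full megabyte over the network every night, while the source approach moves one changed segment plus a list of fingerprints on day 2.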
EMC Avamar also de-duplicates backup data globally, across sites and servers. Only a single
copy of each unique sub-file, variable-length data segment is stored to disk during backup operations. As
a result, Avamar can significantly reduce the total back-end disk storage required, in addition to
providing the benefits of de-duplication at the source.
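Global de-duplication means the index of stored segments is shared by every client and site, so a segment already protected anywhere is never stored again. The sketch below assumes a hypothetical `GlobalSegmentStore` class and uses 20 clones of a small "golden" VMDK image split across two sites, each clone carrying one VM-specific segment; segment sizes and counts are illustrative only.

```python
import hashlib
import random


class GlobalSegmentStore:
    """Hypothetical global index: one stored copy per unique segment, from any client or site."""

    def __init__(self) -> None:
        self.segments: dict[bytes, bytes] = {}
        self.logical_bytes = 0  # what the clients asked to protect

    def put(self, segment: bytes) -> bool:
        """Store a segment unless an identical one already exists; return True if new."""
        self.logical_bytes += len(segment)
        key = hashlib.sha256(segment).digest()
        if key in self.segments:
            return False
        self.segments[key] = segment
        return True

    @property
    def physical_bytes(self) -> int:
        return sum(len(s) for s in self.segments.values())


rng = random.Random(1)
golden_image = [rng.randbytes(1024) for _ in range(64)]  # cloned VMDK, 64 x 1 KB segments
store = GlobalSegmentStore()
for _site in ("headquarters", "branch-office"):  # both sites share one index
    for _vm in range(10):
        for seg in golden_image + [rng.randbytes(1024)]:  # clone plus one VM-specific segment
            store.put(seg)

print(f"logical {store.logical_bytes:,} B, stored {store.physical_bytes:,} B, "
      f"ratio {store.logical_bytes / store.physical_bytes:.1f}-to-1")
```

The 1,300 segments the clients protect collapse to 84 stored segments: the 64 shared image segments once, plus the 20 VM-specific ones.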
Does Data Segment Length Matter?
The short answer is yes. Leading data de-duplication solutions on the market reduce data at
the sub-file level, but some use fixed-length data segments while others use variable-length
segments. Segment length, or more accurately, the ability to vary segment length
based on commonality within the dataset, is what makes maximum data reduction possible. After all,
maximum data reduction is the purpose of de-duplication.
As users edit their files or save new files, a de-duplication engine that uses variable-length
data segments is better equipped to detect the changes and store only the new, unique
segments during backup operations. For example, a fixed-length segment solution can be fooled
by the insertion of a single new character into an existing file since it erroneously views the logical
shift in data as entirely new segments—when, in fact, the original data is mostly unchanged. As
a result, fixed-length data segment solutions are significantly less efficient and require additional
network bandwidth and storage capacity over time.
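The insertion problem is easy to reproduce. In this sketch (fixed 4 KB segments and SHA-256 fingerprints are assumptions for illustration, not any particular vendor's settings), one byte is inserted near the start of a 64 KB file; because every later segment boundary shifts by one byte, every segment of the edited file gets a fingerprint the store has never seen.

```python
import hashlib
import random

SEGMENT_SIZE = 4096  # hypothetical fixed segment size


def fixed_segments(data: bytes) -> list[bytes]:
    """Cut every SEGMENT_SIZE bytes, regardless of content."""
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


def fingerprints(segments: list[bytes]) -> set[bytes]:
    return {hashlib.sha256(s).digest() for s in segments}


original = random.Random(7).randbytes(64 * 1024)
edited = original[:100] + b"X" + original[100:]  # insert one character near the start

before = fingerprints(fixed_segments(original))
after_segments = fixed_segments(edited)
new = fingerprints(after_segments) - before
print(f"{len(new)} of {len(after_segments)} segments look new after a 1-byte insertion")
```

All 17 segments of the edited file look new, so a fixed-length engine would transfer and store essentially the whole file again.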
EMC Avamar sports one of the most efficient backup data de-duplication engines available today
and consistently outperforms the competition in de-duplication bake-offs. Its sub-file, variable-
length de-duplication technology is not fooled by data insertions or deletions, so only the new,
truly unique data segments are backed up. Avamar’s variable-length data segments average just 12 KB,
significantly finer-grained, and therefore more efficient, than fixed-length solutions that may use a
minimum default segment size of 128 KB, 256 KB, or more. Together, these properties explain how
Avamar efficiently de-duplicates data at the source (and globally across multiple sites) to minimize
the amount of data moved across the network and ultimately stored to disk.
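This paper does not describe Avamar’s segmentation algorithm, so the sketch below swaps in a generic technique, content-defined chunking with a Rabin-Karp-style rolling hash, to show why variable-length segments survive insertions. Boundaries are placed where the hash of the last few bytes meets a condition, so they move with the content rather than with byte offsets. The window, base, modulus, and size limits are illustrative assumptions, not Avamar parameters.

```python
import hashlib
import random

WINDOW = 48              # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 61) - 1      # a Mersenne prime modulus
DROP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def cdc_segments(data: bytes, target: int = 4096, min_size: int = 1024,
                 max_size: int = 16384) -> list[bytes]:
    """Content-defined chunking: cut where the window hash is divisible by `target`.

    The hash is never reset at a boundary, so whether position i is a boundary
    depends only on the WINDOW bytes ending at i (plus the min/max size limits).
    """
    segments, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * DROP) % MOD  # remove the oldest byte
        h = (h * BASE + byte) % MOD                  # append the newest byte
        size = i + 1 - start
        if (size >= min_size and h % target == 0) or size >= max_size:
            segments.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        segments.append(data[start:])
    return segments


def fingerprints(segments: list[bytes]) -> set[bytes]:
    return {hashlib.sha256(s).digest() for s in segments}


original = random.Random(7).randbytes(128 * 1024)
edited = original[:100] + b"X" + original[100:]  # the same kind of 1-byte insertion

before = cdc_segments(original)
after = cdc_segments(edited)
new = fingerprints(after) - fingerprints(before)
print(f"{len(new)} of {len(after)} segments look new after a 1-byte insertion")
```

After the first boundary past the insertion point, the edited file produces exactly the same segments as the original, so typically only the one segment containing the new byte has to be sent and stored.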
Scalability
Leading backup solutions promise to make it easy to increase performance and capacity when needed, but
not all are as simple as advertised. One vendor’s approach separates the backup data from its
associated metadata. While this approach seems conceptually elegant, users quickly realize that
separately managing and scaling the metadata and backup data can be an unsustainable nightmare.
More boxes, more space, more power, and more system management overhead are exactly what
most companies do not want.
The ability to simply drop in an additional self-contained box with incrementally more backup
compute power and capacity provides the sort of organic scalability companies desire, without the
unnecessary cost or complexity of scaling and managing the metadata and backup data separately.
EMC Avamar’s scalable grid architecture lets IT managers add compute power and disk capacity
simply by adding another Avamar server (node) to the grid, whether they opt for the out-of-the-box
Avamar Data Store or install the software on their own commodity servers. Existing backup data
is automatically load-balanced across the newly added server for maximum performance, without
any downtime. There is no need to separately manage and plan for the growth of metadata and backup
data. Just drop in a new box and go.
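The paper does not say how Avamar places data on grid nodes. As a hedged illustration of how a grid can absorb a new node without reshuffling everything, the sketch below uses consistent hashing, a generic technique we have swapped in: segments map to points on a hash ring, and adding a node moves only the segments that now fall to it. The `HashRing` class, node names, and virtual-node count are hypothetical.

```python
import bisect
import hashlib


def ring_point(key: str) -> int:
    """Map a name onto a 64-bit hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hashing placement of backup segments on grid nodes (illustrative)."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes  # virtual points per node smooth out the load
        self._points: list[int] = []
        self._owners: list[str] = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for v in range(self.vnodes):
            p = ring_point(f"{node}#{v}")
            i = bisect.bisect(self._points, p)
            self._points.insert(i, p)
            self._owners.insert(i, node)

    def owner(self, segment_id: str) -> str:
        """First node clockwise from the segment's point on the ring."""
        i = bisect.bisect(self._points, ring_point(segment_id)) % len(self._points)
        return self._owners[i]


ring = HashRing(["node-1", "node-2", "node-3"])
segment_ids = [f"segment-{n}" for n in range(10_000)]
before = {s: ring.owner(s) for s in segment_ids}

ring.add_node("node-4")  # drop in a new box
moved = {s for s in segment_ids if ring.owner(s) != before[s]}
print(f"{len(moved) / len(segment_ids):.1%} of segments move, all of them to node-4:",
      all(ring.owner(s) == "node-4" for s in moved))
```

Roughly a quarter of the segments migrate, and every one of them goes to the new node; segments on the existing nodes never move between those nodes.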
High Availability and Reliability
The idea of having to add (and pay for) additional hardware, software, licenses, and training to
achieve high availability is unappealing. When high availability is native to an appliance, there
is no need for the added complexity and the corresponding costs of additional external hardware
and/or software.
Unlike Avamar’s closest competitor, which provides no high availability without the addition
of external disk, specialized (and expensive) clustering software, and training, EMC Avamar’s
Redundant Array of Independent Nodes (RAIN) architecture provides built-in high availability
and fault tolerance across nodes. EMC Avamar nodes continuously communicate and
cooperate without administrative intervention, automatically detect and configure new nodes,
check the Avamar server’s integrity twice daily, and verify data recoverability
daily, all with no downtime.
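Beyond "built-in high availability and fault tolerance across nodes," the paper gives no details of how RAIN protects data, so the following is only a generic illustration of the idea behind a redundant array of nodes: single XOR parity, where a fourth node holds the XOR of three data stripes and any one lost stripe can be rebuilt from the survivors. The stripe contents and node names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    if len(a) != len(b):
        raise ValueError("stripes must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(stripes: list[bytes]) -> bytes:
    """Single XOR parity over equal-length stripes held on different nodes."""
    return reduce(xor_bytes, stripes)


def rebuild(surviving: list[bytes], parity_stripe: bytes) -> bytes:
    """Recover the one missing stripe: XOR of parity and every surviving stripe."""
    return reduce(xor_bytes, surviving, parity_stripe)


nodes = {
    "node-1": b"backup segment A",
    "node-2": b"backup segment B",
    "node-3": b"backup segment C",
}
parity_on_node_4 = parity(list(nodes.values()))  # stored on a fourth node

lost = nodes.pop("node-2")  # simulate a failed node
recovered = rebuild(list(nodes.values()), parity_on_node_4)
print(recovered)
```

The rebuild needs no external disk or clustering software: the surviving nodes already hold everything required to reconstruct the failed node’s data.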
Backup and Recovery Performance
Backup and recovery performance can be influenced by many factors, including the type of servers,
network links, and other infrastructure considerations. However, the right data de-duplication
technology can significantly increase performance, even across slow or congested environments.
As discussed earlier, data segment size makes a big difference in de-duplication efficiency,
with the clear advantage going to solutions that de-duplicate data using variable length data
segments. Where the de-duplication occurs also plays an important part, since de-duplicating
data at the source always results in less data to move across slow, congested physical or virtual
environments.
Only EMC Avamar de-duplicates backup data at the source (and globally) using variable-length
data segments. Not surprisingly, Avamar is ideally suited for challenging backup environments
such as remote office/branch office (ROBO) and virtual environments (e.g., VMware).
Moving only the new, unique data segments during daily full backup operations means ROBO
environments can leverage existing wide area network (WAN) links and centralize backup
management. And many virtual environments can actually increase server consolidation levels,
since Avamar removes the backup bottlenecks.
In all cases, Avamar delivers fast daily full backups, and Avamar’s single-step recovery
eliminates the tedious process of recovering from the last good full and subsequent incremental
backups to reach the desired recovery point. As a result, users can eliminate or greatly reduce
their reliance on tape since Avamar provides efficient, affordable backups that enable data to be
retained locally on disk for extended periods of time.
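The difference between the two recovery models can be sketched in a few lines. This is a simplified illustration under stated assumptions: whole-file hashing stands in for sub-file segments, the function names are hypothetical, and deletions are ignored. With de-duplicated daily fulls, each backup is a complete manifest that points into a shared store, so restoring any day is one lookup; a traditional restore replays the last full plus every incremental after it.

```python
import hashlib


def backup(files: dict[str, bytes], store: dict[str, bytes]) -> dict[str, str]:
    """Take a full point-in-time manifest; store only contents not seen before."""
    manifest = {}
    for path, content in files.items():
        key = hashlib.sha256(content).hexdigest()
        store.setdefault(key, content)
        manifest[path] = key
    return manifest


def restore(manifest: dict[str, str], store: dict[str, bytes]) -> dict[str, bytes]:
    """Single step: one manifest resolves every file of that recovery point."""
    return {path: store[key] for path, key in manifest.items()}


def restore_from_chain(full: dict[str, bytes],
                       incrementals: list[dict[str, bytes]]) -> dict[str, bytes]:
    """Traditional: start from the last full, then replay every incremental in order."""
    state = dict(full)
    for incremental in incrementals:
        state.update(incremental)  # deletions are ignored in this sketch
    return state


day1 = {"a.txt": b"v1", "b.txt": b"unchanged"}
day2 = {"a.txt": b"v2", "b.txt": b"unchanged"}
day3 = {"a.txt": b"v2", "b.txt": b"unchanged", "c.txt": b"new file"}

store: dict[str, bytes] = {}
manifests = [backup(day, store) for day in (day1, day2, day3)]

print(restore(manifests[2], store) == day3)  # one manifest
print(restore_from_chain(day1, [{"a.txt": b"v2"}, {"c.txt": b"new file"}]) == day3)  # full + 2 incrementals
print(len(store), "unique contents stored for 3 full backups")
```

Both paths reach the same recovery point, but the manifest path needs no chain of earlier backups, and three full backups cost only four stored objects.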
Deployment Options and Application Support
Given the incredibly diverse server, storage and application infrastructures found in modern
business, deployment flexibility and application support are essential elements of any backup
technology offering.
EMC Avamar offers the broadest range of deployment options of any source/global data de-
duplication backup solution. Deployment options include:
• Avamar agents installed directly on the systems to be protected (great for smaller remote
offices because it eliminates the need for extra local hardware).
• Avamar software installed on industry-standard certified servers (perfect for organizations
that wish to choose or reuse their own hardware).
• EMC Avamar Data Store—a pre-packaged, pre-configured solution consisting of Avamar
software bundled with EMC hardware. Scalable from single to multiple nodes to provide
the equivalent of up to several petabytes of cumulative traditional backup storage (a
turnkey solution from EMC that simplifies ordering, deployment, and service).
• Avamar Virtual Edition—an industry first that enables an Avamar server to be deployed
as a virtual appliance on an existing ESX server (to leverage existing compute power and
disk storage).
And when it comes to protecting VMware environments, the Avamar software agent can be
installed within the VM Guest, at the Service Console, or at the VMware Consolidated Backup
(VCB) proxy server. In all cases, Avamar efficiently de-duplicates backup data at the source, and
globally across the entire environment.
EMC Avamar also supports the broadest array of enterprise applications of any source
de-duplication technology, from Avamar’s proven Virtual Edition (a certified VMware virtual
appliance) to its unrivaled support for NetWare, Oracle, DB2, VMware ESX 3.5, Windows Vista,
Windows NT, UNIX, Linux, Mac OS, NetApp and EMC Celerra filers, Microsoft Exchange, Microsoft SQL Server,
and other key applications and environments.
EMC Avamar also integrates with EMC NetWorker. As a result, NetWorker users can deploy a
single agent and decide which servers to de-duplicate via the NetWorker Management Console to
leverage their existing interface and schedules.
Reporting
One of the key components of any backup solution, essential to maximize productivity and
minimize downtime, is a detailed management user interface and reporting tool.
EMC Avamar’s Enterprise Manager dashboard provides an intuitive, at-a-glance view of the entire
Avamar environment. Combined with the Avamar Administrator, Avamar delivers user-friendly,
powerful native backup management and reporting, integration with EMC Backup Advisor, and a
point-and-click interface that minimizes the number of clicks necessary to complete most tasks.
Real World Avamar
The results of Avamar’s performance during our in-house 2007 road test were crystal clear. Still,
we wanted to find out if other companies experienced similar benefits in much larger, more
distributed environments. Fortunately, we had the opportunity to speak to the director of one such
operation at a leading Fortune 10 multinational corporation.
His organization provided great service to its data centers, but he wanted to cost-effectively
improve the level of service and support to some 300 remote offices each with 1-2 terabytes of data
onsite. His challenges were many:
• A large amount of data distributed across an equally large number of remote offices
• Massive data growth (double- and even triple-digit percentages)
• Shrinking backup windows
• A need to cost-effectively store more data for longer periods of time
• The risks associated with tape jockeying and having media offsite in third-party hands
• A need to support a broad range of operating systems and applications
Avamar was chosen for its broad client support, manageability, superior de-duplication, and speedy
full backups, among other reasons. The company expects its data volume to double in 2009. With
35-day retention policies for daily workloads, in addition to a single monthly backup and one to seven annual backups,
Avamar’s de-duplication will continue to keep a leash on data growth. Backup windows have
improved by 33 to 45 percent, and clients are now backed up in 6 hours or less. And the company is able to
use a single solution across a variety of platforms, from HP-UX, Linux, and Solaris to SQL Server,
Oracle, and NAS.
The company’s next big push is for the use of Avamar in the data center and greater replication
between facilities. The plan is to retain short term data onsite, and retain long-term data in an
offsite facility using Avamar.
Summing up
EMC Avamar is just one part of a broad portfolio of backup and recovery solutions that EMC
has assembled to satisfy the data protection needs of nearly any organization. In addition to
Avamar’s source-based de-duplication, EMC also integrates Avamar into EMC NetWorker, and
offers target-based de-duplication solutions with its Disk Library DL1500, DL3000 and DL4000
Series products.
Data Mobility Group has found EMC Avamar to be one of the best de-duplicating backup and
recovery solutions available today. In February 2008, we published the results of a 13-month,
in-house EMC Avamar road test.[1] EMC’s Avamar technology made it possible for one person
to set up, schedule, monitor, and manage more than 365 daily full backups of nearly 200 GB of
data distributed across several servers. By the end of the road test the system had consumed less
than 1/76th the capacity required by traditional daily full backups, 1/18th the capacity required by
traditional weekly full and daily incremental backups, and the backups occurred very quickly with
minimal network impact.
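A back-of-the-envelope check shows where the traditional figure comes from. Assuming, for simplicity, a constant dataset size (our assumption; the real dataset varied), retaining 365 daily fulls of about 200 GB requires roughly 73 TB, the same order as the ~76 TB charted for 13 months of traditional daily fulls. At a 76-to-1 ratio, that volume fits in under 1 TB.

```python
def daily_full_capacity_tb(backups: int, dataset_gb: float) -> float:
    """Logical capacity of retaining every daily full, assuming a constant dataset size."""
    return backups * dataset_gb / 1000  # decimal units: 1 TB = 1,000 GB


traditional_tb = daily_full_capacity_tb(backups=365, dataset_gb=200)
print(f"~{traditional_tb:.0f} TB for 365 daily fulls of ~200 GB")
print(f"at 76-to-1, that is ~{traditional_tb / 76 * 1000:.0f} GB on disk")
```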
Figure: Total Backup Storage Consumed Over 13 Months. EMC Avamar (daily fulls): < 1 TB; traditional backup methods (weekly fulls and daily incrementals): ~18 TB; traditional backup methods (daily fulls): ~76 TB.
EMC Avamar has one of the most efficient data de-duplication engines available today, consistently
outperforming the competition in de-duplication bake-offs. The graphic above illustrates just
how effective Avamar can be after one year in an ordinary office environment such as DMG’s. Its
variable-length, sub-file de-duplication at the source (and globally across multiple sites) minimizes
the amount of data stored in backups and moved over the network.
The Avamar product lineup offers outstanding flexibility, manageability, reliability,
infrastructure reusability, and proven cost-savings. There are many environments in which
EMC Avamar could be usefully deployed, but it is particularly advantageous for organizations
that have remote office environments, extensive VMware deployments, or a need for LAN-
based backup within their data centers.
Organizations in search of an affordable, sustainable, reliable, and more manageable alternative to
tape-based backup cannot afford to overlook EMC Avamar.
Footnotes
[1] High Value Remote Office Data Protection With EMC Avamar, published February 6, 2008.