SlideShare uma empresa Scribd logo
1 de 6
Baixar para ler offline
White Paper




                        Dedupe-Centric Storage
                        Hugo Patterson, Chief Architect, Data Domain




                        Abstract
                        Deduplication is enabling a new generation of data storage systems. While traditional
                        storage vendors have tried to dismiss deduplication as a feature, it is proving to be a true
                        storage fundamental. Faced with the choice of completely rebuilding their storage system,
                        or simply bolting on a copy-on-write snapshot, most vendors choose the easy path that
                        will get a check-box snapshot feature to market soonest.

                        There are a vast number of requirements that make it very challenging to build a storage
                        system that actually delivers the full potential of deduplication. Deduplication is hard to
                        implement in a way that runs fast with low overhead across a full range of protocols and
                        application environments. When deduplication is done well, people can create, share,
                        access, protect, and manage their data in new and easier ways, and IT administrators
                        aren’t struggling to keep up.

                        Dedupe-centric storage is engineered from the start to address these challenges.
                        This paper gives historical perspective and a technological context to this revolution
                        in storage management.




DEDUPLICATION STORAGE
Dedupe-Centric Storage

Table of Contents

Deduplication: The Post-Snapshot Revolution
in Storage . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 3


       Building a Better Snapshot . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 3


       Deduplication as a Point Solution . .  .  .  .  .  .  .  .  .  .  .  .  .  . 3


Dedupe as a Storage Fundamental. .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 4


       Variable Length Duplicates . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 4


       Local Compression . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 4


       Format Agnostic .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 4
                        .


       Multi-Protocol . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 4


       CPU-Centric vs. Disk-Intensive Algorithm . .  .  .  .  .  .  .  . 5


       Deduplicated Replication . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 5


       Deduplicated Snapshots . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 5


Conclusion . .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  . 5




2     DEDUPE-CENTRIC STORAGE
Deduplication: The Post-Snapshot Revolution in Storage
The revolution to come with deduplication will eclipse the snapshot revolution of the early 1990s. Since at least the 1980s, systems had been
saving versions of individual files or snapshots of whole disk volumes or file systems. Technologies existed for creating snapshots reasonably
efficiently. But, in most production systems, snapshots were expensive, slow, or limited in number and so had limited use and deployment.

About 15 years ago, Network Appliance (NetApp) leveraged the existing technologies and added some of their own innovations to build
a file system that could create and present multiple snapshots of the entire system with little performance impact and without making a
complete new copy of all the data for every snapshot. It did not use deduplication to find and eliminate redundant data, but at least it
didn’t duplicate unchanged data just to create a new logical view of the same data in a new snapshot. Even this was a significant advance.
By not requiring a complete copy for each snapshot, it became much cheaper to store snapshots. And, by not moving the old version of
data preserved in a snapshot just to store a new version of data, there was no performance impact to creating snapshots. The efficiency
of their approach meant there was no reason not to create snapshots and keep a bunch around. Users responded by routinely creating
several snapshots a day and often preserving some snapshots for several days. The relative abundance of such snapshots revolutionized
data protection. Users could browse snapshots and restore earlier versions of files to recover from mistakes immediately all by themselves.
Instead of being shut down to run backups, databases could be quiesced briefly to create a consistent snapshot and then freed to run
again while backups proceeded in the background. Such consistent snapshots could also serve as starting points for database recovery in
the event of a database corruption, thereby avoiding the need to go to tape for recovery.


Building a Better Snapshot                                               data sets are vast, putting a premium on scalability. Ideally, a
Despite the many benefits and commercial success of space and            deduplication technology would be able to find and eliminate
performance efficient snapshots, it was over a decade before the         redundant data no matter which application writes it, even if
first competitors built comparable technology. Most competitors’         different applications have written the same data. Deduplication
snapshots still do not compare. Many added a snapshot feature            should be effective and scalable so that it can find a small segment
that delivers space efficiency but few matched the simplicity,           of matching data no matter where it is stored in a large system.
performance, scalability and ultimately the utility of NetApp’s          Deduplication also needs to be efficient or the memory and
design. Why?                                                             computing overhead of finding duplicates in a large system could
                                                                         negate any benefit. Finally, deduplication needs to be simple and
The answer is that creating and managing the multiple virtual
                                                                         automatic or the management burden of scheduling, tuning, or
views of a file system captured in snapshots is challenging. By far
                                                                         generally managing the deduplication process could again negate
the easiest and most widely adopted approach is to read the old
                                                                         any benefits of the data reduction. All of these requirements make
version of data and copy it to a safe place before writing new
                                                                         it very challenging to build a storage system that actually delivers
data. This is known as copy-on-write and works especially well for
                                                                         the full potential of deduplication.
block storage systems. Unfortunately, the copy operation imposes
a severe performance penalty because of the extra read and write
operations it requires. But, doing something more sophisticated                   When deduplication is done well, people
and writing new data to a new location without the extra read                      can create, share, access, protect, and
and write requires all new kinds of data structures and completely                 manage their data in new and easier
changes the model for how a storage system works. Further,                           ways and IT administrators aren’t
block-level snapshots by themselves do not provide access to the                           struggling to keep up.
file version captured in the snapshots; you need a file system that
integrates those versions into the name space.
                                                                         Early efforts at deduplication did not attempt a comprehensive
Faced with the choice of completely rebuilding their storage
                                                                         solution. One example is email systems that store only one copy
system, or simply bolting on a copy-on-write snapshot, most
                                                                         of an attachment or a message even when it is sent to many
vendors choose the easy path that will get a check-box snapshot
                                                                         recipients. Such email blasts are a common way for people to share
feature to market soonest. Very few are willing to invest the years
                                                                         their work and early email implementations created a separate copy
and dollars to start over for what they view as just a single feature.
                                                                         of the email and attachment for every recipient. The first innovation
By not investing, their snapshots were doomed to be second rate.
                                                                         is to store just a single copy on the server and maintain multiple
They don’t cross the hurdle to that makes snapshots easy, cheap
                                                                         references to that. More sophisticated systems might detect when
and most useful. They don’t deliver on the snapshot revolution.
                                                                         the same attachment is forwarded to further recipients and create
                                                                         additional references to the same data instead of storing a new
Deduplication as a Point Solution                                        copy. Nevertheless, when users download attachments to their
On its surface, deduplication is a simple concept: find and              desktops, the corporate environment still ends up with many copies
eliminate redundant copies of data. But computing systems are            to store, manage and protect.
complex, supporting many different kinds of applications. And

                                                                                                                   DEDUPE-CENTRIC STORAGE        3
Local Compression
           Creating and managing the multiple                             Deduplication across all the data stored in a system is necessary, but
         virtual views of a file system captured in                       it should be complemented with local compression which typi-
                  snapshots is challenging.                               cally reduces data size by another factor of 2. This may not be as
                                                                          significant a factor as deduplication itself, but no system that takes
                                                                          data reduction seriously can afford to ignore it.

At the end of the day, an email system that eliminates duplicate
email attachments is a point solution. It helps that one application,     Format Agnostic
but it doesn’t keep many additional duplicates, even of the very          Data comes in many formats generated by many different applica-
same attachments from proliferating through the environment.              tions. But, embedded in those different formats is often the same
                                                                          duplicate data. A document may appear as an individual file gener-
The power of an efficient snapshot mechanism in a storage system          ated by a word processor, or in an email saved in a folder by an
is its generality. Databases could all build their own snapshot           email reader, or in the database of the email server, or embedded
mechanism, but when the storage provides them they don’t have             in a backup image, or squirreled away by an email archive applica-
to. They and any other application can benefit from snapshots for         tion. A deduplication system that relies on parsing the formats
efficient, and independent, data protection. NetApp was dedicated         generated by all these different applications can never be a general
to building efficient snapshots so all the applications didn’t have to.   purpose storage platform. There are too many such formats and
                                                                          they change too quickly for a storage vendor to support them all.
Dedupe as a Storage Fundamental                                           Even if they could, such an approach would end up handcuffing
                                                                          application writers trying to innovate on top of the platform. Until
Data Domain is dedicated to building deduplication storage
                                                                          their new format is supported, they’d gain no benefit from the
systems with a deduplication engine powerful enough to become
                                                                          platform. They would be better off sticking with the same old
a platform that general applications can leverage. That engine
                                                                          formats and not creating anything new. Storage platforms should
can find duplicates in data written by many different applications
                                                                          unleash creativity, not squelch it. Thus, the deduplication engine
at every stage of the lifecycle of the file, can scale to store many
                                                                          must be data agnostic and find and eliminate duplicates in data no
hundreds of terabytes of data and find duplicates wherever they
                                                                          matter how it is packaged and stored to the system.
exist, can reduce data by a factor of 20 or more, can do so at
high speed without huge memory or disk resources, and does so
automatically while taking snapshots and so requires a minimum of
administrator attention.                                                                Vast requirements make it very
Such an engine is only possible with an unconventional system                        challenging to build a storage system
architecture that breaks with traditional thinking. Here are some                  that actually delivers the full potential of
important features needed to build such an engine.                                               deduplication.

Variable Length Duplicates
Not all data changes are exactly 4KB in size and 4KB aligned.
Sometimes people just replace one word with a longer word. Such           Multi-Protocol
a simple small replacement shifts all the rest of the data by some        There are many standard protocols in use in storage systems today
small amount. To a deduplication engine built on fixed size blocks,       from NFS and CIFS to blocks and VTL. For maximum flexibility,
the entire rest of the file would seem to be new unique data even         storage systems should support all these protocols since different
though it really contains lots of duplicates. Further, that file may      protocols are needed for different applications at the same time.
exist elsewhere in the system, say in an email folder, at a random        User home directories may be in NAS. The exchange server may
offset in a larger file. Again, to an engine organized around fixed       need to run on blocks. And backups may prefer VTL. Over the
blocks, the data would all seem to be new. Conventional storage           course of its lifecycle, the same data may be stored with all of these
systems, whether NAS or SAN, store fixed sized blocks. Such a             protocols. A presentation that starts in a home directory may be
system which attempts to bolt on deduplication as an afterthought         emailed to a colleague and stored in blocks by the email server and
will only be able to look for identical fixed size blocks. Clearly,       then archived in NAS by an email archive application and backed
such an approach will never be as effective, comprehensive, and           up from the home directory, the email server, and the archive ap-
general purpose as a system that can handle small replacements            plication to VTL. Deduplication should be able to find and eliminate
and recognize and eliminate duplicate data no matter where in a           redundant data no matter how it is stored.
file it appears.




4   DEDUPE-CENTRIC STORAGE
CPU-Centric vs. Disk-Intensive Algorithm                                 Deduplicated Snapshots
Over the last two decades, CPU performance has increased                 Snapshots are very helpful for capturing different versions of
2,000,000 times1. In that time, disk performance has only increased      data and deduplication can store all those versions much more
11 times1. Today, CPU performance is taking another leap with            compactly. Both are fundamental to simplifying data management.
every doubling of the number of cores in a chip. Clearly, algorithms     Yet many systems that are cobbled together as a set of features
developed today for deduplication should leverage the growth in          either can’t create snapshots efficiently, or they can’t deduplicate
CPU performance instead of being tied to disk performance. Some          snapshots, so users need to be careful when they create snapshots
                                                                         so as not to lock duplicates in. Such restrictions and limitations
                                                                         complicate data management not simplify it.
               Deduplication has leverage across the
                storage infrastructure for reducing                      Conclusion
              data, improving data protection and, in                    Deduplication has leverage across the storage infrastructure for
              general, simplifying data management.                      reducing data, improving data protection and, in general, simplify-
                                                                         ing data management. But, deduplication is hard to implement
                                                                         in a way that runs fast with low overhead across a full range of
                                                                         protocols and application environments. Storage system vendors
systems rely on a disk access to find every piece of duplicate data.
                                                                         who treat deduplication merely as a feature will check off a box on
In some systems, the disk access is to lookup a segment fingerprint.
                                                                         a feature list, but are likely to fall short of delivering the benefits
Other systems can’t find duplicate data except by reading the old
                                                                         deduplication promises.
data to compare it to the new. In either case, the rate at which
duplicate data can be found and eliminated is bounded by the
speed of these disk accesses. To go faster, such systems need to
add more disks. But, the whole idea of deduplication is to reduce
storage, not grow it. The Data Domain SISL™ (Stream-Informed
Segment Layout) scaling architecture does not have to rely on
reading lots of data from disk to find duplicates and organizes
the segment fingerprints on disk in such a way that only a small
number of accesses are needed to find thousands of duplicates.
With SISL, back end disks only need to deliver a few megabytes of
data to deduplicate hundreds of megabytes of incoming data.

Deduplicated Replication
Data is protected from disaster only when a copy of it is safely at
a remote location. Replication has long been used for high-value
data, but without deduplication, replication is too expensive for the
other 90% of the data. Deduplication should happen immediately
inline and its benefits applied to replication in real time, so the
lag till data is safely off site is as small as possible. Only systems
designed for deduplication can run fast enough to deduplicate and
replicate data right away. Systems which have bolted on deduplica-
tion as merely a feature at best impose unnecessary delays in
replication and at worst don’t deliver the benefits of deduplicated
replication at all.




                                                                                                                  www.datadomain.com
1	 Seagate Technology Paper, Economies of Capacity and Speed, May 2004
Data Domain
2421 Mission College Blvd.
Santa Clara, CA 95054
866-WE-DDUPE; 408-980-4800
sales@datadomain.com
22 international offices: datadomain.com/company/contacts.html

Copyright © 2008 Data Domain, Inc. All rights reserved.

Data Domain, Inc. believes information in this publication is accurate as of its publication date. This publication could include technical inaccuracies or typographical errors. The information is subject to change without notice. Changes are periodically
added to the information herein; these changes will be incorporated in new additions of the publication. Data Domain, Inc. may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time.
Reproduction of this publication without prior written permission is forbidden.

The information in this publication is provided “as is.” Data Domain, Inc. makes no representations or warranties of any kind, with respect to the information in this publication, and specifically disclaims implied warranties of
merchantability or fitness for a particular purpose.

Data Domain and Global Compression are trademarks or registered trademarks of Data Domain, Inc. All other brands, products, service names, trademarks, or registered service marks are used to identify the products or
services of their respective owners. WP-DCS-0508


      DEDUPLICATION STORAGE                                                                                                                                                                                    www.datadomain.com

Mais conteúdo relacionado

Mais procurados

Hosted Virtualization
Hosted VirtualizationHosted Virtualization
Hosted Virtualizationjayallen77
 
New Features For Your Software Defined Storage
New Features For Your Software Defined StorageNew Features For Your Software Defined Storage
New Features For Your Software Defined StorageDataCore Software
 
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...Principled Technologies
 
A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...
A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...
A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...IJSRED
 
Make room for more virtual desktops with fast storage
Make room for more virtual desktops with fast storageMake room for more virtual desktops with fast storage
Make room for more virtual desktops with fast storagePrincipled Technologies
 

Mais procurados (8)

Hosted Virtualization
Hosted VirtualizationHosted Virtualization
Hosted Virtualization
 
Implementing dr w. hyper v clustering
Implementing dr w. hyper v clusteringImplementing dr w. hyper v clustering
Implementing dr w. hyper v clustering
 
University of Siegen
University of Siegen University of Siegen
University of Siegen
 
New Features For Your Software Defined Storage
New Features For Your Software Defined StorageNew Features For Your Software Defined Storage
New Features For Your Software Defined Storage
 
Using VMTurbo to boost performance
Using VMTurbo to boost performanceUsing VMTurbo to boost performance
Using VMTurbo to boost performance
 
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
Improve deep learning inference  performance with Microsoft Azure Esv4 VMs wi...
 
A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...
A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...
A Study Of Disaggregated Memory Management Techniques With Hypervisor Based T...
 
Make room for more virtual desktops with fast storage
Make room for more virtual desktops with fast storageMake room for more virtual desktops with fast storage
Make room for more virtual desktops with fast storage
 

Destaque

DIGITALLY MODIFIED CHILDREN - THE NEW CHALLENGE OF THE HUMANITY
DIGITALLY MODIFIED CHILDREN   - THE NEW CHALLENGE OF THE HUMANITY  DIGITALLY MODIFIED CHILDREN   - THE NEW CHALLENGE OF THE HUMANITY
DIGITALLY MODIFIED CHILDREN - THE NEW CHALLENGE OF THE HUMANITY Dr. Raju M. Mathew
 
Media homework andrew goodwin
Media homework   andrew goodwinMedia homework   andrew goodwin
Media homework andrew goodwinloousmith
 
Ict policy for networked society
Ict policy for networked societyIct policy for networked society
Ict policy for networked societyRene Summer
 
Tues factors of production
Tues factors of productionTues factors of production
Tues factors of productionTravis Klein
 
Texas s ta r chart
Texas s ta r chartTexas s ta r chart
Texas s ta r chartbumbada
 
профорієнтація і профільне навч.
профорієнтація і профільне навч.профорієнтація і профільне навч.
профорієнтація і профільне навч.Татьяна Глинская
 
Reinforcedslab 100917010457-phpapp02
Reinforcedslab 100917010457-phpapp02Reinforcedslab 100917010457-phpapp02
Reinforcedslab 100917010457-phpapp02Laquiao Ganagan
 
Ελληνικές Επιχειρήσεις και Οικονομική Κρίση
Ελληνικές Επιχειρήσεις και Οικονομική ΚρίσηΕλληνικές Επιχειρήσεις και Οικονομική Κρίση
Ελληνικές Επιχειρήσεις και Οικονομική Κρίσηchaniadevs
 
RSA Incident Response Threat Emerging Threat Profile: Shell_Crew
 RSA Incident Response Threat Emerging Threat Profile: Shell_Crew RSA Incident Response Threat Emerging Threat Profile: Shell_Crew
RSA Incident Response Threat Emerging Threat Profile: Shell_CrewEMC
 
Spending multipliers
Spending multipliersSpending multipliers
Spending multipliersTravis Klein
 
Africa after imperialism
Africa after imperialismAfrica after imperialism
Africa after imperialismTravis Klein
 
Brown 2011 leadership at the edge_leading change with postconventional consci...
Brown 2011 leadership at the edge_leading change with postconventional consci...Brown 2011 leadership at the edge_leading change with postconventional consci...
Brown 2011 leadership at the edge_leading change with postconventional consci...Instituto Integral Brasil
 
Global Spec Overview
Global Spec OverviewGlobal Spec Overview
Global Spec Overviewkbnueml2
 
Atomic theory
Atomic theoryAtomic theory
Atomic theoryFLI
 
RSA Monthly Online Fraud Report - June 2013
RSA Monthly Online Fraud Report - June 2013RSA Monthly Online Fraud Report - June 2013
RSA Monthly Online Fraud Report - June 2013EMC
 

Destaque (20)

Social Media
Social MediaSocial Media
Social Media
 
DIGITALLY MODIFIED CHILDREN - THE NEW CHALLENGE OF THE HUMANITY
DIGITALLY MODIFIED CHILDREN   - THE NEW CHALLENGE OF THE HUMANITY  DIGITALLY MODIFIED CHILDREN   - THE NEW CHALLENGE OF THE HUMANITY
DIGITALLY MODIFIED CHILDREN - THE NEW CHALLENGE OF THE HUMANITY
 
Media homework andrew goodwin
Media homework   andrew goodwinMedia homework   andrew goodwin
Media homework andrew goodwin
 
Ict policy for networked society
Ict policy for networked societyIct policy for networked society
Ict policy for networked society
 
Presentation1linx
Presentation1linxPresentation1linx
Presentation1linx
 
1 3words
1 3words1 3words
1 3words
 
Tues factors of production
Tues factors of productionTues factors of production
Tues factors of production
 
Pat5
Pat5Pat5
Pat5
 
Texas s ta r chart
Texas s ta r chartTexas s ta r chart
Texas s ta r chart
 
профорієнтація і профільне навч.
профорієнтація і профільне навч.профорієнтація і профільне навч.
профорієнтація і профільне навч.
 
Reinforcedslab 100917010457-phpapp02
Reinforcedslab 100917010457-phpapp02Reinforcedslab 100917010457-phpapp02
Reinforcedslab 100917010457-phpapp02
 
Ελληνικές Επιχειρήσεις και Οικονομική Κρίση
Ελληνικές Επιχειρήσεις και Οικονομική ΚρίσηΕλληνικές Επιχειρήσεις και Οικονομική Κρίση
Ελληνικές Επιχειρήσεις και Οικονομική Κρίση
 
Windows Server 2012 Disk Dedupe
Windows Server 2012 Disk DedupeWindows Server 2012 Disk Dedupe
Windows Server 2012 Disk Dedupe
 
RSA Incident Response Threat Emerging Threat Profile: Shell_Crew
 RSA Incident Response Threat Emerging Threat Profile: Shell_Crew RSA Incident Response Threat Emerging Threat Profile: Shell_Crew
RSA Incident Response Threat Emerging Threat Profile: Shell_Crew
 
Spending multipliers
Spending multipliersSpending multipliers
Spending multipliers
 
Africa after imperialism
Africa after imperialismAfrica after imperialism
Africa after imperialism
 
Brown 2011 leadership at the edge_leading change with postconventional consci...
Brown 2011 leadership at the edge_leading change with postconventional consci...Brown 2011 leadership at the edge_leading change with postconventional consci...
Brown 2011 leadership at the edge_leading change with postconventional consci...
 
Global Spec Overview
Global Spec OverviewGlobal Spec Overview
Global Spec Overview
 
Atomic theory
Atomic theoryAtomic theory
Atomic theory
 
RSA Monthly Online Fraud Report - June 2013
RSA Monthly Online Fraud Report - June 2013RSA Monthly Online Fraud Report - June 2013
RSA Monthly Online Fraud Report - June 2013
 

Semelhante a Dedupe-Centric Storage for General Applications

Data Domain Architecture
Data Domain ArchitectureData Domain Architecture
Data Domain Architecturekoesteruk22
 
What every-programmer-should-know-about-memory
What every-programmer-should-know-about-memoryWhat every-programmer-should-know-about-memory
What every-programmer-should-know-about-memoryxan peng
 
Streamlining Backup: Enhancing Data Protection with Backup Appliances
Streamlining Backup: Enhancing Data Protection with Backup AppliancesStreamlining Backup: Enhancing Data Protection with Backup Appliances
Streamlining Backup: Enhancing Data Protection with Backup AppliancesMaryJWilliams2
 
8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutions8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutionsServium
 
EOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - PaperEOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - PaperDavid Walker
 
Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013EMC
 
What Every Programmer Should Know About Memory
What Every Programmer Should Know About MemoryWhat Every Programmer Should Know About Memory
What Every Programmer Should Know About MemoryYing wei (Joe) Chou
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualizationFranck Pachot
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage ReductionPerforce
 
Successfully Implementing a private storage cloud to help reduce total cost o...
Successfully Implementing a private storage cloud to help reduce total cost o...Successfully Implementing a private storage cloud to help reduce total cost o...
Successfully Implementing a private storage cloud to help reduce total cost o...IBM India Smarter Computing
 
Genetic Engineering - Teamworking Infrastructure For Post And DI
Genetic Engineering - Teamworking Infrastructure For Post And DIGenetic Engineering - Teamworking Infrastructure For Post And DI
Genetic Engineering - Teamworking Infrastructure For Post And DIQuantel
 
Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313EMC
 

Semelhante a Dedupe-Centric Storage for General Applications (20)

Data Domain Architecture
Data Domain ArchitectureData Domain Architecture
Data Domain Architecture
 
gfs-sosp2003
gfs-sosp2003gfs-sosp2003
gfs-sosp2003
 
gfs-sosp2003
gfs-sosp2003gfs-sosp2003
gfs-sosp2003
 
Gfs论文
Gfs论文Gfs论文
Gfs论文
 
The google file system
The google file systemThe google file system
The google file system
 
Generic RLM White Paper
Generic RLM White PaperGeneric RLM White Paper
Generic RLM White Paper
 
What every-programmer-should-know-about-memory
What every-programmer-should-know-about-memoryWhat every-programmer-should-know-about-memory
What every-programmer-should-know-about-memory
 
Streamlining Backup: Enhancing Data Protection with Backup Appliances
Streamlining Backup: Enhancing Data Protection with Backup AppliancesStreamlining Backup: Enhancing Data Protection with Backup Appliances
Streamlining Backup: Enhancing Data Protection with Backup Appliances
 
My sql
My sqlMy sql
My sql
 
8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutions8 considerations for evaluating disk based backup solutions
8 considerations for evaluating disk based backup solutions
 
EOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - PaperEOUG95 - Client Server Very Large Databases - Paper
EOUG95 - Client Server Very Large Databases - Paper
 
Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013Pivotal gem fire_wp_hardest-problems-data-management_053013
Pivotal gem fire_wp_hardest-problems-data-management_053013
 
NetApp against ransomware
NetApp against ransomwareNetApp against ransomware
NetApp against ransomware
 
What Every Programmer Should Know About Memory
What Every Programmer Should Know About MemoryWhat Every Programmer Should Know About Memory
What Every Programmer Should Know About Memory
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualization
 
[IJET-V1I6P11] Authors: A.Stenila, M. Kavitha, S.Alonshia
[IJET-V1I6P11] Authors: A.Stenila, M. Kavitha, S.Alonshia[IJET-V1I6P11] Authors: A.Stenila, M. Kavitha, S.Alonshia
[IJET-V1I6P11] Authors: A.Stenila, M. Kavitha, S.Alonshia
 
[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction[IC Manage] Workspace Acceleration & Network Storage Reduction
[IC Manage] Workspace Acceleration & Network Storage Reduction
 
Successfully Implementing a private storage cloud to help reduce total cost o...
Successfully Implementing a private storage cloud to help reduce total cost o...Successfully Implementing a private storage cloud to help reduce total cost o...
Successfully Implementing a private storage cloud to help reduce total cost o...
 
Genetic Engineering - Teamworking Infrastructure For Post And DI
Genetic Engineering - Teamworking Infrastructure For Post And DIGenetic Engineering - Teamworking Infrastructure For Post And DI
Genetic Engineering - Teamworking Infrastructure For Post And DI
 
Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313Pivotal gem fire_twp_distributed-main-memory-platform_042313
Pivotal gem fire_twp_distributed-main-memory-platform_042313
 

Mais de EMC

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDEMC
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote EMC
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOEMC
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremioEMC
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereEMC
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History EMC
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewEMC
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeEMC
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic EMC
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityEMC
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeEMC
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015EMC
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsEMC
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookEMC
 

Mais de EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBook
 

Último

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 

Último (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 

Dedupe-Centric Storage for General Applications

  • 1. White Paper Dedupe-Centric Storage Hugo Patterson, Chief Architect, Data Domain Abstract Deduplication is enabling a new generation of data storage systems. While traditional storage vendors have tried to dismiss deduplication as a feature, it is proving to be a true storage fundamental. Faced with the choice of completely rebuilding their storage system, or simply bolting on a copy-on-write snapshot, most vendors choose the easy path that will get a check-box snapshot feature to market soonest. There are a vast number of requirements that make it very challenging to build a storage system that actually delivers the full potential of deduplication. Deduplication is hard to implement in a way that runs fast with low overhead across a full range of protocols and application environments. When deduplication is done well, people can create, share, access, protect, and manage their data in new and easier ways, and IT administrators aren’t struggling to keep up. Dedupe-centric storage is engineered from the start to address these challenges. This paper gives historical perspective and a technological context to this revolution in storage management. DEDUPLICATION STORAGE
  • 2. Dedupe-Centric Storage Table of Contents Deduplication: The Post-Snapshot Revolution in Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Building a Better Snapshot . . . . . . . . . . . . . . . . . . . . . . 3 Deduplication as a Point Solution . . . . . . . . . . . . . . . 3 Dedupe as a Storage Fundamental. . . . . . . . . . . . . . . . . . 4 Variable Length Duplicates . . . . . . . . . . . . . . . . . . . . . . 4 Local Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Format Agnostic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 . Multi-Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 CPU-Centric vs. Disk-Intensive Algorithm . . . . . . . . . 5 Deduplicated Replication . . . . . . . . . . . . . . . . . . . . . . . . 5 Deduplicated Snapshots . . . . . . . . . . . . . . . . . . . . . . . . . 5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 DEDUPE-CENTRIC STORAGE
  • 3. Deduplication: The Post-Snapshot Revolution in Storage The revolution to come with deduplication will eclipse the snapshot revolution of the early 1990s. Since at least the 1980s, systems had been saving versions of individual files or snapshots of whole disk volumes or file systems. Technologies existed for creating snapshots reasonably efficiently. But, in most production systems, snapshots were expensive, slow, or limited in number and so had limited use and deployment. About 15 years ago, Network Appliance (NetApp) leveraged the existing technologies and added some of their own innovations to build a file system that could create and present multiple snapshots of the entire system with little performance impact and without making a complete new copy of all the data for every snapshot. It did not use deduplication to find and eliminate redundant data, but at least it didn’t duplicate unchanged data just to create a new logical view of the same data in a new snapshot. Even this was a significant advance. By not requiring a complete copy for each snapshot, it became much cheaper to store snapshots. And, by not moving the old version of data preserved in a snapshot just to store a new version of data, there was no performance impact to creating snapshots. The efficiency of their approach meant there was no reason not to create snapshots and keep a bunch around. Users responded by routinely creating several snapshots a day and often preserving some snapshots for several days. The relative abundance of such snapshots revolutionized data protection. Users could browse snapshots and restore earlier versions of files to recover from mistakes immediately all by themselves. Instead of being shut down to run backups, databases could be quiesced briefly to create a consistent snapshot and then freed to run again while backups proceeded in the background. Such consistent snapshots could also serve as starting points for database recovery in the event of a database corruption, thereby avoiding the need to go to tape for recovery. Building a Better Snapshot data sets are vast, putting a premium on scalability. Ideally, a Despite the many benefits and commercial success of space and deduplication technology would be able to find and eliminate performance efficient snapshots, it was over a decade before the redundant data no matter which application writes it, even if first competitors built comparable technology. Most competitors’ different applications have written the same data. Deduplication snapshots still do not compare. Many added a snapshot feature should be effective and scalable so that it can find a small segment that delivers space efficiency but few matched the simplicity, of matching data no matter where it is stored in a large system. performance, scalability and ultimately the utility of NetApp’s Deduplication also needs to be efficient or the memory and design. Why? computing overhead of finding duplicates in a large system could negate any benefit. Finally, deduplication needs to be simple and The answer is that creating and managing the multiple virtual automatic or the management burden of scheduling, tuning, or views of a file system captured in snapshots is challenging. By far generally managing the deduplication process could again negate the easiest and most widely adopted approach is to read the old any benefits of the data reduction. All of these requirements make version of data and copy it to a safe place before writing new it very challenging to build a storage system that actually delivers data. This is known as copy-on-write and works especially well for the full potential of deduplication. block storage systems. Unfortunately, the copy operation imposes a severe performance penalty because of the extra read and write operations it requires. But, doing something more sophisticated When deduplication is done well, people and writing new data to a new location without the extra read can create, share, access, protect, and and write requires all new kinds of data structures and completely manage their data in new and easier changes the model for how a storage system works. Further, ways and IT administrators aren’t block-level snapshots by themselves do not provide access to the struggling to keep up. file version captured in the snapshots; you need a file system that integrates those versions into the name space. Early efforts at deduplication did not attempt a comprehensive Faced with the choice of completely rebuilding their storage solution. One example is email systems that store only one copy system, or simply bolting on a copy-on-write snapshot, most of an attachment or a message even when it is sent to many vendors choose the easy path that will get a check-box snapshot recipients. Such email blasts are a common way for people to share feature to market soonest. Very few are willing to invest the years their work and early email implementations created a separate copy and dollars to start over for what they view as just a single feature. of the email and attachment for every recipient. The first innovation By not investing, their snapshots were doomed to be second rate. is to store just a single copy on the server and maintain multiple They don’t cross the hurdle to that makes snapshots easy, cheap references to that. More sophisticated systems might detect when and most useful. They don’t deliver on the snapshot revolution. the same attachment is forwarded to further recipients and create additional references to the same data instead of storing a new Deduplication as a Point Solution copy. Nevertheless, when users download attachments to their On its surface, deduplication is a simple concept: find and desktops, the corporate environment still ends up with many copies eliminate redundant copies of data. But computing systems are to store, manage and protect. complex, supporting many different kinds of applications. And DEDUPE-CENTRIC STORAGE 3
  • 4. Local Compression Creating and managing the multiple Deduplication across all the data stored in a system is necessary, but virtual views of a file system captured in it should be complemented with local compression which typi- snapshots is challenging. cally reduces data size by another factor of 2. This may not be as significant a factor as deduplication itself, but no system that takes data reduction seriously can afford to ignore it. At the end of the day, an email system that eliminates duplicate email attachments is a point solution. It helps that one application, Format Agnostic but it doesn’t keep many additional duplicates, even of the very Data comes in many formats generated by many different applica- same attachments from proliferating through the environment. tions. But, embedded in those different formats is often the same duplicate data. A document may appear as an individual file gener- The power of an efficient snapshot mechanism in a storage system ated by a word processor, or in an email saved in a folder by an is its generality. Databases could all build their own snapshot email reader, or in the database of the email server, or embedded mechanism, but when the storage provides them they don’t have in a backup image, or squirreled away by an email archive applica- to. They and any other application can benefit from snapshots for tion. A deduplication system that relies on parsing the formats efficient, and independent, data protection. NetApp was dedicated generated by all these different applications can never be a general to building efficient snapshots so all the applications didn’t have to. purpose storage platform. There are too many such formats and they change too quickly for a storage vendor to support them all. Dedupe as a Storage Fundamental Even if they could, such an approach would end up handcuffing application writers trying to innovate on top of the platform. Until Data Domain is dedicated to building deduplication storage their new format is supported, they’d gain no benefit from the systems with a deduplication engine powerful enough to become platform. They would be better off sticking with the same old a platform that general applications can leverage. That engine formats and not creating anything new. Storage platforms should can find duplicates in data written by many different applications unleash creativity, not squelch it. Thus, the deduplication engine at every stage of the lifecycle of the file, can scale to store many must be data agnostic and find and eliminate duplicates in data no hundreds of terabytes of data and find duplicates wherever they matter how it is packaged and stored to the system. exist, can reduce data by a factor of 20 or more, can do so at high speed without huge memory or disk resources, and does so automatically while taking snapshots and so requires a minimum of administrator attention. Vast requirements make it very Such an engine is only possible with an unconventional system challenging to build a storage system architecture that breaks with traditional thinking. Here are some that actually delivers the full potential of important features needed to build such an engine. deduplication. Variable Length Duplicates Not all data changes are exactly 4KB in size and 4KB aligned. Sometimes people just replace one word with a longer word. Such Multi-Protocol a simple small replacement shifts all the rest of the data by some There are many standard protocols in use in storage systems today small amount. To a deduplication engine built on fixed size blocks, from NFS and CIFS to blocks and VTL. For maximum flexibility, the entire rest of the file would seem to be new unique data even storage systems should support all these protocols since different though it really contains lots of duplicates. Further, that file may protocols are needed for different applications at the same time. exist elsewhere in the system, say in an email folder, at a random User home directories may be in NAS. The exchange server may offset in a larger file. Again, to an engine organized around fixed need to run on blocks. And backups may prefer VTL. Over the blocks, the data would all seem to be new. Conventional storage course of its lifecycle, the same data may be stored with all of these systems, whether NAS or SAN, store fixed sized blocks. Such a protocols. A presentation that starts in a home directory may be system which attempts to bolt on deduplication as an afterthought emailed to a colleague and stored in blocks by the email server and will only be able to look for identical fixed size blocks. Clearly, then archived in NAS by an email archive application and backed such an approach will never be as effective, comprehensive, and up from the home directory, the email server, and the archive ap- general purpose as a system that can handle small replacements plication to VTL. Deduplication should be able to find and eliminate and recognize and eliminate duplicate data no matter where in a redundant data no matter how it is stored. file it appears. 4 DEDUPE-CENTRIC STORAGE
  • 5. CPU-Centric vs. Disk-Intensive Algorithm Deduplicated Snapshots Over the last two decades, CPU performance has increased Snapshots are very helpful for capturing different versions of 2,000,000 times1. In that time, disk performance has only increased data and deduplication can store all those versions much more 11 times1. Today, CPU performance is taking another leap with compactly. Both are fundamental to simplifying data management. every doubling of the number of cores in a chip. Clearly, algorithms Yet many systems that are cobbled together as a set of features developed today for deduplication should leverage the growth in either can’t create snapshots efficiently, or they can’t deduplicate CPU performance instead of being tied to disk performance. Some snapshots, so users need to be careful when they create snapshots so as not to lock duplicates in. Such restrictions and limitations complicate data management not simplify it. Deduplication has leverage across the storage infrastructure for reducing Conclusion data, improving data protection and, in Deduplication has leverage across the storage infrastructure for general, simplifying data management. reducing data, improving data protection and, in general, simplify- ing data management. But, deduplication is hard to implement in a way that runs fast with low overhead across a full range of protocols and application environments. Storage system vendors systems rely on a disk access to find every piece of duplicate data. who treat deduplication merely as a feature will check off a box on In some systems, the disk access is to lookup a segment fingerprint. a feature list, but are likely to fall short of delivering the benefits Other systems can’t find duplicate data except by reading the old deduplication promises. data to compare it to the new. In either case, the rate at which duplicate data can be found and eliminated is bounded by the speed of these disk accesses. To go faster, such systems need to add more disks. But, the whole idea of deduplication is to reduce storage, not grow it. The Data Domain SISL™ (Stream-Informed Segment Layout) scaling architecture does not have to rely on reading lots of data from disk to find duplicates and organizes the segment fingerprints on disk in such a way that only a small number of accesses are needed to find thousands of duplicates. With SISL, back end disks only need to deliver a few megabytes of data to deduplicate hundreds of megabytes of incoming data. Deduplicated Replication Data is protected from disaster only when a copy of it is safely at a remote location. Replication has long been used for high-value data, but without deduplication, replication is too expensive for the other 90% of the data. Deduplication should happen immediately inline and its benefits applied to replication in real time, so the lag till data is safely off site is as small as possible. Only systems designed for deduplication can run fast enough to deduplicate and replicate data right away. Systems which have bolted on deduplica- tion as merely a feature at best impose unnecessary delays in replication and at worst don’t deliver the benefits of deduplicated replication at all. www.datadomain.com 1 Seagate Technology Paper, Economies of Capacity and Speed, May 2004
  • 6. Data Domain 2421 Mission College Blvd. Santa Clara, CA 95054 866-WE-DDUPE; 408-980-4800 sales@datadomain.com 22 international offices: datadomain.com/company/contacts.html Copyright © 2008 Data Domain, Inc. All rights reserved. Data Domain, Inc. believes information in this publication is accurate as of its publication date. This publication could include technical inaccuracies or typographical errors. The information is subject to change without notice. Changes are periodically added to the information herein; these changes will be incorporated in new additions of the publication. Data Domain, Inc. may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time. Reproduction of this publication without prior written permission is forbidden. The information in this publication is provided “as is.” Data Domain, Inc. makes no representations or warranties of any kind, with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Data Domain and Global Compression are trademarks or registered trademarks of Data Domain, Inc. All other brands, products, service names, trademarks, or registered service marks are used to identify the products or services of their respective owners. WP-DCS-0508 DEDUPLICATION STORAGE www.datadomain.com