1. Deduplication Best Practices
With Microsoft Windows Server 2012
and Veeam Backup & Replication 6.5
Joep Piscaer, VMware vExpert, VCDX #101
j.piscaer@virtuallifestyle.nl
@jpiscaer
2. Agenda
Introduction
Use case for deduplication
Configure Data Deduplication in Windows Server 2012
How to optimize backup repository for deduplication
How to optimize backup jobs for deduplication
How to leverage per-pool deduplication
Demo
4. Introduction
Joep Piscaer
● Consulting Architect at OGD ict-diensten
● VMware VCDX5 #101, vExpert 2009, 2011, 2012
● Has known Veeam since 2007 and been in love with them ever since
(best. VMworld. parties. ever.)
5. Past Projects
Past implementations of Veeam B&R
● Commonly see a VMware virtualization layer with Windows VMs on top
● Windows Server 2008 R2 as the base for Veeam implementation
● Starting to work with Windows Server 2012
Notable projects include
● Bi-directional DR for 200-250 VMs across two infrastructures
● 150+ VM backup and replication within a single large datacenter
● Application consistent backups of Zarafa Collaboration Platform without
bringing database down (or any other downtime)
● Numerous smaller projects for DR or backup at customer sites
6. Agenda
Introduction
Use case for deduplication
Configure Data Deduplication in Windows Server 2012
How to optimize backup repository for deduplication
How to optimize backup jobs for deduplication
How to leverage per-pool deduplication
Demo
10. Use case – why use deduplication?
41.9% of CIOs are affected by the cost of the storage required for storing backups.
11. Use case – why use deduplication?
Lack of disk space is a growing issue preventing adoption of replication.
22.6% of CIOs reported lack of disk space as an issue preventing adoption of replication in 2011; 35.5% reported the same in 2013.
12. Use case – why use deduplication?
Conclusions of the Virtualization Data Protection Report 2013
● Decrease the cost of storage
● Increase usage of existing storage
Make backups as storage-friendly as possible
To keep management cost down, don't introduce new or complex solutions
● Use 'what we already know' and 'what we already have' to solve these challenges
● Prevent re-training of administrators
● Increase the ease of use of the solution
13. Use case – why use deduplication?
Why deduplication instead of feature x or technology y?
● Veeam captures the entire VM, including Guest OS, applications and data
● VMs usually share Guest OS type, middleware, etc.
● Hence: there tends to be a lot of identical data across VMs
● Deduplication slashes identical data
But mix and match solutions if available
● Don't forget compression
● Exclude unneeded data (VM and Guest OS swap files, etc.)
14. Use case – why use deduplication?
Backup files can be separated into two categories:
● Recent backups for fast restore and recoverability
● Older backups for archival purposes
With Veeam, these two types are 'connected'
● The backup file chain contains 'recent' and 'old' backups
● Nearly impossible to separate on the storage layer
15. Why use Windows Server 2012 deduplication?
The killer use case for using Data Deduplication in
Windows Server 2012 is longer-term (>60 day) retention
or archival of VMs on the same on-site storage platform.
Other use cases include off-site replication
and increased performance of forward incremental jobs
For 30-60 day retention, consider
reverse incremental backup mode
which offers similar storage efficiency.
16. Agenda
Introduction
Use case for deduplication
Configure Data Deduplication in Windows Server 2012
How to optimize backup repository for deduplication
How to optimize backup jobs for deduplication
How to leverage per-pool deduplication
Demo
17. Planning for disk-based deduplication
Determine the amount of data being backed up
Determine the data retention policy for each dataset
● Determine if and when to use an archival system
Determine the biggest full uncompressed backup file size
● Windows Server 2012 Data Deduplication is post-process (not in-line)
● So you need at least this amount of free storage space to complete the job
Determine the daily change rate
● Determine whether Data Deduplication can handle that amount of data
● Microsoft recommends designing for ~100 GB/hr of deduplication data processing
18. Planning for disk-based deduplication
Estimate dedupe space savings on an existing dataset
● Use the Data Deduplication Savings Evaluation Tool (DDPEval.exe)
● Installed in C:\Windows\System32 on a 2012 host with the Deduplication role
But it can be copied to any system running Windows Server 2012, Windows Server 2008 R2, or Windows 7
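As a quick sketch of running the evaluation (the tool location and target path below are illustrative, not from the deck):

```powershell
# DDPEval.exe estimates Data Deduplication savings for an existing folder or volume.
# C:\Tools is a hypothetical location where the tool was copied;
# E:\Backups is a hypothetical existing backup dataset.
C:\Tools\DDPEval.exe E:\Backups
```

The tool walks the dataset and reports the projected space savings without changing any data.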
19. Planning for disk-based deduplication
Estimated sizing:
● Applicable to environments with less than 1.5 TB of daily data ingestion (10.5 TB weekly)
Limits
● Large full backups (> 1.5 TB) require a lengthy dedupe process: at the recommended ~100 GB/hr, a 1.5 TB full takes roughly 15 hours to process
● Consider running active fulls monthly to split the ingestion of large amounts of data
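The configuration itself (outlined in the speaker notes: add the role, then enable deduplication on an NTFS repository volume) can be sketched in PowerShell. E: is a hypothetical repository volume; note that Data Deduplication works only on NTFS, not on ReFS:

```powershell
# Install the Data Deduplication role service on Windows Server 2012.
Import-Module ServerManager
Add-WindowsFeature -Name FS-Data-Deduplication

# Enable deduplication on the backup repository volume (NTFS only).
Import-Module Deduplication
Enable-DedupVolume E:
```

After enabling, the post-process optimization jobs run on the default schedule until you adjust it.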
21. Agenda
Introduction
Use case for deduplication
Configure Data Deduplication in Windows Server 2012
How to optimize backup repository for deduplication
How to optimize backup jobs for deduplication
How to leverage per-pool deduplication
Demo
22. Tying it together with Veeam
Veeam Backup & Replication 6.5 includes full support for
Microsoft Windows Server 2012, including ReFS volumes
and global data deduplication
But support for Storage Spaces is experimental.
To restore files from deduplicated volumes, the backup server
must be installed on Windows Server 2012 with the Data
Deduplication feature enabled in the OS settings.
23. Backup Repository settings
Deduplicating storage compatibility
● Align backup file data blocks
● Decompress backup data before storing
24. Agenda
Introduction
Use case for deduplication
Configure Data Deduplication in Windows Server 2012
How to optimize backup repository for deduplication
How to optimize backup jobs for deduplication
How to leverage per-pool deduplication
Demo
25. Backup Job settings
Backup Mode
● Incremental
● Use 'dedupe-friendly' compression
● Set storage optimization to 'local target'
In my experience, enabling
Veeam dedupe and compression
significantly speeds up job
processing and does not interfere
with Windows deduplication
26. Backup Job settings
Why forward incremental?
● Less ingestion of new data on the deduplicating repository
● Reversed incremental requires 3x the IOPS
A deduplicating repository is usually I/O-bound
27. Backup Job settings
Synthetic or periodic active full?
There is no difference after dedupe, but a synthetic full prevents the
'full' job from hitting the source. Synthetic fulls require data
re-hydration while the last full is read from disk to build the new
synthetic full.
Try synthetic fulls first and monitor performance (the time needed to
build a new synthetic full).
A synthetic full only takes up space before the deduplication process
kicks in; after dedupe, the synthetic full is deduped against the
previous synthetic full.
You want the previous synthetic full saved in deduped state the
moment you create a new synthetic full, so make sure to keep
enough restore points on disk.
28. Backup Job settings
Why enable inline data deduplication?
● Source-based deduplication
− Takes place at the backup source instead of the target
● Decreases the amount of data sent over the network
− The amount of network traffic is a consideration, too
● Speeds up job processing significantly
● Uses a large block size, so it doesn't interfere with Data Deduplication
● If we disable inline data deduplication:
"The amount of data on disk remained the same, but the amount
transferred across the wire is 40x more."
29. Backup Job settings
Why enable compression?
● Source-based compression
● Decreases the amount of data sent over the network
● Speeds up job processing significantly
● Dedupe-friendly compression uses a fixed dictionary and doesn't interfere
with Data Deduplication
● Dedupe-friendly compression saves about 10-20% on the initial VBK/VIB
file size. About the same (20-30%) is lost at the Data Deduplication stage.
Significantly faster restores and slightly faster instant recovery.
● "Effectively you are trading some hard disk space overall (because of less
dedupe) for some up front network and disk bandwidth savings."
30. Agenda
Introduction
Use case for deduplication
Configure Data Deduplication in Windows Server 2012
How to optimize backup repository for deduplication
How to optimize backup jobs for deduplication
How to leverage per-pool deduplication
Demo
31. Global Data Deduplication
Deduplicate data across backup files produced by different
backup jobs
● Veeam uses per-job deduplication, but space savings isn't the only
consideration when choosing which VMs go into a job.
● Microsoft deduplicates on a per-volume basis
Multiple backup files / jobs stored on a single volume
One less variable to consider when planning jobs
● Makes 'more small jobs' a viable solution
● Focus on functional separation of VMs in jobs
− Separate per (multi-tier) application (vCenter Inventory VM Folders)
− Replicate off-site using Cloud Edition
32. Agenda
Introduction
Use case for deduplication
Configure Data Deduplication in Windows Server 2012
How to optimize backup repository for deduplication
How to optimize backup jobs for deduplication
How to leverage per-pool deduplication
Demo
34. Demo – Dedupe: Jobs vs. Volumes
3 VMs in 3 separate backup jobs
● 24.2 GB before dedupe, 13.3 GB after dedupe
Copy a large file (~8 GB) to all three VMs
Run the backup jobs (regular incremental run)
● 45.9 GB before dedupe, 19.5 GB after dedupe
Run the deduplication process on the repository
● Run dedup with 'Start-DedupJob -Type Optimization -Volume E:'
● Check the status of the running dedupe job with 'Get-DedupJob'
● Check space savings after the dedupe job with 'Get-DedupStatus'
The source VMs occupy 65 GB on the source datastore
● That's a 3.33:1 deduplication rate!
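The repository-side steps above can be sketched as one PowerShell sequence (E: is the repository volume from the demo; the property list on the last line is a suggestion, not from the deck):

```powershell
# Kick off a post-process optimization run on the repository volume.
Start-DedupJob -Type Optimization -Volume E:

# Poll the running dedupe job until it reports completion.
Get-DedupJob

# Inspect savings once the job has finished.
Get-DedupStatus -Volume E: | Format-List FreeSpace, SavedSpace, OptimizedFilesCount
```

`Get-DedupStatus` is the quickest way to confirm the before/after numbers shown in the demo.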
35. Demo – Instant VM Restore
Re-hydration of deduplicated backup files takes time, and
performance will suffer.
The most recent backups are typically not deduplicated yet
because Data Deduplication is post-process, so instant
recovery using vPower NFS won't suffer in performance
Don't set the Data Deduplication schedule to dedupe the (most)
recent backup files, as these are used for restores in 99%
of the cases. Set data aging to > 1-2 days.
Weigh the benefits of disk savings against increased RTO.
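A minimal sketch of that aging policy, assuming E: is the repository volume (the schedule name, window, and days below are illustrative):

```powershell
# Only files untouched for 2+ days become deduplication candidates,
# so the most recent restore points stay hydrated for fast restores.
Set-DedupVolume E: -MinimumFileAgeDays 2

# Run optimization during the day, when the backup server is idle
# and the backup window is not competing for I/O.
New-DedupSchedule -Name "DaytimeOptimization" -Type Optimization -Start 09:00 -DurationHours 8 -Days Monday,Tuesday,Wednesday,Thursday,Friday
```

This matches the advice in the speaker notes: the default dedup schedule and the backup window normally conflict, so shift optimization to the backup server's idle hours.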
http://forums.veeam.com/viewtopic.php?f=2&t=14910
"Note that I'm not saying this isn't cool; improving the storage efficiency for forward incremental using Windows 2012 dedupe is awesome, especially for the use cases where it makes sense (retention longer than 60 days, staging to tape or offsite, faster incremental performance, etc.), but I'm just pointing out that reverse incremental (a feature of Veeam since V1) offers similar storage efficiency for retention periods of 30-60 days and environments of this size. I'd love it if you would continue to share your results as your retention builds, as I'm continuing to collect both lab and real-world results."
http://forums.veeam.com/viewtopic.php?f=2&t=14002
Determine if and when to use an archival system, since online disk-based systems are great for shorter-term retention but might not be suitable for archival and compliance purposes.
01 Add Role. 02 Create an NTFS volume with deduplication enabled. Note: this works only on NTFS, not on ReFS. Explain why 'files older than x': restoring files that old is unlikely. Explain: the default dedup schedule and the backup window normally conflict, so adjust the schedule; the backup server does little during the day, so run dedup then. 03 & 04: simple dedup example without Veeam.
http://forums.veeam.com/viewtopic.php?f=2&t=15766
"About the repository configuration: is it the same machine as the proxy role? In that case you do not need to enable those options; they are used to minimize the data travelling via the network from proxy to repository, and nonetheless let the repository expand the data again before writing it to storage. This is pretty useful to save bandwidth while shipping data to a dedupe appliance with a Linux/Windows 'head' in front of it. If it's the same server, there is really no need to compress and decompress data inside the same VM."
http://www.veeam.com/blog/how-to-get-unbelievable-deduplication-results-with-windows-server-2012-and-veeam-backup-replication.html
"The Align backup file data blocks setting is recommended for dedupe appliances; however, Windows Server 2012 dedupe is not an appliance. It is volume based and uses software to break the data into chunks and store them in a chunk store. In my experience I have not seen a benefit from using this setting."
The previous chain will only be removed by retention when the latest incremental backup in that chain is no longer needed for restore.
http://jpaul.me/?p=1729
http://forums.veeam.com/viewtopic.php?f=2&t=14002&start=15#p73149
"I would expect it to be very similar to other dedupe appliances. Typically 'dedupe friendly' compression provides only a 10-20% reduction in the initial size of the VBK and VIB files, while costing roughly that same amount in dedupe savings, perhaps slightly more. Saving 10-20% may not sound like much; however, for customers backing up 10's or 100's of TB, this can be a significant savings in network bandwidth, and it also generally makes for faster restores, and sometimes even slightly faster instant recovery, since 10-20% less data must be read from the backup repository, so it can be a reasonable compromise. Effectively you are trading some hard disk space overall (because of less dedupe) for some up front network and disk bandwidth savings. If you're happy with your current performance and want to maximize dedupe, I would leave compression disabled."
http://forums.veeam.com/viewtopic.php?f=2&t=8916&start=30#p61601
"To be completely fair, I took significant artistic license with this example; it's not 100% technically accurate, but the goal was to outline the concept and show why Veeam dedupe does not interfere with the dedupe on the appliance, although it will slightly decrease the reported dedupe ratio from the appliance perspective, since we obviously reduce the amount of data sent in the first place. In the example above, the dedupe appliance would likely report a 6:1 dedupe ratio if Veeam dedupe was disabled, but a 4:1 if Veeam dedupe was enabled, because we eliminated the data before it got to the appliance. The final amount of data on storage would be exactly the same. If you really want to use compression when writing to a dedupe appliance, using 'Low' compression is probably the best bet. This compression uses a fixed dictionary and is somewhat predictable. It will still lower dedupe, in testing by about 20-30% or so, but it will provide some reduction in the data going to the dedupe appliance, which can make backups faster."
http://forums.veeam.com/viewtopic.php?f=2&t=15166
"Typical compression using the 'dedupe-friendly' method can be 10-20%, significantly less than the 50-75% (or more) compression available from the other algorithms; however, this will have some negative impact on dedupe, normally reducing its effectiveness by a similar ratio."
Emphasize that dedupe at the storage layer has a big impact on how you 'organize' the jobs, making the jobs much more flexible and easier to deploy and change. Imagine moving a VM to a different job: without dedup that costs you storage (because of retention in the old job); with dedup it costs you no storage.