1. Vmbkp: An Online Backup Tool
for VMware vSphere
Oct 15, 2010
HOSHINO Takashi
Cybozu Labs, Inc.
2. What is Vmbkp?
• Backup software for Virtual Machines
in VMware vSphere environment
– Online full/differential/incremental backup
– Multi-generation backup management
– Efficient archive access with sequential IO and reverse diff.
– Command-line I/F for scheduling by Cron
3. Supported platform
• VMware vSphere 4
– vCenter server managing several ESX(i)s
– Single ESX(i) (not tested)
– Free ESXi is not supported (snapshot fails)
• Backup server
– Linux on x86_64 host.
– CentOS 5.5 64bit is confirmed
4. Hardware Architecture
Figure (components and connections):
• VMware vSphere: vCenter Server and two VMware ESX(i) hosts
• Vmbkp server
– Connected to vSphere via LAN: Control/GetInfo with the vSphere SOAP protocol
– Connected to storage via SAN: data transfer via SAN with the VDDK protocol
• Storage: one backup storage and three VM storages
• You can use NBD transfer without SAN.
5. Commands
• Update:
– Get and save information of all available VMs
• Backup:
– Execute backup of the specified vm/group or all
• Restore:
– Execute restore of the specified archived generation as a new VM
• Check:
– Check that backup archives are valid
• Status:
– Show status of backup archives
6. Commands –cont.
• Destroy:
– Remove a virtual machine from vSphere environment
• Clean:
– Delete archives of virtual machines
• List:
– Get a list of virtual machines satisfying specified conditions
• Help:
– Show usage
7. Workflow
Backup:
• Prepare config
• (Register to cron)
• Read config/profiles
• Get vSphere information
• Backup target VMs
– Export ovf (without disks)
– Create snapshot
– (Get changed block info)
– Backup vmdk files
– Delete snapshot
– (Delete previous dump)
• Update profiles
Restore:
• Prepare config
• Read config/profiles
• Restore target VMs
– Import ovf
– Add disks to new VM
– Restore vmdk files
Legend in the figure: each step is marked as either a user task or a Vmbkp task.
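The per-VM part of the backup workflow can be sketched as follows. This is only an illustration of the step order, not Vmbkp's actual code: `FakeVSphere`, `backup_vm`, and `dump_disks` are hypothetical names, and deleting the snapshot in a `finally` clause is a design assumption of the sketch.

```python
class FakeVSphere:
    """Hypothetical stand-in for the vSphere SOAP wrapper; it only records calls."""

    def __init__(self):
        self.calls = []

    def export_ovf(self, vm):
        self.calls.append(("export_ovf", vm))

    def create_snapshot(self, vm):
        self.calls.append(("create_snapshot", vm))
        return f"snap-{vm}"

    def changed_blocks(self, vm, snapshot):
        self.calls.append(("changed_blocks", vm))
        return set()

    def delete_snapshot(self, vm, snapshot):
        self.calls.append(("delete_snapshot", vm))


def backup_vm(vsphere, vm, dump_disks, incremental=False):
    """Run the per-VM backup steps in the order listed in the workflow."""
    vsphere.export_ovf(vm)                      # Export ovf (without disks)
    snapshot = vsphere.create_snapshot(vm)      # Create snapshot
    try:
        bitmap = None
        if incremental:                         # (Get changed block info)
            bitmap = vsphere.changed_blocks(vm, snapshot)
        dump_disks(vm, snapshot, bitmap)        # Backup vmdk files
    finally:
        # Delete the snapshot even if the dump fails, so that no stale
        # snapshot is left on the VM (an assumption of this sketch).
        vsphere.delete_snapshot(vm, snapshot)
```

Here `dump_disks` is a callback that would invoke the vmdk backup for each disk of the VM.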
8. Configuration files
• Global (required)
– Global configuration
• Backup directory
• Number of generations to keep
• Vmdkbkp path to backup/restore vmdk files
• vSphere authentication information
• Group (optional)
– Group configuration for convenient use
9. Layout of Archive Files
• <backup dir>
– AllVM profile
• <backup dir>/<vm>/
– VM profile
• <backup dir>/<vm>/<generation>/
– Generation profile
– Ovf file for VM configuration
– Dump/digest/rdiff/bmp files for each vmdk
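Given this layout, enforcing the "number of generations to keep" setting amounts to listing the generation directories of a VM and selecting the oldest ones. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes generation directories are named so that lexical order equals chronological order, which the slides do not state.

```python
from pathlib import Path


def generations_to_delete(backup_dir, vm, keep):
    """Return the generation directories of `vm` beyond the newest `keep`."""
    vm_dir = Path(backup_dir) / vm
    # Assumption: directory names sort chronologically (e.g. zero-padded
    # integers or timestamps); the real naming scheme is not given here.
    gens = sorted(p for p in vm_dir.iterdir() if p.is_dir())
    return gens[:-keep] if keep > 0 else gens
```

Because older generations are stored as reverse diffs against newer ones, removing the oldest generations should not affect the ability to restore the newer ones.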
10. Profiles
• Allvm
– Information/status of all VMs in the target vSphere environment
– Updated by update command
• Vm
– Information/status of archives of a VM
– Created/updated by the backup command and referenced by the restore command
• Generation
– Information/status of each generation of backup of a VM
– Created by the backup command and referenced by the restore command
11. Software Architecture
Layers (figure, top to bottom):
• Callers: Cron, User
• Command-line Interface
• Backup/Restore Controller
• Utility Library
– Bitmap
– XML (Ovf)
– Config/Profile
• Soap Wrapper
– Snapshot, Ovf, Changed blocks
– Uses the VI Java Library to talk to the VMware vSphere vCenter Server
• Vmdkbkp Wrapper
– Vmdkbkp: Vmdk Backup/Restore Tool/Library (C++)
– Uses the VDDK C Library to access the VMware ESX(i) Host and SAN Storage
12. Required Tools and Libraries
• Java SE 1.6
– Java, Javac, Jar commands
• VI-Java 2.1GA
– soap wrapper
• G++ 4.4
• Boost 1.43
– shared_ptr, scoped_array, thread, and iostreams
• VDDK 1.2.0
– Virtual Disk Development Kit by VMware
15. What is VmdkBkp?
• Online backup software
for remote/local vmdk files
in VMware vSphere environments.
– Currently supports vSphere version 4.
• Written in C++
• Uses VDDK Library by VMware
• Used by Vmbkp (java) tool
16. Archive Files
• Dump/Rdiff
– VMDK metadata and blocks archive
without zero-blocks
– Dump is full archive,
Rdiff is reverse differential one
– Dump + Rdiff = Previous dump
• Digest
– MD5 digest data for all blocks of VMDK
– Used to check equality of blocks,
and validate corresponding dump/rdiff files
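The roles of dump and digest can be illustrated with a toy model. This is a sketch under stated assumptions, not the vmdkbkp file format: the image is an in-memory byte string whose length is a multiple of `BLOCK_SIZE`, the block size is tiny for readability, and all function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4            # tiny block size, for illustration only
ZERO = bytes(BLOCK_SIZE)  # an all-zero block


def split_blocks(image):
    """Split a disk image (bytes) into fixed-size blocks."""
    return [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]


def make_dump(image):
    """Full dump: map block index -> data, omitting all-zero blocks."""
    return {i: b for i, b in enumerate(split_blocks(image)) if b != ZERO}


def make_digest(image):
    """Digest: one MD5 per block, zero blocks included."""
    return [hashlib.md5(b).digest() for b in split_blocks(image)]


def check(dump, digest):
    """Validate a dump against a digest; absent blocks count as zero blocks."""
    return all(hashlib.md5(dump.get(i, ZERO)).digest() == d
               for i, d in enumerate(digest))
```

In this model, `check(make_dump(image), make_digest(image))` holds for any image, and it fails as soon as one stored block is altered.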
17. Supported Commands
• Dump
– Execute full/differential/incremental dump
• Restore
– Execute restore with dump/rdiff
• Check
– Validate dump/rdiff with digest data
• Print
– Print dump/rdiff/digest in human-readable form
• Digest
– Make digest from dump
• Merge
– Make past dump from current dump and past rdiff(s)
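The merge operation can be sketched on a simplified in-memory model. The representation is an assumption made for illustration: a dump maps block index to data and omits zero blocks, and an rdiff maps each changed block index to its older content, with `None` meaning the block used to be all-zero. The real rdiff format may encode this differently.

```python
def merge(dump, rdiffs):
    """Rebuild a past dump from the current dump and past rdiff(s).

    `rdiffs` is ordered newest first. The input dump is not modified.
    """
    past = dict(dump)
    for rdiff in rdiffs:
        for index, old in rdiff.items():
            if old is None:
                past.pop(index, None)   # block was zero: dumps omit it
            else:
                past[index] = old       # put the older content back
    return past
```

Applying one rdiff steps back one generation; applying a chain of rdiffs, newest first, steps back several generations.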
18. How to Backup Remote Vmdk
• Command line:
– vmdkbkp dump [connect options] --mode [full/diff/incr]
--vm [vm moref] --snapshot [snapshot moref]
--remote [disk path]
--dumpin [previous dump] --dumpout [current dump]
--digestin [previous digest] --digestout [current digest]
--bmpin [changed block bitmap]
--rdiffout [current-previous rdiff]
• Inputs/Outputs:
– Full: Just --dumpout and --digestout are required
– Diff: All options except --bmpin are required
– Incr: All options are required
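A caller can check the per-mode requirements before invoking the command. The helper below is a hypothetical sketch: only the option names come from the slide. It assumes `--vm`, `--snapshot`, and `--remote` are always passed for a remote vmdk, and it leaves the connect options to the caller because their form is not shown here.

```python
# File options required by each mode, as listed on the slide.
REQUIRED = {
    "full": {"dumpout", "digestout"},
    "diff": {"dumpin", "dumpout", "digestin", "digestout", "rdiffout"},
    "incr": {"dumpin", "dumpout", "digestin", "digestout", "rdiffout", "bmpin"},
}


def dump_argv(mode, connect_opts, vm, snapshot, remote, **files):
    """Build the argument list for `vmdkbkp dump`, validating file options."""
    missing = REQUIRED[mode] - set(files)
    if missing:
        raise ValueError(f"--mode {mode} requires: {sorted(missing)}")
    argv = ["vmdkbkp", "dump", *connect_opts, "--mode", mode,
            "--vm", vm, "--snapshot", snapshot, "--remote", remote]
    # Only the options required by the mode are emitted; extras are ignored.
    for name in sorted(REQUIRED[mode]):
        argv += [f"--{name}", files[name]]
    return argv
```

For example, `dump_argv("full", [], vm, snapshot, disk_path, dumpout="1.dump", digestout="1.digest")` yields a full-dump command line, while calling it with `"incr"` and the same file options raises `ValueError`.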
19. Full Backup
• Ovf
– VM configuration data (without disk information)
• Dump
– Full data of vmdk (without zero-blocks)
• Digest
– Digest data of all blocks
Figure: the Vmbkp tool reads the VM configuration and all blocks of the virtual disk (vmdk), and writes the backup files: Ovf, Dump (non-zero blocks only), and Digest.
20. Differential Backup
• Rdiff
– Reverse difference data of vmdk
– Dump’ + Rdiff’ = Dump
• You can delete the dump of the previous generation after the current backup
Figure: the Vmbkp tool reads the VM configuration and all blocks of the virtual disk (vmdk), and stores the non-zero blocks.
– Backup files of previous generation: Ovf, Dump, Digest
– Backup files of current generation: Ovf’, Dump’, Rdiff’, Digest’
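A differential backup can be sketched as follows: read every block, compare its MD5 with the previous digest, and record the old content of each changed block in the rdiff. The model is an assumption for illustration (in-memory dicts, a tiny block size, `None` in the rdiff meaning "was all-zero", unchanged disk size), and `diff_dump` is a hypothetical name.

```python
import hashlib

BLOCK_SIZE = 4            # tiny block size, for illustration only
ZERO = bytes(BLOCK_SIZE)  # an all-zero block


def diff_dump(image, prev_dump, prev_digest):
    """Return (dump, digest, rdiff) for `image` relative to the previous generation."""
    dump, digest, rdiff = {}, [], {}
    for offset in range(0, len(image), BLOCK_SIZE):
        index = offset // BLOCK_SIZE
        block = image[offset:offset + BLOCK_SIZE]
        md5 = hashlib.md5(block).digest()
        digest.append(md5)
        if block != ZERO:
            dump[index] = block              # dumps omit zero blocks
        if md5 != prev_digest[index]:
            # Block changed: keep its old content (None = was all-zero).
            rdiff[index] = prev_dump.get(index)
    return dump, digest, rdiff
```

Overlaying the returned rdiff on the returned dump reproduces the previous dump, which is why the previous dump file can be deleted afterwards.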
21. Incremental Backup
• Changed Block Information
– The set of addresses of blocks changed after the previous backup
Figure: the Vmbkp tool reads the VM configuration and, using the changed block information, only the changed blocks of the virtual disk (vmdk), and stores the non-zero blocks.
– Backup files of previous generation: Ovf, Dump, Digest
– Backup files of current generation: Ovf’, Dump’, Rdiff’, Digest’
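The benefit of changed block information is that only the reported blocks need to be read from the disk. A minimal sketch on a simplified model follows: `incr_dump` and `read_block` are hypothetical names, dumps are dicts that omit zero blocks, `None` in the rdiff means "was all-zero", and the digest update is left out for brevity.

```python
BLOCK_SIZE = 4            # tiny block size, for illustration only
ZERO = bytes(BLOCK_SIZE)  # an all-zero block


def incr_dump(read_block, changed, prev_dump):
    """Return (dump, rdiff), reading only the blocks listed in `changed`."""
    dump, rdiff = dict(prev_dump), {}
    for index in sorted(changed):             # ascending order: sequential IO
        block = read_block(index)             # the only disk reads performed
        old = prev_dump.get(index)            # None = previously all-zero
        if block == (old if old is not None else ZERO):
            continue                          # reported as changed, same content
        rdiff[index] = old
        if block == ZERO:
            dump.pop(index, None)             # dumps omit zero blocks
        else:
            dump[index] = block
    return dump, rdiff
```

Blocks outside `changed` are carried over from the previous dump without touching the virtual disk.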
22. Vmdk Archives Relationships
Figure:
• 0.vmdk → (write some data on the 1st vm) → 1.vmdk
• Full dump of 0.vmdk: 0.dump, 0.digest
• Full dump, diff dump, or incr dump of 1.vmdk: 1.dump, 1.digest
• Diff dump and incr dump also produce 1-0.rdiff
• rdiff2bmp: 1-0.rdiff → 1.bitmap
Check that the dump/digest files obtained from all possible paths are the same, using check_dump_and_dump and check_digest_and_digest.
23. Vmdk Archives Relationships –cont.
Figure:
• 0.vmdk → (write some data on the 1st vm) → 1.vmdk
• Restore: 0.dump → 0.vmdk, 1.dump → 1.vmdk
• Merge: 1.dump + 1-0.rdiff → 0.dump
• Digest: dump → digest
Test steps:
• Restore 0.vmdk from 0.dump.
• Full dump 0.vmdk to 0r.dump.
• Check that 0.dump and 0r.dump are the same.
• Merge 1.dump and 1-0.rdiff to 0m.dump.
• Digest 0m.dump to 0m.digest.
• Check that 0.{dump,digest} and 0m.{dump,digest} are the same.
24. Software Architecture of vmdkbkp
Layers (figure):
• Command executor: Command
• Specific components: Util, Header, Manager
• General components: Exception, Serialize, Bitmap
Components:
• Command
– Parse command-line and execute it
• Util
– Configuration, Time, etc.
• Header
– Manage header/blocks of dump/rdiff/digest files
• Manager
– Manage (1) VDDK connection, (2) vmdk file access, and (3) dump/rdiff/digest file access
• Exception
– Exceptions and related macros
• Serialize
– StringMap/Integers data serializer
• Bitmap
– Bitmap data serializer
25. VDDK Control with Fork
• Solves the following problem: when VDDK has to be re-initialized for SAN transfer after a SCSI reservation conflict error, the re-initialization inevitably fails and the transfer falls back to NBD.
26. VDDK Control with Fork –cont.
• Main process
– VddkController: provides the same interface as the Vddk/Vmdk managers
– VddkWorker (parent): manages processes and communicates with the child
• Forked process
– VddkWorker (child): wraps the Vddk/Vmdk managers and communicates with the parent
– VddkManager, VmdkManager
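The parent/child split can be illustrated with a small sketch. It is an analogy, not the C++ implementation: `Controller` and `_worker` are hypothetical names, the request handling is a placeholder, and the idea that re-initialization is achieved by replacing the child with a fresh process is an assumption drawn from the slide titles. It uses the `fork` start method and therefore assumes a POSIX system.

```python
import multiprocessing as mp
import os


def _worker(conn):
    """Child process: would own the library state; here it only echoes requests."""
    while True:
        request = conn.recv()
        if request == "quit":
            break
        conn.send((os.getpid(), request.upper()))   # placeholder for real work
    conn.close()


class Controller:
    """Parent side: same call interface, with the work delegated to a forked child."""

    def __init__(self):
        self._ctx = mp.get_context("fork")          # POSIX only
        self._start()

    def _start(self):
        self._conn, child_conn = self._ctx.Pipe()
        self._proc = self._ctx.Process(target=_worker, args=(child_conn,))
        self._proc.start()
        child_conn.close()                          # parent keeps only its own end

    def call(self, request):
        self._conn.send(request)
        return self._conn.recv()

    def reset(self):
        """Re-initialize by replacing the child with a fresh process."""
        self.close()
        self._start()

    def close(self):
        self._conn.send("quit")
        self._proc.join()
        self._conn.close()
```

Since all library state lives in the child, a `reset()` starts from a clean process image without restarting the main process.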
27. Multi-threaded Archive Manager
• Improves performance with gzipped multi-stream dump/restore/check/merge operations
• Archive Managers
– Interface of archive accesses specialized for each command
• Archive IO Managers
– Multi-threaded/single-threaded stream access for each archive file
• DataReader, DataWriter
– Worker thread and its controller for gzip compression/decompression
• Queue
– Thread-safe FIFO
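The worker-thread-plus-queue pattern for one archive stream can be sketched as below. The class borrows the name `DataWriter` from the slide for readability, but it is a hypothetical Python analogue, not the C++ class; the queue size and the `None` end-of-stream sentinel are assumptions of the sketch.

```python
import gzip
import io
import queue
import threading


class DataWriter:
    """Compress blocks for one archive stream on a dedicated worker thread."""

    def __init__(self, raw):
        self._queue = queue.Queue(maxsize=16)       # thread-safe FIFO
        self._gz = gzip.GzipFile(fileobj=raw, mode="wb")
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while True:
            block = self._queue.get()
            if block is None:                       # sentinel: end of stream
                break
            self._gz.write(block)
        self._gz.close()                            # writes the gzip trailer

    def write(self, block):
        self._queue.put(block)                      # blocks when the queue is full

    def close(self):
        self._queue.put(None)
        self._thread.join()
```

Creating one writer per archive file (for example one for the dump and one for the digest) lets the caller produce blocks while each stream is compressed on its own thread.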
29. Restore with SAN
• Problems in restore with SAN
– Auto-allocation fails for thin vmdk
– Auto-allocation is too slow for thick vmdk
– There is no efficient allocation API
• Idea: if zero-block restore with NBD is faster, use it as the allocation method
– Result: it is not fast…
30. Future Work
• Improve parallelism
– Solving SCSI reservation conflict problem
– Multi-threaded compression
• Restore with SAN
– Depends on VDDK’s efficient block allocation API