1. High Availability and Disaster Recovery Considerations for Microsoft Hyper-V Ing. Eduardo Castro, PhD Grupo Asesor en Informática ecastro@grupoasesor.net
2. Agenda: Hyper-V Virtualization Scenarios; How VM Availability, Disaster Recovery and Backup/Recovery Relate to Business Continuity; Anatomy of a Hyper-V Virtual Machine; Backup/HA/DR for Hyper-V (Backup/Recovery Implications for Hyper-V VMs, High Availability Implications for Hyper-V VMs, Disaster Recovery Implications for Hyper-V VMs); Geo-Clustered Hyper-V VM Demonstration; Summary / Q&A
4. Keeping the Business Running. Business Continuity: resumption of full operations, combining people, processes and platforms. Disaster Recovery: site-level crisis; data and IT operations resumption. Backup and Restore: presumes infrastructure is whole; 97% is file/small-unit related. High Availability: presumes that the rest of the environment is active.
7. Simplifying backups, recovery and DR testing. [Diagram labels: Primary Site; Secondary Site; High Availability; Disaster Recovery; Backup and Recovery; Storage Array; VHD; Clustering; Shared Storage; Quick/Live Migration]
8. The Architecture of Hyper-V. [Diagram labels: Parent Partition (user mode: VM Worker Processes, WMI Provider, VM Service; kernel mode: Windows Server 2008, Windows Kernel, VSP, IHV Drivers, VMBus); Child Partitions (user mode: Applications; kernel mode: Windows Server 2003, 2008 with Windows Kernel, VSC, VMBus; Non-Hypervisor Aware OS with Emulation; Xen-Enabled Linux Kernel with Linux VSC, Hypercall Adapter, VMBus); Windows Hypervisor (Ring -1); “Designed for Windows” Server Hardware]
9. The Anatomy of a Hyper-V VM. *.VHD: VM data; *.AVHD: VM snapshots; *.BIN: contents of VM RAM for a saved state; *.VSV: saved state information (i.e., processor register data); *.XML: VM configuration information in an industry-standard XML file.
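The file types above can be sketched as a small lookup that groups the contents of a VM folder by role. This is a minimal illustration only: the function name `classify_vm_files` and the sample file names are hypothetical, and the role strings simply repeat the slide's wording.

```python
from pathlib import PurePath

# Roles of the file types listed on the slide, keyed by lowercase extension.
VM_FILE_ROLES = {
    ".vhd": "VM data",
    ".avhd": "VM snapshots",
    ".bin": "saved-state RAM contents",
    ".vsv": "saved state information",
    ".xml": "VM configuration",
}


def classify_vm_files(filenames):
    """Group file names by the role their extension implies (hypothetical helper)."""
    grouped = {}
    for name in filenames:
        extension = PurePath(name).suffix.lower()
        role = VM_FILE_ROLES.get(extension, "unrecognized")
        grouped.setdefault(role, []).append(name)
    return grouped


if __name__ == "__main__":
    # Made-up file names, for illustration only.
    sample = ["web01.vhd", "web01_snap.AVHD", "state.bin", "state.vsv", "config.xml"]
    for role, names in classify_vm_files(sample).items():
        print(role, names)
```

A host-level backup has to capture every one of these files for a VM to be restorable as a unit, which is why grouping them by role is a useful first check.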
11. The Anatomy of a Hyper-V VM. All VMs are assigned a unique GUID: <logical_id type="string">056B19F3…FAD06C76416D</logical_id>. All snapshots are assigned a GUID, used to identify the snapshot and construct relative paths to .AVHDs: <guid type="string">53E0AC2C…EE46C4F495D4</guid>. Both the virtualized NIC(s) in the VM and the virtual switch(es) on the host are assigned a GUID: <ChannelInstanceGuid type="string">{bc66…}</ChannelInstanceGuid> <SwitchName type="string">Switch-SM-847f89…</SwitchName>. Permissions related to Hyper-V VMs are important to consider: <sid type="string">S-1-5-2…</sid>
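To see which host-specific identifiers a VM configuration file carries (useful when restoring to a different host), the element names quoted above can be pulled out with a short script. This is a sketch under stated assumptions: the nesting of `SAMPLE_CONFIG` is hypothetical and the GUID values are placeholders; only the element names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout with placeholder values; only the tag names are from the slide.
SAMPLE_CONFIG = """
<configuration>
  <properties>
    <logical_id type="string">00000000-0000-0000-0000-000000000001</logical_id>
  </properties>
  <snapshot>
    <guid type="string">00000000-0000-0000-0000-000000000002</guid>
  </snapshot>
  <nic>
    <ChannelInstanceGuid type="string">{00000000-0000-0000-0000-000000000003}</ChannelInstanceGuid>
    <SwitchName type="string">Switch-SM-placeholder</SwitchName>
  </nic>
</configuration>
"""

IDENTIFIER_TAGS = ("logical_id", "guid", "ChannelInstanceGuid", "SwitchName", "sid")


def extract_identifiers(xml_text):
    """Collect the text of every identifier element, wherever it is nested."""
    root = ET.fromstring(xml_text.strip())
    found = {tag: [] for tag in IDENTIFIER_TAGS}
    for element in root.iter():
        if element.tag in found and element.text:
            found[element.tag].append(element.text.strip())
    return found


if __name__ == "__main__":
    for tag, values in extract_identifiers(SAMPLE_CONFIG).items():
        print(tag, values)
```

Searching by tag name rather than by fixed path keeps the sketch independent of the real file's nesting, which is not shown in the slides.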
12. VM Backup/Recovery Challenges: expense of loading agents in each guest OS; protecting virtualized applications (Exchange, SQL, etc.); VMs may increase backup/restore complexity; backing up “in the guest” versus “outside the guest” (image-level or file-level recovery); restoring to different hardware if necessary.
13. Some VM Backup Terminology: File-Level Backup (“in the guest”); Image-Level Backup (“on the host”); Application Quiescing; O/S Crash Consistency; Application Crash Consistency.
14. Types of VM Backups. Three types of backups: (1) Backing up the host system: may be necessary to maintain host configuration, but often not completely necessary; the fastest fix for a broken host is often a complete rebuild. (2) Backing up virtual disk files: fast and can be done from a single host-based backup client; challenging to do file-level restores. (3) Backing up VMs from inside the VM: slower and requires backup clients in every VM; resource intensive on the host; capable of doing file-level restores.
15. Challenges of Transactional DBs. O/S crash consistency is fairly easy: quiesce the NTFS file system before beginning the backup. Application crash consistency is much harder: transactional databases like AD, Exchange and SQL don’t quiesce just because NTFS does. Restoration without application crash consistency can lose data: the DB restores into an “inconsistent” state and must perform a soft recovery.
16. Dealing with Consistency. When backing up VMs, you may need to consider dual approaches: file-level backups and image-level backups. File-level = restore individual files with transactional integrity. Image-level = whole-server recoverability. Image-level backups may not provide application crash consistency! Microsoft and 3rd-party solutions may integrate with VSS-aware guest OSes and applications: Microsoft System Center Data Protection Manager; 3rd-party backup solutions.
17. Integrating Backup w/VSS. VSS = Volume Shadow Copy Service. No need to power down virtual machines to do backups. VSS ensures a consistent state in the virtual machine. The VM must have the backup integration component enabled.
18. Data Protection Manager 2007. Recovery Point Objective: 15 min versus real time (RT) for VSS-aware VMs; ~1 day versus RT for non-VSS-aware VMs. Recovery Time Objective: automated monitoring and failover versus on-demand recovery. Type of recovery needed: Disaster Recovery (focus on getting back up and running with the latest copy ASAP) versus Operational Recovery & Disaster Recovery (focus on being able to recover multiple points in time).
24. Protects VMs without hibernation (if OS is VSS enabled). [Diagram labels: Primary Site; Secondary Site; WAN Connectivity; Recovery; Up to every 15 minutes]
25. VSS/Backup Recommendations. VSS in Hyper-V does not support: host-level backups of pass-through disks; host-level backups of iSCSI volumes in guest VMs. Instead, use a guest-based Exchange-aware streaming backup or VSS backup (e.g., Data Protection Manager 2007). VSS in Hyper-V does support host-level backups of VHDs. Hardware-based VSS backups of Exchange storage are supported by the vendor, not Microsoft.
26. Hyper-V Backup Best Practices. Ensure your backup solution supports VSS, and specifically the VSS writer in Hyper-V. Virtual machine backup best practices: leverage the Hyper-V VSS writer to take online snapshots of virtual machines; System Center Data Protection Manager will provide Hyper-V VSS snapshots; ability to quickly recover virtual machines; replicate snapshots to a backup location for DR.
28. Virtualization and High Availability Go Hand in Hand. Downtime is far worse because multiple workloads are affected.
29. Microsoft Hyper-V Quick Migration. Provides solutions for both planned and unplanned downtime. Planned downtime: quickly move virtualized workloads to service underlying hardware; more common than unplanned. Unplanned downtime: automatic failover to other nodes (hardware or power failure); not as common and more difficult. Windows Server 2008 R2 introduces Live Migration, which supports moving virtual machines between servers with no loss of service.
30. Quick Migration Fundamentals. Save state: save the entire virtual machine state. Move virtual machine: move storage connectivity from origin to destination host. Restore state and run: restore the virtual machine and run. [Diagram labels: VHDs; Shared Storage; Network Connectivity]
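The three Quick Migration steps can be sketched as a toy state model. This is not a Hyper-V API: the `VirtualMachine` class, the `quick_migrate` function and the host names are hypothetical, and the point is only to show that the VM is not running between the save and the restore.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    """Toy model of a VM whose VHDs live on shared storage (hypothetical)."""
    name: str
    host: str
    state: str = "running"
    log: list = field(default_factory=list)


def quick_migrate(vm, destination_host):
    """Walk through the three steps: save state, move, restore and run."""
    if vm.state != "running":
        raise ValueError("only a running VM is migrated in this sketch")

    # Step 1: save the entire virtual machine state; the VM stops serving requests.
    vm.state = "saved"
    vm.log.append(f"saved state on {vm.host}")

    # Step 2: move storage connectivity from the origin to the destination host.
    origin_host = vm.host
    vm.host = destination_host
    vm.log.append(f"moved storage connectivity from {origin_host} to {destination_host}")

    # Step 3: restore the saved state on the destination and resume.
    vm.state = "running"
    vm.log.append(f"restored and running on {vm.host}")
    return vm


if __name__ == "__main__":
    vm = quick_migrate(VirtualMachine(name="web01", host="node-a"), "node-b")
    print(vm.host, vm.state)
    for entry in vm.log:
        print(" -", entry)
```

Because the VM is in the saved state during step 2, Quick Migration implies a brief service interruption, which is the gap that Live Migration is meant to close.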
31. Other VM Availability Scenarios. Guest-based VM clustering (using WSFC): cost prohibitive, since it requires the Enterprise edition of Windows Server and shared storage; more complex to install/configure/manage; an option for cluster-aware applications. 3rd-party replication/failover solutions: use software-based replication/failover to replicate VMs between Hyper-V hosts (or within VMs); examples are Double-Take for Hyper-V, CA XOsoft High Availability and SteelEye LifeKeeper for Windows.
32. Virtualization Benefits. Downtime is expensive: more rapid backup and recovery; Quick/Live Migration and clustering. Things are complicated: eliminate maintaining duplicate physical systems; automate backup, recovery and DR processes. Infrastructure/people are expensive: reduce expenditure on facility and infrastructure; diminish need for specialized hardware/personnel.
33. Some DR Terminology. RPO (Recovery Point Objective): how much data you can afford to lose. RTO (Recovery Time Objective): how long you can afford to be down. Hot site: servers up and operational at the remote site at all times. Warm site: servers pre-provisioned at the remote site; tasks must be completed for failover to occur. Cold site: empty site and servers on retainer awaiting a DR event.
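A short worked example may help keep the two objectives apart: the RPO bounds data loss and is driven by how often a recovery point is taken, while the RTO bounds downtime and is driven by how long a restore takes. The function name `meets_objectives` and the sample durations are assumptions for illustration.

```python
from datetime import timedelta


def meets_objectives(backup_interval, restore_duration, rpo, rto):
    """Compare a protection scheme against recovery objectives.

    Worst-case data loss equals the time between recovery points (checked
    against the RPO); downtime equals the restore duration (checked against
    the RTO).
    """
    return {
        "rpo_met": backup_interval <= rpo,
        "rto_met": restore_duration <= rto,
    }


if __name__ == "__main__":
    # Illustrative numbers: recovery points every 15 minutes, a 2-hour restore.
    result = meets_objectives(
        backup_interval=timedelta(minutes=15),
        restore_duration=timedelta(hours=2),
        rpo=timedelta(hours=1),
        rto=timedelta(hours=1),
    )
    print(result)
```

With these sample numbers the scheme loses little data but still takes too long to come back, so it satisfies the RPO and misses the RTO.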
35. Days to Weeks Recovery. Use free or low-cost solutions to back up VMs at the host level (image-level backups). DR site is a “cold site” with equipment available on demand from a vendor/co-lo company. Store images to tape/disk and rotate off-site. Will need to manually restore images and fix problems… and there will be problems!
36. Hours to Days Recovery. Use free or low-cost solutions to back up VMs at the host level (image-level backups). DR site is a “warm site” with storage available for replicated/copied VM images. Transfer images to an off-site data storage location; some tools provide off-site capabilities. Will need to manually restore images and fix problems… and there will be problems!
37. Minutes to Hours Recovery. Use replication to provide site-to-site replication of VM data. These host-level replicated VM copies are potentially inconsistent. Can use SAN-based or host-based replication: a cost/bandwidth trade-off. Less impact on the WAN: changes are sent in real time (compression/throttling). Will need to attach replicated VMs to replacement equipment and fix problems.
38. Immediate Recovery. Warm or hot site is used for DR. Storage-to-storage replication installed between sites. 3rd-party replication technologies used for VM replication: “in the guest” for transactional integrity; “on the host” for all other workloads. Restoration is usually automated using 3rd-party tools or interoperability with Windows Server Failover Clustering.
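The four recovery tiers can be read as a lookup from a required RTO to an approach. The sketch below makes that explicit; the numeric cut-offs in `TIER_LIMITS` are arbitrary assumptions chosen for illustration, since the slides only name the tiers qualitatively.

```python
from datetime import timedelta

# (upper bound on RTO, tier name from the slides). The bounds are assumptions.
TIER_LIMITS = [
    (timedelta(minutes=1), "Immediate Recovery"),
    (timedelta(hours=4), "Minutes to Hours Recovery"),
    (timedelta(days=2), "Hours to Days Recovery"),
]


def recovery_tier(rto):
    """Return the least demanding tier whose assumed bound still satisfies the RTO."""
    for limit, tier in TIER_LIMITS:
        if rto <= limit:
            return tier
    return "Days to Weeks Recovery"


if __name__ == "__main__":
    for rto in (timedelta(seconds=30), timedelta(hours=2), timedelta(days=1), timedelta(days=10)):
        print(rto, "->", recovery_tier(rto))
```

In practice the choice also depends on the RPO and on budget; the lookup only captures the downtime dimension.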
39. Windows Server 2008 - WSFC. No more single-subnet limitation: allows cluster nodes to communicate across network routers; no more having to connect nodes with VLANs! Configurable heartbeat timeouts: increase to extend geographically dispersed clusters over greater distances. Storage-vendor-based solution: mirrored storage between stretched locations; hardware- or software-based replication.
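Why a longer heartbeat timeout helps a stretched cluster can be shown with simple arithmetic: a node is declared failed after a number of consecutive heartbeats are missed, so the tolerated network interruption is roughly the interval times the threshold. The parameter names and values below are hypothetical and are not the actual WSFC settings or defaults.

```python
def failure_detection_time_ms(heartbeat_interval_ms, missed_heartbeat_threshold):
    """Time after which a silent node is declared failed (simplified model)."""
    return heartbeat_interval_ms * missed_heartbeat_threshold


def survives_network_blip(heartbeat_interval_ms, missed_heartbeat_threshold, blip_ms):
    """True if a temporary WAN interruption is shorter than the detection time."""
    return blip_ms < failure_detection_time_ms(heartbeat_interval_ms, missed_heartbeat_threshold)


if __name__ == "__main__":
    # Hypothetical values: 1-second heartbeats, a 7-second WAN interruption.
    print(survives_network_blip(1000, 5, 7000))   # tight timeout
    print(survives_network_blip(1000, 10, 7000))  # relaxed timeout
```

The trade-off is that a relaxed timeout also delays detection of a genuine failure, so the setting is a balance between false failovers and failover latency.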
40. GeoCluster. Integrates with Microsoft Failover Clustering. Uses Double-Take patented replication. Extends clusters across geographical distances. Eliminates single point of disk failure. GeoCluster for Hyper-V workloads: utilizes GeoCluster technology to extend Hyper-V clustering across virtual hosts without the use of shared disk; allows manual and automatic moves of cluster resources between virtual hosts.
41. How GC Integrates w/WSFC. GeoCluster nodes use separate disks, kept synchronized by real-time replication. Only the active node accesses its disks. Data is replicated to all passive nodes. At failover, the new active node resumes with current, replicated data.
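The behaviour described on this slide can be sketched as a toy simulation: each node has its own disk, writes go to the active node and are copied to every passive node, and a failover promotes a passive node that already holds the data. This is a conceptual model only, not Double-Take or WSFC code; the class names and node names are hypothetical, and replication is modeled as instantaneous.

```python
class Node:
    """A cluster node with its own, separate disk (modeled as a dict)."""

    def __init__(self, name):
        self.name = name
        self.disk = {}


class GeoClusterSketch:
    """Toy active/passive cluster with replication instead of shared disk."""

    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]
        self.active = self.nodes[0]

    def write(self, key, value):
        # Only the active node serves the write on its own disk...
        self.active.disk[key] = value
        # ...and the change is replicated to all passive nodes.
        for node in self.nodes:
            if node is not self.active:
                node.disk[key] = value

    def failover(self):
        """Drop the failed active node and promote the first passive node."""
        failed = self.active
        self.nodes.remove(failed)
        if not self.nodes:
            raise RuntimeError("no passive node left to take over")
        self.active = self.nodes[0]
        return self.active


if __name__ == "__main__":
    cluster = GeoClusterSketch(["site-a", "site-b"])
    cluster.write("vm-config", "v1")
    new_active = cluster.failover()
    print(new_active.name, new_active.disk)
```

A real replication link has latency, so the passive copy may lag the active disk slightly; the sketch ignores that to keep the failover logic visible.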