Maximize Availability and Uptime by Clustering Your Physical Data Centers Within Metro Distances

MICHAEL NAKAMURA, SENIOR SOLUTIONS ARCHITECT
HENRY CHU, SENIOR SOLUTIONS ARCHITECT
OCTOBER 2012

WEBTECH EDUCATIONAL SERIES
As IT infrastructures continue to be virtualized, data center architects are looking
for ways to increase the mobility and high availability of virtual machines beyond a
single data center.
Expanding data centers across multiple locations has become an increasingly
common strategy to address high-availability and disaster recovery needs for
businesses with high uptime requirements.
Join Hitachi Data Systems for this Webinar and learn how you can:
• Accelerate tier-1 virtualization adoption by providing best-in-class SLAs
• Dynamically move workloads within and across data centers to avoid
contention, and support utility-on-demand models
• Provide automated recovery of applications with high return on investment

UPCOMING WEBTECHS
November
 Comprehensive and Simplified Management for VMware vSphere
environments, November 14, 11 a.m. PT, 2 p.m. ET
 Microsoft SQL Server 2012 Data Warehouse solutions on Hitachi
converged platform, November 27, 9 a.m. PT, 12 p.m. ET
Check www.hds.com/webtech for
 Links to the recording, the presentation and Q&A (available next week)
 Schedule and registration for upcoming WebTech sessions

AGENDA
 Customer challenges
 VMware Metro Storage Cluster overview
 Hitachi Storage Cluster for VMware vSphere
technical review
 Best practices
© Hitachi Data Systems Corporation and Brocade Communications Systems, Inc. 2012. All Rights Reserved.

CUSTOMER CHALLENGES
 Downtime
‒ Key component(s) failure in single data center
‒ Planned maintenance
‒ No disaster recovery without downtime
 Reluctance to migrate mission-critical apps
‒ Fear of performance degradation
‒ Data recovery is an issue; inability to meet recovery time
objectives (RTO) and recovery point objectives (RPO)
 Lack of a single point of management
across data centers
 No ability to pool resources across data
centers limits application deployment flexibility

VMWARE METRO STORAGE CLUSTER OVERVIEW:
WHAT IS A METRO STORAGE CLUSTER?
 VMware vSphere Metro Storage Cluster (vMSC) is a new
certified configuration in which a single storage device spans
storage systems at multiple geographic sites
 Hitachi Storage Cluster certification is complete and is listed on
the VMware Hardware Compatibility List
 Implemented for disaster and downtime avoidance

WHY USE A METRO STORAGE CLUSTER?
 Maximize availability and uptime by clustering physical
data centers within metro distances
 Leverage VMware infrastructure high-availability benefits
with storage-based synchronous replication awareness
 Stretched storage clusters provide new architectures that
enable
 Nondisruptive workload mobility
 Cross-site load balancing of resources
 Avoidance of disaster and downtime
 Uniform host access model – provides a single view of a datastore
across sites
 Data consistency across 2 sites in the case of failure

HITACHI STORAGE CLUSTER FOR VMWARE
VSPHERE: INFRASTRUCTURE OVERVIEW

HITACHI STORAGE CLUSTER FOR VMWARE
VSPHERE: MANAGEMENT OVERVIEW
 vCenter Server contains these management components:
‒ vCenter
‒ Hitachi Dynamic Link Manager (HDLM) command
‒ vSphere CLI
‒ CCI (RAID Manager)
 Command device (Cmd Dev) presented from both Hitachi Virtual
Storage Platform (VSP) systems
 Best practice: Place vCenter at a 3rd site to ensure virtual
infrastructure management is not affected by the loss of any 1 site
during a sitewide failure

HITACHI STORAGE CLUSTER FOR VMWARE
VSPHERE: ARCHITECTURE OVERVIEW
 Hitachi High Availability Manager (HAM)
installed on each VSP
 P-VOL and S-VOL seen as a single
volume
‒ RCU takes MCU serial number upon
failover
 Write data transferred from MCU to RCU
cache via synchronous Hitachi TrueCopy®
‒ Supports external storage and Hitachi
Dynamic Provisioning volumes
 Quorum disk on external storage
‒ Used by both MCU and RCU
‒ Unique quorum disk for each MCU-RCU
relationship
‒ Allows verification of data integrity before
failover
‒ Denotes location of most recent host data

HITACHI DYNAMIC LINK MANAGER (HDLM) WITH HIGH
AVAILABILITY MANAGER (HAM): INTRODUCTION
 Virtual storage represents P-VOL and
S-VOL as a single volume
‒ P-VOL and S-VOL have same VOL ID
in SCSI inquiry
 HDLM in ESX manages path
selection
‒ Active I/O sent to P-VOL
‒ S-VOL in standby state in normal
operation
‒ Load balancing algorithm
 Extended round robin
 Extended least I/O
 Extended least blocks
 HAM uses synchronous TrueCopy to
replicate from P-VOL to S-VOL
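
The path-selection behavior above can be sketched in a few lines of Python. This is a minimal illustration only, not HDLM's implementation: it uses a plain least-outstanding-I/O rule rather than the "extended" algorithms, whose internals the slides do not describe. The `Path` class, its fields, and `select_path` are hypothetical names introduced for this sketch. It also simplifies the standby case: in the real product, S-VOL paths become active only after HAM completes a storage failover.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    volume: str            # "P-VOL" or "S-VOL" (hypothetical labels)
    online: bool = True
    outstanding_io: int = 0


def select_path(paths):
    """Return the online path with the fewest outstanding I/Os.

    P-VOL paths are preferred; S-VOL paths are considered only when
    no P-VOL path is online, mirroring the active/standby split.
    """
    for volume in ("P-VOL", "S-VOL"):
        candidates = [p for p in paths if p.online and p.volume == volume]
        if candidates:
            return min(candidates, key=lambda p: p.outstanding_io)
    raise RuntimeError("no online path to either volume")


paths = [
    Path("hba0-cl1a", "P-VOL", outstanding_io=4),
    Path("hba1-cl2a", "P-VOL", outstanding_io=1),
    Path("hba0-cl3a", "S-VOL"),
]
chosen = select_path(paths)   # the less busy P-VOL path, hba1-cl2a
```

Checking P-VOL paths first keeps the idle S-VOL path out of consideration during normal operation, even though it has no outstanding I/O.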

HDLM WITH HAM: VMOTION AND DYNAMIC
RESOURCE SCHEDULER
 vMotioned VMs
‒ Hosts within the cluster will
use active paths to P-VOLs

HDLM WITH HAM:
VMWARE HIGH AVAILABILITY (HA)
 VMware HA failover
‒ VMs fail over to existing ESX
nodes in HA cluster
‒ I/O continues to active P-VOL
paths

HDLM WITH HAM: PATH FAILOVER
 When paths to P-VOL fail,
HDLM PSP handles the
path failover

HDLM WITH HAM: STORAGE FAILOVER
 When all paths to P-VOL or
MCU fail
‒ Paths to S-VOL become
active
‒ Verify data integrity with
quorum disk before failover
‒ RCU splits S-VOL with
write-enabled status
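
The failover decision on this slide can be expressed as a small decision function. The sketch below is illustrative and makes assumptions: the `Action` names, the function name, and its parameters are hypothetical, and the behavior when the quorum disk cannot confirm that the S-VOL holds current data (failover is blocked) is an assumption, since the slides do not describe that case.

```python
from enum import Enum


class Action(Enum):
    CONTINUE_ON_PVOL = "keep sending I/O down P-VOL paths"
    FAIL_OVER_TO_SVOL = "split S-VOL write-enabled, activate S-VOL paths"
    BLOCK_FAILOVER = "integrity not verified, do not fail over"  # assumed


def decide_storage_failover(pvol_paths_online: int,
                            quorum_reachable: bool,
                            svol_is_current: bool) -> Action:
    """Decide what to do when P-VOL paths are lost."""
    if pvol_paths_online > 0:
        # Ordinary path failover: HDLM uses a remaining P-VOL path.
        return Action.CONTINUE_ON_PVOL
    if quorum_reachable and svol_is_current:
        # All P-VOL paths (or the MCU) are gone and the quorum disk
        # confirms the S-VOL is safe to use.
        return Action.FAIL_OVER_TO_SVOL
    return Action.BLOCK_FAILOVER


result = decide_storage_failover(0, quorum_reachable=True, svol_is_current=True)
```

The quorum check sits between "all P-VOL paths failed" and "S-VOL becomes writable", which matches the order on the slide: integrity is verified before failover.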

HDLM WITH HAM: PATH RECOVERY
 Storage recovery will
require reverse sync
‒ pairresync -swaps / -swapp
 When storage recovers
and paths to P-VOL
recover
‒ Paths to S-VOL become
standby
‒ P-VOL paths become
active

QUORUM FAILURE
 Remote mirroring between
P-VOL and S-VOL stops
 P-VOL continues to process
host I/O

REPLICATION LINK FAILURE
 P-VOL continues to
process host I/O

HDLM WITH HAM: SITE FAILURE
 VM failover handled by
VMware HA
 Storage failover handled by
HAM
 Path failover to replicated
storage handled by HDLM

WAN LINK FAILURE (UNDER REVIEW)
 Link for replication and
remote site has failed but
links to local site are
active
‒ P-VOL cannot process host I/O
‒ HDLM switches the I/O path to
S-VOL
‒ Site 1: I/O paths to S-VOL also
cannot be used, so Site 1 can
access neither P-VOL nor
S-VOL
‒ Site 2: S-VOL continues to
process host I/O
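
For quick reference, the failure scenarios from the preceding slides can be collected into one lookup. This is a summary sketch only: the dictionary keys and the function name are hypothetical labels, the site-failure entry assumes the failed site is the one hosting the P-VOL, and the WAN link entry reflects a slide that is itself marked as under review.

```python
# Which volume serves host I/O after each failure, as stated on the slides.
VOLUME_SERVING_IO = {
    "storage_failure": "S-VOL",           # all paths to P-VOL or MCU fail
    "quorum_failure": "P-VOL",            # remote mirroring stops
    "replication_link_failure": "P-VOL",
    "site_failure": "S-VOL",              # assumes the P-VOL site failed
    "wan_link_failure": "S-VOL",          # Site 2 only; slide under review
}


def volume_serving_io(scenario: str) -> str:
    """Look up the volume that continues to process host I/O."""
    try:
        return VOLUME_SERVING_IO[scenario]
    except KeyError:
        raise ValueError(f"unknown failure scenario: {scenario}") from None


after_quorum_loss = volume_serving_io("quorum_failure")
```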

BEST PRACTICE DESIGN
RECOMMENDATIONS
 Performance is limited by WAN latency and
bandwidth
‒ Optionally use VMware HA with N+1 settings in
combination with DRS affinity rules to keep VMs on the same site
as the active volume
 Quorum disk should be located at a 3rd site to ensure
quorum access is not affected by the loss of any 1 site during a
sitewide failure.
 vCenter should be located at a 3rd site to ensure virtual
infrastructure management is not affected by the loss of any 1 site
during a sitewide failure.
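
To make the WAN latency point concrete, the sketch below estimates the delay that distance alone adds to each synchronously replicated write. It is a rough back-of-the-envelope model under stated assumptions: the figure of about 5 microseconds per kilometer is a common rule of thumb for light in optical fiber, the function name and the `round_trips` parameter are hypothetical, and the number of round trips per write depends on the replication protocol, which the slides do not specify. Switch, queuing, and array processing delays are ignored.

```python
FIBER_US_PER_KM = 5.0  # rule-of-thumb one-way delay in fiber (assumption)


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay added to one synchronous write, in milliseconds.

    Each round trip covers the inter-site distance twice.
    """
    one_way_us = distance_km * FIBER_US_PER_KM
    return 2 * one_way_us * round_trips / 1000.0


# Two sites 100 km apart, one round trip per write: about 1 ms added.
penalty = sync_write_penalty_ms(100)
```

Because the host write is not acknowledged until the remote copy is in RCU cache, this delay is paid on every write, which is one reason to keep VMs on the same site as their active volume.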

BEST PRACTICE DESIGN
RECOMMENDATIONS
 Perform storage failback during scheduled downtime
‒ Perform a clean and controlled storage failback by migrating
high-uptime virtual machines to a single host via VMware
vMotion and then performing storage failback
 Avoid single points of failure by architecting with
redundancy in mind

THANK YOU
MICHAEL NAKAMURA
HENRY CHU
michael.nakamura@hds.com, henry.chu@hds.com
