4. Benefits of a Multi-Site Cluster
Protects Against Loss of an Entire Datacenter
Power outage, fires, hurricanes, floods, earthquakes, terrorism
Automates Failover
Reduced downtime
Lower complexity of disaster recovery plan
Reduces Administrative Overhead
Automatically synchronize application and cluster changes
Easier to keep consistent than unclustered servers
What is the primary reason why disaster recovery solutions fail?
Dependence on People
7. Multi-Site Clustering Basics
2+ physically separate sites
1+ node at each site
Storage at each site with data replication
Application moves during a failover
(Diagram: Site A and Site B, each with its own SAN)
8. Redundancy Everywhere
2 or more computers (nodes)
2 NICs
3rd NIC for iSCSI
HBA
Fibre Channel (FC)
Serial Attached SCSI (SAS)
Multipath IO (MPIO)
Redundant Storage Interconnects
Replicated Storage
OS, Service or Application HA Roles
9. Mix and Match Hardware
You Can Use Any Hardware Configuration if
Each component has a Windows Server 2008 / R2 logo
Servers, Storage, HBAs, MPIO, etc…
It passes Validate
It’s That Simple!
Connect your Windows Server 2008 / R2 logo’d hardware
Pass every test in Validate
It is now supported!
If you make a change, just run Validate again
Details: http://go.microsoft.com/fwlink/?LinkID=119949
10. FCCP
Failover Cluster Configuration Program
Windows Server 2008 / R2
Buy validated solutions
“Validated by Microsoft Failover Cluster Configuration Program”
FCCP is not required for Microsoft support, but hardware must be logo'd
More information:
http://www.microsoft.com/windowsserver2008/en/us/clustering-program.aspx
12. Cluster Validation and Replication
Multi-Site clusters are not required to pass the Storage tests to be supported
Validation guide and policy: http://go.microsoft.com/fwlink/?LinkID=119949
14. Why is Replication Needed?
Loss of a site won’t cause complete data loss
Data must exist on other site after a failover
Different storage needs than local clusters
Multiple storage arrays, independent on each site
Nodes usually access local site’s storage first
(Diagram: changes are made on Site A and replicated to a replica on Site B)
15. Replication Solutions
Replication Levels
Hardware (block level) storage-based replication
Software (file system level) host-based replication
Application-based replication
Exchange Server 2007 CCR
Replication Types
Synchronous
Asynchronous
A data replication mechanism between sites is needed
16. Synchronous Replication
Host receives "write complete" response from the storage after the data is successfully written on both storage devices
(Diagram: Write Request → Primary Storage → Replication → Secondary Storage → Acknowledgement → Primary Storage → Write Complete)
17. Asynchronous Replication
Host receives "write complete" response from the storage after the data is successfully written to the primary storage device
(Diagram: Write Request → Primary Storage → Write Complete; Replication from Primary Storage to Secondary Storage happens separately)
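The only difference between the two slides is when the host gets "write complete". A toy model (hypothetical class, not any vendor's replication code) makes the consequence visible: after a hard failure, a synchronous pair keeps every acknowledged write, while an asynchronous pair can lose writes that were acknowledged but still queued.

```python
from collections import deque


class ReplicatedVolume:
    """Toy primary/secondary pair; mode is "sync" or "async" (hypothetical model)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("sync", "async"):
            raise ValueError("mode must be 'sync' or 'async'")
        self.mode = mode
        self.primary: list[str] = []
        self.secondary: list[str] = []
        self.pending: deque[str] = deque()  # async writes not yet replicated

    def write(self, block: str) -> str:
        self.primary.append(block)
        if self.mode == "sync":
            # Synchronous: the secondary has the block before we acknowledge
            self.secondary.append(block)
        else:
            # Asynchronous: acknowledge now, replicate later
            self.pending.append(block)
        return "write complete"

    def replicate_pending(self) -> None:
        """Drain the async queue to the secondary (a background task in practice)."""
        while self.pending:
            self.secondary.append(self.pending.popleft())

    def lose_primary_site(self) -> list[str]:
        """Hard failure: only what reached the secondary survives."""
        self.pending.clear()
        return list(self.secondary)


sync_vol = ReplicatedVolume("sync")
async_vol = ReplicatedVolume("async")
for vol in (sync_vol, async_vol):
    vol.write("A")
    vol.write("B")
async_vol.replicate_pending()
for vol in (sync_vol, async_vol):
    vol.write("C")  # acknowledged to the host in both modes

print(sync_vol.lose_primary_site())   # ['A', 'B', 'C']
print(async_vol.lose_primary_site())  # ['A', 'B']: "C" was acknowledged but lost
```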
18. Synchronous vs. Asynchronous
Synchronous | Asynchronous
No data loss | Potential data loss on hard failures
Requires high bandwidth/low latency connection | Enough bandwidth to keep up with data replication
Stretches over shorter distances | Stretches over longer distances
Write latencies impact application performance | No significant impact on application performance
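Why synchronous replication "stretches over shorter distances": each write waits at least one round trip to the secondary array. A back-of-the-envelope sketch, assuming light travels roughly 200 km per millisecond in fiber and ignoring switch, array, and routing overhead (real fiber paths are also longer than the straight-line distance), so these are lower bounds only:

```python
FIBER_KM_PER_MS = 200.0  # assumption: roughly 2/3 of c in glass


def min_added_write_latency_ms(distance_km: float) -> float:
    """Lower bound on the extra latency a synchronous write pays:
    one round trip to the secondary site (propagation delay only)."""
    return 2 * distance_km / FIBER_KM_PER_MS


for km in (10, 100, 1000):
    print(f"{km:>5} km -> at least {min_added_write_latency_ms(km):.1f} ms per write")
```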
19. What About DFS-Replication?
DFS-R performs replication on file close
Some file types stay open for a very long time
VHDs for Virtual Machines
Databases for SQL Server
Data could be lost during a failover if it had not yet replicated
Using DFS-R to replicate the cluster disk's data in a multi-site Failover Cluster is not supported
20. Resource Dependencies
Group determines the smallest unit of failover
Dependencies ("depends on") establish start order and timing
(Diagram: a resource group in which the workload resource (example: File Server) depends on a Network Name resource, which depends on the IP Address resource(s)*, and on a Disk resource, which depends on a custom resource that manages replication)
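"Establishes start order" is a topological sort of the dependency graph: a resource starts only after everything it depends on, and stops in the reverse order. A sketch using Python's `graphlib`; the resource names and edges are a hypothetical reading of the diagram, and this is not the cluster service's actual start logic.

```python
from graphlib import TopologicalSorter

# Hypothetical group: each resource maps to the resources it depends on.
dependencies = {
    "File Server": {"Network Name", "Disk"},
    "Network Name": {"IP Address"},
    "Disk": {"Replication (custom)"},
    "IP Address": set(),
    "Replication (custom)": set(),
}

# static_order() yields every resource after all of its dependencies.
start_order = list(TopologicalSorter(dependencies).static_order())
stop_order = list(reversed(start_order))
print("start:", start_order)
print("stop: ", stop_order)
```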
22. Network Considerations
Cluster nodes can reside in different subnets (2008/R2)
No need to connect nodes with VLANs
(Diagram: nodes in Site A and Site B on different subnets (10.10.10.1, 20.20.20.1, 30.30.30.1, 40.40.40.1), connected by a public network and a separate network)
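Whether two nodes share a subnet decides whether the SameSubnet* or the CrossSubnet* heartbeat settings apply between them. A quick check with Python's `ipaddress` module; the node names and the /24 prefix length are assumptions for illustration.

```python
import ipaddress
from itertools import combinations

# Hypothetical node addresses; the /24 prefix length is an assumption.
nodes = {
    "NodeA1": ipaddress.ip_interface("10.10.10.1/24"),
    "NodeB1": ipaddress.ip_interface("30.30.30.1/24"),
}


def same_subnet(a: ipaddress.IPv4Interface, b: ipaddress.IPv4Interface) -> bool:
    """True if both interfaces belong to the same network."""
    return a.network == b.network


for (n1, i1), (n2, i2) in combinations(nodes.items(), 2):
    kind = "same subnet" if same_subnet(i1, i2) else "cross-subnet"
    print(f"{n1} <-> {n2}: {kind}")
```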
23. Stretching the Network
Longer distance means greater network latency
Too many missed health checks can cause false failover
Fully configurable in 2008/R2
Failover Clustering has NO DISTANCE & NO SUBNET LIMITATIONS
Check if your vendor’s hardware / replication has limitations
SameSubnetDelay (default = 1 second; value set in milliseconds)
How often heartbeats are sent between nodes on the same subnet
SameSubnetThreshold (default = 5 heartbeats)
Number of missed heartbeats before an interface is considered down
CrossSubnetDelay (default = 1 second; value set in milliseconds)
How often heartbeats are sent between nodes on dissimilar subnets
CrossSubnetThreshold (default = 5 heartbeats)
Number of missed heartbeats before an interface to a node on a dissimilar subnet is considered down
Command Line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
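Delay and threshold multiply: an interface is declared down after roughly delay × threshold of heartbeat silence, which is the window to widen for a high-latency WAN. A small sketch; the tuned values in the second example are hypothetical, so check the allowed ranges for your release before changing anything.

```python
def detection_window_s(delay_ms: int, threshold: int) -> float:
    """Rough time of heartbeat silence before an interface is treated as down:
    heartbeat interval (ms) multiplied by the number of missed heartbeats allowed."""
    return delay_ms * threshold / 1000


print(detection_window_s(1000, 5))   # defaults (1 s x 5): 5.0
print(detection_window_s(2000, 10))  # hypothetical WAN-tuned values: 20.0
```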
24. Security Over the WAN
Improved Security
Prevent Clients from Connecting to Networks
Encrypt Intra-cluster Traffic
0 = clear text
1 = signed (default)
2 = encrypted
25. Enhanced Dependencies: OR
Network Name resource stays up if either IP Address Resource A OR IP Address Resource B is up
(Diagram: Network Name Resource depends on IP Address Resource A OR IP Address Resource B)
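26. Resource Dependencies
(Diagram: the workload resource (example: File Server) depends on the Network Name resource and a Disk resource; the Network Name depends on IP Address Resource A, which comes online on Site A, OR IP Address Resource B, which comes online on Site B; the Disk resource depends on a custom app resource that manages replication)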
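An "OR" dependency turns each resource's dependencies into AND-ed clauses whose members are OR-ed. A toy evaluator, assuming hypothetical resource names that follow the diagram; it only checks whether a resource could come online and does not model the cluster service itself.

```python
# Each resource lists AND-ed clauses; each clause is a set of OR-ed alternatives.
DEPENDS_ON: dict[str, list[set[str]]] = {
    "File Server": [{"Network Name"}, {"Disk"}],
    "Network Name": [{"IP Address A", "IP Address B"}],  # the 'OR' dependency
    "Disk": [{"Replication (custom)"}],
}


def can_be_online(resource: str, healthy: set[str]) -> bool:
    """A leaf resource is online if healthy; otherwise every clause
    needs at least one member that can be online."""
    clauses = DEPENDS_ON.get(resource)
    if clauses is None:
        return resource in healthy
    return all(any(can_be_online(r, healthy) for r in clause) for clause in clauses)


# After failover to Site B, only IP Address B can come online.
print(can_be_online("File Server", {"IP Address B", "Replication (custom)"}))  # True
print(can_be_online("File Server", {"Replication (custom)"}))                 # False
```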
28. DNS Updates
Nodes in dissimilar subnets
Failover changes the resource's IP address
Clients need the new IP address from DNS to reconnect
(Diagram: DNS Server 1 in Site A and DNS Server 2 in Site B, linked by DNS replication; the record FS = 10.10.10.111 is created, updated to FS = 20.20.20.222 after failover to Site B, replicated between the DNS servers, and then obtained by clients)
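How long can a client keep using the old address? In the worst case it refreshes its cache from its own DNS server just before the updated record replicates there, then keeps that stale record for a full TTL. A sketch of that upper bound; the 60-second replication delay is a hypothetical figure, and 1200 seconds is the default HostRecordTTL for a cluster network name.

```python
def worst_case_stale_s(ttl_s: int, dns_replication_s: int) -> int:
    """Upper bound on how long a client may keep using the old address after failover.

    Worst case: just before the updated record reaches the client's DNS server
    (dns_replication_s after the failover), the client refreshes its cache with
    the old record, which it then keeps for a full TTL.
    """
    return dns_replication_s + ttl_s


REPLICATION_S = 60  # hypothetical DNS replication delay between the sites
print(worst_case_stale_s(1200, REPLICATION_S))  # 1260
print(worst_case_stale_s(300, REPLICATION_S))   # 360
```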
29. Network Name Properties
RegisterAllProvidersIP (default = 0 for FALSE)
Determines whether all IP addresses for a Network Name are registered in DNS
TRUE (1): IP addresses are registered whether they are online or offline
Ensure the client application is set to try all IP addresses, so clients can reconnect faster
HostRecordTTL (default = 1200 seconds)
Controls how long clients cache the DNS record for a cluster network name
Shorter TTL: clients pick up updated DNS records sooner
Exchange Server 2007 recommends a value of five minutes (300 seconds)
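With all IP addresses registered, the client has to try each one instead of giving up on the first (now offline) address. A minimal sketch of that client-side loop; the connector is injectable so the example runs without a network (by default it uses `socket.create_connection`), and port 445 and the fake connector are assumptions for illustration.

```python
import socket
from typing import Callable, Iterable


def connect_any(addresses: Iterable[str], port: int,
                connect: Callable[[tuple[str, int], float], object] = socket.create_connection,
                timeout: float = 2.0) -> tuple[str, object]:
    """Try each registered IP in turn; return (address, connection) for the first that answers."""
    errors = []
    for addr in addresses:
        try:
            return addr, connect((addr, port), timeout)
        except OSError as exc:  # includes timeouts and refused connections
            errors.append(f"{addr}: {exc}")
    raise ConnectionError("no address reachable: " + "; ".join(errors))


def fake_connect(sockaddr: tuple[str, int], timeout: float) -> str:
    """Stand-in connector: the Site A address is offline after failover."""
    if sockaddr[0] == "10.10.10.111":
        raise TimeoutError("timed out")
    return f"connected to {sockaddr[0]}"


print(connect_any(["10.10.10.111", "20.20.20.222"], 445, fake_connect))
```

In a real client the address list would come from resolving the cluster network name, for example with `socket.getaddrinfo`.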
30. Local Failover First
Local failover first
No change in IP address
Cross-site failover for disaster recovery
(Diagram: FS = 10.10.10.111 in Site A changes to FS = 20.20.20.222 after a cross-site failover to Site B; DNS Server 1 and DNS Server 2 hold the record)
31. Failover Order
Preferred Owners
Local failover first
Possible Owners Always Enforced
Resource will not start on
non-possible owner
AntiAffinityClassNames
Groups with same AACN try to
avoid moving to same node
http://msdn.microsoft.com/en-us/library/aa369651(VS.85).aspx
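The three settings combine as: walk the preferred owners first (local site), fall back to other possible owners (remote site), never use a node outside the possible owners, and treat AntiAffinityClassNames as a soft preference. A toy sketch with hypothetical node names; it is not the cluster service's actual placement algorithm.

```python
from typing import Optional


def pick_failover_node(preferred: list[str], possible: set[str], online: set[str],
                       classes_on_node: dict[str, set[str]],
                       my_class: Optional[str] = None) -> Optional[str]:
    """Toy owner selection under the assumptions above."""
    # Possible owners are a hard filter; preferred owners set the order.
    candidates = [n for n in preferred if n in online and n in possible]
    # Remaining possible owners (e.g. the other site) are tried last.
    candidates += sorted(n for n in possible & online if n not in preferred)
    for node in candidates:
        # Anti-affinity: skip nodes already hosting a group with the same class name.
        if my_class is None or my_class not in classes_on_node.get(node, set()):
            return node
    # Anti-affinity is only a preference, so fall back to the first candidate.
    return candidates[0] if candidates else None


preferred = ["A1", "A2"]                # local site first
possible = {"A1", "A2", "B1", "B2"}     # remote site allowed for DR
print(pick_failover_node(preferred, possible, {"A2", "B1", "B2"}, {}))  # A2
print(pick_failover_node(preferred, possible, {"B1", "B2"}, {}))        # B1
```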
32. Virtual LAN (VLAN)
Deploying a VLAN minimizes client reconnection times
Can be harder to configure
Required for SQL & live migration
(Diagram: a VLAN stretched across Site A and Site B, so FS = 10.10.10.111 keeps the same address in both sites and the record on DNS Server 1 and DNS Server 2 does not change)
35. Quorum Overview
4 Quorum Types:
Node majority
Node and File Share majority
Node and Disk majority
Disk only (not recommended)
Majority is greater than 50%
Possible voters:
Nodes (1 each), Disk Witness (1 max), File Share Witness (1 max)
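"Majority is greater than 50%" as arithmetic: count the configured votes, then require strictly more than half of them. A sketch of that rule; it also shows why an even number of votes can tie and why a witness vote breaks the tie.

```python
def total_votes(nodes: int, disk_witness: bool = False, file_share_witness: bool = False) -> int:
    """1 vote per node, plus at most 1 for a disk witness or a file share witness."""
    if disk_witness and file_share_witness:
        raise ValueError("a quorum model uses a disk witness or a file share witness, not both")
    return nodes + int(disk_witness) + int(file_share_witness)


def has_quorum(votes_present: int, votes_total: int) -> bool:
    """Majority means strictly more than 50% of all votes."""
    return 2 * votes_present > votes_total


print(has_quorum(3, total_votes(5)))                           # True: 3 of 5
print(has_quorum(2, total_votes(4)))                           # False: 2 of 4 is a tie
print(has_quorum(3, total_votes(4, file_share_witness=True)))  # True: 3 of 5
```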
36. Node and Disk Majority
Nodes get 1 vote each and the disk witness gets 1 vote
Loss of the disk or a node is OK if a majority is maintained
Do not use in multi-site clusters unless directed by your vendor
(Diagram: node and disk witness votes, with the disk witness on replicated storage from a vendor)
37. Node Majority
Cross-site network connectivity broken!
5 Node Cluster: Majority = 3, with the majority in the primary site
Nodes in the primary site: Can I communicate with a majority of the nodes in the cluster? Yes, then stay up
Nodes in the other site: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership
(Diagram: Site A and Site B, each with its own SAN)
38. Node Majority
Disaster at Site 1 (the primary site): "We are down!"
5 Node Cluster: Majority = 3, with the majority in the primary site
Surviving nodes in the other site: Can I communicate with a majority of the nodes in the cluster? No, drop out of cluster membership
(Diagram: Site A and Site B, each with its own SAN)
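The two Node Majority scenarios come down to counting votes per group of sites that can still talk to each other. A sketch assuming the 5-node layout from the slides (3 nodes in the primary Site A, 2 in Site B):

```python
def partitions_with_quorum(votes_per_site: dict[str, int],
                           partitions: list[set[str]]) -> list[set[str]]:
    """Return the groups of mutually reachable sites that still hold a majority."""
    total = sum(votes_per_site.values())
    return [p for p in partitions
            if 2 * sum(votes_per_site[site] for site in p) > total]


five_nodes = {"Site A": 3, "Site B": 2}  # majority (3 of 5) in the primary site

# Slide 37: the cross-site link breaks, so each site only sees itself
print(partitions_with_quorum(five_nodes, [{"Site A"}, {"Site B"}]))  # [{'Site A'}]
# Slide 38: the primary site (Site A) is lost, so only Site B's 2 votes remain
print(partitions_with_quorum(five_nodes, [{"Site B"}]))              # []
```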
39. Forcing Quorum
Always understand why quorum was lost
Used to bring cluster online without quorum
Cluster starts in a special “forced” state
Once majority achieved, no more “forced” state
Command line:
net start clussvc /forcequorum (or /fq)
PowerShell (R2):
Start-ClusterNode -FixQuorum (or -fq)
40. Multi-Site With File Share Witness
Complete resiliency and automatic recovery from the loss of any 1 site
(Diagram: FooCluster1 with nodes in Site A and Site B, each site with its own SAN and replicated storage from a vendor, connected over a WAN to a File Share Witness in Site C)
42. Multi-Site With File Share Witness
Complete resiliency and automatic recovery from the loss of the File Share Witness
(Diagram: FooCluster1 with nodes in Site A and Site B, each site with its own SAN and replicated storage from a vendor, connected over a WAN to a File Share Witness in Site C)
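"Automatic recovery from the loss of any 1 site" can be checked by removing each site's votes in turn. A sketch assuming a hypothetical layout of 2 nodes in each main site plus the witness in Site C, and assuming the surviving sites can still reach each other:

```python
def survives_loss_of(lost_site: str, votes_per_site: dict[str, int]) -> bool:
    """True if the remaining sites (assumed mutually reachable) keep a majority."""
    total = sum(votes_per_site.values())
    return 2 * (total - votes_per_site[lost_site]) > total


# Hypothetical layout: 2 nodes in each main site plus the FSW in Site C
fsw_layout = {"Site A": 2, "Site B": 2, "Site C (FSW)": 1}
for site in fsw_layout:
    status = "stays up" if survives_loss_of(site, fsw_layout) else "loses quorum"
    print(f"lose {site}: cluster {status}")
```

Without the witness, the same 2 + 2 layout loses quorum when either site fails, because 2 of 4 votes is not a majority.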
43. FSW Considerations
Simple Windows File Server
Needs to be in the same forest
Running Windows Server® 2003, 2008 or 2008 R2
Recommended to be at 3rd separate site
Single file server can serve as a witness for multiple clusters
Each cluster requires its own share
Can be clustered in a second cluster
FSW cannot be on a node in the same cluster
It is an additional voter for free (almost)
45. Quorum Model Summary
No Majority: Disk Only
Not recommended
Only use as directed by vendor
Node and Disk Majority
Only use as directed by vendor
Node Majority
Odd number of nodes
Node and File Share Majority
Best availability solution
Recommended for Exchange Server 2007 CCR
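The summary reduces to a rule of thumb for the two recommended models: an odd node count can use Node Majority, and an even count adds a File Share Witness so the vote total is odd. A sketch of that simplification only; it ignores the disk-based models (use only as directed by your vendor) and the fact that a witness also improves availability for odd-node clusters.

```python
def suggest_quorum_model(node_count: int) -> str:
    """Rule of thumb: odd node count -> Node Majority; even -> add an FSW."""
    if node_count < 2:
        raise ValueError("a failover cluster needs at least 2 nodes")
    if node_count % 2 == 1:
        return "Node Majority"
    return "Node and File Share Majority"


for n in (3, 4, 5, 6):
    model = suggest_quorum_model(n)
    votes = n + (1 if model == "Node and File Share Majority" else 0)
    print(f"{n} nodes -> {model} ({votes} votes)")
```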
47. Cluster your Branch Offices
Cluster several standalone File Servers from branch offices
Keep network traffic low
High-Availability for the files
Redundancy for the data
(Diagram: clients in Site A primarily access applications in Site A; clients in Site B primarily access applications in Site B)
48. Multi-Site Across the Enterprise
More distributed cluster nodes and clusters give higher availability
Complete resiliency and automatic failover
Remember your quorum model
Loss of any single site should not bring down the cluster
File Share Witness
1 File Server hosts all File Share Witnesses for multiple clusters
Make it highly-available
Separate site
Not a node in that same cluster
(Diagram: Cluster 1 spans Site 1 and Site 2; Cluster 2 spans the Main Office, Branch 1 and Branch 2; Cluster 3 hosts many FSWs)
49. Multi-Site Clustering Review
(Diagram: Site A and Site B, each with its own SAN and replicated storage from a vendor, connected over a WAN to a File Share Witness in Site C)
4, 6, 8… nodes + FSW = odd number of votes
Local failover first (preferred owner)
Site failover second (possible owner)
AntiAffinityClassNames
Faster DNS updates
Register all IPs for a Network Name
Shorten client's DNS record TTL
Ensure application tries all IPs
Encrypt WAN traffic for security
Adjust health checks for latency
Configure 'OR' dependencies
50. Session Summary
Multi-Site Failover Clustering has many benefits
Variety of hardware options & configurations
Redundancy is needed everywhere
Understand your replication needs
Compare VLANs with multiple subnets
Plan your quorum model & nodes before deployment
Follow the checklist and best practices
http://technet.microsoft.com/en-us/library/dd197546.aspx
51. Are You Up For a Challenge?
Become a Cluster MVP!
Contact: ClusMVP@microsoft.com
Passion for High Availability?
53. Resources
www.microsoft.com/teched
Sessions On-Demand & Community
http://microsoft.com/technet
Resources for IT Professionals
http://microsoft.com/msdn
Resources for Developers
www.microsoft.com/learning
Microsoft Certification and Training Resources
54. Related Content
Breakout Sessions
WSV310 Failover Clustering Feature Roadmap for Windows Server 2008 R2
WSV313 Innovating High Availability with Cluster Shared Volumes (CSV)
WSV316 Multi-Site Clustering with Windows Server 2008 Enterprise
VIR311 From Zero to Live Migration. How to Set Up a Live Migration
DAT302 All You Need to Know about Microsoft SQL Server 2008 Failover Clusters
DAT306 Building a HA Strategy for Your Enterprise Using Microsoft SQL Server 2008
DAT322 Tips and Tricks for Successful Database Mirroring Deployments with Microsoft SQL Server
WSV311 High Availability and Disaster Recovery Considerations for Hyper-V
WSV315 Implementing Hyper-V on Clusters (High Availability)
UNC313 High Availability in Microsoft Exchange Server "14"
UNC402 Microsoft Exchange Server 2007 HA and Disaster Recovery Deep Dive
BOF52 Microsoft Exchange Server 2007 HA and Disaster Recovery: Are You Prepared?
Interactive Sessions
WSV01-INT Failover Clustering Unleashed with Windows Server 2008 R2
UNC02-INT Designing Microsoft Exchange Server "14" High Availability Solutions
Hands on Labs
WSV16-HOL Windows Server 2008 R2: Failover Clustering
VIR03-HOL Implementing Windows Server 2008 Hyper-V HA and Quick Migration
DAT12-HOL Microsoft SQL Server 2008 Database Mirroring, Part 1
DAT13-HOL Microsoft SQL Server 2008 Database Mirroring, Part 2
UNC12-HOL Microsoft Exchange Server "14" High Availability and Storage Scenarios
55. Track Resources
Cluster Team Blog: http://blogs.msdn.com/clustering/
Cluster Information Portal:
http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
Clustering Technical Resources:
http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx
Clustering Forum (2008):
http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
Clustering Forum (2008 R2): http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
Clustering Newsgroup: http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?dg=microsoft.public.windows.server.clustering
Failover Clustering Deployment Guide: http://technet.microsoft.com/en-us/library/dd197477.aspx
TechNet: Configure a Service or Application for High Availability:
http://technet.microsoft.com/en-us/library/cc732478.aspx
TechNet: Installing a Failover Cluster: http://technet.microsoft.com/en-us/library/cc772178.aspx
TechNet: Creating a Failover Cluster: http://technet.microsoft.com/en-us/library/cc755009.aspx
Webcast (2008 R2): Introduction to Failover Clustering:
http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407190&Culture=en-US
Webcast (2008 R2): HA Basics with Hyper-V:
http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407222&Culture=en-US
Webcast (2008 R2): Cluster Shared Volumes (CSV):
http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032407238&Culture=en-US
56. Windows Server Resources
Make sure you pick up your copy of Windows Server 2008 R2 RC from the Materials Distribution Counter
Learn More about Windows Server 2008 R2:
www.microsoft.com/WindowsServer2008R2
Technical Learning Center (Orange Section):
Highlighting Windows Server 2008 and R2 technologies
Over 15 booths and experts from Microsoft and our partners