Session from SQLBits 2008, covering:
Scopes of Protection in SQL Server 2005
SQL Server Backup Features and Technologies
SQL Server Disaster Recovery Features
SQL Server High Availability Features
SQL Server Data Distribution Features
Recoverability Scenarios Review
2. 01 March 2008 | Birmingham, United Kingdom
HIGH AVAILABILITY AND DISASTER RECOVERY OVERVIEW
Charley Hanania
B.Sc (Computing Science), MCP, MCDBA, MCITP, MCTS, MCT
Senior Database Specialist
Production Product Owner – MS SQL Server
UBS Investment Bank
General Overview
Scopes of Protection in SQL Server 2005
SQL Server Backup Features and Technologies
SQL Server Disaster Recovery Features
SQL Server High Availability Features
SQL Server Data Distribution Features
Recoverability Scenarios Review
Definitions
Scope of Protection
Object scope (or boundary) that can be protected from a
recoverability or availability standpoint.
Disaster Recovery
Processes, policies and procedures of restoring operations critical to
the resumption of business.
High Availability
The end users' ability to access the system. If a user cannot access
the system, it is said to be unavailable.
Data Distribution
Creating subsets or copies of application data in different locations
for system scalability, operational efficiency or manageability.
Outage / Downtime
Generally, the terms outage or downtime are used to refer to
periods when a system is unavailable.
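As a rough worked example of the availability and downtime definitions above, the sketch below converts an availability target into a downtime budget and back. The function names and the 99.9% target are illustrative assumptions, not figures from the session.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(target_pct, period_minutes=MINUTES_PER_YEAR):
    """Downtime budget implied by an availability target over a period."""
    return period_minutes * (1 - target_pct / 100)


def availability_pct(downtime_minutes, period_minutes=MINUTES_PER_YEAR):
    """Availability actually achieved, given the observed downtime."""
    return 100 * (1 - downtime_minutes / period_minutes)


if __name__ == "__main__":
    # Hypothetical target: "three nines" measured over one year.
    print(round(allowed_downtime_minutes(99.9), 1), "minutes of outage allowed")
    # Hypothetical observation: two hours of outage in the same year.
    print(round(availability_pct(120), 3), "% availability achieved")
```

The same arithmetic applies to any period (a month, a maintenance window), which helps when comparing features by their expected failover time.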
Scopes of Protection
Examples of scopes or conceptual objects
Database pages
Database objects (table, stored proc, function etc)
Files / Filegroups
Databases
Instances
Servers
Sites...
Others:
○ Transactions
○ Information consistency
data that is only meaningful when presented in a synchronised
fashion with other data
○ User experience
Protection: A Balancing Act...
Balancing protection against cost and function is something
you'll need to think about right from the start...
Key word: "Prioritise"
Decide what is important and what's nice to have
Look at costs
○ Time vs. component
○ Brand vs function
○ Consistency vs versatility
○ People vs automation
○ Technology vs manual processes
○ Etc
Look at effort
○ Homegrown vs prepackaged
○ Time to market/implementation timelines
Look at the return on the overall investment
○ Supportability
○ Fit For Use / Fit For Purpose
Backup
The underrated overachiever...
Recovery Models
Set the way that Transaction Logs are
maintained.
Affect the restoration possibilities.
Not to be confused with Backup types!
The basis of any database recovery strategy.
3rd Party Vendors have tools which can
extend backup and recovery possibilities.
Recovery Models - Full
Enables recovery to the most atomic level
Point in time
Marked Transaction
Log Sequence Number (LSN)
Recovery of a corrupt page possible as well.
Transaction Logs must be backed up explicitly;
they are not purged automatically.
A log backup purges the inactive portion of the
log after writing it to the backup.
Recovery Models – Bulk Logged
Designed as a companion to the Full Recovery Model.
Allows bulk operations such as BCP and BULK
INSERT to log less information to the Log.
Allows recovery to the end of a Transaction Log backup;
point in time recovery is not possible within a log backup
that contains bulk-logged operations.
Should be switched on and off when needed,
not used as a long term model.
Transaction Logs must be backed up explicitly.
Recovery Models - Simple
No point in time (etc) recovery.
Automatically purges the inactive log at checkpoints.
If the benefits of using log backups do not
justify the cost of managing the backups, MS
recommends that you use the simple recovery
model (Books Online)
Remember: entries are still written to the Log
during normal operations!
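The three recovery models can be summarised as a small lookup. This is a minimal sketch for comparison only: the class, field and function names are assumptions made for illustration, not SQL Server settings or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryModel:
    name: str
    log_backups_possible: bool         # can transaction log backups be taken?
    log_truncated_automatically: bool  # is the inactive log purged without a log backup?


FULL = RecoveryModel("FULL", True, False)
BULK_LOGGED = RecoveryModel("BULK_LOGGED", True, False)
SIMPLE = RecoveryModel("SIMPLE", False, True)


def point_in_time_possible(model, log_backup_has_bulk_ops=False):
    """Can a restore stop at an arbitrary time inside a given log backup?"""
    if not model.log_backups_possible:
        # Simple: restores can only go to the end of a full or differential backup.
        return False
    # A log backup holding minimally logged bulk operations must be restored whole.
    return not log_backup_has_bulk_ops


if __name__ == "__main__":
    for model in (FULL, BULK_LOGGED, SIMPLE):
        print(model.name, point_in_time_possible(model))
```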
Backup Types - Full
Backs up complete database.
Objects, files and data.
Portion of Transaction Log as well.
Around the same size as the used space
allocation.
Will allow recovery to the time that the
backup finished.
Is the basis of all other backup types for a
restore sequence.
Backup Types – Differential
Uses a bitmap internally to work out which
extents (groups of eight pages) have changed in the
database since the last full backup.
Generally much smaller than a full backup.
Size depends on the number of unique extents
changed.
When used with the full or bulk-logged
recovery model, can speed up point in time
recovery.
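The bitmap idea can be sketched with a toy change tracker. SQL Server's differential changed map works per extent (a group of eight pages); everything else here, including the class and method names, is an assumption for illustration and not the on-disk format.

```python
PAGES_PER_EXTENT = 8  # SQL Server groups eight 8 KB pages into one extent


class ChangeTracker:
    """Toy model of a changed-extent bitmap."""

    def __init__(self, total_pages):
        extents = -(-total_pages // PAGES_PER_EXTENT)  # ceiling division
        self.changed = [False] * extents

    def write_page(self, page_no):
        # Any write to a page flags the whole extent that contains it.
        self.changed[page_no // PAGES_PER_EXTENT] = True

    def full_backup(self):
        # A full backup resets the bitmap and becomes the new differential base.
        self.changed = [False] * len(self.changed)

    def differential_extents(self):
        # A differential backup copies the flagged extents and does not reset
        # them, so each differential is cumulative since the last full backup.
        return [i for i, dirty in enumerate(self.changed) if dirty]


if __name__ == "__main__":
    tracker = ChangeTracker(total_pages=64)
    for page in (3, 5, 17):
        tracker.write_page(page)
    print(tracker.differential_extents())
```

Pages 3 and 5 share an extent, which is why the size of a differential depends on how many distinct extents were touched rather than on how many writes occurred.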
Backup Types – Transaction Log
Backs up the log records written since the
last transaction log backup.
Allows for point in time (etc) recovery by
reapplying the transactions in order to the
point you want.
Needed for some of the DR solutions we’ll
discuss.
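How the backup types combine in a restore sequence can be sketched as follows. This is a simplified model under stated assumptions: backups are identified only by their completion time (real restores follow the LSN chain), and the `Backup` type and `restore_sequence` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str      # "full", "diff" or "log"
    finished: int  # completion time, e.g. minutes since midnight


def restore_sequence(backups, target):
    """Pick the backups to restore, in order, to reach time `target`."""
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup finished before the target time")
    base = max(fulls, key=lambda b: b.finished)
    plan = [base]

    # Only the most recent differential taken after the base is needed.
    diffs = [b for b in backups
             if b.kind == "diff" and base.finished < b.finished <= target]
    if diffs:
        plan.append(max(diffs, key=lambda b: b.finished))

    start = plan[-1].finished
    if start == target:
        return plan

    # Then every later log backup, in order, up to the first that covers the target.
    logs = sorted((b for b in backups if b.kind == "log" and b.finished > start),
                  key=lambda b: b.finished)
    for log in logs:
        plan.append(log)
        if log.finished >= target:
            return plan
    raise ValueError("log backups do not reach the target time")
```

With a full backup at 0, a differential at 120 and log backups every 60 minutes, a target of 200 needs the full, the differential and the log backups finished at 180 and 240; the last one would be restored with a stop-at time.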
Backup Types – File/Filegroup
Needs a full backup to have been taken first.
Useful for databases that have multiple
filegroups due to the size or complexity of the
system.
Is similar in concept to the differential
backup, focussed on the files or filegroups
within it.
Great way to save backup space and time
during recovery.
Fixing damaged pages using page restore
Disaster Recovery
All about the time and procedures needed to
restore normal operations.
No “one” solution covers all bases
Combining solutions is possible
Think of overall system, not just the
database.
Log Shipping Concepts
Scope: Database
Primary
The copy of the database that the applications connect to.
Secondary
The copy of the primary database on another instance.
Can be used to offload reporting needs as a read only copy.
Database is inaccessible while log backups are being restored to bring it up to the Primary’s level.
Monitor
Optional SQL instance.
Backup failure alert information.
When the transaction log was last backed up.
When the transaction log was last copied and restored.
Log Shipping - Method
Use a full backup to bring the secondary in
synch with primary initially.
Do not recover the database (restore it WITH NORECOVERY or WITH STANDBY)!
Three Steps:
Back up the Primary’s transaction log.
Copy the transaction log file to the secondary
server instance.
Restore the log backup to the secondary.
Uses SQL Server Agent to do this work.
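The three steps can be sketched as a loop over an in-memory model. This is only an illustration of the flow: the `Database` class, the `share` list standing in for the copy destination, and the function names are all assumptions. In SQL Server the steps run as SQL Server Agent jobs.

```python
class Database:
    """Toy database: `rows` is the data, `log` the changes not yet backed up."""

    def __init__(self):
        self.rows = []
        self.log = []

    def write(self, row):
        self.rows.append(row)
        self.log.append(row)


def backup_log(primary):
    """Step 1: back up the primary's log and clear the backed-up portion."""
    backup, primary.log = list(primary.log), []
    return backup


def copy_to_secondary(backup, share):
    """Step 2: copy the log backup to the secondary server."""
    share.append(list(backup))


def restore_on_secondary(share, secondary):
    """Step 3: restore every copied backup, oldest first, without recovering."""
    while share:
        secondary.rows.extend(share.pop(0))


def run_cycle(primary, secondary, share):
    copy_to_secondary(backup_log(primary), share)
    restore_on_secondary(share, secondary)


if __name__ == "__main__":
    primary, secondary, share = Database(), Database(), []
    primary.write("order 1")
    primary.write("order 2")
    run_cycle(primary, secondary, share)
    primary.write("order 3")  # not on the secondary until the next cycle
    print(secondary.rows)
```

The gap between cycles is the window of committed work that could be lost if the primary fails, which is why the schedule of the backup job matters.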
Log Shipping
Multiple copies
Trans scope maintained
Manual failover & reconfig
No distance limits
Database Mirroring Concepts
Principal
The copy of the database that the applications connect to.
The server that hosts it is known as the principal server.
Mirror
The copy of the principal database.
Always in a restoring state;
not accessible to the applications.
The server that hosts the mirror database is known as the mirror
server.
Witness
Optional SQL instance.
Separate from the principal and mirror instances.
When used in synchronous mode, provides automatic failover.
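The witness's role in automatic failover can be sketched as a quorum check. This is a minimal sketch, assuming each server only knows which partners it can currently reach; the function and parameter names are hypothetical and do not correspond to SQL Server settings.

```python
def can_fail_over_automatically(synchronous, synchronized,
                                mirror_sees_principal, mirror_sees_witness,
                                witness_sees_principal):
    """Return True if the mirror may take over the principal role by itself."""
    if not (synchronous and synchronized):
        # An asynchronous or lagging mirror could lose committed work.
        return False
    if mirror_sees_principal:
        # The principal is still reachable, so there is nothing to fail over from.
        return False
    # Mirror and witness must agree that the principal is gone:
    # together they form a quorum of two out of three.
    return mirror_sees_witness and not witness_sees_principal


if __name__ == "__main__":
    # Principal lost, mirror and witness still connected to each other.
    print(can_fail_over_automatically(True, True, False, True, False))
    # Only the mirror is cut off; principal and witness keep the quorum.
    print(can_fail_over_automatically(True, True, False, True, True))
```

Requiring two of the three servers to agree is what prevents both partners from serving the database at once when only the link between them fails.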
Send Queue
Located at the principal
Used if the log records can’t be sent at the rate at which they are
generated.
It exists entirely in the transaction log.
Redo Queue
Located at the mirror
Used if the log records can’t be applied at the rate at which they
are received.
Does not use extra storage or memory.
Exists entirely in the transaction log of the mirror.
It is the part of the hardened log that remains to be applied to the
mirror database to roll it forward.
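The two queues can be read as differences between positions in the log. A minimal sketch, assuming cumulative log volumes in KB at each stage; the function and parameter names are illustrative, not SQL Server counters.

```python
def mirroring_queues(generated_kb, sent_kb, hardened_kb, applied_kb):
    """Return (send_queue_kb, redo_queue_kb) from cumulative log positions."""
    if not generated_kb >= sent_kb >= hardened_kb >= applied_kb >= 0:
        raise ValueError("log positions must not increase along the pipeline")
    send_queue = generated_kb - sent_kb    # still waiting in the principal's log
    redo_queue = hardened_kb - applied_kb  # hardened on the mirror, not yet redone
    return send_queue, redo_queue


if __name__ == "__main__":
    send_q, redo_q = mirroring_queues(generated_kb=1000, sent_kb=900,
                                      hardened_kb=900, applied_kb=750)
    print(send_q, redo_q)
```

A growing send queue means more work is at risk if the principal is lost; a growing redo queue means a longer wait before the mirror can come online after failover.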
Endpoint
SQL Server object that enables SQL Server to communicate
over the network.
It encapsulates a transport protocol and a port number.
Failover
When the principal database (or the server hosting it) fails, the mirror takes over the principal role.
Database Mirroring
Less than three seconds
Zero committed work lost
Maximum one mirror per DB
Transparent client redirect
Mirroring with Failover (Process)
[Diagram sequence over three slides: Clients, Principal Server, Mirror Server, Witness Server]
Database Snapshots
Historical data snapshots
Safeguards against user / admin error
Doesn’t protect against disk errors / other corruptions
Database Snapshot Scenarios
Mirroring for reporting
Point-in-time reporting
Recover from administrative error
Protection from application or user error
High Availability
Minimise the risk to the organisation of H/W and S/W
failures
Try to minimise the downtime (perceived downtime)
Each H/A solution caters for different risks
No “one” solution covers all bases
Combining solutions may be best approach
Careful planning/costing is advisable
Changes may be needed in your application
True H/A incorporates availability and performance
models on all levels, business process, H/W,
Application, N/W and service (employees)
Clustering Concepts
Main Instance
Services the application.
Hot Standby Instance
Takes over database operations when the main
server Instance fails.
Quorum
Tells the cluster which node should be active.
Intervenes when communications fail between nodes.
Failover
When the standby instance takes over from the main
instance.
Failover Clustering
~20-second failover
Eight nodes
Zero committed work lost
More SQL services supported
Implementing Failover Clustering
[Diagram labels: Clients, Virtual Instance, Clustered Servers, Heartbeat Network, Private Network, Shared Disk Array]
Effective but expensive
Replication Concepts
Publisher
“Publishes” to other locations through replication.
Many publications possible of logically related sets of objects and data.
Subscriber
Receives replicated data from one or more publishers and publications.
Depending on the type of replication, can also pass changes back or republish to
other Subscribers.
Distributor
Stores replication specific data associated with one or more Publishers.
○ Replication status.
○ Publication Metadata.
○ Acts as a queue.
Publications and Articles
A publication is comprised of many logically related articles (the objects being replicated).
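The publisher, distributor and subscriber roles can be sketched with the distributor as a per-subscriber queue. This is a toy model under stated assumptions: the `Distributor` class and its methods are hypothetical, and it ignores publications, subscription types and the distribution database that a real distributor uses.

```python
from collections import deque


class Distributor:
    """Holds changes from a publisher until each subscriber has received them."""

    def __init__(self):
        self.queues = {}  # subscriber name -> pending (article, change) pairs

    def subscribe(self, name):
        self.queues[name] = deque()

    def publish(self, article, change):
        # Every subscriber gets its own copy of the change, tagged by article.
        for pending in self.queues.values():
            pending.append((article, change))

    def deliver(self, name):
        # Drain and return everything waiting for one subscriber, in order.
        pending = self.queues[name]
        delivered = list(pending)
        pending.clear()
        return delivered


if __name__ == "__main__":
    distributor = Distributor()
    distributor.subscribe("zurich")
    distributor.subscribe("sydney")
    distributor.publish("orders", "insert 1")
    distributor.publish("orders", "insert 2")
    print(distributor.deliver("zurich"))
```

Because each subscriber drains its own queue, a slow or disconnected subscriber falls behind without blocking the publisher or the other subscribers.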
Replication
Provides hot standby
No distance limits
No conflict detection
Single table to entire database
Some committed data loss
Fault Tolerance with Peer-to-Peer
[Diagram: peer nodes in Los Angeles, Zurich and Sydney]
Load Balancing with Peer-to-Peer
[Diagram labels: Read / Write Load Balancing, Read-Only Load Balancing, Application Server (two), Replication; key: Read, Write]
Comparison of High Availability Options
Standby categories: Hot Standby, Warm Standby
Feature | Database Mirroring | Failover Clustering | Peer-to-Peer | Transactional Replication | Log Shipping
Data Loss | No data loss option | No data loss | Some data loss possible | Some data loss possible | Some data loss possible
Automatic Failover | Yes | Yes | Optional | No | No
Transparent to Client | Yes, Auto-Redirect | Yes, Reconnect to same IP | Optional | No, NLB helps | No, NLB helps
Downtime | < 3 Seconds | 20 Sec + DB Recovery | None | Seconds | Seconds + DB Recovery
Standby Read Access | Continuously accessible Snapshot | No | Continuously accessible | Continuously accessible | Intermittently accessible
Data Granularity | Database Only | All System and User Databases | Table or View | Table or View | Database Only
Masks Disk Failure | Yes | No, Shared Disk | Yes | Yes | Yes
Special Hardware Needed | No, Dup. system needed | Specialized Hardware from Cluster HCL | No, Dup. system needed | No, Dup. system needed | No, Dup. system needed
Complexity | Medium | Medium/High | High | High | Medium
Microsoft TechNet 2005
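For the scenario reviews, the comparison above can be turned into a simple filter. This is a sketch under stated assumptions: the dictionary encodes only the Data Loss and Automatic Failover rows, the names of the third and fourth options follow one reading of the flattened slide header, and the `candidates` function is hypothetical.

```python
# Two rows of the comparison, encoded per option.
OPTIONS = {
    "Database Mirroring":        {"auto_failover": "Yes",      "no_data_loss": True},
    "Failover Clustering":       {"auto_failover": "Yes",      "no_data_loss": True},
    "Peer-to-Peer":              {"auto_failover": "Optional", "no_data_loss": False},
    "Transactional Replication": {"auto_failover": "No",       "no_data_loss": False},
    "Log Shipping":              {"auto_failover": "No",       "no_data_loss": False},
}


def candidates(need_auto_failover=False, need_no_data_loss=False):
    """List the options that satisfy the stated requirements."""
    result = []
    for name, features in OPTIONS.items():
        if need_auto_failover and features["auto_failover"] == "No":
            continue
        if need_no_data_loss and not features["no_data_loss"]:
            continue
        result.append(name)
    return result


if __name__ == "__main__":
    # Hypothetical requirement: loss of data unacceptable, failover automatic.
    print(candidates(need_auto_failover=True, need_no_data_loss=True))
```

A filter like this only narrows the field; cost, complexity and the scope being protected still decide between the remaining options.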
Recoverability Scenario Review
24x7 Customer facing web portal
Customers connecting globally
Order entry
Order tracking
1000’s transactions per hour
System must be available
Outages for ordering not acceptable
Tracking can endure 2-3 hrs outage
Discuss...
Recoverability Scenario Review
Internal line of business app
100’s of users
Management Reporting Critical
Employees find that screens freeze at times
○ Root cause found to be management running reports
Small outages ok
Loss of data unacceptable.
Some paper trails exist but not always reliable
Discuss...
Recoverability Scenario Review
Credit Card Risk System
1,000,000’s of transactions per minute.
Analyses out of ordinary spend & approves or
rejects.
Data comes from feeder systems updated in
batches.
System unavailability poses risk as shops
may decide to lodge manually.
Discuss...
Recoverability Scenario Review
Point of Sale system
Interfaces with Risk system.
Must guarantee that transactions are not lost if
completed.
1,000,000’s of transactions per minute.
Outages OK, as manual methods possible.
Unavailability preferable to missing
transactions.
Discuss...
Using piecemeal restore
SQL DR Resources
Description of disaster recovery options for
Microsoft SQL Server
http://support.microsoft.com/kb/822400
SQL Server Disaster Recovery and
Availability (MSDN forums)
https://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=744&SiteID=1
SQL HA Resources
SQL Server High Availability Site
http://www.microsoft.com/sql/technologies/highavailability/default.mspx
SQL Server Mission Critical High Availability
http://www.microsoft.com/technet/prodtechnol/sql/themes/high-availability.mspx
Why Consider a Service-Oriented Database
Architecture for Scalability and Availability
http://www.microsoft.com/sql/techinfo/whitepapers/why-soda.mspx
Additional Resources
Books Online has a wealth of information! (Sept 07)
http://msdn2.microsoft.com/en-us/library/ms130214.aspx
Database Mirroring in SQL Server 2005
http://www.microsoft.com/technet/prodtechnol/sql/2005/dbmirror.mspx
Always on Technologies
http://www.microsoft.com/sql/alwayson/default.mspx
Performing Piecemeal Restores
http://msdn2.microsoft.com/en-us/library/ms177425.aspx
Resources - Webcasts
Additional Resources - Virtual Labs
Additional Resources - Virtual Machines
Resources - Community
Swiss PASS Chapter
www.sqlpass-swiss.org
Swiss IT Pro User Group
www.swissitpro.ch
Monthly sessions in Zurich and Geneva
European PASS Conference 2008
http://www.european-pass-conference.com/default.aspx
Thank You