It's 2 AM. We are sleeping well, and our mobile phone keeps ringing and ringing. The message: DISASTER! In this session (on slides) we will NOT talk about a potential disaster (as in BCM); we will talk about what is happening NOW and which tasks should have been finished BEFORE. Does it matter whether SQL Server is virtual or physical? We will talk about systems, databases, people, encryption, passwords, certificates and users. In a few demos I'll show which parts of our SQL Server environment are critical and how to be prepared for disaster. With some documents I'll show you how to be BEST prepared.
KoprowskiT_SQLSat219_Kiev_2AM-aDisasterJustbegan
1. 2 AM
A DISASTER JUST BEGAN…
Tobiasz Janusz Koprowski
Community Leader, SQL Server MVP
@KoprowskiT
2. ABOUT ME
Polish SQL Server User Group Leader
Microsoft Certified Trainer
MCP, MCSA, MLSS, MLSBS, MCTS, MCITP, MCT
SQL Server MVP (three years in a row)
Blogger, Influencer, Technical Writer
Last 8 years living in a Data Center in Wrocław
About 14 years in the IT/banking area overall
Speaker at SQL Server Community Launch, Time for
SharePoint, CodeCamps, SharePoint Community Launch,
CISSP Day, SQL in the City, InfoTRAMS, SQL Bits, SQL
Saturday, CareerCon, Sharepoint & SQL Connection, IT Camp,
Deep Dives Co-Author:
High availability of SQL Server in the context
of Service Level Agreements (Chapter 18)
3. 2:00 AM … In a dream…
Your best time for dreaming … is the best
time for Disaster
Your mobile phone is ringing and ringing…
And your husband / wife says…
4. 2:15 AM … in a car
What happened to my server?
When did I make the last backup?
Where is my backup?
Have I ever tried to restore?
If yes – I hope that everyone on the team (more about the team
soon) remembers how (do I?)
If not – who can help me NOW?
5. 2:40 AM … in a SERVER ROOM
$#$$@$^^#^&^@!#
Is Windows Server alive?
YES (thank all the saints)
or NOT (damn)
Who is responsible for it?
Is my SQL Server alive?
YES (so why are the phones ringing?)
or NOT (…)
Why am I responsible for it?
9. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Backups
about the type of backup (simple / full recovery model)
about the place where backup data is stored
about the backup window
about the procedure used for backup
about the backup tools
about the backup of "backup logs"
about the estimated time for executing a backup
about the REAL TIME of executing a backup
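The gap between the estimated time and the REAL TIME of a backup can be checked against the backup window with a few lines. This is only a rough sketch: the function names, the database size, the throughput and the window length are all hypothetical, and a real estimate should come from timed backup runs.

```python
from datetime import timedelta


def estimated_backup_duration(size_gb: float, throughput_mb_s: float) -> timedelta:
    """Estimate backup time from data size and sustained throughput."""
    seconds = size_gb * 1024 / throughput_mb_s
    return timedelta(seconds=seconds)


def fits_window(duration: timedelta, window: timedelta) -> bool:
    """True when the backup finishes inside the agreed backup window."""
    return duration <= window


# Hypothetical figures: a 500 GB database, 100 MB/s to the backup target,
# a 2-hour backup window, and a real run timed at 2 h 10 min.
window = timedelta(hours=2)
estimate = estimated_backup_duration(size_gb=500, throughput_mb_s=100)
measured = timedelta(hours=2, minutes=10)

print("estimate fits window:", fits_window(estimate, window))
print("real run fits window:", fits_window(measured, window))
```

In this made-up case the paper estimate fits the window while the measured run does not, which is the reason to record both numbers.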
10. BACKUP > extract from SOP*
A backup request should include the following information:
• Operating system and application versions covered by
online backup, and the updates installed for these components
• File backup policy, in particular:
the number of versions of a file to store
the storage time for each subsequent version of the file
the frequency of incremental backups, with a
proposed schedule for running them
• Online backup policy:
the storage time of a full backup and of each
incremental backup
the storage time of transaction log files
the frequency of full backups, with a proposed schedule
for running them
the frequency of transaction log backups
• Directory trees / files that should be omitted
or included during backup tasks (include / exclude list)
• Number and type / model of physical processors
• Whether the node will use a SAN connection to perform backups
11. BACKUP (registry) > extract from SOP*
This register contains information about the implemented backup plan.
File backup:
number of versions of a file stored in the backup
number of days that extra versions of a file are kept
number of versions of a file kept in the backup system after its removal from
the client device
number of days that the latest version of a file deleted from the client
device is kept
number of days that data is stored in the archive
Online backup:
number of full backups stored
number of incremental / differential / full backups stored in the backup
frequency of transaction log backups stored in the backup (for databases)
number of days the backups are kept in the online backup system
The list of nodes defined in the backup system:
Domain | Node name | IP address of the node
The list of defined backup tasks (schedules):
name of the task (schedule) | execution time
interval at which the task is repeated
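The first two file-backup fields of the register (number of versions stored, number of days extra versions are kept) can be read as a simple retention rule. The sketch below is one possible interpretation, not the SOP's actual logic: the class name, field names and the rule that the latest version is always kept are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class FileRetention:
    versions_kept: int       # number of versions of a file stored in the backup
    extra_version_days: int  # days that older (non-latest) versions are kept


def versions_to_keep(backup_dates: List[date], policy: FileRetention, today: date) -> List[date]:
    """Return the backup dates that survive the policy, newest first."""
    newest_first = sorted(backup_dates, reverse=True)
    kept = []
    for index, taken in enumerate(newest_first):
        if index >= policy.versions_kept:
            break  # version count limit reached
        age_days = (today - taken).days
        # Assumption: the latest version is always kept, older ones only while young enough.
        if index == 0 or age_days <= policy.extra_version_days:
            kept.append(taken)
    return kept


policy = FileRetention(versions_kept=3, extra_version_days=30)
dates = [date(2013, 7, 1), date(2013, 8, 1), date(2013, 9, 10), date(2013, 9, 20)]
print(versions_to_keep(dates, policy, today=date(2013, 9, 21)))
```

Writing the rule down this way makes it easy to see, before the disaster, which versions you can actually expect to find in the backup system.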
12. BEST PRACTICES BY BRENT OZAR
SQL Server Backup Best Practices | Written on October 17, 2007 by Brent Ozar in SQL Server >>
http://www.brentozar.com/archive/2007/10/backing-up-sql-server-my-own-mediocre-practices/
I’ve been backing up SQL Servers for almost a decade now, and it’s time to share the
lessons I’ve learned. All of this is my own opinion – your mileage may vary – but I’ll try to
explain the reasoning behind the choices I make. I won’t address log shipping or
snapshots this time around.
• Never back up databases to local disk.
• Back up databases to a fileshare, then back the share up to tape.
• Cost justify the network share with lower licensing costs & simpler backups.
• Back up to a different SAN if possible.
• My sweet spot for the backup array is raid 10 SATA.
• Backup agents like NetBackup and Backup Exec mean giving up scheduling control.
• Do regular fire drill rebuilds and restores.
• Build a standalone restore testbed.
• Keep management informed on restore time estimates.
• Trust no one.
13. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Restore
about the type of backup (simple / full recovery model)
about the place where backup data is stored
about the procedures of recovery
about the estimated time for recovery
about the REAL TIME for recovery
about the tools for recovery
about the Corporate Backup Manager
about the password for access to the library
14. RESTORE > extract from SOP*
Register of Recovery / Restore / Replacement Tests
This register contains information about tests of recovery and replacement of
part or all of the environment. It consists of the following fields:
the start and completion dates of the recovery test
the client for which the recovery test was performed
the servers involved in testing and replacement
the extent of testing and replacement
the person / persons performing the recovery test
the person on the client side who accepts the correctness of the recovery
test
actions following the recovery test
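The register fields above map naturally onto a small record type, and the start and completion dates give you the REAL TIME for recovery for free. A minimal sketch, assuming hypothetical field names and sample values (the client, server and person names are invented):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class RecoveryTestEntry:
    started: datetime        # start of the recovery test
    finished: datetime       # completion of the recovery test
    client: str              # client for which the test was performed
    servers: List[str]       # servers involved in testing and replacement
    scope: str               # extent of testing and replacement
    performed_by: List[str]  # person / persons performing the test
    accepted_by: str         # person on the client side accepting the result
    follow_up: str = ""      # actions following the test

    def duration(self) -> timedelta:
        """Measured recovery time for this test."""
        return self.finished - self.started


entry = RecoveryTestEntry(
    started=datetime(2013, 9, 21, 2, 0),
    finished=datetime(2013, 9, 21, 5, 30),
    client="ExampleClient",
    servers=["SQLPROD01"],
    scope="full restore of one database",
    performed_by=["DBA on duty"],
    accepted_by="Customer Administrator",
    follow_up="update the one-page restore procedure",
)
print(entry.duration())
```

Comparing `duration()` across entries shows whether restore times are drifting away from what management has been told.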
15. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Procedures
It is not about stored procedures!!!
It’s about storing procedures with answers to the following:
One piece of paper
How to start a restore
Who can help
How to carry out a restore
When we can finish
It MUST be simple
16. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Roles
Database Administrator
Windows Administrator
Backup Administrator
Network Administrator
Customer Key Account
Manager of division
Data Center Manager
Nightshift Operator - BOFH
Customer Administrator!!
17. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
PSO > USO > SLA
PSO Planned System Outages – Planned System
Unavailability
Minimum planned unavailability, due to the need to carry out
modernization work, install patches, or replace / extend
hardware
Agreed with / accepted by the client and not affecting the
provisions of the HA and SLA, until
...USO Unplanned System Outages – Unplanned System
Unavailability
An error that partially or totally prevents work in the environment in a
way that is tangible and measurable for the customer,
resulting in high costs if repairs are needed, as well as penalty
payments for not meeting the SLA
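Since planned outages are agreed with the client and do not count against the SLA, the achieved availability can be computed from unplanned outages only. The sketch below assumes exactly that contract rule; the function name and the outage figures are hypothetical.

```python
from typing import List, Tuple


def achieved_availability(period_minutes: float, outages: List[Tuple[float, bool]]) -> float:
    """Availability in percent; each outage is (minutes, planned).

    Assumption: only unplanned outages (USO) count against the SLA.
    """
    unplanned = sum(minutes for minutes, planned in outages if not planned)
    return 100.0 * (period_minutes - unplanned) / period_minutes


# Hypothetical 30-day month: one agreed 2-hour patch window (PSO)
# and one 43.2-minute failure (USO).
month_minutes = 30 * 24 * 60
outages = [(120, True), (43.2, False)]
print(round(achieved_availability(month_minutes, outages), 3))
```

If the same 120 minutes had not been agreed in advance, they would count as USO and the month would fall well below three nines.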
18. The Magic nines…
Availability % | Downtime per year | Downtime per month* | Downtime per week
90% | 36.5 days | 72 hours | 16.8 hours
95% | 18.25 days | 36 hours | 8.4 hours
98% | 7.30 days | 14.4 hours | 3.36 hours
99% | 3.65 days | 7.20 hours | 1.68 hours
99.5% | 1.83 days | 3.60 hours | 50.4 min
99.8% | 17.52 hours | 86.23 min | 20.16 min
99.9% ("three nines") | 8.76 hours | 43.2 min | 10.1 min
99.95% | 4.38 hours | 21.56 min | 5.04 min
99.99% ("four nines") | 52.6 min | 4.32 min | 1.01 min
99.999% ("five nines") | 5.26 min | 25.9 s | 6.05 s
99.9999% ("six nines") | 31.5 s | 2.59 s | 0.605 s
22. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Envelope
With ACTUAL!!! user names and passwords for:
Windows Server Administrator
SQL Server Administrator
SQL Server Agent
SQL Server Services (if you didn’t use the defaults)
SQL Server Applications Services
Backup accounts
23. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Hardware
Some of the hardware to keep for replacement:
Server
Motherboard
Memory (RAM)
Processor (CPU)
Network Adapter (LAN/NIC)
Fibre Channel Adapter
Hard Disk (IDE/SATA/SAS/SSD…)
RAID Controller
24. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Software
Windows
2000/2003/2003R2/2008/2008R2/2012
SP 1,2,3,4 + CU 1,2,3,…
Standard, Enterprise, Datacenter
x86, x64, ia64
SQL Server
6.5, 7.0, 2000, 2005, 2008, 2008R2, 2012
SP 1,2,3,4 + CU 1,2,3,4,5,6,7,8,9,10,11,12,13…
Drivers (servers, LAN card, video card)
AGENT ORANGE
25. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
Keys
Some keys which you need…
Serial keys
Rack keys
Server keys
Storage keys
Knife
Lighter
Phone
26. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
ENCRYPTION
If you use encryption (such as TDE)
TDE
Create the encryption key
Export the encryption key
Back up the encryption key
CA
Remember about the expiration date
BitLocker
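"Remember about the expiration date" is easy to automate once the expiry dates are written down. A minimal sketch, assuming you maintain your own inventory of certificates and their expiry dates; the certificate names, dates and the 30-day warning threshold are hypothetical.

```python
from datetime import date
from typing import Dict, List


def days_until_expiry(expires_on: date, today: date) -> int:
    """Days left before the certificate expires (negative if already expired)."""
    return (expires_on - today).days


def expiring_soon(certs: Dict[str, date], today: date, warn_days: int = 30) -> List[str]:
    """Names of certificates that expire within warn_days, or have already expired."""
    return sorted(
        name for name, expires_on in certs.items()
        if days_until_expiry(expires_on, today) <= warn_days
    )


# Hypothetical inventory kept next to the envelope with passwords.
inventory = {
    "TDE_Cert": date(2013, 10, 1),
    "Web_CA": date(2014, 6, 1),
}
print(expiring_soon(inventory, today=date(2013, 9, 21)))
```

The list itself belongs with the exported keys and their passwords, so that it is reachable when the server is not.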
27. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
TEAM
You can work with disaster as:
Team Member
Team Leader
Last Samurai
28. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
MANAGERS
hmm
46. BEST PRACTICE FOR SURVIVING DISASTER
THERE IS ONLY ONE: BE PREPARED
• Backups (and know-how about where they are stored and how to restore them)
• Procedures (the shorter the better | one page is best)
• Roles (who can help, who is necessary for access)
• SLA (90? 95? 99.99? how many minutes, hours or days you have to recover)
• Envelope (with user names and passwords for all important
accounts)
• Hardware (server, motherboard, CPU, RAM, LAN, HDD, SSD, USB)
• Software (Windows+SP+CU, SQL+SP+CU, drivers, AGENT ORANGE)
• Keys (serial numbers, physical keys, knife)
• ENCRYPTION (arrghhhhh!!! Certificates, keys, internal/external)
• TEAM (Team, leader, separate…)
• MANAGERS (hmmm)