SQL Azure Database Overview

SQL Azure Database

SQL Azure: one or more databases.
[Diagram: several applications, each connected to one or more databases]
Applications use the standard SQL data-access libraries: ODBC, ADO.NET, PHP, …

Implementation
[Diagram: Internet → TDS (tcp) → load balancer (LB) → TDS (tcp) → gateways → TDS (tcp) → SQL back-end nodes]
• The load balancers spread the load across the TDS gateways, taking session affinity into account
• Gateway: TDS protocol gateway, enforces AUTHN/AUTHZ policy; proxy to backend SQL
• Scalability and Availability: Fabric, Failover, Replication, and Load balancing

SQL Azure
SQL Server in the cloud, with its advantages:
• Simple provisioning
  • Via the portal
  • Via the REST API
• High availability
• Load balancing
• TDS protocol (the same as SQL Server) for everything else, over SSL (encrypted)

Differences from SQL Server
• No access to anything physical (filegroups, …)
• No CLR
• No distributed transactions
• No Service Broker

Developing with SQL Azure
• Implement a retry policy
• Bandwidth is billed, so use the following whenever possible:
  • Lazy loading
  • Caching
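A retry policy can be sketched as follows. This is a minimal illustration in Python, not the deck's own code: `TransientError`, `with_retry` and the backoff parameters are assumptions made for the example, and a real application would map its driver's transient error codes onto the retried exception.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. a dropped connection)."""


def with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles on each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the policy can be exercised without actually waiting.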

Windows Azure Storage
• Cloud Storage - Anywhere and anytime access
  • Blobs, Disks, Tables and Queues
• Highly Durable, Available and Massively Scalable
  • Easily build “internet scale” applications
  • 8.5 trillion stored objects
  • 900K requests/sec on average (2.3+ trillion per month)

• Pay for what you use
• Exposed via easy and open REST APIs
• Client libraries in .NET, Java, Node.js, Python, PHP, Ruby

Abstractions – Blobs and Disks


Abstractions – Tables and Queues

Blobs
http://<account>.blob.core.windows.net/<container>/<blobname>
[Diagram: Account → Container → Blob]
Account: cohowinery
  Container: images
    Blobs: PIC01.JPG, PIC02.JPG
  Container: videos
    Blob: VID1.AVI
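The account/container/blob addressing scheme can be illustrated with a small helper. This is a sketch in Python; the function name `blob_url` is an assumption for the example, while the host pattern and the sample names come from the slide.

```python
from urllib.parse import quote


def blob_url(account, container, blob_name, scheme="https"):
    """Build <scheme>://<account>.blob.core.windows.net/<container>/<blobname>."""
    # Percent-encode the blob name but keep "/" so virtual folder paths survive.
    encoded = quote(blob_name, safe="/")
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{encoded}"


print(blob_url("cohowinery", "images", "PIC01.JPG"))
# prints https://cohowinery.blob.core.windows.net/images/PIC01.JPG
```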

Blob Storage
For storing your files, small or very large
• Block blobs for image files, video, etc.: 200 GB max
• Page blobs, optimized for fast reads and writes: 1 TB max
• Azure Drives: an NTFS disk that you can “mount” in your role and that is automatically backed up to a page blob
• CDN with smooth streaming for video
• Blobs live in containers
• Public or private access
• Snapshot
• Shared access signature
• Lease

Table Storage
• A single index: the PartitionKey/RowKey pair
• Transactions are possible within a single partition
• OData + authentication
• Open-source .NET SDK: https://github.com/WindowsAzure/azure-sdk-for-net
• REST API
• Non-relational table
• Flexible schema (several schema versions can coexist in the same table)
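The single index, the per-partition transaction rule and the flexible schema can be modelled in a few lines. This is an in-memory sketch in Python, not the storage SDK: the `Table` class and its method names are assumptions made for illustration.

```python
class Table:
    """In-memory stand-in for a table with a single (PartitionKey, RowKey) index."""

    def __init__(self):
        self._rows = {}  # (partition_key, row_key) -> dict of properties

    def insert(self, entity):
        key = (entity["PartitionKey"], entity["RowKey"])
        if key in self._rows:
            raise KeyError(f"entity {key} already exists")
        self._rows[key] = dict(entity)

    def get(self, partition_key, row_key):
        """Point query: the only lookup served directly by the index."""
        return self._rows.get((partition_key, row_key))

    def batch_insert(self, entities):
        """All-or-nothing insert, allowed only within a single partition."""
        partitions = {e["PartitionKey"] for e in entities}
        if len(partitions) != 1:
            raise ValueError("a batch must target exactly one PartitionKey")
        keys = [(e["PartitionKey"], e["RowKey"]) for e in entities]
        if len(set(keys)) != len(keys) or any(k in self._rows for k in keys):
            raise KeyError("batch rejected: duplicate or existing key")
        for entity in entities:  # validation passed, so apply everything
            self._rows[(entity["PartitionKey"], entity["RowKey"])] = dict(entity)
```

Entities in the same table need not share properties: one row may carry a `Name`, another a `Vintage`, which is how several schema versions can coexist.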

Queue typical usage
[Diagram: Web Role (ASP.NET, WCF, etc.) → Queue → Worker Role (main() { … })]
1) Receive work (Web Role)
2) Put message in queue (Web Role)
3) Get message from queue (Worker Role)
4) Do work (Worker Role)
5) Delete message from queue (Worker Role)
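The five steps can be traced with an in-memory queue. This is a sketch in Python under stated assumptions: the `Queue` class, the explicit `now` clock and the 30-second `visibility_timeout` are illustrative and are not the storage client API. The point it shows is that getting a message only hides it; it disappears for good only when the worker deletes it.

```python
import itertools


class Queue:
    """In-memory stand-in for a queue whose messages are hidden, not removed, on get."""

    def __init__(self):
        self._messages = {}  # id -> [body, time at which it becomes visible again]
        self._ids = itertools.count(1)

    def put(self, body):
        self._messages[next(self._ids)] = [body, 0.0]

    def get(self, now, visibility_timeout=30.0):
        """Return (id, body) of a visible message and hide it, or None if none is visible."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)


queue = Queue()
queue.put("resize PIC01.JPG")        # 2) the web role puts a message in the queue
msg_id, body = queue.get(now=0.0)    # 3) the worker role gets the message
result = body.upper()                # 4) do the work (placeholder)
queue.delete(msg_id)                 # 5) delete the message once the work succeeded
```

If the worker crashed before step 5, the message would become visible again once the timeout elapsed and another worker could pick it up.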

Windows Azure Data Storage Concepts
Account
• Container → Blobs: https://<account>.blob.core.windows.net/<container>
• Table → Entities: https://<account>.table.core.windows.net/<table>
• Queue → Messages: https://<account>.queue.core.windows.net/<queue>

How is Azure Storage used by Microsoft?

Design Goals
Highly Available with Strong Consistency
• Provide access to data in face of failures/partitioning

Durability
• Replicate data several times within and across regions
Scalability
• Need to scale to zettabytes
• Provide a global namespace to access data around the world
• Automatically scale out and load balance data to meet peak traffic demands

• Additional details can be found in the SOSP paper:
  • “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011

Windows Azure Storage Stamps
Access blob storage via the URL: http://<account>.blob.core.windows.net/

[Diagram: the Storage Location Service and data access sit above two storage stamps. Each stamp has a load balancer (LB), Front-Ends, a Partition Layer and a DFS Layer with intra-stamp replication; inter-stamp (geo) replication links the two stamps.]

Architecture Layers inside Stamps

[Diagram labels: Partition Layer, Index]

Availability with Consistency for Writing

All writes are appends to the end of a log, which is an append to the last extent in the log
• Write Consistency across all replicas for an extent:
  • Appends are ordered the same across all 3 replicas for an extent (file)
  • Only return success if all 3 replica appends are committed to storage
  • When an extent gets to a certain size, or on write failure/LB, seal the extent’s replica set and never append any more data to it
• Write Availability: To handle failures during write
  • Seal extent’s replica set
  • Append immediately to a new extent (replica set) on 3 other available nodes
  • Add this new extent to the end of the partition’s log (stream)
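The seal-and-continue behaviour on a write failure can be sketched as a toy model. This is Python written for illustration only: `Extent`, `Stream`, the node names and the `failed_nodes` argument are assumptions, and the model ignores extent size limits and failures on the replacement nodes.

```python
class Extent:
    """One extent replicated on three nodes; sealed extents accept no more appends."""

    def __init__(self, nodes):
        self.replicas = {node: [] for node in nodes}
        self.sealed = False


class Stream:
    """A partition's log: an ordered list of extents, only the last one writable."""

    def __init__(self, available_nodes):
        self.available = list(available_nodes)
        self.extents = [Extent(self._take(3))]

    def _take(self, count):
        if len(self.available) < count:
            raise RuntimeError("not enough available nodes for a new replica set")
        taken, self.available = self.available[:count], self.available[count:]
        return taken

    def append(self, record, failed_nodes=()):
        extent = self.extents[-1]
        if any(node in failed_nodes for node in extent.replicas):
            # Write failure: seal the replica set, then continue on 3 other nodes.
            extent.sealed = True
            extent = Extent(self._take(3))
            self.extents.append(extent)
        for log in extent.replicas.values():
            log.append(record)  # same record, same order, on every replica
        return True  # success is reported only after all three appends
```

Because a sealed extent is never appended to again, its three replicas stay identical, which is what later allows a read from any replica.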

Availability with Consistency for Reading
• Read Consistency: Can read from any replica, since data in each replica for an extent is bitwise identical
• Read Availability: Send out parallel read requests if the first read is taking longer than the 95th-percentile latency
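The decision to send a parallel read can be expressed as a small rule. This is a sketch in Python under assumptions: `percentile` uses the nearest-rank definition, and `should_hedge` and the sample latencies are invented for the example; the slides do not say how the threshold is computed.

```python
def percentile(samples, pct):
    """Nearest-rank percentile (integer pct) of a non-empty list of latency samples."""
    ordered = sorted(samples)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


def should_hedge(elapsed_ms, recent_latencies_ms, pct=95):
    """True when the in-flight read has already outlasted the pct-th percentile."""
    return elapsed_ms > percentile(recent_latencies_ms, pct)


recent = [10] * 19 + [200]          # hypothetical recent read latencies in ms
print(should_hedge(15, recent))     # the read is slower than 95% of recent reads
```

Since every replica of an extent is bitwise identical, whichever parallel read returns first can be used.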

Dynamic Load Balancing – Partition Layer
• Spreads index/transaction processing across partition servers
  • Master monitors traffic load/resource utilization on partition servers
  • Dynamically load balance partitions across servers to achieve better performance/availability
• Does not move data around, only reassigns what part of the index a partition server is responsible for

Dynamic Load Balancing – DFS Layer
• DFS Read load balancing across replicas
  • Monitor latency/load on each node/replica; dynamically select what replica to read from and start additional reads in parallel based on 95% latency

Architecture Summary

• Durability: All data stored with at least 3 replicas
• Consistency: All committed data across all 3 replicas are identical
• Availability: Can read from any of the 3 replicas; if there are any issues writing, seal the extent and continue appending to a new extent
• Performance/Scale: Retry based on 95% latencies; Auto scale out and load balance based on load/capacity
• Additional details can be found in the SOSP paper:
  • “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011

What’s Coming by end of 2013
• Geo-Replication
  • Queue Geo-Replication
  • Secondary Read-Only Access
• Windows Azure Import/Export
• Real-Time Metrics for Blobs, Tables and Queues
• CORS for Azure Blobs, Tables and Queues
• JSON for Azure Tables
• New .NET 2.1 Library

Two Types of Durability Offered
• Local Redundant Storage Accounts
  • Maintain 3 copies of data within a given region
  • ~ 2/3 price of Geo Redundant Storage
• Geo Redundant Storage Accounts
  • Maintain 6 copies of data spread over 2 regions at least 400 miles apart from each other (3 copies are kept at each region)

Geo Redundant Storage
• Data geo-replicated across regions 400+ miles apart
  • Provide data durability in face of potential major regional disasters
  • Provided for Blob, Tables and Queues (NEW)
• User chooses primary region during account creation
  • Each primary region has a predefined secondary region
• Asynchronous geo-replication
  • Off critical path of live requests
[Map: geo-replication between paired regions: North Central US and South Central US, North Europe and West Europe, South East Asia and East Asia, West US and East US]

Geo-Rep & Geo-Failover
[Diagram: a DNS lookup of the hostname http://account.blob.core.windows.net/ in Azure DNS returns the IP address of the primary (East US); data access goes to East US, which geo-replicates to West US. On failover, DNS is updated so that account.blob.core.windows.net resolves to West US.]
• Existing URL works after failover
• Failover Trigger: failover would only be used by MS if primary could not be recovered
• Asynchronous Geo-replication: may lose recent updates during failover
• Typically geo-replicate data within minutes, though no SLA for how long it will take

Geo Redundant Storage Roadmap
• Customer Controlled Failover (Future)
  • Provide APIs to allow clients to switch the primary and secondary regions for a storage account

• Queue Geo-Replication (Done)
• Secondary Read Only Access (by end of CY13)

Secondary Read-Only Access – Scenarios
• Read-only access to data even if primary is unavailable
  • Access to an eventually consistent copy of the data in the other region
• Provides another read source for geographically distributed applications/customers
  • Allows lower latency access to data in secondary region
  • Have compute at both primary and secondary region and use the storage stored in that region
• For these, the application semantics need to allow for eventually consistent reads

Secondary RO Access – How it Works

• Customers using Geo Redundant Storage can opt to have read-only access to the eventually consistent copy of the data on the secondary tenant
  • Customer chooses the primary region, and the secondary region is fixed
• Get two endpoints for accessing your storage account
  • Primary endpoint: accountname.<service>.core.windows.net
  • Secondary endpoint: accountname-secondary.<service>.core.windows.net
• Same storage keys work for both endpoints
• Consistency
  • All Writes go to the Primary
  • Reads to Primary are Strongly Consistent
  • Reads to Secondary are Eventually Consistent

Secondary RO Access – Capabilities
• Application will be able to control which location they read data from
  • Use one of the two endpoints
    • Primary: accountname.<service>.core.windows.net
    • Secondary: accountname-secondary.<service>.core.windows.net
  • Our client library SDK will provide features for reading from the secondary
    • PrimaryOnly, SecondaryOnly, PrimaryThenSecondary, etc.
• Application will be able to query the current max geo-replication delay for each service (blob, table, queue) for a storage account
• There will be separate storage account metrics for primary and secondary locations
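A PrimaryThenSecondary read can be sketched as follows. This is Python written for illustration, not the client library: `endpoints`, `read_primary_then_secondary` and the caller-supplied `fetch` function are assumptions; only the two host name patterns come from the slides.

```python
def endpoints(account, service):
    """Return (primary, secondary) host names for blob, table or queue."""
    primary = f"{account}.{service}.core.windows.net"
    secondary = f"{account}-secondary.{service}.core.windows.net"
    return primary, secondary


def read_primary_then_secondary(account, service, fetch):
    """Try the primary; on failure fall back to the eventually consistent secondary.

    fetch is a caller-supplied function (hypothetical here) that takes a host
    name and either returns the data or raises OSError.
    """
    primary, secondary = endpoints(account, service)
    try:
        return fetch(primary), "primary"
    except OSError:
        return fetch(secondary), "secondary"
```

Returning the source alongside the data lets the caller know when it is looking at a copy that may lag behind the primary.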

Windows Azure Import/Export
• Move TBs of data into and out of Windows Azure Blobs by shipping disks

Import/Export Features
• Accessible via REST with Portal integration
• Each Job imports/exports data for a single storage account
  • Each Job can be up to 10 disks

• Support 3.5” SATA HDDs
• All Disks must be encrypted with BitLocker

Real-Time Metrics
• $MetricsRealtimeTransactionsBlob, $MetricsRealtimeTransactionsTable and $MetricsRealtimeTransactionsQueue

[Chart: Average of TransactionCount (axis 660000 to 700000) and Average of TPS (axis 180 to 200), hourly from 6/24/2013 to 6/25/2013]
[Chart: Average of TransactionCount (axis 0 to 20000) and Average of TPS (axis 0 to 350), from 6/24/2013 0:00 to 6/24/2013 1:00]


<RealtimeMetrics>
  <Version>1.0</Version>
  <Enabled>true</Enabled>
  <IncludeAPIs>true</IncludeAPIs>
  <RetentionPolicy>
    <Enabled>true</Enabled>
    <Days>7</Days>
  </RetentionPolicy>
</RealtimeMetrics>
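A settings fragment of this shape can be generated rather than typed by hand. This is a sketch in Python using the standard library; the element names are taken from the XML above, while the function name `realtime_metrics_xml` and its parameters are assumptions, and how the fragment is submitted to the service is not covered.

```python
import xml.etree.ElementTree as ET


def realtime_metrics_xml(enabled=True, include_apis=True, retention_days=7):
    """Serialize a RealtimeMetrics element shaped like the one on the slide."""
    root = ET.Element("RealtimeMetrics")
    ET.SubElement(root, "Version").text = "1.0"
    ET.SubElement(root, "Enabled").text = str(enabled).lower()
    ET.SubElement(root, "IncludeAPIs").text = str(include_apis).lower()
    policy = ET.SubElement(root, "RetentionPolicy")
    ET.SubElement(policy, "Enabled").text = "true"
    ET.SubElement(policy, "Days").text = str(retention_days)
    return ET.tostring(root, encoding="unicode")


print(realtime_metrics_xml(retention_days=7))
```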

CORS (Cross Origin Resource Sharing)
• What?
  • Browser by default prevents scripts from accessing resources from different origin
  • CORS is a mechanism that enables cross origin access for scripts
  • Set CORS rules via SetServiceProperties for Blobs, Tables and Queues
    • Can control the origins that can access resources
    • Can control the headers that can be accessed
• Why?
  • Do not require running a proxy service for web apps to access storage service


CORS Settings

<Cors>
  <CorsRule>
    <AllowedMethods>GET,PUT</AllowedMethods>
    <AllowedOrigins>*</AllowedOrigins>
    <AllowedHeaders>*</AllowedHeaders>
    <ExposedHeaders>*</ExposedHeaders>
    <MaxAgeInSeconds>180</MaxAgeInSeconds>
  </CorsRule>
</Cors>
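How such a rule decides whether a request is allowed can be sketched as below. This is a deliberately simplified Python illustration: `is_allowed` is a made-up helper that only checks origin and method, and it does not reproduce the service's full rule evaluation (headers, preflight handling, rule ordering).

```python
import xml.etree.ElementTree as ET

CORS_XML = """<Cors><CorsRule>
<AllowedMethods>GET,PUT</AllowedMethods>
<AllowedOrigins>*</AllowedOrigins>
<AllowedHeaders>*</AllowedHeaders>
<ExposedHeaders>*</ExposedHeaders>
<MaxAgeInSeconds>180</MaxAgeInSeconds>
</CorsRule></Cors>"""


def is_allowed(cors_xml, origin, method):
    """Simplified check: does any rule allow this origin and HTTP method?"""
    for rule in ET.fromstring(cors_xml).iter("CorsRule"):
        methods = [m.strip() for m in rule.findtext("AllowedMethods", "").split(",")]
        origins = [o.strip() for o in rule.findtext("AllowedOrigins", "").split(",")]
        if method in methods and ("*" in origins or origin in origins):
            return True
    return False
```

With the rule above, a GET or PUT from any origin passes, while a DELETE does not.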

JSON (JavaScript Object Notation)
• What?
  • A popular concise format for REST protocols
  • OData supports two formats
    • AtomPub: We currently support this, but it is too verbose
    • JSON: OData has released multiple flavors of JSON
• Why?
  • Improves COGS for applications
    • Lower bandwidth consumption (approx. 70% savings), lower CPU utilization and hence better responsiveness
  • Many applications use JSON to represent object model
    • Efficient object data model to wire protocol

2.1 .NET Library
• New Features
  • Async Task methods with support for cancellation
  • Byte Array, Text, File upload / download APIs for blobs
  • IQueryable provider for Tables
  • Compiled Expressions for Table Entities
• Performance Improvements
  • Buffer Pooling
  • Multi-Buffer Memory Stream for consistent performance when buffering unknown length data
  • .NET MD5 now default (~20% faster than invoking native one)

• Available Soon @ http://www.nuget.org/packages/WindowsAzure.Storage

Demo –
CORS, JSON and Realtime Metrics

Best Practices –
Account, Blobs, Tables and Queues

General .NET Best Practices For Azure
• Disable Nagle for small messages (< 1400 b)
  • ServicePointManager.UseNagleAlgorithm = false;
• Disable Expect 100-Continue*
  • ServicePointManager.Expect100Continue = false;
• Increase default connection limit
  • ServicePointManager.DefaultConnectionLimit = 100; (Or More)
• Take advantage of .Net 4.5 GC
  • GC performance is greatly improved
  • Background GC: http://msdn.microsoft.com/en-us/magazine/hh882452.aspx

General Best Practices

• Locate Storage accounts close to compute/users
• Understand Account Scalability targets
  • Use multiple storage accounts to get more
  • Distribute your storage accounts across regions
• Cache critical data sets
  • As a Backup data set to fall back on
  • To get more request/sec than the account/partition targets
• Distribute load over many partitions and avoid spikes

General Best Practices (cont.)
• Use HTTPS
• Optimize what you send & receive
  • Blobs: Range reads, Metadata, Head Requests
  • Tables: Upsert, Merge, Projection, Point Queries
  • Queues: Update Message, Batch size
• Control Parallelism at the application layer
  • Unbounded Parallelism can lead to slow latencies and throttling
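Controlling parallelism at the application layer can be as simple as a fixed-size worker pool. This is a sketch in Python using the standard library; `run_bounded` and the limit passed to it are assumptions for the example, and the right limit depends on the workload.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(tasks, max_parallel):
    """Run callables with at most max_parallel in flight; return results in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda task: task(), tasks))
```

Each task would wrap one storage request; the pool size caps how many requests are outstanding at once instead of firing them all together.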

General Best Practices (cont.)
• Enable Logging & Metrics on each storage service
  • Can be done via REST, Client API, or Portal
  • Enables clients to self diagnose issues, including performance related ones
  • Data can be automatically GC’d according to a user specified retention interval
    • For example, have longer retention for hourly metrics and shorter retention for realtime metrics

Blob Best Practice

• Try to match your read size with your write size
  • Avoid reading small ranges on blobs with large blocks
  • CloudBlockBlob.StreamMinimumReadSizeInBytes / StreamWriteSizeInBytes
• How do I upload a folder the fastest?
  • Upload multiple blobs simultaneously
• How do I upload a blob the fastest?
  • Use parallel block upload
• Concurrency (C): Multiple workers upload different blobs
• Parallelism (P): Multiple workers upload different blocks for same blob

Concurrency Vs. Blob Parallelism

• XL VM Uploading 512, 256MB Blobs (Total upload size = 128GB)
  • C=1, P=1 => Averaged ~ 13.2 MB/s
  • C=1, P=30 => Averaged ~ 50.72 MB/s
  • C=30, P=1 => Averaged ~ 96.64 MB/s
• Single TCP connection is bound by TCP rate control & RTT
• P=30 vs. C=30: Test completed almost twice as fast!
• Single Blob is bound by the limits of a single partition
• Accessing multiple blobs concurrently scales
[Chart: upload time in seconds (axis 0 to 10000) for the three configurations; series names were not preserved]
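The "almost twice as fast" claim can be checked from the averaged throughputs. This is a small worked calculation in Python; it assumes 1 GB = 1024 MB, which makes 512 blobs of 256 MB exactly the 128 GB stated on the slide.

```python
TOTAL_MB = 512 * 256  # 512 blobs of 256 MB each = 131072 MB (128 GB)

throughput_mb_s = {"C=1, P=1": 13.2, "C=1, P=30": 50.72, "C=30, P=1": 96.64}

# Total time = total size / average throughput.
for config, rate in throughput_mb_s.items():
    print(f"{config}: about {TOTAL_MB / rate:.0f} s")

speedup = throughput_mb_s["C=30, P=1"] / throughput_mb_s["C=1, P=30"]
print(f"C=30 vs. P=30 speedup: {speedup:.2f}x")
```

The single-worker case comes out just under 10,000 seconds, which is consistent with the 0 to 10000 time axis of the chart, and the ratio between the C=30 and P=30 runs is a little over 1.9.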

Blob Download
• XL VM Downloading 50, 256MB Blobs (Total download size = 12.5GB)
  • C=1, P=1 => Averaged ~ 96 MB/s
  • C=30, P=1 => Averaged ~ 130 MB/s
[Chart: download time in seconds (axis 0 to 140) for C=1, P=1 and C=30, P=1]

Table Best Practice
• Critical Queries: Select PartitionKey, RowKey to avoid hotspots
  • Table Scans are expensive: avoid them at all costs for latency sensitive scenarios
• Batch: Same PartitionKey for entities that need to be updated together
• Schema-less: Store multiple types in same table
• Single Index – {PartitionKey, RowKey}: If needed, concatenate columns to form composite keys
• Entity Locality: {PartitionKey, RowKey} determines sort order
  • Store related entities together to reduce IO and improve performance
• Table Service Client Layer in 2.1: Dramatic performance improvements and better NoSQL interface

Queue Best Practice
• Make message processing idempotent: Messages become visible if client worker fails to delete message
• Benefit from Update Message: Extend visibility time based on message or save intermediate state
• Message Count: Use this to scale workers
• Dequeue Count: Use it to identify poison messages or validity of invisibility time used
• Blobs to store large messages: Increase throughput by having larger batches
• Multiple Queues: To get more than a single queue (partition) target
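Idempotent processing and poison-message detection can be combined in one handler. This is a sketch in Python under assumptions: the message is a plain dict with hypothetical keys `id`, `body` and `dequeue_count`, the threshold of 5 is arbitrary, and `processed_ids` stands in for whatever durable record the application keeps.

```python
def handle(message, processed_ids, work, max_dequeue_count=5):
    """Decide what to do with a dequeued message; returns 'done', 'skipped' or 'poison'."""
    if message["dequeue_count"] > max_dequeue_count:
        return "poison"  # set aside for inspection instead of retrying forever
    if message["id"] in processed_ids:
        return "skipped"  # redelivery of work already done: safe to delete
    work(message["body"])
    processed_ids.add(message["id"])
    return "done"
```

A message that reappears because its delete was lost is recognised and skipped, so the work is not applied twice.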

Resources
• Windows Azure Developer Website
  • http://www.windowsazure.com/en-us/develop/net/
• Windows Azure Storage Blog
  • http://blogs.msdn.com/b/windowsazurestorage/
• SOSP Paper/Talk
  • http://blogs.msdn.com/b/windowsazurestorage/archive/2011/11/20/windows-azure-storage-a-highlyavailable-cloud-storage-service-with-strong-consistency.aspx
