Azure Cosmos DB is the Swiss-army knife of NoSQL databases in the cloud: a multi-model, multi-API, globally distributed, highly available, and secure NoSQL database service on Azure. In this session, we explore its capabilities and features through several demos.
Azure Cosmos DB - The Swiss Army NoSQL Cloud Database
1. Azure Cosmos DB – The Swiss Army NoSQL Cloud Database
Steef-Jan Wiggers
https://nl.linkedin.com/in/steefjan
2. What can you expect?
• Introduction of Cosmos DB
• Scenarios
• In-depth discussion of capabilities (scale, consistency, indexing, models, APIs)
• Reference case
• RU/s, Latency, SLA
• Demos
4. Azure Cosmos DB Evolution
Originally started to address the
problems faced by large scale apps
inside Microsoft
• Built from the ground up for the
cloud
• Used extensively inside Microsoft
• One of the fastest growing
services on Azure
Timeline: 2010 Project Florence → 2014 DocumentDB (preview) → 2015 DocumentDB (GA) → 2017 Azure Cosmos DB
5. Cosmos DB Service
A globally distributed, massively scalable, multi-model database service
• Data models: key-value, document, column-family, graph
• APIs: SQL (DocumentDB), MongoDB API, Table API, Cassandra API
• Turnkey global distribution
• Elastic scale-out of storage and throughput
• Guaranteed low latency at the 99th percentile
• Five well-defined consistency models
• Comprehensive SLAs
7. Scenario - Telemetry & Sensor Data
[Diagram: events flow from Azure IoT Hub into Apache Storm on Azure HDInsight and are written to Azure Cosmos DB (telemetry and device state). Azure Web Jobs host the change feed processor, an Azure Function maintains the latest state, and Azure Data Lake serves as archival storage.]
11. Turnkey Global Distribution
High Availability
• Automatic and manual failover
• Multi-homing API removes the need for app redeployment
Low Latency (anywhere in the world)
• Packets cannot move faster than the speed of light
• Sending a packet across the world under ideal network conditions takes hundreds of milliseconds
• You can cheat the speed of light by using data locality
• CDNs solved this for static content
• Azure Cosmos DB solves this for dynamic content
12. Global Distribution
• Automatic and transparent replication
worldwide
• Each partition hosts a replica set per
region
• Customers can test end to end
application availability by
programmatically simulating failovers
• All regions are hidden behind a single
global URI with multi-homing
capabilities
• Customers can dynamically add /
remove additional regions at any time
[Diagram: a container with partition-key = "airport" (items such as "AMS", "MEL", "LAX"). Local distribution happens via horizontal partitioning within a region; resource partitions are globally distributed across regions (West US, West Europe), each sustaining 30K transactions/sec, with some replicas serving writes/reads and others serving reads.]
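A minimal sketch of the multi-homing capability using the Python SDK (azure-cosmos); the account URI, key, and region names below are placeholders:

from azure.cosmos import CosmosClient

# Regions listed in preference order; the SDK routes requests to the nearest
# available region behind the single global URI and fails over automatically.
client = CosmosClient(
    "https://<account>.documents.azure.com:443/",  # placeholder account URI
    credential="<primary-key>",
    preferred_locations=["West Europe", "West US"],
)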
14. Resource model
• Resources identified by their logical
and stable URI
• Hierarchical overlay over horizontally
partitioned entities; spanning
machines, clusters and regions
• Extensible custom projections based on
specific type of API interface
• Stateless interaction (HTTP and TCP)
[Diagram: the resource hierarchy Account → Databases → Containers → Items.]
15. Account URI and Credentials
[Diagram: the Account → Databases → Containers → Items hierarchy, addressed via the account URI (********.azure.com) and authenticated with the account key (IGeAvVUp …).]
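A minimal connection sketch with the Python SDK; the endpoint, key, and resource names stand in for the masked values above:

from azure.cosmos import CosmosClient

# The account URI and primary key come from the Azure portal (masked on the slide).
ENDPOINT = "https://<account>.documents.azure.com:443/"  # placeholder
KEY = "<primary-key>"                                    # placeholder

client = CosmosClient(ENDPOINT, credential=KEY)
database = client.get_database_client("demo")        # placeholder database name
container = database.get_container_client("items")   # placeholder container name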
22. Resource Hierarchy
CONTAINERS
Logical resources "surfaced" to APIs as tables, collections, or graphs, which are made up of one or more physical partitions or servers.
RESOURCE PARTITIONS
Consistent, highly available, and resource-governed coordination primitives.
Consist of replica sets, with each replica hosting an instance of the database engine.
[Diagram: tenants own containers (collections, tables, graphs), which map onto resource partitions; each resource partition is a replica set with a leader, followers, and a forwarder that replicates to remote resource partition(s).]
23. Scale
• Pay as you go for storage and throughput
• Elastic scale across regions
• Partitions
24. Horizontal Scaling
• Containers are horizontally
partitioned
• Each partition made highly available
via a replica set
• Partition management is
transparent and highly responsive
• Partitioning scheme is dictated by a
“partition-key”
(partition-key = “airport”)
[Diagram: a container (collection, graph, or table) split into resource partitions, each made highly available as a replica set; an item such as {"airport": "DUB"} lands in the partition that owns that key.]
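A hedged sketch of creating a container partitioned on "airport", as on the slide, with the Python SDK; the names and throughput value are illustrative:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
db = client.create_database_if_not_exists(id="demo")

# The partitioning scheme is dictated by the partition key.
container = db.create_container_if_not_exists(
    id="flights",
    partition_key=PartitionKey(path="/airport"),
    offer_throughput=400,  # provisioned RU/s
)
container.create_item(body={"id": "1", "airport": "DUB"})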
25. Elastic Scale Out of Storage and Throughput
SCALES AS YOUR APPS’ NEEDS CHANGE
• Database elastically scales storage and
throughput
• How? Scale-out!
• Collections can span across large clusters
of machines
• Can start small and seamlessly grow as
your app grows
26. Partitions
[Diagram: a Cosmos DB container (e.g. a collection) with partition key "User ID". The container is a logical partitioning abstraction; behind the scenes, hash(User ID) pseudo-randomly distributes data over the range of possible hashed values and onto physical partition sets.]
27. Partitions
[Diagram: hash(User ID) pseudo-randomly distributes users (Andrew, Mike, Bob, Dharma, Shireesh, Karthik, Rimma, Alice, Carol, …) over the range of possible hashed values, mapping them onto Partition 1 … Partition n. A frugal number of partitions, based on actual storage and throughput needs, yields scalability with low total cost of ownership.]
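To make the pseudo-random distribution concrete, here is an illustrative Python sketch of hash-based placement; Cosmos DB's actual hash function and partition management differ:

import hashlib

def assign_partition(user_id: str, partition_count: int) -> int:
    # Hash the partition key value and map it onto one of the partitions.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count

for name in ["Andrew", "Mike", "Bob", "Dharma", "Alice", "Carol"]:
    print(name, "-> Partition", assign_partition(name, 4))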
28. Partitions
Best Practices: Design Goals for Choosing a Good Partition Key
• Distribute the overall request and storage volume
• Avoid "hot" partition keys
Steps for Success
• Ballpark your scale needs (size/throughput)
• Understand the workload
• Number of reads/sec vs. writes/sec
• Use the Pareto principle (80/20 rule) to optimize for the bulk of the workload
• For reads: understand the top 3-5 queries (look for common filters)
• For writes: understand the transactional needs
General Tips
• Build a POC to strengthen your understanding of the workload and iterate (avoid analysis paralysis)
• Don't be afraid of having too many partition keys
• Partition keys are logical
• More partition keys means more scalability
• The partition key is the scope for multi-record transactions and for routing queries
• Queries can be intelligently routed via the partition key
• Omitting the partition key on a query requires a fan-out (see the sketch below)
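A hedged sketch of both cases with the Python SDK, reusing the container from the earlier connection sketch:

# In-partition query: the partition key routes it to a single partition.
flights = container.query_items(
    query="SELECT * FROM c WHERE c.airport = @a",
    parameters=[{"name": "@a", "value": "AMS"}],
    partition_key="AMS",
)

# Without a partition key, the query must fan out across all partitions.
all_items = container.query_items(
    query="SELECT * FROM c",
    enable_cross_partition_query=True,
)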
29. Consistency
• Global distribution forces us to navigate the CAP
theorem
• Writing correct distributed applications is hard
• Five well-defined consistency levels
• Intuitive and practical with clear PACELC tradeoffs
• Programmatically changeable at any time
• Can be overridden on a per-request basis
31. Well-defined consistency models
• Intuitive programming model
• Five well-defined consistency models
• Overridable on a per-request basis
Clear tradeoffs
• Latency
• Availability
• Throughput
32. Five Well-Defined Consistency Models
PACELC Theorem: In the case of network partitioning (P) in a distributed computer
system, one has to choose between availability (A) and consistency (C) (as per the CAP
theorem), but else (E), even when the system is running normally in the absence of
partitions, one has to choose between latency (L) and consistency (C).
34. Choose the right consistency
• Five well-defined consistency models
• Overridable on a per-request basis
• Provides control over performance-consistency
tradeoffs, backed by comprehensive SLAs.
• An intuitive programming model offering low
latency and high availability for your planet-scale
app.
CLEAR TRADEOFFS
• Latency
• Availability
• Throughput
Strong | Bounded staleness | Session | Consistent prefix | Eventual
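A minimal sketch of choosing a consistency level at the client with the Python SDK (a client may relax, but not strengthen, the account default; the endpoint and key are placeholders):

from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    credential="<primary-key>",
    consistency_level="Session",  # e.g. "Eventual", "ConsistentPrefix", ...
)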
35. Handle any data with no schema or indexing required
Azure Cosmos DB's schema-less service automatically indexes all your data, regardless of the data model, to deliver blazing-fast queries.
• Automatic index management
• Synchronous auto-indexing
• No schemas or secondary indices needed
• Works across every data model
36. Indexing JSON Documents
{
"locations": [
{
"country": "Germany",
"city": "Berlin"
},
{
"country": "France",
"city": "Paris"
}
],
"headquarter": "Belgium",
"exports": [
{ "city": "Moscow" },
{ "city": "Athens" }
]
}
[Diagram: the document as a tree: locations → [0: {country: Germany, city: Berlin}, 1: {country: France, city: Paris}], headquarter → Belgium, exports → [0: {city: Moscow}, 1: {city: Athens}].]
38. Indexing JSON Documents
[Diagram: two document trees side by side: the document from the previous slide, and a second document with locations → [0: {country: Germany, city: Bonn}], revenue → 200, exports → [0: {city: Berlin}, 1: {city: Italy}], and dealers → [0: {name: Hans}].]
39. Inverted Index
[Diagram: both document trees merged into a single inverted index; common paths such as locations, headquarter, and exports are stored once, and each leaf value points back to the documents that contain it.]
40. Index Policies
CUSTOM INDEXING POLICIES
Though all Azure Cosmos DB data is indexed by default, you
can specify a custom indexing policy for your collections.
Custom indexing policies allow you to design and customize
the shape of your index while maintaining schema flexibility.
• Define trade-offs between storage, write and query
performance, and query consistency
• Include or exclude documents and paths to and from the
index
• Configure various index types
{
"automatic": true,
"indexingMode": "Consistent",
"includedPaths": [{
"path": "/*",
"indexes": [{
"kind": "Hash",
"dataType": "String",
"precision": -1
}, {
"kind": "Range",
"dataType": "Number",
"precision": -1
}, {
"kind": "Spatial",
"dataType": "Point"
}]
}],
"excludedPaths": [{
"path": "/nonIndexedContent/*"
}]
}
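A hedged sketch of applying a custom indexing policy at container creation with the Python SDK; the slide's policy uses the older kind/dataType/precision shape, so this adapts it to the simpler form current service versions accept:

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<primary-key>")
db = client.create_database_if_not_exists(id="demo")

# Include everything except the excluded subtree, mirroring the slide's intent.
policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [{"path": "/nonIndexedContent/*"}],
}
container = db.create_container_if_not_exists(
    id="indexed",
    partition_key=PartitionKey(path="/id"),  # placeholder partition key
    indexing_policy=policy,
)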
45. Graph model
A graph is a structure that's composed of vertices and
edges. Both vertices and edges can have an arbitrary
number of properties.
• Vertices - Vertices denote discrete objects, such as a
person, a place, or an event.
• Edges - Edges denote relationships between vertices. For example, a person might know another person, be involved in an event, and have recently been at a location.
• Properties - Properties express information about
the vertices and edges.
47. Use case – Cosmos DB Graph
https://customers.microsoft.com/en-in/story/reed-business-information-professional-services-azure
48. Document (JSON)
• A schema-less JSON database engine with rich SQL querying capabilities.
• Store documents
• Searchable by integrating with Azure Search
• Easy integration with Azure Functions
• Change Feed
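A hedged sketch of reading the change feed with the Python SDK, reusing the container from the earlier connection sketch; newer SDK versions expose richer options (continuation tokens, start times), and a production consumer would checkpoint its position:

# Iterate over inserts and updates from the beginning of the container.
for doc in container.query_items_change_feed(is_start_from_beginning=True):
    print(doc["id"])  # e.g. hand off to an Azure Function or Azure Search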
49. Demo - Home Automation
[Diagram: devices stream data through IoT Hub and Event Hubs; Azure Functions process the data, store it in Cosmos DB, send start/stop commands back to the devices, and issue notifications.]
50. Logic App Cosmos DB Connector
• The Cosmos DB connector is a standard connector.
• It offers several operations, including create and update document.
• Note: the maximum size of a document supported by the DocumentDB (Azure Cosmos DB) connector is 2 MB.
51. Demo – Data Ingestion
[Diagram: ingestion pipeline: Function A pulls raw events, converts epoch timestamps to DateTime, and pushes the documents into a Cosmos DB collection (pull/push at each stage).]
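The epoch-to-DateTime step is plain Python; a minimal sketch:

from datetime import datetime, timezone

def epoch_to_iso(epoch_seconds: float) -> str:
    # ISO-8601 UTC strings sort lexicographically, which makes them
    # friendly to range queries over the indexed property.
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()

print(epoch_to_iso(1514764800))  # 2018-01-01T00:00:00+00:00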
53. Request Units
• Request Units (RUs) are a rate-based currency
• Abstracts physical resources for
performing requests
• Key to multi-tenancy, SLAs, and COGS
efficiency
• Covers foreground and background activities
[Diagram: an RU abstracts a fixed blend of % IOPS, % CPU, and % memory.]
54. Request Units
• Normalized across various access methods (GET, POST, PUT, query)
• 1 RU = one read of a 1 KB document
• Each request consumes a fixed number of RUs
• Applies to reads, writes, queries, and stored procedure execution
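A hedged sketch of inspecting the RU charge of an operation with the Python SDK; the service reports it in the x-ms-request-charge response header, and client_connection is a semi-internal handle, so treat this as illustrative:

# Assumes `container` from the earlier connection sketch.
item = container.read_item(item="1", partition_key="AMS")
charge = container.client_connection.last_response_headers["x-ms-request-charge"]
print(f"Point read cost: {charge} RU")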
55. Request Units
• Provisioned at RU/sec and RU/min granularities
• Rate limiting based on the amount of throughput provisioned
• Can be increased or decreased instantaneously
• Metered hourly
• Background processes such as TTL expiration and index transformations are scheduled when replicas are quiescent
[Diagram: incoming requests plotted against provisioned min and max RU/sec; requests within the provisioned rate are served without throttling, requests beyond it are rate limited.]
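A hedged sketch of reacting to rate limiting (HTTP 429) with the Python SDK; the SDK already retries throttled requests internally, so this manual backoff is only a fallback for when those retries are exhausted:

import time
from azure.cosmos import exceptions

def create_with_backoff(container, doc, max_attempts=5):
    for attempt in range(max_attempts):
        try:
            return container.create_item(body=doc)
        except exceptions.CosmosHttpResponseError as e:
            # 429 = request rate too large; the response also carries an
            # x-ms-retry-after-ms header a consumer could honor instead.
            if e.status_code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(min(2 ** attempt, 8))  # simple exponential backoff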
56. Latency
• Reads and writes served simultaneously
• Latch-free and write-optimized database engine
• 10 ms latency on reads (99th percentile)
• 15 ms latency on writes (99th percentile)
At those latencies, a single serial client can issue roughly 100 reads or 70 writes per second.
58. SLA
• Fully managed service
• 99.99% SLA on latency
• Guaranteed throughput, consistency, and high availability
59. Key Takeaways
• Multiple options for models and APIs
• Various consistency models
• Global scale
• Flexible throughput
• Support for diverse scenarios