(BDT312) Using the Cloud to Scale from a Database to a Data Platform | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
November 12, 2014 | Las Vegas
BDT312: Using the Cloud to Scale from a Database to a Data Platform
Ryan Horn, Lead Software Engineer at Twilio

Hi, I’m Ryan
Tech Lead of the User Data team at Twilio

What is Twilio?
We provide a communications API that enables phones, VoIP, and messaging to be embedded into web, desktop and mobile software.

How Does it Work?
A user calls your number
Twilio receives the call
Your app responds

What is the User Data Team?
• We scale Twilio's backend database infrastructure
• We build customer-facing data APIs
• We manage data policies and data security at rest

Calls and Messages are Stateful
Call: Queued -> Ringing -> In Progress -> Completed
Message: Queued -> Sending -> Sent -> Delivered

In the Beginning…
All data was placed in the same physical database regardless of where the call or message was in its lifecycle.

The Monolithic Database Model
Diagram: API, Web, Billing, and the Call/Message Service (which talks to Carriers) all share a single MySQL database.

Problems at Scale
• Many consumers of data
• Data with different performance characteristics
• Failure in the database degrades many services
• Horizontal scaling and orchestration are complicated

Moving to a Service-Oriented Architecture

What is a Service-Oriented Architecture?
An architecture in which required system behavior is decomposed into discrete units of functionality, implemented as individual services for applications to compose and consume.

Communicate Through Interfaces, Not Databases
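Diagram: API, Web, and Billing go through the In Flight Service and the Post Flight Service instead of touching a database directly. Each service fronts its own MySQL database (In Flight MySQL, Post Flight MySQL). Carriers also appear in the diagram.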

Database Can Change Without Changing Every Service
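Diagram: API, Web, and Billing still go through the In Flight Service and the Post Flight Service. The In Flight Service is backed by In Flight MySQL; the Post Flight Service is now backed by Amazon DynamoDB. Carriers also appear in the diagram.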

SOA Doesn’t Solve Everything
No matter how many services you put in front of MySQL, it’s still a single point of failure.

Implementing Sharding (the easy part)
1. Choose partitioning scheme
2. Implement routing logic
3. Send application queries through router
4. Go!
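The first three steps can be sketched in a few lines. This is only an illustration, not Twilio's router: the `RangeRouter` class, the choice of `account_id % 10` as the partitioning scheme, and the inclusive upper bounds are all assumptions made for the example.

```python
from bisect import bisect_left


class RangeRouter:
    """Hypothetical router: maps a partition bucket to a shard by key range."""

    def __init__(self, upper_bounds, shards):
        # upper_bounds[i] is the largest bucket (inclusive) served by shards[i].
        if len(upper_bounds) != len(shards):
            raise ValueError("need exactly one upper bound per shard")
        if list(upper_bounds) != sorted(upper_bounds):
            raise ValueError("upper bounds must be sorted")
        self.upper_bounds = list(upper_bounds)
        self.shards = list(shards)

    @staticmethod
    def bucket(account_id):
        # Step 1: partitioning scheme (assumed here: last decimal digit).
        return account_id % 10

    def shard_for(self, account_id):
        # Step 2: routing logic, a binary search over the range boundaries.
        index = bisect_left(self.upper_bounds, self.bucket(account_id))
        if index == len(self.shards):
            raise KeyError(f"no shard covers bucket {self.bucket(account_id)}")
        return self.shards[index]

    def route(self, account_id, statement):
        # Step 3: every application query goes through the router.
        return self.shard_for(account_id), statement


router = RangeRouter([3, 6, 9], ["shard0", "shard1", "shard2"])
target, sql = router.route(1234, "SELECT * FROM events WHERE account_id = 1234")
```

Keeping the boundaries in one sorted list means a later split only has to change the router's table, not the application code that calls `route`.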

Sharding at Twilio
Diagram: the Application sends queries to a Router, which maps the key ranges 0-3, 3-6, and 6-9 to Shard0, Shard1, and Shard2.

Rolling it Out With Zero Downtime (the hard part)
• We provide a 24/7, always-on service
• Communications is intolerant of inconsistency and latency
• There is no maintenance window

Bringing Up a New Shard
Diagram: Master1 with Slave1, Master2 with Slave2, and the Application routing the full key range (0-9) as a single range.

Split Odds and Evens for Writes
Diagram: Master1 with Slave1 and Master2 with Slave2, with writes labeled Odds and Evens; the Application still routes 0-9 as a single range.

Update Routing
Diagram: Master1 with Slave1 and Master2 with Slave2, still labeled Odds and Evens; the Application's routing is now split into 0-4 and 5-9.

Cut Slave Link
Diagram: Master1 with Slave1 and Master2 with Slave2, without the Odds and Evens labels; the Application routes 0-4 and 5-9 separately.
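The slides do not say how the odd/even split is implemented. One common way to let two masters accept writes at the same time, assumed here, is to have each one hand out IDs from an interleaved sequence, which MySQL supports through the `auto_increment_increment` and `auto_increment_offset` settings. A minimal sketch of the idea (which master takes odds and which takes evens is also an assumption):

```python
from itertools import count, islice


def id_sequence(offset, increment=2):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    return count(start=offset, step=increment)


# Assumed assignment: Master1 generates odd IDs, Master2 generates even IDs.
master1_ids = id_sequence(offset=1)
master2_ids = id_sequence(offset=2)

first_from_master1 = list(islice(master1_ids, 5))
first_from_master2 = list(islice(master2_ids, 5))

# No ID can be generated on both masters, so rows written on either side
# while the routing change rolls out cannot collide on their primary key.
overlap = set(first_from_master1) & set(first_from_master2)
```

Under that assumption, the routing update can roll out gradually: a write that lands on the "wrong" master during the transition still gets a unique ID.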

A Necessary Burden
In the beginning, the burden of managing our own databases was non-negotiable.

The Landscape has Changed
We now have a variety of managed database services that solve these problems for us, such as Amazon RDS, Amazon DynamoDB, Amazon SimpleDB, and Amazon Redshift.

Cost Is Never Optimized
Application developers do not (and should not) optimize for database cost.

Self-Managed Databases are Costly
Databases: 78%
Everything Else: 22%
Source: Twilio Data Usage

Keeping up With Growth
As growth continues to accelerate, we need to somehow keep up.

A Change in Approach
• Change our hiring practices and bring in specialists
• Remove the context switching

Thinking in Terms of Throughput
Amazon DynamoDB allows us to scale in terms of throughput, not machines. This is the future of resource provisioning.
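To make "throughput, not machines" concrete, here is a rough sizing sketch for provisioned capacity. It relies on DynamoDB's documented unit sizes (one write unit covers one write per second of an item up to 1 KB; one read unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads). The function names and workload numbers are hypothetical.

```python
import math


def write_capacity_units(writes_per_second, item_size_bytes):
    """Write units needed: each write costs one unit per 1 KB, rounded up."""
    return writes_per_second * math.ceil(item_size_bytes / 1024)


def read_capacity_units(reads_per_second, item_size_bytes, strongly_consistent=True):
    """Read units needed: one unit per 4 KB, halved for eventually consistent reads."""
    units = reads_per_second * math.ceil(item_size_bytes / 4096)
    return units if strongly_consistent else math.ceil(units / 2)


# Hypothetical workload: 500 writes/s of 1,500-byte events,
# 1,000 eventually consistent reads/s of 3,000-byte events.
wcu = write_capacity_units(500, 1500)
rcu = read_capacity_units(1000, 3000, strongly_consistent=False)
```

The inputs are request rates and item sizes; nothing in the calculation refers to hosts, disks, or replicas.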

Operations
Management and scaling of our cluster is fully abstracted away from us.

Cost Compared to MySQL
MySQL 82%
Amazon DynamoDB 18%

Cost with MySQL Fully Replaced
Everything Else 61%
Databases 39%

A Relational Model with Amazon DynamoDB
Many of our services allow for querying data in a way that maps naturally to a relational database.

GET /Accounts/2/Events
SELECT * FROM events ORDER BY date DESC;

GET /Accounts/2/Events?IpAddress=5.6.7.8
SELECT * FROM events WHERE IpAddress='5.6.7.8' ORDER BY date DESC;

GET /Accounts/2/Events?IpAddress=5.6.7.8&Date<=2014-10-03
SELECT * FROM events WHERE IpAddress='5.6.7.8' AND Date<='2014-10-03' ORDER BY date DESC;

AccountId (Hash)  Date (Range)  IpAddress_Date      Type
2                 2014-10-03    5.6.7.8|2014-10-03  call
2                 2014-10-01    5.6.7.8|2014-10-01  message

GET /Accounts/2/Events
AccountId=2, ScanIndexForward=false

AccountId (Hash)  IpAddress_Date (Range)  Date        Type
2                 5.6.7.8|2014-10-03      2014-10-03  call
2                 5.6.7.8|2014-10-01      2014-10-01  message

GET /Accounts/2/Events?IpAddress=5.6.7.8
AccountId=2, IpAddress_Date begins with "5.6.7.8|", ScanIndexForward=false

AccountId (Hash)  IpAddress_Date (Range)  Date        Type
2                 5.6.7.8|2014-10-03      2014-10-03  call
2                 5.6.7.8|2014-10-01      2014-10-01  message

GET /Accounts/2/Events?IpAddress=5.6.7.8&Date<=2014-10-03
AccountId=2, IpAddress_Date BETWEEN "5.6.7.8|" AND "5.6.7.8|2014-10-03", ScanIndexForward=false
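The three access patterns can be tried out without a DynamoDB table. The sketch below emulates a Query in memory (one hash key, results ordered by the range key as a plain string); `query`, `make_event`, and the extra row for IP 5.6.7.7 are hypothetical and exist only for the demonstration. The last query is bounded below by the IP prefix because a bare less-than comparison on the composite string would also match lexicographically smaller IP addresses.

```python
def composite_key(ip_address, date):
    """Build the IpAddress_Date attribute, e.g. '5.6.7.8|2014-10-03'."""
    return f"{ip_address}|{date}"


def make_event(account_id, date, ip_address, event_type):
    return {
        "AccountId": account_id,
        "Date": date,
        "IpAddress_Date": composite_key(ip_address, date),
        "Type": event_type,
    }


EVENTS = [
    make_event(2, "2014-10-03", "5.6.7.8", "call"),
    make_event(2, "2014-10-01", "5.6.7.8", "message"),
    make_event(2, "2014-10-02", "5.6.7.7", "call"),  # hypothetical row, other IP
]


def query(items, account_id, range_attr, condition=lambda value: True,
          scan_index_forward=True):
    """Emulate a Query: filter on the hash key, order by the range key."""
    matches = [item for item in items
               if item["AccountId"] == account_id and condition(item[range_attr])]
    return sorted(matches, key=lambda item: item[range_attr],
                  reverse=not scan_index_forward)


# GET /Accounts/2/Events
newest_first = query(EVENTS, 2, "Date", scan_index_forward=False)

# GET /Accounts/2/Events?IpAddress=5.6.7.8
by_ip = query(EVENTS, 2, "IpAddress_Date",
              lambda key: key.startswith("5.6.7.8|"), scan_index_forward=False)

# GET /Accounts/2/Events?IpAddress=5.6.7.8&Date<=2014-10-03
upper = composite_key("5.6.7.8", "2014-10-03")
by_ip_and_date = query(EVENTS, 2, "IpAddress_Date",
                       lambda key: "5.6.7.8|" <= key <= upper,
                       scan_index_forward=False)

# For comparison: an upper bound alone leaks the 5.6.7.7 row into the result.
upper_bound_only = query(EVENTS, 2, "IpAddress_Date", lambda key: key < upper,
                         scan_index_forward=False)
```

Because ISO dates sort correctly as strings, putting the date after the IP address in the composite key gives per-IP results in date order for free.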

Need to Handle Exceeded Throughput Failures
Exceeding provisioned throughput is a runtime error.
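A common way to handle this class of error, though not necessarily what Twilio does, is to retry with exponential backoff and jitter. In the sketch below, `ThroughputExceeded` is a local stand-in for DynamoDB's `ProvisionedThroughputExceededException`, and `with_backoff` and its parameters are hypothetical.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Stand-in for DynamoDB's ProvisionedThroughputExceededException."""


def with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(), retrying throttled attempts with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait a random time up to base_delay * 2^attempt before retrying.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


# Demonstration with a fake write that is throttled twice, then succeeds.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ThroughputExceeded()
    return "ok"


delays = []
result = with_backoff(flaky_write, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; in real use the default `time.sleep` applies.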

Handling Exceeded Write Throughput with Amazon SQS
Queuing events to Amazon SQS and processing them asynchronously allows us to gracefully deal with write throughput errors.

Diagram: API, Web, and Billing -> Amazon SQS -> Events Processor -> Amazon DynamoDB
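The pattern can be modeled in memory: producers enqueue events, and a processor removes an event from the queue only after the write succeeds, so a throttled write just stays queued for the next pass. Everything here is a stand-in (a `deque` for Amazon SQS, `ThrottledTable` for a DynamoDB table with a fixed write budget per tick); none of it is Twilio's code.

```python
from collections import deque


class ThroughputExceeded(Exception):
    """Stand-in for DynamoDB's ProvisionedThroughputExceededException."""


class ThrottledTable:
    """In-memory stand-in for a table with a per-tick write budget."""

    def __init__(self, writes_per_tick):
        self.writes_per_tick = writes_per_tick
        self.budget = writes_per_tick
        self.items = []

    def tick(self):
        # A new time window starts: the write budget is restored.
        self.budget = self.writes_per_tick

    def put(self, item):
        if self.budget == 0:
            raise ThroughputExceeded()
        self.budget -= 1
        self.items.append(item)


def drain(queue, table):
    """Write queued events until the queue is empty or the table throttles."""
    written = 0
    while queue:
        event = queue[0]  # peek: the event stays queued until it is stored
        try:
            table.put(event)
        except ThroughputExceeded:
            break  # leave the event in the queue and try again later
        queue.popleft()
        written += 1
    return written


queue = deque(range(5))  # five events produced by API, Web, Billing
table = ThrottledTable(writes_per_tick=2)
ticks = 0
while queue:
    drain(queue, table)
    table.tick()
    ticks += 1
```

A burst of five events against a budget of two writes per tick takes three ticks to store, and no event is dropped along the way.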

Maximum of 5 Global and 5 Local Indexes
You can manage your own indexes, but your application must then handle partial mutation failures.
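As an illustration of what "partial mutation failure" means, the sketch below writes an event to a base table and then to a hand-maintained index table, and undoes the first write if the second one fails. The `Table` class, `WriteFailed`, and `put_with_index` are hypothetical in-memory stand-ins; a real implementation would also have to cope with the compensating delete itself failing.

```python
class WriteFailed(Exception):
    """Stand-in for any failed write to the data store."""


class Table:
    """In-memory stand-in for a table; fails writes for keys in fail_on."""

    def __init__(self, fail_on=()):
        self.rows = {}
        self.fail_on = set(fail_on)

    def put(self, key, value):
        if key in self.fail_on:
            raise WriteFailed(key)
        self.rows[key] = value

    def delete(self, key):
        self.rows.pop(key, None)


def put_with_index(base, index, event_id, event):
    """Write the event, then its index entry; roll back if the index write fails."""
    index_key = (event["AccountId"], f'{event["IpAddress"]}|{event["Date"]}')
    base.put(event_id, event)
    try:
        index.put(index_key, event_id)
    except WriteFailed:
        # Partial mutation: the base row exists but its index entry does not.
        # Compensate by removing the base row, then report the failure.
        base.delete(event_id)
        raise


event = {"AccountId": 2, "IpAddress": "5.6.7.8", "Date": "2014-10-03", "Type": "call"}

# Happy path: both writes succeed.
base, index = Table(), Table()
put_with_index(base, index, "EV1", event)

# Failure path: the index write fails, so the base write is undone.
bad_base = Table()
bad_index = Table(fail_on=[(2, "5.6.7.8|2014-10-03")])
try:
    put_with_index(bad_base, bad_index, "EV1", event)
    failed = False
except WriteFailed:
    failed = True
```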

Local Index Size Limits
Local secondary indexes provide immediate consistency… and limit the data set for a given hash key to 10 GB.

Brief History
2008-2011
All business intelligence queries run on replicas of MySQL clusters serving production traffic.

Brief History
2011-2013
Data pushed to Amazon S3 and queried with Pig on Amazon EMR, improving our ability to aggregate, but with high latency.

Brief History
2013-Present
The move to Amazon Redshift cut the time these reports took from hours to seconds, allowing us to answer critical BI and financial questions in near real time.

Pushing Data Into Amazon Redshift
Diagram components: Post Flight Service, Kafka, SQS (DLQ), Amazon S3 Loader, S3, Warehouse Loader, Amazon Redshift.

Managed Services as a Culture
Our focus on creating an experience that unifies and simplifies communications is reflected in our adoption of managed services.

Managed Services as a Culture
Understanding and focusing on our areas of expertise and leveraging managed services for the rest accelerates the delivery of value and innovation to our customers.
