Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Founder CTO, ParElastic

Scalability and database virtualization
How virtualizing your databases improves performance and lowers costs
New York City MySQL Meetup, October 3, 2013

What’s this presentation about?
• Scalability and the database tier
  – What’s the problem?
  – How did we get here?
  – Some proposed solutions
  – What are parallel databases?
  – What’s ParElastic?
  – How do I get ParElastic?
• Q&A

October 3, 2013

Tweet this presentation
#parelastic

Scalability and the database tier | NYC MySQL Meetup


What is the scalability problem?


What is the scalability problem?
• Has many faces
  – Connections and Concurrency
  – Data Volume and Retention Period
  – Databases and Tenants
  – Read vs. Write
• Your problem(s)
  – May be more than one
  – May change over time


Connections and Concurrency
• More [active] connections mean worse performance
• Sizing your database


Data Volume and Retention Period
• A longer retention period means more data
• More data means progressive deterioration:
  – All data in memory
  – All indexes in memory
  – Not enough memory
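One way to read the three stages above is as a comparison between the database's size and the memory available to cache it (for InnoDB, the buffer pool). The Python sketch below is a hypothetical helper with made-up sizes that classifies a server into those stages. It assumes data and indexes share one memory pool and it ignores access skew, so treat it as an illustration rather than a sizing tool.

```python
def memory_regime(data_gb: float, index_gb: float, memory_gb: float) -> str:
    """Hypothetical helper: which of the three stages a server is in.

    Assumes data and indexes compete for one memory pool (e.g. the InnoDB
    buffer pool) and ignores access skew, so it is only a rough guide.
    """
    if data_gb + index_gb <= memory_gb:
        return "all data in memory"
    if index_gb <= memory_gb:
        return "all indexes in memory"
    return "not enough memory"


# The same (hypothetical) 64 GB server as retention, and so data volume, grows.
for data_gb, index_gb in [(20, 5), (120, 30), (600, 150)]:
    print(data_gb, index_gb, memory_regime(data_gb, index_gb, memory_gb=64))
```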


Databases and “Tenants”
• Common paradigm in SaaS applications
• Each tenant’s application instance has a database
• Several databases on each database instance

• More databases per instance
In one customer engagement, we were told that no more than 1,000 tenants could be placed on one database instance before performance became unacceptable.
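Taking the reported 1,000-tenant ceiling at face value, fleet size follows from simple arithmetic. The sketch below uses a hypothetical fill-in-order placement policy. Real placement would also weigh tenant size and activity.

```python
import math

MAX_TENANTS_PER_INSTANCE = 1000  # ceiling reported in the customer engagement above


def instances_needed(tenants: int, cap: int = MAX_TENANTS_PER_INSTANCE) -> int:
    """Minimum number of database instances if each holds at most `cap` tenants."""
    return math.ceil(tenants / cap)


def place_tenants(tenant_ids, cap: int = MAX_TENANTS_PER_INSTANCE) -> dict:
    """Hypothetical fill-in-order policy: {instance index: [tenant ids]}."""
    placement = {}
    for i, tenant in enumerate(tenant_ids):
        placement.setdefault(i // cap, []).append(tenant)
    return placement


tenants = [f"tenant{n}" for n in range(2500)]
print(instances_needed(len(tenants)))                          # 3
print({k: len(v) for k, v in place_tenants(tenants).items()})  # {0: 1000, 1: 1000, 2: 500}
```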

Read vs. Write
• Simple read (SELECT) queries can scale well
  – Key-based lookups
  – With favorable indexes
• Things that cause heartburn
  – Complex joins (with large data sets)
  – Sorts
  – Aggregation
• Reads are easier to scale than writes


How did we get here?
A brief history lesson 


How did we get here? [1]
• A combination of factors
  – Changes in application users and usage
  – Driven by the Internet and mobile computing
  – “News cycles” are getting shorter
• Economics
  – Commodity computing is cheap and getting cheaper
  – Solutions that can “scale out” win; others lose
• Ability to leverage higher core densities
  – Other databases do a better job at this than MySQL
  – MySQL would do great if you had a 20GHz processor ;)


• The Evolution of the Database Management System
• A battle between “generalized” and “specialized”

• The Relational Database Management System (RDBMS)
• Designed for monolithic systems
• SMP
• Scale-Up

• Applications evolve quickly!
• Databases respond slowly


• Moore’s Law
• Scale-Up seemed like a fine answer

• But there are limits …


• Database architectures traditionally were
• Shared CPU/Memory/Disk
• Also known as “Shared-Everything”

• But “Shared-Everything” doesn’t scale 
• At least not for databases
A server costing twice as much doesn’t always give you twice as much
database “power”. You reach a point of diminishing returns.


• You can pay more but you may not get more 

Source: Amazon RDS TPC-C Benchmark. Md. Borhan Uddin, Bo He,
Radu Sion, Cloud Computing Center, SUNY Stony Brook.
Viewed online http://digitalpiglet.org/research/sion2010cloud-rds.pdf


Some proposed solutions


Some proposed solutions
• Several strategies have been advocated
  – Cache, Cache, Cache, …
  – Get a bigger server [a.k.a. Scale-Up]
  – Sharding [a form of Scale-Out]
  – NoSQL or NewSQL [typically Scale-Out]
  – Replication and variants
• We look at each one in more detail


Cache, Cache, Cache!
That’s easy! Do some caching!

cache (noun): temporary computer storage used for quick retrieval of data in order to increase processing speed.
caching (transitive verb): to cache.

• Caching addresses only reads, not writes
• Social media workloads are “write-heavy”, “interactive” and “highly personalized”
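A minimal cache-aside sketch makes the read/write asymmetry concrete. Plain dicts stand in for both the cache and the database, and the class and key names are hypothetical. Repeated reads of an unchanged row are absorbed by the cache. Every write still reaches the database, though, and invalidation sends the following read back to it as well.

```python
class CountingDB:
    """Stand-in for the database: a dict that counts reads and writes."""

    def __init__(self):
        self.rows, self.reads, self.writes = {}, 0, 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.writes += 1
        self.rows[key] = value


class CacheAside:
    """Cache-aside: reads try the cache first; writes go to the database
    and invalidate the cached copy."""

    def __init__(self, db):
        self.db, self.cache = db, {}

    def read(self, key):
        if key not in self.cache:            # miss: fall through to the database
            self.cache[key] = self.db.get(key)
        return self.cache[key]

    def write(self, key, value):
        self.db.put(key, value)              # the cache cannot absorb this
        self.cache.pop(key, None)            # so the next read must refetch


db = CountingDB()
app = CacheAside(db)
app.write("post:1", "hello")
for _ in range(100):                         # read-heavy: cache absorbs the repeats
    app.read("post:1")
print(db.reads, db.writes)                   # 1 1

for i in range(100):                         # write-heavy, personalized updates
    app.write("profile:1", i)
    app.read("profile:1")
print(db.reads, db.writes)                   # 101 101
```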

Get a bigger server [Scale-Up]
I will use a bigger database server
Can I even get a bigger server?
What if m2.4xlarge isn’t enough?
Maybe I just have too much data?
Maybe I have too many users?


Sharding [a form of Scale-Out]
Sharding will solve my problem!

shard (noun, ˈshärd): a piece or fragment of a brittle substance <shards of glass>; broadly: a small piece or part
sharding (noun, ˈshär-diŋ):
(a) to make one’s application brittle or fragmented;
(b) to take one big problem and make many small problems;
(c) to complicate an application while claiming to solve a scalability problem;
(d) to decrease developer productivity;
(e) a bad idea;
(f) sharding library: a mechanism that attempts (unsuccessfully) to hide the bad taste of sharding


NoSQL or NewSQL?
You need NoSQL or NewSQL!

• Yes, I have to rewrite my application
• Yes, not all queries will work
• No, there’s no standard query language
• No, most do not have ACID guarantees; hell, some don’t even guarantee durability
• Yes, most are somewhat untried science experiments
• More flavors than Ben & Jerry’s Ice Cream [yes, really]
• But, all the cool kids are doing it!


Replication and variants
• Replication-based solutions (typically called clustering)
  – Many copies of the data
  – Distribute queries across the copies
  – Keep the copies synchronized: like herding cats
  – Write bottleneck
• Read/write splitting
  – Single master (gets all the writes)
  – Many slaves (share the reads)
  – Unpredictable latency
  – Write bottleneck
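Read/write splitting is usually implemented as a routing layer in front of the servers. The sketch below is a hypothetical router that classifies statements by their first keyword. Real proxies also have to handle transactions, SELECT ... FOR UPDATE, and session state, all of which this ignores.

```python
import itertools


class ReadWriteSplitter:
    """Hypothetical router: writes go to the single master, reads rotate
    round-robin across the slaves."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, slaves: list):
        self.master = master
        self._next_slave = itertools.cycle(slaves)

    def route(self, sql: str) -> str:
        verb = sql.split(None, 1)[0].upper()   # first keyword of the statement
        if verb in self.WRITE_VERBS:
            return self.master                  # every write lands on one server
        return next(self._next_slave)           # reads are shared among the copies


router = ReadWriteSplitter("master", ["slave1", "slave2", "slave3"])
queries = ["SELECT * FROM t", "INSERT INTO t VALUES (1)", "SELECT 1",
           "UPDATE t SET c = 2", "SELECT 2"]
routes = [router.route(q) for q in queries]
print(routes)  # ['slave1', 'master', 'slave2', 'master', 'slave3']
```

With MySQL's default asynchronous replication, a read routed to a slave immediately after a write may not see that write yet, which is one source of the unpredictable latency above. Adding slaves does nothing for write throughput, because every write still funnels through the one master.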


What about MySQL Cluster?
• MySQL Cluster is a strange beast
• For best results, you must use the NDB interface
• Only supports the NDB storage engine
• Primarily a distributed in-memory Key-Value Store
• That is ACID compliant and supports joins and things if you
use the SQL interface
• But no one tells you about the performance of this path!

• Published benchmarks all use “FlexAsync”, which talks directly to the NDB interface
• And READ-ONLY
For more details visit http://www.parelastic.com/blog/mysql-cluster-and-benchmarks
Or stick around after the presentation and we can chat!

What are parallel databases?


What are parallel databases?¹

• A database architecture surveyed by DeWitt and Gray in 1992
• Very successfully applied to many database problems
• Oracle Exadata, Netezza, Teradata, Greenplum, …

• An example of the “Shared Nothing” database paradigm²

¹ Parallel Database Systems: The future of high performance database processing [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]
² The Case for Shared Nothing [1986, Stonebraker, http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf]


How parallel databases execute queries

Image from “Parallel Database Systems: The future of high performance database processing” [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]


Benefits of parallel databases
• Linear improvement in “reads”
• Linear improvement in “writes”
• Better than linear improvement in “joins”
• Better than linear improvement in “aggregation”
• Better than linear improvement in “sorts”

For more details, refer to “Parallel Database Systems: The future of high performance database processing” [1992, DeWitt, Gray, ftp://ftp.cs.wisc.edu/pub/techreports/1992/TR1079.pdf]


Parallel Databases vs. Sharding
• Parallel Database
  – Database architecture
  – Application is data location agnostic
  – Application perceives a single database
  – Requires no application rewrites
  – Application is not constrained by parallel database architecture
  – A parallel database handles any schema
• Sharding
  – Application architecture
  – Application is data location aware
  – Application perceives a collection of databases
  – Requires application rewrites
  – Application is constrained by the limitations of the sharding architecture
  – Not all schemas are shard’able
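The difference in this comparison shows up directly in application code. With sharding, the application must know where each row lives and must fan out and merge any query that spans shards. With a parallel database, the same application sends SELECT COUNT(*) FROM customer to one logical database. Below is a minimal sketch of the application-side half. The host names are hypothetical, and a fake query runner stands in for a real driver.

```python
import hashlib

SHARDS = ["db0", "db1", "db2"]  # hypothetical shard hosts, owned by the application


def shard_for(customer_id: int) -> str:
    """Location-aware routing: hash the sharding key to choose a database."""
    digest = hashlib.md5(str(customer_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def count_customers(run_query) -> int:
    """A cross-shard query: the application fans out, then merges the results."""
    return sum(run_query(host, "SELECT COUNT(*) FROM customer") for host in SHARDS)


# Fake per-shard results standing in for a real database driver.
fake_counts = {"db0": 900, "db1": 950, "db2": 921}
print(shard_for(42))
print(count_customers(lambda host, sql: fake_counts[host]))  # 2771
```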



What is ParElastic?
Hypervisor for databases


What is ParElastic?
• An approach to relational database virtualization
• Addresses issues of scalability in relational databases
• A parallel database architecture
• Built on standard MySQL or MySQL variant databases
• Horizontal Scalability
• Elastic


ParElastic: System Architecture

ParElastic Architecture protected by US8214356, “Apparatus for elastic database processing with heterogeneous data”

10/7/2013

Flex Your Database | ParElastic® Database Virtualization Engine

Data Distribution: How it works
• User data is “distributed” across multiple storage nodes
• Queries are executed in parallel by some [or all] nodes
• Multiple distribution models supported
  – Range
  – Hash
  – Broadcast
  – Random

• ParElastic guarantees co-location and query execution
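The four distribution models can be sketched as placement functions that map a row's distribution key to the storage node(s) that hold it. These are generic textbook versions. The slides do not describe ParElastic's own placement functions, so the node names, the range boundaries, and the use of CRC32 as the hash are all assumptions.

```python
import bisect
import random
import zlib

NODES = ["sn0", "sn1", "sn2", "sn3"]   # hypothetical storage nodes
RANGE_BOUNDS = (1000, 2000, 3000)       # hypothetical: sn0 < 1000 <= sn1 < 2000 <= sn2 < 3000 <= sn3
_rng = random.Random(0)                 # seeded so runs are repeatable


def range_nodes(key: int) -> list:
    """Range: each node owns a contiguous slice of the key space."""
    return [NODES[bisect.bisect_right(RANGE_BOUNDS, key)]]


def hash_nodes(key) -> list:
    """Hash: a stable hash of the key picks exactly one node.
    (Python's built-in hash() of str is randomized per process, hence CRC32.)"""
    return [NODES[zlib.crc32(str(key).encode()) % len(NODES)]]


def broadcast_nodes(key) -> list:
    """Broadcast: every node stores a full copy (small, often-joined tables)."""
    return list(NODES)


def random_nodes(key) -> list:
    """Random: ignore the key and spread rows evenly."""
    return [_rng.choice(NODES)]


for model in (range_nodes, hash_nodes, broadcast_nodes, random_nodes):
    print(model.__name__, model(1500))
```

Co-location falls out of these rules. Two tables hash-distributed on the same join key (say, a customer id) place matching rows on the same node. A broadcast table is present on every node, so joins against it can run locally. Random distribution spreads load evenly but gives the planner no such guarantee.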


Storage Elasticity: How it works
• A “generational scheme”
• Storage Nodes added over time
• Each creates a new “generation”

• No need to migrate large amounts of data
  – A key drawback of “sharding”, which requires “resharding”

Storage Elasticity protected by US8478790, US8386532 and other patents.
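The cost of resharding can be shown with the common modulo placement rule. Under that rule, adding one node changes the home of most existing rows: about N/(N+1) of them when going from N to N+1 nodes. The second half of the sketch is a toy generational model. It is purely hypothetical and not ParElastic's patented scheme. Each row remembers the generation it was written in, so adding nodes moves nothing, at the cost of tracking each row's generation.

```python
import zlib


def node_for(key: int, n_nodes: int) -> int:
    """Modulo placement, as used by many hand-rolled sharding schemes."""
    return zlib.crc32(str(key).encode()) % n_nodes


def fraction_moved(n_before: int, n_after: int, n_keys: int = 100_000) -> float:
    """Share of existing rows whose home node changes when the node count changes."""
    moved = sum(node_for(k, n_before) != node_for(k, n_after) for k in range(n_keys))
    return moved / n_keys


print(f"4 -> 5 nodes: {fraction_moved(4, 5):.1%} of rows move")


class GenerationalPlacement:
    """Toy model (hypothetical, not ParElastic's patented scheme): each
    generation has its own node list, and a row stays with the generation
    it was inserted in, so adding nodes never relocates existing rows."""

    def __init__(self, nodes):
        self.generations = [list(nodes)]
        self.row_generation = {}

    def add_nodes(self, new_nodes):
        # A new generation spans the old nodes plus the new ones.
        self.generations.append(self.generations[-1] + list(new_nodes))

    def insert(self, key) -> str:
        self.row_generation[key] = len(self.generations) - 1
        return self.locate(key)

    def locate(self, key) -> str:
        nodes = self.generations[self.row_generation[key]]
        return nodes[zlib.crc32(str(key).encode()) % len(nodes)]


placement = GenerationalPlacement(["sn0", "sn1", "sn2", "sn3"])
before = {k: placement.insert(k) for k in range(10_000)}
placement.add_nodes(["sn4"])
moved = sum(placement.locate(k) != node for k, node in before.items())
print(moved)  # 0: no existing row moved
```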


ParElastic: How It Works


ParElastic: Simple query processing example

SELECT COUNT(*)
FROM CUSTOMER;

count(*)
--------
2771
(1 row affected)

PROVISION 1 DYNAMIC NODE
ON DYNAMIC NODE
    CREATE TEMP TABLE T1 ( C INT );
ON ALL STORAGE NODES
    SELECT COUNT(*) FROM CUSTOMER
    AND REDISTRIBUTE TO T1
ON DYNAMIC NODE
    SELECT SUM(C) FROM T1;
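The plan above is a two-phase aggregation. Each storage node computes a partial COUNT(*) over its own rows, the partial results are redistributed into T1 on the dynamic node, and the dynamic node adds them up. The sketch below simulates that flow in Python, with threads standing in for storage nodes and a list standing in for T1. The partition contents are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def local_count(rows) -> int:
    """On one storage node: SELECT COUNT(*) FROM CUSTOMER over its own rows."""
    return len(rows)


def parallel_count(storage_nodes: dict) -> int:
    t1 = []  # CREATE TEMP TABLE T1 ( C INT ) on the dynamic node
    # ON ALL STORAGE NODES: each node counts its partition concurrently;
    # AND REDISTRIBUTE: each partial count becomes one row of T1.
    with ThreadPoolExecutor(max_workers=len(storage_nodes)) as pool:
        t1.extend(pool.map(local_count, storage_nodes.values()))
    return sum(t1)  # ON DYNAMIC NODE: SELECT SUM(C) FROM T1


# Made-up CUSTOMER partitions on three hypothetical storage nodes.
customer_partitions = {
    "sn0": list(range(0, 924)),
    "sn1": list(range(924, 1848)),
    "sn2": list(range(1848, 2771)),
}
print(parallel_count(customer_partitions))  # 2771
```

The final step is SUM(C), not COUNT(*). T1 holds one partial count per storage node, so counting its rows would return the number of storage nodes rather than the number of customers.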


ParElastic Performance Benefits
• Connection Scalability
  – ParElastic Tier Elasticity: have more or fewer ParElastic servers
• Storage / Data Volume Scalability
  – Add ParElastic Persistent Nodes as data volumes increase
  – Multiple machines working together
• Workloads are variable
  – Compute Node Elasticity: have more or fewer as required
• Databases and Tenants [SaaS applications]
  – ParElastic Adaptive Multi-tenancy™
• No application change
  – Queries are processed by, and data stored on, standard MySQL!

ParElastic Multi-Tenancy


ParElastic Concurrency [1]


ParElastic Concurrency [2]


ParElastic data “ingest”
One Million rows/s!
15 Storage Nodes, 2 ParElastic Servers

Tests were conducted in the Amazon cloud. Native MySQL testing used an m1.xlarge server, standard MySQL and standard EBS volumes. The test driver was a c1.xlarge server, to provide sufficient CPU headroom to generate load. ParElastic was run with 5 and 15 persistent storage nodes, identically configured (m1.xlarge, standard MySQL, standard EBS volumes). The 15-node test employed two c1.xlarge test drivers. Best ParElastic performance was with 10 threads, 10 persistent storage nodes and an insert batch size of 5,000 tuples. Best native MySQL performance was with 2 threads and a batch size of 10,000 tuples per insert batch.
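Both configurations above reached their best throughput with large insert batches (5,000 and 10,000 tuples). The general technique is to group rows into batches and commit once per batch rather than once per row. The sketch below shows it with Python's built-in sqlite3 module standing in for a MySQL driver. The table and column names are hypothetical. For scale: if one million rows/s were sustained with 5,000-row batches, that would be 200 batch commits per second across all threads.

```python
import sqlite3
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def ingest(conn, rows, batch_size=5000):
    """Insert rows in batches: one executemany call and one commit per batch."""
    for batch in batches(rows, batch_size):
        with conn:  # commits on success, rolls back the batch on error
            conn.executemany("INSERT INTO clicks (id, url) VALUES (?, ?)", batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (id INTEGER PRIMARY KEY, url TEXT)")
ingest(conn, ((i, f"/page/{i % 10}") for i in range(12_345)))
print(conn.execute("SELECT COUNT(*) FROM clicks").fetchone()[0])  # 12345
```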


What’s the ParElastic Overhead?
[Figure: Test Client (Machine 1) querying mysqld (Machine 2) directly: query time 15.72 ms. Test Client (Machine 1) querying through ParElastic (Machine 2) to mysqld on Machines 3, 4, …: query time 17.03 ms. Network RTT 0.35 ms.]
ParElastic overhead ~ 1.31 ms

Characterizing ParElastic Performance
• A “fixed cost”, the overhead per query
• A “variable cost” for query processing
• Consider this example, a simple “COUNT” query.
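The fixed-plus-variable split suggests a simple back-of-the-envelope model, built on two hypothetical assumptions. First, the fixed cost is the ~1.31 ms per-query overhead measured on the overhead slide. Second, the variable work divides perfectly across N storage nodes with no data-movement cost. Under those assumptions, the parallel plan beats a single server once the work exceeds fixed × N / (N − 1). Real queries add redistribution and coordination costs, so this is an optimistic bound, not a prediction.

```python
FIXED_MS = 1.31  # per-query ParElastic overhead measured on the overhead slide


def native_ms(work_ms: float) -> float:
    """Single MySQL server: all the variable work runs on one node."""
    return work_ms


def parelastic_ms(work_ms: float, nodes: int, fixed_ms: float = FIXED_MS) -> float:
    """Hypothetical model: fixed overhead plus work split perfectly over `nodes`."""
    return fixed_ms + work_ms / nodes


def break_even_ms(nodes: int, fixed_ms: float = FIXED_MS) -> float:
    """Work above which the parallel plan wins: fixed + w/n < w  <=>  w > fixed*n/(n-1)."""
    return fixed_ms * nodes / (nodes - 1)


for work in (1.0, 10.0, 1000.0):
    print(f"work={work:>7.1f} ms  native={native_ms(work):>7.2f} ms  "
          f"5 nodes={parelastic_ms(work, nodes=5):>7.2f} ms")
print(f"break-even with 5 nodes: {break_even_ms(5):.2f} ms of work")
```

Under these assumptions, tiny key lookups pay the overhead without benefiting from parallelism, while large scans, joins and aggregations amortize it.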


Some things to keep in mind
• Horizontal Scale-Out benefits from
• Being “stateless”, or at least having less state
• Adhering to a truly “shared nothing” approach

• Horizontal Scale-Out is impeded by
• Complex or Shared “State”
• Things that violate the “shared nothing” paradigm


What is ParElastic?
• An approach to relational database virtualization
• "A Hypervisor for the Database Tier"

• Scale out database capacity across many servers
• Effectively handle workloads too big for one server

• Share this pool of database servers among many applications
• Efficiently allocate database capacity to workload

• An elastic, multi-tenant, parallel database architecture
• Built on standard MySQL or MySQL variant databases
• Horizontal Scalability
• Elastic


Some target markets
• Database Virtualization – “Hypervisor for the Database”
• Reduce capex and simplify administration for development
and test

• SaaS Enablement
• Simplified deployment of SaaS applications using multitenancy

• High Volume Database Applications
• High-traffic websites (e.g., social, e-commerce, online games)
• High-speed data ingest (e.g., click tracking, sensor arrays, mobile)


Where do I get ParElastic?


Getting ParElastic
• For Evaluations
  – Available at no charge on Amazon Marketplace
  – Preconfigured for evaluation purposes, not performance testing
  – Runs completely on a single EC2 instance
• For Larger Configurations
  – Contact ParElastic
  – Email: info@parelastic.com
  – Twitter: @parelastic
  – Web: http://www.parelastic.com


Getting ParElastic
• On the Amazon AWS Marketplace
(aws.amazon.com/marketplace)

• Quick start guide and simple (two-step) setup wizard
provided.


Conclusion


Conclusion
• Database Scalability is a very real problem
• The Cloud has put a very complicated wrinkle in it

• The problem was seen before with commodity servers
• Virtualization was able to address this problem

• Several “hacks” have been proposed
• Not really solutions, just hacks

• ParElastic is a database virtualization solution
• Based on standard relational databases
• Provides benefits of horizontal scalability and multi-tenancy

• ParElastic is available for evaluation on many platforms
• Free evaluation also available on Amazon Marketplace

Contacting ParElastic
• Look us up online
– http://www.parelastic.com

• Watch an explainer video
– http://www.parelastic.com/video

• Contact us
– Email: info@parelastic.com


Q&A


Image Credits
• Moore’s Law
  – Wikipedia [http://commons.wikimedia.org/wiki/File%3ATransistor_Count_and_Moore's_Law_-_2011.svg]
• Hercules slays the Hydra
  – Wikipedia [http://commons.wikimedia.org/wiki/File%3AHercules_slaying_the_Hydra.jpg]
• CPU History
  – Phillip E. Ross, “Why CPU Frequency Stalled” [http://spectrum.ieee.org/computing/hardware/why-cpu-frequency-stalled]
• Herding Cats
  – Image from [http://wodongatafe.wordpress.com/2011/05/27/herding-cats-or-facilitating-a-webinar-whats-the-difference/]

