Minnebar 2013 - Scaling with Cassandra

Scaling With Cassandra
Jeff Bollinger – CTO - @jbollinger
Jeff Smoley – Infrastructure Architect

Agenda

About NativeX
The Backstory
Why Cassandra
Cassandra Overview
NativeX Cassandra Implementation / Metrics
What we Learned

NativeX
Formerly W3i
Marketing technology platform that enables developers to build
successful businesses around their apps.

Vanity Metrics

Over 620M unique devices on our network
Over 500 apps in network
> 100M Monthly Active Users
100 GB of data ingest per week

Backstory

A growing mobile advertising network
[Chart: API Requests (billions) by quarter, 2011 Q4 through 2013 Q1; y-axis 0 to 6]

Infrastructure Intensive Model

[Chart: Session Calls (millions) by Week After User Acquired, weeks 0 through 12 of the lifetime of a user; y-axis 0 to 12]

Scale Up Architecture

Microsoft SQL Server
2 Node Cluster (failover)
12 cores / node
192 GB of RAM / node
Compellent SAN
172 disks (SSD, FC, SATA)

CAP Theorem

Consistency
Availability
Partition Tolerance

Databases placed on the diagram: SQL Server, MySQL; MongoDB; Cassandra

Objectives

Scale
•Horizontal
•Incremental cost structure

Resiliency
•No single point of failure
•Geographically distributed

What Needed to Scale

Web Application Tier
Database Tier

Web Application Tier is already a server farm that can scale
horizontally through our VMware environment.
Database Tier was one giant monolithic Microsoft SQL
Server machine.

What is NoSQL?

Stands for Not Only SQL
The NoSQL movement is not about silver bullets and
black boxes.
It’s about understanding problems and focusing on
solutions.
It’s about using the right tool for the right problem.

Selecting Cassandra

DB            | Distributed | Maturity | High Availability | Style                     | Documentation | Native Language Drivers | Popularity
MongoDB       | Yes         | Medium   | Yes               | Document - NoSQL          | Excellent     | Major Languages         | High
VoltDB        | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
MySQL Cluster | Yes         | High     | Yes               | RDBMS - SQL & Key/Value   | Excellent     | Major Languages         | Medium
MySQL ScaleDB | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
Cassandra     | Yes         | Medium   | Yes               | Key/Value - Column Family | Excellent     | Major; Poor .Net        | High
CouchDB       | No          | Medium   | Yes               | Document - NoSQL          | ?             | No - REST only          | Medium
RavenDB       | Yes?        | Low      | No                | Document - NoSQL          | Poor          | C#, JS, REST            | Medium
Couchbase     | Yes         | Medium   | Yes               | Key/Value - Document      | Good          | Major Languages         | Medium

http://nosql.mypopescu.com/ is a helpful site for discovering and learning about
different DB Systems.

*Disclaimer: this data was compiled in spring of 2012 and may not reflect the
current state of each database system shown here.

Top Choices

Considered Multiple DB Providers
MySQL Cluster
Relational and very familiar.
Has physical row limitations.
MongoDB
Data modeling was simpler than C*.
Not very clear if it had multi-cluster support.
Cassandra
At the very core it’s all about scalability and resiliency.
Data modeling a little scary, limited .Net support.

Cassandra

Multi-node
Multi-cluster
Tunable Consistency
Highly Available
Durable
Shared Nothing
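
Tunable consistency means each read and write chooses how many replicas must respond. The sketch below shows the usual overlap rule: a read is guaranteed to touch at least one replica that holds the latest acknowledged write when R + W > N. The replication factor of 3 is an assumption (the deck does not state NativeX's setting), and the function names are illustrative, not a Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a QUORUM operation: a strict majority."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # assumed replication factor, not taken from the deck

print(quorum(RF))                            # 2
print(overlaps(quorum(RF), quorum(RF), RF))  # True: QUORUM reads + QUORUM writes
print(overlaps(1, 1, RF))                    # False: ONE + ONE trades consistency for latency
```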

C* at NativeX

C* was not a replacement DB system, but an addition.
C* solves a very specific problem (for us).
Writing large volumes of data quickly.
Reading very specific data out of a large record set.
NoSQL solutions, like C*, are not meant to be a
replacement for everything.
You will make your life harder if you try!
The same should be said about Relational Databases.
They don’t solve every problem!

Data Classification

We have three major classifications of data.
Configuration
Activity Tracking
Device History

Configuration Data

This data is relatively small in total size and is used
to operationally run our products. Examples
include:
Mobile Apps
Offers
Campaigns
Restrictions
Queue Settings
This data is typically relational and therefore
continues to be stored in MS SQL Server.

The Very Basics of C* Data Modeling

Data is stored inside of Column Families using nested Key/Value pairs.
A Row Key maps to a collection of Columns.
A Column Name (AKA Column Key) maps to a Column Value.
The Column Name is stored alongside the Value.
A common strategy is to store JSON/XML in the Column Value.
(Side note, if you’ve heard of Super Columns, forget about them, they
hurt more than they help)
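
The nested Key/Value structure above can be pictured with plain dictionaries. This is only an in-memory sketch, not a Cassandra client: the `device_history` column family, the row key, and the column-name layout are hypothetical. It relies on one property of C*, that columns within a row are kept ordered by Column Name, which is what makes range slices cheap.

```python
import json

# Column Family -> Row Key -> {Column Name: Column Value}
device_history = {}  # hypothetical column family


def insert(cf, row_key, column_name, value):
    """Store a JSON-encoded value under (row key, column name)."""
    cf.setdefault(row_key, {})[column_name] = json.dumps(value)


def get_slice(cf, row_key, start, end):
    """Return columns whose names fall in [start, end], in column-name order."""
    row = cf.get(row_key, {})
    return [(name, json.loads(row[name]))
            for name in sorted(row) if start <= name <= end]


# Column names lead with a timestamp, so sorting by name sorts by time.
insert(device_history, "device-123", "2013-04-06T10:15:00Z|click", {"offer_id": 42})
insert(device_history, "device-123", "2013-04-05T09:00:00Z|run", {"app_id": 7})

for name, value in get_slice(device_history, "device-123", "2013-04-05", "2013-04-06~"):
    print(name, value)
```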

Activity Tracking Data

Raw tracking data for all activities used by the ETL process to
produce OLAP data on an hourly basis.
Synonymous with Time Series, Event Series, or Logging data.
Examples include:
Running of Mobile Apps
Viewing Offers
Clicking on Offers
Receiving Rewards
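
One common way to lay out time-series data for an hourly ETL is to bucket the Row Key by activity type and hour, so each ETL run reads a bounded set of rows. The deck does not say how NativeX keyed this data; the key format and names below are assumptions for illustration.

```python
from datetime import datetime, timezone


def hour_bucket_row_key(activity: str, ts: datetime) -> str:
    """Row key such as 'offer_click:2013040610' (activity + UTC hour bucket)."""
    return f"{activity}:{ts.astimezone(timezone.utc):%Y%m%d%H}"


def event_column_name(ts: datetime, event_id: str) -> str:
    """Column name that sorts by time, with an id to keep same-instant events distinct."""
    return f"{ts.astimezone(timezone.utc).isoformat()}|{event_id}"


ts = datetime(2013, 4, 6, 10, 15, tzinfo=timezone.utc)
print(hour_bucket_row_key("offer_click", ts))  # offer_click:2013040610
print(event_column_name(ts, "evt-1"))
```

A single row per activity per hour concentrates that hour's writes on one replica set, so high-volume designs often add a shard suffix to the row key.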

Device History Data

Historical activities that each device has performed while
being part of NativeX’s network.
Used for offer classification for a given device.
Examples include:
Clicking on Offers
Running Mobile Apps
Redeeming Rewards

Hardware

12 Nodes
Cisco UCS Blades
12 Cores @ 2.0GHz with Hyper-threading
64GB of Ram
2 x 480GB Intel commodity SSDs in RAID 0
10.5 TB total, ~7 TB usable
Red Hat Linux
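
The storage totals can be checked with a little arithmetic. This is one reading of the numbers: it assumes the 480GB drive size is in decimal gigabytes and that "10.5 TB" is expressed in binary terabytes (TiB). The deck does not say what accounts for the gap between total and usable space.

```python
NODES = 12
SSDS_PER_NODE = 2
SSD_GB = 480  # assumed to be decimal gigabytes, as drives are usually labeled

raw_bytes = NODES * SSDS_PER_NODE * SSD_GB * 10**9
raw_tib = raw_bytes / 2**40  # RAID 0 stripes the drives, so no capacity is lost to parity

print(round(raw_tib, 1))      # 10.5
print(round(7 / raw_tib, 2))  # 0.67: share of raw space the deck treats as usable
```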

Commodity Vs. Enterprise

We chose to use Enterprise hardware for the servers
so that we would have support for them.
However, our workload is very read heavy and 15K
rpm rotational disks were a bottleneck.
We chose to swap out the rotational disks for commodity
SSDs. (Enterprise SSDs were 10x as expensive)
We have limited support on the hardware because of
this.

Internal C* Cluster Stats

240 peak Writes per second per node
2,880/sec cluster wide
888 peak Reads per second per node
10,656/sec cluster wide
0.53 ms average Write Latency per request
1.7 ms average Read Latency per request
Almost 3 TB of data adding 1 TB a month
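
The cluster-wide figures follow from the per-node peaks times 12 nodes. The same arithmetic gives a rough headroom estimate, assuming growth stays linear at 1 TB a month and that the ~3 TB figure counts against the ~7 TB usable space in the same units; both are assumptions, not statements from the deck.

```python
NODES = 12
PEAK_WRITES_PER_NODE = 240
PEAK_READS_PER_NODE = 888

cluster_writes = PEAK_WRITES_PER_NODE * NODES
cluster_reads = PEAK_READS_PER_NODE * NODES
read_write_ratio = PEAK_READS_PER_NODE / PEAK_WRITES_PER_NODE

USABLE_TB = 7            # approximate, from the hardware slide
STORED_TB = 3            # "almost 3 TB"
GROWTH_TB_PER_MONTH = 1  # assumed to stay constant

months_of_headroom = (USABLE_TB - STORED_TB) / GROWTH_TB_PER_MONTH

print(cluster_writes)      # 2880
print(cluster_reads)       # 10656
print(read_write_ratio)    # 3.7
print(months_of_headroom)  # 4.0
```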

Application Side Latencies

MS SQL
Writes 12 ms
Reads 1.5 ms
C*
Writes 3 ms
Reads 4 ms

Can We Make Reads Faster?

We think that in SQL Server, reads were faster
because most of the data sat in memory.
We might be able to achieve lower latencies in C* if
we gave each node just as much memory as our SQL
Server.
To counteract the increased latencies we used
certain techniques like parallel reads using
multithreading in our web application.
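
A minimal sketch of the parallel-read idea, in Python for illustration (it is not the NativeX web application's code). `read_row` is a hypothetical stand-in for one blocking C* read and simply sleeps for about 4 ms; with a thread pool, several reads wait concurrently, so total latency approaches that of the slowest read rather than the sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_row(row_key: str) -> dict:
    """Hypothetical blocking read; the sleep simulates ~4 ms of latency."""
    time.sleep(0.004)
    return {"row_key": row_key}


def read_many(row_keys, max_workers=8):
    """Issue the reads concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_row, row_keys))


rows = read_many(["device-1", "device-2", "device-3"])
print([r["row_key"] for r in rows])  # ['device-1', 'device-2', 'device-3']
```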

Not all Roses

There are still challenges with C*, like any complex
system.
More moving parts and things that need to stay in
sync.
Misconfigurations can literally destroy your data.
Certain config settings cannot be changed after you
are live, such as the number of virtual Racks.

Lessons Learned

Get into production early
Data Import = Reality
Break down communication barriers
Understanding your IO profile is really important
Cassandra changes quickly, you need to keep up
Scalable systems like C* have a massive number of
knobs; you need to know them
Leverage cloud resources in working toward
right-sizing your cluster

Thanks

We’re hiring
http://nativex.com/careers/
Join the MSP C* Meetup
http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/
Email us
Jeff.Smoley@nativex.com
Jeff.Bollinger@nativex.com or @jbollinger
Slide Deck
http://www.slideshare.net/JBollinger/minnebar-2013-scaling-with-cassandra
