3. DataStax: CRN’s “10 Coolest Big Data Startups”
Cassandra: InfoWorld’s Technology of the Year
1,000+ production deployments and 300 customers
$84M in funding from industry-leading investors
4. We are the first viable alternative to
Oracle for modern online
applications.
We seek to be the first and best
choice in databases.
7. Internet of Things Database Requirements
• “UTC subject predicate”: Time series data and metadata are the lingua franca of
sensors/device data communications
• FAST AND ALWAYS ON: High-velocity ingest rates from geographically dispersed inputs
with variable schemas/data models is the norm—and unless you tell them to do so, sensors
never, ever sleep…
• HOT AND COLD: Real-time data and analytics vs. data reservoir/data factory needs vary.
• DHTs: Wide-row column-oriented distributed hash tables are the optimal home for IoT
operational datastores
• AND: Other key functionality needed includes indexed search, along with both batch and real-time analytics, with data-in-flight and data-at-rest security an emerging need
• SPOILER ALERT: DataStax Enterprise supports all of the above
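The "wide-row" idea in the DHT bullet can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `partition_key`, `owning_node`, `WideRowStore`, and the one-day bucket size are hypothetical names and choices made for illustration, not DataStax or Cassandra APIs. Each partition is placed by hashing its key (the DHT part), and readings inside a partition stay sorted by timestamp (the wide-row part).

```python
import bisect
import hashlib
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(sensor_id: str, ts: datetime) -> str:
    """Hypothetical key: one partition per sensor per UTC day."""
    return f"{sensor_id}:{ts.astimezone(timezone.utc):%Y-%m-%d}"


def owning_node(key: str, node_count: int) -> int:
    """Toy hash placement; a real cluster uses a token ring, not modulo."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


class WideRowStore:
    """In-memory stand-in for a wide-row table (illustration only)."""

    def __init__(self) -> None:
        self._rows = defaultdict(list)  # key -> sorted [(ts, value), ...]

    def write(self, sensor_id: str, ts: datetime, value: float) -> None:
        row = self._rows[partition_key(sensor_id, ts)]
        bisect.insort(row, (ts, value))  # keep columns ordered by time

    def read_range(self, sensor_id: str, start: datetime, end: datetime):
        """Return readings with start <= ts < end from start's day bucket."""
        row = self._rows[partition_key(sensor_id, start)]
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (end,))
        return row[lo:hi]
```

Writes can arrive out of order and a time-range read is still a single contiguous slice of one partition, which is the access pattern the bullet describes for sensor data.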
8. Time Series Analytics: 70B readings
Smart Grid Proof of Concept: Analyze 2 years of Smart Meter data for 1M households
Improvements in demand forecasting could yield EBITDA > $100M per GW saved
• $5M CAPEX
• 10 man-months delivery (Deploy, DevOps, Tuning)
• Ongoing OPEX of > $1M

• $450K OPEX
• 2 DevOps running 15 AWS nodes
• Faster performance in 2 weeks
• …All in the cloud
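The 70B figure in the slide title is consistent with one reading per meter every 15 minutes. The household count and the two-year span come from the slide; the 15-minute interval is an assumption, since the slide does not state it.

```python
HOUSEHOLDS = 1_000_000             # from the slide
DAYS = 2 * 365                     # two years, ignoring leap days
READINGS_PER_DAY = 24 * 60 // 15   # assumed 15-minute interval -> 96 per day

total_readings = HOUSEHOLDS * DAYS * READINGS_PER_DAY
print(f"{total_readings:,}")       # 70,080,000,000
```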
9. Major Changes: The Evolving Data Center
[Diagram labels: LOB App (x3); Oracle, MySQL, SQL Server; Data Warehouse (Teradata/Exadata); “What’s Happening?”: Hyper Velocity, Transactional (NoSQL); “What Happened?”: Massive Volume, Bit Bucket (Hadoop)]
11. Common Use Cases
• Web product searches
• Internal document search (law firms, etc.)
• Real estate/property searches
• Social media match ups
• Web & application log management / analysis
• Big data OLTP and write intensive systems
• Time series data management
• High velocity device data consumption and analysis
• Healthcare systems input and analysis
• Media streaming (music, movies, etc.)
• Online Web retail (shopping carts, user transactions, etc.)
• Online gaming (real-time messaging, etc.)
• Real time data analytics
• Web click-stream analysis
• Buyer event and behavior analytics
• Fraud detection and analysis
• Risk analysis and management
• Social media input and analysis
• Supply chain analytics
14. The New DR: Simian Army “Dystopia as a Service”
Virginia
London
Santa Clara
Sydney
15. Heterogeneous Workloads: Active Everywhere
Read
Analyze
Write
Virginia
London
Search
Write
Santa Clara
Sydney
Search
Write
Read
16. Our Product Solution
• DataStax Enterprise
powers the big data apps
that transform business.
• Extreme Data Velocity
• Continuous Availability
• Operational Simplicity
18. Performance: NoSQL Leadership
Cassandra vs. HBase:
• 10x more read throughput
• 100x faster read latency
• 8x more write throughput
• 8x faster scan latency
• 4x more scan throughput
Source: “Solving Big Data Challenges for Enterprise Application Performance Management,”
Tilmann Rabl et al., University of Toronto, VLDB 2012 (August 2012, Istanbul)
20. From STB to the Scalable Cloud Message Bus
Use Case: X1 Sports App
[Chart: API/sec (0 to 18,000) vs. ring size (4 to 24)]
Even in preproduction environment prior to tuning, achieved near-linear scalability
Enabling a richer active consumer experience across multiple devices, multiple platforms
21. Instagram Scales Engaged Networks
• Transitioned from Redis (in-memory cache) to
Cassandra in Amazon Web Services EC2
• Doubled cluster—and then doubled again—to support
150MM users on new infrastructure
• Continue to scale in spite of Justin Bieber storms, video
formats, new features, new markets
CASSANDRA AT INSTAGRAM
Rick Branson, Infrastructure Engineer
@rbranson
commit acb02daea57dca889c2aa45963754a271fa51566
Author: Rick Branson
Date: Sun Feb 10 20:36:34 2013 -0800
Doubled C* cluster
2013 Cassandra Summit
#cassandra13
June 12, 2013
San Francisco, CA
22. Our Vision
DataStax is driving
Cassandra to be the first
viable alternative to the
Oracle database for
companies who are
transforming the way they
interact with customers.
Getting ahead of exploding growth
• Sign big, new contracts all the time (ESPN)
• 200M unique users per month
• 40TB of data
Flexible architecture
• “Couldn’t shoehorn RDBMS technology”
Very small operations team
• 3 people
• 20 clusters
• 100s of nodes
23. Why We Exist
Today’s applications must be
always available and lightning
fast as they scale to previously
unimaginable levels.
Cassandra delivers both with a
beautifully simple and elegant
architecture.
“We need a real-time, massively
scalable architecture, where no
one node is a single point of
failure, that can easily span
multiple data centers and cloud
availability zones, and that’s
Cassandra.”
24. What We Do Best
Cassandra was designed to do
things that are impossible in
other databases when it comes
to availability and
performance. Forget about
losing a machine here or there: Cassandra delivers a world
where you can lose an entire
datacenter and still perform as
your customers expect.
“We have to be ready for disaster
recovery all the time. It’s really
great that Cassandra allows for
active-active multiple data centers
where we can read and write
anywhere”
Jay Patel
Technical Architect at eBay
(Describing why they switched from legacy
relational architecture)
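One way to see why losing a data center need not stop reads and writes: when replicas live in every data center and a quorum is computed per data center, each surviving site can still reach quorum on its own. The following is a minimal sketch, assuming a replication factor of 3 per site; the data center names are borrowed from the earlier slides, and the function names are hypothetical helpers, not driver APIs.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_serve_locally(live_replicas: int, replication_factor: int) -> bool:
    """True if a data center can satisfy a local-quorum request by itself."""
    return live_replicas >= quorum(replication_factor)


# Assumed topology: 3 replicas of the data in each data center.
replicas = {"Virginia": 3, "London": 3, "Santa Clara": 3, "Sydney": 3}
# Simulated outage: Virginia is lost entirely, London loses one node.
live = {"Virginia": 0, "London": 2, "Santa Clara": 3, "Sydney": 3}

status = {dc: can_serve_locally(live[dc], rf) for dc, rf in replicas.items()}
print(status)
```

In this simulated outage only Virginia fails the check; clients routed to London, Santa Clara, or Sydney can keep reading and writing.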
38. Security in Cassandra
FEATURES AND BENEFITS
Internal Authentication: manages login IDs and passwords inside the database
+ Ensures only authorized users can access a database system using internal validation
+ Simple to implement and easy to understand
+ No learning curve from the relational world
Object Permission Management: controls who has access to what and who can do what in the database
+ Provides granular control over who can add/change/delete/read data
+ Uses familiar GRANT/REVOKE from relational systems
+ No learning curve
Client to Node Encryption: protects data in flight to and from a database cluster
+ Ensures data cannot be captured/stolen en route to a server
+ Data is safe both in flight from/to a database and on the database; complete coverage is ensured
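The GRANT/REVOKE model named on the security slide above can be sketched as a pair of statement builders. The keyspace, table, and user names here are hypothetical, and the helpers only build strings following the CQL form `GRANT <permission> ON <resource> TO <user>`; they do not connect to a cluster.

```python
# Permission names accepted by CQL GRANT/REVOKE statements.
PERMISSIONS = {"ALL", "ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT"}


def _check(permission: str) -> str:
    """Normalize a permission name and reject unknown ones."""
    name = permission.upper()
    if name not in PERMISSIONS:
        raise ValueError(f"unknown permission: {permission}")
    return name


def grant(permission: str, resource: str, user: str) -> str:
    """Build a GRANT statement (plain string formatting, trusted input only)."""
    return f"GRANT {_check(permission)} ON {resource} TO {user};"


def revoke(permission: str, resource: str, user: str) -> str:
    """Build the matching REVOKE statement."""
    return f"REVOKE {_check(permission)} ON {resource} FROM {user};"


# Hypothetical keyspace, table, and users for illustration.
statements = [
    grant("SELECT", "KEYSPACE metering", "analyst"),
    grant("MODIFY", "TABLE metering.readings", "ingest_app"),
    revoke("MODIFY", "TABLE metering.readings", "analyst"),
]
for statement in statements:
    print(statement)
```

Here SELECT covers reads and MODIFY covers add/change/delete, which maps onto the "who can add/change/delete/read data" benefit.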
39. Advanced Security in DataStax Enterprise
FEATURES AND BENEFITS
External Authentication: uses external security software packages to control security
+ Only authorized users have access to a database system using external validation
+ Uses most trusted external security packages (Kerberos, LDAP), mainstays in government and finance
+ Single sign on to all data domains
Transparent Data Encryption: encrypts data at rest
+ Protects sensitive data at rest from theft and from being read at the file system level
+ No changes needed at application level
+ Can encrypt both Cassandra and Hadoop data
Data Auditing: provides trail of who did and looked at what/when
+ Supplies admins with an audit trail of all accesses and changes
+ Granular control to audit only what’s needed
+ Uses log4j interface to ensure performance and efficient audit operations