Cassandra at NoSql Matters 2012
- 4. Why Big Data Matters
Research by McKinsey & Company shows eye-opening differences in 10-year category growth rates between businesses that use their big data smartly and those that do not.
©2012 DataStax
- 5. Big data
• Analytics (Hadoop)
• ?
• Realtime (“NoSQL”)
- 7. Industries & use cases
Industries
• Financial
• Social Media
• Advertising
• Entertainment
• Energy
• E-tail
• Health care
• Government
Use cases
• Time series data
• Messaging
• Ad tracking
• Data mining
• User activity streams
• User sessions
• Anything requiring: scalable + performant + highly available
- 8. Why Cassandra?
• Fully distributed, no SPOF
• Multi-master, multi-DC
• Linearly scalable
• Larger-than-memory datasets
• Best-in-class performance (not just writes!)
• Fully durable
• Integrated caching
• Tuneable consistency
- 9. Availability
• “There is no such thing as standby infrastructure: there is stuff you always use and stuff that won’t work when you need it.” -- Ben Black: founder, Boundary; ex-AWS
• “The biggest problem with failover is that you're almost never using it until it really hurts. It's like backups that you never test.” -- Rick Branson: Instagram; ex-DataStax
- 13. Partitioning
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru gender: F
johnny age: 12 gender: M
suzy age: 10 gender: F
- 14. Partitioning
Primary key determines placement*
jim age: 36 car: camaro gender: M
carol age: 37 car: subaru gender: F
johnny age: 12 gender: M
suzy age: 10 gender: F
- 15. PK MD5 Hash
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
MD5 hash operation yields a 128-bit number for keys of any size.
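The hashing step on this slide can be sketched in a few lines of Python. This is only an illustration of "MD5 turns a key of any size into a 128-bit number": the function name `md5_token` and the UTF-8 encoding of the key are assumptions, and Cassandra's own partitioner may post-process the digest differently, so the values printed here are not guaranteed to match real cluster tokens.

```python
import hashlib


def md5_token(key: str) -> int:
    """Hash a key of any length to a 128-bit unsigned integer (hypothetical helper)."""
    # Assumption: keys are encoded as UTF-8 before hashing.
    digest = hashlib.md5(key.encode("utf-8")).digest()  # MD5 always yields 16 bytes
    return int.from_bytes(digest, byteorder="big")


if __name__ == "__main__":
    for key in ("jim", "carol", "johnny", "suzy"):
        # Print the token as 32 hex digits, i.e. 128 bits.
        print(f"{key:<7} {md5_token(key):032x}")
```

Because the output size is fixed regardless of key length, every key lands somewhere in the same 0 to 2^128 space, which is what makes range-based placement possible.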
- 17. Token range per node
Node Start End
A 0xc000000000.. 0x0000000000..
B 0x0000000000.. 0x4000000000..
C 0x4000000000.. 0x8000000000..
D 0x8000000000.. 0xc000000000..
Keys and their hashes:
jim 5e02739678...
carol a9a0198010...
johnny f4eb27cea7...
suzy 78b421309e...
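The lookup from a hashed key to the node whose range contains it can be sketched as follows. This is a minimal illustration, not Cassandra's implementation: the ring below is a hypothetical four-node ring modeled on the slide, with the elided digits of each boundary assumed to be zero, and each range assumed to be exclusive at its start and inclusive at its end. The function name `owner` is also made up for the example.

```python
import bisect

# Hypothetical ring modeled on the slide. Each entry is (range end, node).
# Assumption: the digits elided on the slide ("..") are all zero.
RING = [
    (0x00 << 120, "A"),  # A owns the wrap-around range 0xc0.. -> 0x00..
    (0x40 << 120, "B"),  # B owns 0x00.. -> 0x40..
    (0x80 << 120, "C"),  # C owns 0x40.. -> 0x80..
    (0xC0 << 120, "D"),  # D owns 0x80.. -> 0xc0..
]
RANGE_ENDS = [end for end, _ in RING]


def owner(token: int) -> str:
    """Return the node whose (start, end] range contains the 128-bit token."""
    index = bisect.bisect_left(RANGE_ENDS, token)
    if index == len(RING):
        index = 0  # past the last boundary: wrap around to the first node
    return RING[index][1]


if __name__ == "__main__":
    # Tokens built from the leading byte of each hash shown on the slide.
    for key, leading_byte in (("jim", 0x5E), ("carol", 0xA9),
                              ("johnny", 0xF4), ("suzy", 0x78)):
        print(key, "->", owner(leading_byte << 120))
```

A sorted list plus binary search keeps the lookup logarithmic in the number of nodes, and adding a node only means inserting one more boundary.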
- 22. Replication
Node A Node B
Node D Node C
carol a9a0198010...
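One simple way to picture replica placement on the four-node ring is to start at the node that owns the key and walk clockwise until enough replicas are collected. The sketch below assumes exactly that simple strategy; the slides do not state a replication factor, so the value 3 used here, the clockwise order A, B, C, D, and the function name `replicas` are all assumptions for illustration. Rack-aware or data-center-aware placement would choose nodes differently.

```python
# Hypothetical ring order, clockwise, using the node names from the slide.
RING_ORDER = ["A", "B", "C", "D"]


def replicas(primary: str, replication_factor: int) -> list[str]:
    """Return the primary node plus the next nodes clockwise on the ring."""
    if not 1 <= replication_factor <= len(RING_ORDER):
        raise ValueError("replication factor must be between 1 and the ring size")
    start = RING_ORDER.index(primary)
    # The modulo makes the walk wrap from the last node back to the first.
    return [RING_ORDER[(start + step) % len(RING_ORDER)]
            for step in range(replication_factor)]


if __name__ == "__main__":
    # Assumption: the key's range belongs to node D and three copies are kept.
    print(replicas("D", 3))
```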
- 25. Highlights
• Adding capacity is application-transparent and requires no downtime
• No SPOF, not even temporarily
• No “primary” replica
• Configurable synchronous/asynchronous replication
• Tolerates node failure; never have to restart replication “from scratch”
• “Smart” replication avoids correlated failures
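The synchronous/asynchronous choice is made per request through the consistency level: the coordinator waits for a chosen number of replica acknowledgements and the remaining replicas catch up asynchronously. The arithmetic behind the common levels can be sketched as below. The helper names `acks_required` and `read_sees_latest_write` are hypothetical, and only the ONE, QUORUM, and ALL levels are modeled.

```python
def acks_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond before the request is acknowledged."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of the replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not modeled in this sketch: {level}")


def read_sees_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (W + R > N)."""
    writes = acks_required(write_level, replication_factor)
    reads = acks_required(read_level, replication_factor)
    return writes + reads > replication_factor


if __name__ == "__main__":
    rf = 3  # assumption: three replicas per key
    for write_level, read_level in (("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")):
        print(write_level, read_level, read_sees_latest_write(write_level, read_level, rf))
```

Lower levels trade the overlap guarantee for latency and availability, which is the tuning knob the slides refer to.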
- 26. What about performance?
• Log-structured storage engine avoids random I/O
• Excellent performance on both reads and writes
• Row-level isolation via concurrent algorithms
  • No locking
• Built-in compression improves cache hotness
• “Row cache” can replace memcached
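The idea behind a log-structured engine can be sketched with a toy store: every write is appended to a log and placed in an in-memory table, and full tables are written out once as sorted, immutable runs, so nothing on disk is updated in place. This is a deliberately simplified illustration and not Cassandra's storage engine; the class name `TinyLogStructuredStore`, the in-memory lists standing in for files, and the flush threshold are all assumptions, and compaction, deletes, and crash recovery are left out.

```python
import bisect


class TinyLogStructuredStore:
    """Toy log-structured store: append-only log, memtable, sorted immutable runs."""

    def __init__(self, memtable_limit: int = 2):
        self.commit_log = []  # stands in for a sequentially written log file
        self.memtable = {}    # most recent writes, held in memory
        self.sstables = []    # flushed runs as (sorted keys, values), newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # sequential append, never a seek
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as one sorted, immutable run."""
        if not self.memtable:
            return
        items = sorted(self.memtable.items())
        keys = [k for k, _ in items]
        values = [v for _, v in items]
        self.sstables.append((keys, values))
        self.memtable = {}

    def get(self, key: str):
        if key in self.memtable:
            return self.memtable[key]
        # Search the newest run first so that later writes win.
        for keys, values in reversed(self.sstables):
            index = bisect.bisect_left(keys, key)
            if index < len(keys) and keys[index] == key:
                return values[index]
        return None


if __name__ == "__main__":
    store = TinyLogStructuredStore(memtable_limit=2)
    store.put("jim", "camaro")
    store.put("carol", "subaru")  # reaches the limit and triggers a flush
    store.put("jim", "mustang")   # newer value stays in the memtable
    print(store.get("jim"), store.get("carol"), store.get("suzy"))
```

Writes only ever append, which is why they avoid random I/O; reads pay for it by checking several runs, which real engines mitigate with caches, indexes, and compaction.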
- 27. reads/s and writes/s
[Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0; y-axis from 0 to 35000]
- 29. Netflix
Application/Use Case
• Manage subscriber interactions with downloaded movies
• Need to handle distributed databases all over the world (40 countries)
• Need better TCO than Oracle
Why Cassandra?
• Easy scale and multi-data center support for geographical data distribution
• Data model perfect fit for customer interaction data
• Much better TCO than Oracle or SimpleDB
“I can create a Cassandra cluster in any region of the world in 10 minutes. When marketing guys decide we want to move into a certain part of the world, we’re ready.”
- 30. Constant Contact
Application/Use Case
• Manage marketing/email campaigns for small businesses
• Needed database to handle social media data that is very large in volume and must be maintained for a long time
• Data is unstructured in nature
Why Cassandra?
• Cassandra built for big data scale and able to persist, manage, and quickly query big data
• Deployed application on Cassandra in 1/3rd the time and 1/10th the cost of Oracle
“Whenever we need new capacity, we just add new nodes online and we’re able to meet whatever demand we have. Cassandra is great for that.”
- 31. ReachLocal
Application/Use Case
• ReachLocal provides end-to-end Internet advertising services to small and medium-sized businesses in eight countries
• Must track most or all user interaction with marketing campaigns on web sites
Why Cassandra?
• The amount of information was beyond the scalability limits of traditional RDBMSs
• Has to replicate data to six data centers around the world
• Needed integration with real-time data and analytics/search
- 32. Backupify
Application/Use Case
• Cloud-based utility that enables backups and searches of Google Apps, Gmail, Facebook, Twitter, Blogger and other content
• Must write lots of data very quickly
Why Cassandra?
• Big data requirements necessitated easy scale-out and a continuously available database architecture
• Strong community support for Cassandra
• TCO was much better than others
“Cassandra was just a better design all around – more truly horizontally scalable and with less management overhead – and there’s no single point of failure. I looked at Cassandra’s architecture and thought, ‘Yeah, that’s how you do it.’”
- 33. Openwave
Application/Use Case
• Openwave Messaging delivers a next-generation converged messaging platform with cloud and social integration capabilities
Why Cassandra?
• Needed new database that would support geographic redundancy, continuous availability, and big data scale
• Required high IOPS database speed
• Better TCO than prior Oracle database
“Here are the big ‘checkbox’ items for us with Apache Cassandra: There is no single point of failure, it offers high read-and-write performance, and it has the ability to work on commodity hardware.”
- 34. Healthx
Application/Use Case
• Develops and manages online portals for the healthcare market
• Delivered via cloud platform
• Manages provider, patient, and other related data
Why DataStax Enterprise?
• Needed to scale, perform, and search data faster than previous Microsoft SQL Server database farm
• Integrated big data platform that provides one database cluster for all real-time and search data
“We really like the integration with Solr. We get the full redundancy that you’d expect out of Cassandra as well as the full text indexing of Solr. The two things together make a win.”
- 35. Big data
• Analytics (Hadoop)
• ?
• Realtime (“NoSQL”)
- 39. Big data
• Analytics (Hadoop)
• DataStax Enterprise
• Realtime (Cassandra)
- 43. Better Hadoop than Hadoop
• “Vanilla” Hadoop
  • 8+ services to set up, monitor, back up, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, Region Server, ...)
  • Single points of failure
  • Can't separate online and offline processing
• DataStax Enterprise
  • Single, simplified component
  • Self-organizes based on workload
  • Peer to peer
  • JobTracker failover
- 44. Enterprise search with Solr
SELECT title FROM solr WHERE solr_query='title:natio*';
title
--------------------------------------------------------------------------
Bolivia national football team 2002
List of French born footballers who have played for other national teams
Lithuania national basketball team at Eurobasket 2009
Bolivia national football team 2000
Kenya national under-20 football team
Bolivia national football team 1999
Israel men's national inline hockey team
Bolivia national football team 2001
- 45. Managing & Monitoring Big Data
DataStax OpsCenter manages and monitors all Cassandra and Hadoop operations
- 46. Questions?
• http://www.datastax.com/docs
• http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1
• http://www.datastax.com/products/enterprise