SlideShare a Scribd company logo
1 of 29
Confidential © 2011 eBay Inc. All rights reserved. eBay and the eBay logo are registered trademarks of eBay Inc.
Cassandra at
Anurag Jambhekar, Sr Manager Database Engineering
Feng Qu, Principal Database Engineer
Scale
#cassandra13
confidential 2
Databases in eBay marketplace’s transactional platform
Xmp is a new emerging database system which
offer best of relational and NoSQL model. It provide high
transaction throughput with ACID compliance and auto
sharding.)
#cassandra13
confidential 3
Database Scalability Patterns
• Logical database host and transparent mapping to physical
database
• Multiple read copies
– One write database and multiple read databases
• Horizontal Scaling
– Shard data across multiple database hosts using Mod/Range/Lookup
based Sharding
• Vertical scaling of database hardware
• Archiving of data
• Auto sharding in NoSQL makes it even easier to add capacity
#cassandra13
confidential 4
Database Logical Host
• eBay applications interact with the databases through a level of
indirection that is referred to as Logical Databases.
• Logical databases to physical database relation is many-to-one
– Smaller logicals from different DB families are mapped to same physical
– DAL( Data Access Layer) does the translation from logical host to
physical host
• Why Logical Databases?
– Gives DBAs the independence to move objects based on load and traffic
– Code modification is not required when the physical topology changes
– Helps failover scenarios
– Allows to quickly turn off less critical feature to minimize site impact
#cassandra13
confidential 5
Always Available Read Databases
App pools App pools
DAL Load
balancer
DAL Load
Balancer
#cassandra13
confidential 6
MOD Based Model
• Implicit allocation/routing based on a mathematical formula
• A modulus function of primary key provides host ID to allocate/use
• Applies mostly in case of user generated data
• Allows for simple segmentation with almost zero overhead
• Limits to scaling only to “N” based on initial configuration#cassandra13
confidential 7
Range Based Model – Item Split
UserLookupHost
Item_host_lookup
UserId=10056, UserItemhost=2
LookupHost
iItem_host_item_id_range
UserItem1host : 1 -100
UsrerItem2host : 101-200
Useritem3host : 201 -300
….UserItem1host UserItem2host UserItem27host
Selling pools All pools
Item Range : 101 -200
Allocation
Location
• Allocation based on seller segment, and seller segment based on user host, seller level
- All of a sellers items are listed in certain set of Item hosts
• Each User Item host has a range of IDs associated with it
- All items listed on a item host get an ID within a specified Range
• ID being in a fixed range allows for location of items
• Indefinitely scalable
Item Id = 150
#cassandra13
confidential 8
Lookup Based Model : User Split
User_id_lookup
User_name_lookup
UserLookupHost
UserReadWrite10host
UserReadFailover10host
Replica
UserReadWrite11host
UserReadFailover11host
Replica
UserReadWrite22host
UserReadFailover22host
Replica
……….
Multiple USER Read/Write Databases
App Pools
User lookup based on user id or name
User read/Write
• Initial allocation of user based on MOD/ round robin
• Location based on persistent mapping stored in UserLookup database family
- Lookup based on user numeric id or name id.
• Each host has a subset of users and support both read and write
• Infinitely scalable#cassandra13
confidential 9
Challenges of Traditional RDBMS
• Performance penalty to maintain ACID features
• Lack of native sharding and replication features
• Cost of software/hardware
• Higher cost of commit
#cassandra13
confidential 10
Few words about myself
• Am a RDBMS veteran.
• Started with Oracle 5 in early 1990s.
• For more than a decade, I worked on Oracle, MS SQL
Server, MySQL in DoubleClick, Yahoo and Intuit
• In recent years, I am working at eBay marketplace focusing
on NoSQL technology, like Cassandra, MongoDB.
#cassandra13
confidential 11
Why Cassandra?
• Flexible schema
• Good performance for both reads/writes
• Horizontal linear scalability
• Built-in high availability
• No SPOF
• Automatic Geo load balancing and DR with multiple data centers deployment
• Automatic sharding for load balancing across multiple nodes
• Real-time data analytics, search with DSE, no more ETL
• No manual data purging required
#cassandra13
confidential 12
Cassandra @ eBay
• Started in 2011
• Uses Apache Cassandra and DataStax Enterprise
• Java Hector client used most, and evaluating other options
• 10+ production clusters on 100+ bare metal nodes with 250TB
storage provisioned across multiple data centers
• 100TB+ user data on local HDD and shared SSD array
• Cluster size varies from 6-node cluster to 32-node cluster
#cassandra13
confidential 13
Cassandra Environment
• Software
• Cassandra 1.0/1.1/1.2, DSE 3.0, RHEL 6.3, Hector 1.1
• Standard hardware:
• HP DL380, 12 cores, 96GB RAM, 5.5TB raid10 HDD
• Dell R620, 12 cores, 128GB RAM, 1TB raid10 HDD , 10 Gbps NIC
• Violin 6200/6600 flash memory array
• VM – coming soon
• Two types of cluster:
• Dedicated cluster for large use case
• Multi-tenant cluster for small use cases
#cassandra13
confidential 14
eBay Scale
#cassandra13
• Data Centers: 3
• Production Clusters: 10+
• Physical nodes: 100+
• Total data size: 100+ TB
• Largest cluster: 32 nodes
• Largest CF by size: 40 TB
• Largest CF by rows: 32 billion
• Busiest CF by reads: 1.6 billion/day
• Busiest CF by writes: 6 billion/day
• Total reads: 5 billion/day
• Total writes: 9 billion/day
confidential 15
Packaging
• Build own RPMs on top of binary distribution
• Different RPM for Apache Cassandra and DSE
• Easy to install/upgrade, easy to maintain
• Ensure deployment consistency across board and easy to
identify deployment difference
• Built in pre-defined tuning parameters
• Using virtual nodes as default for new 1.2 clusters
#cassandra13
confidential 16
Tuning Examples
• Kernel
• Increase vm.max_map_count to fix “attempt to allocate stack guard
pages failed” and “java.lang.OutOfMemoryError: Map failed”
• JVM heap
• Keep heap size under control, and disable row cache(almost always)
• Compaction
• LCS vs. STCS. Used LCS in 1.0, but had severe performance issue
when # of SST grows to ~200K
• Adjust compaction throughput as needed
• Hector client
• Enable discovery mode and adjust timeout for different use cases
#cassandra13
confidential 17
Scalability
• Benchmarking
• Get performance baseline for new type of hardware
• Enforce full scale testing in dedicated LnP env before going to production
• In general, scale out by adding more nodes to increase throughput
• Sometimes, it’s cost-efficient to scale up at component level by
Identifying scaling bottleneck, then resolve it accordingly
• Network bandwidth: upgrade to 10 Gbps network
• I/O latency: upgrade to SSD
• Storage: add/expand data volume
• CPU/Memory: haven’t seen it yet
#cassandra13
confidential 18
Operations Notes
• Routine repair is not really needed if there is no deletes. You still need
run repair after bringing up a down node if it is dead for a while
• Use CNAME in client configuration to avoid client conf change in case
of hardware replacement with new IP/name
• Reduce gc_grace to reduce overall data size
• Disable swap to avoid a slow node
• Bootstrapping didn’t work for 1st few times , so you have keep trying
• Disable row cache, unless you have <100K rows
• Collect statistics, real-time or historical, to monitor system
performance and provide dashboard report for management
#cassandra13
confidential 19
Monitoring
• Integrate with NOC monitoring tool for 24*7 support
• Customized email alerts for non-critical events
• Pending compactions
• Garbage collections
• Storage usage
• DataStax OpsCenter Enterprise
• Monitor multiple clusters on one place
• Rolling restart
• Dashboard
#cassandra13
confidential 20
OpsCenter
#cassandra13
confidential 21
NOC Real-time Monitoring
#cassandra13
confidential 22
Typical Cassandra Use Cases at eBay
• Write Intensive: Metrics collection
• Collecting metrics from tens of thousands devices periodically
• 6+ billion writes per day
• Read Intensive: Recommendation backend
• 4+ billion reads per day
• Mixed workload: Personalization
• Data is bulked loaded from data warehouse periodically
• Data is retrieved in real time when user visits ebay site
#cassandra13
confidential 23
Metrics Collection
#cassandra13
confidential 24
Metrics Collection
• Storing, serving real-time metrics data for tens of thousands
devices for dashboards, triage Automation etc
• Statistics
• 16-node 1.2 cluster
• 2 copies of data in 2 data centers
• 600M keys
• 6B writes including rollups per day
#cassandra13
confidential 25
Recommendation Backend
#cassandra13
confidential 26
Recommendation Backend
• Recommendation based on user’s list/watch/sell activities
• Two 1.0 clusters, 40 nodes combined
• Statistics
• 25B keys
• 32TB data on SSD
• 600M writes per day
• 3B reads per day
#cassandra13
confidential 27
Vendor Support
• Support from DataStax.
• CASSANDRA-4142: OOM during repair with LCS
• CASSANDRA-4287: Slow compaction
• CASSANDRA-4427: Restarting a failed bootstrap node instajoins the
ring
• CASSANDRA-5107: Node fails to start because host id is missing
• Prompt responses from highly skilled support persons
• Support $$$ is not wasted, trust me.
#cassandra13
confidential 28
Looking Forward
• Support more mission critical use cases, like personalization,
anti-fraud etc
• Hardware virtualization
• OpenStratus which is built on top of OpenStack
• Separate storage from compute node
• To scale different components as needed
• More automations, such as self-serve deployment
• More best practice process
• Troubleshooting/Tuning runbook
#cassandra13
confidential 29
Before Q&A…
Thanks for your time.
We are looking for talented NoSQL experts to join
database engineering team
email resume to
ajambhekar at ebay dot com
#cassandra13

More Related Content

What's hot

Micro-batching: High-performance writes
Micro-batching: High-performance writesMicro-batching: High-performance writes
Micro-batching: High-performance writesInstaclustr
 
Cisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStackCisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStackDataStax Academy
 
Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Dave Gardner
 
Managing (Schema) Migrations in Cassandra
Managing (Schema) Migrations in CassandraManaging (Schema) Migrations in Cassandra
Managing (Schema) Migrations in CassandraDataStax Academy
 
Webinar: DataStax Training - Everything you need to become a Cassandra Rockstar
Webinar: DataStax Training - Everything you need to become a Cassandra RockstarWebinar: DataStax Training - Everything you need to become a Cassandra Rockstar
Webinar: DataStax Training - Everything you need to become a Cassandra RockstarDataStax
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScyllaDB
 
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...DataStax Academy
 
Migration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchMigration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchDataStax Academy
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...ScyllaDB
 
Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7DataStax
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupVictor Coustenoble
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsScyllaDB
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Multi-Region Cassandra Clusters
Multi-Region Cassandra ClustersMulti-Region Cassandra Clusters
Multi-Region Cassandra ClustersInstaclustr
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%ScyllaDB
 

What's hot (20)

Micro-batching: High-performance writes
Micro-batching: High-performance writesMicro-batching: High-performance writes
Micro-batching: High-performance writes
 
Cisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStackCisco: Cassandra adoption on Cisco UCS & OpenStack
Cisco: Cassandra adoption on Cisco UCS & OpenStack
 
Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Managing (Schema) Migrations in Cassandra
Managing (Schema) Migrations in CassandraManaging (Schema) Migrations in Cassandra
Managing (Schema) Migrations in Cassandra
 
Webinar: DataStax Training - Everything you need to become a Cassandra Rockstar
Webinar: DataStax Training - Everything you need to become a Cassandra RockstarWebinar: DataStax Training - Everything you need to become a Cassandra Rockstar
Webinar: DataStax Training - Everything you need to become a Cassandra Rockstar
 
Scylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDSScylla Summit 2016: Scylla at Samsung SDS
Scylla Summit 2016: Scylla at Samsung SDS
 
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
Cassandra Community Webinar: Apache Spark Analytics at The Weather Channel - ...
 
Migration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchMigration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a Hitch
 
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
Scylla Summit 2016: Outbrain Case Study - Lowering Latency While Doing 20X IO...
 
Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7Introducing DataStax Enterprise 4.7
Introducing DataStax Enterprise 4.7
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
The True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS OptionsThe True Cost of NoSQL DBaaS Options
The True Cost of NoSQL DBaaS Options
 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Multi-Region Cassandra Clusters
Multi-Region Cassandra ClustersMulti-Region Cassandra Clusters
Multi-Region Cassandra Clusters
 
Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%Webinar: How to Shrink Your Datacenter Footprint by 50%
Webinar: How to Shrink Your Datacenter Footprint by 50%
 

Similar to Database Scalability Patterns at eBay

Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real WorldJeremy Hanna
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScyllaDB
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Emprovise
 
Cassandra
CassandraCassandra
Cassandraexsuns
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB OverviewAndrew Liu
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Web Services
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingAmazon Web Services
 
AWS Database Services-Philadelphia AWS User Group-4-17-2018
AWS Database Services-Philadelphia AWS User Group-4-17-2018AWS Database Services-Philadelphia AWS User Group-4-17-2018
AWS Database Services-Philadelphia AWS User Group-4-17-2018Bert Zahniser
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Speedment, Inc.
 

Similar to Database Scalability Patterns at eBay (20)

Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Clustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmarkClustrix Database Percona Ruby on Rails benchmark
Clustrix Database Percona Ruby on Rails benchmark
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Apache Cassandra in the Real World
Apache Cassandra in the Real WorldApache Cassandra in the Real World
Apache Cassandra in the Real World
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
Wmware NoSQL
Wmware NoSQLWmware NoSQL
Wmware NoSQL
 
Scylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the DatabaseScylla Summit 2016: Compose on Containing the Database
Scylla Summit 2016: Compose on Containing the Database
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
Highlights of AWS ReInvent 2023 (Announcements and Best Practices)
 
Cassandra
CassandraCassandra
Cassandra
 
Azure DocumentDB Overview
Azure DocumentDB OverviewAzure DocumentDB Overview
Azure DocumentDB Overview
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS)
 
Making (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with CachingMaking (Almost) Any Database Faster and Cheaper with Caching
Making (Almost) Any Database Faster and Cheaper with Caching
 
AWS Database Services-Philadelphia AWS User Group-4-17-2018
AWS Database Services-Philadelphia AWS User Group-4-17-2018AWS Database Services-Philadelphia AWS User Group-4-17-2018
AWS Database Services-Philadelphia AWS User Group-4-17-2018
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 

More from DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 
Getting Started with Graph Databases
Getting Started with Graph DatabasesGetting Started with Graph Databases
Getting Started with Graph DatabasesDataStax Academy
 
Cassandra Data Maintenance with Spark
Cassandra Data Maintenance with SparkCassandra Data Maintenance with Spark
Cassandra Data Maintenance with SparkDataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 
Getting Started with Graph Databases
Getting Started with Graph DatabasesGetting Started with Graph Databases
Getting Started with Graph Databases
 
Cassandra Data Maintenance with Spark
Cassandra Data Maintenance with SparkCassandra Data Maintenance with Spark
Cassandra Data Maintenance with Spark
 

Recently uploaded

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 

Recently uploaded (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Database Scalability Patterns at eBay

  • 1. Confidential © 2011 eBay Inc. All rights reserved. eBay and the eBay logo are registered trademarks of eBay Inc. Cassandra at Anurag Jambhekar, Sr Manager Database Engineering Feng Qu, Principal Database Engineer Scale #cassandra13
  • 2. confidential 2 Databases in eBay marketplace’s transactional platform Xmp is a new emerging database system which offer best of relational and NoSQL model. It provide high transaction throughput with ACID compliance and auto sharding.) #cassandra13
  • 3. confidential 3 Database Scalability Patterns • Logical database host and transparent mapping to physical database • Multiple read copies – One write database and multiple read databases • Horizontal Scaling – Shard data across multiple database hosts using Mod/Range/Lookup based Sharding • Vertical scaling of database hardware • Archiving of data • Auto sharding in NoSQL makes it even easier to add capacity #cassandra13
  • 4. confidential 4 Database Logical Host • eBay applications interact with the databases through a level of indirection that is referred to as Logical Databases. • Logical databases to physical database relation is many-to-one – Smaller logicals from different DB families are mapped to same physical – DAL( Data Access Layer) does the translation from logical host to physical host • Why Logical Databases? – Gives DBAs the independence to move objects based on load and traffic – Code modification is not required when the physical topology changes – Helps failover scenarios – Allows to quickly turn off less critical feature to minimize site impact #cassandra13
  • 5. confidential 5 Always Available Read Databases App pools App pools DAL Load balancer DAL Load Balancer #cassandra13
  • 6. confidential 6 MOD Based Model • Implicit allocation/routing based on a mathematical formula • A modulus function of primary key provides host ID to allocate/use • Applies mostly in case of user generated data • Allows for simple segmentation with almost zero overhead • Limits to scaling only to “N” based on initial configuration#cassandra13
  • 7. confidential 7 Range Based Model – Item Split UserLookupHost Item_host_lookup UserId=10056, UserItemhost=2 LookupHost iItem_host_item_id_range UserItem1host : 1 -100 UsrerItem2host : 101-200 Useritem3host : 201 -300 ….UserItem1host UserItem2host UserItem27host Selling pools All pools Item Range : 101 -200 Allocation Location • Allocation based on seller segment, and seller segment based on user host, seller level - All of a sellers items are listed in certain set of Item hosts • Each User Item host has a range of IDs associated with it - All items listed on a item host get an ID within a specified Range • ID being in a fixed range allows for location of items • Indefinitely scalable Item Id = 150 #cassandra13
  • 8. confidential 8 Lookup Based Model : User Split User_id_lookup User_name_lookup UserLookupHost UserReadWrite10host UserReadFailover10host Replica UserReadWrite11host UserReadFailover11host Replica UserReadWrite22host UserReadFailover22host Replica ………. Multiple USER Read/Write Databases App Pools User lookup based on user id or name User read/Write • Initial allocation of user based on MOD/ round robin • Location based on persistent mapping stored in UserLookup database family - Lookup based on user numeric id or name id. • Each host has a subset of users and support both read and write • Infinitely scalable#cassandra13
  • 9. confidential 9 Challenges of Traditional RDBMS • Performance penalty to maintain ACID features • Lack of native sharding and replication features • Cost of software/hardware • Higher cost of commit #cassandra13
  • 10. confidential 10 Few words about myself • Am a RDBMS veteran. • Started with Oracle 5 in early 1990s. • For more than a decade, I worked on Oracle, MS SQL Server, MySQL in DoubleClick, Yahoo and Intuit • In recent years, I am working at eBay marketplace focusing on NoSQL technology, like Cassandra, MongoDB. #cassandra13
  • 11. confidential 11 Why Cassandra? • Flexible schema • Good performance for both reads/writes • Horizontal linear scalability • Built-in high availability • No SPOF • Automatic Geo load balancing and DR with multiple data centers deployment • Automatic sharding for load balancing across multiple nodes • Real-time data analytics, search with DSE, no more ETL • No manual data purging required #cassandra13
  • 12. confidential 12 Cassandra @ eBay • Started in 2011 • Uses Apache Cassandra and DataStax Enterprise • Java Hector client used most, and evaluating other options • 10+ production clusters on 100+ bare metal nodes with 250TB storage provisioned across multiple data centers • 100TB+ user data on local HDD and shared SSD array • Cluster size varies from 6-node cluster to 32-node cluster #cassandra13
  • 13. confidential 13 Cassandra Environment • Software • Cassandra 1.0/1.1/1.2, DSE 3.0, RHEL 6.3, Hector 1.1 • Standard hardware: • HP DL380, 12 cores, 96GB RAM, 5.5TB raid10 HDD • Dell R620, 12 cores, 128GB RAM, 1TB raid10 HDD , 10 Gbps NIC • Violin 6200/6600 flash memory array • VM – coming soon • Two types of cluster: • Dedicated cluster for large use case • Multi-tenant cluster for small use cases #cassandra13
  • 14. confidential 14 eBay Scale #cassandra13 • Data Centers: 3 • Production Clusters: 10+ • Physical nodes: 100+ • Total data size: 100+ TB • Largest cluster: 32 nodes • Largest CF by size: 40 TB • Largest CF by rows: 32 billion • Busiest CF by reads: 1.6 billion/day • Busiest CF by writes: 6 billion/day • Total reads: 5 billion/day • Total writes: 9 billion/day
  • 15. confidential 15 Packaging • Build own RPMs on top of binary distribution • Different RPM for Apache Cassandra and DSE • Easy to install/upgrade, easy to maintain • Ensure deployment consistency across board and easy to identify deployment difference • Built in pre-defined tuning parameters • Using virtual nodes as default for new 1.2 clusters #cassandra13
  • 16. confidential 16 Tuning Examples • Kernel • Increase vm.max_map_count to fix “attempt to allocate stack guard pages failed” and “java.lang.OutOfMemoryError: Map failed” • JVM heap • Keep heap size under control, and disable row cache(almost always) • Compaction • LCS vs. STCS. Used LCS in 1.0, but had severe performance issue when # of SST grows to ~200K • Adjust compaction throughput as needed • Hector client • Enable discovery mode and adjust timeout for different use cases #cassandra13
  • 17. confidential 17 Scalability • Benchmarking • Get performance baseline for new type of hardware • Enforce full scale testing in dedicated LnP env before going to production • In general, scale out by adding more nodes to increase throughput • Sometimes, it’s cost-efficient to scale up at component level by Identifying scaling bottleneck, then resolve it accordingly • Network bandwidth: upgrade to 10 Gbps network • I/O latency: upgrade to SSD • Storage: add/expand data volume • CPU/Memory: haven’t seen it yet #cassandra13
  • 18. confidential 18 Operations Notes • Routine repair is not really needed if there is no deletes. You still need run repair after bringing up a down node if it is dead for a while • Use CNAME in client configuration to avoid client conf change in case of hardware replacement with new IP/name • Reduce gc_grace to reduce overall data size • Disable swap to avoid a slow node • Bootstrapping didn’t work for 1st few times , so you have keep trying • Disable row cache, unless you have <100K rows • Collect statistics, real-time or historical, to monitor system performance and provide dashboard report for management #cassandra13
  • 19. confidential 19 Monitoring • Integrate with NOC monitoring tool for 24*7 support • Customized email alerts for non-critical events • Pending compactions • Garbage collections • Storage usage • DataStax OpsCenter Enterprise • Monitor multiple clusters on one place • Rolling restart • Dashboard #cassandra13
  • 21. confidential 21 NOC Real-time Monitoring #cassandra13
  • 22. confidential 22 Typical Cassandra Use Cases at eBay • Write Intensive: Metrics collection • Collecting metrics from tens of thousands devices periodically • 6+ billion writes per day • Read Intensive: Recommendation backend • 4+ billion reads per day • Mixed workload: Personalization • Data is bulked loaded from data warehouse periodically • Data is retrieved in real time when user visits ebay site #cassandra13
  • 24. confidential 24 Metrics Collection • Storing, serving real-time metrics data for tens of thousands devices for dashboards, triage Automation etc • Statistics • 16-node 1.2 cluster • 2 copies of data in 2 data centers • 600M keys • 6B writes including rollups per day #cassandra13
  • 26. confidential 26 Recommendation Backend • Recommendation based on user’s list/watch/sell activities • Two 1.0 clusters, 40 nodes combined • Statistics • 25B keys • 32TB data on SSD • 600M writes per day • 3B reads per day #cassandra13
  • 27. confidential 27 Vendor Support • Support from DataStax. • CASSANDRA-4142: OOM during repair with LCS • CASSANDRA-4287: Slow compaction • CASSANDRA-4427: Restarting a failed bootstrap node instajoins the ring • CASSANDRA-5107: Node fails to start because host id is missing • Prompt responses from highly skilled support persons • Support $$$ is not wasted, trust me. #cassandra13
  • 28. confidential 28 Looking Forward • Support more mission critical use cases, like personalization, anti-fraud etc • Hardware virtualization • OpenStratus which is built on top of OpenStack • Separate storage from compute node • To scale different components as needed • More automations, such as self-serve deployment • More best practice process • Troubleshooting/Tuning runbook #cassandra13
  • 29. confidential 29 Before Q&A… Thanks for your time. We are looking for talented NoSQL experts to join database engineering team email resume to ajambhekar at ebay dot com #cassandra13

Editor's Notes

  1. The reason of calling myself a veteran is that I have used Oracle for more than 20 years. When I installed Oracle for the first time, I got more than 20 floppy disks. One of the disk was bad in the middle of the installation, then I had to get it replaced and started over. I was kind of getting bored with Oracle, so in recent years…
  2. Given limitations of RDBMS as Anurag mentioned, we have evaluated multiple NoSQL solutions which are good alternatives to transactional RDBMS for some use cases. At eBay we have started to adopt NoSQL technology company wide in recent years and we choose different NoSQL platforms case by case. Here are some reasons we choose Cassandra for certain projects.Flexible schema. A major complain we heard from developers who are developing on top of RDBMS is they have to rely on DBA to make schema change before they release code. As a Oracle DBA myself, I also feel the pain to make schema change in production environment. Sometimes it requires application downtime, sometimes it changes execution plan which severely impacts system performance. Flexible schema in Cassandra gives developer flexibility to implement features faster and independently and it solves a big headache for DBA as well.Performance wise, while Cassandra is already well known for high performance writes, in recent releases it also improves reads performance dramatically by compression, leveled compaction and other changes.Let’s move on to scalability. Think about Oracle RAC, MySQL cluster and MongoDB shard, none of them can achieve linear scalability like Cassandra does. Such feature is critical to support big environment like eBay.Another out of box benefit is built-in HA. --Because of its peer to peer architecture, we no longer need to worry about single point failure, DBA can sleep well during night when a node fails.--Multiple data center deployment provides load balancing and disaster recovery. We can direct application traffic to local Cassandra hosts only so there is no need to worry about across data center latency.Automatic sharding. Using random partitioner, you don’t need to design and implement complicated sharding strategy to scale out. Cassandra does it for you out of box.On the analytic side, big data analysis is made easy with Cassandra. There is no need to build separated Hadoop cluster to run M/R job, thus eliminate ETL as well.Last, another problem Cassandra solves is data purging which DBA use to write backend scripts to delete expired data to save storage and improve performance.
  3. eBay started to use Cassandra 2 years ago in 2011. First we started with simple write-intensive logging use cases and gradually expand to other categories. We supports both Apache Cassandra and DSE. DSE is mostly used in use case when data analytics and solr search is required. Java is standard programming language at eBay. Java based applications access Cassandra using Hector client. We are also evaluating other client options to meet requirement for different use cases.As of now, we have over 10 production clusters across multiple data centers. 250TB storage are provisioned on 100 plus physical nodes.In terms of user data, there are more than 100TB data stored in HDD or SSD.Most of use cases are big enough to have own dedicated cluster. The smallest cluster is 6 node and biggest one is 32 nodes.
  4. Here is information about our production environment.
  5. Let’s see some numbers of our Cassandra deployment.
  6. In order to simplify and automate deployment and administration, we build our own RPM packages from binary distribution. It allows us to install or upgrade software easily in our production environment which is behind firewall where we can not access external repository. When we build RPM, we customized cassandra.yaml and cassandra-env.sh and some kernel parameters to get better performance out of box. Then we can make further based on actual usage.
  7. Here are few parameters we tweaked based on our learning.
  8. One import thing to avoid seeing surprise in production is benchmarking near-real traffic pre-production. We run benchmark test in our LnP(Load and Performance) env before onboarding major application. We also run similar benchmark test against different type of hardware, so we have more accurate data for capacity planning.Just like other NoSQL solution, Cassandra can scale out by adding more nodes to increase throughput linearly. When we run Cassandra at eBay, sometimes it’s more cost-efficient to scale up before scaling out. When a system reaches capacity limit, we would try to identify scaling bottleneck to see whether we can solve the issue at component level. Like….
  9. I want to share few operation notes we learned from production system in last few years.1)2)3) Gc_grace is pretty important for data with short TTL. Its 10 day default is too big for data with few days TTL. If you forget to consider gc_grace when you estimate total storage required in production, you will be surprised to see your storage usage keeps increase beyond expectation.4)When a node is performing poorly, it’s better to stop it instead of letting it running in unstable state. Thus we disable swap so Cassandra can go down when it runs out of Java heap. 5)6) Row cache is not efficient for large data set, so we always disable it for all eBay deployments.7) We collect system/cassandra statistics through JMX interface periodically so we have a better idea how system is performing at any given time and it helps a lot when we troubleshoot a particular problem.
  10. Monitoring is very important to manage production system. We have 3 ways to monitoring Cassandra.-- Critical components like database, storage, network, firewall are closely monitored by NOC people 24*7 in US and China. As of Cassandra, they are being monitored real time as well in NOC. -- In addition, we configured email alerts for certain non-critical events, such as pending compaction, storage usage, garbage collection frequency etc. This can be reviewed and handled by oncall DBA as well.-- We also utilize OpsCenter to view/monitor multiple clusters at one place and we also use it for certain admin tasks like rolling restart.
  11. Here is OpsCenteroverview for our production system. It’s easy to see storage usage and access pattern here.
  12. Here is screenshot of our NOC monitoring screen which refreshed every few seconds. If a system is slow, it will show up on the top as yellow color, if it’s down, it will show on the top as red color.
  13. Now I want to review few use cases at eBay, they covers different kinds of work load.
  14. Metrics collection is used to collect tens of thousands of eBay production devices, like server, switch, load balancer erc periodically and then saved into Cassandra for statistics rollup, dashboard reporting and troubleshooting.
  15. eBay has hundred of millions of users and every day there are also hundred millions of items listed on eBay site. There are billions user activities happening daily. It’s a not a big data set, but a huge data set which is a gold mine if we can abstract useful information out of it. One of the usage of big data is predict user interest based on past activities. User activity data are stored/processed in Cassandra and then being passed to recommendation engine.
  16. It’s important to have a partner when adopting a new technology. We want to thank DataStax customer service team for their support during past 2 years. They have helped solved many critical issues and we can always count on them.