SlideShare uma empresa Scribd logo
1 de 38
YCSB
Yahoo! Cloud Serving Benchmark
Scalable Distributed Systems
Antonio L. Severien
antonio.severien@gmail.com
João Rosa
Joao.rui.rosa@gmail.com
Overview
• Distributed Databases
• Cassandra
• HBase
• YCSB General View
• YCSB Details
• Amazon EC2
• YCSB Results
• YCSB Future
• Conclusions
• References
Distributed Databases
Traditional RDBMS
• ACID transactions
• Query language (SQL)
• Data tied to the modeling (hard to analyze)
• Scalable to a limit
Distributed Databases
• Not ACID
• Not Relational
• Column oriented (key-value)
• CAP (Consistency, Availability, Partitioning)
• Big Data (Massively scalable)
Distributed Databases
• Sherpa/PNUTS
• BigTable
• HBase, Hypertable, HTable
• Megastore
• Azure
• Cassandra
• Amazon Web Services
• S3, SimpleDB, EBS • CouchDB
• Voldemort
• Dynomite
• Tokyo
• Redis
• MongoDB
Distributed Databases
• NoSQL Databases have different designs and architecture
Cassandra
Thrift
Gossip
Token ring
…
Hbase
HDFS
Zookeeper
Hadoop (MapReduce)
BigTable
GFS
Chubby (Lock Service)
MapReduce
Cassandra
• Highlights
• High availability
• Incremental scalability
• Eventually consistent
• Tradeoffs between consistency and latency
• Minimal administration
• No SPF (Single Point of Failure)
Cassandra
• CAP-aware
• Cassandra values Availability and Partitioning tolerance (AP)
 eventually consistent
• Providing strong Consistency in Cassandra increases latency
• Partitioning
• Token oriented
• Explicit Replication
• Replication factor ≤ Total nodes
• High level clients
• Python, Java, C#, .NET, Scala, Ruby, PHP, Erlang, Haskell…etc
• Thrift  driver-level interface
Cassandra
• Data Model
• Cluster:
• Machines (nodes) in a logical
Cassandra instance
• can contain multiple keyspaces
• Keyspace:
• name for ColumnFamilies
• ColumnFamilies:
• contain multiple columns each with name, value and timestamp
referenced by row keys.
• Analogous to table on RDBMS
• SuperColumns:
• columns with subcolumns
• Rows
• Columns
keyA Column1 Column2 Column3
keyB Column5 Column6 column10
Column
Byte[] Name
Byte[] Value
I64 Timestamp
Cassandra
Partitioning Replication
HBase
“HBase is more a datastore than a database”
• It lacks many of the features of RDBMS
• Distributed and scalable big data store.
• Regions model
• Strong consistency
HBase
Built on top of Hadoop Distributed Filesystem (HDFS)
HBase
• The NameNode is
responsible for maintaining
the filesystem metadata.
• The DataNodes are
responsible for storing HDFS
blocks.
HBase
• The NameNode is
responsible for maintaining
the filesystem metadata.
• The DataNodes are
responsible for storing HDFS
blocks.
Note: In our study case, we only
had interest on HDFS layer.
HBase
HBase
DatanodesNamenode
HBase
• Data is stored into HBase tables.
• Tables are made of rows and columns.
• All columns belong to a particular column family.
Important note: All column family members are stored together.
• A query on a
column family
model has a better
performance
YCSB General View
• Which is the best NoSQL DB?
• How to compare?
• Yahoo! Cloud Serving Benchmark (YCSB)
• Benchmarking tool
• Evaluate key-value and cloud DBs performance on a common set
of workloads
• Client – an extensible workload generator
• Yahoo! Research
• Brian F. Cooper - cooperb@yahoo-inc.com
• Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan
and Russell Sear
YCSB Details
• How it works?
YCSB Client
DBInterface
Layer
Client
Threads
Statistics
Workload
Executor
Cloud
Serving
Store
Workload file
• Read/write mix
• Record size
• Popularity distribution
• …
Command line
• DB to use
• Workload to use
• Target throughput
• Number of threads
• …
YCSB Details
Benchmark Tiers
• Performance
• Measure latency/throughput curve
• Increase throughput until saturation
• Scalability
• Scale up: increase hardware, data size and throughput
proportionally
• Elastic speedup: add servers while running a workload
YCSB Details
Load phase
- Load the database
$ ycsb load cassandra-10
–p hosts=127.0.0.1 –P workloadX
Transactions phase
- Executes the workload
$ ycsb run cassandra-10
–p hosts=127.0.0.1 –P workloadX
Random Load Distribution
YCSB Details
• # Yahoo! Cloud System Benchmark
• # Workload A: Update heavy workload
• # Application example: Session store recording recent actions
• #
• # Read/update ratio: 50/50
• # Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
• # Request distribution: zipfian
• recordcount=1000
• operationcount=1000
• workload=com.yahoo.ycsb.workloads.CoreWorkload
• readallfields=true
• readproportion=0.5
• updateproportion=0.5
• scanproportion=0
• insertproportion=0
• requestdistribution=zipfian
YCSB Details
• Execution parameters
• $ ./bin/ycsb run cassandra-10 –P workloads/workloada –s –threads 10 –target 100
> transactions.dat
[OVERALL],RunTime(ms), 10110
[OVERALL],Throughput(ops/sec), 98.91196834817013
[UPDATE], Operations, 491
[UPDATE], AverageLatency(ms), 0.054989816700611
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 1
[UPDATE], 95thPercentileLatency(ms), 1
[UPDATE], 99thPercentileLatency(ms), 1
[UPDATE], Return=0, 491
[UPDATE], 0, 464
[UPDATE], 1, 27
[UPDATE], 2, 0
[UPDATE], 3, 0
[UPDATE], 4, 0
...
YCSB Details
• $ ./bin/ycsb run basic -P workloads/workloada -P large.dat -s -threads 10 -
target 100 –p measurementtype=timeseries -p timeseries.granularity=2000 >
transactions.dat
[OVERALL],RunTime(ms), 10077
[OVERALL],Throughput(ops/sec), 9923.58836955443
[UPDATE], Operations, 50396
[UPDATE], AverageLatency(ms), 0.04339630129375347
[UPDATE], MinLatency(ms), 0
[UPDATE], MaxLatency(ms), 338
[UPDATE], Return=0, 50396
[UPDATE], 0, 0.10264765784114054
[UPDATE], 2000, 0.026989343690867442
[UPDATE], 4000, 0.0352882703777336
[UPDATE], 6000, 0.004238958990536277
[UPDATE], 8000, 0.052813085033008175
[UPDATE], 10000, 0.0
[READ], Operations, 49604
[READ], AverageLatency(ms), 0.038242883638416256
[READ], MinLatency(ms), 0
[READ], MaxLatency(ms), 230
[READ], Return=0, 49604
[READ], 0, 0.08997245741099663
[READ], 2000, 0.02207505518763797
[READ], 4000, 0.03188493260913297
[READ], 6000, 0.004869141813755326
[READ], 8000, 0.04355329949238579
[READ], 10000, 0.005405405405405406
YCSB Details
Status Output
Amazon EC2 Configuration
Large Instance
7.5 GB memory
4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
850 GB instance storage
64-bit platform
I/O Performance: High
API name: m1.large
Experiment Set-up
Cassandra Cluster
3 nodes + 1 node (Elasticity)
Hbase Cluster
3 nodes
Amazon EC2 Usage
Cassandra
Load phase: 60,000,000 records of 1Kb
Amazon EC2 Usage
HBase
Load phase: 60,000,000 records of 1Kb
Amazon EC2 Usage
Load phase: 60,000,000 records of 1Kb
Cassandra
HBase
Amazon EC2 Usage
Load phase: 60,000,000 records of 1KbCassandra HBase
Amazon EC2 Usage
Transaction phase:
- 10,000 records
- 1,000,000 operations
- 250 threads
Cassandra
YCSB Cassandra Results
Update Heavy Workload
(50/50)
0
10
20
30
40
50
60
0 1,000 2,000 3,000 4,000 5,000 6,000
AverageLatency(ms)
Throughput (ops/sec)
Update
0
10
20
30
40
50
60
0 1,000 2,000 3,000 4,000 5,000 6,000
AverageLatency(ms)
Throughput (ops/sec)
Read
YCSB HBase Results
0.00
0.05
0.10
0.15
0.20
0.25
0.30
471.15 485 492.38 507.17 562.33 620.04 634.82 734.32 845.15
AverageLatency(ms)
Throughput (ops/sec)
Update Hbase 0.90.5
0.00
200.00
400.00
600.00
800.00
1000.00
1200.00
471.15 485 492.38 507.17 562.33 620.04 634.82 734.32 845.15
AverageLatency(ms)
Throughput (ops/sec)
Read HBase 0.90.5
YCSB Cassandra Results
0
10,000
20,000
30,000
40,000
50,000
60,000
70,000
80,000
0 50000 100000 150000 200000 250000 300000 350000 400000
Latency(ms)
Time miliseconds
Elasticity Cassandra 1.0
YCSB Cassandra Results
0
10,000
20,000
30,000
40,000
50,000
60,000
70,000
80,000
0 50000 100000 150000 200000 250000 300000 350000 400000
Latency(ms)
Time miliseconds
Elasticity Cassandra 1.0
YCSB Future
Provide statistics for:
- Availability
- Replication
Additional Distributed Databases
Currently supported:
Cassandra Mapkeeper
MongoDB Redis
Voldemort Vmware vFabric Gemfire
Hbase
Conclusions
• YCSB provides a common ground for benchmarking cloud DB
services
• Good for leaning and experimenting with different distributed
databases
• Open source, extensible for new databases
• Laboratory with Amazon EC2 provided good insight into setting
up cloud services
• Challenges
• Installation problems
• Hard to follow documentation
• Working on distributed environment require lots of configuration
References
• YCSB (Yahoo! Cloud Serving Benchmark)
• https://github.com/brianfrankcooper/YCSB/wiki
• Yahoo! Research
• http://research.yahoo.com/Web_Information_Management/YCSB
• BigTable
• http://en.wikipedia.org/wiki/BigTable
• Cassandra
• http://wiki.apache.org/cassandra/
• HBase
• http://hbase.apache.org/
Questions

Mais conteúdo relacionado

Mais procurados

HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardMatthew Blair
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightHBaseCon
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBaseHBaseCon
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseHBaseCon
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestCloudera, Inc.
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path HBaseCon
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBAthiq Ahamed
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationSchubert Zhang
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHBaseCon
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBaseCon
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 

Mais procurados (20)

HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsightOptimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
 
Digital Library Collection Management using HBase
Digital Library Collection Management using HBaseDigital Library Collection Management using HBase
Digital Library Collection Management using HBase
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at Pinterest
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path Off-heaping the Apache HBase Read Path
Off-heaping the Apache HBase Read Path
 
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDBBenchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
Benchmarking Top NoSQL Databases: Apache Cassandra, Apache HBase and MongoDB
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance Evaluation
 
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload DiversityHarmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
 
HBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDKHBase Data Modeling and Access Patterns with Kite SDK
HBase Data Modeling and Access Patterns with Kite SDK
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
 
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
HBaseCon 2015: Apache Phoenix - The Evolution of a Relational Database Layer ...
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 

Destaque

HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseEdureka!
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBaseAnil Gupta
 
Strengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBStrengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBlehresman
 
Ycsb benchmarking
Ycsb benchmarkingYcsb benchmarking
Ycsb benchmarkingSqrrl
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDBTim Callaghan
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionNGDATA
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Sematext Group, Inc.
 
Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceCloudera, Inc.
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...MongoDB
 
STAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructureSTAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructureGord Sissons
 
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyTokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyDataStax Academy
 
Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Alexey Rusnak
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Consjohnrjenson
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
Yahoo Cloud Serving Benchmark
Yahoo Cloud Serving BenchmarkYahoo Cloud Serving Benchmark
Yahoo Cloud Serving Benchmarkkevin han
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupCarlos Juzarte Rolo
 
Преимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDBПреимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDBUNETA
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
VENU_Hadoop_Resume
VENU_Hadoop_ResumeVENU_Hadoop_Resume
VENU_Hadoop_ResumeVenu Gopal
 

Destaque (20)

HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL databaseHBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Strengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDBStrengths and Weaknesses of MongoDB
Strengths and Weaknesses of MongoDB
 
Ycsb benchmarking
Ycsb benchmarkingYcsb benchmarking
Ycsb benchmarking
 
5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB5 Pitfalls to Avoid with MongoDB
5 Pitfalls to Avoid with MongoDB
 
Lily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC editionLily for the Bay Area HBase UG - NYC edition
Lily for the Bay Area HBase UG - NYC edition
 
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
Large Scale Performance Monitoring for ElasticSearch, HBase, Solr, SenseiDB, ...
 
Rigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase PerformanceRigorous and Multi-tenant HBase Performance
Rigorous and Multi-tenant HBase Performance
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
STAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructureSTAC Summit 2014 - Building a multitenant Big Data infrastructure
STAC Summit 2014 - Building a multitenant Big Data infrastructure
 
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al TobeyTokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
Tokyo Cassandra Summit 2014: Tunable Consistency by Al Tobey
 
Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.Couchbase, что за зверь и на что способен.
Couchbase, что за зверь и на что способен.
 
MongoDB Pros and Cons
MongoDB Pros and ConsMongoDB Pros and Cons
MongoDB Pros and Cons
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Yahoo Cloud Serving Benchmark
Yahoo Cloud Serving BenchmarkYahoo Cloud Serving Benchmark
Yahoo Cloud Serving Benchmark
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User Group
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Преимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDBПреимущества NoSQL баз данных на примере MongoDB
Преимущества NoSQL баз данных на примере MongoDB
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
VENU_Hadoop_Resume
VENU_Hadoop_ResumeVENU_Hadoop_Resume
VENU_Hadoop_Resume
 

Semelhante a YCSB: Benchmarking Distributed Databases on Amazon EC2

Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftAmazon Web Services
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924Amazon Web Services
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPTony Rogerson
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Databases in the hosted cloud
Databases in the hosted cloud Databases in the hosted cloud
Databases in the hosted cloud Colin Charles
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudMichael Stack
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesHaohui Mai
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresDataWorks Summit
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL ServicesAmazon Web Services
 
PASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best PracticesPASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best PracticesAmazon Web Services
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWSAmazon Web Services
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)Amazon Web Services Korea
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinAmazon Web Services
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinIan Massingham
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataRoger Xia
 

Semelhante a YCSB: Benchmarking Distributed Databases on Amazon EC2 (20)

Data & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon RedshiftData & Analytics - Session 2 - Introducing Amazon Redshift
Data & Analytics - Session 2 - Introducing Amazon Redshift
 
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924AWS Webcast - Managing Big Data in the AWS Cloud_20140924
AWS Webcast - Managing Big Data in the AWS Cloud_20140924
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
SQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTPSQL Server 2014 In-Memory OLTP
SQL Server 2014 In-Memory OLTP
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
Databases in the hosted cloud
Databases in the hosted cloud Databases in the hosted cloud
Databases in the hosted cloud
 
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and CloudHBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud
 
Scaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of FilesScaling HDFS to Manage Billions of Files
Scaling HDFS to Manage Billions of Files
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
(DAT204) NoSQL? No Worries: Build Scalable Apps on AWS NoSQL Services
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
Drop acid
Drop acidDrop acid
Drop acid
 
PASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best PracticesPASS 17 SQL Server on AWS Best Practices
PASS 17 SQL Server on AWS Best Practices
 
(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS(DAT202) Managed Database Options on AWS
(DAT202) Managed Database Options on AWS
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
 
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)
AWS CLOUD 2018- Amazon DynamoDB기반 글로벌 서비스 개발 방법 (김준형 솔루션즈 아키텍트)
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
Scaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit DublinScaling on AWS for the First 10 Million Users at Websummit Dublin
Scaling on AWS for the First 10 Million Users at Websummit Dublin
 
Spring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_dataSpring one2gx2010 spring-nonrelational_data
Spring one2gx2010 spring-nonrelational_data
 

Mais de Antonio Severien

Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsAntonio Severien
 
On Pragmatism and Scientific Freedom
On Pragmatism and Scientific FreedomOn Pragmatism and Scientific Freedom
On Pragmatism and Scientific FreedomAntonio Severien
 
Community cloud antonioseverien
Community cloud antonioseverienCommunity cloud antonioseverien
Community cloud antonioseverienAntonio Severien
 

Mais de Antonio Severien (6)

Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
Scalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data StreamsScalable Distributed Real-Time Clustering for Big Data Streams
Scalable Distributed Real-Time Clustering for Big Data Streams
 
On Pragmatism and Scientific Freedom
On Pragmatism and Scientific FreedomOn Pragmatism and Scientific Freedom
On Pragmatism and Scientific Freedom
 
Community cloud antonioseverien
Community cloud antonioseverienCommunity cloud antonioseverien
Community cloud antonioseverien
 
Relational Cloud
Relational CloudRelational Cloud
Relational Cloud
 
Soap vs rest
Soap vs restSoap vs rest
Soap vs rest
 

Último

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Último (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

YCSB: Benchmarking Distributed Databases on Amazon EC2

  • 1. YCSB Yahoo! Cloud Serving Benchmark Scalable Distributed Systems Antonio L. Severien antonio.severien@gmail.com João Rosa Joao.rui.rosa@gmail.com
  • 2. Overview • Distributed Databases • Cassandra • HBase • YCSB General View • YCSB Details • Amazon EC2 • YCSB Results • YCSB Future • Conclusions • References
  • 3. Distributed Databases Traditional RDBMS • ACID transactions • Query language (SQL) • Data tied to the modeling (hard to analyze) • Scalable to a limit Distributed Databases • Not ACID • Not Relational • Column oriented (key-value) • CAP (Consistency, Availability, Partitioning) • Big Data (Massively scalable)
  • 4. Distributed Databases • Sherpa/PNUTS • BigTable • HBase, Hypertable, HTable • Megastore • Azure • Cassandra • Amazon Web Services • S3, SimpleDB, EBS • CouchDB • Voldemort • Dynomite • Tokyo • Redis • MongoDB
  • 5. Distributed Databases • NoSQL Databases have different designs and architecture Cassandra Thrift Gossip Token ring … Hbase HDFS Zookeeper Hadoop (MapReduce) BigTable GFS Chubby (Lock Service) MapReduce
  • 6. Cassandra • Highlights • High availability • Incremental scalability • Eventually consistent • Tradeoffs between consistency and latency • Minimal administration • No SPF (Single Point of Failure)
  • 7. Cassandra • CAP-aware • Cassandra values Availability and Partitioning tolerance (AP)  eventually consistent • Providing strong Consistency in Cassandra increases latency • Partitioning • Token oriented • Explicit Replication • Replication factor ≤ Total nodes • High level clients • Python, Java, C#, .NET, Scala, Ruby, PHP, Erlang, Haskell…etc • Thrift  driver-level interface
  • 8. Cassandra • Data Model • Cluster: • Machines (nodes) in a logical Cassandra instance • can contain multiple keyspaces • Keyspace: • name for ColumnFamilies • ColumnFamilies: • contain multiple columns each with name, value and timestamp referenced by row keys. • Analogous to table on RDBMS • SuperColumns: • columns with subcolumns • Rows • Columns keyA Column1 Column2 Column3 keyB Column5 Column6 column10 Column Byte[] Name Byte[] Value I64 Timestamp
  • 10. HBase “HBase is more a datastore than a database” • It lacks many of the features of RDBMS • Distributed and scalable big data store. • Regions model • Strong consistency
  • 11. HBase Built on top of Hadoop Distributed Filesystem (HDFS)
  • 12. HBase • The NameNode is responsible for maintaining the filesystem metadata. • The DataNodes are responsible for storing HDFS blocks.
  • 13. HBase • The NameNode is responsible for maintaining the filesystem metadata. • The DataNodes are responsible for storing HDFS blocks. Note: In our study case, we only had interest on HDFS layer.
  • 14. HBase
  • 16. HBase • Data is stored into HBase tables. • Tables are made of rows and columns. • All columns belong to a particular column family. Important note: All column family members are stored together. • A query on a column family model has a better performance
  • 17. YCSB General View • Which is the best NoSQL DB? • How to compare? • Yahoo! Cloud Serving Benchmark (YCSB) • Benchmarking tool • Evaluate key-value and cloud DBs performance on a common set of workloads • Client – an extensible workload generator • Yahoo! Research • Brian F. Cooper - cooperb@yahoo-inc.com • Joint work with Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sear
  • 18. YCSB Details • How it works? YCSB Client DBInterface Layer Client Threads Statistics Workload Executor Cloud Serving Store Workload file • Read/write mix • Record size • Popularity distribution • … Command line • DB to use • Workload to use • Target throughput • Number of threads • …
  • 19. YCSB Details Benchmark Tiers • Performance • Measure latency/throughput curve • Increase throughput until saturation • Scalability • Scale up: increase hardware, data size and throughput proportionally • Elastic speedup: add servers while running a workload
  • 20. YCSB Details Load phase - Load the database $ ycsb load cassandra-10 –p hosts=127.0.0.1 –P workloadX Transactions phase - Executes the workload $ ycsb run cassandra-10 –p hosts=127.0.0.1 –P workloadX Random Load Distribution
  • 21. YCSB Details • # Yahoo! Cloud System Benchmark • # Workload A: Update heavy workload • # Application example: Session store recording recent actions • # • # Read/update ratio: 50/50 • # Default data size: 1 KB records (10 fields, 100 bytes each, plus key) • # Request distribution: zipfian • recordcount=1000 • operationcount=1000 • workload=com.yahoo.ycsb.workloads.CoreWorkload • readallfields=true • readproportion=0.5 • updateproportion=0.5 • scanproportion=0 • insertproportion=0 • requestdistribution=zipfian
  • 22. YCSB Details • Execution parameters • $ ./bin/ycsb run cassandra-10 –P workloads/workloada –s –threads 10 –target 100 > transactions.dat [OVERALL],RunTime(ms), 10110 [OVERALL],Throughput(ops/sec), 98.91196834817013 [UPDATE], Operations, 491 [UPDATE], AverageLatency(ms), 0.054989816700611 [UPDATE], MinLatency(ms), 0 [UPDATE], MaxLatency(ms), 1 [UPDATE], 95thPercentileLatency(ms), 1 [UPDATE], 99thPercentileLatency(ms), 1 [UPDATE], Return=0, 491 [UPDATE], 0, 464 [UPDATE], 1, 27 [UPDATE], 2, 0 [UPDATE], 3, 0 [UPDATE], 4, 0 ...
  • 23. YCSB Details • $ ./bin/ycsb run basic -P workloads/workloada -P large.dat -s -threads 10 - target 100 –p measurementtype=timeseries -p timeseries.granularity=2000 > transactions.dat [OVERALL],RunTime(ms), 10077 [OVERALL],Throughput(ops/sec), 9923.58836955443 [UPDATE], Operations, 50396 [UPDATE], AverageLatency(ms), 0.04339630129375347 [UPDATE], MinLatency(ms), 0 [UPDATE], MaxLatency(ms), 338 [UPDATE], Return=0, 50396 [UPDATE], 0, 0.10264765784114054 [UPDATE], 2000, 0.026989343690867442 [UPDATE], 4000, 0.0352882703777336 [UPDATE], 6000, 0.004238958990536277 [UPDATE], 8000, 0.052813085033008175 [UPDATE], 10000, 0.0 [READ], Operations, 49604 [READ], AverageLatency(ms), 0.038242883638416256 [READ], MinLatency(ms), 0 [READ], MaxLatency(ms), 230 [READ], Return=0, 49604 [READ], 0, 0.08997245741099663 [READ], 2000, 0.02207505518763797 [READ], 4000, 0.03188493260913297 [READ], 6000, 0.004869141813755326 [READ], 8000, 0.04355329949238579 [READ], 10000, 0.005405405405405406
  • 25. Amazon EC2 Configuration Large Instance 7.5 GB memory 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) 850 GB instance storage 64-bit platform I/O Performance: High API name: m1.large Experiment Set-up Cassandra Cluster 3 nodes + 1 node (Elasticity) Hbase Cluster 3 nodes
  • 26. Amazon EC2 Usage Cassandra Load phase: 60,000,000 records of 1Kb
  • 27. Amazon EC2 Usage HBase Load phase: 60,000,000 records of 1Kb
  • 28. Amazon EC2 Usage Load phase: 60,000,000 records of 1Kb Cassandra HBase
  • 29. Amazon EC2 Usage Load phase: 60,000,000 records of 1KbCassandra HBase
  • 30. Amazon EC2 Usage Transaction phase: - 10,000 records - 1,000,000 operations - 250 threads Cassandra
  • 31. YCSB Cassandra Results Update Heavy Workload (50/50) 0 10 20 30 40 50 60 0 1,000 2,000 3,000 4,000 5,000 6,000 AverageLatency(ms) Throughput (ops/sec) Update 0 10 20 30 40 50 60 0 1,000 2,000 3,000 4,000 5,000 6,000 AverageLatency(ms) Throughput (ops/sec) Read
  • 32. YCSB HBase Results 0.00 0.05 0.10 0.15 0.20 0.25 0.30 471.15 485 492.38 507.17 562.33 620.04 634.82 734.32 845.15 AverageLatency(ms) Throughput (ops/sec) Update Hbase 0.90.5 0.00 200.00 400.00 600.00 800.00 1000.00 1200.00 471.15 485 492.38 507.17 562.33 620.04 634.82 734.32 845.15 AverageLatency(ms) Throughput (ops/sec) Read HBase 0.90.5
  • 33. YCSB Cassandra Results 0 10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 0 50000 100000 150000 200000 250000 300000 350000 400000 Latency(ms) Time miliseconds Elasticity Cassandra 1.0
  • 34. YCSB Cassandra Results 0 10,000 20,000 30,000 40,000 50,000 60,000 70,000 80,000 0 50000 100000 150000 200000 250000 300000 350000 400000 Latency(ms) Time miliseconds Elasticity Cassandra 1.0
  • 35. YCSB Future Provide statistics for: - Availability - Replication Additional Distributed Databases Currently supported: Cassandra Mapkeeper MongoDB Redis Voldemort Vmware vFabric Gemfire Hbase
  • 36. Conclusions • YCSB provides a common ground for benchmarking cloud DB services • Good for leaning and experimenting with different distributed databases • Open source, extensible for new databases • Laboratory with Amazon EC2 provided good insight into setting up cloud services • Challenges • Installation problems • Hard to follow documentation • Working on distributed environment require lots of configuration
  • 37. References • YCSB (Yahoo! Cloud Serving Benchmark) • https://github.com/brianfrankcooper/YCSB/wiki • Yahoo! Research • http://research.yahoo.com/Web_Information_Management/YCSB • BigTable • http://en.wikipedia.org/wiki/BigTable • Cassandra • http://wiki.apache.org/cassandra/ • HBase • http://hbase.apache.org/