SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
Cassandra ETL
Streaming Bulk Loading
     Alex Araujo alexaraujo@gmail.com
Background

•   Sharded MySql ETL
    Platform on EC2 (EBS)

•   Database Size - Up to
    1TB

•   Write latencies
    exponentially
    proportional to data size
Background
• Cassandra Thrift Loading on EC2
  (Ephemeral RAID0)

• Database Size - ∑ available node space
• Write latencies ~linearly proportional to
  number of nodes
• 12 XL node cluster (RF=3): 6-125x
  Improvement over EBS backed MySQL
  systems
Thrift ETL

• Thrift overhead: Converting to/from
  internal structures
• Routing from coordinator nodes
• Writing to commitlog
• Internal structures -> on-disk format
          Source: http://wiki.apache.org/cassandra/BinaryMemtable
Bulk Load
• Core functionality
• Existing ETL
  Nodes for bulk
  loading
• Move data file &
  index generation
  off C* nodes
BMT Bulk Load

• Requires StorageProxy API (Java)
• Rows not live until flush
• Wiki example uses Hadoop
         Source: http://wiki.apache.org/cassandra/BinaryMemtable
Streaming Bulk Load
• Cassandra as Fat Client
• BYO SSTables
• sstableloader [options] /path/to/
  keyspace_dir
  • Can ignore list of nodes (-i)
  • keyspace_dir should be name of keyspace
    and contain generated SSTable Data &
    Index files
Users
UserId     email                      name          ...
<Hash>
            UserGroups
 GroupId             UserId                        ...
 <UUID>     {“date_joined”:”<date>”,”date_left”:
              ”<date>”,”active”:<true|false>}



         UserGroupTimeline
 GroupId       <TimeUUID>                          ...
 <UUID>          UserId
Setup

• Opscode Chef 0.10.2 on EC2
• Cassandra 0.8.2-dev-SNAPSHOT (trunk)
• Custom Java ETL JAR
• The Grinder 3.4 (Jython) Test Harness
Chef 0.10.2

• knife-ec2 bootstrap with --ephemeral
• ec2::ephemeral_raid0 recipe
 • Installs mdadm, unmounts default /mnt,
    creates RAID0 array on /mnt/md0
Chef 0.10.2
• cassandra::default recipe
 • Downloads/extracts apache-cassandra-
    <version>-bin.tar.gz
 • Links /var/lib/cassandra to /raid0/
    cassandra
 • Creates cassandra user & directories,
    increases file limits, sets up cassandra
    service, generates config files
Chef 0.10.2

• cassandra::cluster_node recipe
 • Determines # nodes in cluster
 • Calculates initial_token; generates
    cassandra.yaml
  • Creates keyspace and column families
Chef 0.10.2

• cassandra::bulk_load_node recipe
 • Generates same cassandra.yaml with
    empty initial_token
 • Installs/configures grinder scripts; Java
    ETL JAR
ETL JAR
for (File : files)
{
                         ETL JAR
  importer = new CBLI(...);
  importer.open();
  // Processing omitted
  importer.close()
}
ETL JAR
CassandraBulkLoadImporter.initSSTableWriters():
File tempFiles = new File(“/path/to/Prefs”);
tempFiles.mkdirs();
for (String cfName : COLUMN_FAMILY_NAMES) {
  SSTableSimpleUnsortedWriter writer = new
    SSTableSimpleUnsortedWriter(
        tempFiles, Model.Prefs.Keyspace.name, cfName,
        Model.COLUMN_FAMILY_COMPARATORS.get(cfName),
        null, bufferSizeInMB);// No Super CFs
  writers.put(name, writer);
}
ETL JAR
CassandraBulkLoadImporter.processSuppressionRecips():


for (User user : users) {
  String key = user.getUserId();
  SSTSUW writer = tableWriters.get(Model.Users.CF.name);
  // rowKey() converts String to ByteBuffer
  writer.newRow(rowKey(key));
  o.a.c.t.Column column = newUserColumn(user);
  writer.addColumn(column.name, column.value,
                    column.timestamp);
  ...
  // Repeat for each column family
}
ETL JAR
CassandraBulkLoadImporter.close():

for (String cfName : COLUMN_FAMILY_NAMES) {
  try {
    tableWriters.get(cfName).close();
  }
  catch (IOException e) {
    log.error(“close failed for ”+cfName);
    throw new RuntimeException(cfName+” did not close”);
  }
  String streamCmd = "sstableloader -v --debug"
    + tempFiles.getAbsolutePath();
  Process stream = Runtime.getRuntime().exec(streamCmd);
  if (!stream.waitFor() == 0)
    log.error(“stream failed”);
}
cassandra_bulk_load.py
import random
import sys
import uuid

from   java.io import File
from   net.grinder.script.Grinder import grinder
from   net.grinder.script import Statistics
from   net.grinder.script import Test
from   com.mycompany import App
from   com.mycompany.tool import SingleColumnBulkImport
cassandra_bulk_load.py
input_files = [] # files to load
site_id = str(uuid.uuid4())
import_id = random.randint(1,1000000)
list_ids = []     # lists users will be loaded to

try:
  App.INSTANCE.start()
  dao = App.INSTANCE.getUserDAO()
  bulk_import = SingleColumnBulkImport(
     dao.prefsKeyspace, input_files, site_id,
     list_ids, import_id)
except:
   exception = sys.exc_info()[1]
   print exception.message
   print exception.stackTrace
cassandra_bulk_load.py
# Import stats
grinder.statistics.registerDataLogExpression(
  "Users Imported", "userLong0")
grinder.statistics.registerSummaryExpression(
  "Total Users Imported", "(+ userLong0)")
grinder.statistics.registerDataLogExpression(
  "Import Time", "userLong1")
grinder.statistics.registerSummaryExpression(
  "Import Total Time (sec)", "(/ (+ userLong1) 1000)")

rate_expression = "(* (/ userLong0 (/ (/ userLong1 1000)
"+str(num_threads)+")) "+str(replication_factor)+")"

grinder.statistics.registerSummaryExpression(
  "Cluster Insert Rate (users/sec)", rate_expression)
cassandra_bulk_load.py
# Import and record stats
def import_and_record():
    bulk_import.importFiles()
    grinder.statistics.forCurrentTest.setLong(
      "userLong0", bulk_import.totalLines)
    grinder.statistics.forCurrentTest.setLong(
      "userLong1",
      grinder.statistics.forCurrentTest.time)

# Create an Import Test with a test number and a
description
import_test = Test(1, "Recip Bulk Import").wrap(
  import_and_record)

# A TestRunner instance is created for each thread
class TestRunner:
# This method is called for every run.
   def __call__(self):
      import_test()
Stress Results
• Once Data and Index files generated,
  streaming bulk load is FAST
 • Average: ~2.5x increase over Thrift
 • ~15-300x increase over MySQL
• Impact on cluster is minimal
• Observed downside: Writing own SSTables
  slower than Cassandra
Q’s?

Mais conteúdo relacionado

Mais procurados

ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzDataStax Academy
 
Jvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & CassandraJvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & CassandraQuentin Ambard
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDatabricks
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...DataStax
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflixnkorla1share
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...Altinity Ltd
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 

Mais procurados (20)

ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel LiljencrantzC* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
C* Summit 2013: How Not to Use Cassandra by Axel Liljencrantz
 
Jvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & CassandraJvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & Cassandra
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Cassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ NetflixCassandra Data Modeling - Practical Considerations @ Netflix
Cassandra Data Modeling - Practical Considerations @ Netflix
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 

Destaque

Migration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchMigration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchDataStax Academy
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talkPatrick McFadin
 
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1DataStax Academy
 
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.Ambiente Livre
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXzznate
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in CassandraShogo Hoshii
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...DataStax
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...DataStax
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Markus Höfer
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
 
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...DataStax
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cachergrebski
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Cassandra Summit 2010 Performance Tuning
Cassandra Summit 2010 Performance TuningCassandra Summit 2010 Performance Tuning
Cassandra Summit 2010 Performance Tuningdriftx
 
Cassandra overview: Um Caso Prático
Cassandra overview:  Um Caso PráticoCassandra overview:  Um Caso Prático
Cassandra overview: Um Caso PráticoEiti Kimura
 
Managing (Schema) Migrations in Cassandra
Managing (Schema) Migrations in CassandraManaging (Schema) Migrations in Cassandra
Managing (Schema) Migrations in CassandraDataStax Academy
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And TechniquesKnoldus Inc.
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationRamkumar Nottath
 

Destaque (20)

Migration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a HitchMigration Best Practices: From RDBMS to Cassandra without a Hitch
Migration Best Practices: From RDBMS to Cassandra without a Hitch
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
Cassandra Summit 2014: Lesser Known Features of Cassandra 2.1
 
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
Escalabilidade Linear com o Banco de Dados NoSQL Apache Cassandra.
 
Advanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMXAdvanced Apache Cassandra Operations with JMX
Advanced Apache Cassandra Operations with JMX
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
 
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
Building a Fast, Resilient Time Series Store with Cassandra (Alex Petrov, Dat...
 
Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016Bucket your partitions wisely - Cassandra summit 2016
Bucket your partitions wisely - Cassandra summit 2016
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
How Cassandra Deletes Data (Alain Rodriguez, The Last Pickle) | Cassandra Sum...
 
On heap cache vs off-heap cache
On heap cache vs off-heap cacheOn heap cache vs off-heap cache
On heap cache vs off-heap cache
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
Cassandra Summit 2010 Performance Tuning
Cassandra Summit 2010 Performance TuningCassandra Summit 2010 Performance Tuning
Cassandra Summit 2010 Performance Tuning
 
Cassandra overview: Um Caso Prático
Cassandra overview:  Um Caso PráticoCassandra overview:  Um Caso Prático
Cassandra overview: Um Caso Prático
 
Managing (Schema) Migrations in Cassandra
Managing (Schema) Migrations in CassandraManaging (Schema) Migrations in Cassandra
Managing (Schema) Migrations in Cassandra
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The Sequel
 
Cassandra - Tips And Techniques
Cassandra - Tips And TechniquesCassandra - Tips And Techniques
Cassandra - Tips And Techniques
 
Performance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migrationPerformance tuning - A key to successful cassandra migration
Performance tuning - A key to successful cassandra migration
 

Semelhante a ETL With Cassandra Streaming Bulk Loading

Fun Teaching MongoDB New Tricks
Fun Teaching MongoDB New TricksFun Teaching MongoDB New Tricks
Fun Teaching MongoDB New TricksMongoDB
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraDeependra Ariyadewa
 
DevOps Enabling Your Team
DevOps Enabling Your TeamDevOps Enabling Your Team
DevOps Enabling Your TeamGR8Conf
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparisonshsedghi
 
Local data storage for mobile apps
Local data storage for mobile appsLocal data storage for mobile apps
Local data storage for mobile appsIvano Malavolta
 
Google apps script database abstraction exposed version
Google apps script database abstraction   exposed versionGoogle apps script database abstraction   exposed version
Google apps script database abstraction exposed versionBruce McPherson
 
What is MariaDB Server 10.3?
What is MariaDB Server 10.3?What is MariaDB Server 10.3?
What is MariaDB Server 10.3?Colin Charles
 
Immutable Deployments with AWS CloudFormation and AWS Lambda
Immutable Deployments with AWS CloudFormation and AWS LambdaImmutable Deployments with AWS CloudFormation and AWS Lambda
Immutable Deployments with AWS CloudFormation and AWS LambdaAOE
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosJoe Stein
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database JonesJohn David Duncan
 
Lab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with RelationalLab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with RelationalAmazon Web Services
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynotejbellis
 
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)lennartkats
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Prajal Kulkarni
 
Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesYevgeniy Brikman
 
Pycon 2012 Apache Cassandra
Pycon 2012 Apache CassandraPycon 2012 Apache Cassandra
Pycon 2012 Apache Cassandrajeremiahdjordan
 

Semelhante a ETL With Cassandra Streaming Bulk Loading (20)

Fun Teaching MongoDB New Tricks
Fun Teaching MongoDB New TricksFun Teaching MongoDB New Tricks
Fun Teaching MongoDB New Tricks
 
Apache Cassandra and Go
Apache Cassandra and GoApache Cassandra and Go
Apache Cassandra and Go
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and Cassandra
 
DevOps Enabling Your Team
DevOps Enabling Your TeamDevOps Enabling Your Team
DevOps Enabling Your Team
 
Cassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A ComparisonCassandra Java APIs Old and New – A Comparison
Cassandra Java APIs Old and New – A Comparison
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Cassandra 3.0
Cassandra 3.0Cassandra 3.0
Cassandra 3.0
 
Local data storage for mobile apps
Local data storage for mobile appsLocal data storage for mobile apps
Local data storage for mobile apps
 
Google apps script database abstraction exposed version
Google apps script database abstraction   exposed versionGoogle apps script database abstraction   exposed version
Google apps script database abstraction exposed version
 
What is MariaDB Server 10.3?
What is MariaDB Server 10.3?What is MariaDB Server 10.3?
What is MariaDB Server 10.3?
 
Immutable Deployments with AWS CloudFormation and AWS Lambda
Immutable Deployments with AWS CloudFormation and AWS LambdaImmutable Deployments with AWS CloudFormation and AWS Lambda
Immutable Deployments with AWS CloudFormation and AWS Lambda
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Building and Deploying Application to Apache Mesos
Building and Deploying Application to Apache MesosBuilding and Deploying Application to Apache Mesos
Building and Deploying Application to Apache Mesos
 
Building node.js applications with Database Jones
Building node.js applications with Database JonesBuilding node.js applications with Database Jones
Building node.js applications with Database Jones
 
Lab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with RelationalLab Manual Combaring Redis with Relational
Lab Manual Combaring Redis with Relational
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
Domain-Specific Languages for Composable Editor Plugins (LDTA 2009)
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
 
Reusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modulesReusable, composable, battle-tested Terraform modules
Reusable, composable, battle-tested Terraform modules
 
Pycon 2012 Apache Cassandra
Pycon 2012 Apache CassandraPycon 2012 Apache Cassandra
Pycon 2012 Apache Cassandra
 

Último

Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 

Último (20)

Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 

ETL With Cassandra Streaming Bulk Loading

  • 1. Cassandra ETL Streaming Bulk Loading Alex Araujo alexaraujo@gmail.com
  • 2. Background • Sharded MySql ETL Platform on EC2 (EBS) • Database Size - Up to 1TB • Write latencies exponentially proportional to data size
  • 3. Background • Cassandra Thrift Loading on EC2 (Ephemeral RAID0) • Database Size - ∑ available node space • Write latencies ~linearly proportional to number of nodes • 12 XL node cluster (RF=3): 6-125x Improvement over EBS backed MySQL systems
  • 4. Thrift ETL • Thrift overhead: Converting to/from internal structures • Routing from coordinator nodes • Writing to commitlog • Internal structures -> on-disk format Source: http://wiki.apache.org/cassandra/BinaryMemtable
  • 5. Bulk Load • Core functionality • Existing ETL Nodes for bulk loading • Move data file & index generation off C* nodes
  • 6. BMT Bulk Load • Requires StorageProxy API (Java) • Rows not live until flush • Wiki example uses Hadoop Source: http://wiki.apache.org/cassandra/BinaryMemtable
  • 7. Streaming Bulk Load • Cassandra as Fat Client • BYO SSTables • sstableloader [options] /path/to/ keyspace_dir • Can ignore list of nodes (-i) • keyspace_dir should be name of keyspace and contain generated SSTable Data & Index files
  • 8. Users UserId email name ... <Hash> UserGroups GroupId UserId ... <UUID> {“date_joined”:”<date>”,”date_left”: ”<date>”,”active”:<true|false>} UserGroupTimeline GroupId <TimeUUID> ... <UUID> UserId
  • 9. Setup • Opscode Chef 0.10.2 on EC2 • Cassandra 0.8.2-dev-SNAPSHOT (trunk) • Custom Java ETL JAR • The Grinder 3.4 (Jython) Test Harness
  • 10. Chef 0.10.2 • knife-ec2 bootstrap with --ephemeral • ec2::ephemeral_raid0 recipe • Installs mdadm, unmounts default /mnt, creates RAID0 array on /mnt/md0
  • 11. Chef 0.10.2 • cassandra::default recipe • Downloads/extracts apache-cassandra- <version>-bin.tar.gz • Links /var/lib/cassandra to /raid0/ cassandra • Creates cassandra user & directories, increases file limits, sets up cassandra service, generates config files
  • 12. Chef 0.10.2 • cassandra::cluster_node recipe • Determines # nodes in cluster • Calculates initial_token; generates cassandra.yaml • Creates keyspace and column families
  • 13. Chef 0.10.2 • cassandra::bulk_load_node recipe • Generates same cassandra.yaml with empty initial_token • Installs/configures grinder scripts; Java ETL JAR
  • 15. for (File : files) { ETL JAR importer = new CBLI(...); importer.open(); // Processing omitted importer.close() }
  • 16. ETL JAR CassandraBulkLoadImporter.initSSTableWriters(): File tempFiles = new File(“/path/to/Prefs”); tempFiles.mkdirs(); for (String cfName : COLUMN_FAMILY_NAMES) { SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter( tempFiles, Model.Prefs.Keyspace.name, cfName, Model.COLUMN_FAMILY_COMPARATORS.get(cfName), null, bufferSizeInMB);// No Super CFs writers.put(name, writer); }
  • 17. ETL JAR CassandraBulkLoadImporter.processSuppressionRecips(): for (User user : users) { String key = user.getUserId(); SSTSUW writer = tableWriters.get(Model.Users.CF.name); // rowKey() converts String to ByteBuffer writer.newRow(rowKey(key)); o.a.c.t.Column column = newUserColumn(user); writer.addColumn(column.name, column.value, column.timestamp); ... // Repeat for each column family }
  • 18. ETL JAR CassandraBulkLoadImporter.close(): for (String cfName : COLUMN_FAMILY_NAMES) { try { tableWriters.get(cfName).close(); } catch (IOException e) { log.error(“close failed for ”+cfName); throw new RuntimeException(cfName+” did not close”); } String streamCmd = "sstableloader -v --debug" + tempFiles.getAbsolutePath(); Process stream = Runtime.getRuntime().exec(streamCmd); if (!stream.waitFor() == 0) log.error(“stream failed”); }
  • 19. cassandra_bulk_load.py import random import sys import uuid from java.io import File from net.grinder.script.Grinder import grinder from net.grinder.script import Statistics from net.grinder.script import Test from com.mycompany import App from com.mycompany.tool import SingleColumnBulkImport
  • 20. cassandra_bulk_load.py input_files = [] # files to load site_id = str(uuid.uuid4()) import_id = random.randint(1,1000000) list_ids = [] # lists users will be loaded to try: App.INSTANCE.start() dao = App.INSTANCE.getUserDAO() bulk_import = SingleColumnBulkImport( dao.prefsKeyspace, input_files, site_id, list_ids, import_id) except: exception = sys.exc_info()[1] print exception.message print exception.stackTrace
  • 21. cassandra_bulk_load.py # Import stats grinder.statistics.registerDataLogExpression( "Users Imported", "userLong0") grinder.statistics.registerSummaryExpression( "Total Users Imported", "(+ userLong0)") grinder.statistics.registerDataLogExpression( "Import Time", "userLong1") grinder.statistics.registerSummaryExpression( "Import Total Time (sec)", "(/ (+ userLong1) 1000)") rate_expression = "(* (/ userLong0 (/ (/ userLong1 1000) "+str(num_threads)+")) "+str(replication_factor)+")" grinder.statistics.registerSummaryExpression( "Cluster Insert Rate (users/sec)", rate_expression)
  • 22. cassandra_bulk_load.py # Import and record stats def import_and_record(): bulk_import.importFiles() grinder.statistics.forCurrentTest.setLong( "userLong0", bulk_import.totalLines) grinder.statistics.forCurrentTest.setLong( "userLong1", grinder.statistics.forCurrentTest.time) # Create an Import Test with a test number and a description import_test = Test(1, "Recip Bulk Import").wrap( import_and_record) # A TestRunner instance is created for each thread class TestRunner: # This method is called for every run. def __call__(self): import_test()
  • 23. Stress Results • Once Data and Index files generated, streaming bulk load is FAST • Average: ~2.5x increase over Thrift • ~15-300x increase over MySQL • Impact on cluster is minimal • Observed downside: Writing own SSTables slower than Cassandra