Cassandra Hands On
Niall Milton, CTO, DigBigData
Examples courtesy of Patrick Callaghan, DataStax
Sponsored By
Introduction
—  We will be walking through Cassandra use cases
from Patrick Callaghan on github.
—  https://github.com/PatrickCallaghan/
—  Patrick sends his apologies: because of the Aer Lingus
strike on Friday he couldn’t get a flight back to the
UK
—  This presentation will cover the important points
from each sample application
Agenda
—  Transactions Example
—  Paging Example
—  Analytics Example
—  Risk Sensitivity Example
Transactions Example
Scenario
—  We want to add products, each with a quantity to
an order
—  Orders come in concurrently from random buyers
—  Products that have sold out will return “OUT OF
STOCK”
—  We want to use lightweight transactions to
guarantee that we do not allow orders to complete
when no stock is available
Lightweight Transactions
—  Guarantee a serial isolation level (linearizable consistency) for
each conditional statement; not multi-statement ACID transactions
—  Uses PAXOS consensus algorithm to achieve this in a
distributed system. See:
—  http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
—  Every node is still equal, no master or locks
—  Allows for conditional inserts & updates
—  The cost of linearizable consistency is higher latency,
not suitable for high volume writes where low latency is
required
Retrieve & Run the Code
1.  git clone https://github.com/PatrickCallaghan/datastax-transaction-demo.git
2.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.demo.SchemaSetup"
3.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.transactions.Main" -Dload=true -DcontactPoints=127.0.0.1 -DnoOfThreads=10
Schema
1.  create keyspace if not exists
datastax_transactions_demo WITH replication =
{'class': 'SimpleStrategy',
'replication_factor': '1' };
2.  create table if not exists products(productId
text, capacityleft int, orderIds set<text>,
PRIMARY KEY (productId));
3.  create table if not exists
buyers_orders(buyerId text, orderId text,
productId text, PRIMARY KEY(buyerId, orderId));
Model
public class Order {	
	
	private String orderId;	
	private String productId;	
	private String buyerId;	
		
	…	
}
Method
—  Find current product quantity at CL.SERIAL
—  This allows us to execute a PAXOS query without
proposing an update, i.e. read the current value
SELECT capacityleft FROM products WHERE
productId = '1234';
e.g. capacityLeft = 5
Method Contd.
—  Do a conditional update using IF operator to make
sure product quantity has not changed since last
quantity check
—  Note the use of the set collection type here.
—  This statement will only succeed if the IF condition is
met
UPDATE products SET orderIds = orderIds +
{'3'}, capacityleft = 4 WHERE productId =
'1234' IF capacityleft = 5;
Method Contd.
—  If last query succeeds, simply insert the order.
INSERT INTO buyers_orders (buyerId, orderId,
productId) VALUES ('1', '3', '1234');
—  This guarantees that no order will be placed where
there is insufficient quantity to fulfill it.
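
The three steps above (read, conditional update, insert) can be tried without a cluster. What follows is an in-memory analogy only, not the demo's code and not the Cassandra driver API: a ConcurrentHashMap stands in for the products table, and its replace(key, expected, updated) call plays the role of UPDATE ... IF capacityleft = <expected>. The names StockSketch, placeOrder and simulate are assumptions made for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for the products table: productId -> capacityleft. */
final class StockSketch {
    private final ConcurrentHashMap<String, Integer> capacityLeft = new ConcurrentHashMap<>();
    private final Set<String> orders = ConcurrentHashMap.newKeySet();

    StockSketch(String productId, int capacity) {
        capacityLeft.put(productId, capacity);
    }

    /** Returns true if the order was placed, false if the product is out of stock. */
    boolean placeOrder(String productId, String orderId) {
        while (true) {
            // Step 1: read the current quantity (the SELECT in the slides).
            Integer current = capacityLeft.get(productId);
            if (current == null || current <= 0) {
                return false; // "OUT OF STOCK"
            }
            // Step 2: conditional update, applied only if the value is unchanged.
            if (capacityLeft.replace(productId, current, current - 1)) {
                // Step 3: the condition held, so record the order (the INSERT).
                orders.add(orderId);
                return true;
            }
            // Another buyer changed the quantity first: re-read and retry.
        }
    }

    /** Runs `buyers` concurrent attempts against `capacity` units; returns how many succeeded. */
    static int simulate(int capacity, int buyers) {
        StockSketch stock = new StockSketch("1234", capacity);
        AtomicInteger placed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(10);
        for (int i = 0; i < buyers; i++) {
            String orderId = "order-" + i;
            pool.submit(() -> {
                if (stock.placeOrder("1234", orderId)) {
                    placed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return placed.get();
    }
}
```

Calling StockSketch.simulate(5, 50) runs 50 concurrent order attempts against 5 units. Only 5 can succeed, because a decrement is applied only when the quantity that was read is still the current one.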
Comments
—  Using LWT incurs a cost of higher latency because
a quorum of replicas must be consulted, over several
round trips, before a value is committed / returned.
—  CL.SERIAL does not propose a new value but is
used to read the possibly uncommitted PAXOS
state
—  The IF operator can also be used as IF NOT EXISTS
which is useful for user creation for example
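
As a rough analogy for the IF NOT EXISTS case, the sketch below uses ConcurrentHashMap.putIfAbsent so that the first writer of a username wins and later writers are told it is taken. This is plain Java, not CQL or the driver API, and UserRegistrySketch, register and emailOf are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

/** In-memory analogy for INSERT ... IF NOT EXISTS: the first writer wins. */
final class UserRegistrySketch {
    private final ConcurrentHashMap<String, String> emailByUsername = new ConcurrentHashMap<>();

    /** Returns true if the username was free and is now registered, false if it was taken. */
    boolean register(String username, String email) {
        // putIfAbsent returns null only when no previous mapping existed.
        return emailByUsername.putIfAbsent(username, email) == null;
    }

    String emailOf(String username) {
        return emailByUsername.get(username);
    }
}
```

A second register call with the same username returns false and leaves the first email in place, which is the behaviour wanted for user creation.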
Paging Example
Scenario
—  We have 1000s of products in our product
catalogue
—  We want to browse these using a simple select
—  We don’t want to retrieve all at once!
Cursors
—  We are often dealing with wide rows in Cassandra
—  Reading entire rows or multiple rows at once could
lead to OOM errors
—  Traditionally this meant using range queries to
retrieve content
—  Cassandra 2.0 (and Java driver) introduces cursors
—  Makes row based queries more efficient (no need to
use the token() function)
—  This will simplify client code
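
For contrast, the older range-query style of paging can be sketched as follows: the client remembers the last token it saw and asks for the next slice of rows with a greater token. This is an in-memory simulation under stated assumptions, not driver code. A TreeMap stands in for the token-ordered table, String.hashCode stands in for the partitioner's hash, and TokenPagingSketch, page and readAll are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Simulates manual paging over rows stored in token order. */
final class TokenPagingSketch {
    // token -> productId, kept sorted by token as a partitioner orders partitions.
    private final NavigableMap<Long, String> rowsByToken = new TreeMap<>();

    void add(String productId) {
        // Stand-in hash (assumption); hash collisions are ignored in this sketch.
        rowsByToken.put((long) productId.hashCode(), productId);
    }

    /** Returns up to `limit` entries whose token is strictly greater than `afterToken`. */
    List<Map.Entry<Long, String>> page(long afterToken, int limit) {
        List<Map.Entry<Long, String>> page = new ArrayList<>();
        for (Map.Entry<Long, String> entry : rowsByToken.tailMap(afterToken, false).entrySet()) {
            if (page.size() == limit) {
                break;
            }
            page.add(entry);
        }
        return page;
    }

    /** Client-side loop: remember the last token seen and ask for the next range. */
    List<String> readAll(int pageSize) {
        List<String> all = new ArrayList<>();
        long lastToken = Long.MIN_VALUE;
        while (true) {
            List<Map.Entry<Long, String>> page = page(lastToken, pageSize);
            if (page.isEmpty()) {
                break;
            }
            for (Map.Entry<Long, String> entry : page) {
                all.add(entry.getValue());
            }
            lastToken = page.get(page.size() - 1).getKey();
        }
        return all;
    }
}
```

The bookkeeping in readAll is the part that driver-side cursors take off the client's hands.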
Retrieve & Run the Code
1.  git clone https://github.com/PatrickCallaghan/datastax-paging-demo.git
2.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.demo.SchemaSetup"
3.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.paging.Main"
Schema
create table if not exists
products(productId text, capacityleft int,
orderIds set<text>, PRIMARY KEY
(productId));
—  N.B. With the default partitioner, products are
ordered by the Murmur3 hash of productId, not by
productId itself. The old way, we would need the
token() function to page through them in that order
Model
public class Product {	
	
	private String productId;	
	private int capacityLeft;	
	private Set<String> orderIds;	
	
	…	
}
Method
1.  Create a simple select query for the products
table.
2.  Set the fetch size parameter
3.  Execute the statement
Statement stmt = new
SimpleStatement("SELECT * FROM products");
stmt.setFetchSize(100);	
ResultSet resultSet =
this.session.execute(stmt);
Method Contd.
1.  Get an iterator for the result set
2.  Use a while loop to iterate over the result set
Iterator<Row> iterator = resultSet.iterator();	
while (iterator.hasNext()){	
	Row row = iterator.next();	
// do stuff with the row	
}
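
To make the idea behind the fetch size concrete, here is a small stand-alone iterator that loads one page at a time and fetches the next page only when the current one is used up. It is a simulation of the concept, not the driver's implementation: a List stands in for the server-side result, and PagedIteratorSketch and pagesFetched are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Iterator that pulls rows from a backing source one page at a time. */
final class PagedIteratorSketch<T> implements Iterator<T> {
    private final List<T> source;   // stands in for the server-side result
    private final int fetchSize;
    private int nextOffset = 0;     // paging state: where the next page starts
    private Iterator<T> currentPage = Collections.emptyIterator();
    private int pagesFetched = 0;

    PagedIteratorSketch(List<T> source, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        this.source = source;
        this.fetchSize = fetchSize;
    }

    @Override
    public boolean hasNext() {
        if (!currentPage.hasNext() && nextOffset < source.size()) {
            // Current page exhausted: fetch the next one on demand.
            int end = Math.min(nextOffset + fetchSize, source.size());
            currentPage = new ArrayList<>(source.subList(nextOffset, end)).iterator();
            nextOffset = end;
            pagesFetched++;
        }
        return currentPage.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentPage.next();
    }

    int pagesFetched() {
        return pagesFetched;
    }
}
```

Iterating 250 rows with a fetch size of 100 takes three page fetches, and the calling code is still an ordinary while loop over hasNext() and next().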
Comments
—  Very easy to transparently iterate in a memory
efficient way over a large result set
—  Cursor state is maintained by driver.
—  Allows for failover between different page
responses, i.e. the state is not lost if a page fails to
load from a node in the replica set, the page will be
requested from another node
—  See: http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
Analytics Example
Scenario
—  Don’t have Hadoop but want to run some HIVE type
analytics on our large dataset
—  Example: Get the Top10 financial transactions
ordered by monetary value for each user
—  May want to add more complex filtering later
(where value > 1000) or even do mathematical
groupings, percentiles, means, min, max
Cassandra for Analytics
—  Useful for many scenarios when no other analytics
solution is available
—  Using cursors, queries are bounded & memory efficient
depending on the operation
—  Can be applied anywhere we can do iterative or recursive
processing, SUM, AVG, MIN, MAX etc.
—  NB: The example code also includes a
CQLSSTableWriter which is fast & convenient if we want
to manually create SSTables of large datasets rather
than send millions of insert queries to Cassandra
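
Aggregates such as SUM, AVG, MIN and MAX fit iterative processing because each row can be folded into a running result and then discarded. A minimal sketch using the JDK's DoubleSummaryStatistics is shown here; RunningStatsSketch and summarize are hypothetical names, and the iterator of amounts stands in for rows read through a cursor.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Iterator;

final class RunningStatsSketch {
    /** Folds amounts into count, sum, min, max and mean without storing them. */
    static DoubleSummaryStatistics summarize(Iterator<Double> amounts) {
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        while (amounts.hasNext()) {
            stats.accept(amounts.next());
        }
        return stats;
    }
}
```

Memory use stays constant however many rows are read, which is what makes this kind of query bounded.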
Retrieve & Run the Code
1.  git clone https://github.com/PatrickCallaghan/datastax-analytics-example.git
2.  export MAVEN_OPTS=-Xmx512M (up the memory)
3.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.bulkloader.Main"
4.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.analytics.TopTransactionsByAmountForUserRunner"
Schema
create table IF NOT EXISTS transactions (	
	accid text,	
	txtnid uuid,	
	txtntime timestamp,	
	amount double,	
	type text,	
	reason text,	
	PRIMARY KEY(accid, txtntime)	
);
Model
public class Transaction {	
	private String txtnId;
	private String acountId;	
	private double amount;	
	private Date txtnDate;	
	private String reason;	
	private String type;	
	…	
}
Method
—  Pass a blocking queue into the DAO method which cursors the
data, allows us to pop items off as they are added
—  NB: Could also use a callback here to update the queue
public void getAllProducts(BlockingQueue<Transaction> processorQueue)
Statement stmt = new SimpleStatement("SELECT * FROM transactions");
stmt.setFetchSize(2500);
ResultSet resultSet = this.session.execute(stmt);
Method Contd.
1.  Get an iterator for the result set
2.  Use a while loop to iterate over the result set, add each row
into the queue
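Iterator<Row> iterator = resultSet.iterator();
while (iterator.hasNext()) {
	Row row = iterator.next();
	// createTransactionFromRow is a convenience method mapping a Row to a Transaction
	Transaction transaction = createTransactionFromRow(row);
	processorQueue.offer(transaction);
}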
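
The hand-off between the reading side and the processing side can be sketched with the JDK alone. In this sketch a producer thread pushes amounts into a bounded queue and the consumer totals them as they arrive. Two choices here are assumptions rather than the demo's design: an empty Optional marks the end of the stream, and put is used instead of offer so that nothing is dropped when the bounded queue is full. QueueHandOffSketch and its methods are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class QueueHandOffSketch {
    /** Producer: pushes each amount into the queue, then an empty end marker. */
    static void produce(List<Double> amounts, BlockingQueue<Optional<Double>> queue)
            throws InterruptedException {
        for (Double amount : amounts) {
            queue.put(Optional.of(amount)); // blocks while the queue is full
        }
        queue.put(Optional.empty());
    }

    /** Consumer: takes items as they arrive and returns their total. */
    static double consume(BlockingQueue<Optional<Double>> queue) throws InterruptedException {
        double total = 0;
        while (true) {
            Optional<Double> next = queue.take(); // blocks while the queue is empty
            if (!next.isPresent()) {
                return total;
            }
            total += next.get();
        }
    }

    /** Wires a producer thread to a consumer on the calling thread. */
    static double run(List<Double> amounts, int capacity) {
        BlockingQueue<Optional<Double>> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                produce(amounts, queue);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            double total = consume(queue);
            producer.join();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With a capacity smaller than the input, the producer simply waits for the consumer to catch up, so memory stays bounded by the queue size.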
Method Contd.
1.  Use Java Collections & Transaction comparator to
track Top results
private Set<Transaction> orderedSet = new
BoundedTreeSet<Transaction>(10, new
TransactionAmountComparator());
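
BoundedTreeSet comes from the example repository and its source is not shown in these slides. As an illustration of the idea only, a bounded top-N collection could look like the sketch below, which wraps a TreeSet and evicts the smallest element once the limit is exceeded. TopNSketch is a hypothetical class, not the repository's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical bounded set keeping the `maxSize` greatest elements by the comparator. */
final class TopNSketch<T> {
    private final int maxSize;
    private final TreeSet<T> set;

    TopNSketch(int maxSize, Comparator<? super T> comparator) {
        this.maxSize = maxSize;
        this.set = new TreeSet<>(comparator);
    }

    void add(T item) {
        set.add(item);
        if (set.size() > maxSize) {
            set.pollFirst(); // evict the smallest element
        }
    }

    /** Elements from largest to smallest. */
    List<T> descending() {
        return new ArrayList<>(set.descendingSet());
    }
}
```

One caveat with any TreeSet-based approach: elements that compare as equal are treated as duplicates, so a comparator on amount alone would keep only one of two transactions with the same amount unless it breaks ties on another field.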
Comments
—  Entirely possible, but probably not to be thought of as a
complete replacement for dedicated analytics solutions
—  Issues are token distribution across replicas and mixed write
and read patterns
—  Running analytics or MR operations can be a read heavy
operation (as well as memory and i/o intensive)
—  Transaction logging tends to be write heavy
—  Cassandra can handle it, but in practice it is better to split
workloads except for smaller cases, where latency doesn’t
matter or where the cluster is not generally under significant
load
—  Consider DSE Hadoop, Spark, Storm as alternatives
Risk Sensitivity Example
Scenario
—  In financial risk systems, positions have sensitivities to
certain variables
—  Positions are hierarchical: each is associated with a trader
at a desk, which is part of an asset type in a certain
location.
—  E.g. Frankfurt/FX/desk10/trader7/position23
—  Sensitivity values are inserted for each position. We
need to aggregate them for each level in the hierarchy
—  Sensitivities are represented as deltas, so the sum of all
deltas over time is the current sensitivity.
Scenario
—  E.g. Aggregations for:
—  Frankfurt/FX/desk10/trader7
—  Frankfurt/FX/desk10
—  Frankfurt/FX
—  As new positions are entered the risk sensitivities will
change and will need to be aggregated for each level
for the new value to be available
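
The levels that need re-aggregating when a position changes can be derived from the position's path by stripping one segment at a time. The sketch below assumes '/'-separated paths as in the examples above; HierarchyPathsSketch and ancestors are hypothetical names. It also returns the top-level location (for example Frankfurt), which the slide does not list, so treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class HierarchyPathsSketch {
    /** Returns every proper ancestor of a '/'-separated path, deepest first. */
    static List<String> ancestors(String path) {
        List<String> result = new ArrayList<>();
        int slash = path.lastIndexOf('/');
        while (slash > 0) {
            path = path.substring(0, slash);
            result.add(path);
            slash = path.lastIndexOf('/');
        }
        return result;
    }
}
```

For Frankfurt/FX/desk10/trader7/position23 this yields the trader, desk, asset type and location levels, in that order.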
Queries
select * from risk_sensitivities_hierarchy
where hier_path = 'Paris/FX';
select * from risk_sensitivities_hierarchy
where hier_path = 'Paris/FX/desk4' and
sub_hier_path='trader3';
select * from risk_sensitivities_hierarchy
where hier_path = 'Paris/FX/desk4' and
sub_hier_path='trader3' and
risk_sens_name='irDelta';
Retrieve & Run the Code
1.  git clone https://github.com/PatrickCallaghan/datastax-analytics-example.git
2.  export MAVEN_OPTS=-Xmx512M (up the memory)
3.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.bulkloader.Main"
4.  mvn clean compile exec:java -Dexec.mainClass="com.heb.finance.analytics.Main" -DstopSize=1000000
Schema
create table if not exists risk_sensitivities_hierarchy ( 	
	hier_path text,	
	sub_hier_path text, 	
	risk_sens_name text, 	
	value double, 	
	PRIMARY KEY (hier_path, sub_hier_path,
risk_sens_name)	
) WITH compaction={'class': 'LeveledCompactionStrategy'};	
NB: Notice the use of LCS as we want the table to be efficient for
reads also
Model
public class RiskSensitivity {
	public final String name;	
	public final String path;	
	public final String position;	
	public final BigDecimal value;	
	…	
}
Method
—  Write a service to write new sensitivities to
Cassandra periodically.
insert into risk_sensitivities_hierarchy
(hier_path, sub_hier_path, risk_sens_name,
value) VALUES (?, ?, ?, ?)
Method Contd.
—  In our aggregator do the following periodically
—  Select data for hierarchies we wish to aggregate
select * from risk_sensitivities_hierarchy where
hier_path = 'Frankfurt/FX/desk10/trader4'
—  Will get all positions related to this hierarchy
—  Add the values (represented as deltas) to each other to get
the new sensitivity
—  E.g. S1 = -3, S2 = 2, S3= -1
—  Write it back for ‘Frankfurt/FX/desk10/trader4’
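
The aggregation step itself is a plain sum of deltas. A minimal sketch with BigDecimal, the type used for value in the model class, is given here; DeltaSumSketch and aggregate are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.List;

final class DeltaSumSketch {
    /** Adds the deltas together to produce the current sensitivity. */
    static BigDecimal aggregate(List<BigDecimal> deltas) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal delta : deltas) {
            total = total.add(delta);
        }
        return total;
    }
}
```

With the values from the example, S1 = -3, S2 = 2 and S3 = -1, the aggregated sensitivity is -2.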
Comments
—  Simple way to maintain an up-to-date risk sensitivity
on an ongoing basis, based on previous data
—  Will mean (N Hierarchies) * (N variables) queries
are executed periodically (keep an eye on this)
—  Cursors, blocking queue and bounded collections
help us achieve the same result without reading
entire rows
—  Has other applications such as roll ups for stream
data provided you have a reasonably low cardinality
in terms of number of (time resolution) * variables.
—  Thanks Patrick Callaghan for the hard work coding
the examples!
— Questions?

Cassandra hands on

  • 1. Cassandra Hands On Niall Milton, CTO, DigBigData Examples courtesy of Patrick Callaghan, DataStax Sponsored By
  • 2. Introduction —  We will be walking through Cassandra use cases from Patrick Callaghan on github. —  https://github.com/PatrickCallaghan/ —  Patrick sends his apologies but due to Aer Lingus air strike on Friday he couldn’t get a flight back to UK —  This presentation will cover the important points from each sample application
  • 3. Agenda —  Transactions Example —  Paging Example —  Analytics Example —  Risk Sensitivity Example
  • 5. Scenario —  We want to add products, each with a quantity to an order —  Orders come in concurrently from random buyers —  Products that have sold out will return “OUT OF STOCK” —  We want to use lightweight transactions to guarantee that we do not allow orders to complete when no stock is available
  • 6. Lightweight Transactions —  Guarantee a serial isolation level, ACID —  Uses PAXOS consensus algorithm to achieve this in a distributed system. See: —  http://research.microsoft.com/en-us/um/people/lamport/ pubs/paxos-simple.pdf —  Every node is still equal, no master or locks —  Allows for conditional inserts & updates —  The cost of linearizable consistency is higher latency, not suitable for high volume writes where low latency is required
  • 7. Retrieve & Run the Code 1.  git clone https://github.com/PatrickCallaghan/datastax- transaction-demo.git 2.  mvn clean compile exec:java - Dexec.mainClass="com.datastax.demo.SchemaSetup” 3.  mvn clean compile exec:java - Dexec.mainClass="com.datastax.transactions.Main" - Dload=true -DcontactPoints=127.0.0.1 - DnoOfThreads=10
  • 8. Schema 1.  create keyspace if not exists datastax_transactions_demo WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1' }; 2.  create table if not exists products(productId text, capacityleft int, orderIds set<text>, PRIMARY KEY (productId)); 3.  create table if not exists buyers_orders(buyerId text, orderId text, productId text, PRIMARY KEY(buyerId, orderId));
  • 9. Model public class Order { private String orderId; private String productId; private String buyerId; … }
  • 10. Method —  Find current product quantity at CL.SERIAL —  This allows us to execute a PAXOS query without proposing an update, i.e. read the current value SELECT capacityLeft from products WHERE productId = ‘1234’ e.g. capacityLeft = 5
  • 11. Method Contd. —  Do a conditional update using IF operator to make sure product quantity has not changed since last quantity check —  Note the use of the set collection type here. —  This statement will only succeed if the IF condition is met UPDATE products SET orderIds=orderIds + {'3'}, capacityleft = 4 WHERE productId = ’1234' IF capacityleft = 5;
  • 12. Method Contd. —  If last query succeeds, simply insert the order. INSERT into orders (buyerId, orderId, productId) values (1,3,’1234’); —  This guarantees that no order will be placed where there is insufficient quantity to fulfill it.
  • 13. Comments —  Using LWT incurs a cost of higher latency because all replicas must be consulted before a value is committed / returned. —  CL.SERIAL does not propose a new value but is used to read the possibly uncommitted PAXOS state —  The IF operator can also be used as IF NOT EXISTS which is useful for user creation for example
  • 15. Scenario —  We have 1000s of products in our product catalogue —  We want to browse these using a simple select —  We don’t want to retrieve all at once!
  • 16. Cursors —  We are often dealing with wide rows in Cassandra —  Reading entire rows or multiple rows at once could lead to OOM errors —  Traditionally this meant using range queries to retrieve content —  Cassandra 2.0 (and Java driver) introduces cursors —  Makes row based queries more efficient (no need to use the token() function) —  This will simplify client code
  • 17. Retrieve & Run the Code 1.  git clone https://github.com/PatrickCallaghan/datastax- paging-demo.git 2.  mvn clean compile exec:java - Dexec.mainClass="com.datastax.demo.SchemaSetup" 3.  mvn clean compile exec:java - Dexec.mainClass="com.datastax.paging.Main"
  • 18. Schema create table if not exists products(productId text, capacityleft int, orderIds set<text>, PRIMARY KEY (productId)); —  N.B With the default partitioner, products will be ordered based on Murmer3 hash value. Old way we would need to use the token() function to retrieve them in order
  • 19. Model public class Product { private String productId; private int capacityLeft; private Set<String> orderIds; … }
  • 20. Method 1.  Create a simple select query for the products table. 2.  Set the fetch size parameter 3.  Execute the statement Statement stmt = new SimpleStatement("Select * from products”); stmt.setFetchSize(100); ResultSet resultSet = this.session.execute(stmt);
  • 21. Method Contd. 1.  Get an iterator for the result set 2.  Use a while loop to iterate over the result set Iterator<Row> iterator = resultSet.iterator(); while (iterator.hasNext()){ Row row = iterator.next(); // do stuff with the row }
  • 22. Comments —  Very easy to transparently iterate in a memory efficient way over a large result set —  Cursor state is maintained by driver. —  Allows for failover between different page responses, i.e. the state is not lost if a page fails to load from a node in the replica set, the page will be requested from another node —  See: http://www.datastax.com/dev/blog/client- side-improvements-in-cassandra-2-0
  • 24. Scenario —  Don’t have Hadoop but want to run some HIVE type analytics on our large dataset —  Example: Get the Top10 financial transactions ordered by monetary value for each user —  May want to add more complex filtering later (where value > 1000) or even do mathematical groupings, percentiles, means, min, max
  • 25. Cassandra for Analytics —  Useful for many scenarios when no other analytics solution is available —  Using cursors, queries are bounded & memory efficient depending on the operation —  Can be applied anywhere we can do iterative or recursive processing, SUM, AVG, MIN, MAX etc. —  NB: The example code also includes a CQLSSTableWriter which is fast & convenient if we want to manually create SSTables of large datasets rather than send millions of insert queries to Cassandra
  • 26. Retrieve & Run the Code 1.  git clone https://github.com/PatrickCallaghan/datastax-analytics-example.git 2.  export MAVEN_OPTS=-Xmx512M (up the memory) 3.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.bulkloader.Main" 4.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.analytics.TopTransactionsByAmountForUserRunner"
  • 27. Schema create table IF NOT EXISTS transactions ( accid text, txtnid uuid, txtntime timestamp, amount double, type text, reason text, PRIMARY KEY(accid, txtntime) );
  • 28. Model public class Transaction { private String txtnId; private String acountId; private double amount; private Date txtnDate; private String reason; private String type; … }
  • 29. Method —  Pass a blocking queue into the DAO method which cursors the data; this allows us to pop items off as they are added —  NB: Could also use a callback here to update the queue public void getAllProducts(BlockingQueue<Transaction> processorQueue) { Statement stmt = new SimpleStatement("SELECT * FROM transactions"); stmt.setFetchSize(2500); ResultSet resultSet = this.session.execute(stmt);
  • 30. Method Contd. 1.  Get an iterator for the result set 2.  Use a while loop to iterate over the result set, add each row into the queue while (iterator.hasNext()) { Row row = iterator.next(); Transaction transaction = createTransactionFromRow(row); //convenience queue.offer(transaction); }
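The hand-off between the DAO loop and the processor can be sketched with `java.util.concurrent` alone. This is an illustration under assumptions, not the example repository's code: `QueuedTxn`, `QueueHandOffSketch` and the `END` marker are hypothetical, and the producer generates rows instead of reading them from Cassandra. The sketch uses `put` rather than `offer`, because `offer` on a full bounded queue returns `false` and the row would be dropped unless the caller checks the result.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical, simplified transaction; the real model has more fields.
class QueuedTxn {
    final String accountId;
    final double amount;

    QueuedTxn(String accountId, double amount) {
        this.accountId = accountId;
        this.amount = amount;
    }
}

class QueueHandOffSketch {
    // Hypothetical end-of-stream marker, compared by identity.
    static final QueuedTxn END = new QueuedTxn("END", 0.0);

    // Producer: stands in for the DAO loop that iterates over the result set.
    static Thread startProducer(final BlockingQueue<QueuedTxn> queue, final int rows) {
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < rows; i++) {
                    queue.put(new QueuedTxn("acc" + (i % 3), i)); // blocks while the queue is full
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        return producer;
    }

    // Consumer: takes items as they arrive and keeps a running total.
    static double consumeAndSum(BlockingQueue<QueuedTxn> queue) {
        double total = 0.0;
        try {
            while (true) {
                QueuedTxn txn = queue.take();   // blocks while the queue is empty
                if (txn == END) {
                    return total;
                }
                total += txn.amount;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<QueuedTxn> queue = new ArrayBlockingQueue<>(16);
        startProducer(queue, 1000);
        System.out.println("sum of amounts = " + consumeAndSum(queue));
    }
}
```

A small bounded queue keeps memory use flat: the producer waits whenever the consumer falls behind, so the number of rows in flight never exceeds the queue capacity.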
  • 31. Method Contd. 1.  Use Java Collections & Transaction comparator to track Top results private Set<Transaction> orderedSet = new BoundedTreeSet<Transaction>(10, new TransactionAmountComparator());
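The slide names a `BoundedTreeSet` but does not show it. One way such a collection could work is sketched below, under assumptions: `BoundedTopSet` and `AmountTx` are hypothetical names, and the eviction rule (drop the smallest element once the bound is exceeded) is a guess at the intended behaviour, not the repository's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical bounded sorted set: keeps only the maxSize largest elements.
class BoundedTopSet<T> {
    private final int maxSize;
    private final TreeSet<T> set;

    BoundedTopSet(int maxSize, Comparator<? super T> comparator) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.set = new TreeSet<>(comparator);
    }

    // Insert, then evict the smallest element if the bound is exceeded.
    void add(T element) {
        set.add(element);
        if (set.size() > maxSize) {
            set.pollFirst();
        }
    }

    // Largest first.
    List<T> descending() {
        return new ArrayList<>(set.descendingSet());
    }

    int size() {
        return set.size();
    }
}

// Hypothetical, simplified transaction used only for the demo.
class AmountTx {
    final String id;
    final double amount;

    AmountTx(String id, double amount) {
        this.id = id;
        this.amount = amount;
    }
}

class TopTransactionsSketch {
    // Order by amount, then by id, so equal amounts are not treated as duplicates.
    static Comparator<AmountTx> byAmountThenId() {
        return Comparator.comparingDouble((AmountTx t) -> t.amount)
                .thenComparing((AmountTx t) -> t.id);
    }

    public static void main(String[] args) {
        BoundedTopSet<AmountTx> top = new BoundedTopSet<>(10, byAmountThenId());
        for (int i = 1; i <= 100; i++) {
            top.add(new AmountTx("tx" + i, i * 10.0));
        }
        for (AmountTx t : top.descending()) {
            System.out.println(t.id + " " + t.amount);
        }
    }
}
```

A `TreeSet` treats elements that compare as equal as duplicates, so the comparator needs a tie-breaker such as the id; otherwise two transactions with the same amount would collapse into one entry.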
  • 32. Comments —  Entirely possible, but probably not to be thought of as a complete replacement for dedicated analytics solutions —  Issues are token distribution across replicas and mixed write and read patterns —  Running analytics or MR operations can be a read heavy operation (as well as memory and i/o intensive) —  Transaction logging tends to be write heavy —  Cassandra can handle it, but in practice it is better to split workloads except for smaller cases, where latency doesn’t matter or where the cluster is not generally under significant load —  Consider DSE Hadoop, Spark, Storm as alternatives
  • 34. Scenario —  In financial risk systems, positions have sensitivity to certain variables —  Positions are hierarchical: each is associated with a trader at a desk, which is part of an asset type in a certain location. —  E.g. Frankfurt/FX/desk10/trader7/position23 —  Sensitivity values are inserted for each position. We need to aggregate them for each level in the hierarchy —  Sensitivities are represented as deltas, so the sum of all sensitivities over time is the new sensitivity.
  • 35. Scenario —  E.g. Aggregations for: —  Frankfurt/FX/desk10/trader7 —  Frankfurt/FX/desk10 —  Frankfurt/FX —  As new positions are entered the risk sensitivities will change and will need to be aggregated for each level for the new value to be available
  • 36. Queries select * from risk_sensitivities_hierarchy where hier_path = 'Paris/FX'; select * from risk_sensitivities_hierarchy where hier_path = 'Paris/FX/desk4' and sub_hier_path='trader3'; select * from risk_sensitivities_hierarchy where hier_path = 'Paris/FX/desk4' and sub_hier_path='trader3' and risk_sens_name='irDelta';
  • 37. Retrieve & Run the Code 1.  git clone https://github.com/PatrickCallaghan/datastax-analytics-example.git 2.  export MAVEN_OPTS=-Xmx512M (up the memory) 3.  mvn clean compile exec:java -Dexec.mainClass="com.datastax.bulkloader.Main" 4.  mvn clean compile exec:java -Dexec.mainClass="com.heb.finance.analytics.Main" -DstopSize=1000000
  • 38. Schema create table if not exists risk_sensitivities_hierarchy ( hier_path text, sub_hier_path text, risk_sens_name text, value double, PRIMARY KEY (hier_path, sub_hier_path, risk_sens_name) ) WITH compaction={'class': 'LeveledCompactionStrategy'}; NB: Notice the use of LCS as we want the table to be efficient for reads also
  • 39. Model public class RiskSensitivity { public final String name; public final String path; public final String position; public final BigDecimal value; … }
  • 40. Method —  Write a service to write new sensitivities to Cassandra periodically. insert into risk_sensitivities_hierarchy (hier_path, sub_hier_path, risk_sens_name, value) VALUES (?, ?, ?, ?)
  • 41. Method Contd. —  In our aggregator do the following periodically —  Select data for hierarchies we wish to aggregate select * from risk_sensitivities_hierarchy where hier_path = 'Frankfurt/FX/desk10/trader4' —  Will get all positions related to this hierarchy —  Add the values (represented as deltas) to each other to get the new sensitivity —  E.g. S1 = -3, S2 = 2, S3 = -1 gives a new sensitivity of -2 —  Write it back for 'Frankfurt/FX/desk10/trader4'
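The aggregation step can be sketched in plain Java. This is a simplified illustration, not the example repository's aggregator: `RiskRollUpSketch`, `ancestors` and `rollUp` are hypothetical names, a single full path per position replaces the `hier_path` / `sub_hier_path` split of the schema, only one risk variable is handled, and the top-level location (e.g. `Frankfurt`) is included as a level by assumption.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class RiskRollUpSketch {
    // "A/B/C/D" -> ["A/B/C", "A/B", "A"]: every level above the given path.
    static List<String> ancestors(String path) {
        List<String> levels = new ArrayList<>();
        int idx = path.lastIndexOf('/');
        while (idx > 0) {
            path = path.substring(0, idx);
            levels.add(path);
            idx = path.lastIndexOf('/');
        }
        return levels;
    }

    // Sums the position deltas into every ancestor level of each position.
    static Map<String, BigDecimal> rollUp(Map<String, BigDecimal> deltasByPosition) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        for (Map.Entry<String, BigDecimal> entry : deltasByPosition.entrySet()) {
            for (String level : ancestors(entry.getKey())) {
                totals.merge(level, entry.getValue(), BigDecimal::add);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Deltas taken from the slide example: S1 = -3, S2 = 2, S3 = -1.
        Map<String, BigDecimal> deltas = new LinkedHashMap<>();
        deltas.put("Frankfurt/FX/desk10/trader4/position1", new BigDecimal("-3"));
        deltas.put("Frankfurt/FX/desk10/trader4/position2", new BigDecimal("2"));
        deltas.put("Frankfurt/FX/desk10/trader4/position3", new BigDecimal("-1"));
        for (Map.Entry<String, BigDecimal> e : rollUp(deltas).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

In the real service each total would be written back with the insert statement shown on slide 40, one row per hierarchy level and risk variable.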
  • 42. Comments —  Simple way to maintain an up-to-date risk sensitivity on an ongoing basis, based on previous data —  Will mean (N hierarchies) * (N variables) queries are executed periodically (keep an eye on this) —  Cursors, blocking queues and bounded collections help us achieve the same result without reading entire rows —  Has other applications, such as roll-ups for stream data, provided you have a reasonably low cardinality in terms of (time resolution) * (number of variables).
  • 43. —  Thanks Patrick Callaghan for the hard work coding the examples! — Questions?