Cassandra Java APIs Old and New – A Comparison
1. Cassandra Java APIs
Old and New – A Comparison
Shahryar Sedghi
Toronto Cassandra User Group
Sep. 18, 2013
2. #TCUG 2
Who am I?
www.linkedin.com/pub/shahryar-sedghi/1/439/420
@ shahryar.sedghi@parseix.com
Founder at
www.parseix.com
• Did some work on IBM hierarchical databases (IMS DB / DOS DL1) in the late 70s and early 80s
• Worked extensively on IBM’s first (the world’s first) relational database (SQL/DS) in the early 80s
• Have worked with Oracle and DB2 for years (not as a DBA)
• Started working on Cassandra in late 2011 (1.0.5)
@parseix
3. #TCUG 3
Disclaimer
• The code samples used here, except for Astyanax (taken directly from the website), have each worked once in some release of Cassandra. Only the (modified) JDBC driver and the new Java Driver have been tested with Cassandra 1.2
4. #TCUG 4
Agenda
• What does a Java API for Cassandra need?
• A basic introduction to Cassandra data model
• Thrift
• Thrift based APIs
• Binary Protocol
• DATASTAX new Java API
5. #TCUG 5
A Java Database API
• Typically used in Java Application Servers
– Thread Safe
– Connection Pooling
• When used with Cassandra
– Tolerates database Machine/Network failure
– Load balancing
– Reconnects to the failed machine when it's back
• Together they should provide a highly
available environment for Web apps without
an expensive HA investment
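The requirements listed on this slide (thread safety, load balancing, tolerating a failed machine and using it again when it is back) can be sketched with plain JDK classes. This is only an illustration under stated assumptions: the class `RoundRobinHosts` and its methods are hypothetical names, not part of Hector, Astyanax, JDBC or the DATASTAX driver, and real clients detect failures and recoveries themselves instead of being told.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not taken from any Cassandra client library.
class RoundRobinHosts {
    private final List<String> hosts;
    private final Set<String> down = ConcurrentHashMap.newKeySet(); // thread-safe set
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        this.hosts = new ArrayList<>(hosts);
    }

    // Load balancing and failover: return the next host in round-robin
    // order, skipping every host currently marked as down.
    String pick() {
        for (int i = 0; i < hosts.size(); i++) {
            int index = Math.floorMod(next.getAndIncrement(), hosts.size());
            String host = hosts.get(index);
            if (!down.contains(host)) {
                return host;
            }
        }
        throw new IllegalStateException("no live hosts");
    }

    // Called when a request to the host fails (machine or network failure).
    void markDown(String host) {
        down.add(host);
    }

    // Called when a reconnection attempt succeeds.
    void markUp(String host) {
        down.remove(host);
    }

    public static void main(String[] args) {
        RoundRobinHosts lb = new RoundRobinHosts(Arrays.asList("10.0.0.1", "10.0.0.2"));
        lb.markDown("10.0.0.2");
        System.out.println(lb.pick()); // only 10.0.0.1 is live
    }
}
```

A web application would call `pick()` per request from many threads, which is why the state is kept in concurrent collections.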
6. #TCUG 6
Cassandra Data Model at a Glance
[Diagram: rows of a column family, each shown as a row key followed by its column name / column value pairs. Row keys shown: B, A, D, K, C]
B: B1 Value11 | B2 Value12 | B3 Value13 | B4 Value14
A: A1 Value21 | A2 Value22 | A3 Value23
D: D1 Value51 | D2 Value52 | D3 Value53 | D4 Value54 | D5 Value55
C: C1 Value61 | C2 Value62
Super Column (Deprecated): D51 Value551 | D52 Value552 | D53 Value553
• Row key (e.g. B): by default (best practice) rows are not sorted by key; they are sorted by the hash of the key
• All columns of one row reside on one node
• Column name (e.g. B1): 2 billion distinct column names can be in one row
• Columns are sorted by column name (ascending or descending)
• Column value (e.g. Value11): it can be null, and it can be a different type for each column in each row. E.g. A1 can be an Integer and D1 can be a String
• If all 1s, all 2s, all 3s, ... (e.g. A1, B1, C1) column values carry the same data type, the column family can be used like a relational table with CQL 2: better scalability and less functionality, but not the best use of Cassandra
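The model on this slide can be pictured as a map of maps. The sketch below is a toy in-memory stand-in, not how Cassandra stores data: `ToyColumnFamily` and its methods are hypothetical names, MD5 is used only as an example hash for ordering rows, and hash collisions are ignored.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model for illustration only; all names here are hypothetical.
class ToyColumnFamily {
    // Rows are ordered by the hash (token) of the row key, not by the key.
    // Inside a row, columns are ordered by column name.
    private final SortedMap<BigInteger, SortedMap<String, Object>> rows = new TreeMap<>();

    // Stand-in for a partitioner: MD5 of the key as a non-negative integer.
    static BigInteger token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(rowKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // A value may be null and may have a different type in every column.
    void put(String rowKey, String columnName, Object value) {
        rows.computeIfAbsent(token(rowKey), t -> new TreeMap<>()).put(columnName, value);
    }

    // Returns the columns of one row, sorted by column name.
    SortedMap<String, Object> row(String rowKey) {
        return rows.getOrDefault(token(rowKey), new TreeMap<>());
    }

    public static void main(String[] args) {
        ToyColumnFamily cf = new ToyColumnFamily();
        cf.put("A", "A2", "Value22");
        cf.put("A", "A1", 21);       // an Integer value
        cf.put("D", "D1", "Value51"); // a String value
        System.out.println(cf.row("A").keySet()); // column names in sorted order
    }
}
```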
7. #TCUG 7
Data Model - Composite Columns
• We would like to model the following data structure:
{departmentId Integer, employeeId Integer, firstName String, lastName String}
Row 122 (departmentId 122; employeeId 11, 12 and 13):
11:firstName | 11:lastName | 12:firstName | 12:lastName | 13:firstName | 13:lastName
Row 225 (departmentId 225; employeeId 17 and 19):
17:firstName | 17:lastName | 19:firstName | 19:lastName
• CQL3
create table department(
departmentid int,
employeeid int,
firstname text,
lastname text,
PRIMARY KEY (departmentid ,
employeeid)
);
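• departmentId is called the partition key
• employeeId is called the clustering key
• Logical row: the columns of one employee, as seen through CQL3
• Physical row: all columns stored under one departmentId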
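To make the logical/physical distinction concrete, here is a small sketch that stores each employee under composite column names such as `11:firstName` inside one row per department. It is a plain-Java illustration under stated assumptions: `CompositeColumnSketch` and its methods are hypothetical, and the composite name is a simple string, so ordering is lexicographic here, whereas Cassandra compares each component by its own type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration; not Cassandra's storage code.
class CompositeColumnSketch {
    // partition key (departmentId) -> physical row of "employeeId:field" -> value
    private final Map<Integer, SortedMap<String, String>> partitions = new LinkedHashMap<>();

    // One CQL-style insert becomes two columns in the department's physical row.
    void insert(int departmentId, int employeeId, String firstName, String lastName) {
        SortedMap<String, String> physicalRow =
                partitions.computeIfAbsent(departmentId, k -> new TreeMap<>());
        physicalRow.put(employeeId + ":firstName", firstName);
        physicalRow.put(employeeId + ":lastName", lastName);
    }

    // All columns stored under one partition key.
    SortedMap<String, String> physicalRow(int departmentId) {
        return partitions.getOrDefault(departmentId, new TreeMap<>());
    }

    // One logical row: the columns whose composite name starts with "employeeId:".
    Map<String, String> logicalRow(int departmentId, int employeeId) {
        String prefix = employeeId + ":";
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : physicalRow(departmentId).entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                result.put(e.getKey().substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CompositeColumnSketch table = new CompositeColumnSketch();
        table.insert(122, 11, "John", "Doe");
        table.insert(122, 12, "Jane", "Roe");
        System.out.println(table.physicalRow(122).keySet());
        System.out.println(table.logicalRow(122, 12));
    }
}
```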
8. #TCUG 8
Thrift
• An Apache Project
• Yet another RPC (Remote Procedure Call) framework
• Has an IDL (Interface Definition Language) like other RPCs
• Language Neutral
• Easier than many others to use
• Good fit for early releases of Cassandra to support all sorts of clients
– Apparently not every client works as well as Java and Python
• Is RPC a good fit for database interaction? Yes and no
• Cassandra's Thrift interface listens on port 9160 by default
9. #TCUG 9
Thrift Importance for Cassandra
• All clients, except the new DATASTAX drivers for Java and .NET, use Thrift underneath
– Including Hector, JDBC and Astyanax
• Supports
– Ring Discovery
– Native access to Cassandra
– CQL 2
– CQL 3
• JDBC and Astyanax may move to native driver in the
future
10. #TCUG 10
Thrift Example: Ring Discovery
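import java.util.List;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.*;
import org.apache.thrift.transport.*;

// Connect to one node over the framed Thrift transport
TTransport transport = new TFramedTransport(new TSocket("192.168.1.14", 9160));
TProtocol protocol = new TBinaryProtocol(transport);
Cassandra.Client client = new Cassandra.Client(protocol);
transport.open();
// Ask that node for the token ranges of keyspace "mydb"
List<TokenRange> trList = client.describe_ring("mydb");
TokenRange tr = trList.get(0); // only the first range is printed here
for (String endpoint : tr.getEndpoints()) {
    System.out.println(endpoint);
}
transport.close();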
11. #TCUG 11
Thrift Example: Get All Row Keys
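import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import org.apache.cassandra.thrift.*;

ColumnParent columnParent = new ColumnParent("xyz");
// Here you can specify a slice; this one asks for at most one column per row
SlicePredicate predicate = new SlicePredicate();
predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]),
        ByteBuffer.wrap(new byte[0]), false, 1));
// Get all keys, or set a range (KeyRange returns at most 100 rows unless count is changed)
KeyRange keyRange = new KeyRange();
keyRange.setStart_key(new byte[0]); // empty start and end keys select the full key range
keyRange.setEnd_key(new byte[0]);
List<KeySlice> keySlices = client.get_range_slices(columnParent, predicate,
        keyRange, ConsistencyLevel.ONE); // or null in this case
List<Integer> list = new ArrayList<Integer>();
for (KeySlice ks : keySlices) {
    int key = ByteBuffer.wrap(ks.getKey()).getInt(); // row keys are 4-byte integers here
    list.add(key);
    System.out.println(key);
}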
12. #TCUG 12
Hector
• The most commonly used Java API for Cassandra
• Uses Thrift underneath
• Among its other features:
– Connection pooling
– Ring discovery and automatic failover
– Automatic retry of downed hosts
– Automatic discovery of additional hosts in the cluster
– Suspension of hosts for a short period of time after several timeouts
14. #TCUG 14
Astyanax
• Developed by Netflix
• Supports all Hector functions and is much easier to use
• Much better connection pool and failover than Hector
• More than an API for Cassandra
– Provides some database functionality at the API level, called
Recipes
• Parallel all rows query
• Message Queue
• Chunked Object Store
• many more
• Utilities
– JSON Writer, CSV Importer
• At Cassandra Summit 2013, Netflix announced plans to move to the binary protocol
15. #TCUG 15
Astyanax Example: Pagination
ColumnList<String> columns;
int pageSize = 10;
try {
    RowQuery<String, String> query = keyspace
            .prepareQuery(CF_STANDARD1)
            .getKey("A")
            .setIsPaginating()
            .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());
    // Each execute() call returns the next page; an empty page ends the loop
    while (!(columns = query.execute().getResult()).isEmpty()) {
        for (Column<String> c : columns) {
            // do something like c.getStringValue()
        }
    }
} catch (ConnectionException e) {
    // handle or log the connection failure
}
16. #TCUG 16
JDBC (Java Database Connectivity)
• Standard Java Database API
• Only supports CQL to access Cassandra
• Current Cassandra JDBC driver is a shallow
implementation of JDBC on top of Thrift
• URL is like:
– jdbc:cassandra://192.168.1.5:9160?version=3.0.0
• All Java Application Servers support connection
pooling for JDBC
• No database failover or Cassandra cluster support
• Helps to convert relational database apps to Cassandra
17. #TCUG 17
JDBC Example: Insert
• This code can run in a Servlet or an “EJB”!!! with some minor
modification
• Nothing in this code points to Cassandra or Thrift classes
• insertQuery for CQL is not always as simple as this
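import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.sql.DataSource;

// Look up the container-managed DataSource registered as "jdbc/cassandra"
Context envCtx = (Context) new InitialContext().lookup("java:comp/env");
DataSource datasource = (DataSource) envCtx.lookup("jdbc/cassandra");
Connection cqlCon = datasource.getConnection();
String insertQuery = "INSERT INTO department(departmentid, employeeid, firstname, lastname) "
        + "VALUES (?, ?, ?, ?)"; // one placeholder per column
PreparedStatement statement = cqlCon.prepareStatement(insertQuery);
statement.setInt(1, 122);
statement.setInt(2, 11);
statement.setString(3, "John");
statement.setString(4, "Doe");
statement.executeUpdate(); // run the insert
statement.close();
cqlCon.close(); // hands the connection back to the pool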
18. #TCUG 18
Cassandra Binary Protocol
• Inherently asynchronous
– Can be used synchronously as well
• Frame and stream based
– Many requests, each with a different stream ID, can be sent asynchronously
– A set of frames coming from the server belongs to the same stream
• Certain events are pushed from the server
– Topology change
– Status Change
– Schema change
• Because of the asynchronous nature, can easily be integrated
with new technologies like WebSockets and Servlet 3.0, 3.1
• Listens on port 9042
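The stream ID idea above can be sketched without any networking: each request is tagged with an ID, and a response frame carrying the same ID completes the matching request, whatever order responses arrive in. This is a simplified sketch, not the protocol implementation. `StreamMultiplexer`, `InFlight` and `onFrame` are hypothetical names, frame bodies are plain strings, IDs are assumed to lie in 0..127, and reuse of an ID that is still in flight is not handled.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of request/response matching by stream ID.
class StreamMultiplexer {
    // A request that has been sent and is waiting for its response frame.
    static final class InFlight {
        final int streamId;
        final CompletableFuture<String> response = new CompletableFuture<>();

        InFlight(int streamId) {
            this.streamId = streamId;
        }
    }

    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final AtomicInteger nextStreamId = new AtomicInteger();

    // Tags the request with a stream ID and remembers its future.
    // A real client would now write a frame carrying streamId to the socket.
    InFlight send(String request) {
        InFlight inFlight = new InFlight(Math.floorMod(nextStreamId.getAndIncrement(), 128));
        pending.put(inFlight.streamId, inFlight.response);
        return inFlight;
    }

    // Called by the reader thread for every response frame from the server.
    void onFrame(int streamId, String body) {
        CompletableFuture<String> future = pending.remove(streamId);
        if (future != null) {
            future.complete(body);
        }
    }

    public static void main(String[] args) {
        StreamMultiplexer mux = new StreamMultiplexer();
        InFlight first = mux.send("select ...");
        InFlight second = mux.send("select ...");
        mux.onFrame(second.streamId, "rows for second"); // responses may arrive out of order
        mux.onFrame(first.streamId, "rows for first");
        System.out.println(first.response.join());
    }
}
```

Because the caller holds a future rather than a blocked thread, the same connection can serve many outstanding requests, which is what makes the protocol a natural fit for asynchronous containers.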
19. #TCUG 19
DATASTAX Java Driver
• Implements the Binary Protocol client side
• Similar to JDBC but easier in certain areas
– Specific to Cassandra, not portable
• Supports CQL, with plans to support OO and DB APIs
• Supports
– Query Builder (who wants this?)
– Node Discovery
– Connection pooling
– Reconnection policies
– Load balancing policies
– Retry policies
• Cursor support announced during Cassandra Summit
2013
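As an illustration of what a reconnection policy decides, the sketch below computes how long to wait before each new attempt to reach a downed node, doubling the delay up to a cap. It is a generic exponential backoff written for this purpose, not the driver's own class: `BackoffSchedule`, `nextDelayMs` and `reset` are hypothetical names.

```java
// Hypothetical sketch of an exponential reconnection schedule.
class BackoffSchedule {
    private final long baseDelayMs;
    private final long maxDelayMs;
    private int failedAttempts;

    BackoffSchedule(long baseDelayMs, long maxDelayMs) {
        if (baseDelayMs <= 0 || maxDelayMs < baseDelayMs) {
            throw new IllegalArgumentException("need 0 < baseDelayMs <= maxDelayMs");
        }
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Delay before the next reconnection attempt: base, 2*base, 4*base, ...
    // capped at maxDelayMs.
    long nextDelayMs() {
        long delay = baseDelayMs;
        for (int i = 0; i < failedAttempts && delay < maxDelayMs; i++) {
            // Doubling past the cap is avoided so the value cannot overflow.
            delay = delay > maxDelayMs / 2 ? maxDelayMs : delay * 2;
        }
        if (failedAttempts < 63) {
            failedAttempts++;
        }
        return delay;
    }

    // Called once the node answers again.
    void reset() {
        failedAttempts = 0;
    }

    public static void main(String[] args) {
        BackoffSchedule schedule = new BackoffSchedule(1000, 10000);
        for (int i = 0; i < 6; i++) {
            System.out.println(schedule.nextDelayMs());
        }
    }
}
```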
21. #TCUG 21
DATASTAX Java Driver Example: Select
import com.datastax.driver.core.*;

String selectQuery = "select * from department where departmentid = ?";
PreparedStatement statement = session.prepare(selectQuery);
statement.setConsistencyLevel(ConsistencyLevel.ONE);
BoundStatement query = statement.bind(122);
// You can call session.executeAsync(query) here and get a Future instead
ResultSet result = session.execute(query);
for (Row row : result) {
    System.out.println(row.getInt("employeeid"));
    System.out.println(row.getString("firstname"));
    System.out.println(row.getString("lastname"));
}