CEDRICK
Good Afternoon
We are going to talk to you about software development and ways to speed up your applications.
But before starting I would like to ask you a few questions:
(1) Who is already familiar with the Java programming language?
Ok, so YOU are allowed to stay in the room.
Who said Java is no longer the most popular language? It is.
Full disclosure: during this presentation you will see some Java code, and you will probably want to try it afterward.
Ok so Question 2 :
(2) Who is currently using, or has used, the DataStax Java driver for Apache Cassandra, OSS or DSE? Never mind.
Oh, so many? Alex, we are probably giving this talk at the right conference then.
Good. That is what we expected.
(3) And the final question, the hardest: who has already tested the new generation of drivers released this year?
Not so many. Again, not a surprise: this is rather new.
You are in the right room: we will give our presentation with the new driver and you will see the differences.
CLICK
CEDRICK
I am Cedrick Lunven, Developer Advocate at DataStax. My job is to produce tons of content to help you succeed with your Apache Cassandra projects:
Academy.datastax.com: video trainings, blog posts, Getting Started guides and all sorts of tutorials
Videos: weekly DDS videos and recordings of all the meetups we run
Twitch: a channel with streams and live coding every week
Code: reference applications and sample code
And I am with Alexandre Dutra.
ALEXANDRE
I am Alexandre Dutra, Technical Manager & Software Engineer, working for DataStax since 2015. I am one of the main contributors to the DataStax Java driver for Apache Cassandra and DSE, and I am also the author of the new reactive API introduced in the latest release of the DSE driver this year. I am a big fan of reactive programming and have been using it for a while now, not only in the driver but also in other products that we develop at DataStax, such as the DataStax Bulk Loader.
For now I will let Cedrick set the scene, and I will come back for the demos.
CEDRICK
In this session we will work with a sample service, a sample API.
We will first present it in a synchronous processing model.
From there, we will move to the asynchronous processing model, and then to reactive mode.
Let’s get started !
So we need a sample use case for our API fitting Apache Cassandra strengths.
Imagine you had to explain why Cassandra is so cool to someone who has been working with relational databases their whole life, and you only get two minutes. What would you say?
As an advocate this is part of my job, so here is what I like to say:
It is a distributed database: multiple nodes, there is no real point in using only one.
Data is distributed among those nodes but also replicated: that means you can lose some nodes without losing data. By the way, you can lose any node, because there is no master.
As a rule of thumb, count roughly 1 TB of data per node and 3,000 transactions per core; if you need more volume, add more nodes, and if you need more throughput, add more nodes.
So, what are the use cases for Cassandra?
Data resiliency
Global distribution: read/write anywhere, and it is replicated
Huge volumes with still real-time queries
Very high throughput
And this is a …CLICK
CEDRICK
We chose a time-based event series, aka a time series, and this for two main reasons:
One :
The use case fits Apache Cassandra very well: high throughput, heavy writes, lots of volume, a need for scalability.
Two
Alex: (joking) It’s because we are always using the same sample!
Me : No ! Come on.
Two
Because the data model is simple yet interesting.
<CLICK>
As you can imagine, we define the partition key as the source of events, to evenly distribute events among the nodes.
<CLICK>
We defined valueDate as a clustering column, to search easily and draw some charts.
<CLICK>
We specified DESC to get the last items first; those are the ones we want to display.
<CLICK>
No bucketing, so let’s define a TTL of 24 hours to avoid overly large partitions.
<CLICK>
And a time series should use TWCS to limit the number of SSTables.
<CLICK>
Because, just have a look at what we are trying to draw.
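Put together, the table described on this slide might look like the following CQL. The table and column names here are our own illustration (only valueDate and the source partition key come from the slide); the point is the descending clustering order, the 24-hour default TTL, and TWCS:

```sql
-- Hypothetical time-series table: partition by event source,
-- cluster by valueDate descending, expire rows after 24 hours,
-- and compact with TimeWindowCompactionStrategy.
CREATE TABLE IF NOT EXISTS tick_data (
    source     text,
    valueDate  timestamp,
    value      double,
    PRIMARY KEY ((source), valueDate)
) WITH CLUSTERING ORDER BY (valueDate DESC)
  AND default_time_to_live = 86400
  AND compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'HOURS',
    'compaction_window_size': '1'
  };
```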
ALEXANDRE
But let’s keep things simple and stupid: KISS.
ALEXANDRE
A few logos:
As we already said, we are using Java, so why not use the latest Java 12?
Services are implemented and connected with Spring
Everything is wrapped into a Spring boot 2.1 application
Services are exposed as REST with Spring MVC
Did you see our gray hair and beards? We do Java;
we are serious people and do not play with the teenager language JavaScript.
CEDRICK
So this is the sequence diagram we will show again and again to understand what happens under the hood.
The client can be a UI or any system invoking our service.
API is our interface, the way we expose the service
Driver represents the Java Driver
API and Driver compose your backend application runtime, the real stuff, not JavaScript for teenagers.
And the eye, well, I am pretty sure you can guess.
During application start, the first operation executed is the connection.
<CLICK>
And, still in the initialization phase, we prepare the statements.
<CLICK>
Those operations are, and will remain, synchronous every time.
We won’t show that to you again in the next scenarios.
But what about the real CRUD operations? Let’s start with the mutations.
<CLICK>
CEDRICK
Create and Delete operations follow the same pattern; they just don’t produce the same return codes, HTTP codes:
201 for the first and 204 for the other.
<CLICK>
I am decoding here. The client sends parameters to the API,
which will validate them.
Joking: YOU are responsible for the parameters, no excuses. A NullPointerException is YOUR fault.
<CLICK>
The API will use the params to create a Statement.
Those are bound together at the driver level to create a BoundStatement.
<CLICK>
Execution is triggered and you get a status, but no records: this is CREATE and DELETE.
Then you propagate the status up to the client, success or exception.
<CLICK>
Again, we are waiting like hell
This is it for the synchronous mutations, what about the queries ?
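The jump from the synchronous model to the asynchronous one can be sketched with plain JDK futures; the new-generation driver returns a CompletionStage from executeAsync in the same spirit. Everything below (class name, the fake mutation) is illustrative, not driver API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class SyncVsAsync {

    // Stand-in for a remote mutation (e.g. an INSERT executed by the driver).
    static String executeMutation(String id) {
        try {
            TimeUnit.MILLISECONDS.sleep(50); // pretend network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "201"; // created
    }

    public static void main(String[] args) {
        // Synchronous: the calling thread blocks until the result is back.
        String syncStatus = executeMutation("tick-1");

        // Asynchronous: the call returns immediately with a future;
        // the caller is free to do other work and joins only when needed.
        CompletableFuture<String> asyncStatus =
            CompletableFuture.supplyAsync(() -> executeMutation("tick-2"));

        System.out.println(syncStatus + " " + asyncStatus.join());
    }
}
```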
ALEXANDRE
Exactly the same thing as before.
In the most general sense, reactive programming is a programming paradigm that allows you to create responsive and resilient systems for processing data. These ideas have been summarized in the Reactive Manifesto. The manifesto stresses 4 key characteristics:
Responsiveness: The system always responds in a timely manner.
Resilience: The system stays responsive in the face of failure.
Elasticity: The system stays responsive under varying workload and adapts itself to variable ingestion rates.
Message Driven: the system relies on asynchronous messaging to achieve loose coupling and isolation.
But usually when we refer to Reactive Programming we refer to the Reactive Streams initiative.
This is a working group of people that created an API that allows you to build reactive systems in many programming languages, including Java.
This initiative provides you with a specification, an API and a TCK (Technology Compatibility Kit). The TCK allows you to certify that your implementation is compliant with the API requirements.
The API defines a few roles like the Publisher (the guy who emits data), the Subscriber (the guy who receives data) and the Subscription which defines the contract between a publisher and a subscriber.
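Since Java 9, the JDK ships these exact roles in java.util.concurrent.Flow. Here is a minimal, self-contained sketch of the Publisher/Subscriber/Subscription contract; the class names are ours, and SubmissionPublisher is the JDK's stock publisher, not the driver's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class RolesDemo {

    // The Subscriber receives a Subscription, then uses it to request items.
    static class CollectingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            this.subscription = s;
            s.request(1);            // the contract: ask before you receive
        }
        @Override public void onNext(Integer item) {
            received.add(item);
            subscription.request(1); // pull one item at a time
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    public static void main(String[] args) throws InterruptedException {
        CollectingSubscriber sub = new CollectingSubscriber();
        // The Publisher emits data; closing it signals onComplete.
        try (SubmissionPublisher<Integer> pub = new SubmissionPublisher<>()) {
            pub.subscribe(sub);
            for (int i = 1; i <= 3; i++) pub.submit(i);
        }
        sub.done.await();
        System.out.println(sub.received); // [1, 2, 3]
    }
}
```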
ALEXANDRE
A few logos:
As we already said, we are using Java, so why not use the latest Java 12?
Services are implemented and connected with Spring
Everything is wrapped into a Spring boot 2.1 application
Services are exposed as REST
NEW: Spring Web Flux + Reactor
NEW: Driver reactive API
Those operations are, and will remain, synchronous.
Do them at application launch.
Let’s talk a bit about backpressure. Backpressure is a key notion in Reactive Streams: it’s a mechanism that allows the publisher and the subscriber to agree on a throughput acceptable to both, in order to avoid overwhelming the system with more messages than it can handle.
The driver is capable of communicating backpressure between your application and the remote DSE server in some situations only, namely when the server acts as a publisher and your application acts as a subscriber; in other words, when you are reading data from the server. In this case, the driver will only fetch more results if the subscriber is ready to process them.
The opposite situation is trickier: if you are writing data to DSE, then your application is acting as a publisher, and it’s your responsibility to regulate your throughput in order to avoid overwhelming the cluster. This is because the Cassandra protocol does not allow the server to communicate backpressure to a client. So if you don’t regulate your data ingestion rate, you risk getting back an OverloadedException.
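One simple way to regulate the write path is to cap the number of in-flight requests with a semaphore. The sketch below uses only JDK primitives; the fake writeAsync and the limit of 4 are our own illustration, not driver API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class WriteThrottling {

    // Track how many fake writes are in flight, and the maximum observed.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Stand-in for an async write to the cluster.
    static CompletableFuture<Void> writeAsync(int row) {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        return CompletableFuture.runAsync(() -> { /* pretend I/O */ })
            .whenComplete((v, t) -> inFlight.decrementAndGet());
    }

    public static void main(String[] args) throws InterruptedException {
        // Allow at most 4 concurrent writes: client-side "backpressure".
        Semaphore permits = new Semaphore(4);
        CompletableFuture<?>[] futures = new CompletableFuture<?>[100];
        for (int i = 0; i < 100; i++) {
            permits.acquire();  // block here if 4 writes are already in flight
            futures[i] = writeAsync(i)
                .whenComplete((v, t) -> permits.release());
        }
        CompletableFuture.allOf(futures).join();
        // Prints a value between 1 and 4, never more.
        System.out.println("max in flight: " + maxInFlight.get());
    }
}
```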
https://issues.apache.org/jira/browse/CASSANDRA-7937
Apply backpressure gently when overloaded with writes
https://issues.apache.org/jira/browse/CASSANDRA-10993
Make read and write requests paths fully non-blocking, eliminate related stages
(TPC for OSS Cassandra), slated for 4.x
Hard to debug, but there are tools to help; see the chapter on debugging in the Reactor reference: https://projectreactor.io/docs/core/snapshot/reference/#debugging