Slides from Lenses session at Redis Conf 19
The Rise of DataOps on Streaming data, Lenses as a DataOps platform with SQL on Redis and Kafka.
Gain visibility and unlock your data scientists.
1. The Rise of DataOps - SQL on Redis
Andrew Stevenson
Lenses, CTO
2. Speaker – Andrew Stevenson
CTO at lenses.io
C++, Data Warehousing, Big/Fast Data
Always realtime
Clearing & Settlement
HFT
Investment Banking
Energy
Netherlands
lenses.io
DevOps
We all know what DevOps is about, at least I hope you do. It's about developer and operations practices coming together to ship products faster, creating a conveyor belt that delivers software by combining both disciplines: we get CI/CD, monitoring, logging, metrics and better testing, and so improved software quality.
But the important point here is that it's tech focused: developers and operations.
DataOps sits at a higher level. We heard from Thomas from Google this morning that the higher the abstraction, the more value you add.
Every company I know is trying to be data driven. We have data scientists, data engineers, business analysts and data warehouses; the protagonist is data.
At Lenses we see three pillars forming DataOps: streaming flows (think Redis Gears here, though we focus on real-time data); data governance, meaning auditing and security, and there have been some dubious data ethics at companies recently; and data visibility.
OK, so what does a DataOps platform look like?
Well, we have data sources, usually lots of them, some streaming, some not, anything from flat files to stock exchange feeds.
We have data storage, typically more than one kind: S3 for cold storage, an RDBMS, a KV store. It's varied, because you have different access patterns and different needs.
You also have some form of data transport, ideally a distributed log that supports high throughput and low latency with ordering guarantees, something like Redis Streams, Kafka or Pulsar.
You also need processing, to transform and manipulate the data, and ideally somewhere to run it, like Kubernetes.
You need monitoring.
And you need visibility – we just talked about how important that is to enabling data-driven organisations.
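As a sketch of those transport semantics (a toy model for illustration, not any particular product's API), a distributed log is just an ordered, append-only sequence that consumers read from an offset:

```python
class AppendOnlyLog:
    """Toy model of a distributed log partition: records are appended
    in order and each one gets a monotonically increasing offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Returns the offset assigned to the record, like a produce ack.
        self._records.append(record)
        return len(self._records) - 1

    def read(self, from_offset=0):
        # Consumers track their own position and can replay from any offset.
        return self._records[from_offset:]


log = AppendOnlyLog()
for payload in ["trade-1", "trade-2", "trade-3"]:
    log.append(payload)

print(log.read(from_offset=1))  # ['trade-2', 'trade-3']
```

The ordering guarantee falls out of the structure itself: offsets are assigned on append, so every consumer sees the same sequence.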
This is where it becomes interesting: say you want to move from Kafka to Redis Streams; ideally you'd prefer not to rewrite your application landscape.
So our weapon of choice to power Lenses is SQL.
SQL is everywhere in data: SQL on this, SQL on that. And why is that?
Nearly everybody knows some SQL, and let's not forget that big or fast data was around before the recent fad. Many big data teams come from a data warehousing background; that was my personal journey.
It has its flaws. For example, syntax varies from vendor to vendor, and not everything can be done via SQL; you wouldn't write a machine learning algorithm in SQL.
I've used SQL successfully for all sorts of things, from simple ETL loads to real-time trade reconciliation, value-at-risk reporting and trade analysis, and at scale.
Onto the Lenses SQL Engine.
It has three main components. You see four, but the Connect query isn't actually part of Lenses yet.
The table query, which is like querying a database; the continuous query, which is like tailing a file; and the SQL processors, which are like the T in ETL.
So the two query APIs we have implemented for Redis are the table query and the continuous query. This slide shows an example of each.
We have a stream of events being appended at the bottom, 1 to 12, with new events arriving all the time. The messages have a schema which contains a currency field, and we want all records with currency GBP.
If we query using the Table API we get all the messages, from the start until the time of the query, that match the predicate where currency equals GBP; this returns events 1, 3 and 5.
With the Continuous Query API we get the new events arriving after the query start time that match the predicate: events 7, 8, 9, 11, 12 and so on as new events arrive.
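The two query modes can be sketched in a few lines of plain Python (a model of the semantics only, using the event numbers from the slide; this is not the Lenses implementation):

```python
# Events 1-12 from the slide: the GBP records are 1, 3, 5, 7, 8, 9, 11, 12.
history = [
    {"offset": 1, "currency": "GBP"}, {"offset": 2, "currency": "EUR"},
    {"offset": 3, "currency": "GBP"}, {"offset": 4, "currency": "USD"},
    {"offset": 5, "currency": "GBP"}, {"offset": 6, "currency": "EUR"},
]
arrivals = [
    {"offset": 7, "currency": "GBP"}, {"offset": 8, "currency": "GBP"},
    {"offset": 9, "currency": "GBP"}, {"offset": 10, "currency": "EUR"},
    {"offset": 11, "currency": "GBP"}, {"offset": 12, "currency": "GBP"},
]

def table_query(log, predicate):
    """Table query: scan everything from the start up to query time."""
    return [e["offset"] for e in log if predicate(e)]

def continuous_query(stream, predicate):
    """Continuous query: yield matching events as they arrive after start."""
    for e in stream:
        if predicate(e):
            yield e["offset"]

is_gbp = lambda e: e["currency"] == "GBP"
print(table_query(history, is_gbp))                    # [1, 3, 5]
print(list(continuous_query(iter(arrivals), is_gbp)))  # [7, 8, 9, 11, 12]
```

Same predicate, two very different shapes: the table query is bounded and returns, while the continuous query is a generator that keeps producing for as long as events arrive.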
So how does it work? At a high level, each request against the websocket endpoint spins up an Akka Streams flow from the SQL received, which then streams the data back to the client.
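A rough model of that request lifecycle, with illustrative names rather than the actual Lenses internals: each request carries SQL text, a flow is built from it, and matching records are streamed back one at a time.

```python
import asyncio

EVENTS = [
    {"id": 1, "currency": "GBP"}, {"id": 2, "currency": "EUR"},
    {"id": 3, "currency": "GBP"}, {"id": 4, "currency": "USD"},
]

def build_flow(sql: str):
    """Stand-in for the SQL-to-flow step: here we only 'parse' a
    hard-coded WHERE currency = '...' filter out of the query text."""
    wanted = sql.split("=")[-1].strip().strip("'")

    async def flow():
        for event in EVENTS:
            if event["currency"] == wanted:
                yield event             # stream each match back to the client
                await asyncio.sleep(0)  # hand control back, as a real flow would
    return flow()

async def handle_request(sql: str):
    # One flow per websocket request; results stream until exhausted.
    return [e["id"] async for e in build_flow(sql)]

print(asyncio.run(handle_request("SELECT * FROM trades WHERE currency = 'GBP'")))
# [1, 3]
```

The point of the per-request flow is isolation: each client gets its own pipeline with its own position in the stream, rather than sharing a cursor.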
I don't say that SQL is config lightly.
SQL can be version controlled, and if we are using a container orchestrator like Kubernetes, we have just one Docker image to manage: we inject the SQL and deploy it in our CI/CD.
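A hypothetical sketch of what "SQL as config" can look like on Kubernetes (all names, labels and the image are illustrative, not the actual Lenses deployment): the SQL lives in a version-controlled ConfigMap and is injected into a single, generic processor image.

```yaml
# Hypothetical example: names and the image are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gbp-trades-sql
data:
  processor.sql: |
    SELECT *
    FROM trades
    WHERE currency = 'GBP'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gbp-trades
spec:
  replicas: 1
  selector:
    matchLabels: { app: gbp-trades }
  template:
    metadata:
      labels: { app: gbp-trades }
    spec:
      containers:
        - name: sql-processor
          image: example/sql-processor:latest   # one image for every processor
          env:
            - name: SQL                         # only the SQL changes per flow
              valueFrom:
                configMapKeyRef:
                  name: gbp-trades-sql
                  key: processor.sql
```

Changing the flow then becomes a ConfigMap change in Git, reviewed and rolled out through the same CI/CD pipeline as any other code.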