SlideShare a Scribd company logo
1 of 60
1Confidential
KSQL
An Open Source Streaming SQL Engine for Apache Kafka
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.confluent.io
www.kai-waehner.de
2KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
3KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
4KSQL- Streaming SQL for Apache Kafka
Apache Kafka - A Distributed, Scalable Commit Log
5KSQL- Streaming SQL for Apache Kafka
The Log ConnectorsConnectors
Producer Consumer
Streaming Engine
Apache Kafka – The Rise of a Streaming Platform
6KSQL- Streaming SQL for Apache Kafka
Apache Kafka – The Rise of a Streaming Platform
7KSQL- Streaming SQL for Apache Kafka
KSQL – A Streaming SQL Engine for Apache Kafka
8KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
9KSQL- Streaming SQL for Apache Kafka
Why KSQL?
Population
CodingSophistication
Realm of Stream Processing
New, Expanded Realm
BI
Analysts
Core
Developers
Data
Engineers
Core Developers
who don’t like
Java
10KSQL- Streaming SQL for Apache Kafka
Trade-Offs
• subscribe()
• poll()
• send()
• flush()
• mapValues()
• filter()
• punctuate()
• Select…from…
• Join…where…
• Group by..
Flexibility Simplicity
Kafka Streams KSQL
Consumer
Producer
11KSQL- Streaming SQL for Apache Kafka
What is it for ?
Streaming ETL
• Kafka is popular for data pipelines
• KSQL enables easy transformations of data within the pipe
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
12KSQL- Streaming SQL for Apache Kafka
What is it for ?
Simple Derivations of Existing Topics
• One-liner to re-partition and / or re-key a topic for new uses
CREATE STREAM views_by_userid
WITH (PARTITIONS=6,
VALUE_FORMAT=‘JSON’,
TIMESTAMP=‘view_time’) AS
SELECT *
FROM clickstream
PARTITION BY user_id;
13KSQL- Streaming SQL for Apache Kafka
What is it for ?
Analytics, e.g. Anomaly Detection
• Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY card_number
HAVING count(*) > 3;
14KSQL- Streaming SQL for Apache Kafka
What is it for ?
Real Time Monitoring
• Log data monitoring, tracking and alerting
• Sensor / IoT data
CREATE TABLE error_counts AS
SELECT error_code, count(*)
FROM monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE type = 'ERROR'
GROUP BY error_code;
15KSQL- Streaming SQL for Apache Kafka
Where is KSQL not such a great fit (at least today)?
Powerful ad-hoc query
○ Limited span of time usually
retained in Kafka
○ No indexes
BI reports (Tableau etc.)
○ No indexes
○ No JDBC (most Bi tools are not
good with continuous results!)
16KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
17KSQL- Streaming SQL for Apache Kafka
KSQL – A Streaming SQL Engine for Apache Kafka
18KSQL- Streaming SQL for Apache Kafka
KSQL Concepts
● No need for source code
• Zero, none at all, not even one line.
• No SerDes, no generics, no lambdas, ...
● All the Kafka Streams “magic” out-of-the-box
• Exactly Once Semantics
• Windowing
• Event-time aggregation
• Late-arriving data
• Distributed, fault-tolerant, scalable, ...
19KSQL- Streaming SQL for Apache Kafka
STREAM and TABLE as first-class citizens
20KSQL- Streaming SQL for Apache Kafka
CREATE STREAM AS syntax
CREATE STREAM `stream_name`
[WITH (`property = expression` [, …] ) ]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WHERE `condition` ]
[ PARTITION BY `column_name` ]
● where property can be any of the following:
KAFKA_TOPIC = name - what to call the sink topic
FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream
AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to
PARTITIONS = # - number of partitions in sink topic
TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the
event time.
21KSQL- Streaming SQL for Apache Kafka
CREATE TABLE AS syntax
CREATE TABLE `stream_name`
[WITH ( `property_name = expression` [, ...] )]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
● where property values are same as for ‚Create Streams as Select‘
22KSQL- Streaming SQL for Apache Kafka
SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]
where from_item is one of the following:
stream_or_table_name [ [ AS ] alias]
from_item LEFT JOIN from_item ON join_condition
23KSQL- Streaming SQL for Apache Kafka
WINDOWing
● Not ANSI SQL ! à Continuous Queries
● Three types supported (same as Kafka Streams):
• TUMBLING (= SLIDING)
• SELECT appname, ip, COUNT(appname) AS problem_count FROM
logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR'
GROUP BY appname, ip;
• HOPPING
• SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING (
size 20 second, advance by 5 second) GROUP BY itemid;
• SESSION
• SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20
second) GROUP BY itemid;
24KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
25KSQL- Streaming SQL for Apache Kafka
Create a STREAM and a TABLE from Kafka Topics
ksql> CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews',
value_format='DELIMITED');
ksql> CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar) WITH
(kafka_topic='users', value_format='JSON');
ksql> SELECT pageid FROM pageviews_original LIMIT 3;
ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM
pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE';
26KSQL- Streaming SQL for Apache Kafka
Live Demo – KSQL Hello World
27KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
28KSQL- Streaming SQL for Apache Kafka
KSQL - Components
KSQL has 3 main components:
1. The CLI, designed to be familiar to users of MySQL, Postgres etc.
2. The Engine which actually runs the Kafka Streams topologies
3. The REST server interface enables an Engine to receive instructions from the CLI
(Note that you also need a Kafka Cluster… KSQL is deployed independently)
29KSQL- Streaming SQL for Apache Kafka
Kafka Cluster
JVM
KSQL EngineRESTKSQL>
#1 STAND-ALONE AKA ‘LOCAL MODE’
30KSQL- Streaming SQL for Apache Kafka
#1 STAND-ALONE AKA ‘LOCAL MODE’
Starts a CLI, an Engine,
and a REST server all
in the same JVM
Ideal for laptop development
• Start with default settings:
• > bin/ksql-cli local
Or with customized settings:
• > bin/ksql-cli local –-properties-file foo/bar/ksql.properties
31KSQL- Streaming SQL for Apache Kafka
#2 CLIENT-SERVER
Kafka Cluster
JVM
KSQL Engine
REST
KSQL>
JVM
KSQL Engine
REST
JVM
KSQL Engine
REST
32KSQL- Streaming SQL for Apache Kafka
#2 CLIENT-SERVER
Start any number
of Server nodes
• > bin/ksql-server-start
Start any number of CLIs and
specify ‘remote’ server address
• >bin/ksql-cli remote http://myserver:8090
All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up / down without restart
33KSQL- Streaming SQL for Apache Kafka
#3 AS PRE-DEFINED APP
Kafka Cluster
JVM
KSQL Engine
JVM
KSQL Engine
JVM
KSQL Engine
34KSQL- Streaming SQL for Apache Kafka
#3 AS PRE-DEFINED APP
Running the KSQL server
with a pre-defined set of
instructions/queries
• Version control your queries and
transformations as code
Start any number of Engine
instances
• Pass a file of KSQL statements to execute
• > bin/ksql-node query-file=foo/bar.sql
All running Engines share the processing load
• Technically, instances of the same Kafka Streams Applications
• Scale up/down without restart
35KSQL- Streaming SQL for Apache Kafka
Dedicating resources
36KSQL- Streaming SQL for Apache Kafka
How do you deploy applications?
37KSQL- Streaming SQL for Apache Kafka
Where to develop and operate your applications?
38KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
39KSQL- Streaming SQL for Apache Kafka
Demo: Clickstream Analysis
Kafka
Producer
Elastic
search
Grafana
Kafka
Cluster
Kafka
Connect
KSQL
Stream of
Log Events
40KSQL- Streaming SQL for Apache Kafka
Demo: Clickstream Analysis
• https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo#clickstream-analysis
• Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana
• 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I
• Setup in 5 minutes (with or without Docker)
SELECT STREAM
CEIL(timestamp TO HOUR) AS timeWindow, productId,
COUNT(*) AS hourlyOrders, SUM(units) AS units
FROM Orders GROUP BY CEIL(timestamp TO HOUR),
productId;
timeWindow | productId | hourlyOrders | units
------------+-----------+--------------+-------
08:00:00 | 10 | 2 | 5
08:00:00 | 20 | 1 | 8
09:00:00 | 10 | 4 | 22
09:00:00 | 40 | 1 | 45
... | ... | ... | ...
41KSQL- Streaming SQL for Apache Kafka
Live Demo – KSQL Clickstream Analysis
42KSQL- Streaming SQL for Apache Kafka
Agenda
1) Apache Kafka Ecosystem
2) Motivation for KSQL
3) KSQL Concepts
4) Live Demo
5) KSQL Architecture
6) Use Case: Clickstream Analysis
7) Getting Started
43KSQL- Streaming SQL for Apache Kafka
KSQL Quick Start
github.com/confluentinc/ksql
Local runtime
or
Docker container
44KSQL- Streaming SQL for Apache Kafka
Remember: Developer Preview!
Caveats of Developer Preview
• No ORDER BY yet
• No Stream-stream joins yet
• Limited function library
• Avro support only via workaround
• Breaking API / Syntax changes still possible
BE EXCITED, BUT BE ADVISED
45KSQL- Streaming SQL for Apache Kafka
Resources and Next Steps
Get Involved
• Try the Quickstart on GitHub
• Check out the code
• Play with the examples
The point of a developer preview is to improve things—together!
https://github.com/confluentinc/ksql
http://confluent.io/ksql
https://slackpass.io/confluentcommunity #ksql
46KSQL- Streaming SQL for Apache Kafka
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.confluent.io
www.kai-waehner.de
LinkedIn
Questions? Feedback?
Please contact me…
Come to our booth…
Come to Kafka Summit London in April 2018…
47KSQL- Streaming SQL for Apache Kafka
Appendix
48KSQL- Streaming SQL for Apache Kafka
KSQL Concepts
● STREAM and TABLE as first-class citizens
● Interpretations of topic content
● STREAM - data in motion
● TABLE - collected state of a stream
• One record per key (per window)
• Current values (compacted topic) not yet
• Changelog
● STREAM – TABLE Joins
49KSQL- Streaming SQL for Apache Kafka
Schema & Format
● A Kafka broker knows how to move bytes
• Technically a key-value message (byte[], byte[])
● To enable declarative SQL-like queries and transformations we have to define
a richer structure
● Structural metadata maintained in an in-memory catalog
• DDL is recorded in a special topic
50KSQL- Streaming SQL for Apache Kafka
Schema & Format
Start with message (value) format
● JSON - the simplest choice
● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in.
Will be expanded.
● AVRO - requires that you also supply a schema-file (.avsc)
Pseudo-columns are automatically provided
• ROWKEY, ROWTIME - for querying the message key and timestamp
• (PARTITION, OFFSET coming soon)
• CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH
(value_format = 'delimited', kafka_topic='my_pageview_topic');
51KSQL- Streaming SQL for Apache Kafka
Schema & Datatypes
● varchar / string
● boolean / bool
● integer / int
● bigint / long
● double
● array(of_type) - of-type must be primitive (no nested Array or Map yet)
● map(key_type, value_type) - key-type must be string, value-type must be
primitive
52KSQL- Streaming SQL for Apache Kafka
Interactive Querying
● Great for iterative development
● LIST (or SHOW) STREAMS / TABLES
● DESCRIBE STREAM / TABLE
● SELECT
• Selects rows from a KSQL stream or table.
• The result of this statement will be printed out in the console.
• To stop the continuous query in the CLI press Ctrl+C.
53KSQL- Streaming SQL for Apache Kafka
SELECT statement syntax
SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
[ LIMIT n ]
where from_item is one of the following:
stream_or_table_name [ [ AS ] alias]
from_item LEFT JOIN from_item ON join_condition
54KSQL- Streaming SQL for Apache Kafka
WINDOWing
● Not ANSI SQL ! à Continuous Queries :-)
● Three types supported (same as KStreams):
• TUMBLING (= SLIDING)
• SELECT appname, ip, COUNT(appname) AS problem_count FROM
logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR'
GROUP BY appname, ip;
• HOPPING
• SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING (
size 20 second, advance by 5 second) GROUP BY itemid;
• SESSION
• SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20
second) GROUP BY itemid;
55KSQL- Streaming SQL for Apache Kafka
CREATE STREAM AS SELECT
● Once your query is ready and you want to run your query non-interactively
• CREATE STREAM AS SELECT ...;
● Creates a new KSQL Stream along with the corresponding Kafka topic and
streams the result of the SELECT query into the topic
● To find what streams are already running:
• SHOW QUERIES;
● If you need to stop one:
• TERMINATE query_id;
56KSQL- Streaming SQL for Apache Kafka
CREATE STREAM AS syntax
CREATE STREAM `stream_name`
[WITH (`property = expression` [, …] ) ]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WHERE `condition` ]
[ PARTITION BY `column_name` ]
● where property can be any of the following:
KAFKA_TOPIC = name - what to call the sink topic
FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream
AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to
PARTITIONS = # - number of partitions in sink topic
TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the
event time.
57KSQL- Streaming SQL for Apache Kafka
CREATE TABLE AS SELECT
● Once your query is ready and you want to run it non-interactively
● CREATE TABLE AS SELECT ...;
● Just like ‚CREATE STREAM AS SELECT‘ but for aggregations
58KSQL- Streaming SQL for Apache Kafka
CREATE TABLE AS syntax
CREATE TABLE `stream_name`
[WITH ( `property_name = expression` [, ...] )]
AS SELECT `select_expr` [, ...]
FROM `from_item` [, ...]
[ WINDOW `window_expression` ]
[ WHERE `condition` ]
[ GROUP BY `grouping expression` ]
[ HAVING `having_expression` ]
● where property values are same as for ‚Create Streams as Select‘
59KSQL- Streaming SQL for Apache Kafka
Functions
● Scalar Functions:
• CONCAT, IFNULL, LCASE, LEN, SUBSTRING,TRIM, UCASE
• ABS, CEIL, FLOOR, RANDOM, ROUND
• StringToTimestamp, TimestampToString
• GetStringFromJSON
• CAST
● Aggregate Functions:
• SUM, COUNT, MIN, MAX
● User- defined Functions:
• Java Interface
60KSQL- Streaming SQL for Apache Kafka
Session Variables
● Just as in MySQL, ORCL etc. there are settings to control how your CLI
behaves
● Set any property the KStreams consumers/producers will understand
● Defaults can be set in the ksql.properties file
● To see a list of currently set or default variable values:
• ksql> show properties;
● Useful examples:
• num.stream.threads=4
• commit.interval.ms=1000
• cache.max.bytes.buffering=2000000
● TIP! - Your new best friend for testing or building a demo is:
• ksql> set ‘auto.offset.reset’ = ‘earliest’;

More Related Content

What's hot

Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022HostedbyConfluent
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafkaconfluent
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKai Wähner
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environmentconfluent
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...confluent
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams APIconfluent
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafkaconfluent
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overviewconfluent
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryJean-Paul Azar
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!confluent
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication confluent
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per SecondAmazon Web Services
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Jean-Paul Azar
 

What's hot (20)

Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
Schema Registry 101 with Bill Bejeck | Kafka Summit London 2022
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
KSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for KafkaKSQL: Streaming SQL for Kafka
KSQL: Streaming SQL for Kafka
 
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache KafkaKSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
 
Apache Kafka® Security Overview
Apache Kafka® Security OverviewApache Kafka® Security Overview
Apache Kafka® Security Overview
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
 

Similar to KSQL – An Open Source Streaming Engine for Apache Kafka

KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...Kai Wähner
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Codemotion
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLFlorent Ramiere
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Paolo Castagna
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLconfluent
 
Real Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaReal Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaDavid Peterson
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!Guido Schmutz
 
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019confluent
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020Maheedhar Gunturu
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLScyllaDB
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Matt Stubbs
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaMatthias J. Sax
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLconfluent
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsLightbend
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLNick Dearden
 
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...confluent
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafkaconfluent
 

Similar to KSQL – An Open Source Streaming Engine for Apache Kafka (20)

KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
KSQL – The Open Source SQL Streaming Engine for Apache Kafka (Big Data Spain ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
Kai Waehner - KSQL – The Open Source SQL Streaming Engine for Apache Kafka - ...
 
Riviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQLRiviera Jug - 20/03/2018 - KSQL
Riviera Jug - 20/03/2018 - KSQL
 
Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!Un'introduzione a Kafka Streams e KSQL... and why they matter!
Un'introduzione a Kafka Streams e KSQL... and why they matter!
 
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQLKafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
Kafka Summit SF 2017 - Kafka Stream Processing for Everyone with KSQL
 
Real Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and KafkaReal Time Stream Processing with KSQL and Kafka
Real Time Stream Processing with KSQL and Kafka
 
KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!KSQL - Stream Processing simplified!
KSQL - Stream Processing simplified!
 
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
KSQL in Practice (Almog Gavra, Confluent) Kafka Summit London 2019
 
Event streaming webinar feb 2020
Event streaming webinar feb 2020Event streaming webinar feb 2020
Event streaming webinar feb 2020
 
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQLBuilding a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
Building a Real-time Streaming ETL Framework Using ksqlDB and NoSQL
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
Big Data LDN 2018: STREAMING DATA MICROSERVICES WITH AKKA STREAMS, KAFKA STRE...
 
KSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache KafkaKSQL---Streaming SQL for Apache Kafka
KSQL---Streaming SQL for Apache Kafka
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Streaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka StreamsStreaming Microservices With Akka Streams And Kafka Streams
Streaming Microservices With Akka Streams And Kafka Streams
 
Streaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQLStreaming ETL with Apache Kafka and KSQL
Streaming ETL with Apache Kafka and KSQL
 
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
Building Stream Processing Applications with Apache Kafka Using KSQL (Robin M...
 
Real-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache KafkaReal-Time Stream Processing with KSQL and Apache Kafka
Real-Time Stream Processing with KSQL and Apache Kafka
 

More from Kai Wähner

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Kai Wähner
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?Kai Wähner
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKai Wähner
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaKai Wähner
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareKai Wähner
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Kai Wähner
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureKai Wähner
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Kai Wähner
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryKai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryKai Wähner
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail IndustryKai Wähner
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKai Wähner
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Kai Wähner
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingKai Wähner
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022Kai Wähner
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesKai Wähner
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Kai Wähner
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Kai Wähner
 

More from Kai Wähner (20)

Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
Apache Kafka as Data Hub for Crypto, NFT, Metaverse (Beyond the Buzz!)
 
When NOT to use Apache Kafka?
When NOT to use Apache Kafka?When NOT to use Apache Kafka?
When NOT to use Apache Kafka?
 
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping MetaverseKafka for Live Commerce to Transform the Retail and Shopping Metaverse
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse
 
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache KafkaThe Heart of the Data Mesh Beats in Real-Time with Apache Kafka
The Heart of the Data Mesh Beats in Real-Time with Apache Kafka
 
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform MiddlewareApache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
Apache Kafka vs. Cloud-native iPaaS Integration Platform Middleware
 
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
 
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse ArchitectureServerless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
Serverless Kafka and Spark in a Multi-Cloud Lakehouse Architecture
 
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
Resilient Real-time Data Streaming across the Edge and Hybrid Cloud with Apac...
 
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity IndustryData Streaming with Apache Kafka in the Defence and Cybersecurity Industry
Data Streaming with Apache Kafka in the Defence and Cybersecurity Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare IndustryApache Kafka in the Healthcare Industry
Apache Kafka in the Healthcare Industry
 
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
Apache Kafka for Real-time Supply Chainin the Food and Retail IndustryApache Kafka for Real-time Supply Chainin the Food and Retail Industry
Apache Kafka for Real-time Supply Chain in the Food and Retail Industry
 
Kafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid CloudKafka for Real-Time Replication between Edge and Hybrid Cloud
Kafka for Real-Time Replication between Edge and Hybrid Cloud
 
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
 
Apache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and ManufacturingApache Kafka Landscape for Automotive and Manufacturing
Apache Kafka Landscape for Automotive and Manufacturing
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022The Top 5 Apache Kafka Use Cases and Architectures in 2022
The Top 5 Apache Kafka Use Cases and Architectures in 2022
 
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka ArchitecturesEvent Streaming CTO Roundtable for Cloud-native Kafka Architectures
Event Streaming CTO Roundtable for Cloud-native Kafka Architectures
 
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
Apache Kafka in the Public Sector (Government, National Security, Citizen Ser...
 
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
Telco 4.0 - Payment and FinServ Integration for Data in Motion with 5G and Ap...
 

Recently uploaded

What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfFIDO Alliance
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfUK Journal
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimaginedpanagenda
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 

Recently uploaded (20)

What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdfBreaking Down the Flutterwave Scandal What You Need to Know.pdf
Breaking Down the Flutterwave Scandal What You Need to Know.pdf
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 

KSQL – An Open Source Streaming Engine for Apache Kafka

  • 1. 1Confidential KSQL An Open Source Streaming SQL Engine for Apache Kafka Kai Waehner Technology Evangelist kontakt@kai-waehner.de LinkedIn @KaiWaehner www.confluent.io www.kai-waehner.de
  • 2. 2KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 3. 3KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 4. 4KSQL- Streaming SQL for Apache Kafka Apache Kafka - A Distributed, Scalable Commit Log
  • 5. 5KSQL- Streaming SQL for Apache Kafka The Log ConnectorsConnectors Producer Consumer Streaming Engine Apache Kafka – The Rise of a Streaming Platform
  • 6. 6KSQL- Streaming SQL for Apache Kafka Apache Kafka – The Rise of a Streaming Platform
  • 7. 7KSQL- Streaming SQL for Apache Kafka KSQL – A Streaming SQL Engine for Apache Kafka
  • 8. 8KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 9. 9KSQL- Streaming SQL for Apache Kafka Why KSQL? Population CodingSophistication Realm of Stream Processing New, Expanded Realm BI Analysts Core Developers Data Engineers Core Developers who don’t like Java
  • 10. 10KSQL- Streaming SQL for Apache Kafka Trade-Offs • subscribe() • poll() • send() • flush() • mapValues() • filter() • punctuate() • Select…from… • Join…where… • Group by.. Flexibility Simplicity Kafka Streams KSQL Consumer Producer
  • 11. 11KSQL- Streaming SQL for Apache Kafka What is it for ? Streaming ETL • Kafka is popular for data pipelines • KSQL enables easy transformations of data within the pipe CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.user_id WHERE u.level = 'Platinum';
  • 12. 12KSQL- Streaming SQL for Apache Kafka What is it for ? Simple Derivations of Existing Topics • One-liner to re-partition and / or re-key a topic for new uses CREATE STREAM views_by_userid WITH (PARTITIONS=6, VALUE_FORMAT=‘JSON’, TIMESTAMP=‘view_time’) AS SELECT * FROM clickstream PARTITION BY user_id;
  • 13. 13KSQL- Streaming SQL for Apache Kafka What is it for ? Analytics, e.g. Anomaly Detection • Identifying patterns or anomalies in real-time data, surfaced in milliseconds CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 MINUTES) GROUP BY card_number HAVING count(*) > 3;
  • 14. 14KSQL- Streaming SQL for Apache Kafka What is it for ? Real Time Monitoring • Log data monitoring, tracking and alerting • Sensor / IoT data CREATE TABLE error_counts AS SELECT error_code, count(*) FROM monitoring_stream WINDOW TUMBLING (SIZE 1 MINUTE) WHERE type = 'ERROR' GROUP BY error_code;
  • 15. 15KSQL- Streaming SQL for Apache Kafka Where is KSQL not such a great fit (at least today)? Powerful ad-hoc query ○ Limited span of time usually retained in Kafka ○ No indexes BI reports (Tableau etc.) ○ No indexes ○ No JDBC (most Bi tools are not good with continuous results!)
  • 16. 16KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 17. 17KSQL- Streaming SQL for Apache Kafka KSQL – A Streaming SQL Engine for Apache Kafka
  • 18. 18KSQL- Streaming SQL for Apache Kafka KSQL Concepts ● No need for source code • Zero, none at all, not even one line. • No SerDes, no generics, no lambdas, ... ● All the Kafka Streams “magic” out-of-the-box • Exactly Once Semantics • Windowing • Event-time aggregation • Late-arriving data • Distributed, fault-tolerant, scalable, ...
  • 19. 19KSQL- Streaming SQL for Apache Kafka STREAM and TABLE as first-class citizens
  • 20. 20KSQL- Streaming SQL for Apache Kafka CREATE STREAM AS syntax CREATE STREAM `stream_name` [WITH (`property = expression` [, …] ) ] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WHERE `condition` ] [ PARTITION BY `column_name` ] ● where property can be any of the following: KAFKA_TOPIC = name - what to call the sink topic FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to PARTITIONS = # - number of partitions in sink topic TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the event time.
  • 21. 21KSQL- Streaming SQL for Apache Kafka CREATE TABLE AS syntax CREATE TABLE `stream_name` [WITH ( `property_name = expression` [, ...] )] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] ● where property values are same as for ‚Create Streams as Select‘
  • 22. 22KSQL- Streaming SQL for Apache Kafka SELECT statement syntax SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] [ LIMIT n ] where from_item is one of the following: stream_or_table_name [ [ AS ] alias] from_item LEFT JOIN from_item ON join_condition
  • 23. 23KSQL- Streaming SQL for Apache Kafka WINDOWing ● Not ANSI SQL ! à Continuous Queries ● Three types supported (same as Kafka Streams): • TUMBLING (= SLIDING) • SELECT appname, ip, COUNT(appname) AS problem_count FROM logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR' GROUP BY appname, ip; • HOPPING • SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY itemid; • SESSION • SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20 second) GROUP BY itemid;
  • 24. 24KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 25. 25KSQL- Streaming SQL for Apache Kafka Create a STREAM and a TABLE from Kafka Topics ksql> CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews', value_format='DELIMITED'); ksql> CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar) WITH (kafka_topic='users', value_format='JSON'); ksql> SELECT pageid FROM pageviews_original LIMIT 3; ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE';
  • 26. 26KSQL- Streaming SQL for Apache Kafka Live Demo – KSQL Hello World
  • 27. 27KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 28. 28KSQL- Streaming SQL for Apache Kafka KSQL - Components KSQL has 3 main components: 1. The CLI, designed to be familiar to users of MySQL, Postgres etc. 2. The Engine which actually runs the Kafka Streams topologies 3. The REST server interface enables an Engine to receive instructions from the CLI (Note that you also need a Kafka Cluster… KSQL is deployed independently)
  • 29. 29KSQL- Streaming SQL for Apache Kafka Kafka Cluster JVM KSQL EngineRESTKSQL> #1 STAND-ALONE AKA ‘LOCAL MODE’
  • 30. 30KSQL- Streaming SQL for Apache Kafka #1 STAND-ALONE AKA ‘LOCAL MODE’ Starts a CLI, an Engine, and a REST server all in the same JVM Ideal for laptop development • Start with default settings: • > bin/ksql-cli local Or with customized settings: • > bin/ksql-cli local –-properties-file foo/bar/ksql.properties
  • 31. 31KSQL- Streaming SQL for Apache Kafka #2 CLIENT-SERVER Kafka Cluster JVM KSQL Engine REST KSQL> JVM KSQL Engine REST JVM KSQL Engine REST
  • 32. 32KSQL- Streaming SQL for Apache Kafka #2 CLIENT-SERVER Start any number of Server nodes • > bin/ksql-server-start Start any number of CLIs and specify ‘remote’ server address • >bin/ksql-cli remote http://myserver:8090 All running Engines share the processing load • Technically, instances of the same Kafka Streams Applications • Scale up / down without restart
  • 33. 33KSQL- Streaming SQL for Apache Kafka #3 AS PRE-DEFINED APP Kafka Cluster JVM KSQL Engine JVM KSQL Engine JVM KSQL Engine
  • 34. 34KSQL- Streaming SQL for Apache Kafka #3 AS PRE-DEFINED APP Running the KSQL server with a pre-defined set of instructions/queries • Version control your queries and transformations as code Start any number of Engine instances • Pass a file of KSQL statements to execute • > bin/ksql-node query-file=foo/bar.sql All running Engines share the processing load • Technically, instances of the same Kafka Streams Applications • Scale up/down without restart
  • 35. 35KSQL- Streaming SQL for Apache Kafka Dedicating resources
  • 36. 36KSQL- Streaming SQL for Apache Kafka How do you deploy applications?
  • 37. 37KSQL- Streaming SQL for Apache Kafka Where to develop and operate your applications?
  • 38. 38KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 39. 39KSQL- Streaming SQL for Apache Kafka Demo: Clickstream Analysis Kafka Producer Elastic search Grafana Kafka Cluster Kafka Connect KSQL Stream of Log Events
  • 40. 40KSQL- Streaming SQL for Apache Kafka Demo: Clickstream Analysis • https://github.com/confluentinc/ksql/tree/0.1.x/ksql-clickstream-demo#clickstream-analysis • Leverages Apache Kafka, Kafka Connect, KSQL, Elasticsearch and Grafana • 5min screencast: https://www.youtube.com/watch?v=A45uRzJiv7I • Setup in 5 minutes (with or without Docker) SELECT STREAM CEIL(timestamp TO HOUR) AS timeWindow, productId, COUNT(*) AS hourlyOrders, SUM(units) AS units FROM Orders GROUP BY CEIL(timestamp TO HOUR), productId; timeWindow | productId | hourlyOrders | units ------------+-----------+--------------+------- 08:00:00 | 10 | 2 | 5 08:00:00 | 20 | 1 | 8 09:00:00 | 10 | 4 | 22 09:00:00 | 40 | 1 | 45 ... | ... | ... | ...
  • 41. 41KSQL- Streaming SQL for Apache Kafka Live Demo – KSQL Clickstream Analysis
  • 42. 42KSQL- Streaming SQL for Apache Kafka Agenda 1) Apache Kafka Ecosystem 2) Motivation for KSQL 3) KSQL Concepts 4) Live Demo 5) KSQL Architecture 6) Use Case: Clickstream Analysis 7) Getting Started
  • 43. 43KSQL- Streaming SQL for Apache Kafka KSQL Quick Start github.com/confluentinc/ksql Local runtime or Docker container
  • 44. 44KSQL- Streaming SQL for Apache Kafka Remember: Developer Preview! Caveats of Developer Preview • No ORDER BY yet • No Stream-stream joins yet • Limited function library • Avro support only via workaround • Breaking API / Syntax changes still possible BE EXCITED, BUT BE ADVISED
  • 45. 45KSQL- Streaming SQL for Apache Kafka Resources and Next Steps Get Involved • Try the Quickstart on GitHub • Check out the code • Play with the examples The point of a developer preview is to improve things—together! https://github.com/confluentinc/ksql http://confluent.io/ksql https://slackpass.io/confluentcommunity #ksql
  • 46. 46KSQL- Streaming SQL for Apache Kafka Kai Waehner Technology Evangelist kontakt@kai-waehner.de @KaiWaehner www.confluent.io www.kai-waehner.de LinkedIn Questions? Feedback? Please contact me… Come to our booth… Come to Kafka Summit London in April 2018…
  • 47. 47KSQL- Streaming SQL for Apache Kafka Appendix
  • 48. 48KSQL- Streaming SQL for Apache Kafka KSQL Concepts ● STREAM and TABLE as first-class citizens ● Interpretations of topic content ● STREAM - data in motion ● TABLE - collected state of a stream • One record per key (per window) • Current values (compacted topic) not yet • Changelog ● STREAM – TABLE Joins
  • 49. 49KSQL- Streaming SQL for Apache Kafka Schema & Format ● A Kafka broker knows how to move bytes • Technically a key-value message (byte[], byte[]) ● To enable declarative SQL-like queries and transformations we have to define a richer structure ● Structural metadata maintained in an in-memory catalog • DDL is recorded in a special topic
  • 50. 50KSQL- Streaming SQL for Apache Kafka Schema & Format Start with message (value) format ● JSON - the simplest choice ● DELIMITED - in this preview, the implicit delimiter is a comma and the escaping rules are built-in. Will be expanded. ● AVRO - requires that you also supply a schema-file (.avsc) Pseudo-columns are automatically provided • ROWKEY, ROWTIME - for querying the message key and timestamp • (PARTITION, OFFSET coming soon) • CREATE STREAM pageview (viewtime bigint, userid varchar, pageid varchar) WITH (value_format = 'delimited', kafka_topic='my_pageview_topic');
  • 51. 51KSQL- Streaming SQL for Apache Kafka Schema & Datatypes ● varchar / string ● boolean / bool ● integer / int ● bigint / long ● double ● array(of_type) - of-type must be primitive (no nested Array or Map yet) ● map(key_type, value_type) - key-type must be string, value-type must be primitive
  • 52. 52KSQL- Streaming SQL for Apache Kafka Interactive Querying ● Great for iterative development ● LIST (or SHOW) STREAMS / TABLES ● DESCRIBE STREAM / TABLE ● SELECT • Selects rows from a KSQL stream or table. • The result of this statement will be printed out in the console. • To stop the continuous query in the CLI press Ctrl+C.
  • 53. 53KSQL- Streaming SQL for Apache Kafka SELECT statement syntax SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] [ LIMIT n ] where from_item is one of the following: stream_or_table_name [ [ AS ] alias] from_item LEFT JOIN from_item ON join_condition
  • 54. 54KSQL- Streaming SQL for Apache Kafka WINDOWing ● Not ANSI SQL ! à Continuous Queries :-) ● Three types supported (same as KStreams): • TUMBLING (= SLIDING) • SELECT appname, ip, COUNT(appname) AS problem_count FROM logstream WINDOW TUMBLING (size 1 minute) WHERE loglevel='ERROR' GROUP BY appname, ip; • HOPPING • SELECT itemid, SUM(arraycol[0]) FROM orders WINDOW HOPPING ( size 20 second, advance by 5 second) GROUP BY itemid; • SESSION • SELECT itemid, SUM(sales_price) FROM orders WINDOW SESSION (20 second) GROUP BY itemid;
  • 55. 55KSQL- Streaming SQL for Apache Kafka CREATE STREAM AS SELECT ● Once your query is ready and you want to run your query non-interactively • CREATE STREAM AS SELECT ...; ● Creates a new KSQL Stream along with the corresponding Kafka topic and streams the result of the SELECT query into the topic ● To find what streams are already running: • SHOW QUERIES; ● If you need to stop one: • TERMINATE query_id;
  • 56. 56KSQL- Streaming SQL for Apache Kafka CREATE STREAM AS syntax CREATE STREAM `stream_name` [WITH (`property = expression` [, …] ) ] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WHERE `condition` ] [ PARTITION BY `column_name` ] ● where property can be any of the following: KAFKA_TOPIC = name - what to call the sink topic FORMAT = DELIMITED | JSON | AVRO - defaults to format of input stream AVROSCHEMAFILE = path/to/file - if FORMAT=AVRO, where the output schema file will be written to PARTITIONS = # - number of partitions in sink topic TIMESTAMP = column - The name of the column to use as the timestamp. This can be used to define the event time.
  • 57. 57KSQL- Streaming SQL for Apache Kafka CREATE TABLE AS SELECT ● Once your query is ready and you want to run it non-interactively ● CREATE TABLE AS SELECT ...; ● Just like ‚CREATE STREAM AS SELECT‘ but for aggregations
  • 58. 58KSQL- Streaming SQL for Apache Kafka CREATE TABLE AS syntax CREATE TABLE `stream_name` [WITH ( `property_name = expression` [, ...] )] AS SELECT `select_expr` [, ...] FROM `from_item` [, ...] [ WINDOW `window_expression` ] [ WHERE `condition` ] [ GROUP BY `grouping expression` ] [ HAVING `having_expression` ] ● where property values are same as for ‚Create Streams as Select‘
  • 59. 59KSQL- Streaming SQL for Apache Kafka Functions ● Scalar Functions: • CONCAT, IFNULL, LCASE, LEN, SUBSTRING,TRIM, UCASE • ABS, CEIL, FLOOR, RANDOM, ROUND • StringToTimestamp, TimestampToString • GetStringFromJSON • CAST ● Aggregate Functions: • SUM, COUNT, MIN, MAX ● User- defined Functions: • Java Interface
  • 60. 60KSQL- Streaming SQL for Apache Kafka Session Variables ● Just as in MySQL, ORCL etc. there are settings to control how your CLI behaves ● Set any property the KStreams consumers/producers will understand ● Defaults can be set in the ksql.properties file ● To see a list of currently set or default variable values: • ksql> show properties; ● Useful examples: • num.stream.threads=4 • commit.interval.ms=1000 • cache.max.bytes.buffering=2000000 ● TIP! - Your new best friend for testing or building a demo is: • ksql> set ‘auto.offset.reset’ = ‘earliest’;