Workshop Series:
ksqlDB
2021-10-20
Presenters: Jupil Hwang, Hyunsoo Kim
Time: 14:00 – 17:00
2
Agenda — ksqlDB
3
01 02:00 - 02:10 PM
02 02:10 - 02:30 PM  Talk: Kafka, Kafka Streams & ksqlDB
03 02:30 - 02:45 PM  Lab:
04 02:45 - 03:00 PM  Lab:
05 03:00 - 05:00 PM  Lab: Hands on
4
• Q&A
• If you have questions, please send them through Q&A. The speakers will answer them directly after the presentation.
• Online survey
• Please share your feedback on today's workshop; we will use it to prepare better content in the future.
• The survey link is (1) shared in the Zoom chat window and (2) opened automatically in your web browser after the event.
Workshop Housekeeping
Confluent Platform & Cloud:
[Diagram: traditional point-to-point integration: apps, transactional and analytics databases, DWH, MOM, ETL, and EAI/ESB connected by ad-hoc data flows (pub/sub and point-to-point)]
[Diagram: the same point-to-point landscape, now with NoSQL DBs and Big Data Analytics added; the integration sprawl keeps growing]
A streaming platform gives every person and system in the organization a single source of truth for data.
[Diagram: apps, transactional and analytics databases, DWH, NoSQL DBs, and Big Data Analytics all connected through a central Streaming Platform]
More than 80% of the Fortune 100 use Apache Kafka.
Confluent was founded by the original creators of Apache Kafka at LinkedIn.
• Producers and consumers are decoupled
Event Streaming Platform
[Diagram: data from data stores, device logs, 3rd-party apps, SaaS apps, Amazon S3, and custom apps/microservices (domains such as core banking, loans, credit cards, patient records, lending) flows through a data-in-motion pipeline into data-in-motion applications: real-time inventory, real-time fraud detection, real-time customer 360, machine learning models, real-time data transformation, and more]
Streaming Platform:
A sale, a shipment, a trade, a customer experience, …and more
11
12
Event Stream Processing
What’s stream processing good for?
13
Materialized cache / view
Streaming ETL pipeline (source → sink)
Event-driven microservice
Confluent Platform Conceptual Architecture
14
OSS
Apache Kafka
Data
Sink
POJO /
MicroServices
Data
Sink
OSS Apache Kafka® covers messaging and data integration/ETL.
POJO /
MicroServices
Streams
Apps
Source
Connector
Data
Source
Sink
Connector
ksqlDB
Schema
Registry
Confluent Platform Conceptual Architecture
15
Confluent Platform
(Apache Kafka)
Enterprise
Security
ksqlDB
Replicator
Machine
Learning
Data
Sink
Data
Source
Schema
Registry
Control
Center
Source
Connector
Sink
Connector
Micro
Services
Mobile
Devices
Car/IoT
MQTT
Proxy
REST
Proxy
Sensor
Data
Sink
Confluent Platform
(Apache Kafka)
Confluent Platform adds Connect, Replicator, ksqlDB, and REST/MQTT Proxy around the Kafka cluster.
Streams
Apps
Confluent
Hall of Innovation
CTO Innovation
Award Winner
2019
Enterprise Technology
Innovation
AWARDS
Vision
● Kafka
● Event streaming
Category Leadership
● Kafka commits 80%
● 1 Kafka
● 5000 Kafka
Value
● Risk
●
● TCO
● Time-to-market
Product
● Kafka
● Software
Cloud-Native Service
16
Confluent: Enterprise Apache Kafka
17
Deploy anywhere: cloud, on-prem, hybrid, or multi-cloud
Data integration – Connect
Stream processing applications – KStreams, ksqlDB
Confluent
18
Open Source | Community licensed
Fully Managed Cloud Service
Self-managed Software
Training Partners
Enterprise
Support
Professional
Services
ARCHITECT
OPERATOR
DEVELOPER EXECUTIVE
Confluent Platform
Self-Balancing Clusters | Tiered Storage
DevOps
Operator | Ansible
GUI-based management & monitoring
Control Center | Proactive Support
ksqlDB
Pre-built
Connectors | Hub | Schema Registry
Non-Java Clients | REST Proxy
Admin REST APIs
Multi-Region Clusters | Replicator
Cluster Linking
Schema Registry | Schema Validation
RBAC | Secrets | Audit Logs
TCO / ROI
Revenue / Cost / Risk Impact
Complete Engagement Model
What is Apache Kafka?
19
Kafka is a distributed commit log
• Publish/subscribe messaging
•
• Transactions
1 2 3 4 5 6 7 8
Append-only
writes
Reads are a single
seek and scan
App App App
Producers
App App App
Consumers
Kafka
Cluster
What are Kafka Connect and Kafka Streams?
Kafka Streams API
• Java library for stream processing
• Built on the Producer/Consumer APIs
Kafka Connect API
• Moves data into and out of Kafka
•
Orders
Customers
STREAM
PROCESSING
KStreams / KTable
Multi-Language
Development
Confluent-provided Connectors (examples)
21
200+ Pre-Built
Connectors
Event Stream
Processing
ksqlDB
/ KStream
Stream Processing by Analogy
Kafka Cluster
Connect API Stream Processing Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
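For comparison, a rough ksqlDB equivalent of this pipeline might look like the sketch below; the in_lines and out_lines stream names and the line column are hypothetical, assuming a stream has been registered over the input topic:
CREATE STREAM out_lines AS
  SELECT UCASE(line) AS line        -- tr a-z A-Z
  FROM in_lines
  WHERE line LIKE '%ksql%'          -- grep "ksql"
  EMIT CHANGES;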
Three ways to do stream processing with Confluent
23
Kafka Clients Kafka Streams ksqlDB
ConsumerRecords<String, Integer> records =
  consumer.poll(100);
Map<String, Integer> counts = new DefaultMap<String, Integer>();
for (ConsumerRecord<String, Integer> record : records) {
  String key = record.key();
  int c = counts.get(key);
  c += record.value();
  counts.put(key, c);
}
for (Map.Entry<String, Integer> entry : counts.entrySet()) {
  int stateCount;
  int attempts = 0;
  while (attempts++ < MAX_RETRIES) {
    try {
      stateCount = stateStore.getValue(entry.getKey());
      stateStore.setValue(entry.getKey(), entry.getValue() + stateCount);
      break;
    } catch (StateStoreException e) {
      RetryUtils.backoff(attempts);
    }
  }
}
builder
.stream("input-stream",
Consumed.with(Serdes.String(), Serdes.String()))
.groupBy((key, value) -> value)
.count()
.toStream()
.to("counts", Produced.with(Serdes.String(), Serdes.Long()));
SELECT x, count(*) FROM stream GROUP BY x EMIT
CHANGES;
subscribe(), poll(), send(),
flush(), beginTransaction(), …
KStream, KTable, filter(), map(),
flatMap(), join(), aggregate(),
transform(), …
CREATE STREAM, CREATE TABLE,
SELECT, JOIN, GROUP BY, SUM, …
Stream Processing
KSQL UDFs
24
25
3-5 ,
DB
CONNECTOR
CONNECTOR
APP
APP
DB
STREAM
PROCESSING
CONNECTOR APP
DB
2
3
4
1
26
ksqlDB combines connectors, stream processing, state stores, and push and pull queries
DB
APP
APP
DB
PULL
PUSH
CONNECTORS
STREAM
PROCESSING
STATE STORES
ksqlDB
1 2
APP
Serve lookups against
materialized views
Create
materialized views
Perform continuous
transformations
Capture data
CREATE STREAM purchases AS
SELECT viewtime, userid, pageid, TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd')
FROM pageviews;
CREATE TABLE orders_by_country AS
SELECT country, COUNT(*) AS order_count, SUM(order_total) AS order_total
FROM purchases
WINDOW TUMBLING (SIZE 5 MINUTES)
LEFT JOIN user_profiles ON purchases.customer_id = user_profiles.customer_id
GROUP BY country
EMIT CHANGES;
SELECT * FROM orders_by_country WHERE country='usa';
CREATE SOURCE CONNECTOR jdbcConnector WITH (
'connector.class' = '...JdbcSourceConnector',
'connection.url' = '...',
…);
Connector
Stream
Table
Query
SQL
Filter messages to a separate topic in real-time
28
Partition 0
Partition 1
Partition 2
Topic: Blue and Red Widgets
Partition 0
Partition 1
Partition 2
Topic: Blue Widgets Only
STREAM
PROCESSING
Filters
29
Filters CREATE STREAM high_readings AS
SELECT sensor,
reading
FROM readings
WHERE reading > 41
EMIT CHANGES;
Easily merge and join topics to one another
30
Partition 0
Partition 1
Partition 2
Topic: Blue and Red Widgets
Partition 0
Partition 1
Partition 2
Topic: Green and Yellow Widgets
Partition 0
Partition 1
Partition 2
Topic: Blue and Yellow Widgets
STREAM
PROCESSING
Joins
31
Joins
CREATE STREAM enriched_readings AS
SELECT reading, area, brand_name
FROM readings
INNER JOIN brands b
ON b.sensor = readings.sensor
EMIT CHANGES;
Aggregate streams into tables and capture
summary statistics
32
Partition 0
Partition 1
Partition 2
Topic: Blue and Red Widgets Table: Widget Count
STREAM
PROCESSING
Widget Color Count
Blue 15
Red 9
Aggregate
33
Aggregate CREATE TABLE avg_readings AS
SELECT sensor,
AVG(reading) AS avg_reading
FROM readings
GROUP BY sensor
EMIT CHANGES;
Workshop
35
• You will be working with Zoom and a browser (the instructions, the ksqlDB console, and Confluent Control Center).
• If you have a question, post it in the Zoom chat.
• Don't worry if you get stuck: use the "Raise hand" button in Zoom and a Confluent engineer will help you.
• Avoid simply skipping ahead and copy-pasting. Most people learn better when they actually type the code into the console, and you can learn from your mistakes.
•
How the training works
37
•
•
•
• submit
• id -
/
.
Use Case -
38
• . / ,
.
• 9/12/19 12:55:05 GMT, 5313, {
"rating_id": 5313,
"user_id": 3,
"stars": 1,
"route_id": 6975,
"rating_time": 1519304105213,
"channel": "web",
"message": "why is it so difficult to keep the bathrooms clean?"
}
Use Case - Approach 1
39
Move the reviews into a data warehouse.
At the end of each month, process the reviews and forward them to the departments that received a significant number of comments.
This approach tells you what has already happened.
Use Case - Approach 2
40
Process reviews in real time and give the airport management team a dashboard.
The dashboard sorts reviews by topic, so cleanliness-related issues can be surfaced quickly.
This approach tells you what is happening right now.
Use Case - Approach 3
41
Process reviews in real time.
Set up an alert on 3 bad reviews related to restroom cleanliness within the last 10 minutes.
Automatically dispatch cleaning staff to handle the problem.
This approach does something based on what is happening.
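As a sketch of Approach 3 in ksqlDB, assuming a ratings stream registered over the workshop's ratings topic and using the stars, route_id, and message fields from the sample record (the thresholds and the LIKE filter are illustrative):
CREATE TABLE cleanliness_alerts AS
  SELECT route_id, COUNT(*) AS bad_reviews
  FROM ratings
  WINDOW TUMBLING (SIZE 10 MINUTES)        -- look at 10-minute windows
  WHERE stars <= 2 AND LCASE(message) LIKE '%clean%'
  GROUP BY route_id
  HAVING COUNT(*) >= 3                     -- alert threshold
  EMIT CHANGES;
A downstream consumer of this table's changelog could then page the cleaning staff automatically.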
Hands on
3.
3.2.1
Cluster Architectural Overview
43
MySQL
Microservice
Website
Kafka Connect
Datagen Source
connector
MySQL CDC
connector
Kafka
ksqlDB
transforms
enriches
queries
ksqlDB
44
ksqlDB Kafka Brokers
node
Confluent Control
Center
ksqlDB Editor &
DataFlow
ksqlDB
CLI
ksqlDB
RESTful
API
ksqlDB console
45
ksqlDB console
46
> show topics;
> show streams;
> print 'ratings';
Hands on
4. ksqlDB
4.2.2
Discussion - tables vs streams
48
> describe extended customers;
> select * from customers emit changes;
> select * from customers_flat emit changes;
Stream <-> Table duality
http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
49
Streams and Tables
{ "event_ts": "2020-02-17T15:22:00Z",
"person" : "robin",
"location": "Leeds"
}
{ "event_ts": "2020-02-17T17:23:00Z",
"person" : "robin",
"location": "London"
}
{ "event_ts": "2020-02-17T22:23:00Z",
"person" : "robin",
"location": "Wakefield"
}
{ "event_ts": "2020-02-18T09:00:00Z",
"person" : "robin",
"location": "Leeds"
+--------------------+-------+---------+
|EVENT_TS |PERSON |LOCATION |
+--------------------+-------+---------+
|2020-02-17 15:22:00 |robin |Leeds |
|2020-02-17 17:23:00 |robin |London |
|2020-02-17 22:23:00 |robin |Wakefield|
|2020-02-18 09:00:00 |robin |Leeds |
+-------+---------+
|PERSON |LOCATION |
+-------+---------+
|robin |Leeds |
Kafka topic
+-------+---------+
|PERSON |LOCATION |
+-------+---------+
|robin |London |
+-------+---------+
|PERSON |LOCATION |
+-------+---------+
|robin |Wakefield|
+-------+---------+
|PERSON |LOCATION |
+-------+---------+
|robin |Leeds |
ksqlDB Table
ksqlDB Stream
Stream (append-only series of
events):
Topic + Schema
Table: state for
given key
Topic + Schema
50
• Streams = INSERT only
Immutable, append-only
• Tables = INSERT, UPDATE, DELETE
Mutable, row key (event.key) identifies which
row
51
The key to mutability is … the event.key!
52
Stream Table
Has unique key constraint? No Yes
First event with key ‘alice’ arrives INSERT INSERT
Another event with key ‘alice’ arrives INSERT UPDATE
Event with key ‘alice’ and value == null arrives INSERT DELETE
Event with key == null arrives INSERT <ignored>
RDBMS analogy: A Stream is ~ a Table that has no unique key and is append-only.
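A minimal sketch of this duality: the same topic can back both a stream and a table. The movements topic name and its columns are hypothetical, mirroring the robin/location example above:
-- every movement event, in order
CREATE STREAM movements (person VARCHAR KEY, location VARCHAR)
  WITH (KAFKA_TOPIC='movements', VALUE_FORMAT='JSON');
-- only the latest location per person
CREATE TABLE current_location (person VARCHAR PRIMARY KEY, location VARCHAR)
  WITH (KAFKA_TOPIC='movements', VALUE_FORMAT='JSON');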
Creating a table from a stream or topic
streams
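One way to derive a table from an existing stream, sketched against the hypothetical movements stream above:
CREATE TABLE latest_location AS
  SELECT person,
         LATEST_BY_OFFSET(location) AS location   -- most recent value wins
  FROM movements
  GROUP BY person
  EMIT CHANGES;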
Aggregating a stream (COUNT example)
streams
Aggregating a stream (COUNT example)
streams
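A sketch of the COUNT aggregation, assuming a ratings stream over the workshop's ratings topic (the channel column comes from the sample record shown earlier):
CREATE TABLE ratings_per_channel AS
  SELECT channel,
         COUNT(*) AS rating_count   -- one running count per channel
  FROM ratings
  GROUP BY channel
  EMIT CHANGES;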
KSQL for Data Exploration
An easy way to inspect your data in Kafka
SHOW TOPICS;
SELECT page, user_id, status, bytes
FROM clickstream
WHERE user_agent LIKE 'Mozilla/5.0%';
PRINT 'my-topic' FROM BEGINNING;
56
KSQL for Data Transformation
Quickly make derivations of existing data in Kafka
CREATE STREAM clicks_by_user_id
WITH (PARTITIONS=6,
TIMESTAMP='view_time',
VALUE_FORMAT='JSON') AS
SELECT * FROM clickstream
PARTITION BY user_id;
Change number of partitions
1
Convert data to JSON
2
Repartition the data
3
57
Hands on
4.3
4.4 Query
8
.
59
• Kafka .
• Format .
• data streams join .
• Event Stream Query Query
.
• !
KSQL for Real-Time, Streaming ETL
Filter, cleanse, process data while it is in motion
CREATE STREAM clicks_from_vip_users AS
SELECT user_id, u.country, page, action
FROM clickstream c
LEFT JOIN users u ON c.user_id = u.user_id
WHERE u.level ='Platinum'; Pick only VIP users
1
60
CDC — only after state
61
The JSON data shows the information coming from MySQL via Debezium CDC.
Notice that there is no "BEFORE" data here (it is null).
This means the record was just created, with no prior update; for example, a new customer being added for the first time.
CDC — before and after
62
Now there is some "BEFORE" data, because there has been an update to the customer record.
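A sketch of how such a Debezium record could be flattened in ksqlDB; the customers_cdc stream name, the BEFORE/AFTER struct columns, and the field names are assumptions based on the typical Debezium envelope, and the workshop's own customers_flat stream may be defined differently:
CREATE STREAM customers_flat_sketch AS
  SELECT after->id AS id,                 -- keep only the post-change image
         after->first_name AS first_name,
         after->email AS email,
         op                               -- c = create, u = update, d = delete
  FROM customers_cdc
  EMIT CHANGES;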
KSQL for Anomaly Detection
Aggregate data to identify patterns and anomalies in real-time
CREATE TABLE possible_fraud AS
SELECT card_number, COUNT(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 30 SECONDS)
GROUP BY card_number
HAVING COUNT(*) > 3;
Aggregate data
1
… per 30-sec windows
2
63
KSQL for Real-Time Monitoring
Derive insights from events (IoT, sensors, etc.) and turn them into actions
CREATE TABLE failing_vehicles AS
SELECT vehicle, COUNT(*)
FROM vehicle_monitoring_stream
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE event_type = 'ERROR'
GROUP BY vehicle
HAVING COUNT(*) >= 5; Now we know to alert, and whom
1
64
Confluent Control Center
C3 - Connector
ksqlDB - Cloud UI (1/2)
67
ksqlDB - Cloud UI (2/2)
68
Monitoring ksqlDB applications
Data flow (1/2)
69
Monitoring ksqlDB applications
Data flow (2/2)
70
ksqlDB Internals
Storage Layer
(Brokers)
Processing Layer
(ksqlDB, KStreams,
etc.)
Partitions play a central role in Kafka
72
Topics are partitioned. Partitions enable scalability, elasticity, fault-tolerance.
Data is stored in, read from and written to, replicated based on, ordered based on, joined based on, and processed based on partitions.
Processing
Layer
(KSQL,
KStreams)
Topic: 00100 11101 11000 00011 00100 00110 (raw bytes)
Stream: alice→Paris, bob→Sydney, alice→Rome (plus schema / serdes)
Table: alice→2, bob→1 (plus aggregation)
Storage Layer
(Brokers)
Topics vs. Streams and Tables
73
Kafka Processing
Data is processed per-partition
...
...
...
...
P1
P2
P3
P4
Storage Processing
read via
network
Topic App Instance 1 Application
App Instance 2
‘payments’ with consumer group
‘my-app’
74
Kafka Processing
Data is processed per-partition
...
...
...
...
P1
P2
P3
P4
Storage Processing State
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
read via
network
Application Instance 1
Topic
Application Instance 2
75
Streams and Tables are partitioned, too
...
...
...
...
P1
P2
P3
P4
Stream Task 1
Stream Task 2
Stream Task 3
Stream Task 4
KTable / TABLE
2 GB
3 GB
5 GB
2 GB
Application Instance 1
Application Instance 2
76
Kafka Streams Architecture
77
Advanced Features
Windowing
79
“3 bad reviews within the last 10 minutes”
Windowed queries apply ksqlDB logic over bounded time windows.
Tumbling Hopping Session
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY key
WINDOW HOPPING (SIZE 5 MINUTE, ADVANCE BY 1 MINUTE)
GROUP BY key
WINDOW SESSION (60 SECONDS)
GROUP BY key
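As a sketch of a complete windowed query, assuming a ratings stream registered over the workshop's ratings topic (the channel column comes from the sample record shown earlier):
CREATE TABLE ratings_by_channel_5m AS
  SELECT channel, COUNT(*) AS rating_count
  FROM ratings
  WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE)  -- overlapping windows
  GROUP BY channel
  EMIT CHANGES;
Each channel then gets one row per overlapping 5-minute window, advancing every minute.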
UDF and machine learning
80
ksqlDB ships with many built-in functions that simplify stream processing, for example:
• GEODISTANCE: measures the distance between two lat/long coordinates
• MASK: converts a string into a masked or obfuscated version
• JSON_ARRAY_CONTAINS: checks whether an array contains a given search value
You can also develop user-defined functions (UDFs) to extend the functionality available in ksqlDB. A common use case is implementing machine learning algorithms through ksqlDB so that those models can contribute to real-time data transformations.
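As a small sketch, assuming the same ratings stream, the built-in MASK function can obfuscate the free-text message field (the column names are taken from the sample record; the stream name is an assumption):
CREATE STREAM ratings_masked AS
  SELECT rating_id, stars, route_id,
         MASK(message) AS message   -- obfuscate potentially sensitive text
  FROM ratings
  EMIT CHANGES;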
What can you do with ksqlDB?
81
Streaming ETL Anomaly detection
Real-time monitoring
and Analytics
Sensor data and IoT Customer 360-view
https://docs.ksqldb.io/en/latest/#what-can-i-do-with-ksqldb
Example: Streaming ETL pipeline
82
* Full example here
• Apache Kafka is a popular choice for powering data pipelines
• ksqlDB makes it simple to transform data within the pipeline,
preparing the messages for consumption by another system.
Example: Anomaly detection
83
• Identify patterns and spot anomalies in real-time data with
millisecond latency, enabling you to properly surface out-of-the-
ordinary events and to handle fraudulent activities separately.
* Full example here
Any questions?
87
one more …
Developer https://developer.confluent.io
Tutorials
90
•
• Kafka ksqlDB
Kafka ksqlDB
?
• ?
• Apache Kafka® .
https://kafka-tutorials.confluent.io/
Free eBooks
Kafka: The Definitive Guide
Neha Narkhede, Gwen Shapira, Todd
Palino
Making Sense of Stream Processing
Martin Kleppmann
I ❤ Logs
Jay Kreps
Designing Event-Driven Systems
Ben Stopford
http://cnfl.io/book-bundle
Confluent
92
Confluent Blog
cnfl.io/blog
Confluent Cloud
cnfl.io/confluent-cloud
Community
cnfl.io/meetups
93
Max processing parallelism = #input partitions
...
...
...
...
P1
P2
P3
P4
Topic Application Instance 1
Application Instance 2
Application Instance 3
Application Instance 4
Application Instance 5 *** idle ***
Application Instance 6 *** idle ***
→ Need higher parallelism? Increase the original topic’s partition count.
→ Higher parallelism for just one use case? Derive a new topic from the
original with higher partition count. Lower its retention to save storage.
94
How to increase # of partitions when needed
CREATE STREAM products_repartitioned
WITH (PARTITIONS=30) AS
SELECT * FROM products;
95
KSQL example: statement below creates a new stream with the desired number of partitions.
‘Hot’ partitions are a problem, often caused by:
1. Events not evenly distributed across partitions
2. Events evenly distributed, but certain events taking longer to process
Strategies to address hot partitions include:
1a. Ingress: Find a better partitioning function ƒ(event.key) for producers
1b. Storage: Re-partition data into a new topic if you can’t change the original
2. Scale processing vertically, e.g. more powerful CPU instances
96
Joining Streams and Tables
Data must be ‘co-partitioned’
Table
Stream
Join Output
(Stream) 97
Joining Streams and Tables
Data must be ‘co-partitioned’
bob male
alice female
alex male
alice Paris
Table
P1
P2
P3
zoie female
andrew male
mina female
natalie female
blake male
alice Paris
Stream
P2
(alice, Paris) from
stream’s P2 has a
matching entry for
alice in the table’s P2.
female 98
Joining Streams and Tables
Data is looked up in same partition number
99
alice Paris alice male
alice female
alice Paris
Stream Table
P2 P1
P2
P3
Here, key ‘alice’ exists in
multiple partitions.
But entry in P2
(female) is used
because the stream-
side event is from
stream’s partition P2.
female
Scenario 2
Joining Streams and Tables
Data is looked up in same partition number
100
alice Paris alice male
alice Paris
Stream Table
P2 P1
P2
P3
Here, key ‘alice’ exists
only in the table’s P1 !=
P2.
null
no
match!
Scenario 3
Data co-partitioning requirements in detail
Further Reading on Joining Streams and Tables:
https://www.confluent.io/kafka-summit-sf18/zen-and-the-art-of-streaming-joins
https://docs.confluent.io/current/ksql/docs/developer-guide/partition-data.html
101
1. Same keying scheme for both input sides
2. Same number of partitions
3. Same partitioning function ƒ(event.key)
Why is that so?
Because of how input data is mapped to stream tasks
...
...
...
P1
P2
P3
storage
processing state
Stream Task 2
read via
network
Stream
Topic
...
...
...
P1
P2
P3
Table
Topic
from stream’s P2
from table’s P2
102
How to re-partition your data when needed
CREATE STREAM products_repartitioned
WITH (PARTITIONS=42) AS
SELECT * FROM products
PARTITION BY product_id;
103
KSQL example: statement below creates a new stream with changed number of partitions and a new field as
event.key (so that its data is now correctly co-partitioned for joining)
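Once both sides are keyed by product_id with the same partition count, the join itself is straightforward; a sketch assuming a hypothetical orders stream (its order_id column and the products' name column are assumptions):
CREATE STREAM orders_with_products AS
  SELECT o.order_id, o.product_id, p.name
  FROM orders o
  INNER JOIN products_repartitioned p WITHIN 1 HOUR   -- stream-stream join window
    ON o.product_id = p.product_id
  EMIT CHANGES;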