After a quick overview and introduction of Apache Kafka, this session cover two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connects role is to access data from the out-side-world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible to transport information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out-of-the-box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go.
Kafka Streams is a light-weight component which extends Kafka with stream processing functionality. By that, Kafka can now not only reliably and scalable transport events and messages through the Kafka broker but also analyse and process these event in real-time. Interestingly Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
Scaling API-first – The story of a global engineering organization
Apache Kafka - A modern Stream Processing Platform
1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Apache Kafka
A modern Stream Processing Platform
Guido Schmutz
nlOUG Tech Experience – 7.6.2018
@gschmutz guidoschmutz.wordpress.com
2. Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Kafka Connect & Streams - the Ecosystem around Kafka
3. Agenda
1. What is Apache Kafka?
2. Kafka Connect
3. KSQL
4. Kafka Streams
5. Kafka and "Big Data" / "Fast Data" Ecosystem
6. Kafka in Software Architecture
Kafka Connect & Streams - the Ecosystem around Kafka
4. What is Apache Kafka?
Kafka Connect & Streams - the Ecosystem around Kafka
5. Apache Kafka History
2012 2013 2014 2015 2016 2017
Cluster mirroring
data compression
Intra-cluster
replication
0.7
0.8
0.9
Data Processing
(Streams API)
0.10
Data Integration
(Connect API)
0.11
2018
Exactly Once
Semantics
Performance
Improvements
KSQL Developer
Preview
Kafka Connect & Streams - the Ecosystem around Kafka
1.0 JBOD Support
Support Java 9
1.1 Header for Connect
Replica movement
between log dirs
6. Apache Kafka – A Streaming Platform
Kafka Connect & Kafka Streams/KSQL
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
7. Strong Ordering Guarantees
most business systems need strong
ordering guarantees
messages that require relative
ordering need to be sent to the same
partition
supply same key for
all messages that
require a relative order
To maintain global ordering use a
single partition topic
Producer 1
Consumer 1
Broker 1
Broker 2
Broker 3
Consumer 2
Consumer 3
Key-1
Key-2
Key-3
Key-4
Key-5
Key-6
Key-3
Key-1
Kafka Connect & Streams - the Ecosystem around Kafka
9. Hold Data for Long-Term – Data Retention
Producer 1
Broker 1
Broker 2
Broker 3
1. Never
2. Time based (TTL)
log.retention.{ms | minutes | hours}
3. Size based
log.retention.bytes
4. Log compaction based
(entries with same key are removed):
kafka-topics.sh --zookeeper zk:2181
--create --topic customers
--replication-factor 1
--partitions 1
--config cleanup.policy=compact
Kafka Connect & Streams - the Ecosystem around Kafka
14. Demo (I) – Run Producer and Kafka-Console-Consumer
Kafka Connect & Streams - the Ecosystem around Kafka
15. Demo (I) – Java Producer to "truck_position"
Constructing a Kafka Producer
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers","broker-1:9092);
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);
ProducerRecord<String, String> record =
new ProducerRecord<>("truck_position", driverId, eventData);
try {
metadata = producer.send(record).get();
} catch (Exception e) {}
Kafka Connect & Streams - the Ecosystem around Kafka
16. Demo (II) – devices send to MQTT instead of Kafka
Truck-2
truck/nn/
position
Truck-1
Truck-3
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
17. Demo (II) – devices send to MQTT instead of Kafka
Kafka Connect & Streams - the Ecosystem around Kafka
18. Demo (II) - devices send to MQTT instead of Kafka –
how to get the data into Kafka?
Truck-2
truck/nn/
position
Truck-1
Truck-3
truck
position raw
?
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-
94.31,-4802309397906690837
19. Apache Kafka – wait there is more!
Microservices with Kafka Ecosystem24
Source
Connector
trucking_
driver
Kafka Broker
Sink
Connector
Stream
Processing
21. Kafka Connect - Overview
Source
Connector
Sink
Connector
Kafka Connect & Streams - the Ecosystem around Kafka
22. Kafka Connect – Single Message Transforms (SMT)
Simple Transformations for a single message
Defined as part of Kafka Connect
• some useful transforms provided out-of-the-box
• Easily implement your own
Optionally deploy 1+ transforms with each
connector
• Modify messages produced by source
connector
• Modify messages sent to sink connectors
Makes it much easier to mix and match connectors
Some of currently available
transforms:
• InsertField
• ReplaceField
• MaskField
• ValueToKey
• ExtractField
• TimestampRouter
• RegexRouter
• SetSchemaMetaData
• Flatten
• TimestampConverter
Kafka Connect & Streams - the Ecosystem around Kafka
23. Kafka Connect – Many Connectors
60+ since first release (0.9+)
20+ from Confluent and Partners
Source: http://www.confluent.io/product/connectors
Confluent supported Connectors
Certified Connectors Community Connectors
Kafka Connect & Streams - the Ecosystem around Kafka
29. KSQL: a Streaming SQL Engine for Apache Kafka
• Enables stream processing with zero coding required
• The simples way to process streams of data in real-time
• Powered by Kafka and Kafka Streams: scalable, distributed, mature
• All you need is Kafka – no complex deployments
• available as Developer preview!
• STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
• join STREAM and TABLE
Kafka Connect & Streams - the Ecosystem around Kafka
38. Demo (V) – Create JDBC Connect through REST API
Kafka Connect & Streams - the Ecosystem around Kafka
39. Demo (V) - Create Table with Driver State
Kafka Connect & Streams - the Ecosystem around Kafka
ksql> CREATE TABLE driver_t
(id BIGINT,
first_name VARCHAR,
last_name VARCHAR,
available VARCHAR)
WITH (kafka_topic='trucking_driver',
value_format='JSON',
key='id');
Message
----------------
Table created
40. Demo (V) - Create Table with Driver State
ksql> CREATE STREAM dangerous_driving_and_driver_s
WITH (kafka_topic='dangerous_driving_and_driver_s',
value_format='JSON')
AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype
FROM truck_position_s
LEFT JOIN driver_t
ON dangerous_driving_and_driver_s.driverId = driver_t.id;
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_and_driver_s;
1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Unsafe tail distance
1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Lane Departure
1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Unsafe tail
distance
Kafka Connect & Streams - the Ecosystem around Kafka
42. Kafka Streams - Overview
• Designed as a simple and lightweight library in Apache
Kafka
• no external dependencies on systems other than Apache
Kafka
• Part of open source Apache Kafka, introduced in 0.10+
• Leverages Kafka as its internal messaging layer
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond
latency
• Windowing with out-of-order data using a Google DataFlow-like
model
Kafka Connect & Streams - the Ecosystem around Kafka
43. Kafka Stream DSL and Processor Topology
KStream<Integer, String> stream1 =
builder.stream("in-1");
KStream<Integer, String> stream2=
builder.stream("in-2");
KStream<Integer, String> joined =
stream1.leftJoin(stream2, …);
KTable<> aggregated =
joined.groupBy(…).count("store");
aggregated.to("out-1");
1 2
lj
a
t
State
Kafka Connect & Streams - the Ecosystem around Kafka
44. Kafka Stream DSL and Processor Topology
KStream<Integer, String> stream1 =
builder.stream("in-1");
KStream<Integer, String> stream2=
builder.stream("in-2");
KStream<Integer, String> joined =
stream1.leftJoin(stream2, …);
KTable<> aggregated =
joined.groupBy(…).count("store");
aggregated.to("out-1");
1 2
lj
a
t
State
Kafka Connect & Streams - the Ecosystem around Kafka
45. Kafka Streams Cluster
Processor Topology
Kafka Cluster
input-1
input-2
store (changelog)
output
1 2
lj
a
t
State
Kafka Connect & Streams - the Ecosystem around Kafka
48. Kafka Streams: Key Features
Kafka Connect & Streams - the Ecosystem around Kafka
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka's security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Windowing
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once and exactly-once processing guarantees
50. Demo (IV) - Create Stream
final KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> source =
builder.stream(stringSerde, stringSerde, "truck_position");
KStream<String, TruckPosition> positions =
source.map((key,value) ->
new KeyValue<>(key, TruckPosition.create(key,value)));
KStream<String, TruckPosition> filtered =
positions.filter(TruckPosition::filterNonNORMAL);
filtered.map((key,value) -> new KeyValue<>(key,value.toCSV()))
.to("dangerous_driving");
Kafka Connect & Streams - the Ecosystem around Kafka
51. Kafka and "Big Data" / "Fast Data"
Ecosystem
Kafka Connect & Streams - the Ecosystem around Kafka
52. Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache Apex
• Apache NiFi
• StreamSets
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Oracle Event Hub Cloud Service
• Debezium CDC
• …
Additional Info: https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Kafka Connect & Streams - the Ecosystem around Kafka
53. Kafka in Software Architecture
Kafka Connect & Streams - the Ecosystem around Kafka
54. Hadoop Clusterd
Hadoop Cluster
Big Data
Kafka – the Event Hub and more …. !
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
SQL
Search
Service
BI Tools
Enterprise Data
Warehouse
Search / Explore
Online & Mobile
Apps
File Import / SQL Import
Event
Hub
Data
Flow
Data
Flow
Change
Data
Capture
Parallel
Processing
Storage
Storage
RawRefined
Results
SQL
Export
Microservice State
{ }
API
Stream
Processor
State
{ }
API
Event
Stream
Event
Stream
Search
Service
Location
Social
Click
stream
Sensor
Data
Mobile
Apps
Weather
Data
Stream Processing
Microservices
55. Hadoop Clusterd
Hadoop Cluster
Big Data
Kafka – the Event Hub and more …. !
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
SQL
Search
Service
BI Tools
Enterprise Data
Warehouse
Search / Explore
Online & Mobile
Apps
File Import / SQL Import
Event
Hub
Data
Flow
Data
Flow
Change
Data
Capture
Parallel
Processing
Storage
Storage
RawRefined
Results
SQL
Export
Microservice State
{ }
API
Stream
Processor
State
{ }
API
Event
Stream
Event
Stream
Search
Service
Location
Social
Click
stream
Sensor
Data
Mobile
Apps
Weather
Data
Stream Processing
Microservices
56. Kafka Connect & Streams - the Ecosystem around Kafka
Technology on its own won't help you.
You need to know how to use it properly.