1. Confidential
State of the Streaming Platform 2017
What’s new in Apache Kafka and the Confluent Platform
David Tucker, Confluent
2.
The shift to streams
“By 2020, 70% of organizations will adopt data streaming to enable real-time analytics.” [1]
“Streaming ingestion and analytics will become a must-have for digital winners.” [2]
[1] Gartner: Harness Streaming Data for Real-Time Analytics, Nov 2016
[2] Forrester’s 2016 Predictions: Turn Data Into Insight And Action, Nov 2015
3.
Vision of a Streaming Enterprise
[Diagram: a Streaming Platform at the center, connected to Search, NewSQL / NoSQL, RDBMS, Monitoring, Document Store, Real-time Analytics, Data Warehouse, Mobile Apps, Legacy Apps, and Hadoop]
4.
What Can You Do with a Streaming Platform?
• Publish and subscribe to streams of data
  • Analogous to traditional messaging systems
• Store streams of data
  • Consumers can look back in time
• Process streams of data
  • Analyze and correlate events in real time
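The first two capabilities can be sketched with a toy, in-memory model of a single-partition log. This is a minimal sketch under stated assumptions, not Kafka's client API: the class names ToyLog and ToySubscriber are hypothetical. Publishing appends a record at the next offset, each subscriber keeps its own read position, and because reading never deletes data, a subscriber can seek back to an earlier offset and read again ("look back in time").

```java
import java.util.ArrayList;
import java.util.List;

/** Toy single-partition append-only log (hypothetical; NOT the Kafka client API). */
final class ToyLog {
    private final List<String> records = new ArrayList<>();

    /** Publish: append a record and return the offset it was stored at. */
    synchronized long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Read up to maxRecords records starting at fromOffset; reading removes nothing. */
    synchronized List<String> read(long fromOffset, int maxRecords) {
        int from = (int) Math.max(0L, Math.min(fromOffset, records.size()));
        int to = Math.min(records.size(), from + maxRecords);
        return new ArrayList<>(records.subList(from, to));
    }

    /** Offset that the next appended record will receive. */
    synchronized long endOffset() {
        return records.size();
    }
}

/** Each subscriber tracks its own position, independently of other subscribers. */
final class ToySubscriber {
    private final ToyLog log;
    private long position;

    ToySubscriber(ToyLog log, long startOffset) {
        this.log = log;
        this.position = startOffset;
    }

    /** Fetch the next batch and advance this subscriber's position. */
    List<String> poll(int maxRecords) {
        List<String> batch = log.read(position, maxRecords);
        position += batch.size();
        return batch;
    }

    /** Rewind (or skip ahead) to any retained offset. */
    void seek(long offset) {
        position = offset;
    }

    long position() {
        return position;
    }
}
```

Two subscribers on the same log do not interfere with each other. In Kafka the equivalent position is the consumer offset, and retention settings, not reads, decide when old records are discarded.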
5.
The typical integration architecture
[Diagram: many point-to-point connections among Search, Security, Fraud Detection, Application, User Tracking, Operational Logs, Operational Metrics, Hadoop, Data Warehouse, MySQL, Cassandra, Oracle, and Monitoring, each built on its own App, Databases, Storage, and Interfaces layers]
6.
Challenges abound
[Diagram: the same point-to-point architecture (with Espresso in place of MySQL), annotated with its challenges]
• Difficult to handle massive amounts of data
• Diverse data sets, arriving at an increasing rate
• Many complex data pipelines
• Requires a separate cluster for real-time processing
• Difficult and time-consuming to change
• Requires mission-critical access to the most recent, relevant data
7.
Modernized architecture using Apache Kafka
[Diagram: Apache Kafka at the center, linking User Tracking, Operational Logs, Operational Metrics, Espresso, Cassandra, Oracle, Search, Security, Fraud Detection, Application, Monitoring, Hadoop, and Data Warehouse; applications use the Streams API]
8.
Challenges addressed by a streaming platform
[Diagram: the Kafka-centered architecture, annotated with the challenges it addresses]
• Rewind a data stream to reload it into any target system
• Scale to meet the demands of diverse streams
• Pub/sub to data streams
• Lightweight and easy to modify, with minimal disruption
• Decoupled from upstream apps, creating agility
• Real-time, context-specific data in the moment
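Scaling to diverse streams rests on partitioning. The sketch below is a hypothetical, simplified illustration (ToyPartitioning is not a Kafka API): records with the same key always land in the same partition, which preserves per-key ordering, and a topic's partitions are spread across the consumers in a group so that adding consumers adds throughput. As a stated simplification it hashes with String.hashCode; Kafka's Java producer hashes the serialized key with murmur2 by default, and Kafka's group assignors are more elaborate than this round-robin loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy illustrating key partitioning and consumer-group assignment. */
final class ToyPartitioning {
    private ToyPartitioning() {}

    /** Same key -> same partition, so records for one key stay in order.
     *  Simplification: String.hashCode instead of Kafka's murmur2. */
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Spread partitions over the consumers of one group, round-robin. */
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            assignment.get(consumers.get(p % consumers.size())).add(p);
        }
        return assignment;
    }
}
```

With six partitions and two consumers, each consumer owns three partitions; a second, independent group reading the same topic would get its own assignment, which is what keeps downstream systems decoupled from one another.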
9.
From Big Data to Stream Data
Apache Kafka is the Enabling Technology of this Transition
• Big Data was: The More the Better [chart: value of data vs. volume of data]
• Stream Data is: The Faster the Better [chart: value of data vs. age of data]
• Stream Data can be: Big or Fast (Lambda) [diagram: Streams and Hadoop feed a Speed Table and a Batch Table in a DB]
• Stream Data will be: Big AND Fast (Kappa) [diagram: Streams feed Job 1 and Job 2, which write Table 1 and Table 2 in a DB]
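The Kappa idea, reprocessing by replaying the retained stream instead of maintaining a separate batch layer, fits in a few lines. In this toy sketch the names KappaReplay and PageView, and the page-view events themselves, are hypothetical. The log is the source of truth; a new version of a job rebuilds its table simply by reading the same log again from the first offset.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical toy: two job versions build different tables by replaying one log. */
final class KappaReplay {
    private KappaReplay() {}

    /** One retained event (hypothetical page-view record). */
    static final class PageView {
        final String user;
        final String page;

        PageView(String user, String page) {
            this.user = user;
            this.page = page;
        }
    }

    /** A stream job: read the log from offset 0 and count events per grouping key. */
    static Map<String, Long> replayAndCount(List<PageView> log, Function<PageView, String> groupBy) {
        Map<String, Long> table = new TreeMap<>();
        for (PageView event : log) {
            table.merge(groupBy.apply(event), 1L, Long::sum);
        }
        return table;
    }
}
```

Here "Job 1" could count views per user and "Job 2" views per page; both tables come from the same retained events, and neither needs a separate batch pipeline.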
10.
Ingest, Process, Load, and Serve Data at a Global Scale
[Diagram: Data System A, …, Data System B, …, and FIX feeds send raw data / events into a Kafka cluster; Kafka Connect (connectors to extract and load data) and Kafka Streams (data enrichment and transformation) move and process the data for applications and other data stores; Confluent Replicator or custom replication copies data to a second Kafka cluster]
11.
Confluent: Enterprise Streaming Platform based on Apache Kafka™
[Diagram: Confluent Platform at the center. Inputs: database changes, log events, IoT data, web events, … Outputs: CRM, data warehouse, database, Hadoop, … Uses: data integration, monitoring, analytics, custom apps, transformations, real-time applications, … Components, with a legend distinguishing Apache open source, Confluent open source, and Confluent commercial (Confluent Enterprise): Apache Kafka™, clients, connectors, data compatibility, monitoring & administration, operations]
Complete, Open, Trusted, Enterprise Grade
13.
Apache Kafka™ Connect – Streaming Data Capture
[Diagram: sources and sinks (JDBC, IRC / Twitter, MySQL, Elastic, NoSQL, HDFS) attached through connectors to a Kafka pipeline via the Kafka Connect API]
• Fault tolerant
• Manages hundreds of data sources and sinks
• Preserves data schema
• Part of the Apache Kafka project
• Integrated within Confluent Platform’s Control Center
14.
Apache Kafka™ Connect – Let the framework do the hard work
• Serialization / deserialization
• Schema Registry integration
• Fault tolerance, automatic fail-over
• Partitioning and scale-out
• … and let the developer focus on the domain-specific details of copying data
15.
Kafka Connect Architecture: Logical Model
Connect has three main components: Connectors (which define what data to copy and how to split the work), Tasks (which do the actual copying), and Workers (the processes that run connectors and tasks)
Data flowing into or out of a connector is a stream; each stream consists of one or more partitions. In practice, a stream partition could be a database table, a log file, etc.
Streams do not necessarily align exactly with Kafka topics.
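The logical model can be made concrete: a connector looks at its input partitions (say, a list of database tables) and splits them into at most tasks.max groups, one group per task; workers then run those tasks. The sketch below is a toy planner assuming contiguous, near-equal groups. ToyConnectorPlanner is a hypothetical name; Kafka Connect ships its own helper for this kind of grouping (ConnectorUtils.groupPartitions), which may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy: split a connector's stream partitions into per-task groups. */
final class ToyConnectorPlanner {
    private ToyConnectorPlanner() {}

    /** Split partitions (e.g. table names) into at most maxTasks contiguous groups
     *  whose sizes differ by at most one; each group is one task's workload. */
    static List<List<String>> groupPartitions(List<String> partitions, int maxTasks) {
        if (maxTasks <= 0) {
            throw new IllegalArgumentException("maxTasks must be positive");
        }
        int numGroups = Math.min(maxTasks, partitions.size()); // never create idle tasks
        List<List<String>> groups = new ArrayList<>();
        if (numGroups == 0) {
            return groups;
        }
        int perGroup = partitions.size() / numGroups;
        int leftover = partitions.size() % numGroups; // first 'leftover' groups get one extra
        int start = 0;
        for (int g = 0; g < numGroups; g++) {
            int size = perGroup + (g < leftover ? 1 : 0);
            groups.add(new ArrayList<>(partitions.subList(start, start + size)));
            start += size;
        }
        return groups;
    }
}
```

For five tables and tasks.max = 2, the first task copies three tables and the second copies two; if a worker fails, the framework can restart those same task configurations on another worker.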
17.
Kafka Connect API: Library of Connectors
[Connector logos grouped as Databases*, Datastore/File Store*, Analytics*, and Applications / Other]
* Denotes connectors developed and distributed by Confluent, on which extensive validation and testing has been performed.
21.
The Challenge of Data Compatibility at Scale
[Diagram: App 1, App 2, and App 3 publishing into a shared pipeline, including an incompatibly formatted message]
• Many sources without a policy cause mayhem in a centralized data pipeline
• Ensuring downstream systems can use the data is key to an operational stream pipeline
• Example: date formats. Even within a single application, different formats can be presented
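One way to enforce a policy at the edge of the pipeline is to normalize every accepted date layout to a single canonical form and reject anything else, rather than letting each downstream system guess. This is a minimal sketch using java.time; DatePolicy and its list of accepted formats are a hypothetical policy. It covers value-level normalization only, which is separate from the schema-level compatibility checks a schema registry provides.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical policy: accept a few known date layouts, emit ISO-8601 only. */
final class DatePolicy {
    private DatePolicy() {}

    // The only layouts producers are (hypothetically) allowed to send.
    private static final List<DateTimeFormatter> ACCEPTED = List.of(
            DateTimeFormatter.ISO_LOCAL_DATE,                              // 2017-01-25
            DateTimeFormatter.ofPattern("MM/dd/uuuu", Locale.ENGLISH)      // 01/25/2017 (US order)
                    .withResolverStyle(ResolverStyle.STRICT),
            DateTimeFormatter.ofPattern("dd MMM uuuu", Locale.ENGLISH)     // 25 Jan 2017
                    .withResolverStyle(ResolverStyle.STRICT));

    /** Canonical ISO date, or empty if the value matches no accepted layout. */
    static Optional<String> normalize(String raw) {
        for (DateTimeFormatter format : ACCEPTED) {
            try {
                LocalDate date = LocalDate.parse(raw.trim(), format);
                return Optional.of(date.format(DateTimeFormatter.ISO_LOCAL_DATE));
            } catch (DateTimeParseException e) {
                // not this layout; try the next accepted one
            }
        }
        return Optional.empty(); // incompatible: route to an error topic, alert, etc.
    }
}
```

The policy has to commit to one reading of ambiguous input: here 01/02/2017 is always read in US month/day order, which is exactly the kind of decision that should be made once, centrally.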
24.
Architecture of Kafka Streams API, a Part of Apache Kafka
[Diagram: a producer, a Kafka cluster with several topics, two consumers, and an application embedding the Kafka Streams API]
Key benefits
• No additional cluster
• Easy to run as a service
• Supports large aggregations and joins
• Security and permissions fully integrated with Kafka
Example Use Cases
• Microservices
• Continuous queries
• Continuous transformations
• Event-triggered processes
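A continuous query never finishes: each incoming event updates the result, and the latest result can be read at any time. The toy below (RunningCount is a hypothetical name; this is not the Kafka Streams API) filters events and keeps a running count per key in a local map. In the Kafka Streams DSL the same shape would be written roughly as stream.filter(...).groupByKey().count(), with the state store backed by a Kafka changelog topic for fault tolerance.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical toy continuous query: filter, then count per key, queryable at any time. */
final class RunningCount {
    private final Predicate<String> filter;
    private final Map<String, Long> store = new HashMap<>(); // local state, in memory only

    RunningCount(Predicate<String> filter) {
        this.filter = filter;
    }

    /** Process one (key, value) event; returns the updated count, or -1 if filtered out. */
    long process(String key, String value) {
        if (!filter.test(value)) {
            return -1;
        }
        return store.merge(key, 1L, Long::sum);
    }

    /** Read the current result for a key while processing continues. */
    long countFor(String key) {
        return store.getOrDefault(key, 0L);
    }
}
```

For example, counting error lines per host: every matching event bumps that host's count, and a dashboard can read countFor(host) at any moment without waiting for a batch job.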
25.
Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™
Example Use Cases
• Microservices
• Large-scale continuous queries and transformations
• Event-triggered processes
• Reactive applications
• Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …
Key Benefits of Apache Kafka’s Streams API
• Build Apps, Not Clusters: no additional cluster required
• Elastic, highly performant, distributed, fault-tolerant, secure
• Equally viable for small, medium, and large-scale use cases
• “Run Everywhere”: integrates with your existing deployment strategies such as containers, automation, and cloud
[Diagram: your app embedding the Kafka Streams API]
26.
Architecture Example
Before: Complexity for development and operations, heavy footprint
Step 1: Capture business events in Kafka
Step 2: Must process events with separate, special-purpose clusters (your processing job)
Step 3: Write results back to Kafka
27.
Architecture Example
With Kafka Streams: App-centric architecture that blends well into your existing infrastructure
Step 1: Capture business events in Kafka
Step 2: Process events fast, reliably, and securely with standard Java applications (your app, using the Kafka Streams API)
Step 3a: Write results back to Kafka
Step 3b: External apps can directly query the latest results
31.
Confluent Control Center: Alerting
Alerts
• Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
• Manage alerts for different users and applications from a web UI
User authentication
• Control access to Confluent Control Center
• Integrates with existing enterprise authentication systems
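Alerts on incomplete delivery or high latency often come down to consumer lag: how far a consumer group's committed offset trails the end of each partition. The sketch below computes per-partition lag and flags partitions above a threshold. It is a toy with hypothetical names (LagAlert and the threshold value); it does not reproduce how Control Center computes its own alerts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical toy: consumer-lag calculation and threshold check. */
final class LagAlert {
    private LagAlert() {}

    /** Lag per partition = log-end offset minus the group's committed offset.
     *  A partition with no commit is treated as unread (committed offset 0). */
    static Map<Integer, Long> lag(Map<Integer, Long> endOffsets, Map<Integer, Long> committed) {
        Map<Integer, Long> result = new TreeMap<>(); // sorted by partition number
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long done = committed.getOrDefault(e.getKey(), 0L);
            result.put(e.getKey(), Math.max(0L, e.getValue() - done));
        }
        return result;
    }

    /** Partitions whose lag exceeds the alert threshold, in partition order. */
    static List<Integer> breaches(Map<Integer, Long> lag, long threshold) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lag.entrySet()) {
            if (e.getValue() > threshold) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

A real alert would also track how lag changes over time, since a large but shrinking lag usually calls for a different response than a small one that keeps growing.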
33.
Demo Scenario: Multiple Streaming Data Pipelines
• IRC feed of Wikipedia updates
  • IRC source connector publishes a real-time stream of Wikipedia updates to a Kafka topic
  • Kafka Streams application parses records and rewrites them to a new topic
  • Elasticsearch sink connector indexes the parsed data
  • Kibana dashboards visualize Wikipedia updates in real time
• Twitter feed augmented with sentiment data
  • Twitter source connector configured to publish data to a Kafka topic
  • Kafka Streams application strips extraneous Twitter fields and adds a sentiment score
  • Sink connector saves the Kafka Streams output to a key-value store (e.g., Couchbase or DynamoDB)
  • Key-value queries can track sentiment trends
35.
Wikipedia Transformation
• Raw input records
{"createdat":1485386068652,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[List of Iranian Americans]] https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313 * 01:445:4080:1510:F1A4:7C08:B276:FA8B * (+0) /* Media/Journalism */"}
{"createdat":1485386069199,"channel":"#en.wikipedia","sender":{"nick":"rc-pmtpa","login":"~rc-pmtpa","hostname":"special.user"},"message":"[[In the Bleak Midwinter]] https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970 * Grover cleveland * (+422) /* Settings */"}
• Parsed records
{"createdat":1485386068652,"wikipage":"List of Iranian Americans","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978901&oldid=760575313","username":"01:445:4080:1510:F1A4:7C08:B276:FA8B","bytechange":0,"commitmessage":"/* Media/Journalism */"}
{"createdat":1485386069199,"wikipage":"In the Bleak Midwinter","isnew":false,"isminor":false,"isunpatrolled":false,"isbot":false,"diffurl":"https://en.wikipedia.org/w/index.php?diff=761978902&oldid=761960970","username":"Grover cleveland","bytechange":422,"commitmessage":"/* Settings */"}
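The parsing step of the Wikipedia pipeline can be sketched as one regular expression over the IRC message. Assumptions, labeled as such: the layout ([[page]] flags URL * user * (+/-bytes) summary) is inferred from the two raw samples above, IRC color codes are already stripped, and the flag letters N, M, B, and ! (new, minor, bot, unpatrolled) follow the Wikimedia recent-changes feed convention; neither sample carries any flags. WikiEditParser is a hypothetical name, not the demo's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for Wikipedia recent-change IRC messages. */
final class WikiEditParser {
    private WikiEditParser() {}

    // Assumed layout: [[page]] FLAGS URL * user * (+/-bytes) summary
    private static final Pattern EDIT = Pattern.compile(
            "^\\[\\[(.+?)\\]\\]\\s+([!NMB]*)\\s*(https?://\\S+)\\s+\\*\\s+(.+?)\\s+\\*\\s+\\(([+-]?\\d+)\\)\\s*(.*)$");

    /** Returns the parsed fields in the same order as the parsed records, or null if no match. */
    static Map<String, Object> parse(long createdAt, String message) {
        Matcher m = EDIT.matcher(message);
        if (!m.matches()) {
            return null;
        }
        String flags = m.group(2); // empty when the edit carries no flags
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("createdat", createdAt);
        out.put("wikipage", m.group(1));
        out.put("isnew", flags.contains("N"));
        out.put("isminor", flags.contains("M"));
        out.put("isunpatrolled", flags.contains("!"));
        out.put("isbot", flags.contains("B"));
        out.put("diffurl", m.group(3));
        out.put("username", m.group(4));
        out.put("bytechange", Integer.parseInt(m.group(5))); // parseInt accepts a leading '+'
        out.put("commitmessage", m.group(6));
        return out;
    }
}
```

Applied to the second raw sample, it yields the same field values as the second parsed record above (wikipage "In the Bleak Midwinter", username "Grover cleveland", bytechange 422).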
36.
Twitter Transformation
• Raw input records (128 separate fields; excerpt shown)
{
  "CreatedAt": 1479252348000,
  "Id": 798668350956126200,
  "Text": "Iago Aspas pays tribute to #Spain players for making his international debut “easy” vs #England… https://t.co/G13NUaZj8W",
  "Source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
  "User": { }
}
• Filtered records
{"sentiment":"Negative","sentimentScore":1,"UserName":"tits","CreatedAt":1485387765000,"Text":"RT @STsportsdesk: Football: Real Madrid eliminated from #CopaDelRey by Celta Vigo https://t.co/QfCLayqRsH https://t.co/53GWANPDXj","id":"824402156707049475","UserScreenName":"titusanghongwen"}
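The Twitter step keeps a handful of fields and attaches a sentiment label. In this toy version the field whitelist, the word lists, and the name TweetSlimmer are hypothetical, and scoring is a simple +1/-1 word count. Its scale therefore differs from the demo's output, where a sentimentScore of 1 accompanies a Negative label; a real pipeline would call an actual sentiment model.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy: strip a raw tweet down to a few fields and add a sentiment label. */
final class TweetSlimmer {
    private TweetSlimmer() {}

    // Hypothetical whitelist, using the field names from the raw excerpt.
    private static final List<String> KEEP = List.of("CreatedAt", "Id", "Text");

    // Hypothetical toy lexicon; stands in for a real sentiment model.
    private static final Set<String> POSITIVE = Set.of("easy", "great", "win", "tribute");
    private static final Set<String> NEGATIVE = Set.of("eliminated", "lose", "bad", "loss");

    static Map<String, Object> slim(Map<String, Object> raw) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String field : KEEP) {
            if (raw.containsKey(field)) {
                out.put(field, raw.get(field)); // every other field is dropped
            }
        }
        int score = score(String.valueOf(raw.getOrDefault("Text", "")));
        out.put("sentimentScore", score);
        out.put("sentiment", score > 0 ? "Positive" : score < 0 ? "Negative" : "Neutral");
        return out;
    }

    /** +1 per positive word, -1 per negative word (toy scoring). */
    static int score(String text) {
        int s = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) s++;
            if (NEGATIVE.contains(token)) s--;
        }
        return s;
    }
}
```

On the filtered sample's text, the toy scores -1 and labels the tweet Negative because of the word "eliminated"; the Source field and the rest of the 128 raw fields are dropped.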