Beyond the brokers - A tour of the Kafka ecosystem. Presented on 28/03/2019 (Lyon JUG: https://www.meetup.com/Lyon-Java-User-Group-LyonJUG/events/259569434/)
1. 1
Beyond the brokers
A tour of the Kafka Ecosystem
Damien Gasparina
Solution Architect
damien@confluent.io
2. 2
Massive volumes of new data generated every day
Mobile | Cloud | Microservices | Internet of Things | Machine Learning
Distributed across apps, devices, datacenters, clouds
Structured, unstructured, polymorphic
What
9. 10
Maturity model
Axes: Value vs. Investment & Time (each stage builds on the previous ones)
Pre-streaming
01 Learn Kafka: Understand streaming
02 Solve A Critical Need: Set up secure Kafka & build your first app
03 Go To Production: Monitor & manage a mission-critical solution
04 Break Silos: Infrastructure & applications across LOBs
05 Stream Everything: Self-service on shared Kafka
13. 15
… spawned a full platform
CONFLUENT PLATFORM
Apache Kafka®: Core | Connect API | Streams API
Stream Processing & Compatibility: KSQL | Schema Registry
Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator
Administration & Monitoring: Control Center | Security
Connectivity: Clients | Connectors | REST Proxy
Event sources: Database Changes | Log Events | IoT Data | Web Events | other events
DATA INTEGRATION: Hadoop | Database | Data Warehouse | CRM | other
REAL-TIME APPLICATIONS: Transformations | Custom Apps | Analytics | Monitoring | other
Legend: OPEN SOURCE FEATURES | COMMERCIAL FEATURES
CUSTOMER SELF-MANAGED: Datacenter | Public Cloud
CONFLUENT FULLY-MANAGED: Confluent Cloud
15. 17
Apache Kafka Connect API: Import and Export Data In & Out of Kafka
Sources → Connectors → Kafka Pipeline (Kafka Connect API) → Connectors → Sinks
Example systems: JDBC | Mongo | MySQL | Elastic | Cassandra | HDFS
• Fault tolerant
• Manage hundreds of data sources and sinks
• Preserves data schema
• Integrated within Confluent Control Center
16. 18
Connectors: Connect Kafka Easily with Data Sources and Sinks
Databases Datastore/File Store
Analytics Applications / Other
17. 19
Kafka Connect API, Part of the Apache Kafka™ Project
Connect any source to any target system
Integrated
• 100% compatible with Kafka v0.9 and higher
• Integrated with Confluent’s Schema Registry
• Easy to manage with Confluent Control Center
Flexible
• 40+ open source connectors available
• Easy to develop additional connectors
• Flexible support for data types and formats
Compatible
• Maintains critical metadata
• Preserves schema information
• Supports schema evolution
Reliable
• Automated failover
• Exactly-once guarantees
• Balances workload between nodes
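To make "connect any source to any target system" concrete, here is a minimal sketch of how a connector is typically registered through the Kafka Connect REST API (`POST /connectors`). It assumes a Connect worker on `localhost:8083` and the Confluent JDBC source connector; the connector name, database URL, column name and topic prefix are hypothetical values chosen for illustration. The request is only built, not sent.

```python
import json
import urllib.request

# Assumption: a Kafka Connect worker listening on its default REST port.
CONNECT_URL = "http://localhost:8083/connectors"


def build_connector_request(name, config, url=CONNECT_URL):
    """Build (but do not send) the POST request that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical JDBC source: copies new rows of a database into Kafka topics.
jdbc_source = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://db.example.com:3306/shop",  # hypothetical
    "mode": "incrementing",              # detect new rows via a growing id
    "incrementing.column.name": "id",    # hypothetical column
    "topic.prefix": "mysql-",            # topics are named mysql-<table>
    "tasks.max": "1",
}

request = build_connector_request("orders-jdbc-source", jdbc_source)
# urllib.request.urlopen(request) would submit it to a running worker.
```

The point of the sketch is that a connector is pure configuration: no integration code is written, and the worker cluster takes care of running and rebalancing the tasks.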
20. 22
Clients: Communicate with Kafka in a Broad Variety of Languages
Apache Kafka
Confluent Platform Community Supported
Proxy http/REST
stdin/stdout
Confluent Platform Clients developed and fully supported by Confluent
21. 23
REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
Non-Java Applications → REST / HTTP → REST Proxy
Native Kafka Java Applications
Schema Registry
• Provides a RESTful interface to a Kafka cluster
• Simplifies message creation and consumption
• Simplifies administrative actions
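As an illustration of the RESTful interface, the sketch below builds a produce request for the REST Proxy v2 API (`POST /topics/<topic>` with the `application/vnd.kafka.json.v2+json` content type). The proxy address `localhost:8082`, the topic name and the record contents are assumptions made for the example; the request is built but not sent.

```python
import json
import urllib.request

# Assumption: a REST Proxy instance on its default port.
REST_PROXY_URL = "http://localhost:8082"


def build_produce_request(topic, values, base_url=REST_PROXY_URL):
    """Build (but do not send) a v2 produce request carrying JSON values."""
    body = json.dumps({"records": [{"value": v} for v in values]})
    return urllib.request.Request(
        f"{base_url}/topics/{topic}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
        method="POST",
    )


# Hypothetical click event sent by an application without a Kafka client.
produce = build_produce_request(
    "clickstream", [{"userid": "u1", "page": "/home", "action": "view"}]
)
# urllib.request.urlopen(produce) would send it to a running proxy.
```

Any language with an HTTP client can produce this way, which is why the proxy suits non-Java applications and clients outside the firewall.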
23. 25
MQTT Proxy: Streamline IoT Data Integration with Kafka
Devices → (MQTT) → Gateways → (MQTT) → MQTT Proxy → Kafka Brokers
• Connect all IoT data sources with the streaming platform - leverages all of your infrastructure investments
• Reduce operational cost and complexity by eliminating third party MQTT brokers and their intermediate storage and lag
• Ensure IoT data delivery at all QoS levels (QoS0, QoS1 and QoS2) of the MQTT protocol
36. 38
KSQL for Data Exploration
SELECT status, bytes
FROM clickstream
WHERE user_agent =
'Mozilla/5.0 (compatible; MSIE 6.0)';
37. 39
KSQL for Streaming ETL
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
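The semantics of this stream-table join can be sketched in plain Python. This is an illustration of what the query computes, not of how KSQL executes it: the `users` table is modeled as a dictionary keyed by user id, and the sample events are hypothetical.

```python
def vip_actions(clickstream, users):
    """Yield the clicks made by Platinum users, mirroring the query above."""
    for event in clickstream:
        # LEFT JOIN: the lookup may find no matching user (None).
        user = users.get(event["userid"])
        # WHERE u.level = 'Platinum' drops unmatched and non-Platinum rows.
        if user is not None and user["level"] == "Platinum":
            yield {
                "userid": event["userid"],
                "page": event["page"],
                "action": event["action"],
            }


# Hypothetical sample data.
users = {"u1": {"level": "Platinum"}, "u2": {"level": "Silver"}}
clicks = [
    {"userid": "u1", "page": "/cart", "action": "add"},
    {"userid": "u2", "page": "/home", "action": "view"},
    {"userid": "u3", "page": "/home", "action": "view"},  # unknown user
]
vip = list(vip_actions(clicks, users))
```

In KSQL the same logic runs continuously: each new click is enriched with the latest state of the `users` table and written to the `vip_actions` stream.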
39. 41
User Defined Functions (UDF)
SELECT eventid, anomaly(sensorinput)
FROM sensor;
@Udf(description = "apply analytic model to sensor input")
public String anomaly(String sensorinput) { return your_logic; }
40. 42
KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
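A tumbling window assigns each event to exactly one fixed-size, non-overlapping time bucket. The sketch below reproduces the counting logic of the query over a finite list of events; it is a batch illustration under assumed inputs (timestamps in milliseconds, hypothetical card numbers), whereas KSQL maintains the counts continuously.

```python
from collections import Counter


def possible_fraud(attempts, window_ms=5000, threshold=3):
    """Count attempts per (card, 5-second window); keep counts above 3."""
    counts = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling window: align the timestamp down to the window start.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        counts[(card_number, window_start)] += 1
    # HAVING count(*) > 3
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical authorization attempts: (timestamp in ms, card number).
attempts = [
    (0, "4111"), (1200, "4111"), (2500, "4111"), (4999, "4111"),
    (5000, "4111"),  # falls into the next window
    (300, "5500"),
]
suspicious = possible_fraud(attempts)
```

Here card `4111` has four attempts in the window starting at 0 ms and is flagged; its fifth attempt lands in the following window and does not count toward the first.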
41. 43
KSQL: Enable Stream Processing using SQL-like Semantics
Example Use Cases
• Streaming ETL
• Anomaly detection
• Event monitoring
Clients (CLI, Confluent Control Center GUI) → REST API → KSQL server engine (runs queries) → Kafka Cluster
• Leverage Kafka Streams API without any coding required
• Use any programming language
• Connect via CLI or Control Center user interface
44. 46
Lowering the Bar to Enter the World of Streaming
Kafka User Population
Coding Sophistication
Core Java developers
Core developers who don’t use Java/Scala
Data engineers, architects, DevOps/SRE
BI analysts
streams
46. 48
The Challenge of Data Compatibility at Scale : implicit explicit !
App 1
App 2
App 3
• Many sources without a policy cause mayhem in a centralized data pipeline
• Ensuring downstream systems can use the data is key to an operational stream pipeline
• Example: date formats. Even within a single application, different formats can be presented
Incompatibly formatted message
47. 49
Schema Registry: Make Data Backwards Compatible and Future-Proof
● Define the expected fields for each Kafka topic
● Automatically handle schema changes (e.g. new fields)
● Prevent backwards incompatible changes
● Support multi-data center environments
Producers: App 1 (Serializer) | App 2 (Serializer)
Kafka Topic | Schema Registry
Example Consumers: Elastic | Cassandra | HDFS
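The idea behind "prevent backwards incompatible changes" can be sketched as a simple rule: a new schema is backward compatible if it can still read data written with the old one, so any field it adds must carry a default. The check below is a deliberately simplified illustration and not Schema Registry's implementation; it ignores Avro type promotions, and the field names are hypothetical.

```python
def is_backward_compatible(old_fields, new_fields):
    """Return True if a reader using new_fields can read old_fields data.

    Both arguments map a field name to a dict such as
    {"type": "string"} or {"type": "string", "default": "unknown"}.
    """
    for name, spec in new_fields.items():
        if name not in old_fields:
            # A field missing from old data needs a default to fall back on.
            if "default" not in spec:
                return False
        elif spec["type"] != old_fields[name]["type"]:
            # Simplification: any type change is treated as incompatible.
            return False
    # Fields removed in the new schema are simply ignored when reading.
    return True


# Hypothetical schemas for a topic.
v1 = {"id": {"type": "string"}, "ts": {"type": "long"}}
v2_ok = {**v1, "country": {"type": "string", "default": "unknown"}}
v2_bad = {**v1, "country": {"type": "string"}}

accepted = is_backward_compatible(v1, v2_ok)
rejected = not is_backward_compatible(v1, v2_bad)
```

A registry applying a rule of this kind at registration time rejects `v2_bad` before any producer can write data that existing consumers cannot read.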
49. 51
Which one do you prefer?
• Zip
• Yum/apt
• Ansible
• Docker
• DC/OS
• Helm-charts
• Confluent Operator
• ... Cloud!
50. 52
Operator: Achieve End to End Automation on Kubernetes
Confluent Platform | Confluent Cloud
Docker Images | Confluent Operator
Public Cloud: AWS | Azure | GCP
On-Premises: Pivotal | Mesosphere | Red Hat
Confluent Operator operationalizes years of experience delivering a fully-managed service - Confluent Cloud - on the leading public clouds
• Accelerate time to value with automated zero-touch provisioning
• Reduce OpEx and boost DevOps agility with rolling updates, elastic scaling and auto data balancing
• Increase resiliency via SLA monitoring through Control Center or Prometheus
51. 53
SERVICE PROVIDERS
• Amazon Elastic Container Service for Kubernetes (EKS)
• Google Kubernetes Engine (GKE)
• Azure Kubernetes Service (AKS)
KUBERNETES DISTRIBUTIONS
53. 55
Auto Data Balancer: Achieve Enterprise-level Performance for Kafka
Before → Rebalance → After
• Dynamically move partitions to optimize resource utilization and reliability
• Enable elastic scaling by easily adding and removing nodes from your Kafka cluster
• ADB traffic is throttled during data transfers to preserve network bandwidth
54. 56
Replicator: Stretch Kafka Across Data Centers and Public Cloud
• Protect business-critical data and metadata by replicating down to topic-level configurations
• Minimize recovery time objectives (RTO) through automated failover and switchback
• Meet recovery point objectives (RPO) by running more workers to increase replication throughput
• Bridge your data center to the cloud with Confluent Cloud
56. 58
System Health
• Are all brokers and topics available?
• How much data is being processed?
• What can be tuned to improve performance?
End-to-End SLA Monitoring
• Does Kafka process all events in <15 seconds?
• Is the 8am report missing data?
• Are there duplicate events?
58. 60
Confluent Control Center - Cluster Health & Administration
Cluster health dashboard
• Monitor the health of your Kafka clusters and get alerts if any problems occur
• Measure system load, performance, and operations
• View aggregate statistics or drill down by broker or topic
Cluster administration
• Monitor topic configurations
59. 61
Operate More Secure, Reliable and Performant Apache Kafka
For Operators
● Broker configuration view → see config across multiple Kafka clusters or check values for specific brokers
● Consumer lag → view how consumers are performing based on offset, spot potential issues and take proactive steps to keep performance high
● Feature access controls → control customer access to topic inspection, schemas, and KSQL
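Consumer lag is commonly defined, per partition, as the distance between the latest offset in the log and the offset the consumer group has committed. A minimal sketch of that arithmetic, assuming offsets have already been collected into dictionaries keyed by partition (the numbers are hypothetical, and a partition with no committed offset is treated here as committed at 0):

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return the lag per partition: log end offset minus committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


# Hypothetical offsets for a two-partition topic.
end_offsets = {0: 120, 1: 95}
committed = {0: 100, 1: 95}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
```

A lag that keeps growing on one partition is the kind of signal the consumer lag view is meant to surface: the consumer is falling behind the producers.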
60. 62
Build More Powerful Streaming Applications
For Developers
● Topic inspection → gain insight into the actual data in Kafka topics
● Schema registry integration → view older and current schema versions in a git-like UI
● KSQL GUI → create streams and tables from topics, experiment with transient queries, and run persistent queries to filter and enrich data
62. 64
Make stream processing more accessible
Build stream processing IP in CE
Manage streams & tables
Run KSQL (transient & persistent)
View persistent queries
KSQL UI
76. 78
Kafka Provides a Central Nervous System for the Modern Digital Enterprise
Enabling companies to respond accurately and in real time to business events