IoT Sensor Analytics with Apache Kafka, KSQL, TensorFlow and MQTT => Kafka-Native End-to-End IoT Data Integration and Processing.
Large numbers of IoT devices lead to big data and the need for further processing and analysis. Apache Kafka is a highly scalable and distributed open source streaming platform, which can connect to MQTT and other IoT standards. Kafka ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.
The rapidly expanding world of stream processing can be daunting, with new concepts to master such as various types of time semantics, windowed aggregates, changelogs, and programming frameworks. KSQL is the streaming SQL engine on top of Apache Kafka, which simplifies all this and makes stream processing available to everyone without the need to write source code.
This talk shows how to leverage Kafka and KSQL in an IoT sensor analytics scenario for predictive maintenance and integration with real time monitoring systems. A live demo shows how to embed and deploy Machine Learning models - built with frameworks like TensorFlow, DeepLearning4J or H2O - into mission-critical and scalable real time applications.
8. 10
Architecture (High Level)
[Diagram labels: Devices; Gateway; Connect w/ MQTT connector; Streaming Platform (Kafka Brokers); zones: Edge, Data Center / Cloud]
• Device Tracking (Real Time)
• Predictive Maintenance (Near Real Time)
• Log Analytics (Batch)
How to integrate?
9. 13
Agenda
1) IoT Use Cases
2) MQTT Standard
3) Apache Kafka Ecosystem
4) TensorFlow for IoT Scenarios
5) End-to-End IoT Integration Architecture(s)
6) IoT Data Processing
7) Live Demo: End-to-End Sensor Analytics
10. 14
MQTT - Publish / subscribe messaging protocol
• Built on top of TCP/IP for constrained devices and unreliable networks
• Many (open source) broker implementations
• Many client libraries
• IoT-specific features for bad network / connectivity
• Widely used (mostly IoT, but also web and mobile apps via MQTT over WebSockets)
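The publish / subscribe model can be illustrated with a toy in-memory broker in Python. This is only a sketch, not a real MQTT client or broker: `ToyBroker`, the callbacks, and the topic names are invented for this example, and only the standard MQTT wildcard rules (`+` for one level, `#` for all remaining levels) are modeled. QoS, sessions, and `$`-prefixed topics are left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT-style topic filter matches a concrete topic."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches any remaining levels (including none).
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """Hypothetical in-memory stand-in for an MQTT broker (no network, no QoS)."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, topic_filter: str, callback: Callable[[str, str], None]) -> None:
        self._subscriptions[topic_filter].append(callback)

    def publish(self, topic: str, payload: str) -> int:
        """Push the payload to every matching subscriber; return the delivery count."""
        delivered = 0
        for topic_filter, callbacks in self._subscriptions.items():
            if topic_matches(topic_filter, topic):
                for callback in callbacks:
                    callback(topic, payload)
                    delivered += 1
        return delivered


received = []
broker = ToyBroker()
broker.subscribe("+/car", lambda topic, payload: received.append((topic, payload)))
broker.publish("device-42/car", "speed=87")    # matches the "+/car" filter
broker.publish("device-42/truck", "speed=60")  # no matching subscriber
```

Publishers and subscribers never reference each other; they only share topic names, which is what lets many devices and many consumers be added independently.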
11. 17
MQTT Architecture (large scale)
[Diagram labels: Load Balancer; MQTT Server 1 to 4; topic: [deviceid]/car ...; Processor 1 to 4]
12. 18
MQTT Trade-Offs
Pros
• Lightweight
• Simple API
• Built for poor connectivity / high latency scenarios
• Many client connections (tens of thousands per MQTT server)
Cons
• Queuing, not stream processing
• Can’t handle usage surges (no buffering)
• No high scalability (true for most MQTT brokers)
• Very asynchronous processing (often offline for long time)
• No good integration with the rest of the enterprise
• No reprocessing of events
13. 19
Agenda
1) IoT Use Cases
2) MQTT Standard
3) Apache Kafka Ecosystem
4) TensorFlow for IoT Scenarios
5) End-to-End IoT Integration Architecture(s)
6) IoT Data Processing
7) Live Demo: End-to-End Sensor Analytics
14. 20
Apache Kafka – The Rise of a Streaming Platform
[Diagram labels: The Log; Connectors; Producer; Consumer; Streaming Engine]
17. 25
Apache Kafka at Scale
https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/63921 (2018)
https://qconlondon.com/london2018/presentation/cloud-native-and-scalable-kafka-architecture (2018)
18. 26
Kafka Trade-Offs (from IoT perspective)
Pros
• Stream processing, not just queuing
• High throughput
• Large scale
• High availability
• Long term storage and buffering
• Reprocessing of events
• Good integration with the rest of the enterprise
Cons
• Not built for tens of thousands of connections
• Requires stable network and good infrastructure
• No IoT-specific features like keep alive or last will and testament
20. 28
Agenda
1) IoT Use Cases
2) MQTT Standard
3) Apache Kafka Ecosystem
4) TensorFlow for IoT Scenarios
5) End-to-End IoT Integration Architecture(s)
6) IoT Data Processing
7) Live Demo: End-to-End Sensor Analytics
21. 29
TensorFlow
TensorFlow is an open source software library for high
performance numerical computation. Its flexible architecture
allows easy deployment of computation across a variety of
platforms (CPUs, GPUs, TPUs), and from desktops to clusters of
servers to mobile and edge devices. Originally developed by
researchers and engineers from the Google Brain team within
Google’s AI organization, it comes with strong support for
machine learning and deep learning and the flexible
numerical computation core is used across many other scientific
domains.
https://www.tensorflow.org/
22. 30
The First Analytic Models
How to deploy the models
in production?
…real-time processing?
…at scale?
…24/7 zero downtime?
23. 31
Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
25. 33
Apache Kafka’s Open Ecosystem as Infrastructure for ML
[Diagram labels: Kafka Streams; Kafka Connect; REST Proxy; Schema Registry; Go / .NET / Python; Kafka Producer; KSQL]
26. 37
Replayability: a log never forgets!
[Diagram labels: Time; Producer; Distributed Commit Log; Model A; Model B; Model X; Google Cloud Storage; HDFS]
• Different models with same data
• Different ML frameworks
• AutoML compatible
• A/B testing
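The replay idea can be sketched with a toy append-only log in Python. This is an in-memory illustration only, not the Kafka API: `ToyLog`, `model_a`, and `model_b` are hypothetical names, and the two "models" are plain threshold functions standing in for models built with different frameworks.

```python
from typing import Callable, List


class ToyLog:
    """Hypothetical append-only log; reading never removes records."""

    def __init__(self) -> None:
        self._records: List[float] = []

    def append(self, value: float) -> int:
        self._records.append(value)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset: int) -> List[float]:
        return self._records[offset:]


def score_all(log: ToyLog, model: Callable[[float], bool], offset: int = 0) -> List[bool]:
    """Replay the log from `offset` and apply `model` to every record."""
    return [model(value) for value in log.read_from(offset)]


log = ToyLog()
for temperature in [71.0, 88.5, 93.2, 69.9]:
    log.append(temperature)


def model_a(value: float) -> bool:
    # Stand-in for a first model: flags readings above 90.0.
    return value > 90.0


def model_b(value: float) -> bool:
    # Stand-in for a second model with a lower threshold.
    return value > 85.0


# Both models consume the same stored data independently, e.g. for A/B testing.
alerts_a = score_all(log, model_a)
alerts_b = score_all(log, model_b)
```

Because consuming does not delete anything, a model added later can still be evaluated on exactly the same history as the existing ones.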
28. 39
Model Deployment #1: RPC Communication to do Model Inference
[Diagram labels: Input Event; Streams; Request / Response (gRPC); Model Serving (TensorFlow Serving); Prediction]
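A minimal Python sketch of the RPC pattern follows. It assumes nothing about the real TensorFlow Serving or gRPC APIs: `StubModelServerClient` is a hypothetical stand-in for a remote client, and its scoring rule is made up. The point is only the shape of the interaction, with one request and one response per input event.

```python
from typing import Dict, Iterable, Iterator


class StubModelServerClient:
    """Hypothetical stand-in for a client that calls a remote model server."""

    def __init__(self) -> None:
        self.requests_sent = 0

    def predict(self, features: Dict[str, float]) -> float:
        # A real client would send a request over the network here and
        # wait for the response; this stub answers locally.
        self.requests_sent += 1
        return 1.0 if features["vibration"] > 0.8 else 0.0


def enrich_with_predictions(
    events: Iterable[Dict[str, float]], client: StubModelServerClient
) -> Iterator[Dict[str, float]]:
    """Stream-processing step: one request/response round trip per event."""
    for event in events:
        prediction = client.predict(event)
        yield {**event, "prediction": prediction}


client = StubModelServerClient()
events = [{"vibration": 0.2}, {"vibration": 0.95}]
scored = list(enrich_with_predictions(events, client))
```

In this pattern the model can be updated on the server without redeploying the streaming application, at the cost of a network round trip and an extra system that must be available for every event.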
29. 40
Model Deployment #2: Model Inference Natively in the App
[Diagram labels: Input Event; Streams; Prediction]
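Embedded inference can be sketched in Python as a model object that lives inside the stream-processing application. `EmbeddedModel` is a hypothetical linear scorer used purely for illustration; a real application would load a trained model (for example a TensorFlow model) at startup instead.

```python
from typing import List, Sequence, Tuple


class EmbeddedModel:
    """Hypothetical model held in the application's own process."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self._weights = list(weights)
        self._bias = bias

    def predict(self, features: Sequence[float]) -> float:
        # Plain in-process function call: no network hop, no remote dependency.
        return sum(w * x for w, x in zip(self._weights, features)) + self._bias


def process(events: Sequence[Sequence[float]], model: EmbeddedModel) -> List[Tuple[Sequence[float], float]]:
    """Stream-processing step: score each input event locally."""
    return [(event, model.predict(event)) for event in events]


model = EmbeddedModel(weights=[0.5, 2.0], bias=-1.0)
predictions = process([[1.0, 0.5], [2.0, 0.0]], model)
```

Here the application keeps working without any external model server, but updating the model means updating or redeploying the application itself.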
30. 41
Agenda
1) IoT Use Cases
2) MQTT Standard
3) Apache Kafka Ecosystem
4) TensorFlow for IoT Scenarios
5) End-to-End IoT Integration Architecture(s)
6) IoT Data Processing
7) Live Demo: End-to-End Sensor Analytics
32. 43
Architecture (High Level) – Machine Learning Perspective
[Diagram labels: Devices; Gateway; Connect w/ MQTT connector; Streaming Platform (Kafka Brokers); zones: Edge, Data Center / Cloud]
• Edge Analytics (Real Time Model Serving)
• Predictive Maintenance (Near Real Time Model Serving)
• Model Training (Batch)
37. 57
MQTT Proxy
[Diagram labels: Devices; MQTT; MQTT Proxy; Kafka Brokers; Kafka Consumer]
MQTT Proxy
• MQTT is push-based
• Horizontally scalable
• Consumes push data from IoT devices and forwards it to Kafka Broker at low latency
• Kafka Producer under the hood
• No MQTT Broker needed
Kafka Broker
• Source of truth
• Responsible for persistence, high availability, reliability
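The forwarding step of such a proxy can be sketched in Python as a pure mapping from an MQTT publish to a Kafka record. Everything here is an assumption made for illustration: the `[deviceid]/car` topic layout is taken from the earlier MQTT architecture slide, while using the device id as the record key and the second topic level as the Kafka topic name is a hypothetical choice, not the documented behavior of any particular proxy.

```python
from typing import NamedTuple


class KafkaRecord(NamedTuple):
    """Minimal stand-in for what a Kafka producer would send."""
    topic: str
    key: str
    value: bytes


def to_kafka_record(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Map an MQTT publish on '<deviceid>/<kind>' to a Kafka record.

    Assumed mapping: Kafka topic = <kind>, record key = <deviceid>, so all
    messages of one device share a key (and therefore a partition).
    """
    device_id, separator, kind = mqtt_topic.partition("/")
    if not separator or not device_id or not kind:
        raise ValueError(f"unexpected MQTT topic layout: {mqtt_topic!r}")
    return KafkaRecord(topic=kind, key=device_id, value=payload)


record = to_kafka_record("device-42/car", b'{"speed": 87}')
```

A real proxy would hand the resulting record to a Kafka producer; that call is omitted so the sketch runs without a broker.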
39. 60
Confluent REST Proxy
[Diagram labels: IoT Applications (REST / HTTP(S)); REST Proxy; Native Kafka Applications (Java, C, Go, …) (TCP)]
The "simple alternative" for IoT
• Simple and understood
• HTTP(S) Proxy → Push-based
• Security "easier"
• Scalable with standard load balancer (still synchronous HTTP)
• Not for very high throughput
• Implement Kafka Connect features in your client app
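As a rough sketch of what an IoT application sends to the REST Proxy, the Python snippet below builds (but does not send) a produce request. The host, port, topic name `car-sensors`, and the payload fields are assumptions for this example; the endpoint shape and content type follow the REST Proxy v2 API as I understand it and should be checked against the documentation for the version in use.

```python
import json
import urllib.request
from typing import Any


def build_produce_request(base_url: str, topic: str, key: str, value: Any) -> urllib.request.Request:
    """Build an HTTP POST that asks the REST Proxy to produce one JSON record."""
    body = json.dumps({"records": [{"key": key, "value": value}]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
        method="POST",
    )


# Hypothetical endpoint and topic; adjust to the actual deployment.
request = build_produce_request(
    "http://localhost:8082", "car-sensors", "device-42", {"speed": 87}
)

# Sending is left out so the sketch runs without a proxy:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read())
```

Each call is a synchronous HTTP round trip, which is why this option is simple to adopt but not suited to very high throughput.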
40. 62
Agenda
1) IoT Use Cases
2) MQTT Standard
3) Apache Kafka Ecosystem
4) TensorFlow for IoT Scenarios
5) End-to-End IoT Integration Architecture(s)
6) IoT Data Processing
7) Live Demo: End-to-End Sensor Analytics
41. 63
Processing Options for MQTT Data with Apache Kafka
[Diagram label: Streams]
Kafka native vs. additional big data cluster and technology (or others, you name it …)
42. 64
IoT Data Processing
[Diagram labels: MQTT Device; Kafka Client; Kafka Cluster; Kafka Connect; Kafka Streams / KSQL; Batch System; Analytics; Real Time System; Alerting; flows: All Data, Process Data Continuously, Forward Processed Data; zones: At the edge, On premise DC / Cloud; legend: Kafka Ecosystem, Other Components]
43. 68
KSQL – Continuous Queries for Streaming ETL / Anomaly Detection
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTES)
GROUP BY card_number
HAVING count(*) > 3;
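To make the semantics of the second query concrete, here is a small Python sketch that computes the same result over a finite list of events: count authorization attempts per card in fixed, non-overlapping 5-minute windows and keep the groups with more than 3 attempts. The event format (epoch seconds, card number) and the sample data are assumptions. KSQL evaluates this continuously as events arrive, whereas the sketch processes a batch once.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_SIZE_SECONDS = 5 * 60  # WINDOW TUMBLING (SIZE 5 MINUTES)


def possible_fraud(
    attempts: Iterable[Tuple[int, str]], threshold: int = 3
) -> Dict[Tuple[int, str], int]:
    """Count attempts per (window start, card number); keep counts > threshold.

    `attempts` holds (epoch_seconds, card_number) pairs.
    """
    counts: Counter = Counter()
    for timestamp, card_number in attempts:
        # Tumbling windows are aligned to multiples of the window size.
        window_start = timestamp - (timestamp % WINDOW_SIZE_SECONDS)
        counts[(window_start, card_number)] += 1
    # HAVING count(*) > 3
    return {key: count for key, count in counts.items() if count > threshold}


attempts = [
    (0, "1111"), (60, "1111"), (120, "1111"), (299, "1111"),  # 4 in window [0, 300)
    (300, "1111"),                                            # next window
    (10, "2222"),                                             # different card
]
flagged = possible_fraud(attempts)
```

The attempt at second 300 falls into the next window, so it does not add to the count of the first one; this boundary behavior is what distinguishes tumbling windows from a sliding "last 5 minutes" check.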
44. 69
Agenda
1) IoT Use Cases
2) MQTT Standard
3) Apache Kafka Ecosystem
4) TensorFlow for IoT Scenarios
5) End-to-End IoT Integration Architecture(s)
6) IoT Data Processing
7) Live Demo: End-to-End Sensor Analytics
45. 70
KSQL and Deep Learning (Auto Encoder) for Anomaly Detection
[Diagram labels: Car Sensors; MQTT Proxy; Kafka Cluster; KSQL; Kafka Connect; Elasticsearch; Grafana; Real Time Emergency System; flows: All Data, Potential Defect; steps: Apply Analytic Model, Filter Anomalies; zones: At the edge, On premise DC; legend: Kafka Ecosystem, Other Components]
46. 71
Model Training with Python, KSQL, TensorFlow, Keras and Jupyter
https://github.com/kaiwaehner/python-jupyter-apache-kafka-ksql-tensorflow-keras
47. 72
Model Deployment with Apache Kafka, KSQL and TensorFlow
CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;
detectAnomaly is a User Defined Function (UDF)
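The logic such a UDF might wrap can be sketched in Python. With an autoencoder, a common approach is to reconstruct the input and flag it as anomalous when the reconstruction error exceeds a threshold. The sketch below assumes that approach; the threshold, the sample rows, and the stand-in `reconstruct` function (which replaces a trained model) are all hypothetical and not taken from the demo project.

```python
from typing import Callable, List, Sequence, Tuple


def reconstruction_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between the input and its reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def make_detect_anomaly(
    reconstruct: Callable[[Sequence[float]], Sequence[float]], threshold: float
) -> Callable[[Sequence[float]], bool]:
    """Wrap a reconstruction function into a per-row anomaly check."""

    def detect_anomaly(sensor_values: Sequence[float]) -> bool:
        error = reconstruction_error(sensor_values, reconstruct(sensor_values))
        return error > threshold

    return detect_anomaly


# Stand-in for a trained autoencoder: it "reconstructs" every reading as the
# normal operating point, so unusual readings produce a large error.
NORMAL_OPERATING_POINT = [1.0, 2.0, 3.0]
detect_anomaly = make_detect_anomaly(lambda values: NORMAL_OPERATING_POINT, threshold=0.5)

# Equivalent of: SELECT sensor_id, detectAnomaly(sensor_values) FROM car_engine
rows: List[Tuple[str, List[float]]] = [
    ("engine-1", [1.0, 2.0, 3.0]),
    ("engine-2", [1.0, 2.0, 6.0]),
]
result = [(sensor_id, detect_anomaly(values)) for sensor_id, values in rows]
```

Registering the function as a UDF lets it be applied to every event in the stream from a single SQL statement, without a separate model server.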
50. 75
Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data
https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot