Real-World Deployments of Data Streaming with Apache Kafka across the Healthcare Value Chain, using open source and cloud-native technologies and serverless SaaS:
1) Legacy Modernization and Hybrid Cloud: Optum (UnitedHealth Group), Centene, Bayer
2) Streaming ETL (Bayer, Babylon Health)
3) Real-time Analytics (Cerner, Celmatix, CDC/Centers for Disease Control and Prevention)
4) Machine Learning and Data Science (Recursion, Humana)
5) Open API and Omnichannel (Care.com, Invitae)
Data in Motion Healthcare Industry
1. The Rise of Data in Motion in the Healthcare Industry
Use Cases, Architectures and Examples powered by Apache Kafka
Kai Waehner
Field CTO
kai.waehner@confluent.io
linkedin.com/in/kaiwaehner
@KaiWaehner
confluent.io
kai-waehner.de
2. Data in Motion with Apache Kafka in the Healthcare Industry – @KaiWaehner - www.kai-waehner.de
Healthcare includes many topics…
https://isilanguagesolutions.com/2019/02/25/what-are-the-differences-between-health-care-medical-life-science-and-pharmaceutical-translations/
Healthcare Value Chain
https://www.researchgate.net/publication/265654743_The_business_of_healthcare_innovation_in_the_Wharton_School_curriculum
The world is changing.
Covid Increases the Pressure
“Pandemic drives digital adoption forward 5 years in a span of 8 weeks.”
Digital adoption through COVID and beyond, McKinsey
Digital Health Ecosystem Disruption
Digital health ecosystems: A payer perspective (McKinsey article, August 2019)
This transformation is happening everywhere
Doctors become Software
Medical Research becomes Software
Patient Data becomes Software
Security becomes Software
Healthcare Companies and Organizations
What enables this transformation?
Real-time Data beats Slow Data.
Emergency: Real-time sensor diagnostics, Intelligent routing, ETA updates
Patient Care: Diagnosis, Treatment, Connected Health
Insurance: Member Enrollment, Claim processing, Omnichannel patient experience
Cybersecurity: Threat detection, Incident response, Data privacy protection
This is a fundamental paradigm shift...
Cloud: infrastructure as code, the future of the datacenter
Event Streaming: data in motion as continuous streams of events, the future of data
What is Data in Motion?
‘Event’ is what happens in your business (each event is captured in Kafka):
Transportation: GPS in the ambulance sends ETA to the hospital at 5:11am.
Insurance Claim: Alice filed a healthcare insurance claim Friday at 7:34pm.
Patient Interaction: The doctor updates Sabine’s case status at 9:10am.
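To make the event idea concrete, here is a minimal sketch in plain Python (no Kafka client involved): each event is an immutable record with a key, a value and a timestamp, and events with the same key are assigned to the same partition, which is how Kafka preserves per-key ordering. The topic names, keys and dates are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner actually uses.

```python
import zlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """An immutable fact: what happened, to which entity, and when."""
    topic: str        # hypothetical topic name
    key: str          # entity the event belongs to (ambulance, member, patient)
    value: str
    timestamp: datetime


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in for Kafka's default partitioner (which uses murmur2, not CRC32):
    # the same key always maps to the same partition, keeping per-key order.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical dates; only the times of day come from the slide.
events = [
    Event("ambulance-gps", "ambulance-17",
          "GPS sends ETA to the hospital", datetime(2021, 3, 5, 5, 11)),
    Event("insurance-claims", "alice",
          "Healthcare insurance claim filed", datetime(2021, 3, 5, 19, 34)),
    Event("patient-cases", "sabine",
          "Doctor updates case status", datetime(2021, 3, 8, 9, 10)),
]

for event in events:
    print(event.topic, event.key, "-> partition", partition_for(event.key, 6))
```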
Data in Motion in the Healthcare Industry
Your Business as Streams of Events, powered by Kafka: Insurance Claim Processing, Contact Relatives, Patient Diagnosis, Surgery, Ambulance Emergency Situation
An Event Streaming Platform is the Underpinning of an Event-driven Architecture
Producers (e.g., MES, ERP, sensors, mobile) publish streams of real-time events; consumers (e.g., Customer 360, a real-time alerting system, the data warehouse) subscribe to them. Connectors and stream processing apps sit on both sides. Example event streams: Supplier, Alert, Forecast, Inventory, Customer, Order.
With Data in Motion…
Universal Event Pipeline from data stores, logs, 3rd party apps, and custom apps / microservices (e.g., Hadoop, device logs, apps, microservices, mainframes, data warehouse, Splunk, ...)
Contextual Event-Driven Applications: Supply Chain Management, Medical Fraud Detection, Patient & Beneficiary 360, Disease Spread Modeling, HL Data Transformation, ...
Public Health Data Automation in Confluent
Connectors: CDC, MQ, REST Proxy, EDI / Batch Input, from legacy data storage and processing
Processing: claims and clinical streams, Schema Registry, ksqlDB / Kafka Streams, HL7-FHIR
Sinks: microservices, analytics, sink connectors
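The governance step in this pipeline can be illustrated with a simplified sketch: records are checked against a per-type schema before being routed to a claims or clinical topic, and records that fail validation go to a dead-letter topic. The schemas, field names and topic names are hypothetical, and the dict-of-types check is only a stand-in for the Avro, JSON Schema or Protobuf definitions that Schema Registry would manage.

```python
from collections import defaultdict

# Hypothetical, simplified schemas (field name -> expected Python type).
# Schema Registry would store Avro, JSON Schema or Protobuf definitions instead.
SCHEMAS = {
    "claim": {"claim_id": str, "member_id": str, "amount": float},
    "clinical": {"patient_id": str, "code": str, "observed_at": str},
}


def validate(record: dict) -> bool:
    schema = SCHEMAS.get(record.get("type"))
    if schema is None:
        return False
    return all(isinstance(record.get(field), typ) for field, typ in schema.items())


def route(records):
    """Split records into per-topic lists; invalid ones go to a dead-letter topic."""
    topics = defaultdict(list)
    for record in records:
        if not validate(record):
            topics["dead-letter"].append(record)
        elif record["type"] == "claim":
            topics["claims"].append(record)
        else:
            topics["clinical"].append(record)
    return topics


incoming = [
    {"type": "claim", "claim_id": "c1", "member_id": "m9", "amount": 120.0},
    {"type": "clinical", "patient_id": "p7", "code": "lab-result",
     "observed_at": "2020-05-01"},
    {"type": "claim", "claim_id": "c2", "member_id": "m9"},  # missing amount
]
topics = route(incoming)
print({name: len(records) for name, records in topics.items()})
```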
Example: Benefits application process
Software-using (weeks): beneficiary form, intake, case manager, application review, benefits application, approve / deny
Software-defined (seconds): 1. beneficiary benefits app UI; 2. benefits service, risk/fraud service, and external agency service; 3. approve / deny
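The software-defined flow can be sketched as one application checked by three services whose results are combined into an approve/deny decision. This is an illustrative assumption, not the actual system: the eligibility rule, the ID sets and the service functions are hypothetical, and in an event-driven deployment each service would consume the application from a topic and publish its result instead of being called directly.

```python
from dataclasses import dataclass


@dataclass
class Application:
    applicant_id: str
    income: float
    household_size: int


# Hypothetical stand-ins for the three services in the diagram.
def benefits_service(app: Application) -> bool:
    # Hypothetical eligibility rule: income limit scales with household size.
    return app.income <= 20_000 + 5_000 * app.household_size


def risk_fraud_service(app: Application, flagged_ids: set) -> bool:
    return app.applicant_id not in flagged_ids


def external_agency_service(app: Application, verified_ids: set) -> bool:
    return app.applicant_id in verified_ids


def decide(app: Application, flagged_ids: set, verified_ids: set) -> str:
    """Combine the three service results into one decision event."""
    results = [
        benefits_service(app),
        risk_fraud_service(app, flagged_ids),
        external_agency_service(app, verified_ids),
    ]
    return "APPROVE" if all(results) else "DENY"


verified = {"a-1", "a-2"}
flagged = {"a-2"}
print(decide(Application("a-1", 30_000, 3), flagged, verified))  # within limit
print(decide(Application("a-2", 10_000, 1), flagged, verified))  # flagged
```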
Use Cases for Data in Motion in the Healthcare Industry
Know Your Patient (= “Customer 360”)
● Digital Transformation
● eCommerce Optimization
● Product Catalog Optimization
● Product-Inventory Profiling and Filtering by Customer or Persona
● Real-time Pricing Models
● Next Best Offer/Cross-Sell/Recommendations
● Omni-Channel Experience
● Customer Profile Updates
● …
Operations (Healthcare 4.0 including Drug R&D, Patient Care, etc.)
● Supply Chain Optimization
● Shipment Notifications/Delays
● Inventory Processing and Oversight
● Predictive Inventory Management
● Connected Health
● Improved Care
● Proactive Patient Care
● Patient Notifications
● Pharma Modernization
● M&A Rapid Integration
● …
IT Perspective
● Cybersecurity/SIEM Optimization
● Mainframe Offload
● Hybrid Cloud Integration/Bridge to Cloud
● Middleware/Messaging Modernization
● Streaming ETL & Analytics
● …
Real World Deployments
Data in Motion across the Healthcare Value Chain
1. Legacy Modernization and Hybrid Cloud
2. Streaming ETL
3. Real-time Analytics
4. Machine Learning and Data Science
5. Open API and Omnichannel
Data in Motion across the Healthcare Value Chain: 1. Legacy Modernization and Hybrid Cloud
Optum – Self-Service Kafka
American pharmacy benefit manager and health care provider (subsidiary of UnitedHealth Group)
Kafka as a Service within UnitedHealth Group
Centrally managed and utilized by over 200 internal application teams
Repeatable, scalable, cost-efficient way to standardize data
From mainframe via CDC into modern data processing and analytics tools
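A rough sketch of what CDC produces, assuming a Debezium-style change-event envelope (`op`, `before`, `after`); the source does not say which CDC tool Optum uses, and the log entries and field names below are hypothetical. Real CDC tools read the database's transaction log; here a small list of log entries plays that role.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    op: str                 # "c" = create, "u" = update, "d" = delete (Debezium-style codes)
    key: str
    before: Optional[dict]
    after: Optional[dict]


def capture_changes(log_entries, table: dict):
    """Turn hypothetical transaction-log entries into change events.

    `table` mirrors the source table's current state so each event can
    carry both the before and the after image of the row.
    """
    events = []
    for action, key, row in log_entries:
        before = table.get(key)
        if action == "INSERT":
            table[key] = row
            events.append(ChangeEvent("c", key, None, row))
        elif action == "UPDATE":
            table[key] = row
            events.append(ChangeEvent("u", key, before, row))
        elif action == "DELETE":
            table.pop(key, None)
            events.append(ChangeEvent("d", key, before, None))
    return events


log = [
    ("INSERT", "member-1", {"plan": "basic"}),
    ("UPDATE", "member-1", {"plan": "premium"}),
    ("DELETE", "member-1", None),
]
for event in capture_changes(log, table={}):
    print(event)
```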
Centene
Integration and Data Processing at Scale in Real-Time
Healthcare insurer that acts as an intermediary for both government-sponsored and privately insured health care programs
Largest Medicaid and Medicare Managed Care Provider in the US
https://www.confluent.io/online-talks/building-an-enterprise-eventing-framework-on-demand/
Bayer AG – Hybrid Real-Time Data Flow
Adopted a cloud-first strategy and started a multi-year transition to the cloud.
A Kafka-based cross-datacenter DataHub was created to facilitate the migration and to drive the shift to real-time stream processing.
Strong enterprise adoption; supports a myriad of use cases.
https://www.confluent.io/kafka-summit-sf18/bringing-streaming-data-to-the-masses
Data in Motion across the Healthcare Value Chain: 2. Streaming ETL
Bayer AG – Data Integration and Processing in R&D
Analysis of clinical trials, patents, reports, news, literature, etc.
250M documents, 7TB raw text from 30 data sources.
Variety of document streams with different formats and schemas flowing through several text processing and enrichment steps.
Scalable, reliable Kafka pipelines with Kafka Streams (Java) and Faust (Python) replaced custom, error-prone, non-scalable scripts.
https://www.kafka-summit.org/sessions/bayer-document-stream-pipelines
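The idea of several processing and enrichment steps chained over a document stream can be sketched with Python generators. This does not use the Faust or Kafka Streams APIs; the source formats, the cleaning rule and the term-tagging enrichment are hypothetical placeholders for Bayer's actual steps.

```python
import re


def parse(raw_docs):
    # Different sources deliver different formats; normalize to one dict shape.
    for source, payload in raw_docs:
        if source == "patents":
            yield {"source": source, "text": payload["abstract"]}
        else:
            yield {"source": source, "text": payload}


def clean(docs):
    for doc in docs:
        text = re.sub(r"\s+", " ", doc["text"]).strip()
        if text:                      # drop empty documents
            yield {**doc, "text": text}


def enrich(docs, vocabulary):
    # Hypothetical enrichment step: tag each document with known terms.
    for doc in docs:
        lowered = doc["text"].lower()
        yield {**doc, "terms": sorted(t for t in vocabulary if t in lowered)}


raw = [
    ("news", "New  trial results for\n aspirin"),
    ("patents", {"abstract": "A formulation of ibuprofen ..."}),
    ("literature", "   "),
]
# Each stage consumes the previous one lazily, like chained stream processors.
for doc in enrich(clean(parse(raw)), vocabulary={"aspirin", "ibuprofen"}):
    print(doc)
```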
Babylon Health – Secure and Agile Integration
Connectivity + Agile Microservice Architecture.
GDPR and PII compliant security.
https://www.confluent.io/kafka-summit-lon19/one-key-to-rule-them-all
Data in Motion across the Healthcare Value Chain: 3. Real-time Analytics
Cerner – Sepsis Alerting
Supplier of health information technology services, devices, and hardware
About 30% of all US healthcare data resides in a Cerner solution
Central event streaming platform for real-time sepsis alerting to save lives
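A stateful streaming rule of this kind can be sketched as follows: keep the latest vital signs per patient and raise an alert when enough criteria are met at once. The thresholds are illustrative placeholders loosely resembling SIRS-style screening; they are not clinical guidance and not Cerner's algorithm, and the event fields are hypothetical.

```python
# Illustrative thresholds only; NOT clinical guidance and NOT Cerner's logic.
CRITERIA = {
    "temp_c": lambda v: v > 38.0 or v < 36.0,
    "heart_rate": lambda v: v > 90,
    "resp_rate": lambda v: v > 20,
}


def sepsis_alerts(vital_events, min_criteria=2):
    """Keep the latest vital signs per patient (the stream processor's state)
    and yield an alert when enough criteria are met at the same time."""
    latest = {}                       # patient_id -> {vital: latest value}
    alerted = set()
    for patient_id, vital, value in vital_events:
        state = latest.setdefault(patient_id, {})
        state[vital] = value
        met = [name for name, check in CRITERIA.items()
               if name in state and check(state[name])]
        if len(met) >= min_criteria and patient_id not in alerted:
            alerted.add(patient_id)   # alert once per patient in this sketch
            yield patient_id, sorted(met)


stream = [
    ("p1", "heart_rate", 95),
    ("p2", "temp_c", 37.0),
    ("p1", "temp_c", 38.6),   # second criterion for p1 -> alert
    ("p1", "resp_rate", 24),  # already alerted, no duplicate
]
for alert in sepsis_alerts(stream):
    print("ALERT", alert)
```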
Celmatix - Reproductive Health Care
https://www.confluent.io/customers/celmatix/
Preclinical-stage biotech company that provides digital tools and genetic insights focused on fertility.
Personalized information to disrupt how women approach their lifelong reproductive health journey.
Real-time aggregation of heterogeneous data collected from Electronic Medical Records (EMRs) and genetic data collected from partners through their Personalized Reproductive Medicine (PReM) Initiative.
Proactive reproductive health decisions by leveraging real-time genomics data and applying technologies such as big data analytics, machine learning, AI and whole-genome DNA sequencing.
Data governance for security and compliance.
Centers for Disease Control and Prevention (CDC): Covid-19 Electronic Lab Reporting
https://www.confluent.io/resources/kafka-summit-2020/flattening-the-curve-with-kafka/
CELR (COVID Electronic Lab Reporting)
Case notifications, lab reporting, and healthcare interoperability in real time
Track the threat of the COVID-19 virus to provide comprehensive data for local, state, and federal response
Better understand locations with an increase in incidence
Rapidly aggregate, validate, transform, and distribute laboratory testing data submitted by public health departments and other partners
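The "aggregate, validate, transform" steps can be sketched on a small stream of lab reports: each report is normalized and validated, invalid ones are counted as rejected, and valid ones update running per-state counts. The field names and the simple validation rules are assumptions for illustration, not CDC's actual CELR schema.

```python
from collections import Counter

VALID_RESULTS = {"positive", "negative"}


def normalize(report: dict):
    """Validate and transform one hypothetical lab report; return None if invalid."""
    state = str(report.get("state", "")).strip().upper()
    result = str(report.get("result", "")).strip().lower()
    if len(state) != 2 or result not in VALID_RESULTS:
        return None
    return {"state": state, "result": result}


def aggregate(reports):
    """Maintain running test and positive counts per state."""
    tests, positives, rejected = Counter(), Counter(), 0
    for raw in reports:
        report = normalize(raw)
        if report is None:
            rejected += 1
            continue
        tests[report["state"]] += 1
        if report["result"] == "positive":
            positives[report["state"]] += 1
    return tests, positives, rejected


reports = [
    {"state": "ga", "result": "Positive"},
    {"state": "GA", "result": "negative"},
    {"state": "NY", "result": "positive"},
    {"state": "N/A", "result": "positive"},   # rejected: invalid state code
]
tests, positives, rejected = aggregate(reports)
print(dict(tests), dict(positives), rejected)
```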
Data in Motion across the Healthcare Value Chain: 4. Machine Learning and Data Science
Recursion – Discovering Drugs in Real-Time
Accelerate drug discovery.
Find drug treatments by processing biological images.
Massively parallel system.
Combines experimental biology, artificial intelligence,
automation and real-time event streaming.
https://www.confluent.io/customers/recursion
https://www.confluent.io/kafka-summit-san-francisco-2019/discovering-drugs-with-kafka-streams
Humana – Real-Time Integration and Analytics
Interoperability platform to transition from an Insurance Company with Elements of Health to a true Health Company with Elements of Insurance.
Consumer-centric, health plan agnostic, provider agnostic. Cloud resilient and elastic. Event-driven and real-time.
Inter-organization data sharing (aka “data exchange / data sharing”)
Use cases include real-time updates of health information (connecting HCPs to pharmacies), reducing pre-authorizations from 20-30 minutes to 1 minute, and real-time home healthcare assistant communication
https://www.confluent.io/resources/kafka-summit-2020/levi-bailey-keynote-humana-improving-health-with-event-driven-architectures/
Data in Motion across the Healthcare Value Chain: 5. Open API and Omnichannel
Care.com – Trusted Caregivers
Online marketplace for a range of care services, including senior care and housekeeping
Bravo Platform: a simple, unified IT architecture to streamline go-to-market initiatives
From a monolithic architecture to a truly decoupled, scalable microservices platform
Migration from Confluent Platform to Confluent Cloud to focus on business problems
Data governance with Schema Registry across different runtimes (Java, .NET, Go, etc.)
“Care APIs” (inspired by Google APIs) define all of their data and service contracts with Protobuf
Enhanced security for PII data with fine-grained RBAC and data lineage
https://www.confluent.io/customers/care-com/
Invitae – Data Science and 24/7 Production
Biotechnology company that provides DNA-based testing for the detection of genetic abnormalities beyond what can be identified through traditional methodologies
Gene panels and single-gene testing for a broad range of clinical areas, including hereditary cancer, cardiology, neurology, pediatric genetics, metabolic disorders, immunology, and hematology
Bring comprehensive genetic information into mainstream medical practice to improve the quality of healthcare for billions of people
Omnichannel: Genetic results are often just the beginning. Invitae's interactive, educational portal and caring genetic counselors can help you understand your results and what to do next.
Truly decoupled infrastructure to enable others to join in and consume the data.
Paradigm shift: Building an application entirely of streams.
https://www.confluent.io/kafka-summit-san-francisco-2019/from-zero-to-streaming-healthcare-in-production
What is Data Streaming with the Apache Kafka Ecosystem?
Kafka: The Trinity of Event Streaming
01 Publish & Subscribe to Streams of Events
02 Store your Event Streams
03 Process & Analyze your Event Streams
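The three capabilities can be illustrated with a toy, in-memory append-only log standing in for a single topic partition: producers append events (publish), events stay in the log after being read (store), and each consumer group tracks its own offset and processes events independently. The class, the event contents and the threshold are hypothetical; real Kafka adds partitions, replication and time- or size-based retention.

```python
class TopicLog:
    """In-memory stand-in for one Kafka topic partition: an append-only log."""

    def __init__(self):
        self._events = []
        self._offsets = {}             # consumer group -> next offset to read

    def publish(self, event) -> int:
        self._events.append(event)     # 02: events are stored, not deleted on read
        return len(self._events) - 1   # offset of the new event

    def poll(self, group: str, max_events: int = 10):
        """01: each consumer group reads independently from its own offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for amount in (120, 80, 4500):
    log.publish({"claim_amount": amount})

# 03: a processing step, e.g. flag unusually large claims (hypothetical threshold).
large = [e for e in log.poll("fraud-check") if e["claim_amount"] > 1000]
# A second consumer group still sees every event from the beginning.
everything = log.poll("data-warehouse")
print(large, len(everything))
```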
Kafka Makes Your Business Real-time
CREATE STREAM payments (user VARCHAR, amount INT)
  WITH (kafka_topic = 'all_payments', value_format = 'avro');

-- updateScore is a custom aggregate function (UDAF), not a ksqlDB built-in
CREATE TABLE credit_scores AS
  SELECT user, updateScore(p.amount) AS credit_score
  FROM payments AS p
  GROUP BY user
  EMIT CHANGES;
Diagram: Credit Service and Risk Service, each built with ksqlDB
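What the `credit_scores` table computes can be emulated in plain Python: one running score per user, with a changelog record emitted for each incoming payment, similar to `EMIT CHANGES`. The slide does not show `updateScore`, so the scoring rule, starting score and cap below are hypothetical.

```python
# Hypothetical scoring rule; the real updateScore logic is not shown on the slide.
INITIAL_SCORE = 600
MAX_SCORE = 850


def update_score(current: int, amount: int) -> int:
    return min(MAX_SCORE, current + amount // 100)


def credit_scores(payments):
    """Emulate the GROUP BY user aggregation: keep one score per user and
    emit a changelog record every time a user's row is updated."""
    table = {}
    for user, amount in payments:
        table[user] = update_score(table.get(user, INITIAL_SCORE), amount)
        yield user, table[user]        # like EMIT CHANGES


payments = [("alice", 2500), ("bob", 900), ("alice", 30000)]
for change in credit_scores(payments):
    print(change)
```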
Why can’t I do this with my existing data platforms? (Databases, Messaging, ETL / Data Integration, Data Warehouse)
Enterprise Data Platform Requirements Are Shifting
1. From built for historical data to built for real-time events. Value: trigger real-time workflows (e.g., real-time order management)
2. From scalable for transactional data to scalable for ALL data. Value: scale across the enterprise (e.g., customer 360)
3. From transient to persistent + durable. Value: build mission-critical apps with zero data loss (e.g., instant payments)
4. From raw to enriched data. Value: add context & situational awareness (e.g., ride sharing ETA)
Only Event Streaming Has All 4 Requirements
Requirements: built for real-time events, scalable for all data, persistent & durable, capable of enrichment
● Databases: good for transactional applications
● Messaging: good for ultra low-latency, fire-and-forget use cases
● ETL / Data Integration: good for batch data integration
● Data Warehouse: good for historical analytics and reporting
● Event Streaming: platform for event-driven transformation (scalable messaging + real-time data integration + stream processing)
Project Example: Drug Discovery
Use Case: Drug Discovery
“On average, it takes at least ten years for a new medicine to complete the journey from initial discovery to the marketplace” (PhRMA)
http://phrma-docs.phrma.org/sites/default/files/pdf/rd_brochure_022307.pdf
Image and Video Processing
… is, at a high level, “just” pixels (arrays of numbers) and matrix multiplication
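A small illustration of the "just pixels and matrix math" point, assuming a grayscale image stored as nested lists of intensities (0 to 255): a 3x3 kernel slides over the image and computes a dot product at each position, which is the core operation of classic image filters and convolutional neural networks. Libraries typically turn this into large matrix multiplications; the pure-Python loops here only show the arithmetic, and the image and kernel are hypothetical examples.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution (no padding) of a grayscale image.

    Like most CNN libraries, this does not flip the kernel
    (strictly speaking, it computes a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 3x5 grayscale image: dark on the left, bright (255) on the right.
image = [
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
    [0, 0, 0, 255, 255],
]
# Horizontal gradient kernel: responds to vertical edges.
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(convolve2d(image, edge_kernel))  # [[0, 765, 765]]: flat area, then the edge
```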
Drug Discovery in a manual, slow, bursty batch mode that does not scale
Drug Discovery in an automated, scalable, reliable real-time mode
Digital Image Processing for Drug Discovery
Find drug treatments by processing biological images:
• ML models can be trained to distinguish healthy cells from diseased cells with problematic genes
• Grow healthy cells and diseased cells in labs
• Apply different drugs → make diseased cells look healthy again
Kafka, ksqlDB and TensorFlow for Drug Discovery in Real Time at Scale
Architecture components (Confluent Platform and other components):
● Laboratory (Windows machines), Kafka client (.NET, C++)
● Confluent Server with Tiered Storage, Kafka Connect
● Digital image processing (OpenCV SaaS service, REST API)
● Streaming ETL (ksqlDB), stateful workflow orchestration (Kafka Streams)
● Model training and scoring (Python client + TensorFlow)
● Database (MySQL), Kafka Connect (Oracle CDC), historical drugs data
● Batch reporting platform, BI dashboard, human intelligence
Data flows: images, processed images, all data
Ingestion of Images: from the laboratory via Kafka Connect, with replication via Cluster Linking
Data Preprocessing
Preprocessing with Kafka Streams: filter, transform, anonymize, extract features, reduce noise, enhance brightness / contrast. Result: data ready for model training.
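Three of these preprocessing steps (filter, anonymize, enhance contrast) can be sketched in plain Python. Note that a keyed hash (HMAC) is pseudonymization rather than full anonymization in GDPR terms. The field names and the secret key are hypothetical, and the slide implements this step with Kafka Streams, not with this code.

```python
import hashlib
import hmac

# Hypothetical key; in practice keep it in a secret store, never in code.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(subject_id: str) -> str:
    # Keyed hash: stable pseudonym per subject, not reversible without the key.
    digest = hmac.new(SECRET_KEY, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def stretch_contrast(pixels):
    """Min-max rescale pixel intensities to the full 0-255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def preprocess(images):
    for img in images:
        if not img["pixels"]:                                 # filter: drop empty images
            continue
        yield {
            "subject": pseudonymize(img["subject_id"]),       # anonymize (pseudonymize)
            "experiment_id": img["experiment_id"],
            "pixels": stretch_contrast(img["pixels"]),        # enhance contrast
        }


raw_images = [
    {"subject_id": "sabine-42", "experiment_id": "e1", "pixels": [50, 100, 150]},
    {"subject_id": "alice-7", "experiment_id": "e2", "pixels": []},
]
print(list(preprocess(raw_images)))
```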
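Data Processing with ksqlDB
-- Assumes image_channel is a stream and experiment_database a table keyed by experiment_id.
-- Inner JOIN: the WHERE filter on e.image_type drops unmatched rows anyway.
SELECT i.image_id, i.experiment_id, i.image_details
FROM image_channel i
JOIN experiment_database e ON i.experiment_id = e.experiment_id
WHERE e.image_type = 'black_and_white'
EMIT CHANGES;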
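The semantics of the stream-table join above can be emulated as follows: each image event looks up the current row for its `experiment_id` in a table, and the `WHERE` filter keeps only black-and-white experiments, so images without a matching experiment are dropped either way. The sample data is hypothetical.

```python
def join_images_with_experiments(image_events, experiments, image_type):
    """Emulate a stream-table join: each image event looks up the current
    row of its experiment (a table keyed by experiment_id)."""
    for event in image_events:
        experiment = experiments.get(event["experiment_id"])
        if experiment is None:                       # no matching table row
            continue
        if experiment["image_type"] == image_type:   # the WHERE clause
            yield {
                "image_id": event["image_id"],
                "experiment_id": event["experiment_id"],
                "image_details": event["image_details"],
            }


experiments = {                      # table state: latest value per key
    "e1": {"image_type": "black_and_white"},
    "e2": {"image_type": "color"},
}
images = [
    {"image_id": "i1", "experiment_id": "e1", "image_details": "plate 3, well B7"},
    {"image_id": "i2", "experiment_id": "e2", "image_details": "plate 3, well B8"},
    {"image_id": "i3", "experiment_id": "e9", "image_details": "plate 4, well A1"},
]
rows = join_images_with_experiments(images, experiments, "black_and_white")
print([row["image_id"] for row in rows])
```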
Streaming Ingestion and Model Training with TensorFlow IO
Direct streaming ingestion for model training and / or scoring with TensorFlow I/O + Kafka plugin (no additional data storage like S3 or HDFS required!)
Diagram: a producer appends to the distributed commit log; Model A and Model B consume from it over time.
https://github.com/tensorflow/io
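The core idea, updating a model directly from events as they are consumed instead of first landing them in S3 or HDFS, can be sketched with plain online stochastic gradient descent. This is not the TensorFlow I/O API: the linear model, the synthetic event stream (generated from y = 2x + 1) and the learning rate are assumptions for illustration.

```python
import itertools


def event_stream():
    """Hypothetical endless stream of (feature, label) events from y = 2x + 1."""
    for x in itertools.cycle([0.0, 0.25, 0.5, 0.75, 1.0]):
        yield x, 2 * x + 1


def train_online(events, learning_rate=0.1, max_events=5000):
    """Update a linear model y = w*x + b one event at a time (plain SGD),
    so no intermediate file store is needed between the log and the model."""
    w, b = 0.0, 0.0
    for x, y in itertools.islice(events, max_events):
        error = (w * x + b) - y
        w -= learning_rate * error * x
        b -= learning_rate * error
    return w, b


w, b = train_online(event_stream())
print(f"w={w:.3f}, b={b:.3f}")
```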
Confluent Tiered Storage for Kafka
Use Cases for Reprocessing Historical Events
Give me all events from time A to time B: a real-time producer appends events over time; real-time consumers read the latest events while consumers of historical data replay older ones.
• New consumer application
• Error-handling
• Compliance / regulatory processing
• Query and analyze existing events
• Schema changes in analytics platform
• Model training
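"Give me all events from time A to time B" boils down to a seek followed by a sequential read. The sketch below mirrors Kafka's `offsetsForTimes` lookup (earliest offset whose timestamp is at or after the target) with a binary search over an in-memory log; it assumes non-decreasing timestamps, and the events are hypothetical.

```python
import bisect


def events_between(log, start_ts, end_ts):
    """Return all events with start_ts <= timestamp < end_ts.

    `log` is a list of (timestamp, event) in append order, assumed to have
    non-decreasing timestamps (e.g. broker-assigned append times).
    """
    timestamps = [ts for ts, _ in log]
    offset = bisect.bisect_left(timestamps, start_ts)   # seek, like offsetsForTimes
    result = []
    while offset < len(log) and log[offset][0] < end_ts:
        result.append(log[offset][1])                   # sequential read
        offset += 1
    return result


log = [(100, "admit"), (200, "lab-order"), (300, "diagnosis"), (400, "discharge")]
print(events_between(log, 150, 400))
```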
Separation of Model Training and Model Inference
Model training in the cloud; model deployment at the edge, where the analytic model makes local predictions
Stream Processing with External Model and RPC
A streams application sends each input event as a prediction request (gRPC / HTTP) to an external model server (TensorFlow Serving) and receives the response.
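A minimal sketch of the RPC pattern, assuming TensorFlow Serving's REST predict endpoint (`POST /v1/models/<name>:predict` with an `instances` list, answered by a `predictions` list). The host, port and model name are hypothetical, and the HTTP call is injectable so the logic can be exercised without a running server.

```python
import json
import urllib.request

# Hypothetical endpoint; TensorFlow Serving's REST API listens on port 8501 by default.
PREDICT_URL = "http://localhost:8501/v1/models/cell_classifier:predict"


def http_post_json(url: str, payload: dict, timeout: float = 2.0) -> dict:
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


def score_events(events, post=http_post_json, url=PREDICT_URL):
    """For each input event, send a prediction request to the model server
    and attach the response (one RPC per event in this simple version)."""
    for event in events:
        body = {"instances": [event["features"]]}
        response = post(url, body)
        yield {**event, "prediction": response["predictions"][0]}
```

Design note: one remote call per event adds network latency and couples the stream's throughput to the model server's availability; batching requests, or embedding the model in the stream processor as with the ksqlDB UDF approach, are the usual alternatives.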
Embedded Model Deployment with Apache Kafka, ksqlDB and TensorFlow
-- analyzeImage is a user-defined function (UDF) that embeds the TensorFlow model
CREATE STREAM ImageAnalysis AS
  SELECT image_id, analyzeImage(image_details) AS image_analysis
  FROM image_channel;
Model Training and Scoring with the same ML Pipeline (or even in the same Application)
• Data Science team responsible for the whole model lifecycle
• Beloved Python tool stack (Pandas, scikit-learn, TensorFlow, Jupyter, …)
• 24/7 production scale with the Confluent Python Client (e.g., deployed in Docker containers on Kubernetes)
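A sketch of a 24/7 scoring loop with the Confluent Python client (`confluent_kafka.Consumer`). The broker address, group ID, topic name and the placeholder model are hypothetical; `score_stream` relies only on `poll()`, `error()` and `value()`, so it can also be tested with a fake consumer.

```python
import json


def score_stream(consumer, model, max_messages=None):
    """Poll a Kafka consumer and score each JSON record with an in-process model.

    `consumer` needs a confluent_kafka-style poll(timeout) whose messages
    expose error() and value(); `model` is any callable.
    """
    scored = 0
    while max_messages is None or scored < max_messages:
        msg = consumer.poll(1.0)
        if msg is None:                 # no message within the timeout
            continue
        if msg.error():
            print("Consumer error:", msg.error())
            continue
        record = json.loads(msg.value())
        yield record, model(record)
        scored += 1


def main():
    # Requires `pip install confluent-kafka` and a reachable broker (assumptions).
    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",   # hypothetical broker address
        "group.id": "image-scoring",             # hypothetical consumer group
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["processed-images"])     # hypothetical topic name
    try:
        # Placeholder model: replace with a real TensorFlow or scikit-learn model.
        for record, score in score_stream(consumer, model=lambda r: len(r)):
            print(record, score)
    finally:
        consumer.close()
```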
Kafka, ksqlDB and TensorFlow for Drug Discovery in Real Time at Scale (recap of the architecture diagram, here with digital image processing as an external SaaS service accessed via REST)
Data in Motion Is The Future Of Data
Cloud: infrastructure as code, the future of the datacenter
Event Streaming: data in motion as continuous streams of events, the future of data
Why Confluent?
The Rise of Data in Motion
2010: Apache Kafka created at LinkedIn by Confluent founders
2014
2020: 80% of Fortune 100 companies trust and use Apache Kafka
Event Streaming Maturity Model (value grows with investment & time)
1. Initial Awareness / Pilot (1 Kafka Cluster)
2. Start to Build Pipeline / Deliver 1 New Outcome (1 Kafka Cluster)
3. Mission-Critical Deployment (Stretched, Hybrid, Multi-Region)
4. Build Contextual Event-Driven Apps (Stretched, Hybrid, Multi-Region)
5. Central Nervous System (Global Kafka)
Product, Support, Training, Partners, Technical Account Management...
Car Engine / Car / Self-driving Car
Confluent completes Apache Kafka. Cloud-native. Everywhere.