National Center for Emerging and Zoonotic Infectious Diseases
Flattening the Curve with COVID-19 Electronic Lab Reporting
Rishi Tarar, Northrop Grumman
Jason Hall, CDC
Kafka Summit, 2020
Background
§ This architecture grew out of the needs of CDC's Emerging Infections Program (EIP), with an eye on ongoing agency efforts (CDC Data and IT Modernization)
§ Multiple national-level use case implementations proved out the architecture and exposed commonality that can extend enterprise-wide…
§ …and meet hard challenges, like a pandemic, head-on
COVID-19 Electronic Lab Reporting (CELR) - Scope
§ Agency initiative to collect COVID-19 line-level lab testing data from all jurisdictions in the United States
§ Goal: the most comprehensive testing data
§ Improve the quality and fidelity of line-level data on an ongoing basis
§ Could be used for other conditions
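Very High-Level Data Flows
[Figure: data flows into CELR (Alice) at CDC from PHD, PHL and private labs, plus lab device manufacturers (not active). Channels shown: PHLIP, AIMS and the AIMS Hub. Formats: CSV v1, CSV v2 and HL7 2.5.1, 2.5, 2.3.1, 2.3.z. HHS also appears in the flow; several feeds are marked Live.]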
Future State Is a Mirage; Transition State Is Reality
[Figure: share of data streams (0 to 100%) from 2016 to 2020 across the Current State, Transition State and Future State; series: current stream, new stream, another stream.]
Primary Citizen -> TESTING EVENT
§ Data: each record is a TESTING EVENT
§ Producers are organized by feed format
§ Data is streamed and shaped record by record through the pipelines
§ Each record is a primary citizen
– Each record flows through a set of stream processors
– Metadata is added to each record
– Whatever happens to a record becomes rapidly observable
– Each record conforms to an evolving schema
§ Data can be aggregated and streamed to any destination on the fly
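To make "each record is a primary citizen" concrete, here is a minimal sketch, assuming a simple in-memory envelope rather than the platform's actual message format: every processor transforms one record and appends a metadata entry, so the history of that single record is observable. The class `LabTestEvent`, the helper `run_stage`, and the fields `test`, `result` and `schema_version` are hypothetical names chosen for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Payload = Dict[str, Any]


@dataclass
class LabTestEvent:
    """One testing event: the record is the unit that flows through every stage."""
    payload: Payload
    # Per-record trail of what each stream processor did (hypothetical envelope).
    metadata: List[Dict[str, Any]] = field(default_factory=list)
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def run_stage(event: LabTestEvent, name: str, fn: Callable[[Payload], Payload]) -> LabTestEvent:
    """Apply one processor to the payload and append a metadata entry describing it."""
    started = time.time()
    event.payload = fn(event.payload)
    event.metadata.append({"stage": name, "started_at": started,
                           "duration_s": time.time() - started})
    return event


# Two illustrative processors; field names are assumptions, not the CELR schema.
def normalize_result(p: Payload) -> Payload:
    return {**p, "result": p["result"].strip().upper()}


def stamp_schema(p: Payload) -> Payload:
    return {**p, "schema_version": "v2"}


event = LabTestEvent(payload={"test": "SARS-CoV-2 RNA", "result": " detected "})
for name, fn in [("normalize", normalize_result), ("schema", stamp_schema)]:
    event = run_stage(event, name, fn)

print(event.payload)  # {'test': 'SARS-CoV-2 RNA', 'result': 'DETECTED', 'schema_version': 'v2'}
print([m["stage"] for m in event.metadata])  # ['normalize', 'schema']
```

Keeping the metadata on the record itself, rather than in a separate log, is what lets a dashboard answer "what happened to this record?" without joining across systems.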
Event Pipelines
[Figure: event sourcing on Kafka. Current and new producers for Program X and Program Y data sources publish Feed1 and Feed2 as a stream of events; each event passes through Validate, Redact, Transform, Translate, Biz Rules and Case Classification processors, and S3, JDBC and Elastic sink connectors deliver the results to the data lake.]
§ Configuration-driven workloads
§ Data events written by sink connectors to storage (blob, relational, Elasticsearch)
§ Frameworks
§ Pipelines organized by program and pathogen
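The staged, configuration-driven pipeline (validate, redact, transform, translate, business rules, case classification) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the platform's code: plain lists stand in for Kafka topics, a failing record is routed to a dead-letter list instead of stopping the stream, and every configuration key and field name (`required`, `redact`, `rename`, `translate`, `positive_results`, `reporting_year`) is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

# Hypothetical configuration: keys and field names are illustrative only.
CONFIG: Dict[str, Any] = {
    "required": ["patient_id", "test_code", "result", "specimen_date"],
    "redact": ["patient_name"],
    "rename": {"specimen_date": "specimen_collection_date"},
    "translate": {"result": {"DET": "Detected", "NOTDET": "Not detected"}},
    "positive_results": ["Detected"],
}


class StageError(Exception):
    """Raised when a record cannot continue; the record is dead-lettered."""


def validate(r: Record, cfg: Dict[str, Any]) -> Record:
    missing = [f for f in cfg["required"] if not r.get(f)]
    if missing:
        raise StageError(f"missing fields: {missing}")
    return r


def redact(r: Record, cfg: Dict[str, Any]) -> Record:
    return {k: v for k, v in r.items() if k not in cfg["redact"]}


def transform(r: Record, cfg: Dict[str, Any]) -> Record:
    return {cfg["rename"].get(k, k): v for k, v in r.items()}


def translate(r: Record, cfg: Dict[str, Any]) -> Record:
    out = dict(r)
    for fld, mapping in cfg["translate"].items():
        if fld in out:
            if out[fld] not in mapping:
                raise StageError(f"no translation for {fld}={out[fld]!r}")
            out[fld] = mapping[out[fld]]
    return out


def biz_rules(r: Record, cfg: Dict[str, Any]) -> Record:
    # Example calculated field (hypothetical rule).
    return {**r, "reporting_year": r["specimen_collection_date"][:4]}


def classify(r: Record, cfg: Dict[str, Any]) -> Record:
    positive = r["result"] in cfg["positive_results"]
    return {**r, "case_class": "positive" if positive else "negative"}


STAGES = [validate, redact, transform, translate, biz_rules, classify]


def process(records: List[Record], cfg: Dict[str, Any] = CONFIG) -> Tuple[List[Record], List[Record]]:
    """Run each record through every stage; failures go to a dead-letter list."""
    good: List[Record] = []
    dead: List[Record] = []
    for original in records:
        rec = original
        try:
            for stage in STAGES:
                rec = stage(rec, cfg)
        except StageError as err:
            dead.append({"record": original, "stage": stage.__name__, "error": str(err)})
        else:
            good.append(rec)
    return good, dead


good, dead = process([
    {"patient_id": "p1", "patient_name": "Jane Doe", "test_code": "T1",
     "result": "DET", "specimen_date": "2020-06-01"},
    {"patient_id": "p2", "test_code": "T1", "result": "DET"},  # no specimen date
])
print(good[0]["case_class"], dead[0]["stage"])  # positive validate
```

Because each stage reads only `cfg`, onboarding a new feed becomes a configuration change, and the dead-letter entries (original record, failing stage, reason) are the raw material for dead-letter reports.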
The Platform High-Level Architecture
[Figure: labs, hospitals, SPHLs and registries send CSV/JSON, XLSX, HL7, CDA and FHIR data into Kafka, where FLAT, HL7, CDA and FHIR pipelines process it into an S3 data lake that feeds dashboards, data sets and data science tools.]
Use cases implemented: case notifications, lab reporting, healthcare interoperability
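One way to steer incoming submissions to the FLAT, HL7, CDA or FHIR pipelines is to sniff the payload. The sketch below is an assumption, not the platform's routing logic: the topic names in `PIPELINE_TOPICS` are hypothetical, binary XLSX uploads are out of scope, and `hl7_version` simply reads MSH-12 (the HL7 v2 version ID field) so an HL7 message could be matched to the right version-specific validation.

```python
import json

# Hypothetical Kafka topic names, one entry point per pipeline family.
PIPELINE_TOPICS = {
    "HL7": "hl7-pipeline-in",
    "CDA": "cda-pipeline-in",
    "FHIR": "fhir-pipeline-in",
    "FLAT": "flat-pipeline-in",
}


def detect_format(payload: str) -> str:
    """Rough content sniffing for text payloads (binary XLSX is not handled here)."""
    text = payload.lstrip()
    if text.startswith("MSH|"):  # HL7 v2 messages begin with the MSH segment
        return "HL7"
    if text.startswith("<") and "ClinicalDocument" in text[:500]:  # CDA root element
        return "CDA"
    if text.startswith("{"):
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            return "FLAT"
        if isinstance(doc, dict) and "resourceType" in doc:  # FHIR JSON resources carry this
            return "FHIR"
    return "FLAT"  # CSV, plain JSON and other flat records


def hl7_version(message: str) -> str:
    """Return MSH-12 (version ID), e.g. '2.5.1' or '2.3.1'."""
    msh = message.lstrip().replace("\n", "\r").split("\r")[0]
    fields = msh.split("|")
    # MSH-1 is the '|' separator itself, so MSH-n sits at index n - 1.
    return fields[11].split("^")[0] if len(fields) > 11 else ""


def route(payload: str) -> str:
    return PIPELINE_TOPICS[detect_format(payload)]


msg = "MSH|^~\\&|LAB|FAC|CELR|CDC|20200601120000||ORU^R01|MSG001|P|2.5.1\rPID|1||p1"
print(route(msg), hl7_version(msg))  # hl7-pipeline-in 2.5.1
```

Content sniffing is only a fallback; where a submission channel already declares its format, trusting that declaration is simpler and cheaper.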
Data Storefronts
[Figure: merged lab data is exposed through Athena and Redshift tables, QuickSight, data science tools, DCIPHER, curated views and an all-data view to business users (non-technical), data managers and data science users, as well as HHS and the CELR Portal. Supporting capabilities: schema dictionary, partner collaboration tools, real-time data stream, custom data sets, bulk exports and machine learning.]
Self-Service Data Storefront / Automated Data Storefront
[Figure: business users, data managers and data science users are served line-level lab data and aggregated lab data. Data products: provenance, validation reports, dead-letter reports and audit reports. The VAR team also appears.]
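Rolling line-level lab data up into aggregated lab data could look like the sketch below. It assumes, hypothetically, that resubmissions of the same test share a `record_key` and carry an ISO-8601 `submitted_at` string in one consistent format and time zone (so string comparison orders them), that the newest submission wins, and that the aggregate is a count per jurisdiction, specimen date and result. None of these field names come from the CELR schema.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def latest_submissions(records: List[Record]) -> List[Record]:
    """Keep only the newest submission per (hypothetical) record_key."""
    latest: Dict[str, Record] = {}
    for r in records:
        key = r["record_key"]
        if key not in latest or r["submitted_at"] > latest[key]["submitted_at"]:
            latest[key] = r
    return list(latest.values())


def aggregate(records: List[Record]) -> Dict[Tuple[str, str, str], int]:
    """Collapse line-level rows into test counts per (jurisdiction, specimen date, result)."""
    return dict(Counter((r["jurisdiction"], r["specimen_date"], r["result"]) for r in records))


rows = [
    {"record_key": "a", "submitted_at": "2020-06-01T10:00", "jurisdiction": "GA",
     "specimen_date": "2020-05-30", "result": "Not detected"},
    {"record_key": "a", "submitted_at": "2020-06-02T09:00", "jurisdiction": "GA",
     "specimen_date": "2020-05-30", "result": "Detected"},  # corrected resubmission
    {"record_key": "b", "submitted_at": "2020-06-01T11:00", "jurisdiction": "GA",
     "specimen_date": "2020-05-30", "result": "Detected"},
    {"record_key": "c", "submitted_at": "2020-06-01T12:00", "jurisdiction": "NY",
     "specimen_date": "2020-05-31", "result": "Not detected"},
]
print(aggregate(latest_submissions(rows)))
# {('GA', '2020-05-30', 'Detected'): 2, ('NY', '2020-05-31', 'Not detected'): 1}
```

De-duplicating before aggregating matters: counting the raw rows would report three GA tests, one of them a superseded result.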
Analytical Pipelines
[Figure: Glue crawlers and Glue jobs, run on schedules and triggers, update the data lake hourly; the Glue ETL steps perform translation, exclusion, tagging, race and age calculation, and filtering, alongside a data catalogue.]
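The ETL steps named on this slide (translation, exclusion, tagging, age calculation, filtering) are sketched below in plain Python. This is not AWS Glue code and does not use the `awsglue` API; it only shows the per-record logic such a job might apply. The translation table, the `is_test_message` exclusion flag, the age bands and all field names are hypothetical, and race derivation is not shown.

```python
from datetime import date
from typing import Any, Dict, List

Record = Dict[str, Any]

RESULT_MAP = {"DET": "Detected", "NOTDET": "Not detected"}  # hypothetical translation table
AGE_BANDS = [(0, 17, "0-17"), (18, 49, "18-49"), (50, 64, "50-64"), (65, 150, "65+")]


def age_in_years(birth: date, on: date) -> int:
    """Completed years between birth and the reference date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def age_band(age: int) -> str:
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    return "Unknown"  # e.g. negative ages from bad dates


def etl(rows: List[Record], source: str) -> List[Record]:
    out: List[Record] = []
    for r in rows:
        if r.get("is_test_message"):  # exclusion (hypothetical flag)
            continue
        rec = dict(r)
        rec["result"] = RESULT_MAP.get(rec.get("result"), rec.get("result"))  # translation
        rec["source"] = source  # tagging
        birth, collected = rec.get("birth_date"), rec.get("specimen_date")
        if birth and collected:  # age at specimen collection
            rec["age"] = age_in_years(date.fromisoformat(birth), date.fromisoformat(collected))
            rec["age_group"] = age_band(rec["age"])
        else:
            rec["age"], rec["age_group"] = None, "Unknown"
        out.append(rec)
    # Filter: keep only records whose result translated to a known value.
    return [r for r in out if r["result"] in RESULT_MAP.values()]


cleaned = etl([
    {"result": "DET", "birth_date": "1970-06-15", "specimen_date": "2020-06-01"},
    {"result": "NOTDET", "birth_date": "1950-01-01", "specimen_date": "2020-06-01"},
    {"result": "DET", "is_test_message": True},
    {"result": "INVALID", "birth_date": "2000-01-01", "specimen_date": "2020-06-01"},
], source="feed-1")
print([(r["result"], r["age"], r["age_group"]) for r in cleaned])
```

Computing age at specimen collection, rather than at load time, keeps the value stable when data is replayed later.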
Features in place TODAY
§ Ingest
• Real-time staged event pipeline processing, or manual upload
• HL7 Pipelines - support HL7 versions 2.5.1, 2.5, 2.3.z and 2.3.1
• FLAT Pipelines - CSV/FLAT/JSON (any size)
• FHIR Pipeline*
§ Validation
• FLAT file and record-level validations via configurations (no code)
• HL7 2.5.1, 2.5, 2.3.z, 2.3.1 validations via configurations
§ Transformations
• FLAT to HL7 Hierarchy
• HL7 to FHIR (per build.fhir.org) *
• HL7 to FLAT via Configurations
§ Translations
• Terminology transformations via Configurations
§ Data Lake Management Services
• At-scale ETL workflows
• SQL-style querying on all data
• Data replay and data de-duplication
• Business rules for calculating fields
• Machine learning for feature extraction from raw data, and ETL for data cleaning *
• Configuration management
• Data case classifications
• Data catalogue (schemas and dictionaries)
• Auditor services for proactive issue detection
§ Data Policy and Governance
• Data Use Agreement Filter
• Data Enrichment
• Auto Data Catalogue
• Data Security
• Data Redaction
• Pseudonymization for linking*
• De-Identification
§ Data Products (Reporting/Provision)
• Merged line-level data from all sources in a single schema
• On-demand canned data products (extracts)
• Bulk data exports - time-stamped data sets at scale
• Self-service custom queries
• De-duplication of resubmissions at the record level
§ Data Integration Products
• Data Routing
• Clinical Decision Support for guidance delivery *
• Exposing data as a FHIR API *
• SMART on FHIR app for integration with EHRs *
§ Analytics
• Real-time dashboards for lake operations
• Real-time dashboards for lake data quality and provenance
• Jupyter Notebooks with all tooling (R, Python, Scala) for data science
• Spark jobs for high-volume batch processing
• Canned ML algorithms
§ DEVSECOPS
• DEV to PROD in hours, not days
• Full scans and deployment as part of CI/CD
• Hosted in a FISMA Moderate cloud environment
• CDC ATO environment
• HIPAA-compliant environment
§ Data Apps
• Portal access for partner agencies based on business needs
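Pseudonymization for linking and de-identification, both listed under data policy and governance, can be illustrated with a keyed hash. This is a minimal sketch of one common technique (HMAC-SHA256 over normalized identifiers), not the method the platform uses. The `DIRECT_IDENTIFIERS` set, the `link_id` and `birth_year` fields, and the demo key are hypothetical; in practice the key would live in a secrets manager.

```python
import hashlib
import hmac
from typing import Any, Dict

Record = Dict[str, Any]

DIRECT_IDENTIFIERS = {"patient_name", "street_address", "phone"}  # hypothetical field names


def pseudonym(secret: bytes, *parts: str) -> str:
    """Stable keyed token for the same normalized identifiers; matching requires the key."""
    normalized = "|".join(p.strip().lower() for p in parts)
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: Record, secret: bytes) -> Record:
    """Drop direct identifiers, add a linking token and coarsen birth date to year."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["link_id"] = pseudonym(secret, record["patient_name"], record["birth_date"])
    out["birth_year"] = out.pop("birth_date")[:4]
    return out


key = b"demo-key-only"
first = deidentify({"patient_name": "Jane Doe", "birth_date": "1980-03-04",
                    "phone": "555-0100", "result": "Detected"}, key)
retest = deidentify({"patient_name": " jane doe ", "birth_date": "1980-03-04",
                     "result": "Not detected"}, key)
print(first["link_id"] == retest["link_id"])  # True: both tests link to one person
```

A keyed hash is used instead of a plain SHA-256 because unkeyed hashes of names and birth dates can be reversed by hashing guessed values; with HMAC, that attack requires the key.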
More Features in place TODAY
Tech Stack
• AWS EKS – Kubernetes <- Microservices
• Rancher
• AWS Lambda <- Serverless
• AWS Glue <- Serverless
• AWS Athena <- Serverless
• AWS Redshift
• AWS SageMaker / JupyterLab
• AWS QuickSight <- Serverless
• AWS S3, SQS, SNS, DynamoDB, RDS Postgres
• Confluent Kafka
• Elasticsearch
• Kibana
• GitLab
All Features are AT SCALE
§ Parallelism in Data Pipelines for Large-Scale Processing
– 30 Kafka partitions, 5-broker Kafka cluster
§ Horizontal scaling for storage (S3, Redshift -> Petabytes)
§ Delivering Data to Consumers at Scale
– Bulk Exports -> Gigabyte Slices of Data
§ Cloud-managed serverless services for analytics
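The 30-partition figure bounds pipeline parallelism: within one consumer group each partition is read by at most one consumer, so at most 30 consumers per stage can work at once, and records with the same key always land on the same partition, which preserves their order. The sketch below illustrates both points. It is an assumption-laden stand-in: Kafka's default partitioner hashes keys with murmur2, whereas this uses `zlib.crc32`, so actual partition numbers differ, and `assign` is a simple round-robin plan rather than Kafka's assignor code.

```python
import zlib
from typing import Dict, List

NUM_PARTITIONS = 30  # from the slide


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition (CRC32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(num_consumers: int, num_partitions: int = NUM_PARTITIONS) -> Dict[int, List[int]]:
    """Round-robin partitions over consumers in one group; extra consumers stay idle."""
    plan: Dict[int, List[int]] = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        plan[p % num_consumers].append(p)
    return plan


print(partition_for("jurisdiction-GA") == partition_for("jurisdiction-GA"))  # True
print({c: len(ps) for c, ps in assign(5).items()})  # {0: 6, 1: 6, 2: 6, 3: 6, 4: 6}
print(sum(1 for ps in assign(40).values() if not ps))  # 10 idle consumers
```

Choosing the partition count up front therefore sets the ceiling on consumer scale-out for that topic.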
Current Status
§ Status: In Production
§ Infrastructure Build out completed in ~5 days
§ Initial production deployment in ~10 days
§ Data Streams Logistical stabilization: ~15 days
§ O&M Started 30 days from Start Date
§ Full-stack release cycle every 3 days (twice a week), now once per week
§ Data Products and Analytic Products are in “Real Time”
§ Data Consumers
– HHS Protect
– CDC
CELR
For more information, contact CDC
1-800-CDC-INFO (232-4636)
TTY: 1-888-232-6348 www.cdc.gov
The findings and conclusions in this report are those of the authors and do not necessarily represent the
official position of the Centers for Disease Control and Prevention.
Jason Hall, NCEZID, (zfr9@cdc.gov)
Rishi Tarar, Enterprise Architect and Fellow, Northrop Grumman (rrt8@cdc.gov)
Terminology
§ Disease surveillance is an epidemiological practice by which the spread of
disease is monitored in order to establish patterns of progression.
– The main role of disease surveillance is to
• predict, observe, and
• minimize the harm caused by outbreak, epidemic, and pandemic
situations, as well as
• increase knowledge about which factors contribute to such
circumstances.
“Surveillance data is a series of natural and spontaneous
raw data streams.
Don't resist them; that only creates sorrow and silos.
Let reality be the reality.
Let data streams flow naturally forward in whatever way they like.”
-- Adapted from Lao Tzu

Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Último

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka Summit 2020

  • 1. National Center for Emerging and Zoonotic Infectious Diseases. Flattening the Curve with Covid-19 Electronic Lab Reporting. Rishi Tarar, Northrop Grumman; Jason Hall, CDC. Kafka Summit, 2020
  • 2. Background
    § This architecture grew out of necessity for CDC's EIP (Emerging Infections Program) programs, with an eye on ongoing agency efforts (CDC Data and IT Modernization)
    § Multiple national-level use-case implementations proved out the architecture and exposed commonality that can extend enterprise-wide…
    § …and meet hard challenges like a pandemic head on
  • 3. COVID-19 Electronic Lab Reporting (CELR): Scope
    § Agency initiative to collect COVID-19 line-level lab testing data from all jurisdictions in the United States
    § Goal: to have the most comprehensive testing data
    § Improve the quality and fidelity of line-level data on an ongoing basis
    § Could be used for other conditions
  • 5. Future State is a Mirage – Transition State is Reality
    [Figure: timeline from 2016 to 2020 (y-axis 0–100%) showing a Current Stream and a New Stream; stages labeled Current State, Transition State, Future State, and Another Current State]
  • 6. Primary Citizen -> TESTING EVENT
    § Data: each record is a TESTING EVENT
    § Producers are organized adjacent to feed formats
    § Data is streamed and shaped record by record through the pipelines
    § Each record is a primary citizen
      – Each record flows through the set of stream processors
      – Metadata is added to each record
      – This makes "things" happening to "a" record rapidly observable
      – Each record conforms to an evolving schema
    § Data can be aggregated and streamed to any destination on the fly
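The "primary citizen" idea can be illustrated with a minimal Python sketch: each testing event carries its own metadata list, and every processor appends a stamp as the record passes through, so what happened to a given record is visible on the record itself. The `LabTestingEvent` class, the processor functions, and the metadata fields are hypothetical; in the real system the processors would be Kafka stream processors rather than plain function calls.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

Payload = Dict[str, str]


@dataclass
class LabTestingEvent:
    """One line-level lab test result (hypothetical shape)."""
    payload: Payload
    metadata: List[Dict[str, str]] = field(default_factory=list)


def run_processor(event: LabTestingEvent, stage: str,
                  fn: Callable[[Payload], Payload]) -> LabTestingEvent:
    """Apply one processor and stamp the record with the stage name and a UTC timestamp."""
    new_payload = fn(dict(event.payload))  # copy, so the incoming event is never mutated
    stamp = {"stage": stage, "at": datetime.now(timezone.utc).isoformat()}
    return LabTestingEvent(new_payload, event.metadata + [stamp])


def normalize_result(p: Payload) -> Payload:
    p["result"] = p["result"].strip().upper()
    return p


def drop_patient_name(p: Payload) -> Payload:
    p.pop("patient_name", None)
    return p


if __name__ == "__main__":
    ev = LabTestingEvent({"result": " detected", "patient_name": "Jane Doe"})
    for stage, fn in [("normalize", normalize_result), ("redact", drop_patient_name)]:
        ev = run_processor(ev, stage, fn)
    print(ev.payload)                         # {'result': 'DETECTED'}
    print([m["stage"] for m in ev.metadata])  # ['normalize', 'redact']
```

Returning a new event instead of mutating the old one keeps each intermediate state intact, which fits the event-sourcing framing of the next slide.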
  • 7. Event Pipelines (Event Sourcing)
    [Diagram: data sources (Program X, Program Y; Feed1, Feed2) flow through current and new producers into Kafka data streams; each event passes through Validate, Redact, Transform, Translate, Biz Rules, and Case Classification stages; S3, JDBC, and Elastic sink connectors sink the data events to storage (blob, relational, Elasticsearch) and the data lake. Frameworks: configuration-driven workloads; pipelines organized by program and pathogen.]
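To make "configuration-driven workloads" concrete, here is a minimal sketch, under stated assumptions, of a pipeline whose stage order comes from configuration (one entry per program and pathogen) rather than from code. Records that fail a stage are routed to a dead-letter list instead of halting the stream. The stage functions, the `PIPELINE_CONFIG` structure, and the field names are illustrative inventions, not the CELR implementation; in Kafka terms each stage would typically consume from one topic and produce to the next.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


class StageError(Exception):
    """Raised by a stage when a record cannot be processed."""


def validate(r: Record) -> Record:
    if not r.get("specimen_id"):
        raise StageError("missing specimen_id")
    return r


def redact(r: Record) -> Record:
    # Drop a direct identifier (illustrative field name)
    return {k: v for k, v in r.items() if k != "patient_name"}


def transform(r: Record) -> Record:
    return {**r, "result": r.get("result", "").strip().lower()}


def translate(r: Record) -> Record:
    # Illustrative local-term map; real pipelines would use standard vocabularies
    categories = {"detected": "POSITIVE", "not detected": "NEGATIVE"}
    return {**r, "result_category": categories.get(r.get("result", ""), "UNMAPPED")}


STAGES: Dict[str, Callable[[Record], Record]] = {
    "validate": validate, "redact": redact, "transform": transform, "translate": translate,
}

# Hypothetical configuration: the stage order per program/pathogen is data, not code.
PIPELINE_CONFIG: Dict[str, List[str]] = {
    "program_x/covid19": ["validate", "redact", "transform", "translate"],
}


def run_pipeline(key: str, records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, str]]]:
    """Push each record through the configured stages; failures go to a dead-letter list."""
    good: List[Record] = []
    dead_letter: List[Tuple[Record, str]] = []
    for original in records:
        record = original
        stage = ""
        try:
            for stage in PIPELINE_CONFIG[key]:
                record = STAGES[stage](record)
        except StageError as err:
            dead_letter.append((original, f"{stage}: {err}"))
            continue
        good.append(record)
    return good, dead_letter


if __name__ == "__main__":
    ok, dlq = run_pipeline("program_x/covid19", [
        {"specimen_id": "S1", "result": " Detected ", "patient_name": "Jane Doe"},
        {"result": "Not Detected"},
    ])
    print(ok)   # [{'specimen_id': 'S1', 'result': 'detected', 'result_category': 'POSITIVE'}]
    print(dlq)  # [({'result': 'Not Detected'}, 'validate: missing specimen_id')]
```

Keeping the original record in the dead-letter entry, together with the failing stage name, is what makes dead-letter reports actionable for the submitting partner.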
  • 8. The Platform: High-Level Architecture
    [Diagram: labs, hospitals, SPHL, and registries submit HL7, CDA, FHIR, and flat (CSV/JSON/XLSX) data to Kafka-based HL7, CDA, FHIR, and FLAT pipelines, which land it in an S3 data lake queried with Athena and Redshift and described by a schema dictionary. Outputs include dashboards, data sets, custom data sets, bulk exports, a real-time data stream, data science tools, machine learning, and partner collaboration tools, serving business users (non-technical), data managers, and data science users. Use cases implemented: case notifications, lab reporting, healthcare interoperability.]
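The platform routes each submission to a pipeline by format (HL7, CDA, FHIR, FLAT). A rough Python sketch of such a dispatch step follows. The sniffing heuristics rely on standard markers of each format: an HL7 v2 message begins with an `MSH` segment, a FHIR JSON resource carries a `resourceType` member, and a CDA document's root element is `ClinicalDocument`. The function names and pipeline/topic names are hypothetical, and a production router would more likely trust the submission channel or declared content type than content sniffing.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical pipeline (topic) names per detected format
PIPELINES = {"HL7": "hl7-pipeline", "FHIR": "fhir-pipeline",
             "CDA": "cda-pipeline", "FLAT": "flat-pipeline"}


def detect_format(raw: str) -> str:
    """Guess which pipeline a raw submission belongs to (heuristic sketch)."""
    text = raw.lstrip()
    if text.startswith("MSH"):
        return "HL7"
    if text.startswith("{"):
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            return "UNKNOWN"
        # FHIR JSON resources always name their type; other JSON is treated as flat data
        return "FHIR" if isinstance(doc, dict) and "resourceType" in doc else "FLAT"
    if text.startswith("<"):
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            return "UNKNOWN"
        # ElementTree reports namespaced tags as "{namespace}LocalName"
        return "CDA" if root.tag.endswith("ClinicalDocument") else "UNKNOWN"
    return "FLAT"  # fall back to CSV/delimited handling


def route(raw: str) -> str:
    """Map a submission to its pipeline; unrecognized input goes to a dead-letter topic."""
    return PIPELINES.get(detect_format(raw), "dead-letter")


if __name__ == "__main__":
    print(route('{"resourceType": "Observation"}'))  # fhir-pipeline
    print(route("specimen_id,result\nS1,Detected"))    # flat-pipeline
```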
  • 9. Data Storefronts
    [Diagram: the data lake is updated hourly by scheduled and triggered AWS Glue crawlers and Glue jobs (analytical pipelines), with a data catalogue; Glue ETL performs translation, exclusion tagging, race and age calculation, and filtering. A self-service data storefront and an automated data storefront serve business users, data managers, and data science users through merged lab data, Athena tables, Redshift tables, QuickSight, data science tools, DCIPHER curated views, all data, and the HHS CELR portal. Data products: line-level lab data, aggregated lab data, provenance, validation reports, dead-letter reports, and audit reports (VAR team).]
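One of the Glue ETL steps is an age calculation. The slide does not say how age is derived, so the following is only a sketch of such a calculated field: age in completed years at the specimen collection date, computed from date of birth. The reference date, the field names `birth_date` and `collection_date`, and the choice to return `None` for inconsistent dates are all assumptions.

```python
from datetime import date
from typing import Dict, Optional


def age_in_years(birth: date, on: date) -> Optional[int]:
    """Completed years between birth and the reference date; None if the dates are inconsistent."""
    if on < birth:
        return None  # likely a data-entry error; flag it rather than guess
    had_birthday = (on.month, on.day) >= (birth.month, birth.day)
    return on.year - birth.year - (0 if had_birthday else 1)


def add_age(record: Dict[str, object]) -> Dict[str, object]:
    """Enrich a record (hypothetical ISO-8601 date fields) with a calculated age."""
    birth = date.fromisoformat(str(record["birth_date"]))
    collected = date.fromisoformat(str(record["collection_date"]))
    return {**record, "age_years": age_in_years(birth, collected)}


if __name__ == "__main__":
    print(add_age({"birth_date": "1980-06-15", "collection_date": "2020-06-14"}))
```

Computing age against the collection date rather than the processing date keeps the value stable when data is replayed later.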
  • 10. Features in Place TODAY
    § Ingest
      • Real-time staged event pipeline processing or manual upload
      • HL7 pipelines: support HL7 2.5.1, 2.5, 2.3.z, 2.3.1
      • FLAT pipelines: CSV/FLAT/JSON (any size)
      • FHIR pipeline*
    § Validation
      • FLAT file-level and record-level validations via configurations (no code)
      • HL7 2.5.1, 2.5, 2.3.z, 2.3.1 validations via configurations
    § Transformations
      • FLAT to HL7 hierarchy
      • HL7 to FHIR (per build.fhir.org)*
      • HL7 to FLAT via configurations
    § Translations
      • Terminology transformations via configurations
    § Data Lake Management Services
      • At-scale ETL workflows
      • SQL-style querying on all data
      • Data replay and data de-duplication
      • Business rules for calculating fields
      • Machine learning for feature extraction from raw data and ETL for data cleaning*
      • Configuration management
      • Data case classifications
      • Data catalogue (schemas and dictionaries)
      • Auditor services for proactive issue detection
    § Data Policy and Governance
      • Data use agreement filter
      • Data enrichment
      • Auto data catalogue
      • Data security
      • Data redaction
      • Pseudonymization for linking*
      • De-identification
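Several features above are delivered "via configurations (no code)". A minimal sketch of what that can look like: validation rules and terminology maps are plain data (here a dict that could equally be loaded from a JSON or YAML file per feed), and one generic engine applies them. The rule vocabulary (`required`, `allowed`, `pattern`), the local-to-standard value map, and the choice to translate before checking allowed values are hypothetical, not the CELR configuration format.

```python
import re
from typing import Dict, List

# Hypothetical per-feed configuration
CONFIG: dict = {
    "rules": {
        "specimen_id": {"required": True},
        "result": {"required": True, "allowed": ["POS", "NEG", "IND"]},
        "zip": {"pattern": r"\d{5}"},
    },
    "translations": {
        "result": {"Positive": "POS", "Detected": "POS",
                   "Negative": "NEG", "Not Detected": "NEG"},
    },
}


def translate(record: Dict[str, str], config: dict) -> Dict[str, str]:
    """Replace local terms with standard ones using the configured maps; unknown values pass through."""
    out = dict(record)
    for field_name, mapping in config["translations"].items():
        if field_name in out:
            out[field_name] = mapping.get(out[field_name], out[field_name])
    return out


def validate(record: Dict[str, str], config: dict) -> List[str]:
    """Return human-readable errors; an empty list means the record passed."""
    errors: List[str] = []
    for field_name, rule in config["rules"].items():
        value = record.get(field_name)
        if value in (None, ""):
            if rule.get("required"):
                errors.append(f"{field_name}: required")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{field_name}: {value!r} not allowed")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"{field_name}: {value!r} does not match pattern")
    return errors


if __name__ == "__main__":
    rec = translate({"specimen_id": "S1", "result": "Detected", "zip": "30333"}, CONFIG)
    print(rec, validate(rec, CONFIG))
```

Adding a new feed or a new local term then means editing configuration, not redeploying code.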
  • 11. More Features in Place TODAY
    § Data Products (Reporting/Provision)
      • Merged line-level data from all sources in a single schema
      • On-demand canned data products (extracts)
      • Bulk data exports: time-stamped data sets at scale
      • Self-service custom queries
      • De-duplication of resubmissions at the record level
    § Data Integration Products
      • Data routing
      • Clinical decision support for guidance delivery*
      • Exposing data as a FHIR API*
      • SMART on FHIR app for integration with EHRs*
    § Analytics
      • Real-time dashboards for lake operations
      • Real-time dashboards for lake data quality and provenance
      • Jupyter notebooks with all tooling (R, Python, Scala) for data science
      • Spark jobs for high-volume batch processing
      • Canned ML algorithms
    § DEVSECOPS
      • DEV to PROD in hours, not days
      • Full scans and deployment as part of CI/CD
      • Hosted on a FISMA Moderate cloud environment
      • CDC ATO environment
      • HIPAA-compliant environment
    § Data Apps
      • Portal access for partner agencies based on business needs
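For "de-duplication of resubmissions at the record level", one common approach (assumed here, not stated on the slide) is to derive a business key for each test event and keep only the most recent submission per key. The key fields (`lab_id`, `specimen_id`, `test_code`) and the `submitted_at` timestamp are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]
KEY_FIELDS = ("lab_id", "specimen_id", "test_code")  # hypothetical business key


def business_key(r: Record) -> Tuple[str, ...]:
    return tuple(r.get(f, "") for f in KEY_FIELDS)


def latest_per_key(records: Iterable[Record]) -> List[Record]:
    """Keep the newest submission per business key.

    Assumes every submitted_at is ISO-8601 in one fixed UTC format, so string
    comparison orders timestamps correctly.
    """
    latest: Dict[Tuple[str, ...], Record] = {}
    for r in records:
        k = business_key(r)
        if k not in latest or r["submitted_at"] > latest[k]["submitted_at"]:
            latest[k] = r
    return list(latest.values())


if __name__ == "__main__":
    subs = [
        {"lab_id": "L1", "specimen_id": "S1", "test_code": "T", "result": "NEG",
         "submitted_at": "2020-05-01T10:00:00Z"},
        {"lab_id": "L1", "specimen_id": "S1", "test_code": "T", "result": "POS",
         "submitted_at": "2020-05-02T09:00:00Z"},
    ]
    print(latest_per_key(subs))  # only the corrected POS submission remains
```

In a streaming setting the same rule maps naturally onto a compacted Kafka topic or a keyed table, where the latest value per key wins.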
  • 12. Tech Stack
      • AWS EKS (Kubernetes) <- microservices
      • Rancher
      • AWS Lambda <- serverless
      • AWS Glue <- serverless
      • AWS Athena <- serverless
      • AWS Redshift
      • AWS SageMaker / JupyterLab
      • AWS QuickSight <- serverless
      • AWS S3, SQS, SNS, DynamoDB, RDS Postgres
      • Confluent Kafka
      • Elasticsearch
      • Kibana
      • GitLab
  • 13. All Features Are AT SCALE
    § Parallelism in data pipelines for large-scale processing
      – 30 Kafka partitions on a 5-broker Kafka cluster
    § Horizontal scaling for storage (S3, Redshift -> petabytes)
    § Delivering data to consumers at scale
      – Bulk exports -> gigabyte slices of data
    § Cloud-managed serverless services for analytics
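The 30-partition figure matters because Kafka preserves ordering only within a partition, and a consumer group can keep at most 30 consumers busy on such a topic (extra consumers sit idle). With Kafka's default partitioner, records with the same key always land in the same partition: the Java client hashes the serialized key with murmur2 and takes it modulo the partition count. The sketch below mimics that idea with a stand-in hash (CRC32 from the standard library, not murmur2), so its partition numbers will not match a real cluster; keying by a specimen ID is an assumption, not something the slide specifies.

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_PARTITIONS = 30  # from the slide: 30 partitions on a 5-broker cluster


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable key -> partition mapping (CRC32 stand-in for Kafka's murmur2)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(keys: List[str]) -> Dict[int, List[str]]:
    """Group keys by partition; keys sharing a partition keep their arrival order."""
    by_partition: Dict[int, List[str]] = defaultdict(list)
    for k in keys:
        by_partition[partition_for(k)].append(k)
    return dict(by_partition)


if __name__ == "__main__":
    print(assign(["specimen-1", "specimen-2", "specimen-1"]))
```

Because every event for one key goes to one partition, corrections to a given specimen are processed in submission order while different specimens proceed in parallel.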
  • 14. Current Status
    § Status: in production
    § Infrastructure build-out completed in ~5 days
    § Initial production deployment in ~10 days
    § Data stream logistical stabilization: ~15 days
    § O&M started 30 days from the start date
    § Full-stack release cycle every 3 days (twice a week) -> now down to once per week
    § Data products and analytic products are in "real time"
    § Data consumers
      – HHS Protect
      – CDC
  • 15. CELR
  • 16. For more information, contact CDC: 1-800-CDC-INFO (232-4636), TTY: 1-888-232-6348, www.cdc.gov. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention. Jason Hall, NCEZID (zfr9@cdc.gov); Rishi Tarar, Enterprise Architect and Fellow, Northrop Grumman (rrt8@cdc.gov)
  • 17. Terminology
    § Disease surveillance is an epidemiological practice by which the spread of disease is monitored in order to establish patterns of progression.
      – The main role of disease surveillance is to
        • predict, observe, and minimize the harm caused by outbreak, epidemic, and pandemic situations, and
        • increase knowledge about which factors contribute to such circumstances.
  • 18. “Surveillance data is a series of natural and spontaneous raw data streams. Don't resist them; that only creates sorrow and silos. Let reality be the reality. Let data streams flow naturally forward in whatever way it likes.” -- Adapted from Lao Tzu