Takeda’s Plasma Derived Therapies (PDT) business unit has recently embarked on a project to use Spark Streaming on Databricks to empower how they deliver value to their plasma donation centers. As patients come in and interface with our clinics, we store and track all of the patient interactions in real time and deliver outputs and results based on those interactions. The problem with our existing architecture is that it is very expensive to maintain and has an unsustainable number of failure points. Spark Streaming is essential for this use case because it allows for a more robust ETL pipeline: with Spark Streaming, we are able to replace our existing ETL processes (based on Lambdas, Step Functions, triggered jobs, etc.) with a purely stream-driven architecture.
Data is brought into our S3 raw layer as a large set of CSV files through AWS DMS and Informatica IICS, which move data from on-prem systems into our cloud layer. A stream picks up these raw files and merges them into Delta tables established in the bronze/stage layer, with AWS Glue serving as the metadata provider for all of these operations. From the stage layer, another set of streams uses the stage Delta tables as their source, transforming and conducting stream-to-stream lookups before writing the enriched records into RDS (the silver/prod layer). Once the data has been merged into RDS, a DMS task lifts it back into S3 as CSV files; a small intermediary stream merges these CSV files into corresponding Delta tables, which feed our gold/analytic streams. The on-prem systems are able to speak to the silver layer, enabling the near real-time latency that our patient care centers require.
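As a rough sketch of the raw-to-bronze leg of this pipeline, the ingest stream might look like the following. The donor_visits feed, S3 paths, schema, and visit_id merge key are illustrative assumptions, not details from the talk; spark is the ambient SparkSession on Databricks.

```python
# Sketch of the raw -> bronze ingest stream (illustrative names and paths).
from delta.tables import DeltaTable
from pyspark.sql.types import StructType, StringType, TimestampType

raw_schema = (
    StructType()
    .add("visit_id", StringType())
    .add("donor_id", StringType())
    .add("updated_at", TimestampType())
)

def upsert_to_bronze(batch_df, batch_id):
    # Keep one row per key so the merge has unambiguous matches.
    deduped = batch_df.dropDuplicates(["visit_id"])
    bronze = DeltaTable.forName(spark, "bronze.donor_visits")  # assumes table exists
    (bronze.alias("t")
        .merge(deduped.alias("s"), "t.visit_id = s.visit_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .schema(raw_schema)
    .csv("s3://pdt-raw/donor_visits/")  # files landed by DMS / Informatica IICS
    .writeStream
    .foreachBatch(upsert_to_bronze)
    .option("checkpointLocation", "s3://pdt-checkpoints/bronze/donor_visits/")
    .start())
```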
Empowering Real Time Patient Care Through Spark Streaming
1. Empowering PDT Analytics through Databricks & Spark Structured Streaming
05/05/2021
Arnav Chaudhary (he/him)
Digital Product Manager
Takeda
Jonathan E. Yee (he/him)
Data and Analytics Executive
EY
Jeff Cubeta (they/them)
Clinical Intelligence Executive
Algernon Solutions
3. Databricks on Takeda’s Enterprise Data Backbone
The EDB (Enterprise Data Backbone) is Takeda’s integrated data platform responsible for combining
global data assets into a single source of truth and enabling tools to provide insights via analytics.
Domains
• Data Ingestion
• Data Processing
• Advanced Analytics
Global Regions
• US
• Europe
• Japan
Applications
• Python
• RStudio
Specialized Deployment
• MIT research as an ongoing collaboration
Databricks on AWS is used heavily by Takeda across the business:
• 200,000 DBUs of monthly compute
• 600+ monthly active users
• 50+ validated schemas with 100s of tables
• 15 advanced analytics teams using Databricks
4. PDT Analytics Program
What are we solving for?
• Drive improved plasma yield
• Reduce cost per liter
• Harvest the value of PDT’s data assets
Expected Outcomes
• Increased access to a greater volume of plasma donors
• Gain access to a larger share of the donor market to reduce CPL
• Increase yield by improving retention and the conversion funnel
• Reduce manual processes and increase automation to improve operations efficiency
• Improved data, analytics, and process layers for PDT analytics
5. PDT Donor Portal Application & Analytics Foundation
Previously…
• Existing 153 disparate center systems
• Reliance on 3rd party for marketing insights
• Manual report generation
• Lack of real-time information for quick decision making
• Reactive decision-making process
Going forward…
• Consolidated data into one operational data store (ODS) with near real-time data transmission
• Data lake to store years of information
• Analytics platform allowing data scientists to perform data mining, create predictive models, and generate actionable insights
• Reduction of manual reports
PDT/BioLife Data Backbone
• PDT is the pioneer using the newly developed Takeda Enterprise Data Backbone Platform in the cloud
• Supporting Analytics, Operational Use, and other Products’ data needs (e.g. Donor Engagement, Fuji Innovation Engine)
6. Three Key Pain Points with PDT Data Analytics
• Data Isolation
• Latency of Analytics
• Narrow Audience
The legacy landscape behind these pain points:
• Daily batch jobs
• Manual report generation
• Limited access to data
• Structured, typed SQL data
• API-returned JSON
• Scheduled CSV uploads
At scale: 4 enterprise data systems, 151 collection centers, 250 SQL tables, ~1 TB of historic data, and ~0.5 GB/hr of ongoing CDC.
We designed opportunities to drive value and address the core pain point themes for PDT:
Real Time Data
• Spark Structured Streams
• Low latency data processing
• Standardized event streams to empower downstream apps
Unified Data Schema
• Single presence for Donors
• Cross system relationships
• Business process data entities
Lakehouse Model
• Uniform ingestion process
• Configuration driven operations (see the sketch below)
• S3 Delta Tables
• Data served to SQL DB for low latency, high volume querying
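The configuration-driven operations bullet could be realized along these lines. The TABLE_CONFIGS entries, paths, and use of Databricks Auto Loader (cloudFiles) are assumptions for illustration, not the team’s actual configuration:

```python
# Hypothetical configuration-driven launcher: one generic job starts an
# ingest stream per configured table instead of bespoke pipelines.
TABLE_CONFIGS = [
    {"name": "donor_visits", "raw_path": "s3://pdt-raw/donor_visits/"},
    {"name": "donors",       "raw_path": "s3://pdt-raw/donors/"},
]

for cfg in TABLE_CONFIGS:
    (spark.readStream
        .format("cloudFiles")                     # Databricks Auto Loader (assumption)
        .option("cloudFiles.format", "csv")
        .option("cloudFiles.schemaLocation",
                f"s3://pdt-checkpoints/schemas/{cfg['name']}/")
        .load(cfg["raw_path"])
        .writeStream
        .option("checkpointLocation",
                f"s3://pdt-checkpoints/bronze/{cfg['name']}/")
        .toTable(f"bronze.{cfg['name']}"))
```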
9. Lakehouse Model
Key Design Details
• Uniform ingestion platform
• Improved accessibility to data
• Delta Tables backing each layer
• Structured Streams between layers (see the sketch after this list)
• Support for big data analysis through serving Delta Tables
• Support for high volume, low latency querying using SQL-based tools
• Extensible design to allow expansion
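A minimal sketch of one stream between layers, assuming hypothetical stage.donor_visits and stage.donors tables and a donor_id join key. The talk describes stream-to-stream lookups; this simplified version uses a stream-static join, where Delta resolves the static side afresh as micro-batches run:

```python
# Sketch of a stage -> silver stream: stream from the stage Delta table,
# enrich via a lookup, write to a silver table (names are illustrative).
visits = spark.readStream.table("stage.donor_visits")

# Static Delta lookup joined against the stream (stream-static join).
donors = spark.read.table("stage.donors")

enriched = visits.join(
    donors.select("donor_id", "center_id", "donor_status"),
    on="donor_id",
    how="left",
)

(enriched.writeStream
    .option("checkpointLocation", "s3://pdt-checkpoints/silver/donor_visits/")
    .toTable("silver.donor_visits_enriched"))
```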
11. Using foreachBatch to Fork and Serve Streaming CDC Data
When writing the CDC stream, we use a foreachBatch function to fork each micro-batch to multiple sinks, applying the Delta Table merge construct within _serve:
• Delta Table
• SQL Database
• Event Bridge
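A hedged sketch of what such a _serve function could look like, assuming a hypothetical gold.donor_events table keyed on event_id, a placeholder JDBC endpoint and secret scope, and the default EventBridge bus; the actual keys, credentials, and sinks in the pipeline may differ:

```python
import json

import boto3
from delta.tables import DeltaTable

def _serve(batch_df, batch_id):
    batch_df.persist()  # the batch feeds three sinks; avoid recomputing it

    # 1) Delta Table: upsert the CDC micro-batch with the merge construct.
    target = DeltaTable.forName(spark, "gold.donor_events")  # assumed table
    (target.alias("t")
        .merge(batch_df.dropDuplicates(["event_id"]).alias("s"),
               "t.event_id = s.event_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

    # 2) SQL Database: write the same batch over JDBC.
    (batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://prod-rds:5432/pdt")   # placeholder
        .option("dbtable", "gold_donor_events")
        .option("user", dbutils.secrets.get("pdt", "rds-user"))      # assumed scope
        .option("password", dbutils.secrets.get("pdt", "rds-password"))
        .mode("append")
        .save())

    # 3) EventBridge: emit a per-batch notification (runs on the driver).
    boto3.client("events").put_events(Entries=[{
        "Source": "pdt.streaming",
        "DetailType": "cdc-batch-served",
        "Detail": json.dumps({"batchId": batch_id, "rows": batch_df.count()}),
        "EventBusName": "default",
    }])

    batch_df.unpersist()

(spark.readStream
    .table("silver.donor_events")  # assumed CDC source table
    .writeStream
    .foreachBatch(_serve)
    .option("checkpointLocation", "s3://pdt-checkpoints/gold/donor_events/")
    .start())
```

Note that foreachBatch provides at-least-once guarantees, so a non-idempotent sink like the EventBridge notification may see duplicate events if a batch is retried; the Delta merge, by contrast, is idempotent per key.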