Building ETL pipelines
to tranSMART 17.X
New tools for the data loader
Alessia Peviani – Data Engineer
Tools overview: data flow and dependencies
[Diagram with two panels; legend: data flow, code dependency]
Manual pipeline: GENERIC DATA, GENERIC ONTOLOGY, TMTK, ARBORIST (plugin), ARBORIST (standalone), Jupyter Notebook, TRANSMART COPY, TRANSMART 17.X INSTANCE
Automated ETL pipeline: CSR DATA, FHIR DATA, GENERIC DATA, GENERIC ONTOLOGY, CLAML ONTOLOGY, ETL pipeline, csr2transmart, fhir2transmart, ontology2transmart, claml2transmart, TRANSMART LOADER, TRANSMART COPY, TRANSMART 17.X INSTANCE, SLICING TOOL, TRANSMART 17.X INSTANCE (SLICE)
Why Manual Pipelines
• One-time / infrequent data loading
• Changing data format
• Initial data exploration / modeling
• Tools developed internally for data scientists, and to facilitate collaboration with data owners (researchers, clinicians)
Why Automated ETL Pipelines
• Frequent updates
• Stable data format
• Typically large volumes of data
• Tools developed for specific projects, but with flexibility in mind to enable use in future ETL pipelines
1. Tools for manual data loading
TMTK & Arborist, transmart-copy
TMTK: The TranSMART Toolkit
• Easier collaboration with data owners:
  • Option to load tree structure, word mapping, and metadata from an Excel file (“Template file”)
  • Interactive tree structure modeling with the Arborist
• Updated to support TranSMART 17.X features:
  • Added extra sheets to the template file
  • Added export in transmart-copy format
https://github.com/thehyve/tmtk
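To make the "tree structure from a template file" idea concrete, here is a minimal sketch that turns spreadsheet-like rows into backslash-delimited concept paths. It does not use the TMTK API: the column names (`level_1`, `level_2`, `column_name`) and the helper `rows_to_concept_paths` are hypothetical and only illustrate the kind of mapping a template sheet describes.

```python
from typing import Dict, List

# Hypothetical rows, as they might be read from a sheet of a template file.
TEMPLATE_ROWS: List[Dict[str, str]] = [
    {"level_1": "Demographics", "level_2": "", "column_name": "Age"},
    {"level_1": "Demographics", "level_2": "", "column_name": "Gender"},
    {"level_1": "Lab", "level_2": "Blood", "column_name": "Hemoglobin"},
]


def rows_to_concept_paths(study: str, rows: List[Dict[str, str]]) -> List[str]:
    """Join the non-empty tree levels of each row into a backslash-delimited path."""
    paths = []
    for row in rows:
        levels = [row["level_1"], row["level_2"], row["column_name"]]
        parts = [study] + [level for level in levels if level]
        paths.append("\\" + "\\".join(parts) + "\\")
    return paths


if __name__ == "__main__":
    for path in rows_to_concept_paths("My study", TEMPLATE_ROWS):
        print(path)
```

Keeping the tree in a flat table like this is what lets a data owner edit it in a spreadsheet without touching code.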
TMTK: The TranSMART Toolkit
Input template file
The Arborist
Two flavors of interactive tree modeling:
• Without leaving Jupyter Notebook (embedded)
• By sending the tree to a web service (standalone)
  • Share trees with data owners and let them edit
  • Re-import the final result into TMTK
• Functionality (both versions):
  • Add/remove tree nodes and metadata tags
  • Edit position and name
https://github.com/thehyve/arborist
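The editing operations listed for the Arborist (add or remove nodes and metadata tags, edit position and name) can be pictured with a small tree structure. This is a sketch for illustration only, not the Arborist's own data model; the `Node` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    tags: Dict[str, str] = field(default_factory=dict)  # metadata tags
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node", position: Optional[int] = None) -> "Node":
        """Attach a child at the end, or at a given position among siblings."""
        if position is None:
            self.children.append(child)
        else:
            self.children.insert(position, child)
        return child

    def remove(self, name: str) -> "Node":
        """Detach and return the direct child with this name."""
        for index, child in enumerate(self.children):
            if child.name == name:
                return self.children.pop(index)
        raise KeyError(name)

    def paths(self, prefix: str = "\\") -> List[str]:
        """List the path of this node and of every node below it."""
        own = f"{prefix}{self.name}\\"
        result = [own]
        for child in self.children:
            result.extend(child.paths(own))
        return result


root = Node("Study")
demo = root.add(Node("Demographics"))
lab = root.add(Node("Lab"))
age = demo.add(Node("Age", tags={"unit": "years"}))

age.name = "Age at inclusion"             # edit name
lab.add(demo.remove("Age at inclusion"))  # edit position: move under another parent

for path in root.paths():
    print(path)
```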
The Arborist: Embedded vs Standalone
The Arborist: Plug-in vs Standalone
https://arborist-test-trait.thehyve.net/
[Figure labels: feedback, data owner]
Data loading with transmart-copy
• A simple tool with a simple function
• Batch and incremental data loading
• What you see (data tables) is what you get (transmart tables)
• Supports new TranSMART 17.X features
https://github.com/thehyve/transmart-core/tree/dev/transmart-copy
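The "what you see is what you get" point can be sketched as writing one tab-separated file per database table, so the files mirror the tables they will populate. The directory layout, table names, and columns below are assumptions chosen for illustration; the SURVEY0 example in the transmart-copy repository is the place to check the files the tool actually expects.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

# Illustrative rows only; schema, table, and column names are assumptions.
TABLES: Dict[str, List[Dict[str, str]]] = {
    "i2b2demodata/patient_dimension": [
        {"patient_num": "1", "sex_cd": "female"},
        {"patient_num": "2", "sex_cd": "male"},
    ],
    "i2b2demodata/concept_dimension": [
        {"concept_cd": "AGE", "concept_path": "\\Demographics\\Age\\", "name_char": "Age"},
    ],
}


def write_tables(target: Path, tables: Dict[str, List[Dict[str, str]]]) -> List[Path]:
    """Write one tab-separated file per table: a header line, then one line per row."""
    written = []
    for table, rows in tables.items():
        path = target / f"{table}.tsv"
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_tables(Path(tmp), TABLES):
            print(path.relative_to(tmp))
```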
transmart-copy
(TranSMART 17.X database schemas as seen in pgAdmin)
https://github.com/thehyve/transmart-core/tree/dev/transmart-copy/src/test/resources/examples/SURVEY0
transmart-copy vs transmart-batch
transmart-copy:
• A simple tool with a simple function
• Batch and incremental data loading
• What you see (data tables) is what you get (transmart tables)
• Supports new TranSMART 17.X features
transmart-batch:
• Complex, includes validation steps
• Batch loading only (inefficient in some cases)
• Does not make explicit what your data will look like once in TranSMART
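The difference between batch and incremental loading can be shown with a purely conceptual sketch. It says nothing about how transmart-copy or transmart-batch work internally; the row layout and the key (patient plus concept) are assumptions made for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
Key = Tuple[str, str]


def row_key(row: Row) -> Key:
    """Identify an observation by patient and concept (an assumed key)."""
    return (row["patient"], row["concept"])


def batch_load(table: List[Row], rows: List[Row]) -> List[Row]:
    """Batch load: discard the current content and load the full data set again."""
    return list(rows)


def incremental_load(table: List[Row], rows: List[Row]) -> List[Row]:
    """Incremental load: keep existing rows and append only rows not seen before."""
    seen = {row_key(row) for row in table}
    return table + [row for row in rows if row_key(row) not in seen]


existing = [{"patient": "1", "concept": "AGE", "value": "42"}]
update = [
    {"patient": "1", "concept": "AGE", "value": "42"},
    {"patient": "2", "concept": "AGE", "value": "37"},
]

print(len(batch_load(existing, update)), "rows rewritten by a batch load")
print(len(incremental_load(existing, update)) - len(existing), "rows added incrementally")
```

With frequent small updates to a large data set, the incremental variant touches far fewer rows, which is the inefficiency the comparison hints at.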
2. Tools for automated ETL pipelines
transmart-loader & related tools, hyper-dicer
transmart-loader
• Python library encoding TranSMART entities as well-defined classes
• Ensures data export to a transmart-copy compatible format
• Flexible tool: fixed output, but can be adapted to new input formats (various flavors)
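A minimal sketch of the idea of "entities as well-defined classes with a fixed export format" follows. The class names, fields, and the `export_observations` function are hypothetical and are not taken from transmart-loader; the output columns are likewise only illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Patient:
    identifier: str
    sex: str


@dataclass(frozen=True)
class Concept:
    code: str
    name: str
    path: str


@dataclass(frozen=True)
class Observation:
    patient: Patient
    concept: Concept
    value: str


def export_observations(observations: List[Observation]) -> str:
    """Serialize observations as tab-separated text with a fixed header."""
    header = ["patient_id", "concept_code", "value"]
    lines = ["\t".join(header)]
    for obs in observations:
        lines.append("\t".join([obs.patient.identifier, obs.concept.code, obs.value]))
    return "\n".join(lines) + "\n"


alice = Patient(identifier="P1", sex="female")
age = Concept(code="AGE", name="Age", path="\\Demographics\\Age\\")
print(export_observations([Observation(alice, age, "42")]), end="")
```

Because every input format is first mapped to the same classes, the export step stays identical no matter where the data came from.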
transmart-loader-based tools
transmart-loader-based mapping tools:
• csr2transmart
• fhir2transmart
• ontology2transmart
• claml2transmart
https://github.com/thehyve/transmart-loader
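A mapping tool of this kind translates one input format into the shared entity classes. The sketch below shows the pattern for FHIR Patient resources in a Bundle; it is not code from fhir2transmart, and `PatientRecord`, `map_fhir_patient`, and `map_bundle` are hypothetical names. Only the FHIR field names (`resourceType`, `id`, `gender`, `birthDate`, `entry`, `resource`) come from the FHIR specification.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical target entity; not a class from transmart-loader."""
    identifier: str
    sex: str
    birth_date: str


def map_fhir_patient(resource: Dict[str, Any]) -> PatientRecord:
    """Map a FHIR Patient resource (parsed JSON) to the target entity."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    return PatientRecord(
        identifier=resource["id"],
        sex=resource.get("gender", "unknown"),
        birth_date=resource.get("birthDate", ""),
    )


def map_bundle(bundle: Dict[str, Any]) -> List[PatientRecord]:
    """Collect and map the Patient resources found in a FHIR Bundle."""
    return [
        map_fhir_patient(entry["resource"])
        for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") == "Patient"
    ]


bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1",
                      "gender": "female", "birthDate": "1980-02-01"}},
        {"resource": {"resourceType": "Observation", "id": "o1"}},
    ],
}
print(map_bundle(bundle))
```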
Slicing tool
• Developed for the DiFuture consortium to populate project-specific data warehouses
• Given a set of constraints, automatically extracts the corresponding data and loads them into another TranSMART instance
Slicing tool
https://github.com/thehyve/transmart-hyper-dicer
• Official name: transmart-hyper-dicer (short for “TranSMART hypercube dicer”)
Functionality:
• Requires an input JSON file with query constraints
• Extracts the relevant part of the ontology for the given set of data
• Populates an EXISTING (empty) TranSMART instance
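The slicing steps above (read constraints from JSON, select matching data, keep only the ontology nodes that the selection uses) can be sketched on in-memory data. The constraint shape below is an assumption, loosely modeled on a study and concept filter; the actual input format should be taken from the transmart-hyper-dicer repository. All function names and the sample data are hypothetical.

```python
import json
from typing import Any, Dict, List, Tuple

# Assumed constraint format, for illustration only.
CONSTRAINT_JSON = """
{"type": "and", "args": [
    {"type": "study_name", "studyId": "SURVEY0"},
    {"type": "concept", "conceptCode": "age"}
]}
"""

Observation = Dict[str, str]

OBSERVATIONS: List[Observation] = [
    {"study": "SURVEY0", "concept": "age", "patient": "1", "value": "42"},
    {"study": "SURVEY0", "concept": "gender", "patient": "1", "value": "female"},
    {"study": "OTHER", "concept": "age", "patient": "7", "value": "55"},
]

# Ontology as a mapping from tree path to concept code.
ONTOLOGY: Dict[str, str] = {
    "\\Demographics\\Age\\": "age",
    "\\Demographics\\Gender\\": "gender",
}


def matches(constraint: Dict[str, Any], obs: Observation) -> bool:
    """Evaluate a (toy) constraint tree against a single observation."""
    kind = constraint["type"]
    if kind == "and":
        return all(matches(arg, obs) for arg in constraint["args"])
    if kind == "or":
        return any(matches(arg, obs) for arg in constraint["args"])
    if kind == "study_name":
        return obs["study"] == constraint["studyId"]
    if kind == "concept":
        return obs["concept"] == constraint["conceptCode"]
    raise ValueError(f"unsupported constraint type: {kind}")


def slice_data(
    constraint: Dict[str, Any],
    observations: List[Observation],
    ontology: Dict[str, str],
) -> Tuple[List[Observation], Dict[str, str]]:
    """Select matching observations and the ontology nodes they refer to."""
    selected = [obs for obs in observations if matches(constraint, obs)]
    used = {obs["concept"] for obs in selected}
    nodes = {path: code for path, code in ontology.items() if code in used}
    return selected, nodes


selected, nodes = slice_data(json.loads(CONSTRAINT_JSON), OBSERVATIONS, ONTOLOGY)
print(selected)
print(nodes)
```

The result of such a step is what would then be loaded into the empty target instance.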
Conclusions
• TranSMART 17.X comes with a range of data loading tools, both for simple manual pipelines and for complex automated ETL pipelines.
• The tools are increasingly modular to improve maintainability (separate functions in separate tools).
• Development is mostly driven by specific ETL projects, but with flexibility in mind, so that tools are easily adaptable to new use cases.
Workshop this afternoon!
ROOM B4-221 (next to lunch area) at 14:30
Jupyter Notebook & tools for:
• Data Loading to TranSMART
• TranSMART API calls
[Diagram: manual pipeline components, plus PYTHON API CLIENT (Jupyter Notebook)]
Acknowledgements
Gijs Kant
Software Architect
Ewelina Grudzień
Software Engineer
Artur Faizullin
System Administrator
Stefan Payralbe
Data Engineer
Jochem Bijlard
Data Engineer
Ward Weistra
(former Hyver)
Brenda Hijmans, NKI
(TMTK contributor)