SlideShare a Scribd company logo
1 of 35
Download to read offline
(Efficient) Data Exchange with
"Foreign" Ecosystems
Uwe Korn – QuantCo – 2nd July 2019
About me
• Engineering at QuantCo

• Apache {Arrow, Parquet} PMC

• Focus on Python but interact with
R, Java, SAS, …
@xhochy
@xhochy
mail@uwekorn.com
https://uwekorn.com
Python vs R
👊
Python vs R
👊
Python & R
Python & R
… & Java & Rust &
Javascript & C# & Matlab
& …
Do we have a problem?
Do we have a problem?
• Yes, there are different ecosystems!
Do we have a problem?
• Yes, there are different ecosystems!
• PyData

• Python / R

• Pandas / NumPy / PySpark/sparklyr / Docker
Do we have a problem?
• Yes, there are different ecosystems!
• PyData

• Python / R

• Pandas / NumPy / PySpark/sparklyr / Docker
• Two weeks ago: Berlin Buzzwords

• Java / Scala

• Flink / ElasticSearch / Kafka

• Scala-Spark / Kubernetes
Do we have a problem?
• Yes, there are different ecosystems!
• PyData

• Python / R

• Pandas / NumPy / PySpark/sparklyr / Docker
• Two weeks ago: Berlin Buzzwords

• Java / Scala

• Flink / ElasticSearch / Kafka

• Scala-Spark / Kubernetes
• SQL-based databases

• ODBC / JDBC

• Custom protocols (e.g. Postgres)
Why solve this?
• We build pipelines to move data

• Goal: end-to-end data products

Somewhere along the path we need to talk

• Avoid duplicate work / work on converters

• We don’t want Python vs R but use each of them where they’re best.
Apache Arrow at its core
• Main idea: common columnar representation of data in memory
• Provide libraries to access the data structures

• Broad support for many languages

• Create building blocks to form an ecosystem around it

• Implement adaptors for existing structures
Columnar Data
Previous Work
• CSV works really everywhere 

• Slow, untyped and row-wise

• Parquet is gaining traction in all ecosystems

• one of the major features and interaction points of Arrow

• Still, this serializes data

• RAM-Copy: 10GB/s on a Laptop

• DataFrame implementations look similar but still are incompatible
Languages
• C++, C(glib), Python, Ruby, R, Matlab

• C#

• Go

• Java

• JavaScript

• Rust
There’s a social component
• It’s not only APIs you need to bring together

• Communities are also quite distinct

• Get them talking!
Shipped with batteries
• There is more than just data structures

• Batteries in Arrow

• Vectorized Parquet reader: C++, Rust, Java(WIP)

C++ also supports ORC

• Gandiva: LLVM-based expression kernels

• Plasma: Shared-memory object store

• DataFusion: Rust-based query engine

• Flight: RPC protocol built on top of gRPC with zero-copy optimizations
Ecosystem
• RAPIDS: Analytics on the GPU

• Dremio: Data platform

• Turbodbc: columnar ODBC access in C++/Python

• Spark: fast Python and R bridge

• fletcher (pandas): Use Arrow instead of NumPy as backing storage

• fletcher (FPGA): Use Arrow on FPGAs

• Many more … https://arrow.apache.org/powered_by/
Ecosystem
Kartothek: 

• Heavily relies on Parquet adapter

• Uses Arrow’s type system which is more sophisticated than pandas’

• Using Arrow instead of building some components on their own allows
us to provide Kartothek access in other languages easily in the future
Does it work?
Does it work?
Everything is amazing on slides …
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
• ERP System in Java with JDBC access (no non-Java client)
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
• ERP System in Java with JDBC access (no non-Java client)
• ETL and Data Cleaning in Python
Does it work?
Everything is amazing on slides …
… so does this Arrow actually work?
Let’s take a real example with:
• ERP System in Java with JDBC access (no non-Java client)
• ETL and Data Cleaning in Python
• Analysis in R
Does it work?
Does it work?
Does it work?
Does it work?
WIP
Get started easily?
Up Next
• Build more adaptors, e.g. Postgres

• Building blocks for query engines on top of Arrow

• Datasets

• Analytical kernels

• DataFrame implementations directly on top of Arrow
Thanks
Slides at https://twitter.com/xhochy

Question here!

More Related Content

What's hot

Not Everything is an Object - Rocksolid Tour 2013
Not Everything is an Object  - Rocksolid Tour 2013Not Everything is an Object  - Rocksolid Tour 2013
Not Everything is an Object - Rocksolid Tour 2013
Gary Short
 

What's hot (13)

Why ruby and rails
Why ruby and railsWhy ruby and rails
Why ruby and rails
 
C# - Raise the bar with functional & immutable constructs (Dutch)
C# - Raise the bar with functional & immutable constructs (Dutch)C# - Raise the bar with functional & immutable constructs (Dutch)
C# - Raise the bar with functional & immutable constructs (Dutch)
 
Challenges in Building NLP Applications in Nepali Language
Challenges in Building NLP Applications in Nepali LanguageChallenges in Building NLP Applications in Nepali Language
Challenges in Building NLP Applications in Nepali Language
 
Not Everything is an Object - Rocksolid Tour 2013
Not Everything is an Object  - Rocksolid Tour 2013Not Everything is an Object  - Rocksolid Tour 2013
Not Everything is an Object - Rocksolid Tour 2013
 
PharoDAYS 2015: On Relational Databases by Guille Polito
PharoDAYS 2015: On Relational Databases by Guille PolitoPharoDAYS 2015: On Relational Databases by Guille Polito
PharoDAYS 2015: On Relational Databases by Guille Polito
 
Where Node.JS Meets iOS
Where Node.JS Meets iOSWhere Node.JS Meets iOS
Where Node.JS Meets iOS
 
Flink Forward SF 2017: Tzu-Li (Gordon) Tai - Joining the Scurry of Squirrels...
Flink Forward SF 2017: Tzu-Li (Gordon) Tai -  Joining the Scurry of Squirrels...Flink Forward SF 2017: Tzu-Li (Gordon) Tai -  Joining the Scurry of Squirrels...
Flink Forward SF 2017: Tzu-Li (Gordon) Tai - Joining the Scurry of Squirrels...
 
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
Flink Forward SF 2017: Trevor Grant - Introduction to Online Machine Learning...
 
sparklyr - Jeff Allen
sparklyr - Jeff Allensparklyr - Jeff Allen
sparklyr - Jeff Allen
 
IWMW 1998: Dataweb: the Horror Stories
IWMW 1998: Dataweb: the Horror StoriesIWMW 1998: Dataweb: the Horror Stories
IWMW 1998: Dataweb: the Horror Stories
 
파이콘한국2017 - Years with Python
파이콘한국2017 - Years with Python파이콘한국2017 - Years with Python
파이콘한국2017 - Years with Python
 
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...Trends in Programming Technology you might want to keep an eye on af Bent Tho...
Trends in Programming Technology you might want to keep an eye on af Bent Tho...
 
Road to Dynamic LINQ - Part 2
 Road to Dynamic LINQ - Part 2 Road to Dynamic LINQ - Part 2
Road to Dynamic LINQ - Part 2
 

Similar to PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems

The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
Lucidworks
 
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
ScyllaDB
 

Similar to PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems (20)

PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Hunting for anglerfish in datalakes
Hunting for anglerfish in datalakesHunting for anglerfish in datalakes
Hunting for anglerfish in datalakes
 
The Final Frontier
The Final FrontierThe Final Frontier
The Final Frontier
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
 
2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review2015 Data Science Summit @ dato Review
2015 Data Science Summit @ dato Review
 
Rust is for "Big Data"
Rust is for "Big Data"Rust is for "Big Data"
Rust is for "Big Data"
 
Frontend as a first class citizen
Frontend as a first class citizenFrontend as a first class citizen
Frontend as a first class citizen
 
Scaling with swagger
Scaling with swaggerScaling with swagger
Scaling with swagger
 
C# .NET - Um overview da linguagem
C# .NET - Um overview da linguagem C# .NET - Um overview da linguagem
C# .NET - Um overview da linguagem
 
Ruby - The Hard Bits
Ruby - The Hard BitsRuby - The Hard Bits
Ruby - The Hard Bits
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
cadec-2017-golang
cadec-2017-golangcadec-2017-golang
cadec-2017-golang
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
Scylla Summit 2022: Learning Rust the Hard Way for a Production Kafka+ScyllaD...
 
Introduction to Go
Introduction to GoIntroduction to Go
Introduction to Go
 
groovy & grails - lecture 1
groovy & grails - lecture 1groovy & grails - lecture 1
groovy & grails - lecture 1
 

More from Uwe Korn

More from Uwe Korn (10)

Going beyond Apache Parquet's default settings
Going beyond Apache Parquet's default settingsGoing beyond Apache Parquet's default settings
Going beyond Apache Parquet's default settings
 
pandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fastpandas.(to/from)_sql is simple but not fast
pandas.(to/from)_sql is simple but not fast
 
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
PyConDE / PyData Karlsruhe 2017 – Connecting PyData to other Big Data Landsca...
 
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detailApacheCon Europe Big Data 2016 – Parquet in practice & detail
ApacheCon Europe Big Data 2016 – Parquet in practice & detail
 
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copyFulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
Fulfilling Apache Arrow's Promises: Pandas on JVM memory without a copy
 
Scalable Scientific Computing with Dask
Scalable Scientific Computing with DaskScalable Scientific Computing with Dask
Scalable Scientific Computing with Dask
 
Extending Pandas using Apache Arrow and Numba
Extending Pandas using Apache Arrow and NumbaExtending Pandas using Apache Arrow and Numba
Extending Pandas using Apache Arrow and Numba
 
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
PyData Amsterdam 2018 – Building customer-visible data science dashboards wit...
 
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
PyData London 2017 – Efficient and portable DataFrame storage with Apache Par...
 
How Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperabilityHow Apache Arrow and Parquet boost cross-language interoperability
How Apache Arrow and Parquet boost cross-language interoperability
 

Recently uploaded

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 

Recently uploaded (20)

➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 

PyData Frankfurt - (Efficient) Data Exchange with "Foreign" Ecosystems

  • 1. (Efficient) Data Exchange with "Foreign" Ecosystems Uwe Korn – QuantCo – 2nd July 2019
  • 2. About me • Engineering at QuantCo • Apache {Arrow, Parquet} PMC • Focus on Python but interact with R, Java, SAS, … @xhochy @xhochy mail@uwekorn.com https://uwekorn.com
  • 6. Python & R … & Java & Rust & Javascript & C# & Matlab & …
  • 7. Do we have a problem?
  • 8. Do we have a problem? • Yes, there are different ecosystems!
  • 9. Do we have a problem? • Yes, there are different ecosystems! • PyData • Python / R • Pandas / NumPy / PySpark/sparklyr / Docker
  • 10. Do we have a problem? • Yes, there are different ecosystems! • PyData • Python / R • Pandas / NumPy / PySpark/sparklyr / Docker • Two weeks ago: Berlin Buzzwords • Java / Scala • Flink / ElasticSearch / Kafka • Scala-Spark / Kubernetes
  • 11. Do we have a problem? • Yes, there are different ecosystems! • PyData • Python / R • Pandas / NumPy / PySpark/sparklyr / Docker • Two weeks ago: Berlin Buzzwords • Java / Scala • Flink / ElasticSearch / Kafka • Scala-Spark / Kubernetes • SQL-based databases • ODBC / JDBC • Custom protocols (e.g. Postgres)
  • 12. Why solve this? • We build pipelines to move data • Goal: end-to-end data products
 Somewhere along the path we need to talk • Avoid duplicate work / work on converters • We don’t want Python vs R but use each of them where they’re best.
  • 13.
  • 14. Apache Arrow at its core • Main idea: common columnar representation of data in memory • Provide libraries to access the data structures • Broad support for many languages • Create building blocks to form an ecosystem around it • Implement adaptors for existing structures
  • 16. Previous Work • CSV works really everywhere • Slow, untyped and row-wise • Parquet is gaining traction in all ecosystems • one of the major features and interaction points of Arrow • Still, this serializes data • RAM-Copy: 10GB/s on a Laptop • DataFrame implementations look similar but still are incompatible
  • 17. Languages • C++, C(glib), Python, Ruby, R, Matlab • C# • Go • Java • JavaScript • Rust
  • 18. There’s a social component • It’s not only APIs you need to bring together • Communities are also quite distinct • Get them talking!
  • 19. Shipped with batteries • There is more than just data structures • Batteries in Arrow • Vectorized Parquet reader: C++, Rust, Java(WIP)
 C++ also supports ORC • Gandiva: LLVM-based expression kernels • Plasma: Shared-memory object store • DataFusion: Rust-based query engine • Flight: RPC protocol built on top of gRPC with zero-copy optimizations
  • 20. Ecosystem • RAPIDS: Analytics on the GPU • Dremio: Data platform • Turbodbc: columnar ODBC access in C++/Python • Spark: fast Python and R bridge • fletcher (pandas): Use Arrow instead of NumPy as backing storage • fletcher (FPGA): Use Arrow on FPGAs • Many more … https://arrow.apache.org/powered_by/
  • 21. Ecosystem Kartothek: • Heavily relies on Parquet adapter • Uses Arrow’s type system which is more sophisticated than pandas’ • Using Arrow instead of building some components on their own allows us to provide Kartothek access in other languages easily in the future
  • 23. Does it work? Everything is amazing on slides …
  • 24. Does it work? Everything is amazing on slides … … so does this Arrow actually work?
  • 25. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with:
  • 26. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with: • ERP System in Java with JDBC access (no non-Java client)
  • 27. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with: • ERP System in Java with JDBC access (no non-Java client) • ETL and Data Cleaning in Python
  • 28. Does it work? Everything is amazing on slides … … so does this Arrow actually work? Let’s take a real example with: • ERP System in Java with JDBC access (no non-Java client) • ETL and Data Cleaning in Python • Analysis in R
  • 34. Up Next • Build more adaptors, e.g. Postgres • Building blocks for query engines on top of Arrow • Datasets • Analytical kernels • DataFrame implementations directly on top of Arrow