The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Analytics

  1. The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Analytics
     Julien Le Dem, Principal Architect, Dremio; VP Apache Parquet; Apache Arrow PMC
     © 2016 Dremio Corporation @DremioHQ
  2. Julien Le Dem (@J_)
     • Architect at @DremioHQ
     • Formerly Tech Lead at Twitter on Data Platforms
     • Creator of Parquet
     • Apache member
     • Apache PMCs: Arrow, Incubator, Pig, Parquet
  3. Agenda
     • Benefits of Columnar representation
       – Immutable on disk (Apache Parquet)
       – Mutable on disk (Apache Kudu)
       – In memory (Apache Arrow)
     • Community Driven Standard
     • Interoperability and Ecosystem
  4. Benefits of Columnar formats (@EmrgencyKittens)
  5. Columnar layout
     [figure: a logical table representation, shown in row layout and in column layout]
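The difference between the two layouts on this slide can be sketched in a few lines. This is only an illustration: the table contents and the function names `row_layout` and `column_layout` are made up for the example and do not come from the talk.

```python
# Hypothetical 3-column table; names and values are illustrative only.
rows = [
    ("a1", "b1", "c1"),
    ("a2", "b2", "c2"),
    ("a3", "b3", "c3"),
]


def row_layout(table):
    """Store values row by row: a1 b1 c1 a2 b2 c2 ..."""
    return [value for row in table for value in row]


def column_layout(table):
    """Store values column by column: a1 a2 a3 b1 b2 b3 ..."""
    return [value for column in zip(*table) for value in column]


row_major = row_layout(rows)
column_major = column_layout(rows)
print(row_major)
print(column_major)
```

In the column layout all values of one column sit next to each other, so a scan over a single column touches one contiguous run instead of striding across every row.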
  6. Mutable or Immutable Storage
     • Different trade-offs
       – Immutable (Parquet):
         • Higher write throughput (no random modification after completion).
         • Easy to share, replicate, access concurrently.
         • Modifications require rewrite of dataset.
         • No operational overhead (no extra service, just your file system).
       – Mutable (Kudu):
         • More flexible trade-off between update speed and read speed.
         • Low latency for short accesses (primary key indexes and quorum replication).
         • Database-like semantics (initially single-row ACID).
         • Needs to be managed (new daemon).
  7. On Disk and in Memory
     • Different trade-offs
       – On disk: Storage.
         • Accessed by multiple queries.
         • Priority to I/O reduction (but still needs good CPU throughput).
         • Mostly streaming access.
       – In memory: Transient.
         • Specific to one query execution.
         • Priority to CPU throughput (but still needs good I/O).
         • Streaming and random access.
  8. Parquet on disk columnar format
  9. Parquet on disk columnar format
     • Nested data structures
     • Compact format:
       – type aware encodings
       – better compression
     • Optimized I/O:
       – Projection push down (column pruning)
       – Predicate push down (filters based on stats)
  10. Access only the data you need
     [figure: the same table (columns a, b, c; rows 1 to 5) shown three times: Columnar + Statistics = Read only the data you need!]
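How projection push down and predicate push down combine can be sketched with a toy in-memory stand-in for a columnar file. Everything here is an assumption made for illustration: the row-group dictionaries, the `make_row_group` and `scan` helpers, and the single "greater than or equal" predicate. It is not Parquet's actual file layout or reader API.

```python
# Hypothetical stand-in for a columnar file: each row group stores its
# columns separately, plus min/max statistics per column.
def make_row_group(columns):
    stats = {name: (min(values), max(values)) for name, values in columns.items()}
    return {"columns": columns, "stats": stats}


row_groups = [
    make_row_group({"a": [1, 2, 3], "b": [10, 20, 30], "c": [7, 8, 9]}),
    make_row_group({"a": [4, 5, 6], "b": [40, 50, 60], "c": [1, 2, 3]}),
]


def scan(groups, projection, filter_column, lower_bound):
    """Return projected rows where filter_column >= lower_bound."""
    result = []
    groups_read = 0
    for group in groups:
        _, column_max = group["stats"][filter_column]
        if column_max < lower_bound:
            # Predicate push down: the statistics prove no row can match,
            # so the whole row group is skipped without reading its data.
            continue
        groups_read += 1
        # Projection push down: only the needed columns are touched.
        needed = set(projection) | {filter_column}
        cols = {name: group["columns"][name] for name in needed}
        for i, value in enumerate(cols[filter_column]):
            if value >= lower_bound:
                result.append(tuple(cols[name][i] for name in projection))
    return result, groups_read


rows, groups_read = scan(row_groups, projection=["b"], filter_column="a", lower_bound=5)
print(rows, groups_read)
```

Column `c` is never read, and the first row group is skipped entirely because its maximum value of `a` is below the bound.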
  11. Parquet nested representation
     Schema:
       Document
         DocId
         Links
           Backward
           Forward
         Name
           Language
             Code
             Country
           Url
     Columns: docid, links.backward, links.forward, name.language.code, name.language.country, name.url
     Borrowed from the Google Dremel paper
     https://blog.twitter.com/2013/dremel-made-simple-with-parquet
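The mapping from the nested schema to the flat list of columns can be sketched as a walk over the schema tree that emits one dotted path per leaf. The dict-of-dicts encoding of the schema and the `leaf_columns` helper are assumptions for this example. Repetition and definition levels, which Parquet also needs for nested data, are left out.

```python
# Nested schema from the slide, written as a hypothetical dict-of-dicts.
# None marks a leaf (primitive) field.
schema = {
    "DocId": None,
    "Links": {"Backward": None, "Forward": None},
    "Name": {
        "Language": {"Code": None, "Country": None},
        "Url": None,
    },
}


def leaf_columns(node, prefix=()):
    """Return one dotted, lower-cased path per leaf field, in schema order."""
    paths = []
    for field, child in node.items():
        path = prefix + (field.lower(),)
        if child is None:
            paths.append(".".join(path))
        else:
            paths.extend(leaf_columns(child, path))
    return paths


columns = leaf_columns(schema)
print(columns)
```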
  12. Kudu data representation
  13. Kudu Tablets
     • Typed columns
     • Inserts buffered in an in-memory store (like HBase’s memstore)
     • Flushed to disk: columnar layout, similar to Apache Parquet
     • Updates use MVCC (updates tagged with timestamp, not in-place)
       – Allow “SELECT AS OF <timestamp>” queries and consistent cross-tablet scans
     • Near-optimal read path for “current time” scans
       – No per-row branches, fast vectorized decoding and predicate evaluation
     • Performance worsens based on number of recent updates
  14. Kudu
     • High throughput for big scans (columnar storage and replication)
       – Goal: within 2x of Parquet
     • Low latency for short accesses (primary key indexes and quorum replication)
       – Goal: 1ms read/write on SSD
     • Database-like semantics (initially single-row ACID)
     • Relational data model
       – SQL query
       – “NoSQL” style scan/insert/update (Java client)
  15. LSM vs Kudu
     • LSM – Log Structured Merge (Cassandra, HBase, etc.)
       – Inserts and updates all go to an in-memory map (MemStore) and later flush to on-disk files (HFile/SSTable)
       – Reads perform an on-the-fly merge of all on-disk HFiles
     • Kudu
       – Shares some traits (memstores, compactions)
       – More complex.
       – Slower writes in exchange for faster reads (especially scans)
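The LSM behaviour described on this slide can be sketched with a toy store: writes land in an in-memory map, the map is flushed to an immutable sorted run when it fills up, and reads merge all runs on the fly. `TinyLSM` and its `flush_threshold` parameter are invented for this example. Real systems such as HBase or Cassandra add write-ahead logs, compaction and on-disk file formats that are not modelled here.

```python
class TinyLSM:
    """Toy log-structured merge store, held entirely in memory."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}
        self.flushed = []  # immutable sorted runs, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        # Inserts and updates both go to the in-memory map.
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flushed.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check the memstore, then runs from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for run in reversed(self.flushed):
            if key in run:
                return run[key]
        return None

    def scan(self):
        # A scan merges every run on the fly; later runs override earlier ones.
        merged = {}
        for run in self.flushed:
            merged.update(run)
        merged.update(self.memstore)
        return sorted(merged.items())


store = TinyLSM(flush_threshold=2)
store.put("k1", 1)
store.put("k2", 2)   # triggers the first flush
store.put("k1", 10)  # update goes to the memstore, no lookup of the old value
store.put("k3", 3)   # triggers the second flush
print(store.get("k1"), store.scan())
```

The write path never looks at existing data, which keeps writes cheap, while every read pays for merging the runs.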
  16. Kudu trade-offs: write
     • Batch inserts are slower than Parquet
       – Extra bloom filter lookup per insert
     • Random updates are slower than HBase
       – HBase model allows random updates without incurring a disk seek
       – Kudu requires a key lookup before update, bloom lookup before insert
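The extra bloom filter lookup per insert can be sketched as follows. This assumes the lookup serves to check whether the key already exists before inserting, which the slide does not spell out. `BloomFilter`, `insert` and the sizing parameters are hypothetical and are not Kudu's implementation.

```python
import hashlib


class BloomFilter:
    """Small bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def insert(existing_keys, bloom, key):
    """Insert `key` unless it already exists.

    The bloom filter is consulted first; the (here simulated) costly key
    lookup only happens when the filter says the key might be present.
    """
    if bloom.might_contain(key) and key in existing_keys:
        return False
    existing_keys.add(key)
    bloom.add(key)
    return True


keys = set()
bloom = BloomFilter()
first = insert(keys, bloom, "row-1")
second = insert(keys, bloom, "row-1")
print(first, second)
```

An append-only format has no such check on its write path, which is one way to read the slide's comparison with Parquet.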
  17. Kudu trade-offs: read
     • Scan speed is close to Parquet and faster than HBase
       – Columnar on disk like Parquet
       – Only one DiskRowSet contains updates for a given row. Fewer file lookups than HBase (but more than Parquet).
     • Single-row reads may be slower than HBase (and both are faster than Parquet)
       – Columnar design is optimized for scans
       – Future: may introduce “column groups” for applications where single-row access is more important
       – Especially slow at reading a row that has had many recent updates (e.g. YCSB “zipfian”)
  18. Kudu is…
     – NOT a SQL database
       • “Bring Your Own SQL”
     – NOT a filesystem
       • Data must have tabular structure
     – NOT an in-memory database
       • Very fast for memory-sized workloads, but can operate on larger data too
  19. Arrow in memory columnar format
  20. Arrow goals
     • Well-documented and cross-language compatible
     • Designed to take advantage of modern CPU characteristics
     • Embeddable in execution engines, storage layers, etc.
     • Interoperable
  21. Arrow in memory columnar format
     • Nested Data Structures
     • Maximize CPU throughput
       – Pipelining
       – SIMD
       – Cache locality
     • Scatter/gather I/O
  22. CPU pipeline
  23. Minimize CPU cache misses
     A cache miss costs from 10 to hundreds of cycles, depending on the cache level.
  24. Focus on CPU Efficiency
     [figure: Traditional Memory Buffer compared with Arrow Memory Buffer]
     • Cache Locality
     • Super-scalar & vectorized operation
     • Minimal Structure Overhead
     • Constant value access
       – With minimal structure overhead
     • Operate directly on columnar compressed data
  25. Columnar data
     persons = [{
       name: 'Joe',
       age: 18,
       phones: [
         '555-111-1111',
         '555-222-2222'
       ]
     }, {
       name: 'Jack',
       age: 37,
       phones: [
         '555-333-3333'
       ]
     }]
  26. Java: Memory Management
     • Chunk-based managed allocator
       – Built on top of Netty’s JEMalloc implementation
     • Create a tree of allocators
       – Limit and transfer semantics across allocators
       – Leak detection and location accounting
     • Wrap native memory from other applications
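The "tree of allocators" with limits and leak detection can be sketched as simple accounting. The slide is about Arrow's Java library; the Python class below is a hypothetical illustration of the idea only, not that library's API. It tracks byte counts and allocates no real memory.

```python
class Allocator:
    """Accounting-only allocator node; a child's usage counts against its parents."""

    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit
        self.parent = parent
        self.allocated = 0

    def new_child(self, name, limit):
        return Allocator(name, limit, parent=self)

    def allocate(self, size):
        # Check the limit of this allocator and of every ancestor first,
        # so a failed request leaves all counters untouched.
        node = self
        while node is not None:
            if node.allocated + size > node.limit:
                raise MemoryError(f"{node.name}: limit of {node.limit} bytes exceeded")
            node = node.parent
        node = self
        while node is not None:
            node.allocated += size
            node = node.parent

    def release(self, size):
        node = self
        while node is not None:
            node.allocated -= size
            node = node.parent

    def close(self):
        # Leak detection: anything still accounted for was never released.
        if self.allocated != 0:
            raise RuntimeError(f"{self.name}: leaked {self.allocated} bytes")


root = Allocator("root", limit=100)
query = root.new_child("query", limit=60)
query.allocate(50)
print(root.allocated, query.allocated)
```

Giving each query or operator its own child allocator bounds its memory use and lets a leak be traced to the allocator that failed to release it.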
  27. Arrow RPC & IPC
  28. Common Message Pattern
     • Schema Negotiation
       – Logical description of structure
       – Identification of dictionary encoded nodes
     • Dictionary Batch
       – Dictionary ID, Values
     • Record Batch
       – Batches of records up to 64K
       – Leaf nodes up to 2B values
     [figure: Schema Negotiation, Dictionary Batch, then a sequence of Record Batches; labels: “1..N Batches”, “0..N Batches”]
  29. Record Batch Construction
     [figure: message sequence Schema Negotiation, Dictionary Batch, Record Batch, Record Batch, Record Batch; one record batch is expanded into its vectors]
     Vectors in the record batch for { name: 'Joe', age: 18, phones: ['555-111-1111', '555-222-2222'] }:
       data header (describes offsets into data)
       name (bitmap), name (offset), name (data)
       age (bitmap), age (data)
       phones (bitmap), phones (list offset), phones (offset), phones (data)
     Each box (vector) is contiguous memory.
     The entire record batch is contiguous on wire.
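The three vectors behind a variable-length column such as `name` (bitmap, offset, data) can be sketched as follows. It is a simplified illustration: the helper names are invented, one bit per value marks validity with the least significant bit first, and the padding and alignment that a real Arrow buffer would have are omitted.

```python
def build_string_vector(values):
    """Encode a list of str-or-None as (validity bitmap, offsets, data)."""
    bitmap = bytearray((len(values) + 7) // 8)
    offsets = [0]
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # mark value i as present
            data += value.encode("utf-8")
        # A null value adds no bytes, so its start and end offsets are equal.
        offsets.append(len(data))
    return bytes(bitmap), offsets, bytes(data)


def get_string(vector, i):
    """Random access to value i without scanning the earlier values."""
    bitmap, offsets, data = vector
    if not bitmap[i // 8] & (1 << (i % 8)):
        return None
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


names = build_string_vector(["Joe", None, "Jack"])
print(names)
print(get_string(names, 2))
```

Each of the three parts is one contiguous byte run, which is what allows a record batch to be laid out contiguously on the wire.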
  30. Moving Data Between Systems
     RPC
     • Avoid Serialization & Deserialization
     • Layer TBD: focused on supporting vectored I/O
       – Scatter/gather reads/writes against socket
     IPC
     • Alpha implementation using memory mapped files
       – Moving data between Python and Drill
     • Working on shared allocation approach
       – Shared reference counting and well-defined ownership semantics
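Sharing a column through a memory mapped file can be sketched with the standard library alone. This is not the Arrow IPC implementation mentioned on the slide. The file name, the little-endian int32 layout and both helper functions are assumptions for the example, and the reader copies the values into a Python list at the end, which a real zero-copy consumer would avoid.

```python
import mmap
import os
import struct
import tempfile


def write_int32_column(path, values):
    """Producer side: write the raw column buffer (little-endian int32)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


def read_int32_column(path):
    """Consumer side: map the file and decode the buffer in place.

    No per-value parsing format is involved; the mapped bytes already are
    the column. The final list() is only for convenient printing.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            count = len(mapped) // 4
            return list(struct.unpack_from(f"<{count}i", mapped))


with tempfile.TemporaryDirectory() as directory:
    column_path = os.path.join(directory, "age.bin")  # hypothetical file name
    write_int32_column(column_path, [18, 37])
    ages = read_int32_column(column_path)

print(ages)
```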
  31. Shared Need => Open Source Opportunity
     • “We are also considering switching to a columnar canonical in-memory format for data that needs to be materialized during query processing, in order to take advantage of SIMD instructions” (Impala Team)
     • “A large fraction of the CPU time is spent waiting for data to be fetched from main memory…we are designing cache-friendly algorithms and data structures so Spark applications will spend less time waiting to fetch data from memory and more time doing useful work” (Spark Team)
     • “Drill provides a flexible hierarchical columnar data model that can represent complex, highly dynamic and evolving data models and allows efficient processing of it without need to flatten or materialize.” (Drill Team)
  32. Community Driven Standard
  33. An open source standard
     • Parquet: Common need for on disk columnar.
     • Arrow: Common need for in memory columnar.
     • Arrow building on the success of Parquet.
     • Benefits:
       – Share the effort
       – Create an ecosystem
     • Standard from the start
  34. The Apache Arrow Project
     • New Top-level Apache Software Foundation project
       – Announced Feb 17, 2016
     • Focused on Columnar In-Memory Analytics
       1. 10-100x speedup on many workloads
       2. Common data layer enables companies to choose best of breed systems
       3. Designed to work with any programming language
       4. Support for both relational and complex data as-is
     • Developers from 13+ major open source projects involved
       – A significant % of the world’s data will be processed through Arrow!
     Projects: Calcite, Cassandra, Deeplearning4j, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, Storm, R
  35. Interoperability and Ecosystem
  36. High Performance Sharing & Interchange
     Today:
     • Each system has its own internal memory format
     • 70-80% CPU wasted on serialization and deserialization
     • Functionality duplication and unnecessary conversions
     With Arrow:
     • All systems utilize the same memory format
     • No overhead for cross-system communication
     • Projects can share functionality (e.g. Parquet-to-Arrow reader)
  37. Language Bindings
     Parquet
     • Target Languages
       – Java
       – CPP (underway)
       – Python & Pandas (underway)
     • Engines integration:
       – Faster to list those who don’t support it
     Arrow
     • Target Languages
       – Java (beta)
       – CPP (underway)
       – Python & Pandas (underway)
       – R
       – Julia
     • Initial Focus
       – Read a structure
       – Write a structure
       – Manage Memory
     Kudu
     • Target Languages
       – Java
       – CPP
     • Engines integration:
       – MapReduce
       – Spark
       – Impala
       – Drill
  38. Example data exchanges:
  39. RPC: Query execution
     The memory representation is sent over the wire. No serialization overhead.
     [figure: three Scanners read Parquet files with projection push down (read only a and b) and feed three Partial Agg operators; Arrow batches are shuffled to three Agg operators, which produce the Result]
  40. RPC: future arrow based interchange
     The memory representation is sent over the wire. No serialization overhead.
     [figure: SQL execution made of Scanner and Operator pairs; each Scanner applies projection/predicate push down to a Tablet (Mem, Disk) and receives Arrow batches]
  41. IPC: Python with Spark or Drill
     [figure: a SQL engine containing SQL Operator 1 and SQL Operator 2, and a Python process running a user defined function; arrows labeled “reads”]
  42. What’s Next
     • Parquet – Arrow conversion for Python & C++
     • Arrow IPC Implementation
     • Kudu – Arrow integration
     • Apache {Spark, Drill} to Arrow Integration
       – Faster UDFs, Storage interfaces
     • Support for integration with Intel’s Persistent Memory library via Apache Mnemonic
  43. Get Involved
     • Join the community
       – dev@{arrow,parquet,kudu.incubator}.apache.org
       – Slack:
         • https://apachearrowslackin.herokuapp.com/
         • https://getkudu-slack.herokuapp.com/
       – http://{arrow,parquet,kudu}.apache.org
       – Follow @Apache{Parquet,Arrow,Kudu}
