O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci e...
U B E R | Data
Xiang Fu 

Sr Software Engineer II @ Uber

Streaming Analytics Team
Quick Introduction
U B E R | Data
Uber Scale
Messages Bytes
Apache Kafka Trillion per day ~PB per day

Streaming Analytics
Platform
Billions ...
U B E R | Data
Agenda
● Pinot @ Uber

● Architecture

● Case Study

● Pinot Perf
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci e...
U B E R | Data
Experimentation platform (Internal Dashboard)
A / B Tests

See progress of
tests in real-time
U B E R | Data
UberEats (Realtime User Facing Product)
UberEats Restaurant
Manager

“What is my revenue for
past 90 days?”
U B E R | Data
Many More…
• UberPool Analytics
• Mobile Analytics
...
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci e...
U B E R | Data
Pinot Workflow
Athena-X
Hive/Spark SQL/oozie
● Projection, Filtering

● Window Aggregation 

● Join
U B E R | Data
Pinot Realtime: Self Service
● Projection, Filtering

● Window Aggregation 

● Join
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci e...
U B E R | Data
Pinot Data Model
Column Name Column Type Filtering Compression Indexing
RiderId SingleValue/
Dimension
Yes ...
U B E R | Data
Pinot Data Ingestion
Realtime Ingestion:
Consumer Type Scalability Consistency
High Level Consumer Hard to ...
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci e...
U B E R | Data
Pinot Realtime Ingestion
Hardware 4 base SKU boxes(24 cores, 128G RAM)
Consumer Type HLC LLC
Peak Traffic(m...
U B E R | Data
Pinot/Druid Data Size
Raw Data: 

500M Rows, 30 columns

Raw Json: 391.9G
Three Storage Tiers 

in Pinot/Dr...
U B E R | Data
Pinot/Druid Query Performance
Max Duration:
select max(duration) from trips
Count All Grouped by City:
sele...
U B E R | Data
Pinot/Druid Concurrent Query
Query: select count(*) from trips group by city_id
U B E R | Data
Guaranteed SLA for Site Facing Products
Aggregation on Rider
trips:
select count(*) from trips
where riderI...
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci e...
Próximos SlideShares
Carregando em…5
×

Pinot: Near Realtime Analytics @ Uber

17.549 visualizações

Publicada em

Lnkd meetup slides 0426

Publicada em: Tecnologia
  • This presentation is referenced in this comparison of ClickHouse, Druid and Pinot: https://medium.com/@leventov/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7
       Responder 
    Tem certeza que deseja  Sim  Não
    Insira sua mensagem aqui

Pinot: Near Realtime Analytics @ Uber

  1. 1. Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent. Pinot: Near Real-Time Analytics @ Uber
  2. 2. U B E R | Data Xiang Fu Sr Software Engineer II @ Uber Streaming Analytics Team Quick Introduction
  3. 3. U B E R | Data Uber Scale Messages Bytes Apache Kafka Trillion per day ~PB per day Streaming Analytics Platform Billions processed per day 100s of TB processed per day Pinot 100s of Billions 10s of TB
  4. 4. U B E R | Data Agenda ● Pinot @ Uber ● Architecture ● Case Study ● Pinot Perf
  5. 5. Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent. Pinot @ Uber
  6. 6. U B E R | Data Experimentation platform (Internal Dashboard) A / B Tests See progress of tests in real-time
  7. 7. U B E R | Data UberEats (Realtime User Facing Product) UberEats Restaurant Manager “What is my revenue for past 90 days?”
  8. 8. U B E R | Data Many More… • UberPool Analytics • Mobile Analytics ...
  9. 9. Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent. Architecture
  10. 10. U B E R | Data Pinot Workflow Athena-X Hive/Spark SQL/oozie ● Projection, Filtering ● Window Aggregation ● Join
  11. 11. U B E R | Data Pinot Realtime: Self Service ● Projection, Filtering ● Window Aggregation ● Join
  12. 12. Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent. Case Study
  13. 13. U B E R | Data Pinot Data Model Column Name Column Type Filtering Compression Indexing RiderId SingleValue/ Dimension Yes Dictionary Sorted DriverId SingleValue/ Dimension Yes Dictionary Inverted TripId SingleValue/ Dimension No No Dictionary No PickUpPoints MultiValue/ Dimension No No Dictionary No TripFare SingleValue/ Metric No No Dictionary No Step 1 List Column Spec Step 2 Analyze Query Pattern Step 3 Decide Compression & Indexing Strategy
  14. 14. U B E R | Data Pinot Data Ingestion Realtime Ingestion: Consumer Type Scalability Consistency High Level Consumer Hard to scale beyond one node Sacrificing consistency during failures Low Level Consumer Scalable beyond one node Strong consistency guarantees even during failure Segment Persistence: 500k msg or 6 hours Offline Ingestion: Using Oozie to schedule daily incremental backfill from Hive to Pinot
  15. 15. Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent. Pinot Perf
  16. 16. U B E R | Data Pinot Realtime Ingestion Hardware 4 base SKU boxes(24 cores, 128G RAM) Consumer Type HLC LLC Peak Traffic(msg/sec/box) 20k 200k Peak Traffic(bytes/sec/box) 4M 40M Storage Kafka Pinot Total Data Volume(GB) 500 60
  17. 17. U B E R | Data Pinot/Druid Data Size Raw Data: 
 500M Rows, 30 columns
 Raw Json: 391.9G Three Storage Tiers 
 in Pinot/Druid - Segments in Deep Storage 
 (NFS or HDFS) - Local Disk Cache - Memory
  18. 18. U B E R | Data Pinot/Druid Query Performance Max Duration: select max(duration) from trips Count All Grouped by City: select count(*) from trips group by city_id top 10000 Count All in One Month: select count(*) from trips where Month = '201601' Count All in SF: select count(*) from trips where city_id=1 group by Month Unique Drivers in SF: select distinctCountHLL(driver_uuid) from trips where city_id=1 Unique Drivers By Date: select distinctCountHLL(driver_uuid) from trips group by Date
  19. 19. U B E R | Data Pinot/Druid Concurrent Query Query: select count(*) from trips group by city_id
  20. 20. U B E R | Data Guaranteed SLA for Site Facing Products Aggregation on Rider trips: select count(*) from trips where riderId = x and date > 20170225
  21. 21. Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent. Thank you

×