Pinot: Realtime Distributed OLAP datastore

Kishore Gopalakrishna
Senior Staff Software Engineer, Data Infrastructure at LinkedIn
Tuesday, August 18, 15
Agenda
• Pinot @ LinkedIn - Current
• Pinot - Architecture
• Pinot Operations
• Pinot @ LinkedIn - Future
WVMP (Who Viewed My Profile)
Slice and Dice Metrics
Pinot @ LinkedIn
• Customers
• Members
• Internal tools
Pinot @ LinkedIn
• 100B documents
• 1B documents ingested per day
• 100M queries per day
• 10s of ms latency
• 30 tables in prod, 250 * 3 std app nodes
Key features
• SQL-like interface
• Columnar storage and indexing
• Real-time data load
(S)QL: Filters and Aggs
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y' AND
action = 'stop'
(S)QL: Group By
SELECT count(*)
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
'day' >= 15949 AND 'day' <= 15963 AND
paid = 'y'
GROUP BY action
(S)QL: ORDER BY and LIMIT
SELECT *
FROM companyFollowHistoricalEvents
WHERE entityId = 121011 AND
entityId = 1000 AND
action = 'start'
ORDER BY creationTime DESC LIMIT 1
What's not supported
• JOIN: unpredictable performance
• Not a source of truth
• Mutation
Pinot
• Data flow
• Query Execution
• How to use/operate
• Pinot @ LinkedIn - Future
Pinot Architecture
[Diagram: Raw Data -> Kafka -> Real-time servers; Raw Data -> Hadoop -> Historical servers; Queries -> Broker; Helix]
Pinot
• Pinot segments
Pinot Segment layout: Columnar storage
Pinot Segment layout: Sorted Forward Index
Pinot Segment layout: Other techniques
• Indexes: Inverted index, Bitmap, RoaringBitmap
• Compression: Dictionary encoding, P4Delta
• Multi-valued columns, skip lists
• HyperLogLog for unique counts
• T-digest for percentiles and quantiles
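
To make three of the techniques above concrete, here is a minimal sketch of dictionary encoding, an inverted index, and a lookup on a sorted column. It is an illustration under simplifying assumptions, not Pinot's implementation: the function names are hypothetical, and plain Python lists stand in for the compressed bitmaps (such as RoaringBitmap) that a real segment would use.

```python
from bisect import bisect_left, bisect_right


def dictionary_encode(values):
    """Return (sorted dictionary, per-document dictionary ids) for one column."""
    dictionary = sorted(set(values))
    ids = [bisect_left(dictionary, v) for v in values]
    return dictionary, ids


def build_inverted_index(ids):
    """Map each dictionary id to the ascending list of doc ids that hold it."""
    index = {}
    for doc_id, dict_id in enumerate(ids):
        index.setdefault(dict_id, []).append(doc_id)
    return index


def sorted_doc_range(sorted_ids, dict_id):
    """For a column stored in sorted order, return the [start, end) doc id range."""
    return bisect_left(sorted_ids, dict_id), bisect_right(sorted_ids, dict_id)


# Hypothetical "action" column with five documents.
actions = ["start", "stop", "start", "start", "stop"]
dictionary, ids = dictionary_encode(actions)
inverted = build_inverted_index(ids)
stop_id = dictionary.index("stop")
stop_docs = inverted[stop_id]
stop_range = sorted_doc_range(sorted(ids), stop_id)
```

When the column is sorted, a value maps to one contiguous doc id range, so a pair of offsets can answer the same filter that would otherwise need an inverted index.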

Data-aware pre-computation: Star-tree index
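
The slide gives no detail on the star-tree itself, so the following is only a simplified illustration of the pre-computation idea, not the star-tree algorithm: it sums a metric for every combination of concrete dimension values and a "*" wildcard. The row data, the `precompute` function, and the `STAR` marker are all hypothetical.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # wildcard meaning "any value of this dimension"


def precompute(rows, dimensions, metric):
    """Sum `metric` for every mix of concrete dimension values and STAR."""
    table = defaultdict(int)
    for row in rows:
        # Each dimension contributes either its own value or the wildcard.
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            table[key] += row[metric]
    return dict(table)


rows = [
    {"country": "US", "browser": "chrome", "clicks": 3},
    {"country": "US", "browser": "safari", "clicks": 2},
    {"country": "IN", "browser": "chrome", "clicks": 5},
]
cube = precompute(rows, ["country", "browser"], "clicks")
us_clicks = cube[("US", STAR)]
chrome_clicks = cube[(STAR, "chrome")]
```

Answering "clicks for US" becomes a single lookup instead of a scan, at the cost of storing up to 2^d keys per row for d dimensions. That is the latency versus space trade-off; a production index would be more selective about what it materializes.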
Pinot
• Query Execution
Pinot Query Execution: Distributed
[Diagram: Brokers, Helix, and Servers holding boxes labeled S1, S2, S3, each shown twice]
1. Query
2. Fetch routing table from Helix
3. Scatter request
4. Process request and send response
5. Gather response
6. Return response
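
The steps above can be sketched as a small in-process scatter-gather. This is a toy model under stated assumptions: the routing table is a hard-coded dictionary rather than something fetched from Helix, the "servers" are function calls on a thread pool, and every name here (`ROUTING_TABLE`, `SEGMENTS`, `process_on_server`, `broker_query`) is hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Step 2 stand-in: which server hosts which segments (assumed, not from Helix).
ROUTING_TABLE = {"server1": ["S1", "S2"], "server2": ["S3"]}

# Toy segment contents: one "action" value per document.
SEGMENTS = {
    "S1": ["start", "stop"],
    "S2": ["start"],
    "S3": ["stop", "stop", "start"],
}


def process_on_server(segment_names):
    """Step 4: a server counts actions across the segments it was asked about."""
    counts = Counter()
    for name in segment_names:
        counts.update(SEGMENTS[name])
    return counts


def broker_query(routing_table):
    """Steps 3, 5 and 6 as seen from the broker."""
    with ThreadPoolExecutor() as pool:
        # Step 3: scatter one request per server.
        futures = [
            pool.submit(process_on_server, segments)
            for segments in routing_table.values()
        ]
        # Step 5: gather the partial results and merge them.
        total = Counter()
        for future in futures:
            total.update(future.result())
    # Step 6: return the merged response.
    return dict(total)


result = broker_query(ROUTING_TABLE)
```

The merge works because counts are additive; aggregations that are not additive need a mergeable intermediate form instead.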
Pinot Query Execution: Single Node Architecture
[Diagram layers: Planner, Execution engine, Inverted index, Bitmap index, Column format]
Pinot Query Execution: Single Node Architecture
SELECT campaignId, sum(clicks)
FROM Table A
WHERE accountId = 121011 AND
'day' >= 15949
GROUP BY campaignId

Columns: accountId, day, campaignId, click

[Diagram: Pinot segments -> Data sources -> Filter operator -> (matching doc ids) -> Projection operator -> (campaignId, click tuples) -> Aggregation group-by operator -> Combine operator]
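
As a rough sketch of how the operators in the diagram could fit together for this query, the code below runs filter, projection, and group-by per segment and then combines the partial results. It assumes each segment is a dictionary of column lists; the function names mirror the operator labels but are hypothetical, and the filter is a plain scan rather than an index lookup.

```python
from collections import defaultdict


def filter_operator(segment, account_id, min_day):
    """Return doc ids where accountId == account_id and day >= min_day."""
    return [
        doc_id
        for doc_id in range(len(segment["accountId"]))
        if segment["accountId"][doc_id] == account_id
        and segment["day"][doc_id] >= min_day
    ]


def projection_operator(segment, doc_ids):
    """Read only the columns needed downstream, for matching docs only."""
    return [(segment["campaignId"][d], segment["clicks"][d]) for d in doc_ids]


def group_by_operator(tuples):
    """Sum clicks per campaignId within one segment."""
    partial = defaultdict(int)
    for campaign_id, clicks in tuples:
        partial[campaign_id] += clicks
    return partial


def combine_operator(partials):
    """Merge per-segment partial sums into one result."""
    combined = defaultdict(int)
    for partial in partials:
        for campaign_id, clicks in partial.items():
            combined[campaign_id] += clicks
    return dict(combined)


def run_query(segments, account_id, min_day):
    partials = []
    for segment in segments:
        doc_ids = filter_operator(segment, account_id, min_day)
        tuples = projection_operator(segment, doc_ids)
        partials.append(group_by_operator(tuples))
    return combine_operator(partials)


# Two hypothetical columnar segments.
segment_a = {"accountId": [121011, 121011, 7], "day": [15950, 15940, 15960],
             "campaignId": [1, 1, 2], "clicks": [10, 99, 5]}
segment_b = {"accountId": [121011, 121011], "day": [15949, 15955],
             "campaignId": [1, 2], "clicks": [4, 6]}
result = run_query([segment_a, segment_b], account_id=121011, min_day=15949)
```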
Pinot
• Operations
Cluster Management: Deployment
[Diagram: Controller, Helix, Brokers, Servers]
• Brokers and Servers register themselves in Helix
• All servers start with no use-case-specific configuration
Onboarding a new use case
[Diagram: a Create Table command goes to the Controller; through Helix, three Servers and one Broker are tagged XLNT]

TableName | Tag  | Servers | Brokers
XLNT_T1   | XLNT | 3       | 1
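
One way to picture the onboarding step is as tagging previously untagged instances for the new table. The sketch below assumes that model; the cluster dictionary, instance names, and `create_table` function are hypothetical and do not reflect the actual controller or Helix API.

```python
def create_table(cluster, table_name, tag, num_servers, num_brokers):
    """Tag enough untagged servers and brokers for a new table.

    Nothing is tagged unless both requirements can be met.
    """
    wanted = {"servers": num_servers, "brokers": num_brokers}
    chosen = {}
    for role, needed in wanted.items():
        free = sorted(name for name, current in cluster[role].items()
                      if current is None)
        if len(free) < needed:
            raise ValueError(f"not enough untagged {role} for {table_name}")
        chosen[role] = free[:needed]
    # Apply the tags only after every requirement has been checked.
    for role, names in chosen.items():
        for name in names:
            cluster[role][name] = tag
    return chosen


# Hypothetical cluster: every instance starts with no tag (None).
cluster = {
    "servers": {"server1": None, "server2": None, "server3": None, "server4": None},
    "brokers": {"broker1": None, "broker2": None},
}
tagged = create_table(cluster, "XLNT_T1", "XLNT", num_servers=3, num_brokers=1)
```

Because instances start with no use-case-specific configuration, adding a table only changes tags; `server4` and `broker2` stay free for the next use case.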
Segment Assignment
[Diagram: Upload Segment -> Controller -> Helix -> Servers holding copies of segments S1, S2, S3; Brokers]

TableName | Copies
XLNT_T1   | 2
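
The slide does not say how servers are chosen for each copy. As one plausible sketch, the code below places each uploaded segment on the least-loaded servers, with ties broken by name. The `assign_segment` function and server names are hypothetical, and the balancing rule is an assumption rather than the controller's actual strategy.

```python
def assign_segment(assignment, segment, copies):
    """Place `copies` replicas of `segment` on the least-loaded servers."""
    if copies > len(assignment):
        raise ValueError("not enough servers for the requested number of copies")
    # Order by current segment count, then by name for deterministic ties.
    targets = sorted(assignment, key=lambda s: (len(assignment[s]), s))[:copies]
    for server in targets:
        assignment[server].append(segment)
    return targets


# Three empty servers; upload S1, S2, S3 with two copies each.
assignment = {"server1": [], "server2": [], "server3": []}
for segment in ["S1", "S2", "S3"]:
    assign_segment(assignment, segment, copies=2)
```

With two copies of three segments on three servers, each server ends up with two segments and no server holds both copies of the same segment.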
Pinot - Fault tolerance/Elasticity
• AUTO recovery mode: automatically redistribute segments on failure or addition of nodes
• Custom mode: run in degraded mode until the node is restarted or replaced
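
The difference between the two recovery modes can be illustrated with a small function that reacts to a server failure. This is a sketch under assumptions: the mode strings "auto" and "custom", the `handle_server_failure` function, and the placement rule (least-loaded surviving server that does not already hold the segment) are hypothetical.

```python
def handle_server_failure(assignment, failed, mode):
    """Return the assignment after `failed` goes down.

    mode "auto":   re-place lost replicas on surviving servers.
    mode "custom": leave segments under-replicated (degraded) until the
                   node is restarted or replaced.
    """
    if mode not in ("auto", "custom"):
        raise ValueError(f"unknown mode: {mode}")
    lost = assignment[failed]
    survivors = {s: list(segs) for s, segs in assignment.items() if s != failed}
    if mode == "auto":
        for segment in lost:
            # Only servers that do not already hold this segment qualify.
            candidates = [s for s in survivors if segment not in survivors[s]]
            if candidates:
                target = min(candidates, key=lambda s: (len(survivors[s]), s))
                survivors[target].append(segment)
    return survivors


before = {
    "server1": ["S1", "S2"],
    "server2": ["S1", "S3"],
    "server3": ["S2", "S3"],
}
auto_result = handle_server_failure(before, "server1", "auto")
custom_result = handle_server_failure(before, "server1", "custom")
```

In the "auto" result every segment is back to two copies at the cost of more load per surviving server; in the "custom" result S1 and S2 each have a single copy until `server1` returns.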
Pinot vs Druid

Architecture
  Druid: Realtime + Offline, Realtime only
  Pinot: Realtime + Offline
  Notes: Realtime only -> consistency is hard and schema evolution/bootstrap is hard

Inverted index
  Druid: Always on all columns, fixed
  Pinot: Configurable on a per-column basis
  Notes: Allows trade-off between scanning vs. inverted index + scanning. More data can fit in a given memory size

Data organization
  Druid: N/A
  Pinot: Sorts data
  Notes: Organizing data provides speed/better compression and removes the need for an inverted index

Smart pre-materialization
  Druid: N/A
  Pinot: Star-tree
  Notes: Allows trade-off between latency and space

Query execution layer
  Druid: Fixed plan
  Pinot: Split into planning and execution
  Notes: Smart choices can be made at runtime based on metadata/query
Pinot - Future
• Documentation & tooling
• In progress: consistency among real-time replicas
• Improve cost to serve: leverage SSD, partial pre-materialization
• ThirdEye: business metrics monitoring
Thank You