SlideShare a Scribd company logo
1 of 31
Download to read offline
Using Elastic to
search over 2.5B
videos
Talk structure
● 4 steps to make user experience great again
● 4 patterns to simplify architecture and reduce costs
© 2016 Tubular Labs
2
Data size
● 2.5B documents
● AVG doc size 2Kb, 4Tb total size
● 200M daily updates (~8% of the index)
● Constant indexing rate of 3k/s with spikes
● Querying rate 1-3 r/s (low concurrency)
© 2016 Tubular Labs
3
Hardware
● 52 x c3.4xlarge
● 128 shards
● 16 cores per node
● ~3 shards per node
● 832 cores, 16Tb
SSD, 1.5Tb RAM
© 2016 Tubular Labs
4
● 26 x c3.8xlarge
● 416 shards
● 32 cores per node
● 16 shards per node
● 832 cores, 16Tb
SSD, 1.5Tb RAM
Before After (25% bigger)
Indexing
Optimize indexing
● Using bulk API
• 1Mb per batch (500 docs), should be 5k docs/s
• Recommended 5-15Mb
● Increasing refresh interval
• From 1 to 30 seconds
● Monitoring bulk.rejected
• Increased bulk.queueSize from 50 to 2000
© 2016 Tubular Labs
6
Searching
Product view
© 2016 Tubular Labs
8
Summary
Search results
Term aggregations
Before optimization
© 2016 Tubular Labs
9
Goal
© 2016 Tubular Labs
10
• Slow queries • From 15 to 5 seconds for 95th
• Seeking for 3x improvement
Problem Goal
Understand hardware utilization
© 2016 Tubular Labs
11
• Run the heaviest query
• No bottlenecks (CPU, disk IO, network)
• Thread pool search.size 25
• Max search.active is 3
CPU utilization
© 2016 Tubular Labs
12
• Know
• Your
• Concurrency
Benchmarking # of shards
© 2016 Tubular Labs
13
On a single
32 cores node
More CPU per request results
© 2016 Tubular Labs
14
15s to 7.5s
Search & Aggregations
© 2016 Tubular Labs
15
• Searching and sorting
is fast
• 8 term aggregations
are slow
Aggregation impact
© 2016 Tubular Labs
16
Check facet usage
© 2016 Tubular Labs
17
● Talk to your product manager
● Low product usage
● Remove networks and claims aggregations
● Replace facets with filters
Removing two aggregations results
© 2016 Tubular Labs
18
15s to 5.3s
Cardinality
© 2016 Tubular Labs
19
● Reduce cardinality
● Going from 200M to 5M (channels to creators)
● Reducing # of topics from 5M to 500
Reducing cardinality results
© 2016 Tubular Labs
20
15s to 4.4s
Split query and aggregations
© 2016 Tubular Labs
21
● Searching and aggregating separately
● Using shard-level query cache
● Showing results in UI asynchronously
Split query and aggregations results
© 2016 Tubular Labs
22
15s to 4.0s
Performance gain
© 2016 Tubular Labs
23
● From 15 to 4 seconds (<5 seconds)
● Overall improvement 3.7x
● What about costs?
Architecture patterns
Part 2. Goals
© 2016 Tubular Labs
25
● Reduce costs
● Improve reliability
● Simplify architecture
● Reduce variability in latency
Current flow
© 2016 Tubular Labs
26
● Too many
dependencies
● Expensive
intermediate
storage
Denormalization
© 2016 Tubular Labs
27
● 90% of data is
shared
● No extra calls
from frontend
Partial updates with Update API (experimental)
© 2016 Tubular Labs
28
“Partial” updates with parent-child relations (experimental)
© 2016 Tubular Labs
29
Split data by hot/full (idea for future)
© 2016 Tubular Labs
30
● Cheaper
hardware on full
● Shard allocation
filtering
Thank you

More Related Content

What's hot

Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-
Boyang Niu
 
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaSolr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Lucidworks
 

What's hot (20)

Graylog Engineering - Design Your Architecture
Graylog Engineering - Design Your ArchitectureGraylog Engineering - Design Your Architecture
Graylog Engineering - Design Your Architecture
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
Presto changes
Presto changesPresto changes
Presto changes
 
Garbage collection in JVM
Garbage collection in JVMGarbage collection in JVM
Garbage collection in JVM
 
OLAP Architecture
OLAP ArchitectureOLAP Architecture
OLAP Architecture
 
Data- How Does It Work-
Data- How Does It Work-Data- How Does It Work-
Data- How Does It Work-
 
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang HuiStor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
Stor4NFV: Exploration of Cloud native Storage in OPNFV - Ren Qiaowei, Wang Hui
 
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, RocanaSolr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
Solr At Scale For Time-Oriented Data: Presented by Brett Hoerner, Rocana
 
Big data: Loading your data with flume and sqoop
Big data:  Loading your data with flume and sqoopBig data:  Loading your data with flume and sqoop
Big data: Loading your data with flume and sqoop
 
Presto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - LyftPresto Summit 2018 - 07 - Lyft
Presto Summit 2018 - 07 - Lyft
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
Managing your CF templates as a code with python and troposphere
Managing your CF templates as a code with python and troposphereManaging your CF templates as a code with python and troposphere
Managing your CF templates as a code with python and troposphere
 
How the Automation of a Benchmark Famework Keeps Pace with the Dev Cycle at I...
How the Automation of a Benchmark Famework Keeps Pace with the Dev Cycle at I...How the Automation of a Benchmark Famework Keeps Pace with the Dev Cycle at I...
How the Automation of a Benchmark Famework Keeps Pace with the Dev Cycle at I...
 
Speed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with AlluxioSpeed Up Uber's Presto with Alluxio
Speed Up Uber's Presto with Alluxio
 
Fast dataarchitecture
Fast dataarchitectureFast dataarchitecture
Fast dataarchitecture
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
Petabyte Scale Object Storage Service Using Ceph in A Private Cloud - Varada ...
 
NRD: Nagios Result Distributor
NRD: Nagios Result DistributorNRD: Nagios Result Distributor
NRD: Nagios Result Distributor
 
Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)Breaking Prometheus (Promcon Berlin '16)
Breaking Prometheus (Promcon Berlin '16)
 

Viewers also liked

Een Gezond Gebit2
Een Gezond Gebit2Een Gezond Gebit2
Een Gezond Gebit2
guest031320
 

Viewers also liked (20)

AWS re:Invent 2016: Deploying and Managing .NET Pipelines and Microsoft Workl...
AWS re:Invent 2016: Deploying and Managing .NET Pipelines and Microsoft Workl...AWS re:Invent 2016: Deploying and Managing .NET Pipelines and Microsoft Workl...
AWS re:Invent 2016: Deploying and Managing .NET Pipelines and Microsoft Workl...
 
ITV& Bashton
ITV& Bashton ITV& Bashton
ITV& Bashton
 
Mobile and Serverless : an Untold Story
Mobile and Serverless : an Untold StoryMobile and Serverless : an Untold Story
Mobile and Serverless : an Untold Story
 
Writing New Relic Plugins: NSQ
Writing New Relic Plugins: NSQWriting New Relic Plugins: NSQ
Writing New Relic Plugins: NSQ
 
The Lost Tales of Platform Design (February 2017)
The Lost Tales of Platform Design (February 2017)The Lost Tales of Platform Design (February 2017)
The Lost Tales of Platform Design (February 2017)
 
Heelal
HeelalHeelal
Heelal
 
Gartner 2017 London: How to re-invent your IT Architecture?
Gartner 2017 London: How to re-invent your IT Architecture?Gartner 2017 London: How to re-invent your IT Architecture?
Gartner 2017 London: How to re-invent your IT Architecture?
 
Reproducible Science with Python
Reproducible Science with PythonReproducible Science with Python
Reproducible Science with Python
 
Neuigkeiten von DEPAROM & Co
Neuigkeiten von DEPAROM & CoNeuigkeiten von DEPAROM & Co
Neuigkeiten von DEPAROM & Co
 
Evolution of OPNFV CI System: What already exists and what can be introduced
Evolution of OPNFV CI System: What already exists and what can be introduced  Evolution of OPNFV CI System: What already exists and what can be introduced
Evolution of OPNFV CI System: What already exists and what can be introduced
 
(SEC313) Security & Compliance at the Petabyte Scale
(SEC313) Security & Compliance at the Petabyte Scale(SEC313) Security & Compliance at the Petabyte Scale
(SEC313) Security & Compliance at the Petabyte Scale
 
Introduction to smpc
Introduction to smpc Introduction to smpc
Introduction to smpc
 
Expect the unexpected: Anticipate and prepare for failures in microservices b...
Expect the unexpected: Anticipate and prepare for failures in microservices b...Expect the unexpected: Anticipate and prepare for failures in microservices b...
Expect the unexpected: Anticipate and prepare for failures in microservices b...
 
Reversing malware analysis training part3 windows pefile formatbasics
Reversing malware analysis training part3 windows pefile formatbasicsReversing malware analysis training part3 windows pefile formatbasics
Reversing malware analysis training part3 windows pefile formatbasics
 
Data Visualization on the Tech Side
Data Visualization on the Tech SideData Visualization on the Tech Side
Data Visualization on the Tech Side
 
Persistence in the cloud with bosh
Persistence in the cloud with boshPersistence in the cloud with bosh
Persistence in the cloud with bosh
 
Security For Humans
Security For HumansSecurity For Humans
Security For Humans
 
AWS + Puppet = Dynamic Scale
AWS + Puppet = Dynamic ScaleAWS + Puppet = Dynamic Scale
AWS + Puppet = Dynamic Scale
 
Business selectors
Business selectorsBusiness selectors
Business selectors
 
Een Gezond Gebit2
Een Gezond Gebit2Een Gezond Gebit2
Een Gezond Gebit2
 

Similar to Tubular Labs - Using Elastic to Search Over 2.5B Videos

stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
Gaurav "GP" Pal
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
Anna Ossowski
 

Similar to Tubular Labs - Using Elastic to Search Over 2.5B Videos (20)

DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
 
Meetup #3: Migrating an Oracle Application from on-premise to AWS
Meetup #3: Migrating an Oracle Application from on-premise to AWSMeetup #3: Migrating an Oracle Application from on-premise to AWS
Meetup #3: Migrating an Oracle Application from on-premise to AWS
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
 
Data Lessons Learned at Scale
Data Lessons Learned at ScaleData Lessons Learned at Scale
Data Lessons Learned at Scale
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
moi-connect16
moi-connect16moi-connect16
moi-connect16
 
Couchbase live 2016
Couchbase live 2016Couchbase live 2016
Couchbase live 2016
 
Loggly - Benchmarking 5 Node.js Logging Libraries
Loggly - Benchmarking 5 Node.js Logging LibrariesLoggly - Benchmarking 5 Node.js Logging Libraries
Loggly - Benchmarking 5 Node.js Logging Libraries
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
 
Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3Scaling Monitoring At Databricks From Prometheus to M3
Scaling Monitoring At Databricks From Prometheus to M3
 
Our Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent CloudOur Multi-Year Journey to a 10x Faster Confluent Cloud
Our Multi-Year Journey to a 10x Faster Confluent Cloud
 
BAXTER phase 1b
BAXTER phase 1bBAXTER phase 1b
BAXTER phase 1b
 
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speed
 
Training Webinar: Detect Performance Bottlenecks of Applications
Training Webinar: Detect Performance Bottlenecks of ApplicationsTraining Webinar: Detect Performance Bottlenecks of Applications
Training Webinar: Detect Performance Bottlenecks of Applications
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
 
Rally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at ScaleRally--OpenStack Benchmarking at Scale
Rally--OpenStack Benchmarking at Scale
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 

Recently uploaded (20)

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

Tubular Labs - Using Elastic to Search Over 2.5B Videos

  • 1. Using Elastic to search over 2.5B videos
  • 2. Talk structure ● 4 steps to make user experience great again ● 4 patterns to simplify architecture and reduce costs © 2016 Tubular Labs 2
  • 3. Data size ● 2.5B documents ● AVG doc size 2Kb, 4Tb total size ● 200M daily updates (~8% of the index) ● Constant indexing rate of 3k/s with spikes ● Querying rate 1-3 r/s (low concurrency) © 2016 Tubular Labs 3
  • 4. Hardware ● 52 x c3.4xlarge ● 128 shards ● 16 cores per node ● ~3 shards per node ● 832 cores, 16Tb SSD, 1.5Tb RAM © 2016 Tubular Labs 4 ● 26 x c3.8xlarge ● 416 shards ● 32 cores per node ● 16 shards per node ● 832 cores, 16Tb SSD, 1.5Tb RAM Before After (25% bigger)
  • 6. Optimize indexing ● Using bulk API • 1Mb per batch (500 docs), should be 5k docs/s • Recommended 5-15Mb ● Increasing refresh interval • From 1 to 30 seconds ● Monitoring bulk.rejected • Increased bulk.queueSize from 50 to 2000 © 2016 Tubular Labs 6
  • 8. Product view © 2016 Tubular Labs 8 Summary Search results Term aggregations
  • 10. Goal © 2016 Tubular Labs 10 • Slow queries • From 15 to 5 seconds for 95th • Seeking for 3x improvement Problem Goal
  • 11. Understand hardware utilization © 2016 Tubular Labs 11 • Run the heaviest query • No bottlenecks (CPU, disk IO, network) • Thread pool search.size 25 • Max search.active is 3
  • 12. CPU utilization © 2016 Tubular Labs 12 • Know • Your • Concurrency
  • 13. Benchmarking # of shards © 2016 Tubular Labs 13 On a single 32 cores node
  • 14. More CPU per request results © 2016 Tubular Labs 14 15s to 7.5s
  • 15. Search & Aggregations © 2016 Tubular Labs 15 • Searching and sorting is fast • 8 term aggregations are slow
  • 16. Aggregation impact © 2016 Tubular Labs 16
  • 17. Check facet usage © 2016 Tubular Labs 17 ● Talk to your product manager ● Low product usage ● Remove networks and claims aggregations ● Replace facets with filters
  • 18. Removing two aggregations results © 2016 Tubular Labs 18 15s to 5.3s
  • 19. Cardinality © 2016 Tubular Labs 19 ● Reduce cardinality ● Going from 200M to 5M (channels to creators) ● Reducing # of topics from 5M to 500
  • 20. Reducing cardinality results © 2016 Tubular Labs 20 15s to 4.4s
  • 21. Split query and aggregations © 2016 Tubular Labs 21 ● Searching and aggregating separately ● Using shard-level query cache ● Showing results in UI asynchronously
  • 22. Split query and aggregations results © 2016 Tubular Labs 22 15s to 4.0s
  • 23. Performance gain © 2016 Tubular Labs 23 ● From 15 to 4 seconds (<5 seconds) ● Overall improvement 3.7x ● What about costs?
  • 25. Part 2. Goals © 2016 Tubular Labs 25 ● Reduce costs ● Improve reliability ● Simplify architecture ● Reduce variability in latency
  • 26. Current flow © 2016 Tubular Labs 26 ● Too many dependencies ● Expensive intermediate storage
  • 27. Denormalization © 2016 Tubular Labs 27 ● 90% of data is shared ● No extra calls from frontend
  • 28. Partial updates with Update API (experimental) © 2016 Tubular Labs 28
  • 29. “Partial” updates with parent-child relations (experimental) © 2016 Tubular Labs 29
  • 30. Split data by hot/full (idea for future) © 2016 Tubular Labs 30 ● Cheaper hardware on full ● Shard allocation filtering