SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
Using ElasticSearch to scale near real-time search
John Billings

2013-11-20
Problem: Finding reviews
Problem: Finding reviews
Where are reviews stored?
Searching using SQL

select * from reviews where
content like ‘%chicken tikka masala%’
and id = 123456
And we’re done
Not so fast...
What did we forget?

● Analysis
○ Tokenization
○ Stemming (‘curries’ vs ‘curry’)
○ Stop words (‘the’, ‘and’, ‘a’, …)
● Performance
● Highlighting / snippetting
● Faceting
Version 2
Query

curl -w'n' -XGET 'host:14900/reviews/_search' -d '{
"fields" : [],
"highlight": {"fields": {"review_comment" : {}}},
"query": {
"bool": {
"must": [
{"match": {"review_comment": "chicken tikka masala"}},
{"match": {"business_id": "123456"}}
]
}
}
}'
Response

{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 341,
"max_score" : 2.7088916,
"hits" : [
// Hit objects
]
}
}
Response

{
"_index" : "reviews",
"_type" : "review",
"_id" : "123456",
"_score" : 2.0553625,
"highlight" : {
"review_comment" : [ "
I've found. We usually order saag paneer,
<em>chicken</em> <em>tikka</em> <em>masala</em>, bengan bharta would recommend them all" ]
}
}
Aside: Clients
Indexing

Updates

JSON docs

Gearman
Indexer
modules
Indexing

class IndexerEvent:
table: String
id: int
action: [‘insert’, ‘update’, ‘delete’]
timestamp: float
class IndexRequest:
index_name: String
doc_type: String
doc_id: int
field_values: Map<String, String>
class Indexer:
tables_to_watch: List<String>
handle_event: IndexerEvent -> List<IndexRequest>
Replication and sharding

curl -w'n' -XPUT 'host:14900/reviews/' -d '{
“settings”: {
“index”: {
“number_of_shards”: 3,
“number_of_replicas”: 2,
}
}
}’
Using ElasticSearch to discover local businesses
And we’re done
Not so fast…
Performance can be unpredictable
Problem: Disks are slow
Problem: Memory usage is unpredictable
Problem: Tenants can be noisy
Any questions?

Mais conteúdo relacionado

Destaque (7)

How Yelp does Service Discovery
How Yelp does Service DiscoveryHow Yelp does Service Discovery
How Yelp does Service Discovery
 
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E..."Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
"Optimal Learning for Fun and Profit" by Scott Clark (Presented at The Yelp E...
 
Giving Design Critique
Giving Design CritiqueGiving Design Critique
Giving Design Critique
 
Yelp Academic Dataset
Yelp Academic DatasetYelp Academic Dataset
Yelp Academic Dataset
 
Humans by the hundred
Humans by the hundredHumans by the hundred
Humans by the hundred
 
Building a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from YelpBuilding a smarter application Stack by Tomas Doran from Yelp
Building a smarter application Stack by Tomas Doran from Yelp
 
Hybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS ApplicationsHybrid MongoDB and RDBMS Applications
Hybrid MongoDB and RDBMS Applications
 

Semelhante a "Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presented at The Yelp Engineering Open House 11/20/13)

Final Presentation V1.8
Final Presentation V1.8Final Presentation V1.8
Final Presentation V1.8
Chad Koziel
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Lucidworks
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
MongoDB
 
Google
GoogleGoogle
Google
soon
 

Semelhante a "Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presented at The Yelp Engineering Open House 11/20/13) (20)

Final Presentation V1.8
Final Presentation V1.8Final Presentation V1.8
Final Presentation V1.8
 
Salesforce Data Models for Pros: Simplifying The Complexities
Salesforce Data Models for Pros: Simplifying The ComplexitiesSalesforce Data Models for Pros: Simplifying The Complexities
Salesforce Data Models for Pros: Simplifying The Complexities
 
SFDC Data Models For Pros - Simplifying The Complexities
SFDC Data Models For Pros - Simplifying The ComplexitiesSFDC Data Models For Pros - Simplifying The Complexities
SFDC Data Models For Pros - Simplifying The Complexities
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
 
Vertically Scaled Design Patters
Vertically Scaled Design PattersVertically Scaled Design Patters
Vertically Scaled Design Patters
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
Google
GoogleGoogle
Google
 
MongoDB Performance Tuning
MongoDB Performance TuningMongoDB Performance Tuning
MongoDB Performance Tuning
 
GraphQL - when REST API is to less - lessons learned
GraphQL - when REST API is to less - lessons learnedGraphQL - when REST API is to less - lessons learned
GraphQL - when REST API is to less - lessons learned
 
Keynote - Speaker: Grigori Melnik
Keynote - Speaker: Grigori Melnik Keynote - Speaker: Grigori Melnik
Keynote - Speaker: Grigori Melnik
 
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
미움 받을 용기 : 저 팀은 뭘 안다고 추천한다고 들쑤시고 다니는건가
 
Learning to rank search results
Learning to rank search resultsLearning to rank search results
Learning to rank search results
 
LJC Conference 2014 Cassandra for Java Developers
LJC Conference 2014 Cassandra for Java DevelopersLJC Conference 2014 Cassandra for Java Developers
LJC Conference 2014 Cassandra for Java Developers
 
GraphQL - when REST API is to less - lessons learned
GraphQL - when REST API is to less - lessons learnedGraphQL - when REST API is to less - lessons learned
GraphQL - when REST API is to less - lessons learned
 
E-Commerce search with Elasticsearch
E-Commerce search with ElasticsearchE-Commerce search with Elasticsearch
E-Commerce search with Elasticsearch
 
Browser testing with nightwatch.js
Browser testing with nightwatch.jsBrowser testing with nightwatch.js
Browser testing with nightwatch.js
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
Elasticsearch in 15 Minutes
Elasticsearch in 15 MinutesElasticsearch in 15 Minutes
Elasticsearch in 15 Minutes
 
Automated testing for client-side - Adam Klein, 500 Tech
Automated testing for client-side - Adam Klein, 500 TechAutomated testing for client-side - Adam Klein, 500 Tech
Automated testing for client-side - Adam Klein, 500 Tech
 
Client side unit tests - using jasmine & karma
Client side unit tests - using jasmine & karmaClient side unit tests - using jasmine & karma
Client side unit tests - using jasmine & karma
 

Mais de Yelp Engineering

Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOE
Yelp Engineering
 

Mais de Yelp Engineering (11)

Human Ops
Human OpsHuman Ops
Human Ops
 
Teeing Up Python - Code Golf
Teeing Up Python - Code GolfTeeing Up Python - Code Golf
Teeing Up Python - Code Golf
 
Fluxx Streaming
Fluxx StreamingFluxx Streaming
Fluxx Streaming
 
Building a World Class Security Team
Building a World Class Security TeamBuilding a World Class Security Team
Building a World Class Security Team
 
Microservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesMicroservices Summit - The Human Side of Services
Microservices Summit - The Human Side of Services
 
Humans by the hundred (DevOps Days Ohio)
Humans by the hundred (DevOps Days Ohio)Humans by the hundred (DevOps Days Ohio)
Humans by the hundred (DevOps Days Ohio)
 
Yelp Tech Talks: Mobile Testing 1, 2, 3
Yelp Tech Talks: Mobile Testing 1, 2, 3Yelp Tech Talks: Mobile Testing 1, 2, 3
Yelp Tech Talks: Mobile Testing 1, 2, 3
 
Ensuring Consistency in a Replicated World
Ensuring Consistency in a Replicated WorldEnsuring Consistency in a Replicated World
Ensuring Consistency in a Replicated World
 
A Beginners Guide To Launching Yelp In Hong Kong
A Beginners Guide To Launching Yelp In Hong KongA Beginners Guide To Launching Yelp In Hong Kong
A Beginners Guide To Launching Yelp In Hong Kong
 
Own Your Career
Own Your CareerOwn Your Career
Own Your Career
 
Optimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOEOptimal Learning for Fun and Profit with MOE
Optimal Learning for Fun and Profit with MOE
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presented at The Yelp Engineering Open House 11/20/13)