Airbnb Search Architecture 
Maxim Charkov, Engineering Manager 
maxim.charkov@airbnb.com, @mcharkov
Airbnb
Total Guests: 20,000,000+
Countries: 190
Cities: 34,000+
Castles: 600+
Listings Worldwide: 800,000+
Search 
www.airbnb.com
Booking Model
Search -> Contact -> Accept -> Book
Search Backend 
Technical Stack 
____________________________ 
Dropwizard as the service framework (incl. Jetty, Jersey, Jackson)
Guice for dependency injection, Guava libraries, etc.
ZooKeeper (via SmartStack) for service discovery
Lucene for index storage and simple retrieval
In-house real-time indexing, ranking, and advanced filtering
Search Backend 
Inside one JVM:
~150 search threads
4 indexing threads
Data maintained by the indexers:
Inverted Lucene index for retrieval
Forward index for ranking signals
Relevance models
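The JVM layout above can be sketched as a small pool of indexing threads writing to a shared forward index while a much larger pool of search threads reads from it. This is a minimal sketch under stated assumptions: `ForwardIndex`, `update` and `signal` are hypothetical names, a `ConcurrentHashMap` stands in for whatever structure the real forward index uses, and only the thread counts come from the slide.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical forward index: listing id -> one ranking signal.
// ConcurrentHashMap lets indexing threads write while search threads read
// without a global lock.
final class ForwardIndex {
    private final Map<Integer, Double> qualityByListing = new ConcurrentHashMap<>();

    void update(int listingId, double quality) {
        qualityByListing.put(listingId, quality);
    }

    double signal(int listingId) {
        return qualityByListing.getOrDefault(listingId, 0.0);
    }
}

// Hypothetical wiring of the two thread pools around the shared index.
final class SearchBackend {
    // Thread counts taken from the slide; real values depend on hardware.
    static final int SEARCH_THREADS = 150;
    static final int INDEXING_THREADS = 4;

    final ForwardIndex index = new ForwardIndex();
    final ExecutorService searchPool = Executors.newFixedThreadPool(SEARCH_THREADS);
    final ExecutorService indexingPool = Executors.newFixedThreadPool(INDEXING_THREADS);

    void shutdown() throws InterruptedException {
        searchPool.shutdown();
        indexingPool.shutdown();
        searchPool.awaitTermination(10, TimeUnit.SECONDS);
        indexingPool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

Separating the pools keeps a burst of index updates from starving query threads, and the reverse. A fixed pool only creates threads as tasks arrive, so the 150 threads are not started up front.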
Indexing 
What’s in the Lucene index? 
____________________________ 
Positions of listings indexed using Lucene’s spatial module (RecursivePrefixTreeStrategy) 
Categorical and numerical properties like room type and maximum occupancy 
Calendar information 
Full text (descriptions, reviews, etc.) 
~40 fields per listing from a variety of data sources, all updated in real time
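Lucene's `RecursivePrefixTreeStrategy` indexes a shape as a set of cells in a spatial prefix tree. Lucene ships both geohash-based and quad-tree-based prefix trees, and the slide does not say which one was used. The stdlib-only sketch below encodes a point as a geohash to illustrate the underlying idea: nearby points tend to share a long common prefix, so a location filter can become a lookup over indexed cell terms. This illustrates the technique only. It is not Lucene's code.

```java
// Minimal geohash encoder (illustrative; not Lucene's implementation).
final class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Alternately bisects the longitude and latitude ranges (longitude first),
    // recording 1 when the value falls in the upper half, and emits one
    // base-32 character for every 5 bits.
    static String encode(double lat, double lon, int length) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder hash = new StringBuilder(length);
        boolean lonBit = true; // true -> the next bit refines longitude
        int bits = 0;
        int ch = 0;
        while (hash.length() < length) {
            double[] range = lonBit ? lonRange : latRange;
            double value = lonBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (value >= mid) {
                ch |= 1;
                range[0] = mid; // keep the upper half
            } else {
                range[1] = mid; // keep the lower half
            }
            lonBit = !lonBit;
            if (++bits == 5) {
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

Each additional character narrows the cell, so prefix length acts as a zoom level. Two points on opposite sides of a cell boundary can still share only a short prefix. This is why prefix-tree strategies match a query shape against the set of cells that cover it rather than against a single prefix.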
Indexing 
Challenges 
____________________________ 
Bootstrap (creating the index from scratch) 
Ensuring consistency of the index with ground truth data in real time
Indexing 
Diagram: source databases (master, calendar, fraud) -> SpinalTap -> Medusa (PersistentStorage) -> Search1, Search2 … SearchN
Indexing 
SpinalTap 
____________________________ 
Responsible for detecting updates to the ground truth data
(so search index invalidation logic does not have to live in application code)
Tails the binary logs (binlogs) of MySQL servers (5.6+)
Converts binlog entries into actionable data objects called “Mutations”
Broadcasts them over a distributed queue such as Kafka or RabbitMQ
Indexing
SpinalTap Pipes
____________________________
Each pipe connects one or more binlog sources (MySQL) with a destination (e.g. Kafka)
Configured via YAML files, for example:

# sources for MySQL binary logs
sources:
  - name    : airslave
    host    : localhost
    port    : 11
    user    : spinaltap
    password: spinaltap
  - name    : calendar_db
    host    : localhost
    port    : 11
    user    : spinaltap
    password: spinaltap

# destinations that mutations are published to
destinations:
  - name     : kafka
    clazzName: com.airbnb.spinaltap.destination.kafka.KafkaDestination

# each pipe routes tables from its sources to one destination
pipes:
  - name       : search
    sources    : ["airslave", "calendar_db"]
    tables     : ["production:listings", "calendar_db:schedule2s"]
    destination: kafka
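A pipe's routing logic can be sketched as a filter in front of a destination. This is a minimal sketch under stated assumptions: `BinlogEvent`, `Destination`, `Pipe` and `CollectingDestination` are hypothetical types, not SpinalTap's actual API. The sketch only shows a pipe forwarding events from its configured sources and `db:table` names to a single destination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical simplified binlog event: the source it came from and its "db:table".
final class BinlogEvent {
    final String source;
    final String table;
    final String payload;

    BinlogEvent(String source, String table, String payload) {
        this.source = source;
        this.table = table;
        this.payload = payload;
    }
}

// Hypothetical destination; in the config above this role is played by Kafka.
interface Destination {
    void send(BinlogEvent event);
}

final class Pipe {
    private final Set<String> sources;
    private final Set<String> tables;
    private final Destination destination;

    Pipe(Set<String> sources, Set<String> tables, Destination destination) {
        this.sources = sources;
        this.tables = tables;
        this.destination = destination;
    }

    // Forwards an event only if both its source and its table are configured.
    void onEvent(BinlogEvent event) {
        if (sources.contains(event.source) && tables.contains(event.table)) {
            destination.send(event);
        }
    }
}

// In-memory destination, useful for tests.
final class CollectingDestination implements Destination {
    final List<BinlogEvent> received = new ArrayList<>();

    @Override
    public void send(BinlogEvent event) {
        received.add(event);
    }
}
```

Keeping routing in configuration means a new consumer can subscribe to a table by adding a pipe, with no change to the services that write to MySQL.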
Indexing 
{ 
"seq" : 3, 
"binlogpos" : "mysql-bin.000002:5217:5273", 
"id" : -1857589909002862756, 
"type" : 2, 
"table" : { 
"id" : 70, 
"name" : "users", 
"db" : "my_db", 
"columns" : [ { 
"name" : "name", 
"type" : 15, 
"ispk" : false 
}, { 
"name" : "age", 
"type" : 2, 
"ispk" : false 
} ] 
}, 
"rows" : [ { 
"1" : { 
"name" : "eric", 
"age" : 31, 
}, 
"2" : { 
"name" : "eric", 
"age" : 28, 
} 
} ] 
} 
SpinalTap Mutations 
____________________________ 
Each binlog entry is parsed and converted into one of three 
event types: “Insert”, “Delete” or “Update” 
“Insert” and “Delete” carry the entire row to be inserted or 
deleted 
“Update” mutations contain both the old and the current row 
Additional information: unique id, sequence number, column 
and table metadata
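The following sketch shows how a consumer might use the three mutation types: it applies them to an in-memory copy of a table keyed by primary key. The `Mutation` records and `TableReplica.apply` are hypothetical. SpinalTap's real mutation objects also carry the id, sequence number and metadata listed above. The sketch assumes Java 16+ for records and pattern matching.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutation model; a row is a column-name -> value map.
interface Mutation {
    long primaryKey();
}

// Insert and Delete carry the entire row.
record Insert(long primaryKey, Map<String, Object> row) implements Mutation {}
record Delete(long primaryKey, Map<String, Object> row) implements Mutation {}

// Update carries both the old and the current row.
record Update(long primaryKey, Map<String, Object> oldRow, Map<String, Object> newRow)
        implements Mutation {}

// Hypothetical consumer that mirrors one table from a stream of mutations.
final class TableReplica {
    private final Map<Long, Map<String, Object>> rows = new HashMap<>();

    void apply(Mutation m) {
        if (m instanceof Insert ins) {
            rows.put(ins.primaryKey(), ins.row());
        } else if (m instanceof Delete del) {
            rows.remove(del.primaryKey());
        } else if (m instanceof Update upd) {
            // The old row lets the consumer detect that its copy has drifted.
            Map<String, Object> current = rows.get(upd.primaryKey());
            if (current != null && !current.equals(upd.oldRow())) {
                throw new IllegalStateException("replica out of sync for " + upd.primaryKey());
            }
            rows.put(upd.primaryKey(), upd.newRow());
        }
    }

    Map<String, Object> get(long primaryKey) {
        return rows.get(primaryKey);
    }
}
```

Carrying full rows instead of diffs makes each mutation self-describing. A consumer can then rebuild state without querying MySQL again.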
Indexing 
Medusa 
____________________________ 
Documents in the index contain data from ~15 different source tables
Lucene needs a copy of all fields (not just the fields that changed) to update a document
We also need a way to build the entire index from scratch without putting too much strain on MySQL
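Lucene replaces a document as a whole: `IndexWriter.updateDocument`, for example, deletes the matching document and adds the new one. An indexer that sees a change in one source table therefore has to rebuild the listing's full document. The stdlib-only sketch below shows that merge step. It assumes hypothetical table names and a hypothetical per-listing cache holding the latest row from each table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cache: listing id -> (source table -> latest row from that table).
final class ListingDocumentCache {
    private final Map<Long, Map<String, Map<String, Object>>> byListing = new HashMap<>();

    // Records a changed row and returns the full merged document that would be
    // sent to Lucene: every cached field for the listing, not just the changed ones.
    Map<String, Object> onRowChanged(long listingId, String table, Map<String, Object> row) {
        Map<String, Map<String, Object>> tables =
                byListing.computeIfAbsent(listingId, id -> new HashMap<>());
        tables.put(table, row);

        Map<String, Object> document = new TreeMap<>();
        document.put("listing_id", listingId);
        for (Map.Entry<String, Map<String, Object>> t : tables.entrySet()) {
            for (Map.Entry<String, Object> field : t.getValue().entrySet()) {
                // Prefix each field with its table name to avoid collisions.
                document.put(t.getKey() + "." + field.getKey(), field.getValue());
            }
        }
        return document;
    }
}
```

Keeping this cache outside MySQL is what lets a single-table change be indexed without re-joining all source tables on every update.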
Indexing 
Reads from SpinalTap or directly from MySQL
Data from multiple tables is joined into Thrift objects, which correspond to Lucene documents
The intermediate Thrift objects are persisted in Redis
As changes are detected, updated objects are pushed to the Search instances to update their Lucene indexes
Can bootstrap the entire index in 3 minutes via multithreaded streaming
Leader election via ZooKeeper
Diagram: Medusa (PersistentStorage) -> Search1, Search2 … SearchN
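The slide credits multithreaded streaming for bootstrapping the whole index in about 3 minutes. The sketch below shows one way to structure such a bootstrap. It assumes listings can be split into contiguous id ranges. `Bootstrapper`, `loadRange` (reading documents for an id range, e.g. from MySQL or Redis) and `push` (sending a document to the search instances) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class Bootstrapper {
    // Splits [0, maxId) into chunks and streams each chunk on a worker thread.
    // loadRange.apply(start, end) returns the documents for ids in [start, end).
    // push is called concurrently from several threads, so it must be thread-safe.
    static <D> void bootstrap(long maxId, long chunkSize, int threads,
                              BiFunction<Long, Long, List<D>> loadRange,
                              Consumer<D> push) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (long start = 0; start < maxId; start += chunkSize) {
                final long from = start;
                final long to = Math.min(start + chunkSize, maxId);
                tasks.add(pool.submit(() -> {
                    for (D doc : loadRange.apply(from, to)) {
                        push.accept(doc);
                    }
                }));
            }
            for (Future<?> task : tasks) {
                task.get(); // waits for completion and rethrows any loader failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Small, bounded chunks keep each read against the source cheap, which fits the slide's goal of rebuilding the index without putting too much strain on MySQL.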
Ranking 
Ranking Problem 
____________________________ 
Not a text search problem 
Users are almost never searching for a specific item; rather, they are looking to “Discover”
The most common component of a query is location 
Highly personalized – the user is a part of the query 
Optimizing for conversion (Search -> Inquiry -> Booking) 
Evolution through continuous experimentation
Ranking 
Ranking Components 
____________________________ 
Relevance 
Quality 
Bookability 
Personalization 
Desirability of location 
New host promotion 
etc.
Ranking 
Several hundred signals determine search ranking:
Properties of the listing (reviews, location, etc.)
Behavioral signals (mined from request logs)
Image quality and clickability (computer vision)
Host behavior (response time/rate, cancellations, etc.)
Host preferences model
Signal sources (diagram): DB snapshots, logs
Ranking
Loading Signals
____________________________
Storing signals in a separate data structure
Pros:
Good fit for this type of update pattern: not real-time, but almost everything changes on each load
No need for costly Lucene index rebuild
Greatly simplifies design
Cons:
Unable to use Lucene retrieval on such data

public void attemptLoadData() {
    // Check the modification time of the remote signals file
    DateTime remoteTs = dataLoader.getModTime(pathToSignals);

    // Reload only on the first run or when the remote file is newer
    if (currentTs == null || remoteTs.isAfter(currentTs)) {
        Map<K, D> newSignals = loadData();
        // Accept the first load unconditionally; later loads must pass a health check
        if (newSignals != null && (signalsMap == null || isHealthy(newSignals))) {
            synchronized (this) {
                signalsMap = newSignals;
                currentTs = remoteTs;
                this.notifyAll(); // wake any threads waiting on this loader
            }
        } else {
            LOG.severe("Failed to load the avro file: " + pathToSignals);
        }
    }
}

…

// Look up the quality signals for a listing
ThreadedLoader<Integer, QualitySignalsAvro> qualitySignalsLoader =
        loaders.get(LoaderCollection.Loader.QualitySignals);
final QualitySignalsAvro qs = qualitySignalsLoader.get(hostingId, true);
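The same pattern can also be written without `synchronized` by publishing each snapshot through an `AtomicReference`. The pattern loads a fresh snapshot in the background and swaps it in only when it is newer and passes a health check. This self-contained variant is a sketch, not the original `ThreadedLoader`. `SignalSource` stands in for the Avro file reader, and the "shrank by more than half" health check is a hypothetical placeholder for `isHealthy`. It assumes a single background thread calls `attemptLoad`.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical source of signal snapshots (e.g. an Avro file on shared storage).
interface SignalSource<K, D> {
    Instant modTime();
    Map<K, D> load();
}

final class SnapshotLoader<K, D> {
    // Immutable pairing of a signals map with its modification time.
    private final class Snapshot {
        final Map<K, D> signals;
        final Instant ts;

        Snapshot(Map<K, D> signals, Instant ts) {
            this.signals = signals;
            this.ts = ts;
        }
    }

    private final SignalSource<K, D> source;
    private final AtomicReference<Snapshot> current = new AtomicReference<>();

    SnapshotLoader(SignalSource<K, D> source) {
        this.source = source;
    }

    // Called from one background thread; returns true if a new snapshot was swapped in.
    boolean attemptLoad() {
        Instant remoteTs = source.modTime();
        Snapshot old = current.get();
        if (old != null && !remoteTs.isAfter(old.ts)) {
            return false; // nothing newer to load
        }
        Map<K, D> fresh = source.load();
        // Accept the first load unconditionally; later loads only if healthy.
        if (fresh == null || (old != null && !isHealthy(fresh, old.signals))) {
            return false;
        }
        current.set(new Snapshot(fresh, remoteTs)); // readers see it atomically
        return true;
    }

    // Hypothetical health check: reject a snapshot that shrank by more than half.
    private boolean isHealthy(Map<K, D> fresh, Map<K, D> old) {
        return fresh.size() * 2 >= old.size();
    }

    // Lock-free read used by search threads; null until the first load.
    D get(K key) {
        Snapshot snap = current.get();
        return snap == null ? null : snap.signals.get(key);
    }
}
```

Bundling the map and its timestamp into one immutable object means readers can never observe a new map paired with an old timestamp.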
Life of a Query
Diagram: Query Understanding -> Retrieval -> Filtering and Reranking -> Third Pass Ranking -> Result Generation, with AirEvents Logging
Query Understanding: external calls (geocoding), configuring retrieval options, choosing ranking models
Retrieval: Populator and Scorer (Quality, Bookability, Relevance); 2000 results
Filtering and Reranking: 2000 results
Third Pass Ranking: Pricing Service, Social Connections; 25 results
Result Generation: 25 results
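In the diagram, the candidate set shrinks from 2000 results to a page of 25. The sketch below models that funnel as a chain of stream stages. The `Candidate` type, the filter and rerank functions, and the exact stage order are assumptions for illustration. Only the 2000 and 25 counts come from the slide.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical retrieved listing with a first-pass score from the Scorer.
final class Candidate {
    final long listingId;
    final double score;

    Candidate(long listingId, double score) {
        this.listingId = listingId;
        this.score = score;
    }
}

final class QueryFunnel {
    static final int RETRIEVAL_LIMIT = 2000; // candidates kept after scoring
    static final int PAGE_SIZE = 25;         // results returned to the user

    // Keeps the top RETRIEVAL_LIMIT by first-pass score, applies filtering,
    // reranks with a second scoring function and returns one page.
    static List<Candidate> run(List<Candidate> retrieved,
                               Predicate<Candidate> filter,
                               ToDoubleFunction<Candidate> rerankScore) {
        return retrieved.stream()
                .sorted(Comparator.comparingDouble((Candidate c) -> c.score).reversed())
                .limit(RETRIEVAL_LIMIT)
                .filter(filter)
                .sorted(Comparator.comparingDouble(rerankScore).reversed())
                .limit(PAGE_SIZE)
                .collect(Collectors.toList());
    }
}
```

Cheap scoring runs over many candidates and more expensive stages (external pricing or social calls, for instance) run over few. This shape keeps per-query cost bounded.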
Ranking
Second Pass Ranking
____________________________
Traditional ranking scores each result independently, $r_i = f(x_i)$, then sorts by $r$
In contrast, the second pass operates on the entire list at once: $(r_1, \ldots, r_n) = F(x_1, \ldots, x_n)$
Makes it possible to implement features like result diversity, etc.
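A pointwise ranker scores each result on its own and sorts. A listwise second pass can also take into account what has already been placed above a result. The greedy sketch below shows that difference. It assumes a hypothetical `neighborhood` attribute and applies a fixed penalty for each already-selected result from the same neighborhood. This is a simple diversity heuristic, not Airbnb's model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result with a pointwise score and a neighborhood label.
final class Result {
    final String id;
    final String neighborhood;
    final double score;

    Result(String id, String neighborhood, double score) {
        this.id = id;
        this.neighborhood = neighborhood;
        this.score = score;
    }
}

final class DiversityReranker {
    // Repeatedly picks the result with the best adjusted score, where every
    // result already chosen from the same neighborhood subtracts `penalty`.
    static List<Result> rerank(List<Result> results, double penalty) {
        List<Result> remaining = new ArrayList<>(results);
        List<Result> ranked = new ArrayList<>();
        Map<String, Integer> chosenPerNeighborhood = new HashMap<>();
        while (!remaining.isEmpty()) {
            Result best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Result r : remaining) {
                int alreadyChosen = chosenPerNeighborhood.getOrDefault(r.neighborhood, 0);
                double adjusted = r.score - penalty * alreadyChosen;
                if (adjusted > bestScore) {
                    bestScore = adjusted;
                    best = r;
                }
            }
            remaining.remove(best);
            ranked.add(best);
            chosenPerNeighborhood.merge(best.neighborhood, 1, Integer::sum);
        }
        return ranked;
    }
}
```

With scores 0.9 and 0.85 in one neighborhood and 0.8 in another, pointwise sorting returns the two same-neighborhood results first. With a penalty of 0.2, the greedy pass moves the other neighborhood up to second place. The loop is quadratic in the list length, so a production version would likely rerank only the top of the list.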
Outside of the scope of this talk 
____________________________ 
Ranking models 
Machine Learning infrastructure 
Tools (loadtest, deploy, etc.) 
Other Search Infrastructure services: UserProfiler, Pricing, Social, Hoods, etc.