SlideShare a Scribd company logo
1 of 24
Download to read offline
Realtime Search
Infrastructure
sphinx at craigslist
!
Jeremy Zawodny
@jzawodn
https://github.com/jzawodn
http://blog.zawodny.com/
Jeremy@Zawodny.com
About Me
at craigslist since mid-2008!
first major project: “fix search”!
Perl, search, MySQL, redis, MongoDB, data,
backend services!
previously: Yahoo! and Marathon Oil!
wrote 1st edition of High Performance MySQL
About craigslist
engineering culture!
no product managers or marketing!
< 50 employees!
self-hosted infrastructure we own & manage!
no “cloud” or virtualization!
multi-datacenter!
driven by user needs and feedback
Search
Outline
challenges!
history of search at craigslist!
lessons!
questions?
Challenges
indexing rate (incoming volume)!
thousands of postings per minute!
churn and half-life!
traffic (always increasing)!
peak over 4,000 queries/second!
query multipliers (new features)!
spreading the load!
sharding and partitioning
History
search 1.0: Perl + DBM (2000-2002)!
search 2.0: MySQL Full-Text (2002-2008)!
search 3.0: sphinx master/slave (2008-2011)!
search 4.0: autonomous sphinx (2011-2013)!
search 5.0: realtime sphinx (2013 - today)
Evolution
needs and desires are changing!
sphinx is improving!
hardware is more capable!
learn from previous mistakes!
it’s fun to do new things :-)!
searching > browsing
MySQL Full-Text
MySQL Full-Text
used up until late 2008!
manual sharding!
performance was poor (easy to DoS)!
often fell off a cliff!
limited query syntax!
MyISAM corruption
Sphinx
Sphinx
awesome!
speaks MySQL
protocol!
amazingly fast
indexing!
very fast queries!
easy to understand!
programmer
friendly!
open source!
great support!
stable
Sphinx Tools
searchd: the sphinx server process!
multi-threaded or pre-forking!
indexer: build batch indexes off-line!
indextool: check indexes and get details!
search: diagnostic tool for simple searches
Master/Slave Sphinx
one index per city!
growth by sharding into 2 then 3 clusters!
masters build indexes every 10 minutes!
used indexer and perl scripts to generate XML!
build versioning and rollback mechanism!
slaves pull indexes via rsync and reload!
used pre-forking config!
hardware was dual proc, dual core AMD Opterons with 32GB RAM
Master/Slave Sphinx
master slave2
slave1
slave3
Postings DB
web1
web2
web3
web4
web5
web6
Hardware Upgrade!
Autonomous Sphinx
24 cores, 72GB RAM, 300GB SSD!
combine master and slave onto single node!
eliminate SPOF!
no replication delay!
simplify codebase!
better utilization of hardware
Autonomous Sphinx
Autonomous Sphinx
slave2
slave1
slave3
Postings DB
web1
web2
web3
web4
web5
web6
Real-Time Sphinx
RT indexes in sphinx have matured!
reduce overhead from the searchd restart!
reduce time to search from posting going live!
goal < 10 seconds!
eliminate XML generation code!
use MySQL protocol
Sphinx Clusters
Live: what you use!
highest traffic, volume, churn!
Team: what we use!
lowest traffic, lots of extra data!
Forums: yes, we have threaded discussions!
low volume, low traffic!
Archive: posting more than a few months old!
terabytes of indexes, constantly growing
Ram & Disk Chunks
Indexes begin as “ram chunks”!
rt_mem_limit caps their size!
once too large, they become “disk chunks”!
obviously, disk is slower than RAM!
the more chunks, the more docs to check!
query times fall, CPU use rises...
Lessons
stopwords: google has spoiled users!
MONITOR ALL THE THINGS!!1!!
Mind your rt_mem_limit!
Keep it all in RAM!
Make re-indexing easy!
Automate cloning
Questions?
While you ask, keeping in mind...!
craigslist is hiring!
front-end developers!
systems and network admins!
back-end developers!
send me your resume: z@craigslist.org!
https://www.craigslist.org/about/craigslist_is_hiring

More Related Content

What's hot

Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localyticsandrew311
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Ontico
 
Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChanger
Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChangerZero to 1 Billion+ Records: A True Story of Learning & Scaling GameChanger
Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChangerMongoDB
 
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...leifwalsh
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs RedisItamar Haber
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorPierre Baillet
 
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops TeamManaging 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops TeamRedis Labs
 
A simple introduction to redis
A simple introduction to redisA simple introduction to redis
A simple introduction to redisZhichao Liang
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentMongoDB
 
Sharding
ShardingSharding
ShardingMongoDB
 
Cassandra vs. Redis
Cassandra vs. RedisCassandra vs. Redis
Cassandra vs. RedisTim Lossen
 
企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用
企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用
企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用Akira Kitauchi
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the WorldSATOSHI TAGOMORI
 
MongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining William Simms
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture ForumChristopher Spring
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableShu Ting Tseng
 
High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011Tim Y
 
Back to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingBack to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingMongoDB
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSMongoDB
 

What's hot (20)

Optimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at LocalyticsOptimizing MongoDB: Lessons Learned at Localytics
Optimizing MongoDB: Lessons Learned at Localytics
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
 
Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChanger
Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChangerZero to 1 Billion+ Records: A True Story of Learning & Scaling GameChanger
Zero to 1 Billion+ Records: A True Story of Learning & Scaling GameChanger
 
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
 
Why Your MongoDB Needs Redis
Why Your MongoDB Needs RedisWhy Your MongoDB Needs Redis
Why Your MongoDB Needs Redis
 
MongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log CollectorMongoFr : MongoDB as a log Collector
MongoFr : MongoDB as a log Collector
 
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops TeamManaging 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
Managing 50K+ Redis Databases Over 4 Public Clouds ... with a Tiny Devops Team
 
A simple introduction to redis
A simple introduction to redisA simple introduction to redis
A simple introduction to redis
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production Deployment
 
Sharding
ShardingSharding
Sharding
 
Cassandra vs. Redis
Cassandra vs. RedisCassandra vs. Redis
Cassandra vs. Redis
 
企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用
企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用
企業・業界情報プラットフォームSPEEDAにおけるElasticsearchの活用
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
MongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB Basic Concepts
MongoDB Basic Concepts
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining
 
Redis overview for Software Architecture Forum
Redis overview for Software Architecture ForumRedis overview for Software Architecture Forum
Redis overview for Software Architecture Forum
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
 
High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011High Performance Weibo QCon Beijing 2011
High Performance Weibo QCon Beijing 2011
 
Back to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingBack to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to Sharding
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
 

Viewers also liked

Webinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDBWebinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDBBoxed Ice
 
Craigslist by the Numbers
Craigslist by the NumbersCraigslist by the Numbers
Craigslist by the NumbersDevin Foley
 
Fulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesFulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesAdrian Nuta
 
Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.Dhaval Dalal
 
Real time fulltext search with sphinx
Real time fulltext search with sphinxReal time fulltext search with sphinx
Real time fulltext search with sphinxAdrian Nuta
 
Managing Big Data with MySQL
Managing Big Data with MySQLManaging Big Data with MySQL
Managing Big Data with MySQLmwasaha mwagambo
 
Social Media Trends - Content Curation
Social Media Trends - Content CurationSocial Media Trends - Content Curation
Social Media Trends - Content CurationChris Mikulin
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLNguyen Van Vuong
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MYXPLAIN
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practiceEugene Fidelin
 
Analytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsAnalytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsDataWorks Summit
 
MySQL Performance Tips & Best Practices
MySQL Performance Tips & Best PracticesMySQL Performance Tips & Best Practices
MySQL Performance Tips & Best PracticesIsaac Mosquera
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)MongoDB
 
MySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 TipsMySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 TipsOSSCube
 
Plaquette Commerciale Phone Contact
Plaquette Commerciale Phone ContactPlaquette Commerciale Phone Contact
Plaquette Commerciale Phone Contactphonecontact
 
Canada Digital Future 2014
Canada Digital Future 2014Canada Digital Future 2014
Canada Digital Future 2014Counselorauto
 

Viewers also liked (20)

Webinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDBWebinar - Approaching 1 billion documents with MongoDB
Webinar - Approaching 1 billion documents with MongoDB
 
Craigslist by the Numbers
Craigslist by the NumbersCraigslist by the Numbers
Craigslist by the Numbers
 
Fulltext engine for non fulltext searches
Fulltext engine for non fulltext searchesFulltext engine for non fulltext searches
Fulltext engine for non fulltext searches
 
Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.Midas - on-the-fly schema migration tool for MongoDB.
Midas - on-the-fly schema migration tool for MongoDB.
 
Tayra
TayraTayra
Tayra
 
SphinxSearch
SphinxSearchSphinxSearch
SphinxSearch
 
Real time fulltext search with sphinx
Real time fulltext search with sphinxReal time fulltext search with sphinx
Real time fulltext search with sphinx
 
Managing Big Data with MySQL
Managing Big Data with MySQLManaging Big Data with MySQL
Managing Big Data with MySQL
 
Social Media Trends - Content Curation
Social Media Trends - Content CurationSocial Media Trends - Content Curation
Social Media Trends - Content Curation
 
Sphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQLSphinx - High performance full-text search for MySQL
Sphinx - High performance full-text search for MySQL
 
MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6MySQL Indexing - Best practices for MySQL 5.6
MySQL Indexing - Best practices for MySQL 5.6
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practice
 
Analytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table FunctionsAnalytical Queries with Hive: SQL Windowing and Table Functions
Analytical Queries with Hive: SQL Windowing and Table Functions
 
MySQL Performance Tips & Best Practices
MySQL Performance Tips & Best PracticesMySQL Performance Tips & Best Practices
MySQL Performance Tips & Best Practices
 
Fast querying indexing for performance (4)
Fast querying   indexing for performance (4)Fast querying   indexing for performance (4)
Fast querying indexing for performance (4)
 
MySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 TipsMySQL Performance Tuning: Top 10 Tips
MySQL Performance Tuning: Top 10 Tips
 
Plaquette Commerciale Phone Contact
Plaquette Commerciale Phone ContactPlaquette Commerciale Phone Contact
Plaquette Commerciale Phone Contact
 
Qy stainless steel self priming gas-liquid mixing pump
Qy stainless steel self priming gas-liquid mixing pumpQy stainless steel self priming gas-liquid mixing pump
Qy stainless steel self priming gas-liquid mixing pump
 
Canada Digital Future 2014
Canada Digital Future 2014Canada Digital Future 2014
Canada Digital Future 2014
 
Ripening of a RESTful API
Ripening of a RESTful APIRipening of a RESTful API
Ripening of a RESTful API
 

Similar to Realtime Search Infrastructure at Craigslist (OpenWest 2014)

phpXperts seminar 2010 CodeMan! with noSQL!
phpXperts seminar 2010 CodeMan! with noSQL!phpXperts seminar 2010 CodeMan! with noSQL!
phpXperts seminar 2010 CodeMan! with noSQL!nhm taveer hossain khan
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At CraigslistMySQLConference
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSDataStax Academy
 
Plugin Opensql2008 Sphinx
Plugin Opensql2008 SphinxPlugin Opensql2008 Sphinx
Plugin Opensql2008 SphinxLiu Lizhi
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdbjixuan1989
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
Zabbix 3.0 and beyond - FISL 2015
Zabbix 3.0 and beyond - FISL 2015Zabbix 3.0 and beyond - FISL 2015
Zabbix 3.0 and beyond - FISL 2015Zabbix
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraDataStax Academy
 
Alexander Sibiryakov- Frontera
Alexander Sibiryakov- FronteraAlexander Sibiryakov- Frontera
Alexander Sibiryakov- FronteraPyData
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedis Labs
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBJustin Smestad
 
MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL Bernd Ocklin
 
AWS Welcome to re:Invent recap - 20161214
AWS Welcome to re:Invent recap - 20161214AWS Welcome to re:Invent recap - 20161214
AWS Welcome to re:Invent recap - 20161214Amazon Web Services
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用LINE Corporation
 
Web scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and ScrapyWeb scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and ScrapyLITTINRAJAN
 

Similar to Realtime Search Infrastructure at Craigslist (OpenWest 2014) (20)

phpXperts seminar 2010 CodeMan! with noSQL!
phpXperts seminar 2010 CodeMan! with noSQL!phpXperts seminar 2010 CodeMan! with noSQL!
phpXperts seminar 2010 CodeMan! with noSQL!
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
My Sql And Search At Craigslist
My Sql And Search At CraigslistMy Sql And Search At Craigslist
My Sql And Search At Craigslist
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
 
Plugin Opensql2008 Sphinx
Plugin Opensql2008 SphinxPlugin Opensql2008 Sphinx
Plugin Opensql2008 Sphinx
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Zabbix 3.0 and beyond - FISL 2015
Zabbix 3.0 and beyond - FISL 2015Zabbix 3.0 and beyond - FISL 2015
Zabbix 3.0 and beyond - FISL 2015
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
 
Alexander Sibiryakov- Frontera
Alexander Sibiryakov- FronteraAlexander Sibiryakov- Frontera
Alexander Sibiryakov- Frontera
 
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach ShoolmanRedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
RedisConf17 - Doing More With Redis - Ofer Bengal and Yiftach Shoolman
 
Solr @ eBay Kleinanzeigen
Solr @ eBay KleinanzeigenSolr @ eBay Kleinanzeigen
Solr @ eBay Kleinanzeigen
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL MySQL NDB Cluster 8.0 SQL faster than NoSQL
MySQL NDB Cluster 8.0 SQL faster than NoSQL
 
AWS Welcome to re:Invent recap - 20161214
AWS Welcome to re:Invent recap - 20161214AWS Welcome to re:Invent recap - 20161214
AWS Welcome to re:Invent recap - 20161214
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用ログ収集プラットフォーム開発におけるElasticsearchの運用
ログ収集プラットフォーム開発におけるElasticsearchの運用
 
Web scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and ScrapyWeb scraping with BeautifulSoup, LXML, RegEx and Scrapy
Web scraping with BeautifulSoup, LXML, RegEx and Scrapy
 
Be faster then rabbits
Be faster then rabbitsBe faster then rabbits
Be faster then rabbits
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 

Recently uploaded

Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxRomil Mishra
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgsaravananr517913
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptxNikhil Raut
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptJasonTagapanGulla
 

Recently uploaded (20)

Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
 
Mine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptxMine Environment II Lab_MI10448MI__________.pptx
Mine Environment II Lab_MI10448MI__________.pptx
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfgUnit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
Unit7-DC_Motors nkkjnsdkfnfcdfknfdgfggfg
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
Steel Structures - Building technology.pptx
Steel Structures - Building technology.pptxSteel Structures - Building technology.pptx
Steel Structures - Building technology.pptx
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Solving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.pptSolving The Right Triangles PowerPoint 2.ppt
Solving The Right Triangles PowerPoint 2.ppt
 

Realtime Search Infrastructure at Craigslist (OpenWest 2014)

  • 1. Realtime Search Infrastructure sphinx at craigslist ! Jeremy Zawodny @jzawodn https://github.com/jzawodn http://blog.zawodny.com/ Jeremy@Zawodny.com
  • 2. About Me at craigslist since mid-2008! first major project: “fix search”! Perl, search, MySQL, redis, MongoDB, data, backend services! previously: Yahoo! and Marathon Oil! wrote 1st edition of High Performance MySQL
  • 3. About craigslist engineering culture! no product managers or marketing! < 50 employees! self-hosted infrastructure we own & manage! no “cloud” or virtualization! multi-datacenter! driven by user needs and feedback
  • 5. Outline challenges! history of search at craigslist! lessons! questions?
  • 6. Challenges indexing rate (incoming volume)! thousands of postings per minute! churn and half-life! traffic (always increasing)! peak over 4,000 queries/second! query multipliers (new features)! spreading the load! sharding and partitioning
  • 7. History search 1.0: Perl + DBM (2000-2002)! search 2.0: MySQL Full-Text (2002-2008)! search 3.0: sphinx master/slave (2008-2011)! search 4.0: autonomous sphinx (2011-2013)! search 5.0: realtime sphinx (2013 - today)
  • 8. Evolution needs and desires are changing! sphinx is improving! hardware is more capable! learn from previous mistakes! it’s fun to do new things :-)! searching > browsing
  • 10. MySQL Full-Text used up until late 2008! manual sharding! performance was poor (easy to DoS)! often fell off a cliff! limited query syntax! MyISAM corruption
  • 12. Sphinx awesome! speaks MySQL protocol! amazingly fast indexing! very fast queries! easy to understand! programmer friendly! open source! great support! stable
  • 13. Sphinx Tools searchd: the sphinx server process! multi-threaded or pre-forking! indexer: build batch indexes off-line! indextool: check indexes and get details! search: diagnostic tool for simple searches
  • 14. Master/Slave Sphinx one index per city! growth by sharding into 2 then 3 clusters! masters build indexes every 10 minutes! used indexer and perl scripts to generate XML! build versioning and rollback mechanism! slaves pull indexes via rsync and reload! used pre-forking config! hardware was dual proc, dual core AMD Opterons with 32GB RAM
  • 17. Autonomous Sphinx 24 cores, 72GB RAM, 300GB SSD! combine master and slave onto single node! eliminate SPOF! no replication delay! simplify codebase! better utilization of hardware
  • 20. Real-Time Sphinx RT indexes in sphinx have matured! reduce overhead from the searchd restart! reduce time to search from posting going live! goal < 10 seconds! eliminate XML generation code! use MySQL protocol
  • 21. Sphinx Clusters Live: what you use! highest traffic, volume, churn! Team: what we use! lowest traffic, lots of extra data! Forums: yes, we have threaded discussions! low volume, low traffic! Archive: posting more than a few months old! terabytes of indexes, constantly growing
  • 22. Ram & Disk Chunks Indexes begin as “ram chunks”! rt_mem_limit caps their size! once too large, they become “disk chunks”! obviously, disk is slower than RAM! the more chunks, the more docs to check! query times fall, CPU use rises...
  • 23. Lessons stopwords: google has spoiled users! MONITOR ALL THE THINGS!!1!! Mind your rt_mem_limit! Keep it all in RAM! Make re-indexing easy! Automate cloning
  • 24. Questions? While you ask, keeping in mind...! craigslist is hiring! front-end developers! systems and network admins! back-end developers! send me your resume: z@craigslist.org! https://www.craigslist.org/about/craigslist_is_hiring