
Realtime Search Infrastructure at Craigslist (OpenWest 2014)


A brief history of search infrastructure at craigslist with an emphasis on our recent transition to using realtime (RT) indexing in Sphinx.

Published in: Engineering, Technology


  1. Realtime Search Infrastructure: sphinx at craigslist. Jeremy Zawodny, @jzawodn, https://github.com/jzawodn, http://blog.zawodny.com/, Jeremy@Zawodny.com
  2. About Me: at craigslist since mid-2008; first major project: “fix search”; Perl, search, MySQL, redis, MongoDB, data, backend services; previously: Yahoo! and Marathon Oil; wrote 1st edition of High Performance MySQL
  3. About craigslist: engineering culture; no product managers or marketing; < 50 employees; self-hosted infrastructure we own & manage; no “cloud” or virtualization; multi-datacenter; driven by user needs and feedback
  4. Search
  5. Outline: challenges; history of search at craigslist; lessons; questions?
  6. Challenges: indexing rate (incoming volume); thousands of postings per minute; churn and half-life; traffic (always increasing); peak over 4,000 queries/second; query multipliers (new features); spreading the load; sharding and partitioning
  7. History: search 1.0: Perl + DBM (2000-2002); search 2.0: MySQL Full-Text (2002-2008); search 3.0: sphinx master/slave (2008-2011); search 4.0: autonomous sphinx (2011-2013); search 5.0: realtime sphinx (2013-today)
  8. Evolution: needs and desires are changing; sphinx is improving; hardware is more capable; learn from previous mistakes; it’s fun to do new things :-); searching > browsing
  9. MySQL Full-Text
  10. MySQL Full-Text: used up until late 2008; manual sharding; performance was poor (easy to DoS); often fell off a cliff; limited query syntax; MyISAM corruption
  11. Sphinx
  12. Sphinx: awesome; speaks MySQL protocol; amazingly fast indexing; very fast queries; easy to understand; programmer friendly; open source; great support; stable
  13. Sphinx Tools: searchd: the sphinx server process (multi-threaded or pre-forking); indexer: build batch indexes off-line; indextool: check indexes and get details; search: diagnostic tool for simple searches
  14. Master/Slave Sphinx: one index per city; growth by sharding into 2 then 3 clusters; masters build indexes every 10 minutes; used indexer and perl scripts to generate XML; build versioning and rollback mechanism; slaves pull indexes via rsync and reload; used pre-forking config; hardware was dual proc, dual core AMD Opterons with 32GB RAM
  15. Master/Slave Sphinx [diagram: Postings DB, master, slave1, slave2, slave3, web1 through web6]
  16. Hardware Upgrade!
  17. Autonomous Sphinx: 24 cores, 72GB RAM, 300GB SSD; combine master and slave onto single node; eliminate SPOF; no replication delay; simplify codebase; better utilization of hardware
  18. Autonomous Sphinx
  19. Autonomous Sphinx [diagram: Postings DB, slave1, slave2, slave3, web1 through web6]
  20. Real-Time Sphinx: RT indexes in sphinx have matured; reduce overhead from the searchd restart; reduce time to search from posting going live; goal < 10 seconds; eliminate XML generation code; use MySQL protocol
  21. Sphinx Clusters: Live: what you use (highest traffic, volume, churn); Team: what we use (lowest traffic, lots of extra data); Forums: yes, we have threaded discussions (low volume, low traffic); Archive: postings more than a few months old (terabytes of indexes, constantly growing)
  22. Ram & Disk Chunks: indexes begin as “ram chunks”; rt_mem_limit caps their size; once too large, they become “disk chunks”; obviously, disk is slower than RAM; the more chunks, the more docs to check; query times fall, CPU use rises...
  23. Lessons: stopwords: google has spoiled users; MONITOR ALL THE THINGS!!1!; Mind your rt_mem_limit; Keep it all in RAM; Make re-indexing easy; Automate cloning
  24. Questions? While you ask, keep in mind... craigslist is hiring: front-end developers; systems and network admins; back-end developers; send me your resume: z@craigslist.org; https://www.craigslist.org/about/craigslist_is_hiring
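
Slide 20 mentions dropping the XML generation code and feeding RT indexes over the MySQL protocol. The slides do not show how craigslist does this, so the following is only a minimal sketch of what such an update path might look like. The index name `postings_rt` and the fields `title` and `body` are hypothetical, and the cursor is assumed to be any Python DB-API cursor whose driver substitutes `%s` parameters on the client side.

```python
from typing import Any, Optional, Tuple

# Hypothetical RT index name; the slides do not give the real schema.
RT_INDEX = "postings_rt"


def build_replace(doc_id: int, title: str, body: str) -> Tuple[str, Tuple[Any, ...]]:
    """Build a SphinxQL REPLACE statement that inserts or overwrites one document."""
    sql = f"REPLACE INTO {RT_INDEX} (id, title, body) VALUES (%s, %s, %s)"
    return sql, (doc_id, title, body)


def build_delete(doc_id: int) -> Tuple[str, Tuple[Any, ...]]:
    """Build a SphinxQL DELETE statement that removes one document by id."""
    sql = f"DELETE FROM {RT_INDEX} WHERE id = %s"
    return sql, (doc_id,)


def apply_posting_change(
    cursor: Any,
    doc_id: int,
    title: Optional[str] = None,
    body: Optional[str] = None,
) -> str:
    """Push one posting change to the RT index through a DB-API cursor.

    A posting with both title and body is upserted; a posting with either
    missing is treated as removed (an assumption made for this sketch).
    Returns the SQL text that was executed.
    """
    if title is None or body is None:
        sql, params = build_delete(doc_id)
    else:
        sql, params = build_replace(doc_id, title, body)
    cursor.execute(sql, params)
    return sql
```

In practice the cursor would come from a MySQL client library connected to searchd's SphinxQL listener rather than to a MySQL server. Because each change is a single statement, no batch index rebuild or searchd restart is involved, which is the overhead the slide says RT indexing was meant to reduce.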
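
Slide 22 describes how an RT index starts as a RAM chunk, is capped by `rt_mem_limit`, and spills into disk chunks that every query then has to check. The toy model below illustrates that behaviour only; it is not how Sphinx is implemented. The class name `ToyRtIndex`, the byte counting, and the whitespace tokenizer are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyRtIndex:
    """Toy model of an RT index: one RAM chunk plus a growing list of disk chunks."""

    mem_limit: int  # stand-in for rt_mem_limit, counted in bytes of document text
    ram_chunk: Dict[int, str] = field(default_factory=dict)
    ram_bytes: int = 0
    disk_chunks: List[Dict[int, str]] = field(default_factory=list)

    def insert(self, doc_id: int, text: str) -> None:
        """Add a document to the RAM chunk (ids are assumed to be unique)."""
        self.ram_chunk[doc_id] = text
        self.ram_bytes += len(text.encode("utf-8"))
        if self.ram_bytes > self.mem_limit:
            # The RAM chunk outgrew the limit: freeze it as a "disk chunk"
            # and start over with an empty RAM chunk.
            self.disk_chunks.append(self.ram_chunk)
            self.ram_chunk = {}
            self.ram_bytes = 0

    def chunks_checked(self) -> int:
        """Number of chunks a single query has to look at."""
        return 1 + len(self.disk_chunks)

    def search(self, word: str) -> List[int]:
        """Return sorted ids of documents containing the word, scanning every chunk."""
        hits: List[int] = []
        for chunk in [self.ram_chunk, *self.disk_chunks]:
            hits.extend(
                doc_id for doc_id, text in chunk.items() if word in text.split()
            )
        return sorted(hits)
```

With a small limit the model produces many disk chunks and `chunks_checked()` grows with every flush, which is the per-query cost the slide warns about. A limit large enough to hold the working set keeps everything in one RAM chunk, matching the "Keep it all in RAM" lesson on slide 23.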
