Implementing Search with Solr at 7digital

Presented by James Atherton, Search Team Lead, 7digital

A usage case study describing our journey as we implemented Lucene/Solr, the lessons we learned along the way, and where we hope to go in the future. It covers how we implemented our instant search/search suggest, and how we handle indexing 400 million tracks and metadata for over 40 countries, comprising over 300GB of data and about 70GB of indexes.

  1. Implementing Search with Solr at 7digital
     James Atherton, Content Discovery Team Lead
  2. Implementing Search with Solr
     James Atherton, Content Discovery Lead, @mr_road
  3. Who is 7digital?
     Online digital content provider
     Covering over 47 territories
     Online music store: www.7digital.com
     API: api.7digital.com
     We power a number of music services: Samsung, Blackberry, Turntable.fm, Pure
  4. Where we came from...
     SQL searches:
     SELECT * FROM <table> WHERE name LIKE <search_term>%;
     This was SLOW and BAD!!
  5. Wrapped Solr in an API
  6. Old Architecture
     [Diagram labels: API, DB]
  7. Domain Objects
     Artist Documents
     Release Documents (e.g. album or single)
     Track Documents
  8. First Attempt - 2011
     • Artists and Releases
     • Solr 1.4
     • 17 stores
     • ~40GB
     • Dropped DIH as it had issues
  9. 2011 Architecture
     [Diagram labels: HTTP, API, Search API, Solr, DB, Solr, Tracks, Artists, Releases]
  10. 2012
     • Added Tracks Core
     • Solr 3.5
     • 47 stores
     • ~400GB
     • More than 430 M docs
     • Didn't revisit DIH
  11. Current Architecture
     [Diagram labels: HTTP, API, Search API, Artist/Release Solrs (x2), Track Solrs (x4)]
  12. Things Learnt
     We should have split by <X>; for us, Shops.
  13. Beware Inflection Points
     Data size: 400GB != 40GB * 10
     Throughput: 600 rpm IS NOT 4 * 150 rpm
  14. What do we want in our servers?
     RAM? Fast disks? CPUs? Virtual? Bare metal?
  15. Optimize, really...?
  16. Cache Warming/First search?
  17. Testing
     Test ingestion/data import, then test again
     Your data is not as clean as you think
     Load test early and often
     We still need to be better at this
  18. Logs
     Logging is worth its weight in gold
     But don't get weighed down
  19. Monitoring
     We use statsd/graphite and NewRelic
  20. Visualise Indexing
     Which territory's data has been indexed?
  21. Instant Search
  22. Magic Deploys
     We recently adopted CFEngine, it is awesome!!
  23. The Future
     [Diagram labels: HTTP, API, Search API, Artist Solrs, Release Solrs, Track Solrs (x4)]
     Solr Cloud, in the Cloud??
  24. Questions?
  25. Resources
     https://github.com/etsy/statsd/
     https://github.com/7digital
     http://d3js.org/
  26. James Atherton
     @mr_road
     @7digital
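
Slide 4 contrasts the old SQL LIKE prefix search with Solr. As a rough illustration of what the equivalent request might look like, the sketch below builds a URL for Solr's standard select handler using a prefix query. The base URL, the core name "artists" and the field name "name" are assumptions for the example, not details taken from the talk, and the deck does not say how 7digital's Search API actually builds its queries.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Characters with special meaning in the Lucene/Solr query syntax,
# plus the space, so a multi-word term stays a single prefix clause.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/ ')


def build_prefix_query_url(base_url, core, field, term, rows=10):
    """Build a Solr select URL that prefix-matches `term` on `field`.

    Roughly the Solr counterpart of: WHERE name LIKE '<term>%'
    All argument values used below are hypothetical.
    """
    escaped = "".join("\\" + ch if ch in _SPECIAL else ch for ch in term)
    params = {"q": f"{field}:{escaped}*", "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    url = build_prefix_query_url(
        "http://localhost:8983/solr", "artists", "name", "beat"
    )
    print(url)
    # Decode the query string again to show the raw Solr query.
    print(parse_qs(urlsplit(url).query)["q"][0])
```

The function only builds the URL and does not contact a server, so it can be tried without a running Solr instance. Whether a prefix query behaves well depends on how the field is analysed in the schema, which the slides do not describe.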
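
Slide 19 mentions statsd/graphite for monitoring. A minimal sketch of how a search call could be timed and reported using the plain-text StatsD timer format ("name:value|ms" sent over UDP) is shown below. The metric name "search.query_time", the host and port, and the helper names are all hypothetical; the talk does not show which metrics 7digital records or which client they use.

```python
import socket
import time


def format_timer(name, millis):
    """Format a StatsD timer metric line, e.g. 'search.query_time:12|ms'."""
    return f"{name}:{millis}|ms"


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one metric line over UDP (fire-and-forget, no reply expected)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


def timed_search(search_fn, query, metric="search.query_time", send=send_metric):
    """Run search_fn(query), report its duration in ms, return its result."""
    start = time.perf_counter()
    result = search_fn(query)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    send(format_timer(metric, elapsed_ms))
    return result


if __name__ == "__main__":
    # Stand-in for a real Solr call; prints the metric instead of sending it.
    fake_search = lambda q: [f"result for {q}"]
    print(timed_search(fake_search, "beatles", send=print))
```

Passing the sender in as a parameter keeps the timing logic testable without a StatsD daemon; in a real service the default UDP sender would be used.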