Inside Solr 5 - Bangalore Solr/Lucene Meetup

What's new in Solr 5 and what's in store in the near term

Published in: Software

  1. October 13-15, 2015 • Austin, TX • http://lucenerevolution.org
  2. Inside Apache Solr 5
  3. Apache Solr + Lucidworks: Community, Customers, Products
  4. Search is more than just a box.
  5. Search makes data personal, contextual, actionable.
  6. Search can be smarter.
     • Inputs: location, search history, query, security, context
     • Personal, contextual, relevant results: consumer-like simplicity and power in the enterprise.
  7. Product Offering
     • Columns: Environment, Features, Support Level, Additional Support, Availability, Response Time, Number of Incidents, Pricing Model
     • Environments: Production, Development
     • Solr Enterprise: 24x7, SLA-Backed, Unlimited Incidents, Per Node; Dev Support (4 Contacts), Operational Support, Regular Health Checks; Security, Log Analysis / SiLK Support, Dashboards & Reporting, Enhanced Admin UI
     • Fusion: Dev Support (4 Contacts), Operational Support, Regular Health Checks; 24x7, SLA-Backed, Unlimited Incidents, Per Node; Security, Crawlers & Connectors, Log Analysis / SiLK Support, Enhanced Admin UI, Data Enrichment, Machine Learning, Recommendations, Advanced Relevancy Tuning
     • Developer Support: How-To Support, Knowledge Base, Fusion Support; 9x5, SLA-Backed, Unlimited Incidents, Per Named Developer
  8. Inside Apache Solr 5
     • Get Started
     • Dig In
     • Go Big
     • Get Finished
     • Sneak peek
  9. Get Started
     • Easy to start/stop: ./bin/solr {start|stop}
     • Create collections: ./bin/solr create -c <COLL_NAME>
     • No more WAR! The web container (Jetty) is now an implementation detail
     • Scripts to support installing and running Solr as a service on Linux
  10. Your Content, Your Way
      JSON’s great:
      • Solr 5 “does the right thing” for JSON out of the box
      Except when it isn’t:
      • Most data isn’t JSON
      • Solr handles CSV, XML, and rich content out of the box, without having to install plugins
  11. Your Content, Your Way
      • Solr 5 will ship Tika 1.7, adding:
        • OCR support
        • PST and Matlab
        • Better date handling
      • More flexibility with spatial units
  12. Dig In
  13. Pivots and Stats
      • Stats and Pivot faceting now work together
      • Focused on accuracy of results
      • First few steps in unification of all facet types with stats and aggregations
      • http://lucidworks.com/blog/you-got-stats-in-my-facets/
  14. API Goodness
      • Schema API: REST API for adding field types and dynamic fields
      • Managing Request Handlers through an API
      • Implicit registration of the Replication, Real Time Get, and Administration handlers
      • Improved APIs for managing collections
  15. Lucene 5 Highlights
      • Stronger index safety guarantees
      • Reduced memory usage in a number of areas
      • No more FieldCache (replaced with UninvertingReader)
      • Multi-valued sorting and suggesters
      • Better IO defaults when using SSDs
      • More efficient handling of merging stored fields
  16. Go Big
      • Many scaling improvements focused on interactions with ZooKeeper:
        • Split cluster state management reduces chattiness in large multi-tenant implementations
        • Performance of Overseer operations improved by more than 40%
        • Better timeout defaults based on real-world testing
      • See my Lucene Revolution keynote for more details: http://bit.ly/shalinRevKeynote
  17. Distributed IDF
      • IDF = Inverse Document Frequency = a measure of the relative importance of a word in a collection
      • 4 implementations:
        • LocalStatsCache: local stats
        • ExactStatsCache: one-time-use aggregation
        • ExactSharedStatsCache: stats shared across requests
        • LRUStatsCache: stats shared in an LRU cache across requests
  18. Get Finished
      • Ease of getting started means nothing if you can’t stay running in production
      • Jepsen tests simulate network partitions and data loss, i.e. “The Real World”
      • https://github.com/LucidWorks/jepsen/tree/solr-jepsen
      • http://bit.ly/solr-jepsen
  19. Stability Improvements
      • Protection of ZK content
      • ReplicationHandler now has an option to throttle the speed of replication
      • More control over terminating long-running queries
      • Finite default timeouts for select and update requests
  21. Near Term Road Map
      • Facets and Analytics:
        • Mix and match all facet types and stats (SOLR-6352, SOLR-6353, SOLR-4212)
        • Percentiles via t-digest (SOLR-6350)
      • Replication performance (SOLR-6816)
      • Finish off Config APIs (various)
      • Data-location-aware ValueSource implementation for fast-changing distributed data
      • First-class support for more languages out of the box
  22. Resources
      Release Notes:
      • Solr: http://wiki.apache.org/solr/ReleaseNote50
      • Lucene: https://wiki.apache.org/lucene-java/ReleaseNote50
      Lucidworks: http://www.lucidworks.com
      Shalin Shekhar Mangar
      • shalin@apache.org
      • Twitter: https://twitter.com/shalinmangar
  23. Credits
      • “What’s new in Solr 5.0” - Anshum Gupta: http://www.slideshare.net/anshumg/solr-50
      • Lucidworks webinar “Inside Solr 5” - Grant Ingersoll: http://www.slideshare.net/lucidworks/webinar-inside-apache-solr-5
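
The "Pivots and Stats" slide says stats and pivot faceting now work together. As a rough sketch, the request parameters for such a query might be assembled as below. The field names (`cat`, `inStock`, `price`) and the tag name `piv` are placeholders, not taken from the slides; the `{!tag=...}` / `{!stats=...}` local-params pairing is how this feature is usually described, but check the linked blog post for the authoritative syntax.

```python
from urllib.parse import urlencode, parse_qs


def pivot_stats_params(pivot_fields, stats_field, tag="piv"):
    """Build query parameters that attach stats on one field to a pivot facet.

    All names passed in are hypothetical examples; nothing here contacts Solr.
    """
    return [
        ("q", "*:*"),
        ("rows", "0"),          # only facet/stats output is wanted
        ("facet", "true"),
        ("stats", "true"),
        # Tag the stats field so the pivot can refer to it by name.
        ("stats.field", "{!tag=%s}%s" % (tag, stats_field)),
        # Ask for the tagged stats to be computed inside each pivot bucket.
        ("facet.pivot", "{!stats=%s}%s" % (tag, ",".join(pivot_fields))),
    ]


query_string = urlencode(pivot_stats_params(["cat", "inStock"], "price"))
print(query_string)
```

The resulting string would be appended to a collection's select URL; the sketch stops short of sending it so that it runs without a Solr instance.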
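
The "API Goodness" slide mentions a REST Schema API for adding field types and dynamic fields. A minimal sketch of what such a call could look like follows. The base URL, the collection name `films`, and the `*_txt` / `text_general` field definition are assumptions chosen for illustration, and the request is only constructed, not sent.

```python
import json
import urllib.request


def add_dynamic_field_request(base_url, collection, pattern, field_type, stored=True):
    """Build (but do not send) a Schema API request that adds a dynamic field.

    base_url, collection, pattern and field_type are caller-supplied examples.
    """
    payload = {
        "add-dynamic-field": {
            "name": pattern,
            "type": field_type,
            "stored": stored,
        }
    }
    return urllib.request.Request(
        url="%s/%s/schema" % (base_url, collection),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = add_dynamic_field_request(
    "http://localhost:8983/solr", "films", "*_txt", "text_general"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a running Solr, urllib.request.urlopen(req) would submit the change.
```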
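
The "Distributed IDF" slide is easier to follow with numbers. The sketch below compares shard-local IDF (what a local-stats approach would use) with IDF computed from statistics aggregated across shards (what the exact variants aim for). The shard counts are invented, and the formula is the classic Lucene TF-IDF form, 1 + ln(numDocs / (docFreq + 1)), used here as an assumption about the similarity in play.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic TF-IDF inverse document frequency: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical per-shard statistics for one term: (docFreq, numDocs).
shard_stats = {
    "shard1": (90, 1000),  # the term is common on this shard
    "shard2": (2, 1000),   # the term is rare on this shard
}

# Local stats: each shard scores with its own view of the term's rarity.
local_idf = {name: classic_idf(df, n) for name, (df, n) in shard_stats.items()}

# Aggregated stats: sum docFreq and numDocs across shards, then compute one IDF.
global_df = sum(df for df, _ in shard_stats.values())
global_n = sum(n for _, n in shard_stats.values())
global_idf = classic_idf(global_df, global_n)

for name, value in sorted(local_idf.items()):
    print("%s local idf: %.3f" % (name, value))
print("global idf: %.3f" % global_idf)
```

With local statistics, the same term gets a much higher weight on the shard where it is rare, so scores from different shards are not directly comparable; aggregating the statistics gives every shard the same IDF.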