SlideShare uma empresa Scribd logo
1 de 28
Mastering
Solr 1.4
Yonik Seeley
March 4, 2010
Agenda



     Faceting
     Trie Fields (numeric ranges)
     Distributed Search
     1.5 Preview
     Q&A
     See http://bit.ly/mastersolr for
         Slides for download after the presentation
         On-demand viewing of the webinar in ~48 hours



                                Lucid Imagination, Inc.
My background



         Creator of Solr, the Lucene Search Server
         Co-founder of Lucid Imagination
         Expertise: Distributed Search systems and performance
         Lucene/Solr committer, a member of the Lucene PMC,
         member of the Apache Software Foundation
         Work: CNET Networks, BEA, Telcordia, among others
         M.S. in Computer Science, Stanford




                           Lucid Imagination, Inc.
Getting the most from this talk



      Assuming you’ve been completed the Solr tutorial
      Assuming you’re familiar faceting and other
      high-level Solr/Lucene concepts
      I don’t expect you to have deployed Solr
      in production, but it will provide helpful context




                              Lucid Imagination, Inc.
Faceting Deep Dive
Existing single-valued faceting algorithm
                     Documents
                     matching the
                     base query                        Lucene FieldCache Entry
                     “Juggernaut”                      (StringIndex) for the “hero” field
                          0                       order: for each
 q=Juggernaut             2    lookup             doc, an index into     lookup: the
 &facet=true              7                       the lookup array       string values
 &facet.field=hero                                         5                (null)
                                                           3              batman
                      accumulator
                                                           5                flash
                         0
                                                           1             spiderman
                         1
                                                           4             superman
                         0   increment
                                                           5             wolverine
                         0
                                                           2
                         0
                                                           1
                         2


                             Lucid Imagination, Inc.
Existing single-valued faceting algorithm



      facet.method=fc
      for each doc in base set:
        ord = FieldCache.order[doc]
        accumulator[ord]++
      O(docs_matching_q)
      Not used for boolean




                               Lucid Imagination, Inc.
Multi-valued faceting: enum method (1 of 2)

facet.method=enum
For each term in field:
  Retrieve filter
                                                                       Solr filterCache (in memory)
  Calculate intersection size
                                                                      hero:batman    hero:flash

        Lucene Inverted         Docs matching
        Index (on disk)         base query                                 1            0
                                                      intersection
                                     0                                     3            1
     batman         1 3 5 8                              count
                                     1                                     5            5
     flash          0 1 5            5                                     8
     spiderman      2 4 7            9               Priority queue
                                                      batman=2
     superman       0 6 9
     wolverine      1 2 7 8



                                         Lucid Imagination, Inc.
Multi-valued faceting: enum method (2 of 2)



      O(n_terms_in_field)
      Short circuits based on term.df
      filterCache entries int[ndocs] or BitSet(maxDoc)
      filterCache concurrency + efficiency upgrades in 1.4
      Size filterCache appropriately
      Prevent filterCache use for small terms with
      facet.enum.cache.minDf




                              Lucid Imagination, Inc.
Multi-valued faceting: new UnInvertedField
facet.method=fc
Like single-valued FieldCache method, but with multi-valued FieldCache
Good for many unique terms, relatively few values per doc
  Best case: 50x faster, 5x smaller (100K unique values, 1-5 per doc)
  O(n_docs), but optimization to count the inverse when n>maxDoc/2
Memory efficient
  Terms in a document are delta coded variable width term numbers
  Term number list for document packed in an int or in a shared byte[]
  Hybrid approach: “big terms” that match >5% of index use filterCache instead
  Only 1/128th of string values in memory


                                   Lucid Imagination, Inc.
Faceting: fieldValueCache
Implicit cache with UnInvertedField entries
  Not autowarmed – use static warming request
  http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc)




                                  Lucid Imagination, Inc.
Faceting: fieldValueCache
  Implicit cache with UnInvertedField entries
     Not autowarmed – use static warming request
     http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc)



item_cat:{field=cat,memSize=5376,tindexSize=52,time=2,phase1=2,
nTerms=16,bigTerms=10,termInstances=6,uses=44}




                                     Lucid Imagination, Inc.
Migrating from 1.3 to 1.4 faceting


   filterCache
     Lower size (need room for normal fq filters and “big terms”)
     Lower or eliminate autowarm count
   1.3 enum method can sometimes be better
     f.<fieldname>.facet.method=enum
     Field has many values per document and many unique values
     Huge index without faceting time constraints




                                 Lucid Imagination, Inc.
Multi-select faceting
http://search.lucidimagination.com
                                      Very generic support
                                            Reuses localParams syntax {!name=val}
                                            Ability to tag filters
                                            Ability to exclude certain filters when
                                            faceting, by tag
                                     q=index replication&facet=true
                                     &fq={!tag=proj}project:(lucene OR solr)
                                     &facet.field={!ex=proj}project
                                     &facet.field={!ex=src}source




                                     Lucid Imagination, Inc.                          14
New Trie Fields
(numeric ranges)
New Trie* fields

  Numeric,Date fields index at multiple precisions to speed up range
queries
  Base10 Example: 175 is indexed as hundreds:1 tens:17 ones:175
  TrieRangeQuery:[154 TO 183] is executed as
  tens:[16 TO 17] OR ones:[154 TO 159] OR ones:[180 TO 183]
  Best case: 40x speedup of range queries
  Configurable precisionStep per field (expressed in bits)
     precisionStep=8 means index first 8, then first 16, first 24, and 32 bits
     precisionStep=0 means index normally
  Extra bonus: more memory efficient FieldCache entries


                                    Lucid Imagination, Inc.
Trie* Index Overhead
100,000 documents, random 32 bit integers from 0-1000
Precision Step                    Index Size*                              Index Size Multiplier
0                                 223K                                     1
8                                 588K                                     2.6
6                                 838K                                     3.7
4                                 1095K                                    4.9

100,000 documents, random 32 bit integers
Precision Step                    Index Size*                              Index Size Multiplier
0                                 1.17M                                    1
8                                 3.03M                                    2.6
6                                 3.86M                                    3.3
4                                 5.47M                                    4.7

    *Index Size reflects only the portion of the index related to indexing the field

                                                 Lucid Imagination, Inc.
Schema migration

   Use int, float, long, double, date for normal numerics
     <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
     omitNorms="true" positionIncrementGap="0"/>
   Use tint, tfloat, tlong, tdouble, tdate for faster range queries
     <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
     omitNorms="true" positionIncrementGap="0"/>
     Date faceting also uses range queries
   date, tdate, NOW can be used in function queries
     Date boosting: recip(ms(NOW,mydatefield),3.16e-11,1,1)




                                 Lucid Imagination, Inc.
Distributed Search
Distributed Search



      Split into multiple shards
        When single query latency too long
        • Super-linear speedup possible
        Optimal performance when free RAM > shard size
        • Minimum: RAM > (shard index size – stored_fields)
      Use replication for HA & increased capacity (queries/sec)




                                 Lucid Imagination, Inc.
Distributed Search & Sharding
http://...?shards=VIP1,VIP2
                              VIP1                                    VIP2

                 searchers
 2x4
                    replication


                       master handles updates

http://...?shards=VIP1,VIP2,VIP3,VIP4
                                                4x2
             VIP1                    VIP2                      VIP3          VIP4



 4x2

                                            Lucid Imagination, Inc.
Distributed Search: Use Cases



      2 shards x 4 replicas
        Fewer masters to manage (and fewer total servers)
        Less increased load on other replicas when one goes down (33%
        vs 100%)
        Less network bandwidth
      4 shards x 2 replicas
        Greater indexing bandwidth




                              Lucid Imagination, Inc.
Index Partitioning

  Partition by date
    Easy incremental scalability - add more servers over time as needed
    Easy to remove oldest data
    Enables increased replication factor for newest data
  Partition by document id
    Works well for updating
    Not easily resizable
  Partitioning to allow querying a subset of shards
    Increases system throughput, decreases network bandwidth
    Partition by userId for mailbox search
    Partition by region for geographic search

                                 Lucid Imagination, Inc.
1.5 Preview
1.5 Preview: SolrCloud
Baby steps toward simplifying cluster management
Integrates Zookeeper
   Central configuration (solrconfig.xml, etc)
   Tracks live nodes
   Tracks shards of collections
Removes need for external load balancers
   shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr

Can specify logical shard ids
   shards=NY_shard,NJ_shard

Clients don’t need to know shards:
http://localhost:8983/solr/collection1/select?distrib=true

                                           Lucid Imagination, Inc.
1.5 Preview: Spatial Search

 PointType
   Compound values: 38.89,-77.03
   Range queries and exact matches supported
 Distance Functions
   haversine
 Sorting by function query
 Still needed
   Ability to return sort values
   Distance filtering



                                   Lucid Imagination, Inc.
Q&A
Resources

   Apache Solr web site
     http://lucene.apache.org/solr
   LucidWorks: free Certified Distribution of Solr + Reference Guide
     http://www.lucidimagination.com/Downloads
   Search all of Lucene/Solr (wiki, mailing lists, issues, ref man, etc)
     http://search.lucidimagination.com
   Download slides (in ~4 hours) & re-play this talk (~48 hours)
      http://bit.ly/mastersolr
   Thanks for coming!


                                 Lucid Imagination, Inc.

Mais conteúdo relacionado

Mais procurados

Python Programming: Data Structure
Python Programming: Data StructurePython Programming: Data Structure
Python Programming: Data StructureChan Shik Lim
 
Advanced Python, Part 2
Advanced Python, Part 2Advanced Python, Part 2
Advanced Python, Part 2Zaar Hai
 
[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務台灣資料科學年會
 
Advanced python
Advanced pythonAdvanced python
Advanced pythonEU Edge
 
Creating your own Abstract Processor
Creating your own Abstract ProcessorCreating your own Abstract Processor
Creating your own Abstract ProcessorAodrulez
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習台灣資料科學年會
 
Tutorial4 Threads
Tutorial4  ThreadsTutorial4  Threads
Tutorial4 Threadstech2click
 
Zope component architechture
Zope component architechtureZope component architechture
Zope component architechtureAnatoly Bubenkov
 
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1Preston Lee
 
[系列活動] 手把手的深度學實務
[系列活動] 手把手的深度學實務[系列活動] 手把手的深度學實務
[系列活動] 手把手的深度學實務台灣資料科學年會
 
Lecture02 class -_templatev2
Lecture02 class -_templatev2Lecture02 class -_templatev2
Lecture02 class -_templatev2Hariz Mustafa
 
Network vs. Code Metrics to Predict Defects: A Replication Study
Network vs. Code Metrics  to Predict Defects: A Replication StudyNetwork vs. Code Metrics  to Predict Defects: A Replication Study
Network vs. Code Metrics to Predict Defects: A Replication StudyKim Herzig
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflowKeon Kim
 
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsJason Tsai
 
Advanced Java Practical File
Advanced Java Practical FileAdvanced Java Practical File
Advanced Java Practical FileSoumya Behera
 

Mais procurados (20)

Python Programming: Data Structure
Python Programming: Data StructurePython Programming: Data Structure
Python Programming: Data Structure
 
Python & Stuff
Python & StuffPython & Stuff
Python & Stuff
 
C# - What's next
C# - What's nextC# - What's next
C# - What's next
 
Advanced Python, Part 2
Advanced Python, Part 2Advanced Python, Part 2
Advanced Python, Part 2
 
[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務[系列活動] 手把手的深度學習實務
[系列活動] 手把手的深度學習實務
 
The walking 0xDEAD
The walking 0xDEADThe walking 0xDEAD
The walking 0xDEAD
 
Advance python
Advance pythonAdvance python
Advance python
 
分散式系統
分散式系統分散式系統
分散式系統
 
Advanced python
Advanced pythonAdvanced python
Advanced python
 
Creating your own Abstract Processor
Creating your own Abstract ProcessorCreating your own Abstract Processor
Creating your own Abstract Processor
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 
Tutorial4 Threads
Tutorial4  ThreadsTutorial4  Threads
Tutorial4 Threads
 
Zope component architechture
Zope component architechtureZope component architechture
Zope component architechture
 
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
Ruby Supercomputing - Using The GPU For Massive Performance Speedup v1.1
 
[系列活動] 手把手的深度學實務
[系列活動] 手把手的深度學實務[系列活動] 手把手的深度學實務
[系列活動] 手把手的深度學實務
 
Lecture02 class -_templatev2
Lecture02 class -_templatev2Lecture02 class -_templatev2
Lecture02 class -_templatev2
 
Network vs. Code Metrics to Predict Defects: A Replication Study
Network vs. Code Metrics  to Predict Defects: A Replication StudyNetwork vs. Code Metrics  to Predict Defects: A Replication Study
Network vs. Code Metrics to Predict Defects: A Replication Study
 
Attention mechanisms with tensorflow
Attention mechanisms with tensorflowAttention mechanisms with tensorflow
Attention mechanisms with tensorflow
 
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
 
Advanced Java Practical File
Advanced Java Practical FileAdvanced Java Practical File
Advanced Java Practical File
 

Destaque

SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessLucidworks (Archived)
 
Нестандартные методы интернет рекламы
Нестандартные методы интернет рекламыНестандартные методы интернет рекламы
Нестандартные методы интернет рекламыVladimir
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineLucidworks (Archived)
 
第4回「ブラウザー勉強会」オープニング トーク
第4回「ブラウザー勉強会」オープニング トーク第4回「ブラウザー勉強会」オープニング トーク
第4回「ブラウザー勉強会」オープニング トーク彰 村地
 
Oslb office365
Oslb office365Oslb office365
Oslb office365彰 村地
 
Updated: Preparing an investor presentation
Updated:  Preparing an investor presentationUpdated:  Preparing an investor presentation
Updated: Preparing an investor presentationMarty Kaszubowski
 
Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14Marty Kaszubowski
 
Updated: Getting Ready for Due-Diligence
Updated:  Getting Ready for Due-DiligenceUpdated:  Getting Ready for Due-Diligence
Updated: Getting Ready for Due-DiligenceMarty Kaszubowski
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrLucidworks (Archived)
 
Lucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucidworks (Archived)
 
Spanish bombss
Spanish bombssSpanish bombss
Spanish bombsstanica
 
Building SaaS Solutions for Online Media Using Apache Solr
Building SaaS Solutions for Online Media Using Apache SolrBuilding SaaS Solutions for Online Media Using Apache Solr
Building SaaS Solutions for Online Media Using Apache SolrLucidworks (Archived)
 
Tv ролики
Tv роликиTv ролики
Tv роликиtarodnova
 

Destaque (20)

SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for BusinessSFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
 
Нестандартные методы интернет рекламы
Нестандартные методы интернет рекламыНестандартные методы интернет рекламы
Нестандартные методы интернет рекламы
 
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DCBig Data Challenges, Presented by Wes Caldwell at SolrExchage DC
Big Data Challenges, Presented by Wes Caldwell at SolrExchage DC
 
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search EngineChicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine
 
Customized Navigation Using SOLR
Customized Navigation Using SOLRCustomized Navigation Using SOLR
Customized Navigation Using SOLR
 
第4回「ブラウザー勉強会」オープニング トーク
第4回「ブラウザー勉強会」オープニング トーク第4回「ブラウザー勉強会」オープニング トーク
第4回「ブラウザー勉強会」オープニング トーク
 
Oslb office365
Oslb office365Oslb office365
Oslb office365
 
Van gogh
Van goghVan gogh
Van gogh
 
Updated: Preparing an investor presentation
Updated:  Preparing an investor presentationUpdated:  Preparing an investor presentation
Updated: Preparing an investor presentation
 
Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14Presentation to Virginia Beach Vision, 1 27-14
Presentation to Virginia Beach Vision, 1 27-14
 
Updated: Getting Ready for Due-Diligence
Updated:  Getting Ready for Due-DiligenceUpdated:  Getting Ready for Due-Diligence
Updated: Getting Ready for Due-Diligence
 
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache SolrMinneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
Minneapolis Solr Meetup - May 28, 2014: eCommerce Search with Apache Solr
 
Lucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lrLucene rev preso bialecki solr crawlers-lr
Lucene rev preso bialecki solr crawlers-lr
 
Lucene rev preso cisco gannu
Lucene rev preso cisco gannuLucene rev preso cisco gannu
Lucene rev preso cisco gannu
 
Spanish bombss
Spanish bombssSpanish bombss
Spanish bombss
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValuesColumn Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
 
Building SaaS Solutions for Online Media Using Apache Solr
Building SaaS Solutions for Online Media Using Apache SolrBuilding SaaS Solutions for Online Media Using Apache Solr
Building SaaS Solutions for Online Media Using Apache Solr
 
Tv ролики
Tv роликиTv ролики
Tv ролики
 
Search Analytics What? Why? How?
Search Analytics What? Why? How?Search Analytics What? Why? How?
Search Analytics What? Why? How?
 
Com camp2014
Com camp2014Com camp2014
Com camp2014
 

Semelhante a Learn How to Master Solr1 4

Beirut Java User Group JVM presentation
Beirut Java User Group JVM presentationBeirut Java User Group JVM presentation
Beirut Java User Group JVM presentationMahmoud Anouti
 
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBBuilding a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBCody Ray
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory AnalysisMoabi.com
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataJetlore
 
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012Big Data Spain
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012Amazon Web Services
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...DataStax Academy
 
Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovationsLucidworks (Archived)
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305mjfrankli
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafkaNitin Kumar
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileDatabricks
 
Sedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsSedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsIvan Shcheklein
 
housing price prediction ppt in artificial
housing price prediction ppt in artificialhousing price prediction ppt in artificial
housing price prediction ppt in artificialKrishPatel802536
 
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...Databricks
 

Semelhante a Learn How to Master Solr1 4 (20)

Beirut Java User Group JVM presentation
Beirut Java User Group JVM presentationBeirut Java User Group JVM presentation
Beirut Java User Group JVM presentation
 
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDBBuilding a Scalable Distributed Stats Infrastructure with Storm and KairosDB
Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis[Ruxcon 2011] Post Memory Corruption Memory Analysis
[Ruxcon 2011] Post Memory Corruption Memory Analysis
 
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive DataSpark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
Spark and Shark: Lightning-Fast Analytics over Hadoop and Hive Data
 
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012
Memory efficient applications. FRANCESC ALTED at Big Data Spain 2012
 
Sparklyr
SparklyrSparklyr
Sparklyr
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
Beyond the Query: A Cassandra + Solr + Spark Love Triangle Using Datastax Ent...
 
Seeley yonik solr performance key innovations
Seeley yonik   solr performance key innovationsSeeley yonik   solr performance key innovations
Seeley yonik solr performance key innovations
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
 
Deep learning with kafka
Deep learning with kafkaDeep learning with kafka
Deep learning with kafka
 
Failure Of DEP And ASLR
Failure Of DEP And ASLRFailure Of DEP And ASLR
Failure Of DEP And ASLR
 
Advertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-MobileAdvertising Fraud Detection at Scale at T-Mobile
Advertising Fraud Detection at Scale at T-Mobile
 
Sedna XML Database: Executor Internals
Sedna XML Database: Executor InternalsSedna XML Database: Executor Internals
Sedna XML Database: Executor Internals
 
Spark and shark
Spark and sharkSpark and shark
Spark and shark
 
housing price prediction ppt in artificial
housing price prediction ppt in artificialhousing price prediction ppt in artificial
housing price prediction ppt in artificial
 
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
Analyzing the Performance Effects of Meltdown + Spectre on Apache Spark Workl...
 

Mais de Lucidworks (Archived)

Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Lucidworks (Archived)
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and SolrLucidworks (Archived)
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceLucidworks (Archived)
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchLucidworks (Archived)
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchLucidworks (Archived)
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Lucidworks (Archived)
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...Lucidworks (Archived)
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Lucidworks (Archived)
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCLucidworks (Archived)
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCLucidworks (Archived)
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCLucidworks (Archived)
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCLucidworks (Archived)
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKLucidworks (Archived)
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarLucidworks (Archived)
 
Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks
Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks
Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks Lucidworks (Archived)
 
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseImplementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseLucidworks (Archived)
 

Mais de Lucidworks (Archived) (20)

Integrating Hadoop & Solr
Integrating Hadoop & SolrIntegrating Hadoop & Solr
Integrating Hadoop & Solr
 
The Data-Driven Paradigm
The Data-Driven ParadigmThe Data-Driven Paradigm
The Data-Driven Paradigm
 
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
Downtown SF Lucene/Solr Meetup - September 17: Thoth: Real-time Solr Monitori...
 
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
SFBay Area Solr Meetup - July 15th: Integrating Hadoop and Solr
 
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr PerformanceSFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
SFBay Area Solr Meetup - June 18th: Benchmarking Solr Performance
 
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with SearchChicago Solr Meetup - June 10th: Exploring Hadoop with Search
Chicago Solr Meetup - June 10th: Exploring Hadoop with Search
 
What's new in solr june 2014
What's new in solr june 2014What's new in solr june 2014
What's new in solr june 2014
 
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com SearchMinneapolis Solr Meetup - May 28, 2014: Target.com Search
Minneapolis Solr Meetup - May 28, 2014: Target.com Search
 
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
Exploration of multidimensional biomedical data in pub chem, Presented by Lia...
 
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...Unstructured   Or: How I Learned to Stop Worrying and Love the xml, Presented...
Unstructured Or: How I Learned to Stop Worrying and Love the xml, Presented...
 
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
Building a Lightweight Discovery Interface for Chinese Patents, Presented by ...
 
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DCWhat's New  in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
What's New in Lucene/Solr Presented by Grant Ingersoll at SolrExchage DC
 
Solr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DCSolr At AOL, Presented by Sean Timm at SolrExchage DC
Solr At AOL, Presented by Sean Timm at SolrExchage DC
 
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DCIntro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
Intro to Solr Cloud, Presented by Tim Potter at SolrExchage DC
 
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DCTest Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
Test Driven Relevancy, Presented by Doug Turnbull at SolrExchage DC
 
Building a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLKBuilding a data driven search application with LucidWorks SiLK
Building a data driven search application with LucidWorks SiLK
 
Introducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinarIntroducing LucidWorks App for Splunk Enterprise webinar
Introducing LucidWorks App for Splunk Enterprise webinar
 
Solr4 nosql search_server_2013
Solr4 nosql search_server_2013Solr4 nosql search_server_2013
Solr4 nosql search_server_2013
 
Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks
Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks
Lucene/Solr Revolution 2013: Paul Doscher Opening Remarks
 
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks EnterpriseImplementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
Implementing Click-through Relevance Ranking in Solr and LucidWorks Enterprise
 

Último

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Último (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Learn How to Master Solr1 4

  • 2. Agenda Faceting Trie Fields (numeric ranges) Distributed Search 1.5 Preview Q&A See http://bit.ly/mastersolr for Slides for download after the presentation On-demand viewing of the webinar in ~48 hours Lucid Imagination, Inc.
  • 3. My background Creator of Solr, the Lucene Search Server Co-founder of Lucid Imagination Expertise: Distributed Search systems and performance Lucene/Solr committer, a member of the Lucene PMC, member of the Apache Software Foundation Work: CNET Networks, BEA, Telcordia, among others M.S. in Computer Science, Stanford Lucid Imagination, Inc.
  • 4. Getting the most from this talk Assuming you’ve been completed the Solr tutorial Assuming you’re familiar faceting and other high-level Solr/Lucene concepts I don’t expect you to have deployed Solr in production, but it will provide helpful context Lucid Imagination, Inc.
  • 6. Existing single-valued faceting algorithm Documents matching the base query Lucene FieldCache Entry “Juggernaut” (StringIndex) for the “hero” field 0 order: for each q=Juggernaut 2 lookup doc, an index into lookup: the &facet=true 7 the lookup array string values &facet.field=hero 5 (null) 3 batman accumulator 5 flash 0 1 spiderman 1 4 superman 0 increment 5 wolverine 0 2 0 1 2 Lucid Imagination, Inc.
  • 7. Existing single-valued faceting algorithm facet.method=fc for each doc in base set: ord = FieldCache.order[doc] accumulator[ord]++ O(docs_matching_q) Not used for boolean Lucid Imagination, Inc.
  • 8. Multi-valued faceting: enum method (1 of 2) facet.method=enum For each term in field: Retrieve filter Solr filterCache (in memory) Calculate intersection size hero:batman hero:flash Lucene Inverted Docs matching Index (on disk) base query 1 0 intersection 0 3 1 batman 1 3 5 8 count 1 5 5 flash 0 1 5 5 8 spiderman 2 4 7 9 Priority queue batman=2 superman 0 6 9 wolverine 1 2 7 8 Lucid Imagination, Inc.
  • 9. Multi-valued faceting: enum method (2 of 2) O(n_terms_in_field) Short circuits based on term.df filterCache entries int[ndocs] or BitSet(maxDoc) filterCache concurrency + efficiency upgrades in 1.4 Size filterCache appropriately Prevent filterCache use for small terms with facet.enum.cache.minDf Lucid Imagination, Inc.
  • 10. Multi-valued faceting: new UnInvertedField facet.method=fc Like single-valued FieldCache method, but with multi-valued FieldCache Good for many unique terms, relatively few values per doc Best case: 50x faster, 5x smaller (100K unique values, 1-5 per doc) O(n_docs), but optimization to count the inverse when n>maxDoc/2 Memory efficient Terms in a document are delta coded variable width term numbers Term number list for document packed in an int or in a shared byte[] Hybrid approach: “big terms” that match >5% of index use filterCache instead Only 1/128th of string values in memory Lucid Imagination, Inc.
  • 11. Faceting: fieldValueCache Implicit cache with UnInvertedField entries Not autowarmed – use static warming request http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc) Lucid Imagination, Inc.
  • 12. Faceting: fieldValueCache Implicit cache with UnInvertedField entries Not autowarmed – use static warming request http://localhost:8983/solr/admin/stats.jsp (mem size, time to create, etc) item_cat:{field=cat,memSize=5376,tindexSize=52,time=2,phase1=2, nTerms=16,bigTerms=10,termInstances=6,uses=44} Lucid Imagination, Inc.
  • 13. Migrating from 1.3 to 1.4 faceting filterCache Lower size (need room for normal fq filters and “big terms”) Lower or eliminate autowarm count 1.3 enum method can sometimes be better f.<fieldname>.facet.method=enum Field has many values per document and many unique values Huge index without faceting time constraints Lucid Imagination, Inc.
  • 14. Multi-select faceting http://search.lucidimagination.com Very generic support Reuses localParams syntax {!name=val} Ability to tag filters Ability to exclude certain filters when faceting, by tag q=index replication&facet=true &fq={!tag=proj}project:(lucene OR solr) &facet.field={!ex=proj}project &facet.field={!ex=src}source Lucid Imagination, Inc. 14
  • 16. New Trie* fields Numeric,Date fields index at multiple precisions to speed up range queries Base10 Example: 175 is indexed as hundreds:1 tens:17 ones:175 TrieRangeQuery:[154 TO 183] is executed as tens:[16 TO 17] OR ones:[154 TO 159] OR ones:[180 TO 183] Best case: 40x speedup of range queries Configurable precisionStep per field (expressed in bits) precisionStep=8 means index first 8, then first 16, first 24, and 32 bits precisionStep=0 means index normally Extra bonus: more memory efficient FieldCache entries Lucid Imagination, Inc.
  • 17. Trie* Index Overhead 100,000 documents, random 32 bit integers from 0-1000 Precision Step Index Size* Index Size Multiplier 0 223K 1 8 588K 2.6 6 838K 3.7 4 1095K 4.9 100,000 documents, random 32 bit integers Precision Step Index Size* Index Size Multiplier 0 1.17M 1 8 3.03M 2.6 6 3.86M 3.3 4 5.47M 4.7 *Index Size reflects only the portion of the index related to indexing the field Lucid Imagination, Inc.
  • 18. Schema migration Use int, float, long, double, date for normal numerics <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> Use tint, tfloat, tlong, tdouble, tdate for faster range queries <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> Date faceting also uses range queries date, tdate, NOW can be used in function queries Date boosting: recip(ms(NOW,mydatefield),3.16e-11,1,1) Lucid Imagination, Inc.
  • 20. Distributed Search Split into multiple shards When single query latency too long • Super-linear speedup possible Optimal performance when free RAM > shard size • Minimum: RAM > (shard index size – stored_fields) Use replication for HA & increased capacity (queries/sec) Lucid Imagination, Inc.
  • 21. Distributed Search & Sharding http://...?shards=VIP1,VIP2 VIP1 VIP2 searchers 2x4 replication master handles updates http://...?shards=VIP1,VIP2,VIP3,VIP4 4x2 VIP1 VIP2 VIP3 VIP4 4x2 Lucid Imagination, Inc.
  • 22. Distributed Search: Use Cases 2 shards x 4 replicas Fewer masters to manage (and fewer total servers) Less increased load on other replicas when one goes down (33% vs 100%) Less network bandwidth 4 shards x 2 replicas Greater indexing bandwidth Lucid Imagination, Inc.
  • 23. Index Partitioning Partition by date Easy incremental scalability - add more servers over time as needed Easy to remove oldest data Enables increased replication factor for newest data Partition by document id Works well for updating Not easily resizable Partitioning to allow querying a subset of shards Increases system throughput, decreases network bandwidth Partition by userId for mailbox search Partition by region for geographic search Lucid Imagination, Inc.
  • 25. 1.5 Preview: SolrCloud Baby steps toward simplifying cluster management Integrates Zookeeper Central configuration (solrconfig.xml, etc) Tracks live nodes Tracks shards of collections Removes need for external load balancers shards=localhost:8983/solr|localhost:8900/solr,localhost:7574/solr|localhost:7500/solr Can specify logical shard ids shards=NY_shard,NJ_shard Clients don’t need to know shards: http://localhost:8983/solr/collection1/select?distrib=true Lucid Imagination, Inc.
  • 26. 1.5 Preview: Spatial Search PointType Compound values: 38.89,-77.03 Range queries and exact matches supported Distance Functions haversine Sorting by function query Still needed Ability to return sort values Distance filtering Lucid Imagination, Inc.
  • 27. Q&A
  • 28. Resources Apache Solr web site http://lucene.apache.org/solr LucidWorks: free Certified Distribution of Solr + Reference Guide http://www.lucidimagination.com/Downloads Search all of Lucene/Solr (wiki, mailing lists, issues, ref man, etc) http://search.lucidimagination.com Download slides (in ~4 hours) & re-play this talk (~48 hours) http://bit.ly/mastersolr Thanks for coming! Lucid Imagination, Inc.