About Solr
                         People as A Search Problem




Thursday, May 26, 2011
About Me


                    • Building websites since 1996, Java since
                      1997
                    • Prior web search experience
                    • Building and scaling eHarmony
                      products since 2002



What is Jazzed


                    • Subscription Based
                      Dating Site
                    • Incubated by
                      eHarmony




What is Jazzed


                     • Create a profile
                     • Search for others
                     • View their photos
                     • Privately
                       Communicate


How is it different?


                    • Covers broader range of relationships
                    • Easy to get started
                    • Real profiles screened by machine and
                      humans
                    • Fast, effective search oriented tools



Jazzed Stats

                    • Started Fall 2009
                    • Beta Summer 2010
                    • Launched October 2010
                    • 100,000s of Profiles
                    • 1,000s of Searches Daily


Jazzed Architecture



                    • Event-driven SOA
                    • REST, JSON, EIP, Not-only-SQL
                    • Technology incubation




Tech Stack


                    • Java 6, Spring 3, Jersey 1.1, JMS
                      (AMQP)
                    • RHEL 4, Oracle 11g, Voldemort 0.81,
                      Solr 1.4.1, NFS




Not Covered


                    • Distributed Search
                    • Caching Strategies
                    • Data Import
                    • Analyzers/Tokenizers



Why Lucene?

                    • Proven Solid IR library
                    • Prefer Open Source Solutions
                    • Not Only SQL
                    • Flexible Ranking
                    • Pluggable


Why Solr


                    • Performant, Extensible, RESTful Service
                    • Configuration, Schema, Multicores
                    • Admin Interface
                    • Replication, Backups, Monitoring



Open Source



                    • Strengthens Engineering Team
                    • Be a part of a great community
                    • Not Brochure-ware




Not Only SQL



                    • One solution does not fit all
                    • Prefer availability over consistency
                    • Horizontal Scaling over Vertical




Flexible Ranking

                    • Query Strategies
                         • Boolean Algebra
                         • Vector Space Analysis
                         • Hybrids
                    • Extensive Function Support
                    • Index and Query Boosting


...Oh My!


                    • Standard Plugins - Geospatial*,
                      Faceting, Spelling, MoreLikeThis
                    • Full Text with Highlighted Results
                    • Client agnostic



Inevitable Question

                    • “Does it scale?”
                    • Solr POC Benchmark
                         • 10 Million profiles
                         • >200 queries/sec, under 100 ms at the 90th percentile
                         • Default tuning until 5 million profiles


Profile Service



                    • RESTful Hybrid Data Service
                    • Public, Private, Attributes
                    • Event Producer




Profiles

                    • Mostly structured
                    • Categories - Eye Color, Desired
                      Ethnicity
                    • Dates - Birthdate
                    • Numbers - Coordinates, Age Range
                    • Text - Name, Headline


Inverting People

                    • Stored as an inverted index
                    • Index randomly accessed by term

                    Term          Documents
                    MALE          1, 3, 5, 7, 9
                    FEMALE        2, 4, 6, 8, 10
                    HAIR_RED      8
                    HAIR_BLOND    1, 2, 5, 6
                    EYE_BLUE      1, 2, 3, 10
                    EYE_BROWN     4, 5, 6, 7, 8, 9
                    fun           1, 3, 7, 9
                    funny         2, 4, 6, 10
                    beach         1, 2, 3, 4, 5, 6, 7, 8
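
The term-to-documents table on this slide can be sketched in a few lines of Python. This is only an illustration of the inverted-index idea, not how Lucene stores its index; the `profiles` data is a hypothetical subset chosen to agree with the table, and `build_inverted_index` and `lookup_all` are names made up for this example.

```python
from collections import defaultdict

# Hypothetical toy profiles (a subset consistent with the slide's table).
profiles = {
    1: ["MALE", "HAIR_BLOND", "EYE_BLUE"],
    2: ["FEMALE", "HAIR_BLOND", "EYE_BLUE"],
    3: ["MALE", "EYE_BLUE"],
    8: ["FEMALE", "HAIR_RED", "EYE_BROWN"],
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def lookup_all(index, terms):
    """Return ids of documents containing every term (an AND query)."""
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_inverted_index(profiles)
print(index["EYE_BLUE"])                          # [1, 2, 3]
print(lookup_all(index, ["FEMALE", "EYE_BLUE"]))  # [2]
```

Looking up a term is a single dictionary access, which is the "random access by term" the slide refers to; combining terms is a set intersection over the posting lists.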


Schema Design


                    • Single “Table”
                    • One-to-many = multi-value fields
                    • Individual vs Composite Fields
                         • copyField and have both!



Field considerations


                    • Stored or not
                    • Indexed or not
                    • Multivalued - desires fields
                    • Type



Solr Types Used
                    • tdate, tint, tfloat* - birthdate, loginAt
                      (the ‘t’ is for Trie)
                    • text - all text
                    • string - id, non indexed text
                    • random - good for random sorts
                    • enum - for all enumerations


Data Duplication


                    • By function - numberPhotos &
                      hasPhotos
                    • By relationship - hiddenBy & hidden
                    • By analysis - name & text



Saving Profiles


                    • Updating is an in-memory operation
                    • No partial updates
                    • Commit means flush index changes
                    • Autocommit on maxDocs, maxTime or
                      both
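
The commit behaviour listed above can be modelled with a toy buffer. This is a sketch of the concept only, under the assumption that thresholds are checked when a document is added; Solr itself uses its own timers and counters. The class `AutoCommitBuffer` and its `max_docs` / `max_seconds` parameters are hypothetical stand-ins for the maxDocs and maxTime settings.

```python
class AutoCommitBuffer:
    """Toy model of buffered updates with maxDocs / maxTime auto-commit."""

    def __init__(self, max_docs=None, max_seconds=None):
        self.max_docs = max_docs
        self.max_seconds = max_seconds
        self.pending = {}             # uncommitted documents, keyed by id
        self.committed = {}           # documents visible to searches
        self.first_pending_at = None  # time of the oldest uncommitted add

    def add(self, doc, now):
        # No partial updates: the whole document replaces any earlier version.
        self.pending[doc["id"]] = doc
        if self.first_pending_at is None:
            self.first_pending_at = now
        if self._should_commit(now):
            self.commit()

    def _should_commit(self, now):
        too_many = self.max_docs is not None and len(self.pending) >= self.max_docs
        too_old = (self.max_seconds is not None
                   and now - self.first_pending_at >= self.max_seconds)
        return too_many or too_old

    def commit(self):
        # Flush pending changes so that searches can see them.
        self.committed.update(self.pending)
        self.pending.clear()
        self.first_pending_at = None

    def search_ids(self):
        return sorted(self.committed)


buf = AutoCommitBuffer(max_docs=2, max_seconds=60)
buf.add({"id": 1, "headline": "fun"}, now=0)
print(buf.search_ids())  # [] (nothing committed yet)
buf.add({"id": 2, "headline": "beach"}, now=5)
print(buf.search_ids())  # [1, 2] (max_docs reached, so a commit ran)
```

The point of the sketch is the visibility gap: a saved profile is not searchable until a commit happens, which is one reason a separate store is useful for data that cannot be stale.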



Why Also Voldemort


                    • Private profiles cannot be stale
                    • Many fields not searchable or viewable
                      by others
                    • Isolate queries from fetch by id



Querying


                    • Superset of Lucene
                    • Efficient Range Queries
                    • Multiple Query Handlers
                         • Dismax, Boost, Geo



Recall vs Precision



                    • Focus on recall when corpus is small
                    • Precision once it is at critical mass
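
For reference, the two measures can be computed as follows. This is a minimal sketch using the standard definitions; the function name `precision_recall` and the id lists are made up for illustration.

```python
def precision_recall(returned_ids, relevant_ids):
    """Precision: share of returned results that are relevant.
    Recall: share of relevant results that were returned."""
    returned, relevant = set(returned_ids), set(relevant_ids)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = [2, 8]
# A broad query finds everything relevant but also returns noise.
print(precision_recall([2, 4, 6, 8, 10], relevant))  # (0.4, 1.0)
# A narrow query returns only good matches but misses one.
print(precision_recall([2], relevant))               # (1.0, 0.5)
```

With a small corpus the broad query is the safer choice, because a narrow one can easily return nothing at all.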




Boolean Queries


                    • Default operator set to AND
                    • +gender:FEMALE +seeking:MALE
                      +eyeColor:EYE_BLUE +hairColor:
                      (HAIR_RED OR HAIR_BLOND)
                    • Sort order is important



Hybrid Queries


                    • Default operator set to OR
                    • +gender:FEMALE +seeking:MALE
                      eyeColor:EYE_BLUE hairColor:
                      (HAIR_RED OR HAIR_BLOND)
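
The difference between the required `+` clauses and the optional ones can be simulated over a few toy profiles. This is a sketch of the matching semantics only: it counts matched optional clauses instead of computing a real relevance score, and the profile data, the `seeking` values, and the helper names are hypothetical.

```python
# Hypothetical profiles; ids 1, 2, 4 and 8 follow the earlier index table.
profiles = {
    1: {"gender": "MALE", "seeking": "FEMALE",
        "eyeColor": "EYE_BLUE", "hairColor": "HAIR_BLOND"},
    2: {"gender": "FEMALE", "seeking": "MALE",
        "eyeColor": "EYE_BLUE", "hairColor": "HAIR_BLOND"},
    4: {"gender": "FEMALE", "seeking": "MALE", "eyeColor": "EYE_BROWN"},
    8: {"gender": "FEMALE", "seeking": "MALE",
        "eyeColor": "EYE_BROWN", "hairColor": "HAIR_RED"},
}

required = {"gender": {"FEMALE"}, "seeking": {"MALE"}}  # the '+' clauses
optional = {"eyeColor": {"EYE_BLUE"},
            "hairColor": {"HAIR_RED", "HAIR_BLOND"}}    # unprefixed clauses


def matches(profile, clauses):
    """Count how many clauses the profile satisfies."""
    return sum(1 for field, values in clauses.items()
               if profile.get(field) in values)


def hybrid_search(docs):
    hits = []
    for doc_id, profile in docs.items():
        if matches(profile, required) < len(required):
            continue  # a required clause failed, so the profile is excluded
        hits.append((matches(profile, optional), doc_id))
    # Most optional matches first; ties broken by id for determinism.
    return [doc_id for count, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


print(hybrid_search(profiles))  # [2, 8, 4]
```

Profile 4 matches no optional clause but is still returned, only ranked last. Under the all-AND Boolean query it would be dropped entirely, which is the recall trade-off between the two strategies.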




Why you’re lucky if you like redheads

                    • Inverse Document Frequency (IDF)
                    • Rarer is favored over more common
                    • More fields matched = higher ranking

                    Ranking:
                      1. Blue eyed, redheads
                      2. Blue eyed, blonds
                      3. Redheads
                      4. Blonds
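
The ordering on this slide can be reproduced with a small calculation. The sketch below uses the IDF formula from Lucene's classic TF-IDF similarity, `1 + ln(numDocs / (docFreq + 1))`, with document frequencies taken from the Inverting People table (10 profiles; HAIR_RED in 1, HAIR_BLOND in 4, EYE_BLUE in 4). Summing plain IDF values is a deliberate simplification: real Lucene scoring also involves term frequency, norms and other factors.

```python
import math

NUM_DOCS = 10  # profiles in the example index
DOC_FREQ = {"HAIR_RED": 1, "HAIR_BLOND": 4, "EYE_BLUE": 4}


def idf(term):
    """Classic Lucene-style inverse document frequency."""
    return 1.0 + math.log(NUM_DOCS / (DOC_FREQ[term] + 1))


def score(matched_terms):
    """Simplified score: sum of the IDF of each matched term."""
    return sum(idf(term) for term in matched_terms)


candidates = {
    "blue-eyed redhead": ["EYE_BLUE", "HAIR_RED"],
    "blue-eyed blond": ["EYE_BLUE", "HAIR_BLOND"],
    "redhead": ["HAIR_RED"],
    "blond": ["HAIR_BLOND"],
}

ranking = sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)
print(ranking)
# ['blue-eyed redhead', 'blue-eyed blond', 'redhead', 'blond']
```

HAIR_RED is the rarest term, so it carries the largest weight, and matching two fields beats matching one.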

Boosting



                    • Query time by importance
                         • eyeColor:EYE_BLUE^2
                           hairColor:HAIR_BLOND




Filter Fields

                    • Useful for roles and other lists
                    • -hidden:(2 4 6)
                    • -hiddenBy:1

                    id   hidden
                    1    2, 4, 6
                    2    1

                    id   hiddenBy
                    1    2
                    2    1
                    4    1
                    6    1
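
The two tables on this slide hold the same relationship from opposite directions. The sketch below shows how a `hiddenBy` list could be derived from the `hidden` lists and then used to filter results for one searcher, which is the effect of `-hiddenBy:1`. The helper names `invert` and `visible_to` are hypothetical, and the filtering is done in plain Python, not by Solr.

```python
from collections import defaultdict

# hidden[user] = profiles that user has hidden (values from the slide).
hidden = {1: [2, 4, 6], 2: [1]}


def invert(hidden_lists):
    """Build hiddenBy[profile] = users who have hidden that profile."""
    hidden_by = defaultdict(list)
    for owner, targets in hidden_lists.items():
        for target in targets:
            hidden_by[target].append(owner)
    return dict(hidden_by)


hidden_by = invert(hidden)


def visible_to(searcher, candidate_ids):
    """Drop every candidate that the searcher has hidden."""
    return [c for c in candidate_ids if searcher not in hidden_by.get(c, [])]


print(hidden_by[2], hidden_by[1])      # [1] [2]
print(visible_to(1, [2, 3, 4, 5, 6]))  # [3, 5]
```

Storing the inverted `hiddenBy` field on each profile keeps the query clause to a single term, whatever the length of the searcher's hidden list.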

Date Math



                    • Simplifies query preprocessing
                    • +birthDate:[NOW/DAY+1DAY-36YEAR
                      TO NOW/DAY-25YEAR]

                          Between 25 and 35 years old
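
Without date math, the client would have to compute the birthdate bounds itself before every query. The sketch below does that arithmetic in Python for the same "between 25 and 35" range, as an illustration of what the expression saves. The helper names are hypothetical, and the fixed `today` value is only there to make the example reproducible.

```python
from datetime import date, timedelta


def minus_years(d, years):
    """Subtract whole years, falling back to Feb 28 for a Feb 29 start."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def birthdate_range(today, min_age, max_age):
    """Mirror [NOW/DAY+1DAY-(max_age+1)YEAR TO NOW/DAY-(min_age)YEAR]."""
    earliest = minus_years(today + timedelta(days=1), max_age + 1)
    latest = minus_years(today, min_age)
    return earliest, latest


def age_on(birthdate, today):
    """Age in completed years on the given day."""
    years = today.year - birthdate.year
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


today = date(2011, 5, 26)
earliest, latest = birthdate_range(today, 25, 35)
print(earliest, latest)                                # 1975-05-27 1986-05-26
print(age_on(earliest, today), age_on(latest, today))  # 35 25
```

The lower bound uses 36 years plus one day because someone stays 35 until the day before their 36th birthday.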



Distance Searching




                    • lat, lon, distance
                    • SolrLocal by Patrick O’Leary
                    • Additional overhead ~90ms per query
                    • Superseded in Solr 3.1
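
A search by lat, lon and distance comes down to a great-circle distance check. The sketch below uses the haversine formula as a generic illustration; it is not the implementation used by SolrLocal or by Solr 3.1, and the coordinates and helper names are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, locations, max_km):
    """Return ids of locations within max_km of center, nearest first."""
    lat, lon = center
    hits = []
    for profile_id, (plat, plon) in locations.items():
        distance = haversine_km(lat, lon, plat, plon)
        if distance <= max_km:
            hits.append((distance, profile_id))
    return [profile_id for distance, profile_id in sorted(hits)]


# Hypothetical profile coordinates (latitude, longitude).
locations = {1: (34.10, -118.30), 2: (37.77, -122.42)}
print(within_radius((34.05, -118.24), locations, max_km=50))  # [1]
```

Computing this for every candidate is the kind of per-query cost the slide's overhead figure refers to, which is why a coarse bounding-box filter is often applied first.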



Testing Queries



                    • Log queries and ids returned
                    • Version your search strategies
                    • Improve one thing at a time
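
One way to act on these points is to log each query with the strategy version that produced it and the ids returned, then compare versions on the same query. The sketch below is an assumption about how such a log could look; the JSON layout, the field names and the `overlap` measure (Jaccard similarity of the two id sets) are all hypothetical choices.

```python
import json


def log_entry(strategy_version, query, ids):
    """Serialize one search as a JSON log line."""
    return json.dumps({"strategy": strategy_version, "q": query, "ids": ids},
                      sort_keys=True)


def overlap(ids_a, ids_b):
    """Jaccard similarity of two result sets (1.0 means identical sets)."""
    a, b = set(ids_a), set(ids_b)
    return len(a & b) / len(a | b) if a | b else 1.0


old = json.loads(log_entry("v1", "+gender:FEMALE eyeColor:EYE_BLUE", [2, 10, 4]))
new = json.loads(log_entry("v2", "+gender:FEMALE eyeColor:EYE_BLUE^2", [2, 10, 8]))
print(overlap(old["ids"], new["ids"]))  # 0.5
```

Changing one thing per version keeps a drop in overlap attributable to a single cause.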




Geo Service


                    • Read-mostly service
                    • Fields - Postal Code, Country,
                      State, Cities, Lat, Lon
                    • Usage - Registration
                      Validation, City Selection



Operations



                    • Servlet container and filesystem
                    • Jetty 6, 64-bit Java 6 JVM
                    • 8G Heap -XX:+UseCompressedOops




Operations


                    • Active/Passive
                    • Layer 7 Load balancing
                    • Nightly snapshots
                    • Eventually SolrCloud



Multicore


                    • Run multiple schemas on the same Solr instance
                    • Hot swappable for backwards
                      compatible changes
                    • private / public profiles



Security


                     • No security provided
                     • At minimum secure your UpdateHandler

                           <delete>
                             <query>*:*</query>
                           </delete>


                     • Separate Cores



Future

                    • Solr 3.1
                    • Mutual Matching
                    • Faceting / Guided Search
                    • Incorporating spelling
                    • Hierarchies, categories, better ranking
                      models


Faceting

                    • Returns counts
                      with query
                      results
                    • Efficient
                    • Guides the user
                      toward precision
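
The idea of returning counts alongside results can be shown with a small tally. This sketch counts field values over an in-memory result list; it illustrates what facet counts are, not how Solr computes them, and the `results` data and `facet_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical search results (values consistent with the earlier index table).
results = [
    {"id": 2, "eyeColor": "EYE_BLUE", "hairColor": "HAIR_BLOND"},
    {"id": 8, "eyeColor": "EYE_BROWN", "hairColor": "HAIR_RED"},
    {"id": 6, "eyeColor": "EYE_BROWN", "hairColor": "HAIR_BLOND"},
]


def facet_counts(docs, field):
    """Count how many results carry each value of the given field."""
    return Counter(doc[field] for doc in docs if field in doc)


print(facet_counts(results, "hairColor").most_common())
# [('HAIR_BLOND', 2), ('HAIR_RED', 1)]
```

Showing these counts next to each filter tells the user how many results remain before they narrow the search, which is how faceting guides them toward precision.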


Thank you
                         jtuberville@eharmony.com
                            Twitter: @jtuberville




Jazeed about Solr - People as A Search Problem

  • 1. About Solr People as A Search Problem Thursday, May 26, 2011
  • 2. About Me • Building websites since 1996, Java since 1997 • Prior web search experience • Building and scaling eHarmony products since 2002
  • 3. What is Jazzed • Subscription Based Dating Site • Incubated by eHarmony
  • 4. What is Jazzed • Create a profile • Search for others • View their photos • Privately Communicate
  • 8. How is it different? • Covers broader range of relationships • Easy to get started • Real profiles screened by machine and humans • Fast, effective search-oriented tools
  • 9. Jazzed Stats • Started Fall 2009 • Beta Summer 2010 • Launched October 2010 • 100,000s of Profiles • 1,000s of Searches Daily
  • 10. Jazzed Architecture • Event-driven SOA • REST, JSON, EIP, Not-only-SQL • Technology incubation
  • 11. Tech Stack • Java 6, Spring 3, Jersey 1.1, JMS (AMQP) • RHEL 4, Oracle 11g, Voldemort 0.81, Solr 1.4.1, NFS
  • 14. Not Covered • Distributed Search • Caching Strategies • Data Import • Analyzers/Tokenizers
  • 15. Why Lucene? • Proven Solid IR library • Prefer Open Source Solutions • Not Only SQL • Flexible Ranking • Pluggable
  • 16. Why Solr • Performant, Extensible, RESTful Service • Configuration, Schema, Multicores • Admin Interface • Replication, Backups, Monitoring
  • 17. Open Source • Strengthens Engineering Team • Be a part of a great community • Not Brochure-ware
  • 18. Not Only SQL • One solution does not fit all • Prefer availability over consistency • Horizontal Scaling over Vertical
  • 19. Flexible Ranking • Query Strategies • Boolean Algebra • Vector Space Analysis • Hybrids • Extensive Function Support • Index and Query Boosting
  • 20. ...Oh My! • Standard Plugins - Geospatial*, Faceting, Spelling, MoreLikeThis • Full Text with Highlighted Results • Client agnostic
  • 21. Inevitable Question • “Does it scale?” • Solr POC Benchmark • 10 million profiles • >200 queries/sec, under 100 ms at the 90th percentile • Default tuning until 5 million profiles
  • 22. Profile Service • RESTful Hybrid Data Service • Public, Private, Attributes • Event Producer
  • 23. Profiles • Mostly structured • Categories - Eye Color, Desired Ethnicity • Dates - Birthdate • Numbers - Coordinates, Age Range • Text - Name, Headline
  • 24. Inverting People • Stored as an inverted index • Index is randomly accessed by term
        Term          Documents
        MALE          1, 3, 5, 7, 9
        FEMALE        2, 4, 6, 8, 10
        HAIR_RED      8
        HAIR_BLOND    1, 2, 5, 6
        EYE_BLUE      1, 2, 3, 10
        EYE_BROWN     4, 5, 6, 7, 8, 9
        fun           1, 3, 7, 9
        funny         2, 4, 6, 10
        beach         1, 2, 3, 4, 5, 6, 7, 8
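
    The inverted index on slide 24 can be illustrated with a minimal sketch. It is not how Lucene stores postings on disk; the `profiles` dictionary and both function names are hypothetical, and the three sample profiles are read off the slide's table.

    ```python
    from collections import defaultdict

    # Hypothetical sample: profile id -> terms, consistent with the slide's table.
    profiles = {
        1: {"MALE", "HAIR_BLOND", "EYE_BLUE"},
        2: {"FEMALE", "HAIR_BLOND", "EYE_BLUE"},
        8: {"FEMALE", "HAIR_RED", "EYE_BROWN"},
    }


    def build_inverted_index(docs):
        """Map each term to the set of document ids that contain it."""
        index = defaultdict(set)
        for doc_id, terms in docs.items():
            for term in terms:
                index[term].add(doc_id)
        return index


    def search_all(index, terms):
        """Return sorted ids of documents containing every term (boolean AND)."""
        postings = [index.get(term, set()) for term in terms]
        return sorted(set.intersection(*postings)) if postings else []


    index = build_inverted_index(profiles)
    print(search_all(index, ["FEMALE", "EYE_BLUE"]))  # [2]
    ```

    Looking a term up is a single dictionary access, which is the "accessed by term" property the slide refers to; an AND query is then an intersection of posting sets.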
  • 25. Schema Design • Single “Table” • One-to-many = multi-value fields • Individual vs Composite Fields • copyField and have both!
  • 26. Field considerations • Stored or not • Indexed or not • Multivalued - desires fields • Type
  • 27. Solr Types Used • tdate, tint, tfloat* (the ‘t’ is for Trie) - birthdate, loginAt • text - all text • string - id, non indexed text • random - good for random sorts • enum - for all enumerations
  • 28. Data Duplication • By function - numberPhotos & hasPhotos • By relationship - hiddenBy & hidden • By analysis - name & text
  • 29. Saving Profiles • Updating is an in-memory operation • No partial updates • Commit means flush index changes • Autocommit on maxDocs, maxTime or both
  • 30. Why Also Voldemort • Private profiles cannot be stale • Many fields not searchable or viewable by others • Isolate queries from fetch by id
  • 31. Querying • Superset of Lucene • Efficient Range Queries • Multiple Query Handlers • Dismax, Boost, Geo
  • 32. Recall vs Precision • Focus on recall when corpus is small • Precision once it is at critical mass
  • 33. Boolean Queries • Default operator set to AND • +gender:FEMALE +seeking:MALE +eyeColor:EYE_BLUE +hairColor:(HAIR_RED, HAIR_BLONDE) • Sort order is important
  • 34. Hybrid Queries • Default operator set to OR • +gender:FEMALE +seeking:MALE eyeColor:EYE_BLUE hairColor:(HAIR_RED, HAIR_BLONDE)
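
    The difference between the boolean and hybrid queries on slides 33 and 34 comes down to which clauses carry a `+` prefix. A minimal sketch of a query-string builder follows; `build_query` is a hypothetical helper, not part of Solr or any client library, and it assumes values are simple enum-like tokens that need no escaping.

    ```python
    def build_query(required, optional=()):
        """Build a Lucene-style query string.

        required: (field, value) pairs that must match (prefixed with '+').
        optional: (field, value) pairs that only influence ranking.
        """
        clauses = ["+%s:%s" % (field, value) for field, value in required]
        clauses += ["%s:%s" % (field, value) for field, value in optional]
        return " ".join(clauses)


    # Strict boolean search: every criterion is required.
    strict = build_query(
        [("gender", "FEMALE"), ("seeking", "MALE"), ("eyeColor", "EYE_BLUE")]
    )

    # Hybrid search: hard constraints are required, preferences are optional.
    hybrid = build_query(
        [("gender", "FEMALE"), ("seeking", "MALE")],
        [("eyeColor", "EYE_BLUE")],
    )

    print(strict)
    print(hybrid)
    ```

    With the default operator set to OR, the unprefixed clauses in the hybrid form widen recall while still pushing profiles that match the preferences toward the top.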
  • 35. Why you’re lucky if you like redheads • Inverse Document Frequency (IDF) • Rarer is favored over more common • More fields matched = higher ranking • Resulting order: 1. Blue-eyed redheads 2. Blue-eyed blonds 3. Redheads 4. Blonds
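
    The ordering on slide 35 can be reproduced with a simplified sketch. It uses the IDF formula from Lucene's classic similarity, `1 + ln(numDocs / (docFreq + 1))`, with document frequencies taken from the table on slide 24. Summing IDF over matched terms is an assumption made for illustration; real Lucene scoring also involves term frequency, norms, and other factors. The `candidates` dictionary and function names are hypothetical.

    ```python
    import math

    NUM_DOCS = 10
    # Document frequencies read off the slide 24 table.
    doc_freq = {"HAIR_RED": 1, "HAIR_BLOND": 4, "EYE_BLUE": 4}


    def idf(term):
        """Classic Lucene IDF: rarer terms get a larger weight."""
        return 1.0 + math.log(NUM_DOCS / (doc_freq[term] + 1))


    def score(profile_terms, query_terms):
        """Simplified score: sum of IDF over the query terms the profile matches."""
        return sum(idf(t) for t in query_terms if t in profile_terms)


    query = ["EYE_BLUE", "HAIR_RED", "HAIR_BLOND"]
    candidates = {
        "blue-eyed redhead": {"EYE_BLUE", "HAIR_RED"},
        "blue-eyed blond": {"EYE_BLUE", "HAIR_BLOND"},
        "redhead": {"HAIR_RED"},
        "blond": {"HAIR_BLOND"},
    }

    ranked = sorted(candidates, key=lambda n: score(candidates[n], query), reverse=True)
    print(ranked)
    ```

    Because only one profile in ten has red hair, `HAIR_RED` carries more weight than `HAIR_BLOND`, so a redhead outranks a blond when everything else is equal.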
  • 36. Boosting • Query time by importance • eyeColor:EYE_BLUE^2 hairColor:HAIR_BLOND
  • 38. Filter Fields • Useful for roles and other lists • By multi-valued field: -hidden:(2 4 6)
        id    hidden
        1     2, 4, 6
        2     1
    • By inverse field: -hiddenBy:1
        id    hiddenBy
        1     2
        2     1
        4     1
        6     1
  • 40. Date Math • Simplifies query preprocessing • +birthDate:[NOW/DAY+1DAY-36YEAR TO NOW/DAY-25YEAR] • Matches profiles between 25 and 35 years old
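
    To see why the date math expression on slide 40 selects ages 25 through 35, the same bounds can be computed by hand. This is a sketch of the preprocessing that Solr's date math makes unnecessary; `minus_years` and `birthdate_range` are hypothetical helpers, and falling back to Feb 28 for leap-day inputs is an assumption of this sketch.

    ```python
    from datetime import date, timedelta


    def minus_years(d, years):
        """Subtract whole years from a date, mapping Feb 29 to Feb 28 if needed."""
        try:
            return d.replace(year=d.year - years)
        except ValueError:
            return d.replace(year=d.year - years, day=28)


    def birthdate_range(today, min_age, max_age):
        """Return (earliest, latest) birthdates for ages min_age..max_age inclusive.

        Mirrors [NOW/DAY+1DAY-(max_age+1)YEAR TO NOW/DAY-(min_age)YEAR].
        """
        earliest = minus_years(today + timedelta(days=1), max_age + 1)
        latest = minus_years(today, min_age)
        return earliest, latest


    earliest, latest = birthdate_range(date(2011, 5, 26), 25, 35)
    print(earliest, latest)  # 1975-05-27 1986-05-26
    ```

    Someone born on the earliest date turns 36 tomorrow and is still 35 today, while someone born on the latest date turns 25 today, so both ends of the range are inclusive.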
  • 41. Distance Searching • lat, lon, distance • LocalSolr by Patrick O’Leary • Additional overhead ~90ms per query • Superseded in Solr 3.1
  • 42. Testing Queries • Log queries and ids returned • Version your search strategies • Improve one thing at a time
  • 43. Geo Service • Read-mostly service • Fields - Postal Code, Country, State, Cities, Lat, Lon • Usage - Registration Validation, City Selection
  • 44. Operations • Servlet container and filesystem • Jetty 6, 64-bit Java 6 JVM • 8G Heap -XX:+UseCompressedOops
  • 45. Operations • Active/Passive • Layer 7 Load balancing • Nightly snapshots • Eventually SolrCloud
  • 46. Multicore • Run multiple schemas on the same • Hot swappable for backwards compatible changes • private / public profiles
  • 47. Security • No security provided • At minimum secure your UpdateHandler, e.g. against <delete><query>*:*</query></delete> • Separate Cores
  • 48. Future • Solr 3.1 • Mutual Matching • Faceting / Guided Search • Incorporating spelling • Hierarchies, categories, better ranking models
  • 49. Faceting • Returns counts with query results • Efficient • Guides the user toward precision
  • 50. Thank you • jtuberville@eharmony.com • Twitter: @jtuberville