A PATTERN FOR IMPLEMENTING SOLR

BOTTOM LINE UP FRONT



•   Migrating from an existing search architecture to the Solr platform
    is less an exercise in technology and coding, and more an exercise
    in project management, metrics, and managing expectations.




•   “Typically smart people, fed into the search
    migration project meat grinder, produce
    hamburger quality results.  Okay search, with okay
    relevance, and an okay project.  But if you apply
    this pattern, you'll get back steak!”
    - Arin Sime




I want feedback!

1. Project definition   (we start here)
2. Precursor Work
3. Prototype            (typical starting point for a technology driven team)
4. Implementation
5. Testing/QA           (repeats!)
6. Deployment
7. Ongoing Tuning       (forgotten phase for a technology driven team)

PROGRAMMERS DOMINATE


•   We dive right into writing indexers and building queries

•   We skip the first two phases!

•   We don’t plan for the last phase!




NEED HETEROGENEOUS SKILLS
•   More so than a regular development project, we need multiple
    skills:
      •   Business Analysts
      •   Developers
      •   QA/Testers
      •   Report Writers
      •   Big Brain Scientists
      •   Content Folks (Writers)
      •   End Users
      •   UX Experts
      •   Ops Team
      •   Librarians!

PHASE 1: PROJECT DEFINITION


•   A well understood part of any project, right?

    •   objectives, key success criteria, evaluated risks

•   Leads to a Project Charter:

    •   structure, team membership, acceptable tradeoffs



CHALLENGES
•   Competing business stakeholders:

    •   Tester: When I search for “lamp shades”, I used to see these
        documents, now I see a differing set.

    •   Business Owner: How do I know that the new search engine is
        better?

    •   User: My pet feature “search within these results” works
        differently.

    •   Marketing Guy: I want to control the results so the current
        marketing push for toilet paper brand X always shows up at the
        top.
CHALLENGES



•   Stakeholders want a better search implementation, but
    perversely often want it to all work “the exact same way”.
    Getting all the stakeholders to agree on the project
    vision and on the metrics is a challenge.




CHALLENGES



•   It can be difficult to bring non-technical folks onto the search team.

    •   Have a content driven site? You need them to provide the right
        kind of content to fit into your search implementation!




ENSURING SKILLS NEEDED



•   Search is something everybody uses daily, but it is its own
    specialized domain

    •   Solr does pass the 15 minute rule, but don’t get overconfident!




PERFECT SOLR PERSON WOULD BE ALL OF
•   Mathematician
•   Librarian
•   UX Expert
•   Writer
•   Programmer
•   Business Analyst
•   Systems Engineer
•   Geographer!
•   Psychologist

KNOWLEDGE TRANSFER


•   If you don’t have the perfect team already, bring in experts and do
    domain knowledge transfer.

•   Learn the vocabulary of search to better communicate together

    •   “auto complete” vs “auto suggest”

•   Do “Solr for Content Team” brownbag sessions!



HAVE A COOL PROJECT NAME!




“Putting our content in the limelight”

PROJECT LIMELIGHT
PHASE 2: PRECURSOR WORK

•   A somewhat tenuous phase, this is making sure that we can
    measure the goals defined in the project definition.

    •   Do we have tools to track “increase conversions through
        search”?

•   In a greenfield search, we don’t have any previous relevancy/recall
    to measure against, but in a brownfield migration project we can
    do some apples to (apples? oranges?) comparisons.


METRICS
DATA COLLECTION


•   Have we been collecting enough data about current search
    patterns to measure success against?

•   Often folks have logs that record search queries but are missing
    crucial data like number of results returned per query!
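
A minimal sketch of recording the result count alongside each query, assuming the search response is in Solr's standard JSON shape (where `response.numFound` holds the hit count and `responseHeader.QTime` the query time in milliseconds). The `search_log_record` helper and the record's field names are hypothetical, chosen only for illustration:

```python
import json
from datetime import datetime, timezone


def search_log_record(query, solr_response, when=None):
    """Build one structured log entry for a search request.

    `solr_response` is assumed to be the parsed JSON body of a Solr
    select response. The record's field names are illustrative only.
    """
    when = when or datetime.now(timezone.utc)
    return {
        "timestamp": when.isoformat(),
        "query": query,
        # Number of matching documents: the datum that logs often omit.
        "num_found": solr_response["response"]["numFound"],
        # Server-side query time in milliseconds.
        "qtime_ms": solr_response["responseHeader"]["QTime"],
    }


# Example with a hand-written response in Solr's JSON shape.
sample = {
    "responseHeader": {"status": 0, "QTime": 12},
    "response": {"numFound": 0, "start": 0, "docs": []},
}
record = search_log_record("lamp shades", sample)
print(json.dumps(record))
```

With records like this collected before the migration, the old and new engines can later be compared on the same queries.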




RELEVANCY



•   Do we have any defined relevancy metrics?

•   Relevancy is like porn.....




I KNOW IT WHEN I SEE IT!




  http://en.wikipedia.org/wiki/Les_Amants

MEASURE USER BEHAVIOR



•   Are we trying to solve user interaction issues with existing search?

•   Do we have the analytics in place? Google Analytics?
    Omniture?




POGOSTICKING
  image from http://searchpatterns.org/

THRASHING
 image from http://searchpatterns.org/

BROAD BASE OF SKILLS



•   Not your normal “I am a developer, I crank out code” type of
    tasks!




INVENTORY USERS (users as in “systems”!)



•   Search often permeates multiple systems... “I can just leverage
    your search to power my content area”

•   Do you know which third party systems are actually accessing
    your existing search?

    •   A plan for cutting the cord on an existing search platform!



PHASE 3: PROTOTYPE


•   The fun part! <-- Why tech driven teams start here!

•   Solr is a very simple and robust platform.

    •   Most time should be spent on defining the schema needed to
        support the search queries, and on indexing the correct data




GOING FROM QUESTIONS TO ANSWERS

INDEXING: PUSH ME PULL ME
•   Are we in a pull environment?

    •   DIH

    •   Crawlers

    •   Scheduled Indexers

•   Are we in a push environment?

    •   Sunspot

VERIFY INDEXING STRATEGY


•   Use the complete dataset, not a partial load!

•   Is indexing time performance acceptable?

•   Quality of indexed data? Duplicates? Odd characters?
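
The data-quality questions above can be checked mechanically before or after a full load. The following is a rough sketch, assuming documents are available as plain dictionaries; the `audit_documents` helper, the `id` and `title` field names, and the definition of "odd" characters are all assumptions made for illustration:

```python
import unicodedata
from collections import Counter


def audit_documents(docs, id_field="id", text_field="title"):
    """Return (duplicate_ids, ids_with_odd_characters) for a list of dicts.

    "Odd" here means control characters (Unicode category Cc other than
    tab, newline, carriage return) or the U+FFFD replacement character,
    which often signals an encoding problem upstream.
    """
    counts = Counter(doc[id_field] for doc in docs)
    duplicates = sorted(doc_id for doc_id, n in counts.items() if n > 1)

    def is_odd(ch):
        if ch == "\ufffd":
            return True
        return unicodedata.category(ch) == "Cc" and ch not in "\t\n\r"

    odd = sorted(
        {doc[id_field] for doc in docs
         if any(is_odd(ch) for ch in doc.get(text_field, ""))}
    )
    return duplicates, odd


docs = [
    {"id": "1", "title": "Lamp shades"},
    {"id": "2", "title": "Table lamp\x00"},
    {"id": "2", "title": "Table lamp"},
    {"id": "3", "title": "Caf\ufffd table"},
]
duplicates, odd = audit_documents(docs)
print("duplicate ids:", duplicates)
print("ids with odd characters:", odd)
```

Running a check like this against the complete dataset, not a sample, is what surfaces the rare bad records.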




WHERE IS SEARCH BUSINESS LOGIC?


•   Does it go Solr side, in request handlers (solrconfig.xml)?

•   Is it specified as lots of URL parameters?

•   Do you have a frontend library like Sunspot that provides a layer
    of abstraction/DSL?




HOOKING SOLR UP TO FRONTEND


•   The first integration tool may not be the right one!

•   A simple query/result is very easy to do.

•   A highly relevant query/result is very difficult to do.




PART OF PROTOTYPING IS DEPLOYMENT

•   Make sure that when you are demoing the prototype Solr, it’s been
    deployed into an environment like QA

•   Running Solr by hand on a developer’s laptop is NOT enough.

•   Figuring out deployment (configuration management,
    environment, 1-click deploy) needs to be at least looked at

PHASE 4: IMPLEMENTATION


•   Back on familiar ground! We are extending the data being
    indexed, enhancing search queries, adding features.

•   Apply all the patterns of any experienced development team.

    •   Just don’t forget to involve your non techies in defining
        approaches!



INDEXERS PROLIFERATE!


•   Make sure you have strong patterns for indexers

•   A good topic for a code review!

PHASE 5: TESTING/QA


•   Most typical testing patterns apply EXCEPT

    •   Can be tough to automate testing if data is changing rapidly

    •   You want the full dataset at your fingertips

    •   You can still do it!



WATCH OUT FOR RELEVANCY!
•   It sometimes seems like once you validate one search, the
    previous one starts failing

    •   How do you empirically measure this?

•   Need production-like data sets during QA

•   Don’t get tied up in whether doc id 598 is the third result.
    Be happy 598 shows up in the first 10 results!
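
One way to make this measurable is to keep a list of (query, expected document) pairs and re-run all of them after every relevancy change, checking only that each expected document lands somewhere in the top N. The sketch below assumes a search function that returns ranked document ids; `fake_search`, its canned results, and the helper names are hypothetical stand-ins for a real Solr call:

```python
def in_top_n(ranked_ids, expected_id, n=10):
    """True if expected_id appears among the first n ranked ids."""
    return expected_id in ranked_ids[:n]


def check_expectations(run_search, expectations, n=10):
    """Run each query and return the (query, doc_id) pairs that failed."""
    failures = []
    for query, doc_id in expectations:
        if not in_top_n(run_search(query), doc_id, n):
            failures.append((query, doc_id))
    return failures


# Stand-in for a real search call; returns ranked document ids.
def fake_search(query):
    canned = {
        "lamp shades": ["12", "598", "7", "33"],
        "toilet paper": ["41", "2", "9"],
    }
    return canned.get(query, [])


expectations = [("lamp shades", "598"), ("toilet paper", "598")]
failures = check_expectations(fake_search, expectations)
print("failed expectations:", failures)
```

Because every expectation is re-checked on each run, a fix for one search that quietly breaks an earlier one shows up in the failure list.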
EXPLORATORY TESTING?


•   ...simultaneous learning, test design and test execution

•   Requires tester to understand the corpus of data indexed

•   Behave like a user

- James Bach, http://en.wikipedia.org/wiki/Exploratory_testing
STUMP THE CHUMP



•   You can always write a crazy search query that Solr will
    barf on... Is that what your users are typing in?

DOES SOLR ADMIN WORK?



•   Do searches via Solr Admin reflect what the front end does? If
    not, provide your own test harness!

•   Make ad hoc searches by QA really, really easy

•   “Just type these 15 URL params in!” is not an answer!
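
A test harness can be as small as one function that hides the parameters the front end sends, so QA only types the query. This is a sketch under assumptions: the base URL, core name, and the default values below are hypothetical placeholders for whatever the real front end uses, while `defType`, `qf`, `rows`, `wt`, and `q` are standard Solr query parameters:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical defaults standing in for whatever the real front end sends.
FRONTEND_DEFAULTS = {
    "defType": "edismax",
    "qf": "title^2 body",
    "rows": "10",
    "wt": "json",
}

# Hypothetical host and core name.
BASE_URL = "http://localhost:8983/solr/mycore/select"


def build_search_url(user_query, base_url=BASE_URL, **overrides):
    """Return the full Solr URL for a user query plus front-end defaults."""
    params = dict(FRONTEND_DEFAULTS)
    params.update(overrides)
    params["q"] = user_query
    return base_url + "?" + urlencode(params)


url = build_search_url("lamp shades", rows="5")
print(url)

# Parse the URL back to show which parameters were actually sent.
sent = parse_qs(urlsplit(url).query)
print(sent)
```

Keeping the defaults in one place also means the harness and the front end can be compared line by line when results differ.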

PHASE 6: DEPLOYMENT


•   Similar to any large scale system

    •   Network plumbing tasks, multiple servers, IP addresses

    •   Hopefully all environment variables are external to Solr
        configurations?

•   Think about monitoring: replication, query load!

DO YOU NEED UPTIME THROUGH RELEASE?


•   Solr is code, configuration, and data! Do you have to
    reindex your data?

    •   Can you reindex your data from someplace else?




PRACTICE THIS PROCESS!


•   Mapping out the steps to back up cores, redeploy new ones, and
    update master and slave servers is fairly straightforward if done
    ahead of time

•   These steps are a great thing to involve your Ops team in




PHASE 7: ONGOING TUNING


•   The part we forget to budget for!

•   Many knobs and dials are available in Solr, and we need to keep
    tweaking them as:

    •   the data set being indexed changes

    •   the behavior of users changes

HAVE REGULAR CHECKINS WITH CONTENT PROVIDERS


•   Have an editorial calendar of content? Evaluate what synonyms
    you are using based on content

•   Can you better highlight content using Query Elevation to boost
    certain documents?




QUERY TRENDS

•   Look at queries returning 0 results

•   Are queries getting slower/faster?

•   Are users leveraging all the features available to them?

•   Do your analytics highlight negative behaviors such as
    pogosticking or thrashing?

•   AUTOMATE THESE REPORTS!
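
As one example of an automated report, the zero-result check can be a few lines run on a schedule. This sketch assumes search log records are dictionaries with `query` and `num_found` keys; those names and the `zero_result_report` helper are hypothetical:

```python
from collections import Counter


def zero_result_report(records, top=5):
    """Summarize zero-result queries from search log records.

    Each record is assumed to be a dict with "query" and "num_found" keys.
    Returns (zero_result_rate, most_common_zero_result_queries).
    """
    if not records:
        return 0.0, []
    # Normalize case and whitespace so variants of one query are grouped.
    zero = [r["query"].strip().lower() for r in records if r["num_found"] == 0]
    rate = len(zero) / len(records)
    return rate, Counter(zero).most_common(top)


log = [
    {"query": "lamp shades", "num_found": 42},
    {"query": "lampshaeds", "num_found": 0},
    {"query": "Lampshaeds", "num_found": 0},
    {"query": "toilet paper", "num_found": 7},
]
rate, worst = zero_result_report(log)
print(f"{rate:.0%} of queries returned nothing; top offenders: {worst}")
```

The most frequent zero-result queries are natural candidates for new synonyms or spelling suggestions.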


[Pie chart: Query Duration]

    Less than 0.5 s    69%
    0.5-1.0s           20%
    Remaining slices (1.0-1.5s, 1.5-2.0s, 2.0-2.5s, > 2.5s): 6%, 2%, 2%, 1%

89% of all queries take less than 1s
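
A breakdown like this one can be regenerated from logged query times. The sketch below assumes durations are available in seconds and uses the same half-second buckets as the chart; the sample durations are made up for illustration and are not the data behind the slide:

```python
from bisect import bisect_right

# Upper edges (seconds) of the half-second buckets; the last bucket is open-ended.
EDGES = [0.5, 1.0, 1.5, 2.0, 2.5]
LABELS = ["< 0.5s", "0.5-1.0s", "1.0-1.5s", "1.5-2.0s", "2.0-2.5s", "> 2.5s"]


def duration_shares(durations_s):
    """Return {label: fraction of queries} for a list of durations in seconds."""
    counts = [0] * len(LABELS)
    for d in durations_s:
        # bisect_right finds the bucket index; a value equal to an edge
        # falls into the higher bucket.
        counts[bisect_right(EDGES, d)] += 1
    total = len(durations_s)
    return {label: (c / total if total else 0.0) for label, c in zip(LABELS, counts)}


# Made-up sample durations.
durations = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.9, 1.2, 2.2, 3.0]
shares = duration_shares(durations)
under_1s = shares["< 0.5s"] + shares["0.5-1.0s"]
print(shares)
print(f"{under_1s:.0%} of queries take less than 1s")
```

Tracking the "under 1s" share over time gives a single number to watch as tuning proceeds.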




Note: It’s harder to get queries into that 0-0.1s range, though it is
questionable whether focusing on that leads to noticeable improvement.

Over time, we want to see this trend become steeper, which would
indicate queries are becoming faster and performance improvements
more noticeable.

1. Project definition   (start!)
2. Precursor Work
3. Prototype
4. Implementation
5. Testing/QA           (repeats!)
6. Deployment
7. Ongoing Tuning       (maximize value of investment)


 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 

Solr pattern

  • 2. BOTTOM LINE UP FRONT • Migrating from an existing search architecture to the Solr platform is less an exercise in technology and coding, and more an exercise in project management, metrics, and managing expectations. 2
  • 3. “Typically smart people, fed into the search migration project meat grinder, produce hamburger quality results.  Okay search, with okay relevance, and an okay project.  But if you apply this pattern, you'll get back steak!”   - Arin Sime 3
  • 4. I want feedback! Project definition (we start here) → Precursor Work → Prototype (typical starting point for a technology driven team) → Implementation → Testing/QA → Deployment → Ongoing Tuning (forgotten phase for a technology driven team). Repeats! 4
  • 5. PROGRAMMERS DOMINATE • We dive right into writing indexers and building queries • We skip the first two phases! • We don’t plan for the last phase! 5
  • 6. NEED HETEROGENEOUS SKILLS • More so than a regular development project, we need multiple skills: • Business Analysts • Content Folks (Writers) • Developers • End Users • QA/Testers • UX Experts • Report Writers • Ops Team • Big Brain Scientists • Librarians! 6
  • 7. PHASE 1: PROJECT DEFINITION • Well understood part of any project, right? • objectives, key success criteria, evaluated risks • Leads to a Project Charter: • structure, team membership, acceptable tradeoffs 7
  • 8. CHALLENGES • Competing business stakeholders: • Tester: When I search for “lamp shades”, I used to see these documents, now I see a differing set. • Business Owner: How do I know that the new search engine is better? • User: My pet feature “search within these results” works differently. • Marketing Guy: I want to control the results so the current marketing push for toilet paper brand X always shows up at the top. 8
  • 9. CHALLENGES • Stakeholders want a better search implementation, but perversely often want it to all work “the exact same way”. Getting agreement across all the stakeholders on the project vision and on the metrics is a challenge. 9
  • 10. CHALLENGES • It can be difficult to bring non-technical folks onto the Search Team. • Have a content driven site? You need them to provide the right kind of content to fit into your search implementation! 10
  • 11. ENSURING SKILLS NEEDED • Search is something everybody uses daily, but it is its own specialized domain • Solr does pass the 15 minute rule, but don’t get overconfident! 11
  • 12. PERFECT SOLR PERSON WOULD BE ALL OF • Mathematician • Business Analyst • Librarian • Systems Engineer • UX Expert • Geographer! • Writer • Psychologist • Programmer 12
  • 13. KNOWLEDGE TRANSFER • If you don’t have the perfect team already, bring in experts and do domain knowledge transfer. • Learn the vocabulary of search to better communicate together • “auto complete” vs “auto suggest” • Do “Solr for Content Team” brownbag sessions! 13
  • 14. 14
  • 15. HAVE A COOL PROJECT NAME! 15
  • 16. “Putting our content in the lime light” PROJECT LIMELIGHT 16
  • 17. PHASE 2: PRECURSOR WORK • A somewhat tenuous phase, this is making sure that we can measure the goals defined in the project definition. • Do we have tools to track “increase conversions through search”? • In a greenfield search, we don’t have any previous relevancy/recall to measure against, but in a brownfield migration project we can do some apples to (apples? oranges?) comparisons. 17
  • 18. METRICS 18
  • 19. DATA COLLECTION • Have we been collecting enough data about current search patterns to measure success against? • Often folks have logs that record search queries but are missing crucial data like number of results returned per query! 19
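
The data collection point on slide 19 can be made concrete with a minimal sketch of a search log record that carries the result count alongside the query. The field names (`num_found`, `elapsed_ms`, `session_id`) and the JSON-lines format are assumptions chosen for illustration, not something the deck prescribes.

```python
import json
import time


def make_search_log_entry(query, num_found, elapsed_ms, session_id=None):
    """Build one log record for a single search request.

    All field names here are hypothetical; the point is that the
    number of results is recorded together with the query text.
    """
    return {
        "timestamp": time.time(),
        "query": query,
        "num_found": num_found,    # results returned for this query
        "elapsed_ms": elapsed_ms,  # how long the search took
        "session_id": session_id,  # lets later reports group queries by visit
    }


def format_log_line(entry):
    """Serialize a record as one JSON line, ready to append to a log file."""
    return json.dumps(entry, sort_keys=True)


entry = make_search_log_entry("lamp shades", 42, 118, session_id="abc123")
print(format_log_line(entry))
```

Logging one self-describing record per search keeps the old and new engines comparable later, because the same report can be run over both logs.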
  • 20. RELEVANCY • Do we have any defined relevancy metrics? • Relevancy is like porn..... 20
  • 21. I KNOW IT WHEN I SEE IT! http://en.wikipedia.org/wiki/Les_Amants 21
  • 22. 22
  • 23. MEASURE USER BEHAVIOR • Are we trying to solve user interaction issues with existing search? • Do we have the analytics in place? Google Analytics? Omniture? 23
  • 24. POGOSTICKING image from http://searchpatterns.org/ 24
  • 25. THRASHING image from http://searchpatterns.org/ 25
  • 26. BROAD BASE OF SKILLS • Not your normal “I am a developer, I crank out code” type of tasks! 26
  • 27. INVENTORY USERS Users as in “Systems”! • Search often permeates multiple systems... “I can just leverage your search to power my content area” • Do you know which third party systems are actually accessing your existing search? • A plan for cutting the cord on an existing search platform! 27
  • 28. PHASE 3: PROTOTYPE • The fun part! <-- Why tech driven teams start here! • Solr is a very simple and robust platform. • Most time should be spent on defining the schema needed to support the search queries, and indexing the correct data 28
  • 29. GOING FROM QUESTIONS TO ANSWERS 29
  • 30. INDEXING: PUSH ME PULL ME • Are we in a pull environment? • Sunspot • DIH • Crawlers • Scheduled Indexers • Are we in a push environment? 30
  • 31. VERIFY INDEXING STRATEGY • Use the complete dataset, not a partial load! • Is indexing time performance acceptable? • Quality of indexed data? Duplicates? Odd characters? 31
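
The data quality questions on slide 31 (duplicates, odd characters) lend themselves to a small automated check. The sketch below is one possible approach, assuming documents are available as Python dicts before they are sent to Solr; the field names `id` and `title` and the definition of "odd" (control characters or the Unicode replacement character) are assumptions.

```python
import unicodedata
from collections import Counter


def find_duplicate_ids(docs, id_field="id"):
    """Return the ids that occur more than once in the batch."""
    counts = Counter(doc[id_field] for doc in docs)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


def has_odd_characters(text):
    """True if text holds U+FFFD or a control character other than tab/newline."""
    for ch in text:
        if ch == "\ufffd":  # replacement character, a sign of a bad decode
            return True
        if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
            return True
    return False


def find_odd_docs(docs, text_field="title", id_field="id"):
    """Return ids of documents whose text field looks damaged."""
    return [d[id_field] for d in docs if has_odd_characters(d.get(text_field, ""))]


sample_docs = [
    {"id": "1", "title": "Lamp shades"},
    {"id": "2", "title": "Broken \ufffd title"},
    {"id": "1", "title": "Lamp shades"},
    {"id": "3", "title": "Tab\x00le lamps"},
]
print(find_duplicate_ids(sample_docs))
print(find_odd_docs(sample_docs))
```

Running a check like this over the complete dataset, rather than a partial load, is what surfaces the rare malformed records.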
  • 32. WHERE IS SEARCH BUSINESS LOGIC? • Does it go Solr side in request handlers (solrconfig.xml)? • Is it specified as lots of URL parameters? • Do you have a frontend library like Sunspot that provides a layer of abstraction/DSL? 32
  • 33. HOOKING SOLR UP TO FRONTEND • The first integration tool may not be the right one! • A simple query/result is very easy to do. • A highly relevant query/result is very difficult to do. 33
  • 34. PART OF PROTOTYPING IS DEPLOYMENT • Make sure that when you are demoing the prototype Solr, it’s been deployed into an environment like QA • Running Solr by hand on a developer’s laptop is NOT enough. • Figuring out deployment (configuration management, environment, 1-click deploy) needs to be at least looked at 34
  • 35. PHASE 4: IMPLEMENTATION • Back on familiar ground! We are extending the data being indexed, enhancing search queries, adding features. • Apply all the patterns of any experienced development team. • Just don’t forget to involve your non techies in defining approaches! 35
  • 36. INDEXERS PROLIFERATE! • Make sure you have strong patterns for indexers • A good topic for a code review! 36
  • 37. PHASE 5: TESTING/QA • Most typical testing patterns apply EXCEPT • Can be tough to automate testing if data is changing rapidly • You want the full dataset at your fingertips • You can still do it! 37
  • 38. WATCH OUT FOR RELEVANCY! • Sometimes it seems like once you validate one search, the previous one starts failing • How do you empirically measure this? • Need production-like data sets during QA • Don’t get tied up in whether doc id 598 is the third result. Be happy 598 shows up in the first 10 results! 38
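
One way to measure the relevancy concern on slide 38 empirically is to keep a list of queries with the documents that should appear somewhere in the first N results, and to rerun the whole list after every change. This is a minimal sketch; `search_fn` is a hypothetical callable that returns ranked document ids, stubbed here with a dictionary rather than a real Solr call.

```python
def missing_from_top_n(result_ids, expected_ids, n=10):
    """Return the expected ids that do not appear in the first n results."""
    top = set(result_ids[:n])
    return [doc_id for doc_id in expected_ids if doc_id not in top]


def check_relevancy_cases(cases, search_fn, n=10):
    """Run every case and return {query: [missing ids]} for the failures.

    The exact position is deliberately ignored: a document passes
    as long as it shows up anywhere in the first n results.
    """
    failures = {}
    for query, expected_ids in cases.items():
        missing = missing_from_top_n(search_fn(query), expected_ids, n)
        if missing:
            failures[query] = missing
    return failures


# Stubbed ranked results standing in for the search engine (assumption).
fake_results = {
    "lamp shades": ["12", "598", "7", "33"],
    "toilet paper": ["5", "9"],
}
cases = {
    "lamp shades": ["598"],
    "toilet paper": ["77"],
}
failures = check_relevancy_cases(cases, lambda q: fake_results.get(q, []))
print(failures)
```

Because all cases run together, a tuning change that fixes one query while breaking an earlier one shows up in the same report.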
  • 39. EXPLORATORY TESTING? • “...simultaneous learning, test design and test execution” (James Bach, http://en.wikipedia.org/wiki/Exploratory_testing) • Requires tester to understand the corpus of data indexed • Behave like a user 39
  • 40. STUMP THE CHUMP • You can always write a crazy search query that Solr will barf on... Is that what your users are typing in? 40
  • 41. DOES SOLR ADMIN WORK? • Do searches via Solr Admin reflect what the front end does? If not, provide your own test harness! • Make ad hoc searches by QA really, really easy • “Just type these 15 URL params in!” is not an answer! 41
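
A test harness for slide 41 can be as small as a function that fills in the frontend's parameters so that QA only types the query. The sketch below builds a URL for Solr's `/select` handler; the base URL, the core name `limelight`, the field names, and the default values are all assumptions standing in for whatever the real frontend sends.

```python
from urllib.parse import urlencode

# Hypothetical defaults mirroring what the frontend would send to Solr.
FRONTEND_DEFAULTS = {
    "defType": "edismax",
    "qf": "title^2 body",
    "rows": "10",
    "fl": "id,title,score",
    "wt": "json",
}


def build_select_url(base_url, user_query, overrides=None):
    """Return a /select URL that combines the defaults with one user query."""
    params = dict(FRONTEND_DEFAULTS)
    params.update(overrides or {})
    params["q"] = user_query
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# The host and core name below are placeholders, not a real deployment.
url = build_select_url("http://localhost:8983/solr/limelight", "lamp shades")
print(url)
```

Keeping the defaults in one place means the harness and the frontend can be compared line by line when their results disagree.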
  • 42. PHASE 6: DEPLOYMENT • Similar to any large scale system • Network plumbing tasks, multiple servers, IP addresses • Hopefully all environment variables are external to Solr configurations? • Think about monitoring.. Replication, query load! 42
  • 43. DO YOU NEED UPTIME THROUGH RELEASE? • Solr is code, configuration, and data! Do you have to reindex your data? • Can you reindex your data from someplace else? 43
  • 44. 44
  • 45. PRACTICE THIS PROCESS! • Mapping out the steps to back up cores, redeploy new ones, and update master and slave servers is fairly straightforward if done ahead of time • These steps are a great thing to involve your Ops team in 45
  • 46. PHASE 7: ONGOING TUNING • The part we forget to budget for! • Many knobs and dials are available in Solr; you need to keep tweaking them as: • the data set being indexed changes • the behavior of users changes 46
  • 47. HAVE REGULAR CHECKINS WITH CONTENT PROVIDERS • Have an editorial calendar of content? Evaluate what synonyms you are using based on content • Can you better highlight content using Query Elevation to boost certain documents? 47
  • 48. QUERY TRENDS • Look at queries returning 0 results • Are queries getting slower/faster? • Are users leveraging all the features available to them? • Do your analytics highlight negative behaviors such as pogosticking or thrashing? • AUTOMATE THESE REPORTS! 48
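
The zero-results report from slide 48 is a natural first candidate for automation. This is a minimal sketch, assuming search log records are dicts with hypothetical `query` and `num_found` fields; queries are lowercased and trimmed so that trivial variants count together.

```python
from collections import Counter


def zero_result_queries(log_entries, top=10):
    """Return the most frequent queries that returned no results.

    Each entry is assumed to be a dict with 'query' and 'num_found' keys.
    """
    counts = Counter(
        entry["query"].strip().lower()
        for entry in log_entries
        if entry["num_found"] == 0
    )
    return counts.most_common(top)


sample_log = [
    {"query": "lampshade", "num_found": 0},
    {"query": "toilet paper", "num_found": 5},
    {"query": "Lampshade ", "num_found": 0},
    {"query": "tp brand x", "num_found": 0},
]
report = zero_result_queries(sample_log)
print(report)
```

A report like this, run on a schedule, gives the content team a concrete list of synonyms or missing content to consider.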
  • 49. Query Duration (chart) • Less than 0.5s: 69% • 0.5-1.0s: 20% • 1.0-1.5s: 6% • 1.5-2.0s: 2% • 2.0-2.5s: 2% • More than 2.5s: 1% • 89% of all queries take less than 1s 49
  • 50. Note: It’s harder to get queries into that 0-0.1s range, though it is questionable whether focusing on that leads to noticeable improvement. Over time, we want to see this trend become steeper, which would indicate that queries are becoming shorter and performance improvements more noticeable. 50
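
A query duration breakdown like the one on slide 49 can be produced from logged timings with a few lines of code. The sketch below assumes durations are available in seconds and uses 0.5s buckets capped at 2.5s to match the chart; the sample timings are made up for illustration.

```python
from collections import Counter


def bucket_label(seconds, width=0.5, cap=2.5):
    """Map a duration to a label such as '0.5-1.0s', or '>2.5s' past the cap."""
    if seconds >= cap:
        return f">{cap}s"
    lower = int(seconds // width) * width
    return f"{lower:.1f}-{lower + width:.1f}s"


def duration_distribution(durations):
    """Return {bucket label: percentage of queries} for a list of durations."""
    counts = Counter(bucket_label(d) for d in durations)
    total = len(durations)
    return {label: 100 * n / total for label, n in counts.items()}


# Made-up timings in seconds, purely for illustration.
sample_durations = [0.1, 0.2, 0.3, 0.7, 1.2, 3.0, 0.4, 0.45, 0.9, 0.05]
dist = duration_distribution(sample_durations)
for label in sorted(dist):
    print(label, dist[label])
```

Tracking this distribution over time, rather than a single average, shows whether the slow tail is shrinking.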
  • 51. Project definition (start!) → Precursor Work → Prototype → Implementation → Testing/QA → Deployment → Ongoing Tuning (maximize value of investment). Repeats! 51