2. BOTTOM LINE UP FRONT
• Migrating from an existing search architecture to the Solr platform
is less an exercise in technology and coding, and more an exercise
in project management, metrics, and managing expectations.
3. • “Typically smart people, fed into the search migration project meat grinder, produce hamburger-quality results. Okay search, with okay relevance, and an okay project. But if you apply this pattern, you'll get back steak!” - Arin Sime
4. I want feedback!
• Project definition <-- We start here
• Precursor Work
• Prototype <-- Typical starting point for a technology-driven team
• Implementation
• Testing/QA (repeats!)
• Deployment
• Ongoing Tuning <-- Forgotten phase for a technology-driven team
5. PROGRAMMERS DOMINATE
• We dive right into writing indexers and building queries
• We skip the first two phases!
• We don’t plan for the last phase!
6. NEED HETEROGENEOUS SKILLS
• More so than a regular development project, we need multiple skills:
• Business Analysts
• Developers
• QA/Testers
• Report Writers
• Big Brain Scientists
• Content Folks (Writers)
• End Users
• UX Experts
• Ops Team
• Librarians!
7. PHASE 1: PROJECT DEFINITION
• A well-understood part of any project, right?
• objectives, key success criteria, evaluated risks
• Leads to a Project Charter:
• structure, team membership, acceptable tradeoffs
8. CHALLENGES
• Competing business stakeholders:
• Tester: When I search for “lamp shades”, I used to see these documents; now I see a different set.
• Business Owner: How do I know that the new search engine is better?
• User: My pet feature “search within these results” works differently.
• Marketing Guy: I want to control the results so the current marketing push for toilet paper brand X always shows up at the top.
9. CHALLENGES
• Stakeholders want a better search implementation, but perversely often want it to all work “the exact same way”. Getting all the stakeholders to agree on the project vision and the metrics is a challenge.
10. CHALLENGES
• It can be difficult to bring non-technical folks onto the Search Team.
• Have a content-driven site? You need them to provide the right kind of content to fit into your search implementation!
11. ENSURING SKILLS NEEDED
• Search is something everybody uses daily, but it is its own specialized domain
• Solr does pass the 15-minute rule; don't get overconfident!
12. PERFECT SOLR PERSON WOULD BE ALL OF
• Mathematician
• Librarian
• UX Expert
• Writer
• Programmer
• Business Analyst
• Systems Engineer
• Geographer!
• Psychologist
13. KNOWLEDGE TRANSFER
• If you don’t have the perfect team already, bring in experts and do
domain knowledge transfer.
• Learn the vocabulary of search to better communicate together
• “auto complete” vs “auto suggest”
• Do “Solr for Content Team” brownbag sessions!
16. PROJECT LIMELIGHT
“Putting our content in the limelight”
17. PHASE 2: PRECURSOR WORK
• A somewhat tenuous phase: this is making sure that we can measure the goals defined in the project definition.
• Do we have tools to track “increase conversions through search”?
• In a greenfield search project we don't have any previous relevancy/recall to measure against, but in a brownfield migration project we can do some apples-to-(apples? oranges?) comparisons.
19. DATA COLLECTION
• Have we been collecting enough data about current search
patterns to measure success against?
• Often folks have logs that record search queries but are missing
crucial data like number of results returned per query!
20. RELEVANCY
• Do we have any defined relevancy metrics?
• Relevancy is like porn.....
21. I KNOW IT WHEN I SEE IT!
http://en.wikipedia.org/wiki/Les_Amants
23. MEASURE USER BEHAVIOR
• Are we trying to solve user interaction issues with existing search?
• Do we have the analytics in place? Google Analytics?
Omniture?
26. BROAD BASE OF SKILLS
• Not your normal “I am a developer, I crank out code” type of task!
27. INVENTORY USERS
Users as in “Systems”!
• Search often permeates multiple systems... “I can just leverage your search to power my content area”
• Do you know which third-party systems are actually accessing your existing search?
• You need a plan for cutting the cord on the existing search platform!
28. PHASE 3: PROTOTYPE
• The fun part! <-- Why tech-driven teams start here!
• Solr is a very simple and robust platform.
• Most time should be spent on defining the schema needed to support the search queries, and on indexing the correct data
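As a sketch of that schema work, a minimal schema.xml fragment might look like the following; the field names and types are illustrative assumptions for a hypothetical product catalog, not a prescription:

```xml
<!-- Illustrative fields for a hypothetical product catalog -->
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="description" type="text" indexed="true" stored="true"/>
  <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey>id</uniqueKey>
```

The point is to let the queries you need to support drive which fields exist and how they are analyzed, rather than mirroring the source database one-to-one.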
30. INDEXING: PUSH ME PULL ME
• Are we in a pull environment?
• DIH
• Crawlers
• Scheduled Indexers
• Are we in a push environment?
• Sunspot
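In a pull environment, the DataImportHandler (DIH) is driven by a data-config.xml; a minimal sketch, assuming a hypothetical JDBC source and a products table (connection details are placeholders):

```xml
<dataConfig>
  <!-- JDBC connection details are placeholders -->
  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/catalog"
              user="solr" password="changeme"/>
  <document>
    <entity name="product" query="SELECT id, title, description FROM products">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
    </entity>
  </document>
</dataConfig>
```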
31. VERIFY INDEXING STRATEGY
• Use the complete dataset, not a partial load!
• Is indexing time performance acceptable?
• Quality of indexed data? Duplicates? Odd characters?
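One quick way to hunt duplicates is a facet query over a field that should be mostly unique; a sketch, where `title_s` is an assumed string field in your schema:

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=title_s&facet.mincount=2
```

Any facet value that comes back has two or more documents sharing that title, which is a good starting list for a duplicate investigation.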
32. WHERE IS SEARCH BUSINESS LOGIC?
• Does it go Solr-side in request handlers (solrconfig.xml)?
• Is it specified as lots of URL parameters?
• Do you have a frontend library like Sunspot that provides a layer of abstraction/DSL?
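If the answer is “Solr-side”, the logic lives in a request handler's defaults in solrconfig.xml; a sketch, with illustrative handler name and field boosts:

```xml
<!-- solrconfig.xml: the front end sends only q; relevancy logic stays server-side -->
<requestHandler name="/catalogsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title^10 description^2</str>
    <str name="rows">10</str>
  </lst>
</requestHandler>
```

Keeping boosts and defaults here means every client (and every tester) gets the same relevancy behavior without reassembling URL parameters.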
33. HOOKING SOLR UP TO FRONTEND
• The first integration tool may not be the right one!
• A simple query/result is very easy to do.
• A highly relevant query/result is very difficult to do.
34. PART OF PROTOTYPING IS DEPLOYMENT
• Make sure that when you are demoing the prototype Solr, it's been deployed into an environment like QA
• Running Solr by hand on a developer's laptop is NOT enough.
• Figuring out deployment (configuration management, environment, 1-click deploy) needs to at least be looked at
35. PHASE 4: IMPLEMENTATION
• Back on familiar ground! We are extending the data being indexed, enhancing search queries, adding features.
• Apply all the patterns of any experienced development team.
• Just don't forget to involve your non-techies in defining approaches!
36. INDEXERS PROLIFERATE!
• Make sure you have strong patterns for indexers
• A good topic for a code review!
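One pattern worth standardizing across indexers is batching documents rather than making one HTTP round trip per document; a minimal sketch in Python, where the update URL and JSON update format are assumptions about your Solr setup:

```python
import json
import urllib.request

def batches(docs, size=100):
    """Yield successive fixed-size batches from a list of documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]

def index_all(docs, solr_url="http://localhost:8983/solr/update/json?commit=true"):
    """Post documents to Solr in batches instead of one request per document."""
    for batch in batches(docs):
        body = json.dumps(batch).encode("utf-8")
        req = urllib.request.Request(
            solr_url, data=body,
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).read()
```

A shared helper like this is exactly the kind of thing a code review should converge on before five teams each invent their own indexer.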
37. PHASE 5: TESTING/QA
• Most typical testing patterns apply, EXCEPT:
• It can be tough to automate testing if data is changing rapidly
• You want the full dataset at your fingertips
• You can still do it!
38. WATCH OUT FOR RELEVANCY!
• Sometimes it seems like once you validate one search, the previous one starts failing
• How do you empirically measure this?
• Need production-like data sets during QA
• Don't get tied up in “doc id 598 is the third result”. Be happy 598 shows up in the first 10 results!
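That “top 10, not position 3” idea is easy to encode as a helper in an automated relevancy test; a sketch:

```python
def in_top_n(result_ids, doc_id, n=10):
    """Relevancy check: the document shows up somewhere in the first n results.

    Prefer this over asserting an exact position (e.g. "doc 598 is third"),
    which breaks every time scoring shifts slightly.
    """
    return doc_id in result_ids[:n]
```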
39. EXPLORATORY TESTING?
• “...simultaneous learning, test design and test execution” - James Bach
• Requires the tester to understand the corpus of data indexed
• behave like a user
http://en.wikipedia.org/wiki/Exploratory_testing
40. STUMP THE CHUMP
• You can always write a crazy search query that Solr will barf on... Is that what your users are typing in?
41. DOES SOLR ADMIN WORK?
• Do searches via Solr Admin reflect what the front end does? If not, provide your own test harness!
• Make ad hoc searches by QA really, really easy
• “Just type these 15 URL params in!” is not an answer!
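A tiny helper can fold those 15 params into defaults so QA only types a keyword; a sketch in Python, where the base URL and parameter values are illustrative assumptions:

```python
from urllib.parse import urlencode

# Defaults mirroring what the front end sends; the values are illustrative.
DEFAULTS = {
    "defType": "dismax",
    "qf": "title^10 description^2",
    "rows": "10",
    "fl": "id,title,score",
}

def qa_search_url(q, base="http://localhost:8983/solr/select", **overrides):
    """Build a full Solr query URL from just a keyword."""
    params = dict(DEFAULTS)
    params["q"] = q
    params.update(overrides)
    return base + "?" + urlencode(params)
```

QA can then paste `qa_search_url("lamp shades")` output into a browser, with optional overrides (e.g. `rows="50"`) for exploration.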
42. PHASE 6: DEPLOYMENT
• Similar to any large-scale system
• Network plumbing tasks, multiple servers, IP addresses
• Hopefully all environment variables are external to Solr configurations?
• Think about monitoring: replication, query load!
43. DO YOU NEED UPTIME THROUGH RELEASE?
• Solr is code, configuration, and data! Do you have to reindex your data?
• Can you reindex your data from someplace else?
45. PRACTICE THIS PROCESS!
• Mapping out the steps to back up cores, redeploy new ones, and update master and slave servers is fairly straightforward if done ahead of time
• These steps are a great thing to involve your Ops team in
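With Solr's built-in replication handler, the backup and catch-up steps are single HTTP commands; a sketch, with illustrative host names:

```
http://master:8983/solr/replication?command=backup
http://slave:8983/solr/replication?command=fetchindex
```

Scripting these (and rehearsing them with Ops before release day) is what turns the cutover into a routine.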
46. PHASE 7: ONGOING TUNING
• The part we forget to budget for!
• Solr has many knobs and dials; keep tweaking them as:
• the data set being indexed changes
• the behavior of users changes
47. HAVE REGULAR CHECK-INS WITH CONTENT PROVIDERS
• Have an editorial calendar of content? Evaluate what synonyms you are using based on content
• Can you better highlight content using Query Elevation to boost certain documents?
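Query Elevation is driven by an elevate.xml file (the component itself must also be enabled in solrconfig.xml); a sketch, where the query text and document id are illustrative:

```xml
<elevate>
  <query text="toilet paper">
    <doc id="brand-x-mega-pack"/>
  </query>
</elevate>
```

This is a natural artifact to review at those content check-ins: the marketing push changes, the pinned documents change with it.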
48. QUERY TRENDS
• Look at queries returning 0 results
• Are queries getting slower/faster?
• Are users leveraging all the features available to them?
• Do your analytics highlight negative behaviors such as pogosticking or thrashing?
• AUTOMATE THESE REPORTS!
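Automating the zero-results report can be as small as a log scan; a sketch in Python, where the log line shape is an assumption to adapt to your own Solr request logs:

```python
import re

# The log line shape below is an assumption; adjust the regex to your own logs.
LINE = re.compile(r"[{&]q=(?P<q>[^&\s}]+).*hits=(?P<hits>\d+)")

def zero_hit_queries(log_lines):
    """Map each query string that returned 0 results to how often it did so."""
    counts = {}
    for line in log_lines:
        m = LINE.search(line)
        if m and int(m.group("hits")) == 0:
            q = m.group("q")
            counts[q] = counts.get(q, 0) + 1
    return counts
```

Run it on a cron schedule and mail the top offenders to the content team; those zero-hit queries are synonym and content candidates.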
49. Query Duration (distribution chart):
• Less than 0.5s: 69%
• 0.5-1.0s: 20%
• 1.0-1.5s: 6%
• 1.5-2.0s: 2%
• 2.0-2.5s: 2%
• >2.5s: 1%
89% of all queries take less than 1s
50. Note: It's harder to get queries into that 0-0.1s range, and it is questionable whether focusing on that leads to noticeable improvement. Over time, we want to see this trend become steeper, which would indicate query durations are getting shorter and performance improvements are becoming more noticeable.
51.
• Project definition <-- Start!
• Precursor Work
• Prototype
• Implementation
• Testing/QA (repeats!)
• Deployment
• Ongoing Tuning <-- Maximize value of investment