SlideShare uma empresa Scribd logo
1 de 73
Apache Solr
                                               - search for everyone!




http://www.flickr.com/photos/malikdhadha/
• Co-founder and R&D Director at Integrasco

• Founder and developer of Notpod

• Leader of javaBin Sørlandet

• Programmer and Open Source enthusiast


                       Jaran Nilsen
                          twitter.com/jarannilsen
A global leader in social intelligence
What is search?
http://www.flickr.com/photos/denverjeffrey/5133538450/
http://www.flickr.com/photos/somegeekintn/3709203268/
This is Apache Solr
• Open Source enterprise search server from
  Apache

• Built on Apache Lucene

• Offers additional features to those of Lucene
First, a little history...
• Started out as a in-house CNET project for adding
  search functionality to the CNET website in 2004
• Started out as a in-house CNET project for adding
  search functionality to the CNET website.

• Donated to Apache Software Foundation in 2006
• Started out as a in-house CNET project for adding
  search functionality to the CNET website.

• Donated to Apache Software Foundation in 2006

• Graduated from incubation status in 2007
• Since version 3.1 (March 2011), Solr and Lucene are
  now sharing the same codebase.




                         +
• Since version 3.1 (March 2011), Solr and Lucene are
  now sharing the same codebase.

• Meaning sharing of features and fixes between the
  projects at a much higher rate




                         +
wget http://apache.uib.no/lucene/solr/3.6.1/apache-solr-3.6.1.tgz

tar xvf apache-solr-3.6.1.tgz

cd apache-solr-3.6.1/example/

java -jar start.jar




                                      4 small steps...
...and we’re up!
cd exampledocs/

./post.sh ipod_other.xml
The obvious part – full text searching




http://www.flickr.com/photos/49889874@N05/6877840735/
• q=yourquery

• Example:
  q=android AND ios&rows=100
Don’t worry - it’s not just XML!
The Schema
http://www.flickr.com/photos/14804582@N08/2111269218/
Key elements of schema.xml

• Unique identifer

• Default search field

• Types

• Fields and dynamic fields

• Copy fields
Solr configuration
http://www.flickr.com/photos/esetianto/4099842490/
Key elements of solrconfig.xml

• Settings for your search index

• Warm-up routines

• Cache settings

• Replication

• Update chain
Features




http://xkcd.com/619/
Facets
Facets
Facets
Just add this to your URL:


• facet=true&facet.field=field

• Example:
  facet=true&facet.field=language
Facet queries
Facet queries

&facet=true
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]
&facet.query=price:[200 TO 300]
&facet.query=price:[300 TO 400]
&facet.query=price:[400 TO 500]
&facet.query=price:[500 TO *]
Now you want to drill down!
http://www.flickr.com/photos/kk/4712925031/
Filter queries
Filter queries
Filter queries
Just add this to your URL:


• fq=field:value

• Example:
  fq=source:facebook.com
Produce «word clouds»
•TermsComponent

•TermVectorComponent
TermVectorComponent

    Term vector information aggregator
Scalability
http://www.flickr.com/photos/dickyfeng/3249837481/
•Sharding

•Replication
Index sharding strategy




   Solr         Solr          Solr           Solr          Solr
instance 1   instance 2    instance 3     instance 4   instance N
Index sharding strategy




   Solr         Solr          Solr           Solr              Solr
instance 1   instance 2    instance 3     instance 4       instance N




  ipod OR iphone                                  Search
Index sharding strategy




   Solr         Solr          Solr           Solr              Solr
instance 1   instance 2    instance 3     instance 4       instance N




  ipod OR iphone                                  Search
Just add this to your URL:

• shards=shard1,shard2

• Example:
 q=android&shards=solr1.node.com/s
 olr,solr2.node.com/solr,solr3.node.co
 m/solr
Replication

Indexer      Master




             Slave      android   Search
Replication configuration
Integration of Solr




http://www.flickr.com/photos/certified_su/229016531/
Solr has support for many different
            languages
•   Ruby
•   PHP
•   Java
•   Scala
•   Python
•   .NET
•   Perl
•   JavaScript
Tips & Gotcha’s
Or; how to avoid the sinkholes!
http://www.flickr.com/photos/67165210@N00/4661419386/
«What data do your
  clients need?»
«Figure out what kind of
  searches you will be
         doing»
«Spend a siginficant
    amount of time
designing schema.xml»
«Add dynamic fields for
 ALL your field types»
«Do not use Solr as your
 primary data store!»
«The 20 million mark»
But most importantly...




Don’t panic!
http://www.flickr.com/photos/11304375@N07/2046228644
http://www.flickr.com/photos/davidw/2201099990/
Thank you!




http://www.jeremiahblatz.com/personal/pics/Australia_Travel_Pictures_2009/day12/164_Sunrise_Great_Barrier_Reef.html

Mais conteúdo relacionado

Mais procurados

Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
Rahul Jain
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
Tommaso Teofili
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
Sourcesense
 

Mais procurados (20)

Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Introduction Apache Solr & PHP
Introduction Apache Solr & PHPIntroduction Apache Solr & PHP
Introduction Apache Solr & PHP
 
Apache Solr Workshop
Apache Solr WorkshopApache Solr Workshop
Apache Solr Workshop
 
Introduction to Apache Solr.
Introduction to Apache Solr.Introduction to Apache Solr.
Introduction to Apache Solr.
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
Apache Solr
Apache SolrApache Solr
Apache Solr
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big features
 
Solr Presentation
Solr PresentationSolr Presentation
Solr Presentation
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Using Apache Solr
Using Apache SolrUsing Apache Solr
Using Apache Solr
 
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so coolEnterprise Search Solution: Apache SOLR. What's available and why it's so cool
Enterprise Search Solution: Apache SOLR. What's available and why it's so cool
 
Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014Solr Masterclass Bangkok, June 2014
Solr Masterclass Bangkok, June 2014
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Dev8d Apache Solr Tutorial
Dev8d Apache Solr TutorialDev8d Apache Solr Tutorial
Dev8d Apache Solr Tutorial
 
Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!Apache Solr! Enterprise Search Solutions at your Fingertips!
Apache Solr! Enterprise Search Solutions at your Fingertips!
 
Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)
 
Integrating the Solr search engine
Integrating the Solr search engineIntegrating the Solr search engine
Integrating the Solr search engine
 
New-Age Search through Apache Solr
New-Age Search through Apache SolrNew-Age Search through Apache Solr
New-Age Search through Apache Solr
 
Get the most out of Solr search with PHP
Get the most out of Solr search with PHPGet the most out of Solr search with PHP
Get the most out of Solr search with PHP
 

Semelhante a Apache Solr - search for everyone!

Yahoo! Search monkey API - CEBIT 2008
Yahoo! Search monkey API - CEBIT 2008Yahoo! Search monkey API - CEBIT 2008
Yahoo! Search monkey API - CEBIT 2008
Eric D.
 

Semelhante a Apache Solr - search for everyone! (20)

ApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big DataApacheCon Europe 2012 -Big Search 4 Big Data
ApacheCon Europe 2012 -Big Search 4 Big Data
 
Big Search with Big Data Principles
Big Search with Big Data PrinciplesBig Search with Big Data Principles
Big Search with Big Data Principles
 
Rails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search EngineRails and the Apache SOLR Search Engine
Rails and the Apache SOLR Search Engine
 
Yahoo! Search monkey API - CEBIT 2008
Yahoo! Search monkey API - CEBIT 2008Yahoo! Search monkey API - CEBIT 2008
Yahoo! Search monkey API - CEBIT 2008
 
Meet Solr For The Tirst Again
Meet Solr For The Tirst AgainMeet Solr For The Tirst Again
Meet Solr For The Tirst Again
 
Apache Solr for TYPO3 what's new 2018
Apache Solr for TYPO3 what's new 2018Apache Solr for TYPO3 what's new 2018
Apache Solr for TYPO3 what's new 2018
 
[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping[LDSP] Search Engine Back End API Solution for Fast Prototyping
[LDSP] Search Engine Back End API Solution for Fast Prototyping
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid prototyping with solr - By Erik Hatcher
Rapid prototyping with solr -  By Erik Hatcher Rapid prototyping with solr -  By Erik Hatcher
Rapid prototyping with solr - By Erik Hatcher
 
Solr @ eBay Kleinanzeigen
Solr @ eBay KleinanzeigenSolr @ eBay Kleinanzeigen
Solr @ eBay Kleinanzeigen
 
Contributing to rails
Contributing to railsContributing to rails
Contributing to rails
 
Practical IronRuby
Practical IronRubyPractical IronRuby
Practical IronRuby
 
Intro to Apache Solr
Intro to Apache SolrIntro to Apache Solr
Intro to Apache Solr
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
 
Apache Solr for TYPO3 at TYPO3 Usergroup Day Netherlands
Apache Solr for TYPO3 at TYPO3 Usergroup Day NetherlandsApache Solr for TYPO3 at TYPO3 Usergroup Day Netherlands
Apache Solr for TYPO3 at TYPO3 Usergroup Day Netherlands
 
Solr search engine with multiple table relation
Solr search engine with multiple table relationSolr search engine with multiple table relation
Solr search engine with multiple table relation
 
Apache Solr 5.0 and beyond
Apache Solr 5.0 and beyondApache Solr 5.0 and beyond
Apache Solr 5.0 and beyond
 
SOLR
SOLRSOLR
SOLR
 
The Guardian Open Platform Content API: Implementation
The Guardian Open Platform Content API: ImplementationThe Guardian Open Platform Content API: Implementation
The Guardian Open Platform Content API: Implementation
 

Mais de Jaran Flaath (10)

Intro to docker
Intro to dockerIntro to docker
Intro to docker
 
SpringBoot
SpringBootSpringBoot
SpringBoot
 
Effective Testing
Effective TestingEffective Testing
Effective Testing
 
Spring Boot
Spring BootSpring Boot
Spring Boot
 
JUnit 5 Alpha
JUnit 5 AlphaJUnit 5 Alpha
JUnit 5 Alpha
 
MyGitBackup
MyGitBackupMyGitBackup
MyGitBackup
 
Delivering Great Software
Delivering Great SoftwareDelivering Great Software
Delivering Great Software
 
Spring Test DBUnit
Spring Test DBUnitSpring Test DBUnit
Spring Test DBUnit
 
Trello
TrelloTrello
Trello
 
Bootstrap lightning talk
Bootstrap lightning talkBootstrap lightning talk
Bootstrap lightning talk
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Apache Solr - search for everyone!

Notas do Editor

  1. Welcome...Purpose of this talk is to show you how easy it is to get started and give you an idea of some of the cool things you can do with Solr without too much effort. I love to get quick and easy to understand introductions to tools, that can help me get started quickly. Hopefully that’s what I’ll be able to provide you guys with today.Solr is a big project and it would be silly to attempt to cover everything in one evening, so I am going to focus on some of the features that I believe are the easiest to get started with and which I also am well familiar with and have had good experience with.So let’s get cracking...
  2. Most of you already know me, but I see there are some new faces...
  3. Since 2004, Integrasco has been providing social media methodologies and technologies to corporations, agencies, government regulators and other institutionsDedicated team of vertical expertsTechnology platform for analysis of Social Media where Apache Solr is a vital component.
  4. But enough about me and Integrasco, i bet that’s not why most of you came to this meetup. Let’s talk search. What is search?
  5. I bet many of you think of get this image in your head when you’re thinking about search. For most people today, the term search is equal to the name Google. Because of this «to google» has even entered our dictionaries as an officially accepted verb.Search equals input box and search button! In in many cases this is true. But as Google has become expert on, and which hopefully we will discover Solr can help us do as well – it’s not about searching... ->
  6. It’s all about finding – helping your users find the information they are seeking – not having them search for it! You may think I am just playing with words here, but in my opinion there is a big different between «searching» and «finding». Let’s not fall into a discussion about semantics here, but let’s just say that our job as engineers is to help people find the information they are looking for, and spend our time efficiently doing that instead of spending our time designing search boxes and buttons!
  7. You don’t want your users having spend hours searching on your site (or perhaps you do, if your revenue is driven by advertisement, but let’s also put that discussion on the shelf for later). We want to have our users finding what they’re looking at with little effort and give them a good experience. The great thing is that Solr can help you with just that, to find stuff. It has a lot of features that can improve your user experience and bring value to your data without too much effort. And as we will see it does not have to be linked to search engines at all! That’s eactly what I hope to show you some of today.
  8. Apache Solr is an open source...... But before we get into the juicy details...
  9. ... lets learn a little history.
  10. Commo codebase since March 2011Which means...
  11. ... They are now sharing features and fixes at a much higher rate than what was the case before.Many often wonder what’s the difference between Solr and Lucene, and up until this merge Solr was often considered to be an additional layer on top of Lucene, providing additional functionality that was not available in Lucene.
  12. What I often find frustrating when looking at new frameworks and technologies, is in many cases the amount of time and resources you have to invest in order to try it out. I am not talking about reading documentation to get a deeper understanding of it – which you eventually have to, but the time you have to invest in order to just get started... I love those 3-point very quick «getting started» guides that just works. That’s what I hope to leave behind here today.To get started with Solr is actually very, very simple. Even though my examples here have been taken from a Unix environment, I’ve tried to make them as platform and language independent as possible. I myself work in a Java environment, but Solr is fully possible to use in many other environments - just as well as from Java.
  13. I’ve tried to shave the process of getting Solr up and running down as much as possible, and I actually came down to these four steps. This is actually all you need to be up and running with a working instance of Solr. There is obviously a lot of configuration and customization you can do to tailor Solr to your specific needs, but to get started playing this is actually all you need to do.
  14. Solr is served by Jetty on port 8983 by default, and opening the solr admin application in a browser yields this view. It’s by no means i candy, but then again – that is YOUR part of the JOB, to create a good looking application that helps find the valuable information Solr can serve you.As you see in the middle here, there’s a search box and a search button – who would have thought that? Let’s cick it!
  15. Voila – there’s not much here yet.Obviously becaus we haven’t actually indexed anything yet. So let’s add some data.
  16. Solr comes with a good set of example data that you can easily import and index to play around and see some of Solr capabilities. These example documents comes with the downloaded package and can be imported using the bash scripts available in the exampledocs folder. Let’s import a couple of ipod related documents and see what happens!
  17. We refresh our search from before and... Behold! We have search results. Don’t be scared by the XML output here, there’s several tools available to work with the response!... So, now that we have some data – it takes us to the obvious part of Solr
  18. Full text searching!This is what Solr was made for and whenever you have a set of documents that you want to query against, you should consider adding a solr instance to your system and query against the documents there – rather against a database. You will quickly see the value it adds! Now, as with most other things in feature, the querying is done via a parameter in the URL...
  19. Very simple!Now lets go back to our admin user interface for another example – just to make sure we don’t scare anyone off with URL parameters in addition to the XML!
  20. If you remember the query from before, it was «asterix, colon, asterix». Solr INDEXES are composed of FIELDS, and you can specify what field you want to search by querying «field, colon, value». So when we searched for «asterix, colon, asterix» we were searching for all values in all fields – hence giving us all documents in the index.Let’s see what happens when we search for documents where the PRICE field contains the value 19.95.
  21. ... Unsurprisingly we get documents matching the price 19.95. Ok, but what if you want to add your own test data to play around with? Let’s look quickly at the input format that were used.
  22. It looks like this. A very simpe XML structure that you can easily modify to fit your needs.
  23. Here is the full ipod example we just imported an tested with.
  24. As I said earlier – do not be scared by the XML if you feel it’s overwhelming. There are several different options both consuming the response, as well as for indexing data into Solr. And you do not need to specify all your input data in files, there are various means for importing from databases and other commonly used data sources. The example we have looked at so far has been centered around products from a shop. Obviously this is not likely to fit all your needs. As with any other technology for handling data, you need to define a DOMAIN, or a SCHEMA....
  25. In Solr, the Schema is one of the core configurations you need to master. It’s at the core of your Solr solution and it’s important to spend time designing this schema. Let’s see what it looks like...
  26. Of the key elements, we find....
  27. We define the different types we want to have available in our schema...
  28. ... And finally we define all the fields that we want in our schema. One neat thing with Solr, is the elements defined towards the bottom here – the DYNAMIC FIELDS. These allows us to specify any field we want on a document basis, after the index is up and running. This gives us good flexibility in cases where only some document contains some special data. An EXAMPLE from one of our projects at INTEGRASCO was when we received a batch of data we needed to make available for analysis together with our social data. This data contained a lot of META data that did not make sense for social data, but we needed to be able to perform queries across everything. In this situation, our dynamic fields became very handy, as we could simply define them when indexing the new Frankenstein data and it fit nicely in with the rest of our social data.
  29. The second important part of Solr.
  30. Solrconfig.xml is a complex XML document, but as important to spend time with as the Schema.xml – however, we will go into detail on it today as we don’t have time, but here’s some of the key elements that you configure in this file: Settings... How solr should handle your index files etc, Update chain ... You can define your own components for introducing into the indexing processing.
  31. Now that we’ve got solr configured and up and running, lets look closer at what it can DO beyond simple searching!
  32. The first thing I want to show is FACETS.Facets are category counts for search results.Meaning you can provide details about which categories your search results falls within. And as we will see, these categories can be a lot of things. Facets is in many situations also mentioned as FACETED NAVIGATION. Facets are a very powerful feature of solr when it comes to navigating your search results, and as we said in the beginning – it’s not just about searching, it’s about finding. And facets are a good way to provide your users with neat ways to navigate your data.
  33. Here is another good example of how facets CAN be used to enable navigation. This is taken from Finn.no. Now, I do not know whether Finn.no is USING Solr and facets for this information, but it’s a good example of how facets can be used to enable powerful and user friendly navigation.
  34. Yet another example of search facets, or faceted navigation. This time from the webshop Komplett.no, where – after you have done a search – you will get details about CATEGORIES, PRODUCERS and PRICE RANGES.
  35. And this is how you do this withSolr. You simply add a few parameters to your QUERY URL, specifying what field, or fields you wish you receive facet information for – and you’re good to go!To illustrate the output I’ve done a search in our system using the example parameters here, and this would yield....
  36. ... This output. As you see here, we get the top 10 languages for my search. You’re not limited to only getting top 10, you choose how many facet counts you want.
  37. Another cool thing is that you are not limited to just getting facets for predefined fields, you can also use FACET QUERIES to provide unique value for your data. Here’s one example from our system, where we use facet queries on date fields to generate statistics for provided queries and display trends over time.The way we do this is to add a facet query for each of the timespans we want in our chart. Then we simply render those facet query counts in a chart.Another very common way to use facet queries, is for instance to generate price range information in online stores – like we saw in the screenshot from Komplett.no. (which we have an example query sting for here ... On the next slide)
  38. So, once we have our FACETs, we may want to drill further down into our data. This is done via FILTER QUERIES. These are constraints you apply to your query to filter the results you get back from Solr.
  39. Here we are going to filter on our facets as we do it in our solution. We have performed a search, and we wish to drill down into this by only looking at Facebook data....
  40. So we choose the facebook Facet...
  41. ... and this adds a filter to our existing query – limiting our search within the MEDIA type FACEBOOK.Very easy. And the way we actually do this, in Solr....
  42. … is by adding “fq” parameters to our query. In this case, we added a filter for Facebook. And you can add as many of these as you want, so you’re not limited to filtering on only one thing at the time!Another important thing about filter queries are that they are a big help for optimizing your queries. The reason for this is that they are applied BEFORE other queries are done, which means they limit the number of documents Solr has to query.
  43. The next thing I want to talk about, which in some cases is a bit more tricky to get working, but is definitively worth it for your users – is WORD CLOUDS (or tag clouds, term clouds – whatever you like to call them)We like to call them BUZZ CLOUDS because they tell us what’s buzzing around i the Social Media space.
  44. The way you do this is via SPECIAL SEARCH COMPONENTS in Solr. These are components that you enable in your Solr config which provides access to additional information about the terms and term vectors Solr is using for your index and individual documents. The TERMS COMPONENT is a way to get information from Solr about all the terms available in your index, and how many documents they appear in. So you get the DOCUMENT FREQUENCY for all your terms. This can be used for creating an auto-suggest feature for your search box, although newer versions of Solr contains a special componet for doing just that – so it’s recommended to use that in stead. However, the reason I mention it here is that it is absolutely possible to use this to create such a word cloud for your index. What we decided was a better solution, althought a bit more processing heavy solution, was to use the TermVectorComponent. This component gives you term vector information for individual documents, rather than the entire index. This means we can get information about the term which are included in our resultset and provide a word cloud for that, rather than the entire index.
  45. They way we are doing this is by performing a query, then aggregating the term vector information for the documents i our search result. This means a bit more processing since we have to traverse the documents we’re interested in – and aggregate the term frequencies for each document – we are counting how often the terms appear in our resultset. We then use this information to render the terms based on how often they appear....
  46. ... Which in the end enables us to display clouds like this
  47. Now that we’ve looked at some of the powerful, easy to use features of Solr – how do we scale it?The question is, do we continue to build towards the sky, adding more and more memory and processing power, or do you spread it out across smaller instances and distribute across them? How do we ensure that our indexes keeps a decent size and that we can distribute our search? There’s two key features which are useful for this, and which are easily available in Solr...
  48. ... That’s SHARDING and REPLICATION.
  49. From an INDEXINGperspective sharding is about determining where to index documents. You do this using a SHARDING STRATEGY. This can be anything from an ID BASED strategy, where you place documents in different Solr instances based on a unique ID. Other strategies can be by GEOGRAPHY, USERNAMES – or as we do at Integrasco, build a strategy based on DATES.The drawback here, is that Solr does not support sharded indexing out of the box, so you need to develop the framework for this yourself. The way we have done this is by creating an index writer for each shard and connecting these index writers to our sharding strategy so it selects the right one to dispatch documents to.
  50. From the SEARCH perspective sharding is very easy. You simply provide your COORDINATING INSTANCE (which can be any of your available instances), with a list of the URLs to the shards you wish to distribute the search across. The coordinating instance will then handle DISTRIBUTION of queries to all the specified shards, and CONSOLIDATE the result before it’s send back to you.
  51. And you do not have to query all shards, it’s a fairly easy job to get your client to only specify relevant shards when performing the query. For instance in the case of Integrasco, where we have sharded based on date – we really don’t have to query the entire cluster when we know we only want to search data for 2011. Then we can select the shards for 2011 and only specify those as the shards we wish the query sent off to.
  52. And how do we do this? Again, by adding some parameters to our query URL containing the addresses of the shards we want to distribute the query across.
  53. When it comes to replication, we build a common MASTER SLAVE relationship, where we have a set of masters where we do all our WRITING and a set of slaves where we do all our SEARCHING. This way we can keep the masters fairly cheap on resources as they do not require that we keep caches up to date in memory, as they do not need to handle any queries. Multiple slavesRepeaters
  54. And the configuration for this is just a cople of lines of XML code in your SOLRCONFIG xml file, as you see examples of here.The only bad thing with the replication feature in Solr that we have seen is that is is very resource hungry when it is replicating. It sucks as much bandwidth and writes to disk as fast as it can, and in cases where there are large amounts of data in each replication batch this can quickly lead to unwanted starvation. So make sure you properly test your replication setup before going into production.
  55. When creating slide decks, I love to go on CreativeCommons.org and search for the main topic of the section and use one of the first pictures as the background. That sometimes leads to interesting slides like this one...Anyways, it’s about integration of Solr – how do you integrate it in your project. And not surprisingly, Solr comes with a lot of client libraries for different languages...
  56. As you can see Solr has good support for many different languages, and most of this are easy-to-use client libraries offering METHODS FOR ACCESSING all of the features we’ve covered in this talk. At Integrasco we’re mostly using Java, so the SolrJ library is what we’re using and it’s providing a very easy to use interface for accessing most of the Solr features we need – more or less right out of the box. I think the only two things we have had to do MORE EXTENSIVE DEVELOPMENT of on top of Solr and SolrJ is SHARDED INDEXING and the WORD CLOUDS using the TermVectorComponent.
  57. A very quick example of the use of SolrJ in Java.
  58. Use sufficient time to analyze and figure out what data your clients need and are interested in. This should be at the core of your planning and help shape the design of your schema and configuration.
  59. Similarly, figure out what kind of searches you will be doing. And make sure your schema allows for these queries to happen. Also, make sure you configure the appropriate warm-up queries to ensure your cache is performing optimally for the type of queries you are doing.
  60. Once you have the previous two covered, make sure you spend significant time designing your schema.xml. As I said this is at the core of your Solr solution and can be difficult to change once you have a lot of data indexed.
  61. This is a very good practice to follow. You never know when you might need a new field for a document, make sure you’re prepared with dynamic fields.
  62. Solr is not for storing data! The documentation even says you should not do it because you should expect the index to become corrupted at one time or another.
  63. As discovered by Twitter (and perhaps others), 20 million documents is the max size you should aim for when it comes to your shard size.
  64. There is a lot of levers and switches in Solr to use for optimization and shaping your search solution to exactly fit your needs. However, don’t mind this in the beginning. Mix and match from the default schema to get familiar with how it works. Start out very simple and build on it. You’ll learn it very quickly and see that search functionality is not just for the large players!
  65. Looking a little further ahead into the future, there’s a lot of exciting things going on with Solr. One of the most exciting developments from my point of view is the Solr Cloud product – which is built on ZooKeeper and will offer a much better setup for large search clusters. Also, there’s work going on with building a Solr distribution based on Hadoop for large scale distributed indexing and search. So it will be very interesting to see where Solr takes us over the next couple of years!
  66. This has been a little taste of what Solr has to offer – and hopefully shown you that it’s not just for large enterprises or people with huge amounts of data. Hopefully you have seen that Solr can fit very well into situation where you normally would not think of placing a search server, and that you start thinking of new ways help your users FIND what they’re looking for – or even better, help them find what they did not know they needed to look for 