Full Text Search with
     Apache Solr
        Pittaya Sroilong
      pittaya@gmail.com
Who am I?
Solr?
Not her!
But a search server
based on Lucene
Lucene?
Full-text search
     library
100% java
   :-(
Solr is based on
     Lucene
XML/HTTP, JSON
  interface
Open Source
Shields us from using
        Java
         :-)
Who uses Solr/Lucene?
What is our problem?
How do we
implement this?
SELECT * FROM post WHERE
topic LIKE '%aoi%' OR author
LIKE '%aoi%' ORDER BY id DESC
SELECT * FROM post WHERE
(topic LIKE '%aoi%' OR author
LIKE '%aoi%')
OR
(topic LIKE '%miyabi%' OR
author LIKE '%miyabi%')
ORDER BY id DESC
Full table scan
         =
Performance killer
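The full-scan claim can be checked on a small scale. The sketch below uses Python's built-in sqlite3 module as a stand-in for the RDBMS (an assumption; the slides do not name one) and a hypothetical three-row post table. Both columns are indexed, yet a LIKE pattern with a leading wildcard cannot use those indexes, so the query plan should still report a scan of the table.

```python
import sqlite3

# In-memory SQLite database as a stand-in RDBMS (assumption, not from the slides).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, topic TEXT, author TEXT)")
conn.execute("CREATE INDEX idx_topic ON post (topic)")
conn.execute("CREATE INDEX idx_author ON post (author)")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO post (topic, author) VALUES (?, ?)",
    [("aoi fan club", "ken"), ("weekend trip", "aoi"), ("lunch ideas", "mai")],
)

query = (
    "SELECT * FROM post "
    "WHERE topic LIKE '%aoi%' OR author LIKE '%aoi%' "
    "ORDER BY id DESC"
)

rows = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(rows)
print(plan)
```

The rows come back in id order only; nothing in the result says which post is the better match, which is the "no search scoring" problem.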
No search scoring
RDBMS isn’t designed
    to do this
Use the right tool!
[Diagram: the Indexer updates the index in Solr (built on Lucene); the Web App sends a Query to Solr and gets back a Result]
1
Define schema.xml
<field name="id" type="string"
indexed="true" stored="true" />
<field name="fullname" type="string"
indexed="true" stored="true" />
<field name="position" type="string"
indexed="true" stored="true" />
<field name="tag" type="stringi"
indexed="true" stored="true"
multiValued="true" />
2
Deploy on any J2EE
    container
Tomcat, Jetty, etc.
3
Index documents
Document format
<add><doc>
 <field name="id">555</field>
 <field name="fullname">Kaka</field>
 <field name="position">Midfielder</field>
 <field name="tag">AC Milan</field>
 <field name="tag">Brazil</field>
</doc></add>
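An add message like this one can be generated rather than written by hand. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name build_add_xml and the dict layout are illustrative assumptions, not part of Solr. List values are written as repeated field elements, which is how a multiValued field such as tag is sent.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict (hypothetical helper).

    A list value becomes one <field> element per item (for multiValued fields).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(item)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


player = {
    "id": 555,
    "fullname": "Kaka",
    "position": "Midfielder",
    "tag": ["AC Milan", "Brazil"],
}
add_xml = build_add_xml(player)
print(add_xml)
```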
Post to Solr
http://<host>/solr/update
Any language that can
   do HTTP POST
PHP, Perl, Python
cURL
Commit
<commit />
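Posting and committing can be sketched with nothing but Python's standard library. The example below only prepares the two HTTP requests; the host and port in SOLR_URL and the helper name build_update_request are assumptions for illustration, and the commented-out lines show how the requests would be sent to a running Solr instance.

```python
import urllib.request

SOLR_URL = "http://localhost:8983/solr"  # hypothetical host and port


def build_update_request(base_url, xml_body):
    """Prepare (but do not send) an HTTP POST to the /update URL (hypothetical helper)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# One request adds a document, the second makes it visible to searches.
add_req = build_update_request(
    SOLR_URL, '<add><doc><field name="id">555</field></doc></add>'
)
commit_req = build_update_request(SOLR_URL, "<commit/>")

print(add_req.get_method(), add_req.full_url)

# To send them against a running Solr instance:
# for req in (add_req, commit_req):
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status)
```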
4
Search
Query from
http://<host>/solr/select
Use Solr query syntax
http://<host>/solr/select?q=tag:madrid&start=0&rows=2&fl=fullname,position,tag
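A query URL of this shape is easier to get right when the parameters are encoded programmatically. This is a small sketch with urllib.parse; the helper name build_select_url and the localhost address are assumptions. q is the query, start and rows page through the results, and fl (field list) limits which stored fields come back.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, start=0, rows=10, fields=None, wt=None):
    """Assemble a /select URL with properly escaped parameters (hypothetical helper)."""
    params = {"q": query, "start": start, "rows": rows}
    if fields:
        params["fl"] = ",".join(fields)  # field list: which stored fields to return
    if wt:
        params["wt"] = wt  # response writer, e.g. "json"
    return base_url.rstrip("/") + "/select?" + urlencode(params)


url = build_select_url(
    "http://localhost:8983/solr",  # hypothetical host and port
    "tag:madrid",
    rows=2,
    fields=["fullname", "position", "tag"],
)
print(url)
```

The colon and commas are percent-encoded in the output; Solr decodes them back to tag:madrid and fullname,position,tag.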
Response in XML or
JSON (configurable)
<response>
 <result numFound="46" start="0">
  <doc>
    <str name="fullname">Sergio Ramos</str>
    <str name="position">Defender</str>
    <str name="tag">Real Madrid</str>
    <str name="tag">Spain</str>
  </doc>
  <doc>
    <str name="fullname">Diego Forlan</str>
    <str name="position">Striker</str>
    <str name="tag">Atletico Madrid</str>
    <str name="tag">Uruguay</str>
  </doc>
 </result>
</response>
&wt=json
{
 "response": { "numFound": 46, "start": 0,
   "docs" : [
     { "fullname": "Sergio Ramos",
       "position": "Defender",
       "tag": ["Real Madrid", "Spain"] },
     { "fullname": "Diego Forlan",
       "position": "Striker",
       "tag": ["Atletico Madrid", "Uruguay"] }
   ]
 }
}
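On the client side the JSON form needs very little code. The sketch below parses a hard-coded sample instead of a live response; Solr's JSON writer puts the hit count and the document list under a top-level response key, and the helper name summarize is an assumption.

```python
import json

# Hard-coded sample standing in for the body of a real &wt=json response.
sample = """
{
  "response": {
    "numFound": 46, "start": 0,
    "docs": [
      {"fullname": "Sergio Ramos", "position": "Defender",
       "tag": ["Real Madrid", "Spain"]},
      {"fullname": "Diego Forlan", "position": "Striker",
       "tag": ["Atletico Madrid", "Uruguay"]}
    ]
  }
}
"""


def summarize(raw):
    """Return the total hit count and the names on the current page (hypothetical helper)."""
    body = json.loads(raw)["response"]
    names = [doc["fullname"] for doc in body["docs"]]
    return body["numFound"], names


total, names = summarize(sample)
print(total, names)
```

numFound is the total number of matches (46 here), while docs holds only the page selected by start and rows.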
Query examples
• David Pizzarro
 • Equiv: David OR Pizzarro
 • Default operator is
   "OR" (configurable)
 • Result: David Villa, David
   Pizzarro, Claudio Pizzarro,
   David Seaman
• +David +tag:Roma
 • Equiv: David AND tag:Roma
 • Result: David Pizzarro
• +David +position:(Striker OR
 Midfielder)
 • Result: David Villa, David
   Pizzarro
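The difference between optional terms and required (+) terms can be mimicked with a toy filter in plain Python. This is not Solr's query parser, only a model of the matching logic, and the positions and tags given to the four players are illustrative assumptions. Real Solr also ranks matches by score, which this sketch does not attempt.

```python
# Illustrative data; positions and most tags are assumptions, not from the slides.
players = [
    {"fullname": "David Villa", "position": "Striker", "tag": ["Spain"]},
    {"fullname": "David Pizzarro", "position": "Midfielder", "tag": ["Roma"]},
    {"fullname": "Claudio Pizzarro", "position": "Striker", "tag": ["Peru"]},
    {"fullname": "David Seaman", "position": "Goalkeeper", "tag": ["England"]},
]


def name_has(doc, term):
    """True if the term is one of the words in fullname (case-insensitive)."""
    return term.lower() in doc["fullname"].lower().split()


def names(docs):
    return [d["fullname"] for d in docs]


# David Pizzarro  ->  David OR Pizzarro: either term is enough.
either = names([d for d in players if name_has(d, "David") or name_has(d, "Pizzarro")])

# +David +tag:Roma  ->  both clauses are required.
david_roma = names([d for d in players if name_has(d, "David") and "Roma" in d["tag"]])

# +David +position:(Striker OR Midfielder)
david_forward = names(
    [d for d in players
     if name_has(d, "David") and d["position"] in ("Striker", "Midfielder")]
)

print(either)
print(david_roma)
print(david_forward)
```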
Updating
Post new document to
http://<host>/solr/update
Deleting
<delete>
<id>345</id>
</delete>
<delete>
<query>tag:Brazil</query>
</delete>
<delete>
<query>*:*</query>
</delete>
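The delete messages can be built the same way as add messages. In this sketch the two helper names are assumptions; the resulting XML is posted to the same /update URL and, like an add, takes effect for searches only after a commit. Note that the *:* query matches every document, so the third message empties the index.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_id):
    """Build a <delete> message for a single document id (hypothetical helper)."""
    root = ET.Element("delete")
    ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a <delete> message for every document matching a query (hypothetical helper)."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query  # special characters are escaped
    return ET.tostring(root, encoding="unicode")


one_doc = delete_by_id(345)
by_tag = delete_by_query("tag:Brazil")
everything = delete_by_query("*:*")

print(one_doc)
print(by_tag)
print(everything)
```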
Thai support
fwdder.com
Sharing forward mails
Use customized field
   in schema.xml
<fieldType name="html_th" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.HTMLStripStandardTokenizerFactory"/>
    <filter class="solr.ThaiWordFilterFactory" />
    <filter class="solr.StopFilterFactory"
            ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
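An analyzer is a chain: the tokenizer splits the text, then each filter transforms the token stream in the order listed. The toy Python model below imitates only part of that chain (HTML stripping, tokenizing, the case-insensitive stop filter, lowercasing). Thai word segmentation, Porter stemming and duplicate removal are left out, and the stopword set is a hypothetical stand-in for stopwords.txt, so this is a rough illustration of the idea, not of Solr's actual output.

```python
import re

STOPWORDS = {"the", "a", "of"}  # hypothetical stand-in for stopwords.txt


def strip_html(text):
    """Replace anything that looks like a tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Split into runs of word characters."""
    return re.findall(r"\w+", text)


def stop_filter(tokens):
    """Drop stopwords, comparing case-insensitively (like ignoreCase="true")."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def lower_case(tokens):
    return [t.lower() for t in tokens]


def analyze(text):
    """Run the tokenizer, then each filter in order."""
    tokens = tokenize(strip_html(text))
    for step in (stop_filter, lower_case):
        tokens = step(tokens)
    return tokens


tokens = analyze("<p>The <b>Best</b> of Forward Mails</p>")
print(tokens)
```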
<field name="id" type="string"
indexed="true" stored="true" />
<field name="title" type="html_th"
indexed="true" stored="true" />
<field name="detail" type="html_th"
indexed="true" stored="true" />
<field name="tag" type="stringi"
indexed="true" stored="true"
multiValued="true" />
<field name="userid" type="integer"
indexed="false" stored="true" />
Index analyzer
Debugging
&debugQuery=on
Further reading
•   http://lucene.apache.org/solr/
•   http://wiki.apache.org/solr
•   http://www.xml.com/pub/a/2006/08/09/solr-indexing-xml-with-lucene-andrest.html
•   http://lucene.apache.org/java/docs/scoring.html
Q&A
