Small wins in a small time with Apache Solr
Who am I?


    My (Buddhist) name is Upayavira

    Consultant with Sourcesense, specialising in
    search and operational technologies

    A member of the Apache Software Foundation
Who are Sourcesense?


    Open Source integrator, specialising in:
    
        Search
    
        Business Intelligence
    
        Content Management
    
        Application Lifecycle Management

    Offices in London, Amsterdam, Milan and Rome
Committers and Contributors

     Search:
     
            Lucene/Solr – contributor
     
            Hibernate Search – committer
     
            Lucene Infinispan integration – lead developer
     
            Apache UIMA – committer

     CMS:
     
            Apache Chemistry – contributor
     
            Apache Jackrabbit – contributor
     
            JBoss GateIn Portal – committer
     
            OpenSSO-Alfresco – contributor
What is Lucene?


    Lucene is a Java information retrieval library

    Provides free text search facilities

    Started in 2000, by Doug Cutting

    A project of the Apache Software Foundation

    It is designed to be embedded in Java apps
What is Solr?


    Solr is an enterprise search server based on
    Lucene

    Wraps Lucene with a RESTful web interface

    Provides configurable schema

    Provides replication functionality
Solr Design

    [Diagram: inside a Solr instance, user queries reach the Lucene index
    through the SearchHandler, and a content application feeds the same
    index through the UpdateRequestHandler.]
Prerequisites



    Java, preferably Java 6

    Apache Solr 1.4.1

    http://www.sourcesense.com/dev8d-solr.zip
Prerequisites

    Extract your Solr distribution

    At a command prompt:
    – cd into the unzipped distribution directory
    – cd into the example directory
    – Enter: java -jar start.jar

    Visit http://localhost:8983/solr/ in a browser. If you see a
    welcome message, your Solr works

    Unpack your dev8d-solr.zip file

    At another command prompt, cd into your dev8d-solr
    directory
Checking Solr Works


    Visit http://localhost:8983/solr/admin/

    You should see the Solr admin page.

    Click the Statistics link

    You'll see numDocs: 0

    There's nothing in the index, so searches won't show
    much

    So we need to index some sample content
Indexing Sample Content



    In your dev8d-solr directory (extracted from the zip), at
    a command prompt:

    java -jar post.jar wikipedia-basic.xml
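
    post.jar sends the file to Solr's XML update handler, so the file needs to be in Solr's <add><doc><field> update format. The following is a minimal sketch of producing such a message in Python. The field names are assumptions for illustration: title and author appear in the later queries, id is a guess, and build_add_xml is a hypothetical helper, not part of Solr.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialise a list of {field: value-or-list} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list becomes one <field> element per value (multi-valued field).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; the real wikipedia-basic.xml may use other fields.
sample = build_add_xml([{"id": "1", "title": "Computer", "author": "yobot"}])
print(sample)
```

    Saving the output to a file and passing that file to post.jar should index it in the same way as the sample content, provided the field names exist in schema.xml.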
Searching




    http://localhost:8983/solr/select?q=*:*
Searching




    http://localhost:8983/solr/select?q=computers
Searching




    http://localhost:8983/solr/select?q=computer systems
Searching




     http://localhost:8983/solr/select?q=computers OR systems
Searching




     http://localhost:8983/solr/select?q=computers AND systems
Searching




     http://localhost:8983/solr/select?q="computer systems"
Searching




     http://localhost:8983/solr/select?q="computer systems"~10
Searching




     http://localhost:8983/solr/select?q=computers NOT data
Searching




     http://localhost:8983/solr/select?q=computers -data
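
     A browser quietly encodes the spaces and quotes in the URLs above. When calling Solr from code, the q parameter has to be encoded explicitly. A minimal sketch using only the Python standard library follows; select_url is a hypothetical helper name, and the base URL is the one used on these slides.

```python
from urllib.parse import quote, urlencode

SELECT_URL = "http://localhost:8983/solr/select"


def select_url(q, **params):
    """Build a /select URL, percent-encoding the query string and any extras."""
    query = {"q": q}
    query.update(params)
    # quote_via=quote encodes a space as %20 rather than '+'.
    return SELECT_URL + "?" + urlencode(query, quote_via=quote)


for q in ["*:*", "computer systems", "computers AND systems",
          '"computer systems"', "computers NOT data", "computers -data"]:
    print(select_url(q))
```

     Only the encoding changes; the query syntax (AND, OR, NOT, quotes for phrases, a leading minus for exclusion) is passed through to Solr untouched.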
Searching




     http://localhost:8983/solr/select/?q=computers&fl=title
Searching




     http://localhost:8983/solr/select/?q=computers&fq=author:yobot
Searching



     http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
Searching



     http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
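
     The rows and start parameters page through results: rows is the page size and start is the zero-based offset of the first result. A small sketch of turning a page number into these parameters follows; page_params and paged_url are hypothetical helpers, and pages are assumed to be numbered from 1.

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Translate a 1-based page number into Solr's rows/start parameters."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return {"rows": rows, "start": (page - 1) * rows}


def paged_url(q, page, rows=10, fl="title"):
    """Build a /select URL for one page of results."""
    params = {"q": q, **page_params(page, rows), "fl": fl}
    return "http://localhost:8983/solr/select/?" + urlencode(params)


print(paged_url("computers", 2))
```

     Page 2 with ten rows per page gives start=10, which is the URL shown on the slide.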
Searching




     http://localhost:8983/solr/select/?q=title:system&fl=title
Searching



     http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
Searching



     http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
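
     The facet URLs above differ only in a few optional parameters, so they are easy to generate. A sketch of a builder that reproduces them follows; facet_url is a hypothetical helper, and rows=0 is kept because the slides ask for facet counts without the matching documents.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select/?"


def facet_url(q, field, sort="count", mincount=None, limit=None, debug=False):
    """Build a facet-only query URL (rows=0 suppresses the document list)."""
    params = [("q", q), ("facet", "true"), ("facet.field", field),
              ("rows", 0), ("facet.sort", sort)]
    if mincount is not None:
        params.append(("facet.mincount", mincount))  # hide values below this count
    if limit is not None:
        params.append(("facet.limit", limit))        # return at most this many values
    if debug:
        params.append(("debugQuery", "true"))
    return BASE + urlencode(params)


print(facet_url("computers", "author", limit=3))
print(facet_url("computers", "author", mincount=2))
```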
Searching




     http://localhost:8983/solr/select?q=computer&wt=json
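
     With wt=json the response can be read with any JSON library. In Solr's default JSON layout, the counts for a facet field arrive as one flat list that alternates value and count. A sketch of pairing them up follows. The SAMPLE text is hand-written for illustration and is not real output from the tutorial index; facet_pairs is a hypothetical helper.

```python
import json

# Hand-written stand-in for a wt=json response with facet=true&facet.field=author.
SAMPLE = """
{"response": {"numFound": 3, "start": 0, "docs": []},
 "facet_counts": {"facet_queries": {},
                  "facet_fields": {"author": ["yobot", 2, "alice", 1]},
                  "facet_dates": {}}}
"""


def facet_pairs(response_text, field):
    """Return [(value, count), ...] for one facet field of a JSON response."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Even positions hold the values, odd positions hold their counts.
    return list(zip(flat[0::2], flat[1::2]))


print(facet_pairs(SAMPLE, "author"))
```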
Searching




     http://localhost:8983/solr/select?q=computer&wt=javabin
Indexing
Indexing



     Load wikipedia-basic.xml into a text editor or web browser

     Load wikipedia-enhanced.xml into a text editor or browser

     Load example/solr/conf/schema.xml into a text editor
Indexing



     schema.xml defines field types and fields used in Solr

     Equivalent to your database schema in an RDBMS
Indexing


     Change the "links" and "category" fields in schema.xml to be of type "string"
     and add multiValued="true" to each:
      <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
      <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
Indexing


     Now add this to the <fields> section of schema.xml:

     <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>

     <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>

     Now search for the “textgen” field type definition, further up
     in the file.
Indexing



     Near the bottom of schema.xml, inside the <schema> element, add the following:
     <copyField source="text" dest="textgen"/>
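
     A copyField rule only works if both of the fields it names are declared. The sketch below checks that on a schema fragment before Solr is restarted. SCHEMA_FRAGMENT is a cut-down, hand-written example rather than the real example schema, check_copy_fields is a hypothetical helper, and the check ignores dynamic fields and wildcard sources, which a real schema may use.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; the real schema.xml is much longer.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.2">
  <fields>
    <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>
    <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <copyField source="text" dest="textgen"/>
</schema>
"""


def check_copy_fields(schema_xml):
    """Return a list of problems found in the copyField rules (empty if none)."""
    root = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in root.iter("field")}
    problems = []
    for rule in root.iter("copyField"):
        for attr in ("source", "dest"):
            name = rule.get(attr)
            if name not in declared:
                problems.append(f"copyField {attr} '{name}' is not a declared field")
    return problems


print(check_copy_fields(SCHEMA_FRAGMENT))
```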
Indexing



     Restart Solr so that the schema changes take effect. Then, at your
     command prompt, in the dev8d-solr directory, execute:

     java -jar post.jar wikipedia-enhanced.xml
More Advanced Searching



     http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
More Advanced Searching



     http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
More Advanced Searching



     http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20
More Advanced Searching



     http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
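
     The terms.prefix parameter restricts the returned terms to those starting with a given string, which is the building block for a type-ahead suggestion box. A sketch of generating such URLs as a user types follows; terms_url is a hypothetical helper, and the /terms handler is assumed to be configured as in the tutorial setup.

```python
from urllib.parse import urlencode


def terms_url(field, prefix=None, limit=20):
    """Build a /terms URL listing indexed terms of a field, optionally by prefix."""
    params = [("terms.fl", field), ("terms", "true"), ("terms.limit", limit)]
    if prefix:
        params.append(("terms.prefix", prefix))
    return "http://localhost:8983/solr/terms?" + urlencode(params)


# One request per keystroke as the user types "at", "ato", "atom".
for typed in ["at", "ato", "atom"]:
    print(terms_url("textgen", prefix=typed))
```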
Thank you
upayavira@sourcesense.com
Solr Host Configuration

       [Diagram, built up over several slides:
        1. Searches go to three hosts: shard 1, shard 2 and shard 3.
        2. A co-ordinator is added alongside the three shards.
        3. A load balancer is added.
        4. The shards and co-ordinator are duplicated, giving two sets of
           three shards, each with its own co-ordinator, and one load balancer.]
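
       One way to read the co-ordinator is as a Solr instance that fans each query out to the shards and merges the results. Solr's distributed search does this through the shards request parameter, a comma-separated list of host:port/path entries. The sketch below assumes that is what the diagram refers to. The host names are invented for illustration, and distributed_select_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Invented host names; substitute the real shard addresses.
SHARDS = ["shard1:8983/solr", "shard2:8983/solr", "shard3:8983/solr"]


def distributed_select_url(coordinator, shards, q):
    """Build a query URL asking the co-ordinator to search every listed shard."""
    params = {"q": q, "shards": ",".join(shards)}
    # Keep ',', ':' and '/' readable in the shards list instead of escaping them.
    return f"http://{coordinator}/select?" + urlencode(params, safe=",:/")


print(distributed_select_url("coordinator:8983/solr", SHARDS, "computers"))
```

       In the duplicated layout the load balancer would pick one of the two co-ordinators, each of which would carry the shard list for its own set of hosts.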

Small wins in a small time with Apache Solr

  • 1. Small wins In a small time with Apache Solr
  • 2. Who am I?  My (Buddhist) name is Upayavira  Consultant with Sourcesense, specialising in search and operational technologies  A member of the Apache Software Foundation
  • 3. Who are Sourcesense?  Open Source integrator, specialising in:  Search  Business Intelligence  Content Management  Application Lifecycle Management  Offices in London, Amsterdam, Milan and Rome
  • 4. Committers and Contributors  Search:  Lucene/Solr – contributor  Hibernate Search – committer  Lucene Infinispan integration – lead developer  Apache UIMA – committer  CMS:  Apache Chemistry – contributor  Apache Jackrabbit – contributor  JBoss GateIn Portal – committer  OpenSSO-Alfresco - contributor
  • 5. What is Lucene?  Lucene is a Java information retrieval library  Provides free text search facilities  Started in 2000, by Doug Cutting  A project of the Apache Software Foundation  It is designed to be embedded in Java apps
  • 6. What is Solr?  Solr is an enterprise search server based on Lucene  Wraps Lucene with a RESTful web interface  Provides configurable schema  Provides replication functionality
  • 7. Solr Design  Diagram: inside a Solr instance, user queries go to the SearchHandler and the content application feeds the UpdateRequestHandler; both work against the Lucene index
  • 8. Prerequisites  Java, preferably Java 6  Apache Solr 1.4.1  http://www.sourcesense.com/dev8d-solr.zip
  • 9. Prerequisites  Extract your Solr distribution  At a command prompt: – cd into the unzipped distribution directory – cd into the example directory – Enter: java -jar start.jar  Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works  Unpack your dev8d-solr.zip file  At another command prompt, cd into your dev8d-solr directory
  • 10. Checking Solr Works  Visit http://localhost:8983/solr/admin/  You should see the Solr admin page.  Click statistics link  You'll see NumDocs: 0  There's nothing in the index, so searches won't show much  So we need to index some sample content
  • 11. Indexing Sample Content  In your dev8d-solr directory (extracted from the zip), at a command prompt:  java -jar post.jar wikipedia-basic.xml
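
post.jar sends the XML file to Solr's update handler. As a rough illustration of the kind of message that handler accepts, the sketch below builds a Solr `<add>` document with Python's standard library. The `id` field and the sample values are assumptions made for illustration; `title` and `author` are the field names used in the searches that follow. The slides do not show the actual contents of wikipedia-basic.xml, so this is not a reproduction of that file.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Return a Solr <add> message (as a string) for a list of field dicts."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            # A list value becomes one <field> element per entry (multi-valued field).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical sample document; the values are invented for illustration.
sample = [{"id": "1", "title": "Computer systems", "author": "example-author"}]
add_xml = build_add_xml(sample)
print(add_xml)
```

Passing a list as a field value yields repeated `<field>` elements, which is how a multi-valued field is expressed in this format.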
  • 12. Searching  http://localhost:8983/solr/select?q=*:*
  • 13. Searching  http://localhost:8983/solr/select?q=computers
  • 14. Searching  http://localhost:8983/solr/select?q=computer systems
  • 15. Searching  http://localhost:8983/solr/select?q=computers OR systems
  • 16. Searching  http://localhost:8983/solr/select?q=computers AND systems
  • 17. Searching  http://localhost:8983/solr/select?q="computer systems"
  • 18. Searching  http://localhost:8983/solr/select?q="computer systems"~10
  • 19. Searching  http://localhost:8983/solr/select?q=computers NOT data
  • 20. Searching  http://localhost:8983/solr/select?q=computers -data
  • 21. Searching  http://localhost:8983/solr/select/?q=computers&fl=title
  • 22. Searching  http://localhost:8983/solr/select/?q=computers&fq=author:yobot
  • 23. Searching  http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
  • 24. Searching  http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
  • 25. Searching  http://localhost:8983/solr/select/?q=title:system&fl=title
  • 26. Searching  http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
  • 27. Searching  http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
  • 28. Searching  http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
  • 29. Searching  http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
  • 30. Searching  http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
  • 31. Searching  http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
  • 32. Searching  http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
  • 33. Searching  http://localhost:8983/solr/select?q=computer&wt=json
  • 34. Searching  http://localhost:8983/solr/select?q=computer&wt=javabin
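
The search URLs above are typed straight into a browser, which quietly encodes spaces and quotes. When building the same requests from a program, it is safer to encode the parameters explicitly. The following is a minimal sketch using only Python's standard library; the helper name `select_url` is made up for illustration, and the base URL assumes the local example instance from the slides.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # the local example instance used in the slides


def select_url(query, params=None):
    """Build a /select URL with the query and extra parameters percent-encoded."""
    pairs = [("q", query)]
    pairs.extend((params or {}).items())
    return SOLR_BASE + "/select?" + urlencode(pairs)


print(select_url("computers AND systems"))
print(select_url('"computer systems"~10'))
print(select_url("computers", {"fq": "author:yobot", "fl": "title,author"}))
print(select_url("computers", {
    "facet": "true", "facet.field": "author",
    "rows": 0, "facet.sort": "count", "facet.limit": 3,
}))
```

The parameters go in a dict rather than keyword arguments because Solr parameter names such as `facet.field` contain dots, which Python keyword names cannot.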
  • 36. Indexing  Load wikipedia-basic.xml into a text editor or web browser  Load wikipedia-enhanced.xml into a text editor or browser  Load example/solr/conf/schema.xml into a text editor
  • 37. Indexing  schema.xml defines field types and fields used in Solr  Equivalent to your database schema in an RDBMS
  • 38. Indexing  Change these two fields in schema.xml to be of type "string" and add multiValued="true" for each. <field name="links" type="string" indexed="true" stored="true" multiValued="true"/> <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
  • 39. Indexing  Now add this to the <fields> section of schema.xml:  <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>  <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>  Now search for the “textgen” field type definition, further up in the file.
  • 40. Indexing  At the bottom of schema.xml, just before the closing </schema> tag, add the following: <copyField source="text" dest="textgen"/>
  • 41. Indexing  At your command prompt, in the dev8d-solr directory, execute:  java -jar post.jar wikipedia-enhanced.xml
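
A mistyped field name in a copyField is an easy mistake to make when editing the schema by hand. The sketch below runs a quick sanity check over a trimmed schema: every copyField source and dest should name a declared field. The schema string is a hypothetical, cut-down stand-in rather than the real example schema: the `text` field declaration and the missing `<types>` section are assumptions, and the helper name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed schema.xml: only the pieces the slides touch.
# The "text" field and the omitted <types> section are assumptions.
SCHEMA = """
<schema name="example" version="1.2">
  <fields>
    <field name="text" type="text" indexed="true" stored="true"/>
    <field name="links" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="textgen" type="textgen" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <copyField source="text" dest="textgen"/>
</schema>
"""


def undeclared_copyfield_names(schema_xml):
    """Return copyField source/dest names that have no matching <field>."""
    root = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in root.iter("field")}
    missing = []
    for copy in root.iter("copyField"):
        for attr in ("source", "dest"):
            if copy.get(attr) not in declared:
                missing.append(copy.get(attr))
    return missing


print(undeclared_copyfield_names(SCHEMA))  # an empty list means every name is declared
```

This check is deliberately simple: it ignores dynamic fields and wildcard sources, so treat a reported name as a prompt to look again rather than as proof of an error.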
  • 42. More Advanced Searching  http://localhost:8983/solr/select?q=computers%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
  • 43. More Advanced Searching  http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
  • 44. More Advanced Searching  http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20
  • 45. More Advanced Searching  http://localhost:8983/solr/terms?terms.fl=textgen&terms=true&terms.limit=20&terms.prefix=at
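
Adding wt=json (slide 33) to the category facet query from slide 42 gives a response that is easy to consume from code. Assuming the JSON writer's default flat layout, each facet field comes back as a single list that alternates value and count. The sketch below pairs them up. The sample body is hypothetical: the category names and counts are invented, and a real response also carries a responseHeader and the matching documents.

```python
import json

# Hypothetical response body, trimmed to the facet section; the category names
# and counts are invented for illustration.
SAMPLE = """
{"response": {"numFound": 3, "start": 0, "docs": []},
 "facet_counts": {"facet_fields": {"category": ["Computing", 2, "Mathematics", 1]}}}
"""


def facet_pairs(body, field):
    """Turn Solr's flat [value, count, value, count, ...] list into (value, count) tuples."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


for value, count in facet_pairs(SAMPLE, "category"):
    print(f"{value}: {count}")
```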
  • 47. Solr Host Configuration  Diagram: searches; shard 1, shard 2, shard 3
  • 48. Solr Host Configuration  Diagram: shard 1, shard 2, shard 3; co-ordinator
  • 49. Solr Host Configuration  Diagram: shard 1, shard 2, shard 3; co-ordinator; load balancer
  • 50. Solr Host Configuration  Diagram: two each of shard 1, shard 2 and shard 3; two co-ordinators; load balancer
  • 51. Solr Host Configuration  Diagram: two each of shard 1, shard 2 and shard 3; two co-ordinators; load balancer
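
In Solr's distributed search, a request names the shards to consult through the `shards` parameter, a comma-separated list of host:port/path entries, and the instance that receives the request plays the co-ordinator role. As a minimal sketch of what a query to the co-ordinator might look like, the code below builds such a URL. All host names are hypothetical placeholders, and the slides do not say how the co-ordinators in these diagrams were actually configured.

```python
from urllib.parse import urlencode

# Hypothetical host names; substitute the hosts that hold your shards.
SHARDS = ["shard1:8983/solr", "shard2:8983/solr", "shard3:8983/solr"]
COORDINATOR = "http://coordinator:8983/solr"  # hypothetical co-ordinator address


def distributed_select_url(query, shards=SHARDS, base=COORDINATOR):
    """Build a /select URL asking the co-ordinator to query every listed shard."""
    pairs = [("q", query), ("shards", ",".join(shards))]
    # Keep ":", "/" and "," readable instead of percent-encoding them.
    return base + "/select?" + urlencode(pairs, safe=":/,")


print(distributed_select_url("computers"))
```

With a load balancer in front, as in the later diagrams, `base` would presumably point at the balancer rather than at a single co-ordinator.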