Israel Ekpo | Walt Disney Parks and Resorts Online
Building Intelligent Search
Applications with Solr 1.4.1
and PHP 5
About the Presenter

Husband (beautiful wife June)

Father (handsome son Joshua)

Sr. Software Engineer at Walt Disney Parks and Resorts Online

Resides in Orlando, FL

Open Source Contributor to Apache Solr / Apache Lucene Projects

Author of Apache Solr PECL Extension

Email : iekpo@php.net

Twitter : @israelekpo

Website : http://www.israelekpo.com
Summary

Why Create Search Applications?

What Solr is

What Solr is not

Why choose Apache Solr?

What Features Solr Has to Offer in Current Release 1.4.1

Taking Advantage of these features and more

Using Apache Solr via PHP 5

How do we make Search Applications Intelligent?

Additional Topics (Nutch, Local Solr, Bitwise Filtering, Plugins)

Links to slides, sample code

Upcoming Features

Where to get help with Apache Solr
Reasons to Create Search Applications

Users have been spoiled by Google and other search engines.

Users are used to navigation using a search box.

If unable to locate information immediately, they assume it's not there.

Less patience when attempting to look for information

Certain customer service tools may not have relevant navigation

Information needs to be located immediately
Reasons to Create Search Applications

Reduce the amount of time it takes to locate information

Make products and services more accessible to customers

Increase time spent by visitors on web application

Save time, Save money

Increase employee efficiency (CRM applications)

Improve user experience on web application

Increase revenue and increase profit
About Solr

Solr is written in Java

Standalone full text Search Server within servlet container

Uses the information retrieval library Lucene at its Core

Joined Apache Incubator in January 2006

First major release in December 2006

Latest stable release is 1.4.1 - June 2010

Next major release 4.0

http://lucene.apache.org/solr/
What Solr is NOT

Solr is not a wrapper around Lucene

Solr is not an RDBMS
What version should I use?
I am confused …
The current version of Apache Solr is 1.4.1
Why is there both a 1.5 and a 3.x anyway?
Not to mention a 4.x?

1.5 is pre lucene/solr merge (very unlikely to ever be released)

3.1 is the next lucene/solr point release (3x branch in svn)

4.0 is the next major release (trunk in svn)
Why Choose Apache Solr?
• Solr is FREE and Open Source (released under Apache License)
• Advanced Full Text Search Capabilities absent in an RDBMS
• Optimized for High Volume Web Traffic
• Standards Based Open Interfaces - XML, JSON and HTTP
• Comprehensive HTML Administration Interfaces
• Server statistics exposed over JMX for monitoring
• Scalability - Efficient Replication to other Solr Search Servers
• Flexible and Adaptable with XML configuration
• Extensible Plugin Architecture
• Because a Voice in my head told me to
• All of the above
Features in Apache Solr 1.4.1

A Real Data Schema - Numeric Types, Dynamic Fields, Unique Keys

Hit Highlighting, Spelling Suggestions, Auto Suggests

Faceted Search and Filtering

Advanced, Configurable Text Analysis

Highly Configurable and User Extensible Caching

External Configuration via XML

An Administration Interface

Monitorable Logging

Fast Incremental Updates and Index Replication

Highly Scalable Distributed search - sharded index across multiple hosts

XML, CSV/delimited-text update formats

Rich Document Indexing - PDF, Word, HTML using Apache Tika

Multiple search indices
Setting Up and Getting Started
Setup Instructions will be posted here later today
http://www.israelekpo.com/works
Download Links
http://www.apache.org/dyn/closer.cgi/lucene/solr/
http://tomcat.apache.org/download-60.cgi
http://pecl.php.net/package/solr
Documentation and Helpful Information
http://us.php.net/solr
http://lucene.apache.org/solr/tutorial.html
http://wiki.apache.org/solr/
Important Directories
conf/
This directory is mandatory and must contain your solrconfig.xml and schema.xml.
Any other optional configuration files would also be kept here.
data/
This directory is the default location where Solr will keep your index, and is used by the replication scripts dealing with snapshots.
You can override this location in the solrconfig.xml
Solr will create this directory if it does not already exist.
lib/
This directory is optional. If it exists, Solr will load any Jars found in this directory and use them to resolve any "plugins" specified in your solrconfig.xml or schema.xml (e.g. Analyzers, Request Handlers, etc.).
bin/
This directory is optional.
It is the default location used for keeping the replication scripts.
solrconfig.xml – handlers, plugins
Defines data directory for index
Overrides directory for external libs
Defines and configures request handlers
Defines and enables response writers
Defines parameters for autowarming
Used to register additional plugins
synonyms.txt
Used for token or keyword substitution during indexing or queries
Source(s) => replacement(s)
colour => color
cheque => check
car, boat, truck => vehicle
dude, guy => man
trousers => pants
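As a rough illustration of what the "source(s) => replacement(s)" rules mean, the sketch below parses rules in that style and applies them to a list of tokens. It is a simplified stand-in written for this example, not the code Solr's synonym filter runs; the function names parseSynonymRules and applySynonyms are made up here.

```php
<?php
// Parse "source(s) => replacement(s)" lines into a lookup table.
// Simplified illustration only; Solr's own filter handles more cases.
function parseSynonymRules($text) {
    $map = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#' || strpos($line, '=>') === false) {
            continue; // skip blank lines, comments, and non-mapping lines
        }
        list($lhs, $rhs) = explode('=>', $line, 2);
        $targets = array_map('trim', explode(',', $rhs));
        foreach (array_map('trim', explode(',', $lhs)) as $source) {
            $map[strtolower($source)] = $targets;
        }
    }
    return $map;
}

// Replace each token that has a rule; leave all other tokens untouched.
function applySynonyms(array $tokens, array $map) {
    $out = array();
    foreach ($tokens as $token) {
        $key = strtolower($token);
        $replacements = isset($map[$key]) ? $map[$key] : array($token);
        foreach ($replacements as $replacement) {
            $out[] = $replacement;
        }
    }
    return $out;
}

$rules = "colour => color\ncar, boat, truck => vehicle\ndude, guy => man";
$map = parseSynonymRules($rules);
$result = applySynonyms(array('red', 'colour', 'truck'), $map);
echo implode(' ', $result), "\n";
```

With these rules, a search for "colour" and a document containing "color" end up on the same token, which is why the same file is usually applied consistently at index time or query time.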
protwords.txt
Used to protect certain words from stemming
sports
news
fighter
schema.xml – data types, fields
Field Options By Use Case
Adding data to index
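One way to picture adding data is the XML update format listed among the 1.4.1 features. The sketch below builds an <add> message in plain PHP. The field names (id, title, cat) are assumptions for illustration and would have to exist in your schema.xml; buildAddXml is a hypothetical helper, not part of Solr or the PECL extension.

```php
<?php
// Build an XML <add> message for Solr's update handler.
// A field with several values becomes several <field> elements.
function buildAddXml(array $docs) {
    $xml = '<add>';
    foreach ($docs as $doc) {
        $xml .= '<doc>';
        foreach ($doc as $name => $values) {
            foreach ((array) $values as $value) {
                $xml .= '<field name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '">'
                      . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8')
                      . '</field>';
            }
        }
        $xml .= '</doc>';
    }
    return $xml . '</add>';
}

// Hypothetical document; the field names must match your schema.xml.
$addXml = buildAddXml(array(
    array('id' => 'book-1', 'title' => 'Fast & Free', 'cat' => array('books', 'search')),
));
echo $addXml, "\n";
```

The message is posted to the update handler and becomes searchable after a commit. With the PECL extension the same step is normally done through SolrInputDocument and SolrClient::addDocument() rather than hand-built XML.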
Solr Query Syntax
Solr Uses and Extends the Lucene Query Syntax (Superset of Lucene Syntax)
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
http://wiki.apache.org/solr/SolrQuerySyntax
+ Required, (no prefix) Optional, - Prohibited
Booleans
Free AND Fast
+Free +Fast
+title:Fast AND -body:dollars
Range Queries
[* TO 500]
{300 TO *}
cost:[* TO 299.99]
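To show how these pieces combine in a request, here is a minimal sketch that formats an inclusive range clause and assembles the query string for a select request. The helper names rangeClause and buildSelectQuery are invented for this example, and the field names come from the slide above.

```php
<?php
// Format an inclusive range clause; null means an open end (*).
function rangeClause($field, $lower, $upper) {
    $lo = ($lower === null) ? '*' : (string) $lower;
    $hi = ($upper === null) ? '*' : (string) $upper;
    return $field . ':[' . $lo . ' TO ' . $hi . ']';
}

// Build the query string for a select request.
// Pairs are joined by hand so that "fq" can be repeated as a plain
// parameter name, which http_build_query() would turn into fq[0], fq[1].
function buildSelectQuery($q, array $filters = array(), $rows = 10) {
    $pairs = array('q=' . urlencode($q));
    foreach ($filters as $fq) {
        $pairs[] = 'fq=' . urlencode($fq);
    }
    $pairs[] = 'rows=' . (int) $rows;
    $pairs[] = 'wt=json';
    return implode('&', $pairs);
}

$fq = rangeClause('cost', null, '299.99');
$queryString = buildSelectQuery('+title:Fast -body:dollars', array($fq));
echo $queryString, "\n";
```

URL encoding matters here because "+" in a raw query string is decoded as a space, which would silently drop the "required" operator.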
Removing Data from Index
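Removal uses the same XML update channel as adding. The sketch below builds the two common delete messages, by unique key and by query. The helper names are hypothetical; the PECL extension offers SolrClient::deleteById() and SolrClient::deleteByQuery() for the same purpose.

```php
<?php
// Delete one document by its unique key value.
function buildDeleteByIdXml($id) {
    return '<delete><id>' . htmlspecialchars((string) $id, ENT_QUOTES, 'UTF-8') . '</id></delete>';
}

// Delete every document that matches a query.
function buildDeleteByQueryXml($query) {
    return '<delete><query>' . htmlspecialchars($query, ENT_QUOTES, 'UTF-8') . '</query></delete>';
}

echo buildDeleteByIdXml('book-1'), "\n";
echo buildDeleteByQueryXml('cat:books AND cost:[* TO 5]'), "\n";
```

As with adds, deletions only become visible to searchers after a commit.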
Administrative Interface
Text Analysis (very important)
Taking advantage of features
Spellchecker
Spellchecker (behind the scenes)
Reasons to Use Spellchecker
• Alerts users to possible mistakes in search keywords
• Helps users find what they are looking for even when they don’t know how to spell it
• Suggests alternate queries that could provide better results
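A minimal sketch of the "did you mean" flow, assuming the request handler has the spellcheck component enabled in solrconfig.xml. It builds the request parameters and pulls suggestions out of a decoded JSON response. The response layout used here (a flat list of alternating names and values) is an assumption about the default JSON writer, and the sample response is hand-written for illustration, not captured from a server.

```php
<?php
// Request parameters that ask for spelling suggestions and a collation.
// Values are strings because http_build_query() would send true as "1".
function spellcheckParams($userQuery, $count = 5) {
    return array(
        'q' => $userQuery,
        'spellcheck' => 'true',
        'spellcheck.collate' => 'true',
        'spellcheck.count' => (int) $count,
        'wt' => 'json',
    );
}

// Walk a flat [name, value, name, value, ...] list and collect
// per-word suggestions plus the collated query, if present.
function extractSuggestions(array $flat) {
    $result = array('words' => array(), 'collation' => null);
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $name = $flat[$i];
        $value = $flat[$i + 1];
        if ($name === 'collation') {
            $result['collation'] = $value;
        } elseif (is_array($value) && isset($value['suggestion'])) {
            $result['words'][$name] = $value['suggestion'];
        }
    }
    return $result;
}

// Hand-written sample in the assumed layout, not real server output.
$sample = json_decode(
    '{"spellcheck":{"suggestions":["kewyords",{"numFound":1,"suggestion":["keywords"]},"collation","search keywords"]}}',
    true
);
$found = extractSuggestions($sample['spellcheck']['suggestions']);
echo 'Did you mean: ', $found['collation'], "\n";
```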
Autosuggest (aka autocomplete)
Autosuggest behind the scenes
http://docs.jquery.com/UI/Autocomplete
Why use auto suggest?
• Helps users finish their thoughts or complete a search phrase
• Helps reduce number of “no matches found” experiences
• Provides “mind-reader” experience to certain users.
• May propose alternate search phrases that are more useful
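A sketch of the server side of autosuggest, assuming the suggestions come from Solr's terms component and are consumed by a jQuery UI Autocomplete widget. The field name "name", the helper names, and the sample term list are assumptions for illustration; the flat term/count list is the layout assumed for the JSON response.

```php
<?php
// Parameters for a prefix lookup against indexed terms.
// The prefix is lowercased on the assumption that the field's
// analyzer lowercases tokens at index time.
function termsParams($field, $prefix, $limit = 10) {
    return array(
        'terms' => 'true',
        'terms.fl' => $field,
        'terms.prefix' => strtolower(trim($prefix)),
        'terms.limit' => (int) $limit,
        'wt' => 'json',
    );
}

// Convert a flat [term, count, term, count, ...] list into the
// label/value items that jQuery UI Autocomplete accepts.
function toAutocompleteItems(array $flatTerms) {
    $items = array();
    for ($i = 0; $i + 1 < count($flatTerms); $i += 2) {
        $items[] = array(
            'label' => $flatTerms[$i] . ' (' . $flatTerms[$i + 1] . ')',
            'value' => $flatTerms[$i],
        );
    }
    return $items;
}

$params = termsParams('name', ' Sol');
// Hypothetical terms and counts, used only to show the conversion.
$items = toAutocompleteItems(array('solr', 12, 'solaris', 3));
echo json_encode($items), "\n";
```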
Hit Highlighting
Hit Highlighting (behind the scenes)
Why Use Hit Highlighting?
Makes it easier for users to parse returned results
Improves overall search experience
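As a sketch of how highlighting reaches the page: Solr returns the highlighted fragments in a separate "highlighting" section keyed by each document's unique key, so the application has to join them back onto the result list. The helper names, the field names, and the sample response below are assumptions for illustration.

```php
<?php
// Request parameters that switch on highlighting for the given fields.
function highlightParams($q, array $fields) {
    return array(
        'q' => $q,
        'hl' => 'true',
        'hl.fl' => implode(',', $fields),
        'hl.simple.pre' => '<strong>',
        'hl.simple.post' => '</strong>',
        'wt' => 'json',
    );
}

// Attach the first highlighted snippet to each document, falling back
// to the stored field value when no snippet came back for it.
function attachSnippets(array $docs, array $highlighting, $field, $idField = 'id') {
    foreach ($docs as $i => $doc) {
        $id = $doc[$idField];
        if (isset($highlighting[$id][$field][0])) {
            $docs[$i]['snippet'] = $highlighting[$id][$field][0];
        } else {
            $docs[$i]['snippet'] = isset($doc[$field]) ? $doc[$field] : '';
        }
    }
    return $docs;
}

// Hand-written sample in the shape of a decoded JSON response.
$response = array(
    'response' => array('docs' => array(
        array('id' => 'a1', 'title' => 'Fast search with Solr'),
        array('id' => 'b2', 'title' => 'Cooking basics'),
    )),
    'highlighting' => array(
        'a1' => array('title' => array('Fast search with <strong>Solr</strong>')),
        'b2' => array(),
    ),
);
$docs = attachSnippets($response['response']['docs'], $response['highlighting'], 'title');
echo $docs[0]['snippet'], "\n";
```

The field being highlighted has to be stored in the index, otherwise there is no text to build a snippet from.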
Facets and Filter Queries
Why Facets and Filter Queries?
Allows users to narrow down humongous result set
Creates visual classification or categorization of result set
Gives user an idea of number of hits per category
Improves overall search experience
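A sketch of the facet round trip: request counts for a field, show them to the user, and turn the value the user clicks into a filter query on the next request. The helper names, the "cat" field, and the sample counts are assumptions for illustration; the flat value/count list is the layout assumed for the JSON response.

```php
<?php
// Query string that asks for facet counts and applies any filters.
function facetQueryString($q, array $facetFields, array $filters = array()) {
    $pairs = array('q=' . urlencode($q), 'facet=true', 'facet.mincount=1');
    foreach ($facetFields as $field) {
        $pairs[] = 'facet.field=' . urlencode($field);
    }
    foreach ($filters as $fq) {
        $pairs[] = 'fq=' . urlencode($fq);
    }
    $pairs[] = 'wt=json';
    return implode('&', $pairs);
}

// Turn a clicked facet value into a filter query, quoting it as a
// phrase so values containing spaces stay in one clause.
function facetFilter($field, $value) {
    $escaped = str_replace(array('\\', '"'), array('\\\\', '\\"'), $value);
    return $field . ':"' . $escaped . '"';
}

// Convert a flat [value, count, value, count, ...] list into a map.
function facetCounts(array $flat) {
    $counts = array();
    foreach (array_chunk($flat, 2) as $pair) {
        if (count($pair) === 2) {
            $counts[$pair[0]] = $pair[1];
        }
    }
    return $counts;
}

$fq = facetFilter('cat', 'Theme Parks');
$qs = facetQueryString('hotel', array('cat'), array($fq));
// Hypothetical counts, used only to show the conversion.
$counts = facetCounts(array('Theme Parks', 42, 'Resorts', 17));
echo $fq, "\n";
```

Facet fields usually work best as untokenized string fields, so that "Theme Parks" is counted as one value instead of two separate words.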
Additional Topics
Using Nutch and Solr – crawling and indexing intranet sites
http://nutch.apache.org/
Local Solr – filtering based on proximity
https://issues.apache.org/jira/browse/SOLR-773
Bitwise Filtering on Integer Fields
https://issues.apache.org/jira/browse/SOLR-1913
Using Different Response Writer for PHP
http://us.php.net/manual/en/solrclient.setresponsewriter.php
https://issues.apache.org/jira/browse/SOLR-1967
Upcoming Features
Apache Solr
• Local Solr
• Results Grouping
• Field Collapsing
PECL Extension
• Ability to Send Custom Requests to Custom URLs other than select, update, terms etc.
• Ability to add files (pdf, office documents etc)
• Windows version of latest releases.
• Ensuring that SolrQuery::getFields(), SolrQuery::getFacets() et al. return an array consistently.
• Lowering Libxml version to 2.6.16
Where to get help
Solr Wiki
http://wiki.apache.org/solr/
Solr Mailing Lists
solr-user@lucene.apache.org (send message)
solr-user-subscribe@lucene.apache.org (subscribe)
PECL Extension Documentation on PHP.net
http://www.php.net/solr
Additional Resources
http://wiki.apache.org/solr/SolrResources
Articles on LucidImagination.com – lots of articles from experts in the community.
Questions and Answers
WDPRO is Hiring
Walt Disney Parks and Resorts Online is hiring for the following positions
http://wdpro.jobs
• Software Architects
• Software Engineers
• Web Developers
• Automation Engineers
• System Engineers
• Release Engineers
• Quality Assurance Engineers
• Business Analysts
• Project Managers
• Technology Managers
Email your resume -> iekpo@php.net
mail("iekpo@php.net", "I am interested! Hook me up!", "resume, contact info");
Feedback and Links
Attendee Evaluation and Comments
http://joind.in/2261
Link to Slides
http://slidesha.re/bAXNF3
Sample Code (will be posted later)
http://www.israelekpo.com/works
Email
iekpo@php.net

Vector Databases 101 - An introduction to the world of Vector Databases
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 

Building Intelligent Search Applications with Apache Solr and PHP5

  • 1. Israel Ekpo | Walt Disney Parks and Resorts Online: Building Intelligent Search Applications with Solr 1.4.1 and PHP 5
  • 2. About the Presenter
      • Husband (beautiful wife June)
      • Father (handsome son Joshua)
      • Sr. Software Engineer at Walt Disney Parks and Resorts Online
      • Resides in Orlando, FL
      • Open Source Contributor to Apache Solr / Apache Lucene Projects
      • Author of Apache Solr PECL Extension
      • Email: iekpo@php.net
      • Twitter: @israelekpo
      • Website: http://www.israelekpo.com
  • 3. Summary
      • Why Create Search Applications?
      • What Solr is
      • What Solr is not
      • Why choose Apache Solr?
      • What Features Solr Has to Offer in Current Release 1.4.1
      • Taking Advantage of these features and more
      • Using Apache Solr via PHP 5
      • How do we make Search Applications Intelligent?
      • Additional Topics (Nutch, Local Solr, Bitwise Filtering, Plugins)
      • Links to slides, sample code
      • Upcoming Features
      • Where to get help with Apache Solr
  • 4. Reasons to Create Search Applications
      • Users have been spoiled by Google and other search engines.
      • Users are used to navigation using a search box.
      • If unable to locate information immediately, they assume it's not there.
      • Less patience when attempting to look for information
      • Certain customer service tools may not have relevant navigation
      • Information needs to be located immediately
  • 5. Reasons to Create Search Applications
      • Reduce the amount of time it takes to locate information
      • Make products and services more accessible to customers
      • Increase time spent by visitors on web application
      • Save time, Save money
      • Increase employee efficiency (CRM applications)
      • Improve user experience on web application
      • Increase revenue and increase profit
  • 6. About Solr
      • Solr is written in Java
      • Standalone full text Search Server within servlet container
      • Uses the information retrieval library Lucene at its Core
      • Joined Apache Incubator in January 2006
      • First major release in December 2006
      • Latest stable release is 1.4.1 - June 2010
      • Next major release 4.0
      • http://lucene.apache.org/solr/
  • 7. What Solr is NOT
      • Solr is not a wrapper around Lucene
      • Solr is not an RDBMS
  • 8. What version should I use? I am confused …
      • The current version of Apache Solr is 1.4.1
      • Why is there both a 1.5 and a 3.x anyway? Not to mention a 4.x?
          • 1.5 is pre Lucene/Solr merge (very unlikely to ever be released)
          • 3.1 is the next Lucene/Solr point release (3x branch in svn)
          • 4.0 is the next major release (trunk in svn)
  • 9. Why Choose Apache Solr?
      • Solr is FREE and Open Source (released under Apache License)
      • Advanced Full Text Search Capabilities absent in an RDBMS
      • Optimized for High Volume Web Traffic
      • Standards Based Open Interfaces - XML, JSON and HTTP
      • Comprehensive HTML Administration Interfaces
      • Server statistics exposed over JMX for monitoring
      • Scalability - Efficient Replication to other Solr Search Servers
      • Flexible and Adaptable with XML configuration
      • Extensible Plugin Architecture
      • Because a Voice in my head told me to
      • All of the above
  • 10. Features in Apache Solr 1.4.1
      • A Real Data Schema - Numeric Types, Dynamic Fields, Unique Keys
      • Hit Highlighting, Spelling Suggestions, Auto Suggests
      • Faceted Search and Filtering
      • Advanced, Configurable Text Analysis
      • Highly Configurable and User Extensible Caching
      • External Configuration via XML
      • An Administration Interface
      • Monitorable Logging
      • Fast Incremental Updates and Index Replication
      • Highly Scalable Distributed search - sharded index across multiple hosts
      • XML, CSV/delimited-text update formats
      • Rich Document Indexing - PDF, Word, HTML using Apache Tika
      • Multiple search indices
  • 11. Setting Up and Getting Started
      • Setup Instructions (will be posted here later today)
          • http://www.israelekpo.com/works
      • Download Links
          • http://www.apache.org/dyn/closer.cgi/lucene/solr/
          • http://tomcat.apache.org/download-60.cgi
          • http://pecl.php.net/package/solr
      • Documentation and Helpful Information
          • http://us.php.net/solr
          • http://lucene.apache.org/solr/tutorial.html
          • http://wiki.apache.org/solr/
  • 12. Important Directories
      • conf/ - This directory is mandatory and must contain your solrconfig.xml and schema.xml. Any other optional configuration files would also be kept here.
      • data/ - This directory is the default location where Solr will keep your index, and is used by the replication scripts dealing with snapshots. You can override this location in solrconfig.xml. Solr will create this directory if it does not already exist.
      • lib/ - This directory is optional. If it exists, Solr will load any Jars found in this directory and use them to resolve any "plugins" specified in your solrconfig.xml or schema.xml (i.e. Analyzers, Request Handlers, etc.).
      • bin/ - This directory is optional. It is the default location used for keeping the replication scripts.
  • 13. solrconfig.xml – handlers, plugins
      • Defines data directory for index
      • Overrides directory for external libs
      • Defines and configures request handlers
      • Defines and enables response writers
      • Defines parameters for autowarming
      • Used to register additional plugins
  • 14. synonyms.txt
      • Used for token or keyword substitution during indexing or queries
      • Format: Source(s) => replacement(s)
          • colour => color
          • cheque => check
          • car, boat, truck => vehicle
          • dude, guy => man
          • trousers => pants
  • 15. protwords.txt
      • Used to protect certain words from stemming
          • sports
          • news
          • fighter
  • 16. schema.xml – data types, fields
  • 17. schema.xml – data types, fields
  • 18. schema.xml – data types, fields
  • 19. schema.xml – data types, fields
  • 20. Field Options By Use Case
  • 21. Adding data to index
  • 22. Solr Query Syntax
      • Solr uses and extends the Lucene Query Syntax (superset of Lucene syntax)
          • http://lucene.apache.org/java/2_9_1/queryparsersyntax.html
          • http://wiki.apache.org/solr/SolrQuerySyntax
      • Clause prefixes: + Required, (none) Optional, - Prohibited
      • Booleans
          • Free AND Fast
          • +Free +Fast
          • +title:Fast AND -body:dollars
      • Range Queries
          • [* TO 500]
          • {300 TO *}
          • cost:[* TO 299.99}
  • 25. Text Analysis (very important)
  • 29. Taking advantage of features Spellchecker
  • 30. Taking advantage of features Spellchecker
  • 31. Taking advantage of features Spell Checker
  • 32. Taking advantage of features Spell Checker
  • 33. Taking advantage of features Spell checker (behind the scenes)
  • 34. Taking advantage of features: Reasons to Use Spellchecker
      • Alerts user of possible mistaeks in search kewyords
      • Helps user in finding what they are looking for even when they don't know how to spell it
      • Helps in suggesting alternate queries that could provide better results
  • 35. Taking advantage of features Autosuggest aka Auto complete
  • 36. Taking advantage of features Auto suggest aka auto complete
  • 37. Taking advantage of features Auto suggest behind the scenes http://docs.jquery.com/UI/Autocomplete
  • 38. Taking advantage of features Auto suggest behind the scenes
  • 39. Taking advantage of features: Why use auto suggest?
      • Helps users to finish their thoughts or complete search phrase
      • Helps reduce number of "no matches found" experiences
      • Provides "mind-reader" experience to certain users
      • May propose alternate search phrases that are more useful
  • 40. Taking advantage of features Hit Highlighting
  • 41. Taking advantage of features Hit Highlighting
  • 42. Taking advantage of features Hit Highlighting (behind the scenes)
  • 43. Taking advantage of features Hit Highlighting (behind the scenes)
  • 44. Taking advantage of features: Why Use Hit Highlighting?
      • Makes it easier for users to parse returned results
      • Improves overall search experience
  • 45. Taking advantage of features: Facets and Filter Queries
      • Makes it easier for users to parse returned results
      • Improves overall search experience
  • 46. Taking advantage of features Facets and Filter Queries
  • 47. Taking advantage of features Facets and Filter Queries
  • 48. Taking advantage of features: Why Facets and Filter Queries?
      • Allows users to narrow down humongous result set
      • Creates visual classification or categorization of result set
      • Gives user an idea of number of hits per category
      • Improves overall search experience
  • 49. Additional Topics
      • Using Nutch and Solr – crawling and indexing intranet sites
          • http://nutch.apache.org/
      • Local Solr – filtering based on proximity
          • https://issues.apache.org/jira/browse/SOLR-773
      • Bitwise Filtering on Integer Fields
          • https://issues.apache.org/jira/browse/SOLR-1913
      • Using Different Response Writer for PHP
          • http://us.php.net/manual/en/solrclient.setresponsewriter.php
          • https://issues.apache.org/jira/browse/SOLR-1967
  • 50. Upcoming Features
      • Apache Solr
          • Local Solr
          • Results Grouping
          • Field Collapsing
      • PECL Extension
          • Ability to send custom requests to custom URLs other than select, update, terms etc.
          • Ability to add files (PDF, office documents etc.)
          • Windows version of latest releases
          • Ensuring that SolrQuery::getFields(), SolrQuery::getFacets() et al. return an array consistently
          • Lowering Libxml version to 2.6.16
  • 51. Where to get help
      • Solr Wiki
          • http://wiki.apache.org/solr/
      • Solr Mailing Lists
          • solr-user@lucene.apache.org (send message)
          • solr-user-subscribe@lucene.apache.org (subscribe)
      • PECL Extension Documentation on PHP.net
          • http://www.php.net/solr
      • Additional Resources
          • http://wiki.apache.org/solr/SolrResources
          • Articles on LucidImagination.com – lots of articles from experts in the community
  • 53. WDPRO is Hiring
      • Walt Disney Parks and Resorts Online is hiring for the following positions (http://wdpro.jobs)
          • Software Architects
          • Software Engineers
          • Web Developers
          • Automation Engineers
          • System Engineers
          • Release Engineers
          • Quality Assurance Engineers
          • Business Analysts
          • Project Managers
          • Technology Managers
      • Email your resume -> iekpo@php.net
      • mail("iekpo@php.net", "I am interested! Hook me up!", "resume , contact info");
  • 54. Feedback and Links
      • Attendee Evaluation and Comments: http://joind.in/2261
      • Link to Slides: http://slidesha.re/bAXNF3
      • Sample Code (will be posted later): http://www.israelekpo.com/works
      • Email: iekpo@php.net
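
Slide 14 describes synonyms.txt as a list of "source(s) => replacement(s)" rules used for token substitution. The following is a minimal standalone sketch of that idea in plain PHP, using the rules from the slide. It is only an illustration: in Solr the substitution happens inside the analysis chain configured in schema.xml, and the function names parseSynonymRules() and applySynonyms() are hypothetical helpers, not part of Solr or the PECL extension.

```php
<?php
// Parse lines in the "source(s) => replacement(s)" form shown on slide 14.
// Returns a map of lowercase source token => array of replacement tokens.
// (Hypothetical helper for illustration; not Solr's implementation.)
function parseSynonymRules($text)
{
    $map = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Skip blank lines and comment lines.
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        $parts = explode('=>', $line, 2);
        if (count($parts) !== 2) {
            continue; // This sketch handles explicit "=>" mappings only.
        }
        $sources      = array_map('trim', explode(',', strtolower($parts[0])));
        $replacements = array_map('trim', explode(',', strtolower($parts[1])));
        foreach ($sources as $source) {
            $map[$source] = $replacements;
        }
    }
    return $map;
}

// Lowercase the input, split it on whitespace, and swap in the
// replacement(s) for every token that has a rule.
function applySynonyms(array $map, $input)
{
    $output = array();
    foreach (preg_split('/\s+/', trim(strtolower($input))) as $token) {
        if (isset($map[$token])) {
            $output = array_merge($output, $map[$token]);
        } else {
            $output[] = $token;
        }
    }
    return implode(' ', $output);
}

$rules = "colour => color\n"
       . "cheque => check\n"
       . "car, boat, truck => vehicle\n"
       . "dude, guy => man\n"
       . "trousers => pants";

$map = parseSynonymRules($rules);
echo applySynonyms($map, 'Dude lost his cheque in the truck'), "\n";
// prints "man lost his check in the vehicle"
```

Because the same rules can be applied at index time and at query time, a search for "guy" and a document containing "dude" both end up as the token "man" and therefore match.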
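
The query syntax on slide 22 (required and prohibited clauses, open-ended ranges) can be assembled from user input with a couple of small helpers. This is a hedged sketch in plain PHP: escapeQueryTerm() and rangeClause() are hypothetical names introduced here, and the field names come from the slide's own examples. The PECL extension also ships SolrUtils::escapeQueryChars() for escaping, which is likely the better choice when the extension is available.

```php
<?php
// Backslash-escape characters that have special meaning in the Lucene
// query syntax so that user input is treated as a literal term.
// (Hypothetical helper for illustration.)
function escapeQueryTerm($term)
{
    return preg_replace('/([+\-!(){}\[\]^"~*?:\\\\]|&&|\|\|)/', '\\\\$1', $term);
}

// Build an inclusive range clause; null means an open end ("*").
function rangeClause($field, $lower, $upper)
{
    $lower = ($lower === null) ? '*' : $lower;
    $upper = ($upper === null) ? '*' : $upper;
    return $field . ':[' . $lower . ' TO ' . $upper . ']';
}

// "+" marks a required clause, "-" a prohibited one.
$query = '+title:' . escapeQueryTerm('Fast')
       . ' -body:' . escapeQueryTerm('dollars');

echo $query, "\n";                          // prints "+title:Fast -body:dollars"
echo rangeClause('cost', null, 500), "\n";  // prints "cost:[* TO 500]"
echo escapeQueryTerm('C++'), "\n";          // prints "C\+\+"
```

Escaping matters most for free-text search boxes, where a stray "+" or ":" typed by a user would otherwise be interpreted as query syntax.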
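
Slides 29 to 48 cover spellchecking, hit highlighting, facets, and filter queries. Over Solr's HTTP interface these features are switched on with request parameters such as spellcheck, hl, hl.fl, facet, facet.field, and fq. The sketch below builds such a request URL in plain PHP. Several things here are assumptions: the host, port, and path are placeholders, the field names (title, category, manufacturer, inStock, price) are invented for the example, and the spellcheck parameter only has an effect if a spellcheck component is configured on the request handler in solrconfig.xml. buildSolrQueryString() is a hypothetical helper; with the PECL extension, the SolrQuery class exposes setters for the same parameters.

```php
<?php
// Build a query string in which multi-valued parameters repeat the key
// (fq=a&fq=b) instead of using PHP's bracket notation (fq[0]=a).
// (Hypothetical helper for illustration.)
function buildSolrQueryString(array $params)
{
    $pairs = array();
    foreach ($params as $name => $values) {
        foreach ((array) $values as $value) {
            $pairs[] = rawurlencode($name) . '=' . rawurlencode($value);
        }
    }
    return implode('&', $pairs);
}

$params = array(
    'q'           => 'title:solr',
    // Filter queries narrow the result set without affecting scoring.
    'fq'          => array('inStock:true', 'price:[* TO 500]'),
    // Facets return hit counts per value of each listed field.
    'facet'       => 'true',
    'facet.field' => array('category', 'manufacturer'),
    // Highlighting returns snippets with the matched terms marked up.
    'hl'          => 'true',
    'hl.fl'       => 'title',
    // Spelling suggestions (assumes the component is configured server-side).
    'spellcheck'  => 'true',
    'wt'          => 'json',
);

// Placeholder host, port, and path; adjust to the actual deployment.
$url = 'http://localhost:8983/solr/select?' . buildSolrQueryString($params);
echo $url, "\n";
```

A custom builder is used instead of http_build_query() because the latter encodes arrays as fq[0]=...&fq[1]=..., whereas Solr expects the parameter name to be repeated.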