Bringing Reusability to Enterprise Search
       Using Solr to build a reusable enterprise search engine

       A Collabor Labs Technology Paper, May 2011




       This whitepaper discusses the high-level technical aspects of using Solr to bring
       reusability to enterprise search implementations.




                                                                          Brahmaji Pusuluri
                                                                       Sr. Software Engineer


www.collabor.com                                                           info@collabor.com
Using Lucene for enterprise search

Apache Lucene(TM) is a high-performance, full-featured text search engine library written
entirely in Java. It is suitable for nearly any application that requires full-text search.




Solr - the reusable enterprise search engine

Solr is a popular, fast, open source enterprise search platform from the Apache Lucene
project. It wraps Lucene in a server that handles indexing and querying through HTTP
requests, so an application running anywhere can index and query documents over the
Internet by sending XML over HTTP to the URL of your Solr server. Solr is also a highly
optimized search server, with caching and replication to other Solr servers, and it can
index rich documents such as Microsoft Word and PDF files through its Solr Cell extraction
handler, which is built on Apache Tika.
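Rich-document indexing is handled by the ExtractingRequestHandler (Solr Cell). A minimal registration in solrconfig.xml might look like the sketch below. It assumes the extraction contrib jars are already loadable by the core (the exact library paths depend on your version and installation) and that the schema defines a field named text to receive the extracted body; both are assumptions, not part of the original setup.

```xml
<!-- Sketch: assumes the Solr Cell (extraction) contrib jars are on this core's classpath -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <!-- Map Tika's extracted body ("content") onto a schema field; "text" is assumed to exist -->
    <str name="fmap.content">text</str>
    <!-- Lower-case the metadata field names reported by Tika -->
    <str name="lowernames">true</str>
  </lst>
</requestHandler>
```

A file can then be posted with, for example, curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@report.pdf", where report.pdf is a placeholder file name and literal.id assumes the schema's unique key field is named id (use literal.<your key field> otherwise).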




Working with Solr

Once Solr is installed, we need to modify the following files according to the project's
requirements.

solrconfig.xml:
solrconfig.xml contains most of the parameters for configuring Solr itself, such as request
handlers, caches and index settings.
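One solrconfig.xml setting matters for the stream.file upload described below: Solr reads a stream.file from its own filesystem only when remote streaming is enabled. A sketch of the relevant element follows; keep any other attributes your existing requestDispatcher already has.

```xml
<requestDispatcher handleSelect="true">
  <!-- Allows the stream.file and stream.url parameters. Enable this only on a trusted
       network: any client that can reach Solr can then make it read local files. -->
  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" />
</requestDispatcher>
```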

schema.xml:
schema.xml contains all of the details about which fields your documents can contain, and
how those fields are handled when documents are added to the index and when those fields
are queried.
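For the DataImportHandler mapping later in this paper, schema.xml needs fields named solr_id, solr_name, solr_desc and solr_details. A sketch of the declarations inside the <schema> element, assuming the string and text_general field types from the example schema shipped with Solr 3.x (older 1.4 schemas call the analyzed type text):

```xml
<fields>
  <!-- Unique key: an exact, unanalyzed string -->
  <field name="solr_id"      type="string"       indexed="true" stored="true" required="true"/>
  <!-- Analyzed text fields, searchable word by word -->
  <field name="solr_name"    type="text_general" indexed="true" stored="true"/>
  <field name="solr_desc"    type="text_general" indexed="true" stored="true"/>
  <!-- multiValued because a child entity can return several rows per document -->
  <field name="solr_details" type="text_general" indexed="true" stored="true" multiValued="true"/>
</fields>

<uniqueKey>solr_id</uniqueKey>
```

Declaring solr_details as multiValued avoids indexing errors when the inner query returns more than one row for the same parent.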

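Once the settings are in place, you can index data by sending an XML file to Solr with curl:

curl http://localhost:8983/solr/update -F stream.file=/tmp/example.xml

example.xml must contain documents whose field names match the fields defined in
schema.xml. Because stream.file makes Solr read the file from the Solr server's own
filesystem, this form works only when remote streaming is enabled in solrconfig.xml.
Documents become searchable only after a commit, which can be requested by adding
commit=true to the update URL.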
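example.xml uses Solr's XML update format: an <add> element wrapping one or more <doc> elements, each listing <field> values by name. A sketch using the hypothetical field names solr_id, solr_name and solr_desc, which must be declared in your schema.xml:

```xml
<add>
  <doc>
    <!-- Each name must match a field declared in schema.xml -->
    <field name="solr_id">1</field>
    <field name="solr_name">Sample product</field>
    <field name="solr_desc">A short description to be indexed for full-text search</field>
  </doc>
</add>
```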




Research by: Collabor Labs                                                                         Page 2
May 2011                                                   All trademarks belong to their respective owners
Index a DB table directly into Solr using DataImportHandler


Most applications store data in relational databases or XML files, and searching over such
data is a common use case. The DataImportHandler (DIH) is a Solr contrib module that
provides a configuration-driven way to import this data into Solr, either as a full build or
as incremental delta imports.

Edit solrconfig.xml to register the request handler:

<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>
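The DataImportHandler class ships as a separate contrib jar, and the JDBC driver is not bundled with Solr, so the core must be able to load both. One way, sketched under the assumption of a Solr 3.x layout where the DIH jar sits in the distribution's dist directory, is a pair of <lib> directives in solrconfig.xml. The paths and jar name patterns are assumptions and vary by version and install location; placing the jars in the shared lib directory is an alternative.

```xml
<!-- dir is resolved relative to the core's instanceDir; adjust to your installation -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
<!-- Hypothetical location of the MySQL JDBC driver jar -->
<lib dir="/opt/jdbc/" regex="mysql-connector-java-.*\.jar" />
```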



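The data-config.xml file, placed in the core's conf directory next to solrconfig.xml,
defines the data source and maps database columns to Solr fields. For example: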

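<dataConfig>
  <!-- Placeholder connection details: replace dbname, user-name and password -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/dbname" user="user-name" password="password"/>
  <document>
    <!-- The entity name "outer" is what ${outer.id} refers to in the child query.
         `desc` is quoted because DESC is a reserved word in MySQL. -->
    <entity name="outer" query="select id, name, `desc` from mytable">
      <field column="id" name="solr_id"/>
      <field column="name" name="solr_name"/>
      <field column="desc" name="solr_desc"/>
      <!-- Runs once per outer row, pulling the related details for that id -->
      <entity name="inner"
              query="select details from another_table where id = '${outer.id}'">
        <field column="details" name="solr_details"/>
      </entity>
    </entity>
  </document>
</dataConfig>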


Run the full-import command to index the entire database:
http://localhost:8983/solr/dataimport?command=full-import

Run the delta-import command to index only the rows that have changed since the last import
(this requires a deltaQuery on the entity in data-config.xml):
http://localhost:8983/solr/dataimport?command=delta-import
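For delta-import, DIH has to be told how to detect changed rows. A sketch of the top-level entity extended for delta imports, assuming a hypothetical last_modified timestamp column in mytable; the child entity is omitted for brevity.

```xml
<!-- pk names the key column; last_modified is a hypothetical change-tracking column -->
<entity name="outer" pk="id"
        query="select id, name, `desc` from mytable"
        deltaQuery="select id from mytable
                    where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select id, name, `desc` from mytable
                          where id = '${dataimporter.delta.id}'">
  <field column="id" name="solr_id"/>
  <field column="name" name="solr_name"/>
  <field column="desc" name="solr_desc"/>
</entity>
```

DIH records the time of each import in conf/dataimport.properties and substitutes it for ${dataimporter.last_index_time}. deltaQuery returns the keys of the changed rows, and deltaImportQuery is then run once per key to fetch the full row.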


Building a reusable search engine with Solr multi-core

Multiple cores let a single Solr instance host several indexes, each with its own
configuration and schema, for very different applications, while keeping the convenience of
unified administration. The individual indexes remain fairly isolated, but you can manage
them as a single application: create new indexes on the fly by spinning up new SolrCores,
and even make one SolrCore replace another without ever restarting your servlet container.

To define the cores, edit solr.xml in the Solr home directory. For example:
<solr persistent="true" sharedLib="lib">
  <!-- persistent="true": core changes made through /admin/cores are saved back to this file -->
  <cores adminPath="/admin/cores">
    <!-- config and schema are looked up in app1/conf; dataDir sets where this core keeps its index -->
    <core name="application1" instanceDir="app1"
          config="config.xml" schema="schema.xml" dataDir="/app1/data" />
    <!-- No overrides: Solr uses app2/conf/solrconfig.xml and app2/conf/schema.xml -->
    <core name="application2" instanceDir="app2" />
  </cores>
</solr>


Each core has its own DataImportHandler, registered in that core's solrconfig.xml and
addressed by the core name in the URL.

Run the full-import command to index the entire database into application1:
http://localhost:8983/solr/application1/dataimport?command=full-import

Run the delta-import command to index only the changes since the last import into application1:
http://localhost:8983/solr/application1/dataimport?command=delta-import

Run the full-import command to index the entire database into application2:
http://localhost:8983/solr/application2/dataimport?command=full-import

Run the delta-import command to index only the changes since the last import into application2:
http://localhost:8983/solr/application2/dataimport?command=delta-import

Searching the indexes

A query such as http://localhost:8983/solr/application1/select/?q=searchterm returns the
matching documents from the application1 core as an XML response.
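The response follows Solr's standard XML format. The outline below is illustrative only: the values are placeholders rather than output from a real index, and the solr_* field names are the hypothetical ones used elsewhere in this paper.

```xml
<response>
  <!-- status 0 means success; QTime is the query time in milliseconds -->
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">...</int>
  </lst>
  <!-- numFound: total number of matches; start: offset of the first returned document -->
  <result name="response" numFound="..." start="0">
    <doc>
      <str name="solr_id">...</str>
      <str name="solr_name">...</str>
    </doc>
  </result>
</response>
```

Request parameters such as rows, start and fl (for example, &rows=10&fl=solr_id,solr_name) control paging and which stored fields are returned.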

In this way, a single Solr installation can be reused across multiple enterprise search
implementations.






For more information, contact: info@collabor.com



