Solr 4.1



      Abhey Gupta
Software Engineer (Java)
Value First Digital Pvt Ltd
Outline
This presentation works through a series of questions to assess the
    usability of Solr in MIS 3

–   What is Solr ?
–   Why use Solr ?
–   Current Scenario
–   Scope of Improvement
–   Indexing Data
–   Import MIS Data
–   Challenges
–   Demo
–   Query Example
What is Solr?
Solr is the popular, blazing fast open source enterprise search platform
  from the Apache Lucene project.

     •    Its major features include powerful full-text search, hit
         highlighting, faceted search, dynamic clustering, database
         integration, rich document (e.g., Word, PDF) handling, and
         geospatial search.

     •   Solr is highly scalable, providing distributed search and index
         replication, and it powers the search and navigation features
         of many of the world's largest internet sites.
What is Lucene?
Apache Lucene is a high-performance, full-featured text search engine library
written entirely in Java. It is a technology suitable for nearly any application
that requires full-text search, especially cross-platform.

     –   An open source Java-based IR library with best practice indexing
         and query capabilities, fast and lightweight search and indexing.
     –   100% Java (.NET, Perl and other versions too).
     –   Stable, mature API.
     –   Continuously improved and tuned over more than 10 years.
     –   Cleanly implemented, easy to embed in an application.
     –   Compact, portable index representation.
     –   Programmable text analyzers, spell checking and highlighting.
     –   Not a crawler or a text extraction tool.
Who uses Lucene/Solr?
Here are five noteworthy public sites that use Solr to handle search:

–   WhiteHouse.gov – The Obama administration's keystone web site is
    Drupal and Solr!
–   Netflix – Solr powers basic movie searching on this extremely busy
    site.
–   Internet Archive – Search this vast repository of music, documents
    and video using Solr.
–   StubHub.com – This ticket reseller uses Solr to help visitors search
    for concerts and sporting events.
–   The Smithsonian Institution – Search the Smithsonian’s collection of
    over 4 million items.
Solr indexing options
Why use Solr?
 Assuming the user has a relational DB, why use Solr? If your use case
requires a person to type words into a search box, you want a text search
engine like Solr.

Databases and Solr have complementary strengths and weaknesses.

SQL supports very simple wildcard-based text search with some simple
normalization like matching upper case to lower case. The problem is that
these are full table scans. In Solr all searchable words are stored in an
"inverted index", which searches orders of magnitude faster.

For details, see:
                   –   http://wiki.apache.org/solr/WhyUseSolr
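
To make the difference between a scan and an inverted index concrete, here is a minimal sketch in Python. It is only an illustration, not how Lucene is implemented: the whitespace tokenizer, the sample messages and all function names are assumptions made for this example.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def scan_search(docs, word):
    """Full scan: every document is inspected for every query."""
    return {doc_id for doc_id, text in docs.items()
            if word.lower() in tokenize(text)}


def index_search(index, word):
    """Index lookup: one dictionary access, no pass over the documents."""
    return index.get(word.lower(), set())


# Hypothetical sample data, invented for this sketch.
docs = {
    1: "Your OTP is 4521",
    2: "Payment received thank you",
    3: "Your order has shipped",
}
index = build_inverted_index(docs)
print(sorted(index_search(index, "your")))  # [1, 3]
```

Both functions return the same ids; the difference is that the scan touches every document on each query, while the index does that work once at indexing time.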
Current Scenario
Currently, MIS 3 uses MySQL FULLTEXT search for text-based search, which
lags behind Solr in query speed and text search capability.



     [Diagram: USER, MIS UI (80), MYSQL]

     1. Full text search: USER to MIS UI
     2. Full text search query: MIS UI to MYSQL
     3. Full table scan for text search: inside MYSQL
     4. Result
Scope of Improvement
Instead of querying MySQL for text search, we can deploy Solr in between.
Solr returns results from its inverted index, so querying is fast and
efficient.


     [Diagram: USER, MIS UI (80), SOLR, MYSQL]

     1. Full text search: USER to MIS UI
     2. Full text search query: MIS UI to SOLR
     3. Scan for tokenized text search in inverted index: inside SOLR
     4. Result
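
Step 2 of this flow could look roughly like the sketch below, which only builds the request URL for Solr's select handler and does not send it. The core name `mis`, the field name `text` and the host are assumptions for illustration; `q`, `rows` and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10):
    """Build a URL for Solr's /select handler with a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Hypothetical core ("mis") and field ("text"), assumed for this sketch.
url = build_select_url("http://localhost:8983/solr", "mis", "text:otp")
print(url)
# http://localhost:8983/solr/mis/select?q=text%3Aotp&rows=10&wt=json
```

The MIS UI would issue an HTTP GET against such a URL and read the matching documents from the JSON response.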
Indexing Data in Solr
A Solr index can accept data from many different sources, including XML
files, comma-separated value (CSV) files, data extracted from tables in a
database, and files in common file formats such as Microsoft Word or PDF.

Here are the most common ways of loading data into a Solr index:

    –    Importing data from a structured data store with the Data Import Handler

    –    Using the Solr Cell framework built on Apache Tika for ingesting
         binary files or structured files such as Office, Word, PDF, and other
         proprietary formats.

    –    Uploading XML files by sending HTTP requests to the Solr server
         from any environment where such requests can be generated.

    –    Writing a custom Java application to ingest data through Solr's Java
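
As an illustration of the XML upload option, the sketch below builds a Solr XML update message (`<add>` containing `<doc>` and `<field>` elements) from plain dictionaries. The field names and values are assumptions for this example, and the HTTP POST itself is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; "id" and "text" are assumed field names.
payload = build_add_xml([{"id": "m-1", "text": "Your OTP is 4521"}])
print(payload)
```

The resulting string would be posted to the core's update handler with an XML content type, followed by a commit so that the new documents become searchable.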
Indexing MIS Data
MIS has structured data on the MIS server and structured files on the
services servers, so we can index data in two ways:

    –    Data Import Handler on the MIS database
           • This is easier to manage, as it needs to be deployed only on
               the MIS servers, which are very few.
           • We can import data incrementally (delta import).

    –    Script to import CSV files from services
           • This increases the management and deployment effort for
               scripts on the services servers.
           • Need to implement partial import for DLRLOG data.
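
The idea behind a delta (incremental) import can be sketched as follows: only rows changed since the previous import are sent to the index. The Data Import Handler expresses this with a delta query compared against its recorded last index time. The plain Python below shows just the selection logic; the `updated_at` column and the sample rows are assumptions for this example.

```python
from datetime import datetime


def select_delta(rows, last_import):
    """Return only the rows modified after the previous import."""
    return [row for row in rows if row["updated_at"] > last_import]


# Hypothetical rows; "updated_at" is an assumed column name.
rows = [
    {"messageid": "m-1", "updated_at": datetime(2013, 2, 1, 10, 0)},
    {"messageid": "m-2", "updated_at": datetime(2013, 2, 1, 12, 0)},
]
last_import = datetime(2013, 2, 1, 11, 0)

delta = select_delta(rows, last_import)
print([row["messageid"] for row in delta])  # ['m-2']
```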
Indexing Bean
Solr can also index beans directly. In Services we build a bean for
every MT and DLR, and we can index these directly in Solr.

This could create unnecessary load, as the API would index one bean per
message.


      API 15  --  Index data call per MT and DLR  -->  SOLR
Data Import Handler
We can import data into the Solr index from MySQL in two ways:
 distributed or central
     Distributed                     Central

     MYSQL -- SOLR                   MYSQL --+
     MYSQL -- SOLR                   MYSQL --+-- SOLR
     MYSQL -- SOLR                   MYSQL --+
     MYSQL -- SOLR                   MYSQL --+
Import CSV
We can import data into the Solr index from each service, again in
 two ways: distributed or central
     Service 1 -- SCRIPT -- SOLR
     Service 1 -- SCRIPT -- SOLR
     Service 1 -- SCRIPT -- SOLR
     …..........
     Service 1 -- SCRIPT -- SOLR
Challenges
Every import scenario trades its advantages against some disadvantages
and challenges.

For example:

    –   DIH: the Data Import Handler requires SQL joins to import data
        from mttextlog, mtlog and dlrlog.

           •   Alternatively, we can get the messageid from Solr and query
               MySQL again for the complete data, using an IN clause on
               the message IDs.

    –   CSV import: it requires scripts to be deployed on every service
        server, and a lot of bookkeeping to track which files have been
        processed and which have not.

    –   Bean import: it requires changes at the API level and could result into
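
The DIH alternative above (take the message IDs returned by Solr and fetch the full rows from MySQL with an IN clause) could be sketched like this. The table name `mtlog` comes from the slide; the `messageid` column name and the `%s` placeholder style (used by common Python MySQL drivers) are assumptions, and no database connection is made here.

```python
def build_in_query(message_ids):
    """Build a parameterized IN query for the ids returned by Solr."""
    if not message_ids:
        raise ValueError("message_ids must not be empty")
    # One placeholder per id keeps the values out of the SQL string.
    placeholders = ", ".join(["%s"] * len(message_ids))
    sql = f"SELECT * FROM mtlog WHERE messageid IN ({placeholders})"
    return sql, list(message_ids)


# Hypothetical ids, as they might come back from a Solr query.
sql, params = build_in_query(["m-1", "m-2", "m-3"])
print(sql)     # SELECT * FROM mtlog WHERE messageid IN (%s, %s, %s)
print(params)  # ['m-1', 'm-2', 'm-3']
```

The query and parameter list would then be passed together to the driver's `cursor.execute(sql, params)`.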
Thank You
