Full-text searching with Marjory

Markus Wolff




                     
What's Marjory?

    A web service for full-text indexing and searching of documents
    Written in PHP
    Based on Zend Framework
    (Very) roughly comparable to Solr
    BSD-licensed, available on Google Code
    




                                
How does Marjory work?

    Your application
        Sends document data or location via POST
        Sends search terms via GET
        Receives the result in the desired output format (default: XML)

    Marjory (REST-based web service)
        Stores document data in the search engine
        Queries the search engine
        Receives the query results

    Search engine (default: Lucene)

Features

    Search engine abstraction
        Use the engine that suits your needs; just write a small adaptor class
        Zend_Search_Lucene built in by default
    Multiple search catalogs
        Index many sites with one dedicated search server
        Put all documents matching criteria of your choice into separate search indexes to speed up search
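
The adaptor idea can be sketched roughly as follows. This is only an illustration in Python, not Marjory's actual PHP code: the class and method names (SearchEngineAdapter, add_document, search) and the toy in-memory engine are assumptions made up for this example.

```python
from abc import ABC, abstractmethod


class SearchEngineAdapter(ABC):
    """Hypothetical adaptor interface; Marjory's real class names may differ."""

    @abstractmethod
    def add_document(self, catalog, uri, fields):
        """Store one document (a dict of field name -> text) in a catalog."""

    @abstractmethod
    def search(self, catalog, query, limit=10):
        """Return the URIs of matching documents in a catalog."""


class InMemoryAdapter(SearchEngineAdapter):
    """Toy engine: one dict of documents per catalog, naive substring matching."""

    def __init__(self):
        self.catalogs = {}

    def add_document(self, catalog, uri, fields):
        self.catalogs.setdefault(catalog, {})[uri] = dict(fields)

    def search(self, catalog, query, limit=10):
        needle = query.lower()
        hits = [
            uri
            for uri, fields in self.catalogs.get(catalog, {}).items()
            if any(needle in str(value).lower() for value in fields.values())
        ]
        return hits[:limit]


# Each catalog is a separate index, so a search only touches its own documents.
engine = InMemoryAdapter()
engine.add_document("default", "doc-1", {"title": "Marjory: Search as a service"})
engine.add_document("shop", "doc-2", {"title": "Something else"})
hits = engine.search("default", "marjory")
```

The service code would only talk to the abstract interface, so swapping the engine means writing one more subclass.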

                                     
More features

    Two ways to index documents:
        Submit an XML snippet containing any content you want to index
        Or just submit a URI (a valid PHP stream resource) and let Marjory extract the content from the document
            HTML supported by default (for now)
            Add your own document parser class to extract plain text from any other document format (or special markup structures)
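
What such a document parser has to do can be sketched in a few lines. The example below is a Python illustration built on the standard library's html.parser, not Marjory's own parser; the names HtmlTextExtractor and extract_plain_text are hypothetical.

```python
from html.parser import HTMLParser


class HtmlTextExtractor(HTMLParser):
    """Collects visible text and skips the content of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_plain_text(html):
    """Return the plain text of an HTML document, ready for indexing."""
    parser = HtmlTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Marjory</h1><p>Search as a service</p></body></html>"
)
plain = extract_plain_text(sample)
```

A parser for another format would expose the same kind of "document in, plain text out" function.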
                                         
Even more features

    Index documents asynchronously using Dropr as a messaging service
        Dropr: PHP-based durable messaging service
        Example web service and Dropr client included with Marjory
        The application does not need to wait for document retrieval, parsing and adding to the index
        More info: www.dropr.org
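
The benefit of asynchronous indexing can be illustrated with a small in-process stand-in. This sketch uses Python's queue and threading modules and is not Dropr: it is neither durable nor distributed, and every name in it is made up for the example. It only shows that the caller hands off a URI and moves on while a worker does the slow part.

```python
import queue
import threading

index_queue = queue.Queue()
indexed = []  # stands in for the search index


def worker():
    """Take URIs off the queue and 'index' them until the sentinel arrives."""
    while True:
        uri = index_queue.get()
        if uri is None:  # sentinel: no more work
            index_queue.task_done()
            break
        # A real worker would retrieve, parse and index the document here.
        indexed.append(uri)
        index_queue.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The application only enqueues; put() returns without waiting for indexing.
for uri in ["http://my.website.tld/a.html", "http://my.website.tld/b.html"]:
    index_queue.put(uri)

index_queue.put(None)  # ask the worker to stop
index_queue.join()     # wait here only so the example can inspect the result
thread.join()
```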
        




                                   
Latest additions

    Search results as a Dojo.Data-compatible JSON data source
    API exposure via JSON-RPC as an alternative to XML over REST (experimental!)




                               
How to add a catalog

    Send a POST request to:
    http://marjory.example.com/rest/catalog/
    Containing this XML snippet:
    <add catalog="MyGloriousCatalog" />
    Et voilà, you've got yourself a new search index
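
From a client's point of view, the request above might be built like this. It is a Python sketch using only the standard library; the Content-Type header is an assumption (the slides do not say which one Marjory expects), and the host is the placeholder from the slide.

```python
import urllib.request
from xml.sax.saxutils import quoteattr

BASE_URL = "http://marjory.example.com"  # placeholder host from the slides


def build_add_catalog_request(name):
    """Build (but do not send) the POST request that creates a catalog."""
    # quoteattr() adds the surrounding quotes and escapes special characters.
    body = "<add catalog={} />".format(quoteattr(name))
    return urllib.request.Request(
        BASE_URL + "/rest/catalog/",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/xml"},  # assumed header
        method="POST",
    )


request = build_add_catalog_request("MyGloriousCatalog")
# To actually send it: urllib.request.urlopen(request)
```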
    




                               
Adding a document

    Make a POST request to:
    http://marjory.example.com/rest/add/
    Send the document content as XML like this:

    <add catalog="default">
      <doc uri="MyUniqueDocumentId">
        <field name="title">Marjory: Search as a service</field>
        <field name="abstract">
          An epic novel about full-text indexing in an SOA environment
        </field>
        <field name="content">Lorem ipsum dolor sit amet... (to be continued)</field>
      </doc>
    </add>
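
When the field values come from real content, it is safer to let an XML library do the escaping than to concatenate strings. A minimal Python sketch, assuming only the element and attribute names shown in the snippet above; the helper name build_add_document_xml is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_add_document_xml(catalog, uri, fields):
    """Return the <add> payload for one document as a string."""
    add = ET.Element("add", {"catalog": catalog})
    doc = ET.SubElement(add, "doc", {"uri": uri})
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


payload = build_add_document_xml(
    "default",
    "MyUniqueDocumentId",
    {
        "title": "Tom & Jerry <uncut>",
        "content": "Lorem ipsum dolor sit amet...",
    },
)
```

The resulting string would then be sent as the body of the POST request to /rest/add/.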


                                                
Adding a document, the easy way

    Or, if Marjory should retrieve and parse the document:

    <add catalog="default">
      <doc src="http://my.website.tld/my/document.html" />
    </add>

    If you have many and/or complex documents, it is better to use Dropr to send messages to Marjory


                                
Searching for documents

    Make a GET request including the query terms:
    http://marjory.example.com/rest/select?q=Marjory
    Additional parameters to...
        Limit the number of results
        Include only specific fields in the response
        Specify a search catalog
            Default catalog name: "default" (who would have guessed?)
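
Building the query URL with proper encoding might look like the sketch below. Only the q parameter appears in the slides; the names rows, fl and catalog used here for the additional parameters are assumptions (the first two borrowed from Solr's conventions) and should be checked against Marjory itself.

```python
from urllib.parse import urlencode

BASE_URL = "http://marjory.example.com"  # placeholder host from the slides


def build_search_url(query, limit=None, fields=None, catalog=None):
    """Return the GET URL for a search; optional parameters are hypothetical."""
    params = {"q": query}
    if limit is not None:
        params["rows"] = limit            # assumed name for the result limit
    if fields:
        params["fl"] = ",".join(fields)   # assumed name for the field list
    if catalog:
        params["catalog"] = catalog       # assumed name for the catalog
    return BASE_URL + "/rest/select?" + urlencode(params)


url = build_search_url(
    "full-text search", limit=5, fields=["id", "name"], catalog="default"
)
```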


                                         
Search response format

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <responseHeader>
    <status>0</status><QTime>1</QTime>
  </responseHeader>

  <result numFound="2" start="0">
   <doc>
    <str name="id">MA147LL/A</str>
    <str name="name">Apple 60 GB iPod Black</str>
   </doc>
   <doc>
    <str name="id">EN7800GTX/2DHTV/256M</str>
    <str name="name">ASUS Extreme N7800GTX</str>
   </doc>
  </result>
</response>
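
A client could turn this response into plain data structures along the following lines. The sketch is Python with the standard library's ElementTree and relies only on the element and attribute names visible in the sample above; the function name and the shape of the returned dict are choices made for this example.

```python
import xml.etree.ElementTree as ET

RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
  <responseHeader>
    <status>0</status><QTime>1</QTime>
  </responseHeader>
  <result numFound="2" start="0">
   <doc>
    <str name="id">MA147LL/A</str>
    <str name="name">Apple 60 GB iPod Black</str>
   </doc>
   <doc>
    <str name="id">EN7800GTX/2DHTV/256M</str>
    <str name="name">ASUS Extreme N7800GTX</str>
   </doc>
  </result>
</response>
"""


def parse_search_response(xml_bytes):
    """Extract status, counts and documents from a search response."""
    root = ET.fromstring(xml_bytes)
    result = root.find("result")
    docs = [
        {field.get("name"): field.text for field in doc.findall("str")}
        for doc in result.findall("doc")
    ]
    return {
        "status": int(root.findtext("responseHeader/status")),
        "num_found": int(result.get("numFound")),
        "start": int(result.get("start")),
        "docs": docs,
    }


parsed = parse_search_response(RESPONSE)
```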


                                             
Looks familiar?

    Blatantly stolen from Solr :-)
    Why reinvent the wheel?
    Makes switching between the two projects easy if need be
    Don't like it? Try JSON-RPC instead.
    




                                 
Access control

    No access control provided by Marjory
    Use your web server's authentication and ACL capabilities
    There are currently no plans to add anything built in, unless someone convinces me otherwise :-)



                               
Things to do

    Fully unit-test the beast
    Add a nice admin GUI (currently in progress)
    Add other engines
    Support more document formats out of the box (PDF likely to be the next addition)
    Fine-tuning (how about renaming or removing catalogs, for example?)

                                 
Is it production-ready?

    Yes, and it's already being used on production websites




                               
That's all, folks!

    More information:
        http://code.google.com/p/marjory/
        http://www.dropr.org/
        http://blog.wolff-hamburg.de/
        




                                     

