Ferret
A Ruby Search Engine
  Brian Sam-Bodden
Agenda

• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in your App
• Ferret in Rails
• Resources
What is Ferret?

• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Apache Lucene Search Engine
• Ported to Ruby by David Balmain
• Initially a 100% pure Ruby port
• Since 0.9 many core functions are
  implemented in C

• Fast! Now Faster than Lucene ;-)
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
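
The containment hierarchy above can be pictured with plain Ruby data structures. This is only an illustrative sketch, not Ferret's internal representation: the sample documents and the `terms_for` helper are made up for the example, and splitting on whitespace stands in for real analysis.

```ruby
# Illustrative model only; Ferret's real storage is far more compact.
# An index is a sequence of documents.
SAMPLE_INDEX = [
  # A document is a sequence of named fields.
  { title: "Ferret in Rails", body: "search your models" },
  { title: "Lucene basics",   body: "search and index" }
].freeze

# A term is a text string keyed by the field it came from, so
# "search" in :title and "search" in :body are different terms.
def terms_for(document)
  document.flat_map do |field_name, text|
    text.downcase.split.map { |word| [field_name, word] }
  end
end

first_terms = terms_for(SAMPLE_INDEX.first)
# first_terms includes [:title, "ferret"] and [:body, "search"]
```

Keeping the field name inside each term is what lets a query target one field without matching the same word in another.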
Fields of a Document in an Index

• Fields are individually searchable units
  that can be:
  • Stored: The original Terms of the field are stored
  • Indexed: Inverted to rapidly find all Documents
    containing any of the Terms
  • Tokenized: Individual Terms extracted are
    indexed
  • Vectored: Frequency and location of Terms are
    stored
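
To make the four properties concrete, here is a small pure-Ruby sketch of what each one would keep for a single field value. It does not use Ferret: `index_field` and its option names are hypothetical, and whitespace splitting stands in for a real analyzer.

```ruby
# Hypothetical helper, not part of Ferret: shows what each field
# property would keep for one field value of one document.
def index_field(doc_id, text, stored: true, indexed: true,
                tokenized: true, vectored: true)
  # Tokenized: break the value into individual terms; otherwise the
  # whole value is treated as a single term.
  terms = tokenized ? text.downcase.split : [text]

  entry = {}
  # Stored: keep the original value so it can be returned with results.
  entry[:stored] = text if stored
  # Indexed: inverted mapping from each term to the documents containing it.
  entry[:postings] = terms.uniq.to_h { |term| [term, [doc_id]] } if indexed

  if vectored
    # Vectored: per-term frequency and positions within this field.
    vector = {}
    terms.each_with_index do |term, position|
      stats = (vector[term] ||= { freq: 0, positions: [] })
      stats[:freq] += 1
      stats[:positions] << position
    end
    entry[:term_vector] = vector
  end
  entry
end

field = index_field(0, "to be or not to be")
# field[:term_vector]["to"] is { freq: 2, positions: [0, 4] }
```

An identifier-like value such as a city name or product code would typically be indexed with `tokenized: false` in this sketch, so it only matches as a whole.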
It’s all about Indexing

• Indexing is the processing of a source
  document into plain text tokens that Ferret
  can manipulate
• For any non-plaintext sources such as PDF,
  Word, Excel you need to:
  • Extract
  • Analyze
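
The two steps can be sketched in plain Ruby. Real extraction from PDF, Word, or Excel needs a format-specific tool, which is out of scope here; the `extract` method below only strips simple angle-bracket markup as a stand-in, and `analyze`, `STOP_WORDS`, and the sample input are assumptions made for illustration rather than Ferret's own analyzers.

```ruby
# Stand-in stop word list; real analyzers ship their own.
STOP_WORDS = %w[a an and the of to is].freeze

# Extract: reduce a source to plain text. Only simple tag stripping
# is shown; binary formats need a dedicated extractor.
def extract(source)
  source.gsub(/<[^>]*>/, " ").squeeze(" ").strip
end

# Analyze: turn plain text into normalized tokens ready for indexing.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

tokens = analyze(extract("<h1>The Ferret</h1><p>A Ruby search engine</p>"))
# tokens is ["ferret", "ruby", "search", "engine"]
```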
Installing Ferret

gem install ferret

• Pick the latest version for your platform
The Recipe

1. Create some Documents

2. Create an Index

3. Adding Documents to the Index

4. Perform some Queries
Example Documents
  Create some Documents

"Any String is a Document"

["This", "is also", "a document"]
Ferret::Index::Index
  Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• An Index can be persistent
 ➡ index = Ferret::I.new(:path => '/somepath')
• Or completely in memory
 ➡ index = Ferret::I.new
Ferret::Index::Index
  Adding Documents to the Index

• Index provides the add_document method
• It also provides the << alias
• Adding documents is then as easy as:
 ➡ index << "This is a document"
 ➡ index << {:first => "Bob", :last => "Smith"}
Ferret::Index::Index
  Perform some Queries

• Index provides the search and
  search_each methods
• The search method takes a query and an
  optional set of parameters:
 ➡ search(query, options = {})
• The search_each method provides an
  iterator block
 ➡ search_each(query, options = {}) {|doc, score| ... }
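
To show the shape of these calls without requiring the gem, the sketch below defines a toy in-memory class with the same `<<` and `search_each` entry points. `TinyIndex` is hypothetical and far simpler than `Ferret::Index::Index`: it only handles whitespace-separated terms, ignores the options hash, and scores a document by how many query terms it contains. That scoring rule is an assumption made for the example, not Ferret's.

```ruby
# Toy stand-in for illustration only; not Ferret's implementation.
class TinyIndex
  def initialize
    @docs = []                                        # stored documents
    @postings = Hash.new { |h, term| h[term] = [] }   # term => doc ids
  end

  # Add a document (a String, or a Hash of field => text).
  def <<(doc)
    doc_id = @docs.size
    @docs << doc
    text = doc.is_a?(Hash) ? doc.values.join(" ") : doc.to_s
    text.downcase.split.uniq.each { |term| @postings[term] << doc_id }
    self
  end

  # Look up a stored document by its id.
  def [](doc_id)
    @docs[doc_id]
  end

  # Yield each matching document id with its score, best first,
  # and return the number of matching documents.
  def search_each(query, _options = {})
    scores = Hash.new(0)
    query.downcase.split.each do |term|
      @postings.fetch(term, []).each { |doc_id| scores[doc_id] += 1 }
    end
    scores.sort_by { |doc_id, score| [-score, doc_id] }
          .each { |doc_id, score| yield doc_id, score }
    scores.size
  end
end

index = TinyIndex.new
index << "This is a document" << { first: "Bob", last: "Smith" }
index.search_each("bob document") { |doc, score| puts "#{doc}: #{score}" }
```

As in the slides, the block receives a document id rather than the document itself, so the stored content is fetched separately with `index[doc]`.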
Playing with Ferret in irb
Ferret Query Language

• Ferret's own query language, FQL, is a
  powerful way to specify search queries

• FQL supports many query types,
  including:

     • Term         • Range
     • Phrase       • Wildcard
     • Field        • Fuzzy
     • Boolean
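
As a rough guide, the strings below show one example of each query type, written from memory of Ferret's query parser syntax. The field names (`title`, `content`, `date`) are made up, and the exact syntax may vary between Ferret versions, so check the query parser documentation before relying on them.

```ruby
# Example FQL strings, one per query type (field names are hypothetical).
FQL_EXAMPLES = {
  term:     'ferret',
  phrase:   '"ruby search engine"',
  field:    'title:ferret',
  boolean:  '+ruby -java',
  range:    'date:[20060101 20061231]',
  wildcard: 'content:fer*',
  fuzzy:    'content:feret~'
}.freeze

FQL_EXAMPLES.each { |type, query| puts format('%-9s %s', type, query) }
```

Each string is meant to be passed as the query argument to `search` or `search_each`.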
Index.explain

• The explain method of Index describes
  how a document scores against a query
 • Very useful for debugging
 • and for learning how Ferret works
Ferret in your App

[Diagram of an Application built around a Ferret Index. Data sources: Database, Web, File System, Manual Input. Steps: Gather Data, Index Documents, Get User's Query, Search Index, Present Search Results. Actor: User.]
Ferret in Rails

• Acts As Ferret is an ActiveRecord
  extension

• Available as a plugin
• Provides a simplified interface to
  Ferret
• Maintained by Jens Krämer
• Adding an index to an ActiveRecord
  model is as simple as:
• A simple model has two searchable
  fields, title and body:
• After a quick rake db:migrate we now
  have some data to play with
• Fire up the Rails Console and let’s see
  what acts_as_ferret can do for our
  models
Want more?

• Ferret is improving constantly
• Acts As Ferret seems to catch up
  quickly

• Real-life usage seems to require some
  good engineering on your part

  • Background indexing
  • Hot swap of indexes?
• We only covered the simplest
  constructs in Ferret

• Ferret’s API provides enough
  flexibility for the most demanding
  searching needs
Online Resources

• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkraemer.net/acts_as_ferret
In-Print Resources
Thanks!
