Apache Lucene
Full text search
Marcelo
What is it?
• Search library API, created in 1999
• Maintained by the Apache Software Foundation
• Indexing
• Searching
• Written in Java; ports exist for .NET (Lucene.NET) and C++ (CLucene)
Why is it so good?
• Enhances user experience
• More intelligent products
• Fast processing
• Relevance
• Efficient
• Suggestions
Indexing
• IndexWriter
1. Directory implementation
2. Analyzer
• Create documents
• Add the documents to the IndexWriter
• Optimize (merge segments)
• Close the writer
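
The indexing steps above can be sketched with a toy, in-memory model. This is not the Lucene API: `ToyIndexWriter`, `simple_analyzer`, and the dict used as a directory are hypothetical stand-ins chosen only to mirror the order of the steps (directory and analyzer first, then documents, then close). Segment merging is not modeled.

```python
import re
from collections import defaultdict


def simple_analyzer(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndexWriter:
    """Hypothetical stand-in for an index writer; not the Lucene class."""

    def __init__(self, directory, analyzer):
        self.directory = directory  # a plain dict plays the role of the Directory
        self.analyzer = analyzer
        self.pending = []           # documents buffered until close()

    def add_document(self, doc):
        self.pending.append(doc)

    def close(self):
        # Flush buffered documents into term -> document ids postings.
        postings = self.directory.setdefault("postings", defaultdict(set))
        docs = self.directory.setdefault("docs", [])
        for doc in self.pending:
            doc_id = len(docs)
            docs.append(doc)
            for term in self.analyzer(doc["body"]):
                postings[term].add(doc_id)
        self.pending = []


directory = {}
writer = ToyIndexWriter(directory, simple_analyzer)
writer.add_document({"title": "a", "body": "Keep the fire big"})
writer.add_document({"title": "b", "body": "The keeper went home"})
writer.close()
print(sorted(directory["postings"]["the"]))  # prints [0, 1]
```

In this sketch the analyzer decides which terms exist in the index, which is why the same analyzer is normally reused at query time.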
Searching
• Directory
• IndexSearcher
• QueryParser
• Query(“my search”)
• TopDocs
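
The search flow above (parse the query, look terms up in the index, return the top documents) can be illustrated with a small stand-alone sketch. The names `build_index` and `search` are hypothetical, and ranking by raw term frequency is a simplifying assumption; Lucene's real scoring is more elaborate.

```python
import re
from collections import Counter


def analyze(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to a Counter of {doc_id: term frequency}."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            index.setdefault(term, Counter())[doc_id] += 1
    return index


def search(index, query_text, n=10):
    """Analyze the query like the documents, then rank by summed term frequency."""
    scores = Counter()
    for term in analyze(query_text):
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties broken by lower document id.
    return sorted(scores.items(), key=lambda hit: (-hit[1], hit[0]))[:n]


docs = [
    "my search engine",
    "search again",
    "nothing relevant here",
]
index = build_index(docs)
top_docs = search(index, "my search")
print(top_docs)  # prints [(0, 2), (1, 1)]
```

Each hit is a `(doc_id, score)` pair, loosely analogous to the ranked results a TopDocs object carries.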
How does it work?
• Inverted index
• Term normalization
1. Similar words (merge)
2. Stop words (remove)
3. Result: better relevance, smaller index on disk
Term     Document IDs
And      1, 2, 3
Big      2, 4, 7
Fire     1
Keep     7, 8
keeper   3, 4
the      1, 8
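
A minimal sketch of an inverted index with term normalization, assuming a tiny illustrative stop-word set and a deliberately crude suffix stripper (both are assumptions for this example, not Lucene's lists or stemmers). It shows how merging similar words and removing stop words leaves fewer terms to store.

```python
import re

STOP_WORDS = {"and", "the", "to", "a", "of"}  # illustrative, not Lucene's list


def normalize(token):
    """Lowercase and strip a few suffixes so similar words merge (crude stemming)."""
    token = token.lower()
    for suffix in ("ers", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_inverted_index(docs):
    """Return {term: sorted list of document ids} for a {doc_id: text} mapping."""
    index = {}
    for doc_id, text in docs.items():
        for token in re.findall(r"[A-Za-z]+", text):
            if token.lower() in STOP_WORDS:
                continue  # stop words never reach the index
            index.setdefault(normalize(token), set()).add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(index.items())}


docs = {
    1: "And the fire",
    3: "And the keeper",
    7: "Keep big",
    8: "Keep the",
}
index = build_inverted_index(docs)
print(index)  # prints {'big': [7], 'fire': [1], 'keep': [3, 7, 8]}
```

Here "Keep" and "keeper" collapse into one entry and "And"/"the" disappear, so three terms are stored instead of six.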
Analyzers
“@Andy52 went to school yesterday!”
• StandardAnalyzer (with English stop words)
[andy52] [went] [school] [yesterday]
• StopAnalyzer
[andy] [went] [school] [yesterday]
• SimpleAnalyzer
[andy] [went] [to] [school] [yesterday]
• WhitespaceAnalyzer
[@Andy52] [went] [to] [school] [yesterday!]
• KeywordAnalyzer
[@Andy52 went to school yesterday!]
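
The differences between the analyzers can be approximated in a few lines. These functions are rough imitations written for illustration, not the Lucene classes; the stop-word set is an assumed subset, and `standard_like_analyzer` only mimics the general behavior (strip punctuation, lowercase, drop stop words).

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in"}  # illustrative subset


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def keyword_analyzer(text):
    """Treat the whole input as a single token."""
    return [text]


def simple_analyzer(text):
    """Split at non-letters and lowercase."""
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def stop_analyzer(text):
    """Like simple_analyzer, then drop stop words."""
    return [t for t in simple_analyzer(text) if t not in STOP_WORDS]


def standard_like_analyzer(text):
    """Keep letters and digits together, lowercase, drop stop words."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


text = "@Andy52 went to school yesterday!"
for analyzer in (standard_like_analyzer, stop_analyzer, simple_analyzer,
                 whitespace_analyzer, keyword_analyzer):
    print(analyzer.__name__, analyzer(text))
```

Running it prints one token list per analyzer, which makes it easy to compare how each choice affects what ends up in the index.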
Which well-known apps use it?
• Twitter
• LinkedIn
• Myspace
That’s all, thanks!
