SlideShare uma empresa Scribd logo
1 de 26
Lucene Introduction Otis Gospodnetic, Sematext Int’l @otisg [email_address] http://jroller.com/otis http://sematext.com/
About Otis ,[object Object],[object Object],[object Object],[object Object],[object Object]
What is Lucene? ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
What Lucene Ain’t ,[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
The Lucene Family ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Integration Data Source Data Source Gather Parse Make Doc Search UI Search App e.g. webapp Search Index Index Otis Gospodnetic, Sematext Int’l
Integration: Rich Doc Indexing HTML PDF Gather Make Doc Index Index MS Word PDF Parse with Tika Otis Gospodnetic, Sematext Int’l
Lucene Strengths ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Query Types ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Query Syntax ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Code: FS Indexer Otis Gospodnetic, Sematext Int’l private  IndexWriter  writer; public Indexer(String indexDir) throws IOException { Directory dir =  FSDirectory.open (new File(indexDir)); writer =  new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT),  true, IndexWriter.MaxFieldLength.UNLIMITED); } public void close() throws IOException { writer.close(); } public void index(String dataDir, FileFilter filter) throws Exception { File[] files = new File(dataDir).listFiles(); for (File f: files) { Document doc =  new Document(); doc.add(new Field("contents", new FileReader(f))); doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED)); writer.addDocument(doc); } }
Indexing Pipeline Otis Gospodnetic, Sematext Int’l Tokenizer TokenFilter Document Document Writer Inverted Index add
Indexer Pipeline: Analysis Source: Lucene in Action Otis Gospodnetic, Sematext Int’l ,[object Object],[object Object]
Analysis in Action Otis Gospodnetic, Sematext Int’l " The quick brown fox jumped over the lazy dogs " WhitespaceAnalyzer  : [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] SimpleAnalyzer  : [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] StopAnalyzer  : [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]  StandardAnalyzer : [quick] [brown] [fox] [jumped] [over] [lazy] [dogs]  " XY&Z Corporation - xyz@example.com " WhitespaceAnalyzer : [XY&Z] [Corporation] [-] [xyz@example.com]  SimpleAnalyzer : [xy] [z] [corporation] [xyz] [example] [com]  StopAnalyzer : [xy] [z] [corporation] [xyz] [example] [com]  StandardAnalyzer : [xy&z] [corporation] [xyz@example.com]
Field Options ,[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Inverted Index Source: developer.apple.com Otis Gospodnetic, Sematext Int’l
Index Directory ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Code: Searcher Otis Gospodnetic, Sematext Int’l public void search(String indexDir, String q) throws IOException, ParseException { Directory dir =  FSDirectory.open (new File(indexDir)); IndexSearcher is =  new IndexSearcher(dir, true); QueryParser parser =  new QueryParser(&quot;contents&quot;, new  StandardAnalyzer(Version.LUCENE_CURRENT)); Query query = parser.parse(q); TopDocs hits = is.search(query, 10); System.err.println(&quot;Found &quot; + hits.totalHits + &quot; document(s)&quot;); for (int i=0; i<hits.scoreDocs.length; i++) { ScoreDoc scoreDoc = hits.scoreDocs[i]; Document doc = is.doc(scoreDoc.doc); System.out.println( doc.get(&quot;filename&quot;) ); } is.close(); }
Code: Doc Deletion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Code: Doc Updates Otis Gospodnetic, Sematext Int’l void  updateDocument(Term  term, Document  doc, Analyzer analyzer)             Updates a document by first deleting the document(s) containing term and then adding the new document.   void Via  IndexWriter  facade void  updateDocument(Term term, Document doc)            Updates a document by first deleting the document(s) containing term and then adding the new document.   void
Pitfalls ,[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Performance Tips ,[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Lucene 2.9 & 3.0 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Community [email_address] [email_address] Otis Gospodnetic, Sematext Int’l &quot;I posted, went to get a sandwich, and came back to see two answers.  The change works, and I can get the fix into production today. This list is magic.&quot;
Resources ,[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l
Contact ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Otis Gospodnetic, Sematext Int’l

Mais conteúdo relacionado

Mais procurados

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리Junyi Song
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginsearchbox-com
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
第45回elasticsearch勉強会 BERTモデルを利用した文書分類
第45回elasticsearch勉強会 BERTモデルを利用した文書分類第45回elasticsearch勉強会 BERTモデルを利用した文書分類
第45回elasticsearch勉強会 BERTモデルを利用した文書分類shinhiguchi
 
Elasticsearchプラグインの作り方
Elasticsearchプラグインの作り方Elasticsearchプラグインの作り方
Elasticsearchプラグインの作り方Shinsuke Sugaya
 
Azure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etc
Azure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etcAzure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etc
Azure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etcYoichi Kawasaki
 
Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」
Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」
Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」Hibino Hisashi
 
Made to Measure: Ranking Evaluation using Elasticsearch
Made to Measure: Ranking Evaluation using ElasticsearchMade to Measure: Ranking Evaluation using Elasticsearch
Made to Measure: Ranking Evaluation using ElasticsearchDaniel Schneiter
 
What's New in the Elastic 8.4 Release
What's New in the Elastic 8.4 ReleaseWhat's New in the Elastic 8.4 Release
What's New in the Elastic 8.4 ReleaseShotaro Suzuki
 
엘라스틱 서치 세미나
엘라스틱 서치 세미나엘라스틱 서치 세미나
엘라스틱 서치 세미나종현 김
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문SeungHyun Eom
 
Rest & RESTful WebServices
Rest & RESTful WebServicesRest & RESTful WebServices
Rest & RESTful WebServicesPrateek Tandon
 
DeBERTA : Decoding-Enhanced BERT with Disentangled Attention
DeBERTA : Decoding-Enhanced BERT with Disentangled AttentionDeBERTA : Decoding-Enhanced BERT with Disentangled Attention
DeBERTA : Decoding-Enhanced BERT with Disentangled Attentiontaeseon ryu
 
Search and analyze your data with elasticsearch
Search and analyze your data with elasticsearchSearch and analyze your data with elasticsearch
Search and analyze your data with elasticsearchAnton Udovychenko
 

Mais procurados (20)

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리elasticsearch_적용 및 활용_정리
elasticsearch_적용 및 활용_정리
 
RDF Data Model
RDF Data ModelRDF Data Model
RDF Data Model
 
Tutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component pluginTutorial on developing a Solr search component plugin
Tutorial on developing a Solr search component plugin
 
The Elastic ELK Stack
The Elastic ELK StackThe Elastic ELK Stack
The Elastic ELK Stack
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
第45回elasticsearch勉強会 BERTモデルを利用した文書分類
第45回elasticsearch勉強会 BERTモデルを利用した文書分類第45回elasticsearch勉強会 BERTモデルを利用した文書分類
第45回elasticsearch勉強会 BERTモデルを利用した文書分類
 
Elasticsearchプラグインの作り方
Elasticsearchプラグインの作り方Elasticsearchプラグインの作り方
Elasticsearchプラグインの作り方
 
Word embedding
Word embedding Word embedding
Word embedding
 
Azure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etc
Azure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etcAzure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etc
Azure Search 言語処理関連機能 〜 アナライザー、検索クエリー、辞書、& ランキング, etc
 
Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」
Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」
Elastic Cloudを活用!!ゼロトラストセキュリティの「はじめの一歩」
 
Made to Measure: Ranking Evaluation using Elasticsearch
Made to Measure: Ranking Evaluation using ElasticsearchMade to Measure: Ranking Evaluation using Elasticsearch
Made to Measure: Ranking Evaluation using Elasticsearch
 
What's New in the Elastic 8.4 Release
What's New in the Elastic 8.4 ReleaseWhat's New in the Elastic 8.4 Release
What's New in the Elastic 8.4 Release
 
Azure Search 大全
Azure Search 大全Azure Search 大全
Azure Search 大全
 
엘라스틱 서치 세미나
엘라스틱 서치 세미나엘라스틱 서치 세미나
엘라스틱 서치 세미나
 
JSON in Solr: from top to bottom
JSON in Solr: from top to bottomJSON in Solr: from top to bottom
JSON in Solr: from top to bottom
 
Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문Elastic Search (엘라스틱서치) 입문
Elastic Search (엘라스틱서치) 입문
 
Rest & RESTful WebServices
Rest & RESTful WebServicesRest & RESTful WebServices
Rest & RESTful WebServices
 
DeBERTA : Decoding-Enhanced BERT with Disentangled Attention
DeBERTA : Decoding-Enhanced BERT with Disentangled AttentionDeBERTA : Decoding-Enhanced BERT with Disentangled Attention
DeBERTA : Decoding-Enhanced BERT with Disentangled Attention
 
Search and analyze your data with elasticsearch
Search and analyze your data with elasticsearchSearch and analyze your data with elasticsearch
Search and analyze your data with elasticsearch
 

Destaque

Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)dnaber
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Edureka!
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?lucenerevolution
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Edureka!
 

Destaque (8)

Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)Apache Lucene: Searching the Web and Everything Else (Jazoon07)
Apache Lucene: Searching the Web and Everything Else (Jazoon07)
 
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
Hadoop Ecosystem | Big Data Analytics Tools | Hadoop Tutorial | Edureka
 
Lucene, Apache
Lucene, ApacheLucene, Apache
Lucene, Apache
 
Apache lucene
Apache luceneApache lucene
Apache lucene
 
Lucene basics
Lucene basicsLucene basics
Lucene basics
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
Big Data Tutorial For Beginners | What Is Big Data | Big Data Tutorial | Hado...
 
SlideShare 101
SlideShare 101SlideShare 101
SlideShare 101
 

Semelhante a Lucene Introduction

Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAsad Abbas
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersBen van Mol
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation EnginesTrey Grainger
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)Kira
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesCorley S.r.l.
 
Elasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analyticsElasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analyticsTiziano Fagni
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Using Thinking Sphinx with rails
Using Thinking Sphinx with railsUsing Thinking Sphinx with rails
Using Thinking Sphinx with railsRishav Dixit
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화Henry Jeong
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화NAVER D2
 

Semelhante a Lucene Introduction (20)

Advanced full text searching techniques using Lucene
Advanced full text searching techniques using LuceneAdvanced full text searching techniques using Lucene
Advanced full text searching techniques using Lucene
 
Building a Search Engine Using Lucene
Building a Search Engine Using LuceneBuilding a Search Engine Using Lucene
Building a Search Engine Using Lucene
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Apache Lucene Searching The Web
Apache Lucene Searching The WebApache Lucene Searching The Web
Apache Lucene Searching The Web
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Lucene in Action
Lucene in ActionLucene in Action
Lucene in Action
 
Dapper
DapperDapper
Dapper
 
ElasticSearch for .NET Developers
ElasticSearch for .NET DevelopersElasticSearch for .NET Developers
ElasticSearch for .NET Developers
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Tutorial 5 (lucene)
Tutorial 5 (lucene)Tutorial 5 (lucene)
Tutorial 5 (lucene)
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Fast track to lucene
Fast track to luceneFast track to lucene
Fast track to lucene
 
Elasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analyticsElasticsearch, a distributed search engine with real-time analytics
Elasticsearch, a distributed search engine with real-time analytics
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Using Thinking Sphinx with rails
Using Thinking Sphinx with railsUsing Thinking Sphinx with rails
Using Thinking Sphinx with rails
 
[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화[2 d1] elasticsearch 성능 최적화
[2 d1] elasticsearch 성능 최적화
 
[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화[2D1]Elasticsearch 성능 최적화
[2D1]Elasticsearch 성능 최적화
 

Mais de otisg

Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)otisg
 
Lucandra
LucandraLucandra
Lucandraotisg
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Luceneotisg
 
UIMA
UIMAUIMA
UIMAotisg
 
Probabilistic Retrieval
Probabilistic RetrievalProbabilistic Retrieval
Probabilistic Retrievalotisg
 
Faceted Search and Solr
Faceted Search and SolrFaceted Search and Solr
Faceted Search and Solrotisg
 

Mais de otisg (6)

Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)Search at Tumblr (nyc search meetup)
Search at Tumblr (nyc search meetup)
 
Lucandra
LucandraLucandra
Lucandra
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Lucene
 
UIMA
UIMAUIMA
UIMA
 
Probabilistic Retrieval
Probabilistic RetrievalProbabilistic Retrieval
Probabilistic Retrieval
 
Faceted Search and Solr
Faceted Search and SolrFaceted Search and Solr
Faceted Search and Solr
 

Último

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Último (20)

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 

Lucene Introduction

  • 1. Lucene Introduction Otis Gospodnetic, Sematext Int’l @otisg [email_address] http://jroller.com/otis http://sematext.com/
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. Integration Data Source Data Source Gather Parse Make Doc Search UI Search App e.g. webapp Search Index Index Otis Gospodnetic, Sematext Int’l
  • 7. Integration: Rich Doc Indexing HTML PDF Gather Make Doc Index Index MS Word PDF Parse with Tika Otis Gospodnetic, Sematext Int’l
  • 8.
  • 9.
  • 10.
  • 11. Code: FS Indexer Otis Gospodnetic, Sematext Int’l private IndexWriter writer; public Indexer(String indexDir) throws IOException { Directory dir = FSDirectory.open (new File(indexDir)); writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_CURRENT), true, IndexWriter.MaxFieldLength.UNLIMITED); } public void close() throws IOException { writer.close(); } public void index(String dataDir, FileFilter filter) throws Exception { File[] files = new File(dataDir).listFiles(); for (File f: files) { Document doc = new Document(); doc.add(new Field(&quot;contents&quot;, new FileReader(f))); doc.add(new Field(&quot;filename&quot;, f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED)); writer.addDocument(doc); } }
  • 12. Indexing Pipeline Otis Gospodnetic, Sematext Int’l Tokenizer TokenFilter Document Document Writer Inverted Index add
  • 13.
  • 14. Analysis in Action Otis Gospodnetic, Sematext Int’l &quot; The quick brown fox jumped over the lazy dogs &quot; WhitespaceAnalyzer : [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] SimpleAnalyzer : [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dogs] StopAnalyzer : [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] StandardAnalyzer : [quick] [brown] [fox] [jumped] [over] [lazy] [dogs] &quot; XY&Z Corporation - xyz@example.com &quot; WhitespaceAnalyzer : [XY&Z] [Corporation] [-] [xyz@example.com] SimpleAnalyzer : [xy] [z] [corporation] [xyz] [example] [com] StopAnalyzer : [xy] [z] [corporation] [xyz] [example] [com] StandardAnalyzer : [xy&z] [corporation] [xyz@example.com]
  • 15.
  • 16. Inverted Index Source: developer.apple.com Otis Gospodnetic, Sematext Int’l
  • 17.
  • 18. Code: Searcher Otis Gospodnetic, Sematext Int’l public void search(String indexDir, String q) throws IOException, ParseException { Directory dir = FSDirectory.open (new File(indexDir)); IndexSearcher is = new IndexSearcher(dir, true); QueryParser parser = new QueryParser(&quot;contents&quot;, new StandardAnalyzer(Version.LUCENE_CURRENT)); Query query = parser.parse(q); TopDocs hits = is.search(query, 10); System.err.println(&quot;Found &quot; + hits.totalHits + &quot; document(s)&quot;); for (int i=0; i<hits.scoreDocs.length; i++) { ScoreDoc scoreDoc = hits.scoreDocs[i]; Document doc = is.doc(scoreDoc.doc); System.out.println( doc.get(&quot;filename&quot;) ); } is.close(); }
  • 19.
  • 20. Code: Doc Updates Otis Gospodnetic, Sematext Int’l void updateDocument(Term  term, Document  doc, Analyzer analyzer)           Updates a document by first deleting the document(s) containing term and then adding the new document.   void Via IndexWriter facade void updateDocument(Term term, Document doc)           Updates a document by first deleting the document(s) containing term and then adding the new document.   void
  • 21.
  • 22.
  • 23.
  • 24. Community [email_address] [email_address] Otis Gospodnetic, Sematext Int’l &quot;I posted, went to get a sandwich, and came back to see two answers. The change works, and I can get the fix into production today. This list is magic.&quot;
  • 25.
  • 26.