SlideShare uma empresa Scribd logo
1 de 20
BY
N. SUMANJALI
DPT OF LIS
PONDICHERRY UNIVERSITY
INFORMATION RETRIEVAL
 Information retrieval is the activity of obtaining
information resources relevant to an information need
from a collection of information resources.
 Searches can be based on metadata or on full-text (or
other content-based) indexing.
 Goal: Find the documents most relevant to a certain Query
 Dealing with notions of:
 Collection of documents
 Query (User’s information need)
 Notion of Relevancy
MODEL
 A model is a construct designed help us understand a
complex system
 A particular way of “looking at things”
 Models inevitably make simplifying assumptions
 What are the limitations of the model?
 Different types of models:
 Conceptual models
 Physical analog models
 Mathematical models
Retrieval Models
A retrieval model specifies the details
of:
 Document representation
 Query representation
 Retrieval function
Determines a notion of relevance.
Notion of relevance can be binary or
continuous (i.e. ranked retrieval).
CLASSES OF RM
Boolean models (set theoretic)
 Extended Boolean
Vector space models
(statistical/algebraic)
 Generalized VS
 Latent Semantic Indexing
Probabilistic models
MODELS OF IR
 Boolean model
 Based on the notion of sets
 Documents are retrieved only if they satisfy Boolean
conditions specified in the query
 Does not impose a ranking on retrieved documents
 Exact match
 Vector space model
 Based on geometry, the notion of vectors in high dimensional
space
 Documents are ranked based on their similarity to the query
(ranked retrieval)
 Best/partial match
 Language models
 Based on the notion of probabilities and processes for
generating text
 Documents are ranked based on the probability that
they generated the query
 Best/partial match
BOOLEAN MODEL
 Invented by George Boole (1815-1864)
 He devised a system of symbolic logic in which he used
three operators (+, , - ) to combine statements in
symbolic form.
 John Venn named to this operators of Boolean logic
are the logical sum(+), logical product(), and logical
difference(-).
 IR systems allow the users to express their queries by
using this operators.
BOOLEAN MODEL
 Each index term is either present or absent
 Documents are either Relevant or Not Relevant(no
ranking)
 A document is represented as a set of keywords.
 Queries are Boolean expressions of
keywords, connected by AND, OR, and
NOT, including the use of brackets to indicate scope.
 [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton]
 Output: Document is relevant or not. No partial
matches or ranking.
BOOLEAN RETRIEVAL MODEL
 Popular retrieval model because:
 Easy to understand for simple queries.
 Clean formalism.
 Boolean models can be extended to include ranking.
 Reasonably efficient implementations possible for
normal queries.
BOOLEAN MODEL
 Weights assigned to terms are either “0” or “1”
 “0” represents “absence”: term isn’t in the document
 “1” represents “presence”: term is in the document
 Build queries by combining terms with Boolean
operators
 AND, OR, NOT
 The system returns all documents that satisfy the
query
AND/OR/NOT
A B
C
Why Boolean Retrieval Works
 Boolean operators approximate natural language
 Find documents about a good party that is not over
 AND can discover relationships between concepts
 good party
 OR can discover alternate terminology
 excellent party, wild party, etc.
 NOT can discover alternate meanings
 Democratic party
The Perfect Query Paradox
 Every information need has a perfect set of documents
 If not, there would be no sense doing retrieval
 Every document set has a perfect query
 AND every word in a document to get a query for it
 Repeat for each document in the set
 OR every document query to get the set query
 But can users realistically be expected to formulate this
perfect query?
 Boolean query formulation is hard!
Why Boolean Retrieval Fails
• Natural language is way more complex
• AND “discovers” nonexistent relationships
– Terms in different sentences, paragraphs, …
• Guessing terminology for OR is hard
– good, nice, excellent, outstanding, awesome, …
• Guessing terms to exclude is even harder!
– Democratic party, party to a lawsuit, …
BOOLEAN MODEL
 Strengths
 Precise, if you know the right strategies
 Precise, if you have an idea of what you’re looking for
 Efficient for the computer
 Simple
 Weaknesses
 Users must learn Boolean logic
 Boolean logic insufficient to capture the richness of language
 No control over size of result set: either too many documents or none
 When do you stop reading? All documents in the result set are
considered “equally good”
 What about partial matches? Documents that “don’t quite match” the
query may be useful also
 No notion of ranking (exact matching only)
 All index terms have equal weight
PROBLEMS
 Very rigid: AND means all; OR means any.
 Difficult to express complex user requests.
 Difficult to control the number of documents retrieved.
 All matched documents will be returned.
 Difficult to rank output.
 All matched documents logically satisfy the query.
 Difficult to perform relevance feedback.
 If a document is identified by the user as relevant or
irrelevant, how should the query be modified?
ADVANTAGES & DISADVANTAGES
 Advantages
 Results are predictable, relatively easy to explain
 Many different features can be incorporated
 Efficient processing since many documents can be
eliminated from search
 Disadvantages
 Effectiveness depends entirely on user
 Simple queries usually don’t work well
 Complex queries are difficult.
LIMITATIONS
 The first relates to the formulation of search statements.
 It has been noted that users are not able to formulate an exact search
statement by the combination of AND, OR and NOT operators,
especially when several query terms are involved.
 In such cases either the search statement becomes too narrow or too
broad.
 The second limitation relates to the number of retrieval items.
 It has been noted that users cannot predict a priori exactly how many
items are to be retrieved to satisfy a given query.
 If the search statement is broad, the number of retrieved items may
sometimes be several hundreds and thus it may be quite difficult to
find out the exact information required.
 The third limitation is that it identifies an item as relevant by finding
out whether a given query term is present or not in a given record in the
database.
Model  of information retrieval (3)

Mais conteúdo relacionado

Mais procurados

Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrieval
Nanthini Dominique
 
Probabilistic retrieval model
Probabilistic retrieval modelProbabilistic retrieval model
Probabilistic retrieval model
baradhimarch81
 
Functions of information retrival system(1)
Functions of information retrival system(1)Functions of information retrival system(1)
Functions of information retrival system(1)
silambu111
 
Ppt evaluation of information retrieval system
Ppt evaluation of information retrieval systemPpt evaluation of information retrieval system
Ppt evaluation of information retrieval system
silambu111
 
Query formulation process
Query formulation processQuery formulation process
Query formulation process
malathimurugan
 

Mais procurados (20)

Information retrieval introduction
Information retrieval introductionInformation retrieval introduction
Information retrieval introduction
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval Models
 
Vector space model of information retrieval
Vector space model of information retrievalVector space model of information retrieval
Vector space model of information retrieval
 
Probabilistic retrieval model
Probabilistic retrieval modelProbabilistic retrieval model
Probabilistic retrieval model
 
Information retrieval 7 boolean model
Information retrieval 7 boolean modelInformation retrieval 7 boolean model
Information retrieval 7 boolean model
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
Vector space model in information retrieval
Vector space model in information retrievalVector space model in information retrieval
Vector space model in information retrieval
 
Inverted index
Inverted indexInverted index
Inverted index
 
The vector space model
The vector space modelThe vector space model
The vector space model
 
Automatic indexing
Automatic indexingAutomatic indexing
Automatic indexing
 
Functions of information retrival system(1)
Functions of information retrival system(1)Functions of information retrival system(1)
Functions of information retrival system(1)
 
Term weighting
Term weightingTerm weighting
Term weighting
 
Signature files
Signature filesSignature files
Signature files
 
Ppt evaluation of information retrieval system
Ppt evaluation of information retrieval systemPpt evaluation of information retrieval system
Ppt evaluation of information retrieval system
 
Digital library
Digital libraryDigital library
Digital library
 
An Introduction to Information Retrieval and Applications
 An Introduction to Information Retrieval and Applications An Introduction to Information Retrieval and Applications
An Introduction to Information Retrieval and Applications
 
information retrieval Techniques and normalization
information retrieval Techniques and normalizationinformation retrieval Techniques and normalization
information retrieval Techniques and normalization
 
Query formulation process
Query formulation processQuery formulation process
Query formulation process
 
Information retrieval 9 tf idf weights
Information retrieval 9 tf idf weightsInformation retrieval 9 tf idf weights
Information retrieval 9 tf idf weights
 
WEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEMWEB BASED INFORMATION RETRIEVAL SYSTEM
WEB BASED INFORMATION RETRIEVAL SYSTEM
 

Destaque

Information storage and retrieval
Information storage and retrievalInformation storage and retrieval
Information storage and retrieval
Sadaf Rafiq
 
Proofreading and Editing
Proofreading and EditingProofreading and Editing
Proofreading and Editing
Molly Amell
 
Proof reading, editing and revising by sohail ahmed
Proof reading, editing and revising by sohail ahmedProof reading, editing and revising by sohail ahmed
Proof reading, editing and revising by sohail ahmed
Sohail Ahmed Solangi
 
Mechanics of writing – mla (chapter 3
Mechanics of writing – mla (chapter 3Mechanics of writing – mla (chapter 3
Mechanics of writing – mla (chapter 3
Aravind Nair
 

Destaque (20)

Information storage and retrieval
Information storage and retrievalInformation storage and retrieval
Information storage and retrieval
 
Copyright issues in a library digital environment
Copyright issues in a library digital environmentCopyright issues in a library digital environment
Copyright issues in a library digital environment
 
Boolean Matching in Logic Synthesis
Boolean Matching in Logic SynthesisBoolean Matching in Logic Synthesis
Boolean Matching in Logic Synthesis
 
Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
Ir models
Ir modelsIr models
Ir models
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
SemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorialSemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorial
 
Models for Information Retrieval and Recommendation
Models for Information Retrieval and RecommendationModels for Information Retrieval and Recommendation
Models for Information Retrieval and Recommendation
 
Planning and Implementing a Digital Library Project
Planning and Implementing a Digital Library ProjectPlanning and Implementing a Digital Library Project
Planning and Implementing a Digital Library Project
 
E-RESOURCES
E-RESOURCESE-RESOURCES
E-RESOURCES
 
Proofreading and Editing
Proofreading and EditingProofreading and Editing
Proofreading and Editing
 
Information Retrieval Models Part I
Information Retrieval Models Part IInformation Retrieval Models Part I
Information Retrieval Models Part I
 
Ir for it&ites
Ir for it&itesIr for it&ites
Ir for it&ites
 
Information Consolidation
Information ConsolidationInformation Consolidation
Information Consolidation
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
Proof reading, editing and revising by sohail ahmed
Proof reading, editing and revising by sohail ahmedProof reading, editing and revising by sohail ahmed
Proof reading, editing and revising by sohail ahmed
 
Editing ppt
Editing pptEditing ppt
Editing ppt
 
RTOS- Real Time Operating Systems
RTOS- Real Time Operating Systems RTOS- Real Time Operating Systems
RTOS- Real Time Operating Systems
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Mechanics of writing – mla (chapter 3
Mechanics of writing – mla (chapter 3Mechanics of writing – mla (chapter 3
Mechanics of writing – mla (chapter 3
 

Semelhante a Model of information retrieval (3)

14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation
RIILP
 
[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma
IJET - International Journal of Engineering and Techniques
 

Semelhante a Model of information retrieval (3) (20)

Interview_Search_Process (1).pptx
Interview_Search_Process (1).pptxInterview_Search_Process (1).pptx
Interview_Search_Process (1).pptx
 
14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation14. Michael Oakes (UoW) Natural Language Processing for Translation
14. Michael Oakes (UoW) Natural Language Processing for Translation
 
An empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systemsAn empirical performance evaluation of relational keyword search systems
An empirical performance evaluation of relational keyword search systems
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Planning and Preparing to Search
Planning and Preparing to SearchPlanning and Preparing to Search
Planning and Preparing to Search
 
The comparative study of information retrieval models used in search engines
The comparative study of information retrieval models used in search enginesThe comparative study of information retrieval models used in search engines
The comparative study of information retrieval models used in search engines
 
Sub1579
Sub1579Sub1579
Sub1579
 
Polyrepresentation in a Quantum-inspired Information Retrieval Framework
Polyrepresentation in a Quantum-inspired Information Retrieval FrameworkPolyrepresentation in a Quantum-inspired Information Retrieval Framework
Polyrepresentation in a Quantum-inspired Information Retrieval Framework
 
Resource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and FederationResource Description Framework Approach to Data Publication and Federation
Resource Description Framework Approach to Data Publication and Federation
 
Chapter 1 Introduction to Information Storage and Retrieval.pdf
Chapter 1 Introduction to Information Storage and Retrieval.pdfChapter 1 Introduction to Information Storage and Retrieval.pdf
Chapter 1 Introduction to Information Storage and Retrieval.pdf
 
Edad 695 research methodology
Edad 695 research methodologyEdad 695 research methodology
Edad 695 research methodology
 
Recommenders, Topics, and Text
Recommenders, Topics, and TextRecommenders, Topics, and Text
Recommenders, Topics, and Text
 
Reference: NYLA Library Assistants Training
Reference: NYLA Library Assistants TrainingReference: NYLA Library Assistants Training
Reference: NYLA Library Assistants Training
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 
[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma
 
Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1
 
Modern information Retrieval-Relevance Feedback
Modern information Retrieval-Relevance FeedbackModern information Retrieval-Relevance Feedback
Modern information Retrieval-Relevance Feedback
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
 
TheoreticalFramework
TheoreticalFrameworkTheoreticalFramework
TheoreticalFramework
 
Hinari basic course_module_2_workbook_2014_07
Hinari basic course_module_2_workbook_2014_07Hinari basic course_module_2_workbook_2014_07
Hinari basic course_module_2_workbook_2014_07
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Model of information retrieval (3)

  • 1. BY N. SUMANJALI DPT OF LIS PONDICHERRY UNIVERSITY
  • 2. INFORMATION RETRIEVAL  Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources.  Searches can be based on metadata or on full-text (or other content-based) indexing.  Goal: Find the documents most relevant to a certain Query  Dealing with notions of:  Collection of documents  Query (User’s information need)  Notion of Relevancy
  • 3. MODEL  A model is a construct designed help us understand a complex system  A particular way of “looking at things”  Models inevitably make simplifying assumptions  What are the limitations of the model?  Different types of models:  Conceptual models  Physical analog models  Mathematical models
  • 4. Retrieval Models A retrieval model specifies the details of:  Document representation  Query representation  Retrieval function Determines a notion of relevance. Notion of relevance can be binary or continuous (i.e. ranked retrieval).
  • 5. CLASSES OF RM Boolean models (set theoretic)  Extended Boolean Vector space models (statistical/algebraic)  Generalized VS  Latent Semantic Indexing Probabilistic models
  • 6. MODELS OF IR  Boolean model  Based on the notion of sets  Documents are retrieved only if they satisfy Boolean conditions specified in the query  Does not impose a ranking on retrieved documents  Exact match  Vector space model  Based on geometry, the notion of vectors in high dimensional space  Documents are ranked based on their similarity to the query (ranked retrieval)  Best/partial match
  • 7.  Language models  Based on the notion of probabilities and processes for generating text  Documents are ranked based on the probability that they generated the query  Best/partial match
  • 8. BOOLEAN MODEL  Invented by George Boole (1815-1864)  He devised a system of symbolic logic in which he used three operators (+, , - ) to combine statements in symbolic form.  John Venn named to this operators of Boolean logic are the logical sum(+), logical product(), and logical difference(-).  IR systems allow the users to express their queries by using this operators.
  • 9. BOOLEAN MODEL  Each index term is either present or absent  Documents are either Relevant or Not Relevant(no ranking)  A document is represented as a set of keywords.  Queries are Boolean expressions of keywords, connected by AND, OR, and NOT, including the use of brackets to indicate scope.  [[Rio & Brazil] | [Hilo & Hawaii]] & hotel & !Hilton]  Output: Document is relevant or not. No partial matches or ranking.
  • 10. BOOLEAN RETRIEVAL MODEL  Popular retrieval model because:  Easy to understand for simple queries.  Clean formalism.  Boolean models can be extended to include ranking.  Reasonably efficient implementations possible for normal queries.
  • 11. BOOLEAN MODEL  Weights assigned to terms are either “0” or “1”  “0” represents “absence”: term isn’t in the document  “1” represents “presence”: term is in the document  Build queries by combining terms with Boolean operators  AND, OR, NOT  The system returns all documents that satisfy the query
  • 13. Why Boolean Retrieval Works  Boolean operators approximate natural language  Find documents about a good party that is not over  AND can discover relationships between concepts  good party  OR can discover alternate terminology  excellent party, wild party, etc.  NOT can discover alternate meanings  Democratic party
  • 14. The Perfect Query Paradox  Every information need has a perfect set of documents  If not, there would be no sense doing retrieval  Every document set has a perfect query  AND every word in a document to get a query for it  Repeat for each document in the set  OR every document query to get the set query  But can users realistically be expected to formulate this perfect query?  Boolean query formulation is hard!
  • 15. Why Boolean Retrieval Fails • Natural language is way more complex • AND “discovers” nonexistent relationships – Terms in different sentences, paragraphs, … • Guessing terminology for OR is hard – good, nice, excellent, outstanding, awesome, … • Guessing terms to exclude is even harder! – Democratic party, party to a lawsuit, …
  • 16. BOOLEAN MODEL  Strengths  Precise, if you know the right strategies  Precise, if you have an idea of what you’re looking for  Efficient for the computer  Simple  Weaknesses  Users must learn Boolean logic  Boolean logic insufficient to capture the richness of language  No control over size of result set: either too many documents or none  When do you stop reading? All documents in the result set are considered “equally good”  What about partial matches? Documents that “don’t quite match” the query may be useful also  No notion of ranking (exact matching only)  All index terms have equal weight
  • 17. PROBLEMS  Very rigid: AND means all; OR means any.  Difficult to express complex user requests.  Difficult to control the number of documents retrieved.  All matched documents will be returned.  Difficult to rank output.  All matched documents logically satisfy the query.  Difficult to perform relevance feedback.  If a document is identified by the user as relevant or irrelevant, how should the query be modified?
  • 18. ADVANTAGES & DISADVANTAGES  Advantages  Results are predictable, relatively easy to explain  Many different features can be incorporated  Efficient processing since many documents can be eliminated from search  Disadvantages  Effectiveness depends entirely on user  Simple queries usually don’t work well  Complex queries are difficult.
  • 19. LIMITATIONS  The first relates to the formulation of search statements.  It has been noted that users are not able to formulate an exact search statement by the combination of AND, OR and NOT operators, especially when several query terms are involved.  In such cases either the search statement becomes too narrow or too broad.  The second limitation relates to the number of retrieval items.  It has been noted that users cannot predict a priori exactly how many items are to be retrieved to satisfy a given query.  If the search statement is broad, the number of retrieved items may sometimes be several hundreds and thus it may be quite difficult to find out the exact information required.  The third limitation is that it identifies an item as relevant by finding out whether a given query term is present or not in a given record in the database.