SlideShare uma empresa Scribd logo
1 de 45
Baixar para ler offline
Advantages of Query
Biased Summaries in
Information Retrieval
Anastasios Tombros
Computing Science Department University of Glasgow, Scotland
Mark Sanderson
CIIR, Computing Science Department University of Massachusetts,
U.S.A.
Presentation by Onur Yılmaz
Outline
 Introduction
 Summarization System
 Experimental Design
 Experimental Results
 Further Work and Conclusions
Introduction
Interaction with an IR system
 User enters a specific information need,
 Query
 Response of a typical system
 Retrieved document list
Introduction
Interaction with an IR system

Introduction
Interaction with an IR system

Title
First few
sentences
URL
Relevance
Quantification
Introduction
Interaction with an IR system
 Users have to decide which of the retrieved documents
are most likely to convey their information need
 Ideally, it should be possible to make this decision
without having to refer to the full document text
Introduction
Interaction with an IR system

Title
First few
sentences
Not clear which the document relates to a
user’s query
Introduction
What is presented?
 Aim
 to minimize users’ need to refer to the full
document text
 to provide enough information to support their
retrieval decisions
Introduction
What is presented?
 Proposed method
 An automatically generated summary of each document in
a retrieved document list, biased to a user’s query
Introduction
Document Summary
 Condensation of a full text document
 indicative of the source’s content
 informative
Introduction
Sentence Extraction
 Selection of sentences from the original document
 Scores are assigned to sentences according to a set of
extraction criteria
 Best-scoring sentences are presented in the summary
Summarisation System
 Aim of system
 Providing information on the relevance of documents
retrieved in response to their query
 Summaries were to be user-directed: biased towards
the user’s query
Summarisation System
Articles
 Wall Street Journal (WSJ) taken from the TREC
collection (Text REtrieval Conferences)
Summarisation System
Methodology
 Examining 50 randomly selected articles from the
collection
 Extract conclusions about the distribution of important
information within them
 Title
 Headings
 Leading paragraph
 Overall structural
Summarisation System
Sentence Extraction - Title Method
 Headlines of news articles
 tend to reveal the major subject of the article
 Terms in the title section of the documents
 assigned a positive weight (title score)
 Leading paragraph of each article
 Provides much information on the article’s content
 In order to quantify this contribution,
 An ordinal weight was assigned to the first two
sentences of each article.
Summarisation System
Sentence Extraction - Title Method
Summarisation System
Sentence Extraction - Title Method
 Section Headings
 «heading score» was assigned to each one of the
sentences comprising a heading
Summarisation System
Sentence Extraction - Term Occurrence (TO)
 Number of term occurrences (TO) within each document
to further assign weights to sentences
 Reasonable TO
 7  medium-sized documents
 (between 25 - 40 sentences)
 TO value is augmented by 10% of the increase in document
size
Summarisation System
Sentence Extraction - Term Occurrence (TO)
 Significantly Related Terms
 Two terms are are significant
 No more than 4 non-significant words between them
Recent studies that show that in the English
language 98% of the lexical relations occur
between words within a span of 5 words in a
Sentence.
Summarisation System
Biasing summaries towards queries
 Query Score
 calculated for each of the sentences of a document
 based on the distribution of query terms in each
sentence
Summarisation System
Biasing summaries towards queries
 For each sentence
+Query score
Score obtained by
the sentence
extraction methods
Final Score=
Summarisation System
Biasing summaries towards queries
 Summary for each document
 Output the top-scoring sentence
 Until desired summary length reached
15% of the
document's
length
up to a
maximum of
five sentences
Experimental Design
 Aim of experiment
 Use of query biased summaries in a retrieved
document list
 would have a positive effect on
 the process of relevance judgement by users
Experimental Design
Experimental Conditions
 Two levels of an independent variable:
 Use of query-biased summaries in a ranked list of
retrieved documents
 Use of static pre-defined summaries
Experimental Design
Groups of Subjects
 Two groups consisting of 10 people
 Postgraduate students
 Population is not representative of that which we wish to
generalize the conclusions
Experimental Design
Retrieval Task
 Identify, in a limited amount of time, as many
relevant documents as possible
 Subjects presented with a retrieved document list
and query
Experimental Design
Queries
 Randomly selected from the queries of the TREC test
collection
 Standard against which the subjects’ relevance
judgements compared
Experimental Design
IR System
 Classic document ranking system
 tf•idf term weighting scheme
 with stop word removal
 word stemming using the Porter stemmer
Experimental Design
Operationalising the Experiment
Select
typical or
proposed
method
Assign 5
random
queries to
subject
Present to
subject
retrieved
document
list (50
highest
ranked
documents)
Subject
identifies
the relevant
documents
in 5 minutes
Experimental Design
Experiment – Typical IR System Output

Experimental Design
Experiment – Proposed IR System Output

Experimental Results
Dependent Variable
 Dependent variable:
 Performance of the users in the
 process of relevance judgements on documents retrieved
by specific queries
Experimental Results
Criteria
 Recall and precision of the relevance judgements
 The speed of judgements
 Need of the assistance from the full text of the
retrieved documents.
 The subjective opinion about the assistance provided
Experimental Results
Recall
 Recall: Number of relevant documents correctly identified by
a subject for a query divided by the total number of relevant
documents, within the examined ones, for that query
«summary group»
managed to
successfully identify
a larger number of
relevant documents
Experimental Results
Precision
 Precision: Relevant documents correctly identified divided by
the total number of indicated relevant documents for a query
Experimental Results
Recall and Precision
 Query biased summaries allow users to identify
more relevant documents, and identify them more
accurately.
Experimental Results
Speed
 Avg. Number of Documents: Actual number of documents that
each subject managed to examine within the specified 5
minute period
13% increase
Tendency for quicker
judgement
Experimental Results
Reference to the full text of the documents
 Static summaries based on the first few lines of a
document is inadequate.
Further Work and Conclusions
Summary
 Query based summaries assist users in performing
relevance judgements more accurately and more
quickly.
 Users can identify more relevant documents for each
query, while at the same time make fewer mistakes.
Further Work and Conclusions
Summary
 Query based summaries almost alleviate the users’
need to refer to the full text
 Users rely almost solely on the information conveyed in
the query-biased summaries in order to perform their
relevance judgements.
Further Work and Conclusions
Future Work
 Possible extensions of the work
 Repeating the experiments by simulating conditions
encountered when searching for information on the
web.
 Intend to examine different summarisation techniques
and to apply our system to alternative test collections.
References
 Abracos, J., and Lopes, G.P. 1997. Statistical methods for retrieving most significant
paragraphs in newspaper articles. In Proceedings of the ACL'97/EACL'97 Workshop on
Intelligent Scalable Text Summarisation (ISTS '97), 51-57. Madrid, Spain, July 11 1997.
 Brandow, R., Mitze, K., and Rau, L.F. 1995. Automatic condensation of electronic
publications by sentence selection. Information Processing & Management 31(5): 675-685,
September 1995.
 Callan, J.P. 1994. Passage-level evidence in document retrieval. In Croft, W.B., and van
Rijbergen, C.J. eds. Proceedings of the Seventeenth Annual ACM SIGIR Conference on
Research and Development in Information Retrieval, 302-310. ACM Press, July 1994.
 Edmundson, H.P. 1964. Problems in automatic abstracting. Communications of the ACM
7(4):259-263, April 1964.
 Edmundson, H.P. 1969. New methods in automatic abstracting. Journal of the ACM
16(2):264-285, April 1969.
 Hand, T.F. 1997. A proposal for a task-based evaluation of text summarisation systems. In
Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarisation
(ISTS ’97), 31-38. Madrid, Spain, July 11 1997.
 Harman, D. 1996. Overview of the Fifth Text REtrieval Conference (TREC-5). In Proceedings
of the Text Retrieval Conference (TREC-5), National Institute of Standards and Technology,
Gaithersburg, MD 20899, USA.
References
 Jacobs, P.S., and Rau, L.F. 1990. Scisor: Extracting information from on-line news.
Communications of the ACM 33(11):88-97, November 1990.
 Keppel, G. 1973. Design and analysis: A researcher’s handbook. New Jersey: Prentice Hall.
 Knaus, D., Mittendorf, E., Schauble, P., and Sheridan, P. 1995. Highlighting relevant passages
for users of the interactive SPIDER retrieval system. In Proceedings of the Text Retrieval
Conference (TREC-4), National Institute of Standards and Technology, Gaithersburg, MD
20899, USA.
 Kupiec, J., Pedersen, J., and Chen, F. 1995. A trainable document summariser. In Fox, E.A.,
Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM SIGIR Conference
on Research and Development in Information Retrieval, 68-73. ACM Press, July 1995.
 Luhn, H.P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and
Development 2(2): 159-165, April 1958.
 Maizell, R.E., Smith, J.F., and Singer, T.E.R. 1971. Abstracting scientific and technical
literature: An introductory guide and text for scientists, abstractors and management. New
York: Willey-Interscience, John Willey & Sons Inc.
 Mani, I., and Bloedorn, E. 1997. Multi-document summarisation by graph search and
matching. In Proceedings of AAAI-97, Providence Rhode Island.
References
 McKeown, K., and Radev, D.R. 1995. Generating summaries from multiple news articles.
In Fox, E.A., Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM
SIGIR Conference on Research and Development in Information Retrieval, 74-82. ACM
Press, July 1995.
 Miike, S., Itoh, E., Ono, K., and Sumita, K. 1994. A full-text retrieval system with a
dynamic abstract generation function. In Croft, W.B., and van Rijbergen, C.J. eds.
Proceedings of the Seventeenth Annual ACM SIGIR Conference on Research and
Development in Information Retrieval, 152-161. ACM Press, July 1994.
 Miller, S. 1984. Experimental design and statistics (2nd edition). New York: Routledge.
 Paice, C.D. 1990. Constructing literature abstracts by computer: Techniques and
prospects. Information Processing & Management 26(1):171-186.
 Porter, M.F. 1980. An algorithm for suffix stripping. Program - automated library and
information systems 14(3):130-137.
 Rush, J.E., Salvador, R., and Zamora, A. 1971. Automatic abstracting and indexing. II.
Production of indicative abstracts by application of contextual inference and syntactic
coherence criteria. Journal of the American Society for Information Science 22(4):260-
274.
 Salton, G., Singhal, A., Mitra, M., and Buckley, C. 1997. Automatic text structuring and
summarisation. Information Processing & Management 33(2):193-20.
Advantages of Query
Biased Summaries in
Information Retrieval
Anastasios Tombros
Computing Science Department University of Glasgow, Scotland
Mark Sanderson
CIIR, Computing Science Department University of Massachusetts,
U.S.A.
Presentation by Onur Yılmaz

Mais conteúdo relacionado

Mais procurados

Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information RetrievalHarsh Thakkar
 
Review of Various Text Categorization Methods
Review of Various Text Categorization MethodsReview of Various Text Categorization Methods
Review of Various Text Categorization Methodsiosrjce
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text miningredpel dot com
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case StudyIJERA Editor
 
76 s201906
76 s20190676 s201906
76 s201906IJRAT
 
Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...IOSR Journals
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewINFOGAIN PUBLICATION
 
Text Classification using Support Vector Machine
Text Classification using Support Vector MachineText Classification using Support Vector Machine
Text Classification using Support Vector Machineinventionjournals
 
Semantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detectionSemantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detectionTELKOMNIKA JOURNAL
 
Text Document categorization using support vector machine
Text Document categorization using support vector machineText Document categorization using support vector machine
Text Document categorization using support vector machineIRJET Journal
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationNinad Samel
 
An efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentAn efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentSaleihGero
 
RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101ThienSi Le
 
Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...IOSR Journals
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...cscpconf
 

Mais procurados (17)

N15-1013
N15-1013N15-1013
N15-1013
 
Probabilistic Information Retrieval
Probabilistic Information RetrievalProbabilistic Information Retrieval
Probabilistic Information Retrieval
 
Review of Various Text Categorization Methods
Review of Various Text Categorization MethodsReview of Various Text Categorization Methods
Review of Various Text Categorization Methods
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text mining
 
Document Retrieval System, a Case Study
Document Retrieval System, a Case StudyDocument Retrieval System, a Case Study
Document Retrieval System, a Case Study
 
76 s201906
76 s20190676 s201906
76 s201906
 
Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...Research Paper Selection Based On an Ontology and Text Mining Technique Using...
Research Paper Selection Based On an Ontology and Text Mining Technique Using...
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A Review
 
Text Classification using Support Vector Machine
Text Classification using Support Vector MachineText Classification using Support Vector Machine
Text Classification using Support Vector Machine
 
N045038690
N045038690N045038690
N045038690
 
Semantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detectionSemantics-based clustering approach for similar research area detection
Semantics-based clustering approach for similar research area detection
 
Text Document categorization using support vector machine
Text Document categorization using support vector machineText Document categorization using support vector machine
Text Document categorization using support vector machine
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorization
 
An efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentAn efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-document
 
RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101RES860 P6 IndividualProject - version101
RES860 P6 IndividualProject - version101
 
Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...Correlation Coefficient Based Average Textual Similarity Model for Informatio...
Correlation Coefficient Based Average Textual Similarity Model for Informatio...
 
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
DOMAIN KEYWORD EXTRACTION TECHNIQUE: A NEW WEIGHTING METHOD BASED ON FREQUENC...
 

Semelhante a Query Biased Summaries Improve IR Relevance Judgements

Query based summarization
Query based summarizationQuery based summarization
Query based summarizationdamom77
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval forWaheeb Ahmed
 
IRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET Journal
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...IJwest
 
Search term recommendation and non-textual ranking evaluated
 Search term recommendation and non-textual ranking evaluated Search term recommendation and non-textual ranking evaluated
Search term recommendation and non-textual ranking evaluatedGESIS
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
An Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled UsersAn Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled UsersNat Rice
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrievalidescitation
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodijcsit
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptbutest
 
A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...eSAT Journals
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique ofIJDKP
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationIJMER
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
A domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicA domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicIAEME Publication
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
 
Multi-Topic Multi-Document Summarizer
Multi-Topic Multi-Document SummarizerMulti-Topic Multi-Document Summarizer
Multi-Topic Multi-Document Summarizerijcsit
 

Semelhante a Query Biased Summaries Improve IR Relevance Judgements (20)

Query Based Summarization
Query Based SummarizationQuery Based Summarization
Query Based Summarization
 
Query based summarization
Query based summarizationQuery based summarization
Query based summarization
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval for
 
IRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text Document
 
Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrieval
 
Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...
 
Search term recommendation and non-textual ranking evaluated
 Search term recommendation and non-textual ranking evaluated Search term recommendation and non-textual ranking evaluated
Search term recommendation and non-textual ranking evaluated
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
An Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled UsersAn Effective Academic Research Papers Recommendation For Non-Profiled Users
An Effective Academic Research Papers Recommendation For Non-Profiled Users
 
Performance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information RetrievalPerformance Evaluation of Query Processing Techniques in Information Retrieval
Performance Evaluation of Query Processing Techniques in Information Retrieval
 
Arabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation methodArabic text categorization algorithm using vector evaluation method
Arabic text categorization algorithm using vector evaluation method
 
kantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.pptkantorNSF-NIJ-ISI-03-06-04.ppt
kantorNSF-NIJ-ISI-03-06-04.ppt
 
A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...A template based algorithm for automatic summarization and dialogue managemen...
A template based algorithm for automatic summarization and dialogue managemen...
 
Enhancing the labelling technique of
Enhancing the labelling technique ofEnhancing the labelling technique of
Enhancing the labelling technique of
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document Summarization
 
K0936266
K0936266K0936266
K0936266
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
A domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicA domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logic
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 
Multi-Topic Multi-Document Summarizer
Multi-Topic Multi-Document SummarizerMulti-Topic Multi-Document Summarizer
Multi-Topic Multi-Document Summarizer
 

Último

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 

Último (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Query Biased Summaries Improve IR Relevance Judgements

  • 1. Advantages of Query Biased Summaries in Information Retrieval Anastasios Tombros Computing Science Department University of Glasgow, Scotland Mark Sanderson CIIR, Computing Science Department University of Massachusetts, U.S.A. Presentation by Onur Yılmaz
  • 2. Outline  Introduction  Summarization System  Experimental Design  Experimental Results  Further Work and Conclusions
  • 3. Introduction Interaction with an IR system  User enters a specific information need,  Query  Response of a typical system  Retrieved document list
  • 5. Introduction Interaction with an IR system  Title First few sentences URL Relevance Quantification
  • 6. Introduction Interaction with an IR system  Users have to decide which of the retrieved documents are most likely to convey their information need  Ideally, it should be possible to make this decision without having to refer to the full document text
  • 7. Introduction Interaction with an IR system  Title First few sentences Not clear which the document relates to a user’s query
  • 8. Introduction What is presented?  Aim  to minimize users’ need to refer to the full document text  to provide enough information to support their retrieval decisions
  • 9. Introduction What is presented?  Proposed method  An automatically generated summary of each document in a retrieved document list, biased to a user’s query
  • 10. Introduction Document Summary  Condensation of a full text document  indicative of the source’s content  informative
  • 11. Introduction Sentence Extraction  Selection of sentences from the original document  Scores are assigned to sentences according to a set of extraction criteria  Best-scoring sentences are presented in the summary
  • 12. Summarisation System  Aim of system  Providing information on the relevance of documents retrieved in response to their query  Summaries were to be user-directed: biased towards the user’s query
  • 13. Summarisation System Articles  Wall Street Journal (WSJ) taken from the TREC collection (Text REtrieval Conferences)
  • 14. Summarisation System Methodology  Examining 50 randomly selected articles from the collection  Extract conclusions about the distribution of important information within them  Title  Headings  Leading paragraph  Overall structural
  • 15. Summarisation System Sentence Extraction - Title Method  Headlines of news articles  tend to reveal the major subject of the article  Terms in the title section of the documents  assigned a positive weight (title score)
  • 16.  Leading paragraph of each article  Provides much information on the article’s content  In order to quantify this contribution,  An ordinal weight was assigned to the first two sentences of each article. Summarisation System Sentence Extraction - Title Method
  • 17. Summarisation System Sentence Extraction - Title Method  Section Headings  «heading score» was assigned to each one of the sentences comprising a heading
  • 18. Summarisation System Sentence Extraction - Term Occurrence (TO)  Number of term occurrences (TO) within each document to further assign weights to sentences  Reasonable TO  7  medium-sized documents  (between 25 - 40 sentences)  TO value is augmented by 10% of the increase in document size
  • 19. Summarisation System Sentence Extraction - Term Occurrence (TO)  Significantly Related Terms  Two terms are are significant  No more than 4 non-significant words between them Recent studies that show that in the English language 98% of the lexical relations occur between words within a span of 5 words in a Sentence.
  • 20. Summarisation System Biasing summaries towards queries  Query Score  calculated for each of the sentences of a document  based on the distribution of query terms in each sentence
  • 21. Summarisation System Biasing summaries towards queries  For each sentence +Query score Score obtained by the sentence extraction methods Final Score=
  • 22. Summarisation System Biasing summaries towards queries  Summary for each document  Output the top-scoring sentence  Until desired summary length reached 15% of the document's length up to a maximum of five sentences
  • 23. Experimental Design  Aim of experiment  Use of query biased summaries in a retrieved document list  would have a positive effect on  the process of relevance judgement by users
  • 24. Experimental Design Experimental Conditions  Two levels of an independent variable:  Use of query-biased summaries in a ranked list of retrieved documents  Use of static pre-defined summaries
  • 25. Experimental Design Groups of Subjects  Two groups consisting of 10 people  Postgraduate students  Population is not representative of that which we wish to generalize the conclusions
  • 26. Experimental Design Retrieval Task  Identify, in a limited amount of time, as many relevant documents as possible  Subjects presented with a retrieved document list and query
  • 27. Experimental Design Queries  Randomly selected from the queries of the TREC test collection  Standard against which the subjects’ relevance judgements compared
  • 28. Experimental Design IR System  Classic document ranking system  tf•idf term weighting scheme  with stop word removal  word stemming using the Porter stemmer
  • 29. Experimental Design Operationalising the Experiment Select typical or proposed method Assign 5 random queries to subject Present to subject retrieved document list (50 highest ranked documents) Subject identifies the relevant documents in 5 minutes
  • 30. Experimental Design Experiment – Typical IR System Output 
  • 31. Experimental Design Experiment – Proposed IR System Output 
  • 32. Experimental Results Dependent Variable  Dependent variable:  Performance of the users in the  process of relevance judgements on documents retrieved by specific queries
  • 33. Experimental Results Criteria  Recall and precision of the relevance judgements  The speed of judgements  Need of the assistance from the full text of the retrieved documents.  The subjective opinion about the assistance provided
  • 34. Experimental Results Recall  Recall: Number of relevant documents correctly identified by a subject for a query divided by the total number of relevant documents, within the examined ones, for that query «summary group» managed to successfully identify a larger number of relevant documents
  • 35. Experimental Results Precision  Precision: Relevant documents correctly identified divided by the total number of indicated relevant documents for a query
  • 36. Experimental Results Recall and Precision  Query biased summaries allow users to identify more relevant documents, and identify them more accurately.
  • 37. Experimental Results Speed  Avg. Number of Documents: Actual number of documents that each subject managed to examine within the specified 5 minute period 13% increase Tendency for quicker judgement
  • 38. Experimental Results Reference to the full text of the documents  Static summaries based on the first few lines of a document is inadequate.
  • 39. Further Work and Conclusions Summary  Query based summaries assist users in performing relevance judgements more accurately and more quickly.  Users can identify more relevant documents for each query, while at the same time make fewer mistakes.
  • 40. Further Work and Conclusions Summary  Query based summaries almost alleviate the users’ need to refer to the full text  Users rely almost solely on the information conveyed in the query-biased summaries in order to perform their relevance judgements.
  • 41. Further Work and Conclusions Future Work  Possible extensions of the work  Repeating the experiments by simulating conditions encountered when searching for information on the web.  Intend to examine different summarisation techniques and to apply our system to alternative test collections.
  • 42. References  Abracos, J., and Lopes, G.P. 1997. Statistical methods for retrieving most significant paragraphs in newspaper articles. In Proceedings of the ACL'97/EACL'97 Workshop on Intelligent Scalable Text Summarisation (ISTS '97), 51-57. Madrid, Spain, July 11 1997.  Brandow, R., Mitze, K., and Rau, L.F. 1995. Automatic condensation of electronic publications by sentence selection. Information Processing & Management 31(5): 675-685, September 1995.  Callan, J.P. 1994. Passage-level evidence in document retrieval. In Croft, W.B., and van Rijbergen, C.J. eds. Proceedings of the Seventeenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 302-310. ACM Press, July 1994.  Edmundson, H.P. 1964. Problems in automatic abstracting. Communications of the ACM 7(4):259-263, April 1964.  Edmundson, H.P. 1969. New methods in automatic abstracting. Journal of the ACM 16(2):264-285, April 1969.  Hand, T.F. 1997. A proposal for a task-based evaluation of text summarisation systems. In Proceedings of the ACL’97/EACL’97 Workshop on Intelligent Scalable Text Summarisation (ISTS ’97), 31-38. Madrid, Spain, July 11 1997.  Harman, D. 1996. Overview of the Fifth Text REtrieval Conference (TREC-5). In Proceedings of the Text Retrieval Conference (TREC-5), National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
  • 43. References  Jacobs, P.S., and Rau, L.F. 1990. Scisor: Extracting information from on-line news. Communications of the ACM 33(11):88-97, November 1990.  Keppel, G. 1973. Design and analysis: A researcher’s handbook. New Jersey: Prentice Hall.  Knaus, D., Mittendorf, E., Schauble, P., and Sheridan, P. 1995. Highlighting relevant passages for users of the interactive SPIDER retrieval system. In Proceedings of the Text Retrieval Conference (TREC-4), National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.  Kupiec, J., Pedersen, J., and Chen, F. 1995. A trainable document summariser. In Fox, E.A., Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 68-73. ACM Press, July 1995.  Luhn, H.P. 1958. The automatic creation of literature abstracts. IBM Journal of Research and Development 2(2): 159-165, April 1958.  Maizell, R.E., Smith, J.F., and Singer, T.E.R. 1971. Abstracting scientific and technical literature: An introductory guide and text for scientists, abstractors and management. New York: Willey-Interscience, John Willey & Sons Inc.  Mani, I., and Bloedorn, E. 1997. Multi-document summarisation by graph search and matching. In Proceedings of AAAI-97, Providence Rhode Island.
  • 44. References  McKeown, K., and Radev, D.R. 1995. Generating summaries from multiple news articles. In Fox, E.A., Ingwersen, P., and Fidel, R. eds. Proceedings of the Eighteenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 74-82. ACM Press, July 1995.  Miike, S., Itoh, E., Ono, K., and Sumita, K. 1994. A full-text retrieval system with a dynamic abstract generation function. In Croft, W.B., and van Rijbergen, C.J. eds. Proceedings of the Seventeenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 152-161. ACM Press, July 1994.  Miller, S. 1984. Experimental design and statistics (2nd edition). New York: Routledge.  Paice, C.D. 1990. Constructing literature abstracts by computer: Techniques and prospects. Information Processing & Management 26(1):171-186.  Porter, M.F. 1980. An algorithm for suffix stripping. Program - automated library and information systems 14(3):130-137.  Rush, J.E., Salvador, R., and Zamora, A. 1971. Automatic abstracting and indexing. II. Production of indicative abstracts by application of contextual inference and syntactic coherence criteria. Journal of the American Society for Information Science 22(4):260- 274.  Salton, G., Singhal, A., Mitra, M., and Buckley, C. 1997. Automatic text structuring and summarisation. Information Processing & Management 33(2):193-20.
  • 45. Advantages of Query Biased Summaries in Information Retrieval Anastasios Tombros Computing Science Department University of Glasgow, Scotland Mark Sanderson CIIR, Computing Science Department University of Massachusetts, U.S.A. Presentation by Onur Yılmaz