Text Mining lecture
Information Retrieval
Prof.dr.ir. Arjen P. de Vries
arjen@acm.org
Nijmegen, October 18th, 2017
A Tutorial on Models of Information Seeking, Searching & Retrieval by @leifos & @guidozuc
Core Research Questions
 How to represent information?
- The information need and search requests
- The objects to be shown in response to an information request
 How to match information representations?
The information objects to be retrieved are not necessarily textual!
Van Rijsbergen, 1979
Two views on ‘search’
DB
 Business applications
 Deductive reasoning
 Precise and efficient query processing
 Users with technical skills (SQL) and precise information needs
Selection: Books where category=‘CS’
IR
 Digital libraries, patent collections, etc.
 Inductive reasoning
 Best-effort processing
 Untrained users with imprecise information needs
Ranking: Books about CS
Note: SemWeb more DB than IR!!!
Symbolic Connectionist
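
The contrast between selection and ranking can be made concrete with a small sketch. Everything here is an assumption for illustration: the toy catalogue, the field names, and the term-overlap score, which stands in for a real retrieval model. Selection returns exactly the books whose category equals ‘CS’; ranking orders all books by how well their text matches the query.

```python
from collections import Counter

# Hypothetical toy catalogue; titles, categories and texts are invented.
BOOKS = [
    {"title": "Introduction to Algorithms", "category": "CS",
     "text": "algorithms data structures computer science"},
    {"title": "A History of Computing", "category": "History",
     "text": "history of computer science and early computing machines"},
    {"title": "Garden Design", "category": "Home",
     "text": "plants soil and garden layout"},
]


def select(books, category):
    """DB view: exact selection; a book either satisfies the predicate or not."""
    return [b["title"] for b in books if b["category"] == category]


def rank(books, query):
    """IR view: score every book by query-term overlap and sort, best first."""
    q_terms = query.lower().split()
    scored = []
    for b in books:
        tf = Counter(b["text"].lower().split())
        score = sum(tf[t] for t in q_terms)  # missing terms count as 0
        scored.append((score, b["title"]))
    # Sort on the score only; ties keep their catalogue order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for score, title in scored if score > 0]


if __name__ == "__main__":
    print(select(BOOKS, "CS"))
    print(rank(BOOKS, "computer science"))
```

In this sketch the history book is not in category ‘CS’, so selection misses it, while ranking still returns it as a book about computer science.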
Search Flow Chart
A Tutorial on Models of Information Seeking, Searching & Retrieval by @leifos & @guidozuc
IR vs. AI
 Many related topics in AI:
- Computational Linguistics
- Natural Language Processing
- Question Answering
- Information Extraction
- Machine Translation
- Computer vision / Multimedia
vs.
 Information Retrieval?
IR vs. AI (Artificial Intelligence)
“In some sense, of course, classic IR is superhuman: there was
no pre-existing human skill, as there was with seeing, talking or
even chess playing that corresponded to the search through
millions of words of text on the basis of indices. But if one took
the view, by contrast, that theologians, lawyers and, later, literary
scholars were able, albeit slowly, to search vast libraries of
sources for relevant material, then on that view IR is just the
optimisation of a human skill and not a superhuman activity. If
one takes that view, IR is a proper part of AI, as traditionally
conceived.”
Yorick Wilks, Unhappy bedfellows: the relationship of AI and IR
An “Essay in honour of Karen Spärck Jones”, 2006
Relevance
 Inherently dependent on user, context and task
 Different “relevance criteria”
- Topicality: is the document about the information request?
- Readability: can I understand the text?
- Authoritativeness: can I trust the text?
- Child-suitability: is the text appropriate for children?
- Etc.
“Computational Relevance”
“Intellectually it is possible for a human to establish the
relevance of a document to a query. For a computer to do
this we need to construct a model within which
relevance decisions can be quantified. It is interesting to
note that most research in information retrieval can be
shown to have been concerned with different aspects of
such a model.”
Van Rijsbergen, 1976
Retrieval Model
‘Computational Relevance’
 How to combine different indicators of relevance?
- E.g., topicality, child-suitability, polarity, …
 Apply ‘copulas’ (a technique from econometrics) to model non-linear dependencies (SIGIR 2013, CIKM 2014)
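
As a rough illustration of what combining two relevance indicators with a copula can look like, the sketch below compares the independence copula (a plain product) with the Clayton copula, one standard copula family. This is not the model from the cited SIGIR 2013 and CIKM 2014 work; the choice of the Clayton family, the parameter `theta`, and the assumption that both inputs are scores already mapped to [0, 1] are all made here for illustration only.

```python
def independence_copula(u, v):
    """Combine two scores in [0, 1] as if they were independent."""
    return u * v


def clayton_copula(u, v, theta=2.0):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0.

    Larger theta means stronger dependence between the two indicators.
    """
    if theta <= 0:
        raise ValueError("theta must be positive in this sketch")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary case: C(0, v) = C(u, 0) = 0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


if __name__ == "__main__":
    topicality, suitability = 0.5, 0.5  # hypothetical normalised scores
    print(independence_copula(topicality, suitability))
    print(clayton_copula(topicality, suitability))
```

With both scores at 0.5, the Clayton combination is higher than the plain product, which is the kind of non-linear interaction a simple product or weighted sum cannot express.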
Relevance
 Various aspects of understanding this notion of relevance
position information retrieval between computer science
and information science
 Examples of questions that traditionally do not even
presume involvement of a computer:
- What makes an information object relevant?
- What stages constitute a search process?
- How does relevance evolve during this search process?
- How do users learn from the search process?
- Why do users issue short queries even if we know that long
ones are more effective?
Etc.
NLP in IR
 Stemming & Stopping
- De facto default setting
 N-grams (bi-grams)
- SDM (Sequential Dependence Model)
 Entity tagging
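
The first two items can be sketched in a few lines. The stopword list and the suffix-stripping rule below are deliberately tiny stand-ins chosen for illustration; a real system would use a full stopword list and an established stemmer. The bigram function only extracts adjacent term pairs, which is one ingredient of dependence models such as SDM, not the model itself.

```python
# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters of the stem.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split on whitespace, drop stopwords, then stem."""
    tokens = text.lower().split()
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def bigrams(terms):
    """Return adjacent term pairs in order."""
    return list(zip(terms, terms[1:]))


if __name__ == "__main__":
    terms = analyze("Searching the indexes of patents")
    print(terms)
    print(bigrams(terms))
```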
Footnote in Victor Lavrenko’s PhD thesis
 “It is my personal observation that almost every
mathematically inclined graduate student in Information
Retrieval attempts to formulate some sort of a non-independent
model of IR within the first two or three years
of his studies. The vast majority of these attempts yield no
improvements and remain unpublished.”
Take words as they stand!
The Secret
 The user can simply reformulate their information need in
response to insufficiently relevant results retrieved by the
system!
Why Search Remains Difficult to Get Right
 Heterogeneous data sources
- WWW, Wikipedia, news, e-mail, patents, Twitter, personal information, …
 Varying result types
- “Documents”, tweets, courses, people, experts, gene
expressions, temperatures, …
 Multiple dimensions of relevance
- Topicality, recency, reading level, …
Actual information needs often require a mix within
and across dimensions. E.g., “recent news and
patents from our top competitors”
 System’s internal information representation
- Linguistic annotations
  - Named entities, sentiment, dependencies, …
- Knowledge resources
  - Wikipedia, Freebase, ICD-9, IPTC, …
- Links to related documents
  - Citations, URLs
 Anchors that describe the URI
- Anchor text
 Queries that lead to clicks on the URI
- Session, user, dwell-time, …
 Tweets that mention the URI
- Time, location, user, …
 Other social media that describe the URI
- User, rating
- Tag, organisation of ‘folksonomy’
+ UNCERTAINTY ALL OVER!

Editor's Notes

  1. The fundamental research questions are all about REPRESENTATION and MATCHING these representations. [MOUSECLICK] The long-term research agenda is to unify two fundamentally different views on these problems: those from the database domain and those from the information retrieval domain. This is a fundamental challenge, as the deductive approach of the DB world is not easily brought together with the inductive approach underlying IR.
  2. Some of the research is really about mathematical modelling, like our recent ACM SIGIR paper on [MOUSECLICK] deploying copulas, a mathematical approach first applied in economics to represent macro-economic processes, to model [MOUSECLICK] the interactions between different types of relevance; here, topic relevance and subjectivity.