Algorithms for Large Data Sets. Ziv Bar-Yossef. Lecture 2, March 26, 2006. http://www.ee.technion.ac.il/courses/049011
Information Retrieval
Information Retrieval Setting
[Diagram: A user has an “information need” (“I want information about Michael Jordan, the machine learning expert”) and expresses it as a query (+”Michael Jordan” -basketball). The IR system runs the query against the document collection and returns a ranked list of retrieved documents. The user gives feedback (“No. 1 is good, rest are bad”) and receives a revised ranked list of retrieved documents.]
Information Retrieval vs. Data Retrieval
                | Information Retrieval       | Data Retrieval
Queries         | Keywords, natural language  | SQL, relational algebras
Results         | Approximate matches         | Exact matches
Results         | Ordered by relevance        | Unordered
Accessibility   | Non-expert humans           | Knowledgeable users or automatic processes
Data            | Free text, unstructured     | Database tables, structured
Information Retrieval Systems [Diagram. Components: user, corpus, text processor, indexer, index, query processor, ranking procedure. Data flows: raw docs, tokenized docs, postings, user query, system query, retrieved docs, ranked retrieved docs.]
Search Engines [Diagram. Components: user, Web, crawler, repository, global analyzer, text processor, indexer, index, query processor, ranking procedure. Data flows: tokenized docs, postings, user query, system query, retrieved docs, ranked retrieved docs.]
Classical IR vs. Web IR
                     | Classical IR     | Web IR
Data quality         | Clean, no dups   | Noisy, dups
Data change rate     | Infrequent       | In flux
Format diversity     | Homogeneous      | Widely diverse
IR techniques        | Content-based    | Link-based
Volume               | Large            | Huge
Data accessibility   | Accessible       | Partially accessible
# of matches         | Small            | Large
Documents            | Text             | Hypertext
Outline
Abstract Formulation
Document Representation
Classic IR (Relevance) Models
The Boolean Model
The Boolean Model: Pros & Cons
The Vector Space Model (VSM)
Popular Similarity Measures [Figure labels: d, q, d - q]
TF-IDF Score: Motivation
TF-IDF Score: Definition
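A minimal sketch of TF-IDF weighting combined with cosine similarity, as an illustration of how the vector space model scores documents. The specific weighting is an assumption (raw term count times log(N / document frequency), one common variant); the function names `tfidf_vectors` and `cosine` and the three toy documents are hypothetical and not taken from the lecture.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} dict per doc.

    Assumed weighting: tf = raw count, idf = log(N / df).
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy collection.
docs = [
    ["michael", "jordan", "professor"],
    ["michael", "jordan", "nba"],
    ["graphical", "models"],
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # share "michael" and "jordan"
sim_02 = cosine(vecs[0], vecs[2])  # share no terms
```

Terms that occur in fewer documents get a larger idf, so the shared but common terms "michael" and "jordan" contribute less to the similarity than the rarer "professor" or "nba" contribute to each vector's length.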
VSM: Pros & Cons
Retrieval Evaluation [Figure labels: D, L, q, D_q. Formulas: Recall, Precision]
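The two measures can be written out with the standard set-based definitions. The notation here is an assumption: $D_q \subseteq D$ is the set of documents in the collection $D$ that are relevant to query $q$, and $L$ is the set of documents the system retrieved for $q$.

```latex
\mathrm{Recall} = \frac{|L \cap D_q|}{|D_q|},
\qquad
\mathrm{Precision} = \frac{|L \cap D_q|}{|L|}
```

Recall is the fraction of the relevant documents that were retrieved; precision is the fraction of the retrieved documents that are relevant.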
Recall & Precision: Example. Relevant docs: d123, d56, d9, d25, d3 [Lists A and B: two ranked result lists]
Precision@k and Recall@k [Formulas: Recall@k, Precision@k]
Precision@k: Example [Lists A and B]
Recall@k: Example [Lists A and B]
“Interpolated” Precision. Interpolated precision at recall level r = max { precision@k : k >= k(r) }
Precision vs. Recall: Example [Lists A and B]
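A short sketch of the three ranked-list measures. The relevant set is the one named in the example; the ranked list is hypothetical, since the contents of Lists A and B are not available here. It also assumes that k(r) means the smallest k at which recall@k reaches r, which the surviving formula does not spell out.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


def interpolated_precision(ranked, relevant, r):
    """max { precision@k : k >= k(r) }.

    Assumption: k(r) is the smallest k with recall@k >= r.
    Returns 0.0 if the list never reaches recall level r.
    """
    n = len(ranked)
    k_r = next(
        (k for k in range(1, n + 1) if recall_at_k(ranked, relevant, k) >= r),
        None,
    )
    if k_r is None:
        return 0.0
    return max(precision_at_k(ranked, relevant, k) for k in range(k_r, n + 1))


relevant = {"d123", "d56", "d9", "d25", "d3"}
# Hypothetical ranking; relevant documents sit at ranks 1, 3, 6 and 10.
ranked = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187", "d25"]

p5 = precision_at_k(ranked, relevant, 5)            # 2 of the top 5 are relevant
r10 = recall_at_k(ranked, relevant, 10)             # 4 of the 5 relevant docs found
ip = interpolated_precision(ranked, relevant, 0.4)  # best precision from rank 3 on
```

Because it takes a maximum over all deeper cutoffs, interpolated precision never increases as the recall level r grows, which smooths the sawtooth shape of a raw precision vs. recall curve.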
Query Languages: Keyword-Based
Query Languages: Pattern Matching
Text Processing
Inverted Index
d1: Michael(1) Jordan(2), the(3) author(4) of(5) “graphical(6) models(7)”, is(8) a(9) professor(10) at(11) U.C.(12) Berkeley(13).
d2: The(1) famous(2) NBA(3) legend(4) Michael(5) Jordan(6) liked(7) to(8) date(9) models(10).
Vocabulary    Postings
author:       (d1,4)
berkeley:     (d1,13)
date:         (d2,9)
famous:       (d2,2)
graphical:    (d1,6)
jordan:       (d1,2), (d2,6)
legend:       (d2,4)
like:         (d2,7)
michael:      (d1,1), (d2,5)
model:        (d1,7), (d2,10)
nba:          (d2,3)
professor:    (d1,10)
uc:           (d1,12)
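The index above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the stopword list and the two-entry stem table are hand-picked so that the output matches the example (a real system would use a proper stemmer), and `build_index` is a hypothetical name. Positions are 1-based and count every word, including stopwords, as in the example.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "of", "is", "a", "at", "to"}  # assumed stopword list
STEMS = {"models": "model", "liked": "like"}      # toy stand-in for a stemmer


def build_index(docs):
    """docs: {doc_id: text}. Returns {term: [(doc_id, position), ...]} sorted by term."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.split(), start=1):
            # Normalize: lowercase and drop punctuation ("U.C." becomes "uc").
            term = re.sub(r"[^a-z0-9]", "", word.lower())
            term = STEMS.get(term, term)
            if term and term not in STOPWORDS:
                index[term].append((doc_id, pos))
    return dict(sorted(index.items()))


docs = {
    "d1": 'Michael Jordan, the author of "graphical models", '
          "is a professor at U.C. Berkeley.",
    "d2": "The famous NBA legend Michael Jordan liked to date models.",
}
index = build_index(docs)
for term, postings in index.items():
    print(f"{term}: {postings}")
```

Storing the position next to the document ID is what makes phrase and proximity queries possible later: "michael jordan" matches wherever the two terms occur at consecutive positions in the same document.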
Inverted Index Structure. Vocabulary File: term1, term2, … Postings File: postings list 1, postings list 2, …
Searching an Inverted Index
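As an illustration of one search step, here is a sketch of answering a two-term conjunctive (AND) query by merging the two postings lists. It assumes each postings list is kept sorted by document ID and is reduced to bare integer IDs; `intersect` is a hypothetical name, and this merge is a standard technique rather than necessarily the procedure given in the lecture.

```python
def intersect(p1, p2):
    """Merge-intersect two lists of document IDs, each sorted in increasing order."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            # The document contains both terms.
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


# Using the Michael Jordan example with d1 -> 1 and d2 -> 2:
michael = [1, 2]
nba = [2]
both = intersect(michael, nba)  # documents containing "michael" AND "nba"
```

The merge reads each list once, so its cost is linear in the combined length of the two lists; for queries with more terms, intersecting the shortest lists first keeps the intermediate results small.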
Search Optimization
Index Construction
Index Maintenance
Index Maintenance
Index Compression
Original postings: michael: (1000007,5), (1000009,12), (1000013,77), (1000035,88), …
With document IDs stored as gaps from the previous entry: michael: (1000007,5), (2,12), (4,77), (22,88), …
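The transformation in this example can be sketched as a pair of functions. The names `gap_encode` and `gap_decode` are hypothetical, and the second number in each pair is carried along untouched as an opaque payload, since its meaning is not stated here.

```python
def gap_encode(postings):
    """Replace each document ID by its difference from the previous document ID.

    The first ID is stored as-is (its gap from 0). Postings must be sorted by ID.
    """
    encoded = []
    prev = 0
    for doc_id, payload in postings:
        encoded.append((doc_id - prev, payload))
        prev = doc_id
    return encoded


def gap_decode(encoded):
    """Invert gap_encode by summing the gaps back into absolute document IDs."""
    postings = []
    doc_id = 0
    for gap, payload in encoded:
        doc_id += gap
        postings.append((doc_id, payload))
    return postings


postings = [(1000007, 5), (1000009, 12), (1000013, 77), (1000035, 88)]
gaps = gap_encode(postings)
print(gaps)  # [(1000007, 5), (2, 12), (4, 77), (22, 88)]
```

Gaps by themselves save nothing if every number still occupies a fixed-width integer; the benefit comes from pairing them with an encoding that spends fewer bits on small numbers.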
Variable Length Encodings
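As one concrete instance of a variable-length encoding, here is a sketch of a byte-aligned scheme (often called variable-byte or varint coding). Choosing this scheme is an assumption: it is a widely used encoding, but it may not be one of the encodings this lecture covers. Each byte carries 7 bits of the number, lowest bits first, and the high bit marks that more bytes follow. The function names are hypothetical.

```python
def vbyte_encode(n):
    """Encode a non-negative integer in 7-bit groups, least significant group first.

    The high bit of each byte is 1 if more bytes follow and 0 on the last byte.
    """
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)         # last byte of this number
            return bytes(out)


def vbyte_decode_all(data):
    """Decode a concatenation of vbyte-encoded integers."""
    numbers = []
    n = shift = 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(n)
            n = shift = 0
    return numbers


# The document-ID gaps from the "michael" postings list.
gaps = [1000007, 2, 4, 22]
encoded = b"".join(vbyte_encode(g) for g in gaps)
decoded = vbyte_decode_all(encoded)
```

Here the first, large ID takes three bytes and each small gap takes one, so the four numbers fit in six bytes instead of the sixteen that four 32-bit integers would need.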
End of Lecture 2
