Beauty of IR
Venkatesh Vinayakarao
An IR enthusiast!
Disclaimer
Most examples and discussions in this talk revolve
around well known search engines. This is just to get
a good learning experience. Please keep in mind
that IR is beyond search engines.
25+ slides of interesting discussion ahead…
2/2014 Venkatesh Vinayakarao 2
Quiz
1. Explain any two challenges in Query Intent
Understanding using examples, and discuss
why it is a hard problem.
2. How are “Tiles”, as discussed in class, used in
search engines? What purpose do they serve?
3. Search engines have no UI-related design
concerns. True/False?
About Me
BE Computer Science (Y2K)
MS (IT)
IT Service Industry
Start Up
Nokia
Yahoo
Microsoft (Bing)
PhD
Let me learn everything all over again!
Our Agenda: The Beauty of IR!
Offline Horror! Crawling, Content Processing, Indexing
Online Terror! Query (Intent) Understanding, Ranking, User Interface
Me!
How to process Korean queries for local listings?
Crawling
• How frequently should we crawl?
• Fresh & Super-Fresh! How to crawl cricket scores? Are we even crawling here?
• How to avoid 404 (Page not found)?
• How much time did it take Google to show your first personal page?
Content Processing
• Good read: https://getlisted.org/static/resources/local-search-data-providers.html
Content Processing
• Query: “Schools in Delhi”
• Answer: “Delhi Public School”
• Good or Bad?
• Query: “Schools in Hyderabad”
• Answer: “Delhi Public School”
• Good or Bad?
• Query: “Hotels in Bombay”
• Answer: “Grand Hyatt, Mumbai”
• Good or Bad?
• How to get the same results for both Mumbai and Bombay?
• Query: “Maruti Car service in delhi”
• Answer: “Rana Motors Private Limited”.
• What happened?
Content Processing & Indexing
• A real example: http://www.yelp.com/dataset_challenge/
Enriched Business
• Category synonyms (e.g., auto service & car service are replaceable at times)
• Users’ query forms (e.g., McDonalds is commonly queried as McD)
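One way to picture this kind of enrichment is a normalization step applied to both the indexed business text and the incoming query, so that variants collapse to one canonical form. The sketch below is only an illustration: the three lookup tables and the `normalize` function are hypothetical names, and the entries are just the examples from these slides (car service / auto service, McD / McDonalds, Bombay / Mumbai). A real engine would presumably mine such tables from data rather than hard-code them.

```python
import re

# Hypothetical lookup tables; entries are taken from the slide examples only.
CATEGORY_SYNONYMS = {"car service": "auto service"}
QUERY_FORMS = {"mcd": "mcdonalds"}
PLACE_ALIASES = {"bombay": "mumbai"}


def normalize(text: str) -> str:
    """Lowercase the text and rewrite known variants to a canonical form."""
    result = " ".join(text.lower().split())
    for table in (CATEGORY_SYNONYMS, QUERY_FORMS, PLACE_ALIASES):
        for variant, canonical in table.items():
            # Word boundaries stop "mcd" from matching inside "mcdonalds".
            pattern = r"\b" + re.escape(variant) + r"\b"
            result = re.sub(pattern, canonical, result)
    return result


if __name__ == "__main__":
    print(normalize("Hotels in Bombay"))
    print(normalize("Maruti Car service in delhi"))
```

Because the same function runs at index time and at query time, “Hotels in Bombay” and “Hotels in Mumbai” end up as the same string before matching.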
Derived Values & Indexing
• Given a location, how will you find all businesses within 1km radius?
• Query: schools near govindpuri delhi
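A minimal sketch of the radius question, assuming each business is stored with a latitude and longitude: compute the great-circle (haversine) distance from the query location and keep everything within 1 km. The function names and the sample coordinates are made up for illustration. This is a brute-force scan; an actual index would more likely precompute a derived value per business (for example a grid cell) so that only nearby candidates are checked.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, businesses, radius_km=1.0):
    """Return names of businesses no farther than radius_km from center."""
    lat, lon = center
    return [name for name, blat, blon in businesses
            if haversine_km(lat, lon, blat, blon) <= radius_km]


if __name__ == "__main__":
    # Hypothetical coordinates, not real listings.
    listings = [("School A", 28.545, 77.27), ("School B", 28.56, 77.27)]
    print(within_radius((28.54, 77.27), listings))
```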
Query Understanding Challenge
Need a team of 3 people and one laptop.
Volunteers?
Rules
• I will give an entity name.
• You will have to frame at least three different (dissimilar) queries (and as many as you can) that give the same document as the correct result in first place.
• At the end, you should submit:
  • Query, max. no. of top n correct results that you maintained to be the same.
• You will have 5 minutes.
Questions
• Tom Cruise
• Aishwarya Rai
• Tom Hanks
• Srikanta Bedathur
• Venkatesh Vinayakarao
• Pankaj Jalote
• Amir Khan
• Andre Agassi
• Manmohan Singh
Query Understanding
• Query: Michael Jordan
  • Which MJ to return? The basketball player or the actor?
• Factors
  • User profile
  • Query context (session details, browser data, links, etc.)
  • …
• Query: Delhi School
  • What does the user want? “Delhi Public School” or “Schools in Delhi” or “some Indian school in US”?
• Query: “IR”
  • Predict top three results
Ok! I give up!!
• A frustrated search user: “please show me some t-shirt brands”
More fun with auto completion
System Overview (Simplified)
[Diagram: User Query; Front-end (x4); Query Understanding, Query Classifiers; Expanded Query; Web Answer, Local Answer, Finance Answer, Tech Answer & many more; KB; Index Serve; Crawled Content; Crawler; Web]
Ranking & Relevance
• How do we know if the document is relevant (in web search context)?
  • Popularity of URL
  • Domain score (is it .ac.in or .edu?)
  • TF, IDF
  • Entity, Chain entity?
  • Trust Factor (Wikipedia?)
  • Inlinks/Outlinks
  • Position of query terms
  • Sequence of query terms
  • … and 1000s of such things
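Of the signals listed above, TF and IDF are the easiest to show in a few lines. The sketch below scores each document as the sum over query terms of (term count in the document) × log(N / number of documents containing the term). This is one common textbook variant, chosen here as an assumption; the slides do not say which weighting a given engine uses, and `tf_idf_scores` and the sample documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Score each document against the query with a plain TF x IDF sum."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term]:
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = ["delhi public school", "schools in delhi", "coffee shops in delhi"]
    print(tf_idf_scores("public school", docs))
    print(tf_idf_scores("delhi", docs))
```

Note how a term that occurs in every document (“delhi” in this toy corpus) gets an IDF of zero and so contributes nothing, and how “school” does not match “schools” without further content processing.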
Are current search engines good at
relevance & ranking?
Bing Google
Query1: Vegetarian hotels in south delhi
Query2: South Indian hotels in south delhi
…More examples
Query3: South Indian restaurants in south delhi
What’s the difference between query2 and query3? Should search engines give different
results?
How far for a coffee?
Google: Just one word (iiitd) missing. So what?
Let’s make the query “coffee shops near iiitd delhi”.
“Coffee shops near me” gives results from Janakpuri, Gurgaon, CP & Kamla Nagar.
Why is it hard?
• What makes Ranking & Relevance hard?
User Interface
• Is UI important for a search engine?
  • Maps in local results
  • Live sport score cards
  • Finance tickers
  • Filters
  • Search Operators
  • Entity Infoboxes
• What impact do these make?
Our Agenda: The Beauty of IR!
Offline Horror! Crawling, Content Processing, Indexing
Online Terror! Query (Intent) Understanding, Ranking, User Interface
Me!
How to process Korean queries for local listings?
Evaluation
• Various evaluation methods
  • Precision/Recall
  • Mean Avg Precision
  • Mean Reciprocal Rank
    • If the first relevant doc is at the kth position, RR = 1/k.
  • NDCG
    • Non-Boolean/Graded relevance scores
    • $\mathrm{DCG} = r_1 + \frac{r_2}{\log_2 2} + \frac{r_3}{\log_2 3} + \dots + \frac{r_n}{\log_2 n}$
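The first few measures on this slide can be sketched directly from their definitions. In the code below, precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that were retrieved, and the reciprocal rank is 1/k for the first relevant document at position k (taken as 0 when none is found, which is a convention assumed here). Function names and document ids are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def reciprocal_rank(ranked, relevant):
    """1/k where k is the 1-based position of the first relevant document."""
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / k
    return 0.0  # assumed convention when nothing relevant is returned


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    print(precision_recall(["d1", "d2", "d3", "d4"], {"d2", "d4", "d5"}))
    print(mean_reciprocal_rank([(["a", "b"], {"a"}), (["a", "b", "c"], {"c"})]))
```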
NDCG - Example
4 documents: d1, d2, d3, d4

i   Ground Truth             Ranking Function 1       Ranking Function 2
    Document Order   r_i     Document Order   r_i     Document Order   r_i
1   d4               2       d3               2       d3               2
2   d3               2       d4               2       d2               1
3   d2               1       d2               1       d4               2
4   d1               0       d1               0       d1               0

NDCG_GT = 1.00               NDCG_RF1 = 1.00          NDCG_RF2 = 0.9203
Taken from http://www.stanford.edu/class/cs276/handouts/EvaluationNew.ppt
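The NDCG figures in this example can be reproduced with a short sketch that uses the DCG form from the Evaluation slide: the first gain is undiscounted and the gain at position i ≥ 2 is divided by log2(i). Normalizing by the DCG of the ideal (descending) ordering gives NDCG. The function names are illustrative, and other DCG variants exist that discount differently.

```python
import math


def dcg(gains):
    """DCG = r1 + r2/log2(2) + r3/log2(3) + ... + rn/log2(n)."""
    return sum(g if i == 1 else g / math.log2(i)
               for i, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the given ordering divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(round(ndcg([2, 2, 1, 0]), 4))  # Ranking Function 1: d3, d4, d2, d1
    print(round(ndcg([2, 1, 2, 0]), 4))  # Ranking Function 2: d3, d2, d4, d1
```

Ranking Function 1 swaps two documents with equal gain, so its score stays at 1.00; Ranking Function 2 pushes a gain-2 document to position 3 and drops to 0.9203.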
Are we done?
• Q & A
