Semantic Search on the Rise
Peter Mika | Yahoo Labs
Tran Duc Thanh | LyfeLine Corporation
About the speakers
 Peter Mika
› Senior Research Scientist
› Head of Semantic Search group at Yahoo! Labs
› Expertise: Semantic Web, Information Retrieval,
Natural Language Processing
 Tran Duc Thanh
› CTO of LyfeLine Corporation, Tech Startup, Santa Clara
› Assistant Professor, San Jose State University (on leave)
› Served as Assistant Professor at
Stanford University and Karlsruhe Institute of Technology
› Expertise: Semantic Search, Semantic / Linked Data Management
Agenda
 What is Semantic Search?
 Semantic Search technology
 Applications
 Beyond Web Search
 Q&A
What is Semantic Search?
Why Semantic Search? Part I.
 Improvements in IR are harder and harder to come by
› Basic relevance models are well established
› Machine learning using hundreds of features
› Heavy investment in computational power, e.g. real-time indexing and instant search
 Remaining challenges are not computational, but in modeling user
cognition
› Modeling the relationships between:
• the query
• the content
• the world at large
 Semantic gap
› Ambiguity
• jaguar
• paris hilton
› Secondary meaning
• george bush (and I mean the beer brewer
in Arizona)
› Subjectivity
• reliable digital camera
• paris hilton sexy
› Imprecise or overly precise searches
• jim hendler
 Complex needs
› Missing information
• brad pitt zombie
• florida man with 115 guns
• 35 year old computer scientist living in barcelona
› Category queries
• countries in africa
• barcelona nightlife
› Relational, transactional or computational
queries
• friends of peter who know VCs in the Bay Area
• 120 dollars in euros
• digital camera under 300 dollars
• world temperature in 2020
Poorly solved information needs remain
Are there even true keyword queries? Users may have stopped asking them.
Real problem
What is it like to be a machine?
Roi Blanco
What is it like to be a machine?
↵⏏☐ģ
✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓
ţğ★✜
✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫
≠=⅚©§★✓♪ΒΓΕ℠
✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ
⏎⌥°¶§ΥΦΦΦ✗✕☐
Why Semantic Search? Part II.
 The Semantic Web is now a reality
› Emerging agreements around schemas
• Facebook’s Open Graph Protocol (OGP)
• Schema.org
› Large amounts of data published in RDF
• As Linked Data
• Inside HTML pages
• Inside email text messages
› Private Knowledge Graphs inside corporations
 Semantic data exploited by search engines
› Better document presentation and ranking
› Advanced search functionality
Metadata in HTML: schema.org
 Agreement on a shared set of schemas for common types of web
content
› Bing, Google, and Yahoo! as initial founders (June, 2011), joined by Yandex later
› Similar in intent to sitemaps.org
• Use a single format to communicate the same information to all three search engines
<div vocab="http://schema.org/" typeof="Movie">
<h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
<span property="description">Jack Sparrow and Barbossa embark on a quest to
find the elusive fountain of youth, only to discover that Blackbeard and
his daughter are after it too.</span>
Director: <div property="director" typeof="Person">
<span property="name">Rob Marshall</span>
</div>
</div>
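As a rough illustration of how a consumer might read markup like this, the sketch below pulls typed properties out of the `typeof` and `property` attributes using only Python's standard `html.parser`. The names `extract_rdfa_lite` and `SAMPLE` and the triple layout are assumptions made for this example; it is not how any particular search engine processes schema.org data, and it ignores `vocab`, `resource` and void elements such as `<br>`.

```python
from html.parser import HTMLParser


class _RdfaLiteParser(HTMLParser):
    """Collects (subject type, property, value) triples from typeof/property attributes."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.type_scopes = []  # (depth, type) for each open element carrying typeof
        self.open_props = []   # [depth, subject type, property, text chunks]
        self.triples = []

    def _current_type(self):
        return self.type_scopes[-1][1] if self.type_scopes else None

    def handle_starttag(self, tag, attrs):
        # Assumption: every start tag has a matching end tag (no void elements).
        self.depth += 1
        attr = dict(attrs)
        prop, new_type = attr.get("property"), attr.get("typeof")
        if prop and new_type:
            # Property whose value is a nested typed item, e.g. director -> Person.
            self.triples.append((self._current_type(), prop, new_type))
        elif prop:
            # Literal property: its value is the text inside the element.
            self.open_props.append([self.depth, self._current_type(), prop, []])
        if new_type:
            self.type_scopes.append((self.depth, new_type))

    def handle_data(self, data):
        for entry in self.open_props:
            entry[3].append(data)

    def handle_endtag(self, tag):
        if self.open_props and self.open_props[-1][0] == self.depth:
            _, subject, prop, chunks = self.open_props.pop()
            text = " ".join("".join(chunks).split())  # normalize whitespace
            self.triples.append((subject, prop, text))
        if self.type_scopes and self.type_scopes[-1][0] == self.depth:
            self.type_scopes.pop()
        self.depth -= 1


def extract_rdfa_lite(html):
    """Return the triples found in an HTML fragment (hypothetical helper)."""
    parser = _RdfaLiteParser()
    parser.feed(html)
    parser.close()
    return parser.triples


SAMPLE = """
<div vocab="http://schema.org/" typeof="Movie">
<h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
Director: <div property="director" typeof="Person">
<span property="name">Rob Marshall</span>
</div>
</div>
"""

if __name__ == "__main__":
    for triple in extract_rdfa_lite(SAMPLE):
        print(triple)
```

A production consumer would more likely rely on a full RDFa or microdata parser; the point here is only that the markup yields typed entities and attributes rather than free text.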
Substantial adoption of schema.org markup
 Over 15% of all pages now have schema.org markup
 Over 5 million sites, over 25 billion entity references
 In other words: same order of magnitude as the web
› Source: R.V. Guha: Light at the end of the tunnel, ISWC 2013 keynote
 See also
› P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
• Based on Bing US corpus
• 31% of webpages, 5% of domains contain some metadata (including Facebook’s OGP)
› WebDataCommons
• Based on CommonCrawl Nov 2013
• 26% of webpages, 14% of domains contain some metadata (including Facebook’s OGP)
Semantic Search technology
 Def. Semantic Search is any
retrieval method where
› User intent and resources are
represented in a semantic model
• A set of concepts or topics that generalize
over tokens/phrases
• Additional structure such as a hierarchy
among concepts, relationships among
concepts etc.
› Semantic representations of the query
and the user intent are exploited in
some part of the retrieval process
 As a research field
› Workshops
• ESAIR (2008-2014) at CIKM, Semantic
Search (SemSearch) workshop series
(2008-2011) at ESWC/WWW, EOS
workshop (2010-2011) at SIGIR, JIWES
workshop (2012) at SIGIR, Semantic
Search Workshop (2011-2014) at VLDB
› Special Issues of journals
› Surveys
• Christos L. Koumenides, Nigel R.
Shadbolt: Ranking methods for entity-
oriented semantic web search.
JASIST 65(6): 1091-1106 (2014)
Semantic Search
Semantic models: implicit vs. explicit
 Implicit/internal semantics
› Models of text extracted from a corpus of queries, documents or interaction logs
• Query reformulation, term dependency models, translation models, topic models, latent space
models, learning to match (PLS)
› See
• Hang Li and Jun Xu: Semantic Matching in Search. Foundations and Trends in Information
Retrieval Vol 7 Issue 5, 2013, pp 343-469
 Explicit/external semantics
› Explicit linguistic or ontological structures extracted from text and linked to external
knowledge
› Obtained using IE techniques or acquired from Semantic Web markup
Entity Linking vs. Entity Retrieval
 Entity Linking
› Recognizing entities that are explicitly mentioned in queries and linking them to a KB
 Entity Retrieval
› Ranking entities in a KB, given a query
› Result may not be explicitly mentioned in the query
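The difference between the two tasks can be sketched with a toy knowledge base. Everything below is hypothetical (the `TOY_KB` contents, the function names and the overlap-based scoring) and is meant only to contrast the tasks, not to represent a real linking or ranking method.

```python
# Hypothetical three-entity knowledge base: surface names plus a short description.
TOY_KB = {
    "brad_pitt": {"names": ["brad pitt"], "text": "american actor and film producer"},
    "world_war_z": {"names": ["world war z"], "text": "zombie film starring brad pitt"},
    "paris_hilton": {"names": ["paris hilton"], "text": "american media personality"},
}


def link_entities(query, kb=TOY_KB):
    """Entity linking: find KB names that are explicitly mentioned in the query."""
    padded = " " + " ".join(query.lower().split()) + " "
    links = []
    for entity_id, entry in kb.items():
        for name in entry["names"]:
            if f" {name} " in padded:
                links.append((name, entity_id))
    return links


def retrieve_entities(query, kb=TOY_KB):
    """Entity retrieval: rank KB entities by term overlap with names and description."""
    terms = set(query.lower().split())
    scored = []
    for entity_id, entry in kb.items():
        doc_terms = set(" ".join(entry["names"] + [entry["text"]]).split())
        score = len(terms & doc_terms)
        if score:
            scored.append((score, entity_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [entity_id for _, entity_id in scored]


if __name__ == "__main__":
    print(link_entities("brad pitt zombie"))
    print(retrieve_entities("brad pitt zombie"))
```

For the query `brad pitt zombie`, linking only finds the mentioned actor, while retrieval ranks the film first even though its name never appears in the query.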
What is it like to be a machine?
↵⏏☐ģ
✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓
ţğ★✜
✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫
≠=⅚©§★✓♪ΒΓΕ℠
✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ
⏎⌥°¶§ΥΦΦΦ✗✕☐
Entity Linking
<roi>↵⏏☐ģ</roi>
✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓
ţğ★✜
✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫
≠=⅚©§★✓♪ΒΓΕ℠
✖Γ♫⅜±<roi>⏎↵⏏☐ģ</roi>ğğğμλκσςτ
⏎⌥°¶§ΥΦΦΦ✗✕☐
<roi>
Entity Retrieval
↵⏏☐ģ
<roi>
<kia>
<rio>
The role of entities in queries
 Entities play an important role
› ~70% of queries contain a named entity (entity mention queries) and
~50% of queries have an entity focus (entity seeking queries)
• brad pitt attacked by fans
› ~10% of queries are looking for a class of entities
• brad pitt movies
› See
• Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW
2010: 771-780
• Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects:
actions for entity-centric search. WWW 2012: 589-598
Entity linking in queries
 Common structure to entity mention queries:
query = <entity> + <intent>
› Intent is typically an additional word or phrase to
• Disambiguate, e.g. brad pitt actor
• Specify action or aspect e.g. brad pitt net worth, brad pitt download
 Entity linking in queries
› Tutorial: Entity Linking and Retrieval by Edgar Meij, Krisztián Balog and Daan Odijk
› Microsoft Entity Linking challenge
› Yahoo WebScope dataset L24 - Yahoo Search Query Log To Entities, version 1.0
 Session-level analysis
› Recognize entities and intents at the session level
› Laura Hollink, Peter Mika, Roi Blanco: Web usage mining with semantic analysis. WWW 2013: 561-570
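The `query = <entity> + <intent>` structure described above can be sketched as a longest-match lookup against a dictionary of entity names. The `KNOWN_ENTITIES` set and the `split_entity_intent` helper are hypothetical stand-ins for a real knowledge base and linker, and real systems also have to handle ambiguity and misspellings.

```python
# Hypothetical dictionary of entity surface forms (stand-in for a knowledge base).
KNOWN_ENTITIES = {"brad pitt", "paris hilton"}


def split_entity_intent(query, entities=KNOWN_ENTITIES):
    """Return (entity, intent): the longest known entity span and the remaining words."""
    tokens = query.lower().split()
    # Try longer spans first so that the longest matching name wins.
    for length in range(len(tokens), 0, -1):
        for start in range(len(tokens) - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in entities:
                rest = tokens[:start] + tokens[start + length:]
                return span, " ".join(rest)
    return None, " ".join(tokens)


if __name__ == "__main__":
    for q in ["brad pitt actor", "brad pitt net worth", "jaguar"]:
        print(q, "->", split_entity_intent(q))
```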
Entity Retrieval
 Keyword search over entity graphs
› See Pound et al. (WWW 2010) for a definition
› No common benchmark until 2010
 SemSearch Challenge 2010/2011
• 50 entity-mention queries selected from the Search Query Tiny Sample v1.0 dataset (Yahoo! Webscope)
• Billion Triples Challenge 2009 data set
• Evaluation using Mechanical Turk
› See report:
• Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson,
Thanh Tran: Repeatable and reliable semantic search evaluation. J. Web Sem. 21: 14-29 (2013)
Question Answering
 Question Answering over Linked Data competition
› 2011-2014
› Data
• DBpedia and MusicBrainz in RDF
› Queries
• Full natural language questions of different forms, written by the organizers
• Multi-lingual
• Give me all actors starring in Batman Begins
› Results are defined by an equivalent SPARQL query
• Systems are free to return list of results or a SPARQL query
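To make the idea of a result set defined by a structured query concrete, here is a minimal sketch that answers the example question over a handful of hand-written triples. The triple list is a toy subset written for this example, not actual DBpedia output, and the SPARQL string (including the `res:` and `dbo:` prefixes) is an assumption about what such a gold query could look like.

```python
# Toy (subject, predicate, object) triples; a hand-made subset for illustration only.
TRIPLES = [
    ("Batman_Begins", "starring", "Christian_Bale"),
    ("Batman_Begins", "starring", "Michael_Caine"),
    ("Batman_Begins", "director", "Christopher_Nolan"),
    ("Inception", "starring", "Leonardo_DiCaprio"),
]

# Assumed shape of an equivalent SPARQL query; it is not executed here.
GOLD_SPARQL = """
SELECT DISTINCT ?uri WHERE {
  res:Batman_Begins dbo:starring ?uri .
}
"""


def match(pattern, triples=TRIPLES):
    """Evaluate one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant does not match
        else:
            results.append(binding)
    return results


if __name__ == "__main__":
    answers = {b["?uri"] for b in match(("Batman_Begins", "starring", "?uri"))}
    print(sorted(answers))
```

A system that returns a list of results would be compared against this answer set; a system that returns a query would have the query evaluated first.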
Applications
Semantic Search for…
 Improving ad-hoc document retrieval
› Query composition
› Result presentation
› Matching
› Ranking
 Providing new search functionality
› Entity retrieval
› Personalization
› Related entity recommendation
› Complex question-answering, relational search, computational search…
› Task completion
Exploiting Semantic Web markup
(Yahoo internal prototype, 2007)
Personal and private homepage of the same person (clear from the snippet, but it could also be automatically de-duplicated)
Conferences he plans to attend and his vacations from homepage plus bio events from LinkedIn
Geolocation
Search snippets using Semantic Web markup
 Summarization of HTML is a hard task
• Template detection
• Selecting relevant snippets
• Composing readable text
› Efficiency constraints
 Yahoo SearchMonkey (2008)
› Enhanced results using structured data from the page
• Key/value pairs
• Deep links
• Image or Video
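As a sketch of what an enhanced result might carry, the structure below bundles the three kinds of structured data listed above and renders them as plain text. The `EnhancedResult` class, its field names and the `render_text` layout are hypothetical; they do not reproduce the SearchMonkey data model or its presentation.

```python
from dataclasses import dataclass, field


@dataclass
class EnhancedResult:
    """Hypothetical container for a search result enriched with page metadata."""
    title: str
    url: str
    summary: str
    key_values: dict = field(default_factory=dict)   # e.g. {"Director": "Rob Marshall"}
    deep_links: list = field(default_factory=list)   # (label, url) pairs
    image_url: str = ""


def render_text(result, max_pairs=4):
    """Render the result as text lines, capping the number of key/value pairs shown."""
    lines = [result.title, result.url]
    if result.image_url:
        lines.append(f"[image: {result.image_url}]")
    lines.append(result.summary)
    for key, value in list(result.key_values.items())[:max_pairs]:
        lines.append(f"{key}: {value}")
    if result.deep_links:
        lines.append(" | ".join(label for label, _ in result.deep_links))
    return "\n".join(lines)


if __name__ == "__main__":
    example = EnhancedResult(
        title="Pirates of the Caribbean: On Stranger Tides (2011)",
        url="http://example.com/movie",
        summary="Jack Sparrow and Barbossa embark on a quest.",
        key_values={"Director": "Rob Marshall"},
        deep_links=[("Cast", "http://example.com/movie/cast")],
    )
    print(render_text(example))
```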
Effectiveness of enhanced results (Yahoo)
 Explicit user feedback
› Side-by-side editorial evaluation (A/B testing)
• Editors are shown a traditional search result and enhanced result for the same page
• Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)
 Implicit user feedback
› Click-through rate analysis
• Long dwell time limit of 100s (Ciemiewicz et al. 2010)
• 15% increase in ‘good’ clicks
› User interaction model
• Enhanced results lead users to relevant documents
– even though they are less likely to be clicked than textual results
• Enhanced results effectively reduce bad clicks!
 See
› Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011:
725-734
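The click-through analysis above can be sketched as a simple computation, assuming that a "good" click is one whose dwell time reaches the 100 s limit. The dwell times below are invented for illustration and are not data from the cited study; they are not meant to reproduce the reported 15% figure.

```python
LONG_DWELL_SECONDS = 100  # dwell-time limit quoted above


def good_click_rate(dwell_times, threshold=LONG_DWELL_SECONDS):
    """Fraction of clicks whose dwell time is at least the threshold (assumed definition)."""
    if not dwell_times:
        return 0.0
    good = sum(1 for seconds in dwell_times if seconds >= threshold)
    return good / len(dwell_times)


def relative_change(baseline, treatment):
    """Relative change of the treatment rate with respect to the baseline rate."""
    return (treatment - baseline) / baseline


if __name__ == "__main__":
    # Invented dwell times in seconds, one entry per click.
    traditional = [5, 120, 30, 150, 8, 200, 12, 40, 101, 9]
    enhanced = [130, 110, 20, 300, 7, 105, 15, 250, 60, 100]
    base = good_click_rate(traditional)
    treat = good_click_rate(enhanced)
    print(f"good-click rate: {base:.2f} -> {treat:.2f} "
          f"({relative_change(base, treat):+.0%})")
```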
Enhanced results at other search providers
 Google announces Rich Snippets - June, 2009
› Faceted search for recipes - Feb, 2011
 Bing tiles – Feb, 2011
 Facebook’s Like button and the Open Graph Protocol (2010)
› Shows up in profiles and news feed
› Site owners can later reach users who have liked an object
Moving beyond entity markup
 We would like to help our users in task completion
› But we have trained our users to talk in nouns
• Retrieval performance decreases by adding verbs to queries
› Markup for actions/intents could potentially help
 Modeling actions
› Understand what actions can be taken on a page
› Help users in mapping their query to potential actions
› Applications in web search, email etc.
Schema.org v1.2, including the Actions vocabulary, published April 16, 2014
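As one example of action markup, the sketch below builds a JSON-LD description of a site search action using the schema.org `SearchAction` type and the `potentialAction` property. The helper name `search_action_markup` and the example.com URLs are assumptions made for this illustration.

```python
import json


def search_action_markup(site_url, search_url_template):
    """Build JSON-LD declaring that the site can be searched via a URL template."""
    return {
        "@context": "http://schema.org",
        "@type": "WebSite",
        "url": site_url,
        "potentialAction": {
            "@type": "SearchAction",
            # The placeholder name must match the name given in query-input.
            "target": search_url_template,
            "query-input": "required name=search_term_string",
        },
    }


if __name__ == "__main__":
    markup = search_action_markup(
        "https://www.example.com/",
        "https://www.example.com/search?q={search_term_string}",
    )
    # The output would be embedded in a <script type="application/ld+json"> element.
    print(json.dumps(markup, indent=2))
```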
Applications of Actions markup
Email (Gmail) SERP (Yandex)
Personalized content and native ads (Yahoo)
 User profiling based on entities recognized in the content consumed
 News and ads personalized to the user
 Entity retrieval
› Which entity does a keyword query
refer to, if any?
 Related entities
› Which entity would the user visit next?
• Roi Blanco, B. Barla Cambazoglu, Peter
Mika, Nicolas Torzec:
Entity Recommendations in Web Search.
ISWC 2013
Entity displays in web search
(Bing/Google/Yahoo)
Relational Search (Facebook Graph Search)
Query: “my friends, who is member of queen”
Parse tree nodes (label: matched text; semantics):
[start]: my friends, who is member of [id:Queen1]; friends(x,me), member(x,Queen1)
[user-head]: my friends; friends(x,me)
[user-filter]: who is member of [id:1]; member(x,Queen1)
[who]: who; -
[member-vp]: is member of [id:1]; member(x,Queen1)
[member-of-v]: is member of; member()
[id:Queen1] {band}: queen; Queen1
Grammar: set of production rules, capturing all possible connections, i.e. the search space of all parse trees
[start] → [users]
[users] → my friends : friends(x, me)
[…] → is member of [bands] : member(x, $1)
[bands] → {band} : $1
…
Grammar-based Query
Translation: which combination of
production rules results in a parse
tree that connects the recognized
entities and relationships?
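A drastically simplified sketch of this kind of translation is shown below. It hard-codes two of the production rules for the example query rather than searching a space of parse trees, and the `BANDS` lexicon and the `translate` function are hypothetical; it is not Facebook's implementation.

```python
import re

# Hypothetical {band} lexicon mapping surface forms to entity identifiers.
BANDS = {"queen": "Queen1"}


def translate(query, bands=BANDS):
    """Translate 'my friends[, who is member of <band>]' into logical predicates.

    Returns a list of predicate strings, or None if no rule combination applies.
    """
    text = re.sub(r"[,\s]+", " ", query.lower()).strip()
    # [users] -> my friends : friends(x, me)
    if not text.startswith("my friends"):
        return None
    predicates = ["friends(x,me)"]
    rest = text[len("my friends"):].strip()
    if not rest:
        return predicates
    # filter -> who is member of [bands] : member(x, $1)
    found = re.fullmatch(r"who is member of (.+)", rest)
    if not found or found.group(1) not in bands:
        return None
    predicates.append(f"member(x,{bands[found.group(1)]})")
    return predicates


if __name__ == "__main__":
    print(translate("my friends, who is member of queen"))
```

A real grammar-based translator would enumerate candidate parse trees over many rules and entity types and then rank them; this sketch only shows how recognized phrases map to predicates.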
Relational Search (Facebook Graph Search)
Semantic auto-completion
- Entity + relationships
- Multi-source
- Domain-independent
- Low manual effort
[Figure: example data graph with the entities Freddie Mercury, Brian May, Queen, Queen Elizabeth 1 and Liar (1971, single); schema elements Person, Artist, Single and writer]
Query Translation
Semantic Search (Graphinder)
[Figure: the partial keyword query “single from freddy mercury que” is looked up in a data index and a schema index; candidate matches shown: Freddie Mercury, Queen, Queen Elizabeth 1, single, Single, writer]
Keyword Interpretation
- Imprecise / fuzzy matching
- Match every keyword
Token rewriting via syntactic distance
Relational Query Rewriting
1) single from freddie mercury queen
…
Token rewriting via semantic distance
1) single writer freddie mercury queen
…
[Figure: lookup of the rewritten query in the data index and the schema index; matches shown: Freddie Mercury, Queen, Single, writer]
Query segmentation
1) single writer “freddie mercury” queen
…
Result Retrieval & Ranking
Keyword / Key Phrase Interpretation:
- Precise matching
- Match keyword and key phrases
Benefits:
- Higher selectivity of query terms (quality)
- Reduced number of query terms (efficiency)
- Better search experience…
Challenges: many rewrite candidates, some are
semantically not “valid” in the relational setting
single (marital status) writer “freddie mercury” queen (the
queen of UK)
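The three rewriting steps above (syntactic token rewriting, semantic token rewriting and query segmentation) can be sketched with the standard library. The vocabulary, the `SEMANTIC_REWRITES` map and the phrase list are hypothetical stand-ins for the data and schema indexes, and `difflib` string similarity is swapped in for whatever syntactic distance Graphinder actually uses.

```python
import difflib

# Hypothetical index contents.
VOCAB = ["single", "writer", "freddie", "mercury", "queen", "elizabeth"]
SEMANTIC_REWRITES = {"from": "writer"}  # assumed mapping from a keyword to a schema term
PHRASES = {"freddie mercury", "queen elizabeth"}


def rewrite_syntactic(tokens, vocab=VOCAB, cutoff=0.6):
    """Replace unknown tokens by their closest vocabulary entry, if any is close enough."""
    out = []
    for token in tokens:
        if token in vocab:
            out.append(token)
            continue
        close = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
        out.append(close[0] if close else token)
    return out


def rewrite_semantic(tokens, rewrites=SEMANTIC_REWRITES):
    """Replace tokens that have a known semantically related schema term."""
    return [rewrites.get(token, token) for token in tokens]


def segment(tokens, phrases=PHRASES):
    """Greedily merge adjacent token pairs that form a known key phrase."""
    out, i = [], 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if pair in phrases:
            out.append(pair)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def rewrite_query(query):
    """Run the three steps in order and return the segmented query terms."""
    tokens = query.lower().split()
    return segment(rewrite_semantic(rewrite_syntactic(tokens)))


if __name__ == "__main__":
    print(rewrite_query("single from freddy mercury que"))
```

This sketch returns a single rewrite; as noted above, a real system has to generate many candidates and filter out those that are not valid in the relational setting.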
Relational Query Rewriting (Graphinder)
Results Aggregation (Wolfram Alpha)
Factual Search/Question Answering (Google)
Beyond Web Search
Beyond Web search: mobile interaction
 Interaction
› Question-answering
› Support for interactive retrieval
› Spoken-language access
› Task completion
 Contextualization
› Personalization
› Geo
› Context (work/home/travel)
• Try getaviate.com
Interactive, conversational voice search
 Parlance EU project
› Complex dialogs within a domain
• Requires complete semantic understanding
 Complete system (mixed license)
› Automated Speech Recognition (ASR)
› Spoken Language Understanding (SLU)
› Interaction Management
› Knowledge Base
› Natural Language Generation (NLG)
› Text-to-Speech (TTS)
 Video
Conclusions
 Semantic Search
› Explicit understanding for queries and documents
through links to external knowledge
• Using methods of Information Extraction or
explicit annotations (markup) in webpages
• Semantic Web as a source of external knowledge
 Increasing level of understanding
› Early focus on entities and their attributes
• Applications in web search: rich results,
entity displays, entity recommendation
› Moving toward modeling intents/actions
› Adding human-like interaction
Q&A
 Peter
› pmika@yahoo-inc.com
› @pmika
› http://www.slideshare.net/pmika/
 Thanh
› tran.du.th@gmail.com
› https://sites.google.com/site/kimducthanh
› http://www.slideshare.net/thanhtran81

Mais conteúdo relacionado

Mais procurados

Semantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsSemantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsPeter Mika
 
Knowledge Integration in Practice
Knowledge Integration in PracticeKnowledge Integration in Practice
Knowledge Integration in PracticePeter Mika
 
What happened to the Semantic Web?
What happened to the Semantic Web?What happened to the Semantic Web?
What happened to the Semantic Web?Peter Mika
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic SearchPaul Wlodarczyk
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialBarbara Starr
 
Multimedia Data Navigation and the Semantic Web (SemTech 2006)
Multimedia Data Navigation and the Semantic Web (SemTech 2006)Multimedia Data Navigation and the Semantic Web (SemTech 2006)
Multimedia Data Navigation and the Semantic Web (SemTech 2006)Bradley Allen
 
Smoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papersSmoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papersBill Slawski
 
Peter Mika's Presentation at SSSW 2011
Peter Mika's Presentation at SSSW 2011Peter Mika's Presentation at SSSW 2011
Peter Mika's Presentation at SSSW 2011sssw2011
 
Reflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationReflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationTrey Grainger
 
From Queries to Answers in the Web
From Queries to Answers in the WebFrom Queries to Answers in the Web
From Queries to Answers in the WebRoi Blanco
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge GraphTrey Grainger
 
Search and social patents for 2012 and beyond
Search and social patents for 2012 and beyondSearch and social patents for 2012 and beyond
Search and social patents for 2012 and beyondBill Slawski
 
Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011sssw2011
 
Mining Web content for Enhanced Search
Mining Web content for Enhanced Search Mining Web content for Enhanced Search
Mining Web content for Enhanced Search Roi Blanco
 
Semantic seo and the evolution of queries
Semantic seo and the evolution of queriesSemantic seo and the evolution of queries
Semantic seo and the evolution of queriesBill Slawski
 
Ranking in Google Since The Advent of The Knowledge Graph
Ranking in Google Since The Advent of The Knowledge GraphRanking in Google Since The Advent of The Knowledge Graph
Ranking in Google Since The Advent of The Knowledge GraphBill Slawski
 
Search Engines After The Semanatic Web
Search Engines After The Semanatic WebSearch Engines After The Semanatic Web
Search Engines After The Semanatic Websamar_slideshare
 
Evolution of Search
Evolution of SearchEvolution of Search
Evolution of SearchBill Slawski
 

Mais procurados (20)

Semantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistantsSemantic search: from document retrieval to virtual assistants
Semantic search: from document retrieval to virtual assistants
 
Knowledge Integration in Practice
Knowledge Integration in PracticeKnowledge Integration in Practice
Knowledge Integration in Practice
 
Semantic search
Semantic searchSemantic search
Semantic search
 
What happened to the Semantic Web?
What happened to the Semantic Web?What happened to the Semantic Web?
What happened to the Semantic Web?
 
Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorial
 
Multimedia Data Navigation and the Semantic Web (SemTech 2006)
Multimedia Data Navigation and the Semantic Web (SemTech 2006)Multimedia Data Navigation and the Semantic Web (SemTech 2006)
Multimedia Data Navigation and the Semantic Web (SemTech 2006)
 
Smoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papersSmoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papers
 
Peter Mika's Presentation at SSSW 2011
Peter Mika's Presentation at SSSW 2011Peter Mika's Presentation at SSSW 2011
Peter Mika's Presentation at SSSW 2011
 
Reflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationReflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital Transformation
 
Semantic search
Semantic searchSemantic search
Semantic search
 
From Queries to Answers in the Web
From Queries to Answers in the WebFrom Queries to Answers in the Web
From Queries to Answers in the Web
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Search and social patents for 2012 and beyond
Search and social patents for 2012 and beyondSearch and social patents for 2012 and beyond
Search and social patents for 2012 and beyond
 
Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011Jim Hendler's Presentation at SSSW 2011
Jim Hendler's Presentation at SSSW 2011
 
Mining Web content for Enhanced Search
Mining Web content for Enhanced Search Mining Web content for Enhanced Search
Mining Web content for Enhanced Search
 
Semantic seo and the evolution of queries
Semantic seo and the evolution of queriesSemantic seo and the evolution of queries
Semantic seo and the evolution of queries
 
Ranking in Google Since The Advent of The Knowledge Graph
Ranking in Google Since The Advent of The Knowledge GraphRanking in Google Since The Advent of The Knowledge Graph
Ranking in Google Since The Advent of The Knowledge Graph
 
Search Engines After The Semanatic Web
Search Engines After The Semanatic WebSearch Engines After The Semanatic Web
Search Engines After The Semanatic Web
 
Evolution of Search
Evolution of SearchEvolution of Search
Evolution of Search
 

Semelhante a Semantic Search on the Rise

Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015Peter Mika
 
Semantic mark-up with schema.org: helping search engines understand the Web
Semantic mark-up with schema.org: helping search engines understand the WebSemantic mark-up with schema.org: helping search engines understand the Web
Semantic mark-up with schema.org: helping search engines understand the WebPeter Mika
 
Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015Peter Mika
 
(Keynote) Peter Mika - “Making the Web Searchable”
(Keynote) Peter Mika - “Making the Web Searchable”(Keynote) Peter Mika - “Making the Web Searchable”
(Keynote) Peter Mika - “Making the Web Searchable”icwe2015
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopPeter Skomoroch
 
The evolution of Search spscinci
The evolution of Search spscinciThe evolution of Search spscinci
The evolution of Search spscinciJohnny Lopez
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementTrey Grainger
 
O'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustO'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustPeter Skomoroch
 
Social Networks and the Semantic Web: a retrospective of the past 10 years
Social Networks and the Semantic Web: a retrospective of the past 10 yearsSocial Networks and the Semantic Web: a retrospective of the past 10 years
Social Networks and the Semantic Web: a retrospective of the past 10 yearsPeter Mika
 
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Connotate
 
Search Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint Summit
Search Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint SummitSearch Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint Summit
Search Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint SummitJoel Oleson
 
Large-Scale Semantic Search
Large-Scale Semantic SearchLarge-Scale Semantic Search
Large-Scale Semantic SearchRoi Blanco
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelTrey Grainger
 
Advanced google searching (1)
Advanced google searching (1)Advanced google searching (1)
Advanced google searching (1)Brenda Crawford
 
Planning Your Enterprise Search Strategy
Planning Your Enterprise Search StrategyPlanning Your Enterprise Search Strategy
Planning Your Enterprise Search StrategyInnoTech
 
Evaluating search engines
Evaluating search enginesEvaluating search engines
Evaluating search enginesPhil Bradley
 
Focused Crawling for Structured Data
Focused Crawling for Structured DataFocused Crawling for Structured Data
Focused Crawling for Structured DataRobert Meusel
 
Search Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your CustomersSearch Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your Customersrichwig
 

Semelhante a Semantic Search on the Rise (20)

Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015Semantic Search keynote at CORIA 2015
Semantic Search keynote at CORIA 2015
 
Semantic mark-up with schema.org: helping search engines understand the Web
Semantic mark-up with schema.org: helping search engines understand the WebSemantic mark-up with schema.org: helping search engines understand the Web
Semantic mark-up with schema.org: helping search engines understand the Web
 
Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015Making the Web Searchable - Keynote ICWE 2015
Making the Web Searchable - Keynote ICWE 2015
 
(Keynote) Peter Mika - “Making the Web Searchable”
(Keynote) Peter Mika - “Making the Web Searchable”(Keynote) Peter Mika - “Making the Web Searchable”
(Keynote) Peter Mika - “Making the Web Searchable”
 
Rapid Data Exploration With Hadoop
Rapid Data Exploration With HadoopRapid Data Exploration With Hadoop
Rapid Data Exploration With Hadoop
 
The evolution of Search spscinci
The evolution of Search spscinciThe evolution of Search spscinci
The evolution of Search spscinci
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge Management
 
O'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data ExhaustO'Reilly Strata: Distilling Data Exhaust
O'Reilly Strata: Distilling Data Exhaust
 
Social Networks and the Semantic Web: a retrospective of the past 10 years
Social Networks and the Semantic Web: a retrospective of the past 10 yearsSocial Networks and the Semantic Web: a retrospective of the past 10 years
Social Networks and the Semantic Web: a retrospective of the past 10 years
 
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
Employees, Business Partners and Bad Guys: What Web Data Reveals About Person...
 
Search Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint Summit
Search Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint SummitSearch Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint Summit
Search Strategy for Enterprise SharePoint 2013 - Vancouver SharePoint Summit
 
Large-Scale Semantic Search
Large-Scale Semantic SearchLarge-Scale Semantic Search
Large-Scale Semantic Search
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis Panel
 
Advanced google searching (1)
Advanced google searching (1)Advanced google searching (1)
Advanced google searching (1)
 
Planning Your Enterprise Search Strategy
Planning Your Enterprise Search StrategyPlanning Your Enterprise Search Strategy
Planning Your Enterprise Search Strategy
 
Evaluating search engines
Evaluating search enginesEvaluating search engines
Evaluating search engines
 
Focused Crawling for Structured Data
Focused Crawling for Structured DataFocused Crawling for Structured Data
Focused Crawling for Structured Data
 
SharePoint Fest Chicago Presentation
SharePoint Fest Chicago PresentationSharePoint Fest Chicago Presentation
SharePoint Fest Chicago Presentation
 
Pratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnectPratical Deep Dive into the Semantic Web - #smconnect
Pratical Deep Dive into the Semantic Web - #smconnect
 
Search Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your CustomersSearch Analytics: Conversations with Your Customers
Search Analytics: Conversations with Your Customers
 

Mais de Peter Mika

Hackathon s pb
Hackathon s pbHackathon s pb
Hackathon s pbPeter Mika
 
Investigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log AnalysisInvestigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log AnalysisPeter Mika
 
Making the Web searchable
Making the Web searchableMaking the Web searchable
Making the Web searchablePeter Mika
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic WebPeter Mika
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011Peter Mika
 
Semantic Search Summer School2009
Semantic Search Summer School2009Semantic Search Summer School2009
Semantic Search Summer School2009Peter Mika
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyPeter Mika
 
Semantic Web Austin Yahoo
Semantic Web Austin YahooSemantic Web Austin Yahoo
Semantic Web Austin YahooPeter Mika
 

Mais de Peter Mika (8)

Hackathon s pb
Hackathon s pbHackathon s pb
Hackathon s pb
 
Investigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log AnalysisInvestigating the Semantic Gap through Query Log Analysis
Investigating the Semantic Gap through Query Log Analysis
 
Making the Web searchable
Making the Web searchableMaking the Web searchable
Making the Web searchable
 
Publishing data on the Semantic Web
Publishing data on the Semantic WebPublishing data on the Semantic Web
Publishing data on the Semantic Web
 
Hack U Barcelona 2011
Hack U Barcelona 2011Hack U Barcelona 2011
Hack U Barcelona 2011
 
Semantic Search Summer School2009
Semantic Search Summer School2009Semantic Search Summer School2009
Semantic Search Summer School2009
 
Year of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkeyYear of the Monkey: Lessons from the first year of SearchMonkey
Year of the Monkey: Lessons from the first year of SearchMonkey
 
Semantic Web Austin Yahoo
Semantic Web Austin YahooSemantic Web Austin Yahoo
Semantic Web Austin Yahoo
 

Semantic Search on the Rise

  • 1. Semantic Search on the Rise Peter Mika | Yahoo Labs Tran Duc Thanh | LyfeLine Corporation
  • 2. About the speakers  Peter Mika › Senior Research Scientist › Head of Semantic Search group at Yahoo! Labs › Expertise: Semantic Web, Information Retrieval, Natural Language Processing  Tran Duc Thanh › CTO of LyfeLine Corporation, Tech Startup, Santa Clara › Assistant Professor San Jose State University (on leave), › Served as Assistant Professor for Stanford University and Karlsruhe Institute of Technology › Expertise: Semantic Search, Semantic / Linked Data Management
  • 3. Agenda  What is Semantic Search?  Semantic Search technology  Applications  Beyond Web Search  Q&A
  • 4. What is Semantic Search?
  • 5. Why Semantic Search? Part I.  Improvements in IR are harder and harder to come by › Basic relevance models are well established › Machine learning using hundreds of features › Heavy investment in computational power, e.g. real-time indexing and instant search  Remaining challenges are not computational, but in modeling user cognition › Modeling the relationships between: • the query • the content • the world at large
  • 6.  Semantic gap › Ambiguity • jaguar • paris hilton › Secondary meaning • george bush (and I mean the beer brewer in Arizona) › Subjectivity • reliable digital camera • paris hilton sexy › Imprecise or overly precise searches • jim hendler  Complex needs › Missing information • brad pitt zombie • florida man with 115 guns • 35 year old computer scientist living in barcelona › Category queries • countries in africa • barcelona nightlife › Relational, transactional or computational queries • Friends of peter who knows VCs in the Bay Area • 120 dollars in euros • digital camera under 300 dollars • world temperature in 2020 Poorly solved information needs remain Are there even true keyword queries? Users may have stopped asking them
  • 8. What it’s like to be a machine? Roi Blanco
  • 9. What it’s like to be a machine? ↵⏏☐ģ ✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓ ţğ★✜ ✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ ≠=⅚©§★✓♪ΒΓΕ℠ ✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ ⏎⌥°¶§ΥΦΦΦ✗✕☐
  • 10. Why Semantic Search? Part II.  The Semantic Web is now a reality › Emerging agreements around schemas • Facebook’s Open Graph Protocol (OGP) • Schema.org › Large amounts of data published in RDF • As Linked Data • Inside HTML pages • Inside email text messages › Private Knowledge Graphs inside corporations  Semantic data exploited by search engines › Better document presentation and ranking › Advanced search functionality
  • 11. Metadata in HTML: schema.org  Agreement on a shared set of schemas for common types of web content › Bing, Google, and Yahoo! as initial founders (June, 2011), joined by Yandex later › Similar in intent to sitemaps.org • Use a single format to communicate the same information to all three search engines <div vocab="http://schema.org/" typeof="Movie"> <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1> <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span> Director: <div property="director" typeof="Person"> <span property="name">Rob Marshall</span> </div> </div>
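To make the markup on slide 11 concrete, here is a minimal sketch of how a consumer might read such annotations. It is not how any search engine actually processes schema.org markup: the class `RdfaLiteSketch`, the function `extract_properties`, and the `VOID_TAGS` set are hypothetical names introduced for illustration, and the parser only handles text-valued `property` attributes nested under a `typeof` element (no `content`, `href`, or `resource` values, and it does not record that `director` links the Movie to the Person).

```python
from html.parser import HTMLParser

# Elements without a closing tag; skipped so the open-element stack stays balanced.
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}


class RdfaLiteSketch(HTMLParser):
    """Collects (type, property, text) tuples from simple RDFa Lite markup."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # one (typeof, property) pair per open element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        self.open_elements.append((attrs.get("typeof"), attrs.get("property")))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.open_elements:
            self.open_elements.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if not text or not self.open_elements:
            return
        prop = self.open_elements[-1][1]
        if prop is None:
            return  # text that is not the value of any property
        # The innermost enclosing typeof gives the type of the described item.
        item_type = next((t for t, _ in reversed(self.open_elements) if t), None)
        self.triples.append((item_type, prop, text))


def extract_properties(html):
    parser = RdfaLiteSketch()
    parser.feed(html)
    parser.close()
    return parser.triples


SAMPLE = """
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  Director:
  <div property="director" typeof="Person">
    <span property="name">Rob Marshall</span>
  </div>
</div>
"""

for row in extract_properties(SAMPLE):
    print(row)
```

A production consumer would more likely rely on a full RDFa processor; the sketch only shows that the type and property names are recoverable from the HTML itself.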
  • 12. Substantial adoption of schema.org markup  Over 15% of all pages now have schema.org markup  Over 5 million sites, over 25 billion entity references  In other words: same order of magnitude as the web › Source: R.V. Guha: Light at the end of the tunnel, ISWC 2013 keynote  See also › P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012 • Based on Bing US corpus • 31% of webpages, 5% of domains contain some metadata (including Facebook’s OGP) › WebDataCommons • Based on CommonCrawl Nov 2013 • 26% of webpages, 14% of domains contain some metadata (including Facebook’s OGP)
  • 14. Semantic Search  Def. Semantic Search is any retrieval method where › User intent and resources are represented in a semantic model • A set of concepts or topics that generalize over tokens/phrases • Additional structure such as a hierarchy among concepts, relationships among concepts etc. › Semantic representations of the query and the user intent are exploited in some part of the retrieval process  As a research field › Workshops • ESAIR (2008-2014) at CIKM, Semantic Search (SemSearch) workshop series (2008-2011) at ESWC/WWW, EOS workshop (2010-2011) at SIGIR, JIWES workshop (2012) at SIGIR, Semantic Search Workshop (2011-2014) at VLDB › Special Issues of journals › Surveys • Christos L. Koumenides, Nigel R. Shadbolt: Ranking methods for entity-oriented semantic web search. JASIST 65(6): 1091-1106 (2014)
  • 15. Semantic models: implicit vs. explicit  Implicit/internal semantics › Models of text extracted from a corpus of queries, documents or interaction logs • Query reformulation, term dependency models, translation models, topic models, latent space models, learning to match (PLS) › See • Hang Li and Jun Xu: Semantic Matching in Search. Foundations and Trends in Information Retrieval Vol 7 Issue 5, 2013, pp 343-469  Explicit/external semantics › Explicit linguistic or ontological structures extracted from text and linked to external knowledge › Obtained using IE techniques or acquired from Semantic Web markup
  • 16. Entity Linking vs. Entity Retrieval  Entity Linking › Recognizing entities that are explicitly mentioned in queries and linking them to a KB  Entity Retrieval › Ranking entities in a KB, given a query › Result may not be explicitly mentioned in the query
  • 17. What it is like to be a machine? ↵⏏☐ģ ✜Θ♬♬ţğ√∞§®ÇĤĪ✜★♬☐✓✓ ţğ★✜ ✪✚✜ΔΤΟŨŸÏĞÊϖυτρ℠≠⅛⌫ ≠=⅚©§★✓♪ΒΓΕ℠ ✖Γ♫⅜±⏎↵⏏☐ģğğğμλκσςτ ⏎⌥°¶§ΥΦΦΦ✗✕☐
  • 20. The role of entities in queries  Entities play an important role › ~70% of queries contain a named entity (entity mention queries) and ~50% of queries have an entity focus (entity seeking queries) • brad pitt attacked by fans › ~10% of queries are looking for a class of entities • brad pitt movies › See • Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW 2010: 771-780 • Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects: actions for entity-centric search. WWW 2012: 589-598
  • 21. Entity linking in queries  Common structure to entity mention queries: query = <entity> + <intent> › Intent is typically an additional word or phrase to • Disambiguate, e.g. brad pitt actor • Specify action or aspect e.g. brad pitt net worth, brad pitt download  Entity linking in queries › Tutorial: Entity Linking and Retrieval by Edgar Meij, Krisztián Balog and Daan Odijk › Microsoft Entity Linking challenge › Yahoo WebScope dataset L24 - Yahoo Search Query Log To Entities, version 1.0  Session-level analysis › Recognize entities and intents at the session level › Laura Hollink, Peter Mika, Roi Blanco: Web usage mining with semantic analysis. WWW 2013: 561-570
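The query = <entity> + <intent> structure on slide 21 can be illustrated with a small sketch. This is an assumption-laden toy, not one of the systems referenced on the slide: `TOY_KB`, its `kb:` identifiers, and `split_entity_intent` are hypothetical, and the entity is found by exact longest-span lookup rather than by a trained linker.

```python
# Hypothetical knowledge base: surface form -> entity identifier.
TOY_KB = {
    "brad pitt": "kb:Brad_Pitt",
    "paris hilton": "kb:Paris_Hilton",
    "paris": "kb:Paris",
}


def split_entity_intent(query, kb=TOY_KB):
    """Return (entity_id, intent) for the longest KB mention in the query.

    The intent is whatever remains of the query once the mention is removed.
    Returns (None, query) when no span matches the KB.
    """
    tokens = query.lower().split()
    # Try longer spans first so "paris hilton" wins over "paris".
    for length in range(len(tokens), 0, -1):
        for start in range(len(tokens) - length + 1):
            span = " ".join(tokens[start:start + length])
            if span in kb:
                rest = tokens[:start] + tokens[start + length:]
                return kb[span], " ".join(rest)
    return None, " ".join(tokens)


print(split_entity_intent("brad pitt net worth"))
print(split_entity_intent("paris hilton"))
```

Preferring the longest span is one simple way to handle nested mentions; it does nothing for genuinely ambiguous names, which is where the disambiguating intent words on the slide come in.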
  • 22. Entity Retrieval  Keyword search over entity graphs › see Pound et al. WWW 2010 for a definition › No common benchmark until 2010  SemSearch Challenge 2010/2011 • 50 entity-mention queries selected from the Search Query Tiny Sample v1.0 dataset (Yahoo! Webscope) • Billion Triples Challenge 2009 data set • Evaluation using Mechanical Turk › See report: • Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson, Thanh Tran: Repeatable and reliable semantic search evaluation. J. Web Sem. 21: 14-29 (2013)
  • 23. Question Answering  Question Answering over Linked Data competition › 2011-2014 › Data • DBpedia and MusicBrainz in RDF › Queries • Full natural language questions of different forms, written by the organizers • Multi-lingual • Give me all actors starring in Batman Begins › Results are defined by an equivalent SPARQL query • Systems are free to return list of results or a SPARQL query
  • 25. Semantic Search for…  Improving ad-hoc document retrieval › Query composition › Result presentation › Matching › Ranking  Providing new search functionality › Entity retrieval › Personalization › Related entity recommendation › Complex question-answering, relational search, computational search… › Task completion
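As an illustration of "results are defined by an equivalent SPARQL query" on slide 23, the sketch below pairs the example question with a SPARQL query and evaluates the same triple pattern over a tiny in-memory graph. Everything here is assumed for illustration: the `dbr:`/`dbo:` names follow DBpedia naming conventions but were not checked against the competition data, and `TRIPLES`, `match`, and `actors_starring_in` are hypothetical helpers, not part of any benchmark tooling.

```python
# Toy graph with DBpedia-style names (assumed, not taken from the benchmark).
TRIPLES = [
    ("dbr:Batman_Begins", "dbo:starring", "dbr:Christian_Bale"),
    ("dbr:Batman_Begins", "dbo:starring", "dbr:Michael_Caine"),
    ("dbr:Batman_Begins", "dbo:director", "dbr:Christopher_Nolan"),
    ("dbr:The_Prestige", "dbo:starring", "dbr:Hugh_Jackman"),
]

# A SPARQL query that could define the answer set for
# "Give me all actors starring in Batman Begins".
SPARQL_EQUIVALENT = """
SELECT DISTINCT ?actor WHERE {
  dbr:Batman_Begins dbo:starring ?actor .
}
"""


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def actors_starring_in(film):
    """Evaluate the single triple pattern of SPARQL_EQUIVALENT on TRIPLES."""
    return sorted(o for _, _, o in match(TRIPLES, s=film, p="dbo:starring"))


print(actors_starring_in("dbr:Batman_Begins"))
```

A system that returns the list and a system that returns the query can then be scored against the same gold answer set.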
  • 26. Exploiting Semantic Web markup (Yahoo internal prototype, 2007) Personal and private homepage of the same person (clear from the snippet but it could be also automatically de-duplicated) Conferences he plans to attend and his vacations from homepage plus bio events from LinkedIn Geolocation
  • 27. Search snippets using Semantic Web markup  Summarization of HTML is a hard task • Template detection • Selecting relevant snippets • Composing readable text › Efficiency constraints  Yahoo SearchMonkey (2008) › Enhanced results using structured data from the page • Key/value pairs • Deep links • Image or Video
  • 28. Effectiveness of enhanced results (Yahoo)  Explicit user feedback › Side-by-side editorial evaluation (A/B testing) • Editors are shown a traditional search result and enhanced result for the same page • Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)  Implicit user feedback › Click-through rate analysis • Long dwell time limit of 100s (Ciemiewicz et al. 2010) • 15% increase in ‘good’ clicks › User interaction model • Enhanced results lead users to relevant documents – even though less likely to be clicked than textual results • Enhanced results effectively reduce bad clicks!  See › Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011: 725-734
  • 29. Enhanced results at other search providers  Google announces Rich Snippets - June, 2009 › Faceted search for recipes - Feb, 2011  Bing tiles – Feb, 2011  Facebook’s Like button and the Open Graph Protocol (2010) › Shows up in profiles and news feed › Site owners can later reach users who have liked an object
  • 30. Moving beyond entity markup  We would like to help our users in task completion › But we have trained our users to talk in nouns • Retrieval performance decreases by adding verbs to queries › Markup for actions/intents could potentially help  Modeling actions › Understand what actions can be taken on a page › Help users in mapping their query to potential actions › Applications in web search, email etc. Schema.org v1.2 including Actions vocabulary published April 16, 2014
  • 31. Applications of Actions markup Email (Gmail) SERP (Yandex)
  • 32. Personalized content and native ads (Yahoo)  User profiling based on entities recognized in the content consumed  News and ads personalized to the user
  • 33.  Entity retrieval › Which entity does a keyword query refer to, if any?  Related entities › Which entity would the user visit next? • Roi Blanco, B. Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013 Entity displays in web search (Bing/Google/Yahoo)
  • 35. Relational Search (Facebook Graph Search). Query: “my friends, who is member of queen”. Parse tree nodes: {band} queen → [id:Queen1] Queen1; [member-of-v] is member of → member(); [member-vp] is member of [id:1] → member(x,Queen1); [who] who; [user-filter] who is member of [id:1] → member(x,Queen1); [user-head] my friends → friends(x,me); [start] my friends, who is member of [id:Queen1] → friends(x,me), member(x,Queen1). Grammar: set of production rules, capturing all possible connections, i.e. the search space of all parse trees: [start] → [users]; [users] → my friends : friends(x, me); […] → is member of [bands] : member(x, $1); [bands] → {band} : $1; … Grammar-based Query Translation: which combination of production rules results in a parse tree that connects the recognized entities and relationships?
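The grammar-based translation on slide 35 can be sketched for the one example query. This is a hand-rolled toy that hard-codes three production rules and is in no way Facebook's implementation: `BANDS`, `translate`, and the rule comments are hypothetical names chosen to mirror the slide's notation.

```python
# Hypothetical entity index: band surface form -> identifier.
BANDS = {"queen": "Queen1"}


def translate(query):
    """Translate a query into a conjunction of predicates, or None.

    Rules mirrored from the slide's notation (simplified):
      [users]       -> my friends           : friends(x,me)
      [user-filter] -> who [member-vp]
      [member-vp]   -> is member of [bands] : member(x,$1)
      [bands]       -> {band}               : $1
    """
    text = " ".join(query.lower().replace(",", " ").split())
    head = "my friends"
    if not text.startswith(head):
        return None
    predicates = ["friends(x,me)"]
    rest = text[len(head):].strip()
    if rest:
        # An optional filter must follow the full "who is member of <band>" shape.
        prefix = "who is member of "
        if not rest.startswith(prefix):
            return None
        band = rest[len(prefix):].strip()
        if band not in BANDS:
            return None
        predicates.append(f"member(x,{BANDS[band]})")
    return ", ".join(predicates)


print(translate("my friends, who is member of queen"))
```

A real grammar would explore many competing parse trees and rank them; the sketch accepts exactly one derivation so that the mapping from rules to predicates stays visible.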
  • 36. Semantic Search (Graphinder). Sem. Auto-completion: - Entity + relationships - Multi-source - Domain-independent - Low manual effort. Query Translation. [Figure labels: Freddie Mercury, Brian May, Queen, Queen Elizabeth 1, Liar, single, Person, Artist, Single, writer]
  • 37. Relational Query Rewriting (Graphinder). Input: single from freddy mercury que. Keyword Interpretation (Data Index, Schema Index): - Imprecise / fuzzy matching - Match every keyword. Token rewriting via syntactic distance: 1) single from freddie mercury queen … Token rewriting via semantic distance: 1) single writer freddie mercury queen … Query segmentation: 1) single writer “freddie mercury” queen … Keyword / Key Phrase Interpretation (Data Index, Schema Index): - Precise matching - Match keyword and key phrases. Result Retrieval & Ranking. Benefits: - Higher selectivity of query terms (quality) - Reduced number of query terms (efficiency) - Better search experience… Challenges: many rewrite candidates, some are semantically not “valid” in the relational setting, e.g. single (marital status) writer “freddie mercury” queen (the queen of UK). [Figure labels: Freddie Mercury, Queen, Queen Elizabeth 1, single, Single, writer]
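The "token rewriting via syntactic distance" step on slide 37 can be approximated with a short sketch. It swaps in the similarity ratio from Python's standard `difflib` module in place of whatever distance Graphinder actually uses, and `INDEX_VOCABULARY`, `rewrite_tokens`, and the 0.7 cutoff are assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary drawn from the data and schema indexes.
INDEX_VOCABULARY = ["freddie", "mercury", "queen", "single", "writer"]


def rewrite_tokens(query, vocabulary=INDEX_VOCABULARY, cutoff=0.7):
    """Replace each token by its closest vocabulary term, if one is similar enough.

    Tokens with no sufficiently similar term (such as "from") are kept as typed;
    the slide handles those in a later, semantic-distance step.
    """
    rewritten = []
    for token in query.lower().split():
        candidates = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        rewritten.append(candidates[0] if candidates else token)
    return " ".join(rewritten)


print(rewrite_tokens("single from freddy mercury que"))
```

Keeping only the single best candidate hides the slide's main challenge: in practice each token has many rewrite candidates, and their combinations have to be ranked and checked for validity in the relational setting.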
  • 41. Beyond Web search: mobile interaction  Interaction › Question-answering › Support for interactive retrieval › Spoken-language access › Task completion  Contextualization › Personalization › Geo › Context (work/home/travel) • Try getaviate.com
  • 42. Interactive, conversational voice search  Parlance EU project › Complex dialogs within a domain • Requires complete semantic understanding  Complete system (mixed license) › Automated Speech Recognition (ASR) › Spoken Language Understanding (SLU) › Interaction Management › Knowledge Base › Natural Language Generation (NLG) › Text-to-Speech (TTS)  Video
  • 43. Conclusions  Semantic Search › Explicit understanding for queries and documents through links to external knowledge • Using methods of Information Extraction or explicit annotations (markup) in webpages • Semantic Web as a source of external knowledge  Increasing level of understanding › Early focus on entities and their attributes • Applications in web search: rich results, entity displays, entity recommendation › Moving toward modeling intents/actions › Adding human-like interaction
  • 44. Q&A  Peter › pmika@yahoo-inc.com › @pmika › http://www.slideshare.net/pmika/  Thanh › tran.du.th@gmail.com › https://sites.google.com/site/kimducthanh › http://www.slideshare.net/thanhtran81