Semantic Search on the Rise
1. Semantic Search on the Rise
Peter Mika | Yahoo Labs
Tran Duc Thanh | LyfeLine Corporation
2. About the speakers
Peter Mika
› Senior Research Scientist
› Head of Semantic Search group at Yahoo! Labs
› Expertise: Semantic Web, Information Retrieval,
Natural Language Processing
Tran Duc Thanh
› CTO of LyfeLine Corporation, Tech Startup, Santa Clara
› Assistant Professor, San Jose State University (on leave)
› Served as Assistant Professor at
Stanford University and Karlsruhe Institute of Technology
› Expertise: Semantic Search, Semantic / Linked Data Management
3. Agenda
What is Semantic Search?
Semantic Search technology
Applications
Beyond Web Search
Q&A
5. Why Semantic Search? Part I.
Improvements in IR are harder and harder to come by
› Basic relevance models are well established
› Machine learning using hundreds of features
› Heavy investment in computational power, e.g. real-time indexing and instant search
Remaining challenges are not computational, but in modeling user
cognition
› Modeling the relationships between:
• the query
• the content
• the world at large
6. Semantic gap
› Ambiguity
• jaguar
• paris hilton
› Secondary meaning
• george bush (and I mean the beer brewer
in Arizona)
› Subjectivity
• reliable digital camera
• paris hilton sexy
› Imprecise or overly precise searches
• jim hendler
Complex needs
› Missing information
• brad pitt zombie
• florida man with 115 guns
• 35 year old computer scientist living in barcelona
› Category queries
• countries in africa
• barcelona nightlife
› Relational, transactional or computational
queries
• Friends of peter who knows VCs in the Bay Area
• 120 dollars in euros
• digital camera under 300 dollars
• world temperature in 2020
Poorly solved information needs remain
Are there even true keyword queries? Users may have stopped asking them.
10. Why Semantic Search? Part II.
The Semantic Web is now a reality
› Emerging agreements around schemas
• Facebook’s Open Graph Protocol (OGP)
• Schema.org
› Large amounts of data published in RDF
• As Linked Data
• Inside HTML pages
• Inside email text messages
› Private Knowledge Graphs inside corporations
Semantic data exploited by search engines
› Better document presentation and ranking
› Advanced search functionality
11. Metadata in HTML: schema.org
Agreement on a shared set of schemas for common types of web
content
› Bing, Google, and Yahoo! as initial founders (June, 2011), joined by Yandex later
› Similar in intent to sitemaps.org
• Use a single format to communicate the same information to all three search engines
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to
    find the elusive fountain of youth, only to discover that Blackbeard and
    his daughter are after it too.</span>
  Director: <div property="director" typeof="Person">
    <span property="name">Rob Marshall</span>
  </div>
</div>
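A minimal sketch of how a consumer might read markup like the above, using only Python's standard-library html.parser. The class name RDFaLiteExtractor and the flat (type, property, text) output are assumptions made for illustration. A real RDFa processor handles many more cases (void elements, href/resource attributes, nested vocabularies), so this is not a description of how any search engine parses pages.

```python
from html.parser import HTMLParser

# Shortened copy of the slide's example markup.
MARKUP = """
<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  Director: <div property="director" typeof="Person">
    <span property="name">Rob Marshall</span>
  </div>
</div>
"""

class RDFaLiteExtractor(HTMLParser):
    """Collects (enclosing type, property, text) for literal properties."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (typeof, property) for every open element
        self.facts = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        self.stack.append((a.get("typeof"), a.get("property")))

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or not self.stack:
            return
        typeof, prop = self.stack[-1]
        # Only elements that carry a property but do not open a new typed
        # item contribute a literal value.
        if prop is None or typeof is not None:
            return
        enclosing = next((t for t, _ in reversed(self.stack) if t), None)
        self.facts.append((enclosing, prop, text))

extractor = RDFaLiteExtractor()
extractor.feed(MARKUP)
facts = extractor.facts
```

On this input the extractor yields one name for the Movie item and one name for the nested Person item; the plain text "Director:" is ignored because its element carries no property attribute.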
12. Substantial adoption of schema.org markup
Over 15% of all pages now have schema.org markup
Over 5 million sites, over 25 billion entity references
In other words: same order of magnitude as the web
› Source: R.V. Guha: Light at the end of the tunnel, ISWC 2013 keynote
See also
› P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012
• Based on Bing US corpus
• 31% of webpages, 5% of domains contain some metadata (including Facebook’s OGP)
› WebDataCommons
• Based on CommonCrawl Nov 2013
• 26% of webpages, 14% of domains contain some metadata (including Facebook’s OGP)
14. Def. Semantic Search is any
retrieval method where
› User intent and resources are
represented in a semantic model
• A set of concepts or topics that generalize
over tokens/phrases
• Additional structure such as a hierarchy
among concepts, relationships among
concepts etc.
› Semantic representations of the query
and the user intent are exploited in
some part of the retrieval process
As a research field
› Workshops
• ESAIR (2008-2014) at CIKM, Semantic
Search (SemSearch) workshop series
(2008-2011) at ESWC/WWW, EOS
workshop (2010-2011) at SIGIR, JIWES
workshop (2012) at SIGIR, Semantic
Search Workshop (2011-2014) at VLDB
› Special Issues of journals
› Surveys
• Christos L. Koumenides, Nigel R. Shadbolt: Ranking methods for
entity-oriented semantic web search.
JASIST 65(6): 1091-1106 (2014)
Semantic Search
15. Semantic models: implicit vs. explicit
Implicit/internal semantics
› Models of text extracted from a corpus of queries, documents or interaction logs
• Query reformulation, term dependency models, translation models, topic models, latent space
models, learning to match (PLS)
› See
• Hang Li and Jun Xu: Semantic Matching in Search. Foundations and Trends in Information
Retrieval Vol 7 Issue 5, 2013, pp 343-469
Explicit/external semantics
› Explicit linguistic or ontological structures extracted from text and linked to external
knowledge
› Obtained using IE techniques or acquired from Semantic Web markup
16. Entity Linking vs. Entity Retrieval
Entity Linking
› Recognizing entities that are explicitly mentioned in queries and linking them to a KB
Entity Retrieval
› Ranking entities in a KB, given a query
› Result may not be explicitly mentioned in the query
20. The role of entities in queries
Entities play an important role
› ~70% of queries contain a named entity (entity mention queries) and
~50% of queries have an entity focus (entity seeking queries)
• brad pitt attacked by fans
› ~10% of queries are looking for a class of entities
• brad pitt movies
› See
• Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW
2010: 771-780
• Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects:
actions for entity-centric search. WWW 2012: 589-598
21. Entity linking in queries
Common structure to entity mention queries:
query = <entity> + <intent>
› Intent is typically an additional word or phrase to
• Disambiguate, e.g. brad pitt actor
• Specify action or aspect e.g. brad pitt net worth, brad pitt download
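The query = <entity> + <intent> structure can be sketched as a longest-match lookup against an alias table. The ALIASES table, the kb: identifiers and the split_query function are hypothetical, invented for this example; production entity linkers also score and disambiguate candidates, which this sketch does not attempt.

```python
# Hypothetical alias table: surface form -> KB identifier (illustrative only).
ALIASES = {
    "brad pitt": "kb:Brad_Pitt",
    "paris hilton": "kb:Paris_Hilton",
    "paris": "kb:Paris",
}

def split_query(query):
    """Return (entity_id, intent) using the longest alias found in the query."""
    tokens = query.lower().split()
    best = None  # (start, end) of the longest matching token span
    for start in range(len(tokens)):
        for end in range(len(tokens), start, -1):
            span = " ".join(tokens[start:end])
            if span in ALIASES and (best is None or end - start > best[1] - best[0]):
                best = (start, end)
    if best is None:
        return None, " ".join(tokens)
    start, end = best
    entity = ALIASES[" ".join(tokens[start:end])]
    intent = " ".join(tokens[:start] + tokens[end:])
    return entity, intent
```

With this table, "brad pitt net worth" splits into the entity kb:Brad_Pitt and the intent "net worth", while "paris hilton sexy" prefers the longer alias "paris hilton" over "paris".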
Entity linking in queries
› Tutorial: Entity Linking and Retrieval by Edgar Meij, Krisztián Balog and Daan Odijk
› Microsoft Entity Linking challenge
› Yahoo WebScope dataset L24 - Yahoo Search Query Log To Entities, version 1.0
Session-level analysis
› Recognize entities and intents at the session level
› Laura Hollink, Peter Mika, Roi Blanco: Web usage mining with semantic analysis. WWW 2013: 561-570
22. Entity Retrieval
Keyword search over entity graphs
› see Pound et al. WWW 2010 for a definition
› No common benchmark until 2010
SemSearch Challenge 2010/2011
• 50 entity-mention queries selected from the Search Query Tiny Sample v1.0 dataset (Yahoo!
Webscope)
• Billion Triples Challenge 2009 data set
• Evaluation using Mechanical Turk
› See report:
• Roi Blanco, Harry Halpin, Daniel M. Herzig, Peter Mika, Jeffrey Pound, Henry S. Thompson,
Thanh Tran: Repeatable and reliable semantic search evaluation. J. Web Sem. 21: 14-29 (2013)
23. Question Answering
Question Answering over Linked Data competition
› 2011-2014
› Data
• DBpedia and MusicBrainz in RDF
› Queries
• Full natural language questions of different forms, written by the organizers
• Multi-lingual
• Give me all actors starring in Batman Begins
› Results are defined by an equivalent SPARQL query
• Systems are free to return list of results or a SPARQL query
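As a rough illustration of "results are defined by an equivalent query", the sample question can be reduced to a single triple pattern. The sketch below uses a tiny in-memory triple list instead of a SPARQL endpoint. The TRIPLES data, the shortened identifiers and the match helper are assumptions for this example and do not reproduce the competition's data or tooling.

```python
# Toy triples with shortened, hypothetical identifiers (not real DBpedia URIs).
TRIPLES = [
    ("Batman_Begins", "starring", "Christian_Bale"),
    ("Batman_Begins", "starring", "Michael_Caine"),
    ("Batman_Begins", "director", "Christopher_Nolan"),
    ("The_Prestige", "starring", "Christian_Bale"),
]

def match(pattern, triples=TRIPLES):
    """Return all triples matching a (subject, predicate, object) pattern.

    None in the pattern acts as a variable, as ?x would in SPARQL.
    """
    return [
        triple
        for triple in triples
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]

# "Give me all actors starring in Batman Begins"
# corresponds to the pattern: Batman_Begins starring ?actor
actors = sorted(obj for _, _, obj in match(("Batman_Begins", "starring", None)))
```

A system could return either the contents of actors or the pattern itself, mirroring the choice between a result list and a SPARQL query.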
26. Exploiting Semantic Web markup
(Yahoo internal prototype, 2007)
› Personal and private homepage of the same person (clear from the snippet but it could be also automatically de-duplicated)
› Conferences he plans to attend and his vacations from homepage plus bio events from LinkedIn
› Geolocation
27. Search snippets using Semantic Web markup
Summarization of HTML is a hard task
• Template detection
• Selecting relevant snippets
• Composing readable text
› Efficiency constraints
Yahoo SearchMonkey (2008)
› Enhanced results using structured data from the page
• Key/value pairs
• Deep links
• Image or Video
28. Effectiveness of enhanced results (Yahoo)
Explicit user feedback
› Side-by-side editorial evaluation (A/B testing)
• Editors are shown a traditional search result and enhanced result for the same page
• Users prefer enhanced results in 84% of the cases and traditional results in 3% (N=384)
Implicit user feedback
› Click-through rate analysis
• Long dwell time limit of 100s (Ciemiewicz et al. 2010)
• 15% increase in ‘good’ clicks
› User interaction model
• Enhanced results lead users to relevant documents
– even though less likely to be clicked than textual results
• Enhanced results effectively reduce bad clicks!
See
› Kevin Haas, Peter Mika, Paul Tarjan, Roi Blanco: Enhanced results for web search. SIGIR 2011:
725-734
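The click-through analysis can be sketched as a threshold on dwell time. The 100 s limit comes from the slide. Treating a click as "good" when its dwell time is at least that long is an assumption of this sketch, and the dwell-time lists are made-up toy values, not data from the study.

```python
LONG_DWELL_SECONDS = 100  # long dwell time limit quoted on the slide

def good_click_rate(dwell_times):
    """Share of clicks whose dwell time reaches the long-dwell limit."""
    if not dwell_times:
        return 0.0
    good = sum(1 for seconds in dwell_times if seconds >= LONG_DWELL_SECONDS)
    return good / len(dwell_times)

# Made-up dwell times in seconds, for illustration only.
textual_clicks = [5, 12, 150, 30, 200, 8, 101, 40, 3, 60]
enhanced_clicks = [120, 15, 300, 110, 7, 180, 45, 130, 9, 20]

textual_rate = good_click_rate(textual_clicks)
enhanced_rate = good_click_rate(enhanced_clicks)
```

Comparing the two rates on real logs would give the kind of relative change in good clicks that the slide reports.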
29. Enhanced results at other search providers
Google announces Rich Snippets - June, 2009
› Faceted search for recipes - Feb, 2011
Bing tiles – Feb, 2011
Facebook’s Like button and the Open Graph Protocol (2010)
› Shows up in profiles and news feed
› Site owners can later reach users who have liked an object
30. Moving beyond entity markup
We would like to help our users in task completion
› But we have trained our users to talk in nouns
• Retrieval performance decreases by adding verbs to queries
› Markup for actions/intents could potentially help
Modeling actions
› Understand what actions can be taken on a page
› Help users in mapping their query to potential actions
› Applications in web search, email etc.
Schema.org v1.2, including the Actions vocabulary, published April 16, 2014
32. Personalized content and native ads (Yahoo)
User profiling based on entities recognized in the content consumed
News and ads personalized to the user
33. Entity retrieval
› Which entity does a keyword query
refer to, if any?
Related entities
› Which entity would the user visit next?
• Roi Blanco, B. Barla Cambazoglu, Peter
Mika, Nicolas Torzec:
Entity Recommendations in Web Search.
ISWC 2013
Entity displays in web search
(Bing/Google/Yahoo)
35. “my friends, who is member of queen”
Parse tree nodes (node: matched text; semantics):
› {band}: [id:Queen1]; Queen1
› [member-of-v]: is member of; member()
› [member-vp]: is member of [id:1]; member(x,Queen1)
› [who]: who; -
› [user-filter]: who is member of [id:1]; member(x,Queen1)
› [user-head]: my friends; friends(x,me)
› [start]: my friends, who is member of [id:Queen1]; friends(x,me), member(x,Queen1)
Leaf tokens shown in the figure: queen, member, friends
Grammar: set of production rules, capturing all possible connections, i.e. the search space of all parse trees
› [start] → [users]
› [users] → my friends; friends(x, me)
› […] → is member of [bands]; member(x, $1)
› [bands] → {band}; $1
› …
Grammar-based Query Translation: which combination of production rules results in a parse tree that connects the recognized entities and relationships?
Relational Search (Facebook Graph Search)
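A minimal sketch of the translation step for the example query, assuming a hypothetical band index (BANDS) and hard-coding just two production rules rather than searching a full grammar. It is meant only to show how recognized entities and rule semantics combine into a conjunctive query, not how Facebook Graph Search is implemented.

```python
# Hypothetical entity index: band mention -> identifier.
BANDS = {"queen": "Queen1"}

def translate(query):
    """Translate 'my friends[, who is member of <band>]' into predicates.

    Returns None when the query is not covered by the two rules below.
    """
    tokens = query.lower().replace(",", " ").split()
    predicates = []

    # Rule: [users] -> "my friends"  =>  friends(x,me)
    if tokens[:2] != ["my", "friends"]:
        return None
    predicates.append("friends(x,me)")

    # Rule: [user-filter] -> "who is member of" {band}  =>  member(x,$1)
    rest = tokens[2:]
    if rest:
        if rest[:4] != ["who", "is", "member", "of"] or len(rest) != 5:
            return None
        band_id = BANDS.get(rest[4])
        if band_id is None:
            return None
        predicates.append(f"member(x,{band_id})")

    return ", ".join(predicates)
```

For the slide's query, translate("my friends, who is member of queen") produces friends(x,me), member(x,Queen1), the same semantics attached to the [start] node.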
36. Sem. Auto-completion
- Entity + relationships
- Multi-source
- Domain-independent
- Low manual effort
Example graph (figure): Freddie Mercury, Brian May, Queen, Queen Elizabeth 1, Liar 1971, single
Schema terms (figure): Person, Artist, Single, writer
Query Translation
Semantic Search (Graphinder)
37. Relational Query Rewriting (Graphinder)
Input query: single from freddy mercury que
Keyword Interpretation (against the Data Index and Schema Index)
- Imprecise / fuzzy matching
- Match every keyword
- Candidate matches (figure): Freddie Mercury, Queen, Queen Elizabeth 1, single, Single, writer
Relational Query Rewriting
› Token rewriting via syntactic distance
1) single from freddie mercury queen
…
› Token rewriting via semantic distance
1) single writer freddie mercury queen
…
› Query segmentation
1) single writer “freddie mercury” queen
…
Keyword / Key Phrase Interpretation (against the Data Index and Schema Index):
- Precise matching
- Match keyword and key phrases
- Matches after rewriting (figure): Freddie Mercury, Queen, Single, writer
Result Retrieval & Ranking
Benefits:
- Higher selectivity of query terms (quality)
- Reduced number of query terms (efficiency)
- Better search experience…
Challenges: many rewrite candidates, some are semantically not “valid” in the relational setting, e.g.
single (marital status) writer “freddie mercury” queen (the queen of UK)
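The syntactic-distance rewriting and segmentation steps can be approximated with Python's standard difflib module. The VOCABULARY and KEY_PHRASES lists stand in for the data and schema indexes and are assumptions for this sketch; difflib's similarity ratio is a stand-in for whatever distance Graphinder uses, and the semantic-distance step (from → writer) is not covered here.

```python
import difflib

# Hypothetical stand-ins for terms and phrases found in the indexes.
VOCABULARY = ["single", "writer", "freddie", "mercury", "queen"]
KEY_PHRASES = {"freddie mercury"}

def rewrite_tokens(query, cutoff=0.6):
    """Replace each token by its closest vocabulary term, if one is close enough."""
    rewritten = []
    for token in query.lower().split():
        matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=cutoff)
        rewritten.append(matches[0] if matches else token)
    return rewritten

def segment(tokens):
    """Greedily merge adjacent tokens that form a known key phrase."""
    segments, i = [], 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if pair in KEY_PHRASES:
            segments.append(pair)
            i += 2
        else:
            segments.append(tokens[i])
            i += 1
    return segments

tokens = rewrite_tokens("single from freddy mercury que")
segments = segment(tokens)
```

Here "freddy" and "que" are repaired to "freddie" and "queen", "from" is left unchanged because nothing in the vocabulary is close to it, and segmentation then groups "freddie mercury" into one key phrase.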
41. Beyond Web search: mobile interaction
Interaction
› Question-answering
› Support for interactive retrieval
› Spoken-language access
› Task completion
Contextualization
› Personalization
› Geo
› Context (work/home/travel)
• Try getaviate.com
42. Interactive, conversational voice search
Parlance EU project
› Complex dialogs within a domain
• Requires complete semantic understanding
Complete system (mixed license)
› Automated Speech Recognition (ASR)
› Spoken Language Understanding (SLU)
› Interaction Management
› Knowledge Base
› Natural Language Generation (NLG)
› Text-to-Speech (TTS)
Video
43. Conclusions
Semantic Search
› Explicit understanding for queries and documents
through links to external knowledge
• Using methods of Information Extraction or
explicit annotations (markup) in webpages
• Semantic Web as a source of external knowledge
Increasing level of understanding
› Early focus on entities and their attributes
• Applications in web search: rich results,
entity displays, entity recommendation
› Moving toward modeling intents/actions
› Adding human-like interaction