3. Introduction
On the web, information about virtually anything can
be found, provided that a searcher knows where to
look
Searchers largely rely on large-scale web search
engines (SEs) to locate useful resources
The quality of the search results depends on the
ability of the searchers to accurately express their
information needs as keywords in the search
engine's input box
How do SEs aid their users in constructing successful
queries?
4. Rationale
The query construction phase of a search session is
crucial to the fulfillment of the searchers' information
needs
During the query construction phase, a searcher has
to express their information needs in the specific
dialect (i.e. keyword-based) of the underlying SE
The searcher has to 'guess' the words that the SE
has chosen to index the web resources corresponding
to those needs
5. Rationale
Spoken languages have certain features that should
be taken into consideration:
Polysemy of words
Polysemy occurs when a word has more than one sense
A query that consists of an ambiguous word without further
information that correctly disambiguates it may result in a
search results list with completely useless information
Synonymy of words
Synonymy occurs when two or more words share the same
meaning
The probability of two persons using the same term in
describing the same thing is less than 20%
6. Proposed approach
A query construction/refinement service on top of the
Google SE, powered by the LOD cloud and
especially DBpedia
The proposed service is a two-step process:
1. Initially, it provides autosuggest functionality by reacting to
the searcher's keystrokes
Prefix search is performed against an index composed of
words and/or phrases originating from Wikipedia and made
available through DBpedia (the 'article titles' dataset)
Such functionality facilitates query disambiguation, since
Wikipedia's disambiguations follow a pattern that is well suited
to prefix search
i.e. <ambiguous word> (disambiguation info), e.g. bass (fish)
DBpedia's suggestions are appended to Google's original
suggestions
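As an illustration, the autosuggest step can be sketched as a prefix search over a sorted list of titles. This is a minimal sketch: the sample titles and the `suggest` helper are illustrative assumptions, not taken from the actual 'article titles' dataset or the service's implementation.

```python
import bisect

# Toy title index; real data would come from DBpedia's 'article titles'
# dataset. The samples mimic Wikipedia's disambiguation pattern.
TITLES = sorted([
    "bass (fish)",
    "bass (guitar)",
    "bass (sound)",
    "basset hound",
    "bassoon",
])

def suggest(prefix, limit=5):
    """Return up to `limit` titles that start with `prefix`."""
    lo = bisect.bisect_left(TITLES, prefix)   # first candidate position
    out = []
    for title in TITLES[lo:lo + limit]:
        if not title.startswith(prefix):      # sorted order => we can stop
            break
        out.append(title)
    return out
```

Because disambiguation info follows the ambiguous word, typing `bass (` surfaces all of its senses at once, which is exactly the pattern that prefix search rewards.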
7. Proposed approach
The proposed service is a two-step process:
(continued…)
2. Upon selection of a suggestion, the searcher is offered
the chance to refine the initial query through the
appropriate interactions that are provided by the
service (i.e. query replacements and refinements)
Query replacements and refinements derive from the results of
SPARQL queries that are addressed to DBpedia's endpoint
Every interaction results in the construction of an
appropriate query that is addressed to Google
Custom Search, which, in turn, provides the
corresponding search results
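The composition of the final query can be pictured as joining the selected suggestion with the chosen refinement keywords. The slides do not specify the exact composition rule, so the helper below is an assumption based on plain concatenation.

```python
def build_query(suggestion, refinements=()):
    """Sketch: compose the query sent to Google Custom Search from the
    selected suggestion plus any refinement keywords the user picked.
    (Plain space-joining is an assumption, not from the slides.)"""
    return " ".join([suggestion, *refinements])
```

For example, `build_query("bass (fish)", ["Perciformes"])` yields the single string `"bass (fish) Perciformes"`, which is then submitted as-is to the Custom Search API.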
8. Proposed approach – under the hood:
Query replacements
Words or phrases that correspond to alternatives to
the suggestion the user has chosen from the search
box
They are actually Wikipedia's redirections of the
article title that the user selected from the search
box
The SPARQL query revolves around the
<http://dbpedia.org/ontology/wikiPageRedirects>
predicate
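A minimal sketch of such a lookup, built as a Python helper: only the wikiPageRedirects predicate comes from the slides; the helper name, the label-fetching pattern, and the resource URI in the example are assumptions about how the service might query DBpedia's endpoint.

```python
def redirects_query(resource_uri):
    """Build a SPARQL query that lists the English labels of all
    Wikipedia redirects pointing at the selected article; these
    labels serve as query replacements. (Query shape is a sketch.)"""
    return f"""
SELECT ?label WHERE {{
  ?r <http://dbpedia.org/ontology/wikiPageRedirects> <{resource_uri}> .
  ?r <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}}"""
```

Sending the generated query to DBpedia's SPARQL endpoint would return, for an article such as bass (fish), the alternative titles that redirect to it.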
9. Proposed approach – under the hood:
Query refinements
Query refinements are keywords that a user can add to the initial query
in order to semantically refine it. They are organized in three groups:
Categories
Wordnet categories and
Context words
The 'Categories' group is populated with the categories of the
Wikipedia article that the user selected from the search box
The corresponding SPARQL query revolves around the
<http://purl.org/dc/terms/subject> predicate
The 'Wordnet categories' group is populated with the WordNet
categories of the Wikipedia title that the user selected from the search
box
The corresponding SPARQL query revolves around the
<http://dbpedia.org/property/wordnet_type> predicate
The 'Context words' group is populated with information derived from
the infobox of the corresponding Wikipedia article
The corresponding SPARQL query revolves around the
<http://dbpedia.org/property/.*> predicate along with numerous
'FILTER' clauses
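The mapping from refinement groups to DBpedia predicates can be sketched as follows. The predicate URIs are the ones named in the slides; the helper and dictionary names are assumptions, and the sketch deliberately omits the 'Context words' case, which matches any <http://dbpedia.org/property/...> predicate via regex FILTER clauses rather than a single fixed predicate.

```python
# Fixed-predicate refinement groups (from the slides); helper names
# are illustrative assumptions.
REFINEMENT_PREDICATES = {
    "Categories": "http://purl.org/dc/terms/subject",
    "Wordnet categories": "http://dbpedia.org/property/wordnet_type",
}

def refinement_query(resource_uri, group):
    """Build a SPARQL query listing the values of one refinement group
    for the selected article."""
    predicate = REFINEMENT_PREDICATES[group]
    return f"SELECT ?v WHERE {{ <{resource_uri}> <{predicate}> ?v }}"
```

Each group's result set is then offered to the user as candidate keywords to append to the initial query.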
17. Discussion
So, can we compete with Google? Certainly not:
Linked data is full of 'noise'
Things could improve if we all put some effort into it:
http://pedantic-web.org/
SPARQL endpoints are often too slow to respond
Unions are expensive
“FILTER regex” clauses take forever to resolve
Maybe the database community will provide solutions that speed
things up
Size matters
Google's index is far larger and fresher
And much more…
18. Discussion
Then, why bother?
We believe that GContext can be seamlessly integrated
with any major search engine that provides access to its
search box
What about the 'knowledge graph'?
Too early to jump to any conclusions; it was announced
on May 16th and is so far only partially deployed
A proof that we are on the right track:
"… go deeper and broader", i.e. infoboxes from DBpedia
"… find the right thing", i.e. PageRedirects from DBpedia
Impact of large-scale web search engines in information seeking
According to Alexa: Google, Facebook, YouTube, Yahoo!, Baidu,
Wikipedia, Windows Live, Twitter, QQ, Amazon