3. Introduction
On the web, information about virtually anything can
be found, provided that a searcher knows where to
look
Searchers largely rely on large-scale web search
engines (SEs) to locate useful resources
The quality of the search results depends on the
ability of the searchers to accurately express their
information needs as keywords in the search
engine's input box
How do SEs aid their users in constructing successful
queries?
4. Rationale
The query construction phase of a search session is
crucial to the fulfillment of the searchers' information
needs
During the query construction phase, a searcher has
to express their information needs in the specific
dialect (i.e. keyword-based) of the underlying SE
The searcher has to 'guess' the words that the SE
has chosen to index the web resources corresponding
to those needs
5. Rationale
Spoken languages have certain features that should
be taken into consideration:
Polysemy of words
Polysemy occurs when a word has more than one sense
A query that consists of an ambiguous word without further
information that correctly disambiguates it may result in a
search results list with completely useless information
Synonymy of words
Synonymy occurs when two or more words share the same
meaning
The probability of two persons using the same term in
describing the same thing is less than 20%
6. Proposed approach
A query construction/refinement service on top of the
Google SE, powered by the LOD cloud and
especially DBpedia
The proposed service is a two-step process:
1. Initially, it provides autosuggest functionality by reacting to
the searcher's keystrokes
Prefix search is performed against an index composed of
words and/or phrases originating from Wikipedia and made
available through DBpedia (the 'article titles' dataset)
Such functionality facilitates query disambiguation, since
Wikipedia's disambiguations follow a pattern that is well suited
to prefix search
i.e. <ambiguous word> (disambiguation info), e.g. bass (fish)
DBpedia's suggestions are appended to Google's original
suggestions
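As an illustration, the autosuggest step can be sketched as a prefix search over a sorted list of titles. This is a minimal sketch: the sample titles and the `suggest` helper are illustrative assumptions, not taken from the actual 'article titles' dataset or the service's implementation.

```python
import bisect

# Toy title index; real data would come from DBpedia's 'article titles'
# dataset. The samples mimic Wikipedia's disambiguation pattern.
TITLES = sorted([
    "bass (fish)",
    "bass (guitar)",
    "bass (sound)",
    "basset hound",
    "bassoon",
])

def suggest(prefix, limit=5):
    """Return up to `limit` titles that start with `prefix`."""
    lo = bisect.bisect_left(TITLES, prefix)   # first candidate position
    out = []
    for title in TITLES[lo:lo + limit]:
        if not title.startswith(prefix):      # sorted order => we can stop
            break
        out.append(title)
    return out
```

Because disambiguation info follows the ambiguous word, typing `bass (` surfaces all of its senses at once, which is exactly the pattern that prefix search rewards.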
7. Proposed approach
The proposed service is a two-step process:
(continued…)
2. Upon selection of a suggestion, the searcher is offered
the chance to refine the initial query through the
appropriate interactions that are provided by the
service (i.e. query replacements and refinements)
Query replacements and refinements derive from the results of
SPARQL queries that are addressed to DBpedia's endpoint
Every interaction results in the construction of an
appropriate query that is addressed to Google
Custom Search, which, in turn, provides the
corresponding search results
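The composition of the final query can be pictured as joining the selected suggestion with the chosen refinement keywords. The slides do not specify the exact composition rule, so the helper below is an assumption based on plain concatenation.

```python
def build_query(suggestion, refinements=()):
    """Sketch: compose the query sent to Google Custom Search from the
    selected suggestion plus any refinement keywords the user picked.
    (Plain space-joining is an assumption, not from the slides.)"""
    return " ".join([suggestion, *refinements])
```

For example, `build_query("bass (fish)", ["Perciformes"])` yields the single string `"bass (fish) Perciformes"`, which is then submitted as-is to the Custom Search API.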
8. Proposed approach – under the hood:
Query replacements
Words or phrases that correspond to alternatives to
the suggestion the user has chosen from the search
box
They are actually Wikipedia's redirections of the
article title that the user selected from the search
box
The SPARQL query revolves around the
<http://dbpedia.org/ontology/wikiPageRedirects>
predicate
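A minimal sketch of such a lookup, built as a Python helper: only the wikiPageRedirects predicate comes from the slides; the helper name, the label-fetching pattern, and the resource URI in the example are assumptions about how the service might query DBpedia's endpoint.

```python
def redirects_query(resource_uri):
    """Build a SPARQL query that lists the English labels of all
    Wikipedia redirects pointing at the selected article; these
    labels serve as query replacements. (Query shape is a sketch.)"""
    return f"""
SELECT ?label WHERE {{
  ?r <http://dbpedia.org/ontology/wikiPageRedirects> <{resource_uri}> .
  ?r <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}}"""
```

Sending the generated query to DBpedia's SPARQL endpoint would return, for an article such as bass (fish), the alternative titles that redirect to it.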
9. Proposed approach – under the hood:
Query refinements
Query refinements are keywords that a user can add to the initial query
in order to semantically refine it. They are organized in three groups:
Categories
Wordnet categories and
Context words
The 'Categories' group is populated with the categories of the
Wikipedia article that the user selected from the search box
The corresponding SPARQL query revolves around the
<http://purl.org/dc/terms/subject> predicate
The 'Wordnet categories' group is populated with the WordNet
categories of the Wikipedia title that the user selected from the search
box
The corresponding SPARQL query revolves around the
<http://dbpedia.org/property/wordnet_type> predicate
The 'Context words' group is populated with information derived from
the infobox of the corresponding Wikipedia article
The corresponding SPARQL query revolves around the
<http://dbpedia.org/property/.*> predicate along with numerous
'FILTER' clauses
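The mapping from refinement groups to DBpedia predicates can be sketched as follows. The predicate URIs are the ones named in the slides; the helper and dictionary names are assumptions, and the sketch deliberately omits the 'Context words' case, which matches any <http://dbpedia.org/property/...> predicate via regex FILTER clauses rather than a single fixed predicate.

```python
# Fixed-predicate refinement groups (from the slides); helper names
# are illustrative assumptions.
REFINEMENT_PREDICATES = {
    "Categories": "http://purl.org/dc/terms/subject",
    "Wordnet categories": "http://dbpedia.org/property/wordnet_type",
}

def refinement_query(resource_uri, group):
    """Build a SPARQL query listing the values of one refinement group
    for the selected article."""
    predicate = REFINEMENT_PREDICATES[group]
    return f"SELECT ?v WHERE {{ <{resource_uri}> <{predicate}> ?v }}"
```

Each group's result set is then offered to the user as candidate keywords to append to the initial query.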
17. Discussion
So, can we compete with Google? Certainly not:
Linked data is full of 'noise'
Things could improve if we all put some effort into it:
http://pedantic-web.org/
SPARQL endpoints are often too slow to respond
Unions are expensive
“FILTER regex” clauses take forever to resolve
Maybe the database community will provide solutions that speed
things up
Size matters
Google's index is far larger and fresher
And much more…
18. Discussion
Then, why bother?
We believe that GContext can be seamlessly integrated
with any major search engine that provides access to its
search box
What about the 'knowledge graph'?
Too early to jump to any conclusions; it was announced
on May 16th and is so far only partially deployed
A proof that we are on the right track:
"… go deeper and broader", i.e. infoboxes from DBpedia
"… find the right thing", i.e. PageRedirects from DBpedia
Impact of large-scale web search engines in information seeking
According to Alexa: Google, Facebook, YouTube, Yahoo!, Baidu,
Wikipedia, Windows Live, Twitter, QQ, Amazon