In this presentation, I will discuss how modern search engines, such as Google, make use of Linked Data spread in Web pages for displaying Rich Snippets. I will also present an example of the technology and analyze its current uptake.
Then I will sketch some ideas on how Rich Snippets could be extended in the future, in particular for multimedia documents.
Original paper:
http://scholar.google.com/citations?view_op=view_citation&hl=en&user=K3TsGbgAAAAJ&authuser=1&citation_for_view=K3TsGbgAAAAJ:u-x6o8ySG0sC
Another presentation by the author: https://docs.google.com/present/view?id=dgdcn6h3_185g8w2bdgv&pli=1
1. How Google is using Linked Data Today and Vision For Tomorrow
Thomas Steiner (Google, Germany), Raphael Troncy (Eurecom, France) and Michael Hausenblas (DERI, Ireland)
Published at the Future Internet Assembly, Dec 2010, Ghent, Belgium
Research paper: http://bit.ly/SWoHYQ
Presenter: Vasu Jain
2. Contents
• Challenges on the Web in terms of research
• How Google is using Linked Data to display Rich Snippets
• Rich Snippet formats and supported entities, and analysis of their usage
• Visual examples of Rich Snippets
• RDFa, Microformat and Microdata
• Presence on Web and Business Impact
• Extension of Rich Snippets in the future, in particular for multimedia content
• Future Internet Architecture and thought experiment of Triple-centric Networking
• Displaying your website’s Rich Snippets in Google search results
• Conclusion, References and Useful Links
April-2012 Contents 2
3. Challenges on the Web in terms of research
• The Web is an important part of the application layer of network architectures. Two major trends open up huge perspectives and challenges for research on the Web:
• The Web of Data (also called Semantic Web)
• The Social Web (also called Web 2.0)
• As originally envisioned, "the Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries." It is a system that enables machines to "understand" and respond to complex human requests based on their meaning.
• Web 2.0 applications represent a trend in Web development and design that facilitates communication, secure information sharing, interoperability, and collaboration. Web 2.0 basically refers to a dynamic Web that includes open communication, with an emphasis on Web-based communities of users and more open sharing of information.
• Web 2.0 is ‘The Web as Platform’ and the Semantic Web is ‘The Web of Meaning’.
4. Shift triggered by these trends
Fundamental shift triggered by these trends:
•While the Internet has previously been concerned with sending bits from one host on the network to another, new applications now need to make sense of those bits.
•In other words, the Internet architecture needs a new layer that takes care of data interoperability, interconnecting pieces of machine-processable data in order to make sense of them.
Proposal for a New Layer
•Add a new data layer to the Open Systems Interconnection (OSI) stack: a so-called "Linked Data Layer", located between the application layer and the presentation layer, which aims to make sense of the data in such a way that it establishes interoperability between different applications.
5. Snippets in Google Search
In 1998, Google introduced snippets: short descriptions of, or excerpts from, a website that appear in Google search results. Snippets are created automatically based on the site's content.
6. Rich Snippets
• Rich Snippets in Google Search
In 2009, Google announced Rich Snippets, a new presentation of snippets that applies Google's algorithms to structured data embedded in the Web pages that appear as search results, with the objective of highlighting the searched-for properties to the user in a visually outstanding way.
Rich Snippets give users convenient summary information about their search results at
a glance. When searching for a product or service, users can easily see reviews and
ratings, and when searching for a person, they'll get help distinguishing between
people with the same name.
7. Rich Snippets
• Supported Rich Snippet formats
Much previous work on structured data has focused on debates about which encoding format should be adopted.
Google later realized that structured data on the Web can and should accommodate multiple encodings, and thus accepted both microformats and RDFa (and later also microdata).
The Rich Snippets feature is built on open standards or community-agreed approaches:
• RDFa (Resource Description Framework – in Attributes)
http://en.wikipedia.org/wiki/RDFa
• Microformat Encoding
http://en.wikipedia.org/wiki/Microformat
• Microdata Encoding
http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html
An additional feature, Rich Snippets in Custom Search, was launched along with the Rich Snippets announcement; it is similar to Yahoo!'s BOSS (Build your Own Search Service) initiative.
8. Entities in Rich Snippet Encodings
Entities currently supported by Google Rich Snippets:
•Software applications
•Breadcrumbs
•Events
•Music
•Businesses and Organizations
•People
•Products
•Recipes
•Review Ratings
•Reviews
•Videos: Facebook Share
(… with more promised soon)
13. Rich Snippets - RDFa
• RDFa (Resource Description Framework – in Attributes)
RDFa is a way to label content to describe a specific type of information, such as a
restaurant review, an event, a person, or a product listing.
These information types are called entities or items. Each entity may have a number of properties; for example, the properties of a Person include name, address, and email address.
In general, RDFa uses simple attributes in XHTML tags (often <span> or <div>) to assign brief and descriptive names to entities and properties.
• An Example (ticket booking for an upcoming event)
A short HTML block showing an entity. Here, details about the entity are marked up in the body of a Web page to help search engines understand the location, schedule, price, or reviews of the event.
14. Rich Snippets - RDFa example
Mark-up for an event at a certain business location.
From structured mark-up on a Website...
typeof="v:Event" indicates that the marked-up content describes an Event (the item type).
The dimensions that make up the event (description, type, start time) are described with properties. Each property name is prefixed with v:, for example <span property="v:description">.
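Since the screenshot of the mark-up is not reproduced here, the following Python sketch shows what such event mark-up could look like and how its typeof and property values can be read back with the standard-library html.parser. The event details are invented, and the xmlns:v namespace is assumed to be the data-vocabulary.org one that Google's RDFa examples of that period used. The parser is deliberately flat: it is not a conforming RDFa processor (no nesting, no CURIE expansion).

```python
from html.parser import HTMLParser

# Hypothetical event mark-up in the style of the slide (v: = data-vocabulary.org).
EVENT_HTML = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <span property="v:summary">Jazz Night</span>
  <span property="v:description">Live jazz with local bands.</span>
  Type: <span property="v:eventType">Concert</span>
  Starts: <span property="v:startDate" content="2012-04-20T20:00">April 20, 8pm</span>
</div>
"""


class FlatRDFaParser(HTMLParser):
    """Collects typeof values and property -> value pairs from flat RDFa."""

    def __init__(self):
        super().__init__()
        self.types = []       # values of typeof= (the item types)
        self.properties = {}  # property name -> literal value
        self._current = None  # property whose element text is being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.types.append(attrs["typeof"])
        if "property" in attrs:
            if attrs.get("content") is not None:
                # A content= attribute supplies a machine-readable value
                # that takes precedence over the human-readable text.
                self.properties[attrs["property"]] = attrs["content"]
            else:
                self._current = attrs["property"]
                self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Flat mark-up assumed: the next end tag closes the property element.
        if self._current is not None:
            self.properties[self._current] = "".join(self._buffer).strip()
            self._current = None


if __name__ == "__main__":
    parser = FlatRDFaParser()
    parser.feed(EVENT_HTML)
    parser.close()
    print(parser.types)  # ['v:Event']
    print(parser.properties)
```

Note how v:startDate carries both a human-readable text ("April 20, 8pm") and a machine-readable content= value; the sketch keeps the latter.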
15. Rich Snippets - RDFa example
...to a Rich Snippet on Google
Caution: Google does not display information that is not visible to the user, such as content in hidden divs.
It can be tempting to add all the content relevant for a rich snippet in one place on the page, mark it up, and then hide the entire block of text using CSS or other techniques. Google will not show content from hidden divs in Rich Snippets.
Exception: geo information (the latitude and longitude of a location) can be included in the HTML markup without being visible on the page.
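A webmaster could catch this mistake before publishing with a small self-check. The Python sketch below flags RDFa or microdata properties that sit inside elements hidden by an inline display:none or visibility:hidden style or by the hidden attribute, while letting the geo properties through as the exception above allows. The property names in GEO_PROPERTIES are assumptions, the check cannot see classes hidden by external stylesheets or scripts, and it says nothing about how Google itself detects hidden content.

```python
from html.parser import HTMLParser

# Elements that never have an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Assumed property names covered by the geo exception.
GEO_PROPERTIES = {"v:geo", "v:latitude", "v:longitude"}


def _hides_itself(attrs):
    """True if an inline style or the hidden attribute hides this element."""
    style = (attrs.get("style") or "").replace(" ", "").lower()
    return "hidden" in attrs or "display:none" in style or "visibility:hidden" in style


class HiddenPropertyLinter(HTMLParser):
    def __init__(self):
        super().__init__()
        self._hidden_stack = []  # one flag per open element: hidden or not
        self.flagged = []        # properties found inside hidden elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        inherited = bool(self._hidden_stack) and self._hidden_stack[-1]
        hidden = inherited or _hides_itself(attrs)
        prop = attrs.get("property") or attrs.get("itemprop")
        if hidden and prop and prop not in GEO_PROPERTIES:
            self.flagged.append(prop)
        if tag not in VOID_TAGS:
            self._hidden_stack.append(hidden)

    def handle_endtag(self, tag):
        # Assumes well-formed mark-up: every non-void start tag is closed.
        if tag not in VOID_TAGS and self._hidden_stack:
            self._hidden_stack.pop()


def find_hidden_properties(html):
    linter = HiddenPropertyLinter()
    linter.feed(html)
    linter.close()
    return linter.flagged


SAMPLE = """
<div typeof="v:Event">
  <span property="v:summary">Jazz Night</span>
  <div style="display: none">
    <span property="v:startDate">2012-04-20</span><br>
    <span property="v:latitude" content="51.05"></span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(find_hidden_properties(SAMPLE))  # ['v:startDate']
```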
16. Rich Snippets – Microformat & Microdata
• Microformats and microdata are simple conventions used on Web pages to describe a specific type of information (an entity), such as a person or a product. Each entity has its own properties.
• Microformats use the class attribute in HTML tags (often <span> or <div>) to assign
brief and descriptive names to entities and their properties.
• Microdata uses simple attributes in HTML tags (often <span> or <div>), namely itemscope, itemtype and itemprop, to assign brief and descriptive names to items and properties.
An example of an HTML block showing basic contact information for a person.
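To contrast the two conventions, the Python sketch below marks up the same invented contact details once as an hCard microformat (names in class=) and once as microdata (names in itemprop=), and reads both back with one flat collector built on html.parser. The HCARD_PROPERTIES subset and the data-vocabulary.org Person property names (name, title, affiliation) are assumptions for illustration; nested properties are not handled.

```python
from html.parser import HTMLParser

# The same (invented) contact details, once as an hCard microformat ...
HCARD = """
<div class="vcard">
  <span class="fn">John Doe</span>,
  <span class="title">Editor</span> at
  <span class="org">Example Corp</span>
</div>
"""

# ... and once as microdata with an assumed data-vocabulary.org Person type.
MICRODATA = """
<div itemscope itemtype="http://data-vocabulary.org/Person">
  <span itemprop="name">John Doe</span>,
  <span itemprop="title">Editor</span> at
  <span itemprop="affiliation">Example Corp</span>
</div>
"""

# A few hCard property class names (a subset, for illustration).
HCARD_PROPERTIES = {"fn", "n", "nickname", "org", "title", "role", "url", "email", "tel", "adr"}


class PersonCollector(HTMLParser):
    """Flat collector: mode="microformat" reads class=, mode="microdata" reads itemprop=."""

    def __init__(self, mode):
        super().__init__()
        self.mode = mode
        self.item_type = None
        self.properties = {}
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.mode == "microdata":
            if "itemscope" in attrs:
                self.item_type = attrs.get("itemtype")
            name = attrs.get("itemprop")
        else:
            classes = (attrs.get("class") or "").split()
            if "vcard" in classes:
                self.item_type = "vcard"
            # Ignore styling classes; keep only known hCard property names.
            name = next((c for c in classes if c in HCARD_PROPERTIES), None)
        if name:
            self._current, self._buffer = name, []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._current:
            self.properties[self._current] = "".join(self._buffer).strip()
            self._current = None


def collect(html, mode):
    parser = PersonCollector(mode)
    parser.feed(html)
    parser.close()
    return parser.item_type, parser.properties


if __name__ == "__main__":
    print(collect(HCARD, "microformat"))
    print(collect(MICRODATA, "microdata"))
```

The visible text is identical in both versions; only the attribute that carries the property names differs.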
17. Rich Snippets – Presence on Web
Statistics on semantic mark-up on the Web in June 2010
•A random sample of one million Web pages was harvested in order to compare the use of microformats and RDFa markup.
•The authors then examined how much of this mark-up data was actually used for Rich Snippets.
•Only a tiny fraction of all semantic mark-up present on the Web was used for Rich Snippets at the time of this experiment.
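A comparison like this boils down to classifying each harvested page by the kinds of mark-up it contains and counting. The Python sketch below shows that tallying step under loudly stated assumptions: the detection rules are rough heuristics chosen for illustration (any typeof or property attribute counts as RDFa, any itemscope as microdata, a handful of root class names as microformats), not the study's actual methodology, and no real crawl data is involved.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed root class names for a few common microformats.
MICROFORMAT_ROOTS = {"vcard", "vevent", "hreview", "hproduct", "hrecipe"}


class MarkupDetector(HTMLParser):
    """Records which kinds of structured mark-up appear in one page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs or "property" in attrs:
            self.found.add("RDFa")
        if "itemscope" in attrs:
            self.found.add("microdata")
        classes = set((attrs.get("class") or "").split())
        if classes & MICROFORMAT_ROOTS:
            self.found.add("microformats")


def tally(pages):
    """Count pages per mark-up kind; a page may count for several kinds."""
    counts = Counter()
    for html in pages:
        detector = MarkupDetector()
        detector.feed(html)
        detector.close()
        counts.update(detector.found)
        if not detector.found:
            counts["none"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '<div typeof="v:Event"><span property="v:summary">Jazz Night</span></div>',
        '<div class="vcard"><span class="fn">John Doe</span></div>',
        "<p>No structured data here.</p>",
    ]
    print(tally(sample))
```

Measuring how much of the detected mark-up actually surfaces as Rich Snippets would additionally require search-result data, which this sketch does not attempt.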
Pitfalls
•Incorrect labeling (e.g. marking up the date of an event as part of the event description),
•Incorrect inclusion of unrelated words in the structured mark-up (e.g. marking up "written by John Doe" rather than just "John Doe" as the value of the property v:reviewer).
•Furthermore, they observe general confusion about which parts of a document should be marked up at all. Although some Web pages include RDFa event markup, none of it was used by the Rich Snippets technology at the time of the study.
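The second pitfall is easy to check mechanically. The Python sketch below strips a leading "written by"-style phrase from a v:reviewer value and reports when the marked-up value should have been shorter. The list of lead-in phrases is a hypothetical example, not an official rule set; the word boundary keeps names such as "Bylund" intact.

```python
import re

# Hypothetical lead-in phrases that belong in the page text, not in the value.
_LEAD_IN = re.compile(r"^\s*(?:written\s+by|reviewed\s+by|by)\b\s*:?\s*", re.IGNORECASE)


def clean_reviewer(value):
    """Return the reviewer value without a leading 'written by'-style phrase."""
    return _LEAD_IN.sub("", value, count=1).strip()


def reviewer_warning(value):
    """Return a warning string if the marked-up value contains a lead-in phrase."""
    cleaned = clean_reviewer(value)
    if cleaned != value.strip():
        return f"v:reviewer should be {cleaned!r}, not {value.strip()!r}"
    return None


if __name__ == "__main__":
    for value in ["written by John Doe", "By: Jane Roe", "John Doe"]:
        print(repr(value), "->", reviewer_warning(value))
```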
18. Rich Snippets – Presence on Web
19. Rich Snippets - Business Impact
Benefits of Rich Snippets in Google Search …
•Webmasters: gives webmasters the ability to add useful information to their search result snippets, helping Google make sense of their bits.
•Purpose: to give users more information about the content of a page, so they can decide which result is most relevant to their query.
•Additional traffic to a webpage: with extra information, people tend to rely more on a search result that is enriched with Linked Data, and an increasing number of impressions has been noted on sites with Rich Snippets.
•Higher click-through rate: pages with Rich Snippets experienced higher click-through rates, as shown in a paper by Kavi Goel and Pravir Gupta.
•Accuracy: Rich Snippet markup should accurately reflect the primary content of your page. Visibility matters, since Web sites can suffer significant sales collapses by going down even one position in their natural search ranking.
•Ease of use: a few simple lines of markup can be added to existing HTML without affecting the visual appearance of the webpage.
20. Vision for Rich Snippets in Future
New business-related vocabularies, such as the Tickets Ontology, are expected to see increasingly broad usage and implementation. This would allow for comparative Rich Snippets.
Rich Snippets could also use information from the user’s social graph, provided the user has granted access to it. This would carry part of the Facebook experience right into the search experience.
21. Vision for Rich Snippets in Future
Even richer snippets: using multimedia semantics to provide richer video search results. We believe that semantically annotated multimedia content has high potential to improve content search.
We show a mock-up in which a person in a video is highlighted; this could be based on media fragment URIs. Such a media fragment URI could look like:
http://example.org/video.webm?t=428,434#xywh=150,60,50,70&xywh=240,50,50,70
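The following Python sketch splits this example URI into its temporal part (t=428,434) and its two spatial boxes (xywh=x,y,w,h) using only urllib.parse. It handles just the shape shown here, with t in the query string and xywh in the fragment, and assumes plain seconds and pixel units; the W3C Media Fragments URI 1.0 specification defines further forms (for example npt: and percent: prefixes) that this sketch ignores.

```python
from urllib.parse import urlsplit, parse_qs

URI = "http://example.org/video.webm?t=428,434#xywh=150,60,50,70&xywh=240,50,50,70"


def parse_media_fragment(uri):
    """Return ((start, end), [(x, y, w, h), ...]) for URIs shaped like URI above."""
    parts = urlsplit(uri)
    # As in the example, t= sits in the query string and xywh= in the fragment.
    query = parse_qs(parts.query)
    fragment = parse_qs(parts.fragment)
    start, end = (float(s) for s in query["t"][0].split(","))
    boxes = [tuple(int(v) for v in box.split(",")) for box in fragment.get("xywh", [])]
    return (start, end), boxes


if __name__ == "__main__":
    (start, end), boxes = parse_media_fragment(URI)
    print(f"seconds {start:g} to {end:g} ({end - start:g} s)")  # seconds 428 to 434 (6 s)
    for x, y, w, h in boxes:
        # x, y is the top-left corner; w and h are width and height in pixels
        print(f"box from ({x}, {y}) to ({x + w}, {y + h})")
```

Each box would frame one highlighted person during the selected six seconds.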
22. Future Internet Architecture
Today’s Rich Snippets: content is determined exclusively by the information in one particular Web page.
Vision of extended Rich Snippets: the features outlined above draw on information from more than one data source. Thus, an information-sharing mechanism must be established to combine information coming from various data sources.
Content-centric Networking: two kinds of packets are involved, Interest packets and Data packets. Consumers broadcast Interests; as soon as a node can satisfy an Interest, it responds with the data. Otherwise, it rebroadcasts the Interest.
Advantages over common host-based networking
•Data packets are not intended exclusively for the node that first expressed the interest; they can be shared among nodes with common interests.
•This is useful when many parties are interested in the same content.
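The Interest/Data exchange above can be illustrated with a toy, in-memory Python model. Each hypothetical Node answers an Interest from its local store or forwards it to its neighbors, and caches the returned data so that a second consumer is served from the cache. This is a simplification under stated assumptions: depth-first forwarding stands in for broadcasting, content names are plain strings, and the Pending Interest Table and forwarding tables of real Content-centric Networking designs are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    store: dict = field(default_factory=dict)      # content name -> data (cache)
    neighbors: list = field(default_factory=list)  # directly connected nodes

    def on_interest(self, content_name, visited=None):
        """Answer an Interest from the local store, or forward it to neighbors.

        Returns the data, or None if no reachable node can satisfy the Interest.
        """
        visited = set() if visited is None else visited
        visited.add(self.name)
        if content_name in self.store:
            return self.store[content_name]
        for neighbor in self.neighbors:
            if neighbor.name in visited:
                continue  # avoid forwarding loops
            data = neighbor.on_interest(content_name, visited)
            if data is not None:
                # Cache the Data packet on the way back, so later Interests
                # from other consumers can be answered here.
                self.store[content_name] = data
                return data
        return None


if __name__ == "__main__":
    producer = Node("producer", store={"/events/jazz-night": "ticket offers"})
    router = Node("router", neighbors=[producer])
    alice = Node("alice", neighbors=[router])
    bob = Node("bob", neighbors=[router])

    alice.on_interest("/events/jazz-night")       # alice -> router -> producer
    del producer.store["/events/jazz-night"]      # the producer goes away ...
    print(bob.on_interest("/events/jazz-night"))  # ... the router's cache answers
```

The second request never reaches the producer, which is exactly the sharing advantage listed above.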
23. Future Internet Architecture
• The figure illustrates what Interest and Data packets could look like if the principle of Content-centric Networking were applied to Triple-centric Networking.
• The authors are at the early stages of this thought experiment and have not yet carried out any experiments to validate their assumptions.
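Since the figure is not reproduced here, the Python sketch below shows one hypothetical way to picture Triple-centric packets: an Interest is a triple pattern in which None acts as a wildcard, and the matching Data packet is the list of stored triples that satisfy it. The packet layout and the ex: names are invented for illustration and are not taken from the paper.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TriplePattern(NamedTuple):
    """A hypothetical Interest packet: None means 'any value'."""
    subject: Optional[str] = None
    predicate: Optional[str] = None
    obj: Optional[str] = None


def matches(pattern, triple):
    return all(p is None or p == t for p, t in zip(pattern, triple))


def answer_interest(pattern, store):
    """Build a Data packet: every stored triple that satisfies the pattern."""
    return [t for t in store if matches(pattern, t)]


# Invented triples a node might hold about an event and its attendees.
STORE = [
    Triple("ex:jazzNight", "rdf:type", "ex:Event"),
    Triple("ex:jazzNight", "ex:startDate", "2012-04-20"),
    Triple("ex:alice", "ex:attends", "ex:jazzNight"),
]

if __name__ == "__main__":
    # "Who attends the jazz night?"
    print(answer_interest(TriplePattern(predicate="ex:attends", obj="ex:jazzNight"), STORE))
```

In a Triple-centric network, a node that cannot fill such a pattern from its own store would forward the Interest, just as in the content-centric case.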
24. Displaying your website’s Rich Snippets in Google search results
Websites like TripAdvisor, Yelp, and Amazon stand out from other search results with their star ratings, thus increasing their click-through rate. To get your website’s Rich Snippets displayed in Google search results:
1. Mark it up with microformats: mark up your website with extra information for entities such as Reviews, People, Products, Businesses, Recipes, and Events.
2. Test to make sure it works: use the Google Rich Snippets Testing Tool.
3. Submit your site to Google: http://www.google.com/support/webmasters/bin/request.py?contact_type=rich_snippets_feedback
4. Google approves websites that it sees as a reliable source of reviews, that have a substantial number of reviews, and that are marked up correctly.
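Step 1 can be sketched in Python as a small helper that renders one review as hReview markup, escaping user-supplied text with html.escape. The class names follow the hReview microformat; the helper name, the sample values and the 1-5 rating range (taken as the default hReview scale) are assumptions for illustration. The phrase "Reviewed by" is deliberately kept outside the reviewer element.

```python
from html import escape


def hreview(item_name, reviewer, rating, summary):
    """Render a minimal hReview block for a single review (hypothetical helper)."""
    if not 1 <= rating <= 5:  # assumed default 1-5 scale
        raise ValueError("rating must be between 1 and 5")
    return (
        '<div class="hreview">\n'
        f'  <span class="item"><span class="fn">{escape(item_name)}</span></span>\n'
        # Only the name goes inside the reviewer span, not the lead-in phrase.
        f'  Reviewed by <span class="reviewer">{escape(reviewer)}</span>.\n'
        f'  <span class="summary">{escape(summary)}</span>\n'
        f'  Rating: <span class="rating">{rating:.1f}</span>\n'
        "</div>"
    )


if __name__ == "__main__":
    print(hreview("Jazz Café & Bar", "John Doe", 4.5, "Great music, friendly staff"))
```

The generated block can then be pasted into the page and checked with the testing tool from step 2.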
26. Conclusion
It has become evident that Rich Snippets are a very sensitive element in the Linked Data value chain, due to their high visibility and the confirmed change in user click-through behavior.
We have interlinked the social graph of a user with common event-related data in the Linked
Data cloud.
For the online ticket search example, the decision about which ticket vendors to include in, and which to exclude from, the Rich Snippets is clearly a crucial one.
The suggested addition of a Linked Data layer between the current application and presentation layers could help establish links between the data providers, making it easier to make sense of the data and present it efficiently.
27. Useful Links
• Expression-of-interest form for webmasters to indicate their interest in having Rich Snippets shown for their pages.
http://support.google.com/webmasters/bin/request.py?&contact_type=rich_snippets_feedback
• Rich Snippets Testing Tool Beta
http://www.google.com/webmasters/tools/richsnippets
28. References and Useful Links
• http://en.wikipedia.org/wiki/Google_Search#Rich_Snippets
• http://en.wikipedia.org/wiki/Semantic_Web
• https://twitter.com/#!/tomayac
• http://www.zdnet.com/blog/web2explorer/web-20-and-semantic-web-mars--venus/13
• http://googlewebmastercentral.blogspot.com/
• http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146646
• http://support.google.com/webmasters/bin/answer.py?hl=en&answer=99170
• http://support.google.com/webmasters/?hl=en#topic=21997
• About RDFa: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146898
• About microformats: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=146897
• About microdata: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=176035
Eurecom is a graduate school in the domain of information and communication technology and a research center in communication systems. The Digital Enterprise Research Institute (DERI) is a research institute at the National University of Ireland, Galway. Its focus is research into the Semantic Web and Web Science.
Humans are capable of using the Web to carry out tasks such as finding the Irish word for "folder", reserving a library book, and searching for the lowest price for a DVD. However, machines cannot accomplish all of these tasks without human direction, because web pages are designed to be read by people, not machines. The semantic web is a vision of information that can be readily interpreted by machines, so machines can perform more of the tedious work involved in finding, combining, and acting upon information on the web.
Rich Snippets were developed by Google but built on open standards, so they were later adopted by other search engines. Rich Snippets can support various markup formats to help identify the information to be presented, such as event or item information.
A breadcrumb trail is a set of links (breadcrumbs) that can help a user understand and navigate your site's hierarchy. All other entity types are described at http://support.google.com/webmasters/?hl=en#topic=21997
The components of this URI are first a temporal dimension (t=428,434), which selects seconds 428 to 434 of the whole video, and then a spatial dimension (xywh=150,60,50,70 and xywh=240,50,50,70), which defines two bounding boxes, each with its top-left corner at the given x, y coordinates and with width w and height h.