1. SEMANTIC WEB Visual Resources Association 2010 Conference Engaging New Technologies, Part II Meghan Musolff, University of Michigan Greg Reser, University of California, San Diego
41. Semantic Web Linked Data Use URIs as names for things Uniform Resource Identifier
42. Semantic Web Linked Data Use URIs as names for things Use HTTP URIs so that people can look up those names
43. Semantic Web Linked Data Use URIs as names for things Use HTTP URIs so that people can look up those names When someone looks up a URI, provide useful information, using RDF or SPARQL
44. Semantic Web Use URIs as names for things Use HTTP URIs so that people can look up those names When someone looks up a URI, provide useful information, using RDF or SPARQL Include links to other URIs so that they can discover more things Linked Data
45. HELLO my name is Semantic Web Linked Data http://dbpedia.org/page/Mona_Lisa
46. Semantic Web Use HTTP URIs as names for things so that people can look up those names http://dbpedia.org/page/Mona_Lisa Linked Data
47. Semantic Web When someone looks up a URI, provide useful information, using RDF or SPARQL Linked Data
48. Semantic Web Include links to other URIs so that they can discover more things Linked Data
60. Semantic Web Under the hood http://dbpedia.org/property/title "Abraham Lincoln" property tag plain text
61. Semantic Web RDF Triples (N3) Title: Abraham Lincoln Subject (thing) http://dbpedia.org/page/Abraham_Lincoln_%281920_statue%29 Predicate (property) http://dbpedia.org/property/title Object (value) "Abraham Lincoln"
62. Semantic Web Title: Abraham Lincoln XML <vra xsi:schemaLocation=" http://www.vraweb.org/projects/vracore4/vra-4.0-restricted.xsd "> <work id="w_0005" refid="1" source="Greg's imaginary collection"> <title pref="true">Abraham Lincoln</title>
63. XML <vra xsi:schemaLocation=" http://www.vraweb.org/projects/vracore4/vra-4.0-restricted.xsd "> <work id="w_0005" refid="1" source="Greg's imaginary collection"> <title pref="true">Abraham Lincoln</title> Semantic Web Title: Abraham Lincoln
91. Semantic Web For links to more information about this topic (and other topics covered by the Engaging Technologies Team), please see our Delicious page. VRA 2010 Engaging Technologies Delicious Account
Editor's Notes
Engaging New Technologies, Part II Presentation by Meghan Musolff and Greg Reser
In order to find a definition of Semantic Web and Linked Data, let’s start by Googling the terms “semantic web” or “linked data.” Often, your search results will return an image similar to the next slide….
Scary, right?
Our goal by the end of this presentation is to have you understand this complicated image, the concepts of the semantic web and linked data, and why they are important to professionals attending the Visual Resources Association (VRA) conference.
So, what is the Semantic Web? It’s about taking all the little bits of information on the web (names, dates, locations, facts, images, everything really) and linking it all together with other bits of information to facilitate both the discovery of new information and the discovery of new relationships between data.
Are the Semantic Web and “Web 3.0” the same thing? Some people think so…
If you think about Web 2.0 and all the related web applications that facilitate sharing and collaboration (Twitter, Facebook, Wikipedia, etc.), the Semantic Web is about taking all that data from those web applications, combining it with other information from educational, cultural, and government institutions, and making all that data searchable and relatable.
Included in the concept of semantic web is the idea of linked data. Linked data is the method of exposing, sharing, and connecting data on the Web.
Let’s use a common visual explanation to understand the concept. As the web exists today, data is underground. It’s rather boring and brown. One must take existing data and do something with it. And really data sets are more powerful when used together.
What the linked data method does is take data sets that exist on the web and bring them together to create something new and beautiful.
The greater idea of Linked Data is to bring together not just two or three sets of data, but all the data available on the web. In this way, we can produce a whole field of beautiful projects.
OK, you might be thinking to yourself, but doesn’t Google or any other web crawler already do this? Unfortunately, not really. Google, like any other web crawler, searches the web for documents and links, not necessarily for the data embedded in those documents. That said, one Google feature does use the concept of linked data, and it makes a great, easy first example.
Let’s say, hypothetically, that you have heard of this great BBQ restaurant called Drooling Dog and you want more information on the place. So, we do a Google search and this is what our results look like…
From the search results, we can see that this restaurant has 4 stars, from 15 reviews. We can also see its price range. These bits of information, called Google Rich Snippets, give users convenient summary information at a glance about their search results and make them more likely to click through to the business’s website. So, how is this accomplished? You guessed it: Linked Data, my friends. Google allows webmasters to annotate their websites following linked data guidelines. They do this by labeling content to make it clear that each piece of text represents a certain type of data: restaurant name, reviews, price range. See, you were using linked data and didn’t even know it!
Let’s do another quick example with something I know you use, although maybe you won’t admit it, every day: Wikipedia.
Here is the Wikipedia entry for the history of the city of Berlin. Look at all the information (dates, names, places, etc.) found in just this one section alone!
Every link in the text leads to another Wikipedia article which also contains more data and structured links, creating a whole web of information (and hence, the term World Wide Web).
If you’re a Wikipedia junkie, you’ll know that often the most important facts from a Wikipedia entry are displayed on the right hand side of the screen in a convenient ready-access infobox. For Berlin, we can see information on elevation, population statistics, time zone, postal code, etc. These structured links can be extracted in a meaningful way: they describe relationships between things, places, events and people. Basically, this is great data just sitting there waiting to be used.
Enter the site DBpedia. DBpedia is a project that aims to extract structured information from the content created as part of the Wikipedia project. Its goal is to make this structured information available on the World Wide Web in a huge, cross-domain knowledge base.
As of November 2009, the DBpedia data set describes 2.9 million “things” with 479 million “facts.” “Things” include persons, places, music albums, films, video games, organizations, species, diseases, and links to external web pages. Also included are 807,000 links to images. It really is gathering everything within the Wikipedia project. DBpedia then allows users to query relationships and properties associated with Wikipedia resources, including links to other related databases.
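To give a feel for what “querying relationships” can look like, here is a minimal sketch that builds (but does not send) a query for things whose birth place is Berlin. It assumes DBpedia’s public SPARQL endpoint at https://dbpedia.org/sparql, the resource URI http://dbpedia.org/resource/Berlin, and the DBpedia ontology property dbo:birthPlace; the helper names and the result limit are made up for illustration and are not from the talk.

```python
from urllib.parse import urlencode

# Assumption: DBpedia's public SPARQL endpoint. Check the project's
# documentation before relying on this address.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def birthplace_query(place_uri: str, limit: int = 10) -> str:
    """Return a SPARQL query for things whose birth place is place_uri."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?person WHERE {\n"
        f"  ?person dbo:birthPlace <{place_uri}> .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_query_url(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> str:
    """Encode the query as the 'query' parameter of a SPARQL GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = birthplace_query("http://dbpedia.org/resource/Berlin")
url = build_query_url(query)
print(query)
print(url)
```

Pasting the printed URL into a browser, or passing it to an HTTP client, would be one way to run the query; the shape of the response depends on the endpoint.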
Let’s repeat our search for Berlin in DBpedia. From this result page you can see some of the information from the infobox on the Wikipedia page repeated: elevation, area code, etc.
If we look further down the DBpedia entry on Berlin, we can see how it is gathering all the references in Wikipedia to Berlin in one place. For example, this screen shot shows some of the other information, identified by tags, about Berlin that is available on Wikipedia. Here we have links about soccer clubs, alma maters, and birth places.
Basically, this data on Berlin, gathered from Wikipedia and compiled by DBpedia, is waiting for an application to work its magic. One such application is the BBC’s Search+. When one searches for Berlin on the BBC website…
… one category of search results is labeled “Knowledge.” The information found in this knowledge section is imported from DBpedia and other linked data sources. Cool, right?
Another example? Sure thing. Here we repeat our search, except this time search for Elton John…
Here's the knowledge section on the right side, pulling data from DBpedia and other sources.
Scrolling down this Knowledge section, we find a link to Neil Diamond…
… And just like that we are looking at the BBC Search page for Neil Diamond.
Now at this point, Greg and I thought it would be super cool for Neil Diamond to appear on stage with us. Unfortunately, he didn't return our phone calls. We'll just have to make do with this picture. Sigh.....
An even cooler application of DBpedia information is a project called DBpedia Mobile. DBpedia Mobile is a location-centric DBpedia client application for mobile devices consisting of a map view and a GPS-enabled launcher application.
Based on the current GPS position of a mobile device, DBpedia Mobile renders a map containing information about nearby locations from the DBpedia dataset. DBpedia Mobile's initial view is a browser-based area map that indicates the user's position and nearby DBpedia resources with appropriate labels and icons.
Then a user can click on these points of interest and be directed to more information on the site. Awesome.
These are just a few quick examples of using Linked Data. But the really exciting aspect is the amount of data that is becoming available to fuel these types of applications. At the beginning there were just a few datasets incorporated into DBpedia.
But in just a short span of time, the concept has grown and grown. And there are tons of people marking up their data in order to conform to linked data standards.
Linked Data is a structured approach to the semantic web. It aims to make data easy to understand in different contexts.
Linked Data has four major principles laid down by Tim Berners-Lee. Principle 1: Use URIs as names for things. Uniform Resource Identifiers are unique and will ensure that the Things you describe are distinguishable from other Things. Because web domains are registered, you can be certain that your URI is unique. If it doesn’t use the universal URI set of symbols, it’s not Semantic Web.
Principle 2: Use HTTP URIs so that people can look up those names. URIs should be actionable. They should do something. HTTP is universally actionable. Every computer can do something with HTTP.
Principle 3: When someone looks up a URI, provide useful information, using RDF or SPARQL. Don’t send users down a blind alley. The destination should be more than just text and HTML; it should be actionable at every step.
Principle 4: Include links to other URIs so that they can discover more things. Make your document a starting point to other linked documents. All of this enables users to deal with the information with greater efficiency and certainty.
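As a rough sketch of what “looking up a name” in Principles 2 and 3 can mean in practice, the snippet below prepares (without sending) an HTTP request that asks for RDF rather than HTML. The URI is the Mona Lisa name from the slides. The function name is hypothetical, and whether a given server honors the Accept header and returns RDF/XML is an assumption, not something the talk states.

```python
from urllib.request import Request

# URI taken from the slides: a name for a thing that can also be looked up.
MONA_LISA = "http://dbpedia.org/page/Mona_Lisa"


def rdf_lookup_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF instead of HTML."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = rdf_lookup_request(MONA_LISA)
print(req.get_method(), req.full_url, req.get_header("Accept"))
# To actually perform the lookup, pass req to urllib.request.urlopen().
```

The point of the sketch is only that the same HTTP URI serves as both the name and the way to fetch a description of the thing.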
This is DBpedia’s URI name for the Mona Lisa. Yes, there can be many different URIs for the Mona Lisa, but by making it a link, a person and a machine can know the source and can trace it back to find synonyms or “Same As” links. Think of HTTP URIs as names (not addresses).
The DBpedia name for “Mona Lisa” takes you to this page which is full of other data in the form of links.
Here are some of the actionable links in the DBpedia “Mona Lisa” page. You can look up the properties and classes you find in this data and get information from the RDF, RDFS, and OWL ontologies including the relationships between the terms in the ontology. This means that you, or the web service you are using, can make intelligent connections between this data and data from other sources.
The links in the page take you to other related links – they can take you down many different paths.
Here is an example of how DBpedia’s linked data can be used for another work of art: the statue “Abraham Lincoln” in the Lincoln Memorial.
This is the Wikipedia record for Daniel Chester French, the sculptor of the statue “Abraham Lincoln”.
Here is the DBpedia linked data browser display for Daniel Chester French. It includes information about The Lincoln Memorial and other works by French.
The faceted search options on the sidebar are created from the links in the Wikipedia record. This allows the user to narrow the results to what they are really interested in.
How is this done? By treating each link as a URI for a thing (a concept, time, person, place, etc.) and then grouping those URIs into navigation links.
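The grouping step can be sketched in a few lines. This is only an illustration of the idea, not how the DBpedia browser is actually built: the property labels echo the “artist of” and “architect of” facets mentioned in these notes, the statue name comes from the talk, and the other two values are placeholders.

```python
from collections import defaultdict

# (property, value) links for one record. Only the statue name comes from
# the talk; the "Example_" values are hypothetical placeholders.
links = [
    ("artist of", "Abraham_Lincoln_(1920_statue)"),
    ("artist of", "Example_Sculpture"),
    ("architect of", "Example_Building"),
]


def group_into_facets(pairs):
    """Group link targets under the property that connects them to the record."""
    facets = defaultdict(list)
    for prop, value in pairs:
        facets[prop].append(value)
    return dict(facets)


facets = group_into_facets(links)
for prop, values in facets.items():
    print(f"{prop} ({len(values)}): {', '.join(values)}")
```

Each key of the resulting dictionary would become one navigation heading, with its values as the clickable links underneath.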
The links are parsed out into various vocabularies such as SKOS, OWL, and DBpedia.
This section of the DBpedia record shows various buildings French is the “architect of”.
Other sculptures French is the “artist of”.
When you click on Artist of: Abraham_Lincoln_(1920_statue), you are linked to the page which contains information about the statue.
The page we’ve been looking at is still an HTML view of the data. What’s going on under the hood? Focusing on the DBpedia property “title”, let’s look at how the data is actually written.
The name of the statue is actually the HTTP address for the whole page. Yes, the actual address written in the browser location bar.
The property is written as an HTTP link. The value of the data is written as plain text.
This way of serializing data is known as RDF Triples – Subject + Predicate + Object
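The Subject + Predicate + Object pattern can be sketched with the values shown on the slide. The class and function names here are hypothetical, the output is written as one N-Triples style line, and the escaping is deliberately minimal (backslashes and double quotes only), so treat it as a sketch rather than a complete serializer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the thing being described
    predicate: str  # URI of the property
    obj: str        # plain-text value (a literal)


def to_ntriples(t: Triple) -> str:
    """Serialize one triple with a literal object as a single line."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


# Subject, predicate, and object as they appear on the slide.
lincoln = Triple(
    "http://dbpedia.org/page/Abraham_Lincoln_%281920_statue%29",
    "http://dbpedia.org/property/title",
    "Abraham Lincoln",
)
print(to_ntriples(lincoln))
```

Note how the subject and predicate are wrapped in angle brackets as URIs, while the value stays a quoted string.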
This is part of a VRA Core 4 XML record for the Abraham Lincoln sculpture.
The work id is not machine resolvable: w_0005 is not a universally unique identifier and is certain to be repeated in many other databases. Yes, you can add a URI to this element. The point is that a URI is not required for XML, whereas it is for RDF.
Some elements of the Core 4 XML can be referenced and repurposed with some work. You could write a script that would join the xsi schema to the “title” property, but this places a huge burden on the user’s end.
A visual display of the structure of RDF Triples. This is a collection of 26 different elements, each written on a single row. You can clearly see the structure of Subject, Predicate, Object.
RDF/XML: I won’t go into detail here except to say that, while it is structured differently, it uses the same principles as triples. It’s all about links.
This is another linked data browser called RelFinder. It allows you to search for two or more Things and then see the relationships between them.
It is a dynamic visualization. By clicking on the names, you can see different relationship paths.
You also see data from DBpedia in the sidebar.
How linked data is useful to VRA: #1. Your images can do a lot more. They can be used by linked data web browsers as well as linked data aggregators without you having to do any additional work.
How linked data is useful to VRA: #2. Your images will be found by many more users, many of whom would never come to your institutional database.
How linked data is useful to VRA: #3. By incorporating linked data into your interface, your users are given much richer data with better accuracy and they can be taken to resources they didn’t know existed.
Here is an example of how linked data can be used to improve image search accuracy. Flickr Wrappr combines data from DBpedia and Flickr for more accurate results. The service only returns images licensed under CC-BY, CC-NC, CC-NC-SA, CC-SA licenses, that is, images that may be used in derivative works.
Flickr Wrappr takes your search and queries DBpedia. When a match is found, the GPS coordinates are pulled out and used to search Flickr.
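The talk does not show how Flickr Wrappr is implemented, so the following is only an assumption-laden sketch of the second step: turning a pair of coordinates into a small search area that a photo search could be restricted to. The function name, the 1 km radius, and the hard-coded approximate White House coordinates (standing in for a DBpedia lookup) are all illustrative.

```python
import math


def bounding_box(lat: float, lon: float, radius_km: float = 1.0):
    """Return (min_lon, min_lat, max_lon, max_lat) around a point.

    Uses the rough figure of 111 km per degree of latitude, which is
    adequate for small areas away from the poles.
    """
    dlat = radius_km / 111.0
    dlon = radius_km / (111.0 * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


# Approximate White House coordinates, hard-coded in place of a DBpedia lookup.
box = bounding_box(38.8977, -77.0365)
print(",".join(f"{v:.4f}" for v in box))
```

Searching by area rather than by the words “White House” is what would filter out photos of unrelated white houses.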
Flickr results for “White House” – lots of results, lots of wrong results
Flickr results for “White House” – completely unrelated images.
Flickr Wrappr results for “White House” – much more precise.
Plus, Flickr Wrappr gives you all this other linked data (partial view of page).
The new Library of Congress Subject Heading website uses linked data.
This is the result screen for the heading “Memorials – Washington (D.C.)”
LCSH uses HTTP URIs to identify the headings.
They also allow you to see the data in RDF/XML, N-Triples and JSON.
There is also a Visualize tab which turns the linked data into a dynamic chart.
This is more than just a cool visualization. It shows how the linked concepts work together, and each term is an active link to more information and more relationships.
The Library of Congress’s Chronicling America provides several Linked Data views to make it easy to connect with other information resources and to process and analyze newspaper information with conceptual precision. Linked Data allows LC to connect the information in Chronicling America directly and explicitly to related data on the Web.
This is the front cover of the January 15, 1905 New York Tribune Sunday Magazine.
LC uses linked data to bring together various ontologies to ensure that what they mean by "title" and "issue" is consistent with the intent of other publishers of linked data. Concepts like Title (defined in DCMI Metadata Terms) and Issue (defined in the Bibliographic Ontology) are cross-referenced between ontologies to ensure accurate usage.
The link to “More about this newspaper” gathers data from other LC records as well as various linked data databases online. This provides a much richer environment for the user.
VRA DBpedia page. We are all interested in using standards to make images easier to find.