2. Linked Open Data
for Libraries, Archives, and Museums
Presented by: Julie
Bobby
Claire
Rafael
3. • Origins in electronic library catalogs from
the 1970s (e.g., WorldCat, which was
created in 1971)
• Moreover, library standards such as MARC and
Z39.50 were designed only for the library
community, in the 1960s and 1970s,
respectively.
• This legacy has complicated efforts to join
that wider search stream, and it also led
burgeoning web entities such as DBpedia,
which offers a Semantic Web mirror of
Wikipedia, to originally bypass library data
Source: Kelley, M. (2011, Aug. 31). How the W3C Has Come To Love Library Linked Data. Library Journal. Retrieved April 20, 2012 from
http://www.libraryjournal.com/lj/home/891826-264/how_the_w3c_has_come.html.csp
LOD/LAM: Library Origins
4. • LODLAM initiative product of increasingly connected culture (MIT Sloan
school, NEH, Internet Archive)
• Since January 2011, the International Linked Open Data in Libraries,
Archives, and Museums Summit (“LOD-LAM”) has convened leaders in
their respective areas of expertise from the humanities and sciences “to
catalyze practical, actionable approaches to publishing Linked Open Data,
specifically:
• Identify the tools and techniques for publishing and working with Linked Open Data.
• Draft precedents and policy for licensing and copyright considerations regarding the
publishing of library, archive, and museum metadata.
• Publish definitions and promote use cases that will give LAM staff the tools they need
to advocate for Linked Open Data in their institutions.”
Source: “About,” LODLAM (2012) Retrieved April 20, 2012 http://lod-lam.net/summit/about/
The LOD/LAM
Initiative
6. Linked Open Data Principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those
names
3. When someone looks up a URI, provide useful
information using standards (RDF, SPARQL)
4. Include links to other URIs so that others can discover
more things
Since 2006 (Voss conference) there has been a push to create linked open data as a method for processing and sharing data across libraries, archives and museums. Our presentation today discusses some key issues in this movement and attempts to provide an overview of the technology, culture, and trends of LODLAM.
On the Internet, the web is built from documents that we voluntarily share and link to other documents. So, for example, if you are on my WordPress site you can follow the links I have created to other documents that relate to my topic. From these documents you can click on their links to other documents, and so forth. It is a fairly elegant idea that has allowed us to share almost anything we want with anyone willing to look. Since about 2006 there has been a push, begun by Tim Berners-Lee (inventor of the web), to share and link our data together. This is especially relevant to our cultural heritage institutions (our libraries, archives and museums) because of the massive amount of data they store and create. At a recent conference at the Smithsonian, Jon Voss, a leader in information technology, stated that in 2010 alone the data produced by libraries, archives and museums increased by an astounding 1000%. So the issues we will be discussing are how we create a structure for data, how that structure relates to LAMs, and what can come from sharing data.
Before we look at the technical aspects of LOD, I think we first need to look at the principles of LD, first devised by Tim Berners-Lee in 2006, and the addendum he added in 2010, which specifically addresses linked open data. For now we won’t get too caught up in the difference between LD and LOD; suffice it to say that one is open and the other need not be. At the level I’ll be describing them, they can be considered the same thing. The first principle is straightforward: when you name anything in linked data, use a URI. The W3C says that a URI is essentially a superset of the URL. URLs describe the locations of things and URNs describe the names of things; both are kinds of URI. Everybody follows this rule. The second rule flows naturally from the first: use HTTP URIs so people can look up names. HTTP is the standard for web communication and will be used in the Web of data. The third principle establishes the need for standards for linked data. While these were not in place in 2006, the W3C now calls for RDF and SPARQL as standards. RDF is a data model and SPARQL is a query language. I’ll go into detail on these later. For now, just know that they are considered to be the standards for linking data. The fourth principle advocates RDF hyperlinks to aid in the discovery of other data. These operate in essentially the same way as hyperlinks between web documents, with one exception: Tom Heath and Christian Bizer explain in their book, Linked Data: Evolving the Web into a Global Data Space, that RDF links are typed, which means they are able to describe relationships between things. For example, the type ‘performed at’ may be set between a musician and a place.
The key concept of Linked Data is the RDF triple. Triples are essentially nodes and links. Any piece of information can be expressed as a triple, which contains three parts: the subject (any piece of data), the predicate (drawn from the vocabulary used to define relationships) and the object (any other piece of data). This makes data machine-readable and scalable. In other words, it takes what makes the web so great and applies it to data. The vocabularies used in the triples should come from commonly used authorities. I am saying “should” for two reasons: 1) remember that this model for creating linked open data is just a set of standards and best practices intended to become universal practice, and 2) this method is so new that there may not be an ontology out there for you to use. In that case it will be necessary to create one and, hopefully, your ontology becomes an authoritative source. Some examples of widely used ontologies include FOAF, used to describe personal data; GeoNames, an ontology for names of places; and SKOS, which is used to define taxonomies.
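To make the subject, predicate and object structure concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the `example.org` URIs, the `performedAt` predicate, and the names "Ella Example" and "Example Hall" are invented for this example, while the FOAF namespace is the real one. A real project would more likely use an RDF library than hand-built tuples.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str                  # a URI naming the thing described
    predicate: str                # a URI taken from a vocabulary (the typed link)
    obj: str                      # another URI, or literal text
    obj_is_literal: bool = False  # True when obj is plain text, not a URI

FOAF = "http://xmlns.com/foaf/0.1/"   # real FOAF namespace
EX = "http://example.org/"            # hypothetical namespace for this sketch

triples = [
    Triple(EX + "musician/1", FOAF + "name", "Ella Example", True),
    # A typed link: 'performed at' relates a musician to a place.
    Triple(EX + "musician/1", EX + "vocab/performedAt", EX + "place/1"),
    Triple(EX + "place/1", FOAF + "name", "Example Hall", True),
]

def to_ntriples(t: Triple) -> str:
    """Render one triple as a line of N-Triples-style text."""
    if t.obj_is_literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."

def objects_of(data, subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [t.obj for t in data
            if t.subject == subject and t.predicate == predicate]

for t in triples:
    print(to_ntriples(t))
```

Following the `performedAt` link from the musician and then asking for the place's name, via two calls to `objects_of`, is the data-level equivalent of clicking from one web page to another.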
RDF triples are elegant because they are so scalable. You can easily add millions of triples to a single piece of data. You can have millions of pieces of data. Remember that statistic about how data for LAMs grew 1000% in 2010? What is needed is a way to access these large data sets. That is what RDF query languages do, and the W3C advocates SPARQL as the standard. SPARQL stands for SPARQL Protocol and RDF Query Language.
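The idea behind a SPARQL query is pattern matching over triples: you write triples with variables in them, and the engine returns every way of filling in the variables. The sketch below is a toy stand-in written in plain Python, not a SPARQL engine; the SPARQL text is shown for reference only and is never executed. The `example.org` URIs and the data are assumptions made up for illustration.

```python
# The SPARQL query this sketch imitates (reference only, not executed here):
QUERY = """
SELECT ?m WHERE {
  ?m <http://example.org/vocab/performedAt> ?p .
  ?p <http://xmlns.com/foaf/0.1/name> "Example Hall" .
}
"""

EX = "http://example.org/"                    # hypothetical namespace
PERFORMED_AT = EX + "vocab/performedAt"       # hypothetical predicate
NAME = "http://xmlns.com/foaf/0.1/name"       # real FOAF property

triples = [
    (EX + "musician/1", PERFORMED_AT, EX + "place/1"),
    (EX + "musician/2", PERFORMED_AT, EX + "place/1"),
    (EX + "musician/3", PERFORMED_AT, EX + "place/2"),
    (EX + "place/1", NAME, "Example Hall"),
    (EX + "place/2", NAME, "Another Venue"),
]

def match(pattern, data):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    (Toy limitation: a literal value that starts with '?' would be misread.)
    """
    for triple in data:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable would need two different values
                binding[term] = value
            elif term != value:
                break      # constant term does not match this triple
        else:
            yield binding

def select(patterns, data):
    """Join several triple patterns, keeping bindings consistent across them."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Replace variables that are already bound with their values.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, data):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions

patterns = [("?m", PERFORMED_AT, "?p"), ("?p", NAME, "Example Hall")]
for row in select(patterns, triples):
    print(row["?m"])
```

The second pattern reuses `?p`, so only musicians whose venue is named "Example Hall" survive the join. That shared variable is what lets a query follow links across a large data set.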