A non-technical introduction to Linked Data, from a Cultural Heritage organization's perspective. This presentation is from the Provenance Index workshop at the Getty in 2016, with an emphasis on why Linked Data is valuable, as well as how it works in general. [Please see speaker notes for explanations of image slides]
@azaroth42
A Brief Introduction to Linked Data
Silo? “An insular management system incapable of
reciprocal operation with other, related
information systems” :(
Silos protect the data, but make it hard to get at.
Technologists spend a lot of time designing very
clever silos that tend to keep the data very, very
safe … and very hard to get at.
But it’s not the silo that’s the important thing we
should be thinking about.
Our Data Lives in Silos
http://www.getty.edu/art/collection/objects/34108/bernd-and-hilla-becher-grain-elevator-elliott-illinois-usa-german-1982/
We build lots of different types of silo that serve the
same purpose – protecting the data – but all look
and work very differently. Technologists get excited
about new, bigger, better silos … or even just
different ones.
Actually … Water Towers (1963-1995) at San
Francisco Museum of Modern Art this weekend.
In Many Silos
Becher photography at San Francisco Museum of Modern Art: https://www.sfmoma.org/artwork/FC.840.A-DD
The data that we’re locking up inside
them is what’s important … not the ugly
building we constructed to keep it in. We
can’t let our grain sit in its silos until it rots
away, or we all starve for lack of it.
Follower of the Egerton Master, about
1410, Man Pointing to Grain Stalks (leaf)
Data
http://www.getty.edu/art/collection/objects/3437/follower-of-the-egerton-master-a-man-pointing-to-grain-stalks-zodiacal-sign-of-leo-french-about-1410/
So we build crazy contraptions on top of our data
silos to get the content back out again, adding
more technology to solve the problem the first
technology created.
We try to move data from one silo to another,
because it’s more convenient with other data. So
we try to build bigger and bigger silos to hold
more and more data.
But the farm down the road still can’t get to our
data, and we can’t get to theirs …
Data Elevators?
http://www.getty.edu/art/collection/objects/34108/bernd-and-hilla-becher-grain-elevator-elliott-illinois-usa-german-1982/
Even when that data is about the same thing: the same
person, material, place, event. Black-figure Attic vases
(amphorae), attributed to the same artist (the Affecter),
in the British Museum, the Getty Villa, and the Louvre,
from left to right. Much prettier silos for wine, or grain!
If only there were an information system that could take
content from multiple, distributed institutions, and
provide a way to link it together, so that someone
sitting on their couch making slides for
presentations could look up images of Attic vases across
three different countries …
Common Content
The Web
No surprise … the web is the largest
and most successful information
system ever built. Many people have
never lived without it (a scary
thought, sorry). But while the web is
good for humans with browsers, it’s
incomprehensible to machines. So
some people thought … what can we
learn from how the web works, and
apply it to our data such that it can
be easily shared, accessed and
understood? That led to the notion
of Linked Data.
So ... I have to tell you now that you may have already
been exposed to some Linked Data. That doesn’t look
like Linked Data, you say? That’s just a screenshot of a
web page… Well, what if I change the colors…
Data
Why is the Web Successful?
• Identify things with URLs
• Return useful information when you go to the URL
• Link to other things with URLs
• Common format for describing things
– HTML on the Web, RDF for Data
• Easy for anyone to use
• Easy for anyone to contribute
Data Should be Easy to Use
• Easy-to-understand data gets used
• The more usage, the more links to it
• The more links, the more usage
• The more usage, the more reputation
• The more reputation, the more trust
• The more trust, the more usage
Why Linked Data?
• Avoid the conceit that our institution is the only one
• We can’t know everything, but can link out
• Find new information for things we care about
– Hidden across different institutions
– Or within a single organization
Why Linked Data?
• Easily discover objects with the same or related
features [in different institutions]
• Merge information from [different] trusted
institutions, saving time and money
• A graph (like the web) is more powerful than records
(like individual files)
Conclusions
• Linked Data takes the best practices of the Web
and applies them to Data
• Use URLs for identity, give access to the raw data
• Recognize that many people have something
valuable to say
• Make data publicly available
– Using standards
– Consistently with others
Thank You!
Rob Sanderson
rsanderson@getty.edu
@azaroth42
Further References:
– https://www.w3.org/DesignIssues/LinkedData.html
– https://www.ted.com/talks/tim_berners_lee_on_the_next_web
– Linked Data: https://www.youtube.com/watch?v=4x_xzT5eF5Q
– JSON-LD: https://www.youtube.com/watch?v=vioCbTo3C-4
– http://www.slideshare.net/azaroth42/linked-data-building-standards-and-communities
Thanks to the IIIF Community, @manusporny for content inspiration.
Masterclass: Distributed Identity
• Others also create URLs for the same thing
– http://dbpedia.org/resource/Affecter
– http://collection.britishmuseum.org/id/…
• Distributed datasets, connected via links
• No blessed gatekeeper
• Reconciliation:
Does URL-X identify the same thing as URL-Y?
I like the idea of these two figures shaking hands – as you’ll hear, Linked Data is about agreement.
However, this is Paris shaking hands with Hermes, accepting his fate and the inevitability of the Trojan War … not the sort of agreement that I had in mind!
Grain Elevator, Elliott, Illinois. Bernd and Hilla Becher.
(They took a lot of photos of silos, including German cement factories…)
So what /is/ the Web, then? It’s pages with URLs and interesting content,
with links in that content
to other pages with more interesting content. And it’s important to note that what makes the web work is that the linked content might not be on the same website.
Better? Data is often represented as label and value – the creator is Affecter, the medium is Terracotta. Those labels and values are about the object.
We can then have different labels and values for different objects. Here’s Affecter.
And instead of using the name Affecter, we can refer to the other piece of data with its identifier. This is just a relational data structure, but it’s the beginning of Linked Data.
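A minimal sketch of this label/value structure in Python. The record numbers, the "type" label, and the in-memory dictionary are illustrative assumptions, not any real collection system.

```python
# Hypothetical local records, keyed by internal record numbers.
# Each record is a set of label/value pairs about one thing.
records = {
    1: {"type": "Amphora", "creator": 2, "medium": "Terracotta"},
    2: {"type": "Person", "name": "Affecter"},
}

def creator_name(object_id: int) -> str:
    """Follow the 'creator' identifier from an object record to a person record."""
    creator_id = records[object_id]["creator"]  # an identifier, not a name
    return records[creator_id]["name"]

print(creator_name(1))  # Affecter
```

The object stores the creator's identifier, so the name lives in one place. But the number 2 only means something inside this one database.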
Okay, so we have relational data, and we have the web as a model for integration between systems. Let’s see how that looks…
The easiest change is to use URLs instead of numbers or strings for identifiers. Now we’ve solved two problems:
• All identifiers are globally unique, without a centralized system
• They can also be linked to and retrieved via the web
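A minimal sketch of the first problem being solved, assuming two hypothetical institutions that both number their records from 1 (the example.org domains and the /object/ path pattern are made up for illustration). Merging by local number loses a record; merging by URL does not. The second benefit, retrieving a URL over the web, needs a live server and is not shown here.

```python
# Two hypothetical institutions that both number their records from 1.
museum_a = {1: {"label": "Amphora"}}
museum_b = {1: {"label": "Kylix"}}

# Merging by local number: the second record silently overwrites the first.
merged_by_number = {**museum_a, **museum_b}
print(len(merged_by_number))  # 1

def to_urls(base: str, records: dict) -> dict:
    """Re-key records by a URL minted under the institution's own (hypothetical) domain."""
    return {f"{base}/object/{num}": rec for num, rec in records.items()}

# Each domain owner controls its own names, so no central registry is needed.
merged_by_url = {
    **to_urls("https://museum-a.example.org", museum_a),
    **to_urls("https://museum-b.example.org", museum_b),
}
print(len(merged_by_url))  # 2
```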
Lots of URIs … which should we use? Ease of contribution matters, but so does ease of use of the contributed data. It benefits everyone … the data is available and used, which adds to the reputation of the providing institution, which makes it trusted, which means it’s used more often …
We can’t even maintain all the versions of the data for our own content, let alone all the data about everything! We /have/ to trust others, rather than follow our Cultural Heritage instinct as curators and collectors to gather everything to ourselves.
We trust ourselves, yet we have different systems in all of our different programs. Surely we trust ourselves?
We don’t know all of the research questions up front. The best we can do is provide access to other information, just like the web links between pages and lets the user follow their nose through them.
Give examples.
The web has only one type of hyperlink, and it doesn’t carry any meaning. Data, on the other hand, has many different labels for its links, and we need to be consistent about what they mean. We shouldn’t just invent new labels all the time: then all anyone could tell is that there is some relationship between the Amphora and Affecter, but no one would know what it was. Imagine if, instead of creator, it said P26018 … that’s what a machine sees when it sees creator… We need an agreed-upon set of meanings for relationships. We have some of those already, but it’s easy to get sidetracked and create a lot of private meanings while thinking about our own data, rather than the bigger picture of the global web of data.
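A minimal sketch of why agreed link labels matter, assuming two hypothetical sources: one uses the private label "creator", the other the opaque code "P26018" from the example above. The shared term is Dublin Core's http://purl.org/dc/terms/creator, chosen only as a well-known published example; the URLs and the mapping table are invented for illustration.

```python
# Hypothetical statements as (subject, link label, value) triples.
# Source A uses a private readable label; source B an opaque private code.
source_a = [
    ("https://a.example.org/object/1", "creator", "https://a.example.org/person/7"),
]
source_b = [
    ("https://b.example.org/object/9", "P26018", "https://b.example.org/person/3"),
]

# A shared, published term for the "made by" relationship.
CREATOR = "http://purl.org/dc/terms/creator"

def creators(triples):
    """Return (object, creator) pairs that use the shared creator term."""
    return [(s, o) for s, p, o in triples if p == CREATOR]

everything = source_a + source_b
print(len(creators(everything)))  # 0: neither source used the shared term

# Hypothetical mapping of each private label onto the shared term.
mapping = {"creator": CREATOR, "P26018": CREATOR}
aligned = [(s, mapping.get(p, p), o) for s, p, o in everything]
print(len(creators(aligned)))  # 2: both statements are now understood
```

Publishing with the shared term from the start would make the mapping step unnecessary, which is the point of not inventing new labels.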
A new problem that the human document web doesn’t have – multiple organizations can create identities for the same real world object or person or place or concept. And they do. So we have this problem of reconciliation – does URL-X identify the same object as URL-Y ?
And the solution for it is more links and more data. This is DBpedia’s entry for Affecter, for example…
… which has a link to the Getty Guide page. That’s actually not very good, as it’s part of the human web, but it’s a start.
Urgh. The Bechers have *one* page in DBpedia… a “Person” with two birthDates, two birthPlaces, … :( We need to be careful about what we trust.
This is Nomisma’s entry for Attica, which is better. It’s real Linked Data, and you can see the labels and values.
Much better – we know that the concepts are closely aligned, and it links into the Getty Thesaurus of Geographic Names, which again is real Linked Data.
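Reconciliation can be sketched as grouping URLs that are connected by identity statements. This minimal Python sketch uses union-find; the links are hypothetical (apart from the DBpedia URL for Affecter), and the relation names "sameAs" and "closeMatch" stand in for an identity link such as owl:sameAs and for SKOS's skos:closeMatch. Only identity links are followed transitively: a "closely aligned" link says two concepts are similar, not identical, so chaining it could wrongly merge distinct things.

```python
# Hypothetical links between URLs from different publishers.
links = [
    ("http://dbpedia.org/resource/Affecter", "sameAs", "https://museum-a.example.org/person/7"),
    ("https://museum-a.example.org/person/7", "sameAs", "https://museum-b.example.org/person/3"),
    ("https://museum-b.example.org/place/attica", "closeMatch", "https://places.example.org/attica"),
]

parent: dict[str, str] = {}

def find(url: str) -> str:
    """Return the representative URL of url's group (union-find with path halving)."""
    parent.setdefault(url, url)
    while parent[url] != url:
        parent[url] = parent[parent[url]]
        url = parent[url]
    return url

def union(a: str, b: str) -> None:
    parent[find(a)] = find(b)

# Merge only on identity links; "closeMatch" is recorded but never chained.
for a, rel, b in links:
    if rel == "sameAs":
        union(a, b)

def same_thing(x: str, y: str) -> bool:
    return find(x) == find(y)

print(same_thing("http://dbpedia.org/resource/Affecter",
                 "https://museum-b.example.org/person/3"))  # True
print(same_thing("https://museum-b.example.org/place/attica",
                 "https://places.example.org/attica"))  # False
```

The same transitivity cuts both ways: one bad identity link, like the DBpedia page that merges both Bechers into a single “Person”, pulls everything linked to either URL into one group. That is why the links need to come from sources we trust.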