This document provides an overview of a course on digital humanities. It outlines the topics that will be covered in each of the 12 classes, including introductions to digital humanities, semantic modeling, crowdsourcing and visualization. One class focuses specifically on semantic coding and modeling using standards like RDF, URIs, OWL and SPARQL. It also discusses ontologies like CIDOC-CRM that can be used to semantically represent cultural heritage data.
1. Digital Humanities 101 - 2013/2014 - Course 6
Digital Humanities Laboratory
Frédéric Kaplan
frederic.kaplan@epfl.ch
2.
Semester 1 : Content of each course
• (1) 19.09 Introduction to the course / Live Tweeting and Collective note
taking
• (2) 25.09 Introduction to Digital Humanities / Wordpress / First assignment
• (3) 2.10 Introduction to the Venice Time Machine project / Zotero
• 9.10 No course
• (4) 16.10 Digitization techniques / Deadline first assignment
• (5) 23.10 Datafication / Presentation of projects
• (6) 30.10 Semantic modelling / RDF / Deadline peer-reviewing of first
assignment
3.
Semester 1 : Content of each course
• (7) 6.11 Pattern recognition / OCR / Semantic disambiguation
• (8) 13.11 Historical Geographical Information Systems, Procedural modelling
/ City Engine / Deadline Project selection
• (9) 20.11 Crowdsourcing / Wikipedia / OpenStreetMap
• (10) 27.11 Cultural heritage interfaces and visualisation / Museographic
experiences
• 4.12 Group work on the projects
• 11.12 Oral exam / Presentation of projects / Deadline Project blog
• 18.12 Oral exam / Presentation of projects
4.
Objective of today's course
• Show you the beauty and make you feel the power of semantic coding.
• Give you a quick idea of what is behind the following strange acronyms :
RDF, URI, OWL, SPARQL, SWRL, CIDOC-CRM
• Motivate you to look deeper.
5.
A short introduction to semantic coding
• Many good books exist. I recommend
this one.
• I will reuse some of its examples in the
following slides.
8.
The simplest kind of dataset, that everyone is familiar
with, is tabular data (any data kept in a table such as an
Excel spreadsheet).
10.
Data kept in a table is easy to display, sort, print and edit.
11.
You might not even think of data in an Excel spreadsheet
as modeled. But there are semantics in a data table.
Where ?
12.
There are also obvious limitations with this kind of
storage.
14.
You cannot search for the routes that stay more than 2
days at Corfu. Sorting the columns does not capture the
deeper meaning of the text we entered.
15.
Relational databases are a solution. Many very mature
products exist like Oracle DB, MySQL and PostgreSQL. A
relational database allows multiple tables to be joined in a
standardized way.
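As a minimal sketch of such a join, the following uses Python's built-in sqlite3 module with two hypothetical tables, routes and stops. The table names, column names and sample rows are assumptions made for illustration, not data from the course. Once the length of each stay is stored as a number rather than as free text, the question about routes staying more than 2 days at Corfu becomes a simple query.

```python
import sqlite3

# In-memory database with two hypothetical tables (names and rows are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE routes (id TEXT PRIMARY KEY, departure TEXT);
CREATE TABLE stops (route_id TEXT REFERENCES routes(id), port TEXT, days INTEGER);
INSERT INTO routes VALUES ('R1', 'Venice'), ('R2', 'Venice');
INSERT INTO stops VALUES ('R1', 'Corfu', 3), ('R2', 'Corfu', 1), ('R2', 'Crete', 4);
""")

# Join the two tables and keep the routes that stay more than 2 days at Corfu.
rows = conn.execute("""
SELECT routes.id
FROM routes JOIN stops ON stops.route_id = routes.id
WHERE stops.port = 'Corfu' AND stops.days > 2
ORDER BY routes.id
""").fetchall()

long_corfu_stays = [row[0] for row in rows]
print(long_corfu_stays)
```

The price of this convenience is that the columns are fixed in advance: any new kind of information requires changing the table definitions.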
17.
But, as our project evolves, we may need to reformat our
tables. This is called schema migration. A painful process.
18.
For big databases, schemas can get incredibly complex.
19.
Trying to normalize these databases in a single schema is
a labor-intensive process.
20.
How to make future-proof schemata
21.
How to make future-proof schemata
• With this mode of coding we can easily add new properties (price of
the route, captain, etc.). The schema is future-proof.
• In addition, the data about the data (i.e. the metadata, the names of
the columns) is now part of the data itself.
• This is ideal for projects in Perpetual Beta.
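The slide's figure is not reproduced in this transcript, so the following is only a sketch of the idea, assuming each fact is stored as one (subject, property, value) row. The function names and the captain and price values are hypothetical.

```python
# Each fact is one (subject, property, value) row, so column names become data.
triples = [
    ("R1", "departure", "Venice"),
    ("R1", "departure-date", "2.7.1422"),
]

def add_fact(store, subject, prop, value):
    """Append one fact; no table definition has to be altered."""
    store.append((subject, prop, value))

# New properties (hypothetical values) are added without any schema migration.
add_fact(triples, "R1", "captain", "C1")
add_fact(triples, "R1", "price", "120")

def properties_of(store, subject):
    """Collect every property of a subject into a dictionary."""
    return {p: v for s, p, v in store if s == subject}

route = properties_of(triples, "R1")

# The metadata (the former column names) can be listed like any other data.
properties_in_use = sorted({p for _, p, _ in triples})
print(route)
print(properties_in_use)
```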
22.
And, most importantly, it makes a direct and simple
connection with a well-developed research field : logic.
23.
Indeed, this can be written in a different way
24.
Indeed, this can be written in a different way
• (Subject Predicate Object)
• (R1 departure Venice)
• This is called an RDF statement, an atomic relation in a database.
25.
RDF statements
• (R1 departure-date 2.7.1422)
26.
This is a graph
27.
As RDF statements can be understood both as logic
statements and as parts of a graph, one can use many
tools and ideas from logic and graph theory to manipulate
them.
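To illustrate the graph reading, here is a small sketch that turns a list of statements into an adjacency structure and then applies a standard breadth-first traversal. The arrival statement and the function names are assumptions added for the example.

```python
from collections import defaultdict, deque

triples = [
    ("R1", "departure", "Venice"),
    ("R1", "arrival", "Corfu"),     # hypothetical statement
    ("Venice", "isa", "Place"),
    ("Corfu", "isa", "Place"),
]

def build_graph(statements):
    """Map each subject to its outgoing (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph

def reachable(graph, start):
    """Breadth-first search: every node that can be reached from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

graph = build_graph(triples)
nodes_from_r1 = reachable(graph, "R1")
print(sorted(nodes_from_r1))
```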
28.
URIs
• The nodes of the graph are called resources.
• When you want to coordinate multiple datasets, it can become
increasingly difficult to guarantee unique and consistent identifiers for
each node.
• The R1 that we use in our database may mean something else in another
database.
• For naming resources, RDF uses URIs (Uniform Resource Identifiers) and
an optional fragment identifier.
29.
URIs
• You are probably familiar with URLs (Uniform Resource Locators), the
strings used to specify how web pages are retrieved.
• URIs generalize this concept further by saying that anything, whether you
can retrieve it electronically or not, can be uniquely identified in a similar
way.
31.
Since URIs can identify anything as a resource, the
subject of an RDF statement can be a resource, the object
can be a resource and, most importantly, predicates are
always resources.
32.
An example of a URIRef for a common RDF predicate
33.
It is common in RDF to shorten URIs by assigning a
namespace prefix to the base URI and writing only the
distinctive part of the identifier. The previous URI can be
written in a shorter manner : rdf:type
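The shortening can be sketched as a simple lookup table from prefixes to base URIs. The rdf base URI below is the standard one; the ex prefix, its base URI and the two function names are assumptions made for the example.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ex": "http://example.org/routes#",   # hypothetical namespace
}

def expand(short_name, prefixes=PREFIXES):
    """Turn a prefixed name such as rdf:type into a full URI."""
    prefix, _, local = short_name.partition(":")
    return prefixes[prefix] + local

def shorten(uri, prefixes=PREFIXES):
    """Turn a full URI back into its prefixed form when a prefix matches."""
    for prefix, base in prefixes.items():
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    return uri

print(expand("rdf:type"))
print(shorten("http://example.org/routes#R1"))
```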
34.
Serialization
• While the data model that RDF uses is very simple, the serialized
representation tends to get complicated when an RDF graph is saved in a
file or sent over a network.
• Different serialization formats exist : N3, RDF/XML (the most frequently
used), RDFa (RDF in attributes)
35.
Vocabularies
• A set of URIRefs is known as a vocabulary.
• We can design a specific vocabulary for our maritime route examples.
• There are also famous vocabularies like the RDF vocabulary (the set of
URIRefs describing the RDF concepts, e.g. rdf:resource, rdf:type)
36.
SPARQL
• Just as SQL provides a standard query language across relational
databases, SPARQL provides a query language for RDF graphs
(pronounced "sparkle").
• SPARQL queries attempt to match patterns in the graph and bind
wildcard variables as they find solutions.
• Departure(?x1, Venice)
• Captain(?x1, ?x2), Gender(?x2, Woman)
• Semantic coding is all about asking bigger questions.
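The pattern matching behind the two queries above can be sketched in a few lines of plain Python. This is not SPARQL syntax and not how a real query engine is implemented; it only shows how variables starting with "?" get bound while patterns are matched against triples. All sample data and function names are hypothetical.

```python
triples = [
    ("R1", "departure", "Venice"),
    ("R2", "departure", "Corfu"),
    ("R1", "captain", "C1"),
    ("R2", "captain", "C2"),
    ("C1", "gender", "Woman"),
    ("C2", "gender", "Man"),
]

def is_variable(term):
    return term.startswith("?")

def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    extended = dict(binding)
    for wanted, actual in zip(pattern, triple):
        if is_variable(wanted):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(wanted, actual) != actual:
                return None
        elif wanted != actual:
            return None
    return extended

def query(patterns, store):
    """Return every binding that satisfies all patterns together."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in store:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

routes_from_venice = query([("?x1", "departure", "Venice")], triples)
women_captains = query(
    [("?x1", "captain", "?x2"), ("?x2", "gender", "Woman")], triples
)
print(routes_from_venice)
print(women_captains)
```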
37.
SWRL
• With RDF coding, we can also write rules to infer new triples.
• If hasParent(?x1, ?x2) and hasBrother(?x2, ?x3) then hasUncle(?x1, ?x3)
• This is also a way of detecting possible inconsistencies in the
knowledge coded in the triple store (e.g. actors doing things after their death).
• One standard language for this is SWRL (Semantic Web Rule
Language).
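The uncle rule can be imitated in plain Python to show what "inferring new triples" means. This is a hand-written sketch, not SWRL syntax and not a rule engine; the individuals and the function name are hypothetical.

```python
facts = {
    ("Anna", "hasParent", "Marco"),     # hypothetical individuals
    ("Marco", "hasBrother", "Pietro"),
}

def infer_uncles(store):
    """Apply: hasParent(x1, x2) and hasBrother(x2, x3) => hasUncle(x1, x3)."""
    inferred = set()
    for x1, p1, x2 in store:
        if p1 != "hasParent":
            continue
        for subject, p2, x3 in store:
            if p2 == "hasBrother" and subject == x2:
                inferred.add((x1, "hasUncle", x3))
    # Only report triples that were not already in the store.
    return inferred - store

new_facts = infer_uncles(facts)
print(new_facts)
```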
38.
Ontologies
• An ontology provides a special vocabulary with which knowledge can be
represented.
• This vocabulary allows us to specify which entities will be represented,
how they can be grouped and which relationships connect them together.
• (Venice isa Place), (Corfu isa Place), (Place haslat latitude), (Place
haslong longitude)
• Now, something very beautiful...
39.
An ontology can be expressed as RDF triples and stored
in a graph alongside the data it describes.
41.
OWL
• OWL (Web Ontology Language) is an ontology language layered on top
of RDF and RDFS
• Terminology statements
• ex:Bridge rdf:type rdfs:Class
• ex:Bridge rdfs:subClassOf ex:Place
• Assertion statements
• ex:Rialto rdf:type ex:Bridge
• ex:RialtoCons ex:broughtIntoExistence ex:Rialto
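Because terminology and assertions live in the same graph, a program can combine them. The sketch below stores both kinds of statements in one set and derives that the Rialto is also a Place by following the subclass link. It uses the standard RDFS spellings rdfs:Class and rdfs:subClassOf; the function names are assumptions, and a real reasoner does much more than this.

```python
triples = {
    # Terminology statements
    ("ex:Bridge", "rdf:type", "rdfs:Class"),
    ("ex:Bridge", "rdfs:subClassOf", "ex:Place"),
    # Assertion statement
    ("ex:Rialto", "rdf:type", "ex:Bridge"),
}

def superclasses(store, cls):
    """All classes reachable from cls through rdfs:subClassOf."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for s, p, o in store:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found

def types_of(store, resource):
    """Direct types of a resource plus the types inherited via subclasses."""
    direct = {o for s, p, o in store if s == resource and p == "rdf:type"}
    inherited = set()
    for cls in direct:
        inherited |= superclasses(store, cls)
    return direct | inherited

rialto_types = types_of(triples, "ex:Rialto")
print(sorted(rialto_types))
```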
42.
It is relatively easy to create your own ontology using
software such as Protégé. But some ontologies aim to be
universal.
44.
CIDOC-CRM
• CIDOC-CRM is an ontology for cultural heritage.
• About 20 years of work.
• An ISO standard (ISO 21127).
• 100+ schema. Very stable.
• CIDOC-CRM is an attempt to formalise an underlying semantics common
to many classifications. It includes very interesting ideas.
45.
CIDOC-CRM : Events
• In CIDOC-CRM, the modelling is event-centric.
• The underlying idea is to model change, not state. Therefore, temporal
entities play a central role.
• Instead of coding the birthdate of an actor, it is better to code the event
of their birth.
46.
Actors relate to things only via temporal entities and events.
47.
CIDOC-CRM : Events
• The participation or presence of several non-temporal entities in an event
e1 allows us to conclude that they were in the same time-interval and
space, even without knowledge of the particular time or space.
• They must have existed at that time. They cannot have been somewhere
else at that time (with electronic communication, the space volume in
which events occur can become very large).
• The events e0i of creation of each participant i happened before or
at the time of e1. The events e2i of destruction (or vanishing) of each
participant i happened after or at the time of e1.
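The last constraint can be turned into a simple consistency check. The sketch below assumes, for simplicity, that every event is dated by a single year; the class, the function name and the sample values are hypothetical and not part of CIDOC-CRM.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    created: int    # year of the creation event (e0)
    destroyed: int  # year of the destruction or vanishing event (e2)

def inconsistent_participants(event_year, participants):
    """Names of participants that did not exist at the time of the event e1."""
    return [
        p.name
        for p in participants
        if not (p.created <= event_year <= p.destroyed)
    ]

# Hypothetical data: actor B is created only after the event took place.
participants = [
    Participant("A", created=1400, destroyed=1470),
    Participant("B", created=1460, destroyed=1500),
]
problems = inconsistent_participants(1450, participants)
print(problems)
```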
48.
CIDOC-CRM : Properties
• The property P11 had participant denotes active or passive involvement
of Actors, whereas P12 occurred in the presence of extends to objects
just being there (e.g. a desk where a treaty was signed).
• The properties P92 brought into existence and P93 took out of existence
delimit the existence of things which have a persistent existence.
52.
CIDOC-CRM : Place
• CIDOC-CRM has also implemented a very interesting model for places.
What is hard about places ?
• The question "where is it ?" can be answered in natural language by relation
to two different kinds of entities : geometric areas or objects.
53.
In France, in Athens, 39N 124E. Points given by spatial
coordinates are typically understood as the centre of a
wider, extended area.
54.
on Mount St Helens, at the Rhine river.
55.
on Queen Elizabeth (the ship), in my suitcase, at home.
56.
CIDOC-CRM : Place
• Following the CIDOC-CRM, geometric areas (E53 Place) can only be
defined relative to larger objects, including the surface of the Earth.
• Those objects in turn may be located at different times at different places
(relative to a larger object).
• The cultural interest is in the relation to other things and not to an
abstract absolute space. Absolute coordinates seem to make no sense
when the reference objects move.
• As historical information is incomplete and sparse, and many reference
objects move, normalization of place information to absolute coordinates
should not replace the primary information, which is typically relative.
58.
CIDOC-CRM : Influence
• Another problematic issue is the notion of influence. It is difficult to
develop a systematic understanding of the different forms of influence
and their mutual relations.
• Some are more physical, like using a mould or a tool. The influence of a
mould on a produced object is strong and can often be verified on the
object afterwards. The influence of a hammer is less specific.
• Similarly, making a copy of a painting has a strong influence on the
product, copying the idea of a painting, a weak one. The latter is more
an intellectual influence than a physical one.
• If a real influence existed, a temporal sequence can be deduced.
63.
Summary : Guidelines for coding historical data
64.
(1) Prefer events to properties. Actors do not have
properties, they participate in events. Instead of coding the
birthdate of an actor, it is better to code the event of their
birth.
65.
(2) Code date intervals instead of dates. This is much
more flexible and makes it possible to detect inconsistencies.
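One way to see the benefit is to intersect the intervals that different sources give for the same event. In the sketch below, an empty intersection signals an inconsistency, while a non-empty one narrows the estimate. The interval 1588-1591 is the Rialto example used in this course; the two other intervals and all names are hypothetical.

```python
from typing import NamedTuple, Optional

class Interval(NamedTuple):
    start: int  # earliest possible year
    end: int    # latest possible year

def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Years compatible with both intervals, or None if they contradict."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Interval(start, end) if start <= end else None

source_a = Interval(1588, 1591)
source_b = Interval(1590, 1595)   # hypothetical second source
source_c = Interval(1600, 1610)   # hypothetical conflicting source

agreed = intersect(source_a, source_b)    # the two sources narrow each other
conflict = intersect(source_a, source_c)  # no common year: inconsistency
print(agreed, conflict)
```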
66.
(3) Code places in a relative manner and not an absolute
manner. The cultural interest is in the relation to other
things and not to an abstract absolute space. Absolute
coordinates seem to make no sense when the reference
objects move.
67.
All this is very beautiful, but is it sufficient to do the kind
of historical modeling we want to do ? We have an issue,
which one ?
68.
Metaknowledge : Knowledge about how knowledge is
produced.
69.
How can we encode metaknowledge
• Expressed knowledge (RDF triples) is not in the same space as resources
(URIs). We can easily attach new information to a resource but not to a
triple.
• It is not easy to represent metaknowledge like the origin of the
uncertainty linked with a piece of information.
• To overcome this issue we need to introduce two levels of knowledge and
use a trick.
71.
Reified RDF vs. Standard RDF
• An RDF statement (RialtoReconstruction hasTimeSpan 1588-1591) can
be transformed into 3 reified triples
• (s1 rdf:subject RialtoReconstruction)
• (s1 rdf:predicate hasTimeSpan)
• (s1 rdf:object 1588-1591)
• (s1 metardf:reliability 0.8)
• (s1 metardf:creator FredericKaplan)
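The transformation above is mechanical, so it can be sketched as a small function. The function name and its signature are assumptions; the rdf:subject, rdf:predicate, rdf:object and metardf names are the ones used on the slide.

```python
def reify(statement_id, statement, **meta):
    """Turn one (subject, predicate, object) statement into reified triples."""
    subject, predicate, obj = statement
    triples = [
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]
    # Each keyword argument becomes one metadata triple about the statement.
    triples += [
        (statement_id, f"metardf:{key}", value) for key, value in meta.items()
    ]
    return triples

reified = reify(
    "s1",
    ("RialtoReconstruction", "hasTimeSpan", "1588-1591"),
    reliability=0.8,
    creator="FredericKaplan",
)
for triple in reified:
    print(triple)
```

The statement now has its own identifier (s1), which is what makes it possible to attach information to it like to any other resource.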
72.
Possible historical spaces
• Now our RDF store includes both historical knowledge and knowledge
about the creation of this historical knowledge.
• These kinds of metainformation can document all the construction
phases (whether realized by humans or machines)
• With this approach, we can extract through queries the historical
knowledge corresponding to some specific sources and thus create a
possible historical reality.
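A minimal sketch of such an extraction, assuming the reified layout with rdf:subject, rdf:predicate, rdf:object and metardf properties: statements are regrouped by identifier, filtered on their metadata, and turned back into ordinary triples. The second statement (s2), its creator and the function names are hypothetical.

```python
reified = [
    ("s1", "rdf:subject", "RialtoReconstruction"),
    ("s1", "rdf:predicate", "hasTimeSpan"),
    ("s1", "rdf:object", "1588-1591"),
    ("s1", "metardf:reliability", 0.8),
    ("s1", "metardf:creator", "FredericKaplan"),
    # Hypothetical competing statement produced by an automatic process.
    ("s2", "rdf:subject", "RialtoReconstruction"),
    ("s2", "rdf:predicate", "hasTimeSpan"),
    ("s2", "rdf:object", "1585-1590"),
    ("s2", "metardf:reliability", 0.3),
    ("s2", "metardf:creator", "OCRPipeline"),
]

def group_statements(store):
    """Regroup reified triples by statement identifier."""
    statements = {}
    for statement_id, prop, value in store:
        statements.setdefault(statement_id, {})[prop] = value
    return statements

def possible_reality(store, accept):
    """Plain triples of all statements whose metadata satisfies accept."""
    return [
        (f["rdf:subject"], f["rdf:predicate"], f["rdf:object"])
        for f in group_statements(store).values()
        if accept(f)
    ]

trusted = possible_reality(
    reified, lambda f: f.get("metardf:reliability", 0) >= 0.5
)
from_ocr = possible_reality(
    reified, lambda f: f.get("metardf:creator") == "OCRPipeline"
)
print(trusted)
print(from_ocr)
```

Changing the acceptance criterion yields a different, equally documented, reconstruction from the same store.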
74.
Encoding metahistorical information
• We must not only model historical information, but model each step of
the construction of historical knowledge.
• There is a need for a semantic framework capable of coding historical
information and meta-historical information.
• Coding meta-historical information implies documenting the choice of
sources, transcription phases, interpretation processes realized by humans
or machines.
75.
No unique global truth but fully documented possible
historical reconstructions