The document describes Humanities Networked Infrastructure (HuNI), an Australian project that aggregates data from 30 humanities datasets containing over 740,000 entities. HuNI harvests records from the source datasets and maps them to six basic categories, but does not import relationships between entities or perform de-duplication. HuNI allows users to create their own collections of entities, link entities, and record their views on the data. It aims to organize heterogeneous data in a way that does not entirely pre-determine relationships or impose a strong conceptual framework, instead encouraging multiple interpretations through user-contributed collections and links.
2. HuNI (Humanities Networked Infrastructure)
• Aggregates data from 30 different Australian humanities
datasets
• Data are defined as entities occurring in the source
datasets: 740,000 entities in all
• Harvested records are mapped to one of six basic
categories
• No imported relationships between entities
• No de-duplication of entities
10. Challenges for HuNI
• How to organize and link heterogeneous data for
browsing – without entirely pre-determining the
structure and relationships
• How to make the aggregated data useful – without
imposing too much of a conceptual framework
• How to respect the different disciplinary perspectives
reflected in the source datasets
• Researchers need to be able to record and share their
views about the data
11. HuNI Record Category
Concept, Event, Organisation, Person, Place, Work
• PERSON: A natural person
• ORGANISATION: A company, club, trust, gallery, political party, etc.
• WORK: A cultural artefact or “man-made” thing created by someone, that has some existence in its own right, either physical or digital
• PLACE: A real, spatial location
• EVENT: An activity that occurs in space and time and may involve people, organisations, places, works, etc.
• CONCEPT: Something whose existence is primarily mental
http://wiki.huni.net.au/display/DS/Data+Model
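As a rough illustration of the harvesting step, the sketch below maps source records to one of the six categories and keeps every record as its own entity, with no de-duplication. The class names, the `TYPE_MAP` table, the dataset names and the sample records are all hypothetical, not taken from the HuNI codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CONCEPT = "Concept"
    EVENT = "Event"
    ORGANISATION = "Organisation"
    PERSON = "Person"
    PLACE = "Place"
    WORK = "Work"


@dataclass(frozen=True)
class Entity:
    source: str       # source dataset name (hypothetical values below)
    source_id: str    # identifier within that dataset
    category: Category
    name: str


# Hypothetical mapping from source-specific record types to the six categories.
TYPE_MAP = {
    "actor": Category.PERSON,
    "theatre company": Category.ORGANISATION,
    "film": Category.WORK,
    "venue": Category.PLACE,
    "performance": Category.EVENT,
    "genre": Category.CONCEPT,
}


def harvest(source, records):
    """Map raw (id, type, name) records to entities; no merging is attempted."""
    return [Entity(source, rid, TYPE_MAP[rtype], name)
            for rid, rtype, name in records]


aggregate = harvest("dataset_a", [("1", "actor", "Jane Example")])
aggregate += harvest("dataset_b", [("77", "actor", "Jane Example")])
# Both records remain separate entities, even though the names match.
```

Keeping the source dataset and source identifier on each entity is one way to preserve provenance when nothing is merged.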
17. HuNI: creating collections
• Users are able to create their own collections of data
• They can create categories and classifications, and assign individual
entities to them
• Users can choose whether to make these collections public
• The list of public collections can be seen and browsed
• Individual entities show which public collections they belong to
• The graph for each entity also shows its membership of a public
collection
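The collection behaviour described above could be modelled along these lines. This is a minimal sketch: the `Collection` class, the private-by-default setting and the sample data are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    owner: str
    title: str
    public: bool = False  # assumption: private unless the user opts in
    entity_ids: set = field(default_factory=set)


def public_collections(collections):
    """The browsable list of collections that users chose to publish."""
    return [c for c in collections if c.public]


def memberships(entity_id, collections):
    """Titles of public collections that contain the given entity."""
    return sorted(c.title for c in collections
                  if c.public and entity_id in c.entity_ids)


collections = [
    Collection("alice", "Touring companies", public=True, entity_ids={"e1", "e2"}),
    Collection("bob", "Private notes", entity_ids={"e1"}),
    Collection("carol", "Melbourne theatre", public=True, entity_ids={"e1"}),
]
```

Here `memberships("e1", collections)` lists only the two public collections, so the private one never appears on the entity's page.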
23. HuNI: socially-linked data
• Users are also able to create links between entities
• These links are public, by default
• There are no pre-determined links between entities
• Users can add to each other’s links, including disagreeing
with them or contradicting them
• Links can describe any kind of reciprocal relationship
• There is no pre-determined ontology or vocabulary of
relationships
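One possible way to represent such links is shown below: each link carries a free-text label for both directions, and conflicting assertions by different users are stored side by side. The `Link` fields, the entity identifiers and the example labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    author: str
    subject: str   # entity id
    label: str     # free text, no controlled vocabulary
    inverse: str   # free text for the reverse direction
    obj: str       # entity id


def statements_about(entity_id, links):
    """Every user assertion touching an entity, read from that entity's side."""
    out = []
    for link in links:
        if link.subject == entity_id:
            out.append((link.author, link.label, link.obj))
        if link.obj == entity_id:
            out.append((link.author, link.inverse, link.subject))
    return out


links = [
    Link("alice", "person:1", "taught", "was taught by", "person:2"),
    # A second user disagrees; both assertions are kept.
    Link("bob", "person:1", "never met", "never met", "person:2"),
]
```

Storing the author with each link is what lets contradictory views coexist without one overwriting the other.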
28. HuNI: classification and categorization 1
• Specific individual entities and phenomena are the focus of the HuNI
data aggregate
• There is as little pre-defined classification and categorization as
possible
• HuNI avoids hierarchical ontological structures (= “flat ontologies”?)
• Entities are organized and presented primarily so that researchers can
work with them and manipulate them – classifying entities into
collections and creating links between individual entities
• HuNI is not organizing and presenting the entities so as to reflect an
authoritative classification or organization of knowledge
29. HuNI: classification and categorization 2
• Not organizing the entities for structured or faceted search and
retrieval
– Only indexing them for a basic keyword search
• Not organizing them into browsable semantic hierarchies
– Providing only basic browsing via the six categories (and the list of
source datasets)
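To make the contrast with faceted search concrete, the sketch below implements only the two access paths named above: a plain keyword index over entity names, and browsing by category or source dataset. The sample entities and function names are hypothetical, and this is not a description of how HuNI's own index is built.

```python
from collections import defaultdict

entities = [
    {"id": "e1", "category": "Person", "name": "Jane Example", "source": "dataset_a"},
    {"id": "e2", "category": "Work", "name": "The Example Play", "source": "dataset_b"},
    {"id": "e3", "category": "Place", "name": "Example Theatre", "source": "dataset_a"},
]

# Inverted index: lower-cased word -> ids of entities whose name contains it.
index = defaultdict(set)
for e in entities:
    for token in e["name"].lower().split():
        index[token].add(e["id"])


def keyword_search(query):
    """Ids of entities whose names contain every word of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


def browse(category=None, source=None):
    """Basic browsing by category and/or source dataset, nothing deeper."""
    return [e["id"] for e in entities
            if (category is None or e["category"] == category)
            and (source is None or e["source"] == source)]
```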
• HuNI is trying to find a middle ground between:
– The linguistic and conceptual limitations of “search”
– The imposition of a single “normative” ontology or classificatory
semantic structure
30. HuNI: vernacular classification
• The user-contributed collections and links give meaning to the data
• Multiple interpretations and perceptions of relationships between
entities are encouraged – even if these are contradictory
• Users can express the relationships they see in the data – including
classifications and categorizations
• HuNI resists a single normative or expert interpretation or
classification of the data
• HuNI encourages the sharing of different perspectives by researchers
and other users
31. Dr Toby Burrows
Marie Curie Fellow
Department of Digital Humanities
King’s College London
26-29 Drury Lane
London WC2B 5RL
toby.burrows@kcl.ac.uk
@tobyburrows
tobyburrows.wordpress.com
32. Alternative approaches
• Search – use ontologies to classify search results (facets)
• Topic modeling – automatic generation of semantic categories
and relations from text-based Natural Language Processing
• Linked Data with light categorization for reasoning
– Vocabularies & thesauri encoded for the Semantic Web
(SKOS)
• Social tagging or “folksonomies”
– Tags are applied to entities
– There is no formal classification or categorization of concepts
– There are no relationships between tags (other than being used to tag the same entity)
– Research into deriving ontologies from social tagging
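The only relationship a folksonomy offers between tags is co-occurrence on the same entity. A minimal sketch of counting it, with made-up taggings, might look like this; such counts are one plausible starting point for the ontology-derivation research mentioned above, not a method taken from it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taggings: entity id -> set of tags applied by users.
taggings = {
    "artist:1": {"trip-hop", "electronic", "bristol"},
    "artist:2": {"trip-hop", "electronic"},
    "artist:3": {"rock", "bristol"},
}

# Count how often each unordered pair of tags is applied to the same entity.
cooccurrence = Counter()
for tags in taggings.values():
    for a, b in combinations(sorted(tags), 2):
        cooccurrence[(a, b)] += 1
```

Pairs are stored in alphabetical order, so `cooccurrence[("electronic", "trip-hop")]` is the lookup for those two tags.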
33. Massive Attack Tags (last.fm)
00s, 80s, 90s, acid jazz, alternative, alternative dance, alternative rock, ambient, atmospheric, beautiful, bristol, bristol sound, british, chill, chill out, chillout, dance, dark, downbeat, downtempo, dub, easy listening, electro, electronic, electronica, england, english, experimental, favorite, favorites, favourite, female vocalists, hip hop, hip-hop, house, hypnotic, idm, indie, indie rock, industrial, instrumental, jazz, lounge, male vocalists, massive attack, mellow, pop, psychedelic, rap, relax, rock, sexy, soul, soundtrack, techno, trance, trip hop, trip-hop, triphop, uk