Presentation by Toby Burrows and Deb Verhoeven to the Fifth National Forum of AeRO (the Australian eResearch Organization), held in Perth on 26 July 2013. The presentation gives an overview of the HuNI Project as at July 2013. Topics covered include: data ingest and alignment from 28 Australian humanities datasets; building HuNI’s discovery functionality; and designing Virtual Laboratory tools for researchers.
3. HuNI: BROAD BENEFITS
• Ensure that Australian cultural datasets and the research associated with them become part of the emerging international Linked Open Data environment
• Enable research enquiries to move easily from "what is?" to "where is?"
• Support the role of annotation and metadata in the discovery of new knowledge, or the means to elucidate new knowledge
• Position the idea of data as both a subject and an object of analysis in the humanities
• Contribute to debates around standards for development and implementation
4. HuNI: SPECIFIC BENEFITS
• Enable humanities researchers to work with cultural datasets more efficiently and effectively, and on a larger scale
• Encourage the systematic sharing of research data between humanities researchers (including the cultural dataset curators themselves), the community and cultural institutions
• Encourage a greater level of cross-disciplinary and interdisciplinary research, both within the humanities and creative arts and between the humanities/creative arts, other disciplines and the wider public
• Support innovative methodologies such as network analysis, game theory and 'virtual history' that rely on large-scale datasets
5. INTEROPERABILITY
1. Organizational level: aligning the goals and processes of the institutions involved
2. Semantic level: aligning the meaning of the exchanged digital resources
3. Technical level: implementing data interoperability requires both data integration and data exchange processes, as well as enabling effective use of the data that becomes available
(Pasquale Pagano, 'Data Interoperability', GRDI2020)
4. Project level: the advent of more complex 'big humanities' projects requires multi-disciplinary personnel, which in turn entails the management of different workflows and expectations: developing a consortial approach, arriving at a common definition of project methods, etc.
6. 1. The PARTNERSHIP
Consortium led by Deakin University:
• Cultural data providers (10) – project co-operators
• Humanities software developer (1) – project co-developers
• eResearch organisations (2) – lead development agencies: VeRSI and Intersect
7. HuNI PARTNER DATASETS
• Media (film, cinema, theatre, newspapers, magazines, advertising, music, live performances): AMHD, MAP, CAARP, Bonza, AFIRC, Circus Oz, AusStage
• Biographical (artists, designers, writers, significant people, scientists, Sydney demographics): DAAO, AustLit, AWR, ADB, EOAS, DoS
• Indigenous languages: AUSTLANG, Mura
15. Welcome to the Cinema and Audiences Research Project (CAARP) database: an online encyclopaedia of cinema-going in Australia.
This site contains information on film screenings and venues in Australia, from 1846 to now:
• 430,137 screenings
• 10,256 films
• 1,978 cinemas
• 1,649 companies
16. FINANCIAL COLLABORATION
• NeCTAR investment of $1.33M
• Partner contributions of $480,000
• Partner in-kind contributions amounting to >$1M
17. COMMUNITY BUILDING
• Collated user stories (20)
• Online showcase events – the next one is 4 September 2013
• Link to the alpha prototype available shortly on huni.net.au; feedback buttons
• Wider beta launch at eResearch Australasia in October 2013
• Stay up to date through our monthly newsletter and blog feed
• Follow us on Twitter – @HuNIVL
18. 2. INTEGRATING MEANING
Information design challenge: to use Linked Data and ontologies/vocabularies so that data can be aligned and mapped.
• Reading the data: the characteristics of the data determine the ontological components selected and the major entities
• Major entities identified as: people, organizations, events, relationships, places, dates, resources and subjects
• Components from existing ontologies being reused or considered: CIDOC-CRM, FOAF, FRBR, FRBR-OO, BIBFRAME and PROV-O
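The alignment described above can be pictured with a small sketch. This is not HuNI code: the entity URI, the flat record layout and the particular FOAF and CIDOC-CRM terms chosen are illustrative assumptions. It only shows the general idea of expressing one harvested person record as triples that reuse terms from the ontologies named on the slide.

```python
# Hypothetical sketch: map a flat harvested person record onto
# RDF-style (subject, predicate, object) triples reusing FOAF and
# CIDOC-CRM terms. URIs and field names here are illustrative only.

FOAF = "http://xmlns.com/foaf/0.1/"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def person_to_triples(entity_uri, record):
    """Turn a flat person record (dict) into a list of triples."""
    triples = [
        (entity_uri, RDF_TYPE, FOAF + "Person"),
        (entity_uri, RDF_TYPE, CRM + "E21_Person"),  # CIDOC-CRM person class
        (entity_uri, FOAF + "givenName", record["first_name"]),
        (entity_uri, FOAF + "familyName", record["last_name"]),
    ]
    if record.get("bio"):
        triples.append((entity_uri, CRM + "P3_has_note", record["bio"]))
    return triples

record = {"first_name": "Nellie", "last_name": "Melba", "bio": "Soprano."}
triples = person_to_triples("http://example.org/person/1", record)
```

Declaring the same entity as both `foaf:Person` and CIDOC-CRM `E21_Person` is one simple way components from several ontologies can be combined over the same record.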
22. 3. HuNI FUNCTIONALITY
[Architecture diagram. Partner side: partner datasets (ADB, DAAO, CAARP, AFIRC, AusStage) update and publish data via Corbicula. HuNI side: data is harvested, transformed and ingested, then analysed and mapped into two aggregates – a Solr Search Server (HuNI Data) and an RDF Triple Store (HuNI Linked Data). The HuNI Virtual Laboratory sits on top, supporting scholarly, public and citizen researcher workflow tasks – data discovery (simple search, advanced search, deep SPARQL-based search), data analysis (save search results as a private collection; refine/expand, analyse, annotate and export collections) and data sharing (share search results, collections and analyses) – plus admin tasks (registration and login, profile management, history recording, project management).]
23. DATA INTEGRATION
• 28 Australian datasets are being harvested for integration into HuNI.
• Live data feeds are deployed at the partner sites to expose updated partner data as XML.
• HuNI gateway components are deployed on the NeCTAR Research Cloud. They harvest the XML feeds and transform them for ingestion into two HuNI data aggregates: a Solr search server and a Jena RDF Triple Store.
[Diagram: the data-integration portion of the architecture diagram – partner datasets update and publish via Corbicula; HuNI harvests, transforms and ingests the feeds into the Solr Search Server and the RDF Triple Store.]
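The harvest-and-transform step can be sketched with the standard library. The flat XML layout below is a hypothetical illustration of the "lowest common denominator" record the project describes (one flat file per entity class, plus uniquely identifying information); the real HuNI feeds, identifiers and field names may differ.

```python
# Hedged sketch of harvesting a partner XML feed and flattening each
# person record into a dict ready for submission to a search index.
# The feed layout, record id and field names are assumptions.
import xml.etree.ElementTree as ET

SAMPLE_FEED = """
<people>
  <person id="adb-12345">
    <first_name>Edith</first_name>
    <last_name>Cowan</last_name>
    <date_of_birth>1861</date_of_birth>
    <date_of_death>1932</date_of_death>
    <occupation>Politician</occupation>
  </person>
</people>
"""

def transform(feed_xml, source):
    """Turn a partner XML feed into flat dicts, one per person record."""
    docs = []
    for person in ET.fromstring(feed_xml).findall("person"):
        # Namespace the id with the source dataset so ids stay unique
        # across all partner feeds in the shared aggregate.
        doc = {"id": f"{source}:{person.get('id')}", "entity_class": "person"}
        for field in person:
            doc[field.tag] = field.text
        docs.append(doc)
    return docs

solr_docs = transform(SAMPLE_FEED, "ADB")
```

In the real pipeline the resulting documents would then be posted to the Solr server; that network step is omitted here so the sketch stays self-contained.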
25. TECHNOLOGY STACK for VL TOOLS
• Front-end frameworks – AngularJS and Twitter Bootstrap (single-page Web app)
• Tools hosting framework – OpenSocial via Apache Shindig
• Back-end framework – Spring MVC via Spring Roo
• Layer integration – RESTful Web services
26. TOOLS for RESEARCHERS
A researcher with a HuNI account will be able to:
• Search the HuNI data
• Save their search results as a private collection
• Refine their collection through additional searches
• Analyse and annotate their collection with their own assertions and commentary
• Export their collection for further analysis
• Publish and share their collections and analyses
[Diagram: the data discovery, analysis and sharing tasks from the Virtual Laboratory architecture diagram, backed by the Solr Search Server.]
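The collection workflow above can be illustrated with a small sketch. This is not HuNI's implementation: the class, its methods and the JSON export format are assumptions made for the example; it only shows the shape of the save / refine / annotate / export cycle.

```python
# Illustrative sketch of the researcher workflow: save search results
# as a private collection, refine it with further searches, annotate
# it, and export it for analysis elsewhere. Names are hypothetical.
import json

class VirtualCollection:
    """A private collection of records with researcher annotations."""

    def __init__(self, name):
        self.name = name
        self.records = {}      # record id -> record (keyed to de-duplicate)
        self.annotations = []  # researcher assertions and commentary

    def add_results(self, results):
        """Refine/expand the collection with another batch of search results."""
        for record in results:
            self.records[record["id"]] = record

    def annotate(self, text):
        self.annotations.append(text)

    def export(self):
        """Export the collection (JSON here; the real format may differ)."""
        return json.dumps({"name": self.name,
                           "records": list(self.records.values()),
                           "annotations": self.annotations})

coll = VirtualCollection("Early WA cinema")
coll.add_results([{"id": "caarp:1", "title": "His Majesty's Theatre"}])
coll.add_results([{"id": "caarp:1", "title": "His Majesty's Theatre"},
                  {"id": "adb:2", "title": "Edith Cowan"}])  # duplicate dropped
coll.annotate("Venue appears in screenings from 1904.")
```

Keying records by id means repeating a search only refines the collection rather than duplicating entries.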
27. TOOLS for RESEARCHERS (2)
Researchers will be able to:
• perform a "deep search" of the graphs in the RDF Triple Store;
• browse by high-level facets.
The large-scale aggregation of Linked Data makes explicit the relationships and connections between records across all the partner datasets, enabling the researcher to construct more complex semantic queries.
[Diagram: the deep (SPARQL-based) search task from the Virtual Laboratory architecture diagram, backed by the RDF Triple Store.]
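HuNI's deep search runs SPARQL against the Jena triple store; this standard-library sketch only illustrates the underlying idea, namely that once records from different partner datasets sit in one graph, a single query can follow links across them. The triples, URIs and predicate names below are hypothetical.

```python
# Toy in-memory graph mixing records from several (hypothetical)
# partner datasets, plus a helper that follows a chain of predicates,
# roughly what a SPARQL property-path query expresses.

TRIPLES = [
    ("adb:person/1", "performedIn", "ausstage:event/9"),
    ("ausstage:event/9", "heldAt", "caarp:venue/4"),
    ("caarp:venue/4", "locatedIn", "place:Perth"),
]

def follow(subject, *predicates):
    """Follow a chain of predicates from a subject; None if the chain breaks."""
    node = subject
    for pred in predicates:
        matches = [o for s, p, o in TRIPLES if s == node and p == pred]
        if not matches:
            return None
        node = matches[0]  # for simplicity, take the first match
    return node

# A cross-dataset question: through which place is this biographical
# record connected to a cinema venue?
place = follow("adb:person/1", "performedIn", "heldAt", "locatedIn")
```

No single partner dataset contains the whole chain; the answer only becomes queryable because the aggregation makes the cross-dataset links explicit.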
36. 4. The PROJECT
HuNI staff:
• project director/community liaison (20%)
• project manager (100%)
• technical coordinator (100%)
• information services coordinator (90%)
• community engagement (30%)
• communication coordinator (20%)
• administrative support (20%)
• software developer(s)
[Governance chart: NeCTAR Directorate; HuNI Steering Committee; Team HuNI; Technical Working Group; Expert Advisory Group; Expert Data Group.]
39. HuNI: a virtual laboratory for the humanities
http://huni.net.au • @HuNIVL
Editor's Notes
Presenting on behalf of Professor Deb Verhoeven, the Project Director
HuNI is one of the VLs funded under the NeCTAR VL programme. Don't need to explain NeCTAR to this audience? Focus on "data-centred workflows" as a challenge for the humanities.
Different kinds of interoperability – use these to structure the rest of the presentation
Organizational interoperability
Currently funded until 31 January 2014 – funding began June 2012
The HuNI VL needs an active community of early adopters and advocates – beyond the current partnership
Semantic interoperability
This was the initial (Phase 1) ingestion workflow – subsequently revised. Data sources were ingested into the RDF Triple Store and structured using components from existing ontologies.
Building a core ontology to which partner data can be aligned and mapped. Components of the CIDOC-CRM, FOAF and FRBR-OO ontologies were re-used for the integration of the initial datasets. The initial focus was on people. More components were then added, especially in relation to events, and to works and expressions. Work is underway to plug in vocabularies using SKOS.
This section of the HuNI ontology shows the "joins" and class relationships where the CIDOC-CRM and FRBR-OO ontologies align. The green bubbles record the CIDOC entities and the red bubbles record the FRBR entities. Bidirectional arrows indicate a "sameAs" relationship; unidirectional arrows indicate a sub-class relationship.
That was semantic interoperability. The third angle is technical interoperability. The diagram shows a high-level view of the various processes; we will look at these separately.
XML publishing options for partners: OAI-PMH harvesting, plus a custom-built solution for non-OAI sites. We are not harvesting all the data – only the primary entity classes common to most partners: people, places, events and objects. The lowest common denominator is a flat XML file per class entity, together with uniquely identifying information. For the person class entity: first name, last name, date of birth/death, bio, occupation. Solr search server: aggregation of harvested XML records. Jena RDF Triple Store: aggregation of stored RDF graphs.
Integration into RDF has proven to be semantically and technically complex, because: the publishing format necessary to allow us to do the mappings is too high a technical barrier for most data custodians; the data analysis and mapping to a common data model is time-consuming and complex; and there are software performance issues. That's why only 6 partner data sources have been aggregated into the RDF Triple Store so far. Work is continuing on this approach. We have also developed a Solr index: XML records are harvested from the partner feeds, transformed, and submitted to the Solr search server. 24 datasets have been aggregated so far, with the remainder in process.
But this is not just a data integration project – a VL also requires tools for researchers to use. We're building a suite of tools for researchers to work with the aggregated data. This is the technology stack being used for the VL tools.
Tasks which can be carried out by researchers against the Solr index
Tasks which can be carried out against the Linked Data aggregate
The VL will support a workflow centred around discovery, analysis and sharing. Here’s the cartoon version of this workflow!
Researchers will be able to: display existing connections between relevant records held within their virtual collection, and add further links between particular records, with commentary describing the relationship between them. The LORE tool (developed at UQ) will be modified to work with HuNI in this way.
Researchers can also export their virtual collections and undertake further analysis in their own tool environment. HuNI will also include a Tool Integration Framework specifying how third-party tools can integrate within the lab and work with HuNI data.
Researchers will have the option to share their virtual collections, and their analyses, with other researchers
Currently in alpha. A link will be made available on huni.net.au soon for testing and feedback.
Fourth element of interoperability – the project level. A collaborative governance structure is in place: Steering Committee plus advisory groups. Staff cover various functions (including some in-kind contributions). Formal project management methodology (PRINCE2). Some challenges: HuNI staff are spread across four states; what are the most effective communication methods, and when to use face-to-face?