3. Overview
• What
• Why
• Focus
• Governance
• Main Activities + Budget
• Applicants
• Centres
• Sustainability
• Schedule
• Support
4. What I
• NL part of the CLARIN + DARIAH
infrastructures
• A research infrastructure in which a
humanities researcher
– Can find all data relevant for the research
– Can find all tools relevant for the research
– Can apply the tools to the data without any technical
background or ad-hoc adaptations
• Intelligent search in and through the data
• Information extraction, analysis, aggregation,
visualisation, validation, conversion, enrichment,
annotation, …
– Can store data resulting from the research
– Can store tools resulting from the research
5. What II
• Virtual and distributed
• Based on one or more centres per country
6. Why I
• Enormous increase of available data
• Data are ‘rich’
– complex, fuzzy, ambiguous,
heterogeneous, have a time dimension
• Data are digital
– (Advanced) digital tools can be used to support
humanities research
– They must be used to cope with the quantity of data
7. Why II
• Big opportunity to bring
humanities research to a new level
– Empirical basis increased by orders of
magnitude
– Information hidden in these data can be
disclosed and analysed
– Will enable new research questions
– Existing research questions can be addressed
in new ways
– Quality, effectiveness and efficiency increase
– Potential for ground-breaking research
8. Focus
• 3 humanities disciplines
– Language studies
– Media studies
– Socio-economic history
9. Focus
• Why these 3?
– Language studies: core of CLARIN
– Socio-economic history: core of DARIAH
– All are forerunners in the use of digital data
and tools
– Their dominant data types cover the whole
spectrum:
• Language studies: text
• Media studies: audio-visual data
• Socio-economic history: structured data
10. Focus
• Media studies:
– Text as carrier of cultural content / information, vs. text as object of inquiry in language studies
– Also an important aspect of CLARIN
– Crucial for other humanities disciplines
• Cross-fertilization
– NLP techniques enable extraction of
information from texts for storage in
structured databases (e.g. socio-economic
data)
– Speech recognition + image recognition enable
advanced indexing of audio-visual material
• Solid foundation for future extension to
other disciplines
11. Focus
• For each discipline a core team
– Language studies
• Researcher: Sjef Barbiers (Meertens / UU)
• ICT researcher: Antal van den Bosch (RUN)
• Data centre: The Language Archive (MPI+)
– Media studies
• Researcher: José van Dijck / Julia Noordegraaf (UvA)
• ICT researcher: Maarten de Rijke / Cees Snoek (UvA)
• Data centre: NISV
– Socio-economic history
• Researcher: Jan Luiten van Zanden (UU)
• ICT researcher: Frank van Harmelen (VU)
• Data centre: IISH
12. Governance
• Main features
– Based on CLARIN-NL governance
– Consortium will be formed + consortium
agreement
– Executive Board: ‘lean and mean’ team
• General, technical aspects, user aspects,
dissemination, outreach, education, training
– Supervisory Board (Raad van Toezicht)
– National Advisory Panel
– International Advisory Panel
13. Main Activities
• Technical Implementation including (continuation
of) Centre set-up
• Interoperability: concrete implementations to
realize and test interoperability
– Formal and semantic interoperability
– Metadata, data, and software
– Linking publications to resources (enhanced publications)
– Compatible with CLARIN and DARIAH
• Intelligent Search: concrete implementations of
searching for, in and through data
• Enrichment/annotation, information extraction,
analysis, aggregation and visualisation software
14. Main Activities
• Data Curation
• Software Curation + demonstrators
• Research Pilots: test in a small research project
whether CLARIAH-functionality indeed supports
the research
• Education & Training
• Dissemination & Outreach
• Management
• EU-oriented activities
– E.g. cooperation projects with other countries
• Budget: approx. €18 million (still to be finalized)
15. Applicants
• Small number
– Required by template
– Recommended by experts
• In 2011 the ‘penvoerder’ (lead applicant) was UU; now it is a KNAW institute
• Applicants:
– KNAW institute: Lex Heerma van Voss
– Intended director: Jan Odijk
– 3 Humanities researchers
• Sjef Barbiers
• José van Dijck
• Jan Luiten van Zanden
• Others involved sign a “Letter of Intent” to
participate in CLARIAH and become consortium
members
16. Centres
• The Language Archive (TLA, MPI+)
• Netherlands Institute for Sound and Vision
(NISV)
• International Institute for Social History (IISH)
• Data Archiving and Networked Services (DANS)
• Huygens Institute
• Institute for Dutch Lexicology (INL)
• Meertens Institute
• National Library (KB)
• University Libraries
• …
17. Sustainability
• What after the CLARIAH project?
– Centres provide data / services independently
of CLARIAH (before, during and after)
– Concrete commitment by KNAW of 0.5M euro /
year for 5 years after CLARIAH to maintain the
infrastructure
– We have to organize ourselves to be able to
run the services as efficiently as possible
– For software sustainability close collaboration
with NL eScience Centre, cf. recent start of
‘Alliance for Software Sustainability’ (DANS
and NL eScience initiative)
18. Schedule
• 1 Oct 2013: Submission deadline
• Oct-Dec 2013: Consultation of referees; rebuttal by submitters; recommendations by the NWO-gebiedsbesturen (NWO divisional boards)
• Jan 2014: First committee meeting
• Mar/Apr 2014: Site visits; second committee meeting
• Early May 2014: Committee recommendation to NWO AB
• End May 2014: NWO AB decision
• End May / early June 2014: AB decision to Minister
• Mid 2014: Minister informs Tweede Kamer; NWO informs submitters
• 1 Jan 2015 (if awarded): Start of CLARIAH
19. Support I
• Support by public and private
organisations
– Many of the data and technologies used in
CLARIAH are directly relevant for public
organisations and companies, e.g.
• Intelligent information extraction from a
heterogeneous set of ‘rich’ data
• Either as a customer or as a developer of
such technology
20. Support II
– Close involvement of IBM from the start
– Support by many public institutes and
companies
• Both for CLARIN-NL and for the 2011 CLARIAH proposal