Recognition and Enrichment
of Archival Documents
Facts and Figures
• READ
• Recognition and Enrichment of Archival Documents
• 13 Partners, coordinated by the University of Innsbruck
• 10 Institutions as associated partners via a Memorandum of
Understanding
• Duration: 1.1.2016 to 30.6.2019
• Grant: EUR 8.2 million
• Objectives
• Applied research in pattern recognition and human language
technology
• Services for archives, humanities scholars, volunteers and
computer scientists
• Network building among those user groups
READ Consortium
READ Partners
University of Innsbruck (co-ordinator)
University of London
Technical University Valencia
Technical University Lausanne
University College London
University of Rostock
National Centre for Scientific Research - Demokritos
XEROX – European Research Centre
Technical University Vienna
University of Leipzig
National Archive Finland
Diocesan Archive Passau
READ MoU Partners
Australian National Library
Gottfried Wilhelm Leibniz Bibliothek
National Library of Spain
Centre virtuel de la connaissance sur l'Europe, Digital Humanities Lab (Luxembourg)
The Linnean Society of London
The Hessian State Archive Marburg (Germany)
The Munch Museum (Norway)
The Civic Archives of Bozen-Bolzano (Italy)
Music and Instrument Museum Leipzig
The University and Research Library Erfurt/Gotha (Germany)
Friedrich-Alexander-Universität Erlangen/Nürnberg
PLANET GmbH. (Germany)
What will remain once the
project has finished its work in
June 2019?
Publications
• H2020 Grant Agreement
• Article 29.2 Open access to scientific publications
• Each beneficiary must ensure open access (free of
charge online access for any user) to all peer-reviewed
scientific publications relating to its results.
• Open Access
• Gold route (publication in an open access journal)
• E.g. Frontiers, from EPFL (Technical University Lausanne)
• Green route (self-archiving in a repository)
• Key Performance Indicator
• 15-25 scientific publications per year
Research Data
• H2020 Grant Agreement
• 29.3 Open access to research data
• Regarding the digital research data generated in the
action (‘data’), the beneficiaries must:
• (a) deposit in a research data repository and take
measures to make it possible for third parties to access,
mine, exploit, reproduce and disseminate — free of
charge for any user — the following:
• (i) the data, including associated metadata, needed to
validate the results presented in scientific publications as
soon as possible;
What are Research Data in READ?
• Images and corresponding Reference Data
(= ground truth)
• Images = Raw material
• Reference data or ground truth = the expected, perfect
output
• Data = what is actually produced by an algorithm/tool
• Example
• Image of a page
• Correct text of a page = reference data
• Data = the text produced by an HTR engine
• Difference between expected result and actual result =
the result of a scientific experiment, e.g. measured as
Word Error Rate
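The comparison between expected and actual result can be sketched in a few lines. The following is a minimal illustration of Word Error Rate, assuming the common definition (word-level edit distance divided by the number of words in the reference text). The function name and the sample texts are invented for this example and are not taken from READ tools or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented sample: reference data vs. the text an HTR engine might produce.
reference_text = "in the year of our lord 1700"
htr_output = "in the yeer of lord 1700"
wer = word_error_rate(reference_text, htr_output)
print(f"WER: {wer:.3f}")
```

In the sample, one word is misread and one is missing, so two errors are counted against seven reference words.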
Research Data are used for…
• Evaluation
• Difference between expected and actual result
• Problem description / requirements specification
• What do we actually expect from an algorithm or tool?
• Simple with HTR, but becomes much more complicated with
Layout Analysis
• E.g. do we need the whole text of a page, or maybe just
person names within one column of a table? Such questions
need to be answered first, and the answers need to be reflected
in the design of the reference data
• Machine learning (training data)
• Machine learning tools need training data
• Reference data are the basis for this training process
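What counts as a "correct" result depends on what the reference data record. As a hedged sketch, suppose the requirement is only the person names in one table column. The evaluation could then compare two sets of names using precision and recall. These two measures are an assumption of this example, not a metric prescribed by the project, and all names below are invented.

```python
def precision_recall(expected: set, produced: set) -> tuple:
    """Compare the names a tool produced with the names in the reference data."""
    correct = len(expected & produced)
    # Precision: share of produced names that are correct.
    precision = correct / len(produced) if produced else 0.0
    # Recall: share of expected names that were found.
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Invented sample: reference data vs. the output of a hypothetical tool.
reference_names = {"Anna Huber", "Josef Mayr", "Maria Gruber"}
tool_names = {"Anna Huber", "Josef Mair", "Maria Gruber"}
precision, recall = precision_recall(reference_names, tool_names)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same page would need different reference data, and a different measure, if the requirement were the full text of the page.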
Research Data in READ
• Key Performance Indicator
• 3 million images with at least 50,000 pages of reference
data at the end of the project
• Why such a large amount?
• Our objective is that the READ dataset is reasonably
representative of many document types in archives, and of
writing and layout styles from several centuries and
languages
• We are therefore very much interested in any kind of
digitised document collection
• Progress in computer science is strongly connected to
the availability of large data sets
Research Data for Competitions
• Key Performance Indicator
• READ will organise several research competitions at various
conferences
• Competitions
• Now a popular way to measure research progress in a
specific field, e.g. line detection, text recognition or
writer retrieval
• Evaluation of competition results
• Depends on the availability of reference data
• Attractiveness of competitions
• Depends on the challenge itself, but also on the size of
the dataset and the quality of the reference data
• EUR 160,000 is earmarked for sub-contracts to produce
reference data
Research Data in READ
• Images will be connected with reference data such as:
• Correct text (e.g. on page or line level)
• Correct writer attribution (e.g. letters with names of writers)
• Correct person names on page level
• Correct layout elements, e.g. text lines, text blocks, tables, or
forms
• Detailed descriptions of tables or forms
• Anything of interest to archives, scholars and the
public!
• Data will be made available e.g. via Zenodo or other
research data platforms
• Archives are encouraged to provide their collections!
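To make the list above concrete, here is one way a page image could be linked to its reference data. This is only an illustrative sketch: the class names, field names, coordinate convention and sample values are all hypothetical and do not describe the format used by READ or Transkribus.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReferenceLine:
    """One text line: its outline on the image and its correct text."""
    outline: List[Tuple[int, int]]  # pixel coordinates (hypothetical convention)
    text: str


@dataclass
class ReferencePage:
    """Reference data (ground truth) attached to one page image."""
    image_file: str
    lines: List[ReferenceLine] = field(default_factory=list)
    writer: Optional[str] = None          # correct writer attribution, if known
    person_names: List[str] = field(default_factory=list)

    def full_text(self) -> str:
        """Correct text on page level, assembled from the line level."""
        return "\n".join(line.text for line in self.lines)


# Invented sample page with two reference lines.
page = ReferencePage(
    image_file="letter_0001.jpg",
    writer="Example Writer",
    person_names=["Example Person"],
)
page.lines.append(
    ReferenceLine(outline=[(10, 20), (400, 20), (400, 60), (10, 60)], text="Dear Sir,")
)
page.lines.append(
    ReferenceLine(outline=[(10, 70), (400, 70), (400, 110), (10, 110)], text="I write to you")
)
print(page.full_text())
```

Keeping the text on line level and deriving the page text from it lets the same record serve both text recognition and layout analysis.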
Open Source Software
• Release as open source
• Not an obligation under the Grant Agreement, but a requirement
of the specific EU e-Infrastructure call
• Foreseen for (nearly) all software tools in the project
• During 2016 we will take the first steps and move parts
of the software to GitHub or a similar platform
• Advantage
• Many tools are research tools and therefore “not easy”
to implement
• The implementation in Transkribus will allow users to try
out the tools beforehand
Interim summary
• Open Access to publications
• E.g. via Open Access publishers
• Open Research Data (images and reference data)
• E.g. via repositories, such as Zenodo (operated by CERN)
• Open Source for the software tools
• E.g. via open software repositories, such as GitHub
An (expert) user will have “everything together” to
dive deeper into the results of the project
Open Platform
Build a platform which provides recognition,
transcription and enrichment of historical documents
as a general infrastructure for archives, libraries,
humanities scholars, volunteers, the public – and
computer scientists.
Why a Platform? (1)
• Software as a Service (SaaS)
• Implementing the full range of tools from READ requires
a lot of work and know-how
• The entry barrier for archives and humanities scholars is
much lower since the services can be accessed and used via
the Internet
• E.g. users are free to upload their documents, to run tests and
to further decide which services they want to use
• Machine Learning
• Most tools require large amounts of training data
• The more data are available in the platform, the higher the
chance of improving accuracy
• E.g. if a user in Greifswald transcribes a German text from
1700, these data may also be used to train the HTR engine for
a user in Bavaria, or in the US
Why a Platform? (2)
• Cooperation
• Successful digitisation projects need collaboration between
content holders, scholars, computer scientists and volunteers
• Platform serves as a mediation tool between these
stakeholder groups
• E.g. they can define requirements, produce reference data,
implement new services, edit and correct results in a shared
manner
• Standardisation
• The full benefits of technology can only be enjoyed if a wide
range of standards is followed
• De-facto standardisation by using the same platform and
tools
• E.g. the real benefit of digital editions will be enjoyed once
they are centrally accessible
Service Platform
• READ Service Platform = Transkribus
• We are obliged to run the service platform from the very first
day of the project
• We are also obliged to provide a business plan in month 12
• And to implement this business plan after month 12
• Final objective
• To keep running and maintaining the service platform after
the end of the project
• A business model needs to be developed
• General approach
• Service levels
• To provide free services for everyone; service fees will
apply only once certain limits are exceeded
Overview of tools and services
• Handwritten Text Recognition
• HTR based on hidden Markov models (HMM) and on neural networks (NN)
• Keyword Spotting
• Query by Example
• Query by String
• Image Preprocessing
• Binarisation, Enhancement
• Layout Analysis
• Basic analysis of words, lines, region types (text, graphical,…)
• Table and Forms Recognition
• Generic and template based recognition
Overview of tools and services
• Document Understanding
• Columns, marginalia, date, etc.
• Automatic Writer Identification and Retrieval
• Training and retrieval of specific writers/writing styles
• Language Toolkit
• Adaptation of language resources to support HTR
• Text2Image matching
• Matching existing text with images
• E-Learning module
• Online training tool for students and volunteers to practise deciphering
of handwritten documents
• ScanApp
• The mobile phone as document scanner with direct connection to the
Transkribus service platform
• And many more…
READ Platform
http://transkribus.eu/
READ Website
http://read.transkribus.eu/ (coming soon)
User’s guide
http://transkribus.eu/wiki/
Thank you very much for your attention!

co:op-READ-Convention Marburg - Günter Mühlberger

  • 1. Recognition and Enrichment of Archival Documents
  • 2. Facts and Figures • READ • Recognition and Enrichment of Archival Documents • 13 partners, coordinated by the University of Innsbruck • 10 institutions as associated partners via a Memorandum of Understanding • Duration: 1 January 2016 to 30 June 2019 • Grant: EUR 8.2 million • Objectives • Applied research in pattern recognition and human language technology • Services for archives, humanities scholars, volunteers and computer scientists • Network building among those user groups
  • 3. READ Consortium • READ Partners • University of Innsbruck (co-ordinator) • University of London • Technical University Valencia • Technical University Lausanne • University College London • University of Rostock • National Centre for Scientific Research - Demokritos • XEROX – European Research Centre • Technical University Vienna • University of Leipzig • National Archive Finland • Diocesan Archive Passau
  • 4. READ MoU Partners • Australian National Library • Gottfried Wilhelm Leibniz Bibliothek • National Library of Spain • Centre virtuel de la connaissance sur l'Europe Digital Humanities Lab (Luxembourg) • The Linnean Society of London • The Hessian State Archive Marburg (Germany) • The Munch Museum (Norway) • The Civic Archives of Bozen Bolzano (Italy) • Music and Instrument Museum Leipzig • The University and Research Library Erfurt/Gotha (Germany) • Friedrich-Alexander-Universität Erlangen/Nürnberg • PLANET GmbH (Germany)
  • 5. What will remain once the project has finished its work in June 2019?
  • 6. Publications • H2020 Grant Agreement • Article 29.2 Open access to scientific publications • Each beneficiary must ensure open access (free of charge online access for any user) to all peer-reviewed scientific publications relating to its results. • Open Access • Golden way • E.g. FrontiersIn from EPFL (Technical University Lausanne) • Green way • Key Performance Indicator • 15-25 scientific publications per year
  • 7. Research Data • H2020 Grant Agreement • 29.3 Open access to research data • Regarding the digital research data generated in the action (‘data’), the beneficiaries must: • (a) deposit in a research data repository and take measures to make it possible for third parties to access, mine, exploit, reproduce and disseminate — free of charge for any user — the following: • (i) the data, including associated metadata, needed to validate the results presented in scientific publications as soon as possible;
  • 8. What are Research Data in READ? • Images and corresponding Reference Data (= ground truth) • Images = Raw material • Reference data or ground truth = the expected, perfect output • Data = what is actually produced by an algorithm/tool • Example • Image of a page • Correct text of a page = reference data • Data = the text produced by an HTR engine • Difference between expected result and actual result = the result of a scientific experiment, e.g. measured as Word Error Rate
  • 9. Research Data are used for… • Evaluation • Difference between expected and actual result • Problem description / requirements specification • What do we actually expect from an algorithm or tool? • Simple with HTR, but becomes much more complicated with Layout Analysis • E.g. do we need the whole text of a page, or maybe just person names within one column of a table? Such questions need to be defined and need to be reflected in the design of the reference data • Machine learning (training data) • Machine learning tools need training data • Reference data are the basis for this training process
  • 10. Research Data in READ • Key Performance Indicator • 3 million images with at least 50,000 pages of reference data at the end of the project • Why such a large amount? • Our objective is that the READ dataset is “somehow” representative of many document types in archives, and of the writing and layout styles of several centuries and languages • We are therefore very much interested in any kind of digitised document collection • Progress in computer science is strongly connected to the availability of large data sets
  • 11. Research Data for Competitions • Key Performance Indicator • READ will organise several research competitions at various conferences • Competitions • Nowadays a popular way to measure the progress of research in a specific field, e.g. line detection, text recognition, or writer retrieval • Evaluation of competition results • Depends on the availability of reference data • Attractiveness of competitions • Depends on the challenge itself, but also on the size of the dataset and the quality of the reference data • EUR 160,000 is foreseen for sub-contracts for the production of reference data
  • 12. Research Data in READ • Images will be connected with reference data such as: • Correct text (e.g. on page or line level) • Correct writer attribution (e.g. letters with names of writers) • Correct person names on page level • Correct layout elements, e.g. text lines, text blocks, tables, or forms • Detailed descriptions of tables or forms • Anything that is of interest to archives, scholars, and the public! • Data will be made available e.g. via ZENODO or other Research Data Platforms • Archives are encouraged to provide their collections!
  • 13. Open Source Software • Release as open source • Not an obligation of the Grant Agreement, but of the specific e-Infrastructure call of the EU • Foreseen for (nearly) all software tools in the project • During 2016 we will take the first steps and move parts of the software to GitHub or a similar platform • Advantage • Many tools are research tools and therefore “not easy” to implement • The implementation in Transkribus will allow users to try out the tools beforehand
  • 14. Interim summary • Open Access to publications • E.g. via Open Access publishers • Open Research Data (images and reference data) • E.g. via repositories, such as ZENODO (run by CERN Data service) • Open Source for the software tools • E.g. via open software repositories, such as GitHub • An (expert) user will have “everything together” to dive deeper into the results of the project
  • 15. Open Platform Build a platform which provides recognition, transcription and enrichment of historical documents as a general infrastructure for archives, libraries, humanities scholars, volunteers, the public – and computer scientists.
  • 16. Why a Platform? (1) • Software as a Service (SaaS) • Implementing the full range of tools from READ requires a lot of work and know-how • The entry barrier for archives and humanities scholars is much lower since the services can be accessed and used via the Internet • E.g. users are free to upload their documents, to run tests and then decide which services they want to use • Machine Learning • Most tools require large amounts of training data • The more data are available in the platform, the higher the chance of improving accuracy • E.g. if a user in Greifswald transcribes a German text from 1700, these data may also be used to train the HTR engine for a user in Bavaria, or in the US.
  • 17. Why a platform? (2) • Cooperation • Successful digitisation projects need collaboration between content holders, scholars, computer scientists and volunteers • Platform serves as a mediation tool between these stakeholder groups • E.g. they can define requirements, produce reference data, implement new services, edit and correct results in a shared manner • Standardisation • Full benefits of technology can only be enjoyed if a large variety of standards is followed • De-facto standardisation by using the same platform and tools • E.g. the real benefit of digital editions will be enjoyed once they are centrally accessible
  • 18. Service Platform • READ Service Platform = Transkribus • We are obliged to run the service platform from the very first day of the project • We are also obliged to provide a business plan in month 12 • And to implement this business plan after month 12 • Final objective • To run and maintain the service platform also after the end of the project • A business model needs to be developed • General approach • Service levels • To provide free services for everyone; service fees will apply only if certain limits are exceeded
  • 19. Overview of tools and services • Handwritten Text Recognition • HTR based on Hidden Markov Models (HMM) and on neural networks (NN) • Keyword Spotting • Query by Example • Query by String • Image Preprocessing • Binarisation, Enhancement • Layout Analysis • Basic analysis of words, lines, region types (text, graphical,…) • Table and Forms Recognition • Generic and template-based recognition
  • 20. Overview of tools and services • Document Understanding • Columns, marginalia, date, etc. • Automatic Writer Identification and Retrieval • Training and retrieval of specific writers/writing styles • Language Toolkit • Adaptation of language resources to support HTR • Text2Image matching • Matching existing text with images • E-Learning module • Online training tool for students and volunteers to practise deciphering of handwritten documents • ScanApp • The mobile phone as document scanner with direct connection to the Transkribus service platform • And many more…
  • 21. READ Platform: http://transkribus.eu/ • READ Website: http://read.transkribus.eu/ (coming soon) • User’s guide: http://transkribus.eu/wiki/ • Thank you very much for your attention!
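
Slide 8 describes the difference between reference data and the output of an HTR engine as something that can be measured, for example as Word Error Rate. A minimal sketch of that measure follows. It assumes the common definition (word-level edit distance divided by the number of words in the reference text) and simple whitespace tokenisation; the slides do not say how READ computes it. The function name word_error_rate is hypothetical and is not part of Transkribus.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (case and punctuation are not normalised).
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + substitution_cost,   # match or substitution
                )
            )
        previous = current

    return previous[-1] / len(ref_words)


if __name__ == "__main__":
    reference_text = "the quick brown fox"   # ground truth for a line
    engine_output = "the quik brown"         # what a recogniser produced
    print(word_error_rate(reference_text, engine_output))
```

In this example the recogniser makes one substitution ("quik" for "quick") and drops one word ("fox"), so two errors are counted against four reference words. Because the result is divided by the reference length, it can exceed 1.0 when the output contains many inserted words.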