Europeana Newspapers Project
Workshop on Refinement and Quality Assessment
University Library "Svetozar Marković"
Belgrade, June 13th 2013
Hans-Jörg Lieder / Ulrike Kölsch
Project Coordinator
Berlin State Library, Germany
Belgrade/June 13th 2013/University Library
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the
Competitiveness and Innovation Framework Programme by the European Community
http://ec.europa.eu/ict_psp
Content
• Project Profile
• Consortium & Stakeholders
• Aims and Objectives
• Adding value
• Where do we go from here?
Consortium & Stakeholders
• 18 partners from 12 countries within the consortium
National and University libraries
Universities
SME
• External partners and stakeholders
Involvement of libraries outside the project consortium via associated and
network partnerships
• Framework
Funded as a Best Practice Network in the ICT PSP program of the
European Commission
Project duration: February 2012 – January 2015
Consortium Partners
01. State Library Berlin, Germany
02. National Library of the Netherlands
03. National Library of Estonia
04. National Library of Austria
05. National Library of Finland
06. State and University Library Hamburg, Germany
07. National Library of France
08. National Library of Poland
09. University of Salford
10. CCS Content Conversion Specialists GmbH
11. Stichting LIBER, Netherlands
12. National Library of Latvia
13. National Library of Turkey
14. University Library of Belgrade
15. University of Innsbruck
16. State Library Dr. Friedrich Tessmann, Italy
17. The British Library, UK
18. Europeana Foundation, Netherlands
Europeana Newspapers Consortium
[Map of consortium partners: NLF, SBB, ONB, NLP, BnF, NLE, SUB HH, USAL, NLL, LIBER, KB, EF, CCS, NLT, UB, UIBK, LFT, BL]
Associated Partners
1. National Library of the Czech Republic
2. National Library of Wales
3. National and University Library Ljubljana, Slovenia
4. National Library of Portugal
5. National and University Library of Iceland
6. National Library of Spain
7. National and University Library Zagreb, Croatia
8. National Library of Belgium
9. St. Cyril and Methodius National Library, Bulgaria
10. National Library of Luxembourg
11. Lucian Blaga Central University Library, Romania
Since April 2013 the project has had eleven associated partners and has started
intensive networking with further libraries
Europeana Newspapers: Aims and Objectives
• Refinement methods for OCR, OLR (article segmentation),
Named Entity Recognition (NER) and class recognition
Creation of 18 million pages of digitised newspapers
- 10 million refined pages: OCR (UIBK, Austria)
- 2 million refined pages: OCR/OLR (article segmentation) (CCS, Germany)
Delivery of 8 million pages already available locally
• Quality evaluation and prediction tools
• Aggregation and refinement of newspapers for The European Library
and Europeana
• Metadata: best practice recommendation for
Creation of OCR-ready images
Full-texts and associated metadata
NER
• Dissemination: Further libraries are encouraged and supported in
contributing newspaper content to Europeana
Value: Europeana Newspapers spreads best practice
Europeana Newspapers supports the creation of a larger window
into European culture by:
• Developing best practice for the digitisation of newspapers
• Sharing best practice and experiences through workshops with project partners,
associated partners, and networking partners
• Publishing best practice on our website
• National Information days
Added Value: Aggregation
Activities focused on three key messages:
1. The project and its outcomes (e.g. online access to a
collection of high-quality digitised newspapers);
2. The technological challenges (e.g. techniques for refining
content and the development of a standardised metadata
model);
3. The content-related issues (e.g. improving the extent of
newspaper digitisation, the changing nature of historical
research).
The European Library
• A single library domain aggregator
• Content from major European libraries
• Dedicated newspaper content browser
• Full-text search capabilities
• Portal for researchers
Added Value: Scenarios
• Keyword and Phrase Search
• Image Browsing
• Access via content structure (OLR and NER results)
• Geo-location based service
• Text mining
• Crowd sourced correction and enrichment
• Access through mobile apps
• ...
Where are we now?
• OCR processing completed for almost four million newspaper pages
• Specification of use scenarios available
• Initial versions of evaluation tools available
• Europeana Newspapers survey report
• Development of three tools to support highly standardised data
creation, data controlling and data delivery within the project
• Metadata recommendations ready to be published in October 2013
• Specifications for content browser
• CCS has started work (OLR)
• Dissemination and Information
- Established associated and networking partnerships
Where do we go from here?
More newspaper content
• Most libraries have digitised less than 10% of their physical
newspaper collection
More recent content
• 20th century content unavailable or only available under licence at
national level: need to work with publishers and rights holders
Exploit richness of European digitised newspaper collections
• OCR is not applied across the board and is often applied only selectively
Improved accessibility
• Richness of content has a knock-on effect on accessibility (e.g. full
text search)
Why newspapers? …and how, anyway?
"Die Zeitungen sind die Sekundenzeiger der Geschichte."
(Newspapers are the second hands of history.)
(This hand, however, is not only of inferior metal to the other hands, it also
seldom works properly.)
Arthur Schopenhauer
Relevant to all customers/citizens
Relevant to regional and European policies incl. Europeana
Newspaper holdings in public institutions are…
• … sometimes: solid and complete, beautifully bound; excellent microfilm copies
• … frequently: frail and crumbly, missing editions, incomplete supplements,
poorly bound; poor microfilm copies, legal uncertainties with contemporary
material
Thank you for your attention!
Contact:
hans-joerg.lieder@sbb.spk-berlin.de
ulrike.koelsch@europeana-newspapers.eu
For more information, please see www.europeana-newspapers.eu
or follow our project news via Twitter (@eurnews) and
Facebook (https://www.facebook.com/EuropeanaNewspapers)