SlideShare uma empresa Scribd logo
1 de 15
Copyright: Olmsted County Historical Society 
Europeana Newspapers 
…in a nutshell 
Newspapers in Europe and the Digital 
Agenda for Europe - Final Workshop 
29 September 2014, London, British Library 
Clemens Neudecker, State Library Berlin 
@cneudecker
Facts & Figures 
• Europeana Newspapers – EU ICT-PSP Best Practice Network 
• Started in February 2012 and will run until January 2015 
• 18 partners, 11 associated partners, 22 networking partners 
(28 countries involved) 
• Total budget: €5.16M – EC contribution: €4.12M 
• Project coordination: State Library Berlin / Preußischer Kulturbesitz 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 2
Europeana Newspapers is all over Europe…and beyond 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
3 
Red = Project 
Partners 
Blue = Associated 
Partners 
Green = Networking 
Partners
Refinement - we‘re scaling it up! 
• 8 million pages refined with Optical Character Recognition (OCR) 
• 2 million pages refined with Optical Layout Recognition (OLR) 
• Technical resources for Named Entity Recognition (NER) in 
three languages (Dutch, German, French) 
• Metadata for >18 million pages ingested to Europeana 
 In comparison: currently provides access to 
8,056,532 pages 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
4
Quality & Performance 
Bag of Words OCR Evaluation 
Per Language 
100% 
90% 
80% 
70% 
60% 
50% 
40% 
30% 
20% 
10% 
Layout Analysis Performance 
Per evaluation profile 
Bag of Words OCR Evaluation 
Per Font 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
Bag of Words OCR Evaluation 
5 
82.4% 
85.3% 
80.9% 
75.9% 
67.5% 
83.4% 84.1% 
68.1% 
93.1% 
57.6% 
87.0% 
68.3% 
76.1% 
82.6% 
54.1% 
32.7% 
100% 
90% 
80% 
70% 
60% 
50% 
40% 
30% 
20% 
10% 
0% 
Success Rate 
Language Setting 
71.9% 
74.3% 
80% 
75% 
70% 
65% 
60% 
55% 
50% 
Index based Count based 
Success Rate 
Bag of Words OCR Evaluation 
Index based rate vs. count based rate 
79.1% 
62.2% 
55.9% 
58.8% 
94.7% 
0% 
Keyword 
search 
Phrase search Access via 
content 
structure 
Print/ebook 
on demand 
Content 
based image 
retrieval 
Success Rate (harmonic, area based) 
Evaluation Profile 
67.3% 
81.4% 
64.0% 
100% 
90% 
80% 
70% 
60% 
50% 
40% 
30% 
20% 
10% 
0% 
Gothic Normal Mixed 
Success Rate 
Font 
FineReader vs. Tesseract 
75.3% 
53.78% 
100% 
90% 
80% 
70% 
60% 
50% 
40% 
30% 
20% 
10% 
0% 
Success Rate (count based) 
OCR Engine 
FineReader Tesseract
Access via TEL & Europeana 
• Full text search in TEL Historic Newspapers Browser: 
http://www.theeuropeanlibrary.org/tel4/newspapers 
(recently updated following usability testing) 
• Metadata search in Europeana: 
http://www.europeana.eu/portal 
(now with embedded object presentation via TEL) 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
6
Full-text search 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
7
Browse by date 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
8
Explore on a map 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
9
Title list 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
10
Embedded TEL Viewer in Europeana! 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 11
Metadata Best Practices 
• Europeana Newspapers METS/ALTO Profile (ENMAP) 
• Contributions to ALTO standard v2.x, v3.0 
• Structural metadata with tool support - Structify 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
12
Media, News, Events 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
13
Lots of opportunities for research & reuse 
• Metadata for >18M pages licensed CC0 
• Images & full-text for 10M pages licensed public domain 
• See also: 
http://www.europeana-newspapers.eu/ 
category/ 
interviews-with-researchers/ 
This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the 
Competitiveness and Innovation Framework Programme by the European Community 
http://ec.europa.eu/ict_psp 
yet another way to reuse 
newspapers… 
14
Thank you for your attention! 
@eurnews 
http://www.europeana-newspapers.eu 
http://www.theeuropeanlibrary.org/tel4/newspapers 
http://www.europeana.eu/

Mais conteúdo relacionado

Mais procurados

Europeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers
 
Realising the value of Europe's newspaper heritage
Realising the value of Europe's newspaper heritage Realising the value of Europe's newspaper heritage
Realising the value of Europe's newspaper heritage Europeana Newspapers
 
Europeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers
 
Large scale refinement of digital historical newspapers with named entities r...
Large scale refinement of digital historical newspapers with named entities r...Large scale refinement of digital historical newspapers with named entities r...
Large scale refinement of digital historical newspapers with named entities r...cneudecker
 
Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspapers
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspaperscneudecker
 
Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers
 
Europeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana Newspapers
 
Europeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers
 
IFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaIFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaEuropeana Newspapers
 
Performance Evaluation and Quality Assessment
Performance Evaluation and Quality AssessmentPerformance Evaluation and Quality Assessment
Performance Evaluation and Quality AssessmentEuropeana Newspapers
 
Overview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectOverview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectEuropeana Newspapers
 
Presentation of Hans-Jörg Lieder, BnF Information Day
Presentation of Hans-Jörg Lieder, BnF Information DayPresentation of Hans-Jörg Lieder, BnF Information Day
Presentation of Hans-Jörg Lieder, BnF Information DayEuropeana Newspapers
 
Presentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayPresentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayEuropeana Newspapers
 
Europeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers
 

Mais procurados (20)

Europeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop introEuropeana Newspapers LIBER2013 Workshop intro
Europeana Newspapers LIBER2013 Workshop intro
 
Refinement
RefinementRefinement
Refinement
 
Realising the value of Europe's newspaper heritage
Realising the value of Europe's newspaper heritage Realising the value of Europe's newspaper heritage
Realising the value of Europe's newspaper heritage
 
ENP_Dutch_Infoday_LWilms
ENP_Dutch_Infoday_LWilmsENP_Dutch_Infoday_LWilms
ENP_Dutch_Infoday_LWilms
 
Europeana Newspapers Project
Europeana Newspapers ProjectEuropeana Newspapers Project
Europeana Newspapers Project
 
Metadata
MetadataMetadata
Metadata
 
Europeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista KiisaEuropeana Newspapers Estonian Infoday Krista Kiisa
Europeana Newspapers Estonian Infoday Krista Kiisa
 
Large scale refinement of digital historical newspapers with named entities r...
Large scale refinement of digital historical newspapers with named entities r...Large scale refinement of digital historical newspapers with named entities r...
Large scale refinement of digital historical newspapers with named entities r...
 
Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013Europeana Newspaper metadata LIBER2013
Europeana Newspaper metadata LIBER2013
 
Refinement of Digitised Newspapers
Refinement of Digitised NewspapersRefinement of Digitised Newspapers
Refinement of Digitised Newspapers
 
Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013Europeana Newspapers wp2 liber2013
Europeana Newspapers wp2 liber2013
 
Europeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLiederEuropeana_Newspapers_ONB_infoday_HJLieder
Europeana_Newspapers_ONB_infoday_HJLieder
 
Europeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation PlanEuropeana Newspapers Aggregation Plan
Europeana Newspapers Aggregation Plan
 
IFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza AtanassovaIFLA 2014 Europeana Newspapers Rossitza Atanassova
IFLA 2014 Europeana Newspapers Rossitza Atanassova
 
Performance Evaluation and Quality Assessment
Performance Evaluation and Quality AssessmentPerformance Evaluation and Quality Assessment
Performance Evaluation and Quality Assessment
 
Overview of the Europeana Newspapers Project
Overview of the Europeana Newspapers ProjectOverview of the Europeana Newspapers Project
Overview of the Europeana Newspapers Project
 
Presentation of Hans-Jörg Lieder, BnF Information Day
Presentation of Hans-Jörg Lieder, BnF Information DayPresentation of Hans-Jörg Lieder, BnF Information Day
Presentation of Hans-Jörg Lieder, BnF Information Day
 
Presentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information DayPresentation of Clemens Neudecker, BnF Information Day
Presentation of Clemens Neudecker, BnF Information Day
 
Europeana Newspapers Polish Information Day
Europeana Newspapers Polish Information DayEuropeana Newspapers Polish Information Day
Europeana Newspapers Polish Information Day
 
Europeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday GenereuxEuropeana Newspapers LFT Infoday Genereux
Europeana Newspapers LFT Infoday Genereux
 

Semelhante a Europeana Newspapers Project Digitizes Millions of Newspaper Pages

Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers
 
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Apulian ICT Living Labs
 
Max Lemke, Head of Unit, Components and Systems, European Commission
Max Lemke, Head of Unit, Components and Systems, European CommissionMax Lemke, Head of Unit, Components and Systems, European Commission
Max Lemke, Head of Unit, Components and Systems, European CommissionI4MS_eu
 
Acatech.pptx
Acatech.pptxAcatech.pptx
Acatech.pptxFIWARE
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerBiblioteca Nacional de España
 
Closing Remarks - AAL Market Observatory Workshop - 22 May 2014
Closing Remarks - AAL Market Observatory Workshop - 22 May 2014Closing Remarks - AAL Market Observatory Workshop - 22 May 2014
Closing Remarks - AAL Market Observatory Workshop - 22 May 2014deboradg
 
Positioning libraries in the digital preservation landscape
Positioning libraries in the digital preservation landscapePositioning libraries in the digital preservation landscape
Positioning libraries in the digital preservation landscapeLIBER Europe
 
Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...
Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...
Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...Nathalie Danse
 
Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...
Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...
Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...Johan Oomen
 
Mastercourse Hortibusiness
Mastercourse HortibusinessMastercourse Hortibusiness
Mastercourse HortibusinessSjaak Wolfert
 
An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...cneudecker
 

Semelhante a Europeana Newspapers Project Digitizes Millions of Newspaper Pages (13)

Europeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday MuehlbergerEuropeana Newspapers LFT Infoday Muehlberger
Europeana Newspapers LFT Infoday Muehlberger
 
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
Darko Fercej: Central European Living Lab for Territorial Innovation - Open d...
 
Max Lemke, Head of Unit, Components and Systems, European Commission
Max Lemke, Head of Unit, Components and Systems, European CommissionMax Lemke, Head of Unit, Components and Systems, European Commission
Max Lemke, Head of Unit, Components and Systems, European Commission
 
Acatech.pptx
Acatech.pptxAcatech.pptx
Acatech.pptx
 
Centre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens NeudeckerCentre of Competence in digitisation. Clemens Neudecker
Centre of Competence in digitisation. Clemens Neudecker
 
Closing Remarks - AAL Market Observatory Workshop - 22 May 2014
Closing Remarks - AAL Market Observatory Workshop - 22 May 2014Closing Remarks - AAL Market Observatory Workshop - 22 May 2014
Closing Remarks - AAL Market Observatory Workshop - 22 May 2014
 
Positioning libraries in the digital preservation landscape
Positioning libraries in the digital preservation landscapePositioning libraries in the digital preservation landscape
Positioning libraries in the digital preservation landscape
 
Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...
Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...
Presentation of H2020 ICT-32-2017 Startup Europe for Growth & Innovation Rada...
 
Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...
Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...
Europeana and EUscreen. Joint AMIA/IASA Conference. Philadelphia- November 6,...
 
Mastercourse Hortibusiness
Mastercourse HortibusinessMastercourse Hortibusiness
Mastercourse Hortibusiness
 
Future Internet PPP for Smart Cities Ana Garcia
Future Internet PPP for Smart Cities Ana GarciaFuture Internet PPP for Smart Cities Ana Garcia
Future Internet PPP for Smart Cities Ana Garcia
 
Bne impact co_c
Bne impact co_cBne impact co_c
Bne impact co_c
 
An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...An Experimental Workflow Development Platform for Historical Document Digitis...
An Experimental Workflow Development Platform for Historical Document Digitis...
 

Mais de cneudecker

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Librarycneudecker
 
ALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltextecneudecker
 
OCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungencneudecker
 
Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?cneudecker
 
Multimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspaperscneudecker
 
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...cneudecker
 
AI for digitized cultural heritage
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritagecneudecker
 
Kuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenzcneudecker
 
Überblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-Dcneudecker
 
The many uses of digitized newspapers
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspaperscneudecker
 
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...cneudecker
 
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...cneudecker
 
OCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentscneudecker
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Miningcneudecker
 
Formate für Volltexte
Formate für VolltexteFormate für Volltexte
Formate für Volltextecneudecker
 
Extrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europecneudecker
 
Reise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minutencneudecker
 
Europeana Newspapers in a Nutshell
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshellcneudecker
 
lab.sbb.berlin
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlincneudecker
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspaperscneudecker
 

Mais de cneudecker (20)

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Library
 
ALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltexte
 
OCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungen
 
Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?
 
Multimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspapers
 
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
 
AI for digitized cultural heritage
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritage
 
Kuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenz
 
Überblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-D
 
The many uses of digitized newspapers
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspapers
 
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
 
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
 
OCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documents
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Mining
 
Formate für Volltexte
Formate für VolltexteFormate für Volltexte
Formate für Volltexte
 
Extrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europe
 
Reise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minuten
 
Europeana Newspapers in a Nutshell
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshell
 
lab.sbb.berlin
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlin
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspapers
 

Último

productionpost-productiondiary-240320114322-5004daf6.pptx
productionpost-productiondiary-240320114322-5004daf6.pptxproductionpost-productiondiary-240320114322-5004daf6.pptx
productionpost-productiondiary-240320114322-5004daf6.pptxHenryBriggs2
 
call girls in Model Town DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Model Town  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Model Town  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Model Town DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
Start Donating your Old Clothes to Poor People
Start Donating your Old Clothes to Poor PeopleStart Donating your Old Clothes to Poor People
Start Donating your Old Clothes to Poor PeopleSERUDS INDIA
 
Call Girls Bangalore Saanvi 7001305949 Independent Escort Service Bangalore
Call Girls Bangalore Saanvi 7001305949 Independent Escort Service BangaloreCall Girls Bangalore Saanvi 7001305949 Independent Escort Service Bangalore
Call Girls Bangalore Saanvi 7001305949 Independent Escort Service Bangalorenarwatsonia7
 
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...narwatsonia7
 
How the Congressional Budget Office Assists Lawmakers
How the Congressional Budget Office Assists LawmakersHow the Congressional Budget Office Assists Lawmakers
How the Congressional Budget Office Assists LawmakersCongressional Budget Office
 
Disciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdf
Disciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdfDisciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdf
Disciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdfDeLeon9
 
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfMonastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfCharlynTorres1
 
(多少钱)Dal毕业证国外本科学位证
(多少钱)Dal毕业证国外本科学位证(多少钱)Dal毕业证国外本科学位证
(多少钱)Dal毕业证国外本科学位证mbetknu
 
history of 1935 philippine constitution.pptx
history of 1935 philippine constitution.pptxhistory of 1935 philippine constitution.pptx
history of 1935 philippine constitution.pptxhellokittymaearciaga
 
2024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 252024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 25JSchaus & Associates
 
Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.Christina Parmionova
 
WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.Christina Parmionova
 
Panet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRILPanet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRILChristina Parmionova
 
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️saminamagar
 
2024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 262024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 26JSchaus & Associates
 
“Exploring the world: One page turn at a time.” World Book and Copyright Day ...
“Exploring the world: One page turn at a time.” World Book and Copyright Day ...“Exploring the world: One page turn at a time.” World Book and Copyright Day ...
“Exploring the world: One page turn at a time.” World Book and Copyright Day ...Christina Parmionova
 
Call Girl Benson Town - Phone No 7001305949 For Ultimate Sexual Urges
Call Girl Benson Town - Phone No 7001305949 For Ultimate Sexual UrgesCall Girl Benson Town - Phone No 7001305949 For Ultimate Sexual Urges
Call Girl Benson Town - Phone No 7001305949 For Ultimate Sexual Urgesnarwatsonia7
 

Último (20)

productionpost-productiondiary-240320114322-5004daf6.pptx
productionpost-productiondiary-240320114322-5004daf6.pptxproductionpost-productiondiary-240320114322-5004daf6.pptx
productionpost-productiondiary-240320114322-5004daf6.pptx
 
call girls in Model Town DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Model Town  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Model Town  DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Model Town DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vasant Kunj DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
Start Donating your Old Clothes to Poor People
Start Donating your Old Clothes to Poor PeopleStart Donating your Old Clothes to Poor People
Start Donating your Old Clothes to Poor People
 
Call Girls Bangalore Saanvi 7001305949 Independent Escort Service Bangalore
Call Girls Bangalore Saanvi 7001305949 Independent Escort Service BangaloreCall Girls Bangalore Saanvi 7001305949 Independent Escort Service Bangalore
Call Girls Bangalore Saanvi 7001305949 Independent Escort Service Bangalore
 
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
Russian Call Girl Hebbagodi ! 7001305949 ₹2999 Only and Free Hotel Delivery 2...
 
How the Congressional Budget Office Assists Lawmakers
How the Congressional Budget Office Assists LawmakersHow the Congressional Budget Office Assists Lawmakers
How the Congressional Budget Office Assists Lawmakers
 
Disciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdf
Disciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdfDisciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdf
Disciplines-and-Ideas-in-the-Applied-Social-Sciences-DLP-.pdf
 
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdfMonastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
Monastic-Supremacy-in-the-Philippines-_20240328_092725_0000.pdf
 
(多少钱)Dal毕业证国外本科学位证
(多少钱)Dal毕业证国外本科学位证(多少钱)Dal毕业证国外本科学位证
(多少钱)Dal毕业证国外本科学位证
 
history of 1935 philippine constitution.pptx
history of 1935 philippine constitution.pptxhistory of 1935 philippine constitution.pptx
history of 1935 philippine constitution.pptx
 
2024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 252024: The FAR, Federal Acquisition Regulations - Part 25
2024: The FAR, Federal Acquisition Regulations - Part 25
 
Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.Action Toolkit - Earth Day 2024 - April 22nd.
Action Toolkit - Earth Day 2024 - April 22nd.
 
WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.WORLD CREATIVITY AND INNOVATION DAY 2024.
WORLD CREATIVITY AND INNOVATION DAY 2024.
 
Panet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRILPanet vs.Plastics - Earth Day 2024 - 22 APRIL
Panet vs.Plastics - Earth Day 2024 - 22 APRIL
 
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Tilak Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
call girls in Laxmi Nagar DELHI 🔝 >༒9540349809 🔝 genuine Escort Service 🔝✔️✔️
 
2024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 262024: The FAR, Federal Acquisition Regulations - Part 26
2024: The FAR, Federal Acquisition Regulations - Part 26
 
“Exploring the world: One page turn at a time.” World Book and Copyright Day ...
“Exploring the world: One page turn at a time.” World Book and Copyright Day ...“Exploring the world: One page turn at a time.” World Book and Copyright Day ...
“Exploring the world: One page turn at a time.” World Book and Copyright Day ...
 
Call Girl Benson Town - Phone No 7001305949 For Ultimate Sexual Urges
Call Girl Benson Town - Phone No 7001305949 For Ultimate Sexual UrgesCall Girl Benson Town - Phone No 7001305949 For Ultimate Sexual Urges
Call Girl Benson Town - Phone No 7001305949 For Ultimate Sexual Urges
 

Europeana Newspapers Project Digitizes Millions of Newspaper Pages

  • 1. Copyright: Olmsted County Historical Society Europeana Newspapers …in a nutshell Newspapers in Europe and the Digital Agenda for Europe - Final Workshop 29 September 2014, London, British Library Clemens Neudecker, State Library Berlin @cneudecker
  • 2. Facts & Figures • Europeana Newspapers – EU ICT-PSP Best Practice Network • Started in February 2012 and will run until January 2015 • 18 partners, 11 associated partners, 22 networking partners (28 countries involved) • Total budget: €5.16M – EC contribution: €4.12M • Project coordination: State Library Berlin / Preußischer Kulturbesitz This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 2
  • 3. Europeana Newspapers is all over Europe…and beyond This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 3 Red = Project Partners Blue = Associated Partners Green = Networking Partners
  • 4. Refinement - we‘re scaling it up! • 8 million pages refined with Optical Character Recognition (OCR) • 2 million pages refined with Optical Layout Recognition (OLR) • Technical resources for Named Entity Recognition (NER) in three languages (Dutch, German, French) • Metadata for >18 million pages ingested to Europeana  In comparison: currently provides access to 8,056,532 pages This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 4
  • 5. Quality & Performance Bag of Words OCR Evaluation Per Language 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% Layout Analysis Performance Per evaluation profile Bag of Words OCR Evaluation Per Font This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp Bag of Words OCR Evaluation 5 82.4% 85.3% 80.9% 75.9% 67.5% 83.4% 84.1% 68.1% 93.1% 57.6% 87.0% 68.3% 76.1% 82.6% 54.1% 32.7% 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Success Rate Language Setting 71.9% 74.3% 80% 75% 70% 65% 60% 55% 50% Index based Count based Success Rate Bag of Words OCR Evaluation Index based rate vs. count based rate 79.1% 62.2% 55.9% 58.8% 94.7% 0% Keyword search Phrase search Access via content structure Print/ebook on demand Content based image retrieval Success Rate (harmonic, area based) Evaluation Profile 67.3% 81.4% 64.0% 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Gothic Normal Mixed Success Rate Font FineReader vs. Tesseract 75.3% 53.78% 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% Success Rate (count based) OCR Engine FineReader Tesseract
  • 6. Access via TEL & Europeana • Full text search in TEL Historic Newspapers Browser: http://www.theeuropeanlibrary.org/tel4/newspapers (recently updated following usability testing) • Metadata search in Europeana: http://www.europeana.eu/portal (now with embedded object presentation via TEL) This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 6
  • 7. Full-text search This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 7
  • 8. Browse by date This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 8
  • 9. Explore on a map This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 9
  • 10. Title list This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 10
  • 11. Embedded TEL Viewer in Europeana! This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 11
  • 12. Metadata Best Practices • Europeana Newspapers METS/ALTO Profile (ENMAP) • Contributions to ALTO standard v2.x, v3.0 • Structural metadata with tool support - Structify This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 12
  • 13. Media, News, Events This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp 13
  • 14. Lots of opportunities for research & reuse • Metadata for >18M pages licensed CC0 • Images & full-text for 10M pages licensed public domain • See also: http://www.europeana-newspapers.eu/ category/ interviews-with-researchers/ This project is partially funded under the ICT Policy Support Programme (ICT PSP) as part of the Competitiveness and Innovation Framework Programme by the European Community http://ec.europa.eu/ict_psp yet another way to reuse newspapers… 14
  • 15. Thank you for your attention! @eurnews http://www.europeana-newspapers.eu http://www.theeuropeanlibrary.org/tel4/newspapers http://www.europeana.eu/