Digitisation and Digital Humanities: what is the role of Libraries?
Clemens Neudecker (@cneudecker)
Berlin State Library
8 April 2021
Staatsbibliothek zu Berlin – Preußischer Kulturbesitz (SBB)
• Established 1661 as library of the King of Prussia
• Largest research library in Germany
• Approximately 12m volumes, 23m media objects in total
• Part of the legal entity Stiftung Preußischer Kulturbesitz
• https://staatsbibliothek-berlin.de/
Berlin State Library – East & West
Digitization @ SBB
• Since 2007: in-house Digitization Center
• Approx. 1.7M images annual production
• Up to 80 concurrent digitization projects
• 20 diverse bookscanners, scanrobots, etc.
• Operation in two shifts with 24 operators
• Digitisation-on-demand service
• KITODO open source digitisation workflow management system
Digital Collections
• Main portal for digitised collections
• Currently around 180,000 digitised documents available online
• Documents published before 1920 are licensed as public domain
• IIIF API compatible
• Full image resolution is provided
• Full text (via OCR) and keyword search for about 20% of the digitised content
• Downloads for images, OCR, metadata
• https://digital.staatsbibliothek-berlin.de/
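IIIF compatibility means that images can be requested through a predictable URL pattern. The sketch below builds an image request following the general IIIF Image API template `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. The base URL and identifier are placeholders, not the actual SBB endpoint, and the slides do not say which API version is served: `max` is the version 3 size keyword, while version 2 servers expect `full`.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build a IIIF Image API request URL.

    base_url and identifier are placeholders (assumptions), not the
    real SBB service address or a real object identifier.
    """
    # Identifiers must be URL-encoded, including any slashes they contain.
    encoded_id = quote(identifier, safe="")
    return f"{base_url.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
full_page = iiif_image_url("https://example.org/iiif", "doc42/0001")

# A cropped region (x,y,w,h in pixels) scaled to 600 pixels wide.
detail = iiif_image_url("https://example.org/iiif", "doc42/0001",
                        region="0,0,1200,800", size="600,")
```

Because region and size are part of the URL, a viewer or script can fetch a crop or a thumbnail without downloading the full-resolution image first.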
ZEFYS – digitized newspapers
• Digitized historical newspapers have their own portal ZEFYS
• About 200 newspaper titles and roughly 10m pages digitized
• GDR Press Portal gives access to the main newspapers from the GDR (after authentication, which is necessary due to copyright)
• ZEFYS was hacked in February 2021 and is now being rebuilt with a new technology stack
• No full text search (yet) but approx. 5m pages already have OCR
• Currently two major newspaper digitization projects from microfilm
• https://zefys.staatsbibliothek-berlin.de/
DDB Newspaper Portal
• Uniform access and UI for digitised newspapers in Germany
• Key features
  • Title list
  • Calendar
  • Keyword search
• Advanced features
  • Citation & Persistence
  • Named Entities
  • Corpus Building
• https://pro.deutsche-digitale-bibliothek.de/deutsches-zeitungsportal
Qurator.ai
• Leverage state-of-the-art AI/ML for digitized cultural heritage curation
• Development of AI/ML pipeline:
• Binarization
• Layout analysis
• OCR
• Postcorrection
• Named Entity Recognition and Named Entity Linking
• Image Similarity and Search
• https://qurator.ai
• https://github.com/qurator-spk
OCR-D
• Provide the technical and organisational framework for the OCR processing of the German VD digitization initiatives (documents printed in Germany from 1600 to 1900)
• Open & collaborative development:
  • Specifications & Guidelines https://ocr-d.de/en/dev
  • Open source tools https://github.com/OCR-D
  • Community https://gitter.im/OCR-D/Lobby
• https://ocr-d.de
SoNAR (IDH)
• Examine and evaluate approaches for an advanced research environment for Historical Network Analysis
• Extract person names and relations from databases & digitized newspapers
• Transform entities with relations into a historical social network graph
• Create intuitive visualizations and interfaces for querying and analyzing the social network graph
• https://sonar.fh-potsdam.de
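As a rough illustration of the "entities with relations into a graph" step, the sketch below turns (person, relation, person) triples into an adjacency structure that can be queried for a person's contacts. It is not SoNAR's implementation: the triples, the relation labels and the choice to treat every relation as undirected are all assumptions made for the example.

```python
from collections import defaultdict


def build_graph(relations):
    """Turn (source, relation, target) triples into an adjacency map.

    Relations are stored in both directions, i.e. the graph is treated
    as undirected. That is a simplifying assumption of this sketch.
    """
    graph = defaultdict(set)
    for source, relation, target in relations:
        graph[source].add((relation, target))
        graph[target].add((relation, source))
    return graph


def neighbours(graph, person):
    """Return the sorted names of everyone directly linked to `person`."""
    return sorted({other for _, other in graph.get(person, set())})


# Placeholder triples, not taken from any real dataset.
relations = [
    ("Person A", "corresponded_with", "Person B"),
    ("Person B", "worked_with", "Person C"),
]
graph = build_graph(relations)
contacts_of_b = neighbours(graph, "Person B")
```

In practice the triples would come from entity recognition and linking over the newspaper OCR and from authority databases, and a dedicated graph store would replace the in-memory dictionary.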
SBB LAB
• Experimental playground
• Provision of (open) datasets
• Documentation of public APIs
• Presentation of innovative prototypes using SBB collections
• Events (Hackathons, Transcribathons)
• Digital Researcher Residency (planned)
• https://lab.sbb.berlin/
Thank you for your attention!
Questions?
Clemens Neudecker (@cneudecker)
Berlin State Library
8 April 2021

