How to read a million books
Clemens Neudecker
Staatsbibliothek zu Berlin
17 May 2017
How to read* a million newspapers?
(* hint: do try this at home)
https://xkcd.com/1838
Agenda
• Digital Collections
• Data, Tools, Formats
• Europeana Newspapers
• NLP challenges
• Experiments & use cases
About me
• Research Coordinator @ Berlin State Library
• M.A. Philosophy, Computer Science, Political Science
• Mostly curious about
– Optical Character Recognition, Document Analysis
– Natural Language Processing
– Digital Humanities
• More: @cneudecker, cneud.net
Staatsbibliothek zu Berlin
• Established 1661 as the Library of the King of Prussia
• Today the largest research library in Germany, with approx. 11.5m volumes (23m objects)
• Part of the "Stiftung Preußischer Kulturbesitz" (Prussian Cultural Heritage Foundation), a unique union of museums, archives, libraries and research institutes from Berlin
• http://staatsbibliothek-berlin.de/
Digitisation 2.0
Europeana
• http://www.europeana.eu/portal/
• Europe's Digital Library
• > 53m objects incl. art, sound, fashion, ...
• API: http://labs.europeana.eu/api
DPLA
• https://dp.la/
• Digital Public Library of America
• > 16m objects
• API: https://dp.la/info/developers/codex/
HathiTrust
• https://www.hathitrust.org/
• Public copy of Google Books
• > 15m volumes
• API: https://www.hathitrust.org/data
DDB
• http://ddb.de/
• Germany's federal Digital Library
• > 9m objects
• API: https://api.deutsche-digitale-bibliothek.de/
Trove
• http://trove.nla.gov.au/
• Digital Library of Australia
• > 540m objects
• API: http://help.nla.gov.au/trove/building-with-trove/api
Formats & Standards
• What data is available?
• Typically, a digital object is composed of:
– Scanned Images in TIFF, JP2 or JPEG
– Descriptive metadata in Dublin Core
– Structural metadata in METS
– Text content in ALTO or TEI
– Europeana metadata in EDM (Europeana Data Model)
– Linked Data in RDF or JSON-LD
Tools (1/3)
• OAI-PMH
– https://pypi.python.org/pypi/Sickle
– https://pypi.python.org/pypi/pyoai
– https://pypi.python.org/pypi/oaiharvest
• METS
– https://pypi.python.org/pypi/metsrw
– https://pypi.python.org/pypi/pymets
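To give a rough idea of what the OAI-PMH libraries listed here do under the hood, the following sketch builds a ListRecords request and reads Dublin Core fields from a response using only the Python standard library. The endpoint URL and the sample record are hypothetical; the verb, the metadataPrefix parameter and the XML namespaces are those defined by OAI-PMH 2.0 and Dublin Core.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(endpoint, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return endpoint + "?" + urlencode(params)

def parse_records(xml_text):
    """Return (identifier, title, date) tuples from a ListRecords response."""
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        date = record.findtext(".//dc:date", default="", namespaces=NS)
        rows.append((identifier, title, date))
    return rows

# Hypothetical response, shortened to a single record
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hamburger Nachrichten</dc:title>
          <dc:date>1932-12-31</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))  # hypothetical endpoint
print(parse_records(SAMPLE))
```

A real harvest would also follow the resumptionToken that OAI-PMH uses for paging, which libraries such as Sickle handle for you.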
Tools (2/3)
• Dublin Core
– https://pypi.python.org/pypi/pydc
– https://pypi.python.org/pypi/dcxml
• Europeana
– https://pypi.python.org/pypi/europeana-search
– https://pypi.python.org/pypi/django-europeana
Tools (3/3)
• IIIF
– https://pypi.python.org/pypi/iiif/
– https://pypi.python.org/pypi/Flask-IIIF/
• KB NL
– https://pypi.python.org/pypi/kb
– https://github.com/KBNLresearch/intro-kb-apis
– http://lab.kb.nl/
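For IIIF, much can be done without any library, because an image request is just a URL. The sketch below composes such a URL following the pattern of the IIIF Image API specification; the server base URL and the identifier are hypothetical placeholders.

```python
from urllib.parse import quote

def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Compose an IIIF Image API request URL.

    Pattern from the IIIF Image API specification:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # The identifier must be percent-encoded, including any slashes
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Hypothetical server and identifier: full page, then a cropped and scaled region
print(iiif_image_url("https://example.org/iiif", "PPN123/0001"))
print(iiif_image_url("https://example.org/iiif", "PPN123/0001",
                     region="0,0,1000,500", size="500,"))
```

Note that the size keyword "max" assumes a server implementing version 2.1 or later of the Image API; older servers expect "full" instead.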
Europeana Newspapers
• EU project to make Europe's historical newspapers searchable & accessible
• http://www.europeana-newspapers.eu/
Europeana Newspapers Collection
• Full text of 12 million historic newspaper pages (> 10,000,000,000 tokens)
• 40 languages, 4 alphabets
• 400 years (1618 – 2016)
• http://www.theeuropeanlibrary.org/tel4/newspapers
OCR / OLR
(U.lag nul «chestttetrung- ■geeinoel II, Setch«it,zen I—Ig Ufr sterntpeechee g» U II.
für ftrene-geingelpilche: 13 01191 nnd 13 03 11 io"gl f l««lt-beOeu; OetHn *1,
blnftraße IS IZeinsptechee; H I Sanemeinummet gurfilrft 8MB); ««de«: gdn.o(tio||e III
(ZemlpreAei 284.3»). Iie.gonlen nur nnier heimnnn » Erden bei der veutlchen Bonl
»n« Vtdennld-Getelltchoil gttloto bumduig. Commerz- nndprinoldonlN voINchrSomI
bomduig u 189 ß>, .»ontbeegee Kochelchlen- eitchelne» 12 mal wSchenNIch. täglich
zweimal — morgen« nnd ndendn —, Sonntage nnr morgen». Toonlnge nur abend»
Zn den Kochdorerlen wird die Ndend-Nuegode noch am üben!
Dieser Entwurf ist. wie Bürgermeister Roß mitteilte, den Fraktionen zur
Stellungnahme vorgelegt worden. Zum Donnerstag war eine zweite Sitzung der
Fraktionsfübrer vom Vertreter des Senats angeordnet worden, zu der ober zwei
Fraktionen, die Teutschnationalen und die Nationalsozialisten, nicht erschienen
waren. Von den Nationalsozialisten ist kurz vor Beginn der Sitzung eine telephonische
Erklärung abgegeben worden, etwa des Inhalts, daß die Fraktion sich den sachlichen
Verhandlungen entziehen müsse, solange nicht gewisse Vorbedingungen erfüllt sein.
http://www.theeuropeanlibrary.org/tel4/newspapers/issue/Hamburger_Nachrichten/1932/12/31
https://github.com/cneud/alto-tools
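OCR output such as the sample above is typically stored as ALTO XML, where every recognised word is a String element with a CONTENT attribute inside a TextLine. The following is a minimal sketch of plain-text extraction with the standard library; it is not the implementation used by alto-tools, and the sample document is a hypothetical, heavily shortened ALTO file.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the XML namespace so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]

def alto_lines(xml_text):
    """Return the text of each TextLine as a list of strings."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each word is a String child; SP (space) elements are skipped
        words = [child.get("CONTENT", "")
                 for child in element
                 if local_name(child.tag) == "String"]
        lines.append(" ".join(words))
    return lines

# Hypothetical, shortened ALTO document
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Dieser"/><SP/><String CONTENT="Entwurf"/>
    </TextLine>
    <TextLine>
      <String CONTENT="ist"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

for line in alto_lines(SAMPLE_ALTO):
    print(line)
```

Matching on the local element name rather than a fixed namespace is a deliberate choice here, since the ALTO namespace URI differs between schema versions.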
Performance
[Figure: Bag of Words OCR Evaluation, per language. X axis: Language Setting; Y axis: Success Rate (0% to 100%). Bar values as extracted, language labels not recoverable: 82.4%, 85.3%, 80.9%, 75.9%, 67.5%, 83.4%, 84.1%, 68.1%, 93.1%, 57.6%, 87.0%, 68.3%, 76.1%, 82.6%, 54.1%, 32.7%]
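The slides do not spell out how the bag-of-words success rate is computed. One common reading, assumed here, is the share of ground-truth words that also appear in the OCR output when word order is ignored. The sketch below implements that reading; the ground-truth sentence is an assumed correction of the OCR sample, not project data.

```python
from collections import Counter

def bag_of_words_success(ground_truth, ocr_text):
    """Share of ground-truth words that also occur in the OCR output.

    Word order is ignored; each word is matched at most as often as it occurs.
    """
    truth = Counter(ground_truth.split())
    ocr = Counter(ocr_text.split())
    total = sum(truth.values())
    if total == 0:
        return 0.0
    matched = sum((truth & ocr).values())  # multiset intersection
    return matched / total

# Assumed ground truth versus OCR output with one misread word ("ober")
truth = "zu der aber zwei Fraktionen nicht erschienen waren"
ocr = "zu der ober zwei Fraktionen nicht erschienen waren"
print(f"{bag_of_words_success(truth, ocr):.1%}")
```

Because order is ignored, this measure is insensitive to reading-order mistakes, which is why layout analysis is evaluated separately.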
[Figure: Layout Analysis Performance, per evaluation profile. X axis: Evaluation Profile (Keyword search; Phrase search; Access via content structure; Print/ebook on demand; Content based image retrieval); Y axis: Success Rate (harmonic, area based), 0% to 100%. Bar values as extracted: 79.1%, 62.2%, 55.9%, 58.8%, 94.7%]
Experimental(!) Downloads
• http://data.theeuropeanlibrary.org/download/newspapers-by-country/README.html
• http://research.europeana.eu/itemtype/newspapers
• http://test-solr-mongo.eanadev.org/europeana-research-newspapers-dump/sample-2017-04-26/Staatsbibliothek_zu_Berlin_Preu%253Fischer_Kulturbesitz/titles.html
OCR
• EU: IMPACT project (2008-2012)
• US: eMOP project (2013-2015)
• DE: OCR-D project (2016-2018)
• Google:
– Tesseract
– ocropy (fka OCRopus)
– Aksara
Named Entity Recognition
• 3 Categories:
– PERSON; LOCATION; ORGANIZATION
• 3 Languages:
– Dutch; French; German
• Powered by Stanford CoreNLP - CRF-NER
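NER training data of this kind is commonly stored one token per line with a tag in the last column. The sketch below counts entities per category in such data, assuming BIO-style tags (B-PER, I-PER, O, and so on); the tag scheme and the sample sentence are assumptions for illustration, not taken from the corpus.

```python
from collections import Counter

def count_entities(conll_text):
    """Count entities per category in token/tag lines with BIO tags."""
    counts = Counter()
    for line in conll_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line marks a sentence boundary
        tag = parts[-1]
        if tag.startswith("B-"):
            counts[tag[2:]] += 1  # a B- tag opens a new entity
    return counts

# Hypothetical annotated sample
SAMPLE_CONLL = """Bürgermeister O
Roß B-PER
sprach O
in O
Hamburg B-LOC

Clemens B-PER
Neudecker I-PER
"""

print(count_entities(SAMPLE_CONLL))
```

Counting only B- tags means a multi-token name such as the last one in the sample is counted once rather than once per token.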
Annotations

Language   # tokens   # PER   # LOC   # ORG
French      207,000   5,672   5,614   2,574
Dutch       182,483   4,492   4,448   1,160
German       96,735   7,914   6,143   2,784

Language   % tokens   % PER   % LOC   % ORG
French     100%       2.75%   2.71%   1.24%
Dutch      100%       2.46%   2.44%   0.64%
German     100%       8.18%   6.35%   2.88%

Language   Word error rate (bag of words)   Reading order success rate
French     16.6%                            19.9%
Dutch      17.6%                            23.2%
German     15.9% / 21.9%                    13.6%
Evaluation
Dutch French
Challenges
• https://github.com/EuropeanaNewspapers/ner-corpora/wiki/Corpus-cleanup
Lack of metadata
Issue
The annotated text has no associated metadata (newspaper title, date, etc.)
Solution
Automatically match lines with newspaper pages through keyword search
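The matching step can be sketched with a small inverted index: each annotated line is assigned to the page that shares the most distinct words with it. This is only an illustration of the idea; the page identifiers, the min_hits threshold and the sample texts are hypothetical, and the actual cleanup procedure may differ.

```python
from collections import Counter, defaultdict

def build_index(pages):
    """Map each lower-cased word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in set(text.lower().split()):
            index[word].add(page_id)
    return index

def best_page(line, index, min_hits=2):
    """Return the page id sharing the most distinct words with the line, or None."""
    hits = Counter()
    for word in set(line.lower().split()):
        for page_id in index.get(word, ()):
            hits[page_id] += 1
    if not hits:
        return None
    page_id, score = hits.most_common(1)[0]
    return page_id if score >= min_hits else None

# Hypothetical page texts keyed by a made-up identifier
pages = {
    "hn-1932-12-31-p1": "Dieser Entwurf ist den Fraktionen zur Stellungnahme vorgelegt worden",
    "other-p1": "Wetterbericht für Hamburg und Umgebung",
}
index = build_index(pages)
print(best_page("den Fraktionen zur Stellungnahme", index))
```

Once a line is matched to a page, the newspaper title and date can be copied from that page's metadata.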
OCR errors vs. historical spelling
Issue
Text contains OCR errors but also valid(!) historical spelling variants
Solution
Document language profiling to distinguish OCR errors and spelling variants
Sentence splits
Issue
During data pre-processing, (parts of) sentences have been erroneously cut
Solution
Reconstruct sentences through keyword search and matching procedure
Hyphenation
Issue
Text contains line-break hyphenation that should be removed, but hyphens also occur in regular text
Solution
Use a tokenizer to determine which hyphens to remove
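The slides suggest a tokenizer for this decision. As a simpler stand-in, the sketch below uses a word-list lookup instead: a hyphen at a line end is dropped only if the joined form is a known word. The lexicon and the sample lines are hypothetical.

```python
def dehyphenate(lines, lexicon):
    """Join words split across line ends when the joined form is a known word."""
    out = []
    carry = ""  # first half of a word split at the previous line end
    for line in lines:
        words = line.split()
        if carry and words:
            joined = carry[:-1] + words[0]
            if joined.lower() in lexicon:
                words[0] = joined            # line-break hyphen: drop it
            else:
                words[0] = carry + words[0]  # regular hyphen: keep it
            carry = ""
        if words and words[-1].endswith("-") and len(words[-1]) > 1:
            carry = words.pop()
        out.extend(words)
    if carry:
        out.append(carry)
    return " ".join(out)

# Hypothetical lines and lexicon (lower-cased known words)
lines = ["eine telephonische Er-", "klärung der Deutsch-", "Nationalen"]
lexicon = {"erklärung"}
print(dehyphenate(lines, lexicon))
```

With historical text the lexicon itself has to cover older spellings, otherwise valid joins are missed and the hyphen is kept.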
Missing tags
Issue
Human operators forgot to tag some entities or tagged them with the wrong category
Solution
???
Punctuation
Issue
According to CoNLL, punctuation should be on a separate line from the token, but abbreviations…
Solution
???
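The slides leave this one open. One possible heuristic, not proposed by the author, is to split trailing punctuation from a token unless the token is on a list of known abbreviations. The abbreviation list below is a hypothetical placeholder and would need to be language and period specific.

```python
import re

# Hypothetical abbreviation list; a real one would be language specific
ABBREVIATIONS = {"Dr.", "Nr.", "u.", "a."}

def tokenize(sentence, abbreviations=ABBREVIATIONS):
    """Split trailing punctuation from tokens unless the token is a known abbreviation."""
    tokens = []
    for chunk in sentence.split():
        if chunk in abbreviations:
            tokens.append(chunk)
            continue
        match = re.fullmatch(r"(.*?)([.,;:!?]+)", chunk)
        if match and match.group(1):
            tokens.append(match.group(1))
            tokens.extend(match.group(2))  # one entry per punctuation mark
        else:
            tokens.append(chunk)
    return tokens

# One token per line, as in CoNLL-style files
for token in tokenize("Dr. Roß kam, sah und ging."):
    print(token)
```

A fixed list will not catch abbreviations it has never seen, so this only narrows the problem rather than solving it.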
https://altomator.github.io/EN-data_mining/
https://github.com/altomator/EN-data_mining
http://www.kbresearch.nl/dictionary/
https://github.com/jlonij/dictionary-viewer
http://www.kbresearch.nl/telraam/
https://gist.github.com/WillemJan/6ab02c48af576ba47b68
http://ngramviewer.kbresearch.nl/
https://bitbucket.org/ilps/pm-ngramviewers-kbkranten
http://www.digitalvictorianist.com/
https://twitter.com/VictorianHumour
https://github.com/BL-Labs/embellishments
http://networks.viraltexts.org/1836to1899/index.html
https://github.com/dasmiq/passim
EUROPEANA TRANSCRIBATHON CAMPUS BERLIN 2017
22-23 June 2017
Berlin State Library
http://pro.europeana.eu/event/europeana-transcribathon-campus-2017
Thank you for your attention!

Mais conteúdo relacionado

Mais procurados

Europeana, more than data aggregation?
Europeana, more than data aggregation?Europeana, more than data aggregation?
Europeana, more than data aggregation?Antoine Isaac
 
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...labsbl
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web ArchivesMichael Nelson
 
Linked Data: principles and examples
Linked Data: principles and examples Linked Data: principles and examples
Linked Data: principles and examples Victor de Boer
 
One day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic WebOne day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic WebVictor de Boer
 
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015Antoine Isaac
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Robert H. McDonald
 
Doing Digital Research @ British Library
Doing Digital Research @ British LibraryDoing Digital Research @ British Library
Doing Digital Research @ British LibraryMia
 
Multilingual challenges in Europeana
Multilingual challenges in EuropeanaMultilingual challenges in Europeana
Multilingual challenges in EuropeanaAntoine Isaac
 
Europeana Research Panel DH Benelux 2017
Europeana Research Panel DH Benelux 2017Europeana Research Panel DH Benelux 2017
Europeana Research Panel DH Benelux 2017Europeana
 
Prototype on Illuminated Manuscripts
Prototype on Illuminated ManuscriptsPrototype on Illuminated Manuscripts
Prototype on Illuminated ManuscriptsEquipex Biblissima
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Datahorvadam
 
Post-Its and Placemarks
Post-Its and PlacemarksPost-Its and Placemarks
Post-Its and Placemarksaboutgeo
 
Estermann Panel on Authority Files, 3 June 2020
Estermann Panel on Authority Files, 3 June 2020Estermann Panel on Authority Files, 3 June 2020
Estermann Panel on Authority Files, 3 June 2020Beat Estermann
 
Europeana and Schema.org - DC2013
Europeana and Schema.org - DC2013Europeana and Schema.org - DC2013
Europeana and Schema.org - DC2013Antoine Isaac
 

Mais procurados (20)

Europeana in a Research Context
Europeana in a Research ContextEuropeana in a Research Context
Europeana in a Research Context
 
Europeana, more than data aggregation?
Europeana, more than data aggregation?Europeana, more than data aggregation?
Europeana, more than data aggregation?
 
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...
British Library Labs Presentation at Ed Tech Hackathon 2013 - hackathoncentra...
 
Profiling Web Archives
Profiling Web ArchivesProfiling Web Archives
Profiling Web Archives
 
Linked Data: principles and examples
Linked Data: principles and examples Linked Data: principles and examples
Linked Data: principles and examples
 
One day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic WebOne day workshop Linked Data and Semantic Web
One day workshop Linked Data and Semantic Web
 
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
Wikidata, a target for Europeana's semantic strategy - GLAM-WIKI 2015
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
 
The Power Of The User
The Power Of The UserThe Power Of The User
The Power Of The User
 
Doing Digital Research @ British Library
Doing Digital Research @ British LibraryDoing Digital Research @ British Library
Doing Digital Research @ British Library
 
EDL Stockholm
EDL StockholmEDL Stockholm
EDL Stockholm
 
Multilingual challenges in Europeana
Multilingual challenges in EuropeanaMultilingual challenges in Europeana
Multilingual challenges in Europeana
 
Open data and entrepreneurship
Open data and entrepreneurshipOpen data and entrepreneurship
Open data and entrepreneurship
 
Europeana Research Panel DH Benelux 2017
Europeana Research Panel DH Benelux 2017Europeana Research Panel DH Benelux 2017
Europeana Research Panel DH Benelux 2017
 
Keynote csws2013
Keynote csws2013Keynote csws2013
Keynote csws2013
 
Prototype on Illuminated Manuscripts
Prototype on Illuminated ManuscriptsPrototype on Illuminated Manuscripts
Prototype on Illuminated Manuscripts
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Post-Its and Placemarks
Post-Its and PlacemarksPost-Its and Placemarks
Post-Its and Placemarks
 
Estermann Panel on Authority Files, 3 June 2020
Estermann Panel on Authority Files, 3 June 2020Estermann Panel on Authority Files, 3 June 2020
Estermann Panel on Authority Files, 3 June 2020
 
Europeana and Schema.org - DC2013
Europeana and Schema.org - DC2013Europeana and Schema.org - DC2013
Europeana and Schema.org - DC2013
 

Semelhante a How to read a million books?

Europeana Newspapers - Data, Tools & Future Plans
 Europeana Newspapers - Data, Tools & Future Plans  Europeana Newspapers - Data, Tools & Future Plans
Europeana Newspapers - Data, Tools & Future Plans cneudecker
 
The Use of Big Data Techniques for Digital Archiving
The Use of Big Data Techniques for Digital ArchivingThe Use of Big Data Techniques for Digital Archiving
The Use of Big Data Techniques for Digital ArchivingSven Schlarb
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...Trevor Owens
 
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...Digital Classicist Seminar Berlin
 
E-ARK-iPRES2016-Bern-October-2016
E-ARK-iPRES2016-Bern-October-2016E-ARK-iPRES2016-Bern-October-2016
E-ARK-iPRES2016-Bern-October-2016Sven Schlarb
 
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataMethodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataBoris Villazón-Terrazas
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8plan4all
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProopenminted_eu
 
The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...Lukas Koster
 
ResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRMResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRMVladimir Alexiev, PhD, PMP
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
 
Redesigning our Combine Harvester
Redesigning our Combine HarvesterRedesigning our Combine Harvester
Redesigning our Combine HarvesterTry PurpleSearch
 
Digital Archiving at the Meertens Institute
Digital Archiving at the Meertens InstituteDigital Archiving at the Meertens Institute
Digital Archiving at the Meertens Institutejuntez
 
Digital archaeology and museums
Digital archaeology and museumsDigital archaeology and museums
Digital archaeology and museumsdejp3
 
What's up, Europeana Newspapers?
What's up, Europeana Newspapers?What's up, Europeana Newspapers?
What's up, Europeana Newspapers?cneudecker
 
ARCLib project presentation from Pasig 2016
ARCLib project presentation from Pasig 2016ARCLib project presentation from Pasig 2016
ARCLib project presentation from Pasig 2016dp-blog-cz
 
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Marcus Smith
 
Making Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryMaking Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryHeinz Pampel
 
SCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSven Schlarb
 

Semelhante a How to read a million books? (20)

Europeana Newspapers - Data, Tools & Future Plans
 Europeana Newspapers - Data, Tools & Future Plans  Europeana Newspapers - Data, Tools & Future Plans
Europeana Newspapers - Data, Tools & Future Plans
 
The Use of Big Data Techniques for Digital Archiving
The Use of Big Data Techniques for Digital ArchivingThe Use of Big Data Techniques for Digital Archiving
The Use of Big Data Techniques for Digital Archiving
 
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
We Have Interesting Problems: Some Applied Grand Challenges from Digital Libr...
 
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
[DCSB] Dr Gabriel Bodard (KCL) “A View on Digital Classics Collaboration: fro...
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
 
E-ARK-iPRES2016-Bern-October-2016
E-ARK-iPRES2016-Bern-October-2016E-ARK-iPRES2016-Bern-October-2016
E-ARK-iPRES2016-Bern-October-2016
 
Methodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked DataMethodological Guidelines for Publishing Linked Data
Methodological Guidelines for Publishing Linked Data
 
TEAMS 6, 7 and 8
TEAMS 6, 7 and 8TEAMS 6, 7 and 8
TEAMS 6, 7 and 8
 
Infrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKProInfrastructure crossroads... and the way we walked them in DKPro
Infrastructure crossroads... and the way we walked them in DKPro
 
The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...The drawbridge to knowledge - Linking scholarly publications and research inf...
The drawbridge to knowledge - Linking scholarly publications and research inf...
 
ResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRMResearchSpace- Example of a VRE Based on CIDOC CRM
ResearchSpace- Example of a VRE Based on CIDOC CRM
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
Redesigning our Combine Harvester
Redesigning our Combine HarvesterRedesigning our Combine Harvester
Redesigning our Combine Harvester
 
Digital Archiving at the Meertens Institute
Digital Archiving at the Meertens InstituteDigital Archiving at the Meertens Institute
Digital Archiving at the Meertens Institute
 
Digital archaeology and museums
Digital archaeology and museumsDigital archaeology and museums
Digital archaeology and museums
 
What's up, Europeana Newspapers?
What's up, Europeana Newspapers?What's up, Europeana Newspapers?
What's up, Europeana Newspapers?
 
ARCLib project presentation from Pasig 2016
ARCLib project presentation from Pasig 2016ARCLib project presentation from Pasig 2016
ARCLib project presentation from Pasig 2016
 
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
Linked Open Data and The Digital Archaeological Workflow at the Swedish Natio...
 
Making Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org RegistryMaking Research Data Repositories Visible – The re3data.org Registry
Making Research Data Repositories Visible – The re3data.org Registry
 
SCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/Belgium
 

Mais de cneudecker

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Librarycneudecker
 
ALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltextecneudecker
 
OCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungencneudecker
 
Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?cneudecker
 
Multimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspaperscneudecker
 
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...cneudecker
 
AI for digitized cultural heritage
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritagecneudecker
 
Kuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenzcneudecker
 
Überblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-Dcneudecker
 
The many uses of digitized newspapers
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspaperscneudecker
 
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...cneudecker
 
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...cneudecker
 
OCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentscneudecker
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Miningcneudecker
 
Formate für Volltexte
Formate für VolltexteFormate für Volltexte
Formate für Volltextecneudecker
 
Extrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europecneudecker
 
Reise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minutencneudecker
 
Europeana Newspapers in a Nutshell
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshellcneudecker
 
lab.sbb.berlin
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlincneudecker
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspaperscneudecker
 

Mais de cneudecker (20)

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Library
 
ALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für VolltexteALTO, PAGE & Co. Formate für Volltexte
ALTO, PAGE & Co. Formate für Volltexte
 
OCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für ZeitungenOCR und Strukturerkennung für Zeitungen
OCR und Strukturerkennung für Zeitungen
 
Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?Digitisation and Digital Humanities - what is the role of Libraries?
Digitisation and Digital Humanities - what is the role of Libraries?
 
Multimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical NewspapersMultimodal Perspectives for Digitised Historical Newspapers
Multimodal Perspectives for Digitised Historical Newspapers
 
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
OCR und Strukturerkennung: Herausforderungen und Ansätze für die Zeitungsdigi...
 
AI for digitized cultural heritage
AI for digitized cultural heritageAI for digitized cultural heritage
AI for digitized cultural heritage
 
Kuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher IntelligenzKuratieren mit künstlicher Intelligenz
Kuratieren mit künstlicher Intelligenz
 
Überblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-DÜberblick zum DFG-Projekt OCR-D
Überblick zum DFG-Projekt OCR-D
 
The many uses of digitized newspapers
The many uses of digitized newspapersThe many uses of digitized newspapers
The many uses of digitized newspapers
 
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
Digitalisate kuratieren mit KI - von unstrukturierten Daten zu strukturierten...
 
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
Von der Zeitungsdigitalisierung zu historischen Netzwerken - Methoden und Her...
 
OCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documentsOCR-D: An end-to-end open source OCR framework for historical printed documents
OCR-D: An end-to-end open source OCR framework for historical printed documents
 
Text and Data Mining
Text and Data MiningText and Data Mining
Text and Data Mining
 
Formate für Volltexte
Formate für VolltexteFormate für Volltexte
Formate für Volltexte
 
Extrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in EuropeExtrablatt: The Latest News on Newspaper Digitisation in Europe
Extrablatt: The Latest News on Newspaper Digitisation in Europe
 
Reise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 MinutenReise durch Europeana Collections in 11 Minuten
Reise durch Europeana Collections in 11 Minuten
 
Europeana Newspapers in a Nutshell
Europeana Newspapers in a NutshellEuropeana Newspapers in a Nutshell
Europeana Newspapers in a Nutshell
 
lab.sbb.berlin
lab.sbb.berlinlab.sbb.berlin
lab.sbb.berlin
 
Named Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana NewspapersNamed Entity Recognition for Europeana Newspapers
Named Entity Recognition for Europeana Newspapers
 


How to read a million books?