Presentation of the LectAuRep (Notary Registers Automated Reading) project: HTR applied to the notary registers of the French National Archives. AI4LAM community call, September 21st 2021 (+ link to recording in downloaded pdf).
LectAuRep transforms archives with AI
1. LectAuRep
(Notary Registers Automated Reading)
HTR applied to the French National Archives Notary Registers
Aurélia Rostaing
aurelia.rostaing@culture.gouv.fr
AI4LAM
September 21st 2021
2. - Several image qualities
- Thousands of handwritings (1803-1940s)
- 600,000+ pages
Corpora with visually structured information
3. Serve end-users
> Turn massively digitized archives into data through AI (recurrent neural networks - LSTM) in order to read, search, mine text
Serve LAM communities
> Pool & document data, models and methods (joint estate regime)
8. Still to be worked out
Sort out a DMP & manage workflows
Crowdsource raw HTR data editing
Feed the data to our VRR & customers via a TEI pipeline
Ongoing transcriptions/ground truth/raw HTR data
Anonymous full-text search (flat data)
Fuzzy + NER-based search (TEI)
Ground truth (EAD - RiC)
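To illustrate why fuzzy search matters on raw HTR data, here is a minimal sketch using only Python's standard `difflib`. It is not the project's search implementation: the function name `fuzzy_find`, the 0.8 threshold and the sample line are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, text, threshold=0.8):
    """Return (token, score) pairs for tokens that approximately match `query`.

    Hypothetical helper: the threshold is an arbitrary illustrative value.
    """
    hits = []
    q = query.lower()
    for token in text.split():
        # Drop surrounding punctuation and normalise case before comparing.
        cleaned = token.strip(".,;:()").lower()
        score = SequenceMatcher(None, q, cleaned).ratio()
        if score >= threshold:
            hits.append((cleaned, round(score, 2)))
    return hits


# Invented sample line imitating raw HTR output with one misread name.
line = "Vente par M. Durand a Mme Duranb, notaire a Paris"
print(fuzzy_find("durand", line))
```

An exact-match search would only find the correctly read name; the similarity threshold also catches the misread variant, at the cost of possible false positives on short words.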
9. Feeding this data to our VRR & customers
Raw ground truth
Partly edited ground truth
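As a rough idea of what one step of a TEI pipeline could look like, the sketch below wraps transcribed lines in a minimal TEI document using Python's standard `xml.etree.ElementTree`. The function `lines_to_tei`, the header texts and the choice of `<ab>` with `<lb/>` are assumptions for illustration, not the encoding actually used by LectAuRep.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def lines_to_tei(title, lines):
    """Wrap transcribed lines in a minimal TEI skeleton (hypothetical layout)."""
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub, tei("p")).text = "Unpublished sketch"
    src = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(src, tei("p")).text = "Raw HTR output"
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    block = ET.SubElement(body, tei("ab"))
    for n, line in enumerate(lines, start=1):
        # Each <lb/> marks the start of a line; the line text follows it.
        lb = ET.SubElement(block, tei("lb"), {"n": str(n)})
        lb.tail = line
    return ET.tostring(root, encoding="unicode")


# Invented sample lines standing in for raw or partly edited ground truth.
print(lines_to_tei("Sample register page", ["Vente par M. Durand", "notaire a Paris"]))
```

Keeping line breaks as `<lb/>` milestones preserves the link between each text line and its position on the page, which is useful when raw and partly edited ground truth coexist.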
10. Editing text & structure for reading and research
https://github.com/HugoSchtr
https://gitlab.inria.fr/almanach/lectaurep/lepidemo
11. Further Reading on LectAuRep, eScripta & HTR
https://lectaurep.hypotheses.org
https://gitlab.inria.fr/almanach/lectaurep
https://escripta.hypotheses.org/
https://gitlab.com/scripta/escriptorium
https://gitlab.inria.fr/almanach/lectaurep/lepidemo
http://kraken.re/
https://readcoop.eu/transkribus
https://teklia.com/