JudaicaLink: Linked Data in the Jewish Studies FID
1. WISS Research Group | JudaicaLink: Linked Data in the Jewish Studies FID - EVA/MINERVA 2017 - Nov 14th, 2017 - Jerusalem
JudaicaLink
Linked Data in the Jewish Studies FID
Kai Eckert
http://www.judaicalink.org
FID Jewish Studies / Israel Studies
Creation of a specialized information service (Fach-Informations-Dienst) for the domain of Jewish Studies and Israel Studies.
Our part:
● Metadata integration and enrichment.
● Multilingual data matching.
Funding by
Consortium
Portal of Jewish Studies
Goals:
● Create a central access point
● Offer a high-performance information infrastructure
And also,
● Contextualize the digital Judaica collections
● Enrich the metadata
● Connect different data sources as Linked Open Data
The Portal
http://umber.ub.uni-frankfurt.de/judaica/ (Beta version, not yet officially launched!)
Re-Transliteration
● Automatic retro-conversion of romanized Hebrew text
● Improve search facilities for Hebrew speakers
● Needed to match data cross-lingually
Aaron Christianson
Retro-Conversion of Romanized Hebrew Text
lĕqaḥat ṭeqsṭ ʿivrî bĕ-taʿătîq lāṭînî
ולהפוך אותו לאותיות עבריות
(“To take Hebrew text in Latin transliteration and turn it into Hebrew letters.”)
הסיפור המלא
הַסִּפּוּר הַמָּלֵא
(“The full story”, written without and with vowel points.)
Problem Statements
● No Hebrew script until 2011
● Multiple romanization standards
● Ambiguities: the same romanized character can refer to several Hebrew letters.
● Data imported from other catalogs
● The transliterations contain errors (yes, even librarians make mistakes, if rarely).
The Plan
1. Generate all possible original forms in a “stupid” (brute-force) way.
2. Match the output against known Hebrew names and titles.
3. Use the verified matches to train a statistical model on the word/phrase level.
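Steps 1 and 2 of the plan can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the project's implementation: the table `ROMAN_TO_HEBREW` is a tiny hypothetical subset that encodes no real romanization standard, segmentation is greedy longest-match, and the statistical model of step 3 is not shown.

```python
from itertools import product

# Hypothetical toy table: romanized unit -> possible Hebrew letters.
# An empty string means the vowel may be left unwritten in Hebrew.
ROMAN_TO_HEBREW = {
    "sh": ["ש"], "kh": ["כ", "ח"], "ts": ["צ"],
    "s": ["ס", "ש"], "k": ["כ", "ק"], "t": ["ת", "ט"], "h": ["ה", "ח"],
    "l": ["ל"], "m": ["מ"], "n": ["נ"], "r": ["ר"], "d": ["ד"],
    "b": ["ב"], "v": ["ב", "ו"],
    "a": ["", "א", "ה"], "e": ["", "א", "י"], "i": ["", "י"],
    "o": ["", "ו"], "u": ["", "ו"],
}
# Letters that take a special form at the end of a word.
FINAL_FORMS = {"כ": "ך", "מ": "ם", "נ": "ן", "פ": "ף", "צ": "ץ"}
# Try longer units first so "sh" wins over "s" + "h".
UNITS = sorted(ROMAN_TO_HEBREW, key=len, reverse=True)


def segment(word: str) -> list[str]:
    """Split a romanized word into table units, longest match first."""
    units, i = [], 0
    while i < len(word):
        for unit in UNITS:
            if word.startswith(unit, i):
                units.append(unit)
                i += len(unit)
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r} in {word!r}")
    return units


def candidates(word: str) -> set[str]:
    """Step 1: every Hebrew spelling the table allows ("stupid" generation)."""
    options = [ROMAN_TO_HEBREW[u] for u in segment(word.lower())]
    forms = set()
    for combo in product(*options):
        form = "".join(combo)
        if form and form[-1] in FINAL_FORMS:
            form = form[:-1] + FINAL_FORMS[form[-1]]
        if form:
            forms.add(form)
    return forms


def retro_convert(word: str, known: set[str]) -> list[str]:
    """Step 2: keep only candidates found among known Hebrew names/titles."""
    return sorted(candidates(word) & known)


print(retro_convert("shalom", {"שלום", "שלם", "סלים"}))
```

Both שלום and שלם survive the filter here, which is exactly the kind of residual ambiguity that step 3 is meant to resolve.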
Contextualization of Digital Resources
● Find relevant data sources
● Find matching resources
● Extract information
● Add the information to the library collection
Maral Dadvar
It’s all about Labels!
● Labels (strings) are the first thing we search for to generate matching candidates.
● Every additional label for a resource is a possible new entry point for creating a connection.
● Caveat: more labels also create more false positives, so further evidence is needed to establish a link.
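A minimal sketch of label-based candidate generation, assuming the target resources are given as a plain dictionary from a hypothetical resource ID to its labels. Labels are normalized (case-folded, diacritics removed, whitespace collapsed) before lookup; this adds entry points but, as the caveat says, also adds false positives.

```python
import unicodedata
from collections import defaultdict


def normalize(label: str) -> str:
    """Case-fold, drop diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def build_index(resources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Inverted index: normalized label -> IDs of resources carrying it."""
    index = defaultdict(set)
    for rid, labels in resources.items():
        for label in labels:
            index[normalize(label)].add(rid)
    return index


def match_candidates(labels: list[str], index: dict[str, set[str]]) -> set[str]:
    """Every resource sharing at least one normalized label is a candidate."""
    found = set()
    for label in labels:
        found |= index.get(normalize(label), set())
    return found


# Hypothetical target resources and their labels.
index = build_index({
    "r1": ["Aach, Löb", "Löb Aach"],
    "r2": ["Minsk"],
    "r3": ["Pinsk"],
})
print(match_candidates(["aach, lob", "Minsk"], index))
```

The result is only a candidate set; each candidate still needs further evidence before it becomes a link.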
● Make unstructured data sources like online encyclopedias available as structured data
● Identify and collect relevant subsets of general-purpose knowledge bases like DBpedia
● Function as a single hub for the contextualization process
Main Tasks
1. Find new resource descriptions - with labels!
2. Find new labels (and other data) for known resources.
3. Find connections and duplicates within known resources.
4. Make the data available for others to contextualize.
Example 1: YIVO Encyclopedia
What data can we find?
● A title
● Descriptive text
● Links in texts: "Surface form" => Concept
● Pictures
● Descriptions of pictures
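The "surface form => concept" pairs can be collected from an article's HTML with Python's standard `html.parser`. The snippet and the link target below are hypothetical and do not reproduce the actual YIVO markup or URL scheme.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (surface form, link target) pairs from <a href> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only text inside an open link counts as its surface form.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            surface = " ".join("".join(self._text).split())
            if surface:
                self.pairs.append((surface, self._href))
            self._href = None


# Hypothetical article snippet.
html = ('<p>See the <a href="/Poland#Poland_before_1795">Polish-Lithuanian\n'
        '  Commonwealth</a>.</p>')
parser = LinkExtractor()
parser.feed(html)
parser.close()
print(parser.pairs)
```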
Making use of the Surface Forms
The Minsk article links to "Poland before 1795", calling it "Polish-Lithuanian Commonwealth".
"Polish-Lithuanian Commonwealth" is a subsection of "Poland before 1795" in the main article "Poland".
So is "Demography"…
Surface forms are evidence for labels.
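Treating surface forms as label evidence can be sketched as counting how many distinct source articles use a given surface form for a given target, and accepting it as a candidate label above a threshold. The link triples (apart from the Minsk example) and the threshold `min_sources=2` are hypothetical; counting distinct sources rather than raw link occurrences is one possible design choice.

```python
from collections import defaultdict


def label_evidence(links):
    """links: iterable of (source article, surface form, target concept).
    Returns {(target, surface form): number of distinct source articles}."""
    sources = defaultdict(set)
    for source, surface, target in links:
        sources[(target, surface)].add(source)
    return {key: len(srcs) for key, srcs in sources.items()}


def candidate_labels(links, min_sources=2):
    """Surface forms used for a target by at least `min_sources` articles."""
    labels = defaultdict(set)
    for (target, surface), count in label_evidence(links).items():
        if count >= min_sources:
            labels[target].add(surface)
    return dict(labels)


links = [
    ("Minsk", "Polish-Lithuanian Commonwealth", "Poland#Poland_before_1795"),
    ("Minsk", "Polish-Lithuanian Commonwealth", "Poland#Poland_before_1795"),
    ("Vilnius", "Polish-Lithuanian Commonwealth", "Poland#Poland_before_1795"),
    ("Warsaw", "Demography", "Poland#Demography"),
]
print(candidate_labels(links))
```

Note that the targets here are section anchors, not whole articles, which is exactly the situation the slide describes.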
Example 2: Biographisches Handbuch der Rabbiner
● The Biographisches Handbuch der Rabbiner is an online encyclopedia provided by the Salomon L. Steinheim Institute for German-Jewish History at the University of Duisburg-Essen, edited by Michael Brocke and Julius Carlebach.
● The goal of this encyclopedia is to be a complete directory of all rabbis who lived and worked in, or originated from, German-speaking areas since the Age of Enlightenment.
● http://www.steinheim-institut.de/wiki/index.php/Biographisches_Handbuch_der_Rabbiner_%28BHR%29
Available as PDF.
Some notes about PDF sources
● PDF is great for preserving the visual layout of a text across systems.
● In all other respects, in particular regarding access to the content, it is horrible.
● Digital-born PDFs (as in this case) are PDFs that have been created directly by the authoring software.
● Even worse: PDFs created from scans (with OCR).
Biographisches Portal der Rabbiner
Fortunately, some people at the Steinheim Institute created a database from the handbook:
http://www.steinheim-institut.de:50580/cgi-bin/bhr#i0001
The URL above is what the browser shows when you view the entry for Aach, Löb.
Great stuff:
● Semi-structured form of the entry
● “Link” to the PDF by means of volume and page number
● Reference to the entry number as used in the PDF
● A GND number!!!
Not so great:
● We cannot link to the database, as the URL above does not resolve to the article.
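Because each entry carries a GND number, an obvious use is to emit a link from the entry to the GND authority record. A minimal sketch, assuming a hypothetical JudaicaLink namespace for BHR entries, a hypothetical entry ID, a placeholder GND number (not Aach's real one), and `owl:sameAs` as the linking property (`skos:exactMatch` would be an alternative). GND URIs follow the pattern `https://d-nb.info/gnd/<number>`.

```python
import re

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical namespace for BHR entries in JudaicaLink.
BHR_NS = "http://data.judaicalink.org/data/bhr/"
GND_NS = "https://d-nb.info/gnd/"


def gnd_link(entry_id: str, gnd: str) -> str:
    """Return one N-Triples line linking a BHR entry to its GND record."""
    gnd = gnd.strip()
    # Loose format check only; it does not verify the GND check digit.
    if not re.fullmatch(r"[0-9][0-9-]*[0-9X]", gnd):
        raise ValueError(f"unexpected GND number: {gnd!r}")
    return f"<{BHR_NS}{entry_id}> <{OWL_SAMEAS}> <{GND_NS}{gnd}> ."


# Placeholder values for illustration only.
print(gnd_link("aach-loeb", "123456789"))
```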
Solution
The solution was actually already implemented: there is an undocumented way to address an entry directly:
http://steinheim-institut.de:50580/cgi-bin/bhr?id=1
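Turning the browser URL into the addressable form can be sketched as below. This assumes, without confirmation from the source, that the number in the fragment (`#i0001`) equals the `id` parameter (`?id=1`); the two URLs on these slides suggest it, but it is an assumption. The function names are illustrative.

```python
import re
from urllib.parse import urlencode, urlparse

BHR_BASE = "http://steinheim-institut.de:50580/cgi-bin/bhr"


def entry_number(browser_url: str) -> int:
    """Extract the entry number from a fragment such as '#i0001'."""
    fragment = urlparse(browser_url).fragment
    match = re.fullmatch(r"i(\d+)", fragment)
    if match is None:
        raise ValueError(f"no entry fragment in {browser_url!r}")
    return int(match.group(1))


def entry_url(number: int) -> str:
    """Build the directly addressable URL for an entry."""
    return f"{BHR_BASE}?{urlencode({'id': number})}"


print(entry_url(entry_number("http://www.steinheim-institut.de:50580/cgi-bin/bhr#i0001")))
```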
Example 3: DBpedia
Generation of a DBpedia subgraph.
1. Focused Crawling of data sources
a. identify “relevant” resources
b. extract “relevant” information
2. Find matches in the whole dataset
a. extract “relevant” information
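Step 1 can be sketched as a breadth-first crawl that only expands resources judged relevant. A minimal sketch over an in-memory link graph; the graph, the seed, and the relevance test (membership in a hand-made set) are hypothetical stand-ins for real DBpedia lookups and a real relevance criterion.

```python
from collections import deque


def focused_crawl(seeds, neighbors, is_relevant, max_depth=2):
    """Breadth-first crawl that keeps and expands only relevant resources."""
    relevant = set()
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        if not is_relevant(node):
            continue  # irrelevant resources are neither kept nor expanded
        relevant.add(node)
        if depth == max_depth:
            continue
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return relevant


# Hypothetical link graph and relevance criterion.
graph = {
    "Moses_Mendelssohn": ["Haskalah", "Dessau", "Berlin"],
    "Haskalah": ["Naphtali_Herz_Wessely", "Enlightenment"],
    "Berlin": ["Germany"],
}
jewish_topics = {"Moses_Mendelssohn", "Haskalah", "Naphtali_Herz_Wessely"}
result = focused_crawl(["Moses_Mendelssohn"], graph, jewish_topics.__contains__)
print(sorted(result))
```

Because "Berlin" is not considered relevant, its neighbor "Germany" is never visited; this pruning is what keeps the subgraph focused.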
Interlinking
The more data sources we have, the better we can use them to support the linking process.
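One way to see why more sources help: if records that share a key (a normalized label or an identifier) are grouped together, a new source that carries two previously unconnected labels for the same person bridges the two records. A minimal union-find sketch; the record IDs and labels are hypothetical.

```python
from collections import defaultdict


def link_clusters(records):
    """records: dict record ID -> set of keys (labels or identifiers).
    Records sharing any key end up in the same cluster (union-find)."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    owner = {}  # key -> first record seen with that key
    for rid, keys in records.items():
        for key in keys:
            if key in owner:
                parent[find(rid)] = find(owner[key])
            else:
                owner[key] = rid
    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


records = {
    "source_a:aach": {"aach, lob"},
    "source_b:loeb_aach": {"lob aach"},
}
print(link_clusters(records))  # two separate clusters
records["bhr:1"] = {"aach, lob", "lob aach"}
print(link_clusters(records))  # one cluster bridged by the new source
```

The same mechanism also merges homonyms, which is why shared labels alone only yield candidates.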
Architecture and Deployment
Triple store and SPARQL endpoint: Apache Jena Fuseki
Linked Data frontend (URI dereferencing, HTML Views): Pubby (DM2E version)
Static HTML pages of the website: Hugo
Versioning and management: GitHub
Search Access: Elasticsearch (planned)
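Since Fuseki implements the SPARQL 1.1 Protocol, a query is an HTTP GET request with a `query` parameter. The sketch below only builds the request. The endpoint path is a hypothetical placeholder, and the assumption that the YIVO graph uses `skos:prefLabel` for its labels is ours; the graph URI is the one from the YIVO dataset description.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; the actual JudaicaLink endpoint path may differ.
ENDPOINT = "http://data.judaicalink.org/sparql/query"

# Assumes labels are stored as skos:prefLabel in the YIVO graph.
QUERY = """
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?s ?label WHERE {
  GRAPH <http://data.judaicalink.org/data/yivo> { ?s skos:prefLabel ?label }
} LIMIT 10
"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request(ENDPOINT, QUERY)
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` and decoding the body with `json.load` would yield the standard SPARQL JSON results format.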
Dataset Description in Markdown
with Metadata Frontmatter
+++
author = "Kai Eckert"
title = "Yivo Encyclopedia"
website = "http://www.yivoencyclopedia.org"
example = "http://data.judaicalink.org/data/yivo/Moscow"
graph = "http://data.judaicalink.org/data/yivo"
loaded = true
[[files]]
url = "http://data.judaicalink.org/dumps/yivo/current/yivo.n3.gz"
description = "Extraction from YIVO Encyclopedia"
+++
The YIVO Encyclopedia of Jews in Eastern Europe, courtesy of the YIVO Institute for Jewish Research, NY.
<!--more-->
...
The whole website is maintained via GitHub
Every new commit gets pushed to the web server.
The static site generator Hugo then generates all HTML pages.
A Python script then parses the metadata of the pages and loads and unloads the datasets automatically.
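The script itself is not shown on the slides. The core decision it has to make can be sketched as a comparison between what the page metadata declares (`graph` and `loaded`, as in the frontmatter above) and what is currently in the triple store. The function name is hypothetical, the BHR and "old" graph URIs are placeholders, and the actual load and drop calls against Fuseki are omitted.

```python
def plan_actions(datasets, loaded_graphs):
    """datasets: frontmatter dicts with 'graph' and 'loaded' keys.
    loaded_graphs: set of graph URIs currently in the triple store.
    Returns (to_load, to_unload) as sorted lists of graph URIs."""
    wanted = {d["graph"] for d in datasets if d.get("loaded") and "graph" in d}
    to_load = sorted(wanted - loaded_graphs)
    # Unload graphs switched off in a page or whose page was deleted.
    to_unload = sorted(loaded_graphs - wanted)
    return to_load, to_unload


datasets = [
    {"graph": "http://data.judaicalink.org/data/yivo", "loaded": True},
    {"graph": "http://data.judaicalink.org/data/bhr", "loaded": True},
    {"graph": "http://data.judaicalink.org/data/old", "loaded": False},
]
loaded = {"http://data.judaicalink.org/data/yivo", "http://data.judaicalink.org/data/old"}
print(plan_actions(datasets, loaded))
```

Unloading every graph that is not wanted assumes the script manages all graphs in the store; a deployment sharing the store with other data might restrict unloading to graphs declared in some page.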
Advantages:
● No one needs access to the server.
● Write access to the data is easily granted via GitHub.
● Data dumps of all datasets are always available.
● Descriptions, dumps, and loaded data are always in sync.
● The history of the datasets (and of the dumps) is maintained.
● Mistakes can easily be reverted by going back to an earlier commit.
Thank you.
http://slideshare.net/kaiec
http://www.wisslab.org