2. Fusepool final public workshop!
Brussels, June 25th – Johannes Hercher, Free University Berlin, University Library
Context
I care for metadata Ugh!
Your OPAC sucks
We cooperate…
How to link library data with the „Oceans“ of the WWW?
The German National Library published authority data
Example
a search in subject index (with GND Identifiers)
a search in full text http://primo.fu-berlin.de
• GND = thesaurus for subject indexing in Germany
• Search with GND is limited to local resources
• Search beyond the local holdings => easier, more reliable
• Suggest content using semantic relations (GND is a thesaurus!)
You* should use identifiers
*publishers, authors, aggregators
Reality: assigning IDs is time consuming
Vision: assigning IDs is fun
Questions & Tasks
• Could machines do the subject indexing?
-> Use SMA to enrich DBpedia pages with GND IDs
• Can we support librarians in subject indexing?
-> Build an annotator prototype
https://github.com/jhercher/LEE/
Demonstrator
AnnotatorApp: filters stopwords and displays library entities for your text
Review concepts and start a search using concept IDs
https://github.com/jhercher/LEE
How to Fusepool
Workflow
1. Select a subset of GND Subject Headings using SPARQL
2. Import Subject Headings
3. Configure SMA dictionary component
4. Import documents (Graph)
5. Batch matching of documents with dictionaries using Fusepool's DLC
6. Review results and build services on top
http://zbw.eu/beta/sparql/gnd
http://d-nb.info/standards/elementset/gnd
NomenclatureInBiologyOrChemistry
SubjectHeadingSensoStricto
ProductNameOrBrandName
HistoricSingleEventOrEra
EthnographicName
GroupOfPersons
SubjectHeading
Language
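Step 1 of the workflow (selecting a subset of GND subject headings) could be sketched roughly as follows. The script only builds the SPARQL query text for the classes listed above; it does not contact the endpoint. The namespace with a trailing `#`, the use of `gndo:preferredNameForTheSubjectHeading` as the label property, and the helper name `build_query` are assumptions made for illustration, not taken from the slides.

```python
# Assumed namespace of the GND ontology (the slide gives the URL without '#').
GNDO = "http://d-nb.info/standards/elementset/gnd#"

# Classes listed on the slide.
CLASSES = [
    "NomenclatureInBiologyOrChemistry",
    "SubjectHeadingSensoStricto",
    "ProductNameOrBrandName",
    "HistoricSingleEventOrEra",
    "EthnographicName",
    "GroupOfPersons",
    "SubjectHeading",
    "Language",
]


def build_query(classes, limit=None):
    """Return a SPARQL SELECT query for concepts of the given GND classes.

    Hypothetical helper: the label property is an assumption and may need
    to be adapted to the data actually served by the endpoint.
    """
    values = " ".join(f"gndo:{name}" for name in classes)
    lines = [
        f"PREFIX gndo: <{GNDO}>",
        "SELECT ?concept ?label WHERE {",
        f"  VALUES ?type {{ {values} }}",
        "  ?concept a ?type ;",
        "           gndo:preferredNameForTheSubjectHeading ?label .",
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Paste the printed query into the SPARQL endpoint named on the slide.
    print(build_query(CLASSES, limit=10))
```

The resulting concept/label pairs would then serve as the dictionary imported in steps 2 and 3.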
http://localhost:8080/admin/graphs/
Results
<http://de.dbpedia.org/resource/Wilder_Streik_bei_Ford_(1973)>
<http://purl.org/dc/elements/1.1/subject>
<http://d-nb.info/gnd/7708211-4> , # Drug-eluting Stent(syn: DES)
<http://d-nb.info/gnd/4302110-4> , # Ford
<http://d-nb.info/gnd/4578282-9> , # sich [„self“@en]
<http://d-nb.info/gnd/4248646-4> , # Spitzel [„spy“@en] (syn: IM)
<http://d-nb.info/gnd/4389837-3> , # August (month)
<http://d-nb.info/gnd/4291333-0> , # Niederlage [„defeat“@en]
<http://d-nb.info/gnd/4002623-1> . # Arbeitnehmer [„employee“@en]
• GND Dictionary includes: articles, prepositions, adjectives…
• Acronyms („IM, DES“) -> activate „Case Sensitivity“
• Not every match is useful in the context („August, Defeat“)
http://localhost:8080/graph?name=urn:x-localinstance:/dlc/{yourDataset}/enhance.graph
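The three observations on this slide can be reproduced with a toy dictionary matcher. This is a minimal sketch and not Fusepool's SMA component: the function `match`, its parameters, and the example sentence are hypothetical, while the labels and GND identifiers are the ones from the result graph on this slide.

```python
import re

# Labels and GND identifiers from the result graph; the acronym synonyms
# IM and DES are modelled as separate dictionary entries.
DICTIONARY = {
    "Ford": "4302110-4",
    "sich": "4578282-9",
    "Spitzel": "4248646-4",
    "IM": "4248646-4",
    "DES": "7708211-4",
    "August": "4389837-3",
    "Niederlage": "4291333-0",
    "Arbeitnehmer": "4002623-1",
}


def match(text, dictionary, stopwords=frozenset(), acronyms_case_sensitive=False):
    """Return GND identifiers whose label occurs as a token in the text.

    Labels are matched case-insensitively. With acronyms_case_sensitive=True,
    all-uppercase labels (e.g. "IM") only match tokens written identically.
    """
    acronyms = {label: gnd for label, gnd in dictionary.items() if label.isupper()}
    folded = {
        label.lower(): gnd
        for label, gnd in dictionary.items()
        if not (acronyms_case_sensitive and label.isupper())
    }
    hits = []
    for token in re.findall(r"\w+", text):
        if acronyms_case_sensitive and token in acronyms:
            gnd = acronyms[token]
        elif token.lower() in stopwords:
            continue  # drop function words before the dictionary lookup
        else:
            gnd = folded.get(token.lower())
        if gnd and gnd not in hits:
            hits.append(gnd)
    return hits


if __name__ == "__main__":
    # Hypothetical example sentence, not taken from the DBpedia article.
    text = "Im August streikten die Arbeitnehmer bei Ford, des Weiteren wehrten sie sich."
    print(match(text, DICTIONARY))
    print(match(text, DICTIONARY, stopwords={"sich"}, acronyms_case_sensitive=True))
```

In the first call "Im" and "des" are matched against the acronyms IM and DES, and "sich" is matched as well. In the second call those three matches disappear, while "August" is still returned, which mirrors the point that not every match is useful in its context.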
Prototype: GND Annotator
Persons, Locations, Topics, Time
Manual evaluation only for Topics
Manual judgements: ok, ok, not relevant, false, not relevant, ok, not relevant
human (found in GND) = 1
SMA GND suggestions = 7
SMA correct = 3
SMA false = 1
precision = 33%
recall = 100%
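The slides do not state how the precision and recall figures were derived from these counts. For reference, the usual set-based definitions look like the sketch below; the function name and the identifiers in the example are hypothetical, and the slide's own figures may rest on a different counting scheme.

```python
def precision_recall(suggested, relevant):
    """Standard set-based precision and recall.

    suggested: identifiers proposed by the matcher
    relevant:  identifiers a human indexer considers correct
    """
    suggested, relevant = set(suggested), set(relevant)
    true_positives = suggested & relevant
    precision = len(true_positives) / len(suggested) if suggested else 0.0
    recall = len(true_positives) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    suggested = {"id-a", "id-b", "id-c", "id-d"}
    relevant = {"id-a", "id-b", "id-e"}
    print(precision_recall(suggested, relevant))
```

Precision is the share of suggestions that are relevant; recall is the share of relevant identifiers that were suggested.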
Results (1)
Recall: 78%
Precision: 73%
Results (2)
Recall: 90%
Precision: 72%
http://primo.kobv.de/docId=TN_thieme_articles10.1055/s-0029-1237743
Fusepool in the wild (1)
no exact string match
chemical term
geographic
financial
education
too broad
Fusepool in the wild (2)
Abstract
Reviews
TOC
ISBN: 9783642371103
Drawback: the quality of the annotations depends on the text input
Feedback
Why Fusepool?
1. Ready for the Semantic Web
• Can handle graphs (Clerezza, TDB, …)
• Data I/O using REST
2. String Matching SMA
• Import & configuration of dictionaries (e.g. a thesaurus)
• Batch matching & annotation using Data Life Center (DLC)
3. Easy to install
• Builds at http://jenkins.fusepool.info
Conclusion
• Fusepool: Infrastructure to build new services
• … better linking beyond the aquarium(s)
• TODO:
• build tailored interfaces for annotation, search, recommender
• improve the dictionaries
Thank You!
twitter: @jhercher
github: https://github.com/jhercher/
mail: hercher@ub.fu-berlin.de