This webinar in the LOD2 webinar series presents Zemanta and its LODRefine, a LOD-enabled version of OpenRefine (previously Google Refine) that is part of the LOD2 stack. LODRefine extends the cleansing and linking functionality of OpenRefine by providing means to reconcile and augment your data with DBpedia or any other SPARQL endpoint, extract named entities using the Zemanta API, export data in one of the RDF formats, and, more recently, exploit available crowdsourcing services. In the webinar we will walk through several tasks that demonstrate the ease of use and versatility of LODRefine.
If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services and concrete use cases that can be realised using LOD then join us in the free LOD2 webinar series: http://lod2.eu/BlogPost/webinar-series
LOD2 Webinar Series: Zemanta / OpenRefine
1. Creating Knowledge out of Interlinked Data
LOD2 Webinar . 29.11.2011 . Page 1 http://lod2.eu
2. Creating Knowledge out of Interlinked Data
LOD2 is a large-scale integrating project co-funded by the European
Commission within the FP7 Information and Communication Technologies
Work Programme. This 4-year project comprises leading Linked Open
Data technology researchers, companies, and service providers. The
partners come from 12 countries and are coordinated by the Agile
Knowledge Engineering and Semantic Web Research Group at the
University of Leipzig, Germany.
LOD2 will integrate and syndicate Linked Data with existing large-scale
applications. The project shows the benefits in the scenarios of Media and
Publishing, Corporate Data intranets and eGovernment.
3. Creating Knowledge out of Interlinked Data
Once per month, the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle. Stay with us and learn more about acquisition, editing, composing, connected applications, and finally publishing Linked Open Data.
4. Creating Knowledge out of Interlinked Data
LODRefine – LOD-enabled OpenRefine
The tool for cleansing, linking and augmenting data
Mateja Verlič, Zemanta
5. Creating Knowledge out of Interlinked Data
Company
Zemanta brings useful content to bloggers and
connects authors to their peers and publishers
to marketers.
• Content research services
• Content enrichment tools
Our role in LOD2
• Web scale link & text mining from unstructured data
• Tools for cleansing data and crowdsourcing of cleansing
Dr. Mateja Verlič
6. Creating Knowledge out of Interlinked Data
Presentation outline
• Terminology briefing
• Introduction to LODRefine
• The core: OpenRefine
• LOD-friendly extensions
• Demonstration
• Q&A
7. Creating Knowledge out of Interlinked Data
Reconciling
Def: to reconcile
• To reestablish a close relationship between.
• To make compatible or consistent.
(The Free Dictionary)
8. Creating Knowledge out of Interlinked Data
Augmenting / extending
Def: to augment
• To make (something already developed or well under way) greater, as in size,
extent, or quantity
(The Free Dictionary)
9. Creating Knowledge out of Interlinked Data
Crowdsourcing
Def: crowdsourcing
• is the act of outsourcing tasks, traditionally performed by an employee or
contractor, to an undefined, large group of people or community (a crowd),
through an open call.
10. Creating Knowledge out of Interlinked Data
Introduction to LODRefine
LOD-enabled OpenRefine
Google Refine ==> OpenRefine
LODGrefine ==> LODRefine
• Supporting DBpedia (and Freebase)
• Supporting crowdsourcing
• Exporting RDF
• Extracting named entities
11. Creating Knowledge out of Interlinked Data
LODRefine’s place in LOD life cycle
12. Creating Knowledge out of Interlinked Data
OpenRefine
Cross-platform server-client application
• Runs locally
• No dataset
Supports:
• Faceted browsing
• Regular expressions
• GREL expressions
• Extensions
value.split(",")[0].strip()
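The GREL expression above keeps the text before the first comma and trims the surrounding whitespace. As it happens, the same expression is also valid Python, so its effect can be sketched outside OpenRefine. The function name and the sample values below are hypothetical and only serve as illustration.

```python
def first_field(value: str) -> str:
    """Keep the text before the first comma and trim surrounding whitespace."""
    return value.split(",")[0].strip()


# Hypothetical cell values, as they might appear in a messy column.
samples = ["Ljubljana, Slovenia", "  Leipzig ,Germany", "Vienna"]
cleaned = [first_field(s) for s in samples]
print(cleaned)
```

In OpenRefine the expression is applied per cell through a transform dialog; here the list comprehension plays that role.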
13. Creating Knowledge out of Interlinked Data
OpenRefine
14. Creating Knowledge out of Interlinked Data
The Extensions
Extend functionalities of OpenRefine
Developed by
• Zemanta: DBpedia extension, Crowdsourcing
• DERI: RDF Refine
• Free Your Metadata Group: Named Entity Extraction extension
15. Creating Knowledge out of Interlinked Data
RDF Refine extension
Reconciliation and interlinking
• DBpedia
• Any SPARQL Endpoint or RDF dump
• Support for Apache Stanbol
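To make the idea of reconciling against a SPARQL endpoint more concrete, here is a minimal sketch that builds a query for candidate resources by label. It is not how RDF Refine itself works: real reconciliation typically involves fuzzy matching and scoring, whereas this sketch assumes an exact English `rdfs:label` match. The function name, the parameters and the example type URI are assumptions for illustration.

```python
from typing import Optional


def build_label_query(name: str, type_uri: Optional[str] = None, limit: int = 5) -> str:
    """Build a SPARQL query for resources whose English rdfs:label equals `name`."""
    # Escape backslashes and double quotes so the name is a valid SPARQL literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    # Optionally restrict candidates to one class (assumed to be a full URI).
    type_clause = f"  ?candidate a <{type_uri}> .\n" if type_uri else ""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?candidate WHERE {\n"
        f'  ?candidate rdfs:label "{escaped}"@en .\n'
        f"{type_clause}"
        "}\n"
        f"LIMIT {limit}"
    )


# Illustrative values only; the query is built, not sent to any endpoint.
query = build_label_query("Moby-Dick", "http://dbpedia.org/ontology/Book")
print(query)
```

The resulting string could then be sent to any SPARQL endpoint; sending it is left out so that the sketch stays free of network assumptions.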
Exporting RDF
• Defining graph shape before exporting
• Using custom vocabularies or importing existing ones
Webpage: http://refine.deri.ie/
Github: https://github.com/fadmaa/grefine-rdf-extension
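The RDF export described on this slide maps table columns to predicates before writing triples. The sketch below illustrates that idea with plain N-Triples output. It is an assumption-laden simplification, not the extension's implementation: all values are written as plain literals, and the row data, the `to_ntriples` helper and the column-to-predicate mapping are hypothetical.

```python
def escape_literal(value: str) -> str:
    """Escape characters that are not allowed verbatim in an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(rows, subject_key, predicate_map):
    """Emit one triple per non-empty cell, using the row's URI column as subject."""
    lines = []
    for row in rows:
        subject = row[subject_key]
        for column, predicate in predicate_map.items():
            value = row.get(column)
            if not value:
                continue  # skip empty cells instead of writing empty literals
            lines.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# Hypothetical row and mapping (the "graph shape" reduced to column -> predicate).
rows = [{"uri": "http://example.org/book/1", "title": 'The "Example" Book', "author": ""}]
predicates = {
    "title": "http://purl.org/dc/terms/title",
    "author": "http://purl.org/dc/terms/creator",
}
ntriples = to_ntriples(rows, "uri", predicates)
print(ntriples)
```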
16. Creating Knowledge out of Interlinked Data
RDF Refine extension - reconciling
17. Creating Knowledge out of Interlinked Data
DBpedia extension
Extending reconciled data with columns from DBpedia
• RDF extension recommended
Extracting Named Entities using Zemanta API
• API key required
Webpage: http://code.zemanta.com/sparkica
Github: https://github.com/sparkica/dbpedia-extension
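Extending reconciled data with a DBpedia column amounts to fetching one property for each reconciled resource. A rough sketch of such a lookup, assuming a SPARQL 1.1 endpoint that supports `VALUES`, might look as follows. The function name is hypothetical, the URIs are illustrative, and the extension itself may retrieve the data differently.

```python
def build_extend_query(resource_uris, property_uri):
    """Build a SPARQL query that fetches one property for several resources."""
    # VALUES binds ?resource to each reconciled URI in turn (SPARQL 1.1).
    values = " ".join(f"<{uri}>" for uri in resource_uris)
    return (
        "SELECT ?resource ?value WHERE {\n"
        f"  VALUES ?resource {{ {values} }}\n"
        f"  ?resource <{property_uri}> ?value .\n"
        "}"
    )


# Illustrative URIs; the query is only built here, not executed.
query = build_extend_query(
    ["http://dbpedia.org/resource/Moby-Dick"],
    "http://dbpedia.org/ontology/author",
)
print(query)
```

Each returned `?value` would become the cell of the new column in the row whose reconciled URI matches `?resource`.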
18. Creating Knowledge out of Interlinked Data
DBpedia extension – extending data
19. Creating Knowledge out of Interlinked Data
DBpedia extension – extracting entities
20. Creating Knowledge out of Interlinked Data
NER extension
Extracts named entities from unstructured text
Currently supports
• Alchemy API
• DBpedia Lookup
• Zemanta API
API keys required
Webpage: http://freeyourmetadata.org/named-entity-extraction/
Github: https://github.com/RubenVerborgh/Refine-NER-Extension
21. Creating Knowledge out of Interlinked Data
NER extension – extracting entities
22. Creating Knowledge out of Interlinked Data
Crowdsourcing extension
Support for
• Creating new crowdsourcing jobs
• Publishing data on CrowdFlower service
• Multiple labor channels (Amazon MT)
• CrowdFlower API key required
Job templates
• Evaluating reconciliation results
• Finding information (e.g. URLs)
Webpage: http://code.zemanta.com/sparkica/
Github: https://github.com/sparkica/crowdsourcing
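A crowdsourcing job for evaluating reconciliation results needs the rows to be judged in a tabular form. The sketch below serialises such rows to CSV with Python's standard library. The column names and sample row are hypothetical, and the actual upload to CrowdFlower (which requires an API key) is deliberately not shown, since the extension handles it.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise rows (dicts) to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical unit of work: a name and the candidate link a worker should judge.
rows = [{"name": "Player A", "candidate_url": "http://example.org/player-a"}]
csv_text = rows_to_csv(rows, ["name", "candidate_url"])
print(csv_text)
```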
23. Creating Knowledge out of Interlinked Data
Crowdsourcing extension – create job from template
24. Creating Knowledge out of Interlinked Data
Crowdsourcing extension – upload data
25. Creating Knowledge out of Interlinked Data
Availability of LODRefine & extensions
27. Creating Knowledge out of Interlinked Data
Demonstration
Top 50 summer books by Forbes
• Creating project
• Preparing data
• Reconciling, extending data with DBpedia
Reconciliation evaluation for NHL players (links extracted from blogs)
• Create crowdsourcing job from template
• Upload data to CrowdFlower
28. Creating Knowledge out of Interlinked Data
Contact
Zemanta
Celovska 32, SI-1000 Ljubljana, Slovenia
Presenter
Mateja Verlič
Email: mateja.verlic@zemanta.com
Twitter: @sparkica
Skype: mverlic
LODRefine and extensions – resources
LODRefine
Webpage: http://code.zemanta.com/sparkica
Github: https://github.com/sparkica/OpenRefine/tree/lodrefine
Extensions
DBpedia extension: https://github.com/sparkica/dbpedia-extension
Crowdsourcing extension: https://github.com/sparkica/crowdsourcing
Refine-stats extension: https://github.com/sparkica/refine-stats
Utilities extension: https://github.com/sparkica/utilities
Other extensions – resources
RDF extension
Webpage: http://refine.deri.ie/
Github: https://github.com/fadmaa/grefine-rdf-extension
NER extension
Webpage: http://freeyourmetadata.org/named-entity-extraction/
Github: https://github.com/RubenVerborgh/Refine-NER-Extension
LOD2 project & Webinars
LOD2 project: http://lod2.eu
Webinar series: http://lod2.eu/BlogPost/webinar-series
OpenRefine Resources
Google Group: https://groups.google.com/forum/#!forum/openrefine
Github: https://github.com/OpenRefine/OpenRefine/
Wiki: https://github.com/OpenRefine/OpenRefine/wiki
Thanks for your attention!
29. Creating Knowledge out of Interlinked Data
Credits
Jingle: R.E.M., Martin Kaltenböck, Florian Kondert
Coordination: Thomas Thurner, Martin Kaltenböck
Moderation: Martin Kaltenböck
Presented by: Mateja Verlič
30. Creating Knowledge out of Interlinked Data
Hope you enjoyed staying with us. If you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations!
Don’t forget to register for our next webinar:
26.02.2013 – DBpedia Spotlight (University of Mannheim)
27.03.2013 – CKAN and publicdata.eu (Open Knowledge Foundation)
Have a great day and don’t forget ...
http://lod2.eu