9. Data Discovery: Vocabularies
FOAF – Friend of a Friend:
• A Semantic Web vocabulary used to describe people,
their activities and their relationships to one
another.
• It is becoming popular: people who discover it often
set up their own FOAF profile.
• It serves as a base that other vocabularies
extend.
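A minimal sketch of what a FOAF profile can look like, written out as Turtle from plain Python. The helper name `foaf_profile_turtle` and the example.org URIs are assumptions for illustration; only the FOAF terms themselves (foaf:Person, foaf:name, foaf:mbox, foaf:knows) come from the vocabulary.

```python
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def foaf_profile_turtle(person_uri, name, mbox=None, knows=()):
    """Return a Turtle document describing one foaf:Person (hypothetical helper)."""
    # Escape backslashes and quotes so the name is a valid Turtle literal.
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    pairs = ["a foaf:Person", 'foaf:name "%s"' % safe_name]
    if mbox:
        pairs.append("foaf:mbox <mailto:%s>" % mbox)
    # foaf:knows links this person to other people by URI.
    for friend_uri in knows:
        pairs.append("foaf:knows <%s>" % friend_uri)
    body = " ;\n    ".join(pairs)
    return "@prefix foaf: <%s> .\n\n<%s>\n    %s .\n" % (FOAF_NS, person_uri, body)


# Example URIs are placeholders, not real profiles.
profile = foaf_profile_turtle(
    "http://example.org/people/alice#me",
    "Alice Example",
    mbox="alice@example.org",
    knows=["http://example.org/people/bob#me"],
)
print(profile)
```

Publishing the resulting text at a stable URL is the usual way to make such a profile discoverable.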
15. Data Catalogs
• A community-maintained registry of data catalogs exists
• Contains 362 data catalogs (and growing)
• Based on CKAN data catalog platform
http://datacatalogs.org/
17. What is CKAN?
• Metadata repository with crowd-sourcing enabled
• Everybody can register and publish data about their datasets
• Developer-friendly web application
• Provides a well-documented API
• Easy to install, easy to use as your own metadata
repository
23. CKAN API
• Well-documented
• http://docs.ckan.org/en/latest/api.html
• Covers everything you can do with the web interface
• You can write your own web interface
• OKFN-maintained library for accessing the API:
ckanclient (Python)
24. CKAN API: Methods
• Retrieving data
• Creating new data
• Updating existing data
• Deleting existing data
• Data means: packages, resources, groups, tags, users, etc.
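The four kinds of call can be sketched as plain HTTP requests, assuming the action API (version 3) described in the CKAN documentation. The helper `build_action_request`, the instance URL and the API key are hypothetical; the requests are only built here, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_action_request(base_url, action, params=None, payload=None, api_key=None):
    """Build (but do not send) a request for a CKAN action API call."""
    url = "%s/api/3/action/%s" % (base_url.rstrip("/"), action)
    headers = {}
    if api_key:
        # CKAN reads the API key from the Authorization header.
        headers["Authorization"] = api_key
    if payload is not None:
        # Write actions take a JSON body, which makes this a POST request.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
        return urllib.request.Request(url, data=data, headers=headers)
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers=headers)


BASE = "http://ckan.example.org"  # hypothetical CKAN instance
KEY = "my-api-key"                # hypothetical API key

retrieve = build_action_request(BASE, "package_show", params={"id": "my-dataset"})
create = build_action_request(BASE, "package_create",
                              payload={"name": "my-dataset"}, api_key=KEY)
# package_update replaces the stored dataset with the submitted dict.
update = build_action_request(BASE, "package_update",
                              payload={"id": "my-dataset", "name": "my-dataset",
                                       "title": "New title"}, api_key=KEY)
delete = build_action_request(BASE, "package_delete",
                              payload={"id": "my-dataset"}, api_key=KEY)

for request in (retrieve, create, update, delete):
    print(request.get_method(), request.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; the same pattern applies to the other data types, for example `group_show` or `tag_list`.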
25. CKAN API: Examples
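from ckanclient import CkanClient

def list_resource_formats(ckan_api_url, ckan_api_key):
    # Return the sorted, distinct resource formats across all packages.
    ckan = CkanClient(base_location=ckan_api_url,
                      api_key=ckan_api_key)
    package_list = ckan.package_list()
    formats = []
    for package in package_list:
        # Each package is expected to be a dict with a 'resources' list.
        resource_list = package['resources']
        for resource in resource_list:
            if resource['format'] not in formats:
                formats.append(resource['format'])
    return sorted(formats)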
https://github.com/okfn/ckanclient
26. Use Case: CSV2RDF Conversion
• Framework for CSV2RDF conversion
• Crowd-sourcing enabled
• RDF Visualizations
https://github.com/earthquakesan/CSV2RDF-WIKI
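To give a feel for what a CSV-to-RDF conversion involves, here is a minimal sketch that turns each CSV row into N-Triples. It is not the CSV2RDF-WIKI implementation; the function name, the example.org namespace and the "one key column, one triple per cell" mapping are all assumptions.

```python
import csv
import io
from urllib.parse import quote

BASE_URI = "http://example.org/"  # hypothetical namespace for generated URIs


def csv_to_ntriples(csv_text, key_column):
    """Map each CSV row to a subject and each other non-empty cell to one triple."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<%sresource/%s>" % (BASE_URI, quote(row[key_column]))
        for column, value in row.items():
            if column == key_column or not value:
                continue  # skip the key itself and empty cells
            predicate = "<%sproperty/%s>" % (BASE_URI, quote(column))
            # Escape backslashes and quotes for the N-Triples literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append('%s %s "%s" .' % (subject, predicate, literal))
    return "\n".join(lines)


sample = "id,city,spending\n1,Leipzig,1200\n2,Berlin,\n"
print(csv_to_ntriples(sample, "id"))
```

A real conversion would also map columns to existing vocabulary terms and type the literals, which is where a crowd-sourced mapping becomes useful.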
30. Data Conversion
• Structured: Relational Databases
• Semi-structured: XML, HTML, XLS, CSV, APIs
• Unstructured: Raw text
PublicData.eu Statistics
31. Data Conversion
• Data sources: XML, RDB, spreadsheet
• How does government spending in certain sectors
relate to my company’s earnings?
• How does the historic spending relate to the
current figures?
• Give me a report about all of my customers across
the whole organization
33. Merging data with RDF
• Data sources: XML, RDB, spreadsheet
• Once in RDF:
• Easily integrate your data
• Concepts can be mapped to one another
• Query everything with one W3C standard
language (SPARQL)
35. Merging Data with RDF: Example
• Red App has model
• Need to integrate Red & Blue models
36. Merging Data with RDF: Example
• Step 1: Merge RDF
• Same nodes (URIs) join automatically
37. Merging Data with RDF: Example
• Step 2: Add relationships and rules
• (Relationships are also RDF)
38. Merging Data with RDF: Example
• Step 3: Define Green model
• (Making use of Red & Blue models)
39. Merging Data with RDF: Example
• What the Blue app sees:
• No difference!
40. Merging Data with RDF: Example
• What the Red app sees:
• No difference!
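The merge steps can be sketched with triples held in plain Python sets. This is a toy model, not an RDF library; the namespaces, property names and helper functions are hypothetical.

```python
RED = "http://example.org/red/"    # hypothetical Red app namespace
BLUE = "http://example.org/blue/"  # hypothetical Blue app namespace
OWL_EQUIVALENT = "http://www.w3.org/2002/07/owl#equivalentProperty"
ALICE = "http://example.org/people/alice"  # URI used by both apps

red_graph = {(ALICE, RED + "customerName", "Alice")}
blue_graph = {(ALICE, BLUE + "accountBalance", "250")}

# Step 1: merging is a set union; triples about the same URI join up.
merged = red_graph | blue_graph

# Step 2: a relationship between the two models is itself a triple.
merged.add((RED + "customerName", OWL_EQUIVALENT, BLUE + "holderName"))


def describe(graph, subject):
    """Collect every predicate/value pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}


def app_view(graph, namespace):
    """Return only the triples whose predicate belongs to one app's model."""
    return {t for t in graph if t[1].startswith(namespace)}


# Step 3: a combined view can draw on both models at once.
print(describe(merged, ALICE))
# Each app still sees exactly the triples it had before the merge.
print(app_view(merged, RED) == red_graph, app_view(merged, BLUE) == blue_graph)
```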
41. RDF helps bridge other formats/models
• Producers and consumers may use different formats/models
• Rules can specify transformations
• Inference engine finds path to desired result model
[Figure: models A1, A2, A3, B1, B2, C1, C2 and X, Y, Z connected through a central RDF model by transforms driven by ontologies and rules]
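A rough sketch of how rules can carry data from a producer's model to a consumer's model: each rule rewrites one predicate into another, and a simple forward-chaining loop keeps applying rules until nothing new is derived. The prefixes `a:`, `core:` and `x:` and the rule set are invented for illustration; a real inference engine supports far richer rules.

```python
def apply_rules(triples, rules):
    """Forward-chain predicate-rewriting rules until no new triple appears."""
    derived = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(derived):
            target = rules.get(p)
            if target and (s, target, o) not in derived:
                derived.add((s, target, o))
                changed = True
    return derived


# Hypothetical rules: producer model "a" -> shared model "core" -> consumer model "x".
rules = {"a:fullName": "core:name", "core:name": "x:label"}
facts = {("ex:alice", "a:fullName", "Alice")}

result = apply_rules(facts, rules)
print(sorted(result))
```

The consumer's `x:label` triple is reached in two steps, which is the "path to the desired result model" in miniature.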
54. Raw Text Processing: ConTEXT
• No installation and configuration required.
• Access content from a variety of sources.
• Instantly show the results of text analysis to users in a variety
of visualizations.
• Allow refinement of automatic annotations and take feedback
into account.
• Provide a generic architecture where different modules for
content acquisition, natural language processing and
visualization can be plugged together.
http://rdface.aksw.org/nlp/hub.php
57. Definition
• In general, integration of multiple information systems
aims at combining selected systems so that they form
a unified new whole and give users the illusion of
interacting with one single information system
59. Federated SPARQL Queries
• Query processing involving multiple distributed data
sources, e.g. Linked Open Data cloud
• Example: query both the DBpedia and New York Times data
collections in an integrated way
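One way to express such a query is the SERVICE keyword of SPARQL 1.1 Federated Query. The sketch below only builds the query text and the request URL; the New York Times endpoint and the local engine URL are placeholders, and the choice of owl:sameAs and dbo:abstract is an assumption about how the two collections link up.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
NYT_ENDPOINT = "http://example.org/nytimes/sparql"  # hypothetical endpoint

QUERY_TEMPLATE = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?topic ?entity ?abstract WHERE {
  SERVICE <%(nyt)s> { ?topic owl:sameAs ?entity . }
  SERVICE <%(dbpedia)s> { ?entity dbo:abstract ?abstract . }
}
LIMIT 10"""


def build_query():
    """Fill both endpoint URLs into the federated query text."""
    return QUERY_TEMPLATE % {"nyt": NYT_ENDPOINT, "dbpedia": DBPEDIA_ENDPOINT}


def federated_query_url(engine_endpoint):
    """Return the GET URL that submits the query to one SPARQL engine."""
    # The SPARQL protocol passes the query text in the 'query' parameter.
    return engine_endpoint + "?" + urlencode({"query": build_query()})


# The engine URL is a placeholder for any SPARQL 1.1 endpoint that supports SERVICE.
print(federated_query_url("http://localhost:3030/ds/sparql"))
```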
60. Federated Query Processing
• Federation mediator at the server
• Virtual integration of (remote) data sources
• Communication via SPARQL protocol
[Figure: a query goes to the federation mediator, which talks to three SPARQL data sources]
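The mediator idea can be sketched in a few lines: one query comes in, is forwarded to every source, and the answers are combined. In-memory functions stand in for remote SPARQL endpoints here, and the class and helper names are hypothetical; real engines also do source selection and join planning.

```python
def make_source(triples):
    """Stand-in for a remote SPARQL data source: answers one triple pattern."""
    def match(pattern):
        # None in the pattern acts as a wildcard for that position.
        return [t for t in triples
                if all(want is None or want == got for want, got in zip(pattern, t))]
    return match


class FederationMediator:
    """Forward a pattern to every registered source and union the answers."""

    def __init__(self, sources):
        self.sources = sources

    def query(self, pattern):
        results = set()
        for source in self.sources:
            results.update(source(pattern))
        return sorted(results)


dbpedia = make_source({("dbpedia:New_York_City", "rdfs:label", "New York City")})
nyt = make_source({("nyt:nyc", "owl:sameAs", "dbpedia:New_York_City")})
mediator = FederationMediator([dbpedia, nyt])

everything = mediator.query((None, None, None))
labels = mediator.query((None, "rdfs:label", None))
print(everything)
print(labels)
```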
61. Federated Query Engines
Engine Name   Implementation language   License
FedX          Java                      GNU AGPL
SPLENDID      Java                      LGPL
LHD           Java                      MIT
DARQ          Java                      GPL
ANAPSID       Python                    GNU GPL
ADERIS        Java                      Apache