This document discusses interaction with linked data, focusing on visualization techniques. It begins with an overview of the linked data visualization process, including extracting data analytically, applying visualization transformations, and generating views. It then covers challenges like scalability, handling heterogeneous data, and enabling user interaction. Various visualization techniques are classified and examples are provided, including bar charts, graphs, timelines, and maps. Finally, linked data visualization tools and examples using tools like Sigma, Sindice, and Information Workbench are described.
3. Motivation: Music! (2)
EUCLID – Interaction with Linked Data 3
• Our aim: build a music-based portal using Linked
Data technologies
• So far, we have studied different mechanisms to
consume Linked Data:
• Executing SPARQL queries
• Dereferencing URIs
• Downloading RDF dumps
• Extracting RDFa data
• The output of these mechanisms corresponds to
data in machine-readable formats
CH 2
CH 3
CH 1
5. Visualization techniques are needed in order to
transform the machine-readable data into this:
Motivation: Music! (4)
Source: http://musicbrainz.fluidops.net/
6. In addition, visualization techniques allow for:
Motivation: Music! (5)
• Telling a story
• Engaging our pattern matching
brain
• Identifying data characteristics
which cannot be directly inferred
from statistical properties:
• Anscombe’s quartet: four very different datasets
with nearly identical summary statistics.
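The point about Anscombe’s quartet can be checked numerically. The sketch below uses the published quartet values and only the Python standard library; the helper name `summarize` is an illustrative choice, not part of the original slides.

```python
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return (mean x, variance x, mean y, variance y), rounded to 2 decimals."""
    return tuple(round(v, 2) for v in (mean(xs), variance(xs), mean(ys), variance(ys)))


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, stats)
```

The four summaries come out almost identical, although the datasets look completely different once plotted. That difference is exactly what a visualization reveals and the statistics hide.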
Image: http://en.wikipedia.org/wiki/Anscombe's_quartet
Source: Donaldson, I. and Lamere P. Using Visualizations for Music Discovery
Image: Chan W., Qu. H, Mak, W. Visualizing the
Semantic Structure in Classical Musical Works.
7. Agenda
1. Linked Data visualization
2. Linked Data search
3. Methods for Linked Data analysis
9. LD Visualization Techniques
• Linked Data visualization techniques should provide
graphical representations of the information within
the LD datasets
• Visualization techniques should be selected
according to:
– The type of data: Specific types of data should be
visualized in a certain way
– The purpose of the visualization: Depending on the type
of analysis/application to employ
10. LD Visualization Techniques (2)
• (Raw) RDF data: Instance data, taxonomies,
ontologies, vocabularies.
• Analytically extracted data: A subset of
the data, called the region of interest (ROI),
obtained via data extraction mechanisms, for
example, SPARQL queries.
• Visualization abstraction: It is obtained by
applying visualization transformations to render the
data into displayable information.
• View: Final result. The visual mapping
transformations obtain a graphic representation of
the data using the selected visualization technique.
• User interaction: The user interacts (click,
zoom, etc.) with the visualization, which may trigger
a new visualization process.
Figure: Overview of the Linked Data visualization process. RDF data → (data extraction) → analytically extracted data → (visualization transformation) → visualization abstraction → (visual mapping transformation) → view. Optional user interaction can trigger the process again.
Process partially based on: Brunetti, J.M.; Auer, S.; García, R. The Linked Data Visualization Model.
11. LD Visualization Techniques (3)
Example of the Linked Data visualization process
1. Data extraction (RDF data → analytically extracted data)
SPARQL query: retrieve the number of releases of The Beatles per country
SELECT ?country (COUNT(?release) AS ?releases)
WHERE {
  <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
  ?release a mo:Release ;
           mo:label ?label .
  ?label foaf:based_near ?country .
}
GROUP BY ?country
ORDER BY DESC(?releases)
Result:
country releases
United Kingdom 225
United States 140
Germany 30
Luxembourg 29
…
2. Visualization transformation (analytically extracted data → visualization abstraction)
Formatting the names of the countries:
?country_code2 := REPLACE(str(?country), "http://ontologi.es/place/", "", "i")
?country_code := REPLACE(?country_code2, "%", "", "i")
Result:
country_code releases
GB 225
US 140
DE 30
LU 29
…
3. Visual mapping transformation (visualization abstraction → view)
Selecting the visualization technique (input, output):
#widget : HeatMap |
input = 'country_code' |
output = {{ 'releases' }}
The two transformations can be performed in a single step.
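The stages of this example can be sketched end to end in a few lines. This is a minimal illustration, not the Information Workbench implementation: the rows are hypothetical stand-ins for the query result (the country URIs are assumed from the prefix removed in the formatting step), the function names are invented, and a text bar chart replaces the heat map widget.

```python
PLACE_PREFIX = "http://ontologi.es/place/"

# Hypothetical result rows, shaped like the output of the SPARQL query.
ROWS = [
    {"country": PLACE_PREFIX + "GB", "releases": 225},
    {"country": PLACE_PREFIX + "US", "releases": 140},
    {"country": PLACE_PREFIX + "DE", "releases": 30},
    {"country": PLACE_PREFIX + "LU", "releases": 29},
]


def to_abstraction(rows):
    """Visualization transformation: turn country URIs into country codes.

    Mirrors the two REPLACE calls, except that str.replace is case-sensitive
    while the SPARQL calls use the case-insensitive "i" flag.
    """
    table = []
    for row in rows:
        code = row["country"].replace(PLACE_PREFIX, "").replace("%", "")
        table.append({"country_code": code, "releases": row["releases"]})
    return table


def to_view(table, input_key, output_key, width=40):
    """Visual mapping transformation: render one text bar per input value."""
    top = max(row[output_key] for row in table)
    lines = []
    for row in table:
        bar = "#" * max(1, round(width * row[output_key] / top))
        lines.append(f"{row[input_key]:<4}{bar} {row[output_key]}")
    return lines


abstraction = to_abstraction(ROWS)
view = to_view(abstraction, input_key="country_code", output_key="releases")
print("\n".join(view))
```

The `input_key` and `output_key` parameters play the role of the widget's `input` and `output` settings: they name the column that is plotted and the column that supplies the values.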
13. Challenges for
Linked Data Visualization
• Enabling user interaction
– Users must be able to navigate through the data by exploiting the
connections between Linked Data resources
– The user might edit the underlying data to enrich it by:
• Creating additional metadata
• Highlighting or correcting errors
• Validating data
• Supporting data reusability
– The output (the plotted data or the visualization itself) might be
encoded using standard ontologies and vocabularies
• Scalability
– Linked Data visualization techniques should support the display of
large amounts of data in an efficient way
14. Challenges for
Linked Open Data Visualization
• Extracting data from different repositories
– A Linked Data set might be partitioned into several repositories
– The region of interest (ROI) might include data from different data
sets, requiring the access to distributed repositories
• Handling heterogeneous data
– The same data (concepts) might be modeled differently, for example,
using different vocabularies
– Certain values might have different formats, for example, dates
represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY
• Dealing with missing values
– Due to the semi-structuredness of Linked Data, some instances might
have missing values for certain properties
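Heterogeneous date formats and missing values can be handled in one normalization step before plotting. The sketch below is an assumption-laden illustration: `parse_year` and the format list are invented for this example and cover only the three formats named above.

```python
from datetime import datetime

# The three formats mentioned above: DD-MM-YYYY, MM-DD-YYYY and YYYY.
FORMATS = ["%d-%m-%Y", "%m-%d-%Y", "%Y"]


def parse_year(value):
    """Return the year as an int, or None for missing or unparseable values."""
    if value is None or not value.strip():
        return None  # missing value: let the caller decide how to plot it
    for fmt in FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).year
        except ValueError:
            continue  # try the next known format
    return None


raw_values = ["25-12-1969", "12-25-1969", "1969", None, "unknown"]
years = [parse_year(v) for v in raw_values]
print(years)
```

Only the year is extracted on purpose. A value such as 03-04-1969 is ambiguous between DD-MM and MM-DD, but the year is the same under both readings, so a timeline at year granularity stays correct.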
15. Classification of
Visualization Techniques
Task: Comparison of attributes / values
• Bar/column and pie chart
• Line charts
• Histogram
Task: Analysis of relationships and hierarchies
• Graph
• Arc diagram
• Matrix
• Node-link visualizations
• Space-filling techniques: treemaps, icicles and sunburst, circle packing and rose diagrams
Task: Analysis of temporal or geographical events
• Timeline
• Maps
Task: Analysis of multi-dimensional data
• Parallel coordinates
• Radar/star chart
• Scatter plot
16. Comparison of Attributes / Values
Bar/column chart
Allows the comparison of values of different categories.
Pie chart
Useful for comparing percentages or proportions.
Line chart
Allows visualizing data as a series of data points, where the measurement points (x-axis) are ordered.
Histogram
Graphical representation of the distribution of the data.
Image sources: http://mbostock.github.io/protovis/ and http://musicbrainz.fluidops.net
17. Analysis of Relationships and Hierarchies
Graph
The data entries are represented as nodes and the links as edges.
Arc diagram
The nodes are displayed in one dimension, and the arcs represent the connections.
Adjacency matrix diagram
The nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix.
Node-link visualizations
The data is organized in hierarchies.
Source of images: http://mbostock.github.io/protovis/
18. Analysis of Relationships and Hierarchies (2)
Space-filling techniques
Treemaps
Subdivide the area into rectangles.
Icicles and sunburst
Hierarchies are represented by adjacencies.
Circle packing
Containment is used to represent the hierarchies.
Rose diagrams
Sectors have equal angles, and the data is represented by how far each sector extends.
Source of images: http://mbostock.github.io/protovis/
19. Analysis of Temporal or Geographical Events
Timeline
• Discrete data points in time
• Continuous data in time
Sources: http://www.kottke.org/08/08/2008-movie-box-office-chart, http://musicbrainz.fluidops.net
Maps
• Choropleth maps: aggregate data by geographical area
• Location maps: display geo-points on a map
• Dorling cartograms: aggregate data and replace each area with a circle
Sources: http://mbostock.github.io/protovis/, Google Map API, http://musicbrainz.fluidops.net
20. Analysis of Multidimensional Data
Scatter plot
Displays the relationship between two variables as points positioned along two axes.
Radar/star chart
Displays multivariate data as a two-dimensional chart. The axes correspond to the variables.
Parallel coordinates
Allows visualizing high-dimensional data.
Each vertical axis denotes a dimension, and a multidimensional point is represented as a polyline with vertices on the axes.
Source of images: http://mbostock.github.io/protovis/
21. Other Visualization Techniques
• Text-based visualizations: tag clouds
• Some of the previously presented techniques can be
combined to produce more complex data
visualizations
Phrase Net of Beatles Lyrics
DBpedia music genres
Source: http://www.wordle.net
Source: http://many-eyes.com
22. Applications of Linked Data
Visualization Techniques
• Get an overview of the data
• Identification of relevant resources, classes or properties in
datasets
• Learning about certain underlying characteristics of the data,
e.g., vocabularies or ontologies
• Detecting missing links between nodes in an RDF graph
• Discovering new paths between nodes in an RDF graph
• Identifying hidden patterns in the data
• Finding errors or atypical values (outliers)
23. Linked Data Visualization
Tool Requirements
The requirements for visualization tools that consume Linked Data can be
summarized as follows:
• Data navigation and exploration capabilities in order to understand the
structure and the content
• Exploiting data structures:
• Links to visualize hierarchies or graphs
• Multi-dimensional
• User interaction:
• Basic and advanced querying
• Filtering values
• Interactive UI: responsive to the user input
• Publication/syndication of the graphical representation of the data
• Data extraction in order to export the data so that it can be reused by
third parties
24. Linked Data Visualization
Tool Types
1. LD browsers with text-based representation
• Dereference URIs to retrieve the resource description
• Use a textual representation of LD resources
• Display text and images appropriately
• Mainly support exploratory browsing and knowledge discovery
2. LD and RDF browsers with visualization options
• Exploit picture, graphics, images and other visual
representations of the data
• Support user interaction: allow querying, filtering and
jumping between resources
• Suitable for browsing and knowledge discovery as well as
analytic activities
25. Linked Data Visualization
Tool Types (2)
3. Visualization toolkits
• Frameworks providing a wide range of visualization techniques
• General toolkits support LD visualization by applying a set of
transformations of the data
• Some toolkits are specially designed to consume LD
4. SPARQL visualization
• These tools allow transforming the output of SPARQL queries
into graphics
• Contact SPARQL endpoints in order to evaluate the query
• Suitable for analytical activities
26. Linked Data Visualization
Tool Types (3)
LD browsers with text-based presentations:
• Sig.ma, Sindice, OpenLink RDF Browser, Marbles, Disco Hyperdata Browser, Piggy Bank (SIMILE), Zitgist DataViewer, iLOD, URI Burner, Dipper – Talis Platform Browser
LD and RDF browsers with visualization options:
• Tabulator, IsaViz, OpenLink Data Explorer, RDF Gravity, RelFinder, DBpedia Mobile, LESS, SIMILE Exhibit, Haystack, FoaF Explorer, Humboldt, LENA, Noadster
Visualization toolkits:
• Linked Data tools: Information Workbench, Visual RDF (by Graves), LOD Live, LOD Visualization
• General data tools: Data-Driven Documents (D3), NetworkX, Many Eyes, Tableau, Prefuse
SPARQL visualization:
• Information Workbench, Google Visualization API, SPARQL package for R, Gruff (for AllegroGraph)
27. Linked Data Visualization
Examples (1)
Sig.ma
Source: http://sig.ma/search?q=The+Beatles
• Keyword search
• Retrieves information from different LD sources
• Displays values per predicate
• Displays the source for each value
28. Linked Data Visualization
Examples (2)
Sig.ma
Source: http://sig.ma/search?q=The+Beatles
• Displays values per predicate: may include (redundant) information in different languages, for example: années and anno
• URIs are clickable, allowing navigation through RDF resources
Summary:
• Sig.ma lists all the triples and groups
them per predicate
• Useful for browsing predicates and
values within data sets
• The meaning of the values is not evident
29. Linked Data Visualization
Examples (3)
Sindice
Source: http://sindice.com/search?q=The+Beatles
• Keyword search
• Filtering per type of document
• Retrieves links to documents
• Allows accessing cached documents
• Allows inspecting resources
30. Linked Data Visualization
Examples (4)
Sindice
Both interfaces display the
set of triples related to the
inspected resource
Cache triples
Live triples
31. Linked Data Visualization
Examples (5)
Information Workbench
• Demo available at: http://musicbrainz.fluidops.net
• Displays human-readable content about Linked Data
resources
• Supports visualization techniques (different types of charts,
maps, timelines, etc.) to plot results from SPARQL queries
• Allows the user to interact with the displayed data
35. Linked Data Visualization
Examples (9)
Information Workbench: User interaction
LD visualizations must support navigation through the data
Source: http://musicbrainz.fluidops.net/resource/Analytical5
36. Linked Data Visualization
Examples (9)
Information Workbench: SPARQL visualization
Implements widgets which allow:
• Retrieving ROI via SPARQL queries
• Selecting the appropriate visualization technique
• Configuring parameters of the visualization
37. Linked Data Visualization
Examples (10)
Information Workbench: SPARQL visualization
SPARQL query:
SELECT ?release (SUM(xsd:double(?duration/60000)) AS ?minutes)
WHERE {
  <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
  ?release mo:record ?record .
  ?record mo:track ?track .
  ?track mo:duration ?duration .
}
GROUP BY ?release
ORDER BY DESC(?minutes)
LIMIT 10
Result set
Top ten The Beatles releases according to the sum of track durations in minutes
38. Linked Data Visualization
Examples (11)
Information Workbench: SPARQL visualization
Top ten The Beatles releases according to the sum of track durations in minutes
Widget
Visualization: Bar chart
{{#widget: BarChart |
query = 'SELECT (COUNT(?Release) AS ?COUNT) ?label WHERE {
<http://musicbrainz.org/artist/8538e728-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release .
?Release rdf:type mo:Release .
?Release dc:title ?label .}
GROUP BY ?label
ORDER BY DESC(?COUNT)
LIMIT 20'
| settings = 'Settings:barvertical_mb'
| asynch = 'true'
| input = 'label'
| output = 'COUNT'
| height = '300'}}
39. Linked Data Visualization
Examples (12)
Information Workbench: SPARQL visualization
Top ten The Beatles releases according to the sum of track durations in minutes
Other visualizations of the same result set …
Line chart:
Pie chart:
40. Linked Data Visualization
Examples (13)
Information Workbench: Automated Widget Suggestion
Suggested views: table, pivot view, bar chart, line chart, pie chart
• Select a suggested visualization
• The visualization is automatically built
41. Linked Data Visualization
Examples (14)
Other tools
LOD live (Source: http://en.lodlive.it)
• Graph visualizations
• Interactive UI (the graph can be
expanded by clicking on the nodes)
• Live access to SPARQL endpoints
LODVisualization (Source: http://lodvisualization.appspot.com)
• Hierarchy visualizations: treemaps and trees
• Live access to SPARQL endpoints
(supporting JSON and SPARQL 1.1)
42. Linking Open Data Cloud
Visualization (1)
“The Linking Open Data cloud diagram”
by Richard Cyganiak and Anja Jentzsch
Source: http://lod-cloud.net
• The nodes correspond
to Linked Data sets
• The edges represent
connections between
Linked Data sets
• The size of the nodes is
proportional to the
number of triples in
each data set
• The datasets are
categorized by
knowledge domains
represented with colors
43. Linking Open Data Cloud
Visualization (2)
Image source: http://twitpic.com/17qj1h
“Linked Open Data Cloud” generated by Gephi
• The central cluster (green) displays DBpedia as a central focus
• The size of the nodes reflects the size of the datasets
• The length of the connections encodes information about the data structure
Source: A. Dadzie and M. Rowe. Approaches to Visualizing Linked Data: A Survey. 2011
44. Linking Open Data Cloud
Visualization (3)
“Linked Open Data Graph” by Protovis
Source: http://inkdroid.org/lod-graph/
• The data to be displayed are
retrieved using the CKAN API
• The nodes represent Linked Data
sets available in the Data Hub “lod-
cloud” group
• The size of the nodes is proportional
to the data set size
• Edges are connections between data
sets
• The colors reflect the CKAN rating
and the intensity of the color reflects
the number of received ratings
• The nodes can be clicked to go to the
data set CKAN page
45. LD Reporting
• Visualization techniques are used in the creation of reports
included in data monitoring and management solutions
• Reports provide an overview of the dataset by generating a low-level
descriptive analysis:
• Quantitative information about the dataset
• Users may interact with the data via dashboards
• Some systems support this feature over structured data:
• Google Webmaster Tools (https://www.google.com/webmasters/tools)
• Information Workbench (http://www.fluidops.com/information-workbench)
• eCloudManager (http://www.fluidops.com/ecloudmanager)
46. Google Webmaster Tools:
Structured Data Dashboard (1)
• Provides webmasters with information about the structured
data embedded in their websites (and recognized by Google)
• The dashboard has three levels:
i. Site-level view: aggregates the data by classes defined in
the vocabulary schema
ii. Item-type-level view: provides details per page for each
type of resource
iii. Page-level view: shows the attributes of every type of
resource on a given web page
47. Google Webmaster Tools:
Structured Data Dashboard (2)
Source: http://googlewebmastercentral.blogspot.de/2012/07/introducing-structured-data-dashboard.html
Site-level view
48. Google Webmaster Tools:
Structured Data Dashboard (3)
Source: http://googlewebmastercentral.blogspot.de/2012/07/introducing-structured-data-dashboard.html
Page-level view
Site-level view
50. Semantic Search Process
Using semantic models for the search process
Figure: the semantic search process between user and system. Stages shown: user query (e.g. keywords, NL); entity extraction / semantic query analysis; query; query visualization (optional); graph matching over data graphs; presentation / ranking; result visualization/presentation; analysis; refinement. Labels also shown: Semantic Search, Faceted Search.
Image based on: Tran, T., Herzig, D., Ladwig, G. SemSearchPro - Using semantics through the search process
51. Semantic Search: Example (1)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | written by | member (of) | (the) beatles
Query expansion for “song”: synonyms track, melody, tune, …
Entity mapping: candidate mo:Track
Image source: http://musicontology.com
52. Semantic Search: Example (2)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | written by | member (of) | (the) beatles
Query expansion for “written by”: synonyms writer, composer, creator, …
Entity mapping: candidate mo:composer (linked to “written by” through an “inverse of” relation)
Image source: http://musicontology.com
53. Semantic Search: Example (3)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | written by | member (of) | (the) beatles
Query expansion and entity mapping for “member (of)”: mo:member_of, which is the inverse of mo:member
Image source: http://musicontology.com
54. Semantic Search: Example (4)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | written by | member (of) | (the) beatles
Entity mapping:
(the) beatles
Candidates
Beatles
(Book)
The Beatles
(Music Group)
Beatle
(Animal)
Beatle
(Automobile)
How to identify the right “Beatle”? Examine the context (Contextual Analysis)
55. Semantic Search: Example (5)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | written by | member (of) | (the) beatles
Entity mapping: (the) beatles → The Beatles (Music Group), dbpedia:The_Beatles
Contextual Analysis
Figure: subgraph around the selected candidate, with the nodes mo:Track, foaf:Agent, mo:MusicArtist and mo:MusicGroup and edges labelled mo:composer, mo:member and rdfs:subClassOf (twice).
This subgraph is part of the query
56. Semantic Search: Example (6)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | written by | member (of) | (the) beatles
Query
Figure: query graph with the variables ?x and ?y and the nodes mo:Track, foaf:Agent and dbpedia:The_Beatles.
Results
(I want to) Come Home
Angel in Disguise
Another Day
…
Answers presented to the user
The results could be ranked
57. Semantic Search
• Aims at understanding the meaning of the resources specified
in the query
• Different approaches to exploit semantics:
• Query expansion using ontologies
Since ontologies represent knowledge about specific domains, they can
be used to expand the query by incorporating related ontology terms into
the query.
• Contextual analysis
In LD, this approach may explore the resources specified in the query and their
adjacent nodes in the RDF graph. Mainly applied to disambiguate query terms.
• Reasoning
In some cases, the answer to a specific query is not explicitly contained in the
data, but it can be computed by using reasoning methods.
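The reasoning approach can be illustrated with a subclass hierarchy. The sketch below is a deliberately simplified stand-in for an RDF graph: plain dictionaries replace triples, each class has at most one superclass, and the names `SUBCLASS_OF`, `ASSERTED_TYPE` and `inferred_types` are invented for this example. The class names are the ones shown in the contextual analysis slide.

```python
# rdfs:subClassOf statements, simplified to one superclass per class.
SUBCLASS_OF = {
    "mo:MusicGroup": "mo:MusicArtist",
    "mo:MusicArtist": "foaf:Agent",
}

# The only type stated explicitly in the (toy) data.
ASSERTED_TYPE = {"dbpedia:The_Beatles": "mo:MusicGroup"}


def inferred_types(resource):
    """Return the asserted type plus every superclass reachable from it."""
    types = []
    current = ASSERTED_TYPE.get(resource)
    while current is not None and current not in types:
        types.append(current)
        current = SUBCLASS_OF.get(current)  # climb one level in the hierarchy
    return types


print(inferred_types("dbpedia:The_Beatles"))
```

A query asking for all resources of type foaf:Agent finds no explicit statement for The Beatles in this toy data, yet the answer can be computed by following the subclass links.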
58. Semantic Search & Linked Data
Semantic Search vs. SPARQL query
Component: Keyword or NL / concept matching
– Semantic search: performs entity extraction and matching to formal concepts
– SPARQL query: not supported
Component: Fuzzy concepts/relations/logics
– Semantic search: allows the application of fuzzy qualifiers as query constraints
– SPARQL query: not supported
Component: Graph patterns
– Semantic search: uses the context and other semantic information to locate interesting sub-graphs
– SPARQL query: applies pattern matching
Component: Path discovery
– Semantic search: finds new interesting links that may lead to additional information
– SPARQL query: not supported
59. Semantic Search: Google (1)
Input: query in NL
Output: List of answers
Google performs semantic search on certain entities and queries!
60. Semantic Search: Google (2)
Input: question in NL
Output: List of web pages
ranked using Google's PageRank
algorithm to display the
most relevant pages first
61. Semantic Search: DuckDuckGo (1)
Input: question in NL
Output: List of answers
62. Semantic Search: DuckDuckGo (2)
Performs disambiguation of the
query terms.
The 45 suggestions are grouped by
classes according to their
corresponding knowledge domain:
This approach is called
faceted search
63. Faceted Search: Example
Information Workbench: Searching for artists in categories
Facet
Facet
Facet
Source: http://musicbrainz.fluidops.net/resource/mo:MusicArtist?view=pivot
Depictions of artists
64. Faceted Search
• Facets = properties
• Suitable for browsing multi-dimensional taxonomies based on
the search attributes
• Allows user to explore the data:
• User submits a (keyword) query
• Faceted system dynamically identifies the relevant facets (properties)
for the given query and the constraints (values of those properties), and
displays the search results
• User may “drill down” by applying specific constraints to the search
results
• Information can be accessed and ranked in multiple ways
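The facet mechanics described above can be sketched in a few lines. The records, property names and helper functions below are hypothetical and serve only to illustrate facet counting and drill-down. A real faceted system would compute the counts over an index rather than a list.

```python
from collections import Counter

# Hypothetical search results; names and property values are illustrative only.
ARTISTS = [
    {"name": "Artist A", "type": "group", "country": "GB"},
    {"name": "Artist B", "type": "person", "country": "GB"},
    {"name": "Artist C", "type": "group", "country": "US"},
    {"name": "Artist D", "type": "person", "country": "DE"},
]


def facet_counts(items, facets):
    """For each facet (property), count how many items carry each value."""
    return {f: Counter(item[f] for item in items if f in item) for f in facets}


def drill_down(items, **constraints):
    """Keep only the items that satisfy every selected constraint."""
    return [i for i in items if all(i.get(k) == v for k, v in constraints.items())]


counts = facet_counts(ARTISTS, ["type", "country"])
british_groups = drill_down(ARTISTS, type="group", country="GB")
print(counts["country"])
print([a["name"] for a in british_groups])
```

The counts are what a faceted interface shows next to each value as a preview. Each click on a value adds one constraint to `drill_down` and narrows the result set.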
65. Faceted Search (2)
Challenges for supporting Faceted Search
• Identifying which facets to surface:
• In heterogeneous datasets, data entries may have different facets
• Dynamically identify the most appropriate facets for each query
• Ordering the facets depending on the relevance to the query
• Computing previews:
• Accurately predicting counts, without examining all the results
• Offering facet preview to give users an idea of what to expect
Source: Teevan, J., Dumais, S., Gutt, Z. Challenges for Supporting Faceted Search in Large, Heterogeneous
Corpora like the Web
66. Faceted Search: LD Example (1)
FacetedDBLP
• Retrieves information from the DBLP collection
• Shows the result set with different facets:
• Publication years
• Authors
• Conferences
• It is implemented upon the DBLP++ dataset (an enhancement of
DBLP including additional keywords and abstracts):
• DBLP++ is stored in a MySQL database
• Uses D2R Server to expose the database content as RDF triples
67. Faceted Search: LD Example (2)
Input: “crowdsourcing”
Facets
485 results
FacetedDBLP
68. Classification of Search Engines
Semantic
Search
Systems
Faceted
Search
Systems
Google (GKG)
Bing
KIM
sig.ma
LOD cloud cache
/facet
Longwell
mSpace
Exhibit (SIMILE)
PoolParty Semantic
Search Server
DuckDuckGo
Hakia
SenseBot
PowerSet
DeepDive
Kosmix
Factibles
Lexxe
Information Workbench
69. Searching for Semantic Data
Search for
• Ontologies
• Vocabularies
• RDF documents
70. Semantic Data Search Engines (1)
Searching for ontologies
• Swoogle (http://swoogle.umbc.edu): keyword search
• Watson (http://kmi-web05.open.ac.uk/WatsonWUI): keyword search
71. Semantic Data Search Engines (2)
Searching for vocabularies: LOV Portal
• Allows searching for properties, classes or vocabularies in
the Linked Open Vocabularies (LOV) catalog
• The LOV search engine implements faceted search on:
• The knowledge domain
• The role of the resource matched from the input query
• The vocabulary containing the resource
• Results are ranked according to a score considering:
• Relevancy to the query (string)
• Importance of the matched element labels
• Number of LOV vocabularies that refer to the element
72. Semantic Data Search Engines (3)
Facets
84 results
Input: “artist”
CH 3
Searching for vocabularies: LOV Portal
73. Semantic Data Search Engines (4)
Searching for documents
• Semantic Web Search Engine (SWSE): http://swse.deri.org
• Sindice: http://sindice.com
74. METHODS FOR LINKED DATA
ANALYSIS
75. Features of Data Analysis
Statistical analysis
• Allows describing the data via Exploratory Data Analysis (EDA) methods
• Includes statistical inference and prediction
Data aggregation & filtering
• One of the first steps in data analysis is pre-processing in order to select the
appropriate data to study
Visualization techniques can be built on top of these as part of data analysis
Machine learning
• Focuses on prediction
• Combines Artificial Intelligence and Statistics
• Includes supervised and unsupervised learning (not covered in this course)
76. LD Data Aggregation & Filtering
• Data aggregation refers to merging/summarizing several
values into a single one
• Filtering allows retrieving relevant data properties and
selecting a particular range of data values
• SPARQL supports these features via SELECT queries
as follows:
Feature: SPARQL capabilities
Aggregation: combining aggregate functions (COUNT, SUM, AVG, …) and the
GROUP BY operator
Filtering: combining projection, FILTER and HAVING operators
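The same two operations can be sketched outside SPARQL to make the semantics explicit. The sketch assumes hypothetical (release, track duration in milliseconds) pairs standing in for query solutions; the function names are illustrative.

```python
from collections import defaultdict

# Hypothetical (release, track duration in ms) pairs, one per query solution.
TRACKS = [
    ("Release A", 120000),
    ("Release A", 180000),
    ("Release B", 60000),
    ("Release C", 240000),
    ("Release C", 90000),
]


def total_minutes(tracks):
    """Aggregation: GROUP BY release, then SUM(duration / 60000)."""
    totals = defaultdict(float)
    for release, milliseconds in tracks:
        totals[release] += milliseconds / 60000
    return dict(totals)


def having_at_least(totals, minimum):
    """Filtering on aggregated values (HAVING), sorted in descending order."""
    kept = [(release, m) for release, m in totals.items() if m >= minimum]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


totals = total_minutes(TRACKS)
long_releases = having_at_least(totals, 2)
print(long_releases)
```

A FILTER would apply to individual solutions before grouping (for example, dropping tracks without a duration), while HAVING applies to the groups after aggregation, as `having_at_least` does here.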
77. LD Statistical Analysis
• Statistical analysis supports descriptive and predictive
operations
• SPARQL supports some descriptive operations (average,
maximum, minimum) but does not offer more sophisticated
statistical features like:
• Fitting distributions
• Linear regressions
• Analysis of variance
• …
• Some approaches are able to consume data retrieved from
SPARQL endpoints:
– “R for SPARQL” by Willem Robert van Hage & Tomi Kauppinen
– “Performing Statistical Methods on Linked Data” by Zapilko & Mathiak
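As an illustration of a statistical operation that has to run outside the query language, the sketch below fits a straight line by ordinary least squares to values that could have been retrieved from an endpoint. The data points are hypothetical and `fit_line` is an illustrative helper, not part of any of the tools named above.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical numeric columns, e.g. two variables bound by a SELECT query.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

In practice the columns would come from the result set of a SELECT query, and a statistical environment such as R would provide the fitting routine.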
78. R – Statistical Computing
• R is a language and environment for statistical computing
• R provides a wide variety of statistical and graphical
techniques
• Linear and nonlinear modeling
• Classical statistical tests
• Time-series analysis
• Classification (Machine Learning)
• Clustering (Machine Learning)
• Extensible with further functionalities
• R is available as Free Software (under the terms of the
GNU general public license)
80. R for SPARQL
• The R for SPARQL package makes it possible to:
• Connect to a SPARQL endpoint over HTTP
• Pose a SELECT query or an UPDATE operation (LOAD, INSERT, DELETE)
• If given a SELECT query, it returns the results as a data frame
• The results can directly be mapped and visualized
• Posing requests:
• If the parameter query is given, it is assumed that the input is a SELECT query
and a GET request will be performed to get the results from the URL of the
endpoint
• If the parameter update is given, it is assumed that the input is an UPDATE
operation and a POST request will be submitted to the URL of the endpoint.
Nothing is returned
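The request logic described above can be sketched with the Python standard library. This mirrors the described behaviour only; it is not the R package's implementation, `build_request` is a hypothetical helper, and the request objects are built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(endpoint, query=None, update=None):
    """Build a GET request for a SELECT query or a POST request for an update."""
    if (query is None) == (update is None):
        raise ValueError("give exactly one of 'query' or 'update'")
    if query is not None:
        # SELECT: the query travels URL-encoded in the query string.
        url = f"{endpoint}?{urlencode({'query': query})}"
        headers = {"Accept": "application/sparql-results+json"}
        return Request(url, headers=headers, method="GET")
    # UPDATE: the operation travels URL-encoded in the request body.
    body = urlencode({"update": update}).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


endpoint = "http://spatial.linkedscience.org/sparql"
select_request = build_request(endpoint, query="SELECT * WHERE { ?s ?p ?o } LIMIT 1")
update_request = build_request(endpoint, update="LOAD <http://example.org/data.rdf>")
print(select_request.get_method(), update_request.get_method())
```

Passing either request to `urllib.request.urlopen` would actually contact the endpoint. The URI `http://example.org/data.rdf` is a placeholder.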
Source: http://linkedscience.org/tools/sparql-package-for-r/
81. R for SPARQL: Example (1)
1. Download the R package and load it:
• library(SPARQL)
• library(sp) # used for plotting spatial data
2. Define the endpoint with the triples
• endpoint = "http://spatial.linkedscience.org/sparql"
3. Define the query
• q = "SELECT ?cell ?row ?col ?polygon ?DEFOR_2002
WHERE {
?cell a <http://linkedscience.org/lsv/ns#Item> ;
<http://spatial.linkedscience.org/context/amazon/Lin> ?row ;
<http://spatial.linkedscience.org/context/amazon/Col> ?col ;
<http://observedchange.com/tisc/ns#geometry> ?polygon ;
<http://spatial.linkedscience.org/context/amazon/DEFOR_2002> ?DEFOR_2002 .
}"
Source: http://linkedscience.org/tools/sparql-package-for-r
82. R for SPARQL: Example (2)
4. Run the query and store the result in an object
• res <- SPARQL(endpoint,q)$results
5. Handling the results
• res$row <- -res$row
• coordinates(res) <- ~ col + row
6. Choose the graphical format and plot the results
• spplot(res, "DEFOR_2002", col.regions=rev(heat.colors(17))[-1], at=(0:16)/100, main="relative deforestation per pixel during 2002")
Source: http://linkedscience.org/tools/sparql-package-for-r
83. R for SPARQL: Example (3)
Source: http://linkedscience.org/tools/sparql-package-for-r
84. Machine Learning
• Machine Learning techniques make it possible to extract interesting
information from data sources and to discover
hidden patterns within datasets by generalizing from examples
• Different ML approaches can be applied:
• Clustering: groups similar data into data partitions called clusters
• Association rule learning: discovers relations between variables
• Decision tree learning: analyses observations to build a predictive
model represented as a tree
• Many others …
• Weka is a Data Mining framework commonly used to apply ML
on tabular data:
– www.cs.waikato.ac.nz/ml/weka
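As an illustration of the clustering approach listed above, the following minimal sketch groups a handful of invented music tracks by tempo and duration. It uses k-means from base R's stats package instead of Weka, purely to keep the example self-contained; the feature names and values are assumptions made for this example.

```r
# Invented track features: three slow, long tracks and three fast, short ones
tracks <- data.frame(
  tempo_bpm  = c(60, 62, 65, 120, 125, 128),
  duration_s = c(300, 310, 295, 180, 175, 190)
)

set.seed(1)  # k-means starts from random centres; fix the seed for repeatability
# Scale the features so that neither dominates the distance computation
fit <- kmeans(scale(tracks), centers = 2, nstart = 10)

tracks$cluster <- fit$cluster
print(tracks)
```

The cluster numbers themselves are arbitrary labels; only the grouping of the tracks carries meaning.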
85. Machine Learning on LD
Challenges of applying Machine Learning to LD
• LD heterogeneity introduces noise into the data:
– The same resource may be identified by different URIs
– Predicates with similar semantics, but different constraints
• The data is not independent and identically distributed (iid):
– It does not consist of only one type of object
– The entities are related to each other
• LD rarely contains the negative examples that ML
algorithms need:
– For example, owl:differentFrom statements are rarely asserted
Source: http://www.cip.ifi.lmu.de/~nickel/iswc2012-slides
86. Applications of Machine Learning on LD
• Node ranking:
– Ranking nodes according to their relevance for a query
• Link prediction:
– Infer edges between LD resources
– Predict the new edges that will be added to the RDF graph
• Entity resolution:
– Determine whether two URIs correspond to the same real-world
object
• Taxonomy learning:
– Infer taxonomies or concept hierarchies from a given
vocabulary or ontology
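To make the entity resolution task more concrete, the sketch below compares the labels of resources from two hypothetical datasets and flags pairs whose labels are nearly identical. This is only one simple heuristic (normalised edit distance on labels), not a method prescribed by the slides; the URIs, labels, function names and the 0.9 threshold are all assumptions made for illustration.

```r
# Hypothetical resources from two datasets; URIs and labels are invented
entities <- data.frame(
  uri   = c("http://example.org/a/The_Beatles",
            "http://example.org/b/beatles",
            "http://example.org/b/rolling-stones"),
  label = c("The Beatles", "the beatles ", "The Rolling Stones"),
  stringsAsFactors = FALSE
)

# Similarity in [0, 1]: 1 minus the edit distance normalised by the longer label
label_similarity <- function(a, b) {
  a <- tolower(trimws(a))
  b <- tolower(trimws(b))
  1 - utils::adist(a, b)[1, 1] / max(nchar(a), nchar(b), 1)
}

# Two URIs are treated as candidates for the same real-world object
# when their labels are similar enough (the threshold is an assumption)
same_entity <- function(i, j, threshold = 0.9) {
  label_similarity(entities$label[i], entities$label[j]) >= threshold
}

print(same_entity(1, 2))
print(same_entity(1, 3))
```

In practice, label similarity would be combined with further evidence, such as shared property values or links to common resources, before two URIs are declared equivalent.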
87. Summary
• Linked Data visualization techniques:
• Visualizations must be chosen according to the type of the data
• A wide variety of tools support the visualization of SPARQL results
• Visualizations can be used in dashboards to support administrative tasks
• Linked Data search:
• Semantic search: exploits the meaning of user queries (natural language or
a set of keywords) to present useful results
• Faceted search: allows browsing multi-dimensional data
• Linked Data analysis:
• Includes data manipulation such as aggregation & filtering
• Applies statistical methods to get a better understanding of the data
• Machine Learning techniques can be applied for predictive analysis
• Visualization techniques can be built on top of the previous features
88. For exercises, quizzes and further material, visit our website:
@euclid_project euclidproject euclidproject
http://www.euclid-project.eu
Other channels:
eBook Course
89. Acknowledgements
• Alexander Mikroyannidis
• Alice Carpentier
• Andreas Harth
• Andreas Wagner
• Andriy Nikolov
• Barry Norton
• Daniel M. Herzig
• Elena Simperl
• Günter Ladwig
• Inga Shamkhalov
• Jacek Kopecky
• John Domingue
• Juan Sequeda
• Kalina Bontcheva
• Maria Maleshkova
• Maria-Esther Vidal
• Maribel Acosta
• Michael Meier
• Ning Li
• Paul Mulholland
• Peter Haase
• Richard Power
• Steffen Stadtmüller
Editor's Notes
Semantic query analysis means: query expansion using ontologies, context analysis and reasoning
The SPARQL package makes it possible to connect to a SPARQL endpoint over HTTP and to pose a SELECT query or an update query (LOAD, INSERT, DELETE). If given a SELECT query, it returns the results as a data frame with a named column for each variable from the SELECT query; a list of prefixes and namespaces that were shortened to qnames is also returned. If given an update query, nothing is returned. If the parameter “query” is given, it is assumed that the given query is a SELECT query, and a GET request is made to get the results from the URL of the endpoint. Otherwise, if the parameter “update” is given, it is assumed that the given query is an update query, and a POST request is made to send the request to the URL of the endpoint.
Accessing the data
First, make sure that you have recent versions of the two R packages SPARQL and sp installed. Load the two packages by calling:
library(SPARQL) # make sure to use at least version 1.9
library(sp)
Define the endpoint that will provide you with the triples by
endpoint <- "http://spatial.linkedscience.org/sparql"
To reduce the XML’s file size, the data is queried piece-wise. The query is initiated by
q <- "SELECT ?cell ?row ?col ?polygon
WHERE {
?cell a <http://linkedscience.org/lsv/ns#Item> ;
<http://spatial.linkedscience.org/context/amazon/Lin> ?row ;
<http://spatial.linkedscience.org/context/amazon/Col> ?col ;
<http://observedchange.com/tisc/ns#geometry> ?polygon .
}"
res <- SPARQL(url=endpoint, q)$results
and completed within a loop over all deforestation variables
for(var in c("DEFOR_2002", "DEFOR_2003", "DEFOR_2004", "DEFOR_2005", "DEFOR_2006", "DEFOR_2007", "DEFOR_2008")) {
tmp_q <- paste("SELECT ?cell ?",var,"\n WHERE { \n ?cell a <http://linkedscience.org/lsv/ns#Item> ;\n <http://spatial.linkedscience.org/context/amazon/",var,"> ?",var," .\n }\n",sep="")
cat(tmp_q)
res <- merge(res, SPARQL(endpoint, tmp_q)$results, by="cell")
}
Creating a SpatialPixelsDataFrame
We copy the results to a new object and flip the y-axis:
amazon <- res
amazon$row <- -res$row
Assigning coordinates to a data.frame will result in a Spatial-object. Setting the type to gridded will produce a SpatialPixelsDataFrame:
coordinates(amazon) <- ~ col+row
gridded(amazon) <- TRUE
Plotting and handling the data
As a first application, we produce a map showing relative deforestation per pixel during 2002 by:
spplot(amazon,"DEFOR_2002",col.regions=rev(heat.colors(17))[-1], at=(0:16)/100, main="relative deforestation per pixel during 2002")