Slides from our tutorial on Linked Data generation in the energy domain, presented at the Sustainable Places 2014 conference on October 2nd in Nice, France
1. Linked Energy Data Generation
Tutorial
Filip Radulovic, María Poveda Villalón, Raúl García-Castro
{fradulovic,mpoveda,rgarcia}@fi.upm.es
ETSI Informaticos
Universidad Politécnica de Madrid
Campus de Montegancedo s/n
28660 Boadilla del Monte, Madrid, Spain
Twitter: @LD4SC
02.10.2014. Sustainable Places 2014, Nice, France
2. License
• This work is licensed under the Creative Commons
Attribution – Non Commercial – Share Alike License
• You are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
• Under the following conditions
• Attribution — You must attribute the work by inserting
• “[source http://www.oeg-upm.net/]” at the footer of each
reused slide
• a credits slide stating: “These slides are partially based
on “Linked Energy Data Generation” by F. Radulovic, M.
Poveda-Villalón, R. García-Castro”
• Non-commercial
• Share-Alike
3. Table of Contents
1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
5. Discussion and Conclusions
5. Classic Web
• Typical web page markup consists of:
• Rendering information (e.g., font size and colour)
• Hyperlinks to related content
• Semantic content is accessible to humans but not (easily) to computers…
6. Classic Web
Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Information from single pages can be found via search engines
7. CIA World
FactBook
MovieDB
Classic Web
Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
What about complex queries over multiple
pages / data sources?
Show me a picture of the tallest
building in the country with the highest
CO2 emission rate in 2013
Impossible
9. What do we actually want?
• Use the Web like a single global database
• Move from a Web of documents to a Web of Data
Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data”- Olaf Hartig
[Figure: Wikipedia, CIA World FactBook. Image: Shanghai Tower, 2013-8-3, CC BY-SA 3.0]
10. Linked Data enables such Web of Data
Slide adapted from Boris Villazón Terrazas and “5min Introduction to Linked Data”- Olaf Hartig
Global Identifier: URI (Uniform Resource Identifier) identifies a resource on the Internet.
Data Model: RDF (Resource Description Framework) standard model for data interchange on the Web.
Access Mechanism: HTTP
Connection: Typed Links
[Figure: RDF graph over Wikipedia and CIA World FactBook data. Image: Shanghai Tower, 2013-8-3, CC BY-SA 3.0. Nodes: http://cia.../China, http://...wikipedia.../data/shangaiTower, 10000…, 2013. Typed links: http://.../co2emission, http://.../co2emissionPerYear, http://.../year, http://.../location, http://.../depiction, http://…#sameAs]
11. The four principles (Tim Berners-Lee, 2006)
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those
names.
3. When someone looks up a URI, provide useful
information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover
more things.
http://www.w3.org/DesignIssues/LinkedData.html
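As a rough illustration of the RDF data model and typed links behind these principles, the following minimal Python sketch stores statements as subject-predicate-object tuples and prints them as N-Triples. All example.org URIs and property names are hypothetical placeholders, not the identifiers shown in the slides.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) tuples.
# Every example.org URI below is a hypothetical placeholder.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"


def uri(value):
    """Format an HTTP URI as an N-Triples IRI term."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Format a literal (optionally typed) as an N-Triples term."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        return f'"{escaped}"^^<{datatype}>'
    return f'"{escaped}"'


triples = [
    # A typed link between two things named with HTTP URIs.
    (uri("http://example.org/country/X"),
     uri("http://example.org/def/tallestBuilding"),
     uri("http://example.org/building/Y")),
    # A link to another data set, so that more things can be discovered.
    (uri("http://example.org/building/Y"),
     uri(OWL_SAME_AS),
     uri("http://other.example.org/data/Y")),
    # A literal value attached to a resource.
    (uri("http://example.org/measurement/1"),
     uri("http://example.org/def/year"),
     literal("2013", XSD_GYEAR)),
]


def to_ntriples(statements):
    """Serialize the statements, one N-Triples line per statement."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in statements)


print(to_ntriples(triples))
```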
12. “The Semantic Web is an extension of the current Web in which information is
given well-defined meaning, better enabling computers and people to work in
cooperation.
It is based on the idea of having data on the Web defined and linked such that it
can be used for more effective discovery, automation, integration, and reuse
across various applications.”
Hendler, J., Berners-Lee, T., and Miller, E. Integrating Applications on the Semantic Web, 2002,
http://www.w3.org/2002/07/swint.html
Semantic Web definition
13. Benefits + Cases of success
• Provide semantics: meaningful data and common understanding
• Interoperability
• Reasoning power
• Infer more data
• Find mistakes in the original data?
• Enrich your data (with what is already out there)
• Search engines are indexing some schemas
• Increase visibility
• Multilingual information
14. In this tutorial
“D4.1 Requirements and guidelines for energy data generation”
From READY4SmartCities project available at http://goo.gl/IWDmYy
15. Table of Contents
1. Introduction
2. Data preparation
1. Select data source
2. Obtain access to data source
3. Analyse licensing of the data source
4. Analyse data source
3. Ontology development
4. Data generation
5. Discussion and Conclusions
16. Select data source
• Selecting the data source that will be transformed
into Linked Data
• Steps
1. To define the requirements
2. To select one or several data sources
• Alternatives:
• Data set from your own organization
• Data sources not owned by your organization (external data sources)
17. Select data source – LCC example
• Limited to external data sources (found by searching)
1. Requirements
• Real-world scenario in the energy domain
• Available for use
• Available in machine-processable format (the
more structured the data are, the better)
• Can be linked with generic entities (e.g., location)
2. Leeds City Council – energy consumption
(http://data.gov.uk/dataset/council-energy-consumption)
19. Obtain access to data source
• Data access means
• technical means to retrieve the data
• legal rights to use the data
• In some cases, the data source might not be accessible
• Steps
1. To identify the person to contact
2. To request the access
3. To obtain access and to retrieve the data
• Access alternatives: files, programming interface,
database, data streams, etc.
20. Obtain access to data source – LCC example
• Data set already available for download
• Available in a CSV file
22. Analysing licensing of the data source
• Licenses specify the legal terms under which a data
set can be used and exploited
• Steps
1. To identify the publisher
2. To find the applicable license
• Web page, data set metadata, data itself
• Contact the publisher
3. To read the license and determine legal terms
• Tips
• Analysis should be performed upon all available copies of
the data
• Ensure that the licenses of the different data sources are compatible
25. Analyse data source
• Getting insight into data structure and organization
• Steps
1. To analyse the characteristics of the data
• Data values, data ranges, etc.
2. To obtain the schema of the data
• Description of concepts and their relationships
• Data format alternatives:
• Structured data
• Unstructured data
• Tip: Use a standard modeling language for the data schema (e.g., UML)
26. Analyse data source – LCC example
• Electricity, gas and oil consumptions as decimal
values
• 1-year intervals - 2010/11, 2011/12, 2012/13
• Different types of council sites (mostly buildings)
• Full address provided (street, city, district)
• Correspondence with people from LCC open data
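The first analysis step (data values and ranges) can be sketched with a few lines of Python. This is only an illustrative sketch: the sample rows and the column names other than electric1213 are invented, and the real LCC file is laid out differently.

```python
import csv
import io

# Hypothetical sample; values and most column names are made up.
SAMPLE_CSV = """site,site_type,electric1213,gas1213
Site A,Leisure Centre,1200.5,300.0
Site B,Library,450.0,
Site C,Library,800.25,120.75
"""


def profile(csv_text):
    """Summarize each column: non-empty count, then either the numeric
    range (if every value parses as a float) or the number of distinct values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value not in (None, ""):
                columns[name].append(value)

    summary = {}
    for name, values in columns.items():
        info = {"non_empty": len(values)}
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = []
        if numbers:
            info["min"], info["max"] = min(numbers), max(numbers)
        else:
            info["distinct"] = len(set(values))
        summary[name] = info
    return summary


for column, info in profile(SAMPLE_CSV).items():
    print(column, info)
```

A summary like this helps spot empty cells and unexpected value ranges before the schema is written down.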
27. Table of Contents
1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
5. Discussion and Conclusions
28. Ontology development - Preparation
• RDF – Resource Description Framework
• Data model
• (subject-predicate-object)
• Resource naming strategy
• For terms
• Pattern:
http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#myterm
• Example:
http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#hasQuantitiveValue
• For individuals
• Pattern:
http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/myIndividual
• Example:
http://smartcity.linkeddata.es/lcc/resource/LeisureCentre/LeisureCentreWetJohnCharlesCentreforSport
• RDF syntaxes
• RDF/XML, Turtle (ttl), N3, N-Quads
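The naming patterns above can be applied mechanically. The sketch below is one possible way to do it; the helper names and the rule for building local names (dropping every non-alphanumeric character from a label) are assumptions, and the site label is a guess at what could produce the example URI.

```python
import re

# Base URIs taken from the patterns on this slide.
ONTOLOGY_BASE = "http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#"
RESOURCE_BASE = "http://smartcity.linkeddata.es/lcc/resource/"


def term_uri(term):
    """URI for an ontology term (hypothetical helper)."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", term):
        raise ValueError(f"not a valid term name: {term!r}")
    return ONTOLOGY_BASE + term


def individual_uri(class_name, label):
    """URI for an individual of the given class.

    Assumption: the local name is the label with all
    non-alphanumeric characters removed.
    """
    local_name = re.sub(r"[^A-Za-z0-9]+", "", label)
    if not local_name:
        raise ValueError("label yields an empty local name")
    return f"{RESOURCE_BASE}{class_name}/{local_name}"


print(term_uri("hasQuantitiveValue"))
print(individual_uri("LeisureCentre",
                     "Leisure Centre Wet John Charles Centre for Sport"))
```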
29. Ontology development
[1] Suárez-Figueroa, M.C. PhD Thesis: NeOn Methodology for Building Ontology Networks:
Specification, Scheduling and Reuse. Spain. June 2010.
Activity definition taken from [1]
Focus of each activity
Existing tools to carry out the activity
Tips, alternatives and references
30. Ontology development
Ontology Requirements: refers to the activity of collecting the
requirements that the ontology should fulfil (for example, reasons to build
the ontology, identification of target groups and intended uses). (NeOn)
Proposed references:
- NeOn Guidelines for non-functional requirements.
- Competency Questions technique [1]
Tools: mind map, text editor, etc.
[1] Gruninger, M., Fox, M. S. The role of competency questions in enterprise engineering. In Proceedings of
the IFIP WG5.7 Workshop on Benchmarking - Theory and Practice, Trondheim, Norway, 1994.
31. Ontology development – LCC example
LCC example (Data from….)
Non-functional requirements specified:
• The ontology will try to adopt concepts and design patterns
in other ontologies where possible
• The ontology should be implemented in OWL 2 DL
32. Ontology development
Ontology term extraction refers to extracting a glossary of terms that
may be developed.
Tools for terminology extraction:
• Identify nouns, verbs, etc.
• Tools: Freeling for free text
Focus:
• Extract terminology from Competency Questions
(NeOn)
• Extract terminology directly from the data
• Expert advice, or extraction done by experts
Complete the list with synonyms
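A very rough way to pull candidate terms out of competency questions is shown below. It uses plain stop-word filtering rather than the part-of-speech tagging a tool such as Freeling provides, and both the questions and the stop list are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical competency questions, written only for this sketch.
COMPETENCY_QUESTIONS = [
    "What is the electricity consumption of a site in a given year?",
    "What is the address of a site?",
    "Which sites are located in a given district?",
]

# Small hand-made stop list; a real one would be much larger.
STOP_WORDS = {"what", "which", "is", "are", "the", "of", "a", "in",
              "given", "located"}


def candidate_terms(questions):
    """Count the words that survive stop-word filtering (no stemming)."""
    counts = Counter()
    for question in questions:
        for word in re.findall(r"[a-z]+", question.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


for term, count in candidate_terms(COMPETENCY_QUESTIONS).most_common():
    print(term, count)
```

Because no stemming is applied, "site" and "sites" are counted separately; merging such variants and adding synonyms is left to the manual review step.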
33. Ontology development – LCC example
Site
place
Address
PostCode
Electricity
Consumption, utilization
years
time
34. Ontology development
Ontology conceptualization refers to the activity of
organizing and structuring the information (data, knowledge,
etc.), obtained during the acquisition process, into meaningful
models at the knowledge level and according to the ontology
requirements specification document. (NeOn)
Drawing tools, including paper and pencil
Focus drafting (optional):
• Identify main domains and top concepts
• Establish relations between concepts and domains
Focus detail model:
• Establish hierarchies
• Establish specific relationships among defined
elements, rules, axioms, etc.
Do not try to define everything. You might
change your mind during the implementation.
36. Ontology development
Ontology search refers to the activity of finding candidate
ontologies or ontology modules to be reused (NeOn).
Search tools:
• General purpose:
• LOV: http://lov.okfn.org
• LOD2Stats: http://stats.lod2.eu/vocabularies
• Google
• Others: ODP Portal http://ontologydesignpatterns.org
• Domain-based:
• Smart cities: http://smartcity.linkeddata.es/
Focus:
• Terms already used in LOD
• Save time and resources
• Increase interoperability
Use domain terms and synonyms
Do not spend too much time trying to find terms for everything. You might
need to create them.
38. Ontology development
Ontology Selection refers to the activity of choosing the most suitable
ontologies or ontology modules among those available in an ontology
repository or library, for a concrete domain of interest and associated
tasks. (NeOn)
Evaluation tools:
• OOPS! – OntOlogy pitfalls scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/
(already included in OOPS!)
• Vapour http://validator.linkeddata.org/vapour (to be included
in OOPS!)
Also it should be considered:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Used in Linked Data (LOD2Stats, Sindice, etc)
Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven
[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with oops!.
In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.
Further reference:
NeOn Guidelines
39. Ontology development – LCC example
• Domain coverage
• Schema.org covers public places and provides additional
terms and properties that can be used (e.g., PostalAddress
and City)
• It is also a widely known and accepted vocabulary, which
favours interoperability
• Closer semantics
• The ero:FinalEnergy class from the Energy Resource ontology and
the ssn:Property class from the SSN ontology are reused to
represent the specific indicator to which the consumption
relates
40. Ontology development
Ontology Integration refers to the activity of including one ontology
in another ontology. (NeOn)
Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.
• Plug-ins: Ontology Module Extraction and Partition
• Text editors for manual approach
Focus:
• How much information should I reuse?
• How to reuse the elements or vocabs? Preliminary analysis [1]
• Should I import another ontology?
• Should I reference other ontology element URIs?
• ... replicating manually the URI?
• ... merging ontologies?
• How to link them?
Techniques:
• Import the ontology as a whole
• Reuse some parts of the ontology (or ontology module)
• Reuse statements
[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. The Landscape of Ontology Reuse in
Linked Data. 1st Ontology Engineering in a Data-driven World (OEDW 2012) Workshop at the 18th
International Conference on Knowledge Engineering and Knowledge Management . Galway, Ireland, 9th
October 2012. http://www.slideshare.net/MariaPovedaVillalon/mpoveda-oedw2012v1
41. Ontology development
Ontology Enrichment refers to the activity of extending an ontology with
new conceptual structures (e.g., concepts, roles and axioms). (NeOn)
Focus:
• How should I create terms according to ontological foundations
and Linked Data principles?
Ontology development:
• Ontology Development 101: A Guide to Creating Your First
Ontology [1]
• Ontology Engineering Patterns
http://www.w3.org/2001/sw/BestPractices/
• Extracting ontology conceptualization, formalization
techniques from existing methodologies
Recommendation
• Link to existing entities
• Provide human readable documentation
• Keep the semantics of the reused elements
[1] Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First
Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical
Informatics Technical Report SMI-2001-0880, March 2001.
Tools:
• Ontology editors: Protégé, NeOn Toolkit, etc.
43. Ontology development
Ontology Evaluation refers to the activity of checking the
technical quality of an ontology against a frame of reference. (NeOn)
Evaluation tools related to Linked Data principles:
• OOPS! – OntOlogy pitfalls scanner [1] http://www.oeg-upm.net/oops/
• Triple checker http://graphite.ecs.soton.ac.uk/checker/
(already included in OOPS!)
Evaluation tools/techniques other aspects:
• Modelling issues (OOPS!, reasoners, manual review, etc.)
• Domain coverage (based on the data to be represented)
• Application based (queries)
• Syntax issues: validators
Focus:
• Assessment by Linked Data principles
• Modelling issues
• Domain coverage: data driven
[1] Poveda-Villalón, M., Suárez-Figueroa, M. C., & Gómez-Pérez, A. (2012). Validating ontologies with oops!.
In Knowledge Engineering and Knowledge Management (pp. 267-281). Springer Berlin Heidelberg.
44. Ontology development – LCC example
Minor issues, mostly a lack of annotations in reused terms.
45. Table of Contents
1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
1. Data transformation
2. Data linking
5. Discussion and Conclusions
46. Data transformation
• Transformation of the data to RDF
• Steps
1. To select the RDF serialization
• RDF/XML, Turtle, N-Triples, JSON-LD
2. To select a tool
3. To transform the data
4. To evaluate the obtained RDF data
• Syntax evaluation
• Accuracy
• Usage
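The transformation step can be sketched without any dedicated tool. The following Python example turns CSV rows into N-Triples using the base URIs from the naming strategy; the Site class, the direct use of hasQuantitiveValue on a site, the column names and the sample rows are simplifying assumptions and do not reproduce the model actually used for the LCC data.

```python
import csv
import io
import re

RESOURCE_BASE = "http://smartcity.linkeddata.es/lcc/resource/"
ONTOLOGY = "http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Hypothetical input; real column names and values differ.
SAMPLE_CSV = "site,electric1213\nSite A,1200.5\nSite B,450.0\n"


def local_name(text):
    """Assumed naming rule: keep only alphanumeric characters."""
    return re.sub(r"[^A-Za-z0-9]+", "", text)


def row_to_triples(row):
    """Yield N-Triples lines for one CSV row (simplified model)."""
    site = f"<{RESOURCE_BASE}Site/{local_name(row['site'])}>"
    yield f"{site} <{RDF_TYPE}> <{ONTOLOGY}Site> ."
    value = row["electric1213"]
    if value:  # skip empty cells instead of emitting an empty literal
        yield (f"{site} <{ONTOLOGY}hasQuantitiveValue> "
               f'"{value}"^^<{XSD_DECIMAL}> .')


def csv_to_ntriples(csv_text):
    """Transform the whole CSV text into an N-Triples document."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(row_to_triples(row))
    return "\n".join(lines)


print(csv_to_ntriples(SAMPLE_CSV))
```

For real data sets, one of the tools listed for the relevant source format is usually a better choice than hand-written code.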
47. Data transformation - Tools
Database to RDF
• morph-RDB
• D2R Server
• TopBraid Composer
Data streams to RDF
• morph-streams
• D2R Server
Spreadsheets to RDF
• TopBraid Composer
• Excel2RDF
• RDF123
• XLWrap
• OpenRefine
XML to RDF
• XML2RDF
• TopBraid Composer
• OpenRefine (GoogleRefine, LODRefine)
61. Data transformation – LCC example Evaluation
• Syntax evaluation
• Consistency with the ontologies
• Usage evaluation by running SPARQL queries
• show all electricity consumptions and related time periods for
all council sites related to culture
• show all energy consumptions and related time period of
council sites from Wakefield district
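For the syntax evaluation, a quick sanity check can be scripted before running a proper validator. The sketch below tests each line against a deliberately simplified subset of the N-Triples grammar (it does not check escape sequences or IRI validity in full), so it is an assumption-laden pre-check and not a replacement for a real RDF parser.

```python
import re

# Simplified patterns: a subset of N-Triples, good enough for a pre-check.
IRI = r'<[^<>"{}|^`\\\s]*>'
LITERAL = (r'"(?:[^"\\\n\r]|\\.)*"(?:\^\^' + IRI
           + r'|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?')
LINE = re.compile(
    rf'^\s*({IRI}|_:\w+)\s+({IRI})\s+({IRI}|_:\w+|{LITERAL})\s*\.\s*$'
)


def invalid_lines(ntriples_text):
    """Return (line number, line) pairs that do not look like a triple.

    Blank lines and comment lines starting with '#' are ignored.
    """
    problems = []
    for number, line in enumerate(ntriples_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not LINE.match(line):
            problems.append((number, line))
    return problems


sample = "\n".join([
    '<http://example.org/s> <http://example.org/p> "1200.5"'
    '^^<http://www.w3.org/2001/XMLSchema#decimal> .',
    '<http://example.org/s> <http://example.org/p> "1200.5"',  # missing dot
])
print(invalid_lines(sample))
```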
63. Data linking
• Ensuring that data are not just “isolated islands”
• Steps
1. To identify classes whose instances can be the
subject of linking
2. To identify data sets that may contain instances
for the previously-identified classes
3. To select the tools for performing the task
4. To use the tool in order to obtain links
• Tools: LN2R, LD mapper, Silk, LIMES, RDF-AI,
Serimi, OpenRefine
64. Data linking – LCC example
1. Classes: City, District
2. Data sets: DBpedia
3. Tool: OpenRefine
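To make the linking idea concrete, here is a minimal Python sketch that links City instances to DBpedia resources by exact match on normalized labels and emits owl:sameAs triples. It is not how OpenRefine reconciliation works internally; the local City URIs and the hard-coded candidate list are assumptions (in practice the candidates would be retrieved from DBpedia).

```python
import re

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical local instances: URI -> label as found in the source data.
LOCAL_CITIES = {
    "http://smartcity.linkeddata.es/lcc/resource/City/Leeds": "Leeds",
    "http://smartcity.linkeddata.es/lcc/resource/City/Wakefield": "wakefield ",
}

# Hypothetical candidate list: URI -> label.
DBPEDIA_CANDIDATES = {
    "http://dbpedia.org/resource/Leeds": "Leeds",
    "http://dbpedia.org/resource/Wakefield": "Wakefield",
    "http://dbpedia.org/resource/Bradford": "Bradford",
}


def normalise(label):
    """Collapse whitespace and ignore case so trivial variants match."""
    return re.sub(r"\s+", " ", label).strip().casefold()


def link(local, candidates):
    """Return owl:sameAs triples for labels that match exactly
    after normalization; unmatched instances are left unlinked."""
    by_label = {normalise(label): uri for uri, label in candidates.items()}
    links = []
    for uri, label in local.items():
        target = by_label.get(normalise(label))
        if target:
            links.append(f"<{uri}> <{OWL_SAME_AS}> <{target}> .")
    return links


for triple in link(LOCAL_CITIES, DBPEDIA_CANDIDATES):
    print(triple)
```

Exact label matching is fragile (homonymous places, spelling variants), so links produced this way should be reviewed before publication.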
70. Table of Contents
1. Introduction
2. Data preparation
3. Ontology development
4. Data generation
5. Discussion and Conclusions
71. Discussion and Conclusions
• The guidelines are based on requirements from
smart city stakeholders
• They address a broad scope of scenarios
• Different data formats (databases, CSV, Excel, XML, etc.)
• Update frequencies (static and dynamic data)
• Legal and licensing issues
• They introduce a complete example
Radulovic, F., García-Castro, R., Poveda-Villalón, M., Weise, M., Tryferdis, T.: D4.1: Requirements and guidelines for energy
data generation. Technical report, READY4SmartCities Consortium, May 2014
73. Linked Data is just data
[Figures: electrical consumption per building (electric1011, electric1112, electric1213); electricity vs gas consumption 12/13; electricity vs oil consumption 12/13]
74. Benefits of linking data
[Figures: total electric consumption (original data + geolocation); total electric consumption in locations with population > 20,000 (original data + geolocation + population)]
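The population-filtered total is the kind of result that only becomes possible once the consumption data are linked to an external source. A minimal Python sketch of that join follows; every figure and place name is made up for illustration, and only the electricTotal field name comes from the plots.

```python
# All values are invented; they are not LCC or DBpedia data.
CONSUMPTION = [
    {"site": "Site A", "city": "CityX", "electricTotal": 1200.0},
    {"site": "Site B", "city": "CityY", "electricTotal": 450.0},
    {"site": "Site C", "city": "CityX", "electricTotal": 800.0},
]

# Population per city, as it might be obtained by following links
# to an external data set (hypothetical numbers).
POPULATION = {"CityX": 50000, "CityY": 12000}


def total_by_city(rows, population, min_population=20000):
    """Sum electricTotal per city, keeping only cities whose
    population exceeds min_population."""
    totals = {}
    for row in rows:
        city = row["city"]
        if population.get(city, 0) > min_population:
            totals[city] = totals.get(city, 0.0) + row["electricTotal"]
    return totals


print(total_by_city(CONSUMPTION, POPULATION))
```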
77. Discussion and Conclusions – Future work
• Development of services for facilitating the usage of
Linked Data technology
• Support in adopting Linked Data technology
• Guidelines for publication and exploitation of Linked
Data
• Summer school for 2015
• Other training?