Proposed Linked Data Migration Framework for Singapore Government Datasets
1.
2. • Basics of Linked Data
• data.gov.sg
• Purpose of this project
• Migrational Framework
• Eight Steps
• Use Cases
• Conclusion
3. OPPORTUNITY OF LINKING DATA ACROSS VARIOUS DOMAINS AND TYPES
Domains
• Governments
• Enterprises
• Entertainment
• Libraries & Museums
• Social Media (Blogs, Facebook)
• Business
Types of Data
• Factual Data
• Transactional Data
• Textual Data
• Spatial Data
• Multimedia
• Files & Database
4. Mr. Brendan Luyt’s associated publication search…
(Traditional Approach) (Linked Data Approach)
Mr. Lee Kuan Yew: an exploration…
Others…
7. Data.Gov.Sg
IDA Singapore launched the Data.gov.sg portal and mGov@SG public services in June 2011
Data.gov.sg provides 5000+ public data sets from 50 government agencies
Purpose: to support research and the building of applications that use the data
8. SG Government Data Eco System
Data providers (MINISTRIES and STATUTORY BOARDS)
• Ministries: Ministry of Community Development, Youth & Sports; Ministry of Education; Ministry of Foreign Affairs; Ministry of Health; Ministry of Law - Community Mediation Unit; Ministry of Manpower; Ministry of Transport
• Accountant-General's Department
• Accounting and Corporate Regulatory Authority
• Agency For Science, Technology & Research
• Attorney-General’s Chambers
• Building & Construction Authority
• Central Narcotics Bureau
• Central Provident Fund Board
• Civil Aviation Authority of Singapore
• Department of Statistics
• Economic Development Board
• Energy Market Authority
• Health Sciences Authority
• Housing & Development Board
• Immigration & Checkpoints Authority
• Infocomm Development Authority of Singapore
• Inland Revenue Authority of Singapore
• Institute of Technical Education
• Intellectual Property Office of Singapore
• JTC Corporation
• Judiciary, Subordinate Courts
• Judiciary, Supreme Court
• Land Transport Authority
• Majlis Ugama Islam Singapura
• Maritime & Port Authority of Singapore
• Media Development Authority
• Monetary Authority of Singapore
• Nanyang Polytechnic
• National Environment Agency
• National Heritage Board
• National Library Board
• National Parks Board
• Ngee Ann Polytechnic
• People's Association
• Public Service Division
• Public Transport Council
• Public Utilities Board
• Republic Polytechnic
• Sentosa Development Corporation
• Singapore Civil Defence Force
• Singapore Customs
• Singapore Land Authority
• Singapore Police Force
• Singapore Polytechnic
• Singapore Sports Council
• Singapore Workforce Development Agency
• Spring Singapore
• Temasek Polytechnic
• Urban Redevelopment Authority
DGS Eco System (SG DATA)
• STRUCTURED DATA and UNSTRUCTURED DATA, in TEXTUAL, SPATIAL, HTML, PDF, XLS and API form
• Sources include agency websites and Singstat publications
APIs
• Map-related APIs from various agencies
• Traffic-related APIs from Land Transport Authority
• Tourism-related APIs from the Singapore Tourism Board
• Environment-related APIs from the National Environment Agency
• Library-related data feeds & web services from National Library Board
THEMES (grouped by CATEGORIES)
• C - Community: Infocomm Access, Silver infocomm, Comm Mediation Center, BFA Buildings, CD Councils, Community Clubs, Constituency offices, Other facilities, Other Pan networks, PA head quarters, Residents Committee, Water Venture
• Cul - Culture: Heritage sites, Monuments, Museums, Libraries, Streets and Places
• E - Environment: Green Building, After Death Facilities, Funeral Parlours, Hawker Center, NEA Offices, Skyrise greenery, Recycling Bins, Waste Disposal Site, Waste Treatment
• Emp - Employment: CET Centers, WDA Service points
• Edu - Education: Kindergartens
• H - Health: Breast Screen, Cervical Screen, Healthier Dining, Quit Centers, Dengue Cluster
• F - Family: Child care, Disability, Elder care, Family, Family Friendly Estab, Student Care
• R - Recreation: Wireless Hotspots, ABC Water Proj, National Parks
• S - Sports: Sports clubs
OPERATIONS
• Get Token
• Address Search
• Agency Data Search
• Static Map
• Get Layer Info
• Mashup
• Get Related Data
• Get Directions
• Public Transportation
• Reverse Geocode
9. Drawbacks of Existing Data Ecosystem
• Siloed architecture
• Absence of vocabulary standardization (a common language)
• Multiple data consumption end points
• Steep learning curve for developers during the application development process
• Absence of interlinking between data sets
Linked Data addresses these drawbacks at multiple levels:
• Data Storage - can support distributed storage
• Data Representation - common format (RDF) for both data and metadata
• Data Consumption - via a single query end point (SPARQL)
• Data Interlinking - use of ontologies (vocabularies)
IDA can use Linked Data on top of its traditional systems instead of going for a complete overhaul
10. UK Linked Data Implementation
http://wheredoesmymoneygo.org/bubbletree-map.html#/~/grand-total--2010-
http://www.sgdi.gov.sg/
http://labs.data.gov.uk/gov-structure/departments/
11. Linked Data Representation Format
RDF: Subject - Predicate - Object
Example: Jurong belongs to the West Zone
Subject: http://data.gov.sg/resource/area/Jurong_West
Predicate: http://data.gov.sg/ontology/property/has_zone
Object: http://data.gov.sg/resource/zone/West
Literal properties attached to the subject:
http://w3.org/2003/01/geo/wgs84_pos#/lat 0.21222
http://w3.org/2003/01/geo/wgs84_pos#/long 12.5555
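The subject-predicate-object statement on this slide can be written down as a single line of N-Triples. The sketch below is only an illustration using the Python standard library; the `Triple` type and `to_ntriples` helper are hypothetical names introduced here, and the helper handles URI-only triples (no literals or blank nodes).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement whose three parts are all URIs."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a URI-only triple as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# "Jurong belongs to the West Zone", using the URIs from the slide.
jurong = Triple(
    "http://data.gov.sg/resource/area/Jurong_West",
    "http://data.gov.sg/ontology/property/has_zone",
    "http://data.gov.sg/resource/zone/West",
)
line = to_ntriples(jurong)
print(line)
```

In practice an RDF library would normally handle serialization and escaping; the point here is only that each fact becomes one self-describing line.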
12. Why are we doing this project?
• To prescribe a migrational framework for Linked Data for data.gov.sg (DGS) data sets
• To give a first-hand view of the required migration activities
• To anticipate issues at each step
• To evaluate and recommend Linked Data tools
• To help IDA understand the benefits of Linked Data
13. Framework Formulation Process
• Based on a study of Linked Data migration research papers and cookbooks published by the World Wide Web Consortium (W3C)
• Analysis of Linked Data implementations in the UK, US and Brazil
• Evaluation of Linked Data tools with Singapore data sets for recommendation in each step of the framework
• Consideration of probable issues that could be faced during implementation
14. Datasets Used for Framework Evaluation
• URA Sites for Sales dataset (Urban Planning)
• DOS Population and Household Characteristics dataset (Population Demographics)
  • Age Pyramid of Resident Population
  • Old Age Support Ratio
15. Proposed Linked Data Migrational Framework for DGS
[Figure: eight-step framework diagram. For each step it shows INPUT, PROCESS, OUTPUT, the resources involved (government agencies and IDA, domain matter experts, ontology modelers, web architects, developers) and a resource allocation figure.]
1 Specification Identification Analysis
2 Object Modeling
3 Ontology Modeling
4 URI Naming
5 RDF Creation
6 External Linking
7 Datasets Publication
8 Discovery & Exploitation
16. Step 1: Specification Identification Analysis
Key Points
• Understand data.gov.sg database specifications (relational model & ER model)
• Seven issues identified at data storage and consumption level
Addressing security concerns with licenses
• The Open Database License (ODbL)
• Open Data Commons Attribution License
• The Creative Commons Licenses
Implementation modes
• 1st level - Linked Data only (just URI linking): ideal for testing the URI lifecycle; requires a decision on URI administration, centralized (DGS) vs. decentralized (agency)
• 2nd level - Linked Data + RDF: complete realization of Linked Data and Semantic Web standards; the decision to use this mode can be taken after evaluation of the POC
• Optional - Linked Data for files only (URIs for files): to improve the discovery of files in DGS through semantic annotation
17. Step 2: Object Modeling
This is modeling without usage context.
*Requires normalization of the database model to 3NF
Key Learning
• Ease in identifying the use of common objects across data sets
• Facilitates brainstorming of relationships between objects
Issues
• Possibility of applying high abstraction and high granularity to objects
18. Step 3: Ontology Modeling
Takes the output conceptual diagram from Object Modeling as input.
Key Impetus
• Re-use of popular vocabularies (table below)
• Use of the StdTrip methodology for arriving at ontologies for relational databases
Issues
• Conflicting vocabulary in data.gov.sg and OneMap
• Different levels of granularity in datasets (ex: Location in URA ‘Site for Sales’ dataset)
Use Case Problem Statement
Consider an industrial entrepreneur intending to buy a site from the Urban Redevelopment Authority (URA)
Predicate/Vocabularies | Purpose
rdfs:label and skos:prefLabel | Naming things
Geonames | Model spatial data
VoID | Describe RDF schema or vocabulary
vCard | Describing address
RDF, RDFS | Model simple data
19. Step 3: Ontology Modeling (continued)
• Date fields, location fields and fields related to measurements in DGS have scope for vocabulary re-use
• Vocabulary for the identified data sets (developed using Protégé) with screenshots
• List of vocabularies required for LOGD implementation
• List of tools used for ontology modeling
OUTPUT?
ALLOCATION PERCENTAGE?
PERSONNEL INVOLVED
20. Step 4: URI Naming
Key Points
Assigning a Uniform Resource Identifier (URI) to every resource is analogous to assigning an IP address to every computer.
Identified URI Administration Modes
1.) Maintained centrally in the DGS platform (resultant URIs will start with http://data.gov.sg/) - RECOMMENDED
2.) Maintained by individual agencies (resultant URIs will start with http://ura.gov.sg or http://sla.gov.sg)
3.) Maintained externally by third-party platforms such as Kasabi (resultant URIs will start with http://data.kasabi.org)
TBOX (ontology terms) | ABOX (instances)
http://data.gov.sg/ontology/Ministry/ | http://data.gov.sg/ministry/MOH
http://data.gov.sg/ontology/Agency/ | http://data.gov.sg/agency/SLA
http://data.gov.sg/ontology/SiteLocation | http://data.gov.sg/location/pioneer_road_north
http://data.gov.sg/ontology/Race | http://data.gov.sg/race/chinese
Dataset URIs
Dataset ID | URAstaticfile001
Dataset | http://data.gov.sg/dataset/URAstaticfile001/
Class | http://data.gov.sg/terms/class/URAstaticfile001/sitesforsale
Property | http://data.gov.sg/terms/property/URAstaticfile001/time
Row 1 | http://data.gov.sg/dataset/URAstaticfile001/1
Row 1 - a generic column | http://data.gov.sg/dataset/URAstaticfile001/1/columnName
Issues
• Usage of different Linked Data tools can hamper URI naming
• Possibility of dead links
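The dataset URI patterns listed on this slide are regular enough to be generated by small helper functions, which is one way to keep naming consistent when several tools are involved. The following is a minimal sketch assuming the centrally administered base http://data.gov.sg; the function names are hypothetical and not part of any DGS system.

```python
BASE = "http://data.gov.sg"  # centrally administered mode (option 1 on the slide)


def dataset_uri(dataset_id: str) -> str:
    return f"{BASE}/dataset/{dataset_id}/"


def class_uri(dataset_id: str, class_name: str) -> str:
    return f"{BASE}/terms/class/{dataset_id}/{class_name}"


def property_uri(dataset_id: str, property_name: str) -> str:
    return f"{BASE}/terms/property/{dataset_id}/{property_name}"


def row_uri(dataset_id: str, row_number: int) -> str:
    return f"{BASE}/dataset/{dataset_id}/{row_number}"


def cell_uri(dataset_id: str, row_number: int, column_name: str) -> str:
    # A cell is addressed as a column under its row URI.
    return f"{row_uri(dataset_id, row_number)}/{column_name}"


ds = "URAstaticfile001"
print(dataset_uri(ds))
print(class_uri(ds, "sitesforsale"))
print(property_uri(ds, "time"))
print(cell_uri(ds, 1, "columnName"))
```

Keeping the patterns in one place means a change of base (for example to an agency-maintained domain) touches a single constant rather than every mapping file.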
21. Step 5: RDF Creation
Evaluated three tools, one for each mode of conversion: Google Refine, RDF Views and RDF Sponger
Issues
• API outages that are not announced can cause the system to return null or invalid results
• Google Refine doesn’t create URIs for each row in the static file
• Changes to data.gov.sg tables or API output that are made without matching changes in the mapping files will affect RDF conversion
Key
Points
External Linking Home
External Linking is connecting with other data sets in the web of data
CIA World Supreme
WorldBank
Factbook
DBpedia Flickr FAO Geonames
Court
Data.gov.sg
<http://data.gov.sg/location/bugis> <owl:sameAs> <http://www.dbpedia.org/resource/Bugis>
<http://data.gov.sg/race/malay> <owl:sameAs> <http://www.dbpedia.org/resource/Malay_race>
Issues
•The outbound links made to data sets outside of IDA’s purview can be risky
•Dead links are a vivid possibility during the change of resource URIs or system
downtime
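The framework mentions linking based on similarity algorithms. As a rough illustration of the idea (not of how SILK or LIMES are implemented), the sketch below compares the last path segment of local and external URIs with `difflib.SequenceMatcher` from the Python standard library and proposes an owl:sameAs link only above a threshold. The `propose_links` function and the 0.9 default threshold are assumptions.

```python
from difflib import SequenceMatcher


def local_name(uri: str) -> str:
    """Last path segment of a URI, normalized for comparison."""
    return uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " ").lower()


def propose_links(local_uris, external_uris, threshold=0.9):
    """Pair each local URI with its most similar external URI, if similar enough."""
    links = []
    for local in local_uris:
        best_score, best_match = 0.0, None
        for external in external_uris:
            score = SequenceMatcher(None, local_name(local), local_name(external)).ratio()
            if score > best_score:
                best_score, best_match = score, external
        if best_match is not None and best_score >= threshold:
            links.append((local, best_match))
    return links


local = ["http://data.gov.sg/location/bugis", "http://data.gov.sg/race/malay"]
external = [
    "http://www.dbpedia.org/resource/Bugis",
    "http://www.dbpedia.org/resource/Malay_race",
]
links = propose_links(local, external)
for subject, target in links:
    print(f"<{subject}> owl:sameAs <{target}> .")
```

With the strict default threshold only the Bugis pair is proposed automatically; the Malay pair scores lower because the names differ, so it would be left for manual review or a looser threshold. Name similarity alone can also produce wrong matches, which is one reason outbound links need human oversight.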
23. Step 7: Datasets Publication
Key Points
Components: Metadata Publication, Datasets Publication, Triple Store, Linked Data API Hosting
Flow: Linked Data API call -> LDA-SPARQL mapping file -> SPARQL query -> triple store -> RDF triples -> conversion from RDF to JSON -> JSON output
Linked Data API call
http://data.gov.sg/lda/childcare/north
SPARQL Query
SELECT ?cc
WHERE {
?cc dgs:haszone dgs:north.
?cc dgs:facilitytype dgs:childcare.
}
LIMIT 100
RDF Triples | JSON Output
http://data.gov.sg/facility/cc/name1 | Entry: name1
http://data.gov.sg/facility/cc/name2 | Entry: name2
http://data.gov.sg/facility/cc/name3 | ...
... | ...
http://data.gov.sg/facility/cc/name100 | Entry: name100
Recommendations
• Linked Data hosting platforms are best suited for open license datasets (ex: Singstat publications)
• Use of APIs for updating RDF triples instead of SPARQL Update documents
• Use of VoID generators for creating statistics triples
Issues
• Difficulty for application developers - SPARQL does not currently support sub-queries, views, stored procedures etc.
• Inferencing is not possible with the Linked Data API
• Security implementation with third-party Linked Data hosting platforms
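The API-to-SPARQL conversion on this slide can be sketched as a small translation function: the path of the Linked Data API call is split into facility type and zone, which are substituted into the query template, and the resulting resource URIs are flattened to JSON entries. This is a sketch under stated assumptions, not the Linked Data API implementation: the `dgs:` namespace URI, the function names and the shape of the JSON are hypothetical, and no triple store is actually queried.

```python
import json

# Hypothetical namespace for the dgs: prefix; the slide does not define it.
PREFIX = "PREFIX dgs: <http://data.gov.sg/ontology/>"


def api_path_to_sparql(path: str, limit: int = 100) -> str:
    """Translate /lda/<facilitytype>/<zone> into the SPARQL query from the slide."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 3 or parts[0] != "lda":
        raise ValueError(f"unsupported API path: {path}")
    _, facility, zone = parts
    # Only plain alphanumeric segments are accepted, to avoid query injection.
    if not (facility.isalnum() and zone.isalnum()):
        raise ValueError(f"invalid path segment in: {path}")
    return (
        f"{PREFIX}\n"
        "SELECT ?cc\n"
        "WHERE {\n"
        f"  ?cc dgs:haszone dgs:{zone} .\n"
        f"  ?cc dgs:facilitytype dgs:{facility} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def uris_to_json(uris) -> str:
    """Convert result URIs into the simplified JSON entries returned to app developers."""
    entries = [{"entry": u.rstrip("/").rsplit("/", 1)[-1], "uri": u} for u in uris]
    return json.dumps(entries)


query = api_path_to_sparql("/lda/childcare/north")
payload = uris_to_json(["http://data.gov.sg/facility/cc/name1"])
print(query)
print(payload)
```

The design point is that application developers only see the REST-style path and the JSON, while the mapping to SPARQL stays on the server side.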
24. Step 8: Discovery & Exploitation
Key Theme
1.) Internal discovery within Singapore for local citizens
2.) External discovery for attracting usage of Singapore government data in international economic & political research and global issues (water scarcity, carbon footprint, etc.)
• Internal discovery can be improved by offering different end points (SPARQL, API, Apps, RDF dumps), running awareness programs on the availability of these data sets, and employing crowdsourcing and gamification techniques to enhance their visibility and utility
• External discovery is optional if IDA wishes to keep the DGS system limited to Singapore's purview. External discovery can be initiated by registering the datasets in open government dataset portals (potential candidates are datasets with an open license)
25. Interlinked Datasets Post-Migration
• Original data provided by URA
• Similarly, location-based data from the OneMap API is retrieved for Pasir Ris
• Possible because of the re-use of the common resource URI for Pasir Ris across data sets
26. Other Interesting Use Cases
Q & A Engine that works on top of government linked data. Inspired by www.trueknowledge.com
Definitely not Science Fiction!
27. Sense-Making
Question: Which recent year had a growth rate close to 50% for the majority of Singapore-based SMEs?
Step 1: Spot the resources in this query
DBpedia Spotlight does just that (semantic information extraction):
Which recent year had a growth rate close to 50% for majority of Singapore based SME
Step 2: Identify the relationships between the resources
• SME is an instance of the Organization class
• The Organization class comes under the country Singapore
• Growth rate is a property of the Sales class
• Year is a class by itself
• Majority is a subset of the Group class
Step 3: Use an NLP technique, syntactic analysis (Stanford Parser) followed by focus extraction, to understand the question
A syntactic parse tree is generated, followed by an access pattern
Step 4: Look for RDF triples that meet the criteria
2010 is returned as the result!
28. Summary
Activities
• Four in-person discussion sessions with IDA, NIIT and SLA
• Analysis of five data.gov.sg system specifications
• Evaluation of four existing migration frameworks
• Prototyping with six core Linked Data tools
Tools by step
• Object Modeling: Concept Map
• Ontology Modeling: Protégé
• URI Naming: Pubby
• RDF Creation: Google Refine, RDF Views, RDF Sponger
• External Linking: SILK, LIMES
• Dataset Publication: Virtuoso Universal Server, Linked Data API
29. Summary
• Applicability of the framework to Singapore Government data
• Issues identified in the existing data ecosystem
• Recommended tools and best practices for each step
• Launchpad for SG Linked Data implementation
Final Thoughts…
• ROI is not a key metric for Linked Data implementation
• The benefits of moving to Linked Data are intangible and may not be immediately realizable
• The volume of work is huge compared to traditional systems
Editor's Notes
• DBpedia: places and events
• CIA and World Bank: economic analysis
• Flickr: places
• FAO: export and import commodities
• Supreme Court: facts