NPOs and NGOs increasingly act as open data providers for stakeholders such as citizens, enterprises and communities. Linked open data is becoming a key concept for meeting several demands of information professionals, for instance interoperability and accessibility of data, multilinguality and harmonisation of metadata.
The open data value chain is about to change from a rather simple chain to a more complex network of data streams, which produces new revenue models and more differentiated roles; linked open data plays a central role in this development.
This webinar is about the use of linked open data and controlled vocabularies in the specific environments in which NGOs and NPOs work. Get an overview of the underlying motivation and concepts that drive the concrete use cases presented:
1. Fisheries Linked Open Data
HARMONIZATION AND INTERLINKING OF
FISHERY REFERENCE TERMINOLOGIES
CLAUDIO BALDASSARRE
2. Outline
Fisheries Linked Open Data
Harmonization
Interlinked Domains
Application Scenarios
FLOD Consumer Applications
FLOD as Master Data Management
Objectives, Challenges and Current Status
3. Fisheries Linked Open Data
A core of code lists that are references for statistical reports or
data dissemination (e.g. yearbooks, web portals).
The codes are associated to terms (and translations) to
provide controlled vocabularies.
Fishing gears (ISSCFG) ex: purse seines - 01.1.0
Fishing vessels (ISSCFV) ex: purse seiners - 02.1.0
Fishing areas (FAO) ex: western Mediterranean - 37.1
Marine species (ASFIS) ex: yellowfin tuna - YFT
A dense network of cross domain relationships.
e.g. Sovereignty of a Country on Exclusive Economic Zone
e.g. Participation of a Country in fishing agreements
Serves fishery communities of practice inside and outside FAO:
statisticians, marine scientists, content managers
5. Harmonization
Supports statisticians in the Fisheries Division to aggregate catch statistics from regional to global level.
[Diagram: FLOD linked to Fishery Statistics, with an example of a harmonized entry]
en: Yellow Fin Tuna
es: Rabil
fr: Albacore
lt: Thunnus Albacares
asfis: YFT
taxonomic: 1750102610
worms: 127027
aquamaps: 22833
fishbase: 22833
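A harmonized entry like the one on this slide can be pictured as a single record that carries every name and code under which a species is known, so that a code from one list resolves to the codes in the others. The following is a minimal sketch in Python, not FLOD's actual data model: the `HarmonizedTerm` class and the `resolve` helper are hypothetical names, and only the values are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class HarmonizedTerm:
    """One reference term with translations and alternative codes (hypothetical layout)."""
    labels: dict = field(default_factory=dict)  # language tag -> name
    codes: dict = field(default_factory=dict)   # code list name -> code


# Values copied from the slide; the record structure is an assumption.
YELLOWFIN = HarmonizedTerm(
    labels={"en": "Yellow Fin Tuna", "es": "Rabil", "fr": "Albacore",
            "lt": "Thunnus Albacares"},
    codes={"asfis": "YFT", "taxonomic": "1750102610", "worms": "127027",
           "aquamaps": "22833", "fishbase": "22833"},
)


def resolve(terms, code_list, code):
    """Return the term that carries `code` in `code_list`, or None if absent."""
    for term in terms:
        if term.codes.get(code_list) == code:
            return term
    return None


# A record reported with a WoRMS identifier is mapped to its ASFIS code.
term = resolve([YELLOWFIN], "worms", "127027")
asfis_code = term.codes["asfis"]
```

Keeping all codes on one record means a statistician can aggregate regional figures without caring which code list each source used.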
6. Interlinked Domains
Enables users to formulate complex requests leveraging cross-domain connections:
Amount of fish caught in 2008 in the Danish Exclusive Economic Zone by vessels that practice fishing with traps?
Catch statistics reported in FAO subdivisions intersecting the marine areas under the sovereignty of Denmark?
Countries interested in the expiration of the legal agreements involving FAO fishing area 18 ending in year 2012?
[Diagram: FLOD linked to Land Geo-Politics, Marine Geo-Politics, Fishery Statistics, Fishery Vessels, Fishery Technique and Fishery Legislation]
Driving Competency Question: all deep-water species that are members of family x and family y, are critically endangered, and predominantly feed on prey species z, occurring in this LME but only between latitudes A and B and longitudes C and D
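The second request on this slide is essentially a chain of joins across domains: from a country to the marine areas it has sovereignty over, to the fishing areas those intersect, to the catch records reported there. The sketch below illustrates that traversal over a toy in-memory list of triples. It is an illustration only: the predicate names, identifiers and tonnage figures are all invented, and a real query would go to the FLOD SPARQL endpoint instead.

```python
# Toy triples; every identifier and number here is invented for illustration.
TRIPLES = [
    ("country:X", "hasSovereigntyOver", "eez:X"),
    ("eez:X", "intersects", "area:A"),
    ("eez:X", "intersects", "area:B"),
    ("catch:1", "reportedIn", "area:A"),
    ("catch:1", "tonnes", 120),
    ("catch:2", "reportedIn", "area:B"),
    ("catch:2", "tonnes", 80),
    ("catch:3", "reportedIn", "area:C"),
    ("catch:3", "tonnes", 50),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the store."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def catch_in_waters_of(country):
    """Follow country -> marine zone -> intersecting fishing area -> catch record."""
    records = set()  # a set, so a record reached via two areas counts once
    for zone in objects(country, "hasSovereigntyOver"):
        for area in objects(zone, "intersects"):
            records.update(subjects("reportedIn", area))
    return sum(t for rec in records for t in objects(rec, "tonnes"))


total = catch_in_waters_of("country:X")
```

Each hop crosses a domain boundary (geo-politics, geography, statistics), which is what the interlinking makes possible in a single request.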
7. Application Scenarios
Reallocate species catch statistics based on geospatial information and fishing agreements.
Generate landing pages populated with data from remote open linked datasets.
Enhance search by exploiting the network of connections in FLOD.
Document retrieval driven by contextual information.
[Diagram: FLOD linked to Land Geo-Politics, Marine Geo-Politics, Fishery Statistics, Fishery Vessels, Fishery Technique and Fishery Legislation]
8. FLOD Portal: Harmonization Exposed
Search for reference terms
Display alternative codes
from harmonized code lists
Display translation for the
controlled term
Display meta information on
data provenance (i.e. rights
holder and publisher)
Multilingual auto completion
to avoid spelling errors
All data are exposed through
the FLOD SPARQL endpoint
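Multilingual auto-completion can be thought of as a case-insensitive prefix match over the labels of all languages at once, returning the reference code with each suggestion. The sketch below is a simplified, hypothetical version in Python (the `TERMS` list and `autocomplete` function are not the portal's code); the labels and codes are those shown on earlier slides.

```python
# (language, label, code) rows; values taken from earlier slides.
TERMS = [
    ("en", "Yellow Fin Tuna", "YFT"),
    ("es", "Rabil", "YFT"),
    ("fr", "Albacore", "YFT"),
    ("en", "purse seines", "01.1.0"),
    ("en", "purse seiners", "02.1.0"),
]


def autocomplete(prefix, limit=10):
    """Suggest (label, language, code) for labels starting with `prefix`, in any language."""
    needle = prefix.casefold()  # casefold gives case-insensitive matching
    hits = [
        (label, lang, code)
        for lang, label, code in TERMS
        if label.casefold().startswith(needle)
    ]
    return sorted(hits)[:limit]


suggestions = autocomplete("ALB")
```

Because the user picks a suggestion rather than typing the full term, the query is always tied to a known reference code, which is how spelling errors are avoided.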
9. FLOD Portal: Network of Publications
Display a list of FLOD individuals annotating this publication.
Display the occurrence of the
user query in a specific page
of the publication.
Display provenance
information for this
publication.
Multilingual auto completion
to avoid spelling errors
All data are exposed through
the FLOD SPARQL endpoint
10. Enrich User Information Context
Mine FLOD entities in the web page browsed by the user
Enrich the content with data from the FLOD SPARQL endpoint, retrieved through the hyperlinks
Provides casual users with an alternative to searching the portal or the SPARQL endpoint
11. SPREAD: Time Series Spatial Reallocation
Retrieves all Exclusive Economic Zones where a country is allowed to fish.
Retrieves reference species
codes from regional code lists
Retrieves spatial intersection
of fishing areas
Retrieves fishing rights based
on fishing agreements
All data are stored in the
FLOD SPARQL endpoint
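The slide lists the inputs SPREAD retrieves but not the reallocation rule itself. As an illustration only, the sketch below assumes one simple rule: a catch figure reported for a fishing area is split across the zones it intersects in proportion to the intersection area, restricted to zones where the country holds fishing rights. The function name, zone identifiers and numbers are hypothetical, and the proportional rule is an assumption, not SPREAD's documented method.

```python
def reallocate(catch_tonnes, overlap_areas, allowed_zones):
    """Split a catch figure across zones in proportion to their overlap area.

    overlap_areas: zone id -> area of intersection with the reporting fishing area
    allowed_zones: zones where fishing agreements permit the country to fish
    """
    eligible = {
        zone: area
        for zone, area in overlap_areas.items()
        if zone in allowed_zones and area > 0
    }
    total = sum(eligible.values())
    if total == 0:
        raise ValueError("no eligible zone to reallocate the catch to")
    return {zone: catch_tonnes * area / total for zone, area in eligible.items()}


# Invented example: zone C overlaps the fishing area but is not covered by an agreement.
shares = reallocate(
    100.0,
    {"eez:A": 30.0, "eez:B": 10.0, "eez:C": 60.0},
    {"eez:A", "eez:B"},
)
```

The two inputs correspond to the spatial intersections and fishing rights that the slide says are retrieved from the FLOD SPARQL endpoint.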
12. Smart Time Series
Mine FLOD entities in catch time series (i.e. species, water areas, countries)
Leverage the FLOD network to associate georeferenced data with geographic entities (e.g. water areas)
Generate a KML model including references to the URIs of the entities found in each statistical record
Map time series records on Google Maps
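The KML generation step can be sketched as follows: each statistical record becomes a placemark whose description carries the URI of the entity it refers to. This is a minimal sketch using Python's standard library; the record fields, the `example.org` URI and the coordinates are hypothetical, and only the basic `Placemark`, `name`, `description`, `Point` and `coordinates` elements of KML 2.2 are used.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def records_to_kml(records):
    """Serialize records as a KML document with one Placemark per record."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    for rec in records:
        placemark = ET.SubElement(doc, q("Placemark"))
        ET.SubElement(placemark, q("name")).text = f"{rec['species']} {rec['year']}"
        # The entity URI goes into the description so a viewer can link back to it.
        ET.SubElement(placemark, q("description")).text = rec["area_uri"]
        point = ET.SubElement(placemark, q("Point"))
        # KML expects longitude first, then latitude.
        ET.SubElement(point, q("coordinates")).text = f"{rec['lon']},{rec['lat']}"
    return ET.tostring(kml, encoding="unicode")


# Invented record; the URI is a placeholder, not a real FLOD identifier.
kml_text = records_to_kml([
    {"species": "YFT", "year": 2008, "lon": 5.0, "lat": 40.0,
     "area_uri": "http://example.org/flod/area/37.1"},
])
```

The resulting string can be saved as a `.kml` file and loaded into a map viewer.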
13. FLOD as Master Data Management
Master Data Management is a wide area of technological
investigation in FAO to identify a toolkit that enables the
management/maintenance of reference data at corporate level.
MDM Principles
Features
• Managing Multiple Vocabularies/Classifications and their Cross Mappings
• Multilingual Services
• Import/Export Routines Supporting Multiple Formats
• Integration with Existing Tools and Systems through open APIs/Web services
• Governance (data ownership and update workflows)
Benefits
The FLOD project inherits the principles and benefits defined for MDM, and develops MDM features through the adoption of semantic technologies.
14. Vocabularies and their Cross Mappings
Objectives
Align reference terminologies of fishing gears, vessels, marine species
and fishing areas.
Link individuals of fishing gears, vessels, marine species, fishing and
administrative sea areas, geo-political territories, legal and governmental
entities.
Challenges
Evolve from a hierarchical structure to a data model design that enables accurate alignment and linking despite the heterogeneous granularity of the classifications.
Current status
FLOD is designed by architecting modules of part-whole, constituency,
collection, and other reusable ontology engineering patterns.
Ingestion workflows are in place from structured sources of
terminologies and relationships to generate linked datasets.
15. Multilingual Services
Objectives
Associate the name(s) in the reference terminologies with the variety of lexicalizations in local usage.
Track the evolution of lexicalizations over time.
Challenges
A descriptive semantic model for lexicalizations that responds to the
needs of selecting the appropriate name(s) in the user information
context.
Current status
FLOD implements the well-known rdfs:label property with language metadata; it defers information on further lexicalizations or name changes to the source system from which the names originate.
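With language-tagged labels, selecting the appropriate name for a user comes down to walking the user's language preferences and falling back to a default. The sketch below is a hypothetical helper in Python, not FLOD code; the labels are the ones from the harmonization slide, and English as the default is an assumption.

```python
# Language-tagged labels for one individual, as on the harmonization slide.
LABELS = {"en": "Yellow Fin Tuna", "es": "Rabil", "fr": "Albacore"}


def pick_label(labels, preferred_languages, default="en"):
    """Return the first label available in the user's preferred languages.

    Falls back to `default` (assumed here to be English), or None if that is missing too.
    """
    for lang in preferred_languages:
        if lang in labels:
            return labels[lang]
    return labels.get(default)


name_for_user = pick_label(LABELS, ["it", "es"])
```

Anything richer, such as regional variants or names that changed over time, would have to come from the source system, as the slide notes.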
16. Import/Export Routines
Objectives
Streamline workflows of data import and conversion to RDF.
Record and store versions of imported data.
Selective maintenance operations targeting specific datasets.
Challenges
A framework for ETL operations for system administrators with little or no knowledge of linked datasets.
Keep data sources and linked datasets synchronized on a regular basis, with control over versioning.
Current status
Semi-automatic processes read reference terminologies and cast them into datasets through FLOD ontology modules.
Each dataset receives a timestamp as explicit metadata at its creation.
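As an illustration of such an import step, the sketch below converts rows of a code list into N-Triples lines and stamps the dataset with a creation time. It is a simplified stand-in for the FLOD workflows, not a description of them: the `example.org` base URI and the choice of `skos:notation`, `rdfs:label` and `dcterms:created` as properties are assumptions.

```python
from datetime import datetime, timezone

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SKOS_NOTATION = "http://www.w3.org/2004/02/skos/core#notation"
DCT_CREATED = "http://purl.org/dc/terms/created"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
BASE = "http://example.org/flod/"  # hypothetical base URI


def escape(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(dataset, rows, created):
    """Cast (code, language, label) rows into N-Triples, with a dataset timestamp."""
    dataset_uri = f"<{BASE}dataset/{dataset}>"
    lines = [
        f'{dataset_uri} <{DCT_CREATED}> "{created.isoformat()}"^^<{XSD_DATETIME}> .'
    ]
    for code, lang, label in rows:
        subject = f"<{BASE}{dataset}/{code}>"
        lines.append(f'{subject} <{SKOS_NOTATION}> "{escape(code)}" .')
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape(label)}"@{lang} .')
    return lines


# Fixed timestamp so the output is reproducible; a real run would use the current time.
lines = to_ntriples(
    "gear",
    [("01.1.0", "en", "purse seines")],
    datetime(2013, 1, 1, tzinfo=timezone.utc),
)
```

Stamping the dataset rather than each triple keeps versioning at the granularity the slide describes, one timestamp per imported dataset.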
17. Integration with Existing Tools and Systems
Objectives
Homogeneously search remote reference terminologies hosted in a
combination of DBs and KBs.
Empower search engines with query expansion based on the knowledge
available in FLOD.
Expose knowledge base content through an RDF-agnostic API.
Challenges
Define a top level domain ontology that contains the reference super
concepts to address the reference terminologies.
Implement a mechanism to cast user terms as individuals or concepts in
the top level ontology.
Current status
An implementation of OpenSearch with semantic extension is being prepared to enable query services on top of the FLOD SPARQL endpoint.
The FLOD portal includes spelling support to convert user query terms into references to FLOD individuals.
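Query expansion based on FLOD can be pictured as follows: once a user term is recognized as a name or code of a FLOD individual, the search is widened with all other names and codes of that individual. The sketch below is a hypothetical illustration in Python; the `expand_query` function and the record layout are assumptions, and the values come from the harmonization slide.

```python
# One individual with its names and codes; values from the harmonization slide.
INDIVIDUALS = [
    {
        "labels": {"en": "Yellow Fin Tuna", "es": "Rabil", "fr": "Albacore"},
        "codes": {"asfis": "YFT"},
    },
]


def expand_query(term, individuals):
    """Return the user term plus every name and code of the individuals it matches."""
    needle = term.casefold()
    expanded = {term}
    for individual in individuals:
        names = set(individual["labels"].values()) | set(individual["codes"].values())
        # Match case-insensitively against any known name or code.
        if needle in {name.casefold() for name in names}:
            expanded |= names
    return sorted(expanded)


expanded_terms = expand_query("rabil", INDIVIDUALS)
```

A search engine fed with the expanded list would then retrieve documents that mention the species under any of its names, not only the one the user typed.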
18. Governance
Objectives
Model the role and activity of the data providers with explicit
reference to owner, publisher, rights holder, right to update,
and system of provenance.
Inject the Import/Export maintenance routine with the roles
and activity of governance.
Challenges
Identify licensing schemas that can drive the modeling activity.
Current status
Round tables on governance have been started in contexts
where the data providers have active participation.
19. Conclusions
FLOD represents a source of harmonized reference data and
controlled terms for applications of statistics and marine science.
FLOD is consumed by applications that need to aggregate either data instances or document resources relevant to users' needs (e.g. search or context).
The FLOD approach to maintaining linked reference datasets leads to a decentralized maintenance effort under the responsibility of the respective data owners.
The FLOD roadmap is at a point where a robust maintenance framework and a scalable operational infrastructure are recommended.
Where the FLOD approach to maintaining code lists and controlled terms proves successful, it can provide good practice and recommendations for corporate Master Data Management.
20. Acknowledgements
iMarine project supports the development of:
FLOD Web Portal
SPREAD geospatial reallocation engine
FLOD content enricher
iMarine: http://www.i-marine.eu/
Editor's Notes
Start introducing MDM. A wide area of technological investigation in FAO (ref. from francesco, blog entry). FLOD is an instance of MDM, hence it inherits the features, the principles and the business benefits, so that any technology applied to tackle a challenge localized in the fisheries domain could be elevated and shared for enterprise master data management.
What is the current status with respect to: A terminology is made of codes and names with multiple translations, hence also named code list, sometimes controlled terms. It is a reference for one or multiple communities of practice and is organized in a hierarchical way. In some operational scenarios of data aggregation (e.g. statistics of global fish captures), locally identified fishing gears, vessels and marine species need to be reduced to global code lists like those FI owns. Sometimes the level of detail among terminologies/code lists is heterogeneous, so that for one term/code n terms appear in another list.
What is the current status with respect to: managing multiple vocabularies, managing multilinguality, governance, import/export, integration with other services
I left governance for last because it is the aspect where we need more spin to bootstrap the activity. We are trying to apply governance from the iMarine project.