Discussion Notes: Presentation to Ecoinformatics International Technical Collaboration Partnership
International Web Meeting - Linked Open Data and Environmental Information
Day 1 – December 6, 2010
Geospatial Topic – Dave Smith
December 6, 2010
Dave Smith
USEPA/OEI/OIC/IESD/ISSB
smith.davidg@epa.gov
202-566-0797
Document Change History
Revision | Date      | Author         | Description
1.0      | 12/6/2010 | David G. Smith | Initial Version
FRS as a Linked Open Data Pilot - Background
EPA maintains a database of facilities, the Facility Registry System (FRS). It is aggregated from a variety of sources: 32 federal databases (mostly EPA's, along with a few others such as the Energy Information Administration's) and 57 state and tribal databases. Information about facilities is conflated from these sources and includes:

- facility name;
- geographic location, including spatial feature type (such as point or polygon), latitude, longitude, coordinate reference system, and collection metadata;
- physical and mailing addresses;
- points of contact;
- activities conducted at the location, expressed as North American Industry Classification System (NAICS) codes or codes from its predecessor, the Standard Industrial Classification (SIC);
- any associated program identifiers, permit numbers, and other related items.
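The conflation step described above can be sketched as grouping per-source records under one facility. This is a minimal sketch, not the FRS implementation. It assumes every source record already carries a shared registry identifier. The class names, field names, and `conflate` function are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SourceRecord:
    """One facility record as reported by a single source database (hypothetical fields)."""
    source_system: str             # name of the contributing federal, state, or tribal database
    registry_id: str               # hypothetical shared ID used to match records to one facility
    program_id: str                # identifier the facility carries in that source system
    name: str
    naics: Optional[str] = None


@dataclass
class Facility:
    """Conflated view of one facility across all contributing sources."""
    registry_id: str
    names: Set[str] = field(default_factory=set)
    naics_codes: Set[str] = field(default_factory=set)
    program_ids: Dict[str, str] = field(default_factory=dict)  # source system -> program ID


def conflate(records: List[SourceRecord]) -> Dict[str, Facility]:
    """Group source records by registry ID, keeping every name, code, and program ID."""
    facilities: Dict[str, Facility] = {}
    for rec in records:
        fac = facilities.setdefault(rec.registry_id, Facility(rec.registry_id))
        fac.names.add(rec.name)
        if rec.naics:
            fac.naics_codes.add(rec.naics)
        # Simplification: one program ID per source system; a later record overwrites.
        fac.program_ids[rec.source_system] = rec.program_id
    return facilities
```

In practice, source records may not share an identifier at all. A real pipeline would first need to match them on name, address, and location before a grouping step like this one could apply.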
FRS in turn serves as a geospatial foundation for some of EPA’s reporting and mapping tools, such as Envirofacts and MyEnvironment. This allows parametric data and reports from a variety of programs to be linked to facilities.
Currently, this integration is done through traditional means, i.e., relational database management system (RDBMS) queries. Web services and APIs are also limited. As a result, integration opportunities are generally confined to what we can do within the Agency.
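The traditional RDBMS integration described above amounts to joining program data to facility records on a shared key. The sketch below uses an in-memory SQLite database. The table names `facility` and `program_report`, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory database with two hypothetical tables: a facility registry and
# one program's reported values, keyed by the same facility identifier.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facility (
    registry_id TEXT PRIMARY KEY, name TEXT, latitude REAL, longitude REAL);
CREATE TABLE program_report (
    registry_id TEXT REFERENCES facility(registry_id), parameter TEXT, value REAL);
""")
conn.executemany("INSERT INTO facility VALUES (?, ?, ?, ?)", [
    ("F001", "Acme Plant", 38.90, -77.03),
    ("F002", "Beta Works", 39.10, -76.80),
])
conn.executemany("INSERT INTO program_report VALUES (?, ?, ?)", [
    ("F001", "parameter_a", 1.5),
    ("F001", "parameter_b", 0.2),
    ("F002", "parameter_a", 3.0),
])

# Link program data to facility locations with an ordinary inner join.
rows = conn.execute("""
SELECT f.name, f.latitude, f.longitude, r.parameter, r.value
FROM facility AS f
JOIN program_report AS r ON r.registry_id = f.registry_id
ORDER BY f.name, r.parameter
""").fetchall()

for row in rows:
    print(row)
```

The join works only because both tables live in the same database. That limitation is why this style of integration tends to stop at the Agency boundary.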
EcoInformatics – Geospatial Discussion
November 11, 2010 December 6, 2010
Opportunity
Linked Open Data approaches offer the opportunity to publish this facilities data framework so that it can support analysis across other agencies as well. Examples include linking facilities to:

- Occupational Safety and Health Administration (OSHA) or Mine Safety and Health Administration (MSHA) enforcement histories;
- offshore platforms, using Bureau of Ocean Energy Management, Regulation and Enforcement (BOEMRE) data;
- other cross-cutting, government-wide data, as more Linked Open Data assets become available.
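The cross-agency linking described above works when another publisher reuses the same facility URI. The sketch below shows the idea using plain Python tuples as triples. Every URI is a hypothetical example.org placeholder, and the "other agency" dataset is invented for illustration. Without blank nodes, merging two RDF graphs reduces to a set union of their triples.

```python
# Triples as (subject, predicate, object) tuples; all URIs are hypothetical.
FAC = "http://example.org/epa/facility/F001"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

epa_graph = {
    (FAC, RDFS_LABEL, "Acme Plant"),
    (FAC, "http://example.org/vocab/naicsCode", "212312"),
}

# A hypothetical dataset from another agency that points at the EPA facility URI.
INSPECTION = "http://example.org/other-agency/inspection/42"
other_agency_graph = {
    (INSPECTION, "http://example.org/vocab/inspectedFacility", FAC),
    (INSPECTION, "http://example.org/vocab/date", "2010-03-15"),
}


def describe(graph, node):
    """Return every triple that mentions node as subject or object, sorted."""
    return sorted(t for t in graph if t[0] == node or t[2] == node)


# With no blank nodes involved, merging the two graphs is just a set union.
merged = epa_graph | other_agency_graph
for s, p, o in describe(merged, FAC):
    print(s, p, o)
```

Neither publisher needs access to the other's database. The shared URI alone lets a consumer bring the facility and the inspection together.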
Initial Efforts
EPA is still in the planning stages. We have published some initial FRS data as RDF via Data.gov and are now working to iteratively refine our Linked Open Data (LOD) publishing approach. We are doing this through a “cookbook” that we hope to apply to a number of EPA datasets. The cookbook will establish a framework of consistent methodologies and approaches for publishing Linked Open Data agency-wide. Part of this effort will be to leverage existing agency investments in metadata, data dictionaries, terminologies, and ontologies to further contextualize agency data assets.
For FRS, we hope to contextualize the various facets of the data, e.g., the corporate/organizational entity, points of contact, activities, and other aspects.
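Publishing a facility and its facets as RDF could look roughly like the following N-Triples writer. This is a minimal sketch, not EPA's published format. The URI base, the vocabulary namespace, and the property names `operatedBy`, `pointOfContact`, and `naicsCode` are hypothetical; only `rdfs:label` is a standard term.

```python
from typing import Iterable

BASE = "http://example.org/frs/"          # hypothetical URI base
VOCAB = "http://example.org/frs/vocab#"   # hypothetical vocabulary namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    # Assumes value contains no characters that are illegal in an N-Triples IRI.
    return f"<{value}>"


def literal(text: str) -> str:
    # Escape the characters N-Triples does not allow raw in a quoted string literal.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def facility_to_ntriples(registry_id: str, name: str, organization_id: str,
                         contact: str, naics_codes: Iterable[str]) -> str:
    """Serialize one facility and its facets (organization, contact, activities)."""
    fac = iri(f"{BASE}facility/{registry_id}")
    lines = [
        f"{fac} {iri(RDFS_LABEL)} {literal(name)} .",
        # The operating organization gets its own IRI so other datasets can link to it.
        f"{fac} {iri(VOCAB + 'operatedBy')} {iri(BASE + 'organization/' + organization_id)} .",
        f"{fac} {iri(VOCAB + 'pointOfContact')} {literal(contact)} .",
    ]
    for code in sorted(naics_codes):
        lines.append(f"{fac} {iri(VOCAB + 'naicsCode')} {literal(code)} .")
    return "\n".join(lines)


print(facility_to_ntriples("F001", 'Acme "East" Plant', "ORG1", "J. Doe", ["331110"]))
```

The organization is given its own IRI rather than a text value. This is one way to make the corporate/organizational facet linkable in its own right, while the contact here stays a simple literal.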
Geospatial Enablement
There are multiple aspects to geo-enablement via Linked Open Data. One is how to represent features in a manner that works for mapping. This covers points, lines, polygons and their topologies, the associated coordinates, and metadata describing such things as coordinate reference systems and locational accuracy estimates.
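A point feature that keeps its coordinate reference system and accuracy metadata together, as described above, might be modeled like this. The class and field names are hypothetical. EPSG:4269 is the EPSG code for NAD83.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointFeature:
    """A facility location plus the metadata needed to interpret it (hypothetical model)."""
    latitude: float
    longitude: float
    crs: str                # coordinate reference system, e.g. "EPSG:4269" (NAD83)
    accuracy_m: float       # estimated horizontal accuracy, in meters
    collection_method: str  # free-text collection metadata

    def __post_init__(self):
        # Reject coordinates that cannot be valid geographic positions.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")
        if self.accuracy_m < 0:
            raise ValueError("accuracy estimate must be non-negative")

    def to_wkt(self) -> str:
        # Uses the common GIS convention x = longitude, y = latitude. Some CRS
        # definitions (EPSG:4269 among them) formally list latitude first, so the
        # axis order chosen for publication should be documented.
        return f"POINT ({self.longitude} {self.latitude})"


p = PointFeature(38.9, -77.03, "EPSG:4269", 25.0, "GPS")
print(p.to_wkt())
```

Validating in `__post_init__` means an out-of-range coordinate fails when the record is built, not later when it is mapped.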
For the geospatial feature component of FRS, we hope to look at current Open Geospatial Consortium (OGC) standards and efforts, such as the GeoSemantics SWG, as well as emerging GeoSPARQL efforts. We also hope to collaborate with the Spatial Ontology Community of Practice (SOCOP). We will need to determine the most effective means of representing features (for example, GeoRSS), along with current coordinate reference systems (e.g., NAD83), to support interoperability and geospatial analysis.
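Two candidate encodings for a facility point are GeoRSS-Simple and a GeoSPARQL-style WKT literal. The sketch below shows both. GeoSPARQL was still emerging at the time of this discussion, so the namespace and literal form used here follow the later OGC GeoSPARQL 1.0 standard and should be treated as an assumption for this context. Both encodings here assume WGS84-based coordinates, so NAD83 data would need to be transformed, or the datum difference explicitly accepted. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
GEO_WKT_LITERAL = "http://www.opengis.net/ont/geosparql#wktLiteral"
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Serialize the GeoRSS namespace with its conventional "georss" prefix.
ET.register_namespace("georss", GEORSS_NS)


def georss_point(lat: float, lon: float) -> str:
    """GeoRSS-Simple point: "lat lon" in WGS84, latitude first."""
    elem = ET.Element(f"{{{GEORSS_NS}}}point")
    elem.text = f"{lat} {lon}"
    return ET.tostring(elem, encoding="unicode")


def geosparql_point(lat: float, lon: float) -> str:
    """N-Triples object for a GeoSPARQL-style WKT literal in CRS84 (longitude first)."""
    return f'"<{CRS84}> POINT({lon} {lat})"^^<{GEO_WKT_LITERAL}>'


print(georss_point(38.9, -77.03))
print(geosparql_point(38.9, -77.03))
```

The two encodings put the axes in opposite orders: latitude first in GeoRSS-Simple, longitude first in CRS84. Axis order is easy to get wrong when converting between them.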
Another aspect is the geography of interest: relating the facility attribute ontology to an ontology of the surrounding terrain in order to provide context. For example, for a mining facility, can we relate the facility to other datasets such as geology, stratigraphy, and other mining-related data?
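At its simplest, relating a facility to surrounding geology means finding which mapped geologic units contain the facility's point. The sketch below uses a standard ray-casting point-in-polygon test. The unit names and polygons are hypothetical. It also treats longitude/latitude as planar coordinates, which is only an approximation; a production system would more likely rely on a spatial database or GIS library.

```python
from typing import Dict, List, Sequence, Tuple

Ring = Sequence[Tuple[float, float]]  # polygon boundary as (x, y) = (lon, lat) vertices


def point_in_polygon(x: float, y: float, ring: Ring) -> bool:
    """Ray-casting test: count edge crossings of a ray cast from (x, y) toward +x."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Consider only edges that straddle the horizontal line through y.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def units_containing(lon: float, lat: float, units: Dict[str, Ring]) -> List[str]:
    """Names of the (hypothetical) geologic units whose polygon contains the point."""
    return [name for name, ring in units.items() if point_in_polygon(lon, lat, ring)]


units = {
    "Unit A (hypothetical)": [(-78.0, 38.0), (-77.0, 38.0), (-77.0, 39.0), (-78.0, 39.0)],
    "Unit B (hypothetical)": [(-77.0, 38.0), (-76.0, 38.0), (-76.0, 39.0), (-77.0, 39.0)],
}
print(units_containing(-77.5, 38.5, units))
```

In a Linked Open Data setting, the result of such a test could be published as a link between the facility URI and the geologic unit URI. Other datasets could then follow that link without repeating the geometry calculation.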
These aspects may require some tuning of how we collect and model data. Most of our data has historically been program-specific. As a result, some of these subtler nuances are currently reachable only through imperfect derivation from attributes such as NAICS codes.
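The imperfect derivation mentioned above can be as simple as reading the NAICS subsector prefix. In NAICS, subsectors 211, 212, and 213 cover oil and gas extraction, mining (except oil and gas), and support activities for mining. The function name is hypothetical, and the approach is a sketch, not an FRS rule.

```python
from typing import Iterable, List

# NAICS subsectors within sector 21 (Mining, Quarrying, and Oil and Gas Extraction).
MINING_PREFIXES = {
    "211": "Oil and Gas Extraction",
    "212": "Mining (except Oil and Gas)",
    "213": "Support Activities for Mining",
}


def derive_mining_activity(naics_codes: Iterable[str]) -> List[str]:
    """Infer mining-related activities from a facility's NAICS codes.

    Imperfect by design: a code reflects what a reporting program recorded,
    not necessarily what happens on site, and facilities reported only with
    SIC codes (or with no code) are missed entirely.
    """
    found = set()
    for code in naics_codes:
        prefix = code.strip()[:3]
        if prefix in MINING_PREFIXES:
            found.add(MINING_PREFIXES[prefix])
    return sorted(found)


print(derive_mining_activity(["212312", " 211111", "331110"]))
```

The docstring spells out the blind spots on purpose. They are the kind of gap that better data collection and modeling would need to close.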
Next Steps
We hope to collaborate with our counterparts in other agencies on best practices and lessons learned. In the case of EPA’s Facility Registry System, there are direct, tangible, and implementable pieces that we can put into motion. There is also an opportunity to develop a more robust Linked Open Data approach, an effort that has already kicked off.