1. National Aeronautics and
Space Administration
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, California
An Automatic Approach to Building Earth Science Knowledge
Graph to Improve Data Discovery (ESKG)
Lewis J. McGibbney (JPL)
Yongyao Jiang (George Mason University)
Agenda/Objectives
This presentation aims to provide…
• an introduction to ESKG (for those who may have never heard of it before)
• an update on the ESKG testbed Project (for those who have…)
• a budget update
• future plans and community building efforts
Who is this presentation for?
• Put simply, anyone with an interest in the ESIP Labs project
• Semantic Technologists
• Data integration and discovery enthusiasts
What are the takeaways?
• Learn about the growing ESIP Semantic Technology stack
• Learn about the ESKG codebase - https://github.com/ESIPFed/eskg/
• Consider getting engaged in the process of building linked open data and
knowledge graph(s) for the Earth Sciences community.
ESIP Summer 2017
The ESKG Team
Introduction
What is ESKG?
ESKG [1] is an effort to revolutionize the way in which ESIP communities interact
with Earth Science (ES) data in the open world through the entity, spatial and temporal
linkages and characteristics that make it up. This project will advance ESIP
collaboration areas, including the Discovery, Semantic Technologies and possibly
Drone communities, by putting graph information right at our fingertips in an
interactive, modern manner and by reducing the effort required to construct ontologies.
ESKG will strengthen ties between observations and user communities by:
• developing a knowledge graph derived from heterogeneous sources via natural
language processing and knowledge extraction techniques, and
• allowing users to traverse, explore, query, reason and navigate ES data via
knowledge graph interaction.
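To make the idea of traversing a knowledge graph concrete, here is a minimal sketch. It assumes the graph is held as (subject, predicate, object) triples; the triples, the predicate names and the helper functions are hypothetical illustrations, not ESKG output or ESKG API.

```python
from collections import defaultdict, deque

# Hypothetical triples for illustration only; not taken from ESKG.
TRIPLES = [
    ("SeaSurfaceTemperature", "subClassOf", "OceanTemperature"),
    ("OceanTemperature", "subClassOf", "OceanProperty"),
    ("DatasetA", "measures", "SeaSurfaceTemperature"),
    ("DatasetA", "hasSpatialCoverage", "GlobalOcean"),
]


def build_index(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Breadth-first traversal: every node reachable from start, in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _predicate, obj in index.get(node, []):
            if obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen[1:]  # drop the start node itself


index = build_index(TRIPLES)
print(reachable(index, "DatasetA"))
```

Starting from a dataset node, the traversal reaches both the measured concept and its broader classes, which is the kind of exploration a graph interface would expose interactively.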
How is this achieved? What are the goals?
• Leverage progress made through recent and existing Semantic
Technologies Testbed projects [2], [3] to deliver the ESKG concept
originally focused on NASA JPL’s PO.DAAC [4].
• Provide the ESIP community with a semi/fully automated knowledge
representation methodology which will overcome current limitations
associated with manual ontology development approaches.
• Ensure that both of the above goals advance development of Semantic
Technologies and Discovery collaboration areas within the ESIP
Federation.
Rationale behind Ontology Integration and Knowledge
Graphs
• Ontology-based data integration involves the use of ontologies to effectively
combine data or information from multiple heterogeneous sources [5].
• Within the Earth Sciences there are several persistent efforts to improve semantics
and increase their uptake.
• Such efforts take numerous forms, e.g.
– Community level – ESIP Semantic Technology Committee [6]
– Infrastructure level – ESIP Semantic Technology Portal [2] and Community
Ontology Repository [3], more to come on this…
– Vocabulary level – SWEET [7], ENVO [8], etc.
The goal is that, by using such resources, we can better align search and
discovery functionality with the Earth Sciences communities it will
eventually serve.
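As a rough illustration of ontology-based integration, the sketch below groups records from different sources under one shared concept. The vocabulary mapping, source names and record identifiers are all hypothetical; a real deployment would draw its concepts from a vocabulary such as SWEET [7] rather than a hand-written dictionary.

```python
# Hypothetical mapping from source-specific keywords to a shared concept label.
SHARED_VOCABULARY = {
    "sst": "Sea Surface Temperature",
    "sea_surface_temperature": "Sea Surface Temperature",
    "ocean winds": "Ocean Winds",
}

# Hypothetical (source, record id, keyword) tuples from heterogeneous sources.
RECORDS = [
    ("source_a", "a-001", "SST"),
    ("source_b", "b-042", "sea_surface_temperature"),
    ("source_b", "b-043", "Ocean Winds"),
    ("source_c", "c-007", "salinity"),
]


def normalise(keyword):
    """Return the shared concept for a keyword, or None if it is unmapped."""
    return SHARED_VOCABULARY.get(keyword.strip().lower())


def integrate(records):
    """Group records from all sources under their shared concept."""
    grouped = {}
    for source, record_id, keyword in records:
        concept = normalise(keyword)
        if concept is None:
            continue  # unmapped keywords are skipped in this sketch
        grouped.setdefault(concept, []).append((source, record_id))
    return grouped


print(integrate(RECORDS))
```

Records that use different keywords for the same concept end up side by side, which is what lets a single search reach across sources.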
What kind of things does ESKG allow?
Currently…
1. Generation of high-quality, domain-specific (oceanographic) ontologies
2. Free and open use and availability of those ontologies to the wider Earth
Science community, e.g. persistence and archival within ESIP
infrastructure [2], [3]
3. Consumption of our own (and others’) semantic resources within the
NASA AIST (NNX15AM85G) MUDROD [9] project for semantic search,
search engine ranking and possibly visualization
   1. Semantic search – term expansion/auto completion/suggestions
   through use of synonyms, sub/super classes, term negation
   through antonyms, etc.
   2. Search engine ranking – using ontological relationships as a
   fundamental element within an overall MUDROD scoring metric.
   3. Visualization – through an ontology graph interface such as the
   one previously deployed on the old SWEET JPL website. This use
   case typically aligns with domain discovery.
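The semantic search and ranking use cases above can be sketched as follows. This is not MUDROD's scoring metric: the tiny ontology, the relation weights and the function names are assumptions chosen only to show how synonyms and sub/super classes could expand a query term and feed a score.

```python
# Hypothetical ontology fragment; terms and relations are illustrative only.
ONTOLOGY = {
    "sea surface temperature": {
        "synonyms": ["sst"],
        "subclasses": ["skin temperature"],
        "superclasses": ["ocean temperature"],
    },
}

# Hypothetical weights: closer relationships contribute a higher score.
WEIGHTS = {"exact": 1.0, "synonyms": 0.9, "subclasses": 0.7, "superclasses": 0.5}


def expand(term):
    """Map a query term to related terms, each with a relation weight."""
    term = term.lower()
    expanded = {term: WEIGHTS["exact"]}
    entry = ONTOLOGY.get(term, {})
    for relation in ("synonyms", "subclasses", "superclasses"):
        for related in entry.get(relation, []):
            expanded.setdefault(related, WEIGHTS[relation])
    return expanded


def score(document_keywords, expanded):
    """Highest weight of any expanded term found among the document keywords."""
    keywords = {keyword.lower() for keyword in document_keywords}
    return max((w for t, w in expanded.items() if t in keywords), default=0.0)


def rank(documents, term):
    """Return names of matching documents, best score first, ties by name."""
    expanded = expand(term)
    scored = [(score(keywords, expanded), name) for name, keywords in documents.items()]
    return [name for s, name in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]


# Hypothetical documents described by their keywords.
DOCUMENTS = {
    "doc1": ["SST", "infrared"],
    "doc2": ["Ocean Temperature"],
    "doc3": ["Sea Surface Temperature"],
    "doc4": ["wind speed"],
}

print(rank(DOCUMENTS, "Sea Surface Temperature"))
```

In a full ranking model a score like this would be only one input among several; here it stands alone to keep the example short.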
ESKG Architecture
[Figure: ESKG architecture diagram. Components shown: PODAACWebServicesClient, PODAACOntologyMapper, Storage Abstraction, MUDROD]
PO.DAAC Datasets Ontology (PDO)
• PO.DAAC offers web services APIs for programmatic access to a
variety of PO.DAAC data holdings [10].
• Data is available in a variety of serializations, e.g. HTML, ATOM/RSS,
GCMD, etc.
• All PO.DAAC metadata is currently structured according to GCMD-DIF
v9.8.2. The XSD can be found at [11]; an example PO.DAAC GCMD
metadata record can be found at [12] and at the link below.
https://podaac.jpl.nasa.gov/ws/metadata/dataset?datasetId=PODAAC-TELND-PGTX1&format=gcmd
• Complete end-to-end code for generating PDO and persisting it into
either the ESIP Semantic Portal or COR can be found at [13]. This is
essentially the canonical ESKG source.
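The steps above (request a GCMD record, then map its fields into ontology statements) can be sketched as below. This is not the ESKG code at [13]: the function names are hypothetical, the sample record is a hand-written, heavily simplified fragment, and its element names are assumptions modeled on GCMD DIF that should be checked against the XSD at [11]. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Endpoint taken from the example link in the slide.
BASE_URL = "https://podaac.jpl.nasa.gov/ws/metadata/dataset"

# Hand-written, simplified record for illustration; real records are
# namespaced and contain many more elements.
SAMPLE_RECORD = """<DIF>
  <Entry_ID>EXAMPLE-DATASET-1</Entry_ID>
  <Entry_Title>Example dataset title</Entry_Title>
  <Parameters>
    <Topic>OCEANS</Topic>
    <Term>OCEAN TEMPERATURE</Term>
  </Parameters>
</DIF>"""


def metadata_url(dataset_id, fmt="gcmd"):
    """Build the metadata request URL for one dataset."""
    return BASE_URL + "?" + urlencode({"datasetId": dataset_id, "format": fmt})


def local_name(tag):
    """Strip an XML namespace prefix such as '{uri}Entry_ID' if present."""
    return tag.rsplit("}", 1)[-1]


def record_to_triples(xml_text):
    """Turn each leaf element into a (dataset id, field name, value) triple."""
    root = ET.fromstring(xml_text)
    leaves = [
        (local_name(element.tag), element.text.strip())
        for element in root.iter()
        if len(element) == 0 and element.text and element.text.strip()
    ]
    subject = next(value for name, value in leaves if name == "Entry_ID")
    return [(subject, name, value) for name, value in leaves if name != "Entry_ID"]


print(metadata_url("PODAAC-TELND-PGTX1"))
for triple in record_to_triples(SAMPLE_RECORD):
    print(triple)
```

Using the dataset identifier as the subject of every statement keeps all facts about one dataset linked together, whichever storage backend the statements are later persisted to.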
Project Update
A full project update as of 30th June 2017 can be found at [14]. This provides:
• An updated budget showing project spending to date as well as projected project
spend accommodating specific events (such as proposed conference/workshop
attendance), as explained within the text.
• Alignment of projected conference/workshop attendance with deliverables outlined
within a previously submitted Memorandum of Understanding (MOU), as attached
in Appendix I.
In order to successfully execute the ESKG Testbed project, we originally requested
the full available budget of $7,000. In the project proposal we also stated the following:
• the budget would be equally distributed over the project duration,
• no costs would be directed towards travel or ESIP meeting attendance (e.g.
travel to meetings, accommodation and meeting registration) as such costs had
already been obtained by the proposed participating collaborators, and finally
• a proportionally small budget ‘may’ however be directed towards additional
community/breakout events at ESIP meetings, and preparatory work such as
generation of poster presentation(s), and transition of the ESKG project to an
appropriate long-term home.
Project Budgeting
Future Work
• Integration of NOAA (and hopefully USGS) data sources
• Community building event attendance at the International Conference
on Biomedical Ontology 2017, September 13th - 17th, 2017, in
Newcastle, England [15]. In particular, ESKG (and therefore ESIP) will
be present at the ONTOEDIT 2017 Workshop [16], aimed at
addressing challenges in the design, authoring and publication of
ontologies.
• Make a formal release of ESKG to Maven Central [17], enabling the
ESKG Java client to be used as a dependency in other projects.
• Present ESKG at the AGU Fall Meeting, New Orleans, December 11th – 15th, 2017,
in session IN028: Enabling Interoperability, Interdisciplinary Use, and
Stewardship of Scientific Data through Knowledge Representation
Frameworks [18]
Acknowledgements
• ESKG was funded as an ESIP Testbed Project; acknowledgement recognized.
No salaries were paid to either Lewis John McGibbney or
Yongyao Jiang.
• Lewis John McGibbney’s participation at ESIP is funded by the National
Aeronautics and Space Administration; acknowledgement recognized.
• Thanks go to the ESIP Testbed Committee/Review Panel.
Please contact us at [19] with any questions.
References
1. http://bit.ly/2jFjEDB
2. http://semanticportal.esipfed.org
3. http://cor.esipfed.org/
4. https://podaac.jpl.nasa.gov
5. H. Wache; T. Vögele; U. Visser; H. Stuckenschmidt; G. Schuster; H. Neumann; S. Hübner (2001). Ontology-Based Integration of Information - A Survey of Existing Approaches. CiteSeerX 10.1.1.142.4390
6. http://wiki.esipfed.org/index.php/Semantic_Technologies
7. https://github.com/ESIPFed/sweet
8. https://github.com/EnvironmentOntology/envo/
9. https://mudrod.github.io/
10. https://podaac.jpl.nasa.gov/ws
11. http://gcmd.nasa.gov/Aboutus/xml/dif/dif_v9.8.2.xsd
12. https://podaac.jpl.nasa.gov/ws/metadata/dataset?datasetId=PODAAC-TELND-PGTX1&format=gcmd
13. https://github.com/ESIPFed/eskg
14. http://bit.ly/2vdaHre
15. https://conferences.ncl.ac.uk/icbo17/
16. https://conferences.ncl.ac.uk/icbo17/workshops/
17. http://search.maven.org/
18. https://agu.confex.com/agu/fm17/preliminaryview.cgi/Session23620
19. https://github.com/ESIPFed/eskg/#community