Metadata synchronisation with GeoNetwork - a user's perspective: making metadata great again.
Presented at the ANDS facilitated GeoNetwork Community of Practice on April 3rd, 2017 in Canberra.
2. What we needed to do:
• Programmatically synchronise metadata between catalogue
and files (both directions)
• Automagically create large numbers of metadata records
populated with values drawn from diverse sources
(e.g. in-house databases, spreadsheets, text files, floppy
disks, papyrus scrolls, stone tablets etc.)
• Update spatial information in metadata record from dataset
• Update online distribution linkage metadata from authorised
distributions (e.g. THREDDS at NCI)
• Demonstrate practical querying of the catalogue for real-time
operational usage
NO REPLICATION WITHOUT SYNCHRONISATION!
GeoNetwork Users’ Group
3. Issues which needed to be resolved
• The task was initially given as file format translation only, with no
mention of metadata. I was a metadata noob who had to ask lots
of stupid questions
• Data is hosted externally to GA at the NCI
• No “hard” linkages existed between datasets and their metadata
records. Needed to find records using “soft” keys like title words
and filenames.
• Some datasets have existing records, some new ones don’t.
• GA’s eCat GeoNetwork catalogue implementation went live in
April, 2016, creating an immediate and pressing support backlog.
• GA’s heroic and knowledgeable eCat gurus (Andy, Marty, Belle &
Aaron) were (are?) oversubscribed.
4. Issues which needed to be resolved (Continued)
(You know you’re in trouble when the issues run to two slides)
• GA is an early adopter of ISO19115-3
• CSW API for updating data is relatively cumbersome and
requires identity management and authentication.
GeoNetwork API needed to change status of edited records.
Too damn hard.
• Centralised identity management & authentication is still a
work in progress at GA
• Many metadata update operations are relatively complex and
best handled with direct manipulation of the metadata XML in
Python scripts
• CSW querying is user-unfriendly
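As a rough illustration of the "direct manipulation of the metadata XML in Python scripts" point above, here is a minimal sketch using only the standard library. The record layout, element names and URLs are hypothetical and heavily simplified; real ISO 19115-3 records are namespaced and much more deeply nested, so this shows the general approach rather than the scripts actually used at GA.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record. Element names and URLs are assumptions
# made for illustration only; they are not ISO 19115-3 element names.
RECORD_XML = """<record>
  <uuid>c85d7857-f031-4b16-9917-e8b732e3e950</uuid>
  <distribution>
    <online protocol="WMS"><linkage>http://old.example.org/wms/grid.nc</linkage></online>
    <online protocol="file"><linkage>/old/path/grid.nc</linkage></online>
  </distribution>
</record>"""


def update_linkage(xml_text, protocol, new_url):
    """Return the record XML with the linkage for one protocol replaced."""
    root = ET.fromstring(xml_text)
    for online in root.iter("online"):
        # Only touch the distribution entry for the requested protocol
        if online.get("protocol") == protocol:
            online.find("linkage").text = new_url
    return ET.tostring(root, encoding="unicode")


updated = update_linkage(RECORD_XML, "WMS", "http://new.example.org/wms/grid.nc")
```

The edited XML can then be validated and bulk-ingested, which avoids per-record update calls through an authenticated API.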
6. Overview of Metadata Synchronisation Workflow
1. Convert datasets to standards-compliant netCDF-CF with
ACDD metadata attributes
2. Generate unique identifiers (UUID, eCat ID, DOI) for each
dataset and write these into the files
3. Create new XML metadata records using values drawn from
dataset and other specified sources, and bulk-ingest them
into eCat
4. Populate ACDD metadata attributes in netCDF files from
values in eCat records
5. Generate updated XML metadata records with updated
extents and valid URLs for online distributions, and bulk-
ingest them into eCat
6. Repeat steps 4 and 5 as required
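Step 2 of the workflow above might look something like the following sketch. The function name `assign_identifiers` is hypothetical; the attribute names (`uuid`, `ecat_id`, `doi`) follow the sample netCDF header shown in this deck. Writing the attributes into the file itself would need a netCDF library and is not shown.

```python
import uuid


def assign_identifiers(attributes, ecat_id=None, doi=None):
    """Return a copy of a dataset's global attributes with identifiers set.

    An existing UUID is never replaced, so re-running the step does not
    break the link between a dataset and its metadata record.
    """
    updated = dict(attributes)
    updated.setdefault("uuid", str(uuid.uuid4()))
    if ecat_id is not None:
        updated["ecat_id"] = ecat_id
    if doi is not None:
        updated["doi"] = doi
    return updated


# First pass mints a UUID; a second pass leaves it untouched.
first = assign_identifiers({"title": "Example grid"}, ecat_id=105000)
second = assign_identifiers(first)
```

Making the step idempotent is a design choice that suits step 6 of the workflow, where the same datasets are processed repeatedly.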
7. Key factors which make it work
• UUID is written into dataset in order to establish a “hard” link
to the associated metadata record
• Bulk ingestion of new or updated records into eCat becomes
a simple, semi-automatic operation. If it validates, it’s good.
• The complete, unfiltered, internally-visible metadata record
must be accessed for updating (as opposed to the filtered
externally-visible one)
• Workflow is modular and tuneable, and adaptable to different
collections. Already successfully applied to geophysics,
bathymetry and elevation datasets.
• Simple tools have been provided to users to easily leverage
CSW queries (csw_find)
8. Sample auto-created netCDF header with ACDD attributes
// global attributes:
:GDAL = "GDAL 1.11.1, released 2014/09/24" ;
:survey_id = "409" ;
:ecat_id = 105000 ;
:geospatial_lon_min = 148.206 ;
:geospatial_lon_resolution = 0.00399999999999068 ;
:geospatial_lat_max = -35.31 ;
:geospatial_bounds_crs = "GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]" ;
:geospatial_lat_min = -36.022 ;
:geospatial_lat_resolution = 0.00399999999999778 ;
:geospatial_lat_units = "degrees_north" ;
:geospatial_lon_units = "degrees_east" ;
:geospatial_bounds = "POLYGON((148.4480 -36.0202, 148.2227 -35.9990, 148.2173 -35.9931, 148.2086 -35.3141, 148.2154 -35.3099, 148.5179 -35.4566, 148.5221 -35.4634, 148.5228 -36.0172, 148.5172 -36.0228, 148.4480 -36.0202))" ;
:geospatial_lon_max = 148.522 ;
:uuid = "c85d7857-f031-4b16-9917-e8b732e3e950" ;
:title = "Total Magnetic Intensity (TMI) grid of Canberra-Wagga Wagga, ACT/NSW, 1973/74 survey" ;
:source = "This mNSW0409.nc grid includes airborne-derived TMI data for the Canberra-Wagga Wagga, ACT/NSW, 1973/74 survey acquired for
the geological survey of ACT, NSW" ;
:summary = "Total magnetic intensity (TMI) data measures variations in the intensity of the Earth magnetic field caused by the
contrasting content of rock-forming minerals in the Earth crust. Magnetic anomalies can be either positive (field stronger than normal) or negative
(field weaker) depending on the susceptibility of the rock. The data are processed via standard methods to ensure the response recorded is that due
only to the rocks in the ground. The results produce datasets that can be interpreted to reveal the geological structure of the sub-surface. The
processed data is checked for quality by GA geophysicists to ensure that the final data released by GA are fit-for-purpose. This magnetic grid has a
cell size of 0.004 degrees (approximately 400m). The data used to produce this grid was acquired in 1975 by the ACT, NSW Government, and consisted of
24461 line-kilometres of data at 1500m line spacing and 150m terrain clearance." ;
:product_version = "Version 2.0, April 2015" ;
:history = "This mNSW0409.nc grid is an airborne-derived Total Magnetic Intensity (TMI) grid for the Canberra-Wagga Wagga, ACT/NSW,
1973/74 survey. The survey was acquired under the project No. 409 for the geological survey of ACT, NSW. The grid has a cell size of 0.004 degrees
(approximately 400m). A total of 24461 line-kilometres of data at a line spacing of 1500m were acquired to produce this grid. To constrain long
wavelengths in the grid, an independent data set, the Australia-wide Airborne Geophysical Survey (AWAGS) airborne magnetic data, was used to control
the base levels of the survey grid (Milligan et al., 2009). This survey grid is essentially levelled to AWAGS. Details of the specifications of
individual airborne surveys can be found in the Fourteenth Edition of the Index of Airborne Geophysical Surveys (Percival, 2014). This Index is also
available online at http://www.ga.gov.au/metadata-gateway/metadata/record/gcat_f3ad4f15-96bc-0cf3-e044-00144fdd4fa6/Index+of+airborne+geophysical+surveys%3A+14th+edition.
Further up to date information about individual surveys can also be obtained
online from the Airborne Surveys Database at http://www.ga.gov.au/oracle/argus/. The original grid was converted from ERMapper (.ers) format to
netCDF4_classic format using GDAL1.11.1. The main purpose of this conversion is to enable access to the data by relevant open source tools and
software. The netCDF grid was created on 2016-08-29T10:51:42 and has its y-axis indexed Southward-positive. References: Milligan, P.R., Minty, B.R.S.,
Richardson, M. & Franklin, R., 2009. The Australia-wide Airborne Geophysical Survey accurate continental magnetic coverage. Preview, No. 138, p. 1-128.
Percival, P.J., 2014. Index of airborne geophysical surveys (Fourteenth Edition)." ;
:institution = "Commonwealth of Australia (Geoscience Australia)" ;
:keywords = "TMI, magnetics, NCI, AU, Magnetism and Palaeomagnetism, Airborne Digital Data, Geophysical Survey, grid, 409" ;
:license = "Creative Commons Attribution 4.0 International Licence" ;
:time_coverage_start = "1973-04-09" ;
:time_coverage_end = "1975-02-07" ;
:doi = "http://dx.doi.org/10.4225/25/589c553856e0e" ;
:metadata_link = "https://pid.nci.org.au/dataset/c85d7857-f031-4b16-9917-e8b732e3e950" ;
:Conventions = "CF-1.6, ACDD-1.3" ;
:date_created = "2013-05-03T00:00:00" ;
:date_modified = "2016-08-29T10:51:42" ;
}
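The `geospatial_bounds` attribute in the header above is a WKT polygon, and a metadata record typically also needs a simple bounding box. A minimal sketch of deriving one is below. It assumes a single-ring `POLYGON` with longitude-latitude coordinate order, as in the sample, and the function name `wkt_polygon_bbox` is hypothetical. A production script would more likely use a geometry library.

```python
import re


def wkt_polygon_bbox(wkt):
    """Return (min_lon, min_lat, max_lon, max_lat) for a single-ring WKT POLYGON."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\((.*)\)\)\s*", wkt, flags=re.DOTALL)
    if match is None:
        raise ValueError("Expected a single-ring WKT POLYGON")
    # Each vertex is "lon lat"; vertices are separated by commas
    points = [tuple(float(v) for v in pair.split())
              for pair in match.group(1).split(",")]
    lons = [p[0] for p in points]
    lats = [p[1] for p in points]
    return min(lons), min(lats), max(lons), max(lats)


# Polygon copied from the sample header
bounds = ("POLYGON((148.4480 -36.0202, 148.2227 -35.9990, 148.2173 -35.9931, "
          "148.2086 -35.3141, 148.2154 -35.3099, 148.5179 -35.4566, "
          "148.5221 -35.4634, 148.5228 -36.0172, 148.5172 -36.0228, "
          "148.4480 -36.0202))")
bbox = wkt_polygon_bbox(bounds)  # (148.2086, -36.0228, 148.5228, -35.3099)
```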
9. csw_find examples – Human-friendly Metadata searches
Find all NCI filenames & titles for potassium grids overlapping a geographic bounding box
$ csw_find -k NCI,grid,potassium -b 148.996,-35.48,149.399,-35.124
"FILE:GEO" "/g/data1/rr2/National_Coverages/radmap_v3_2015_unfiltered_pctk/radmap_v3_2015_unfiltered_pctk.nc" "radmap v3 2015 unfiltered pct potassium grid"
"FILE:GEO" "/g/data1/rr2/National_Coverages/radmap_v3_2015_filtered_pctk/radmap_v3_2015_filtered_pctk.nc" "radmap v3 2015 filtered pct potassium grid"
"FILE:GEO" "/g/data2/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rNSW1218k/rNSW1218k.nc" "Radiometric Potassium grid of Southeast Lachlan, NSW, 2010 survey"
"FILE:GEO" "/g/data2/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rNSW0756k/rNSW0756k.nc" "Radiometric Potassium grid of NSW DMR, Discovery 2000, Area S, Braidwood, NSW 2001 survey"
Find all WMS endpoints for all NCI data from survey ID 850
$ csw_find -k NCI,grid,850 -p WMS -f url
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/thorium/rSA0850_A6t/rSA0850_A6t.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rSA0850_A1k/rSA0850_A1k.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rSA0850_A6k/rSA0850_A6k.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/uranium/rSA0850_A6u/rSA0850_A6u.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/uranium/rSA0850_A1u/rSA0850_A1u.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/potassium/rSA0850_851K/rSA0850_851K.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A6/mSA0850A6.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/thorium/rSA0850_A1t/rSA0850_A1t.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A5/mSA0850A5.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A4/mSA0850A4.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/uranium/rSA0850_851u/rSA0850_851u.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A7/mSA0850A7.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/rad_survey_grids_levelled/thorium/rSA0850_851t/rSA0850_851t.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850A1/mSA0850A1.nc"
"http://dapds00.nci.org.au/thredds/wms/uc0/rr2_dev/rcb547/AWAGS_Levelled_Grids/mag_survey_grids_levelled/mSA0850_851/mSA0850_851.nc"
csw_find defaults to hitting GA’s externally-visible GeoNetwork CSW, but can be pointed to any
CSW URL
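For comparison with the one-line commands above, here is a rough sketch of what a raw keyword-only CSW 2.0.2 GetRecords request looks like as a KVP URL. It does not show how csw_find is implemented. The endpoint URL and the function name `build_getrecords_url` are hypothetical, and the `AnyText` queryable and `%` wildcard are assumptions about what a given server accepts. Bounding-box filtering is left out.

```python
from urllib.parse import urlencode


def build_getrecords_url(csw_url, keywords, max_records=10):
    """Build a CSW 2.0.2 GetRecords URL matching all of the given keywords."""
    # One LIKE clause per keyword, ANDed together (CQL text constraint).
    # Single quotes are doubled so a keyword cannot terminate the literal.
    constraint = " AND ".join(
        "AnyText LIKE '%{}%'".format(keyword.replace("'", "''"))
        for keyword in keywords
    )
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "full",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": constraint,
        "maxRecords": str(max_records),
    }
    return csw_url + "?" + urlencode(params)


# Hypothetical endpoint, for illustration only
url = build_getrecords_url("https://catalogue.example.org/csw",
                           ["NCI", "grid", "potassium"])
```

The response is an XML document that still has to be parsed for file paths or service URLs, which is the part a wrapper tool hides from the user.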
10. Still to-do
• Integrate tools into completely automated workflows (e.g.
folder watchers, regular nightly/weekly synchronisation, etc.)
• Support non-netCDF file formats (partially done)
• Better implement MD5 checksums in metadata against data
files for change detection (currently only in description)
• Eliminate remaining manual steps in bulk ingestion/updating
of metadata records
• Demonstrate large-scale automated discovery and processing
system
• Implement Linked Data solutions to express relationships
between entities (e.g. Datasets<->Surveys)
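The MD5 change-detection item above could be sketched as follows. The function names are hypothetical, and where the recorded checksum lives in the metadata record is left open, as it is in the list above.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as data_file:
        for chunk in iter(lambda: data_file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_md5):
    """True if the file no longer matches the checksum held in its metadata."""
    return md5_of_file(path) != recorded_md5.strip().lower()
```

A nightly synchronisation job could call `has_changed` for each dataset and regenerate metadata only for files whose checksum differs.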
11. Presented by Alex Ip
data@ga.gov.au
Thank you!
Questions?