The document discusses using the SpatioTemporal Asset Catalog (STAC) to catalog geospatial datasets. STAC defines JSON schemas to encode metadata about spatiotemporal data like remote sensing imagery. This allows datasets like the European Space Agency's Sentinel-2 satellite data, containing petabytes of images, to be more easily searched. The STAC API also defines standards for searching and discovering STAC metadata. Tools like PySTAC and pystac-client make it easier to work with STAC catalogs and APIs in Python. Open questions remain about how best to represent multi-dimensional datasets, such as those stored as Zarr, in STAC.
2021 Dask Summit - Using STAC to catalog SpatioTemporal datasets
1. Using STAC to catalog SpatioTemporal datasets
Rob Emanuele
Geospatial Architect, Microsoft AI For Earth
Member, STAC Project Steering Committee
2. Motivation: Sentinel 2
• Multispectral imaging satellites
run by the European Space
Agency (ESA)
• Openly licensed data
• Over 15 million individual
captures with tens of file assets
each for a single product –
Petabytes of data!
• How can we find the images
we need?
4. SpatioTemporal Asset Catalog (STAC)
• Defines JSON schemas for encoding metadata
about spatiotemporal data (remote sensing
imagery, point cloud, weather data…)
• Core definitions, with core and community
extensions
• v1.0.0-rc.4; final 1.0 release coming soon!
5. STAC API
• Defines OpenAPI schemas for searching and
discovering STAC metadata
• Aligns with and extends the OGC API - Features
specification
• Includes a search endpoint for spatiotemporal and
attribute queries within or across Collections
• v1.0.0-beta.1, working towards a 1.0 release after the
final STAC 1.0 release
6. STAC – Core Types
Catalog
General grouping of
Catalogs, Collections and
Items
Item
Represents a discrete set of
data assets as a GeoJSON
Feature
Collection
A specifically grouped set of
Catalogs, Collections, and
Items, along with additional
metadata
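To make the three core types concrete, here is a minimal sketch of each as a plain Python dictionary that serializes to STAC JSON. The ids, hrefs, coordinates, dates, and license value are hypothetical placeholders, and only a small set of fields is shown; the specification remains the authority on what is required.

```python
import json

STAC_VERSION = "1.0.0-rc.4"  # the version named in these slides

# Item: a GeoJSON Feature describing one discrete set of data assets.
item = {
    "type": "Feature",
    "stac_version": STAC_VERSION,
    "id": "example-scene",  # hypothetical id
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-105.3, 39.9], [-105.1, 39.9], [-105.1, 40.1],
            [-105.3, 40.1], [-105.3, 39.9],
        ]],
    },
    "bbox": [-105.3, 39.9, -105.1, 40.1],
    "properties": {"datetime": "2021-05-19T17:00:00Z"},
    "assets": {
        "B04": {  # hypothetical asset key and href
            "href": "https://example.com/example-scene/B04.tif",
            "type": "image/tiff; application=geotiff",
            "roles": ["data"],
        }
    },
    "collection": "example-collection",
    "links": [{"rel": "collection", "href": "../collection.json"}],
}

# Collection: a grouping with additional metadata (license, extent, ...).
collection = {
    "type": "Collection",
    "stac_version": STAC_VERSION,
    "id": "example-collection",
    "description": "A hypothetical collection holding one scene.",
    "license": "proprietary",  # placeholder value
    "extent": {
        "spatial": {"bbox": [[-105.3, 39.9, -105.1, 40.1]]},
        "temporal": {"interval": [["2021-05-19T17:00:00Z", None]]},
    },
    "links": [{"rel": "item", "href": "./example-scene/example-scene.json"}],
}

# Catalog: a general grouping that links to its children.
catalog = {
    "type": "Catalog",
    "stac_version": STAC_VERSION,
    "id": "example-catalog",
    "description": "A hypothetical root catalog.",
    "links": [{"rel": "child", "href": "./example-collection/collection.json"}],
}


def link_rels(stac_object):
    """Return the relation types of all links on a STAC object."""
    return [link["rel"] for link in stac_object["links"]]


for obj in (catalog, collection, item):
    print(obj["type"], obj["id"], link_rels(obj))

print(json.dumps(catalog, indent=2))
```

The links are what turn separate JSON documents into a browsable tree: the Catalog points to its child Collection, which points to its Items.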
9. STAC API
• Landing Page is a
Catalog
• Collections have Items
which can be searched
by space, time and other
properties
• Aligns with the OGC API -
Features specification
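As an illustration of searching by space and time, the sketch below builds (but does not send) a POST request for a search endpoint using only the standard library. The API root `https://example.com/stac` and the collection id are hypothetical; the body fields `collections`, `bbox`, `datetime`, and `limit` follow the STAC API item search parameters as I understand them, so check them against the API version you target.

```python
import json
from urllib.request import Request


def build_search_request(api_root, collections, bbox, start, end, limit=10):
    """Build a POST request for the /search endpoint without sending it."""
    body = {
        "collections": collections,    # restrict the search to these Collections
        "bbox": bbox,                  # [west, south, east, north]
        "datetime": f"{start}/{end}",  # closed time interval
        "limit": limit,                # page size
    }
    return Request(
        url=f"{api_root.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "https://example.com/stac",        # hypothetical API root
    ["sentinel-2-l2a"],                # hypothetical collection id
    [-105.3, 39.9, -105.1, 40.1],
    "2021-05-01T00:00:00Z",
    "2021-05-31T23:59:59Z",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` against a real STAC API would return a GeoJSON FeatureCollection of matching Items. In practice pystac-client wraps this exchange, including paging.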
10. STAC – Extensions
• Allows the definition of reusable
properties and other changes that any
STAC object can implement
• STAC is most useful with well-defined
data format, dataset, and domain-
specific extensions
• Community extensions defined at
https://stac-extensions.github.io/
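A small sketch of how an extension shows up in practice: an Item lists the extension's schema in `stac_extensions` and carries the extension's prefixed fields in `properties`. The example uses the `eo:cloud_cover` field from the Electro-Optical (EO) extension; the items, ids, and values are made up for illustration, and the helper function is hypothetical.

```python
EO_SCHEMA = "https://stac-extensions.github.io/eo/v1.0.0/schema.json"

# Hypothetical, heavily trimmed Items; real Items carry many more fields.
items = [
    {
        "id": "scene-a",
        "stac_extensions": [EO_SCHEMA],
        "properties": {"datetime": "2021-05-01T10:00:00Z", "eo:cloud_cover": 4.2},
    },
    {
        "id": "scene-b",
        "stac_extensions": [EO_SCHEMA],
        "properties": {"datetime": "2021-05-06T10:00:00Z", "eo:cloud_cover": 71.0},
    },
    {
        "id": "scene-c",  # does not declare the EO extension
        "stac_extensions": [],
        "properties": {"datetime": "2021-05-11T10:00:00Z"},
    },
]


def select_clear_items(items, max_cloud_cover):
    """Return ids of Items that declare the EO extension and whose
    eo:cloud_cover is at or below max_cloud_cover (in percent)."""
    selected = []
    for item in items:
        if EO_SCHEMA not in item.get("stac_extensions", []):
            continue  # the field is only meaningful if the extension is declared
        cover = item["properties"].get("eo:cloud_cover")
        if cover is not None and cover <= max_cloud_cover:
            selected.append(item["id"])
    return selected


print(select_clear_items(items, max_cloud_cover=10))
```

Because the field name and its meaning are fixed by the extension, the same filter works across any catalog that implements it.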
12. PySTAC
• Implements core types in a Pythonic interface
• Includes common extensions, with patterns for writing
custom extensions
• Validates STAC
• Convenient for creating new STACs
• Core dependency of other STAC Python tooling
https://github.com/stac-utils/pystac
15. Open Question:
Representing Zarr et al
The datacube extension adds
support for describing dimensional
information for n-dimensional
datasets like Zarr or HDF5. This is
a work in progress.
How best to represent Zarr
datasets in STAC is an unsolved
problem – e.g. are they
Collection-only, or can they be
represented as Items?
16. STAC + Dask = Awesome
With the metadata supplied by a
STAC API query, we can lazily
construct a DataArray from many
files without having to read them
and know how that data lines up
in space and time.
See Tom’s talk tomorrow for more
details!
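The idea can be sketched with `dask` and `xarray` directly, assuming both (and `numpy`) are installed. Shapes, dtypes, and timestamps come from hypothetical item metadata, and `read_asset` is a stand-in for a real raster reader, so no file is opened until the array is computed. This is an illustration of the principle, not the approach of any particular library.

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr

# Hypothetical metadata, as it might be distilled from STAC API search results.
items = [
    {"id": "scene-a", "datetime": "2021-05-01T10:00:00", "href": "a.tif", "shape": (4, 4)},
    {"id": "scene-b", "datetime": "2021-05-06T10:00:00", "href": "b.tif", "shape": (4, 4)},
    {"id": "scene-c", "datetime": "2021-05-11T10:00:00", "href": "c.tif", "shape": (4, 4)},
]

read_calls = []  # records which files were actually "read"


def read_asset(href, shape):
    """Stand-in for a real reader; returns zeros instead of opening a file."""
    read_calls.append(href)
    return np.zeros(shape, dtype="uint16")


# One lazy dask array per item; shape and dtype come from metadata alone.
lazy_scenes = [
    da.from_delayed(
        dask.delayed(read_asset)(it["href"], it["shape"]),
        shape=it["shape"],
        dtype="uint16",
    )
    for it in items
]

# Stack along a new time axis and label it with the items' datetimes.
times = np.array([it["datetime"] for it in items], dtype="datetime64[ns]")
array = xr.DataArray(
    da.stack(lazy_scenes),
    dims=("time", "y", "x"),
    coords={"time": times},
)

print(array.shape, "files read so far:", len(read_calls))
result = array.compute()  # only now is read_asset called
print("files read after compute:", len(read_calls))
```

The DataArray already has its final shape and time coordinate before any data is touched, which is what lets Dask schedule the reads later and in parallel.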
17. Learn More & Collaborate
stacspec.org
GitHub
Gitter channel