Roelof Pieters (Overstory) – Tackling Forest Fires and Deforestation with Satellite Data and AI
1. Roelof Pieters
CTO & Co-founder
roelof@overstory.ai
@graphific
Tackling Forest Fires and Deforestation with Satellite
Data and AI
May 14, 2021
2. • Computer Science/Deep Learning
• Anthropology and Development Sociology
• Previously founded Creative.ai, Stockholm.ai, GitXiv and other open source initiatives/networks/startups
Introduction
Roelof Pieters
CTO & Co-founder, Overstory
This session
• Advancements in satellite data & AI
• Example: deforestation monitoring
• Example: preventing wildfires and power outages
3. We’re Overstory
Indra den Bakker, CEO & Co-founder
Anniek Schouten, COO & Co-founder
David de Meij, Data Scientist
Roelof Pieters, CTO & Co-founder
Rochelle Silva, Radar & GIS Researcher
Lorenzo Riches, Data Scientist
Arsha Yuditha Amiranti, GIS Specialist
Elvira Garkava, Business Developer
Fiona Spruill, Chief Product Officer
Andrea Giardini, DevOps Engineer
Killian Tobin, Head of Business Development
7. Satellite market
[Chart: number of satellites, 2010 through today to 2029]
1,470 satellites launched between 2010 and 2019
8,500 satellites expected to launch between 2019 and 2028*
Hyperspectral, VHR, SAR, video, real-time
Source: Euroconsult, https://spacenews.com/analysis-are-smallsats-entering-the-maturity-stage/
11. Input data: frequent revisits, up to daily on a global scale
[Images dated July 2, 2017; July 4, 2017; July 5, 2017]
12. Input data: frequent revisits, up to daily on a global scale
[Image comparison: 2.5m vs. 30cm spatial resolution]
Source: https://platform.digitalglobe.com/earth-imaging-basics-spatial-resolution/
13. Very high-resolution satellite data
Tri-stereo and video monitoring for 3D mapping
SAR data to look through clouds
Multi- and hyperspectral data
Up to 30 cm resolution
Up to 20 VHR daily revisits & geostationary satellites
14. Data Science Challenges
● Very large imagery
● Constant change
● Noisy satellite data
● Noisy or missing labels
● Generalizing from training data
● “Ground truth”
● Classical machine learning still the norm
15. Unsupervised Machine Learning
[Figure: unlabeled clusters, "A? B? C?"]
Allows Overstory to get insights anywhere in the world with 1-2 orders of magnitude less customer/labeled data (1,000 instead of 10K-100K data points)
"Everything is related to everything else, but near things are more related than distant things" (Tobler's first law of geography)
16. Stereo Imaging
To create height maps and 3D maps, we use stereo imaging techniques to produce Digital Surface Models (DSM) and Digital Terrain Models (DTM), as well as 3D point clouds, as in this example
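One common use of the two height products is to subtract them: the DSM includes trees and buildings, the DTM is bare ground, so the difference approximates object height above ground. The sketch below is an illustration under assumptions, not Overstory's pipeline: the function name `height_above_ground`, the list-of-lists raster layout, and the sample values are hypothetical.

```python
def height_above_ground(dsm, dtm):
    """Per-pixel height above ground: DSM minus DTM, clamped at zero.

    dsm and dtm are 2D grids (lists of rows) of elevations in metres on the
    same pixel grid. A cell set to None is treated as missing data.
    """
    if len(dsm) != len(dtm):
        raise ValueError("DSM and DTM must have the same number of rows")
    result = []
    for surface_row, terrain_row in zip(dsm, dtm):
        if len(surface_row) != len(terrain_row):
            raise ValueError("DSM and DTM rows must have the same length")
        row = []
        for surface, terrain in zip(surface_row, terrain_row):
            if surface is None or terrain is None:
                row.append(None)  # keep gaps as gaps
            else:
                # Small negative differences are noise; clamp them to zero.
                row.append(max(0.0, surface - terrain))
        result.append(row)
    return result


# Hypothetical 2x2 example: one tall tree, one shrub, two ground pixels.
dsm = [[12.0, 30.5], [11.0, 10.8]]
dtm = [[10.0, 10.5], [11.0, 11.0]]
heights = height_above_ground(dsm, dtm)
```

In vegetation mapping this difference is often called a canopy height model; tall pixels near a power line are the ones a fire-risk analysis would care about.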
18. Generative Upscaling
A process by which freely accessible low-resolution imagery can be upscaled to commercial-grade high resolution, allowing for more accurate insights, easier labelling by our annotators, and increased accuracy for our machine learning models, at a lower cost
(Upscaling Landsat-8 to DigitalGlobe WorldView-3-like level)
Landsat-8 (LS8): TIRS 100m; Cirrus, SWIR, NIR, RGB 30m; Panchromatic 15m
DigitalGlobe WorldView-3 (DG-WV3): SWIR 3.7m; RGB 1.24m; Panchromatic 31cm
Li et al. (2019), "Feedback Network for Image Super-Resolution"
See also Super-Resolution Generative Adversarial Network(s) (many papers)
19. Active Learning
[Diagram: loop of (re)train, candidate selection, oracle / human annotator; figure credit: ElementAI]
● e.g. BAyesian Active Learning library (BaaL) by ElementAI: https://github.com/ElementAI/baal/
○ MCDropout (Gal et al. 2015)
○ BALD (Houlsby et al. 2011)
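The candidate-selection step of the loop can be sketched with the two ideas cited on this slide: run several stochastic forward passes per sample (as MC Dropout does), then rank samples by BALD, the entropy of the mean prediction minus the mean entropy of the individual predictions. This is a plain-Python illustration, not BaaL's API; the function names, the `pool` dictionary, and the probability values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bald_score(mc_probs):
    """BALD score from T stochastic forward passes for one sample.

    mc_probs is a list of T class-probability vectors. The score is high
    when the passes are individually confident but disagree with each other.
    """
    passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / passes for c in range(n_classes)]
    return entropy(mean) - sum(entropy(p) for p in mc_probs) / passes


def select_for_labelling(pool, k):
    """Return the k sample ids with the highest BALD score."""
    ranked = sorted(pool, key=lambda sid: bald_score(pool[sid]), reverse=True)
    return ranked[:k]


# Hypothetical unlabeled pool: three tiles, three forward passes each.
pool = {
    "tile_a": [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],      # confident, passes agree
    "tile_b": [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],      # unsure, but passes agree
    "tile_c": [[0.95, 0.05], [0.05, 0.95], [0.9, 0.1]],  # passes disagree
}
picked = select_for_labelling(pool, k=1)
```

Note that `tile_b` is uncertain yet scores near zero: BALD targets disagreement between passes, which is the kind of uncertainty that more labels can reduce. The selected tiles would then go to the oracle / human annotator before retraining.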
22. Deforestation monitoring
What:
● high-resolution forest and land cover map
● up to date, covering 2014 to now
● high forest and crop type accuracy
How:
● deep learning for segmentation
● multiple satellite sources: sensor fusion / multimodality
● noisy data: generative gap filling
● dynamic data regime: open data, active labelling (ground and satellite), noisy labels
23. Land Cover Segmentation on pixel level
[Diagram]
Layers of satellite imagery: climate & weather, height data, SAR (radar) data, multi-resolution data, multispectral data
Advancements in deep learning: sensor fusion, Convolutional Neural Networks, unsupervised learning and active (Bayesian) learning
Output: segmentation
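Segmentation on pixel level means every pixel receives its own class label. Assuming a model has already produced a probability per class for each pixel, the final step can be sketched as a per-pixel argmax. The class names, the nested-list layout, and the numbers below are hypothetical and only illustrate that last step, not the network itself.

```python
# Hypothetical land cover classes, in the order the model outputs them.
CLASSES = ["forest", "cropland", "water"]


def label_pixels(probabilities):
    """Turn per-pixel class probabilities into a per-pixel label map.

    probabilities[y][x] is a list with one probability per entry in CLASSES.
    Each pixel gets the class with the highest probability.
    """
    label_map = []
    for row in probabilities:
        labels = []
        for pixel in row:
            best = max(range(len(pixel)), key=pixel.__getitem__)
            labels.append(CLASSES[best])
        label_map.append(labels)
    return label_map


# Toy 2x2 image with made-up model outputs.
probabilities = [
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
    [[0.2, 0.2, 0.6], [0.5, 0.4, 0.1]],
]
label_map = label_pixels(probabilities)
```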
24. Active Learning (Overstory training data pipeline)
[Diagram, steps 1 to 5 with branches 4a and 4b: Encoder, Neural Planet Embedding, Data repository, Annotator, Field worker, Public Data]
33. Fire risk monitoring
What:
● very high-resolution tree species map: 37 different tree species, shrub species and grass classes
● over 16,000 km of power lines with a corridor width of 150 meters
● total area of over 505,000 km²
● hard constraints on minimum level of accuracy
How:
● deep learning, naturally :)
● open source labels, customer labels, external party ground validation
● very high-resolution satellite imagery (50cm)
● massively parallel distributed processing through Dask, Kubernetes, and distributed data-parallel training
35. Our Infrastructure - JupyterHub
Getting a new notebook
[Diagram, steps 1 to 6: user asks "I need a new notebook!"; "Processing..."; "We need a new machine for this..."; new node; notebook available]
36. Dask
Dask provides advanced parallelism for
analytics, enabling performance at scale
for the tools you love
Our Infrastructure
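The kind of parallelism described here can be pictured as splitting a large raster into tiles, processing each tile independently, and combining the partial results. The sketch below is an illustration only: it uses the standard library's `concurrent.futures` thread pool as a stand-in rather than Dask itself, and the tile size, the toy grid, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(grid, tile_size):
    """Cut a 2D grid (list of rows) into square tiles of at most tile_size."""
    tiles = []
    for top in range(0, len(grid), tile_size):
        for left in range(0, len(grid[0]), tile_size):
            tiles.append(
                [row[left:left + tile_size] for row in grid[top:top + tile_size]]
            )
    return tiles


def count_tree_pixels(tile):
    """Per-tile work: count pixels labelled 1 (tree) in one tile."""
    return sum(1 for row in tile for value in row if value == 1)


def total_tree_pixels(grid, tile_size=2, workers=4):
    """Map the per-tile function over all tiles in parallel, then reduce."""
    tiles = split_into_tiles(grid, tile_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_tree_pixels, tiles))


# Toy 4x4 label raster: 1 = tree, 0 = anything else.
grid = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
]
total = total_tree_pixels(grid)
```

With a Dask cluster the map step would run on remote workers instead of local threads, which is what lets the same pattern scale to imagery that does not fit on one machine.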
37. Our Infrastructure - Dask
I need some heavy compute
[Diagram, steps 1 to 6: user asks "I need a new Dask cluster!"; Dask-gateway; Dask-scheduler and Dask-workers; cluster available]
47. Overstory is on a mission to monitor all
natural resources on Earth in real-time.
48. We’re in this together!
Come join us!
https://www.overstory.com/careers
49. Learning Resources / Research
● List of videos about geospatial data science from FOSS4G (Free and Open Source Software for Geospatial / OSGeo): https://www.youtube.com/channel/UC_2Lyc9VUX-jC-E1prJitHw/videos
● ICLR 2020 proceedings now available: https://iclr.cc/virtual_2020/ & video recordings of the Climate Change AI workshop: https://www.youtube.com/channel/UCyjDr_aoMlzhSvCTdT7eZ9g/videos
● CVPR pre-papers for EarthVision: Large Scale Computer Vision for Remote Sensing Imagery Workshop: http://openaccess.thecvf.com/CVPR2020_workshops/CVPR2020_w11.py
Awesome lists of resources:
● https://github.com/sshuair/awesome-gis
● https://github.com/robmarkcole/satellite-image-deep-learning
● https://github.com/chrieke/awesome-satellite-imagery-datasets
● https://github.com/acgeospatial/awesome-earthobservation-code
● https://github.com/wenhwu/awesome-remote-sensing-change-detection
Amazing projects
● https://www.globalforestwatch.org/
● https://trase.earth/
● https://www.half-earthproject.org/
● https://www.climatewatchdata.org/
Geospatial Toolkit/UI:
● QGIS: https://www.qgis.org
Satellite Data
● Sentinel-2 10m resolution satellite imagery: https://scihub.copernicus.eu/dhus/#/home
● Landsat 30m resolution satellite imagery: https://landsat.gsfc.nasa.gov/
ML for geospatial/satellite data libraries:
● https://github.com/sentinel-hub/eo-learn
● https://rastervision.io/
● https://github.com/fastai/fastai2/
Resources
50. Rolnick, et al. Tackling Climate Change with Machine Learning, arXiv:1906.05433 & https://www.climatechange.ai/