The document discusses mapping biomass in the Amazon forest with Amazon analytics services. Biomass maps are traditionally built from satellite data and LiDAR flights, a computationally intensive process. The author wants to generate thousands of maps with random forest algorithms to capture uncertainty, but notes that their traditional methods cannot scale to that level, so they are exploring Amazon analytics services to produce the number of maps needed.
19. Amazon S3 – Source of Truth, Multiple Clusters
[Diagram]
Amazon S3: source of truth
Amazon EMR, interactive Spark cluster: HDFS; EC2 instance memory; intermediates stored on local disk or HDFS (local intermediate HDFS/storage)
Amazon EMR, transient ETL job: HDFS; EC2 instance memory; intermediates stored on local disk or HDFS (local intermediate HDFS/storage)
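The transient ETL pattern on this slide can be sketched as a request for the EMR `RunJobFlow` API in which both input and output live in Amazon S3 and the cluster shuts down once its only step finishes. This is a minimal sketch, not the presenter's setup: the bucket, script path, instance types, release label, and role names are illustrative assumptions.

```python
def transient_etl_request(script_uri, input_uri, output_uri):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All concrete values below (name, release label, instance types,
    role names) are assumptions chosen for illustration.
    """
    return {
        "Name": "transient-etl",
        "ReleaseLabel": "emr-5.36.0",  # assumption: any Spark-capable release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            # Transient cluster: terminate when no steps remain.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "etl",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                # Input and output are S3 URIs, so nothing durable
                # is left in cluster-local HDFS.
                "Args": ["spark-submit", script_uri, input_uri, output_uri],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical bucket and paths.
request = transient_etl_request(
    "s3://example-bucket/jobs/etl.py",
    "s3://example-bucket/raw/",
    "s3://example-bucket/curated/",
)
# To launch (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Because the durable copy of the data stays in S3, an interactive Spark cluster can read the same prefixes while this job runs, and losing either cluster loses only local intermediates.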
20. External Metadata Management
[Diagram]
Amazon S3: source of truth
Amazon EMR: interactive Spark cluster (HDFS) and transient ETL job (HDFS)
External metadata: describes data in S3
Customers have options: MySQL DB instance or Glue Data Catalog
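The two metadata options named on this slide map onto EMR configuration classifications. The sketch below builds both as plain Python data that could be passed as `Configurations` when creating a cluster. The property names follow the EMR documentation as far as I know; the JDBC URL, user name, and password are hypothetical placeholders.

```python
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore."
    "AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configuration():
    """Point Spark SQL on EMR at the Glue Data Catalog."""
    return [{
        "Classification": "spark-hive-site",
        "Properties": {
            "hive.metastore.client.factory.class": GLUE_FACTORY,
        },
    }]


def mysql_metastore_configuration(jdbc_url, user, password):
    """Point Hive on EMR at an external MySQL metastore database."""
    return [{
        "Classification": "hive-site",
        "Properties": {
            "javax.jdo.option.ConnectionURL": jdbc_url,
            "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
            "javax.jdo.option.ConnectionUserName": user,
            "javax.jdo.option.ConnectionPassword": password,
        },
    }]


glue_conf = glue_catalog_configuration()
# Hypothetical endpoint and credentials, for illustration only.
mysql_conf = mysql_metastore_configuration(
    "jdbc:mysql://metastore.example.internal:3306/hive?createDatabaseIfNotExist=true",
    "hive_user",
    "example-password",
)
```

Either way, table definitions live outside any single cluster, so the interactive cluster and the transient ETL job see the same description of the data in S3.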