OpenTopography - Scalable Services for Geosciences Data

OpenTopography Facility

The OpenTopography facility was funded by the National Science Foundation (NSF) in 2009 to provide efficient online access to Earth science-oriented high-resolution lidar topography data, online processing tools, and derivative products.

www.opentopography.org
@opentopography
info@opentopography.org
DOI / OGC CSW
DATA USAGE ANALYTICS
HPC & CLOUD INTEGRATION
CYBERINFRASTRUCTURE
Spatiotemporal variations in data access show that some regions of a dataset are "cold" while others are "hot". OpenTopography (OT) collects analytics that include users' data selections over time. We have developed tools to mine and visualize this information, and are exploring how to use these analytics to develop storage optimizations based on data value and cost.
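A minimal sketch of the kind of mining described above follows. It assumes each logged selection is an axis-aligned bounding box in projected coordinates (meters). It also assumes the dataset is divided into fixed-size square tiles. The tile grid and the function name `tile_hits` are hypothetical, not OT's actual tooling. Aggregating selections into per-tile hit counts exposes the hot and cold regions of a dataset.

```python
from collections import Counter
from math import floor
from typing import Iterable, Tuple

# A user selection: (xmin, ymin, xmax, ymax) in projected meters (assumed).
BBox = Tuple[float, float, float, float]


def tile_hits(selections: Iterable[BBox], tile_size: float) -> Counter:
    """Count how many selections touch each (col, row) tile.

    The tile grid is hypothetical: tile_size x tile_size squares anchored
    at the coordinate origin.
    """
    hits: Counter = Counter()
    for xmin, ymin, xmax, ymax in selections:
        # Range of tile indices overlapped by this bounding box.
        col0, col1 = floor(xmin / tile_size), floor(xmax / tile_size)
        row0, row1 = floor(ymin / tile_size), floor(ymax / tile_size)
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                hits[(col, row)] += 1
    return hits


if __name__ == "__main__":
    log = [
        (0, 0, 1500, 900),         # spans tiles (0, 0) and (1, 0)
        (200, 100, 400, 300),      # inside tile (0, 0)
        (5200, 5100, 5300, 5200),  # a rarely selected corner, tile (5, 5)
    ]
    for tile, n in tile_hits(log, tile_size=1000).most_common():
        print(tile, n)
```

A real analysis would also bucket the hits by time window so that shifts in demand are visible.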
The hottest data require fast I/O and scalable access. For these data, storage on SSDs accessible through HPC systems such as Gordon is desirable. "Cooler" data that are accessed less frequently can be kept on cheaper (and slower) storage, such as the cloud, to lower the facility's operating costs. A tiered storage system could dynamically manage data placement, and the associated system performance, based on actual usage analytics.
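One way to turn per-region access counts into placement decisions is a simple rank-based policy. This sketch is hypothetical. The three tier names, the default fractions, and ranking by raw hit count are illustrative assumptions, not OT's policy. The most-requested regions go to SSD-backed HPC storage, the next band goes to ordinary disk, and the rest goes to cheaper cloud storage.

```python
from typing import Dict, Hashable, Mapping


def assign_tiers(
    hits: Mapping[Hashable, int],
    hot_frac: float = 0.1,
    warm_frac: float = 0.3,
) -> Dict[Hashable, str]:
    """Map each region to a storage tier by its rank in access frequency.

    The top hot_frac of regions go to "ssd" (e.g., flash on an HPC system),
    the next warm_frac go to "disk", and the remainder go to "cloud".
    """
    if not 0 <= hot_frac <= 1 or not 0 <= warm_frac <= 1 - hot_frac:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    # Sort by descending hit count; break ties by key text for determinism.
    ranked = sorted(hits, key=lambda k: (-hits[k], str(k)))
    n_hot = round(len(ranked) * hot_frac)
    n_warm = round(len(ranked) * warm_frac)
    tiers: Dict[Hashable, str] = {}
    for i, key in enumerate(ranked):
        if i < n_hot:
            tiers[key] = "ssd"
        elif i < n_hot + n_warm:
            tiers[key] = "disk"
        else:
            tiers[key] = "cloud"
    return tiers


if __name__ == "__main__":
    hits = {"a": 50, "b": 3, "c": 0, "d": 12}
    print(assign_tiers(hits, hot_frac=0.25, warm_frac=0.25))
    # {'a': 'ssd', 'd': 'disk', 'b': 'cloud', 'c': 'cloud'}
```

A production policy would likely weight recent accesses more heavily and factor in per-tier cost, as the text suggests. The rank cutoffs here only illustrate the mechanism.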
For topographic data, earthquakes, floods, landslides, and other geophysical events are likely to increase demand for data that intersect the spatial extent of the event. External feeds (e.g., from the USGS National Earthquake Information Center, NEIC) could be monitored to proactively move data into high-performance storage in anticipation of that demand.
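The event-driven idea can be sketched as a footprint intersection test. The sketch makes several assumptions:

- An event's latitude, longitude, and a radius of influence have already been extracted from a feed. Parsing the NEIC feed is not shown.
- Dataset footprints are lon/lat bounding boxes.
- The circle is approximated by a box, which is crude and ignores the antimeridian.

The dataset IDs and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Dataset footprint: (min_lon, min_lat, max_lon, max_lat) in degrees (assumed).
Footprint = Tuple[float, float, float, float]

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def event_bbox(lat: float, lon: float, radius_km: float) -> Footprint:
    """Approximate a circular event extent by a lon/lat bounding box."""
    dlat = radius_km / KM_PER_DEG_LAT
    # Degrees of longitude shrink with cos(latitude); guard near the poles.
    dlon = radius_km / (KM_PER_DEG_LAT * max(math.cos(math.radians(lat)), 1e-6))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def intersects(a: Footprint, b: Footprint) -> bool:
    """True if two axis-aligned boxes overlap (antimeridian not handled)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def datasets_to_promote(
    event: Tuple[float, float, float], footprints: Dict[str, Footprint]
) -> List[str]:
    """Return IDs of datasets whose footprint intersects the event extent.

    event is (lat, lon, radius_km); the radius is an assumed input.
    """
    box = event_bbox(*event)
    return sorted(ds for ds, fp in footprints.items() if intersects(box, fp))


if __name__ == "__main__":
    footprints = {  # hypothetical dataset footprints
        "carrizo": (-120.0, 35.0, -119.5, 35.5),
        "yellowstone": (-111.2, 44.1, -109.8, 45.1),
    }
    print(datasets_to_promote((35.3, -119.9, 50.0), footprints))
```

The returned IDs would then be handed to whatever process migrates data into the fast tier.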
Activity-based data ranking and tiered cloud & HPC integrated storage
1. On-demand job execution on Gordon (XSEDE HPC Resource)
OT has a dedicated XSEDE allocation on Gordon: one I/O node with 48 GB of memory and 4.8 TB of flash memory, plus 16 compute nodes (256 cores) with 64 GB of memory, connected by a QDR InfiniBand interconnect.
Performance tests on a DEM generation use case showed 20x job speed-ups when four concurrent jobs were executed on Gordon versus OT's standard compute cluster (test case: 208 million LIDAR returns gridded to a 20 cm grid).
USE CASE: TauDEM hydrologic analysis of DEMs
TauDEM is an open-source hydrologic analysis toolkit developed by David Tarboton (USU), http://www.engineering.usu.edu/dtarb/. As part of OT's CyberGIS collaboration, we implemented TauDEM (MPI) on Gordon. We dynamically scale the number of cores allocated to each job as a function of the size of the input DEM.
2. Integration of cloud-based on-demand geospatial processing services
OT received a Microsoft Azure for Research Award ($40k in Azure resources) to explore integrating cloud resources into our existing infrastructure. A prototype OT image on Azure VM Depot allows us (or others) to quickly deploy the OT software stack on an appropriately sized resource. Data can be pulled from OT's storage on the SDSC Cloud for processing in Azure.
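For the TauDEM jobs on Gordon, scaling cores with DEM size can be sketched as a simple heuristic. It sizes the job by cell count, rounds up to whole 16-core nodes (256 cores across 16 nodes, as stated above), and caps at the allocation. The `CELLS_PER_CORE` constant is a hypothetical tuning value, not OT's. The `pitremove -z ... -fel ...` invocation follows TauDEM's documented command-line usage, but check the executable name and flags against your TauDEM install.

```python
import math
from typing import List

CORES_PER_NODE = 16         # 256 cores across 16 Gordon compute nodes
MAX_CORES = 256             # size of the allocation described above
CELLS_PER_CORE = 4_000_000  # hypothetical tuning constant, not OT's value


def cores_for_dem(rows: int, cols: int,
                  cells_per_core: int = CELLS_PER_CORE) -> int:
    """Pick an MPI process count that grows with DEM size.

    Rounds up to whole nodes and caps at the allocation size.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("DEM dimensions must be positive")
    wanted = math.ceil(rows * cols / cells_per_core)
    nodes = math.ceil(wanted / CORES_PER_NODE)
    return min(nodes * CORES_PER_NODE, MAX_CORES)


def pitremove_command(dem: str, filled: str, rows: int, cols: int) -> List[str]:
    """Build an mpiexec command line for TauDEM's pit-removal step."""
    n = cores_for_dem(rows, cols)
    return ["mpiexec", "-n", str(n), "pitremove", "-z", dem, "-fel", filled]


if __name__ == "__main__":
    for shape in [(1_000, 1_000), (20_000, 20_000), (100_000, 100_000)]:
        print(shape, cores_for_dem(*shape))
    print(" ".join(pitremove_command("dem.tif", "dem_fel.tif", 20_000, 20_000)))
```

A linear cells-per-core rule is the simplest choice. In practice, the speed-up curve of each TauDEM tool would determine where adding cores stops paying off.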
The OpenTopography cyberinfrastructure uses a highly scalable, multi-tier service-oriented architecture (SOA) that allows the infrastructure tier and its algorithms to be upgraded without updating the APIs or clients. The SOA has enabled compute-intensive algorithms, such as the TauDEM hydrology suite running on the Gordon XSEDE resource, to be offered as services to the OpenTopography user community. The pluggable services architecture allows researchers to integrate their own algorithms into the OpenTopography processing workflow. OpenTopography also interoperates with other cyberinfrastructure (CI) systems, such as the NSF-funded CyberGIS viewshed analysis application and NASA SSARA.
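The pluggable-services idea can be illustrated with a name-based registry. Clients call one stable entry point, and an algorithm can be registered or upgraded behind it without any change to the client. This is a hypothetical in-process sketch: `register`, `run_service`, and `dem_stats` are invented for illustration. A real SOA would expose these through its service APIs rather than through direct function calls.

```python
from typing import Any, Callable, Dict

Params = Dict[str, Any]
Service = Callable[[Params], Params]

# Registry mapping a public service name to its current implementation.
_SERVICES: Dict[str, Service] = {}


def register(name: str) -> Callable[[Service], Service]:
    """Decorator that plugs an algorithm into the registry under `name`."""
    def wrap(func: Service) -> Service:
        _SERVICES[name] = func  # re-registering replaces the implementation
        return func
    return wrap


def run_service(name: str, params: Params) -> Params:
    """Stable entry point that clients call; dispatches by service name."""
    try:
        service = _SERVICES[name]
    except KeyError:
        raise ValueError(f"unknown service: {name}") from None
    return service(params)


@register("dem_stats")
def dem_stats_v1(params: Params) -> Params:
    cells = params["cells"]
    return {"min": min(cells), "max": max(cells)}


if __name__ == "__main__":
    print(run_service("dem_stats", {"cells": [3.0, 1.0, 2.0]}))

    # Upgrade the algorithm under the same name; the client call is unchanged.
    @register("dem_stats")
    def dem_stats_v2(params: Params) -> Params:
        cells = params["cells"]
        return {"min": min(cells), "max": max(cells),
                "mean": sum(cells) / len(cells)}

    print(run_service("dem_stats", {"cells": [3.0, 1.0, 2.0]}))
```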
OpenTopography implements an OGC Catalogue Service for the Web (CSW) using the ISO 19115 metadata standard, which can be federated with other environments, e.g., NSF EarthCube and Thomson Reuters Web of Science. All datasets served via OpenTopography are assigned a DOI, which provides a persistent identifier for the dataset.
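A CSW client typically issues a GetRecords request. The sketch below builds one with the key-value (GET) encoding from OGC CSW 2.0.2, asking for ISO 19115/19139 records via the `gmd` output schema. It also resolves a DOI through the doi.org proxy. OT's actual CSW endpoint is not given here, so `https://example.org/csw` is a placeholder. Support for `typeNames`, `namespace`, and CQL constraints varies between CSW servers.

```python
from urllib.parse import urlencode

GMD_NS = "http://www.isotc211.org/2005/gmd"


def csw_getrecords_url(endpoint: str, text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords request (KVP/GET) returning ISO records."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "gmd:MD_Metadata",
        "namespace": f"xmlns(gmd={GMD_NS})",
        "outputSchema": GMD_NS,
        "elementSetName": "brief",
        "resultType": "results",
        "maxRecords": str(max_records),
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Full-text search; single quotes are doubled, SQL-style.
        "constraint": "AnyText LIKE '%{}%'".format(text.replace("'", "''")),
    }
    return f"{endpoint}?{urlencode(params)}"


def doi_url(doi: str) -> str:
    """Resolve a DOI through the doi.org proxy."""
    return "https://doi.org/" + doi


if __name__ == "__main__":
    print(csw_getrecords_url("https://example.org/csw", "lidar"))
    print(doi_url("10.1234/example"))  # hypothetical DOI
```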
The cover of Science featured a 0.25 m digital elevation model (DEM) and hillshade of offset channels along the San Andreas Fault in the Carrizo Plain, produced by OpenTopography.
Currently, OpenTopography serves 183 high-resolution LIDAR (Light Detection and Ranging) point cloud datasets to a growing user community. Together they contain over 820 billion returns covering approximately 179,153 km² of important geologic features such as the San Andreas Fault, Yellowstone, the Tetons, and Yosemite National Park. Information collected from over 42,250 custom point cloud jobs, which have processed upwards of 1.4 trillion LIDAR returns, and from over 19,800 custom raster data jobs is being analyzed. The goals are to prioritize future development based on usage insights and to identify novel approaches to managing the exponential growth in data.
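For a sense of scale, the totals above imply an average return density and an average job size. This is back-of-the-envelope arithmetic on quoted figures that are themselves lower bounds or approximations. The density also averages over datasets with very different point densities, so both results are only indicative.

```latex
\[
\bar{\rho} \approx \frac{820 \times 10^{9}\ \text{returns}}{179{,}153\ \text{km}^2}
= \frac{820 \times 10^{9}\ \text{returns}}{1.79153 \times 10^{11}\ \text{m}^2}
\approx 4.6\ \text{returns/m}^2
\]
\[
\bar{n}_{\text{job}} \approx \frac{1.4 \times 10^{12}\ \text{returns}}{42{,}250\ \text{jobs}}
\approx 3.3 \times 10^{7}\ \text{returns per point cloud job}
\]
```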
Collaboration Opportunities
Analysis of user behavior and data usage for optimizing data location in deep storage/memory hierarchies
Pluggable services framework: tracking software provenance / framework security
New data types: full-waveform LIDAR, hyperspectral imagery data
New processing algorithms: change detection, difference analysis, and time series analysis; algorithm optimizations/parallelization
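Of the proposed algorithms, DEM differencing is the simplest form of change detection. The sketch below makes these assumptions:

- The two DEMs are already co-registered on the same grid as NumPy arrays.
- They share a finite nodata sentinel.
- The user chooses a level-of-detection threshold below which change is treated as noise.

The function names and the threshold handling are illustrative, not an OT service.

```python
import numpy as np


def dem_difference(before: np.ndarray, after: np.ndarray,
                   nodata: float, lod: float) -> np.ndarray:
    """Return after - before, masking nodata cells and sub-threshold change.

    Cells where either DEM is nodata become NaN; changes with magnitude
    below the level of detection `lod` (same units as the DEMs) become 0.
    Both DEMs must already share the same grid (extent, resolution, CRS).
    """
    if before.shape != after.shape:
        raise ValueError("DEMs must be co-registered on the same grid")
    diff = after.astype(float) - before.astype(float)
    valid = (before != nodata) & (after != nodata)
    diff[~valid] = np.nan
    # Treat change smaller than the detection threshold as no change.
    diff[valid & (np.abs(diff) < lod)] = 0.0
    return diff


def net_volume_change(diff: np.ndarray, cell_size: float) -> float:
    """Net volume change (elevation units x cell area), ignoring NaN cells."""
    return float(np.nansum(diff)) * cell_size ** 2


if __name__ == "__main__":
    before = np.array([[10.0, 10.0], [10.0, -9999.0]])
    after = np.array([[10.05, 12.0], [9.0, 10.0]])
    d = dem_difference(before, after, nodata=-9999.0, lod=0.1)
    print(d)
    print(net_volume_change(d, cell_size=0.5))
```

A uniform threshold is the simplest option. Spatially varying uncertainty would require a per-cell `lod` array, which NumPy broadcasting would accept in the same comparison.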