SlideShare a Scribd company logo
1 of 36
Towards Enabling Mid-Scale Geo-Science Experiments Through Microsoft Trident and Windows Azure EranChinthakaWithana Beth Plale
Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 2
Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 3
Geo-Science Applications High Resource Requirements Compute intensive, dedicated HPC hardware e.g. Weather Research and Forecasting (WRF) Model Emergence of ensemble applications Large amount of small jobs e.g.  Examining each air layer, over a long period of time.  Single experiment = About 14000 jobs each taking few minutes to complete 4
Geo-Science Applications: Challenges Compute intensive applications Mid-scale scientists often scramble to find sufficient computational resources to test and run their codes Software requirements and platform dependence MPI, Cygwin (if windows), Linux only binaries Management of large job executions  Fault tolerance Reliability* of Grid computing resources and middleware Utilizing different compute resources *Marru S, Perera S, Feller M, Martin S. Reliable and Scalable Job Submission: LEAD Science Gateways Testing and Experiences with WS GRAM on TeraGrid Resources . TeraGrid Conference June 2008 5
Geo-Science Applications: Opportunities Cloud computing resources On-demand access to “unlimited” resources Flexibility Worker roles and VM roles Recent porting of geo-science applications WRF, WRF Preprocessing System (WPS) port to Windows Increased use of ensemble applications (large number of small runs) Production quality, opensource scientific workflow systems Microsoft Trident 6
Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 7
Research Vision Enabling geo-science experiments  Type of applications Compute intensive, ensembles Type of scientists Meteorologists, atmospheric scientists, emergency management personnel, geologists Utilizing both Cloud computing and Grid computing resources Utilizing opensource, production quality scientific workflow environments Improved data and meta-data management Geo-Science Applications Scientific Workflows Compute Resources 8
Existing Approaches Condor Features Enables creation of resource pools from grid and cloud computing resources Limitations On-demand resource allocation and management Ease of integration with workflow environments GridWay, SAGA, Falcon Limitations tightly integrated with complex middleware to address a broad range of problems GRAM Features Coordinates job submissions to Grid computing resources Limitations Scalability and reliability issues Ease of installation and maintenance CARMEN project Features Concentrates on building a cloud environment for neuroscientists Provide data sharing and analysis capabilities	 Encapsulates tools as WS-I compliant web services  Dynamic deployments using Dynasoar Limitations Strict application requirements Ability to support wide variety of compute resources 9
Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 10
Design Decisions Decoupled architecture with low turnaround time Web services interfaces for interactions Ease of integrating with workflow engines and tools Ability to support multiple job description languages e.g. JSDL and RSL Flexibility to support various security protocols transport level security and WS-Security Extensibility to support a range of compute resources Should support grid and cloud resources Should be able to schedule and monitor jobs Robust management of scientific jobs Experiences with GRAM2 and GRAM4 Ease of installation and maintenance  11
Proposed Framework Azure Blob Store Azure  Management API Sigiri Job Mgmt.Daemons Trident Activity Azure Fabric Web Service Job Queue Azure VM, Worker & Web Roles 12
Proposed Framework Sigiri – Abstraction for grids and clouds Web service Decouples job acceptance from execution and monitoring Daemons Manages compute resource interactions Job submissions and monitoring Cleaning up resources Efficient allocation of resources Template based approach for cloud computing resources 13
Performance Evaluation 14
Proposed Framework Trident Activity Activities compose a workflow Activity wraps a task / application Input parameter collection and validation Request composition Invocation and monitoring Framework activities Interacts with Sigiri to schedule jobs and monitor the progress Data movement to / from cloud storage (Windows Blob Store or Amazon S3) Visualization  15
Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 16
Weather Research and Forecast Model (WRF) Mesoscale numerical weather prediction system  Designed to serve both operational forecasting and atmospheric research needs. A software architecture allowing for computational parallelism and system extensibility 17
Background: LEAD II and Vortex2 Experiment May 1, 2010 to June 15, 2010 ~6 weeks, 7-days per week Workflow started on the hour every hour each morning.  Had to find and bind to latest model data (i.e., RUC 13km and ADAS data) to set initial and boundary conditions.   If model data was not available at NCEP and University of Oklahoma, workflow could not begin. Execution of complete WRF stack within 1 hour 18
The Trident Vortex2 Workflow: Timeline Bulk of time (50 min) spent in Lead Workflow Proxy Activity 19 Sigiri Integration
Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 20
Moving WRF Stack to Windows Azure  Opportunities Reliability of grid computing resources WRF and WRF Preprocessing System (WPS) ported to Windows Enable midscale scientists to exploit the capabilities of WRF Concerns Porting of WRF to Azure to run on multiple nodes Strict software requirements and the choice between worker and VM Roles Need of MPI, Cygwin Restricted to single virtual machine 21
Enabling WRF Stack on Azure Sigiri Job Mgmt.Daemons Web Service Job Queue Azure Blob Store Azure  Management API Trident Activity Azure Fabric Azure VM Roles 22
Enabling WRF Stack on Azure Sigiri – Microsoft Azure Daemon Maintains applications to virtual machine mappings Can use an external service as well Interacts with Windows Azure API to Deploy and start hosted services Handle Azure security credentials  Maintain Azure VM pools Monitor job executions Sigiri – Microsoft Azure Service Deployed inside virtual machines Accepts job submission requests from Microsoft Azure daemon Launches jobs and monitors them Enables status queries 23
Working with Azure VM Roles All the related applications are installed on a virtual machine using Hyper-V Virtual machine image sizes are limited 35GB to enable a wide variety of instance types Custom virtual machines (VHD files) are uploaded and stored in Azure Blob Store Managed by azure command line tools Custom hosted service deployments are needed to start VM roles with custom VM images Dynamic configuration of service descriptors to support service requirements Configuration of  VHD files, number of instances, certificate associations Hosted services are deployed and started on-demand using Azure management API Sigiri Azure daemon  manages the interactions with Azure management API Manages the life cycle of virtual machines Light-weight Sigiri service within the started VM role instances acts as job managers  Azure blob store is used for all data transfers to and from virtual machines  24
WRF Workflow 25 WRF Preprocessing service ARWPost service GrADS WRF Input files are moved to Windows Azure Execution of real.exe in Windows Azure Execution of wrf.exe in Windows Azure
WRF Job Execution in Windows Azure using Microsoft Trident 26
Test Run with Ophelia (14-Sep-2005) 27
Using Windows Azure for WRF Executions: Concerns and Experiences Default environment has no support for MPI executions Limitations of MPI on Azure Limited to single node, shared memory execution Only small scale experiments are possible within a single node Execution of Linux binaries are limited to the capabilities of platform emulators (cygwin) Windows Azure VM roles are in beta stage Debugging is hard Support  VM creation has about 20 to 30 steps (Marty Humprey “Cloud, HPC or Hybrid: A Case Study Involving Satellite Image Processing”) Windows management API is not well documented Certain semantics of the API related to VM roles is ambiguous Increased startup overheads of virtual machines 28
Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 29
Towards Enabling Ensemble Runs in Geo-Science Search for a proper use of worker roles for geo-science applications Sample Application enables the study of change in the strength and impact of storms that start over the oceans has given access to and manipulation of climate model scenarios for emergency management and personnel and local government officials Typical simulation only takes a few minutes to run on a medium-sized workstation (Input: 3GB, Output: 8GB) Complete experiment sweeps both temporal and spatial parameters Air layers at different heights over a period of time About 14000 – 15000 jobs per experiment and then aggregation of data Research Focus Fault tolerant ensemble execution using Azure worker roles Management of large number of workers Orchestrated through Trident Downstream workflow management of the data results 30
Towards Enabling Ensemble Runs in Geo-Science Framework Extensions Management of large number of job submissions and their life cycles Optimal allocation of workers for jobs Using a combination of worker and VM roles Fault tolerance	 Replication of stragglers and failing jobs Management of data movements Management of resources before and after job executions Making experiment outputs available for scientists and interested parties Data catalogues Meta-data management 31
Extensions to the Framework to Enable Ensemble Runs Azure Blob Store Azure  Management API Sigiri Job Mgmt.Daemons Trident Activity Azure Fabric Web Service Job Queue Azure VM, Worker & Web Roles 32 Replication Management Worker Management Fault Tolerance
Summary Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 33
Further Information Please visit our website: http://pti.iu.edu/d2i/leadII-home LEAD II and Vortex2 video: http://pti.iu.edu/video/vortex2 Contact us Eran Chinthaka Withana (echintha@cs.indiana.edu) Beth Plale (plale@cs.indiana.edu) 34
Team Members: Indiana University (lead) University of Miami, University of Oklahoma 35
Questions … ?? Further Information Please visit our website: http://pti.iu.edu/d2i/leadII-home LEAD II and Vortex2 video: http://pti.iu.edu/video/vortex2 Contact us EranChinthakaWithana (echintha@cs.indiana.edu) Beth Plale (plale@cs.indiana.edu) 36

More Related Content

What's hot

20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3Tim Bell
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light SourcesIan Foster
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker, Inc.
 
grid mining
grid mininggrid mining
grid miningARNOLD
 
task scheduling in cloud datacentre using genetic algorithm
task scheduling in cloud datacentre using genetic algorithmtask scheduling in cloud datacentre using genetic algorithm
task scheduling in cloud datacentre using genetic algorithmSwathi Rampur
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayQAware GmbH
 
OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...
OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...
OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...OW2
 
Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and Saptak Sen
 
VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...
VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...
VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...Nexgen Technology
 

What's hot (11)

20190620 accelerating containers v3
20190620 accelerating containers v320190620 accelerating containers v3
20190620 accelerating containers v3
 
Dice presents-feb2014
Dice presents-feb2014Dice presents-feb2014
Dice presents-feb2014
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
Docker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce HoffDocker in Open Science Data Analysis Challenges by Bruce Hoff
Docker in Open Science Data Analysis Challenges by Bruce Hoff
 
grid mining
grid mininggrid mining
grid mining
 
task scheduling in cloud datacentre using genetic algorithm
task scheduling in cloud datacentre using genetic algorithmtask scheduling in cloud datacentre using genetic algorithm
task scheduling in cloud datacentre using genetic algorithm
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice Way
 
OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...
OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...
OW2con'16 Keynote address: Kubernetes, the rising tide of systems administrat...
 
Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and Taking High Performance Computing to the Cloud: Windows HPC and
Taking High Performance Computing to the Cloud: Windows HPC and
 
VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...
VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...
VIDEO STREAM ANALYSIS IN CLOUDS: AN OBJECT DETECTION AND CLASSIFICATION FRAME...
 
53
5353
53
 

Similar to Enabling Mid-Scale Geo Experiments with Microsoft Trident and Windows Azure

Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringRafael Ferreira da Silva
 
An Introduction to Eclipse Kura - Eclipse Day Florence 2014
An Introduction to Eclipse Kura - Eclipse Day Florence 2014An Introduction to Eclipse Kura - Eclipse Day Florence 2014
An Introduction to Eclipse Kura - Eclipse Day Florence 2014Eurotech
 
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Rafael Ferreira da Silva
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用lantianlcdx
 
HumphreyCloudburstingEscience2011.pdf
HumphreyCloudburstingEscience2011.pdfHumphreyCloudburstingEscience2011.pdf
HumphreyCloudburstingEscience2011.pdfTejasParbate
 
Fog Computing Platform
Fog Computing PlatformFog Computing Platform
Fog Computing Platform霈萱 蔡
 
Efficient resource management with Red Hat OpenShift
Efficient resource management with Red Hat OpenShiftEfficient resource management with Red Hat OpenShift
Efficient resource management with Red Hat OpenShiftrgcalvo
 
Cloud application architecture with sql azure and windows azure
Cloud application architecture with sql azure and windows azureCloud application architecture with sql azure and windows azure
Cloud application architecture with sql azure and windows azureEduardo Castro
 
Knowledge labs cc1
Knowledge labs cc1Knowledge labs cc1
Knowledge labs cc1Padma Priya
 
[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...
[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...
[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...NAVER D2
 
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...AM Publications
 
Java EE Modernization with Mesosphere DCOS
Java EE Modernization with Mesosphere DCOSJava EE Modernization with Mesosphere DCOS
Java EE Modernization with Mesosphere DCOSMesosphere Inc.
 
ClouNS - A Cloud-native Application Reference Model for Enterprise Architects
ClouNS - A Cloud-native Application Reference Model for Enterprise ArchitectsClouNS - A Cloud-native Application Reference Model for Enterprise Architects
ClouNS - A Cloud-native Application Reference Model for Enterprise ArchitectsNane Kratzke
 
Muves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 FinalMuves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 FinalElastic Grid, LLC.
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
Cloud computing and grid computing 360 degree compared
Cloud computing and grid computing 360 degree comparedCloud computing and grid computing 360 degree compared
Cloud computing and grid computing 360 degree comparedMd. Hasibur Rashid
 

Similar to Enabling Mid-Scale Geo Experiments with Microsoft Trident and Windows Azure (20)

Bridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven EngineeringBridging Concepts and Practice in eScience via Simulation-driven Engineering
Bridging Concepts and Practice in eScience via Simulation-driven Engineering
 
An Introduction to Eclipse Kura - Eclipse Day Florence 2014
An Introduction to Eclipse Kura - Eclipse Day Florence 2014An Introduction to Eclipse Kura - Eclipse Day Florence 2014
An Introduction to Eclipse Kura - Eclipse Day Florence 2014
 
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
Modeling and Simulation of Parallel and Distributed Computing Systems with Si...
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
 
HumphreyCloudburstingEscience2011.pdf
HumphreyCloudburstingEscience2011.pdfHumphreyCloudburstingEscience2011.pdf
HumphreyCloudburstingEscience2011.pdf
 
Computer project
Computer projectComputer project
Computer project
 
Fog Computing Platform
Fog Computing PlatformFog Computing Platform
Fog Computing Platform
 
Sky High With Azure
Sky High With AzureSky High With Azure
Sky High With Azure
 
Microsoft: Invent with Purpose
Microsoft: Invent with PurposeMicrosoft: Invent with Purpose
Microsoft: Invent with Purpose
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
Efficient resource management with Red Hat OpenShift
Efficient resource management with Red Hat OpenShiftEfficient resource management with Red Hat OpenShift
Efficient resource management with Red Hat OpenShift
 
Cloud application architecture with sql azure and windows azure
Cloud application architecture with sql azure and windows azureCloud application architecture with sql azure and windows azure
Cloud application architecture with sql azure and windows azure
 
Knowledge labs cc1
Knowledge labs cc1Knowledge labs cc1
Knowledge labs cc1
 
[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...
[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...
[D2 CAMPUS] tech meet up(Back-end) - Latency Reduction between mobile users a...
 
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
Improved Utilization of Infrastructure of Clouds by using Upgraded Functional...
 
Java EE Modernization with Mesosphere DCOS
Java EE Modernization with Mesosphere DCOSJava EE Modernization with Mesosphere DCOS
Java EE Modernization with Mesosphere DCOS
 
ClouNS - A Cloud-native Application Reference Model for Enterprise Architects
ClouNS - A Cloud-native Application Reference Model for Enterprise ArchitectsClouNS - A Cloud-native Application Reference Model for Enterprise Architects
ClouNS - A Cloud-native Application Reference Model for Enterprise Architects
 
Muves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 FinalMuves3 Elastic Grid Java One2009 Final
Muves3 Elastic Grid Java One2009 Final
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
Cloud computing and grid computing 360 degree compared
Cloud computing and grid computing 360 degree comparedCloud computing and grid computing 360 degree compared
Cloud computing and grid computing 360 degree compared
 

More from Eran Chinthaka Withana

Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...
Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...
Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...Eran Chinthaka Withana
 
Opensource development and apache software foundation
Opensource development and apache software foundationOpensource development and apache software foundation
Opensource development and apache software foundationEran Chinthaka Withana
 
User Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsUser Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsEran Chinthaka Withana
 
Usage Patterns to Provision for Scientific Experiments in Clouds
Usage Patterns to Provision for Scientific Experiments in CloudsUsage Patterns to Provision for Scientific Experiments in Clouds
Usage Patterns to Provision for Scientific Experiments in CloudsEran Chinthaka Withana
 
CBR Based Workflow Composition Assistant
CBR Based Workflow Composition AssistantCBR Based Workflow Composition Assistant
CBR Based Workflow Composition AssistantEran Chinthaka Withana
 

More from Eran Chinthaka Withana (9)

Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...
Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...
Redefining ETL Pipelines with Apache Technologies to Accelerate Decision-Maki...
 
Cassandra At Wize Commerce
Cassandra At Wize CommerceCassandra At Wize Commerce
Cassandra At Wize Commerce
 
Opensource development and apache software foundation
Opensource development and apache software foundationOpensource development and apache software foundation
Opensource development and apache software foundation
 
User Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and CloudsUser Inspired Management of Scientific Jobs in Grids and Clouds
User Inspired Management of Scientific Jobs in Grids and Clouds
 
Usage Patterns to Provision for Scientific Experiments in Clouds
Usage Patterns to Provision for Scientific Experiments in CloudsUsage Patterns to Provision for Scientific Experiments in Clouds
Usage Patterns to Provision for Scientific Experiments in Clouds
 
Versioning for Workflow Evolution
Versioning for Workflow EvolutionVersioning for Workflow Evolution
Versioning for Workflow Evolution
 
Web Services in the Real World
Web Services in the Real WorldWeb Services in the Real World
Web Services in the Real World
 
Axis2 Landscape
Axis2 LandscapeAxis2 Landscape
Axis2 Landscape
 
CBR Based Workflow Composition Assistant
CBR Based Workflow Composition AssistantCBR Based Workflow Composition Assistant
CBR Based Workflow Composition Assistant
 

Recently uploaded

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Enabling Mid-Scale Geo Experiments with Microsoft Trident and Windows Azure

  • 1. Towards Enabling Mid-Scale Geo-Science Experiments Through Microsoft Trident and Windows Azure EranChinthakaWithana Beth Plale
  • 2. Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 2
  • 3. Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 3
  • 4. Geo-Science Applications High Resource Requirements Compute intensive, dedicated HPC hardware e.g. Weather Research and Forecasting (WRF) Model Emergence of ensemble applications Large amount of small jobs e.g. Examining each air layer, over a long period of time. Single experiment = About 14000 jobs each taking few minutes to complete 4
  • 5. Geo-Science Applications: Challenges Compute intensive applications Mid-scale scientists often scramble to find sufficient computational resources to test and run their codes Software requirements and platform dependence MPI, Cygwin (if windows), Linux only binaries Management of large job executions Fault tolerance Reliability* of Grid computing resources and middleware Utilizing different compute resources *Marru S, Perera S, Feller M, Martin S. Reliable and Scalable Job Submission: LEAD Science Gateways Testing and Experiences with WS GRAM on TeraGrid Resources . TeraGrid Conference June 2008 5
  • 6. Geo-Science Applications: Opportunities Cloud computing resources On-demand access to “unlimited” resources Flexibility Worker roles and VM roles Recent porting of geo-science applications WRF, WRF Preprocessing System (WPS) port to Windows Increased use of ensemble applications (large number of small runs) Production quality, opensource scientific workflow systems Microsoft Trident 6
  • 7. Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 7
  • 8. Research Vision Enabling geo-science experiments Type of applications Compute intensive, ensembles Type of scientists Meteorologists, atmospheric scientists, emergency management personnel, geologists Utilizing both Cloud computing and Grid computing resources Utilizing opensource, production quality scientific workflow environments Improved data and meta-data management Geo-Science Applications Scientific Workflows Compute Resources 8
  • 9. Existing Approaches Condor Features Enables creation of resource pools from grid and cloud computing resources Limitations On-demand resource allocation and management Ease of integration with workflow environments GridWay, SAGA, Falcon Limitations tightly integrated with complex middleware to address a broad range of problems GRAM Features Coordinates job submissions to Grid computing resources Limitations Scalability and reliability issues Ease of installation and maintenance CARMEN project Features Concentrates on building a cloud environment for neuroscientists Provide data sharing and analysis capabilities Encapsulates tools as WS-I compliant web services Dynamic deployments using Dynasoar Limitations Strict application requirements Ability to support wide variety of compute resources 9
  • 10. Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 10
  • 11. Design Decisions Decoupled architecture with low turnaround time Web services interfaces for interactions Ease of integrating with workflow engines and tools Ability to support multiple job description languages e.g. JSDL and RSL Flexibility to support various security protocols transport level security and WS-Security Extensibility to support a range of compute resources Should support grid and cloud resources Should be able to schedule and monitor jobs Robust management of scientific jobs Experiences with GRAM2 and GRAM4 Ease of installation and maintenance 11
  • 12. Proposed Framework Azure Blob Store Azure Management API Sigiri Job Mgmt.Daemons Trident Activity Azure Fabric Web Service Job Queue Azure VM, Worker & Web Roles 12
  • 13. Proposed Framework Sigiri – Abstraction for grids and clouds Web service Decouples job acceptance from execution and monitoring Daemons Manages compute resource interactions Job submissions and monitoring Cleaning up resources Efficient allocation of resources Template based approach for cloud computing resources 13
  • 15. Proposed Framework Trident Activity Activities compose a workflow Activity wraps a task / application Input parameter collection and validation Request composition Invocation and monitoring Framework activities Interacts with Sigiri to schedule jobs and monitor the progress Data movement to / from cloud storage (Windows Blob Store or Amazon S3) Visualization 15
  • 16. Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 16
  • 17. Weather Research and Forecast Model (WRF) Mesoscale numerical weather prediction system Designed to serve both operational forecasting and atmospheric research needs. A software architecture allowing for computational parallelism and system extensibility 17
  • 18. Background: LEAD II and Vortex2 Experiment May 1, 2010 to June 15, 2010 ~6 weeks, 7-days per week Workflow started on the hour every hour each morning. Had to find and bind to latest model data (i.e., RUC 13km and ADAS data) to set initial and boundary conditions. If model data was not available at NCEP and University of Oklahoma, workflow could not begin. Execution of complete WRF stack within 1 hour 18
  • 19. The Trident Vortex2 Workflow: Timeline Bulk of time (50 min) spent in Lead Workflow Proxy Activity 19 Sigiri Integration
  • 20. Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 20
  • 21. Moving WRF Stack to Windows Azure Opportunities Reliability of grid computing resources WRF and WRF Preprocessing System (WPS) ported to Windows Enable midscale scientists to exploit the capabilities of WRF Concerns Porting of WRF to Azure to run on multiple nodes Strict software requirements and the choice between worker and VM Roles Need of MPI, Cygwin Restricted to single virtual machine 21
  • 22. Enabling WRF Stack on Azure Sigiri Job Mgmt.Daemons Web Service Job Queue Azure Blob Store Azure Management API Trident Activity Azure Fabric Azure VM Roles 22
  • 23. Enabling WRF Stack on Azure Sigiri – Microsoft Azure Daemon Maintains applications to virtual machine mappings Can use an external service as well Interacts with Windows Azure API to Deploy and start hosted services Handle Azure security credentials Maintain Azure VM pools Monitor job executions Sigiri – Microsoft Azure Service Deployed inside virtual machines Accepts job submission requests from Microsoft Azure daemon Launches jobs and monitors them Enables status queries 23
  • 24. Working with Azure VM Roles All the related applications are installed on a virtual machine using Hyper-V Virtual machine image sizes are limited 35GB to enable a wide variety of instance types Custom virtual machines (VHD files) are uploaded and stored in Azure Blob Store Managed by azure command line tools Custom hosted service deployments are needed to start VM roles with custom VM images Dynamic configuration of service descriptors to support service requirements Configuration of VHD files, number of instances, certificate associations Hosted services are deployed and started on-demand using Azure management API Sigiri Azure daemon manages the interactions with Azure management API Manages the life cycle of virtual machines Light-weight Sigiri service within the started VM role instances acts as job managers Azure blob store is used for all data transfers to and from virtual machines 24
  • 25. WRF Workflow 25 WRF Preprocessing service ARWPost service GrADS WRF Input files are moved to Windows Azure Execution of real.exe in Windows Azure Execution of wrf.exe in Windows Azure
  • 26. WRF Job Execution in Windows Azure using Microsoft Trident 26
  • 27. Test Run with Ophelia (14-Sep-2005) 27
  • 28. Using Windows Azure for WRF Executions: Concerns and Experiences Default environment has no support for MPI executions Limitations of MPI on Azure Limited to single node, shared memory execution Only small scale experiments are possible within a single node Execution of Linux binaries are limited to the capabilities of platform emulators (cygwin) Windows Azure VM roles are in beta stage Debugging is hard Support VM creation has about 20 to 30 steps (Marty Humprey “Cloud, HPC or Hybrid: A Case Study Involving Satellite Image Processing”) Windows management API is not well documented Certain semantics of the API related to VM roles is ambiguous Increased startup overheads of virtual machines 28
  • 29. Agenda Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 29
  • 30. Towards Enabling Ensemble Runs in Geo-Science Search for a proper use of worker roles for geo-science applications Sample Application enables the study of change in the strength and impact of storms that start over the oceans has given access to and manipulation of climate model scenarios for emergency management and personnel and local government officials Typical simulation only takes a few minutes to run on a medium-sized workstation (Input: 3GB, Output: 8GB) Complete experiment sweeps both temporal and spatial parameters Air layers at different heights over a period of time About 14000 – 15000 jobs per experiment and then aggregation of data Research Focus Fault tolerant ensemble execution using Azure worker roles Management of large number of workers Orchestrated through Trident Downstream workflow management of the data results 30
  • 31. Towards Enabling Ensemble Runs in Geo-Science Framework Extensions Management of large number of job submissions and their life cycles Optimal allocation of workers for jobs Using a combination of worker and VM roles Fault tolerance Replication of stragglers and failing jobs Management of data movements Management of resources before and after job executions Making experiment outputs available for scientists and interested parties Data catalogues Meta-data management 31
  • 32. Extensions to the Framework to Enable Ensemble Runs Azure Blob Store Azure Management API Sigiri Job Mgmt.Daemons Trident Activity Azure Fabric Web Service Job Queue Azure VM, Worker & Web Roles 32 Replication Management Worker Management Fault Tolerance
  • 33. Summary Geo-Science Applications: Challenges and Opportunities Research Vision Proposed Framework Applications Scheduling time-critical MPI applications in Windows Azure Scheduling large number of small jobs (ensembles) in Windows Azure 33
  • 34. Further Information Please visit our website: http://pti.iu.edu/d2i/leadII-home LEAD II and Vortex2 video: http://pti.iu.edu/video/vortex2 Contact us Eran Chinthaka Withana (echintha@cs.indiana.edu) Beth Plale (plale@cs.indiana.edu) 34
  • 35. Team Members: Indiana University (lead) University of Miami, University of Oklahoma 35
  • 36. Questions … ?? Further Information Please visit our website: http://pti.iu.edu/d2i/leadII-home LEAD II and Vortex2 video: http://pti.iu.edu/video/vortex2 Contact us EranChinthakaWithana (echintha@cs.indiana.edu) Beth Plale (plale@cs.indiana.edu) 36