Versioning for Workflow Evolution
Roger Barga, Nelson Araujo
Microsoft Research, Microsoft Corporation, Redmond, Washington
Eran Chinthaka Withana, Beth Plale
School of Informatics and Computing, Indiana University, Bloomington, Indiana
3rd International Workshop on Data Intensive Distributed Computing, Chicago, IL, US; June 22, 2010; Eran C. Withana
Workflow Evolution
- Computational science experiments
  - Sequence of activities
  - Set of configurable parameters and input data
  - Produce outputs to be analyzed and evaluated further
- Evolution of research
  - Changes in research artifacts
Workflow Evolution
- Workflows are a good tool for tracking the evolution of research
  - They automate repeatable tasks efficiently
  - Algorithms and experimental procedures are encoded into workflows, so tracking workflows also tracks the research
- Tracking effects over time
  - Provenance of data products
  - Lineage: the roots of errors and the data products they affect
- Comparing results
  - More than one research direction in a given experiment
  - Comparing outputs from different paths of the research
- Attribution
  - Attribution of credit based on who performed the work, who owns or created the workflow, and who owns the data products
  - Sharing and attribution of research can and should be an integral part of research
  - E.g., sub-modules from myexperiment.org
- Workflow evolution framework and versioning model
  - Enables the management of knowledge encoded in workflow executions
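Lineage tracking, as listed above, amounts to walking a derivation graph in both directions: forward from a faulty input to every affected data product, and backward from a suspicious product to its original inputs. The sketch below is a minimal illustration only, assuming provenance is recorded as a hypothetical mapping from each data product to the inputs it was derived from; the file names are made up and this is not the framework's actual data model.

```python
from collections import deque


def affected_products(derived_from: dict, faulty: str) -> set:
    """Return every data product that transitively depends on `faulty`."""
    # Invert the "product -> inputs" map into "input -> products derived from it".
    consumers: dict = {}
    for product, inputs in derived_from.items():
        for src in inputs:
            consumers.setdefault(src, []).append(product)

    # Breadth-first search forward from the faulty item.
    affected: set = set()
    queue = deque([faulty])
    while queue:
        node = queue.popleft()
        for child in consumers.get(node, []):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


def root_inputs(derived_from: dict, product: str) -> set:
    """Return the original inputs (items with no recorded derivation) behind `product`."""
    roots: set = set()
    seen: set = set()
    stack = [product]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        inputs = derived_from.get(node, [])
        if not inputs:
            roots.add(node)
        stack.extend(inputs)
    return roots


# Hypothetical provenance for a small forecast workflow.
provenance = {
    "forecast.nc": ["namelist.input", "terrain.dat"],
    "plot.png": ["forecast.nc"],
    "report.pdf": ["plot.png", "obs.csv"],
}
print(sorted(affected_products(provenance, "namelist.input")))
# ['forecast.nc', 'plot.png', 'report.pdf']
print(sorted(root_inputs(provenance, "report.pdf")))
# ['namelist.input', 'obs.csv', 'terrain.dat']
```

A forward search answers "which results must be regenerated?", while the backward walk answers "where could this error have come from?"; both only need the recorded derivation edges.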
Related Work
- Workflow evolution shares a lot in common with provenance collection frameworks
  - I. T. Foster, J.-S. Vöckler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In Proceedings of the 14th International Conference on Scientific and Statistical Database Management, pages 37-46, Washington, DC, USA, 2002. IEEE Computer Society.
- Existing evolution frameworks
  - J. Freire, C. Silva, S. Callahan, E. Santos, C. Scheidegger, and H. Vo. Managing rapidly-evolving scientific workflows. Lecture Notes in Computer Science, 4145:10, 2006.
- Evolution data models
  - L. Bavoil, S. P. Callahan, P. J. Crossno, J. Freire, C. E. Scheidegger, C. T. Silva, and H. T. Vo. VisTrails: Enabling interactive multiple-view visualizations. In IEEE Visualization 2005 (VIS 05), pages 135-142, 2005.
- Versioning at different levels
  - Application level: D. Santry, M. Feeley, N. Hutchinson, and A. Veitch. Elephant: The file system that never forgets. In Workshop on Hot Topics in Operating Systems, pages 2-7. IEEE Computer Society, 1999.
  - System/database level: R. Chatterjee, G. Arun, S. Agarwal, B. Speckhard, and R. Vasudevan. Using applications of data versioning in database application development. In ICSE '04: Proceedings of the 26th International Conference on Software Engineering, pages 315-325, Washington, DC, USA, 2004. IEEE Computer Society.
  - Disk storage level: M. Flouris and A. Bilas. Clotho: Transparent data versioning at the block I/O level. In Proceedings of the 12th NASA Goddard / 21st IEEE Conference on Mass Storage Systems and Technologies (MSST 2004), pages 315-328, 2004.
Use Cases
1. Research reproduction
2. Scientific workflows
   - In LEAD: tracking namelist input files and visualizations
   - Tracking activity binaries
Versioning Model
- Dimensions of workflow evolution
  - Direct evolution occurs when a user of the workflow performs one of the following actions:
    - Changes the flow and arrangement of the components within the system
    - Changes the components within the workflow
    - Changes the input, output, or configuration parameters of components within the workflow
  - Contributions: tracks components that are reused from a previous system
- Workflow evolution capturing stages
  - The user explicitly saves the workflow
  - The user closes the workflow editor
  - The workflow is executed
  - Warning: this granularity might not capture all edits
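The three direct-evolution dimensions can be read as three independent diffs between two snapshots of a workflow. The sketch below assumes a hypothetical snapshot model (component names mapped to implementation versions, data-flow edges, and per-component parameters); the `WorkflowVersion` type and the component names are illustrative and not part of Trident.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowVersion:
    """Hypothetical minimal snapshot of a workflow definition."""
    components: dict                              # component name -> implementation version
    links: frozenset                              # (source, target) data-flow edges
    params: dict = field(default_factory=dict)    # (component, parameter) -> value


def classify_changes(old: WorkflowVersion, new: WorkflowVersion) -> dict:
    """Group the differences between two snapshots by direct-evolution dimension."""
    changes = {}
    # Flow/arrangement: edges added or removed.
    flow = sorted(old.links ^ new.links)
    if flow:
        changes["flow"] = flow
    # Components: added, removed, or swapped for a different implementation.
    comps = sorted(
        name for name in old.components.keys() | new.components.keys()
        if old.components.get(name) != new.components.get(name)
    )
    if comps:
        changes["components"] = comps
    # Parameters: input, output, or configuration values that differ.
    params = sorted(
        key for key in old.params.keys() | new.params.keys()
        if old.params.get(key) != new.params.get(key)
    )
    if params:
        changes["parameters"] = params
    return changes


v1 = WorkflowVersion(
    components={"ingest": "1.0", "wrf": "3.1", "plot": "1.0"},
    links=frozenset({("ingest", "wrf"), ("wrf", "plot")}),
    params={("wrf", "dx"): "4km"},
)
v2 = WorkflowVersion(
    components={"ingest": "1.0", "wrf": "3.2", "plot": "1.0"},
    links=frozenset({("ingest", "wrf"), ("wrf", "plot")}),
    params={("wrf", "dx"): "2km"},
)
print(classify_changes(v1, v2))
# {'components': ['wrf'], 'parameters': [('wrf', 'dx')]}
```

Keeping the dimensions separate means a version record can say *what kind* of change happened, not just that the file differs.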
Versioning System Architecture within the Trident Scientific Workflow Workbench
[Figures: Trident Architecture; Trident Evolution Framework Architecture. Labeled components: Trident Workbench, Design, Monitor, Administration, Browser, Trident Registry Management, Workflow Packages, Trident Runtime Services, Trident Registry Data Model, Trident Data Model, Publish-Subscribe Blackboard, Data Access Layer, Scientific Workflows, Windows Workflow Foundation, Evolution Framework, Versioning Model, Registry Management, Local Storage, Other Local/Remote]
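The architecture includes a publish-subscribe blackboard, and the versioning model captures versions only at save, editor-close, and execution time. One way these could fit together is for an evolution recorder to subscribe to those three events. The sketch below is hypothetical throughout: the `Blackboard` class and topic names are illustrative and are not Trident's API. It also shows the granularity warning in practice, since an edit made between two capture events never becomes a version.

```python
from collections import defaultdict


class Blackboard:
    """Hypothetical in-process publish-subscribe bus (not Trident's actual API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subscribers[topic]:
            handler(payload)


class EvolutionRecorder:
    """Records a version only when one of the capture events is published."""

    # Hypothetical topic names for the three capturing stages.
    CAPTURE_EVENTS = ("workflow.saved", "editor.closed", "workflow.executed")

    def __init__(self, bus):
        self.versions = []  # list of (trigger, snapshot) pairs
        for topic in self.CAPTURE_EVENTS:
            # Bind `topic` now; a bare closure would see only the last loop value.
            bus.subscribe(topic, lambda wf, t=topic: self._capture(t, wf))

    def _capture(self, trigger, workflow):
        snapshot = dict(workflow)  # copy so later edits do not alter history
        if self.versions and self.versions[-1][1] == snapshot:
            return  # nothing changed since the last captured version
        self.versions.append((trigger, snapshot))


bus = Blackboard()
recorder = EvolutionRecorder(bus)
workflow = {"dx": "4km"}
bus.publish("workflow.saved", workflow)
workflow["dx"] = "3km"                  # edited but never saved or run: not captured
workflow["dx"] = "2km"
bus.publish("workflow.executed", workflow)
bus.publish("editor.closed", workflow)  # unchanged, so no new version
print(recorder.versions)
# [('workflow.saved', {'dx': '4km'}), ('workflow.executed', {'dx': '2km'})]
```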
User View (within Trident)
[Screenshots: Workflow Evolution View; Versioned Objects in Registry]
Performance Evaluation
- Evaluation strategies
  - Delta: the difference between two consecutive versions
  - Checkpointing: a complete version is saved after a fixed number of versions
  - No delta, no checkpointing: each version is saved as is
  - With delta, no checkpointing: each version is saved as a delta against the previous version
  - With delta, with checkpointing: deltas are saved, with a complete checkpoint after every n versions
- Workflows used
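The three strategies differ only in when a version is stored in full. The sketch below implements all three behind one hypothetical `VersionStore` class, assuming line-oriented workflow files and using Python's `difflib` to compute deltas; the delta encoding (copy/insert operations) is an illustrative choice, not the format used in the evaluation. `recover` also reports how many stored records it had to read, which is the quantity that separates the strategies at recovery time.

```python
import difflib
from typing import List, Optional, Tuple


def make_delta(old: List[str], new: List[str]) -> list:
    """Encode `new` as copy/insert operations against `old` (line granularity)."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))    # store only the new lines
        # "delete": emit nothing, the old lines are simply skipped
    return ops


def apply_delta(old: List[str], ops: list) -> List[str]:
    out: List[str] = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class VersionStore:
    """Stores versions in full, as deltas, or as deltas with periodic checkpoints."""

    def __init__(self, use_delta: bool, checkpoint_every: Optional[int] = None):
        self.use_delta = use_delta
        self.checkpoint_every = checkpoint_every
        self.records: list = []            # ("full", lines) or ("delta", ops)
        self._latest: List[str] = []

    def save(self, lines: List[str]) -> int:
        n = len(self.records)
        full = (not self.use_delta or n == 0
                or (self.checkpoint_every is not None and n % self.checkpoint_every == 0))
        if full:
            self.records.append(("full", list(lines)))
        else:
            self.records.append(("delta", make_delta(self._latest, lines)))
        self._latest = list(lines)
        return n  # version number, starting at 0

    def recover(self, version: int) -> Tuple[List[str], int]:
        """Return the requested version and how many stored records were read."""
        start = version
        while self.records[start][0] != "full":
            start -= 1                     # walk back to the nearest full copy
        lines = self.records[start][1]
        for i in range(start + 1, version + 1):
            lines = apply_delta(lines, self.records[i][1])
        return list(lines), version - start + 1


# Seven hypothetical versions, each editing one line of a 20-line file.
history = []
current = [f"activity {i}" for i in range(20)]
for k in range(7):
    current = current[:k] + [f"activity {k} (edited in v{k})"] + current[k + 1:]
    history.append(current)

stores = {
    "no delta, no checkpointing": VersionStore(use_delta=False),
    "delta, no checkpointing": VersionStore(use_delta=True),
    "delta, checkpoint every 3": VersionStore(use_delta=True, checkpoint_every=3),
}
for name, store in stores.items():
    for version in history:
        store.save(version)
    recovered, reads = store.recover(5)
    assert recovered == history[5]
    print(f"{name}: recovering v5 read {reads} record(s)")
# no delta, no checkpointing: recovering v5 read 1 record(s)
# delta, no checkpointing: recovering v5 read 6 record(s)
# delta, checkpoint every 3: recovering v5 read 3 record(s)
```

Checkpointing bounds the delta chain, so recovery never has to replay more than `checkpoint_every - 1` deltas.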
Performance Evaluation: File Write Time [Figure panels: O Workflow, M Workflow]
Performance Evaluation: Version Recovery Time [Figure panels: O Workflow, M Workflow]
Performance Evaluation: Space Usage for a Version [Figure panels: O Workflow, M Workflow]
Performance Evaluation: Data Retrieved per Version [Figure panels: O Workflow, M Workflow]
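A simple cost model helps read these measurements together. It is a sketch under stated assumptions, not the model used in the evaluation: every full version has size S, every delta between consecutive versions has size d, versions are numbered 0, ..., V-1, and in the checkpointed strategy a version is stored in full whenever its number is a multiple of n.

```latex
\begin{aligned}
\text{Storage for } V \text{ versions:}\quad
  & C_{\text{full}} = V S, \qquad
    C_{\Delta} = S + (V-1)\,d, \\
  & C_{\Delta,n} = \left\lceil V/n \right\rceil S + \bigl(V - \lceil V/n \rceil\bigr)\, d, \\[4pt]
\text{Data read to recover version } v:\quad
  & R_{\text{full}}(v) = S, \qquad
    R_{\Delta}(v) = S + v\,d, \qquad
    R_{\Delta,n}(v) = S + (v \bmod n)\, d.
\end{aligned}
```

When d is much smaller than S, the ratio C_full / C_delta grows toward S/d as V increases, which is why saving every version in full costs the most space. In exchange, R_full(v) stays constant while R_delta(v) grows linearly with v, and checkpointing caps recovery at S + (n-1)d.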
Discussion
- The "no delta, no checkpointing" option performs poorly with respect to storage usage: by a factor of 4-5 for the smaller workflow with a small delta, and by a factor of 2 for the larger workflow with a large delta
- It outperforms both other options with respect to:
  - Version save time: by 20-30 times for the larger workflow with a large delta, and 5 times for the smaller workflow with a small delta
  - Version recovery time: by 10 times for the smaller workflow with a small delta, and 5 times for the larger workflow with a large delta
- Criteria for selecting an object maintenance strategy
  - Size of the data objects
  - Average change between different versions of the same object
  - Response time to the user and the system
- Challenges in working with different types of artifacts
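The selection criteria above could be turned into a simple rule: estimate storage and worst-case recovery cost from an object's size, its average change per version, and the number of versions kept, then pick the cheapest strategy that still meets a response-time budget. The sketch below is hypothetical: it uses worst-case bytes read as a stand-in for response time, assumes every delta has the same size, and its candidate checkpoint intervals and example numbers are illustrative, not measurements from the evaluation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class ObjectProfile:
    size_bytes: int         # typical size of one full version (S)
    avg_delta_bytes: int    # average change between consecutive versions (d)
    expected_versions: int  # number of versions expected to be kept (V)


def estimate(p: ObjectProfile, use_delta: bool,
             checkpoint_every: Optional[int] = None) -> Tuple[int, int]:
    """Return (total storage, worst-case bytes read to recover one version)."""
    S, d, V = p.size_bytes, p.avg_delta_bytes, p.expected_versions
    if not use_delta:
        return V * S, S
    if checkpoint_every is None:
        return S + (V - 1) * d, S + (V - 1) * d
    n = checkpoint_every
    full_copies = -(-V // n)                      # ceil(V / n)
    storage = full_copies * S + (V - full_copies) * d
    worst_read = S + (min(n, V) - 1) * d          # at most n-1 deltas after a checkpoint
    return storage, worst_read


def choose_strategy(p: ObjectProfile, max_read_bytes: int,
                    checkpoint_candidates: Sequence[int] = (2, 5, 10, 20)) -> str:
    """Pick the smallest-storage strategy whose worst-case recovery fits the budget."""
    options = [("no delta", False, None), ("delta", True, None)]
    options += [(f"delta + checkpoint every {n}", True, n) for n in checkpoint_candidates]
    feasible = []
    for name, use_delta, n in options:
        storage, worst_read = estimate(p, use_delta, n)
        if worst_read <= max_read_bytes:
            feasible.append((storage, name))
    # Storing every version in full always has the cheapest recovery, so fall back to it.
    return min(feasible)[1] if feasible else "no delta"


profile = ObjectProfile(size_bytes=1_000_000, avg_delta_bytes=10_000, expected_versions=100)
print(choose_strategy(profile, max_read_bytes=1_200_000))   # delta + checkpoint every 20
print(choose_strategy(profile, max_read_bytes=10_000_000))  # delta
```

A tight response-time budget pushes the choice toward frequent checkpoints or full copies, while a relaxed one lets pure deltas minimize storage.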
Future Work
- Dynamic strategy to adjust the versioning technique depending on object properties
- Challenges
  - Unavailability of visualization software
  - Visualizing different types of data products and integrating other visualization tools
- LEAD II Vortex2 use case
  - Tracking different workflow activity library versions
Thank You! Questions?
