Wf4Ever: Advanced Workflow Preservation
   Technologies for Enhanced Science
                 Grant agreement no.: 27092


     Data Curation & Preservation Session
                     Jose Enrique Ruiz
                         IAA-CSIC

                    8th December 2010
        IVOA 2010 Fall Interoperability Meeting - Nara
Introduction
                                                          The Team



1.  Intelligent Software Components (ISOCO, Spain)
2.  University of Manchester (UNIMAN, UK)
3.  Universidad Politécnica de Madrid (UPM, Spain)
4.  Poznan Supercomputing and Networking Centre (PSNC, Poland)
5.  University of Oxford (OXF, UK)
6.  Instituto de Astrofísica de Andalucía (IAA, Spain)
7.  Leiden University Medical Centre (LUMC, NL)




Introduction
                                          The Consortium Classified



Partners

» One SME
» Six public organizations

Technological Core Competencies

» Digital Libraries
» Workflow Management
» Semantic Web
» Integrity & Authenticity
» Provenance

Major Sectors

» Education
» IT
» Astronomy
» Bioinformatics

Case Studies

» Workflow Preservation in Astronomy (IAA)
» Workflow Preservation for Genome-wide Analysis and Biobanking (LUMC)




YOUR FAVOURITE MASS MODEL TOOL
Scientific Workflows
                                                                                              State of the art
A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving

» Central in experimental science
   ›  Enable automation
   ›  Make science repeatable (and sometimes reproducible)
   ›  Encourage best practices
» Scientist-friendly
   ›  Aimed at (some types of) scientists, possibly even without strong computational skills
» Communities: need for scientific data preservation
   ›  Enhance scientific development by building on, sharing, and extending previous results within scientific communities
» However, workflow preservation is especially complex
   ›  Workflows not only specified statically at design time but also interpreted through their execution
   ›  Complex models are required to describe workflows and related resources, including documents, data and services
   ›  Resources often beyond control of scientists
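
The distinction between a workflow's static specification and its execution can be illustrated with a minimal sketch. This is not Wf4Ever code: the `Step` and `Workflow` classes and the toy steps are hypothetical names invented for illustration. The list of steps stands for the design-time description, and the trace recorded by `execute` stands for what a single run adds.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One named processing step: part of the static, design-time description."""
    name: str
    run: Callable[[Any], Any]


@dataclass
class Workflow:
    """An ordered set of steps plus the trace left by the latest run."""
    name: str
    steps: list[Step]
    trace: list[dict[str, Any]] = field(default_factory=list)

    def execute(self, data: Any) -> Any:
        """Run the steps in order, recording each step's input and output."""
        self.trace = []
        for step in self.steps:
            result = step.run(data)
            self.trace.append({"step": step.name, "input": data, "output": result})
            data = result
        return data


# Hypothetical toy steps standing in for real processing tasks.
def subtract_background(pixels):
    return [p - 10 for p in pixels]


def select_bright(pixels):
    return [p for p in pixels if p > 50]


toy = Workflow(
    "toy_source_count",
    [
        Step("subtract_background", subtract_background),
        Step("select_bright", select_bright),
        Step("count", len),
    ],
)
```

Calling `toy.execute([20, 70, 95, 40])` runs the three steps in order and fills `toy.trace`. Preserving only the step list would lose the trace, which is one reason both aspects need to be described.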
Project Objectives
                                                                                           Goals

Technological infrastructure for the preservation and efficient retrieval and reuse of scientific
                              workflows in a range of disciplines



                                                 » Creation and management of complex
                                                   Research Objects that take into account
                                                   the dual nature (static and dynamic) of
                                                   scientific workflows

                                                 » Archival, classification, and indexing of
                                                   scientific workflows and their associated
                                                   materials in scalable semantic repositories,
                                                   providing advanced access and
                                                   recommendation capabilities

                                                 » Creation of scientific communities to
                                                   collaboratively share, reuse and evolve
                                                   workflows and their parts, stimulating the
                                                   development of new scientific knowledge
Integrity & Authenticity
                                                                                 Definitions

Integrity

» The quality or condition of being whole, complete and unaltered
» Crucial for ensuring the quality of preserved data in Research Objects

Authenticity

» Authenticity has a twofold dimension: data origin and entity authenticity
» Data origin: Proof of the origin of data, their genuineness, trustworthiness and realness
» Entity: ensuring that an entity, e.g. a person or other kind of actor, is the one it claims to be




Integrity and Authenticity Maintenance
                                                                                Objectives


    Evaluate and preserve the integrity and authenticity of archived Research Objects



» To ensure the data can be accessed and interpreted unchanged, complete, and correct today and in the future
» To preserve the integrity of archived Research Objects by tracking and verifying changes in archived objects as well as related resources
» To assist scientists in anticipating potential inconsistencies caused by uncontrolled changes in such resources
» To verify and prove the authenticity of authors and contributors to Research Objects as well as of internal and related resources

Provenance-based means to calculate measures of integrity and authenticity
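
One simple way to track and verify changes in archived files is a checksum manifest. The sketch below uses that technique as a stand-in: it is not the provenance-based approach named on the slide, and the function names (`sha256_of`, `build_manifest`, `verify`) are hypothetical. It only detects that a file was altered, removed, or added since the manifest was built. It says nothing about who changed it, so it addresses integrity but not authenticity.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "added": sorted(set(current) - set(manifest)),
        "altered": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

A manifest built at archival time and stored alongside the object can later be passed to `verify`. Three empty lists mean the files on disk still match what was archived.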
Project Objectives
                                              Common needs from the Community


                What do you want to know when accessing a workflow?




» If I can use it for my purposes (in my words)
» If I can expect it to run, when it was last run, by whom
» What it does, quickly, through one of:
   » example input / output (and trying it)
   » a description
   » ‘reading’ its key parts
   » what it was used for
   » related workflows
   » its creator
   » contacting the creator or last user
» How I need to cite the author and workflow



Project Objectives
                                              Common needs from the Community


                 What do you want to know when sharing a workflow?




» What rights others have
» What makes a good workflow, so that it gets a good score
   » Make my workflow findable, reusable, and ready for review
   » Instructions to authors
   » Two types of contributions: serious science, preliminary/playing around
» If my workflow may have issues
» What the system or other users think it does
   » How it relates to other things
» Share freely or anonymously upon request




Project Objectives
                                                               Main Challenges



Quality

Representation
•  Workflow + resources
•  Manipulation
•  Classification, categorization
•  Similarity
•  Abstraction

Preservation
•  Store
•  Access
•  Evolution
•  Scale
•  Versioning

Community
•  Sharing & Reuse
•  Affinity
•  Incremental development
•  Credit, citation

Attribution, accountability

Project Objectives
                                                                Main Challenges



Quality

INTEGRITY

Representation
•  Workflow + resources
•  Manipulation
•  Classification, categorization
•  Similarity
•  Abstraction

Preservation
•  Store
•  Access
•  Evolution
•  Scale
•  Versioning

Community
•  Sharing & Reuse
•  Affinity
•  Incremental development
•  Credit, citation

AUTHENTICITY

Attribution, accountability

Evaluation, Validation & Community Building
                                                                           Wf4Ever Case Studies

Two workflow-intensive scientific case studies in the domains of Astronomy and Genomics

Astronomy

» Application area: Virtual Observatory (VO) data processing
   ›  Astrophysical quantities propagation
   ›  Source extraction on CCD images
   ›  Modeling of galaxy 3D data
» Focus on bringing workflow-based methodologies into Astronomy
» Creation of Golden Exemplars
» Beachhead

Genomics

» Application areas: Biobanking and Genome-Wide analysis
   ›  Interpretation of GWAS data
   ›  Gene expression studies
» Focus on authenticity and experimental reproducibility
» Community! Lots of available workflows
   ›  myExperiment
   ›  SysMO-DB
» Long tradition of workflow application

Overall goals

» To collect and preserve existing workflows and their related objects in each area
» To create scientific communities around the use and preservation of scientific workflows
» To apply, evaluate, and provide feedback on the results obtained from system and component-level research
Astronomy WorkPackage
                                                                                Main Goals

WP5: Workflow Preservation in Astronomy

Scientific contribution

»  Development of an online community of scientists working on Astronomy
»  Introduction of workflow and workflow preservation needs in Astronomy and the Virtual Observatory
»  Provide a set of workflows for frequently used complex task combinations and demands in the Astronomy domain

Technological contribution

»  Online repository of preserved Astronomy workflows identifying preservation needs
»  Creation of three Golden Exemplar workflows:
        ›  using Wf4Ever results
        ›  involving additional implementations for wrapping VO Web services


Workflow-based methodology deployed in the VO community through exemplars
and preservation methodologies
Thanks for your Attention!
                                              Questions




http://www.wf4ever-project.org




Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science

  • 1. Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science
      Grant agreement no.: 27092
      Data Curation & Preservation Session
      Jose Enrique Ruiz, IAA-CSIC
      8th December 2010
      IVOA 2010 Fall Interoperability Meeting, Nara
  • 2. Introduction: The Team
      1. Intelligent Software Components (ISOCO, Spain)
      2. University of Manchester (UNIMAN, UK)
      3. Universidad Politécnica de Madrid (UPM, Spain)
      4. Poznan Supercomputing and Networking Centre (PSNC, Poland)
      5. University of Oxford (OXF, UK)
      6. Instituto de Astrofísica de Andalucía (IAA, Spain)
      7. Leiden University Medical Centre (LUMC, NL)
  • 3. Introduction: The Consortium Classified
      Partners
        » One SME
        » Six public organizations
      Technological Core Competencies
        » Digital Libraries
        » Workflow Management
        » Semantic Web
        » Integrity & Authenticity
        » Provenance
      Major Sectors
        » Education
        » IT
        » Astronomy
        » Bioinformatics
      Case Studies
        » Workflow Preservation in Astronomy (IAA)
        » Workflow Preservation for Genome-wide Analysis and Biobanking (LUMC)
  • 11. Scientific Workflows: State of the art
      A Scientific Workflow can be seen as the combination of data and processes into a configurable, structured set of steps that implement semi-automated computational solutions in scientific problem-solving.
      » Central in experimental science
        › Enable automation
        › Make science repeatable (and sometimes reproducible)
        › Encourage best practices
      » Scientist-friendly
        › Aimed at (some types of) scientists, possibly even without strong computational skills
      » Communities: need for scientific data preservation
        › Enhance scientific development by building on, sharing, and extending previous results within scientific communities
      » However, workflow preservation is especially complex
        › Workflows not only specified statically at design time but also interpreted through their execution
        › Complex models are required to describe workflows and related resources, including documents, data and services
        › Resources often beyond control of scientists
  • 12. Project Objectives: Goals
      Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines
      » Creation and management of complex Research Objects that take into account the dual nature (static and dynamic) of scientific workflows
      » Archival, classification, and indexing of scientific workflows and their associated materials in scalable semantic repositories, providing advanced access and recommendation capabilities
      » Creation of scientific communities to collaboratively share, reuse and evolve workflows and their parts, stimulating the development of new scientific knowledge
  • 13. Integrity & Authenticity: Definitions
      Integrity
        » The quality or condition of being whole, complete and unaltered
        » Crucial for ensuring the quality of preserved data in Research Objects
      Authenticity
        » Authenticity has a twofold dimension: data origin and entity authenticity
        » Data origin: proof of the origin of data, their genuineness, trustworthiness and realness
        » Entity: ensuring that an entity, e.g. a person or other kind of actor, is the one it claims to be
  • 14. Integrity and Authenticity Maintenance: Objectives
      Evaluate and preserve the integrity and authenticity of archived Research Objects
      » To ensure the data can be accessed and interpreted unchanged, complete, and correct today and in the future
      » To preserve the integrity of archived Research Objects by tracking and verifying changes in archived objects as well as related resources
      » To assist scientists in anticipating potential inconsistencies caused by uncontrolled changes in such resources
      » To verify and prove the authenticity of authors and contributors to Research Objects as well as of internal and related resources
      Approach: provenance-based means to calculate measures of integrity and authenticity
  • 15. Project Objectives: Common needs from the Community
      What do you want to know when accessing a workflow?
      » If I can use it for my purposes (in my words)
      » If I can expect it to run, when it was last run, by whom
      » What it does quickly, by one of
        › example input / output (and trying it)
        › a description
        › 'reading' its key parts
        › what it was used for
        › related workflows
        › its creator
        › contacting the creator or last user
      » How I need to cite the author and workflow
  • 16. Project Objectives: Common needs from the Community
      What do you want to know when sharing a workflow?
      » What rights others have
      » What a good workflow is to get a good score
      » Make my workflow findable, reusable, and ready for review
      » Instructions to authors
      » Two types of contributions: serious science, preliminary/playing around
      » If my workflow may have issues
      » What the system or other users think it does
      » How it relates to other things
      » Share freely or anonymously upon request
  • 17. Project Objectives: Main Challenges
      [Diagram] Areas: Quality; Preservation; Representation; Community; Attribution, accountability
      Challenges shown: Workflow + resources; Sharing & Reuse; Manipulation; Affinity; Classification, categorization; Store; Incremental development; Access; Similarity; Evolution; Credit, citation; Abstraction; Scale; Versioning
  • 18. Project Objectives: Main Challenges
      [Diagram, as on slide 17, with INTEGRITY and AUTHENTICITY overlaid] Areas: Quality; Preservation; Representation; Community; Attribution, accountability
  • 19. Evaluation, Validation & Community Building: Wf4Ever Case Studies
      Two workflow-intensive scientific case studies in the domains of Astronomy and Genomics
      Astronomy
        » Application area: Virtual Observatory (VO) data processing
          › Astrophysical quantities propagation
          › Source extraction on CCD images
          › Modeling of galaxy 3D data
        » Focus on bringing workflow-based methodologies into Astronomy
        » Creation of Golden Exemplars
        » Beachhead
      Genomics
        » Application areas: Biobanking and Genome-Wide analysis
          › Interpretation of GWAS data
          › Gene expression studies
        » Focus on authenticity and experimental reproducibility
        » Community! Lots of available workflows
          › myExperiment
          › SysMO-DB
        » Long tradition of workflow application
      Overall goals
        » To collect and preserve existing workflows and their related objects in each area
        » To create scientific communities around the use and preservation of scientific workflows
        » To apply, evaluate, and provide feedback on the results obtained from system and component-level research
  • 20. Astronomy WorkPackage: Main Goals
      WP5: Workflow Preservation in Astronomy
      Scientific contribution
        » Development of an online community of scientists working on Astronomy
        » Introduction of workflow and workflow preservation needs in Astronomy and the Virtual Observatory
        » Provide a set of workflows for frequently used complex task-combinations and demands in the Astronomy domain
      Technological contribution
        » Online repository of preserved Astronomy workflows identifying preservation needs
        » Creation of three Golden Exemplars workflows:
          › using Wf4Ever results
          › involving additional implementations for wrapping VO Web services
      Workflow-based methodology deployed in the VO community through exemplars and preservation methodologies
  • 21. Thanks for your Attention! Questions
      http://www.wf4ever-project.org
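
One objective on slide 14 is to preserve integrity by tracking and verifying changes in archived objects and related resources. The slides do not say how Wf4Ever implements this; they mention only provenance-based measures. As a rough illustration, the sketch below uses a plain checksum manifest instead. All names in it (`build_manifest`, `verify`, the resource file names) are hypothetical and chosen for this example only.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a resource's bytes."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(resources: Dict[str, bytes]) -> Dict[str, str]:
    """Record one checksum per resource at archival time."""
    return {name: digest(content) for name, content in resources.items()}


def verify(manifest: Dict[str, str],
           resources: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Compare current resources against the archived manifest."""
    report: Dict[str, List[str]] = {
        "unchanged": [], "changed": [], "missing": [], "added": []
    }
    for name, expected in manifest.items():
        if name not in resources:
            report["missing"].append(name)
        elif digest(resources[name]) == expected:
            report["unchanged"].append(name)
        else:
            report["changed"].append(name)
    # Resources that appeared after archival are reported separately.
    for name in resources:
        if name not in manifest:
            report["added"].append(name)
    for names in report.values():
        names.sort()
    return report


# Hypothetical Research Object contents (in-memory stand-ins for files).
archived = {
    "workflow.xml": b"<workflow/>",
    "input.dat": b"original data",
    "notes.txt": b"run notes",
}
manifest = build_manifest(archived)

# Later state: one resource altered, one lost, one new.
later = dict(archived)
later["input.dat"] = b"modified data"
del later["notes.txt"]
later["extra.csv"] = b"a,b"

report = verify(manifest, later)
print(report)
```

A checksum manifest of this kind only detects that a resource changed or disappeared. It says nothing about who changed it or why, which is where provenance and authenticity information would come in.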