IEEE eScience 2017
1. sciunits: Reusable Research Objects
School of Computing, College of Computing and Digital Media
Dai Hai Ton That, Gabe Fils, Zhihao Yuan, Tanu Malik
Presented by Gabe Fils
2. Problem Space
No easily creatable, readily reusable, efficiently versioned, discrete unit of computation exists.
Diagram labels: Virtualization, Distributed Version Control, Research Object; portable, self-contained, repeatable, collaborative, versioned, documented
3. Introducing: The sciunit
• Captures application executions
• Repeats executions
• Reproduces executions, changing input args
• Versioned executions stored as one sciunit
• Uses provenance for self-documentation
A reusable research object.
4. Sample Applications: FIE, VIC
FIE Application Workflow
• City of Chicago Food Inspections Evaluation Model
• Four applications
• Two languages
• 130 files
• 1580 dependencies
• 908 MB
https://github.com/chicago/food-inspections-evaluation
• Variable Infiltration Capacity
• Four applications
• Five languages
• 7 GB
https://vic.readthedocs.io
5. sciunit Architecture
https://bitbucket.org/geotrust/sciunit-cli
sciunit
• Lightweight
• Versioned
• Self-contained
sciunit server
• Stores sciunits
• Visualizes sciunits
sciunit client
• Linux CLI App
• C / Python
• Minimal Dependencies
6. Client package Command
Initialize or retrieve a sciunit:
Run FIE, capture into package:
7. Packaging Details
Package In Storage (133 MB):
2nd Version Of Package (133 MB):
FIE pkg-root Dir
1) Attach to process
2) Intercept system calls
3) Copy files / executables
4) Log system calls
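Steps 3 and 4 above can be sketched in Python under stated assumptions. Steps 1 and 2 (attaching to the process and intercepting its system calls) are not shown: the sketch starts from a hypothetical, already-captured list of (pid, syscall, path) events. The function name `package_events` and the log line format are illustrative, not taken from the sciunit code.

```python
import shutil
import tempfile
from pathlib import Path


def package_events(events, pkg_root):
    """Copy every file named in the events into pkg_root; return log lines.

    `events` is a hypothetical list of (pid, syscall, absolute_path) tuples.
    A real tracer would obtain them by intercepting system calls.
    """
    pkg_root = Path(pkg_root)
    log = []
    for pid, syscall, path in events:
        src = Path(path)
        # Mirror the absolute path underneath the package root.
        dst = pkg_root / src.relative_to(src.anchor)
        if src.is_file() and not dst.exists():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
        log.append(f"{pid} {syscall} {path}")
    return log


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "work" / "input.csv"
    src.parent.mkdir(parents=True)
    src.write_text("id,score\n1,0.9\n")
    pkg_root = Path(tmp) / "pkgroot"

    log = package_events([(61021, "open", str(src))], pkg_root)
    copied = pkg_root / src.relative_to(src.anchor)
    copied_ok = copied.read_text() == src.read_text()
```

Mirroring the full absolute path under the package root is what lets a later repeat redirect each path by prefixing it with that root.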
8. Client repeat Command
Repeat a package:
1) Attach to process
2) Replace system call args
original call: 61021 exec "/usr/bin/python"
replaced call: 61021 exec "/home/user1/pkgroot/usr/bin/python"
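The argument replacement on this slide amounts to prefixing an absolute path with the package root. A minimal Python sketch, assuming a call is represented as a (pid, syscall, path) tuple; `redirect_path` and `rewrite_call` are hypothetical names, and the actual tool performs this rewrite on the traced process rather than on tuples.

```python
import posixpath


def redirect_path(path, pkg_root):
    """Map an absolute path onto the same location under pkg_root."""
    if not posixpath.isabs(path):
        return path  # relative paths are left untouched in this sketch
    return posixpath.join(pkg_root, path.lstrip("/"))


def rewrite_call(call, pkg_root):
    """Return a copy of a (pid, syscall, path) record with its path redirected."""
    pid, syscall, path = call
    return (pid, syscall, redirect_path(path, pkg_root))


original = (61021, "exec", "/usr/bin/python")
replaced = rewrite_call(original, "/home/user1/pkgroot")
print(replaced)  # (61021, 'exec', '/home/user1/pkgroot/usr/bin/python')
```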
9. Versioning Solution
80G File, Fixed 4K Chunks:
Same File, 1 Byte Inserted At Start:
• When to store?
  • During packaging
  • After packaging
• How to store?
  • Line-based diffs
  • Fixed-size chunks
  • Content-defined
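The fixed-size chunk problem pictured on this slide can be reproduced at small scale. The sketch below is an illustration only: it uses 64 KiB of seeded pseudo-random bytes as a stand-in for the 80G file and SHA-256 as an assumed chunk identifier.

```python
import hashlib
import random


def fixed_chunks(data, size):
    """Split data into consecutive chunks of `size` bytes (last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def digests(chunks):
    """Set of SHA-256 digests, one per chunk."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))  # stand-in file
shifted = b"\x00" + original  # same file, 1 byte inserted at start

before = digests(fixed_chunks(original, 4096))
after = digests(fixed_chunks(shifted, 4096))
print(len(before), "chunks before;", len(before & after), "shared after the insert")
```

Every chunk boundary moves by one byte, so no chunk of the new version matches a stored one, which is the motivation for content-defined boundaries.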
10. Rabin Hash
• Hash of subset of file bytes ($RH(B_1, B_2, \ldots, B_n)$)
• Fixed-size sliding window n
• Hash at any position i ($RH(X_{(i,n)})$)
• Deduplicate chunk
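A rough sketch of content-defined chunking with a sliding-window hash follows. It uses a simple polynomial rolling hash (Rabin-Karp style) as a stand-in for the Rabin fingerprint named on the slide; the window length, base, modulus and cut mask are all assumed values chosen for illustration.

```python
import hashlib
import random

WINDOW = 48            # sliding window length n (assumed value)
BASE = 257             # polynomial base (assumed value)
MOD = (1 << 61) - 1    # large prime modulus (assumed value)
MASK = (1 << 10) - 1   # cut when the low 10 bits are zero (about 1 KiB chunks)


def cdc_chunks(data):
    """Cut data wherever the hash of the last WINDOW bytes has zero low bits."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def digests(chunks):
    """Set of SHA-256 digests, one per chunk."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
shifted = b"\x00" + original  # 1 byte inserted at start

before = digests(cdc_chunks(original))
after = digests(cdc_chunks(shifted))
print(len(before), "chunks before;", len(before & after), "shared after the insert")
```

Because each cut depends only on the bytes inside the window, the boundaries move with the content: after the insert, only the first chunk changes and every later chunk is already in storage.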
11. Storage And Retrieval
Deduplicated Container Storage
Store package:
1) Archive package-root
2) CDC on archive
3) Store manifest
Retrieve package:
1) Retrieve manifest
2) Concatenate chunks
3) Extract archive
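The store and retrieve steps can be sketched with an in-memory chunk table and a manifest per version. This is an assumption-laden illustration: `ChunkStore` is a hypothetical class, the archive is taken to be a byte string that has already been produced (archiving and extraction are not shown), and a tiny fixed-size chunker stands in for CDC to keep the example readable.

```python
import hashlib


class ChunkStore:
    """In-memory deduplicated store: chunk digest -> chunk bytes."""

    def __init__(self):
        self.chunks = {}
        self.manifests = {}

    def store(self, version, archive, chunker):
        """Chunk the archive bytes, keep only unseen chunks, record a manifest."""
        manifest = []
        for chunk in chunker(archive):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # each distinct chunk stored once
            manifest.append(digest)
        self.manifests[version] = manifest
        return manifest

    def retrieve(self, version):
        """Concatenate the chunks listed in the manifest to rebuild the archive."""
        return b"".join(self.chunks[d] for d in self.manifests[version])


def fixed_chunker(data, size=4):
    """Placeholder chunker; a content-defined chunker would be plugged in here."""
    return [data[i:i + size] for i in range(0, len(data), size)]


store = ChunkStore()
store.store("v1", b"AAAABBBBCCCC", fixed_chunker)
store.store("v2", b"AAAABBBBDDDD", fixed_chunker)
print(len(store.chunks), "chunks held for 6 chunk references")
```

The two versions share their first two chunks, so the store holds four chunks rather than six, and each version is rebuilt from its own manifest.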
12. Detailed Visualization
Part Of A Normal (Verbose) Provenance Log
Small Section Of Graph Built From Normal Provenance Log
13. Summarization: Group By Similarity
• Group vertices by type / connections
• Effect: group subprocesses, group files in directory
Similarity Rule
Type(u) = Type(v), Input(u) = Input(v), Output(u) = Output(v)
Full Graph
Similarity Applied
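One way to read the similarity rule is as a grouping key. The sketch below assumes Input(u) is the set of vertices with an edge into u and Output(u) is the set of vertices u has an edge to; that reading, the function name and the toy graph are illustrative assumptions.

```python
from collections import defaultdict


def group_by_similarity(types, edges):
    """Group vertices that share type, input set and output set.

    types: dict vertex -> "file" or "process"
    edges: iterable of directed (source, target) pairs
    """
    inputs = defaultdict(set)
    outputs = defaultdict(set)
    for src, dst in edges:
        outputs[src].add(dst)
        inputs[dst].add(src)

    groups = defaultdict(list)
    for v in types:
        key = (types[v], frozenset(inputs[v]), frozenset(outputs[v]))
        groups[key].append(v)
    return sorted(sorted(g) for g in groups.values())


types = {"run.py": "process", "a.csv": "file", "b.csv": "file", "out.txt": "file"}
edges = [("a.csv", "run.py"), ("b.csv", "run.py"), ("run.py", "out.txt")]
print(group_by_similarity(types, edges))
# [['a.csv', 'b.csv'], ['out.txt'], ['run.py']]
```

The two input files are read by the same process and nothing else, so they collapse into one group, matching the "group files in directory" effect.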
14. Summarization: Pack
• Find min-connected nodes, pack into hubs
Packability Rules
1) $\mathrm{Type}(u) = \mathrm{file},\ \{\, \exists! e \mid e \in E \wedge ( e=(u,v) \vee e=(v,u) ) \,\}$
2) $\mathrm{Type}(u) = \mathrm{process},\ \{\, \exists! e \mid e \in E \wedge e=(u,v) \,\}$
3) $\mathrm{Type}(u) = \mathrm{file},\ \{\, \exists! (e_1,e_2) \mid ( \exists x \in V,\ v \neq x ) \wedge ( e_1=(u,v) \in E,\ e_2=(x,u) \in E ) \,\}$
Packability Applied
Similarity Applied
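As an illustration of the first packability rule only (a file vertex touched by exactly one edge), the sketch below packs each such file into the vertex at the other end of its edge. Treating that neighbour as the hub is an assumption, rules 2 and 3 are not implemented, and the names are hypothetical.

```python
from collections import defaultdict


def pack_single_edge_files(types, edges):
    """Return {hub: [packed files]} for file vertices with exactly one edge."""
    incident = defaultdict(list)
    for src, dst in edges:
        incident[src].append((src, dst))
        incident[dst].append((src, dst))

    packed = defaultdict(list)
    for u, kind in types.items():
        if kind == "file" and len(incident[u]) == 1:
            src, dst = incident[u][0]
            hub = dst if src == u else src  # the vertex at the other end
            packed[hub].append(u)
    return {hub: sorted(nodes) for hub, nodes in packed.items()}


types = {
    "p1": "process", "p2": "process",
    "lib1.so": "file", "lib2.so": "file", "data.csv": "file",
}
edges = [("lib1.so", "p1"), ("lib2.so", "p1"), ("p1", "data.csv"), ("data.csv", "p2")]
print(pack_single_edge_files(types, edges))
# {'p1': ['lib1.so', 'lib2.so']}
```

Here data.csv has two edges (written by p1, read by p2), so it stays visible while the two libraries fold into p1.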
15. Summarization: Annotate
• Higher precedence to process nodes
• File with n > 1 edges → n annotations
Packability Applied
Annotation Applied
16. Package / Repeat Performance
• Added ptrace system calls
• I/O-intensive apps: VIC
• Non-I/O-intensive apps: FIE
Package/Repeat Runtimes
1) Run app normally
2) Run with package
3) Run with repeat
17. Versioning Performance
Commit/Reconstruct Times
Storage Sizes
Package/Repeat Runtimes
1) Size of several versions
2) Size after deduplication
3) CDC / concatenation time
18. Results From Provenance Summarization
Reduction Of File Nodes
Reduction Of Edges
Reduction Of Process Nodes
1) Full FIE graph
2) All techniques applied
3) Dynamic expansion
19. Conclusion And Current Work
• Graph summarization testing
• Database applications
• Exact partial repeatability
• Apps with network-operations
• Parallel HPC applications
• Emerging reusable object formats
sciunit is a portable, self-contained, and inherently understandable versioned unit of computation.
20. Links And Acknowledgements
sciunit:
• https://sciunit.run
sciunit paper:
• https://arxiv.org
• Search for "sciunit"
National Science Foundation grants ICER-1639759, ICER-1661918, ICER-1440327, ICER-1343816

Sciunits: Resuable Research Object

  • 1. 1IEEE eScience 2017 sciunits:sciunits: Reusable Research ObjectsReusable Research Objects School of Computing, College of Computing and Digital MediaSchool of Computing, College of Computing and Digital Media Dai Hai Ton That, Gabe Fils,Dai Hai Ton That, Gabe Fils, Zhihao Yuan, Tanu MalikZhihao Yuan, Tanu Malik Presented by Gabe FilsPresented by Gabe Fils
  • 2. 2IEEE eScience 2017 Problem SpaceProblem Space No easily creatable, readily reusable, efficiently versioned,No easily creatable, readily reusable, efficiently versioned, discrete unit of computation existsdiscrete unit of computation exists Virtualization Distributed Version Control Research Object portable self-contained repeatable collaborative versioned documented
  • 3. 3IEEE eScience 2017 Introducing: TheIntroducing: The sciunitsciunit  Captures application executionsCaptures application executions  Repeats executionsRepeats executions  Reproduces executions, changing input argsReproduces executions, changing input args  Versioned executions stored as oneVersioned executions stored as one sciunitsciunit  Uses provenance for self-documentationUses provenance for self-documentation AA reusablereusable research object.research object.
  • 4. 4IEEE eScience 2017 Sample Applications: FIE, VICSample Applications: FIE, VIC FIE Application WorkflowFIE Application Workflow  City of Chicago Food InspectionsCity of Chicago Food Inspections Evaluation ModelEvaluation Model  Four applicationsFour applications  Two languagesTwo languages  130 files130 files  1580 dependencies1580 dependencies  908 MB908 MB https://github.com/chicago/food-inspections-https://github.com/chicago/food-inspections- evaluationevaluation  Variable Infiltration CapacityVariable Infiltration Capacity  Four applicationsFour applications  Five languagesFive languages  7 GB7 GB https://vic.readthedocs.iohttps://vic.readthedocs.io
  • 5. 5IEEE eScience 2017 sciunitsciunit ArchitectureArchitecture https://bitbucket.org/geotrust/sciunit-clihttps://bitbucket.org/geotrust/sciunit-cli sciunitsciunit clientclient sciunitsciunit serverserver sciunitsciunit  LightweightLightweight  VersionedVersioned  Self-containedSelf-contained  StoresStores sciunitssciunits  VisualizesVisualizes sciunitssciunits  Linux CLI AppLinux CLI App  C / PythonC / Python  Minimal DependenciesMinimal Dependencies
  • 6. 6IEEE eScience 2017 ClientClient packagepackage CommandCommand Initialize or retrieve aInitialize or retrieve a sciunitsciunit:: Run FIE, capture into package:Run FIE, capture into package:
  • 7. 7IEEE eScience 2017 Packaging DetailsPackaging Details Package In Storage (133 MB):Package In Storage (133 MB): 22ndnd Version Of Package (133 MB):Version Of Package (133 MB): FIE pkg-root DirFIE pkg-root Dir 1) Attach to process1) Attach to process 2) Intercept system calls2) Intercept system calls 3) Copy files / executables3) Copy files / executables 4) Log system calls4) Log system calls
  • 8. 8IEEE eScience 2017 ClientClient repeatrepeat CommandCommand original calloriginal call 61021 exec61021 exec “/usr/bin/python”“/usr/bin/python” replaced callreplaced call 61021 exec61021 exec “/home/user1/pkgroot/usr/bin/python”“/home/user1/pkgroot/usr/bin/python” Repeat a package:Repeat a package: 1) Attach to process1) Attach to process 2) Replace system call args2) Replace system call args
  • 9. 9IEEE eScience 2017 Versioning SolutionVersioning Solution 80G File, Fixed 4K Chunks:80G File, Fixed 4K Chunks: Same File, 1 Byte Inserted At Start:Same File, 1 Byte Inserted At Start:  When to store?When to store?  During packagingDuring packaging  After packagingAfter packaging  How to store?How to store?  Line-based diffsLine-based diffs  Fixed-size chunksFixed-size chunks  Content-definedContent-defined
  • 10. 10IEEE eScience 2017 Rabin HashRabin Hash  Hash of subset of file bytes (Hash of subset of file bytes (RH(BRH(B11,, BB22, …, … BBnn))))  Fixed-size sliding windowFixed-size sliding window nn  Hash at any positionHash at any position ii ((RH(XRH(X(i,n)(i,n)))))  Deduplicate chunkDeduplicate chunk
  • 11. Storage And Retrieval. Deduplicated Container Storage. Store package: 1) Archive package-root 2) CDC on archive 3) Store manifest. Retrieve package: 1) Retrieve manifest 2) Concatenate chunks 3) Extract archive
  • 12. Detailed Visualization. Figures: Part Of A Normal (Verbose) Provenance Log; Small Section Of Graph Built From Normal Provenance Log
  • 13. Summarization: Group By Similarity. Group vertices by type / connections. Effect: group subprocesses, group files in directory. Similarity Rule: Type(u) = Type(v), Input(u) = Input(v), Output(u) = Output(v). Figures: Full Graph; Similarity Applied
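The similarity rule on slide 13 can be sketched as a grouping key over a directed provenance graph. This sketch assumes that Input(u) is the set of vertices with an edge into u and Output(u) is the set of vertices u has an edge to; the slides do not define these terms, and the function name and sample graph are hypothetical.

```python
from collections import defaultdict


def group_by_similarity(types: dict, edges: list) -> list:
    """Group vertices that share type, input set, and output set."""
    inputs, outputs = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        outputs[src].add(dst)
        inputs[dst].add(src)

    groups = defaultdict(list)
    for v in sorted(types):
        key = (types[v], frozenset(inputs[v]), frozenset(outputs[v]))
        groups[key].append(v)
    return sorted(groups.values())


# One process reads a config file and writes three files.
types = {"p": "process", "cfg": "file", "f1": "file", "f2": "file", "f3": "file"}
edges = [("cfg", "p"), ("p", "f1"), ("p", "f2"), ("p", "f3")]

groups = group_by_similarity(types, edges)
print(groups)
```

The three output files have the same type, the same input (`p`), and no outputs, so they collapse into one group, matching the "group files in directory" effect named on the slide.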
  • 14. Summarization: Pack. Find min-connected nodes, pack into hubs. Packability Rules: 1) $\mathrm{Type}(u) = \text{file},\ \{\exists!\,e \mid e \in E \wedge (e=(u,v) \vee e=(v,u))\}$ 2) $\mathrm{Type}(u) = \text{process},\ \{\exists!\,e \mid e \in E \wedge e=(u,v)\}$ 3) $\mathrm{Type}(u) = \text{file},\ \{\exists!\,(e_1,e_2) \mid (\exists x \in V,\ v \neq x) \wedge (e_1=(u,v) \in E,\ e_2=(x,u) \in E)\}$. Figures: Similarity Applied; Packability Applied
  • 15. Summarization: Annotate. Higher precedence to process nodes. File with n > 1 edges → n annotations. Figures: Packability Applied; Annotation Applied
  • 16. Package / Repeat Performance. Added ptrace system calls. I/O-intensive apps: VIC. Non-I/O-intensive apps: FIE. Package/Repeat Runtimes: 1) Run app normally 2) Run with package 3) Run with repeat
  • 17. Versioning Performance. Figures: Commit/Reconstruct Times; Storage Sizes; Package/Repeat Runtimes. 1) Size of several versions 2) Size after deduplication 3) CDC / concatenation time
  • 18. Results From Provenance Summarization. Figures: Reduction Of Edges; Reduction Of File Nodes; Reduction Of Process Nodes. 1) Full FIE graph 2) All techniques applied 3) Dynamic expansion
  • 19. Conclusion And Current Work. Graph summarization testing; database applications; exact partial repeatability; apps with network-operations; parallel HPC applications; emerging reusable object formats. sciunit is a portable, self-contained, and inherently understandable versioned unit of computation.
  • 20. Links And Acknowledgements. sciunit: https://sciunit.run. sciunit paper: https://arxiv.org (search for "sciunit"). National Science Foundation grants ICER-1639759, ICER-1661918, ICER-1440327, ICER-1343816