Linking the prospective and retrospective provenance of scripts

Saumen Dey, Khalid Belhajjame, David Koop, Meghan Raul, Bertram Ludäscher

 Scientists process and analyze their datasets using
scripts.
 Retrospective provenance information can be useful for
scientists to analyze of script execution
 E.g., noWorkflow [Murta et al., IPAW’14]
 While useful, the amount of information such fine-
grained traces contain can be overwhelming for end
users.
There is a need for abstraction techniques that focus the attention
of users on the provenance information relevant for their analyses.
TaPP'15 2

Workflow-Oriented Proposals
 Zoom*UserViews [Biton et al., ICDE’08]:Workflow views are used for
abstracting the retrospective provenance of (potentially complex) workflows
 LabelFlow [Alper et al., IPAW’14]: Both (prospective and retrospective)
workflow provenance are summarized using user annotations and reduction
rules.
Script-Oriented Proposals
 YesWorkflow [McPhillips et al.,TaPP’15]:We have seen in the presentation by
B. Ludäscher, how a URI scheme can be used to reconstruct part of the
retrospective provenance, without recording it at run time.
 This scheme is useful when the data products used and generated,
including intermediate ones, are stored within the file system.
We adopt a similar approach as Zoom*UserViews and apply it to scripts:
We use the workflow specifications (i.e., prospective provenance) extracted by
YesWorkflow to abstract the retrospective provenance captured by noWorkflow
TaPP'15 3

Script Annotate
Extract
Workflow
Specification
Workflow
Description
YesWorkflow
TaPP'15 4

Script Annotate
Extract
Workflow
Specification
Workflow
Description
YesWorkflow
read_temperatureDataFile pastTemperatureData
read_precipitationDataFile pastPrecipitationData
simulate simulatedWeatherData
extract_temperature temperatureData
extract_precipitation precipitationData
create_plot plot
temperatureDataFile
precipitationDataFile
TaPP'15 5

Script
Run Retrospective
Provenance
noWorkflow
Annotate
Extract
Workflow
Specification
Workflow
Description
YesWorkflow
• variable(6, 93, row, 14, ”[0.0]” ,
1430231173.397779).
• dependency(6, 34, 94, 93).
TaPP'15 6

Script
Link* Query
Provenance
User
Run Retrospective
Provenance
noWorkflow
Annotate
Extract
Workflow
Specification
Workflow
Description
YesWorkflow
Link*: Links retrospective and prospective provenancesTaPP'15 7

 Variable names are not reliable IDs.
 Two different variables may have the same name within a
script, e.g., within the scope of different functions
 noWorkflow tracks line numbers in the script to identify the
variable
 YesWorkflow does not provide this information
 However, it provides the line numbers of the start and end of a
block, when requested.
 Using the two pieces of information, we are able to connect data
values in noWorkflows to their correspondingYesWorkflow
variables.
TaPP'15 8

 The variables inYesWorkflows are associated with user
annotations
 Such annotations can be used to provide more information
about the data values within noWorkflow retrospective
provenance
 We note here that it is possible inYesWorkflow to specify
variables that are not mapped to any variables within the
script
TaPP'15 9

-
d
d
e
d
r
y
y
e
-
1 # @begi n model
2 # @i n a @as past Temper at ur eDat a
3 # @i n b @as past Pr eci pi t at i onDat a
4 # @out dat a @as si mul at edWeat her
5 i f ( sum( a) / l en( a) ) < 0:
6 dat a = model 1( a, b)
7 el se:
9 # @end model
Figure3. Annotating anif-then-elsecontrol block using YesWork-
ﬂow
model
pastTemperatureData pastPrecipitationData
simulatedWeather
e
t
d
e
)
n
-
-
-
-
d
s
s
n
-
a
d
-
e
t
s
s
24 ## @begi n mai n
25 # @i n dat a1. dat @as t emper at ur eDat aFi l e @URI
f i l e: t emp. dat
26 # @i n dat a2. dat @as pr eci pi t at i onDat aFi l e @URI
f i l e: pr eci p. dat
27 # @out out put . png @as pl ot
28
29 ## @begi n r ead_f i l e_1
30 # @i n dat a1. dat @as t emper at ur eDat aFi l e @URI
f i l e: t emp. dat
31 # @out a @as past Temper at ur eDat a
32 a = r ead_f i l e( " dat a1. dat " )
33 ## @end r ead_f i l e_1
34
35 ## @begi n r ead_f i l e_2
36 # @i n dat a2. dat @as pr eci pi t at i onDat aFi l e @URI
f i l e: pr eci p. dat
37 # @out b @as past Pr eci pi t at i onDat a
38 b = r ead_f i l e( " dat a2. dat " )
39 ## @end r ead_f i l e_2
40
41 i f sum( a) / l en( a) < 0:
42 ## @begi n model _1
47 ## @end model _1
48 el se:
49 ## @begi n model _2
54 ## @end model _2
55
56 ## @begi n ext r act _t emper at ur e
model_1 Model_2
simulatedWeather
TaPP'15 10

 We found gaps in the noWorkflow dependency graph (when
exported to prolog)
 Function returns do not always link back to correct
variable.
 An object that is modified (e.g via a list.append call) is not
always captured in the dependency graph
 We also observed that noWorkflow does not capture
information about the name of the script in the retrospective
provenance
 This is required for uniquely identifying variables.
TaPP'15 11

 An object may be nested, i.e., it may have
other objects as is children.
 Any change to the child object is attributed to
the parent object.
 noWorkflow attributes any changes to any
child object to the ultimate parent.
TaPP'15 12

Q1: find the temperature file used.
fileName(N) :-
map(V,A),
A=“temperatureDataFile”,
iData(V,N).
Q2: show how the temperature file was used by.
fileUsedBy(X,Y) :-
iDepAbs(X,Y),
map(X,A),
A=“temperatureDataFile”.
fileUsedBy(X,Y) :-
fileUsedBy(X,Z).
iDepAbs(Z,Y).
TaPP'15 16

 We have implemented an approach for linking the retrospective
provenance of script to a more abstracted and user defined
prospective provenance.
 This solution is complementary to theYesWorkflow solution for
capturing retrospective provenance of data files [1]
 The issues that we came across usingYesWorkflow and
noWorkflow were communicated to the development teams of
these tools to address them
 Linking prospective and retrospective provenance for multiple
scripts as opposed to a single script
 Designers may organize the implementation their data analyses
into multiple scripts.
[1]T. M. McPhillips et al. Retrospective provenance without a run-time provenance recorder. InTapp, 2015
TaPP'15 17

Linking the prospective and retrospective provenance of scripts

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Destaque

Destaque (7)

Semelhante a Linking the prospective and retrospective provenance of scripts

Semelhante a Linking the prospective and retrospective provenance of scripts (20)

Mais de Khalid Belhajjame

Mais de Khalid Belhajjame (12)

Último

Último (20)

Linking the prospective and retrospective provenance of scripts

Notas do Editor