ETC & Authors in the Driver’s Seat
vs
YesWorkflow: Revealing data-/workflow from scripts
Kurator: Automating data curation workflows
EulerX: Agreeing to disagree about taxonomies
Whole-Tale: Reproducible, computational narratives
Bertram Ludäscher
ludaesch@illinois.edu
ETC+Authors @ Biosphere 2
2018-01-10..12
Director, Center for Informatics Research in Science & Scholarship (CIRSS)
School of Information Sciences (iSchool@Illinois)
& National Center for Supercomputing Applications (NCSA)
& Department of Computer Science (CS@Illinois)
Authors Driving …
• Curators: deal with problems of data quality, reuse, interoperability, etc. as soon as they can
– but often only “down the road” …
• Authors: address (meta-)data quality upstream
– … at the source, when the data is created
=> Resonates with the “empowering scientists” theme we are pursuing in other projects (e.g., Whole Tale, YesWorkflow …)
Ludäscher: Workflows & Provenance => Understanding 2
Provenance (Lineage) matters …
• One of these sold for $180M, the other one for $22K (but could be worth more ... definitely maybe ...)
• Which one would you like to own?
Provenance (Lineage) matters …
• One of these sold for $180M, the other one for …
• … $450M!!!
Provenance is: keeping records …
• Grand Canyon’s rock layers are a record of the early geologic history of North America. The Ancestral Puebloan granaries at Nankoweap Creek tell archaeologists about more recent human history. (By Drenaline, licensed under CC BY-SA 3.0)
• Not shown: computational archaeologists reconstructing past climate from multiple tree-ring databases => computational provenance is key for transparency & reproducibility
... and provenance is: understanding what happened!
Zrzavý, Jan, David Storch, and Stanislav Mihulka. Evolution: Ein Lese-Lehrbuch. Springer-Verlag, 2009.
Author: Jkwchui (based on drawing by Truth-seeker2004)
Computational Provenance …
• Origin and processing history of artifacts
– data products, figures, ...
– also: the underlying workflow
=> understand methods, dataflow, and dependencies
[Figure: “Climate Change Impacts in the United States”, U.S. National Climate Assessment, U.S. Global Change Research Program]
Rewind: Data Curation Workflows
(Filtered-Push … Kepler … Kurator projects)
Data Curation Workflows & Provenance
• Data curation and data cleaning workflows …
– … can be defined using a workflow system
• workflow = “prospective” provenance (= general recipe)
– ... or using good old scripts (bash, Python, R, ...)
• … which is what many “mere mortals” use!
• Script-based workflows …
– … benefit from having the workflow exposed and the dataflow dependencies revealed
Runtime Provenance (a.k.a. traces, logs, retrospective provenance, “Trace-land”)
Workflow Modeling & Design (a.k.a. prospective provenance, “Workflow-land”)
Workflows <=> Provenance: an important link!
ProvONE = W3C PROV + DataONE extensions
[Figure: ProvONE model with three parts: Trace, Workflow, Data (extensible)]
See purl.dataone.org/provone-v1-dev
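A minimal Python sketch of this link: a prospective step declares its ports, a retrospective execution record says which concrete files were used and generated at those ports, and a trace conforms to the workflow when every recorded file sits on a declared port. The names `Program`, `Execution`, and `conforms` are hypothetical and only loosely echo ProvONE's workflow/trace split and the W3C PROV `used`/`wasGeneratedBy` relations; this is not the ProvONE vocabulary or any library's API, and the ports of `transform_images` are assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    """Prospective provenance: a workflow step and its declared ports (hypothetical)."""
    name: str
    in_ports: set[str] = field(default_factory=set)
    out_ports: set[str] = field(default_factory=set)


@dataclass
class Execution:
    """Retrospective provenance: one run of a Program, recording which concrete
    entity (here: a file path) was used or generated at each port."""
    program: Program
    used: dict[str, str] = field(default_factory=dict)       # port -> entity
    generated: dict[str, str] = field(default_factory=dict)  # port -> entity


def conforms(execution: Execution) -> bool:
    """True if every recorded entity is attached to a port the step declares."""
    p = execution.program
    return set(execution.used) <= p.in_ports and set(execution.generated) <= p.out_ports


# Assumed ports for a transform_images step (illustrative only).
transform = Program("transform_images",
                    in_ports={"raw_image", "calibration_image"},
                    out_ports={"corrected_image"})

run = Execution(transform,
                used={"raw_image": "run/raw/q55/DRT240/e10000/image_001.raw",
                      "calibration_image": "calibration.img"},
                generated={"corrected_image": "run/data/DRT240/DRT240_10000eV_001.img"})

print(conforms(run))  # True
```

The check only relates trace to workflow at the port level; a fuller model would also record when each execution ran and which workflow version it belonged to.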
Related Projects: NSF DataONE (ProvONE ..) + …
• … NSF SKOPE: system and tools to discover, access, analyze, and visualize paleoenvironmental data
– unprecedented ability to explore provenance (a detailed, comprehensible record of the computational derivation of results)
– for researchers, tinkerers, and modelers
• … NSF Whole Tale:
– leverage & contribute to existing cyberinfrastructure (CI) to support the whole tale (“living paper”), from workflow run to scholarly publication
– integrate tools & CI (DataONE, Globus, iRODS, NDS, ...) to simplify use and promote best practices
– driven by science working groups (Archaeology/SKOPE, materials science, astro, bio ..)
Provenance Support for Reproducible Science
Example: Paleoclimate Reconstruction
Science paper (OA) uses:
• open source code:
– R, PaleoCAR, …
• Is that all we need?
• What was the “workflow”?
• Is there prospective and/or retrospective provenance?
SKOPE: Synthesized Knowledge Of Past Environments
Bocinsky, Kohler et al. study rain-fed maize of the Anasazi
– Four Corners; AD 600–1500. Climate change influenced the Mesa Verde migrations of the late 13th century AD. The study uses a network of tree-ring chronologies to reconstruct a spatio-temporal climate field at a fairly high resolution (~800 m) from AD 1–2000. The algorithm estimates the joint information in tree rings and a climate signal to identify the “best” tree-ring chronologies for climate reconstruction.
K. Bocinsky, T. Kohler, A 2000-year reconstruction of the rain-fed maize agricultural niche in the US Southwest. Nature Communications. doi:10.1038/ncomms6618
… implemented as an R script …
YesWorkflow: Prospective & Retrospective Provenance … (almost) for free!
• YW annotations in a (Python, R, …) script recreate a workflow view from the script …
[Figure: YesWorkflow graph recreated from an annotated script. Steps: initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, log_average_image_intensity. Inputs: cassette_id, sample_score_cutoff, sample_spreadsheet (file:cassette_{cassette_id}_spreadsheet.csv), calibration_image (file:calibration.img). Files with URI templates: run_log (file:run/run_log.txt), rejection_log (file:/run/rejected_samples.txt), raw_image (file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw), corrected_image (file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img), collection_log (file:run/collected_images.csv).]
YW!
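To illustrate how comments alone can yield a workflow view, here is a much-simplified Python reader for `@BEGIN`/`@IN`/`@OUT`/`@END` comment tags. It is a hypothetical sketch, not the YesWorkflow toolkit: it ignores `@AS`, `@PARAM`, `@URI`, and nested blocks. Step and data names are taken from the figure; the port lists are simplified.

```python
import re

# Matches "# @BEGIN name", "# @IN name", etc. (case-insensitive); anything after
# the first name on the line (e.g. "@AS alias") is ignored.
TAG = re.compile(r"#\s*@(BEGIN|END|IN|OUT)\s+(\S+)", re.IGNORECASE)


def parse_steps(script: str) -> dict[str, dict[str, list[str]]]:
    """Map each @BEGIN block name to the data names it declares via @IN/@OUT."""
    steps: dict[str, dict[str, list[str]]] = {}
    open_blocks: list[str] = []
    for line in script.splitlines():
        m = TAG.search(line)
        if m is None:
            continue
        tag, name = m.group(1).upper(), m.group(2)
        if tag == "BEGIN":
            open_blocks.append(name)
            steps[name] = {"in": [], "out": []}
        elif tag == "END":
            if open_blocks:
                open_blocks.pop()
        elif open_blocks:
            steps[open_blocks[-1]]["in" if tag == "IN" else "out"].append(name)
    return steps


def data_edges(steps):
    """(producer step, consumer step, data) for every @OUT that another step reads via @IN."""
    producer = {d: s for s, ports in steps.items() for d in ports["out"]}
    return sorted((producer[d], s, d)
                  for s, ports in steps.items()
                  for d in ports["in"] if d in producer)


script = """
# @BEGIN load_screening_results
# @IN sample_spreadsheet
# @OUT sample_name
# @OUT sample_quality
# @END load_screening_results

# @BEGIN calculate_strategy
# @IN sample_name
# @IN sample_quality
# @OUT accepted_sample
# @OUT rejected_sample
# @END calculate_strategy

# @BEGIN log_rejected_sample
# @IN rejected_sample
# @OUT rejection_log
# @END log_rejected_sample
"""

for edge in data_edges(parse_steps(script)):
    print(edge)
```

Two steps are connected whenever one step's `@OUT` name reappears as another step's `@IN`, which is the basic idea behind recovering dataflow from an annotated script.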
@BEGIN .. @END ..
@IN .. @OUT ..
@URI .. @LOG ..
[Figure: YesWorkflow graph of the paleoclimate reconstruction R script. Steps: GetModernClimate, SubsetAllData, CAR_Analysis_unique, CAR_Analysis_union, CAR_Reconstruction_union, CAR_Reconstruction_union_output. Inputs: master_data_directory, prism_directory, tree_ring_data, calibration_years, retrodiction_years. Intermediate data: PRISM_annual_growing_season_precipitation, dendro_series_for_calibration, dendro_series_for_reconstruction, cellwise_unique_selected_linear_models, cellwise_union_selected_linear_models, raster_brick_spatial_reconstruction, raster_brick_spatial_reconstruction_errors. Outputs: ZuniCibola_PRISM_grow_prcp_ols_loocv_union_recons.tif, ZuniCibola_PRISM_grow_prcp_ols_loocv_union_errors.tif.]
Paleoclimate Reconstruction (openSKOPE.org)
• … explained using YesWorkflow!
Kyle B., (computational) archaeologist: "It took me about 20 minutes to comment. Less than an hour to learn and YW-annotate, all-told."
YW Demo Use Cases (IDCC’17)
Domain | Use case | Programming language | Provenance methods
Climate science | C3C4 | MATLAB | YW + MATLAB RunManager
Astrophysics | LIGO | Python | YW + NW (code-level)
Protein crystal samples | Simulate data collection | Python | YW + NW (code-level)
Biodiversity data curation | kurator-SPNHC | Python | YW-recon + YW-logging
Social network analysis | Twitter | Python | YW + NW (file-level)
Oceanography | OHIBC Howe Sound (multi-run, multi-script) | R | YW + R RunManager
run/  
├──  raw  
│      └──  q55  
│              ├──  DRT240  
│              │      ├──  e10000  
│              │      │      ├──  image_001.raw  
...          ...  ...  ...  
│              │      │      └──  image_037.raw  
│              │      └──  e11000  
│              │              ├──  image_001.raw  
...          ...          ...  
│              │              └──  image_037.raw  
│              └──  DRT322  
│                      ├──  e10000  
│                      │      ├──  image_001.raw  
...                  ...  ...  
│                      │      └──  image_030.raw  
│                      └──  e11000  
│                              ├──  image_001.raw  
...                          ...  
│                              └──  image_030.raw  
├──  data  
│      ├──  DRT240  
│      │      ├──  DRT240_10000eV_001.img  
...  ...  ...  
│      │      └──  DRT240_11000eV_037.img  
│      └──  DRT322  
│              ├──  DRT322_10000eV_001.img  
...          ...  
│              └──  DRT322_11000eV_030.img  
│  
├──  collected_images.csv  
├──  rejected_samples.txt  
└──  run_log.txt  
  
YW-RECON: Prospective & Retrospective Provenance … (almost) for free!
[Figure: YesWorkflow graph of the script, annotated with URI templates such as file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw and file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img]
• URI templates link conceptual entities to the runtime provenance “left behind” by the script author …
• … facilitating provenance reconstruction
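One way to see how a URI template links a conceptual entity to the files left behind by a run: compile the template into a regular expression with one named group per `{variable}`, so that every matching path yields a set of variable bindings. The helper names `template_to_regex` and `bind` are hypothetical, and this is a sketch under stated assumptions (values never contain `/`; a variable that occurs twice must repeat its value), not YesWorkflow's implementation.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Compile a YW-style URI template into a regex with one named group per
    {variable}. Assumptions: values never contain '/', and a variable that
    occurs twice must take the same value both times."""
    path = template.removeprefix("file:")
    parts = re.split(r"\{(\w+)\}", path)  # even indices: literal text, odd: variable names
    seen: set[str] = set()
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))
        elif part in seen:
            pieces.append(f"(?P={part})")  # back-reference to the earlier binding
        else:
            seen.add(part)
            pieces.append(f"(?P<{part}>[^/]+?)")
    return re.compile("".join(pieces) + r"\Z")


def bind(template: str, paths: list[str]) -> list[dict[str, str]]:
    """Variable bindings for every observed path that matches the template."""
    rx = template_to_regex(template)
    return [m.groupdict() for p in paths if (m := rx.match(p))]


raw_image = "file:run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"
observed = [
    "run/raw/q55/DRT240/e10000/image_001.raw",
    "run/raw/q55/DRT322/e11000/image_030.raw",
    "run/run_log.txt",  # does not match this template
]
for b in bind(raw_image, observed):
    print(b)
```

The back-reference matters for templates such as `file:data/{sample_id}/{sample_id}_{energy}eV_{frame_number}.img`, where the directory name and the file-name prefix must agree.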
Q1: What samples did the script run collect images from?
Q2: What energies were used for image collection from sample DRT322?
Q3: Where is the raw image of the corrected image DRT322_11000eV_030.img?
Q5: Which cassette_id held the sample that led to DRT240_10000eV_001.img?
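Once the raw-image paths have been turned into bindings, questions like Q1 to Q5 become simple lookups and joins over those bindings. The sketch below hardcodes bindings read off the `run/` listing (only the first and last frame per directory) and answers Q1, Q2, Q3, and Q5; the function names are hypothetical, and it assumes each sample belongs to exactly one cassette.

```python
import re

# Bindings assumed to be already recovered from run/raw/ paths with the
# raw_image URI template: (cassette_id, sample_id, energy, frame_number).
# Only the first and last frame of each directory in the listing are included.
raw_bindings = [
    ("q55", "DRT240", "10000", "001"), ("q55", "DRT240", "10000", "037"),
    ("q55", "DRT240", "11000", "001"), ("q55", "DRT240", "11000", "037"),
    ("q55", "DRT322", "10000", "001"), ("q55", "DRT322", "10000", "030"),
    ("q55", "DRT322", "11000", "001"), ("q55", "DRT322", "11000", "030"),
]

# Raw-image URI template (without "file:") and a hand-written regex for
# corrected-image file names, both following the templates in the figure.
RAW = "run/raw/{cassette_id}/{sample_id}/e{energy}/image_{frame_number}.raw"
CORRECTED = re.compile(r"(?P<sample_id>[^_/]+)_(?P<energy>\d+)eV_(?P<frame_number>\d+)\.img")


def samples():                       # Q1
    return sorted({s for _, s, _, _ in raw_bindings})


def energies(sample_id):             # Q2
    return sorted({e for _, s, e, _ in raw_bindings if s == sample_id})


def cassette_of(sample_id):
    # Assumes each sample sits in exactly one cassette.
    return next(c for c, s, _, _ in raw_bindings if s == sample_id)


def raw_image_of(corrected_name):    # Q3: join corrected -> raw via shared variables
    b = CORRECTED.fullmatch(corrected_name).groupdict()
    return RAW.format(cassette_id=cassette_of(b["sample_id"]), **b)


def cassette_for(corrected_name):    # Q5
    return cassette_of(CORRECTED.fullmatch(corrected_name)["sample_id"])


print(samples())                                # ['DRT240', 'DRT322']
print(energies("DRT322"))                       # ['10000', '11000']
print(raw_image_of("DRT322_11000eV_030.img"))   # run/raw/q55/DRT322/e11000/image_030.raw
print(cassette_for("DRT240_10000eV_001.img"))   # q55
```

Q3 and Q5 work only because the corrected-image name and the raw-image path share the variables `sample_id`, `energy`, and `frame_number`; `cassette_id` is recovered by joining on `sample_id`.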
Hybrid Provenance: YW Model + Runtime Observables (file level)
• The YW model can be connected with runtime observables
• => YW recon (provenance reconstruction)
• Here:
• What specific files were read or written, and where do they occur in the workflow?
C3-C4 Prospective Provenance
[Figure: YesWorkflow graph of the C3_C4_map_present_NA script. Steps: fetch_SYNMAP_land_cover_map_variable, fetch_monthly_mean_air_temperature_data, fetch_monthly_mean_precipitation_data, initialize_Grass_Matrix, examine_pixels_for_grass, generate_netcdf_file_for_C3_fraction, generate_netcdf_file_for_C4_fraction, generate_netcdf_file_for_Grass_fraction. Inputs: SYNMAP_land_cover_map_data (inputs/land_cover/SYNMAP_NA_QD.nc), mean_airtemp (file:inputs/narr_air.2m_monthly/air.2m_monthly_{start_year}_{end_year}_mean.{month}.nc), mean_precip (file:inputs/narr_apcp_rescaled_monthly/apcp_monthly_{start_year}_{end_year}_mean.{month}.nc). Outputs: C3_fraction_data (file:outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc), C4_fraction_data (file:outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc), Grass_fraction_data (file:outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc).]
What does C4_fraction_data depend on?
[Figure, two panels: upstream of C4_fraction_data (prospective), comprising fetch_SYNMAP_land_cover_map_variable, fetch_monthly_mean_air_temperature_data, fetch_monthly_mean_precipitation_data, examine_pixels_for_grass (C4_Data), and generate_netcdf_file_for_C4_fraction, with inputs SYNMAP_land_cover_map_data, mean_airtemp, and mean_precip; next to it, the overall workflow graph.]
C4_fraction_data lineage is very similar to the overall workflow graph!
What does Grass_fraction_data depend on?
[Figure: overall C3C4 workflow graph]
Grass_fraction_data lineage differs from the overall workflow graph!
- Smaller subgraph
- Depends on only 1 of 3 inputs!
[Figure: upstream of Grass_fraction_data (prospective), comprising only SYNMAP_land_cover_map_data, fetch_SYNMAP_land_cover_map_variable (lon_variable, lat_variable, lon_bnds_variable, lat_bnds_variable), initialize_Grass_Matrix (Grass_variable), and generate_netcdf_file_for_Grass_fraction (Grass_fraction_data).]
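The lineage question is a reachability query: walk backwards from a data item through the steps that produce it. The Python sketch below does this over a simplified edge list; the step and data names come from the figures, but the exact wiring (in particular which steps read the lon/lat variables, and that initialize_Grass_Matrix has no inputs) is assumed, and the C3 branch is omitted.

```python
# Simplified wiring of the C3_C4_map_present_NA workflow as (step, inputs, outputs).
STEPS = [
    ("fetch_SYNMAP_land_cover_map_variable", ["SYNMAP_land_cover_map_data"],
     ["lon_variable", "lat_variable", "lon_bnds_variable", "lat_bnds_variable"]),
    ("fetch_monthly_mean_air_temperature_data", ["mean_airtemp"], ["Tair_Matrix"]),
    ("fetch_monthly_mean_precipitation_data", ["mean_precip"], ["Rain_Matrix"]),
    ("initialize_Grass_Matrix", [], ["Grass_variable"]),
    ("examine_pixels_for_grass", ["Tair_Matrix", "Rain_Matrix"], ["C3_Data", "C4_Data"]),
    ("generate_netcdf_file_for_C4_fraction",
     ["C4_Data", "lon_variable", "lat_variable", "lon_bnds_variable", "lat_bnds_variable"],
     ["C4_fraction_data"]),
    ("generate_netcdf_file_for_Grass_fraction",
     ["Grass_variable", "lon_variable", "lat_variable", "lon_bnds_variable", "lat_bnds_variable"],
     ["Grass_fraction_data"]),
]
WORKFLOW_INPUTS = {"SYNMAP_land_cover_map_data", "mean_airtemp", "mean_precip"}


def upstream(target: str) -> set[str]:
    """Every step and data item that `target` transitively depends on (including itself)."""
    producer = {out: (step, ins) for step, ins, outs in STEPS for out in outs}
    seen: set[str] = set()
    todo = [target]
    while todo:
        item = todo.pop()
        if item in seen:
            continue
        seen.add(item)
        if item in producer:  # data produced by a step: follow that step's inputs
            step, ins = producer[item]
            seen.add(step)
            todo.extend(ins)
    return seen


def upstream_inputs(target: str) -> list[str]:
    """The workflow-level inputs among everything upstream of `target`."""
    return sorted(upstream(target) & WORKFLOW_INPUTS)


print(upstream_inputs("C4_fraction_data"))     # ['SYNMAP_land_cover_map_data', 'mean_airtemp', 'mean_precip']
print(upstream_inputs("Grass_fraction_data"))  # ['SYNMAP_land_cover_map_data']
```

With this wiring, the query reproduces the observation on these slides: C4_fraction_data reaches all three workflow inputs, while Grass_fraction_data reaches only SYNMAP_land_cover_map_data.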
What happens after running the script?
Hybrid provenance graph!
• 3 inputs spread across 25 (= 2x12 + 1) files
• Do all 3 output files depend on all 25 inputs?
[Figure: hybrid provenance graph: the C3C4 workflow graph with each data node bound to the files observed at runtime. Outputs: outputs/SYNMAP_PRESENTVEG_C3Grass_RelaFrac_NA_v2.0.nc, outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc, outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc. Inputs: inputs/land_cover/SYNMAP_NA_QD.nc; twelve monthly air-temperature files inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.1.nc through mean.12.nc; twelve monthly precipitation files inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.1.nc through mean.12.nc.]
What C4_fraction_data depends on (hybrid) …
[Figure, two panels: the earlier prospective query result (upstream of C4_fraction_data) next to the hybrid result, in which C4_fraction_data (outputs/SYNMAP_PRESENTVEG_C4Grass_RelaFrac_NA_v2.0.nc) depends on inputs/land_cover/SYNMAP_NA_QD.nc, all twelve monthly air-temperature files (inputs/narr_air.2m_monthly/air.2m_monthly_2000_2010_mean.{month}.nc), and all twelve monthly precipitation files (inputs/narr_apcp_rescaled_monthly/apcp_monthly_2000_2010_mean.{month}.nc).]
What Grass_fraction_data depends on (hybrid) …
[Figure, three panels: overall workflow; upstream of Grass_fraction_data (prospective), comprising initialize_Grass_Matrix, fetch_SYNMAP_land_cover_map_variable, and generate_netcdf_file_for_Grass_fraction with the single input SYNMAP_land_cover_map_data; upstream of Grass_fraction_data (hybrid), in which outputs/SYNMAP_PRESENTVEG_Grass_Fraction_NA_v2.0.nc depends only on inputs/land_cover/SYNMAP_NA_QD.nc.]
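The hybrid answer follows from instantiating each upstream input's URI template with the values seen at runtime. In this sketch the templates come from the C3C4 figure (without the `file:` prefix); `start_year=2000`, `end_year=2010`, and months 1 to 12 match the observed file names; and the per-output input sets restate the prospective results on the earlier slides. The helper names are hypothetical.

```python
TEMPLATES = {
    "SYNMAP_land_cover_map_data": "inputs/land_cover/SYNMAP_NA_QD.nc",
    "mean_airtemp": "inputs/narr_air.2m_monthly/air.2m_monthly_{start_year}_{end_year}_mean.{month}.nc",
    "mean_precip": "inputs/narr_apcp_rescaled_monthly/apcp_monthly_{start_year}_{end_year}_mean.{month}.nc",
}
# Prospective upstream inputs per output (from the lineage queries).
UPSTREAM_INPUTS = {
    "C4_fraction_data": ["SYNMAP_land_cover_map_data", "mean_airtemp", "mean_precip"],
    "Grass_fraction_data": ["SYNMAP_land_cover_map_data"],
}


def observed_files(data_name: str, start_year: int = 2000, end_year: int = 2010) -> list[str]:
    """Instantiate a data item's template once per month (or once if it has no {month})."""
    template = TEMPLATES[data_name]
    if "{month}" not in template:
        return [template]
    return [template.format(start_year=start_year, end_year=end_year, month=m)
            for m in range(1, 13)]


def hybrid_upstream_files(output: str) -> list[str]:
    """File-level lineage: all concrete files behind the output's upstream inputs."""
    return sorted(f for d in UPSTREAM_INPUTS[output] for f in observed_files(d))


print(len(hybrid_upstream_files("C4_fraction_data")))   # 25
print(hybrid_upstream_files("Grass_fraction_data"))     # ['inputs/land_cover/SYNMAP_NA_QD.nc']
```

So not all outputs depend on all 25 input files: C4_fraction_data depends on 12 + 12 + 1 = 25, Grass_fraction_data on just one.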
# @BEGIN Gravitational_Wave_Detection
# @IN fn_d @as FN_Detector
# @IN fn_sr @as FN_Sampling_Rate
# @OUT shifted.wav @as shifted_wave
# @OUT whitenbp.wav @as whitened_bandpass
import numpy as np
from scipy import signal
…
# @BEGIN Amplitude_Spectral_Density
# @IN strain_H1
# @IN strain_L1
# @PARAM fs
# @OUT psd_H1
# @OUT psd_L1
# @OUT GW150914_ASDs.png @URI …
…
NFFT = 1*fs
fmin, fmax = 10, 2000
…
[Figure: YesWorkflow toolkit overview. YesWorkflow-annotated scripts (such as the excerpt above) are processed to extract the annotations and model the script as a workflow; the workflow model (graph) is rendered graphically and exported as Prolog facts (prospective provenance: user-defined workflow models). The toolkit also reconstructs the script's retrospective provenance, yielding Prolog facts for hybrid provenance (prospective + file-level runtime observables). Logic rules query prospective and retrospective provenance and visualize the results, e.g., upstream(strain_L1_whitenbp) at the prospective level (inputs FN_Detector, FN_Sampling_rate, fs) and after URI-based reconstruction (FN_Detector bound to L-L1_LOSC_4_V1-1126259446-32.hdf5 and H-H1_LOSC_4_V1-1126259446-32.hdf5; FN_Sampling_rate bound to H-H1_LOSC_4_V1-1126259446-32.hdf5 and H-H1_LOSC_16_V1-1126259446-32.hdf5).]
LIGO example: What strain_L1_whitenbp depends on …
[Figure, four panels. Overall workflow: YesWorkflow graph of the GW150914 analysis script, from LOAD_DATA (load hdf5 data) through AMPLITUDE_SPECTRAL_DENSITY, WHITENING, and BANDPASSING to the plotting, filtering, sound-file, and DOWNSAMPLING (16384 Hz to 4096 Hz) steps, with inputs FN_Detector (file:{Detector}_LOSC_4_V1-1126259446-32.hdf5), FN_Sampling_rate (file:H-H1_LOSC_{DownSampling}_V1-1126259446-32.hdf5), and fs. Upstream of strain_L1_whitenbp (prospective): LOAD_DATA, AMPLITUDE_SPECTRAL_DENSITY, WHITENING, and BANDPASSING with inputs FN_Detector, FN_Sampling_rate, and fs. Upstream of strain_L1_whitenbp (hybrid YW-NW at the file level): the same subgraph with FN_Detector bound to L-L1_LOSC_4_V1-1126259446-32.hdf5 and H-H1_LOSC_4_V1-1126259446-32.hdf5, and FN_Sampling_rate bound to H-H1_LOSC_4_V1-1126259446-32.hdf5 and H-H1_LOSC_16_V1-1126259446-32.hdf5. Upstream of strain_L1_whitenbp (hybrid YW-NW at the code level): only strain_L1, PSD_L1, strain_L1_whiten, and strain_L1_whitenbp, depending on fn_d = L-L1_LOSC_4_V1-1126259446-32.hdf5 and fs = 4096.]
3	inputs	spread	across	
5 (=2x2	+	1)	files
Does	intermediate	data	
strain_L1_whitenbp	
depend	on	all	5	inputs?
• Intermediate	data	
strain_L1_whiten
bp	depend	only	
on	2 out	of	5	
inputs!
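The [URI-recon] view pairs a URI template such as {Detector}_LOSC_4_V1-1126259446-32.hdf5 with the files actually present. The sketch below is one plausible way to do that matching. It compiles each {placeholder} into a named regex group and keeps only the files that match the whole template. It is not YesWorkflow's own implementation. The function names template_to_regex and reconstruct are hypothetical, and the file list is simply the three files named in the figure.

```python
import re


def template_to_regex(template: str) -> re.Pattern:
    """Compile a URI template like '{Detector}_LOSC_4_V1-1126259446-32.hdf5'
    into a regex with one named group per {placeholder}."""
    # With a capturing group, re.split alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)          # literal text
        else:
            pattern += f"(?P<{part}>[^/]+?)"     # placeholder value
    return re.compile(pattern)


def reconstruct(template: str, filenames):
    """Map each file name that fully matches the template to its placeholder values."""
    regex = template_to_regex(template)
    found = {}
    for name in filenames:
        match = regex.fullmatch(name)
        if match:
            found[name] = match.groupdict()
    return found


FILES = [
    "L-L1_LOSC_4_V1-1126259446-32.hdf5",
    "H-H1_LOSC_4_V1-1126259446-32.hdf5",
    "H-H1_LOSC_16_V1-1126259446-32.hdf5",
]

print(reconstruct("{Detector}_LOSC_4_V1-1126259446-32.hdf5", FILES))
print(reconstruct("H-H1_LOSC_{DownSampling}_V1-1126259446-32.hdf5", FILES))
```

The first call binds Detector to L-L1 and H-H1. The second call finds DownSampling values 4 and 16. Together they show how a single template can expand into several concrete inputs.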
DwCA Taxon Lookup Workflow
• Declare the inputs, outputs, and steps of a script (or workflow) with YW annotations to ...
– communicate provenance graphically (via Graphviz)
– combine different forms of provenance
– query provenance
• Simple YW annotations in comments:
– @BEGIN Step, @END Step
– @IN Data, @OUT Data
– @URI Template, @LOG Pattern
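Here is a minimal sketch of what such comment annotations can look like in a Python taxon-lookup script, using only @BEGIN/@END and @IN/@OUT. It assumes a hypothetical local CHECKLIST standing in for a remote name service such as GBIF or WoRMS. The step and port names are illustrative and do not come from the actual DwCA workflow. Because the annotations are plain comments, the script runs unchanged with or without YesWorkflow.

```python
# @BEGIN taxon_lookup
# @IN input_records
# @OUT output_records

# Hypothetical stand-in for a remote taxonomic name service (e.g., GBIF or WoRMS).
CHECKLIST = {"Mola mola": "Mola mola (Linnaeus, 1758)"}


def run(input_records):
    # @BEGIN extract_names
    # @IN input_records
    # @OUT original_names
    original_names = [rec["scientificName"] for rec in input_records]
    # @END extract_names

    # @BEGIN lookup_names
    # @IN original_names
    # @OUT matched_names
    matched_names = [CHECKLIST.get(name) for name in original_names]  # None = no match
    # @END lookup_names

    # @BEGIN update_records
    # @IN input_records
    # @IN matched_names
    # @OUT output_records
    output_records = [
        {**rec, "validatedName": match}
        for rec, match in zip(input_records, matched_names)
    ]
    # @END update_records
    return output_records

# @END taxon_lookup


if __name__ == "__main__":
    print(run([{"scientificName": "Mola mola"}, {"scientificName": "Nemo fictus"}]))
```

The outer @BEGIN taxon_lookup block encloses the three inner steps. The shared port names (original_names, matched_names) are what let a tool connect the steps into a graph.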
Figure: Taxon Lookup Workflow: Data View and Process View
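The toy parser below illustrates how a process view can be derived from annotations alone. It reads one @BEGIN/@END/@IN/@OUT tag per comment line, records each block's ports, and links two leaf steps whenever an output of one is an input of the other. It is not the YesWorkflow tool, which supports many more tags and layouts. The names parse_yw, process_view, and to_dot are hypothetical, and the code assumes well-nested blocks.

```python
import re

TAG = re.compile(r"@(BEGIN|END|IN|OUT)\s+(\w+)")


def parse_yw(source: str):
    """Return {block: {"in": [...], "out": [...], "children": [...]}}."""
    blocks, stack = {}, []
    for line in source.splitlines():
        m = TAG.search(line)
        if not m:
            continue
        tag, name = m.groups()
        if tag == "BEGIN":
            if stack:
                blocks[stack[-1]]["children"].append(name)
            blocks[name] = {"in": [], "out": [], "children": []}
            stack.append(name)
        elif tag == "END":
            stack.pop()
        else:  # IN or OUT belongs to the innermost open block
            blocks[stack[-1]][tag.lower()].append(name)
    return blocks


def process_view(blocks):
    """Edges (producer, data, consumer) between leaf steps that share a port name."""
    leaves = {b: v for b, v in blocks.items() if not v["children"]}
    edges = []
    for producer, pv in leaves.items():
        for consumer, cv in leaves.items():
            for data in pv["out"]:
                if data in cv["in"]:
                    edges.append((producer, data, consumer))
    return edges


def to_dot(edges):
    """Render the process view as Graphviz DOT text."""
    lines = ["digraph process_view {"]
    lines += [f'  {p} -> {c} [label="{d}"];' for p, d, c in edges]
    lines.append("}")
    return "\n".join(lines)


SCRIPT = """
# @BEGIN taxon_lookup
# @IN input_records
# @OUT output_records
# @BEGIN extract_names
# @IN input_records
# @OUT original_names
# @END extract_names
# @BEGIN lookup_names
# @IN original_names
# @OUT matched_names
# @END lookup_names
# @BEGIN update_records
# @IN input_records
# @IN matched_names
# @OUT output_records
# @END update_records
# @END taxon_lookup
"""

print(to_dot(process_view(parse_yw(SCRIPT))))
```

A data view could be built the same way by emitting data-to-step and step-to-data edges instead of step-to-step ones.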
The story of two individual records
• One took the GBIF route, while …
• … the other went all WORMS!
The aggregate story ..
• How many records were observed as inputs or outputs of workflow steps?
• Were there any NULL values? How many?
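The two questions above can be answered with simple aggregate queries over retrospective provenance. The Python sketch below assumes a hypothetical observation log with one row per value seen at a step's input or output port. The rows, step names, and port names are invented for illustration and are not data from the actual workflow run.

```python
from collections import Counter

# Hypothetical retrospective provenance: (step, direction, port, record_id, value)
observations = [
    ("lookup_names", "in", "original_name", 1, "Mola mola"),
    ("lookup_names", "out", "matched_name", 1, "Mola mola (Linnaeus, 1758)"),
    ("lookup_names", "in", "original_name", 2, "Nemo fictus"),
    ("lookup_names", "out", "matched_name", 2, None),
]


def records_observed(obs):
    """Number of distinct records seen as input or output of any step."""
    return len({record_id for _, _, _, record_id, _ in obs})


def null_values(obs):
    """Count NULL (None) values per (step, port)."""
    return Counter((step, port) for step, _, port, _, value in obs if value is None)


print(records_observed(observations), null_values(observations))
```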
Summary I
• YW annotations can easily be added to your scripts to reap workflow benefits:
– documentation of what's important
– visualization of dependencies
– querying provenance (prospective, retrospective, and hybrid)
=> make provenance actionable
=> provenance for self!
=> github.com/yesworkflow-org/yw
=> try.yesworkflow.org
João F. Pimentel, Saumen Dey, Timothy McPhillips, Khalid Belhajjame, David Koop, Leonardo Murta, Vanessa Braganholo, Bertram Ludäscher
Yin & Yang: Demonstrating complementary provenance from noWorkflow & YesWorkflow
Figure: full noWorkflow trace of a run of simulate_data_collection, labeled by script line number. It shows function calls and variable assignments (e.g., 251 parse_args, 50 spreadsheet_rows, 61 calculate_strategy, 92 collect_next_image, 106 transform_image, 121 writer.writerow), repeated for each sample and image. It also shows the files read and written: cassette_q55_spreadsheet.csv, calibration.img, run/run_log.txt, run/rejected_samples.txt, run/collected_images.csv, and raw and corrected images from run/raw/q55/DRT240/e10000/image_001.raw through run/data/DRT322/DRT322_11000eV_002.img.
noWorkflow: not only Workflow!
• Scripts have provenance, too!
• Transparently capture some or all provenance from Python script runs.
• Use filter queries to "zoom" into the relevant parts.
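The toy example below gives a feel for "transparent" capture. It uses the standard library's sys.setprofile hook to record, for every Python-level function return, the caller's line number, the function name, and the returned value. This loosely resembles trace entries like "61 calculate_strategy = (...)". noWorkflow's real capture mechanism is different and far more complete. The helper capture_calls and the two small functions are hypothetical.

```python
import sys


def capture_calls(func, *args, **kwargs):
    """Run func and record (caller_line, function_name, return_value)
    for every Python-level function return observed while it runs."""
    trace = []

    def profiler(frame, event, arg):
        # 'return' events carry the return value in arg; C calls are ignored.
        if event == "return" and frame.f_back is not None:
            trace.append((frame.f_back.f_lineno, frame.f_code.co_name, arg))

    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always uninstall the hook
    return result, trace


# Hypothetical script fragment to be traced.
def calculate_strategy(quality, cutoff):
    return "accepted" if quality >= cutoff else "rejected"


def simulate(samples, cutoff):
    results = []
    for quality in samples:
        results.append(calculate_strategy(quality, cutoff))
    return results


result, trace = capture_calls(simulate, [45, 5], 12.0)
for line, name, value in trace:
    print(line, name, "=", value)
```

The loop is written out explicitly, not as a list comprehension, because some Python versions give comprehensions their own frame, and that frame would show up as an extra traced call.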
Figure: filtered noWorkflow lineage of run/data/DRT240/DRT240_11000eV_002.img. Captured values include cassette_id = 'q55', sample_score_cutoff = 12.0, data_redundancy = 0.0, calibration_image_file = 'calibration.img', sample_spreadsheet_file = 'cassette_q55_spreadsheet.csv', sample_name = 'DRT240', sample_quality = 45, calculate_strategy = ('DRT240', None, 2, [10000, 11000, 12000]), energy = 11000, frame_number = 2, raw_image_file = 'run/raw/q55/DRT240/e11000/image_002.raw', and transform_image = (980, 10, 'run/data/DRT240/DRT240_11000eV_002.img').
$ now dataflow -f "run/data/DRT240/DRT240_11000eV_002.img"
$(NW_FILTERED_LINEAGE_GRAPH).gv: $(NW_FACTS)
	now helper df_style.py
	now dataflow -v 55 -f $(RETROSPECTIVE_LINEAGE_VALUE) -m simulation | python df_style.py -d BT -e > $(NW_FILTERED_LINEAGE_GRAPH).gv
.. auto-"make" this!
noWorkflow lineage of an image file
Provenance information about Python function calls, variable assignments, etc.
Figure: YesWorkflow model of simulate_data_collection, with steps initialize_run, load_screening_results, calculate_strategy, log_rejected_sample, collect_data_set, transform_images, and log_average_image_intensity, and inputs cassette_id, sample_spreadsheet, calibration_image, sample_score_cutoff, and data_redundancy. Next to it is the lineage subgraph for the corrected image (collect_data_set, calculate_strategy, load_screening_results, transform_images).
Lineage query (on each model)
YesWorkflow: conceptual workflow model
noWorkflow: Python trace model
But how do we bridge this gap?
We would like to use the YW model to query NW data!
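A lineage query on the YW model amounts to a backward walk. The walk goes from a data item to the step that outputs it, then to that step's inputs, and repeats. The sketch below runs this walk over a step table approximated from the YW graphs of simulate_data_collection. The port lists are an assumption read off the figures and may differ from the actual example script. The function upstream is hypothetical.

```python
from collections import deque

# (inputs, outputs) per step, approximated from the YW graph; not the real script's ports.
STEPS = {
    "initialize_run": ([], ["run_log"]),
    "load_screening_results": (["sample_spreadsheet"], ["sample_name", "sample_quality"]),
    "calculate_strategy": (
        ["sample_name", "sample_quality", "sample_score_cutoff", "data_redundancy"],
        ["accepted_sample", "rejected_sample", "num_images", "energies"],
    ),
    "log_rejected_sample": (["cassette_id", "rejected_sample"], ["rejection_log"]),
    "collect_data_set": (
        ["cassette_id", "accepted_sample", "num_images", "energies"],
        ["sample_id", "energy", "frame_number", "raw_image"],
    ),
    "transform_images": (
        ["sample_id", "energy", "frame_number", "raw_image", "calibration_image"],
        ["corrected_image", "total_intensity", "pixel_count"],
    ),
    "log_average_image_intensity": (
        ["cassette_id", "sample_id", "frame_number", "total_intensity", "pixel_count"],
        ["collection_log"],
    ),
}


def upstream(data, steps=STEPS):
    """Return (steps, data) in the lineage of `data` via breadth-first search."""
    producer = {out: step for step, (_, outs) in steps.items() for out in outs}
    seen_steps, seen_data = set(), {data}
    queue = deque([data])
    while queue:
        item = queue.popleft()
        step = producer.get(item)
        if step is None or step in seen_steps:
            continue  # an original input, or a step whose inputs are already queued
        seen_steps.add(step)
        for inp in steps[step][0]:
            if inp not in seen_data:
                seen_data.add(inp)
                queue.append(inp)
    return seen_steps, seen_data


lineage_steps, lineage_data = upstream("corrected_image")
print(sorted(lineage_steps))
print(sorted(lineage_data))
```

With this table, rejected_sample and the logging steps drop out of the lineage of corrected_image, which matches the lineage subgraph in the figure.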
Habemus Pons! (We've got the bridge!)
The bridge is the journey .. (the journey is the destination)
Lineage of the image file in terms of the YW model, with details from NW provenance
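One simple way to picture the bridge is a join. The YW lineage supplies the data names, and noWorkflow supplies the runtime values, matched through a name mapping. In the sketch below, the YW_TO_NW mapping is a hypothetical assumption, and the values are copied from the filtered noWorkflow lineage of DRT240_11000eV_002.img shown earlier. The function annotate_lineage is illustrative and is not the actual YW-NW bridge.

```python
# YW data names in the lineage of corrected_image (as in the YW lineage subgraph).
YW_LINEAGE = [
    "corrected_image", "sample_id", "energy", "frame_number", "raw_image",
    "calibration_image", "cassette_id", "accepted_sample", "num_images",
    "energies", "sample_name", "sample_quality", "sample_score_cutoff",
    "data_redundancy", "sample_spreadsheet",
]

# Hypothetical YW-port -> NW-variable mapping; unmapped names are assumed identical.
YW_TO_NW = {
    "raw_image": "raw_image_file",
    "corrected_image": "corrected_image_file",
    "calibration_image": "calibration_image_file",
    "sample_spreadsheet": "sample_spreadsheet_file",
}

# Values captured by noWorkflow for one image (from the filtered NW lineage).
NW_VALUES = {
    "cassette_id": "q55",
    "sample_score_cutoff": 12.0,
    "data_redundancy": 0.0,
    "sample_spreadsheet_file": "cassette_q55_spreadsheet.csv",
    "calibration_image_file": "calibration.img",
    "sample_name": "DRT240",
    "sample_quality": 45,
    "accepted_sample": "DRT240",
    "num_images": 2,
    "energies": [10000, 11000, 12000],
    "sample_id": "DRT240",
    "energy": 11000,
    "frame_number": 2,
    "raw_image_file": "run/raw/q55/DRT240/e11000/image_002.raw",
    "corrected_image_file": "run/data/DRT240/DRT240_11000eV_002.img",  # the queried file
}


def annotate_lineage(yw_data, nw_values, name_map):
    """Attach NW runtime values to each YW data item; missing values become None."""
    return {d: nw_values.get(name_map.get(d, d)) for d in yw_data}


for name, value in annotate_lineage(YW_LINEAGE, NW_VALUES, YW_TO_NW).items():
    print(f"{name:20} {value!r}")
```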
DataONE: Search and Provenance Display
DataONE: Search and Provenance Display
Adding YesWorkflow to DataONE
• Yaxing's script with its inputs and output products
• Christopher's YesWorkflow model
• Christopher uses Yaxing's outputs as inputs for his script
• Christopher's results can be traced back all the way to Yaxing's input
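Tracing Christopher's results back to Yaxing's input works because the two scripts' provenance shares file identifiers. The sketch below merges per-script "output derived from inputs" records and walks backwards across the joined graph. All file names and the function trace_to_sources are hypothetical placeholders. DataONE's actual provenance model (ProvONE) is considerably richer.

```python
# Hypothetical file-level provenance: output file -> input files, one dict per script.
YAXING_SCRIPT = {"yaxing_output.csv": ["yaxing_input.csv"]}
CHRISTOPHER_SCRIPT = {"christopher_result.png": ["yaxing_output.csv"]}


def trace_to_sources(target, *scripts):
    """Follow derived-from edges across scripts (joined on shared file names)
    and return the original inputs that `target` ultimately depends on."""
    derived_from = {}
    for script in scripts:
        for output, inputs in script.items():
            derived_from.setdefault(output, []).extend(inputs)
    sources, seen, stack = set(), {target}, [target]
    while stack:
        current = stack.pop()
        parents = derived_from.get(current, [])
        if not parents and current != target:
            sources.add(current)  # nothing upstream: an original input
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sources


print(trace_to_sources("christopher_result.png", YAXING_SCRIPT, CHRISTOPHER_SCRIPT))
```

With Christopher's provenance alone, the trace stops at yaxing_output.csv. Adding Yaxing's provenance extends it to yaxing_input.csv.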
Demo Time
(Disclaimer) https://github.com/idaks/dataone-ahm-2016-poster
https://github.com/idaks/wt-prov-summer-2017
https://github.com/yesworkflow-org/yw-idcc-17
Whole Tale: The next step in the evolution of the scholarly article: the "Living" Paper
• 1st generation:
– narrative (prose)
• 2nd generation: plus …
– name .. identify .. include (access to) data
• 3rd generation: plus …
– name .. reference .. include code (software) ..
– and provenance … and execution environment (containers)
Whole Tale
Whole Tale Dashboard
Whole Tale: What's in a name?
(1) Whole Tale ⇔ Whole Story:
◦ Support (computational / data) scientists
◦ … along the complete research lifecycle
◦ ... from experiment to (a new kind of) publication
◦ ... and back!
(2) Whole Tale ⇔ for the Long Tail of Science
– Easy sharing of your computational narratives, data, and execution environment since 2017!
– Power applications for everyone!
Whole Tale Vision
• Can't reproduce a result because you:
– don't know how to run the analysis
– can't get the software running
– can't pay for the computer or compute power the result was computed on
Source: Bryce Mecum, NCEAS (WT team)
Whole Tale Vision
Addressing reproducibility
Figure: Data, Code, Execution Environment, Article
Source: Bryce Mecum, NCEAS (WT team)
Whole Tale Vision
• Living publication (data + code + environment)
• Increase the odds of reproducibility
• Encourage investigation of results by making it easy to recreate the environment in which the result was created
Figure: Article
Source: Bryce Mecum, NCEAS (WT team)
Whole Tale Vision
Addressing reproducibility
Figure: Article + Tale
Source: Bryce Mecum, NCEAS (WT team)
Whole Tale Vision
Figure: Tale { Data, Code }, D1PROV
Source: Bryce Mecum, NCEAS (WT team)
Whole Tale Team
NSF-DIBBS award: The Whole Tale: Merging Science and Cyberinfrastructure Pathways ($5M total, over 5 years, 5 teams)
WT Team:
• Illinois (NCSA & iSchool)
– Bertram Ludäscher (PI), Kandace Turner (PM), Victoria Stodden (coPI), Matt Turk (coPI)
– Kacper Kowalik (sw-architect), Craig Willis (sw-dev)
• U of Chicago
– Kyle Chard (coPI), Mihael Hategan (sw-dev)
• UT Austin
– Niall Gaffney (coPI), Siva Kulasekaran (sw-dev)
• U Notre Dame
– Jarek Nabrzyski (coPI), Ian Taylor (sw-dev), Adam Brinckman (sw-dev)
• UCSB
– Matt Jones (coPI), Bryce Mecum (sw-dev)
DEMO!
Last but not least:
Non-unitary syntheses of systematic knowledge
Please
@taxonbytes
Nico Franz
School of Life Sciences, Arizona State University
CIRSS Seminar – Center for Informatics Research in Science and Scholarship
February 17, 2017 – iSchool, University of Illinois Urbana-Champaign
@ http://www.slideshare.net/taxonbytes/franz-2017-uiuc-cirss-non-unitary-syntheses-of-systematic-knowledge
http://taxonbytes.org/wp-content/uploads/2014/10/Peet-BIGCB-2014-Changing-Perspectives-on-Plant-Distributions.pdf
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
"Taxonomic concept labels" identify input concept regions
RCC-5 articulations provided for each species-level concept
• Input visualization: MSW3 (2005) versus MSW2 (1993)
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
• Alignment visualization: "grey means taxonomically congruent"
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
• One name & congruent region
• Many names & congruent region
• One name & non-congruent regions
• Many names & non-congruent regions
• New names & exclusive regions
• Application of the coverage constraint: parent-to-parent articulations (><) are fully defined by the alignment signal propagated from their respective children.
=> Sensible when complete sampling of children is intended.
Use case 1.a. Aligning Microcebus + Mirza sec. MSW3 (2005)
• Alignment visualization: "grey means taxonomically congruent"
1 in 3 names is unreliable across the MSW2/MSW3 classifications
Source: Franz et al. 2016. Two influential primate classifications logically aligned. doi:10.1093/sysbio/syw023
Figure labels: The 'consensus'; The 'bible'; The (formerly) federal 'standard'; The 'best', latest regional flora
"Controlling the taxonomic variable"
Expert views are in conflict
"Just bad"
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Impact:
Name-based aggregation has created a novel synthesis that nobody believes in.
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Expert views are reconciled
Solution:
Instead of aggregating an artificial 'consensus', build translation services.
Source: Franz et al. 2016. Controlling the taxonomic variable: […]. RIO Journal. doi:10.3897/rio.2.e10610
Leaving taxon and species headaches …
• To illustrate Euler, consider a simpler use case:
• Agreeing to disagree!
• … when there are multiple, legitimate perspectives
• Sorting things out!
– Euler as a taxon concept (& name) "microscope" ...
– .. or scalpel
– .. or ...?
Yi-Yun Cheng¹, Nico Franz², Jodi Schneider¹, Shizhuo Yu³, Thomas Rodenhausen⁴, Bertram Ludäscher¹
¹ School of Information Sciences, University of Illinois at Urbana-Champaign; ² School of Life Sciences, Arizona State University; ³ Department of Computer Science, University of California at Davis; ⁴ School of Information, University of Arizona
Agreeing to Disagree: Reconciling Conflicting Taxonomic Views
using a Logic-based Approach
Acknowledgments
Support of the authors' research through the National Science Foundation is kindly acknowledged (DEB-1155984, DBI-1342595, and DBI-1643002). The authors thank Professor Kathryn La Barre for her comments and suggestions. We would also like to thank Dr. Laetitia Navarro and Jeff Terstriep for help with creating map overlays in QGIS.
CONCLUSION
• Our logic-based taxonomy alignment approach can be used to solve crosswalking issues.
We can mitigate the membership-condition problems that occur in equivalent crosswalking.
• The RCC-5 approach preserves the original taxonomies while providing an alignment view.
We can solve data integration problems that arise in the more coarse-grained relative crosswalking, which is otherwise subject to information loss.
• Our study also underscores the benefits of designing different alignment workflows (bottom-up vs. top-down) to match the needs of specific taxonomy alignment problems.
Bottom-up approach: seems to work well when the leaf-level (lowest-level) articulations are non-overlapping and it is unclear how the higher-level concepts should be aligned.
Top-down approach: seems favorable when certain higher-level articulations are expected, in conjunction with under-specified, complex, and often overlapping leaf-level relations.
RELATED WORK
• Taxonomy Alignment Problems (TAP)
Taxonomies T1 and T2 are interlinked via a set of input articulations A, defined as RCC-5 relations, to yield a "merged" taxonomy T3.
• Euler/X
Articulations: a constraint or rule that defines a relationship (a set constraint) between two concepts from different taxonomies.
Region Connection Calculus (RCC-5)
Possible worlds: when TAPs are encoded and solved via answer set programming (ASP), the different answer sets represent alternative taxonomy merge solutions, or possible worlds (PWs).
INTRODUCTION
Tina: Hey Amy, can you recommend a signature dish from where you live?
Amy: Oh, definitely the half-smokes from the Northeast! They are these tasty half-pork and half-beef sausages.
Tina: What a coincidence! We have half-smokes in the South, too! Where do you live in the Northeast? New York? Boston?
Amy: Wrong guesses! Where do you live in the South?
Tina and Amy together: Washington, D.C.
[The two of them look at each other, confused.]
"In the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required…" (Bowker & Star, 2000).
CASE 1 RESULTS: CEN vs. NDC
• State-level alignments are all congruent (bottom-up)
• Inferred new articulations for regional-level alignments
CASE 2 RESULTS: CEN vs. TZ
Figure 3. (Left) CEN-NDC taxonomy alignment problem with 49 input articulations between TCEN and TNDC
Figure 4. (Right) The unique possible world (PW) T3 reconciling TCEN and TNDC via inferred relationships
Figure 1. National Diversity Council map (NDC) vs. Census Bureau map (CEN)
• GitHub link: https://github.com/EulerProject/ASIST17
• Email: yiyunyc2@illinois.edu
Map labels: NDC regions West, Southwest, Southeast, Midwest, Northeast; CEN regions West, South, Midwest, Northeast; TZ (time zone) regions Pacific, Mountain, Central, Eastern.
RESEARCH DESIGN
Step 1. Supply input taxonomies T1 and T2
Step 2. Formulate RCC-5 articulations between T1 and T2
Step 3. Iteratively edit articulations in Euler/X
[RCC-5 diagram: Congruence X == Y; Inclusion X > Y; Inverse Inclusion X < Y; Overlap X >< Y; Disjointness X ! Y]
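[Figure 2 diagram: input taxonomies T1 and T2 plus articulations A go into Euler/X, which returns N possible worlds; N=1 yields the merged taxonomy T3, while N=0 (inconsistent) or N>1 (ambiguous) sends the user back to add/edit articulations A]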
[Figure residue: regions R1–R9 and the combined-concepts graph for CEN vs. TZ (nodes: CEN 4, TZ 4, comb 1, newComb 18; edges: input 6, inferred 37)]
[Figure 3 graph: 49 state-level congruence (==) articulations between CEN and NDC (nodes: CEN 54, NDC 55; edges: isa_CEN 53, isa_NDC 54, articulations 49)]
[Figure 4 graph: the unique possible world T3 for CEN vs. NDC (nodes: CEN 3, NDC 4, comb 51; edges: input 61, inferred 3, inferred overlaps 3)]
[Figure 5 graph: 12 top-down input articulations between CEN and TZ (nodes: CEN 5, TZ 5; edges: isa_CEN 4, isa_TZ 4, articulations 12)]
[Figure 6 graph: the unique PW for CEN vs. TZ (nodes: CEN 4, TZ 4, comb 1; edges: input 7, input overlaps 6, inferred overlaps 1); regions R1–R9]
Figure 2. The process of aligning taxonomies T1 and T2 with Euler/X
Figure 5. Top-down input alignments between TCEN and TTZ
Figure 6. The unique PW for the TCEN with TTZ alignment
Figure 10. Combined concepts solution for TCEN and TTZ
taxonomy CEN Census_Regions
(USA Northeast Midwest South West)
(Northeast CT MA ME NH NJ NY PA RI VT)
(Midwest IL IN IA KS MI MN MO NE ND OH SD WI)
(South AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV)
(West AZ CA CO ID MT NV NM OR UT WA WY)
taxonomy NDC National_Diversity_Council
(USA Midwest Northeast Southeast Southwest West)
(Northeast CT DC DE MD MA ME NH NJ NY PA RI VT)
(Midwest IA IL IN KS MI MN MO ND NE OH SD WI)
(Southeast AL AR FL GA KY LA MS NC SC TN VA WV)
(Southwest AZ NM OK TX)
(West CA CO ID MT NV OR WA WY UT)
articulations CEN NDC
[CEN.AL equals NDC.AL]
[CEN.AR equals NDC.AR]
[CEN.AZ equals NDC.AZ]
[CEN.CA equals NDC.CA]
[CEN.CO equals NDC.CO]
[CEN.CT equals NDC.CT]
[CEN.DC equals NDC.DC]
[CEN.DE equals NDC.DE]
[CEN.FL equals NDC.FL]
[CEN.GA equals NDC.GA]
[CEN.IA equals NDC.IA]
[CEN.ID equals NDC.ID]
[CEN.IL equals NDC.IL]
[CEN.IN equals NDC.IN]
[CEN.KS equals NDC.KS]
[CEN.KY equals NDC.KY]
[CEN.LA equals NDC.LA]
[CEN.MA equals NDC.MA]
[CEN.MD equals NDC.MD]
[CEN.ME equals NDC.ME]
[CEN.MI equals NDC.MI]
[CEN.MN equals NDC.MN]
...
Quick Scan!
taxonomy CEN Census_Regions
(USA Midwest South West Northeast)
taxonomy TZ Time_Zone
(USA Pacific Mountain Central Eastern)
articulations CEN TZ
[CEN.Midwest disjoint TZ.Pacific]
[CEN.Midwest overlaps TZ.Eastern]
[CEN.Midwest overlaps TZ.Mountain]
[CEN.Northeast is_included_in TZ.Eastern]
[CEN.South disjoint TZ.Pacific]
[CEN.South overlaps TZ.Central]
[CEN.South overlaps TZ.Eastern]
[CEN.South overlaps TZ.Mountain]
[CEN.USA equals TZ.USA]
[CEN.West disjoint TZ.Central]
[CEN.West disjoint TZ.Eastern]
[CEN.West overlaps TZ.Mountain]
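The two input blocks above share a simple line format: `taxonomy NAME …` headers, parenthesized `(parent child …)` groups, and bracketed `[X relation Y]` articulations. As a sketch, here is a hypothetical Python reader for that format, written only from these example lines (it is not Euler/X's own parser). It maps the four relation keywords seen above to the slide symbols (==, <, ><, !) and joins groups that wrap across lines; the names `parse_input`, `ARTICULATION`, `GROUP`, `SYMBOL` and `CEN_TZ` are illustrative assumptions.

```python
import re

# Bracketed articulation lines such as "[CEN.Midwest overlaps TZ.Eastern]".
ARTICULATION = re.compile(r"\[(\S+)\s+(\S+)\s+(\S+)\]")
# Parenthesized groups such as "(USA Midwest South West Northeast)";
# a group may wrap across several lines.
GROUP = re.compile(r"\(([^()]*)\)")
# Only the keywords that appear in the inputs above (an assumed, partial table).
SYMBOL = {"equals": "==", "is_included_in": "<", "overlaps": "><", "disjoint": "!"}


def parse_input(text):
    """Return (taxonomies, articulations).

    taxonomies maps a taxonomy name to {parent: [children]};
    articulations is a list of (concept, symbol, concept) triples.
    """
    chunks, articulations = {}, []
    current = None  # taxonomy whose groups are currently being read
    for raw in text.splitlines():
        line = raw.strip()
        match = ARTICULATION.fullmatch(line)
        if line.startswith("taxonomy "):
            current = line.split()[1]
            chunks[current] = []
        elif line.startswith("articulations"):
            current = None
        elif match:
            x, keyword, y = match.groups()
            if keyword not in SYMBOL:
                raise ValueError(f"unknown relation keyword: {keyword}")
            articulations.append((x, SYMBOL[keyword], y))
        elif current is not None:
            chunks[current].append(line)

    taxonomies = {}
    for name, lines in chunks.items():
        tree = {}
        # Joining the lines first lets a group span a line break.
        for group in GROUP.findall(" ".join(lines)):
            parent, *children = group.split()
            tree[parent] = children
        taxonomies[name] = tree
    return taxonomies, articulations


CEN_TZ = """\
taxonomy CEN Census_Regions
(USA Midwest South West Northeast)
taxonomy TZ Time_Zone
(USA Pacific Mountain Central Eastern)
articulations CEN TZ
[CEN.Midwest disjoint TZ.Pacific]
[CEN.Midwest overlaps TZ.Eastern]
[CEN.Midwest overlaps TZ.Mountain]
[CEN.Northeast is_included_in TZ.Eastern]
[CEN.South disjoint TZ.Pacific]
[CEN.South overlaps TZ.Central]
[CEN.South overlaps TZ.Eastern]
[CEN.South overlaps TZ.Mountain]
[CEN.USA equals TZ.USA]
[CEN.West disjoint TZ.Central]
[CEN.West disjoint TZ.Eastern]
[CEN.West overlaps TZ.Mountain]
"""

taxonomies, articulations = parse_input(CEN_TZ)
print(taxonomies["TZ"]["USA"])  # ['Pacific', 'Mountain', 'Central', 'Eastern']
print(len(articulations))       # 12
```

Because the reader ignores lines it does not recognize (such as the wrapped `National_Diversity_Council` name), it also accepts the CEN vs. NDC block, though an unlisted keyword such as an inverse-inclusion form would raise an error until it is added to `SYMBOL`.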
Two Taxonomies: NDC vs. CEN
“…in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required” [Bowker & Star, 2000, p.159]
[Maps: NDC regions West, Southwest, Southeast, Midwest, Northeast; CEN regions West, South, Midwest, Northeast]
National Diversity Council map (NDC) vs. US Census Bureau map (CEN)
Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @ Illinois)
The taxonomies
11/01/17
Cheng
• The Census Regions Map (CEN) consists of four regions (West, Midwest, Northeast, and South) covering the contiguous 48 states and Washington D.C.
[Map: CEN regions West, South, Midwest, Northeast]
The taxonomies
• The National Diversity Council Map (NDC) consists of five regions (West, Southwest, Midwest, Northeast, and Southeast) covering the same 48 states and Washington D.C.
[Map: NDC (with states): West, Southwest, Southeast, Midwest, Northeast]
• NDC splits South into Southwest (SW) and Southeast (SE)
• Do NDC and CEN agree on “West”? “Midwest”? …
• How can we sort this out?
Sorting things out …
[Graphs: input taxonomies CEN (nodes: 5; is_a edges: 4) and NDC (nodes: 6; is_a edges: 5), with the region-level articulations drawn between them]
• Given:
– taxonomies T1, T2
– and relations T1 ~ T2 (articulations, alignment)
• Find:
– merged taxonomy T3
• Such that:
– T1, T2 are preserved
– all pairwise relations are explicit
5 ways to relate concepts (regions)
• Idea: relate concepts X and Y with articulations
• Articulation language: Region Connection Calculus (RCC-5): congruence, inclusion, inverse inclusion, overlap, disjointness
[Diagram: Congruence X == Y; Inclusion X > Y; Inverse Inclusion X < Y; Overlap X >< Y; Disjointness X ! Y]
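For spatial taxonomies such as the two maps, each concept can be read extensionally as the set of states it covers. Under that assumption the five relations reduce to plain set comparisons, as sketched below. This is only an illustration: Euler/X reasons over articulations as logical constraints rather than over member lists, and the function name `rcc5` is hypothetical. The symbols follow the slide (`X > Y` means X includes Y).

```python
def rcc5(x: set, y: set) -> str:
    """Return the RCC-5 relation between concepts X and Y from their extensions.

    Symbols follow the slide: '==' congruence, '>' X includes Y,
    '<' X is included in Y, '><' overlap, '!' disjointness.
    """
    if not x or not y:
        raise ValueError("this sketch only handles non-empty concepts")
    if x == y:
        return "=="
    if x > y:   # proper superset: X includes Y
        return ">"
    if x < y:   # proper subset: X is included in Y
        return "<"
    if x & y:   # some shared members, but neither contains the other
        return "><"
    return "!"  # no members in common


print(rcc5({"IL", "IN", "OH"}, {"IL", "IN", "OH"}))              # ==
print(rcc5({"IL", "IN", "OH"}, {"IL"}))                          # >
print(rcc5({"DC", "DE", "MD", "VA"}, {"DC", "DE", "MD", "NY"}))  # ><
```

The order of the checks matters: congruence and the two inclusions are tested before overlap, so `><` is only returned when neither set contains the other.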
[Graph: region-level RCC-5 articulations between CEN and NDC]
Merged taxonomy T3
[Graph: merged taxonomy T3 of the CEN and NDC regions; legend: congruence, is_a, overlap]
How we align two taxonomies T1 and T2
• Step 1. Supply input taxonomies T1 and T2
• Step 2. Describe the relationships between T1 and T2
• Step 3. Iteratively edit articulations in Euler/X
[Diagram: T1, T2 and articulations A → Euler/X → N possible worlds; N=1: merged taxonomy T3; N=0 (inconsistent) or N>1 (ambiguous): add/edit articulations A]
• … but where do the articulations come from?
– expert opinion
– automatically derived from data
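The Step 3 loop hinges on counting possible worlds: N=1 gives the merged taxonomy T3, N=0 means the articulations are inconsistent, and N>1 means they are ambiguous. Euler/X finds these worlds by encoding the problem for a reasoner (ASP); the sketch below swaps in a much cruder technique, brute-force enumeration of which Venn cells of the leaf concepts are non-empty, so it is only feasible for a handful of leaves. All names (`possible_worlds`, `outcome`, and the toy concepts A, A1, A2, B, B1, B2) are hypothetical, and it assumes one level of hierarchy, non-empty leaves, and parents exactly covered by their pairwise disjoint children.

```python
from itertools import combinations


def rcc5(x, y):
    """RCC-5 relation between two non-empty extensions (slide symbols)."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "!"


def possible_worlds(leaves, parents, articulations):
    """Return the distinct RCC-5 relation tables consistent with the inputs.

    leaves: leaf concept names, each assumed non-empty.
    parents: {parent: [children]}; a parent is the union of its children,
             and siblings must be pairwise disjoint (one level only).
    articulations: (x, symbol, y) constraints that every world must satisfy.
    """
    cells = list(range(1, 2 ** len(leaves)))  # cell c lies in leaf j iff bit j of c is set
    worlds = set()
    for mask in range(2 ** len(cells)):       # each mask picks the non-empty cells
        present = [c for i, c in enumerate(cells) if (mask >> i) & 1]
        ext = {leaf: frozenset(c for c in present if (c >> j) & 1)
               for j, leaf in enumerate(leaves)}
        if not all(ext.values()):              # every leaf must be non-empty
            continue
        for parent, children in parents.items():
            ext[parent] = frozenset().union(*(ext[ch] for ch in children))
        siblings_disjoint = all(
            not (ext[a] & ext[b])
            for children in parents.values()
            for a, b in combinations(children, 2))
        if siblings_disjoint and all(rcc5(ext[x], ext[y]) == r for x, r, y in articulations):
            names = sorted(ext)
            worlds.add(tuple((a, rcc5(ext[a], ext[b]), b) for a, b in combinations(names, 2)))
    return worlds


def outcome(n_worlds):
    if n_worlds == 1:
        return "unique merged taxonomy T3"
    return "inconsistent" if n_worlds == 0 else "ambiguous"


leaves = ["A1", "A2", "B1", "B2"]
parents = {"A": ["A1", "A2"], "B": ["B1", "B2"]}
for arts in ([("A", "==", "B"), ("A1", "==", "B1")],
             [("A", "==", "B")],
             [("A1", "==", "B1"), ("A1", "!", "B1")]):
    print(outcome(len(possible_worlds(leaves, parents, arts))))
# unique merged taxonomy T3
# ambiguous
# inconsistent
```

With both A == B and A1 == B1, the only remaining option is A2 == B2, so the merge is unique; dropping A1 == B1 leaves several worlds, and asserting both A1 == B1 and A1 ! B1 leaves none. The search space grows as 2^(2^n − 1) for n leaves, which is why a real tool needs a dedicated reasoner.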
Case 1: Census Region vs. National Diversity Council
[Maps: CEN regions (West, South, Midwest, Northeast) vs. NDC with states (West, Southwest, Southeast, Midwest, Northeast)]
• … but where do the articulations come from?
– automatically derived from data
– expert input
[Graphs: bottom-up input for CEN vs. NDC, full view and detail: 49 state-level congruence (==) articulations (nodes: CEN 54, NDC 55; edges: isa_CEN 53, isa_NDC 54, articulations 49)]
[Graphs: the unique possible world for CEN vs. NDC, full view and Northeast/Midwest detail (nodes: CEN 3, NDC 4, comb 51; edges: input 61, inferred 3, inferred overlaps 3)]
USA, Midwest and state-level alignments are all congruent
[Graphs: the unique possible world for CEN vs. NDC, full view and South/Southwest/Northeast detail]
The overlapping relations are automatically derived from data
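The bottom-up route can be sketched directly from the membership lists in the CEN and NDC input taxonomies: treat each region as its set of state codes (plus DC), take each root USA as the union of its regions, and compare every CEN region with every NDC region. This is an illustration only; `derive_articulations` is a hypothetical helper, not part of Euler/X.

```python
from itertools import product

# Region memberships copied from the CEN and NDC input taxonomies.
CEN = {
    "Northeast": "CT MA ME NH NJ NY PA RI VT",
    "Midwest": "IL IN IA KS MI MN MO NE ND OH SD WI",
    "South": "AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV",
    "West": "AZ CA CO ID MT NV NM OR UT WA WY",
}
NDC = {
    "Northeast": "CT DC DE MD MA ME NH NJ NY PA RI VT",
    "Midwest": "IA IL IN KS MI MN MO ND NE OH SD WI",
    "Southeast": "AL AR FL GA KY LA MS NC SC TN VA WV",
    "Southwest": "AZ NM OK TX",
    "West": "CA CO ID MT NV OR WA WY UT",
}


def rcc5(x, y):
    """RCC-5 relation between two non-empty extensions (slide symbols)."""
    if x == y:
        return "=="
    if x > y:
        return ">"
    if x < y:
        return "<"
    return "><" if x & y else "!"


def derive_articulations(t1, name1, t2, name2):
    """Derive all region-level articulations from state membership."""
    ext1 = {region: set(states.split()) for region, states in t1.items()}
    ext2 = {region: set(states.split()) for region, states in t2.items()}
    ext1["USA"] = set().union(*ext1.values())  # root = union of its regions
    ext2["USA"] = set().union(*ext2.values())
    return {(f"{name1}.{a}", f"{name2}.{b}"): rcc5(ext1[a], ext2[b])
            for a, b in product(ext1, ext2)}


articulations = derive_articulations(CEN, "CEN", NDC, "NDC")
for (a, b), rel in sorted(articulations.items()):
    if rel == "><":
        print(a, rel, b)
# CEN.South >< NDC.Northeast
# CEN.South >< NDC.Southwest
# CEN.West >< NDC.Southwest
```

It reports exactly three overlaps, which agrees with the three inferred overlap edges in the legend of the CEN vs. NDC possible world; the shared members DC, DE and MD are what make CEN.South overlap NDC.Northeast.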
[Graphs: the unique possible world for CEN vs. NDC, full view and South/Northeast detail]
DC is in both the South (CEN) and the Northeast (NDC)
Case 2: Census Region vs. Time Zone
[Maps: US time zones (Pacific, Mountain, Central, Eastern) vs. CEN regions (West, South, Midwest, Northeast)]
• … but where do the articulations come from?
– automatically derived from data
– expert input
[Graphs: input: top-down articulations between CEN and TZ; output: the resulting possible world]
Top-down	regional	alignment
How do we know if our ‘expert articulations’ are correct?
[Map: regions R1–R9]
GIS solution as the ground truth
[Map: regions R1–R9; graph: combined-concepts solution for CEN vs. TZ, with intersection concepts such as CEN.South*TZ.Central]
Combined concepts solution for regional-level alignments
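Where two concepts overlap, the combined-concept solution introduces new nodes for their shared and non-shared parts; labels such as CEN.South*TZ.Central use `*` for the intersection. The sketch below computes the three parts for one overlapping pair. Splitting CEN regions by time zone would need data on which parts of each state lie in which zone, which the slides do not give, so the example reuses the CEN.South vs. NDC.Northeast overlap from Case 1. The `\` notation for set difference and the helper name `combined_concepts` are assumptions.

```python
def combined_concepts(x_name, x, y_name, y):
    """Split two overlapping concepts into intersection and difference parts.

    '*' marks the intersection (as in CEN.South*TZ.Central);
    '\\' for set difference is an assumed notation.
    """
    if not (x & y and x - y and y - x):
        raise ValueError(f"{x_name} and {y_name} do not properly overlap")
    return {
        f"{x_name}*{y_name}": x & y,
        f"{x_name}\\{y_name}": x - y,
        f"{y_name}\\{x_name}": y - x,
    }


# Memberships copied from the CEN and NDC input taxonomies.
CEN_SOUTH = set("AL AR DE DC FL GA KY LA MD MS NC OK SC TN TX VA WV".split())
NDC_NORTHEAST = set("CT DC DE MD MA ME NH NJ NY PA RI VT".split())

parts = combined_concepts("CEN.South", CEN_SOUTH, "NDC.Northeast", NDC_NORTHEAST)
for name, members in parts.items():
    print(name, sorted(members))
# first line: CEN.South*NDC.Northeast ['DC', 'DE', 'MD']
```

The guard rejects pairs that are congruent, nested or disjoint, since only a proper overlap (><) needs all three new concepts.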
Do the taxonomies have to be spatial in order to use RCC-5?
• No! Taxonomy alignment more typically involves non-spatial taxonomies
– for which no “GIS route” or direct visual cues about regional extensions are available
– yet RCC-5 remains a suitable alignment vocabulary for a wide range of multi-hierarchy reconciliations
Conclusion & Discussion
• Underscores the benefits of designing different alignment workflows (bottom-up vs. top-down)
– Bottom-up: suited to non-overlapping (e.g., congruent) lowest-level articulations, when it is not clear how to align the higher-level concepts
– Top-down: suited to cases where leaf-level relations often overlap; expert input will frequently be needed to establish the expected articulations under this approach
Implications
• Logic-based taxonomy alignment approach
– Disambiguates name-based taxonomy alignment over time
• 40% of the concepts in biology taxonomies undergo name changes over time (Franz et al., 2016)
– May mitigate problems in equivalent crosswalking
• e.g., the membership-condition problem often criticized in crosswalking
– Preserves the original taxonomies while providing an alignment view
• Helps solve data integration problems that arise in coarser-grained relative crosswalking
Some History
• … Aristotle …
• … Euler …
• …
• … Greg Whitbread …
• [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. In Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993.
• [Ber95] W. G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995.
• [Ber03] W. G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003.
• [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003.
• [TL07] D. Thau and B. Ludäscher. Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3):195–209, 2007.
• [FP09] N. M. Franz and R. K. Peet. Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1):5–20, 2009.
• …

Mais conteúdo relacionado

Semelhante a ETC & Authors in the Driver's Seat

Introduction to Information Visualization (Part 1)
Introduction to Information Visualization (Part 1)Introduction to Information Visualization (Part 1)
Introduction to Information Visualization (Part 1)Andrew Vande Moere
 
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainFacilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainChristophe Debruyne
 
The ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldThe ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldJohn Kunze
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics EnvironmentIan Foster
 
Weka presentation
Weka presentationWeka presentation
Weka presentationSaeed Iqbal
 
Computer notes - data structures
Computer notes - data structuresComputer notes - data structures
Computer notes - data structuresecomputernotes
 
Web-scale semantic search
Web-scale semantic searchWeb-scale semantic search
Web-scale semantic searchEdgar Meij
 
Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Takashi J OZAKI
 
Useing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowUseing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowYi-Fan Liou
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems PerformanceBrendan Gregg
 
GSLIS Research Showcase Presentation (Expanded)
GSLIS Research Showcase Presentation (Expanded)GSLIS Research Showcase Presentation (Expanded)
GSLIS Research Showcase Presentation (Expanded)Bertram Ludäscher
 
computer notes - Data Structures - 1
computer notes - Data Structures - 1computer notes - Data Structures - 1
computer notes - Data Structures - 1ecomputernotes
 
Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)Keiichiro Ono
 
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...Complement Verb
 
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...Zhenzhong Xu
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning OverviewMykhailo Koval
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!dclsocialmedia
 

Semelhante a ETC & Authors in the Driver's Seat (20)

Introduction to Information Visualization (Part 1)
Introduction to Information Visualization (Part 1)Introduction to Information Visualization (Part 1)
Introduction to Information Visualization (Part 1)
 
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainFacilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
 
The ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years OldThe ARK Identifier Scheme at Ten Years Old
The ARK Identifier Scheme at Ten Years Old
 
Open Analytics Environment
Open Analytics EnvironmentOpen Analytics Environment
Open Analytics Environment
 
2017 nov reflow sbtb
2017 nov reflow sbtb2017 nov reflow sbtb
2017 nov reflow sbtb
 
Weka presentation
Weka presentationWeka presentation
Weka presentation
 
Computer notes - data structures
Computer notes - data structuresComputer notes - data structures
Computer notes - data structures
 
Web-scale semantic search
Web-scale semantic searchWeb-scale semantic search
Web-scale semantic search
 
Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
 
Useing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with TensorflowUseing PSO to optimize logit model with Tensorflow
Useing PSO to optimize logit model with Tensorflow
 
Open Source Systems Performance
Open Source Systems PerformanceOpen Source Systems Performance
Open Source Systems Performance
 
GSLIS Research Showcase Presentation (Expanded)
GSLIS Research Showcase Presentation (Expanded)GSLIS Research Showcase Presentation (Expanded)
GSLIS Research Showcase Presentation (Expanded)
 
computer notes - Data Structures - 1
computer notes - Data Structures - 1computer notes - Data Structures - 1
computer notes - Data Structures - 1
 
Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
Cytoscape Tutorial Session 1 at UT-KBRIN Bioinformatics Summit 2014 (4/11/2014)
 
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
 
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!
 

Mais de Bertram Ludäscher

Games, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionGames, Queries, and Argumentation Frameworks: Time for a Family Reunion
Games, Queries, and Argumentation Frameworks: Time for a Family ReunionBertram Ludäscher
 
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Bertram Ludäscher
 
[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database Rules[Flashback] Integration of Active and Deductive Database Rules
[Flashback] Integration of Active and Deductive Database RulesBertram Ludäscher
 
[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database Rules[Flashback] Statelog: Integration of Active & Deductive Database Rules
[Flashback] Statelog: Integration of Active & Deductive Database RulesBertram Ludäscher
 
Answering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsAnswering More Questions with Provenance and Query Patterns
Answering More Questions with Provenance and Query PatternsBertram Ludäscher
 
Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Computational Reproducibility vs. Transparency: Is It FAIR Enough?
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Bertram Ludäscher
 
Which Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueWhich Model Does Not Belong: A Dialogue
Which Model Does Not Belong: A DialogueBertram Ludäscher
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesBertram Ludäscher
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesBertram Ludäscher
 
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsPossible Worlds Explorer: Datalog & Answer Set Programming for the Rest of Us
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsBertram Ludäscher
 
Deduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseDeduktive Datenbanken & Logische Programme: Eine kleine Zeitreise
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseBertram Ludäscher
 
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...
[Flashback 2005] Managing Scientific Data: From Data Integration to Scientifi...Bertram Ludäscher
 
Dissecting Reproducibility: A case study with ecological niche models in th...
Dissecting Reproducibility:  A case study with ecological niche models  in th...Dissecting Reproducibility:  A case study with ecological niche models  in th...
Dissecting Reproducibility: A case study with ecological niche models in th...Bertram Ludäscher
 
Incremental Recomputation: Those who cannot remember the past are condemned ...
Incremental Recomputation:  Those who cannot remember the past are condemned ...Incremental Recomputation:  Those who cannot remember the past are condemned ...
Incremental Recomputation: Those who cannot remember the past are condemned ...Bertram Ludäscher
 
Validation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsValidation and Inference of Schema-Level Workflow Data-Dependency Annotations
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsBertram Ludäscher
 
An ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsAn ontology-driven framework for data transformation in scientific workflows
An ontology-driven framework for data transformation in scientific workflowsBertram Ludäscher
 
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachKnowledge Representation & Reasoning and the Hierarchy-of-Hypotheses Approach
Knowledge Representation & Reasoning and the Hierarchy-of-Hypotheses ApproachBertram Ludäscher
 
Whole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchWhole-Tale: The Experience of Research
Whole-Tale: The Experience of ResearchBertram Ludäscher
 
Wild Ideas at TDWG'17: Embrace multiple possible worlds; abandon techno-ligion
Wild Ideas at TDWG'17: Embrace multiple possible worlds; abandon techno-ligionWild Ideas at TDWG'17: Embrace multiple possible worlds; abandon techno-ligion
Wild Ideas at TDWG'17: Embrace multiple possible worlds; abandon techno-ligionBertram Ludäscher
 
Using YesWorkflow hybrid queries to reveal data lineage from data curation ac...
Using YesWorkflow hybrid queries to reveal data lineage from data curation ac...Using YesWorkflow hybrid queries to reveal data lineage from data curation ac...
Using YesWorkflow hybrid queries to reveal data lineage from data curation ac...Bertram Ludäscher
 

ETC & Authors in the Driver's Seat