“Provenance and Social Science Data”
15 March 2017
Documenting Data Transformations
George Alter, University of Michigan
Why Metadata?
• Data are useless without metadata – “data about data”
• Metadata should:
– Include all information about data creation
– Describe transformations to variables
– Be easy to create
• Our goal: Automated capture of metadata
A few words about ICPSR
• World’s largest archive of social science data
• Consortium established 1962
• 760+ member institutions around the world
• Founding member and home office for the DDI Alliance
Powered by DDI Metadata
• ICPSR is building search tools based upon Data Documentation Initiative (DDI) XML
• Codebooks (pdf and online) are rendered from the DDI.
Screenshot: a searchable database of 4.5M variables, with a link from each variable to its online codebook. The online codebook shows the variable in the context of its dataset, answers “What question was asked?” and “How was the question coded?”, and links to the online crosstab and graph tools.
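To make “codebooks are rendered from the DDI” concrete, here is a minimal Python sketch that turns one DDI-style variable description into a plain-text codebook entry. It assumes a simplified, namespace-free fragment that uses DDI Codebook element names (`var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`); the variable `v1` and its content are hypothetical, and real ICPSR files carry namespaces and far more detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DDI Codebook fragment: no namespaces, few elements.
DDI_VAR = """
<var name="v1">
  <labl>Voted in election</labl>
  <qstn><qstnLit>Did you vote in the election this November?</qstnLit></qstn>
  <catgry><catValu>1</catValu><labl>Yes</labl></catgry>
  <catgry><catValu>2</catValu><labl>No</labl></catgry>
</var>
"""


def render_codebook_entry(xml_text: str) -> str:
    """Render one DDI-style <var> element as a plain-text codebook entry."""
    var = ET.fromstring(xml_text.strip())
    # findtext("labl") only looks at direct children, so it skips the category labels.
    lines = [f"{var.get('name')}: {var.findtext('labl', default='')}"]
    question = var.findtext("qstn/qstnLit")
    if question:
        lines.append(f"  Question: {question}")
    for cat in var.findall("catgry"):
        lines.append(f"  {cat.findtext('catValu')} = {cat.findtext('labl')}")
    return "\n".join(lines)


print(render_codebook_entry(DDI_VAR))
```

Because the codebook is generated from structured metadata, the same source could just as well be rendered as HTML or PDF.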
Screenshot: the searchable database of 4.5M variables, with links to a variable comparison display and to online codebooks. Users can search for datasets with 3 desired variables and tick check boxes to compare variables.
Crosswalk for American National Election Study (ANES) and General Social Survey (GSS)
Screenshot: the crosswalk columns link to 70 datasets, with 134 tags in 8 lists; the variable comparison display links each variable to its online codebook.
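Searching for datasets that contain all of several desired variables is, at its core, a set intersection over a crosswalk. A minimal sketch, assuming a hypothetical in-memory crosswalk that maps concept tags to the local variable name in each dataset; the tags, dataset IDs, and variable names are invented for illustration and do not describe ICPSR's implementation.

```python
from typing import Dict, Iterable

# Hypothetical crosswalk: concept tag -> {dataset ID: variable name in that dataset}.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "party_id": {"study_A": "v101", "study_B": "q12", "study_C": "pid"},
    "voted": {"study_A": "v202", "study_B": "q30", "study_C": "vote"},
    "religion": {"study_B": "q45", "study_C": "relig"},
}


def datasets_with_all(
    tags: Iterable[str], crosswalk: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, str]]:
    """For each dataset that has every requested tag, map tag -> local variable name."""
    wanted = list(tags)
    if not wanted:
        return {}
    common = set(crosswalk.get(wanted[0], {}))
    for tag in wanted[1:]:
        common &= set(crosswalk.get(tag, {}))  # keep datasets that also have this tag
    return {ds: {tag: crosswalk[tag][ds] for tag in wanted} for ds in sorted(common)}


print(datasets_with_all(["party_id", "voted", "religion"], CROSSWALK))
```

Returning the local variable names, not just dataset IDs, is what lets a comparison display line the matching variables up side by side.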
Metadata for the American National Election Study
Screenshot: an ANES codebook entry, annotated with “What question was asked?”, “How was the question coded?” and “Who answered this question?”
How do we know who answered the question? It’s in the pdf.
When data arrive at the archive…
• No question text
• No interview flow (question order, skip pattern)
• No variable provenance
• Data transformations are not documented.
How are research data created?
• Most surveys are conducted with computer-assisted interviewing (CAI) software
– CATI – Computer-assisted Telephone Interview
– CAPI – Computer-assisted Personal Interview
– CAWI – Computer-assisted Web Interview
• There is no paper questionnaire
• The CAI program is the questionnaire
– i.e., the program is the metadata
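Because the CAI program encodes the interview flow, the answer to “who answered this question?” can be derived from it instead of being typed into a pdf. A minimal sketch, assuming a hypothetical three-question instrument whose routing is stored as “answer leads to next question” (with `*` meaning any answer); real CAI systems use their own, much richer formats, and this sketch assumes the routing has no loops.

```python
from typing import Dict, List, Tuple

Condition = Tuple[str, str]  # (question, answer) that must hold on the route

# Hypothetical instrument: answer -> next question; "*" matches any answer.
ROUTING: Dict[str, Dict[str, str]] = {
    "Q1": {"1": "Q2", "2": "Q3"},  # Q1: Did you vote? 1 = Yes, 2 = No
    "Q2": {"*": "Q3"},             # Q2: Whom did you vote for? (voters only)
    "Q3": {},                      # Q3: Interest in politics (last question)
}


def universes(routing: Dict[str, Dict[str, str]], start: str) -> Dict[str, List[List[Condition]]]:
    """List every route of answers that reaches each question (assumes no loops)."""
    found: Dict[str, List[List[Condition]]] = {q: [] for q in routing}

    def walk(question: str, route: List[Condition]) -> None:
        found[question].append(route)
        for answer, nxt in routing[question].items():
            walk(nxt, route + [(question, answer)])

    walk(start, [])
    return found


def describe(routes: List[List[Condition]]) -> str:
    """Turn routes into a readable universe statement (no logical simplification)."""
    clauses = []
    for route in routes:
        terms = [f"{q}={a}" for q, a in route if a != "*"]
        if not terms:
            return "everyone"
        clauses.append(" AND ".join(terms))
    return " OR ".join(clauses)


for question, routes in universes(ROUTING, "Q1").items():
    print(f"{question}: asked of {describe(routes)}")
```

The output for Q3 (“Q1=1 OR Q1=2”) is correct but unsimplified; a production tool would reduce it to “everyone”.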
Diagram: Computer Assisted Interviewing (CAI) produces the original data; a CAI-to-DDI converter (Colectica, MQDS, others) produces the original metadata as DDI XML.
We already have tools to convert CAI to machine-readable metadata.
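What such a conversion produces can be sketched in a few lines. This is not how Colectica or MQDS work; it is a minimal illustration, assuming a hypothetical question record exported from a CAI instrument, that emits a simplified DDI Codebook-style `var` element (question text, universe, categories) without namespaces and without validating against the DDI schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical question record as it might be exported from a CAI instrument.
QUESTION = {
    "name": "Q2",
    "text": "Whom did you vote for?",
    "universe": "Respondents who answered Q1=1 (voted)",
    "categories": {"1": "Candidate A", "2": "Candidate B"},
}


def question_to_ddi_var(q: dict) -> ET.Element:
    """Build a simplified DDI Codebook-style <var> element from one CAI question."""
    var = ET.Element("var", name=q["name"])
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = q["text"]  # literal question text
    ET.SubElement(var, "universe").text = q["universe"]  # who was asked
    for value, label in q["categories"].items():
        cat = ET.SubElement(var, "catgry")
        ET.SubElement(cat, "catValu").text = value
        ET.SubElement(cat, "labl").text = label
    return var


print(ET.tostring(question_to_ddi_var(QUESTION), encoding="unicode"))
```

The point of the design is that question text and universe come straight from the instrument, so nothing has to be retyped from a paper or pdf questionnaire.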
Diagram: CAI produces the original data, and a CAI-to-DDI converter (Colectica, MQDS, others) produces the original metadata as DDI XML. Command scripts (SPSS, SAS, Stata, R) run in statistical packages then turn the original data into revised data.
What happens when a project modifies the data? The modified data no longer match the metadata.
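A minimal sketch of how this mismatch shows up, assuming a hypothetical 4-point agreement item and a hypothetical project script that collapses it to agree/disagree: after the recode, the data contain values the original metadata never described, and the metadata still describe values that no longer occur.

```python
from typing import Dict, Iterable, Set, Tuple


def label_drift(values: Iterable[int], value_labels: Dict[int, str]) -> Tuple[Set[int], Set[int]]:
    """Return (values with no label, labelled values that no longer occur in the data)."""
    present = set(values)
    return present - set(value_labels), set(value_labels) - present


# Hypothetical original metadata and data for one variable.
original_labels = {1: "Strongly agree", 2: "Agree", 3: "Disagree", 4: "Strongly disagree"}
original_data = [1, 2, 2, 3, 4]

# Hypothetical project recode: 1, 2 -> 1 (agree); 3, 4 -> 0 (disagree).
revised_data = [1 if v <= 2 else 0 for v in original_data]

print(label_drift(original_data, original_labels))  # metadata match the original data
print(label_drift(revised_data, original_labels))   # metadata no longer match
```

A check like this only detects drift; it cannot say what the recode was, which is why the transformation itself has to be captured.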
Diagram: the same pipeline (CAI, CAI-to-DDI conversion with Colectica, MQDS, or others, and command scripts in SPSS, SAS, Stata, or R producing revised data), plus a “Stat Package to DDI” step that extracts metadata from the SPSS/SAS/Stata/R data file into a new DDI XML file (extracted metadata).
Metadata are re-created after the data are transformed. Transformations are documented by hand.
Statistics packages have limited metadata
• Variable names
• Variable labels
• Value labels
• No provenance
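Diagram: CAI produces the original data, and a CAI-to-DDI converter (Colectica, MQDS, others) produces the original metadata as DDI XML. Command scripts (SPSS, SAS, Stata, R) turn the original data into revised data in the statistical packages. A Script Parser reads the same command scripts and expresses them in Standard Data Transformation Language (SDTL) XML, and an Updater applies the SDTL to the original DDI XML to produce revised metadata that match the revised data.
Automating the capture of transformation metadata. The missing links that we will build: the Script Parser, SDTL, and the Updater.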
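The parser-plus-updater idea can be sketched for the three Stata commands used in the examples in this deck. Everything here is hypothetical: the command records (`Compute`, `Replace`, `condition`, `source`) are a made-up neutral form, not the actual SDTL schema, and the regular expression handles only `generate`/`replace VAR=EXPR if COND`. Note that the records copy conditions such as `X>3` verbatim, so they do not yet capture package-specific missing-value behaviour.

```python
import re
from typing import Dict, List

# Matches only the two Stata forms used on these slides: generate/replace VAR=EXPR if COND
STATA_LINE = re.compile(r"^(generate|replace)\s+(\w+)\s*=\s*(\S+)\s+if\s+(.+)$")


def parse_stata(script: str) -> List[Dict[str, str]]:
    """Translate simple Stata lines into hypothetical, package-neutral command records."""
    commands = []
    for line in script.strip().splitlines():
        m = STATA_LINE.match(line.strip())
        if not m:
            raise ValueError(f"unsupported line: {line!r}")
        verb, var, expr, cond = m.groups()
        commands.append({
            "command": "Compute" if verb == "generate" else "Replace",
            "variable": var,
            "expression": expr,
            "condition": cond.strip(),
            "source": f"Stata: {line.strip()}",
        })
    return commands


def update_metadata(metadata: Dict[str, Dict], commands: List[Dict[str, str]]) -> Dict[str, Dict]:
    """Copy the metadata and attach each command, as provenance, to the variable it sets."""
    revised = {name: dict(info, provenance=list(info.get("provenance", [])))
               for name, info in metadata.items()}
    for cmd in commands:
        entry = revised.setdefault(cmd["variable"], {"label": "", "provenance": []})
        entry["provenance"].append(cmd)
    return revised


SCRIPT = """
replace X=. if X==-1
generate Y=9 if X>3
generate Z=8 if X<3
"""
original = {"X": {"label": "Original item", "provenance": []}}
revised = update_metadata(original, parse_stata(SCRIPT))
for name, info in revised.items():
    print(name, [c["source"] for c in info["provenance"]])
```

The updater works on a copy so the original metadata survive unchanged alongside the revised version.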
What statistics packages should be covered?
ICPSR Downloads by Format
Format            All downloads   Studies with all formats
Delimited text    43%             29%
SPSS              22%             24%
SAS               10%             12%
Stata             19%             23%
R                 5%              12%
Excel             0%              1%
Other             0%              0%
Total             100%            100%
Number            378,007         154,663
Why do we need an SDTL?
Input data: X = 2, 3, 4, -1
SPSS
  MISSING VALUES X(-1).
  IF (X > 3) Y=9.
  IF (X < 3) Z=8.
Stata
  replace X=. if X==-1
  generate Y=9 if X>3
  generate Z=8 if X<3
SAS
  if X=-1 then X=.;
  if X>3 then Y=9;
  if X<3 then Z=8;
Output data (. = missing):
SPSS
  Input X   X    Y   Z
  2         2    .   8
  3         3    .   .
  4         4    9   .
  -1        -1   .   .
Stata
  Input X   X    Y   Z
  2         2    .   8
  3         3    .   .
  4         4    9   .
  -1        .    9   .
SAS
  Input X   X    Y   Z
  2         2    .   8
  3         3    .   .
  4         4    9   .
  -1        .    .   8
What happens when a missing value is in a logical comparison?
• SPSS
– Logical expressions that include a missing value evaluate to “missing.” Usually, “missing” is treated as “false.”
• Stata
– Missing values are treated as numbers larger than any other number, like infinity. So, any number is less than a missing value.
• SAS
– Missing values are treated as numbers smaller than any other number, like minus infinity. So, any number is greater than a missing value.
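The three rules above are enough to reproduce the output tables for the example scripts. A minimal Python simulation, assuming only what this example needs (one missing code, -1, and the two IF conditions); it models these slides, not the packages' full semantics, and uses `None` for a missing value.

```python
import math
from typing import List, Optional, Tuple

Value = Optional[float]                  # None stands for a missing value
Row = Tuple[Value, Value, Value, Value]  # (input X, output X, Y, Z)


def spss(xs: List[float]) -> List[Row]:
    """MISSING VALUES X(-1). IF (X > 3) Y=9. IF (X < 3) Z=8."""
    rows = []
    for x in xs:
        user_missing = x == -1  # -1 stays in the data but is flagged as missing
        # A comparison with a missing value is itself missing; IF then skips the case.
        gt3 = None if user_missing else x > 3
        lt3 = None if user_missing else x < 3
        rows.append((x, x, 9 if gt3 is True else None, 8 if lt3 is True else None))
    return rows


def _recode_then_compare(xs: List[float], missing_rank: float) -> List[Row]:
    """Recode -1 to system missing, which then compares like missing_rank."""
    rows = []
    for x in xs:
        x_out = None if x == -1 else x
        key = missing_rank if x_out is None else x_out
        rows.append((x, x_out, 9 if key > 3 else None, 8 if key < 3 else None))
    return rows


def stata(xs: List[float]) -> List[Row]:
    """replace X=. if X==-1 / generate Y=9 if X>3 / generate Z=8 if X<3"""
    return _recode_then_compare(xs, math.inf)  # missing is larger than any number


def sas(xs: List[float]) -> List[Row]:
    """if X=-1 then X=.; if X>3 then Y=9; if X<3 then Z=8;"""
    return _recode_then_compare(xs, -math.inf)  # missing is smaller than any number


INPUT_X = [2, 3, 4, -1]
for name, run in [("SPSS", spss), ("Stata", stata), ("SAS", sas)]:
    print(name, run(INPUT_X))
```

Only the last case differs: the same-looking scripts give Y=9 in Stata, Z=8 in SAS, and neither in SPSS.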
Missing Values in Comparisons
Output data (. = missing; NULL, ∞ and -∞ show how each package treats the missing X in comparisons):
SPSS
  Input X   X      Y   Z
  2         2      .   8
  3         3      .   .
  4         4      9   .
  -1        NULL   .   .
Stata
  Input X   X      Y   Z
  2         2      .   8
  3         3      .   .
  4         4      9   .
  -1        ∞      9   .
SAS
  Input X   X      Y   Z
  2         2      .   8
  3         3      .   .
  4         4      9   .
  -1        -∞     .   8
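One way a package-neutral description could remove this ambiguity is to spell out the missing-value case in every condition. A minimal sketch, assuming a hypothetical SQL-like notation (`IS MISSING`, `IS NOT MISSING`) that is not SDTL syntax; the rule for each package is derived directly from the treatment above (SPSS: comparison is false when X is missing; Stata: missing behaves like +∞; SAS: like -∞).

```python
import math
import operator
from typing import Callable, Dict

OPS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt,
    "<=": operator.le, "==": operator.eq, "!=": operator.ne,
}
# Stand-in value for a missing X in comparisons, per the rules on the slide.
MISSING_STAND_IN: Dict[str, float] = {"Stata": math.inf, "SAS": -math.inf}


def true_when_missing(package: str, op: str) -> bool:
    """Does 'X <op> constant' count as true when X is missing in this package?"""
    if package == "SPSS":
        return False  # the comparison is missing, and IF treats missing as false
    # Any finite constant gives the same answer against +inf or -inf, so 0.0 is used.
    return OPS[op](MISSING_STAND_IN[package], 0.0)


def explicit_condition(package: str, var: str, op: str, constant: float) -> str:
    """Rewrite a comparison so its missing-value behaviour no longer depends on the package."""
    base = f"{var} {op} {constant}"
    if true_when_missing(package, op):
        return f"({base} OR {var} IS MISSING)"
    return f"({base} AND {var} IS NOT MISSING)"


for pkg in ("SPSS", "Stata", "SAS"):
    print(pkg, explicit_condition(pkg, "X", ">", 3), explicit_condition(pkg, "X", "<", 3))
```

Written this way, Stata's `if X>3` and SAS's `if X>3` no longer look identical, which is exactly the difference the output tables expose.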
Benefits of automated metadata capture
• Metadata will be better
– All the information in the CAI can be included.
– Variable transformations can be described.
• Automation will lower costs
– Metadata will not be discarded and re-created.
• All metadata will be standardized and machine-readable
– Codebooks with rich information can be rendered at will.
• If we make it easy and beneficial, researchers will use it.
Continuous Capture of Metadata for Statistical Data (NSF ACI-1640575)
Project Partners
• Inter-university Consortium for Political and Social Research (ICPSR), University of Michigan
• Colectica
• Metadata Technology North America
• Norwegian Centre for Research Data
• General Social Survey, NORC, University of Chicago
• American National Election Study, University of Michigan
Questions?
George Alter
altergc@umich.edu

 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 

Último (20)

Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
