SlideShare uma empresa Scribd logo
1 de 39
Baixar para ler offline
Research Data Overview
'A step by step guide through the research data lifecycle, data set
creation, big data vs long-tail, metadata, data centres/data repositories’
Sarah Callaghan*
[sarah.callaghan@stfc.ac.uk]
@sorcha.ni
OpenAIRE/LIBER Workshop
28 May 2013, Ghent Belgium
* and a lot of others, including, but not limited to: the NERC data citation and
publication project team, the PREPARDE project team and the CEDA team
VO Sandpit, November 2009
Who are we and why do we
care about data?
The UK’s Natural Environment Research Council (NERC)
funds six data centres which between them have
responsibility for the long-term management of NERC's
environmental data holdings.
We deal with a variety of environmental measurements,
along with the results of model simulations in:
• Atmospheric science
• Earth sciences
• Earth observation
• Marine Science
• Polar Science
• Terrestrial & freshwater science, Hydrology and
Bioinformatics

VO Sandpit, November 2009
The Scientific Method
A key part of the scientific method is
that it should be reproducible – other
people doing the same experiments in
the same way should get the same
results.
Unfortunately observational data is not
reproducible (unless you have a time
machine!)
The way data is organised and archived
is crucial to the reproducibility of
science and our ability to test
conclusions.
This is often the only part of the process
that anyone other than the originating
scientist sees. We want to change this.
http://www.mrsaverettsclassroom.com/bio
2-scientific-method.php

VO Sandpit, November 2009
The research data lifecycle
Creating
data

Reusing
data

Processing
data

Researchers are used to creating,
processing and analysing data.
Data repositories generally have the
job of preserving and giving access to
data.

Giving
access to
data

Analysing
data

Third parties, or even the original
researchers will reuse the data.

Preserving
data

See http://data-archive.ac.uk/createmanage/life-cycle for more detail

VO Sandpit, November 2009
What is a Dataset?
DataCite’s definition
(http://www.datacite.org/sites/default/files/Bu
siness_Models_Principles_v1.0.pdf):
Dataset: "Recorded information, regardless of
the form or medium on which it may be
recorded including writings, films, sound
recordings, pictorial reproductions,
drawings, designs, or other graphic
representations, procedural manuals, forms,
diagrams, work flow, charts, equipment
descriptions, data files, data processing or
computer programs (software), statistical
records, and other research data." (from the
U.S. National Institutes of Health (NIH)
Grants Policy Statement via DataCite's Best
Practice Guide for Data Citation).
VO Sandpit, November 2009

In my opinion a dataset is
something that is:
• The result of a defined
process
• Scientifically meaningful
• Well-defined (i.e. clear
definition of what is in the
dataset and what isn’t)
Creating a dataset is hard
work!

"Piled Higher and Deeper" by Jorge Cham
www.phdcomics.com

VO Sandpit, November 2009
But sometimes other
people don’t get it.

"Piled Higher and Deeper" by Jorge Cham
www.phdcomics.com

VO Sandpit, November 2009
Creating data: a radio
propagation dataset
The problem: rain and cloud
mess up your satellite radio
signal. How can we fix this?

Italsat F1: Owned and
operated by Italian
Space Agency (ASI).
Launched January
1991, ended
operational life
January 2001.

VO Sandpit, November 2009
The receive cabin at Sparsholt in
Hampshire

Inside the receive cabin – the
instruments my data came from

VO Sandpit, November 2009
Creating/processing data

One day’s worth of raw data from one of the
receivers
My job was to take this...
VO Sandpit, November 2009

...turn it into this....
Analysing data
…a process which involved 4
major steps, 4 different
computer programmes, and
16 intermediate files for each
day of measurements.
Each month of preproccessed
data represented somewhere
between a couple of days and
a week's worth of effort.
It was a job where attention to
detail was important, and you
really had to know what you
were looking at from a
scientific perspective.
...with the final result being this.
VO Sandpit, November 2009
Preserving data (the wrong way!)

Part of the Italsat data archive – on CDs
in a shelf in my office

VO Sandpit, November 2009
What the processed data
set looks like on disk

What the raw data files
looked like.
(I do have some Word
documents somewhere
which describe what all
this is…)

VO Sandpit, November 2009
Example documentation

Note the
software
filenames in the
documentation.
I still have the
IDL files on disk
somewhere, but
I’d be very
surprised if
they’re still
compatible with
the current
version of IDL

VO Sandpit, November 2009
Documentation can sometimes
produce mixed feelings

"Piled Higher and Deeper" by Jorge Cham
www.phdcomics.com

VO Sandpit, November 2009
What it all came down to:

Composite image from Flickr user bnilsen and Matt Stempeck (NOI), shared
under Creative Commons license

And I wasn’t even preserving my data properly!

VO Sandpit, November 2009
As for giving access to the data…

I did share, but there was a lot of non-disclosure agreements (I am not a lawyer!)
And I didn’t feel like I got the credit for it.(The first publication based on the data wasn’t
written by me, and I didn’t even get my name in the acknowledgements.)

VO Sandpit, November 2009
Good news: the
data is all on the
BADC now

VO Sandpit, November 2009
Another example: How is my
scarf like a dataset?
•
•
•
•
•
•
•
•

The raw material it’s made from doesn’t
contain information
But the act of knitting encodes information into
the scarf
The scarf is the result of a well defined
process (knitting) and has a particular method
used to create it
I need to be able to describe it
I need to be able to find it
I need to store it properly so it doesn't get lost,
or corrupted (i.e. eaten by moths or shredded
by mice)
I might need to recreate it so I need to keep
information about it
I put a lot of time and effort into making it, so
I’m very attached to it!

VO Sandpit, November 2009
Just like not all
scarves are the
same, not all
datasets are the
same!
http://www.flickr.com/photos/nazliceti
ner/6448303541/

http://www.flickr.com/photos/lo
vefibre/3251690074/

http://www.flickr.com/photos/maco_nix/50198
85742/

http://www.flickr.com/phot
os/halfbisqued/80841459
76/

If in doubt, ask the creator

http://www.flickr.com/phot
os/lucathegalga/2282305
884/

VO Sandpit, November 2009

http://www.flickr.com/
photos/ujkakevin/230
3531028/
Metadata
It is generally agreed that we need methods to:
• define and document datasets of importance.
• augment and/or annotate data
• amalgamate, reprocess and reuse data
To do this, we need metadata – data
about data

For example:
Longitude and latitude are metadata about the
planet.
• They are artificial
• They allow us to communicate about places on
a sphere
• They were principally designed by those who
needed to navigate the oceans, which are
lacking in visible features!

VO Sandpit, November 2009

http://www.kcoyle.net/meta_purpose.html

Metadata can often act as a
surrogate for the real thing, in
this case the planet.
Metadata for my scarf
•
•
•
•

Dataset views and suggested uses

Descriptive: “teal blue”, “scarf”
Dimensions: 200cm long, 20cm wide
Location: “Around my neck”/”Hanging on
the door of my wardrobe”
Identifier: KOI (knitted object identifier)

Information needed to recreate it:
• The raw material: King Cole Haze Glitter
DK, colourway 124 - Ocean, with dyelot
67233
• Needle size: 4mm
• Algorithm used to create it: 18 stitch feather
and fan stitch with 2 stitch garter stitch
border at the edges
• Number of stitches cast on: 54
• Tension (how tightly I knit in this pattern):
28 rows and 27 stitches for a 10cm by
10cm square
VO Sandpit, November 2009
Metadata for Discovery, Documentation,
Definition

Lawrence et al 2009, doi:10.1098/rsta.2008.0237

VO Sandpit, November 2009
MOLES: Metadata Objects for Linking
Environmental Sciences v3.4

http://proj.badc.rl.ac.uk/moles/browser/branches/V3.4/M
ODEL/Diagrams/MOLES3.4Summary.png

VO Sandpit, November 2009
What do data centres do?
Data Curation Lifecycle Model

The Digital Curation Centre’s
Curation Lifecycle Model
provides a graphical, high-level
overview of the stages required
for successful curation and
preservation of data from initial
conceptualisation or receipt
through the iterative curation
cycle.

http://www.dcc.ac.uk/resources/curation-lifecycle-model

VO Sandpit, November 2009
Data repository
workflows
• Workflows are
very varied! No onesize fits all method
• Can have multiple
workflows in the
same data centre,
depending on
interactions with
external sources
(“Engaged
submitter”/ “Data
dumper” / “Third
party requester”)

VO Sandpit, November 2009
Why should I bother putting
my data into a repository?

"Piled Higher and Deeper" by Jorge Cham
www.phdcomics.com

VO Sandpit, November 2009
It’s ok, I’ll just do regular backups

Phaistos Disk, 1700BC

These documents have been preserved for thousands of years!
But they’ve both been translated many times, with different meanings each time.
Data Preservation is not enough, we need Active Curation to preserve
Information

VO Sandpit, November 2009
VO Sandpit, November 2009
Example Big Data: CMIP5
CMIP5: Fifth Coupled Model
Intercomparison Project
• Global community activity under the
World Meteorological Organisation
(WMO) via the World Climate Research
Programme (WCRP)
•Aim:
– to address outstanding scientific
questions that arose as part of
the 4th Assessment Report
process,
– improve understanding of
climate, and
– to provide estimates of future
climate change that will be useful
to those considering its possible
consequences.

Take home points here:
Many distinct experiments, with very
different characteristics, which influence the
configuration of the models, (what they can
do, and how they should be interpreted).

VO Sandpit, November 2009
FAR:1990
SAR:1995
TAR:2001
AR4:2007
AR5:2013

VO Sandpit, November 2009
CMIP5 numbers!
Simulations:
~90,000 years
~60 experiments
~20 modelling centres (from around
the world) using
~30 major(*) model configurations
~2 million output “atomic” datasets
~10's of petabytes of output
~2 petabytes of CMIP5 requested
output
~1 petabyte of CMIP5 “replicated”
output
Which are replicated at a number of
sites (including ours)

Of the replicants:
~ 220 TB decadal
~ 540 TB long term
~ 220 TB atmosphere-only

~80 TB of 3hourly data
~215 TB of ocean 3d monthly data
~250 TB for the cloud feedbacks
~10 TB of land-biochemistry (from
the long term experiments alone)

VO Sandpit, November 2009
Handling the CMIP5 data

•
•

Major international
collaboration!
Funded by EU FP7
projects (IS-ENES,
Metafor) and US
(ESG) and other
national sources (e.g.
NERC for the UK)

http://esgf-index1.ceda.ac.uk/esgf-web-fe/

VO Sandpit, November 2009

33
Summary of the CMIP5 example
The Climate problem needs:
– Major physical e-infrastructure (networks, supercomputers)
– Comprehensive information architectures covering the whole information life
cycle, including annotation (particularly of quality)
… and hard work populating these information objects, particularly with
provenance detail.
– Sophisticated tools to produce and consume the data and information
objects
– State of the art access control techniques

Major distributed systems are social challenges as much as technical challenges.
CMIP5 is Big Data, with lots of different participants and lots of different
technologies. It also has a community willing to work together to standardise
and automate data and metadata production and curation.

VO Sandpit, November 2009

34
http://www.flickr.com/photos/zlatko/5975700417/

Big Data:
• Industrialised and standardised data
and metadata production
• Large groups of people involved
• Methods for attribution and credit for
data creation established

Long Tail Data:
• Bespoke data and metadata creation
methods
• Small groups/lone researchers
• No generally accepted methods for
attribution and credit for data creation

VO Sandpit, November 2009
Future role of the library
Domain specific repositories can:
• Pick and choose what data to keep
• Ask for (and get) more detailed metadata
• Provide specific tools and services
(visualisations, server-side processing,…)
• Deal with Big Data!
Libraries will need to:
• Pick up and manage/archive the long-tail
data where there isn’t a domain repository
• Have generalised, widely applicable
systems that can cope with subjects from
astronomy to zoology
• Be prepared to cope with anything!

VO Sandpit, November 2009
Don’t Panic!

There’s a lot of information out there
about managing data.
Some of it won’t suit what you’re
trying to do, but some will.
Learn from others’ experiences good and bad!
Good luck!

VO Sandpit, November 2009
Summary and maybe conclusions?
• Data is important, and becoming more
so for a far wider range of the
population
• Conclusions and knowledge are only
as good as the data they’re based on
• Science is supposed to be
reproducible and verifiable
• It’s up to us as scientists to care for
the data we’ve got and ensure that the
story of what we did to the data is
transparent
•So we can use the data again
•And so people will trust our results
• It’s not an easy job – but someone’s
got to do it!
VO Sandpit, November 2009
Thanks!

Any questions?
sarah.callaghan@stfc.ac.uk
@sorcha_ni
http://citingbytes.blogspot.co.uk/

Image credit: Borepatch http://borepatch.blogspot.com/2010/06/itsnot-what-you-dont-know-that-hurts.html

VO Sandpit, November 2009

Mais conteúdo relacionado

Mais procurados

Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012Sarah Callaghan
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiativeHerbert Van de Sompel
 
Preservation and institutional repositories for the digital arts and humanities
Preservation and institutional repositories for the digital arts and humanitiesPreservation and institutional repositories for the digital arts and humanities
Preservation and institutional repositories for the digital arts and humanitiesDorothea Salo
 
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...Herbert Van de Sompel
 
Risk management and auditing
Risk management and auditingRisk management and auditing
Risk management and auditingDorothea Salo
 
Publication and Dissemination of Data
Publication and Dissemination of DataPublication and Dissemination of Data
Publication and Dissemination of DataJames Baker
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaUniversity of Washington
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchUniversity of Washington
 
The four Es: Doing more with metadata
The four Es: Doing more with metadataThe four Es: Doing more with metadata
The four Es: Doing more with metadataTim Sherratt
 
Breaking the Data Management Barrier
Breaking the Data Management BarrierBreaking the Data Management Barrier
Breaking the Data Management BarrierKristin Briney
 
RDAP13 Lorrie Johnson: Facilitating Access to Scientific Data
RDAP13 Lorrie Johnson: Facilitating Access to Scientific DataRDAP13 Lorrie Johnson: Facilitating Access to Scientific Data
RDAP13 Lorrie Johnson: Facilitating Access to Scientific DataASIS&T
 
ODIN Project Presentation to CLOSER Leadership Team
ODIN Project Presentation to CLOSER Leadership TeamODIN Project Presentation to CLOSER Leadership Team
ODIN Project Presentation to CLOSER Leadership Teamjohnkayebl
 

Mais procurados (14)

Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012Presentation to EASE, Tallinn, June 2012
Presentation to EASE, Tallinn, June 2012
 
towards interoperable archives: the Universal Preprint Service initiative
towards interoperable archives:  the Universal Preprint Service initiativetowards interoperable archives:  the Universal Preprint Service initiative
towards interoperable archives: the Universal Preprint Service initiative
 
Preservation and institutional repositories for the digital arts and humanities
Preservation and institutional repositories for the digital arts and humanitiesPreservation and institutional repositories for the digital arts and humanities
Preservation and institutional repositories for the digital arts and humanities
 
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
The OAI-ORE Interoperability Framework in the Context of the Current Scholarl...
 
Risk management and auditing
Risk management and auditingRisk management and auditing
Risk management and auditing
 
Publication and Dissemination of Data
Publication and Dissemination of DataPublication and Dissemination of Data
Publication and Dissemination of Data
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and Myria
 
Virtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible ResearchVirtual Appliances, Cloud Computing, and Reproducible Research
Virtual Appliances, Cloud Computing, and Reproducible Research
 
End-to-End eScience
End-to-End eScienceEnd-to-End eScience
End-to-End eScience
 
The four Es: Doing more with metadata
The four Es: Doing more with metadataThe four Es: Doing more with metadata
The four Es: Doing more with metadata
 
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
 
Breaking the Data Management Barrier
Breaking the Data Management BarrierBreaking the Data Management Barrier
Breaking the Data Management Barrier
 
RDAP13 Lorrie Johnson: Facilitating Access to Scientific Data
RDAP13 Lorrie Johnson: Facilitating Access to Scientific DataRDAP13 Lorrie Johnson: Facilitating Access to Scientific Data
RDAP13 Lorrie Johnson: Facilitating Access to Scientific Data
 
ODIN Project Presentation to CLOSER Leadership Team
ODIN Project Presentation to CLOSER Leadership TeamODIN Project Presentation to CLOSER Leadership Team
ODIN Project Presentation to CLOSER Leadership Team
 

Destaque

OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...OpenAIRE
 
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesMarina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesOpenAIRE
 
OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...
OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...
OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...OpenAIRE
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE
 
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE
 
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)OpenAIRE
 
Zenodo Repository and the Open Research Data in H2020 (OAW2016)
Zenodo Repository and the Open Research Data in H2020 (OAW2016) Zenodo Repository and the Open Research Data in H2020 (OAW2016)
Zenodo Repository and the Open Research Data in H2020 (OAW2016) OpenAIRE
 

Destaque (7)

OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
OpenAIRE in 8 minutes - Introduction to European einfrastructures session at ...
 
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access PoliciesMarina Angelaki - PASTEUR4OA: Supporting Open Access Policies
Marina Angelaki - PASTEUR4OA: Supporting Open Access Policies
 
OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...
OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...
OpenAIRE: Services for Funders - Lightning Talk at #DI4R conference (Krakov, ...
 
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
OpenAIRE guidelines and broker service for repository managers - OpenAIRE #OA...
 
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
OpenAIRE webinar on Open Research Data in H2020 (OAW2016)
 
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
(Open) Research Data Management in H2020 (ISERD – Tel Aviv, Oct 31, 2016)
 
Zenodo Repository and the Open Research Data in H2020 (OAW2016)
Zenodo Repository and the Open Research Data in H2020 (OAW2016) Zenodo Repository and the Open Research Data in H2020 (OAW2016)
Zenodo Repository and the Open Research Data in H2020 (OAW2016)
 

Semelhante a Research Data Overview: Step-by-Step Guide

Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Jamie Bisset
 
Ischools workshop - 4 - data discovery
Ischools workshop - 4 - data discoveryIschools workshop - 4 - data discovery
Ischools workshop - 4 - data discoveryARDC
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementcunera
 
Project CAiRO Overview
Project CAiRO OverviewProject CAiRO Overview
Project CAiRO OverviewStephen Gray
 
Tools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenTools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenHeinz Pampel
 
Data Management and Horizon 2020
Data Management and Horizon 2020Data Management and Horizon 2020
Data Management and Horizon 2020Sarah Jones
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your RoleJay Gendron
 
Data management plans
Data management plansData management plans
Data management plansBrad Houston
 
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...hsuleslie
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Jeroen Rombouts
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8Scott Edmunds
 
The OpenCon Intro to Open Data
The OpenCon Intro to Open DataThe OpenCon Intro to Open Data
The OpenCon Intro to Open DataRoss Mounce
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer SchoolCarole Goble
 
Your Research Data Management with the support of 3TU.Datacentrum
Your Research Data Management with the support of 3TU.DatacentrumYour Research Data Management with the support of 3TU.Datacentrum
Your Research Data Management with the support of 3TU.DatacentrumAnnemiekvdKuil
 
KESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgKESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgAI4BD GmbH
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Anita de Waard
 
British Library Social Science National Postgraduate Training Day - Datasets ...
British Library Social Science National Postgraduate Training Day - Datasets ...British Library Social Science National Postgraduate Training Day - Datasets ...
British Library Social Science National Postgraduate Training Day - Datasets ...johnkayebl
 

Semelhante a Research Data Overview: Step-by-Step Guide (20)

Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction) Publishing your research: Research Data Management (Introduction)
Publishing your research: Research Data Management (Introduction)
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Ischools workshop - 4 - data discovery
Ischools workshop - 4 - data discoveryIschools workshop - 4 - data discovery
Ischools workshop - 4 - data discovery
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
Project CAiRO Overview
Project CAiRO OverviewProject CAiRO Overview
Project CAiRO Overview
 
Cairo
CairoCairo
Cairo
 
Tools für das Management von Forschungsdaten
Tools für das Management von ForschungsdatenTools für das Management von Forschungsdaten
Tools für das Management von Forschungsdaten
 
Data Management and Horizon 2020
Data Management and Horizon 2020Data Management and Horizon 2020
Data Management and Horizon 2020
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your Role
 
Data management plans
Data management plansData management plans
Data management plans
 
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
 
Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10Elag workshop sessie 1 en 2 v10
Elag workshop sessie 1 en 2 v10
 
HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8HKU Data Curation MLIM7350 Class 8
HKU Data Curation MLIM7350 Class 8
 
Smith - Developing Campus Stakeholders' Collaborations - Sept 8
Smith - Developing Campus Stakeholders' Collaborations - Sept 8Smith - Developing Campus Stakeholders' Collaborations - Sept 8
Smith - Developing Campus Stakeholders' Collaborations - Sept 8
 
The OpenCon Intro to Open Data
The OpenCon Intro to Open DataThe OpenCon Intro to Open Data
The OpenCon Intro to Open Data
 
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
Being FAIR:  FAIR data and model management SSBSS 2017 Summer SchoolBeing FAIR:  FAIR data and model management SSBSS 2017 Summer School
Being FAIR: FAIR data and model management SSBSS 2017 Summer School
 
Your Research Data Management with the support of 3TU.Datacentrum
Your Research Data Management with the support of 3TU.DatacentrumYour Research Data Management with the support of 3TU.Datacentrum
Your Research Data Management with the support of 3TU.Datacentrum
 
KESW2012 Hackathon St Petersburg
KESW2012 Hackathon St PetersburgKESW2012 Hackathon St Petersburg
KESW2012 Hackathon St Petersburg
 
Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013Talk at OHSU, September 25, 2013
Talk at OHSU, September 25, 2013
 
British Library Social Science National Postgraduate Training Day - Datasets ...
British Library Social Science National Postgraduate Training Day - Datasets ...British Library Social Science National Postgraduate Training Day - Datasets ...
British Library Social Science National Postgraduate Training Day - Datasets ...
 

Mais de OpenAIRE

10th OpenAIRE Content Providers Community Call
10th OpenAIRE Content Providers Community Call10th OpenAIRE Content Providers Community Call
10th OpenAIRE Content Providers Community CallOpenAIRE
 
9th Content Providers Community Call\
9th Content Providers Community Call\9th Content Providers Community Call\
9th Content Providers Community Call\OpenAIRE
 
OpenAIRE in the European Open Science Cloud (EOSC)
OpenAIRE in the European Open Science Cloud (EOSC)OpenAIRE in the European Open Science Cloud (EOSC)
OpenAIRE in the European Open Science Cloud (EOSC)OpenAIRE
 
8th Content Providers Community Call
8th Content Providers Community Call8th Content Providers Community Call
8th Content Providers Community CallOpenAIRE
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community CallOpenAIRE
 
OpenAIRE PROVIDE Dashboard for Turkish repository managers
OpenAIRE PROVIDE Dashboard for Turkish repository managersOpenAIRE PROVIDE Dashboard for Turkish repository managers
OpenAIRE PROVIDE Dashboard for Turkish repository managersOpenAIRE
 
What will it cost to manage and share my data?
What will it cost to manage and share my data?What will it cost to manage and share my data?
What will it cost to manage and share my data?OpenAIRE
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)OpenAIRE
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)OpenAIRE
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)OpenAIRE
 
6th Content Providers Community Call
6th Content Providers Community Call6th Content Providers Community Call
6th Content Providers Community CallOpenAIRE
 
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing DataOpenAIRE
 
20200504_Research Data & the GDPR: How Open is Open?
20200504_Research Data & the GDPR: How Open is Open?20200504_Research Data & the GDPR: How Open is Open?
20200504_Research Data & the GDPR: How Open is Open?OpenAIRE
 
20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open Science20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open ScienceOpenAIRE
 
20200429_Research Data & the GDPR: How Open is Open? (updated version)
20200429_Research Data & the GDPR: How Open is Open? (updated version)20200429_Research Data & the GDPR: How Open is Open? (updated version)
20200429_Research Data & the GDPR: How Open is Open? (updated version)OpenAIRE
 
20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open Science20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open ScienceOpenAIRE
 
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing DataOpenAIRE
 
COVID-19: Activities, tools, best practice and contact points in Greece
 COVID-19: Activities, tools, best practice and contact points in Greece COVID-19: Activities, tools, best practice and contact points in Greece
COVID-19: Activities, tools, best practice and contact points in GreeceOpenAIRE
 
5th Content Providers Community Call
5th Content Providers Community Call5th Content Providers Community Call
5th Content Providers Community CallOpenAIRE
 
4th Content Providers Community Call
4th Content Providers Community Call4th Content Providers Community Call
4th Content Providers Community CallOpenAIRE
 

Mais de OpenAIRE (20)

10th OpenAIRE Content Providers Community Call
10th OpenAIRE Content Providers Community Call10th OpenAIRE Content Providers Community Call
10th OpenAIRE Content Providers Community Call
 
9th Content Providers Community Call\
9th Content Providers Community Call\9th Content Providers Community Call\
9th Content Providers Community Call\
 
OpenAIRE in the European Open Science Cloud (EOSC)
OpenAIRE in the European Open Science Cloud (EOSC)OpenAIRE in the European Open Science Cloud (EOSC)
OpenAIRE in the European Open Science Cloud (EOSC)
 
8th Content Providers Community Call
8th Content Providers Community Call8th Content Providers Community Call
8th Content Providers Community Call
 
7th Content Providers Community Call
7th Content Providers Community Call7th Content Providers Community Call
7th Content Providers Community Call
 
OpenAIRE PROVIDE Dashboard for Turkish repository managers
OpenAIRE PROVIDE Dashboard for Turkish repository managersOpenAIRE PROVIDE Dashboard for Turkish repository managers
OpenAIRE PROVIDE Dashboard for Turkish repository managers
 
What will it cost to manage and share my data?
What will it cost to manage and share my data?What will it cost to manage and share my data?
What will it cost to manage and share my data?
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 3)
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 2)
 
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
Open Research Gateway for the ELIXIR-GR Infrastructure (Part 1)
 
6th Content Providers Community Call
6th Content Providers Community Call6th Content Providers Community Call
6th Content Providers Community Call
 
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200504_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
 
20200504_Research Data & the GDPR: How Open is Open?
20200504_Research Data & the GDPR: How Open is Open?20200504_Research Data & the GDPR: How Open is Open?
20200504_Research Data & the GDPR: How Open is Open?
 
20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open Science20200504_Data, Data Ownership and Open Science
20200504_Data, Data Ownership and Open Science
 
20200429_Research Data & the GDPR: How Open is Open? (updated version)
20200429_Research Data & the GDPR: How Open is Open? (updated version)20200429_Research Data & the GDPR: How Open is Open? (updated version)
20200429_Research Data & the GDPR: How Open is Open? (updated version)
 
20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open Science20200429_Data, Data Ownership and Open Science
20200429_Data, Data Ownership and Open Science
 
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
20200429_OpenAIRE Legal Policy Webinar: GDPR and Sharing Data
 
COVID-19: Activities, tools, best practice and contact points in Greece
 COVID-19: Activities, tools, best practice and contact points in Greece COVID-19: Activities, tools, best practice and contact points in Greece
COVID-19: Activities, tools, best practice and contact points in Greece
 
5th Content Providers Community Call
5th Content Providers Community Call5th Content Providers Community Call
5th Content Providers Community Call
 
4th Content Providers Community Call
4th Content Providers Community Call4th Content Providers Community Call
4th Content Providers Community Call
 

Último

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 

Último (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 

Research Data Overview: Step-by-Step Guide

  • 1. Research Data Overview 'A step by step guide through the research data lifecycle, data set creation, big data vs long-tail, metadata, data centres/data repositories’ Sarah Callaghan* [sarah.callaghan@stfc.ac.uk] @sorcha.ni OpenAIRE/LIBER Workshop 28 May 2013, Ghent Belgium * and a lot of others, including, but not limited to: the NERC data citation and publication project team, the PREPARDE project team and the CEDA team VO Sandpit, November 2009
  • 2. Who are we and why do we care about data? The UK’s Natural Environment Research Council (NERC) funds six data centres which between them have responsibility for the long-term management of NERC's environmental data holdings. We deal with a variety of environmental measurements, along with the results of model simulations in: • Atmospheric science • Earth sciences • Earth observation • Marine Science • Polar Science • Terrestrial & freshwater science, Hydrology and Bioinformatics VO Sandpit, November 2009
  • 3. The Scientific Method A key part of the scientific method is that it should be reproducible – other people doing the same experiments in the same way should get the same results. Unfortunately observational data is not reproducible (unless you have a time machine!) The way data is organised and archived is crucial to the reproducibility of science and our ability to test conclusions. This is often the only part of the process that anyone other than the originating scientist sees. We want to change this. http://www.mrsaverettsclassroom.com/bio 2-scientific-method.php VO Sandpit, November 2009
  • 4. The research data lifecycle Creating data Reusing data Processing data Researchers are used to creating, processing and analysing data. Data repositories generally have the job of preserving and giving access to data. Giving access to data Analysing data Third parties, or even the original researchers will reuse the data. Preserving data See http://data-archive.ac.uk/createmanage/life-cycle for more detail VO Sandpit, November 2009
  • 5. What is a Dataset? DataCite’s definition (http://www.datacite.org/sites/default/files/Bu siness_Models_Principles_v1.0.pdf): Dataset: "Recorded information, regardless of the form or medium on which it may be recorded including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data." (from the U.S. National Institutes of Health (NIH) Grants Policy Statement via DataCite's Best Practice Guide for Data Citation). VO Sandpit, November 2009 In my opinion a dataset is something that is: • The result of a defined process • Scientifically meaningful • Well-defined (i.e. clear definition of what is in the dataset and what isn’t)
  • 6. Creating a dataset is hard work! "Piled Higher and Deeper" by Jorge Cham www.phdcomics.com VO Sandpit, November 2009
  • 7. But sometimes other people don’t get it. "Piled Higher and Deeper" by Jorge Cham www.phdcomics.com VO Sandpit, November 2009
  • 8. Creating data: a radio propagation dataset The problem: rain and cloud mess up your satellite radio signal. How can we fix this? Italsat F1: Owned and operated by Italian Space Agency (ASI). Launched January 1991, ended operational life January 2001. VO Sandpit, November 2009
  • 9. The receive cabin at Sparsholt in Hampshire Inside the receive cabin – the instruments my data came from VO Sandpit, November 2009
  • 10. Creating/processing data One day’s worth of raw data from one of the receivers My job was to take this... VO Sandpit, November 2009 ...turn it into this....
  • 11. Analysing data …a process which involved 4 major steps, 4 different computer programmes, and 16 intermediate files for each day of measurements. Each month of preproccessed data represented somewhere between a couple of days and a week's worth of effort. It was a job where attention to detail was important, and you really had to know what you were looking at from a scientific perspective. ...with the final result being this. VO Sandpit, November 2009
  • 12. Preserving data (the wrong way!) Part of the Italsat data archive – on CDs in a shelf in my office VO Sandpit, November 2009
  • 13. What the processed data set looks like on disk What the raw data files looked like. (I do have some Word documents somewhere which describe what all this is…) VO Sandpit, November 2009
  • 14. Example documentation Note the software filenames in the documentation. I still have the IDL files on disk somewhere, but I’d be very surprised if they’re still compatible with the current version of IDL VO Sandpit, November 2009
  • 15. Documentation can sometimes produce mixed feelings "Piled Higher and Deeper" by Jorge Cham www.phdcomics.com VO Sandpit, November 2009
  • 16. What it all came down to: Composite image from Flickr user bnilsen and Matt Stempeck (NOI), shared under Creative Commons license And I wasn’t even preserving my data properly! VO Sandpit, November 2009
  • 17. As for giving access to the data… I did share, but there was a lot of non-disclosure agreements (I am not a lawyer!) And I didn’t feel like I got the credit for it.(The first publication based on the data wasn’t written by me, and I didn’t even get my name in the acknowledgements.) VO Sandpit, November 2009
  • 18. Good news: the data is all on the BADC now VO Sandpit, November 2009
  • 19. Another example: How is my scarf like a dataset? • • • • • • • • The raw material it’s made from doesn’t contain information But the act of knitting encodes information into the scarf The scarf is the result of a well defined process (knitting) and has a particular method used to create it I need to be able to describe it I need to be able to find it I need to store it properly so it doesn't get lost, or corrupted (i.e. eaten by moths or shredded by mice) I might need to recreate it so I need to keep information about it I put a lot of time and effort into making it, so I’m very attached to it! VO Sandpit, November 2009
  • 20. Just like not all scarves are the same, not all datasets are the same! http://www.flickr.com/photos/nazliceti ner/6448303541/ http://www.flickr.com/photos/lo vefibre/3251690074/ http://www.flickr.com/photos/maco_nix/50198 85742/ http://www.flickr.com/phot os/halfbisqued/80841459 76/ If in doubt, ask the creator http://www.flickr.com/phot os/lucathegalga/2282305 884/ VO Sandpit, November 2009 http://www.flickr.com/ photos/ujkakevin/230 3531028/
  • 21. Metadata It is generally agreed that we need methods to: • define and document datasets of importance. • augment and/or annotate data • amalgamate, reprocess and reuse data To do this, we need metadata – data about data For example: Longitude and latitude are metadata about the planet. • They are artificial • They allow us to communicate about places on a sphere • They were principally designed by those who needed to navigate the oceans, which are lacking in visible features! VO Sandpit, November 2009 http://www.kcoyle.net/meta_purpose.html Metadata can often act as a surrogate for the real thing, in this case the planet.
  • 22. Metadata for my scarf • • • • Dataset views and suggested uses Descriptive: “teal blue”, “scarf” Dimensions: 200cm long, 20cm wide Location: “Around my neck”/”Hanging on the door of my wardrobe” Identifier: KOI (knitted object identifier) Information needed to recreate it: • The raw material: King Cole Haze Glitter DK, colourway 124 - Ocean, with dyelot 67233 • Needle size: 4mm • Algorithm used to create it: 18 stitch feather and fan stitch with 2 stitch garter stitch border at the edges • Number of stitches cast on: 54 • Tension (how tightly I knit in this pattern): 28 rows and 27 stitches for a 10cm by 10cm square VO Sandpit, November 2009
  • 23. Metadata for Discovery, Documentation, Definition Lawrence et al 2009, doi:10.1098/rsta.2008.0237 VO Sandpit, November 2009
  • 24. MOLES: Metadata Objects for Linking Environmental Sciences v3.4 http://proj.badc.rl.ac.uk/moles/browser/branches/V3.4/M ODEL/Diagrams/MOLES3.4Summary.png VO Sandpit, November 2009
  • 25. What do data centres do? Data Curation Lifecycle Model The Digital Curation Centre’s Curation Lifecycle Model provides a graphical, high-level overview of the stages required for successful curation and preservation of data from initial conceptualisation or receipt through the iterative curation cycle. http://www.dcc.ac.uk/resources/curation-lifecycle-model VO Sandpit, November 2009
  • 26. Data repository workflows • Workflows are very varied! No onesize fits all method • Can have multiple workflows in the same data centre, depending on interactions with external sources (“Engaged submitter”/ “Data dumper” / “Third party requester”) VO Sandpit, November 2009
  • 27. Why should I bother putting my data into a repository? "Piled Higher and Deeper" by Jorge Cham www.phdcomics.com VO Sandpit, November 2009
  • 28. It’s ok, I’ll just do regular backups Phaistos Disk, 1700BC These documents have been preserved for thousands of years! But they’ve both been translated many times, with different meanings each time. Data Preservation is not enough, we need Active Curation to preserve Information VO Sandpit, November 2009
  • 30. Example Big Data: CMIP5 CMIP5: Fifth Coupled Model Intercomparison Project • Global community activity under the World Meteorological Organisation (WMO) via the World Climate Research Programme (WCRP) •Aim: – to address outstanding scientific questions that arose as part of the 4th Assessment Report process, – improve understanding of climate, and – to provide estimates of future climate change that will be useful to those considering its possible consequences. Take home points here: Many distinct experiments, with very different characteristics, which influence the configuration of the models, (what they can do, and how they should be interpreted). VO Sandpit, November 2009
  • 32. CMIP5 numbers! Simulations: ~90,000 years ~60 experiments ~20 modelling centres (from around the world) using ~30 major(*) model configurations ~2 million output “atomic” datasets ~10's of petabytes of output ~2 petabytes of CMIP5 requested output ~1 petabyte of CMIP5 “replicated” output Which are replicated at a number of sites (including ours) Of the replicants: ~ 220 TB decadal ~ 540 TB long term ~ 220 TB atmosphere-only ~80 TB of 3hourly data ~215 TB of ocean 3d monthly data ~250 TB for the cloud feedbacks ~10 TB of land-biochemistry (from the long term experiments alone) VO Sandpit, November 2009
  • 33. Handling the CMIP5 data • • Major international collaboration! Funded by EU FP7 projects (IS-ENES, Metafor) and US (ESG) and other national sources (e.g. NERC for the UK) http://esgf-index1.ceda.ac.uk/esgf-web-fe/ VO Sandpit, November 2009 33
  • 34. Summary of the CMIP5 example The Climate problem needs: – Major physical e-infrastructure (networks, supercomputers) – Comprehensive information architectures covering the whole information life cycle, including annotation (particularly of quality) … and hard work populating these information objects, particularly with provenance detail. – Sophisticated tools to produce and consume the data and information objects – State of the art access control techniques Major distributed systems are social challenges as much as technical challenges. CMIP5 is Big Data, with lots of different participants and lots of different technologies. It also has a community willing to work together to standardise and automate data and metadata production and curation. VO Sandpit, November 2009 34
  • 35. http://www.flickr.com/photos/zlatko/5975700417/ Big Data: • Industrialised and standardised data and metadata production • Large groups of people involved • Methods for attribution and credit for data creation established Long Tail Data: • Bespoke data and metadata creation methods • Small groups/lone researchers • No generally accepted methods for attribution and credit for data creation VO Sandpit, November 2009
  • 36. Future role of the library Domain specific repositories can: • Pick and choose what data to keep • Ask for (and get) more detailed metadata • Provide specific tools and services (visualisations, server-side processing,…) • Deal with Big Data! Libraries will need to: • Pick up and manage/archive the long-tail data where there isn’t a domain repository • Have generalised, widely applicable systems that can cope with subjects from astronomy to zoology • Be prepared to cope with anything! VO Sandpit, November 2009
  • 37. Don’t Panic! There’s a lot of information out there about managing data. Some of it won’t suit what you’re trying to do, but some will. Learn from others’ experiences good and bad! Good luck! VO Sandpit, November 2009
  • 38. Summary and maybe conclusions? • Data is important, and becoming more so for a far wider range of the population • Conclusions and knowledge are only as good as the data they’re based on • Science is supposed to be reproducible and verifiable • It’s up to us as scientists to care for the data we’ve got and ensure that the story of what we did to the data is transparent •So we can use the data again •And so people will trust our results • It’s not an easy job – but someone’s got to do it! VO Sandpit, November 2009
  • 39. Thanks! Any questions? sarah.callaghan@stfc.ac.uk @sorcha_ni http://citingbytes.blogspot.co.uk/ Image credit: Borepatch http://borepatch.blogspot.com/2010/06/itsnot-what-you-dont-know-that-hurts.html VO Sandpit, November 2009