PowerPoint presentation used to support the 'Ecosystem data and TERN' workshop on 19 May 2014, held at Macquarie University in Sydney as part of the Genes to Geosciences seminar series.
Ecosystem data and TERN: Genes to geosciences workshop 19 May 2014
1. Ecosystem Data and Australia’s TERN:
Making the most of TERN to benefit
your research and data management!
A workshop for the “Genes to Geosciences” Series
Macquarie University, May 19, 2014: 1000 – 1500 hrs
2. Contents
1. Welcome and Introductions
2. TERN and the Research Cycle and Data Cycle
3. Australian Ecosystem Data
• what’s available
• data discovery
• evaluation of data – is it suitable for my needs?
• download and appropriate re-use
4. eMAST Example - New possibilities with ecosystem data
5. Data Management and Publishing
• why does it matter and how can it help you
• data management plans
• data publishing – what are your options and why does it matter
• data publishers – a continuum of approaches
• data publishing options with TERN
6. Wrap-up and Exit Survey
3. Who are we?
To understand your current practices and topics of interest, we ran a survey
beforehand.
Have you previously searched for and accessed data from a public repository?
Yes: 7 No: 5
Do you have a data management plan?
Yes: 4 No: 8
Have you published data?
Yes: 6 No: 6
Survey – your prior knowledge, experience, and
requests for today
4. • To explain and demonstrate options available to the ecosystem
science research community to use online resources for
searching, evaluating, downloading, publishing and managing
ecosystem data sets.
• Focus on activity and learning-by-doing, rather than too much
talking
• To recognise the different needs of researchers in different positions
and at different stages of their research careers.
1. Aims and outcomes
5. • What will you walk away with?
- Better understanding of the national research infrastructure available
to you – TERN
- A sense of the kinds of ecosystem data that are available, and how you can
get them
- Experience searching, assessing and downloading data for your
research
- Understanding the principles of good data management and the
benefits for you
- Appreciation of the options for data management
- Introduction to tools for managing your data, including TERN
infrastructure
1. Aims and outcomes
6. 2. What is TERN?
• Infrastructure and networks to support a coordinated, collaborative ecosystem
science community
• Enabling sustained, long-term collection, storage, synthesis and sharing of
ecosystem data
• Connecting science with policy and management
9. Research lifecycle
[Figure: the research lifecycle, centred on Ecosystem Science. Stages:
- Knowledge gap: research questions
- Proposal and planning
- Data collection, verification, quality assurance and control
- Data + meta-data, licensing
- Storage, preservation and discoverability of data
- Data analysis, integration and synthesis
- Research output: new data and publications
Annotations: enables large scale and coordinated data collection, sharing and multiple re-uses; enhanced ability to revise, question and expand knowledge.]
10. Research lifecycle: this morning [figure repeated from slide 9]
11. 3. Australian Ecosystem Data
• Learning Objectives:
To identify the following resources for Australian ecosystem science applications:
- ecosystem data stores
- meta-data portals
- data publishers
• Sections:
• 1030-1040 Data discovery
• 1040-1055 Data discovery - exercise
• 1055-1125 Evaluation of data – is it suitable for my needs?
• 1125-1145 Download and appropriate re-use
• 1145-1215 eMAST Possibilities
12. Data Discovery
Learning objectives:
To understand how to approach data discovery through
systematic use of ecosystem data stores, portals and data
journals.
15. TERN’s data portals and meta-data structure:
AusCover
OzFlux
AusPlots and Transects
Coasts
Soils
Supersites Network and LTERN
eMAST
AeKOS (Eco-informatics)
TERN Data Discovery Portal
16. TERN Data (facility; kind of data available; where can I access [+ submit] data?)
AusCover
- Data: Remote sensing data and derived products covering land cover; ecosystem variables; fire; surface radiation; meteorology; base satellite data and inputs to satellite processing; site-based datasets.
- Access: via TDDP (TERN Data Discovery Portal) or the AusCover portal: www.auscover.org.au/data/product-list
- Submit: matt.paget@csiro.au
AusPlots
- Data: Vegetation and soil surveys and samples; photopoints. Over 330 sites sampled so far; as at March 2014, data from ~130 rangelands sites available, with more coming soon.
- Access: via the AEKOS data portal www.aekos.org.au or Soils to Satellites soils2sat.ala.org.au/ (in future will also be searchable from TDDP)
- Specimens (vegetation voucher samples and soils): ian@ausplots.org.au
- Photopoints: contact ben@ausplots.org.au
ACEAS (Australian Centre for Ecological Analysis and Synthesis)
- Data: Synthesised data products from ACEAS working groups.
- Access: via TDDP or the ACEAS portal: aceas-data.science.uq.edu.au/portal/
- Submit: s.guru@uq.edu.au
17. TERN Data (facility; kind of data available; where can I access [+ submit] data?)
ACEF (Australian Coastal Ecosystems Facility)
- Data: Key datasets include coastal bathymetry, coastal habitats, water quality, beach morphology, turtle distribution and habitat.
- Access: via TDDP or the ACEF portal: acef.tern.org.au/portal/
- Submit: jonathan.hodge@csiro.au
Australian SuperSite Network (ASN)
- Data: Vegetation composition, structure and cover; fauna surveys; soil properties; gas and energy flux (see OzFlux below); meteorology; surface, ground and soil water.
- Access: via TDDP or the ASN portal: www.tern-supersites.net.au/knb/
- Submit: shiela.lloyd@jcu.edu.au
Australian Transect Network (ATN)
- Data: Vegetation and soil surveys, including specimens.
- Access: via the AEKOS data portal www.aekos.org.au or Soils to Satellites soils2sat.ala.org.au/ (in future will also be searchable from TDDP)
- Specimens (vegetation voucher samples and soils): stefan.caddy-retalic@adelaide.edu.au
Eco-informatics
- Data: Ecological data from individual sites and from broadscale surveys: data from AusPlots and the Australian Transect Network, alongside key data from State and Federal partners. See the AEKOS data publication schedule for more detail.
- Access: www.aekos.org.au (metadata currently being submitted to TDDP)
- Submit: www.aekos.org.au/access_shared
18. TERN Data (facility; kind of data available; where can I access [+ submit] data?)
eMAST (Ecosystem Modelling and Scaling Infrastructure)
- Data: Modelled climate and land surface data derived from surface observations.
- Access: partially available via eMAST: www.tern.org.au/e-MAST-Data-Products-pg26355.html (metadata currently being submitted to TDDP)
- Submit: bradley.evans@mq.edu.au
LTERN (Long-Term Ecological Research Network)
- Data: Vegetation composition, structure and cover; fauna surveys; surface, ground and soil water.
- Access: via TDDP or the LTERN portal: www.ltern.org.au/knb/
- Contact: emma.burns@anu.edu.au
OzFlux
- Data: CO2 and other gas concentrations and fluxes; evapotranspiration; surface energy balance; carbon and water cycles.
- Access: via TDDP or the OzFlux portal: ozflux.its.monash.edu.au/ecosystem/home
- Submit: pisaac.ozflux@gmail.com
Soil and Landscape Grid of Australia
- Data: Functional soil attributes and key landscape features.
- Access: under development; best available data products via TDDP: http://portal.tern.org.au/search#!/q=soils/p=1/tab=collection/group=Soils/num=10
- Submit: mike.grundy@csiro.au
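When deciding where to look first, it can help to hold the facility tables above in a small lookup structure. The sketch below is hypothetical: the themes are abridged from the 2014 tables, the access routes are shortened, and `FACILITIES` and `facilities_for` are illustrative names, not a TERN API.

```python
# Hypothetical lookup condensed from the TERN facility tables (as at 2014).
# Each facility maps to (abridged data themes, main access route).
FACILITIES = {
    "AusCover": (["remote sensing", "land cover", "ecosystem variables", "fire",
                  "surface radiation", "meteorology"], "TDDP or AusCover portal"),
    "AusPlots": (["vegetation surveys", "soil surveys", "photopoints"],
                 "AEKOS or Soils to Satellites"),
    "ACEAS": (["synthesised data products"], "TDDP or ACEAS portal"),
    "ACEF": (["coastal bathymetry", "coastal habitats", "water quality",
              "beach morphology", "turtle distribution"], "TDDP or ACEF portal"),
    "ASN": (["vegetation", "fauna surveys", "soil properties", "gas and energy flux",
             "meteorology", "surface, ground and soil water"], "TDDP or ASN portal"),
    "ATN": (["vegetation surveys", "soil surveys", "specimens"],
            "AEKOS or Soils to Satellites"),
    "eMAST": (["modelled climate", "land surface data"], "eMAST data products page"),
    "LTERN": (["vegetation", "fauna surveys", "surface, ground and soil water"],
              "TDDP or LTERN portal"),
    "OzFlux": (["co2 and other gas fluxes", "evapotranspiration",
                "surface energy balance", "carbon and water cycles"],
               "TDDP or OzFlux portal"),
    "Soil and Landscape Grid of Australia": (["soil attributes", "landscape features"],
                                             "TDDP"),
}


def facilities_for(theme):
    """Return sorted (facility, access route) pairs whose themes mention `theme`."""
    theme = theme.lower()  # themes are stored in lower case
    return sorted(
        (name, route)
        for name, (themes, route) in FACILITIES.items()
        if any(theme in t for t in themes)
    )


for name, route in facilities_for("soil"):
    print(f"{name}: {route}")
```

Searching for "soil" returns ASN, ATN, AusPlots, LTERN and the Soil and Landscape Grid, which is a reminder that soil data are spread across several facilities rather than one.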
23. Data Discovery - Exercise
Exercise:
• Using the TERN Data Discovery Portal:
http://portal.tern.org.au
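The soils link in the TERN Data table suggests that TDDP searches in 2014 were encoded in the URL fragment as `key=value` segments. The sketch below rebuilds links in that style so a search can be saved or shared; the layout is inferred from that single example URL, the `tddp_search_url` helper is hypothetical, and the live portal may no longer accept this form.

```python
from urllib.parse import quote

# Base URL and fragment layout inferred from one 2014 example link; both may have changed.
TDDP_SEARCH_BASE = "http://portal.tern.org.au/search"


def tddp_search_url(query, group=None, page=1, num=10, tab="collection"):
    """Build a search link in the '#!/q=.../p=.../tab=...' style of the 2014 portal."""
    # safe="" also escapes "/" so a query cannot break the fragment's segments
    parts = [f"q={quote(query, safe='')}", f"p={page}", f"tab={tab}"]
    if group is not None:
        parts.append(f"group={quote(group, safe='')}")
    parts.append(f"num={num}")
    return TDDP_SEARCH_BASE + "#!/" + "/".join(parts)


print(tddp_search_url("soils", group="Soils"))
print(tddp_search_url("leaf area index"))
```

The first call reproduces the soils link from the table exactly; the second shows how spaces in a query are percent-encoded.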
24. Data Download and Evaluation
Learning objective
To understand how to effectively search, download and critically assess
ecosystem data sets for use in your own work from: ecosystem data stores,
portals and data journals.
25. Evaluation of data – is it suitable for my needs?
Exercise
Exercise:
• Evaluating your chosen dataset:
• What is the metadata?
• What do different parts of the metadata mean?
• Is this going to be useful for you?
• Criteria to use for evaluation?
Data format (s)
Data currency
Data collection methods
Data QA/QC
Data licence
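The five evaluation criteria can be turned into a simple checklist applied to what a metadata record says. This is a minimal sketch: `DatasetMetadata` and `evaluate` are hypothetical names, and the thresholds (acceptable formats, oldest acceptable update date) are choices you would make for your own project.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical summary of the metadata fields used by the evaluation criteria."""
    title: str
    formats: List[str] = field(default_factory=list)  # data format(s)
    last_updated: Optional[date] = None               # data currency
    collection_methods: str = ""                      # how the data were collected
    qa_qc: str = ""                                   # QA/QC description
    licence: str = ""                                 # licence name or code


def evaluate(meta, usable_formats, oldest_acceptable):
    """Return a list of concerns; an empty list means every criterion is met."""
    concerns = []
    if not {f.lower() for f in meta.formats} & {f.lower() for f in usable_formats}:
        concerns.append("format: none of the formats is usable")
    if meta.last_updated is None or meta.last_updated < oldest_acceptable:
        concerns.append("currency: too old or undated")
    if not meta.collection_methods.strip():
        concerns.append("methods: collection methods not described")
    if not meta.qa_qc.strip():
        concerns.append("QA/QC: no quality information")
    if not meta.licence.strip():
        concerns.append("licence: re-use terms unknown")
    return concerns


meta = DatasetMetadata("Example flux tower record", formats=["NetCDF"],
                       last_updated=date(2013, 12, 1),
                       collection_methods="Eddy covariance", licence="CC BY")
print(evaluate(meta, {"netcdf", "csv"}, date(2010, 1, 1)))
```

For this made-up record the only concern reported is the missing QA/QC description.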
26. Download and Appropriate Re-use of Data
Learning Objective:
To understand what data “licensing” is from the research producer, user and
owner’s points of view.
What do licences mean?
If you download data with a licence, what are your obligations for re-use?
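The slides do not name a licence family, but many Australian research datasets are released under Creative Commons licences. Assuming a CC licence, the sketch below maps its code to the obligations each licence element places on a re-user. The wording is a plain-language summary, not legal text, and `reuse_obligations` is a hypothetical helper.

```python
# Creative Commons licence elements and the obligation each places on a re-user
# (plain-language summary; the licence deed itself is authoritative).
CC_ELEMENTS = {
    "BY": "attribute the creator as they request",
    "NC": "do not use the data for commercial purposes",
    "ND": "do not distribute modified or derived versions",
    "SA": "release derived works under the same licence",
}


def reuse_obligations(licence_code):
    """Map a code such as 'CC BY-NC 4.0' to a list of re-use obligations.

    CC0 (a public-domain dedication) yields no obligations. Non-CC licences
    are not handled and also return an empty list: check those manually.
    """
    code = licence_code.strip().upper()
    if not code.startswith("CC"):
        return []
    tokens = code[2:].split()  # e.g. ["BY-NC", "4.0"]; the version is ignored
    if not tokens:
        return []
    return [CC_ELEMENTS[e] for e in tokens[0].split("-") if e in CC_ELEMENTS]


for code in ("CC BY 4.0", "CC BY-NC-SA 3.0 AU", "CC0"):
    print(code, "->", reuse_obligations(code))
```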
29. ecosystem Modelling And Scaling
infrasTructure (eMAST)
Integrating multiple data sets
Presentation by Brad Evans based on contributions by Colin
Prentice, Michael Hutchinson, Gab Abramowitz, Ben Evans,
Rhys Whitley, Julie Pauwels
33. Research domain: Impacts of rising CO2
Thus the ecosystem modeller seeks to:
1. Understand the effects of CO2 increases on
ecosystems
2. Quantify negative feedbacks – the impact of
rising CO2, land surface warming and
extreme events on ecosystems
$$6\,\mathrm{CO_2} + 6\,\mathrm{H_2O} \xrightarrow[\text{chlorophyll + nutrients}]{\text{light energy}} \mathrm{C_6H_{12}O_6} + 6\,\mathrm{O_2}$$
34. IPCC Consensus: CO2 Fertilization
$$\mathrm{WUE} = \frac{\mathrm{GPP}}{\mathrm{ET}} \qquad \mathrm{NPP} = \mathrm{GPP} - R$$
N & P
Land Surface Models -> Coupled to Climate Models
Other approaches
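The two ratios on the slide are simple to compute from annual site totals. The values below are made up for illustration; R here is taken to be plant (autotrophic) respiration, since that is the term subtracted from GPP to give NPP.

```python
def water_use_efficiency(gpp, et):
    """WUE = GPP / ET; units follow the inputs (e.g. g C per kg H2O)."""
    if et <= 0:
        raise ValueError("ET must be positive to compute WUE")
    return gpp / et


def net_primary_production(gpp, r):
    """NPP = GPP - R, with R the plants' own (autotrophic) respiration in GPP's units."""
    return gpp - r


# Illustrative annual totals for one site (made-up values):
gpp = 1500.0  # g C m-2 yr-1
et = 600.0    # kg H2O m-2 yr-1 (equivalent to 600 mm of water)
r = 750.0     # g C m-2 yr-1
print(water_use_efficiency(gpp, et))   # 2.5 g C per kg H2O
print(net_primary_production(gpp, r))  # 750.0 g C m-2 yr-1
```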
35. Observations, models and policy
(1) MORE observations -> (2) BETTER models are developed -> (3) Models evaluated against observations -> (4) EVEN BETTER models -> (5) BETTER policy
A virtuous cycle
36. Unifying principles for ecosystem modellers
# 1: Observations, Models and Understanding:
Integration of empirical science and modelling
improves scientific understanding.
# 2: Transparency, Evaluation, Confidence:
Reproducible models, evaluated with observations,
enhance model efficacy.
# 3: Innovation, Standards, Simplicity: Continuous
innovation, use of standards, and avoidance of
unnecessary complexity.
37. eMAST Observations and Models
[Diagram linking observation sources, models, data infrastructure and evaluation:]
- Observations: OzFlux (CO2 and water fluxes); Plot Networks (vegetation observations via AeKOS and others); AusCover (remote sensing: satellite, in-situ & obs.); Bureau of Meteorology and Geoscience Australia; Soils (properties of soil)
- Models: Land Surface Models
- Data services and infrastructure: dap.nci.org.au, geonetwork, TERN TDDP, tern.org.au, RDSI VMs, raijin@nci, INTERSECT, NeCTAR, NeCTAR Virtual Labs
- Evaluation: PALS
38. eMAST Delivers in 2014-2015: 1 of 3
Simple land surface process models
• eMAST R-Package: MQ & ANU Bioclimate indices and surface processes
• eMAST Earth System Model Connex (C++ & FORTRAN): MQ & ANU
Bioclimate indices and surface processes coupled to ACCESS and other
Earth System Models
• ePiSaT R-Package: Continental Gross Primary Production (data model
fusion)
• Community R-Packages: Hutchinson Drought & BoM Heatwave – in kind
from Ivan Hanigan (ANU)
• pyeMAST: Python version of eMAST tools including big data services
(connectivity with SPEDDEXES).
Statistical land surface models
• Data Assimilation: Ensemble Kalman Filter coupled to process based land
surface model (Renzullo, CSIRO)
• Fubaar: Machine learning land surface model (in-kind MQ – Keenan)
Open source!
Tools
39. eMAST Delivers in 2014-2015: 2 of 3
Observation assimilation into Models
• eMAST Ecosystem Model Parameters Database (EMP DB).
• NCAR’s Data Assimilation Research Testbed (DART)
• DART-CESM: In collaboration with NEON, Inc. (USA)
• DART-CABLE: In collaboration with the NCI, NCAR and CSIRO
• Assimilation of : fluxes, leaf properties, plot network observations
Modelled Data discovery and ACCESS Tools
• SPEDDEXES: A community-based solution to (a) publishing big data, (b)
sharing big data, (c) discovering big data and (d) programmatic access to
big data on Australia’s eResearch infrastructure.
• SPEDDEXES@NeCTAR-VLs: Collaborative extension of the SPEDDEXES tools
to the NeCTAR Virtual Laboratories, embedding them in the Climate and
Weather Laboratory
Benchmarking and Evaluation
• eMAST@PALS: Development of the PALS system for eMAST and TERN data
streams
• eMAST BENCH: International collaboration on benchmarking
Tools
40. eMAST Delivers in 2014-2015: 3 of 3
NEXT Generation of Ecosystem Models
• ARC DP on Australian Tropical Savannas (Past, Present and Future):
enhancing ecosystem models for tropical savannas
• ARC DP on the Next Generation of Ecosystem Models: Using plant trait
observations to inform a new approach to ecosystem modelling.
• GePiSaT: Global version of the ePiSaT model (eMAST and Imperial College
London)
• CAMELS: Coupling ACCESS with Models of Ecosystems and the Land
Surface: Next generation approach to ecosystem and land surface
modelling
Datasets from eMAST
• ANUClimate: An extension of past methods for gridding climate and
weather for the Australian continent.
• eMAST Bioclimate
• eMAST Land Surface Modelling
Tools & Data
41. Climate and Bioclimate data
Resolution: 0.01 degrees (nominally 1 km); T, P, R and 50+ indices
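The "nominally 1 km" figure for a 0.01 degree grid can be checked with a little arithmetic. This sketch assumes a spherical Earth of mean radius 6371 km, which is close enough to show the idea: north-south cell size is constant, while east-west size shrinks with the cosine of latitude.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0  # spherical approximation


def cell_size_km(resolution_deg, latitude_deg):
    """Approximate (north-south, east-west) size in km of a grid cell on a sphere."""
    km_per_degree = 2.0 * math.pi * EARTH_MEAN_RADIUS_KM / 360.0
    north_south = resolution_deg * km_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for lat in (-12.0, -35.0):  # roughly the latitudes of Darwin and Canberra
    ns, ew = cell_size_km(0.01, lat)
    print(f"lat {lat}: {ns:.3f} km N-S x {ew:.3f} km E-W")
```

This prints 1.112 km north-south at both latitudes, and about 1.088 km and 0.911 km east-west, so cells are close to, but not exactly, 1 km square across the continent.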
42. New approach for Big Data
It is no longer practical, let alone affordable, to
continue doing data-intensive ecosystem science
in the copy-and-work paradigm; a new approach
to working with Big Data is required.
Think about network data access, not file downloads
…
Cross-disciplinary use of file formats and services
…
Open-source server technology and file formats
…
Work with big data in a high performance facility
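"Network data access, not file downloads" can be made concrete with OPeNDAP, one of the open-source server technologies used for gridded data. A DAP2 request can carry a constraint expression of the form `?variable[start:stride:stop]` (stop inclusive), so the server returns only the requested slice. The dataset URL and variable name below are hypothetical; client libraries such as netCDF4 and xarray can also open OPeNDAP URLs directly and fetch slices on demand.

```python
def hyperslab_size(slices):
    """Number of values selected by (start, stride, stop) slices; stop is inclusive."""
    size = 1
    for start, stride, stop in slices:
        if stride < 1 or stop < start:
            raise ValueError(f"bad slice {(start, stride, stop)}")
        size *= (stop - start) // stride + 1
    return size


def opendap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request asking the server for one hyperslab of one variable."""
    hyperslab_size(slices)  # validate the slices before building the URL
    constraint = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}.{response}?{variable}{constraint}"


# Hypothetical dataset URL and variable name, for illustration only.
slices = [(0, 1, 30), (100, 1, 199), (200, 1, 299)]  # 31 time steps x 100 x 100 cells
print(opendap_subset_url("http://dap.example.org/thredds/dodsC/climate/tmax_2013.nc",
                         "tmax", slices))
print(hyperslab_size(slices))  # 310000 values instead of the whole file
```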
43. Big Data: eMAST’s collections
[Figure: data volumes (TB, log scale) of Scientific Data for Research at the NCI RDSI node by 2015; values shown: 5419, 1928, 326, 176 and 140 TB]
44. Three eMAST projects
1. Observations: The Ecosystem Model Parameters
Database
2. Models: Ecosystem Production in Space and Time
3. Observations in Models: CABLE-DART Data
assimilation on the NCI
45. Observations
The Ecosystem Model Parameters Database
• Originally set up to generate continental-scale
surfaces of leaf properties (nitrogen,
phosphorus, etc.) using ANNs (artificial neural networks)
• Adapted in April 2014 for use with data
assimilation
• Focal point for ecosystem scientists and
plot networks to contribute observations
for use in models
EMP DB
Example One
49. eMAST: Data assimilation
Collaborative ‘community’ approach: work with international experts (Fox,
NEON, and Hoar, NCAR) and local champions Renzullo (CSIRO) and Evans. Open
to community participation (Wang, Haverd and Trudinger, CSIRO).
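Data assimilation blends a model ensemble with observations, weighting each by its uncertainty. The sketch below is a textbook stochastic (perturbed-observation) Ensemble Kalman Filter analysis step with a linear observation operator. It is not the DART, CABLE or CSIRO code, which offer other filter variants and non-linear operators; `enkf_analysis` is a hypothetical name.

```python
import numpy as np


def enkf_analysis(ensemble, obs, obs_operator, obs_cov, rng):
    """One stochastic (perturbed-observation) EnKF analysis step.

    ensemble:     (n_state, n_members) forecast ensemble, one column per member
    obs:          (n_obs,) observation vector
    obs_operator: (n_obs, n_state) linear observation operator H
    obs_cov:      (n_obs, n_obs) observation error covariance R
    """
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    p = anomalies @ anomalies.T / (n_members - 1)     # sample forecast covariance P
    s = obs_operator @ p @ obs_operator.T + obs_cov    # innovation covariance H P H^T + R
    gain = np.linalg.solve(s, obs_operator @ p).T      # Kalman gain K = P H^T S^-1
    # Each member assimilates its own randomly perturbed copy of the observations.
    noise = rng.multivariate_normal(np.zeros(len(obs)), obs_cov, size=n_members).T
    innovations = obs[:, None] + noise - obs_operator @ ensemble
    return ensemble + gain @ innovations


rng = np.random.default_rng(42)
prior = rng.normal(0.0, 1.0, size=(1, 5000))  # prior N(0, 1) for one state variable
posterior = enkf_analysis(prior, np.array([2.0]), np.eye(1), np.eye(1), rng)
print(posterior.mean(), posterior.var())  # close to 1.0 and 0.5
```

With equal prior and observation variances the gain is about 0.5, so the analysis lands roughly halfway between the prior mean (0) and the observation (2), with the variance halved.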
52. Ecosystem Production in Space and Time
Example Three
ePiSaT
Data filtering: removal of outliers, etc.; gap filling of PAR (PPFD) for GPP.
Light response: 3-parameter rectangular hyperbola
$$F_C = R - \frac{A_{max}\,\Phi\,I}{A_{max} + \Phi\,I}$$
Parameters: (1) R = respiration; (2) A_max = maximum assimilation; (3) Φ = quantum efficiency. I = PAR (PPFD).
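A minimal sketch of fitting the three-parameter rectangular hyperbola to one site's flux data with SciPy's `curve_fit`. It assumes arrays of PPFD (I) and net CO2 flux (F_C) with uptake negative; the outlier removal and PAR gap filling on the slide are reduced here to dropping missing values, and this is an illustration of the curve fit, not the ePiSaT code itself.

```python
import numpy as np
from scipy.optimize import curve_fit


def rect_hyperbola(ppfd, a_max, phi, r):
    """Net CO2 flux F_C = R - (A_max * phi * I) / (A_max + phi * I); uptake is negative."""
    return r - (a_max * phi * ppfd) / (a_max + phi * ppfd)


def fit_light_response(ppfd, fc):
    """Fit A_max, phi and R to paired PPFD and net CO2 flux observations."""
    ppfd = np.asarray(ppfd, dtype=float)
    fc = np.asarray(fc, dtype=float)
    ok = np.isfinite(ppfd) & np.isfinite(fc)  # drop gaps before fitting
    ppfd, fc = ppfd[ok], fc[ok]
    p0 = (max(fc.max() - fc.min(), 1.0), 0.05, 1.0)  # rough starting guesses
    params, _ = curve_fit(rect_hyperbola, ppfd, fc, p0=p0,
                          bounds=([0.0, 0.0, 0.0], [np.inf, 1.0, np.inf]))
    return dict(zip(("a_max", "phi", "r"), params))


# Synthetic, noise-free half-hourly data with one gap (made-up parameter values).
ppfd = np.linspace(0.0, 2000.0, 200)          # umol photons m-2 s-1
fc = rect_hyperbola(ppfd, 20.0, 0.05, 3.0)    # umol CO2 m-2 s-1
fc[10] = np.nan
print(fit_light_response(ppfd, fc))
```

Once fitted, the GPP term at any light level is the hyperbola part alone, `a_max * phi * I / (a_max + phi * I)`, which is what makes the parameters useful for scaling.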
53. How does gross primary
productivity (GPP) vary in space
and time across Australia?
How can we ‘simply’ estimate GPP
across Australia?
What data does TERN provide that
might be useful for addressing this
research question?
Ecosystem Production in Space and Time
ePiSaT
54. Ecosystem Production in Space and Time (ePiSaT): workflow
1. Choose the ePiSaT model from emast.org.au, TDDP or SPEDDEXES
2. Obtain OzFlux data via the TERN/OzFlux portals
3. Run the ePiSaT model: generate estimates of ecosystem parameters and evaluate them
4. Obtain climate (eMAST) and satellite data (AusCover) to scale the ePiSaT parameters
5. Produce continental-scale estimates of GPP and evaluate them
55. This project is supported by the Australian National Data Service (ANDS). ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy
Program and the Education Investment Fund (EIF) Super Science Initiative. For more information visit the ANDS website ands.org.au and Research Data Australia services.ands.org.au.
58. Research lifecycle: this afternoon [figure repeated from slide 9]
59. 5. Data Management & Publishing
• Learning Objectives:
To understand recognised best practice in “data management” for ecosystem
science data sets.
To understand what is required for “data publishing” in appropriate storage sites,
portals and journals for specific research purposes – and to understand the
diversity of options.
• Sections:
• 1305-1315 Why does data management + publishing matter and
how can it help you?
• 1315-1330 Data management plans - exercise
• 1330-1340 Data publishing – your options and why does it matter
• 1340-1350 Data publishers – a continuum of approaches
• 1350-1430 Data publishing options with TERN
60. Data Management
Learning Objectives:
To understand recognised best practice in “data management” for
ecosystem science data sets.
- Why is good data management beneficial?
- What is good data management?
61. Poor Data Management
Unusable, lost, re-collected
Image credits: www.shutterstock.com (54240301); http://360digest.com/2006/02/25/messy-office-contest/; TERN AusPlots
62. Personal Drivers
Increase efficiency of research
Guarantee the quality and authenticity of data
Enable exposure of research outcomes via collaborations and
dissemination (40%)
Provide reproducibility of experimental and computational outcomes
Facilitate the validation and verification of results
Source: UQL-050112 – Research Data Management Fact Sheet 2
63. Survey on research data management 2012:
• 63% aware of Australian Code of Conduct
• 70% understand their data management responsibilities
• 70% don’t do data management plans
• 70% don’t keep a registry of research data collections
From Miller, C (2012). “Responses to interviews: University of Adelaide research data repository and metadata store”
• 82% agree data should be available to other researchers
• 81% would re-use another’s data
• 29% supported public access to their data
66. Data Management Plans - Exercise
Exercise:
Design of a “data management plan” to meet Australian
Research Council requirements.
ARC Proposal Guidelines – Under “Project Description”
“MANAGEMENT OF DATA
Outline plans for the management of data produced as a result of the proposed
research, including but not limited to storage, access and re-use
arrangements.”
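One way to approach the exercise is to treat the ARC prompt as a template with required sections. The sketch below is hypothetical: the class and field names are illustrative, and since the ARC says "including but not limited to", storage, access and re-use are a minimum rather than a complete plan.

```python
from dataclasses import dataclass


@dataclass
class DataManagementPlan:
    """Hypothetical skeleton for the ARC 'Management of data' section."""
    project: str
    data_description: str = ""
    storage: str = ""
    access: str = ""
    reuse: str = ""

    # Sections named in the ARC text; a class constant, not a dataclass field.
    REQUIRED = ("storage", "access", "reuse")

    def missing_sections(self):
        """Names of required sections that are still empty."""
        return [name for name in self.REQUIRED if not getattr(self, name).strip()]

    def to_text(self):
        """Render the plan as plain text for pasting into a proposal."""
        lines = [f"MANAGEMENT OF DATA: {self.project}"]
        for name in ("data_description",) + self.REQUIRED:
            heading = name.replace("_", " ").capitalize()
            lines.append(f"{heading}: {getattr(self, name) or '[to be completed]'}")
        return "\n".join(lines)


plan = DataManagementPlan("Hypothetical savanna flux project",
                          storage="Institutional research data store, nightly backup")
print(plan.missing_sections())  # ['access', 'reuse']
print(plan.to_text())
```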
67. Data Publishing
Learning Objectives:
To understand what is required for “data publishing” in appropriate storage
sites, portals and journals for specific research purposes – and to understand
the diversity of options available.
To understand the different levels of publishing possible under the “data
publishing continuum.”
68. Why should I publish data?
• replication and verification of work;
• formal and measurable recognition of data as a research output;
• a reduction in the duplication of data collection;
• re-use of data in multi- and interdisciplinary research;
• greater transparency in the research process.
69. High quality, well-described ecological
data for 1000s of species occurring at
25,000 sites, with another 67,000
coming soon
Successful data publishers get noticed
Correlation
between archived or
open access data
to copies of
published
articles and
citation impact
(Sharing detailed research data is
associated with increased citation
rate: Piwowar et al. (2007))
70. Adopting good science practice
• Data are well-described and reproducible
• Apply NHMRC and ARC research ethics
• NHMRC Open Access policy came into effect on 1 July 2012
http://www.nhmrc.gov.au/grants/policy/dissemination-research-findings
• ARC Open Access policy came into effect on 1 January 2013
http://www.arc.gov.au/applicants/open_access.htm
“A11.5.2. Researchers and institutions have an obligation to care for
and maintain research data in accordance with the Australian Code
for the Responsible Conduct of Research (2007). The ARC considers
data management planning an important part of the responsible
conduct of research and strongly encourages the depositing of data
arising from a Project in an appropriate publically accessible subject
and/or institutional repository.”
71. When not to publish data or place restrictions
• Patent application
• Confidential human/individual details
• Confidential data due to commercial sponsorship arrangements
• Sensitive species declared by governments
• Sensitive location declared by governments
73. Data Publishing - Exercise
Exercise
Identification and review of potential data publishers.
We will divide you into small groups to assess the approach
to data publishing of a given data publisher in terms of:
- submission and review process;
- attributes required for re-use;
- capacity for re-use
- costs; and
- ability to measure output and re-use.
74. Data Publishing with TERN
Learning Objectives:
Identification of current and planned data publishing options in
TERN.
To understand how you can publish your data with TERN
75. TERN’s data portals and meta-data structure (as listed on slide 15)
77. Data Publication in TERN: ACEF using ANZMet Lite
- Metadata complying with ISO 19115 and 19139
international standards; specifically the ANZLIC Profile of
those standards
- Easy to use
- Base template which can
accommodate in-depth details
if needed
- *.xml format
Tool developed by ANZLIC, the Spatial Information Council
http://spatial.gov.au/sites/default/files/legacy/osdm.gov.au/Metadata/ANZLIC%2Bmetadata%2Bresources/default.html
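To show what "*.xml format" complying with ISO 19115/19139 looks like, the sketch below writes a skeleton record using the standard `gmd` and `gco` namespaces. It is deliberately minimal and is not valid against the full schema or the ANZLIC Profile (mandatory elements such as contact, dateStamp and citation date are omitted); a tool such as ANZMet Lite fills those in for you. The identifier and title are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces used by ISO 19139 XML encodings of ISO 19115 metadata.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _string_element(parent, tag, value):
    """Append <gmd:tag><gco:CharacterString>value</gco:CharacterString></gmd:tag>."""
    element = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(element, f"{{{GCO}}}CharacterString").text = value
    return element


def minimal_iso19139(file_id, title, abstract):
    """Return a skeleton record; not schema-valid (contact, dateStamp etc. omitted)."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    _string_element(root, "fileIdentifier", file_id)
    info = ET.SubElement(root, f"{{{GMD}}}identificationInfo")
    ident = ET.SubElement(info, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(ET.SubElement(ident, f"{{{GMD}}}citation"),
                             f"{{{GMD}}}CI_Citation")
    _string_element(citation, "title", title)
    _string_element(ident, "abstract", abstract)
    return ET.tostring(root, encoding="unicode")


print(minimal_iso19139("hypothetical-acef-0001", "Beach morphology survey (example)",
                       "Illustrative abstract only."))
```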
81. 6. Wrap up
Outcomes?
- Better understanding of the national research infrastructure
available to you – including TERN
- Knowledge of the kinds of ecosystem data that are available, and how
you can get them
- Experience searching, assessing and downloading data for your
research
- Understanding the principles of good data management and the
benefits for you
- Appreciation of the options for data management
- Introduction to tools for managing your data, including TERN
infrastructure
82. 6. Wrap up
• Email exit survey tomorrow
• Presentations and materials online and links sent to you
• Please contact us with any questions or follow up items
83. International Partners
TERN is supported by the Australian Government through
the National Collaborative Research Infrastructure Strategy
and the Super Science Initiative
84. More Questions?
Prof Stuart Phinn
s.phinn@uq.edu.au
Dr Bek Christensen
r.christensen@uq.edu.au
www.tern.org.au