This document discusses data management for the CGIAR Research Program on Dryland Systems. It provides an overview of ICARDA's current status on data management, key sources of data under the CRP Dryland Systems, and issues related to research and data quality. The document proposes a workflow for improving data management and sharing that involves validating data, archiving it, and establishing permissions and approvals for sharing. It emphasizes training partners, identifying risks, and monitoring impact to enhance data quality and open access for CRP Dryland Systems research.
1. Data management for CRP DS research:
where do we currently stand at ICARDA?
• CO: CGIAR Open Access and Data Management
Plans & Implementation (Article 4.1.9) states
“Open Access and Data Management Plans
should be prepared in order to ensure
implementation of this Policy. Such Plans shall, in
particular, outline a strategy for maximizing
opportunities to make information products Open
Access”.
• Output: Research quality and data quality issues
in CRP DS research and mechanism/workflow
2. Data management for CRP DS research:
where do we currently stand at ICARDA?
Plan
•Sources of data under CRP DS
•Status of DM at ICARDA
•CRP DS research areas for data
generation
•Issues and solutions related to Research
quality and data quality
•Workflow for DM sharing
3. Scope: Sources of data
Scope of work is determined by observing a
complex interplay of
•Base components: crops, livestock, rangelands,
trees etc. & production systems
×
•Biophysical environment constraints: water
scarcity, land degradation
×
•Technological access: access to the product
and regulatory environment
4. Partners in the DM (OA)
• Who generates the data? Who owns them? Who regulates
their sharing?
Outcome: What comes after archiving with an Open Access (OA)
system?
• Data mining
• Exploration of large or even BIG data leading to a wider
picture viewed from the bridge
• No dearth of random factors/sources in data
• Availability of prior information
• Bayesian analysis to bring the domain of statistical inference
closer to reality
5. CRP DS DM Current Status
DM Status at ICARDA /Its Flagships Target
Regions
•ICARDA Projects: D and DM with scientists,
archived on their laptops at various
locations/countries
•GU data on Central servers, Amman, Jordan
•D Manager to be recruited
•NARS data with NARS
6. ICARDA Projects dealing with CRP DS
ICARDA Programs: DSIPS, IWLM, SEPR, BIGM
(also generating data for other CRPs)
1. Cropping systems and Agronomy on-station
and on-farm
•On-station Trials
– Single factor, multi-factors including:
– systems of rotations, intercrop, monocrops
– crop components
– fertilizer input
– IPDM controls and other management factors
7. CRP DS DM - Research quality
On-farm Trials:
• Less frequent: research for technology
generation
– Experimental design with small number of
treatments, small blocks, variable treatment
designated as control or farmer-technology,
relatively large number of replications
•Most frequent: technology verification and
demonstration
•Sampling design: large plots; the small number of
samples is a concern
8. ICARDA Projects dealing with CRP DS
• A list of data sources for CRP DS and other ICARDA
projects:
• Design of crop rotation trials [general]
• DM and analyses of data from the 2-course long-term
wheat rotations (productivity, sustainability aspects
including time-trend estimation). [NAWA: long-term
crop rotation trials on wheat & barley at Tel Hadya,
Syria; long-term wheat rotation trial at Kamishly, Syria;
long-term sustainability trials in Egypt, etc.]
• Evaluation of conservation tillage data [CA trials in
Jordan and Iraq]
• Analyses of data from livestock evaluation experiments
[long-term trials, wheat & barley at Tel Hadya]
9. ICARDA Projects dealing with CRP DS
ICARDA Outlook:
•Decentralization of ICARDA has changed the way we do
our business.
•Archiving and sharing data have [essentially] become the
way we do business.
•We need to extend [quality] data sharing from within
ICARDA to the public.
NARS / five Flagship target regions
•1) The West African Sahel and dry savannas, 2) East and
Southern Africa, 3) North Africa and West Asia, 4) Central Asia,
5) South Asia
10. ICARDA Projects dealing with CRP DS:
Key to DQ
Research quality (RQ)
•Experimental design could be an issue (in terms of blocking and replications)
•Approach/Solution: thorough discussion with subject matter specialists and
biometrician/statistician
•Resources for enhancing RQ and DQ:
•JNR Jeffers (1978). Design of Experiments (Statistical Checklist 1). Institute of
Terrestrial Ecology, Natural Environment Research Council, Cambridge, UK.
http://www.sawleystudios.co.uk/jnrj/StatisticalCheck/Design.htm
•JNR Jeffers (1979). Sampling (Statistical Checklist 2). Institute of Terrestrial Ecology, Natural
Environment Research Council, Cambridge, UK.
http://www.sawleystudios.co.uk/jnrj/StatisticalCheck/Sampling.htm
•David J. Finney (1990). Statistical data-their care and maintenance. Indian Society of Agricultural
Statistics.
•“This bulletin is extremely useful for students and research workers … topics dealt with are: acquisition of
data, design of data gathering, care for data, types and units of data analysis and databases, copying,
statistical ethics, data-entry to the computer, data scrutiny, integrity and some illustrations.”
11. ICARDA Projects dealing with CRP DS
Examples of Data quality issues:
•Experimental design accepted; crop management properly followed.
• Experimental plots: plot size, harvested areas [2-row, 3-row, 4-row plots],
calculation per hectare basis
•Days to 50% flowering- how many plants were actually observed?
•plant height (cm)- number of plants
•seed yield, bio yield – area used; drying methods
•Data entry?
•Lack of electronic data-recording devices and of transfer to a file on the laptop
– Early days: field-books
– Recent: Android Apps etc.
– Data in Excel worksheet
•What checks should we perform?
•What should be the level of experimental data quality for public sharing?
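The per-hectare calculation listed above depends directly on the harvested area, so a 2-row and a 4-row harvest of the same plot scale very differently. A minimal sketch, assuming equally spaced rows of equal length; the function names, row spacing, row length and yield value are hypothetical and not taken from any ICARDA trial:

```python
def harvested_area_m2(n_rows: int, row_spacing_m: float, row_length_m: float) -> float:
    """Harvested area of a plot, assuming equally spaced rows of equal length."""
    return n_rows * row_spacing_m * row_length_m


def yield_kg_per_ha(plot_yield_g: float, area_m2: float) -> float:
    """Convert a plot yield in grams to kilograms per hectare."""
    if area_m2 <= 0:
        raise ValueError("harvested area must be positive")
    # g -> kg divides by 1000; m^2 -> ha multiplies by 10000; combined factor is 10.
    return plot_yield_g / area_m2 * 10.0


# Hypothetical example: the same 600 g sample scaled with two assumed harvested areas.
area_2_rows = harvested_area_m2(2, 0.25, 4.0)
area_4_rows = harvested_area_m2(4, 0.25, 4.0)
print(yield_kg_per_ha(600.0, area_2_rows))
print(yield_kg_per_ha(600.0, area_4_rows))
```

Recording the number of harvested rows alongside the yield makes this conversion checkable later; using the wrong area doubles or halves the reported yield.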
12. ICARDA Projects dealing with CRP DS
2. Crop Improvement CRPs (CRP Wheat, CRP DC, CRP GL)
•Single factor- Crop varieties
•Unreplicated designs for test materials + replicated or repeated
checks
•Replicated variety trials in RCB, IBD (alpha-designs), p-rep designs
•METs (Multi-environments/Multi-location and multi-year trials)
•Two-factor experiments
– Crops + Crop varieties
– Sometimes agronomic trials – planting dates, IPDMs etc.
•Result outputs from Commodity CRPs, where breeding is the key
component (CRP Wheat, CRP DC, CRP GL) flow to CRP DS.
Where are the data?
•Data with scientists in their laptops
•Status in relation to DM(OA)/sharing is unknown to me
13. ICARDA Projects dealing with CRP DS
3. Issues of Poor Data Quality: Indicators and
Resolutions
A frequent issue of data quality
•Is something really wrong with my data? Some statistical
procedures work and others do not, BUT the data are
the same. Regression and GLM work but ANOVA does not.
What is wrong with the stats?
•ANOVA may turn out to be a great tool for data checking;
missing values in data variables may be the reality.
•How about levels missing, or repeated by mistake, in a factor
or in factorial combinations?
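Missing or repeated factor combinations can be found before any ANOVA is attempted by counting the combinations in the data file and comparing them with the planned design. A minimal sketch; the factor names, levels and records are hypothetical, and the function is an illustration rather than an agreed ICARDA tool:

```python
from collections import Counter
from itertools import product


def check_factor_combinations(records, factors, levels, expected_reps):
    """Compare observed factor combinations with the planned design.

    records: list of dicts, one per experimental unit
    factors: factor column names
    levels: dict mapping each factor to its planned levels
    expected_reps: planned number of units per combination
    """
    observed = Counter(tuple(r[f] for f in factors) for r in records)
    planned = set(product(*(levels[f] for f in factors)))
    missing = sorted(c for c in planned if observed[c] == 0)
    unplanned = sorted(c for c in observed if c not in planned)
    wrong_count = {c: n for c, n in observed.items()
                   if c in planned and n != expected_reps}
    return missing, unplanned, wrong_count


# Hypothetical 2 x 2 design with 2 replicates and two data-entry errors.
factors = ["rotation", "nitrogen"]
levels = {"rotation": ["W-F", "W-L"], "nitrogen": [0, 60]}
rows = [("W-F", 0), ("W-F", 0), ("W-F", 60), ("W-F", 60), ("W-F", 60),
        ("W-L", 0), ("W-L", 0), ("W-l", 60)]  # one repeat, one mistyped level
records = [dict(zip(factors, row)) for row in rows]

missing, unplanned, wrong_count = check_factor_combinations(records, factors, levels, 2)
print("missing:", missing)
print("unplanned:", unplanned)
print("wrong replicate count:", wrong_count)
```

A mistyped level shows up twice: as an unplanned combination and as a gap in the planned one, which is usually enough to locate the row to correct.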
14. ICARDA Projects dealing with CRP DS
Some cases of data quality issues:
•1. Research quality: the experimental design is OK but the data on the design
are not; design factors are incorrectly entered. This is frequently encountered
and must be corrected before analysis, otherwise we will have analysed a study
different from the one we planned and still believe we carried out.
– factor combinations not aligning with the design (not missing
observations)
•2. Observed data values (trait values): errors of recording or of data
transfer to files
– values out of range (a variable that should lie within 0-100 or 0-1 goes
outside; recording error)
– Outliers: recorded values appear too extreme. These require validation
with the assistant/scientists and, if errors are found, they must be
corrected; generally viewed in the context of univariate analysis.
– Outliers may raise issues of interpretation and detection. A value may look
like an outlier in BY but not in log(BY) or sqrt(BY). There might be multivariate
outliers. A column of remarks, possibly in the field book, may support
the recorded data.
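The range check and the scale dependence of outliers can be illustrated with a short sketch. The outlier rule used here (a modified z-score based on the median and the median absolute deviation, with a cut-off of 3.5) is an assumption chosen for illustration, not a rule stated in these slides, and all data values are made up:

```python
import math
from statistics import median


def out_of_range(values, low, high):
    """Indices of values outside the closed interval [low, high]."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


def flag_outliers(values, threshold=3.5):
    """Indices whose modified z-score (median/MAD based) exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to scale by; a different rule would be needed
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical percentages: 105 cannot be a valid value on a 0-100 scale.
percent_scores = [42, 38, 105, 45]
print(out_of_range(percent_scores, 0, 100))

# Hypothetical BY values: the same check on the raw and on the log scale.
by = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 12.0]
print(flag_outliers(by))
print(flag_outliers([math.log(v) for v in by]))
```

With these made-up values the largest BY is flagged on the raw scale but not after the log transform, which is why a flagged value should trigger validation against the field book rather than automatic deletion.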
15. ICARDA Projects dealing with CRP DS
Some cases of data quality issues… continued
•3. Relationships between the traits appearing along the crop
development cycle may also be identified and used to build in data
quality
• DAF << DMAT
• GY << BY
•4. Helpful: Electronic data loggers (balance, Android Apps, with
GIS/Date)
•5. The role of the scientist or data supervisor must be made effective:
random checks on data recording in the field book as well as in the
file. Observations should be validated by another researcher
experienced in the same discipline, particularly for visual scores.
Random checks could be more effective. Data errors could be linked
to the observer.
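The trait relationships in item 3 can be turned into an automatic check on each record. A minimal sketch, reading "<<" as a strict "less than" and assuming DAF and DMAT are days to flowering and to maturity and GY and BY are grain and biological yield; these readings, the function name and the plot values are assumptions for illustration:

```python
def relationship_violations(records, rules):
    """Return (row_index, smaller, larger) for every record breaking a 'smaller < larger' rule."""
    problems = []
    for i, rec in enumerate(records):
        for smaller, larger in rules:
            a, b = rec.get(smaller), rec.get(larger)
            if a is None or b is None:
                continue  # missing values are left to other checks
            if not a < b:
                problems.append((i, smaller, larger))
    return problems


RULES = [("DAF", "DMAT"), ("GY", "BY")]

# Hypothetical plot records with two inconsistent rows and one missing value.
plots = [
    {"plot": 1, "DAF": 95, "DMAT": 140, "GY": 2.1, "BY": 6.0},
    {"plot": 2, "DAF": 150, "DMAT": 141, "GY": 1.9, "BY": 5.5},  # flowering after maturity
    {"plot": 3, "DAF": 97, "DMAT": 139, "GY": 6.2, "BY": 2.0},   # GY larger than BY
    {"plot": 4, "DAF": 96, "DMAT": None, "GY": 2.0, "BY": 5.8},
]

for index, smaller, larger in relationship_violations(plots, RULES):
    print(f"plot {plots[index]['plot']}: expected {smaller} < {larger}")
```

Rows reported here are candidates for validation with the observer, not automatic corrections.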
16. ICARDA Projects dealing with CRP DS
Some checks and balances:
•Data care bulletin (see References)
Tools:
•Design experiment/survey specific tools
(Biometrician/Statistician to Data Manager). Clearly define
the roles.
•Examine factors combinations appearing in the data
•Examine tables/cross-tables for qualitative data
•Descriptive statistics
• min, max, range, ratio=max/min (min>0)
• Histograms
•Box-plots and other diagnostics
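The descriptive statistics listed above (min, max, range, and the max/min ratio when min > 0) are simple to compute per variable. A minimal sketch using only the Python standard library; the function name and the plant-height values are hypothetical:

```python
def describe(values):
    """min, max, range and max/min ratio (ratio only when min > 0); None entries are skipped."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no observed values to summarise")
    lo, hi = min(present), max(present)
    return {
        "n": len(present),
        "n_missing": len(values) - len(present),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "ratio": hi / lo if lo > 0 else None,
    }


# Hypothetical plant heights (cm): 8.1 looks like a misplaced decimal point.
heights = [82, 79, 85, 8.1, 80, None]
summary = describe(heights)
for name, value in summary.items():
    print(name, value)
```

A max/min ratio above 10 for a trait such as plant height within one trial is a quick signal that one value deserves a second look.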
17. ICARDA Projects dealing with CRP DS
Some checks and balances … continued.
•An ANOVA that will not run may turn out to be a good way to detect
bad data. However, as noted above,
– Missing values in response/covariate variables are a reality
– But a missing factor level or factor combination appears due to
data entry error, the combinations being different from those in the
design.
– Cases of repeated units: data entry errors
•Outliers, if detected via model fitting, should stay in the
data. Of course data validation, where possible, is
encouraged.
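Detecting candidate outliers through a fitted model, without removing them, can be sketched for a complete randomized complete block (RCB) layout. The additive block + treatment fit is the standard RCB model; ranking cells by absolute residual, the function names and the data are assumptions made for illustration:

```python
from statistics import mean


def rcb_residuals(data):
    """Residuals from an additive block + treatment fit to a complete RCB layout.

    data: dict mapping (block, treatment) -> observed value, each cell present once.
    """
    blocks = sorted({b for b, _ in data})
    trts = sorted({t for _, t in data})
    grand = mean(data.values())
    block_mean = {b: mean(data[(b, t)] for t in trts) for b in blocks}
    trt_mean = {t: mean(data[(b, t)] for b in blocks) for t in trts}
    return {(b, t): data[(b, t)] - block_mean[b] - trt_mean[t] + grand
            for b in blocks for t in trts}


def largest_residuals(data, k=1):
    """The k cells with the largest absolute residual; the data are not modified."""
    res = rcb_residuals(data)
    return sorted(res.items(), key=lambda item: abs(item[1]), reverse=True)[:k]


# Hypothetical 4-block x 3-treatment trial with one aberrant record.
base = {"A": 2.0, "B": 2.5, "C": 3.0}
trial = {(b, t): base[t] + 0.1 * (b - 1) for b in range(1, 5) for t in base}
trial[(2, "B")] = 6.0  # additive value would have been 2.6

top = largest_residuals(trial, k=1)
print(top)
```

The function only points to the record to validate with the scientist; the value stays in the data unless a recording error is confirmed.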
18. ICARDA Projects dealing with CRP DS
Some checks and balances...continued:
•Benefiting from ICRISAT Tools and Techniques
• on data checking tools
• archiving the data on public platforms (an
enforcer of Data Quality)
• e.g. data systems from ICRISAT, Dataverse
(http://dvn.iq.harvard.edu/dvn/)
•Computing tools/procedures: Training and development
•Excel macros, Genstat/SAS/SPSS/R/other software
•Database development/datasheet preparation/ archiving
19. ICARDA Projects dealing with CRP DS
An attractive specialization:
•Data Science
•The Data Scientist’s Toolbox:
https://www.coursera.org/course/datascitoolbox
20. ICARDA Projects dealing with CRP DS
Crystallizing an approach:
4. CRP: Dryland Systems Management: Workflow
components
Home Center: Project/Meta data: Project ID, objectives, location,
year, personnel (Planner, M&E team, data collector etc.), trial level
information, factors (design and treatment), variables etc., A report of
data validation in Step 2; links to data;
•Data <<<< validation (via agreed tools)
•Mechanism for Data Quality Check
•1. Scientists >>> 2. Statistician/DM team: apply the agreed tools
• a) If fails-----> (1) to scientist for update
• b) If passes----> Get metadata and links to data
•2. Archiving (what? who will do this? DM Team?)
• Sharing permissions etc. This could be a Workflow of permissions:
Requester ---> Approval 1--->Approval 2 ---…---> Director CRP DS/nominee.
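The workflow components above (validation with agreed tools, return to the scientist on failure, archiving, and a chain of sharing approvals) can be sketched as a small state machine. Everything named here is hypothetical: the class, the status labels, the required metadata fields and the approver names are illustrative assumptions, not an existing ICARDA system:

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    project_id: str
    metadata: dict
    validation_report: list = field(default_factory=list)  # problems found by the checks
    status: str = "submitted"


def quality_check(dataset, checks):
    """Run the agreed tools; route back to the scientist if any problem is found."""
    dataset.validation_report = [msg for check in checks for msg in check(dataset)]
    dataset.status = "returned_to_scientist" if dataset.validation_report else "validated"
    return dataset.status


def archive(dataset):
    """Only validated data are archived."""
    if dataset.status != "validated":
        raise ValueError("dataset must pass validation before archiving")
    dataset.status = "archived"


def request_sharing(dataset, approvers, decisions):
    """Permission chain: every approver, in order, must agree; the first refusal stops it."""
    if dataset.status != "archived":
        return False
    return all(decisions.get(approver, False) for approver in approvers)


def metadata_complete(dataset):
    """Example check: assumed minimum metadata fields."""
    required = ("objectives", "location", "year")
    return [f"missing metadata: {key}" for key in required if key not in dataset.metadata]


ds = Dataset("P001", {"objectives": "rotation trial", "location": "site X"})
print(quality_check(ds, [metadata_complete]), ds.validation_report)

ds.metadata["year"] = 2014  # scientist updates and resubmits
print(quality_check(ds, [metadata_complete]))
archive(ds)

chain = ["approval_1", "approval_2", "director_crp_ds"]
print(request_sharing(ds, chain, {name: True for name in chain}))
```

Keeping the validation report with the dataset record matches the idea of storing a report of data validation alongside the metadata and links to data.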
21. ICARDA Projects dealing with CRP DS
… continued:
•Information Management. This refers to the [statistically
analysed] results files/publications generated.
•Knowledge Management: Key findings, Implications,
lessons learned
NARS
•Identify the active NARS partners
•Training on the above tools and workflow, Share Policy
and Procedure on CRP DS DM (OA)
•Identify the risk factors and their indicators and develop
an action plan with resources required
•Measure and Monitor the impact