1. Overcoming obstacles to sharing data about human subjects
Force11 conference
Portland, Oregon
18 April, 2016
Robin Rice
EDINA and Data Library
University of Edinburgh, UK
3. The status quo
Most data underlying published research, even publicly funded research, are
not shared. How can research claims be verified?
Common barriers are well known; confidentiality concerns are high
Qualitative research data and small-scale surveys are not commonly re-used
Tendency is to err on side of caution, given legal & ethical responsibilities
As the open science agenda pushes disciplines toward reproducibility, there is a
danger of human subject-oriented research falling behind
5. What a researcher can do to be able to share
Plan for sharing (via a data management plan)
Don’t collect personal information that is not needed
Principle of informed consent: get consent to share data
Document all data processing (inside & outside analysis package)
Attribute, anonymise, or aggregate individuals’ data
7. How to create an anonymised, open dataset
Numeric data, e.g. surveys:
Remove names and identifiers
Renumber and re-sort case IDs
Group numbers into categories (banding)
Top- and bottom-code numbers (age, salaries)
Use standard codes (e.g. SOC, SIC) and geographic boundaries at appropriate levels; not fine-grained
Check for low cell counts in cross-tabs
Qualitative data, e.g. interviews:
Share the edited transcript, not video or audio, unless consented
Agree a pseudonym with each subject
Remind subject not to disclose personal or sensitive information, e.g. about family members
Replace proper nouns in text (names, place names, etc.) using square brackets; don’t blank out
Avoid over-anonymising or data will lose value
Keep a log of all replacements, generalisations or removals made; store separately from anonymised data
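The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not a complete disclosure-control routine: the function names, the age limits (20 and 80), the band width, the low-count threshold of 3 and the toy records are all assumptions made for the example and do not come from the talk.

```python
import random
import re
from collections import Counter


def age_band(age, low=20, high=80, width=10):
    """Band an age, folding extremes into open-ended bottom and top codes."""
    if age < low:
        return f"under {low}"
    if age >= high:
        return f"{high}+"
    start = low + ((age - low) // width) * width
    return f"{start}-{start + width - 1}"


def renumber_and_resort(records, seed=None):
    """Shuffle the record order, then replace every case id with a new one."""
    rng = random.Random(seed)
    shuffled = [dict(r) for r in records]
    rng.shuffle(shuffled)
    for new_id, rec in enumerate(shuffled, start=1):
        rec["case_id"] = new_id
    return shuffled


def low_cells(records, row_key, col_key, threshold=3):
    """Return the cross-tab cells whose count falls below the threshold."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    return {cell: n for cell, n in counts.items() if n < threshold}


def replace_proper_nouns(text, replacements):
    """Swap listed proper nouns for bracketed labels; return the text and a log."""
    log = []
    for original, label in replacements.items():
        tag = f"[{label}]"
        text, n = re.subn(r"\b" + re.escape(original) + r"\b", lambda m: tag, text)
        if n:
            log.append({"original": original, "replacement": tag, "count": n})
    return text, log


# Toy survey rows (invented for illustration).
survey = [
    {"case_id": 5012, "name": "A. Example", "age": 37, "region": "North"},
    {"case_id": 5013, "name": "B. Example", "age": 17, "region": "North"},
    {"case_id": 5014, "name": "C. Example", "age": 93, "region": "South"},
    {"case_id": 5015, "name": "D. Example", "age": 34, "region": "North"},
]

# Drop names, band ages, then renumber and re-sort the cases.
open_rows = renumber_and_resort(
    [{"case_id": r["case_id"], "age_band": age_band(r["age"]), "region": r["region"]}
     for r in survey],
    seed=1,
)
print(low_cells(open_rows, "age_band", "region"))

# The replacement log should be stored separately from the anonymised text.
transcript = "Moira said her brother still lives in Leith."
edited, log = replace_proper_nouns(transcript, {"Moira": "Anna", "Leith": "city district"})
print(edited)
```

In a real project the low-count threshold and the banding limits would be set by the data owner or repository; cells flagged by `low_cells` would typically be merged with neighbouring categories or suppressed before release.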
9. When open data access is not plausible
When potential for harm to research subjects is too great
Information that can be used to discriminate requires extra protection
When required by the data producer, funder, health authority, etc.
Sometimes precautions are required even for anonymised data
When anonymisation is either not feasible or would destroy the value of the dataset
Population too small to be anonymous, e.g. those with a genetic condition
10. Lock it up to keep safe
(Eric Parker on Flickr)
11. Take proportionate precautions; ease route to access
Make documentation and/or code about dataset openly available
Use a template for a data access application & data use agreement
Make arrangements for unbiased review of applications for access
Transfer data safely; use secure channels, encryption
Consider options for remote access in preference to on-site-only access
13. Data linkage
Probabilistic or ‘fuzzy’ matching is one method used to identify
individuals by combining information from different datasets
This can be done for legitimate research purposes, such as matching
cases in different government (administrative) datasets
Informed consent is normally impossible for this technique; the data
were collected for a different purpose than the current research
proposal
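To make the idea of probabilistic matching concrete, the sketch below scores pairs of records from two datasets and keeps those above a threshold. It uses one common formulation (Fellegi-Sunter-style agreement weights) with fuzzy string comparison from Python's standard `difflib`; the talk does not say which method is meant, so this choice is an assumption. The field names, the `m` and `u` probabilities, the similarity cut-off, the threshold and the records are all hypothetical.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field probabilities (not from the talk):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELDS = {
    "surname": {"m": 0.95, "u": 0.01},
    "birth_year": {"m": 0.98, "u": 0.02},
    "postcode_district": {"m": 0.90, "u": 0.05},
}


def agrees(a, b, min_similarity=0.85):
    """Treat two values as agreeing if their strings are near-identical."""
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= min_similarity


def match_weight(rec_a, rec_b):
    """Sum log-likelihood-ratio weights over all compared fields."""
    total = 0.0
    for field, p in FIELDS.items():
        if agrees(rec_a[field], rec_b[field]):
            total += math.log2(p["m"] / p["u"])
        else:
            total += math.log2((1 - p["m"]) / (1 - p["u"]))
    return total


def link(dataset_a, dataset_b, threshold=8.0):
    """Return (index_a, index_b, weight) for pairs at or above the threshold."""
    pairs = []
    for i, a in enumerate(dataset_a):
        for j, b in enumerate(dataset_b):
            weight = match_weight(a, b)
            if weight >= threshold:
                pairs.append((i, j, round(weight, 2)))
    return pairs


# Two invented administrative extracts with no shared identifier.
health = [
    {"surname": "Macdonald", "birth_year": 1972, "postcode_district": "EH8"},
    {"surname": "Smith", "birth_year": 1985, "postcode_district": "G12"},
]
education = [
    {"surname": "McDonald", "birth_year": 1972, "postcode_district": "EH8"},
    {"surname": "Smyth", "birth_year": 1958, "postcode_district": "AB10"},
]

matches = link(health, education)
print(matches)
```

The spelling variants "Macdonald" and "McDonald" are linked because they agree closely on every field, while the other pairs fall below the threshold. The same mechanism that supports legitimate linkage is what makes re-identification possible, which is why the governance arrangements around it matter.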
15. Information governance
Requires a bigger infrastructure than one researcher can create
Has been developed to meet ethical standards where informed
consent is not possible and research is in the public interest
Allowed by the current European Data Protection Directive and the forthcoming new regulation
Makes use of the ‘five safes’: safe data, safe researcher, safe project, safe settings, safe outputs
16. Check out our free educational resources (R.Rice@ed.ac.uk)
Research Data Management Training (MANTRA): http://datalib.edina.ac.uk
Research Data Management & Sharing MOOC: www.coursera.org/learn/data-management
Editor’s notes
Information that can be used to discriminate requires extra protection:
Racial or ethnic origin
Political opinions
Membership of a political association
Religious beliefs or affiliations
Membership of a professional or trade association
Membership of a trade union
Sexual preferences or practices
Criminal record
Health and genetic information
http://www.telegraph.co.uk/finance/newsbysector/banksandfinance/11378443/You-can-be-identified-from-just-four-purchases-on-your-credit-card-bill-study-finds.html
Just four bits of information gleaned from a shopper's credit card can be used to identify almost anyone, researchers have found.
The study in the journal Science analysed three months of credit card records for 1.1m people in an unidentified industrialised country.
Ninety percent of individuals could be uniquely identified using just four pieces of information, such as where they bought coffee one day or where they purchased a new jumper or pair of shoes.
In other words, credit card use was just as reliable at identifying someone as mobile phone records, the study found.
Knowing the price of a transaction could boost the risk of re-identification by 22pc.
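The effect described above, where each extra piece of information singles out more people, can be illustrated with a simple uniqueness check. The sketch below is a simplified stand-in and not the method used in the study: it counts what fraction of records become unique as more fields are combined. The function name, field names and records are invented for illustration.

```python
from collections import Counter


def unique_share(records, keys):
    """Fraction of records whose values on `keys` match no other record."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Invented purchase records; each row stands for one person.
purchases = [
    {"shop": "A", "day": "Mon", "price_band": "low"},
    {"shop": "A", "day": "Mon", "price_band": "high"},
    {"shop": "A", "day": "Tue", "price_band": "low"},
    {"shop": "B", "day": "Mon", "price_band": "low"},
    {"shop": "B", "day": "Tue", "price_band": "low"},
    {"shop": "B", "day": "Tue", "price_band": "low"},
]

# Uniqueness grows as more fields are known about each person.
for keys in (["shop"], ["shop", "day"], ["shop", "day", "price_band"]):
    print(keys, round(unique_share(purchases, keys), 2))
```

Running the same check on a candidate open dataset, using the fields an outsider could plausibly know, gives a rough indication of whether further banding or generalisation is needed before release.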