Reproducibility, open data, &
GDPR
Cylcia Bolibaugh, Education,
CReLLU
Data sharing in Education
EROS (Education Researchers for Open Science)
(UYSEG, CRESJ, PERC, CReLLU)
• qualitative
• quantitative (experimental)
• quantitative (individual differences)
• Various goals for sharing data -- today’s
focus on reproducibility
– Verifiability of a publication’s findings -- data
and code
GDPR & Data Protection Act
complicate sharing of research data…
– Co-regulatory approach: a shift in accountability
from data protection authorities to data controllers
and data processors (us!)
– Adoption of open science practices is hindered by
worries about compliance (funder and university
requirements, legal, ethical).
Personal data & identifiability
“‘personal data’ means any information relating to an
identified or identifiable natural person (‘data subject’);
an identifiable natural person is one who can be identified,
directly or indirectly, in particular by reference to an identifier
such as a name, an identification number, location data, an
online identifier or to one or more factors specific to the
physical, physiological, genetic, mental, economic, cultural
or social identity of that natural person” (Article 4(1) EU GDPR)
The ‘motivated intruder’ test:
To determine whether a natural person is identifiable, account should
be taken of all the means reasonably likely to be used, such as
singling out, either by the controller or by another person to identify the
natural person directly or indirectly.
To ascertain whether means are reasonably likely to be used to identify
the natural person, account should be taken of all objective factors,
such as the costs of and the amount of time required for identification,
taking into consideration the available technology at the time of the
processing and technological developments. (Recital 26 EU GDPR)
Differentiating between personal
and anonymised data:
A balance between
(1) risk of disclosure/ re-identification
(2) consequences of disclosure (“perceived
value of the information”)
A toy dataset (Polish immigrants to the UK)
-- accuracy scores on language measure
-- reaction times on language measure
-- score on cognitive measure
-- score on cognitive measure
-- Age
-- Native language
-- Age of arrival to UK
-- Length of residence in UK
Assessing risk of reidentification (Klein et al 2018)
• Small population and rare traits
• Dyadic data
• Hierarchical data (e.g., small subsamples of students, co-workers)
• Motivated intruder test (e.g., jealous partner, nosy neighbor, envious
co-worker, insurers, criminals)
questions, questions…
(1) Do the biographical variables constitute indirect identifiers?
(1b) How can I systematically calculate the risk of re-identification (e.g. what is the
risk of re-identification for a Polish immigrant to the UK, based on their age, length of
residence in the UK and age at time of immigration)?
(2) If there is only a very slight possibility that an individual could be indirectly
identified, is it still personal data?
(3) What if the perceived value of the information that might be linked to that
individual is actually quite low (e.g. how many milliseconds an individual took to
identify an English word, or their rating of how acceptable a particular phrase or
grammatical construction is)?
(4) How would one go about documenting their consideration of these factors?
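One possible starting point for question (1b) is to count how many participants share each combination of indirect identifiers. The sketch below is only an illustration under stated assumptions: the records are invented, the function names are hypothetical, and the "risk" is the naive within-sample value 1 / (class size), which ignores how common each combination is in the wider population.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration:
# (age, age_of_arrival_in_uk, length_of_residence_in_uk)
records = [
    (34, 24, 10),
    (34, 24, 10),
    (29, 21, 8),
    (29, 21, 8),
    (29, 21, 8),
    (52, 30, 22),
]

def class_sizes(rows):
    """Count how many rows share each combination of indirect identifiers."""
    return Counter(rows)

def k_anonymity(rows):
    """Smallest equivalence-class size; k = 1 means at least one row is unique."""
    return min(class_sizes(rows).values())

def per_record_risk(rows):
    """Naive within-sample risk for each row: 1 / size of its class."""
    sizes = class_sizes(rows)
    return [1 / sizes[row] for row in rows]

k = k_anonymity(records)
risks = per_record_risk(records)
n_unique = sum(1 for risk in risks if risk == 1.0)

print("k =", k)
print("unique records:", n_unique)
print("highest per-record risk:", max(risks))
```

In this toy sample one participant (age 52) is unique on the three variables, so k is 1. A table of class sizes like this could also serve as part of the documentation asked about in question (4).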
solutions?
Approach                                          | Reproducibility | Open Data | Usability
Binning                                           | ✗               | ✓✓        | ✓✓✓
Permutation                                       | ✓✗              | ✓✓        | ✓✓✓
K-anonymity tools (e.g. R package sdcMicro)       | ✗               | ✓✓        | ✓✓
Synthesized dataset (e.g. R package Synthpop)     | ✓✓              | ✗         | ✓
Encrypted data with script (e.g. OSF)             | ✓✓✓             | ✗         | ✓
Restricted access depository                      | ✓✓✓             | ✓✓✓       | ✓✓
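The first two rows of the table can be illustrated with a minimal sketch. It assumes that "binning" means replacing exact values with fixed-width ranges and that "permutation" means shuffling one column independently of the others; the slides do not spell out either definition. The data and function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy rows, invented for illustration: (age, reaction_time_ms)
rows = [(23, 612), (24, 587), (27, 640), (31, 701),
        (33, 655), (38, 590), (52, 720), (58, 698)]

def bin_value(value, width=10):
    """Return the label of the fixed-width bin containing value, e.g. 23 -> '20-29'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

def smallest_class(values):
    """Size of the rarest value; 1 means at least one person is unique."""
    return min(Counter(values).values())

def permute_column(values, seed=0):
    """Return a shuffled copy, breaking the row-level link to other columns."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled

ages = [age for age, _ in rows]
binned_ages = [bin_value(age) for age in ages]
permuted_ages = permute_column(ages)

print("smallest class before binning:", smallest_class(ages))
print("smallest class after binning:", smallest_class(binned_ages))
print("same age distribution after permutation:",
      sorted(permuted_ages) == sorted(ages))
```

With these toy values every exact age is unique, while after binning the rarest age band contains two people. Binning discards the exact values an analysis may need, and permutation keeps each column's distribution but not the relationships between columns, which is one way to read the lower reproducibility marks for both approaches.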
OSF approved Protected Access
repositories which are GDPR compliant
- Research Data Center of the SOEP (DE)
- Datorium (DE)
- DataFirst (DE)
- PsychData (ZPID, Leibniz)
- University of Bristol Research Data
Repository
- The UK Data Service (ESRC)
Anonymisation
• Europe-wide standards for anonymisation are needed.
– OpenAIRE → European Data Protection Board could issue
guidelines concerning anonymisation.
• Nationally, codes of conduct to differentiate between
personal and anonymised data.
– may only be binding for members
– involvement of umbrella orgs -- UKRN
• Institutionally, researcher friendly guidance (decision
trees, case studies, tools for documentation of risk
assessment etc)
Thanks!
Questions?
The Open Data badge is earned for making publicly
available the digitally-shareable data necessary
to reproduce the reported results.

ODiP: Reproducibility, open data and GDPR

  • 1. Reproducibility, open data, & GDPR Cylcia Bolibaugh, Education, CReLLU
  • 2. Data sharing in Education EROS (Education Researchers for Open Science) (UYSEG, CRESJ, PERC, CReLLU) • qualitative • quantitative (experimental) • quantitative (individual differences) • Various goals for sharing data -- today’s focus on reproducibility – Verifiability of a publication’s findings -- data and code
  • 3. GDPR & Data Protection Act complicate sharing of research data… – Co-regulatory approach: a shift in accountability from data protection authorities to data controllers and data processors (us!) – Adoption of open science practices hindered by worries about compliance (funder, university requirements, legal, ethical),
  • 4. Personal data & identifiability “‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”
  • 5. The ‘motivated intruder’ test: To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. (Recital 26 EU GDPR)
  • 6. Differentiating between personal and anonymised data: A balance between (1) risk of disclosure/ re-identification (2) consequences of disclosure (“perceived value of the information”)
  • 7. A toy dataset (Polish immigrants to the UK) -- accuracy scores on language measure -- reaction times on language measure -- score on cognitive measure -- score on cognitive measure -- Age -- Native language -- Age of arrival to UK -- Length of residence in UK
  • 8. Assessing risk of reidentification (Klein et al 2018)  Small population and rare traits  Dyadic data  Hierarchical data (e.g., small subsamples of students, co-workers)  Motivated intruder test (e.g., jealous partner, nosy neighbor, envious co-worker, insurers, criminals)
  • 9. questions, questions… 1) do the biographical variables constitute indirect identifiers? (1b) how can I systematically calculate the risk of re-identification (e.g. what is the risk of reidentification for a Polish immigrant to the UK, based on their age, length of residence in UK and age at time of immigration?) (2) If there is only a very slight possibility that an individual could be indirectly identified, is it still personal data? (3) What if the perceived value of the information that might be linked to that individual is actually quite low (e.g. how many milliseconds an individual took to identify an English word, or their rating of how acceptable a particular phrase or grammatical construction is)? (4) How would one go about documenting their consideration of these factors?
  • 10. solutions? Reproducibility Open Data Usability Binning ✗ ✓✓ ✓✓✓ Permutation ✓✗ ✓✓ ✓✓✓ K-anonymity tools (e.g. R package sdcMicro) ✗ ✓✓ ✓✓ Synthesized dataset (e.g. R package Synthpop) ✓✓ ✗ ✓ Encrypted data with script (e.g. OSF) ✓✓✓ ✗ ✓ Restricted access depository ✓✓✓ ✓✓✓ ✓✓
  • 11. OSF approved Protected Access repositories which are GDPR compliant - Research Data Center of the SOEP (DE) - Datorium (DE) - DataFirst (DE) - PsychData (ZPID, Leibniz) - University of Bristol Research Data Repository - The UK Data Service (ESRC)
  • 12. Anonymisation • Europe-wide standards for anonymisation are needed. – OpenAIRE: European Data Protection Board could issue guidelines concerning anonymisation. • Nationally, codes of conduct to differentiate between personal and anonymised data. – may only be binding for members – involvement of umbrella orgs -- UKRN • Institutionally, researcher-friendly guidance (decision trees, case studies, tools for documentation of risk assessment etc)
  • 13. Thanks! Questions?
  • 14. The Open Data badge is earned for making publicly available the digitally-shareable data necessary to reproduce the reported results.
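
Question (1b) on slide 9 asks how re-identification risk might be calculated systematically. One common starting point, and the idea behind the k-anonymity tools listed on slide 10, is to count how many records share each combination of indirect identifiers. The sketch below is only an illustration: the records, variable names, and helper functions are hypothetical, and it measures uniqueness within the sample only, not risk relative to the wider population. Note also that length of residence can usually be derived from age and age of arrival, so the three variables together carry roughly two variables' worth of identifying information.

```python
from collections import Counter

# Hypothetical toy records; the values are invented for illustration only.
RECORDS = [
    {"age": 34, "age_of_arrival": 20, "length_of_residence": 14},
    {"age": 34, "age_of_arrival": 20, "length_of_residence": 14},
    {"age": 41, "age_of_arrival": 25, "length_of_residence": 16},
    {"age": 41, "age_of_arrival": 25, "length_of_residence": 16},
    {"age": 87, "age_of_arrival": 12, "length_of_residence": 75},
]

# Assumed set of indirect (quasi-) identifiers for this toy dataset.
QUASI_IDENTIFIERS = ("age", "age_of_arrival", "length_of_residence")


def class_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Size of the smallest equivalence class (the k in k-anonymity)."""
    return min(class_sizes(rows, keys).values())


def sample_uniques(rows, keys):
    """Rows whose quasi-identifier combination occurs exactly once in the sample."""
    sizes = class_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(RECORDS, QUASI_IDENTIFIERS))
    print("sample-unique rows:", sample_uniques(RECORDS, QUASI_IDENTIFIERS))
```

In this invented sample the oldest participant is the only sample-unique record, which mirrors the point that risk is rarely spread evenly across a sample. A list of sample-unique rows like this could also serve as part of the documentation asked about in question (4).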

Editor's Notes

  1. Lack of clear procedural guidance and of precedent/case studies means that data controllers (i.e. researchers!) are understandably risk averse (risk being not only legal compliance, but also the time investment necessary to
  2. (https://ico.org.uk/for-organisations/guide-to-the-general-data-protection-regulation-gdpr/what-is-personal-data/can-we-identify-an-individual-indirectly/) If there is only a very slight possibility that an individual could be indirectly identified, is it still personal data? You should assume that you are not looking just at the means reasonably likely to be used by an ordinary person, but also by a determined person with a particular reason to want to identify individuals. The measures reasonably likely to be taken to identify an individual may vary depending upon the perceived value of the information.
  3. https://www.york.ac.uk/library/info-for/researchers/data/sharing/#tab-3 “In practice, even sensitive and personal data may be shared ethically if care has been taken in anonymisation, suitable consent obtained, reuse conditions prudently planned and appropriate data access restrictions applied.”
  4. From a project investigating whether there are differences in learning mechanisms between child and adult language learners: the minimal data required to model variability in the language attainment/proficiency of bilinguals as a function of their learning history (what age they started, how long their exposure has been, and cognitive skills theorised to underlie particular learning mechanisms). In this case, the biographical data are integral to the reproducibility of the analysis, and cannot be separated or binned etc. without detriment to the reproducibility.
  5. I have a sample of Polish immigrants, and data about their age at test, the age at which they arrived in the UK, and their length of residence. Is the combination of these indirect identifiers sufficient to reidentify an individual? There are approx. 900,000 Polish immigrants to the UK, so my population is large and the risk of reidentification small. However, the sampling criteria (very advanced proficiency in English, and minimum 12 years' residence) likely increase that risk, but by how much? Finally, the risk is not evenly spread throughout the sample (e.g. WWII immigrants).
  6. Depending on the answer to these questions, there are a variety of means by which data can be further anonymised, or other ways in which the data could be shared. However, there is a tradeoff between increasing the availability of the dataset, and ensuring the reproducibility of analyses underlying a published output, which I tried to sketch here in a back of the envelope fashion.
  7. My feeling is that there is likely to be a bias toward placing data in restricted access repositories, even when the disclosure risk is relatively small. The problem with this solution, at least for language researchers, is that it eliminates the repositories that are most commonly used (IRIS, which is a repository specialised in materials for L2 research, and OSF, Figshare etc). If you are interested in obtaining an open data badge, a Restricted access notation was added earlier this year, but only a small number of repositories have been certified. The first 4 on the list are in Germany, and relatively few ... UKDA has an end user agreement. But in practice, the repositories most commonly used (OSF, Figshare, GitHub) ...
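
The trade-off sketched in note 6 can be made concrete for binning. Under the assumption that age is recoded into fixed-width bands, the sketch below shows how the smallest group size changes with band width. The ages and function names are hypothetical and invented for illustration; real tools such as sdcMicro offer more principled recoding.

```python
from collections import Counter

# Hypothetical ages at test; invented for illustration only.
AGES = [31, 33, 34, 38, 42, 44, 47, 49]


def to_band(value, width):
    """Return the lower bound of the width-year band that contains value."""
    return (value // width) * width


def smallest_class(values):
    """Number of records sharing the rarest value (k for a single variable)."""
    return min(Counter(values).values())


def k_after_binning(values, width):
    """Smallest class size once every value is recoded into a band."""
    return smallest_class([to_band(v, width) for v in values])


if __name__ == "__main__":
    print("raw ages:      k =", smallest_class(AGES))
    print("5-year bands:  k =", k_after_binning(AGES, 5))
    print("10-year bands: k =", k_after_binning(AGES, 10))
```

Wider bands leave fewer unique records, but they also discard the fine-grained age information that, as note 4 points out, the original analysis depends on. This is the cost reflected in the low reproducibility rating for binning on slide 10.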