Adjusting to the GDPR: The Impact on Data Scientists and Behavioral Researchers
Travis Greene, Galit Shmueli, Soumya Ray
National Tsing Hua University, Taiwan
INFORMS 2nd Data Science Workshop, Phoenix, Nov 4, 2018
1
Roadmap
1. Personal Data: USA vs. EU
2. GDPR in a Nutshell
3. Processing GDPR through the InfoQ Framework
4. How will GDPR impact data scientists?
2
USA: Commercial commodity
"Collecting and processing [personal data] is allowed unless it causes harm or is expressly limited by U.S. law."
Opt-out
EU: Fundamental right (Article 8 EU Charter of Fundamental Rights)
"Processing of personal data is prohibited unless there is an explicit legal basis that allows it."
Opt-in
Personal Data:
Any information that could be used to ‘single out’ a person
3
(Potentially) global reach
● Fines of up to 20M Euro or 4% of global annual turnover, whichever is higher
● Affects both industry and research practices
● Similar privacy laws in USA, China, India, Brazil...
Evolution of 1995 Data Protection Directive into EU-wide Regulation
Defines three key entities:
● Data Controller
● Data Processor
● Data Subjects
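The fine ceiling mentioned on this slide can be sketched as a one-line calculation. This is a minimal illustration, assuming the upper tier of GDPR administrative fines (the higher of the flat cap and the turnover share); the function name and the example turnover figures are hypothetical, and actual fines are set case by case by the authorities.

```python
FLAT_CAP_EUR = 20_000_000   # flat ceiling from the slide
TURNOVER_SHARE = 0.04       # 4% of global annual turnover


def max_fine_eur(global_annual_turnover_eur: float) -> float:
    """Return the upper bound on a fine: the higher of the two ceilings."""
    return max(FLAT_CAP_EUR, TURNOVER_SHARE * global_annual_turnover_eur)


# Hypothetical firms: for the smaller one the flat cap binds,
# for the larger one the turnover share binds.
small_firm_cap = max_fine_eur(100_000_000)
large_firm_cap = max_fine_eur(1_000_000_000)
```

The point for large firms is that the ceiling scales with turnover, which is why the liability concerns later in the deck weigh most heavily on major data holders.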
4
5
If you’re a data science researcher, it is difficult to synthesize a coherent understanding of the new GDPR changes
→ We need a structured framework!
6
Our three-step approach to analyzing GDPR
1. Identify
Identify key GDPR concepts, definitions, principles relevant to data science research
2. Categorize
Categorize key GDPR concepts in a meaningful way for data scientists
3. Analyze
Use categorization to analyze the impact of GDPR on data science workflow
7
InfoQ provides a coherent, systematic framework for assessing the impact of GDPR on data scientists
The Information Quality (InfoQ) Framework (Kenett & Shmueli, 2014)
InfoQ: Potential of a dataset to achieve a goal, given analysis method and utility
InfoQ depends on 4 components: goal, data, analysis method, utility
Assess InfoQ? 8 Dimensions:
1. Data resolution
2. Data structure
3. Data integration
4. Temporal relevance
5. Chronology of data & goal
6. Generalizability
7. Operationalization
8. Communication
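The verbal definition above is usually written compactly in the InfoQ literature. The notation below follows Kenett & Shmueli as best recalled here, so treat the exact symbols as an assumption rather than a quotation from the slide.

```latex
% g: analysis goal, X: available data,
% f: empirical analysis method, U: utility measure
\mathrm{InfoQ}(f, X, g) = U\bigl(f(X \mid g)\bigr)
```

Read from the inside out: the data X are analyzed with method f in light of goal g, and the result is scored by the utility U.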
8
GDPR Concepts, Definitions, Principles
Privacy by Design
Special Category Data
Purpose Limitation
Automated Profiling
Systems
Pre-GDPR Data
Pseudonymized Data
Legitimate Interests
Structured and Unstructured Data
Statistical Research
Statistical Aggregations
Consent
Principle of Proportionality
Data Controllers
InfoQ
Statistical Research
Contractual
Necessity
Goal
Scientific Research
Statistical Research
Public Interest
Research
Historical Research
Archival Research
Data
Personal Data
Special Category data
Pseudonymized data
Statistical Data
Publicly available data
Pre-GDPR Personal Data
Utility
Principle of Proportionality
Purpose Limitation
Contractual Necessity
Legitimate Interests
Privacy by Design
Consent
Analysis
Statistical Aggregation
Automated Profiling
Filing Systems
Structured vs. Unstructured
Documentation
Serve Mankind
1. Identify
2. Categorize
Examine Typical Data Analysis Workflow Using InfoQ Framework
InfoQ provides us with ‘x-ray’ vision for analyzing each step of the process
Beginning of Research
1. Collect Data
2. Use Data
3. Share Data
4. Generalize
5. Communicate
Complete Analysis
InfoQ 8 Dimensions
1. Resolution
2. Structure
3. Integration
4. Temporal relevance
5. Chronology
6. Generalizability
7. Operationalization
8. Communication
3. Analyze
11
A Modern Data Science Workflow
1. Collect Data
Data Minimization: What kinds of data can we legally collect?
Purpose Limitation: On which legal grounds can we collect users’ data?
Pseudonymization: How should collected data be stored and secured?
2. Use Data
Pre-GDPR Data: If subjects consented prior to GDPR, can we continue to use their data?
Heterogeneity: Will these data be available at the time of prediction?
3. Share Data
Collaboration: How can academics make use of the vast stores of behavioral big data (BBD) collected and processed by major internet companies?
Liability: GDPR imposes large potential fines
4. Generalize
Consent bias: How do we know our results will generalize to the population of interest?
Replication: Can our results be replicated?
5. Communicate
Data Subjects: How do we explain our results to concerned data subjects?
Data Protection Authorities: How can we prove our compliance with GDPR principles?
8 InfoQ Dimensions
1. Resolution
2. Structure
3. Integration
4. Temporal relevance
5. Chronology
6. Generalizability
7. Operationalization
8. Communication
1. Gathering Data
Pre-collection
2. Using Data
3. Sharing Data
4. Generalizing
5. Communicating
Data Minimization & Purpose limitation
Collect only for specific purposes clearly explained
→ Must justify “Why do you need my ethnicity?”
Can’t arbitrarily repurpose personal data
→ Need legal basis
Data minimization & privacy preservation paradox
→ Power calculations may indirectly lead to re-identification
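One way to picture data minimization and purpose limitation in a pipeline is a per-purpose field whitelist. The sketch below is only an illustration: the purpose names, field names, and the `minimize` helper are hypothetical, and a real system would also need to record the legal basis for each purpose.

```python
# Hypothetical mapping from a declared processing purpose to the fields it needs.
PURPOSE_FIELDS = {
    "delivery": {"name", "address"},
    "churn_model": {"tenure_months", "plan"},
}


def minimize(record, purpose):
    """Keep only the fields declared for this purpose; reject undeclared purposes."""
    if purpose not in PURPOSE_FIELDS:
        raise ValueError(f"No declared purpose {purpose!r}; a legal basis is needed first")
    allowed = PURPOSE_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


# Example record with a field ("ethnicity") that no declared purpose justifies.
record = {
    "name": "A. User",
    "address": "1 Main St",
    "ethnicity": "n/a",
    "tenure_months": 14,
    "plan": "basic",
}
model_input = minimize(record, "churn_model")
```

Under this design, repurposing the data means adding a new entry to the mapping, which forces the "why do you need this field?" question to be answered explicitly.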
12
Pseudonymization is just a suggestion
→ Spur research on ‘privacy protective data mining’
Different implications for different researchers
→ Personalized vs. aggregate-level models
Pseudonymized data is contextual
→ Know incentives & data environment
Pseudonymization
Data features that might (reasonably) be used to ID a specific person are stored separately and securely from other data
IP: 192.18.8.1
Name: Travis Green
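The separation described on this slide can be sketched in a few lines. This is a minimal illustration, not a compliance recipe: the `pseudonymize` helper, the `pid` key, the `clicks` field, and the choice of which fields count as identifying are all assumptions made for the example.

```python
import secrets

# Hypothetical choice: which features count as identifying depends on context.
IDENTIFYING_FIELDS = {"name", "ip"}


def pseudonymize(records):
    """Split each record into an identifying part and a non-identifying part.

    Returns (vault, dataset). The vault maps a random token to the identifying
    fields and is meant to be stored separately and securely. The dataset holds
    the remaining fields, keyed by the same token.
    """
    vault = {}
    dataset = []
    for record in records:
        token = secrets.token_hex(16)  # random link, not derived from the data
        vault[token] = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
        rest = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        dataset.append({"pid": token, **rest})
    return vault, dataset


records = [{"name": "Travis Green", "ip": "192.18.8.1", "clicks": 12}]
vault, dataset = pseudonymize(records)
```

Analysts would work only with `dataset`. Since the remaining fields can still single a person out in some data environments, the split alone does not make the data anonymous, which is the sense in which pseudonymized data is contextual.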
1. Gathering Data
Pre-collection
Post-collection
13
Reconsent, Data Availability & Heterogeneity
Pre-GDPR user data reconsent
→ Fewer rows but more accuracy
Data availability for future prediction
→ Must expect opt-outs
More user privacy options
→ Larger heterogeneity in completeness
Models built using de-consented data
→ Still not clear, but Article 7 seems to allow it (withdrawing consent does not affect the lawfulness of processing done before the withdrawal)
14
Increased Legal Liability
Companies dropping 3rd party sharing
→ Less rich data
Data subject re-identification and intellectual property
→ “Data access divide”: trusted researchers from elite universities
New legal instruments of compliance
→ Binding Corporate Rules (BCRs), standard contractual clauses, certification schemes
15
Consent Bias, Guinea Pigs, & Reproducibility
Privacy-savvy users may opt out
→ Limits inferential power
Lower standards of consent & processing
→ Non-EU users become behavioral big data guinea pigs
Reproducibility of results vs. legal liability
→ Is it worth it for firms?
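The consent-bias concern can be made concrete with a small simulation. Everything here is a hypothetical setup chosen for illustration: half the users are privacy-savvy, their outcome is one unit higher on average, and they opt out far more often. None of these numbers come from the talk.

```python
import random


def simulate_consent_bias(n=10_000, seed=0):
    """Compare the population mean with the mean among consenting users."""
    rng = random.Random(seed)
    population, consenting = [], []
    for _ in range(n):
        savvy = rng.random() < 0.5                        # assumed share of savvy users
        outcome = rng.gauss(1.0 if savvy else 0.0, 1.0)   # assumed group difference
        opt_out_prob = 0.8 if savvy else 0.1              # assumed opt-out rates
        population.append(outcome)
        if rng.random() >= opt_out_prob:
            consenting.append(outcome)
    pop_mean = sum(population) / len(population)
    sample_mean = sum(consenting) / len(consenting)
    return pop_mean, sample_mean, len(consenting)


pop_mean, sample_mean, n_consenting = simulate_consent_bias()
```

Because opting out is correlated with the outcome, the consenting sample under-represents the savvy group and its mean falls below the population mean. Collecting more consenting users does not fix this, since the gap comes from who opts out, not from sample size.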
16
Two Audiences: Data Subjects and Data Authorities
Data Subjects
→ Rights to access/information in simple, clear language
→ Right to explanation (why & how) of automated profiling
Authorities
→ Compliance documentation, data protection impact assessments (DPIAs), data breach reporting
17
Summary & Final Thoughts
- Rethink & justify how and why we collect, store, and analyze personal data
- Tradeoffs between economic development and fundamental rights to privacy
18

Adjusting to the GDPR: The Impact on Data Scientists and Behavioral Researchers

  • 1. Adjusting to the GDPR: The Impact on Data Scientists and Behavioral Researchers. Travis Greene, Galit Shmueli, Soumya Ray, National Tsing Hua University, Taiwan. INFORMS 2nd Data Science Workshop, Phoenix, Nov 4, 2018
  • 2. Roadmap: 1. Personal Data: USA vs. EU; 2. GDPR in a Nutshell; 3. Processing GDPR through the InfoQ Framework; 4. How will GDPR impact data scientists?
  • 3. Personal Data: Any information that could be used to ‘single out’ a person. USA (opt-out): Commercial commodity. “Collecting and processing [personal data] is allowed unless it causes harm or is expressly limited by U.S. law.” EU (opt-in): Fundamental right (Article 8 EU Charter of Fundamental Rights). “Processing of personal data is prohibited unless there is an explicit legal basis that allows it.”
  • 4. Evolution of 1995 Data Protection Directive into EU-wide Regulation. Defines three key entities: Data Controller, Data Processor, Data Subjects. (Potentially) global reach ● Up to 20M Euro fines or 4% of global turnover ● Affects both industry and research practices ● Similar privacy laws in USA, China, India, Brazil...
  • 5. 5
  • 6. If you’re a data science researcher, it is difficult to synthesize a coherent understanding of the new GDPR changes → We need a structured framework!
  • 7. Our three-step approach to analyzing GDPR: 1. Identify: Identify key GDPR concepts, definitions, principles relevant to data science research. 2. Categorize: Categorize key GDPR concepts in a meaningful way for data scientists. 3. Analyze: Use categorization to analyze the impact of GDPR on data science workflow.
  • 8. InfoQ provides a coherent, systematic framework for assessing the impact of GDPR on data scientists. The Information Quality (InfoQ) Framework (Kenett & Shmueli, 2014): Potential of a dataset to achieve a goal, given analysis method and utility. InfoQ depends on 4 components: goal, data, analysis, utility. Assess InfoQ? 8 Dimensions: 1. Data resolution; 2. Data structure; 3. Data integration; 4. Temporal relevance; 5. Chronology of data & goal; 6. Generalizability; 7. Operationalization; 8. Communication
  • 9. 1. Identify: GDPR Concepts, Definitions, Principles: Privacy by Design, Special Category Data, Purpose Limitation, Automated Profiling Systems, Pre-GDPR Data, Pseudonymized Data, Legitimate Interests, Structured and Unstructured Data, Statistical Research, Statistical Aggregations, Consent, Principle of Proportionality, Data Controllers, Contractual Necessity. 2. Categorize (by InfoQ component): Goal: Scientific Research, Statistical Research, Public Interest Research, Historical Research, Archival Research. Data: Personal Data, Special Category data, Pseudonymized data, Statistical Data, Publicly available data, Pre-GDPR Personal Data. Utility: Principle of Proportionality, Purpose Limitation, Contractual Necessity, Legitimate Interests, Privacy by Design, Consent. Analysis: Statistical Aggregation, Automated Profiling, Filing Systems, Structured vs. Unstructured, Documentation, Serve Mankind.
  • 10. 3. Analyze: Examine Typical Data Analysis Workflow Using InfoQ Framework. InfoQ provides us with ‘x-ray’ vision for analyzing each step of the process. Workflow (Beginning of Research → Complete Analysis): 1. Collect Data; 2. Use Data; 3. Share Data; 4. Generalize; 5. Communicate. InfoQ 8 Dimensions: 1. Resolution; 2. Structure; 3. Integration; 4. Temporal relevance; 5. Chronology; 6. Generalizability; 7. Operationalization; 8. Communication
  • 11. A Modern Data Science Workflow. 1. Collect Data: Data Minimization (What kinds of data can we legally collect?); Purpose Limitation (On which legal grounds can we collect users’ data?); Pseudonymization (How should collected data be stored and secured?). 2. Use Data: Pre-GDPR Data (If subjects consented prior to GDPR, can we continue to use their data?); Heterogeneity (Will these data be available at the time of prediction?). 3. Share Data: Collaboration (How can academics make use of the vast stores of BBD collected and processed by major internet companies?); Liability (GDPR imposes large potential fines). 4. Generalize: Consent bias (How do we know our results will generalize to the population of interest?); Replication (Can our results be replicated?). 5. Communicate: Data Subjects (How do we explain our results to concerned data subjects?); Data Protection Authorities (How can we prove our compliance with GDPR principles?). 8 InfoQ Dimensions: 1. Resolution; 2. Structure; 3. Integration; 4. Temporal relevance; 5. Chronology; 6. Generalizability; 7. Operationalization; 8. Communication
  • 12. Data Minimization & Purpose limitation. Collect only for specific purposes clearly explained → Must justify “Why do you need my ethnicity?” Can’t arbitrarily repurpose personal data → Need legal basis. Data minimization & privacy preservation paradox → Power calculations may indirectly lead to re-identification. Workflow steps: 1. Gathering Data (Pre-collection); 2. Using Data; 3. Sharing Data; 4. Generalizing; 5. Communicating
  • 13. Pseudonymization: Data features that might (reasonably) be used to ID a specific person are stored separately and securely from other data (IP: 192.18.8.1; Name: Travis Green). Pseudonymization is just a suggestion → Spur research on ‘privacy protective data mining’. Different implications for different researchers → Personalized vs. aggregate-level models. Pseudonymized data is contextual → Know incentives & data environment. Workflow step 1. Gathering Data: Pre-collection, Post-collection.
  • 14. Reconsent, Data Availability & Heterogeneity. Pre-GDPR user data reconsent → Fewer rows but more accuracy. Data availability for future prediction → Must expect opt-outs. More user privacy options → Larger heterogeneity in completeness. Models built using de-consented data → Still not clear, but Article 7 seems to allow it.
  • 15. Increased Legal Liability. Companies dropping 3rd party sharing → Less rich data. Data subject re-identification and intellectual property → “Data access divide”: trusted researchers from elite universities. New legal instruments of compliance → Binding Corporate Rules (BCRs), Standard contractual clauses, certification schemes.
  • 16. Consent Bias, Guinea Pigs, & Reproducibility. Privacy-savvy users may opt out → Limits inferential power. Lower standards of consent & processing → Non-EU users become behavioral big data guinea pigs. Reproducibility of results vs. legal liability → Is it worth it for firms?
  • 17. Two Audiences: Data Subjects and Data Authorities. Data Subjects → Rights to access/information in simple, clear language → Right to explanation (why & how) of automated profiling. Authorities → Compliance documentation, data privacy impact assessments (DPIAs), data breach reporting.
  • 18. Summary & Final Thoughts. Rethink & justify how and why we collect, store, and analyze personal data. Tradeoffs between economic development and fundamental rights to privacy.
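
Slide 13 describes pseudonymization as storing the features that could identify a person separately and securely from the rest of the data. A minimal Python sketch of that idea follows. It is an illustration only, not the authors' method: the function name `pseudonymize`, the choice of `name` and `ip` as identifying fields, the `clicks` column, and the use of a random token as the link between the two stores are all assumptions made for this example.

```python
import secrets

# Hypothetical choice of which fields count as identifying features.
IDENTIFYING_FIELDS = ("name", "ip")


def pseudonymize(records, identifying_fields=IDENTIFYING_FIELDS):
    """Split records into analysis rows and a separately kept identity store.

    Each analysis row keeps only non-identifying fields plus a random token.
    The identity store maps that token back to the identifying fields and is
    meant to be held apart from the analysis data under stricter access.
    """
    analysis_rows = []
    identity_store = {}
    for record in records:
        token = secrets.token_hex(8)  # random pseudonym, not derived from the data
        identity_store[token] = {
            field: record[field] for field in identifying_fields if field in record
        }
        row = {k: v for k, v in record.items() if k not in identifying_fields}
        row["pseudonym"] = token
        analysis_rows.append(row)
    return analysis_rows, identity_store


# Example values echo the slide; the second record and 'clicks' are made up.
records = [
    {"name": "Travis Green", "ip": "192.18.8.1", "clicks": 12},
    {"name": "A. Example", "ip": "192.0.2.7", "clicks": 3},
]
analysis_rows, identity_store = pseudonymize(records)
```

An analyst would work only with `analysis_rows`. Re-identification requires access to `identity_store`, which is why the slide stresses that whether pseudonymized data stays protective depends on the surrounding incentives and data environment.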