GDPR and Hadoop
The elephant in the room
Janosch Woschitz
2017-09-27
2
Agenda
• GDPR Overview
• Rights of the data subject
• Challenges within Hadoop ecosystem
• Technical considerations
3
Disclaimer
Take it with a grain of salt
• Complex and detailed topic
• This is NOT legal advice
• There are a lot of opinions and interpretations about GDPR
• This talk does not cover all aspects of GDPR
• Process matters, documentation is your friend
4
General Data Protection Regulation
GDP what?
“Regulation (EU) 2016/679 of the European Parliament [...] on the protection of natural persons with
regard to the processing of personal data and on the free movement of such data, and repealing
Directive 95/46/EC (General Data Protection Regulation)”
- Official title of the GDPR, http://eur-lex.europa.eu/eli/reg/2016/679/oj
• Establishes data protection as a fundamental right
• Creates unified data protection law for all EU member states
• Enables individuals in the EU to be in control of their personal data
5
General Data Protection Regulation
Who is affected?
• Applies if the data controller or processor (organization) or the data
subject (person) is based in the EU
• Applies to organizations based outside the European Union if they
process or monitor personal data of individuals in the EU (regardless of citizenship)
• Employees might be data subjects in the EU as well
6
General Data Protection Regulation
When does it happen?
• Officially published on May 4th 2016
• Applicable from May 25th 2018 across the EU (including UK)
• “Regulation” instead of “Directive” → no need for national
implementing legislation, directly applicable in all EU countries
• First evaluation and review report by the Commission due by May 25th 2020 (Art. 97)
7
General Data Protection Regulation
Why should I care?
• Better data protection and portability for consumers
• Fines for non-compliance (Art. 83) will be
– up to €10M or 2% of worldwide annual turnover, whichever is higher, for minor violations
– up to €20M or 4% of worldwide annual turnover, whichever is higher, for major violations
• Any data subject has the right to lodge a complaint with a supervisory
authority against any organisation (Art. 77)
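The two fine tiers can be turned into a small worked calculation. This is a minimal sketch, assuming the Art. 83 reading that the cap is the higher of the fixed amount and the percentage of worldwide annual turnover; the function name and the sample turnover figures are made up for illustration, and this is not legal advice.

```python
def max_fine_eur(annual_turnover_eur: int, major: bool) -> int:
    """Upper bound of a fine: the higher of the fixed cap and the turnover share."""
    # Tier values taken from the slide: 10M / 2% (minor), 20M / 4% (major).
    fixed_cap, percent = (20_000_000, 4) if major else (10_000_000, 2)
    # Integer arithmetic avoids floating-point rounding in the percentage.
    return max(fixed_cap, annual_turnover_eur * percent // 100)


# Hypothetical company with EUR 2 billion turnover, major violation:
print(max_fine_eur(2_000_000_000, major=True))   # 80000000
# Hypothetical company with EUR 100 million turnover, minor violation:
print(max_fine_eur(100_000_000, major=False))    # 10000000
```

For small organisations the fixed amount dominates; for large ones the percentage does.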
8
Privacy by design
Better data protection, you said?
• Privacy by design and by default, essential data protection
• Breach notification within 72 hours
• Data minimization and access limitation
• Data Protection Officer (DPO) and Data Protection Impact Assessments
(DPIAs)
• Active, specific and unambiguous consent
“the controller shall [...] implement appropriate technical and organisational measures [...] in an
effective manner [...] in order to meet the requirements of this Regulation and protect the rights of
data subjects.” - Article 25, GDPR
9
Personal data?
https://pixabay.com/en/family-drawing-children-cat-paper-879432/
10
Personal data (examples)
It all depends on context
• Location or web surfing data
• Video surveillance and images
• Personal interests or behavioural patterns
• A child's drawing depicting its family
• Publication of x-ray plates together with the patient's first name
• Damage caused by graffiti in public transportation
• X1234 drinks a glass of wine more than 3 times a week, drives a
Bentley and has a Windows 10 phone
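The last example hints at why context matters: attributes that look harmless on their own can single out a person once combined. The sketch below illustrates that idea on entirely made-up records; the field names, the extra IDs and the helper `singled_out` are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical records without names; only "X1234" echoes the slide's example.
records = [
    {"id": "X1234", "wine": ">3/week", "car": "Bentley", "phone": "Windows 10"},
    {"id": "X2345", "wine": ">3/week", "car": "Bentley", "phone": "Android"},
    {"id": "X3456", "wine": ">3/week", "car": "Volkswagen", "phone": "Windows 10"},
    {"id": "X4567", "wine": "<=3/week", "car": "Bentley", "phone": "Windows 10"},
    {"id": "X5678", "wine": "<=3/week", "car": "Volkswagen", "phone": "Android"},
]


def singled_out(rows, attributes):
    """Return the ids whose combination of the given attributes is unique."""
    combo = lambda row: tuple(row[a] for a in attributes)
    counts = Counter(combo(row) for row in rows)
    return [row["id"] for row in rows if counts[combo(row)] == 1]


print(singled_out(records, ["car"]))                    # []
print(singled_out(records, ["wine", "car", "phone"]))   # all five ids
```

No single attribute identifies anyone in this sample, but the combination of all three is unique for every record.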
11
Data citizen rights
Rights of the data subject
• Right of access and data portability
– free of charge
– structured, commonly used and machine readable
• Right to erasure
– “without undue delay”
• Right to object, to restrict, to rectify, ...
Source: Facebook
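A "structured, commonly used and machine readable" export could, for instance, be a JSON document that collects everything stored about one person. This is a minimal sketch under assumptions: the source names, the `userId` join field on every data set and the function `export_subject_data` are hypothetical, and real systems would first have to locate the data across many stores.

```python
import json


def export_subject_data(user_id, sources):
    """Collect all rows belonging to one user and serialise them as JSON."""
    bundle = {
        name: [row for row in rows if row.get("userId") == user_id]
        for name, rows in sources.items()
    }
    return json.dumps({"userId": user_id, "data": bundle}, indent=2, sort_keys=True)


# Hypothetical in-memory stand-ins for the real data sets.
sources = {
    "customers": [
        {"userId": 123, "firstName": "Janosch", "dateOfBirth": "1984-01-01"},
        {"userId": 456, "firstName": "Alex", "dateOfBirth": "1990-05-17"},
    ],
    "device_locations": [
        {"userId": 123, "lat": 52.510781, "lon": 13.371735},
    ],
}

export = export_subject_data(123, sources)
print(export)
```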
GDPR and Hadoop
13
Hadoop ecosystem & beyond
The known Hadoopverse (excerpt)
and much more ...
14
Data processing on Hadoop
Bird’s eye view
• Various data sources and ingestion tools
• Diverse input formats, structured & unstructured
• Diverse processing tools
• Liberal data access, local data science
• Write-append and immutable data structures
• Redundant data
Ingest Process Access
15
Challenges by example
• Customer data from RDBMS to HDFS
• Streaming device location data to Kafka
16
Challenges by example
Ingest table from RDBMS
[Diagram: a daily import (e.g. via sqoop) copies the same record from the RDBMS into one snapshot per day (today, -1 day, -2 days). Labels: Big Data, Smaller Data. Every copy holds:]
“userId”: 123
“firstName”: “Janosch”
“dateOfBirth”: “1984-01-01”
17
Problems & Solution approaches
Problems:
• Right to be forgotten
• Access limitation
• Bound to consent
• ...
Solution approaches:
• Anonymization
• Hashing
• Encryption
• ...
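The hashing approach can be sketched as keyed hashing of an identifier, so that records stay joinable without exposing the original value. This is a minimal sketch, assuming an HMAC-SHA256 over the identifier; the function name and the secret are placeholders. Note that a keyed hash is pseudonymization rather than anonymization: whoever holds the key can recompute the mapping.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (same input and key, same output)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder secret; in practice it would live in a secrets manager.
key = b"replace-with-a-random-secret"

token = pseudonymize("123", key)
print(token)                                   # 64 hex characters
print(token == pseudonymize("123", key))       # True: joins still work
print(token == pseudonymize("123", b"other"))  # False: a new key breaks linkage
```

A plain unkeyed hash of a low-entropy field such as a numeric user ID is easy to reverse by trying all candidates, which is why the sketch uses a secret key.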
18
Challenges by example
Encrypt, a.k.a. Lost Key Pattern
[Diagram: daily import (e.g. via sqoop) with snapshots for today, -1 day and -2 days, plus the label 123 (presumably the per-user key). Unencrypted copies hold:]
“userId”: 123
“firstName”: “Janosch”
“dateOfBirth”: “1984-01-01”
[The encrypted copy holds:]
“userId”: 123
“firstName”: “54DCF13E4...”
“dateOfBirth”: “D3DFBCE...”
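The lost key pattern can be sketched as follows: personal fields are encrypted with a per-user key before they land in immutable snapshots, and erasure is done by deleting that key. Everything here is an assumption for illustration: `KeyStore` is a hypothetical stand-in for a system such as Vault or a KMS, and the HMAC-based keystream is a toy cipher built from the standard library so the example runs without dependencies. A real implementation should use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets


class KeyStore:
    """Hypothetical per-user key store (stand-in for Vault, a KMS, ...)."""

    def __init__(self):
        self._keys = {}

    def key_for(self, user_id):
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._keys[user_id]

    def get(self, user_id):
        return self._keys.get(user_id)

    def forget(self, user_id):
        self._keys.pop(user_id, None)


def _keystream(key, nonce, length):
    # TOY construction for illustration only, not for production use.
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_field(key, plaintext):
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return (nonce + cipher).hex()


def decrypt_field(key, token):
    raw = bytes.fromhex(token)
    nonce, cipher = raw[:16], raw[16:]
    plain = bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher))))
    return plain.decode("utf-8")


store = KeyStore()
snapshot = {
    "userId": 123,  # the identifier stays readable so the key can be looked up
    "firstName": encrypt_field(store.key_for(123), "Janosch"),
    "dateOfBirth": encrypt_field(store.key_for(123), "1984-01-01"),
}
readable_before = decrypt_field(store.get(123), snapshot["firstName"])

store.forget(123)                # erasure request: drop the key, not the snapshots
key_after = store.get(123)       # None: the stored ciphertext can no longer be read
```

The snapshots themselves are never rewritten, which fits write-append storage; the trade-off is that the key store becomes a critical, highly available dependency.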
19
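Challenges by example
Deletion in log based systems
[Diagram: an edge device with deviceId 123 pushes data to a Kafka topic, which a consumer reads.]
Example message:
“deviceId”: 123
“lat”: 52.510781
“lon”: 13.371735
Records in the Kafka topic (key: value): 456: A, 123: B, 123: C, 123: D, 123: ∅ (offsets 2 to 6)
Consumer reads for key 123: B, C, D, ∅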
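Deletion by tombstone can be simulated in a few lines. This is a simplified model of a compacted topic (`cleanup.policy=compact`), not Kafka client code: the log is a plain list, `None` plays the role of the ∅ tombstone, and the function `compact` and the offsets are illustrative assumptions. In real Kafka, tombstones are kept for a configurable time (`delete.retention.ms`) before they are dropped, and consumers that already read B, C and D still hold those values.

```python
def compact(log, drop_tombstones=False):
    """Keep only the latest record per key; optionally drop tombstones too.

    log: list of (key, value) pairs in offset order; value None is a tombstone.
    Returns a list of (offset, key, value) tuples that survive compaction.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted((offset, key, value) for key, (offset, value) in latest.items())
    if drop_tombstones:
        survivors = [record for record in survivors if record[2] is not None]
    return survivors


log = [(456, "A"), (123, "B"), (123, "C"), (123, "D"), (123, None)]

print(compact(log))                        # [(0, 456, 'A'), (4, 123, None)]
print(compact(log, drop_tombstones=True))  # [(0, 456, 'A')]
```

After compaction the old values for key 123 are gone from the topic, but only once the cleaner has run, which makes "without undue delay" hard to guarantee.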
20
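Challenges by example
Encrypt on write
[Diagram: an edge device with deviceId 123 pushes data to a Kafka topic; values are stored encrypted and the consumer needs the key for 123 to read them.]
Example message:
“deviceId”: 123
“lat”: 52.510781
“lon”: 13.371735
Records in the Kafka topic (key: encrypted value): 123: D4, 123: Z3, 456: T3, 123: 6H, 123: N7 (offsets 1 to 5)
Consumer reads: A, B, C, D
Key for 123: ?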
21
Vendor recommendations
Distributions to the rescue!
• Hortonworks - "GDPR: The Good, Bad and Ugly", Jun 20 2017
• Cloudera - "Simplify your response to GDPR", Aug 24 2017
• GDPR compliance via partner solutions
• Only partial answers
Source: Cloudera Inc.
22
GDPR recommendations simplified
Kudu
Sentry
Navigator
Data Science
Workbench
HDFS / ...
Ranger
Atlas
Zeppelin
+ lots of partner solutions
23
Data privacy and open source
Pragmatic considerations
• Secured cluster
• Raw data in encryption zones with very limited access
• Anonymize for further processing wherever possible
• Proper retention policies, batch delete requests and perform regular
clean-ups
• Integrate with Atlas and Ranger → tagging, filtering and masking
• Custom solutions for glue and missing pieces
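Retention policies and batched delete requests can be sketched on date-partitioned snapshots. This is a minimal in-memory sketch under assumptions: partitions are a dict keyed by date, rows carry a `userId`, and the function names are hypothetical. On a real cluster the same logic would rewrite or drop partition directories.

```python
from datetime import date, timedelta


def apply_retention(partitions, today, max_age_days):
    """Drop whole partitions that are older than the retention period."""
    cutoff = today - timedelta(days=max_age_days)
    return {day: rows for day, rows in partitions.items() if day >= cutoff}


def apply_erasure_batch(partitions, user_ids_to_erase):
    """Rewrite the remaining partitions without the users who asked for erasure."""
    return {
        day: [row for row in rows if row["userId"] not in user_ids_to_erase]
        for day, rows in partitions.items()
    }


today = date(2017, 9, 27)
row_123 = {"userId": 123, "firstName": "Janosch"}
row_456 = {"userId": 456, "firstName": "Alex"}
partitions = {
    today: [row_123, row_456],
    today - timedelta(days=1): [row_123, row_456],
    today - timedelta(days=2): [row_123],
    today - timedelta(days=400): [row_123, row_456],
}

kept = apply_retention(partitions, today, max_age_days=365)
cleaned = apply_erasure_batch(kept, user_ids_to_erase={123})
print(sorted(cleaned))  # three dates remain, none of them contains userId 123
```

Collecting erasure requests and applying them in one pass keeps the number of expensive rewrites low.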
24
Summary
The road ahead
• No comprehensive open-source solution available
• Proprietary services target specific problem domains, integration still
necessary
• It will take some time until the legal dust has settled
• Idea: Avro (logical types) + Vault (or similar) + Ranger + Atlas?
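One way to read the Avro idea is to mark personal fields directly in the schema so that tooling can find them. The sketch below is an assumption about how that could look, not a description of an existing integration: the logical type name `encrypted-pii` is invented, and the schema is handled as a plain Python dict without an Avro library. Per the Avro specification, implementations ignore logical types they do not know and fall back to the underlying type.

```python
import json

# Hypothetical schema for the customer record used in the earlier slides.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "userId", "type": "long"},
        {"name": "firstName", "type": {"type": "string", "logicalType": "encrypted-pii"}},
        {"name": "dateOfBirth", "type": {"type": "string", "logicalType": "encrypted-pii"}},
    ],
}


def pii_fields(schema, marker="encrypted-pii"):
    """List the field names whose type carries the (invented) PII logical type."""
    return [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], dict) and field["type"].get("logicalType") == marker
    ]


print(pii_fields(USER_SCHEMA))           # ['firstName', 'dateOfBirth']
print(json.dumps(USER_SCHEMA, indent=2))  # the schema as it would be stored
```

A writer could use such a list to decide which fields to encrypt with a per-user key, and a catalog could use it for tagging and masking policies.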
25
© 2017 Teradata
26
Hadoop Security Primer
In just one slide
• Authentication - Kerberos
• Authorization - Ranger, Sentry, ACLs
• Auditing / Monitoring - Ranger, Navigator, ...
• Encryption of data in motion - TLS, SASL-based RPC and data transfer encryption, ...
• Encryption of data at rest - Encryption zones (keys served by KMS), Navigator, SEDs, ...
• Hadoop Security (Ben Spivey, Joey Echeverria)
• Hadoop and Kerberos: The Madness beyond the Gate
27
Personal data
According to GDPR
“any information relating to an identified or identifiable natural person (‘data
subject’);
An identifiable natural person is one who can be identified, directly or indirectly,
in particular by reference to an identifier such as a name, an identification
number, location data, an online identifier or to one or more factors specific to
the physical, physiological, genetic, mental, economic, cultural or social identity
of that natural person.”
- Article 4, GDPR
