Technical Aspects of Data Anonymisation & Pseudonymisation
Risks, Challenges & Mitigations
Matt Lewis
Principal Consultant
Agenda
NCC Group – who we are and what we do
Anonymisation, Pseudonymisation & Re-identification – overview of concepts
Examples – when anonymisation goes wrong
Pitfalls of image anonymisation and other information leakage through meta-data
A risk-based approach to anonymisation
Summary and advice
Questions
7/15/2013 © NCC Group 2
NCC Group
Global information assurance specialist
15,000 customers worldwide across all sectors
The Group has two complementary divisions: escrow and assurance
Independence from hardware and software providers ensures we provide unbiased and impartial advice
Largest penetration testing team in the world, with approximately 250 consultants
Me: Brief Bio
Over 12 years working in Information Security
Previous Employers:
• CESG – The Information Assurance arm of GCHQ
• Information Risk Management (IRM) plc – penetration testing
• KPMG – Executive Advisor in the Information Protection division of IT Advisory
• NCC Group – Principal Consultant, providing penetration testing and consultancy around all aspects of Information Security
Anonymisation – Overview
Anonymised data should be information that does not identify any individuals, either in isolation or when cross-referenced with other data already in the public domain
A careful balance is required between the level of anonymisation and the usefulness of the resultant data
Quantitative versus Qualitative – the latter is harder to anonymise in a consistent way, and requires more rigour on a 'per record' basis – e.g. meeting minutes
Pseudonymisation – Overview
Information is anonymous to the receiver (e.g. researchers), but contains codes or identifiers that allow others to re-identify individuals from the pseudonymised data
Universally protecting pseudonymised data whilst allowing general analysis of it is difficult – it requires careful management of the 'codes' or 'keys' that uniquely identify individuals
Quantitative versus Qualitative – again, the latter is harder to pseudonymise in a consistent way, and requires more rigour on a 'per record' basis – e.g. meeting minutes
Anonymisation – Techniques and Methods
There are four main operations available for anonymising data:
Suppression, Substitution/Distortion, Generalisation, Aggregation
Consider the following dataset:
Name      Sex     Birth Date  Post Code  Complaint
John      Male    02/12/1954  SE24 6TY   Pain in left eye
Daniel    Male    05/01/1984  NW1 6XD    Chest pains
Sarah     Female  04/08/1978  E17 7WE    Chest pains
Samantha  Female  03/10/1960  WC1 7RA    Back pains
James     Male    09/09/1990  NW7 5LK    Headaches
Anonymisation – Techniques and Methods
Suppression - deleting or omitting data fields entirely
Sex     Complaint
Male    Pain in left eye
Male    Chest pains
Female  Chest pains
Female  Back pains
Male    Headaches
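A minimal Python sketch of suppression, assuming each record is a dict keyed by the column names above; DATASET and suppress are illustrative names, not part of any specific tool. Suppressed fields are simply never copied into the output records.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]

# The example dataset from the slides, one dict per patient.
COLUMNS = ("Name", "Sex", "Birth Date", "Post Code", "Complaint")
ROWS = [
    ("John", "Male", "02/12/1954", "SE24 6TY", "Pain in left eye"),
    ("Daniel", "Male", "05/01/1984", "NW1 6XD", "Chest pains"),
    ("Sarah", "Female", "04/08/1978", "E17 7WE", "Chest pains"),
    ("Samantha", "Female", "03/10/1960", "WC1 7RA", "Back pains"),
    ("James", "Male", "09/09/1990", "NW7 5LK", "Headaches"),
]
DATASET: List[Record] = [dict(zip(COLUMNS, row)) for row in ROWS]


def suppress(records: Iterable[Record], fields: Iterable[str]) -> List[Record]:
    """Return new records with the named fields removed entirely."""
    dropped = set(fields)
    return [{k: v for k, v in r.items() if k not in dropped} for r in records]


suppressed = suppress(DATASET, ["Name", "Birth Date", "Post Code"])
for row in suppressed:
    print(row["Sex"], row["Complaint"])
```

The original records are left untouched, so the full dataset can stay in a controlled environment while only the suppressed copy is released.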
Anonymisation – Techniques and Methods
Substitution/Distortion – e.g. replace a person's name with a unique number – this is also an example of pseudonymisation
Name     Sex     Birth Date  Post Code  Complaint
0000001  Male    02/12/1954  SE24 6TY   Pain in left eye
0000002  Male    05/01/1984  NW1 6XD    Chest pains
0000003  Female  04/08/1978  E17 7WE    Chest pains
0000004  Female  03/10/1960  WC1 7RA    Back pains
0000005  Male    09/09/1990  NW7 5LK    Headaches
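Substituting names with sequential numbers, as in the table above, could be sketched like this. The function name substitute_names is hypothetical; the point is that the ID-to-name lookup table it returns is exactly the "key" that makes this pseudonymisation rather than anonymisation, so it would need to be stored separately from the released data.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

COLUMNS = ("Name", "Sex", "Birth Date", "Post Code", "Complaint")
ROWS = [
    ("John", "Male", "02/12/1954", "SE24 6TY", "Pain in left eye"),
    ("Daniel", "Male", "05/01/1984", "NW1 6XD", "Chest pains"),
    ("Sarah", "Female", "04/08/1978", "E17 7WE", "Chest pains"),
    ("Samantha", "Female", "03/10/1960", "WC1 7RA", "Back pains"),
    ("James", "Male", "09/09/1990", "NW7 5LK", "Headaches"),
]
DATASET: List[Record] = [dict(zip(COLUMNS, row)) for row in ROWS]


def substitute_names(records: List[Record]) -> Tuple[List[Record], Dict[str, str]]:
    """Replace each Name with a 7-digit number; return the records and the lookup table."""
    lookup: Dict[str, str] = {}
    out: List[Record] = []
    for number, record in enumerate(records, start=1):
        pseudonym = f"{number:07d}"  # 1 -> "0000001", as on the slide
        lookup[pseudonym] = record["Name"]
        out.append({**record, "Name": pseudonym})
    return out, lookup


pseudonymised, key_table = substitute_names(DATASET)
for row in pseudonymised:
    print(row["Name"], row["Sex"], row["Complaint"])
```

Because the numbers follow input order, they can leak information if the input happens to be sorted (for example alphabetically by name); shuffling the records before numbering them is one way to avoid that.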
Anonymisation – Techniques and Methods
Generalisation - alter rather than delete identifier values to increase privacy while preserving utility
Name      Sex     Birth Year  Post Code  Complaint
John      Male    1954        SE24       Pain in left eye
Daniel    Male    1984        NW1        Chest pains
Sarah     Female  1978        E17        Chest pains
Samantha  Female  1960        WC1        Back pains
James     Male    1990        NW7        Headaches
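The two generalisations shown above (birth date to birth year, full post code to its outward code) can be sketched as follows. This assumes dates are always dd/mm/yyyy and post codes always have a space between the outward and inward parts. generalise is an illustrative helper name. The slide's table keeps Name so the effect is visible. In a real release, that direct identifier would normally be suppressed as well.

```python
from typing import Dict, List

Record = Dict[str, str]

COLUMNS = ("Name", "Sex", "Birth Date", "Post Code", "Complaint")
ROWS = [
    ("John", "Male", "02/12/1954", "SE24 6TY", "Pain in left eye"),
    ("Daniel", "Male", "05/01/1984", "NW1 6XD", "Chest pains"),
    ("Sarah", "Female", "04/08/1978", "E17 7WE", "Chest pains"),
    ("Samantha", "Female", "03/10/1960", "WC1 7RA", "Back pains"),
    ("James", "Male", "09/09/1990", "NW7 5LK", "Headaches"),
]
DATASET: List[Record] = [dict(zip(COLUMNS, row)) for row in ROWS]


def generalise(record: Record) -> Record:
    """Coarsen Birth Date to Birth Year and Post Code to its outward code."""
    generalised = {k: v for k, v in record.items() if k != "Birth Date"}
    generalised["Birth Year"] = record["Birth Date"].split("/")[2]  # dd/mm/yyyy -> yyyy
    generalised["Post Code"] = record["Post Code"].split()[0]       # "SE24 6TY" -> "SE24"
    return generalised


generalised = [generalise(r) for r in DATASET]
for row in generalised:
    print(row["Name"], row["Birth Year"], row["Post Code"])
```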
Anonymisation – Techniques and Methods
Aggregation - produce summary statistics across a dataset instead of an anonymised dataset
40% of patients complain of chest pains
60% of patients are male
etc.
Name      Sex     Birth Date  Post Code  Complaint
John      Male    02/12/1954  SE24 6TY   Pain in left eye
Daniel    Male    05/01/1984  NW1 6XD    Chest pains
Sarah     Female  04/08/1978  E17 7WE    Chest pains
Samantha  Female  03/10/1960  WC1 7RA    Back pains
James     Male    09/09/1990  NW7 5LK    Headaches
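Aggregation can be sketched as counting the values of one field and converting the counts to percentages. The helper name percentages is illustrative. Run on the example dataset, it reproduces the 40% and 60% figures above.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]

COLUMNS = ("Name", "Sex", "Birth Date", "Post Code", "Complaint")
ROWS = [
    ("John", "Male", "02/12/1954", "SE24 6TY", "Pain in left eye"),
    ("Daniel", "Male", "05/01/1984", "NW1 6XD", "Chest pains"),
    ("Sarah", "Female", "04/08/1978", "E17 7WE", "Chest pains"),
    ("Samantha", "Female", "03/10/1960", "WC1 7RA", "Back pains"),
    ("James", "Male", "09/09/1990", "NW7 5LK", "Headaches"),
]
DATASET: List[Record] = [dict(zip(COLUMNS, row)) for row in ROWS]


def percentages(records: List[Record], field: str) -> Dict[str, float]:
    """Percentage of records taking each value of the given field."""
    counts = Counter(r[field] for r in records)
    return {value: 100 * n / len(records) for value, n in counts.items()}


print(percentages(DATASET, "Complaint")["Chest pains"])  # 40.0
print(percentages(DATASET, "Sex")["Male"])               # 60.0
```

With a dataset this small, each percentage still maps to a handful of people (20% of five patients is one person). This is what the inference example later exploits.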
Anonymisation of Qualitative Data
Pseudonymisation in qualitative data can be much more difficult
The content/themes of meeting minutes, for example, might allow re-identification of any pseudonymised individuals
Re-identification – What is it?
Re-identification is the act of cross-referencing anonymised data with other data sources, and using inference, deduction and correlation to identify individuals
Depending on the nature of the re-identified data, this might raise data protection concerns
Re-identification – Who does this and Why?
Researchers – e.g. computer scientists genuinely interested in the challenges of re-identification
Malicious individuals – using re-identified information to discriminate against, harass or discredit a victim
Investigative journalists
Organised crime – re-identification can facilitate the creation of fake identities, or be used to extort victims (if the data is personal/sensitive in nature)
Competitors – seeking to re-identify and publish to discredit
State-sponsored data mining and correlation
The Internet is essentially a vast, ever-growing cross-correlation database, most of which is open and free for anyone to access…
Re-identification
Inference as a Starting Point
Recall our example:
Name      Sex     Birth Date  Post Code  Complaint
John      Male    02/12/1954  SE24 6TY   Pain in left eye
Daniel    Male    05/01/1984  NW1 6XD    Chest pains
Sarah     Female  04/08/1978  E17 7WE    Chest pains
Samantha  Female  03/10/1960  WC1 7RA    Back pains
James     Male    09/09/1990  NW7 5LK    Headaches
Inference as a Starting Point
Suppose the following anonymised aggregations are published:
60% of patients complain of chest pains
60% of patients are male
100% of Back pain sufferers live in WC1
100% of patients are over 21 years of age
100% of females suffer from chest or back pains
20% of patients suffer from Pain in the left eye
From this we can infer the following table fields:
Sex, Birth Date (from age), Complaint, Post Code
Inference as a Starting Point
Suppose we know the sample size (i.e. 5), and the time of data publication
60% of patients are male
Sex     Birth Date  Complaint         Post Code
Male
Male
Male
Female
Female
Inference as a Starting Point
100% of patients are over 21 years of age
Sex     Birth Date  Complaint         Post Code
Male    <= 1992
Male    <= 1992
Male    <= 1992
Female  <= 1992
Female  <= 1992
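The first two inference steps are plain arithmetic: with a known sample size, a published percentage becomes an exact head count, and "over 21" at the publication date bounds the birth year. A small sketch, assuming the publication year is 2013 (taken from the slide date) and that percentages are published as whole numbers; the function names are hypothetical.

```python
SAMPLE_SIZE = 5
PUBLICATION_YEAR = 2013  # assumption: taken from the slide date


def count_from_percentage(percentage: int, sample_size: int) -> int:
    """Turn a published whole-number percentage into an exact record count."""
    count, remainder = divmod(percentage * sample_size, 100)
    if remainder:
        raise ValueError("percentage does not match a whole number of records")
    return count


def latest_birth_year(publication_year: int, minimum_age: int) -> int:
    """Conservative bound: nobody over minimum_age at publication was born later."""
    return publication_year - minimum_age


published = {"Male": 60, "Chest pains": 60, "Pain in left eye": 20, "Over 21": 100}
counts = {label: count_from_percentage(p, SAMPLE_SIZE) for label, p in published.items()}
print(counts)  # {'Male': 3, 'Chest pains': 3, 'Pain in left eye': 1, 'Over 21': 5}
print(latest_birth_year(PUBLICATION_YEAR, 21))  # 1992
```

The ValueError branch also catches published figures that cannot correspond to the claimed sample size, which can itself hint that percentages were rounded.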
Inference as a Starting Point
100% of females suffer from chest or back pains
Sex     Birth Date  Complaint         Post Code
Male    <= 1992
Male    <= 1992
Male    <= 1992
Female  <= 1992     Chest pains
Female  <= 1992     Back pains
Inference as a Starting Point
100% of Back pain sufferers live in WC1
Sex     Birth Date  Complaint         Post Code
Male    <= 1992
Male    <= 1992
Male    <= 1992
Female  <= 1992     Chest pains
Female  <= 1992     Back pains        WC1
Inference as a Starting Point
20% of patients suffer from Pain in the left eye
Sex     Birth Date  Complaint         Post Code
Male    <= 1992     Pain in left eye
Male    <= 1992
Male    <= 1992
Female  <= 1992     Chest pains
Female  <= 1992     Back pains        WC1
Inference as a Starting Point
60% of patients complain of chest pains
The next step would be to correlate/cross-reference with other sources
Sex     Birth Date  Complaint         Post Code
Male    <= 1992     Pain in left eye
Male    <= 1992     Chest pains
Male    <= 1992     Chest pains
Female  <= 1992     Chest pains
Female  <= 1992     Back pains        WC1
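The step-by-step deduction can also be checked mechanically. This hypothetical sketch enumerates every multiset of five (Sex, Complaint) rows that reproduces the published statistics. It makes two assumptions. First, complaints are limited to those the statistics mention plus an "Other" catch-all. Second, a statistic about back pain sufferers implies at least one exists. The age and post code statistics do not affect this enumeration. Under these assumptions, only two candidate tables remain. One is the table above. In the other, the back pain belongs to a male patient. Either way, the search space left for cross-referencing is very small.

```python
from itertools import combinations_with_replacement
from typing import Callable, List, Tuple

Patient = Tuple[str, str]  # (Sex, Complaint)

SAMPLE_SIZE = 5
SEXES = ("Male", "Female")
# Assumption: complaints are those the statistics mention, plus a catch-all.
COMPLAINTS = ("Chest pains", "Back pains", "Pain in left eye", "Other")


def consistent(table: Tuple[Patient, ...]) -> bool:
    """True if a candidate table reproduces every published statistic."""
    def count(pred: Callable[[Patient], bool]) -> int:
        return sum(1 for p in table if pred(p))

    return (
        count(lambda p: p[0] == "Male") == 3                  # 60% are male
        and count(lambda p: p[1] == "Chest pains") == 3       # 60% chest pains
        and count(lambda p: p[1] == "Pain in left eye") == 1  # 20% left-eye pain
        and all(p[1] in ("Chest pains", "Back pains")         # females: chest/back only
                for p in table if p[0] == "Female")
        and count(lambda p: p[1] == "Back pains") >= 1        # assumed: back pain exists
    )


def candidate_tables() -> List[Tuple[Patient, ...]]:
    """Every multiset of five (Sex, Complaint) rows consistent with the statistics."""
    rows = [(s, c) for s in SEXES for c in COMPLAINTS]
    return [t for t in combinations_with_replacement(rows, SAMPLE_SIZE) if consistent(t)]


for table in candidate_tables():
    print(table)
```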
An Example: Massachusetts Group Insurance Commission (GIC)
In the mid-1990s GIC released anonymised data on state employees that showed every single hospital visit
The goal was to help researchers; the state spent time removing all obvious identifiers such as name, address and Social Security number
William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers
Latanya Sweeney, then a computer science graduate student, requested a copy of the data and performed re-identification research on the dataset
Main anonymisation technique used: Suppression
An Example: Massachusetts Group Insurance Commission (GIC)
Governor Weld lived in Cambridge MA: 54,000 residents, 7 post codes
Electoral roll purchased for $20: contained name, address, post code, birth date, sex etc.
GIC anonymised data: linked to the electoral roll via birth date, sex and post code
Only 6 people in Cambridge shared Weld's birthday
Only 3 of these were men
Only 1 lived in Weld's post code
Dr. Sweeney sent the Governor's health records (which included diagnoses and prescriptions) to his office
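The linkage step amounts to a join on birth date, sex and post code. The sketch below reuses rows from the example dataset with names suppressed, and joins them against a small electoral roll. The roll is hypothetical: the name Peter and his details are invented for illustration. A record counts as re-identified only when exactly one roll entry shares its quasi-identifiers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (Birth Date, Sex, Post Code, Complaint): rows from the example with Name suppressed.
ANONYMISED: List[Tuple[str, str, str, str]] = [
    ("02/12/1954", "Male", "SE24 6TY", "Pain in left eye"),
    ("05/01/1984", "Male", "NW1 6XD", "Chest pains"),
    ("04/08/1978", "Female", "E17 7WE", "Chest pains"),
]

# Hypothetical public electoral roll: (Name, Birth Date, Sex, Post Code).
ELECTORAL_ROLL: List[Tuple[str, str, str, str]] = [
    ("John", "02/12/1954", "Male", "SE24 6TY"),
    ("Daniel", "05/01/1984", "Male", "NW1 6XD"),
    ("Peter", "05/01/1984", "Male", "NW1 6XD"),  # invented: shares Daniel's details
    ("Sarah", "04/08/1978", "Female", "E17 7WE"),
]


def link(anonymised: List[Tuple[str, str, str, str]],
         roll: List[Tuple[str, str, str, str]]) -> Dict[str, str]:
    """Map name -> complaint for records that match exactly one roll entry."""
    names_by_qid: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for name, birth_date, sex, post_code in roll:
        names_by_qid[(birth_date, sex, post_code)].append(name)

    matches: Dict[str, str] = {}
    for birth_date, sex, post_code, complaint in anonymised:
        candidates = names_by_qid.get((birth_date, sex, post_code), [])
        if len(candidates) == 1:  # unique match: re-identified
            matches[candidates[0]] = complaint
    return matches


print(link(ANONYMISED, ELECTORAL_ROLL))
# {'John': 'Pain in left eye', 'Sarah': 'Chest pains'}
```

Daniel's record survives only because a second person shares his details. This is why "Only 1 lived in Weld's post code" was the decisive step.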
Another Example: America Online (AOL) Data Release
In 2006, AOL publicly released twenty million search queries for 650,000 users of AOL's search engine, summarising three months of activity
AOL suppressed the username and IP address, but replaced these with unique numbers that allowed researchers to correlate different searches with a specific user (pseudonymisation)
New York Times reporters Michael Barbaro and Tom Zeller investigated the identity of User 4417749, whose searches had included:
• “landscapers in Lilburn, Ga”
• “several people with the last name Arnold”
• “homes sold in shadow lake subdivision gwinnett county georgia”
Another Example: America Online (AOL) Data Release
The reporters tracked down Thelma Arnold, a sixty-two-year-old widow from Lilburn, Georgia, who acknowledged that she had authored the searches, including queries such as:
• “numb fingers”, “60 single men” and “dog that urinates on everything”
Main anonymisation techniques used: Suppression and Substitution
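The weakness here is that a stable pseudonym lets anyone group every query by the same user into one profile. A sketch, using a hypothetical log excerpt: the 4417749 queries are the ones quoted on the slides, while user 1000001 and the PLACE_HINTS keywords are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical excerpt of a pseudonymised query log: (user number, query).
QUERY_LOG: List[Tuple[int, str]] = [
    (4417749, "landscapers in Lilburn, Ga"),
    (1000001, "cheap flights"),  # invented row
    (4417749, "homes sold in shadow lake subdivision gwinnett county georgia"),
    (4417749, "numb fingers"),
    (4417749, "60 single men"),
]


def profiles(log: List[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group queries by pseudonymous user number, preserving query order."""
    grouped: Dict[int, List[str]] = defaultdict(list)
    for user_id, query in log:
        grouped[user_id].append(query)
    return dict(grouped)


# Queries mentioning a place each narrow down who the user might be.
PLACE_HINTS = ("lilburn", "gwinnett", "georgia")  # illustrative keywords
for query in profiles(QUERY_LOG)[4417749]:
    if any(hint in query.lower() for hint in PLACE_HINTS):
        print(query)
```

No single query identifies the user. It is the grouping that the substituted ID enables that turns scattered hints into a profile.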
Anonymisation of non-textual data and Metadata
Non-textual data may also need anonymising, e.g. images and videos (obfuscating faces)
This might require the release of hundreds or thousands of anonymised files, rather than one large dataset
Hidden meta-data within those files is often forgotten, and can be a valuable source for individuals attempting re-identification
Anonymisation of non-textual data and Metadata
A simple example – a picture of a heron visiting my garden
Suppose I obfuscate the heron to protect its identity
GPS/Meta-Data in Image Files
Meta-Data Extraction
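To show where GPS data hides in a JPEG, here is a stdlib-only Python sketch that walks the file's marker segments and drops APP1 to APP15 and COM segments. APP1 holds Exif, including GPS tags, and XMP. It keeps APP0 (JFIF) and copies everything from the Start of Scan marker onwards unchanged. It is a sketch for baseline JPEG files only. It also drops APP2 ICC colour profiles, which may change colour rendering. It does not handle other formats, or anything visible in the pixels themselves. A maintained tool such as ExifTool is likely more robust in practice.

```python
import struct
from typing import List, Tuple

SOI, EOI, SOS, COM = 0xD8, 0xD9, 0xDA, 0xFE


def strip_jpeg_metadata(data: bytes) -> Tuple[bytes, List[str]]:
    """Drop APP1-APP15 and COM segments; return (clean bytes, removed segment notes)."""
    if data[:2] != bytes([0xFF, SOI]):
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    removed: List[str] = []
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in (SOS, EOI):      # compressed image data follows: copy the rest
            out += data[pos:]
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes its own 2 bytes
        segment = data[pos:pos + 2 + length]
        if 0xE1 <= marker <= 0xEF or marker == COM:
            name = "COM" if marker == COM else f"APP{marker - 0xE0}"
            removed.append(f"{name} {segment[4:10]!r}")  # first bytes identify the type
        else:
            out += segment
        pos += 2 + length
    return bytes(out), removed


# Synthetic example: SOI, a JFIF APP0, an Exif APP1 with a dummy payload, scan data.
app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
app1 = b"\xff\xe1" + struct.pack(">H", 10) + b"Exif\x00\x00GP"
scan = b"\xff\xda\x00\x02\x12\x34\xff\xd9"
clean, removed = strip_jpeg_metadata(b"\xff\xd8" + app0 + app1 + scan)
print(removed)
```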
A Risk-Based Approach
There is anonymisation guidance, but there is no anonymisation formula
Anonymisation is not an exact science
Each data set presents a unique instance, and the choice of anonymisation operation(s) must be carefully considered in order to maintain anonymity and utility
A risk-based approach is therefore the only option…
Risk Mitigation Advice
Before embarking on anonymisation, consult with experts on the proposed approach and potential risks
If there is no business case or perceived benefit in going through the process, then don't
Consider release to limited audiences – only go public if strictly required
Protect the anonymisation method/formula
Quantitative anonymisation can typically be automated; Qualitative anonymisation will require more manual effort
Always vet anonymisations of Quantitative and Qualitative data before release – don't just fire and forget
Risk Mitigation Advice
Perform your own rudimentary Google searches and correlation attempts with other public data sources before publishing
If in doubt, engage again with experts on the likelihood of re-identification given the derived anonymised data set
Consider the quantity of released anonymised data – in practically all re-identification studies performed, researchers have been more successful with larger databases
Try to remove one or more of the top 3 culprits: Post Code, Birth Date and Sex
• In 2000 Dr. Latanya Sweeney showed that 87% of all Americans could be uniquely identified using only these three pieces of information
Metadata – prior to release, ensure all meta-data is removed from documents; a number of tools exist for this, depending on the document type (e.g. Adobe Acrobat, Microsoft Office, JPEGs etc.)
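A quick pre-release check is to measure what fraction of records are unique on the chosen quasi-identifiers, such as the "top 3 culprits". A sketch, assuming records are dicts as in the earlier examples; unique_fraction is an illustrative name. A value close to 1.0 suggests those fields need further suppression or generalisation. This is closely related to the k-anonymity idea, where every combination must appear at least k times.

```python
from collections import Counter
from typing import Dict, List, Sequence

COLUMNS = ("Name", "Sex", "Birth Date", "Post Code", "Complaint")
ROWS = [
    ("John", "Male", "02/12/1954", "SE24 6TY", "Pain in left eye"),
    ("Daniel", "Male", "05/01/1984", "NW1 6XD", "Chest pains"),
    ("Sarah", "Female", "04/08/1978", "E17 7WE", "Chest pains"),
    ("Samantha", "Female", "03/10/1960", "WC1 7RA", "Back pains"),
    ("James", "Male", "09/09/1990", "NW7 5LK", "Headaches"),
]
DATASET: List[Dict[str, str]] = [dict(zip(COLUMNS, row)) for row in ROWS]


def unique_fraction(records: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> float:
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


print(unique_fraction(DATASET, ["Post Code", "Birth Date", "Sex"]))  # 1.0
print(unique_fraction(DATASET, ["Sex"]))                             # 0.0
```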
Risk Mitigation Advice - Pseudonymisation
Don't choose reversible identifiers – for example, names replaced with identifiers that are based on record data
John Smith, 05/01/1978 -> JS05011978
Be aware of potential inferences from sorted data (e.g. alphabetical ordering might provide clues for re-identification)
Keep the pseudonymisation formula secret – protect it with the same controls as for encryption keys and passwords
Perform pseudonymisation functions in segregated, secure environments, and only copy/migrate the pseudonymised data (keep the two separate)
Remove all meta-data from pseudonymised data files – make sure the individual(s) performing the pseudonymisation are not referenced in the meta-data
Any cryptographic hashes used as identifiers/keys should always be salted
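The contrast between a reversible identifier and a protected one can be sketched in Python. One caveat: names and birth dates have little entropy, so a salt stored alongside the data mainly slows guessing. This sketch therefore swaps in a keyed hash (HMAC-SHA256 from the standard library), where the salt is treated as a secret key. That matches the advice above to protect the formula like an encryption key. The "name|birth date" input format and the function names are illustrative assumptions.

```python
import hashlib
import hmac
import secrets


def reversible_id(name: str, birth_date: str) -> str:
    """The anti-pattern on the slide: an ID built from the record's own data."""
    initials = "".join(part[0].upper() for part in name.split())
    return initials + birth_date.replace("/", "")


def keyed_pseudonym(identifier: str, key: bytes) -> str:
    """HMAC-SHA256 of the identifier under a secret key, hex-encoded."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Anyone who knows a name and birth date can recompute the reversible ID...
print(reversible_id("John Smith", "05/01/1978"))  # JS05011978

# ...and the same is true of a plain, unkeyed hash of the same low-entropy data.
plain = hashlib.sha256(b"John Smith|05/01/1978").hexdigest()
print(plain == hashlib.sha256(b"John Smith|05/01/1978").hexdigest())  # True

# With a secret key, the pseudonym is stable (so records can still be linked) but
# cannot feasibly be recomputed without the key. Store the key away from the data.
key = secrets.token_bytes(32)
print(keyed_pseudonym("John Smith|05/01/1978", key))
```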
References
ICO guidance on anonymisation:
http://www.ico.org.uk/for_organisations/data_protection/topic_guides/anonymisation
Paper on the failures of anonymisation, Paul Ohm:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006
Research/blog on anonymisation:
http://33bits.org/
Questions?
Matt Lewis
matt.lewis@nccgroup.com
UK Offices
Manchester - Head Office
Cheltenham
Edinburgh
Leatherhead
London
Thame
North American Offices
San Francisco
Atlanta
New York
Seattle
Australian Offices
Sydney
European Offices
Amsterdam - Netherlands
Munich – Germany
Zurich - Switzerland

08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

2013 07-12 ncc-group_data_anonymisation_technical_aspects_v1 0

  • 1. Technical Aspects of Data Anonymisation & Pseudonymisation: Risks, Challenges & Mitigations
    Matt Lewis, Principal Consultant
  • 2. Agenda
    • NCC Group: who we are and what we do
    • Anonymisation, Pseudonymisation & Re-identification: overview of concepts
    • Examples: when anonymisation goes wrong
    • Pitfalls of image anonymisation and other information leakage through metadata
    • A risk-based approach to anonymisation
    • Summary and advice
    • Questions
    7/15/2013 © NCC Group 2
  • 3. NCC Group
    • Global information assurance specialist
    • 15,000 customers worldwide across all sectors
    • The Group has two complementary divisions: escrow and assurance
    • Independence from hardware and software providers ensures we provide unbiased and impartial advice
    • Largest penetration testing team in the world, with approximately 250 consultants
  • 4. Me: Brief Bio
    Over 12 years working in Information Security. Employment history:
    • CESG: the Information Assurance arm of GCHQ
    • Information Risk Management (IRM) plc: penetration testing
    • KPMG: Executive Advisor in the Information Protection division of IT Advisory
    • NCC Group: Principal Consultant, providing penetration testing and consultancy across all aspects of Information Security
  • 5. Anonymisation – Overview
    • Anonymised data should be information that does not identify any individual, either in isolation or when cross-referenced with other data already in the public domain.
    • A careful balance is required between the level of anonymisation and the usefulness of the resulting data.
    • Quantitative versus qualitative: the latter is harder to anonymise consistently and requires more rigour on a 'per record' basis (e.g. meeting minutes).
  • 6. Pseudonymisation – Overview
    • Information is anonymous to the receiver (e.g. researchers) but contains codes or identifiers that allow others to re-identify individuals from the pseudonymised data.
    • Protecting pseudonymised data universally while still allowing general analysis of it is difficult; it requires careful management of the 'codes' or 'keys' that uniquely identify individuals.
    • Quantitative versus qualitative: again, the latter is harder to pseudonymise consistently and requires more rigour on a 'per record' basis (e.g. meeting minutes).
  • 7. Anonymisation – Techniques and Methods
    There are four main operations available for anonymising data: Suppression, Substitution/Distortion, Generalisation and Aggregation. Consider the following dataset:
    Name     | Sex    | Birth Date | Post Code | Complaint
    John     | Male   | 02/12/1954 | SE24 6TY  | Pain in left eye
    Daniel   | Male   | 05/01/1984 | NW1 6XD   | Chest pains
    Sarah    | Female | 04/08/1978 | E17 7WE   | Chest pains
    Samantha | Female | 03/10/1960 | WC1 7RA   | Back pains
    James    | Male   | 09/09/1990 | NW7 5LK   | Headaches
  • 8. Anonymisation – Techniques and Methods
    Suppression: deleting or omitting data fields entirely.
    Sex    | Complaint
    Male   | Pain in left eye
    Male   | Chest pains
    Female | Chest pains
    Female | Back pains
    Male   | Headaches
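A minimal Python sketch of suppression, assuming the slide 7 dataset is held as a list of dictionaries keyed by column name. The helper `suppress` is a hypothetical name used for illustration; real tooling, and the choice of which fields to drop, will differ per dataset.

```python
# Slide 7 example dataset, one dict per patient record.
RECORDS = [
    {"Name": "John", "Sex": "Male", "Birth Date": "02/12/1954",
     "Post Code": "SE24 6TY", "Complaint": "Pain in left eye"},
    {"Name": "Daniel", "Sex": "Male", "Birth Date": "05/01/1984",
     "Post Code": "NW1 6XD", "Complaint": "Chest pains"},
    {"Name": "Sarah", "Sex": "Female", "Birth Date": "04/08/1978",
     "Post Code": "E17 7WE", "Complaint": "Chest pains"},
    {"Name": "Samantha", "Sex": "Female", "Birth Date": "03/10/1960",
     "Post Code": "WC1 7RA", "Complaint": "Back pains"},
    {"Name": "James", "Sex": "Male", "Birth Date": "09/09/1990",
     "Post Code": "NW7 5LK", "Complaint": "Headaches"},
]


def suppress(records, fields_to_drop):
    """Return new records with the given fields omitted entirely."""
    drop = set(fields_to_drop)
    return [{k: v for k, v in r.items() if k not in drop} for r in records]


released = suppress(RECORDS, ["Name", "Birth Date", "Post Code"])
for row in released:
    print(row["Sex"], "|", row["Complaint"])
```

The original records are left untouched, so the released copy can be vetted against the source before publication.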
  • 9. Anonymisation – Techniques and Methods
    Substitution/Distortion: e.g. replace a person's name with a unique number. This is also an example of pseudonymisation.
    Name     | Sex    | Birth Date | Post Code | Complaint
    0000001  | Male   | 02/12/1954 | SE24 6TY  | Pain in left eye
    0000002  | Male   | 05/01/1984 | NW1 6XD   | Chest pains
    0000003  | Female | 04/08/1978 | E17 7WE   | Chest pains
    0000004  | Female | 03/10/1960 | WC1 7RA   | Back pains
    0000005  | Male   | 09/09/1990 | NW7 5LK   | Headaches
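The substitution step can be sketched as follows, assuming sequential zero-padded IDs as on slide 9. The `pseudonymise` helper is hypothetical; the point it illustrates is that the ID-to-name lookup table is the re-identification key and has to be kept apart from the released data.

```python
# Slide 7 example dataset, one dict per patient record.
RECORDS = [
    {"Name": "John", "Sex": "Male", "Birth Date": "02/12/1954",
     "Post Code": "SE24 6TY", "Complaint": "Pain in left eye"},
    {"Name": "Daniel", "Sex": "Male", "Birth Date": "05/01/1984",
     "Post Code": "NW1 6XD", "Complaint": "Chest pains"},
    {"Name": "Sarah", "Sex": "Female", "Birth Date": "04/08/1978",
     "Post Code": "E17 7WE", "Complaint": "Chest pains"},
    {"Name": "Samantha", "Sex": "Female", "Birth Date": "03/10/1960",
     "Post Code": "WC1 7RA", "Complaint": "Back pains"},
    {"Name": "James", "Sex": "Male", "Birth Date": "09/09/1990",
     "Post Code": "NW7 5LK", "Complaint": "Headaches"},
]


def pseudonymise(records, field="Name"):
    """Replace `field` with a zero-padded sequential ID.

    Returns (released, lookup). `lookup` maps each ID back to the original
    value, so it must never be released alongside the data.
    """
    released, lookup = [], {}
    for number, record in enumerate(records, start=1):
        pseudonym = f"{number:07d}"  # e.g. "0000001"
        lookup[pseudonym] = record[field]
        released.append({**record, field: pseudonym})
    return released, lookup


released, lookup = pseudonymise(RECORDS)
print(released[0]["Name"], released[0]["Complaint"])  # prints 0000001 Pain in left eye
```

Numbering follows the order of the source table, so if that table is sorted (for example alphabetically) the IDs leak the ordering; shuffling before numbering is one option. Birth date and post code are left untouched here, so the records remain linkable to other sources.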
  • 10. Anonymisation – Techniques and Methods
    Generalisation: alter rather than delete identifier values, to increase privacy while preserving utility.
    Name     | Sex    | Birth Year | Post Code | Complaint
    John     | Male   | 1954       | SE24      | Pain in left eye
    Daniel   | Male   | 1984       | NW1       | Chest pains
    Sarah    | Female | 1978       | E17       | Chest pains
    Samantha | Female | 1960       | WC1       | Back pains
    James    | Male   | 1990       | NW7       | Headaches
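Generalisation as shown on slide 10 can be sketched like this, assuming birth dates are in DD/MM/YYYY form and that keeping only the outward code of a UK postcode (the part before the space) is the desired coarsening. The `generalise` function is illustrative only.

```python
# Slide 7 example dataset, one dict per patient record.
RECORDS = [
    {"Name": "John", "Sex": "Male", "Birth Date": "02/12/1954",
     "Post Code": "SE24 6TY", "Complaint": "Pain in left eye"},
    {"Name": "Daniel", "Sex": "Male", "Birth Date": "05/01/1984",
     "Post Code": "NW1 6XD", "Complaint": "Chest pains"},
    {"Name": "Sarah", "Sex": "Female", "Birth Date": "04/08/1978",
     "Post Code": "E17 7WE", "Complaint": "Chest pains"},
    {"Name": "Samantha", "Sex": "Female", "Birth Date": "03/10/1960",
     "Post Code": "WC1 7RA", "Complaint": "Back pains"},
    {"Name": "James", "Sex": "Male", "Birth Date": "09/09/1990",
     "Post Code": "NW7 5LK", "Complaint": "Headaches"},
]


def generalise(record):
    """Coarsen two identifiers while keeping the rest of the record.

    - Birth Date (DD/MM/YYYY) becomes Birth Year.
    - Post Code keeps only the outward code (before the space).
    """
    out = {k: v for k, v in record.items() if k != "Birth Date"}
    out["Birth Year"] = record["Birth Date"].split("/")[-1]
    out["Post Code"] = record["Post Code"].split()[0]
    return out


generalised = [generalise(r) for r in RECORDS]
print(generalised[0]["Birth Year"], generalised[0]["Post Code"])  # prints 1954 SE24
```

As on slide 10, names are left in place; in practice generalisation would be combined with suppression of direct identifiers.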
  • 11. Anonymisation – Techniques and Methods
    Aggregation: produce summary statistics across a dataset instead of an anonymised dataset, for example:
    • 40% of patients complain of chest pains
    • 60% of patients are male
    • etc.
    Source dataset:
    Name     | Sex    | Birth Date | Post Code | Complaint
    John     | Male   | 02/12/1954 | SE24 6TY  | Pain in left eye
    Daniel   | Male   | 05/01/1984 | NW1 6XD   | Chest pains
    Sarah    | Female | 04/08/1978 | E17 7WE   | Chest pains
    Samantha | Female | 03/10/1960 | WC1 7RA   | Back pains
    James    | Male   | 09/09/1990 | NW7 5LK   | Headaches
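The two example statistics on slide 11 can be reproduced from the source table with a short sketch; only the Sex and Complaint columns are needed, and `percentages` is a hypothetical helper name.

```python
from collections import Counter

# (Sex, Complaint) for each patient in the slide 7 dataset.
RECORDS = [
    ("Male", "Pain in left eye"),
    ("Male", "Chest pains"),
    ("Female", "Chest pains"),
    ("Female", "Back pains"),
    ("Male", "Headaches"),
]


def percentages(values):
    """Map each distinct value to the percentage of records holding it."""
    counts = Counter(values)
    total = len(values)
    return {value: 100 * n / total for value, n in counts.items()}


by_sex = percentages([sex for sex, _ in RECORDS])
by_complaint = percentages([complaint for _, complaint in RECORDS])
print(f"{by_complaint['Chest pains']:.0f}% of patients complain of chest pains")  # 40%
print(f"{by_sex['Male']:.0f}% of patients are male")  # 60%
```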
  • 12. Anonymisation of Qualitative Data
  • 13. Anonymisation of Qualitative Data
    Pseudonymisation of qualitative data can be much more difficult. The content and themes in this meeting, for example, might allow re-identification of any pseudonymised individuals.
  • 14. Re-identification – What is it?
    Re-identification is the act of cross-referencing anonymised data with other data sources and using inference, deduction and correlation to identify individuals. Depending on the nature of the re-identified data, this might raise data protection concerns.
  • 15. Re-identification – Who does this and Why?
    • Researchers, e.g. computer scientists genuinely interested in the challenges of re-identification
    • Malicious individuals, who use re-identified information to discriminate against, harass or discredit a victim
    • Investigative journalists
    • Organised crime: re-identification can facilitate the creation of fake identities, or be used to extort victims (if the data is personal/sensitive in nature)
    • Competitors, seeking to re-identify and publish data to discredit
    • State-sponsored data mining and correlation
    The Internet is essentially a vast, ever-growing cross-correlation database, most of which is open and free for anyone to access…
  • 17. Inference as a Starting Point
    Recall our example:
    Name     | Sex    | Birth Date | Post Code | Complaint
    John     | Male   | 02/12/1954 | SE24 6TY  | Pain in left eye
    Daniel   | Male   | 05/01/1984 | NW1 6XD   | Chest pains
    Sarah    | Female | 04/08/1978 | E17 7WE   | Chest pains
    Samantha | Female | 03/10/1960 | WC1 7RA   | Back pains
    James    | Male   | 09/09/1990 | NW7 5LK   | Headaches
  • 18. Inference as a Starting Point
    Suppose the following anonymised aggregations are published:
    • 40% of patients complain of chest pains
    • 60% of patients are male
    • 100% of back pain sufferers live in WC1
    • 100% of patients are over 21 years of age
    • 100% of females suffer from chest or back pains
    • 20% of patients suffer from pain in the left eye
    From this we can infer the following table fields: Sex, Age, Complaint, Post Code.
  • 19. Inference as a Starting Point
    Suppose we know the sample size (i.e. 5) and the date of publication. Applying "60% of patients are male" (unknown values shown as ?):
    Sex    | Birth Date | Complaint        | Post Code
    Male   | ?          | ?                | ?
    Male   | ?          | ?                | ?
    Male   | ?          | ?                | ?
    Female | ?          | ?                | ?
    Female | ?          | ?                | ?
  • 20. Inference as a Starting Point
    Applying "100% of patients are over 21 years of age":
    Sex    | Birth Date | Complaint        | Post Code
    Male   | <= 1992    | ?                | ?
    Male   | <= 1992    | ?                | ?
    Male   | <= 1992    | ?                | ?
    Female | <= 1992    | ?                | ?
    Female | <= 1992    | ?                | ?
  • 21. Inference as a Starting Point
    Applying "100% of females suffer from chest or back pains":
    Sex    | Birth Date | Complaint        | Post Code
    Male   | <= 1992    | ?                | ?
    Male   | <= 1992    | ?                | ?
    Male   | <= 1992    | ?                | ?
    Female | <= 1992    | Chest pains      | ?
    Female | <= 1992    | Back pains       | ?
  • 22. Inference as a Starting Point
    Applying "100% of back pain sufferers live in WC1":
    Sex    | Birth Date | Complaint        | Post Code
    Male   | <= 1992    | ?                | ?
    Male   | <= 1992    | ?                | ?
    Male   | <= 1992    | ?                | ?
    Female | <= 1992    | Chest pains      | ?
    Female | <= 1992    | Back pains       | WC1
  • 23. Inference as a Starting Point
    Applying "20% of patients suffer from pain in the left eye" (one patient, who must be male, since every female has chest or back pains):
    Sex    | Birth Date | Complaint        | Post Code
    Male   | <= 1992    | Pain in left eye | ?
    Male   | <= 1992    | ?                | ?
    Male   | <= 1992    | ?                | ?
    Female | <= 1992    | Chest pains      | ?
    Female | <= 1992    | Back pains       | WC1
  • 24. Inference as a Starting Point
    Applying "40% of patients complain of chest pains" (two patients, one of whom is already placed, so one more male):
    Sex    | Birth Date | Complaint        | Post Code
    Male   | <= 1992    | Pain in left eye | ?
    Male   | <= 1992    | Chest pains      | ?
    Male   | <= 1992    | ?                | ?
    Female | <= 1992    | Chest pains      | ?
    Female | <= 1992    | Back pains       | WC1
    The next step would be to correlate/cross-reference with other sources.
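The walkthrough succeeds because, once the sample size (5) is known, every published percentage corresponds to one or two people. A common statistical disclosure control measure, not one the deck itself prescribes, is a minimum cell size: a statistic is only published if it rests on at least a threshold number of records. A sketch, assuming a hypothetical threshold of 3 and the slide 7 dataset reduced to three columns:

```python
from collections import Counter

# (Sex, Complaint, Post Code district) for the slide 7 dataset.
RECORDS = [
    ("Male", "Pain in left eye", "SE24"),
    ("Male", "Chest pains", "NW1"),
    ("Female", "Chest pains", "E17"),
    ("Female", "Back pains", "WC1"),
    ("Male", "Headaches", "NW7"),
]
MIN_CELL_SIZE = 3  # hypothetical threshold; choose per dataset and risk


def thresholded_percentages(records, column, min_cell=MIN_CELL_SIZE):
    """Percentages for one column, withholding values seen fewer than min_cell times."""
    counts = Counter(record[column] for record in records)
    total = len(records)
    published, withheld = {}, []
    for value, n in counts.items():
        if n >= min_cell:
            published[value] = 100 * n / total
        else:
            withheld.append(value)
    return published, withheld


print(thresholded_percentages(RECORDS, column=0))  # ({'Male': 60.0}, ['Female'])
print(thresholded_percentages(RECORDS, column=1))  # every complaint is withheld
```

Thresholding alone is not sufficient: withholding "Female" achieves little when the published 60% male figure and the known sample size imply the remainder, so derived (complementary) values need checking too.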
  • 25. An Example: Massachusetts Group Insurance Commission
    • In the mid-1990s the Group Insurance Commission (GIC) released anonymised data on state employees that showed every single hospital visit.
    • The goal was to help researchers; the state spent time removing all obvious identifiers such as name, address and Social Security number.
    • William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers.
    • Latanya Sweeney, then a computer science graduate student, requested a copy of the data and performed re-identification research on the dataset.
    • Main anonymisation technique used: Suppression.
  • 26. An Example: Massachusetts Group Insurance Commission
    • Governor Weld lived in Cambridge, MA: 54,000 residents, 7 post codes.
    • The Cambridge electoral roll, purchased for $20, contained name, address, post code, birth date, sex etc.
    • Cross-referencing the electoral roll with the GIC anonymised data:
      • only 6 people in Cambridge shared Weld's birth date;
      • only 3 of these were men;
      • only 1 of them lived in Weld's post code.
    • Sweeney sent the Governor's health records (which included diagnoses and prescriptions) to his office.
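The Weld re-identification is essentially a join on quasi-identifiers that both datasets share. A sketch using entirely fictional records that mirror the slide's funnel (three people share a birth date, two of them are men, one lives in the target post code); all names, dates, codes and diagnoses are made up, and `link` is a hypothetical helper.

```python
from collections import defaultdict

QUASI_IDENTIFIERS = ("birth_date", "sex", "post_code")

# Fictional electoral roll: identified, but holds no medical data.
ELECTORAL_ROLL = [
    {"name": "Voter A", "birth_date": "1950-01-01", "sex": "M", "post_code": "P1"},
    {"name": "Voter B", "birth_date": "1950-01-01", "sex": "F", "post_code": "P1"},
    {"name": "Voter C", "birth_date": "1950-01-01", "sex": "M", "post_code": "P2"},
    {"name": "Voter D", "birth_date": "1970-06-15", "sex": "F", "post_code": "P3"},
    {"name": "Voter E", "birth_date": "1970-06-15", "sex": "F", "post_code": "P3"},
]

# Fictional "anonymised" release: names suppressed, quasi-identifiers kept.
RELEASED = [
    {"birth_date": "1950-01-01", "sex": "M", "post_code": "P1", "diagnosis": "Condition X"},
    {"birth_date": "1970-06-15", "sex": "F", "post_code": "P3", "diagnosis": "Condition Y"},
]


def link(released, auxiliary, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs whose quasi-identifiers match exactly one person."""
    index = defaultdict(list)
    for person in auxiliary:
        index[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for record in released:
        candidates = index.get(tuple(record[k] for k in keys), [])
        if len(candidates) == 1:  # unique match: the record is re-identified
            matches.append((candidates[0], record))
    return matches


for name, record in link(RELEASED, ELECTORAL_ROLL):
    print(name, "->", record["diagnosis"])  # prints Voter A -> Condition X
```

The second released record matches two people and so is not uniquely re-identified by this join alone, although further sources could narrow it down.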
  • 27. Another Example: America Online (AOL) Data Release
    • In 2006, AOL publicly released twenty million search queries from 650,000 users of AOL's search engine, summarising three months of activity.
    • AOL suppressed usernames and IP addresses, but replaced them with unique numbers that allowed researchers to correlate different searches with a specific user (pseudonymisation).
    • New York Times reporters Michael Barbaro and Tom Zeller researched the identity of User 4417749, whose searches included:
      • “landscapers in Lilburn, Ga”
      • “several people with the last name Arnold”
      • “homes sold in shadow lake subdivision gwinnett county georgia”
  • 28. Another Example: America Online (AOL) Data Release
    • The reporters tracked down Thelma Arnold, a sixty-two-year-old widow from Lilburn, Georgia, who acknowledged that she had authored the searches, including queries such as:
      • “numb fingers”, “60 single men” and “dog that urinates on everything”
    • Main anonymisation techniques used: Suppression and Substitution.
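The AOL case shows why a persistent pseudonym is risky: because every query from one account carried the same number, anyone could gather an account's queries into a single profile, and it was the combination of clues (a town, a surname, a subdivision) that identified the searcher. A sketch with a fictional log; the IDs and queries are invented and `build_profiles` is a hypothetical helper.

```python
from collections import defaultdict

# Fictional search log: (persistent pseudonymous user ID, query).
QUERY_LOG = [
    ("1001", "gardeners in springfield"),
    ("2002", "cheap flights to rome"),
    ("1001", "people with the last name smith"),
    ("1001", "homes sold in oak tree estate springfield"),
    ("2002", "weather tomorrow"),
]


def build_profiles(log):
    """Group queries by pseudonym; a stable ID links all of a user's activity."""
    profiles = defaultdict(list)
    for user_id, query in log:
        profiles[user_id].append(query)
    return dict(profiles)


for query in build_profiles(QUERY_LOG)["1001"]:
    print(query)
```

Replacing the ID with a fresh random value per query would break this linkage, but would also remove the per-user correlation that made the data useful to researchers, which is the anonymisation-versus-utility balance from slide 5.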
  • 29. Anonymisation of Non-Textual Data and Metadata
    • Anonymisation might also be required for non-textual data, e.g. images and videos (obfuscating faces).
    • This might require the release of hundreds or thousands of anonymised files, rather than one large dataset.
    • Hidden metadata within those files is often forgotten, and can be a valuable source for anyone attempting re-identification.
  • 30. Anonymisation of Non-Textual Data and Metadata
    A simple example: a picture of a heron visiting my garden. Suppose I obfuscate the heron to protect its identity.
  • 31. GPS/Meta-Data in Image Files
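In JPEG files, GPS coordinates are usually stored in the Exif block, which sits in an APP1 segment whose payload begins with `Exif\0\0`. The following minimal standard-library sketch copies a JPEG while dropping those segments. It assumes a well-formed baseline JPEG, deliberately leaves other metadata such as XMP (which can also carry location or author details) untouched, and is illustrative rather than a substitute for a maintained metadata-removal tool; `strip_exif` is a hypothetical name.

```python
import struct

SOI, EOI, SOS, APP1 = 0xD8, 0xD9, 0xDA, 0xE1
# Markers that carry no length field: TEM and RST0-RST7.
STANDALONE = {0x01, *range(0xD0, 0xD8)}


def strip_exif(data: bytes) -> bytes:
    """Return a copy of a JPEG with APP1 Exif segments (which may hold GPS tags) removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker in STANDALONE or marker == EOI:
            out += data[pos:pos + 2]
            pos += 2
            if marker == EOI:
                break
            continue
        if marker == SOS:
            # Entropy-coded image data follows the scan header; copy the rest verbatim.
            out += data[pos:]
            break
        # The big-endian length counts itself (2 bytes) plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        segment = data[pos:pos + 2 + length]
        is_exif = marker == APP1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    return bytes(out)
```

In use, one would read the image's bytes, pass them to `strip_exif`, write the result to a new file, and then re-check the output with an independent metadata viewer before release.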
  • 33. A Risk-Based Approach
    • There is anonymisation guidance, but there is no anonymisation formula.
    • Anonymisation is not an exact science.
    • Each dataset presents a unique instance, and the choice of anonymisation operation(s) must be carefully considered in order to maintain both anonymity and utility.
    • A risk-based approach is therefore the only option…
  • 34. Risk Mitigation Advice
    • Consult experts on the proposed approach and potential risks before embarking on anonymisation.
    • If there is no business case or perceived benefit in going through the process, then don't.
    • Consider release to limited audiences; only go public if strictly required.
    • Protect the anonymisation method/formula.
    • Quantitative anonymisation can typically be automated; qualitative anonymisation will require more manual effort.
    • Always vet anonymisations of quantitative and qualitative data before release; don't just fire and forget.
  • 35. Risk Mitigation Advice
    • Perform your own rudimentary Google searches and correlation attempts with other public data sources before publishing.
    • If in doubt, engage again with experts on the likelihood of re-identification given the derived anonymised dataset.
    • Consider the quantity of released anonymised data: in practically all re-identification studies performed, researchers have been more successful with larger databases.
    • Try to remove one or more of the top 3 culprits: Post Code, Birth Date and Sex.
      • In 2000 Dr. Latanya Sweeney showed that 87% of all Americans could be uniquely identified using only these three pieces of information.
    • Metadata: prior to release, ensure all metadata in documents is removed. A number of tools exist for this, depending on the document type (e.g. Adobe Acrobat, Microsoft Office, JPEGs etc.).
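Sweeney's finding concerns quasi-identifiers: fields that are not names but which, in combination, single people out. One standard way to measure this (not described in the deck) is k-anonymity: a release is k-anonymous if every combination of quasi-identifier values is shared by at least k records. A sketch applied to the slide 7 dataset; the function names are illustrative.

```python
from collections import Counter

# Slide 7 example dataset (names already suppressed).
RECORDS = [
    {"Sex": "Male", "Birth Date": "02/12/1954", "Post Code": "SE24 6TY", "Complaint": "Pain in left eye"},
    {"Sex": "Male", "Birth Date": "05/01/1984", "Post Code": "NW1 6XD", "Complaint": "Chest pains"},
    {"Sex": "Female", "Birth Date": "04/08/1978", "Post Code": "E17 7WE", "Complaint": "Chest pains"},
    {"Sex": "Female", "Birth Date": "03/10/1960", "Post Code": "WC1 7RA", "Complaint": "Back pains"},
    {"Sex": "Male", "Birth Date": "09/09/1990", "Post Code": "NW7 5LK", "Complaint": "Headaches"},
]


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size; k == 1 means someone is unique."""
    return min(class_sizes(records, quasi_identifiers).values())


def unique_records(records, quasi_identifiers):
    """Records whose quasi-identifier combination appears only once."""
    sizes = class_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


TOP_3 = ("Post Code", "Birth Date", "Sex")
print(k_anonymity(RECORDS, TOP_3))     # prints 1: every patient is unique
print(k_anonymity(RECORDS, ("Sex",)))  # prints 2: post code and birth date removed
```

On a five-record table this is trivially small, but the same check run before release on a real dataset highlights exactly which records the top 3 culprits make unique.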
  • 36. Risk Mitigation Advice – Pseudonymisation
    • Don't choose reversible identifiers, for example names replaced with identifiers based on record data: John Smith, 05/01/1978 -> JS05011978.
    • Be aware of potential inferences from sorted data (e.g. alphabetical ordering might provide clues for re-identification).
    • Keep the pseudonymisation formula secret; protect it with the same controls as for encryption keys and passwords.
    • Perform pseudonymisation functions in segregated, secure environments, and only copy/migrate the pseudonymised data (keep the two separate).
    • Remove all metadata from pseudonymised data files; make sure the individual(s) performing the pseudonymisation are not referenced in the metadata.
    • Any cryptographic hashes used as identifiers/keys should always be salted.
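The first and last points are related. An identifier such as JS05011978 can be reversed by anyone who guesses the formula, and an unsalted hash of the same inputs is little better, because the space of plausible names and birth dates is small enough to hash exhaustively. One common way to apply the salting advice (an assumption here, not the deck's prescribed method) is a keyed hash such as HMAC-SHA256, with the key handled like an encryption key as the slide recommends. The function names and the key handling below are illustrative.

```python
import hashlib
import hmac
import secrets


def reversible_id(name, birth_date):
    """The anti-pattern from the slide: initials followed by the date of birth."""
    initials = "".join(part[0] for part in name.split())
    return initials + birth_date.replace("/", "")


def keyed_pseudonym(identifier, key):
    """HMAC-SHA256 of the identifier; without the key it cannot feasibly be recomputed."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key: generate once, store separately from the released data,
# and protect it with the same controls as encryption keys and passwords.
KEY = secrets.token_bytes(32)

print(reversible_id("John Smith", "05/01/1978"))  # prints JS05011978
pseudonym = keyed_pseudonym("John Smith|05/01/1978", KEY)
print(pseudonym == keyed_pseudonym("John Smith|05/01/1978", KEY))  # prints True
```

Because the same key always yields the same pseudonym, one person's records stay linkable within (and across) releases that share a key; using a different key per release limits cross-release linkage at the cost of that utility.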
  • 38. Questions?
    Matt Lewis: matt.lewis@nccgroup.com
    UK offices: Manchester (Head Office), Cheltenham, Edinburgh, Leatherhead, London, Thame
    North American offices: San Francisco, Atlanta, New York, Seattle
    Australian offices: Sydney
    European offices: Amsterdam (Netherlands), Munich (Germany), Zurich (Switzerland)