Big Data Meets Privacy:
De-identification Maturity Model for Benchmarking and
Improving De-identification Practices
Nathalie Holmes
Khaled El Emam
Workshop Outline
 Big Data: Opportunities and Risks in Healthcare
 De-identification Myths: Fact or Fiction
 Overview of Terms Used in Anonymization
 De-identification Maturity Model (DMM) Case
Studies
 DMM Uses and Benefits
OPPORTUNITIES AND RISKS WITH BIG DATA
How to Successfully Leverage Data
While Protecting Individual Privacy
Big Data Tidal Wave is Creating Unforeseen
Opportunities and Risks
Organizations with the Right Tools
And a Skilled Team will
Come Out on Top
Big Data Opportunities and Risks
 A lot of useful data contains personal information about patients, study
participants, or consumers
 The challenge is getting access to the data – addressing the privacy
requirements:
- Do you have authority?
- Is it mandatory or discretionary?
- Do you have patient / participant consent?
- Can you anonymize the data?
 These are the only ways to get access to the data
Healthcare Breaches
 Best evidence suggests at least 27% of healthcare practices have a
breach every year
 The costs for healthcare are $200 per individual for breach notification
(Ponemon)
 This applies whether you have obtained consent or authority
De-identification is one piece of an enterprise privacy
program that can make privacy work
“Privacy by Design” provides helpful best practices
Proactive, Preventative, Embedded and Continuous
De-Identification Facts or Fiction #1
 True or False:
- It’s possible to re-identify most, if not all, data.
 False:
- Using robust methods, evidence suggests risk
can be very small.
De-Identification Facts or Fiction #2
 True or False:
- Privacy regulations say that there must be zero
chance of re-identification in order for a data set
to be used for secondary purposes.
 False:
- HIPAA states that the risk of re-identification
must be “very small”. The FTC and other
regulations use a “reasonableness” standard. All
of these standards take context into account
De-Identification Facts or Fiction #3
 True or False:
- Only covered entities should consider HIPAA as
a standard for de-identification.
 False:
- HIPAA is a good standard to use regardless of
the applicable regulations.
OVERVIEW OF ANONYMIZATION
PRIVACYANALYTICS.CA
© 2012-2013, Privacy Analytics. All Rights Reserved.
Balancing Data Privacy Requires Evaluation
of Privacy Protection and Data Utility
Balancing Data Privacy
Direct and Indirect (Quasi-) Identifiers
Examples of direct identifiers: Name, address, telephone
number, fax number, MRN, health card number, health plan
beneficiary number, license plate number, email address,
photograph, biometrics, SSN, SIN, implanted device number
Examples of quasi identifiers: sex, date of birth or age,
geographic locations (such as postal codes, census
geography, information about proximity to known or unique
landmarks), language spoken at home, ethnic origin, total
years of schooling, marital status, criminal history, total income,
visible minority status, profession, event dates
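As a rough illustration of the distinction, the sketch below tags the columns of a hypothetical record layout as direct identifiers, quasi-identifiers, or neither. The column names and the two lookup sets are assumptions drawn from the examples above; they are not an exhaustive or authoritative list.

```python
# Hypothetical column names, loosely based on the examples on this slide.
DIRECT_IDENTIFIERS = {
    "name", "address", "telephone", "fax", "mrn", "health_card_number",
    "license_plate", "email", "ssn", "sin",
}
QUASI_IDENTIFIERS = {
    "sex", "date_of_birth", "age", "postal_code", "language",
    "ethnic_origin", "marital_status", "income", "profession", "event_date",
}


def classify_columns(columns):
    """Label each column as 'direct', 'quasi', or 'other'."""
    labels = {}
    for column in columns:
        key = column.strip().lower()
        if key in DIRECT_IDENTIFIERS:
            labels[column] = "direct"
        elif key in QUASI_IDENTIFIERS:
            labels[column] = "quasi"
        else:
            labels[column] = "other"
    return labels


print(classify_columns(["name", "sex", "postal_code", "diagnosis"]))
```

In practice the classification depends on the data set and its context, so a lookup like this would only be a starting point for a manual review.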
Terminology
A process that removes the association between the identifying data and the data subject. (Source: ISO/TS 25237:2008)
Reducing the risk of identifying a data subject to a very small level through the application of a set of data transformation techniques without any concern for the analytics utility of the data.
Removal of fields from a data set
A particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics to the data subject and one or more pseudonyms. (Source: ISO/TS 25237:2008)
Replacing a value in the data with a random value from a large database of possible values
Data Masking
Data Masking = No analytics on those fields
Reducing the risk of identifying a data subject to a very small level through the application of a set of data transformation techniques such that the resulting data retains a very high analytics value.
Reducing the precision of a value to a more general one
The removal of records or values (cells) in the data
Randomly selecting a subset of records or patients from a data set
The motives and capacity of the data recipient to re-identify the data
The security and privacy practices that the data recipient has in place to manage the data received.
Statistical De-identification
De-identification = High analytical value
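Several of the transformations described above can be sketched in a few lines. The example below is a minimal illustration, not the authors' tooling: the function names are hypothetical, and the labels in the comments (generalization, suppression, subsampling, random replacement) are the names commonly given to these techniques rather than terms taken from the slide.

```python
import random


def generalize_age(age, width=10):
    """Generalization: reduce the precision of a value to a broader range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress_cells(record, fields):
    """Suppression: remove the values (cells) of the chosen fields."""
    return {k: (None if k in fields else v) for k, v in record.items()}


def subsample(records, fraction, seed=None):
    """Subsampling: randomly select a subset of the records."""
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    return rng.sample(records, k)


def mask_with_random_value(pool, rng):
    """Masking: replace a value with a random one drawn from a pool of values."""
    return rng.choice(pool)


record = {"name": "Alice Example", "age": 34, "diagnosis": "asthma"}
rng = random.Random(42)
masked = dict(record, name=mask_with_random_value(["Pat Doe", "Sam Roe"], rng))
masked["age"] = generalize_age(masked["age"])
print(masked)
```

The masked name carries no analytic value, while the generalized age band can still be used in analysis, which mirrors the masking versus de-identification contrast on this slide.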
RE-IDENTIFICATION RISKS
Risks from Basic Demographics
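One common way to quantify the risk that basic demographics carry is to group records by their quasi-identifier values and look at the group sizes: a record that is alone in its group is unique on those fields. The sketch below assumes this equivalence-class approach and uses made-up records; it is not necessarily the measurement the presenters use.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """For each record, return how many records share its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]


def risk_summary(records, quasi_identifiers):
    """Summarize risk as 1 / class size per record (a simplifying assumption)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return {
        "max_risk": 1 / min(sizes),
        "average_risk": sum(1 / s for s in sizes) / len(sizes),
        "unique_records": sum(1 for s in sizes if s == 1),
    }


# Hypothetical records with three basic demographic fields.
records = [
    {"sex": "F", "birth_year": 1980, "zip3": "100"},
    {"sex": "F", "birth_year": 1980, "zip3": "100"},
    {"sex": "M", "birth_year": 1975, "zip3": "200"},
    {"sex": "M", "birth_year": 1990, "zip3": "100"},
]
print(risk_summary(records, ["sex", "birth_year", "zip3"]))
```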
DE-IDENTIFICATION MATURITY MODEL
De-identification Maturity Model (DMM)
 Formal framework to evaluate maturity of de-identification services
within an organization
 Gauges level of an organization’s readiness and experience in
relation to people, processes, technologies and consistent
measurement practices
 “DMM” used as a measurement tool; enables the enterprise to
implement a grounded strategy based on facts
 Improves compliance, facilitates access, and scales support services
Three Dimensions of the DMM
[Figure: the three DMM dimensions, labeled A, B, and C]
Practice Dimension
 DMM has five maturity levels for the de-identification practices
that an organization has in place
 Level 1 is lowest level of maturity and level 5 is the highest
level of maturity
Level 1: Ad hoc
Level 2: Masking
Level 3: Heuristic
Level 4: Risk-based
Level 5: Governance
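For readers who want to record DMM assessments in code, the five Practice Dimension levels map naturally onto an enumeration. The sketch below is an assumption about how one might model a profile; the class and field names are hypothetical, and the Implementation and Automation dimensions are kept as plain level numbers because the deck does not name their levels. The sample values are the ones the deck assigns to Organization A in Case Study 1.

```python
from dataclasses import dataclass
from enum import IntEnum


class PracticeLevel(IntEnum):
    """The five Practice Dimension maturity levels, lowest to highest."""
    AD_HOC = 1
    MASKING = 2
    HEURISTIC = 3
    RISK_BASED = 4
    GOVERNANCE = 5


@dataclass(frozen=True)
class DmmProfile:
    """One organization's position on the three DMM dimensions."""
    organization: str
    practice: PracticeLevel
    implementation: int  # level number only; names are not given in the deck
    automation: int      # level number only; names are not given in the deck

    def summary(self) -> str:
        return (
            f"{self.organization}: practice L{int(self.practice)} "
            f"({self.practice.name}), implementation L{self.implementation}, "
            f"automation L{self.automation}"
        )


org_a = DmmProfile("Organization A", PracticeLevel.HEURISTIC, 3, 1)
print(org_a.summary())
```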
Case Study 1 – Safe Harbor
 Organization A is a disease registry
 They connect to many databases and release data frequently to internal and external data analysts
 Practice Dimension (what you do):
- Their primary way of anonymizing data is through following the Safe
Harbor de-identification standard (L3)
 Implementation Dimension (how well you do it):
- There is a clear process with well-defined roles for following SH, which is well documented
- Because it’s documented, it’s repeatable (L3)
Safe Harbor
Safe Harbor Direct Identifiers and Quasi-identifiers
1. Names
2. ZIP Codes (except first three)
3. All elements of dates (except year)
4. Telephone numbers
5. Fax numbers
6. Electronic mail addresses
7. Social security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web Universal Resource Locators (URLs)
15. Internet Protocol (IP) address numbers
16. Biometric identifiers, including finger and voice prints
17. Full face photographic images and any comparable images
18. Any other unique identifying number, characteristic, or code
Actual Knowledge
Case Study 1 – Safe Harbor
 Automation dimension (is it automated)
- They use home-grown scripts to implement SH
- The scripts have had no external validation that they work or are sufficient (L1)
 Challenges
- Despite these efforts, they have missed some key items
- There have been pressures by analysts to provide more granular
data
Case Study 1 – Safe Harbor
- They have interpreted the SH rule for dates as covering only dates of birth, rather than all dates
- They have not truncated all ZIP codes to their first three digits, nor replaced them with 000 for regions with fewer than 20K people, as SH requires
- Some identifiers were missed (such as clinical trial participant numbers)
- They did not consider the Actual Knowledge requirement in SH
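The ZIP code and date gaps listed above are the kind of thing a home-grown script can get wrong. The sketch below shows one way the two rules might be applied, assuming a hypothetical lookup of population by three-digit ZIP prefix (the figures are invented) and treating unknown prefixes as sparsely populated. It covers only these two rules and is not a complete Safe Harbor implementation.

```python
from datetime import date

POPULATION_THRESHOLD = 20_000


def safe_harbor_zip(zip_code, population_by_zip3):
    """Keep the first three ZIP digits only if that area exceeds the threshold."""
    zip3 = zip_code.strip()[:3]
    # Unknown prefixes default to 0 so they are replaced with "000" (assumption).
    if population_by_zip3.get(zip3, 0) > POPULATION_THRESHOLD:
        return zip3
    return "000"


def safe_harbor_dates(record):
    """Reduce every date field to its year, not just the date of birth."""
    return {k: (v.year if isinstance(v, date) else v) for k, v in record.items()}


# Hypothetical population counts per three-digit ZIP prefix.
population_by_zip3 = {"902": 500_000, "036": 12_000}

record = {"dob": date(1980, 5, 17), "admitted": date(2013, 9, 26), "zip": "03601"}
cleaned = safe_harbor_dates(record)
cleaned["zip"] = safe_harbor_zip(record["zip"], population_by_zip3)
print(cleaned)
```

Scanning every field for date values, rather than naming the date of birth column explicitly, is one way to avoid the "dates of birth only" mistake described in this case study.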
Case Study 2 – Masking
 Company B is a claims processor
 They have a need for realistic data for software testing
 Practice Dimension (what you do):
- Their primary way of anonymizing is through data masking
- This means they deal only with the direct identifiers (L2)
 Implementation Dimension (how well you do it):
- There is a clear, well-documented process for masking and for implementing heuristics
- Because it’s documented, it’s repeatable (L3)
Case Study 2 – Masking
 Automation dimension (is it automated)
- They use a commercial product for masking
- This product produces consistent results (L2)
 Challenges
- Despite these efforts, they have missed some key items – the quasi-identifiers
- Some dates and ZIP codes were not addressed
- There is no evidence that the risk of re-identification was “very small”
- The tool vendor architect provided assurance that this was OK
Case Study 3 – Governance
 Company C is an EMR vendor
 They have a need to provide reports to their clients on trends and
benchmarks to help clients to improve their businesses
 Practice Dimension (what you do):
- They have a risk-based approach which includes anonymizing both direct identifiers (masking) and indirect identifiers (de-identification)
 Implementation Dimension (how well you do it):
- There is a clear process for anonymizing the data, which is well documented
- Because it’s documented, it’s repeatable
Case Study 3 – Governance
- They have on-going training of staff on how to do the
anonymization
- They are able to quickly produce reports and metrics
documenting what they did to the data before they released it
- They have automated data sharing agreements that specify the controls data users need to have in place
- They have a full audit trail to demonstrate that the risk of re-identification is “very small” per HIPAA
- They track when there is overlap between the various data sets
- Audits are conducted on data users to confirm compliance with
conditions
Case Study 3 – Governance
 Automation Dimension (is it automated)
- They use commercial software to do masking and de-identification
- The product produces consistent results
- They are able to get defensible anonymization more quickly than
by doing it manually
- The product has been scrutinized by other users & peers and is
upgraded on a regular basis
- They are able to release more data sets, more quickly
Benefits of DMM
 Determine whether an organization can defensibly ensure risk of re-identification is “very small”
 Provides a road map to meet regulatory and legal requirements
 Automation and governance allow organizations to share more data for
secondary purposes with fewer resources
 A higher level of maturity results in higher quality data and greater consistency in de-identification
 Significant improvement in ability to estimate resources and time
required to de-identify data sets
Key Learnings
Data Anonymization Resources
Book Signing:
Sept 26, 10:35 am, Booth #107
Khaled El Emam & Luk Arbuckle
Other Conference Activities
 Session: Facilitating Analytics While Protecting Individual Privacy Using
Data De-identification - Khaled El Emam
- Thursday, September 26 @ 4:00pm, Salon F
 Office hours in the Sponsor Pavilion:
- Nathalie Holmes - Thursday, September 26 @ 3:10pm, Table D
- Khaled El Emam - Thursday, September 26 @ 6:30pm, Table D
Contact
Nathalie Holmes:
nholmes@privacyanalytics.ca
613.369.4313 ext 122
Khaled El Emam:
kelemam@ehealthinformation.ca
613.738.4181
@PrivacyAnalytic
2012 Start-Up Showcase Winner
Review Quiz
 What does anonymization mean?
 What is the difference between data masking and de-identification?
 Why is it important to strive for balance between privacy and data utility?
 How many levels of maturity (Practice Dimension) are there in the DMM?
 Is it possible to be at Practice Dimension level 1 (Ad hoc) and still score well in the Implementation Dimension, for example by having a repeatable, defined, and measurable process?
 What are some advantages of having Standard Automation (software)?
 What is the main difference between Practice Dimension level 4 (Risk Based) and level 5 (Governance)?
Canadian AI 2014 Conference Keynote - Deploying SMC in PracticeKhaled El Emam
 
Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...
Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...
Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...Khaled El Emam
 
Facilitating Analytics while Protecting Privacy
Facilitating Analytics while Protecting PrivacyFacilitating Analytics while Protecting Privacy
Facilitating Analytics while Protecting PrivacyKhaled El Emam
 
Sharing Health Research Data
Sharing Health Research DataSharing Health Research Data
Sharing Health Research DataKhaled El Emam
 
Risk Based De-identification for Sharing Health Data
Risk Based De-identification for Sharing Health DataRisk Based De-identification for Sharing Health Data
Risk Based De-identification for Sharing Health DataKhaled El Emam
 
The De-identification of Clinical Data
The De-identification of Clinical DataThe De-identification of Clinical Data
The De-identification of Clinical DataKhaled El Emam
 
The Adoption of Personal Health Records by Consumers
The Adoption of Personal Health Records by ConsumersThe Adoption of Personal Health Records by Consumers
The Adoption of Personal Health Records by ConsumersKhaled El Emam
 
The Use of EDC in Canadian Clinical Trials
The Use of EDC in Canadian Clinical TrialsThe Use of EDC in Canadian Clinical Trials
The Use of EDC in Canadian Clinical TrialsKhaled El Emam
 

Mais de Khaled El Emam (8)

Canadian AI 2014 Conference Keynote - Deploying SMC in Practice
Canadian AI 2014 Conference Keynote - Deploying SMC in PracticeCanadian AI 2014 Conference Keynote - Deploying SMC in Practice
Canadian AI 2014 Conference Keynote - Deploying SMC in Practice
 
Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...
Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...
Take Two Curves and Call Me in the Morning: The Story of the NSAs Dual_EC_DRB...
 
Facilitating Analytics while Protecting Privacy
Facilitating Analytics while Protecting PrivacyFacilitating Analytics while Protecting Privacy
Facilitating Analytics while Protecting Privacy
 
Sharing Health Research Data
Sharing Health Research DataSharing Health Research Data
Sharing Health Research Data
 
Risk Based De-identification for Sharing Health Data
Risk Based De-identification for Sharing Health DataRisk Based De-identification for Sharing Health Data
Risk Based De-identification for Sharing Health Data
 
The De-identification of Clinical Data
The De-identification of Clinical DataThe De-identification of Clinical Data
The De-identification of Clinical Data
 
The Adoption of Personal Health Records by Consumers
The Adoption of Personal Health Records by ConsumersThe Adoption of Personal Health Records by Consumers
The Adoption of Personal Health Records by Consumers
 
The Use of EDC in Canadian Clinical Trials
The Use of EDC in Canadian Clinical TrialsThe Use of EDC in Canadian Clinical Trials
The Use of EDC in Canadian Clinical Trials
 

Último

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Último (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Big Data Privacy Benchmark

  • 1. Big Data Meets Privacy: De-identification Maturity Model for Benchmarking and Improving De-identification Practices Nathalie Holmes Khaled El Emam
  • 2. Workshop Outline
      - Big Data: Opportunities and Risks in Healthcare
      - De-identification Myths: Fact or Fiction
      - Overview of Terms Used in Anonymization
      - De-identification Maturity Model (DMM) Case Studies
      - DMM Uses and Benefits
  • 3. OPPORTUNITIES AND RISKS WITH BIG DATA How to Successfully Leverage Data While Protecting Individual Privacy
  • 4. Big Data Tidal Wave is Creating Unforeseen Opportunities and Risks
  • 5. Organizations with the Right Tools And a Skilled Team will Come Out on Top
  • 6. Big Data Opportunities and Risks
      - A lot of useful data contains personal information about patients, study participants, or consumers
      - The challenge is getting access to the data, which means addressing the privacy requirements:
          - Do you have authority?
          - Is it mandatory or discretionary?
          - Do you have patient / participant consent?
          - Can you anonymize the data?
      - These are the only ways that you get access to the data
  • 7. Healthcare Breaches
      - Best evidence suggests at least 27% of healthcare practices have a breach every year
      - The costs for healthcare are $200 per individual for breach notification (Ponemon)
      - This applies whether you have obtained consent or authority
  • 8. De-identification is one piece of an enterprise privacy program that can make privacy work. “Privacy by Design” provides helpful best practices: Proactive, Preventative, Embedded and Continuous.
  • 9. De-Identification Facts or Fiction #1
      - True or False: It’s possible to re-identify most, if not all, data.
      - False: Using robust methods, evidence suggests risk can be very small.
  • 10. De-Identification Facts or Fiction #2
      - True or False: Privacy regulations say that there must be zero chance of re-identification in order for a data set to be used for secondary purposes.
      - False: HIPAA states that the risk of re-identification must be “very small”. The FTC and other regulations use a “reasonableness” standard. All of these standards take context into account.
  • 11. De-Identification Facts or Fiction #3
      - True or False: Only covered entities should consider HIPAA as a standard for de-identification.
      - False: HIPAA is a good standard to use regardless of the applicable regulations.
  • 12. OVERVIEW OF ANONYMIZATION How to Successfully Leverage Data While Protecting Individual Privacy
  • 13. Balancing Data Privacy Requires Evaluation of Privacy Protection and Data Utility (PRIVACYANALYTICS.CA © 2012-2013, Privacy Analytics. All Rights Reserved)
  • 15. Direct and Indirect/Quasi-Identifiers
      - Examples of direct identifiers: name, address, telephone number, fax number, MRN, health card number, health plan beneficiary number, license plate number, email address, photograph, biometrics, SSN, SIN, implanted device number
      - Examples of quasi-identifiers: sex, date of birth or age, geographic locations (such as postal codes, census geography, information about proximity to known or unique landmarks), language spoken at home, ethnic origin, total years of schooling, marital status, criminal history, total income, visible minority status, profession, event dates
  • 17. A process that removes the association between the identifying data and the data subject. (Source ISO/TS 25237:2008)
  • 18. Reducing the risk of identifying a data subject to a very small level through the application of a set of data transformation techniques without any concern for the analytics utility of the data.
  • 20. A particular type of anonymization that both removes the association with a data subject and adds an association between a particular set of characteristics relating to the data subject and one or more pseudonyms (Source: ISO/TS 25237:2008)
  • 21. Replacing a value in the data with a random value from a large database of possible values
  • 22. Data Masking: Data Masking = No analytics on those fields
  • 23. Reducing the risk of identifying a data subject to a very small level through the application of a set of data transformation techniques such that the resulting data retains a very high analytics value.
  • 24. Reducing the precision of a value to a more general one
  • 25. The removal of records or values (cells) in the data
  • 26. Randomly selecting a subset of records or patients from a data set
  • 27. The motives and capacity of the data recipient to re-identify the data
  • 28. The security and privacy practices that the data recipient has in place to manage the data received.
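The three transformations defined above (reducing precision, removing values, and randomly selecting records) can be sketched in a few lines of Python. This is only an illustration, not the presenters' tooling: the function names, the dictionary record layout, and the `min_count` threshold are assumptions made for the example.

```python
import random


def generalize_date(date_iso: str) -> str:
    """Reduce a YYYY-MM-DD date to its year (a more general value)."""
    return date_iso[:4]


def suppress_small_cells(records, key, min_count):
    """Remove (set to None) values of `key` shared by fewer than min_count records.

    The threshold is a hypothetical parameter; the slides do not give one.
    """
    counts = {}
    for r in records:
        counts[r[key]] = counts.get(r[key], 0) + 1
    return [
        {**r, key: None} if counts[r[key]] < min_count else dict(r)
        for r in records
    ]


def subsample(records, fraction, seed=0):
    """Randomly select a subset of the records (fixed seed for repeatability)."""
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    return rng.sample(records, k)


# Hypothetical records, for illustration only.
rows = [
    {"dob": "1975-03-14", "postal": "K1A"},
    {"dob": "1980-11-02", "postal": "K1A"},
    {"dob": "1962-07-30", "postal": "M5V"},
    {"dob": "1991-01-09", "postal": "K1A"},
]
generalized = [{**r, "dob": generalize_date(r["dob"])} for r in rows]
suppressed = suppress_small_cells(generalized, "postal", min_count=2)
sample = subsample(suppressed, fraction=0.5)
```

Each step trades some analytic detail for lower re-identification risk, which is the balance between privacy protection and data utility that the workshop describes.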
  • 37. DE-IDENTIFICATION MATURITY MODEL How to Successfully Leverage Data While Protecting Individual Privacy
  • 38. De-identification Maturity Model (DMM)
      - Formal framework to evaluate maturity of de-identification services within an organization
      - Gauges level of an organization’s readiness and experience in relation to people, processes, technologies and consistent measurement practices
      - “DMM” used as a measurement tool; enables the enterprise to implement a grounded strategy based on facts
      - Improves compliance, facilitates access, and scales support services
  • 39. Three Dimensions of the DMM (figure: dimensions labeled A, B and C)
  • 40. Practice Dimension (A)
      - DMM has five maturity levels for the de-identification practices that an organization has in place
      - Level 1 is the lowest level of maturity and level 5 is the highest level of maturity
      - Levels: 1 Ad hoc, 2 Masking, 3 Heuristic, 4 Risk Based, 5 Governance
  • 41. Case Study 1 – Safe Harbor
      - Organization A is a disease registry
      - They have lots of databases that they connect to and they do a lot of data releases to internal and external data analysts
      - Practice Dimension (what you do):
          - Their primary way of anonymizing data is through following the Safe Harbor de-identification standard (L3)
      - Implementation Dimension (how well you do it):
          - There is a clear process and well-defined roles for following SH, which is well documented
          - Because it’s documented, it’s repeatable (L3)
  • 42. Safe Harbor: Direct Identifiers and Quasi-identifiers
      1. Names
      2. ZIP Codes (except first three)
      3. All elements of dates (except year)
      4. Telephone numbers
      5. Fax numbers
      6. Electronic mail addresses
      7. Social security numbers
      8. Medical record numbers
      9. Health plan beneficiary numbers
      10. Account numbers
      11. Certificate/license numbers
      12. Vehicle identifiers and serial numbers, including license plate numbers
      13. Device identifiers and serial numbers
      14. Web Universal Resource Locators (URLs)
      15. Internet Protocol (IP) address numbers
      16. Biometric identifiers, including finger and voice prints
      17. Full face photographic images and any comparable images
      18. Any other unique identifying number, characteristic, or code
      - Actual Knowledge requirement
  • 43. Case Study 1 – Safe Harbor
      - Automation Dimension (is it automated):
          - They use home-grown scripts for implementing SH
          - The scripts do not have any external validation that they work or are sufficient (L1)
      - Challenges:
          - Despite these efforts, they have missed some key items
          - There have been pressures by analysts to provide more granular data
  • 44. Case Study 1 – Safe Harbor
      - They have interpreted the SH regulation for dates such that they have only dealt with dates of birth rather than all dates
      - They have not truncated all ZIP codes to the first three digits, nor replaced them with 000 for regions with fewer than 20K people, as SH requires
      - Some identifiers were missed (such as clinical trial participant numbers)
      - They did not consider the Actual Knowledge requirement in SH
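The two gaps in Case Study 1 (dates other than date of birth, and ZIP code truncation) are the kind of rule a script can apply uniformly. The sketch below is a hedged illustration only, not the registry's scripts: the function names and record layout are hypothetical, and `EXAMPLE_SMALL_PREFIXES` is a placeholder set that would have to be built from current census data. Under the HIPAA Safe Harbor rule, three-digit ZIP prefixes covering 20,000 or fewer people are replaced with 000.

```python
from datetime import date

# Placeholder only: the real set of low-population 3-digit prefixes
# must be derived from current census data.
EXAMPLE_SMALL_PREFIXES = frozenset({"999"})


def safe_harbor_zip(zip_code: str, small_prefixes=EXAMPLE_SMALL_PREFIXES) -> str:
    """Keep only the first three ZIP digits, or 000 for low-population prefixes."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in small_prefixes else prefix


def safe_harbor_date(d: date) -> int:
    """Keep only the year of a date."""
    return d.year


def apply_safe_harbor(record, date_fields, zip_field="zip"):
    """Apply the ZIP and date rules to every listed date field, not only date of birth."""
    out = dict(record)
    for field in date_fields:
        if out.get(field) is not None:
            out[field] = safe_harbor_date(out[field])
    if out.get(zip_field):
        out[zip_field] = safe_harbor_zip(out[zip_field])
    return out


# Hypothetical record, for illustration only.
rec = {"zip": "90210", "dob": date(1980, 5, 17), "admitted": date(2013, 9, 26)}
cleaned = apply_safe_harbor(rec, date_fields=["dob", "admitted"])
```

This covers only two of the eighteen Safe Harbor items. The remaining identifiers and the Actual Knowledge requirement still need their own handling, and a script like this would need external validation to move beyond the L1 automation score described in the case study.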
  • 45. Case Study 2 – Masking
      - Company B is a claims processor
      - They have a need for realistic data for software testing
      - Practice Dimension (what you do):
          - Their primary way of anonymizing is through data masking
          - This means they deal only with the direct identifiers (L2)
      - Implementation Dimension (how well you do it):
          - There is a clear process for doing masking and how they implement heuristics, which is well documented
          - Because it’s documented, it’s repeatable (L3)
  • 46. Case Study 2 – Masking
      - Automation Dimension (is it automated):
          - They use a commercial product for masking
          - This product produces consistent results (L2)
      - Challenges:
          - Despite these efforts, they have missed some key items: the quasi-identifiers
          - Some dates and ZIP codes were not addressed
          - There is no evidence that the risk of re-identification was “very small”
          - The tool vendor architect provided assurance that this was OK
  • 47. Case Study 3 – Governance
      - Company C is an EMR vendor
      - They have a need to provide reports to their clients on trends and benchmarks to help clients improve their businesses
      - Practice Dimension (what you do):
          - They have a risk-based approach which includes anonymizing both direct identifiers (masking) and indirect identifiers (de-identification)
      - Implementation Dimension (how well you do it):
          - There is a clear process for anonymizing the data which is well documented
          - Because it’s documented, it’s repeatable
  • 48. Case Study 3 – Governance
      - They have ongoing training of staff on how to do the anonymization
      - They are able to quickly produce reports and metrics documenting what they did to the data before they released it
      - They have automated data sharing agreements which specify the controls that need to be in place by data users
      - They have a full audit trail to demonstrate that the risk of re-identification is “very small” per HIPAA
      - They track when there is overlap between the various data sets
      - Audits are conducted on data users to confirm compliance with conditions
  • 49. Case Study 3 – Governance
      - Automation Dimension (is it automated):
          - They use commercial software to do masking and de-identification
          - The product produces consistent results
          - They are able to get defensible anonymization more quickly than by doing it manually
          - The product has been scrutinized by other users and peers and is upgraded on a regular basis
          - They are able to release more data sets, more quickly
  • 50. Benefits of DMM
      - Determine whether an organization can defensibly ensure risk of re-identification is “very small”
      - Provides a road map to meet regulatory and legal requirements
      - Automation and governance allow organizations to share more data for secondary purposes with fewer resources
      - A higher level of maturity results in higher quality data and greater consistency in de-identification
      - Significant improvement in ability to estimate resources and time required to de-identify data sets
  • 51. Key Learnings
  • 52. Data Anonymization Resources
      - Book Signing: Sept 26, 10:35 am, Booth #107, Khaled El Emam & Luk Arbuckle
  • 53. Other Conference Activities
      - Session: Facilitating Analytics While Protecting Individual Privacy Using Data De-identification
          - Khaled El Emam: Thursday, September 26 @ 4:00pm, Salon F
      - Office hours in the Sponsor Pavilion:
          - Nathalie Holmes: Thursday, September 26 @ 3:10pm, Table D
          - Khaled El Emam: Thursday, September 26 @ 6:30pm, Table D
  • 54. Contact
      - Nathalie Holmes: nholmes@privacyanalytics.ca, 613.369.4313 ext 122
      - Khaled El Emam: kelemam@ehealthinformation.ca, 613.738.4181
      - @PrivacyAnalytic
      - 2012 Start-Up Showcase Winner
  • 55. Review Quiz
      - What does anonymization mean?
      - What is the difference between data masking and de-identification?
      - Why is it important to strive for balance between privacy and data utility?
      - How many levels of maturity (Practice Dimension) are there in the DMM?
      - Is it possible to be at Practice Dimension 1 (Ad hoc) and score well in the Implementation Dimension, for example by having a repeatable, defined and measurable process?
      - What are some advantages of having Standard Automation (software)?
      - What is the main difference between Practice Dimension 4 (Risk Based) and Dimension 5 (Governance)?