Technische Universität München
Fabian Prasser, Raffael Bild, Klaus A. Kuhn
Chair for Medical Informatics
Institute for Medical Statistics and Epidemiology
Technical University of Munich (TUM)
A Generic Method for Assessing the Quality of De-Identified Health Data
Health – Exploring Complexity (HEC 2016) / Medical Informatics Europe (MIE 2016), 19.08.2016
Background
Motivation: legal requirements
● Secondary use of health care data for research
● Data sharing in cooperative research
Goal: privacy protection
● Ensure that recipients cannot learn the identity of data subjects
● Re-identification can have severe legal consequences
Basis: make the recipient as trustworthy as possible
● Sign data use agreements and obtain approval by data access committees
● Implement multiple layers of access to create controlled environments
Residual risks: addressed by data de-identification (also called data anonymization)
● Step 1: Remove directly identifying data (e.g. names, insurance numbers)
● Step 2: Modify data to reduce the uniqueness of potentially identifying attribute values (e.g. date of birth, sex, zip code)
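The two de-identification steps can be sketched in a few lines of Python. This is a minimal illustration, not ARX: the records, the column names (`name`, `insurance_no`, `birth_date`, `sex`, `zip`) and the coarsening rule (birth year only, first three zip digits) are hypothetical choices. Uniqueness is measured here as the number of records whose combination of potentially identifying values occurs only once.

```python
from collections import Counter

# Hypothetical toy records; all names and values are made up for illustration.
RECORDS = [
    {"name": "Alice", "insurance_no": "A-1", "birth_date": "1980-03-14", "sex": "f", "zip": "81675"},
    {"name": "Bob", "insurance_no": "B-2", "birth_date": "1980-07-02", "sex": "m", "zip": "81679"},
    {"name": "Carol", "insurance_no": "C-3", "birth_date": "1980-11-30", "sex": "f", "zip": "81677"},
    {"name": "Dave", "insurance_no": "D-4", "birth_date": "1980-01-09", "sex": "m", "zip": "81671"},
]
DIRECT_IDENTIFIERS = {"name", "insurance_no"}
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")


def remove_direct_identifiers(rows, identifiers):
    """Step 1: drop attributes that identify a person on their own."""
    return [{key: value for key, value in row.items() if key not in identifiers} for row in rows]


def coarsen(row):
    """Step 2 (one possible rule): keep only the birth year and the first three zip digits."""
    out = dict(row)
    out["birth_date"] = row["birth_date"][:4]
    out["zip"] = row["zip"][:3] + "**"
    return out


def count_unique(rows, qis):
    """Number of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(row[q] for q in qis) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1)


step1 = remove_direct_identifiers(RECORDS, DIRECT_IDENTIFIERS)
step2 = [coarsen(row) for row in step1]
print(count_unique(step1, QUASI_IDENTIFIERS), count_unique(step2, QUASI_IDENTIFIERS))  # prints: 4 0
```

After step 1 every record is still unique on date of birth, sex and zip code; the coarsening in step 2 leaves each combination shared by two records.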
Example
Reduction of the uniqueness of potentially identifying values
[Figure panels: Generalization, Suppression, Micro-aggregation]
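The three transformations named in the example can be illustrated on a single numeric attribute. The sketch below is hypothetical: the ages, the 10-year intervals, the rule of suppressing values that occur fewer than `k` times, and the fixed-size grouping used for micro-aggregation are illustrative assumptions, not the specific methods implemented in ARX.

```python
from collections import Counter
from statistics import mean


def generalize(age, width=10):
    """Generalization: replace an age by the interval of the given width that contains it."""
    low = (age // width) * width
    return f"[{low}-{low + width - 1}]"


def suppress_rare(values, k):
    """Suppression: replace every value that occurs fewer than k times by '*'."""
    counts = Counter(values)
    return ["*" if counts[v] < k else v for v in values]


def micro_aggregate(values, k):
    """Micro-aggregation: sort, cut into groups of k consecutive values (the last group
    absorbs any remainder) and replace every value by the mean of its group."""
    if not values:
        return []
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = list(values)
    n_groups = max(1, len(values) // k)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
    return result


ages = [34, 36, 45, 47, 52, 91]
generalized = [generalize(a) for a in ages]
print(generalized)                    # ['[30-39]', '[30-39]', '[40-49]', '[40-49]', '[50-59]', '[90-99]']
print(suppress_rare(generalized, 2))  # ['[30-39]', '[30-39]', '[40-49]', '[40-49]', '*', '*']
print(micro_aggregate(ages, 2))
```

Combining generalization with suppression of the remaining rare values, as in the second line, is one common way to avoid generalizing the whole attribute further because of a few outliers.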
Challenge
Trade-off: privacy risks vs. quality of data
[Figure: data quality vs. privacy risk, ranging from "No data: no risk" to "Original data: highest risk", with a region labeled "Potential solutions"]
Models are needed for measuring both aspects
● Privacy: k-anonymity, k-map, strict average risk, population uniqueness
● Quality: loss of information (e.g. granularity), changes in statistical properties (e.g. central tendency, dispersion, shape of distributions), data utility (e.g. classification accuracy)
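As an illustration of the privacy side of the trade-off, the following sketch checks k-anonymity: every combination of potentially identifying values must be shared by at least k records. It also computes a simple average re-identification risk, defined here (as an assumption, not necessarily the exact "strict average risk" model) as the mean over all records of 1 / (size of the record's group), which equals the number of groups divided by the number of records. Rows and attribute names are hypothetical.

```python
from collections import Counter


def group_sizes(rows, qis):
    """Size of each group of records sharing the same quasi-identifier values."""
    return Counter(tuple(row[q] for q in qis) for row in rows)


def is_k_anonymous(rows, qis, k):
    """True if every group contains at least k records (rows must be non-empty)."""
    return min(group_sizes(rows, qis).values()) >= k


def average_risk(rows, qis):
    """Mean of 1/|group| over all records, i.e. number of groups / number of records."""
    return len(group_sizes(rows, qis)) / len(rows)


QIS = ("age", "sex")
rows = (
    [{"age": "30-39", "sex": "f"}] * 2
    + [{"age": "30-39", "sex": "m"}] * 3
    + [{"age": "40-49", "sex": "f"}]
)
print(is_k_anonymous(rows, QIS, 2), average_risk(rows, QIS))  # prints: False 0.5
```

The single record in the last group violates 2-anonymity; generalizing or suppressing it lowers the risk at the cost of data quality, which is exactly the trade-off on the slide.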
Data transformation: attribute generalization
Recommended for health data: generalization hierarchies
Examples
[Figure panels: Input data, Global recoding, Local recoding]
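A generalization hierarchy lists, for each value, the increasingly coarse replacements available at each level. The sketch below builds a hypothetical four-level hierarchy for age (exact value, 5-year band, 10-year band, fully suppressed `*`); the band widths are illustrative assumptions. The final check verifies the property that makes it a hierarchy: each value at one level determines a single value at the next level.

```python
def band(age, width):
    """Interval of the given width that contains age, e.g. band(37, 5) == '35-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def hierarchy_row(age):
    """Generalization levels 0..3 for one age value."""
    return [str(age), band(age, 5), band(age, 10), "*"]


def is_hierarchy(values, levels=4):
    """Check that every value at level l maps to exactly one value at level l + 1."""
    rows = [hierarchy_row(v) for v in values]
    for level in range(levels - 1):
        parent = {}
        for row in rows:
            if parent.setdefault(row[level], row[level + 1]) != row[level + 1]:
                return False
    return True


print(hierarchy_row(37))         # ['37', '35-39', '30-39', '*']
print(is_hierarchy(range(100)))  # True
```

The 5-year bands nest inside the 10-year bands because both start at multiples of their width; bands that do not nest would break the check.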
Data transformation: global recoding
Identical input values are mapped to identical generalized values
Examples
[Figure panels: Input data, Global recoding, Local recoding]
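Global recoding applies one generalization level to all values of an attribute, so identical inputs always yield identical outputs. The sketch below, using a hypothetical age hierarchy (exact value, 5-year band, 10-year band, `*`), searches for the lowest level at which every output value occurs at least `k` times; the ages and `k` are illustrative.

```python
from collections import Counter


def hierarchy_row(age):
    """Hypothetical age hierarchy: exact value, 5-year band, 10-year band, '*'."""
    def band(width):
        low = (age // width) * width
        return f"{low}-{low + width - 1}"
    return [str(age), band(5), band(10), "*"]


def global_recode(ages, level):
    """Generalize every value of the attribute to the same hierarchy level."""
    return [hierarchy_row(a)[level] for a in ages]


def lowest_level(ages, k, levels=4):
    """Lowest level at which every generalized value occurs at least k times."""
    for level in range(levels):
        if min(Counter(global_recode(ages, level)).values()) >= k:
            return level
    raise ValueError("even full suppression leaves fewer than k records")


ages = [34, 34, 37, 38, 42, 47]
level = lowest_level(ages, k=2)
print(level, global_recode(ages, level))
# prints: 2 ['30-39', '30-39', '30-39', '30-39', '40-49', '40-49']
```

At level 1 the values 42 and 47 land in different 5-year bands and stay unique, so every record has to be generalized to a 10-year band.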
Data transformation: local recoding
Identical values may be generalized to different levels
Examples
More flexible: can preserve more information content
[Figure panels: Input data, Global recoding, Local recoding]
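Local recoding assigns a generalization level per record. For the ages below, global recoding would have to generalize every value to a 10-year band before each output value occurs at least twice. The hand-picked per-record levels in this sketch (an illustrative assumption, not the output of any optimization algorithm) reach the same condition while keeping four of the six records more detailed. The hierarchy is the same hypothetical one as before: exact value, 5-year band, 10-year band, `*`.

```python
from collections import Counter


def hierarchy_row(age):
    """Hypothetical age hierarchy: exact value, 5-year band, 10-year band, '*'."""
    def band(width):
        low = (age // width) * width
        return f"{low}-{low + width - 1}"
    return [str(age), band(5), band(10), "*"]


def local_recode(ages, levels):
    """Generalize each record's value to its own hierarchy level."""
    if len(ages) != len(levels):
        raise ValueError("exactly one level per record is required")
    return [hierarchy_row(age)[level] for age, level in zip(ages, levels)]


ages = [34, 34, 37, 38, 42, 47]
levels = [0, 0, 1, 1, 2, 2]  # chosen by hand for illustration
output = local_recode(ages, levels)
print(output)                         # ['34', '34', '35-39', '35-39', '40-49', '40-49']
print(min(Counter(output).values()))  # 2

# Identical input values may end up at different levels:
print(local_recode([34, 34], [0, 2]))  # ['34', '30-39']
```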
Non-Uniform Entropy
Well-known model for measuring information loss
● Developed by the statistical disclosure control community
● A. De Waal and L. Willenborg, Information loss through global recoding and local suppression, Netherlands Official Statistics 14 (1999), 17–20.
Often used for de-identifying health data
● Recommended in several guidelines and used in published studies
Based on the concept of mutual information
● Quantifies the amount of information that can be obtained about one variable by observing another
Application to data anonymization
● Measure loss of information by comparing the input data with the transformed output data
Can only be used with global recoding (details: see paper)
● We have developed a generic variant that supports local recoding (generalization, record suppression, cell suppression)
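The slide does not give the formula. A commonly used formulation of Non-Uniform Entropy for one attribute, assumed here, sums over all records the term -log2(f(x) / f(y)), where x is the record's input value, y its output value, f(x) the number of input records with value x and f(y) the number of output records with value y. Under global recoding y is a function of x, so the sum equals n·(H(X) − H(Y)), i.e. n times the conditional entropy H(X | Y) = H(X) − I(X; Y); this is the link to mutual information. The data are hypothetical. The last line shows one easily checked consequence of applying this formula naively to local recoding (not necessarily the issue analyzed in the paper): terms can become negative, which reads as an information gain.

```python
import math
from collections import Counter


def non_uniform_entropy(inputs, outputs):
    """Sum over records of -log2(f(x) / f(y)) for one attribute, where f counts
    how often a value occurs in the input or output column respectively."""
    if len(inputs) != len(outputs):
        raise ValueError("input and output must have the same number of records")
    f_in, f_out = Counter(inputs), Counter(outputs)
    return sum(-math.log2(f_in[x] / f_out[y]) for x, y in zip(inputs, outputs))


def entropy(values):
    """Empirical Shannon entropy in bits."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


ages = ["34", "35", "42", "42"]
bands = ["30-39", "30-39", "40-49", "40-49"]          # global recoding
print(non_uniform_entropy(ages, bands))                # 2.0
print(len(ages) * (entropy(ages) - entropy(bands)))    # 2.0
print(non_uniform_entropy(ages, ["*"] * 4))            # 6.0

# Local recoding: identical inputs map to different outputs.
print(non_uniform_entropy(["34", "34"], ["34", "30-39"]))  # -2.0
```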
Generic Non-Uniform Entropy
Basic idea: model local recoding as iterative global recoding
This can be done for every local recoding scheme
[Figure: attribute "Age" from input to output, via global recoding to level 0, level 1 and level 2]
Each step is a global recoding, so Non-Uniform Entropy can be used to calculate $\Delta_{0,1}$ and $\Delta_{1,2}$!
Generic Non-Uniform Entropy
Basic idea: model local recoding as iterative global recoding
Result: $\Delta' = \Delta_{0,1} + \Delta_{1,2}$
[Figure: attribute "Age" from input to output, via global recoding to level 0, level 1 and level 2, each step measured with Non-Uniform Entropy]
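Summing step-wise losses into Δ' relies on those losses adding up. The slides do not spell out how the generic variant handles records that stop at different levels (see the paper), so this sketch only checks the additivity in the plain global-recoding case. It assumes the usual Non-Uniform Entropy formula and a hypothetical age hierarchy of 5-year and 10-year bands: because the per-record terms telescope, the loss from level 0 to 1 plus the loss from level 1 to 2 equals the loss measured directly from level 0 to 2.

```python
import math
from collections import Counter


def non_uniform_entropy(before, after):
    """Sum over records of -log2(f(x) / f(y)) between two versions of one column."""
    f_before, f_after = Counter(before), Counter(after)
    return sum(-math.log2(f_before[x] / f_after[y]) for x, y in zip(before, after))


def band(age, width):
    """Hypothetical generalization: interval of the given width containing age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


ages = [31, 33, 36, 42, 44, 44]
level0 = [str(a) for a in ages]
level1 = [band(a, 5) for a in ages]
level2 = [band(a, 10) for a in ages]

delta_01 = non_uniform_entropy(level0, level1)
delta_12 = non_uniform_entropy(level1, level2)
direct = non_uniform_entropy(level0, level2)

# Per record, -log2(f0/f1) - log2(f1/f2) = -log2(f0/f2), so the steps add up.
print(math.isclose(delta_01 + delta_12, direct))  # True
```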
Experiments
Two datasets
● Extract of the 1994 US census database: 30,162 records
● Health interview series: US survey with 1,193,504 participants
Transformation scheme
● Initially: global recoding with generalization
● Generalization schemes: original, low, medium, high
● Followed by: local recoding with record suppression
● Iterative removal of records (10%, 20%, …, 100%)
Measured information loss with two models
● Non-Uniform Entropy
● Our generic variant
Expected outcome
● Initially: loss of information caused by generalization
● Followed by: a linear increase in information loss with the number of removed records
Results
Both models measured the same initial loss of information
Only our model captured the linear increase
→ Non-Uniform Entropy instead measured an information gain, followed by a decrease
ARX – An anonymization tool for biomedical data
The method described here has been implemented in ARX
● Oriented towards guidelines for health data de-identification
● Supports a wide variety of approaches to data de-identification
● Requires the development of generic methods
Highly scalable
● Millions of records with up to 50 potentially identifying attributes
Mentioned in several data protection guidelines
● European Medicines Agency (EMA): External Guidance on the Implementation of the European Medicines Agency Policy on the Publication of Clinical Data for Medicinal Products for Human Use (2016)
● EU Agency for Network and Information Security (ENISA): Privacy and Data Protection by Design (2014)
ARX is open source software
● Website: http://arx.deidentifier.org
● Email: fabian.prasser@tum.de
Thank you for your attention!
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 

Último (20)

英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 

ARX - A Generic Method for Assessing the Quality of De-Identified Health Data

  • 1. Technische Universität München
    A Generic Method for Assessing the Quality of De-Identified Health Data
    Fabian Prasser, Raffael Bild, Klaus A. Kuhn
    Chair for Medical Informatics, Institute for Medical Statistics and Epidemiology, Technical University of Munich (TUM)
  • 2. Background
    Fabian Prasser, Raffael Bild, Klaus A. Kuhn: A Generic Method for Assessing the Quality of De-Identified Health Data. Health – Exploring Complexity (HEC 2016) / Medical Informatics Europe (MIE 2016), 19.08.2016
    Motivation: legal requirements
    ● Secondary use of health care data for research
    ● Data sharing in cooperative research
    Goal: privacy protection
    ● Ensure that recipients cannot learn the identity of data subjects
    ● Re-identification can have severe legal consequences
    Basis: make sure that the recipient is as trustworthy as possible
    ● Sign data use agreements; obtain approval from data access committees
    ● Implement multiple layers of access to create controlled environments
    Residual risks: addressed by data de-identification (also called data anonymization)
    ● Step 1: Remove directly identifying data (e.g. names, insurance numbers)
    ● Step 2: Modify data to reduce the uniqueness of potentially identifying attribute values (e.g. date of birth, sex, ZIP code)
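Step 2 can be illustrated with a small Python sketch. It assumes a hypothetical toy table in which step 1 has already removed direct identifiers, leaving date of birth, sex and ZIP code. The coarsening rule (keep the birth year and the first three ZIP digits) is likewise an assumption chosen only to show how generalizing values reduces the number of records with a unique combination of attribute values.

```python
from collections import Counter

# Hypothetical toy records after step 1: (date of birth, sex, ZIP code).
records = [
    ("1980-04-12", "female", "81675"),
    ("1980-09-03", "female", "81679"),
    ("1975-11-02", "male", "81675"),
    ("1975-02-17", "male", "81671"),
]


def unique_records(rows):
    """Return the rows whose combination of attribute values occurs exactly once."""
    counts = Counter(rows)
    return [row for row in rows if counts[row] == 1]


def coarsen(row):
    """Illustrative coarsening: keep the birth year and the first three ZIP digits."""
    dob, sex, zip_code = row
    return (dob[:4], sex, zip_code[:3] + "**")


print(len(unique_records(records)))                        # 4
print(len(unique_records([coarsen(r) for r in records])))  # 0
```

Every original record is unique; after coarsening, each combination is shared by two records.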
  • 3. Example
    Reduction of the uniqueness of potentially identifying values
    [Figure: example transformations using generalization, suppression and micro-aggregation]
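The three transformations named on this slide can be sketched on a single numeric attribute. This is a minimal illustration with hypothetical ages. The interval width and group size are assumptions, and the micro-aggregation shown is a simple univariate fixed-size variant, not a multivariate method such as MDAV.

```python
from statistics import mean

ages = [23, 25, 31, 34, 36, 58]  # hypothetical input values


def generalize(age, width=10):
    """Generalization: replace an age by the interval of the given width containing it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def suppress(age):
    """Suppression: replace the value by a placeholder."""
    return "*"


def microaggregate(values, k):
    """Micro-aggregation (univariate sketch): sort the values, split them into
    consecutive groups of k (the last group absorbs any remainder) and replace
    each value by the mean of its group. Assumes len(values) >= k."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    n_groups = len(values) // k
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        group = order[start:end]
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
    return result


print([generalize(a) for a in ages])  # ['20-29', '20-29', '30-39', '30-39', '30-39', '50-59']
print([suppress(a) for a in ages])
print(microaggregate(ages, 2))        # [24, 24, 32.5, 32.5, 47, 47]
```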
  • 4. Challenge
    Trade-off: privacy risks vs. quality of data
    Models are needed for measuring both aspects
    ● Privacy: k-anonymity, k-map, strict average risk, population uniqueness
    ● Quality: loss of information (e.g. granularity), changes in statistical properties (e.g. central tendency, dispersion, shape of distributions), data utility (e.g. for classification)
    [Figure: data quality plotted against privacy risk; the original data has the highest risk, releasing no data has no risk, and potential solutions lie between these extremes]
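Of the privacy models listed, k-anonymity is the simplest to state: every combination of values of the potentially identifying attributes must be shared by at least k records. The sketch below checks this property and estimates the highest re-identification risk as 1 divided by the size of the smallest group, a common simplification. The records are hypothetical.

```python
from collections import Counter


def class_sizes(rows):
    """Group records by their combination of potentially identifying values."""
    return Counter(rows)


def is_k_anonymous(rows, k):
    """True if every combination of values occurs in at least k records."""
    return all(size >= k for size in class_sizes(rows).values())


def highest_risk(rows):
    """Highest re-identification risk, estimated as 1 / size of the smallest group."""
    return 1 / min(class_sizes(rows).values())


# Hypothetical records: (age, sex, ZIP code) before and after generalization.
original = [("34", "female", "81675"), ("36", "female", "81679"),
            ("51", "male", "80331"), ("57", "male", "80333")]
generalized = [("30-39", "female", "816**"), ("30-39", "female", "816**"),
               ("50-59", "male", "803**"), ("50-59", "male", "803**")]

print(is_k_anonymous(original, 2), highest_risk(original))        # False 1.0
print(is_k_anonymous(generalized, 2), highest_risk(generalized))  # True 0.5
```

The generalized table halves the estimated risk but keeps only coarser values, which is the trade-off on this slide.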
  • 5. Data transformation: attribute generalization
    Recommended for health data: generalization hierarchies
    [Figure: examples showing input data, global recoding and local recoding]
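A generalization hierarchy can be stored as a table with one row per input value, where position l holds the value at generalization level l. The sketch below builds a hypothetical four-level hierarchy for age (exact value, 10-year interval, below or above 50, fully suppressed "*"). It also checks that the hierarchy is consistent, meaning every value at one level generalizes to exactly one value at the next level. The levels are an assumption for illustration, not taken from the paper.

```python
def build_age_hierarchy(ages):
    """Map each distinct age to its values at levels 0 (exact) to 3 (suppressed)."""
    hierarchy = {}
    for age in sorted(set(ages)):
        low = (age // 10) * 10
        hierarchy[age] = [str(age), f"{low}-{low + 9}", "<50" if age < 50 else ">=50", "*"]
    return hierarchy


def is_consistent(hierarchy):
    """Check that each value at level l has exactly one parent at level l + 1."""
    parent = {}
    for row in hierarchy.values():
        for level in range(len(row) - 1):
            expected = parent.setdefault((level, row[level]), row[level + 1])
            if expected != row[level + 1]:
                return False
    return True


hierarchy = build_age_hierarchy([34, 36, 51, 57])
print(hierarchy[34])             # ['34', '30-39', '<50', '*']
print(is_consistent(hierarchy))  # True
```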
  • 6. Data transformation: global recoding
    Identical input values are mapped to identical generalized values
    [Figure: examples showing input data, global recoding and local recoding]
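Global recoding applies one generalization level to all records of an attribute, so equal input values always receive equal output values. This is a minimal sketch. It assumes a hierarchy stored as a dictionary from each input value to its list of generalized values, and the hierarchy itself is hypothetical.

```python
# Hypothetical age hierarchy: level 0 = exact, 1 = 10-year interval, 2 = suppressed.
HIERARCHY = {
    34: ["34", "30-39", "*"],
    36: ["36", "30-39", "*"],
    51: ["51", "50-59", "*"],
}


def global_recode(values, hierarchy, level):
    """Generalize every value of the attribute to the same hierarchy level."""
    return [hierarchy[value][level] for value in values]


ages = [34, 34, 36, 51]
print(global_recode(ages, HIERARCHY, 1))  # ['30-39', '30-39', '30-39', '50-59']
```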
  • 7. Data transformation: local recoding
    Identical values may be generalized to different levels
    More flexible: can preserve more information content
    [Figure: examples showing input data, global recoding and local recoding]
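With local recoding, the level is chosen per record instead of per attribute. The sketch below assumes a hypothetical hierarchy stored as a dictionary from input value to its generalized values. The two records with age 34 end up at different levels, and generalizing a record to the top level "*" corresponds to suppressing that cell.

```python
# Hypothetical age hierarchy: level 0 = exact, 1 = 10-year interval, 2 = suppressed.
HIERARCHY = {
    34: ["34", "30-39", "*"],
    36: ["36", "30-39", "*"],
    51: ["51", "50-59", "*"],
}


def local_recode(values, hierarchy, levels):
    """Generalize each value to its own level; levels[i] applies to record i."""
    if len(values) != len(levels):
        raise ValueError("one level per record is required")
    return [hierarchy[value][level] for value, level in zip(values, levels)]


ages = [34, 34, 36, 51]
print(local_recode(ages, HIERARCHY, [0, 1, 1, 2]))  # ['34', '30-39', '30-39', '*']
```

Global recoding is the special case in which all entries of `levels` are equal.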
  • 8. Non-Uniform Entropy
    Well-known model for measuring information loss
    ● Developed by the statistical disclosure control community
    ● A. De Waal and L. Willenborg, Information loss through global recoding and local suppression, Netherlands Official Statistics 14 (1999), 17–20.
    Often used for de-identifying health data
    ● Recommended in several guidelines and used in published papers
    Based on the concept of mutual information
    ● Quantifies the amount of information that can be obtained about one variable by observing another
    Application to data anonymization
    ● Measures loss of information by comparing the input data with the transformed output data
    Can only be used with global recoding (details: see paper)
    ● We have developed a generic variant that also supports local recoding (generalization, record suppression, cell suppression)
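A minimal Python sketch of Non-Uniform Entropy for a single attribute follows. It assumes the per-record formulation in which each record contributes -log2(n_in(x) / n_out(y)), where x is its original value, y its transformed value, n_in counts values in the input and n_out counts values in the output, summed over all records. Under global recoding this ratio is the conditional probability of the original value given the generalized one, so every term is non-negative. The last usage line shows, with hypothetical data, how applying the same formula to locally recoded data can yield a negative value (an apparent information gain). This is one way to see why the measure is restricted to global recoding.

```python
from collections import Counter
from math import log2


def non_uniform_entropy(original, transformed):
    """Sum over records of -log2(n_in(x) / n_out(y)) for one attribute.

    original[i] is record i's input value, transformed[i] its output value.
    """
    n_in = Counter(original)
    n_out = Counter(transformed)
    return sum(-log2(n_in[x] / n_out[y]) for x, y in zip(original, transformed))


ages = ["34", "36", "51", "57"]  # hypothetical input values

# Global recoding to 10-year intervals: each record loses 1 bit.
print(non_uniform_entropy(ages, ["30-39", "30-39", "50-59", "50-59"]))  # 4.0

# Global recoding to "*": each record loses 2 bits.
print(non_uniform_entropy(ages, ["*"] * 4))  # 8.0

# Local recoding of two identical values to different levels: the formula
# now reports a negative loss, i.e. an apparent information gain.
print(non_uniform_entropy(["34", "34"], ["34", "30-39"]))  # -2.0
```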
  • 9. Generic Non-Uniform Entropy
    Basic idea: model local recoding as iterative global recoding
    [Figure: age input and age output, transformed step by step by global recoding to level 0, level 1 and level 2]
    Each step is a global recoding, so Non-Uniform Entropy can be used to calculate $\Delta_{0,1}$ and $\Delta_{1,2}$
    This can be done for every local recoding scheme
  • 10. Generic Non-Uniform Entropy
    Basic idea: model local recoding as iterative global recoding
    Result: $\Delta' = \Delta_{0,1} + \Delta_{1,2}$
    [Figure: age input and age output, transformed step by step by global recoding to level 0, level 1 and level 2]
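The iterative idea can be sketched as follows. This is one possible reading of the slides, not the authors' implementation. For every step from level l to level l + 1, only the records whose target level exceeds l are generalized; this is a global recoding of that subset, so the Non-Uniform Entropy formula can be applied to it, and the step results are summed. Record suppression is modeled as generalization to the top level "*". The hierarchy, the restriction of the counts to each step's subset, and the function names are assumptions. One property worth checking: if all records share the same level, the per-record ratios telescope, and the sum equals classic Non-Uniform Entropy.

```python
from collections import Counter
from math import log2


def nue(before, after):
    """Non-Uniform Entropy of a global recoding from `before` to `after`."""
    n_in = Counter(before)
    n_out = Counter(after)
    return sum(-log2(n_in[x] / n_out[y]) for x, y in zip(before, after))


def generic_nue(values, hierarchy, levels):
    """Model local recoding as iterative global recoding and sum the steps.

    levels[i] is the target level of record i. Step l -> l + 1 generalizes
    only the records with a target level above l.
    """
    total = 0.0
    for step in range(max(levels, default=0)):
        subset = [v for v, lvl in zip(values, levels) if lvl > step]
        before = [hierarchy[v][step] for v in subset]
        after = [hierarchy[v][step + 1] for v in subset]
        total += nue(before, after)  # Delta_{step, step + 1}
    return total


# Hypothetical hierarchy: exact age, 10-year interval, suppressed.
HIERARCHY = {
    34: ["34", "30-39", "*"],
    36: ["36", "30-39", "*"],
    51: ["51", "50-59", "*"],
    57: ["57", "50-59", "*"],
}
ages = [34, 36, 51, 57]

# Same level for every record: matches classic NUE (8 bits here).
print(generic_nue(ages, HIERARCHY, [2, 2, 2, 2]))  # 8.0

# Local recoding: 34 kept, 51 generalized to its interval, 36 and 57 suppressed.
print(generic_nue(ages, HIERARCHY, [0, 2, 1, 2]))  # 4.0
```

Because each step is a global recoding of its subset under a consistent hierarchy, every step contributes a non-negative amount.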
  • 12. Experiments
    Two datasets
    ● Extract of the 1994 US census database: 30,162 records
    ● Health interview series: US survey with 1,193,504 participants
    Transformation scheme
    ● Initially: global recoding with generalization
    ● Schemes: original, low, medium, high
    ● Followed by: local recoding with record suppression
    ● Iterative removal of records (10%, 20%, …, 100%)
    Information loss measured with two models
    ● Non-Uniform Entropy
    ● Our generic variant
    Expected outcome
    ● Initially: loss of information via generalization
    ● Followed by: linear increase of information loss with the number of removed records
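The experimental procedure can be sketched on a small synthetic attribute. This is a hypothetical setup: it uses neither the census nor the health interview data, and it is not the authors' code. All records are first globally recoded to one level; then an increasing share of records (10%, 20%, …, 100%) is suppressed in a fixed order; finally information loss is computed with both measures. As assumptions, suppressed records appear as "*" in the output for classic Non-Uniform Entropy, and the iterative variant models them as generalized to the top hierarchy level. The sketch only shows how such a series could be produced and does not reproduce any results from the paper.

```python
from collections import Counter
from math import log2


def nue(before, after):
    """Non-Uniform Entropy: sum over records of -log2(n_in(x) / n_out(y))."""
    n_in, n_out = Counter(before), Counter(after)
    return sum(-log2(n_in[x] / n_out[y]) for x, y in zip(before, after))


def generic_nue(values, hierarchy, levels):
    """Iterative global recoding: step l -> l + 1 is applied to the records
    whose target level exceeds l, and the step results are summed."""
    total = 0.0
    for step in range(max(levels, default=0)):
        subset = [v for v, lvl in zip(values, levels) if lvl > step]
        total += nue([hierarchy[v][step] for v in subset],
                     [hierarchy[v][step + 1] for v in subset])
    return total


def run_experiment(ages, hierarchy, generalization_level=1, top_level=2):
    """Globally recode all records, then suppress 0%, 10%, ..., 100% of them."""
    series = []
    for percent in range(0, 101, 10):
        n_suppressed = len(ages) * percent // 100
        levels = [top_level if i < n_suppressed else generalization_level
                  for i in range(len(ages))]
        output = [hierarchy[a][lvl] for a, lvl in zip(ages, levels)]
        classic = nue([hierarchy[a][0] for a in ages], output)
        generic = generic_nue(ages, hierarchy, levels)
        series.append((percent, classic, generic))
    return series


# Synthetic ages (hypothetical) and a hierarchy: exact, 10-year interval, "*".
ages = [20 + (7 * i) % 60 for i in range(100)]
hierarchy = {a: [str(a), f"{a // 10 * 10}-{a // 10 * 10 + 9}", "*"] for a in set(ages)}

for percent, classic, generic in run_experiment(ages, hierarchy):
    print(f"{percent:3d}% suppressed  classic={classic:8.2f}  generic={generic:8.2f}")
```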
  • 13. Results
    Both models measured the same initial loss of information
    Only our model captured the linear increase
    → Non-Uniform Entropy measured an information gain followed by a decrease
  • 14. ARX – An anonymization tool for biomedical data
    The method described here has been implemented in ARX
    ● Oriented towards guidelines for health data de-identification
    ● Supports a wide variety of approaches to data de-identification
    ● Requires the development of generic methods
    Highly scalable
    ● Millions of records with up to 50 potentially identifying attributes
    Mentioned in several data protection guidelines
    ● European Medicines Agency (EMA): External Guidance on the Implementation of the European Medicines Agency Policy on the Publication of Clinical Data for Medicinal Products for Human Use (2016)
    ● EU Agency for Network and Information Security (ENISA): Privacy and Data Protection by Design (2014)
    ARX is open source software
    ● Website: http://arx.deidentifier.org
    ● Email: fabian.prasser@tum.de
  • 15. Thank you for your attention!