Amnesia
Data anonymization made easy
https://amnesia.openaire.eu
Manolis Terrovitis
mter@imis.athena-innovation.gr
http://web.imsi.athenarc.gr/~mter/
Research Center Athena, IMSI
Amnesia – Webinar 24/4/2018
Data anonymization?
• Data anonymization facilitates the publication of microdata (as opposed to aggregated macrodata), e.g., data used in scientific research
• Microdata often reveal important private information, e.g., the medical condition of a person
o Individuals are afraid to provide their data
o Companies are afraid to share data with experts
o GDPR makes a strict protection scheme obligatory
• The aim of anonymization methods is to allow such data to be shared without compromising the privacy of the individuals they describe.
Data anonymization and Amnesia
• Data anonymization
• Removal of direct identifiers, e.g., names, SSNs, etc.
• Removal of infrequent combinations of quasi-identifiers, e.g., unique combinations of
birth dates and zipcodes
• Infrequent combinations are removed through generalization, e.g., birth date
14/01/1977 becomes **/**/1977
• Amnesia is a scalable anonymization tool
• It offers several versions of k-anonymity
• It allows the user to select and customize possible solutions
• It offers graphical tools that allow the user to analyze the anonymized dataset
• It is scalable and uses all available CPU cores in the anonymization process
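The two anonymization steps listed above (drop direct identifiers, then generalize quasi-identifiers until no combination is rare) can be sketched in a few lines of Python. This is a minimal illustration, not Amnesia's implementation; the field names (`name`, `ssn`, `birth_date`, `zipcode`), the sample records and the helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical records: name and ssn are direct identifiers,
# birth_date and zipcode are quasi-identifiers, disease is sensitive.
RECORDS = [
    {"name": "A", "ssn": "111", "birth_date": "14/01/1977", "zipcode": "13053", "disease": "Flu"},
    {"name": "B", "ssn": "222", "birth_date": "02/06/1977", "zipcode": "13053", "disease": "Cancer"},
    {"name": "C", "ssn": "333", "birth_date": "30/09/1977", "zipcode": "13053", "disease": "Flu"},
]
DIRECT_IDENTIFIERS = {"name", "ssn"}
QIS = ("birth_date", "zipcode")

def drop_direct_identifiers(record):
    """Return a copy of the record without attributes that identify a person on their own."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def generalize_birth_date(date):
    """Keep only the year, e.g. 14/01/1977 -> **/**/1977."""
    return "**/**/" + date.split("/")[-1]

def qi_combination_counts(records, qis):
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(r[q] for q in qis) for r in records)

released = [drop_direct_identifiers(r) for r in RECORDS]
print(qi_combination_counts(released, QIS))  # three distinct (unique) combinations
for r in released:
    r["birth_date"] = generalize_birth_date(r["birth_date"])
print(qi_combination_counts(released, QIS))  # a single combination shared by all three
```

After generalization no record can be singled out by birth date and zipcode alone, at the cost of losing the day and month.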
Link attacks
k-anonymity
• Each entry becomes indistinguishable from at least k-1 other entries
o k-anonymity is achieved through suppression and generalization
Original table:
id  Zipcode  Age  Nationality  Disease
1   13053    28   Russian      Heart Disease
2   13068    29   American     Heart Disease
3   13068    21   Japanese     Viral Infection
4   13053    23   American     Viral Infection
5   14853    50   Indian       Cancer
6   14853    55   Russian      Heart Disease
7   14850    47   American     Viral Infection
8   14850    49   American     Viral Infection
9   13053    31   American     Cancer
10  13053    37   Indian       Cancer
11  13068    36   Japanese     Cancer
12  13068    35   American     Cancer

4-anonymous version:
id  Zipcode  Age  Nationality  Disease
1   130**    <30  *            Heart Disease
2   130**    <30  *            Heart Disease
3   130**    <30  *            Viral Infection
4   130**    <30  *            Viral Infection
5   1485*    ≥40  *            Cancer
6   1485*    ≥40  *            Heart Disease
7   1485*    ≥40  *            Viral Infection
8   1485*    ≥40  *            Viral Infection
9   130**    3*   *            Cancer
10  130**    3*   *            Cancer
11  130**    3*   *            Cancer
12  130**    3*   *            Cancer
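Whether a table is k-anonymous can be checked by grouping rows on their quasi-identifier values and taking the size of the smallest group. The Python sketch below does this for the two tables above, assuming Zipcode, Age and Nationality are the quasi-identifiers and Disease is the sensitive attribute; the function names are illustrative and not part of Amnesia.

```python
from collections import Counter

# Rows of the two tables above as (zipcode, age, nationality, disease).
ORIGINAL = [
    ("13053", "28", "Russian", "Heart Disease"),
    ("13068", "29", "American", "Heart Disease"),
    ("13068", "21", "Japanese", "Viral Infection"),
    ("13053", "23", "American", "Viral Infection"),
    ("14853", "50", "Indian", "Cancer"),
    ("14853", "55", "Russian", "Heart Disease"),
    ("14850", "47", "American", "Viral Infection"),
    ("14850", "49", "American", "Viral Infection"),
    ("13053", "31", "American", "Cancer"),
    ("13053", "37", "Indian", "Cancer"),
    ("13068", "36", "Japanese", "Cancer"),
    ("13068", "35", "American", "Cancer"),
]
GENERALIZED = [
    ("130**", "<30", "*", "Heart Disease"),
    ("130**", "<30", "*", "Heart Disease"),
    ("130**", "<30", "*", "Viral Infection"),
    ("130**", "<30", "*", "Viral Infection"),
    ("1485*", "≥40", "*", "Cancer"),
    ("1485*", "≥40", "*", "Heart Disease"),
    ("1485*", "≥40", "*", "Viral Infection"),
    ("1485*", "≥40", "*", "Viral Infection"),
    ("130**", "3*", "*", "Cancer"),
    ("130**", "3*", "*", "Cancer"),
    ("130**", "3*", "*", "Cancer"),
    ("130**", "3*", "*", "Cancer"),
]
QI = (0, 1, 2)  # assumed quasi-identifiers: zipcode, age, nationality

def k_of(table, qi):
    """Size of the smallest group of rows sharing identical quasi-identifier values."""
    groups = Counter(tuple(row[i] for i in qi) for row in table)
    return min(groups.values())

def is_k_anonymous(table, qi, k):
    return k_of(table, qi) >= k

print(k_of(ORIGINAL, QI), k_of(GENERALIZED, QI))  # prints: 1 4
```

The check looks only at the quasi-identifiers: rows 9-12 form a valid group of four, yet every one of them has Cancer, so anyone known to fall in that group is still exposed. k-anonymity by itself does not take the sensitive values into account.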
Generalization Hierarchy
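[Figure: age generalization hierarchy. Leaves 7 and 9 generalize to 0-10, leaves 16 and 18 generalize to 10-20, and both ranges generalize to the root *.]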
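A generalization hierarchy is a tree in which each value points to a more general one, and generalizing a value by one level means replacing it with its parent. The Python sketch below encodes the age hierarchy from the figure with a hypothetical `PARENT` map and `generalize` helper; it is an illustration only, not Amnesia's hierarchy format.

```python
# Hypothetical encoding of the figure's hierarchy: each value maps to its parent.
PARENT = {
    "7": "0-10", "9": "0-10",
    "16": "10-20", "18": "10-20",
    "0-10": "*", "10-20": "*",
}

def generalize(value, levels):
    """Climb `levels` steps up the hierarchy, stopping at the root '*'."""
    for _ in range(levels):
        if value == "*":
            break
        value = PARENT[value]
    return value

ages = ["7", "9", "16", "18"]
print([generalize(a, 1) for a in ages])  # ['0-10', '0-10', '10-20', '10-20']
print([generalize(a, 2) for a in ages])  # ['*', '*', '*', '*']
```

Each extra level makes groups of identical values larger (better for k-anonymity) but less informative, which is the trade-off an anonymization tool has to search.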
Structural information
• We need to anonymize all relevant information about a
person, not just a tuple
• Information tends to gather over time
• Information is linked through semantic properties; its schema is irrelevant
• Personal data tend to accumulate over time
• Research focuses on simple data and complicated guarantees, but the real world has complex data and requires simple guarantees
Limits of k-anonymity
• 2-anonymous
Fruits Meat Vegetables Fish
Vassilis Χ Χ
Manolis Χ Χ Χ
Eleni Χ
Maria Χ Χ
Kostas Χ Χ
Food
Vassilis Χ
Manolis Χ
Eleni Χ
Maria Χ
Kostas Χ
k^m-anonymity
• 2^2-anonymous (k = 2, m = 2)
• Any combination of up to m items that appears in the data appears in at least k records
Fruits Meat Vegetables Fish
Vassilis Χ Χ
Manolis Χ Χ Χ
Eleni Χ
Maria Χ Χ
Kostas Χ Χ
Fruits Meat Other food
Vassilis Χ Χ
Manolis Χ Χ Χ
Eleni Χ
Maria Χ Χ
Kostas Χ Χ
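The k^m-anonymity condition can be checked by counting, for every combination of up to m items, how many records contain it. The exact marks in the tables above did not survive extraction, so the Python sketch below uses hypothetical baskets for the same five people and a hypothetical mapping that generalizes Vegetables and Fish into Other food; it illustrates the definition and is not Amnesia's algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (not the slide's actual data).
ORIGINAL = {
    "Vassilis": {"Fruits", "Meat"},
    "Manolis": {"Fruits", "Meat", "Fish"},
    "Eleni": {"Vegetables"},
    "Maria": {"Fruits", "Vegetables"},
    "Kostas": {"Meat", "Fish"},
}
# Hypothetical generalization: Vegetables and Fish become "Other food".
PARENT = {"Vegetables": "Other food", "Fish": "Other food"}

def generalize(baskets):
    return {name: {PARENT.get(i, i) for i in items} for name, items in baskets.items()}

def km_violations(baskets, k, m):
    """Itemsets of at most m items that occur in the data but in fewer than k baskets."""
    support = Counter()
    for items in baskets.values():
        for size in range(1, m + 1):
            for combo in combinations(sorted(items), size):
                support[combo] += 1
    return {combo: n for combo, n in support.items() if n < k}

def is_km_anonymous(baskets, k, m):
    return not km_violations(baskets, k, m)

print(km_violations(ORIGINAL, k=2, m=2))  # {('Fish', 'Fruits'): 1, ('Fruits', 'Vegetables'): 1}
print(is_km_anonymous(generalize(ORIGINAL), k=2, m=2))  # True
```

In this sketch the generalized baskets are still all different from one another, so they would fail plain 2-anonymity if each whole basket were treated as a single quasi-identifier value. k^m-anonymity only limits what an attacker who knows up to m items can learn, which is why it can keep more detail than collapsing every item into Food.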
Strengths and Weaknesses
• Strengths
o Simple to understand
• Can be the basis for consent
o Close to previous and existing legal definitions
o Low information loss
o Customizable by non-experts
• Weaknesses
o Not very strict
o Does not take into account sensitive values
Anonymization challenges
• Anonymization techniques have not been extensively tested in practice
o Mapping the social notion of privacy to technical notions is not easy
• Data utility has not been studied extensively in research
o Only a few information loss measures exist, and they are artificial
• Data utility is difficult to estimate in practice
o Different applications have different needs
o It is not easy to quantify the loss of information
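As an example of the kind of simple, admittedly artificial, information loss measure mentioned above, the normalized certainty penalty (NCP) for a numeric attribute charges a value generalized to the interval [lo, hi] the fraction (hi - lo) / (domain max - domain min): 0 means no loss and 1 means the value is fully suppressed. The Python sketch below applies it to the Age column of the k-anonymity example. It assumes the age domain is the observed range 21-55 and reads <30 as [21, 29], 3* as [30, 39] and ≥40 as [40, 55]; these bounds are assumptions, and this is not necessarily the measure Amnesia reports.

```python
# Assumed interval reading of the generalized Age values in the k-anonymity example.
AGE_INTERVALS = {"<30": (21, 29), "3*": (30, 39), "≥40": (40, 55)}
AGE_DOMAIN = (21, 55)  # assumed: observed min and max age in the original table

def ncp(interval, domain):
    """Normalized certainty penalty of one generalized numeric value."""
    lo, hi = interval
    dlo, dhi = domain
    return (hi - lo) / (dhi - dlo)

generalized_ages = ["<30"] * 4 + ["≥40"] * 4 + ["3*"] * 4
loss = sum(ncp(AGE_INTERVALS[a], AGE_DOMAIN) for a in generalized_ages) / len(generalized_ages)
print(f"average NCP for Age: {loss:.3f}")
```

A single number like this is easy to compare across candidate solutions, but it says nothing about whether a particular analysis (say, an age-stratified study) can still be carried out, which is exactly why utility is hard to estimate in practice.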
Amnesia
• Amnesia is a data anonymization tool developed by Research Center Athena
• Amnesia is built with Java and JavaScript
• k-anonymity and k^m-anonymity
• Tuples and set-valued data
• Visual tools
o Estimating data utility
o Building hierarchies
o Customizing anonymization solutions
Amnesia status
• Amnesia is available as a public beta version at
o https://amnesia.openaire.eu
• The online version is mostly for demonstration and testing purposes
• Sensitive data can be anonymized locally by downloading the application
o Security
o Scalability
• We are in the process of adapting it to health data
Amnesia challenges
Is it easy for data owners to use? Are anonymized data useful?
• Give us feedback!
o amnesia-helpdesk@imis.athena-innovation.gr
• Can it anonymize your data?
o Let us know about your use case
o Ask us for help
• We need feedback on data analysis
o Let us know if you have shared anonymized results
• Please contact us with your needs
Next steps
Work on the feedback
• Improve user experience
• Add support for domain-specific data
• Fix bugs!
More features
• New algorithms
o Additional privacy guarantees
o More data types
• Better scaling capabilities
o Disk-based solutions
o More efficient memory usage
HTTPS://AMNESIA.OPENAIRE.EU/
Thank you!