
The Measurement of Privacy

We examine the problem of measuring the effect of anonymisation upon a data set by utilising mutual information as a metric and applying varying degrees of differential privacy to causal and non-causal structures.


  1. The Measurement of Privacy. A Lecture to FISMA, 19 April 2017, Espoo. Ian Oliver, Wei Ren, Yoan Miche, Security Research Group (Espoo)
  2. Outline of the Problem • Data Collection is Ubiquitous • Balance between... • Privacy Law: GDPR, ePrivacy, SOX, COPPA, HIPAA, . . . • Business Need: Data quality, Data quantity, Information content, . . . • Consumer Trust: Excessive Data Collection, Sharing, . . . • and we want to share and process data (legally, of course) • and we want to defeat machine learning (somewhat...)
  3. Anonymisation. Solution: Anonymise the Data
  4. Anonymisation Techniques. How to Anonymise Data: • Suppression • Hashing • Tokenisation (Equivalence Classes) • Encryption • κ-anonymisation • ℓ-diversity • (ϵ, δ)-Differential Privacy
  5. Problem. How much anonymisation is required?
  6. What is Anonymisation? A = {(ϵ, δ)-differentialPrivacy, κ-anon, hash(. . .), . . .} (1). Where [απ1...πn]+ is one or more applications of an instantiation of an anonymisation function with a given set of parameters, it maps an input dataset Di to an output dataset Do; the measure m maps Di to Mi and Do to Mo, with Mi > Mo. (A sketch of this model in code follows the slide listing.)
  7. Desirable Properties of Anonymisation • The output dataset must be legal • The output dataset must be useful
  8. Legal and Useful..? A legal data set might be found by selecting all elements with a given amount (or lack of) information: χL(d : D, p : R[0,1]) = 1 if m(d) ≤ p, 0 otherwise (2). A similar definition follows for χU for usefulness; composed with χL it selects, from the output Do of [απ1...πn]+ applied to Di, the candidate set C, assuming C exists and at least one entry in Do is ‘reachable’. (A sketch of χL and χU follows the slide listing.)
  9. The Challenge. Define: sufficiently
  10. The Challenge. or, can we measure the degree of anonymisation?
  11. The Challenge. and thus, establish which data sets have the ‘desirable properties’ we require?
  12. So now we need a metric. Which information content (entropy) metric?
  13. So now we need a metric. Which information content (entropy) metric? Which fits our chosen model/framework
  14. Mutual Information • Basis for machine learning/AI • Well-grounded theory, statistical basis • Used to evaluate internal consistency and relationships in datasets • Degenerates ‘nicely’ with too little data • xi, xj ∈ structs(D) • Extension: MI(Dx, Dy) (an estimator sketch follows the slide listing)
  15. Case Study • Collection of (‘anonymised’) signalling data • hashed(ID) × LOC × TIMESTAMP
  16. Method • Apply (ϵ, δ)-differential privacy to various combinations of fields • Create measurement mechanisms, easy for Location and Time • Select suitable MI estimator functions and parameters • Invent a few new ways of doing MI and Machine Learning... • Calculate the MI for each dataset • Match the results against the earlier model of anonymisation • Construct χL and χU (a worked sketch of this pipeline follows the slide listing)
  17. Hashing Considered Harmful (for Anonymisation) (see the hashing sketch after the slide listing)
  18. Reduction in MI: Causal vs Non-causal field anonymisation
  19. Rate of Reduction in MI: Sensitivity of ϵ in (ϵ, 0)-differential privacy
  20. MI under (ϵ, δ)-Differential Privacy
  21. MI under (ϵ, δ)-Differential Privacy: [∂MI/∂ϵ, ∂MI/∂δ] (3) (a numerical-gradient sketch follows the slide listing)
  22. χL and χU. This obviously depends upon how we define χL and χU, but their intersection gives the candidate datasets C; we then look for local maxima/minima within it.
  23. Discussion • Privacy can be meaningfully metricised → evaluation of anonymisation techniques • Non-trivial datasets = MASSIVE amounts of computation • c. 1 × 10⁶ data points = 8-10 hours of computation, optimisations are possible for some estimators • MI estimators and distance functions are a problem (non-Euclidean, non-linear and non-existent in many cases) • Classification functions: χL could be very useful (lawyers replaced by algorithms?) • Heuristics for choosing (ϵ, δ) and κ • Causal vs Non-causal data points • Units of privacy elude us for the moment - MI per amount of data? • Some surprises with differential privacy’s δ - has implications for quality of data and machine learning • Probability spaces, Kullback-Leibler, Earth-mover, Non-continuous mappings, eigenvectors, comparing matrices...
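
Slide 6, in code: a minimal sketch, assuming Python and toy record structures, of the anonymisation model above. The functions hash_field, laplace_field, anonymise and the entropy-style measure m are illustrative assumptions standing in for [απ1...πn]+ and the measurement arrows Di → Mi, Do → Mo; they are not the authors' implementation.

```python
# Sketch (assumption, not the authors' code): anonymisation as one or more
# instantiated, parameterised functions applied to a dataset, plus a measure m
# taken before and after, so that m(Di) can be compared with m(Do).
import hashlib
import numpy as np

def hash_field(records, field):
    """One candidate alpha: replace a field by its SHA-256 hash."""
    return [{**r, field: hashlib.sha256(str(r[field]).encode()).hexdigest()}
            for r in records]

def laplace_field(records, field, epsilon, sensitivity=1.0):
    """Another candidate alpha: (epsilon, 0)-DP Laplace noise on a numeric field."""
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=len(records))
    return [{**r, field: r[field] + n} for r, n in zip(records, noise)]

def anonymise(records, steps):
    """[alpha_pi1...pin]+ : apply one or more anonymisation steps in sequence."""
    for step in steps:
        records = step(records)
    return records

def m(records, field, bins=20):
    """A stand-in measure m: empirical Shannon entropy (bits) of one numeric field."""
    counts, _ = np.histogram([r[field] for r in records], bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

Used as, e.g., Do = anonymise(Di, [lambda d: laplace_field(d, "loc", 1.0)]); the model then expects m(Do, "loc") not to exceed m(Di, "loc").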
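
Slide 8, in code: a sketch of the characteristic functions χL and χU from equation (2). The thresholds p_legal and p_useful, and the reuse of the measure m from the previous sketch, are assumptions; in practice "legal" and "useful" would be defined by the law and the business need respectively.

```python
# Sketch (assumed thresholds): chi_L and chi_U as indicator functions over a
# measured dataset, composed to decide membership of the candidate set C.
def chi_L(measured_value, p_legal):
    """Equation (2): 1 if the dataset's measured information m(d) <= p_legal."""
    return 1 if measured_value <= p_legal else 0

def chi_U(measured_value, p_useful):
    """Analogous indicator for usefulness: 1 if m(d) >= p_useful."""
    return 1 if measured_value >= p_useful else 0

def in_candidate_set(measured_value, p_legal, p_useful):
    """chi_L composed with chi_U: the dataset is in C only if legal AND useful."""
    return chi_L(measured_value, p_legal) * chi_U(measured_value, p_useful)
```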
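
Slide 14, in code: a minimal histogram-based mutual-information estimator between two fields, as an assumed stand-in for the estimators the slides refer to (the slides do not name theirs; continuous k-NN style estimators are common choices). The bin count is an arbitrary assumption.

```python
# Sketch (assumption): plug-in MI estimate, in bits, from a 2D histogram of
# paired samples x, y -- MI(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x)p(y)) ).
import numpy as np

def mutual_information(x, y, bins=20):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over y
    py = pxy.sum(axis=0, keepdims=True)   # marginal over x
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())
```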
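
Slide 16, in code: a sketch of the method's core loop on assumed synthetic data. A LOC field correlated with TIMESTAMP is generated, (ϵ, 0)-differential privacy is applied to LOC via the Laplace mechanism, and MI(LOC; TIMESTAMP) is re-estimated per ϵ using the mutual_information sketch above. The sensitivity, the ϵ grid, and the data model are all illustrative assumptions, not the case-study data.

```python
# Sketch (synthetic data, assumed parameters): how the MI estimate decays as
# epsilon shrinks when Laplace noise is applied to the location field only.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
timestamps = rng.uniform(0.0, 24.0, size=n)                        # hour of day
locations = 10.0 * np.sin(2 * np.pi * timestamps / 24.0) + rng.normal(0.0, 1.0, n)

baseline = mutual_information(locations, timestamps)               # MI sketch above
for epsilon in (0.1, 0.5, 1.0, 5.0):
    noisy = locations + rng.laplace(0.0, 1.0 / epsilon, size=n)    # sensitivity ~ 1
    mi = mutual_information(noisy, timestamps)
    print(f"epsilon={epsilon}: MI={mi:.3f} bits (baseline {baseline:.3f})")
```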
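
Slide 17, in code: why hashing identifiers is "considered harmful" as an anonymisation step. A hash is a deterministic relabelling, so the equivalence classes (which rows belong to the same user) and hence MI between the identifier and the other fields are preserved essentially unchanged. The toy identifiers are assumptions.

```python
# Sketch (toy data): hashing an ID relabels it but preserves the grouping
# structure that linkage attacks and MI both depend on.
import hashlib

ids = ["alice", "bob", "carol", "alice", "bob", "alice"]
hashed = [hashlib.sha256(i.encode()).hexdigest() for i in ids]

n = len(ids)
same_before = {(i, j) for i in range(n) for j in range(n) if ids[i] == ids[j]}
same_after = {(i, j) for i in range(n) for j in range(n) if hashed[i] == hashed[j]}
assert same_before == same_after   # identical partition of the rows -> MI unchanged
```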
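
Slide 21, in code: one way the vector in equation (3) might be estimated numerically, by central finite differences of the MI estimate over (ϵ, δ). The callable mi_at is hypothetical: it stands for "apply (ϵ, δ)-differential privacy to the chosen fields, then estimate MI as in the earlier sketches".

```python
# Sketch (assumed numerical approach): finite-difference estimate of
# [dMI/d_epsilon, dMI/d_delta] around a working point (epsilon, delta).
import numpy as np

def mi_gradient(mi_at, epsilon, delta, h_eps=1e-2, h_delta=1e-3):
    """mi_at(epsilon, delta) is a hypothetical callable returning an MI estimate."""
    d_eps = (mi_at(epsilon + h_eps, delta) - mi_at(epsilon - h_eps, delta)) / (2 * h_eps)
    d_delta = (mi_at(epsilon, delta + h_delta) - mi_at(epsilon, delta - h_delta)) / (2 * h_delta)
    return np.array([d_eps, d_delta])
```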
