Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguzhan Gencoglu

  1. Federated Learning & Privacy-preserving AI. Oguzhan Gencoglu, Head of AI, Top Data Science. Big Data Helsinki, 27 June 2019
  2. About Top Data Science ● Business: “AI as a Service” ● Located in Helsinki, Finland ● 15 people (12 data scientists with MScs and PhDs) ● Excellent customer track record: Finland, Germany, Denmark, Japan, Vietnam, Israel, USA ● 60+ machine learning solutions delivered ● Customers & Partners: [logos]
  3. Outline ● The Problem ● The Solution: Federated Learning ● Application Example ● Differential Privacy ● Other Privacy-preserving AI Concepts
  4. The Problem?
  5. Federated Learning
  6. Example - Gboard
  7. Example - Gboard. Hard, Andrew, et al. "Federated learning for mobile keyboard prediction." arXiv preprint arXiv:1811.03604 (2018). ● Higher next-word prediction accuracy: +24% ● More useful prediction strip: +10% more clicks ● Better emoji recommendation: +7% ● 11% more users share emojis
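The federated learning loop behind examples like Gboard can be sketched in a few lines. This is a minimal sketch, assuming a FedAvg-style scheme: each client trains locally on its own data, and the server averages the returned models weighted by client dataset size. The one-feature linear model, the helper names `local_sgd`, `federated_averaging` and `make_client`, and all hyperparameters are hypothetical stand-ins, not the Gboard implementation.

```python
import random


def local_sgd(weights, data, lr=0.05, epochs=5):
    """Train on one client's private data with plain SGD.

    Hypothetical toy model: y ~ w * x + b.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # gradient of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x
            b -= lr * err
    return w, b


def federated_averaging(global_weights, clients, rounds=20):
    """FedAvg-style loop: clients send back weights, never their raw data."""
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        results = [(local_sgd(global_weights, data), len(data)) for data in clients]
        total = sum(n for _, n in results)
        # Server: average the client models, weighted by local dataset size.
        w = sum(weights[0] * n for weights, n in results) / total
        b = sum(weights[1] * n for weights, n in results) / total
        global_weights = (w, b)
    return global_weights


def make_client(rng, n):
    """Hypothetical private dataset drawn from y = 2x + 1."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [make_client(rng, n) for n in (20, 30, 50)]
    w, b = federated_averaging((0.0, 0.0), clients)
    print(f"w={w:.3f}, b={b:.3f}")  # should approach w=2, b=1
```

Only the `(w, b)` pairs cross the client/server boundary; the `(x, y)` pairs stay inside `local_sgd`, which is the property that makes on-device training attractive for keyboard data.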
  8. Differential Privacy: a constraint on the algorithms used to publish aggregate information about a database, which limits the disclosure of private information. In machine learning terms: learning common patterns in a dataset without memorizing individual examples.
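The constraint above is usually made precise as ε-differential privacy. In the standard formulation (not spelled out on the slide), a randomized algorithm M satisfies it if, for all datasets D and D′ that differ in a single individual's record and for every set of possible outputs S:

$$
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S]
$$

Smaller ε means stronger privacy: adding or removing any one person changes the probability of any outcome by at most a factor of e^ε. The common relaxation, (ε, δ)-differential privacy, adds an additive slack δ to the right-hand side.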
  9. “Do you pee in the shower?” yes / no
  10. [Diagram] First coin flip: HEADS (50%) → Mikko’s real answer; TAILS (50%) → second coin flip: HEADS → “yes” (25%), TAILS → “no” (25%)
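The coin-flip survey is the classic randomized-response protocol: flip a coin; on heads answer truthfully, on tails flip again and answer “yes” on heads and “no” on tails. Because P(yes) = 0.5·p + 0.25, an analyst can still estimate the population rate as p ≈ 2·(observed yes-rate - 0.25) without learning anyone's true answer. A minimal simulation, assuming a hypothetical population in which about 30% would truthfully say “yes”; the function names are illustrative.

```python
import math
import random


def randomized_response(true_answer: bool, rng: random.Random) -> bool:
    """One respondent following the two-coin protocol."""
    if rng.random() < 0.5:      # first coin: HEADS -> answer truthfully
        return true_answer
    return rng.random() < 0.5   # TAILS -> second coin: HEADS = yes, TAILS = no


def estimate_true_rate(responses) -> float:
    """Invert P(yes) = 0.5 * p + 0.25 to estimate the true 'yes' rate p."""
    observed = sum(responses) / len(responses)
    return 2.0 * (observed - 0.25)


# Each answer is deniable: P(yes | truly yes) = 0.75 and P(yes | truly no) = 0.25.
# The worst-case ratio is 3, so the protocol is ln(3)-differentially private.
EPSILON = math.log(0.75 / 0.25)

if __name__ == "__main__":
    rng = random.Random(42)
    population = [rng.random() < 0.30 for _ in range(100_000)]  # hypothetical truths
    responses = [randomized_response(a, rng) for a in population]
    print(f"estimated rate: {estimate_true_rate(responses):.3f}")  # close to 0.30
    print(f"epsilon: {EPSILON:.3f}")
```

The price of deniability is variance: the estimate is noisier than asking directly, so randomized response needs larger samples for the same precision.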
  11. Relevant Concepts. Quasi-identifier: pieces of information that are not by themselves unique identifiers, but are sufficiently well correlated with an entity that, in combination, they can create a unique identifier. (Example table: typical bank loan eligibility data.)
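To see why quasi-identifiers matter, one can count how many records become unique once a few innocuous columns are combined. A minimal sketch; the records below are a hypothetical stand-in for loan-eligibility data, and the column names and the helper `unique_fraction` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical loan-eligibility records: no names, but several quasi-identifiers.
RECORDS = [
    {"zip": "00100", "birth_year": 1985, "gender": "F", "income": 52_000},
    {"zip": "00100", "birth_year": 1985, "gender": "M", "income": 61_000},
    {"zip": "00100", "birth_year": 1990, "gender": "F", "income": 38_000},
    {"zip": "00200", "birth_year": 1985, "gender": "F", "income": 45_000},
    {"zip": "00200", "birth_year": 1985, "gender": "F", "income": 70_000},
    {"zip": "00200", "birth_year": 1972, "gender": "M", "income": 83_000},
]


def unique_fraction(records, columns):
    """Fraction of records whose values on `columns` match no other record."""
    keys = [tuple(r[c] for c in columns) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


if __name__ == "__main__":
    for cols in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
        print(cols, f"{unique_fraction(RECORDS, cols):.2f}")
```

Here ZIP code alone singles out nobody (0.00), ZIP plus birth year singles out 2 of 6 rows (0.33), and adding gender singles out 4 of 6 (0.67): anyone who knows those three facts about a person can read off that person's income.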
  12. Relevant Concepts. Exponential Mechanism: a technique for designing differentially private algorithms (McSherry & Talwar, 2007)
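The exponential mechanism picks a candidate output r with probability proportional to exp(ε·u(D, r) / (2Δu)), where u is a utility score and Δu is its sensitivity (the most u can change when one record changes). A minimal sketch, assuming a hypothetical task of privately reporting the most common diagnosis in a dataset, with u = each diagnosis's count (so Δu = 1); the data and function names are illustrative.

```python
import math
import random
from collections import Counter


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Selection probabilities proportional to exp(eps * u / (2 * sensitivity))."""
    top = max(scores)  # subtracting the max avoids overflow; it cancels on normalizing
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng):
    """Sample one candidate; higher-utility candidates are exponentially favored."""
    scores = [utility(c) for c in candidates]
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical records; a count changes by at most 1 when one patient
    # is added or removed, so the sensitivity is 1.
    diagnoses = ["flu"] * 40 + ["cold"] * 35 + ["migraine"] * 5
    counts = Counter(diagnoses)
    rng = random.Random(7)
    pick = exponential_mechanism(["flu", "cold", "migraine"],
                                 lambda c: counts[c], 0.5, 1.0, rng)
    print("privately selected most common diagnosis:", pick)
```

Unlike adding noise to a number, this works for non-numeric outputs: the answer is always a valid candidate, and "flu" is only moderately more likely than "cold" because their counts are close.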
  13. Differentially Private FL. McMahan, Brendan, et al. "Learning Differentially Private Recurrent Language Models." (2018).
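The central idea in combining the two, in the spirit of McMahan et al., is to bound each client's influence on the shared model and then add calibrated noise to the aggregate. A simplified sketch, assuming updates are plain lists of floats, a per-client L2 bound `clip_norm`, and Gaussian noise scaled by a hypothetical `noise_multiplier`. It divides by the observed client count for simplicity and omits the client sampling and privacy accounting a real deployment needs.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale a client update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def dp_aggregate(updates, clip_norm, noise_multiplier, rng):
    """Average clipped client updates after adding Gaussian noise to their sum.

    After clipping, adding or removing one client changes the sum by at most
    clip_norm (L2), so noise with std = noise_multiplier * clip_norm masks
    any single client's contribution.
    """
    clipped = [clip_update(u, clip_norm) for u in updates]
    dim = len(clipped[0])
    summed = [sum(u[i] for u in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(updates) for v in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical client updates; the last one is an outlier that clipping tames.
    updates = [[0.1, -0.2], [0.05, 0.1], [5.0, 5.0]]
    print(dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Clipping is what makes the noise meaningful: without a bound on each update, no finite amount of noise could hide an arbitrarily large contribution from one client.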
  14. Tools & Libraries ● github.com/tensorflow/federated ● github.com/tensorflow/privacy ● github.com/uber/sql-differential-privacy ● github.com/IBM/differential-privacy-library
  15. Other Privacy-preserving AI Concepts
  16. ML on Encrypted Data [Diagram: End-User: Input Data (e.g., a tumor scan) → Encrypted Input (f3a9d 71g3e) → Third Party: Trained Model → Encrypted Prediction (f3a9d 71g3e) → End-User: Decrypted Prediction (“Benign Tumor”)] ● The end-user encrypts her sensitive data and sends it to a third-party host. ● Because the end-user owns the private key, the third party can decrypt neither the input nor the output prediction. ● The third party produces an encrypted prediction, which is returned to the end-user. ● Privacy is preserved throughout the entire pipeline, for both inputs and outputs.
  17. Homomorphic Encryption ● Homomorphic Encryption (HE) is a form of encryption that allows computation (e.g., multiplication and addition) on ciphertexts, generating an encrypted result which, when decrypted, matches the result of the same operations performed on the plaintext. [Diagram: Plain Domain: 3 + 5 = 8; Cipher Domain: d8d4h… + 8ke3s1… = 1u3y7...] Note: computations in the cipher domain are very costly in terms of speed and memory.
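The “3 + 5 = 8” example can be reproduced with the Paillier cryptosystem, an additively homomorphic scheme: multiplying two ciphertexts gives an encryption of the sum of their plaintexts, and raising a ciphertext to an integer power k gives an encryption of k times the plaintext. The second property is enough for the encrypted-prediction pipeline of slide 16 when the model is linear: the third party can score an encrypted input with its own plaintext weights without decrypting anything. This is a toy, insecure sketch with small hard-coded primes; the slide does not say which HE scheme it has in mind, and the helper names are illustrative.

```python
import math
import random

# Toy Paillier keypair. These primes are far too small to be secure.
P, Q = 999_983, 1_000_003
N = P * Q
N2 = N * N
G = N + 1                                           # standard simple generator choice
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                # modular inverse; valid since G = N + 1


def encrypt(m: int, rng: random.Random) -> int:
    """Enc(m) = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Dec(c) = L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts encrypts the sum of the plaintexts."""
    return (c1 * c2) % N2


def encrypted_linear_score(enc_features, weights):
    """Third party: Enc(sum(w_i * x_i)) from encrypted x_i and non-negative integer w_i."""
    acc = 1  # 1 = g^0 * 1^n, a valid (non-random) encryption of 0
    for c, w in zip(enc_features, weights):
        acc = (acc * pow(c, w, N2)) % N2
    return acc


if __name__ == "__main__":
    rng = random.Random(0)
    print(decrypt(add_encrypted(encrypt(3, rng), encrypt(5, rng))))  # 8
    enc_x = [encrypt(v, rng) for v in (2, 0, 7)]   # end-user encrypts features
    score = encrypted_linear_score(enc_x, [3, 1, 4])  # third party sees only ciphertexts
    print(decrypt(score))  # 3*2 + 1*0 + 4*7 = 34
```

Even at this toy size each ciphertext is about 80 bits for a plaintext of a few bits, and each homomorphic step is a modular exponentiation, which illustrates the speed and memory cost noted on the slide. Paillier supports only additions and plaintext scalings; non-linear models need fully homomorphic schemes, which are costlier still.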
  18. Membership Inference Attacks. An interesting insight: the accuracy of the inference attack increases with the number of classes. Given a black-box machine learning model and a data record, determining whether or not this record was part of the model’s training dataset was shown to be possible with extremely high accuracy [7]. As a result, we now know that simple query access to a black-box API that returns the model’s output for a given input can leak a significant amount of information about the individual data records on which the model was trained.
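The attack in [7] trains “shadow models” to imitate the target. A much simpler baseline, swapped in here purely for illustration, already shows the leak: overfitted models are more confident on their own training records, so an attacker can guess “member” whenever the returned confidence exceeds a threshold. A minimal sketch, assuming a hypothetical, deliberately overfitted model (a 1-nearest-neighbour scorer whose confidence is exp(-distance) to the closest training point) and an illustrative threshold of 0.99.

```python
import math
import random


class OverfitModel:
    """Hypothetical black box that memorizes its training data (1-NN style)."""

    def __init__(self, train_points):
        self._train = list(train_points)

    def confidence(self, point):
        """The only thing the attacker sees: a score in (0, 1]."""
        d = min(math.dist(point, t) for t in self._train)
        return math.exp(-d)


def confidence_threshold_attack(model, record, threshold=0.99):
    """Guess 'member' when the model is suspiciously confident on the record."""
    return model.confidence(record) >= threshold


if __name__ == "__main__":
    rng = random.Random(0)
    members = [(rng.random(), rng.random()) for _ in range(50)]
    non_members = [(rng.random(), rng.random()) for _ in range(50)]
    model = OverfitModel(members)
    tpr = sum(confidence_threshold_attack(model, r) for r in members) / len(members)
    fpr = sum(confidence_threshold_attack(model, r) for r in non_members) / len(non_members)
    print(f"flagged members: {tpr:.0%}, flagged non-members: {fpr:.0%}")
```

The attacker never inspects the weights or the training set; the gap between confidence on seen and unseen records is the signal, which is why regularization and differentially private training reduce this risk.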
  19. Data Generation Dar et al., Image synthesis in multi-contrast MRI with cGANs, 2019
  20. Data Generation Hyland et al., Real-valued time series generation with recurrent cGANs, 2017
  21. Thank You!

Speaker Notes

  1. Models need to run on device: offline and quicker