Data Works MD October 2021 - https://www.meetup.com/DataWorks/events/280714686/
Video - https://youtu.be/1iyKi8zZi80
-------------------------------------------------
Enabling Cross-Boundary Data Science with Privacy Enhancing Technologies
Recent breakthroughs in Privacy Enhancing Technologies (PETs) have made it possible to build systems that can keep data encrypted for the entire processing lifecycle. These advances can uniquely enable data scientists to operate on data sets that they otherwise wouldn't be able to access due to an organizational "boundary," such as a security classification or regulatory barrier. This talk will provide a brief introduction to PETs and a detailed walk-through of two algorithms that leverage PETs for data science use cases.
-------------------------------------------------
Ryan Carr serves as CTO and VP of Engineering at Enveil, the pioneering Privacy Enhancing Technology company protecting Data in Use. With experience in leading engineering efforts at institutions such as the Johns Hopkins University Applied Physics Laboratory, Ryan’s fields of expertise include large scale analytic systems, distributed algorithms, artificial intelligence, game theory and social learning, and applying cloud computing techniques to simulate and analyze complex interactions among large numbers of autonomous agents. His research in these areas has been published in highly competitive venues such as Proceedings of the Royal Society, AAAI, and AAMAS. Ryan holds a PhD/BS in Computer Science. Ryan can be reached on Twitter at @jryancarr
2. Outline
• What is Cross-Boundary Data Science?
• What are Privacy Enhancing Technologies?
• Homomorphic Encryption Primer
• Use Case: Private Information Retrieval
• Use Case: Encrypted Machine Learning
3. Cross-Boundary Data Science
Many data sets have “boundaries” limiting how others can interact with them:
• Security Classification
• Privacy Regulations
• Competitive Interests
Privacy Enhancing Technologies can allow searches, analytics, and ML across these boundaries.
4. Privacy Enhancing Technology Overview
Privacy Enhancing Technologies (PETs):
• Differential Privacy
• Secure Multiparty Compute
• Private Set Intersection
• Homomorphic Encryption
• Trusted Execution Environments
Security scale (Most Secure to Least Secure):
• Homomorphic Encryption (HE)
• 3+ Party SMPC Protocols
• Trusted Execution Environments (TEE)
By 2025, 50% of large organizations will adopt privacy-enhancing computation for processing data in untrusted environments and multiparty data analytics use cases. (Gartner “Top Strategic Technology Trends for 2021,” Oct. 2020)
5. Homomorphic Encryption Primer
Properties of modern encryption (AES, RSA, etc.):
• Encodes plaintext messages into ciphertexts
• Encoding algorithm built around a trapdoor function
• Easy to decode a ciphertext, if you have the secret key
• Provides computational security:
o Without secret key, need to try > 2^80 possibilities
Homomorphic Encryption (HE) does all that, plus:
• Permits operations on ciphertexts without the secret key
• Different HE algorithms for different data types
o BFV / BGV : Integers
o CKKS : Fixed point reals
o TFHE : Boolean logic
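The idea of operating on ciphertexts without the secret key can be seen in miniature with unpadded ("textbook") RSA, which happens to be multiplicatively homomorphic. The sketch below is an illustration under stated assumptions, not something shown in the talk: the primes, the public exponent, and all function names are chosen for this example. Unpadded RSA is not secure and supports only multiplication, so it is not a substitute for BFV, CKKS, or TFHE.

```python
# Toy illustration only: textbook RSA without padding is insecure.
# The primes are two well-known Mersenne primes, far too small for real use.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                                    # public modulus
PUBLIC_EXP = 65537                           # public exponent (assumed for this sketch)
SECRET_EXP = pow(PUBLIC_EXP, -1, (P - 1) * (Q - 1))  # known only to the key owner

def encrypt(m: int) -> int:
    """Encrypt with the public key (N, PUBLIC_EXP)."""
    return pow(m, PUBLIC_EXP, N)

def decrypt(c: int) -> int:
    """Decrypt with the secret exponent."""
    return pow(c, SECRET_EXP, N)

def multiply_ciphertexts(c1: int, c2: int) -> int:
    """Multiply two ciphertexts using only public values; no secret key involved."""
    return (c1 * c2) % N

a, b = 6, 7
product_ct = multiply_ciphertexts(encrypt(a), encrypt(b))
print(decrypt(product_ct))  # 42
```

Whoever computes `multiply_ciphertexts` never sees 6, 7, or 42; only the key owner can decrypt the product.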
6. BFV Basics
• BFV = Brakerski/Fan-Vercauteren
• Security based on hardness of Ring Learning with Errors
• Homomorphic operations (E(a) is an encryption of a):
o E(a) + E(b) = E(a + b)
o E(a) + b = E(a + b)
o E(a) × E(b) = E(ab)
o E(a) × b = E(ab)
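To make the operations above concrete, here is a minimal sketch of a toy symmetric scheme based on plain Learning with Errors. It is not BFV: it uses integer vectors instead of polynomial rings, encrypts a single small integer, and omits ciphertext-by-ciphertext multiplication, which in BFV needs extra machinery. All parameters and function names are assumptions picked for illustration and are far too small to be secure.

```python
import secrets

# Toy parameters (assumptions for this sketch; not secure).
N_DIM = 16            # length of the secret vector
Q = 2**32             # ciphertext modulus
T = 2**8              # plaintext modulus: messages are integers mod 256
DELTA = Q // T        # scaling factor that lifts messages above the noise
NOISE_BOUND = 8       # fresh noise is drawn from [-8, 8]

def keygen():
    """Secret key: a random binary vector."""
    return [secrets.randbelow(2) for _ in range(N_DIM)]

def encrypt(s, m):
    """E(m) = (a, <a, s> + DELTA*m + e mod Q) for random a and small noise e."""
    a = [secrets.randbelow(Q) for _ in range(N_DIM)]
    e = secrets.randbelow(2 * NOISE_BOUND + 1) - NOISE_BOUND
    b = (sum(ai * si for ai, si in zip(a, s)) + DELTA * m + e) % Q
    return a, b

def decrypt(s, ct):
    """Remove the mask <a, s>, then round away the noise."""
    a, b = ct
    noisy = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    return ((noisy + DELTA // 2) // DELTA) % T

def add(ct1, ct2):
    """E(a) + E(b) = E(a + b); the noise terms add as well."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q

def add_plain(ct, k):
    """E(a) + b = E(a + b); adds no extra noise."""
    a, b = ct
    return a, (b + DELTA * k) % Q

def mul_plain(ct, k):
    """E(a) x b = E(ab); the noise is multiplied by k too."""
    a, b = ct
    return [(x * k) % Q for x in a], (b * k) % Q

sk = keygen()
ca, cb = encrypt(sk, 5), encrypt(sk, 7)
print(decrypt(sk, add(ca, cb)))        # 12
print(decrypt(sk, add_plain(ca, 7)))   # 12
print(decrypt(sk, mul_plain(ca, 7)))   # 35
```

Decryption stays correct only while the accumulated noise is below DELTA / 2, which is why the number of operations a ciphertext can absorb is limited.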
8. Major Homomorphic Encryption Open Source Libraries
Homomorphic Encryption – Try it out!
SEAL
Supports BFV and CKKS. Easiest to use, best performance for basic HE operations.
github.com/microsoft/SEAL
PALISADE
Library for general lattice crypto, implements its own math library.
gitlab.com/palisade
HElib
Supports BGV + improvements, CKKS; math based on NTL library.
github.com/homenc/HElib
Homomorphic Encryption Standardization
Open Industry/Government/Academic Consortium to Advance Secure Computation
http://homomorphicencryption.org
9. Use Case: Encrypted Search
select
forename,
middle_name,
...
aml_alert_flag,
sar_flag
from bankB.customer_profiles
where
id_doc_number = '9411998148' AND
id_doc_expiry_date = '2019-03-17' AND
nationality = 'British'
OR
soundex(forename) = soundex('Christina') AND
soundex(surname) = soundex('Thompson') AND
date_of_birth = '1963-05-20' AND
phone_number = '7903328915'
OR
soundex(forename) = soundex('Christina') AND
soundex(surname) = soundex('Thompson') AND
address = '49467 Larson Mountain' AND
postcode = 'N12'
[Diagram: User or Application → Encrypted Query App Client → Boundary → Encrypted Query App Server → Database]
10. Use Case: Encrypted Search
Forename    Middle Name   Surname    AML Alert?   SAR Alert?
Christina   Flores        Thompson   Yes          No
[Diagram: User or Application, Encrypted Query App Client, Boundary, Encrypted Query App Server, Database. An Encrypted Response (sized to hold biggest possible answer) crosses the Boundary.]
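The request/response flow above can be sketched with a much simpler retrieval problem: fetching one record by position without revealing which position was requested. This is a minimal sketch under stated assumptions, not the system described in the talk. It swaps in the Paillier cryptosystem (additively homomorphic) rather than BFV, uses tiny insecure keys, retrieves by index rather than by matching fields, and every name and value in it is hypothetical.

```python
import math
import secrets

# Toy Paillier keys from two known Mersenne primes (insecure sizes).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public key, shared with the server
N2 = N * N
PHI = (P - 1) * (Q - 1)        # secret key, kept by the client
MU = pow(PHI, -1, N)

def encrypt(m):
    """Paillier encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c):
    """Paillier decryption using the secret values PHI and MU."""
    return (pow(c, PHI, N2) - 1) // N * MU % N

def client_query(index, db_size):
    """Encrypted one-hot selector: E(1) at the wanted index, E(0) elsewhere."""
    return [encrypt(1 if i == index else 0) for i in range(db_size)]

def server_answer(selector, database):
    """Compute E(sum(sel_i * record_i)) over every record, without decrypting."""
    response = 1  # trivial encryption of 0
    for c, record in zip(selector, database):
        # E(sel)^record = E(sel * record); multiplying ciphertexts adds plaintexts.
        response = response * pow(c, record, N2) % N2
    return response

database = [1101, 2202, 3303, 4404]   # server-side records, encoded as integers
query = client_query(2, len(database))
answer = server_answer(query, database)
print(decrypt(answer))  # 3303
```

The server touches every record and returns a single fixed-size ciphertext, so neither the work it does nor the size of the response reveals which record was requested.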
17. Use Case: Encrypted ML Inference
HE enables new use cases for ML:
• Encrypted data (using CKKS), plaintext weights
• Use case: Send sensitive data to model owner for inference. Data owner gets predictions.
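The encrypted-data, plaintext-weights case can be sketched for a linear model, where the model owner only needs to add ciphertexts and multiply them by plaintext numbers. This is a hedged illustration, not the talk's implementation: it swaps in toy Paillier with a fixed-point integer encoding instead of CKKS, the keys are insecurely small, and the features, weights, bias, and function names are all hypothetical.

```python
import math
import secrets

# Toy Paillier keys from two known Mersenne primes (insecure sizes).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                     # public key, shared with the model owner
N2 = N * N
PHI = (P - 1) * (Q - 1)       # secret key, kept by the data owner
MU = pow(PHI, -1, N)
SCALE = 1000                  # fixed-point scale: three decimal digits

def encode(x):
    """Map a real number to a scaled integer."""
    return round(x * SCALE)

def encrypt(m):
    """Encrypt a (possibly negative) integer using only the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt_signed(c):
    """Decrypt, mapping the upper half of [0, N) back to negative integers."""
    m = (pow(c, PHI, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m

def encrypted_linear_score(enc_features, weights, bias):
    """Model owner: compute E(w . x + b) from encrypted x and plaintext w, b."""
    acc = encrypt(encode(bias) * SCALE)          # bias carries scale SCALE**2
    for c, w in zip(enc_features, weights):
        acc = acc * pow(c, encode(w) % N, N2) % N2
    return acc

# Data owner encrypts the features and sends them to the model owner.
features = [1.5, -2.0, 0.25]
enc_features = [encrypt(encode(x)) for x in features]

# Model owner scores them without ever seeing the features.
enc_score = encrypted_linear_score(enc_features, [0.4, 0.1, -2.0], 0.3)

# Data owner decrypts the prediction and removes the fixed-point scale.
score = decrypt_signed(enc_score) / SCALE**2
print(score)  # 0.2
```

The product of two scaled values carries the scale squared, which is why the bias is pre-multiplied by SCALE and the final result is divided by SCALE**2.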
18. Use Case: Encrypted ML Inference
HE enables new use cases for ML:
• Plaintext data, encrypted weights
• Use case: Send sensitive model to data owner for inference. Model owner gets predictions.
19. Use Case: Encrypted ML Inference
HE enables new use cases for ML:
• Encrypted data, encrypted weights
• Use case: Outsource model processing to untrusted (cloud) hardware without revealing model or data
21. Encrypted ML Limitations
• Only polynomial functions
• Only practical for low-depth models
• Extra security constraints due to properties of CKKS
Research field for encrypted ML is very active!
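The "only polynomial functions" limitation means a nonlinear activation such as the sigmoid has to be replaced by a polynomial before it can be evaluated on ciphertexts. The sketch below runs entirely on plaintext and uses the degree-3 Taylor polynomial around zero as the stand-in. That choice is an assumption for illustration; the talk does not say which approximation is used, and practical systems may prefer other fits.

```python
import math

def sigmoid(x: float) -> float:
    """Exact sigmoid; not directly computable with additions and multiplications."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly3(x: float) -> float:
    """Degree-3 Taylor polynomial of sigmoid around 0: 1/2 + x/4 - x^3/48."""
    return 0.5 + x / 4.0 - x**3 / 48.0

def max_error(lo: float, hi: float, steps: int = 1000) -> float:
    """Largest absolute gap between sigmoid and the polynomial on [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(sigmoid(x) - sigmoid_poly3(x)) for x in xs)

print(f"max error on [-1, 1]: {max_error(-1, 1):.4f}")
print(f"max error on [-4, 4]: {max_error(-4, 4):.4f}")
```

The approximation is close near zero and degrades quickly on wider intervals. A higher-degree polynomial would track the sigmoid over a wider range, but each extra multiplication adds depth, which is the tension behind the "low-depth models" bullet.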
22. Want to work on this?
Enveil is hiring!
• Software Engineers, Customer Success, PMs, Sales
• Office in Fulton, MD (hybrid work)
• Tons of interesting engineering problems
• No time tracking!
• Huge impact for U.S. govt and commercial customers
• Generous benefits
• Email ryan@enveil.com or careers@enveil.com