Secure Big-Data Processing
University of California, Irvine, USA.
IEEE Big Data 2019, Los Angeles, California, USA
Anton Burtsev Sharad Mehrotra Shantanu Sharma
• Introduction
• How to securely process data at the cloud?
• Challenges and overview of existing state-of-the-art
• Cryptographic Techniques
• Encryption-based Data Outsourcing
• Secret-Sharing-based Data Outsourcing
• Exploiting Trusted Computing Platforms
• Secure hardware
• Hybrid cloud
• Data Partitioning-based Outsourced Data Processing
• Conclusion and Open Problems
Contents
Storage
Distributed File Systems (DFS)
Hadoop DFS, Google File System
(GFS), Gfarm, Amazon S3
Machine
Google Compute Engine, Amazon
Web Services, Microsoft Azure,
Rackspace OpenCloud
Network
Relational databases, Key-Value
Store, NoSQL, MapReduce
Database operations, e.g., selection,
projection, aggregation, and join, and
clustering, machine learning
IaaS PaaS SaaS
Data providers The public cloud Users
Big-Data Processing in the Cloud
Figure reference: Philip Derbeko, Shlomi Dolev, Ehud Gudes, and Shantanu Sharma. “Security and privacy aspects in MapReduce on clouds: A survey.” Computer Science Review 20 (2016): 1-28.
• Utility model
• Pay for only what you use
• No infrastructure build-up cost and/or
database administration costs
• Elastic
• Use as much as you need (virtually
limitless)
• No system management
headaches
• failure, loss of data, Software
upgrades, patches, bug fixes, etc.
• Cost amortization
• Cheaper due to economy of scale
• Better control over IT investment
Why Cloud?
Public Cloud
Elastic, pay-as-you-go
service
Private
Existing servers or data centers
Hybrid
Utilize both public & private
• Data resides in shared systems
administration of which is not in owners'
control
• Unknown applications and processes share
resources with apps and data.
• Data owners have no control over the
cloud’s internal data security personnel,
policies or their enforcement
• Insider attacks
• Data mining attacks leading to information
leakage
• Cloud providers' compliance with government
subpoenas
Key Challenge: Loss of Control
End Users
Public Cloud
• Availability
• Will the owners always have access to data and services?
• Integrity
• Will the cloud provide answers to queries correctly?
• Security
• Will the cloud implement its own security policies correctly?
• Privacy and confidentiality
• Will sensitive data remain confidential?
• Will data be vulnerable to misuse? By other tenants? By the service provider?
Implications of Loss of Control
What is The Solution?
Encrypt sensitive data before uploading to the cloud
Secure Computing
Download the encrypted data and compute at the trusted side
Cryptographic Solutions at the Cloud Exploiting Trusted Computing
Trusted Private Cloud Untrusted Public Cloud
Download encrypted data
Upload encrypted data
Encrypted data | Cleartext data | The DB owner | Secure hardware | Cleartext data processing
Cleartext results
Encrypted query
• An adversary may learn about data:
• From ciphertext (ciphertext representation-based attack)
• From prior knowledge of data distribution (frequency-count attack)
• From the size of the output to a query (output-size attack)
• From the access pattern used by the mechanism in answering a query (access-
pattern attack)
• From knowledge of queries that have executed (search-pattern attack)
• From knowledge of frequency of queries (workload-skew attack)
Common Attacks in Data Outsourcing
• Honest-but-Curious versus Malicious adversary
• Honest-but-curious
• Executes protocols correctly, but wishes to learn about data
• Malicious
• Might sabotage data or computation
• Passive versus Active Adversary
• Passive
• Makes inferences based on passive observations - ciphertext, queries,
workload, and access patterns
• Active
• May actively inject new data, execute queries, or interfere with the
execution
Adversarial Cloud Model
• Semantic Security
• Access to ciphertext does not help provide any information about the
plaintext other than what the adversary knew a-priori.
• Difficult to use directly
• Equivalent notion – Indistinguishability
• Adversary cannot distinguish between the ciphertexts of two
plaintexts
• Easier to prove using a real-versus-ideal game
• Security definition needs to be adapted in data outsourcing
• Since leakages occur from encrypted data representation and query
execution
Defining Security
Reference: Shafi Goldwasser, and Silvio Micali. "Probabilistic encryption." Journal of computer and system sciences 28, no. 2 (1984): 270-299.
Curtmola, Reza, Juan Garay, Seny Kamara, and Rafail Ostrovsky. "Searchable symmetric encryption: improved definitions and efficient constructions." Journal of Computer Security 19, no. 5 (2011): 895-934.
Security Goal: IND-CKA1: Real Game Model with Leakage
Profile
D0
E(D0)
Leakages
e.g., access-patterns, search-patterns,
output-size
A set of queries
A set of encrypted queries (i.e., trapdoors/tokens) for
the requested set of queries
Security Goal: IND-CKA1: Ideal Game Model with
Leakage Profile
D0
E(D’)
1. Leakages (e.g., access-patterns, search-patterns, output-size) from the real game
2. Generate a fake dataset (D’) having the same leakages
3. Randomly select D0 or D’ and encrypt it
Which dataset is
encrypted – D0 or fake?
The same set of queries as in the real game
A set of encrypted queries (i.e., trapdoors/tokens) for the
requested set of queries
• Many cloud providers support
encryption at rest
• Microsoft Always Encrypted
• Amazon Aurora , MariaDB
Cloud Layers and Security
IaaS
PaaS
SaaS
• Secure MapReduce, Secure
Spark, Secure SQL…
• Microsoft Always Encrypted,
Jana@Galois Inc.,
Pulsar@Stealth Software
Technologies
• Application security
• Garble Cloud, Cloud Protect,
SPORC
Encryption-based Cryptographic Approaches
Encrypted data
Cleartext data
The DB owner
Encrypted processing
Trusted Private
Cloud
Untrusted Public Cloud Users
• Fully homomorphic approach
• Very inefficient and not practical
• Partially homomorphic
• Additive: e.g., Paillier
• Multiplicative: e.g., ElGamal
• Searchable encryption
• Bucketization [Hore et al., VLDB, 04]
• Searchable Encryption [Song et al., IEEE
SP, 00]
• Secure indexes – encrypted Bloom filters
[Goh, 03]
• Order-Preserving Encryption (OPE)
[Agrawal et al., SIGMOD, 04]
• Conjunctive keyword search [Golle et al.,
ACNS, 04]
• Encrypted inverted lists [Curtmola et al.,
CCS, 06]
• Onion encryption [Popa et al., SOSP, 11]
Different approaches
• Different levels of security
• Support different operations
• Different levels of efficiency
MPC and Secret Shared Mechanisms
Untrusted Public Clouds
Users
• Techniques:
• Secret-sharing [Shamir, CACM, 1979]
• Distributed Point Function [Gilboa et al., EUROCRYPT,
2014.]
• Function secret-sharing [Boyle et al., EUROCRYPT, 2015]
• Homomorphic Secret-Sharing [Boyle et al., CCS, 2017]
• Accumulating-Automata [Dolev et al, SCC@ASIACCS ,
2014]
• Obscure [Gupta et al, CS@UCI, 2019]
• Conclave [Volgushev et al. arxiv, 2019]
• SMCQL [Bater et al., PVLDB, 2017]
• Systems:
• Jana by Galois
• Partisia
• PULSAR by Stealth Software Technologies
• Secret Double Octopus and SecretSkyDB Ltd
• Sharemind by Cybernetica
• Unbound Tech.
Secret-Shared Data
Cleartext data
The DB owner
Secret-Shared processing
Trusted Private
Cloud
• Secure against stronger adversaries
• Information-theoretically secure
• Secure against access-pattern-based attacks
• However, much more expensive
• 5-6 orders of magnitude more expensive
than plaintext processing
Cryptographic Techniques vs Security Threats
represents technique is resilient to a given attack.
Resilient to attacks
Techniques Data at rest During query execution
Ciphertext
indistinguishability
Output-
Size
Workload-skew Access-patterns
Full Download
Deterministic Encryption/OPE X X X X
Non-Deterministic Encryption X X X
Searchable encryption X X X
Homomorphic + ORAM X X
Shamir’s Secret-sharing X X
Multi-party computations-Jana X X
Reference: Sharad Mehrotra, Shantanu Sharma, and Jeffrey D. Ullman. "Scaling Cryptographic Techniques by Exploiting Data Sensitivity at a Public Cloud." In Proceedings of the Ninth ACM Conference on Data
and Application Security and Privacy, pp. 165-167. ACM, 2019.
• Efficiency
• How expensive are the cryptographic operations? Is operation linear or sublinear
in the size of the data (indexable versus non-indexable)?
• Generality
• What queries can the technique support – selection, range, join, aggregation
• Dynamic Operations
• Does the scheme support insertion/deletions/updates?
• Client-Side Execution
• How much work does the client have to do? During insertion/updates/queries.
• Security
• How much security does the scheme offer? Quantifiable leakage, e.g.,
orderability, distribution? Semantic security?
Cryptographic Techniques – Design Criteria
Exploiting Trusted Platform
Trusted Private
Cloud
Untrusted Public Cloud Users
Trusted Private
Cloud Untrusted Public Cloud Users
Hybrid Cloud Scenario
Secure Hardware Scenario
Cleartext non-sensitive data | Cleartext sensitive data | The DB owner | Cleartext non-sensitive data processing
Secure
hardware
Cleartext sensitive data processing
• Distribute computation between
untrusted platform and trusted
platform
• Solutions differ on the trusted platform
exploited, degree of integration, security
offered, and computations supported
• Hybrid Cloud-based Solutions
• HybrEx, SEMROD, Sedic
• Secure FPGA-based solutions
• Microsoft Cipherbase
• Intel SGX-based solutions
• Opaque, EnclaveDB, VC3, HardIDX
• Minimizing data movement between trusted and untrusted platforms
• Movement between trusted and untrusted platforms can lead to leakage
• Mapping complex operator workflow between trusted and untrusted
platforms
• Existing trusted hardware is vulnerable to side-channel attacks
• Oblivious access at different levels, e.g., register and cache-line
• Cost vs security
Trusted Platform – Challenges
Security Techniques vs Computation Cost
Selecting a single row from TPC-H Customer table of 1.5M rows and 8 columns
Searchable encryption: DSSE: Distributed
Searchable Symmetric Encryption (PULSAR
by Stealth Software Technologies)
MPC: Multi-party computation (Jana by
Galois)
Opaque SGX based solution [Zhang et al.,
NSDI, 2017]
• Cryptographic Overheads:
• Searchable encryption – ~2 orders of magnitude
• Secure hardware - ~3-4 orders of magnitude
• MPC based solution - ~5-6 orders of magnitude
Can we design an outsourcing solution that is
simultaneously:
Efficient – significantly better compared to downloading
cryptographically secured data, and
Secure – similar to downloading the data and local processing
Secure Data Outsourcing: Challenge
A possible approach??
Partitioned computing that exploits partial sensitivity of data
to restrict cryptographic overheads to only sensitive data
Trusted Private Cloud Untrusted Public Cloud Users
Cleartext non-sensitive data | Cleartext sensitive data | The DB owner | Partition computation | Encrypted sensitive data
Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE ICDE, pp. 650-661. IEEE, 2019.
• Organization data is often only partially sensitive
• Sensitivity dictated by policies
• Sensitivity dictates what data is outsourced and in what form
• E.g., general office emails are possibly not sensitive (hence outsourced)
• Information related to a sensitive project is sensitive (hence not outsourced in
plaintext)
• Can we exploit the partially sensitive nature of data to scale cryptographic
solutions without compromising the security of sensitive data?
• Commercial encrypted database solutions (e.g., Jana by Galois Inc.) are beginning
to explore such solutions
Data Sensitivity
Partitioned Data Security Challenge
• Non-Linkability
• The adversary does not learn the relationship between any encrypted and plaintext
value
• Ciphertext Indistinguishability
• The adversary does not learn any relationships between encrypted values
• unless the underlying crypto allows such relationships to be learnt (e.g., OPE)
Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data."
In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 650-661. IEEE, 2019.
Cryptographic Solutions
• Encryption-based Techniques
• Bucketization [Hore et al. VLDB 04]
• Searchable Encryption [Song et al., IEEE SP 00]
• Secure indexes – encrypted Bloom filters [Goh, 03]
• Bilinear maps [Boneh et al., EuroCrypt 03]
• Order-Preserving Encryption (OPE) [Agrawal et al.,
SIGMOD 04]
• Modular-OPE [Boldyreva et al., CRYPTO 11]
• Conjunctive keyword search [Golle et al., ACNS 04]
• Encrypted inverted lists [Curtmola et al., CCS 06]
• Fully homomorphic encryption [Gentry, STOC 09]
• Onion encryption [Popa et al., SOSP 11]
• Dynamic Searchable Encryption [Cash et al., NDSS 14]
• PBTree [Li et al., VLDB 14]
• IBTree [Li et al., ICDE 17]
• Secret-Sharing Techniques
• Shamir’s secret-sharing [Shamir, CACM 79]
• Multi-Linear Secret-Sharing Schemes [Brickell et al., J. of
Cryptology 91, Bertilsson et al., AUSCRYPT 92]
• Verifiable secret sharing [Rabin et al., STOC 89]
• Proactive Secret Sharing [Herzberg et al., CRYPTO 95]
• Function Secret Sharing [Boyle et al., EUROCRYPT 15]
• Homomorphic secret sharing [Boyle et al. CRYPTO 16]
• Accumulating Automata [Dolev et al., TCS 19]
• Encryption-based Systems
• CryptDB [Popa et al., SOSP 11]
• Monomi [Tu et al., VLDB 13]
• Cipherbase [Arasu et al., CIDR 13]
• TrustedDB [Bajaj et al., IEEE TKDE 13]
• CorrectDB [Bajaj et al., VLDB 13]
• ZeroDB [Egorov et al., arxiv 16]
• MrCrypt [Tetali et al., OOPSLA 13]
• EncKV [Yuan et al., ASIACCS 17]
• Microsoft Always Encrypted
• Oracle 12c
• Amazon Aurora
• MariaDB
• Secret-Sharing-based Systems
• SSSDB [Avni et al., ALGOCLOUD 15]
• Splinter [Wang et al., NSDI 17]
• OBSCURE [Gupta et al, VLDB 19]
• Cybernetica
• Jana by Galois Inc.
• Partisia
• Secret Double Octopus
• SecretSkyDB Ltd
• PULSAR by Stealth Software Technologies Inc.
• Unbound Tech.
Problems
employee
EmpID name DID
E1 Alice D1
E2 Bob D2
E3 Carl D1
department
DDID Dname
D1 Sale
D2 Coding
On the relations, execute the following in a secure manner:
1. Selection query
(e.g., SELECT * FROM employee WHERE name = 'Alice')
2. Join query
(e.g., SELECT * FROM employee INNER JOIN department ON employee.DID = department.DDID)
3. Aggregation query
(e.g., SELECT count(*) FROM employee WHERE DID = 'D1')
ID Dept Comment
Id1 D1 W1
Id2 D1 W2
...
Idi Di Wi
Idk Dk Wk
Searchable Encryption: Ciphertext Generation
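A relation
Each word Wi is first encrypted with deterministic encryption Ek(): Wi → Ek(Wi)
Partitioning the encrypted word into two parts: Ek(Wi) = (Li, Ri), with Li of n-m bits and Ri of m bits
Key generation: ki = fk(Li)
Pseudorandom string: (Si, Ti), with Si of n-m bits and Ti = fki(Si) of m bits
Ciphertext (CT) = (Li, Ri) ⊕ (Si, Ti), of n bits
Trapdoor for Wi: Ek(Wi) and ki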
Reference: Dawn Xiaoding Song, David Wagner, and Adrian Perrig. “Practical techniques for searches on encrypted data.”
In Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000, pp. 44-55. IEEE, 2000.
Searchable Encryption: Search at the Cloud
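User provided values: Ek(Wi) and ki = fk(E1)
Partitioning the encrypted word into two parts: Ek(Wi) = (E1, E2), with E1 of n-m bits and E2 of m bits
Partitioning the ciphertext into two parts: CT = (CTLi, CTRi), with CTLi of n-m bits and CTRi of m bits
The cloud computes CT ⊕ Ek(Wi) = (Si, Ti)
Matching or not? The word matches if Ti = fki(Si)
Data outsourcing method: CT = Ek(Wi) ⊕ (Si, Ti), where Ti = fki(Si)
Idea
A ⊕ B = C
A ⊕ C = B
B ⊕ C = A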
Reference: Dawn Xiaoding Song, David Wagner, and Adrian Perrig. “Practical techniques for searches on encrypted data.”
In Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000, pp. 44-55. IEEE, 2000.
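The ciphertext generation and search steps of this scheme can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions: HMAC-SHA256 stands in for the pseudorandom functions f, a truncated keyed hash stands in for the deterministic encryption Ek() (so it is not invertible here), the pseudorandom string Si is derived from the word position, and the sizes N and M and all function and key names are hypothetical.

```python
import hashlib
import hmac

N, M = 16, 4  # assumed sizes in bytes: n-byte ciphertext, m-byte check part


def prf(key: bytes, msg: bytes, length: int) -> bytes:
    """HMAC-SHA256 used as a stand-in pseudorandom function."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_word(word: str, position: int, k_enc: bytes, k_key: bytes, k_stream: bytes) -> bytes:
    x = prf(k_enc, word.encode(), N)          # stand-in for Ek(Wi)
    left = x[: N - M]                         # Li (Ri is the remaining m bytes)
    k_i = prf(k_key, left, 32)                # ki = fk(Li)
    s_i = prf(k_stream, position.to_bytes(8, "big"), N - M)  # Si
    t_i = prf(k_i, s_i, M)                    # Ti = fki(Si)
    return xor(x, s_i + t_i)                  # CT = Ek(Wi) XOR (Si, Ti)


def trapdoor(word: str, k_enc: bytes, k_key: bytes):
    x = prf(k_enc, word.encode(), N)
    return x, prf(k_key, x[: N - M], 32)      # user provided values: Ek(Wi), ki


def matches(ciphertext: bytes, x: bytes, k_i: bytes) -> bool:
    pair = xor(ciphertext, x)                 # recovers (Si, Ti) only on a match
    s, t = pair[: N - M], pair[N - M:]
    return hmac.compare_digest(prf(k_i, s, M), t)
```

The cloud has to call `matches` on every stored ciphertext, which is the linear scan noted as the disadvantage of this approach.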
Advantage
Does not reveal anything before the query execution, unlike
deterministic encryption, which reveals information before query execution
Disadvantage
Linearly scan the entire data, i.e., no index support
Question
Can we have indexable searchable encryption?
• The cloud maintains an index
• User sends keywords and the cloud traverses the index to answer the query
• Issues:
• Index Generator: Who will create an index – the DB owner vs server?
• Most work assumes that the DB owner creates the index
• Index Traverse: Interactive vs non-interactive – can the cloud traverse the index on its
own?
• Index Update: Can the cloud update the index?
• Techniques:
• Early approaches: The DB owner generated, interactive traversal, non-updateable
• Exploit oblivious techniques
• Implemented by Stealth Software Technologies, Inc.
• Recent: PB-Tree: The DB owner generated, non-interactive traversal, updateable
Indexable Searchable Encryption
Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky: Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions.
Our
focus
• Consider
• A number n
• The number of bits needed to represent the number n in binary form = w
• Prefix family will contain w+1 items
• Prefix family
• Consider a number 6
• 6 in 5-bit binary = (00110)
• Prefix family of 6 is F(6) = {00110, 0011*, 001**, 00***, 0****,*****}
• What will a node of the index contain?
• Leaf node: Prefix family of one of the data items
• Other nodes: Union of prefix families of their child nodes
Indexable Searchable Encryption
Reference:
Li, Rui, Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. "Fast range query processing with strong privacy protection for cloud computing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1953-1964.
Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017.
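A minimal sketch of the prefix-family construction described above, assuming numbers are written as w-bit binary strings; the function name `prefix_family` is chosen here for illustration only.

```python
def prefix_family(n: int, w: int) -> list[str]:
    """Return the w+1 prefixes of the w-bit binary form of n, most specific first."""
    bits = format(n, f"0{w}b")
    # Replace the last i bits with wildcards, for i = 0 .. w.
    return [bits[: w - i] + "*" * i for i in range(w + 1)]
```

For example, `prefix_family(6, 5)` yields the six prefixes listed for F(6) on the slide.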
• Step 1:
• Find prefix family of all
such numbers
• Step 2:
• Allocate the prefix family
to the root node
• Step 3:
• Divide the numbers in the given node between child nodes,
repeating until each leaf node
contains the prefix family of
one of the given numbers
Create Index using Prefix Family: Top-Down Way
Create index on the following numbers
1, 6, 7, 9, 11, 12, 13, 16, 20, 25
Index tree (indentation shows parent-child relationships):
F(1), F(6), F(7), F(9), F(11), F(12), F(13), F(16), F(20), F(25)
  F(1), F(6), F(7), F(16), F(20)
    F(1), F(6), F(7)
      F(1)
      F(6), F(7)
        F(6)
        F(7)
    F(16), F(20)
      F(20)
      F(16)
  F(9), F(11), F(12), F(13), F(25)
    F(12), F(13), F(25)
      F(12), F(13)
        F(12)
        F(13)
      F(25)
    F(9), F(11)
      F(9)
      F(11)
• Step 1:
• User creates prefix family of 6 and sends to the cloud
• Step 2:
• The cloud starts from the root node to find the prefix family of the given query
Execute a Point Query using the Index
F(1), F(6), F(7), F(9) ,F(11), F(12), F(13), F(16), F(20), F(25)
F(1), F(6), F(7), F(16), F(20) F(9), F(11), F(12), F(13), F(25)
F(1), F(6), F(7) F(16), F(20) F(12), F(13), F(25)
F(6),F(7)
F(1)F(6) F(7) F(20) F(16)
F(12), F(13)
F(12) F(13) F(25) F(9) F(11)
F(9), F(11)
Query: Find 6
F(6) 
Reference:
Li, Rui, Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. "Fast range query processing with strong privacy protection for cloud computing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1953-1964.
Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017.
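The top-down index construction and the point query can be sketched in cleartext as below. This is an assumption-laden illustration, not the PB-Tree implementation: nodes store plain prefix sets rather than Bloom filters, each node is split into two halves (the slides split the numbers differently), and a point query is sent as the full w-bit string of the value. The names `build` and `search` are hypothetical.

```python
def prefix_family(n: int, w: int) -> list[str]:
    bits = format(n, f"0{w}b")
    return [bits[: w - i] + "*" * i for i in range(w + 1)]


def build(values: list[int], w: int) -> dict:
    """Each node holds the union of the prefix families of the values below it."""
    node = {
        "prefixes": set().union(*(prefix_family(v, w) for v in values)),
        "children": [],
        "value": None,
    }
    if len(values) == 1:
        node["value"] = values[0]           # leaf: prefix family of one data item
    else:
        mid = len(values) // 2              # assumed split rule: two halves
        node["children"] = [build(values[:mid], w), build(values[mid:], w)]
    return node


def search(node: dict, query_prefixes: list[str]) -> list[int]:
    """Descend only into nodes whose prefix set contains a query prefix."""
    if not any(p in node["prefixes"] for p in query_prefixes):
        return []
    if node["value"] is not None:
        return [node["value"]]
    return [v for child in node["children"] for v in search(child, query_prefixes)]
```

With the ten numbers from the slide and w = 5, searching for `["00110"]` reaches only the leaf holding 6.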
A
B C
D E
F
G
H
I
Execute a Range Query
F(1), F(6), F(7), F(9) ,F(11), F(12), F(13), F(16), F(20), F(25)
F(1), F(6), F(7), F(16), F(20) F(9), F(11), F(12), F(13), F(25)
F(1), F(6), F(7) F(16), F(20) F(12), F(13), F(25)
F(6),F(7)
F(1)F(6) F(7) F(20) F(16)
F(12), F(13)
F(12) F(13) F(25) F(9) F(11)
F(9), F(11)
Query: Find all numbers in the range [0,8]
• Step 1:
• Represent the range endpoints by their prefix families
• F(0)= {00000, 0000*, 000**, 00***, 0****,*****}
• F(8) = {01000, 0100*, 010**, 01***, 0****,*****}
• Step 2:
• Find the minimum set of prefixes whose union covers the range
• {00***,01000}
• Step 3:
• Check each node for the
minimum set of
prefixes
000 → 0
001 → 1
010 → 2
011 → 3
100 → 4
101 → 5
110 → 6
111 → 7
A
B C
D E
F
G
H I
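Step 2 of the range query, finding the minimum set of prefixes that covers a range, can be sketched as a walk over the implicit binary trie of w-bit numbers. The function name `range_to_prefixes` is an assumption made for this example.

```python
def range_to_prefixes(lo: int, hi: int, w: int) -> list[str]:
    """Minimum set of w-bit prefixes whose union is exactly [lo, hi]."""
    cover: list[str] = []

    def visit(prefix: str) -> None:
        span = w - len(prefix)                     # number of wildcard bits
        start = (int(prefix, 2) << span) if prefix else 0
        end = start + (1 << span) - 1              # prefix covers [start, end]
        if end < lo or start > hi:
            return                                 # disjoint from the range
        if lo <= start and end <= hi:
            cover.append(prefix + "*" * span)      # fully inside: keep as one prefix
            return
        visit(prefix + "0")                        # partial overlap: refine
        visit(prefix + "1")

    visit("")
    return cover


def prefix_family(n: int, w: int) -> set[str]:
    bits = format(n, f"0{w}b")
    return {bits[: w - i] + "*" * i for i in range(w + 1)}
```

A data item falls in the range exactly when its prefix family shares a prefix with the cover, which is the membership check a node performs in Step 3.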
• Indistinguishability
• Use Bloom filters
• Any prefix will be hashed to r locations using HMAC with r keys
• Node Indistinguishability
• Associate each node v with a random number v.R, then hash r times as follows:
• HMAC(k1, v.R, p), …, HMAC(kr, v.R, p)
• Reverse engineering
• An adversary can do reverse engineering to create PB-tree after observing many
queries or by asking queries
• How to solve this issue?
• IB-Tree
Issues with the Index (PB-Tree)
Reference:
Li, Rui, Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. "Fast range query processing with strong privacy protection for cloud computing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1953-1964.
Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017.
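The Bloom-filter encoding with per-node randomness can be illustrated as follows. This sketch makes an assumption about how the r keyed hashes and v.R are combined: the key holder first computes HMAC(ki, p), and the node-specific location is then derived by hashing that value with v.R, so that the cloud can test membership without holding the keys. The class name, filter size, and key values are hypothetical.

```python
import hashlib
import hmac
import os


def trapdoor(keys: list[bytes], prefix: str) -> list[bytes]:
    """Computed by the key holder: one keyed hash of the prefix per key k1..kr."""
    return [hmac.new(k, prefix.encode(), hashlib.sha256).digest() for k in keys]


class NodeFilter:
    """Bloom filter of one index node v, randomized by its own value v.R."""

    def __init__(self, size_bits: int = 256) -> None:
        self.size = size_bits
        self.node_random = os.urandom(16)          # v.R
        self.bits = bytearray(size_bits)

    def _locations(self, trap: list[bytes]):
        for t in trap:
            digest = hmac.new(self.node_random, t, hashlib.sha256).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, trap: list[bytes]) -> None:
        for loc in self._locations(trap):
            self.bits[loc] = 1

    def might_contain(self, trap: list[bytes]) -> bool:
        return all(self.bits[loc] for loc in self._locations(trap))
```

Because each node has its own v.R, the same prefix (for example 0011*, shared by F(6) and F(7)) is expected to set different bit positions in different nodes.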
Two nodes may contain overlapping prefix families
F(6) = {00110, 0011*, 001**, 00***, 0****,*****}
F(7) = {00111, 0011*, 001**, 00***, 0****,*****}
• Indistinguishable Bloom Filter
• Twin cell: 0 and 1
• For i-th location, which cell stores 1: HMAC(kr+1, i) ⊕ rB
• rB is a random number for IBF B.
• IB-Tree
• A tree like PBtree, but all nodes use Indistinguishable Bloom Filter
Searchable Encryption for Adaptive Adversary
Reference and slide credit: Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017.
Query: Name = John and Age = [1,15]
Query prefixes: (Name: John, Age: 0001) ∪ (Name: John, Age: 001*) ∪ (Name: John, Age: 01**) ∪ (Name: John, Age: 1***)
[Figure: IB-Tree with nodes NR, N11-N12, N21-N23, N31-N37 and data items d1-d7; each node is marked as selected or unselected for the query.]
Processing Range Queries on IB Tree
Slide credit: Alex Liu: Adaptively Secure Conjunction Query
Processing over Encrypted Data for Cloud Computing.
Minimum set of prefixes such that union of prefixes cover the range
•Secure execution of selection queries
• Point and range queries
• Indexable vs non-indexable
•What about join and aggregation?
What have we discussed so far?
Bucketization
NAME SALARY
John 54500
Mary 111029
James 95300
Lisa 145000
E-tuple Bucket_id
fErf!$Q!! Xr2k%s
F%%3w& 11vb$$
&%gfsdf$ bbcr3@
%%33w& Xxrty*
Q: SELECT name FROM EMPLOYEE
WHERE salary ≥ 90k AND salary < 110k
false positive
Q: SELECT name FROM EMPLOYEE
WHERE Bucket_id = bbcr3@
OR Bucket_id = 11vb$$
Bucket ID
30k – 50k 1bx!23
50k – 70k Xr2k%s
70k – 80k Rtes12!
80k – 90k Cvtr^e
90k – 100k bbcr3@
100k – 115k 11vb$$
115k – 130k 23wqa%
130k – 160k Xxrty*
Pros
• Generality: allows large class of predicates to be evaluated (most of SQL)
• Efficient implementation: index
Cons
• Incurs overhead on client: pruning of false positives
Database Owner Site Cloud Site
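The bucketization workflow on this slide can be sketched end to end as below, using the bucket boundaries and bucket ids shown above. Note the loud assumption: `seal` and `unseal` use base64 only as a stand-in so the example runs with the standard library; they are not encryption. All function names are hypothetical.

```python
import base64
import json

# (low, high, bucket id): salary ranges [low, high) taken from the slide.
BUCKETS = [
    (30_000, 50_000, "1bx!23"), (50_000, 70_000, "Xr2k%s"),
    (70_000, 80_000, "Rtes12!"), (80_000, 90_000, "Cvtr^e"),
    (90_000, 100_000, "bbcr3@"), (100_000, 115_000, "11vb$$"),
    (115_000, 130_000, "23wqa%"), (130_000, 160_000, "Xxrty*"),
]


def seal(row: dict) -> str:
    """Stand-in for tuple encryption (base64 is NOT encryption)."""
    return base64.b64encode(json.dumps(row).encode()).decode()


def unseal(blob: str) -> dict:
    return json.loads(base64.b64decode(blob))


def bucket_id(salary: int) -> str:
    for low, high, bid in BUCKETS:
        if low <= salary < high:
            return bid
    raise ValueError("salary outside all buckets")


def outsource(rows: list[dict]) -> list[tuple[str, str]]:
    """DB owner side: produce (E-tuple, Bucket_id) pairs for the cloud."""
    return [(seal(r), bucket_id(r["salary"])) for r in rows]


def rewrite_range(lo: int, hi: int) -> set[str]:
    """DB owner side: bucket ids overlapping the predicate lo <= salary < hi."""
    return {bid for low, high, bid in BUCKETS if low < hi and lo < high}


def cloud_select(table: list[tuple[str, str]], ids: set[str]) -> list[str]:
    """Cloud side: filter on bucket ids only."""
    return [etuple for etuple, bid in table if bid in ids]


def client_filter(etuples: list[str], lo: int, hi: int) -> list[str]:
    """DB owner side: decrypt and prune false positives."""
    return [r["name"] for r in map(unseal, etuples) if lo <= r["salary"] < hi]
```

For the query on the slide (salary ≥ 90k and < 110k), the cloud returns two tuples and the client-side filter drops Mary as the false positive.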
• Buckets’ impact
• Query execution overhead
• Security
• Security metrics
• How large is the span of the bucket? – larger the better
• How are the frequencies distributed? More uniform the better
• Cost metrics
• How many false positives are generated for a predicate?
• What is the storage overhead due to metadata ?
• Improving security
• Introduce randomness to increase security level
Bucketization
Reference: Hakan Hacigümüş, Bala Iyer, Chen Li, and Sharad Mehrotra. "Executing SQL over encrypted data in the database-service-provider model." In Proceedings of the 2002 ACM SIGMOD international conference on
Management of data, pp. 216-227. ACM, 2002.
• We can do
• Joins at the cloud-side based on bucket-ids
• But with computational overhead at the DB owner due to filtering
• Can we avoid computation overheads at the DB owner in join
operation?
• Precompute join operation before outsourcing the data
What Have We Seen in Bucketization?
• Represent data in different format
• Execute join among tables before outsourcing
Precomputed Joins: Different Representation of Datasets
Slide credit: Seny Kamara and Tarik Moataz:
SQL on Structurally-Encrypted Databases
Data Outsourcing
Precomputed Joins
Reference and slide credit: Seny Kamara and Tarik Moataz. "SQL on structurally-encrypted databases." In International Conference on the Theory and Application of Cryptology and Information Security,
pp. 149-180. Springer, Cham, 2018.
Join
ProjectionSelectionSelection Projection
Disadvantages:
1. Joins are precomputed
2. Aggregation queries cannot be executed at the cloud
3. Complex queries cannot be solved at the cloud
• Non-indexable Searchable Encryption
• Indexable Searchable Encryption for point and range queries
• Bucketization for join, aggregation, and most of SQL
• Precomputed joins
•Is there any system based on these techniques or based
on encryption?
What have we seen so far?
• CryptDB
• Monomi
• Arx
• Cipherbase
• TrustedDB
• CorrectDB
• SDB
• EncKV
Encryption-based Systems
• ZeroDB
• MrCrypt
• Crypsis
• Microsoft Always Encrypted
• Oracle 12c
• Amazon Aurora
• MariaDB
• Can be seen as a two-column table
• One column for key
• Another column for value
• Also, they can store complicated relational table in this format
• Example:
• Person database:
• Key: Person ID, Value: Person record
• Key: City, Value: PersonID
• Key: PersonID, Value: Name
Key-Value Store
Person id name age city
1001 alice 20 LA
1002 bob 25 LA
1003 tom 20 NY
Key = City
Value = PersonID
LA 1001
LA 1002
NY 1003
Key = PersonID
Value = Name
1001 alice
1002 bob
1003 tom
EncKV: Encrypted Key-Value Store
1
5
2
4
3
6
Server
Hash on Row Id to allocate key-value pair to a server
LA 1001
LA 1002
NY 1003
city Person
id
Person
Id
Name
Reference and Slide credit: Xingliang Yuan, Yu Guo, Xinyu Wang, Cong Wang, Baochun Li, and Xiaohua Jia. “Enckv: An encrypted key-value store with rich queries.”
In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 423-435. ACM, 2017.
H(Gk(city||LA||i),2) Enck(1002)
H(Gk(name||bob||i),1) Enck(1002)
H(Gk(city||LA||i),1) Enck(1001)
H(Gk(city||NY||i),1) Enck(1003)
… …
Server i
Attribute
Name
Attribute
Value
Server i
Value
Occurrence
Double
encryption
1001 alice
1002 bob
1003 tom
EncKV: Encrypted Key-Value Store
H(Gk(city||LA||i),2) Enck(1002)
H(Gk(name||bob||i),1) Enck(1002)
H(Gk(city||LA||i),1) Enck(1001)
H(Gk(city||NY||i),1) Enck(1003)
… …
Enck(1001)
Enck(1002)
2
Encrypted exact match Index
Pk(name||1002) Enck(bob)
Pk(age||1001) Enck(20)
Pk(name||1001) Enck(alice)
… …
Server i
Encrypted data records
select “name” where “city=LA”
Gk(city||LA||i)1
Pk(name||1001)
Pk(name||1002)
3
Enck(bob)
Enck(alice)
4
• Observations:
• The server does not learn whether the index entries of two different values belong to the same attribute
or not before query execution, e.g., H(Gk(city||LA||i),1) and H(Gk(city||NY||i),1) for LA and NY.
• At any two servers, the index entries for the same attribute are different, e.g., H(Gk(city||LA||i),1) and
H(Gk(city||LA||j),1) for server i and j.
P, H, G: PRF
Reference and slide credit: Xingliang Yuan, Yu Guo, Xinyu Wang, Cong Wang, Baochun Li, and Xiaohua Jia. “Enckv: An encrypted key-value store with rich queries.” In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 423-435. ACM, 2017.
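The encrypted exact-match index and the query flow (steps 1 to 4 above) might look roughly like this for a single server i. This is a simplified sketch, not the EncKV implementation: HMAC-SHA256 stands in for the PRFs P, H, and G, `enc`/`dec` are a toy PRF-pad cipher limited to 32-byte values, and the keys and function names are hypothetical.

```python
import hashlib
import hmac
import os

K_G, K_P, K_ENC = b"g-key", b"p-key", b"enc-key"   # hypothetical keys
SERVER_ID = 1                                       # the server index i


def prf(key: bytes, *parts) -> bytes:
    msg = "||".join(str(p) for p in parts).encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


def enc(key: bytes, text: str) -> bytes:
    """Toy randomized cipher: XOR with a PRF pad derived from a fresh nonce."""
    nonce, data = os.urandom(16), text.encode()
    assert len(data) <= 32
    return nonce + bytes(a ^ b for a, b in zip(data, prf(key, nonce.hex())))


def dec(key: bytes, blob: bytes) -> str:
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, prf(key, nonce.hex()))).decode()


def build(records: dict) -> tuple[dict, dict]:
    index, data, counters = {}, {}, {}
    for rid, rec in records.items():
        for attr, value in rec.items():
            token = prf(K_G, attr, value, SERVER_ID)           # Gk(attr||value||i)
            c = counters[(attr, value)] = counters.get((attr, value), 0) + 1
            index[prf(token, c)] = enc(K_ENC, str(rid))        # H(token, c) -> Enck(id)
            data[prf(K_P, attr, rid)] = enc(K_ENC, str(value)) # Pk(attr||id) -> Enck(value)
    return index, data


def server_lookup(index: dict, token: bytes) -> list[bytes]:
    """Server side: enumerate occurrence counters until an entry is missing."""
    found, c = [], 1
    while prf(token, c) in index:
        found.append(index[prf(token, c)])
        c += 1
    return found


def select(index: dict, data: dict, want: str, attr: str, value) -> list[str]:
    token = prf(K_G, attr, value, SERVER_ID)                       # client sends token
    ids = [dec(K_ENC, blob) for blob in server_lookup(index, token)]  # decrypt ids
    return [dec(K_ENC, data[prf(K_P, want, rid)]) for rid in ids]  # fetch and decrypt
```

The list of encrypted ids returned by `server_lookup` grows with the number of matching records, which is the communication overhead raised on the next slide.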
Double
encryption
Communication Overhead
What if more than 1M people from LA are in the table?
• Some encryption techniques are fast, but reveal information
• Deterministic encryption is fast but reveals distribution of values
• Order-Preserving encryption (OPE) is fast but reveals order of the values
• Searchable encryption is fast but secure only until a query is executed; after that, it
reveals data distribution or order of the values
• Bucketization is more secure as compared to above techniques and fast
• Retrieve more items
• Require client-side processing
• CryptDB is fast but insecure, due to using deterministic encryption and OPE
• Open issues:
• Need a fast and secure encryption technique that can support
different types of SQL queries
• Need an index that a cloud can build
Pros and Cons of Encryption-based Techniques
Secret-Sharing-based Data
Outsourcing
•Encryption techniques are computationally secure
• A powerful adversary can break the encryption technique
• Google, with sufficient computational capabilities, produced a practical SHA-1 collision (https://shattered.io/)
•Information-theoretical security
• Secure regardless of the computational power of an
adversary
• Quantum secure
Why Secret-Sharing?
Shamir’s Secret-Sharing (SSS) [Shamir79] – Key Idea
• One point → infinite number of lines
• Two points → only one line
• Where f(0) is the secret
• Alice wants to share her secret value 5 with Bob and Carl
• Bob and Carl do not communicate with each other
• Impact of degree of the polynomial vs security
• 𝑓 servers collude → polynomial degree should be 𝑓 + 1
• Servers do not collude → a polynomial of degree 1
• Fault tolerant
• Due to creating multiple shares
Reference: Adi Shamir. “How to share a secret.” Communications of the ACM 22, no. 11 (1979): 612-613.
Shamir’s Secret-Sharing (SSS)
Secret
S
Secret Owner Non-Communicating Public Servers
s1
s2
s3
s4
Mathematical operations
f(x) = S + ax
Each server
cannot learn
the secret S
Secret-Share Creation:
e.g., under the assumption that
no server will collude
Reference: Adi Shamir. “How to share a secret.” Communications of the ACM 22, no. 11 (1979): 612-613.
Shamir’s Secret-Sharing (SSS)
Secret
S
Secret Owner Non-Communicating Public Servers
s1
s2
s3
s4
Lagrange Interpolation
Secret Reconstruction
e.g., under the assumption that
no server will collude
Reference: Adi Shamir. “How to share a secret.” Communications of the ACM 22, no. 11 (1979): 612-613.
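Share creation and reconstruction by Lagrange interpolation can be sketched as follows. The prime modulus P, the evaluation points x = 1..n, and the function names are assumptions made for this example; the degree-1 polynomial f(x) = S + ax from the slide corresponds to `degree=1`.

```python
import random

P = 2**61 - 1  # a prime modulus chosen for this sketch


def make_shares(secret: int, degree: int, n: int, rng=random.SystemRandom()) -> list[tuple[int, int]]:
    """Evaluate a random polynomial with f(0) = secret at x = 1..n."""
    coeffs = [secret % P] + [rng.randrange(1, P) for _ in range(degree)]
    return [
        (x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
        for x in range(1, n + 1)
    ]


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the field of integers mod P."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, -1, P)) % P
    return secret
```

With a degree-1 polynomial, any two of the four shares s1..s4 reconstruct the secret, while a single share is consistent with every possible secret.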
• Similar to Order-Preserving Encryption (OPE)
• If cleartext values have a relation, such as 𝒂 < 𝒃, then
• S(a) < S(b)
• Efficient for maximum/minimum and range queries
Order-Preserving Secret-Sharing
Reference: Fatih Emekci, Ahmed Methwally, Divyakant Agrawal, and Amr El Abbadi. “Dividing secrets to secure data outsourcing.” Information Sciences 263 (2014): 198-210.
Computing over Secret Shared Data
Secret Sharing
Communicating Servers
(Jana and Sharemind)
Non-communicating
servers (SSSDB, OBSCURE)
• Selection and aggregation queries
• Significant communication overheads
amongst servers
• Selection and aggregation queries
Our
focus
• Outsource the above relation using Shamir’s secret-sharing
• Add all secret-shared values of ‘Salary’ attributes
• Exploit additive homomorphic property
Simple Aggregation using Secret-Shared Data
EmpID Name Salary Dept
E101 John 1000 Testing
E101 John 100000 Security
E102 Adam 5000 Testing
E103 Eve 2000 Design
SELECT SUM(Salary) FROM Employee
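The additive homomorphic property used for this SUM query can be sketched as below: each server adds the shares it holds, and the user interpolates the per-server sums. The modulus P, the degree-1 polynomials, the use of three servers, and the function names are assumptions of this sketch; a real deployment would draw coefficients from a cryptographically secure source.

```python
import random

P = 2**61 - 1  # a prime modulus chosen for this sketch


def share(secret: int, n_servers: int, rng: random.Random) -> list[int]:
    """Shares f(1), ..., f(n) of f(x) = secret + a*x with a random slope a."""
    a = rng.randrange(1, P)
    return [(secret + a * x) % P for x in range(1, n_servers + 1)]


def server_sum(column_shares: list[int]) -> int:
    """Run locally at one server: add its shares of the Salary column."""
    return sum(column_shares) % P


def reconstruct_from_two(y1: int, y2: int) -> int:
    """Interpolate a line through (1, y1) and (2, y2) at x = 0: 2*y1 - y2."""
    return (2 * y1 - y2) % P
```

Since the sum of degree-1 polynomials is again a degree-1 polynomial whose constant term is the sum of the secrets, the servers never need to communicate with each other.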
•Aggregation with complex selection obliviously, i.e.,
access-pattern hiding
•Complex Selection Query Execution
•Join Query Execution
Challenges
• The DB owner keeps each polynomial,
which was used to create database
shares
• To execute a query, the DB owner
creates shares of the query predicate
and fetches the desired value from the
clouds
• Very fast
• Access-pattern attack
• Distribution revealing
DB Owner Assisted Query Execution
Reference: Fatih Emekci, Ahmed Methwally, Divyakant Agrawal, and Amr El Abbadi. “Dividing secrets to secure data outsourcing.” Information Sciences 263 (2014): 198-210.
• How to search on secret-shared outsourced data
• Without remembering the polynomials that were used to create the
dataset
• Otherwise, the DB owner might as well store the entire dataset
• Supporting multiple-DB owners
Big Question
Solution
Non-interactive string-matching over the secret-shared data
Step 1: Unary representation
Step 2: Creating secret-shares of unary represented data
Step 3: Outsourcing the data
String Matching over Secret-Shared Data
A
B
C
1, 0, 0
0, 1, 0
0, 0, 1
Polynomials
Secret-shares
Secret-shares
Secret-shares
Reference: Dolev et al. Accumulating automata and cascaded equations automata for communicationless information theoretically secure multi-party computation, TCS 2019.
String-Matching over Secret-Shared Data
Secret-Share Creation by the DB owner
B
0
1
0
0 + 5x
1 + 9x
0 + 2x
5
10
2
10
19
4
15
28
6
These shares represent B = (0, 1, 0) in secret-shared form
→
The adversary cannot learn the actual value, B
Dolev et al. Accumulating automata and cascaded equations automata for communicationless information theoretically secure multi-party computation, 2019.
String-Matching over Secret-Shared Data
5
10
2
10
19
4
15
28
6
User wants
to search
for
B
0
1
0
0 + x
1 + 2x
0 + 4x
No need to share any
polynomial b/w the DB
owner and the user
1
3
4
2
5
8
3
7
12
Secret-Share
Creation by
the user
These shares
represent B = (0, 1, 0)
in secret-shared form
→
The adversary cannot
learn the actual value,
B, of either the dataset
or the query predicate
Dolev et al. Accumulating automata and cascaded equations automata for communicationless information theoretically secure multi-party computation, 2019.
String-Matching over Secret-Shared Data
5
10
2
10
19
4
15
28
6
1
3
4
2
5
8
3
7
12
Cloud
operations:
Multiplication
and addition of
shares
5
30
8
20
95
32
45
196
72
43
147
313
User wants
to search
for
B
Lagrange
interpolation
Answer = 1
This is the inner product of [0,1,0]
and [0,1,0], computed in secret-shared form.
Because SSS is used, the adversary cannot
tell whether the answer is 1 or 0.
Each cloud sends only one value to the user,
regardless of dataset size →
Less communication cost
Dolev et al. Accumulating automata and cascaded
equations automata for communicationless
information theoretically secure multi-party
computation, 2019.
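The worked example on the preceding slides can be replayed with a few lines of code. The sketch below uses the slide's own polynomials and the evaluation points 1, 2, 3; like the slide, it works over plain integers for readability, whereas a real deployment would presumably work modulo a prime. The function names are hypothetical.

```python
from fractions import Fraction


def evaluate(poly, x):
    """Evaluate a degree-1 polynomial given as (constant, slope)."""
    constant, slope = poly
    return constant + slope * x


def interpolate_at_zero(points):
    """Lagrange interpolation of the polynomial through `points`, evaluated at x = 0."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        weight = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if i != j:
                weight *= Fraction(-xj, xi - xj)
        total += yi * weight
    return total


def string_match(data_polys, query_polys, servers=(1, 2, 3)):
    """Each server multiplies its shares position by position and adds the products."""
    answers = [
        sum(evaluate(d, x) * evaluate(q, x) for d, q in zip(data_polys, query_polys))
        for x in servers
    ]
    return answers, interpolate_at_zero(list(zip(servers, answers)))


# B = (0, 1, 0) shared by the DB owner, and the query B shared by the user.
data_polys = [(0, 5), (1, 9), (0, 2)]    # 0 + 5x, 1 + 9x, 0 + 2x
query_polys = [(0, 1), (1, 2), (0, 4)]   # 0 + x,  1 + 2x, 0 + 4x

answers, result = string_match(data_polys, query_polys)
print(answers, result)
```

The product of two degree-1 shares has degree 2, so three servers are needed to interpolate the answer even though two would suffice to reconstruct a single stored value.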
Can we use this string-matching technique for solving
other operations such as selection and aggregation?
V1 V2
• Based on string-matching techniques explained previously
• Supporting database outsourcing using SSS
• Execute complex selection (conjunctive and disjunctive) in an
oblivious manner
• No communication among servers
• Minimize work at the database owner site
• Result Verification Methods
• Count, Sum, Maximum, Minimum, Top-K
• Tuple verification
OBSCURE: Oblivious and Verifiable Aggregation Queries
Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.”
Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
OBSCURE: Data Outsourcing using OBSCURE
EmpID Name Salary
E101 John 1000
E101 John 100000
E102 Adam 5000
E103 Eve 2000
CleartextTID SSTID Salary
5 5 5000
4 4 1000
3 3 1000
2 2 100000
Employee Relation
Create shares using SSS Create shares using OP-SS
Only order of
values is revealed.
But, which row has
the highest value is
not revealed.
Fast
answering
to
maximum
finding
queries.
EmpID Name Salary TID Index
For verification
purpose
E101 John 1000 3 3
E101 John 100000 2 2
E102 Adam 5000 5 5
E103 Eve 1000 4 4
E1 E2
• Step 1: Convert query predicates to secret-share representation
• Step 2: Send secret-shares query predicate to the servers
OBSCURE: Conjunctive Count Query
select count(*) from Employee where Name = ‘John’ and Salary = 1000
String-Matching Operation over Secret-Shares (⊗) and its answers, per row:
Name   Query predicate   Answer   Salary   Query predicate   Answer   Multiply
John   ⊗ John            1        1000     ⊗ 1000            1        1
John   ⊗ John            1        100000   ⊗ 1000            0        0
Adam   ⊗ John            0        5000     ⊗ 1000            0        0
Eve    ⊗ John            0        1000     ⊗ 1000            1        0
Add: final answer to the query = 1
Multiplication increases the degree of the polynomial
If we have a smaller number of servers than the desired
number of servers, then we can still solve the problem by
1. Increasing communication rounds
2. Increasing computation time
V1 V2
OBSCURE: Count Query – Security Guarantees
select count(*) from Employee where Name = ‘John’ and Salary = 1000
• Identical operations on each row → Oblivious execution
• Hide access-patterns: The adversary cannot learn which rows have satisfied the query
• The adversary cannot learn anything
• By observing the values of the data and query predicates, since all values are secret-shared
• No output-size attack
Impact of #Shares – Conjunctive Count Query
select count(*) from Emp where
Name = ‘John’ and Salary = 1000 and Age = 40
Name   Match (John)   Salary   Match (1000)   Age   Match (40)   Multiply
John   1              1000     1              40    1            1
John   1              100000   0              40    1            0
Adam   0              5000     0              50    0            0
Eve    0              1000     1              40    1            0
Add: final answer to the query = 1
Polynomial degree = 3
• Min. number of shares needed to interpolate a polynomial of degree 3
• Need four shares
Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.”
Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
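Why a three-predicate conjunction needs four shares can be checked with a small sketch. It assumes, as a simplification, that each per-row match bit is already available as a degree-1 Shamir sharing; the modulus, the evaluation points, and the helper names are illustrative and not taken from OBSCURE.

```python
import random

PRIME = 2_147_483_647  # assumed field modulus; not taken from the slides


def share_bit(bit, xs):
    """Degree-1 Shamir sharing of a match bit."""
    slope = random.randrange(1, PRIME)
    return [(bit + slope * x) % PRIME for x in xs]


def reconstruct(xs, ys):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Per-row match bits from the slide: Name = 'John', Salary = 1000, Age = 40.
name_match = [1, 1, 0, 0]
salary_match = [1, 0, 0, 1]
age_match = [1, 1, 0, 1]
servers = [1, 2, 3, 4]  # degree 3 needs 3 + 1 = 4 evaluation points

rows = [
    tuple(share_bit(bit, servers) for bit in bits)
    for bits in zip(name_match, salary_match, age_match)
]

# Server k multiplies its three shares in every row, then adds the products.
answers = [sum(n[k] * s[k] * a[k] for n, s, a in rows) % PRIME for k in range(len(servers))]

count = reconstruct(servers, answers)
print(count)
```

Each multiplication adds one to the degree, so the summed answer is a degree-3 polynomial; interpolating it from only three of the four answers would in general return a wrong constant term.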
Impact of #Shares – Conjunctive Count Query
select count(*) from Emp where
Name = ‘John’ and Salary = 1000 and Age = 40
• What if you have only three shares?
• Compute the result of any two predicates, e.g., Salary = 1000 and Age = 40
• And execute the remaining query at the user side
Name   Match (John)   Salary   Match (1000)   Age   Match (40)   Multiply (Salary, Age)
John   1              1000     1              40    1            1
John   1              100000   0              40    1            0
Adam   0              5000     0              50    0            0
Eve    0              1000     1              40    1            1
OBSCURE: Count Query Result Verification
EmpID Name Salary TID Index
E101 John 1000 3 3
E101 John 100000 2 2
E102 Adam 5000 5 5
E103 Eve 1000 4 4
With two additional columns, A and B, for verification:
EmpID Name Salary TID Index A B
E101 John 1000 3 3 1 1
E101 John 100000 2 2 1 1
E102 Adam 5000 5 5 1 1
E103 Eve 1000 4 4 1 1
Columns A and B each hold the value 1 in SSS form
OBSCURE: Count Query Result Verification
Verify the answer of the following query:
select count(*) from Employee where Name = ‘John’ and Salary = 1000
Count query result (Value)   A   B   1 - Value   Value × A   (1 - Value) × B
1                            1   1   0           1           0
0                            1   1   1           0           1
0                            1   1   1           0           1
0                            1   1   1           0           1
Add all values of Value × A: 1
Add all values of (1 - Value) × B: 3
OBSCURE: Count Query Result Verification
Verify the answer of the following query:
select count(*) from Employee where Name = ‘John’ and Salary = 1000
1
3
The first value matches the result of the
count query →
The count query result is correct
The sum of the two values equals the
number of rows in the dataset →
The server has scanned all the rows
to compute the answer
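The two checks can be written down in cleartext as a sketch. In OBSCURE the per-row results and the columns A and B are secret-shared and the sums are computed by the servers; the function name `verify_count` and its interface are hypothetical.

```python
def verify_count(row_results, col_a, col_b, claimed_count, num_rows):
    """Return True if the claimed count passes both checks described on the slide."""
    # Sum of Value * A over all rows: must equal the claimed count.
    matched = sum(r * a for r, a in zip(row_results, col_a))
    # Sum of (1 - Value) * B over all rows: counts the rows that did not match.
    unmatched = sum((1 - r) * b for r, b in zip(row_results, col_b))
    # Together, the two sums must account for every row in the dataset.
    return matched == claimed_count and matched + unmatched == num_rows


row_results = [1, 0, 0, 0]   # per-row answers for Name = 'John' and Salary = 1000
col_a = [1, 1, 1, 1]
col_b = [1, 1, 1, 1]

print(verify_count(row_results, col_a, col_b, claimed_count=1, num_rows=4))
```

If a server skipped a row, the two sums would no longer add up to the number of rows, which is what the second check detects.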
OBSCURE: Maximum Query
select * from Employee where Salary
in (select max(Salary) from Employee)
EmpID Name Salary Dept TID Index
E101 John 1000 Testing 3 3
E101 John 100000 Security 2 2
E102 Adam 5000 Testing 5 5
E103 Eve 1000 Design 4 4
CleartextTID SSTID Salary
5 5 5000
4 4 1000
3 3 1000
2 2 100000
Find the tuple with the
maximum salary
CleartextTID SSTID Salary
2 2 100000
Output
Based on string matching over TID
and SSTID, find the tuple having the
maximum salary
E101 John 100000 Security 2 2
E1
E2
• Dataset
• TPC-H LineItem Table 1M and 6M rows
• Cloud Machines
• 15 AWS servers, each 144GB RAM, 3.0GHz Intel Xeon CPU with 72 cores
• Database Owner or User Machine
• A 16GB RAM machine with one core
OBSCURE: Experimental Results
Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.”
Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
OBSCURE vs MPC (communication among servers)
Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.”
Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
OBSCURE vs Downloading and Local Processing
1M rows: at most 13 seconds
6M rows: at most 50 seconds
Computation time at a resource-constrained user
(1GB RAM and single-core 1.35GHz CPU)
1M rows → at most 13 seconds < 26 seconds (downloading)
6M rows → at most 50 seconds < 385 seconds (downloading)
OBSCURE: Experiments Results – Query Execution vs
Verification Time
• At most 10 seconds on 1M rows count and sum operation verification
• At most 35 seconds on 6M rows count and sum operation verification
• Cybernetica
• Galois Inc.
• Partisia
• Secret Double Octopus
• SecretSkyDB Ltd
• Stealth Software Technologies
• Unbound Tech.
Industrial Efforts (1)
• Built on top of PostgreSQL and
supports most of SQL
• Offer multiple encryption techniques
• Deterministic encryption
• Order Preserving Encryption
• Multi-Party Computation using SPDZ
engine
• Users can select different encryption
techniques for different attributes
• Provide end-to-end privacy
preservation for relational queries
• Limitations:
• Joins are allowed only on deterministically
encrypted attributes
• Search on MPC domain is very slow
Industrial Efforts (2): Jana by Galois Inc.
Thanks: David Archer @ Galois
• Stealth Software Technologies, Inc. is a small
business based in Los Angeles
• Team of world-class cryptographers and software
engineers who are pioneers in their field and have
been building solutions
• Private Updateable Lightweight Scalable Active
Repository (PULSAR) as part of the DARPA
Brandeis program
• Aggregate and analyze data while maintaining
privacy and authenticity of data
• Combines a multitude of cryptographic techniques
• Secure database via access-pattern hiding Searchable
Encryption [IKLO16]
• Function Secret Sharing [BGI15]
• Efficient Secure Multiparty Computation (MPC)
• Garbled RAM [LO13]
Industrial Efforts (3): PULSAR by Stealth Software
Technologies, Inc.
Thanks: Steve Lu @ Stealth
Comparing Secret-Sharing-based Systems
Features | Pulsar (v0.5.1-269) | Jana (v1.7.6) | OBSCURE
Incremental insert support | No | Yes | Yes
Indexing support | Yes | Yes (limited to DET and plaintext) | No
Support for sensitivity | No (encrypts all data) | Yes (encrypts attributes as described by the application designer) | No
Select and range queries | Yes | Yes | Yes
Analytical query support (group by and join) | No (application-level joins are possible, though they may leak data) | Partial (join only on plaintext or DET attributes) | No join
User-defined functions | No | No | No
Academic/Industry | Industry | Industry | Academic
• Information-theoretically secure
• Secure regardless of the computational power of an adversary
• Quantum secure
• Communication overheads
• Cannot deal with complex queries
• Join queries
• Nested queries
• Require more than one non-communicating server
Pros and Cons of Secret-Sharing-based Techniques
Motivation for SGX
• Security and isolation in
commodity systems
• Privilege levels (rings) protect the
kernel from user programs
• Page tables protect programs
from each other
Operating Systems haven’t changed for decades
• 40 years old
• Time-sharing
• Expensive hardware
• Overly general
Ken Thompson (sitting) and Dennis Ritchie working together at a PDP-11 (1972)
• 17,000,000 LoC
• 40 subsystems
• 3,200 device drivers
Modern Kernels are Vulnerable
[Figure: Linux Kernel Vulnerabilities by Year, 2009 to 2019; y-axis from 0 to 500]
Motivation for SGX
• Security and isolation in
commodity systems
• Privilege levels (rings) protect the
kernel from user programs
• Page tables protect programs
from each other
• Until one program (malware)
attacks the kernel and then
attacks any program in the
system
TCB of A Modern System
• Attack surface is giant
• OS kernel
• 17,000,000 lines of code
• 40 major subsystems
• 3,200 device drivers
• Virtual Machine Monitor
• Hypervisor
• QEMU emulator
• Device drivers
• Parts of host kernel
(KVM)/Domain0 (Xen)
Enclaves
• Applications can protect their
secrets
• TCB is small
• Intel CPU
• App code itself
• Protected from malicious
• BIOS
• SMM
• Hypervisor
• Kernel
• Familiar application
environment
SGX Enclaves
• Trusted execution environment embedded in the process
• Its own code and data
• Controlled entry points
• Multi-threading
• Confidentiality
• Integrity
Performance
• Enters and exits are expensive
• Memory is encrypted
• Limited physical memory
Performance
Security
Powerful Adversary Model
• OS + VMM
• Controlled execution environment
• Control over page faults
• Suspending execution
• Single stepping
• Flushing caches
• Every architectural component of the CPU
• Branch target buffers
• S. Lee et al., “Inferring fine-grained control flow inside SGX enclaves with branch shadowing,” in USENIX Security, 2017
• G. Chen et al., “SgxPectre attacks: Stealing intel secrets from SGX enclaves via speculative execution,” arXiv preprint, 2018.
• Pattern-history table
• D. O'Keeffe et al., "Spectre attack against SGX enclave," 2018
• Caches
• Brasser et al., "Software grand exposure: SGX cache attacks are practical," in WOOT, 2017
• J. Gotzfried et al., "Cache attacks on Intel SGX," in EuroSec, 2017
• A. Moghimi et al., "Cachezoom: How SGX amplifies the power of cache attacks," in CHES, 2017
• M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017
• M. Schwarz et al., "Malware guard extension: Using SGX to conceal cache attacks," in DIMVA, 2017
• DRAM row buffer
• W. Wang et al., "Leaky cauldron on the dark land: Understanding memory side-channel hazards in SGX," in CCS, 2017
• Page-tables
• W. Wang et al., “Leaky cauldron on the dark land: Understanding memory side-channel hazards in SGX,” in CCS, 2017
• J. Van Bulck et al., “Telling your secrets without page faults: stealthy page table-based attacks on enclaved execution,” in USENIX, 2017
• Page-fault exception handlers
• Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
• S. Shinde et al., “Preventing page faults from telling your secrets,” in CCS, 2016
• Speculative execution
• J. V. Bulck et al., “Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution,” in USENIX, 2018
Side-Channel Attacks
• Controlled channel attacks
Page-Fault Tracing Attacks
Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
• Page fault address depends on sensitive data
Page Fault Tracing Attacks
Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
• Insertions are
deterministic
• Word order is known
• Observe sequence of
page faults
• Lookups exhibit the same
sequences
Example: Recovering Text via Spell Checker
Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
• Wizard of Oz
• All words
• 96% accuracy
Example: Recovering Text via Spell Checker
Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
Cache Attacks: Prime + Probe
Reference: Brasser et al., "Software grand exposure: SGX cache attacks are practical," in WOOT, 2017
• Isolated core
• Execute attack in L1
• Separate instruction and
data caches
• No self-pollution
• SMT
• Uninterrupted execution
• Performance Monitoring
Counters (PMC)
• Cache-misses
Controlled Execution Environment
Reference: Brasser et al., "Software grand exposure: SGX cache attacks are practical," in WOOT, 2017
Example: Cache-Tracing Attack
Reference: M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017
Text Reconstruction
Reference: M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017
Cache-Tracing: Reconstructed Text
Reference: M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017
• SGX does not clear branch
history
• Can we extract this
information?
Branch Shadowing Attack
Reference: S. Lee et al., “Inferring fine-grained control flow inside SGX enclaves with branch shadowing,” in USENIX Security, 2017
• 66% of a 1024-bit RSA private key from a
single run
• Full key from 10 runs
Branch Shadowing Attack
Reference: S. Lee et al., “Inferring fine-grained control flow inside SGX enclaves with branch shadowing,” in USENIX Security, 2017
Possible Defenses
Data-Oblivious Primitives
• Assignments and comparisons
Reference: Ohrimenko, Olga, et al. "Oblivious multi-party machine learning on trusted processors." USENIX Security, 2016.
• Array access
• Scan entire array
• AVX instructions
Data-Oblivious Primitives
Reference: Ohrimenko, Olga, et al. "Oblivious multi-party machine learning on trusted processors." USENIX Security, 2016.
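The idea behind these primitives can be illustrated with a sketch of an oblivious select and an oblivious array read. It only mirrors the access pattern: Python gives no constant-time guarantees, and an enclave implementation would rely on branch-free machine instructions rather than on this arithmetic. The names `oselect` and `oaccess` are hypothetical.

```python
def oselect(cond, a, b):
    """Return a if cond == 1, else b, using arithmetic instead of a branch."""
    return cond * a + (1 - cond) * b


def oaccess(array, index):
    """Read array[index] while touching every element, so the memory trace
    is the same for every value of index."""
    result = 0
    for i, value in enumerate(array):
        result = oselect(int(i == index), value, result)
    return result


table = [10, 20, 30, 40]
print(oaccess(table, 2))
```

The cost of hiding the index is a full scan per access, which is why oblivious array access is linear in the array length.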
What’s The Future?
• Will be fixed
• Caches
• Partitioned caches
• Branch predictors and likely other microarchitectural components of the CPU
• Speculative Taint Tracking (STT)
• Yu, Jiyong, et al. "Speculative Taint Tracking (STT): A Comprehensive Protection for
Speculatively Accessed Data." Micro, 2019
• What will not be fixed
• Paging attacks
• SGX inherently leaves page table under control of the OS
• Memory
• Enclave’s memory is observable by the OS and hardware attacks
• ORAM is 10x overhead
What will be fixed in hardware
Can we design any system supporting
database operations using SGX?
• It’s possible to build an oblivious database
• Oblivious primitives for accessing records
• Oblivious sort for joins
• Parallel bitonic sort: N * (log N)^2
Question
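A bitonic sorting network is a common choice for oblivious sorting because the sequence of compared positions depends only on the input length, not on the data. The following sequential sketch, which assumes a power-of-two input length, illustrates that property; it is not the parallel version mentioned on the slide.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two using a bitonic network.

    The pairs (i, partner) visited below are fixed by the length alone;
    only the outcome of each compare-exchange depends on the data.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    lo, hi = data[i], data[partner]
                    if (lo > hi) == ascending:
                        data[i], data[partner] = hi, lo
            j //= 2
        k *= 2
    return data


print(bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6]))
```

In an enclave, the compare-exchange itself would also have to be written without a data-dependent branch, for example with an oblivious select.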
Opaque (1)
Patient ID disease
1 Fever
2 Cancer
3 Fever
4 Cancer
5 Cancer
Reference and slide credit: Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. "Opaque: An oblivious and encrypted distributed
analytics platform." In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 283-298. 2017.
• First system supporting database joins and aggregations using SGX
• However, supports primary key to foreign key join only
• How many people are suffering from cancer and fever
Opaque (2): Oblivious Aggregation
• How many people are suffering from cancer and fever
Patient ID disease
1 Fever
2 Cancer
3 Fever
4 Cancer
5 Cancer
Patient ID disease
2 Cancer
4 Cancer
5 Cancer
1 Fever
3 Fever
Quicksort
in
SGX
Cancer, 3
Fever, 2
Patient ID disease
1 Fever
2 Cancer
3 Fever
4 Cancer
5 Cancer
Decrypt
Reference: Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. "Opaque: An oblivious and encrypted distributed analytics platform."
In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 283-298. 2017.
What is the wrong assumption of Opaque?
It does not deal with side-channel attacks (cache-line, branching)
But not all side-channel attacks can be fixed in the future
• ObliDB
• Selection and join
• HardIDX
• Secure indexes using SGX
• VC3
• For secure MapReduce computations
• EnclaveDB
• For secure transaction support
• Hermetic
• Mixed differential privacy
Other Systems using Intel SGX
Data Partitioning-based Outsourced
Data Processing
• What have we seen in the previous slides?
• Many cryptographic solutions exist
• Not efficient for answering even simple queries
Scaling Secure Data Management
“At scale” solutions require a choice between
generality, security, and performance.
Weaker Security Models: use weaker
models of security to scale computation
(explored in several prior systems)
Partitioned Computing: exploit partial
sensitivity of data to prevent expensive
cryptography on data that is not
sensitive
Partitioned Data Security
• Non-Linkability
• The adversary does not learn the relationship between any encrypted and plaintext
value
• Ciphertext Indistinguishability
• The adversary does not learn any relationships between encrypted values
• unless underlying crypto allows such relationships to be learnt (e.g., OPE)
Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data."
In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 650-661. IEEE, 2019.
Partitioned Computations at Public Cloud (1)
Name Department
t1 E(Adam) E(Defense)
t2 E(John) E(Security)
t3 E(Clark) E(Crypto)
t4 E(Lisa) E(Defense)
Name Department
t5 Adam Testing
t6 John Testing
t7 Lisa Design
t8 Clark Design
Query Q Answer A
Query Qs Query Qns
Answer Ans
Answer As
Sensitive Data Ds
Non-sensitive Data Dns
Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data."
In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 650-661. IEEE, 2019.
Leakage due to Partitioned Computing… (2)
Name Department
t1 E(Adam) E(Defense)
t2 E(John) E(Security)
t3 E(Clark) E(Crypto)
t4 E(Lisa) E(Defense)
Name Department
t5 Adam Testing
t6 John Testing
t7 Lisa Design
t8 Clark Design
Sensitive Data Ds
Non-sensitive Data Dns
Query: Retrieve John rows
Query
value
Tuples retrieved
from sensitive side
Tuples retrieved from
non-sensitive side
John T2 T6
Adversarial view
T2 is John’s row.
Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE 35th International Conference on Data Engineering (ICDE),
What if we use access-pattern-hiding techniques? (3)
Name Department
t1 E(Adam) E(Defense)
t2 E(John) E(Security)
t3 E(Clark) E(Crypto)
t4 E(Lisa) E(Defense)
Name Department
t5 Adam Testing
t6 John Testing
t7 Lisa Design
t8 Clark Design
Sensitive Data Ds
Non-sensitive Data Dns
Query: Retrieve John rows
Query
value
Tuples retrieved
from sensitive side
Tuples retrieved from
non-sensitive side
John E(….) T6
Adversarial view
Output size reveals that one of
John’s records is sensitive.
Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE 35th International Conference on Data Engineering (ICDE),
Secure Partitioned Computation (1)
• Data partitioned into bins
• Non-sensitive data partitioned into
non-sensitive bins (NSB)
• Sensitive data partitioned into
sensitive bin (SB)
……E( x)……..
…… x ……..
…… y……..
…… z .……..
…….……..
……E(y) ……..
…… E(z)……..
…….……..
Ds
Dns
SB(x)
SB(y)
SB(z)
NSB(x)
NSB(y)
NSB(z)
Query
value
Tuples retrieved
from sensitive side
Tuples retrieved from
non-sensitive side
John SB(y) NSB(y)
Adversarial view
• Query Q for value y mapped to
all values in the bin
corresponding to y
• Retrieves all data in NSB(y) over
non-sensitive data
• Retrieves all data in SB(y) over
sensitive data
Secure Partitioned Computation (2)
• Bins are created such that for all pairs of sensitive and non-sensitive bins,
there exists a value v,
• such that s ∈ SB(v) and ns ∈ NSB(v)
• The adversarial view does not allow the adversary to learn linkability
between sensitive and non-sensitive records
……E( x)……..
…… x ……..
…… y……..
…… z .……..
…….……..
……E(y) ……..
…… E(z)……..
…….……..
Ds
Dns
SB(x)
SB(y)
SB(z)
NSB(x)
NSB(y)
NSB(z)
Secure Partitioned Computation (3)
• Association amongst each sensitive bin and non-sensitive bin prevents
• Leakage through joint access of data
• Output size attacks
• Workload skew attacks can be prevented through (careful) addition of
(minimal) fake queries
……E( x)……..
…… x ……..
…… y……..
…… z .……..
…….……..
……E(y) ……..
…… E(z)……..
…….……..
Ds
SB(x)
SB(y)
SB(z)
NSB(x)
NSB(y)
NSB(z)
Dns
Query Binning
• Assumptions
• Equal number of sensitive and non-sensitive attribute values
• Each distinct attribute value appears in at most one tuple in sensitive and one
tuple in non-sensitive data
• Number of values are a product of approximately equal factors
***The paper relaxes all these assumptions
The Algorithm: One Tuple Per Value
Bin Creation: Inputs: S and NS
• Permute all sensitive values
• Find approximate square factor of |NS| = x * y such that x
≥ y
• Create x sensitive bins; contains at most y inputs in each
• Create |NS|/x non-sensitive bins
• Assign ith sensitive value to (i mod x)th sensitive bin
• Assigning non-sensitive values: Assign non-sensitive value
corresponding to ith sensitive value, which is allocated to
jth bin, to jth position of ith non-sensitive bin
• NSB[j][i] ← allocateNS(SB[i][j])
• Fill remaining NS values
S = 6 NS = 6
x = 3
y = 2
SB1
SB2
SB3
NSB1
NSB2
S1
S2
S3
S4
S5
S6
NS2 NS3NS1
NS7 NS6NS4
S = {S1, S2, S3, S4, S5, S6}
NS = {NS1, NS2, NS3, NS6, NS7}
The Algorithm: One Tuple Per Value
• Bin Retrieval: Input: Query(w)
• If w is in a sensitive bin SB[i][j], then
• Retrieve ith sensitive bin and jth non-sensitive bin
• If w is in a non-sensitive bin NSB[i][j], then
• Retrieve ith non-sensitive bin and jth sensitive bin
S = 6 NS = 6
x = 3
y = 2
S = {S1, S2, S3, S4, S5, S6}
NS = {NS1, NS2, NS3, NS6, NS7}
Query: S2 → SB2, NSB1
Query: NS7 → NSB1, SB2
SB1
SB2
SB3
NSB1
NSB2
S1
S2
S3
S4
S5
S6
NS2 NS3NS1
NS7 NS6NS4
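Bin creation and retrieval for the one-tuple-per-value case can be sketched as below. The sketch assumes the slide's base case, with the same set of values on the sensitive and the non-sensitive side and a count that factors as x * y; plain labels stand in for encrypted sensitive values, and all function names are hypothetical.

```python
import math
import random


def approx_square_factors(n):
    """Return (x, y) with x * y == n, x >= y, and y as close to sqrt(n) as possible."""
    y = math.isqrt(n)
    while n % y:
        y -= 1
    return n // y, y


def create_bins(values, seed=0):
    """Build x sensitive bins of y values and y non-sensitive bins of x values."""
    order = list(values)
    random.Random(seed).shuffle(order)        # permute the sensitive values
    x, y = approx_square_factors(len(order))
    sb = [[None] * y for _ in range(x)]
    nsb = [[None] * x for _ in range(y)]
    for k, value in enumerate(order):
        i, j = k % x, k // x                  # k-th value goes to sensitive bin k mod x
        sb[i][j] = value
        nsb[j][i] = value                     # NSB[j][i] <- counterpart of SB[i][j]
    return sb, nsb


def retrieve(sb, nsb, w):
    """Bins fetched for a query on w: one sensitive bin and one non-sensitive bin."""
    for i, sensitive_bin in enumerate(sb):
        if w in sensitive_bin:
            j = sensitive_bin.index(w)
            return sb[i], nsb[j]
    raise KeyError(w)


values = [f"v{k}" for k in range(1, 7)]
sb, nsb = create_bins(values)
print(retrieve(sb, nsb, "v2"))
```

Every sensitive bin shares exactly one value with every non-sensitive bin, so observing which pair of bins was fetched does not tell the adversary which value in them was queried.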
Query Execution Cost on Outsourced Data
Techniques Time Resilient to attacks
Size Workload-skew Access-patterns
SGX 10500x
Query Binning + SGX (60% sensitivity) 8929x
Multi-party computations-Jana 954363x
Query Binning + Jana (60% sensitivity) 680131x
x is the time to search a predicate in cleartext.
A mark in an attack column shows that a technique is resilient to that attack.
Experiments are conducted over 1.5M rows.
Experimental Results (Selection Query)
• X-axis = Data sensitivity (1%, 2%, 20%, 40%, 60%)
• Y-axis = time
SGX Opaque + Partition computing vs SGX Opaque
Data set size = 6M rows
Jana MPC + Partition computing vs Jana MPC
Data set size = 1M rows
Analytical Model
• When is query binning better compared to pure cryptographic approach?
Ratio of cost of QB versus
crypto only approach
After several rounds of
simplifications (see paper)
Under ideal assumptions….
QB is better than cryptographic only
solution if this holds (see paper)
Ratio of computation cost of cryptographic
techniques vs plaintext per tuple
Ratio of cryptographic computation vs
communication cost per tuple (typically much
greater than 1 for strong cryptographic techniques)
Average query selectivity
Ratio of sensitive data
• What if there are no approximately square factors?
• Select the nearest square number
• What if there is no 1-to-1 mapping between sensitive and non-sensitive values, and
the values differ in their number of tuples?
• Bin-packing algorithm
• What about range queries?
• With the help of a modified B-tree created over non-sensitive bins
• What about join queries?
• Keep pseudo-sensitive data with sensitive data
• What about aggregation queries?
• Execute like a selection query without tuple fetching
Query Binning Extensions
Distinct Values are not a Product of Approximately
Square Factor (1)
• What happens when the number of distinct values is not a product
of approximately square factors?
• Communication cost increases
• For example, 82 non-sensitive values result in 41 sensitive bins and 2
non-sensitive bins
ns1, ns2, …, ns41
ns42, ns43, …, ns82
E(s1)
E(s2)
E(s41)
SB1
SB2
SB41
NSB1
NSB2
Communication cost = 42
At most 1 value in
a sensitive bin
At most 41 values in a
non-sensitive bin
Distinct Values are not a Product of Approximately
Square Factor (2)
• Reducing communication cost --- by finding nearest square number
• In the case of 82 non-sensitive values, 81 is nearest square number
• Thus, create 9 sensitive bins and 9 non-sensitive bins
ns1, ns2, …, ns10
ns11, ns12, …, ns19
….E(x)….
…E(y)…..
….E(z)…..
SB1
SB2
SB9
41 sensitive values
82 non-sensitive values
Communication cost = 15
ns74, ns75, …, ns82
At most 5 values
in a sensitive bin
At most 10 values in a
non-sensitive bin
NSB1
NSB2
NSB9
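The two communication costs quoted on these slides follow from the bin sizes. A small sketch, assuming the cost is the number of values in one retrieved sensitive bin plus one retrieved non-sensitive bin (the helper name is hypothetical):

```python
import math


def retrieval_cost(num_sensitive, num_non_sensitive, num_sb, num_nsb):
    """Largest sensitive bin size plus largest non-sensitive bin size."""
    return math.ceil(num_sensitive / num_sb) + math.ceil(num_non_sensitive / num_nsb)


# 82 = 41 * 2: 41 sensitive bins and 2 non-sensitive bins.
print(retrieval_cost(41, 82, num_sb=41, num_nsb=2))

# Nearest square number 81 = 9 * 9: 9 sensitive bins and 9 non-sensitive bins.
print(retrieval_cost(41, 82, num_sb=9, num_nsb=9))
```

Balancing the two bin counts is what lowers the cost: the first layout fetches a whole half of the non-sensitive values on every query.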
The Algorithm: General Case: Multiple Tuples per Value
(1)
• What happens if values have
different numbers of tuples?
• The size of each sensitive bin is now different
• Assumption: non-sensitive values with more tuples
also have more associated sensitive tuples.
• From the tuples retrieved, the adversary learns
which bin contains the sensitive value
corresponding to a non-sensitive value
• E.g., retrieval of SB1 and NSB1 reveals that
S1 is allocated to SB1
S = 6 NS = 6
x = 3
y = 2
SB1
SB2
SB3
NSB1
NSB2
S1
S2
S3
S4
S5
S6
NS2 NS3NS1
NS7 NS6NS4
S1 = 10
S2 = 2
S3 = 1
S4 = 15
S5 = 2
S6 = 1
NS1 = 200
NS2 = 20
NS3 = 10
NS4 = 150
NS5 = 10
NS7 = 10
Size of bin
25
4
2
Size of
bin
230
170
The Algorithm: General Case: Multiple Tuples per Value
(2)
• What will happen if all values have a
different number of tuples?
• Solution: Simply add fake tuples to
sensitive bins
• Problem: too many fake tuples,
which increases the communication
cost
• So how can we overcome this problem?
S = 6 NS = 6
x = 3
y = 2
SB1
SB2
SB3
NSB1
NSB2
S1
S2
S3
S4
S5
S6
NS2 NS3NS1
NS7 NS6NS4
S1 = 10
S2 = 2
S3 = 1
S4 = 15
S5 = 2
S6 = 1
NS1 = 200
NS2 = 20
NS3 = 10
NS4 = 150
NS5 = 10
NS7 = 10
Size of bin
25
4
2
Size of
bin
230
170
Added fake
tuples
0
21
23
We add 44 fake tuples to
sensitive data
The Algorithm: General Case: Multiple Tuples per Value
(3)
• What will happen if all values have a
different number of tuples?
• Solution: Bin-packing-based approach
• Sorting: Sort all the values in a decreasing
order of the number of tuples.
• Allocate sensitive values
• Add fake tuples
• Allocate non-sensitive values as we showed
previously
S = 6 NS = 6
x = 3
y = 2
SB1
SB2
SB3
NSB1
NSB2
S4
S1
S2
S6
S3
S5
NS1 NS2NS7
NS3 NS5NS6
S1 = 10
S2 = 2
S3 = 1
S4 = 15
S5 = 2
S6 = 1
NS1 = 200
NS2 = 20
NS3 = 10
NS4 = 150
NS5 = 10
NS7 = 10
Size of bins
before adding
fake tuples
16
11
4
Added fake
tuples
0
5
12
S4 = 15
S1 = 10
S2 = 2
S5 = 2
S3 = 1
S6 = 1
After
sorting
We add fewer fake tuples than a simple
solution of adding fake tuples
44 vs 17 fake tuples
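The 44 vs 17 comparison can be reproduced with a short sketch. The packing rule used here (take values in decreasing order of tuple count and place each in the least-loaded bin that still has room) is an assumed greedy heuristic that happens to reproduce the slide's bins; the paper's bin-packing algorithm may differ in its details.

```python
def fake_tuples(bins, counts):
    """Fake tuples needed per bin so that all bins reach the largest bin's size."""
    loads = [sum(counts[v] for v in b) for b in bins]
    return [max(loads) - load for load in loads]


def greedy_pack(counts, num_bins, capacity):
    """Assumed heuristic: largest value first, into the least-loaded bin with room."""
    bins = [[] for _ in range(num_bins)]
    loads = [0] * num_bins
    for value, size in sorted(counts.items(), key=lambda kv: -kv[1]):
        open_bins = [b for b in range(num_bins) if len(bins[b]) < capacity]
        target = min(open_bins, key=lambda b: loads[b])
        bins[target].append(value)
        loads[target] += size
    return bins


# Number of tuples per sensitive value, from the slide.
counts = {"S1": 10, "S2": 2, "S3": 1, "S4": 15, "S5": 2, "S6": 1}

simple = [["S1", "S4"], ["S2", "S5"], ["S3", "S6"]]   # allocation without sorting
packed = greedy_pack(counts, num_bins=3, capacity=2)

print(sum(fake_tuples(simple, counts)), sum(fake_tuples(packed, counts)))
```

Only the largest bin determines the padding, so the gain comes from not placing the two heaviest values, S4 and S1, in the same bin.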
Range Queries
• A full binary tree is constructed over all non-sensitive values
• Bins are created for each level of the tree, except the root node
• Bins are retrieved based on least-matching
• For example, a range query from ns8 to ns12 → Bins as per node ns23 and ns8
Bins for each node of each level of the tree
• We discussed:
• Encryption-based techniques and systems
• Secret-sharing-based techniques and systems
• Existing cryptographic techniques involve a trade-off
• Functionality vs. security vs. overhead
• Secret sharing is secure but has limited applicability
• Searchable encryption is fast but reveals information
• Trusted-platform-based approaches are faster than cryptographic techniques
• But there is no completely trusted platform in the public cloud
• Existing secure hardware has several vulnerabilities
• Can we exploit a secure mediation approach?
• Using different cryptographic techniques at the same time
• Its security is not clear
• Initial effort: partitioned computation, but with security challenges: a naïve query execution on partitioned data can lead to information leakage
Conclusion
Contact Information
Shantanu Sharma
University of California, Irvine, USA.
shantanu.sharma[AT]uci[DOT]edu
toshantanusharma[AT]gmail[DOT]com
Slides are available at
ics.uci.edu/~shantas/

Secure and Privacy-Preserving Big-Data Processing

  • 9. • An adversary may learn about data: • From ciphertext (ciphertext representation-based attack) • From prior knowledge of data distribution (frequency-count attack) • From the size of the output to a query (output-size attack) • From the access pattern used by the mechanism in answering a query (access- pattern attack) • From knowledge of queries that have executed (search-pattern attack) • From knowledge of frequency of queries (workload-skew attack) Common Attacks in Data Outsourcing
  • 10. • Honest-but-Curious versus Malicious Adversary • Honest-but-curious • Executes protocols correctly, but wishes to learn about data • Malicious • Might sabotage data or computation • Passive versus Active Adversary • Passive • Makes inferences based on passive observations: ciphertext, queries, workload, and access patterns • Active • May actively inject new data, execute queries, or interfere with the execution Adversarial Cloud Model
  • 11. • Semantic Security • Access to ciphertext does not help provide any information about the plaintext other than what the adversary knew a-priori. • Difficult to use directly • Equivalent notion – Indistinguishability • Adversary cannot distinguish between the ciphertexts of two plaintexts • Easier to prove using a real-versus-ideal game • Security definition needs to be adapted in data outsourcing • Since leakages occur from encrypted data representation and query execution Defining Security Reference: Shafi Goldwasser, and Silvio Micali. "Probabilistic encryption." Journal of computer and system sciences 28, no. 2 (1984): 270-299. Curtmola, Reza, Juan Garay, Seny Kamara, and Rafail Ostrovsky. "Searchable symmetric encryption: improved definitions and efficient constructions." Journal of Computer Security 19, no. 5 (2011): 895-934.
  • 12. Security Goal: IND-CKA1: Real Game Model with Leakage Profile D0 E(D0) Leakages e.g., access-patterns, search-patterns, output-size A set of queries A set of encrypted queries (i.e., trapdoors/tokens) for the requested set of queries
  • 13. Security Goal: IND-CKA1: Ideal Game Model with Leakage Profile D0 E(D’) 1. Leakages (e.g., access-patterns, search-patterns, output-size) from the real game 2. Generate a fake dataset (D’) having the same leakages 3. Randomly select D0 or D’ and encrypt it Which dataset is encrypted – D0 or fake? The same set of queries like in the real game A set of encrypted queries (i.e., trapdoors/tokens) for the requested set of queries
  • 14. • Many cloud providers support encryption at rest • Microsoft Always Encrypted • Amazon Aurora, MariaDB Cloud Layers and Security IaaS PaaS SaaS • Secure MapReduce, Secure Spark, Secure SQL… • Microsoft Always Encrypted, Jana@Galois Inc., Pulsar@Stealth Software Technologies • Application security • Garble Cloud, Cloud Protect, SPORC
  • 15. Encryption-based Cryptographic Approaches Encrypted data Cleartext data The DB owner Encrypted processing Trusted Private Cloud Untrusted Public Cloud Users • Fully homomorphic approach • Very inefficient and not practical • Partially homomorphic • Additive: e.g., Paillier • Multiplicative: e.g., ElGamal • Searchable encryption • Bucketization [Hore et al., VLDB, 04] • Searchable Encryption [Song et al., IEEE SP, 00] • Secure indexes – encrypted Bloom filters [Goh, 03] • Order-Preserving Encryption (OPE) [Agrawal et al., SIGMOD, 04] • Conjunctive keyword search [Golle et al., ACNS, 04] • Encrypted inverted lists [Curtmola et al., CCS, 06] • Onion encryption [Popa et al., SOSP, 11] Different approaches • Different levels of security • Support different operations • Different levels of efficiency
  • 16. MPC and Secret Shared Mechanisms Untrusted Public Clouds Users • Techniques: • Secret-sharing [Shamir, CACM, 1979] • Distributed Point Function [Gilboa et al., EUROCRYPT, 2014] • Function secret-sharing [Boyle et al., EUROCRYPT, 2015] • Homomorphic Secret-Sharing [Boyle et al., CCS, 2017] • Accumulating-Automata [Dolev et al., SCC@ASIACCS, 2014] • Obscure [Gupta et al., CS@UCI, 2019] • Conclave [Volgushev et al., arxiv, 2019] • SMCQL [Bater et al., PVLDB, 2017] • Systems: • Jana by Galois • Partisia • PULSAR by Stealth Software Technologies • Secret Double Octopus and SecretSkyDB Ltd • Sharemind by Cybernetica • Unbound Tech. Secret-Shared Data Cleartext data The DB owner Secret-Shared processing Trusted Private Cloud • Secure against stronger adversaries • Information-theoretically secure • Secure against access-pattern-based attacks • However, much more expensive • 5-6 orders of magnitude more expensive than plaintext processing
  • 17. Cryptographic Techniques vs Security Threats represents technique is resilient to a given attack. Resilient to attacks Techniques Data at rest During query execution Ciphertext indistinguishability Output- Size Workload-skew Access-patterns Full Download Deterministic Encryption/OPE X X X X Non-Deterministic Encryption X X X Searchable encryption X X X Homomorphic + ORAM X X Shamir’s Secret-sharing X X Multi-party computations-Jana X X Reference: Sharad Mehrotra, Shantanu Sharma, and Jeffrey D. Ullman. "Scaling Cryptographic Techniques by Exploiting Data Sensitivity at a Public Cloud." In Proceedings of the Ninth ACM Conference on Data and Application Security and Privacy, pp. 165-167. ACM, 2019.
  • 18. • Efficiency • How expensive are the cryptographic operations? Is operation linear or sublinear in the size of the data (indexable versus non-indexable)? • Generality • What queries can the technique support – selection, range, join, aggregation • Dynamic Operations • Does the scheme support insertion/deletions/updates? • Client-Side Execution • How much work does the client have to do? During insertion/updates/queries. • Security • How much security does the scheme offer? Quantifiable leakage, e.g., orderability, distribution? Semantic security? Cryptographic Techniques – Design Criteria
  • 19. Exploiting Trusted Platform Trusted Private Cloud Untrusted Public Cloud Users Trusted Private Cloud Untrusted Public Cloud Users Hybrid Cloud Scenario Secure Hardware Scenario Cleartext non-sensitive dataCleartext sensitive data The DB owner Cleartext non-sensitive data processing Secure hardware Cleartext sensitive data processing • Distribute computation between untrusted platform and trusted platform • Solutions differ on the trusted platform exploited, degree of integration, security offered, and computations supported • Hybrid Cloud-based Solutions • HybrEx, SEMROD, Sedic • Secure FPGA-based solutions • Microsoft Cipherbase • Intel SGX-based solutions • Opaque, EnclaveDB, VC3, HardIDX
  • 20. • Minimizing data movement between trusted and untrusted platforms • Movement between trusted and untrusted platforms can lead to leakage • Mapping complex operator workflow between trusted and untrusted platforms • Existing trusted hardware are vulnerable to side-channel attacks • Oblivious access at different levels, e.g., register and cache-line • Cost vs security Trusted Platform – Challenges
  • 21. Security Techniques vs Computation Cost Selecting a single row from the TPC-H Customer table of 1.5M rows and 8 columns Searchable encryption: DSSE: Distributed Searchable Symmetric Encryption (PULSAR by Stealth Software Technologies) MPC: Multi-party computation (Jana by Galois) Opaque: SGX-based solution [Zheng et al., NSDI, 2017] • Cryptographic Overheads: • Searchable encryption: ~2 orders of magnitude • Secure hardware: ~3-4 orders of magnitude • MPC-based solution: ~5-6 orders of magnitude
  • 22. Secure Data Outsourcing: Challenge Can we design an outsourcing solution that is simultaneously efficient (significantly better than downloading cryptographically secured data) and secure (similar to downloading the data and processing it locally)? A possible approach: partitioned computing that exploits the partial sensitivity of data to restrict cryptographic overheads to only the sensitive data Trusted Private Cloud Untrusted Public Cloud Users Cleartext non-sensitive data Cleartext sensitive data The DB owner Partition computation Encrypted sensitive data Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE ICDE, pp. 650-661. IEEE, 2019.
  • 23. • Organization data is often only partially sensitive • Sensitivity dictated by policies • Sensitivity dictates what data and in what form is it outsourced • E.g., General office emails possibly not sensitive (hence outsourced) • Information related to a sensitive project sensitive (hence not outsourced in plaintext) • Can we exploit partially sensitive nature of data to scale cryptographic solutions without compromising security of sensitive data? • Commercial encrypted database solutions (e.g., Jana by Galois Inc.) are beginning to explore such solutions Data Sensitivity
  • 24. Partitioned Data Security Challenge • Non-Linkability • The adversary does not learn the relationship between any encrypted value and any plaintext value • Ciphertext Indistinguishability • The adversary does not learn any relationships between encrypted values • unless the underlying cryptographic technique allows such relationships to be learnt (e.g., OPE) Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 650-661. IEEE, 2019.
  • 26. Cryptographic Solutions • Encryption-based Techniques • Bucketization [Hore et al. VLDB 04] • Searchable Encryption [Song et al., IEEE SP 00] • Secure indexes – encrypted Bloom filters [Goh, 03] • Bilinear maps [Boneh et al., EuroCrypt 03] • Order-Preserving Encryption (OPE) [Agrawal et al., SIGMOD 04] • Modular-OPE [Boldyreva et al., CRYPTO 11] • Conjunctive keyword search [Golle et al., ACNS 04] • Encrypted inverted lists [Curtmola et al., CCS 06] • Fully homomorphic encryption [Gentry, STOC 09] • Onion encryption [Popa et al., SOSP 11] • Dynamic Searchable Encryption [Cash et al.NDSS 14] • PBTree [Li et al., VLDB 14] • IBTree [Li et al., ICDE 17] • Secret-Sharing Techniques • Shamir’s secret-sharing [Shamir, CACM 79] • Multi-Linear Secret-Sharing Schemes [Brickell et al., J. of Cryptology 91, Bertilsson et al., AUSCRYPT 92] • Verifiable secret sharing [Rabin et al., STOC 89] • Proactive Secret Sharing [Herzberg et al., CRYPTO 95] • Function Secret Sharing [Boyle et al., EUROCRYPT 15] • Homomorphic secret sharing [Boyle et al. CRYPTO 16] • Accumulating Automata [Dolev et al., TCS 19] • Encryption-based Systems • CryptDB [Popa et al., SOSP 11] • Monomi [Tu et al.. VLDB 13] • Cipherbase [Arasu et al., CIDR 13] • TrustedDB [Bajaj et al., IEEE TKDE 13] • CorrectDB [Bajaj et al., VLDB 13] • ZeroDB [Egorov et al., arxiv 16] • MrCrypt [Tetali et al., OOPSLA 13] • EncKB [Yuan et al., ASIACCS 17] • Microsoft Always Encrypted • Oracle 12c • Amazon Aurora • MariaDB • Secret-Sharing-based Systems • SSSDB [Avni et al., ALGOCLOUD 15] • Splinter [Wang et al., NSDI 17] • OBSCURE [Gupta et al, VLDB 19] • Cybernetica • Jana by Galois Inc. • Partisia • Secret Double Octopus • SecretSkyDB Ltd • PULSAR by Stealth Software Technologies Inc. • Unbound Tech.
  • 27. EmpID name DID E1 Alice D1 E2 Bob D2 E3 Carl D1 Problems DDID Dname D1 Sale D2 Coding On the relations, execute the following in a secure manner: 1. Selection query (e.g., SELECT * FROM employee WHERE name= ‘Alice’) 2. Join query (e.g., SELECT * FROM employee INNER JOIN department ON employee.DID = department.DDID) 3. Aggregation query (e.g., SELECT count(*) FROM employee WHERE DID=‘D1’) employee department
  • 28. Searchable Encryption: Ciphertext Generation A relation with columns (ID, Dept, Comment) and rows (Id1, D1, W1), (Id2, D1, W2), ..., (Idi, Di, Wi), ..., (Idk, Dk, Wk). Figure: each word Wi is encrypted as Ek(Wi) (n bits), where Ek() is deterministic encryption. Partitioning the encrypted word into two parts: Li (n-m bits) and Ri (m bits). Key generation: ki = fk(Li). Si is a pseudorandom string (n-m bits) and Ti = fki(Si) (m bits). Ciphertext (CT) = Ek(Wi) ⊕ (Si, Ti). The trapdoor for Wi is built from Ek(Wi) and ki. Reference: Dawn Xiaoding Song, David Wagner, and Adrian Perrig. “Practical techniques for searches on encrypted data.” In Proceedings of the 2000 IEEE Symposium on Security and Privacy (S&P 2000), pp. 44-55. IEEE, 2000.
  • 29. Searchable Encryption: Search at the Cloud Figure: the user provides Ek(Wi), partitioned into two parts E1 (n-m bits) and E2 (m bits), and the key ki = fk(E1). The cloud XORs the stored ciphertext (CT) with Ek(Wi) and partitions the result into two parts, Si (n-m bits) and Ti (m bits). Matching or not? The word matches if Ti = fki(Si). Idea: A ⊕ B = C, A ⊕ C = B, B ⊕ C = A Reference: Dawn Xiaoding Song, David Wagner, and Adrian Perrig. “Practical techniques for searches on encrypted data.” In Proceedings of the 2000 IEEE Symposium on Security and Privacy (S&P 2000), pp. 44-55. IEEE, 2000. Advantage: Does not reveal anything before query execution, unlike deterministic encryption, which reveals information even before query execution Disadvantage: Linearly scans the entire data, i.e., no index support Question: Can we have indexable searchable encryption?
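The ciphertext generation and cloud-side matching of the two slides above can be sketched as follows. This is only a minimal illustration, not the authors' implementation: HMAC-SHA256 stands in for the pseudorandom functions f and for the deterministic pre-encryption Ek (so words cannot be decrypted here), Si is drawn from `os.urandom` instead of a reproducible stream cipher, and the block sizes `N` and `M` and all function names are assumptions.

```python
import hashlib
import hmac
import os

N, M = 32, 16  # assumed sizes in bytes: encrypted word (n) and check part (m)

def prf(key: bytes, msg: bytes, out_len: int) -> bytes:
    """HMAC-SHA256 used as a stand-in pseudorandom function."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:out_len]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def pre_encrypt(k_enc: bytes, word: str) -> bytes:
    """Stand-in for the deterministic encryption E_k(W_i); not invertible."""
    return prf(k_enc, word.encode(), N)

def encrypt_word(k_enc: bytes, k_prf: bytes, word: str) -> bytes:
    x = pre_encrypt(k_enc, word)          # E_k(W_i)
    k_i = prf(k_prf, x[:N - M], 32)       # k_i = f_k(L_i)
    s_i = os.urandom(N - M)               # pseudorandom string S_i
    t_i = prf(k_i, s_i, M)                # T_i = f_{k_i}(S_i)
    return xor(x, s_i + t_i)              # CT = E_k(W_i) XOR (S_i, T_i)

def trapdoor(k_enc: bytes, k_prf: bytes, word: str):
    """Values the user sends to the cloud: E_k(W) and k_i."""
    x = pre_encrypt(k_enc, word)
    return x, prf(k_prf, x[:N - M], 32)

def server_match(ciphertext: bytes, x: bytes, k_i: bytes) -> bool:
    """Cloud side: unmask with E_k(W) and test whether T = f_{k_i}(S)."""
    masked = xor(ciphertext, x)
    s, t = masked[:N - M], masked[N - M:]
    return hmac.compare_digest(t, prf(k_i, s, M))

k_enc, k_prf = os.urandom(32), os.urandom(32)
words = ["alpha", "beta", "alpha"]
stored = [encrypt_word(k_enc, k_prf, w) for w in words]
x, k_i = trapdoor(k_enc, k_prf, "alpha")
matches = [server_match(c, x, k_i) for c in stored]  # linear scan over all words
```

The two ciphertexts of "alpha" differ because of the fresh Si, which is why nothing is revealed before a query; the scan over every stored word is the linear cost noted on the slide.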
  • 30. • The cloud maintains an index • The user sends keywords and the cloud traverses the index to answer the query • Issues: • Index generator: Who creates the index, the DB owner or the server? • Most work assumes that the DB owner creates the index • Index traversal: Interactive vs non-interactive, i.e., can the cloud traverse the index on its own? • Index update: Can the cloud update the index? • Techniques: • Early approaches: DB-owner generated, interactive traversal, non-updateable • Exploit oblivious techniques • Implemented by Stealth Software Technologies, Inc. • Recent: PB-Tree: DB-owner generated, non-interactive traversal, updateable Indexable Searchable Encryption Reference: Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky: Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions. Our focus
  • 31. • Consider • A number n • The number of bits needed to represent n in binary form = w • The prefix family of n will contain w+1 items • Prefix family • Consider the number 6 • 6 in 5-bit binary = (00110) • Prefix family of 6 is F(6) = {00110, 0011*, 001**, 00***, 0****, *****} • What will a node of the index contain? • Leaf node: Prefix family of one of the data items • Other nodes: Union of the prefix families of their child nodes Indexable Searchable Encryption Reference: Li, Rui, Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. "Fast range query processing with strong privacy protection for cloud computing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1953-1964. Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017.
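A small sketch of the prefix-family construction on the slide above, in cleartext only (the actual index stores prefixes in keyed Bloom filters). The function names `prefix_family` and `node_content` are illustrative assumptions.

```python
def prefix_family(n: int, w: int) -> list[str]:
    """Return the w+1 prefixes of the w-bit binary form of n."""
    if not 0 <= n < 2 ** w:
        raise ValueError("n does not fit in w bits")
    bits = format(n, f"0{w}b")
    # Replace the last k bits by wildcards, for k = 0 .. w.
    return [bits[: w - k] + "*" * k for k in range(w + 1)]

def node_content(children: list[set[str]]) -> set[str]:
    """An internal node holds the union of its children's prefix families."""
    return set().union(*children)

f6 = prefix_family(6, 5)
f7 = prefix_family(7, 5)
parent = node_content([set(f6), set(f7)])
```

Here `f6` reproduces F(6) from the slide, and `parent` has 7 elements because F(6) and F(7) share five prefixes.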
  • 32. Create Index using Prefix Family: Top-Down Way Create an index on the following numbers: 1, 6, 7, 9, 11, 12, 13, 16, 20, 25 • Step 1: • Find the prefix family of each number • Step 2: • Allocate all prefix families to the root node • Step 3: • Divide the numbers of a node among its child nodes until each leaf node contains the prefix family of exactly one of the given numbers Resulting index tree:
      F(1), F(6), F(7), F(9), F(11), F(12), F(13), F(16), F(20), F(25)
        F(1), F(6), F(7), F(16), F(20)
          F(1), F(6), F(7)
            F(1)
            F(6), F(7)
              F(6)
              F(7)
          F(16), F(20)
            F(16)
            F(20)
        F(9), F(11), F(12), F(13), F(25)
          F(9), F(11)
            F(9)
            F(11)
          F(12), F(13), F(25)
            F(12), F(13)
              F(12)
              F(13)
            F(25)
  • 33. • Step 1: • User creates prefix family of 6 and sends to the cloud • Step 2: • The cloud starts from the root node to find the prefix family of the given query Execute a Point Query using the Index F(1), F(6), F(7), F(9) ,F(11), F(12), F(13), F(16), F(20), F(25) F(1), F(6), F(7), F(16), F(20) F(9), F(11), F(12), F(13), F(25) F(1), F(6), F(7) F(16), F(20) F(12), F(13), F(25) F(6),F(7) F(1)F(6) F(7) F(20) F(16) F(12), F(13) F(12) F(13) F(25) F(9) F(11) F(9), F(11) Query: Find 6 F(6)  Reference: Li, Rui, Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. "Fast range query processing with strong privacy protection for cloud computing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1953-1964. Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017. A B C D E F G H I
  • 34. Execute a Range Query Query: Find all numbers in the range [0,8] • Step 1: • Represent the bounds of the range predicate by their prefix families • F(0) = {00000, 0000*, 000**, 00***, 0****, *****} • F(8) = {01000, 0100*, 010**, 01***, 0****, *****} • Step 2: • Find the minimum set of prefixes such that the union of the prefixes covers the range • {00***, 01000} • Step 3: • Check each node for the minimum set of prefixes Index tree: F(1), F(6), F(7), F(9), F(11), F(12), F(13), F(16), F(20), F(25) F(1), F(6), F(7), F(16), F(20) F(9), F(11), F(12), F(13), F(25) F(1), F(6), F(7) F(16), F(20) F(12), F(13), F(25) F(6), F(7) F(1) F(6) F(7) F(20) F(16) F(12), F(13) F(12) F(13) F(25) F(9) F(11) F(9), F(11) Internal node labels: A B C D E F G H I Binary legend: 000 → 0, 001 → 1, 010 → 2, 011 → 3, 100 → 4, 101 → 5, 110 → 6, 111 → 7
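Step 2 above (finding the minimum set of prefixes that covers a range) can be sketched as below. This is a cleartext illustration under assumed function names; it uses the standard greedy decomposition of an integer range into aligned power-of-two blocks, and a number then lies in the range exactly when its prefix family intersects the cover.

```python
def prefix_family(n: int, w: int) -> set[str]:
    bits = format(n, f"0{w}b")
    return {bits[: w - k] + "*" * k for k in range(w + 1)}

def range_to_prefixes(lo: int, hi: int, w: int) -> list[str]:
    """Minimum set of w-bit prefixes whose union is exactly [lo, hi]."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo > 0 else 1 << w
        # ... and that does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        k = size.bit_length() - 1            # number of wildcard bits
        bits = format(lo, f"0{w}b")
        prefixes.append(bits[: w - k] + "*" * k)
        lo += size
    return prefixes

cover = range_to_prefixes(0, 8, 5)
data = [1, 6, 7, 9, 11, 12, 13, 16, 20, 25]
hits = [n for n in data if prefix_family(n, 5) & set(cover)]
```

For the slide's query [0,8] the cover is {00***, 01000}, and the matching data items are 1, 6, and 7.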
  • 35. • Indistinguishability • Use Bloom filters • Any prefix will be hashed to r locations using HMAC with r keys • Node Indistinguishability • Associate each node v with a random number v.R, then hash r times as follows: • HMAC(k1, v.R, p), …, HMAC(kr, v.R, p) • Reverse engineering • An adversary can do reverse engineering to create PB-tree after observing many queries or by asking queries • How to solve this issue? • IB-Tree Issues with the Index (PB-Tree) Reference: Li, Rui, Alex X. Liu, Ann L. Wang, and Bezawada Bruhadeshwar. "Fast range query processing with strong privacy protection for cloud computing." Proceedings of the VLDB Endowment 7, no. 14 (2014): 1953-1964. Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017. Two nodes may contain overlapping prefix families F(6) = {00110, 0011*, 001**, 00***, 0****,*****} F(7) = {00111, 0011*, 001**, 00***, 0****,*****}
  • 36. • Indistinguishable Bloom Filter • Twin cell: 0 and 1 • For the i-th location, which cell stores 1: HMAC(kr+1, i) ⊕ rB • rB is a random number for IBF B. • IB-Tree • A tree like the PB-Tree, but all nodes use Indistinguishable Bloom Filters Searchable Encryption for Adaptive Adversary Reference and slide credit: Rui Li and Alex X. Liu. "Adaptively secure conjunctive query processing over encrypted data for cloud computing." In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 697-708. IEEE, 2017. Figure labels: Selected, Unselected
  • 37. Name :=John and Age=[1,15] Name :John, Age:0001 NR N32 N11 N21 N31 d2d1 N34 N22 N33 d4d3 N36 N12 N23 N35 d6d5 N37 d7 Name :John, Age:001* Name :John, Age:01** Name :John, Age:1*** U U U Processing Range Queries on IB Tree Slide credit: Alex Liu: Adaptively Secure Conjunction Query Processing over Encrypted Data for Cloud Computing. Minimum set of prefixes such that union of prefixes cover the range
  • 38. •Secure execution of selection queries • Point and range queries • Indexable vs non-indexable •What about join and aggregation? What we have discussed so far?
  • 39. Bucketization
      Database owner site: EMPLOYEE relation
        NAME     SALARY
        John     54500
        Mary     111029
        James    95300
        Lisa     145000
      Database owner site: bucket metadata
        Bucket          ID
        30k – 50k       1bx!23
        50k – 70k       Xr2k%s
        70k – 80k       Rtes12!
        80k – 90k       Cvtr^e
        90k – 100k      bbcr3@
        100k – 115k     11vb$$
        115k – 130k     23wqa%
        130k – 160k     Xxrty*
      Cloud site: encrypted relation
        E-tuple       Bucket_id
        fErf!$Q!!     Xr2k%s
        F%%3w&        11vb$$
        &%gfsdf$      bbcr3@
        %%33w&        Xxrty*
      Original query Q: SELECT name FROM EMPLOYEE WHERE salary ≥ 90k AND salary < 110k
      Query Q sent to the cloud: SELECT name FROM EMPLOYEE WHERE Bucket_id = bbcr3@ OR Bucket_id = 11vb$$
      The tuple in bucket 11vb$$ (Mary, 111029) is a false positive.
      Pros • Generality: allows a large class of predicates to be evaluated (most of SQL) • Efficient implementation: index
      Cons • Incurs overhead on the client: pruning of false positives
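The bucketization workflow on this slide can be sketched end to end as follows. The bucket boundaries and IDs are taken from the slide; everything else is an assumption for illustration, in particular `toy_cipher`, which is a SHA-256 keystream XOR used only as a placeholder for a real authenticated cipher and must not be used in practice.

```python
import hashlib
import json

# (low, high, bucket_id): ranges are treated as [low, high), IDs from the slide.
BUCKETS = [
    (30_000, 50_000, "1bx!23"), (50_000, 70_000, "Xr2k%s"),
    (70_000, 80_000, "Rtes12!"), (80_000, 90_000, "Cvtr^e"),
    (90_000, 100_000, "bbcr3@"), (100_000, 115_000, "11vb$$"),
    (115_000, 130_000, "23wqa%"), (130_000, 160_000, "Xxrty*"),
]

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (encrypts and decrypts); NOT secure."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def bucket_of(salary: int) -> str:
    return next(bid for lo, hi, bid in BUCKETS if lo <= salary < hi)

def buckets_for_range(lo: int, hi: int) -> set[str]:
    """Bucket IDs overlapping the predicate lo <= salary < hi."""
    return {bid for b_lo, b_hi, bid in BUCKETS if b_lo < hi and b_hi > lo}

key = b"owner-secret-key"
employees = [("John", 54500), ("Mary", 111029), ("James", 95300), ("Lisa", 145000)]

# DB owner: outsource (encrypted tuple, bucket id) pairs.
cloud_table = [(toy_cipher(key, json.dumps(row).encode()), bucket_of(row[1]))
               for row in employees]

# Cloud: filter on bucket ids only.
wanted = buckets_for_range(90_000, 110_000)
candidates = [etuple for etuple, bid in cloud_table if bid in wanted]

# DB owner: decrypt and prune false positives.
decrypted = [json.loads(toy_cipher(key, e)) for e in candidates]
result = [name for name, salary in decrypted if 90_000 <= salary < 110_000]
false_positives = len(decrypted) - len(result)
```

The cloud returns two candidate tuples, and the client-side pruning step removes Mary's tuple, which is the overhead listed under Cons.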
  • 40. • Buckets’ impact • Query execution overhead • Security • Security metrics • How large is the span of the bucket? – larger the better • How are the frequencies distributed? More uniform the better • Cost metrics • How many false positives are generated for a predicate? • What is the storage overhead due to metadata ? • Improving security • Introduce randomness to increase security level Bucketization Reference: Hakan Hacigümüş, Bala Iyer, Chen Li, and Sharad Mehrotra. "Executing SQL over encrypted data in the database-service-provider model." In Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pp. 216-227. ACM, 2002.
  • 41. • We can do • Joins at the cloud-side based on bucket-ids • But with computational overhead at the DB owner due to filtering • Can we avoid computation overheads at the DB owner in join operation? • Precompute join operation before outsourcing the data What We Have Seen in Bucketization?
  • 42. • Represent data in different format • Execute join among tables before outsourcing Precomputed Joins: Different Representation of Datasets Slide credit: Seny Kamara Tarik Moataz: SQL on Structurally-Encrypted Databases
  • 43. Data Outsourcing Precomputed Joins Reference and slide credit: Seny Kamara and Tarik Moataz. "SQL on structurally-encrypted databases." In International Conference on the Theory and Application of Cryptology and Information Security, pp. 149-180. Springer, Cham, 2018. Join ProjectionSelectionSelection Projection Disadvantages: 1. Joins are precomputed 2. Aggregation queries cannot be executed at the cloud 3. Complex queries cannot be solved at the cloud
  • 44. • Non-indexable Searchable Encryption • Indexable Searchable Encryption for point and range queries • Bucketization for join, aggregation, and most of SQL • Precomputed joins •Is there any system based on these techniques or based on encryption? What we have seen so far?
  • 45. • CryptDB • Monomi • Arx • Cipherbase • TrustedDB • CorrectDB • SDB • EncKV Encryption-based Systems • ZeroDB • MrCrypt • Crypsis • Microsoft Always Encrypted • Oracle 12c • Amazon Aurora • MariaDB
  • 46. Key-Value Store • Can be seen as a two-column table • One column for the key • Another column for the value • A complicated relational table can also be stored in this format • Example: • Person database: • Key: Person ID, Value: Person record • Key: City, Value: PersonID • Key: PersonID, Value: Name Person table (id, name, age, city): (1001, alice, 20, LA), (1002, bob, 25, LA), (1003, tom, 20, NY) Key = City, Value = PersonID: (LA, 1001), (LA, 1002), (NY, 1003) Key = PersonID, Value = Name: (1001, alice), (1002, bob), (1003, tom)
  • 47. EncKV: Encrypted Key-Value Store 1 5 2 4 3 6 Server Hash on Row Id to allocate key-value pair to a server LA 1001 LA 1002 NY 1003 city Person id Person Id Name Reference and Slide credit: Xingliang Yuan, Yu Guo, Xinyu Wang, Cong Wang, Baochun Li, and Xiaohua Jia. “Enckv: An encrypted key-value store with rich queries.” In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 423-435. ACM, 2017. H(Gk(city||LA||i),2) Enck(1002) H(Gk(name||bob||i),1) Enck(1002) H(Gk(city||LA||i),1) Enck(1001) H(Gk(city||NY||i),1) Enck(1003) … … Server i Attribute Name Attribute Value Server i Value Occurrence Double encryption 1001 alice 1002 bob 1003 tom
  • 48. EncKV: Encrypted Key-Value Store Query: select “name” where “city=LA” Encrypted exact-match index at server i: H(Gk(city||LA||i),2) → Enck(1002); H(Gk(name||bob||i),1) → Enck(1002); H(Gk(city||LA||i),1) → Enck(1001); H(Gk(city||NY||i),1) → Enck(1003); … Encrypted data records at server i: Pk(name||1002) → Enck(bob); Pk(age||1001) → Enck(20); Pk(name||1001) → Enck(alice); … Steps: (1) the user sends Gk(city||LA||i); (2) the server returns Enck(1001), Enck(1002); (3) the user sends Pk(name||1001), Pk(name||1002); (4) the server returns Enck(bob), Enck(alice) • Observations: • Before query execution, the server does not learn whether the index entries of two different values belong to the same attribute, e.g., H(Gk(city||LA||i),1) and H(Gk(city||NY||i),1) for LA and NY. • At any two servers, the index entries for the same attribute value are different, e.g., H(Gk(city||LA||i),1) and H(Gk(city||LA||j),1) for servers i and j. P, H, G: PRF Reference and slide credit: Xingliang Yuan, Yu Guo, Xinyu Wang, Cong Wang, Baochun Li, and Xiaohua Jia. “Enckv: An encrypted key-value store with rich queries.” In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 423-435. ACM, 2017. Double encryption Communication overhead: what if the table contains more than 1M people from LA?
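The exact-match index lookup on this slide can be sketched roughly as follows. This is not the EncKV implementation: HMAC-SHA256 stands in for the PRFs G and H, `enc_id` is a toy XOR mask standing in for Enck, a single server is modelled, and all names are assumptions.

```python
import hashlib
import hmac

def G(k: bytes, column: str, value: str, server_id: int) -> bytes:
    """Search token G_k(column || value || i)."""
    return hmac.new(k, f"{column}||{value}||{server_id}".encode(),
                    hashlib.sha256).digest()

def H(token: bytes, occurrence: int) -> bytes:
    """Index label H(token, occurrence counter)."""
    return hmac.new(token, str(occurrence).encode(), hashlib.sha256).digest()

def enc_id(k: bytes, label: bytes, record_id: int) -> bytes:
    """Toy stand-in for Enc_k(record id): XOR with a label-specific mask."""
    mask = hmac.new(k, b"enc" + label, hashlib.sha256).digest()[:4]
    return bytes(a ^ b for a, b in zip(record_id.to_bytes(4, "big"), mask))

def dec_id(k: bytes, label: bytes, blob: bytes) -> int:
    mask = hmac.new(k, b"enc" + label, hashlib.sha256).digest()[:4]
    return int.from_bytes(bytes(a ^ b for a, b in zip(blob, mask)), "big")

def build_index(k, server_id, column, pairs):
    """DB owner: one index entry per (value, record id) pair."""
    index, counts = {}, {}
    for value, rid in pairs:
        counts[value] = counts.get(value, 0) + 1
        label = H(G(k, column, value, server_id), counts[value])
        index[label] = enc_id(k, label, rid)
    return index

def server_lookup(index, token):
    """Server: probe counters 1, 2, ... until a label is missing."""
    found, c = [], 1
    while (label := H(token, c)) in index:
        found.append((label, index[label]))
        c += 1
    return found

k = b"client-master-key"
index = build_index(k, 1, "city", [("LA", 1001), ("LA", 1002), ("NY", 1003)])
answer = server_lookup(index, G(k, "city", "LA", 1))
ids = [dec_id(k, label, blob) for label, blob in answer]
```

The number of returned entries grows with the number of matching records, which is the communication overhead the slide asks about.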
  • 49. • Some encryption techniques are fast, but reveal information • Deterministic encryption is fast but reveals the distribution of values • Order-Preserving Encryption (OPE) is fast but reveals the order of the values • Searchable encryption is fast but is secure only until a query is executed; query execution reveals the data distribution or the order of the values • Bucketization is fast and more secure than the above techniques, but • Retrieves more items • Requires client-side processing • CryptDB is fast but insecure, due to its use of deterministic encryption and OPE • Open issues: • Need a fast and secure encryption technique that can support different types of SQL queries • Need an index that the cloud can build Pros and Cons of Encryption-based Techniques
  • 51. • Encryption techniques are only computationally secure • A sufficiently powerful adversary can break the encryption technique • Google, with sufficient computational capabilities, broke SHA-1 (https://shattered.io/) • Information-theoretic security • Secure regardless of the computational power of an adversary • Quantum secure Why Secret-Sharing?
  • 52. Shamir’s Secret-Sharing (SSS) [Shamir79] – Key Idea • One point → Infinite number of lines • Two points → Only one line • Where f(0) is the secret • Alice wants to share her secret value 5 with Bob and Carl • Bob and Carl do not communicate with each other • Impact of the degree of the polynomial on security • f servers collude → polynomial degree should be f + 1 • Servers do not collude → a polynomial of degree 1 • Fault tolerant • Due to creating multiple shares Reference: Adi Shamir. “How to share a secret.” Communications of the ACM 22, no. 11 (1979): 612-613.
  • 53. Shamir’s Secret-Sharing (SSS) Secret S Secret Owner Non-Communicating Public Servers s1 s2 s3 s4 Mathematical operations f(x) = S + ax Each server cannot learn the secret S Secret-Share Creation: e.g., under the assumption that no server will collude Reference: Adi Shamir. “How to share a secret.” Communications of the ACM 22, no. 11 (1979): 612-613.
  • 54. Shamir’s Secret-Sharing (SSS) Secret S Secret Owner Non-Communicating Public Servers s1 s2 s3 s4 Lagrange Interpolation Secret Reconstruction e.g., under the assumption that no server will collude Reference: Adi Shamir. “How to share a secret.” Communications of the ACM 22, no. 11 (1979): 612-613.
  • 55. Shamir’s Secret-Sharing (SSS) Secret S Secret Owner Non-Communicating Public Servers s1 s2 s3 s4 Secret Reconstruction e.g., under the assumption that no server will collude Lagrange Interpolation Reference: Adi Shamir. “How to share a secret.” Communications of the ACM 22, no. 11 (1979): 612-613.
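Share creation with f(x) = S + ax and reconstruction by Lagrange interpolation, as shown on the preceding slides, can be sketched as follows. The prime modulus `P`, the function names, and the use of Python's `random` module (a real deployment would need a cryptographically secure source) are assumptions of this sketch.

```python
import random

P = 2_147_483_647  # assumed prime modulus (2^31 - 1); all arithmetic is mod P

def make_shares(secret: int, degree: int, n_servers: int):
    """Evaluate a random polynomial with f(0) = secret at x = 1 .. n_servers."""
    coeffs = [secret] + [random.randrange(1, P) for _ in range(degree)]
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
            for x in range(1, n_servers + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the shares [(x_i, y_i), ...]."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

# Alice shares the secret 5 among four non-colluding servers (degree-1 polynomial).
shares = make_shares(5, degree=1, n_servers=4)
from_two = reconstruct(shares[:2])        # any two shares determine the line
from_other_two = reconstruct(shares[2:])  # fault tolerance: a different pair works
```

With a degree-1 polynomial a single share is one point on a line and is consistent with every possible secret, while any two shares fix the line and therefore f(0).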
  • 56. • Similar to Order-Preserving Encryption (OPE) • If cleartext values have a relation, such as a < b, then their shares satisfy S(a) < S(b) • Efficient for maximum/minimum and range queries Order-Preserving Secret-Sharing Reference: Fatih Emekci, Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi. “Dividing secrets to secure data outsourcing.” Information Sciences 263 (2014): 198-210.
  • 57. Computing over Secret Shared Data Secret Sharing Communicating Servers (Jana and Sharemind) Non-communicating servers (SSDB, OBSCURE) • Selection and aggregation queries • Significant communication overheads amongst servers • Selection and aggregation queries Our focus
  • 58. • Outsource the above relation using Shamir’s secret-sharing • Add all secret-shared values of ‘Salary’ attributes • Exploit additive homomorphic property Simple Aggregation using Secret-Shared Data EmpID Name Salary Dept E101 John 1000 Testing E101 John 100000 Security E102 Adam 5000 Testing E103 Eve 2000 Design SELECT SUM(Salary) FROM Employee
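The additive homomorphic property used for SELECT SUM(Salary) can be sketched as below: each server adds its own shares locally, and the user interpolates the per-server sums. The prime modulus, three servers, degree-1 polynomials, and helper names are assumptions of this sketch.

```python
import random

P = 2_147_483_647  # assumed prime modulus; sums must stay below P

def share(value: int, n_servers: int):
    """Degree-1 Shamir shares of value, evaluated at x = 1 .. n_servers."""
    a = random.randrange(1, P)
    return [(value + a * x) % P for x in range(1, n_servers + 1)]

def reconstruct(points):
    """Lagrange interpolation at x = 0 over [(x_i, y_i), ...] modulo P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

salaries = [1000, 100000, 5000, 2000]          # Salary column from the slide
n_servers = 3
shared = [share(s, n_servers) for s in salaries]   # shared[row][server]

# Each server adds the shares it holds; no server-to-server communication.
server_sums = [sum(row[s] for row in shared) % P for s in range(n_servers)]

# The user interpolates any two of the per-server sums (degree stays 1).
total = reconstruct([(1, server_sums[0]), (2, server_sums[1])])
```

Adding shares adds the underlying polynomials, so the degree does not grow and each server returns a single value regardless of the number of rows.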
  • 59. •Aggregation with complex selection obliviously, i.e., access-pattern hiding •Complex Selection Query Execution •Join Query Execution Challenges
  • 60. • The DB owner keeps each polynomial, which was used to create database shares • To execute a query, the DB owner creates shares of the query predicate and fetches the desired value from the clouds • Very fast • Access-pattern attack • Distribution revealing DB Owner Assisted Query Execution Reference: Fatih Emekci, Ahmed Methwally, Divyakant Agrawal, and Amr El Abbadi. “Dividing secrets to secure data outsourcing.” Information Sciences 263 (2014): 198-210.
  • 61. Big Question • How to search on secret-shared outsourced data • Without remembering the polynomials that were used to create the dataset • Otherwise, the DB owner might as well store the entire dataset • Supporting multiple DB owners Solution: Non-interactive string-matching over the secret-shared data
  • 62. Step 1: Unary representation Step 2: Creating secret-shares of unary represented data Step 3: Outsourcing the data String Matching over Secret-Shared Data A B C 1, 0, 0 0, 1, 0 0, 0, 1 Polynomials Secret-shares Secret-shares Secret-shares Reference: Dolev et al. Accumulating automata and cascaded equations automata for communicationless information theoretically secure multi-party computation, TCS 2019.
  • 63. String-Matching over Secret-Shared Data Secret-share creation by the DB owner: B in unary form is 0, 1, 0. Polynomials: 0 + 5x, 1 + 9x, 0 + 2x. Shares for the first server (x = 1): 5, 10, 2. Shares for the second server (x = 2): 10, 19, 4. Shares for the third server (x = 3): 15, 28, 6. These shares represent B (0, 1, 0) in secret-shared form → The adversary cannot learn the actual value, B Reference: Dolev et al. Accumulating automata and cascaded equations automata for communicationless information theoretically secure multi-party computation, 2019.
  • 64. String-Matching over Secret-Shared Data Shares stored at the servers: 5, 10, 2 (first server); 10, 19, 4 (second server); 15, 28, 6 (third server). The user wants to search for B, whose unary form is 0, 1, 0. Secret-share creation by the user with the polynomials 0 + x, 1 + 2x, 0 + 4x: 1, 3, 4 (first server); 2, 5, 8 (second server); 3, 7, 12 (third server). No polynomial needs to be shared between the DB owner and the user. These shares represent B (0, 1, 0) in secret-shared form → The adversary cannot learn the actual value, B, of either the dataset or the query predicate Reference: Dolev et al. Accumulating automata and cascaded equations automata for communicationless information theoretically secure multi-party computation, 2019.
  • 65. String-Matching over Secret-Shared Data 5 10 2 10 19 4 15 28 6 1 3 4 2 5 8 3 7 12 Cloud operations: Multiplication and addition of shares 5 30 8 20 95 32 45 196 72 43 147 313 User wants to search for B Lagrange interpolation Answer = 1 This is the multiplication of [0,1,0] and [0,1,0] in secret-shared form. So using SSS, we are hiding 1 or 0 from the adversary. Each cloud sends only one value to the user, regardless of dataset size → Less communication cost Dolev et al. Accumulating automata and cascaded equations automata for communicationless information theoretically secure multi-party computation, 2019. Can we use this string-matching technique for solving other operations such as selection and aggregation? V1 V2
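The cloud-side multiplication and the user-side interpolation can be checked numerically. The sketch below hard-codes the share values from the slides; `interpolate_at_zero` is a hypothetical helper name, and exact rational arithmetic stands in for the finite-field arithmetic a real system would likely use.

```python
from fractions import Fraction


def interpolate_at_zero(points):
    """Lagrange interpolation: recover f(0) from a list of (x, f(x)) points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        weight = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if i != j:
                weight *= Fraction(-xj, xi - xj)
        total += yi * weight
    return total


# Shares of the stored value B and of the query predicate B (from the slides).
data_shares = [[5, 10, 2], [10, 19, 4], [15, 28, 6]]
query_shares = [[1, 3, 4], [2, 5, 8], [3, 7, 12]]

# Each cloud multiplies position-wise and adds, using only its own shares.
cloud_results = [
    sum(d * q for d, q in zip(d_row, q_row))
    for d_row, q_row in zip(data_shares, query_shares)
]

# The user interpolates the three returned values (clouds sit at x = 1, 2, 3).
answer = interpolate_at_zero(list(enumerate(cloud_results, start=1)))
print(cloud_results, answer)
```

The product of two degree-1 share polynomials has degree 2, which is why three clouds suffice here.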
  • 66. • Based on string-matching techniques explained previously • Supporting database outsourcing using SSS • Execute complex selection (conjunctive and disjunctive) in an oblivious manner • No communication among servers • Minimize work at the database owner site • Result Verification Methods • Count, Sum, Maximum, Minimum, Top-K • Tuple verification OBSCURE: Oblivious and Verifiable Aggregation Queries Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.” Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
  • 67. OBSCURE: Data Outsourcing using OBSCURE EmpID Name Salary E101 John 1000 E101 John 100000 E102 Adam 5000 E103 Eve 2000 CleartextTID SSTID Salary 5 5 5000 4 4 1000 3 3 1000 2 2 100000 Employee Relation Create shares using SSS Create shares using OP-SS Only order of values is revealed. But, which row has the highest value is not revealed. Fast answering to maximum finding queries. EmpID Name Salary TID Index For verification purpose E101 John 1000 3 3 E101 John 100000 2 2 E102 Adam 5000 5 5 E103 Eve 1000 4 4 E1 E2
  • 68. • Step 1: Convert query predicates to secret-share representation • Step 2: Send the secret-shared query predicates to the servers OBSCURE: Conjunctive Count Query select count(*) from Employee where Name = ‘John’ and Salary = 1000 • String-matching over secret-shares, Name column (John, John, Adam, Eve) against query predicate John: answers 1, 1, 0, 0 • String-matching over secret-shares, Salary column (1000, 100000, 5000, 1000) against query predicate 1000: answers 1, 0, 0, 1 • Multiply the answers row by row: 1, 0, 0, 0 • Add: final answer to the query is 1 • Multiplication increases the degree of the polynomial • If we have fewer servers than the desired number of servers, then we can still solve the problem by 1. Increasing communication rounds 2. Increasing computation time
  • 69. OBSCURE: Count Query – Security Guarantees select count(*) from Employee where Name = ‘John’ and Salary = 1000 • Identical operations on each row → Oblivious execution • Hide access-patterns: The adversary cannot learn which rows have satisfied the query • The adversary cannot learn anything • By observing the values of the data and query predicates, since all values are secret-shared • No output-size attack • String-matching over secret-shares, Name column (John, John, Adam, Eve) against query predicate John: answers 1, 1, 0, 0 • String-matching over secret-shares, Salary column (1000, 100000, 5000, 1000) against query predicate 1000: answers 1, 0, 0, 1
  • 70. Impact of #Shares – Conjunctive Count Query select count(*) from Emp where Name = ‘John’ and Salary = 1000 and Age = 40 • Name (John, John, Adam, Eve) matched against John: 1, 1, 0, 0 • Salary (1000, 100000, 5000, 1000) matched against 1000: 1, 0, 0, 1 • Age (40, 40, 50, 40) matched against 40: 1, 1, 0, 1 • Multiply the three results row by row: 1, 0, 0, 0 • Add: final answer 1 • Polynomial degree = 3 • Min. number of shares to interpolate a polynomial of degree 3 • Need four shares Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.” Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
  • 71. Impact of #Shares – Conjunctive Count Query select count(*) from Emp where Name = ‘John’ and Salary = 1000 and Age = 40 • What if you have only three shares? • Compute the result of any two predicates at the servers, e.g., Salary = 1000 and Age = 40 • And execute the remaining part of the query at the user side • Name (John, John, Adam, Eve) matched against John: 1, 1, 0, 0 • Salary (1000, 100000, 5000, 1000) matched against 1000: 1, 0, 0, 1 • Age (40, 40, 50, 40) matched against 40: 1, 1, 0, 1 • Multiply the Salary and Age results: 1, 0, 0, 1
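Why a degree-3 polynomial needs four shares can be seen on a toy example. The sketch assumes, as the slide's degree count suggests, that each per-predicate match result behaves like a degree-1 share polynomial; the slopes 2, 3, 5 and all function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def interpolate_at_zero(points):
    """Lagrange interpolation: recover f(0) from a list of (x, f(x)) points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        weight = Fraction(1)
        for j, (xj, _) in enumerate(points):
            if i != j:
                weight *= Fraction(-xj, xi - xj)
        total += yi * weight
    return total


# Three degree-1 polynomials, each hiding the bit 1 (a row matching all predicates).
# (secret, slope) pairs; the slopes are made up for this example.
polys = [(1, 2), (1, 3), (1, 5)]


def product_share(x):
    """Share held by the server at point x after multiplying the three results."""
    result = 1
    for secret, slope in polys:
        result *= secret + slope * x
    return result


four_points = [(x, product_share(x)) for x in range(1, 5)]
with_four = interpolate_at_zero(four_points)       # degree 3 needs 4 points
with_three = interpolate_at_zero(four_points[:3])  # too few points: wrong value
print(with_four, with_three)
```

With only three servers the interpolated value is not the hidden product, which is the motivation for splitting the work between servers and user when fewer shares are available.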
  • 72. OBSCURE: Count Query Result Verification EmpID Name Salary TID Index With Something for verification E101 John 1000 3 3 E101 John 100000 2 2 E102 Adam 5000 5 5 E103 Eve 1000 4 4 EmpID Name Salary TID Index A B E101 John 1000 3 3 1 1 E101 John 100000 2 2 1 1 E102 Adam 5000 5 5 1 1 E103 Eve 1000 4 4 1 1 What is this here??? Two columns, each is having 1 of SSS form
  • 73. OBSCURE: Count Query Result Verification Verify the answer of the following query: select count(*) from Employee where Name = ‘John’ and Salary = 1000 1 0 0 0 A 1 1 1 1 B 1 1 1 1 0 1 1 1 1 - Value Multiply 1 0 0 0 0 1 1 1 3 1 Add all values Add all values MultiplyCount query result for each row
  • 74. OBSCURE: Count Query Result Verification Verify the answer of the following query: select count(*) from Employee where Name = ‘John’ and Salary = 1000 1 3 The first value matches the result of the count query → The count query result is correct The sum of the two values equals to the number of rows in the dataset → The server has scanned all the rows to compute the answer
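The verification arithmetic of the last two slides can be sketched on cleartext values. In OBSCURE these computations run over secret-shares; the version below only illustrates the bookkeeping, and the function name `verify_count` is hypothetical.

```python
def verify_count(per_row_match, a_col, b_col, claimed_count):
    """Check a count result using the two all-ones verification columns A and B.

    per_row_match holds 1 for rows satisfying the query and 0 otherwise.
    """
    # Rows that matched, weighted by column A.
    matched = sum(r * a for r, a in zip(per_row_match, a_col))
    # Rows that did not match, weighted by column B.
    unmatched = sum((1 - r) * b for r, b in zip(per_row_match, b_col))
    count_ok = matched == claimed_count           # first value equals the count
    scan_ok = matched + unmatched == len(a_col)   # every row was processed
    return matched, unmatched, count_ok and scan_ok


# Values from the slide: only the first row satisfies Name = 'John' and Salary = 1000.
row_results = [1, 0, 0, 0]
col_a = [1, 1, 1, 1]
col_b = [1, 1, 1, 1]
print(verify_count(row_results, col_a, col_b, claimed_count=1))
```

A server that skipped a row would return values whose sum falls short of the number of rows, so the second check fails.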
  • 75. OBSCURE: Maximum Query select * from Employee where Salary in (select max(Salary) from Employee) EmpID Name Salary Dept TID Index E101 John 1000 Testing 3 3 E101 John 100000 Security 2 2 E102 Adam 5000 Testing 5 5 E103 Eve 1000 Design 4 4 CleartextTID SSTID Salary 5 5 5000 4 4 1000 3 3 1000 2 2 100000 Find the tuple with the maximum salary CleartextTID SSTID Salary 2 2 100000 Output Based on string matching over TID and SSTID, find the tuple having the maximum salary E101 John 100000 Security 2 2 E1 E2
  • 76. • Dataset • TPC-H LineItem Table 1M and 6M rows • Cloud Machines • 15 AWS servers, each 144GB RAM, 3.0GHz Intel Xeon CPU with 72 cores • Database Owner or User Machine • A 16GB RAM machine with one core OBSCURE: Experimental Results Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.” Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
  • 77. OBSCURE vs MPC (communication among servers) Reference: Peeyush Gupta, Yin Li, Sharad Mehrotra, Nisha Panwar, Shantanu Sharma, and Sumaya Almanee. “Obscure: Information-theoretic oblivious and verifiable aggregation queries.” Proceedings of the VLDB Endowment 12, no. 9 (2019): 1030-1043.
  • 78. OBSCURE vs Downloading and Local Processing • 1M rows: at most 13 seconds • 6M rows: at most 50 seconds • Computation time at a resource-constrained user (1GB RAM and single-core 1.35GHz CPU) • 1M rows → at most 13 seconds < 26 seconds (downloading) • 6M rows → at most 50 seconds < 385 seconds (downloading)
  • 79. OBSCURE: Experiments Results – Query Execution vs Verification Time • At most 10 seconds on 1M rows count and sum operation verification • At most 35 seconds on 6M rows count and sum operation verification
  • 80. • Cybernetica • Galois Inc. • Partisia • Secret Double Octopus • SecretSkyDB Ltd • Stealth Software Technologies • Unbound Tech. Industrial Efforts (1)
  • 81. • Built on top of PostgreSQL and supports most of SQL • Offers multiple encryption techniques • Deterministic encryption • Order Preserving Encryption • Multi-Party Computation using SPDZ engine • Users can select different encryption techniques for different attributes • Provides end-to-end privacy preservation for relational queries • Limitations: • Joins are restricted to deterministically encrypted attributes • Search in the MPC domain is very slow Industrial Efforts (2): Jana by Galois Inc. Thanks: David Archer @ Galois
  • 82. • Stealth Software Technologies, Inc. is a small business based in Los Angeles • Team of world-class cryptographers and software engineers who are pioneers in their field and have been building solutions • Private Updateable Lightweight Scalable Active Repository (PULSAR) as part of the DARPA Brandeis program • Aggregate and analyze data while maintaining privacy and authenticity of data • Combines a multitude of cryptographic techniques • Secure database via access-pattern hiding Searchable Encryption [IKLO16] • Function Secret Sharing [BGI15] • Efficient Secure Multiparty Computation (MPC) • Garbled RAM [LO13] Industrial Efforts (3): PULSAR by Stealth Software Technologies, Inc. Thanks: Steve Lu @ Stealth
  • 83. Comparing Secret-Sharing-based Systems (Pulsar v0.5.1-269, Jana v1.7.6, OBSCURE) • Incremental insert support: Pulsar: No; Jana: Yes; OBSCURE: Yes • Indexing support: Pulsar: Yes; Jana: Yes (limited to DET and plaintext); OBSCURE: No • Support for sensitivity: Pulsar: No (encrypts all data); Jana: Yes (encrypts attributes as described by the application designer); OBSCURE: No • Select and range queries: Pulsar: Yes; Jana: Yes; OBSCURE: Yes • Analytical query support (group by and join): Pulsar: No (but application-level joins possible, though may leak data); Jana: Partial (join only on plaintext or DET attributes); OBSCURE: No join • User-defined functions: Pulsar: No; Jana: No; OBSCURE: No • Academic/Industry: Pulsar: Industry; Jana: Industry; OBSCURE: Academic
  • 84. • Information-theoretically secure • Secure regardless of the computational power of an adversary • Quantum secure • Communication overheads • Cannot deal with complex queries • Join queries • Nested queries • Requires more than one non-communicating server Pros and Cons of Secret-Sharing-based Techniques
  • 85. • Introduction • How to securely process data at the cloud? • Challenges and overview of existing state-of-the-art • Cryptographic Techniques • Encryption-based Data Outsourcing • Secret-Sharing-based Data Outsourcing • Exploiting Trusted Computing Platforms • Secure hardware • Hybrid cloud • Data Partitioning-based Outsourced Data Processing • Conclusion and Open Problems Contents
  • 86. Motivation for SGX • Security and isolation in commodity systems • Privilege levels (rings) protect the kernel from user programs
  • 87. Motivation for SGX • Security and isolation in commodity systems • Privilege levels (rings) protect the kernel from user programs • Page tables protect programs from each other
  • 88. Motivation for SGX • Security and isolation in commodity systems • Privilege levels (rings) protect the kernel from user programs • Page tables protect programs from each other
  • 89. Operating Systems haven’t changed for decades • 40 years old • Time-sharing • Expensive hardware • Overly general Ken Thompson (sitting) and Dennis Ritchie working together at a PDP-11 (1972)
  • 90. • 17,000,000 LoC • 40 subsystems • 3,200 device drivers
  • 91. Modern Kernels are Vulnerable [Figure: Linux kernel vulnerabilities by year, 2009 to 2019]
  • 92. Motivation for SGX • Security and isolation in commodity systems • Privilege levels (rings) protect the kernel from user programs • Page tables protect programs from each other • Until one program (malware) attacks the kernel and then attacks any program in the system
  • 93. TCB of A Modern System • Attack surface is giant • OS kernel • 17,000,000 lines of code • 40 major subsystems • 3,200 device drivers • Virtual Machine Monitor • Hypervisor • QEMU emulator • Device drivers • Parts of host kernel (KVM)/Domain0 (Xen)
  • 94. Enclaves • Applications can protect their secrets • TCB is small • Intel CPU • App code itself • Protected from malicious • BIOS • SMM • Hypervisor • Kernel • Familiar application environment
  • 95. SGX Enclaves • Trusted execution environment embedded in the process
  • 96. SGX Enclaves • Trusted execution environment embedded in the process • Its own code and data • Controlled entry points • Multi-threading • Confidentiality • Integrity
  • 98. • Enters and exits are expensive • Memory is encrypted • Limited physical memory Performance
  • 100. Powerful Adversary Model • OS + VMM • Controlled execution environment • Control over page faults • Suspending execution • Single stepping • Flushing caches
  • 102. • Every architectural component of the CPU • Branch target buffers • S. Lee et al., “Inferring fine-grained control flow inside SGX enclaves with branch shadowing,” in USENIX Security, 2017 • G. Chen et al., “SgxPectre attacks: Stealing Intel secrets from SGX enclaves via speculative execution,” arXiv preprint, 2018. • Pattern-history table • D. O'Keeffe et al., "Spectre attack against SGX enclave," 2018 • Caches • Brasser et al., "Software grand exposure: SGX cache attacks are practical," in WOOT, 2017 • J. Gotzfried et al., "Cache attacks on Intel SGX," in EuroSec, 2017 • A. Moghimi et al., "Cachezoom: How SGX amplifies the power of cache attacks," in CHES, 2017 • M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017 • M. Schwarz et al., "Malware guard extension: Using SGX to conceal cache attacks," in DIMVA, 2017 • DRAM row buffer • W. Wang et al., "Leaky cauldron on the dark land: Understanding memory side-channel hazards in SGX," in CCS, 2017 • Page-tables • W. Wang et al., “Leaky cauldron on the dark land: Understanding memory side-channel hazards in SGX,” in CCS, 2017 • J. Van Bulck et al., “Telling your secrets without page faults: stealthy page table-based attacks on enclaved execution,” in USENIX, 2017 • Page-fault exception handlers • Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015 • S. Shinde et al., “Preventing page faults from telling your secrets,” in CCS, 2016 • Speculative execution • J. V. Bulck et al., “Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution,” in USENIX, 2018 Side-Channel Attacks
  • 103. • Controlled channel attacks Page-Fault Tracing Attacks Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
  • 104. • Page fault address depends on sensitive data Page Fault Tracing Attacks Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
  • 105. • Insertions are deterministic • Word order is known • Observe the sequence of page faults • Lookups exhibit the same sequences Example: Recovering Text via Spell Checker Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
  • 106. • Wizard of Oz • All words • 96% accuracy Example: Recovering Text via Spell Checker Reference: Y. Xu et al., “Controlled-channel attacks: Deterministic side channels for untrusted operating systems,” 2015
  • 108. Cache Attacks: Prime + Probe Reference: Brasser et al., "Software grand exposure: SGX cache attacks are practical," in WOOT, 2017
  • 109. • Isolated core • Execute attack in L1 • Separate instruction and data caches • No self-pollution • SMT • Uninterrupted execution • Performance Monitoring Counters (PMC) • Cache-misses Controlled Execution Environment Reference: Brasser et al., "Software grand exposure: SGX cache attacks are practical," in WOOT, 2017
  • 110. Example: Cache-Tracing Attack Reference: M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017
  • 111. Text Reconstruction Reference: M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017
  • 112. Cache-Tracing: Reconstructed Text Reference: M. Hahnel et al., "High-resolution side channels for untrusted operating systems," in USENIX ATC, 2017
  • 113. • SGX does not clear branch history Branch Shadowing Attack Reference: S. Lee et al., “Inferring fine-grained control flow inside SGX enclaves with branch shadowing,” in USENIX Security, 2017
  • 114. • SGX does not clear branch history • Can we extract this information? Branch Shadowing Attack Reference: S. Lee et al., “Inferring fine-grained control flow inside SGX enclaves with branch shadowing,” in USENIX Security, 2017
  • 115. • 66% of a 1024-bit RSA private key from a single run • Full key from 10 runs Branch Shadowing Attack Reference: S. Lee et al., “Inferring fine-grained control flow inside SGX enclaves with branch shadowing,” in USENIX Security, 2017
  • 117. Data-Oblivious Primitives • Assignments and comparisons Reference: Ohrimenko, Olga, et al. "Oblivious multi-party machine learning on trusted processors." USENIX Security, 2016.
  • 118. Data-Oblivious Primitives • Assignments and comparisons Reference: Ohrimenko, Olga, et al. "Oblivious multi-party machine learning on trusted processors." USENIX Security, 2016.
  • 119. • Array access • Scan entire array • AVX instructions Data-Oblivious Primitives Reference: Ohrimenko, Olga, et al. "Oblivious multi-party machine learning on trusted processors." USENIX Security, 2016.
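The "scan the entire array" primitive can be illustrated as follows. This is a sketch of the access pattern only: Python offers no constant-time guarantees, and the cited work relies on conditional-move and AVX instructions inside the enclave. The function name `oblivious_read` is hypothetical.

```python
def oblivious_read(array, secret_index):
    """Return array[secret_index] while touching every element exactly once.

    The sequence of memory locations read is the same for every index,
    so an observer of the access pattern learns nothing about secret_index.
    """
    result = 0
    for i, value in enumerate(array):
        is_target = int(i == secret_index)   # 1 for the wanted slot, else 0
        # Arithmetic select instead of an if/else on the secret.
        result = is_target * value + (1 - is_target) * result
    return result


print(oblivious_read([10, 20, 30, 40], 2))
```

The cost is a full linear scan per access, which is the price paid for hiding which slot was wanted.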
  • 121. • Will be fixed • Caches • Partitioned caches • Branch predictors and likely other microarchitectural components of the CPU • Speculative Taint Tracking (STT) • Yu, Jiyong, et al. "Speculative Taint Tracking (STT): A Comprehensive Protection for Speculatively Accessed Data." Micro, 2019 • What will not be fixed • Paging attacks • SGX inherently leaves page table under control of the OS • Memory • Enclave’s memory is observable by the OS and hardware attacks • ORAM is 10x overhead What will be fixed in hardware
  • 122. Can we design any system supporting database operations using SGX? • It’s possible to build an oblivious database • Oblivious primitives for accessing records • Oblivious sort for joins • Parallel Bitonic sort N*(log(N))^2 Question
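Bitonic sort suits oblivious processing because the pairs of positions it compares are fixed in advance and do not depend on the data. The sketch below is a textbook sequential version, assuming the input length is a power of two; it is not taken from any of the systems discussed, and it also counts the compare-exchange steps.

```python
def bitonic_sort(values):
    """Sort with a fixed compare-exchange network; length must be a power of two.

    Returns the sorted list and the number of compare-exchange steps.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    comparisons = 0
    k = 2
    while k <= n:                      # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:                  # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    lo, hi = data[i], data[partner]
                    # The pair (i, partner) is visited regardless of the values.
                    if (lo > hi) == ascending:
                        data[i], data[partner] = hi, lo
                    comparisons += 1
            j //= 2
        k *= 2
    return data, comparisons


print(bitonic_sort([7, 3, 5, 1, 8, 2, 6, 4]))
```

For N = 8 the network has 6 stages of 4 comparisons each, which matches the N*(log(N))^2 growth quoted on the slide up to a constant factor.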
  • 123. Opaque (1) Patient ID disease 1 Fever 2 Cancer 3 Fever 4 Cancer 5 Cancer Reference and slide credit: Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. "Opaque: An oblivious and encrypted distributed analytics platform." In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 283-298. 2017. • First system supporting database joins and aggregations using SGX • However, supports primary key to foreign key join only • How many people are suffering from cancer and fever
  • 124. Opaque (2) • How many people are suffering from cancer and fever Patient ID disease 1 Fever 2 Cancer 3 Fever 4 Cancer 5 Cancer Reference: Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. "Opaque: An oblivious and encrypted distributed analytics platform." In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 283-298. 2017.
  • 125. Opaque (2): Oblivious Aggregation • How many people are suffering from cancer and fever Patient ID disease 1 Fever 2 Cancer 3 Fever 4 Cancer 5 Cancer Patient ID disease 2 Cancer 4 Cancer 5 Cancer 1 Fever 3 Fever Quicksort in SGX Cancer, 3 Fever, 2 Patient ID disease 1 Fever 2 Cancer 3 Fever 4 Cancer 5 Cancer Decrypt Reference: Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. "Opaque: An oblivious and encrypted distributed analytics platform." In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pp. 283-298. 2017. • What is the wrong assumption in Opaque? • It does not deal with side-channel attacks (cache-line, branching) • But not all side-channel attacks can be solved in the future
  • 126. • ObliDB • Selection and join • HardIDX • Secure indexes using SGX • VC3 • For secure MapReduce computations • EnclaveDB • For secure transaction support • Hermetic • Mixed differential privacy Other Systems using Intel SGX
  • 128. • What have we seen in the previous slides? • Many cryptographic solutions exist • Not efficient for answering even simple queries Scaling Secure Data Management “At scale” solutions require a choice between generality, security, or performance. Weaker Security Models: use weaker models of security to scale computation (explored in several prior systems) Partitioned Computing: exploit partial sensitivity of data to avoid expensive cryptography on data that is not sensitive
  • 129. Partitioned Data Security • Non-Linkability • The adversary does not learn the relationship between any encrypted and plaintext value • Ciphertext Indistinguishability • The adversary does not learn any relationships between encrypted values • unless the underlying crypto allows such relationships to be learnt (e.g., OPE) Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 650-661. IEEE, 2019.
  • 130. Partitioned Computations at Public Cloud (1) Name Department t1 E(Adam) E(Defense) t2 E(John) E(Security) t3 E(Clark) E(Crypto) t4 E(Lisa) E(Defense) Name Department t5 Adam Testing t6 John Testing t7 Lisa Design t8 Clark Design Query Q Answer A Query Qs Query Qns Answer Ans Answer As Sensitive Data Ds Non-sensitive Data Dns Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 650-661. IEEE, 2019.
  • 131. Leakage due to Partitioned Computing… (2) Name Department t1 E(Adam) E(Defense) t2 E(John) E(Security) t3 E(Clark) E(Crypto) t4 E(Lisa) E(Defense) Name Department t5 Adam Testing t6 John Testing t7 Lisa Design t8 Clark Design Sensitive Data Ds Non-sensitive Data Dns Query: Retrieve John rows Query value Tuples retrieved from sensitive side Tuples retrieved from non-sensitive side John T2 T6 Adversarial view T2 is John’s row. Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE 35th International Conference on Data Engineering (ICDE),
  • 132. What if we use access-pattern-hiding techniques? (3) Name Department t1 E(Adam) E(Defense) t2 E(John) E(Security) t3 E(Clark) E(Crypto) t4 E(Lisa) E(Defense) Name Department t5 Adam Testing t6 John Testing t7 Lisa Design t8 Clark Design Sensitive Data Ds Non-sensitive Data Dns Query: Retrieve John rows Query value Tuples retrieved from sensitive side Tuples retrieved from non-sensitive side John E(….) T6 Adversarial view Output size reveals that one of John’s record is sensitive. Reference: Sharad Mehrotra, Shantanu Sharma, Jeffrey Ullman, and Anurag Mishra. "Partitioned data security on outsourced sensitive and non-sensitive data." In 2019 IEEE 35th International Conference on Data Engineering (ICDE),
  • 133. Secure Partitioned Computation (1) • Data partitioned into bins • Non-sensitive data partitioned into non-sensitive bins (NSB) • Sensitive data partitioned into sensitive bin (SB) ……E( x)…….. …… x …….. …… y…….. …… z .…….. …….…….. ……E(y) …….. …… E(z)…….. …….…….. Ds Dns SB(x) SB(y) SB(z) NSB(x) NSB(y) NSB(z) Query value Tuples retrieved from sensitive side Tuples retrieved from non-sensitive side John SB(y) NSB(y) Adversarial view • Query Q for value y mapped to all values in the bin corresponding to y • Retrieves all data in NSB(y) over non-sensitive data • Retrieves all data in SB(y) over sensitive data
  • 134. Secure Partitioned Computation (2) • Bins are created such that for all pairs of sensitive and non-sensitive bins, there exists a value v, • such that s = SB(v) and ns = NSB(v) • The adversarial view does not allow the adversary to learn linkability between sensitive and non-sensitive records ……E( x)…….. …… x …….. …… y…….. …… z .…….. …….…….. ……E(y) …….. …… E(z)…….. …….…….. Ds Dns SB(x) SB(y) SB(z) NSB(x) NSB(y) NSB(z)
  • 135. Secure Partitioned Computation (3) • Association amongst each sensitive bin and non-sensitive bin prevents • Leakage through joint access of data • Output size attacks • Workload skew attacks can be prevented through (careful) addition of (minimal) fake queries ……E( x)…….. …… x …….. …… y…….. …… z .…….. …….…….. ……E(y) …….. …… E(z)…….. …….…….. Ds SB(x) SB(y) SB(z) NSB(x) NSB(y) NSB(z) Dns
  • 136. Query Binning • Assumptions • Equal number of sensitive and non-sensitive attribute values • Each distinct attribute value appears in at most one tuple in sensitive and one tuple in non-sensitive data • The number of values is a product of approximately equal factors ***The paper relaxes all these assumptions
  • 137. The Algorithm: One Tuple Per Value Bin Creation: Inputs: S and NS • Permute all sensitive values • Find approximate square factors of |NS| = x * y such that x ≥ y • Create x sensitive bins, each containing at most y values • Create |NS|/x non-sensitive bins • Assign the ith sensitive value to the (i mod x)th sensitive bin • Assigning non-sensitive values: Assign the non-sensitive value corresponding to the ith sensitive value, which is allocated to the jth bin, to the jth position of the ith non-sensitive bin • NSB[j][i] ← allocateNS(SB[i][j]) • Fill remaining NS values S = 6 NS = 6 x = 3 y = 2 SB1 SB2 SB3 NSB1 NSB2 S1 S2 S3 S4 S5 S6 NS2 NS3 NS1 NS7 NS6 NS4 S = {S1, S2, S3, S4, S5, S6} NS = {NS1, NS2, NS3, NS6, NS7}
  • 138. The Algorithm: One Tuple Per Value • Bin Retrieval: Input: Query(w) • If w is in a sensitive bin SB[i][j], then • Retrieve ith sensitive bin and jth non-sensitive bin • If w is in a non-sensitive bin NSB[i][j], then • Retrieve ith non-sensitive bin and jth sensitive bin S = 6 NS = 6 x = 3 y = 2 S = {S1, S2, S3, S4, S5, S6} NS = {NS1, NS2, NS3, NS6, NS7} Query: S2 SB2, NSB1 Query: NS7 NSB1, SB2 SB1 SB2 SB3 NSB1 NSB2 S1 S2 S3 S4 S5 S6 NS2 NS3NS1 NS7 NS6NS4
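Bin creation and bin retrieval for the one-tuple-per-value case can be sketched as below. Several things are assumptions made for illustration: each sensitive value is paired with exactly one non-sensitive value (hypothetically named NS1 to NS6 here, which differs from the labels in the slide's figure), the initial permutation and the "fill remaining NS values" step are omitted, and the function names and zero-based indices are not from the paper.

```python
def create_bins(pairs, x):
    """Build sensitive and non-sensitive bins.

    pairs: list of (sensitive_value, associated_non_sensitive_value).
    x: number of sensitive bins.
    """
    y = -(-len(pairs) // x)                       # ceil(len / x) values per sensitive bin
    sensitive_bins = [[] for _ in range(x)]
    non_sensitive_bins = [[None] * x for _ in range(y)]
    for idx, (s_val, ns_val) in enumerate(pairs):
        i = idx % x                               # sensitive bin index
        j = len(sensitive_bins[i])                # position inside that bin
        sensitive_bins[i].append(s_val)
        non_sensitive_bins[j][i] = ns_val         # NSB[j][i] <- allocateNS(SB[i][j])
    return sensitive_bins, non_sensitive_bins


def bins_to_retrieve(value, sensitive_bins, non_sensitive_bins):
    """Return (sensitive bin index, non-sensitive bin index) fetched for a query."""
    for i, sb in enumerate(sensitive_bins):
        if value in sb:
            return i, sb.index(value)             # SB[i][j]: fetch SB i and NSB j
    for i, nsb in enumerate(non_sensitive_bins):
        if value in nsb:
            return nsb.index(value), i            # NSB[i][j]: fetch NSB i and SB j
    raise KeyError(value)


pairs = [(f"S{k}", f"NS{k}") for k in range(1, 7)]
sb, nsb = create_bins(pairs, x=3)
print(sb, nsb)
print(bins_to_retrieve("S2", sb, nsb))   # zero-based: SB2 and NSB1 on the slide
```

Querying S2 fetches the second sensitive bin and the first non-sensitive bin, the same pair the slide lists for that query; a query for the paired non-sensitive value fetches the same two bins, so the adversary cannot tell which side the query targeted.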
  • 139. Query Execution Cost on Outsourced Data Techniques Time Resilient to attacks Size Workload-skew Access-patterns SGX 10500x Query Binning + SGX (60% sensitivity) 8929x Multi-party computations-Jana 954363x Query Binning + Jana (60% sensitivity) 680131x x is the time to search a predicate in cleartext. is showing a technique is resilient to a given attack. Experiments are conducted over 1.5M rows.
  • 140. Experimental Results (Selection Query) • X-axis = Data sensitivity (1%, 2%, 20%, 40%, 60%) • Y-axis = time SGX Opaque + Partition computing vs SGX Opaque Data set size = 6M rows Jana MPC + Partition computing vs Jana MPC Data set size = 1M rows
  • 141. Analytical Model • When is query binning better compared to a pure cryptographic approach? • Ratio of cost of QB versus crypto-only approach • After several rounds of simplifications (see paper) • Under ideal assumptions, QB is better than a cryptographic-only solution if this holds (see paper) • Ratio of computation cost of cryptographic techniques vs plaintext per tuple • Ratio of cryptographic computation vs communication cost per tuple (typically much greater than 1 for strong cryptographic techniques) • Average query selectivity • Ratio of sensitive data
  • 142. • If there is no approximate square factor? • Select nearest square number • If there is no 1-to-1 mapping of sensitive and non-sensitive value, and differences in size of the values? • Bin-packing algorithm • What about range queries? • With the help of a modified B-tree created over non-sensitive bins • What about join queries? • Keep pseudo-sensitive data with sensitive data • What about aggregation queries? • Execute like a selection query without tuple fetching Query Binning Extensions
  • 143. Distinct Values are not a Product of Approximately Square Factor (1) • What will happen when the number of distinct values is not a product of approximately square factors? • Increasing communication cost • For example, 82 non-sensitive values result in 41 sensitive bins and 2 non-sensitive bins ns1, ns2, …, ns41 ns42, ns43, …, ns82 E(s1) E(s2) E(s41) SB1 SB2 SB41 NSB1 NSB2 Communication cost = 42 At most 1 value in a sensitive bin At most 41 values in a non-sensitive bin
  • 144. Distinct Values are not a Product of Approximately Square Factor (2) • Reducing communication cost by finding the nearest square number • In the case of 82 non-sensitive values, 81 is the nearest square number • Thus, create 9 sensitive bins and 9 non-sensitive bins ns1, ns2, …, ns10 ns11, ns12, …, ns19 ….E(x)…. …E(y)….. ….E(z)….. SB1 SB2 SB9 41 sensitive values 82 non-sensitive values Communication cost = 15 ns74, ns75, …, ns82 At most 5 values in a sensitive bin At most 10 values in a non-sensitive bin NSB1 NSB2 NSB9
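The two communication costs on these slides (42 and 15) follow from simple counting: a query fetches one sensitive bin and one non-sensitive bin, so its cost is the sum of the two maximum bin sizes. The helper name `retrieval_cost` is hypothetical, and cost is counted in values fetched per query, as on the slides.

```python
import math


def retrieval_cost(num_sensitive, num_non_sensitive, sensitive_bins, non_sensitive_bins):
    """Values fetched per query: one sensitive bin plus one non-sensitive bin."""
    per_sb = math.ceil(num_sensitive / sensitive_bins)
    per_nsb = math.ceil(num_non_sensitive / non_sensitive_bins)
    return per_sb + per_nsb


# Exact factorisation 82 = 41 * 2: 41 sensitive bins, 2 non-sensitive bins.
cost_exact = retrieval_cost(41, 82, sensitive_bins=41, non_sensitive_bins=2)

# Nearest square number to 82 is 81 = 9 * 9: 9 bins on each side.
side = round(math.sqrt(82))
cost_square = retrieval_cost(41, 82, sensitive_bins=side, non_sensitive_bins=side)

print(cost_exact, cost_square)
```

Balancing the two bin sizes is what brings the cost down from 42 to 15.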
  • 145. The Algorithm: General Case: Multiple Tuples per Value (1) • What will happen if all values have a different number of tuples? • The size of each sensitive bin is different now • Assumption: More non-sensitive values have more sensitive associated tuples. • From tuple retrieval, the adversary learns which bin contains the sensitive value corresponding to a non-sensitive value • E.g., retrieval of SB1 and NSB1 reveals that S1 is allocated to SB1 S = 6 NS = 6 x = 3 y = 2 SB1 SB2 SB3 NSB1 NSB2 S1 S2 S3 S4 S5 S6 NS2 NS3 NS1 NS7 NS6 NS4 S1 = 10 S2 = 2 S3 = 1 S4 = 15 S5 = 2 S6 = 1 NS1 = 200 NS2 = 20 NS3 = 10 NS4 = 150 NS5 = 10 NS7 = 10 Size of bin 25 4 2 Size of bin 230 170
  • 146. The Algorithm: General Case: Multiple Tuples per Value (2) • What will happen if all values have a different number of tuples? • Solution: Simply add fake tuples to sensitive bins • Problem: too many fake tuples lead to increased communication cost • So how to overcome this problem? S = 6 NS = 6 x = 3 y = 2 SB1 SB2 SB3 NSB1 NSB2 S1 S2 S3 S4 S5 S6 NS2 NS3 NS1 NS7 NS6 NS4 S1 = 10 S2 = 2 S3 = 1 S4 = 15 S5 = 2 S6 = 1 NS1 = 200 NS2 = 20 NS3 = 10 NS4 = 150 NS5 = 10 NS7 = 10 Size of bin 25 4 2 Size of bin 230 170 Added fake tuples 0 21 23 We add 44 fake tuples to sensitive data
  • 147. The Algorithm: General Case: Multiple Tuples per Value (3) • What will happen if all values have a different number of tuples? • Solution: Bin-packing-based approach • Sorting: Sort all the values in decreasing order of the number of tuples. • Allocate sensitive values • Add fake tuples • Allocate non-sensitive values as we showed previously S = 6 NS = 6 x = 3 y = 2 SB1 SB2 SB3 NSB1 NSB2 S4 S1 S2 S6 S3 S5 NS1 NS2 NS7 NS3 NS5 NS6 S1 = 10 S2 = 2 S3 = 1 S4 = 15 S5 = 2 S6 = 1 NS1 = 200 NS2 = 20 NS3 = 10 NS4 = 150 NS5 = 10 NS7 = 10 Size of bins before adding fake tuples 16 11 4 Added fake tuples 0 5 12 S4 = 15 S1 = 10 S2 = 2 S5 = 2 S3 = 1 S6 = 1 After sorting We add fewer fake tuples than the simple solution of adding fake tuples (simple solution: 44 fake tuples; bin-packing: 17 fake tuples)
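The fake-tuple counts on these slides (44 for the simple padding, 17 after sorting) can be reproduced with a small greedy packing. The greedy rule below, which places the largest remaining value into the least-loaded bin that still has room, is an assumption chosen because it reproduces the slide's bin sizes; the paper's exact procedure may differ. Function names are hypothetical.

```python
def fake_tuples(bins, sizes):
    """Fake tuples needed per bin so that every bin matches the largest one."""
    loads = [sum(sizes[v] for v in b) for b in bins]
    return [max(loads) - load for load in loads]


def pack_sorted(sizes, num_bins, capacity):
    """Greedy packing: largest value first, into the least-loaded bin with room."""
    bins = [[] for _ in range(num_bins)]
    loads = [0] * num_bins
    for value in sorted(sizes, key=sizes.get, reverse=True):
        open_bins = [i for i in range(num_bins) if len(bins[i]) < capacity]
        target = min(open_bins, key=lambda i: loads[i])
        bins[target].append(value)
        loads[target] += sizes[value]
    return bins


# Number of tuples per sensitive value, from the slides.
sizes = {"S1": 10, "S2": 2, "S3": 1, "S4": 15, "S5": 2, "S6": 1}

naive_bins = [["S1", "S4"], ["S2", "S5"], ["S3", "S6"]]   # round-robin allocation
packed_bins = pack_sorted(sizes, num_bins=3, capacity=2)

naive_fake = fake_tuples(naive_bins, sizes)
packed_fake = fake_tuples(packed_bins, sizes)
print(naive_fake, sum(naive_fake))
print(packed_bins, packed_fake, sum(packed_fake))
```

Spreading the large values across bins narrows the gap between the largest and smallest bin, so far less padding is needed.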
  • 148. Range Queries • A full binary tree is constructed over all non-sensitive values • Bins are created for each level of the tree, except the root node • Bins are retrieved based on least-matching • For example, a range query from ns8 to ns12 → Bins as per node ns23 and ns8 Bins for each node of each level of the tree
  • 149. • Introduction • How to securely process data at the cloud? • Challenges and overview of existing state-of-the-art • Cryptographic Techniques • Encryption-based Data Outsourcing • Secret-Sharing-based Data Outsourcing • Exploiting Trusted Computing Platforms • Secure hardware • Hybrid cloud • Data Partitioning-based Outsourced Data Processing • Conclusion and Open Problems Contents
  • 150. • We discussed: • Encryption-based techniques and systems • Secret-sharing-based techniques and systems • Existing cryptographic techniques trade off • Functionality vs security vs overhead • Secret-sharing is secure but has limited applicability • Searchable encryption is fast but reveals information • Trusted platform-based approaches are faster than cryptographic techniques • But there is no completely trusted platform at the public cloud • Existing secure hardware has several vulnerabilities • Can we exploit a secure mediation approach? • Different cryptographic techniques at the same time • Security is not clear • Initial effort: partitioned computation but security challenges -- a naïve query execution on partitioned data can lead to information leakage Conclusion
  • 151. Contact Information Shantanu Sharma University of California, Irvine, USA. shantanu.sharma[AT]uci[DOT]edu toshantanusharma[AT]gmail[DOT]com Slides are available at ics.uci.edu/~shantas/

Editor's Notes

  1. Despite the numerous ways in which the use of computer systems has evolved over the last four decades, modern OS kernels rely on software development technology that has not changed since early computer systems. Modern kernels are still developed with legacy software engineering techniques: a combination of an unsafe programming language and virtually no testing or verification tools.
  2. Modern kernels are notoriously complex. A typical kernel like Linux consists of 12,000,000 lines of code, 40 major subsystems, and over 3,000 device drivers. Kernel code routinely employs -- manual management of low-level concurrency primitives, -- handles millions of object allocations and deallocations per second, -- implements numerous security and access control checks, -- and adheres to multiple conventions that describe allocation, locking, and synchronization of kernel data structures in nearly every kernel function.