The document presents two methods for protecting users' social information (the relationships among their accounts) in outsourced databases: Query Generalization by Dynamic Hash and Result Generalization by Bloom Filter. Query generalization uses a dynamic hash to merge different users' queries into the same generalized query, so the server cannot infer relationships from who sends which query. Result generalization uses Bloom-filter-style bit strings so that users also receive irrelevant tuples, which hides which users request the same tuples. The goal is to keep the outsourced database provider from discovering users' social information. Future work involves implementing these methods on a real outsourced database service and evaluating them.
Securing Social Information from Query Analysis in Outsourced Databases
Junpei Kawamoto and Masatoshi Yoshikawa (Kyoto University)
iDB Forum 2008, 2008/9/22
AGENDA
1. Security problems of outsourced databases
2. The data encryption model we employ
3. Query Generalization by Dynamic Hash
4. Result Generalization by Bloom Filter
5. Conclusion and future work
BACKGROUND OF OUR RESEARCH
• Outsourced database systems are in wide use
  - We can delegate database management to service providers, which saves on management costs
  - We do not need to accept requests from outside the local network, which protects the security of the local network
[Figure: traditional database system vs. outsourced database system]
PROBLEMS
• Security of users’ data
  - The service provider has full authority over the databases
  - The service provider can inspect users’ data
  - Client-side data encryption solves this problem
• Security of users’ social information
  - Social information means the relationships among users’ accounts
WHY IS SOCIAL INFORMATION IMPORTANT?
• Leaking social information creates two kinds of risk
  - Personal information could be compromised through a chain effect
  - Different online personae could be linked
WHY IS SOCIAL INFORMATION IMPORTANT?
• Personal information could be compromised through a chain effect
[Figure: accounts ID: lee, ID: carol, and ID: alice_gc and the relationships among them]
  - Alice carelessly disclosed her personal information (Name: Alice, Boss: Bob, etc.).
  - Bob did not disclose his information; however, the server can guess his name.
  - Through this chain effect, the server may be able to guess other users' information.
WHY IS SOCIAL INFORMATION IMPORTANT?
• Different online personae could be linked
[Figure: Alice's public (official) account ID: alice_gc, her private account ID: Cheshire, and other people's accounts]
  - The server could guess that both login names are associated with Alice.
OUR PROPOSED METHODS
• Query Generalization by Dynamic Hash
• Result Generalization by Bloom Filter
DATA ENCRYPTION MODEL
• We use a traditional data encryption model
Original table (not encrypted):
No. | name            | begin      | end        | acl (who has authority)
1   | Products Review | 7/17 15:00 | 7/17 18:00 | Alice, Bob
2   | Business Trip   | 7/18 10:00 | 7/20 18:00 | Alice, Bob
3   | Team Meeting    | 7/20 15:00 | 7/20 18:00 | Alice, Carol
4   | Business Trip   | 7/21 12:00 | 7/21 17:00 | Dave
Encrypted table (with index):
No. | etuple      | Iname | Ibegin | Iend
1   | 5f0f1f46... | 00    | 10     | 10
2   | b98009af... | 01    | 00     | 11
3   | 082ba604... | 10    | 11     | 11
4   | 8bc546af... | 01    | 01     | 01
  - etuple is the encrypted original tuple: etuple = Encrypt(name, begin, end, acl)
  - Iname, Ibegin, and Iend are hash indices used for query processing, e.g., Iname = Hash(name)
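A minimal Python sketch of building the encrypted table, under stated assumptions: the slides do not specify Encrypt or Hash, so this uses a toy SHA-256 keystream as a stand-in for Encrypt (not secure; a real system would use an authenticated cipher from a vetted library) and a keyed hash (HMAC-SHA256) truncated to 2 bits as Hash. The functions `encrypt`, `decrypt`, `bucket`, and `encrypt_row` are hypothetical names. Because the keys are random, the bucket labels will differ from the slide's, but equal values always share a bucket.

```python
import hashlib
import hmac
import json
import os

NONCE_LEN = 16


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over (key, nonce, counter). NOT a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in for Encrypt: random nonce followed by the XOR-ed plaintext."""
    nonce = os.urandom(NONCE_LEN)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, ks))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, ks))


def bucket(key: bytes, value: str, bits: int = 2) -> str:
    """Hash index: keyed hash of the value, truncated to `bits` bits (1 <= bits <= 8)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return format(digest[0] >> (8 - bits), f"0{bits}b")


def encrypt_row(enc_key: bytes, idx_key: bytes, row: dict) -> dict:
    """One row of the encrypted table: etuple plus the hash indices."""
    etuple = encrypt(enc_key, json.dumps(row).encode("utf-8"))
    return {
        "etuple": etuple.hex(),
        "Iname": bucket(idx_key, row["name"]),
        "Ibegin": bucket(idx_key, row["begin"]),
        "Iend": bucket(idx_key, row["end"]),
    }


ROWS = [
    {"name": "Products Review", "begin": "7/17 15:00", "end": "7/17 18:00", "acl": ["Alice", "Bob"]},
    {"name": "Business Trip", "begin": "7/18 10:00", "end": "7/20 18:00", "acl": ["Alice", "Bob"]},
    {"name": "Team Meeting", "begin": "7/20 15:00", "end": "7/20 18:00", "acl": ["Alice", "Carol"]},
    {"name": "Business Trip", "begin": "7/21 12:00", "end": "7/21 17:00", "acl": ["Dave"]},
]

ENC_KEY, IDX_KEY = os.urandom(32), os.urandom(32)  # held by the client only
TABLE = [encrypt_row(ENC_KEY, IDX_KEY, row) for row in ROWS]
```

Keying the index hash is a design choice of this sketch: with an unkeyed hash, the server could hash candidate values itself and read the buckets.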
DATA ENCRYPTION MODEL
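• Query processing on the data encryption model
  - Original query: Name = “Business Trip” & Begin = “7/18 10:00”
  - Query on server: Iname = “01” & Ibegin = “00”
  - Query on client: Name = “Business Trip” & Begin = “7/18 10:00”
The Business Trip tuples, decrypted (client side):
No. | name          | begin      | end
2   | Business Trip | 7/18 10:00 | 7/20 18:00
4   | Business Trip | 7/21 12:00 | 7/21 17:00
The same tuples, encrypted (server side):
etuple      | Iname | Ibegin | Iend
b98009af... | 01    | 00     | 11
8bc546af... | 01    | 01     | 01
The client translates the original query into a query over the hash indices, the server evaluates it on the indices and returns the matching etuples, and the client decrypts them and evaluates the original query again to discard false positives caused by hash collisions.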
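The three-step flow can be sketched in Python using the values from the slide. Assumptions: decryption is simulated by a lookup table (`CLIENT_DECRYPT`), the client's Hash by a dictionary reproducing the slide's bucket labels (`CLIENT_HASH`), and all function names are hypothetical.

```python
# Encrypted table as stored on the server (values from the slide; etuple shown truncated).
SERVER_TABLE = [
    {"etuple": "5f0f1f46...", "Iname": "00", "Ibegin": "10", "Iend": "10"},
    {"etuple": "b98009af...", "Iname": "01", "Ibegin": "00", "Iend": "11"},
    {"etuple": "082ba604...", "Iname": "10", "Ibegin": "11", "Iend": "11"},
    {"etuple": "8bc546af...", "Iname": "01", "Ibegin": "01", "Iend": "01"},
]

# Hypothetical stand-in for decryption: only the client can map an etuple to its tuple.
CLIENT_DECRYPT = {
    "5f0f1f46...": {"No": 1, "name": "Products Review", "begin": "7/17 15:00", "end": "7/17 18:00"},
    "b98009af...": {"No": 2, "name": "Business Trip", "begin": "7/18 10:00", "end": "7/20 18:00"},
    "082ba604...": {"No": 3, "name": "Team Meeting", "begin": "7/20 15:00", "end": "7/20 18:00"},
    "8bc546af...": {"No": 4, "name": "Business Trip", "begin": "7/21 12:00", "end": "7/21 17:00"},
}

# Hypothetical stand-in for the client's Hash, reproducing the slide's bucket labels.
CLIENT_HASH = {
    "name": {"Products Review": "00", "Business Trip": "01", "Team Meeting": "10"},
    "begin": {"7/17 15:00": "10", "7/18 10:00": "00", "7/20 15:00": "11", "7/21 12:00": "01"},
}


def translate(query: dict) -> dict:
    """Client: rewrite attribute = value predicates as index = bucket predicates."""
    return {"I" + attr: CLIENT_HASH[attr][value] for attr, value in query.items()}


def server_select(index_query: dict) -> list:
    """Server: return the etuples whose indices match; it never sees plaintext."""
    return [row["etuple"] for row in SERVER_TABLE
            if all(row[col] == bucket for col, bucket in index_query.items())]


def client_filter(etuples: list, query: dict) -> list:
    """Client: decrypt, then re-check the original query to drop false positives."""
    tuples = [CLIENT_DECRYPT[e] for e in etuples]
    return [t for t in tuples if all(t[attr] == value for attr, value in query.items())]


query = {"name": "Business Trip", "begin": "7/18 10:00"}
index_query = translate(query)           # {"Iname": "01", "Ibegin": "00"}
candidates = server_select(index_query)  # ["b98009af..."]
result = client_filter(candidates, query)
```

A query on name alone (Iname = 01) makes the server return both Business Trip tuples, No. 2 and No. 4; with only 2-bit indices, different values can also share a bucket, which is why the client-side check is kept.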
QUERY GENERALIZATION
• Servers can guess users’ relationships from their queries.
[Figure: Alice, Bob, and Carol send queries to the server]
Original queries:
  SELECT * FROM schedule WHERE begin = '7/15 10:00'
  SELECT * FROM schedule WHERE begin = '7/14 10:00'
If only Alice and Bob send the first query, the server can guess that they have some relationship.
Generalized query:
  SELECT * FROM schedule WHERE begin = '7/15 10:00' OR begin = '7/14 10:00'
This generalized query is requested by at least three users, so the server cannot find group information.
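To make the "at least three users" condition concrete, here is a small Python check of what the server can observe. Assumptions: the server sees a log of (user, query) pairs, the assignment of dates to users is hypothetical, and the helper names are illustrative only.

```python
from collections import defaultdict


def users_per_query(log):
    """Map each query string to the set of distinct users who sent it."""
    senders = defaultdict(set)
    for user, query in log:
        senders[query].add(user)
    return senders


def exposed_queries(log, k):
    """Queries sent by fewer than k distinct users; these can reveal a group."""
    return sorted(q for q, users in users_per_query(log).items() if len(users) < k)


Q15 = "begin = '7/15 10:00'"
Q14 = "begin = '7/14 10:00'"
GENERALIZED = "begin = '7/15 10:00' OR begin = '7/14 10:00'"

# Hypothetical log: Alice and Bob send Q15, Carol sends Q14.
original_log = [("Alice", Q15), ("Bob", Q15), ("Carol", Q14)]
# Every query is replaced by the generalized query before it reaches the server.
generalized_log = [(user, GENERALIZED) for user, _ in original_log]
```

With k = 3, both original queries are exposed, while the generalized log exposes nothing because all three users send the same query.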
HOW TO GENERALIZE
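• First, queries are described by hash indices
  - Begin = 7/15 10:00 → Ibegin = 0001011
  - Begin = 7/14 10:00 → Ibegin = 0011110
• Next, the query is translated by the Generalizer before it is sent to the DB
[Figure: inside the organization’s network, queries pass through the Generalizer, which sends them over the Internet (www) to the DB]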
HOW TO GENERALIZE
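• The Generalizer uses a dynamic hash to translate queries
[Figure: dynamic hash tree; each node branches on the next bit (0/1), and each leaf holds the hash values that share its prefix]
  - Leaf 00: 0001011, 0011110 → translated query Ibegin = 00*****
  - Leaf 01: 0100101, 0110110 → translated query Ibegin = 01*****
  - Leaf 1: 1010011, 1101101, 1100011 → translated query Ibegin = 1******
  (*: wildcard)
• Both Begin = 7/15 10:00 and Begin = 7/14 10:00 are translated to Ibegin = 00*****, so the server cannot tell the two queries apart.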
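The translation step can be sketched in Python, assuming the tree is represented only by its leaf prefixes (00, 01, and 1, as on the slide), which are prefix-free and together cover every 7-bit string. The hash values are the two Ibegin values shown earlier; `generalize` and `matches` are hypothetical names.

```python
# Leaves of the dynamic hash tree on the slide, identified by their bit prefixes.
LEAF_PREFIXES = ["00", "01", "1"]

# Hash indices for the two example values, as given on the slides.
IBEGIN = {"7/15 10:00": "0001011", "7/14 10:00": "0011110"}


def generalize(ihash: str, leaves=LEAF_PREFIXES) -> str:
    """Keep the prefix of the leaf holding this hash; replace the rest with wildcards."""
    for prefix in leaves:
        if ihash.startswith(prefix):
            return prefix + "*" * (len(ihash) - len(prefix))
    raise ValueError(f"no leaf covers {ihash}")


def matches(pattern: str, ihash: str) -> bool:
    """Server-side test of a stored hash index against a wildcard pattern."""
    return len(pattern) == len(ihash) and all(p in ("*", c) for p, c in zip(pattern, ihash))
```

Both example dates generalize to 00*****, so the server receives one pattern and, using `matches`, returns every tuple whose Ibegin starts with 00; each client would then discard the tuples its own query did not ask for.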
SPLITTING LEAF
• Leaves are split to keep the distribution of hash values balanced
  - Insert a new hash value: 1000110
  - Before the split, leaf 1 holds 1010011, 1101101, 1100011, 1000110
  - After the split, leaf 10 holds 1010011, 1000110, and leaf 11 holds 1101101, 1100011
  - Leaves 00 (0001011, 0011110) and 01 (0100101, 0110110) are unchanged
• As a result, each leaf mixes a moderate number of hash values.
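A sketch of the dynamic hash as a binary trie with leaf splitting. The slides do not say when a leaf splits; this sketch assumes a capacity of three values per leaf, which reproduces the slide's before and after trees when the seven existing hash values are inserted in the order listed. `Node`, `DynamicHash`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    prefix: str = ""                                 # bits shared by every hash below this node
    values: List[str] = field(default_factory=list)  # hash values (leaves only)
    zero: Optional["Node"] = None
    one: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.zero is None and self.one is None


class DynamicHash:
    """Binary trie over fixed-width hash strings; a leaf splits when it overflows."""

    def __init__(self, capacity: int = 3, bits: int = 7):
        self.root = Node()
        self.capacity = capacity  # assumed maximum number of values per leaf
        self.bits = bits

    def _leaf_for(self, h: str) -> Node:
        node = self.root
        while not node.is_leaf():
            node = node.one if h[len(node.prefix)] == "1" else node.zero
        return node

    def insert(self, h: str) -> None:
        leaf = self._leaf_for(h)
        leaf.values.append(h)
        self._split(leaf)

    def _split(self, leaf: Node) -> None:
        # Stop if the leaf fits, or if there is no bit left to split on.
        if len(leaf.values) <= self.capacity or len(leaf.prefix) >= self.bits:
            return
        depth = len(leaf.prefix)
        leaf.zero = Node(prefix=leaf.prefix + "0")
        leaf.one = Node(prefix=leaf.prefix + "1")
        for v in leaf.values:
            (leaf.one if v[depth] == "1" else leaf.zero).values.append(v)
        leaf.values = []
        self._split(leaf.zero)  # one side may still overflow
        self._split(leaf.one)

    def leaves(self) -> Dict[str, List[str]]:
        out: Dict[str, List[str]] = {}

        def walk(node: Node) -> None:
            if node.is_leaf():
                out[node.prefix] = list(node.values)
            else:
                walk(node.zero)
                walk(node.one)

        walk(self.root)
        return out

    def generalize(self, h: str) -> str:
        """Translate a hash value to its leaf prefix padded with wildcards."""
        prefix = self._leaf_for(h).prefix
        return prefix + "*" * (self.bits - len(prefix))


dh = DynamicHash(capacity=3, bits=7)
for h in ["0001011", "0011110", "0100101", "0110110", "1010011", "1101101", "1100011"]:
    dh.insert(h)
before = dh.leaves()
dh.insert("1000110")  # the new hash value from the slide
after = dh.leaves()
```

After the split, a query on 1000110 is generalized to 10***** instead of 1******, so it is mixed with one other value rather than three.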
RESULT GENERALIZATION
• Servers can guess users’ relationships from query results.
[Figure: users Alice, Bob, Carol, and Dave and the tuples tuple1 to tuple4 they request]
  - If only Alice and Carol request tuple2, the server can guess that there is some relationship between them.
  - If Alice and Dave never request the same tuples, the server can guess that there is no relationship between them.
• To prevent such guessing, some irrelevant tuples are also requested.
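The inference on this slide needs nothing more than the server's access log. A Python sketch with a hypothetical log that is consistent with the slide (only Alice and Carol request tuple2; Alice and Dave share no tuple); `co_access` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations


def co_access(log):
    """Map each pair of users to the set of tuples both of them requested."""
    readers = defaultdict(set)
    for user, tup in log:
        readers[tup].add(user)
    users = sorted({user for user, _ in log})
    shared = {pair: set() for pair in combinations(users, 2)}
    for tup, who in readers.items():
        for pair in combinations(sorted(who), 2):
            shared[pair].add(tup)
    return shared


# Hypothetical (user, tuple) requests observed by the server.
LOG = [("Alice", "tuple1"), ("Alice", "tuple2"), ("Carol", "tuple2"),
       ("Bob", "tuple3"), ("Carol", "tuple3"), ("Dave", "tuple4")]

shared = co_access(LOG)
related = [pair for pair, tuples in shared.items() if tuples]        # pairs that share a tuple
unrelated = [pair for pair, tuples in shared.items() if not tuples]  # pairs that never do
```

Both lists leak information: the first suggests relationships, and the second suggests their absence, which is why the scheme adds irrelevant tuples rather than only hiding shared ones.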
RESULT GENERALIZATION
• Users’ queries are generalized
[Figure: Alice, Bob, Carol, and Dave receive tuple1 to tuple4 from the DB (over www), under the original query and under the generalized query]
  - With the generalized query, each tuple is received by three users.
  - The server therefore cannot guess relationships from the result tuples.
BLOOM FILTER BASED GENERALIZATION
• Each user is described by a bit string of length k + n
  - First k bits: one bit selected by (uid mod 2) + 1
  - Last n bits: hash(uid)
  Alice (uid = 1): 10 00011
  Bob (uid = 2): 01 11000
  Carol (uid = 3): 10 10100
  Dave (uid = 4): 01 00101
  In this example, k = 2 and n = 5.
• Access authority is the logical disjunction (bitwise OR) of the users' bit strings
  - If Alice and Bob have authority: 10 00011 ∨ 01 11000 = 11 11011
  - If Alice and Carol have authority: 10 00011 ∨ 10 10100 = 10 10111
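A Python sketch of building the bit strings and the acl. Assumptions: the selecting rule (uid mod 2) + 1 is generalized to (uid mod k) + 1 and counted from the rightmost of the k bits, a reading that reproduces every signature on the slide; the slide's hash(uid) is unspecified, so the example uses the slide's n-bit values directly, and `bloom_bits` is a hypothetical Bloom-filter-style hash that will not reproduce them.

```python
import hashlib


def server_bits(uid: int, k: int) -> str:
    """First k bits: a single 1 at position (uid mod k) + 1, counting from the right."""
    pos = uid % k + 1
    return "".join("1" if k - i == pos else "0" for i in range(k))


def bloom_bits(uid: int, n: int, num_hashes: int = 2) -> str:
    """Hypothetical hash(uid): set up to num_hashes of the n bits, Bloom-filter style."""
    bits = ["0"] * n
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{uid}".encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n] = "1"
    return "".join(bits)


def signature(uid: int, k: int, hashed: str) -> str:
    """User bit string of length k + n: server part followed by the n-bit hash part."""
    return server_bits(uid, k) + hashed


def acl(*signatures: str) -> str:
    """Access authority: bitwise OR (logical disjunction) of the users' bit strings."""
    width = len(signatures[0])
    value = 0
    for sig in signatures:
        value |= int(sig, 2)
    return format(value, f"0{width}b")


# The n-bit hash values from the slide (k = 2, n = 5).
SLIDE_HASH = {1: "00011", 2: "11000", 3: "10100", 4: "00101"}
alice, bob, carol, dave = (signature(uid, 2, SLIDE_HASH[uid]) for uid in (1, 2, 3, 4))
```

`acl(alice, bob)` and `acl(alice, carol)` give the two authority strings listed on the slide.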
QUERY PROCESSING
• When a user requests tuples, the bit strings are used
  A tuple that Alice and Bob have authority to has acl = 11 11011.
  User  | bit string | server side (first k bits) | client side (last n bits)
  Alice | 1000011    | 10                         | 00011
  Bob   | 0111000    | 01                         | 11000
  Carol | 1010100    | 10                         | 10100
  Dave  | 0100101    | 01                         | 00101
  - At the server side, the first k bits are used
  - At the client side, the last n bits are used
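The slide says which bits each side uses but not the exact test. This Python sketch assumes the usual Bloom-filter membership test: every bit set in the user's part must also be set in the tuple's acl. As with any Bloom filter, a user without authority may occasionally pass the client-side test (a false positive). Function names are hypothetical.

```python
def covers(acl_bits: str, user_bits: str) -> bool:
    """Bloom-filter membership: every 1 in user_bits is also a 1 in acl_bits."""
    a, u = int(acl_bits, 2), int(user_bits, 2)
    return a & u == u


def server_sends(acl_bits: str, user_sig: str, k: int) -> bool:
    """Server side: compare only the first k bits."""
    return covers(acl_bits[:k], user_sig[:k])


def client_keeps(acl_bits: str, user_sig: str, k: int) -> bool:
    """Client side: compare only the last n bits."""
    return covers(acl_bits[k:], user_sig[k:])


K = 2
USERS = {"Alice": "1000011", "Bob": "0111000", "Carol": "1010100", "Dave": "0100101"}
ACL = "1111011"  # tuple that Alice and Bob have authority to

sent = [name for name, sig in USERS.items() if server_sends(ACL, sig, K)]
kept = [name for name in sent if client_keeps(ACL, USERS[name], K)]
```

With this acl, all four users share a server bit with the tuple, so the server sends it to everyone and cannot tell who is authorized; only Alice's and Bob's hash bits are contained in 11011, so Carol's and Dave's clients discard it.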
ANONYMITY
• Of the first k bits, which are used at the server side, each bit is assigned about N/k users (⌊N/k⌋ or ⌈N/k⌉ when the uids are 1 to N).
• Each tuple is therefore received by at least ⌊N/k⌋ users.
  Alice (uid = 1): 10 00011 (first k bits from (uid mod 2) + 1, last n bits from hash(uid))
  N: total number of users
• To prevent result analysis, we can thus provide ⌊N/k⌋-anonymity (N/k-anonymity when k divides N).
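A quick Python check of how many users share each server-side bit. Assumptions: uids are 1 to N and the server bit is chosen by (uid mod k) + 1, as on the slides; the function names are hypothetical. Since a tuple's acl sets at least one server bit, the smallest bucket bounds how many users receive any tuple.

```python
import math
from collections import Counter


def bucket_sizes(num_users: int, k: int) -> list:
    """Users assigned to each server-side bit, for uids 1..num_users."""
    counts = Counter(uid % k + 1 for uid in range(1, num_users + 1))
    return [counts[pos] for pos in range(1, k + 1)]


def min_receivers(num_users: int, k: int) -> int:
    """Fewest users that can receive a tuple: its acl sets at least one server bit."""
    return min(bucket_sizes(num_users, k))


print(bucket_sizes(4, 2))                     # [2, 2]  (the slides' example: N = 4, k = 2)
print(min_receivers(5, 2), math.ceil(5 / 2))  # 2 3     (uneven case)
```

The smallest bucket always holds ⌊N/k⌋ users, which equals N/k when k divides N; when it does not, some bits receive only ⌊N/k⌋ users.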
CONCLUSION AND FUTURE WORK
• We introduce a new problem for outsourced databases: the security of users' social information
  - Social information means the relationships among users’ accounts
• To protect social information, we introduce two methods
  - Query Generalization by Dynamic Hash
  - Result Generalization by Bloom Filter
• Future work
  - Implement our methods and apply them to a real service
  - Evaluate our methods