Presentation cedem luca
1. Open Access and Database Anonymization
an Open Source Procedure
Based on an Italian Case Study
Danube University Krems, 21-23 May 2014
L. Leschiutta, G. Futia
2. Introduction (1)
22nd May 2014, Giuseppe Futia – Politecnico di Torino
• The principal way to openly share a database is to remove all
data that could lead to the identification of the subjects
involved (i.e. database anonymization);
• we describe a procedure to process and anonymize a
collection of data that includes personal, sensitive and
judicial data;
• the procedure is general purpose and is implemented relying
solely on common open-source software applications.
3. Introduction (2)
• Our study is based on a real case in which a database of
car-accident data (TWIST), consisting of 352 data fields,
needs to be released as open data;
• this work was developed in the framework of the Open-DAI
project. Open-DAI is “Opening Data Architectures and
Infrastructures” for European Public Administrations. It is a
project funded under the ICT Policy Support Programme as
part of the Competitiveness and Innovation Framework
Programme (CIP), Call 2011.
4. Non Anonymous Data
ID1 NID1 ID2 ID3 NID2 ID4 NID3 NID4
Item 1
Item 2
Item N
5. Ordered Non Anonymous Data
ID1 ID2 ID3 ID4 NID1 NID2 NID3 NID4
Item 1
Item 2
Item N
6. Ordered Non Anonymous Data including Anonymous IDs
        ID1  ID2  ID3  ID4  AID   NID1  NID2  NID3  NID4
Item 1                      1053
Item 2                      1001
                            1057
Item N                      1133
7. Anonymous Data
AID NID1 NID2 NID3 NID4
1053
1001
1057
1133
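The steps shown in the preceding slides (original table, one anonymous ID added per row, identifying columns dropped) can be sketched in Python. This is only an illustration under assumptions: the helper `anonymize_rows`, the column names and the sample values are hypothetical, and the AIDs are simply the first two values shown on the slides.

```python
def anonymize_rows(rows, id_columns, aids):
    """Return new records where the identifying columns are replaced by an AID.

    rows: list of dicts, one per record (hypothetical input format).
    id_columns: names of the columns that identify a subject.
    aids: one pre-generated anonymous ID per row.
    """
    if len(aids) != len(rows):
        raise ValueError("one AID per row is required")
    anonymous = []
    for row, aid in zip(rows, aids):
        # Keep only the non-identifying columns and put the AID first.
        kept = {name: value for name, value in row.items()
                if name not in id_columns}
        anonymous.append({"AID": aid, **kept})
    return anonymous


# Hypothetical sample data; real tables would be read from a file.
rows = [
    {"ID1": "Rossi", "NID1": "car", "ID2": "AB123CD", "NID2": "rain"},
    {"ID1": "Bianchi", "NID1": "bike", "ID2": "EF456GH", "NID2": "dry"},
]
result = anonymize_rows(rows, id_columns={"ID1", "ID2"}, aids=[1053, 1001])
for record in result:
    print(record)
```

The input rows are left untouched, so the non-anonymous table still has to be encrypted, archived and wiped separately.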
8. Random AIDs generation
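One possible way to generate random, non-repeating AIDs is sketched below in Python. The function name `generate_aids` and the four-digit range are assumptions chosen to resemble the AIDs shown on the slides; the original procedure may have produced them differently.

```python
import secrets


def generate_aids(count, low=1000, high=9999):
    """Return `count` distinct random AIDs in the range [low, high]."""
    if count > high - low + 1:
        raise ValueError("range too small for the requested number of AIDs")
    # SystemRandom draws from the operating system's entropy source, so the
    # AIDs cannot be reproduced from a seed or from the order of the rows.
    rng = secrets.SystemRandom()
    return rng.sample(range(low, high + 1), count)


aids = generate_aids(4)
print(aids)
```

Sampling without replacement guarantees uniqueness, which sequential numbering would also give, but sequential AIDs would reveal the original row order.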
9. Advanced techniques: repeating IDs
IF(ISNA(VLOOKUP(C4;C$1:C3;1; ));AID.A8;VLOOKUP(C4;C$1:F3;4; ))
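Read as a spreadsheet formula, this appears to look up the ID in C4 among the rows above it: when the lookup fails (ISNA), a fresh AID is taken from cell A8 of a sheet named AID; otherwise the AID already stored in the fourth column of the matching row is reused. The same logic can be sketched in Python; the function name `assign_aids` and the sample values are hypothetical.

```python
def assign_aids(ids, fresh_aids):
    """Give each distinct ID one AID, reusing it whenever the ID repeats."""
    seen = {}                 # ID -> AID already assigned
    fresh = iter(fresh_aids)  # pool of unused, pre-generated AIDs
    assigned = []
    for value in ids:
        if value not in seen:
            # First occurrence: consume the next unused AID.
            seen[value] = next(fresh)
        assigned.append(seen[value])
    return assigned


column = ["A", "B", "A", "C", "B"]
column_aids = assign_aids(column, [1053, 1001, 1057, 1133])
print(column_aids)  # prints [1053, 1001, 1053, 1057, 1001]
```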
10. Non Unique IDs In Multiple Cells (1)
ID1 ID2 ID3 ID4 NID1 NID2 NID3 NID4
Item 1 Lorem ipsum
Item 2
Lorem ipsum
Item N Lorem ipsum
11. Non Unique IDs In Multiple Cells (2)
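/* Assign an AID to cell (n, m): reuse the AID of any earlier cell
   holding the same ID, otherwise take the next unused AID.
   COLS (number of ID columns) is an assumed name. */
flag = false;
for (i = 0; i <= n && !flag; i++) {
    for (j = 0; j < COLS; j++) {
        if (i == n && j >= m)
            break;                    /* reached the current cell */
        if (ID_Matrix[i][j] == ID_Matrix[n][m]) {
            AID_Matrix[n][m] = AID_Matrix[i][j];
            flag = true;
            break;
        }
    }
}
if (!flag) {
    AID_Matrix[n][m] = Next_Available_AID(k);
    k++;
}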
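For large tables, the search over earlier cells can be replaced by a dictionary lookup. The Python sketch below shows the idea for a whole matrix of IDs; the name `assign_aids_matrix`, the sample IDs and the treatment of empty cells (left without an AID) are assumptions, not part of the original procedure.

```python
def assign_aids_matrix(id_matrix, fresh_aids):
    """Map every ID in a 2D table to an AID, consistently across all cells."""
    seen = {}                 # ID -> AID already assigned
    fresh = iter(fresh_aids)  # pool of unused, pre-generated AIDs
    aid_matrix = []
    for row in id_matrix:
        aid_row = []
        for value in row:
            if value in (None, ""):
                # Empty cell: nothing to anonymize (assumed behaviour).
                aid_row.append(None)
                continue
            if value not in seen:
                seen[value] = next(fresh)
            aid_row.append(seen[value])
        aid_matrix.append(aid_row)
    return aid_matrix


ids = [
    ["anna", "", "luca"],
    ["luca", "anna", ""],
    ["", "gino", "anna"],
]
matrix_aids = assign_aids_matrix(ids, [1053, 1001, 1057, 1133])
for aid_row in matrix_aids:
    print(aid_row)
```

Each cell is visited once, so the cost grows linearly with the number of cells instead of quadratically.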
12. Data Wiping
• To securely delete the non-anonymous file on Windows,
you can use the open source program
Eraser (http://eraser.heidi.ie);
• on Linux, you can use the following
commands:
> shred NonAnonymousData.csv
> rm NonAnonymousData.csv
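Where neither tool is available, the same overwrite-then-delete idea can be approximated in Python. This is a minimal sketch, not a replacement for `shred` or Eraser: the function name `overwrite_and_delete` and the three passes are assumptions, and, as with `shred`, overwriting in place is not guaranteed to destroy the data on SSDs or on journaling and copy-on-write filesystems.

```python
import os
import tempfile


def overwrite_and_delete(path, passes=3):
    """Overwrite a file in place with random bytes, then remove it."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                # Write in chunks of at most 1 MiB to bound memory use.
                chunk = min(remaining, 1 << 20)
                handle.write(os.urandom(chunk))
                remaining -= chunk
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to flush this pass to disk
    os.remove(path)


# Demonstration on a throwaway file with made-up content.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"name,plate\nRossi,AB123CD\n")
    demo_path = tmp.name
overwrite_and_delete(demo_path)
print(os.path.exists(demo_path))  # prints False
```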
13. Encrypt the file
• On Windows this can be achieved by using the
open source 7zip program (http://www.7-zip.org/),
which supports strong AES-256 encryption.
• On Linux you can use the following command:
> gpg -c NonAnonymousData.csv
The encrypted file must then be backed up to a
safe location, e.g. a non-rewritable DVD or a
WORM (Write Once Read Many) tape.
14. Data Degradation (location)
15. Data Degradation (location)
16. Data Degradation (time)
• 10 November 2011 at 10:25
• 10 November 2011 between 10 and 11
• Autumn 2011
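The three levels above (exact timestamp, one-hour interval, season) can be produced from a timestamp with a few lines of Python. The sketch below is illustrative: the function names are hypothetical, and the season mapping follows the meteorological northern-hemisphere convention, an assumption under which November counts as autumn.

```python
from datetime import datetime

# Meteorological seasons for the northern hemisphere (assumption).
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def degrade_to_hour(moment):
    """Keep the date but replace the time with a one-hour interval."""
    month = moment.strftime("%B")
    return (f"{moment.day} {month} {moment.year} "
            f"between {moment.hour} and {moment.hour + 1}")


def degrade_to_season(moment):
    """Keep only the season and the calendar year of the timestamp."""
    # January and February are labelled with their own calendar year.
    return f"{SEASON_BY_MONTH[moment.month]} {moment.year}"


accident = datetime(2011, 11, 10, 10, 25)
hour_level = degrade_to_hour(accident)
season_level = degrade_to_season(accident)
print(hour_level)
print(season_level)
```

The coarser the level, the more records share the same value, which makes it harder to single out one event.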
17. Conclusions: de-anonymization test
• How to test if a database is anonymous
enough?
• Reasonable efforts: “the means possibly
required to effect identification are to be
considered disproportionate compared
with the (risk of) damage resulting”
• de-anonymization test
18. Thank you
Luca Leschiutta (luca.leschiutta@polito.it)
Giuseppe Futia (giuseppe.futia@polito.it)
Nexa Center for Internet & Society (http://nexa.polito.it)
Dept. of Computer and Control Engineering (DAUIN)
Politecnico di Torino, Italy