Department of Computer Science
Dissertation Defense
AN INVESTIGATION OF DATA PRIVACY AND UTILITY USING MACHINE LEARNING AS A GAUGE
Kato Mivule
For the Degree of D.Sc. in Computer Science
Venue and Date:
Center for Business and Graduate Studies, Dean’s Conference Room 1303
Thursday, April 17, 2014 at 1 pm
Open to the Public
Dissertation Committee:
Claude Turner, Ph.D. Chair
Soo-Yeon Ji, Ph.D. Member
Hoda El-Sayed, D.Sc. Member
Darsana Josyula, Ph.D. Member
Anthony Joseph, Ph.D. External Examiner
Cosmas U. Nwokeafor, PhD, Dean, The Graduate School
Lethia Jackson, D.Sc., Chair, Computer Science Department
DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
OUTLINE
• Introduction
o The Problem
o Contributions
• Literature Review
• Methodology
• Results and Discussion
o Results
o Discussion
• Conclusion and Future work
o Conclusion
o Future work
Kato Mivule – Bowie State University Department of Computer Science
CONTRIBUTIONS
1. A proposed data privacy engineering framework, SIED.
2. A proposed Comparative x-CEG data utility analysis heuristic.
3. A proposed Initial and Subsequent basic (IBP and SBP) privacy
indexes.
4. A proposed data swapping and noise addition hybrid model for
privacy.
5. A proposed privatized synthetic data generation model using
image and signal processing techniques (DT, DCT, and DWT).
6. An implementation of k-anonymity by minimizing information
loss via the frequency count analysis and synthetic data
replacement model.
THE PROBLEM
Finding a user-defined balance between data privacy and utility
needs with trade-offs.
• The challenge of ambiguous definitions of privacy and utility.
“Perfect privacy can be achieved by publishing nothing at all, but this has no
utility; perfect utility can be obtained by publishing the data exactly as received, but
this offers no privacy” Cynthia Dwork (2006)
• Data Privacy: differential privacy, noise addition, k-anonymity, etc.
• Data Utility: completeness, currency, accuracy.
MOTIVATION
• Generate privatized synthetic data sets that meet acceptable
privacy and utility requirements.
• Data Privacy Engineering - Adapt engineering principles in the
data privacy and utility process.
HYPOTHESIS
• Fine-tuning parameters in the data privacy procedure,
specifically using perturbation methods such as noise addition
and differential privacy, lowers the classification error and thus
generates better data utility.
LITERATURE REVIEW
The data privacy and utility problem
• Wong et al. (2007); Meyerson & Williams (2004); Park & Shim (2007): Data privatization diminishes data utility – an NP-hard problem.
• Krause & Horvitz (2010); Wang & Wu (2005): Optimal data utility with privacy is a well-documented NP-hard problem.
• Ghosh et al. (2008); Brenner & Nissim (2010): Trade-offs are needed in the privacy versus utility process – also NP-hard.
• Li & Li (2009): It is not possible to equate privacy and utility.
• Fienberg, Rinaldo, & Yang (2010): Even with differential privacy, privacy is granted but at a loss of data utility.
Techniques and Algorithms used in this study
• Data Privacy
o Noise Addition
o Logarithmic Noise
o Multiplicative Noise
o Differential Privacy
o K-anonymity
• Image and Signal Processing
o Distance Transform
o Discrete Cosine Transform
o Discrete Wavelet Transform
o Gaussian Filtering
• Machine Learning
o KNN
o Neural Networks
o Naïve Bayes
o Decision Trees
o AdaBoost M1
METHODOLOGY
Contribution 1 – SIED, a data privacy engineering framework
• SIED phases – Specifications, Implementation, Evaluation, and Dissemination
• Motivation: Given any original dataset X, a set of data privacy engineering phases should be followed from start to completion in the generation of a privatized dataset X′.
Contribution 1 – SIED, a data privacy engineering framework – the four phases:
• The SIED Specification Phase
• The SIED Implementation Phase
• The SIED Evaluation Phase
• The SIED Dissemination Phase
Contribution 2 – A Data Privacy Parameter Mapping Heuristic
• Categorize parameters for effective fine-tuning – better privacy and utility. What parameters need adjustment in the data privacy process?
• Category 1 parameters – Data Utility Goal Parameters: for example Accuracy, Currency, and Completeness.
• Category 2 parameters – Data Privacy Algorithm Parameters: values k in k-anonymity, ε in Noise addition and Differential privacy.
• Category 3 parameters – Application Parameters (e.g. Machine Learning Classifier): for example weak learners in AdaBoost.
• Parameter adjustment and fine-tuning → trade-offs → data privacy and utility preservation.
Contribution 3 – The x-CEG and Comparative x-CEG Heuristics
The Classification Error Gauge (x-CEG) replicates x times until threshold t is reached:
1. Get original dataset.
2. Apply data privacy.
3. Classify privatized dataset.
4. If error <= t: better utility might be achieved; publish.
5. If error > t: adjust data privacy parameters, adjust classifier parameters, and repeat from step 2.
The Comparative x-CEG heuristic employs multiple data privacy and classifier algorithms in each run.
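The x-CEG loop described on this slide can be sketched roughly as follows. This is only an illustration and not the study's MATLAB/RapidMiner pipeline: the toy two-class data, the 1-nearest-neighbour classifier, the leave-one-out error estimate, and the rule of halving σ after each failed round are all assumptions made for the example.

```python
import random

def make_data(rng, n_per_class=30):
    """Toy one-feature, two-class data set (hypothetical stand-in for Iris)."""
    data = [(rng.gauss(0.0, 0.3), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(3.0, 0.3), 1) for _ in range(n_per_class)]
    return data

def add_noise(data, mu, sigma, rng):
    """Additive Gaussian noise on the feature; labels are left untouched."""
    return [(x + rng.gauss(mu, sigma), y) for x, y in data]

def loo_error_1nn(data):
    """Leave-one-out classification error (%) of a 1-nearest-neighbour classifier."""
    wrong = 0
    for i, (x, y) in enumerate(data):
        nearest = min((d for j, d in enumerate(data) if j != i),
                      key=lambda d: abs(d[0] - x))
        wrong += nearest[1] != y
    return 100.0 * wrong / len(data)

def x_ceg(original, t, sigma=4.0, shrink=0.5, max_rounds=10, seed=0):
    """Privatize, classify, and re-tune sigma until the error is <= t."""
    rng = random.Random(seed)
    for round_no in range(1, max_rounds + 1):
        privatized = add_noise(original, 0.0, sigma, rng)
        error = loo_error_1nn(privatized)
        if error <= t:                 # threshold met: candidate for publishing
            return privatized, sigma, error, round_no
        sigma *= shrink                # assumed adjustment rule: less noise
    return None, sigma, error, max_rounds

original = make_data(random.Random(1))
privatized, sigma, error, rounds = x_ceg(original, t=10.0)
```

Only the privacy parameter is adjusted here; the slide also adjusts classifier parameters, and the Comparative x-CEG would wrap this loop over several privacy methods and classifiers.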
Contribution 4 – The x-CEG Threshold determination heuristic
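• Average value of the function = integral / interval: $AVF = \mathrm{Integral} / \mathrm{Interval}$
• $AVF = \frac{1}{b-a} \int_a^b f(x)\,dx$
• $AVF \approx \frac{1}{b-a} \sum_{i=1}^{n} f(\bar{x}_i)\,\Delta x$
• where $\Delta x = \frac{b-a}{n}$
• and $\bar{x}_i = \frac{1}{2}(x_{i-1} + x_i)$
• The mean $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
• $t = \mathrm{Max}[\max(\mathrm{mean}), \max(\mathrm{midpoint})]$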
• The threshold 𝑡 is chosen as the highest point between the max mean and max mid-point values.
• The classification error of the original data set is used as a benchmark in measuring privatized synthetic data sets.
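As a small worked illustration of the average-value formula, the sketch below applies the midpoint rule to a known function. The function name `average_value` and the test function f(x) = x² are assumptions chosen for the example; the slides do not state how the rule was applied to the accuracy curves.

```python
def average_value(f, a, b, n=1000):
    """Midpoint-rule estimate of (1 / (b - a)) * integral of f over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x_mid = a + (i - 0.5) * dx      # midpoint of the i-th sub-interval
        total += f(x_mid) * dx
    return total / (b - a)

def mean(values):
    """Plain arithmetic mean, as in the formula for mu."""
    return sum(values) / len(values)

# The exact average of x^2 on [0, 3] is (1/3) * (27/3) = 3.
avf = average_value(lambda x: x * x, 0.0, 3.0)
```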
Contribution 5 – The Initial and Subsequent Privacy Indices
• Let $X$ be the set of all values in database $X$ such that $X = \{X_1, \ldots, X_n\}$.
• Let $X'$ be the set of items to be privatized such that $X' = \{X'_1, \ldots, X'_n\}$.
• Let $Y$ be the set of items that get revealed after our initial privacy measurement,
• where $|X'| \le |X|$ and $|Y| \le |X|$.
• $X$, $X'$, and $Y$ are assumed countable, such that there is a one-to-one (injective) function $f: X \to N$; $X' \to N$; $Y \to N$ from $X$, $X'$, and $Y$ to the natural numbers $N = \{0, 1, 2, 3, \ldots, n\}$ respectively.
• $\text{Initial Basic Privacy (IBP)} = \frac{\mathrm{Cardinality}(X')}{\mathrm{Cardinality}(X)} \times 100$
• $\text{Subsequent Basic Privacy (SBP)} = \frac{\mathrm{Cardinality}(X' - Y)}{\mathrm{Cardinality}(X)} \times 100$
• where $\mathrm{Cardinality}$ is the total count of elements in each of $X'$, $Y$, and $X$.
• IBP and SBP could be taken as percentages or normalized between 0 and 1.
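The two indices can be computed in a few lines. In this sketch the numerator of SBP is read as the set difference X′ − Y, which is an assumption about the notation, and the record identifiers are made up for the example.

```python
def ibp(x, x_priv):
    """Initial Basic Privacy: share of items privatized, as a percentage."""
    return 100.0 * len(x_priv) / len(x)

def sbp(x, x_priv, y):
    """Subsequent Basic Privacy: privatized items still hidden, as a percentage.

    Assumption: "X' - Y" is read as set difference.
    """
    return 100.0 * len(set(x_priv) - set(y)) / len(x)

records = set(range(10))          # X: ten hypothetical record ids
privatized = {0, 1, 2, 3, 4, 5}   # X': six of them privatized
revealed = {4, 5}                 # Y: two later revealed

initial = ibp(records, privatized)                  # 60.0
subsequent = sbp(records, privatized, revealed)     # 40.0
```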
Contribution 6 – The Filtered Comparative x-CEG Heuristic - Using image and signal processing techniques to generate privatized synthetic data.
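The slides do not spell out how the transforms are combined with the privacy step, so the following is only one plausible reading for the DCT case: transform a numeric column, discard the high-frequency coefficients, and invert the transform to obtain a synthetic column. The function names, the `keep` parameter, and the sample values are all hypothetical.

```python
import math

def dct(signal):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(signal)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(signal))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        s += sum(c * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k, c in enumerate(coeffs) if k > 0)
        out.append(s)
    return out

def dct_synthetic(column, keep):
    """Keep the first `keep` DCT coefficients, zero the rest, and invert."""
    coeffs = dct(column)
    truncated = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct(truncated)

sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0]   # illustrative values
synthetic = dct_synthetic(sepal_length, keep=3)
```

Keeping every coefficient reproduces the input exactly, while keeping only the first collapses the column to its mean, so `keep` acts as a rough privacy versus utility dial in this sketch.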
Contribution 7– Data swapping and noise addition data privacy
hybrid - Generating privatized synthetic data using data swapping and
noise perturbation.
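A minimal sketch of such a hybrid is given below, assuming that swapping means exchanging values between randomly chosen pairs of rows within one column and that the noise is additive Gaussian. The function name and the pairing rule are assumptions, not the study's implementation.

```python
import random

def swap_and_add_noise(column, swap_fraction, mu, sigma, seed=0):
    """Swap a fraction of the values in pairs, then add Gaussian noise to all."""
    rng = random.Random(seed)
    values = list(column)
    n_swap = int(len(values) * swap_fraction)
    n_swap -= n_swap % 2                       # swaps happen in pairs
    chosen = rng.sample(range(len(values)), n_swap)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        values[i], values[j] = values[j], values[i]
    return [v + rng.gauss(mu, sigma) for v in values]

column = [float(i) for i in range(40)]
# 5% swap with noise (mu = 1, sigma = 0.1), one of the settings in the results.
noisy = swap_and_add_noise(column, 0.05, 1.0, 0.1)
```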
Contribution 8 – Minimizing information loss with K-anonymity
• Implementation of k-anonymity by minimizing information loss via
the frequency count analysis and synthetic data replacement model.
RESULTS AND DISCUSSION
Comparative x-CEG Results
• The Fisher Iris multivariate dataset from the UCI repository was used.
• 165 experiment runs – generating 165 privatized synthetic data sets.
• Classifiers: KNN, Neural Nets, Decision Trees, AdaBoost, and Naïve Bayes.
• MATLAB was used for data privacy and RapidMiner for machine learning.
Classification accuracy (%) by noise level and classifier:
NOISE LEVEL             KNN     NEURAL NETS   NAÏVE BAYES   DECISION TREES   ADABOOST M1
Original                96.00   96.67         96.00         94.67            97.33
Noise1(μ=5.8, σ=0.8)    66.67   74.00         64.00         66.67            64.00
Noise2(μ=0, σ=0.8)      61.33   72.00         66.67         63.33            54.67
Noise3(μ=1, σ=0.8)      68.67   74.00         69.33         66.67            60.00
Noise4(μ=2, σ=0.8)      68.67   62.67         62.00         59.33            54.67
Noise5(μ=3, σ=0.8)      72.67   66.67         67.33         61.33            50.67
Noise6(μ=4, σ=0.8)      75.33   82.67         70.00         72.00            63.33
Noise1a(μ=5, σ=0.1)     94.00   93.33         92.67         91.33            92.67
Noise1b(μ=5, σ=0.2)     92.00   94.67         91.33         90.00            90.67
Noise1c(μ=5, σ=0.3)     93.33   94.00         90.67         92.00            94.00
Noise1d(μ=5, σ=0.4)     90.00   93.33         87.33         86.67            86.67
Noise2b(μ=0, σ=0.1)     96.67   96.67         94.00         96.67            92.00
Noise2c(μ=0, σ=0.2)     89.33   92.00         86.67         87.33            90.00
Noise2d(μ=0, σ=0.3)     87.33   90.00         86.67         84.67            85.33
Noise2e(μ=0, σ=0.4)     87.33   90.00         86.67         84.67            85.33
Noise3a(μ=1, σ=0.4)     87.33   87.33         85.33         84.00            83.33
Noise3b(μ=1, σ=0.1)     97.33   94.00         96.00         96.00            94.67
Noise3c(μ=1, σ=0.2)     92.67   95.33         91.33         90.67            93.33
Noise3d(μ=1, σ=0.3)     94.67   95.33         91.33         94.00            90.00
Noise4a(μ=2, σ=0.1)     94.67   98.00         98.00         96.67            98.00
Noise4b(μ=2, σ=0.2)     93.33   96.00         92.67         91.33            90.67
Noise4c(μ=2, σ=0.3)     88.00   91.33         89.33         90.00            86.67
Noise4d(μ=2, σ=0.4)     87.33   87.33         85.33         84.00            83.33
Noise5a(μ=3, σ=0.1)     97.33   94.00         96.00         96.00            94.67
Noise5b(μ=3, σ=0.2)     92.67   95.33         91.33         90.67            93.33
Noise5c(μ=3, σ=0.3)     94.67   95.33         91.33         94.00            90.00
Noise5d(μ=3, σ=0.4)     93.33   94.00         93.33         92.00            87.33
Noise6a(μ=4, σ=0.1)     78.00   87.33         87.33         82.67            84.67
Noise6b(μ=4, σ=0.2)     93.33   95.33         94.00         93.33            92.67
Noise6c(μ=4, σ=0.3)     91.33   92.00         92.00         90.00            92.00
Noise6d(μ=4, σ=0.4)     78.00   87.33         88.67         82.67            84.67
Multiplicative          56.67   68.67         59.33         64.67            58.00
Logarithmic             50.67   58.00         56.00         53.33            57.33
• A bar chart depiction of the Comparative x-CEG classification accuracy results
Comparative x-CEG classifier performance results – Neural Nets most resilient.
• x-CEG Threshold Determination Results
• Threshold 𝒕 = 𝑴𝒂𝒙[𝒎𝒂𝒙 𝒎𝒆𝒂𝒏 , 𝒎𝒂𝒙 𝒎𝒊𝒅𝒑𝒐𝒊𝒏𝒕 ]
• The threshold value is chosen heuristically using the mid-point value classification accuracy of
87.33% for the Neural Nets.
Statistic   KNN     NEURAL NETS   NAÏVE BAYES   DECISION TREES   ADABOOST M1   MAX
Mean        84.87   87.41         84.54         83.74            82.30         87.41
Mid-Point   80.18   82.48         79.81         79.05            77.51         82.48
Max         84.87   87.41         84.54         83.74            82.30         87.41
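The Mean entry for KNN can be reproduced as a plain average of the KNN column in the Comparative x-CEG accuracy table, and the threshold then follows from the Max rule. In the sketch below the KNN accuracies are copied from that table; the Mid-Point row is copied as given, since the slides do not state how it was computed. The variable names are assumptions.

```python
# KNN accuracies (%) for the 33 runs, copied from the accuracy table.
knn_accuracy = [
    96.00, 66.67, 61.33, 68.67, 68.67, 72.67, 75.33,
    94.00, 92.00, 93.33, 90.00,
    96.67, 89.33, 87.33, 87.33,
    87.33, 97.33, 92.67, 94.67,
    94.67, 93.33, 88.00, 87.33,
    97.33, 92.67, 94.67, 93.33,
    78.00, 93.33, 91.33, 78.00,
    56.67, 50.67,
]

# Mean and Mid-Point rows of the threshold table, per classifier.
mean_row = {"KNN": 84.87, "Neural Nets": 87.41, "Naive Bayes": 84.54,
            "Decision Trees": 83.74, "AdaBoost M1": 82.30}
midpoint_row = {"KNN": 80.18, "Neural Nets": 82.48, "Naive Bayes": 79.81,
                "Decision Trees": 79.05, "AdaBoost M1": 77.51}

knn_mean = round(sum(knn_accuracy) / len(knn_accuracy), 2)

# t = Max[max(mean), max(midpoint)]
t = max(max(mean_row.values()), max(midpoint_row.values()))
```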
• How much privacy? – statistical traits of the original and privatized data.
Statistic                 Value
Original Data MSE         15.8937
Privatized Data MSE       24.0875
Original Data Entropy     -3.05E+04
Privatized Data Entropy   -5.05E+04
Correlation               0.9808
MSE Difference            8.1938
Entropy Difference        -2.00E+04
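Of the statistics listed, the correlation between an original and a privatized column is the most direct to illustrate. The sketch below computes a Pearson correlation coefficient, which is assumed here to be the measure behind the Correlation row; the MSE and entropy definitions used in the study are not given on the slide and are not reproduced. The sample columns are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

original = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4]        # hypothetical original column
privatized = [5.3, 4.8, 4.9, 4.5, 5.1, 5.6]      # hypothetical privatized column
r = pearson(original, privatized)
```

A value close to 1 means the privatized column still tracks the original closely, which favours utility but leaves less room for privacy.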
RESULTS AND DISCUSSION – Data Swapping and Noise Addition Hybrid
• 330 data sets generated from the data swapping and noise addition hybrid experiment.
• Optimal data swap for acceptable privacy and utility levels is between 5% and 10% data swap.
• Two data sets satisfied the threshold criteria after the Comparative x-CEG:
• noise ~ (μ = 1, σ = 0.1) at 5% swap.
• noise ~ (μ = 5, σ = 0.1) at 5% swap.
• Best classification accuracy obtained between 5% and 10% data swap.
RESULTS AND DISCUSSION – Signal Processing and Data Privacy Hybrid
Privatized synthetic data sets using Discrete Cosine Transforms (DCT).
• Synthetic DCT-based Sepal Length data results
• Synthetic filtered DCT-based Sepal Length data results
• Filtered DCT-based data descriptive statistics – skeletal structure not kept as in DT-based data
• Filtered DCT-based data inference statistics – low correlation
RESULTS AND DISCUSSION – Image Processing and Data Privacy Hybrid
Privatized synthetic data sets using Distance Transforms (DT) – skeletal structure kept.
• DT-based Sepal Length data results
• Filtered DT-based Sepal Length data results
• Filtered DT-based data descriptive statistics – skeletal structure kept
• Filtered DT-based data inference statistics – high correlation
RESULTS AND DISCUSSION – Distance Transforms Based Data and the Clustering Test
DT-based synthetic data produced the best Davies-Bouldin criterion at 0.419 after filtering, outperforming the original data.
• Clustering results of the original Fisher Iris data
• Clustering results of the DT-based synthetic Fisher Iris data
• Clustering results of the filtered DT-based Fisher Iris data. Clustering greatly improved after filtering.
RESULTS AND DISCUSSION
DT, DCT, and DWT improved classification accuracy after filtering.
Results – Signal Processing – The Machine Learning Classification Error Test
Priv Synth Data   NN      KNN     NB      DT      AdaBoost   Max
Mean              91.00   87.95   86.07   86.74   84.33      91.00
Mid-Point         75.83   72.78   71.65   72.31   70.39      75.83
Max               91.00   87.95   86.07   86.74   84.33      91.00
RESULTS AND DISCUSSION - Non-Interactive Differential Privacy (DP)
•Results of the Iris-Fisher data after DP – Too much noise is an issue with DP
• Classification accuracy of DP data (before filtering) reduces with increased
DP levels.
• Improved Classification accuracy of DP data sets after filtering.
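The two steps behind these slides, perturbing with differential-privacy style noise and then filtering, can be sketched as below. The sketch assumes Laplace noise with scale sensitivity/ε added to each value and a simple one-dimensional Gaussian smoothing filter; the sensitivity, ε, kernel width, and function names are illustrative assumptions, and the sketch makes no claim about the formal guarantee obtained in the study.

```python
import math
import random

def laplace_privatize(column, sensitivity, epsilon, seed=0):
    """Add Laplace(0, sensitivity / epsilon) noise to every value."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    # The difference of two Exp(1) draws is a standard Laplace draw.
    return [x + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for x in column]

def gaussian_filter(values, sigma=1.0, radius=3):
    """Smooth a list with a truncated, renormalized Gaussian kernel."""
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    out = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):     # ignore positions past the ends
                num += w * values[j]
                den += w
        out.append(num / den)
    return out

column = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0]   # illustrative values
dp_column = laplace_privatize(column, sensitivity=1.0, epsilon=0.5)
filtered = gaussian_filter(dp_column)
```

Because each filtered value is a weighted average of its neighbours, extreme noise draws are pulled back toward the local level, which is consistent with the outlier reduction reported on the slides.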
• Comparative descriptive statistics of Original, DP, and filtered DP based data.
•Skeletal structure not kept as in DT-based data but outlier noise removed in DP-based
data
Results – Non-Interactive Differential Privacy – Inference Statistics
Results – Non-Interactive Differential Privacy – How much DP?
RESULTS AND DISCUSSION – Data Privacy using K-Anonymity
• Suppress all items where k = 1.
• Replace suppressed items with new synthetic values (most frequent values) such
that k > 1 for all items.
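The suppress-and-replace idea can be sketched for a single column as follows. It assumes that the frequency count is taken per attribute and that the synthetic replacement is the column's most frequent value; the function name and the sample ages are hypothetical.

```python
from collections import Counter

def replace_unique_values(column):
    """Replace every value that occurs once (k = 1) with the most frequent value."""
    counts = Counter(column)
    most_common_value, _ = counts.most_common(1)[0]
    return [most_common_value if counts[v] == 1 else v for v in column]

ages = [34, 34, 51, 27, 27, 27]
anonymized = replace_unique_values(ages)   # 51 occurs once and is replaced by 27
```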
• Only sensitive attributes removed – info loss minimized in published
attributes.
CONCLUSION
• The Comparative x-CEG: Empirical results from this study show that fine-tuning parameters in the data privacy
procedure, specifically, Noise Addition and Differential Privacy, and with adjustments to the machine learning
classifiers, lowers the classification error and thus generates better and desirable data utility. The hypothesis holds. The
x-CEG model could help in presenting acceptable trade-off points between privacy and utility.
• The SIED model: Data privacy requirements vary on a case-by-case basis, so soliciting them appropriately is vital; SIED could serve as a suitable framework for such a data privacy engineering process.
• Privatized Synthetic Data Generation: Data swapping, Distance Transforms, Discrete Cosine Transforms, and
Discrete Wavelet Transforms, in combination with data privacy procedures allow for the generation of privatized
synthetic data sets. However, more research on optimal parameterization needs to be done; as well as using other signal
processing techniques.
• Distance Transforms and Filtering: Empirical results from this study show that a hybrid of Distance Transforms (DT)
and data privacy, in combination with filtering, maintains the skeletal structure of the original data, generates privatized
synthetic data with better classification accuracy results, thus better utility. However, more study needs to be done on
securing DT-based privatized data, to prevent attackers from reconstructing private data.
• Differential Privacy and Filtering: On the other hand, Differential Privacy (DP) offers strong privacy guarantees but at the loss of data utility. However, empirical results from this study have shown that Gaussian filtering reduces outlier noise in DP-based data and improves classification accuracy.
• K-anonymity: Information loss could be minimized using frequency count analysis for privatized data models requiring
k-anonymity for confidentiality. Only remove sensitive attributes and use synthetics for suppressed values.
• Privacy versus Utility: Achieving optimal utility while granting privacy remains an open goal. Accurate classification could also mean loss of privacy, so trade-offs must be made between privacy and utility.
FUTURE WORK
• Future work includes:
• Advance the state of the art in Data Privacy Engineering by developing data privacy compliant software, data privacy modeling, and autonomous intelligent data privacy agent systems following the SIED framework.
•Apply data privacy and utility principles on digital forensics data, network traffic
data, bioinformatics data, and big data.
•Study efficient generation of privatized synthetic data sets.
• Apply data privacy principles to real time data; including realistic scenarios, where
users of data provide feedback on how useful the data was to them.
•Show, analytically, differences in performance between the various methods
introduced in this work, as well as other state-of-the-art methods.
PUBLICATIONS
1. Kato Mivule, “Towards Agent-based Data Privacy Engineering”, Proceedings of the Sixth International Conference on Advanced Cognitive Technologies and
Applications – COGNITIVE 2014, May 25 – May 30, 2014 (In Print), Venice, Italy.
2. Kato Mivule and Claude Turner, “SIED, A Data Privacy Engineering Framework”, Abstracts, Emerging Researchers National Conference in STEM (ERN 2014),
Page A239, ISBN 978-0-87168-757-9, Feb 20-22, 2014, Washington DC, USA. [Best Oral Presentation Award]
3. Kato Mivule and Claude Turner, International Journal of Computer Science and Mobile Computing, ICMIC13, December 2013, pg. 36-43, Dec 17-18, 2013, Trivandrum, Kerala, India.
4. Kato Mivule and Claude Turner, A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge,
Procedia Computer Science, Volume 20, 2013, Pages 414-419, ISSN 1877-0509, Nov 13-15, Baltimore, MD, USA.
5. Kato Mivule and Claude Turner, “An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge”, International Conference on
Information and Knowledge Engineering (IKE 2013), July 22-25, Pages 203-204, Las Vegas, NV, USA.
6. Kato Mivule, Darsana Josyula, and Claude Turner, “Data Privacy Preservation in Multi-Agent Learning Systems”, Proceedings of the Fifth International Conference
on Advanced Cognitive Technologies and Applications – COGNITIVE 2013, May 27 - June 1, 2013, Pages 14-20, Valencia, Spain.
7. Kato Mivule, Claude Turner, Soo-Yeon Ji, "Towards A Differential Privacy and Utility Preserving Machine Learning Classifier", Procedia Computer Science, 2012,
Pages 176-181, Washington DC, USA.
8. Kato Mivule, Stephen Otunba, Tattwamasi Tripathy, and Sharad Sharma, "Implementation of Data Privacy and Security in an Online Student Health Records System", Proceedings of the ISCA 21st International Conference on Software Engineering and Data Engineering (SEDE-2012), Pages 143-148, Los Angeles, CA, USA.
9. Kato Mivule, Claude Turner, "Applying Data Privacy Techniques on Published Data in Uganda", Proceedings of the 2012 International Conference on e-Learning, e-
Business, Enterprise Information Systems, and e-Government (EEE 2012), Pages 110-115, Las Vegas, NV, USA.
10. Kato Mivule, "Utilizing Noise Addition for Data Privacy, an Overview", Proceedings of the International Conference on Information and Knowledge Engineering
(IKE 2012), Pages 65-71, Las Vegas, NV, USA.
THANK YOU!
QUESTIONS?
kmivule@gmail.com
Kato Mivule – Bowie State University Department of Computer Science
DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE

Mais conteúdo relacionado

Mais procurados

Efficiency of LSB steganography on medical information
Efficiency of LSB steganography on medical information Efficiency of LSB steganography on medical information
Efficiency of LSB steganography on medical information IJECEIAES
 
Survey on evolutionary computation tech techniques and its application in dif...
Survey on evolutionary computation tech techniques and its application in dif...Survey on evolutionary computation tech techniques and its application in dif...
Survey on evolutionary computation tech techniques and its application in dif...ijitjournal
 
An Architectural Approach of Data Hiding In Images Using Mobile Communication
An Architectural Approach of Data Hiding In Images Using Mobile CommunicationAn Architectural Approach of Data Hiding In Images Using Mobile Communication
An Architectural Approach of Data Hiding In Images Using Mobile Communicationiosrjce
 
Development of durian leaf disease detection on Android device
Development of durian leaf disease detection on Android device Development of durian leaf disease detection on Android device
Development of durian leaf disease detection on Android device IJECEIAES
 
Multimedia data mining using deep learning
Multimedia data mining using deep learningMultimedia data mining using deep learning
Multimedia data mining using deep learningPeter Wlodarczak
 
USE OF NETWORK FORENSIC MECHANISMS TO FORMULATE NETWORK SECURITY
USE OF NETWORK FORENSIC MECHANISMS TO FORMULATE NETWORK SECURITYUSE OF NETWORK FORENSIC MECHANISMS TO FORMULATE NETWORK SECURITY
USE OF NETWORK FORENSIC MECHANISMS TO FORMULATE NETWORK SECURITYIJMIT JOURNAL
 
Weeds detection efficiency through different convolutional neural networks te...
Weeds detection efficiency through different convolutional neural networks te...Weeds detection efficiency through different convolutional neural networks te...
Weeds detection efficiency through different convolutional neural networks te...IJECEIAES
 
ARTIFICIAL INTELLIGENCE TECHNIQUES FOR THE MODELING OF A 3G MOBILE PHONE BASE...
ARTIFICIAL INTELLIGENCE TECHNIQUES FOR THE MODELING OF A 3G MOBILE PHONE BASE...ARTIFICIAL INTELLIGENCE TECHNIQUES FOR THE MODELING OF A 3G MOBILE PHONE BASE...
ARTIFICIAL INTELLIGENCE TECHNIQUES FOR THE MODELING OF A 3G MOBILE PHONE BASE...ijaia
 
An Investigation of Data Privacy and Utility Using Machine Learning as a Gauge

  • 1. Venue and Date: Center for Business and Graduate Studies Dean’s Conference Room 1303 Open to the Public Thursday, April 17, 2014 at 1 pm Dissertation Committee: Claude Turner, Ph.D. Chair Soo-Yeon Ji, Ph.D. Member Hoda El-Sayed, D.Sc. Member Darsana Josyula, Ph.D. Member Anthony Joseph, Ph.D. External Examiner Department of Computer Science Dissertation Defense AN INVESTIGATION OF DATA PRIVACY AND UTILITY USING MACHINE LEARNING AS A GAUGE Kato Mivule For the Degree of D.Sc. in Computer Science Cosmas U. Nwokeafor, PhD Dean, The Graduate School Lethia Jackson, D.Sc. Chair, Computer Science Department
  • 2. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE OUTLINE • Introduction o The Problem o Contributions • Literature Review • Methodology • Results and Discussion o Results o Discussion • Conclusion and Future work o Conclusion o Future work Kato Mivule – Bowie State University Department of Computer Science
  • 3. CONTRIBUTIONS
      1. A proposed data privacy engineering framework, SIED.
      2. A proposed Comparative x-CEG data utility analysis heuristic.
      3. Proposed Initial and Subsequent Basic Privacy (IBP and SBP) indexes.
      4. A proposed data swapping and noise addition hybrid model for privacy.
      5. A proposed privatized synthetic data generation model using image and signal processing techniques (DT, DCT, and DWT).
      6. An implementation of k-anonymity that minimizes information loss via the frequency count analysis and synthetic data replacement model.
  • 4. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE THE PROBLEM Finding a user-defined balance between data privacy and utility needs with trade-offs. • The challenge of ambiguous definitions of privacy and utility. “Perfect privacy can be achieved by publishing nothing at all, but this has no utility; perfect utility can be obtained by publishing the data exactly as received, but this offers no privacy” Cynthia Dwork (2006) Data Privacy ~Differential Privacy ~Noise addition ~K-anonymity, etc... Data Utility ~Completeness ~Currency ~Accuracy Kato Mivule – Bowie State University Department of Computer Science
  • 5. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE MOTIVATION • Generate privatized synthetic data sets that meet acceptable privacy and utility requirements. • Data Privacy Engineering - Adapt engineering principles in the data privacy and utility process. HYPOTHESIS • Fine-tuning parameters in the data privacy procedure, specifically using perturbation methods such as noise addition and differential privacy, lowers the classification error and thus generates better data utility. Kato Mivule – Bowie State University Department of Computer Science
  • 6. LITERATURE REVIEW The data privacy and utility problem
      • Wong et al. (2007); Meyerson & Williams (2004); Park & Shim (2007): Data privatization diminishes data utility, an NP-hard problem.
      • Krause & Horvitz (2010); Wang & Wu (2005): Optimal data utility with privacy is a well-documented NP-hard problem.
      • Ghosh et al. (2008); Brenner & Nissim (2010): Trade-offs are needed in the privacy versus utility process, which is also NP-hard.
      • Li & Li (2009): It is not possible to equate privacy and utility.
      • Fienberg, Rinaldo, & Yang (2010): Even with differential privacy, privacy is granted but at a loss of data utility.
  • 7. LITERATURE REVIEW Techniques and Algorithms used in this study
      • Data Privacy: Noise Addition; Logarithmic Noise; Multiplicative Noise; Differential Privacy; K-anonymity
      • Image and Signal Processing: Distance Transform; Discrete Cosine Transform; Discrete Wavelet Transform; Gaussian Filtering
      • Machine Learning: KNN; Neural Networks; Naïve Bayes; Decision Trees; AdaBoost M1
  • 8. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE METHODOLOGY Contribution 1 – SIED, a data privacy engineering framework • SIED phases – Specifications, Implementation, Evaluation, and Dissemination • Motivation: Given any original dataset 𝑋, a set of data privacy engineering phases should be followed from start to completion in the generation of a privatized dataset 𝑋′ . Kato Mivule – Bowie State University Department of Computer Science
  • 9. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE METHODOLOGY Contribution 1 – SIED, a data privacy engineering framework - The SIED Specification Phase: Kato Mivule – Bowie State University Department of Computer Science
  • 10. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE METHODOLOGY Contribution 1 – SIED, a data privacy engineering framework - The SIED Implementation Phase: Kato Mivule – Bowie State University Department of Computer Science
  • 11. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE METHODOLOGY Contribution 1 – SIED, a data privacy engineering framework - The SIED Evaluation Phase: Kato Mivule – Bowie State University Department of Computer Science
  • 12. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE METHODOLOGY Contribution 1 – SIED, a data privacy engineering framework - The SIED Dissemination Phase: Kato Mivule – Bowie State University Department of Computer Science
  • 13. METHODOLOGY Contribution 2 – A Data Privacy Parameter Mapping Heuristic
      • Categorize parameters for effective fine-tuning, for better privacy and utility. What parameters need adjustment in the data privacy process?
      • Category 1, Data Utility Goal Parameters: for example, Accuracy, Currency, and Completeness.
      • Category 2, Data Privacy Algorithm Parameters: values of k in k-anonymity, ε in Noise addition and Differential privacy.
      • Category 3, Application Parameters (e.g. Machine Learning Classifier): for example, weak learners in AdaBoost.
      • Diagram labels: Parameter Adjustment and Fine-tuning; Trade-offs; Data Privacy and Utility Preservation.
  • 14. METHODOLOGY Contribution 3 – The x-CEG and Comparative x-CEG Heuristics
      The Classification Error Gauge (x-CEG) replicates x times until threshold t is reached:
      1. Get original dataset.
      2. Apply data privacy.
      3. Classify privatized dataset.
      4. If error <= t: better utility might be achieved; publish.
      5. If error > t: adjust data privacy parameters, adjust classifier parameters, and repeat.
      The Comparative x-CEG heuristic employs multiple data privacy and classifier algorithms in each run.
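
The x-CEG loop on this slide can be sketched in a few lines of Python. This is only an illustration of the control flow, not the author's MATLAB and RapidMiner implementation: the function name `x_ceg`, the callables `privatize` and `classification_error`, and the toy stand-ins below are all hypothetical, and "adjusting parameters" is simplified to stepping through a list of candidate settings.

```python
def x_ceg(original, privatize, classification_error, settings, threshold):
    """Return (setting, error, run) for the first setting whose
    classification error is <= threshold, or None if no setting qualifies."""
    for run, setting in enumerate(settings, start=1):
        privatized = privatize(original, setting)   # apply data privacy
        error = classification_error(privatized)    # classify privatized data
        if error <= threshold:                      # acceptable utility: publish
            return setting, error, run
        # error > threshold: "adjust parameters" by trying the next setting
    return None


# Toy stand-ins (assumptions for illustration only).
ORIGINAL = [1.0, 2.0, 3.0, 4.0]

def toy_privatize(data, sigma):
    # Deterministic perturbation of size sigma, standing in for random noise.
    return [x + sigma * (1 if i % 2 == 0 else -1) for i, x in enumerate(data)]

def toy_error(privatized):
    # Percentage of values that no longer round to their original value,
    # standing in for a classifier's error rate.
    wrong = sum(1 for x, y in zip(ORIGINAL, privatized) if round(x) != round(y))
    return 100.0 * wrong / len(ORIGINAL)

result = x_ceg(ORIGINAL, toy_privatize, toy_error, [0.8, 0.4, 0.1], threshold=12.67)
```

In the toy run the first setting is rejected and the second is accepted, so the loop stops before trying the third. A Comparative x-CEG variant would wrap this loop over several privacy algorithms and classifiers.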
  • 15. METHODOLOGY Contribution 4 – The x-CEG Threshold determination heuristic
      • Average value of the function = integral / interval: $AVF = \mathrm{Integral} / \mathrm{Interval}$
      • $AVF = \frac{1}{b-a} \int_{a}^{b} f(x)\,dx \approx \frac{1}{b-a} \sum_{i=1}^{n} f(\bar{x}_i)\,\Delta x$
      • where $\Delta x = \frac{b-a}{n}$
      • and $\bar{x}_i = \frac{1}{2}(x_{i-1} + x_i)$
      • The mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
      • $t = \mathrm{Max}[\max(\mathrm{mean}), \max(\mathrm{midpoint})]$
      • The threshold $t$ is chosen as the highest point between the max mean and max mid-point values.
      • The classification error of the original data set is used as a benchmark in measuring privatized synthetic data sets.
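
As a rough illustration of the formulas on this slide, the sketch below approximates the average value of a function with the midpoint rule and then takes the threshold as the larger of the maximum mean and maximum mid-point values. The function names and dictionary layout are assumptions; the per-classifier figures are the Mean and Mid-Point rows reported on the threshold results slide, used here only as sample input.

```python
def average_value_midpoint(f, a, b, n):
    """Approximate (1/(b-a)) * integral of f over [a, b] with n midpoint slices."""
    dx = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x_prev = a + (i - 1) * dx
        x_curr = a + i * dx
        total += f(0.5 * (x_prev + x_curr)) * dx   # f at the slice midpoint
    return total / (b - a)


def threshold(mean_by_classifier, midpoint_by_classifier):
    """t = Max[max(mean), max(midpoint)] across classifiers."""
    return max(max(mean_by_classifier.values()),
               max(midpoint_by_classifier.values()))


means = {"KNN": 84.87, "Neural Nets": 87.41, "Naive Bayes": 84.54,
         "Decision Trees": 83.74, "AdaBoost M1": 82.30}
midpoints = {"KNN": 80.18, "Neural Nets": 82.48, "Naive Bayes": 79.81,
             "Decision Trees": 79.05, "AdaBoost M1": 77.51}
t = threshold(means, midpoints)
```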
  • 16. METHODOLOGY Contribution 5 – The Initial and Subsequent Privacy Indices
      • Let $X$ be the set of all values in database $X$ such that $X = \{X_1 \dots X_n\}$.
      • Let $X'$ be the set of items to be privatized such that $X' = \{X'_1 \dots X'_n\}$.
      • Let $Y$ be the set of items that get revealed after our initial privacy measurement,
      • where $|X'| \le |X|$ and $|Y| \le |X|$.
      • $X$, $X'$, and $Y$ are assumed countable, such that there is a one-to-one (injective) function $f: X \to N$; $X' \to N$; $Y \to N$ from $X$, $X'$, and $Y$ to the natural numbers $N = \{0, 1, 2, 3 \dots n\}$ respectively.
      • $\text{Initial Basic Privacy (IBP)} = \frac{\mathrm{Cardinality}(X')}{\mathrm{Cardinality}(X)} \times 100$
      • $\text{Subsequent Basic Privacy (SBP)} = \frac{\mathrm{Cardinality}(X' - Y)}{\mathrm{Cardinality}(X)} \times 100$
      • where $\mathrm{Cardinality}$ is the total count of elements in each of $X'$, $Y$, and $X$.
      • IBP and SBP could be taken as percentages or normalized between 0 and 1.
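
A small worked example may help make the two indices concrete. The sketch below assumes that the SBP numerator is the cardinality of the set difference between the privatized items and the revealed items, which is one reading of the slide; the function names and sample sets are hypothetical.

```python
def initial_basic_privacy(x, x_priv):
    """IBP = |X'| / |X| * 100, with X' the items selected for privatization."""
    return 100.0 * len(set(x_priv)) / len(set(x))


def subsequent_basic_privacy(x, x_priv, y):
    """SBP = |X' - Y| / |X| * 100, with Y the items later revealed.
    Assumes the numerator is a set difference."""
    return 100.0 * len(set(x_priv) - set(y)) / len(set(x))


X = set(range(10))       # all values in the database
X_PRIV = set(range(8))   # values chosen for privatization
Y = {0, 1}               # values revealed after the initial measurement

ibp = initial_basic_privacy(X, X_PRIV)        # 8 of 10 items privatized
sbp = subsequent_basic_privacy(X, X_PRIV, Y)  # 6 of 10 remain private
```

Dividing either result by 100 gives the normalized form between 0 and 1 mentioned on the slide.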
  • 17. METHODOLOGY Contribution 6 – The Filtered Comparative x-CEG Heuristic - Using image and signal processing techniques to generate privatized synthetic data.
  • 18. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE METHODOLOGY Contribution 7– Data swapping and noise addition data privacy hybrid - Generating privatized synthetic data using data swapping and noise perturbation. Kato Mivule – Bowie State University Department of Computer Science
  • 19. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE METHODOLOGY Contribution 8 – Minimizing information loss with K-anonymity • Implementation of k-anonymity by minimizing information loss via the frequency count analysis and synthetic data replacement model. Kato Mivule – Bowie State University Department of Computer Science
  • 20. RESULTS AND DISCUSSION Comparative x-CEG Results
      • The Iris Fisher multivariate dataset from the UCI repository was used.
      • 165 experiment runs, generating 165 privatized synthetic data sets.
      • KNN, Neural Nets, Decision Trees, AdaBoost, and Naïve Bayes.
      • MATLAB for data privacy and Rapid Miner for machine learning.

      | NOISE LEVEL | KNN | NEURAL NETS | NAÏVE BAYES | DECISION TREES | ADABOOST M1 |
      |---|---|---|---|---|---|
      | Original | 96.00 | 96.67 | 96.00 | 94.67 | 97.33 |
      | Noise1 (μ=5.8, σ=0.8) | 66.67 | 74.00 | 64.00 | 66.67 | 64.00 |
      | Noise2 (μ=0, σ=0.8) | 61.33 | 72.00 | 66.67 | 63.33 | 54.67 |
      | Noise3 (μ=1, σ=0.8) | 68.67 | 74.00 | 69.33 | 66.67 | 60.00 |
      | Noise4 (μ=2, σ=0.8) | 68.67 | 62.67 | 62.00 | 59.33 | 54.67 |
      | Noise5 (μ=3, σ=0.8) | 72.67 | 66.67 | 67.33 | 61.33 | 50.67 |
      | Noise6 (μ=4, σ=0.8) | 75.33 | 82.67 | 70.00 | 72.00 | 63.33 |
      | Noise1a (μ=5, σ=0.1) | 94.00 | 93.33 | 92.67 | 91.33 | 92.67 |
      | Noise1b (μ=5, σ=0.2) | 92.00 | 94.67 | 91.33 | 90.00 | 90.67 |
      | Noise1c (μ=5, σ=0.3) | 93.33 | 94.00 | 90.67 | 92.00 | 94.00 |
      | Noise1d (μ=5, σ=0.4) | 90.00 | 93.33 | 87.33 | 86.67 | 86.67 |
      | Noise2b (μ=0, σ=0.1) | 96.67 | 96.67 | 94.00 | 96.67 | 92.00 |
      | Noise2c (μ=0, σ=0.2) | 89.33 | 92.00 | 86.67 | 87.33 | 90.00 |
      | Noise2d (μ=0, σ=0.3) | 87.33 | 90.00 | 86.67 | 84.67 | 85.33 |
      | Noise2e (μ=0, σ=0.4) | 87.33 | 90.00 | 86.67 | 84.67 | 85.33 |
      | Noise3a (μ=1, σ=0.4) | 87.33 | 87.33 | 85.33 | 84.00 | 83.33 |
      | Noise3b (μ=1, σ=0.1) | 97.33 | 94.00 | 96.00 | 96.00 | 94.67 |
      | Noise3c (μ=1, σ=0.2) | 92.67 | 95.33 | 91.33 | 90.67 | 93.33 |
      | Noise3d (μ=1, σ=0.3) | 94.67 | 95.33 | 91.33 | 94.00 | 90.00 |
      | Noise4a (μ=2, σ=0.1) | 94.67 | 98.00 | 98.00 | 96.67 | 98.00 |
      | Noise4b (μ=2, σ=0.2) | 93.33 | 96.00 | 92.67 | 91.33 | 90.67 |
      | Noise4c (μ=2, σ=0.3) | 88.00 | 91.33 | 89.33 | 90.00 | 86.67 |
      | Noise4d (μ=2, σ=0.4) | 87.33 | 87.33 | 85.33 | 84.00 | 83.33 |
      | Noise5a (μ=3, σ=0.1) | 97.33 | 94.00 | 96.00 | 96.00 | 94.67 |
      | Noise5b (μ=3, σ=0.2) | 92.67 | 95.33 | 91.33 | 90.67 | 93.33 |
      | Noise5c (μ=3, σ=0.3) | 94.67 | 95.33 | 91.33 | 94.00 | 90.00 |
      | Noise5d (μ=3, σ=0.4) | 93.33 | 94.00 | 93.33 | 92.00 | 87.33 |
      | Noise6a (μ=4, σ=0.1) | 78.00 | 87.33 | 87.33 | 82.67 | 84.67 |
      | Noise6b (μ=4, σ=0.2) | 93.33 | 95.33 | 94.00 | 93.33 | 92.67 |
      | Noise6c (μ=4, σ=0.3) | 91.33 | 92.00 | 92.00 | 90.00 | 92.00 |
      | Noise6d (μ=4, σ=0.4) | 78.00 | 87.33 | 88.67 | 82.67 | 84.67 |
      | Multiplicative | 56.67 | 68.67 | 59.33 | 64.67 | 58.00 |
      | Logarithmic | 50.67 | 58.00 | 56.00 | 53.33 | 57.33 |
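
The noise levels in the table are parameterized by a mean μ and a standard deviation σ. A minimal sketch of additive Gaussian noise under those two parameters is shown below; it assumes the noise is drawn independently per value and simply added, which the slides do not spell out, and the function name and sample values are hypothetical. The original experiments used MATLAB rather than Python.

```python
import random


def add_gaussian_noise(values, mu, sigma, seed=None):
    """Return a privatized copy of values with N(mu, sigma) noise added to each."""
    rng = random.Random(seed)   # a seeded generator keeps runs reproducible
    return [v + rng.gauss(mu, sigma) for v in values]


sepal_lengths = [5.1, 4.9, 4.7, 4.6, 5.0]   # sample values for illustration
low_noise = add_gaussian_noise(sepal_lengths, mu=0.0, sigma=0.1, seed=42)
high_noise = add_gaussian_noise(sepal_lengths, mu=0.0, sigma=0.8, seed=42)
```

With the same seed, the σ=0.8 copy deviates from the original exactly eight times as far as the σ=0.1 copy, which mirrors the accuracy drop seen in the σ=0.8 rows of the table.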
  • 21. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE RESULTS AND DISCUSSION Comparative x-CEG Results • A bar chart depiction of the Comparative x-CEG classification accuracy results Kato Mivule – Bowie State University Department of Computer Science
  • 22. RESULTS AND DISCUSSION - Comparative x-CEG Results Comparative x-CEG classifier performance results: Neural Nets most resilient.
  • 23. RESULTS AND DISCUSSION
      • x-CEG Threshold Determination Results
      • Threshold $t = \mathrm{Max}[\max(\mathrm{mean}), \max(\mathrm{midpoint})]$
      • The threshold value is chosen heuristically using the mid-point value classification accuracy of 87.33% for the Neural Nets.

      | Statistic | KNN | NEURAL NETS | NAÏVE BAYES | DECISION TREES | ADABOOST M1 | MAX |
      |---|---|---|---|---|---|---|
      | Mean | 84.87 | 87.41 | 84.54 | 83.74 | 82.30 | 87.41 |
      | Mid-Point | 80.18 | 82.48 | 79.81 | 79.05 | 77.51 | 82.48 |
      | Max | 84.87 | 87.41 | 84.54 | 83.74 | 82.30 | 87.41 |
  • 24. RESULTS AND DISCUSSION • x-CEG Threshold Determination Results • Threshold 𝒕 = 𝑴𝒂𝒙[𝒎𝒂𝒙 𝒎𝒆𝒂𝒏 , 𝒎𝒂𝒙 𝒎𝒊𝒅𝒑𝒐𝒊𝒏𝒕 ] • The threshold value is chosen heuristically using the mid-point value classification accuracy of 87.33% for the Neural Nets. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE Kato Mivule – Bowie State University Department of Computer Science
  • 25. RESULTS AND DISCUSSION • x-CEG Threshold Determination Results • Threshold 𝒕 = 𝑴𝒂𝒙[𝒎𝒂𝒙 𝒎𝒆𝒂𝒏 , 𝒎𝒂𝒙 𝒎𝒊𝒅𝒑𝒐𝒊𝒏𝒕 ] • The threshold value is chosen heuristically using the mid-point value classification accuracy of 87.33% for the Neural Nets. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE Kato Mivule – Bowie State University Department of Computer Science
  • 26. RESULTS AND DISCUSSION • How much privacy? – statistical traits of the original and privatized data. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 27. RESULTS AND DISCUSSION • How much privacy? – statistical traits of the original and privatized data. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 28. RESULTS AND DISCUSSION • How much privacy? – statistical traits of the original and privatized data. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 29. RESULTS AND DISCUSSION
      • How much privacy? – statistical traits of the original and privatized data.

      | Statistic | Value |
      |---|---|
      | Original Data MSE | 15.8937 |
      | Privatized Data MSE | 24.0875 |
      | Original Data Entropy | -3.05E+04 |
      | Privatized Data Entropy | -5.05E+04 |
      | Correlation | 0.9808 |
      | MSE Difference | 8.1938 |
      | Entropy Difference | -2.00E+04 |
  • 30. RESULTS AND DISCUSSION – Data Swapping and Noise Addition Hybrid
      • 330 data sets were generated from the data swapping and noise addition hybrid experiment.
      • The optimal data swap for acceptable privacy and utility levels is between 5% and 10%.
      • Two data sets satisfied the threshold criteria after the Comparative x-CEG:
      • $noise \sim (\mu = 1, \sigma = 0.1)$ at 5% swap.
      • $noise \sim (\mu = 5, \sigma = 0.1)$ at 5% swap.
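
One way to picture the hybrid is to swap a fraction of the values within an attribute and then add Gaussian noise to every value. The sketch below follows that reading; the pairing strategy, the function name `swap_and_add_noise`, and the sample data are assumptions made for illustration, since the slides do not describe how records were selected for swapping.

```python
import random


def swap_and_add_noise(values, swap_fraction, mu, sigma, seed=None):
    """Swap roughly swap_fraction of the values pairwise, then add N(mu, sigma) noise."""
    rng = random.Random(seed)
    out = list(values)
    n_swap = int(round(len(out) * swap_fraction))
    n_swap -= n_swap % 2                        # swapping needs an even count
    chosen = rng.sample(range(len(out)), n_swap)
    for i, j in zip(chosen[0::2], chosen[1::2]):
        out[i], out[j] = out[j], out[i]         # exchange values between two records
    return [v + rng.gauss(mu, sigma) for v in out]


column = [float(v) for v in range(100)]         # hypothetical attribute column
swapped_only = swap_and_add_noise(column, 0.10, mu=0.0, sigma=0.0, seed=7)
hybrid = swap_and_add_noise(column, 0.05, mu=1.0, sigma=0.1, seed=7)
```

Swapping alone keeps the attribute's overall distribution intact while breaking the link between a value and its record; the noise step then perturbs the values themselves.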
  • 31. RESULTS AND DISCUSSION – Data Swapping and Noise Addition Hybrid Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 32. RESULTS AND DISCUSSION – Data Swapping and Noise Addition Hybrid • Best classification accuracy obtained between 5% and 10% data swap.
  • 33. RESULTS AND DISCUSSION – Signal Processing and Data Privacy Hybrid Privatized synthetic data sets using Discrete Cosine Transforms (DCT) . Synthetic DCT-based Sepal Length data results Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 34. RESULTS AND DISCUSSION – Signal Processing and Data Privacy Hybrid Privatized synthetic data sets using Discrete Cosine Transforms (DCT) . Synthetic Filtered DCT-based Sepal Length data results Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 35. RESULTS AND DISCUSSION – Signal Processing and Data Privacy Hybrid Privatized synthetic data sets using Discrete Cosine Transforms (DCT) . Filtered DCT-based data descriptive statistics – skeletal structure not kept as in DT-based data Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 36. RESULTS AND DISCUSSION – Signal Processing and Data Privacy Hybrid Privatized synthetic data sets using Discrete Cosine Transforms (DCT) . Filtered DCT-based data inference statistics – low correlation Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 37. RESULTS AND DISCUSSION – Image Processing and Data Privacy Hybrid Privatized synthetic data sets using Distance Transforms (DT) – Skeletal Structure kept. . DT-based Sepal Length data results DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 38. RESULTS AND DISCUSSION – Image Processing and Data Privacy Hybrid Privatized synthetic data sets using Distance Transforms (DT) – Skeletal Structure kept. . Filtered DT-based Sepal Length data results DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 39. RESULTS AND DISCUSSION – Image Processing and Data Privacy Hybrid Privatized synthetic data sets using Distance Transforms (DT) – Skeletal Structure kept. . Filtered DT-based data descriptive statistics – skeletal structure kept DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 40. RESULTS AND DISCUSSION – Image Processing and Data Privacy Hybrid Privatized synthetic data sets using Distance Transforms (DT) – Skeletal Structure kept. Filtered DT-based data inference statistics – High correlation
  • 41. RESULTS AND DISCUSSION – Distance Transforms Based Data and the Clustering Test DT-based synthetic data produced the best Davies-Bouldin criterion at 0.419 after filtering, outperforming the original data.
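
For readers unfamiliar with the criterion, the Davies-Bouldin index averages, over all clusters, the worst-case ratio of within-cluster scatter to between-centroid distance, so lower values indicate tighter and better separated clusters. The sketch below is a plain-Python version of the standard definition with Euclidean distance; it is not the tool used in the study, and the sample points are hypothetical.

```python
import math
from collections import defaultdict


def davies_bouldin(points, labels):
    """Davies-Bouldin index for labelled points (needs >= 2 distinct centroids)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    # Centroid of each cluster: the coordinate-wise mean.
    centroids = {
        label: tuple(sum(coord) / len(members) for coord in zip(*members))
        for label, members in clusters.items()
    }
    # Scatter: mean distance from cluster members to their centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys if j != i
        )
    return total / len(keys)


pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
db = davies_bouldin(pts, [0, 0, 1, 1])   # two tight, well separated clusters
```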
  • 42. RESULTS AND DISCUSSION – Distance Transforms Based Data and the Clustering Test DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 43. RESULTS AND DISCUSSION – Distance Transforms Based Data and the Clustering Test Clustering results of the Original Fisher Iris Data DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 44. RESULTS AND DISCUSSION – Distance Transforms Based Data and the Clustering Test Clustering results of the DT-based synthetic Fisher Iris Data DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 45. RESULTS AND DISCUSSION – Distance Transforms Based Data and the Clustering Test Clustering Results of the Filtered DT-based Fisher Iris Data. Clustering greatly improved after filtering. DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 46. RESULTS AND DISCUSSION DT, DCT, and DWT improved classification accuracy after filtering. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 47. Results – Signal Processing – The Machine Learning Classification Error Test

      | Priv Synth Data | NN | KNN | NB | DT | AdaBoost | Max |
      |---|---|---|---|---|---|---|
      | Mean | 91.00 | 87.95 | 86.07 | 86.74 | 84.33 | 91.00 |
      | Mid-Point | 75.83 | 72.78 | 71.65 | 72.31 | 70.39 | 75.83 |
      | Max | 91.00 | 87.95 | 86.07 | 86.74 | 84.33 | 91.00 |
  • 48. RESULTS AND DISCUSSION - Non-Interactive Differential Privacy (DP) •Results of the Iris-Fisher data after DP – Too much noise is an issue with DP Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
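
The slides do not say which mechanism produced the DP data, so the following is only a sketch of the textbook Laplace mechanism, in which noise with scale sensitivity/ε is added to each value. It shows why small ε (stronger privacy) leads to the heavy noise noted above. The function names and parameter values are assumptions.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Scale b of the Laplace noise: b = sensitivity / epsilon."""
    return sensitivity / epsilon


def add_laplace_noise(values, sensitivity, epsilon, seed=None):
    """Add Laplace(0, b) noise to each value, with b = sensitivity / epsilon."""
    rng = random.Random(seed)
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    return [v + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b) for v in values]


zeros = [0.0] * 20000
mild = add_laplace_noise(zeros, sensitivity=1.0, epsilon=1.0, seed=3)
strong = add_laplace_noise(zeros, sensitivity=1.0, epsilon=0.1, seed=3)
mean_abs_mild = sum(abs(v) for v in mild) / len(mild)
mean_abs_strong = sum(abs(v) for v in strong) / len(strong)
```

Lowering ε from 1.0 to 0.1 multiplies the noise scale by ten, which is the kind of utility loss the classification results on the following slides reflect.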
  • 49. RESULTS AND DISCUSSION - Non-Interactive Differential Privacy (DP) • Classification accuracy of DP data (before filtering) reduces with increased DP levels. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 50. RESULTS AND DISCUSSION - Non-Interactive Differential Privacy (DP) • Improved Classification accuracy of DP data sets after filtering. Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
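
Gaussian filtering, listed among the techniques used in the study, smooths a signal by replacing each value with a Gaussian-weighted average of its neighbours, which damps isolated outliers. The sketch below is a minimal 1-D version; the kernel radius, the edge handling (clamping to the nearest valid index), and the sample data are assumptions, not the settings used in the experiments.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_filter_1d(values, sigma=1.0, radius=3):
    """Smooth values with a Gaussian kernel, clamping indices at the edges."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + k, 0), n - 1)   # clamp to the valid index range
            acc += w * values[j]
        smoothed.append(acc)
    return smoothed


noisy = [5.0, 5.1, 4.9, 12.0, 5.0, 5.2, 4.8]   # one outlier at index 3
filtered = gaussian_filter_1d(noisy, sigma=1.0, radius=2)
```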
  • 51. RESULTS AND DISCUSSION - Non-Interactive Differential Privacy (DP) • Comparative descriptive statistics of Original, DP, and filtered DP based data. •Skeletal structure not kept as in DT-based data but outlier noise removed in DP-based data Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 52. Results – Non-Interactive Differential Privacy – Inference Statistics Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 53. Results – Non-Interactive Differential Privacy – How much DP? Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 54. Results – Non-Interactive Differential Privacy – How much DP? Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 55. RESULTS AND DISCUSSION – Data Privacy using K-Anonymity • Suppress all items where k = 1. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 56. RESULTS AND DISCUSSION– Data Privacy using K-Anonymity • Replace suppressed items with new synthetic values (most frequent values) such that k > 1 for all items. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
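
The suppress-and-replace steps on these two slides can be sketched for a single attribute column: count how often each value occurs, treat values that occur once (k = 1) as suppressed, and replace them with the most frequent value so that every remaining value has k > 1. The function name and the sample ZIP codes are hypothetical, and a full k-anonymity implementation would count combinations of quasi-identifiers rather than one column.

```python
from collections import Counter


def replace_unique_values(column):
    """Replace values that occur exactly once with the most frequent value."""
    counts = Counter(column)                         # frequency count analysis
    most_frequent, top_count = counts.most_common(1)[0]
    if top_count < 2:
        raise ValueError("no value occurs more than once; cannot reach k > 1")
    return [v if counts[v] > 1 else most_frequent for v in column]


zip_codes = ["20850", "20850", "20740", "20715", "20850"]
anonymized = replace_unique_values(zip_codes)
k = min(Counter(anonymized).values())               # smallest group size after replacement
```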
  • 57. RESULTS AND DISCUSSION – Data Privacy using K-Anonymity • Only sensitive attributes removed – info loss minimized in published attributes. Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 58. RESULTS AND DISCUSSION – Data Privacy using K-Anonymity • Only sensitive attributes removed – info loss minimized in published attributes. Kato Mivule – Bowie State University Department of Computer Science DISSERTATION DEFENSE PRESENTATION BY KATO MIVULE
  • 59. CONCLUSION
      • The Comparative x-CEG: Empirical results from this study show that fine-tuning parameters in the data privacy procedure, specifically Noise Addition and Differential Privacy, together with adjustments to the machine learning classifiers, lowers the classification error and thus generates better data utility. The hypothesis holds. The x-CEG model could help in presenting acceptable trade-off points between privacy and utility.
      • The SIED model: Data privacy requirements vary on a case-by-case basis and must be solicited appropriately; SIED could therefore serve as a suitable framework for such a data privacy engineering process.
      • Privatized Synthetic Data Generation: Data swapping, Distance Transforms, Discrete Cosine Transforms, and Discrete Wavelet Transforms, in combination with data privacy procedures, allow for the generation of privatized synthetic data sets. However, more research is needed on optimal parameterization and on other signal processing techniques.
      • Distance Transforms and Filtering: Empirical results from this study show that a hybrid of Distance Transforms (DT) and data privacy, in combination with filtering, maintains the skeletal structure of the original data and generates privatized synthetic data with better classification accuracy, and thus better utility. However, more study is needed on securing DT-based privatized data to prevent attackers from reconstructing private data.
      • Differential Privacy and Filtering: Differential Privacy (DP) offers strong privacy guarantees but at the loss of data utility. However, empirical results from this study show that Gaussian filtering reduces outlier noise in DP-based data and improves classification accuracy.
      • K-anonymity: Information loss could be minimized using frequency count analysis for privatized data models requiring k-anonymity for confidentiality. Only remove sensitive attributes and use synthetic values for suppressed values.
      • Privacy versus Utility: Achieving optimal utility while granting privacy is still an open goal; accurate classification could also mean loss of privacy, so trade-offs must be made between privacy and utility.
  • 60. FUTURE WORK
    Future work includes:
    • Advance the state of the art in Data Privacy Engineering by developing data privacy compliant software, data privacy modeling, and autonomous intelligent data privacy agent systems following the SIED framework.
    • Apply data privacy and utility principles to digital forensics data, network traffic data, bioinformatics data, and big data.
    • Study efficient generation of privatized synthetic data sets.
    • Apply data privacy principles to real-time data, including realistic scenarios in which users of the data provide feedback on how useful the data was to them.
    • Show, analytically, differences in performance between the various methods introduced in this work, as well as other state-of-the-art methods.
  • 61. PUBLICATIONS
    1. Kato Mivule, “Towards Agent-based Data Privacy Engineering”, Proceedings of the Sixth International Conference on Advanced Cognitive Technologies and Applications – COGNITIVE 2014, May 25 – May 30, 2014 (In Print), Venice, Italy.
    2. Kato Mivule and Claude Turner, “SIED, A Data Privacy Engineering Framework”, Abstracts, Emerging Researchers National Conference in STEM (ERN 2014), Page A239, ISBN 978-0-87168-757-9, Feb 20-22, 2014, Washington DC, USA. [Best Oral Presentation Award]
    3. Kato Mivule and Claude Turner, International Journal of Computer Science and Mobile Computing, ICMIC13, December 2013, Pages 36-43, Dec 17-18, 2013, Trivandrum, Kerala, India.
    4. Kato Mivule and Claude Turner, “A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge”, Procedia Computer Science, Volume 20, 2013, Pages 414-419, ISSN 1877-0509, Nov 13-15, Baltimore, MD, USA.
    5. Kato Mivule and Claude Turner, “An Investigation of Data Privacy and Utility Preservation Using KNN Classification as a Gauge”, International Conference on Information and Knowledge Engineering (IKE 2013), July 22-25, Pages 203-204, Las Vegas, NV, USA.
    6. Kato Mivule, Darsana Josyula, and Claude Turner, “Data Privacy Preservation in Multi-Agent Learning Systems”, Proceedings of the Fifth International Conference on Advanced Cognitive Technologies and Applications – COGNITIVE 2013, May 27 - June 1, 2013, Pages 14-20, Valencia, Spain.
    7. Kato Mivule, Claude Turner, and Soo-Yeon Ji, “Towards A Differential Privacy and Utility Preserving Machine Learning Classifier”, Procedia Computer Science, 2012, Pages 176-181, Washington DC, USA.
    8. Kato Mivule, Stephen Otunba, Tattwamasi Tripathy, and Sharad Sharma, “Implementation of Data Privacy and Security in an Online Student Health Records System”, Proceedings of the ISCA 21st International Conference on Software Engineering and Data Engineering (SEDE-2012), Pages 143-148, Los Angeles, CA, USA.
    9. Kato Mivule and Claude Turner, “Applying Data Privacy Techniques on Published Data in Uganda”, Proceedings of the 2012 International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government (EEE 2012), Pages 110-115, Las Vegas, NV, USA.
    10. Kato Mivule, “Utilizing Noise Addition for Data Privacy, an Overview”, Proceedings of the International Conference on Information and Knowledge Engineering (IKE 2012), Pages 65-71, Las Vegas, NV, USA.
  • 62. THANK YOU! QUESTIONS? kmivule@gmail.com