Promise 2011: "An Iterative Semi-supervised Approach to Software Fault Prediction"
1. An Iterative Semi-supervised Approach to Software Fault Prediction. Huihua Lu, Bojan Cukic, Mark Culp. Lane Department of Computer Science and Electrical Engineering; Department of Statistics. West Virginia University, Morgantown, WV. September 2011.
Supervised learning assumes that the training data and the testing data follow the same distribution. That is, the training data should be representative of the data space.
1. Horizontal lines represent the performance of random forest. Supervised learning does not change with iteration… Mention that AUC does not change.
2. Then observe that improvements in PD are limited at low thresholds because few FP modules remain to be detected. Performance (PD) may slightly deteriorate at low thresholds (a few %).
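To make the threshold discussion concrete, here is a minimal sketch of how PD (probability of detection, i.e. recall on the faulty modules) can be computed at a given threshold. The function name `pd_at_threshold` and the sample scores are hypothetical and are not taken from the slides; the sketch assumes a module is flagged when its predicted score is at or above the threshold.

```python
def pd_at_threshold(scores, labels, threshold):
    """Return PD: the fraction of truly faulty modules that are flagged.

    scores    -- predicted fault-proneness per module (assumed to lie in [0, 1])
    labels    -- 1 if the module is actually faulty, 0 otherwise
    threshold -- modules scoring at or above this value are flagged
    """
    faulty_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not faulty_scores:
        raise ValueError("PD is undefined when there are no faulty modules")
    detected = sum(1 for s in faulty_scores if s >= threshold)
    return detected / len(faulty_scores)


# Hypothetical scores for five modules, three of which are faulty.
scores = [0.9, 0.6, 0.4, 0.2, 0.1]
labels = [1, 1, 1, 0, 0]

pd_high = pd_at_threshold(scores, labels, 0.5)  # 2 of 3 faulty modules flagged
pd_low = pd_at_threshold(scores, labels, 0.3)   # all 3 faulty modules flagged
```

With these made-up numbers, lowering the threshold already catches every faulty module, so nothing is left to gain there. That is the same kind of ceiling the note above describes for low thresholds.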
A semi-supervised approach with a small number of labeled modules (2% or 5%) may not lead to improvement.
The larger the labeled data set, the better the performance. Overall, the semi-supervised algorithm performs better than the corresponding supervised algorithm (RF).
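For readers unfamiliar with iterative semi-supervised learning, the sketch below shows a generic self-training loop: fit on the labeled modules, predict labels for the unlabeled ones, refit on both, and repeat until the predicted labels stop changing. This is an illustration under stated assumptions, not the authors' procedure. A toy nearest-centroid classifier on a single feature stands in for random forest, and all names and data are hypothetical.

```python
from statistics import mean


def fit_centroids(xs, ys):
    """Toy base learner (stand-in for random forest): per-class feature mean."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in (0, 1)}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def self_train(labeled_x, labeled_y, unlabeled_x, iterations=10):
    """Generic self-training: refit on labeled data plus current pseudo-labels."""
    pseudo = None  # no pseudo-labels before the first fit
    for _ in range(iterations):
        xs = list(labeled_x) + (list(unlabeled_x) if pseudo else [])
        ys = list(labeled_y) + (pseudo or [])
        centroids = fit_centroids(xs, ys)
        new_pseudo = [predict(centroids, x) for x in unlabeled_x]
        if new_pseudo == pseudo:  # predictions are stable, so stop iterating
            break
        pseudo = new_pseudo
    return centroids, pseudo


# Hypothetical data: one metric per module; label 1 means faulty.
labeled_x, labeled_y = [1.0, 9.0], [0, 1]
unlabeled_x = [2.0, 3.0, 7.0, 8.0]

centroids, pseudo_labels = self_train(labeled_x, labeled_y, unlabeled_x)
```

With only two labeled modules, the initial centroids sit at the labeled points themselves. The pseudo-labeled modules then pull each centroid toward the bulk of its class, which is the mechanism by which unlabeled data can help when the labeled sample is small.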