The Base Rate Fallacy (Source Boston 2013)
Proprietary. Copyright © 2012, 2013 by Jeff Lowder (www.jefflowder.com) and Risk Centric Security, Inc. (www.riskcentricsecurity.com). All rights reserved.
Patrick Florer
• Cofounder and CTO of Risk Centric Security.
• Fellow at the Ponemon Institute.
• 33 years of IT experience, including roles in IT operations,
development, and systems analysis.
• In addition, he worked a parallel track in medical outcomes research,
analysis, and the creation of evidence-based guidelines for medical
treatment.
• Since 1986, he has worked as an independent consultant, helping
customers with strategic development, analytics, risk analysis, and
decision analysis.
Jeff Lowder
• President of the Society for Information Risk Analysts
(www.societyinforisk.org)
• Director, Global Information Security and Privacy,
OpenMarket
• Industry thought leader who leads world-class security organizations
by building and implementing custom methodologies and frameworks
that balance information protection and business agility.
• 16 years of experience in information security and risk management
• Previous leadership roles include:
– Director, Information Security, The Walt Disney Company
– Manager, Network Security, NetZero/United Online
– Director, Security and Privacy, Elemica
– Director, Network Security, United States Air Force Academy
Quick Check #1: How Good Are You at Estimating Risk?
A witness reports: “I saw an orange cab.” The witness gets the color
right 80% of the time. What color was the cab?
Quick Check #1: What is the probability the cab was orange?
(a) 80%
(b) Somewhere between 50 and 80%
(c) 50%
(d) Less than 50% (the cab was green)
(e) There is not enough information to answer the question.
Quick Check #2: How Good Are You at Estimating Risk?
85% of the taxis on the road are green cabs; 15% are orange cabs.
The witness gets the color right 80% of the time and reports: “I saw
an orange cab.” What color was the cab?
Quick Check #2: What is the probability the cab was orange?
(a) 80%
(b) Somewhere between 50 and 80%
(c) 50%
(d) Less than 50% (the cab was green)
(e) There is not enough information to answer the question.
Quick Check #3
• 850 out of every 1,000 taxis are green cabs.
• 150 out of every 1,000 taxis are orange cabs.
• Of these 850 green cabs, the witness will see 170 of them as orange.
• Of the 150 orange cabs, the witness will see 120 of them as orange.
• The witness says she saw an orange cab.
Quick Check #3: What is the probability the cab was orange?
(a) 80%
(b) Somewhere between 50 and 80%
(c) 50%
(d) Less than 50% (the cab was green)
(e) There is not enough information to answer the question.
Correct Answers
Quick Check #1:
Unknown. Without some idea of the base rate of orange cabs, it is impossible
to estimate the probability that the witness is correct.
Quick Check #2:
(d) Less than 50% (the cab was green)
Quick Check #3:
(d) Less than 50% (the cab was green)
The exact probability is 41%.
Base Rates and the Base Rate Fallacy
• Base Rate: “The base rate of an attribute (or event)
in a population is the proportion of individuals
manifesting that attribute (at a certain point in time).
A synonym for base rate is prevalence.”
Source: Gerd Gigerenzer, Calculated Risks: How to Know When Numbers
Deceive You
• Base Rate Fallacy: A fallacy of statistical reasoning
that occurs when both base rate information and
specific data about an individual are available.
The fallacy is to draw a conclusion about the
probability (or frequency) by ignoring the base rate
information and considering only the specific data.
How Did Your Peers Do (with the Taxi Cab Question)?
Audience responses: (a) 35%, (b) 45%, (c) 15%, (d) 5%.
Jeff Lowder’s Research: How often do GRC professionals commit the
base-rate fallacy? When presented with conditional probabilities
(such as Quick Check #2), 95% of GRC professionals commit the
base-rate fallacy.
Why the Base Rate Matters to InfoSec
Example / Output / Relevant Base Rate:
• Common Vulnerability Scoring System: vuln scores (0-10) used to prioritize remediation; base rate of vuln exploitation
• Vulnerability Scanners: scan results; base rate of vulnerabilities
• Intrusion Detection Systems: intrusion alerts; base rate of intrusion events
• Anti-Virus / Malware / Spyware: virus/malware/spyware alerts; base rate of virus/malware/spyware events
• Risk Analysis: residual risk; base rate of hazard (part of inherent risk)
• Third-Party Assurance: third-party audit report; base rate of compliant vendors, base rate of accurate audit reports
Why the Base Rate Matters to InfoSec
Example / Questions to Ask:
• Common Vulnerability Scoring System: What is the base rate of vuln exploitation?
• Vulnerability Scanners: What is the base rate of each type of vulnerability?
• Intrusion Detection Systems: What is the base rate of intrusion events?
• Anti-Virus / Malware / Spyware: What is the base rate of virus/malware/spyware events?
• Risk Analysis: What is the base rate of the hazard (part of inherent risk)?
• Third-Party Assurance: What is the base rate of compliant vendors? What is the base rate of accurate audit reports?
Use Natural Frequencies to Avoid the Base Rate Fallacy
Of 1,000 cabs, 850 are green and 150 are orange.
• Of the 850 green cabs, the witness sees 680 as green and 170 as orange.
• Of the 150 orange cabs, the witness sees 120 as orange and 30 as green.
Pr(orange | witness says it was orange) = 120 / (170 + 120) = 41%
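The natural-frequency tree reduces to a few lines of arithmetic; a minimal sketch in Python (variable names are ours):

```python
# Natural-frequency calculation for the taxi cab problem.
total = 1000
green = 850          # base rate: 85% of cabs are green
orange = 150         # base rate: 15% of cabs are orange
hit_rate = 0.80      # the witness gets the color right 80% of the time

seen_orange_true = orange * hit_rate        # 120 orange cabs seen as orange
seen_orange_false = green * (1 - hit_rate)  # 170 green cabs seen as orange

# Of all "I saw orange" reports, what fraction were really orange?
pr_orange = seen_orange_true / (seen_orange_true + seen_orange_false)
print(round(pr_orange, 2))  # 0.41
```

The same two counts (120 and 170) drive every later result in the cab example.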
Use Natural Frequencies to Avoid the Base Rate Fallacy
Using the same tree of 1,000 cabs:
Pr(green | witness says it was orange) = 170 / (170 + 120) = 59%
Natural Frequencies Work!
Gerd Gigerenzer
Max Planck Institute (Berlin),
former Professor of Psychology (University of Chicago)
“People tend to get the right answers
using natural frequencies much more
often than using conditional
probabilities: 80% vs. 33%.”
True Positives (aka “hits”)
Assumption: the cab was green (green = positive).
Let TP = # of True Positives
TP = # of times the cab was seen as green & (cab was green)
TP = 680
False Positives (aka “false alarms” or “Type I errors”)
Assumption: the cab was green (green = positive).
Let FP = # of False Positives
FP = # of times the cab was seen as green & (cab was orange)
FP = 30
True Negatives
Assumption: the cab was green (green = positive).
Let TN = # of True Negatives
TN = # of times the cab was seen as orange & (cab was orange)
TN = 120
False Negatives (aka “misses” or “Type II errors”)
Assumption: the cab was green (green = positive).
Let FN = # of False Negatives
FN = # of times the cab was seen as orange & (cab was green)
FN = 170
Summary
Assumption: the cab was green (green = positive).
                     Green Cab                  Orange Cab
Seen as Green Cab    680 True Positives (TP)    30 False Positives (FP)
Seen as Orange Cab   170 False Negatives (FN)   120 True Negatives (TN)
What are fourfold tables?
A fourfold table, also called a 2 x 2 table, is a
four-cell table (2 rows x 2 columns) based
upon two sets of dichotomous or “yes/no”
facts.
What are fourfold tables?
                                  Exists/True          Does not exist/False
Identified/detected as positive   True Positive (TP)   False Positive (FP)
                                  (+ +)                (- +)
Identified/detected as negative   False Negative (FN)  True Negative (TN)
                                  (+ -)                (- -)
What are fourfold tables?
The two sets of facts could be:
1. Something that exists or is true (or doesn’t exist / isn’t true), and
2. Something else, related to #1, that attempts to identify or detect
whether #1 exists / is true or does not exist / is not true.
What are fourfold tables?
The “something” that exists or doesn’t exist could be a disease,
a virus or worm, malware, exploit code, or a malicious packet:
i.e., you have a disease or you don’t; a piece of code is
malicious or it isn’t, etc.
The “something else” that identifies/detects could be a
medical diagnostic test, anti-virus/anti-malware software,
IDS/IPS systems, etc. The diagnostic test or software either
correctly identifies the disease, virus, or malware, or it doesn’t.
What are fourfold tables?
True Positive: “something” does exist/is true and is correctly identified as
existing/true.
False Positive: “something” does not exist/is not true, but is incorrectly identified
as existing/true.
True Negative: “something” does not exist/is false, and is correctly identified as
not existing/false.
False Negative: “something” does exist/is true, but is incorrectly identified as not
existing/not true.
A question:
Let’s assume a technology that detects malware:
95% accuracy in detecting malware
10% false positive rate
What is the probability that a file identified as
malware really is malware (Positive Predictive
Value or PPV)?
A question:
If this is all we know, can we even answer the
question?
No, we need to know something else:
We need to know the base rate of malware in
our target population (the population
prevalence), i.e., what is our estimate of the
proportion of malware on our network?
How do fourfold tables work?
Assuming that 3% of all files on our network are
malware (base rate / prevalence)
And accepting that the detection technology has:
95% accuracy in detecting malware
10% false positive rate
We are now ready to build a fourfold table.
How do fourfold tables work?
Out of 1 million files:
3% are malware = 30,000
97% are not malware = 970,000
How do fourfold tables work?
Out of 30,000 malicious files:
95% are correctly identified as malware =
28,500 (True Positive)
5% are incorrectly identified as not
malware = 1,500 (False Negative)
How do fourfold tables work?
Out of 970,000 non-malicious files:
10% are incorrectly identified as malware
= 97,000 (False Positive)
90% are correctly identified as not
malware = 873,000 (True Negative)
How do fourfold tables work?
                  Malware (True)               Not Malware (False)
Positive result   True Positive (TP): 28,500   False Positive (FP): 97,000
Negative result   False Negative (FN): 1,500   True Negative (TN): 873,000
Questions we can answer with fourfold tables:
What is the probability that a file identified as
malware is really malware (PPV)?
PPV = TP / (TP + FP)
= 28,500 / (28,500 + 97,000)
= 22.7%
What happened to 95%?
Questions we can answer with fourfold tables:
What is the probability that a file identified as
not-malware is really not malware (Negative
Predictive Value or NPV)?
NPV = TN / (TN + FN)
= 873,000 / (873,000 + 1,500)
= 99.8%
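Both predictive values can be checked by rebuilding the fourfold table from the stated rates; a minimal sketch (variable names are ours):

```python
# Build the malware fourfold table from a base rate and the detector's rates.
files = 1_000_000
base_rate = 0.03   # prevalence of malware on the network
tp_rate = 0.95     # detection accuracy (sensitivity)
fp_rate = 0.10     # false positive rate

malware = files * base_rate          # 30,000 malicious files
clean = files - malware              # 970,000 non-malicious files

tp = malware * tp_rate               # 28,500 correctly flagged
fn = malware - tp                    # 1,500 missed
fp = clean * fp_rate                 # 97,000 false alarms
tn = clean - fp                      # 873,000 correctly ignored

ppv = tp / (tp + fp)                 # probability a flagged file is malware
npv = tn / (tn + fn)                 # probability an unflagged file is clean
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 22.7%, NPV = 99.8%
```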
Questions we can answer with fourfold tables:
Some things to remember:
The numbers in the four cells must add up
to 100% of the total number being analyzed
(1M in this example)
As the base rate approaches 100%, the
base rate fallacy ceases to apply
Bayes Theorem:
P(A|B) = P(B|A) * P(A) / P(B)
Where:
A = Malware
B = Detection
P(A) = probability that malware is present (prevalence)
P(B|A) = probability of correct detection (TP rate)
P(B) = probability of a positive detection ((TP+FP)/All)
Bayes Theorem:
Using the numbers from the example:
P(A) = 3%
P(B|A) = 95%
P(B) = (TP rate * P(A)) + ((1 - P(A)) * FP rate)
= (0.95 * 0.03) + ((1 - 0.03) * 0.10)
= 0.0285 + 0.097
= 0.1255
Bayes Theorem:
We can solve the equation as follows:
P(A|B) = (0.95 * 0.03) / 0.1255
= 0.227
= 22.7%
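The same posterior falls out of a small helper function for Bayes' theorem; a minimal sketch (the function name is ours):

```python
def posterior(prevalence, tp_rate, fp_rate):
    """P(condition | positive result) via Bayes' theorem."""
    # P(B): overall probability of a positive result.
    p_positive = tp_rate * prevalence + fp_rate * (1 - prevalence)
    return tp_rate * prevalence / p_positive

# Malware example: 3% base rate, 95% TP rate, 10% FP rate.
print(f"{posterior(0.03, 0.95, 0.10):.1%}")  # 22.7%
```

The function reproduces the fourfold-table PPV, since the two methods are algebraically equivalent.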
Speaking of prevalence and incidence -
What’s the difference?
Prevalence: cross-sectional, how much is out
there right now?
Incidence: longitudinal, a proportion of new
cases found during a time period
Both prevalence and incidence can be
expressed as rates.
Homework #1 – Solve with a fourfold table
The government estimates that 1,000 terrorists live in the US, in a
population of 300 million people. It wishes to install a nationwide
surveillance system with the following estimated characteristics:
99% probability of correctly identifying a terrorist (TP rate)
0.1% (0.001) probability of incorrectly identifying a non-terrorist (FP
rate).
What is the positive predictive value of this system?
Credit: Schneier on Security, July 10, 2006
Homework #2 – Solve with a fourfold table or Bayes Theorem
You are thinking about licensing a vulnerability scanning technology.
From experience and from published reports, you believe that the base
rate of vulnerable systems in your environment could be as high as
15%
The vendor claims 95% accuracy (TP rate), with a false positive rate of
10%.
What is the positive predictive value of this technology?
Monte Carlo Simulation
How can the use of Monte Carlo simulation
improve the use of fourfold tables?
Examples in Excel
Contact Info
Patrick Florer
CTO and Cofounder
Risk Centric Security, Inc.
• Web:
www.riskcentricsecurity.com
• Email:
patrick@riskcentricsecurity.com
Jeff Lowder
President, Society for Information
Risk Analysts (SIRA)
• Web: www.jefflowder.com
• Blog: bloginfosec.com and
www.societyinforisk.org
• Twitter: @agilesecurity
Accuracy (aka Efficiency)
Assumption: the cab was green (green = positive). TP = 680, FP = 30, FN = 170, TN = 120.
What is the percentage of correct observations made by the witness?
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Accuracy = (# of cabs correctly seen as green + # of cabs correctly seen as orange) / # of cabs seen
Accuracy = (680 + 120) / (680 + 120 + 30 + 170) = 80%
True Positive Rate (aka “Sensitivity”)
Assumption: the cab was green (green = positive).
True Positive Rate (Sensitivity) = TP / (TP + FN)
True Positive Rate (Sensitivity) = # of cabs correctly seen as green / # of green cabs
True Positive Rate (Sensitivity) = 680 / (680 + 170) = 80%
False Positive Rate (aka “False Alarm Rate”)
Assumption: the cab was green (green = positive).
False Positive Rate (α) = FP / (FP + TN)
False Positive Rate (α) = # of cabs incorrectly seen as green / # of orange cabs
False Positive Rate (α) = 30 / (30 + 120) = 20%
True Negative Rate (aka “Specificity”)
Assumption: the cab was green (green = positive).
True Negative Rate (Specificity) = TN / (TN + FP)
True Negative Rate (Specificity) = # of cabs correctly seen as orange / # of orange cabs
True Negative Rate (Specificity) = 120 / (120 + 30) = 80%
False Negative Rate
Assumption: the cab was green (green = positive).
False Negative Rate (β) = FN / (FN + TP)
False Negative Rate (β) = # of cabs incorrectly seen as orange / # of green cabs
False Negative Rate (β) = 170 / (170 + 680) = 20%
Summary
                     Green Cab                   Orange Cab
Seen as Green Cab    80% TP Rate (Sensitivity)   20% FP Rate (α)
Seen as Orange Cab   20% FN Rate (β)             80% TN Rate (Specificity)
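All four rates can be reproduced from the raw cab counts; a minimal sketch (variable names are ours):

```python
# Counts from the taxi cab example (green = positive).
tp, fp, fn, tn = 680, 30, 170, 120

sensitivity = tp / (tp + fn)   # true positive rate
fpr         = fp / (fp + tn)   # false positive rate (alpha)
specificity = tn / (tn + fp)   # true negative rate
fnr         = fn / (fn + tp)   # false negative rate (beta)

print(sensitivity, fpr, specificity, fnr)  # 0.8 0.2 0.8 0.2
```

Note that every rate is 80% or 20% because the witness's accuracy is 80% for both colors; the base rate only enters through the predictive values.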
Why the Base Rate Matters to InfoSec
Example / Questions to Ask:
• Common Vulnerability Scoring System: What is the base rate of vuln exploitation?
• Vulnerability Scanners: What is the base rate of each type of vulnerability?
• Intrusion Detection Systems: What is the maximum acceptable false positive rate for my organization’s IDS?
• Anti-Virus / Malware / Spyware: What is the base rate of virus/malware/spyware events?
• Risk Analysis: What is the base rate of the hazard (part of inherent risk)?
• Third-Party Assurance: What is the base rate of compliant vendors? What is the base rate of accurate audit reports?
True Positive Rate (aka “Sensitivity”) for
IBM AppScan and SQL Injection Vulns
Of 146 test cases, 136 are vulnerable and 10 are not.
• Of the 136 vulnerable test cases, IBM AppScan detects 136 as vulnerable and 0 as non-vulnerable.
• Of the 10 non-vulnerable test cases, IBM AppScan detects 7 as not vulnerable and 3 as vulnerable.
Source: “The SQL Injection Detection Accuracy of Web Application Scanners,” SecToolMarket.com
True Positive Rate (Sensitivity) = TP / (TP + FN)
True Positive Rate (Sensitivity) = # of test cases correctly identified as vulnerable / # of vulnerable test cases
True Positive Rate (Sensitivity) = 136 / (136 + 0) = 100%
True Negative Rate (aka “Specificity”) for
IBM AppScan and SQL Injection Vulns
Using the same 146 test cases:
True Negative Rate (Specificity) = TN / (TN + FP)
True Negative Rate (Specificity) = # of test cases correctly identified as non-vulnerable / # of non-vulnerable test cases
True Negative Rate (Specificity) = 7 / (7 + 3) = 70%
Positive Prediction Value (PPV)
Assumption: the cab was green (green = positive).
If the witness says the cab was green, what’s the probability the witness is right?
PPV = TP / (TP + FP)
PPV = # of cabs correctly seen as green / # of cabs seen as green (correctly + incorrectly)
PPV = 680 / (680 + 30) = 95.77%
Negative Prediction Value (NPV)
Assumption: the cab was green (green = positive).
If the witness says the cab was orange, what’s the probability the witness is right?
NPV = TN / (TN + FN)
NPV = # of cabs correctly seen as orange / # of cabs seen as orange (correctly + incorrectly)
NPV = 120 / (120 + 170) = 41.38%
Example: PPV for IBM AppScan and SQL Injection Vulns
Using the same 146 test cases:
If IBM AppScan says a SQL injection vulnerability is present, what’s the probability
that IBM AppScan is right?
PPV = TP / (TP + FP)
PPV = # of test cases correctly identified as vulnerable / # of test cases identified as vulnerable (correctly + incorrectly)
PPV = 136 / (136 + 3) = 97.8%
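The AppScan figures make a compact worked example; a minimal sketch combining the sensitivity, specificity, and PPV computed above (variable names are ours):

```python
# IBM AppScan SQL injection test results (SecToolMarket.com data).
tp, fn = 136, 0   # vulnerable test cases: detected / missed
tn, fp = 7, 3     # non-vulnerable test cases: passed / falsely flagged

sensitivity = tp / (tp + fn)   # 1.0, i.e. 100%
specificity = tn / (tn + fp)   # 0.7, i.e. 70%
ppv = tp / (tp + fp)           # 136 / 139
print(f"{ppv:.1%}")  # 97.8%
```

The high PPV here reflects the very high base rate of vulnerable cases in the test set (136 of 146), not just the scanner's quality.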
Homework #1 – Solve with a fourfold table - Answer
Homework #1 - Answer
Out of 300 million people, there are 1,000
terrorists
Base rate = 1,000 / 300,000,000
= 0.00000333
= 0.000333%
Homework #1 - Answer
Out of 1,000 terrorists, surveillance will
correctly identify:
1,000 * 99% = 990 (True Positive)
And mis-identify 10 (False Negative)
Homework #1 - Answer
There are 300,000,000 – 1,000 =
299,999,000 non-terrorists
Of these, surveillance will correctly identify
= 299,999,000 * 99.9%
= 299,699,001 (True Negative)
And mis-identify 299,999 (False Positive)
Homework #1 - Answer
                  Terrorist (True)           Non-Terrorist (False)
Positive result   True Positive (TP): 990    False Positive (FP): 299,999
Negative result   False Negative (FN): 10    True Negative (TN): 299,699,001
Homework #1 - Answer
What is the probability that a person identified
as a terrorist is really a terrorist (PPV)?
PPV = TP / (TP + FP)
= 990 / (990 + 299,999)
= 0.33%
Or = ~ 1 out of every 304
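The surveillance answer can be double-checked with the same fourfold arithmetic; a minimal sketch (variable names are ours):

```python
# Homework #1: nationwide terrorist surveillance (Schneier's example).
population = 300_000_000
terrorists = 1_000
tp_rate, fp_rate = 0.99, 0.001

tp = terrorists * tp_rate                 # 990 terrorists correctly flagged
fp = (population - terrorists) * fp_rate  # 299,999 innocents flagged

ppv = tp / (tp + fp)
print(f"{ppv:.2%}")  # 0.33%
```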
Homework #2 – Solve with a fourfold table or Bayes Theorem
Homework #2: Bayes Theorem - Answer
P(A|B) = P(B|A) * P(A) / P(B)
Where:
A = Vulnerability
B = Detection
P(A) = probability that vuln is present (prevalence)
P(B|A) = probability of correct detection (TP rate)
P(B) = probability of a positive detection ((TP+FP)/All)
Homework #2: Bayes Theorem - Answer
Using the numbers from the example:
P(A) = 15%
P(B|A) = 95%
P(B) = (TP rate * P(A)) + ((1 - P(A)) * FP rate)
= (0.95 * 0.15) + ((1 - 0.15) * 0.10)
= 0.1425 + 0.085
= 0.2275
= 22.75%
Homework #2: Bayes Theorem - Answer
We can solve the equation as follows:
P(A|B) = (0.95 * 0.15) / 0.2275
= 0.626
Or
PPV = 62.6%
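As a cross-check on the homework answer, a minimal sketch of the same Bayes calculation with the stated rates (variable names are ours):

```python
# Homework #2: vulnerability scanner with 15% base rate,
# 95% TP rate, 10% FP rate.
p_vuln, tp_rate, fp_rate = 0.15, 0.95, 0.10

p_positive = tp_rate * p_vuln + fp_rate * (1 - p_vuln)  # 0.2275
ppv = tp_rate * p_vuln / p_positive
print(f"{ppv:.1%}")  # 62.6%
```

Even with a relatively high 15% base rate, roughly a third of the scanner's positive findings would still be false alarms.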