The document discusses using machine learning to predict which vulnerabilities are most likely to be exploited in the future. It describes collecting vulnerability data from CVE databases together with observations of exploits and breaches, then using that data to build a supervised classification model that predicts the likelihood of future exploitation. The model aims to help prioritize remediation of the vulnerabilities that pose the greatest risk. Key challenges addressed include the increasing volume of vulnerabilities and the shortening window between disclosure and exploitation.
3. Complete Remediation is Infeasible
Complexity abounds:
• Multiple patch releases from major vendors, including microcode updates
• Incompatible antivirus or endpoint protection
Massive array of devices affected:
• Printers, thermostats, door locks, cameras, phones, etc.
• Intel’s Nehalem and Westmere (released in 2008 and 2010) affected
Not just patches:
• Code “should be recompiled with the /Qspectre switch enabled”
4. The Modern Stack is COMPLEX
(Diagram: the layers of a modern stack, each shouting “Patch Me!”; idea credit: @samnewman)
• Your App
• 3rd-party libs
• Language runtimes: Java / .NET / Node / Python / Ruby / PHP
• App Server / Web Server / etc.
• Operating System (Container)
• Docker
• Management Agent
• Operating System
• Hypervisor
• Intel / ARM / AMD CPU
9. “Remember the Recall”
Infosec is largely a search problem:
1. We are data rich and signal poor.
2. Multi-stage testing cost-effectively increases both precision and recall.
3. Analyst time is the capacity constraint for most security problems.
We must aim to create signal for our analysts.
13. What Matters for Scoring
• Is anyone actively targeted?
• Could we detect success?
• How much effort is required?
• What is the attacker payoff?
• Does a valid attack path exist?

score = $CVSS_SCORE
score += A if recent_breaches_exist?
score += B if exploits_exist?
score += C if popular_target?
score += D if exploit_will_exist?
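The additive scheme above can be made runnable. In this Python sketch the weights A–D and the signal predicates are hypothetical placeholders; the slides do not fix their values.

```python
# Hypothetical additive risk score on top of a base CVSS score.
# The weights A-D and the boolean signals are illustrative placeholders,
# not values from the talk.
A, B, C, D = 2.0, 1.5, 1.0, 0.5

def risk_score(cvss_score, recent_breaches_exist, exploits_exist,
               popular_target, exploit_will_exist):
    score = cvss_score
    if recent_breaches_exist:
        score += A        # recent breaches observed in the wild
    if exploits_exist:
        score += B        # a public exploit already exists
    if popular_target:
        score += C        # widely deployed / attractive target
    if exploit_will_exist:
        score += D        # model predicts a future exploit
    return score

# A CVSS 7.5 vuln with a public exploit and recent breach reports:
print(risk_score(7.5, True, True, False, False))  # 11.0
```

The additive form keeps the score explainable: each increment maps to one observable signal, which matters for the "Explainable" constraint later in the deck.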
16. Measuring Remediation Strategies
Coverage: Of the vulns we fixed, did we pick all (100%) of the correct ones?
Efficiency: Of the ones we ended up fixing, did we fix any that didn’t matter?
17. Coverage & Efficiency, Explained
(Diagram: a robot mowed parts of our lawn and our neighbors’; Coverage ≈ 80%, Efficiency ≈ 60%)
Coverage: How much of the grass we wanted to cut was actually cut? (Unmowed patches of our lawn are not covered.)
Efficiency: Of all the grass mowed, how much should have been cut? (Mowing the neighbors’ lawn is wasted effort, i.e. inefficiency.)
not covered
18. Coverage & Efficiency In Practice
(Venn diagram: CVEs with known exploits or events vs. CVEs with no known exploit or event)
Coverage: How many vulnerabilities did we prioritize of those that ended up with a known exploit or event?
Efficiency (green / (green + blue)) ≈ 9.28%: Of all the vulnerabilities we prioritized, how many ended up with a known exploit or event?
19. Coverage & Efficiency In Practice
(Venn diagram over all CVEs: the set of prioritized CVEs overlaps the set of CVEs with known exploits or events; green = prioritized with known exploits or events, blue = prioritized with no known exploits or events, red = all CVEs with known exploits or events)
Coverage (green / red): How many vulnerabilities did we prioritize of those that ended up with a known exploit or event?
Efficiency (green / (green + blue)): Of all the vulnerabilities we prioritized, how many ended up with a known exploit or event?
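The green/red/blue ratios reduce to simple set arithmetic. A minimal Python sketch, with made-up CVE identifiers:

```python
def coverage_and_efficiency(prioritized, exploited):
    """Coverage  = green / red:            share of exploited CVEs we prioritized.
    Efficiency  = green / (green + blue):  share of prioritized CVEs that were exploited."""
    prioritized, exploited = set(prioritized), set(exploited)
    green = prioritized & exploited  # prioritized AND exploited
    coverage = len(green) / len(exploited)
    efficiency = len(green) / len(prioritized)
    return coverage, efficiency

# Toy example: we prioritized 4 CVEs; 2 of them, plus one we missed, were exploited.
cov, eff = coverage_and_efficiency(
    ["CVE-A", "CVE-B", "CVE-C", "CVE-D"],
    ["CVE-A", "CVE-B", "CVE-E"],
)
print(cov, eff)  # coverage 2/3, efficiency 2/4
```
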
20. Coverage / Efficiency Tradeoff
● There is a natural tradeoff between coverage and efficiency.
● We are operating with incomplete information at any given moment.
● Why would you want <100% efficiency?
○ Abundance of caution (if you can afford it!)
● Why would you want <100% coverage?
○ A new campaign can spin up, or an older one can spin down. The world is not static.
Continuous review and adjustment provides the best result.
22. Current Attacker Velocity
Average days from publish to exploit (639 / 8%): 19.68 days
Average days from publish to event (36 / 0.5%): 27.36 days
Shortest window: Adobe Reader (zero days)
Longest window: IE Edge (months)
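The averages above are means of publish-to-exploit windows. A sketch with hypothetical dates (the 19.68-day figure comes from the speaker's real dataset, not from these numbers):

```python
from datetime import date

# Hypothetical (published, first_exploited) date pairs for three CVEs.
windows = [
    (date(2018, 1, 3), date(2018, 1, 3)),   # zero-day style: exploited on publish day
    (date(2018, 2, 1), date(2018, 2, 21)),  # 20-day window
    (date(2018, 3, 5), date(2018, 4, 15)),  # 41-day window
]

# Days elapsed between publication and first observed exploit.
days = [(exploited - published).days for published, exploited in windows]
print(sum(days) / len(days))  # mean window in days
```
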
24. Factoring in Velocity
(Timeline of increasing risk: Created → Discovery → Disclosure → Public Exploit Code Released → Exploitation Detected In the Wild → Detection Generated)
32. What IS Machine Learning?
• Methods for automatically learning and recognizing complex patterns from data
• A set of tools for understanding data by building models from data
• Measure success on coverage and efficiency
33. Types of Algorithms
(Decision flowchart)
• Do you have labeled data? Yes → Supervised; No → Unsupervised
• What do you want to predict? Category → Classification; Quantity → Regression
34. We are currently really good at:
• “Of my current 300 million vulnerabilities, which ones should I remediate first?”
• “Old ones with stable, weaponized exploits, known breaches, high risk meter scores”
36. Asking the right questions:
• Classification: output is qualitative
• Prediction: “Will this vulnerability have an exploit written for it?” (== cause more risk later)
38. Predictive - The Expectations
The distribution is not uniform: 77% of the dataset is not exploited.
1. An accuracy of 77% would be bad (always predicting “not exploited” achieves it).
Precision matters more than recall.
1. No one would use this model absent actual exploit-availability data.
2. False negatives matter less than false positives, which are wasted effort.
We are not modeling when something will be exploited, just IF.
1. It could be tomorrow or in 6 months. Re-run the model every day.
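The point about 77% accuracy can be shown directly: a constant “never exploited” classifier matches the class balance while catching nothing. The counts below are illustrative, chosen to match the 77/23 split:

```python
# Illustrative counts matching the 77% / 23% class split from the talk.
n_not_exploited, n_exploited = 770, 230

# A baseline that always predicts "not exploited":
# - it is right on every negative, wrong on every positive.
baseline_accuracy = n_not_exploited / (n_not_exploited + n_exploited)
# - it flags nothing, so it catches zero exploited vulns.
baseline_recall = 0 / n_exploited

print(baseline_accuracy, baseline_recall)  # 0.77 0.0
```

This is why the talk measures success on coverage and efficiency rather than raw accuracy.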
39. Measuring Performance of a Predictive Model
(Plot: precision vs. recall, each from 0 to 1; the ideal is 1.0 on both)
• High precision, low recall: returns relevant documents but misses many useful ones too
• High recall, low precision: returns most relevant documents but includes lots of junk
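The two failure modes in the plot fall directly out of the standard precision and recall formulas. A small sketch with illustrative confusion counts:

```python
def precision_recall(tp, fp, fn):
    """Precision: of what we returned, how much was relevant (tp / (tp + fp)).
    Recall:    of what was relevant, how much we returned (tp / (tp + fn))."""
    return tp / (tp + fp), tp / (tp + fn)

# High precision, low recall: few results, almost all relevant, many missed.
print(precision_recall(tp=10, fp=1, fn=90))
# High recall, low precision: most relevant items found, plus lots of junk.
print(precision_recall(tp=90, fp=200, fn=10))
```
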
45. The Work-Averse Attacker
“An attacker massively deploys only one exploit per software version. The only exception we find is for Internet Explorer; the exception is characterised by a very low cost to create an additional exploit, where it is sufficient to essentially copy and paste code from the old exploit, with only few modifications, to obtain the new one.”
- The Work-Averse Cyber Attacker Model: Theory and Evidence From Two Million Attack Signatures, by Luca Allodi, Fabio Massacci, Julian Williams
50. Constraints on the Future
Any new rating system must be:
● Simple (in every sense of the word)
● Explainable (cause and effect understandable)
● Defensible (science!)
● An improvement
And every data source is on the table...
53. Lesson: Probability is our friend
(Histogram of predicted exploitation probabilities; the spike at the low end is annotated “confusing”)
● While initially confusing, probability offers a very intuitive measure
● Most vulnerabilities (78%) are predicted to have < 1% probability of exploitation
● 2,400+ vulnerabilities are predicted > 10%
● How can we validate probabilistic estimates?
54. Lesson: Probability is our friend
(Calibration plot: predicted probability, “what we say”, vs. observed exploitation rate, “what we see”; the dashed line is “calibrated”; the annotated bin holds ~450 vulnerabilities)
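Calibration (the dashed line) can be checked by bucketing predictions and comparing the average stated probability against the observed exploit rate per bucket. A sketch on synthetic data:

```python
# Synthetic (predicted probability, actually exploited?) pairs.
preds = [(0.05, 0), (0.05, 0), (0.05, 1), (0.9, 1), (0.9, 1), (0.9, 0)]

def calibration(preds, edges=(0.0, 0.5, 1.0)):
    """For each probability bucket, return (mean stated prob, observed rate, count).
    A calibrated model has stated prob ~= observed rate in every bucket."""
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        bucket = [(p, y) for p, y in preds
                  if lo <= p < hi or (hi == edges[-1] and p == hi)]
        if bucket:
            said = sum(p for p, _ in bucket) / len(bucket)  # what we say
            seen = sum(y for _, y in bucket) / len(bucket)  # what we see
            rows.append((said, seen, len(bucket)))
    return rows

for said, seen, n in calibration(preds):
    print(f"said {said:.2f}, saw {seen:.2f} over {n} vulns")
```

Here the low bucket says 5% but sees 33%, so this synthetic model is underconfident at the low end; on the plot those points would sit above the dashed line.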
56. Takeaways
Volume, complexity, and speed of both vulnerabilities and threats are the modern vulnerability management challenges
Coverage and efficiency allow us to measure vuln management strategies
For all the new vulnerabilities you’ve seen this week… are they truly critical? Will they be attacked in the future?
Future threats should be addressed, but only after immediate / existing threats