Slides from Tony Martin-Vegue presentation at FAIRcon, Charlotte, NC: October 14, 2016
"Measuring DDoS Risk with FAIR (Factor Analysis of Information Risk)"
2. About Me
Tony Martin-Vegue
• CISSP, CISM, GCIH
• BS, Business Economics, University of San Francisco
• 20 years in IT
• FAIR practitioner for about 5 years now
• Reside in the Bay Area
4. Agenda & Objectives
• Objective: Give you a hands-on look at how we can measure DDoS risk
for a typical US bank
• Distributed Denial of Service – what is it???
• Our Bank
• FAIR Analysis
◦ Scope Scenario
◦ Evaluate Loss Event Frequency
◦ Evaluate Loss Magnitude
◦ Derive and Articulate Risk
◦ Resources
8. Threats: Methods and Objectives
Hacktivists
• Highly motivated, disruptive, possibly destructive, supporter of a cause.
• Wide range of capabilities
Foreign Governments
• Attacks opposition and government websites
• Very high capabilities
Cyber Criminals
• Cyber extortion or use DDoS to mask other criminal activity
• Moderate capabilities
Cyber Vandals
• Interested in networking/computing disruption, web hijacking
• Derives thrills from destruction. No strong agenda
• Wide range of capabilities
9. Intel’s Threat Agent Library
Source: Prioritizing Information Security Risks With Threat Agent Risk Assessment; Intel; https://communities.intel.com/community/itpeernetwork/blog/2010/01/05/whitepaper-prioritizing-information-security-risks-with-threat-agent-risk-assessment
11. Anatomy of a FAIR Risk Assessment
[FAIR taxonomy diagram]
Risk
• Loss Event Frequency
◦ Threat Event Frequency: Contact Frequency, Probability of Action
◦ Vulnerability: Threat Capability, Resistive Strength
• Loss Magnitude
◦ Primary Loss
◦ Secondary Loss
Source: Measuring and Managing Information Risk; Jack Jones and Jack Freund
12. Stage 1: Identify asset(s) at risk
Examples:
• Customer PII
• Data on the website
• Money
13. Stage 1: Identify asset(s) at risk
How do you do this?
• Talk to people in the IT Department
• Talk to the application owner, data owner, data custodian,
etc.
• Look at asset lists
• Reports from DLP, SIEM etc
• Talk to Business Continuity Managers
14. Stage 2: Evaluate Loss Event Frequency
(FAIR taxonomy diagram)
15. Stage 2: Evaluate Loss Event Frequency
(FAIR taxonomy diagram)
16. Why Ranges?
• Credibility / Trust
• Monte Carlo Simulations
• Calibrated Probability Estimates
• Express Uncertainty
18. Measuring TEF
1. Make a calibrated estimate
2. Build a list of attacks that occurred in the last 2 years
3. Update the calibrated estimate based on new information
4. Repeat steps 2-3
20. Where?
• VERIS
• Privacy Rights Clearinghouse
• Internal data
• 10-K
• Google news
• Research (e.g. academic papers, blogs)
• Vendor-produced studies (whitepapers, reports): be careful
21. Results
Bank | DDoS attacks (Oct 1, 2014 – Oct 1, 2016)
Chase | 2
Wells Fargo | 2
Bank of America | 1
Citibank | 0
US Bank | 1
PNC Bank | 0
Bank of NY | 0
Capital One | 1
TD Bank | 2
State Street | 0
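These counts translate directly into a calibrated TEF range. A minimal sketch (the dictionary simply restates the table above):

```python
# Observed DDoS attacks per bank, Oct 1, 2014 - Oct 1, 2016 (from the table)
attacks = {
    "Chase": 2, "Wells Fargo": 2, "Bank of America": 1, "Citibank": 0,
    "US Bank": 1, "PNC Bank": 0, "Bank of NY": 0, "Capital One": 1,
    "TD Bank": 2, "State Street": 0,
}

low = min(attacks.values())                 # 0 attacks in 2 years
high = max(attacks.values())                # 2 attacks in 2 years
avg = sum(attacks.values()) / len(attacks)  # 0.9, roughly 1 per 2 years

print(f"TEF range: {low}-{high} per 2 years, most likely ~{round(avg)}")
# -> TEF range: 0-2 per 2 years, most likely ~1
```

This is where the 0-2x-every-2-years range and the "roughly 1 attack every 2 years" most-likely value come from.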
22. Measuring TEF with Bayes
• Old technique
• Allows an analyst to compute frequencies with very few data points
23. Measuring TEF with Bayes
• Vase is filled with mostly yellow but some blue marbles
• We want to estimate the proportion of blue marbles without counting every marble in the vase
• Reach in and pull out 6 marbles; 1 is blue and 5 are yellow
• We can estimate that the proportion of blue marbles is between 5.3% and 52%
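The marble interval can be reproduced with the beta distribution. A stdlib-only Python sketch, using the identity between the beta CDF and the binomial tail (Excel's beta functions or scipy.stats.beta give the same numbers):

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for integer a, b, via the binomial tail identity:
    I_x(a, b) = P(Binomial(a+b-1, x) >= a)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def beta_ppf(p, a, b, tol=1e-9):
    """Invert the CDF by bisection (it is monotonic on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 1 blue "hit" and 5 yellow "misses": posterior is Beta(1+1, 5+1)
blue, yellow = 1, 5
lo = beta_ppf(0.05, blue + 1, yellow + 1)
hi = beta_ppf(0.95, blue + 1, yellow + 1)
print(f"90% interval for proportion of blue: {lo:.1%} to {hi:.1%}")
# matches the slide's 5.3% to 52%
```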
24. Apply to DDoS Attacks
1. Obtain a sampling of banks
2. Find DDoS data for the sample
3. Apply Bayes (betadist in Excel)
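The same steps in Python rather than Excel. Reading the Results table as 6 of the 10 sampled banks seeing at least one attack in the 2-year window (my interpretation of the counts), the posterior is Beta(6+1, 4+1); the helper is repeated here so the sketch is self-contained:

```python
from math import comb

def beta_cdf(x, a, b):
    # Regularized incomplete beta for integer a, b via the binomial tail
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x)**(n - j) for j in range(a, n + 1))

def beta_ppf(p, a, b, tol=1e-9):
    # Invert the monotonic CDF by bisection
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 6 of 10 sampled banks were hit at least once in 2 years, so the
# posterior on "chance a bank like ours is hit in 2 years" is Beta(7, 5)
hits, misses = 6, 4
lo = beta_ppf(0.05, hits + 1, misses + 1)
hi = beta_ppf(0.95, hits + 1, misses + 1)
print(f"90% interval: {lo:.0%} to {hi:.0%} chance of >=1 attack per 2 years")
```

The point is the shape of the workflow, not the exact numbers: a small sample still yields a defensible 90% interval.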
25. Stage 2: Evaluate Loss Event Frequency
(FAIR taxonomy diagram)
27. Stage 3: Evaluate Loss Magnitude
(FAIR taxonomy diagram)
28. Primary Loss
Loss Type | Range of Loss | How to get
Productivity | $0 - $300,000 | BCP managers; managers for each LoB that sells through the website. ProTip: someone in your org has the average value of a customer over the customer's lifetime.
Response | $1,000 - $300,000 | IT people, outside counsel, public relations, freebies. ProTip: get average salaries for all departments from HR.
Replacement | N/A | ProTip: how much did it cost originally?
Fines and Judgements | N/A | Legal and Compliance people; other companies' SEC filings; news reports. ProTip: mostly public; possible to extrapolate.
Competitive Advantage | N/A | Senior management can get this info; may have already been done in a BIA.
Reputation | N/A | Rolls up to competitive advantage.
30. So Far - Recap
• Our asset is the website
• The loss type is Availability
• The threat community is Cyber Criminals
• The TEF is 0-1x every year
• TC is 50-75%
• Resistance Strength is 50% - 90%
• Losses:
◦ Productivity - $0 - $300,000
◦ Response - $1,000 - $300,000
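These inputs are enough for a toy Monte Carlo run. This is only a sketch of the idea, not the RiskLens engine: every range from the recap is treated as uniform, and TEF of 0-1x/year is treated as the chance of one threat event per year; both simplifications are mine.

```python
import random

random.seed(2016)  # reproducible demo run

N = 10_000
annual_losses = []
for _ in range(N):
    loss = 0.0
    tef = random.uniform(0.0, 1.0)   # TEF: 0-1 threat events per year
    if random.random() < tef:        # did a threat event occur this year?
        tc = random.uniform(0.50, 0.75)   # threat capability (top 50-75%)
        rs = random.uniform(0.50, 0.90)   # resistive strength (50-90%)
        if tc > rs:                       # attack overwhelms the defenses
            loss += random.uniform(0, 300_000)       # productivity loss
            loss += random.uniform(1_000, 300_000)   # response cost
    annual_losses.append(loss)

annual_losses.sort()
print(f"Average annualized loss: ${sum(annual_losses) / N:,.0f}")
print(f"95th percentile year:    ${annual_losses[int(0.95 * N)]:,.0f}")
```

Swapping the uniforms for calibrated PERT-style distributions is what a dedicated tool does for you; the mechanics are the same.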
31. Stage 4: Derive and Articulate Risk
(FAIR taxonomy diagram)
33. Resources
• Threat Modeling
◦ Intel TARA - http://www.intel.com/Assets/en_US/PDF/whitepaper/wp_IT_Security_RiskAssessment.pdf
◦ Threat Modeling: Designing for Security: Book by Adam Shostack
• Quantitative Risk Assessment Methodology
◦ Measuring and Managing Information Risk: A FAIR Approach: Book by Jack Freund and Jack Jones
◦ How to Measure Anything: Book by Douglas Hubbard
◦ How to Measure Anything in Cybersecurity Risk: Book by Douglas Hubbard
Editor's Notes
Hi, good afternoon. I’m very excited to be here.
Thank you to the conference organizers and to our generous hosts.
Today I’m going to present a case study on measuring distributed denial of service risk with FAIR, Factor Analysis of Information Risk.
I’ll be using techniques that have been developed and refined over the years by Jack Jones and other FAIR advocates. I also draw from other sources, including techniques from Douglas Hubbard and many others, which I incorporate into my analyses.
Quick note about me
Been in technology for over 20 years, info sec for the last 10.
I’ve been a FAIR practitioner for about 5 years now – the first couple spent unlearning bad risk habits and absorbing as much as I could.
I work at a mortgage insurance company in the Bay Area as the firm’s cyber risk manager.
I’m lucky now to work at a company that appreciates FAIR and quantitative analysis, so I’m able to work on FAIR nearly full time.
I’d like to quickly tell you how I got into FAIR and quantitative analysis; it explains a lot about how I got here and my approach.
BoD, risk reporting
Annualized loss in $
Started looking for something better, something easier
This slide is a picture of “The Subscription Room,” part of Lloyd’s of London.
It’s a reminder that we are part of a long line of business risk managers. The tools and techniques are new; the basic foundation is not.
Risk analysis has been around since the mid-1600s, when Lloyd’s developed the original life table – the formation of actuarial science and, by extension, the basis for modern risk management.
We always think we need to re-invent the wheel. Why do we feel the need to declare problems unsolvable?
Same basic principles apply
Objective: give you a hands-on look at how we can measure DDoS for a typical US retail bank
Structure – We’re going to create a fictitious bank – they compete with the top 10 banks in the US (more exciting than my industry)
I’m going to walk through each step in FAIR, describe what the step is, the result, and exactly how I got to it. This is going to be very detailed and hands on – my goal is that you will be able to take some of these tools and techniques back to your office and start using them Monday
I’ll walk through everything I’m doing – one quick note: we’re not going to teach you the FAIR taxonomy or some of the basic concepts. We will use the taxonomy in depth, so if you find yourself lost, feel free to let me know after.
Let’s create our bank to do our analysis on
Typical US retail bank
Our competitors are JPMC, Wells Fargo, BofA, CitiGroup
Our bank offers a wide variety of services: consumer, business to business, mortgages, loans, debit cards, credit cards, brokerage, insurance
We’re a household name and often in the news. Any type of cyber attack that journalists got wind of would certainly end up in the news.
The CIO read this news article about renewed DDoS attacks that are breaking bandwidth records. The CIO wants to know: can this happen to us? Should we increase our DDoS protection services? How much will it cost me?
Luckily for us – we can do this
So, what’s a DDoS attack? This is the first step I almost always take; understanding the threat landscape. What threat event are we talking about, who perpetrates it, how does it happen and what has happened in the past.
In short, a DDoS attack is when an attacker uses multiple systems (usually compromised with malware, but sometimes not) to target a system with the intention of affecting its availability. It’s not a hack, it’s not a breach. Its real-world equivalent is standing in front of a store entrance with the intention of not letting customers in.
DDoS attacks are nothing new. They’ve been used for over 20 years now. They have been used to protest governments, petition policy changes in companies, influence elections – anything you can think of that would have people protesting with signs now has a DDoS counterpart.
Once we have a good idea of the attack, we can start to look at threat communities, what their methods are
Out of the possible threat communities that are out there, I’ve narrowed it down to 4 that are probable.
#1 - Hacktivists: Occupy Wall Street, Anonymous. Different sectors have different hacktivists. I worked in retail cyber security briefly and we were concerned about PETA. Take a look at who doesn’t like you and who protests you in meatspace, then look to see if those groups have any cyber capabilities. Their capabilities have a very wide range, from relatively low-power attacks to some of the most powerful attacks we’ve seen. It all depends on how many people they can get attacking your website.
#2 - Foreign governments: it happens. The Chinese government was linked to a DDoS attack on GitHub. Russia attacks Ukrainian businesses on a constant basis.
#3 - Cyber criminals. Probably the most common threat community as far as DDoS attacks against the financial services sector. A few years ago it used to be hacktivists, and now the pendulum has swung over to cyber criminal activity. We see two forms here. The first is cyber extortion: they hit your website, then you receive a ransom note – pay me X bitcoin or I’ll keep doing it. Based on my research, they stay away from large FIs; they can’t overwhelm DDoS defenses long enough to make the bank want to pay, so they hit smaller companies. The second is using DDoS as a smokescreen. For example, the criminal has taken over the account of one of your customers and is about to wire the funds out of the country. They can launch a DDoS attack to tie up security personnel while they transfer the funds out.
#4 - Last is cyber vandals. Not too sophisticated; they basically have a shiny new DDoS toy and they try it out on your site to see if it works. They don’t have a strong agenda, so they are easily distracted or dissuaded and will move on.
I’m a big proponent of developing a Threat Agent Library for your organization. This is essentially a list of threat agents (FAIR calls them threat communities)
You list out each threat agent: description, common tactics, methods, objectives, capabilities, and any past actions against your company or other companies in your sector. I have a threat agent library for the company I work at and I am constantly updating it with new info. It really helps me speed up risk assessments. Much of the data is re-usable and I use it constantly when trying to ascertain the TEF during a risk analysis.
What you see here is the Threat Agent Library from Intel. Matt Rosenquist and Tim Casey at Intel developed this internally and released it to the public. This is just a sampling, just the tip of the iceberg of what has been done in this area. I’ll have a link at the end of the presentation. It’s all compatible with FAIR – it drops nicely into the TEF stage.
One last thing before we dive in – this is a concept that I learned from reading How to Measure Anything, and it really helped me.
We can’t tell the future, we don’t have a crystal ball – don’t frame your results that way.
When we do a risk assessment, we aren’t looking into a crystal ball– what are we doing?
We are reducing uncertainty in order to enable a decision.
Even if you reduce the uncertainty by a teeny bit, you are still helping the decision maker make an informed decision.
Here’s the FAIR taxonomy.
we’ll apply each stage of the taxonomy to our risk analysis and I’ll give you all the exact steps of information I gathered to get the data.
First step is to identify assets at risk. DDoS attacks are typically pointed against any public facing service – the more high profile, the better.
So for assets, we’re going to identify website availability as our “asset.” These are examples of assets. Remember, DDoS isn’t a breach; it’s not going after PII – it simply takes a website down.
If you get stuck, think about the good old CIA triad.
If there was a confidentiality attack, what could be affected?
If there was an availability attack, what could be affected?
How do you do this?
Talk to people in the IT Department
Talk to the application owner, data owner, data custodian, etc.
Look at asset lists
Reports from DLP, SIEM etc
Talk to Business Continuity Managers
We have our asset. It’s the website, and the loss type is availability.
Stage 2 is evaluating the Loss Event Frequency.
FAIR defines this as “the frequency, within a given timeframe, that loss is expected to occur.”
It’s made up of two inputs
Threat Event Frequency
Vulnerability
Let’s start with TEF.
Let’s break this down in the taxonomy and talk about Threat Event Frequency and how we derive this number.
The FAIR definition of TEF is “the frequency, within a given timeframe, that threat agents are expected to act in a manner that could result in loss.”
This is almost always articulated in the form of a range. For example, a threat analysis of a database with PII would show that cyber criminals will act against it in a way that causes a loss event between 1 and 5 times a year.
FAIR is flexible and will let you use a number or a range, but it works best with a range.
Why are ranges better?
There are a few reasons, and it’s what separates FAIR from some of the other risk analysis methods in information security.
Why ranges – this is really important. It’s covered in the FAIR material and it’s a cornerstone of Doug Hubbard’s book, but I think it’s a really important concept, which is why I want to touch on it.
First, ranges help you build credibility and trust.
This points back to the story I told earlier about sitting in a room where I was saying “red” and everyone else gave numbers. It points to your credibility. One of the biggest challenges you’re going to have in your risk career is people arguing with you about quantitative analysis methods – where or how you get your data, or how you can possibly predict the future. Most of this is going to come from within information security and IT, not from outside. Most business-type folks, especially those in financial services, are accustomed to quantitative analysis techniques and will understand this right away.
Using a range will help your credibility and will build trust within and outside your team. Remember, we can’t tell the future, and if you lead people to believe you can, you’re going to lose credibility quickly.
Next is Monte Carlo simulations. This is a huge unlock for FAIR risk assessments. It’s not required, but using ranges with confidence intervals allows you to run your assessment through Monte Carlo sims. I apologize – it’s a complicated subject and I can’t dig deeper in an hour-long presentation, but I will demo it for you.
Next is calibrated probability estimates, which I’ll cover a little later on, but this is a key component of an accurate FAIR risk assessment. This enables you to take imperfect and incomplete information and still do a credible analysis.
Last is expressing uncertainty. This ties back to the previous three items. It’s OK to have uncertainty about something; it’s expected. But you have to articulate it, and the way to do that is with a range of numbers. If you say 1 DDoS attack will happen in 2017, you seem so certain about it – how could you know? On the other hand, “between one and five times a year” allows you to express your level of uncertainty.
Back to TEF
Threat event frequency is “the probable frequency, within a given timeframe, that a threat agent will act in a manner that could result in a loss.”
To help us measure the TEF, we have Contact Frequency and Probability of Action. CF is the probable frequency that an agent will come into contact with an asset, and PoA is the probability an agent will act once it comes into contact.
We usually don’t use these when assessing threats that are always hostile, like cyber criminals.
Very useful when ascertaining accidental human threat; example from the OpenFAIR review book is you have an extension cord powering a server that is lying across the floor in a public place. You can measure the TEF by getting the CF, how many people walk down the hallway and step over the cord every day. The PoA is the probability of someone tripping over the cord and powering off the server
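The extension-cord example can be put into numbers. Every value below is made up for illustration; only the TEF = CF × PoA decomposition comes from FAIR:

```python
# Contact Frequency: contacts with the asset per year
contact_frequency = 50 * 250        # assume ~50 people/day, ~250 work days
# Probability of Action: chance any one contact trips the cord (assumed)
probability_of_action = 1 / 10_000

# FAIR decomposition: TEF = CF x PoA
threat_event_frequency = contact_frequency * probability_of_action
print(f"Estimated TEF: {threat_event_frequency:.2f} trips per year")
# -> Estimated TEF: 1.25 trips per year
```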
We don’t usually use them here, so let’s focus on measuring TEF. The question for our bank: what is the probable frequency, within a given timeframe, that a threat agent, such as a cyber criminal, will launch a significant and noteworthy DDoS attack?
Step 1: Make a calibrated estimate
I always start with a calibrated estimate. A calibrated estimate is a technique in which someone expresses the probability of an event occurring in a way that represents their uncertainty.
The question is: what is the probable frequency, within a given timeframe, that a threat agent will act in a manner that could result in a loss?
How many of you have been through calibration training, whether it was with RiskLens, one of the AIE workshops, or something else?
Start with the absurd. The low range is 0, and for the upper range – what’s an absurd number of DDoS attacks? Based on my experience working for FIs, I’d say 20. If in your head you said 100, that’s OK – that’s not wrong, that’s you expressing your uncertainty about it, and step 3 will tighten that up.
Now, step 2:
Build a list of attacks that occurred in the last 2 years. I chose 2 because the threat landscape has changed in the last two years, away from hacktivism and toward extortion and fraud.
Step 3: Update the calibrated estimate based on new information.
Step 4: Repeat steps 2-3
I always use external data in conjunction with internal data, if it’s available. This is a big difference in how I perform assessments – Chad and Isaiah taught me this, and I took it from them. If you use internal data only, your sample size is too small. You’re a sample of one.
The first step in building a list of DDoS attacks against US banks is to get a list of US banks.
Good old Google, my old friend. I googled “list of top 10 US banks” and the first hit is a list of banks here.
Next we want to find the number of DDoS attacks against those banks in a two year period
Be careful with vendor studies. Many of them, not all but many, use questionable methods like non-scientific online surveys. I could go on a really long rant about vendor studies, and in fact I did a few years ago at BSides SF.
They can be useful but be wary of people trying to sell you FUD so you’ll buy their product
Use this to improve your calibrated probability estimate. If you remember, I previously said the range was 0 to 20; the upper range was slightly absurd, and my confidence in that number was pretty low.
Given this new data,
The low range would obviously be 0; the high range is 2. You can use this data to articulate your range: based on our analysis, the probability estimate of a significant DDoS attack is between 0 and 2 times every 2 years. We can average the attacks out to roughly 1 attack every 2 years, so we can use that as our most likely number.
Based on all my research here, nearly all of the DDoS attacks came from cyber criminals, so we’re going to focus on that
Let’s take this one step further
One step further is to measure TEF with Bayes Theorem. This is why you would want to look at it.
The problem we risk analysts have with external data is that we’re missing the total population size. Let me explain. You’re reading the news, the DBIR, Slashdot, and you see a DDoS attack on a company in your sector. That’s an interesting data point and we want to use it in our risk assessments, but that single data point doesn’t tell us the probability of DDoS attacks in our sector. That’s the big problem I’ve always had – in the past I’ve had a hard time extrapolating incident data into an actual range.
Now you could build out a table of companies in your sector like I did on the previous slide and list out attacks. This may not be possible for all sectors. Retail might be hard. Small business may be hard. Or, maybe you don’t have time – you want something quick but still statistically sound.
This is a very old technique. Bayes’ Theorem has been around for quite some time. I think someone could spend a week talking about Bayes and how to use it in risk assessment, but I’m just going to scratch the surface here to show you a technique you can use to improve risk assessments. This is relatively new to me – I’ve been aware of Bayes from my college stats classes, but I never applied it to risk analysis. I read about this recently in this book here, How to Measure Anything in Cybersecurity Risk. If you don’t have it already, you must check it out.
There’s a method in this book that gives you a technique to compute frequencies with very few data points.
So how does it work? This example is right out of the book, so I’m stealing it. This vase on the screen is filled with yellow marbles, and a few blue marbles are in there also. I want to estimate the proportion of blue marbles without counting every single marble in the vase. I sample the marbles – I reach my hand in and pull out 6 marbles. One is blue and the other 5 are yellow. We can estimate that the proportion of blue marbles is between 5.3% and 52%.
How do we take this technique and apply it to DDoS attacks?
The same principle applies. Take a sampling of banks. It can be a random sample, or it can be a list of banks from the ABA or Fortune 100, it doesn’t matter – what does matter is that the list you use should have nothing to do with DDoS attacks.
After you have your list, find the occurrence of DDoS attacks for that list. Same techniques I told you before – Google news, VERIS, etc.
Last, run the betadist function in Excel with your data and you will get a range within the 90% confidence interval.
This is pretty cool stuff – I’ve just scratched the surface but these are really cool ways to help us estimate probabilities when we don’t have a ton of data
Whew. We’re done with TEF, but guess what – that really is the hardest part. That’s what most people struggle with, including me, because you’re dealing with so much external data
We’re next going to look at Vulnerability, which is a calculation of Threat Capability and Control Strength.
Threat Capability - The level of force a threat agent is able to apply
In FAIR, this is articulated in the form of a percentage. In our case, cyber criminals are generally smart, well resourced, and motivated by money – they are in the top 50% - 75% of all threats that launch DDoS attacks. They’re not nation-states, which would be higher, but they are motivated and have money, means, and people.
How do we get this number? Talk to SME’s in your org that perform threat analysis. I also get a huge amount of value from information sharing orgs like FS-ISAC. They share info on threat agents and threat capabilities. Also don’t forget that a lot of this analysis is already done, like Intel’s TARA, work the DHS did – there’s no shame in taking it.
Next is Resistive Strength – also called Difficulty in some older FAIR docs – a measure of how difficult it is for a threat actor to inflict harm.
We’re going to rate our resistive strength at 50% - 90%. This means our DDoS protection services – Prolexic, Akamai, firewall blacklisting – can resist 50-90% of attacks. It used to be 80-90%, but the new attacks we are seeing made us update the assumptions. These would be the attacks the CIO wants to know about.
Last – and here’s the fun part – loss magnitude.
This is my favorite part because I feel it’s the easiest: for the most part, you just have to go talk to people on the business side.
FAIR defines primary loss as the loss that is a direct result of a threat agent’s actions against the asset. There are 6 forms of primary loss
Prod: Reduction in our company’s ability to generate revenue. We may not lose any money; if the website is down, most people just try again later.
Response: the cost of managing a loss event. Estimated around 30 for internal personnel, another 30 for Akamai, and 250 to pay a ransom. It happens more often than you think.
Replacement: Cost of replacing an asset. Not applicable in this case.
Fines & Judgements: any fines levied by regulatory agencies or judgements from people suing you. I can’t find any case of this happening from DDoS attacks – they usually end before people get really angry. The OpenFAIR book says this category includes bail you have to pay – always makes me chuckle – and reminds me of Evan Wheeler’s comment yesterday that a company objective can be “avoid jail.”
Competitive advantage: diminished company position. Not too likely to happen nowadays with DDoS attacks because they’re so common. At my current company, this is where I look at the inability to raise new capital, inability to raise debt financing, drop in stock price, and estimated number of lost customers.
Secondary loss is considered “fallout” from primary loss. Example from the open fair book is customers taking their business elsewhere after repeated outages. Different from primary loss in that respect
In our case study we don’t have any secondary loss; most often seen in data breaches or other events that have long term or cascading effects that are different from the losses measured in primary loss
What do we have so far?
Our asset is the website – availability, ability to service existing customers, sell services to new ones
The threat community is Cyber Criminals
The loss type is Availability
The TEF is 0-2x every 2 years
TC is 50-75%
Resistance Strength is 50-90%
I’m actually going to jump over to the RiskLens app to derive the final risk.
I taught myself FAIR with the basic risk assessment guide and later used Excel. I’ve seen people use Archer, but I prefer the RiskLens app because you can plug in the numbers and it will run all the Monte Carlo for you – the hard math in the background.
OK, I lied. I said LM was my favorite – it’s actually running Monte Carlo.