101 ways to fail at security analytics ... and how not to do that - BSidesLV 2018
1. There's lots of ways to fail at getting value from investment in security analytics.
1
2. In this talk, I'm going to share some frameworks that can help avoid this, by thinking
about and managing data like a product.
2
3. We’re going to explore how to use these frameworks to take a structured, reasoned
approach to dealing with the key dependencies that determine what value we’ll get
from spending money on making it easier to ask our data questions.
3
4. In my experience, if we don't do this, we’ll either fail totally...
4
5. … or we’ll deliver a level and consistency of value that’s unconvincing to our
stakeholders.
5
6. To manage data like a product, we're going to need a team that’s focused on joining
all the dots – and picking up all the things that fall between the cracks – of any
serious effort to use data and analytics to better protect our business.
6
7. Rather than a formal group, this will likely be a team of teams, combining skills in
these areas.
7
8. And while their mission is security specific in our case, it’s actually no different to any
other mission in data analytics, as this paper makes clear.
8
10. To facilitate relationships and closely connect everybody who collects and prepares
data, analyzes it, and puts that analysis to use.
10
11. This means the topics we’re going to cover in the next 55 minutes aren’t just relevant
to machine learning, or anomaly detection for threaty stuff.
They’re about the fundamental issue of how different security teams collect and
manage data…
11
12. … and then receive input from, or deliver output to each other (and other teams in
the business) to make everyone's job easier.
12
14. In the first act, I’ll explain why ‘data operations’ is a necessary function for a modern
security team to have.
14
15. In the second and third act, I’ll provide frameworks for reasoning through two key
dependencies for analytics success…
15
16. … specifically the data we have to work with, and how we get access to it.
16
17. Then finally, I’ll put this in context of the change that Security Data Operations often
needs to bring to an organization in terms of what data is available to security teams,
and why.
17
18. Think of this talk as a field guide to all the things we’ll run into when ‘doing’ Security
Data Operations.
We’re going to run through a lot of material, quickly – so don’t worry about taking
everything in or reading everything on the slides.
They’re available with full notes online, so you can dive into the detail at your leisure.
18
19. So for now, please just lean back, relax, and enjoy the ride.
[0:45]
19
25. The tide of CyberCrime is rising.
And while your Board are worried, only YOU know the true meaning of the horror.
The complexity of your systems, INDECIPHERABLE. The threats you face,
INNUMERABLE. The competency of your team. Questionable.
25
29. But not just any math. Math so vast in its reach, so incomprehensible in its power
that it knows what the threat is and who the threat is at all times.
29
31. Gabe is printing documents on a printer two floors above and hundreds of meters
away from his team in a vain effort to hide the fact he is exfiltrating your intellectual
property in physical form.
Gabe is your data thief.
31
33. Hoff’s credentials are accessing files and systems that deviate from his baseline at
strange hours from Russia.
Hoff is collaborating with the Russian Mafia to give them a foothold in your systems.
Hoff is your mole.
33
35. Leila is transferring GBs of data to her local machine in readiness for exfiltration.
The number and volume of files are so egregious that her access to all systems must
be immediately and automatically revoked.
35
37. As our comprehensive demonstration has shown, there is only one answer to your
security needs.
That answer is Anomaloser.
37
38. That answer. Is 43.
Ah. Hahahahaha. Now, while I cannot promise you that our system is 100% accurate,
(what fool would do that) …
38
39. …what I can promise you is if before purchasing our technology you had fear,
uncertainty and doubt … AFTER we are done with you, you will have fear, uncertainty,
doubt and a massive invoice.
Which we will use to purchase bizarre neo-classical fusion art that will hang on the
walls of our West Coast office so when you visit you will be left in no doubt where your
money has gone.
39
40. And with that, there is only one question left to ask. Are you ready … to buy?
40
41. That's a condensed version of most pitches you’ll hear in security analytics, as you
start to slide down a vendor’s sales pipeline.
They usually come with a promise like this.
And while yes, I’m sure you’ll tell me something I don't know in 30 days – perhaps
how difficult it is to get data into your product – there's a serious point here about
the problem facing buyers who want to use data, analytics and automation to
improve the protection of their business.
41
42. That problem runs as follows.
10 years ago it was a simpler time. Sure, we had sales people pitching us like this for
tech like firewalls and IDS.
But we could relatively quickly – within, say, a year of something new coming out –
know what the deal was: what the tech could do; how easy it was to get it to do it.
42
43. Today however, things are more complicated.
Pitches are laden with technical terms about random trees and exponential forests
delivering auto-magic results.
And while it’s easy to embarrass sales people who spout this stuff in a meeting if you
have a math degree, it’s very hard (with or without a math degree), to pierce through
the technobabble, and understand what a product can do for you in the long term.
43
44. Unfortunately, most of what you hear from peers that's negative about analytics
products can be attributed to deployment failures as much as product failure.
Which means you really have no option in this space, but to give things a try.
44
45. This can be daunting, especially if you speak to people that have committed to a
vendor's technology for the long haul, and feel they're getting good value, because
they say things like this.
45
46. So while one of our jobs in Security Data Operations is to de-risk the probability we’ll
waste time on POCs, the other is to know what we're getting into if we decide to
commit to something that sounds like it stacks up.
46
47. It’s a good idea to have a set of questions that can help with this.
Here are mine, with my two personal favorites being #5 and #8.
To date only three people in vendor companies, (all CEOs, by the way), have given
double thumbs up answers without skipping a beat. Those are Alex Pinto of Niddel,
Rob Lee of Dragos, and Kelly White of Risk Recon.
47
48. Now, as helpful as these questions are, the reality is that you may be asking them
while you're already in an express elevator to hell, going down.
48
49. And when you get there, what you're up against is ‘The Vendor Dashboard’. Designed,
of course, to look AMAZING.
49
50. So amazing in fact, that one day, someone on your team – swept up in the emotion of
the moment – may think out loud to the vendor during their pitch, (and sideways to
you as the sceptic brought in to put them through their paces): ”You mean THIS
could be our single pane of glass?”
To which you will sigh: “But don’t we have five of those already?”
50
51. Because if you spend long enough looking at this space, what you find on offer is a lot
of specific use cases, built on vendor-controlled platforms, which limit how you can
use both the data you need to feed into the platform and the platform the data
sits on.
51
52. For vendors, this makes sense: if you’re trying to build and sell a company, it’s hard
to do that by building something broad and deep. And niche can be really good
if it’s highly specific to particular data fields in a small set of very specific data
sources.
52
53. But for analytics solutions that target a bigger surface area of problems (e.g. UEBA),
there is an unreasonable imbalance between the effort to get these solutions
working, and the enduring capability to ask the questions you want of the data you
have to put into them.
And that’s not just the case within the platform; it’s also about getting the data they
process or generate out of these platforms – for example if you want to enrich or
correlate it with other sources. Sometimes you can’t get that data out at all, or if you
can, only in a format that’s hard to work with.
53
54. Of course, vendors don’t want to tell you all this stuff.
Because if they did, you might stare a year into the future, and realize there's a fair
probability that, even if the deployment doesn’t go horribly wrong, you'll have put
loads of effort to get this thing working… a new challenge will come up ... but oh dear,
at best what you're after is ‘on the roadmap’.
This means you'll then have the dubious pleasure of either re-platforming, paying a
lot for professional services to bespoke stuff, or buying more tech and trying to bolt
things together. None of these tends to end well.
54
55. And yet, this is the dynamic the majority of the ‘buy side’ in security analytics face.
Here's why...
55
56. Very few firms have the people, time and budget to build their own platforms for
analytics, and then develop analytics in-house.
56
57. If they do, they’ve got to be ready to maintain an in-house pool of expertise, because
vendors like to develop intellectual property they can re-sell, so there’s not much
incentive for them to build analytics that live on your in-house platform, which you
have full visibility of.
57
58. At the same time, only a small percent of the buy side can or want to take the risk of
working with unproven start-ups … which is effectively outsourcing platform building
and analysis to vendors who need (but don’t have) access to data to develop their IP.
58
60. And what's on offer is a set of generally applicable, but narrow use cases, which may
or may not solve the problem that matters most to our business' threat landscape.
60
61. Side note: this is one reason I hate the term ‘use case’: you can find a use for
most stuff, but whether that use solves what actually matters is another question
entirely.
61
62. But we must also stare into that darkest of mirrors, because it would be wrong of me
to say our only challenge in Security Data Operations is vendors selling us stuff we
don’t need.
62
63. If you've ever heard a CISO say something like ‘The dashboard should look like
minority report’, you'll have a fair idea of the story I’m about to tell.
63
65. Teams of data engineers and data scientists are hired. They are motivated by inspiring
visions of a scalable, extensible ecosystem of platforms that will solve many teams’
problems. Optimism is high. Budget is flowing.
65
66. But fast forward a few years, millions burnt through in budget, and teams that funded
this central engineering and analytics effort asking for their money back … and what
you’re likely to find is that everyone got only a little of what they wanted, and no one
got enough of what they needed.
66
69. In the end, wherever we sit on the buy-side, our job is to make decisions under
imperfect conditions in complex environments, which give us the best possible
answers for the problems we face.
69
70. In analytics, that means some level of trade-off between proprietary and open
solutions, which will need to work together, as well as some necessary replication of
data across our environment.
70
71. This means we need to make sure that, for whatever data we need to use
- we can get it to the place we need it in a sensible format;
- then get it out to other places if we need or want to;
- and understand what an efficient system of processes looks like to support this
71
72. This is where Security Data Operations really comes into play.
72
74. … there are two areas of critical dependencies we'll need to wrestle with – which are
rarely covered in talks about analytics, but which are absolutely vital to its success.
74
75. The first is the need to get ‘the’ data; those nuggets of gold we need to alert on or
correlate to get the insights we're after.
75
76. And second is being able to move data, or get access to it, so we can build, run and
refine our analysis.
76
78. …to look at how we can navigate the first of our critical dependencies.
The challenge of identifying and prioritizing 'the data' we want.
78
79. A logical argument at CISO level for getting our hands on data could go something like
this: "We need timely visibility into what's going on in our environment, to know if
things are operating as they should, identify risk, and find bad stuff. Data gives us
that. So give us the data.”
79
80. And not many CxOs would disagree. They know they need more current information
to run their businesses with.
80
81. However, in practice - as per the selective attention test - just having the data (i.e. the
visibility) is a far cry from being able to recognize things that we need to, even when
they're in plain sight.
81
82. In his blog on SOCless detection, Alex from Netflix sums this problem up perfectly:
“The creation and/or collection of data should be justified by a quantitative reduction
in risk to the organization (in dollar terms), but that can be difficult to forecast
accurately until you have a chance to explore the data”
82
83. Clearly, we know our business case to the CIO can't be ‘Hey, we’re in Catch 22 about
the ROI of all this, so be a good sport and hand over all the data on this list, then we'll
work out what value it has’, because even doing the handing over can be incredibly
expensive.
83
84. But neither can we make expertise-based claims about what a specific bunch of logs
will tell us in order to persuade people to give us what we want. Because we may
find, (cough: DNS / NetFlow), that the data doesn't tell us what we think it will.
84
85. This means to develop a robust business case for getting 'the data' we need, we need
to do two things.
85
86. First, we need to answer all the questions that tell us about our current, potential and
future inventory of data sources that we'll have available to work with.
Then second, we need to work out what the answers mean ... for what we may or
may not be able to recognize ... both now and in the future.
86
87. The best ‘thinky framework’ I’ve found for these tasks is the Cyber Defence Matrix …
87
89. It gives us a way to think about what data from different technologies can tell us
about the entities our business depends on to operate (i.e. devices, apps, networks,
data and users) – all in relation to core security activity domains (i.e. identify, protect,
detect, etc).
89
91. And building on that, we can use the matrix to compare the data we need for specific
detection scenarios vs the data we have from those technologies.
Here's an example of how...
91
92. Let’s say we've decided, (based on some appropriate threat modelling), that we need
to detect if Super Users are trying to exfiltrate trade secrets.
92
93. We can use the matrix to think through all the different data sources we’d need to do
this:
- We’ll need a data source that can identify those users and the accounts they use
- We may want to know what device they are using when they SU or SUDO
- If we have a privilege management solution in place, we may want to reconcile
what the logs tell us they did, vs the justification for checking out elevated
privileges
- We may want to observe what they do when they log on to any given server...
- ...what they do when they access applications...
- ...and any network traffic that indicates movement of data from A to B
93
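The walk-through above can be sketched in code. Here's a minimal, hypothetical example (the goal name and source descriptions are made up for illustration, not a real schema) of recording a detection goal's dependencies as (asset class, data source) pairs, then enumerating the distinct sources it needs:

```python
# Illustrative sketch only: map a detection goal to Cyber Defence Matrix-style
# (asset class, data source) pairs, so required sources can be listed per goal.
# All names below are hypothetical, not from any product or standard.

DETECTION_GOALS = {
    "superuser-exfil": [
        ("users",    "identity directory (accounts, SU/SUDO mapping)"),
        ("devices",  "endpoint logs (device used at privilege elevation)"),
        ("users",    "privilege management checkout justifications"),
        ("apps",     "application access logs"),
        ("networks", "web proxy / netflow (data movement A to B)"),
    ],
}

def required_sources(goal: str) -> set:
    """Return the distinct data sources a detection goal depends on."""
    return {source for _asset, source in DETECTION_GOALS[goal]}
```

Even a simple table like this makes the next step – prioritizing sources per detection – mechanical rather than ad hoc.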
94. With this information, we can focus in on priority data sources for this detection; let's
say we decide these are endpoint PowerShell logs and web proxy data.
94
95. Now we can ask questions about those data sets across our dependency layers, to
work out the level of capability we have for each one to deliver the result we want.
Here I've gone with a sliding capability scale of 'none to optimized'.
95
96. Let’s look at an example; say this is web proxy.
Working from the bottom up:
- We’ve got great tech
- But it’s not configured well at all
- There's a weak pipeline to move it to our platform
- And there’s no real bandwidth to shift the logs at the volume we want to
- We have a good platform to put the data in, with good processes around it
- But there are only weak team processes to take analytics output, although there
are some skills in place
96
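A small sketch of the capability-scale idea, using the web proxy ratings from the slide (the layer names and scale labels are my own shorthand, not a standard): score each dependency layer, then surface the weakest links that cap what the whole pipeline can deliver.

```python
# Hypothetical sketch: rate each dependency layer on the talk's
# 'none -> optimized' scale, then find the layers at the lowest level.

SCALE = ["none", "weak", "ok", "good", "optimized"]  # assumed ordering

web_proxy = {                    # the example ratings from the slide, bottom-up
    "technology":     "optimized",  # great tech
    "configuration":  "weak",       # not configured well at all
    "pipeline":       "weak",       # weak pipeline to our platform
    "bandwidth":      "none",       # no headroom to ship full log volume
    "platform":       "good",       # good place to land the data
    "team_processes": "weak",       # weak processes to act on analytics output
}

def weakest_layers(ratings: dict) -> list:
    """Layers at the lowest capability level; shift these right first."""
    floor = min(ratings.values(), key=SCALE.index)
    return [layer for layer, level in ratings.items() if level == floor]
```

Fixing the floor first (here, bandwidth) is what actually moves the whole source to the right on the scale.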
97. By doing this, it very quickly becomes clear what we need to shift to the right to get
where we want to go – and where we'll need to identify opportunities, costs,
constraints etc. to work out what's involved to achieve this.
97
98. Side note: we can also ask questions about what happens if we do all this work, then
the web proxy vendor brand we are using gets swapped out. Will we have the same
fidelity of data available or not? And what does that mean for what we can observe
or detect in the data it generates?
98
99. By going through this process for all our detection goals, and each data source
implicated in those goals, we build up two things…
99
100. The first is a cross-referenced list of our minimum viable data sets – the ones we
absolutely need to support multiple detection goals.
The second is a picture of what data we can get, and why. This informs the constraints
we need to remove now or in the longer term, if we are going to achieve our
detection goals.
The good news is because we’ve done this for multiple data sets across multiple
detections, we have a very comprehensive business case for where to make changes
to have the biggest impact on improving our overall situation.
100
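One way to build that cross-referenced list is a simple frequency count over goal-to-source mappings (all names below are illustrative): the sources that support the most detection goals are the strongest candidates for the minimum viable data set.

```python
# Illustrative sketch: rank data sources by how many detection goals
# depend on them. Goal and source names are made up for this example.

from collections import Counter

GOAL_SOURCES = {
    "superuser-exfil":   {"endpoint_powershell", "web_proxy", "identity"},
    "phishing-foothold": {"email_gateway", "web_proxy", "endpoint_powershell"},
    "lateral-movement":  {"auth_logs", "endpoint_powershell", "identity"},
}

def mvd_ranking() -> list:
    """Data sources ranked by the number of detection goals they support."""
    counts = Counter(src for sources in GOAL_SOURCES.values() for src in sources)
    return counts.most_common()
```

Here endpoint PowerShell logs support all three goals, so they would head the acquisition roadmap.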
101. Here’s an example of what a 4-phase minimum viable data set plan might look like.
This isn’t necessarily ‘right’, but it demonstrates how you can build up a roadmap for
‘why these data sources, in this order?’
This kind of thing is also really helpful to share with other teams, so they can see the
logic behind your decisions.
101
102. This kind of roadmap also lets us take a message to the business that bridges the
nirvana they may be asking for (or want straight away), and the reality of where we
need to start on the journey to get there.
And we can put that message in the context of fundamental building blocks that will
help us improve our capabilities over time.
102
103. If you don’t have a flat network, you can go more granular with this across different
trust zones…
103
104. …which serves to further limit scope by reducing the data we need to be concerned
about.
104
105. With our business case and roadmap prepared, it’s onto our penultimate act: moving
the data from where it is, to where we can analyze it.
105
106. A key challenge that any serious effort to centralize security logs will run into is
bandwidth.
106
108. The reason for that is when decisions were made about what size pipes were needed
to move data around, security logs at high volume usually weren't on the laundry list.
108
109. This means, when the war cry goes out, different teams will process the
consequences of this in different ways…
109
110. … for example the networks team may see the project as a significant risk to their
SLAs.
110
111. As a result, before the rubber hits the road of shifting data around, you quickly
encounter a ton of limitations.
In turn, this means you'll have to make a lot of judgement calls about what data is
stored where, and for how long.
111
112. These judgement calls will bridge the three pillars of ‘getting’ the data:
- The hosts or observers generating it
- The intermediary store and forward collector
- The platform destination, where your analysis is running
112
113. And the decisions you'll need to make break down along these lines.
113
114. This is important, because if you need rich content for an investigation, but what you
have in your central platform is summarized or metricated…
114
115. … then you risk ending up in a situation like Charity Majors describes here.
Where context hasn’t just been filtered out en route from source to platform; it’s
been completely discarded.
115
116. This means, as you filter things out between source, collector and platform, the
question of local storage capacity – and how long logs are kept where – gets more
important.
116
117. In a large environment, you’ll need to think about this across a huge variation of
source systems, collectors and platforms.
So this problem fast becomes multi-dimensional just for one category of technology,
like ‘Windows servers’.
117
118. Also, this stuff is going to be in different places.
Some in the cloud, some in different on prem data centers, some in the hybrid cloud.
And unless you're lucky, there will also be some nightmare legacy environment that
you’re told to go nowhere near until 'the transformation' is over.
118
119. This means our reality is more like dealing with a multi-headed squid with many
tentacles into the data we need, rather than having the power of one ring to rule
them all.
119
120. To manage this, we need to break down the problem, so we can
- understand the complexity we’re dealing with, (aka: where the bodies are buried)
- make the problem visible (but not totally horrifying) to our stakeholders
- and then make the best decisions we can in view of the constraints we’re left with
120
121. It’s also important to do this because as we talk with all the teams we’ll need help
from, we may find stakeholders who have influence, but are - for some reason –
unsupportive of our quest. If they want to, they can prevent us from making progress.
This quest, and dealing with other teams, becomes a whole lot easier if we’ve already
thought through all their concerns, and we can show we’re here to do as much work
as we can for them, rather than adding to what’s on their plate.
121
125. First, the ‘logging mechanism’ we care about. The point here is we need to think
very precisely about what we are asking for. For example, when we say ‘we'd like the
logs for our Windows servers'…
125
126. … we could be referring to native Windows logs, or antivirus alerts from agents on
Windows servers, or the logs from the different functions the servers perform.
Long story short, have a way to structure requests so it’s clear what we’re after from
the teams we’re working with, and it's easy to keep track of what we expect to get.
126
127. Second: volume. We’re going to have to understand this before we can start moving
data across our network.
127
129. The key thing we need to understand and agree on for this is ‘the time period of
interest’.
It may be general – for example: over 7 days, show me the log volume per day, broken
down per hour.
129
130. Or it may be specific to a constraint we know about.
For example, how long can a host go before over-writing logs, which impacts how
often we need to write and then shift data in batch if we can’t stream it.
130
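The overwrite constraint turns into simple arithmetic. A back-of-envelope sketch (function name and numbers are illustrative, not from any product): given a host's local log buffer and its peak generation rate, the longest safe gap between batch pulls is buffer ÷ rate, with some margin for bursts.

```python
# Illustrative sketch: how often must a batch job pull logs from a host
# before its local buffer overwrites them? Numbers are examples only.

def max_batch_interval_hours(buffer_mb: float, peak_rate_mb_per_hour: float,
                             safety_factor: float = 0.8) -> float:
    """Longest safe gap between batch pulls, with headroom for bursts."""
    return (buffer_mb / peak_rate_mb_per_hour) * safety_factor

# e.g. a 512 MB ring buffer filling at 64 MB/h at peak:
# 512 / 64 * 0.8 = 6.4 hours between pulls, at most.
```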
131. Equally, our time period of interest may be downstream: to understand if our collector
has enough horsepower to handle the 95th percentile of log volumes from all hosts
and observers it serves at peak times.
131
132. Side note: it’s important to pick devices that are truly indicative of volume. But it’s
easy to get extrapolation wrong.
132
133. This is because some devices of the same type create far more logs than others – and
some are unpredictable in terms of what they generate over a given time period. So we
need to be sure that we have the right spread of devices when we pick our sample.
133
134. Side note to side note: This also matters because if we give our infrastructure team a
spec for presumed volume we want to handle, and they host our collection servers
somewhere that doesn't have the capacity to handle true volume - or scale up the
number of collectors within that hosting environment if we need to -
we'll be in trouble.
134
135. Meta side note: there are lots of time periods of interest to think about, for example
as data about badness goes all the way from generation, to triggering an alert that we
take action on. And they all need some thought.
135
137. You can move logs directly from source to collection server or end platform
137
138. Or you may have a middle man, a ‘decentralized collection point’ that you need to
draw things from.
This matters because you may need to use different collection servers, placed in
different points across your infrastructure, for different data types before you send
them to a central platform. You'll need to keep track of what's collected where.
Also, if you move straight to platform, you may not get the compression advantages
you do if you send logs via a collector ... which is a consideration if you don’t have
much headroom to play with.
138
139. Format. There are lots of them.
You need to know what hosts or observers generate, as this informs what your
toolchain will need to process directly, or transform into something that can be
processed.
139
141. Capacity matters on hosts and observers because if (or when) you ask for the logs to
be turned up to 11, you need to know if the devices generating the logs can deal with
that, without melting their CPU, memory or Disk IOPS.
141
144. We covered the topic of the data fidelity we have available earlier, and so we don’t
need to go into detail other than to say…
144
145. … if you choose to filter out at your collector, track and measure what you filter out,
so you know what you can filter in, if you need to at a later date.
145
147. This may or may not depend on your decision of what platform(s) you go with, or
how difficult it is to get agents onto hosts in your environment.
But consider your long term plan, and if your forwarding agents will be able to drink
from the firehose of what you eventually want to send through them.
147
148. Next, the spec for your store and forward collection device.
148
149. Just as with your hosts and observers, if you turn your logs up to 11, you need to
know if your collectors can handle that.
149
150. Here’s a handy guide to figuring this out.
- First, mimic the kind of volume you’ll get from a top talking host.
- Second, tweak the input to your collection server via config to see how a given
spec of CPU, Memory etc handles the input.
- Then, where it fails, increase the resource you need.
Bonus, you can also benchmark your different forwarding decisions, and see which
fwd’r gives you the best results in terms of output, compression etc.
150
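Those steps can be sketched as a tiny harness. Everything here is hypothetical: `send_events` and `collector_lag` stand in for your load generator and your collector's monitoring interface, which will differ per toolchain.

```python
# Hypothetical benchmark harness for the steps above: replay a top talker's
# volume, stepping the event rate up until the collector falls behind.
# send_events and collector_lag are stand-ins you'd implement per toolchain.

def find_breaking_rate(start_eps: int, step_eps: int, max_lag_s: float,
                       send_events, collector_lag) -> int:
    """Raise events-per-second until ingest lag exceeds the allowed maximum;
    return the last rate this collector spec handled comfortably."""
    rate = start_eps
    while True:
        send_events(rate, duration_s=60)   # mimic the top talker at this rate
        if collector_lag() > max_lag_s:    # collector fell behind: back off
            return rate - step_eps
        rate += step_eps
```

Run the same loop against each candidate forwarder and spec, and you also get the bonus benchmark of which one gives the best throughput and compression for the same input.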
152. For your collection architecture, you’ll have some decisions about resilience.
152
153. And finally, you may need to consider throttling the speed of forwarding logs.
153
154. There are definite headroom benefits to periodic forwarding via batch, but there are
also considerations about frequency for analytics that are running.
For some analysis, a 24 hour lag may be acceptable; but for others, the SLA may be
measured in minutes.
154
155. If we need streaming, it’s important to know which team has their hands on the lever
for what data is moved when – and what circumstances might trigger a change in the
rate of data transfer, (as well as who gets notified if that happens).
155
156. Also, if you work somewhere where bandwidth headroom is severely contended,
there may be times when you can't move logs because headroom on the network is
unavailable.
This means you need to consider what happens if a connection fails between a
collector and a platform during times when we can shift the logs, and answer
questions like
- How long will it be before logs get overwritten or moved to cold storage?
- And can we ever catch up again if we can't fail over gracefully to another collector
to handle the load?
[3:00]
156
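The 'can we ever catch up' question reduces to simple arithmetic, sketched here with illustrative numbers: with generation rate g and transfer rate r, a backlog b clears in b / (r - g) hours, and never clears if r <= g.

```python
# Illustrative sketch of backlog catch-up after a failed connection.
# Rates and the function name are examples, not from any product.

from typing import Optional

def catch_up_hours(backlog_mb: float, transfer_mb_h: float,
                   generate_mb_h: float) -> Optional[float]:
    """Hours to drain the backlog, or None if we can never catch up."""
    headroom = transfer_mb_h - generate_mb_h
    if headroom <= 0:
        return None    # generation outpaces transfer: the backlog only grows
    return backlog_mb / headroom

# e.g. 600 MB behind, shipping 150 MB/h while generating 50 MB/h:
# 600 / (150 - 50) = 6 hours to catch up.
```

The None branch is exactly the failure mode in the question above: without graceful failover to another collector, some backlogs are never recoverable.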
161. Mainly that’s because the way some commercial platforms are priced limits how
much data you can make available to analyze.
161
162. But, as this alternative answer on the same thread neatly states, the main problems
that come along with platforms are agnostic of which one you chose.
Ultimately, everything comes with a hidden cost. So whether you go open-source ELK
or Hadoop, or Splunk or Sumo Logic, somewhere, you’ll have to pay the piper.
So don’t kid yourself about how expensive this will be if you're aiming to do it
properly.
162
163. Ok, phew. We’ve covered a lot of ground. So let’s reset quickly on where we are
before we get into our fourth and final act.
163
164. We’ve talked about the need to have a team to manage data like a product across all
the dimensions that need joining up for analytics success.
164
165. Most importantly, these involve identifying and prioritizing the data we need, and
looking at the reality of what we can get in what order.
165
167. To do this we can use the Cyber Defence Matrix to build a picture of data sets that
link to our detection goals.
167
168. Then we can figure out our dependencies per data source to understand what
capabilities we need to shift to the right.
168
169. From this, we can create our justified, phased roadmap for data acquisition.
169
170. Which considers all the storage, transport, etc dimensions across our three pillars of
'getting the data'.
170
171. With this information, we’re ready to work with all the teams who we need help from
to get this done.
171
172. Having made sure we’ve carefully considered all the angles.
172
173. So, why does all this matter, to detect functions in particular, as well as the broader
security team?
What's the change that all this effort of managing data like a product is trying to
bring?
173
175. … the answer is usually compliance, or ‘this is just what came out of the box’.
And if you’ve looked into the cold dark soul of the pipeline to your SIEM platform –
and what it actually contains – like many others, you've probably found it to be …
wanting.
175
176. And yet, very rarely do organizations ask the teams that need to ask questions of
data:
- What's your user need?
- What context does that need operate in today?
- And what changes are required?
176
177. When I ask that question, I'm yet to find a SOC analyst that says ‘We need another
50k alerts to triage, with no idea of what the source system is or who owns it.’
Usually it’s much simpler: we need to search.
177
178. So what’s the status quo – what is the system of processes – that has led to the
sub-optimal situation that most security teams find themselves operating within?
178
179. Let’s imagine that the space in these buckets represents all security-relevant data we
could capture.
179
181. For any given bucket, we'll only be generating a percentage of everything we could
be.
181
182. And although there might be some variation within a bucket over time – for example
periodic dip tests for operational troubleshooting that Bob is doing without telling
anyone on his syslog server under a desk – mostly, it’s fairly static.
Though let’s not forget about that syslog server: it’s important later on.
182
183. This means there’s some data we’re not generating, which offers potential value to
us.
183
184. And for what we are generating, we’ll only be centralizing some of it.
184
185. What that means in aggregate in terms of what data we have and where, is
something like this.
185
186. So let’s zoom in on what we are generating, and look at the last 90 days.
186
187. For this time period, there will be some amount of days we reasonably assume data
is available. Then some amount of days where we know it’s more of a gamble.
187
188. So when it happens, the moment of boom when we learn about ‘the incident’ …
188
189. … we suddenly find ourselves asking searching questions about what data we can get
our hands on.
189
191. We’ve called in a team of experts. And they know the signals to look for to
understand what happened, and how.
191
192. Of course, not all of them will be relevant, some will be noise.
But we need to get to the data sets that contain them, if we're to work out where the
adversary went and what they’ve done.
192
193. But, when we go and look for those logs, we get an unpleasant surprise.
We don’t even store logs for 30 days locally! There was no policy saying we had to, so
someone dialed down logging while troubleshooting during another incident a few
months back, and they never re-set it back to how it was before.
193
194. But we can always check our SIEM? Right…?
We can, but - oh no - we find it only stores the data directly associated with an alert
for 90 days.
All other data was filtered out at local analysis points on de-centralized collectors.
194
197. It's time to ask: do we have those logs anywhere else?
197
198. And on this occasion, our prayers are answered. By Bob.
198
199. Someone finds out about his syslog ops server. And guess what, it has the data we
need on it!
199
200. So we learn our lesson. This won’t happen again. We need to roll that data source
into our SIEM...
200
201. ...to make sure that next time, everything will be OK, because we’ll get those alarms.
201
202. Except we won’t, because this is incident / story driven collection.
And what happened last time won’t be what stings us next time, and if it is, it almost
certainly won't be in the same place.
202
211. To get to better detection outcomes, we need to correlate better nuggets.
211
212. Sure, for some things, we can get by with a single, high fidelity alert in our playbook.
And we should absolutely jump on those opportunities.
212
213. But for everything else, as this fantastic thread makes clear, the answer to ‘how do
we get more relevant alerts’ is about having the relevant data available to us that we
need to devise those alerts.
213
214. It is not about dialing down data collection. That is the right answer to the wrong
problem. Because how do you know you don’t need a field in a data source, if you
never extracted it to see what it could tell you?
214
215. This becomes clearer when we think about the three angles from which we need to
view alerts and signals of malicious activity.
215
217. Before we heard the boom, we could have seen the lightning strike.
But if that was invisible to us, because it happened locally, the non high-fidelity
signals we do get alerted on later stand a greater chance of being missed, or
categorized as ’non threatening’, because they're viewed in isolation.
217
219. Sequence on its own may not be enough for us to climb the ladder from benign to
suspicious to malicious.
We may need to dive deep into the structure of a thing that was alerted on, or we
found during a hunt.
219
225. And sometimes there will be different signals in different data sources at the same, or
almost the same, time.
225
226. This is why our default position in Security Data Operations has to start here.
226
227. Because if history has taught us anything, it’s that we discard – at our peril – what
didn’t seem relevant, until it was.
And by doing so we lose the ability to join and correlate data to attribute cause or
provide a piece of the narrative of an attack.
227
229. There are huge benefits to managing security data like a product.
229
230. This isn’t easy, but the right frameworks exist – along with MITRE ATT&CK – to do it
scientifically.
We can begin categorizing where we have visibility, and for how long.
Within that, we can work out what our recognition capability looks like - and how
quickly we can adapt to new information to prevent and block, or detect.
230
231. This can help multiple functions collaborate and improve their operational
effectiveness.
231
232. So if you haven’t already, go bring the right people together in your business – and
get started!
You’ll be surprised by how many other teams in your business get behind this kind of
thing when they see what’s in it for them as well.
232
233. I’d like to say a huge thank you to the friends and colleagues who’ve helped shape
this talk.
233
234. And thanks to you for listening.
If you have any questions or suggestions about how we can use data to keep calm
and make the world a better place, I'd love to talk.
234