With limited resources and a shortage of trained professionals on one hand, and attacks growing in both quantity and quality on the other, defenders securing enterprises and responding to incidents find themselves on the losing end of a digital arms race. Even managing the sheer volume of threat data and open-source intelligence has become a challenge.
This talk will cover the possibilities and perils of integrating the many sources of threat intelligence data to protect an organization. Given all the open-source and commercial data available, simply dumping it into a firewall or a DNS RPZ can be problematic. What should you do about compromised websites or shared hosting environments? What about DGA domains built from full words that may collide with legitimate websites? How should you handle threat data that lacks the context needed to judge its validity and accuracy? This talk will present several case studies showing how these problems can be tackled, and how multi-domain analysis can reduce the risk and maximize the value of automated protection built on these types of data.
SANSFIRE18: War Stories on Using Automated Threat Intelligence for Defense
1. John Bambenek
VP, Security Research and Intelligence, ThreatSTOP
War Stories on Using Automated Threat Intelligence for Defense
2. About me
• SANS ISC Handler
• VP of Security Research and Intelligence at ThreatSTOP
• Lecturer at the University of Illinois at Urbana-Champaign
• Producer of open-source threat feeds
• Involved in the DNC, DCCC, et al. investigations in 2016
3. The Problem – the “too much” issues
• “1,000,000 unfilled cybersecurity jobs”
• Too much work and not enough skilled people to do it.
• Too much data, no clear prioritization.
• Too much manual work to investigate and respond to incidents.
• What’s worth responding to? What is the intention of the attacker?
4. The Problem in Numbers
• Average dwell time during a breach: 4-5 months
• Percentage of breaches where evidence was in the logs: 80+%
• Taken together, these two data points mean that if a SOC knew what to look for and had the tools to respond quickly, a great deal of damage could be mitigated.
6. The Reality
There is a much smaller set of actual malware tools than the volume of attacks suggests, and many are used by multiple actors.
Problem: How do we use this data effectively?
How do we manage large data sets to correlate behavior over time?
8. War Story #1 – Election Hacking
• Brief overview of the DNC and related hacks.
• The private sector was “highly confident” of FSB/GRU attribution even before the news broke in the summer of 2016.
• We have a long history of tracking APT28/APT29, with a variety of TTPs and other information that allowed not just the responders, but also those who verified the responders’ work, to make determinations quickly.
• The same data also shows what they were doing during the French presidential election, and some 2018 activity…
10. War Story #1 – Election Hacking
• TTP: likes impersonating “vendors”/“partners”.
• In the case of the DNC, the MIS Department.
• Using DomainTools Brand Monitor or Farsight Brand Sentry, you can proactively look for impersonation.
• WHOIS details also provide clues.
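Brand-monitoring services run this kind of check over every new registration. A minimal local sketch of the idea follows, assuming you already have a list of candidate domains (for example, newly observed names from passive DNS or registration data). The watch term and the one-edit threshold are illustrative assumptions, not how DomainTools or Farsight implement their products.

```python
# Hypothetical brand-impersonation check: flag domains whose labels contain
# a watched term, or differ from it by exactly one edit (typosquats).


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def flag_impersonation(domains, terms):
    """Return (domain, term, reason) tuples for suspicious domains."""
    hits = []
    for domain in domains:
        labels = domain.lower().rstrip(".").split(".")
        for term in terms:
            if any(term in label for label in labels):
                hits.append((domain, term, "contains"))
            elif any(edit_distance(label, term) == 1 for label in labels):
                hits.append((domain, term, "one-edit"))
    return hits
```

For example, `flag_impersonation(["onedrive-en-marche.fr", "en-narche.fr"], ["en-marche"])` flags the first as containing the term and the second as a one-edit lookalike. Short watch terms produce many false positives, so expect to tune the list.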
11. WHOIS Registrant Intel
• Often actors may re-use registrant information across
different campaigns. There may be other indicators too.
• Sometimes *even with WHOIS privacy protection* it may
be possible to correlate domains and by extension the
actor.
• Most criminal prosecutions in cybercrime stem from an OPSEC failure, and from the ability to map backward through the actor’s history to find the failure that exposes them.
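One way to make registrant reuse actionable is to cluster domains that share any registrant field. The sketch below assumes WHOIS (or historical WHOIS) records have already been parsed into dicts; the field names, the normalization, and the privacy placeholder it ignores are all hypothetical choices.

```python
from collections import defaultdict

# Fields worth pivoting on; the choice is an assumption, adjust to your data.
PIVOT_FIELDS = ("registrant_email", "registrant_name", "registrant_phone")
# Privacy-service placeholders would glue unrelated domains together.
IGNORE_VALUES = {"redacted for privacy"}  # illustrative, extend as needed


def cluster_by_registrant(records):
    """Group domains that share any (normalized) registrant field value.

    `records` is a list of dicts like {"domain": ..., "registrant_email": ...}.
    Uses union-find so transitive links (A-B by email, B-C by phone) merge.
    """
    parent = {r["domain"]: r["domain"] for r in records}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    first_seen = {}  # (field, value) -> first domain seen with that value
    for r in records:
        for field in PIVOT_FIELDS:
            value = (r.get(field) or "").strip().lower()
            if not value or value in IGNORE_VALUES:
                continue  # empty or privacy-masked: no signal
            key = (field, value)
            if key in first_seen:
                parent[find(r["domain"])] = find(first_seen[key])
            else:
                first_seen[key] = r["domain"]

    clusters = defaultdict(set)
    for domain in parent:
        clusters[find(domain)].add(domain)
    return list(clusters.values())
```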
12. War Story #1 – Election Hacking
Figure: Maltego graph from Motherboard, https://motherboard.vice.com/en_us/article/vvaxy8/evidence-linking-russian-hackers-fancy-bear-to-macron-phishing
13. War Story #1 – Election Hacking
• Trend Micro was looking for domains with “en-marche” in the name and found four.
• En Marche! said they fed fake information to the adversary.
• Contrast this with the American response.
• You COULD hack back here… but why?
• Deception carries its own dangers, though.
14. What can we do in 2018?
• Because of the sheer number of targets, any in-depth attempt to target political or election organizations will be “loud”.
• If data is shared (IPs, domains, etc.), AND you automatically block it, you can have a good layer of protection.
• Sources for shared data: MS-ISAC, DHS AIS, others…
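Before shared indicators can be blocked automatically they have to be split by type: addresses go to the firewall, names go to the DNS firewall. A minimal sketch, assuming a plain one-indicator-per-line format with `#` comments (real feeds such as AIS use STIX/TAXII, which is not parsed here):

```python
import ipaddress


def split_indicators(lines):
    """Split a shared indicator list into IP-network and domain block sets.

    Assumes one indicator per line; '#' starts a comment. Anything that is
    neither a valid IP network nor a plausible hostname is returned in
    `rejected` for a human to review instead of being blocked blindly.
    """
    networks, domains, rejected = set(), set(), []
    for raw in lines:
        item = raw.split("#", 1)[0].strip().lower().rstrip(".")
        if not item:
            continue  # blank line or comment-only line
        try:
            # A bare address becomes a /32 (or /128) network.
            networks.add(ipaddress.ip_network(item, strict=False))
            continue
        except ValueError:
            pass
        labels = item.split(".")
        if len(labels) >= 2 and all(
            label and len(label) <= 63 and label.replace("-", "").isalnum()
            for label in labels
        ):
            domains.add(item)
        else:
            rejected.append(raw.strip())
    return networks, domains, rejected
```

Anything that lands in `rejected` is exactly the “lacking context” data that deserves a human look rather than an automatic block.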
15. Malware Configs
• Every malware family has different configurable items.
• Not every configuration item is necessarily valuable for intelligence purposes. Some items may have default values.
• Free-form text fields provide interesting data that may be useful for correlation.
• The mutex can be useful for correlating binaries to the same actor.
• How do you get to the identity of someone using Cobalt Strike to attack you?
• KEY POINT: Non-operational data is still useful for intelligence purposes.
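Correlating on configuration values, while discarding the ones a builder fills in by default, can be sketched as an inverted index. The config extraction step, the field names, and the default values below are assumptions for illustration; every family needs its own extractor.

```python
from collections import defaultdict

# Values a builder ships by default carry no attribution signal. This table
# is purely illustrative, not taken from any real malware family.
DEFAULT_VALUES = {"mutex": {"Global\\DefaultMutex"}, "campaign_id": {""}}


def correlate_configs(samples, fields=("mutex", "campaign_id")):
    """Map (field, value) -> sample hashes, keeping shared non-default values.

    `samples` maps a sample hash to its already-extracted config dict.
    """
    index = defaultdict(set)
    for sha256, config in samples.items():
        for field in fields:
            value = config.get(field)
            if value is None or value in DEFAULT_VALUES.get(field, set()):
                continue  # missing or default: useless as a pivot
            index[(field, value)].add(sha256)
    # Only values seen in two or more samples link binaries together.
    return {key: shas for key, shas in index.items() if len(shas) > 1}
```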
16. Where to get Malware
• Everyone uses VirusTotal.
• You can buy a malware feed…
• Better: mine your spam / e-mail for attacks.
• This is the targeted malware no one can sell you.
• Eliminate malware already seen by VT (or other sources); what remains is unique.
• Who are the repeat visitors? Advanced attackers need to go low and slow...
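Mining your own mail for malware nobody else has can start as a set difference on hashes. This sketch assumes attachments have already been pulled out of messages and that you maintain a set of hashes previously seen in VirusTotal or other sources; it does not call any VirusTotal API.

```python
import hashlib
from collections import Counter


def unique_attachments(messages, known_hashes):
    """Find attachments never seen in external sources, plus repeat senders.

    `messages` is an iterable of (sender, attachment_bytes) pairs from your
    mail pipeline; `known_hashes` is a set of SHA-256 hex digests already
    seen elsewhere (how that set is built is assumed).
    """
    unique = {}          # digest -> set of senders who delivered it
    senders = Counter()  # sender -> count of never-before-seen files
    for sender, blob in messages:
        digest = hashlib.sha256(blob).hexdigest()
        if digest in known_hashes:
            continue  # commodity malware someone else already has
        unique.setdefault(digest, set()).add(sender)
        senders[sender] += 1
    # Senders who keep showing up with unseen files are the "repeat visitors".
    repeaters = [s for s, n in senders.items() if n > 1]
    return unique, repeaters
```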
20. War Story #2 – Understanding Locky
• Locky uses a combination of static domains and a DGA for C2.
• Has an affiliate program.
• Seems to heavily favor Necurs for delivery (but not exclusively).
22. War Story #2 – Understanding Locky
• We know there is a close relationship between Necurs and Locky. (What about specific affiliates?)
• We can see that the Locky operator likely runs C2 infrastructure on behalf of affiliates.
• This can inform prosecutorial decisions or potential “hack back” operations (e.g., stealing encryption keys).
23. Using DNS to Track the Adversary
• There are only a few ways malware can contact a C2 server:
• Static IP / Hostname Lists
• Proxied C2s
• Dynamic DNS
• Fast Flux / Double Flux Networks
• Domain Generation Algorithms
• Tor / i2p hidden services
24. Domain Generation Algorithms
Usually a complex math algorithm to create pseudo-random
but predictable domain names.
Instead of a static list, defenders now face a dynamic list of hundreds or thousands of domains, and the adversary only needs to have a couple registered at a time.
The adversary can also seek out “friendly” registrars to avoid suspension.
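To make “pseudo-random but predictable” concrete, here is a toy generator. It is a hypothetical algorithm for illustration, not any real family’s DGA: a 32-bit linear congruential generator seeded with an integer picks letters and rotates through a list of TLDs.

```python
import string

# Illustrative TLD rotation; not taken from any real family.
TLDS = ["com", "net", "org", "info", "biz", "ru"]


def toy_dga(seed: int, count: int, length: int = 12) -> list:
    """Generate `count` reproducible pseudo-random domains from `seed`."""
    state = seed & 0xFFFFFFFF
    domains = []
    for i in range(count):
        label = []
        for _ in range(length):
            # 32-bit LCG step (Numerical Recipes constants). Use the higher
            # bits, because an LCG's low bits are the least random.
            state = (1664525 * state + 1013904223) & 0xFFFFFFFF
            label.append(string.ascii_lowercase[(state >> 16) % 26])
        domains.append("".join(label) + "." + TLDS[i % len(TLDS)])
    return domains
```

The same seed always yields the same list, which is what lets defenders who have reversed an algorithm run it ahead of the adversary.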
25. Reverse Engineering DGAs
There are many blog posts about reversing specific DGAs; Johannes Bader has the most online, at his blog:
johannesbader.ch
There are no real shortcuts: work through the code in IDA or a debugger and reverse the function.
Look for functions that iterate many times.
There will be at least one function to generate the domains and one to try connecting to each of them to find the C2.
As with all reverse engineering, be aware of obfuscation and decoy
code meant to deceive you.
26. Types of DGAs
Almost all DGAs use some type of “seed”.
Types:
Date-based
Static seed
Dynamic seed
The seed has to be globally consistent so all victims use the same one at the same time.
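A date-based seed only works if every victim derives the same value, which in practice means using UTC rather than local time. The sketch below assumes a hypothetical scheme (SHA-256 of a static key plus the UTC date); it shows the consistency property, not any particular family’s seed. A dynamic seed would replace the date with something fetched at run time and is not shown.

```python
import datetime
import hashlib


def date_seed(when: datetime.datetime, salt: bytes = b"hypothetical-key") -> int:
    """Derive a 32-bit seed from the UTC calendar date of `when`.

    Converting to UTC first is what keeps the seed globally consistent:
    hosts in every time zone compute the same value for the same UTC day.
    `when` should be timezone-aware (astimezone treats naive values as
    local time). `salt` stands in for a static per-family constant.
    """
    day = when.astimezone(datetime.timezone.utc).strftime("%Y-%m-%d")
    digest = hashlib.sha256(salt + day.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")
```

For example, 01:00 on 14 Aug 2015 in Tokyo (UTC+9) and noon on 13 Aug 2015 in New York (UTC-4) both fall on 13 Aug in UTC, so both hosts get the same seed.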
27. Feed generation on DGAs
• sjuemopwhollev.co.uk,Domain used by Cryptolocker - Flashback DGA for 13 Aug 2015,2015-08-13
• meeeqyblgbussq.info,Domain used by Cryptolocker - Flashback DGA for 13 Aug 2015,2015-08-13
• ntjqyqhqwcwost.com,Domain used by Cryptolocker - Flashback DGA for 13 Aug 2015,2015-08-13
• nvtvqpjmstuvju.net,Domain used by Cryptolocker - Flashback DGA for 13 Aug 2015,2015-08-13
• olyiyhprjuwrsl.biz,Domain used by Cryptolocker - Flashback DGA for 13 Aug 2015,2015-08-13
• sillomslltbgyu.ru,Domain used by Cryptolocker - Flashback DGA for 13 Aug 2015,2015-08-13
• gmqjihgsfulcau.org,Domain used by Cryptolocker - Flashback DGA for 13 Aug 2015,2015-08-13
• From here you could easily feed this into RPZ or other technology to
protect your organization.
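Turning feed lines like these into DNS firewall rules is mostly string work. A minimal sketch, assuming the three-column `domain,description,date` layout shown above and an RPZ that answers NXDOMAIN for listed names and their subdomains:

```python
import csv
import io


def feed_to_rpz(feed_text: str, wildcard: bool = True) -> str:
    """Turn 'domain,description,date' feed lines into RPZ QNAME rules.

    In RPZ, `CNAME .` as the record data means "answer NXDOMAIN"; a `*.`
    owner name extends the rule to subdomains. Owner names are left
    relative (no trailing dot) so they fall under the RPZ zone's origin.
    """
    rules, seen = [], set()
    for row in csv.reader(io.StringIO(feed_text)):
        if not row or not row[0].strip():
            continue  # blank line
        domain = row[0].strip().lower().rstrip(".")
        if domain in seen:
            continue  # feeds often repeat entries
        seen.add(domain)
        comment = row[1].strip() if len(row) > 1 else ""
        rule = f"{domain} CNAME ."
        if comment:
            rule += f" ; {comment}"  # zone-file comment keeps the context
        rules.append(rule)
        if wildcard:
            rules.append(f"*.{domain} CNAME .")
    return "\n".join(rules) + "\n"
```

The output is meant for the body of an RPZ zone file (after its SOA and NS records). Whether to add the `*.` wildcard is a policy choice.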
28. DGA surveillance
Pre-generate all domains from 2 days in the past to 2 days in the future.
Pipe all of those domains into adnshost, using parallel to feed it a limited number of lines at a time, four jobs at once.
Able to process over 700,000 domains inside 10 minutes (and I’m not done optimizing).
• parallel -j4 --max-lines=3500 --pipe adnshost -a -f < "$domain_list" | grep -Fv nxdomain >> "$outputfile"
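The pre-generation step could look like the following, printing one domain per line for the pipeline above to read. The generator is a hypothetical stand-in; in practice you would call your own reimplementation of each reversed DGA.

```python
import datetime
import string

TLDS = ["com", "net", "org"]  # illustrative


def toy_daily_domains(day: datetime.date, count: int = 5):
    """Hypothetical date-seeded DGA standing in for a real, reversed one."""
    state = day.year * 10000 + day.month * 100 + day.day
    out = []
    for i in range(count):
        label = ""
        for _ in range(12):
            state = (1664525 * state + 1013904223) & 0xFFFFFFFF
            label += string.ascii_lowercase[(state >> 16) % 26]
        out.append(f"{label}.{TLDS[i % len(TLDS)]}")
    return out


def surveillance_window(today: datetime.date, days_back=2, days_ahead=2):
    """All domains from `days_back` days ago through `days_ahead` days ahead."""
    seen, ordered = set(), []
    for offset in range(-days_back, days_ahead + 1):
        for domain in toy_daily_domains(today + datetime.timedelta(days=offset)):
            if domain not in seen:  # dedupe while keeping day order
                seen.add(domain)
                ordered.append(domain)
    return ordered


if __name__ == "__main__":
    # One domain per line: the shape the parallel/adnshost pipeline reads.
    print("\n".join(surveillance_window(datetime.date.today())))
```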
30. What to do with this data?
• With IP addresses, you can just block them at the firewall.
• Inbound **AND** outbound traffic.
• If you control DNS, you control the endpoint. Use a DNS Firewall!
• Which means you can limit what the device can talk to in order to prevent
exploitation or command-and-control.
• DNS is on everything… even IoT devices!
31. What is a DNS Firewall?
• Uses RPZ (Response Policy Zones) or the Microsoft equivalent.
• Response Policy Zones are zone files you put into your DNS resolver
that can block, redirect, or alert on specific queries.
• Can flag on:
• Specific hostname, domain, or TLD (e.g., www.google.com or *.ru)
• The resolved IP address
• The authoritative nameserver hostnames used
• The authoritative nameserver IP addresses used
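Each trigger type has its own owner-name encoding inside the zone. A sketch covering the four listed above, IPv4 only, assuming the standard RPZ conventions: `CNAME .` for NXDOMAIN, `.rpz-ip`/`.rpz-nsip` owners written as the prefix length followed by the reversed octets, and `.rpz-nsdname` for nameserver names.

```python
import ipaddress


def _rpz_ipv4(cidr: str) -> str:
    """Encode an IPv4 CIDR as RPZ expects: prefix length, then reversed octets."""
    net = ipaddress.IPv4Network(cidr, strict=True)
    octets = str(net.network_address).split(".")
    return f"{net.prefixlen}." + ".".join(reversed(octets))


def rpz_rule(trigger: str, value: str, action: str = "CNAME .") -> str:
    """Build one RPZ record; `action` defaults to NXDOMAIN (`CNAME .`).

    IPv4 only for the address-based triggers; IPv6 uses a different encoding.
    """
    if trigger == "qname":          # the queried name itself
        owner = value.rstrip(".")
    elif trigger == "ip":           # an address in the answer
        owner = _rpz_ipv4(value) + ".rpz-ip"
    elif trigger == "nsdname":      # an authoritative nameserver's name
        owner = value.rstrip(".") + ".rpz-nsdname"
    elif trigger == "nsip":         # an authoritative nameserver's address
        owner = _rpz_ipv4(value) + ".rpz-nsip"
    else:
        raise ValueError(f"unknown trigger: {trigger}")
    return f"{owner} {action}"
```

For instance, `rpz_rule("ip", "192.0.2.0/24")` gives `24.0.2.0.192.rpz-ip CNAME .`. Other actions swap the record data: `CNAME *.` returns NODATA, `CNAME rpz-passthru.` exempts a name, and a CNAME to your own hostname redirects to a walled garden.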
32. Block Bad Neighborhoods
• There are many networks you can be fairly sure are “always” safe (e.g., CDNs).
• There are many networks you can treat as completely malicious (e.g., bulletproof hosters).
• There are some countries you may not need (or want) to talk to.
• ITAR/OFAC
• Why should your MRI machine talk to a Russian IP?
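Neighborhood policy reduces to CIDR membership checks. This sketch assumes you have already assembled trusted and blocked ranges (from CDN-published ranges, reputation feeds, or geolocation/ASN data); the ranges below are documentation placeholders.

```python
import ipaddress

# Hypothetical lists using documentation ranges as placeholders.
TRUSTED = [ipaddress.ip_network("192.0.2.0/24")]    # e.g. a CDN you rely on
BLOCKED = [ipaddress.ip_network("203.0.113.0/24")]  # e.g. a bulletproof hoster


def classify(ip: str, trusted=TRUSTED, blocked=BLOCKED) -> str:
    """Return 'block', 'allow', or 'review' for a destination address."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in blocked):
        return "block"   # checked first so a bad range wins on overlap
    if any(addr in net for net in trusted):
        return "allow"
    return "review"      # unknown neighborhood: log, alert, or apply policy
```

For a device with a narrow job, like that MRI machine, it may make sense to flip the default from `"review"` to `"block"`.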
33. War Story #3 – Operation Tovar
• CryptoLocker was one of the first successful modern ransomware attacks.
• We were able to proactively monitor all new domain registrations, mine registrant details, and ultimately get to the proxies more quickly.
• This not only allowed us to grind our way to an indictment of Evgeniy Bogachev, but also to retrieve the private encryption keys so victims could get their files back.
• We were able to do a bulk takedown and shut the whole system down.
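Proactive monitoring of new registrations comes down to matching them against what the reversed DGA says will be used next. A sketch assuming a newly-registered-domains source delivered as dicts (field names hypothetical) and a set of predicted domains:

```python
def match_new_registrations(new_regs, predicted):
    """Return registrations whose domain is one the DGA will use.

    `new_regs` is an iterable of dicts from a newly-registered-domains
    source; `predicted` is the set of domains your reimplemented DGA
    generates for the coming days.
    """
    predicted = {d.lower().rstrip(".") for d in predicted}
    return [reg for reg in new_regs
            if reg["domain"].lower().rstrip(".") in predicted]


def registrant_pivots(hits, field="registrant_email"):
    """Collect distinct registrant values to pivot on from matched records."""
    return sorted({h[field] for h in hits if h.get(field)})
```

The registrant values that come back are the pivots for finding the same actor’s non-DGA infrastructure.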
34. Tracking Malware Functions
• We have tools to correlate IP addresses, domains,
registration information, malware families, malware
configs…
• What about specific functions or portions of code?
• The more we can correlate, the more we can get visibility
into how code is shared, developed, and the ecosystem
behind it.
35. FIRST IDA Plugin
• Developed by Cisco Talos: https://github.com/vrtadmin/FIRST-plugin-ida
• In essence, ties a database into IDA so you can search for functions that exist elsewhere to find code level relationships.
• Presentation: https://www.botconf.eu/wp-content/uploads/2016/11/PR11-Function-Identification-and-Recovery-Signature-Tool-Villegas.pdf
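The underlying idea, a shared database keyed on function content, can be sketched in a few lines. This is a simplified, hypothetical scheme (a hash of the instruction mnemonics, operands dropped), written in the spirit of FIRST but not its actual engines or API; the mnemonic lists are assumed to come from a disassembler export.

```python
import hashlib
from collections import defaultdict


def function_key(mnemonics):
    """Hash a function's instruction mnemonics, ignoring operands.

    Dropping operands lets the key survive relocation and different
    addresses; it also makes short functions collide, so tiny ones are skipped.
    """
    return hashlib.sha256(" ".join(mnemonics).encode()).hexdigest()


class FunctionIndex:
    """Toy in-memory stand-in for a shared function database."""

    def __init__(self, min_len=10):
        self.min_len = min_len
        self.db = defaultdict(set)  # key -> {(sample, function_name)}

    def add_sample(self, sample, functions):
        """`functions` maps function name -> list of mnemonics."""
        for name, mnemonics in functions.items():
            if len(mnemonics) >= self.min_len:
                self.db[function_key(mnemonics)].add((sample, name))

    def lookup(self, mnemonics):
        """Return every (sample, function) already indexed with the same key."""
        if len(mnemonics) < self.min_len:
            return set()
        return set(self.db.get(function_key(mnemonics), set()))
```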
37. War Story #4 – WannaCry
• We all know WannaCry: worm-based ransomware using publicly leaked exploits (Thanks NSA!).
• Very quickly we noticed that the payment infrastructure was not sound (nor was NotPetya’s).
• What’s the point of cryptographic ransomware if you aren’t getting paid? (It made only about USD 100k.)
38. War Story #4 – WannaCry
Figure: 40-byte code reuse from a Lazarus backdoor (from Costin Raiu on Twitter)
39. War Story #4 – WannaCry
• 40 bytes of code were identical to a Lazarus Group (DPRK) backdoor used in 2015.
• Found by “spot checking” and memory.
• This is not ideal.
• Not found anywhere else.
• Inconclusive, but suggests DPRK (since proven).
• We NEED to figure out a way to make this a database search problem, not a tribal-lore-in-analysts’-minds problem.
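Turning “spot checking and memory” into a search can start with something this simple: index every 40-byte window of one sample and look up the windows of another. It is a brute-force, two-sample sketch, not how the WannaCry overlap was actually found; a corpus-wide version would need a persistent index, and matches still need review because compilers and static libraries also produce identical byte runs.

```python
import hashlib


def shared_windows(sample_a: bytes, sample_b: bytes, size: int = 40):
    """Return (offset_in_a, offset_in_b) for each identical `size`-byte window.

    Indexes a digest of every window in `sample_a`, then slides over
    `sample_b` looking each window up. Brute force, two samples at a time.
    """
    index = {}
    for i in range(len(sample_a) - size + 1):
        key = hashlib.blake2b(sample_a[i:i + size], digest_size=16).digest()
        index.setdefault(key, i)  # remember the first offset only
    matches = []
    for j in range(len(sample_b) - size + 1):
        key = hashlib.blake2b(sample_b[j:j + size], digest_size=16).digest()
        if key in index:
            matches.append((index[key], j))
    return matches
```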
40. Last Key Point
• Ending this talk with WannaCry and NotPetya was intentional.
• Most of the techniques here are useful for tracking crime.
• Increasingly, however, APT actors are using crime tools as “obfuscation”.
• WannaCry and NotPetya (if we’re right) are precursors to future APT attacks using criminal tools.
• What if our research leads to a kinetic response?
• We need to get the above right to disambiguate their intentions and to find investigative leads and potential weaknesses (hack back?)
41. Solution
• Many of us are working on the same problems independently; we need to work together more and share data.
• The point of sharing data isn’t to contribute more to “admiring the problem”. We need to block stuff.
• Back to the Pyramid of Pain: block as much as you can, as low as you can, to focus limited people/resources on “what’s left”.