Performing Network & Security Analytics with Hadoop
1. Performing Network & Security Analytics
with Hadoop
Travis Dawson
Director of Product Management, Narus, Inc.
Hadoop Summit 2012
2. Agenda
Who am I, What do I do
What is Network & Security Analytics
Using Hadoop in Network & Security Analytics
What becomes possible with Big Data Analytics
Putting it all together
Lessons Learned
Narus | 2
3. Who am I
What do I do
Geek
Director of Product Management, Narus Inc
– Narus Inc, a wholly owned subsidiary of Boeing
– Build High Performance Network Intelligence Systems
– I herd cats and make PowerPoints all day
– Occasionally think about product requirements
Principal Member Technical Staff, Sprint
– Sprint Advanced Technology Labs
– Wireline/Wireless Network Architecture, Design, Security
– I broke stuff
4. What is Network & Security Analytics
A type of voodoo, but with computers
The (black) art of finding malicious or problematic
sessions in a mountain of network traffic
Multiple approaches
– Signatures/Blacklists
– Behavior
– Algorithmic
– Ouija Board, Live Chicken, Full Moon, etc.
Single Goal
– Identify malicious or problematic traffic before it causes
substantial harm to your network or your assets.
5. Network & Security Analytics
What’s working against you
The enemy is ever-changing and infinitely intelligent
New attack vectors are more difficult to detect than ever
– Polymorphic, Randomized
– APTs are real
– Zero-Days
– Protocol, Application, OS
Traditional Methods are ineffective
– Payloads ever changing
– Simply too many new and existing
Higher link speeds make deeper analysis harder
– A 10 Gb/s link maxes out at ~15M packets per second (minimum-size frames)
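The ~15M figure can be checked with a little arithmetic. The sketch below assumes standard Ethernet framing (64-byte minimum frame, 8 bytes of preamble and start-of-frame delimiter, 12-byte inter-frame gap); the slide does not state these values, and the function names are illustrative only.

```python
# Line-rate packet budget for Ethernet at minimum frame size.
# Assumption (not from the slides): standard Ethernet overhead of
# 8 bytes preamble/SFD plus a 12-byte inter-frame gap per frame.

LINK_BPS = 10_000_000_000      # 10 Gb/s
MIN_FRAME_BYTES = 64           # minimum Ethernet frame
PREAMBLE_SFD_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def max_packets_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Packets per second when the link is saturated with equal-size frames."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


def ns_per_packet(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Time budget per packet, in nanoseconds, at line rate."""
    return 1e9 / max_packets_per_second(link_bps, frame_bytes)


pps = max_packets_per_second(LINK_BPS)
print(f"{pps / 1e6:.2f} Mpps, {ns_per_packet(LINK_BPS):.1f} ns per packet")
# prints "14.88 Mpps, 67.2 ns per packet"
```

At roughly 67 ns per packet there is very little time for deep inspection inline, which is why the heavier analysis has to happen elsewhere.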
6. What is Network & Security Analytics
Finding the Needle in a stack of Needles
Where to look
– Which stack of Needles do I need to look at
What are you looking for
– Do you know?
– Are you guessing?
– Do you know what you are NOT looking for?
How to find something that is not ‘right’
– What is ‘right’, what is ‘not-right’, what is ‘wrong’?
– What is the difference?
– What is ‘normal’ vs what is ‘right’ ?
– How much data do you need ?
7. What is Network & Security Analytics
Solving the Network & Security Analytics Problem
Multiple Methods, Multiple Algorithms, Multiple
Passes Per Analytic
You need a lot of data to determine what is ‘not-right’
– More data == More accurate results
You need to run sophisticated algorithms across the data
– Use new algorithms to find something ‘not-right’
– Not always easy
You need multiple passes on the data
– One Algorithm feeds the next Algorithm
– Focus on the workflow, how an analyst would work.
8. Breaking out of the SQL Prison
A quick rant
SQL has been around since the 1970s
– So have I!
– Great for solving ‘known’ problems
Unable to perform the deep analytics required
– At times, no combination of SELECT, JOIN, and UDF will get you
what you need
– Unstructured data is a nightmare and now more common
However, use of one tool does not mean you can’t use
another tool as well
– SQL and Hadoop can live very happily together
– The right tool for the right job, or more precisely:
• The right tool for the right PART of the job
9. Network & Security Analytic
Using Hadoop to solve the hard problems
Amount of Data
– 1 week to 1+ month of data: hundreds of billions of sessions,
hundreds of TB of data, ingesting dozens of data types and millions
of sessions per hour
Algorithms
– Looking for sessions that look something like this thing or maybe
unlike this other thing. You can do that right???
Unstructured
– We have no idea what we are going to get in terms of
information
Price per Analytic Hour
– How much does it cost to run this analytic in a set amount of time
10. Network & Security Analytic
A Simple Workflow Example
Find a Polymorphic BotNet/Worm infection vector
Find the suspected infected hosts
– Clustering/Behavior/Signatures to find possible bots and worms
Find the Command & Control
– From list of suspects, who are the most popular ‘servers’
Find ALL of the possible infections
– From C&C servers, what hosts were communicated with
– Cluster and group similar hosts to find even more
Find the Infection Vector
– From all the suspect hosts, cluster hosts by common Application
‘features’ and traffic patterns
You need a LOT of data and it’s non-deterministic
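The middle two steps of this workflow (find the Command & Control, then find all possible infections) can be sketched on toy data. This is a minimal in-memory illustration, not the Hadoop jobs described in the talk: the session records, host names, and function names are hypothetical, and the initial suspect list is assumed to come from the earlier clustering/behavior/signature step.

```python
from collections import Counter

# Hypothetical session records: (client_host, server_host).
SESSIONS = [
    ("h1", "cc1"), ("h2", "cc1"), ("h3", "cc1"),
    ("h1", "web"), ("h4", "web"), ("h5", "cc1"),
    ("h2", "mail"), ("h6", "web"),
]


def popular_servers(sessions, suspects, top_n=1):
    """Rank servers by how many distinct suspect hosts contacted them."""
    pairs = {(client, server) for client, server in sessions if client in suspects}
    counts = Counter(server for _, server in pairs)
    return [server for server, _ in counts.most_common(top_n)]


def hosts_contacting(sessions, servers):
    """Every host that talked to any of the given servers."""
    return {client for client, server in sessions if server in servers}


# Assumed output of the first step (suspected infected hosts).
suspects = {"h1", "h2", "h3"}

cc = popular_servers(SESSIONS, suspects)     # candidate C&C servers
infected = hosts_contacting(SESSIONS, cc)    # widened set of possible infections
print(cc, sorted(infected - suspects))       # newly implicated hosts
```

Here the widening step implicates h5, which was not in the original suspect list. On real traffic a popular server may simply be a popular legitimate service, so each pass narrows candidates rather than proving anything.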
11. Network & Security Analytic
Workflow details
What Makes This Work
Hadoop Tools/Methods Used
– Entropy, FFT, Behavior Jobs
– Mahout (Clustering and Machine Learning)
– Custom Clustering (Hourglass Co-Clustering)
– Custom Correlation
Other Tools Used
– Streaming Classification/Statistics Engine
– RDBMS
– Visualization Front End
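As an illustration of what an entropy job might compute, here is a minimal sketch of Shannon entropy over a per-host feature. The slides do not say which feature the entropy is taken over; destination ports are used here purely as an assumption, and the sample data is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical destination ports observed for two hosts.
scanner_ports = list(range(1000, 1016))   # 16 distinct ports, one session each
web_client_ports = [443] * 15 + [80]      # almost all sessions to one port

print(shannon_entropy(scanner_ports))     # 4.0 bits: uniform over 16 ports
print(shannon_entropy(web_client_ports))  # well under 1 bit
```

A host whose port entropy is far above its peers is a candidate for the clustering and behavior passes; in a MapReduce setting the same function would typically run in a reducer keyed by host.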
12. Network & Security Analytic
In real life
Many tools enabling each other
[Diagram: pipeline stages, left to right]
– I need to capture the traffic: Packets, Capture
– I know what I am looking for: Metadata, Streaming Analysis (Shallow Analysis)
– I don’t know what I am looking for: Datasets, Hadoop (Deep Analysis)
– I need to organize the findings: Summary, RDBMS
– I need to view the findings logically: Views
13. Lessons learned
How we learned to make it all work
Don’t use a hammer when you need a scalpel
– It just doesn’t work, don’t force it.
– If there is a better way of doing it, use that way
Hadoop does a lot of things really well
– Complicated algorithms over vast amounts of data
– Unstructured Data
Hadoop does some things really poorly
– Low Latency results for visualization
– Simple Statistics and some groupings
Use Hadoop in conjunction with other tools
– Use the best tool for the job.
– Break the job into pieces and evaluate the tools for each piece
14. Conclusion
Hadoop as a platform for Network Security Analytics
Hadoop has allowed us to solve problems for our
customers that were previously unsolvable in a
reasonable amount of time
New algorithms and analytics were made possible by
Hadoop
By using Hadoop in conjunction with our Streaming
Engine and an RDBMS we were able to create a system
that performed better than just the sum of its parts.
We are now able to scale into larger datasets and extract
even better insights than before
No longer confined by any tool, we leverage the power of
Hadoop to solve many of our problems