Consider this: each day, billions of packets, both benign and malicious, flow in and out of networks. The ability to survive the sheer volume of data, bring the NETFLOW data to rest, enrich it, correlate it and perform analysis is an essential task of the modern Defensive Cyber Security Organization. SHERPASURFING is an open source platform built on the proven Cloudera stack, enabling organizations to perform the Cyber Security mission at scale and at an affordable price point. This session will include an overview of the solution, a presentation of its components and a demonstration of analytics.
Hadoop World 2011: Sherpasurfing - Wayne Wheeles
1. "Everest for me, and I believe for the world, is the physical and
symbolic manifestation of overcoming odds to achieve a dream."
~Tom Whittaker
Sherpasurfing
Open Source Cyber Security Solution
Wayne Wheeles, Six3 Systems
Active Defensive Analytic Developer
Hadoop World 2011
cloudera Six3 Systems
2. About me
• Employed by Six3 Systems, based in Fulton MD
• Defensive Analytic Developer (CND-OPS)
• Decade of solutions in Analytics & Big data
• 18 analytics in production, 12 forms of enrichment
3. What is SHERPASURFING
Open source Cyber Security Solution, providing a
framework, base set of proven services, data sources,
how-to guides and patterns for analytic development
built on top of the Apache Hadoop stack
4. Topics
• The Problem
• Is there a better way?
• Conclusions
• Questions
6. Forces and Economics driving Cyber Security
Hackers…
Potential Victims…
Defenders…
7. A Story: Aftershock Widget Corporation
PROFILE
• Software development firm that develops applications for a variety of different platforms
WHAT HAPPENED
• A series of bizarre fraudulent charges appeared on Aftershock Widgets credit cards
  (2009: Credit card theft victimized 11.1M Americans costing 54B)
• The next generation application that Aftershock has designed is stolen and hits market six months before release by a competitor
  (2009: Intellectual Property cost the economy 1.2T dollars annually)
• During peak ordering season web-traffic grinds to a halt
  (In a recent poll 94% of respondents stated that a DDOS was a major concern)
8. Aftershock Widget Corporation calls for HELP!
We are going to bring in some smart people
Who brought in more smart people
Who brought in even more smart people!
Who brought in some off the shelf technology solution, resulting in a
clear and compelling roadmap for the organization
MARKETECTURE
Their solution was butts in seats,
big iron and huge integration costs
9. Is there a better way?
10. Driving tenets of SHERPA:
• Cost effective scaling for handling BIGDATA
• Brings all data together
• Must support all forms of data
• Must be question agnostic
• Foster sharing and exchanging of analytics/tradecraft
11. Assess the environment:
• What are my data sources, how much data, how fast?
• What are the data formats?
• What do I need to know from the collected data?
• What do I do with the information?
• Who needs the results and what do we do with results ?
12. Enough Talk! Let’s get Started
TODO List
1.) Need some commodity hardware, configure network
    X 5, each with:
    32GB RAM
    4x300GB 6G SAS 15k HDD
    8-Core AMD Opteron Processor Model 6128 (2.0GHz, 80W)
2.) Provision them with RHEL X86_64 Server 6.1
3.) Install JDK 1.6u26
4.) Install the Cloudera CDH3U1, Enterprise 3.5.2
5.) Configure HDFS, HBASE, ZOOKEEPER, FLUME
    HDFS - 4 Data Nodes/Task Trackers, Name Node
    HBASE - 1 Master, 4 Region Servers
    Zookeeper - 4 Zookeeper servers
    HUE - 1 HUE, 4 HUE Agents
    FLUME - 1 Master, 2 Nodes
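A back-of-the-envelope check of what this hardware list provides can be sketched in a few lines of Python. The node, disk and data node counts come from the slide; the replication factor of 3 is an assumption (the stock HDFS default, not stated in the deck), and the function names are hypothetical.

```python
# Hardware and role counts are copied from the slide.
NODES = 5              # commodity boxes
DISKS_PER_NODE = 4     # 4 x 300GB SAS drives per box
DISK_GB = 300
DATA_NODES = 4         # HDFS data nodes / task trackers

# Assumption: stock HDFS default replication; the slide does not state it.
HDFS_REPLICATION = 3


def raw_capacity_gb(nodes: int, disks: int, disk_gb: int) -> int:
    """Total raw disk across a set of identical machines."""
    return nodes * disks * disk_gb


def usable_hdfs_gb(data_nodes: int, disks: int, disk_gb: int, replication: int) -> float:
    """Upper bound on unique data HDFS can hold, ignoring OS and temp space."""
    return raw_capacity_gb(data_nodes, disks, disk_gb) / replication


if __name__ == "__main__":
    print("raw GB across cluster:", raw_capacity_gb(NODES, DISKS_PER_NODE, DISK_GB))
    print("usable HDFS GB (upper bound):",
          usable_hdfs_gb(DATA_NODES, DISKS_PER_NODE, DISK_GB, HDFS_REPLICATION))
```

Under these assumptions the five boxes hold 6000 GB raw, and the four data nodes can hold at most 1600 GB of unique data, which is the kind of figure worth knowing before deciding how many days of each feed to retain.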
13. Sources, Sinks and Agents
TODO List
1.) Identify data sources of potential value
2.) Install FLUME on each source from Cloudera Enterprise
3.) Configure FLUME Agent to tail a file or directory for each source
4.) Test each data source to ensure data is being collected correctly using command line (dump)
    $ flume dump 'text("/cp/10/21/0800/current.log")'
5.) Configure the sink (destination) for each FLUME Agent
[Diagram: Aftershock Corporate Internet Gateway 1]
Assets behind the gateway: Users, Intellectual Property, Protected Data
Sources, each feeding a sink:
• User Logging
• Corporate App Server(s): Log data
• Intrusion Detection System(s): IPS/IDS signature hits
• Corporate Firewall: Firewall logs
• Flow Capture: Flow Data
• Packet Capture
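Step 4 has to be repeated for every source, so it may help to generate the test commands rather than type them. The sketch below only builds command strings in the same form as the `flume dump` line on the slide; it does not invoke Flume. Apart from the first path, which is the one shown on the slide, the source names and log paths are hypothetical placeholders.

```python
import shlex

# Only the first path appears on the slide; the others are assumed examples.
SOURCES = {
    "firewall": "/cp/10/21/0800/current.log",
    "ids": "/var/log/snort/alert",            # hypothetical path
    "app_server": "/var/log/app/server.log",  # hypothetical path
}


def dump_command(path: str) -> str:
    """Build a 'flume dump' test command for one log file, shell-quoted."""
    source_spec = f'text("{path}")'
    return f"flume dump {shlex.quote(source_spec)}"


if __name__ == "__main__":
    for name, path in SOURCES.items():
        print(f"# {name}")
        print(dump_command(path))
```

Printing the commands instead of executing them keeps the check reviewable: an operator can paste each line on the relevant box and confirm that events scroll by before configuring the sink.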
14. The Pieces come Together
[Diagram: SHERPA – Analytic Framework]
SHERPA Components: HUE, HBASE, PIG, HIVE, SQOOP, HDFS, ZOOKEEPER
Enrichment: GEO Enrichment, Port Enrichment, Protocol Enrichment
Data Sets (STATS):
• 30 days 61,764,205 netflows
• 30 days 1,065,977 SNORT
• 30 days 4,065,977 Firewall
Data held in HDFS: Packet Data, Firewall Logs, Netflow Data, Application Server Logging, User Logging, IDS/IPS Logs
Sinks (one per source): User Logging, App Server Logging, Netflow, Packet Capture, Firewall Logs, IDS/IPS Logs
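The deck names port and protocol enrichment but does not show how they work. One plausible reading is a lookup that decorates each flow record with human-readable names, sketched below. The `Flow` record layout and the `enrich` function are hypothetical and are not taken from the SHERPA toolkit; the protocol numbers and ports are standard IANA assignments, reduced to a small subset.

```python
from dataclasses import asdict, dataclass

# Small subsets of the IANA protocol-number and well-known-port registries.
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}
PORTS = {22: "ssh", 25: "smtp", 53: "dns", 80: "http", 443: "https"}


@dataclass
class Flow:
    """Hypothetical minimal netflow record."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: int


def enrich(flow: Flow) -> dict:
    """Return the flow as a dict with protocol and service names added."""
    record = asdict(flow)
    record["protocol_name"] = PROTOCOLS.get(flow.protocol, "unknown")
    record["service"] = PORTS.get(flow.dst_port, "unknown")
    return record


if __name__ == "__main__":
    print(enrich(Flow("10.0.0.5", "192.0.2.10", 443, 6)))
```

In a deployment like the one on the slide, the same lookup would more likely run as a Flume decorator or a Pig/Hive join against a reference table; the in-memory dictionaries here only illustrate the idea.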
15. Develop and Deploy Analytics
[Diagram: SHERPA – Analytic Framework]
Analytic steps: Risk Potential Index, Correlate, Perform Enrichment, Report Analytic Results
Example analytic: Flow Characterization Analytic
Analytic Runtime Environment
Services: Health & Status, Analytic Registry, CORE Services, Data Services, Job Control
SHERPA SDK
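To make the "correlate" and "risk potential index" boxes concrete, here is a minimal sketch of an analytic that joins flow records with IDS alerts on source IP and ranks the results. The scoring rule, the `alert_weight` parameter and the record fields are all assumptions for illustration; the deck does not describe how SHERPA computes its index, and a production analytic would presumably run as a Pig or Hive job over HDFS rather than in memory.

```python
from collections import Counter


def risk_index(flows, alerts, alert_weight=10):
    """Rank source IPs by flow count plus a weighted IDS alert count.

    flows and alerts are iterables of dicts with a 'src_ip' key
    (hypothetical layout). Returns (ip, score) pairs, highest score first.
    """
    flow_counts = Counter(f["src_ip"] for f in flows)
    alert_counts = Counter(a["src_ip"] for a in alerts)
    scores = {
        ip: flow_counts[ip] + alert_weight * alert_counts[ip]
        for ip in flow_counts.keys() | alert_counts.keys()
    }
    # Sort by descending score, then by IP so ties are deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    flows = [{"src_ip": "10.0.0.5"}] * 3 + [{"src_ip": "10.0.0.9"}]
    alerts = [{"src_ip": "10.0.0.5"}]
    for ip, score in risk_index(flows, alerts):
        print(ip, score)
```

The point of the sketch is the shape of the computation: neither data set alone singles out a host, but bringing them together does, which is the argument the deck makes for landing all sources in one place.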
16. SHERPASURFING Toolkit
• FLUME Sinks, Decorators
• HBASE Object Definitions
• Multiple forms of Enrichment
• SHERPA Developers Guide and Cookbook
• Two Sample Analytics
• Enterprise Analytic Framework
18. The Wrap-up
• The threat is very real, well funded and determined
• The problem has an incredible, often hidden impact
• Apache Hadoop stack provides an effective foundation
• SHERPA solution builds on that stack
• Provides a framework for Cyber Security Analytics
19. sherpasurfing@gmail.com
QUESTIONS?
“Imagination is more important than knowledge. For knowledge is limited to all we
now know and understand, while imagination embraces the entire world, and all
there ever will be to know and understand.”
~Albert Einstein
Editor's Notes
On an opening note, please hold all questions until the end. Two days ago I was practicing this presentation with my daughter. I opened with “This presentation has been 20 years in the making,” to which my daughter replied, “It took you twenty years to put together a PowerPoint?” OK, got that out of the way. SHERPASURFING takes all of the knowledge that has been captured over those 22 years and puts it to work. I love quotes, and this one is a personal favorite that explains my view of SHERPASURFING. An open source Cyber Security platform is representative of my personal Everest, and of the fact that I am in no shape to scale Everest.
I work for Novii Design, based in scenic Fulton, Maryland, a beautiful place between Washington and Baltimore. My name is Wayne Wheeles. I am a Cyber Security Defensive Analytic Developer (Active Defense) for Novii Design, based in Fulton, MD. Over the last twenty years, I have worked as a software developer, Cyber Security analytic developer, data scientist, architect and database engineer for a variety of customers in the public and private sector. I have worked with high speed data feeds, quick response analytics and BIG DATA over 2.5P. I have over 18 analytics in production and 12 different forms of data enrichment.
Equally important to what SHERPASURFING is, is answering “Why did you do this?” I attended a Cyber Security conference earlier this year in Nashville, where I was presenting on a very large Cyber Security solution I was working on. Over three days I was approached repeatedly by customers from small to medium sized organizations asking about the potential for a cost effective solution. SHERPASURFING, or SHERPA for short, is an open source solution providing a framework, a foundation set of services, data sources, cookbooks and patterns for development. The purpose of all of these components is the design, development and deployment of analytics, with an ultimate objective of exchanging tradecraft/analytics. It is based on the proven Apache Hadoop stack. The goal is a cost effective solution that medium sized organizations can afford and that will handle:
- Billions of records each day
- 10 to 50 Terabytes of data each day
This afternoon we are going to discuss the problem with a focus on doing something about it; this is not a problem appreciation exercise. We will investigate the potential provided by SHERPA and projects like it to explore the question “Is there a better way?” We will circle back with conclusions. Then I will happily entertain questions and discussion.
I view Cyber Security as one of the central challenges facing public and private organizations, as well as individuals, over the next decade. I wanted to open this afternoon with a brief introduction to the problem space posed by Cyber Security, something to provide an initial context.
In order to complete this story I wanted to level set on the forces and economics driving Cyber Security. Let's open by discussing hackers and hacking. The commonly held opinion is that hackers are either script kiddies or Jedi Knights.
Hackers
- They have created their own underground economy: well financed, well trained, well organized and dedicated (over 1 trillion dollars annually, estimated 2009)
- The cost of hacking has declined significantly over the last five years: tools, tradecraft, techniques
- They are driven by a pervasive perception that they cannot be caught
- The hackers play outside the box
Potential Victims
- The cost to “defend yourself” has exploded several hundred percent; the Cyber Security Defense industry was a reported 1.4T in 2009
- An effective solution to this threat must be comprehensive and include all of the staff, not just Cyber Security professionals
- Companies' bottom lines are being squeezed, so they do not have the resources to face the threat/risk
Defenders
I met with a group of defenders who likened it to “putting on a hamburger necklace and jumping in a bear cage”.
- Defenders who are well versed in the threat are few and far between, since the threat is not static
- Tools are available at a cost that is generally unaffordable, or they address only one element of the threat
- The tools do not bring the data together to see what is going on across the enterprise
- Defenders are too often not trained to harden the environment
- Defenders often opt to use out of the box solutions
I would like to share a brief story that will illustrate how companies/organizations learn about Cyber Security. Aftershock Widget Corporation is representative of so many companies/organizations across the country and the globe. They are a small company that develops software for a variety of platforms. Their education on Cyber Security begins with a series of bizarre credit card charges (anyone familiar with this? Show of hands). Next, Aftershock's next generation app that is on the drawing board is released six months earlier by a competitor. We hear so much about Web site defacement and hear so very little about economic espionage or theft of intellectual property. Finally, who has spent some time in a back up lately? Excellent, now you understand what a Denial of Service is. This is a resource exhaustion attack vector. And what is the result? In one recent study it cost 6.3M for each 24hr period of outage. So what did Aftershock do?
Aftershock called for help and, like so many companies/organizations, followed a similar design pattern. We called in some smart people, who brought in some more smart people, who brought in even more smart people, who brought in an off the shelf solution resulting in a clear and compelling roadmap for the company. The solution was butts in seats, big iron, elongated timelines/delivery schedules and huge integration costs. Any time a solution is measured in pallets of money you are in trouble. This is marketecture.
If I learned one thing from being with Six3 it is Practical Cyber….. I want to share this with you today, and the SHERPA solution is built on its tenets. Really, the realization that I share with you is that there has to be a better way. SHERPA is a simple attempt to provide an alternative that investigates/explores/challenges this very question.
So, based on over a decade of experience as a developer and countless interviews with customers, I arrived at a set of driving tenets for the solution:
- I have to provide a solution that I can afford to scale and grow from gigabytes of data to petabytes of data without starting over again each year
- It has been my experience that you must bring the data together for real discovery to occur. I must be able to correlate what I am seeing from one data source with data from other sources
- My solution must support structured, unstructured, high volume, low volume, schema and non-schema data
- My solution must be question agnostic because the questions are ever-changing (today malware, tomorrow DDOS, next week exfiltration)
- My solution must offer the potential for sharing of analytics and tradecraft across the organization and potentially across the Cyber Security community (stretch goal)
Take a look at the environment to determine what sources are available and how much data they produce. What format is the data in? Is it ASCII, binary, files, streams? What do I need to derive from the data? What do I do with the information? Who do I send it to? Say I found out that there is someone exfiltrating information. Who would I inform? Corporate? Law enforcement? What do we do then? When you turn over a rock there is no telling what you will find.
Six3 Systems was founded on the principle of “STOP TALKING AND START DOING”. So SHERPA is a Larry the Cable Guy approach to Cyber Security. SHERPA provides a specification for the hardware, a deployment plan and a tuning guide. This is the simple checklist that was derived from building our development environment. We took some commodity hardware (5 commodity boxes and a switch, 15000 in hardware) and provisioned it with Red Hat 6.1. We installed the JDK. We installed all of our components based on the SHERPASURFING deployment plan. My environment is now up and running in a day or two and is ready for data!
As we mentioned earlier, I identify the sources of data that I need to be piped into the system. I inventory the things I need to defend. I inventory where sensitive data is located on my network: User Data/Personal Identifying Information, Intellectual Property and Protected Data. I inventory the sources available to defend my network: What are the sources? Where are they? What do they tell me?
We started at our connection to the internet, which is Aftershock Corporate Internet Gateway 1. This gateway provides the customer user domain with access to applications, websites, email, resources, Webex and, increasingly, VOIP. As we covered earlier, it will also provide potential adversaries with access to my Intellectual Property, User Sensitive Data, Trade Secrets and Protected Data.
Traditionally, moving data from the outer defenses to the point of analysis has been a major task (hand scripted, scp, cron jobs). FLUME, which is a log and data collector framework, is awesome. It took what was a several week function down to a day or two. All I need to deliver data to my analysis platform is:
1.) Install FLUME on the boxes to be collected from
2.) Use built in functionality and test by dumping the data
3.) Configure the “SINK” or destination for each of the Agents
Reference Architecture with services inlaid
In this slide we demonstrate an analytic on top of the stack. We are finalizing the analytic to be incorporated.