Using Machine Learning to
support Information Security
Alexandre Pinto
alexcp@mlsecproject.org
@alexcpsec
@MLSecProject
Proving Ground (Many Thanks to Joel Wilbanks)
• This is a talk about DEFENDING not attacking
– NO systems were harmed in the development of
this talk.
– This is NOT about some vanity hack that will be
patched tomorrow
– We are actually trying to BUILD something here.
• This talk includes more MATH than the daily
recommended consumption by the FDA.
• You have been warned...
WARNING!
• 12 years in Information Security, done a little bit of
everything.
• Past 7 or so years leading security consultancy and
monitoring teams in Brazil, London and the US.
– If there is any way a SIEM can hurt you, it did to me.
• Researching machine learning and data science in
general for the past year or so. I participate in
Kaggle machine learning competitions (for fun, not
for profit).
• First presentation at a real InfoSec conference! (give
or take a few hours)
Who’s Alex?
• The elephant in the room
• Enter Machine Learning
• Principles and Kinds of ML
• ML and InfoSec
• MLSec Project
• How to get started?
• Take Aways
Agenda
The elephant in the room
• “Internet-scale companies”
• “Machine learning systems automatically
learn programs from data” (*)
• You don’t really code the program, but it
is inferred from data.
• Intuition of trying to mimic the way the
brain learns: that’s where terms like
artificial intelligence come from.
Enter Machine Learning
(*) CACM 55(10) - A Few Useful Things to Know about Machine Learning
• Sales
• Trading
• Image and Voice Recognition
Applications of Machine Learning
• Fraud detection systems:
– Is what he just did consistent with
past behavior?
• Network anomaly detection (?):
– NOPE!
– More like statistical analysis, and bad
analysis at that
• Predicting likelihood of attack
actors
– Create different predictive models
and chain them to gain more
confidence in each step.
Security Applications of ML
• SPAM filters
• Data Mining:
How to do Machine Learning?
• Exploring the space:
• Supervised Learning:
– Classification (NN, SVM, Naïve Bayes)
– Regression (linear, logistic)
• Unsupervised Learning:
– Clustering (k-means)
– Decomposition (PCA, SVD)
Kinds of Machine Learning
Source – scikit-learn.github.io/scikit-learn-tutorial/
• Paper from Microsoft Research circa Sept’98!
• (Thanks, Wikipedia!)
Kinds of ML: Naïve Bayes (SPAM filters)
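A minimal sketch of the idea behind a Naïve Bayes SPAM filter, using only the Python standard library. The four training messages, the `train` and `classify` helpers, and the choice of Laplace (add-one) smoothing are all assumptions made for illustration; they are not taken from the paper mentioned on the slide.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class. messages is a list of (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Start from the log prior P(label).
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # "Naive" step: treat words as independent given the label.
            # Add-one smoothing avoids zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy training set, invented for this example.
TRAINING = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train(TRAINING)
print(classify("buy cheap pills", model))
print(classify("monday meeting", model))
```

Real filters train on far more mail and use better tokenization, but the scoring step has the same shape.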
• One of the simplest examples of ML
• Try to infer a relationship between a result variable (y)
and a linear combination of others (x), minimizing the
“squared error” (distance measurement)
Kinds of ML: Linear Regression
Jesse Johnson – shapeofdata.wordpress.com
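The "squared error" idea can be sketched for the simplest case, one input variable, with the closed-form least-squares solution. The data points and the helper names `fit_line` and `squared_error` are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, roughly following y = 2x + 1 with some noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

slope, intercept = fit_line(xs, ys)
best = squared_error(xs, ys, slope, intercept)
print(slope, intercept, best)
```

Any other line through the same points, for example one with a slightly different slope, gives a larger squared error than the fitted one.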
Kinds of ML: SVM FTW!
• One of my favorite algorithms!
• Support Vector Machines (SVM):
– Good for classification problems with numeric features
– Few parameters, which helps control overfitting; regularization
is built into the model; usually robust
– However, sometimes slow to train (depends on the number of points
and the number of features)
– Also awesome: hyperplane separation in an implicit, possibly
infinite-dimensional feature space (the kernel trick).
Jesse Johnson – shapeofdata.wordpress.com
No idea… Everyone copies this
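A rough sketch of what "hyperplane separation" with built-in regularization looks like in code: a linear soft-margin SVM trained by subgradient descent on the hinge loss. This is an assumption-laden toy and not how production libraries train SVMs. There is no kernel here, and the data, the learning rate `lr`, the regularization strength `lam`, and the helper names are all made up for illustration.

```python
def train_linear_svm(points, labels, epochs=100, lr=0.1, lam=0.01):
    """Subgradient descent on lam/2 * |w|^2 + hinge loss. Labels are +1/-1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: always shrink the weights a little.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Point is inside the margin (or misclassified): push the
                # hyperplane away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 does x fall on?"""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data.
points = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -1.0)]
labels = [1, 1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```

The kernel versions mentioned on the slide replace the dot product with a kernel function, which is what lets the separating hyperplane live in a much higher-dimensional space.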
• SIEM and Log Monitoring tools are just vertical BI
applications (from the 90’s)
• “I don't have time for your marketing hype!” – Infosec
• How many logs do you think there are in your
organization?
ML and Infosec
InfoSec Data Scientists
Data Science Venn Diagram by Drew Conway
• “Data Scientist (n.): Person who is better at statistics than
any software engineer and better at software engineering
than any statistician.” -- Josh Willis, Cloudera
Considerations on Data Gathering
• Models will (generally) get better with more data
– But we always have to consider bias and variance as we
select our data points
– Also adversaries: we may be force-fed “bad data”, find
signal in weird noise or design bad (or exploitable) features
• “I’ve got 99 problems, but data ain’t one”
Domingos, 2012; Abu-Mostafa, Caltech, 2012
• Adversaries - Exploiting the learning process
• Understand the model, understand the
machine, and you can circumvent it
• Something the InfoSec community knows very well
• Any predictive model in InfoSec will be pushed
to the limit (LIMIT!)
• Again, think back on the
way SPAM engines evolved.
Considerations on Data Gathering
MLSec Project
• Sign up, send logs, receive reports generated by
machine learning models (a.k.a. robots)!
– FREE! I need the data! Please help! ;)
• Also looking for contributors, ideas, and skeptics to
support the project.
• Visit https://www.mlsecproject.org , message
@MLSecProject or just e-mail me.
• We developed an algorithm to detect malicious
behavior from log entries of firewall blocks
• Over 6 months of data from SANS DShield
• We don’t focus on frequency or network
anomaly detection. Get ground truth “badness”
and roll with it.
• After a lot of statistics-based math (true
positive ratio, true negative ratio, odds
likelihood), it can pinpoint actors that are
13x-18x more likely to attack you.
MLSec Project
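To make the "X times more likely" phrasing concrete, here is a sketch of one common way to turn a true positive ratio and a true negative ratio into such a multiplier: the positive likelihood ratio, TPR / (1 - TNR). Whether this is exactly the calculation behind the 13x-18x figure is an assumption. The confusion-matrix counts and the 1% prior below are invented for illustration and are not project data.

```python
def rates(tp, fn, tn, fp):
    """True positive ratio (sensitivity) and true negative ratio (specificity)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return tpr, tnr

def positive_likelihood_ratio(tpr, tnr):
    """How much a 'bad' verdict multiplies the odds that an actor is bad."""
    return tpr / (1.0 - tnr)

def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * LR."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical evaluation counts on held-out data (made up).
tp, fn, tn, fp = 800, 200, 9500, 500

tpr, tnr = rates(tp, fn, tn, fp)
lr = positive_likelihood_ratio(tpr, tnr)
# Assumed base rate: 1% of the IPs seen are actually attackers.
posterior = posterior_probability(0.01, lr)
print(tpr, tnr, lr, posterior)
```

With these invented counts, a flagged actor's odds of being an attacker are multiplied by roughly 16, which moves an assumed 1% base rate to roughly 14%.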
[Figure: Map of the Internet (Hilbert Curve), blocks on port 22, 2013-07-20. Labeled regions: 0, 10, 127, MULTICAST AND FRIENDS. Highlighted areas in the annotated version: CN; RU; CN, BR, TH.]
• Behavior: block on port 22
• Trial inference on 100k IP addresses per Class A subnet
• Logarithmic scale: brightest tiles are 10 to 1000 times more
likely to attack.
MLSec Project
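For readers who want to reproduce this kind of map, the sketch below places each Class A (/8) block on a 16x16 grid by walking a Hilbert curve, so numerically adjacent blocks end up as neighboring tiles. It uses the standard distance-to-coordinates conversion for a Hilbert curve. The function names are assumptions, and the orientation of the grid may differ from the one used in the figure.

```python
def hilbert_d2xy(side, d):
    """Map distance d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def tile_for_ip(ip):
    """Tile of the /8 block that contains a dotted-quad IPv4 address."""
    first_octet = int(ip.split(".")[0])
    return hilbert_d2xy(16, first_octet)

# One tile per /8: 256 blocks fill the 16 x 16 grid exactly.
tiles = [hilbert_d2xy(16, d) for d in range(256)]
print(tile_for_ip("10.1.2.3"), tile_for_ip("127.0.0.1"))
```

Coloring each tile by the model's score for that block, on a logarithmic scale, gives a picture of the same general kind as the one in the slide.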
MLSec Project - Some interesting
results
• Ok, robot: show me who the “evil guys” are on
port 80 (highest likelihood of attack), by AS name
• ZOMG! It KNOWS! Call John Connor!
• The first model did not take web crawler activity into account.
• Without netsec/infosec experience, data scientists would be
scratching their heads for days.
• Programming is a must (Python / R)
• Statistical knowledge keeps you from
making dumb mistakes
• Specific machine learning courses and
books:
– Coursera (ML/ Data Analysis / Data Science)
• Practice, Practice, Practice:
– Kaggle
– KDD, VAST, VizSec
How to get started?
• Big data is here! *BUZZWORD ALERT*
• Machine learning / predictive analytics are
coming.
• In 6-12 months, everyone will wish they were a
Data Scientist (not really!)
• There is a lot of applicability in InfoSec
• Embrace the change: the correct application of
ML models can greatly enhance defensive
practices.
• MLSec Project is cool, check out my talk at BH/DC
• And MOST IMPORTANTLY…
Take Aways
Machine Learning = ROBOT Unicorns + Rainbows
Thanks!
• Q&A?
• Feedback is welcome!
• (bad = Joel’s fault :P)
Alexandre Pinto
alexcp@mlsecproject.org
@alexcpsec
@MLSecProject
"Prediction is very difficult, especially if it's about the future."
- Niels Bohr

BSidesLV 2013 - Using Machine Learning to Support Information Security

  • 1. Using Machine Learning to support Information Security Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject Proving Ground (Many Thanks to Joel Wilbanks)
  • 2. • This is a talk about DEFENDING not attacking – NO systems were harmed in the development of this talk. – This is NOT about some vanity hack that will be patched tomorrow – We are actually trying to BUILD something here. • This talk includes more MATH than the daily intake recommended by the FDA. • You have been warned... WARNING!
  • 3. • 12 years in Information Security, done a little bit of everything. • Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US. – If there is any way a SIEM can hurt you, it did to me. • Researching machine learning and data science in general for the past year or so. Participates in Kaggle machine learning competitions (for fun, not for profit). • First presentation in a real Infosec conference! (give or take a few hours) Who’s Alex?
  • 4. • The elephant in the room • Enter Machine Learning • Principles and Kinds of ML • ML and InfoSec • MLSec Project • How to get started? • Take Aways Agenda
  • 5. The elephant in the room • “Internet-scale companies”
  • 6. The elephant in the room
  • 7. • “Machine learning systems automatically learn programs from data” (*) • You don’t really code the program, but it is inferred from data. • Intuition of trying to mimic the way the brain learns: that’s where terms like artificial intelligence come from. Enter Machine Learning (*) CACM 55(10) - A Few Useful Things to Know about Machine Learning
  • 8. • Sales Applications of Machine Learning • Trading • Image and Voice Recognition
  • 9. • Fraud detection systems: – Is what he just did consistent with past behavior? • Network anomaly detection (?): – NOPE! – More like statistical analysis, bad one at that • Predicting likelihood of attack actors – Create different predictive models and chain them to gain more confidence in each step. Security Applications of ML • SPAM filters
  • 10. • Data Mining: How to do Machine Learning? • Exploring the space:
  • 11. • Supervised Learning: – Classification (NN, SVM, Naïve Bayes) – Regression (linear, logistic) Kinds of Machine Learning Source – scikit-learn.github.io/scikit-learn-tutorial/ • Unsupervised Learning : – Clustering (k-means) – Decomposition (PCA, SVD)
  • 12. • Paper from Microsoft Research circa Sept’98! • (Thanks, Wikipedia!) Kinds of ML: Naïve Bayes (SPAM filters)
  • 13. • One of the simplest examples of ML • Try to infer a relationship between a result variable (y) and a linear combination of others (x), minimizing the “squared error” (distance measurement) Kinds of ML: Linear Regression Jesse Johnson – shapeofdata.wordpress.com
  • 14. Kinds of ML: SVM FTW! • One of my favorite algorithms! • Support Vector Machines (SVM): – Good for classification problems with numeric features – Not a lot of parameters, it helps control overfitting, built in regularization in the model, usually robust – However, sometimes slow to train (# of points, # of features) – Also awesome: hyperplane separation on an unknown infinite dimension. Jesse Johnson – shapeofdata.wordpress.com No idea… Everyone copies this
  • 15. • SIEM and Log Monitoring tools are just vertical BI applications (from the 90’s) • “I don't have time for your marketing hype!” – Infosec • How many logs do you think there are in your organization? ML and Infosec
  • 16. InfoSec Data Scientists Data Science Venn Diagram by Drew Conway • “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.” -- Josh Willis, Cloudera
  • 17. Considerations on Data Gathering • Models will (generally) get better with more data – But we always have to consider bias and variance as we select our data points – Also adversaries – we may be force fed “bad data”, find signal in weird noise or design bad (or exploitable) features • “I’ve got 99 problems, but data ain’t one” Domingos, 2012 Abu-Mostafa, Caltech, 2012
  • 18. • Adversaries - Exploiting the learning process • Understand the model, understand the machine, and you can circumvent it • Something the InfoSec community knows very well • Any predictive model in InfoSec will be pushed to the limit (LIMIT!) • Again, think back on the way SPAM engines evolved. Considerations on Data Gathering
  • 19. MLSec Project • Sign up, send logs, receive reports generated by robots machine learning models! – FREE! I need the data! Please help! ;) • Looking for contributors, ideas, skeptics to support project as well. • Visit https://www.mlsecproject.org , message @MLSecProject or just e-mail me.
  • 20. • We developed an algorithm to detect malicious behavior from log entries of firewall blocks • Over 6 months of data from SANS DShield • We don’t focus on frequency or network anomaly detection. Get ground truth “badness” and roll with it. • After a lot of statistical-based math (true positive ratio, true negative ratio, odds likelihood), it can pinpoint actors that would be 13x-18x more likely to attack you. MLSec Project
  • 21. Map of the Internet • (Hilbert Curve) • Block port 22 • 2013-07-20 0 10 127 MULTICAST AND FRIENDS
  • 22. Map of the Internet • (Hilbert Curve) • Block port 22 • 2013-07-20 0 10 127 MULTICAST AND FRIENDS CN RU CN, BR, TH
  • 23. • Behavior: block on port 22 • Trial inference on 100k IP addresses per Class A subnet • Logarithmic scale: brightest tiles are 10 to 1000 times more likely to attack. MLSec Project
  • 24. MLSec Project - Some interesting results • Ok, robot: show me who the “evil guys” are on port 80 (highest likelihood of attack), by AS name
  • 25. MLSec Project - Some interesting results • ZOMG! It KNOWS! Call John Connor! • 1st model did not take into consideration web crawler activity. • Without netsec/infosec experience, scientists would be scratching heads for days. • Ok, robot: show me who the “evil guys” are on port 80 (highest likelihood of attack), by AS name
  • 26. • Programming is a must (Python / R) • Statistical knowledge keeps you from making dumb mistakes • Specific machine learning courses and books: – Coursera (ML/ Data Analysis / Data Science) • Practice, Practice, Practice: – Kaggle – KDD, VAST, VizSec How to get started?
  • 27. • Big data is here! *BUZZWORD ALERT* • Machine learning / predictive analytics are coming. • In 6-12 months, everyone will wish they were a Data Scientist (not really!) • There is a lot of applicability in InfoSec • Embrace the change: the correct application of ML models can greatly enhance defensive practices. • MLSec Project is cool, check out my talk at BH/DC • And MOST IMPORTANTLY… Take Aways
  • 28. Machine Learning = ROBOT Unicorns + Rainbows
  • 29. Machine Learning = ROBOT Unicorns + Rainbows
  • 30. Thanks! • Q&A? • Feedback is welcome! • (bad = Joel’s fault :P) Alexandre Pinto alexcp@mlsecproject.org @alexcpsec @MLSecProject "Prediction is very difficult, especially if it's about the future." - Niels Bohr
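
Slide 13 describes linear regression as inferring a relationship between a result variable (y) and a linear combination of others (x) by minimizing the squared error. A minimal sketch of the one-feature case follows, using the closed-form least-squares solution. The function names and the sample data are made up for illustration and are not from the talk.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def squared_error(xs, ys, slope, intercept):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
best_error = squared_error(xs, ys, slope, intercept)
# Any other line should fit these points worse than the fitted one.
worse_error = squared_error(xs, ys, slope + 0.5, intercept)
```

With more than one feature the same idea applies, but the solution is usually computed with a linear algebra library rather than by hand.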
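
Slide 20 mentions true positive ratio, true negative ratio and odds likelihood without showing the math. One common way these quantities fit together is the positive likelihood ratio, TPR / (1 - TNR), which says how much more often a flag is raised for a truly bad actor than for a benign one. The sketch below assumes that reading. It is not the MLSec Project algorithm, and the counts are invented purely to show the arithmetic.

```python
def rates(tp, fn, tn, fp):
    """Return (true positive ratio, true negative ratio) from confusion counts."""
    tpr = tp / (tp + fn)  # share of bad actors correctly flagged
    tnr = tn / (tn + fp)  # share of benign actors correctly left alone
    return tpr, tnr


def positive_likelihood_ratio(tpr, tnr):
    """How many times more likely a flag is for a bad actor than a benign one."""
    return tpr / (1.0 - tnr)


# Invented counts: 100 known-bad and 1000 known-benign IP addresses.
tp, fn, tn, fp = 90, 10, 900, 100
tpr, tnr = rates(tp, fn, tn, fp)
lr = positive_likelihood_ratio(tpr, tnr)
```

With these invented counts the ratio comes out at about 9, meaning a flagged address is roughly nine times more likely to come from the bad group than an unflagged baseline would suggest.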
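
Slides 21 to 23 plot the IPv4 space on a Hilbert curve, with one tile per Class A (/8) block. The sketch below uses the standard distance-to-coordinate conversion for a Hilbert curve to place each first octet on a 16 x 16 grid. The orientation of the grid and the helper name `slash8_cell` are assumptions, since the slides do not specify how the map was drawn.

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so the sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def slash8_cell(ip):
    """Hypothetical helper: grid cell of the /8 block containing an IPv4 address."""
    first_octet = int(ip.split(".")[0])
    return hilbert_d2xy(4, first_octet)  # order 4 gives 16 x 16 = 256 cells


# One cell per /8 block, in curve order.
cells = [hilbert_d2xy(4, d) for d in range(256)]
```

The reason to prefer this layout is that consecutive blocks stay next to each other on the grid, so numerically close address ranges show up as contiguous regions.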