Using Data Science for
Cybersecurity
Anirudh Kondaveeti, Principal Data Scientist, Pivotal
Jeff Kelly, Principal Product Marketing Manager, Pivotal
Today’s Speakers
Using Data Science for Cybersecurity
Anirudh Kondaveeti
Principal Data Scientist, Pivotal
Jeff Kelly
Product Marketing, Pivotal
Moderator Presenter
●  Cybercrime costs average US
enterprise $17m per year*
●  Cost grew at 15% CAGR over last three
years
●  Any given cybercrime can cost
significantly more
●  Target’s 2014 hack cost company
approximately $162m
●  Costs not just financial, also reputational
Cost of Cybercrime on
the Rise
*Source: 2016 Cost of Cyber Crime Study & the Risk of Business Innovation,
Ponemon Institute
●  Amateur hackers giving way to
professionals
●  Developing new, more sophisticated,
methods
●  Professional hackers make their
services available for a fee
●  Costs to commit cybercrime dropping
●  Average subscription fee for a one-hour-per-month DDoS package is roughly $38*
Hackers Growing More
Sophisticated
*Source: Q2 2015 Global DDoS Threat Landscape, Incapsula
●  Defending the perimeter no longer
enough
●  No 100%, fool-proof way to keep bad
actors out
●  Some threats come from within
●  The idea of a perimeter becoming
obsolete with mobile, cloud, IoT
●  Need better methods for threat
detection inside the network
Perimeter Defense
Inadequate
Data Science for Cybersecurity
Security must move beyond signature-based matching
•  Necessary defense direction: Find the Unknown
•  Need an advanced platform: Security is a Big Data problem
•  Multiple decentralized sources of traditional or unconventional data
•  Need a platform for better BI, reporting, and cross-source correlation
•  Develop intelligence: Security is an Advanced Analytics problem
Figure labels: BI and compliance-driven; investigation-driven; behavior-metrics; data-science-driven
Background
Lateral Movement Detection
Advanced Persistent Threat (APT)
1. Phishing and Zero-Day Attack: A handful of users are targeted by two phishing attacks; one user opens a zero-day payload (CVE-2011-0609).
2. Back Door: The user machine is accessed remotely by the Poison Ivy tool.
3. Lateral Movement: Attacker elevates access to important user, service, and admin accounts, and specific systems.
4. Data Gathering: Data is acquired from target servers and staged for exfiltration.
5. Exfiltrate: Data is exfiltrated via encrypted files over FTP to an external, compromised machine at a hosting provider.
APT Kill Chain
What: Identify anomalous user-level access to hosts
How: Look at People & Machines
•  Users (User Behavior Models)
•  Network, Servers (User Peer Models)
Scenarios:
Network reconnaissance from remote adversary on hijacked device
Ill-intentioned activities by legitimate employee
Access policy abuse
Business values:
Immediate security alert generation
Enhanced SIEM alert queue prioritization
Focused monitoring
Future integration with other analytic models for 360° attack view
Lateral Movement Detection
Flow diagram labels: Logs (Active Directory Activity, Active Directory Metadata, Server Information, LDAP Activity); Structured and Semi-structured data; External Tables; Data Computing Appliance; Greenplum; DIA; Regression Based Model, Cluster Based Model, Recommendation System Based; User Behavioral Model; Anomalous Users
Lateral Movement Detection (LMD) – Flow Diagram
Model to identify users with unusual
variation in the number of servers
accessed over time
Build a regression model for each user
(Y = aX + b)
No. of servers accessed each week (Y)
~ Week Index (X)
Find the slope of the regression line for
each user (a)
Identify users who have a high positive
or negative slope to find users with
unusual activity
Figure: Regression plot of number of servers for a user (x-axis: week of the year; y-axis: number of servers)
Regression-Based Model
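A minimal sketch of the per-user slope calculation described on this slide, written in Python rather than the SQL and R used in the talk. The user names, weekly counts, and the threshold of 1.0 are hypothetical and chosen only for illustration.

```python
def weekly_slope(server_counts):
    """Least-squares slope a of Y = aX + b, with X = week index 0..n-1."""
    n = len(server_counts)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = sum(server_counts) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(server_counts))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical data: number of distinct servers each user accessed per week.
history = {
    "alice": [3, 3, 4, 3, 3, 4],
    "bob": [2, 2, 3, 6, 9, 14],
    "carol": [12, 10, 8, 5, 3, 1],
}
SLOPE_THRESHOLD = 1.0  # assumed cut-off, not taken from the slides

slopes = {user: weekly_slope(counts) for user, counts in history.items()}
# Flag both steep increases and steep decreases.
flagged = sorted(user for user, a in slopes.items() if abs(a) > SLOPE_THRESHOLD)
print(flagged)
```

In practice the threshold would more likely be set relative to the distribution of slopes across all users than as a fixed number.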
Build historical behavioral profile for each user
based on following features:
•  Servers accessed
•  IP addresses logged in from
•  Geographical information of login
Models stress individual user/job log-in
frequency
Multiple Feature Generations reduce false
alarms:
•  Aggregate servers to respective server group
•  Incorporate server criticality
•  Assign more weight to less popular servers and IP
addresses
•  E.g. print servers are low-weighted
•  Use recommendation engine to suggest servers to users
based on job roles and peers
Figure: Servers s1 to s10 for one user, who typically uses only a few servers and then begins logging into a lot of new servers
User Behavior Models (UBM)
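One way to "assign more weight to less popular servers" is an inverse-frequency weight, similar in spirit to IDF in text retrieval. The slides do not say which weighting was used, so the log formula below is an assumption, and the access pairs and server names are hypothetical.

```python
import math

def popularity_weights(access_log):
    """Weight each server by log(total users / users who access it)."""
    users = {user for user, _ in access_log}
    users_per_server = {}
    for user, server in access_log:
        users_per_server.setdefault(server, set()).add(user)
    return {
        server: math.log(len(users) / len(who))
        for server, who in users_per_server.items()
    }

# Hypothetical (user, server) access pairs.
access_log = [
    ("alice", "print01"), ("bob", "print01"),
    ("carol", "print01"), ("dave", "print01"),
    ("alice", "build02"), ("bob", "build02"),
    ("dave", "findb03"),
]
weights = popularity_weights(access_log)
for server, weight in sorted(weights.items()):
    print(server, round(weight, 3))
```

A server that everyone uses, such as the print server here, gets weight zero, so accessing it contributes nothing to a user's anomaly score.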
            Week1   Week2   ...   Week10   Week11   ...   Week15
server1     2       3       ...   1        0        ...   0
server2     4       7       ...   1        3        ...   7
server3     0       2       ...   0        0        ...   0
...         ...     ...     ...   ...      ...      ...   ...
server25    1       3       ...   5        8        ...   1
Column groups: PCA Model Built per User (Training Data); Testing Data
The user behavior matrix is created using ‘x’ weeks of history for a user. The current week is
used as test data.
PCA is a dimensionality reduction technique that captures the set of components of a
multidimensional vector that account for most of the variance.
Principal dimensions are calculated from the training data.
Principal Component Analysis (PCA) Scoring
Reconstruction Error
Flow: Training Data (User Behavior Matrix) → Run PCA → Principal Dimensions
Flow: Test Vector (User data for new week) → Project onto Principal Dimensions → Reconstruct → Reconstructed Test Vector
Flow: Difference between two vectors → Anomaly Score
Ref: A Lakhina, M Crovella, C Diot, Diagnosing network-wide traffic anomalies
Principal Component Analysis (PCA) Scoring
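The reconstruction-error flow can be sketched as follows. To stay dependency-free, this version keeps only the first principal dimension and finds it by power iteration, whereas the talk mentions SVD in R. The training weeks and test vectors are hypothetical.

```python
import math

def first_principal_direction(rows, iterations=200):
    """Mean vector and top covariance eigenvector, via power iteration."""
    n, dim = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # w = (X^T X) v, accumulated row by row without forming the matrix.
        w = [0.0] * dim
        for row in centered:
            proj = sum(r * vi for r, vi in zip(row, v))
            for j, r in enumerate(row):
                w[j] += proj * r
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mu, v

def reconstruction_error(test_vector, mu, direction):
    """Norm of the part of the centered test vector the direction cannot explain."""
    centered = [x - m for x, m in zip(test_vector, mu)]
    score = sum(c * d for c, d in zip(centered, direction))
    residual = [c - score * d for c, d in zip(centered, direction)]
    return math.sqrt(sum(r * r for r in residual))

# Hypothetical training weeks: logins to (server1, server2, server3) per week.
training = [[2, 4, 0], [3, 7, 0], [1, 1, 0], [2, 5, 0], [4, 8, 0]]
mu, direction = first_principal_direction(training)

error_normal = reconstruction_error([3, 6, 0], mu, direction)
error_anomalous = reconstruction_error([2, 5, 6], mu, direction)  # new server3 use
print(error_normal < error_anomalous)
```

The second test week touches a server the user never used in training, so none of that activity lies along the learned direction and the error is large.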
Oversampling PCA
Reference and Image Source: YR Yeh, ZY Lee, YJ Lee, Anomaly Detection via Over-sampling Principal Component Analysis
Flow: Training Data (User Behavior Matrix) → Run PCA → First Principal Vector
Flow: Training Data (User Behavior Matrix) + Oversampled Test Data → Run PCA → First Principal Vector after oversampling
Flow: Difference in angle between them → Anomaly Score
Principal Component Analysis (PCA) Scoring
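A rough sketch of the oversampling idea: duplicate the test vector several times, recompute the first principal vector, and score the test vector by how far that vector rotates. The number of copies, the `1 - |cos|` score, and the data are assumptions for illustration, and power iteration stands in for a full PCA routine.

```python
import math

def first_direction(rows, iterations=200):
    """First principal direction of `rows` (unit vector), via power iteration."""
    n, dim = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        w = [0.0] * dim
        for row in centered:
            proj = sum(r * vi for r, vi in zip(row, v))
            for j, r in enumerate(row):
                w[j] += proj * r
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def oversampling_score(training, test_vector, copies=10):
    """1 - |cos(angle)| between first directions before and after oversampling."""
    before = first_direction(training)
    after = first_direction(training + [test_vector] * copies)
    cosine = abs(sum(b * a for b, a in zip(before, after)))
    return 1.0 - min(cosine, 1.0)

# Hypothetical training weeks: logins to (server1, server2, server3) per week.
training = [[2, 4, 0], [3, 7, 0], [1, 1, 0], [2, 5, 0], [4, 8, 0]]
score_normal = oversampling_score(training, [3, 6, 0])
score_anomalous = oversampling_score(training, [2, 5, 6])
print(score_normal < score_anomalous)
```

A week that resembles the training data barely moves the first principal vector, while a week dominated by a previously unused server pulls it toward that server's axis.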
R Code to find the Principal Components (using SVD)
SQL & R
Diagram: each user's data (User1 Data to User5 Data) yields its own model (User1 Model to User5 Model)
PL/R wrapper over the R Code to run in parallel
Parallelized PCA using PL/R
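The one-model-per-user pattern can be illustrated outside the database too. The sketch below is a Python analog only, not PL/R: it groups rows by user and fits each user's model independently in a worker pool. `fit_user_model` is a hypothetical stand-in for the per-user PCA fit, and the rows are made up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev

def fit_user_model(item):
    """Stand-in for the per-user PCA fit: summarises one user's weekly counts."""
    user, weekly_counts = item
    return user, {"mean": mean(weekly_counts), "std": pstdev(weekly_counts)}

# Hypothetical rows: (user, week, distinct servers accessed).
rows = [
    ("user1", 1, 2), ("user1", 2, 4), ("user1", 3, 3),
    ("user2", 1, 7), ("user2", 2, 7), ("user2", 3, 7),
]

# Group rows by user, much as a GROUP BY on the user column would.
grouped = defaultdict(list)
for user, _week, n_servers in rows:
    grouped[user].append(n_servers)

# Fit one independent model per user; no state is shared between users.
with ThreadPoolExecutor(max_workers=4) as pool:
    models = dict(pool.map(fit_user_model, grouped.items()))

print(sorted(models))
```

Because the per-user fits share nothing, the same function could be handed to a process pool or pushed into the database without changing its logic.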
Users rate items
To recommend items to a particular user A
•  Find other users U similar to A
•  Identify the set of items I accessed by U
•  Recommend these items I to A
Users = Employees
Items = Servers accessed
Image Source: http://dataconomy.com/2015/03/an-introduction-to-recommendation-engines/
Recommendation System-Based Model
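The three recommendation steps above can be sketched with a simple user-to-user similarity. Jaccard similarity over sets of accessed servers and the 0.3 cut-off are assumptions; the slides do not name the similarity measure. Users and servers are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of servers, from 0.0 (disjoint) to 1.0 (same)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_servers(target, servers_by_user, min_similarity=0.3):
    """Servers used by users similar to `target` that `target` has not used."""
    mine = servers_by_user[target]
    recommended = set()
    for other, theirs in servers_by_user.items():
        if other != target and jaccard(mine, theirs) >= min_similarity:
            recommended |= theirs - mine
    return recommended

# Users = employees, items = servers accessed (hypothetical data).
servers_by_user = {
    "A": {"s1", "s2", "s3"},
    "B": {"s1", "s2", "s4"},
    "C": {"s2", "s3", "s5"},
    "D": {"s8", "s9"},
}
recs = recommend_servers("A", servers_by_user)
print(sorted(recs))
```

User D shares no servers with A, so D's servers are never recommended to A, and a login by A to `s8` would remain unexplained by peer behavior.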
•  Historical profile for each user based on number of days per week for a particular server, weighted by recommendations
•  AD Logs, LDAP data (job title, dept, etc.)
•  Heat Map (Top Figure)
–  X-Axis: Week Index
–  Y-Axis: Server
–  Value: Number of days per week weighted by recommendations
•  Outlier Plot (Bottom Figure)
–  X-Axis: Week Index
–  Y-Axis: Outlier Score
Figure panels: Heat map before recommendations; Heat map after recommendations (server groups g1 to g5)
Servers g3 & g4 are recommended, hence weight is decreased
Outlier score in test week decreases because the new servers that the user accesses are
recommended for his job profile
Recommendation System-Based Model
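A small sketch of how weighting by recommendations can lower the outlier score for a test week. The weighting rule (scale days per week by one minus a recommendation score) and the score itself (sum of absolute deviations from a baseline) are assumptions for illustration, as are all the numbers.

```python
def weighted_days(days_per_server, recommendation_score):
    """Scale days-per-week by (1 - recommendation score); scores lie in [0, 1]."""
    return {
        server: days * (1.0 - recommendation_score.get(server, 0.0))
        for server, days in days_per_server.items()
    }

def outlier_score(test_week, baseline_week):
    """Sum of absolute deviations from the user's historical baseline."""
    servers = set(test_week) | set(baseline_week)
    return sum(
        abs(test_week.get(s, 0.0) - baseline_week.get(s, 0.0)) for s in servers
    )

baseline = {"g1": 4, "g2": 3}                     # typical days per week
test_week = {"g1": 4, "g2": 3, "g3": 5, "g4": 2}  # g3 and g4 are new this week
recommended = {"g3": 0.9, "g4": 0.8}              # hypothetical peer-based scores

before = outlier_score(test_week, baseline)
after = outlier_score(
    weighted_days(test_week, recommended),
    weighted_days(baseline, recommended),
)
print(before, round(after, 2))
```

Access to a server with no recommendation score keeps its full weight, so truly unexpected servers still drive the score up.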
Using historical Windows event data to
build graphs* of typical user behavior
•  Which machines does the user log into?
•  Which machines does the user log in from?
•  How often?
•  In which order?
Ask if this behavior is typical
•  Is it typical for this user?
•  Is it typical for someone in a particular department?
•  Is this typical for someone in the user’s job role?
Graph models are sensitive to direction,
order, and frequency
Figure: Typical Behavior and Anomalous Behavior login graphs over hosts 34.23.123.4, 34.23.123.51, 34.23.1.1, 34.23.0.1, and 34.23.2.8, including a DB with financial information
*Reference: Alexander D. Kent, Lorie M. Liebrock, Joshua C. Neil. Authentication graphs: Analyzing user behavior within an enterprise network.
Graph Model
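A minimal sketch of a per-user login graph built from historical events, assuming each event is reduced to a (source host, destination host) pair. The edges below are hypothetical; only the host addresses are reused from the figure. This version captures direction and frequency but not the order of logins.

```python
from collections import Counter

def build_login_graph(events):
    """Count directed (source_host, destination_host) logins for one user."""
    return Counter((src, dst) for src, dst in events)

def unusual_edges(history_graph, new_events, min_count=1):
    """Logins whose directed edge appears fewer than `min_count` times in history."""
    return [edge for edge in new_events if history_graph[edge] < min_count]

# Hypothetical historical logins for one user.
history = [
    ("34.23.123.4", "34.23.1.1"),
    ("34.23.123.4", "34.23.1.1"),
    ("34.23.1.1", "34.23.0.1"),
    ("34.23.123.4", "34.23.2.8"),
]
graph = build_login_graph(history)

this_week = [
    ("34.23.123.4", "34.23.1.1"),   # seen before
    ("34.23.1.1", "34.23.123.4"),   # same hosts, reversed direction
    ("34.23.2.8", "34.23.123.51"),  # never seen
]
flagged = unusual_edges(graph, this_week)
print(flagged)
```

Raising `min_count` would also flag edges that exist in the history but are rare for this user.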
Challenge:
•  Cybersecurity threats, data privacy, data protection and fraudulent
behavior going undetected, leaving customer vulnerable to security
risks, loss of money
•  Need to gain timely insight into unusual/suspicious internal behavior
to allow for proper action
•  Tools in place cannot be customized to leverage historical security
data and allow for predictive analytics
Solution:
•  Leveraged Data Science to show use cases analyzing their active
directory data, identifying fraud, unapproved file sharing, etc.
•  Utilized Big Data Suite, specifically Greenplum + MADlib + R to
store and analyze data with potential to build out Hadoop data lake
with HDB (aka HAWQ)
Pivotal Solution includes: Pivotal
Greenplum, Pivotal HDB, Apache MADlib
Fortune 100 Companies Leverage Pivotal to Tackle
Enterprise-wide Security Risks with Analytics
•  Pivotal Data Science expertise and partnership with customers to
identify high-value use cases to solve and build data science center
of excellence for security analytics
•  Tight integration to Analytical Tools that run in-database and
across all of the data, to cover the most possible use cases
•  Scalable Solution that can grow as data needs grow, leveraging
commodity hardware to keep costs low as data volume increases
•  Join key Pivotal customers in the Security Advisory Council for
collaboration and knowledge sharing
Why Pivotal for Security Analytics
Additional Resources
& Next Steps
Read: Pivotal Data Science Blog
https://blog.pivotal.io/channels/data-science-pivotal
Strategic: Pivotal Data Science Analytics Roadmapping
Engagement https://pivotal.io/contact
Tune in: Next data science webinar: “Using Data
Science to Detect Healthcare Fraud, Waste, and
Abuse,” March 14, 2017
https://pivotal.io/resources/1/webinars
Hands on:
Pivotal Greenplum Sandbox
https://network.pivotal.io/products/pivotal-gpdb
Apache MADlib (incubating)
http://madlib.incubator.apache.org/
Questions?
Using Data Science for Cybersecurity
Using Data Science for Cybersecurity

More Related Content

What's hot

Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)Brian K. Dickard
 
PageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_HabibPageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_HabibEl Habib NFAOUI
 
HPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningHPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningNimai Chand Das Adhikari
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphSqrrl
 
Loan Prediction System Using Machine Learning.pptx
Loan Prediction System Using Machine Learning.pptxLoan Prediction System Using Machine Learning.pptx
Loan Prediction System Using Machine Learning.pptxBhoirRitesh19ET5008
 
Artificial Neural Network (draft)
Artificial Neural Network (draft)Artificial Neural Network (draft)
Artificial Neural Network (draft)James Boulie
 
RapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidmining Content
 
Practical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit AlgorithmsPractical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit AlgorithmsSC5.io
 
Achieving Defendable Architectures Via Threat Driven Methodologies
Achieving Defendable Architectures Via Threat Driven MethodologiesAchieving Defendable Architectures Via Threat Driven Methodologies
Achieving Defendable Architectures Via Threat Driven MethodologiesPriyanka Aash
 
Free and open cloud security posture monitoring
Free and open cloud security posture monitoringFree and open cloud security posture monitoring
Free and open cloud security posture monitoringElasticsearch
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud DetectionNitesh Kumar
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoostJoonyoung Yi
 
The Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonThe Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonKrishna Sankar
 
Splunk Enterprise Security
Splunk Enterprise SecuritySplunk Enterprise Security
Splunk Enterprise SecuritySplunk
 

What's hot (20)

Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)Cloud Computing Risk Management (Multi Venue)
Cloud Computing Risk Management (Multi Venue)
 
PageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_HabibPageRank_algorithm_Nfaoui_El_Habib
PageRank_algorithm_Nfaoui_El_Habib
 
Recommending and searching @ Spotify
Recommending and searching @ SpotifyRecommending and searching @ Spotify
Recommending and searching @ Spotify
 
Final ppt
Final pptFinal ppt
Final ppt
 
HPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningHPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine Learning
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
 
Loan Prediction System Using Machine Learning.pptx
Loan Prediction System Using Machine Learning.pptxLoan Prediction System Using Machine Learning.pptx
Loan Prediction System Using Machine Learning.pptx
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Artificial Neural Network (draft)
Artificial Neural Network (draft)Artificial Neural Network (draft)
Artificial Neural Network (draft)
 
RapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid MinerRapidMiner: Introduction To Rapid Miner
RapidMiner: Introduction To Rapid Miner
 
Practical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit AlgorithmsPractical AI for Business: Bandit Algorithms
Practical AI for Business: Bandit Algorithms
 
Achieving Defendable Architectures Via Threat Driven Methodologies
Achieving Defendable Architectures Via Threat Driven MethodologiesAchieving Defendable Architectures Via Threat Driven Methodologies
Achieving Defendable Architectures Via Threat Driven Methodologies
 
Free and open cloud security posture monitoring
Free and open cloud security posture monitoringFree and open cloud security posture monitoring
Free and open cloud security posture monitoring
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
Loan prediction
Loan predictionLoan prediction
Loan prediction
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
Federated Learning
Federated LearningFederated Learning
Federated Learning
 
The Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & PythonThe Art of Social Media Analysis with Twitter & Python
The Art of Social Media Analysis with Twitter & Python
 
Splunk Enterprise Security
Splunk Enterprise SecuritySplunk Enterprise Security
Splunk Enterprise Security
 

Viewers also liked

Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)
Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)
Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)VMware Tanzu
 
Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)
Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)
Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)VMware Tanzu
 
Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)
Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)
Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)VMware Tanzu
 
LIVE DEMO: Pivotal Cloud Foundry
LIVE DEMO: Pivotal Cloud FoundryLIVE DEMO: Pivotal Cloud Foundry
LIVE DEMO: Pivotal Cloud FoundryVMware Tanzu
 
Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)
Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)
Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)VMware Tanzu
 
Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...
Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...
Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...VMware Tanzu
 
Data Science Driven Malware Detection
Data Science Driven Malware DetectionData Science Driven Malware Detection
Data Science Driven Malware DetectionVMware Tanzu
 
Pivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical OverviewPivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical OverviewVMware Tanzu
 

Viewers also liked (8)

Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)
Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)
Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)
 
Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)
Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)
Part 3: Enabling Continuous Delivery (Pivotal Cloud Platform Roadshow)
 
Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)
Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)
Keynote: Architecting for Continuous Delivery (Pivotal Cloud Platform Roadshow)
 
LIVE DEMO: Pivotal Cloud Foundry
LIVE DEMO: Pivotal Cloud FoundryLIVE DEMO: Pivotal Cloud Foundry
LIVE DEMO: Pivotal Cloud Foundry
 
Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)
Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)
Part 1: The Developer Experience (Pivotal Cloud Platform Roadshow)
 
Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...
Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...
Part 2: Architecture and the Operator Experience (Pivotal Cloud Platform Road...
 
Data Science Driven Malware Detection
Data Science Driven Malware DetectionData Science Driven Malware Detection
Data Science Driven Malware Detection
 
Pivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical OverviewPivotal Cloud Foundry: A Technical Overview
Pivotal Cloud Foundry: A Technical Overview
 

Similar to Using Data Science for Cybersecurity

Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data productsVikas Sardana
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data editionMark Kerzner
 
Applying Auto-Data Classification Techniques for Large Data Sets
Applying Auto-Data Classification Techniques for Large Data SetsApplying Auto-Data Classification Techniques for Large Data Sets
Applying Auto-Data Classification Techniques for Large Data SetsPriyanka Aash
 
Splunk for Security: Background & Customer Case Study
Splunk for Security: Background & Customer Case StudySplunk for Security: Background & Customer Case Study
Splunk for Security: Background & Customer Case StudyAndrew Gerber
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4Janani Eshwaran
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4Janani Eshwaran
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Piyush Kumar
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAmazon Web Services
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise deteo
 
A Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterA Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterForgeRock
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
Implementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformImplementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformArvind Sathi
 
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Amazon Web Services
 
How big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorHow big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorDataWorks Summit
 

Similar to Using Data Science for Cybersecurity (20)

Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data products
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Oil and gas big data edition
Oil and gas  big data editionOil and gas  big data edition
Oil and gas big data edition
 
Applying Auto-Data Classification Techniques for Large Data Sets
Applying Auto-Data Classification Techniques for Large Data SetsApplying Auto-Data Classification Techniques for Large Data Sets
Applying Auto-Data Classification Techniques for Large Data Sets
 
Splunk for Security: Background & Customer Case Study
Splunk for Security: Background & Customer Case StudySplunk for Security: Background & Customer Case Study
Splunk for Security: Background & Customer Case Study
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
 
2016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V42016 DSG Webinar Azure HDInsight 2 V4
2016 DSG Webinar Azure HDInsight 2 V4
 
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
Importance of ‘Centralized Event collection’ and BigData platform for Analysis !
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced AnalyticsAWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
AWS July Webinar Series: Amazon Redshift Reporting and Advanced Analytics
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 
A Study in Borderless Over Perimeter
A Study in Borderless Over PerimeterA Study in Borderless Over Perimeter
A Study in Borderless Over Perimeter
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Implementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformImplementing Advanced Analytics Platform
Implementing Advanced Analytics Platform
 
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
 
How big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the doorHow big data and AI saved the day: critical IP almost walked out the door
How big data and AI saved the day: critical IP almost walked out the door
 

More from VMware Tanzu

What AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItWhat AI Means For Your Product Strategy And What To Do About It
What AI Means For Your Product Strategy And What To Do About ItVMware Tanzu
 
Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023Make the Right Thing the Obvious Thing at Cardinal Health 2023
Make the Right Thing the Obvious Thing at Cardinal Health 2023VMware Tanzu
 
Enhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleEnhancing DevEx and Simplifying Operations at Scale
Enhancing DevEx and Simplifying Operations at ScaleVMware Tanzu
 
Spring Update | July 2023
Spring Update | July 2023Spring Update | July 2023
Spring Update | July 2023VMware Tanzu
 
Platforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductPlatforms, Platform Engineering, & Platform as a Product
Platforms, Platform Engineering, & Platform as a ProductVMware Tanzu
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And BeyondVMware Tanzu
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfVMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023VMware Tanzu
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Using Data Science for Cybersecurity

  • 1. Using Data Science for Cybersecurity. Anirudh Kondaveeti, Principal Data Scientist, Pivotal; Jeff Kelly, Principal Product Marketing Manager, Pivotal
  • 2. Today’s Speakers: Anirudh Kondaveeti, Principal Data Scientist, Pivotal; Jeff Kelly, Product Marketing, Pivotal. Roles shown on the slide: Moderator, Presenter
  • 3. Cost of Cybercrime on the Rise
      ●  Cybercrime costs the average US enterprise $17m per year*
      ●  Cost grew at a 15% CAGR over the last three years
      ●  Any given cybercrime can cost significantly more
      ●  Target’s 2014 hack cost the company approximately $162m
      ●  Costs are not just financial but also reputational
      *Source: 2016 Cost of Cyber Crime Study & the Risk of Business Innovation, Ponemon Institute
  • 4. Hackers Growing More Sophisticated
      ●  Amateur hackers are giving way to professionals
      ●  Professionals are developing new, more sophisticated methods
      ●  Professional hackers make their services available for a fee
      ●  Costs to commit cybercrime are dropping
      ●  Average subscription fee for a one-hour-per-month DDoS package is roughly $38*
      *Source: Q2 2015 Global DDoS Threat Landscape, Incapsula
  • 5. Perimeter Defense Inadequate
      ●  Defending the perimeter is no longer enough
      ●  There is no 100% fool-proof way to keep bad actors out
      ●  Some threats come from within
      ●  The idea of a perimeter is becoming obsolete with mobile, cloud, and IoT
      ●  Better methods are needed for threat detection inside the network
  • 6. Data Science for Cybersecurity
  • 7. Background: Security must move beyond signature-based matching
      •  Necessary defense direction: Find the Unknown
      •  Need an advanced platform: Security is a Big Data problem
      •  Multiple decentralized sources of traditional or unconventional data
      •  Need a platform for better BI, reporting, and cross-source correlation
      •  Develop intelligence: Security is an Advanced Analytics problem
      [Figure labels: BI and compliance-driven; investigation-driven; behavior metrics; investigation-driven; data-science driven]
  • 9. Advanced Persistent Threat (APT): APT Kill Chain
      1. Phishing and Zero Day Attack: A handful of users are targeted by two phishing attacks; one user opens the zero-day payload (CVE-2011-0609)
      2. Back Door: The user machine is accessed remotely by the Poison Ivy tool
      3. Lateral Movement: The attacker elevates access to important user, service and admin accounts, and to specific systems
      4. Data Gathering: Data is acquired from target servers and staged for exfiltration
      5. Exfiltrate: Data is exfiltrated via encrypted files over FTP to an external, compromised machine at a hosting provider
  • 10. Lateral Movement Detection
      What: Identify anomalous user-level access to hosts
      How: Look at people and machines
      •  Users (User Behavior Models)
      •  Network, Servers (User Peer Models)
      Scenarios:
      •  Network reconnaissance from a remote adversary on a hijacked device
      •  Ill-intentioned activities by a legitimate employee
      •  Access policy abuse
      Business values:
      •  Immediate security alert generation
      •  Enhanced SIEM alert queue prioritization
      •  Focused monitoring
      •  Future integration with other analytic models for a 360° attack view
  • 11. Lateral Movement Detection (LMD) – Flow Diagram
      [Diagram labels, in extraction order: Data Computing Appliance Logs Active Directory Activity Active Directory Metadata Server Information Structured External Tables Semi-structured Regression Based Model Cluster Based Model Recommendation System Based User Behavioral Model Anomalous Users Greenplum DIA LDAP Activity]
  • 12. Regression-Based Model
      Model to identify users with unusual variation in the number of servers accessed over time
      •  Build a regression model for each user (Y = aX + b)
      •  Number of servers accessed each week (Y) ~ week index (X)
      •  Find the slope (a) of the regression line for each user
      •  Identify users with a high positive or negative slope to find users with unusual activity
      [Figure: regression plot of the number of servers for a user; x-axis: week of the year; y-axis: number of servers]
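The per-user slope test on this slide can be sketched in a few lines. This is a minimal illustration, not the presenters' implementation: the user names, weekly counts, the `threshold` value, and the helper names `weekly_slope` and `flag_users` are all hypothetical.

```python
from statistics import mean


def weekly_slope(counts):
    """Least-squares slope a of Y = aX + b, where X is the week index 0..n-1."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two weeks of data")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(counts)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, counts))
    return sxy / sxx


def flag_users(servers_per_week, threshold):
    """Return {user: slope} for users whose |slope| is at least the threshold."""
    slopes = {user: weekly_slope(c) for user, c in servers_per_week.items()}
    return {user: s for user, s in slopes.items() if abs(s) >= threshold}


# Hypothetical data: number of distinct servers accessed per week.
servers_per_week = {
    "alice": [3, 3, 4, 3, 3, 4],     # stable usage
    "bob": [2, 4, 6, 8, 10, 12],     # steadily reaching more servers
    "carol": [9, 8, 6, 5, 3, 2],     # steadily reaching fewer servers
}

# The threshold of 1.0 servers per week is an assumed cut-off.
flagged = flag_users(servers_per_week, threshold=1.0)
```

In this made-up data, `bob` and `carol` are flagged for strongly positive and strongly negative slopes, while `alice` is not. In practice the threshold would presumably be tuned against the distribution of slopes across all users.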
  • 13. User Behavior Models (UBM)
      Build a historical behavioral profile for each user based on the following features:
      •  Servers accessed
      •  IP addresses logged in from
      •  Geographical information of login
      Models stress individual user/job log-in frequency
      Multiple feature generations reduce false alarms:
      •  Aggregate servers into their respective server groups
      •  Incorporate server criticality
      •  Assign more weight to less popular servers and IP addresses
      •  E.g. print servers are low-weighted
      •  Use a recommendation engine to suggest servers to users based on job roles and peers
      [Figure: servers s1 to s10; a user who typically uses only a few servers begins logging into a lot of new servers]
  • 14. Principal Component Analysis (PCA) Scoring
      A PCA model is built per user. The user behavior matrix is created using ‘x’ weeks of history for a user; the current week is used as test data.
      PCA is a dimensionality reduction technique used to capture the set of components of a multidimensional vector that account for most of the variance. The principal dimensions are calculated from the training data.
      User behavior matrix (column-group labels on the slide: PCA Model Built per User (Training Data); Testing Data):
      Week1 Week2 . Week10 Week11 . Week15
      server1 2 3 1 0 . 0
      server2 4 7 1 3 . 7
      server3 0 2 0 0 . 0
      . . . . . .
      server25 1 3 5 8 . 1
  • 15. Principal Component Analysis (PCA) Scoring: Reconstruction Error
      1. Run PCA on the training data (user behavior matrix) to obtain the principal dimensions
      2. Project the test vector (user data for the new week) onto the principal dimensions
      3. Reconstruct the test vector from that projection
      4. The difference between the two vectors (test vector and reconstructed test vector) is the reconstruction error, which is used as the anomaly score
      Ref: A Lakhina, M Crovella, C Diot, Diagnosing network-wide traffic anomalies
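The reconstruction-error score can be illustrated with a small, dependency-free sketch. It is an assumption-laden illustration rather than the deck's R/PL/R code: principal dimensions are found here by power iteration with deflation instead of SVD, rows are weeks and columns are servers, and the data, the choice `k=1`, and all function names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def covariance(rows):
    """Column means and sample covariance of rows (weeks) x columns (servers)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return means, cov


def top_components(cov, k, iters=500):
    """Top-k eigenvectors of cov via power iteration with deflation."""
    d = len(cov)
    work = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(d)]  # fixed, non-uniform start vector
        for _ in range(iters):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # no variance left to explain
                return comps
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in work])
        comps.append(v)
        # Deflate: remove the component just found before searching for the next.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(d)]
                for i in range(d)]
    return comps


def anomaly_score(train_rows, test_vec, k=1):
    """Norm of the part of the centered test vector outside the top-k subspace."""
    means, cov = covariance(train_rows)
    comps = top_components(cov, k)
    x = [t - m for t, m in zip(test_vec, means)]
    recon = [0.0] * len(x)
    for c in comps:
        coef = dot(x, c)
        recon = [r + coef * ci for r, ci in zip(recon, c)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, recon)))


# Hypothetical history: logins per week to three servers (rows are weeks).
history = [[2, 4, 0], [3, 6, 0], [1, 2, 0], [4, 8, 0], [2, 4, 0]]
normal_score = anomaly_score(history, [3, 6, 0])    # follows the usual pattern
unusual_score = anomaly_score(history, [2, 4, 5])   # sudden use of the third server
```

With this made-up history, a week that follows the user's usual pattern reconstructs almost perfectly, while a week with sudden activity on a previously unused server leaves a large residual. How many principal dimensions to keep is a modelling choice the slides do not specify.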
  • 16. Principal Component Analysis (PCA) Scoring: Oversampling PCA
      •  Run PCA on the training data (user behavior matrix) to obtain the first principal vector
      •  Run PCA on the training data together with the oversampled test data to obtain the first principal vector after oversampling
      •  The difference in angle between the two vectors is the anomaly score
      Reference and Image Source: YR Yeh, ZY Lee, YJ Lee, Anomaly Detection via Over-sampling Principal Component Analysis
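A rough sketch of the oversampling idea follows. It assumes the score is taken as 1 minus the absolute cosine of the angle between the two first principal vectors, which is one common way to turn an angle difference into a score; the number of copies, the data, and the function names are hypothetical, and power iteration stands in for whatever PCA routine was actually used.

```python
import math


def first_principal_direction(rows, iters=500):
    """Unit vector along the direction of largest variance (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    v = [1.0 / (j + 1) for j in range(d)]  # fixed, non-uniform start vector
    for _ in range(iters):
        # Multiply v by the scatter matrix X^T X without forming it explicitly.
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(p * c[j] for p, c in zip(proj, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # data has no variance
            break
        v = [x / norm for x in w]
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def os_pca_score(train_rows, test_vec, copies=10):
    """Score = 1 - |cos(angle)| between principal directions before and after
    adding `copies` duplicates of the test vector to the training data."""
    u = first_principal_direction(train_rows)
    u_os = first_principal_direction(train_rows + [list(test_vec)] * copies)
    cos = abs(sum(a * b for a, b in zip(u, u_os)))
    return 1.0 - min(1.0, cos)


# Hypothetical two-feature history that varies mostly along the first feature.
train = [[1, 0.0], [2, 0.1], [3, -0.1], [4, 0.0], [5, 0.1], [6, -0.1]]
normal_score = os_pca_score(train, [3.5, 0.0])   # consistent with history
unusual_score = os_pca_score(train, [3.5, 8.0])  # far off the usual direction
```

Duplicating a test point that agrees with the history barely moves the principal direction, so its score stays near zero; duplicating an outlier rotates the direction noticeably and pushes the score toward one.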
  • 17. Parallelized PCA using PL/R
      •  R code to find the principal components (using SVD)
      •  PL/R wrapper over the R code to run in parallel (SQL & R)
      •  One model per user: User1 Data to User1 Model, User2 Data to User2 Model, User3 Data to User3 Model, User4 Data to User4 Model, User5 Data to User5 Model
  • 18. Recommendation System-Based Model
      Users rate items. To recommend items to a particular user A:
      •  Find other users U similar to A
      •  Identify the set of items I accessed by U
      •  Recommend these items I to A
      Users = Employees; Items = Servers accessed
      Image Source: http://dataconomy.com/2015/03/an-introduction-to-recommendation-engines/
  • 19. Recommendation System-Based Model
      •  Historical profile for each user based on the number of days per week for a particular server, weighted by recommendations
      •  AD logs, LDAP data (job title, dept, etc.)
      •  Heat map (top figure)
         –  X-axis: week index
         –  Y-axis: server
         –  Value: number of days per week, weighted by recommendations
      •  Outlier plot (bottom figure)
         –  X-axis: week index
         –  Y-axis: outlier score
      [Figures: heat map before recommendations and heat map after recommendations, over server groups g1 to g5]
      Servers g3 and g4 are recommended, hence their weight is decreased. The outlier score in the test week decreases because the new servers that the user accesses are recommended for the user’s job profile.
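The recommendation idea from these two slides (find similar users, recommend the servers they use, and down-weight recommended servers when scoring new activity) might be sketched as follows. Jaccard similarity as the peer measure, the `top_n` value, the 0.2 weight, the employee names, and the access sets are all assumptions made for illustration; only the server-group labels g1 to g5 echo the slide.

```python
def jaccard(a, b):
    """Similarity of two users measured on the sets of servers they access."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, access, top_n=2):
    """Servers used by the top_n most similar users that `user` has not used."""
    peers = sorted(
        (u for u in access if u != user),
        key=lambda u: jaccard(access[user], access[u]),
        reverse=True,
    )[:top_n]
    recommended = set()
    for peer in peers:
        recommended |= access[peer]
    return recommended - access[user]


def new_server_score(user, week_servers, access, recommended_weight=0.2):
    """Sum of weights over servers that are new for the user this week.

    A new server counts 1.0, or recommended_weight (an assumed value) if the
    recommender would have suggested it for this user.
    """
    recommended = recommend(user, access)
    score = 0.0
    for server in week_servers - access[user]:
        score += recommended_weight if server in recommended else 1.0
    return score


# Hypothetical historical access sets per employee.
access = {
    "alice": {"g1", "g2"},
    "bob": {"g1", "g2", "g3"},
    "carol": {"g1", "g2", "g4"},
    "dave": {"g9"},
}

suggested = recommend("alice", access)
peer_like_week = new_server_score("alice", {"g1", "g3", "g4"}, access)
unexpected_week = new_server_score("alice", {"g1", "g9"}, access)
```

In this toy data, a week in which `alice` starts using g3 and g4 scores low because her closest peers already use them, whereas a week touching g9 keeps its full weight. A production model would more likely combine job title and department from LDAP with access history when choosing peers.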
  • 20. Graph Model
      Using historical Windows event data to build graphs* of typical user behavior:
      •  Which machines does the user log into?
      •  Which machines does the user log in from?
      •  How often?
      •  In which order?
      Ask if this behavior is typical:
      •  Is it typical for this user?
      •  Is it typical for someone in a particular department?
      •  Is it typical for someone in the user’s job role?
      Graph models are sensitive to direction, order, and frequency
      [Figure: typical behavior and anomalous behavior graphs over hosts 34.23.123.4, 34.23.1.1, 34.23.0.1, 34.23.2.8 and 34.23.123.51; one node is labeled "DB with financial information"]
      *Reference: Alexander D. Kent, Lorie M. Liebrock, Joshua C. Neil. Authentication graphs: Analyzing user behavior within an enterprise network.
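A very small sketch of a per-user authentication graph is given below. It covers only direction and frequency (not login order), and the rarity score, the user name, and the event list are hypothetical; the IP addresses are borrowed from the slide purely as labels.

```python
from collections import Counter, defaultdict


def build_graphs(events):
    """Per-user directed login graph: (source, destination) -> count."""
    graphs = defaultdict(Counter)
    for user, src, dst in events:
        graphs[user][(src, dst)] += 1
    return graphs


def edge_rarity(graphs, user, src, dst):
    """1.0 for a never-seen directed edge, approaching 0.0 for the user's most
    frequent edge. An assumed scoring rule, not taken from the cited paper."""
    edges = graphs.get(user, Counter())
    total = sum(edges.values())
    if total == 0:
        return 1.0
    return 1.0 - edges[(src, dst)] / total


# Hypothetical historical login events: (user, source host, destination host).
history = [
    ("u1", "34.23.123.4", "34.23.1.1"),
    ("u1", "34.23.123.4", "34.23.1.1"),
    ("u1", "34.23.123.4", "34.23.1.1"),
    ("u1", "34.23.1.1", "34.23.0.1"),
]
graphs = build_graphs(history)

usual = edge_rarity(graphs, "u1", "34.23.123.4", "34.23.1.1")
reversed_edge = edge_rarity(graphs, "u1", "34.23.1.1", "34.23.123.4")
never_seen = edge_rarity(graphs, "u1", "34.23.2.8", "34.23.123.51")
```

Because edges are keyed by the ordered pair, a login in the reverse direction of a familiar path is treated as unseen, which is the direction sensitivity the slide describes. Comparing a user against department or job-role graphs would follow the same pattern with events pooled per group.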
  • 21. Fortune 100 Companies Leverage Pivotal to Tackle Enterprise-wide Security Risks with Analytics
      Challenge:
      •  Cybersecurity threats, data privacy and data protection issues, and fraudulent behavior going undetected, leaving the customer vulnerable to security risks and loss of money
      •  Need to gain timely insight into unusual/suspicious internal behavior to allow for proper action
      •  Tools in place cannot be customized to leverage historical security data and allow for predictive analytics
      Solution:
      •  Leveraged data science to show use cases analyzing their Active Directory data, identifying fraud, unapproved file sharing, etc.
      •  Utilized Big Data Suite, specifically Greenplum + MADlib + R, to store and analyze data, with the potential to build out a Hadoop data lake with HDB (aka HAWQ)
      Pivotal Solution includes: Pivotal Greenplum, Pivotal HDB, Apache MADlib
  • 22. Why Pivotal for Security Analytics
      •  Pivotal Data Science expertise and partnership with customers to identify high-value use cases to solve and to build a data science center of excellence for security analytics
      •  Tight integration with analytical tools that run in-database and across all of the data, to cover the most possible use cases
      •  Scalable solution that can grow as data needs grow, leveraging commodity hardware to keep costs low as data volume increases
      •  Join key Pivotal customers in the Security Advisory Council for collaboration and knowledge sharing
  • 23. Additional Resources & Next Steps
      Read: Pivotal Data Science Blog https://blog.pivotal.io/channels/data-science-pivotal
      Strategic: Pivotal Data Science Analytics Roadmapping Engagement https://pivotal.io/contact
      Tune in: Next data science webinar: “Using Data Science to Detect Healthcare Fraud, Waste, and Abuse,” March 14, 2017 https://pivotal.io/resources/1/webinars
      Hands on: Pivotal Greenplum Sandbox https://network.pivotal.io/products/pivotal-gpdb
      Apache MADlib (incubating) http://madlib.incubator.apache.org/
  • 24. Questions?