Josef Niedermeier, HPE
Apache Spark for Cyber Security in an Enterprise Company
#UnifiedDataAnalytics #SparkAISummit
Agenda
• Introduction
• Challenges in Cyber Security
• Using Spark to help process an increasing amount of data
– Offloading current applications
– Replacing current applications by Big Data technologies
• Adding additional detection capabilities by Machine Learning
– Machine Learning Introduction
– Use Cases
– High level architecture
– Lessons learned
• Q&A
Introduction - Team
[Diagram: the Global Cyber Security Fusion Center Data Science Team uses a Big Data Platform to turn Network Traffic Logs, Users and Actions into Actionable Intelligence. Related areas shown: Vulnerabilities, Risk and Governance, Cyber Security Operation Center, Advanced Threat, SIEM]
Introduction - SIEM
• SIEM - security information and event management
• Security Event Manager (SEM): generates alerts based on predefined rules and input events
• Security Information Manager (SIM): stores relevant cyber security data and allows querying to get context data
[Diagram: events enter the SIEM (aggregation, filtering, enriching); the SEM provides Alerts and the SIM provides Query/Context to Security Analysts]
Challenges in Cyber Security
• Scalability and performance
– Increasing amount of data: according to Gartner, 25K EPS (events per second) is enterprise size, but big organizations see several hundred thousand EPS.
– Limited storage for historical data.
– Long query response time.
– IoT makes the situation even worse.
• Quickly evolving requirements
• Lack of qualified and skilled professionals
Using Spark to help process an increasing amount of data
Offloading current applications
• Offload of aggregation, filtering and enriching
• Offload of storage and querying
[Diagram: events pass through Big Data Processing (aggregation, filtering, enriching) before reaching the SIEM (SEM, SIM), which provides Alerts and Query/Context to Security Analysts; Big Data Storage with an API and UI also serves Query/Context]
Big Data Processing – high level
[Diagram: NetFlow and syslog data arrive through the NetFlow Collector and the Syslog Collector; Distributed Processing (batch and streaming) performs deduplication, filtering, aggregation and enriching, supported by an In Memory Data Grid; HDFS, the Columnar Store and the SIEM are the connected stores and consumers]
Big Data Processing
Firewall logs aggregation
• Syslog Collector sends syslog events to Kafka. (custom build)
• High Available Load Balancer sends syslog events to live collectors. (custom build)
• Firewall Aggregation (5 sec. streaming job) aggregates events. (using DStream.reduceByKey)
• DNS enrichment adds DNS names using DHCP and DNS logs.
• SIEM Loader (5 sec. streaming job) sends aggregated events to the SIEM.
• Columnar Store Loader (5 sec. streaming job) loads aggregated events to the Columnar Store.
• Columnar Store offloads storage and querying.
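The aggregation step can be illustrated with a minimal plain-Python sketch of what a reduceByKey-style merge does inside one 5 sec. micro-batch. This is not the Spark job from the talk: the event fields (source IP, destination IP, port, action) and the helper names `reduce_by_key` and `aggregate_batch` are assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical firewall event: (source IP, destination IP, destination port, action).
Event = Tuple[str, str, int, str]


def reduce_by_key(pairs: Iterable[Tuple[Hashable, int]],
                  combine: Callable[[int, int], int]) -> Dict[Hashable, int]:
    """Merge the values of equal keys, as a reduceByKey-style operation does."""
    merged: Dict[Hashable, int] = {}
    for key, value in pairs:
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged


def aggregate_batch(events: Iterable[Event]) -> Dict[Hashable, int]:
    """Collapse identical events from one micro-batch into event -> count."""
    return reduce_by_key(((event, 1) for event in events), lambda a, b: a + b)


# One hypothetical micro-batch with three identical "allow" events.
batch = [
    ("10.0.0.5", "192.0.2.10", 443, "allow"),
    ("10.0.0.5", "192.0.2.10", 443, "allow"),
    ("10.0.0.7", "198.51.100.3", 22, "deny"),
    ("10.0.0.5", "192.0.2.10", 443, "allow"),
]
aggregated = aggregate_batch(batch)
for event, count in aggregated.items():
    print(event, count)
```

Four input events become two aggregated records, which is the kind of volume reduction the SIEM benefits from.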
Big Data Processing
Firewall logs aggregation
● Environment
  ● Inputs: 65,000 EPS and 32,000 EPS
  ● 5 sec micro-batches (Spark Streaming)
  ● 24 executors x 11 cores each on a non-dedicated, heavily utilized Hortonworks cluster
● Results
  ● The number of events is reduced by half
  ● Query times are reduced to seconds
SIEM functionality using BigData
technology
[Diagram: Events are processed by microservices (MS), including an Orchestration MS and an API/UI MS, on top of Big Data Storage; Security Analysts receive Alerts and Query/Context]
Microservices based on Big Data technologies implement SIEM functionality
● Easy to add/modify functionality
● Design driven by users
● Easier integration with processes
SIEM functionality using BigData technology
• Rule development and testing similar to software testing
  – Similar process and tools (Jira, Git etc.)
• Tools
  – Spark, In Memory Data Grid
• Preliminary Results
  – 15 - 20 minutes to test a rule on 24 h of data (2B events) (24 executors)
  – Linearly scalable
Rule Development → Unit Testing → Fast Forward Testing with Production Sample → Production Deployment
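Treating rule testing like software testing can be sketched as follows: a detection rule is a plain function, and a recorded sample is replayed through it to see which events would raise alerts. This is only an illustration in plain Python, not the Spark and In Memory Data Grid setup from the talk; the rule `repeated_deny_rule`, its threshold, and the event fields are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Event = Dict[str, Any]
Rule = Callable[[Event], bool]


def repeated_deny_rule(event: Event) -> bool:
    """Hypothetical rule: alert when an aggregated deny event repeats 100+ times."""
    return event["action"] == "deny" and event["count"] >= 100


def replay(rule: Rule, events: Iterable[Event]) -> List[Event]:
    """Run a rule over recorded events and return those that would alert."""
    return [event for event in events if rule(event)]


# Hypothetical recorded sample standing in for a production sample.
sample = [
    {"src": "10.0.0.7", "action": "deny", "count": 250},
    {"src": "10.0.0.8", "action": "deny", "count": 3},
    {"src": "10.0.0.9", "action": "allow", "count": 900},
]
alerts = replay(repeated_deny_rule, sample)
print([alert["src"] for alert in alerts])
```

The same `replay` call works for a small unit-test fixture and for a larger recorded sample, which is the idea behind testing a rule before production deployment.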
Adding additional detection capabilities by Machine Learning
Machine Learning - Introduction
[Figure: two scatter plots over x1 and x2 (both from 0 to 1), titled "Supervised Learning" and "Unsupervised Learning"]
Labeled data – supervised learning: we can find a function f and its parameters that fit training data and can be used for classification and regression.
Unlabeled data – unsupervised learning: we can derive structure from data and find outliers.
Machine Learning - Supervised
Training: finding a function and its parameters to fit training data
Training Labeled Data → Training Algorithm → Model Parameters (hypothesis)
Actual Classification/Regression
New Data → Classification/Regression Algorithm (using the Model Parameters) → Classification/Regression Results
Machine Learning – Example
● f: if x2 > (p0 + p1 * x1) then O else X
● Finding parameters to minimize the number of wrongly classified data points (cost function)

| p0  | p1   | Cost |
|-----|------|------|
| 0.6 | 0    | 3    |
| 0.9 | -0.9 | 2    |
| 0.8 | -0.7 | 0    |

[Figure: "Supervised Learning" scatter plots of the training labeled data over x1 and x2 (both from 0 to 1), with the line given by each set of parameters]
Machine Learning - example
Classification: if x2 > (0.8 – 0.7 * x1) then O else X
[Figure: new data and classified new data]
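The line classifier and its cost function can be sketched in a few lines. The training points below are hypothetical, since the slide's data points are not recoverable, so the costs of the first two candidate lines differ from the slide's table; only the parameter values (p0, p1) are taken from the slides.

```python
from typing import List, Tuple

# Hypothetical labeled point: (x1, x2, label) with label "O" or "X".
Point = Tuple[float, float, str]


def classify(x1: float, x2: float, p0: float, p1: float) -> str:
    """f: if x2 > (p0 + p1 * x1) then O else X."""
    return "O" if x2 > p0 + p1 * x1 else "X"


def cost(p0: float, p1: float, points: List[Point]) -> int:
    """Cost function: number of wrongly classified data points."""
    return sum(1 for x1, x2, label in points
               if classify(x1, x2, p0, p1) != label)


# Hypothetical training data, not the points from the slide.
training: List[Point] = [
    (0.2, 0.9, "O"), (0.5, 0.7, "O"), (0.9, 0.4, "O"), (0.8, 0.8, "O"),
    (0.1, 0.5, "X"), (0.4, 0.3, "X"), (0.7, 0.1, "X"), (0.9, 0.15, "X"),
]

# Candidate parameters (p0, p1) as listed on the slide.
candidates = [(0.6, 0.0), (0.9, -0.9), (0.8, -0.7)]
for p0, p1 in candidates:
    print(p0, p1, cost(p0, p1, training))

# "Training" here is just picking the candidate with the lowest cost.
best = min(candidates, key=lambda p: cost(p[0], p[1], training))
```

Real training algorithms search the parameter space instead of comparing a fixed list, but the objective is the same: minimize the cost on the labeled data.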
Machine Learning – Terminology
\[ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \]
= proportion of selected items that are relevant
\[ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \]
= proportion of relevant items that were selected
Source: https://en.wikipedia.org/wiki/Precision_and_recall
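As a small worked example, both measures can be computed directly from the counts of true positives, false positives and false negatives. The counts used here are hypothetical.

```python
def precision(true_positive: int, false_positive: int) -> float:
    """Proportion of selected items that are relevant."""
    return true_positive / (true_positive + false_positive)


def recall(true_positive: int, false_negative: int) -> float:
    """Proportion of relevant items that were selected."""
    return true_positive / (true_positive + false_negative)


# Hypothetical detector output: 8 real detections, 2 false alarms, 8 misses.
p = precision(true_positive=8, false_positive=2)
r = recall(true_positive=8, false_negative=8)
print(f"precision={p:.2f} recall={r:.2f}")
```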
Machine Learning – Challenges
● Too many false positives
  ● Precision ~ 99% can be too low
● Data cleanliness
  ● Wrong time on a device can be detected as an anomaly
● Missing labeled data
  ● Hard to evaluate recall
Machine Learning – Challenges
● An ML algorithm for detecting a specific malware infection:
  ● 99% of clean computers are classified as clean (specificity = 99%)
  ● recall = 99%
● The infection is relatively rare: 1% of computers are infected.
What is the probability that a computer is really infected if it is classified as infected?
(99% or 91% or 50% or 1%)
Is 99% good enough?
Machine Learning – Challenges
Suppose there are 10,000 computers:
● 100 are infected
  ● 99 infected are correctly classified as infected (true positive)
  ● 1 infected is classified as not infected (false negative)
● 9,900 are clean
  ● 99 are incorrectly classified as infected (false positive)
  ● 9,801 are correctly classified as not infected (true negative)
● 99 true positives and 99 false positives = 198 computers classified as infected, but only 99 are really infected, so the probability that a computer classified as infected is really infected is 50%.
Using Bayes' theorem:
\[ P(\text{infected} \mid \text{classified as infected}) = \frac{P(\text{classified as infected} \mid \text{infected}) \cdot P(\text{infected})}{P(\text{classified as infected})} = \frac{0.99 \cdot 0.01}{0.99 \cdot 0.01 + 0.01 \cdot 0.99} = 0.5 \]
Machine Learning – Challenges
● Usually a human should make the final assessment.
● Reasonable use cases:
  ● High ratio of “infection”
  ● Limited (selected) data
Classifier with specificity and recall of 99%:

| Infected computers [%] | Really infected / classified as infected [%] |
|------------------------|----------------------------------------------|
| 1.00%                  | 50%                                          |
| 0.10%                  | 9%                                           |
| 0.01%                  | 1%                                           |
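The table can be reproduced with Bayes' theorem. The sketch below assumes what the worked example with 10,000 computers implies: a recall of 99% and a false positive rate of 1% among clean computers. The function name `posterior_infected` is made up for illustration.

```python
def posterior_infected(prevalence: float,
                       recall: float = 0.99,
                       false_positive_rate: float = 0.01) -> float:
    """P(infected | classified as infected) via Bayes' theorem."""
    true_positive = recall * prevalence
    false_positive = false_positive_rate * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Share of infected computers taken from the table rows.
for prevalence in (0.01, 0.001, 0.0001):
    share = posterior_infected(prevalence)
    print(f"{prevalence:.2%} infected -> {share:.0%} really infected")
```

The rarer the infection, the more the false positives from the large clean population dominate the alerts.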
Machine Learning and Spark
● MLlib is Apache Spark's scalable machine learning library.
  ● ML algorithms
  ● ML workflow utilities (data → feature, evaluation, persistence, ...)
● Several deep learning frameworks
  ● Databricks – spark-deep-learning, Deep Learning Pipelines for Apache Spark
  ● Yahoo – TensorFlowOnSpark
  ● Intel – BigDL
  ● ...
Machine Learning Use Cases
| Use Case | Data source | Features | Algorithm |
|----------|-------------|----------|-----------|
| Detect malicious URL | Web proxy log | Entropy, number of special characters, path length, URL length, contains org. domain out of position, has been seen, ... | Random Forest, Long Short-Term Memory |
| Generated domains (malicious) detection | DNS log | Domain string | Long Short-Term Memory |
| Classify server account activity | Active Domain log | Network distance, organization distance, time distance | Naïve Bayes, Random Forest |
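Some of the URL features in the first row are simple to compute. The sketch below covers entropy, number of special characters, path length and URL length; the exact definitions used in the talk are not given, so treating every non-alphanumeric character as special and using character-level Shannon entropy are assumptions.

```python
import math
from collections import Counter
from typing import Dict
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def url_features(url: str) -> Dict[str, float]:
    """Extract a few numeric features from a URL (assumed definitions)."""
    parsed = urlparse(url)
    return {
        "entropy": shannon_entropy(url),
        # Assumption: any non-alphanumeric character counts as special.
        "special_chars": sum(1 for ch in url if not ch.isalnum()),
        "path_length": len(parsed.path),
        "url_length": len(url),
    }


features = url_features("http://example.com/a/b.php?id=1")
print(features)
```

A feature vector like this would then be fed to a classifier such as a Random Forest.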
Machine Learning Use Cases
| Use Case | Data source | Features | Algorithm |
|----------|-------------|----------|-----------|
| Detect command and control communication | Netflow data | Duration of TCP/IP session, cardinality, octets/packet etc. | Naïve Bayes, Random Forest |
Machine Learning - Architecture
Training (Spark MLlib batch job): Training Data → Feature extractor → Algorithm Training → Model parameters (HDFS)
Machine Learning - Architecture
Classification (Spark MLlib batch or streaming job): New Data → Feature extractor → Algorithm Classification (using Model parameters from HDFS) → Classified data
Machine Learning – Lessons Learned
● Do not implement ML just to tick the “we are using ML” box.
● Have good use cases, including precision and recall requirements.
● Visualization can be more useful than ML in some cases.
● In most cases, a detection needs to be validated by an analyst.
● Cyber security analysts appreciate reasoning (why the classifier decided that something is malicious).
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

More Related Content

What's hot

Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...Altinity Ltd
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufVerverica
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowWes McKinney
 
Camel Day Italia 2021 - Camel K
Camel Day Italia 2021 - Camel KCamel Day Italia 2021 - Camel K
Camel Day Italia 2021 - Camel KNicola Ferraro
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkDatabricks
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon GershinskyDatabricks
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...Databricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestDatabricks
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar ZecevicDataScienceConferenc1
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 

What's hot (20)

Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...HTTP Analytics for 6M requests per second using ClickHouse, by  Alexander Boc...
HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 
Solving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache ArrowSolving Enterprise Data Challenges with Apache Arrow
Solving Enterprise Data Challenges with Apache Arrow
 
Camel Day Italia 2021 - Camel K
Camel Day Italia 2021 - Camel KCamel Day Italia 2021 - Camel K
Camel Day Italia 2021 - Camel K
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Keeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache SparkKeeping Identity Graphs In Sync With Apache Spark
Keeping Identity Graphs In Sync With Apache Spark
 
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
Efficient Spark Analytics on Encrypted Data with Gidon Gershinsky
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
ELK introduction
ELK introductionELK introduction
ELK introduction
 
Map reduce vs spark
Map reduce vs sparkMap reduce vs spark
Map reduce vs spark
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 

Similar to Apache Spark for Cyber Security in an Enterprise Company

July 2021 Virtual PNW Splunk User Group Slides
July 2021 Virtual PNW Splunk User Group SlidesJuly 2021 Virtual PNW Splunk User Group Slides
July 2021 Virtual PNW Splunk User Group SlidesAmanda Richardson
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...Alluxio, Inc.
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
Big Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsBig Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsMarco Casassa Mont
 
Unified Data Access with Gimel
Unified Data Access with GimelUnified Data Access with Gimel
Unified Data Access with GimelAlluxio, Inc.
 
Data orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelData orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelDeepak Chandramouli
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDLionel Mommeja
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent MonitoringIntelie
 
Visualization in the Age of Big Data
Visualization in the Age of Big DataVisualization in the Age of Big Data
Visualization in the Age of Big DataRaffael Marty
 
Use Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data ClustersUse Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data ClustersDatabricks
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchSylvain Wallez
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Databricks
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Guglielmo Iozzia
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Precisely
 
230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptxArthur240715
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Codemotion
 

Similar to Apache Spark for Cyber Security in an Enterprise Company (20)

July 2021 Virtual PNW Splunk User Group Slides
July 2021 Virtual PNW Splunk User Group SlidesJuly 2021 Virtual PNW Splunk User Group Slides
July 2021 Virtual PNW Splunk User Group Slides
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
Big Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsBig Data for Security - DNS Analytics
Big Data for Security - DNS Analytics
 
Unified Data Access with Gimel
Unified Data Access with GimelUnified Data Access with Gimel
Unified Data Access with Gimel
 
Data orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | GimelData orchestration | 2020 | Alluxio | Gimel
Data orchestration | 2020 | Alluxio | Gimel
 
Using bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-REDUsing bluemix predictive analytics service in Node-RED
Using bluemix predictive analytics service in Node-RED
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent Monitoring
 
Visualization in the Age of Big Data
Visualization in the Age of Big DataVisualization in the Age of Big Data
Visualization in the Age of Big Data
 
Use Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data ClustersUse Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data Clusters
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
 
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
Building a data pipeline to ingest data into Hadoop in minutes using Streamse...
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx230208 MLOps Getting from Good to Great.pptx
230208 MLOps Getting from Good to Great.pptx
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 




Apache Spark for Cyber Security in an Enterprise Company

  • 1. WIFI SSID:Spark+AISummit | Password: UnifiedDataAnalytics
  • 2. Josef Niedermeier, HPE Apache Spark for Cyber Security in an Enterprise Company #UnifiedDataAnalytics #SparkAISummit
  • 3. Agenda
      • Introduction
      • Challenges in Cyber Security
      • Using Spark to help process an increasing amount of data
          – Offloading current applications
          – Replacing current applications with Big Data technologies
      • Adding detection capabilities with Machine Learning
          – Machine Learning introduction
          – Use cases
          – High-level architecture
          – Lessons learned
      • Q&A
  • 4. Introduction - Team
      • Diagram labels: Network Traffic Logs Users Actions; Big Data Platform; Actionable Intelligence; Global Cyber Security Fusion Center Data Science Team; Vulnerabilities; Risk and Governance; Cyber Security Operation Center; Advanced Threat
  • 5. Introduction - SIEM
      • SIEM: security information and event management
      • Security Event Manager (SEM): generates alerts based on predefined rules and input events
      • Security Information Manager (SIM): stores relevant cyber security data and allows querying to get context data
      • Diagram labels: events; SIEM; SEM; SIM; Aggregation, Filtering, Enriching; Alerts; Query/Context; Security Analysts
  • 6. Challenges in Cyber Security
      • Scalability and performance
          – Increasing amount of data: according to Gartner, 25K EPS (events per second) is enterprise size, but big organizations see several hundred thousand EPS.
          – Limited storage for historical data.
          – Long query response times.
          – IoT makes the situation even worse.
      • Quickly evolving requirements
      • Lack of qualified and skilled professionals
  • 7. Using Spark to help process an increasing amount of data
  • 8. Offloading current applications
      • Offload of aggregation, filtering, and enriching to Big Data Processing
      • Offload of storage and querying to Big Data Storage
      • Diagram labels: events; Big Data Processing; SIEM (SEM, SIM); Aggregation, Filtering, Enriching; Big Data Storage; API; UI; Alerts; Query/Context; Security Analysts
  • 9. Big Data Processing: high level
      • Distributed processing (batch and streaming): deduplication, filtering, aggregation, enriching
      • Diagram labels: NetFlow; Syslog; NetFlow Collector; Syslog Collector; Log; HDFS; Columnar Store; In Memory Data Grid; SIEM
  • 10. Big Data Processing: firewall logs aggregation
  • 11. Big Data Processing: firewall logs aggregation
      • Syslog Collector (custom build) sends syslog events to Kafka.
      • Highly available load balancer (custom build) sends syslog events to live collectors.
  • 12. Big Data Processing: firewall logs aggregation
      • Firewall Aggregation (5-second streaming job) aggregates events using DStream.reduceByKey.
      • DNS enrichment adds DNS names using DHCP and DNS logs.
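The aggregation step on slide 12 can be illustrated with a minimal sketch in plain Python. It only mimics, for a single micro-batch, what a `reduceByKey` with an addition function produces; it is not the speaker's job. The key layout (source IP, destination IP, destination port, action) and the sample events are assumptions made for illustration, since the slides do not say which fields are used as the aggregation key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical aggregation key: (source IP, destination IP, destination port, action).
FlowKey = Tuple[str, str, int, str]


def aggregate_batch(events: Iterable[Tuple[FlowKey, int]]) -> Dict[FlowKey, int]:
    """Sum the counts of all events sharing a key within one micro-batch.

    This mirrors the result of reduceByKey(lambda a, b: a + b) on (key, count) pairs.
    """
    totals: Dict[FlowKey, int] = defaultdict(int)
    for key, count in events:
        totals[key] += count
    return dict(totals)


# Made-up firewall events for one 5-second micro-batch.
batch = [
    (("10.0.0.5", "192.0.2.10", 443, "allow"), 1),
    (("10.0.0.5", "192.0.2.10", 443, "allow"), 1),
    (("10.0.0.7", "192.0.2.99", 22, "deny"), 1),
    (("10.0.0.5", "192.0.2.10", 443, "allow"), 1),
]

aggregated = aggregate_batch(batch)
for key, count in sorted(aggregated.items()):
    print(key, count)
```

Four input events collapse into two aggregated records here; how much a real stream shrinks depends on how often keys repeat within a batch.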
  • 13. Big Data Processing: firewall logs aggregation
      • SIEM Loader (5-second streaming job) sends aggregated events to the SIEM.
  • 14. Big Data Processing: firewall logs aggregation
      • Columnar Store Loader (5-second streaming job) loads aggregated events into the Columnar Store.
      • The Columnar Store offloads storage and querying.
  • 15. Big Data Processing: firewall logs aggregation
      • Environment
          – Inputs: 65,000 EPS and 32,000 EPS
          – 5-second micro-batches (Spark Streaming)
          – 24 executors x 11 cores each on a non-dedicated, heavily utilized Hortonworks cluster
      • Results
          – The number of events is reduced by half
          – Query times are reduced to seconds
  • 16. SIEM functionality using Big Data technology
      • Microservices based on Big Data technologies implement SIEM functionality
          – Easy to add or modify functionality
          – Design driven by users
          – Easier integration with processes
      • Diagram labels: Events; MS (microservices); Orchestration; API/UI; Big Data Storage; Alerts; Query/Context; Security Analysts
  • 17. SIEM functionality using Big Data technology
      • Rule development and testing are similar to software testing
      • Similar process and tools (Jira, Git, etc.)
      • Tools: Spark, In Memory Data Grid
      • Preliminary results
          – 15-20 minutes to test a rule on 24 hours of data (2B events) with 24 executors
          – Linearly scalable
      • Workflow stages: Rule Development; Unit Testing; Fast Forward Testing With Production Sample; Production Deployment
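To make the "rule testing is like software testing" idea on slide 17 concrete, here is a small sketch of a detection rule written as a pure function with a unit test beside it. The rule itself (flag a source IP after repeated failed logins), the field names `src_ip` and `outcome`, and the threshold are all hypothetical; the slides do not show any actual rule.

```python
from collections import Counter
from typing import Dict, Iterable, List


def failed_login_rule(events: Iterable[Dict[str, str]], threshold: int = 3) -> List[str]:
    """Hypothetical rule: return source IPs with at least `threshold` failed logins."""
    failures = Counter(
        event["src_ip"] for event in events if event.get("outcome") == "failure"
    )
    return sorted(ip for ip, count in failures.items() if count >= threshold)


def test_failed_login_rule() -> None:
    """Unit test on a tiny hand-made sample, the first stage of the workflow."""
    events = [
        {"src_ip": "10.0.0.5", "outcome": "failure"},
        {"src_ip": "10.0.0.5", "outcome": "failure"},
        {"src_ip": "10.0.0.5", "outcome": "failure"},
        {"src_ip": "10.0.0.9", "outcome": "failure"},
        {"src_ip": "10.0.0.9", "outcome": "success"},
    ]
    assert failed_login_rule(events) == ["10.0.0.5"]
    assert failed_login_rule(events, threshold=5) == []


test_failed_login_rule()
print("rule unit test passed")
```

Because the rule is a plain function of its input events, the same code could in principle be run against a recorded production sample for the fast-forward stage, although the slides do not describe how that replay is implemented.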
  • 18. Adding additional detection capabilities by Machine Learning
  • 19. Machine Learning - Introduction
      • Labeled data (supervised learning): we can find a function f and its parameters that fit the training data and can be used for classification and regression.
      • Unlabeled data (unsupervised learning): we can derive structure from the data and find outliers.
      • Figures: two scatter plots over the axes x1 and x2 (each from 0 to 1), titled Supervised Learning and Unsupervised Learning
  • 20. Machine Learning - Supervised
      • Training: finding a function and its parameters to fit the training data
      • Actual classification/regression
      • Diagram labels: Training Labeled Data; Training Algorithm; Model Parameters (hypothesis); New Data; Classification/Regression Algorithm; Classification/Regression Results
  • 21. Machine Learning - Example
      • f: if x2 > (p0 + p1 * x1) then O else X
      • Training finds the parameters that minimize the number of wrongly classified data points (the cost function)
      • Candidate parameters and their cost:
          – p0 = 0.6, p1 = 0: cost 3
          – p0 = 0.9, p1 = -0.9: cost 2
          – p0 = 0.8, p1 = -0.7: cost 0
      • Figures: scatter plots of the labeled training data over x1 and x2 (each from 0 to 1) with the candidate lines
  • 22. Machine Learning - Example
      • Classification: if x2 > (0.8 - 0.7 * x1) then O else X
      • Figures: New data; Classified new data
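The training and classification steps of slides 21 and 22 can be sketched as follows. The decision function and the three candidate parameter pairs come from the slides; the training points do not (the slides only show them as a plot), so the ten points below are made up and were chosen only so that the three candidates get the same costs as in the slide (3, 2, and 0).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, str]  # (x1, x2, label), where label is "O" or "X"


def classify(x1: float, x2: float, p0: float, p1: float) -> str:
    """Decision function from the slide: O if x2 > p0 + p1 * x1, otherwise X."""
    return "O" if x2 > p0 + p1 * x1 else "X"


def cost(points: List[Point], p0: float, p1: float) -> int:
    """Cost function from the slide: number of wrongly classified training points."""
    return sum(1 for x1, x2, label in points if classify(x1, x2, p0, p1) != label)


# Made-up labeled training data (not taken from the slides).
training: List[Point] = [
    (0.9, 0.9, "O"), (0.8, 0.4, "O"), (0.6, 0.5, "O"), (0.1, 0.78, "O"), (0.5, 0.9, "O"),
    (0.1, 0.2, "X"), (0.1, 0.65, "X"), (0.9, 0.13, "X"), (0.4, 0.3, "X"), (0.7, 0.1, "X"),
]

# Candidate (p0, p1) pairs listed on slide 21.
candidates = [(0.6, 0.0), (0.9, -0.9), (0.8, -0.7)]
costs: Dict[Tuple[float, float], int] = {c: cost(training, *c) for c in candidates}
best = min(candidates, key=lambda c: costs[c])

print(costs)
print("best parameters:", best)
print("new point (0.3, 0.7) ->", classify(0.3, 0.7, *best))
```

Picking the best of three fixed candidates stands in for a real training algorithm, which would search the parameter space instead of comparing a hand-picked list.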
  • 23. Machine Learning - Terminology
      • Precision = True Positive / (True Positive + False Positive): the proportion of selected items that are relevant
      • Recall = True Positive / (True Positive + False Negative): the proportion of relevant items that were selected
      • Source: https://en.wikipedia.org/wiki/Precision_and_recall
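As a small illustration of the two definitions on slide 23, the sketch below computes both metrics from confusion-matrix counts. The counts in the example are invented, and returning 0.0 when the denominator is zero is a convention chosen here, not something the slides specify.

```python
def precision(true_positive: int, false_positive: int) -> float:
    """Proportion of selected items that are relevant: TP / (TP + FP)."""
    selected = true_positive + false_positive
    return true_positive / selected if selected else 0.0


def recall(true_positive: int, false_negative: int) -> float:
    """Proportion of relevant items that were selected: TP / (TP + FN)."""
    relevant = true_positive + false_negative
    return true_positive / relevant if relevant else 0.0


# Invented counts: 8 true positives, 2 false positives, 4 false negatives.
print("precision:", precision(8, 2))
print("recall:", round(recall(8, 4), 3))
```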
  • 24. Machine Learning - Challenges
      • Too many false positives
          – Precision ~ 99% can be too low
      • Data cleanliness
          – Wrong time on a device can be detected as an anomaly
      • Missing labeled data
          – Hard to evaluate recall
  • 25. Machine Learning - Challenges
      • Is 99% precision good enough?
      • An ML algorithm for detecting a specific malware infection has:
          – precision = 99%
          – recall = 99%
      • The infection is relatively rare: 1% of computers are infected.
      • What is the probability that a computer is really infected if it is classified as infected? (99%, 91%, 50%, or 1%)
  • 26. Machine Learning - Challenges
      • Suppose there are 10,000 computers:
          – 100 are infected: 99 are correctly classified as infected (true positive) and 1 is classified as not infected (false negative)
          – 9,900 are clean: 99 are incorrectly classified as infected (false positive) and 9,801 are correctly classified as not infected (true negative)
      • The 99% figure is applied here to each group separately: 99% of infected and 99% of clean computers are classified correctly, so 1% of clean computers are false positives.
      • 99 true positives and 99 false positives give 198 computers classified as infected, but only 99 are really infected, so the probability that a computer classified as infected is really infected is 50%.
      • Using Bayes' theorem: P(infected | classified as infected) = P(classified as infected | infected) * P(infected) / P(classified as infected) = (0.99 * 0.01) / (0.99 * 0.01 + 0.01 * 0.99) = 0.5
  • 27. Machine Learning - Challenges
      • Usually a human should make the final assessment.
      • Reasonable use cases:
          – High ratio of "infection"
          – Limited (selected) data
      • Classifier with precision and recall of 99%: share of infected computers versus share of computers classified as infected that are really infected
          – 1.00% infected: 50%
          – 0.10% infected: 9%
          – 0.01% infected: 1%
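The Bayes calculation on slide 26 and the table on slide 27 can be reproduced with a few lines of Python. The sketch assumes, as the worked example with 10,000 computers does, that 99% of infected computers are flagged and 1% of clean computers are flagged by mistake; the function and parameter names are chosen here for illustration.

```python
def prob_infected_given_flagged(
    prevalence: float,
    recall: float = 0.99,
    false_positive_rate: float = 0.01,
) -> float:
    """Bayes' theorem: P(infected | classified as infected)."""
    flagged_and_infected = recall * prevalence
    flagged_and_clean = false_positive_rate * (1.0 - prevalence)
    return flagged_and_infected / (flagged_and_infected + flagged_and_clean)


# Shares of infected computers used in the table on slide 27.
for prevalence in (0.01, 0.001, 0.0001):
    probability = prob_infected_given_flagged(prevalence)
    print(f"{prevalence:.2%} infected -> {probability:.0%} of flagged computers are really infected")
```

The rarer the infection, the more the false positives from the large clean population dominate, which is why the slides recommend use cases with a high ratio of infection or a pre-selected data set.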
  • 28. Machine Learning and Spark
      • MLlib is Apache Spark's scalable machine learning library.
          – ML algorithms
          – ML workflow utilities (data → feature, evaluation, persistence, ...)
      • Several deep learning frameworks
          – Databricks: spark-deep-learning (Deep Learning Pipelines for Apache Spark)
          – Yahoo: TensorFlowOnSpark
          – Intel: BigDL
          – ...
  • 29. Machine Learning Use Cases
      • Detect malicious URL
          – Data source: web proxy log
          – Features: entropy, number of special characters, path length, URL length, contains org. domain out of position, has been seen, ...
          – Algorithm: Random Forest, Long Short-Term Memory
      • Generated (malicious) domain detection
          – Data source: DNS log
          – Features: domain string
          – Algorithm: Long Short-Term Memory
      • Classify server account activity
          – Data source: Active Domain log
          – Features: network distance, organization distance, time distance
          – Algorithm: Naïve Bayes, Random Forest
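A few of the URL features named on slide 29 (entropy, number of special characters, path length, URL length) are simple enough to sketch with the Python standard library. The exact definitions are assumptions: the slides do not say how entropy is computed or which characters count as special, so this sketch uses Shannon entropy over characters and treats every non-alphanumeric character as special.

```python
import math
from collections import Counter
from typing import Dict
from urllib.parse import urlparse


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character (0.0 for an empty string)."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def url_features(url: str) -> Dict[str, float]:
    """Extract a handful of the features listed on the slide from one URL."""
    return {
        "url_length": len(url),
        "path_length": len(urlparse(url).path),
        # Assumption: any non-alphanumeric character counts as special.
        "special_chars": sum(1 for ch in url if not ch.isalnum()),
        "entropy": shannon_entropy(url),
    }


features = url_features("http://example.com/a/b?x=1")
print(features)
```

In a pipeline like the one on the architecture slides, such a function would play the role of the feature extractor whose output is fed to the classifier.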
  • 30. Machine Learning Use Cases
      • Detect command and control communication
          – Data source: NetFlow data
          – Features: duration of TCP/IP session, cardinality, octets/packet, etc.
          – Algorithm: Naïve Bayes, Random Forest
  • 31. Machine Learning - Architecture
      • Training runs as a Spark MLlib batch job.
      • Diagram labels: Training Data; Feature extractor; Algorithm Training; Model parameters; HDFS
  • 32. Machine Learning - Architecture
      • Classification runs as a Spark MLlib batch or streaming job.
      • Diagram labels: New Data; Feature extractor; Algorithm Classification; Model parameters; HDFS; Classified data
  • 33. Machine Learning - Lessons Learned
      • Do not implement ML just to tick the "we are using ML" box
      • Have good use cases, including precision and recall requirements
      • Visualization can be more useful than ML in some cases
      • In most cases, a detection needs to be validated by an analyst.
      • Cyber security analysts like to see the reasoning (why the classifier decided that something is malicious)