Security analytics at web scale


OLAP stands for online analytical processing. It focuses on reporting and, in a broader sense, on answering schema-oriented queries quickly. Example queries are “how many distinct infections were seen for a threat in a given month” or “what is the maximum duration in the last month that a particular infection was seen in my enterprise”. Contrast this with OLTP, or online transaction processing, where storing a fast stream of transactional elements is more important.

In the relational OLAP world, the Star Schema is the first concept that comes to mind. Modeling OLAP data in Star Schema format means segregating data into Fact and Dimension tables. The central fact table holds a few dimensions, which together constitute a fact, and one or more measures that we try to calculate. A measure is often a derived field and can be computed with SQL GROUP BY clauses and aggregate functions.

We use Spark and HBase to implement a Hybrid OLAP system. We call it hybrid because we store data in both relational (ROLAP) and multi-dimensional (MOLAP) formats.

MOLAP materialization is best visualized as a lattice. Each of the circular points in the lattice diagram is called a Tile or Cuboid. Each tile can be thought of as the equivalent of a GROUP BY clause in SQL; aggregates such as SUM or COUNT are implicit and not shown in the diagram. Reading the lattice from bottom to top, we drop one of the three fields (Infection_type, country, monthId) at each level; the 2-D cuboids, for example, are obtained by dropping one field at a time from the base cuboid. This is called roll-up. Conversely, if we start from the top, i.e. the 0-D cuboid, and move downwards, we group by one additional field at each level; this is called drill-down. There is a large body of literature on how to do roll-up and drill-down efficiently and on which cuboids to materialize. I strongly recommend Han and Kamber's Data Mining book and the lattice paper by Harinarayan et al. for a deep understanding of this domain.
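As a minimal sketch of the star-schema idea described above, the following uses Python's built-in sqlite3 module with an in-memory database. The table and column names (fact_detection, dim_threat, machine_id, duration_s) and the sample rows are hypothetical and chosen only for illustration; they are not taken from the system described in the talk.

```python
import sqlite3

# In-memory database: one dimension table and one central fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_threat (threat_id INTEGER PRIMARY KEY, infection_type TEXT);
CREATE TABLE fact_detection (
    threat_id  INTEGER REFERENCES dim_threat(threat_id),
    month_id   INTEGER,
    machine_id TEXT,
    duration_s INTEGER
);
""")
conn.executemany("INSERT INTO dim_threat VALUES (?, ?)",
                 [(1, "GEN"), (2, "TROJAN")])
conn.executemany(
    "INSERT INTO fact_detection VALUES (?, ?, ?, ?)",
    [
        (1, 1, "host-a", 30),
        (1, 1, "host-a", 45),   # same machine seen twice in month 1
        (1, 1, "host-b", 10),
        (1, 2, "host-c", 20),
        (2, 1, "host-a", 5),
    ],
)


def measures_by_threat_and_month(connection):
    """Derive two measures with GROUP BY and aggregate functions."""
    query = """
        SELECT t.infection_type,
               f.month_id,
               COUNT(DISTINCT f.machine_id) AS distinct_machines,
               MAX(f.duration_s)            AS max_duration_s
        FROM fact_detection f
        JOIN dim_threat t ON t.threat_id = f.threat_id
        GROUP BY t.infection_type, f.month_id
        ORDER BY t.infection_type, f.month_id
    """
    return connection.execute(query).fetchall()


rows = measures_by_threat_and_month(conn)
```

Each returned row is one (infection type, month) group with its two derived measures, which is the kind of result a ROLAP back end computes at query time.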


Security Analytics at Web Scale
pratim_mukherjee@symantec.com

Why Should You Care!

- Bangladesh Bank Chief Resigns After Cyber Theft of $81 million
  - New York Times (Mar 15, 2016)
- Cybercrime is a key fraud risk in India
  - ey.com (Jan 20, 2016)
- Target settles for $39 million over data breach
  - CNN.com (Dec 2, 2015)
- Anthem is warning consumers about its huge data breach
  - Los Angeles Times (Mar 2015)
- Ashley Madison
  - Anyone Here!!

What is Security Analytics

- Incident Response
  - Identify root cause and fix vulnerabilities
- Intrusion Detection
  - Monitor network and systems for malicious activities
- Alert Prioritization
  - Reduce false positives to stop the threat with highest impact
- Predicting Compromises
  - Predict attacks based on vulnerability, command & control activity and past infections
- Access Analytics
  - Isolate unusual user behavior, e.g. concurrent geographical login
- Simulation
  - Simulate various attacks through internal pen testing and take precautions based on log mining
  - Simulate insider attacks on data loss prevention software and take precautions based on its logs

Web Scale - Dealing with Petabytes

- No real-time query on petabytes
- Reduce data in stages, like a funnel

[Pipeline diagram; labels: Streaming Logs, Kafka, Log Parser, Hive, Semi Aggregates, HBase, MOLAP Cubes, Kafka Client]

Hybrid OLAP

- Relational OLAP (ROLAP)
  - SQL-style queries from client front-end tools against a relational back-end database
  - ROLAP servers include optimization for each DBMS back end, implementation of aggregation navigation logic, and additional tools and services
  - ROLAP technology tends to have greater scalability than MOLAP technology
- Multi-dimensional OLAP (MOLAP)
  - Queries materialized views; think of Partially Ordered Sets (posets)
  - The advantage of a data cube is that it allows fast indexing into pre-computed summarized data, and it is usually much faster than ROLAP
  - Difficult to scale because of the “curse of dimensionality”

Visualization of MOLAP as Lattice

0-D (apex) cuboid: ()
1-D cuboids: (Infection_type), (monthId), (country)
2-D cuboids: (Infection_type,monthId), (country,monthId), (Infection_type,country)
3-D (base) cuboid: (Infection_type,country,monthId)
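The lattice levels can be enumerated mechanically. The sketch below is a minimal illustration, assuming a base cuboid held in a plain Python dict with hypothetical, additive counts; the helper names enumerate_cuboids and roll_up are made up for this example.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("Infection_type", "country", "monthId")


def enumerate_cuboids(dimensions):
    """Map each level k (0-D .. n-D) to the list of cuboids with k fields."""
    return {k: list(combinations(dimensions, k))
            for k in range(len(dimensions) + 1)}


def roll_up(base_cuboid, dimensions, keep):
    """Aggregate the base cuboid down to the fields in `keep` by summing counts."""
    positions = [dimensions.index(field) for field in keep]
    rolled = defaultdict(int)
    for key, count in base_cuboid.items():
        rolled[tuple(key[i] for i in positions)] += count
    return dict(rolled)


# Hypothetical base (3-D) cuboid: (Infection_type, country, monthId) -> count.
base = {
    ("GEN", "JP", 1): 10,
    ("GEN", "JP", 2): 12,
    ("GEN", "JO", 1): 2,
    ("GEN", "JO", 2): 1,
}

lattice = enumerate_cuboids(DIMENSIONS)
by_type_country = roll_up(base, DIMENSIONS, ("Infection_type", "country"))
apex = roll_up(base, DIMENSIONS, ())
```

With three dimensions the lattice has 1 + 3 + 3 + 1 = 8 cuboids. The number doubles with every added dimension, which is why deciding which cuboids to materialize matters.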

HBase MOLAP View

ROWKEY format: [Infection_type,country,monthId]
Aggregate Column Family: Detection Count (COUNT Distinct), Hyperloglog Hash

ROWKEY      Detection Count   Hyperloglog Hash
GEN-JP-1    10                4a44dc15364204a
GEN-JP-2    12                e80e9039455cc
GEN-JP-3    9                 f1e5233ade6af
GEN-JP-4    15                a80fe80e90
GEN-JP-5    5                 3ade6af1dd5
GEN-JP-6    12                a44dc1536420
GEN-JO-1    2                 ….
GEN-JO-2    1                 ….
GEN-JO-3    0                 ….
GEN-JO-4    5                 …..
GEN-JO-5    2                 …..
GEN-JO-6    1                 …...

** Hashes are representative
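To illustrate why a HyperLogLog column sits next to the count, here is a minimal sketch of rolling up two monthly rows into one (Infection_type, country) aggregate. It assumes a simplified, hand-written HyperLogLog (not Algebird's implementation and not the compact encoding stored in HBase), a plain dict standing in for the HBase table, and hypothetical host names. Plain event counts are summed, while distinct counts are combined by merging sketches.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog sketch, for illustration only."""

    def __init__(self, p=10):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")           # 64-bit hash
        index = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1    # leading zeros + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        """Monoid 'plus': register-wise max; an empty sketch is the identity."""
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:       # small-range (linear counting) correction
            return self.m * math.log(self.m / zeros)
        return raw


def row_key(infection_type, country, month_id):
    """Build a row key in the Infection_type-country-monthId layout."""
    return f"{infection_type}-{country}-{month_id}"


def make_row(machine_events):
    """One aggregate row: event count plus a sketch of the distinct machines."""
    hll = HyperLogLog()
    for machine in machine_events:
        hll.add(machine)
    return {"detection_count": len(machine_events), "hll": hll}


def roll_up_months(rows, infection_type, country, month_ids):
    """Roll monthly rows up to (infection_type, country)."""
    total = 0
    merged = HyperLogLog()
    for month_id in month_ids:
        row = rows[row_key(infection_type, country, month_id)]
        total += row["detection_count"]      # counts add up
        merged = merged.merge(row["hll"])    # distinct sets are unioned
    return total, merged


# Hypothetical data: 500 machines appear in both months.
month1 = [f"host-{i}" for i in range(0, 1000)]
month2 = [f"host-{i}" for i in range(500, 1500)]
rows = {
    row_key("GEN", "JP", 1): make_row(month1),
    row_key("GEN", "JP", 2): make_row(month2),
}
total, merged = roll_up_months(rows, "GEN", "JP", [1, 2])
distinct_estimate = merged.estimate()
```

Summing the two monthly counts gives 2000 events, while the merged sketch estimates roughly 1500 distinct machines, because the overlapping hosts are counted only once. A distinct count cannot be rolled up by addition, but the sketch can.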

Probabilistic Data Structures

- HyperLogLog
  - Used for approximate count-distinct queries
  - Store HLL hash in 5 bytes in HBase columns
  - Apply the monoid SUM pattern to roll up
- Bloom Filter
  - Used for checking whether an incoming stream element is “not” a member of a set
  - False negatives never happen, i.e. a “definitely not in set” answer is always correct; false positives are possible
  - Also used by HBase to ascertain whether an input row key is part of an HFile
- Count-Min Sketch
  - Used for counting frequencies of specific elements in sub-linear space
- Twitter’s Algebird library with Spark for HLL and CMS implementation
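The Bloom filter and Count-Min Sketch behaviour listed above can be sketched in a few lines. This is a hand-written illustration, not Algebird's API; the class names, sizes, and the salted SHA-256 hashing scheme are assumptions made for the example.

```python
import hashlib


def _indexes(item, count, width):
    """Derive `count` indexes in [0, width) from salted SHA-256 digests."""
    for seed in range(count):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % width


class BloomFilter:
    """Set membership with no false negatives and occasional false positives."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def add(self, item):
        for index in _indexes(item, self.num_hashes, self.size):
            self.bits[index] = True

    def might_contain(self, item):
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[i]
                   for i in _indexes(item, self.num_hashes, self.size))


class CountMinSketch:
    """Frequency estimates that may over-count but never under-count."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item, count=1):
        for row, col in enumerate(_indexes(item, self.depth, self.width)):
            self.table[row][col] += count

    def estimate(self, item):
        # Taking the minimum over rows limits the effect of hash collisions.
        return min(self.table[row][col]
                   for row, col in enumerate(_indexes(item, self.depth, self.width)))


seen = BloomFilter()
freq = CountMinSketch()
for event in ["GEN", "GEN", "TROJAN", "GEN"]:
    seen.add(event)
    freq.add(event)

gen_seen = seen.might_contain("GEN")     # always True once added
gen_frequency = freq.estimate("GEN")     # at least the true count of 3
```

Both structures use fixed memory regardless of stream length, and they trade exactness for a one-sided error: the Bloom filter never misses an element it has seen, and the Count-Min Sketch never reports less than the true frequency.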

Real-Time Query Response Server

[Flowchart, as recoverable from the diagram labels]
Incoming Query -> Query Controller -> Is Cuboid Found?
  Yes -> Calcite HBase Adapter -> HBaseQuery
  No  -> Spark Driver on Jetty -> SparkSQLQuery
Data stores: HDFS/Hive/HBase
Output: Query Response
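Reading the diagram labels as a routing decision (a materialized cuboid is answered from HBase, anything else falls back to Spark SQL), the controller logic could look roughly like the sketch below. The QueryController class and the two stub back ends are hypothetical; the stubs only return tuples and do not call Calcite, HBase, or Spark.

```python
class QueryController:
    """Route a GROUP BY query to a precomputed cuboid or to a fallback engine."""

    def __init__(self, materialized_cuboids, hbase_backend, spark_backend):
        # Field order does not matter when matching a cuboid, so use frozensets.
        self.materialized = {frozenset(c) for c in materialized_cuboids}
        self.hbase_backend = hbase_backend
        self.spark_backend = spark_backend

    def is_cuboid_found(self, group_by_fields):
        return frozenset(group_by_fields) in self.materialized

    def execute(self, group_by_fields):
        if self.is_cuboid_found(group_by_fields):
            return self.hbase_backend(group_by_fields)   # "Yes" branch
        return self.spark_backend(group_by_fields)       # "No" branch


def fake_hbase_query(fields):
    """Stand-in for a lookup against a materialized cuboid."""
    return ("HBaseQuery", tuple(sorted(fields)))


def fake_spark_sql_query(fields):
    """Stand-in for an on-the-fly aggregation over raw or semi-aggregated data."""
    return ("SparkSQLQuery", tuple(sorted(fields)))


controller = QueryController(
    materialized_cuboids=[
        ("Infection_type", "country", "monthId"),
        ("Infection_type", "country"),
    ],
    hbase_backend=fake_hbase_query,
    spark_backend=fake_spark_sql_query,
)

fast = controller.execute(["country", "Infection_type"])   # cuboid exists
slow = controller.execute(["country", "severity"])         # no such cuboid
```

Keeping the back ends as injected callables means the routing decision can be exercised without a cluster; "severity" here is an invented field used only to trigger the fallback path.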

Questions/Comments
Thank You
