Combat Cyber Threats
with Cloudera Impala & Apache Hadoop
Justin Erickson | Director, Product Management, Cloudera
Wayne Wheeles | Analytic, Infrastructure and Enrichment Developer, Cyber Security, Six3 Systems
July 2013
Agenda
What’s new in Impala?
• Impala recap
• Impala 1.1
• Authorization with Sentry
Cyber security with Impala
• Cyber security demo overview
• Working with WebProxy Data
• Working with Netflow Data
• IDS Amplification and Correlation “holy grail use case”
• Discussion and questions
Cloudera Impala
Interactive SQL for Hadoop
• Responses in seconds
• ANSI-92 standard SQL with HiveQL
Native MPP Query Engine
• Purpose-built for low-latency queries
• Separate runtime from MapReduce
• Designed as part of the Hadoop ecosystem
Open Source
• Apache-licensed
Benefits of Impala
More & Faster Value from “Big Data”
• Interactive BI/analytics experience via SQL
• No delays from data migration
Flexibility
• Query across existing data
• Select best-fit file formats (Parquet, Avro, etc.)
• Run multiple frameworks on the same data at the same time
Cost Efficiency
• Reduce movement, duplicate storage & compute
• 1% to 10% of the cost of an analytic DBMS
Full Fidelity Analysis
• No loss from aggregations or fixed schemas
Impala 1.1 (released July 23, 2013)
Sentry support
• Fine-grained authorization
• Role-based authorization
Support for views
Performance
• Parquet columnar performance
• Join order sorted by table size
• More efficient metadata refresh for larger installations
Additional SQL
• SQL-89 joins (in addition to existing SQL-92)
• LOAD DATA statement
• REFRESH command for JDBC/ODBC
Improved HBase support
• Binary types
• Caching configuration
©2013 Cloudera, Inc. All Rights Reserved.
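To make the SQL-89 versus SQL-92 join distinction and view support concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for Impala so it runs anywhere, so it says nothing about Impala's own behavior or performance. The `netflow` and `blacklist` tables, their columns, the `blacklisted_flows` view, and the sample addresses are all hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Impala so the example is self-contained.
SCHEMA = """
CREATE TABLE netflow (src_ip TEXT, dst_ip TEXT, bytes INTEGER);
CREATE TABLE blacklist (ip TEXT);
INSERT INTO netflow VALUES
    ('10.0.0.5', '203.0.113.9', 1200),
    ('10.0.0.7', '198.51.100.4', 800),
    ('10.0.0.5', '198.51.100.4', 300);
INSERT INTO blacklist VALUES ('198.51.100.4');
"""

# Explicit aliases give the view clean, unqualified column names.
COLUMNS = "n.src_ip AS src_ip, n.dst_ip AS dst_ip, n.bytes AS bytes"

# SQL-89 style: tables listed in FROM, join predicate in WHERE.
SQL89 = f"SELECT {COLUMNS} FROM netflow n, blacklist b WHERE n.dst_ip = b.ip"

# SQL-92 style: explicit JOIN ... ON.
SQL92 = f"SELECT {COLUMNS} FROM netflow n JOIN blacklist b ON n.dst_ip = b.ip"


def run_examples():
    """Run both join styles, then aggregate over a view built on the SQL-92 join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        order = " ORDER BY n.src_ip, n.bytes"
        rows89 = conn.execute(SQL89 + order).fetchall()
        rows92 = conn.execute(SQL92 + order).fetchall()
        # A view packages the join so analysts can query it like an ordinary table.
        conn.execute("CREATE VIEW blacklisted_flows AS " + SQL92)
        totals = conn.execute(
            "SELECT src_ip, SUM(bytes) FROM blacklisted_flows "
            "GROUP BY src_ip ORDER BY src_ip"
        ).fetchall()
    finally:
        conn.close()
    return rows89, rows92, totals


if __name__ == "__main__":
    rows89, rows92, totals = run_examples()
    print(rows89 == rows92)  # True: both join styles return the same rows
    print(totals)            # [('10.0.0.5', 300), ('10.0.0.7', 800)]
```

The two join syntaxes are logically equivalent for inner joins; the explicit `JOIN ... ON` form simply keeps join predicates separate from filter predicates.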
Previous State of Authorization
Two Sub-Optimal Choices for SQL on Hadoop
Insecure Advisory Authorization
Users can grant themselves permissions
Intended to prevent accidental deletion of data
Problem: Doesn’t guard against malicious users
HDFS Impersonation
Data is protected at the file level by HDFS permissions
Problem: File-level not granular enough
Problem: Not role-based
Sentry with CDH4.3 Hive and Impala 1.1
Secure Authorization
Ability to control access to data and/or privileges on data for authenticated users
Fine-Grained Authorization
Ability to give users access to a subset of data in a database
Role-Based Authorization
Ability to create/apply templated privileges based on functional roles
Multi-Tenant Administration
Ability for a central admin group to empower lower-level admins to manage security for each database/schema
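The role-based, fine-grained model above can be sketched as a small permission check. This is a toy illustration only, loosely modeled on the group-to-role-to-privilege layering of a Sentry policy file; the group and role names, the `server=...->db=...->table=...->action=...` strings, and the helpers `parse_privilege` and `is_allowed` are hypothetical and are not Sentry's actual API or evaluation logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    """A privilege scoped server -> database -> table, plus an action ('*' = any)."""
    server: str
    db: str = "*"
    table: str = "*"
    action: str = "*"

    def covers(self, server: str, db: str, table: str, action: str) -> bool:
        # '*' (or an omitted level) grants everything beneath that level.
        pairs = ((self.server, server), (self.db, db),
                 (self.table, table), (self.action, action.lower()))
        return all(granted in ("*", wanted) for granted, wanted in pairs)


def parse_privilege(spec: str) -> Privilege:
    """Parse a hypothetical 'server=s1->db=d->table=t->action=select' string."""
    parts = dict(item.split("=", 1) for item in spec.split("->"))
    return Privilege(server=parts["server"], db=parts.get("db", "*"),
                     table=parts.get("table", "*"),
                     action=parts.get("action", "*").lower())


# Groups map to roles; roles map to templated privileges (all names hypothetical).
GROUP_ROLES = {
    "soc_analysts": ["netflow_reader"],
    "security_admins": ["cyber_db_admin"],
}
ROLE_PRIVILEGES = {
    # Fine-grained: read-only access to a single table.
    "netflow_reader": ["server=server1->db=cyber->table=netflow->action=select"],
    # Coarse: everything in the 'cyber' database.
    "cyber_db_admin": ["server=server1->db=cyber"],
}


def is_allowed(groups, server, db, table, action) -> bool:
    """True if any role held by any of the user's groups covers the request."""
    for group in groups:
        for role in GROUP_ROLES.get(group, []):
            for spec in ROLE_PRIVILEGES.get(role, []):
                if parse_privilege(spec).covers(server, db, table, action):
                    return True
    return False


if __name__ == "__main__":
    print(is_allowed(["soc_analysts"], "server1", "cyber", "netflow", "SELECT"))   # True
    print(is_allowed(["soc_analysts"], "server1", "cyber", "netflow", "INSERT"))   # False
    print(is_allowed(["security_admins"], "server1", "cyber", "proxy", "INSERT"))  # True
```

Granting privileges to roles rather than to individual users is what makes the privileges "templated": adding an analyst to the `soc_analysts` group is enough to give them the same access as every other analyst.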
Part of an Overall Infosec Landscape
Perimeter: Guarding access to the cluster itself
Technical concepts: Authentication, Network isolation
Solutions: Kerberos | Oozie | Knox
Data: Protecting data in the cluster from unauthorized visibility
Technical concepts: Encryption, Data masking
Solutions: Certified Partners
Access: Defining what users and applications can do with data
Technical concepts: Permissions, Authorization
Solutions: Sentry (available 7/23)
Visibility: Reporting on where data came from and how it’s being used
Technical concepts: Auditing, Lineage
Solutions: Cloudera Navigator
Agenda – Cyber security with Impala
Impala Mission Demonstration Platform
Figure: demonstration platform architecture. Labeled components: an Application Server and a Cloudera CDH 4 cluster (hosts sherpa1, sherpa2, sherpa3, sherpa4). Service groupings shown on the nodes: (1) Cloudera Manager, HDFS, Impala, HBase, MapReduce, Hive; (2) HDFS, Impala, HBase, MapReduce, Hive; (3) HDFS (NameNode), Impala (statestore), HBase (RegionServer), MapReduce, Hue, Oozie, ZooKeeper, Hive. A sensor at the organization network's gateway to the Internet collects Netflow, WebProxy, and IDS data.
Demo Platform Data Sets
Webinar Data Sets
• Netflow Data
  • The term flow refers to a single connection between two hosts, uniquely defined by its five-tuple (source address, destination address, source port, destination port, and protocol).
  • http://tools.netsa.cert.org/silk/
• IDS/IPS Data
  • Output of a device or software application that monitors network or system activities for malicious activities or policy violations and produces reports to a management station.
  • http://www.snort.org
• WebProxy Data
  • Records of web requests made through the proxy by users within the corporate domain.
Enrichment Data Sets
• Geographic enrichment
  • Geolocation information for addresses
  • http://dev.maxmind.com/
• Blacklist Information
  • List of addresses identified as potential threats
  • http://www.autoshun.org/
• Whitelist Information
  • Addresses known to be located within the corporate network
• Statistical Cubes
  • Cubes built to provide statistical amplification for analysis
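A flow's five-tuple is a natural grouping key, and the whitelist and blacklist enrichment sets reduce to address lookups. The sketch below, using only the Python standard library, aggregates flow records by five-tuple and tags each endpoint as blacklisted, internal, or external. The record layout, the `10.0.0.0/8` corporate range, and the blacklisted address are assumptions for illustration, not the demo's data; geographic enrichment is left out because it needs an external database.

```python
import ipaddress
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The five-tuple that uniquely identifies a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


# Whitelist: address ranges assumed to be inside the corporate network (hypothetical).
WHITELIST = [ipaddress.ip_network("10.0.0.0/8")]
# Blacklist: individual addresses flagged as potential threats (hypothetical).
BLACKLIST = {ipaddress.ip_address("198.51.100.4")}


def classify(ip: str) -> str:
    """Tag an address; the blacklist is checked first so it takes precedence."""
    addr = ipaddress.ip_address(ip)
    if addr in BLACKLIST:
        return "blacklisted"
    if any(addr in net for net in WHITELIST):
        return "internal"
    return "external"


def aggregate(records):
    """Sum packets and bytes per five-tuple.

    Each record is (src_ip, dst_ip, src_port, dst_port, protocol, packets, bytes).
    """
    totals = defaultdict(lambda: [0, 0])
    for src, dst, sport, dport, proto, packets, nbytes in records:
        key = FlowKey(src, dst, sport, dport, proto)
        totals[key][0] += packets
        totals[key][1] += nbytes
    return {key: tuple(counts) for key, counts in totals.items()}


def enrich(flows):
    """Attach source and destination classifications to each aggregated flow."""
    return [
        (key, packets, nbytes, classify(key.src_ip), classify(key.dst_ip))
        for key, (packets, nbytes) in flows.items()
    ]


if __name__ == "__main__":
    sample = [
        ("10.0.0.5", "198.51.100.4", 51514, 443, "tcp", 10, 4200),
        ("10.0.0.5", "198.51.100.4", 51514, 443, "tcp", 4, 1800),
        ("10.0.0.7", "203.0.113.9", 40000, 53, "udp", 1, 80),
    ]
    for row in enrich(aggregate(sample)):
        print(row)
```

The two TCP records share a five-tuple and collapse into one flow; an internal host talking to a blacklisted address is the kind of row an analyst would then pivot on.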
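One way to read "statistical amplification" is to roll raw records into a cube and score each cell against a baseline. The sketch below is an assumption about what such a cube might contain, not the demo's actual cubes: it sums bytes per (source address, hour) and flags cells that sit more than a chosen number of standard deviations above that host's mean. The record layout and the 2.0 threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_cube(records):
    """Roll raw (src_ip, hour, bytes) records up into a (src_ip, hour) -> bytes cube."""
    cube = defaultdict(int)
    for src_ip, hour, nbytes in records:
        cube[(src_ip, hour)] += nbytes
    return dict(cube)


def flag_outliers(cube, threshold=2.0):
    """Return (src_ip, hour) cells more than `threshold` std devs above the host's mean."""
    per_host = defaultdict(dict)
    for (src_ip, hour), nbytes in cube.items():
        per_host[src_ip][hour] = nbytes

    flagged = []
    for src_ip, hours in per_host.items():
        values = list(hours.values())
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # flat baseline: no cell stands out
        for hour, nbytes in sorted(hours.items()):
            if (nbytes - mu) / sigma > threshold:
                flagged.append((src_ip, hour))
    return flagged


if __name__ == "__main__":
    hourly = [100, 110, 90, 100, 105, 95, 100, 600]
    records = [("10.0.0.5", hour, b) for hour, b in enumerate(hourly)]
    records.append(("10.0.0.5", 7, 400))  # second record for the same cell
    records += [("10.0.0.7", hour, 50) for hour in range(8)]
    print(flag_outliers(build_cube(records)))
```

Precomputing the cube means the baseline comparison runs over a small aggregate rather than the raw records, which is the usual reason to build cubes ahead of interactive analysis.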
Demonstration
Why Impala for Cyber Security?
Cloudera Impala and HDFS are a great choice for cyber security:
• Offers one powerful and secure platform for structured and unstructured data.
• Uniquely provides the capability to store large amounts of data at an acceptable price point.
• Sentry provides even greater protection for your cyber security data.
Thank You
• Ask questions on the Q&A tab
• Recording will be available at cloudera.com
• After the webinar, inquire at: info@cloudera.com
• Contact info:
  Email: sherpasurfing@gmail.com, impala-user@cloudera.org
  Twitter: @WayneWheeles, @JustinErickson, @Cloudera
Cloudera Impala
cloudera.com/impala
“Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.”
~Albert Einstein
Six3 Cyber Security Demo
https://github.com/sherpasurfing

Editor's Notes

  1. Interactive SQL for Hadoop
     • Responses in seconds vs. minutes or hours
     • 4-100x faster than Hive
     • Nearly ANSI-92 standard SQL with HiveQL
       • CREATE, ALTER, SELECT, INSERT, JOIN, subqueries, etc.
     • ODBC/JDBC drivers
     • Compatible SQL interface for existing Hadoop/CDH applications
     Native MPP Query Engine
     • Purpose-built for low-latency queries – another application being brought to Hadoop
     • Separate runtime from MapReduce, which is designed for batch processing
     • Tightly integrated with the Hadoop ecosystem – a major design imperative and differentiator for Cloudera
       • Single system (no integration)
       • Native, open file formats that are compatible across the ecosystem (no copying)
       • Single metadata model (no synchronization)
       • Single set of hardware and system resources (better performance, lower cost)
       • Integrated, end-to-end security (no vulnerabilities)
     Open Source
     • In keeping with our strategy of an open platform: if it stores or processes data, it’s open source
     • Apache-licensed
     • Code available on GitHub
  2. More & Faster Value from Big Data
     • Provides an interactive BI/analytics experience on Hadoop
       • Previously, BI/analytics was impractical due to the batch orientation of MapReduce
     • Enables more users to gain value from organizational data assets (SQL/BI users)
     • Makes more data available for analysis (raw data, multi-structured data, historical data)
     • Removes delays from data migration
       • Into specialized analytical DBMSs
       • Into proprietary file formats that happen to be stored in HDFS
       • Into transient in-memory stores
     Flexibility
     • Query across existing data in Hadoop
       • HDFS and HBase
       • Access data immediately and directly in its native format
     • Select best-fit file formats
       • Use raw data formats when unsure of access patterns (text files, RCFiles, LZO)
       • Increase performance with optimized file formats when access patterns are known (Parquet, Avro)
       • All file formats are compatible across the entire Hadoop ecosystem, i.e. MapReduce, Pig, Hive, Impala, etc. on the same data at the same time
     Cost Efficiency
     • Reduce movement, duplicate storage & compute
       • Data movement: no time or resource penalty for migrating data into specialized systems or formats
       • Duplicate storage: no need to duplicate data across systems or within the same system in different file formats
       • Compute: use the same compute resources as the rest of the Hadoop system
         • You don’t need a separate set of nodes to run interactive queries vs. batch processing (MapReduce)
         • You don’t need to overprovision your hardware to enable memory-intensive, on-the-fly format conversions
     • 1% to 10% of the cost of an analytic DBMS
       • Less than $1,000/TB
     Full Fidelity Analysis
     • No loss of fidelity from aggregations or conforming to fixed schemas
     • If the attribute exists in the raw data, you can query against it
  3. This is an overview of the simple cluster I put together for the webinar: 4 nodes in total, a 3-node Hadoop cluster and an application server. This configuration is one that would be present in many public and private organizations. We have placed a sensor at the gateway (or gateways) across the enterprise, monitoring incoming and outgoing traffic. This information is captured by a variety of sensors/collectors and written to files on a regular basis. So now let's go through the data sets.
  4. 1.) Provide a brief tour of the cluster using Cloudera Manager