Big Data Security
Joey Echeverria | Principal Solutions Architect
joey@cloudera.com | @fwiffo
©2013 Cloudera, Inc.

EARLY DAYS

Hadoop File Permissions
• Added in HADOOP-1298
    • Hadoop 0.16
    • Early 2008
• Authorization without authentication
• POSIX-like RWX bits

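To make "authorization without authentication" concrete, here is a minimal Python sketch of a POSIX-style permission check. It is a simplified model, not HDFS's implementation: `FileStatus` and `check_access` are hypothetical names, and superuser handling and other special cases are left out. The key point is that `user` and `groups` are whatever identity the caller reports, so the check only protects against accidents.

```python
from dataclasses import dataclass

# Permission bits within one class (owner, group, or other), as in POSIX.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileStatus:
    """Hypothetical file metadata: owner, group and an octal mode such as 0o750."""
    owner: str
    group: str
    mode: int


def check_access(status: FileStatus, user: str, groups: set[str], wanted: int) -> bool:
    """Return True if `user` holds every bit in `wanted` (e.g. READ | WRITE).

    Only one class of bits applies: owner bits if the user owns the file,
    otherwise group bits if the user is in the file's group, otherwise
    "other" bits. `user` and `groups` are taken on trust from the caller;
    nothing here authenticates them.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 0o7
    elif status.group in groups:
        bits = (status.mode >> 3) & 0o7
    else:
        bits = status.mode & 0o7
    return (bits & wanted) == wanted


# Usage: a directory readable by its group, closed to everyone else.
reports = FileStatus(owner="alice", group="analysts", mode=0o750)
print(check_access(reports, "bob", {"analysts"}, READ | EXECUTE))  # True
print(check_access(reports, "bob", {"analysts"}, WRITE))           # False
```

Because nothing verifies the caller's identity, any client could pass the owner check simply by reporting the name `alice`.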
MapReduce ACLs
• Added in HADOOP-3698
    • Hadoop 0.19
    • Late 2008
• ACLs per job queue
• Set a list of allowed users or groups per operation
    • Job submission
    • Job administration
• No authentication

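A per-queue ACL boils down to one allow list per operation. The sketch below models that in Python, assuming the user/group list format Hadoop uses for ACL values (comma-separated users, a space, then comma-separated groups, with `*` meaning everyone). `Acl`, `QUEUE_ACLS`, `authorize` and all queue, user and group names are hypothetical; since there is no authentication, `user` is just the name the client claims.

```python
from dataclasses import dataclass, field


def _split_csv(text: str) -> set[str]:
    """Split "a,b , c" into {"a", "b", "c"}, ignoring empty entries."""
    return {item.strip() for item in text.split(",") if item.strip()}


@dataclass
class Acl:
    """Allow list for one operation, parsed from a Hadoop-style ACL string."""
    users: set[str] = field(default_factory=set)
    groups: set[str] = field(default_factory=set)
    allow_all: bool = False

    @classmethod
    def parse(cls, spec: str) -> "Acl":
        # "*" admits everyone; otherwise "users groups", split at the first space.
        if spec.strip() == "*":
            return cls(allow_all=True)
        user_part, _, group_part = spec.partition(" ")
        return cls(users=_split_csv(user_part), groups=_split_csv(group_part))

    def allows(self, user: str, user_groups: set[str]) -> bool:
        return self.allow_all or user in self.users or bool(self.groups & user_groups)


# Hypothetical queues, each with one ACL per operation.
QUEUE_ACLS = {
    "default": {
        "submit-job": Acl.parse("*"),
        "administer-jobs": Acl.parse(" hadoop-admins"),  # leading space: groups only
    },
    "finance": {
        "submit-job": Acl.parse("alice,bob finance"),
        "administer-jobs": Acl.parse("alice"),
    },
}


def authorize(queue: str, operation: str, user: str, user_groups: set[str]) -> bool:
    # No authentication: `user` and `user_groups` are whatever the client claims.
    return QUEUE_ACLS[queue][operation].allows(user, user_groups)
```

For example, `authorize("finance", "submit-job", "dave", {"finance"})` returns True because dave is in the `finance` group, while only alice may administer jobs in that queue.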
Securing a Cluster Through a Gateway
• Hadoop cluster runs on a private network
• Gateway server is dual-homed (Hadoop network and public network)
• Users SSH onto the gateway
    • Optionally, users can create an SSH proxy so jobs can be submitted from the client machine
• Provides a minimal level of protection

WHY SECURITY MATTERS

Prevent Accidental Access
• Don’t let users shoot themselves in the foot
• Main driver for early features
• Not security per se, but a critical first step
• Doesn’t require strong authentication

Stop Malicious Users
• Early features were necessary, but not sufficient
• Security has to get real
• Hadoop runs arbitrary code
• Implicit trust doesn’t prevent the insider threat

Co-mingle All Your Data
• Often overlooked
• Big data means getting rid of stovepipes
    • Scalability and flexibility are only 50% of the problem
    • Trust your data in a multi-tenant environment
• Most critical driver

AN EVOLVING STORY

Authorization
• Files
• MapReduce/YARN job queues
• Service-level authorization
    • Whitelists and blacklists of hosts and users

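Service-level authorization can be pictured as a gate in front of each service that checks the connecting host and user before any file or queue ACL is consulted. A minimal sketch, assuming a hypothetical `ServicePolicy` in which an empty host include list admits every host, exclusions always win, and `*` in the user allow list admits every user who is not explicitly blocked. This illustrates whitelist/blacklist semantics in general, not Hadoop's configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class ServicePolicy:
    """Hypothetical access policy for one service (e.g. the RPC port clients use)."""
    include_hosts: set[str] = field(default_factory=set)  # empty means "any host"
    exclude_hosts: set[str] = field(default_factory=set)  # always takes precedence
    allowed_users: set[str] = field(default_factory=lambda: {"*"})
    blocked_users: set[str] = field(default_factory=set)

    def admits(self, host: str, user: str) -> bool:
        host_ok = (not self.include_hosts or host in self.include_hosts) and (
            host not in self.exclude_hosts
        )
        user_ok = user not in self.blocked_users and (
            "*" in self.allowed_users or user in self.allowed_users
        )
        return host_ok and user_ok


policy = ServicePolicy(include_hosts={"dn1", "dn2"}, exclude_hosts={"dn2"},
                       allowed_users={"hdfs", "alice"})
print(policy.admits("dn1", "alice"))  # True
print(policy.admits("dn2", "alice"))  # False: excluded host
```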
Authentication
• HADOOP-4487
    • Hadoop 0.22 and 0.20.205
    • Late 2010
• Based on Kerberos and internal delegation tokens
    • Provides strong user authentication
    • Also used for service-to-service authentication
[Figure: excerpt from the Hadoop security design document, section "2.2 High Level Use Cases" (1. Applications accessing files on HDFS clusters; 2. Applications accessing third-party (non-Hadoop) services), with Figure 1: HDFS High-level Dataflow showing an Application, NameNode, DataNode and MapReduce Task connected by kerb(joe), kerb(hdfs), delg(joe) and block token]

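Delegation tokens let a job's tasks act for the user without each task holding Kerberos credentials. The general pattern is a token identifier plus a password equal to an HMAC of that identifier under a master key known only to the issuing service, which verifies a presented token by recomputing the HMAC. The Python sketch below shows that idea under stated assumptions: the identifier fields and JSON encoding are illustrative rather than Hadoop's wire format, SHA-256 is an arbitrary choice of hash, and renewal and master-key rolling are omitted. `TokenIdentifier` and `TokenSecretManager` are hypothetical names.

```python
import hashlib
import hmac
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TokenIdentifier:
    """Illustrative token fields; not Hadoop's actual wire format."""
    owner: str            # user the token acts for
    renewer: str          # party allowed to renew it (renewal not modeled here)
    issue_date: int       # seconds since the epoch
    max_date: int         # hard expiry
    sequence_number: int  # makes each issued token distinct

    def to_bytes(self) -> bytes:
        return json.dumps(asdict(self), sort_keys=True).encode()


class TokenSecretManager:
    """Issues tokens and verifies them using a master key only the service knows."""

    def __init__(self, master_key: bytes):
        self._master_key = master_key
        self._sequence = 0

    def _password(self, ident: TokenIdentifier) -> bytes:
        return hmac.new(self._master_key, ident.to_bytes(), hashlib.sha256).digest()

    def issue(self, owner: str, renewer: str, lifetime_s: int,
              now: Optional[int] = None) -> Tuple[TokenIdentifier, bytes]:
        # Assumed to run only after `owner` has authenticated (e.g. via Kerberos).
        now = int(time.time()) if now is None else now
        self._sequence += 1
        ident = TokenIdentifier(owner, renewer, now, now + lifetime_s, self._sequence)
        return ident, self._password(ident)

    def verify(self, ident: TokenIdentifier, password: bytes,
               now: Optional[int] = None) -> str:
        """Return the token's owner, or raise PermissionError."""
        now = int(time.time()) if now is None else now
        if not hmac.compare_digest(self._password(ident), password):
            raise PermissionError("token password does not match identifier")
        if now > ident.max_date:
            raise PermissionError("token has expired")
        return ident.owner
```

A task presenting `(ident, password)` proves the service issued that exact token; changing any identifier field, such as the owner, invalidates the HMAC.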
Encryption
• Over-the-wire encryption for some socket connections
• RPC encryption added soon after Kerberos
• Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, backported to CDH4 MR1
• HDFS block streamer encryption added in Hadoop 2.0.2-alpha
• Volume-level encryption for data at rest

SECURITY FOR KEY VALUE STORES

Apache Accumulo
• Robust, scalable, high-performance data storage and retrieval system
• Built by the NSA, now an Apache project
• Based on Google’s BigTable
• Built on top of HDFS, ZooKeeper and Thrift
• Iterators for server-side extensions
• Cell labels for flexible security models

Data Model
• Multi-dimensional, persistent, sorted map
• Key/value store with a twist
• A single primary key (Row ID)
• Secondary key (Column) internal to a row
    • Family
    • Qualifier
• Per-cell timestamp

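The "sorted map" model can be sketched in a few lines: keys are (row, family, qualifier, timestamp) tuples kept in sorted order, with newer timestamps first within a column as in BigTable-style stores, so reading a row is a seek followed by a short sequential scan. This is an in-memory illustration only (no persistence and no visibility field); `Key` and `SortedTable` are hypothetical names.

```python
from bisect import bisect_left, insort
from typing import Iterator, List, NamedTuple, Tuple


class Key(NamedTuple):
    row: str        # the single primary key (Row ID)
    family: str     # column family
    qualifier: str  # column qualifier
    timestamp: int  # per-cell version


def sort_key(key: Key) -> tuple:
    # Negating the timestamp puts the newest version of a column first.
    return (key.row, key.family, key.qualifier, -key.timestamp)


class SortedTable:
    """In-memory stand-in for a sorted key/value map (no persistence)."""

    def __init__(self) -> None:
        self._entries: List[Tuple[tuple, Key, bytes]] = []

    def put(self, key: Key, value: bytes) -> None:
        insort(self._entries, (sort_key(key), key, value))

    def scan_row(self, row: str) -> Iterator[Tuple[Key, bytes]]:
        # Seek to the first entry whose row is >= `row`, then read sequentially.
        # A 1-tuple (row,) sorts before every (row, family, qualifier, -ts) key.
        start = bisect_left(self._entries, ((row,),))
        for _, key, value in self._entries[start:]:
            if key.row != row:
                break
            yield key, value
```

Because all cells of a row are contiguous, `scan_row` touches only that row's entries after the seek.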
Cell-Level Security
• Labels stored per cell
• Labels consist of Boolean expressions (AND, OR, nesting)
• Labels associated with each user
• Cell labels checked against the user’s labels with a built-in iterator

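Cell labels are Boolean expressions over label names, and a cell is returned only when its expression evaluates to true against the reader's authorizations. A minimal evaluator, assuming Accumulo-style syntax: `&` for AND, `|` for OR, parentheses for nesting, an empty label visible to everyone, and mixing `&` and `|` without parentheses rejected. Quoted labels and other details of the real grammar are not handled, and `visible` is a hypothetical name.

```python
import re
from typing import List, Set

# A label is a run of these characters; everything else must be an operator or paren.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def _tokenize(expr: str) -> List[str]:
    tokens: List[str] = []
    pos = 0
    expr = expr.strip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def visible(expr: str, auths: Set[str]) -> bool:
    """Evaluate a visibility label such as "(admin|audit)&pii" against `auths`."""
    tokens = _tokenize(expr)
    if not tokens:
        return True  # an empty label is visible to everyone
    pos = 0

    def parse_expr() -> bool:
        # term (op term)*, where every op at one nesting level must be the same.
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parse, so the whole label is validated
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("label ends unexpectedly")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in auths

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected trailing input {tokens[pos:]}")
    return result
```

For example, `visible("(secret|admin)&audit", {"admin", "audit"})` is True, while `visible("admin&audit|secret", ...)` raises `ValueError` because the operators are mixed without parentheses.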
Pluggable Authentication
• Currently supports username/password authentication backed by ZooKeeper
• ACCUMULO-259
    • Targeted for Accumulo 1.5.0
• Authentication info replaced with generic tokens
• Supports multiple implementations (e.g. Kerberos)

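Replacing a fixed username/password with generic tokens means the server dispatches on the token type to whichever authenticator understands it. A hedged Python sketch of that shape: `AuthenticationToken`, `PasswordToken`, `Authenticator`, `PasswordAuthenticator` and `login` are illustrative names loosely modeled on the idea, not a transcription of Accumulo's Java API, and a dict of salted PBKDF2 hashes stands in for the ZooKeeper-backed store.

```python
import hashlib
import hmac
import os
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Tuple


class AuthenticationToken(ABC):
    """Marker base class: any credential type a client can present."""


class PasswordToken(AuthenticationToken):
    def __init__(self, password: str):
        self.password = password.encode()


class Authenticator(ABC):
    """Server-side plug-in that knows how to check one kind of token."""

    @abstractmethod
    def supports(self, token: AuthenticationToken) -> bool: ...

    @abstractmethod
    def authenticate(self, principal: str, token: AuthenticationToken) -> bool: ...


class PasswordAuthenticator(Authenticator):
    """Salted PBKDF2 hashes in a dict, standing in for a ZooKeeper-backed store."""

    ITERATIONS = 100_000

    def __init__(self) -> None:
        self._users: Dict[str, Tuple[bytes, bytes]] = {}

    def _hash(self, password: bytes, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password, salt, self.ITERATIONS)

    def create_user(self, principal: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[principal] = (salt, self._hash(password.encode(), salt))

    def supports(self, token: AuthenticationToken) -> bool:
        return isinstance(token, PasswordToken)

    def authenticate(self, principal: str, token: AuthenticationToken) -> bool:
        entry = self._users.get(principal)
        if entry is None or not isinstance(token, PasswordToken):
            return False
        salt, expected = entry
        return hmac.compare_digest(self._hash(token.password, salt), expected)


def login(authenticators: Iterable[Authenticator], principal: str,
          token: AuthenticationToken) -> bool:
    # Dispatch on the token's type; a Kerberos authenticator would just be another entry.
    for authenticator in authenticators:
        if authenticator.supports(token):
            return authenticator.authenticate(principal, token)
    raise ValueError(f"no authenticator accepts {type(token).__name__}")
```

Adding Kerberos support in this design would mean a new token class and a matching `Authenticator`; `login` itself would not change.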
Application Level
• Accumulo often paired with application-level authentication/authorization
• Accumulo users created per application
• Each application granted the access level of its most privileged user
• Application authenticates users, looks up their authorizations, and passes the user’s labels with each request

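The application-level pattern can be sketched end to end: the application authenticates its own users, looks up each user's authorizations, and passes those (not its own, broader set) with every scan, while the store refuses any authorization the application's account does not hold. To stay short, the sketch simplifies a cell label to a set of labels that must all be held (AND only). `Cell`, `Store`, `USERS` and `handle_request` are hypothetical, and the plaintext passwords are for illustration only.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Cell:
    row: str
    value: str
    required: FrozenSet[str]  # simplified label: reader must hold all of these


class Store:
    """Stand-in for the key/value store as seen through one application account."""

    def __init__(self, cells: Iterable[Cell], account_auths: Set[str]):
        self._cells = list(cells)
        self._account_auths = frozenset(account_auths)

    def scan(self, auths: Set[str]) -> List[str]:
        auths = frozenset(auths)
        # Refuse authorizations the application's own account was never granted.
        if not auths <= self._account_auths:
            raise PermissionError(f"not granted: {sorted(auths - self._account_auths)}")
        return [cell.value for cell in self._cells if cell.required <= auths]


# Hypothetical user directory owned by the application, not by the store.
USERS: Dict[str, Tuple[str, Set[str]]] = {
    "alice": ("pw-a", {"public", "finance"}),
    "bob": ("pw-b", {"public"}),
}


def handle_request(store: Store, username: str, password: str) -> List[str]:
    entry = USERS.get(username)
    if entry is None or not hmac.compare_digest(entry[0], password):
        raise PermissionError("authentication failed")
    # Pass the end user's labels with the request, not the application's broader set.
    return store.scan(entry[1])
```

The store's subset check keeps a buggy or compromised application from asking for more than its account was granted; the application's own lookup keeps one end user from seeing another user's data.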
Apache HBase
• Also based on Google’s BigTable
• Started as a Hadoop contrib project
• Supports column-level ACLs
• Kerberos for authentication
• Discussion and early prototypes of cell-level security ongoing

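One way to picture column-level ACLs is as a hierarchy: a grant on a whole table covers every column family in it, a grant on a family covers every qualifier in it, and a grant on a single qualifier covers only that column. A simplified Python model of that lookup, with hypothetical names and only READ and WRITE actions; HBase's access controller also knows other actions and scopes, as well as group grants, none of which are modeled here.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Action(Enum):
    READ = "R"
    WRITE = "W"


# (user, table, family, qualifier) -> actions; None widens the scope to
# "every family in the table" or "every qualifier in the family".
Scope = Tuple[str, str, Optional[str], Optional[str]]


def permitted(grants: Dict[Scope, Set[Action]], user: str, table: str,
              family: str, qualifier: str, action: Action) -> bool:
    """True if any grant at table, family, or qualifier scope covers `action`."""
    candidates = [
        (user, table, None, None),         # whole table
        (user, table, family, None),       # whole column family
        (user, table, family, qualifier),  # single column
    ]
    return any(action in grants.get(scope, set()) for scope in candidates)
```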
FUTURE

Encryption for Data at Rest
• Need multiple levels of granularity
• Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs)
• APIs for file-level, block-level, or record-level encryption

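The slide states a goal rather than a design. One possible approach, offered purely as a hypothetical sketch: derive one key per authorization label from a master secret (here HMAC-SHA256 used as a simple key-derivation function) and release to each user only the keys for labels they hold. The encryption step itself is omitted because Python's standard library has no block cipher, and `label_key`, `keys_for_user` and `can_decrypt` are illustrative names.

```python
import hashlib
import hmac
from typing import Dict, Iterable


def label_key(master_secret: bytes, label: str) -> bytes:
    """Derive a 256-bit key for one authorization label (HMAC-SHA256 as a simple KDF)."""
    return hmac.new(master_secret, b"label-key:" + label.encode(), hashlib.sha256).digest()


def keys_for_user(master_secret: bytes, user_labels: Iterable[str]) -> Dict[str, bytes]:
    """Release only the keys for labels the user is authorized for."""
    return {label: label_key(master_secret, label) for label in user_labels}


def can_decrypt(required_labels: Iterable[str], user_keys: Dict[str, bytes]) -> bool:
    """Data encrypted under several labels (an AND) needs every one of their keys."""
    return all(label in user_keys for label in required_labels)
```

In this design the authorization decision and the ability to decrypt coincide: a user who lacks a label never receives its key, regardless of how the data is read.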
Hive Security
• Column-level ACLs
• Kerberos authentication
• AccessServer

24   ©2013 Cloudera, Inc.

Mais conteúdo relacionado

Mais procurados

Microsoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureMicrosoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureDavid Chou
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Simplilearn
 
Live memory forensics
Live memory forensicsLive memory forensics
Live memory forensicsMehedi Hasan
 
VMware vSphere Storage Enhancements
VMware vSphere Storage EnhancementsVMware vSphere Storage Enhancements
VMware vSphere Storage EnhancementsAnne Achleman
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera, Inc.
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopCloudera, Inc.
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...Edureka!
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Networking in linux
Networking in linuxNetworking in linux
Networking in linuxVarnnit Jain
 
Microsoft And Cloud Computing
Microsoft And Cloud ComputingMicrosoft And Cloud Computing
Microsoft And Cloud ComputingDavid Chou
 
Introduction to cloud computing
Introduction to cloud computingIntroduction to cloud computing
Introduction to cloud computingVipin Batra
 
Storage Virtualization
Storage VirtualizationStorage Virtualization
Storage Virtualizationrjain51
 
GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...
GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...
GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...TSOLUTIONS
 

Mais procurados (20)

Microsoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureMicrosoft Cloud Services Architecture
Microsoft Cloud Services Architecture
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
Live memory forensics
Live memory forensicsLive memory forensics
Live memory forensics
 
Caching
CachingCaching
Caching
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
VMware vSphere Storage Enhancements
VMware vSphere Storage EnhancementsVMware vSphere Storage Enhancements
VMware vSphere Storage Enhancements
 
Cloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera clusterCloudera training: secure your Cloudera cluster
Cloudera training: secure your Cloudera cluster
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
What is Hadoop | Introduction to Hadoop | Hadoop Tutorial | Hadoop Training |...
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Networking in linux
Networking in linuxNetworking in linux
Networking in linux
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
Microsoft And Cloud Computing
Microsoft And Cloud ComputingMicrosoft And Cloud Computing
Microsoft And Cloud Computing
 
Sqoop
SqoopSqoop
Sqoop
 
IBM GPFS
IBM GPFSIBM GPFS
IBM GPFS
 
Introduction to cloud computing
Introduction to cloud computingIntroduction to cloud computing
Introduction to cloud computing
 
Storage Virtualization
Storage VirtualizationStorage Virtualization
Storage Virtualization
 
GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...
GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...
GeoVision : CCTV Solutions : RAID vs Non-RAID System for Storing Surveillance...
 
ZFS
ZFSZFS
ZFS
 

Destaque

Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview Hortonworks
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureUwe Printz
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop SecurityDataWorks Summit
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data miningharithavijay94
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Kevin Minder
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Peter Wood
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersDataWorks Summit
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXAbhishek Mallick
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authenticationleahculver
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Emilio Coppa
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access SecurityCloudera, Inc.
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Cours Big Data Chap1
Cours Big Data Chap1Cours Big Data Chap1
Cours Big Data Chap1Amal Abid
 

Destaque (20)

An Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache KnoxAn Approach for Multi-Tenancy Through Apache Knox
An Approach for Multi-Tenancy Through Apache Knox
 
Hdp security overview
Hdp security overview Hdp security overview
Hdp security overview
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Hadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, FutureHadoop & Security - Past, Present, Future
Hadoop & Security - Past, Present, Future
 
Hadoop
HadoopHadoop
Hadoop
 
Improvements in Hadoop Security
Improvements in Hadoop SecurityImprovements in Hadoop Security
Improvements in Hadoop Security
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Information security in big data -privacy and data mining
Information security in big data -privacy and data miningInformation security in big data -privacy and data mining
Information security in big data -privacy and data mining
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
 
Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)Big Data and Security - Where are we now? (2015)
Big Data and Security - Where are we now? (2015)
 
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise UsersApache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
Apache Knox Gateway "Single Sign On" expands the reach of the Enterprise Users
 
Apache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOXApache Knox setup and hive and hdfs Access using KNOX
Apache Knox setup and hive and hdfs Access using KNOX
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
OAuth - Open API Authentication
OAuth - Open API AuthenticationOAuth - Open API Authentication
OAuth - Open API Authentication
 
Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)Hadoop Internals (2.3.0 or later)
Hadoop Internals (2.3.0 or later)
 
Hadoop and Data Access Security
Hadoop and Data Access SecurityHadoop and Data Access Security
Hadoop and Data Access Security
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Cours Big Data Chap1
Cours Big Data Chap1Cours Big Data Chap1
Cours Big Data Chap1
 

Semelhante a Big Data Security with Hadoop

Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not laterDataWorks Summit
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networkspbelko82
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Cask Data
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api CompatibilityCloudera, Inc.
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopOwen O'Malley
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online TrainingLearntek1
 
Hops - Distributed metadata for Hadoop
Hops - Distributed metadata for HadoopHops - Distributed metadata for Hadoop
Hops - Distributed metadata for HadoopJim Dowling
 
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBayHadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBayCloudera, Inc.
 
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARNLessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARNBillie Rinaldi
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNDataWorks Summit
 
CoreOS automated MySQL Cluster Failover using Galera Cluster
CoreOS automated MySQL Cluster Failover using Galera ClusterCoreOS automated MySQL Cluster Failover using Galera Cluster
CoreOS automated MySQL Cluster Failover using Galera ClusterYazz Atlas
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroCloudera, Inc.
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 

Semelhante a Big Data Security with Hadoop (20)

Big data security
Big data securityBig data security
Big data security
 
Saving the elephant—now, not later
Saving the elephant—now, not laterSaving the elephant—now, not later
Saving the elephant—now, not later
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Introduction to Hadoop Administration
Introduction to Hadoop AdministrationIntroduction to Hadoop Administration
Introduction to Hadoop Administration
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Containers and Big Data
Containers and Big DataContainers and Big Data
Containers and Big Data
 
Hw09 Security And Api Compatibility
Hw09   Security And Api CompatibilityHw09   Security And Api Compatibility
Hw09 Security And Api Compatibility
 
Plugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in HadoopPlugging the Holes: Security and Compatability in Hadoop
Plugging the Holes: Security and Compatability in Hadoop
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
 
Hops - Distributed metadata for Hadoop
Hops - Distributed metadata for HadoopHops - Distributed metadata for Hadoop
Hops - Distributed metadata for Hadoop
 
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBayHadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
Hadoop World 2011: Hadoop Gateway - Konstantin Schvako, eBay
 
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARNLessons Learned Running a Container Cloud on Apache Hadoop YARN
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
 
Lessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARNLessons learned running a container cloud on YARN
Lessons learned running a container cloud on YARN
 
CoreOS automated MySQL Cluster Failover using Galera Cluster
CoreOS automated MySQL Cluster Failover using Galera ClusterCoreOS automated MySQL Cluster Failover using Galera Cluster
CoreOS automated MySQL Cluster Failover using Galera Cluster
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend MicroHBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 

Mais de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Big Data Security with Hadoop

  • 1. Big Data Security
      Joey Echeverria | Principal Solutions Architect
      joey@cloudera.com | @fwiffo
      ©2013 Cloudera, Inc.
  • 2. Big Data Security: EARLY DAYS
  • 3. Hadoop File Permissions
      • Added in HADOOP-1298
          • Hadoop 0.16
          • Early 2008
      • Authorization without authentication
      • POSIX-like RWX bits
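The permission check described above can be sketched as follows. This is a minimal Python illustration assuming POSIX semantics (exactly one of the owner, group, or other classes applies); `FileInfo` and `check_access` are hypothetical names, not Hadoop's Java API, and details such as the HDFS superuser are omitted. The caller's user name is simply taken on trust, which is what "authorization without authentication" means in practice.

```python
from dataclasses import dataclass

# POSIX-style permission bits within one rwx triple.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical metadata for one file: owner, group and an octal mode."""
    owner: str
    group: str
    mode: int  # e.g. 0o750 -> owner rwx, group r-x, other ---


def check_access(info: FileInfo, user: str, groups: set, requested: int) -> bool:
    """Return True if `user` (belonging to `groups`) may perform `requested`.

    As in POSIX, exactly one class applies: owner, else group, else other.
    The user name is trusted as given; nothing authenticates it.
    """
    if user == info.owner:
        bits = (info.mode >> 6) & 0o7
    elif info.group in groups:
        bits = (info.mode >> 3) & 0o7
    else:
        bits = info.mode & 0o7
    return (bits & requested) == requested


# Usage: anyone who claims to be "alice" gets alice's rights.
f = FileInfo(owner="alice", group="analysts", mode=0o750)
print(check_access(f, "alice", {"alice"}, READ | WRITE))  # True
print(check_access(f, "bob", {"analysts"}, WRITE))        # False: group is r-x
```

Because only one permission class is consulted, an owner whose own bits are narrower than the group's does not inherit the group's rights.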
  • 4. MapReduce ACLs
      • Added in HADOOP-3698
          • Hadoop 0.19
          • Late 2008
      • ACLs per job queue
      • Set a list of allowed users or groups per operation
          • Job submission
          • Job administration
      • No authentication
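A per-queue, per-operation ACL check can be sketched like this. The ACL string format (comma-separated users, a single space, comma-separated groups, with `*` meaning everyone) is modeled on the format Hadoop uses for its access control lists, but the queue names, the operation names `submit-job` and `administer-jobs`, and both functions are illustrative assumptions. As the slide notes, nothing verifies that the caller really is `user`.

```python
def parse_acl(acl: str):
    """Parse "user1,user2 group1,group2"; "*" means everyone.

    Users and groups are comma-separated lists separated by one space,
    so " admins" names only a group and "alice" names only a user.
    """
    if acl.strip() == "*":
        return set(), set(), True
    users_part, _, groups_part = acl.partition(" ")
    users = {u.strip() for u in users_part.split(",") if u.strip()}
    groups = {g.strip() for g in groups_part.split(",") if g.strip()}
    return users, groups, False


# Hypothetical per-queue, per-operation ACLs.
QUEUE_ACLS = {
    ("default", "submit-job"): "*",
    ("finance", "submit-job"): "alice,bob analysts",
    ("finance", "administer-jobs"): " finance-admins",
}


def check_queue_op(queue: str, op: str, user: str, user_groups) -> bool:
    """Allow `op` on `queue` if the ACL names the user or one of their groups.

    A missing ACL denies everyone. The user name is not authenticated.
    """
    users, groups, everyone = parse_acl(QUEUE_ACLS.get((queue, op), " "))
    return everyone or user in users or bool(groups & set(user_groups))


print(check_queue_op("finance", "submit-job", "carol", ["analysts"]))  # True
print(check_queue_op("finance", "administer-jobs", "alice", []))       # False
```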
  • 5. Securing a Cluster Through a Gateway
      • Hadoop cluster runs on a private network
      • Gateway server is dual-homed (Hadoop network and public network)
      • Users SSH onto the gateway
          • Optionally, an SSH proxy can be set up so that jobs can be submitted from the client machine
      • Provides a minimum level of protection
  • 6. Big Data Security: WHY SECURITY MATTERS
  • 7. Prevent Accidental Access
      • Don't let users shoot themselves in the foot
      • Main driver for early features
      • Not security per se, but a critical first step
      • Doesn't require strong authentication
  • 8. Stop Malicious Users
      • Early features were necessary, but not sufficient
      • Security has to get real
      • Hadoop runs arbitrary code
      • Implicit trust doesn't prevent the insider threat
  • 9. Co-mingle All Your Data
      • Often overlooked
      • Big data means getting rid of stovepipes
      • Scalability and flexibility are only 50% of the problem
      • Trust your data in a multi-tenant environment
      • Most critical driver
  • 10. Big Data Security: AN EVOLVING STORY
  • 11. Authorization
      • Files
      • MapReduce/YARN job queues
      • Service-level authorization
      • Whitelists and blacklists of hosts and users
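Host and user whitelists and blacklists usually follow a simple rule, sketched below. The sketch assumes, roughly as with HDFS's include and exclude host lists, that an empty whitelist admits everyone and that the blacklist always takes precedence; `is_permitted` and the host names are hypothetical.

```python
def is_permitted(name: str, include: frozenset, exclude: frozenset) -> bool:
    """Whitelist/blacklist check for a host or user name.

    Assumes an empty whitelist admits everyone and the blacklist always wins.
    """
    if name in exclude:
        return False
    return not include or name in include


# Usage with hypothetical host names.
workers = frozenset({"node1.example.com", "node2.example.com"})
retired = frozenset({"node2.example.com"})
print(is_permitted("node1.example.com", workers, retired))  # True
print(is_permitted("node2.example.com", workers, retired))  # False: blacklisted
print(is_permitted("rogue.example.com", workers, retired))  # False: not whitelisted
```

Letting the blacklist win means a host can be retired by adding it to a single list, without editing the whitelist.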
  • 12. Authentication
      • HADOOP-4487
          • Hadoop 0.22 and 0.20.205
          • Late 2010
      • Based on Kerberos and internal delegation tokens
      • Provides strong user authentication
      • Also used for service-to-service authentication
      [Image: excerpt from the HADOOP-4487 design document, section 2.2 "High Level Use Cases", with Figure 1: HDFS High-level Dataflow]
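Delegation tokens let a job's tasks authenticate without carrying the user's Kerberos credentials. After a client authenticates with Kerberos, the server issues a token whose password is an HMAC of the token identifier under a secret master key, so it can later verify the token by recomputing the HMAC instead of storing it. The sketch below illustrates only that idea; the JSON identifier, HMAC-SHA256, and the class name are assumptions, and Hadoop's real token format, renewal, and key rollover differ.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional, Tuple


class DelegationTokenIssuer:
    """Toy issuer illustrating HMAC-based delegation tokens (not Hadoop's code)."""

    def __init__(self, lifetime_s: float = 24 * 3600):
        self._master_key = secrets.token_bytes(32)  # never leaves the server
        self._lifetime_s = lifetime_s
        self._seq = 0

    def _password(self, ident: bytes) -> bytes:
        # Derived from the identifier, so the server need not store issued tokens.
        return hmac.new(self._master_key, ident, hashlib.sha256).digest()

    def issue(self, owner: str, renewer: str) -> Tuple[bytes, bytes]:
        """Issue a token; assumes `owner` was already authenticated via Kerberos."""
        self._seq += 1
        ident = json.dumps(
            {"owner": owner, "renewer": renewer, "seq": self._seq,
             "expires": time.time() + self._lifetime_s},
            sort_keys=True,
        ).encode()
        return ident, self._password(ident)

    def verify(self, ident: bytes, password: bytes) -> Optional[str]:
        """Return the token owner if the token is genuine and unexpired, else None."""
        if not hmac.compare_digest(self._password(ident), password):
            return None
        fields = json.loads(ident)
        if time.time() > fields["expires"]:
            return None
        return fields["owner"]


issuer = DelegationTokenIssuer()
ident, password = issuer.issue("joe", renewer="yarn")
print(issuer.verify(ident, password))                               # joe
print(issuer.verify(ident.replace(b'"joe"', b'"root"'), password))  # None
```

Tampering with any field of the identifier changes the expected HMAC, so a task cannot upgrade its token to another user.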
  • 13. Encryption
      • Over-the-wire encryption for some socket connections
      • RPC encryption added soon after Kerberos
      • Shuffle encryption (HTTPS) added in Hadoop 2.0.2-alpha, backported to CDH4 MR1
      • HDFS block streamer encryption added in Hadoop 2.0.2-alpha
      • Volume-level encryption for data at rest
  • 14. Big Data Security: SECURITY FOR KEY-VALUE STORES
  • 15. Apache Accumulo
      • Robust, scalable, high-performance data storage and retrieval system
      • Built by the NSA, now an Apache project
      • Based on Google's BigTable
      • Built on top of HDFS, ZooKeeper and Thrift
      • Iterators for server-side extensions
      • Cell labels for flexible security models
  • 16. Data Model
      • Multi-dimensional, persistent, sorted map
      • Key/value store with a twist
      • A single primary key (Row ID)
      • Secondary key (Column) internal to a row
          • Family
          • Qualifier
      • Per-cell timestamp
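The data model can be sketched as a map sorted on a composite key. The sketch assumes row, column family, and qualifier sort ascending and timestamps sort newest-first, which matches Accumulo's ordering (Accumulo also sorts on the column visibility, omitted here). A real store keeps keys sorted on disk rather than sorting on every scan; `Key` and `SortedMapSketch` are hypothetical names.

```python
from typing import Dict, Iterator, NamedTuple, Optional, Tuple


class Key(NamedTuple):
    row: str        # primary key (Row ID)
    family: str     # column family
    qualifier: str  # column qualifier
    timestamp: int  # per-cell version


def sort_order(key: Key) -> tuple:
    # Row, family and qualifier ascending; newest timestamp first.
    return (key.row, key.family, key.qualifier, -key.timestamp)


class SortedMapSketch:
    """In-memory stand-in for a persistent sorted map (hypothetical)."""

    def __init__(self) -> None:
        self._cells: Dict[Key, bytes] = {}

    def put(self, key: Key, value: bytes) -> None:
        self._cells[key] = value

    def scan(self, row: Optional[str] = None) -> Iterator[Tuple[Key, bytes]]:
        """Yield cells in key order, optionally restricted to one row."""
        for key in sorted(self._cells, key=sort_order):
            if row is None or key.row == row:
                yield key, self._cells[key]


m = SortedMapSketch()
m.put(Key("user1", "info", "name", 3), b"Alice")
m.put(Key("user1", "info", "name", 5), b"Alice B.")
m.put(Key("user1", "info", "email", 3), b"alice@example.com")
for key, value in m.scan(row="user1"):
    print(key.qualifier, key.timestamp, value)
# email 3 b'alice@example.com'
# name 5 b'Alice B.'
# name 3 b'Alice'
```

Sorting newest-first means the first cell seen for a given column is its latest version.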
  • 17. Cell-Level Security
      • Labels stored per cell
      • Labels consist of Boolean expressions (AND, OR, nesting)
      • Labels associated with each user
      • Cell labels checked against the user's labels with a built-in iterator
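The label check can be sketched as a small recursive-descent evaluator. It assumes Accumulo-style syntax: `&` for AND, `|` for OR, parentheses for nesting, and no mixing of `&` and `|` at the same level without parentheses; quoted terms are not supported, and an empty label is treated as visible to everyone. In Accumulo the check runs server-side in an iterator over every cell; `is_visible` here is a hypothetical helper that evaluates a single label.

```python
import re

# Terms are runs of letters, digits and _-:./ ; operators are & | ( ).
_TOKEN = re.compile(r"[A-Za-z0-9_\-:./]+|[&|()]")


def tokenize(label: str) -> list:
    tokens, pos = [], 0
    while pos < len(label):
        match = _TOKEN.match(label, pos)
        if match is None:
            raise ValueError(f"invalid character at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def is_visible(label: str, auths: set) -> bool:
    """Evaluate a cell label such as "admin&(audit|finance)" against a user's labels."""
    if label == "":
        return True  # unlabeled cells are visible to everyone
    tokens = tokenize(label)
    pos = 0

    def parse_expr() -> bool:
        # Terms joined by a single operator kind (& or |).
        nonlocal pos
        result = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            op = tokens[pos]
            pos += 1
            rhs = parse_term()  # always parse, so malformed labels are rejected
            result = (result and rhs) if op == "&" else (result or rhs)
        return result

    def parse_term() -> bool:
        # Either a parenthesized sub-expression or a single label term.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of label")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths

    value = parse_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return value


user_auths = {"admin", "finance"}
print(is_visible("admin&(audit|finance)", user_auths))  # True
print(is_visible("admin&hr", user_auths))               # False
```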
  • 18. Pluggable Authentication
      • Currently supports username/password authentication backed by ZooKeeper
      • ACCUMULO-259
      • Targeted for Accumulo 1.5.0
      • Authentication info replaced with generic tokens
      • Supports multiple implementations (e.g. Kerberos)
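Replacing the username/password pair with a generic token suggests a design like the one below, in which the server hands each token to an authenticator that understands its type. All class names are hypothetical and are not Accumulo's API; the in-memory, PBKDF2-hashed password store merely stands in for the ZooKeeper-backed one, and a Kerberos authenticator would be another `Authenticator` subclass.

```python
import hashlib
import hmac
import os
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class AuthenticationToken(ABC):
    """Base class for any kind of credential (hypothetical)."""


class PasswordToken(AuthenticationToken):
    def __init__(self, password: bytes) -> None:
        self.password = password


class Authenticator(ABC):
    @abstractmethod
    def supports(self, token: AuthenticationToken) -> bool: ...

    @abstractmethod
    def authenticate(self, principal: str, token: AuthenticationToken) -> bool: ...


class PasswordAuthenticator(Authenticator):
    """Salted PBKDF2 hashes in memory; stands in for a ZooKeeper-backed store."""

    def __init__(self) -> None:
        self._users: Dict[str, Tuple[bytes, bytes]] = {}

    @staticmethod
    def _hash(salt: bytes, password: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

    def create_user(self, principal: str, password: bytes) -> None:
        salt = os.urandom(16)
        self._users[principal] = (salt, self._hash(salt, password))

    def supports(self, token: AuthenticationToken) -> bool:
        return isinstance(token, PasswordToken)

    def authenticate(self, principal: str, token: AuthenticationToken) -> bool:
        record = self._users.get(principal)
        if record is None:
            return False
        salt, expected = record
        return hmac.compare_digest(expected, self._hash(salt, token.password))


class Server:
    """Dispatches each token to the first authenticator that supports its type."""

    def __init__(self, authenticators: List[Authenticator]) -> None:
        self._authenticators = authenticators

    def login(self, principal: str, token: AuthenticationToken) -> bool:
        for authenticator in self._authenticators:
            if authenticator.supports(token):
                return authenticator.authenticate(principal, token)
        raise TypeError(f"no authenticator for {type(token).__name__}")


passwords = PasswordAuthenticator()
passwords.create_user("alice", b"s3cret")
server = Server([passwords])  # a Kerberos authenticator could be appended here
print(server.login("alice", PasswordToken(b"s3cret")))  # True
print(server.login("alice", PasswordToken(b"guess")))   # False
```

Keeping the token type open means new mechanisms can be added without changing the login call that clients make.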
  • 19. Application Level
      • Accumulo often paired with application-level authentication/authorization
      • Accumulo users created per application
      • Each application granted the access level of its most privileged user
      • Application authenticates users, looks up their authorizations, and passes the user's labels with each request
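The application-level pattern can be sketched as follows, assuming the application has already authenticated the end user. The application's own Accumulo account holds the union of the authorizations it may need; for each request it intersects the end user's authorizations with that grant and passes only the result with the scan. The user directory is hypothetical, and cell labels are simplified to sets of terms that must all be held (AND only).

```python
from typing import FrozenSet, List, Tuple

# Authorizations granted to the application's own Accumulo user (the union it may need).
APP_MAX_AUTHS = frozenset({"public", "finance", "hr"})

# Hypothetical application-side directory of end-user authorizations.
USER_AUTHS = {
    "alice": {"public", "finance"},
    "bob": {"public"},
    "mallory": {"public", "secret"},  # "secret" was never granted to the app
}


def effective_auths(user: str) -> FrozenSet[str]:
    """Authorizations to pass with this (already authenticated) user's requests."""
    return frozenset(USER_AUTHS.get(user, ())) & APP_MAX_AUTHS


def scan(cells: List[Tuple[str, FrozenSet[str]]], auths: FrozenSet[str]) -> List[str]:
    """Return values whose labels are all held (labels simplified to AND-only sets)."""
    return [value for value, required in cells if required <= auths]


cells = [
    ("q1 revenue", frozenset({"finance"})),
    ("press release", frozenset({"public"})),
    ("salaries", frozenset({"hr", "finance"})),
]
print(scan(cells, effective_auths("alice")))  # ['q1 revenue', 'press release']
print(scan(cells, effective_auths("bob")))    # ['press release']
```

Intersecting with the application's grant keeps a request from asking for authorizations the application account does not itself hold.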
  • 20. Apache HBase
      • Also based on Google's BigTable
      • Started as a Hadoop contrib project
      • Supports column-level ACLs
      • Kerberos for authentication
      • Discussion and early prototypes of cell-level security ongoing
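Column-level ACLs can be sketched as grants scoped to a whole table, a column family, or a single qualifier, where a request succeeds if any grant covering the cell allows the action. This loosely follows how HBase's AccessController scopes permissions, but the grant table and the `authorize` function are hypothetical, and HBase's fuller model (global and namespace scopes, more actions than read and write) is omitted.

```python
from typing import Dict, Optional, Set, Tuple

Scope = Tuple[str, str, Optional[str], Optional[str]]  # (user, table, family, qualifier)

# Hypothetical grants: None means "the whole table" or "the whole family".
GRANTS: Dict[Scope, Set[str]] = {
    ("alice", "patients", None, None): {"R"},               # read the entire table
    ("bob", "patients", "contact", None): {"R", "W"},       # read/write one family
    ("carol", "patients", "clinical", "diagnosis"): {"R"},  # read one column
}


def authorize(user: str, table: str, family: str, qualifier: str, action: str) -> bool:
    """Allow the action if a table-, family- or qualifier-level grant covers the cell."""
    for scope in ((table, None, None), (table, family, None), (table, family, qualifier)):
        if action in GRANTS.get((user, *scope), set()):
            return True
    return False


print(authorize("bob", "patients", "contact", "phone", "W"))       # True
print(authorize("bob", "patients", "clinical", "diagnosis", "R"))  # False
```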
  • 21. Big Data Security: FUTURE
  • 22. Encryption for Data at Rest
      • Need multiple levels of granularity
      • Encryption keys tied to authorization labels (like Accumulo labels or HBase ACLs)
      • APIs for file-level, block-level, or record-level encryption
  • 23. Hive Security
      • Column-level ACLs
      • Kerberos authentication
      • AccessServer