Why do we care about security?
• SecureCommerceWebSite, Inc. has a product that has both paid ads and search
• The "Payment Fraud" team needs logs of all credit card payments
• The "Search Quality" team needs all search logs and click history
• The "Ads Fraud" team needs to access both search logs and payment info
  • So we can't segregate these datasets onto different clusters
• If the teams can share a cluster, we also get better utilization!
Security pre CDH3: User Authentication
• Authentication is by vigorous assertion
• Trivial to impersonate another user:
  • Just set the property "hadoop.job.ugi" when running a job or command
• Group resolution is done client-side
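Before CDH3, exploiting this hole took nothing more than a client-side property override. A hedged illustration (only meaningful against a pre-security 0.20 cluster; the path is a placeholder):

```shell
# Pre-security Hadoop trusted whatever identity the client asserted,
# so any user could act as the HDFS superuser:
hadoop fs -D hadoop.job.ugi=hdfs,supergroup -rmr /user/alice
```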
Security pre CDH3: HDFS
• Unix-like file permissions were introduced in Hadoop 0.16.1
• Provides standard user/group/other r/w/x
• Protects well-meaning users from accidents
• Does nothing to prevent malicious users from causing harm (weak authentication)
Security pre CDH3: Job Control
• ACLs per job queue for job submission / killing
• No ACLs for viewing counters / logs
• Does nothing to prevent malicious users from causing harm (weak authentication)
Security pre CDH3: Tasks
• Individual tasks all run as the same user
  • Whoever the TT is running as (usually "hadoop")
• Tasks not isolated from each other
  • Tasks which read/write from local storage can interfere with each other
  • Malicious tasks can kill each other
• Hadoop is designed to execute arbitrary code
Security with CDH3: User Authentication
• Authentication is secured by Kerberos v5
  • RPC connections secured with the SASL "GSSAPI" mechanism
  • Provides proven, strong authentication and single sign-on
• Hadoop servers can ensure that users are who they say they are
• Group resolution is done on the server side
Security with CDH3: Server Authentication
• Kerberos authentication is bi-directional
• Users can be sure that they are communicating with the Hadoop server they think they are
Security with CDH3: HDFS
• Same general permissions model
  • Added sticky bit for directories (e.g. /tmp)
• But a user can no longer trivially impersonate other users (strong authentication)
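The sticky bit can be set on an HDFS directory with a four-digit octal mode, just as on Unix; a quick sketch (the directory is illustrative):

```shell
# With the sticky bit set, only a file's owner (or the superuser)
# can delete or rename entries in /tmp:
hadoop fs -chmod 1777 /tmp
```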
Security with CDH3: Job Control
• A job now has its own ACLs, including a view ACL
• Job can now specify who can view logs, counters, configuration, and who can modify (kill) it
• JT enforces these ACLs (strong authentication)
Security with CDH3: Tasks
• Tasks now run as the user who launched the job
  • Probably the most complex part of Hadoop's security implementation
• Ensures isolation of tasks which run on the same TT
  • Local file permissions enforced
  • Local system permissions enforced (e.g. signals)
• Can take advantage of per-user system limits
  • e.g. Linux ulimits
Security with CDH3: Web Interfaces
• Out-of-the-box Kerberized SSL support
• Pluggable servlet filters (more on this later)
Security with CDH3: Threat Model
• The Hadoop security system assumes that:
  • Users do not have root access to cluster machines
  • Users do not have root access to shared user machines (e.g. a bastion box)
  • Users cannot read or inject packets on the network
Thanks, Yahoo!
Yahoo! did the vast majority of the core Hadoop security work
Requirements: Kerberos Infrastructure
• Kerberos domain (KDC)
  • e.g. MIT Krb5 in RHEL, or MS Active Directory
• Kerberos principals (SPNs) for every daemon
  • hdfs/hostname@REALM for DN, NN, 2NN
  • mapred/hostname@REALM for TT and JT
  • host/hostname@REALM for web UIs
• Keytabs for service principals distributed to the correct hosts
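With MIT Kerberos, creating these principals and exporting keytabs typically looks like the following sketch (hostnames and the realm are placeholders; exact flags may vary by Kerberos version):

```shell
# On the KDC: one principal per daemon per host, with random keys
kadmin.local -q "addprinc -randkey hdfs/nn01.cluster.foocorp.com@CLUSTER.FOOCORP.COM"
kadmin.local -q "addprinc -randkey mapred/jt01.cluster.foocorp.com@CLUSTER.FOOCORP.COM"
kadmin.local -q "addprinc -randkey host/nn01.cluster.foocorp.com@CLUSTER.FOOCORP.COM"

# Export a keytab and copy it only to the host it belongs to
kadmin.local -q "xst -k hdfs.keytab hdfs/nn01.cluster.foocorp.com host/nn01.cluster.foocorp.com"
```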
Configuring daemons for security
• Most daemons have two configs:
  • Keytab location (e.g. dfs.datanode.keytab.file)
  • Kerberos principal (e.g. dfs.datanode.kerberos.principal)
• Principal can use the special token _HOST to substitute the hostname of the daemon (e.g. hdfs/_HOST@MYREALM)
• Several other configs to enable security in the first place
  • See example-confs/conf.secure in CDH3
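For the DataNode, the two configs named above would appear in hdfs-site.xml roughly like this (the keytab path and realm are placeholders):

```xml
<!-- hdfs-site.xml: point the DataNode at its keytab and principal -->
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <!-- _HOST is substituted with the daemon's hostname at startup -->
  <value>hdfs/_HOST@MYREALM</value>
</property>
```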
Setting up users
• Each user must have a Kerberos principal
• May want some shared accounts:
  • sharedaccount/alice and sharedaccount/bob principals both act as sharedaccount on HDFS - you can use this!
  • hdfs/alice is also useful for alice to act as a superuser
• Users running MR jobs must also have unix accounts on each of the slaves
• A centralized user database (e.g. LDAP) is a practical necessity
Installing Secure Hadoop
• MapReduce and HDFS services should run as separate users (e.g. hdfs and mapred)
• New task-controller setuid executable allows tasks to run as a user
• New JNI code in libhadoop.so to plug subtle security holes
• Install CDH3 with the hadoop-0.20-sbin and hadoop-0.20-native packages to get this all set up
Securing higher-level services
• Many "middle tier" applications need to act on behalf of their clients when interacting with Hadoop
  • e.g. Oozie, Hive Server, Hue/Beeswax
• "Proxy User" feature provides secure impersonation (think sudo)
  • hadoop.proxyuser.oozie.hosts - IPs where "oozie" may act as an impersonator
  • hadoop.proxyuser.oozie.groups - groups whose users "oozie" may impersonate
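In core-site.xml the two proxy-user properties look roughly like this (the IP and group names are illustrative):

```xml
<!-- core-site.xml: allow the "oozie" principal to impersonate users -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>10.1.2.3</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <!-- only members of these groups may be impersonated -->
  <value>users,engstaff</value>
</property>
```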
Customizing Security
• Current plug-in points:
  • hadoop.http.filter.initializers - may configure a custom ServletFilter to integrate with existing enterprise web SSO
  • hadoop.security.group.mapping - map a Kerberos principal (alice@FOOCORP.COM) to a set of groups (users, engstaff, searchquality, adsdata)
  • hadoop.security.auth_to_local - regex mappings of Kerberos principals to usernames
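As a sketch, an auth_to_local setting that strips the realm from corporate principals (realm taken from the example above; the RULE syntax is Hadoop's standard format) might look like:

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@FOOCORP\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```

The rule maps alice@FOOCORP.COM to the local username alice; DEFAULT handles principals in the cluster's own realm.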
Deployment Gotchas
• MIT Kerberos 1.8.1 (in Ubuntu, RHEL 5.6+) is incompatible with the Java Krb5 implementation
  • Run "kinit -R" after kinit to work around this
• Enable allow_weak_crypto in /etc/krb5.conf - necessary for Kerberized SSL
• Must deploy the "unlimited strength security policy" JARs in JAVA_HOME/jre/lib/security
• Lifesaver: HADOOP_OPTS="-Dsun.security.krb5.debug=true" hadoop ...
Best Practices for AD Integration
• MIT Kerberos realm inside the cluster:
  • CLUSTER.FOOCORP.COM
• Existing Active Directory domain:
  • FOOCORP.COM or maybe AD.FOOCORP.COM
• Set up a one-way cross-realm trust
  • Cluster realm must trust the corporate AD realm
  • See "Step by Step Guide to Kerberos 5 Interoperability" in the Windows Server docs
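On the MIT side, the heart of the one-way trust is a cross-realm krbtgt principal whose password and encryption types match what is configured in AD (realm names from the slide; the AD-side steps are in the referenced Microsoft guide):

```shell
# On the cluster KDC: shared ticket-granting principal for the trust.
# Its password and enctypes must match the AD side exactly.
kadmin.local -q "addprinc krbtgt/CLUSTER.FOOCORP.COM@FOOCORP.COM"
```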
What Hadoop Security Is
• Strong authentication
  • Malicious impersonation now impossible
• Better authorization
  • More control over who can view/control jobs
• Ensures isolation between running tasks
• An ongoing development priority
What Hadoop Security Is Not
• Encryption on the wire
• Encryption on disk
• Protection against DoS attacks
• Enabled by default
Security Beyond Core Hadoop
• Comprehensive documentation and best practices
  • https://ccp.cloudera.com/display/CDHDOC/CDH3+Security+Guide
• All components of CDH3 are capable of interacting with a secure Hadoop cluster
• Hive 0.7 (included in CDH3) added a rich set of access controls
• Much easier deployment if you use Cloudera Enterprise
Security Roadmap
• Pluggable "edge authentication" (e.g. PKI, SAML)
• More authorization features across CDH components
  • e.g. HBase access controls
• Data encryption support
Questions?
Aaron T. Myers
atm@cloudera.com
@atm