Technical tips for secure Apache Hadoop cluster #ApacheConAsia #ApacheCon
1. Technical tips for secure
Apache Hadoop cluster
Akira Ajisaka, Kei Kori
Yahoo Japan Corporation
Big Data
2. Akira Ajisaka (@ajis_ka)
• Software Engineer in Hadoop team @ Yahoo! JAPAN
– Upgraded HDFS to 3.3.0 and enabled RBF
– R&D to make Hadoop clusters more secure than just
enabling Kerberos auth
• Apache Hadoop committer/PMC
– ~800 commits in various components in 6 years
– Handled and announced several CVEs
– Manages build and QA environment
3. Kei KORI (@2k0ri)
• Data Platform Engineer
in Hadoop team @ Yahoo! JAPAN
– Built the upgrade to, and continuous delivery for, HDFS 3.3.0
– Researched operations for a more secure Hadoop cluster
• Kubernetes admin for Hadoop client environment
– Migrates users from VMs/bare metal to a cloud-native environment
– Integrates ML/DL workloads with the Hadoop ecosystem
5. Session Overview
Prerequisites:
• Hadoop is not secure by default
• Kerberos authentication is required
This talk introduces further details in practice:
• Wire encryption in Hadoop ecosystem
• HDFS transparent data encryption at rest
• Other considerations
7. Background
For making the Hadoop ecosystem more secure than
perimeter security alone:
• Not only authenticate but also encrypt communications
• Protection against and mitigation of internal threats like
packet sniffing
• Part of security compliance such as NIST SP 800-171
9. HTTP encryption for Hadoop
• dfs.http.policy: HTTPS_ONLY in hdfs-site,
yarn.http.policy: HTTPS_ONLY in yarn-site,
mapreduce.jobhistory.http.policy: HTTPS_ONLY in mapred-site
etc.
– Enable TLS on WebUI/REST API endpoints
– Use HTTP_AND_HTTPS while rolling-updating endpoints
• yarn.timeline-service.webapp.https.address in yarn-site,
mapreduce.jobhistory.webapp.https.address in mapred-site
– Set History/Timeline Server endpoints with HTTPS
• Store certs and passphrases
with the Hadoop Credential Provider via
hadoop.security.credential.provider.path
– Separates permissions from configs
– Prevents exposure that hadoop.security.sensitive-config-keys filtering does not cover
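For illustration, a minimal hdfs-site.xml sketch of the above (the credential store path is a hypothetical example):
<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>hadoop.security.credential.provider.path</name>
<value>jceks://file/etc/hadoop/conf/ssl-creds.jceks</value>
</property>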
10. RPC encryption for Hadoop
• hadoop.rpc.protection: privacy in core-site
– Encrypts RPC incl. Kerberos authentication on SASL layer
– Propagates to
hadoop.security.saslproperties.resolver.class,
dfs.data.transfer.saslproperties.resolver.class and
dfs.data.transfer.protection
• hadoop.rpc.protection: privacy,authentication
while rolling-updating all Hadoop servers/clients
– Accepts fallback to non-encrypted RPC
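As a core-site.xml sketch for the rolling-update phase; once every server and client is updated, tighten the value to privacy only:
<property>
<name>hadoop.rpc.protection</name>
<value>privacy,authentication</value>
</property>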
11. Block data transfer encryption for
Hadoop
• dfs.encrypt.data.transfer: true,
dfs.encrypt.data.transfer.cipher.suites:
AES/CTR/NoPadding in hdfs-site
– Only encrypts payload between HDFS client and DataNodes
• Rolling update is not supported by configuration alone
– Requires managing a list of encrypted nodes, or extending/
implementing your own dfs.trustedchannel.resolver.class
– Nodes trusted by dfs.trustedchannel.resolver.class
are forced to transfer without encryption regardless of their
encryption settings
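A minimal hdfs-site.xml sketch of the settings above:
<property>
<name>dfs.encrypt.data.transfer</name>
<value>true</value>
</property>
<property>
<name>dfs.encrypt.data.transfer.cipher.suites</name>
<value>AES/CTR/NoPadding</value>
</property>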
12. Encryption for Spark
In spark-defaults:
• HTTP encryption
– spark.ssl.sparkHistory.enabled true
• Switches the protocol on a single port; does not support HTTP_AND_HTTPS
– spark.yarn.historyServer.address https://...
• RPC encryption
– spark.authenticate: true
• Also in yarn-site
– spark.authenticate.enableSaslEncryption true
– spark.network.sasl.serverAlwaysEncrypt true
• Enable only after all Spark components recognize enableSaslEncryption
• Shuffle encryption
– spark.network.crypto.enabled true
– spark.io.encryption.enabled true
• Encrypts spilled caches and RDDs on local disks
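Putting the above together as a spark-defaults sketch (per the note above, set spark.network.sasl.serverAlwaysEncrypt only after all components recognize SASL encryption):
spark.authenticate true
spark.authenticate.enableSaslEncryption true
spark.network.sasl.serverAlwaysEncrypt true
spark.network.crypto.enabled true
spark.io.encryption.enabled true
spark.ssl.sparkHistory.enabled true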
13. Encryption for Hive
• hive.server2.thrift.sasl.qop: auth-conf in hive-site
– Encrypts JDBC between clients and HiveServer2 in binary mode
– And Thrift between clients and Hive Metastore
• hive.server2.use.SSL: true in hive-site
– Only for HS2 http mode
– HS2 binary mode cannot enable both TLS and SASL
• Encryption for JDBC between HS2/Hive Metastore and remote RDBMS
• Shuffle encryption
– Tez:
tez.runtime.shuffle.ssl.enable: true,
tez.runtime.shuffle.keep-alive.enabled: true in tez-site
– MapReduce:
mapreduce.ssl.enabled: true,
mapreduce.shuffle.ssl.enabled: true in mapred-site
– Requires server certs for all NodeManagers
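For example, minimal hive-site.xml and tez-site.xml sketches of the above:
<!-- hive-site.xml -->
<property>
<name>hive.server2.thrift.sasl.qop</name>
<value>auth-conf</value>
</property>
<!-- tez-site.xml -->
<property>
<name>tez.runtime.shuffle.ssl.enable</name>
<value>true</value>
</property>
<property>
<name>tez.runtime.shuffle.keep-alive.enabled</name>
<value>true</value>
</property>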
14. Challenges in HTTP encryption for
ApplicationMaster / Spark Driver
• Server certs for ApplicationMaster / SparkDriver
need to be readable by the user who submitted the application
– ApplicationMaster and SparkDriver run as the user
– WebApplicationProxy between ResourceManager and
ApplicationMaster relies on this encryption
• Applications support TLS and can bundle certs since
– Spark 3.0.0: SPARK-24621
– MapReduce 3.3.0: MAPREDUCE-4669
– Tez: not supported yet
15. Encryption for ZooKeeper server
• Authenticate with SASL, encrypt with TLS
– ZooKeeper does not respect SASL QOP
• Requires ZooKeeper 3.5.6 or above for servers/quorums
– serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
– sslQuorum=true
– ssl.clientAuth=NONE
– ssl.quorum.clientAuth=NONE
• Needs ZOOKEEPER-4276 to follow "Upgrading existing
non-TLS cluster with no downtime"
– Lets ZK serve with secureClientPort only
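A zoo.cfg sketch of the above; the secure port and keystore/truststore locations are placeholders (passwords and the analogous ssl.quorum.* keystore settings are omitted here):
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
secureClientPort=2281
sslQuorum=true
ssl.clientAuth=NONE
ssl.quorum.clientAuth=NONE
# illustrative paths only
ssl.keyStore.location=/path/to/server.keystore.jks
ssl.trustStore.location=/path/to/server.truststore.jks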
16. Encryption for ZooKeeper client
• Also requires ZooKeeper 3.5.6 or above for clients
-Dzookeeper.client.secure=true
-Dzookeeper.clientCnxnSocket=
org.apache.zookeeper.ClientCnxnSocketNetty
in client JVM args
– HADOOP_OPTS environment variable
– mapreduce.admin.map.child.java.opts,
mapreduce.admin.reduce.child.java.opts in mapred-site
for Oozie Coordinator MapReduce jobs
• Needs to replace and update ZooKeeper jars in all components
which communicate with ZooKeeper
– ZKFC, ResourceManager, Hive clients incl. HS2, Oozie and Livy
– Apache Curator must also be updated to 4.2.0, and Netty from 4.0 to 4.1
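For example, via the HADOOP_OPTS route mentioned above (keystore/truststore locations are placeholders; the corresponding password properties are omitted):
export HADOOP_OPTS="$HADOOP_OPTS \
-Dzookeeper.client.secure=true \
-Dzookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty \
-Dzookeeper.ssl.keyStore.location=/path/to/client.keystore.jks \
-Dzookeeper.ssl.trustStore.location=/path/to/client.truststore.jks"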
17. Enforcing Kerberos AuthN/Z for
ZooKeeper
• Requires ZooKeeper 3.6.0 or above for servers
– 3.6.0+:
zookeeper.sessionRequireClientSASLAuth=true
– 3.7.0+:
enforce.auth.enabled=true
enforce.auth.schemes=sasl
• Oozie Hive actions will not work when ZK SASL is enforced
– They fail when acquiring the lock for Hive Metastore
– There is no mechanism to delegate authentication or
impersonation for ZooKeeper
– Using HiveServer2 / the Oozie Hive2 action solves it
19. Background
HDFS blocks are written to the local filesystem of the DataNodes
• The data is not encrypted by default
• Encryption is required in several use cases
Encryption can be done at several layers:
• Application: most secure, but hardest to do
• Database: most databases have this, but may incur performance
penalties
• Filesystem: high performance, transparent, but may not be flexible
• Disk: only really protects against physical theft
HDFS TDE fits between the database and filesystem levels
21. KeyProvider: Where KEK is saved
Implementations of KeyProvider API
• Hadoop KMS: JavaKeyStoreProvider
– JCEKS files in Hadoop compatible filesystems (localFS, HDFS,
cloud storage)
– Not recommended
• Apache Ranger KMS: RangerKeyStoreProvider
– RDBMS
– master key can be stored in Luna HSM (optional)
– HSM is required in some use cases
• PCI-DSS, FIPS 140-2
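On the Hadoop KMS side, the backing KeyProvider is selected in kms-site.xml; a JavaKeyStoreProvider sketch (again, not recommended for production):
<property>
<name>hadoop.kms.key.provider.uri</name>
<value>jceks://file@/${user.home}/kms.keystore</value>
</property>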
22. Extending KeyProvider API is
not difficult
• Mandatory methods for HDFS TDE
– getKeyVersion, getCurrentKey, getMetadata
• Optional methods (nice to have for operation)
– getKeys, getKeysMetadata, getKeyVersions, createKey, deleteKey,
rollNewVersion
– If not implemented, you need to create/delete/list/roll keys in some
other way
• Use cases:
– LinkedIn integrated with its own key management service, LiKMS
https://engineering.linkedin.com/blog/2021/the-exabyte-club--linkedin-s-journey-of-scaling-the-hadoop-distr
– Yahoo! JAPAN also integrated its own credential store in only
~500 LOC (including test code)
23. KeyProvider is actually stable,
can be used safely
• KeyProvider is @Public and @Unstable
– @Unstable in Hadoop means "incompatible changes are
allowed at any time"
• Actually, the API is very stable
– No incompatible changes
– Ranger uses it since 2015: RANGER-247
• Provided a patch to mark it stable
– HADOOP-17544
24. Hadoop KMS: Where KEK is
cached and authorization is performed
• KMS interacts with HDFS clients, NameNodes, and KeyProvider
• KMS has its own ACLs, separate from HDFS ACLs
– An attacker cannot decrypt data even if HDFS ACLs are compromised
– If 'usera' reads/writes data in the encryption zone with 'keya', the
configuration in kms-acls.xml will be:
<property>
<name>key.acl.keya.DECRYPT_EEK</name>
<value>usera</value>
</property>
– The configuration is hot-reloaded
• For HA and scalability, multiple KMS instances are supported
25. How to deploy multiple KMS
instances
Two Approaches:
1. Behind a load-balancer or VIP
2. Using LoadBalancingKMSClientProvider
– Implicitly used when multiple URIs are specified in
hadoop.security.key.provider.path
If you have a LB or VIP, use it
• No configuration change to scale-out/decommission
• The LB saves clients' retry cost
– LoadBalancingKMSClientProvider first tries to connect to one KMS and,
if that fails, connects to another
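For reference, a sketch of the client-side setting with two KMS URIs (hostnames are placeholders):
<property>
<name>hadoop.security.key.provider.path</name>
<value>kms://https@kms01.example.com;kms02.example.com:9600/kms</value>
</property>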
26. How to configure multiple KMS
instances
• Delegation Token must be synchronized
– Use ZKDelegationTokenSecretManager
– Documented an example configuration: HADOOP-17794
• hadoop.security.token.service.use_ip
– If true (default), SSL certificate validation fails in
multi-homed environments
– Documented: HADOOP-12665
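A kms-site.xml sketch along the lines of the configuration documented in HADOOP-17794 (hostnames and the znode path are placeholders):
<property>
<name>hadoop.kms.authentication.zk-dt-secret-manager.enable</name>
<value>true</value>
</property>
<property>
<name>hadoop.kms.authentication.zk-dt-secret-manager.zkConnectionString</name>
<value>zk01.example.com:2181,zk02.example.com:2181</value>
</property>
<property>
<name>hadoop.kms.authentication.zk-dt-secret-manager.znodeWorkingPath</name>
<value>kms/zkdtsm</value>
</property>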
27. Tuning Hadoop KMS
• Documented and discussed in HADOOP-15743
– Reduce SSL session cache size and TTL
– Tune the https idle timeout
– Increase max file descriptors
– etc.
• This tuning is effective in HttpFS as well
– Both KMS/HttpFS use Jetty via HttpServer2
28. Recap: HDFS TDE
• Careful configuration required
– How to save KEK
– Running multiple KMS instances
– KMS Tuning
– Where to create encryption zones
– ACLs (including key ACLs and impersonation)
• They are still not straightforward, despite the long time
since the feature was developed
30. Updating SSL certificates
• Hadoop >= 3.3.1 allows updating SSL certificates
without downtime: HADOOP-16524
– Uses the hot-reload feature in Jetty
– Except DataNodes, since they don't rely on Jetty
• Especially useful for NameNode, because it takes more than
30 minutes to restart in a large cluster
31. Other considerations
• It is important to be ready to upgrade at any time
– Sometimes CVEs have been published and the vendors
warn users to upgrade
• Security requirements may increase later, so be
prepared for that early
• Operational considerations are also necessary
– Not only the cluster configuration but also the operations
will change
32. Conclusion & Future work
We introduced many technical tips for securing Hadoop
clusters
• However, they might change in the future
• Need to catch up with the OSS community
Future work
• How to enable SSL/TLS in ApplicationMaster & Spark Driver
Web UIs
• Impersonation does not work correctly in KMSClientProvider:
HDFS-13697