SAS1844 - Securing Hadoop Clusters while Still Retaining Your Sanity
Evan Kinney, Security Software Developer
SAS
$whoami
Security Software Developer
at SAS
Platform authn/authz,
general security
Kerberos; all the time
always.
Open source
developer/contributor/wrangler
Dinosaur enthusiast
Twitter: @evankinney
GitHub: 3van
Freenode: evanosaurus
Why are we here?
 Hadoop isn’t secure by default
 Fixing this is non-trivial
 Mistakes made are hard to diagnose
 No consistent story for authentication (yet)
 Kerberos is hard
Let’s talk about the elephant in the room.
 (this is the first time that pun has ever been used.)
 When created, Hadoop didn’t need security
 Components were almost exclusively single-tenant and isolated
 What problems do we have?
 HDFS and MapReduce
 Blindly trust that a user is who they say they are
 Allow for arbitrary Java code execution as the JobTracker service account
 Trivial to circumvent permissions restrictions
 Rogue services
 Services don’t authenticate each other
 DataNodes
 Know the block ID? Awesome, here are your dataz!
Okay, great—now how do we fix it?
 Kerberos! (surprise)
 Doesn’t require significantly changing the architecture
 Uses symmetric encryption
 Allows services to authenticate each other
 Allows services to authenticate users
 Allows users to authenticate services
Let’s talk about the three-headed dog in the room.
 Kerberos is a standard protocol [RFC 4120] that allows for
mutual authentication of entities via the use of a trusted arbiter
 Developed at MIT, originally for Project Athena
 V4 - 1988[ish]
 V5 - 1993 (RFC 1510); 2005 (RFC 4120)
 Used in a lot more places than most people think
 Active Directory == LDAP + Kerberos
 Once it’s working, is usually fairly transparent to users
Some Kerberos Terminology
 principal: an entity that can authenticate or be authenticated to
 realm: an administrative partition that contains one or more principals; always given in
uppercase
 ticket: ASN.1 structure containing information about a request as well as authenticators to
verify the info
 ticket granting ticket (TGT): the result of initial authentication with a KDC; used to assert
identity in further requests
 service ticket: issued to a principal for use in asserting identity to Kerberized services
 credentials cache (CC): contains one or more tickets for one principal
 keytab: holds any number of pre-salted/hashed long-term keys (passwords); generally used for
headless/automated services
 key distribution center (KDC): the trusted arbiter; stores keys for one or more realms and
answers authentication requests from principals
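 ticket granting service (TGS): accepts a TGT plus an authenticator encrypted with the TGT
session key, issues service tickets
 authentication service (AS): verifies that the client holds the key derived from its
salted/hashed password (the password itself is never sent), issues TGT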
How does it work?
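The exchange can be sketched as a toy model. This is only an illustration of who holds which key, not real Kerberos: the `Sealed` class stands in for symmetric encryption, and the principal names and key strings (`alice`, `hdfs/nn1`, `k-alice`, and so on) are hypothetical. Timestamps, nonces, realms, and pre-authentication are left out.

```python
from dataclasses import dataclass


@dataclass
class Sealed:
    """Toy stand-in for ciphertext: only the matching key can open it."""
    key: str
    payload: dict

    def open(self, key: str) -> dict:
        if key != self.key:
            raise PermissionError("wrong key")
        return self.payload


class ToyKDC:
    """Holds every principal's long-term key, like the trusted arbiter."""

    def __init__(self, keys: dict):
        self.keys = keys  # principal name -> long-term key
        self._counter = 0

    def _new_session_key(self) -> str:
        self._counter += 1
        return f"session-{self._counter}"

    def as_exchange(self, client: str):
        """AS: returns a TGT (sealed for the TGS) and a reply sealed for the client."""
        session_key = self._new_session_key()
        tgt = Sealed(self.keys["krbtgt"], {"client": client, "session_key": session_key})
        reply = Sealed(self.keys[client], {"session_key": session_key})
        return tgt, reply

    def tgs_exchange(self, tgt: Sealed, authenticator: Sealed, service: str):
        """TGS: checks the TGT and authenticator, then issues a service ticket."""
        info = tgt.open(self.keys["krbtgt"])
        auth = authenticator.open(info["session_key"])
        if auth["client"] != info["client"]:
            raise PermissionError("authenticator does not match ticket")
        session_key = self._new_session_key()
        ticket = Sealed(self.keys[service],
                        {"client": info["client"], "session_key": session_key})
        reply = Sealed(info["session_key"], {"session_key": session_key})
        return ticket, reply


kdc = ToyKDC({"krbtgt": "k-tgs", "alice": "k-alice", "hdfs/nn1": "k-hdfs"})

# 1. Initial authentication: only alice's own key opens the AS reply.
tgt, as_reply = kdc.as_exchange("alice")
tgt_session = as_reply.open("k-alice")["session_key"]

# 2. Ask the TGS for a service ticket, proving knowledge of the TGT session key.
authenticator = Sealed(tgt_session, {"client": "alice"})
ticket, tgs_reply = kdc.tgs_exchange(tgt, authenticator, "hdfs/nn1")

# 3. The service opens the ticket with its long-term key (from its keytab).
seen_by_service = ticket.open("k-hdfs")
```

The point of the sketch is that the client never sees the service's key and the service never talks to the KDC: both end up with the same session key because the KDC sealed one copy for each of them.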
Hadoop and Kerberos
 Hadoop relies on user accounts to enforce ACLs, et al. (if not
configured to look things up in LDAP)
 Generally speaking, a Kerberos principal is not necessarily associated
with a POSIX user account (though, in practice, they usually are)
 Once authenticated via Kerberos, a user is issued a delegation token
(specific to Hadoop) for use in further requests
 Scheduling is… hard
 Default ticket lifetime is 10 hours, can be renewed for 7 days
 Most distributions assume you’re running an isolated realm with your
own KDC
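To see why scheduling gets awkward, here is a rough sketch of when a long-running job would have to renew its ticket. It assumes the 10-hour and 7-day figures from the slide (real values depend on KDC and client configuration), that a ticket must be renewed before it expires, and a hypothetical one-hour safety margin. Hadoop delegation tokens are not modelled.

```python
from datetime import datetime, timedelta

TICKET_LIFETIME = timedelta(hours=10)     # figure from the slide
RENEWABLE_LIFETIME = timedelta(days=7)    # figure from the slide


def renewal_schedule(issued_at, job_length, margin=timedelta(hours=1)):
    """Return (renewal_times, needs_relogin) for a job starting at issued_at.

    needs_relogin is True when renewals alone cannot cover the job and a
    fresh login (for example from a keytab) would be required.
    """
    renew_until = issued_at + RENEWABLE_LIFETIME
    job_end = issued_at + job_length
    expires = issued_at + TICKET_LIFETIME
    renewals = []
    while expires < job_end:
        renew_at = expires - margin
        # A renewed ticket gets a fresh lifetime, capped at renew_until.
        new_expires = min(renew_at + TICKET_LIFETIME, renew_until)
        if new_expires <= expires:
            return renewals, True
        renewals.append(renew_at)
        expires = new_expires
    return renewals, False


start = datetime(2015, 4, 27, 9, 0)
print(renewal_schedule(start, timedelta(hours=15)))
print(renewal_schedule(start, timedelta(days=8))[1])
```

A 15-hour job needs a single renewal; an 8-day job cannot be covered by renewals at all, which is one reason headless services usually log in from a keytab.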
What about Active Directory?
 Great for users; not so great for admins
 Two different deployment architectures:
 Only Active Directory
 both user and Hadoop service principals exist as AD objects
 Cross-realm principals (trusts)
 Hadoop service principals exist in a pure Kerberos realm,
users exist in AD
 Both have fun issues
 Where’s the user data coming from?
Only Active Directory
 Tons and tons of objects to create for Hadoop services
 Unless using a vendor-supplied management
mechanism, setup is a very manual process
 Usually requires IT involvement any time changes are
made
 AD’s idea of Kerberos != everyone else’s
Cross-Realm Principal Architecture
 Much more complex; harder to debug
 Unless you configure the KDC with replication (or a backend
database that replicates itself), it becomes a massive SPoF
 You have to administer the KDC
 Getting AD to use the correct encryption types is somewhat
challenging
 Windows (i.e. purely SSPI-based) clients tend to not work
consistently (if at all)
Speaking of encryption types…
 Three are the most used today:
 aes256-cts-hmac-sha1-96
 aes128-cts-hmac-sha1-96
 arcfour-hmac-md5
 Don’t use DES
 or 3DES, preferably… but especially not DES
 Crypto export/import restrictions cause issues with Java
 The unlimited-strength JCE policy files must be present in the JRE to allow the use
of aes256-cts
 AD won’t do AES if the domain functional level (DFL) is lower than Server 2008
 Almost all other libraries default to using the cipher suites in the order above, unless
configured otherwise
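As a small illustration of the advice above, the following sketch reviews a list of configured encryption types. The preferred order is the one given on the slide; the `des-` and `des3-` name prefixes follow the usual MIT-style enctype names, and the function names are hypothetical.

```python
# Preference order taken from the slide.
PREFERRED = [
    "aes256-cts-hmac-sha1-96",
    "aes128-cts-hmac-sha1-96",
    "arcfour-hmac-md5",
]


def review_enctypes(configured):
    """Split configured enctypes into (preferred_in_order, weak, other)."""
    ok = [e for e in PREFERRED if e in configured]
    weak = [e for e in configured if e.startswith(("des-", "des3-"))]
    other = [e for e in configured if e not in ok and e not in weak]
    return ok, weak, other


def needs_unlimited_jce(configured):
    """True if any AES-256 enctype is in use, which the slide ties to the JCE policy files."""
    return any(e.startswith("aes256-") for e in configured)


ok, weak, other = review_enctypes(
    ["des-cbc-crc", "arcfour-hmac-md5", "aes256-cts-hmac-sha1-96"]
)
print("use:", ok)
print("remove:", weak)
print("check JCE policy files:", needs_unlimited_jce(ok))
```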
Considerations for Hadoop
 Kerberos doesn’t encrypt your data or traffic
 Communication between all DataNodes and NameNode(s)
should be isolated and/or protected (via hadoop.rpc.protection)
 If users have access to the files themselves, ACLs are basically
useless
 If they have root/admin access to the servers…
 …none of this matters anyway
 Hadoop services determine what their hostname (and, thus, service
principal name) is via reverse DNS (or via fs.default.name, if set)
 Also, Kerberos itself is very, very dependent upon properly
administered DNS records and local client configuration
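Because the service principal name hangs off the hostname, a quick forward/reverse DNS consistency check is worth having. The sketch below uses the standard `socket` resolver calls by default but lets you inject fake resolvers; the host, service, and realm names are hypothetical, and the `service/host@REALM` shape is the conventional form for service principals.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP and back; report whether the names agree.

    Returns (consistent, ip, reverse_name).
    """
    ip = forward(hostname)
    reverse_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, addresses)
    same = reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
    return same, ip, reverse_name


def expected_principal(service, reverse_name, realm):
    """The principal a service would derive from its reverse-resolved hostname."""
    return f"{service}/{reverse_name.lower().rstrip('.')}@{realm.upper()}"


# Fake resolvers so the example runs without touching real DNS.
fake_forward = lambda host: "10.0.0.5"
fake_reverse = lambda ip: ("nn1", [], [ip])  # short name: a common misconfiguration

consistent, ip, name = check_forward_reverse("nn1.example.com", fake_forward, fake_reverse)
print(consistent, expected_principal("hdfs", name, "example.com"))
```

If the reverse lookup returns a short name or a different alias, the derived principal will not match the one in the keytab, and authentication fails in ways that rarely mention DNS.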
Considerations for SAS
 SAS/ACCESS® Interface to Hadoop™
 Uses Java, so subject to aforementioned issues
 SAS® Enterprise Guide®, Web-based products (e.g. SAS® Visual
Analytics), et al.
 Need to configure sasauth for PAM authentication
 Need to configure PAM to obtain Kerberos credentials on login (via
SSSD, pam_krb5, QAS, etc.)
 If AD: need to configure nsswitch to obtain user info from AD (via
SSSD, nss_ldap, etc.)
 Needed for both SAS and Hadoop
How can this go wrong?
 Don’t try to enumerate them all; sadness will ensue
 Vast majority of issues are eventually attributed to incorrect or
missing configuration
 Adding debug parameters to the JVM invocation will almost
always lead you in the right direction
 -Dsun.security.krb5.debug=true
 -Dsun.security.jgss.debug=true
 HADOOP_JAAS_DEBUG=true (environment variable, not a JVM flag)
 Wireshark is invaluable
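One way to switch these on for a single command is to build a modified environment. This sketch assumes the Hadoop launcher scripts pass the `HADOOP_OPTS` environment variable to the JVM and that `HADOOP_JAAS_DEBUG` is read from the environment; the function name is hypothetical.

```python
# The two JVM system properties named on the slide.
DEBUG_PROPERTIES = {
    "sun.security.krb5.debug": "true",
    "sun.security.jgss.debug": "true",
}


def debug_environment(base_env):
    """Return a copy of base_env with the Kerberos debug switches added."""
    env = dict(base_env)
    flags = " ".join(f"-D{name}={value}" for name, value in DEBUG_PROPERTIES.items())
    # Assumption: HADOOP_OPTS is appended to the JVM command line by the launcher.
    env["HADOOP_OPTS"] = (env.get("HADOOP_OPTS", "") + " " + flags).strip()
    env["HADOOP_JAAS_DEBUG"] = "true"
    return env


env = debug_environment({"HADOOP_OPTS": "-Xmx1g"})
print(env["HADOOP_OPTS"])
# To run a command with it (not executed here):
#   import os, subprocess
#   subprocess.run(["hadoop", "fs", "-ls", "/"], env=debug_environment(os.environ))
```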
Common (and/or Particularly Egregious) Pitfalls
 Bad principal mapping to local users
 If the user principal attempting to authenticate is from a realm other than the
default realm, rules must be set up to indicate that principals from the other realm
are to be trusted as being equivalent to local accounts of the same name
 Usually only matters if using cross-realm principals (trusts)
 Consists of a set of regex-like strings used to parse principals into their
constituent parts
 Set in both krb5.conf and Hadoop configs
 krb5.conf: auth_to_local (defined per-realm)
 Hadoop: hadoop.security.auth_to_local
 Java is *supposed* to look in krb5.conf, but it doesn’t work
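The mapping rules are easier to reason about with a small evaluator in hand. The sketch below is a simplified reading of the `RULE:[n:format](regex)s/pattern/replacement/` syntax, not Hadoop's actual implementation: it ignores nested parentheses in the match expression and other corner cases, and the realm names are hypothetical.

```python
import re

# RULE:[<component count>:<format>](<match regex>)s/<pattern>/<replacement>/[g]
RULE_RE = re.compile(
    r"RULE:\[(\d+):([^\]]+)\](?:\(([^)]*)\))?(?:s/([^/]*)/([^/]*)/(g?))?$"
)


def apply_rule(rule, principal):
    """Return the local name produced by one RULE, or None if it does not apply."""
    m = RULE_RE.match(rule)
    if m is None:
        raise ValueError(f"unsupported rule: {rule}")
    count, fmt, match_re, pattern, repl, global_flag = m.groups()
    name, _, realm = principal.partition("@")
    components = name.split("/")
    if len(components) != int(count):
        return None
    parts = [realm] + components  # $0 is the realm, $1.. are the components
    formatted = re.sub(r"\$(\d)", lambda g: parts[int(g.group(1))], fmt)
    if match_re and not re.fullmatch(match_re, formatted):
        return None
    if pattern is not None:
        formatted = re.sub(pattern, repl, formatted, count=0 if global_flag else 1)
    return formatted


def map_principal(rules, principal, default_realm):
    """Try each rule in order; DEFAULT keeps the first component for the default realm."""
    for rule in rules:
        if rule == "DEFAULT":
            name, _, realm = principal.partition("@")
            if realm == default_realm:
                return name.split("/")[0]
            continue
        local = apply_rule(rule, principal)
        if local is not None:
            return local
    return None


rules = [r"RULE:[1:$1@$0](.*@AD\.EXAMPLE\.COM)s/@.*//", "DEFAULT"]
print(map_principal(rules, "alice@AD.EXAMPLE.COM", "HADOOP.EXAMPLE.COM"))
print(map_principal(rules, "hdfs/nn1.example.com@HADOOP.EXAMPLE.COM", "HADOOP.EXAMPLE.COM"))
```

Here the single rule trusts any one-component principal from the AD realm as the local account of the same name, which is the cross-realm case described above; a principal from any other realm falls through and is rejected.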
Common (and/or Particularly Egregious) Pitfalls
 Unlimited-strength JCE policy files missing or bad
 Are you sure you put them in the right JRE?
 Are you sure you put them in all the JREs?
 Did you download the correct version?
 Stack traces (with krb5.debug/jgss.debug):
 javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure
unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC
SHA1-96 is not supported/enabled)]
Common (and/or Particularly Egregious) Pitfalls
 “Clock skew too great”
 Kerberos requires that all parties involved in
authentication have their clocks synchronized within 5
minutes of each other (by default)
 Use chronyd/ntpd against your preferred authoritative
time source on the KDC, and have other clients get
their time from it
 If AD is involved, the PDC is also an NTP server
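The tolerance check itself is simple arithmetic. A minimal sketch, assuming you have already collected each host's clock reading (for example over SSH); the host names are hypothetical and the 5-minute limit is the default mentioned above.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # default tolerance from the slide


def skew_report(kdc_time, host_times, max_skew=MAX_SKEW):
    """Return {host: offset from the KDC} for hosts outside the tolerance."""
    return {
        host: reading - kdc_time
        for host, reading in host_times.items()
        if abs(reading - kdc_time) > max_skew
    }


kdc_now = datetime(2015, 4, 27, 12, 0, tzinfo=timezone.utc)
readings = {
    "dn1.example.com": kdc_now + timedelta(minutes=2),
    "dn2.example.com": kdc_now - timedelta(minutes=6),
}
print(skew_report(kdc_now, readings))
```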
Common (and/or Particularly Egregious) Pitfalls
 “Mechanism level: EncryptedData is encrypted using
keytype DES3 CBC mode with SHA1-KD but decryption
key is of type NULL”
 Long story short: you’re using DES; stop it!
 Actually due to a bug in Java where the RFC wasn’t
interpreted correctly
 https://bugs.openjdk.java.net/browse/JDK-8025124
 Fixed in Java 8 b113 (and current stable)
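The pitfalls above all have recognisable log signatures, so a first-pass triage can be automated. This is only a sketch: the substrings come from the error messages quoted on these slides, the hints paraphrase the advice given with them, and the function name is hypothetical.

```python
# (substring to look for, hint) pairs taken from the pitfalls above.
KNOWN_SYMPTOMS = [
    ("is not supported/enabled",
     "Unlimited-strength JCE policy files missing or wrong version; check every JRE."),
    ("Clock skew too great",
     "Clocks differ by more than the allowed skew; fix time sync against the KDC."),
    ("decryption key is of type NULL",
     "DES/3DES keys in use, or the Java bug JDK-8025124; move to AES and update Java."),
]


def triage(log_text):
    """Return the hints whose signature appears in the given log text."""
    return [hint for needle, hint in KNOWN_SYMPTOMS if needle in log_text]


sample = ("javax.security.sasl.SaslException: GSS initiate failed [Caused by "
          "GSSException: Failure unspecified at GSS-API level (Mechanism level: "
          "Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]")
for hint in triage(sample):
    print(hint)
```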
Questions?