In this episode, we will focus on security in the cloud at scale. We’ll have Netflix speakers discussing existing and upcoming security-related OSS releases, and we’ll also have external speakers from organizations that are using and contributing to Netflix security OSS.
First, Patrick Kelley from Netflix’s Security Operations team will speak about RepoMan, an upcoming OSS release designed to right-size AWS permissions. Then, Wes Miaw from Netflix’s Security Engineering team will discuss MSL (Message Security Layer).
We have two external speakers for this event: Chris Dorros from OpenDNS/Cisco will talk about his use of and contributions to Lemur, and Ryan Lane from Lyft will talk about their use of BLESS.
After the talks, we’ll have OSS authors at demo stations to answer questions and provide demos of Netflix security OSS, including Lemur, MSL, and Security Monkey.
5. Agenda
● MSL in a nutshell
● Motivations
● Netflix and MSL
● External Interest
● Continuing Work
6. MSL in a Nutshell
● Transport protocol.
○ Security
■ encryption, integrity protection, non-replayability
○ Authentication
■ devices + servers + users
○ (Not Authorization)
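The integrity and non-replayability properties listed above can be illustrated with a minimal sketch. This is not the MSL wire format or API: the function names, the JSON envelope, and the "strictly increasing message ID" replay rule are assumptions made for illustration, and encryption is left out to keep the example within the standard library.

```python
import hashlib
import hmac
import json


class ReplayError(Exception):
    """Raised when a message ID is not newer than the last one accepted."""


def seal(key, message_id, payload):
    """Serialize the payload with its message ID and append an HMAC-SHA256 tag."""
    body = json.dumps({"id": message_id, "payload": payload}, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + tag


def open_sealed(key, blob, last_seen_id):
    """Verify the tag, reject replays, and return (message_id, payload)."""
    # The tag is hex, so the last "." always separates body from tag.
    body, _, tag = blob.rpartition(b".")
    expected = hmac.new(key, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    message = json.loads(body)
    # Hypothetical replay rule: IDs must strictly increase per sender.
    if message["id"] <= last_seen_id:
        raise ReplayError("message ID already seen")
    return message["id"], message["payload"]
```

A receiver would keep `last_seen_id` per sender and update it after each accepted message, so a captured message cannot be accepted a second time.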
7. Motivations (1)
● HTML5 Standards-Based Playback
○ JavaScript, EME, MSE
○ Web browsers & HTML5 runtime environments
● Eliminate SSL/TLS
○ initial handshake overhead
○ problematic PKI infrastructure
○ time is always wrong and never trustworthy
8. Motivations (2)
● Unified Authentication
○ authenticate once
○ device + user auth anywhere (client + server)
● Platform & Services Integration
○ device-based crypto (or no crypto)
○ third-party user authentication
9. Motivations (3)
● Updateable & Recoverable
○ fixes and features pushed by Netflix
○ recovery from platform crypto or storage bugs
10. Netflix and MSL - Network Architecture
Once messages are processed by the MSL stack, all
applications trust the entity + user identities.
11. Netflix and MSL - Trust
● Device Security
○ securely identify device types
○ different devices satisfy different levels of content protection
● User Security
○ user identity and data bound to the device
12. External Interest
● Financial firm trying to avoid HTTPS overhead.
● Proxy-based service that wants to inspect traffic w/o
compromising the communications security.
● Company building microservices that require secure
communication and authentication.
13. Continuing Work
● New device authentication schemes.
● Platform-based session keys.
● Single-sign-on.
● Integration into third-party applications.
● Encoder abstraction.
15. Repoman Agenda
● Review: Least Privilege
● Dependency: RolliePollie
● Workflow Overview
● Introducing Role Groups
● Access Profiling
● Group Template Creation
16. Least Privilege
Assigning the correct permissions is non-trivial.
* Too many permissions, nobody complains...
until there is an incident.
* Too few permissions, the app is broken.
* There are currently around 2,500 unique AWS
permissions. Almost impossible to guess which
ones an app requires.
17. RolliePollie
Enforcement Arm of Repoman.
Notifies Security Team, or reverts
any changes, if role is ever
modified and doesn’t match
template.
Consistency is maintained across
all AWS accounts.
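The notify-or-revert behaviour described above can be sketched as a comparison between a role's current policy and its template. This is a hypothetical illustration, not RolliePollie's code: `check_role`, the `revert` flag, and the returned dictionary are invented names, and fetching or writing policies through AWS is left to the caller.

```python
import json


def canonical(policy):
    """Stable string form of a policy so that dict key order does not matter.

    Note: list order (for example inside "Action") is still significant here.
    """
    return json.dumps(policy, sort_keys=True)


def check_role(role_name, current_policy, template_policy, revert=False):
    """Decide what to do with one role: nothing, notify, or revert to template."""
    if canonical(current_policy) == canonical(template_policy):
        return {"role": role_name, "action": "none", "policy": current_policy}
    if revert:
        # The caller would write template_policy back to the role.
        return {"role": role_name, "action": "revert", "policy": template_policy}
    # The caller would alert the security team about the drift.
    return {"role": role_name, "action": "notify", "policy": current_policy}
```

Running the same check for a role in every account against one shared template is one way to get the cross-account consistency the slide mentions.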
19. Role Groups
Especially useful for application roles deployed across
many AWS accounts.
● SecurityMonkey, Discovery, Lemur, Atlas
Treat a set of IAM roles as a single entity.
Keep their permissions consistent.
24. Access Profiling
Only remove permissions that are
supported by CloudTrail.
Handle wildcards & NotAction
Preserve Conditions
Preserve Resource & NotResource
Access Advisor data is also incredibly
useful.
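A rough sketch of the pruning rules above, under stated assumptions: "supported by CloudTrail" is read here as "actions whose calls CloudTrail records", and `known_actions`, `logged_actions`, and `used_actions` are hypothetical inputs the caller would build from an action catalogue and from log data. It is not Repoman's implementation.

```python
import fnmatch


def _as_list(value):
    """IAM allows a single string or a list for Action/NotAction."""
    return [value] if isinstance(value, str) else list(value)


def _matches(action, patterns):
    """Case-insensitive match of one action against IAM-style wildcards."""
    return any(fnmatch.fnmatchcase(action.lower(), p.lower()) for p in patterns)


def prune_statement(statement, known_actions, logged_actions, used_actions):
    """Return a copy of an Allow statement with unused, logged actions removed.

    Resource, NotResource, Condition and every other key are copied untouched.
    """
    if "NotAction" in statement:
        excluded = _as_list(statement["NotAction"])
        granted = {a for a in known_actions if not _matches(a, excluded)}
    else:
        allowed = _as_list(statement["Action"])
        granted = {a for a in known_actions if _matches(a, allowed)}
    # Keep an action if it was used, or if the logs cannot tell us either way.
    kept = sorted(a for a in granted if a in used_actions or a not in logged_actions)
    pruned = {k: v for k, v in statement.items() if k not in ("Action", "NotAction")}
    pruned["Action"] = kept  # An empty list means the statement should be dropped.
    return pruned
```

Expanding wildcards and NotAction into explicit actions first is a design choice of this sketch: it makes the "only remove what the logs can vouch for" rule a simple set filter.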
29. BLESS?
● BLESS = “Bastion's Lambda Ephemeral SSH Service”
● Short-lived (4-minute) certificates issued after strong
user authentication
● Small codebase, running on Lambda in a separate AWS
account, as the Certificate Authority
30. BLESS on the endpoint?
Can we use the same principles as BLESS to allow
ephemeral keys on our engineers’ laptops?
● Enforce two-factor authentication when issuing a
certificate
○ Less concern if their laptop is stolen or 0wned
● Improve employee onboarding/offboarding
○ IT doesn’t have to generate the user’s private key
○ No “base deploy” to add/remove user’s public key on infrastructure
31. BLESS + kmsauth
How do we ensure the user requesting the certificate
matches the username logging into the server?
Use Lyft’s kmsauth to cryptographically bind the AWS user to
the certificate’s username
● Only the AWS user has the permissions to get a
(kms encrypted) token for their username
● Lambda will only issue certificate with the kmsauth
token’s username
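The binding rule above can be sketched as a check the signing function would run before issuing a certificate. This is a simplified illustration, not kmsauth or BLESS code: the token is modelled as an already-decrypted dictionary, and its `username`, `not_before`, and `not_after` fields are hypothetical names.

```python
def principals_for_certificate(token, requested_username, now):
    """Return the certificate principals, or raise if the request is not allowed.

    `token` stands in for a kmsauth token after KMS decryption (assumed shape).
    """
    if not (token["not_before"] <= now <= token["not_after"]):
        raise PermissionError("token is outside its validity window")
    if token["username"] != requested_username:
        raise PermissionError("requested username does not match token")
    # The certificate is only ever issued for the username inside the token.
    return [token["username"]]
```

Because only the matching AWS user can obtain a token for a given username, passing this check ties the SSH login name to the authenticated AWS identity.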
32. Blessclient
● Small python script to get kmsauth token, assume
“use-bless” role (requires MFA), and manage certificate
on user’s laptop
● Use ssh_config’s “Match exec” to call python script
whenever SSH is invoked
○ However, script doesn’t have stdin/stdout bindings, so poor UX
● SSH wrapper script to call script before invoking SSH
client for improved UX
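The wrapper approach in the last bullet might look roughly like the sketch below. It is not Lyft's blessclient: `refresh` stands in for whatever obtains a new certificate (kmsauth token, role assumption, Lambda call), and the 30-second safety margin is an arbitrary assumption.

```python
import subprocess
import time


def certificate_expired(issued_at, lifetime_seconds, now, margin_seconds=30):
    """True if the certificate is expired or will expire within the margin."""
    return now >= issued_at + lifetime_seconds - margin_seconds


def ssh_with_fresh_certificate(ssh_args, issued_at, lifetime_seconds, refresh,
                               run=subprocess.call, now=None):
    """Refresh the certificate if needed, then hand off to the real ssh client.

    `refresh` is a hypothetical callable; because it runs in the wrapper's own
    process, it can prompt the user (for example for an MFA code) normally.
    """
    now = time.time() if now is None else now
    if certificate_expired(issued_at, lifetime_seconds, now):
        refresh()
    return run(["ssh", *ssh_args])
```

The `run` and `now` parameters exist only so the logic can be exercised without invoking a real SSH client.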
33. Host Certificates
● Hosts get a one-week certificate from Lambda
● Use kmsauth to bind the instance identity to the
hostnames in the certificate
● Blessclient manages CA keys on engineer laptops
39. Problems
● Developers have to make the decisions about
cryptography
○ RSA vs ECDSA?
○ 2048 vs 4096?
○ Device compatibility vs security?
● Keys are littered everywhere
○ Engineers often use laptop to create key/CSR
● Insanely manual, point-and-click, copy-paste process
● etc.
44. Lemur @ OpenDNS
● Wrote plugin for DigiCert
○ Lemur plugin architecture FTW!
● Run in our Docker platform called Quadra
● AWS RDS for Lemur DB
● Keys transferred to Secrets storage service
● Deployed from secrets storage to SSL endpoints
45. What’s Next?
● Increased usage of Lemur API for automation
● Automatic certificate rotation
● Short-lived certs
● Integration with our HSMs
○ For internal CA
● Let’s Encrypt
● More self-service for devs