Real World Cloud Application Security
Lessons Learned Running Large Scale Systems in the Public Cloud


   Jason Chan - chan@netflix.com
Netflix, Inc.
“With more than 27 million streaming members in the United States, Canada, Latin America, the United Kingdom and Ireland, Netflix, Inc. is the world's leading internet subscription service for enjoying movies and TV programs . . .”
Source: http://ir.netflix.com
Me
• Cloud Security Architect @ Netflix
• Responsible for:
  • Cloud app, product, infrastructure, ops
     security
• Previously:
  • Led security team @ VMware
  • Earlier, primarily security consulting at
     @stake, iSEC Partners
AppSec Challenges
Lots of Good Advice
• BSIMM
• Microsoft SDL
• SAFECode
But, what works?




Forrester Consulting, 12/10
Especially, given phenomena such as DevOps, cloud, agile, and the unique characteristics of an organization?
Netflix Engineering
 Characteristics
Netflix in the Cloud -
        Why?

           “Undifferentiated heavy lifting”
Netflix is now ~99% in
    Public Cloud
On the way to the cloud . . .
(or NoOps, depending on definitions)
Some As-Is #s

• 27m+ subscribers
• 10,000s of systems
• 100s of engineers, apps
• ~250 test deployments/day *
• ~70 production deployments/day *
* Sample based on this week’s activities
Deploying Code @ Netflix
A common graph @ Netflix
Lots of watching in prime time
Not as much in early morning
Old way - pay and provision for peak, 24/7/365

Multiply this pattern across the dozens of apps that comprise the Netflix streaming service
Solution: Load-Based Autoscaling
Autoscaling
•   Goals:
    •   # of systems matches load requirements
    •   Load per server is constant
    •   Happens without intervention (the ‘auto’ in autoscaling)
•   Results:
    •   Continuously adding/removing nodes
    •   New nodes must mirror existing
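The autoscaling goals above can be illustrated with a minimal sketch. It is not Netflix's scaling policy: the function name, the target of 500 requests/second per instance, the size limits, and the hourly load figures are all hypothetical values chosen for illustration. It only shows how holding load per server constant makes the number of systems follow the load.

```python
import math


def desired_instances(current_load, target_load_per_instance, min_size=1, max_size=100):
    """Return the instance count that keeps per-instance load at or below the target."""
    if target_load_per_instance <= 0:
        raise ValueError("target_load_per_instance must be positive")
    needed = math.ceil(current_load / target_load_per_instance)
    # Clamp to the group's configured bounds (hypothetical defaults).
    return max(min_size, min(max_size, needed))


# Hypothetical load (requests/second) for one app: quiet early morning, busy prime time.
hourly_load = {"04:00": 1200, "12:00": 5200, "20:00": 19800}
plan = {
    hour: desired_instances(load, target_load_per_instance=500)
    for hour, load in hourly_load.items()
}
```

With these made-up numbers the group runs far fewer instances in the early morning than in prime time, instead of paying for the peak count around the clock.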
Every change requires a new cluster push (not an incremental change to existing systems)
Deploying must be easy (it is)
Netflix Deployment Pipeline
1. Perforce/Git: code change, config change
2. YUM: RPM file with app-specific bits
3. Bakery: base image + RPM
4. AMI: VM template ready to launch
5. ASG: cluster config, running systems
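Because every change ends in a new ASG rather than a change to running systems, each push needs a fresh cluster name. The sketch below assumes the three-digit version suffix seen in the Exploit Monkey message in this deck (testapp-live-v001 to testapp-live-v002). The function name and the error handling are hypothetical, not part of any Netflix tool.

```python
import re

# Assumed naming convention: <cluster>-vNNN, as in "testapp-live-v001".
_ASG_VERSION = re.compile(r"^(?P<cluster>.+)-v(?P<version>\d{3})$")


def next_asg_name(current_name):
    """Return the ASG name for the next cluster push, e.g. app-v001 -> app-v002."""
    match = _ASG_VERSION.match(current_name)
    if match is None:
        raise ValueError(f"not a versioned ASG name: {current_name!r}")
    version = int(match.group("version")) + 1
    return f"{match.group('cluster')}-v{version:03d}"


new_name = next_asg_name("testapp-live-v001")
```

Keeping the previous ASG around until the new one is healthy is what makes the “rollback” trivial: traffic simply goes back to the old group.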
Operational Impact

• No changes to running systems
• No systems management infrastructure
• Fewer logins to prod
• No snowflakes
• Trivial “rollback”
Security Impact
• Need to think differently on:
 • Vulnerability management
 • Patch management
 • User activity monitoring
 • File integrity monitoring
 • Forensic investigations
Org, architecture, and deployment are different. What about security?
We’ve adapted too.
Some principles we’ve found useful.
Integrate
Base AMI Security
•   AMI = Amazon Machine Image
•   @ Netflix, all apps are based on “Base AMI”, and new pushes pick up the latest
•   Concentrating testing and improvements here provides greatest impact
•   Average age of running instance: 24 days*
•   60% of instances less than 1 week old*

* Based on one-time sampling (yesterday)
Base AMI Testing
•   The base AMI is managed like other packages, via P4, Jenkins, etc.
•   We watch the base AMI’s SCM directory & kick off testing when it changes
•   Launch an instance of the AMI, perform vuln scan and other checks

    SCAN COMPLETED ALERT
    Site name: AMI1
    Stopped by: N/A
    Total Scan Time: 4 minutes 46 seconds
    Critical Vulnerabilities: 5
    Severe Vulnerabilities:   4
    Moderate Vulnerabilities: 4
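A scan alert like the one on this slide could feed an automated check on the base AMI. The sketch below is an assumption about how that might look, not the actual Netflix job: it parses only the three count lines shown in the alert, and the pass/fail threshold (`max_critical`) is a hypothetical policy.

```python
import re

# Alert text copied from the slide; used here as sample input.
SAMPLE_ALERT = """\
SCAN COMPLETED ALERT
Site name: AMI1
Stopped by: N/A
Total Scan Time: 4 minutes 46 seconds
Critical Vulnerabilities: 5
Severe Vulnerabilities:   4
Moderate Vulnerabilities: 4
"""

_COUNT_LINE = re.compile(
    r"^[ \t]*(Critical|Severe|Moderate) Vulnerabilities:[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)


def vulnerability_counts(alert_text):
    """Extract the per-severity counts from a scan alert."""
    return {sev.lower(): int(n) for sev, n in _COUNT_LINE.findall(alert_text)}


def base_ami_passes(alert_text, max_critical=0):
    """Hypothetical gate: fail the AMI when criticals exceed the allowed count."""
    counts = vulnerability_counts(alert_text)
    if "critical" not in counts:
        # Refuse to pass an alert we could not parse.
        raise ValueError("alert has no critical vulnerability count")
    return counts["critical"] <= max_critical
```

Raising on an unparseable alert, rather than treating it as a pass, keeps a broken scan from silently approving an image.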
Security Packaging


• All security tools use the same toolchain as
  the rest of engineering (P4/Git, Jenkins, etc.)
• From the RPM spec file of a webserver:
  Requires: ossec cloudpassage nflx-base-harden hyperguard-enforcer
• Pulls in the following RPMs:
   • Host hardening package
   • WAF agent
   • OSSEC (HIDS agent)
   • CloudPassage (config assessment, FW, etc.)
Static Analysis

• Available self-service through build
  environment (FindBugs, PMD)
• Jenkins (CI) plugin to display graphs and
  support drill through to results
MAN Integration
Many systems involved, standardization is important
Central Alerting Gateway
• A single place to generate alerts
• Python, Java libraries (or JSON POST) to easily alert on events of interest
• Ties in to PagerDuty notification system
• Allows for stateful alerting and some
  response
• A prerequisite that our tools will leverage
CAG Example

import CORE.Gateway

# Create a gateway client and send an alert through it
gw = CORE.Gateway.Gateway()

gw.send("testcluster", "normal", "Something went wrong")
Chronos

• Timeline system (API and UI) with Java/Python libraries, or JSON POST
• Track config changes, deployments, etc.
• Security tools also leverage for tracking and
  analysis
Chronos Security Examples
• What IP addresses have been blacklisted by the WAF in the last few weeks?
  GET /api/v1/event?timelines=type:blacklist&start=20121012000000000

• Which security groups have changed today?
  GET /api/v1/event?timelines=type:securitygroup&start=20121024000000000
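The two queries above share one shape, so a small helper can build them. This is a sketch only: the helper name is hypothetical, and the timestamp layout (YYYYMMDDHHMMSS plus three millisecond digits) is inferred from the 17-digit values on the slide, not from Chronos documentation.

```python
from datetime import datetime


def chronos_event_path(timeline_type, start):
    """Build the request path for events of one timeline type since `start`."""
    # Assumed layout: YYYYMMDDHHMMSS followed by three millisecond digits.
    stamp = start.strftime("%Y%m%d%H%M%S") + f"{start.microsecond // 1000:03d}"
    return f"/api/v1/event?timelines=type:{timeline_type}&start={stamp}"


blacklist_path = chronos_event_path("blacklist", datetime(2012, 10, 12))
secgroup_path = chronos_event_path("securitygroup", datetime(2012, 10, 24))
```

Under that assumption, the helper reproduces both request paths shown on the slide.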
Make the right way easy (and secure)
Cryptex
• Many uses of crypto in web/distributed systems:
 • Encrypt/decrypt (cookies, data, etc.)
 • Sign/verify (URLs, data, etc.)
• Known as an area where developers should not DIY
• Multi-layer crypto system (HSM basis, scale out layer)
 • Easy for developers to use
 • Key management handled transparently
 • Access control and auditable operations
ICipherContext cipherContext =
                CryptexClientFactory.getCipherContext(KeySet.testkey);
// encryption
String cipherText = cipherContext.encrypt("NETFLIX");
// decryption
String plainText = cipherContext.decrypt(cipherText);
Cloud SSO


• Authenticated access to dashboards, admin
  apps in the cloud is problematic
 • No datacenter access, no LDAP, AD
• Solution - leverage OneLogin SaaS SSO
  option (SAML) used by IT for enterprise
  apps
• Built filter that integrates with our platform
  web server to make SSO/authentication
  trivial
Trust, but verify
Culture of ‘freedom and responsibility’ precludes traditional centralized, command and control approach
Security Monkey
•   Cloud APIs make verification and analysis of configuration & running state simpler
•   Security Monkey created as the framework for this analysis
•   Includes:
    •   Cert checking
    •   Firewall analysis
    •   IAM entity analysis
    •   Limit warnings
Security Monkey

     From:  Security Monkey
     Date:  Wed, 24 Oct 2012 17:08:18 +0000
     To:  Security Alerts
     Subject:  prod Changes Detected

     Table of Contents:
         Security Groups
             Changed Security Group
                 <sgname> (eu-west-1 / prod)
Exploit Monkey
    • Autoscaling group is unit of deployment, so changes signal a good time to rerun dynamic scans
On 10/23/12 12:35 PM, Exploit Monkey wrote:

I noticed that testapp-live has changed current ASG name from testapp-live-v001 to testapp-live-v002.

I'm starting a vulnerability scan against test app from these private/public IPs:
10.29.24.174
ELB Checker (gauntlt)

•   AWS’ Elastic Load Balancer (ELB) provides cross-datacenter traffic balancing, but no security controls (if your cluster is attached to an ELB, it is available to the Internet)

•   Engineers may misunderstand use cases for ELBs, security features, and/or other measures that can be used to protect ELB-fronted clusters
Solution: gauntlt Testing
1. Launch gauntlt test runner
   instance, loaded with “master
   list” of ELBs and expected state

2. Determine “target list” of
   current ELBs to evaluate

3. Generate per-ELB listener
   gauntlt attack files

4. Execute attacks

5. Alert on failures and new ELBs

6. Triage findings and update ELB
   master list
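The comparison at the heart of steps 2 and 5 can be sketched in a few lines. This does not run gauntlt or generate attack files; it stands in for them with a plain comparison of listener ports. The function name, the data shape (ELB name mapped to a set of listener ports), and the sample ELBs are all hypothetical.

```python
def evaluate_elbs(master_list, current_elbs):
    """Compare live ELB listeners with the expected state.

    Both arguments map ELB name -> set of listener ports.
    Returns (new_elbs, failures), where failures maps an ELB name to the
    ports it exposes that the master list does not expect.
    """
    new_elbs = sorted(name for name in current_elbs if name not in master_list)
    failures = {}
    for name, ports in current_elbs.items():
        if name in master_list:
            unexpected = ports - master_list[name]
            if unexpected:
                failures[name] = sorted(unexpected)
    return new_elbs, failures


# Hypothetical master list and current state.
master = {"api-frontend": {80, 443}, "web-frontend": {443}}
current = {
    "api-frontend": {80, 443},
    "web-frontend": {443, 8080},
    "debug-elb": {7001},
}
new_elbs, failures = evaluate_elbs(master, current)
```

Both results would be alerted on (step 5); after triage, an accepted ELB or listener is added to the master list (step 6) so it stops alerting.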
Self-service, with exceptions
AWS Security Groups
•   Asgard cloud orchestration
    tool allows developers to
    configure their own firewall
    rules

•   Limited to same-account
    groups, no IP-based rules

•   Handles 95% of
    requirements, JIRAs for
    additional changes, and
    Security Monkey to keep an
    eye on things
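The self-service limits above (same-account groups, no IP-based rules) amount to a simple predicate. The sketch below is an illustration of that rule, not Asgard's code: the `IngressRule` type, its field names, and the account IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified view of one security group ingress rule."""
    port: int
    source_group: Optional[str] = None    # name of the source security group
    source_account: Optional[str] = None  # account that owns source_group
    source_cidr: Optional[str] = None     # IP-based source, e.g. "203.0.113.0/24"


def is_self_service(rule, own_account):
    """True if the rule fits the self-service limits; otherwise it needs a ticket."""
    if rule.source_cidr is not None:
        return False  # IP-based rules are not self-service
    return rule.source_group is not None and rule.source_account == own_account


same_account = IngressRule(7001, source_group="testapp", source_account="111111111111")
ip_based = IngressRule(443, source_cidr="203.0.113.0/24")
cross_account = IngressRule(7001, source_group="partner", source_account="222222222222")
```

Rules that fail the check correspond to the JIRA path on the slide rather than being rejected outright.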
Takeaways
• Netflix runs a large, dynamic service in AWS
• Good guidance + specific context can help
  jumpstart a pragmatic security program
• Newer concepts like cloud & DevOps need
  updated approach to security
• Don’t swim upstream - integrate and
  collaborate with your engineering partners
Netflix References

• http://netflix.github.com/
• http://techblog.netflix.com/
• http://slideshare.net/netflix
Other References
•   http://www.webpronews.com/netflix-outage-angers-customers-2008-08
•   http://www.pcmag.com/article2/0,2817,2395372,00.asp
•   http://www.readwriteweb.com/archives/etech_amazon_cto_aws.php
•   http://bsimm.com/online/
•   http://www.microsoft.com/en-us/download/confirmation.aspx?id=29884
•   http://www.gauntlt.org
Questions?
chan@netflix.com

Mais conteúdo relacionado

Mais procurados

Resilience and Security @ Scale: Lessons Learned
Resilience and Security @ Scale: Lessons LearnedResilience and Security @ Scale: Lessons Learned
Resilience and Security @ Scale: Lessons LearnedJason Chan
 
DevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to SecurityDevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to SecurityAlert Logic
 
Successfully Implementing DEV-SEC-OPS in the Cloud
Successfully Implementing DEV-SEC-OPS in the CloudSuccessfully Implementing DEV-SEC-OPS in the Cloud
Successfully Implementing DEV-SEC-OPS in the CloudAmazon Web Services
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesShiva Narayanaswamy
 
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...Amazon Web Services
 
Careers in Security
Careers in SecurityCareers in Security
Careers in SecurityJason Chan
 
Infrastructure as Code
Infrastructure as CodeInfrastructure as Code
Infrastructure as CodeRobert Greiner
 
Maturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOpsMaturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOpsAmazon Web Services
 
DevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft AzureDevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft Azuregjuljo
 
DevOps and AWS - Code PaLOUsa 2017
DevOps and AWS  - Code PaLOUsa 2017DevOps and AWS  - Code PaLOUsa 2017
DevOps and AWS - Code PaLOUsa 2017James Strong
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudDevOps.com
 
Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?Eberhard Wolff
 
Micro Service – The New Architecture Paradigm
Micro Service – The New Architecture ParadigmMicro Service – The New Architecture Paradigm
Micro Service – The New Architecture ParadigmEberhard Wolff
 

Mais procurados (20)

Resilience and Security @ Scale: Lessons Learned
Resilience and Security @ Scale: Lessons LearnedResilience and Security @ Scale: Lessons Learned
Resilience and Security @ Scale: Lessons Learned
 
DevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to SecurityDevSecOps: Taking a DevOps Approach to Security
DevSecOps: Taking a DevOps Approach to Security
 
Successfully Implementing DEV-SEC-OPS in the Cloud
Successfully Implementing DEV-SEC-OPS in the CloudSuccessfully Implementing DEV-SEC-OPS in the Cloud
Successfully Implementing DEV-SEC-OPS in the Cloud
 
DevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best PracticesDevOps, Common use cases, Architectures, Best Practices
DevOps, Common use cases, Architectures, Best Practices
 
Introduction to DevSecOps
Introduction to DevSecOpsIntroduction to DevSecOps
Introduction to DevSecOps
 
DevOps on AWS
DevOps on AWSDevOps on AWS
DevOps on AWS
 
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
AWS Summit 2013 | Singapore - Security & Compliance and Integrated Security w...
 
Careers in Security
Careers in SecurityCareers in Security
Careers in Security
 
Continuous Delivery
Continuous DeliveryContinuous Delivery
Continuous Delivery
 
State of Union - Containerz
State of Union - ContainerzState of Union - Containerz
State of Union - Containerz
 
DevOps on AWS
DevOps on AWSDevOps on AWS
DevOps on AWS
 
Infrastructure as Code
Infrastructure as CodeInfrastructure as Code
Infrastructure as Code
 
Maturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOpsMaturing your organization from DevOps to DevSecOps
Maturing your organization from DevOps to DevSecOps
 
DevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft AzureDevOps in the Cloud with Microsoft Azure
DevOps in the Cloud with Microsoft Azure
 
DevOps and AWS - Code PaLOUsa 2017
DevOps and AWS  - Code PaLOUsa 2017DevOps and AWS  - Code PaLOUsa 2017
DevOps and AWS - Code PaLOUsa 2017
 
Vulnerability Discovery in the Cloud
Vulnerability Discovery in the CloudVulnerability Discovery in the Cloud
Vulnerability Discovery in the Cloud
 
Azure Security Center
Azure Security CenterAzure Security Center
Azure Security Center
 
Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?
 
Micro Service – The New Architecture Paradigm
Micro Service – The New Architecture ParadigmMicro Service – The New Architecture Paradigm
Micro Service – The New Architecture Paradigm
 
Testing Framework on AWS Cloud - Solution Set
Testing Framework on AWS Cloud - Solution SetTesting Framework on AWS Cloud - Solution Set
Testing Framework on AWS Cloud - Solution Set
 

Destaque

Practical Security Automation
Practical Security AutomationPractical Security Automation
Practical Security AutomationJason Chan
 
Defending Netflix from Abuse
Defending Netflix from AbuseDefending Netflix from Abuse
Defending Netflix from AbuseJason Chan
 
Security at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at YahooSecurity at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at YahooAlex Stamos
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud SecurityJason Chan
 
Virtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit PerspectivesVirtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit PerspectivesJason Chan
 
Amazon Web Services Security
Amazon Web Services SecurityAmazon Web Services Security
Amazon Web Services SecurityJason Chan
 
AWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's PerspectiveAWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's PerspectiveJason Chan
 

Destaque (7)

Practical Security Automation
Practical Security AutomationPractical Security Automation
Practical Security Automation
 
Defending Netflix from Abuse
Defending Netflix from AbuseDefending Netflix from Abuse
Defending Netflix from Abuse
 
Security at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at YahooSecurity at Scale - Lessons from Six Months at Yahoo
Security at Scale - Lessons from Six Months at Yahoo
 
Practical Cloud Security
Practical Cloud SecurityPractical Cloud Security
Practical Cloud Security
 
Virtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit PerspectivesVirtualization: Security and IT Audit Perspectives
Virtualization: Security and IT Audit Perspectives
 
Amazon Web Services Security
Amazon Web Services SecurityAmazon Web Services Security
Amazon Web Services Security
 
AWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's PerspectiveAWS Security: A Practitioner's Perspective
AWS Security: A Practitioner's Perspective
 

Semelhante a Real World Cloud Application Security

Getting to Walk with DevOps
Getting to Walk with DevOpsGetting to Walk with DevOps
Getting to Walk with DevOpsEklove Mohan
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers SoftwareDevOps Chicago
 
Configuration Management in the Cloud | AWS Public Sector Summit 2017
Configuration Management in the Cloud | AWS Public Sector Summit 2017Configuration Management in the Cloud | AWS Public Sector Summit 2017
Configuration Management in the Cloud | AWS Public Sector Summit 2017Amazon Web Services
 
Infrastructure as Code for Network
Infrastructure as Code for NetworkInfrastructure as Code for Network
Infrastructure as Code for NetworkDamien Garros
 
Wellington MuleSoft Meetup 2021-02-18
Wellington MuleSoft Meetup 2021-02-18Wellington MuleSoft Meetup 2021-02-18
Wellington MuleSoft Meetup 2021-02-18Mary Joy Sabal
 
AEM OpenCloud Delivery Practices
AEM OpenCloud Delivery PracticesAEM OpenCloud Delivery Practices
AEM OpenCloud Delivery PracticesCliffano Subagio
 
Continuous Integration as a Way of Life
Continuous Integration as a Way of LifeContinuous Integration as a Way of Life
Continuous Integration as a Way of LifeMelissa Benua
 
The Rocky Cloud Road
The Rocky Cloud RoadThe Rocky Cloud Road
The Rocky Cloud RoadGert Drapers
 
IBM InterConnect 2015 - IIB in the Cloud
IBM InterConnect 2015 - IIB in the CloudIBM InterConnect 2015 - IIB in the Cloud
IBM InterConnect 2015 - IIB in the CloudAndrew Coleman
 
VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...
VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...
VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...VMworld
 
SRV312 DevOps on AWS: Building Systems to Deliver Faster
SRV312 DevOps on AWS: Building Systems to Deliver FasterSRV312 DevOps on AWS: Building Systems to Deliver Faster
SRV312 DevOps on AWS: Building Systems to Deliver FasterAmazon Web Services
 
API Tips & Tricks - Policy Management and Elastic Deployment
API Tips & Tricks - Policy Management and Elastic DeploymentAPI Tips & Tricks - Policy Management and Elastic Deployment
API Tips & Tricks - Policy Management and Elastic DeploymentAxway
 
Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...
Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...
Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...Amazon Web Services
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)Tibo Beijen
 
DevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San FranciscoDevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San FranciscoAmazon Web Services
 
Packaging tool options
Packaging tool optionsPackaging tool options
Packaging tool optionsLen Bass
 
Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...SaltStack
 
Calculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the CloudCalculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the CloudAcquia
 

Semelhante a Real World Cloud Application Security (20)

Getting to Walk with DevOps
Getting to Walk with DevOpsGetting to Walk with DevOps
Getting to Walk with DevOps
 
20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software20140708 - Jeremy Edberg: How Netflix Delivers Software
20140708 - Jeremy Edberg: How Netflix Delivers Software
 
Startup Showcase - QuizUp
Startup Showcase - QuizUpStartup Showcase - QuizUp
Startup Showcase - QuizUp
 
Configuration Management in the Cloud | AWS Public Sector Summit 2017
Configuration Management in the Cloud | AWS Public Sector Summit 2017Configuration Management in the Cloud | AWS Public Sector Summit 2017
Configuration Management in the Cloud | AWS Public Sector Summit 2017
 
Infrastructure as Code for Network
Infrastructure as Code for NetworkInfrastructure as Code for Network
Infrastructure as Code for Network
 
Devops architecture
Devops architectureDevops architecture
Devops architecture
 
Wellington MuleSoft Meetup 2021-02-18
Wellington MuleSoft Meetup 2021-02-18Wellington MuleSoft Meetup 2021-02-18
Wellington MuleSoft Meetup 2021-02-18
 
AEM OpenCloud Delivery Practices
AEM OpenCloud Delivery PracticesAEM OpenCloud Delivery Practices
AEM OpenCloud Delivery Practices
 
Continuous Integration as a Way of Life
Continuous Integration as a Way of LifeContinuous Integration as a Way of Life
Continuous Integration as a Way of Life
 
The Rocky Cloud Road
The Rocky Cloud RoadThe Rocky Cloud Road
The Rocky Cloud Road
 
IBM InterConnect 2015 - IIB in the Cloud
IBM InterConnect 2015 - IIB in the CloudIBM InterConnect 2015 - IIB in the Cloud
IBM InterConnect 2015 - IIB in the Cloud
 
VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...
VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...
VMworld Europe 2014: Taking Reporting and Command Line Automation to the Next...
 
SRV312 DevOps on AWS: Building Systems to Deliver Faster
SRV312 DevOps on AWS: Building Systems to Deliver FasterSRV312 DevOps on AWS: Building Systems to Deliver Faster
SRV312 DevOps on AWS: Building Systems to Deliver Faster
 
API Tips & Tricks - Policy Management and Elastic Deployment
API Tips & Tricks - Policy Management and Elastic DeploymentAPI Tips & Tricks - Policy Management and Elastic Deployment
API Tips & Tricks - Policy Management and Elastic Deployment
 
Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...
Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...
Remove Undifferentiated Heavy Lifting from Jenkins (DEV201-R1) - AWS re:Inven...
 
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)Kubernetes at NU.nl   (Kubernetes meetup 2019-09-05)
Kubernetes at NU.nl (Kubernetes meetup 2019-09-05)
 
DevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San FranciscoDevOps on AWS: DevOps Day San Francisco
DevOps on AWS: DevOps Day San Francisco
 
Packaging tool options
Packaging tool optionsPackaging tool options
Packaging tool options
 
Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...Spot Trading - A case study in continuous delivery for mission critical finan...
Spot Trading - A case study in continuous delivery for mission critical finan...
 
Calculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the CloudCalculating the Savings of Moving Your Drupal Site to the Cloud
Calculating the Savings of Moving Your Drupal Site to the Cloud
 

Real World Cloud Application Security

  • 1. Real World Cloud Application Security Lessons Learned Running Large Scale Systems in the Public Cloud Jason Chan - chan@netflix.com
  • 2. Netflix, Inc. “With more than 27 million streaming members in the United States, Canada, Latin America, the United Kingdom and Ireland, Netflix, Inc. is the world's leading internet subscription service for enjoying movies and TV programs . . .” Source: http://ir.netflix.com
  • 3. Me • Cloud Security Architect @ Netflix • Responsible for: • Cloud app, product, infrastructure, ops security • Previously: • Led security team @ VMware • Earlier, primarily security consulting at @stake, iSEC Partners
  • 6. Lots of Good Advice • BSIMM • Microsoft SDL • SAFECode
  • 7. But, what works? Forrester Consulting, 12/10
  • 8. Especially, given phenomena such as DevOps, cloud, agile, and the unique characteristics of an organization?
  • 10. Netflix in the Cloud - Why?
  • 11. Netflix in the Cloud - Why?
  • 12. Netflix in the Cloud - Why? “Undifferentiated heavy lifting”
  • 13. Netflix in the Cloud - Why? “Undifferentiated heavy lifting”
  • 14. Netflix in the Cloud - Why? “Undifferentiated heavy lifting”
  • 15. Netflix is now ~99% in Public Cloud
  • 16. On the way to the cloud . . .
  • 17. On the way to the cloud . . . (or NoOps, depending on definitions)
  • 18. Some As-Is #s • 27m+ subscribers • 10,000s of systems • 100s of engineers, apps • ~250 test deployments/day * • ~70 production deployments/day * * Sample based on this week’s activities
  • 19. Deploying Code @ Netflix
  • 20. A common graph @ Netflix
  • 21. A common graph @ Netflix Lots of watching in prime time
  • 22. A common graph @ Netflix Lots of watching in prime time Not as much in early morning
  • 23. A common graph @ Netflix Lots of watching in prime time Not as much in early morning Old way - pay and provision for peak, 24/7/365
  • 24. A common graph @ Netflix Lots of watching in prime time Not as much in early morning Old way - pay and provision for peak, 24/7/365 Multiply this pattern across the dozens of apps that comprise the Netflix streaming service
  • 25. Solution: Load-Based Autoscaling
  • 27. Autoscaling • Goals:
  • 28. Autoscaling • Goals: • # of systems matches load requirements
  • 29. Autoscaling • Goals: • # of systems matches load requirements • Load per server is constant
  • 30. Autoscaling • Goals: • # of systems matches load requirements • Load per server is constant • Happens without intervention (the ‘auto’ in autoscaling
  • 31. Autoscaling • Goals: • Results: • # of systems matches load requirements • Load per server is constant • Happens without intervention (the ‘auto’ in autoscaling
  • 32. Autoscaling • Goals: • Results: • # of systems matches • Continuously load requirements adding/removing nodes • Load per server is constant • Happens without intervention (the ‘auto’ in autoscaling
  • 33. Autoscaling
    • Goals:
      • # of systems matches load requirements
      • Load per server is constant
      • Happens without intervention (the ‘auto’ in autoscaling)
    • Results:
      • Continuously adding/removing nodes
      • New nodes must mirror existing
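The first two goals (system count matches load, load per server stays constant) amount to a simple sizing rule. The following is a minimal sketch of that rule, not Netflix's autoscaling policy; the function name, the 500 requests/second target, and the hourly figures are all hypothetical.

```python
import math

def desired_instances(total_load: float, target_load_per_server: float,
                      min_size: int = 1, max_size: int = 1000) -> int:
    """Return the node count that keeps per-server load near the target."""
    if target_load_per_server <= 0:
        raise ValueError("target_load_per_server must be positive")
    needed = math.ceil(total_load / target_load_per_server)
    # Clamp to the group's (hypothetical) size limits.
    return max(min_size, min(max_size, needed))

# Hypothetical request rates (requests/second) for one app over a day.
hourly_load = {"04:00": 1200, "12:00": 5400, "20:00": 19800}
plan = {hour: desired_instances(rps, target_load_per_server=500)
        for hour, rps in hourly_load.items()}
print(plan)
```

Provisioning for peak around the clock would pin the group at the prime-time size all day; sizing by load lets the early-morning count drop to a fraction of that.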
  • 34. Every change requires a new cluster push (not an incremental change to existing systems)
  • 35. Deploying must be easy (it is)
  • 42. Netflix Deployment Pipeline
    • Perforce/Git: code change, config change
    • YUM: RPM file with app-specific bits
    • Bakery: base image + RPM
    • AMI: VM template ready to launch
    • ASG: cluster config, running systems
  • 43. Operational Impact • No changes to running systems • No systems management infrastructure • Fewer logins to prod • No snowflakes • Trivial “rollback”
  • 44. Security Impact • Need to think differently on: • Vulnerability management • Patch management • User activity monitoring • File integrity monitoring • Forensic investigations
  • 45. Org, architecture, deployment is different, what about security?
  • 46. We’ve adapted too. Some principles we’ve found useful.
  • 48. Base AMI Security
    • AMI = Amazon Machine Image
    • @ Netflix, all apps are based on “Base AMI”, and new pushes pick up the latest
    • Concentrating testing and improvements here provides greatest impact
    • Average age of running instance: 24 days*
    • 60% of instances less than 1 week old*
    * Based on one-time sampling (yesterday)
  • 51. Base AMI Testing
    • The base AMI is managed like other packages, via P4, Jenkins, etc.
    • We watch the base AMI’s SCM directory & kick off testing when it changes
    • Launch an instance of the AMI, perform vuln scan and other checks
    • Scan alert shown on the slide:
        SCAN COMPLETED ALERT
        Site name: AMI1
        Stopped by: N/A
        Total Scan Time: 4 minutes 46 seconds
        Critical Vulnerabilities: 5
        Severe Vulnerabilities: 4
        Moderate Vulnerabilities: 4
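A scan alert like the one on this slide can feed an automated pass/fail gate in the build. The sketch below parses the severity counts from the alert text and compares them with thresholds; the threshold values and function names are assumptions for illustration, and the deck does not say how Netflix acts on these counts.

```python
import re

ALERT = """\
SCAN COMPLETED ALERT
Site name: AMI1
Critical Vulnerabilities: 5
Severe Vulnerabilities: 4
Moderate Vulnerabilities: 4
"""

def parse_counts(alert_text: str) -> dict:
    """Extract '<Severity> Vulnerabilities: <n>' lines into a dict."""
    pattern = re.compile(r"^(\w+) Vulnerabilities:\s*(\d+)", re.MULTILINE)
    return {sev.lower(): int(n) for sev, n in pattern.findall(alert_text)}

def passes_gate(counts: dict, limits: dict) -> bool:
    """True when every severity count is within its (hypothetical) limit."""
    return all(counts.get(sev, 0) <= limit for sev, limit in limits.items())

counts = parse_counts(ALERT)
print(counts, passes_gate(counts, {"critical": 0, "severe": 0}))
```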
  • 52. Security Packaging • All security tools use the same toolchain as the rest of engineering (P4/Git, Jenkins, etc.)
  • 53. • From the RPM spec file of a webserver: Requires: ossec cloudpassage nflx-base-harden hyperguard-enforcer
  • 54. • Pulls in the following RPMs: • Host hardening package • WAF agent • OSSEC (HIDS agent) • CloudPassage (config assessment, FW, etc.)
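Because the security agents arrive as ordinary RPM dependencies, a build-time check can confirm that a spec file still requires them. This is a rough sketch under that assumption: the package names come from the slide, while the check itself and its function names are hypothetical.

```python
REQUIRED_SECURITY_PACKAGES = {"ossec", "cloudpassage", "nflx-base-harden",
                              "hyperguard-enforcer"}

def declared_requires(spec_text: str) -> set:
    """Collect package names from every 'Requires:' line of an RPM spec."""
    names = set()
    for line in spec_text.splitlines():
        if line.strip().lower().startswith("requires:"):
            deps = line.split(":", 1)[1]
            # Dependencies may be separated by spaces and/or commas.
            names.update(deps.replace(",", " ").split())
    return names

def missing_security_packages(spec_text: str) -> set:
    """Return the required security packages the spec does not list."""
    return REQUIRED_SECURITY_PACKAGES - declared_requires(spec_text)

spec = "Name: webserver\nRequires: ossec cloudpassage nflx-base-harden\n"
missing = missing_security_packages(spec)
print(missing)
```

Version constraints such as `foo >= 1.0` would add extra tokens to the parsed set, which is harmless for this membership test.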
  • 55. Static Analysis • Available self-service through build environment (FindBugs, PMD) • Jenkins (CI) plugin to display graphs and support drill through to results
  • 58. Many systems involved, standardization is important
  • 59. Central Alerting Gateway • A single place to generate alerts • Python, Java libraries (or json post) to easily alert on events of interest • Ties in to PagerDuty notification system • Allows for stateful alerting and some response • A prerequisite that our tools will leverage
  • 60. CAG Example
        import CORE.Gateway
        gw = CORE.Gateway.Gateway()
        gw.send("testcluster", "normal", "Something went wrong")
  • 61. Chronos • Timeline system (API and UI) with Java/ Python libraries, or json post • Track config changes, deployments, etc. • Security tools also leverage for tracking and analysis
  • 62. Chronos Security Examples
    • What IP addresses have been blacklisted by the WAF in the last few weeks?
        GET /api/v1/event?timelines=type:blacklist&start=20121012000000000
    • Which security groups have changed today?
        GET /api/v1/event?timelines=type:securitygroup&start=20121024000000000
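The two queries on this slide differ only in event type and start time, so they are easy to generate. The sketch below builds the same request paths; it assumes the 17-digit `start` value is `YYYYMMDDHHMMSS` followed by milliseconds, which is inferred from the examples rather than documented in the deck, and the function name is hypothetical.

```python
from datetime import datetime
from urllib.parse import urlencode

def chronos_event_query(event_type: str, start: datetime) -> str:
    """Build the request path for a Chronos-style event query."""
    # Assumed format: 14 date/time digits plus 3 millisecond digits.
    stamp = start.strftime("%Y%m%d%H%M%S") + f"{start.microsecond // 1000:03d}"
    params = {"timelines": f"type:{event_type}", "start": stamp}
    # safe=":" keeps the "type:blacklist" separator unescaped.
    return "/api/v1/event?" + urlencode(params, safe=":")

blacklist_path = chronos_event_query("blacklist", datetime(2012, 10, 12))
print(blacklist_path)
```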
  • 63. Make the right way easy (and secure)
  • 64. Cryptex • Many uses of crypto in web/distributed systems: • Encrypt/decrypt (cookies, data, etc.) • Sign/verify (URLs, data, etc.) • Known as an area where developers should not DIY
  • 65.
    • Multi-layer crypto system (HSM basis, scale out layer)
    • Easy for developers to use
    • Key management handled transparently
    • Access control and auditable operations
        ICipherContext cipherContext = CryptexClientFactory.getCipherContext(KeySet.testkey);
        // encryption
        String cipherText = cipherContext.encrypt("NETFLIX");
        // decryption
        String plainText = cipherContext.decrypt(cipherText);
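For the sign/verify use case (URLs, data), the shape of such a developer-facing API can be illustrated with Python's standard `hmac` module. This is not Cryptex and says nothing about how Cryptex is implemented; the hard-coded key and the sample URL are placeholders, whereas a service like Cryptex would manage keys on the caller's behalf.

```python
import hashlib
import hmac

def sign(key: bytes, message: str) -> str:
    """Return a hex HMAC-SHA256 signature for the message."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(key: bytes, message: str, signature: str) -> bool:
    """Check a signature in constant time."""
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(sign(key, message), signature)

key = b"demo-key-not-for-production"  # placeholder key for the example only
url = "/watch?movie=123&expires=1351036800"  # hypothetical URL
sig = sign(key, url)
print(verify(key, url, sig), verify(key, url + "0", sig))
```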
  • 66. Cloud SSO • Authenticated access to dashboards, admin apps in the cloud is problematic • No datacenter access, no LDAP, AD
  • 67. Cloud SSO • Solution - leverage OneLogin SaaS SSO option (SAML) used by IT for enterprise apps • Built filter that integrates with our platform web server to make SSO/authentication trivial
  • 69. Culture of ‘freedom and responsibility’ precludes traditional centralized, command and control approach
  • 70. Security Monkey
    • Cloud APIs make verification and analysis of configuration & running state simpler
    • Security Monkey created as the framework for this analysis
    • Includes:
      • Cert checking
      • Firewall analysis
      • IAM entity analysis
      • Limit warnings
  • 71. Security Monkey
        From: Security Monkey
        Date: Wed, 24 Oct 2012 17:08:18 +0000
        To: Security Alerts
        Subject: prod Changes Detected
        Table of Contents:
          Security Groups
            Changed Security Group
              <sgname> (eu-west-1 / prod)
              <#Security Group/<sgname> (eu-west-1 / prod)>
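A "Changes Detected" report like this one can come from comparing two snapshots of security group configuration. The sketch below shows one way to do that comparison; the snapshot layout (group name mapped to a set of rule tuples) and all names are assumptions, not Security Monkey's actual data model.

```python
def diff_security_groups(before: dict, after: dict) -> dict:
    """Compare {group_name: set_of_rule_tuples} snapshots."""
    report = {"added": sorted(after.keys() - before.keys()),
              "removed": sorted(before.keys() - after.keys()),
              "changed": {}}
    for name in sorted(before.keys() & after.keys()):
        if before[name] != after[name]:
            report["changed"][name] = {
                "rules_added": sorted(after[name] - before[name]),
                "rules_removed": sorted(before[name] - after[name]),
            }
    return report

# Hypothetical snapshots; each rule is (protocol, port, source group).
before = {"web": {("tcp", 80, "sg-elb")}, "db": {("tcp", 3306, "sg-web")}}
after = {"web": {("tcp", 80, "sg-elb"), ("tcp", 22, "sg-bastion")},
         "db": {("tcp", 3306, "sg-web")},
         "cache": set()}
report = diff_security_groups(before, after)
print(report)
```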
  • 72. Exploit Monkey
    • Autoscaling group is unit of deployment, so changes signal a good time to rerun dynamic scans
        On 10/23/12 12:35 PM, Exploit Monkey wrote:
        I noticed that testapp-live has changed current ASG name from testapp-live-v001 to testapp-live-v002. I'm starting a vulnerability scan against test app from these private/public IPs: 10.29.24.174
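The trigger described here reduces to spotting an app whose active ASG name has changed since the last check. A minimal sketch of that detection follows; the app-to-ASG mapping and function names are hypothetical, and the scan itself is left out.

```python
def changed_asgs(previous: dict, current: dict) -> list:
    """Return (app, old_asg, new_asg) for each app whose active ASG changed."""
    return [(app, previous[app], asg) for app, asg in sorted(current.items())
            if app in previous and previous[app] != asg]

def scan_notice(app: str, old_asg: str, new_asg: str, ips: list) -> str:
    """Format a notice similar to the one on the slide."""
    return (f"{app} changed current ASG from {old_asg} to {new_asg}; "
            f"starting scan against {', '.join(ips)}")

previous = {"testapp-live": "testapp-live-v001", "otherapp": "otherapp-v010"}
current = {"testapp-live": "testapp-live-v002", "otherapp": "otherapp-v010"}
changes = changed_asgs(previous, current)
for app, old, new in changes:
    print(scan_notice(app, old, new, ["10.29.24.174"]))
```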
  • 73. ELB Checker (gauntlt) • AWS’ Elastic Load Balancer (ELB) provides cross-datacenter traffic balancing, but no security controls (if your cluster is attached to an ELB, it is available to the Internet) • Engineers may misunderstand use cases for ELBs, security features, and/or other measures that can be used to protect ELB-fronted clusters
  • 74. Solution: gauntlt Testing
    1. Launch gauntlt test runner instance, loaded with “master list” of ELBs and expected state
    2. Determine “target list” of current ELBs to evaluate
    3. Generate per-ELB listener gauntlt attack files
    4. Execute attacks
    5. Alert on failures and new ELBs
    6. Triage findings and update ELB master list
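Steps 2 and 5 (compare current ELBs against the master list, then alert on failures and new ELBs) can be sketched as a set comparison. This sketch does not use gauntlt and does not generate attack files; the data layout (ELB name mapped to a set of open listener ports) and all names are assumptions.

```python
def evaluate_elbs(master: dict, current: dict):
    """Compare observed ELB listeners with the expected master list.

    Both arguments map ELB name -> set of listener ports.
    Returns (new_elbs, failures), where failures maps an ELB name to
    the ports observed open that the master list does not expect.
    """
    new_elbs = sorted(current.keys() - master.keys())
    failures = {name: sorted(current[name] - master[name])
                for name in sorted(current.keys() & master.keys())
                if current[name] - master[name]}
    return new_elbs, failures

# Hypothetical master list and observed state.
master = {"api-elb": {443}, "web-elb": {80, 443}}
current = {"api-elb": {443, 8080}, "web-elb": {80, 443}, "tmp-elb": {22}}
new_elbs, failures = evaluate_elbs(master, current)
print(new_elbs, failures)
```

New ELBs go to triage (step 6) rather than being treated as failures, since they may simply be missing from the master list.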
  • 75. Self-service, with exceptions
  • 76. AWS Security Groups • Asgard cloud orchestration tool allows developers to configure their own firewall rules • Limited to same-account groups, no IP-based rules • Handles 95% of requirements, JIRAs for additional changes, and Security Monkey to keep an eye on things
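The self-service limits on this slide (same-account groups only, no IP-based rules) can be expressed as a small policy check that routes everything else to the exception path. The rule dictionary layout, account IDs, and function name below are hypothetical; the deck does not describe how Asgard enforces these limits.

```python
import re

SG_ID = re.compile(r"^sg-[0-9a-f]+$")

def needs_exception(rule: dict, own_account: str) -> bool:
    """True when a rule falls outside the self-service policy."""
    source = rule.get("source", "")
    if not SG_ID.match(source):
        return True  # not a security group ID, e.g. an IP/CIDR source
    return rule.get("source_account") != own_account

# Hypothetical rule requests.
rules = [
    {"source": "sg-0a1b2c3d", "source_account": "111111111111", "port": 7001},
    {"source": "10.0.0.0/8", "port": 22},
    {"source": "sg-deadbeef", "source_account": "222222222222", "port": 443},
]
flagged = [r["port"] for r in rules if needs_exception(r, "111111111111")]
print(flagged)
```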
  • 77. Takeaways • Netflix runs a large, dynamic service in AWS • Good guidance + specific context can help jumpstart a pragmatic security program • Newer concepts like cloud & DevOps need updated approach to security • Don’t swim upstream - integrate and collaborate with your engineering partners
  • 78. Netflix References • http://netflix.github.com/ • http://techblog.netflix.com/ • http://slideshare.net/netflix
  • 79. Other References • http://www.webpronews.com/netflix-outage-angers-customers-2008-08 • http://www.pcmag.com/article2/0,2817,2395372,00.asp • http://www.readwriteweb.com/archives/etech_amazon_cto_aws.php • http://bsimm.com/online/ • http://www.microsoft.com/en-us/download/confirmation.aspx?id=29884 • http://www.gauntlt.org
