Security in DevOps world - Evolving frameworks. Cluster Hardening best practices. Automation pipelines for managing infrastructure and PaaS. Continuous Security and DevOps Maturity Model.
15. There is only “DevOps”!
DevSecOps is redundant and unnecessary confusion!
We should build a culture where Sec is everyone’s responsibility and part of every job role!
18. Karun Chennuri
• Sr. Engineer, Security Architecture at T-Mobile
• 14 yrs. of experience in IT that includes 12 yrs. in PaaS, Cloud Security,
Information Security, Security Solution Architecture
• Speaker at Spring One and CF Summit
• CISSP, CISM, ISO, CEH
• Multiple Open Source contributions (~ 5 major projects)
• https://www.linkedin.com/in/karunchennuri/
• @karunchennuri
21. Why DevOps?
Company | Deploy Frequency | Deploy Lead Time | Reliability | Customer Responsiveness
Amazon | 23k/day | Minutes | High | High
Google | 5.5k/day | Minutes | High | High
Netflix | 500/day | Minutes | High | High
Facebook | 1/day | Hours | High | High
Twitter | 3/week | Hours | High | High
Typical enterprise | Once every 9 months | Months or quarters | Low/medium | Low/medium
Big 5 vs. non-high-performing orgs
Agility metrics:
• 30x more frequent code deployments
• 8000x faster code deployment lead time
Reliability metrics:
• 2x the change success rate
• 12x faster MTTR
Market share and profitability
22. DevOps Salient Features
• Fail fast and Fail safe
• Faster feedback loops
• Inject pressure proactively
• Value non-functional requirements
• No place for Low trust model
• Peer review
• Decision driven (Measure)
• Routine CI/CD
• Don’t build when no ask!
Thousands of containers
100+ project teams
Millions of transactions/day
SECURITY
28. Evolution of Application Patterns
Application software (app for short) is computer software designed to perform a group of coordinated functions, tasks, or activities for the benefit of the user.
Monolith: Single unit
SOA: Coarse grained
Microservices: Fine grained
30. Dev{Sec}Ops Maturity Model
Metric | Description
Deployment frequency | Number of deployments to production in a given time frame
Deployment lead time (for apps) | Time between code commit and prod deployment of that code
Change volume (for apps) | Number of user stories deployed in a given time frame
Change failure rate | % of production deployments that failed
Mean time to recovery, MTTR (for apps) | Time between a failed prod deployment and full restoration of prod operations
Availability | Amount of uptime/downtime in a given time period (SLA)
Customer issue volume | Number of issues reported by customers in a given time period
Customer issue resolution time | Mean time to resolve a customer-reported issue
Time to value | Time between a feature request (user story creation) and realization of business value from that feature
Time to ATO (Authority to Operate) | Time between the beginning of Sprint 0 and achieving an ATO
Time to patch vulnerabilities | Time between identification of a vulnerability and successful patch rollout on prod
Source: GSA
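To make a few of these metrics concrete, here is a minimal Python sketch that computes deployment frequency, deployment lead time, change failure rate and MTTR from a list of deployment records. The `Deployment` record and its field names are assumptions made up for illustration; they are not part of the GSA model or any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment record (hypothetical schema)."""
    committed_at: datetime                  # when the code was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did the deployment fail in prod?
    restored_at: Optional[datetime] = None  # when prod was fully restored


def deployment_frequency(deploys: List[Deployment], days: int) -> float:
    """Deployments to production per day over a window of `days` days."""
    return len(deploys) / days


def mean_lead_time(deploys: List[Deployment]) -> timedelta:
    """Mean time between code commit and prod deployment of that code."""
    total = sum((d.deployed_at - d.committed_at for d in deploys), timedelta())
    return total / len(deploys)


def change_failure_rate(deploys: List[Deployment]) -> float:
    """Percentage of production deployments that failed."""
    return 100.0 * sum(d.failed for d in deploys) / len(deploys)


def mean_time_to_recovery(deploys: List[Deployment]) -> Optional[timedelta]:
    """Mean time from a failed deployment to full restoration (None if no failures)."""
    outages = [d.restored_at - d.deployed_at
               for d in deploys if d.failed and d.restored_at is not None]
    if not outages:
        return None
    return sum(outages, timedelta()) / len(outages)
```

In practice the records would come from the CI/CD system and the incident tracker; the point is only that each metric reduces to simple arithmetic once the timestamps are captured.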
40. Powered by T-Mobile
OPA as a sidecar
policy definition
Kubernetes plugin
Customizable (Slack alerts)
TKE-V is a tool to enforce Policy-as-Code for Kubernetes
Deny Level | Severities Blocked
OFF | None
LOW | HIGH
MED | HIGH, MED
HIGH | HIGH, MED, LOW
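The deny-level table can be read as a simple lookup. The sketch below is an illustration in Python, not the tool's actual implementation; the function name `is_denied` and the data layout are assumptions.

```python
# Severities blocked at each deny level, copied from the table above.
BLOCKED_BY_DENY_LEVEL = {
    "OFF": set(),
    "LOW": {"HIGH"},
    "MED": {"HIGH", "MED"},
    "HIGH": {"HIGH", "MED", "LOW"},
}


def is_denied(deny_level: str, failed_severities: list) -> bool:
    """Return True if any failed policy has a severity blocked at this deny level."""
    blocked = BLOCKED_BY_DENY_LEVEL[deny_level.upper()]
    return any(severity.upper() in blocked for severity in failed_severities)
```

For example, a request that fails only a MED policy passes at deny level LOW but is blocked at MED, which is how the level can be turned up gradually.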
44. DevOps Myths
• DevOps replaces Agile
• DevOps means NoOps
• DevOps is only for open source
• DevOps is just IaaS or automation
• DevOps is only for startups!
Take an organization… take a team in it… talk to a few folks on that team and pose this question: “What does it feel like to live in a security-enabled DevOps world?”
Here is what their reaction may be… in fact, it may be worse…
On the other hand, in my humble opinion, this is how a sec team member might react:
And this is usually the security person’s reaction!
That might be a slightly harsh reaction.
Some may hate me for what I’m going to say in the next few slides!
https://www.wikihow.com/Console-an-Upset-Friend
One fine morning, Dev meets Ops and says, “Ops, my friend, the world will call us DevOps from here on…” Ops feels excited about it.
On the other hand, Sec hears this and is totally upset…
Chronic conflict between Dev and Ops preordains failure for the entire IT org, as well as the enterprise. High-performing orgs such as Amazon, Google, Twitter, Etsy and Netflix are adopting a set of techniques that we now call DevOps. They routinely and casually deploy hundreds or even thousands of production changes per day while preserving world-class reliability, stability and security. They can quickly deploy changes into production, with a code deployment lead time measured in minutes or hours. This enables them to innovate and out-experiment their competition in the marketplace, with higher quality and better customer outcomes.
DevOps leads to faster feature time to market and increased customer satisfaction, market share and employee productivity, and it allows orgs to win in the marketplace. In contrast to orgs that take weeks or months to deliver a feature, DevOps shortens the lead time to a few days.
In 2009, 10 deploys per day was considered fast. Now this is considered merely average. In 2012, Amazon went on record stating that they were doing, on average, 23,000 deploys per day.
The Big 5 are deploying code 30 times more frequently, and the time required to go from “code committed” to “successfully running in prod” is 8000 times faster. High performers have lead times measured in minutes or hours, while lower performers take weeks, months or quarters.
When high performers deployed changes and code, those changes were twice as likely to complete successfully (i.e. without causing a production outage or service impairment), and when a change failed and resulted in an incident, the incident was resolved 12 times faster.
This explains how high performers provide world-class levels of reliability, stability and security, enabling them to out-experiment their competitors in the marketplace. Overall speed of delivery has resulted in exceeding profitability, market share and productivity goals.
“Fail Fast and Fail Safe” is an important slogan that all high-performing orgs adopt. They believe in faster feedback loops to prevent problematic code from going to prod. Even if it gets there, the issues are quickly detected and corrected!
Everyone in the value stream shares a culture that not only values each other’s time and contributions but also relentlessly injects pressure into the system of work to enable organizational learning and improvement.
Everyone in the team values non-functional requirements (quality,… operability). Why? Because nonfunctional requirements are just as important in achieving business objectives.
There is no place for a low-trust model, e.g. approval and compliance processes or a command-and-control management culture. Instead, the DevOps world relies on peer review so that everyone has confidence in the quality of the deliverable.
Everyone needs to be a scientist, taking no assumptions for granted and doing nothing without measuring. DevOps teams don’t spend months or years building features that customers don’t actually want, deploying code that doesn’t work, or fixing something that isn’t actually a problem.
CI/CD happens seamlessly in the middle of the day, during business hours. Deployments should become routine, stress-free jobs.
SOA has been standard dev practice for nearly two decades. It is granular, but not fine-grained enough! In particular, it is not resource-efficient when working with cloud computing, it limits feature request changes, and it does not scale easily.
Microservices advantages:
Easily deployable
Less dev time
Scale individually
Reusable in different projects
Better fault isolation
Work well with containers
Disadvantages:
Potentially too much granularity
Extra effort designing for communication between services
Latency during heavy use
Complex testing
Every maturity model is measurable on a scale of 5 levels.
https://www.devsecops.org/blog/2016/5/20/-security
o Securing software through “measurement”. Like all other -ilities, security, too, must become a measurable capability in the art of deployed software.
o Hooking up security scanners to the CI/CD pipeline isn’t enough on its own; you need more.
o Automation is key to solving major security issues: embrace CI/CD.
o Steps to DevSecOps: Identify & eliminate Security gates, Training barriers, Communication barriers, Compile/track known weaknesses, Security curation (reduce false positives), Continuous monitoring etc.
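Kata Containers: Runs each container in a lightweight VM with a stripped-down guest kernel
gVisor: Intercepts system calls by acting as a guest kernel in user space
Nabla: Limits system calls using unikernel techniques and blocks the rest with seccomp
Firecracker: Lightweight microVM monitor (built on KVM) meant for running in non-virtualized environments
Grafeas: Metadata API for storing and querying container image scan results and other artifact metadata
Kritis: Admission controller that verifies container image signatures before deployment
Podman: Daemonless container engine that can run containers rootless; shares its underlying libraries with CRI-O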
Kube-bench: Check your cluster against 100+ tests of the CIS Kubernetes Benchmark so you can harden it according to the best practices
Kube-hunter: Penetration testing tool that “attacks” your cluster and nodes, looking for configuration issues
Envoy can be used for “systematic fault injection”, which points us toward using this architecture for chaos engineering experiments at the app level. “Systematic” here could mean introducing a 400 ms timeout on service calls, running circuit breaker tests, etc.
Envoy can add fault tolerance/resiliency without any changes to the code.
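To show the idea behind these two experiments, here is a small in-process Python sketch. It is not Envoy configuration and uses no Envoy API; Envoy applies such faults at the proxy layer, outside the app. The class names and the choice to model the 400 ms case as an injected delay are assumptions for illustration.

```python
import random
import time
from typing import Callable, Optional


class FaultInjector:
    """Wraps a service call and delays a fixed percentage of requests."""

    def __init__(self, delay_seconds: float, percentage: float,
                 rng: Optional[random.Random] = None,
                 sleep: Callable[[float], None] = time.sleep):
        self.delay_seconds = delay_seconds
        self.percentage = percentage       # 0-100, share of calls to delay
        self.rng = rng or random.Random()
        self.sleep = sleep                 # injectable so tests need not wait
        self.injected = 0                  # number of calls that were delayed

    def call(self, service: Callable[[], str]) -> str:
        # random() is in [0, 1), so 100 always injects and 0 never does.
        if self.rng.random() * 100 < self.percentage:
            self.injected += 1
            self.sleep(self.delay_seconds)
        return service()


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors, then fails fast."""

    def __init__(self, max_failures: int):
        self.max_failures = max_failures
        self.failures = 0                  # consecutive failures seen so far

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, service: Callable[[], str]) -> str:
        if self.is_open:
            raise RuntimeError("circuit open: failing fast")
        try:
            result = service()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0                  # a success resets the count
        return result
```

A chaos experiment would wrap a downstream call with `FaultInjector(0.4, 10)` to delay roughly 10% of calls by 400 ms and then observe whether callers degrade gracefully. This sketch omits the half-open state and reset timer that a production circuit breaker would have.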
Talking points I usually cover:
- Use of OPA allows for generic policy definition/enforcement. It can be used in a shift-left model by calling OPA within CI/CD (we're not doing this yet, but I'm thinking of enabling this).
- Deny Level allows a low barrier to entry with minimal disturbance and can be turned up as the dev organization matures. It also lets you set policy severity to fit your business needs (not everyone views policies the same way/weight).
- The alerts provide instant feedback to users, and the use of "TKE" codes makes it easy for customers to locate info and remediation steps for specific policy failures.
- The model allows for alerts to the platform team, and to development teams as well (customized by annotating namespaces with a Slack Incoming Webhook URL).
- The admission controller model works regardless of deployment tooling (kubectl, helm, client libraries, direct API, etc.).
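As a rough sketch of the shift-left idea, a CI/CD step could send a manifest to a running OPA server through OPA's Data API (`POST /v1/data/<path>` with an `{"input": ...}` body) and fail the build on any violation. This is only an illustration of the approach, not something the talk says is in place. The policy path and the assumption that the rule evaluates to a list of violation messages are hypothetical.

```python
import json
import urllib.request
from typing import Any


def build_opa_request(opa_url: str, policy_path: str,
                      input_doc: Any) -> urllib.request.Request:
    """Build a POST to OPA's Data API: <opa_url>/v1/data/<policy_path>."""
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url=f"{opa_url.rstrip('/')}/v1/data/{policy_path.strip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_violations(response_body: bytes) -> list:
    """Extract the rule result; OPA omits "result" when the rule is undefined."""
    return list(json.loads(response_body).get("result", []))


def check_manifest(opa_url: str, policy_path: str, manifest: dict,
                   opener=urllib.request.urlopen) -> list:
    """Send the manifest to OPA and return the violations it reports."""
    request = build_opa_request(opa_url, policy_path, manifest)
    with opener(request) as response:
        return parse_violations(response.read())
```

A pipeline step might call `check_manifest("http://localhost:8181", "kubernetes/admission/deny", manifest)` and exit non-zero if the returned list is not empty; both the URL and the policy path here are placeholders.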
Liveness Probe/Readiness Probe
Resource Requests/Limits
PDB Configuration
PVC Reclaim Policy
NodePort Whitelist
Privileged Pods
HostPath
HostNet
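Several of the policies listed above come down to checking fields of a Pod manifest. The Python sketch below illustrates that idea for probes, resource requests/limits, privileged containers, hostPath volumes and host networking, using standard Kubernetes Pod spec field names. It is not the tool's actual policy code (which is defined in OPA), and the function name and messages are assumptions. PDB, PVC reclaim policy and NodePort checks involve other resource types and are left out.

```python
def find_violations(pod: dict) -> list:
    """Return human-readable policy violations for a Pod manifest given as a dict."""
    violations = []
    spec = pod.get("spec", {})

    # HostNet: the pod shares the node's network namespace.
    if spec.get("hostNetwork"):
        violations.append("pod uses hostNetwork")

    # HostPath: a volume mounts a path from the node's filesystem.
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            violations.append(f"volume {volume.get('name')} uses hostPath")

    for container in spec.get("containers", []):
        name = container.get("name")
        # Liveness Probe / Readiness Probe must be defined.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                violations.append(f"container {name} has no {probe}")
        # Resource Requests / Limits must be set.
        resources = container.get("resources", {})
        for kind in ("requests", "limits"):
            if not resources.get(kind):
                violations.append(f"container {name} has no resource {kind}")
        # Privileged Pods: the container runs in privileged mode.
        if container.get("securityContext", {}).get("privileged"):
            violations.append(f"container {name} is privileged")

    return violations
```

Each message could then be given a severity and a code so that the deny level decides whether the request is blocked or only reported.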
DevOps is the logical continuation of the Agile journey. “Done” in DevOps means code complete, fully tested and operating in production.
There is no need for the Dev team to open a ticket for IT Ops to complete the work; many of these activities are automated so that devs can do them themselves. Having said that, IT Ops is not completely eliminated.
DevOps is universal and applicable everywhere
DevOps is not just automation; it also covers the non-functional requirements we spoke about earlier.
Nope, it is also for established enterprises, organized into two-pizza-sized teams.
I asked the same question again to a group of (non-security) engineers.
This is their response…
The team pointed at the Security team, saying… who is the most awesome person today?