This document discusses practical FinOps strategies for cloud cost optimization. It outlines key stakeholders in FinOps like engineers, product owners, executives, and procurement. It then details common FinOps processes like informing teams through data and transparency, optimizing resources through rightsizing, scheduling, and reserved instances, and continuously evaluating objectives. Specific examples provided include automating waste management, calculating savings from reserved instances and savings plans, using spot instances, and refactoring services to serverless architectures.
2. “FinOps is the practice of bringing financial accountability to the
variable spend model of cloud, enabling distributed teams to make
business trade-offs between speed, cost, and quality.”
3. JANE SMITH
Engineering
Focus on building and
supporting services. Efficient
design and use of resources
via such activities as
rightsizing, allocating costs,
finding unused resources, and
identifying spending
anomalies.
JOHN SMITH
Product/Business Owner
Business owner needs to
understand cost implications
of features/services provided
to clients, but also provide
visibility of product roadmap
and growth plans.
PETER SMITH
Executive
Executives like a VP/Head of
Infrastructure, CTO or CIO
focus on driving
accountability and building
transparency, ensuring teams
are being efficient and not
exceeding budgets.
SALLY SMITH
Procurement
Finance and procurement
team members use the
reporting for accounting and
forecasting. Needs forecasts
and expertise for rate
negotiations with cloud
service providers.
MEET THE STAKEHOLDERS
4. PROCESS IN THEORY
INFORM
Empower teams
with visibility.
Make sure
decisions are data
driven and
informed.
Promote best
practices with
transparency.
OPTIMIZE
Manage your
waste.
Right size.
Apply scheduling.
Utilize
Reservations and
Savings Plans.
OPERATE
Evaluate
objectives.
Validate metrics.
Analyze trends.
REPEAT
Continuous
evaluation of
objectives and
results lets you
improve the
process and fill in
the gaps.
7. SCHEDULE & RIGHT SIZE
• Measure CPU, Memory, Network, IO …
• Automated scheduling can be very effective.
12h * 5 days (60h) vs 24h * 7 days (168h)
• Stop vs Hibernate
• Upgrade to current generation instances.
• When there is no small-enough instance
consider serverless and/or sharing.
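The 60h vs 168h comparison above can be turned into a quick estimate. This is a minimal sketch that only counts instance hours; the function name is made up for illustration, and it assumes the instance is billed per hour of running time.

```python
HOURS_PER_WEEK = 24 * 7  # 168h for an always-on instance


def scheduled_savings(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of instance hours avoided by running only on a schedule."""
    scheduled_hours = hours_per_day * days_per_week
    return 1 - scheduled_hours / HOURS_PER_WEEK


# Office-hours schedule from the slide: 12h on 5 days a week.
savings = scheduled_savings(12, 5)
print(f"{savings:.0%} fewer instance hours")
```

Stopped instances still pay for their attached storage, so this covers compute hours only.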
8. RESERVED CAPACITY
• EC2 Reserved Instances
• Discounted hourly rate
• Optional capacity reservation (but less flexibility)
• Standard, Convertible or Scheduled RIs
• Remember also other services
• RDS
• Redshift
• ElastiCache
• DynamoDB
• CloudFront (>10TB/month for 12 months)
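Whether a reservation pays off depends on how many hours the resource actually runs. The sketch below computes the break-even utilization; the hourly rates are hypothetical placeholders, not real AWS prices, and the reserved rate is assumed to be the effective hourly cost over the whole term.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_utilization(on_demand_hourly: float, reserved_hourly: float) -> float:
    """Share of hours an instance must run before the reservation is cheaper."""
    return reserved_hourly / on_demand_hourly


def yearly_cost(on_demand_hourly: float, reserved_hourly: float, utilization: float):
    """Return (on-demand cost, reserved cost) for one year at a given utilization."""
    on_demand = on_demand_hourly * HOURS_PER_YEAR * utilization
    reserved = reserved_hourly * HOURS_PER_YEAR  # billed for every hour, used or not
    return on_demand, reserved


ON_DEMAND = 0.10  # hypothetical $/hour
RESERVED = 0.06   # hypothetical effective $/hour

threshold = break_even_utilization(ON_DEMAND, RESERVED)
print(f"Break-even at {threshold:.0%} utilization")
```

With these example rates, an instance scheduled for 60 of 168 hours stays cheaper on demand, so it makes sense to decide on scheduling before buying reservations.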
9. SAVINGS PLANS
• First understand your steady-state
consumption and 12 month roadmap.
• Similar savings to reserved instances (RI)
• Compute Savings Plans apply to all instances*,
regardless of family, size, OS or region.
• Hourly(!) monetary commitment.
• Reservations are still relevant because
• SPs cover only EC2 and Fargate usage
• You can apply both to EC2 !!!
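The hourly nature of the commitment is easy to overlook. The following is a simplified model of one billing hour, assuming a single flat discount rate (real Savings Plan rates vary by instance type and region); the function and parameter names are illustrative only.

```python
def hourly_bill(on_demand_usage: float, commitment: float, discount: float) -> float:
    """Cost of one hour under a Savings Plan (simplified model).

    on_demand_usage: eligible usage that hour, priced at on-demand rates ($)
    commitment:      committed spend per hour at Savings Plan rates ($)
    discount:        assumed flat discount versus on-demand, e.g. 0.30
    """
    # On-demand value of the usage that the commitment can absorb.
    covered_on_demand = commitment / (1 - discount)
    # Anything above that is billed at normal on-demand rates.
    overflow = max(0.0, on_demand_usage - covered_on_demand)
    # The commitment is paid in full even when usage is lower.
    return commitment + overflow


for usage in (4.0, 10.0, 15.0):
    print(usage, round(hourly_bill(usage, commitment=7.0, discount=0.30), 2))
```

In this model a quiet hour still costs the full commitment and a spike above it gets no discount, which is why the steady-state baseline matters more than the monthly total.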
10. WHAT COST TO USE?
Unblended, Amortized, Blended, Net Unblended or Net Amortized?
RI purchase
13. EXAMPLE - S3 LOGGING
• Requirement:
S3 buckets must have access logging!
• But don’t configure access logging for
your log storage bucket!
14. EXAMPLE - CLOUDTRAIL AUTOMATION
• Requirement:
Resources must be tagged with
owner/creator identification!
• Lambda triggered from CloudWatch &
CloudTrail would do this automatically.
• And when tested, it cost us “nothing”
• Until the security requirement to deliver
CloudTrail trails to a central archive!
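The core of such a tagging Lambda is small. The sketch below only shows the step that reads the creator out of a CloudTrail `RunInstances` event delivered through CloudWatch Events; the helper name is hypothetical, other resource types would need their own parsing, and the actual tagging call is left as a comment.

```python
def owner_tags_from_event(event: dict) -> dict:
    """Map each created EC2 instance ID to the ARN of the identity that created it."""
    detail = event.get("detail", {})
    if detail.get("eventName") != "RunInstances":
        return {}
    creator = detail.get("userIdentity", {}).get("arn", "unknown")
    # responseElements is null when the API call failed, hence the `or {}`.
    response = detail.get("responseElements") or {}
    items = response.get("instancesSet", {}).get("items", [])
    return {item["instanceId"]: creator for item in items}


def handler(event, context):
    """Lambda entry point (sketch): compute the tags, then apply them."""
    owners = owner_tags_from_event(event)
    # A real function would now call boto3's ec2.create_tags(Resources=..., Tags=...)
    # with a tag key of your choice, for example "Owner" (assumed here).
    return owners


sample_event = {
    "detail": {
        "eventName": "RunInstances",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/jane"},
        "responseElements": {
            "instancesSet": {"items": [{"instanceId": "i-0abc"}, {"instanceId": "i-0def"}]}
        },
    }
}
print(handler(sample_event, None))
```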
Engineers want to build the “best possible” solution, while the business needs one that is good enough at low cost.
Procurement's role is getting smaller, and variable spend makes them uneasy.
Executives want to know whether “everything is OK” (= as planned).
Empower teams with visibility.
Make sure decisions are data driven and informed.
Promote best practices (with correct cost allocation).
Unattached volumes -> Delete (potential security risk too)
Snapshots -> Configure AWS Backup
Unassociated IPs -> Delete, consider using DNS names instead.
Idle LBs -> Delete or Convert to shared ALB?
Underutilized instances/DBs -> Collect CPU/Memory/IOPS data so you know where to schedule and right size.
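The first item on the list, unattached volumes, is a good candidate for automation. As a minimal sketch, the function below filters a response shaped like the EC2 `describe_volumes` output for volumes in the `available` state; the sample data is invented, and fetching the response and deleting the volumes are deliberately left out.

```python
def unattached_volume_ids(describe_volumes_response: dict) -> list[str]:
    """Return IDs of EBS volumes that are not attached to any instance."""
    return [
        volume["VolumeId"]
        for volume in describe_volumes_response.get("Volumes", [])
        if volume.get("State") == "available"  # "in-use" means attached
    ]


# Invented sample in the shape returned by boto3's ec2.describe_volumes().
sample_response = {
    "Volumes": [
        {"VolumeId": "vol-001", "State": "in-use"},
        {"VolumeId": "vol-002", "State": "available"},
        {"VolumeId": "vol-003", "State": "available"},
    ]
}
print(unattached_volume_ids(sample_response))
```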
Waste exists because automation is either missing (= no need to worry about breaking anything) or broken (= get it fixed).
If you don’t have data, there will be excuses.
Scheduling works for EC2 AND RDS(!)
Current-generation instances typically give more CPU/Mem/Network for a lower price (but can require an OS update)
Aurora Serverless, ECS Fargate, Lambda …
* including Fargate!
Unblended costs represent your usage costs on the day they are charged to you. For most of you, this is the only cost dataset that you will ever need.
Amortized costs are especially useful for those of you who have purchased AWS Reservations.
Blended costs are calculated by multiplying each account’s service usage against something called a blended rate. A blended rate is the average rate of on-demand and reservation-related usage that is consumed by member accounts in an organization for a particular service.
The net unblended cost dataset reflects usage costs after discounts are applied while the net amortized cost dataset adds additional logic to amortize discount-related information, in addition to your Savings Plans or Reservation-related charges.
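The difference between the unblended and amortized views is easiest to see with an upfront purchase. The sketch below assumes a single all-upfront reservation with no recurring fee and spreads it evenly per day; the amount and function name are hypothetical.

```python
def daily_costs(upfront: float, term_days: int, dataset: str) -> list[float]:
    """Daily cost series for one all-upfront reservation in a given cost dataset."""
    if dataset == "unblended":
        # Cash view: the whole fee appears on the purchase day.
        return [upfront] + [0.0] * (term_days - 1)
    if dataset == "amortized":
        # Accrual view: the fee is spread evenly across the term.
        return [upfront / term_days] * term_days
    raise ValueError(f"unsupported dataset: {dataset}")


unblended = daily_costs(365.0, 365, "unblended")
amortized = daily_costs(365.0, 365, "amortized")
print(unblended[0], unblended[1])  # spike on day one, then nothing
print(amortized[0], amortized[1])  # the same small amount every day
```

Both series add up to the same total; only the timing differs, which is what makes the unblended view look like a spending anomaly on the purchase day.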
Typically you improve by reducing spikes -> compare with on-demand cost. (It also makes you look better.)
“Run fault-tolerant workloads for up to 90% off”
Combine Spot Instances with On-Demand and RIs
No bidding -> fewer interruptions. Instances are only interrupted when an On-Demand/RI customer needs the capacity.
Search GitHub for autotag, e.g. https://github.com/GorillaStack/auto-tag
Only the first copy of CloudTrail is free.
This is the TGW outbound services VPC pattern.
An alternative could be sharing the VPC with RAM.
Sharing resources (ALB, DB etc) is an anti-pattern. Use it wisely!
You cannot justify all the effort with just the savings.
This is about tech debt payback as well.
The red arrow is the order of dependencies; the green arrow is the order of return on investment.