This document discusses the importance of implementing FinOps practices to optimize cloud spending. FinOps advocates for collaborative work between development, operations, and finance teams to provide transparency into infrastructure costs, optimize resource utilization, and balance speed of development with cloud efficiency. The document outlines why FinOps is needed due to rising cloud bills and lack of visibility. It proposes implementing tagging, metrics, and recommendation systems to allocate costs and identify optimization opportunities in a decentralized manner. FinOps requires cultural and process changes, as well as open source tooling, to establish a collaborative cost management approach.
2. Who we are
Alex Tokarev
Head of RnD Platform V Sberbank – platformv.sber.ru
AWS and GCP expert
The man who cares about money
Pavel Zhurov
Senior Cloud Engineer at VTB
Kubernetes security nerd
For God’s sake, be careful with hostPath in production
3. What this talk is not about
• Particular provider optimizations
• In-house money
• Application refactoring approach
But FinOps works there as well
4. What this talk is about
• What is FinOps
• What it is for
• FinOps stakeholders
• Allocation approach
• SDLC demo
• Maturity level
• Q&A
6. DevOps
“DevOps” is a movement that advocates:
1. A collaborative working relationship between development and
IT operations
2. Fast flow of planned work
3. High reliability, stability, resilience, and security of the
production environment
7. Cloud pains
• Financial decisions moved from finance to engineers
• Unexpected bills at the end of month
• Huge files with billing data
• Unclear charges with micro-amounts
• Each provider has vendor-specific vocabulary
• Each provider has unique billing files structure
• IT department requests more and more cloud services
8. Big figures case
1. Company spends $875,387 for cloud each month
2. Proper spending?
3. Is $20,000/month cost optimization big enough?
9. Outliers and big figures hybrid case
• Company spends $874,387 for GCP per month
• A team rents a GPU server, which costs $5,000/month, for a
new neural network (NN) test
• The team forgets to switch it off
• Nobody will notice: $879,387 vs $874,387
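The arithmetic behind this case can be sketched as follows. The company figures come from the slide; the team-level baseline of 20,000 USD is a hypothetical number, used only to show how the same 5,000 USD looks once spend is visible per team.

```python
def relative_increase(baseline: float, extra: float) -> float:
    """Return the extra spend as a fraction of the baseline spend."""
    return extra / baseline


company_baseline = 874_387  # monthly GCP bill from the slide, USD
forgotten_gpu = 5_000       # forgotten GPU server from the slide, USD per month
team_baseline = 20_000      # hypothetical monthly spend of the team, USD

company_share = relative_increase(company_baseline, forgotten_gpu)
team_share = relative_increase(team_baseline, forgotten_gpu)

print(f"company level: +{company_share:.2%}")
print(f"team level:    +{team_share:.2%}")
```

At company level the forgotten server is well under 1% of the bill; at the assumed team level it is a quarter of the budget, which is hard to miss.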
10. Fake economy case
• Team A switches off server X each night because they use the
server for real-time tasks during working hours
• Team B switches off server Y each day because they use the
server for batch tasks during non-working hours
• Both think that they are making a cost-efficient decision
Just buy a reserved instance (RI) server, share it and get a 20% discount!
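A rough comparison of the two options, as a sketch. The 20% discount is from the slide; the hourly rate and the 12/12 hour split between the teams are assumptions chosen for illustration.

```python
HOURS_PER_DAY = 24


def daily_cost_separate(on_demand_rate: float, hours_a: int, hours_b: int) -> float:
    """Two on-demand servers, each switched off when its team does not need it."""
    return on_demand_rate * (hours_a + hours_b)


def daily_cost_shared_reserved(on_demand_rate: float, discount: float) -> float:
    """One reserved server running around the clock and shared by both teams."""
    return on_demand_rate * (1 - discount) * HOURS_PER_DAY


on_demand_rate = 1.0  # hypothetical price, USD per hour
separate = daily_cost_separate(on_demand_rate, hours_a=12, hours_b=12)
shared = daily_cost_shared_reserved(on_demand_rate, discount=0.20)

print(f"separate on-demand servers: {separate:.2f} USD/day")
print(f"one shared reserved server: {shared:.2f} USD/day")
```

Because the two usage windows together cover the whole day, switching servers off saves nothing compared with one always-on machine, and the reservation discount is lost.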
14. More cloud pains
• No visibility into spend
• Overprovisioning
• No trust in the IT department
• The budgeting cycle is complicated
• The finance department has full control over cloud spend and must approve each request
• No on-demand infrastructure and on-demand spend
Not enough speed: the old approach is irrelevant!
15. Decision
1. Operate in near real time
2. Eliminate monthly or quarterly spend reviews
3. Encourage finance and IT departments to work in harmony
16. FinOps
Near real time reporting
+
Just-in-time processes
+
IT and finance teams work together
+
Shared cloud dictionary
=
FinOps
+
Trust
=
Balance between speed of changes, availability of services and cloud costs
19. Why developers don’t like FinOps
• They have never had to think about money
• Constant pressure to deliver more and more features
• Cloud spend is not a feature, so it is someone else’s job
• Top management doesn’t articulate cost reduction as a goal
• They have never seen a cloud bill
20. Why developers need it
• A clear picture of spend: you can use more managed services
• Proven efficiency: more powerful reserved instances can be purchased
• More powerful resources: more features with less effort
• Money questions will arise in any case, so let’s handle them in a cloud-native fashion
• Cool stuff for your CV
21. How to encourage
• Rightsizing (not cost reduction!) should be a target for every team
• Provide visibility into each team’s spend; teams should be able to benchmark themselves
• Just ask teams to help the business
• Explain that FinOps is not about cutting resources, but about accountability
• Always estimate the effort of implementing savings:
3 engineer-hours that save 10,000 USD: cool
3 engineer-hours that save 100 USD: bad
This solves the big-figures issue as well
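The effort estimate can be reduced to a one-line check, sketched below. The 3 hours and the two savings figures are from the slide; the hourly cost of an engineer (50 USD) is a hypothetical value.

```python
def is_worth_implementing(savings_usd: float, effort_hours: float,
                          hourly_cost_usd: float) -> bool:
    """A saving is worth implementing only if it exceeds the cost of the effort."""
    return savings_usd > effort_hours * hourly_cost_usd


HOURLY_COST_USD = 50.0  # hypothetical fully loaded cost of one engineer-hour

big_saving = is_worth_implementing(10_000, effort_hours=3, hourly_cost_usd=HOURLY_COST_USD)
small_saving = is_worth_implementing(100, effort_hours=3, hourly_cost_usd=HOURLY_COST_USD)

print("save 10,000 USD in 3 hours:", "cool" if big_saving else "bad")
print("save 100 USD in 3 hours:", "cool" if small_saving else "bad")
```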
Soft skills
22. Why CTO needs it
• Benchmark against the industry
• Cloud spend limits could become less strict
• More money saved means more headcount or a higher team salary cap
• Proven value of tech and cloud investments
• Establishing a competitive advantage
• Improved time-to-market
The centralized FinOps team should be close to the CTO!
23. Why finance needs it
• More precise spending forecast
• Overspending risks mitigation
• Clearer understanding of overall cloud efficiency
• Actual revenue per unit of money spent
• Figures to negotiate with cloud vendors
• Cap for cloud reservation plans
24. FinOps implementation plan - IT
• Extract consumption data for compute from:
• Billing files
• Cloud API
• Allocate shared resources to teams
• Tagging
• Internal metrics
• Find idle resources
• Introduce Cloud-native FinOps tools
• Implement recommendations system
• Implement API-based decommissioning
• Implement AI-based outlier detection
[Diagram: rollout stages (initial stage, intra-company FinOps visibility, true real-time) and their audiences (management, management + teams, teams)]
25. Cloud native FinOps
• New mandatory endpoint: /consumption
• Money consumption exposed in the Prometheus exposition format
• Budget control via standard cloud-native monitoring tools
• Money-related tests in CI/CD
• Agile-related tags: tribe/team/product/
• Per-second consumption calculation
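A minimal sketch of such an endpoint, using only the Python standard library. The metric name `cloud_consumption_usd_total`, the hourly price, and the label values are hypothetical; only the /consumption path, the tribe/team/product tags, the per-second calculation, and the Prometheus text format come from the slide.

```python
import time
from http.server import BaseHTTPRequestHandler

METRIC = "cloud_consumption_usd_total"  # hypothetical metric name
HOURLY_PRICE_USD = 0.36                 # hypothetical price of this service
LABELS = {"tribe": "payments", "team": "core", "product": "api"}  # hypothetical
STARTED_AT = time.monotonic()


def consumed_usd(uptime_seconds: float, hourly_price_usd: float) -> float:
    """Per-second consumption: the hourly price prorated over the uptime."""
    return uptime_seconds * hourly_price_usd / 3600


def render_metrics(value: float, labels: dict) -> str:
    """Render one counter sample in the Prometheus text exposition format."""
    label_text = ",".join(f'{key}="{val}"' for key, val in labels.items())
    return (
        f"# HELP {METRIC} Money consumed since start, in USD.\n"
        f"# TYPE {METRIC} counter\n"
        f"{METRIC}{{{label_text}}} {value}\n"
    )


class ConsumptionHandler(BaseHTTPRequestHandler):
    """Serves the current consumption on GET /consumption."""

    def do_GET(self):
        if self.path != "/consumption":
            self.send_error(404)
            return
        uptime = time.monotonic() - STARTED_AT
        body = render_metrics(consumed_usd(uptime, HOURLY_PRICE_USD), LABELS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be served with `http.server.HTTPServer(("", 8000), ConsumptionHandler).serve_forever()` and scraped like any other metrics endpoint.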
26. Open source cloud-native tooling to start from
Task | Tool | Benefit
Storage | Prometheus, VictoriaMetrics | Long-term consumption data storage
Real-time compute management | KEDA, Zalando kube-metrics-adapter | Scale up/scale down
Tagging enforcement | Open Policy Agent | Resource allocation
Alerting | Prometheus Alertmanager | Outlier notifications
Reporting and dashboards | Grafana | Price analytics
Idle identification | Goldilocks | Orphan compute tasks
Containers analytics | Kubecost | Shared compute allocation
Chargeback policy in CI/CD | Argo Rollouts | Consumption-based rollbacks
28. Cost allocation
• The process of splitting up a cloud bill and associating the costs
to each cost center dimension
• Allocation gaps must be shared between all teams
• Takes into account all cloud services: compute, storage,
network, etc.
• Could be achieved by:
• Internal price metrics - teams calculate dependent services
consumption
• Tagging – FinOps platform calculates metrics
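A sketch of tag-based allocation with a shared allocation gap. The slide only says that gaps must be shared between all teams; splitting the gap equally (rather than, say, proportionally to each team's spend) is an assumption, and the bill lines are made up.

```python
from collections import defaultdict


def allocate(line_items, teams):
    """Split a bill between teams.

    line_items: iterable of (cost_usd, team_tag) pairs; team_tag may be None
    or an unknown value, in which case the cost falls into the allocation gap.
    The gap is shared equally between all teams (an assumption).
    """
    allocated = defaultdict(float)
    gap = 0.0
    for cost, team in line_items:
        if team in teams:
            allocated[team] += cost
        else:
            gap += cost
    share = gap / len(teams)
    return {team: allocated[team] + share for team in teams}


bill = [
    (100.0, "team-a"),
    (50.0, "team-b"),
    (20.0, None),      # untagged resource
    (10.0, "taem-a"),  # typo in the tag, so it cannot be attributed
]
result = allocate(bill, teams=["team-a", "team-b"])
print(result)
```

Every dollar of the bill ends up on some team, so the allocated totals always add up to the invoice.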
29. Internal metrics
• A dedicated /price or /money or /consumption endpoint
• Timeseries data
• {tenant: <>, price_unit: <>, value: <>}
• Perfect for multi-tenant services
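One way to read the {tenant, price_unit, value} record is sketched below: a multi-tenant service reports how much of each billable unit every tenant consumed, and a price list turns that into money. The unit names and prices are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsumptionSample:
    tenant: str
    price_unit: str  # what is being billed, e.g. stored gigabytes or requests
    value: float     # how many units the tenant consumed


# Hypothetical price list, USD per unit.
UNIT_PRICES_USD = {"gb_stored": 0.02, "requests_1k": 0.005}


def cost_per_tenant(samples):
    """Sum the cost of all consumed units for each tenant."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample.tenant] += sample.value * UNIT_PRICES_USD[sample.price_unit]
    return dict(totals)


samples = [
    ConsumptionSample("tenant-a", "gb_stored", 100),
    ConsumptionSample("tenant-a", "requests_1k", 200),
    ConsumptionSample("tenant-b", "gb_stored", 50),
]
costs = cost_per_tenant(samples)
print(costs)
```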
30. Internal metrics issues
• State should be stored somewhere
• Normalization for services with many replicas
• Service restarts
• Teams are reluctant to care about price metrics
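The restart and replica issues can be handled on the collecting side, as in this sketch. It assumes each replica exposes a counter that starts from zero after a restart, and treats any drop in the value as a restart, similar in spirit to how Prometheus handles counter resets. Replica names and sample values are made up.

```python
def counter_increase(samples):
    """Total growth of a counter over a series of scraped values.

    A value lower than the previous one is treated as a restart: the counter
    began again from zero, so the whole new value counts as growth.
    """
    total = 0.0
    previous = None
    for value in samples:
        if previous is not None:
            total += value - previous if value >= previous else value
        previous = value
    return total


def total_across_replicas(series_by_replica):
    """Normalize a multi-replica service by summing the growth of every replica."""
    return sum(counter_increase(series) for series in series_by_replica.values())


scraped = {
    "replica-1": [0, 5, 9, 2, 4],  # restarted between 9 and 2
    "replica-2": [1, 3],
}
print(total_across_replicas(scraped))
```

Consumption that happened between the last scrape and a restart is still lost, which is why the slide notes that state should be stored somewhere.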
31. Tagging
• Folksonomy
• Set of words to describe a resource
• Self-explaining
• About IT and finance
• Perfect for multi-instance services
32. Tagging issues
• Too verbose
• Typos
• Engineers’ tags are not aligned with business tags
• Not all cloud products support tags
Example: API gateway endpoint, DB in RDS
• Cannot be used for multi-tenant products
33. Suggested tags
Tag | Answered question
Business department | Which business unit could be charged?
Service | Which cost center drives spending?
Team | How much do a team’s efforts cost in the cloud?
Name | What is the name of the component?
Front/Back | What is the ratio between back-end and front-end costs?
Performance metrics for the company: Team vs Team, Department vs Department
34. Suggested tags
Tag | Answered question
Prod/Test/Dev | Which costs are non-production and safe to turn off?
Security-related | Tags are prohibited: they are for allocation only, not access control! Use the cloud project/account level!
35. Tagging
• No more than 6 tags
• Be careful with typos
• Enforce mandatory tagging via automation
• A dedicated tag set for FinOps
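An automated tag check could look like the sketch below. It is plain Python rather than an Open Policy Agent policy, and the six tag keys are hypothetical spellings of the tags suggested earlier in the deck. Typos are caught with the standard library's `difflib.get_close_matches`.

```python
import difflib

# Hypothetical key names for the six suggested FinOps tags.
MANDATORY_TAGS = sorted(
    ["business_department", "service", "team", "name", "tier", "environment"]
)


def validate_tags(tags):
    """Return a list of problems: unknown (possibly misspelled) and missing tags."""
    problems = []
    for key in tags:
        if key not in MANDATORY_TAGS:
            hint = difflib.get_close_matches(key, MANDATORY_TAGS, n=1)
            suffix = f" (did you mean '{hint[0]}'?)" if hint else ""
            problems.append(f"unknown tag '{key}'{suffix}")
    for key in MANDATORY_TAGS:
        if key not in tags:
            problems.append(f"missing mandatory tag '{key}'")
    return problems


resource_tags = {
    "business_department": "retail",
    "service": "checkout",
    "taem": "core",  # typo
    "name": "cart-api",
    "tier": "back",
    "environment": "prod",
}
for problem in validate_tags(resource_tags):
    print(problem)
```

Running such a check in CI/CD or at admission time rejects the resource before the typo reaches the bill.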
36. Recommendation system
• Should be delivered regularly via email reports (showback)
• Should be delivered regularly via the ticketing system for developers
• Should be prioritized for the next sprint
• A way to keep engineers from taking offence: it’s not me, it is JIRA
• A source for statistics:
• Recommendation count per release
• How many resulted in actual money savings
• How many were ignored
• What is the average time for one to be taken into a sprint
• How many teams use recommendations
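These statistics are easy to derive once recommendations live in a ticketing system. The sketch below assumes a hypothetical export with three fields per recommendation; the field names and sample data are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recommendation:
    team: str
    saved_usd: float               # 0 if it produced no savings
    days_to_sprint: Optional[int]  # None if the recommendation was ignored


def summarize(recommendations):
    """Compute the statistics listed above for one release."""
    taken = [r for r in recommendations if r.days_to_sprint is not None]
    average_days = sum(r.days_to_sprint for r in taken) / len(taken) if taken else None
    return {
        "total": len(recommendations),
        "with_savings": sum(1 for r in recommendations if r.saved_usd > 0),
        "ignored": len(recommendations) - len(taken),
        "avg_days_to_sprint": average_days,
        "teams_using": len({r.team for r in taken}),
    }


release = [
    Recommendation("team-a", 1200.0, 10),
    Recommendation("team-a", 0.0, 20),
    Recommendation("team-b", 300.0, 30),
    Recommendation("team-c", 0.0, None),
]
stats = summarize(release)
print(stats)
```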
Discussing a recommendation is an opportunity
Soft skills: next level
37. Ticketing recommendations
Tickets to IT about usage reduction through:
- decommissioning resources
- rightsizing
- more expensive flavors packed with many extra workloads
- serverless services
Tickets to finance about:
- rate reduction by commercial agreement
- reserved instances
IT is happy: finance will suffer from tickets as well
38. FinOps implementation plan - Management
• Introduce FinOps approach
• Implement FinOps for multi-instance services
• Implement FinOps for multi-tenant services
• Create finance and IT dashboards
• Start showback for teams
• Create cost allocation dashboards and cost optimization
learning materials
• Introduce chargeback
• Create top-management dashboards
39. xBack
Showback reports show teams what they are spending, but the
money is allocated internally from a central budget
Chargeback reports show teams what they are spending, and the
money is consumed internally from the team’s budget
40. Where to hire
Nobody knows, so you must grow such people within your team!
A person who is happy with cloud but ready for fiscal and business thinking
+
Decent soft skills level
42. FinOps maturity level
Aspect | Before FinOps | Level 1 | Level 2
Cost extraction approach | Manual reconciliation of vendor invoices | Automatic reconciliation of vendor invoices | API-based reconciliation
Cost awareness timing | Quarterly | Monthly | Near real time
Data retention | Mailbox lifetime | Limited granular data | Unlimited full dataset
Allocation approach | No allocation | Cloud provider accounts | Tagging and metrics
Teams budget | No budget | Budget not in real money | Budget in real money
Allocation awareness | No | Showback reports | Chargeback information
Notification approach | Emails from finance | Emails from FinOps team; price dashboard for an application | Tickets in tracking system; price dashboard per service
Optimization approach | No | Idle removal; software refactoring | Reserved instances; idle removal; rightsizing; software refactoring
Automation approach | Spreadsheet macros | Scripts for notifications | API-based cloud automation
Idle/underutilized resources removal | Quarterly, manual | Monthly, manual | Daily, AI-based via API
43. What is cloud
• On-demand
• Scalable
• Self-service
• Measurable
44. FinOps
“FinOps” is a movement that advocates:
1. A collaborative working relationship between DevOps and
finance, with data-driven management of infrastructure spending
2. Transparency between IT and finance
3. Cost efficiency, profitability and product delivery pace
45. Risks
• Reducing spend at the cost of innovation or at the cost of
impacting an important project
• Recommendations don’t consider spikes in utilization
• Forgetting about disaster recovery overprovisioning
• Failing to rightsize beyond compute – databases, API gateways,
etc.
• Too many reserved instances
• Neglecting very small savings – forgot to multiply by 365
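The spike risk can be illustrated with a small rightsizing sketch: a recommendation based on average utilization ignores spikes, while one based on a high percentile does not. The nearest-rank percentile, the 20% headroom, and the sample data are assumptions made for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def cores_by_average(cpu_samples, headroom=1.2):
    """Naive recommendation: size for the average load plus headroom."""
    return math.ceil(sum(cpu_samples) / len(cpu_samples) * headroom)


def cores_by_percentile(cpu_samples, pct=95, headroom=1.2):
    """Spike-aware recommendation: size for a high percentile plus headroom."""
    return math.ceil(percentile(cpu_samples, pct) * headroom)


# Hypothetical CPU usage in cores: mostly idle, with two short spikes.
usage = [1] * 18 + [7, 8]

print("by average:", cores_by_average(usage))
print("by 95th percentile:", cores_by_percentile(usage))
```

With this data the average-based recommendation would leave the service far short of capacity during every spike.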
Beware stubborn FinOps!
46. Conclusion
• FinOps is feasible with open source technologies
• Encourage each team to take ownership of governing its cloud usage
• Decentralize reduction of usage to engineers
• Centralize reduction of spending to finance
• FinOps is not only software and money: it is about soft skills
• FinOps is not about saving money: FinOps is about making money
Example: there are many resources, and one team orders a server for 5,000 USD
For instance, one team may have high resource usage during the day while
another has high resource usage during the night. Based on their usage, it
probably doesn’t make sense for either team to commit to RIs individually. But
overall, there’s a consistent base of resources running across the 24-hour period.
The central team identifies the opportunity to commit to a reservation and save
both teams on the rate they pay for resources.
Who needs FinOps, and why
Emphasize that recommendations provide a lot of visibility
Explain what price_unit is for: complex products priced along several dimensions, à la S3