Robin van Zijll & Janna Brummel 24 May 2018
How We Try to Make a Lion Bulletproof
Setting Up SRE in a Global Financial Organization
Introductions
Janna Brummel
IT Chapter Lead SRE
Robin van Zijll
Product Owner SRE
ING is a global financial services provider serving more than 35 million customers. In the
Netherlands we are the market leader in the banking sector, with over 8 million retail customers.
Customers: 35 million private, corporate and institutional customers
Countries: more than 40, in Europe, Asia, Australia, and North and South America
Employees: 52,000 worldwide, 12,416 in NL
Segments: Market leaders Benelux, Growth markets, Commercial Banking, Challengers
Mobile Banking is used by 3.5 million customers, who generate 4.4 million logins per day.
Internet Banking is used by 6.1 million customers, who jointly log in 1.4 million times a day.
17,400 machines are spread over 2 data centers and use 14 PB of storage.
Why do we need to improve the reliability of our services?
Availability Report of 2017 (availability, and unavailability caused by changes and by incidents, in %):
Internet Banking Retail: availability 99.72, change 0.19, incident 0.09
Mobile Banking Retail: availability 99.65, change 0.22, incident 0.14
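To make these percentages concrete, the Python sketch below converts them into hours per year. It assumes every figure is a percentage of a 365-day year (8,760 hours) and reads the chart as availability of 99.72% (Internet Banking Retail) and 99.65% (Mobile Banking Retail), with the rest of the year attributed to changes (0.19% and 0.22%) and incidents (0.09% and 0.14%); that reading of the chart is our interpretation.

```python
# Worked arithmetic: turn yearly availability percentages into hours of
# unavailability. Assumes every figure is a percentage of a 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours (2017 was not a leap year)

# Our reading of the 2017 chart: availability plus the unavailability
# attributed to changes and to incidents, all in percent.
REPORT_2017 = {
    "Internet Banking Retail": {"availability": 99.72, "change": 0.19, "incident": 0.09},
    "Mobile Banking Retail": {"availability": 99.65, "change": 0.22, "incident": 0.14},
}


def hours_of(percent: float) -> float:
    """Number of hours per year that corresponds to `percent` of the year."""
    return percent / 100 * HOURS_PER_YEAR


def downtime_summary(report: dict) -> dict:
    """Unavailable hours per service: total, change-related and incident-related."""
    summary = {}
    for service, figures in report.items():
        summary[service] = {
            "total": round(hours_of(100 - figures["availability"]), 1),
            "change": round(hours_of(figures["change"]), 1),
            "incident": round(hours_of(figures["incident"]), 1),
        }
    return summary


if __name__ == "__main__":
    for service, hours in downtime_summary(REPORT_2017).items():
        print(f"{service}: {hours}")
```

Under these assumptions, Internet Banking Retail was unavailable for roughly 24.5 hours in 2017 (16.6 change-related, 7.9 incident-related) and Mobile Banking Retail for roughly 30.7 hours (19.3 and 12.3). The Mobile Banking figures add up to 100.01% because the chart values are rounded.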
Site Reliability Engineering, as pioneered by Google, means doing
work historically done by operations teams, but with
engineers whose aim is to automate the toil within their
organization.
By design, SRE teams must stay focused on engineering:
operational work (tickets, on-call, manual tasks) is capped at 50%,
so at least 50% of SRE time is spent on engineering.
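The 50% cap is easy to check mechanically. The Python sketch below is a minimal illustration, not Google's or ING's tooling: it assumes time is logged as entries with a category, and the category names `tickets`, `on-call` and `manual` are hypothetical stand-ins for the operational work listed above.

```python
from dataclasses import dataclass

# Hypothetical category names; the slide lists tickets, on-call and manual
# tasks as operational work. Anything else counts as engineering here.
OPERATIONAL_CATEGORIES = {"tickets", "on-call", "manual"}
OPERATIONAL_CAP = 0.5  # at most 50% of SRE time may go to operational work


@dataclass
class TimeEntry:
    engineer: str
    category: str
    hours: float


def operational_share(entries: list[TimeEntry]) -> float:
    """Fraction of logged hours spent on operational work (0.0 if nothing logged)."""
    total = sum(entry.hours for entry in entries)
    if total == 0:
        return 0.0
    operational = sum(e.hours for e in entries if e.category in OPERATIONAL_CATEGORIES)
    return operational / total


def within_cap(entries: list[TimeEntry], cap: float = OPERATIONAL_CAP) -> bool:
    """True if operational work stays at or below the cap."""
    return operational_share(entries) <= cap


if __name__ == "__main__":
    week = [
        TimeEntry("alice", "tickets", 10),
        TimeEntry("alice", "on-call", 8),
        TimeEntry("alice", "manual", 4),
        TimeEntry("alice", "automation", 18),
    ]
    print(f"operational share: {operational_share(week):.0%}, within cap: {within_cap(week)}")
```

For the sample week, 22 of 40 logged hours are operational (55%), so the cap is exceeded.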
“Site Reliability Engineering (SRE) is what happens when you ask a software
engineer to design an operations team.”
Within ING, we face a number of reliability challenges that we
want to solve through SRE:
Teams are not in control of their monitoring solutions and
cannot fix them when they break.
It takes too long for an alert to reach the right team: on
average, 69 minutes pass before an engineer starts
working on resolving an incident.
We do not learn enough from mistakes made – we have
yet to become a learning organization.
We prove we are in control with documents, not by
checking the actual state of our code in production.
Teams are not always aware of their services’
performance and cannot take full responsibility for running them.
Our centralized monitoring solutions sometimes
encounter scalability and availability issues.
Our centralized alerting solution is unreliable and
does not send alerts directly to BizDevOps teams.
The same incidents occur multiple times and
we do not follow up on incidents enough.
Our engineers spend more time on completing
documents than coding.
Teams do not always measure availability
from a white box monitoring perspective.
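Measuring availability from a white-box perspective means deriving it from metrics the service itself exports, such as request counters, rather than only from external checks. A minimal Python sketch, assuming a hypothetical counter of requests labelled by HTTP status code and treating 5xx responses as failures:

```python
# Two snapshots of a hypothetical counter, e.g. http_requests_total{code="..."},
# keyed by HTTP status code. Counters only go up, except when the process restarts.


def counter_increase(start: float, end: float) -> float:
    """Increase of a monotonic counter; a drop is treated as a reset to zero."""
    return end if end < start else end - start


def availability(start: dict[str, float], end: dict[str, float]) -> float:
    """Share of requests in the window that did not fail with a 5xx status."""
    total = 0.0
    failed = 0.0
    for code, end_value in end.items():
        delta = counter_increase(start.get(code, 0.0), end_value)
        total += delta
        if code.startswith("5"):
            failed += delta
    if total == 0:
        return 1.0  # no traffic in the window: counted as available (a policy choice)
    return (total - failed) / total


if __name__ == "__main__":
    before = {"200": 1_000, "404": 20, "500": 5}
    after = {"200": 10_950, "404": 70, "500": 55}
    print(f"availability: {availability(before, after):.3%}")
```

In Prometheus itself the same ratio would typically be written in PromQL, for example `sum(increase(http_requests_total{code!~"5.."}[30d])) / sum(increase(http_requests_total[30d]))`, where the metric name is again hypothetical; `increase()` also handles counter resets, with extrapolation that this sketch omits.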
We have adopted the Spotify model and work in Tribes composed of
BizDevOps squads; our SRE team is positioned centrally within NL, as a silo.
Diagram: the SRE team, with its Chapter Lead (CL) and Product Owner (PO), in an “enable & support” role.
Diagram: service reliability hierarchy, from the base up: Monitoring, Incident Response, Postmortem/Root Cause Analysis, Testing + Release Procedures, Capacity Planning, Development, Product.
Our SRE team enables engineering teams through delivery of tooling,
facilitation, consulting and education
We facilitate BizDevOps squads during postmortems
and consult whenever our help is needed in identifying or
fixing reliability issues.
We build tooling to enable BizDevOps squads. At the moment
we focus on Prometheus (alerting, white box monitoring and
traffic modeling) and Mattermost (ChatOps).
We educate others about SRE during demos and
we develop training materials.
We facilitate the creation of more SRE teams and
ask them to join our SRE community meetings
with the other NL-based SRE teams.
We are not on call: BizDevOps teams are
responsible for their own build and run.
We aim to reduce our time to repair through engineering, by improving our
monitoring with Prometheus and introducing ChatOps with Mattermost
Diagram: Prometheus pulls metrics and answers queries; Alertmanager pushes alerts.
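In this pull model, each service exposes its current metric values over HTTP and Prometheus scrapes them on its own schedule. The sketch below uses only the Python standard library to serve one counter in the Prometheus text exposition format; in practice the official client libraries do this for you, and the metric name, label and port here are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter: requests handled, keyed by HTTP status code.
REQUESTS_BY_CODE = {"200": 1027, "500": 3}


def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes and newlines, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    escaped_help = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {escaped_help}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        samples = [({"code": code}, count) for code, count in REQUESTS_BY_CODE.items()]
        body = render_counter("http_requests_total", "Total HTTP requests.", samples).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Prometheus would scrape http://<host>:8000/metrics periodically.
    HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

A matching Prometheus scrape job would list this service as a target, for example a `scrape_configs` entry whose `static_configs` point at `localhost:8000`. Because Prometheus pulls, a service that stops responding shows up as a failed scrape rather than as silence.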
And now for the E in SRE: Introducing the Reliability Toolkit
Diagram: the Reliability Toolkit, combining Alertmanager, a Model Builder and alert channels (SMS, e-mail, ChatOps), with metrics coming from client libraries in engineering frameworks, CollectD, NLA and other tools.
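The alerting side of such a toolkit can be sketched as a small bridge between Alertmanager and ChatOps. Alertmanager can POST grouped alerts as JSON to a webhook receiver, and Mattermost accepts messages on an incoming webhook as JSON with a `text` field. The Python sketch below formats one into the other and picks channels from a `severity` label; the severity-to-channel routing, the `summary` annotation and the webhook URL are our assumptions, not ING's configuration.

```python
import json
import urllib.request

# Hypothetical routing: which channels receive an alert, by its "severity" label.
CHANNELS_BY_SEVERITY = {
    "critical": ["sms", "email", "chatops"],
    "warning": ["email", "chatops"],
}
DEFAULT_CHANNELS = ["chatops"]


def channels_for(alert: dict) -> list[str]:
    """Channels for one alert from an Alertmanager webhook payload."""
    severity = alert.get("labels", {}).get("severity", "")
    return list(CHANNELS_BY_SEVERITY.get(severity, DEFAULT_CHANNELS))


def mattermost_message(payload: dict) -> dict:
    """Turn an Alertmanager webhook payload into a Mattermost incoming-webhook body."""
    alerts = payload.get("alerts", [])
    lines = [f"**[{payload.get('status', 'unknown').upper()}]** {len(alerts)} alert(s)"]
    for alert in alerts:
        labels = alert.get("labels", {})
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append(f"- {labels.get('alertname', '?')} ({labels.get('severity', 'none')}): {summary}")
    return {"text": "\n".join(lines)}


def post_to_mattermost(webhook_url: str, message: dict) -> int:
    """POST the message as JSON to a Mattermost incoming webhook; returns the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    example = {
        "status": "firing",
        "alerts": [{"labels": {"alertname": "HighErrorRate", "severity": "critical"},
                    "annotations": {"summary": "5xx ratio above 1%"}}],
    }
    print(mattermost_message(example)["text"])
    # post_to_mattermost("https://mattermost.example.com/hooks/<hook-id>", mattermost_message(example))
```

In a real setup, routing to SMS or e-mail would more likely live in Alertmanager's own route tree and receivers; the `channels_for` function only makes the idea explicit.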
Our learnings after two years of SRE at ING
People, Process, Technology
▪ Never compromise on mindset in hiring SREs.
▪ Assign a PO to protect team focus on engineering and to spread the SRE love.
▪ Consider what mix works well for you in terms of new and existing hires, or think about
possibilities of SRE internships.
▪ Test if SRE works for you by doing a pilot phase.
▪ As a team, agree on a vision of what SRE means to you and define a roadmap together.
▪ Learn from others through online resources, at conferences or company visits.
▪ Prepare to spend time explaining and promoting SRE and your tooling.
▪ Beer o’clock is great for team bonding.
▪ Make it attractive for others to use your tooling: take away pain from teams,
incorporate your tooling in widely used frameworks, find quick wins.
▪ Productization takes time, a lot of time. Don’t underestimate this.
▪ Consider scalability and ownership in your tooling strategy.
Questions?
