Virgin Money's quest for digital performance perfection
With more than 3.2 million customers and a vastly complex tech landscape, Virgin Money's IT team faces huge pressure to provide the ultimate digital banking experience. In this candid Q&A session, Andy Lofthouse will dive into the company's journey from alert storms and countless hours of problem hunting to rapid release cycles and precise digital experience insights, which has saved the company inordinate amounts of time and money.
Reactive to issues
Dynatrace Journey – From Synthetic to Full Stack
• Smartscape – vertical and horizontal topology
• Understand which services, hosts or processes are talking to each other
• Understand which services, processes and hosts are directly providing the application
• No configuration and easy deployment!
• Nodes highlight red if involved in a current problem
• Quick drill down to the desired component (modelled in the sketch below)
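To make the Smartscape idea concrete, here is a minimal sketch, my own illustration rather than Dynatrace's actual data model or API: vertical edges capture what runs on what, horizontal edges capture which services call each other, and a problem anywhere in the dependency chain turns the dependent application red.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    layer: str                                           # "application" | "service" | "process" | "host"
    runs_on: list["Node"] = field(default_factory=list)  # vertical: what this runs on
    calls: list["Node"] = field(default_factory=list)    # horizontal: what this talks to
    in_problem: bool = False

def is_red(node, seen=None):
    """A node highlights red if it, or anything it depends on, is in a problem."""
    seen = set() if seen is None else seen
    if id(node) in seen:                                 # guard against cycles
        return False
    seen.add(id(node))
    return node.in_problem or any(
        is_red(dep, seen) for dep in node.runs_on + node.calls
    )

# Hypothetical four-layer chain: login app -> auth service -> JVM process -> host
host = Node("host-01", "host", in_problem=True)          # e.g. CPU saturation
proc = Node("java-auth", "process", runs_on=[host])
auth = Node("AuthService", "service", runs_on=[proc])
app  = Node("LoginApp", "application", calls=[auth])

print(is_red(app))  # True: the app tile turns red
```

Walking the red edges downward from the application node is the "quick drill down to the desired component" that the slide describes.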
Why Dynatrace – first impressions
Quick time to value
• Not repeatable in Test and cannot be diagnosed with current tooling
• After months of investigation and customers being impacted, the root cause of the issue cannot be found
• Issue causes severe slowdowns and timeouts for users, eventually needing a manual failover to the DR site
• Operations team misled by current alerting on their investigation path
• Poor customer experience drives poor conversion rates
• 6 Virgin Money teams and one 3rd party were lost in War-room
• Has cost so far up to today
• Impacted by bad tweets
First 2 weeks - Incidents & Alerting
Foglight Alerts - 128
• 61% of them were false alerts
• 39% of them were genuine issues
• Of that 39%, half were duplicate alerts
• Only 26 were real once duplicates and false alerts were taken out
Dynatrace Problem Resolution - 100
• 42% resolved themselves, with Dynatrace reporting the problem as resolved
• The remaining 58% were genuine issues
• 100% accurate!
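As a rough sanity check, the figures on these two slides can be reproduced with simple arithmetic; only the totals and percentages come from the talk, and the rounding is my assumption.

```python
# Foglight: 128 alerts in the first two weeks
foglight_total = 128
false_alerts   = round(foglight_total * 0.61)      # ~78 false alerts
genuine        = foglight_total - false_alerts     # ~50 genuine alerts
duplicates     = genuine // 2                      # half of the genuine ones were duplicates
unique_real    = genuine - duplicates              # ~25, matching the "only 26 real" slide

# Dynatrace: 100 problems in the same window
dynatrace_total = 100
self_resolved   = round(dynatrace_total * 0.42)    # 42 problems closed as resolved
genuine_dt      = dynatrace_total - self_resolved  # 58 genuine, with zero false alerts

print(unique_real, genuine_dt)                     # ~26 real issues vs 58 accurate problems
```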
Noise caused by poor alerting + poor troubleshooting + no root-cause analysis
= 479 hours of investigation.
First value we saw
• Database CPU spikes every night between 8 and 9pm
• Peak login times
• Couldn't see this issue prior to Dynatrace
Chart: response time slowdown
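Below is a minimal sketch of how a recurring 8-9pm spike like this could be spotted from raw metric samples. The hourly CPU values are made up for illustration, and this is plain Python rather than Dynatrace's own automatic baselining.

```python
from collections import defaultdict
from statistics import mean, stdev

# (hour_of_day, cpu_percent) samples over a week; hour 20 carries the nightly spike
samples = [(h, 35.0) for h in range(24) if h != 20] * 7
samples += [(20, 92.0)] * 7

by_hour = defaultdict(list)
for hour, cpu in samples:
    by_hour[hour].append(cpu)

hourly_avg = {h: mean(v) for h, v in by_hour.items()}
overall = mean(hourly_avg.values())
spread  = stdev(hourly_avg.values())

# Flag hours that sit well above the daily baseline, night after night
spikes = [h for h, avg in hourly_avg.items() if avg > overall + 2 * spread]
print(spikes)  # [20] -> the peak-login-time window the team couldn't see before
```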
Improving collaboration across teams with shared metrics