In this presentation I will talk about what SRE and DevOps are and what reliability is. I will also cover the reliability approach in Competitive Gaming at Wargaming and show a few cases.
SRE vs DevOps
1. SRE vs DevOps
Feel the difference
Levon Avakyan / Competitive Gaming / l_avakyan@wargaming.net
2. Content
What I will talk about
• Definitions – to be on the same page
• SRE vs DevOps – a little bit of philosophy
• Approach – how to do it well
• Cases – how we do it in Competitive Gaming
4. Reliability
A little bit of theory
Reliability is theoretically defined as the probability of success (Reliability = 1 − Probability of Failure), as the frequency of failures, or, in terms of availability, as a probability derived from reliability, testability and maintainability. Reliability plays a key role in the cost-effectiveness of systems.
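As a rough illustration of the definitions on this slide, the sketch below computes reliability as 1 minus the probability of failure. The `availability` function uses the common steady-state formula MTBF / (MTBF + MTTR); that formula and both function names are assumptions added here for illustration and do not come from the slides.

```python
def reliability(p_failure: float) -> float:
    """Reliability as the probability of success: 1 - P(failure)."""
    if not 0.0 <= p_failure <= 1.0:
        raise ValueError("p_failure must be between 0 and 1")
    return 1.0 - p_failure


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability (assumed formula): MTBF / (MTBF + MTTR).

    MTBF reflects reliability, MTTR reflects maintainability.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours must be >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # A system that fails in 25% of attempts succeeds in 75% of them.
    print(reliability(0.25))
    # 999 hours between failures and 1 hour to repair: about "three nines".
    print(availability(999.0, 1.0))
```

The second function shows why maintainability matters: with the same failure rate, halving the repair time raises availability.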
5. Reliability Engineering
A little bit of theory
• Reliability engineering is engineering that
emphasizes dependability in the lifecycle
management of a product.
• Reliability engineering deals with the estimation,
prevention and management of high levels of
"lifetime" engineering uncertainty and risks of
failure.
6. Software Reliability
A little bit of theory
• Software Reliability (SR) depends on good
requirements, design and implementation. Software
reliability engineering relies heavily on a disciplined
software engineering process to anticipate and
design against unintended consequences.
7. Site Reliability Engineering
A little bit of theory
Site reliability engineering (SRE) is a discipline that incorporates aspects of software engineering and applies them to operations, with the goal of creating ultra-scalable and highly reliable software systems. SRE might be considered a subset of DevOps that possesses additional skill sets.
8. Development Operations
A little bit of theory
DevOps is a term used to refer to a set of practices that emphasize the collaboration and communication of both software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes. It aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and more reliably.
10. SRE (SR) vs DevOps
Comparison
Site Reliability Engineering
• Main focus is on creating ultra-scalable and highly reliable software systems
• It is an engineering specialization
• Fully embedded in the product lifecycle
Development Operations
• Main focus is on automating the deployment process for production and staging environments
• It is a role
• Mostly works with environments
11. SRE (SR) vs DevOps
Conclusion
• SRE (SR) is a broader concept than DevOps
• We cannot really put "versus" between SRE (SR) and DevOps, because they achieve similar goals with different approaches
14. Pre-production
Main purpose:
• Create a specification for Development
• Clarify all details with the business
Main artefacts are the requirements and the high-level design (HLD) of the new feature/product
SRE Role:
• Review and clarify the HLD
• Add specific requirements to improve reliability and reduce the impact on players in case of failures
15. Development
Main purpose:
• To develop the application
• To test the application
Main artefacts are the release tag, SDD, test suites, and regulations/automation for the release
SRE Role:
• Review and clarify the SDD
• Monitoring design
• Load and performance tests (tooling, environments)
• Stress tests
• Release preparations (tooling, massive migrations, release time estimation)
16. Release
Main purpose:
• Check that the application is ready to go to production
• Deliver the application to the production environment
Main artefacts are the released application and the release postmortem
SRE Role:
• Review regulations
• Automate the process with standard tools
17. Post-Release
Main purpose:
• Monitoring
• Maintenance
• Mitigating risks and reducing the impact on users in case of outages
Main artefacts are bugs and improvements for the dev team, and data for the product management team to analyze
SRE Role:
• L2+-L3 maintenance
• Data collection tools
18. Conclusion
• SRE is embedded in the whole product life cycle
• The main aim of SRE is to increase reliability
• The scope of responsibilities varies widely and depends on how the company is organized
23. Risks
World of Tanks Football Tournament
• High load
• A very long route to a battle, with many potential points of failure
• First heavy load for the Team Management System
• Many separate teams are working on the event
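The "long route" risk above can be made concrete with a small calculation. The sketch below assumes that every step on the route must work for a battle to happen and that steps fail independently, so the end-to-end reliability is the product of the per-step reliabilities. The step count and the 0.99 figures are made-up illustration values, not measurements from the tournament.

```python
from math import prod
from typing import Iterable


def series_reliability(component_reliabilities: Iterable[float]) -> float:
    """Reliability of a chain in which every component must work.

    Assumes independent failures, so the result is the product of the
    individual reliabilities.
    """
    values = list(component_reliabilities)
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must be between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    # Hypothetical route: five steps, each 99% reliable on its own.
    route = [0.99] * 5
    # The whole route comes out at roughly 0.951, lower than any single step.
    print(round(series_reliability(route), 3))
```

Each extra step on the route lowers the overall figure, which is why a long chain of services is listed as a risk in its own right.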
24. What we have done
World of Tanks Football Tournament
• Ran an end-to-end load and performance test of the system
• Got a player-count forecast from the publisher
• Created a schedule recommendation based on those numbers
• Added a safety day to the schedule
• Created tooling to move groups, steps, and battles of the tournament to another date
• Isolated battle processing and the API
• Created an auto-scaling configuration for workers
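The slides do not describe how the worker auto-scaling was configured. As one possible illustration of the idea, the sketch below sizes a worker pool from the current queue length so that the backlog drains within a target time. The function name, parameters, and limits are all hypothetical.

```python
import math


def desired_workers(
    queue_length: int,
    per_worker_rate: float,      # tasks one worker handles per second (assumed known)
    target_drain_seconds: float, # how quickly the backlog should be cleared
    min_workers: int = 1,
    max_workers: int = 50,
) -> int:
    """Return the worker count needed to drain the queue in the target time."""
    if per_worker_rate <= 0 or target_drain_seconds <= 0:
        raise ValueError("rate and target time must be positive")
    needed = math.ceil(queue_length / (per_worker_rate * target_drain_seconds))
    # Clamp to the configured limits so the pool never drops to zero
    # and never grows beyond what the environment can host.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # 1200 queued battle results, 2 tasks/s per worker, drain within a minute.
    print(desired_workers(1200, 2.0, 60.0))
```

Keeping a lower bound means there is always capacity for the first tasks of a burst, and the upper bound protects shared infrastructure when the forecast is exceeded.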
25. Global Map
Features:
• A potentially growing number of battles to process
• No room for failure, because it would affect the results of a 3-week event
27. Risks
Global Map
• High load
• New gameplay features
• New vector tile engine
• No possibility of moving battles
28. What we have done
Global Map
• Massive load test of the new vector tile engine
• Additional monitoring based on game logic
• Added requirements so that most workers can be scaled
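Monitoring based on game logic means checking that the game itself behaves as expected, not only that servers respond. The slides give no details, so the sketch below is a hypothetical example of one such check: it flags battles that were scheduled but have not started within a grace period. The `Battle` structure and the two-minute default are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Battle:
    battle_id: str
    scheduled_at: datetime
    started_at: Optional[datetime] = None  # None means the battle has not started


def overdue_battles(
    battles: List[Battle],
    now: datetime,
    grace: timedelta = timedelta(minutes=2),
) -> List[str]:
    """Return IDs of battles that should have started by now but did not."""
    return [
        b.battle_id
        for b in battles
        if b.started_at is None and now - b.scheduled_at > grace
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 20, 10)
    battles = [
        Battle("a", datetime(2024, 1, 1, 20, 0), datetime(2024, 1, 1, 20, 0)),
        Battle("b", datetime(2024, 1, 1, 20, 0)),  # never started: alert
        Battle("c", datetime(2024, 1, 1, 20, 9)),  # still within the grace period
    ]
    print(overdue_battles(battles, now))
```

A check like this can raise an alert even when every service reports healthy, which matters for an event where battles cannot be moved.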
29. Conclusion
• SRE (SR) is a broader concept than DevOps
• We cannot really put "versus" between SRE (SR) and DevOps, because they achieve similar goals with different approaches
• SRE is embedded in the whole product life cycle
• The main aim of SRE is to increase reliability
• The scope of responsibilities varies widely and depends on how the company is organized
Reliability can theoretically be defined as the probability of success, that is, reliability = 1 − probability of failure; as the frequency of failures; or, in terms of availability, as a probability derived from reliability, testability and maintainability. Reliability plays a key role in the cost-effectiveness of systems.
Reliability engineering is engineering that emphasizes dependability in the lifecycle management of a product.
Reliability engineering deals with the estimation, prevention and management of high levels of "lifetime" engineering uncertainty and risks of failure.
SR depends on correct requirements, architecture and implementation. Software SR relies heavily on the software development process to anticipate unintended consequences and design against them.
SRE is a discipline that incorporates aspects of software engineering and applies them to operations, with the goal of creating ultra-scalable and highly reliable software systems. SRE can be considered a subset of DevOps that possesses additional skill sets.
DevOps is a term used to refer to a set of practices that emphasize the collaboration and communication of both software developers and information technology (IT) professionals while automating the process of software delivery and infrastructure changes. It aims at establishing a culture and environment where building, testing, and releasing software can happen rapidly, frequently, and reliably.