
What is Site Reliability Engineering (SRE)


More from jeetendra mandal (20)


  1. What is Site Reliability Engineering (SRE)
     Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to IT infrastructure and operations. The main objective is to create highly reliable and scalable software systems. Site reliability engineering has been described as a specific implementation of DevOps. As a job role, it may be performed by solo practitioners or by teams, which are usually responsible for some combination of the following within a broader engineering organization: system availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Site reliability engineers often have backgrounds in software engineering, system engineering, or system administration. Focuses of site reliability engineering include automation, system design, and improvements to system resilience.
  2. What is Site Reliability Engineering (SRE)
     Site reliability engineering (SRE) is a software engineering approach to IT operations. SRE teams use software as a tool to manage systems, solve problems, and automate operations tasks. SRE takes the tasks that have historically been done by operations teams, often manually, and gives them to engineers or operations teams who use software and automation to solve problems and manage production systems. SRE is a valuable practice when creating scalable and highly reliable software systems. It helps manage large systems through code, which is more scalable and sustainable for system administrators (sysadmins) managing thousands or hundreds of thousands of machines. SRE helps teams find a balance between releasing new features and ensuring reliability for users. SRE also supports teams that are moving their IT operations from a traditional approach to a cloud-native approach.
  3. DevOps vs. SRE
     DevOps is an approach to culture, automation, and platform design intended to deliver increased business value and responsiveness through rapid, high-quality service delivery. SRE can be considered an implementation of DevOps. However, SRE differs from DevOps because it relies on site reliability engineers within the development team who also have an operations background to remove communication and workflow problems. The site reliability engineer role itself combines the skills of development teams and operations teams by requiring an overlap in responsibilities. SRE can help DevOps teams whose developers are overwhelmed by operations tasks and need someone with more specialized operations skills. When coding and building new features, DevOps focuses on moving
  4. Site Reliability Engineering (SRE) Tools
     The SRE approach is not written in stone. Tooling depends on the organization: look at the technology the organization works with and adopt the required tools accordingly.
     • Jenkins
     • CircleCI
     • JIRA
     • Git
     • Terraform
     • Ansible
     • Grafana
     • Kibana
  5. Benefits of SRE
     SRE is almost obsessively focused on reliability; it's in the name. This focus on reliability across the implementation means that operational expenses are minimized, points of failure are reduced and mitigated, and repeated functions that waste time and resources are automated. Together, these result in significant cost savings.
     • Higher levels of application reliability and resiliency
     • Increased efficiency through automation
     • Improved customer satisfaction and retention
     • A culture of continuous improvement
     • Managed on-call and emergency support
     • Software with good logging and diagnostics
  6. Drawbacks of SRE
     There are some drawbacks to the SRE approach, however. Perhaps the largest one is that it is still a relatively unproven concept. DevOps, by contrast, is a well-tested, battle-hardened option that is as common as it is understood. SRE, on the other hand, is still relatively recent and has a lower adoption rate. As such, it's not as proven, and fixes for its potential cracks may not be obvious. SRE also has a weakness in its requirement for strong and directive management. Because SRE walks a very thin line between business logic and implementation, it's very easy for an SRE team to go off track. The only fix for this is a stronger management body, which can result in micromanagement and loss of efficiency.
  7. SRE Best Practices
     • Ensuring reliability: getting systems back to steady state as quickly as possible
     • Eliminating toil: automating wherever possible
     • Blameless postmortems: driving better cross-team collaboration
     • Observing what matters: gaining full visibility into system health
     • Being proactive: living and breathing SLOs to identify and remediate issues before SLAs are violated
     • Architecting for resiliency: informing architectural design decisions to build more reliable systems
  8. What does a site reliability engineer do?
     Site reliability engineer is a unique role. It suits a sysadmin, a software developer with additional operations experience, or someone in an IT operations role who also has software development skills. SRE teams are responsible for how code is deployed, configured, and monitored, as well as the availability, latency, change management, emergency response, and capacity management of services in production. SRE teams determine the launch of new features by using service-level agreements (SLAs) to define the required reliability of the system through service-level indicators (SLIs) and service-level objectives (SLOs).
  9. Key Site Reliability Engineering Skills
     The skills required differ from organization to organization, depending largely on the type of application an organization runs and on how and where it is deployed and monitored. Beyond that, SREs need a strong focus on application monitoring and diagnostics.
     • Knowledge of version control
     • Knowledge of Linux (preferred)
     • A preference for automation over manual work
     • Knowledge of CI/CD
     • Troubleshooting skills
  10. Summary
     Site reliability engineers split their time between operations tasks and project work. According to SRE best practices from Google, site reliability engineers should spend at most 50% of their time on operations, and that time should be monitored to ensure they do not go over. The SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. SRE is fundamentally doing work that has historically been done by an operations team, but using engineers with software expertise and banking on the fact that these engineers are inherently both predisposed to, and have the ability to, substitute automation for
  11. THANK YOU
     Like the video and subscribe to the channel.
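
The SLI and SLO terms from slides 7 and 8, and the 50% operations cap from slide 10, can be made concrete with a little arithmetic. The Python sketch below is only an illustration, not something taken from the slides: the `Slo` type, the function names, the example service name, and the request counts are all hypothetical. The "error budget" framing (the share of failures an SLO still tolerates) is a common way to turn an SLO into a launch decision, and it is assumed here rather than stated in the deck.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    """A service-level objective (hypothetical type for this sketch)."""
    name: str
    target: float  # e.g. 0.999 means 99.9% of requests should be good


def sli(good_events: int, total_events: int) -> float:
    """Service-level indicator: the measured fraction of good events."""
    if total_events == 0:
        return 1.0  # no traffic, so nothing has failed
    return good_events / total_events


def error_budget_remaining(slo: Slo, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if total_events == 0:
        return 1.0  # nothing measured yet, budget untouched
    allowed_bad = (1.0 - slo.target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        # A 100% target leaves no budget at all.
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


def can_launch(slo: Slo, good_events: int, total_events: int) -> bool:
    """Assumed policy: allow feature launches only while budget remains."""
    return error_budget_remaining(slo, good_events, total_events) > 0


def over_ops_cap(ops_hours: float, total_hours: float, cap: float = 0.5) -> bool:
    """True if operations work exceeds the cap (50% by default)."""
    return total_hours > 0 and ops_hours / total_hours > cap
```

For example, with a 99.9% availability target and 1,000,000 requests, 1,000 failures are tolerated. If 250 requests failed, roughly three quarters of the budget is left and `can_launch` returns `True`. With 2,000 failures the budget is overspent and it returns `False`. Likewise, `over_ops_cap(24, 40)` flags an engineer who spent 24 of 40 hours on operations.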
