Brian Cory Sherwin
Site Reliability Engineer
LinkedIn
AutoRemediation and Workflow @ LinkedIn
Agenda
• Key Concepts
• Work Flow Ideas
• LinkedIn’s Solution
Separation of Powers
• Monitoring Systems
• Remediation Systems
• Action Systems
[Diagram: Workflow, Monitoring, Action Systems]
Simple Work Flow Example
Restart An Application
• Grab some logs
• Start an application
• Open a ticket
[Diagram: Gather Data, Restart, Ticket]
Building an AutoRemediation @ LinkedIn
Key Goals
• Broker between action systems
• Linear Execution of Events
• Collaboration and ease of use
• Focus on Simple use cases
[Diagram: Remediation Broker, Monitoring, Remote Execution, Ticketing]
Why Use a Workflow
• Guaranteed Data Collection
• Better Accountability
• Formalized automation
• Extensibility
[Diagram: Gather Data, Restart, Ticket]
Work Flow @ LinkedIn
• Linear Execution
• Best Effort
• Limited Work Flow Control
[Diagram: Remediation Broker, Monitoring, Remote Execution, Ticketing]
LinkedIn: Work Flow Control
Work Flow Control Types
• Best Effort
• Guaranteed
• Abort
• OnFailure (planned)
[Diagram: Gather Data, Restart, Ticket]
Questions?
• Brian Cory Sherwin (bcs)
• LinkedIn
• bsherwin@linkedin

AutoRemediation and Workflow at LinkedIn


Editor's Notes

  1. At least in my designs, the key concept is to separate the monitoring system, the workflow system, and the things doing the work. A Swiss Army knife might have a corkscrew and a screwdriver, but it isn’t necessarily very good at either job. This is potentially a system trying to solve your issues before they become issues. Do you want a purpose-built system, or something trying to masquerade as three separate systems?
  2. Let’s talk about a simple example of a work flow. This is a very simple example that I’ll reference a few times. Throughout the presentation I’ll refer to the whole work flow as a plan and to the individual units of work as jobs.
  3. We already had a healthy set of systems: monitoring, remote execution, code deployment. What we were missing was the glue between these systems. But how to glue them together? Focus on simple use cases that you’re already doing. Restarting a web server? Re-kicking a box? These are the things you should automate first so you don’t have to run them manually. Ensure that the system is easy to use. This should be one of the key design goals. Creating, running, and scheduling remediation work flows should not be challenging. That’s not to detract from the importance of understanding what is going on. At the end of the day you need to have faith in your monitoring system.
  4. Let’s go back to the example work flow. We can guarantee a data collection attempt. I’ve been in a few situations where problems were being fixed by the ops team without gathering good data to resolve them. As an app owner you should be aware of what data you need to fix an issue. Better accountability: we know exactly how many times we’ve done something. Ops teams can sometimes toil in the darkness restarting applications (or rely on simpler systems that just restart automatically). By keeping better records we can make better business decisions about fixing bugs. Is it a 0.1% problem or a 0.01% problem? Without good record keeping we’d never know. Related to the above, formalized automation: simpler solutions that restart applications automatically can more easily hide problems. Additionally, by using a formalized system we can train less technical people to use it. In addition to formalizing, extensibility is key. We do similar actions across our platform. We have dozens of applications with similar infrastructure. We can recycle automations from one group to the next without having to train people to use new systems.
  5. Linear: we execute jobs with no branching and no conditionals. Many work flows can be solved this way. Allowing branching work flows is not a necessary feature and can just lead to complicated configurations. Best effort: the monitoring system should be telling us to fix a problem. Each time the monitoring system tells us to fix something, we begin the work flow. If we fail, it shouldn’t be an issue, because the monitoring system will know it’s still wrong and remind us to run the work flow again. We offer users some limited work flow control options. We’ll detail those on the next slide.
  6. The key understanding here is what to do when plan health changes. If gathering data fails, do you want to skip the restart attempt? The answer varies by environment. The following descriptions are some of the work flow ideas we’ve come up with during our sojourn into auto remediation. Best Effort: runs only when plan state is healthy. A particular unit of work’s failure has no bearing on further execution. Guaranteed: runs regardless of plan state. Its failure moves the plan state to unhealthy. Abort: runs only when plan state is healthy; on failure, it makes the plan state unhealthy. OnFailure: runs when plan state is unhealthy. Since we’re still designing it, its success could possibly move plan state back to healthy (or perhaps leave it unhealthy).
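
The control types described in note 6 can be sketched as a small linear plan runner. This is a minimal illustration in Python, not LinkedIn's implementation: the names Control, Job, and run_plan are hypothetical, the jobs are stand-in functions that return True on success, and the OnFailure behaviour (leaving plan state unchanged on success) is an assumption, since the notes say it was still being designed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Control(Enum):
    BEST_EFFORT = "best_effort"
    GUARANTEED = "guaranteed"
    ABORT = "abort"
    ON_FAILURE = "on_failure"  # planned in the talk; semantics here are assumed


@dataclass
class Job:
    name: str
    action: Callable[[], bool]  # hypothetical unit of work, True on success
    control: Control


def run_plan(jobs: List[Job]) -> Tuple[bool, List[str]]:
    """Run jobs strictly in order (no branching); return (healthy, log)."""
    healthy = True
    log: List[str] = []
    for job in jobs:
        if job.control is Control.GUARANTEED:
            should_run = True          # runs regardless of plan state
        elif job.control is Control.ON_FAILURE:
            should_run = not healthy   # runs only when plan is unhealthy
        else:
            should_run = healthy       # Best Effort and Abort need a healthy plan
        if not should_run:
            log.append(f"{job.name}: skipped")
            continue
        try:
            ok = bool(job.action())
        except Exception:
            ok = False                 # an exception counts as a failed job
        log.append(f"{job.name}: {'ok' if ok else 'failed'}")
        # Only Guaranteed and Abort failures change plan state.
        if not ok and job.control in (Control.GUARANTEED, Control.ABORT):
            healthy = False
        # Assumption: a successful OnFailure job leaves plan state unchanged.
    return healthy, log


# The example plan from the slides, with data gathering failing.
plan = [
    Job("gather_data", lambda: False, Control.ABORT),
    Job("restart", lambda: True, Control.ABORT),
    Job("ticket", lambda: True, Control.GUARANTEED),
]
healthy, log = run_plan(plan)
print(healthy, log)
```

With gather_data marked Abort, its failure makes the plan unhealthy, so the restart is skipped while the Guaranteed ticket job still runs. Marking gather_data as Best Effort instead would let the restart proceed even when data collection fails, which is the per-environment choice the note describes.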