This is a review of how West Virginia University's Digital Services unit monitors and communicates system outages. In the past we had little monitoring coverage for our systems. Notices amounted to emails, which didn't work well at 2am. We have since combined several services (New Relic, Pingdom, Slack, PagerDuty, StatusPage.io) into one cohesive monitoring and communication workflow.
Case Study: Automating Outage Monitoring & Communication
1. WEST VIRGINIA UNIVERSITY
UNIVERSITY RELATIONS - DIGITAL SERVICES
CASE STUDY: AUTOMATING OUTAGE MONITORING & COMMUNICATION
Dave Olsen, Professional Technologist
University Relations - Digital Services
2. AGENDA
• The Problem: Lack of Monitoring and a Manual Communication Workflow
• The Solution: Automating All The Things
• Quick Overview of Our Connected Services
• In-depth Feature Tour of StatusPage.io
3. THE PROBLEM
• Slowly lost central monitoring of services
• Received only email notifications, and only for the three services monitored by a third-party tool that UR - Digital Services had set up
• Communication workflow wasn’t standardized
• Had no way to communicate during a complete system failure
4. THE SOLUTION
• Took responsibility for monitoring our own services
• Re-engaged Systems to find additional monitoring opportunities
• Evaluated solutions that could send voice, text, and push notifications in addition to email
• Documented a tiered communication workflow
• Identified an off-site hosting solution for outage communication
5. THE SOLUTION
MONITORING (automated, internal)
• app performance/uptime
• uptime
• hardware (planned)
1ST LEVEL COMM (automated, internal/external)
• voice, text, push notifications
• off-site website
• internal comm, logging
2ND LEVEL COMM (manual, external, WOPE)
• @wvuurdigital
• Listservs: outages, prof tech cohort
• Individual Emails: statuspage.io subscribers
8. THE LINCHPIN
StatusPage.io is the bridge between the automated monitoring and the manual second-level communication channels.
14. IMPORTANT!
Incidents are the only things in StatusPage.io that generate tweets & emails. Component updates do not.
Incidents are manual.
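The slides do not show how an incident gets opened, and the usual route is presumably the StatusPage.io dashboard. For teams that prefer a script a person runs by hand, a minimal sketch of the equivalent REST call might look like the following. It assumes the public Statuspage v1 API; the helper name and the page ID, API key, and component ID values are hypothetical placeholders, not details from the deck. The function only builds the request, so nothing is posted until someone deliberately sends it.

```python
import json
import urllib.request

API_ROOT = "https://api.statuspage.io/v1"
# Incident states accepted for a real-time incident in the Statuspage API.
INCIDENT_STATUSES = ("investigating", "identified", "monitoring", "resolved")


def build_incident_request(page_id, api_key, name, body,
                           status="investigating", component_ids=()):
    """Build (but do not send) a request that opens an incident.

    Hypothetical helper: a person still decides when to run it,
    which keeps incidents a manual step.
    """
    if status not in INCIDENT_STATUSES:
        raise ValueError(f"unknown incident status: {status}")
    incident = {"name": name, "status": status, "body": body}
    if component_ids:
        # Optionally tie the incident to affected components.
        incident["component_ids"] = list(component_ids)
    return urllib.request.Request(
        url=f"{API_ROOT}/pages/{page_id}/incidents",
        data=json.dumps({"incident": incident}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_incident_request(
        "PAGE_ID", "API_KEY",
        name="CMS unavailable",
        body="We are investigating reports that the CMS is unreachable.",
        component_ids=["COMPONENT_ID"],
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending is a separate, deliberate step: urllib.request.urlopen(req)
```

Because an incident is what triggers tweets and subscriber emails, keeping the send step separate from building the request gives the person on call a chance to review the wording first.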
16. IMPORTANT!
Component statuses can be automated. This way your page can show the real status of your system without manual intervention.
Automated updates can only toggle between major outage and operational.
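The slides do not say which mechanism drives the automated component updates. As one possible illustration, the two-state toggle could be sketched against the Statuspage v1 REST API as below. The helper name and the page ID, component ID, and API key values are hypothetical placeholders; a monitoring hook would supply the up/down result.

```python
import json
import urllib.request

API_ROOT = "https://api.statuspage.io/v1"


def build_component_update(page_id, component_id, is_up, api_key):
    """Build (but do not send) a request that sets one component's status."""
    # Mirror the two-state toggle: a check either passes or it fails.
    status = "operational" if is_up else "major_outage"
    payload = json.dumps({"component": {"status": status}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/pages/{page_id}/components/{component_id}",
        data=payload,
        method="PATCH",
        headers={
            "Authorization": f"OAuth {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Example: an uptime check has just failed for this component.
    req = build_component_update("PAGE_ID", "COMPONENT_ID",
                                 is_up=False, api_key="API_KEY")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The API also accepts intermediate states such as degraded performance, but a simple up/down check has no basis for choosing them, which is consistent with automated updates being limited to the two extremes.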
32. HIGHER ED CLIENTS
• Georgia State University
• University of Derby
• University of California - Davis
• Macquarie University
• San Jose State University
• University of Michigan
33. WRAPPING IT UP
StatusPage.io is now the hub for our automated and manual outage communication efforts.
34. THANK YOU & QUESTIONS
Dave Olsen, Professional Technologist
University Relations - Digital Services