1. ViaSat implemented a DevOps model and tools like Splunk, xMatters, Jira and HipChat to improve incident response times and enable automated collaboration across teams.
2. Use cases described how full closed loop incidents could be managed from initial alert to resolution. CI/CD pipelines allowed for automated deployments and documentation updates.
3. Benefits included reducing response times from 10 minutes to 30 seconds on average, empowering on-call staff to focus on fixing issues rather than administrative tasks, and enabling seamless escalation to ChatOps teams.
Modern Operations at Scale within Viasat – How to Structure Teams and Build Automated Toolset
1. Modern Operations at Scale at ViaSat
How to Structure Teams and Build Automated Toolsets
Chris Crocco | Network Solutions Engineer | Viasat | linkedin.com/in/christophercrocco
Marty Jackson | Director, Product Evangelist | xMatters | linkedin.com/in/martyjackson
3. IT is challenged to operate with
more agility and velocity
Business Demands
Legacy tools / tech
Quality/Reliability/Uptime
4. Change is hard and complex
Executive
Engineer
Developer
IT Ops
Did my unit tests fail?
When am I on-call next?
Too much info!
Are we having an outage?
When will I get my next status update?
This was not an issue on my box?
What is your monitor telling you?
What’s the dial-in number?
Did you see my email?
Open a ticket?
… and orchestration becomes
even more complex
5. Enable DevOps Automated Collaboration
Across Tools & Teams
PROCESS FLOW
DATA FLOW
AUTOMATED
ENGAGEMENT
RESPONSE DRIVEN
ORCHESTRATION
8. ViaSat-2: Providing
Access to the Best
Available Network
Worldwide
ViaSat-2 will be our first big step toward spreading
high-capacity coverage worldwide for fixed
broadband and mobility services for aviation and
maritime. We already operate a global Ku-band
network for thousands of mobile customers, including
government and commercial aircraft as well as sea-going
vessels.
9. Seven times the throughput of
any previous Ka-band Satellite
Coverage for most of North
America, Caribbean and Central
America
New technology and expanding
markets require a change in how
we support the network
ViaSat-2
10. Solutions
Engineering
Team
Focus
On performance management including
alerting, auto remediation and visualization
using Splunk, Python, Grafana, Jira, xMatters
and other technologies
DevOps
Work with multiple DevOps teams
Automation
Automation of escalation, repair and
response activities
13. Central NOC
The Network Operations Center (NOC)
served as the intermediary between
events, appropriate resources and
resolution
Manual outreach
It was a manual process that was
time-consuming, error-prone and
inconsistent
Communications
Before xMatters, our IT
communications process was managed
by our Operations team through
email (Outlook)
Why we moved to a DevOps model
Who is on call?
There were many situations where an
on-call resource was unknown
Staff fatigue
We were frequently forced to scramble
to engage staff, often at the expense of
their work/life balance
Customer Satisfaction
Incidents that affected our
infrastructure and, ultimately, our
customers were often not resolved in a
timely manner
14. The role of a
central ops
team
Central team performs end to end
performance monitoring and
protects customer experience
This provides balance between
DevOps and protecting customer
experience
Individual app teams build and
run services
15. Integrating a complex IT landscape
Customer Experience
ChatOps
Monitoring and Alerting
Current Integrations
Planned Integrations
DevOps/Agile Support
Online meetings
VoIP Conferencing
Customer Support
CI/CD Pipeline
Targeted Incident Management
Documentation
18. Full Closed Loop Incident
Splunk
User-defined, multi-metric alerting
that sends a webhook
to xMatters.
xMatters
Parses incoming JSON payload,
supplements with additional
information and initiates targeted
event notification to stakeholders.
HipChat
xMatters uses the HipChat API to
notify DevOps team rooms of the
incident and create an issue-specific
room for anyone who is “hands
on keyboard” for the event
JIRA
xMatters calls the JIRA API to
create or modify issue-type-specific
tracking of the alerted scenario
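The closed-loop flow above (alert webhook, enrichment, then fan-out to chat and ticketing) can be sketched as a few pure functions. This is only an illustration under stated assumptions: the payload field names (`alert_name`, `service`, `severity`), the `SERVICE_OWNERS` lookup table, and the action dictionaries are all hypothetical, and no real Splunk, xMatters, HipChat or JIRA API is called.

```python
import json

# Hypothetical enrichment table mapping a service to its owning team and
# current on-call contact. A real setup would read this from a schedule.
SERVICE_OWNERS = {
    "modem-provisioning": {"team": "provisioning-devops", "on_call": "alice"},
}


def parse_alert(raw_body):
    """Parse the webhook body and keep only the fields the flow needs."""
    payload = json.loads(raw_body)
    return {
        "alert_name": payload["alert_name"],
        "service": payload["service"],
        "severity": payload.get("severity", "unknown"),
    }


def enrich(alert):
    """Supplement the alert with ownership details (assumed lookup)."""
    default_owner = {"team": "central-ops", "on_call": None}
    owner = SERVICE_OWNERS.get(alert["service"], default_owner)
    return {**alert, **owner}


def plan_actions(alert):
    """Describe the follow-up steps as plain data instead of calling APIs."""
    subject = alert["alert_name"]
    room = "incident-" + subject.lower().replace(" ", "-")
    return [
        {"tool": "notify", "target": alert["on_call"] or alert["team"], "subject": subject},
        {"tool": "chat", "action": "post", "room": alert["team"], "subject": subject},
        {"tool": "chat", "action": "create_room", "room": room},
        {"tool": "ticket", "action": "create_or_update", "subject": subject},
    ]
```

Returning the planned actions as data keeps the sketch testable without network access; a real integration would replace each dictionary with a call to the corresponding tool.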
19. Targeted Notification
Notify only on
what’s important
xMatters parses the payload
to know what is needed to
take action, and what is
informational
Common Subject Headers
The event has the same
name across all tools
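One possible way to separate actionable from informational data, and to reuse a single subject line across tools, is sketched below. The set of actionable fields, the severity levels that trigger a notification, and the subject format are assumptions made for illustration, not the configuration described in the talk.

```python
# Hypothetical choice of which payload fields are needed to take action.
ACTIONABLE_FIELDS = {"alert_name", "service", "severity", "host"}


def split_payload(payload):
    """Split a payload into (actionable, informational) dictionaries."""
    actionable = {k: v for k, v in payload.items() if k in ACTIONABLE_FIELDS}
    informational = {k: v for k, v in payload.items() if k not in ACTIONABLE_FIELDS}
    return actionable, informational


def should_notify(actionable, notify_levels=("critical", "high")):
    """Notify people only for severities considered important (assumed levels)."""
    return actionable.get("severity") in notify_levels


def common_subject(actionable):
    """Build one subject string to reuse in notifications, chat and tickets."""
    return "[{}] {}: {}".format(
        actionable["severity"].upper(),
        actionable["service"],
        actionable["alert_name"],
    )
```

Generating the subject in one place is what lets the event carry the same name in every tool.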
20. USE CASE #1 - BENEFITS
1. Integrated tools removed gating elements
between network issues and first responders
2. They also removed the administrative
overhead of incident management so
on-call staff can focus on fixing the problem.
23. CI/CD Pipeline
Ansible
Deployment pipeline to allow
for automated deployments to
multiple nodes across an
environment
xMatters
Notifies of deployment start
and playbook outcome
JIRA
Tickets associated with bugs,
tasks and custom issue types
automatically updated based
on outcome
Confluence
Release notes and associated
documentation automatically
updated in internal wiki space
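The pipeline's follow-up steps can be pictured as a mapping from the playbook outcome to a notification, ticket updates and a release-note entry. The sketch below assumes hypothetical ticket keys, transition names ("Done", "Reopened") and message formats; it does not call Ansible, xMatters, JIRA or Confluence.

```python
def start_notice(release):
    """Message sent when a deployment begins."""
    return "Deployment of {} started".format(release)


def follow_ups(release, ticket_keys, succeeded):
    """Derive the follow-up work from the playbook outcome."""
    status = "succeeded" if succeeded else "failed"
    message = "Deployment of {} {}".format(release, status)
    return {
        "notification": message,
        "ticket_updates": [
            {
                "key": key,
                # Transition names are placeholders for a real workflow.
                "transition": "Done" if succeeded else "Reopened",
                "comment": message,
            }
            for key in ticket_keys
        ],
        # Only successful deployments produce a release-note entry here.
        "release_note": "{}: {}".format(release, ", ".join(ticket_keys)) if succeeded else None,
    }
```

For example, `follow_ups("v1.2.0", ["OPS-1", "OPS-2"], True)` yields a success notification, two ticket updates and one release-note line, while a failed run yields no release note.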
25. USE CASE #2 - BENEFITS
1. Continuous deployment without
continuous monitoring
2. Rollback and remediation via mobile
response
3. Automated documentation and
release information for stakeholders
27. Call Volume Based Alerting
Customer Calls
Proprietary tools check the
health of individual customer
service at time of call
xMatters
Webhook from the diagnostic tool
alerts the appropriate DevOps
team to the issue
HipChat
Issues requiring additional
resources and review are
routed to a central HipChat
room for ChatOps resolution
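The slides do not say how call volume is turned into an alert. One common approach, shown here purely as an assumption, is a sliding-window count of calls whose health check failed, per service. The class name, threshold and window length are hypothetical.

```python
from collections import defaultdict, deque


class CallVolumeMonitor:
    """Count unhealthy calls per service inside a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # service -> failure timestamps

    def record_call(self, service, healthy, timestamp):
        """Record one call; return True when the threshold is reached.

        Returns True on every unhealthy call while the count stays at or
        above the threshold, so callers may want to de-duplicate alerts.
        """
        if healthy:
            return False
        window = self._failures[service]
        window.append(timestamp)
        # Drop failures that have aged out of the window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.threshold
```

A `True` result is the point at which a webhook to the owning team would be sent.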
28. USE CASE #3 - BENEFITS
1. Direct communication of problems
to fix agents
2. Reduced burden on customers from
issues
3. Seamless pivot to ChatOps for group
level resolutions
30. Empowering our people, providing peace of mind:
Response Time Improvement
from 10 minutes down to 30 seconds on average for
Exede network events
95% improvement
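The 95% figure follows directly from the two response times quoted above, as this short check shows (the function name is only illustrative):

```python
def percent_improvement(before_seconds, after_seconds):
    """Relative reduction in response time, as a percentage."""
    return (before_seconds - after_seconds) * 100 / before_seconds


# 10 minutes = 600 seconds, reduced to 30 seconds.
print(percent_improvement(10 * 60, 30))  # 95.0
```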
34. Join us on a new DevOps Journey
San Francisco
13 June
New York City
20 June
http://www.xmatters.com/agilitytour2017
Chicago
22 June
London
29 June
35. Thank you!
Chris Crocco | Network Solutions Engineer | Viasat | linkedin.com/in/christophercrocco
Marty Jackson | Director, Product Evangelist | xMatters | linkedin.com/in/martyjackson