Modern Operations at Scale at ViaSat
How to Structure Teams and Build Automated Toolsets
Chris Crocco | Network Solutions Engineer | ViaSat | linkedin.com/in/christophercrocco
Marty Jackson | Director, Product Evangelist | xMatters | linkedin.com/in/martyjackson
Digital business transformations are hard
IT is challenged to operate with more agility and velocity
Business Demands
Legacy tools / tech
Quality/Reliability/Uptime
Change is hard and complex


Executive
Engineer
Developer
IT Ops
Did my unit tests fail?
When am I on-call next?
Too much info!
Are we having an outage?
When will I get my next status update?
This was not an issue on my box?
What is your monitor telling you?
What’s the dial-in number?
Did you see my email?
Open a ticket?
… and orchestration becomes even more complex
Enable DevOps Automated Collaboration
Across Tools & Teams
PROCESS FLOW
DATA FLOW
AUTOMATED ENGAGEMENT
RESPONSE DRIVEN ORCHESTRATION
ViaSat
ViaSat: Connecting the world is our mission
ViaSat-2: Providing Access to the Best Available Network Worldwide
ViaSat-2 will be our first big step toward spreading high-capacity coverage worldwide for fixed broadband and mobility services for aviation and maritime. We already operate a global Ku-band network for thousands of mobile customers, including government and commercial aircraft as well as seagoing vessels.
Seven times the throughput of any previous Ka-band satellite
Coverage for most of North America, Caribbean and Central America
New technology and expanding markets require a change in how we support the network
ViaSat-2
Solutions Engineering Team
Focus
On performance management, including alerting, auto-remediation and visualization, using Splunk, Python, Grafana, Jira, xMatters and other technologies
DevOps
Work with multiple DevOps teams
Automation
Automation of escalation, repair and response activities
The Need for Change
Deploying ViaSat-2
Moving to a DevOps model
Central NOC
The Network Operations Center (NOC) served as the intermediary between events, appropriate resources and resolution
Manual outreach
It was a manual process that was time-consuming, error-prone and inconsistent
Communications
Before xMatters, our IT communications process was managed by our Operations team through email/Outlook
Why we moved to a DevOps model
Who is on call?
There were many situations where the on-call resource was unknown
Staff fatigue
We were often forced to scramble to engage staff, often at the expense of their work/life balance
Customer Satisfaction
Incidents that affected our infrastructure and, ultimately, our customers were often not resolved in a timely manner
The role of a central ops team
Central team performs end-to-end performance monitoring and protects customer experience
This provides balance between DevOps and protecting customer experience
Individual app teams build and run services
Integrating a complex IT landscape
Customer Experience
ChatOps
Monitoring and Alerting
Current Integrations
Planned Integrations
DevOps/Agile Support
Online meetings
VoIP Conferencing
Customer Support
CI/CD Pipeline
Targeted Incident Management
Documentation
Use Cases
Use Case #1
Full Closed Loop
Incident
Full Closed Loop Incident
Splunk
User-defined, multi-metric alerting that sends a webhook to xMatters.
xMatters
Parses the incoming JSON payload, supplements it with additional information and initiates targeted event notification to stakeholders.
HipChat
xMatters uses the HipChat API to notify DevOps team rooms of the incident and create an issue-specific room for anyone who is “hands on keyboard” for the event.
JIRA
xMatters engages the JIRA API to create or modify issue-type-specific tracking of the alerted scenario.
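The four steps above can be sketched as a small Python pipeline. This is an illustrative sketch only, not ViaSat's or xMatters' implementation: the payload shape (search_name, result), the SERVICE_OWNERS table and every function name are assumptions made for the example, and the downstream steps are returned as plain dictionaries instead of real HipChat or JIRA API calls.

```python
import json

# Hypothetical lookup table: which team owns which service (not from the talk).
SERVICE_OWNERS = {"modem-gateway": "gateway-devops", "billing-api": "billing-devops"}


def parse_alert(raw_body: str) -> dict:
    """Parse a webhook body and pull out the fields the later steps need."""
    payload = json.loads(raw_body)
    result = payload.get("result", {})
    return {
        "name": payload.get("search_name", "unnamed alert"),
        "service": result.get("service", "unknown"),
        "metric": result.get("metric"),
        "value": result.get("value"),
    }


def enrich(alert: dict) -> dict:
    """Supplement the alert with the owning team; fall back to central ops."""
    enriched = dict(alert)
    enriched["team"] = SERVICE_OWNERS.get(alert["service"], "central-ops")
    return enriched


def build_actions(alert: dict) -> list:
    """Describe the downstream steps: notify, open a chat room, track an issue."""
    return [
        {"tool": "notification", "target": alert["team"], "subject": alert["name"]},
        {"tool": "chat", "room": f"incident-{alert['service']}", "subject": alert["name"]},
        {"tool": "issue-tracker", "summary": alert["name"], "assignee_group": alert["team"]},
    ]


if __name__ == "__main__":
    body = json.dumps({
        "search_name": "High packet loss",
        "result": {"service": "modem-gateway", "metric": "packet_loss_pct", "value": "7.5"},
    })
    for action in build_actions(enrich(parse_alert(body))):
        print(action)
```

Keeping parsing, enrichment and fan-out as separate functions means each step can be tested on its own, and an unknown service still reaches someone through the central-ops fallback.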
Targeted Notification
Notify only on what’s important
xMatters parses the payload to know what is needed to take action, and what is informational
Common Subject Headers
The event has the same name across all tools
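A minimal Python sketch of both ideas: splitting a payload into actionable and informational parts, and deriving one subject line to reuse in every tool. The talk does not say which fields count as actionable or how subjects are formatted, so ACTIONABLE_FIELDS, the field names and the subject format below are all hypothetical.

```python
# Hypothetical field names; the talk does not list which fields are actionable.
ACTIONABLE_FIELDS = ("service", "severity", "runbook")


def split_payload(payload: dict) -> tuple:
    """Separate what a responder must act on from background information."""
    actionable = {k: v for k, v in payload.items() if k in ACTIONABLE_FIELDS}
    informational = {k: v for k, v in payload.items() if k not in ACTIONABLE_FIELDS}
    return actionable, informational


def common_subject(payload: dict) -> str:
    """Build one subject line to reuse in notifications, chat and tickets."""
    return f"[{payload['severity'].upper()}] {payload['service']}: {payload['summary']}"


if __name__ == "__main__":
    event = {
        "service": "modem-gateway",
        "severity": "critical",
        "summary": "packet loss above threshold",
        "runbook": "restart-gateway",
        "search_id": "abc123",
        "host_count": 14,
    }
    actionable, informational = split_payload(event)
    print(common_subject(event))
    print("act on:", actionable)
    print("for reference:", informational)
```

Generating the subject once and passing the same string to every tool is what lets a responder match the notification, the chat room and the ticket at a glance.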
USE CASE #1 BENEFITS
1. Integrated tools removed gating elements between network issues and first responders
2. It also removed the administrative requirement of incident management so on-call staff can focus on fixing the problem.
Space matters: A kinder communications system
Use Case #2
CI/CD Pipeline
CI/CD Pipeline
Ansible
Deployment pipeline to allow for automated deployments to multiple nodes across an environment
xMatters
Notifies of deployment start and playbook outcome
JIRA
Tickets associated with bugs, tasks and custom issue types automatically updated based on outcome
Confluence
Release notes and associated documentation automatically updated in internal wiki space
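The outcome-driven updates described above could be planned along these lines. This Python sketch is an assumption-laden illustration, not the pipeline shown in the talk: DeploymentResult, plan_follow_ups, the ticket status names and the choice to publish release notes only after a successful run are all hypothetical, and no Ansible, JIRA or Confluence API is called.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentResult:
    """Outcome of one playbook run (hypothetical shape)."""
    release: str
    succeeded: bool
    failed_hosts: list = field(default_factory=list)


def plan_follow_ups(result: DeploymentResult, ticket_ids: list) -> list:
    """Turn a deployment outcome into notification, ticket and wiki steps."""
    outcome = "succeeded" if result.succeeded else "failed"
    steps = [{
        "tool": "notification",
        "message": f"Deployment {result.release} {outcome}",
        "failed_hosts": list(result.failed_hosts),
    }]
    # Hypothetical status names; a real tracker would use its own workflow.
    new_status = "Deployed" if result.succeeded else "Deployment Failed"
    for ticket_id in ticket_ids:
        steps.append({"tool": "issue-tracker", "ticket": ticket_id, "status": new_status})
    # Assumption: release notes are only published when the run succeeded.
    if result.succeeded:
        steps.append({
            "tool": "wiki",
            "page": f"Release notes {result.release}",
            "tickets": list(ticket_ids),
        })
    return steps


if __name__ == "__main__":
    ok = DeploymentResult(release="2017.06.1", succeeded=True)
    for step in plan_follow_ups(ok, ["OPS-101", "OPS-102"]):
        print(step)
```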
Centralized event information
All events in one place
Allows you to fail small, fail fast
USE CASE #2 - BENEFITS
1. Continuous Deployment without continuous monitoring
2. Rollback and remediation via mobile response
3. Automate documentation and release information for stakeholders
Use Case #3
Call volume based
alerting
Call Volume Based Alerting
Customer Calls
Proprietary tools check the health of the individual customer's service at the time of the call
xMatters
Webhook from the diagnostic tool alerts the appropriate DevOps team to the issue
HipChat
Issues requiring additional resources and review are routed to a central HipChat room for ChatOps resolution
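One way to turn per-call health checks into a volume-based alert is a sliding-window counter per service. The Python sketch below is an assumption, since the talk does not describe how the proprietary diagnostic tool decides to fire its webhook: the CallVolumeMonitor class, the threshold and the window length are hypothetical.

```python
from collections import defaultdict, deque


class CallVolumeMonitor:
    """Count unhealthy customer calls per service inside a sliding time window."""

    def __init__(self, threshold: int, window_seconds: int):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # service -> timestamps of unhealthy calls

    def record_call(self, service: str, timestamp: float, healthy: bool) -> bool:
        """Record one call; return True when the service should raise an alert."""
        if healthy:
            return False
        calls = self._calls[service]
        calls.append(timestamp)
        # Drop calls that have fallen out of the window.
        while calls and calls[0] <= timestamp - self.window_seconds:
            calls.popleft()
        return len(calls) >= self.threshold


if __name__ == "__main__":
    monitor = CallVolumeMonitor(threshold=3, window_seconds=300)
    for t in (0, 100, 200, 600):
        alert = monitor.record_call("residential-broadband", t, healthy=False)
        print(t, alert)
```

A True return is the point at which a real integration would send the webhook; calls whose health check passes never count toward the threshold.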
USE CASE #3 - BENEFITS
1. Direct communications of problems to fix agents
2. Reduced burden to customers from issues
3. Seamless pivot to ChatOps for group-level resolutions
Business Metrics
And benefits
Empowering our people, providing peace of mind:
Response Time Improvement
From 10 minutes down to 30 seconds on average for Exede network events
95% improvement
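The 95% figure follows directly from the two response times quoted above. A short Python check, where the function name is only illustrative:

```python
def percent_improvement(before_seconds: float, after_seconds: float) -> float:
    """Reduction in response time, as a percentage of the original time."""
    return (before_seconds - after_seconds) * 100 / before_seconds


if __name__ == "__main__":
    # 10 minutes = 600 seconds; (600 - 30) * 100 / 600 = 95
    print(percent_improvement(10 * 60, 30))
```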
xMatters
xMatters: Connecting Tools and Team Collaboratively
Join us on a new DevOps Journey
San Francisco, 13 June
New York City, 20 June
Chicago, 22 June
London, 29 June
http://www.xmatters.com/agilitytour2017
Thank you!
Chris Crocco | Network Solutions Engineer | ViaSat | linkedin.com/in/christophercrocco
Marty Jackson | Director, Product Evangelist | xMatters | linkedin.com/in/martyjackson
