Murphy’s Laws for Observability

© 2020 SPLUNK INC.
10 Murphy’s Laws of
Observability
And related guests
Dave McAllister

Senior Technical Evangelist
Dave McAllister

“Whatever can go wrong, will go wrong”

“Whatever can go wrong, will go wrong
at the worst possible time”

There are lots of Murphy’s categories
• On Cooking
• On Cars
• On Physics
• On Measurements
• On Vacations
• Murphy’s Technology Laws
• Murphy’s Military Laws
• Murphy’s Laws on Love and Sex
And spin-offs
• Abbott’s Admonitions
• Allen’s Axioms

Murphy’s for Observability #1
If you perceive that there are
four possible ways in which a
procedure can go wrong, and
circumvent these, then a fifth
way, unprepared for, will
promptly develop.

A Brief View of Observability
TL;DR: Observability is a quality of software, services, platforms, or products that
allows us to understand how systems are behaving.
For Engineering purposes: Designing / defining the exposure of state variables in
a manner to allow inference of internal behavior

Observability is a Data Problem
The more observable a system, the quicker we can understand why it’s acting up and fix it
• Metrics: Do I have a problem? (DETECT)
• Traces: Where is the problem? (TROUBLESHOOT)
• Logs: Why is the problem happening? (ROOT CAUSE)
Full-Stack Visibility & Context-Rich Insights

DATA
Every Solution Breeds
New Problems

Observability Challenges
• Microservices create complex interactions.
• Failures don't exactly repeat.
• Debugging multi-tenancy is painful.
• Monitoring alone can no longer save us.
Cynefin Framework
• Simple (Best Practice): Sense, Categorize, Respond
• Complicated (Good Practice): Sense, Analyze, Respond
• Complex (Emergent): Probe, Sense, Respond
• Chaotic (Novel): Act, Sense, Respond
• Disorder
Microservices
Elastic and Ephemeral

You can never run out of
things that can go wrong

Observability Allows Us to Monitor For the
Unknown Unknowns
Today’s knowns are yesterday’s unknowns
• Known knowns: things we are aware of AND understand
• Known unknowns: things we are aware of but DON’T understand
• Unknown knowns: things we are NOT aware of but understand
• Unknown unknowns: things we are NOT aware of and DON’T understand
Monitoring
Observability

Nothing is as easy as it
looks

EXAMPLE MICROSERVICE ARCHITECTURE

Complexity
Drift and Skew
Ephemeral Behavior
Cloud-compute Elasticity

Things get worse under
pressure

All about scale

The Scalability Envelope
System scale is multi-dimensional
• Kubernetes objects
• Backend services
• Deployed microservices
• Frequency of deployments
• Dimensions (e.g. pod labels) and high-cardinality
• Streaming vs batch & query analytics
• Alerting on multiple metric time series
Image source: https://github.com/kubernetes/community/blob/master/sig-scalability/configs-and-limits/thresholds.md
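A rough sketch of why dimensions and high cardinality weigh so heavily on scale: for a single metric, the number of distinct time series is bounded by the product of the distinct values of each dimension. The label names and counts below are hypothetical, chosen only for illustration.

```python
from math import prod
from typing import Mapping


def max_series(label_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric:
    the product of the number of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label cardinalities for a single metric.
labels = {"service": 20, "pod": 50, "endpoint": 10}
baseline = max_series(labels)

# Adding one high-cardinality label multiplies the bound; it does not add to it.
with_customer = max_series({**labels, "customer_id": 1000})

print(baseline, with_customer)
```

The bound is rarely reached in practice because not every label combination occurs, but the multiplicative growth is the part worth planning for.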

If it is not in the
computer, it doesn’t
exist

Sampling vs. No Sampling
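One way to put a number on the sampling trade-off, as a minimal sketch: if each trace is kept independently with a fixed probability (an assumption; real samplers may be smarter), the chance that every trace touching a rare failure is dropped can be computed directly. The sample rate and trace count are hypothetical.

```python
def chance_of_missing_all(sample_rate: float, anomalous_traces: int) -> float:
    """Probability that none of the anomalous traces is kept, assuming each
    trace is kept independently with probability sample_rate."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    return (1.0 - sample_rate) ** anomalous_traces


# Hypothetical numbers: keep 10% of traces, 5 traces hit the rare failure.
miss = chance_of_missing_all(0.10, 5)

# With no sampling (every trace kept), nothing is missed.
no_sampling = chance_of_missing_all(1.0, 5)

print(f"sampled: {miss:.1%} chance of missing the failure entirely")
print(f"unsampled: {no_sampling:.1%}")
```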

Availability is a function
of time

The resolution and speed of the
data directly impact the insights
you gain

Discussing accuracy and precision
Interchangeable?
• Accurate means the measurement is correct
• Precise means it is consistent with other measurements
Observability depends on both
But aggregation and analysis can skew this
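The distinction can be made concrete with a small sketch: treat accuracy as how far the average reading sits from the true value (bias) and precision as how tightly the readings agree with each other (spread). The "true" latency and both sets of readings are hypothetical.

```python
from statistics import mean, pstdev


def bias_and_spread(readings, true_value):
    """Return (bias, spread).
    bias: mean reading minus the true value (a proxy for accuracy).
    spread: population standard deviation (a proxy for precision)."""
    return mean(readings) - true_value, pstdev(readings)


TRUE_LATENCY_MS = 100  # hypothetical ground truth

accurate_but_imprecise = [90, 110, 95, 105]    # centered on 100, widely scattered
precise_but_inaccurate = [120, 121, 119, 120]  # tightly grouped, but 20 ms off

bias_a, spread_a = bias_and_spread(accurate_but_imprecise, TRUE_LATENCY_MS)
bias_p, spread_p = bias_and_spread(precise_but_inaccurate, TRUE_LATENCY_MS)

print(f"accurate but imprecise: bias={bias_a}, spread={spread_a:.2f}")
print(f"precise but inaccurate: bias={bias_p}, spread={spread_p:.2f}")
```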

Missing the point
• 10 sec: average = 13.9, 95% = 27.05
• First 5 sec: average = 16.4, 95% = 29.2
• Second 5 sec: average = 11.4, 95% = 19.4
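A minimal sketch of the effect, assuming the 95% figures are 95th percentiles computed with linear interpolation. The underlying data set is not given, so the ten one-per-second samples below are hypothetical values chosen to agree with the figures quoted above; they are not the original data.

```python
from statistics import mean


def percentile(values, q):
    """q-th quantile (q in 0..1) using linear interpolation between ranks."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def summarize(values):
    """Return (average, 95th percentile), both rounded to two decimals."""
    return round(mean(values), 2), round(percentile(values, 0.95), 2)


# Hypothetical samples, one per second for ten seconds.
samples = [32, 18, 12, 11, 9, 21, 13, 9, 8, 6]

whole = summarize(samples)             # one 10-second window
first_half = summarize(samples[:5])    # first 5-second window
second_half = summarize(samples[5:])   # second 5-second window

print("10 sec:", whole)
print("first 5 sec:", first_half)
print("second 5 sec:", second_half)
```

The single 10-second summary sits between the two 5-second summaries and hides how different the halves are, which is the point the slide is making.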

Data resolution ≠ Reporting resolution
• But both can be problematic
• Always deliver all data points, regardless of reporting resolution
• Finer granularity means more potential precision

Facets of Technology
[Diagram labels]
• Frontend; Backend; Environments
• Web User; Mobile User
• Synthetic Monitoring; RUM; Synthetics; User Monitoring; Endpoint Monitoring
• Infra. Monitoring; APM; Code Profiling; Network Performance Monitoring
• Events and Logs; Dashboards; Incident Response
• On-prem servers; Cloud; Network; VM; Container; Serverless
• Packaged Apps; Microservices
• Supply Chain; Online Services; Digital Experience
• Aggregation, Analysis, Visualization, Response

What is OpenTelemetry?
OpenTracing + OpenCensus = OpenTelemetry
OpenTelemetry: the next major version
of both OpenTracing and OpenCensus

Predictive behavior
Sometimes you want to know what’s
coming
• Prediction is only as good as the data
precision and accuracy
• Historic versus Sudden Change
• (Trend) Stationary
• Expect false positives (and negatives)
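As an illustration only, here is a sketch of the simplest kind of prediction: fit a straight line to recent samples and extrapolate to a threshold. It captures the "historic" case and nothing else; a sudden change or noisy data will make it wrong, which is one source of the false positives and negatives mentioned above. The function names, the disk-usage readings, and the threshold are all hypothetical.

```python
def linear_fit(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...
    Returns (slope, intercept). Needs at least two samples."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until(ys, threshold):
    """Estimated number of sampling intervals after the last sample until the
    fitted trend reaches threshold, or None if the trend is flat or falling."""
    slope, intercept = linear_fit(ys)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(ys) - 1))


# Hypothetical disk usage (%), sampled once per interval.
usage = [50, 52, 54, 56, 58]
eta = steps_until(usage, threshold=90)
print(eta)
```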

Hill’s Commentaries
• If we lose much by having things go
wrong, take all possible care
• If we have nothing to lose by change,
relax
• If we have everything to gain by
change, relax
• If it doesn’t matter, it does not matter
McAllister Corollary: Until it does
