Monitoring on Kubernetes
using Prometheus
Chandresh Pancholi
Engineer at AI
Kubernetes at Arvind Internet
● Our Infra is deployed on AWS
● Kubernetes minions are running on m4.xlarge instances
● Kubernetes version 1.7.5 in QA/Prod, 1.8.3 on Pre-prod
● QA/Dev, Pre-Prod & Production running on Kubernetes
● Total Pods ⇒ More than 350 (QA/Dev, Prod)
● Total services ⇒ More than 200 (QA/Dev, Prod)
● Running Mongo, MySQL, Redis, Hazelcast in Kubernetes in QA/Dev
What is Kubernetes?
Kubernetes is an open-source container orchestration engine and an
abstraction layer for managing full-stack operations of hosts and containers,
from deployment, scaling, and load balancing to rolling updates of
containerized applications across multiple hosts within a cluster. Kubernetes
makes sure that your applications are in the desired state.
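The "desired state" idea can be sketched as a tiny reconcile step: compare what is wanted with what is running and list the actions that close the gap. This is only an illustration, not Kubernetes code; `ClusterState` and `reconcile` are hypothetical names, and pods are reduced to simple numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Hypothetical, minimal view of a cluster: just a replica count."""
    replicas: int


def reconcile(desired: ClusterState, actual: ClusterState) -> List[str]:
    """Return the actions needed to move the actual state to the desired one."""
    diff = desired.replicas - actual.replicas
    if diff > 0:
        # Too few pods are running: start the missing ones.
        return [f"start pod {actual.replicas + i + 1}" for i in range(diff)]
    if diff < 0:
        # Too many pods are running: stop the surplus, highest number first.
        return [f"stop pod {actual.replicas - i}" for i in range(-diff)]
    # Already in the desired state: nothing to do.
    return []


if __name__ == "__main__":
    print(reconcile(ClusterState(replicas=3), ClusterState(replicas=1)))
```

A real controller runs a step like this repeatedly, so the cluster drifts back to the desired state after a pod or node fails.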
Kubernetes Architecture
Kubernetes Node Architecture
Master: The machine that controls Kubernetes nodes. This is where all task assignments
originate.
Node: These machines perform the requested, assigned tasks. The Kubernetes master
controls them.
Deployments: Provide declarative updates for Pods and ReplicaSets.
Pod: A group of one or more containers deployed to a single node. All containers in a pod
share an IP address, IPC, hostname, and other resources. Pods abstract network and
storage away from the underlying container. This lets you move containers around the
cluster more easily.
Service: This decouples work definitions from the pods. Kubernetes service
proxies automatically get service requests to the right pod—no matter where it
moves to in the cluster or even if it’s been replaced.
ConfigMaps: ConfigMaps allow you to decouple configuration artifacts from
image content to keep containerized applications portable.
Secrets: Secrets are intended to hold sensitive information, such as passwords,
OAuth tokens, and SSH keys. Putting this information in a Secret is safer and
more flexible than putting it verbatim in a pod definition or in a Docker image.
Monitoring at AI (earlier)
[Diagram: EC2, Sensu, Kubernetes, µServices]
Cons
1. Multiple monitoring systems
2. Difficulty in troubleshooting
3. Additional infrastructure cost to support three monitoring systems
4. Graphite doesn’t provide pod-level application metrics
5. Infra team needs to understand both Sensu and Prometheus alerting
6. Application metrics are single-dimensional, e.g. (a.b.c.d.99)
7. Grafana alerting for application metrics
Prometheus
● It was developed at SoundCloud by ex-Googlers
● Prometheus is a close cousin of Kubernetes
● A multi-dimensional data model with time series data identified by metric
name and key/value pairs
● Alerting and graphing are unified, using the same language.
● Time series collection happens via a pull model over HTTP
● Targets are discovered via service discovery or static configuration
● Provides multiple exporters that expose AWS EC2, Kafka, Mongo, Cassandra,
RMQ, and Redis metrics
Sample metrics
{endpoint="http",instance="100.110.140.82:8080",job="hello",namespace="default",pod="hello-946046218-397x2",service="hello-world"}
{endpoint="http",instance="100.98.66.79:8080",job="hello",namespace="default",pod="hello-946046218-5h39f",service="hello-world"}
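Because every series carries key/value labels, it can be selected on any dimension (pod, service, namespace) instead of by a fixed dotted path. The sketch below parses the two sample label sets above and filters them by label. `parse_labels` and `select` are hypothetical helpers written for illustration, not part of any Prometheus client library.

```python
import re

# Matches one key="value" pair inside a {...} label set.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

SAMPLES = [
    '{endpoint="http",instance="100.110.140.82:8080",job="hello",'
    'namespace="default",pod="hello-946046218-397x2",service="hello-world"}',
    '{endpoint="http",instance="100.98.66.79:8080",job="hello",'
    'namespace="default",pod="hello-946046218-5h39f",service="hello-world"}',
]


def parse_labels(series: str) -> dict:
    """Turn a '{k="v",...}' label set into a plain dict."""
    return dict(LABEL_RE.findall(series))


def select(series_list, **matchers):
    """Keep only the series whose labels equal every given matcher."""
    return [
        labels
        for labels in series_list
        if all(labels.get(key) == value for key, value in matchers.items())
    ]


if __name__ == "__main__":
    parsed = [parse_labels(s) for s in SAMPLES]
    # Both pods sit behind the same service ...
    print(len(select(parsed, service="hello-world")))
    # ... but a single pod can still be picked out by its own label.
    print(select(parsed, pod="hello-946046218-5h39f")[0]["instance"])
```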
node_exporter
Prometheus exporter for hardware and OS metrics exposed by *NIX kernels,
written in Go with pluggable metric collectors.
Metrics
● CPU (system, user, nice, iowait, steal, idle, irq, softirq, guest)
● Memory (Apps, Buffers, Cached, Free, Slab, SwapCached, PageTables, VmallocUsed, Swap, Committed, Mapped,
Active, Inactive)
● Load
● Disk Space Used in percent
● Disk Utilization per Device
● Disk IOs per Device (read, write)
● Disk Throughput per Device (read, write)
● Context Switches
● Network Traffic (In, Out)
● Netstat (Established)
● UDP stats (InDatagrams, InErrors, OutDatagrams, NoPorts)
● Conntrack
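Many of these metrics (CPU seconds per mode, network bytes, disk IOs) are exposed as ever-increasing counters, so a dashboard value such as "CPU busy %" is derived from the change between two scrapes. The sketch below shows that arithmetic for one CPU core under simplifying assumptions: the function names and sample numbers are made up, and the real PromQL `rate()` does more (for example, extrapolation over the query window).

```python
def per_second_rate(v1: float, t1: float, v2: float, t2: float) -> float:
    """Per-second increase of a counter between two samples.

    If the counter went backwards, assume it was reset to zero
    (for example, the exporter restarted) and count from zero.
    """
    if t2 <= t1:
        raise ValueError("second sample must be later than the first")
    delta = v2 - v1
    if delta < 0:
        delta = v2
    return delta / (t2 - t1)


def cpu_busy_percent(idle_start: float, idle_end: float,
                     t_start: float, t_end: float) -> float:
    """CPU busy % for one core, from its idle-seconds counter."""
    idle_rate = per_second_rate(idle_start, t_start, idle_end, t_end)
    return 100.0 * (1.0 - idle_rate)


if __name__ == "__main__":
    # Hypothetical scrapes 60 s apart: the core was idle for 45 of them.
    print(cpu_busy_percent(1000.0, 1045.0, 0.0, 60.0))
```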
AWS EC2 config
Relabelling Tags
__meta_ec2_availability_zone   Availability zone
__meta_ec2_instance_id         Instance ID
__meta_ec2_instance_state      Instance state
__meta_ec2_instance_type       Instance type
__meta_ec2_private_ip          Private IP
__meta_ec2_public_dns_name     Public DNS name
__meta_ec2_public_ip           Public IP
__meta_ec2_tag_<tagkey>        Custom tag key
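Relabelling is what turns these discovery-time `__meta_ec2_*` labels into labels that stay on the scraped series. Labels whose names start with `__` are dropped once relabelling finishes. The sketch below imitates two common steps in plain Python: copying one meta label to a permanent name, and a `labelmap`-style rename of every EC2 tag. It is a simulation for illustration only. The helper names and instance values are hypothetical, Python writes the capture group as `\1` where Prometheus uses `$1`, and the real handling of `__address__` is left out.

```python
import re


def copy_label(labels: dict, source: str, target: str) -> dict:
    """Copy the value of one label to a new label name, if it exists."""
    out = dict(labels)
    if source in labels:
        out[target] = labels[source]
    return out


def labelmap(labels: dict, regex: str, replacement: str = r"\1") -> dict:
    """Add a renamed copy of every label whose full name matches regex."""
    pattern = re.compile(regex)
    out = dict(labels)
    for name, value in labels.items():
        match = pattern.fullmatch(name)
        if match:
            out[match.expand(replacement)] = value
    return out


def drop_internal(labels: dict) -> dict:
    """Remove labels starting with '__', as happens after relabelling."""
    return {k: v for k, v in labels.items() if not k.startswith("__")}


if __name__ == "__main__":
    # Hypothetical labels for one discovered EC2 instance.
    discovered = {
        "__address__": "10.0.0.5:9100",
        "__meta_ec2_instance_id": "i-0abc",
        "__meta_ec2_availability_zone": "us-east-1a",
        "__meta_ec2_tag_Name": "web-1",
    }
    step1 = copy_label(discovered, "__meta_ec2_availability_zone",
                       "availability_zone")
    step2 = labelmap(step1, r"__meta_ec2_tag_(.+)")
    print(drop_internal(step2))
```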
Alerting
Approach #1 - Prometheus on EC2
[Diagram: EC2, Kubernetes, node ex, µServices, AWS EC2]
#1. Getting EC2 server metrics is easy and straightforward: Prometheus
provides EC2 service discovery.
#2. Getting Kubernetes and application metrics is far more complex: it takes 300+
lines of configuration to support just Kubernetes metrics.
Approach #2 - Use Prometheus Operator
What is Prometheus Operator?
The Prometheus Operator creates, configures, and manages Prometheus
monitoring instances. It automatically generates monitoring target configurations
based on familiar Kubernetes label queries.
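The "label query" idea can be sketched as follows: given a selector, pick every service whose labels match it and emit a scrape target for each. This is a simplified illustration, not the Operator's actual code. The service records, ports, and function names are hypothetical, and only exact-match selectors are modelled.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def generate_targets(selector: dict, services: list) -> list:
    """Build 'name:port' scrape targets for services matching the selector."""
    return sorted(
        f"{svc['name']}:{svc['port']}"
        for svc in services
        if matches(selector, svc["labels"])
    )


if __name__ == "__main__":
    # Hypothetical services; only the labelled ones should be scraped.
    services = [
        {"name": "hello-world", "port": 8080,
         "labels": {"app": "hello", "monitor": "true"}},
        {"name": "billing", "port": 9090,
         "labels": {"app": "billing", "monitor": "true"}},
        {"name": "legacy", "port": 80, "labels": {"app": "legacy"}},
    ]
    print(generate_targets({"monitor": "true"}, services))
```

With this approach a new service is picked up by adding a label to it, rather than by editing the Prometheus configuration by hand.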
ServiceMonitor Custom Resource Definition (CRD)
Prometheus Custom Resource Definition (CRD)