@snehainguva
digitalocean.com
containers
the what, why, and how
about me
software engineer @DigitalOcean
delivery team
kubernetes, prometheus, terraform
the plan:
● Build your own container
● Containers vs. VMs
● Container ecosystem
what is a container?
“a lightweight OS-level virtualization method”
“stand-alone piece of executable software”
“NOT a virtual machine”
build your own container
1. run input commands with arguments
2. add hostname limitations
3. add process ID limitations
4. add mount point/filesystem limitations
let’s start with a basic “container”
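package main

import (
	"fmt"
	"os"
	"os/exec"
)

// invoked as: <binary> run <cmd> [args...]
func main() {
	if len(os.Args) < 3 {
		panic("usage: run <cmd> [args...]")
	}
	switch os.Args[1] {
	case "run":
		run()
	default:
		panic("what?")
	}
}

// run executes the requested command with the parent's stdio attached.
func run() {
	fmt.Printf("running %v\n", os.Args[2:])
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	must(cmd.Run())
}

// must panics on any error to keep the demo short.
func must(err error) {
	if err != nil {
		panic(err)
	}
}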
how can we restrict hostname access?
namespaces!!!
UTS namespace
func run() {
	fmt.Printf("running %v\n", os.Args[2:])
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	// CLONE_NEWUTS gives the command its own hostname and domain name (needs "syscall").
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS,
	}
	must(cmd.Run())
}
what about PID access?
UTS + PID namespace: attempt 1
func run() {
	fmt.Printf("running %v\n", os.Args[2:])
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	// Add a new PID namespace alongside the UTS namespace.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}
	must(cmd.Run())
}
UTS + PID namespace: attempt 2
// run re-executes this same binary as "child" inside the new namespaces.
// main must also route the "child" argument to child() for this to work.
func run() {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...)
	cmd.Stdin = os.Stdin
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}
	must(cmd.Run())
}

// child runs inside the new namespaces and starts the requested command.
func child() {
	fmt.Printf("running %v as pid %v\n", os.Args[2:], os.Getpid())
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	must(cmd.Run())
}
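Attempt 2 only works if main also knows about the "child" argument, which the slide code leaves out. A minimal sketch of that routing follows; the helper name handlerFor is hypothetical and it only returns the name of the function to call, so it can run without root or namespaces.

```go
package main

import (
	"fmt"
	"os"
)

// handlerFor (hypothetical helper) maps the first argument to the name of
// the function that should handle it. "child" is the extra case that the
// re-exec through /proc/self/exe relies on.
func handlerFor(args []string) (string, error) {
	if len(args) < 2 {
		return "", fmt.Errorf("usage: run|child <cmd> [args...]")
	}
	switch args[1] {
	case "run":
		return "run", nil
	case "child":
		return "child", nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[1])
	}
}

func main() {
	name, err := handlerFor(os.Args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// In the real demo this is where run() or child() would be called.
	fmt.Println("would call:", name)
}
```

In the demo itself, the two cases would call run() and child() directly instead of returning a name.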
UTS + PID + MNT namespace: attempt 1
func run() {
	cmd := exec.Command("/proc/self/exe", append([]string{"child"}, os.Args[2:]...)...) // link to currently running process
	cmd.Stdin = os.Stdin
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	// CLONE_NEWNS adds a new mount (MNT) namespace.
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	must(cmd.Run())
}
Initial mounts in the new MNT namespace are inherited from the creating namespace → the filesystem still looks the same as the host's
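One way to see the inherited mounts is to print the mount table from inside and outside the new namespace and compare. The sketch below assumes a Linux host where /proc/self/mounts is readable; the mount type and parseMounts helper are hypothetical names used only for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mount holds the first three fields of one mount table line.
type mount struct {
	Source, Target, FSType string
}

// parseMounts reads text in the /proc/self/mounts format
// ("source target fstype options dump pass") and skips short lines.
func parseMounts(table string) []mount {
	var out []mount
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 3 {
			continue
		}
		out = append(out, mount{Source: f[0], Target: f[1], FSType: f[2]})
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/self/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read mount table:", err)
		return
	}
	for _, m := range parseMounts(string(data)) {
		fmt.Printf("%-20s %-30s %s\n", m.Source, m.Target, m.FSType)
	}
}
```

Run on the host and again as the containerized command: with only CLONE_NEWNS added, the two listings should match, which is why a new root filesystem is the next step.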
next step: UTS + PID + MNT namespace + new root filesystem
example
func child() {
	fmt.Printf("running %v as pid %v\n", os.Args[2:], os.Getpid())
	cmd := exec.Command(os.Args[2], os.Args[3:]...)
	cmd.Stdin = os.Stdin
	cmd.Stderr = os.Stderr
	cmd.Stdout = os.Stdout
	// Switch to the new root filesystem, which must already exist at /home/rootfs.
	must(syscall.Chroot("/home/rootfs"))
	must(os.Chdir("/"))
	// Mount a fresh proc so tools like ps see the new PID namespace.
	must(syscall.Mount("proc", "proc", "proc", 0, ""))
	must(cmd.Run())
}
TODO
what is a container?
process with isolation,
shared resources, and
layered filesystems
namespace: Linux kernel feature that isolates and virtualizes system resources for a collection of processes and their children
● PID: gives a process its own view of a subset of system processes ✔
● MNT: gives a process its own mount table and allows it to have its own filesystem ✔
● NET: gives a process its own network stack (a container can have virtual ethernet pairs to link to the host or other containers)
● UTS: gives a process its own view of the system hostname and domain name ✔
● IPC: isolates inter-process communication (e.g. message queues)
● USER: a more recent namespace that maps process UIDs to a different set of UIDs on the host (can map the container's root UID to an unprivileged UID on the host)
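Each namespace in the list above corresponds to one CLONE_NEW* bit passed in Cloneflags. As a rough sketch, the table below copies the flag values from the Linux headers into plain constants so it runs on any platform; the cloneFlags map and flagsFor helper are hypothetical names, and real code would normally use the syscall.CLONE_NEW* constants instead.

```go
package main

import (
	"fmt"
	"sort"
)

// cloneFlags maps the short names used on the slide to the Linux
// CLONE_NEW* flag values (copied here as an assumption-free lookup
// table so the example does not depend on a Linux-only package).
var cloneFlags = map[string]uintptr{
	"MNT":  0x00020000, // CLONE_NEWNS
	"UTS":  0x04000000, // CLONE_NEWUTS
	"IPC":  0x08000000, // CLONE_NEWIPC
	"USER": 0x10000000, // CLONE_NEWUSER
	"PID":  0x20000000, // CLONE_NEWPID
	"NET":  0x40000000, // CLONE_NEWNET
}

// flagsFor ORs together the flags for the named namespaces.
func flagsFor(names ...string) (uintptr, error) {
	var flags uintptr
	for _, n := range names {
		f, ok := cloneFlags[n]
		if !ok {
			return 0, fmt.Errorf("unknown namespace %q", n)
		}
		flags |= f
	}
	return flags, nil
}

func main() {
	names := make([]string, 0, len(cloneFlags))
	for n := range cloneFlags {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%-4s 0x%08x\n", n, cloneFlags[n])
	}
	// The three namespaces marked with a check mark in the list.
	demo, _ := flagsFor("UTS", "PID", "MNT")
	fmt.Printf("UTS|PID|MNT = 0x%08x\n", demo)
}
```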
cgroups: control groups collect a set of process or task IDs together and apply limits, such as on resource utilization
● Enforce fair/unfair resource sharing between processes
● Exposed by the kernel as a special filesystem that you mount
● Add a process or thread by writing its ID to the tasks file, and read/configure values by editing files in the group's subdirectory
layered filesystems: an efficient way to give each container its own copy of the root filesystem
● one of the reasons why it is easy to move containers around
● can “copy on write” (Btrfs)
● can use “union mounts” (aufs, OverlayFS), a way of combining multiple directories
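With OverlayFS, a union mount is described by an option string that names the read-only lower layers, a writable upper directory, and a work directory. The sketch below only builds that string and prints the equivalent mount command; overlayOptions is a hypothetical helper and every path is made up for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayOptions (hypothetical helper) builds the option string for an
// overlay mount. Lower layers are joined with ":" and the leftmost one
// is the topmost layer; writes land in upper.
func overlayOptions(lower []string, upper, work string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s",
		strings.Join(lower, ":"), upper, work)
}

func main() {
	// Two shared read-only layers plus one writable layer per container.
	opts := overlayOptions(
		[]string{"/layers/app", "/layers/base"},
		"/containers/c1/upper",
		"/containers/c1/work",
	)
	fmt.Printf("mount -t overlay overlay -o %s /containers/c1/rootfs\n", opts)
}
```

Several containers can reuse the same lower layers while each gets its own upper directory, which is what keeps the per-container copy cheap.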
containers vs. VMs
Source: http://electronicdesign.com/dev-tools/what-s-difference-between-containers-and-virtual-machines
VMs
● Hypervisors run software on physical servers to emulate a particular hardware system (aka a virtual machine)
● Each VM runs a full copy of the operating system (OS)
● Hardware is also virtualized
● Can run multiple applications
containers
● Run isolated processes on a single server or host operating system (OS)
● Can migrate only to servers with compatible OS kernels
● Best for a single application
container ecosystem
● Container runtime
● Orchestration tools
● As-a-service
containers
Source: https://docs.docker.com/engine/understanding-docker/
https://coreos.com/rkt/docs/latest/rkt-vs-other-projects.html#rkt-vs-docker
container orchestration
Source: https://github.com/nkhare/container-orchestration/blob/master/kubernetes/README.md
___ as-a-service
container service, managed clusters, etc.
Source: https://coreos.com/tectonic/
sources
● What is a Container, Really?, Liz Rice
● Build Your Own Container Using Less than 100 Lines of Go, Julian Friedman
● My demo code