Checkpoint-restore in userspace.
Are we there yet?

                           Pavel Emelyanov
                      LinuxCon Europe 2012
What is C/R and what is it for?

C/R is the ability to snapshot an application's state and restore the
  application from that state at any later time and place.


Usage scenarios:
        – Live migration
        – Reboot-less kernel update
        – Application start-up boost
        – Working environment snapshots
        – HPC load balancing
        – ...




                                          2
Is it possible to do all these nice things now?


                      Yes!
                     Almost.

             And we're close to it!


               This talk answers:
 ✔ How shall we be able to do it?
 ✔ How close to it are we?
 ✔ How far from “impossible to” are we?
 ✔ What has happened since then?



                         3
A brief history of C/R in Linux
2005       OpenVZ project starts, with live-migration support as an all-in-kernel feature
2008       First collaborative attempt to get C/R upstream
2010       First more-or-less complete version (over 100 patches)
2011       First attempt to do C/R mostly in user-space
Jan 2012   Linus decided to merge first set of patches upstream
Jul 2012   CRIU v0.1
Sep 2012   CRIU v0.2 + LXC support




                                                          4
CRIU project ultimate goal
[Diagram: a group of applications (APP) with their resources (timers, FS,
creds, MM, ...), some of them shared, living in an environment with IPC and
network, is dumped into an image and later restored as the same group of
applications with the same IPC and network environment.]

                                             5
CRIU project concept

[Diagram: dump: the CRIU tool asks the kernel "What files are opened?" about
an APP that holds an FD obtained via open. Restore: the restored ~APP gets
its FD back via open.]




                                        6
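
The dump/restore idea on this slide can be sketched for plain files using only
existing interfaces. This is an illustrative sketch, not CRIU's actual code:
the function names dump_open_files and restore_file are made up here, and it
assumes a Linux /proc with fd and fdinfo entries and handles regular files only.

```python
import os


def dump_open_files(pid):
    """Collect path, offset and flags for every file descriptor of `pid`."""
    files = {}
    fd_dir = f"/proc/{pid}/fd"
    for name in os.listdir(fd_dir):
        try:
            path = os.readlink(f"{fd_dir}/{name}")
            with open(f"/proc/{pid}/fdinfo/{name}") as info:
                fields = dict(line.split(":", 1) for line in info if ":" in line)
        except OSError:
            continue  # the descriptor was closed while we were looking
        files[int(name)] = {
            "path": path,
            "pos": int(fields["pos"]),
            "flags": int(fields["flags"], 8),  # fdinfo prints flags in octal
        }
    return files


def restore_file(entry):
    """Re-open a regular file and move it to the dumped offset."""
    fd = os.open(entry["path"], entry["flags"])
    os.lseek(fd, entry["pos"], os.SEEK_SET)
    return fd
```

Calling dump_open_files(os.getpid()) returns a dictionary keyed by descriptor
number; passing one of its entries to restore_file yields a new descriptor at
the same offset. Pipes, sockets and unlinked files need more than this.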
Existing kernel APIs
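[Diagram: kernel interfaces used for dump and restore: proc, system calls and
netlink. Some provide information only about the calling task itself
("about self"), others about any task ("about anybody").]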




                                      7
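
One way to read the "about self" / "about anybody" split is shown below for a
single piece of state, the current working directory. This is only an
illustration chosen for this note; the two function names are hypothetical.

```python
import os


def cwd_about_self():
    # getcwd(2) can only answer for the calling task.
    return os.getcwd()


def cwd_about_anybody(pid):
    # The proc symlink answers for any task the caller is allowed to inspect.
    return os.readlink(f"/proc/{pid}/cwd")
```

A dumper works from outside the target, so it needs the second kind of
interface; the first kind is what the restored task can use on itself.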
How CRIU grows up

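[Diagram: an APP holds a resource FOO obtained from the kernel ("Get FOO").
On dump the CRIU tool needs "Info on FOO-s"; where the kernel cannot provide
it ("?"), the kernel is extended ("Info FOO ++"). On restore the ~APP must
"Get FOO back"; where the existing API cannot do that ("X"), the kernel is
extended ("Get FOO ++").]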




                                   8
CRIU project grow-up concept (Linus vision)




... this is a project by various mad Russians to perform c/r mainly from
userspace, with various oddball helper code added into the kernel where the
need is demonstrated.
So rather than some large central lump of code, what we have is little bits
and pieces popping up in various places which either expose something new or
which permit something which is normally kernel-private to be modified...




                                         9
Kernel impact

            ~110 patches merged        9 new features appeared (1 C/R-only)
            ~15 patches in flight      2 new features to come




                                   10
The most interesting new features in kernel
   Parasite code injection
           – Reads task state that a task can currently retrieve only about itself



   The kcmp system call
           – Helps check which kernel objects are shared between processes


   Socket information dumping via netlink (sock_diag)
           – Extensible engine for retrieving socket state



   TCP repair mode
           – Reads the internal state of a TCP connection
               and reconstructs it from scratch on a freshly created socket




                                             11
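
As a rough illustration of the kcmp item above, the sketch below asks the
kernel whether two descriptors refer to the same open file. It is a sketch
under assumptions: same_open_file is a name invented here, the syscall is
issued through ctypes because Python has no wrapper for it, and only two
architectures' syscall numbers are listed.

```python
import ctypes
import platform

# kcmp(2) syscall numbers are per-architecture; only two are listed here,
# which is a simplification made for this sketch.
KCMP_SYSCALL_NR = {"x86_64": 312, "aarch64": 272}
KCMP_FILE = 0  # "compare two file descriptors", from <linux/kcmp.h>


def same_open_file(pid1, fd1, pid2, fd2):
    """True/False if kcmp can compare the descriptors, None if it is unusable."""
    nr = KCMP_SYSCALL_NR.get(platform.machine())
    if nr is None or ctypes.sizeof(ctypes.c_void_p) != 8:
        return None  # unknown architecture or 32-bit userspace
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    res = libc.syscall(
        ctypes.c_long(nr),
        ctypes.c_long(pid1), ctypes.c_long(pid2),
        ctypes.c_long(KCMP_FILE),
        ctypes.c_ulong(fd1), ctypes.c_ulong(fd2),
    )
    if res < 0:
        return None  # e.g. ENOSYS or EPERM: kernel or sandbox refuses the call
    return res == 0  # 0 means both descriptors share one kernel file object
```

For example, a descriptor and its os.dup() copy should compare as the same
file, while the two ends of a pipe should not. A dumper can use this to avoid
saving a shared file twice.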
Other new features in kernel
   Virtual net device indices
           – Allow restoring network devices in a namespace



   Proc map_files directory
           – Find out exactly which file is mapped
           – Mapping sharing info



   Socket peeking offset
           – Allows peeking at socket queues
                 (reading without removing data from the queue)



   More socket options readable via getsockopt
           – Bound device
           – Packet filter




                                                  12
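
The socket peeking offset can be tried on a UNIX stream socket pair, as in the
sketch below. The helper name peek_in_steps is hypothetical, and the numeric
value of SO_PEEK_OFF is hard-coded as an assumption (42 in the generic Linux
headers; a few architectures use a different number).

```python
import socket

# Assumed value from asm-generic/socket.h; differs on a few architectures.
SO_PEEK_OFF = 42


def peek_in_steps(data, first, second):
    """Peek `first` then `second` bytes, then read everything normally."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        try:
            # 0 enables the peek offset, starting at the head of the queue.
            b.setsockopt(socket.SOL_SOCKET, SO_PEEK_OFF, 0)
        except OSError:
            return None  # option not supported here
        a.sendall(data)
        head = b.recv(first, socket.MSG_PEEK)   # offset advances past `head`
        tail = b.recv(second, socket.MSG_PEEK)  # continues where `head` ended
        drained = b.recv(len(data))             # a normal read still sees all data
        return head, tail, drained
    finally:
        a.close()
        b.close()
```

Each peek continues where the previous one stopped, and the queued data is
still there for the final normal read, which is what lets a dumper copy a
socket queue without consuming it.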
CRIU features so far
        – x86_64 architecture
        – Process tree linkage
        – Multi-threaded apps
        – Memory mappings of all kinds
        – Terminals, groups and sessions
        – Open files (+ shared and unlinked)
        – Established TCP connections
        – UNIX sockets
        – LXC container environment

    Kernel v3.6
    IPC, network, non-POSIX files (inotify, epoll, etc.), ...




                          13
How we test it

    ZDTM – set of atomic tests for every new piece of functionality

    Real software
           – Apache
           – MySQL
           – Make and gcc
           – Tar and gzip
           – Sshd with connections
           – Screen with top inside
           – VNC with xscreensaver and client connection
           – NGINX
           – MongoDB
           – tcpdump



                                              14
Main plans for the near future

● Full OS resources coverage
● Merge in-flight patches, so that everything works on a vanilla kernel
● Properly integrate crtools with LXC and OpenVZ
● Live-migration script
● Pre-migrate app memory before freeze (speeds things up)




                                       15
CRIU project resources


http://criu.org – project news and documentation
http://git.criu.org – git repo with tool sources
https://github.com/cyrillos/linux-2.6/ – kernel with all in-flight patches applied
criu@openvz.org mailing list
+CRIU page




                                          16
Pavel Emelyanov

                                xemul@parallels.com


17   Parallels – Optimized Computing™    Confidential


Editor's Notes

  • #4 Everything is on the slides.
  • #5 A brief C/R history: the OpenVZ version, Oren's version, the attempt to merge Oren's version upstream, the CRIU proof-of-concept, Linus' “OK, let's take it” and the first two releases.
  • #6 Consider you have an application. This application has a variety of resources associated with it: memory, open files, credentials, etc. There can be more than one application in play, some of them sharing resources. And that's not all: they may live in some environment (we call it a container, yes) with its own resources that are not bound to tasks, like networking configuration or System V IPC objects. What we do in CRIU is serialize the state of this whole thing into an image file (well, it's a set of files, but still). Later we can take this image and recreate the applications with their resources and environment in the very same state they were in before we dumped them.
  • #18 - Linked clones. Disk space. I/O performance. GPL and ESXi