Clustering TensorFlow with Kubernetes and Raspberry Pi
Andres L Martinez
@davilagrau
Photo by William Felker on Unsplash
almo
Google Developer Program Lead PAN EU
@davilagrau
https://www.linkedin.com/in/aleonar
https://www.instagram.com/davilagrau
https://github.com/almo
https://www.facebook.com/davilagrau
Lucas Käldström’s Motivation
● Kubernetes’ Dev Community
○ committer
○ maintainer
● Main motivations
○ Learning Google’s technologies
○ Developing Open Source
○ Re-use old/cheap HW
Photo by Ken Treloar on Unsplash
● Hyperparameter tuning
○ Auto ML
● Scaling on QPS
○ ML API
● Ensemble learning
● Data Parallelism
Model Parallelism?
Don’t ask, don’t tell
So
why not?
Raspberry Pi
Raspberry Pi 3
● Single-board computer
● ARM 1.2 GHz 64/32-bit quad-core
○ VFPv4 Floating Point Unit onboard (per core)
○ Hardware virtualization support
● 1 GB LPDDR2 RAM at 900 MHz
● MicroSDHC slot
● Bluetooth 4.1
● 2.4 GHz WiFi 802.11n & Ethernet 10/100
Kubernetes
Kubernetes
● Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
● Horizontal scaling
● Service discovery and load balancing
● Self-healing
TensorFlow
TensorFlow
● TensorFlow is an open source software library for numerical computation using data flow graphs.
● TensorFlow has APIs available in C++, Python, Java and Go.
● TensorFlow also has bindings for C#, Haskell, Julia, Ruby, Rust, and Scala.
● TensorFlow Lite is TensorFlow’s lightweight solution for mobile and embedded devices.
Raspberry Pi version is coming soon!
Architecture
[Diagram: four Raspberry Pi boards, each running HypriotOS, form a Kubernetes cluster; the TensorFlow cluster runs on top of it]
Kubernetes Cluster
Setting Kubernetes up!
[Diagram: a master set up with kubeadm, plus Node #1, Node #2, and Node #3, each running kubelet and Docker]
Setting up the master
[Diagram: Master, set up with kubeadm]
Settings
● OS: Install Docker on Raspbian, OR
○ Download and flash HypriotOS v1.7.1 from https://goo.gl/y9Jyzd
● Set up the Kubernetes repositories
○ Key / Source
Commands
● apt-get update && apt-get install -y kubeadm
● echo `cat /boot/cmdline.txt` cgroup_enable=cpuset > /boot/cmdline.txt
● swapoff -a  # Note: needed from Kubernetes 1.8 on
● kubeadm init --pod-network-cidr 10.244.0.0/16
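The `echo ... > /boot/cmdline.txt` command above appends the flag every time it runs. As a minimal sketch, assuming the kernel command line is a single line of space-separated options, the same edit could be made idempotent as follows. The helper name `add_kernel_flag` and the sample command line are hypothetical, and the code only transforms a string; it does not touch `/boot/cmdline.txt`.

```python
def add_kernel_flag(cmdline: str, flag: str = "cgroup_enable=cpuset") -> str:
    """Return the kernel command line with `flag` appended at most once.

    Hypothetical helper: assumes a single line of space-separated options.
    """
    options = cmdline.split()
    if flag not in options:
        options.append(flag)
    return " ".join(options)


# Sample command line, made up for illustration.
before = "console=tty1 root=/dev/mmcblk0p2 rootwait"
after = add_kernel_flag(before)
print(after)
```

Running the helper a second time on its own output leaves the line unchanged, which is the property the plain `echo` redirect lacks.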
Setting up the node (each)
[Diagram: Node #2, running kubelet and Docker]
Settings
● OS: Install Docker on Raspbian, OR
○ Download and flash HypriotOS v1.7.1 from https://goo.gl/y9Jyzd
● Set up the Kubernetes repositories
○ Key / Source
Commands
● apt-get update && apt-get install -y kubeadm
● echo `cat /boot/cmdline.txt` cgroup_enable=cpuset > /boot/cmdline.txt
● swapoff -a  # Note: needed from Kubernetes 1.8 on
● kubeadm join --token=XXXXX Master-IP
Setting the network
Flannel: flannel is a virtual network that gives a subnet to each host for use with container runtimes.
“Platforms like Google's Kubernetes assume that each container (pod) has a unique, routable IP inside the cluster. The advantage of this model is that it reduces the complexity of doing port mapping”
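To make "a subnet to each host" concrete, here is a minimal sketch that carves per-host subnets out of the pod network passed to `kubeadm init`. It assumes one /24 per host and hands them out sequentially; flannel's real allocation is lease-based and need not be sequential, and the node names are hypothetical.

```python
import ipaddress


def host_subnets(pod_cidr: str, hosts: list, prefix: int = 24) -> dict:
    """Hand each host the next unused subnet carved out of the pod network."""
    network = ipaddress.ip_network(pod_cidr)
    subnets = network.subnets(new_prefix=prefix)
    return {host: next(subnets) for host in hosts}


# Hypothetical node names; 10.244.0.0/16 is the CIDR given to kubeadm init.
leases = host_subnets("10.244.0.0/16", ["master", "node1", "node2", "node3"])
for host, subnet in leases.items():
    print(host, subnet)
```

Because every host owns a distinct range, a pod IP is unique and routable across the cluster without any port mapping.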
Let’s scale with
Ansible Scripts https://github.com/lahsivjar/kube-arm
TensorFlow Cluster
Parallelization Strategies
Distributed TensorFlow
Explicit (device block): TensorFlow will insert the appropriate data transfers between the jobs.
import tensorflow as tf

# Pin the ops created inside the block to the first CPU device.
with tf.device("/cpu:0"):
    a = tf.Variable(3.0)
    b = tf.Variable(3.0)
    c = a * b
Parallelization strategies:
● In-graph replication
● Between-graph replication
● Asynchronous training
● Synchronous training
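The difference between the last two strategies can be shown with a toy, TensorFlow-free sketch. It assumes the gradients from each worker are already computed and plain SGD is used; the function names are hypothetical. Real asynchronous training also involves gradients computed on stale parameters, which this sketch does not model.

```python
def sync_step(w: float, grads: list, lr: float) -> float:
    """Synchronous: wait for every worker, average the gradients, update once."""
    return w - lr * sum(grads) / len(grads)


def async_steps(w: float, grads: list, lr: float) -> float:
    """Asynchronous: apply each worker's gradient as soon as it arrives."""
    for g in grads:
        w = w - lr * g
    return w


# Two workers report gradients 2.0 and 4.0 for the same parameter.
w0, lr, grads = 10.0, 0.5, [2.0, 4.0]
print("sync :", sync_step(w0, grads, lr))
print("async:", async_steps(w0, grads, lr))
```

With the same gradients, the synchronous variant makes one averaged update per round, while the asynchronous variant makes one update per worker.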
TensorFlow Serving
It might also be a function
TensorFlow Cluster
A TensorFlow "cluster" is a set of "tasks" that participate in the distributed execution of a TensorFlow graph.
Steps:
1. Create a tf.train.ClusterSpec that describes all of the tasks in the cluster. This should be the same for each task.
2. Create a tf.train.Server, passing the tf.train.ClusterSpec to the constructor, and identifying the local task with a job name and task index.
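The two steps can be modelled in plain Python without importing TensorFlow. In this sketch the dictionary has the shape that would be handed to `tf.train.ClusterSpec`, and the hypothetical helper `local_address` shows what "identifying the local task with a job name and task index" resolves to. The hostnames are the example ones used elsewhere in these slides.

```python
# Step 1: the same cluster description on every task.
CLUSTER = {
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222", "ps1.example.com:2222"],
}


def local_address(cluster: dict, job_name: str, task_index: int) -> str:
    """Step 2: resolve the local task's address from its job name and index."""
    tasks = cluster[job_name]
    if not 0 <= task_index < len(tasks):
        raise ValueError(f"job {job_name!r} has no task {task_index}")
    return tasks[task_index]


print(local_address(CLUSTER, "ps", 1))
```

Only the job name and task index differ between processes; the cluster description itself is identical everywhere.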
TensorFlow Cluster
tf.train.ClusterSpec({
    "worker": [
        "worker0.example.com:2222",
        "worker1.example.com:2222"
    ],
    "ps": [
        "ps0.example.com:2222",
        "ps1.example.com:2222"
    ]})
[Diagram: Worker on Node #0, Worker on Node #1, PS on Node #0, PS on Node #1]
Setting up TensorFlow Cluster I
PS, Node #0:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=ps --task_index=0
PS, Node #1:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=ps --task_index=1
Example: Between-graph replication / Asynchronous training
Setting up TensorFlow Cluster II
Worker, Node #0:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=worker --task_index=0
Worker, Node #1:
$ python trainer.py \
    --ps_hosts=ps0.example.com:2222,ps1.example.com:2222 \
    --worker_hosts=worker0.example.com:2222,worker1.example.com:2222 \
    --job_name=worker --task_index=1
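The slides do not show the contents of `trainer.py`. As a sketch of how such a script might turn the flags above into a cluster description, the hypothetical function below uses only `argparse`; the resulting dictionary is what would be passed on to `tf.train.ClusterSpec`.

```python
import argparse


def parse_cluster_flags(argv: list):
    """Parse trainer.py-style flags into a cluster dict plus the local task."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--ps_hosts", default="")
    parser.add_argument("--worker_hosts", default="")
    parser.add_argument("--job_name", choices=["ps", "worker"], required=True)
    parser.add_argument("--task_index", type=int, default=0)
    flags = parser.parse_args(argv)
    cluster = {
        "ps": [h for h in flags.ps_hosts.split(",") if h],
        "worker": [h for h in flags.worker_hosts.split(",") if h],
    }
    return cluster, flags.job_name, flags.task_index


cluster, job_name, task_index = parse_cluster_flags([
    "--ps_hosts=ps0.example.com:2222,ps1.example.com:2222",
    "--worker_hosts=worker0.example.com:2222,worker1.example.com:2222",
    "--job_name=worker",
    "--task_index=1",
])
print(job_name, task_index, cluster[job_name][task_index])
```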
Sharding Variables across Multiple Parameter Servers
with tf.device(tf.train.replica_device_setter()):
    # Build the model here; its variables are placed on the PS tasks.
    ...
with tf.train.MonitoredTrainingSession(master=server.target,
                                       is_chief=(FLAGS.task_index == 0),
                                       checkpoint_dir="/tmp/train_logs",
                                       hooks=hooks) as mon_sess:
    # Run the training steps with mon_sess here.
    ...
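By default, `tf.train.replica_device_setter` spreads variables over the PS tasks in round-robin order. The following TensorFlow-free sketch only imitates that placement rule; the function and variable names are hypothetical.

```python
from itertools import cycle


def place_round_robin(variable_names: list, num_ps_tasks: int) -> dict:
    """Assign each variable to a PS task in turn, in creation order."""
    devices = cycle(f"/job:ps/task:{i}" for i in range(num_ps_tasks))
    return {name: next(devices) for name in variable_names}


# Hypothetical variables of a two-layer model, spread over three PS tasks.
placement = place_round_robin(["w1", "b1", "w2", "b2"], num_ps_tasks=3)
for name, device in placement.items():
    print(name, "->", device)
```

Round-robin balances the number of variables per PS task, not their size, so one large variable can still make a single task the bottleneck.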
[Diagram: three PS tasks, on Node #0, Node #1, and Node #2]
Docker Image
CPU only, i.e. Raspberry Pi
● TensorFlow 1.1 https://goo.gl/URUpko
● Official TensorFlow Lite for Raspberry Pi, coming soon! https://goo.gl/viqtuQ
● resin/rpi-raspbian + TensorFlow 1.4, coming soon!
PET!
Putting Everything
Together
Pod Controllers: StatefulSets (PS & Workers)
● Manages the deployment and scaling of a set of Pods.
● Provides guarantees about the ordering and uniqueness of these Pods.
● StatefulSet manages Pods that are based on an identical container spec.
● StatefulSet maintains a sticky identity for each of its Pods.
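The sticky identity is what makes StatefulSets a fit for TensorFlow tasks: Pods are named `<statefulset>-<ordinal>`, and behind a headless Service each one gets a stable DNS name. The sketch below builds host lists for the trainer flags from that naming scheme. The StatefulSet and Service names, the `default` namespace, the `cluster.local` domain, and port 2222 are assumptions chosen for illustration.

```python
def statefulset_hosts(name: str, replicas: int, service: str,
                      namespace: str = "default", port: int = 2222) -> list:
    """List the stable per-Pod DNS names of a StatefulSet behind a headless Service."""
    return [
        f"{name}-{ordinal}.{service}.{namespace}.svc.cluster.local:{port}"
        for ordinal in range(replicas)
    ]


# Hypothetical names: one StatefulSet for PS tasks and one for workers.
ps_hosts = ",".join(statefulset_hosts("tf-ps", 2, "tf-ps"))
worker_hosts = ",".join(statefulset_hosts("tf-worker", 2, "tf-worker"))
print("--ps_hosts=" + ps_hosts)
print("--worker_hosts=" + worker_hosts)
```

Since the ordinal survives rescheduling, it can double as the task index of the Pod.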
[Diagram: StatefulSet #0 with PS on Node #0 and Node #1; StatefulSet #1 with Worker on Node #0 and Node #1]
Cluster Scalability
+1 Rolling Update!
[Diagram: two groups of PS and Workers, labelled TensorFlow (Kubernetes)]
Thank you!
Questions?
Andres L Martinez
@davilagrau
Setting the cluster I
$ kubectl run hypriot --image=hypriot/rpi-busybox-httpd --replicas=3 --port=80
$ kubectl expose deployment hypriot --port 80
$ kubectl get endpoints hypriot
Setting the cluster II
$ kubectl apply -f traefik-ingress-controller.yaml
$ kubectl label node IP nginx-controller=traefik
$ kubectl apply -f cluster-ingress.yaml
$ cat cluster-ingress.yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: hypriot
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: hypriot
          servicePort: 80
Kubernetes: orchestrating Raspberry Pi images to deploy TensorFlow Server.
Description:
Structure:
- Use case: IoT / processing information
- Architecture: Kubernetes + TensorFlow + Raspberry Pi
- Python
- Introduction to Kubernetes (Ansible?) (Laura)
- Master / Slave architecture (Laura)
- Container description: labelling and pod matching (Laura)
- Configuration (Laura)
- Load balancer (Laura)
- round robin HTTP request (Laura)
- Monitoring load of the replicas (Laura)
- Failure tolerance (Laura)
- Introduction to Tensor Flow
- Introduction to TensorFlow / Machine Learning
- Computation Graph
- Introduction to TensorFlow Server
- Development of Use Case
Dashboard
codemotion-1 192.168.1.76 B8:27:EB:E5:9D:A5
codemotion-3 192.168.1.75 B8:27:EB:68:5B:F9
codemotion-4 192.168.1.77 B8:27:EB:1F:EF:29
codemotion-2 192.168.1.78 B8:27:EB:19:38:C3
Laura Morillo-Velarde
● Backend engineer at seedtag
● Twitter: @Laura_Morillo
● WTM Lead at GDG Madrid
