Google Cloud Platform 1
By Kaushik Bhattacharya, Customer Engineer
Google Cloud, the Netherlands
kbhattacTweets
DevOps & SRE at Google Scale
How Google does it and how you can benefit from it
Engineering at Google
1. How the engineering processes at Google work
2. Our learnings, and how we contribute back to open source
3. From open source to Google Cloud for enterprises
Building software at Google
From product to idea 10x
Product idea
X 10
Moonshot thinking: Solving for X.
“To organize the world’s information and make it
universally accessible and useful.”
- Google
Project Loon:
Balloon powered internet for everyone!
Waymo:
Self driving car
Prototyping: First version of Google Glass was created in 90 min!
Dogfood
Code Development
Product idea
Writing code
public class foo {}
What it takes to be a Google engineer
Working on problems with SPEED AND SCALE is a challenge.
Engineers keep raising the bar on the tools and infrastructure.
Google Culture:
• Collaboration and co-development
• Sharing between products and teams (tools, libraries, services)
• Engineers have autonomy.
• Agile/Scrum, daily stand-up meetings
Google’s entire codebase is a
giant single repository of more
than 2 billion lines of code
Google Repository statistics (as of Jan 2015)
Total number of files: 1+ billion
Number of source files: 9 million
Lines of code: 2+ billion
Depth of history: 35 million commits
Size of content: 86 terabytes
Advantages of monolithic repo
● Unified versioning - One source of truth
● Extensive code sharing and reuse
● Collaboration across teams
● Simplified dependency management
● Large scale refactoring
● Flexible team boundaries & code ownership
● Code visibility
Automated Test / Analysis
Google uses its own version control system, called Piper.
Sync workspace → Write code → Code Review → Commit
Read/Write Access per folder
Code Quality & Syntax Check (by humans and by tooling)
Create personal copy
Auto Rollback if needed
MANDATORY
A single code tree, with fast access to the code through tooling.
All types of programming languages.
Everyone works in trunk. Branches are for releases.
Software testing
Product idea
Writing code
Testing
Testing at Google
● Developing & Testing go hand in hand
● 3 million tests a day
● 20+ OS and Browser combos
Build processes
Product idea
Writing code
Testing
Building
Build systems
Why do we need build systems?
Code has a lot of dependencies, and you don’t want to compile and link them all manually.
The steps of a general build system:
1. Loading
2. Analysis
3. Execution by build system
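
The three steps above can be illustrated with a toy build system. This is only a sketch, not a description of how Google's build system works: the target names, the BUILD_FILE dictionary, and the functions load, analyze, and execute are all hypothetical. Loading reads the target definitions, analysis orders them by dependency, and execution runs each action once.

```python
from graphlib import TopologicalSorter

# Hypothetical target definitions: name -> (dependencies, action description).
BUILD_FILE = {
    "libbase": ([], "compile base.c"),
    "libnet": (["libbase"], "compile net.c"),
    "server": (["libnet", "libbase"], "link server"),
}


def load(build_file):
    """Loading: read the target definitions and check that every dependency exists."""
    for name, (deps, _action) in build_file.items():
        for dep in deps:
            if dep not in build_file:
                raise KeyError(f"{name} depends on unknown target {dep}")
    return dict(build_file)


def analyze(targets):
    """Analysis: order the targets so that dependencies come before dependents."""
    graph = {name: set(deps) for name, (deps, _action) in targets.items()}
    return list(TopologicalSorter(graph).static_order())


def execute(targets, order):
    """Execution: walk the targets in dependency order (here we only record the action)."""
    log = []
    for name in order:
        _deps, action = targets[name]
        log.append(f"{name}: {action}")
    return log


targets = load(BUILD_FILE)
order = analyze(targets)
for line in execute(targets, order):
    print(line)
```

Because the dependency graph is explicit, a cyclic dependency is rejected during analysis (graphlib raises CycleError) before any action runs.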
Google’s continuous build and test system
Google has its own continuous build & test system.
Remember, at Google we develop everything at HEAD in the repo.
Practically endless CPU and cross-user caching, thanks to cloud computing.
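
Cross-user caching can be pictured as a shared cache keyed by a hash of an action's command and inputs: if any user has already run the same action on the same inputs, the stored result is reused. The sketch below assumes an in-memory dictionary stands in for the shared cache. ActionCache, run_action, and compile_sources are hypothetical names, and the slides do not describe how Google's system is actually implemented.

```python
import hashlib


class ActionCache:
    """Toy shared cache: maps a digest of (command, input contents) to an output."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(command, inputs):
        digest = hashlib.sha256()
        parts = [command.encode()]
        for name in sorted(inputs):  # sort so the key does not depend on dict order
            parts.extend([name.encode(), inputs[name]])
        for part in parts:
            # Length-prefix each part so adjacent parts cannot run together.
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
        return digest.hexdigest()

    def run_action(self, command, inputs, action):
        k = self.key(command, inputs)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = action(inputs)
        self._store[k] = result
        return result


def compile_sources(inputs):
    """Stand-in for an expensive build step: returns the total input size."""
    return sum(len(data) for data in inputs.values())


cache = ActionCache()
sources = {"main.c": b"int main() { return 0; }"}
alice = cache.run_action("cc main.c", sources, compile_sources)      # first user: miss
bob = cache.run_action("cc main.c", dict(sources), compile_sources)  # second user: hit
print(alice == bob, cache.hits, cache.misses)
```

Changing a single byte of any input changes the key, so a stale result is never returned for modified sources.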
DevOps at Google
Product idea
Writing code
Testing
Building
Deploying
Each week Google launches over 4 billion containers.
Google has been using container technology for more than 10 years.
Enter the container
[Figure: three stacks compared side by side (bare-metal server, virtual machine, container), each showing the layers Application Code, Dependencies, OS, and Hardware]
So, you mean Docker?
● Docker is a popular software container platform.
● Containers are a way to package software in a
format that can run isolated on a shared operating
system.
Enter the container… and new challenges
● Scheduling, scaling across clusters of servers
● Networking and connectivity
● Security and Access control
● Logging, Monitoring, and Debugging
● Health checks and uptime preservation
● ...
Large-scale cluster management at Google with Borg
● Borg is software that manages all production machines at Google and runs the jobs (binaries) that engineers submit to it.
● Borg runs pretty much everything inside the company, including Google Search, Gmail, Google Maps, Google Docs...
● These binaries run in a container environment.
● When tasks die, they are automatically restarted, possibly on a different machine.
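
The automatic restart described above can be sketched as a reconciliation loop: compare the tasks that should be running with the tasks that are alive, and reschedule any that died, possibly on another machine. This is an illustrative sketch only. Cluster, reconcile, the task and machine names, and the least-loaded placement rule are assumptions and do not describe Borg's actual scheduler.

```python
class Cluster:
    """Toy cluster: tracks which machine each task runs on."""

    def __init__(self, machines):
        self.machines = set(machines)
        self.placement = {}  # task name -> machine name

    def load(self, machine):
        return sum(1 for m in self.placement.values() if m == machine)

    def schedule(self, task):
        # Assumed placement rule: pick the healthy machine with the fewest tasks;
        # ties are broken by machine name so the result is deterministic.
        if not self.machines:
            raise RuntimeError("no healthy machines available")
        target = min(sorted(self.machines), key=self.load)
        self.placement[task] = target
        return target

    def fail_machine(self, machine):
        """Simulate a machine failure: every task on it dies."""
        self.machines.discard(machine)
        dead = [t for t, m in self.placement.items() if m == machine]
        for task in dead:
            del self.placement[task]
        return dead

    def reconcile(self, desired_tasks):
        """Restart every desired task that is not currently running."""
        restarted = {}
        for task in desired_tasks:
            if task not in self.placement:
                restarted[task] = self.schedule(task)
        return restarted


cluster = Cluster(["m1", "m2", "m3"])
desired = ["search", "mail", "maps"]
cluster.reconcile(desired)               # initial placement, one task per machine
dead = cluster.fail_machine("m1")        # the task on m1 dies with its machine
restarted = cluster.reconcile(desired)   # the dead task restarts on another machine
print(dead, restarted)
```

The engineer only declares which tasks should run; where a task lands after a failure is decided by the cluster, not by the task's owner.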
Site Reliability Engineering
Product idea
Writing code
Testing
Building
Deploying
SRE
“Hope is not a strategy.
Engineering solutions to design, build, and run large-scale
systems scalably, reliably and efficiently is a strategy,
and a good one.”
Site Reliability Engineering
● Site Reliability Engineering is a specialized job function that focuses on the reliability and maintainability of large systems.
● SRE is also a mindset and a set of engineering approaches to running better production systems.
● Google has teams of site reliability engineers, each responsible for keeping a service globally available.
https://landing.google.com/sre/book.html
Open Source
Googlers contribute
back to the community.
Google is a leader in Open Source
Source: Stackalytics
Popular Google open source projects
https://opensource.google.com
Contributions to other popular open source projects and
standards by Google
https://research.google.com/
Google has published many white papers that inspire the big data community.
● Bigtable
● GFS
● MapReduce
● Chubby
● Sawzall
● Dapper
● Dremel
● Borg
From Google to OSS
Internal Google → Open Source
● Borg (container orchestration) → Kubernetes
● Machine learning → TensorFlow
● Go → Go
● Google Chrome → Chromium
● Stubby → gRPC
● Dapper → Zipkin
● GFS/Bigtable → HDFS/HBase
TensorFlow
TensorFlow is what we use for our own internal machine learning projects, and now it’s available to you!
Google made it open source.
More than 480 contributions
10,000 commits in a year
53k star rating
Tutorials to get started at
https://www.tensorflow.org
Kubernetes
Kubernetes abstracts away the hardware infrastructure and exposes your whole data center as a single enormous computing resource.
● Multiple container engines (Docker, rkt, Windows)
● Cloud and bare-metal environments
● Container Engine = Managed Kubernetes in Google Cloud
https://kubernetes.io
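
To illustrate the idea of treating many machines as a single pool of capacity, here is a toy first-fit scheduler. It is a sketch under stated assumptions: Node, Pool, and place are hypothetical names, CPU is the only resource considered, and the real Kubernetes scheduler is far more sophisticated.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free: float  # free CPU cores on this machine
    pods: list = field(default_factory=list)


class Pool:
    """Presents a set of nodes as one pool of CPU capacity."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    @property
    def cpu_free(self):
        return sum(node.cpu_free for node in self.nodes)

    def place(self, pod, cpu):
        """First fit: run the pod on the first node with enough free CPU."""
        for node in self.nodes:
            if node.cpu_free >= cpu:
                node.cpu_free -= cpu
                node.pods.append(pod)
                return node.name
        return None  # no single node can host the pod


pool = Pool([Node("node-a", 2.0), Node("node-b", 4.0)])
total = pool.cpu_free                # capacity of the whole pool
web = pool.place("web", 1.5)         # fits on node-a
batch = pool.place("batch", 3.0)     # node-a is too full now, so it goes to node-b
big = pool.place("big", 2.0)         # no node has 2.0 cores left
print(total, web, batch, big)
```

The caller asks the pool for capacity and never names a machine, which is the abstraction the slide describes.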
Istio (A Service Mesh)
● A complete framework for connecting, securing, managing and monitoring services
● Secure and monitor traffic for microservices and legacy services without requiring any changes to application code
● An open platform with key contributions from Google, IBM, Lyft and others
● Allows developers to authenticate and secure the communications between different applications using a TLS connection
● Multi-environment and multi-platform, but Kubernetes first
Google Cloud
Google infrastructure
for your company.
Open Source
From OSS to Google Cloud
Open Source
● Kubernetes
● Istio
● TensorFlow
● MySQL / PostgreSQL
● Spark / Hadoop
● Apache Beam
● Spinnaker
Google Cloud
● Google Kubernetes Engine
● ML Engine / AutoML
● Cloud SQL
● Dataproc
● Dataflow
DevOps on Google Cloud
● CONTAINERIZATION: Package applications
● ORCHESTRATION: Run applications
● CI / CD: Manage applications
● SERVICE MESH: Connect and secure applications
CI/CD on Google Cloud
[Figure: pipeline stages Source, Build/Test, Artifact storage, and Deploy, labeled with Source Repository, Cloud Build, Container Registry, and Cloud Storage, alongside logos for GitHub, Bitbucket, Jenkins, Circle CI, quay, Docker Hub, and Codefresh]
DevOps on Google Cloud
● DOCKER: Package applications
● KUBERNETES: Run applications
● SPINNAKER: Manage applications
● ISTIO: Connect and secure applications
Conclusion
● Google has two decades of experience building secure software at large scale.
● Your company can use the same infrastructure that Google does: scalable, secure and open.
● The learnings are shared through whitepapers and contributed back through open source.
Demo
https://git.io/fhzCx
