Learning and Development presents

    Open Talk Series

A series of illuminating talks and interactions that open our minds to new
ideas and concepts; that make us look for newer or better ways of doing what
we did; or point us to exciting things we have never done before. A range of
topics on Technology, Business, Fun and Life.

Be part of the learning experience at Aditi.

Join the talks. It's free.
Free as in freedom at work, not free beer.

It's not training. It's a mind-opener.

Speak at these events. Or bring an expert/friend to talk.
Mail OpenTalk@aditi.com with topic and availability.

Usually at 4.30PM Wednesdays.
HOW TO ENJOY AN OPEN TALK



Bring coffee & friends      Switch OFF mobile      Switch ON mind




Sign attendance sheet      SHARE your wisdom      QUESTION notions




              THANK the Talker       SPREAD the good word
New Champion




                                             Sahil Sagar




Aditi Technologies | Partnering Innovation
Agenda

        • We are not talking about the crawler

        • No discussion on PageRank… maybe?




The art of scale




            10-50 users                      100-500 users   500-10,000 users
Scale ????

                      800,000 Machines




                                             Largest Linux
                                                 Base



• What gives us this scale?


                                             Good Code?




                                             More servers?




                                               Powerful
                                               Servers?




• Let's see what gives Google the scale

Architecture (top to bottom)

  The apps on top of it
      GOOGLE APPS: SEARCH, INDEX, CRAWL, GMAIL... (Python, Java, C++, Sawzall, other)
      GOOGLE APP ENGINE (Python, Java, C++) over BigTable
      GWQ

  The Secret Sauce
      Mapreduce
      BigTable
      Chubby Lock
      GFS / GFS II

  Infrastructure
      INTERIOR NETWORK IPv6
      RHEL 2.6.X PAE
      SERVER HARDWARE
      RACK
      DC
      Exterior Network

Scale in Google

   1.   The first touch
   2.   Size does matter
   3.   The Safe
   4.   Operating System Implementation
   5.   Interior Network Architecture

The first touch to the services




The first touch to the service

Client Browser
   -> 80/443 -> Firewall
   -> 80/443 -> DMZ / Perimeter:
                   NetScaler (HTTP multiplexing)
                   Squid (Reverse Proxy)
                   GWS (Web Server Farm)
   -> Firewall
   -> Cell: Interior Network, GFS II etc.

The touch is not always real

   80/443 -> Squid (Reverse Proxy) -> 80/443

   • Uses Squid Reverse Proxy
   • Perimeter cache hit rates of 30-60% = Huge!
   • Dependent on search complexity / user preferences / traffic type
   • All image thumbnails cached, much multimedia cached
   • Expensive common queries cached (common words like 'Obama') as they
     require significant back-end processing.

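To make the perimeter-cache idea concrete, here is a minimal sketch of a cache sitting in front of a slow back end and counting its own hit rate. It is an illustration only, not how Squid or Google's front end is implemented: the `PerimeterCache` class, the `slow_search` back end, the LRU eviction policy and the sample queries are all assumptions made up for this example.

```python
from collections import OrderedDict


class PerimeterCache:
    """Tiny LRU cache standing in for a reverse proxy in front of a slow back end."""

    def __init__(self, backend, capacity=2):
        self.backend = backend          # callable: query -> result (expensive)
        self.capacity = capacity        # maximum number of cached results
        self.entries = OrderedDict()    # query -> result, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.entries:
            self.hits += 1
            self.entries.move_to_end(query)    # mark as most recently used
            return self.entries[query]
        self.misses += 1
        result = self.backend(query)           # only misses reach the back end
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


backend_calls = []


def slow_search(query):
    """Hypothetical back end; records each call so the saved work is visible."""
    backend_calls.append(query)
    return f"results for {query}"


cache = PerimeterCache(slow_search, capacity=2)
for q in ["obama", "obama", "weather", "obama", "rare query", "obama"]:
    cache.get(q)

print(f"hit rate: {cache.hit_rate():.0%}, back-end calls: {len(backend_calls)}")
```

In this toy run the popular query is answered from the cache three times out of six requests, so only three requests ever reach the back end. That is the effect the 30-60% figure on the slide describes.
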
Size does matter




Worldwide Data Centres

The last estimates were 36 data centres, 300+ GFS II clusters and upwards of
800K machines.

The Modular Data Centre

A standard Google modular DC (Cell) holds 1160 servers in 30 racks (40U) and
consumes 250 kW of power.

This is the "atomic" data centre building block of Google.

A data centre would consist of 100s of modular cells.

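The cell figures invite a quick back-of-the-envelope check. The sketch below only divides the numbers quoted in the deck. It assumes, purely for illustration, that all of a cell's power goes to its servers and that the whole 800K-machine fleet lives in standard cells; neither assumption comes from the slides.

```python
SERVERS_PER_CELL = 1160      # from the slide
CELL_POWER_WATTS = 250_000   # 250 kW, from the slide
RACKS_PER_CELL = 30          # from the slide
TOTAL_MACHINES = 800_000     # fleet estimate quoted elsewhere in the deck

# Assumption: servers are spread evenly over the racks.
servers_per_rack = SERVERS_PER_CELL / RACKS_PER_CELL

# Assumption: the full 250 kW is consumed by servers (ignores cooling, network gear).
watts_per_server = CELL_POWER_WATTS / SERVERS_PER_CELL

# Assumption: every machine in the fleet sits in a standard cell.
cells_for_fleet = TOTAL_MACHINES / SERVERS_PER_CELL

print(f"{servers_per_rack:.1f} servers per rack")
print(f"{watts_per_server:.1f} W per server")
print(f"{cells_for_fleet:.0f} cells for the whole fleet")
```

Under those assumptions a rack carries roughly 39 servers, each server has a budget of a little over 200 W, and the fleet corresponds to several hundred cells in total.
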
THE Safe

                                       How is a server stored in the Data Centre?




Google Rack (GOOG rack)

EVERYTHING custom!

   • Optimized motherboards
   • Have their own HW builds
   • Build redundancy on top of failure
   • Motherboard directly mounted into rack
   • Servers have no casing - just bare boards
   • Assists with heat dispersal issues

THE OPERATING SYSTEM

                                      The Core Software on each of those servers




OPERATING SYSTEM

- 100% Red Hat Linux based since 1998 inception
     - RHEL
     - 2.6.X Kernel
     - PAE
     - Custom glibc.. rpc... ipvs...
     - Custom FS (GFS II)
     - Custom Kerberos
     - Custom NFS
     - Custom CUPS
     - Custom gPXE bootloader
     - Custom EVERYTHING.....

Kernel/Subsystem Modifications
   tcmalloc: replaces glibc 2.3 malloc; much faster, works very well with threads
   rpc: the rpc layer extensively modified to give a performance increase and lower latency (52%/40%)

Significantly modified kernel and subsystems, all IPv6 enabled

THE Secret Sauce




Section II - Google's Major Glue

   1. Google File System Architecture - GFS II
   2. Google Database - Bigtable
   3. Google Computation - Mapreduce

GOOGLE FILE SYSTEM

                         Manages the underlying Data on behalf of the upper layers
                                     and ultimately the applications




GFS versus NFS


   Network File System (NFS)
       • Single machine makes part of its file system available to
         other machines
       • Sequential or random access
       • PRO: Simplicity, generality, transparency
       • CON: Storage capacity and throughput limited by single server

   Google File System (GFS)
       • Single virtual file system spread over many machines
       • Optimized for sequential read and local accesses
       • PRO: High throughput, high capacity
       • "CON": Specialized for particular types of applications

   (University of Pennsylvania)

FILE SYSTEM I – GFS II

   • Elegant master failover
   • Chunk size is now 1MB
   • Only ever lost one 64MB chunk (in GFS I) during its entire production
     deployment, so it is assumed to be extremely reliable

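A rough sketch of what chunk size means in practice: a file is cut into fixed-size chunks, a byte offset maps to a chunk index, and each chunk is stored on several machines. The 64MB and 1MB sizes come from the slides. The helper names, the server names and the round-robin placement in `place_replicas` are hypothetical and are not how the GFS master actually chooses chunkservers. Three replicas is assumed here, in line with the deck's "in triplicate usually".

```python
MB = 1024 * 1024


def chunk_index(byte_offset, chunk_size):
    """Index of the chunk that holds the given byte offset of a file."""
    return byte_offset // chunk_size


def chunk_count(file_size, chunk_size):
    """Number of chunks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def place_replicas(chunk_id, servers, replicas=3):
    """Hypothetical round-robin placement of one chunk on `replicas` distinct servers.

    Assumes replicas <= len(servers).
    """
    start = chunk_id % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replicas)]


file_size = 200 * MB
print(chunk_count(file_size, 64 * MB), "chunks at 64MB")
print(chunk_count(file_size, 1 * MB), "chunks at 1MB")
print("offset 130MB is in chunk", chunk_index(130 * MB, 64 * MB), "at 64MB")
print(place_replicas(5, ["cs0", "cs1", "cs2", "cs3"]))
```

Smaller chunks mean many more chunks per file, so more metadata to track, in exchange for finer-grained distribution of the data.
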
CAP Theorem
                                             (Brewer's theorem)

       A distributed system can provide at most two of these three
       guarantees at the same time:

       • Consistency: All nodes see the same data at the same
         time
       • Availability: Node failures do not prevent survivors
         from continuing to operate
       • Partition tolerance: The system continues to operate
         despite arbitrary message loss



GOOGLE DATABASE

                         Accesses the underlying Data on behalf of the upper layers
                                      and ultimately the applications




Why not commercial DB?
       • Scale is too large for most commercial databases
       • Cost would be very high
              – Building internally means system can be applied
                across many projects for low incremental cost
       • Low-level storage optimizations help
         performance significantly
              – Much harder to do when running on top of a database
                layer
       “Also fun and challenging to build large-scale systems”

BigTable
       • A distributed storage system for managing structured data.
       • Scalable
              –   Thousands of servers
              –   Terabytes of in-memory data
              –   Petabytes of disk-based data
              –   Millions of reads/writes per second, efficient scans
       • Self-managing
          – Servers can be added/removed dynamically
          – Servers adjust to load imbalance
       • Used for many Google projects
              – Web indexing, Personalized Search, Google Earth, Google Analytics,
                Google Finance, …

BigTable




         •    Physically sorted on row-key – like a row-store
         •    Column families - like column-stores
         •    Variable (record-by-record) columns within a column family
         •    Column-values versioned; stored in reverse chronological order
         •    Designed to store hyperlink structure of web



GOOGLE MAPREDUCE

                         Computes the underlying Data on behalf of the applications




Mapreduce I

Map Reduction can be seen as a way to exploit massive parallelism by breaking
a task down into constituent parts and executing them on multiple processors.

The major functions are MAP & REDUCE (with a number of intermediary steps):

   MAP      Break task down into parallel steps
   REDUCE   Combine results into final output

Shown is a 2-pipeline Map Reduction (there are 24 Map Reductions in the
indexing pipeline).

Mappers & Reducers usually run on separate processors (even with a 90% loss
of reducers the job still completed!).

Word-Count using MapReduce
       Problem: determine the frequency of each word in a large
         document collection
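
One way to picture the word-count job is the single-process sketch below. It follows the usual map / group-by-key / reduce shape but runs everything in one Python process. The function names and the three sample documents are made up for illustration, and a real MapReduce run would spread the map and reduce calls over many machines.

```python
from collections import defaultdict


def map_words(document):
    """MAP: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Intermediate step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """REDUCE: combine all the values for one word into a single total."""
    return word, sum(counts)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_words(doc)]
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


counts = word_count(["the quick fox", "the lazy dog", "The fox"])
print(counts)
```

Each `map_words` call only needs its own document and each `reduce_counts` call only needs the values for its own word, which is what lets the two phases run in parallel.
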




What runs on top of all this



PageRank: Intuition

[Figure: link graph of pages A to J, annotated with two questions]
   Shouldn't E's vote be worth more than F's?
   How many levels should we consider?

            • Imagine a contest for The Web's Best Page
                   – Initially, each page has one vote
                   – Each page votes for all the pages it has a link to
                   – To ensure fairness, pages voting for more than one page must
                     split their vote equally between them
                   – Voting proceeds in rounds; in each round, each page has the
                     number of votes it received in the previous round
                   – In practice, it's a little more complicated - but not much!
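
The contest rules can be sketched in a few lines. The three-page graph below is a made-up example, not the A to J graph from the figure. Treating a page without links as casting no votes is an assumption of this sketch, and every link target is assumed to appear as a key in `links`.

```python
def vote_round(links, votes):
    """One round: each page splits its current votes equally among the pages it links to."""
    new_votes = {page: 0.0 for page in links}
    for page, targets in links.items():
        if not targets:
            continue  # assumption: a page with no links casts no votes
        share = votes[page] / len(targets)
        for target in targets:
            new_votes[target] += share
    return new_votes


def run_contest(links, rounds):
    """Start every page with one vote and play the given number of rounds."""
    votes = {page: 1.0 for page in links}
    for _ in range(rounds):
        votes = vote_round(links, votes)
    return votes


# Hypothetical link graph: page -> pages it links to
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

after_one = run_contest(links, 1)
after_two = run_contest(links, 2)
print(after_one)
print(after_two)
```

After the first round C leads because it collects votes from both A and B. In the second round C passes those votes on to A, which shows how a vote from a well-supported page ends up worth more.
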
Random Surfer Model
               • PageRank has an intuitive basis in random walks
                 on graphs

               • Imagine a random surfer, who starts on a random
                 page and, in each step,
                      – with probability d, clicks on a random link on the page
                      – with probability 1-d, jumps to a random page (bored?)

               • The PageRank of a page can be interpreted as the
                 fraction of steps the surfer spends on the
                 corresponding page
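
The random surfer description corresponds to the usual PageRank recurrence, written out below as a sketch. The symbols are not from the slide: N is the total number of pages, L(q) is the number of links on page q, and the sum runs over the pages q that link to p. It assumes every page has at least one outgoing link.

```latex
PR(p) \;=\; \frac{1 - d}{N} \;+\; d \sum_{q \,\to\, p} \frac{PR(q)}{L(q)}
```

The first term is the "bored" jump to a random page with probability 1-d. The second term is the surfer arriving by clicking, with probability d, one of the L(q) links on a page q that points to p.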
BUILD YOUR OWN GOOGLE

                                             The Basic Open Source Tools




The Google Stack (vs Yahoo'ish/Open Source)

Conceptual Overview: Google vs. Open Source

  Google Architecture                        Open Source (Yahoo'ish) Architecture
  -----------------------------------------  -----------------------------------------
  GOOGLE APPS: SEARCH, INDEX, CRAWL,         CLIENT APPLICATION
    GMAIL...; APP ENGINE
  Python, Java, C++, Sawzall, other          Pig Latin, Python, PHP, Java ....
    (App Engine: Python, Java, C++)            anything
  GWQ (App Engine: Task Queue)               Job Tracker
  Google's Secret Sauce:                     Hadoop (Hadoop Framework):
    Mapreduce, BigTable, Chubby Lock           Mapreduce, Hbase (Bigtable equiv.)
  GFS / GFS II                               HDFS (hadoop)
  INTERIOR NETWORK IPv6                      INTERIOR NETWORK IPv6
  RHEL 2.6.X PAE                             CentOS 2.6.X PAE
  SERVER HARDWARE                            SERVER HARDWARE
  RACK                                       RACK
  DC                                         DC
  Exterior Network                           Exterior Network

Open Source: other tools such as crawlers and indexers are readily available.

END

                                             (Thank you)




Pre Presentation
                         The Google Philosophy                         (according to ed)




       •    Jedis build their own lightsabres (the MS Eat your own Dog Food)
       •    Parallelize Everything
       •    Distribute Everything (to atomic level if possible)
       •    Compress Everything (CPU cheaper than bandwidth)
       •    Secure Everything (you can never be too paranoid)
       •    Cache (almost) Everything
       •    Redundantize Everything (in triplicate usually)
       •    Latency is VERY evil




Special Thanks to ….



           The Anatomy of the Google Architecture
           "The unofficial Version"
           V1.0 November 2009

           • Ed Austin
           • {ed, edik} @i-dot.com




Keep Learning
For suggestions on topics, feedback, etc.,
        contact OpenTalk@aditi.com

Google Architecture - Breaking it Open

  • 1.
    Learning and Development Be part of the learning experience at Aditi. presents Join the talks. Its free. Free as in freedom at work, not free-beer. Its not training. Its mind-opener. Speak at these events. Or bring an expert/friend to talk. Open Talk Series Mail OpenTalk@aditi.com with topic and A series of illuminating talks and interactions that open our minds to new availability. ideas and concepts; that makes us look for newer or better ways of doing what we did; or point us to exciting things we have never done before. A range of topics on Usually at 4.30PM Wednesdays. Technology, Business, Fun and Life.
  • 2.
    HOW TO ENJOYAN TALK Bring coffee & friends Switch OFF mobile Switch ON mind Sign attendance sheet SHARE your wisdom QUESTION notions THANK the Talker SPREAD the good word
  • 3.
    New Champion Sahil Sagar Aditi Technologies | Partnering Innovation
  • 4.
    Agenda • We are not talking about crawler • No discussion on PageRank… maybe? 4 Aditi Technologies | Partnering Innovation
  • 5.
    The art ofscale 10-50 users 100-500 users 500-10000 5 Aditi Technologies | Partnering Innovation
  • 6.
    Scale ???? 800,000 Machines Largest Linux Base 6 Aditi Technologies | Partnering Innovation
  • 7.
    • What givesus this scale? Good Code? More servers? Powerful Servers? 7 Aditi Technologies | Partnering Innovation
  • 8.
    • Lets seewhat gives Google the scale Architecture GOOGLE APPS SEARCH GOOGLE APP INDEX ENGINE CRAWL The apps on top GMAIL... Python. Java. Python, Java, C++, of it. C++ Sawzall, other GWQ Mapreduce BigTable BigTable The Secret Sauce Chubby Lock GFS / GFS II INTERIOR NETWORK IPv6 RHEL 2.6.X PAE Infrastructure SERVER HARDWARE RACK DC Exterior Network 8 Aditi Technologies | Partnering Innovation
  • 9.
    Scale in Google Architecture GOOGLE APPS SEARCH GOOGLE APP INDEX ENGINE CRAWL GMAIL... Python. Java. Python, Java, C++, C++ Sawzall, other 1. The first touch GWQ Mapreduce 2. Size does matter BigTable BigTable Chubby Lock 3. The Safe GFS / GFS II 4. Operating System Implementation INTERIOR NETWORK IPv6 RHEL 2.6.X PAE 5. Interior Network Architecture SERVER HARDWARE RACK DC Exterior Network 9 Aditi Technologies | Partnering Innovation
  • 10.
    The first touchto the services 10 Aditi Technologies | Partnering Innovation
  • 11.
    The first touchto the service Architecture GOOGLE APPS SEARCH GOOGLE APP ENGINE INDEX CRAWL Client Browser Firewall DMZ GMAIL... 80/443 80/443 Perimeter Firewall Python. Java. Python, Java, C++, C++ Sawzall, other GWQ BigTable Mapreduce Squid GWS BigTable Reverse Proxy Web Server Farm Chubby Lock NetScalar http multiplexing Cell Interior Network GFS II etc GFS / GFS II INTERIOR NETWORK IPv6 RHEL 2.6.X PAE SERVER HARDWARE RACK DC Exterior Network 11 Aditi Technologies | Partnering Innovation
  • 12.
    The touch isnot always real Architecture GOOGLE APPS SEARCH GOOGLE APP INDEX ENGINE CRAWL 80/443 80/443 GMAIL... Python. Java. Python, Java, C++, C++ Sawzall, other GWQ Squid Reverse Proxy BigTable Mapreduce BigTable Chubby Lock • Uses Squid Reverse Proxy • Perimeter Cache hit rates 30-60% = Huge! GFS / GFS II • Dependent on search complexity/user preferences/traffic INTERIOR NETWORK IPv6 type RHEL 2.6.X PAE • All Image Thumbnails caches, much Multimedia cached SERVER HARDWARE RACK • Expensive common queries cached (common words like DC ‘Obama‘) as they require significant back-end processing. Exterior Network 12 Aditi Technologies | Partnering Innovation
  • 13.
    Size does matter 13 Aditi Technologies | Partnering Innovation
  • 14.
    Worldwide Data Centres Architecture GOOGLE APPS SEARCH GOOGLE APP INDEX ENGINE CRAWL GMAIL... Python. Java. Python, Java, C++, C++ Sawzall, other GWQ BigTable Mapreduce BigTable Chubby Lock GFS / GFS II INTERIOR NETWORK IPv6 RHEL 2.6.X PAE SERVER HARDWARE RACK Last estimated were 36 Data Centers, 300+ GFSII Clusters and upwards of DC 800K machines. Exterior Network 14 Aditi Technologies | Partnering Innovation
  • 15.
    The Modular Data Centre
    Standard Google Modular DC (Cell) holds 1160 servers / 250 kW power consumption in 30 racks (40U).
    This is the "atomic" data centre building block of Google.
    A data centre would consist of 100s of modular cells.
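A quick back-of-the-envelope check of the figures quoted in the talk. This is a rough sketch: it assumes the 250 kW is spread evenly over the 1160 servers (the figure may well include non-server overhead), and it treats the 800K machine estimate as if every machine sat in a standard cell.

```python
SERVERS_PER_CELL = 1160
RACKS_PER_CELL = 30
CELL_POWER_KW = 250
TOTAL_MACHINES = 800_000  # estimate quoted in the talk

servers_per_rack = SERVERS_PER_CELL / RACKS_PER_CELL
watts_per_server = CELL_POWER_KW * 1000 / SERVERS_PER_CELL
cells_for_fleet = TOTAL_MACHINES / SERVERS_PER_CELL

print(f"servers per rack : {servers_per_rack:.1f}")
print(f"watts per server : {watts_per_server:.0f}")
print(f"cells for fleet  : {cells_for_fleet:.0f}")
```

Under these assumptions a cell averages a little under 39 servers per rack, and the quoted fleet corresponds to several hundred cells in total.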
  • 16.
    THE Safe
    How is a server stored in the Data Centre?
  • 17.
    Google Rack (GOOGrack)
    EVERYTHING custom!
    • Optimized motherboards
    • Have their own HW builds
    • Build redundancy on top of failure
    • Motherboard directly mounted into rack
    • Servers have no casing - just bare boards
    • Assists with heat dispersal issues
  • 18.
    THE OPERATING SYSTEM
    The Core Software on each of those servers
  • 19.
    OPERATING SYSTEM
    - 100% Red Hat Linux based since 1998 inception
    - RHEL
    - 2.6.X kernel
    - PAE
    - Custom glibc.. rpc... ipvs...
    - Custom FS (GFS II)
    - Custom Kerberos
    - Custom NFS
    - Custom CUPS
    - Custom gPXE bootloader
    - Custom EVERYTHING.....
    Kernel/Subsystem Modifications
    - tcmalloc: replaces glibc 2.3 malloc. Much faster! Works very well with threads...
    - rpc: the rpc layer extensively modified to provide higher performance and lower latency (52%/40%)
    - Significantly modified kernel and subsystems, all IPv6 enabled
  • 20.
    THE Secret Sauce
  • 21.
    Section II: Google's Major Glue
    1. Google File System Architecture - GFS II
    2. Google Database - Bigtable
    3. Google Computation - Mapreduce
  • 22.
    GOOGLE FILE SYSTEM
    Manages the underlying Data on behalf of the upper layers and ultimately the applications
  • 23.
    GFS versus NFS
    Network File System (NFS)
    • Single machine makes part of its file system available to other machines
    • Sequential or random access
    • PRO: Simplicity, generality, transparency
    • CON: Storage capacity and throughput limited by single server
    Google File System (GFS)
    • Single virtual file system spread over many machines
    • Optimized for sequential read and local accesses
    • PRO: High throughput, high capacity
    • "CON": Specialized for particular types of applications
    Source: University of Pennsylvania
  • 24.
    FILE SYSTEM I - GFS II
    • Elegant Master Failover
    • Chunk size is now 1MB
    • Only ever lost one 64MB chunk (in GFS I) during its entire production deployment, so assumed extremely reliable
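To see what the chunk size change means in practice, here is a small sketch that maps a byte range of a file onto fixed-size chunk indexes. It only illustrates the arithmetic: `chunks_for_range` is a hypothetical helper, not a GFS API, and 1 MB is taken to be 2^20 bytes.

```python
MB = 1024 * 1024  # assumption: binary megabytes


def chunks_for_range(offset, length, chunk_size):
    """Return the indexes of the fixed-size chunks covering [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


# A 200 MB read that starts 10 MB into a file.
offset, length = 10 * MB, 200 * MB
gfs1_chunks = chunks_for_range(offset, length, chunk_size=64 * MB)
gfs2_chunks = chunks_for_range(offset, length, chunk_size=1 * MB)
print(len(gfs1_chunks), "chunks at 64MB;", len(gfs2_chunks), "chunks at 1MB")
```

Smaller chunks mean far more chunks per file to keep track of, but each one is a much smaller unit to read, move or re-replicate.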
  • 25.
    CAP Theorem (Brewer's theorem)
    • Consistency: All nodes see the same data at the same time
    • Availability: Node failures do not prevent survivors from continuing to operate
    • Partition tolerance: The system continues to operate despite arbitrary message loss
  • 26.
    GOOGLE DATABASE
    Accesses the underlying Data on behalf of the upper layers and ultimately the applications
  • 27.
    Why not commercial DB?
    • Scale is too large for most commercial databases
    • Cost would be very high
      - Building internally means system can be applied across many projects for low incremental cost
    • Low-level storage optimizations help performance significantly
      - Much harder to do when running on top of a database layer
    "Also fun and challenging to build large-scale systems"
  • 28.
    BigTable
    • A distributed storage system for managing structured data.
    • Scalable
      - Thousands of servers
      - Terabytes of in-memory data
      - Petabyte of disk-based data
      - Millions of reads/writes per second, efficient scans
    • Self-managing
      - Servers can be added/removed dynamically
      - Servers adjust to load imbalance
    • Used for many Google projects
      - Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
  • 29.
    BigTable
    • Physically sorted on row-key - like a row-store
    • Column families - like column-stores
    • Variable (record-by-record) columns within a column family
    • Column-values versioned; stored in reverse chronological order
    • Designed to store hyperlink structure of web
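The data layout described on this slide can be modelled with a toy in-memory structure. This is a sketch for intuition only, not the BigTable API: `TinyTable`, its methods, the `family:qualifier` column naming and the example row keys are all hypothetical.

```python
import bisect


class TinyTable:
    """Toy, in-memory model of the layout: sorted rows, families, versions."""

    def __init__(self):
        self.row_keys = []  # kept sorted, like the physical row order
        self.rows = {}      # row key -> {"family:qualifier": [(ts, value), ...]}

    def put(self, row, column, value, ts):
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest first

    def get(self, row, column):
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start, end):
        """Return row keys in sorted order with start <= key < end."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, end)
        return self.row_keys[lo:hi]


table = TinyTable()
# Each row may carry different columns inside the same family ("anchor").
table.put("com.example.www", "anchor:org.example.blog", "Example", ts=1)
table.put("com.example.www", "anchor:org.example.blog", "Example Home", ts=2)
table.put("com.example.mail", "contents:html", "<html>...</html>", ts=1)
table.put("org.example.blog", "anchor:com.example.www", "My blog", ts=1)

print(table.get("com.example.www", "anchor:org.example.blog"))
print(table.scan("com.example.", "com.example/"))
```

Because rows are kept in key order, a range scan over one key prefix touches only neighbouring rows, which is what makes scans cheap.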
  • 30.
    GOOGLE MAPREDUCE
    Computes the underlying Data on behalf of the applications
  • 31.
    Mapreduce I
    Map Reduction can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing on multiple processors.
    The major functions are MAP & REDUCE (with a number of intermediate steps):
    • MAP: Break task down into parallel steps
    • REDUCE: Combine results into final output
    Diagram: a 2-pipeline Map Reduction (there are 24 Map Reductions in the indexing pipeline).
    Mappers & Reducers usually run on separate processors (even with 90% of reducers lost, the job still completed!)
  • 32.
    Word-Count using MapReduce
    Problem: determine the frequency of each word in a large document collection
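The word-count problem can be sketched as a single-process imitation of the MAP, intermediate grouping and REDUCE steps. This is a minimal sketch with hypothetical function names; it runs everything in one process, whereas a real framework distributes each phase over many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """MAP: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Intermediate step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """REDUCE: combine the counts for one word into a total."""
    return word, sum(counts)


def word_count(documents):
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))  # each document could be mapped in parallel
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())


docs = {"d1": "the quick brown fox", "d2": "the lazy dog and the fox"}
counts = word_count(docs)
print(counts)
```

Each call to `map_phase` depends on one document only, and each call to `reduce_phase` on one word only, which is what lets both phases be spread over many processors.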
  • 33.
    What runs on top of all this
  • 34.
    PageRank: Intuition
    Diagram: a link graph over pages A to J, annotated "Shouldn't E's vote be worth more than F's?" and "How many levels should we consider?"
    • Imagine a contest for The Web's Best Page
      - Initially, each page has one vote
      - Each page votes for all the pages it has a link to
      - To ensure fairness, pages voting for more than one page must split their vote equally between them
      - Voting proceeds in rounds; in each round, each page has the number of votes it received in the previous round
      - In practice, it's a little more complicated - but not much!
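The voting contest can be written down almost literally. This is a sketch of the simplified intuition only, not production PageRank: the three-page graph is hypothetical, and the code assumes every page has at least one outgoing link and only links to pages in the graph.

```python
def vote_rounds(links, rounds=50):
    """Run the voting contest: links maps each page to the pages it links to."""
    votes = {page: 1.0 for page in links}  # initially, each page has one vote
    for _ in range(rounds):
        received = {page: 0.0 for page in links}
        for page, targets in links.items():
            share = votes[page] / len(targets)  # split the vote equally
            for target in targets:
                received[target] += share
        votes = received  # next round uses the votes just received
    return votes


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
votes = vote_rounds(links, rounds=100)
for page in sorted(votes):
    print(page, round(votes[page], 3))
```

The total number of votes stays constant from round to round; only their distribution changes until it settles.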
  • 35.
    Random Surfer Model
    • PageRank has an intuitive basis in random walks on graphs
    • Imagine a random surfer, who starts on a random page and, in each step,
      - with probability d, clicks on a random link on the page
      - with probability 1-d, jumps to a random page (bored?)
    • The PageRank of a page can be interpreted as the fraction of steps the surfer spends on the corresponding page
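The random surfer can be simulated directly. A minimal sketch under stated assumptions: the graph, `d = 0.85`, the step count and the fixed seed are illustrative choices, and a page without outgoing links is treated as an automatic jump to a random page.

```python
import random


def random_surfer(links, d=0.85, steps=100_000, seed=0):
    """Estimate each page's rank as the fraction of steps spent on it."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    page = rng.choice(pages)   # start on a random page
    for _ in range(steps):
        visits[page] += 1
        if links[page] and rng.random() < d:
            page = rng.choice(links[page])  # click a random link on the page
        else:
            page = rng.choice(pages)        # bored: jump to a random page
    return {p: n / steps for p, n in visits.items()}


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
fractions = random_surfer(links)
for page in sorted(fractions):
    print(page, round(fractions[page], 3))
```

The estimated fractions sum to 1, and pages that receive more (or better-placed) links collect a larger share of the surfer's time.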
  • 36.
    BUILD YOUR OWN GOOGLE
    The Basic Open Source Tools
  • 37.
    The Google Stack (vs Yahoo'ish/Open Source)
    Conceptual overview, Google vs. Open Source, layer by layer:
    • Applications. Google: Search, Index, Crawl, Gmail, App Engine. Open Source: client application.
    • Languages. Google: Python, Java, C++, Sawzall, other. Open Source: Pig Latin, Python, PHP, Java .... anything.
    • Task queue. Google: GWQ. Open Source: Job Tracker.
    • Framework. Google: Google's "Secret Sauce" (Mapreduce, BigTable, Chubby Lock). Open Source: Hadoop framework (Hadoop Mapreduce, Hbase as the Bigtable equivalent); other tools such as crawlers and indexers are readily available.
    • File system. Google: GFS / GFS II. Open Source: HDFS (Hadoop).
    • Operating system. Google: RHEL 2.6.X PAE. Open Source: CentOS 2.6.X PAE.
    • Both stacks sit on: interior network (IPv6), server hardware, rack, DC, exterior network.
  • 38.
    END (Thank you)
  • 39.
    Pre Presentation
    The Google Philosophy (according to ed)
    • Jedis build their own lightsabres (the MS Eat your own Dog Food)
    • Parallelize Everything
    • Distribute Everything (to atomic level if possible)
    • Compress Everything (CPU cheaper than bandwidth)
    • Secure Everything (you can never be too paranoid)
    • Cache (almost) Everything
    • Redundantize Everything (in triplicate usually)
    • Latency is VERY evil
  • 40.
    Special Thanks to…
    The Anatomy of the Google Architecture, "The unofficial Version", V1.0, November 2009
    • Ed Austin
    • {ed, edik} @i-dot.com
  • 42.
    Keep Learning
    For any suggestions on topics, feedback, etc., contact OpenTalk@aditi.com