Learning and Development presents

Open Talk Series

A series of illuminating talks and interactions that open our minds to new ideas and concepts, that make us look for newer or better ways of doing what we did, or that point us to exciting things we have never done before. A range of topics on Technology, Business, Fun and Life.

Be part of the learning experience at Aditi.

Join the talks. It's free.
Free as in freedom at work, not free beer.

It's not training. It's a mind-opener.

Speak at these events, or bring an expert/friend to talk.
Mail OpenTalk@aditi.com with topic and availability.

Usually at 4.30PM Wednesdays.
HOW TO ENJOY AN OPEN TALK

        • Bring coffee & friends
        • Switch OFF mobile
        • Switch ON mind
        • Sign attendance sheet
        • SHARE your wisdom
        • QUESTION notions
        • THANK the Talker
        • SPREAD the good word
New Champion

Sahil Sagar

Aditi Technologies | Partnering Innovation
Agenda

        • We are not talking about crawlers

        • No discussion on PageRank… maybe?




The art of scale

[Figure: scaling stages at 10-50 users, 100-500 users and 500-10000 users]
Scale?

800,000 machines

Largest Linux base
• What gives us this scale?
        – Good code?
        – More servers?
        – Powerful servers?
• Let's see what gives Google the scale

Architecture, top to bottom:
        – The apps on top of it: Google Apps (Search, Index, Crawl, Gmail...) in Python, Java, C++, Sawzall and other languages; Google App Engine in Python, Java, C++
        – GWQ
        – The Secret Sauce: Mapreduce, BigTable, Chubby Lock (App Engine also sits on BigTable)
        – Infrastructure: GFS / GFS II, interior network (IPv6), RHEL 2.6.X PAE, server hardware, rack, DC, exterior network
Scale in Google

[Figure: architecture stack diagram, with the five topics below placed beside the layers they cover]

        1.   The first touch
        2.   Size does matter
        3.   The Safe
        4.   Operating System Implementation
        5.   Interior Network Architecture
The first touch to the services




The first touch to the service

[Figure: architecture stack diagram alongside the path a request takes into Google]

Components, roughly from the client inwards:
        • Client browser, connecting on ports 80/443
        • Firewall (80/443)
        • DMZ / perimeter: NetScaler (HTTP multiplexing) and Squid reverse proxy
        • Firewall
        • GWS web server farm
        • Cell: interior network, GFS II etc.
The touch is not always real

[Figure: architecture stack diagram, with requests on ports 80/443 reaching a Squid reverse proxy]

        • Uses Squid reverse proxy
        • Perimeter cache hit rates of 30-60% = huge!
        • Hit rates depend on search complexity, user preferences and traffic type
        • All image thumbnails and much multimedia are cached
        • Expensive common queries (common words like ‘Obama’) are cached, as they require significant back-end processing
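Squid is configured rather than programmed, but the idea of caching expensive common queries at the perimeter fits in a few lines. This is a minimal Python sketch under stated assumptions: `QueryCache` and the back-end function are hypothetical, queries are normalised by case and whitespace, and least-recently-used eviction is used as one common policy, not necessarily the one Google's perimeter caches used.

```python
from collections import OrderedDict


class QueryCache:
    """Hypothetical LRU cache in front of an expensive back-end query function."""

    def __init__(self, backend, capacity=1000):
        self.backend = backend        # expensive call: query string -> result
        self.capacity = capacity      # maximum number of cached queries
        self.entries = OrderedDict()  # query -> result, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, query):
        key = " ".join(query.lower().split())  # 'Obama' and ' obama ' share an entry
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        self.misses += 1
        result = self.backend(key)             # miss: do the back-end work once
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    calls = []

    def slow_backend(query):
        calls.append(query)                    # record every real back-end call
        return "results for " + query

    cache = QueryCache(slow_backend, capacity=2)
    for q in ["Obama", "obama", "weather", " Obama "]:
        cache.get(q)
    print(calls, cache.hit_rate())             # ['obama', 'weather'] 0.5
```

Half of the lookups in the usage example never reach the back end, the same effect as the perimeter hit rates on the slide, just at toy scale.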
Size does matter




Worldwide Data Centres

[Figure: architecture stack diagram]

The last estimates were 36 data centres, 300+ GFS II clusters and upwards of 800K machines.
The Modular Data Centre

[Figure: architecture stack diagram]

A standard Google modular DC (cell) holds 1160 servers with 250 kW power consumption in 30 racks (40U).

This is the “atomic” data centre building block of Google.

A data centre would consist of hundreds of modular cells.
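The cell figures imply a few useful per-rack and per-server numbers. A back-of-the-envelope check in Python, assuming power and servers are spread evenly over the 30 racks (the slide does not say so):

```python
# Figures for one modular cell, taken from the slide
CELL_SERVERS = 1160
CELL_POWER_W = 250_000   # 250 kW
CELL_RACKS = 30

servers_per_rack = CELL_SERVERS / CELL_RACKS        # ~38.7 servers per rack
watts_per_server = CELL_POWER_W / CELL_SERVERS      # ~216 W per server
kw_per_rack = CELL_POWER_W / CELL_RACKS / 1000      # ~8.3 kW per rack

print(f"{servers_per_rack:.1f} servers per rack")
print(f"{watts_per_server:.0f} W per server")
print(f"{kw_per_rack:.1f} kW per rack")
```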
THE Safe

                                       How is a server stored in the Data Centre?




Google Rack (GOOG rack)

[Figure: architecture stack diagram]

EVERYTHING custom!
        • Optimized motherboards
        • Have their own HW builds
        • Build redundancy on top of failure
        • Motherboard directly mounted into rack
        • Servers have no casing, just bare boards, which assists with heat dispersal
THE OPERATING SYSTEM

                                      The Core Software on each of those servers




OPERATING SYSTEM

[Figure: architecture stack diagram]

- 100% Red Hat Linux based since the 1998 inception
        - RHEL
        - 2.6.X kernel
        - PAE
        - Custom glibc, rpc, ipvs...
        - Custom FS (GFS II)
        - Custom Kerberos
        - Custom NFS
        - Custom CUPS
        - Custom gPXE bootloader
        - Custom EVERYTHING...

Kernel/Subsystem Modifications
        • tcmalloc: replaces the glibc 2.3 malloc; much faster, and works very well with threads
        • rpc: the rpc layer is extensively modified for higher performance and lower latency (52%/40%)
        • Significantly modified kernel and subsystems, all IPv6 enabled
THE Secret Sauce




Section II – Google's Major Glue

[Figure: architecture stack diagram]

        1. Google File System Architecture – GFS II
        2. Google Database – Bigtable
        3. Google Computation – Mapreduce
GOOGLE FILE SYSTEM

                         Manages the underlying Data on behalf of the upper layers
                                     and ultimately the applications




GFS versus NFS

Network File System (NFS)
        • Single machine makes part of its file system available to other machines
        • Sequential or random access
        • PRO: Simplicity, generality, transparency
        • CON: Storage capacity and throughput limited by single server

Google File System (GFS)
        • Single virtual file system spread over many machines
        • Optimized for sequential read and local accesses
        • PRO: High throughput, high capacity
        • "CON": Specialized for particular types of applications

(University of Pennsylvania)
FILE SYSTEM I – GFS II

[Figure: architecture stack diagram]

        • Elegant master failover
        • Chunk size is now 1MB
        • Only ever lost one 64MB chunk (in GFS I) during its entire production deployment, so it is assumed to be extremely reliable
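Because chunks have a fixed size, a client can work out by itself which chunk holds any byte of a file; in the published GFS (I) design it then asks the master only for that chunk's handle and replica locations. A minimal Python sketch of the offset arithmetic, assuming the chunk sizes quoted on this slide; `locate` and `chunks_touched` are hypothetical helpers, not a GFS API.

```python
from typing import Tuple

GFS_I_CHUNK = 64 * 1024 * 1024   # 64MB chunks in GFS I
GFS_II_CHUNK = 1 * 1024 * 1024   # 1MB chunks in GFS II, as quoted on the slide


def locate(offset: int, chunk_size: int) -> Tuple[int, int]:
    """Map a byte offset within a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


def chunks_touched(offset: int, length: int, chunk_size: int) -> range:
    """Indices of the chunks that a read of `length` bytes at `offset` spans."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


if __name__ == "__main__":
    read = 100 * 1024 * 1024  # a 100MB sequential read from the start of a file
    print(len(chunks_touched(0, read, GFS_I_CHUNK)))   # 2
    print(len(chunks_touched(0, read, GFS_II_CHUNK)))  # 100
```

With 1MB chunks the same 100MB read spans 100 chunks instead of 2, so there is far more per-chunk metadata to keep track of for the same data.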
CAP Theorem (Brewer's theorem)

       • Consistency: All nodes see the same data at the same
         time
       • Availability: Node failures do not prevent survivors
         from continuing to operate
       • Partition tolerance: The system continues to operate
         despite arbitrary message loss



GOOGLE DATABASE

                         Accesses the underlying Data on behalf of the upper layers
                                      and ultimately the applications




Why not commercial DB?
       • Scale is too large for most commercial databases
       • Cost would be very high
              – Building internally means system can be applied
                across many projects for low incremental cost
       • Low-level storage optimizations help
         performance significantly
              – Much harder to do when running on top of a database
                layer
             “Also fun and challenging to build large-scale systems”
BigTable
       • A distributed storage system for managing structured data.
       • Scalable
              –   Thousands of servers
              –   Terabytes of in-memory data
              –   Petabytes of disk-based data
              –   Millions of reads/writes per second, efficient scans
       • Self-managing
          – Servers can be added/removed dynamically
          – Servers adjust to load imbalance
       • Used for many Google projects
              – Web indexing, Personalized Search, Google Earth, Google Analytics,
                Google Finance, …

BigTable




         •    Physically sorted on row-key – like a row-store
         •    Column families - like column-stores
         •    Variable (record-by-record) columns within a column family
         •    Column-values versioned; stored in reverse chronological order
         •    Designed to store the hyperlink structure of the web
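These bullets describe a sparse, sorted, versioned map. A toy in-memory Python sketch makes the shape concrete. The class and method names are hypothetical (this is not Bigtable's API), there is no persistence, splitting or distribution, and the reversed-URL row keys and `contents:`/`anchor:` families follow the Webtable example from the Bigtable paper.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

Cells = Dict[str, List[Tuple[int, str]]]  # column -> [(timestamp, value), ...]


class TinyTable:
    """Toy model of the Bigtable data model (hypothetical, not Bigtable's API).

    Conceptually a sorted map: (row key, "family:qualifier", timestamp) -> value.
    """

    def __init__(self, families: List[str]) -> None:
        self.families = set(families)   # column families are declared up front
        self.row_keys: List[str] = []   # row keys kept in sorted order
        self.rows: Dict[str, Cells] = {}

    def put(self, row: str, column: str, ts: int, value: str) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)  # keep rows sorted on row key
            self.rows[row] = {}
        # Any qualifier is allowed, so each row can have its own set of columns
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda v: v[0], reverse=True)  # newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the most recent value of a cell, or None if it is empty."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, end: str) -> Iterator[Tuple[str, Cells]]:
        """Yield (row key, cells) for start <= row key < end, in key order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, end)
        for key in self.row_keys[lo:hi]:
            yield key, self.rows[key]


if __name__ == "__main__":
    # Reversed-URL row keys keep pages from the same domain next to each other
    t = TinyTable(["contents", "anchor"])
    t.put("com.cnn.www", "contents:", 3, "<html>v3")
    t.put("com.cnn.www", "contents:", 5, "<html>v5")
    t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
    t.put("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com")
    t.put("com.bbc.www", "contents:", 4, "<html>bbc")
    print(t.get("com.cnn.www", "contents:"))            # <html>v5
    print([key for key, _ in t.scan("com.", "com.d")])  # ['com.bbc.www', 'com.cnn.www']
```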



GOOGLE MAPREDUCE

                         Computes the underlying Data on behalf of the applications




Mapreduce I

[Figure: architecture stack diagram, and a Map Reduction pipeline diagram]

Map Reduction can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing them on multiple processors.

The major functions are MAP and REDUCE (with a number of intermediary steps):
        • MAP: break the task down into parallel steps
        • REDUCE: combine the results into the final output

Shown is a 2-pipeline Map Reduction (there are 24 Map Reductions in the indexing pipeline).
Mappers and reducers usually run on separate processors (even with a 90% loss of reducers, the job still completed!).
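The MAP, intermediate and REDUCE steps can be sketched in a single process. This is a minimal Python sketch, not Google's MapReduce: `map_reduce` and the example job are hypothetical, threads stand in for separate machines (Python threads give concurrency, not true CPU parallelism), and failed tasks are not re-executed, which is the mechanism the MapReduce paper describes for surviving worker failures. The example job builds a small inverted index, the kind of step an indexing pipeline chains together.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(inputs, mapper, reducer, workers=4):
    """Run a toy MapReduce job in one process (threads stand in for machines)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # MAP: every input record is processed independently
        mapped = pool.map(lambda record: list(mapper(*record)), inputs)

        # Intermediate step (shuffle): group all emitted values by key
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # REDUCE: every key is combined independently into the final output
        keys = sorted(groups)
        reduced = pool.map(lambda key: reducer(key, groups[key]), keys)
        return dict(zip(keys, reduced))


# Example job: a tiny inverted index (word -> documents containing it)
def index_mapper(doc_id, text):
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reducer(word, doc_ids):
    return sorted(doc_ids)


if __name__ == "__main__":
    docs = [(1, "the quick fox"), (2, "the lazy dog"), (3, "quick quick dog")]
    print(map_reduce(docs, index_mapper, index_reducer))
    # {'dog': [2, 3], 'fox': [1], 'lazy': [2], 'quick': [1, 3], 'the': [1, 2]}
```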
Word-Count using MapReduce
       Problem: determine the frequency of each word in a large
         document collection
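Word count is the canonical MapReduce example: the mapper emits a (word, 1) pair for each occurrence and the reducer sums the pairs for each word. A minimal single-process Python sketch, assuming a simple lowercase tokeniser; the function names are hypothetical and sorting stands in for the distributed shuffle.

```python
import re
from itertools import groupby
from operator import itemgetter


def wc_map(doc):
    """Map: emit (word, 1) for every word occurrence in one document."""
    for word in re.findall(r"[a-z']+", doc.lower()):
        yield word, 1


def wc_reduce(word, counts):
    """Reduce: add up all the partial counts emitted for one word."""
    return word, sum(counts)


def word_count(docs):
    # Map every document; sorting by word stands in for the distributed shuffle
    pairs = sorted((pair for doc in docs for pair in wc_map(doc)), key=itemgetter(0))
    # Each run of equal words goes to one reduce call
    return dict(
        wc_reduce(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(word_count(["the cat sat", "The cat ate the rat"]))
    # {'ate': 1, 'cat': 2, 'rat': 1, 'sat': 1, 'the': 3}
```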




What runs on top of all this



PageRank: Intuition

[Figure: example link graph of pages A to J]

Shouldn't E's vote be worth more than F's?
How many levels should we consider?

            • Imagine a contest for The Web's Best Page
                   – Initially, each page has one vote
                   – Each page votes for all the pages it has a link to
                   – To ensure fairness, pages voting for more than one page must
                     split their vote equally between them
                   – Voting proceeds in rounds; in each round, each page has the
                     number of votes it received in the previous round
                   – In practice, it's a little more complicated - but not much!
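The voting rounds translate almost line for line into code. A minimal Python sketch, assuming a hypothetical three-page web in which every page has at least one outgoing link; the "little more complicated" parts of real PageRank (damping, pages with no outgoing links) are left out.

```python
def vote_rounds(links, rounds):
    """Simplified PageRank contest with no damping factor.

    links maps each page to the pages it links to. Every page starts with one
    vote; in each round it splits the votes it received in the previous round
    equally among the pages it links to. Assumes every page has an outgoing link.
    """
    votes = {page: 1.0 for page in links}
    for _ in range(rounds):
        new_votes = {page: 0.0 for page in links}
        for page, targets in links.items():
            share = votes[page] / len(targets)   # split the vote equally
            for target in targets:
                new_votes[target] += share
        votes = new_votes
    return votes


if __name__ == "__main__":
    # Hypothetical three-page web: A links to B and C, B to C, C back to A
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(vote_rounds(web, 1))    # {'A': 1.0, 'B': 0.5, 'C': 1.5}
    print(vote_rounds(web, 100))  # close to A = 1.2, B = 0.6, C = 1.2
```

C ends up level with A and ahead of B because it collects votes from two pages, while B receives only half of A's vote because A splits its vote two ways.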
Random Surfer Model
               • PageRank has an intuitive basis in random walks
                 on graphs

               • Imagine a random surfer, who starts on a random
                 page and, in each step,
                      – with probability d, clicks on a random link on the page
                      – with probability 1-d, jumps to a random page (bored?)

               • The PageRank of a page can be interpreted as the
                 fraction of steps the surfer spends on the
                 corresponding page
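In the random surfer model with N pages, a page's rank is the chance of landing there by a random jump plus the chance of arriving over a link. Written out, assuming every page has at least one outgoing link, with L(q) the number of links on page q and q → p meaning q links to p (notation introduced here, not on the slide):

$$PR(p) = \frac{1-d}{N} + d \sum_{q \to p} \frac{PR(q)}{L(q)}$$

If the ranks sum to 1, summing the right-hand side over all pages gives (1 - d) + d = 1 again, so the ranks stay a probability distribution, consistent with reading PR(p) as the fraction of steps the surfer spends on p. The original PageRank paper suggests d = 0.85.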
BUILD YOUR OWN GOOGLE

                                             The Basic Open Source Tools




The Google Stack (vs Yahoo’ish/Open Source)

Conceptual overview: Google vs. Open Source, layer by layer from the top (Google → Open Source):
        • Google Apps (Search, Index, Crawl, Gmail...) and App Engine → Client application
        • Python, Java, C++, Sawzall, other (App Engine: Python, Java, C++) → Pig Latin, Python, PHP, Java... anything
        • GWQ (App Engine: Task Queue) → Job Tracker
        • Google's Secret Sauce: Mapreduce, BigTable, Chubby Lock (App Engine: BigTable) → Hadoop framework: Mapreduce, Hbase (Bigtable equiv.)
        • GFS / GFS II → HDFS (Hadoop)
        • Interior network IPv6 → Interior network IPv6
        • RHEL 2.6.X PAE → CentOS 2.6.X PAE
        • Server hardware, rack, DC, exterior network → the same

Other open source tools, such as crawlers and indexers, are readily available.
END

                                             (Thank you)




Pre Presentation
The Google Philosophy (according to Ed)

       •    Jedis build their own lightsabres (the MS "eat your own dog food")
       •    Parallelize Everything
       •    Distribute Everything (to atomic level if possible)
       •    Compress Everything (CPU cheaper than bandwidth)
       •    Secure Everything (you can never be too paranoid)
       •    Cache (almost) Everything
       •    Redundantize Everything (in triplicate usually)
       •    Latency is VERY evil




Special Thanks to…

           The Anatomy of the Google Architecture
           “The Unofficial Version”
           V1.0 November 2009

           • Ed Austin
           • {ed, edik} @i-dot.com




Keep Learning
For any suggestions on topics, feedback etc.,
        contact OpenTalk@aditi.com

Introducing Pico - Object Detection & Analytics using Docker, IoT & Amazon Re...
 
Simplifying Real Time Data Analytics with Docker, IoT & Cloud
Simplifying Real Time Data Analytics with Docker, IoT & CloudSimplifying Real Time Data Analytics with Docker, IoT & Cloud
Simplifying Real Time Data Analytics with Docker, IoT & Cloud
 
Java User Group Freiburg - Internet of Things für Java-Entwickler
Java User Group Freiburg - Internet of Things für Java-EntwicklerJava User Group Freiburg - Internet of Things für Java-Entwickler
Java User Group Freiburg - Internet of Things für Java-Entwickler
 
AWS IoT Lab Introduction
AWS IoT Lab IntroductionAWS IoT Lab Introduction
AWS IoT Lab Introduction
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
Deploying deep learning models with Docker and Kubernetes
Deploying deep learning models with Docker and KubernetesDeploying deep learning models with Docker and Kubernetes
Deploying deep learning models with Docker and Kubernetes
 
Platform Engineering using GitOps, Boston Kubernetes Meetup
Platform Engineering using GitOps, Boston Kubernetes MeetupPlatform Engineering using GitOps, Boston Kubernetes Meetup
Platform Engineering using GitOps, Boston Kubernetes Meetup
 
Fosec2011 keynote address
Fosec2011 keynote addressFosec2011 keynote address
Fosec2011 keynote address
 
Alleantia le web startup competition 2012 ssh
Alleantia   le web startup competition 2012 sshAlleantia   le web startup competition 2012 ssh
Alleantia le web startup competition 2012 ssh
 
Keeping Your Internet Business IT Asset Light By Mandar Kulkarni
Keeping Your Internet Business IT Asset Light By Mandar KulkarniKeeping Your Internet Business IT Asset Light By Mandar Kulkarni
Keeping Your Internet Business IT Asset Light By Mandar Kulkarni
 
Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes Democratizing Data Science on Kubernetes
Democratizing Data Science on Kubernetes
 
Intel IT OpenStack Journey - OpenStack Fall 2012 Summit.pdf
Intel IT OpenStack Journey - OpenStack Fall 2012 Summit.pdfIntel IT OpenStack Journey - OpenStack Fall 2012 Summit.pdf
Intel IT OpenStack Journey - OpenStack Fall 2012 Summit.pdf
 
Kernel Con 2022: Securing Cloud Native Workloads
Kernel Con 2022: Securing Cloud Native WorkloadsKernel Con 2022: Securing Cloud Native Workloads
Kernel Con 2022: Securing Cloud Native Workloads
 
Build Smart Service on GCP - Google DevFest 2018 Taiwan
Build Smart Service on GCP - Google DevFest 2018 TaiwanBuild Smart Service on GCP - Google DevFest 2018 Taiwan
Build Smart Service on GCP - Google DevFest 2018 Taiwan
 
About imaginea2013
About imaginea2013About imaginea2013
About imaginea2013
 
About Imaginea, A Product Engineering company
About Imaginea, A Product Engineering companyAbout Imaginea, A Product Engineering company
About Imaginea, A Product Engineering company
 

Mais de HARMAN Services

Mais de HARMAN Services (20)

3 Dimensions Of Transformation
3 Dimensions Of Transformation3 Dimensions Of Transformation
3 Dimensions Of Transformation
 
Testing Strategies to Deliver Consistent App Performance
Testing Strategies to Deliver Consistent App Performance Testing Strategies to Deliver Consistent App Performance
Testing Strategies to Deliver Consistent App Performance
 
How to Manage APIs in your Enterprise for Maximum Reusability and Governance
How to Manage APIs in your Enterprise for Maximum Reusability and GovernanceHow to Manage APIs in your Enterprise for Maximum Reusability and Governance
How to Manage APIs in your Enterprise for Maximum Reusability and Governance
 
Digital Transformation: Connected API Ecosystems
Digital Transformation: Connected API EcosystemsDigital Transformation: Connected API Ecosystems
Digital Transformation: Connected API Ecosystems
 
Webinar - Transforming Manufacturing with IoT
Webinar - Transforming Manufacturing with IoTWebinar - Transforming Manufacturing with IoT
Webinar - Transforming Manufacturing with IoT
 
Microsoft Azure Explained - Hitesh D Kesharia
Microsoft Azure Explained - Hitesh D KeshariaMicrosoft Azure Explained - Hitesh D Kesharia
Microsoft Azure Explained - Hitesh D Kesharia
 
15 Big Data Billionaires
15 Big Data Billionaires15 Big Data Billionaires
15 Big Data Billionaires
 
Digital Transformation in Travel
Digital Transformation in TravelDigital Transformation in Travel
Digital Transformation in Travel
 
Digital Transformation in Retail
Digital Transformation in RetailDigital Transformation in Retail
Digital Transformation in Retail
 
Digital Transformation in Media
Digital Transformation in MediaDigital Transformation in Media
Digital Transformation in Media
 
Digital Transformation in Hospitality
Digital Transformation in HospitalityDigital Transformation in Hospitality
Digital Transformation in Hospitality
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
 
Top LinkedIn Influencers Every CIO Must Follow
Top LinkedIn Influencers Every CIO Must Follow Top LinkedIn Influencers Every CIO Must Follow
Top LinkedIn Influencers Every CIO Must Follow
 
Ladbrokes and Aditi - Digital Transformation Case study
Ladbrokes and Aditi - Digital Transformation Case study Ladbrokes and Aditi - Digital Transformation Case study
Ladbrokes and Aditi - Digital Transformation Case study
 
How Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
How Internet of Things (IoT) is Reshaping the Automotive Sector - InfographicHow Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
How Internet of Things (IoT) is Reshaping the Automotive Sector - Infographic
 
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
Finding the important bugs- A talk by John Scarborough, Director of Testing, ...
 
Analyzing Gartner's CIO Study: Fliping to Digital Leadership
Analyzing Gartner's CIO Study: Fliping to Digital Leadership Analyzing Gartner's CIO Study: Fliping to Digital Leadership
Analyzing Gartner's CIO Study: Fliping to Digital Leadership
 
24 Connected Car features to look out for before the release of Bond 24
24 Connected Car features to look out for before the release of Bond 2424 Connected Car features to look out for before the release of Bond 24
24 Connected Car features to look out for before the release of Bond 24
 
Webinar: How I Met Your Connected Customer
Webinar: How I Met Your Connected CustomerWebinar: How I Met Your Connected Customer
Webinar: How I Met Your Connected Customer
 
5 Takeaways From The UX India Conference
5 Takeaways From The UX India Conference5 Takeaways From The UX India Conference
5 Takeaways From The UX India Conference
 

Último

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Google Architecture - Breaking it Open

  • 1. Learning and Development presents the Open Talk Series: a series of illuminating talks and interactions that open our minds to new ideas and concepts, that make us look for newer or better ways of doing what we did, or that point us to exciting things we have never done before. A range of topics on Technology, Business, Fun and Life. Be part of the learning experience at Aditi. Join the talks. It's free (free as in freedom at work, not free beer). It's not training; it's a mind-opener. Speak at these events, or bring an expert/friend to talk: mail OpenTalk@aditi.com with topic and availability. Usually at 4.30PM Wednesdays.
  • 2. How to enjoy a talk: bring coffee & friends; switch OFF mobile; switch ON mind; sign the attendance sheet; SHARE your wisdom; QUESTION notions; THANK the talker; SPREAD the good word.
  • 3. New Champion: Sahil Sagar. Aditi Technologies | Partnering Innovation
  • 4. Agenda: we are not talking about the crawler; no discussion of PageRank… maybe?
  • 5. The art of scale: 10-50 users, 100-500 users, 500-10,000 users.
  • 6. Scale? 800,000 machines: the largest Linux base.
  • 7. What gives us this scale? Good code? More servers? Powerful servers?
  • 8. Let's see what gives Google the scale. The architecture stack, top to bottom:
    – The apps on top of it: Google Apps (Gmail...), Search, Google App Engine, Index, Crawl; written in Python, Java, C++, Sawzall and others
    – The secret sauce: GWQ, MapReduce, BigTable, Chubby Lock, GFS / GFS II
    – Infrastructure: interior network (IPv6), RHEL 2.6.x kernel with PAE, server hardware, rack, data centre (DC), exterior network
  • 9. Scale in the Google architecture, layer by layer: 1. The first touch; 2. Size does matter; 3. The safe; 4. Operating system implementation; 5. Interior network architecture.
  • 10. The first touch to the services
  • 11. The first touch to the service: the client browser connects on ports 80/443 through the perimeter firewall into the DMZ, where a Squid reverse proxy and NetScaler HTTP multiplexing front the GWS (Google Web Server) web server farm, which in turn reaches the cell's interior network (BigTable, GFS II, etc.).
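The slide lists the front-end hops but not how traffic is spread across the GWS farm. Below is a minimal sketch. It assumes plain round-robin dispatch as a swapped-in policy: the slide only says NetScaler does HTTP multiplexing, so this is not NetScaler's algorithm. The backend names such as gws-1 are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Sends each incoming request to the next web server in a fixed rotation.

    Hypothetical stand-in for the load-balancing tier in front of the GWS farm.
    """

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend server")
        self.backends = list(backends)
        self._rotation = itertools.cycle(self.backends)

    def route(self, path: str) -> str:
        # Choose the next server in rotation and return "<server><path>".
        return f"{next(self._rotation)}{path}"


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["gws-1", "gws-2", "gws-3"])
    for query in ["/search?q=a", "/search?q=b", "/search?q=c", "/search?q=d"]:
        print(balancer.route(query))
```

The fourth request wraps around to gws-1. Round-robin needs no per-request state beyond the rotation position. A real balancer would also skip unhealthy servers and account for open connections.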
  • 12. The touch is not always real:
    – Uses a Squid reverse proxy
    – Perimeter cache hit rates of 30-60%: huge!
    – The hit rate depends on search complexity, user preferences and traffic type
    – All image thumbnails are cached, and much multimedia is cached
    – Expensive common queries (common words like 'Obama') are cached because they require significant back-end processing
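The cache-hit idea can be illustrated in-process. This is a minimal sketch, not Squid's implementation. It assumes a least-recently-used eviction policy and simple query normalisation. It also assumes a hypothetical `backend` callable that stands in for the expensive back-end search.

```python
from collections import OrderedDict
from typing import Callable


class QueryCache:
    """Fixed-size LRU cache placed in front of an expensive back-end call."""

    def __init__(self, backend: Callable[[str], str], capacity: int = 1024) -> None:
        self.backend = backend
        self.capacity = capacity
        self._store: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> str:
        key = query.strip().lower()  # "Obama" and "obama " share one entry
        if key in self._store:
            self._store.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self.backend(key)  # the costly back-end processing
        self._store[key] = result
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


if __name__ == "__main__":
    def slow_search(query: str) -> str:
        return f"results for {query}"  # pretend this is costly back-end work

    cache = QueryCache(slow_search, capacity=2)
    for q in ["Obama", "obama ", "weather", "Obama"]:
        cache.get(q)
    print(f"hit rate: {cache.hit_rate:.0%}")  # 2 hits out of 4 lookups
```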
  • 13. Size does matter
  • 14. Worldwide data centres: the last estimate was 36 data centres, 300+ GFS II clusters and upwards of 800K machines.
  • 15. The modular data centre: a standard Google modular DC (cell) holds 1,160 servers drawing 250 kW in 30 racks (40U). This is the "atomic" building block of a Google data centre; a data centre would consist of hundreds of modular cells.
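The cell figures imply some useful per-rack and per-server numbers. This small calculation uses only the slide's numbers (1,160 servers, 250 kW, 30 racks of 40U). It assumes the load is spread evenly.

```python
# Figures from the slide for one modular cell.
SERVERS_PER_CELL = 1160
POWER_KW_PER_CELL = 250
RACKS_PER_CELL = 30
RACK_UNITS = 40  # 40U racks


def cell_ratios() -> dict[str, float]:
    """Derive per-rack and per-server averages for one modular cell."""
    return {
        "servers_per_rack": SERVERS_PER_CELL / RACKS_PER_CELL,
        "kw_per_rack": POWER_KW_PER_CELL / RACKS_PER_CELL,
        "watts_per_server": POWER_KW_PER_CELL * 1000 / SERVERS_PER_CELL,
        "rack_units_per_server": RACK_UNITS * RACKS_PER_CELL / SERVERS_PER_CELL,
    }


if __name__ == "__main__":
    for name, value in cell_ratios().items():
        print(f"{name}: {value:.2f}")
```

This prints 38.67 servers and 8.33 kW per rack, 215.52 W per server and 1.03 rack units per server. In other words, a cell holds roughly one bare board per rack unit.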
  • 16. The safe: how is a server stored in the data centre?
  • 17. Google rack (GOOG rack): EVERYTHING custom!
    – Optimized motherboards
    – Their own hardware builds
    – Redundancy built on top of failure
    – Motherboards mounted directly into the rack
    – Servers have no casing, just bare boards, which assists with heat dispersal
  • 18. The operating system: the core software on each of those servers
  • 19. Operating system:
    – 100% Red Hat Linux based since the 1998 inception: RHEL, 2.6.x kernel, PAE
    – Custom glibc, rpc, ipvs...; custom file system (GFS II); custom Kerberos; custom NFS; custom CUPS; custom gPXE bootloader; custom EVERYTHING
    – Kernel/subsystem modifications: tcmalloc replaces the glibc 2.3 malloc (much faster, and works very well with threads); the RPC layer was extensively modified for more performance and less latency (52% / 40%); the kernel and subsystems are significantly modified and all IPv6 enabled
  • 20. The secret sauce
  • 21. Section II: Google's major glue: 1. Google File System architecture (GFS II); 2. Google database (BigTable); 3. Google computation (MapReduce).
  • 22. Google File System: manages the underlying data on behalf of the upper layers and, ultimately, the applications
  • 23. GFS versus NFS (University of Pennsylvania)
    Network File System (NFS):
    – A single machine makes part of its file system available to other machines
    – Sequential or random access
    – PRO: simplicity, generality, transparency
    – CON: storage capacity and throughput limited by a single server
    Google File System (GFS):
    – A single virtual file system spread over many machines
    – Optimized for sequential reads and local accesses
    – PRO: high throughput, high capacity
    – "CON": specialized for particular types of applications
  • 24. File system I: GFS II
    – Elegant master failover
    – Chunk size is now 1 MB
    – Only ever lost one 64 MB chunk (in GFS I) during its entire production deployment, so it is assumed to be extremely reliable
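With fixed-size chunks, a client can work out which chunk holds a given byte by integer division before asking the master where that chunk lives. Below is a minimal sketch. It assumes the slide's 1 MB GFS II chunk size and three replicas, taking "in triplicate, usually" from the Google philosophy slide. The hash-based `place_replicas` is a hypothetical stand-in, since the slide does not say how GFS II chooses chunkservers.

```python
import hashlib

CHUNK_SIZE = 1 * 1024 * 1024  # GFS II chunk size per the slide (GFS I used 64 MB)
REPLICAS = 3                  # "in triplicate, usually"


def chunk_index(offset: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Map a byte offset within a file to the index of the chunk holding it."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset // chunk_size


def place_replicas(path: str, index: int, servers: list[str],
                   replicas: int = REPLICAS) -> list[str]:
    """Pick distinct chunkservers for one chunk (hypothetical placement).

    Hashes (path, index) to a starting server, then takes the next
    `replicas` servers in ring order so every copy lands on a different machine.
    """
    if replicas > len(servers):
        raise ValueError("not enough servers for the requested replication")
    digest = hashlib.sha256(f"{path}:{index}".encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replicas)]


if __name__ == "__main__":
    servers = [f"chunkserver-{i}" for i in range(5)]
    idx = chunk_index(5_500_000)  # byte 5,500,000 lives in chunk 5 at 1 MB chunks
    print(idx, place_replicas("/logs/web.0", idx, servers))
```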
  • 25. CAP theorem (Brewer's theorem): a distributed system cannot guarantee all three of the following at once.
    – Consistency: all nodes see the same data at the same time
    – Availability: node failures do not prevent survivors from continuing to operate
    – Partition tolerance: the system continues to operate despite arbitrary message loss
  • 26. Google database: accesses the underlying data on behalf of the upper layers and, ultimately, the applications
  • 27. Why not a commercial DB?
    – The scale is too large for most commercial databases
    – The cost would be very high; building internally means the system can be applied across many projects for a low incremental cost
    – Low-level storage optimizations help performance significantly, and are much harder to do when running on top of a database layer
    – "Also fun and challenging to build large-scale systems"
  • 28. BigTable: a distributed storage system for managing structured data
    – Scalable: thousands of servers, terabytes of in-memory data, petabytes of disk-based data, millions of reads/writes per second, efficient scans
    – Self-managing: servers can be added/removed dynamically, and servers adjust to load imbalance
    – Used for many Google projects: web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …
  • 29. BigTable data model
    – Physically sorted on row key, like a row store
    – Column families, like column stores
    – Variable (record-by-record) columns within a column family
    – Column values are versioned and stored in reverse chronological order
    – Designed to store the hyperlink structure of the web
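The data model above can be mimicked with ordinary dictionaries. This is a toy, single-process sketch built on a hypothetical `TinyTable` class, not the BigTable API. Row keys are kept sorted, column families are declared up front while qualifiers vary per row, and each cell keeps its versions newest first. The example row keys use reversed hostnames, an assumption chosen to echo the hyperlink use case.

```python
import bisect
from collections import defaultdict


class TinyTable:
    """Toy BigTable-style map: (row, "family:qualifier", timestamp) -> value."""

    def __init__(self, families: set[str], max_versions: int = 3) -> None:
        self.families = families
        self.max_versions = max_versions
        self._row_keys: list[str] = []  # kept sorted, like physical row-key order
        self._rows: dict[str, dict[str, list[tuple[int, str]]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        if row not in self._rows:
            bisect.insort(self._row_keys, row)  # keep rows in sorted order
            self._rows[row] = defaultdict(list)
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first
        del versions[self.max_versions:]  # drop versions beyond the limit

    def get(self, row: str, column: str) -> list[tuple[int, str]]:
        """All stored versions of one cell, newest first."""
        return list(self._rows.get(row, {}).get(column, []))

    def scan(self, start: str, end: str) -> list[str]:
        """Row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, end)
        return self._row_keys[lo:hi]


if __name__ == "__main__":
    table = TinyTable({"contents", "anchor"})
    table.put("com.cnn.www", "contents:", 3, "<html>v3")
    table.put("com.cnn.www", "contents:", 5, "<html>v5")
    table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
    table.put("com.example", "contents:", 1, "<html>")
    print(table.get("com.cnn.www", "contents:"))  # newest version first
    print(table.scan("com", "con"))               # both rows, in key order
```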
  • 30. Google MapReduce: processes the underlying data on behalf of the applications
  • 31. MapReduce I
    – MapReduce can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing them on multiple processors
    – The major functions are MAP and REDUCE (with a number of intermediary steps): MAP breaks the task down into parallel steps; REDUCE combines the results into the final output
    – Example: a 2-pipeline MapReduce (there are 24 MapReduces in the indexing pipeline)
    – Mappers and reducers usually run on separate processors (even with a 90% loss of reducers, the job still completed!)
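The split into MAP, an intermediate grouping (shuffle) step and REDUCE can be sketched on one machine. This is a minimal sketch that uses threads as stand-ins for separate processors. CPython threads show the structure but do not give true CPU parallelism. The inverted-index job, with `index_mapper` and `index_reducer`, is a hypothetical example of the kind of work an indexing pipeline does.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def map_reduce(
    inputs: Iterable[tuple[str, str]],
    mapper: Callable[[str, str], Iterable[tuple[K, V]]],
    reducer: Callable[[K, list[V]], R],
    workers: int = 4,
) -> dict[K, R]:
    # MAP: run the mapper on every input record in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(lambda record: list(mapper(*record)), inputs))

    # SHUFFLE (intermediate step): group every emitted value by its key.
    groups: dict[K, list[V]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # REDUCE: combine each key's values into its final output, in parallel.
    keys = list(groups)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda key: reducer(key, groups[key]), keys))
    return dict(zip(keys, results))


def index_mapper(doc_id: str, text: str) -> Iterator[tuple[str, str]]:
    """Emit (word, doc_id) once per distinct word in the document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def index_reducer(word: str, doc_ids: list[str]) -> list[str]:
    """Posting list: the sorted ids of the documents containing the word."""
    return sorted(doc_ids)


if __name__ == "__main__":
    docs = [("d1", "Google File System"), ("d2", "Google BigTable"),
            ("d3", "file system design")]
    index = map_reduce(docs, index_mapper, index_reducer)
    print(index["google"], index["file"])
```

Because reducers only see grouped values, a lost reducer can simply be rerun on its key group. This is one reason the model tolerates failures well.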
  • 32. Word count using MapReduce. Problem: determine the frequency of each word in a large document collection
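The word-count problem fits the classic map/shuffle/reduce shape. This is a sequential sketch: `wc_map`, `wc_reduce` and `word_count` are illustrative names, and the regular expression used to split words is an assumption.

```python
import re
from collections import defaultdict
from typing import Iterator


def wc_map(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word occurrence (doc_id is unused)."""
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        yield word, 1


def wc_reduce(word: str, counts: list[int]) -> int:
    """Reduce: sum the partial counts emitted for one word."""
    return sum(counts)


def word_count(docs: dict[str, str]) -> dict[str, int]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in docs.items():          # map phase: one call per document
        for word, one in wc_map(doc_id, text):
            grouped[word].append(one)          # shuffle: group values by key
    return {word: wc_reduce(word, counts)      # reduce phase: one call per word
            for word, counts in sorted(grouped.items())}


if __name__ == "__main__":
    print(word_count({"a": "the quick brown fox", "b": "The lazy dog and the fox"}))
```

In a real cluster, each `wc_map` call would run on the machine holding that document. All the (word, 1) pairs for one word would then be routed to the same reducer.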
  • 33. What runs on top of all this
  • 34. PageRank: intuition. The slide shows an example link graph of pages A-J and asks: shouldn't E's vote be worth more than F's? How many levels should we consider?
    – Imagine a contest for The Web's Best Page
    – Initially, each page has one vote
    – Each page votes for all the pages it has a link to
    – To ensure fairness, pages voting for more than one page must split their vote equally between them
    – Voting proceeds in rounds; in each round, each page has the number of votes it received in the previous round
    – In practice, it's a little more complicated, but not much!
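The voting contest can be simulated directly. This is a minimal sketch of the rounds as described, with no random jumps. A hypothetical three-page graph stands in for the slide's A-J figure, and the `rounds` parameter plays the role of the "how many levels" question.

```python
def vote_rounds(links: dict[str, list[str]], rounds: int) -> dict[str, float]:
    """Simplified PageRank voting.

    Every page starts with one vote. In each round, a page splits the votes it
    currently holds equally among the pages it links to; its new total is
    whatever it received.
    """
    for page, targets in links.items():
        for target in targets:
            if target not in links:
                raise ValueError(f"{page} links to unknown page {target}")
    votes = {page: 1.0 for page in links}
    for _ in range(rounds):
        received = {page: 0.0 for page in links}
        for page, targets in links.items():
            if not targets:
                continue  # a page with no links passes on nothing in this naive model
            share = votes[page] / len(targets)
            for target in targets:
                received[target] += share
        votes = received
    return votes


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for n in (1, 2, 50):
        print(n, vote_rounds(graph, n))
```

For this graph, the votes settle at 1.2 for A and C and 0.6 for B, while the total stays at 3. C receives half of A's vote plus all of B's, and A in turn inherits all of C's weight. This captures "a vote from a well-voted page is worth more". The lost votes of link-less pages are one of the complications the slide alludes to.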
  • 35. Random surfer model
    – PageRank has an intuitive basis in random walks on graphs
    – Imagine a random surfer who starts on a random page and, in each step, clicks on a random link on the page with probability d, or jumps to a random page (bored?) with probability 1 - d
    – The PageRank of a page can be interpreted as the fraction of steps the surfer spends on the corresponding page
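The random-surfer model turns into the familiar update PR(p) = (1 - d)/N + d · Σ PR(q)/L(q). The sum runs over the pages q that link to p, N is the number of pages, and L(q) is q's number of out-links. Below is a minimal power-iteration sketch. The value d = 0.85 is a commonly used default rather than something the slide specifies. Spreading the rank of pages with no out-links evenly over all pages is an assumption.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Random-surfer PageRank by power iteration.

    With probability d the surfer follows a random out-link; with probability
    1 - d it jumps to a page chosen uniformly at random. A page without
    out-links is treated as linking to every page.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links[p])
        # Random jumps plus dangling-page mass, shared equally by every page.
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            targets = links[p]
            for t in targets:
                new[t] += d * rank[p] / len(targets)  # follow a random out-link
        if sum(abs(new[p] - rank[p]) for p in pages) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))
```

The ranks always sum to 1, which matches the slide's reading of PageRank as the fraction of steps the surfer spends on each page.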
  • 36. Build your own Google: the basic open source tools
  • 37. The Google stack vs. the open source (Yahoo'ish) stack, a conceptual overview:
    – Applications: Google Apps (Gmail...), Search, App Engine, Index, Crawl vs. the client application
    – Languages: Python, Java, C++, Sawzall, other vs. Pig Latin, Python, PHP, Java... anything
    – Google's secret sauce vs. the Hadoop framework: GWQ vs. the Job Tracker (task queue); Google MapReduce vs. Hadoop MapReduce; BigTable vs. HBase (BigTable equivalent); GFS / GFS II vs. HDFS (Hadoop)
    – OS: RHEL 2.6.x PAE vs. CentOS 2.6.x PAE
    – Shared by both: interior network (IPv6), server hardware, rack, DC, exterior network
    – Other open source tools, such as crawlers and indexers, are readily available
  • 38. END (Thank you)
  • 39. Pre-presentation: the Google philosophy (according to Ed)
    – Jedis build their own lightsabres (the MS "eat your own dog food")
    – Parallelize everything
    – Distribute everything (to the atomic level if possible)
    – Compress everything (CPU is cheaper than bandwidth)
    – Secure everything (you can never be too paranoid)
    – Cache (almost) everything
    – Redundantize everything (in triplicate, usually)
    – Latency is VERY evil
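"Compress everything" means spending CPU cycles to save bytes on the wire. This is a minimal sketch that uses Python's standard `zlib` (DEFLATE) as a stand-in, since the slide does not say which codecs Google uses. The repetitive request-log payload is a hypothetical example.

```python
import zlib


def compression_tradeoff(payload: bytes, level: int = 6) -> tuple[int, int, float]:
    """Compress a payload with zlib and report
    (original size, compressed size, compressed/original ratio)."""
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = zlib.compress(payload, level)
    # Round-trip check: compression costs CPU time but loses nothing.
    if zlib.decompress(compressed) != payload:
        raise RuntimeError("round-trip mismatch")
    return len(payload), len(compressed), len(compressed) / len(payload)


if __name__ == "__main__":
    # Repetitive, text-like traffic compresses very well.
    request_log = b"GET /search?q=obama HTTP/1.1\r\n" * 1000
    original, packed, ratio = compression_tradeoff(request_log)
    print(f"{original} bytes -> {packed} bytes ({ratio:.1%} of original)")
```

Whether this pays off depends on link speed versus CPU cost. The `level` argument trades one against the other: 1 is fastest and 9 compresses hardest.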
  • 40. Special thanks to: "The Anatomy of the Google Architecture: The Unofficial Version", V1.0, November 2009, by Ed Austin ({ed, edik}@i-dot.com)
  • 42. Keep learning. For any suggestions on topics, feedback, etc., contact OpenTalk@aditi.com