GOOGLE TALK
Pre Presentation: The Google Philosophy (according to Ed)
The Anatomy of the Google Architecture: "The Unofficial Version", V1.0, November 2009. Ed Austin, {ed, edik}@i-dot.com
Section I – The Basic Glue
1. Exterior Network (Perimeter Architecture)
2. Data Centre
3. Rack Characteristics
4. Core Server Hardware
5. Operating System Implementation
6. Interior Network Architecture
THE PERIMETER How does your data enter the Google empire?
Perimeter Network Security (as known)
PERIMETER NETWORK CACHING
THE DATA CENTRE Where do they store all that Data?
Worldwide Data Centres
Where is Google located? The last estimate was 36 data centres, 300+ GFS II clusters and upwards of 800K machines.
By region: US (#1), Europe (#2), Asia (#3), South America/Russia (#4). Australia: on hold.
Future: Taiwan, Malaysia, Lithuania, and Blythewood, South Carolina.
The Modular Data Centre
- A standard Google modular DC (cell) holds 1160 servers in 30 racks (40U) and consumes 250 kW of power.
- This is the "atomic" data centre building block of Google; a data centre consists of hundreds of modular cells.
- DC architecture is then the aggregation of smaller cell-level infrastructures, each in its own container: some pure GFS, others Bigtable (BT), others MapReduce, some mixed.
- MDCs can also be deployed autonomously at the perimeter (stand-alone).
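
The figures on this slide allow some back-of-envelope arithmetic. The sketch below is only that: it assumes the 250 kW is spent entirely on the 1160 servers and that racks are evenly filled, neither of which the slide states. All function and constant names are made up for illustration.

```cpp
#include <cassert>

// Figures quoted on the slide for one modular cell.
constexpr int kServersPerCell = 1160;
constexpr int kRacksPerCell = 30;
constexpr double kCellPowerWatts = 250000.0;  // 250 kW

// Average number of servers per rack (assumes racks are evenly filled).
inline double ServersPerRack() {
    return static_cast<double>(kServersPerCell) / kRacksPerCell;
}

// Average power budget per server in watts (assumes all cell power goes to servers).
inline double WattsPerServer() {
    return kCellPowerWatts / kServersPerCell;
}

// Number of modular cells needed to house a given number of machines, rounded up.
inline long CellsFor(long machines) {
    assert(machines >= 0);
    return (machines + kServersPerCell - 1) / kServersPerCell;
}
```

Under these assumptions a rack averages just under 39 servers, each server has a budget of roughly 215 W, and the 800K machines estimated on the previous slide would fill about 690 cells.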
THE RACK How is a server stored in the Data Centre?
Google Rack (GOOG rack)
THE HARDWARE Millions of exactly what?
Server Hardware
YEAR       Average Server Specification
1999/2000  PII/PIII, 128MB+
2003/2004  Celeron 533; PIII 1.4 SMP, 2-4GB DRAM; Dual Xeon 2.0 / 1-4GB / 40-160GB IDE - SATA disks via Silicon Image SATA 3114/SATA 3124
2006       Dual Opteron / working-set DRAM (4GB+) / 2x400GB IDE (RAID0?)
2009       2-way / dual core / 16GB / 1-2TB SATA
THE OPERATING SYSTEM The Core Software on each of those servers
OPERATING SYSTEM
THE INTERIOR NETWORK How does your data travel around the Google empire?
INTERIOR NETWORK ROUTING PROTOCOL
- Internal network is IPv6 (exterior machines can be reached using IPv6)
- Heavily modified version of OSPF as the IRP
- Intra-rack network is 100BASE-T
- Inter-rack network is 1000BASE-T
- Inter-DC network pipes unknown but very fast
- Technology: Juniper, Cisco, Foundry and HP routers and switches
- Software: ipvs (IP Virtual Server)
THE MAJOR GLUE The three foundation blocks of Google's Secret Sauce
Section II – Google's Major Glue
1. Google File System Architecture - GFS II
2. Google Database - Bigtable
3. Google Computation - Mapreduce
4. Google Scheduling - GWQ
GOOGLE FILE SYSTEM Manages the underlying Data on behalf of the upper layers and ultimately the applications
FILE SYSTEM I – GFS v1
- The GFS II cell is Google's fundamental building block; everything can be layered on top of it
- Consists of (highly distributed, Linux-based) Master Servers and Chunk Servers
- Chunk Servers serve the data in 64MB chunks directly to the client, via Master arbitration
DATA REDUNDANCY/FAULT TOLERANCE?
- Triplicate copies of chunks are kept, often in other clusters/DCs
- Chunks can be pulled from outside the DC, but this is expensive and avoided where possible; apps built on top of GFS/BT nevertheless do it on an ad-hoc basis (e.g. Gmail)
- On chunk loss the Master handles the recovery by sourcing a chunk copy
- Data is compressed using BMDiff/Zippy
- Chunk Server fault tolerance is achieved by a heartbeat to the Master ("I am alive..")
- Master failure was problematic for Google (recovery finally brought down from 2 minutes to 10 seconds)
FILE SYSTEM II – GFS II
- GFS II "Colossus": version 2 improves in many ways (it is a complete rewrite)
- Elegant Master failover (no more 2s delays...)
- Chunk size is now 1MB, likely to improve latency for serving data other than indexing (for example Gmail); this was the rationale behind the change
- The Master can store more chunk metadata (so more chunks are addressable, up to 100 million), which also means more Chunk Servers
- However, according to a Google engineer, only one 64MB chunk was ever lost (in GFS I) during its entire production deployment (2004-2008?), so it is assumed to be extremely reliable
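
The chunk sizes quoted for GFS v1 (64MB) and GFS II (1MB), together with the triplicate copies, can be turned into a small worked example. This is a sketch of the arithmetic only, not of how GFS itself addresses chunks; it assumes "MB" means 2^20 bytes, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;      // assumption: "MB" on the slides means 2^20 bytes
constexpr std::uint64_t kGfs1ChunkBytes = 64 * kMiB;   // 64MB chunks (GFS v1 slide)
constexpr std::uint64_t kGfs2ChunkBytes = 1 * kMiB;    // 1MB chunks (GFS II slide)
constexpr int kReplicas = 3;                           // "triplicate copies of chunks"

// Index of the chunk that contains a given byte offset of a file.
inline std::uint64_t ChunkIndex(std::uint64_t byte_offset, std::uint64_t chunk_bytes) {
    assert(chunk_bytes > 0);
    return byte_offset / chunk_bytes;
}

// Number of chunks needed to hold a file (the last chunk may be partly filled).
inline std::uint64_t ChunkCount(std::uint64_t file_bytes, std::uint64_t chunk_bytes) {
    assert(chunk_bytes > 0);
    return (file_bytes + chunk_bytes - 1) / chunk_bytes;
}

// Number of chunk copies stored across the cluster for one file.
inline std::uint64_t ReplicaCount(std::uint64_t file_bytes, std::uint64_t chunk_bytes) {
    return ChunkCount(file_bytes, chunk_bytes) * kReplicas;
}
```

For a 1024MB file this gives 16 chunks (48 stored copies) at 64MB and 1024 chunks at 1MB, which illustrates why the smaller chunk size goes hand in hand with a Master that can hold more chunk metadata.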
GOOGLE DATABASE Accesses the underlying Data on behalf of the upper layers and ultimately the applications
Bigtable I - Introduction
What is it?
- Google's database implementation since 2004
- Used internally for all large-scale applications (Search, Indexing, Gmail etc.)
- Similar to a sharded database implementation
GOALS
- Huge scalability to many PBs (the web database is currently around 40 billion URLs)
- Tight latency
- Highly efficient scans over textual data
- Fault tolerant
- Load balanceable
- Eliminate Google's dependency on an external provider
Bigtable II
How is data referenced?
- Distributed multi-dimensional sparse map
- Simple addressing model using a triple: (row, column, {timestamp}) -> cell contents
ROWS
- Row keys are of arbitrary length (usually 10-100 bytes, max <= 64KB)
- Rows are stored lexicographically
- Example row: a URL
COLUMNS
- Example columns: contents:, PR, anchor1: ..
TIMESTAMP (OPTIONAL?)
- Timestamp (various API function arguments, e.g. "ALL", "LATEST")
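
The (row, column, timestamp) -> cell contents model can be imitated in memory with an ordered map. The sketch below is an illustration of the addressing model only, not Bigtable's API or storage layout; `CellKey`, `SparseTable`, `Set` and `GetLatest` are hypothetical names, and ordering timestamps newest-first is an assumption made here so that a "LATEST" lookup is a single search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// One cell address: the (row, column, timestamp) triple from the slide.
struct CellKey {
    std::string row;
    std::string column;
    std::int64_t timestamp;
};

// Rows and columns sort lexicographically; within a cell, newer timestamps come first.
struct CellKeyLess {
    bool operator()(const CellKey& a, const CellKey& b) const {
        if (a.row != b.row) return a.row < b.row;
        if (a.column != b.column) return a.column < b.column;
        return a.timestamp > b.timestamp;
    }
};

class SparseTable {
public:
    // Store one version of a cell. Only cells that are set occupy space (sparse).
    void Set(const std::string& row, const std::string& column,
             std::int64_t timestamp, const std::string& value) {
        assert(!row.empty());
        cells_[CellKey{row, column, timestamp}] = value;
    }

    // Return the newest version of a cell, or nullptr if the cell was never set.
    const std::string* GetLatest(const std::string& row, const std::string& column) const {
        const CellKey probe{row, column, std::numeric_limits<std::int64_t>::max()};
        auto it = cells_.lower_bound(probe);  // first stored version of this cell, if any
        if (it != cells_.end() && it->first.row == row && it->first.column == column) {
            return &it->second;
        }
        return nullptr;
    }

    std::size_t size() const { return cells_.size(); }

private:
    std::map<CellKey, std::string, CellKeyLess> cells_;
};
```

Setting the same (row, column) twice with timestamps 1 and 2 keeps both versions, and `GetLatest` returns the one written at timestamp 2. Because the map is ordered by row first, iterating it visits rows in lexicographic order, as the slide describes.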
Bigtable III – Table Structure
C++ Bigtable mutate of some anchors:
    // Open the table
    Table *T = OpenOrDie("/bigtable/web/bigtable");
    // Write a new anchor and delete an old anchor
    RowMutation r1(T, "uk.co.bbc.news");
    r1.Set("anchor:www.abc.org", "CNN");
    r1.Delete("anchor:www.def.com");
    Operation op;
    Apply(&op, &r1);  // atomic mutation of the row's columns
Other primitives include DeleteCells(), DeleteRow() and Scanner (read arbitrary cells in a row).
Bigtable IV
GOOGLE MAPREDUCE Computes the underlying Data on behalf of the applications
Mapreduce I
- Map reduction can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing them on multiple processors
- The major functions are MAP and REDUCE (with a number of intermediary steps)
- MAP: break the task down into parallel steps
- REDUCE: combine the results into the final output
- Shown is a 2-pipeline map reduction (there are 24 map reductions in the indexing pipeline)
- Mappers and reducers usually run on separate processors (with 90% of the reducers lost, the job still completed!)
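
The MAP and REDUCE roles, and the intermediate grouping step between them, can be sketched with the classic word-count task. This is a single-process illustration with hypothetical function names; a real map reduction would run the mappers and reducers on separate machines, which this sketch does not attempt.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, int>;

// MAP: emit (word, 1) for every whitespace-separated word in one input split.
inline std::vector<KeyValue> MapWords(const std::string& split) {
    std::vector<KeyValue> emitted;
    std::istringstream in(split);
    std::string word;
    while (in >> word) emitted.emplace_back(word, 1);
    return emitted;
}

// Intermediate step: group all emitted values by key.
inline std::map<std::string, std::vector<int>> GroupByKey(
        const std::vector<std::vector<KeyValue>>& mapped) {
    std::map<std::string, std::vector<int>> grouped;
    for (const auto& part : mapped) {
        for (const auto& kv : part) grouped[kv.first].push_back(kv.second);
    }
    return grouped;
}

// REDUCE: combine the values collected for one key into a single result.
inline int ReduceSum(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// Driver: each split could be mapped, and each key reduced, independently of the others.
inline std::map<std::string, int> WordCount(const std::vector<std::string>& splits) {
    assert(!splits.empty());
    std::vector<std::vector<KeyValue>> mapped;
    for (const auto& split : splits) mapped.push_back(MapWords(split));
    std::map<std::string, int> result;
    for (const auto& entry : GroupByKey(mapped)) {
        result[entry.first] = ReduceSum(entry.second);
    }
    return result;
}
```

The parallelism comes from the fact that `MapWords` needs only its own split and `ReduceSum` needs only the values for its own key, so losing one worker means redoing only that worker's share.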
Mapreduce II
Chubby Lock
GOOGLE WORKQUEUE Provides Resource Management for the Computational Jobs
GWQ – Google Workqueue
Section III – Some more Glue
1. Languages employed
2. Development Environment
3. Google App Engine
4. Network Security
5. Future Google Architecture Advances
6. Odds n Sods
7. DIY Google
DEVELOPMENT LANGUAGES
- Initially Python, Java, C++ (the usual suspects)
- Sawzall (since 2006): equivalent to Hadoop's Pig Latin; written in C++; interpreted, bytecode output JIT'd
An internal procedural language employed to solve map reduction problems. The few published Google papers employ Sawzall in the algorithm examples. Sawzall runs in the Map phase; aggregators run in the Reduce phase (fed from each Sawzall Map instance) to produce the final output.
- Transparent parallelization: no specialist distributed systems knowledge required (good for the developer)
- Simple datatypes: 64-bit signed int, float, string, byte and a few unique ones such as time
- Extensive string regexp support
- Compound types: arrays, tuples
- Type-safe (with declarations), similar to Pascal (probably an LL(1) language?)
- Syntax similar to Algol and C (no pointers though!)
- No processing of exceptions (no exception handlers)
- Shorter than the corresponding C++ code by a factor of 10
Early versions could not write into Bigtable. Now implemented? Output is sometimes pipelined into MySQL for further analysis.
GOOGLE APP ENGINE Using "Application Platform technology stack"
Security
Rack board level (possible scenario)
- gPXE on the board goes through a DHCP/TFTP sequence to pull over an encrypted image (this is not expensive, as it is done once per boot and boots are infrequent)
- The image is pulled from a Secure Image Distribution Server (and held encrypted there)
- Once at the board end, the image is decrypted on the fly (OTF) and booted as normal RHEL
- 02/09: a Google engineer didn't dispute this and seemed to concur, adding that in-core encryption might be a possibility (real-time decryption might not be that expensive). This possibly means cryptography is used throughout the lifetime of the image, including components outside the working set but sensitive parts of the in-core OS (decrypted OTF)
- Enterprise Kerberos is used throughout the enterprise
- They have an automated issuance system for SSL certificates, used by internal (secure) infrastructure to validate https/TLS and generic SSL connections
- Complete internal network encryption unlikely due to the latency introduced?
- Likely that one of the reasons failover between DCs is problematic is the latency introduced by the expense of (essential) wide area encryption
Google Future Architecture
Odds n Sods
- borg: Google technology/architecture (it is a cluster..)
- Borg: a hybrid protocol for scalable application-level multicast in peer-to-peer networks (WAN multimedia streaming)
- data cube: Google technology
- Has a "global load balancer": assume it load balances across a unified namespace, probably worldwide
- Gmail designers implemented application-level failover to move your session to an alternate DC in a fashion that is seamless to the end user. Probably all Google Apps will be able to migrate to an alternate DC cell (the application, and its GFS data if need be)
- MySQL is used for back-end sys admin stuff (high availability master-slave implementations) and post-Bigtable processing
- Remote employee access is via VPN
- Sys admins maintain 5 and 30 minute SLAs, so they are on the ball
- Has its own internal archive.org equivalent
BUILD YOUR OWN GOOGLE The Basic Open Source Tools
The Google Stack (vs Yahoo‘ish/Open Source)
END (Thank you)
DIY GOOGLE
What you require:
- Preferably 2 machines + 100BT
- CentOS/RHEL (squid)
- Apache Hadoop (HDFS, Mapreduce, Pig, HBase)
- HDFS bmdiff/zippy compression library
- Google glibc/tcmalloc (perftools)
- Supporting stuff: JRE etc.
- Browser with search box
- Pig MR call to scan a few files and print the results
DIY GOOGLE
- Install Hadoop and Pig on the cluster
- Install Eclipse and dependencies
- Install PigPen for Eclipse and configure it for the cluster (NFS)
TEMPLATE
- IPv6 enablement started 2008 (finished 2009?)
- IRP: OSPF (a Google-authored RFC points towards OSPF)
DEVELOPMENT ENVIRONMENT bits & bobs
A rare shot of some concrete Google internal stuff (this one of GFS Master Server code execution, found as a perftools profiling example)
- Agile methodologies used (development iterations, teamwork, collaboration, and process adaptability throughout the life-cycle of the project)
- "Libraries are the predominant way of building programs"
- An infrastructure handles versioning of applications so they can be released without fear of breaking things = roll out with minimal QA
- Internal code uses replacement libraries; Google, as you'd expect, rewrites everything!
- Hungarian notation?
- Work in small teams of 3-5 people; likely few scutters know "the big picture"
Some Pre Presentation Information
Pre Presentation Disclaimer
Bigtable VI

Enjoy Night ≽ 8448380779 ≼ Call Girls In Palam Vihar (Gurgaon)
 
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Chaura Sector 22 ( Noida)
 
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost LoverPowerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
Powerful Love Spells in Phoenix, AZ (310) 882-6330 Bring Back Lost Lover
 
Group_5_US-China Trade War to understand the trade
Group_5_US-China Trade War to understand the tradeGroup_5_US-China Trade War to understand the trade
Group_5_US-China Trade War to understand the trade
 
422524114-Patriarchy-Kamla-Bhasin gg.pdf
422524114-Patriarchy-Kamla-Bhasin gg.pdf422524114-Patriarchy-Kamla-Bhasin gg.pdf
422524114-Patriarchy-Kamla-Bhasin gg.pdf
 
04052024_First India Newspaper Jaipur.pdf
04052024_First India Newspaper Jaipur.pdf04052024_First India Newspaper Jaipur.pdf
04052024_First India Newspaper Jaipur.pdf
 
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 47 (Gurgaon)
 
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
{Qatar{^🚀^(+971558539980**}})Abortion Pills for Sale in Dubai. .abu dhabi, sh...
 
THE OBSTACLES THAT IMPEDE THE DEVELOPMENT OF BRAZIL IN THE CONTEMPORARY ERA A...
THE OBSTACLES THAT IMPEDE THE DEVELOPMENT OF BRAZIL IN THE CONTEMPORARY ERA A...THE OBSTACLES THAT IMPEDE THE DEVELOPMENT OF BRAZIL IN THE CONTEMPORARY ERA A...
THE OBSTACLES THAT IMPEDE THE DEVELOPMENT OF BRAZIL IN THE CONTEMPORARY ERA A...
 
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)
Enjoy Night ≽ 8448380779 ≼ Call Girls In Gurgaon Sector 46 (Gurgaon)
 

The Anatomy Of The Google Architecture Final v1.1

  • 1.
  • 2.
  • 3. The Anatomy of the Google Architecture “The unofficial Version“ V1.0 November 2009 Ed Austin {ed, edik} @i-dot.com
  • 4. Section I – The Basic Glue 1. Exterior Network (Perimeter Architecture) 2. Data Centre 3. Rack Characteristics 4. Core Server Hardware 5. Operating System Implementation 6. Interior Network Architecture
  • 5. THE PERIMETER How does your data enter the Google empire?
  • 6.
  • 7.
  • 8. THE DATA CENTRE Where do they store all that Data?
  • 9. Worldwide Data Centres. Where is Google located? The last estimate was 36 data centres, 300+ GFS II clusters and upwards of 800K machines. US (#1) – Europe (#2) – Asia (#3) – South America/Russia (#4). Australia – on hold. Future: Taiwan, Malaysia, Lithuania, and Blythewood, South Carolina.
  • 10. The Modular Data Centre. The standard Google Modular DC (Cell) holds 1160 servers in 30 racks (40U) and draws 250 kW. This is the “atomic” data centre building block of Google. A data centre would consist of hundreds of modular cells; the DC architecture is then the aggregation of smaller cell-level infrastructures, each in its own container: some pure GFS, others Bigtable (BT), others MapReduce, some mixed. MDCs can also be deployed autonomously at the perimeter (stand-alone).
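The figures on the modular data centre slide can be turned into rough per-rack and per-server numbers. This is only a back-of-envelope sketch: it assumes the 250 kW is spread evenly over the 1160 servers (the slide does not say whether cooling or networking is included), and the `cell_totals` helper is a hypothetical name introduced here.

```python
# Figures quoted on the slide for one modular cell.
SERVERS_PER_CELL = 1160
RACKS_PER_CELL = 30
CELL_POWER_WATTS = 250_000  # 250 kW

# Assumption: servers and power are spread evenly across the cell.
servers_per_rack = SERVERS_PER_CELL / RACKS_PER_CELL
watts_per_server = CELL_POWER_WATTS / SERVERS_PER_CELL


def cell_totals(cells):
    """Return (servers, watts) for a data centre built from `cells` modular cells."""
    return cells * SERVERS_PER_CELL, cells * CELL_POWER_WATTS


if __name__ == "__main__":
    print(f"servers per rack: {servers_per_rack:.1f}")
    print(f"watts per server: {watts_per_server:.1f}")
    print("100 cells:", cell_totals(100))
```

Under these assumptions a rack holds just under 39 servers, and a data centre of 100 cells would hold 116,000 servers.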
  • 11. THE RACK How is a server stored in the Data Centre?
  • 12.
  • 13. THE HARDWARE Millions of exactly what?
  • 14.
  • 15. THE OPERATING SYSTEM The Core Software on each of those servers
  • 16.
  • 17. THE INTERIOR NETWORK How does your data travel around the Google empire?
  • 18. INTERIOR NETWORK ROUTING PROTOCOL. The internal network is IPv6 (exterior machines can be reached using IPv6). A heavily modified version of OSPF serves as the interior routing protocol (IRP). Intra-rack network is 100BASE-T; inter-rack network is 1000BASE-T; inter-DC network pipes are unknown but very fast. Technology: Juniper, Cisco, Foundry and HP routers and switches. Software: ipvs (IP Virtual Server).
  • 19. THE MAJOR GLUE The three foundation blocks of Google's Secret Sauce
  • 20. Section II – Google's Major Glue 1. Google File System Architecture – GFS II 2. Google Database – Bigtable 3. Google Computation – MapReduce 4. Google Scheduling – GWQ
  • 21. GOOGLE FILE SYSTEM Manages the underlying Data on behalf of the upper layers and ultimately the applications
  • 22. FILE SYSTEM I – GFS v1. The GFS II cell is Google's fundamental building block; everything can be layered on top of it. It consists of (highly distributed, Linux-based) Master Servers and Chunk Servers. Chunk Servers serve the data in 64 MB chunks directly to the client, with the Master arbitrating. DATA REDUNDANCY/FAULT TOLERANCE: Triplicate copies of chunks are kept, often in other clusters or DCs. Chunks can be pulled from outside the DC, but this is expensive and to be avoided; however, apps built on top of GFS/BT (e.g. Gmail) do this on an ad-hoc basis. On chunk loss the Master handles recovery by sourcing a chunk copy. Data is compressed using BMDiff/Zippy. Chunk Server fault tolerance is achieved by a heartbeat to the Master (“I am alive”). Master failure was problematic for Google (failover time eventually came down from 2 minutes to 10 seconds).
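The 64 MB chunking and Master arbitration described on the GFS v1 slide can be illustrated with a toy lookup. This is a minimal sketch, not Google's implementation: `ToyMaster`, its methods and the server names are hypothetical, and only the chunk size and the three replicas come from the slide.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS v1 chunk size quoted on the slide


def chunk_index(byte_offset, chunk_size=CHUNK_SIZE):
    """Return (chunk number, offset inside that chunk) for a byte offset in a file."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return divmod(byte_offset, chunk_size)


class ToyMaster:
    """Hypothetical stand-in for the Master: it holds only metadata,
    mapping (file name, chunk number) to the servers holding replicas."""

    def __init__(self):
        self.locations = {}

    def register(self, filename, chunk_no, servers):
        self.locations[(filename, chunk_no)] = list(servers)

    def lookup(self, filename, byte_offset):
        """Tell a client which chunk to read and which servers hold it.
        The client would then fetch the data from a chunk server directly."""
        chunk_no, within = chunk_index(byte_offset)
        return chunk_no, within, self.locations[(filename, chunk_no)]


if __name__ == "__main__":
    master = ToyMaster()
    # Three replicas per chunk, as on the slide; the server names are made up.
    master.register("crawl.dat", 2, ["cs-a", "cs-b", "cs-c"])
    print(master.lookup("crawl.dat", 200_000_000))
```

The point of the split is that the Master answers small metadata queries while the bulk data flows between client and chunk server.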
  • 23. FILE SYSTEM I – GFS II. GFS II “Colossus”: Version 2 is a complete rewrite and improves in many ways. Elegant Master failover (no more 2s delays...). Chunk size is now 1 MB, likely to improve latency when serving data other than indexing (for example Gmail); this was the rationale behind the change. The Master can store more chunk metadata (therefore more chunks are addressable, up to 100 million), which also means more Chunk Servers. According to a Google engineer, they only ever lost one 64 MB chunk (in GFS I) during its entire production deployment (2004 – 2008?), so it is assumed to be extremely reliable.
  • 24. GOOGLE DATABASE Accesses the underlying Data on behalf of the upper layers and ultimately the applications
  • 25. Bigtable I – Introduction. What is it? Google's database implementation since 2004. Used internally for all large-scale applications (Search, Indexing, Gmail, etc.). Similar to a sharded database implementation. GOALS: huge scalability to many PBs (web database currently around 40 billion URLs); tight latency; highly efficient scans over textual data; fault tolerant; load balanceable; eliminate Google's dependency on an external provider.
  • 26. Bigtable II. How is data referenced? A distributed, multi-dimensional sparse map with a simple addressing model using a triple: (row, column, {timestamp}) -> cell contents. ROWS: arbitrary length, usually 10-100 bytes (max <= 64 KB); rows are stored lexicographically; example row: a URL. COLUMNS: example columns: contents:, PR, anchor1:, ... TIMESTAMP (optional?): selected via various API function arguments, e.g. “ALL”, “LATEST”.
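The (row, column, timestamp) -> cell contents addressing model can be sketched as an in-memory sparse map. This is an illustration only: `ToyTable` and its method names are hypothetical and are not the Bigtable API; the sorted row keys are there to show why lexicographic storage makes range scans cheap.

```python
import bisect


class ToyTable:
    """Hypothetical in-memory model of a sparse, versioned, row-sorted map."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value), oldest first
        self._rows = []   # row keys kept in lexicographic order

    def put(self, row, column, timestamp, value):
        i = bisect.bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)
        versions = self._cells.setdefault((row, column), [])
        bisect.insort(versions, (timestamp, value))

    def get(self, row, column, timestamp=None):
        """Return the latest value, or the newest value at or before `timestamp`."""
        versions = self._cells.get((row, column), [])
        if timestamp is not None:
            versions = [tv for tv in versions if tv[0] <= timestamp]
        return versions[-1][1] if versions else None

    def scan(self, start_row, end_row):
        """Return row keys in [start_row, end_row), in sorted order."""
        lo = bisect.bisect_left(self._rows, start_row)
        hi = bisect.bisect_left(self._rows, end_row)
        return self._rows[lo:hi]


if __name__ == "__main__":
    t = ToyTable()
    t.put("www.cnn.com", "contents:", 1, "<html>v1")
    t.put("www.cnn.com", "contents:", 2, "<html>v2")
    t.put("www.bbc.co.uk", "PR", 1, "0.9")
    print(t.get("www.cnn.com", "contents:"))     # latest version
    print(t.get("www.cnn.com", "contents:", 1))  # version as of timestamp 1
    print(t.scan("www.a", "www.c"))
```

Cells that were never written take no space, which is what "sparse" means here.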
  • 27.
  • 28.
  • 29. GOOGLE MAPREDUCE Computes the underlying Data on behalf of the applications
  • 30. MapReduce I. MapReduce can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing them on multiple processors. The major functions are MAP and REDUCE (with a number of intermediary steps). MAP: break the task down into parallel steps. REDUCE: combine the results into the final output. Shown is a 2-pipeline map reduction (there are 24 map reductions in the indexing pipeline). Mappers and Reducers usually run on separate processors (with 90% of reducers lost, the job still completed!).
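The MAP and REDUCE split can be shown with the customary word-count example. This is a single-process sketch under stated assumptions: the function names are illustrative, and a real deployment would run many mappers and reducers on separate machines with a distributed shuffle in between.

```python
from collections import defaultdict


def map_phase(document):
    """MAP: break one input record into independent (key, value) pairs."""
    for word in document.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Intermediate step: group all values by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """REDUCE: combine all values for one key into the final output."""
    return key, sum(values)


def map_reduce(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["the web the index", "index the web"]))
```

Because each mapper call touches only its own record and each reducer call only its own key, either phase can be re-run elsewhere if a worker is lost.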
  • 31.
  • 32.
  • 33. GOOGLE WORKQUEUE Provides Resource Management for the Computational Jobs
  • 34.
  • 35. Section III – Some more Glue 1. Languages employed 2. Development Environment 3. Google App Engine 4. Network Security 5. Future Google Architecture Advances 6. Odds n Sods 7. DIY Google
  • 36. DEVELOPMENT LANGUAGES. - Initially Python, Java, C++: the usual suspects. - Sawzall (since 2006): equivalent to Hadoop's Pig Latin; written in C++; interpreted, with bytecode output JIT'd. An internal procedural language employed to solve map reduction problems. The few published Google papers employ Sawzall in the algorithm examples. Sawzall runs in the Map phase; Aggregators run in the Reduce phase (taking the output of each Sawzall Map instance) to produce the final output. - Transparent parallelization: no specialist distributed-systems knowledge required (good for the developer). - Simple datatypes: 64-bit signed int, float, string, byte and a few unique ones such as time. - Extensive string regexp support. - Compound types: arrays, tuples. - Type-safe (with declarations), similar to Pascal (probably an LL(1) language?). - Syntax similar to Algol and C (no pointers though!). - No processing of exceptions (no exception handlers). - Shorter than the corresponding C++ code by a factor of 10. Early versions could not write into Bigtable. Now implemented? Output is sometimes pipelined into MySQL for further analysis.
  • 37.
  • 38. Security. Rack board level (possible scenario): gPXE on the board goes through a DHCP/TFTP sequence to pull over an encrypted image (this is not expensive, as it is done once per boot and boots are infrequent). The image is pulled from a Secure Image Distribution Server (and held encrypted there). Once at the board end, the image is decrypted on the fly (OTF) and booted as normal RHEL. 02/09: a Google engineer didn't dispute this and seemed to concur, adding that in-core encryption might be a possibility (real-time decryption might not be that expensive). This possibly means cryptography is used throughout the lifetime of the image, including components outside the working set but sensitive parts of the in-core OS (OTF decrypted). Enterprise Kerberos is used throughout the enterprise. They have an automated issuance system for SSL certificates, used by internal (secure) infrastructure to validate HTTPS/TLS and generic SSL connections. Complete internal network encryption is unlikely due to the latency introduced? It is likely that one of the reasons failover between DCs is problematic is the latency introduced by the expense of (essential) wide-area encryption.
  • 39.
  • 40. Odds n Sods. borg: Google technology/architecture (is a cluster..). Borg: “a hybrid protocol for scalable application-level multicast in peer-to-peer networks” (WAN multimedia streaming). data cube: Google technology. They have a “global load balancer”; assume it load-balances across a unified namespace, probably worldwide. Gmail designers implemented application-level failover to move your session to an alternate DC in a fashion seamless to the end user. Probably all Google Apps will be able to migrate to an alternate DC cell (the application, and its GFS data if need be). MySQL is used for back-end sysadmin stuff (high-availability master-slave implementations) and post-Bigtable processing. Remote employee access is via VPN. Sysadmins maintain 5- and 30-minute SLAs, so they are on the ball. Google has its own internal archive.org equivalent.
  • 41. BUILD YOUR OWN GOOGLE The Basic Open Source Tools
  • 42. The Google Stack (vs Yahoo'ish/Open Source)
  • 44. DIY GOOGLE. What you require: preferably 2 machines + 100BT; CentOS/RHEL (squid); Apache Hadoop (HDFS, MapReduce, Pig, HBase); HDFS bmdiff/zippy compression library; Google glibc/tcmalloc (perftools); supporting stuff (JRE etc.); a browser with a search box; a Pig MR call to scan a few files and print the results.
  • 45. DIY GOOGLE. Install Hadoop and Pig on the cluster. Install Eclipse and dependencies. Install PigPen for Eclipse and configure it to the cluster (NFS).
  • 46. TEMPLATE. - IPv6 enablement started in 2008 (finished in 2009?). - IRP: OSPF; a Google-authored RFC points towards OSPF.
  • 47. DEVELOPMENT ENVIRONMENT bits & bobs. A rare shot of some concrete Google internal stuff (GFS Master Server code execution, found as a perftools profiling example). Agile methodologies are used (development iterations, teamwork, collaboration, and process adaptability throughout the life-cycle of the project). “Libraries are the predominant way of building programs.” An infrastructure handles versioning of applications so they can be released without fear of breaking things, i.e. roll out with minimal QA. - Internal code uses replacement libraries: Google, as you'd expect, rewrites everything! - Hungarian notation? - Work is done in small teams of 3-5 people; likely few scutters know “the big picture”.
  • 48.
  • 49.
  • 50.
  • 51.