SlideShare a Scribd company logo
1 of 32
Download to read offline
Bigtable

A Distributed Storage System for
         Structured Data

        Authors: Fay Chang et. al.

        Presenter: Zafar Gilani

                                     1
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                               2
Bigtable


A distributed storage system ..
• .. for managing structured data.
• Used for demanding workloads, such as:
   – Throughput oriented batch processing.
   – Serving latency-sensitive data to the client.
• Dynamic control instead of relational model.
• Data locality properties (revisit later briefly).



                                                        3
Bigtable


Bigtable has achieved several goals
• Wide applicability: used for 60+ Google
  products, including:
  – Google Analytics, Google Code, Google Earth,
    Google Maps and Gmail.
• Scalability (explain later under evaluation).
• High performance.
• High availability.


                                                     4
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                               5
Bigtable


  Data model
  • Essentially a sparse, distributed, persistent
    multi-dimensional sorted map.
  • The map is indexed by a row key, column key
    and a timestamp.
  • Atomic reads and writes over a single row.
                            Columns




Rows


                                                    6
Bigtable


Row and column range
• Row range dynamically       • Column keys grouped
  partitioned into tablets.     into column families.
• Data in lexicographic       • Each family has the
  order.                        same type.
• Allows data locality.       • Allows access control
                                and disk or memory
                                accounting.




                                                          7
Bigtable


Row and column range
• Row range dynamically             • Column keys grouped
  partitioned into tablets.           into column families.
• Data in lexicographic             • Each family has the
  order.                              same type.
• Allows data locality.             • Allows access control
                                      and disk or memory
                                      accounting.


               Enables reasoning about data locality

                                                                8
Columns




Rows




                 9
Columns




Rows




       Anchor is a column family




                                   10
Columns




Tablets
                          “anchor:bbcworld.com   “anchor:weather


          “com.bbc.www”          “BBC”            “BBC.com”




                                                                   11
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              12
Bigtable
Bigtable uses several other
technologies
• Google File System to store log and data files.
• SSTable file format to store BigTable data.
• Chubby, a distributed lock service.




For more details on these technologies, refer to section 4 of the paper.


                                                                            13
Bigtable


  Implementation
Master responsibilities:
-Assign tablets to tablet
servers
-Add/delete tablet servers
-Balance tablet server load
-GC
-Schema changes
                                                   INTERNET

                                                                      CLIENT
                                               Communicate directly
                                               to tablet servers


               MASTER
                              TABLET SERVERS
                                                                       14
Bigtable


How data is stored?
A three-level hierarchy, similar to B+ trees.




                                                 15
Bigtable


Location hierarchy
 Chubby file contains location
     of the root tablet.




                                  16
Bigtable


Location hierarchy
       Root tablet contains all
     tablet locations in Metadata
                 table.




                                     17
Bigtable


Location hierarchy     Metadata table stores
                     locations of actual tablets.




                                                     18
Bigtable


Location hierarchy




  Client moves up the hierarchy (Metadata -> Root -> Chubby),
  if location of tablet is unknown or incorrect.
                                                                 19
Bigtable


How data is served?




                       20
Bigtable


Tablet serving




Persistent




                  21
Bigtable


  Tablet serving


                          Compactions




Compactions occur
regularly, advantages:
-Shrinks memory usage.
-Reduces amount of data
read from log during
recovery.                                22
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              23
Bigtable


Benchmarks for perf evaluation
• Scan:
  – Scans over values in a row range.
• Random reads from memory.
• Random reads/writes:
  – R keys to be read/written spread over N clients.
• Sequential reads/writes:
  – 0 to R-1 keys to be read/written spread over N
    clients.

                                                        24
Bigtable


Performance evaluation
                     Scan uses single RPC call and
                       shows best performance.




                                               25
Bigtable


Performance evaluation



                          Sequential reads are better
                           than random reads, since
                         each fetched block is used to
                             serve next requests.




                                                 26
Bigtable


Performance evaluation




                     Random read shows the worst
                      performance. Fetching 64KB
                     every 1000 bytes is expensive.




                                              27
Bigtable


Performance evaluation
    Not linear, but scales well.




                                    28
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              29
Bigtable


Conclusions
• Bigtable: highly scalable and available, without
  compromising performance.
• Flexibility for Google – designed using their
  own data model.
• Custom design gives Google the ability to
  remove or minimize bottlenecks.
• Related work:
  – Apache Hbase (open source)
  – Boxwood (though targeted at a lower/FS level)

                                                     30
Bigtable

A Distributed Storage System for
         Structured Data

        Authors: Fay Chang et. al.

        Presenter: Zafar Gilani

                                     31
Bigtable


B+ Trees
• A tree with sorted data for:
  – Efficient insertion, retrieval and removal of
    records.
• All records are stored at the leaf level, only
  keys stored in interior nodes.




                                                     32

More Related Content

What's hot

What's hot (18)

Big table
Big tableBig table
Big table
 
Bigtable
BigtableBigtable
Bigtable
 
Big table presentation-final
Big table presentation-finalBig table presentation-final
Big table presentation-final
 
Big table
Big tableBig table
Big table
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 
Summary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoSummary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in Tokyo
 
GOOGLE BIGTABLE
GOOGLE BIGTABLEGOOGLE BIGTABLE
GOOGLE BIGTABLE
 
Google Bigtable paper presentation
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentation
 
Google Bigtable Paper Presentation
Google Bigtable Paper PresentationGoogle Bigtable Paper Presentation
Google Bigtable Paper Presentation
 
Bigtable
BigtableBigtable
Bigtable
 
Cloud Technology: Virtualization
Cloud Technology: VirtualizationCloud Technology: Virtualization
Cloud Technology: Virtualization
 
Bigtable
BigtableBigtable
Bigtable
 
6 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
 
Bigtable and Boxwood
Bigtable and BoxwoodBigtable and Boxwood
Bigtable and Boxwood
 
Mysql database
Mysql databaseMysql database
Mysql database
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018
 
Key-Value NoSQL Database
Key-Value NoSQL DatabaseKey-Value NoSQL Database
Key-Value NoSQL Database
 

Viewers also liked

Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremGrisha Weintraub
 
Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02Camilo López
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And HbaseEdward Yoon
 
Aprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y AplicacionesAprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y AplicacionesEdgar Marca
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonGrisha Weintraub
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)Romain Jacotin
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamojbellis
 
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYSupport Driven
 
I've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveI've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveSimon Willison
 
Big table por Matias tesoriero
Big table por Matias tesorieroBig table por Matias tesoriero
Big table por Matias tesorieromtesoriero
 
Mallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation FrameworkMallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation FrameworkEmilio Torrens
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)Sri Prasanna
 

Viewers also liked (18)

Dynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theoremDynamo and BigTable in light of the CAP theorem
Dynamo and BigTable in light of the CAP theorem
 
Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02Temadeinvestigacion 130402203353-phpapp02
Temadeinvestigacion 130402203353-phpapp02
 
BigTable And Hbase
BigTable And HbaseBigTable And Hbase
BigTable And Hbase
 
The Google Bigtable
The Google BigtableThe Google Bigtable
The Google Bigtable
 
Aprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y AplicacionesAprendizaje de Maquina y Aplicaciones
Aprendizaje de Maquina y Aplicaciones
 
Cloud Computing y MapReduce
Cloud Computing y MapReduceCloud Computing y MapReduce
Cloud Computing y MapReduce
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and Comparison
 
Gfs vs hdfs
Gfs vs hdfsGfs vs hdfs
Gfs vs hdfs
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
 
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
 
I've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveI've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you have
 
Google's BigTable
Google's BigTableGoogle's BigTable
Google's BigTable
 
Big table por Matias tesoriero
Big table por Matias tesorieroBig table por Matias tesoriero
Big table por Matias tesoriero
 
Mallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation FrameworkMallorca MUG: MapReduce y Aggregation Framework
Mallorca MUG: MapReduce y Aggregation Framework
 
Map reduce (from Google)
Map reduce (from Google)Map reduce (from Google)
Map reduce (from Google)
 
MapReduce en Hadoop
MapReduce en HadoopMapReduce en Hadoop
MapReduce en Hadoop
 
HDFS
HDFSHDFS
HDFS
 

Similar to Bigtable

8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databasesFabio Fumarola
 
Xldb2011 wed 1415_andrew_lamb-buildingblocks
Xldb2011 wed 1415_andrew_lamb-buildingblocksXldb2011 wed 1415_andrew_lamb-buildingblocks
Xldb2011 wed 1415_andrew_lamb-buildingblocksliqiang xu
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreducehansen3032
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In DepthFabio Fumarola
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7abdulrahmanhelan
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsMichael Kopp
 
Sql Performance Tuning For Developers
Sql Performance Tuning For DevelopersSql Performance Tuning For Developers
Sql Performance Tuning For Developerssqlserver.co.il
 
NOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the CloudNOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the Cloudboorad
 
North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911Ines Sombra
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.Kyong-Ha Lee
 
DevNation Atlanta
DevNation AtlantaDevNation Atlanta
DevNation Atlantaboorad
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud applicationNoam Sheffer
 
Active/Active Database Solutions with Log Based Replication in xDB 6.0
Active/Active Database Solutions with Log Based Replication in xDB 6.0Active/Active Database Solutions with Log Based Replication in xDB 6.0
Active/Active Database Solutions with Log Based Replication in xDB 6.0EDB
 
Spil Games: outgrowing an internet startup
Spil Games: outgrowing an internet startupSpil Games: outgrowing an internet startup
Spil Games: outgrowing an internet startupart-spilgames
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsYasin Memari
 
Codemotion 2015 Infinispan Tech lab
Codemotion 2015 Infinispan Tech labCodemotion 2015 Infinispan Tech lab
Codemotion 2015 Infinispan Tech labUgo Landini
 

Similar to Bigtable (20)

8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
Xldb2011 wed 1415_andrew_lamb-buildingblocks
Xldb2011 wed 1415_andrew_lamb-buildingblocksXldb2011 wed 1415_andrew_lamb-buildingblocks
Xldb2011 wed 1415_andrew_lamb-buildingblocks
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
 
Relational and non relational database 7
Relational and non relational database 7Relational and non relational database 7
Relational and non relational database 7
 
Lecture3.ppt
Lecture3.pptLecture3.ppt
Lecture3.ppt
 
Performance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ ApplicationsPerformance Management in ‘Big Data’ Applications
Performance Management in ‘Big Data’ Applications
 
NoSQL
NoSQLNoSQL
NoSQL
 
Sql Performance Tuning For Developers
Sql Performance Tuning For DevelopersSql Performance Tuning For Developers
Sql Performance Tuning For Developers
 
NOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the CloudNOSQL, CouchDB, and the Cloud
NOSQL, CouchDB, and the Cloud
 
North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911North Bay Ruby Meetup 101911
North Bay Ruby Meetup 101911
 
KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.KIISE:SIGDB Workshop presentation.
KIISE:SIGDB Workshop presentation.
 
DevNation Atlanta
DevNation AtlantaDevNation Atlanta
DevNation Atlanta
 
Building a highly scalable and available cloud application
Building a highly scalable and available cloud applicationBuilding a highly scalable and available cloud application
Building a highly scalable and available cloud application
 
Active/Active Database Solutions with Log Based Replication in xDB 6.0
Active/Active Database Solutions with Log Based Replication in xDB 6.0Active/Active Database Solutions with Log Based Replication in xDB 6.0
Active/Active Database Solutions with Log Based Replication in xDB 6.0
 
Spil Games: outgrowing an internet startup
Spil Games: outgrowing an internet startupSpil Games: outgrowing an internet startup
Spil Games: outgrowing an internet startup
 
Challenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data GenomicsChallenges and Opportunities of Big Data Genomics
Challenges and Opportunities of Big Data Genomics
 
Codemotion 2015 Infinispan Tech lab
Codemotion 2015 Infinispan Tech labCodemotion 2015 Infinispan Tech lab
Codemotion 2015 Infinispan Tech lab
 

More from zafargilani

6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenters6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenterszafargilani
 
Assignment 1-mtat
Assignment 1-mtatAssignment 1-mtat
Assignment 1-mtatzafargilani
 
5 state-of-cloud-applications-and-platforms
5 state-of-cloud-applications-and-platforms5 state-of-cloud-applications-and-platforms
5 state-of-cloud-applications-and-platformszafargilani
 
1 logical data models for cc arch
1 logical data models for cc arch1 logical data models for cc arch
1 logical data models for cc archzafargilani
 
2 rest-elevator-pitch
2 rest-elevator-pitch2 rest-elevator-pitch
2 rest-elevator-pitchzafargilani
 
1 distributed-systems-template-modified
1 distributed-systems-template-modified1 distributed-systems-template-modified
1 distributed-systems-template-modifiedzafargilani
 

More from zafargilani (7)

6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenters6 intelligent-placement-of-datacenters
6 intelligent-placement-of-datacenters
 
Assignment 1-mtat
Assignment 1-mtatAssignment 1-mtat
Assignment 1-mtat
 
5 state-of-cloud-applications-and-platforms
5 state-of-cloud-applications-and-platforms5 state-of-cloud-applications-and-platforms
5 state-of-cloud-applications-and-platforms
 
1 logical data models for cc arch
1 logical data models for cc arch1 logical data models for cc arch
1 logical data models for cc arch
 
3 apache-avro
3 apache-avro3 apache-avro
3 apache-avro
 
2 rest-elevator-pitch
2 rest-elevator-pitch2 rest-elevator-pitch
2 rest-elevator-pitch
 
1 distributed-systems-template-modified
1 distributed-systems-template-modified1 distributed-systems-template-modified
1 distributed-systems-template-modified
 

Recently uploaded

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Recently uploaded (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Bigtable

  • 1. Bigtable A Distributed Storage System for Structured Data Authors: Fay Chang et. al. Presenter: Zafar Gilani 1
  • 2. Bigtable Outline • Introduction • Data model • Implementation • Performance evaluation • Conclusions 2
  • 3. Bigtable A distributed storage system .. • .. for managing structured data. • Used for demanding workloads, such as: – Throughput oriented batch processing. – Serving latency-sensitive data to the client. • Dynamic control instead of relational model. • Data locality properties (revisit later briefly). 3
  • 4. Bigtable Bigtable has achieved several goals • Wide applicability: used for 60+ Google products, including: – Google Analytics, Google Code, Google Earth, Google Maps and Gmail. • Scalability (explain later under evaluation). • High performance. • High availability. 4
  • 5. Bigtable Outline • Introduction • Data model • Implementation • Performance evaluation • Conclusions 5
  • 6. Bigtable Data model • Essentially a sparse, distributed, persistent multi-dimensional sorted map. • The map is indexed by a row key, column key and a timestamp. • Atomic reads and writes over a single row. Columns Rows 6
  • 7. Bigtable Row and column range • Row range dynamically • Column keys grouped partitioned into tablets. into column families. • Data in lexicographic • Each family has the order. same type. • Allows data locality. • Allows access control and disk or memory accounting. 7
  • 8. Bigtable Row and column range • Row range dynamically • Column keys grouped partitioned into tablets. into column families. • Data in lexicographic • Each family has the order. same type. • Allows data locality. • Allows access control and disk or memory accounting. Enables reasoning about data locality 8
  • 10. Columns Rows Anchor is a column family 10
  • 11. Columns Tablets “anchor:bbcworld.com “anchor:weather “com.bbc.www” “BBC” “BBC.com” 11
  • 12. Bigtable Outline • Introduction • Data model • Implementation • Performance evaluation • Conclusions 12
  • 13. Bigtable Bigtable uses several other technologies • Google File System to store log and data files. • SSTable file format to store BigTable data. • Chubby, a distributed lock service. For more details on these technologies, refer to section 4 of the paper. 13
  • 14. Bigtable Implementation Master responsibilities: -Assign tablets to tablet servers -Add/delete tablet servers -Balance tablet server load -GC -Schema changes INTERNET CLIENT Communicate directly to tablet servers MASTER TABLET SERVERS 14
  • 15. Bigtable How data is stored? A three-level hierarchy, similar to B+ trees. 15
  • 16. Bigtable Location hierarchy Chubby file contains location of the root tablet. 16
  • 17. Bigtable Location hierarchy Root tablet contains all tablet locations in Metadata table. 17
  • 18. Bigtable Location hierarchy Metadata table stores locations of actual tablets. 18
  • 19. Bigtable Location hierarchy Client moves up the hierarchy (Metadata -> Root -> Chubby), if location of tablet is unknown or incorrect. 19
  • 20. Bigtable How data is served? 20
  • 22. Bigtable Tablet serving Compactions Compactions occur regularly, advantages: -Shrinks memory usage. -Reduces amount of data read from log during recovery. 22
  • 23. Bigtable Outline • Introduction • Data model • Implementation • Performance evaluation • Conclusions 23
  • 24. Bigtable Benchmarks for perf evaluation • Scan: – Scans over values in a row range. • Random reads from memory. • Random reads/writes: – R keys to be read/written spread over N clients. • Sequential reads/writes: – 0 to R-1 keys to be read/written spread over N clients. 24
  • 25. Bigtable Performance evaluation Scan uses single RPC call and shows best performance. 25
  • 26. Bigtable Performance evaluation Sequential reads are better than random reads, since each fetched block is used to serve next requests. 26
  • 27. Bigtable Performance evaluation Random read shows the worst performance. Fetching 64KB every 1000 bytes is expensive. 27
  • 28. Bigtable Performance evaluation Not linear, but scales well. 28
  • 29. Bigtable Outline • Introduction • Data model • Implementation • Performance evaluation • Conclusions 29
  • 30. Bigtable Conclusions • Bigtable: highly scalable and available, without compromising performance. • Flexibility for Google – designed using their own data model. • Custom design gives Google the ability to remove or minimize bottlenecks. • Related work: – Apache Hbase (open source) – Boxwood (though targeted at a lower/FS level) 30
  • 31. Bigtable A Distributed Storage System for Structured Data Authors: Fay Chang et. al. Presenter: Zafar Gilani 31
  • 32. Bigtable B+ Trees • A tree with sorted data for: – Efficient insertion, retrieval and removal of records. • All records are stored at the leaf level, only keys stored in interior nodes. 32