2. Bigtable
Outline
• Introduction
• Data model
• Implementation
• Performance evaluation
• Conclusions
2
3. Bigtable
A distributed storage system ..
• .. for managing structured data.
• Used for demanding workloads, such as:
– Throughput oriented batch processing.
– Serving latency-sensitive data to the client.
• Dynamic control instead of relational model.
• Data locality properties (revisit later briefly).
3
4. Bigtable
Bigtable has achieved several goals
• Wide applicability: used for 60+ Google
products, including:
– Google Analytics, Google Code, Google Earth,
Google Maps and Gmail.
• Scalability (explain later under evaluation).
• High performance.
• High availability.
4
5. Bigtable
Outline
• Introduction
• Data model
• Implementation
• Performance evaluation
• Conclusions
5
6. Bigtable
Data model
• Essentially a sparse, distributed, persistent
multi-dimensional sorted map.
• The map is indexed by a row key, column key
and a timestamp.
• Atomic reads and writes over a single row.
Columns
Rows
6
7. Bigtable
Row and column range
• Row range dynamically • Column keys grouped
partitioned into tablets. into column families.
• Data in lexicographic • Each family has the
order. same type.
• Allows data locality. • Allows access control
and disk or memory
accounting.
7
8. Bigtable
Row and column range
• Row range dynamically • Column keys grouped
partitioned into tablets. into column families.
• Data in lexicographic • Each family has the
order. same type.
• Allows data locality. • Allows access control
and disk or memory
accounting.
Enables reasoning about data locality
8
12. Bigtable
Outline
• Introduction
• Data model
• Implementation
• Performance evaluation
• Conclusions
12
13. Bigtable
Bigtable uses several other
technologies
• Google File System to store log and data files.
• SSTable file format to store BigTable data.
• Chubby, a distributed lock service.
For more details on these technologies, refer to section 4 of the paper.
13
14. Bigtable
Implementation
Master responsibilities:
-Assign tablets to tablet
servers
-Add/delete tablet servers
-Balance tablet server load
-GC
-Schema changes
INTERNET
CLIENT
Communicate directly
to tablet servers
MASTER
TABLET SERVERS
14
22. Bigtable
Tablet serving
Compactions
Compactions occur
regularly, advantages:
-Shrinks memory usage.
-Reduces amount of data
read from log during
recovery. 22
23. Bigtable
Outline
• Introduction
• Data model
• Implementation
• Performance evaluation
• Conclusions
23
24. Bigtable
Benchmarks for perf evaluation
• Scan:
– Scans over values in a row range.
• Random reads from memory.
• Random reads/writes:
– R keys to be read/written spread over N clients.
• Sequential reads/writes:
– 0 to R-1 keys to be read/written spread over N
clients.
24
29. Bigtable
Outline
• Introduction
• Data model
• Implementation
• Performance evaluation
• Conclusions
29
30. Bigtable
Conclusions
• Bigtable: highly scalable and available, without
compromising performance.
• Flexibility for Google – designed using their
own data model.
• Custom design gives Google the ability to
remove or minimize bottlenecks.
• Related work:
– Apache Hbase (open source)
– Boxwood (though targeted at a lower/FS level)
30
32. Bigtable
B+ Trees
• A tree with sorted data for:
– Efficient insertion, retrieval and removal of
records.
• All records are stored at the leaf level, only
keys stored in interior nodes.
32