4. Big Picture
• Same basic pieces as most databases:
– Driver: manage interaction with client
– Parser: process textual query language
– Compiler / Optimizer: convert logical query into physical plan
– Execution Engine: run physical plan across cluster
– Storage Handlers: feed user data in/out of execution
5. Parser
• Converts text-based query language into internal DAG representation
– Grammar, syntax, basic query validation
– Generally straightforward to implement
• Initial goal is to support a SQL-like query language for nested data (DrQL)
– Compatible with Google BigQuery/Dremel
– Designed to support data sources that have a well-defined schema (e.g. protocol buffers) as well as those that don't (e.g. JSON)
• Other potential input styles:
– MongoDB's query language
– Hive
– Pig
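As a rough illustration of the parser's job, the sketch below turns a tiny SQL-like query into an ordered list of logical operators (a linear plan, the simplest case of a DAG). The grammar, the `parse` function and the operator dictionaries are all hypothetical; this is not DrQL and it handles only one query shape.

```python
import re

# Toy grammar (an assumption for illustration, not DrQL):
#   SELECT col[, col...] FROM source [WHERE col = value][;]
QUERY = re.compile(
    r"^\s*SELECT\s+(?P<columns>.+?)\s+FROM\s+(?P<source>[^\s;]+)"
    r"(?:\s+WHERE\s+(?P<column>\w+)\s*=\s*(?P<value>[^\s;]+))?\s*;?\s*$",
    re.IGNORECASE,
)


def parse(query):
    """Validate the query text and return a list of logical operators."""
    match = QUERY.match(query)
    if match is None:
        raise ValueError(f"cannot parse query: {query!r}")

    # Data flows from the first operator to the last.
    plan = [{"op": "scan", "source": match["source"]}]
    if match["column"] is not None:
        plan.append({
            "op": "filter",
            "column": match["column"],
            "equals": match["value"].strip("'\""),
        })
    columns = [c.strip() for c in match["columns"].split(",")]
    plan.append({"op": "project", "columns": columns})
    return plan


plan = parse("SELECT name, age FROM people WHERE country = 'US';")
```

A real parser would use a proper grammar and produce a general DAG; the point here is only that the output is a data structure the later stages can consume.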
6. Traditional Query Optimizers
• 30+ years of research into relational query optimization
– We have to follow the same general path
• Converts a logical query plan into a physical one
– Example: convert logical “JOIN” operator into specific hash join operator
– Attempts to choose the “best” overall execution plan
• Magic black box of statistics!
– Optimizers do well with queries that can be easily modeled with available statistics
– Difficulties: lack of statistics, complex schemas, complex queries
– Database users often work around the optimizer using query hints
● “force index”
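The logical-to-physical step can be sketched as below: a logical join becomes a hash join when statistics say the build side is small enough, and a hint overrides the choice. The class names, the `MAX_BUILD_ROWS` threshold and the sort-merge fallback are assumptions made for illustration, not a description of any particular optimizer.

```python
from dataclasses import dataclass


@dataclass
class LogicalJoin:
    left: str
    right: str
    key: str


@dataclass
class PhysicalJoin:
    algorithm: str   # "hash_join" or "sort_merge_join"
    build_side: str  # input used to build the hash table
    probe_side: str
    key: str


# Hypothetical limit on how many rows we are willing to hash in memory.
MAX_BUILD_ROWS = 1_000_000


def plan_join(join, row_counts, hint=None):
    """Choose a physical join. row_counts maps table name -> estimated rows."""
    left_rows = row_counts.get(join.left)
    right_rows = row_counts.get(join.right)

    # Build on the smaller input when both estimates are available.
    if left_rows is not None and right_rows is not None and right_rows < left_rows:
        build, probe = join.right, join.left
    else:
        build, probe = join.left, join.right

    if hint is not None:
        # A user hint bypasses the statistics-based choice.
        return PhysicalJoin(hint, build, probe, join.key)

    build_rows = row_counts.get(build)
    if build_rows is not None and build_rows <= MAX_BUILD_ROWS:
        algorithm = "hash_join"
    else:
        # Missing statistics or a large build side: conservative fallback.
        algorithm = "sort_merge_join"
    return PhysicalJoin(algorithm, build, probe, join.key)


join = LogicalJoin("orders", "customers", "customer_id")
physical = plan_join(join, {"orders": 50_000_000, "customers": 20_000})
```

With no statistics at all, the same call falls back to the conservative algorithm, which is the "lack of statistics" difficulty in miniature.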
7. Intermediate Representation
• Intermediate Representation (IR) is common internal API
– Output from Parser
– Input to and output from Optimizer
– Input to Execution Engine
• Textual Representation:
– Flexibility
● Different users can enter at different levels of the IR
● Advanced users can skip optimizer entirely
– Easier to test various pieces
– Easy to cache
● Query optimization can be computationally expensive, so traditional databases go to great lengths to reuse execution plans
• Ideally the IR would also be the format used between optimization passes
– Inspiration: LLVM, SQL Server showplan
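A textual IR makes caching simple: the plan text itself can be hashed and used as the cache key. The sketch below assumes JSON as the textual form and a list of operator dictionaries as the plan; both are illustrative choices, and `PlanCache` and `toy_optimize` are hypothetical names.

```python
import hashlib
import json


def plan_to_text(plan):
    """Serialize a plan to canonical text (sorted keys give a stable form)."""
    return json.dumps(plan, sort_keys=True, indent=2)


def text_to_plan(text):
    return json.loads(text)


class PlanCache:
    """Reuse optimized plans for logical plans that are textually equivalent."""

    def __init__(self, optimize):
        self._optimize = optimize
        self._cache = {}
        self.misses = 0

    def physical_plan(self, logical_text):
        # Canonicalize first so key order and whitespace do not matter.
        canonical = plan_to_text(text_to_plan(logical_text))
        key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            optimized = self._optimize(text_to_plan(canonical))
            self._cache[key] = plan_to_text(optimized)
        return self._cache[key]


def toy_optimize(plan):
    """Stand-in optimizer: rewrite every logical join as a hash join."""
    return [{**op, "op": "hash_join"} if op["op"] == "join" else op for op in plan]


cache = PlanCache(toy_optimize)
first = cache.physical_plan('[{"op": "scan", "source": "a"}, {"op": "join", "key": "id"}]')
second = cache.physical_plan('[{"source":"a","op":"scan"},{"key":"id","op":"join"}]')
```

Because the optimizer's input and output are both text, an advanced user could also hand-write the physical plan and skip `physical_plan` entirely.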
8. Execution Engine
• Execution layer
– Query is a DAG of operators
• Operator layer
– Implementation of individual operators and data format serialization
9. Execution Layer
• Query structured as a Directed Acyclic Graph (DAG) representing the data flow
– Each node is an abstract “operator”
– Communication between nodes is “blobs” of data
– Data model described well in Microsoft's Dryad paper (Isard '07)
• Responsible for handling:
– Operator dependencies
– Task scheduling
– Inter-node communication
• Notable features:
– Speculative execution
– Pipelining with spill-to-disk as fallback
– Back pressure
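Pipelining and back pressure can be sketched with one thread per operator and bounded queues between them: a fast producer blocks as soon as its consumer falls `max_in_flight` blobs behind. This is a minimal single-process sketch for a linear chain of operators, assuming Python 3.9+ for `graphlib`; `run_pipeline` and its parameters are hypothetical, and speculative execution and spill-to-disk are not modeled.

```python
import queue
import threading
from graphlib import TopologicalSorter

DONE = object()  # sentinel marking the end of the stream


def run_operator(fn, inbox, outbox):
    """Pull blobs from inbox, apply fn, and push the results downstream."""
    while True:
        blob = inbox.get()
        if blob is DONE:
            outbox.put(DONE)
            return
        outbox.put(fn(blob))  # blocks while the downstream queue is full


def run_pipeline(operators, deps, source_blobs, max_in_flight=2):
    """operators: name -> function(blob); deps: name -> set of upstream names.

    Only a linear chain is supported here; a general DAG needs fan-in/fan-out.
    """
    # Resolve operator dependencies into an execution order.
    order = list(TopologicalSorter(deps).static_order())
    queues = [queue.Queue(maxsize=max_in_flight) for _ in range(len(order) + 1)]
    threads = [
        threading.Thread(target=run_operator,
                         args=(operators[name], queues[i], queues[i + 1]))
        for i, name in enumerate(order)
    ]

    def feed():
        for blob in source_blobs:
            queues[0].put(blob)
        queues[0].put(DONE)

    threads.append(threading.Thread(target=feed))
    for thread in threads:
        thread.start()

    results = []
    while True:
        blob = queues[-1].get()
        if blob is DONE:
            break
        results.append(blob)
    for thread in threads:
        thread.join()
    return results


operators = {
    "scan": lambda blob: blob,
    "filter": lambda blob: [x for x in blob if x % 2 == 0],
    "sum": lambda blob: sum(blob),
}
deps = {"scan": set(), "filter": {"scan"}, "sum": {"filter"}}
totals = run_pipeline(operators, deps, [[1, 2, 3, 4], [5, 6], [7, 8, 9, 10]])
# totals == [6, 6, 18]
```

The bounded queues are what keep memory use flat: no operator can run more than `max_in_flight` blobs ahead of the one after it.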
10. Operator Execution
• Implementation of individual operators
– Example built-in operators: hash aggregate, filter, json-scan
– Extensible so new operators are easy to plug in
• Serialization-aware:
– Each “blob” is a batch of rows in a particular format:
● Row-wise, no schema: MessagePack
● Row-wise, schema: Protocol Buffers
● Columnar, schema: Dremel-style format
– Different operator implementations for different serializations
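One way to get pluggable, serialization-aware operators is a registry keyed by operator name and batch format. In the sketch below, plain Python lists and dicts stand in for the real encodings (MessagePack, Protocol Buffers, Dremel-style columns), and the `operator` decorator, `OPERATORS` table and `run` helper are hypothetical names.

```python
OPERATORS = {}  # (operator name, batch format) -> implementation


def operator(name, fmt):
    """Decorator that plugs an implementation into the registry."""
    def register(fn):
        OPERATORS[(name, fmt)] = fn
        return fn
    return register


@operator("filter", "row")
def filter_rows(batch, column, predicate):
    # Row-wise batch: a list of dicts, one per row.
    return [row for row in batch if predicate(row[column])]


@operator("filter", "columnar")
def filter_columns(batch, column, predicate):
    # Columnar batch: a dict mapping column name -> list of values.
    keep = [i for i, value in enumerate(batch[column]) if predicate(value)]
    return {name: [values[i] for i in keep] for name, values in batch.items()}


def run(name, fmt, batch, *args):
    """Dispatch to the implementation that matches the batch's format."""
    try:
        impl = OPERATORS[(name, fmt)]
    except KeyError:
        raise LookupError(f"no {name!r} operator for format {fmt!r}") from None
    return impl(batch, *args)


rows = [{"id": 1, "age": 17}, {"id": 2, "age": 42}]
columns = {"id": [1, 2], "age": [17, 42]}
adults_rows = run("filter", "row", rows, "age", lambda age: age >= 18)
adults_cols = run("filter", "columnar", columns, "age", lambda age: age >= 18)
```

The columnar version only touches the filtered column to decide which positions to keep, which is the usual reason to have a separate implementation per format.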
11. Storage Interfaces
• Scanner operators
– Common APIs to convert user data into formats understood by execution operators
– Example conversions:
● JSON → MessagePack
● CSV → MessagePack
● Dremel: columnar serialization → Protocol Buffers
• Data sources:
– HDFS
– NFS
– HBase / Cassandra
– MySQL / PostgreSQL / etc
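A scanner's contract can be sketched as "read a source, yield batches in a common row format". Below, lists of dicts stand in for MessagePack batches (an assumption, to keep the example in the standard library), and the function names and `batch_size` parameter are hypothetical.

```python
import csv
import io
import json


def batches(rows, batch_size):
    """Group an iterable of rows into lists of at most batch_size rows."""
    batch = []
    for row in rows:
        batch.append(dict(row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def scan_csv(text, batch_size=2):
    # CSV carries no types, so every value arrives as a string.
    yield from batches(csv.DictReader(io.StringIO(text)), batch_size)


def scan_json_lines(text, batch_size=2):
    # One JSON object per line; nested values are preserved as-is.
    rows = (json.loads(line) for line in text.splitlines() if line.strip())
    yield from batches(rows, batch_size)


csv_batches = list(scan_csv("id,name\n1,a\n2,b\n3,c\n"))
json_batches = list(scan_json_lines('{"id": 1}\n{"id": 2, "tags": ["x"]}\n'))
```

Both scanners produce the same shape of output, so downstream operators do not need to know which source the data came from.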
12. Storage Interfaces (continued)
• Scanner Flexibility:
– Allow in-place filtering (predicate pushdown)
– Scanners can manage their own caching policies for their data
• In-place processing
– Having a separate “ETL” step is painful
● Easiest to process data on demand
– Query workload gives feedback on scanner access patterns
● Database Cracking: adaptively convert storage layout into more efficient forms
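Predicate pushdown means the scanner applies the filter while reading, so non-matching rows never reach the execution engine. The sketch below also records which columns queries filter on, as a stand-in for the workload feedback mentioned above; it only collects the counts and does not reorganize storage. The `Scanner` class and its attributes are hypothetical.

```python
from collections import Counter


class Scanner:
    """In-memory scanner with an optional pushed-down equality-style filter."""

    def __init__(self, rows):
        self._rows = rows
        self.filter_columns = Counter()  # feedback: filters seen per column
        self.rows_emitted = 0

    def scan(self, column=None, predicate=None):
        """Yield rows; when a column and predicate are given, filter in place."""
        if column is not None:
            self.filter_columns[column] += 1
        for row in self._rows:
            if column is None or predicate(row[column]):
                self.rows_emitted += 1
                yield row


scanner = Scanner([
    {"id": 1, "country": "US"},
    {"id": 2, "country": "DE"},
    {"id": 3, "country": "US"},
])
us_rows = list(scanner.scan("country", lambda value: value == "US"))
```

A cracking-style scanner could use `filter_columns` to decide which column to sort or index next; that step is left out here.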
13. Design Principles
• Flexible
– Pluggable query languages
– Extensible execution engine
– Pluggable data formats
– Column-based and row-based
– Schema and schema-less
– Pluggable data sources
• Easy
– Unzip and run
– Zero configuration
– Reverse DNS not needed
– IP addresses can change
– Clear and concise log messages
• Dependable
– No SPOF
– Instant recovery from crashes
• Fast
– C/C++ core with Java support
– Google C++ style guide
– Min latency and max throughput (limited only by hardware)