2. NewSQL
• Scale like NoSQL
• Provide SQL functions
– Complete transaction support
– Full SQL query support
3. Spanner
• Globally distributed DB at Google
• Survives datacenter-level disasters
• Consistency
– Externally consistent reads and writes
– Globally consistent reads at a timestamp
4. Spanner cluster
• zonemaster: assigns data to spanservers
• spanserver: serves data to clients
• location proxy: used by clients to locate spanservers
• universe master: console to display status information
• placement driver: handle automated movement of data across zones
5. Spanner server
• Colossus: successor of GFS
• Tablet: stores data for a key range
• Paxos: consensus protocol to keep replicas in sync
7. Data model
• SQL like schema
– Row must have PK
• Each row is multi versioned
– Version by timestamp
– Old versions are garbage collected
• Protocol buffer support
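
A minimal sketch of the multi-versioning idea above, assuming integer timestamps and a single value per row. The class and method names (VersionedCell, read_at, garbage_collect) are hypothetical and only illustrate reading at a timestamp and collecting old versions; this is not Spanner's storage format.

```python
import bisect


class VersionedCell:
    """One row value kept as parallel lists of timestamps and values."""

    def __init__(self):
        self._timestamps = []  # kept sorted ascending
        self._values = []

    def write(self, ts, value):
        # Assumption: writes arrive in strictly increasing timestamp order.
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("timestamps must increase")
        self._timestamps.append(ts)
        self._values.append(value)

    def read_at(self, ts):
        # Newest version whose timestamp is <= ts, or None if none exists.
        i = bisect.bisect_right(self._timestamps, ts)
        return self._values[i - 1] if i else None

    def garbage_collect(self, horizon):
        # Keep the newest version <= horizon plus everything newer, so
        # reads at or after `horizon` still see the same data.
        i = bisect.bisect_right(self._timestamps, horizon)
        keep_from = max(i - 1, 0)
        del self._timestamps[:keep_from]
        del self._values[:keep_from]
```

After garbage collection, reads older than the horizon may no longer find a version, which is why a real system would only collect versions older than any read it still intends to serve.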
8. Interleaved tables
• Rows with same key prefix are grouped into one directory
• Data in same directory is co-located
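
The grouping by key prefix can be sketched as follows, assuming row keys are tuples and the first key component identifies the parent row. The function name and the sample keys are hypothetical.

```python
from collections import defaultdict


def group_into_directories(rows, prefix_len=1):
    """Group rows whose keys share the same prefix into one directory.

    rows: dict mapping a tuple key to a value.
    prefix_len: number of leading key components that define a directory
    (an assumption of this sketch).
    """
    directories = defaultdict(list)
    for key in sorted(rows):
        directories[key[:prefix_len]].append(key)
    return dict(directories)


rows = {
    ("user2",): "user row",
    ("user1", "album2"): "child row",
    ("user1",): "user row",
    ("user1", "album1"): "child row",
}
directories = group_into_directories(rows)
```

Each parent row ends up next to its child rows, so a read of one user and that user's albums touches a single directory.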
16. Types of transaction
• Read write transaction
– Read locks are held at the replica leader
– Client buffers all writes
– At end of transaction, two-phase commit (2PC)
• Snapshot transaction
– Read data at previous timestamp
– Any sufficiently up-to-date replica can serve the read
– Lock free
• Read only transaction
– Spanner chooses the read timestamp
– The rest is the same as for a snapshot transaction
19. True time API
• Explicitly express time uncertainty
• Time masters implemented using GPS and atomic clocks
• Time daemon on every machine
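
A minimal sketch of an interval-based clock in the spirit of the TrueTime API, assuming a fixed uncertainty bound epsilon around a local clock reading. The names TT.now(), TT.after() and TT.before() follow the Spanner paper; the epsilon value, the TTInterval type and the injectable clock are assumptions of this sketch.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTime:
    def __init__(self, epsilon=0.007, clock=time.time):
        self.epsilon = epsilon  # assumed uncertainty bound, in seconds
        self.clock = clock      # injectable so the sketch can be tested

    def now(self):
        # The true time is assumed to lie within [earliest, latest].
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        # True only if t has definitely passed.
        return self.now().earliest > t

    def before(self, t):
        # True only if t has definitely not arrived yet.
        return self.now().latest < t
```

Callers never see a single "current time"; they only learn whether a timestamp is definitely in the past or definitely in the future.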
20. Timestamp for RW transaction
• Paxos write
– Monotonically increasing timestamp associated with each write
• Participant: prepare timestamp
– Bigger than all previous transactions
• Coordinator: commit message time
– TT.now().latest
• Coordinator: commit timestamp
– Bigger than all prepare timestamps
– Bigger than all previous transactions
– Bigger than commit message time
• Coordinator: wait time
– Wait until TT.after(commit timestamp) is true
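
The coordinator's rules above can be sketched as follows, assuming integer timestamps. The function names and the simulated clock are hypothetical. The wait here runs until the chosen commit timestamp has definitely passed, following the commit-wait rule in the Spanner paper; since the commit timestamp is bigger than the commit message time, this also covers waiting on the commit message time.

```python
def choose_commit_timestamp(prepare_timestamps, last_assigned, commit_msg_latest):
    """Pick the smallest integer timestamp bigger than every constraint.

    prepare_timestamps: prepare timestamps reported by the participants.
    last_assigned: largest timestamp this leader gave a previous transaction.
    commit_msg_latest: TT.now().latest read when the commit message arrived.
    """
    floor = max([last_assigned, commit_msg_latest, *prepare_timestamps])
    return floor + 1


def commit_wait(commit_ts, now_earliest, tick):
    """Block until commit_ts has definitely passed; return the tick count.

    now_earliest: callable returning TT.now().earliest.
    tick: callable that lets time advance (a sleep in a real system).
    """
    ticks = 0
    while not now_earliest() > commit_ts:
        tick()
        ticks += 1
    return ticks
```

Only after the wait returns would the coordinator make the commit visible, so no client can observe the transaction before its timestamp is in the past everywhere.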
21. Up to date?
• Safe time of replica: min of
– Safe time of Paxos
– Safe time of transaction manager
• Safe time of Paxos
– Timestamp of the highest applied Paxos write
• Safe time of transaction manager
– Min prepare timestamp of prepared transactions
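
The safe-time rule above can be sketched as follows, assuming integer timestamps. The function names are hypothetical. As an assumption taken from the Spanner paper, the transaction-manager safe time is the minimum prepare timestamp minus 1, so a read at exactly a prepare timestamp is not served, and it is unbounded when nothing is prepared.

```python
import math


def paxos_safe_time(applied_write_timestamps):
    # Timestamp of the highest applied Paxos write.
    return max(applied_write_timestamps, default=-math.inf)


def tm_safe_time(prepare_timestamps):
    # No prepared transactions: nothing can commit below any timestamp.
    if not prepare_timestamps:
        return math.inf
    return min(prepare_timestamps) - 1


def replica_safe_time(applied_write_timestamps, prepare_timestamps):
    return min(paxos_safe_time(applied_write_timestamps),
               tm_safe_time(prepare_timestamps))


def can_serve_read(read_ts, applied_write_timestamps, prepare_timestamps):
    # A replica is up to date enough if read_ts <= its safe time.
    return read_ts <= replica_safe_time(applied_write_timestamps,
                                        prepare_timestamps)
```

A prepared but uncommitted transaction holds the safe time back, because its final commit timestamp is not yet known and could fall at or after its prepare timestamp.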
22. Schema change
• Plan schema change at a future time
• All shards perform the schema change at that time
• Reads and writes coordinate with the schema change based on its timestamp
26. Distributed join
• Use sharding key filter to extract sharding key ranges from input
• Merge sharding key ranges
• Compute affected shards
• Construct minimal batches for these shards
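
The four steps above can be sketched as follows, assuming integer sharding keys, half-open key ranges [start, end) and a static shard map. The function names and the shard names are hypothetical.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent half-open key ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def batches_per_shard(merged, shard_bounds):
    """Clip each merged range to the shards it touches.

    shard_bounds: dict mapping shard name to its (start, end) key range.
    Returns only the affected shards, each with its clipped ranges.
    """
    batches = {}
    for start, end in merged:
        for shard, (lo, hi) in shard_bounds.items():
            s, e = max(start, lo), min(end, hi)
            if s < e:  # non-empty intersection
                batches.setdefault(shard, []).append((s, e))
    return batches


key_ranges = [(5, 12), (1, 4), (3, 6), (40, 45)]
shards = {"A": (0, 10), "B": (10, 20), "C": (20, 30), "D": (30, 50)}
merged = merge_ranges(key_ranges)
batches = batches_per_shard(merged, shards)
```

Shard C receives no batch at all, which is the point of computing affected shards before sending any request.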
27. Run query
• Single consumer API
• Parallel consumer API
• Query auto restart
– Any machine can fail
– A restart token accompanies every batch of query results
– The token captures the distributed state of the query plan
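
A minimal sketch of query auto restart, assuming the restart token is simply the offset of the next row to return. A real token would capture the distributed state of the query plan; the function names and the simulated failure are hypothetical.

```python
def run_query(rows, restart_token=None):
    """Yield (row, restart_token) pairs, resuming after the given token."""
    start = 0 if restart_token is None else restart_token
    for i in range(start, len(rows)):
        yield rows[i], i + 1  # the token points at the next unread row


def consume_with_restart(rows, fail_at):
    """Read all rows, surviving one simulated failure after fail_at rows."""
    results, token, failed = [], None, False
    while True:
        try:
            for row, next_token in run_query(rows, token):
                if not failed and len(results) == fail_at:
                    failed = True
                    raise ConnectionError("simulated server failure")
                results.append(row)
                token = next_token  # remember progress only after success
            return results
        except ConnectionError:
            continue  # restart the query from the last token
```

Because the token advances only after a row has been consumed, the restarted query neither skips nor repeats rows.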