SlideShare a Scribd company logo
1 of 39
TiDB for Big Data
shenli@PingCAP
About me
● Shen Li (申砾)
● Tech Lead of TiDB, VP of Engineering
● Netease / 360 / PingCAP
● Infrastructure software engineer
What is Big Data?
Big Data is a term for data sets that are so large or complex that traditional
data processing application software is inadequate to deal with them.
---- From Wikipedia
Big Data Landscape
OLTP and OLAP
What is TiDB
● SQL is necessary
● Scale is easy
● Compatible with MySQL, at most cases
● OLTP + OLAP = HTAP (Hybrid Transactional/Analytical Processing)
○ Transaction + Complex query
● 24/7 availability, even in case of datacenter outages
○ Thanks to Raft consensus algorithm
● Open source, of course.
Architecture
TiKV TiKV TiKV TiKV
Raft Raft Raft
TiDB TiDB TiDB
... ......
... ...
Placement
Driver (PD)
Control flow:
Balance / Failover
Metadata / Timestamp request
Stateless SQL Layer
Distributed Storage Layer
gRPC
gRPC
gRPC
Storage stack 1/2
● TiKV is the underlying storage layer
● Physically, data is stored in RocksDB
● We build a Raft layer on top of RocksDB
○ What is Raft?
● Written in Rust!
TiKV
API (gRPC)
Transaction
MVCC
Raft (gRPC)
RocksDB
Raw KV API
(https://github.com/pingc
ap/tidb/blob/master/cmd
/benchraw/main.go)
Transactional KV API
(https://github.com/pingcap
/tidb/blob/master/cmd/ben
chkv/main.go)
RocksDB
Instance
Region 1:[a-e]
Region 3:[k-o]
Region 5:[u-z]
...
Region 4:[p-t]
RocksDB
Instance
Region 1:[a-e]
Region 2:[f-j]
Region 4:[p-t]
...
Region 3:[k-o]
RocksDB
Instance
Region 2:[f-j]
Region 5:[u-z]
Region 3:[k-o]
... RocksDB
Instance
Region 1:[a-e]
Region 2:[f-j]
Region 5:[u-z]
...
Region 4:[p-t]
Raft group
Storage stack 2/2
● Data is organized by Regions
● Region: a set of continuous key-value pairs
RPC (gRPC)
Transaction
MVCC
Raft
RocksDB
···
Dynamic Multi-Raft
● What’s Dynamic Multi-Raft?
○ Dynamic split / merge
● Safe split / merge
Region 1:[a-e]
split Region 1.1:[a-c]
Region 1.2:[d-e]split
Safe Split: 1/4
TiKV1
Region 1:[a-e]
TiKV2
Region 1:[a-e]
TiKV3
Region 1:[a-e]
raft raft
Leader Follower Follower
Raft group
Safe Split: 2/4
TiKV2
Region 1:[a-e]
TiKV3
Region 1:[a-e]
raft raft
Leader
Follower Follower
TiKV1
Region 1.1:[a-c]
Region 1.2:[d-e]
Safe Split: 3/4
TiKV1
Region 1.1:[a-c]
Region 1.2:[d-e]
Leader
Follower Follower
Split log (replicated by Raft)
Split log
TiKV2
Region 1:[a-e]
TiKV3
Region 1:[a-e]
Safe Split: 4/4
TiKV1
Region 1.1:[a-c]
Leader
Region 1.2:[d-e]
TiKV2
Region 1.1:[a-c]
Follower
Region 1.2:[d-e]
TiKV3
Region 1.1:[a-c]
Follower
Region 1.2:[d-e]
raft
raft
raft
raft
Region 1
Region 3
Region 1
Region 2
Scale-out (initial state)
Region 1*
Region 2 Region 2
Region 3Region 3
Node A
Node B
Node C
Node D
Region 1
Region 3
Region 1^
Region 2
Region 1*
Region 2 Region 2
Region 3
Region 3
Node A
Node B
Node E
1) Transfer leadership of region 1 from Node A to Node B
Node C
Node D
Scale-out (add new node)
Region 1
Region 3
Region 1*
Region 2
Region 2 Region 2
Region 3
Region 1
Region 3
Node A
Node B
2) Add Replica on Node E
Node C
Node D
Node E
Region 1
Scale-out (balancing)
Region 1
Region 3
Region 1*
Region 2
Region 2 Region 2
Region 3
Region 1
Region 3
Node A
Node B
3) Remove Replica from Node A
Node C
Node D
Node E
Scale-out (balancing)
ACID Transaction
● Based on Google Percolator
● ‘Almost’ decentralized 2-phase commit
○ Timestamp Allocator
● Optimistic transaction model
● Default isolation level: Repeatable Read
● External consistency: Snapshot Isolation + Lock
■ SELECT … FOR UPDATE
Distributed SQL
● Full-featured SQL layer
● Predicate pushdown
● Distributed join
● Distributed cost-based optimizer (Distributed CBO)
TiDB SQL Layer overview
What happens behind a query
CREATE TABLE t (c1 INT, c2 TEXT, KEY idx_c1(c1));
SELECT COUNT(c1) FROM t WHERE c1 > 10 AND c2 = ‘golang’;
Query Plan
Partial Aggregate
COUNT(c1)
Filter
c2 = “golang”
Read Index
idx1: (10, +∞)
Physical Plan on TiKV (index scan)
Read Row Data
by RowID
RowID
Row
Row
Final Aggregate
SUM(COUNT(c1))
DistSQL Scan
Physical Plan on TiDB
COUNT(c1)
COUNT(c1)
TiKV
TiKV
TiKV
COUNT(c1)
COUNT(c1)
What happens behind a query
CREATE TABLE left (id INT, email TEXT,KEY idx_id(id));
CREATE TABLE right (id INT, email TEXT, KEY idx_id(id));
SELECT * FROM left join right WHERE left.id = right.id;
Distributed Join (HashJoin)
Supported Distributed Join Type
● Hash Join
● Sort merge Join
● Index-lookup Join
Hybrid Transactional/Analytical Processing
TiDB with the Big Data Ecosystem
Syncer
● Synchronize data from MySQL in real-time
● Hook up as a MySQL replica
MySQL
(master)
Syncer
Save Point
(disk)
Rule Filter
MySQL
TiDB Cluster
TiDB Cluster
TiDB Cluster
Syncer
Syncerbinlog
Fake slave
Syncer
or
TiDB-Binlog
TiDB Server
TiDB Server Sorter
Pumper
Pumper
TiDB Server
Pumper
Protobuf
MySQL Binlog
MySQL
3rd party applicationsCistern
● Subscribe the incremental data from TiDB
● Output Protobuf formatted data or MySQL Binlog format(WIP)
Another TiDB-Cluster
TiSpark
TiKV TiKV TiKV TiKV TiKV
TiDB TiDB
TiDB
TiDB + SparkSQL = TiSpark
Spark Master
TiKV Connector
Data Storage & Coprocessor
PD
Spark Exec
TiKV Connector
Spark Exec
TiKV Connector
Spark Exec
TiSpark
● TiKV Connector is better than JDBC connector
● Index support
● Complex Calculation Pushdown
● CBO
○ Pick up right Access Path
○ Join Reorder
● Priority & Isolation Level
Too Abstract? Let’s get concrete.
TiKV
CoprocessorSpark
SQL Plan PushDown Plan
SQL: Select sum(score) from t1 group by class
where school = “engineering”;
Pushdown Plan: Sum(score), Group by class, Table:t1
Filter: School = “engineering”
Use Case
Use Case MySQL Spark TiDB TiSpark
Large-Aggregat
es
Slow or
impossible if
beyond scale
Well supported Supported Well supported
Large-joins Slow or
impossible if
beyond scale
Well supported Supported Well supported
Point Query Fast Very slow on
HDFS
Fast Fast
Modification Supported Not possible on
HDFS
Supported Supported
Benefit
● Analytical / Transactional support all on one platform
○ No need for ETL
○ Real-time query with Spark
○ Possibility for get rid of Hadoop
● Embrace Spark echo-system
○ Support of complex transformation and analytics with Scala /
Python and R
○ Machine Learning Libraries
○ Spark Streaming
Current Status
● Phase 1: (will be released with GA)
○ Aggregates pushdown
○ Type System
○ Filter Pushdown and Access Path selection
● Phase 2: (EOY)
○ Join Reorder
○ Write
Future work of TiDB
Roadmap
● TiSpark: Integrate TiKV with SparkSQL
● Better optimizer (Statistic && CBO)
● Json type and document store for TiDB
○ MySQL 5.7.12+ X-Plugin
● Integrate with Kubernetes
○ Operator by CoreOS
Thanks
https://github.com/pingcap/tidb
https://github.com/pingcap/tikv
Contact me:
shenli@pingcap.com

More Related Content

What's hot

What's hot (20)

When Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu MaWhen Apache Spark Meets TiDB with Xiaoyu Ma
When Apache Spark Meets TiDB with Xiaoyu Ma
 
PostgreSQL HA
PostgreSQL   HAPostgreSQL   HA
PostgreSQL HA
 
Large Table Partitioning with PostgreSQL and Django
 Large Table Partitioning with PostgreSQL and Django Large Table Partitioning with PostgreSQL and Django
Large Table Partitioning with PostgreSQL and Django
 
MySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZEMySQL 8.0 EXPLAIN ANALYZE
MySQL 8.0 EXPLAIN ANALYZE
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Speed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS AcceleratorSpeed up UDFs with GPUs using the RAPIDS Accelerator
Speed up UDFs with GPUs using the RAPIDS Accelerator
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
ProxySQL for MySQL
ProxySQL for MySQLProxySQL for MySQL
ProxySQL for MySQL
 
Demystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash SafetyDemystifying MySQL Replication Crash Safety
Demystifying MySQL Replication Crash Safety
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
MySQL Performance for DevOps
MySQL Performance for DevOpsMySQL Performance for DevOps
MySQL Performance for DevOps
 
Advanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suiteAdvanced Percona XtraDB Cluster in a nutshell... la suite
Advanced Percona XtraDB Cluster in a nutshell... la suite
 
MySQL Connectors 8.0.19 & DNS SRV
MySQL Connectors 8.0.19 & DNS SRVMySQL Connectors 8.0.19 & DNS SRV
MySQL Connectors 8.0.19 & DNS SRV
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Solving PostgreSQL wicked problems
Solving PostgreSQL wicked problemsSolving PostgreSQL wicked problems
Solving PostgreSQL wicked problems
 
Building resilient scheduling in distributed systems with Spring
Building resilient scheduling in distributed systems with SpringBuilding resilient scheduling in distributed systems with Spring
Building resilient scheduling in distributed systems with Spring
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
State of the Dolphin - May 2022
State of the Dolphin - May 2022State of the Dolphin - May 2022
State of the Dolphin - May 2022
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Presto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performancePresto query optimizer: pursuit of performance
Presto query optimizer: pursuit of performance
 

Similar to TiDB for Big Data

Similar to TiDB for Big Data (20)

A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)A Brief Introduction of TiDB (Percona Live)
A Brief Introduction of TiDB (Percona Live)
 
Scale Relational Database with NewSQL
Scale Relational Database with NewSQLScale Relational Database with NewSQL
Scale Relational Database with NewSQL
 
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
Introducing TiDB [Delivered: 09/27/18 at NYC SQL Meetup]
 
TiDB as an HTAP Database
TiDB as an HTAP DatabaseTiDB as an HTAP Database
TiDB as an HTAP Database
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
 
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
Introducing TiDB [Delivered: 09/25/18 at Portland Cloud Native Meetup]
 
TiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup GroupTiDB Introduction - Boston MySQL Meetup Group
TiDB Introduction - Boston MySQL Meetup Group
 
TiDB Introduction - San Francisco MySQL Meetup
TiDB Introduction - San Francisco MySQL MeetupTiDB Introduction - San Francisco MySQL Meetup
TiDB Introduction - San Francisco MySQL Meetup
 
TiDB vs Aurora.pdf
TiDB vs Aurora.pdfTiDB vs Aurora.pdf
TiDB vs Aurora.pdf
 
Rust in TiKV
Rust in TiKVRust in TiKV
Rust in TiKV
 
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKVPresentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
Presentation at SF Kubernetes Meetup (10/30/18), Introducing TiDB/TiKV
 
Introducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live FrankfurtIntroducing TiDB - Percona Live Frankfurt
Introducing TiDB - Percona Live Frankfurt
 
Introducing TiDB @ SF DevOps Meetup
Introducing TiDB @ SF DevOps MeetupIntroducing TiDB @ SF DevOps Meetup
Introducing TiDB @ SF DevOps Meetup
 
Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]Introducing TiDB Operator [Cologne, Germany]
Introducing TiDB Operator [Cologne, Germany]
 
FOSDEM MySQL and Friends Devroom
FOSDEM MySQL and Friends DevroomFOSDEM MySQL and Friends Devroom
FOSDEM MySQL and Friends Devroom
 
TiDB + Mobike by Kevin Xu (@kevinsxu)
TiDB + Mobike by Kevin Xu (@kevinsxu)TiDB + Mobike by Kevin Xu (@kevinsxu)
TiDB + Mobike by Kevin Xu (@kevinsxu)
 
Building a transactional key-value store that scales to 100+ nodes (percona l...
Building a transactional key-value store that scales to 100+ nodes (percona l...Building a transactional key-value store that scales to 100+ nodes (percona l...
Building a transactional key-value store that scales to 100+ nodes (percona l...
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
"Smooth Operator" [Bay Area NewSQL meetup]
"Smooth Operator" [Bay Area NewSQL meetup]"Smooth Operator" [Bay Area NewSQL meetup]
"Smooth Operator" [Bay Area NewSQL meetup]
 
Renegotiating the boundary between database latency and consistency
Renegotiating the boundary between database latency  and consistencyRenegotiating the boundary between database latency  and consistency
Renegotiating the boundary between database latency and consistency
 

More from PingCAP

[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases
PingCAP
 

More from PingCAP (20)

[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
[Paper Reading] Efficient Query Processing with Optimistically Compressed Has...
 
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
[Paper Reading]Orca: A Modular Query Optimizer Architecture for Big Data
 
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
[Paper Reading]KVSSD: Close integration of LSM trees and flash translation la...
 
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
[Paper Reading]Chucky: A Succinct Cuckoo Filter for LSM-Tree
 
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
[Paper Reading]The Bw-Tree: A B-tree for New Hardware Platforms
 
[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases[Paper Reading] QAGen: Generating query-aware test databases
[Paper Reading] QAGen: Generating query-aware test databases
 
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...[Paper Reading]  Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
[Paper Reading] Leases: An Efficient Fault-Tolerant Mechanism for Distribute...
 
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
[Paper reading] Interleaving with Coroutines: A Practical Approach for Robust...
 
[Paperreading] Paxos made easy (by sen han)
[Paperreading]  Paxos made easy (by sen han)[Paperreading]  Paxos made easy (by sen han)
[Paperreading] Paxos made easy (by sen han)
 
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
[Paper Reading] Generalized Sub-Query Fusion for Eliminating Redundant I/O fr...
 
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workl...
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote TiDB DevCon 2020 Opening Keynote
TiDB DevCon 2020 Opening Keynote
 
Finding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management SystemsFinding Logic Bugs in Database Management Systems
Finding Logic Bugs in Database Management Systems
 
Chaos Practice in PingCAP
Chaos Practice in PingCAPChaos Practice in PingCAP
Chaos Practice in PingCAP
 
TiDB at PayPay
TiDB at PayPayTiDB at PayPay
TiDB at PayPay
 
Paper Reading: FPTree
Paper Reading: FPTreePaper Reading: FPTree
Paper Reading: FPTree
 
Paper Reading: Smooth Scan
Paper Reading: Smooth ScanPaper Reading: Smooth Scan
Paper Reading: Smooth Scan
 
Paper Reading: Flexible Paxos
Paper Reading: Flexible PaxosPaper Reading: Flexible Paxos
Paper Reading: Flexible Paxos
 
Paper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in OraclePaper reading: Cost-based Query Transformation in Oracle
Paper reading: Cost-based Query Transformation in Oracle
 

Recently uploaded

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Recently uploaded (20)

The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 

TiDB for Big Data

  • 1. TiDB for Big Data shenli@PingCAP
  • 2. About me ● Shen Li (申砾) ● Tech Lead of TiDB, VP of Engineering ● Netease / 360 / PingCAP ● Infrastructure software engineer
  • 3. What is Big Data? Big Data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. ---- From Wikipedia
  • 6. What is TiDB ● SQL is necessary ● Scale is easy ● Compatible with MySQL, at most cases ● OLTP + OLAP = HTAP (Hybrid Transactional/Analytical Processing) ○ Transaction + Complex query ● 24/7 availability, even in case of datacenter outages ○ Thanks to Raft consensus algorithm ● Open source, of course.
  • 7. Architecture TiKV TiKV TiKV TiKV Raft Raft Raft TiDB TiDB TiDB ... ...... ... ... Placement Driver (PD) Control flow: Balance / Failover Metadata / Timestamp request Stateless SQL Layer Distributed Storage Layer gRPC gRPC gRPC
  • 8. Storage stack 1/2 ● TiKV is the underlying storage layer ● Physically, data is stored in RocksDB ● We build a Raft layer on top of RocksDB ○ What is Raft? ● Written in Rust! TiKV API (gRPC) Transaction MVCC Raft (gRPC) RocksDB Raw KV API (https://github.com/pingc ap/tidb/blob/master/cmd /benchraw/main.go) Transactional KV API (https://github.com/pingcap /tidb/blob/master/cmd/ben chkv/main.go)
  • 9. RocksDB Instance Region 1:[a-e] Region 3:[k-o] Region 5:[u-z] ... Region 4:[p-t] RocksDB Instance Region 1:[a-e] Region 2:[f-j] Region 4:[p-t] ... Region 3:[k-o] RocksDB Instance Region 2:[f-j] Region 5:[u-z] Region 3:[k-o] ... RocksDB Instance Region 1:[a-e] Region 2:[f-j] Region 5:[u-z] ... Region 4:[p-t] Raft group Storage stack 2/2 ● Data is organized by Regions ● Region: a set of continuous key-value pairs RPC (gRPC) Transaction MVCC Raft RocksDB ···
  • 10. Dynamic Multi-Raft ● What’s Dynamic Multi-Raft? ○ Dynamic split / merge ● Safe split / merge Region 1:[a-e] split Region 1.1:[a-c] Region 1.2:[d-e]split
  • 11. Safe Split: 1/4 TiKV1 Region 1:[a-e] TiKV2 Region 1:[a-e] TiKV3 Region 1:[a-e] raft raft Leader Follower Follower Raft group
  • 12. Safe Split: 2/4 TiKV2 Region 1:[a-e] TiKV3 Region 1:[a-e] raft raft Leader Follower Follower TiKV1 Region 1.1:[a-c] Region 1.2:[d-e]
  • 13. Safe Split: 3/4 TiKV1 Region 1.1:[a-c] Region 1.2:[d-e] Leader Follower Follower Split log (replicated by Raft) Split log TiKV2 Region 1:[a-e] TiKV3 Region 1:[a-e]
  • 14. Safe Split: 4/4 TiKV1 Region 1.1:[a-c] Leader Region 1.2:[d-e] TiKV2 Region 1.1:[a-c] Follower Region 1.2:[d-e] TiKV3 Region 1.1:[a-c] Follower Region 1.2:[d-e] raft raft raft raft
  • 15. Region 1 Region 3 Region 1 Region 2 Scale-out (initial state) Region 1* Region 2 Region 2 Region 3Region 3 Node A Node B Node C Node D
  • 16. Region 1 Region 3 Region 1^ Region 2 Region 1* Region 2 Region 2 Region 3 Region 3 Node A Node B Node E 1) Transfer leadership of region 1 from Node A to Node B Node C Node D Scale-out (add new node)
  • 17. Region 1 Region 3 Region 1* Region 2 Region 2 Region 2 Region 3 Region 1 Region 3 Node A Node B 2) Add Replica on Node E Node C Node D Node E Region 1 Scale-out (balancing)
  • 18. Region 1 Region 3 Region 1* Region 2 Region 2 Region 2 Region 3 Region 1 Region 3 Node A Node B 3) Remove Replica from Node A Node C Node D Node E Scale-out (balancing)
  • 19. ACID Transaction ● Based on Google Percolator ● ‘Almost’ decentralized 2-phase commit ○ Timestamp Allocator ● Optimistic transaction model ● Default isolation level: Repeatable Read ● External consistency: Snapshot Isolation + Lock ■ SELECT … FOR UPDATE
  • 20. Distributed SQL ● Full-featured SQL layer ● Predicate pushdown ● Distributed join ● Distributed cost-based optimizer (Distributed CBO)
  • 21. TiDB SQL Layer overview
  • 22. What happens behind a query CREATE TABLE t (c1 INT, c2 TEXT, KEY idx_c1(c1)); SELECT COUNT(c1) FROM t WHERE c1 > 10 AND c2 = ‘golang’;
  • 23. Query Plan Partial Aggregate COUNT(c1) Filter c2 = “golang” Read Index idx1: (10, +∞) Physical Plan on TiKV (index scan) Read Row Data by RowID RowID Row Row Final Aggregate SUM(COUNT(c1)) DistSQL Scan Physical Plan on TiDB COUNT(c1) COUNT(c1) TiKV TiKV TiKV COUNT(c1) COUNT(c1)
  • 24. What happens behind a query CREATE TABLE left (id INT, email TEXT,KEY idx_id(id)); CREATE TABLE right (id INT, email TEXT, KEY idx_id(id)); SELECT * FROM left join right WHERE left.id = right.id;
  • 26. Supported Distributed Join Type ● Hash Join ● Sort merge Join ● Index-lookup Join
  • 28. TiDB with the Big Data Ecosystem
  • 29. Syncer ● Synchronize data from MySQL in real-time ● Hook up as a MySQL replica MySQL (master) Syncer Save Point (disk) Rule Filter MySQL TiDB Cluster TiDB Cluster TiDB Cluster Syncer Syncerbinlog Fake slave Syncer or
  • 30. TiDB-Binlog TiDB Server TiDB Server Sorter Pumper Pumper TiDB Server Pumper Protobuf MySQL Binlog MySQL 3rd party applicationsCistern ● Subscribe the incremental data from TiDB ● Output Protobuf formatted data or MySQL Binlog format(WIP) Another TiDB-Cluster
  • 31. TiSpark TiKV TiKV TiKV TiKV TiKV TiDB TiDB TiDB TiDB + SparkSQL = TiSpark Spark Master TiKV Connector Data Storage & Coprocessor PD Spark Exec TiKV Connector Spark Exec TiKV Connector Spark Exec
  • 32. TiSpark ● TiKV Connector is better than JDBC connector ● Index support ● Complex Calculation Pushdown ● CBO ○ Pick up right Access Path ○ Join Reorder ● Priority & Isolation Level
  • 33. Too Abstract? Let’s get concrete. TiKV CoprocessorSpark SQL Plan PushDown Plan SQL: Select sum(score) from t1 group by class where school = “engineering”; Pushdown Plan: Sum(score), Group by class, Table:t1 Filter: School = “engineering”
  • 34. Use Case Use Case MySQL Spark TiDB TiSpark Large-Aggregat es Slow or impossible if beyond scale Well supported Supported Well supported Large-joins Slow or impossible if beyond scale Well supported Supported Well supported Point Query Fast Very slow on HDFS Fast Fast Modification Supported Not possible on HDFS Supported Supported
  • 35. Benefit ● Analytical / Transactional support all on one platform ○ No need for ETL ○ Real-time query with Spark ○ Possibility for get rid of Hadoop ● Embrace Spark echo-system ○ Support of complex transformation and analytics with Scala / Python and R ○ Machine Learning Libraries ○ Spark Streaming
  • 36. Current Status ● Phase 1: (will be released with GA) ○ Aggregates pushdown ○ Type System ○ Filter Pushdown and Access Path selection ● Phase 2: (EOY) ○ Join Reorder ○ Write
  • 38. Roadmap ● TiSpark: Integrate TiKV with SparkSQL ● Better optimizer (Statistic && CBO) ● Json type and document store for TiDB ○ MySQL 5.7.12+ X-Plugin ● Integrate with Kubernetes ○ Operator by CoreOS