ScaleDB MySQL Storage Engine
Enabling high performance and scalability, using a Multi-Table Index, and a Shared-Disk Clustering Architecture
by Moshe Shadmon moshe@scaledb.com
1. The ScaleDB Storage Engine
Enabling high performance and scalability, using a Multi-Table Index and a Shared-Disk Clustering Architecture
Moshe Shadmon moshe@scaledb.com
2. Agenda
Overview
ScaleDB’s Clustering Architecture
o Shared-Disk vs. Shared-Nothing
o MySQL and a Shared-Disk Storage Engine
o ScaleDB Installation
o Demo
ScaleDB’s Indexing Technology
o Multi-Table Index
o Enabling Multi-Table Index in MySQL
o Demo
Summary
ScaleDB Status & Product Availability
3. Overview
Plug-in Storage Engine for MySQL
Main Features:
o Shared-Disk Architecture
o Innovative Multi-Table Indexing
o Transactional
o Row-Level Locking
o ACID Compliant
o Atomicity: All tasks of a transaction are performed, or none of them are.
o Consistency: The database is in a consistent state before and after the transaction.
o Isolation: Data is not visible to other transactions in an intermediate state during a transaction.
o Durability: When a transaction completes, the transaction’s data will persist.
o Disk-Based Storage Engine
4. Shared-Disk vs. Shared-Nothing
Manageability
Adaptability
Availability/Fault-Tolerance
Scalability
Performance
Total Cost of Ownership (TCO)
5. Shared-Nothing: Vertical Partitioning
[Diagram: a single Database Instance 1 holding Table A, Table B, and Table C, contrasted with Database Instances 1, 2, and 3, across which Table A, Table B, and Table C are spread.]
6. Shared Nothing:
Partitioning Your Data… How?
Predict usage patterns, application evolution, data growth patterns… all are moving targets
Avoid data skew: bottlenecks caused by frequently accessed data on just a few nodes
Avoid data shipping between nodes
Avoid delays from distributed 2-phase commit
Searches outside the partition column require participation by all nodes
Scaling becomes an exercise in fire fighting
7. Shared-Nothing:
Horizontal Partitioning – Salary % 3
Logical View
name    age  salary
Bob     20   10K
Shideh  18   35K
Ted     50   60K
Kevin   62   120K
Angela  55   140K
Mike    45   90K
Physical View (partitioned by Salary % 3)
Partition with Salary % 3 = 0
name    age  salary
Ted     50   60K
Kevin   62   120K
Mike    45   90K
Partition with Salary % 3 = 1
name    age  salary
Bob     20   10K
Partition with Salary % 3 = 2
name    age  salary
Shideh  18   35K
Angela  55   140K
8. Shared-Nothing:
Horizontal Partitioning Pitfalls
Selections with equality predicates referencing the partitioning attribute are directed to a single node:
o Retrieve Emp where salary = 60K
SELECT * FROM Emp WHERE salary = 60000
Equality predicates referencing a non-partitioning attribute, and range predicates, are directed to all nodes:
o Retrieve Emp where age = 20
o Retrieve Emp where salary < 20K
SELECT * FROM Emp WHERE salary < 20000
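The routing rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-node count follows the Salary % 3 example, salaries are stored in thousands, and the helper names node_for and nodes_to_visit are made up for this sketch rather than taken from any product.

```python
NUM_NODES = 3  # assumption: three nodes, matching the "Salary % 3" slide

# Logical Emp table from the slides: (name, age, salary in thousands).
EMP = [
    ("Bob", 20, 10), ("Shideh", 18, 35), ("Ted", 50, 60),
    ("Kevin", 62, 120), ("Angela", 55, 140), ("Mike", 45, 90),
]

def node_for(salary: int) -> int:
    """Partitioning function from the slide: salary % 3."""
    return salary % NUM_NODES

# Physical view: each node stores only the rows that map to it.
partitions = {n: [] for n in range(NUM_NODES)}
for row in EMP:
    partitions[node_for(row[2])].append(row)

def nodes_to_visit(column: str, op: str, value: int) -> list:
    """Return the nodes that must take part in evaluating a predicate."""
    if column == "salary" and op == "=":
        # Equality on the partitioning attribute: exactly one node.
        return [node_for(value)]
    # Non-partitioning attribute or range predicate: every node.
    return list(range(NUM_NODES))

print(nodes_to_visit("salary", "=", 60))  # single node
print(nodes_to_visit("age", "=", 20))     # all nodes
print(nodes_to_visit("salary", "<", 20))  # all nodes
```

Only the first predicate can be answered by one node; the other two fan out to the whole cluster, which is the pitfall the slide describes.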
9. Shared-Disk:
No Partitioning, Full Access to Data
[Diagram: DB Cluster Nodes 1, 2, and 3 connected through a High-Speed Interconnect to one Shared Disk Subsystem that holds Table A, Table B, and Table C.]
[Diagram: Database Instance 1 with Table A, Table B, and Table C.]
11. Scalability & Availability
[Diagram: Nodes A, B, C, D, and E, shown as MySQL servers with the ScaleDB engine, all attached to one shared disk that holds the data.]
12. Shared-Disk:
Summarizing Shared-Disk Benefits
Grow by simply adding nodes to the cluster
o Servers can be added and removed dynamically according to your needs
o No interruption to your application
High-Availability with dynamic failover
o Existing nodes automatically take over
Significantly reduced maintenance costs
o Can be built on low-cost commodity hardware
o No data partitioning
o No need for slaves
Low Total Cost of Ownership (TCO)
13. Shared-Disk:
Making it work with MySQL
[Diagram: Node 1 and Node 2. Each node runs a Server instance (A and B) on top of a ScaleDB Engine instance (A and B) that contains a Cluster Manager, a Buffer Manager, and a Comm. Layer. The nodes are joined by the Cluster Interconnect and attach to the same Shared Disk Sub-system.]
14. Shared-Disk: Insert New Row
[Diagram: Node 1 (Server Instance A, ScaleDB Engine Instance A) and Node 2 (Server Instance B, ScaleDB Engine Instance B), each engine with a Cluster Manager, a Buffer Manager, and a Comm. Layer, joined by the Cluster Interconnect and attached to the Shared Disk Sub-system.]
15. Shared-Disk: Select
[Diagram: Node 1 (Server Instance A, ScaleDB Engine Instance A) and Node 2 (Server Instance B, ScaleDB Engine Instance B), each engine with a Cluster Manager, a Buffer Manager, and a Comm. Layer, joined by the Cluster Interconnect and attached to the Shared Disk Sub-system.]
16. Shared-Disk: Create Table
[Diagram: Node 1 (Server Instance A, ScaleDB Engine Instance A) and Node 2 (Server Instance B, ScaleDB Engine Instance B), each engine with a Cluster Manager, a Buffer Manager, and a Comm. Layer, joined by the Cluster Interconnect and attached to the Shared Disk Sub-system. The diagram additionally shows Table A and its Meta-Data.]
17. ScaleDB Installation
Set cluster = true in the ScaleDB config file.
ScaleDB.cnf is in the same directory as my.cnf.
Cluster params:
o cluster = true
o nodes_in_cluster = 2
o node_id = 1
o this_machine_port = 100
o next_machine_ip_address = 192.168.0.101
o next_machine_port = 100
o log_directory = /share/logs/
18. Demo - Sysbench
ScaleDB cluster – one node – show throughput
ScaleDB cluster – 2nd node – show throughput
19. ScaleDB: Multi-Table Indexing
B-tree: Only indexes the data in tables
[Diagram: tables #1 to #5, each with its own separate index (Index #1 to Index #5).]
ScaleDB: Indexes the data and relationships
[Diagram: a single ScaleDB index spanning tables #1 to #5.]
Advantages:
• Faster
• Smaller
• Referential integrity
20. Example
Scenario: Select information that is spread across 3 tables: Colleges, Students and Enrollment
Relationships: Students are enrolled in courses within departments of colleges
SELECT c1.CollName, s.StudName, c2.CourseName , e.Grade
FROM College AS c1
JOIN Student AS s
JOIN Enrollment AS e
JOIN Course AS c2
ON ( c1.CollNo = s.CollNo AND
s.CollNo = e.CollNo AND
s.StudentNo = e.StudentNo AND
e.CollNo = c2.CollNo AND
e.DeptNo = c2.DeptNo AND
e.CourseNum = c2.CourseNum )
WHERE c1.CollNo = X
AND s.StudentNo = Y ;
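To make the example concrete, the sketch below runs an equivalent join in an in-memory SQLite database from Python. Table and column names are taken from the query above; the column types, the sample rows (including the course names), and the values bound to X and Y are assumptions made for illustration. The single trailing ON clause of the slide is split into one ON clause per join, which expresses the same conditions in a more portable form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE College    (CollNo INTEGER PRIMARY KEY, CollName TEXT);
CREATE TABLE Student    (CollNo INTEGER, StudentNo INTEGER, StudName TEXT,
                         PRIMARY KEY (CollNo, StudentNo));
CREATE TABLE Course     (CollNo INTEGER, DeptNo INTEGER, CourseNum TEXT,
                         CourseName TEXT,
                         PRIMARY KEY (CollNo, DeptNo, CourseNum));
CREATE TABLE Enrollment (CollNo INTEGER, StudentNo INTEGER, DeptNo INTEGER,
                         CourseNum TEXT, Grade TEXT);

-- Hypothetical sample rows, for illustration only.
INSERT INTO College    VALUES (8, 'Medicine'), (5, 'Engineering');
INSERT INTO Student    VALUES (8, 8037, 'Saul Goode'), (8, 8033, 'Mike Hogan');
INSERT INTO Course     VALUES (8, 1, '4455', 'Anatomy'), (8, 1, '4456', 'Physiology');
INSERT INTO Enrollment VALUES (8, 8037, 1, '4455', 'B+'), (8, 8033, 1, '4455', 'C');
""")

QUERY = """
SELECT c1.CollName, s.StudName, c2.CourseName, e.Grade
FROM College AS c1
JOIN Student    AS s  ON c1.CollNo = s.CollNo
JOIN Enrollment AS e  ON s.CollNo = e.CollNo AND s.StudentNo = e.StudentNo
JOIN Course     AS c2 ON e.CollNo = c2.CollNo AND e.DeptNo = c2.DeptNo
                     AND e.CourseNum = c2.CourseNum
WHERE c1.CollNo = ? AND s.StudentNo = ?
"""

# X = 8 and Y = 8037 are placeholder values for the slide's X and Y.
rows = conn.execute(QUERY, (8, 8037)).fetchall()
print(rows)
```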
21. Option #1: Conventional Joins
College Table
ID   College                  Students
234  Institute of Technology  1,334
167  High Tech Institute      5,742
85   Golden State College     2,119
298  Kaplan College           12,323
510  California College       1,926
Students Table
ID    Student Name     SS#          Phone
1220  Bruce Chizen     422-72-8495  (650) 234-2234
6778  Naomi Seligman   533-99-1234  (279) 331-2345
4435  Raymond Bingham
8872  Reed Hastings    412-44-5567  (312) 676-8812
1129  Maria Klawe
1123  Bernard Vergnes
Enrollment Table
College  ID    Course Name    Student  Grade
510      C67   Mathematics    4435     87
167      C123  History 1      1129     70
167      C14   Photography 1  1120     88
Get College information
Get Student information
Search enrollment by College & Student
22. Option #2: Materialized View
ID College Students ID Course Name ID Student Name
234 Institute of Technology 1,334 C134 Mathematics 1145 John Cheechoo …
234 Institute of Technology 1,334 C134 Mathematics 1837 Ryane Clowe …
234 Institute of Technology 1,334 C134 Mathematics 2256 Patrick Marleau …
234 Institute of Technology 1,334 C134 Mathematics 2277 Jamie McGinn …
234 Institute of Technology 1,334 C134 Mathematics 4113 Torrey Mitchell …
. . .
234 Institute of Technology 1,334 C134 Mathematics 1145 …
385 Golden State College 2,224 G85 World History 7783 Joe Pavelski …
385 Golden State College 2,224 G85 World History 2234 Jeremy Roenick …
385 Golden State College 2,224 G85 World History 1177 Devin Setoguchi …
385 Golden State College 2,224 G85 World History 4113 Torrey Mitchell …
23. Option #3: Multi-Table Index
Colleges
Coll_ID#  Coll_Name     Coll_Budget  Coll_Description
001       Agriculture   $1,234,567   Nice place to visit
002       Arts          $5,432,567   Sports not so good
003       Business      $9,999,666   Cool logo
004       Education     $3,234,567   Ugh Worcester
005       Engineering   $8,238,568   Serious work
006       Law           $7,237,767   Jumpy students
007       Liberal Arts  $9,898,777   Pretty campus
008       Medicine      $5,987,004   In Texas
Students
Student_ID#  College_ID#  Student_Name   Student_Desc
56-8033      008          Mike Hogan     Caucasian
56-8045      008          Moshe Smith    Caucasian
56-8044      008          Sally Shadmon  Native American
56-8055      008          Billy Fleegle  African American
56-8037      008          Saul Goode     African American
56-8122      008          Tim Collins    Polynesian
56-8233      008          Sam Gee        Asian
56-8334      008          Rod Paulino    Asian
Enrollment
College_ID#  Dept_ID#  Student_ID#  Grade
008          4455      56-8037      B+
008          4455      56-8033      C
008          4455      56-8045      B+
008          4456      56-8044      A-
008          4456      56-8122      B-
008          4454      56-8233      C
008          4455      56-8334      F
008          4454      56-8055      D
[Diagram: the ScaleDB Multi-Table Index linking College, Students, Enrollment, Departments, and Courses.]
24. Mapping Foreign Keys to Data Views
Create Students Table
o Foreign key – College
Create Enrollment Table
o Foreign key – Students
Create Course Table
o Foreign key – Department
Create Department Table
o Foreign key – College
Create College Table
The parent-child tables are created in MySQL so that MySQL is able to operate over the new tables
The data of the parent-child tables is assembled on the fly from the source tables
25. Mapping Foreign Keys to Data Views
[Diagram: the parent-child paths ScaleDB derives from the foreign keys: College, College-Department, College-Department-Course, College-Students, and College-Students-Enrollment.]
Physical files:
1. College
2. Department
3. Student
4. Course
5. Enrollment
Meta-Data Tables:
1. College
2. College-Dept
3. College-Dept-Course
4. College-Students
5. College-Students-Enrollment
6. Department
7. Students
8. Course
9. Enrollment
26. Enabling the MySQL optimizer to use a Multi-Table Index
SELECT c1.CollName, s.StudName,
c2.CourseName , e.Grade
FROM College AS c1
JOIN Student AS s
JOIN Enrollment AS e
JOIN Course AS c2
ON ( c1.CollNo = s.CollNo AND
s.CollNo = e.CollNo AND
s.StudentNo = e.StudentNo AND
e.CollNo = c2.CollNo AND
e.DeptNo = c2.DeptNo AND
e.CourseNum = c2.CourseNum )
WHERE c1.CollNo = X
AND s.StudentNo = Y ;
CREATE TABLE sdb_view_college_course_student (
L1_CollNo INT NOT NULL,
L1_CollName CHAR(32) NOT NULL,
L1_CollBudget INT NOT NULL,
L1_CollDescription CHAR(60) NOT NULL,
… Table College Columns
L2_StudNo INT NOT NULL,
L2_StudName CHAR(48) NOT NULL,
… Table Student Columns
L3_CourseNum CHAR(9) NOT NULL,
L3_Grade CHAR(2) NOT NULL,
… Table Enrollment Columns
PRIMARY KEY ( L1_CollNo, L2_StudNo,
L3_CourseNum))
ENGINE = SCALEDB;
SELECT L1_CollName, L2_StudName, L3_CourseName, L3_Grade
FROM sdb_view_college_course_student
WHERE L1_CollNo = X AND L2_StudNo = Y ;
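The effect of replacing the join with a single-table query can be imitated with an ordinary SQL view. The sketch below is a stand-in under stated assumptions, not ScaleDB's mechanism: SQLite evaluates the view as a regular join, whereas the slides describe ScaleDB assembling the rows from its Multi-Table Index. The L1_, L2_, and L3_ column names follow the CREATE TABLE statement above; the sample rows, the reduced three-table schema, and the bound values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE College    (CollNo INTEGER PRIMARY KEY, CollName TEXT);
CREATE TABLE Student    (CollNo INTEGER, StudentNo INTEGER, StudName TEXT);
CREATE TABLE Enrollment (CollNo INTEGER, StudentNo INTEGER,
                         CourseNum TEXT, Grade TEXT);

-- Hypothetical sample rows, for illustration only.
INSERT INTO College    VALUES (8, 'Medicine');
INSERT INTO Student    VALUES (8, 8037, 'Saul Goode'), (8, 8033, 'Mike Hogan');
INSERT INTO Enrollment VALUES (8, 8037, '4455', 'B+'), (8, 8033, '4455', 'C');

-- Stand-in only: a plain SQL view that SQLite evaluates as a join.
CREATE VIEW sdb_view_college_course_student AS
SELECT c.CollNo    AS L1_CollNo,    c.CollName AS L1_CollName,
       s.StudentNo AS L2_StudNo,    s.StudName AS L2_StudName,
       e.CourseNum AS L3_CourseNum, e.Grade    AS L3_Grade
FROM College c
JOIN Student s    ON s.CollNo = c.CollNo
JOIN Enrollment e ON e.CollNo = s.CollNo AND e.StudentNo = s.StudentNo;
""")

# The explicit join over the source tables.
join_rows = conn.execute("""
    SELECT c.CollName, s.StudName, e.CourseNum, e.Grade
    FROM College c
    JOIN Student s    ON s.CollNo = c.CollNo
    JOIN Enrollment e ON e.CollNo = s.CollNo AND e.StudentNo = s.StudentNo
    WHERE c.CollNo = ? AND s.StudentNo = ?""", (8, 8037)).fetchall()

# The simpler single-table query over the flattened view.
view_rows = conn.execute("""
    SELECT L1_CollName, L2_StudName, L3_CourseNum, L3_Grade
    FROM sdb_view_college_course_student
    WHERE L1_CollNo = ? AND L2_StudNo = ?""", (8, 8037)).fetchall()

print(join_rows == view_rows, view_rows)
```

Both queries return the same rows; the second one simply has no join left for the application or the optimizer to express.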
27. The Multi-Table Index
The Multi-Table Index appears to MySQL as a data table
ScaleDB does not maintain a data file associated with the Multi-Table Index
For a query using a virtual table, ScaleDB assembles the rows on the fly using the Multi-Table Index
ScaleDB indexes are different from B-tree indexes
ScaleDB indexes provide the same functionality as B-tree, plus…
o They maintain referential integrity with minimal overhead
o They allow you to search for the data and relationships
o They are much smaller in size
28. Demo
Query with join
Query with Multi-Table Index
2nd node virtual table
30. Summary
ScaleDB Cluster
o Multiple ScaleDB instances share the same physical data.
o Connecting to the cluster is similar to connecting to a single node.
o For the application, the cluster appears as a single node.
o Transparent application failover
o Transparent Scalability
ScaleDB Indexes
o Provide the B-tree functionality
o High performance
Map relationships
Maintain referential integrity
Smaller footprint
Independent of the key size
31. ScaleDB Status and Product Availability
Started Beta Process
o We are looking for beta companies
Product launch is scheduled for June timeframe
Please talk to us if you are a developer interested in working with ScaleDB
moshe@scaledb.com
Editor's Notes
In this presentation I will discuss two main features of the ScaleDB storage engine:
ScaleDB is a shared-disk storage engine, which means that multiple servers share the same physical database.
ScaleDB implements a unique indexing method. Most database products use B-trees to index their data; ScaleDB uses an innovative indexing method that is based on a trie structure.
In this talk I will explain and demonstrate both technologies.
ScaleDB is a general purpose storage engine that supports the MySQL storage engine API.
It is oriented to support large, disk-based data files with high performance and scalability.
In particular, it enables multiple MySQL server instances to share the same physical database, and it provides innovative high-performance indexing.
With shared-nothing, the data must be partitioned with the following goals in mind:
Usage pattern – balance the distribution of calls across the nodes.
Single-node queries – distribute the data so that each query can be satisfied on a single node.
Data growth – keep the growth of data evenly distributed across the nodes.
With shared-nothing, assuming an even distribution of the data and the users across the nodes in the cluster, some queries can be satisfied well.
However, these are only the queries that align with the way the data is partitioned. With horizontal partitioning, queries on non-partitioning attributes and range queries are executed on all nodes.
With a shared disk, growth in data size and in the number of users is addressed by adding nodes to the cluster. All nodes see a unified view of the data, and each node can satisfy all queries. With this architecture, there is no need to partition the data, and the distribution of users across nodes depends only on the availability of the nodes.
With shared-nothing, if the information resides on a single node, execution may be efficient. However, the information may reside on multiple nodes, in which case the processing node needs to communicate with one or more additional nodes.
One main question is how MySQL can take advantage of a shared-disk architecture, given that MySQL is not designed for one.
The synchronization is done in the engine layer. Every MySQL server operates as a single server, and the ScaleDB engine synchronizes the processes among the different nodes.
Most database products use a B-tree as their main index structure. ScaleDB uses a proprietary multi-table index. The main difference is that a B-tree indexes only the data: we can search by a key such as the customer id to find the customer row. The ScaleDB index allows searching for the data as well as the relationships. For example, we can search for the customer row by the customer id, and the search can then continue along the same index path to the invoices of that customer. With a B-tree, we would need to initiate a new search, using a different index, for the invoices relating to the customer.
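The customer and invoice example can be pictured with a toy structure. The sketch below is not ScaleDB's index layout, which these notes do not describe beyond being trie-based; it is a nested Python dictionary, with made-up rows and names, that only illustrates the idea of one index path leading from a parent row to its child rows.

```python
# Hypothetical data tables; list positions play the role of row pointers.
customers = [("C1", "Acme"), ("C2", "Globex")]
invoices = [("I10", "C1", 250), ("I11", "C2", 90), ("I12", "C1", 40)]

# Toy multi-table index: customer id -> customer row position,
# plus a child level keyed by invoice id -> invoice row position.
index = {}
for pos, (cust_id, _name) in enumerate(customers):
    index[cust_id] = {"row": pos, "invoices": {}}
for pos, (inv_id, cust_id, _amount) in enumerate(invoices):
    # Raises KeyError if the parent customer does not exist, so the
    # parent-child relationship is checked as a side effect of indexing.
    index[cust_id]["invoices"][inv_id] = pos

def customer_with_invoices(cust_id):
    """One lookup: find the customer, then continue along the same path."""
    entry = index[cust_id]
    cust_row = customers[entry["row"]]
    inv_rows = [invoices[p] for p in entry["invoices"].values()]
    return cust_row, inv_rows

print(customer_with_invoices("C1"))
```

With two independent single-table indexes, the second half of the lookup would start again from the root of a separate invoice index keyed by customer id.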
Let's consider an example where rows from 3 tables are joined to satisfy a particular query. These are potential methods to execute the query:
Conventional joins step sequentially through the applicable tables, building the query result with each step.
The second method is to build a materialized view. A materialized view works very efficiently, as there is no need to join the data, but it has the following drawbacks:
1. The data needs to be duplicated.
2. The data needs to be constantly synchronized with the source tables. For example, if we change a student's address, it needs to be modified not only in the Students table but also in the view data.
3. There are no MySQL engines supporting materialized views.
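The synchronization drawback is easy to reproduce. The sketch below uses SQLite from Python and a hand-rolled materialized view, that is, a physical copy of a query result; the table name mv_student, the Address column, and the sample row are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentNo INTEGER PRIMARY KEY, StudName TEXT, Address TEXT);
INSERT INTO Student VALUES (8037, 'Saul Goode', '1 Old Road');

-- Hand-rolled "materialized view": a physical copy of the query result.
CREATE TABLE mv_student AS SELECT StudentNo, StudName, Address FROM Student;
""")

# Change the address in the source table only.
conn.execute("UPDATE Student SET Address = '2 New Street' WHERE StudentNo = 8037")

source_addr = conn.execute("SELECT Address FROM Student").fetchone()[0]
stale_addr = conn.execute("SELECT Address FROM mv_student").fetchone()[0]

# The refresh has to be done explicitly (by the application or a trigger).
conn.executescript("""
DELETE FROM mv_student;
INSERT INTO mv_student SELECT StudentNo, StudName, Address FROM Student;
""")
fresh_addr = conn.execute("SELECT Address FROM mv_student").fetchone()[0]

print(source_addr, "|", stale_addr, "|", fresh_addr)
```

Until the refresh runs, the copy still returns the old address, and the copy itself doubles the storage for those columns.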
The third approach is the ScaleDB Multi-Table Index, which materializes the views without duplicating the data. In addition, there is no need to synchronize the view with the source data, as the view is constructed on the fly from the source data.
In this example, the Multi-Table Index maintains a path that ends with pointers to the needed rows in the College, Students and Enrollment tables. It obviates the need to physically materialize the view.
Two important features relating to the creation of a Multi-Table Index are:
1. The index is created transparently from the SQL Primary-Key and Foreign Key definitions.
2. There are no supporting data tables. The index creates the tables for MySQL on the fly.
With the ScaleDB index, a query that does joins can be replaced by a simpler query over an “sdb_view” table. Calls to an sdb_view table are executed by the engine more efficiently than the equivalent join.