2. Agenda
□ Background & Goal
□ CUBRID Cluster basic concept
□ CUBRID Cluster general design
□ Result & Status of each milestone
□ Demo
□ Performance results
□ Pros and cons
□ Next version plan
4. Background & Goal
□ Background
In Internet portal services, the volume of service data grows very fast and is rarely deleted
(such as the Café service)
How can we scale out the DB system without modifying applications?
• Get big-system power from cheap commodity servers – clustering or grid computing
□ Goal
Support Dynamic Scalability
Location transparency to the applications
Volume size & Performance
• At the same performance, the cluster can store more data
• At the same data size, the cluster can provide higher performance
Others
• Global Schema, Distributed Partition, Load Balancing
• Cluster management node, heartbeat
5. Background & Goal (cont.)
□ As-Is
DB system architecture is coded into the application's logic
The application's logic decides which SQL goes to which DB server
(e.g. DB1/UPDATE tbl01, DB1/SELECT tbl01, DB3/UPDATE tbl35, DB4/SELECT tbl47)
□ To-Be
Provides a "single DB view" / "multi access point"
DB system scales out independently of applications (linear scalability)
Applications issue plain SQL (UPDATE tbl01, SELECT tbl01, SELECT tbl35, SELECT tbl47, UPDATE tbl35)
[Diagram: a Global Schema with Distributed Partition over DB1–DB4, plus an HA DB with master (M, RW) and slave (S, RO) nodes marked "To do"]
7. CUBRID Cluster basic concept
Basic Features:
• Global schema
• Global database
• Distributed partition
• Global transaction
• Dynamic scalability
• Global serial & global index
Advanced Features (To do):
• Support HA
• Cluster management node
• Deadlock detection
8. CUBRID Cluster basic concept – Global schema
The global schema is a single representation, or global view, of all nodes, where each node has its own database and schema.
SELECT * FROM info, code WHERE info.id = code.id
SELECT * FROM contents WHERE auth = (SELECT name FROM author WHERE …)
INSERT INTO contents…
[Diagram: a Global Schema (contents, info, author) spanning Local Schema #1–#4 on Database #1–#4; global tables (contents, author) appear on every node, while local-level tables (info, code) belong to individual nodes]
9. CUBRID Cluster basic concept – Global database
A global database is a logical concept that represents the database managed by the CUBRID Cluster system.
[Diagram: logical view vs. physical view. Global DB A is one logical database whose volumes ⑴⑵⑶ physically live in DB A instances on Node #1 and Node #2; Global DB C is one logical database whose volumes ⒜⒝ physically live in DB C instances on Node #2 and Node #3; local databases DB B and DB D coexist on the same nodes]
10. Distributed partition concept
[Diagram: under the Global Schema, a distributed partition table has a single logical view (schema, data, system catalog, index); physically, DB1 ON NODE #1 and DB1 ON NODE #2 each hold the full schema, system catalog, and index, while the data itself is split between the two nodes]
11. CUBRID Cluster basic concept – others
□ Global Transaction
A global transaction is divided into several local transactions that run on different server nodes.
The global transaction makes sure that every server node in CUBRID Cluster is consistent both before and after the transaction.
The processing of a global transaction is transparent to the application.
□ Dynamic Scalability
Dynamic scalability allows the user to extend or shrink the server nodes in CUBRID Cluster without stopping the cluster.
After a new server node is added to the cluster, users can access and query global tables from the new node.
12. CUBRID Cluster basic concept – User specs
□ Registering Local DB into Global DB (Cluster)
REGISTER NODE 'node1' '10.34.64.64';
REGISTER NODE 'node2' 'out-dev7';
□ Creating Global Table/Global Partition table
CREATE GLOBAL TABLE gt1 (…) ON NODE 'node1';
CREATE GLOBAL TABLE gt2 (id INT PRIMARY KEY, …) PARTITION BY HASH (id) PARTITIONS 2 ON NODE 'node1', 'node2';
□ DML operations (INSERT/SELECT/DELETE/UPDATE)
□ Dynamic Scalability
-- add a new server node in global database
REGISTER 'node3' '10.34.64.66';
-- adjust data to new server node
ALTER GLOBAL TABLE gt2 ADD PARTITION PARTITIONS 1 ON NODE 'node3';
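For reference, a minimal sketch of driving these statements from an application over JDBC. The driver class and URL format follow the usual CUBRID JDBC conventions, but the host, port, database name, and credentials below are placeholders, not values from this deck:

    // Sketch: running the cluster user specs through a plain JDBC connection.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class ClusterSetup {
        public static void main(String[] args) throws Exception {
            Class.forName("cubrid.jdbc.driver.CUBRIDDriver");
            // Connect to the broker on node1 (jdbc:cubrid:host:port:db:user:password:)
            Connection con = DriverManager.getConnection(
                    "jdbc:cubrid:10.34.64.64:33000:db1:dba::", "dba", "");
            Statement st = con.createStatement();
            // Register the participating nodes into the global database
            st.executeUpdate("REGISTER NODE 'node1' '10.34.64.64'");
            st.executeUpdate("REGISTER NODE 'node2' 'out-dev7'");
            // Create a hash-partitioned global table over the two nodes
            st.executeUpdate("CREATE GLOBAL TABLE gt2 (id INT PRIMARY KEY, name VARCHAR(100))"
                    + " PARTITION BY HASH (id) PARTITIONS 2 ON NODE 'node1', 'node2'");
            // Ordinary DML works unchanged against the global table
            st.executeUpdate("INSERT INTO gt2 VALUES (1, 'cafe')");
            st.close();
            con.close();
        }
    }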
14. CUBRID Cluster general design (DDL/INSERT)
CREATE GLOBAL TABLE gt1 … PARTITION BY HASH … ON NODE 'Server1', 'Server2', 'Server3', 'Server4';
INSERT INTO gt1 …
[Diagram: APs connect through the extension broker, whose workspace stores remote OIDs; C2S communication applies the global schema and distributed partition across DB1 on Server #1–#4, which together form Global DB1]
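Each inserted row is routed to a server by hashing the partition key. The actual hash function and partition-to-node map are internal to CUBRID Cluster; the sketch below only illustrates the idea of hash partitions spread round-robin over nodes:

    // Illustrative only: route a row to a server by hashing the partition key.
    public class PartitionRouter {
        private final String[] nodes;  // e.g. {"Server1", "Server2", "Server3", "Server4"}
        private final int partitions;  // e.g. 256

        public PartitionRouter(String[] nodes, int partitions) {
            this.nodes = nodes;
            this.partitions = partitions;
        }

        public String nodeFor(int partitionKey) {
            // Pick a partition from the key, then spread partitions over nodes
            int p = Math.floorMod(Integer.hashCode(partitionKey), partitions);
            return nodes[p % nodes.length];
        }
    }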
15. CUBRID Cluster general design (SELECT/DELETE)
SELECT … FROM gt1 WHERE …
UPDATE …
DELETE …
[Diagram: APs connect through the broker; statements are pushed down to DB1 on Server #1–#4 by remote execution, and rows are read back by remote scan over S2S communication]
16. CUBRID Cluster general design (COMMIT)
INSERT INTO gt1 …
SELECT … FROM …
COMMIT
[Diagram: APs issue statements through the broker; a global index entry (e.g. 0x40430000) maps each index key (0, 2, 3, 5, 1) to a local entry (2, 3, 5, 1) on Server1–Server4; at COMMIT the broker acts as coordinator and runs two-phase commit against the participant servers DB1 at 10.34.64.64–10.34.64.67]
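For clarity, a sketch of what the coordinator does during two-phase commit. The Participant interface is hypothetical; the real protocol runs between the CUBRID server processes, not in application code:

    // Sketch of the coordinator side of 2PC over a hypothetical Participant interface.
    import java.util.List;

    interface Participant {
        boolean prepare();  // phase 1: vote to commit (forces a prepare log record)
        void commit();      // phase 2: make the local transaction durable
        void rollback();
    }

    class Coordinator {
        boolean commitAll(List<Participant> participants) {
            for (Participant p : participants) {           // phase 1: collect votes
                if (!p.prepare()) {
                    for (Participant q : participants) q.rollback();
                    return false;                          // any "no" vote aborts everyone
                }
            }
            for (Participant p : participants) p.commit(); // phase 2: all voted yes
            return true;
        }
    }

Each prepare and commit is forced to the log, which is why 2PC costs extra log I/O (see the appendix).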
17. CUBRID Cluster general design (dynamic scale-out)
CREATE GLOBAL TABLE gt2 … PARTITION BY HASH ON NODE 'Server1', 'Server2', 'Server3';
REGISTER 'Server4' '10.34.64.67';
ALTER GLOBAL TABLE gt2 ADD PARTITION … ON NODE 'Server4';
[Diagram: after the new node is registered, the brokers sync up the global schema and the existing partitions rehash their data onto DB1 at Server #4]
18. CUBRID Cluster general design (ORDER BY-Ongoing)
SELECT … FROM gt1 ORDER BY …
Step 1: send the remote query, including the ORDER BY, to every server
Step 2: each of DB1 on Server #1–#4 scans and sorts its local rows
Step 3: merge the sorted results from each server (see the sketch below)
[Diagram: AP → broker → parallel scan and sort on Server #1–#4 → merged result]
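Step 3 is a k-way merge: because each server returns its rows already sorted, the merging side only ever compares the current head row of each stream. A minimal sketch with integer sort keys (not CUBRID's actual implementation):

    // Sketch of step 3: k-way merge of already-sorted per-server results.
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    class OrderByMerger {
        static List<Integer> merge(List<List<Integer>> sortedPerServer) {
            // heap entries are {headValue, serverIndex}
            PriorityQueue<int[]> heap =
                    new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
            List<Iterator<Integer>> its = new ArrayList<>();
            for (List<Integer> s : sortedPerServer) {
                Iterator<Integer> it = s.iterator();
                its.add(it);
                if (it.hasNext()) heap.add(new int[]{it.next(), its.size() - 1});
            }
            List<Integer> out = new ArrayList<>();
            while (!heap.isEmpty()) {
                int[] top = heap.poll();               // smallest head among all servers
                out.add(top[0]);
                Iterator<Integer> it = its.get(top[1]);
                if (it.hasNext()) heap.add(new int[]{it.next(), top[1]});
            }
            return out;
        }
    }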
19. The result & status of each milestone
20. CUBRID Cluster Project Overview
□ Team Composition & Roles
Service Platform and Development Center, NHN Korea
• Architect: Park Kiun (Architect/SW)
Platform Development Lab, NHN China
• Project Manager: Baek Jeonghan (Director) / Li Chenglong (Team Leader)
• Dev leader: Li Chenglong (Team Leader) / Wang Dong (Part Leader)
□ Project Duration
May, 2010 ~ Oct, 2011
□ Quality requirements
Passed all CUBRID regression test cases
Passed all CUBRID Cluster QA and dev function test cases
Passed QA performance test cases
□ Others:
Code based on CUBRID 8.3.0.0337 (release version)
21. The result & status of each milestone – Overview
[Timeline: May 2010 – Dec 2011, covering M1 Global Schema, M2 Distributed partition, M3 Performance, M4 nReport, then the next version]
M1: May 24th, 2010 – Oct 20th, 2010
M2: Oct 21st, 2010 – Mar 25th, 2011
M3: Mar 28th, 2011 – Jul 17th, 2011
M4 (ongoing): Jul 18th, 2011 – Oct 30th, 2011
22. The result & status of each milestone – M1
□ Achievements:
Open-sourced on sf.net (including code, wiki, BTS, forum)
General design for CUBRID Cluster
Implemented the global database
Implemented the system catalog extension and global table DDL
Supported basic DML statements (INSERT/SELECT/DELETE/UPDATE) for global tables
Supported S2S communication (server-to-server data transfer)
□ Others:
Source lines of code (LOC): 19,246 (added 11,358, deleted 817, modified 7,071)
Added Chinese messages (LOC): 7,507
BTS issues: 178
Subversion check-ins: 387
23. The result & status of each milestone – M2
□ Achievements:
Implemented the hash-based distributed partition table (basic DDL and DML)
Supported constraints (global index, primary key, unique) and queries using indexes
Supported global serials
Supported global transactions (commit, rollback)
Refactored S2S communication (added an S2S communication interface and connection pooling)
Supported all SQL statements used by the Café service
Passed QA functional testing
□ Others:
Source lines of code (LOC): 20,242 (added 8,670, deleted 4,385, modified 7,187)
BTS issues: 241
QA bugs fixed: 43
Subversion check-ins: 461
24. The result & status of each milestone – M3
□ Achievements:
Performance improvements over M2 (DDL, query, server-side insert, 2PC)
Refactored global transactions; supported savepoints and atomic statements
Implemented dynamic scalability (register/unregister node, add/drop partition)
Supported loaddb/unloaddb and killtran
Other features: auto increment, global deadlock timeout
Passed QA functional and performance testing
□ Others:
Source lines of code (LOC): 11,518 (added 7,065, deleted 1,092, modified 3,361)
BTS issues: 165
QA bugs fixed: 52
Subversion check-ins: 461
25. The result & status of each milestone – M4 (Ongoing)
□ Goal:
Provide the data storage engine for the nReport project
Improve performance of ORDER BY and GROUP BY statements
Support joining a big table with a small table (global partitioned table joined with a non-partitioned table)
27. Performance Results
□ Test environment
3 server nodes (10.34.64.201/202/204):
• CPU: Intel(R) Xeon(R) E5405 @ 2.00GHz
• Memory: 8G
• Network: 1000 Mbps
• OS: CentOS 5.5 (64bit)
Configuration: data_buffer_pages=1,000,000
Table size: 100,000 and 10,000,000 rows
Data size: 108M (total 207M) and 9.6G (total 30G)
Each thread runs 5,000 times
[Diagram: CUBRID Cluster M3 – a Java program on 10.34.64.203 drives 40 threads (14/13/13) against Node1–Node3 (10.34.64.204/201/202) forming the cluster database; CUBRID 8.3.0.0337 – the same Java program drives 40 threads against a single CUBRID DB on 10.34.64.201]
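The driver can be reproduced with a small multi-threaded JDBC program along these lines; the statement, host, port, database name, and credentials below are placeholders for this test, not the actual benchmark code:

    // Sketch of the load driver: N threads each run the same prepared
    // statement 5,000 times; TPS = total executions / elapsed seconds.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class LoadDriver {
        static final int THREADS = 40, RUNS = 5000;

        public static void main(String[] args) throws Exception {
            Class.forName("cubrid.jdbc.driver.CUBRIDDriver");
            long start = System.currentTimeMillis();
            Thread[] ts = new Thread[THREADS];
            for (int i = 0; i < THREADS; i++) {
                ts[i] = new Thread(() -> {
                    try (Connection con = DriverManager.getConnection(
                            "jdbc:cubrid:10.34.64.201:33000:db1:dba::", "dba", "")) {
                        PreparedStatement ps =
                                con.prepareStatement("SELECT * FROM t1 WHERE a = ?");
                        for (int r = 0; r < RUNS; r++) {
                            ps.setInt(1, r);       // bind a fresh key each run
                            ps.executeQuery().close();
                        }
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
                ts[i].start();
            }
            for (Thread t : ts) t.join();
            double sec = (System.currentTimeMillis() - start) / 1000.0;
            System.out.printf("TPS = %.1f%n", THREADS * RUNS / sec);
        }
    }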
28. Performance Results (cont.)
□ Create table statements:
Cluster M3:
• CREATE GLOBAL TABLE t1 (a INT, b INT, c INT, d CHAR(10), e CHAR(100), f CHAR(500), INDEX i_t1_a(a), INDEX i_t1_b(b)) PARTITION BY HASH(a) PARTITIONS 256 ON NODE 'node1', 'node2', 'node3';
CUBRID R3.0:
• CREATE TABLE t1 (a INT, b INT, c INT, d CHAR(10), e CHAR(100), f CHAR(500), INDEX i_t1_a(a), INDEX i_t1_b(b)) PARTITION BY HASH(a) PARTITIONS 256;
□ Test statements:
Select by partition-key column: SELECT * FROM t1 WHERE a = ?
Select by non-partition-key column: SELECT * FROM t1 WHERE b = ?
Select by non-partition-key column range: SELECT * FROM t1 WHERE b BETWEEN ? AND ?
Insert with auto commit: INSERT INTO t1 VALUES (?,?,?,?,?,?)
29. Performance Results (cont.)
□ TPS (Transactions Per Second) Graphs
[Graphs: TPS for SELECT * FROM t1 WHERE a = ? (a is indexed and is the partition key); SELECT * FROM t1 WHERE b = ? and SELECT * FROM t1 WHERE b BETWEEN ? AND ? (b is indexed but is not the partition key); INSERT INTO t1 VALUES (?,?,?,?,?,?) with auto commit]
30. Performance Results (cont.)
□ ART (Average Response Time) Graphs – lower is better
[Graphs: ART for the same four statements: SELECT by partition key a, SELECT by non-partition key b, range SELECT on b, and INSERT with auto commit]
31. Performance Results (cont.)
□ Test environment
Server nodes (10.34.64.49/50 …/58):
• CPU: Intel(R) Xeon(R) E5645 @ 2.40GHz (12 cores)
• Memory: 16G
• Network: 1000 Mbps
• OS: CentOS 5.5 (64bit)
Configuration: cubrid.conf data_buffer_pages=1,000,000
Table size: 100,000,000 rows (one hundred million)
Data size: 88G (total size: 127G)
33. Pros and cons
□ Pros
Existing applications can adopt CUBRID Cluster easily
CUBRID Cluster can store more data or provide higher performance than CUBRID
CUBRID Cluster makes it easy to scale out as data grows
CUBRID Cluster can save cost
Supports transactions
□ Cons
Joins are not supported yet
Performance is not good enough yet
• S2S communication may incur network cost
• 2PC writes many log records, incurring I/O cost
34. Next Version plan
□ Tentative Work plan
Performance improvement
Support HA for each server node in CUBRID Cluster
Support load balancing (write to the active server, read from the standby server)
Support distributed partitioning by range/list
Support global users
Others: backup/restore DB
35. Appendix
□ Why is SELECT by partition key not fast enough? (back)
SELECT … FROM t1 WHERE a = 100 is rewritten to SELECT … FROM t1__p__p2 WHERE a = 100, and partition p2 is stored on Server2.
Current behavior: Step 1: the broker sends the request to Server1 (the default server); Step 2: Server1 sends a remote scan request to Server2; Step 3: Server2 does the scan; Step 4: the rows are fetched back through Server1.
Direct routing: Step 1: the broker sends the request to Server2 directly; Step 2: Server2 does the scan – no remote scan is needed.
36. Appendix (cont.)
□ Why is INSERT not fast enough? (back)
INSERT INTO t1 (a, …) VALUES (100, …); COMMIT – the row with a = 100 should be stored on Server2.
With 2PC (current): the broker sends the insert through Server #1, so DB1 on both Server #1 and Server #2 becomes dirty, and two-phase commit writes the log 3 times on one server and 2 times on the other.
Without 2PC (direct routing): the broker sends the insert to Server #2 directly; only that server becomes dirty and the log is written once.