Advanced Sharding Techniques with Spider (MUC2010)


Kentoku SHIBA
kentokushiba at gmail dot com

How to shard a database without stopping the service

What is database sharding?
When the data volume or the update traffic grows, a single database server that handles the updates can no longer keep up.
A common solution is to divide the data across two or more databases. This is called database sharding.

Here, I will explain how to shard data without stopping the service.

Initial Structure

DB1: tbl_a

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

There is 1 MySQL server (DB1) without Spider.

Step 1 (for sharding)

DB1: tbl_a, tbl_a2, tbl_a3 (Spider), tbl_a4 (VP)
DB2: tbl_a (col_a%2=0)
DB3: tbl_a (col_a%2=1)

Create the table on DB2 and DB3:

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

Then create the tables on DB1:

  Create table tbl_a3 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = Spider
  Connection '
    table "tbl_a",
    user "user",
    password "pass"
  '
  partition by list(
    mod(col_a, 2)) (
    partition pt1 values in(0)
      comment 'host "DB2"',
    partition pt2 values in(1)
      comment 'host "DB3"'
  );

  Create table tbl_a4 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = VP
  Comment '
    cit "2",
    cil "2",
    ctm "1",
    ist "1",
    zru "1",
    tnl "tbl_a2 tbl_a3"
  ';
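The routing rule in tbl_a3 can be pictured with a small Python model. This is a sketch and not Spider's code. The PARTITIONS table and the route and mysql_mod helpers are hypothetical names, and the model only mirrors the partition by list(mod(col_a, 2)) definition above.

```python
# Simplified model of how the Spider table tbl_a3 picks a remote host.
# Hypothetical helper names; this is not Spider's implementation.

# Mirrors the DDL: partition pt1 values in(0) -> DB2, pt2 values in(1) -> DB3.
PARTITIONS = {
    "pt1": ({0}, "DB2"),
    "pt2": ({1}, "DB3"),
}


def mysql_mod(n: int, m: int) -> int:
    """MySQL's MOD() keeps the sign of the dividend (MOD(-3, 2) = -1),
    unlike Python's %, so emulate it for a positive divisor."""
    r = abs(n) % m
    return -r if n < 0 else r


def route(col_a: int) -> str:
    """Return the host that stores the row with this col_a."""
    key = mysql_mod(col_a, 2)
    for values, host in PARTITIONS.values():
        if key in values:
            return host
    # A LIST-partitioned table rejects rows that match no partition.
    raise ValueError(f"no partition for MOD(col_a, 2) = {key}")


if __name__ == "__main__":
    for col_a in (10, 11, 12, 13):
        print(col_a, "->", route(col_a))
```

With this list definition, a negative odd key such as -3 gives MOD(col_a, 2) = -1, which matches no partition. That is a property of the partition list and not of Spider.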

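Step 2

DB1: tbl_a (VP, formerly tbl_a4), tbl_a2 (formerly tbl_a), tbl_a3 (Spider), tbl_a5 (formerly tbl_a2)
DB2: tbl_a (col_a%2=0)
DB3: tbl_a (col_a%2=1)

Rename the tables on DB1:

  rename table tbl_a2 to tbl_a5, tbl_a to tbl_a2, tbl_a4 to tbl_a;

The renames run as one atomic statement, so the application keeps using tbl_a throughout. tbl_a is now the VP table in front of tbl_a2 and tbl_a3.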
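All three renames are done in one statement because MySQL applies a multi-table RENAME TABLE atomically. Other sessions never see a moment where tbl_a is missing. The toy Python model below replays the Step 2 rename on a name-to-table map. It uses a hypothetical rename_tables helper and reduces table contents to labels.

```python
from __future__ import annotations


def rename_tables(catalog: dict[str, str], pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Model of a multi-table RENAME TABLE.

    Renames are applied left to right on a copy of the catalog. If any one
    fails, the original catalog is left untouched; otherwise the new
    catalog replaces it in a single step.
    """
    new = dict(catalog)
    for old, target in pairs:
        if old not in new:
            raise KeyError(f"table {old} does not exist")
        if target in new:
            raise KeyError(f"table {target} already exists")
        new[target] = new.pop(old)
    return new


# Table names on DB1 just before Step 2, with a label for what each holds.
before = {
    "tbl_a": "original InnoDB table",
    "tbl_a2": "tbl_a2 created in Step 1",
    "tbl_a3": "Spider table",
    "tbl_a4": "VP table",
}
after = rename_tables(before, [("tbl_a2", "tbl_a5"), ("tbl_a", "tbl_a2"), ("tbl_a4", "tbl_a")])
print(after["tbl_a"])  # VP table
```

The order inside the statement matters. tbl_a2 must be moved out of the way before tbl_a can take its name, and tbl_a must be freed before tbl_a4 can become tbl_a.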

Step 3

The tables are the same as in Step 2.

Copy the data from tbl_a2 to tbl_a3 on DB1. Because tbl_a3 is the Spider table, the rows are stored on DB2 and DB3 according to mod(col_a, 2):

  select vp_copy_tables('tbl_a', 'tbl_a2', 'tbl_a3');
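Steps 2 to 4 follow a general "write to both, backfill, then switch" pattern. The Python model below is a sketch under stated assumptions and does not describe the documented behaviour of VP or vp_copy_tables. It assumes three things:

- While the VP table sits in front of tbl_a2 and tbl_a3, every write reaches both children.
- Reads are served from tbl_a2.
- The copy only fills in rows that are missing from tbl_a3.

DualWriteTable and backfill are hypothetical names. The model is single-threaded, so it ignores races between the copy and concurrent writes.

```python
from typing import Dict, Optional


class DualWriteTable:
    """Hypothetical stand-in for the VP table during the migration."""

    def __init__(self, old: Dict[int, int], new: Dict[int, int]) -> None:
        self.old = old  # role of tbl_a2: the original rows
        self.new = new  # role of tbl_a3: the new (initially empty) table

    def upsert(self, col_a: int, col_b: int) -> None:
        # Assumption: every write is applied to both child tables.
        self.old[col_a] = col_b
        self.new[col_a] = col_b

    def delete(self, col_a: int) -> None:
        self.old.pop(col_a, None)
        self.new.pop(col_a, None)

    def read(self, col_a: int) -> Optional[int]:
        # Assumption: reads are served from the original table until Step 4.
        return self.old.get(col_a)


def backfill(src: Dict[int, int], dst: Dict[int, int]) -> int:
    """Copy rows that exist only in src. Rows already written through the
    dual-write path are at least as new, so they are left alone.
    Returns the number of rows copied."""
    copied = 0
    for col_a, col_b in list(src.items()):
        if col_a not in dst:
            dst[col_a] = col_b
            copied += 1
    return copied


if __name__ == "__main__":
    tbl_a2 = {1: 10, 2: 20, 3: 30}  # rows that existed before Step 2
    tbl_a3: Dict[int, int] = {}
    vp = DualWriteTable(tbl_a2, tbl_a3)
    vp.upsert(2, 21)  # application traffic keeps flowing during the copy
    vp.upsert(4, 40)
    vp.delete(3)
    print(backfill(tbl_a2, tbl_a3), tbl_a3 == tbl_a2)  # 1 True
```

Only row 1 has to be copied, because rows 2 and 4 already reached the new table through the dual-write path. Once the two tables match, the Step 4 rename can switch the application over.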

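Step 4

DB1: tbl_a (Spider, formerly tbl_a3), tbl_a2, tbl_a4 (VP), tbl_a5
DB2: tbl_a (col_a%2=0)
DB3: tbl_a (col_a%2=1)

Rename the tables on DB1 so that the Spider table becomes tbl_a:

  rename table tbl_a to tbl_a4, tbl_a3 to tbl_a;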

Finish

DB1: tbl_a (Spider)
DB2: tbl_a (col_a%2=0)
DB3: tbl_a (col_a%2=1)

Drop the tables that are no longer used on DB1:

  drop table tbl_a2, tbl_a4, tbl_a5;

How to re-shard a database

What is re-sharding?
Even after a database has been sharded, the data volume or the update traffic may grow so much that the servers handling the updates again cannot keep up.
We then solve the problem by adding servers and spreading the load across them. This is called re-sharding.

Here, I will explain how to re-shard without stopping the service.

Initial Structure

DB1: tbl_a (Spider)
DB2: tbl_a (col_a%2=0)
DB3: tbl_a (col_a%2=1)

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = Spider
  Connection '
    table "tbl_a",
    user "user",
    password "pass"
  '
  partition by list(
    mod(col_a, 2)) (
    partition pt1 values in(0)
      comment 'host "DB2"',
    partition pt2 values in(1)
      comment 'host "DB3"'
  );

There is 1 MySQL server with Spider (DB1), and there are 2 remote MySQL servers without Spider (DB2 and DB3).

Step 1 (for re-sharding)

DB1: tbl_a (Spider)
DB2: tbl_a (col_a%2=0)
DB3: tbl_a, tbl_a2, tbl_a3 (Spider), tbl_a4 (VP)
DB4: tbl_a (col_a%4=1)
DB5: tbl_a (col_a%4=3)

Create the table on DB4 and DB5. Then create the tables on DB3:

  Create table tbl_a3 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = Spider
  Connection '
    table "tbl_a",
    user "user",
    password "pass"
  '
  partition by list(
    mod(col_a, 4)) (
    partition pt1 values in(1)
      comment 'host "DB4"',
    partition pt2 values in(3)
      comment 'host "DB5"'
  );

  Create table tbl_a4 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = VP
  Comment '
    cit "2",
    cil "2",
    ctm "1",
    ist "1",
    zru "1",
    tnl "tbl_a2 tbl_a3"
  ';
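This plan only needs to touch DB3. An even col_a has col_a % 4 equal to 0 or 2, and an odd col_a has 1 or 3. As a result, the rows on DB2 keep their home, and only the odd rows on DB3 are split between DB4 and DB5.

The Python check below assumes non-negative keys and uses hypothetical helper names. The host assignment is taken from the Finish layout of this procedure.

```python
from collections import Counter
from typing import Iterable


def old_host(col_a: int) -> str:
    """Before re-sharding: partition by list(mod(col_a, 2))."""
    return "DB2" if col_a % 2 == 0 else "DB3"


def new_host(col_a: int) -> str:
    """After re-sharding: mod 4 in (0, 2) stays on DB2, 1 -> DB4, 3 -> DB5."""
    r = col_a % 4
    if r in (0, 2):
        return "DB2"
    return "DB4" if r == 1 else "DB5"


def migration_plan(keys: Iterable[int]) -> Counter:
    """Count rows per (old host, new host) pair for non-negative keys."""
    return Counter((old_host(k), new_host(k)) for k in keys)


if __name__ == "__main__":
    for (src, dst), n in sorted(migration_plan(range(1000)).items()):
        print(f"{src} -> {dst}: {n} rows")
```

For keys 0 to 999, this reports 500 rows staying on DB2. It also reports 250 rows moving from DB3 to DB4 and 250 rows moving from DB3 to DB5.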

Step 2

DB1: tbl_a (Spider)
DB2: tbl_a (col_a%2=0)
DB3: tbl_a, tbl_a2, tbl_a3, tbl_a5
DB4: tbl_a (col_a%4=1)
DB5: tbl_a (col_a%4=3)


Step 3

The tables are the same as in Step 2.


Step 4

DB1: tbl_a (Spider)
DB2: tbl_a (col_a%2=0)
DB3: tbl_a, tbl_a2, tbl_a4, tbl_a5
DB4: tbl_a (col_a%4=1)
DB5: tbl_a (col_a%4=3)

Rename the tables on DB3:

  Rename table
    tbl_a to tbl_a4,
    tbl_a3 to tbl_a;

Then alter the table on DB1:

  Alter table tbl_a
  partition by list(
    mod(col_a, 4)) (
    partition pt1 values in(0,2)
    partition pt2 values in(3)
      comment 'host "DB5"'
  );

Finish

DB1: tbl_a (Spider)
DB2: tbl_a (col_a%2=0)
DB4: tbl_a (col_a%4=1)
DB5: tbl_a (col_a%4=3)

Drop DB3.

How to add an index


When you add an index in MySQL, you cannot update the table until the operation completes.
On a big table this takes a long time, so the service may be unusable during the change.

Here, I will explain how to add an index without stopping updates to your data.

Initial Structure

DB1: tbl_a

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

There is 1 MySQL server (DB1).

Step 1 (for adding an index)

DB1: tbl_a, tbl_a2, tbl_a3, tbl_a4 (VP)

Create the tables on DB1:

  Create table tbl_a2 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

  Create table tbl_a3 (
    col_a int,
    col_b int,
    primary key(col_a),
    key idx1(col_b)
  ) engine = InnoDB;

  Create table tbl_a4 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = VP
  Comment '
    cit "2",
    cil "2",
    ctm "1",
    ist "1",
    zru "1",
    tnl "tbl_a2 tbl_a3"
  ';

Step 2

DB1: tbl_a, tbl_a2, tbl_a3, tbl_a5


Step 3

The tables are the same as in Step 2.


Step 4

DB1: tbl_a (formerly tbl_a3), tbl_a2, tbl_a4, tbl_a5

Rename the tables on DB1:

  rename table tbl_a to tbl_a4, tbl_a3 to tbl_a;

Finish

DB1: tbl_a

Drop the tables that are no longer used on DB1:

  drop table tbl_a2, tbl_a4, tbl_a5;
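The add-index slides show only the Step 4 rename and the final drop. This sketch assumes that Steps 2 and 3 use the same rename and vp_copy_tables call as the sharding procedure, since the table layout in the diagrams matches. Under that assumption, the whole sequence on DB1 can be generated from the base table name. online_change_statements is a hypothetical helper; it only builds strings and does not talk to MySQL.

```python
from typing import List


def online_change_statements(base: str = "tbl_a") -> List[str]:
    """Hypothetical helper: SQL to run on DB1 after Step 1, in order.

    Naming follows the slides: <base>2 is parked as <base>5, the original
    table becomes <base>2, <base>3 is the new table and <base>4 the VP table.
    """
    t2, t3, t4, t5 = (f"{base}{n}" for n in (2, 3, 4, 5))
    return [
        # Step 2: put the VP table in front of the original data.
        f"rename table {t2} to {t5}, {base} to {t2}, {t4} to {base}",
        # Step 3: copy the existing rows into the new table.
        f"select vp_copy_tables('{base}', '{t2}', '{t3}')",
        # Step 4: make the new table the live one.
        f"rename table {base} to {t4}, {t3} to {base}",
        # Finish: drop the tables that are no longer used.
        f"drop table {t2}, {t4}, {t5}",
    ]


if __name__ == "__main__":
    for stmt in online_change_statements():
        print(stmt + ";")
```

The same sequence applies to the sharding and schema-change procedures. Only the definition of tbl_a3 created in Step 1 differs.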

How to change the schema


When you change the schema in MySQL, you cannot update the table until the operation completes.
On a big table this takes a long time, so the service may be unusable during the change.

Here, I will explain how to change the schema without stopping updates to your data.

Step 1 (for adding a column)

DB1: tbl_a, tbl_a2, tbl_a3, tbl_a4 (VP)

  Create table tbl_a2 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

  Create table tbl_a3 (
    col_a int,
    col_b int,
    col_c int default null,
    primary key(col_a)
  ) engine = InnoDB;

  Create table tbl_a4 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = VP
  Comment '
    cit "2",
    cil "2",
    ctm "1",
    ist "1",
    zru "1",
    tnl "tbl_a2 tbl_a3"
  ';
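This sketch assumes the copy transfers only the columns that both tables share. Under that assumption, rows copied from the old table into tbl_a3 get the declared default for the new column, which here is NULL.

The minimal Python model below represents rows as dicts, with None standing in for NULL. copy_row, NEW_COLUMNS and DEFAULTS are hypothetical names.

```python
from typing import Dict, Optional

# Columns of the new tbl_a3 and the defaults declared for columns the old table lacks.
NEW_COLUMNS = ("col_a", "col_b", "col_c")
DEFAULTS: Dict[str, Optional[int]] = {"col_c": None}  # col_c int default null


def copy_row(old_row: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Map an old (col_a, col_b) row onto the new layout, filling in defaults."""
    return {col: old_row[col] if col in old_row else DEFAULTS[col] for col in NEW_COLUMNS}


if __name__ == "__main__":
    print(copy_row({"col_a": 1, "col_b": 2}))
```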


How to set up a cluster for fault tolerance

Spider can set up a fault-tolerant cluster on a per-table basis.

Here, I will explain how to set up a cluster without stopping the service.

A 'monitoring node' in these slides is a node that watches the other nodes of the cluster for failures.
'spider_copy_tables' in these slides is still in development, so please wait a while before using it.

Initial Structure

DB1: tbl_a (Spider)
DB2: tbl_a

On DB2:

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

On DB1:

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = Spider
  Connection '
    table "tbl_a",
    user "user",
    password "pass",
    host "DB2"
  ';

There is 1 MySQL server with Spider (DB1) and 1 remote MySQL server without Spider (DB2).

Step 1 (for clustering)

DB1: tbl_a (Spider)
DB2, DB3, DB4: tbl_a

Add new data nodes (DB3 and DB4) and create the table on them:

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

Step 2

DB1: tbl_a (Spider)
DB2, DB3, DB4: tbl_a
DB5, DB6, DB7 (monitoring nodes): tbl_a (Spider)

Add new monitoring nodes (DB5, DB6 and DB7) and create the table on each of them:

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = Spider
  Connection '
    table "tbl_a",
    user "user",
    host "DB2 DB3 DB4"
  ';

Step 3

Register the monitoring node information on the MySQL servers with Spider:

  insert into mysql.spider_link_mon_servers
    (db_name, table_name, link_id, sid, server, scheme, host, port, socket, username, password)
  values
    ('db_name', 'tbl_a', 0, DB5_sid, null, 'mysql', 'DB5', 3306, null, 'user', 'pass'),
    ('db_name', 'tbl_a', 0, DB6_sid, null, 'mysql', 'DB6', 3306, null, 'user', 'pass'),
    ('db_name', 'tbl_a', 0, DB7_sid, null, 'mysql', 'DB7', 3306, null, 'user', 'pass');

Then alter the table on DB1:

  Alter table tbl_a
  Connection '
    table "tbl_a",
    user "user",
    host "DB2 DB3 DB4",
    mbk "2", mkd "2",
    msi "DB5_sid",
    link_status "0 2 2"
  ';
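The host and link_status parameters are positional lists: the n-th status belongs to the n-th host. The hypothetical Python helper below pairs them up, so the Step 3 and Finish settings can be read at a glance.

The status meanings used here are an assumption based on the Spider documentation; check them against the version you use:

- 0 = leave unchanged
- 1 = OK
- 2 = RECOVERY
- 3 = NG

```python
from typing import Dict

# Assumed meaning of Spider's link_status codes.
STATUS_NAMES = {0: "unchanged", 1: "OK", 2: "RECOVERY", 3: "NG"}


def parse_links(host: str, link_status: str) -> Dict[str, str]:
    """Pair each host in a Spider connection string with its link status."""
    hosts = host.split()
    codes = [int(code) for code in link_status.split()]
    if len(hosts) != len(codes):
        raise ValueError("host and link_status must list the same number of links")
    return {h: STATUS_NAMES[c] for h, c in zip(hosts, codes)}


if __name__ == "__main__":
    print(parse_links("DB2 DB3 DB4", "0 2 2"))  # Step 3: DB3 and DB4 in recovery
    print(parse_links("DB2 DB3 DB4", "0 1 1"))  # Finish: DB3 and DB4 set to OK
```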

Step 4

Copy the data from DB2 to DB3 and DB4:

  Select spider_copy_tables('tbl_a', '', '');

Finish

Alter the table on DB1:

  Alter table tbl_a
  Connection '
    table "tbl_a",
    user "user",
    host "DB2 DB3 DB4",
    mbk "2", mkd "2",
    msi "DB5_sid",
    link_status "0 1 1"
  ';

How to add a new node after failover and preparing a new server

Adding a table on a new node to a clustered table

When a node in the cluster fails, you need to add a new node to maintain redundancy.

Here, I will explain how to add a table on a new node without stopping the service.
A 'monitoring node' in these slides is a node that watches the other nodes of the cluster for failures.
'spider_copy_tables' in these slides is still in development; it will be available in a future release.

Initial Structure

DB1: tbl_a (Spider)
DB2, DB3, DB4: tbl_a
DB5, DB6, DB7 (monitoring nodes): tbl_a (Spider)

There are 4 MySQL servers with Spider (including 3 monitoring nodes) and 3 MySQL servers without Spider (including 1 broken node, DB3).

Step 1

DB1: tbl_a (Spider)
DB2, DB3, DB4, DB8: tbl_a
DB5, DB6, DB7 (monitoring nodes): tbl_a (Spider)

Add a new data node (DB8) and create the table on it:

  Create table tbl_a (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

Step 2

Alter the table on the monitoring nodes (DB5, DB6 and DB7):

  Alter table tbl_a
  Connection '
    table "tbl_a",
    user "user",
    password "pass",
    host "DB2 DB4 DB8"
  ';

Step 3

Alter the table on DB1:

  Alter table tbl_a
  Connection '
    table "tbl_a",
    user "user",
    mbk "2", mkd "2",
  ';

Step 4

Copy the data from DB2 to DB8:

  Select spider_copy_tables('tbl_a', '', '');

Finish

Alter the table on DB1:

  Alter table tbl_a
  Connection '
    table "tbl_a",
    user "user",
    mbk "2", mkd "2",
  ';

How to avoid the table partitioning UNIQUE column limitation

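MySQL currently has a restriction: every column used in the partitioning expression must be part of every PRIMARY KEY or UNIQUE key of the table. So when a table has such a key, you cannot partition it by any other column.

Here, I will show you how to partition a table by any column even if it has a PRIMARY KEY or UNIQUE key.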

Step 1 (for avoiding the partitioning limitation)

DB1: tbl_a, tbl_a2, tbl_a3, tbl_a4, tbl_a5 (VP)

  Create table tbl_a2 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = InnoDB;

  Create table tbl_a3 (
    col_a int,
    primary key(col_a)
  ) engine = InnoDB
  partition by
    linear hash(col_a)
    partitions 4;

  Create table tbl_a4 (
    col_a int,
    col_b int,
    key idx1(col_a),
    key idx2(col_b)
  ) engine = InnoDB
  partition by list(
    mod(col_b, 2)) (
    partition pt1 values in(0),
    partition pt2 values in(1)
  );

  Create table tbl_a5 (
    col_a int,
    col_b int,
    primary key(col_a)
  ) engine = VP
  Comment '
    ctm "1", ist "1",
    zru "1", pcm "1"
  '
  Connection '
    tnl "tbl_a2 tbl_a3 tbl_a4"
  ';
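This layout sidesteps the limitation by splitting the work between two tables:

- tbl_a3 stores only col_a and is partitioned by col_a, so MySQL can enforce the primary key there.
- tbl_a4 has no PRIMARY or UNIQUE key, so it is free to be partitioned by col_b.

The Python model below is a sketch under an assumption the slides do not spell out: the VP table inserts into tbl_a3 first and writes the full row to tbl_a4 only if that succeeds. PartitionedByColB and DuplicateKeyError are hypothetical names. col_a % 4 merely stands in for linear hash(col_a) partitions 4, and keys are assumed non-negative.

```python
from typing import List, Set, Tuple


class DuplicateKeyError(Exception):
    """Raised when col_a already exists, like MySQL's duplicate-entry error."""


class PartitionedByColB:
    """Hypothetical model of the VP table tbl_a5 over tbl_a3 and tbl_a4."""

    def __init__(self) -> None:
        # tbl_a3: col_a only, primary key, 4 hash partitions on col_a.
        self.pk_parts: List[Set[int]] = [set() for _ in range(4)]
        # tbl_a4: full rows, no unique key, list partitions on mod(col_b, 2).
        self.row_parts: List[List[Tuple[int, int]]] = [[], []]

    def insert(self, col_a: int, col_b: int) -> None:
        pk_part = self.pk_parts[col_a % 4]  # stand-in for linear hash(col_a)
        if col_a in pk_part:
            raise DuplicateKeyError(f"duplicate entry {col_a} for key PRIMARY")
        pk_part.add(col_a)
        self.row_parts[col_b % 2].append((col_a, col_b))

    def rows_where_col_b_mod_2(self, value: int) -> List[Tuple[int, int]]:
        """Only one tbl_a4 partition has to be scanned (partition pruning)."""
        return list(self.row_parts[value])


if __name__ == "__main__":
    t = PartitionedByColB()
    t.insert(1, 10)
    t.insert(2, 11)
    try:
        t.insert(1, 13)  # same col_a, but it would land in the other tbl_a4 partition
    except DuplicateKeyError as err:
        print("rejected:", err)
    print(t.rows_where_col_b_mod_2(0))
```

tbl_a4 alone could not catch the duplicate, because (1, 10) and (1, 13) fall into different col_b partitions. The check in tbl_a3 catches it.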


Step 2

DB1: tbl_a, tbl_a2, tbl_a3, tbl_a4, tbl_a6


Step 3

The tables are the same as in Step 2.

Copy the data from tbl_a2 to tbl_a3 and tbl_a4:

  select vp_copy_tables('tbl_a', 'tbl_a2', 'tbl_a3 tbl_a4');

Step 4

DB1: tbl_a, tbl_a2, tbl_a3, tbl_a4, tbl_a6

Alter tbl_a on DB1 so that it no longer uses tbl_a2:

  Alter table tbl_a
  Comment '
    ctm "1", ist "1",
    pcm "1"
  ',
  Connection '
    tnl "tbl_a3 tbl_a4"
  ';

Finish

DB1: tbl_a, tbl_a3, tbl_a4

Drop the tables that are no longer used:

  drop table tbl_a2, tbl_a6;

About MicroAd

MicroAd is an advertising company.

The company delivers advertisements efficiently using "behavioral targeting" technology.

[MicroAd, Inc.]
http://www.microad.jp/english/

The previous architecture

Application servers (AP) read from two slave DBs through LVS.
The slave DBs replicate from one master DB.
A batch server registers new statistical rules in the master DB.

Batch processing updates the statistical rules every day (for every advertiser, every advertising medium and every user).

The problem with business expansion

Both the data volume and the number of requests increased.
At that time, the system could update at most 20 million records a day, but they needed to update 100 million records a day.

They also wanted to improve read performance on the slaves by reducing the volume of updates that each slave has to apply.

They did not want to change or modify their application to support the increase.

So they adopted Spider.

The architecture with Spider

Application servers (AP) run Spider and shard their reads across three LVS instances (Spider sharding).
Six slave DBs sit behind the LVS instances and replicate from three master DBs.
The batch server registers new statistical rules through a SpiderDB (MySQL with Spider), which shards the updates across the three master DBs.

They used the replication unit (a master DB and its slaves) as the unit of sharding.
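The Python snippet below works out the arithmetic behind the 100-million-records target. It assumes the updates are spread evenly over 24 hours and over the three master DBs in the diagram. A real batch job runs in a shorter window and has peaks, so these figures are lower bounds, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def updates_per_second(records_per_day: int) -> float:
    """Average update rate if the records are spread evenly over one day."""
    return records_per_day / SECONDS_PER_DAY


old_limit = updates_per_second(20_000_000)   # about 231 per second
target = updates_per_second(100_000_000)     # about 1,157 per second
per_master = target / 3                      # about 386 per second per shard

if __name__ == "__main__":
    print(f"{old_limit:.0f} {target:.0f} {per_master:.0f}")
```

Sharding by replication unit lets each master DB absorb about a third of the update stream. Each slave then replicates only its own shard's updates rather than all of them.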

Resolved the problem

As a result, they achieved 100 million record updates a day and improved read performance.

They needed only minor changes to their applications.

They plan to re-shard in the near future as the business expands.

Any Questions?

Thank you for your time!

Kentoku SHIBA (kentokushiba at gmail dot com)
http://wild-growth.blogspot.com/
http://spiderformysql.com
