This presentation explains why database replication matters for scaling web applications and keeping them highly available. Replication can balance read load across slaves, shard writes, and replace a failed master. Native MySQL replication covers the basics, but it is limited when it comes to multiple masters, conflict prevention, and replicating to other database types. Tungsten Replicator aims to address these limitations with parallel replication sharded by database, seamless failover, and replication from MySQL to other databases such as PostgreSQL. In the benchmark presented, a slave with a one-hour lag caught up in 00:55:40 with Tungsten parallel replication, versus 04:29:30 with native MySQL replication.
Moving Data is Key to Web Success
1. MOVING DATA FOR THE MASSES
Giuseppe Maxia
QA Director
Continuent, Inc
@datacharmer
This work is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported License.
Friday, November 11, 11 1
2. Databases are the backbone of the web economy
www
$$$
14. database server
r/w requests
web server
clients
15. database server
r/w requests
web servers
load balancer
clients
16. database load on a simple web application
[chart: read and write load]
17. database load on a successful web application
[chart: write and read load]
18. database server
✘ r/w requests
web servers
load balancer
clients
19. a web application scheme with replication
clients
load balancer
web servers
R/W: read/write master
R/O: load balancer, read/only slaves
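The read/write split in this scheme can be sketched as a small routing function. This is only an illustration: the host names (r1 as master, r2 to r4 as slaves), the `route_query` function, and the rule "SELECT and SHOW are reads" are assumptions made for the example, not something a particular load balancer prescribes.

```shell
#!/bin/sh
# Hypothetical host names: one read/write master and three read/only slaves.
MASTER="r1"
SLAVES="r2 r3 r4"
NEXT=0          # round-robin counter for read requests
ROUTED_HOST=""  # set by route_query

# route_query SQL: choose a host for one statement.
# The result is stored in ROUTED_HOST rather than printed, so the
# round-robin counter is not lost in a command-substitution subshell.
route_query() {
    # first word of the statement, upper-cased
    verb=$(printf '%s' "${1%% *}" | tr '[:lower:]' '[:upper:]')
    case "$verb" in
        SELECT|SHOW)
            # reads rotate across the slaves
            set -- $SLAVES
            count=$#
            shift $((NEXT % count))
            ROUTED_HOST=$1
            NEXT=$((NEXT + 1))
            ;;
        *)
            # writes, and anything unrecognized, go to the master
            ROUTED_HOST=$MASTER
            ;;
    esac
}

route_query "SELECT name FROM users"
echo "read  -> $ROUTED_HOST"
route_query "UPDATE users SET name = 'x' WHERE id = 1"
echo "write -> $ROUTED_HOST"
```

Sending unrecognized statements to the master is the conservative choice here: a misrouted read costs some master capacity, while a misrouted write would fail or diverge on a read/only slave.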
20. database load with replication
[chart: write and read load on the read/write master; read load on the read/only slaves]
21. scaling database load with replication
[chart: write and read load on the read/write master; read load on the read/only slaves]
26. • If you want a successful company
• in the 21st century
• you need to move data
• fast
• and often
27. MOVING DATA FOR THE MASSES
31. How do you move data?
• Commodity hardware
• Basic
  • Commodity software (MySQL)
  • MySQL native replication
• Advanced
  • MySQL and other DBMS
  • Powerful tools (Tungsten Replicator)
33. MySQL replication is single threaded
[diagram: transactions on the master MySQL DBMS, BINARY LOG, REPLICATION, transactions on the slave MySQL DBMS]
34. multiple sources?
[diagram: single source versus multi source (fan-in), built from MySQL masters and slaves]
35. multiple masters?
from this to this
[diagram: two topologies of MySQL masters]
36. Avoiding conflicts?
[diagram: four MySQL masters; INSERT RECORD A arrives at two of them and MODIFY RECORD B at the other two]
37. Seamless failover?
[diagram: MySQL masters and their MySQL slaves]
38. Replicating to something else?
[diagram: mysql master replicating to mysql, postgresql, oracle, mongodb]
39. All these examples tell us:
Nice dream, but MySQL can’t do it
44. What can it do?
• Easy failover
• Multiple masters
• Multiple sources to a single slave
• Conflict prevention
• Parallel replication
• Replicate to Oracle and PostgreSQL databases
46. Main components
• Transaction History Logs (THL)
  • roughly corresponding to MySQL relay logs
  • have a lot of metadata
• Service database
  • contains metadata for latest transactions
  • Metadata is committed together with data
  • Makes slaves crash proof
52. Need to sync the missing transactions
[diagram: New master, Slave 1 and Slave 2, each with its own Binary log of Transactions]
53. Transaction position in the new master binlog may not match positions in the old master binlog
[diagram: New master, Slave 1 and Slave 2, each with its own Binary log of Transactions; the positions the slaves should resume from are marked "?"]
59. Need to sync the missing transactions
[diagram: New master, Slave 1 and Slave 2, each with its own THL of Transactions]
60. Transaction IDs in the new master THL are understood immediately by the slaves
[diagram: New master, Slave 1 and Slave 2 with their logs of Transactions; Slave 1 asks "Give me seqno #51", Slave 2 asks "Give me seqno #67"]
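The "Give me seqno #51" request can be sketched with a toy log. Everything here is an assumption made for illustration: the flat file of "seqno payload" lines stands in for a THL, and `missing_transactions` is a hypothetical helper, not a Tungsten command. The point is only that a global sequence number lets a slave ask any master for "everything after N" without translating log positions.

```shell
#!/bin/sh
# Toy stand-in for a THL: one "seqno payload" line per transaction.
# The file format is an assumption made for this sketch only.
THL_FILE=$(mktemp)
for n in 48 49 50 51 52 53; do
    printf '%s transaction-%s\n' "$n" "$n"
done > "$THL_FILE"

# missing_transactions LAST_APPLIED: print every transaction the slave
# still needs, i.e. those with a seqno greater than the last one applied.
missing_transactions() {
    awk -v last="$1" '$1 > last' "$THL_FILE"
}

# A slave that has applied up to seqno 50 asks for seqno 51 onwards.
missing_transactions 50
```

With binary log positions, the same request would need a file name and byte offset that are specific to one master's log, which is why a promoted master's positions may not match the old ones.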
64. Parallel replication facts
✓ Sharded by database
✓ Good choice for slave lag problems
❖ Bad choice for single database projects
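To see why sharding by database helps with many databases and not with one, here is a minimal sketch that assigns each database to an apply channel. The checksum-modulo rule, the `CHANNELS` value, and the `channel_for` name are assumptions for illustration; the slides do not say how Tungsten assigns shards to channels.

```shell
#!/bin/sh
CHANNELS=4   # hypothetical number of parallel apply channels

# channel_for DATABASE: map a database name to a channel number in
# 0..CHANNELS-1 using a checksum of the name. The same name always
# maps to the same channel, so ordering within a database is kept.
channel_for() {
    sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
    echo $((sum % CHANNELS))
}

for db in db1 db2 db3 db4 db5 db6; do
    echo "$db -> channel $(channel_for "$db")"
done
```

If every transaction belongs to the same database, every transaction lands in the same channel and the other channels stay idle, which matches the "bad choice for single database projects" point.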
67. before the test (2)
binary logs
MySQL slave: STOPPED
Tungsten slave (direct: alpha, replicator alpha): OFFLINE
68. starting the test
binary logs
MySQL slave: STOPPED
Tungsten slave (direct: alpha, replicator alpha): OFFLINE
Concurrent sysbench on 30 databases running for 1 hour
TOTAL DATA: 130 GB
RAM per server: 20GB
Slaves will have 1 hour lag
69. measuring results
binary logs
MySQL slave: START
Tungsten slave (direct: alpha, replicator alpha): ONLINE
Recording catch-up time
70. MySQL native replication
slave catch up in 04:29:30
71. Tungsten parallel replication
slave catch up in 00:55:40
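The two catch-up times are easier to compare as a ratio. The arithmetic below uses only the figures from the slides; the `to_seconds` helper is introduced for this example.

```shell
#!/bin/sh
# to_seconds HH:MM:SS: convert a duration to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

native=$(to_seconds 04:29:30)     # MySQL native replication
parallel=$(to_seconds 00:55:40)   # Tungsten parallel replication

echo "native:   $native s"
echo "parallel: $parallel s"
# awk does the division because sh arithmetic is integer-only
awk -v n="$native" -v p="$parallel" 'BEGIN { printf "speedup:  %.1fx\n", n / p }'
```

That is 16170 seconds against 3340 seconds, so in this test the parallel slave caught up roughly 4.8 times faster.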
80. parallel replication direct slave facts
✓ No need to install Tungsten on the master
✓ Tungsten runs only on the slave
✓ Replication can revert to native slave with two commands (trepctl offline; start slave)
✓ Native replication can continue on other slaves
❖ Failover (either native or Tungsten) becomes a manual task
100. Conflict prevention facts
• Sharded by database
• Defined dynamically
• Applied either at the master or at the slave
• methods:
  • make replication fail
  • drop silently
  • drop with warning
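The three methods can be sketched as a per-database rule check. This is not Tungsten's configuration format: the `RULES` table, the idea that each database is owned by one master, and the `check_event` function are all assumptions made to illustrate "sharded by database" with the fail, drop, and warn outcomes.

```shell
#!/bin/sh
# Hypothetical rules: "database owning-master action", one per line.
RULES="sales r1 fail
hr r2 drop
logs r3 warn"

# check_event DATABASE SOURCE_MASTER
# Prints "apply" when the event comes from the master that owns the
# database (or when no rule exists), otherwise prints the outcome.
# Returns 1 only when the rule says replication must fail.
check_event() {
    rule=$(printf '%s\n' "$RULES" | awk -v db="$1" '$1 == db { print $2, $3 }')
    owner=${rule%% *}
    action=${rule##* }
    if [ -z "$rule" ] || [ "$owner" = "$2" ]; then
        echo apply
        return 0
    fi
    case "$action" in
        fail) echo fail; return 1 ;;     # make replication fail
        drop) echo drop ;;               # drop silently
        warn) echo "WARNING: dropped event for $1 from $2" >&2
              echo drop ;;               # drop with warning
    esac
}

check_event sales r1           # owner writes to its own database
check_event hr r1              # wrong master, dropped silently
check_event logs r1            # wrong master, dropped with a warning
check_event sales r2 || echo "replication stopped"
```

Keeping the rules in data rather than code is one way to read "defined dynamically": the table can change without touching the check itself.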
122. Installation
• Check the requirements
• Get the binaries
• Expand the tarball
• Run ./tools/tungsten-installer
123. REQUIREMENTS
• Java JRE or JDK (Sun/Oracle or OpenJDK)
• Ruby 1.8 (only during installation)
• ssh access to the same user on all nodes
• MySQL user with all privileges
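A quick pre-flight check for the command-line side of these requirements might look like the sketch below. The `missing_commands` helper is hypothetical, and it only tests that programs are on the PATH; it does not check versions, ssh logins, or MySQL privileges, which the installer validates itself.

```shell
#!/bin/sh
# missing_commands CMD...: print every command that is not on the PATH
# and return 1 if at least one is missing.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd"
            status=1
        fi
    done
    return $status
}

# java and ruby come from the requirements list; ssh and mysql are the
# client programs assumed to back the ssh-access and MySQL-user items.
missing_commands java ruby ssh mysql || echo "fix the items above before installing"
```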
124. Installation types
• master / slave
• slave - direct
128. Installation (1)
# starting at node 4, but any would do
# create the installation directory on every node
for N in 1 2 3 4
do
    ssh r$N mkdir -p tinstall
done
# unpack the tarball on the current node
cd tinstall
tar -xzf /path/to/tungsten-replicator-2.0.4.tar.gz
cd tungsten-replicator-2.0.4
130. Installation (2)
export TUNGSTEN_BASE=$HOME/tinstall
# the options are collected in an array so that each one keeps its comment
OPTS=(
    --master-slave                        # installation mode
    --master-host=r1                      # who’s the master
    --datasource-user=tungsten            # mysql username
    --datasource-password=secret          # mysql password
    --service-name=dragon                 # name of the service
    "--home-directory=$TUNGSTEN_BASE"     # where we install
    --cluster-hosts=r1,r2,r3,r4           # hosts in cluster
    --start                               # start replicator after installing
)
./tools/tungsten-installer "${OPTS[@]}"
131. What does the installation do
1: Validate all servers
host4 host1 host2 host3
✔ ✔ ✔ ✔
✗ ✗ ✗ ✗
✔ ✔ ✔ ✔
✔ ✔ ✔ ✔
Report all errors
132. What does the installation do
1 (again): Validate all servers
host4 host1 host2 host3
✔ ✔ ✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔ ✔
✔ ✔ ✔ ✔
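The "validate everything, then report all errors" behaviour can be sketched as follows. This is an illustration of the pattern, not the installer's code: `validate_dirs` is a hypothetical function, and local directories stand in for the remote hosts that the real installer checks.

```shell
#!/bin/sh
# validate_dirs DIR...: check every directory before doing anything else,
# report all problems, and return 1 if at least one check failed.
validate_dirs() {
    errors=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "ERROR >> $dir does not exist"
            errors=$((errors + 1))
        elif [ ! -w "$dir" ]; then
            echo "ERROR >> $dir is not writeable"
            errors=$((errors + 1))
        fi
    done
    [ "$errors" -eq 0 ]
}

base=$(mktemp -d)
mkdir "$base/host1" "$base/host2"
# host3 and host4 were never created, so both problems are reported at once
if validate_dirs "$base/host1" "$base/host2" "$base/host3" "$base/host4"; then
    echo "all checks passed, installing"
else
    echo "nothing installed; fix the errors and run again"
fi
```

Collecting every failure before stopping means one run is enough to see all the problems, and no node is left half-installed because a later node failed its checks.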
133. What does the installation do
2: install Tungsten in all servers
host4, host1, host2, host3:
$HOME/
  tinstall/
    config/
    releases/
    relay/
    logs/
    tungsten/
134. example
# remove write permission on r2 to provoke a validation error
ssh r2 chmod 444 $HOME/tinstall
./tools/tungsten-installer \
    --master-slave --master-host=r1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --service-name=dragon \
    --home-directory=$HOME/tinstall \
    --thl-directory=$HOME/tinstall/logs \
    --relay-directory=$HOME/tinstall/relay \
    --cluster-hosts=r1,r2,r3,r4 --start
ERROR >> qa.r2.continuent.com >> /home/tungsten/tinstall is not writeable
135. example
# restore write permission on r2
ssh r2 chmod 755 $HOME/tinstall
./tools/tungsten-installer \
    --master-slave --master-host=r1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --service-name=dragon \
    --home-directory=$HOME/tinstall \
    --thl-directory=$HOME/tinstall/logs \
    --relay-directory=$HOME/tinstall/relay \
    --cluster-hosts=r1,r2,r3,r4 --start
# no errors