Pluk2013 bodybuilding ratheesh

Bodybuilding.com
•  Information
•  Motivation
•  Supplementation

S

About Me..

•  Ratheesh Kaniyala
•  MySQL Database Admin @ Bodybuilding.com
•  Responsible for keeping the site up and running
•  Work with system engineers, developers, product owners to

get the best out of the database system

Agenda

S  About Bodybuilding.com
S  Mysql Architecture @ Bodybuilding.com
S  Backups
S  Monitoring
S  Some Benchmarks on Fusionio
S  NoSQL @ BBcom

Bodybuilding.com
S  What is Bodybuilding.com?

We:
• 

Provide Information + Motivation + Supplementation for leading
a healthy lifestyle

• 

Are not just for bodybuilders

• 

Have hundreds of vendors

• 

Have thousands of content (articles, exercises, videos)

• 

Have millions of users – still growing

Statistics
S  Ranked #319 US website and #709 worldwide
S  15th largest forum with over 100 million posts
S  About 65TB served from Akamai and 57 TB from

cloudfront
S  About 25 Million Unique visitors per month
S  About 5000 Bodyspace registrations per day

APPLICATIONS

S  Our main applications:
S  Busy Forum – Powered by vBulletin, Mysql
S  Bodyspace – Php, Java, MySQL, MongoDB
S  Store – ATG (Art Technology Group), Oracle 11G
S  Content – Php, MySQL

Why Percona?
S  This is what Percona says:
• 
• 
• 
• 
• 
• 

Your queries will run faster and more consistently.
You will consolidate servers on powerful hardware.
You will delay sharding, or avoid it entirely.
You will spend less time tuning and administering.
You will achieve higher uptime.
You will troubleshoot without guesswork.

S  And we found this to be true for every point.

High Level Tech Diagram
SECURITY LAYER

WEB SERVERS
CACHING LAYER

MYSQL
RDBMS

NOSQL
DATA LAYER

SPHINX
/SOLR

DATA LAYER

Cluster A

Cluster B

Cluster C

Cluster D

A Typical MYSQL cluster
Reads

Writes & some reads

Search Indexes
Query replay for
lru_dump

Ÿ Ÿ Ÿ
Backups

Offsite
Backup

Analytics

HA

•  Multi-master asynchronous replication
•  Active – Passive mode
•  What happens when any slave dies?
•  What happens when passive master dies?
•  What happens when Active master dies?

Active Master Died - Case1

•  M2 and all slaves
will end up being
at the same
position
•  Point all slaves to
the M2 by change
master
Read only slaves

•  Resume traffic

Active Master Died – Case 2

•  M2 and all slaves
are in different
states
•  Promote the most
up-to-date slave as
master
•  Fix the slave pool
by looking into
binary logs
Read only slaves

Maintenance on Active master

•  Move all slaves
to M2
•  Change DNS to
make M2 active

Read only slaves

Maintenance on active contd…

•  Move all slaves
to M2
•  Change DNS to
make M2 active

Read only slaves

Maintenance on active contd…

•  It is not that easy to move N slaves to M2 manually because

the binlog file and position on M2 does not match that on
M1
•  Can we automate this slave migration?
• 

Yes

•  How?
• 

There is an algorithm that I can share

Slave migration algorithm
S 

step 1: For each slave node do STOP SLAVE

S 

step 2: Sleeps for 10 sec so that the secondary master is slightly ahead than the stopped
slaves

S 

step 3: STOP SLAVE on the secondary master and record the SHOW SLAVE STATUS
result on it.

S 

step 4: For each slave node do START SLAVE UNTIL the recorded position in step 3

S 

step 5: For each slave node do STOP SLAVE (required for changing master later)
S 
S 

a) check the slave status and stop slave only when the current position = recorded position in step
3
b) If the slave is behind by too much and does not catch up to the UNTIL position in 20 secs then
skip the node and make a note about it in the console and the log file. This node has to be taken
care of manually later. Look for directions in the log file on how to change master on this node.

Slave migration algorithm
contd..
S  step 6: do SHOW MASTER STATUS on secondary master and record

the position
S  step 7: For each slave node do a CHANGE MASTER TO secondary

master with the position recorded in step 6
S  Step 8: do START SLAVE on secondary master
S  step 9: For each slave node do START SLAVE

Backups
• 

Nightly logical backups (mysqldump)

• 

Every 2 hours - lvm snapshot backups of mysql datadir
• 
• 

• 

Stop slave
Take snapshot

All backups are done on a dedicated backup slave (B1)
• 

Copy to NAS

• 

Be ready for next snapshot

• 

mylvmbackup script provides various hook points to call a custom script

• 

Note: If you are not validating your backups then you don’t care about your
data?

Snapshot Process
Preconnect

Premount

Backupsuccess
Update
Monitoring

Backupsuccess

• 

Start mysql on the snapshot data – This does mysql recovery and keep
the snapshot ready-to-use

• 

Validate backup by running custom queries, counts, statistics etc so that
you are sure that it can be used for server rebuilds and recovery

Monitoring

• 

We use zabbix for monitoring – alerting + trending

• 

Close to 100,000 checks

• 

Mysql Performance Monitor plugin for Innodb

• 

Custom scripts can use zabbix trapper – e.g. data size trend

• 

Dbview (php based tool) for realtime processlist and slow query
history

• 

Red Light is ON – Means trouble

Benchmarking

S  To understand the capacity of the hardware and software
S  To understand what happens when you have a high traffic

day
S  To understand the impact of configuration changes
S  To test the storage unit

Benchmarking with linkbench
• 

Linkbench from Facebook simulates a socio graphic workload

• 

The default workload pattern kind of matches our workload

• 

https://github.com/facebook/linkbench

• 

https://www.facebook.com/notes/facebook-engineering/linkbench-a-databasebenchmark-for-the-social-graph/10151391496443920

• 

Read-Write operation mix, Throttle requests

• 

Target hot nodes

• 

Extendable and you can write plugins – not tried!

• 

Note: Linkbench partitions the linktable. If you want to benchmark without
partition, create the table without partition and then load data.

Linkbench results

• 

Ops/Sec per thread
•  [Thread-122]: 2190000/100000000 requests finished: 2.2%
complete at 28311.8 ops/sec 77.4/100000

• 

Ops/Sec Total
•  [main]: REQUEST PHASE COMPLETED. 100000000
requests done in 2641 seconds. Requests/second = 37852

Benchmark Parameters

enchmark Cases 40 GB
Database Size – 175GB, 400GB,
• 
• 

RAM – 64 GB

• 

Innodb_buffer_pool_size – 50GB for all tests

• 

Storage – Fusion iodrive2

• 

File System – XFS, EXT4

• 

Database Server Specs – Dell R620, 2 Chip-8 core-HT => 32 procs,
Intel Xeon E5 2680 processor, debian Linux Squeeze, 2.6.32 Kernel

Innodb settings for fusionio

•  Innodb_read_ahead=0
•  Innodb_read_io_threads=8
•  Innodb_adaptive_checkpoint=keep_average
•  Innodb_flush_method=O_DIRECT
•  Innodb_io_capacity=10000
•  Innodb_flush_neighbor_pages=0

Benchmark results

Ops per sec

Linkbench 175G database on fusion iodrive2
40000
35000
30000
25000
20000
15000
10000
5000
0

•  200 threads
•  RW mix (70:30)

38000
33000

30000

25000

Percona 5.1
Percona 5.5

XFS

EXT4

Bigger Database

•  200 threads
•  RW mix (70:30)

Percona 5.5.33 - 400G db
13000

14000

Ops Per Sec

12000
10000

10000

8000
6000
4000
2000
0
XFS

EXT4

Fusion Fullness
•  175G db + 400G junk
•  200 threads
•  RW Mix (70:30)

Ops Per Sec

Percona 5.5.33 – 175G DB
40000
35000
30000
25000
20000
15000
10000
5000
0

38000
33000

33000
More Full

21000

Less Full

XFS

EXT4

Fragmentation Effect
•  200 threads
•  RW mix (70:30)

40000

36000

35000
Ops Per Sec

30000

27000

25000
20000
15000
10000
5000
0
XFS

EXT4

Doublewrite Effect
46000

45000

Ops Per Sec

44000
42000

Doublewrite ON

40000
38000
36000
34000

•  No atomicity without
double write
•  Percona supports
atomic writes using
DirectFS NVM
Filesystem
•  RW Mix (70:30)
•  200 threads

38000

READ Workload
•  90% Reads
•  10% Writes
•  200 threads


Ops Per Sec

60000
59500

59500

59000

59000

58500
XFS

EXT4

WRITE Workload
•  90% writes
•  10% reads
•  200 threads

16000

14000

14000
Ops Per Sec

12000
10000
8000
6000

6000

4000
2000
0
XFS

EXT4

More threads

Ops Per Sec

40000
35000
30000
25000
20000
15000
10000
5000
0

35000

18000

XFS

EXT4

•  400 Threads
•  RW Mix (70:30)

Bufferpool Instances
40000

Ops Per Sec

39500
39000
38500
38000
37500
37000
1

2
Bufferpool Instances

8

•  200 threads
•  70:30 RW mix
•  NUMA has a
role?

Low concurrency

S  Is important because the workload may not always be high

concurrency
S  The system resources are best utilized under high

concurrency but there could be a lot of situations where you
could just have 8 threads running or 4 threads running.
S  How does the database behave in these situation?
S  Could there be performance regression?

Low concurrency contd..

12000

11000

10500

10000
8000

7500

7300

6000

4 threads

4000

8 threads

2000
0
Percona 5.1

Percona 5.5

Ops Per Sec

ODIRECT Vs Buffered

40000
35000
30000
25000
20000
15000
10000
5000
0

38000
33000

30000
Buffered

16500

XFS

O_DIRECT

EXT4

NOSQL
• 

User feeds, Nested comments, Personalization

• 

User feeds can be nasty

• 

Some users have thousands of friends

• 

Millions of feeds

• 

Feeds can have events that cannot share a schema

• 

E.g. Somebody updated weight != somebody updated bodyfat %

• 

Mongodb a savior plus a killer

• 

Savior because of its schemaless nature

• 

Killer because of its locking issues, index immaturity

• 

Sharding may solve the problem but not a shardable problem.

User Feeds
Feeds Collection

Events Collection

_id:,
Text:,
Date:,
Event_ids:

_id:,
Weight:,
Id1,
Id2,
.
.

_id:,
BodyFat:,

Pluk2013 bodybuilding ratheesh

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (17)

Semelhante a Pluk2013 bodybuilding ratheesh

Semelhante a Pluk2013 bodybuilding ratheesh (20)

Último

Último (20)

Pluk2013 bodybuilding ratheesh