2. 2
Overview
1. Who are we?
2. What monitoring tools do we use?
3. What are StatsD, Collectd and Graphite?
4. How MySQL logs to StatsD
5. Graphing examples
6. Challenges
7. Questions?
4. 4
Facts
• Company founded in 2001
• 350+ employees worldwide
• 180M+ unique visitors per month
• Over 50M registered users
• 45 portals in 19 languages
• Casual games
• Social games
• Real-time multiplayer games
• Mobile games
• 35+ MySQL clusters
• 60k queries per second (3.5 billion qpd)
5. 5
Geographic Reach
180 Million Monthly Active Users(*)
(*) Source: Google Analytics, August 2012
6. 6
Brands
Girls, Teens and Family
spielen.com
juegos.com
gamesgames.com
games.co.uk
8. 8
Existing monitoring systems we use(d)
• Opsview/Nagios (mainly availability)
• Cacti (using Baron Schwartz/Percona templates)
• MONyog
• Good ol’ RRD
9. 9
Opsview/Nagios
• Strong points:
  • Easy to create (Nagios) plugins
  • Slaves for scaling out
• Weak points:
  • Stats gathering through polling
  • Low granularity (1 to 5 minutes)
  • Difficult URIs for graphs
10. 10
Cacti
• Strong points:
  • Awesome Percona templates
  • Great overviews and graphs
• Weak points:
  • Hard to add new metrics (to 90+ servers)
  • Not scalable
  • Low granularity (1 to 5 minutes)
  • Hard to correlate
11. 11
MONyog
• Strong points:
  • Easy to set up
  • Compare any server with another
  • Compare configurations
• Weak points:
  • “Closed source”
  • Not scalable
  • Jack of all trades
12. 12
Poll limitations
• Limited to a set interval
• Data gets averaged out
• (Host) checks are run serially
• Slowdowns in a run mean no/less data
• Scaling: add more masters/slaves
• Setting up an SSH connection is slow
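To make the "data gets averaged out" point concrete, here is a minimal sketch with made-up numbers. The sample values and the 300-second poll interval are assumptions for illustration, not measurements from our systems.

```python
# Hypothetical per-second query rates during one 5-minute poll interval:
# a steady 100 queries/s with a 10-second burst of 5000 queries/s.
samples = [100] * 300
samples[120:130] = [5000] * 10

# A poller that reads a counter once per interval can only report the average.
average = sum(samples) / len(samples)
# Per-second collection would have shown the burst itself.
peak = max(samples)

print(f"5-minute average: {average:.1f} queries/s")
print(f"actual peak:      {peak} queries/s")
```

The burst that would hurt a server barely moves the polled value, which is why a finer granularity matters.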
13. 13
Difficult to add a new metric
host065
bash-3.2# netstat -s | grep "listen queue"
26 times the listen queue of a socket overflowed

host066
bash-3.2# netstat -s | grep "listen queue"
33 times the listen queue of a socket overflowed
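With a push-based setup, a value like this could be turned into a metric with very little code. The sketch below is only an illustration: the metric name is hypothetical, and it parses the exact `netstat -s` wording shown above, which may differ between operating systems.

```python
import re

def listen_queue_overflows(netstat_output: str) -> int:
    """Return the listen queue overflow count from `netstat -s` text, or 0 if absent."""
    match = re.search(r"(\d+) times the listen queue of a socket overflowed", netstat_output)
    return int(match.group(1)) if match else 0

def statsd_gauge(name: str, value: int) -> str:
    """Format a gauge in the StatsD line format: <name>:<value>|g."""
    return f"{name}:{value}|g"

# Sample line copied from the host065 output above.
sample = "26 times the listen queue of a socket overflowed\n"
# Hypothetical metric name, chosen to match the prod.syseng.* naming used in this deck.
packet = statsd_gauge("prod.syseng.network.host065.listen_queue_overflows",
                      listen_queue_overflows(sample))
print(packet)
```

The value is cumulative since boot, so it is sent as a gauge here; the rate of change can then be derived at graphing time.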
16. 16
What is Graphite?
• Highly scalable real-time graphing system
• Collects numeric time-series
• Backend daemon Carbon
  • Carbon-cache: receives data
  • Carbon-aggregator: aggregates data
  • Carbon-relay: replication and sharding
• RRD or Whisper database
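As an illustration of how carbon-cache receives data, here is a minimal sketch of Carbon's plaintext protocol: one line per data point in the form `<metric path> <value> <timestamp>`, sent over TCP. The host name, the metric path and the default port 2003 are assumptions about a typical setup, not our configuration.

```python
import socket

def carbon_line(path, value, timestamp):
    """Format one data point for Carbon's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"

def send_to_carbon(lines, host="localhost", port=2003):
    """Send already formatted lines to a carbon-cache over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))

# Hypothetical metric path and value with a fixed timestamp.
line = carbon_line("prod.syseng.mysql.host065.status.com_select", 1234, 1357000000)
print(line, end="")
# send_to_carbon([line])  # uncomment when a carbon-cache is listening
```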
17. 17
Graphite’s capabilities
• Each metric is in its own bucket
• Periods make folders
  • prod.syseng.mmm.<hostname>.admin_offline
• Metric types
  • Counters
  • Gauge
• Retention can be set using a regex
  • [mysql]
  • pattern = ^prod.syseng.mysql..*$
  • retentions = 2s:1d,1m:3d,5m:7d,1h:5y
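To show what the retention string above means in practice, the sketch below converts each `precision:duration` pair into a step size and a number of stored points. The helper names are made up for this example, and a year is counted as 365 days.

```python
# Seconds per unit suffix used in retention definitions.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

def to_seconds(spec):
    """Convert a value such as '5m' or '1d' to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]

def parse_retentions(retentions):
    """Return (step in seconds, number of points) for each archive."""
    archives = []
    for archive in retentions.split(","):
        precision, duration = archive.split(":")
        step = to_seconds(precision)
        archives.append((step, to_seconds(duration) // step))
    return archives

archives = parse_retentions("2s:1d,1m:3d,5m:7d,1h:5y")
for step, points in archives:
    print(f"one point every {step}s, {points} points kept")
print("total points per metric:", sum(points for _, points in archives))
```

Every metric gets all of these archives, so a fine-grained first archive multiplied by many metrics adds up quickly on disk.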
18. 18
What is Collectd?
• Unix daemon that gathers system statistics
• Over 90 (input/output) plugins
• Plugin to send metrics to Graphite/Carbon
• Very useful for system metrics
19. 19
What is StatsD?
• Front-end proxy for Graphite/Carbon (by Etsy)
• NodeJS daemon (also other languages)
• Receives UDP (on localhost)
• Buffers metrics locally
• Periodically flushes data to Graphite/Carbon (TCP)
• Client libraries available in about any language
• Send any metric you like!
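Sending a metric does not even need a client library. The sketch below formats a metric in the StatsD line format (`<name>:<value>|<type>`) and sends it as a single UDP datagram. The metric names are hypothetical, and port 8125 on localhost is assumed as the usual StatsD default.

```python
import socket

def statsd_packet(name, value, metric_type):
    """Format one metric in the StatsD line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}".encode("ascii")

def send_metric(name, value, metric_type="c", host="127.0.0.1", port=8125):
    """Fire-and-forget: UDP does not wait for the daemon to answer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(statsd_packet(name, value, metric_type), (host, port))
    finally:
        sock.close()

send_metric("prod.syseng.example.logins", 1)             # counter: add 1
send_metric("prod.syseng.example.queue_depth", 42, "g")  # gauge: set to 42
```

Because the datagram is sent without waiting for a reply, the application is not slowed down when the StatsD daemon is busy or not running.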
28. 28
Why use StatsD over Collectd?
• MySQL plugin for Collectd
  • Sends SHOW STATUS
  • No INNODB STATUS
  • Plugin not flexible
• DBI plugin for Collectd
  • Metrics based on columns
• Different granularity needed
• Separate daemon (with persistent connection)
• StatsD is easy as ABC
29. 29
MySQL StatsD daemon
• Written in Python
• Gathers data every 0.5 seconds
• Sends to StatsD (localhost) after every run
• Easy to set up: no configuration
• Persistent connection
• Baron Schwartz’ InnoDB status parser (Cacti poller)
• Other interesting metrics and counters
  • Information Schema
  • MySQL 5.5/5.6 Performance Schema
  • MariaDB specific
  • Galera specific
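The daemon itself is not shown in this deck, so the following is only a sketch of the general idea, not its actual code: take two consecutive SHOW STATUS snapshots, send the increase of each counter, and send selected values as gauges. The function name, the metric prefix and the list of gauge variables are assumptions.

```python
def to_statsd_lines(prefix, previous, current,
                    gauges=("Threads_running", "Threads_connected")):
    """Turn two status snapshots into StatsD lines (counters as deltas)."""
    lines = []
    for name, value in sorted(current.items()):
        key = f"{prefix}.status.{name.lower()}"
        if name in gauges:
            # Gauges carry the current value as is.
            lines.append(f"{key}:{value}|g")
        elif name in previous:
            # Counters only carry what changed since the previous run.
            delta = value - previous[name]
            if delta > 0:
                lines.append(f"{key}:{delta}|c")
    return lines

# Hypothetical snapshots taken 0.5 seconds apart.
previous = {"Com_select": 1000, "Com_insert": 200, "Threads_running": 3}
current = {"Com_select": 1060, "Com_insert": 200, "Threads_running": 5}
lines = to_statsd_lines("prod.syseng.mysql.host065", previous, current)
for line in lines:
    print(line)
```

Skipping unchanged counters keeps the number of packets per run low, which helps when a run happens every 0.5 seconds.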
30. 30
MySQL StatsD overview
[Diagram: MySQLCollector runs SHOW STATUS, SHOW INNODB STATUS and SHOW VARIABLES over a persistent connection; results are flushed to StatsD every 0.5 seconds]
31. 31
MySQL Multi Master patch
• Perl (Net::Statsd)
• Sends any status change to StatsD (localhost)
• Non-blocking (thanks to UDP)
• Draw as infinite in Graphite
35. 35
What is important for you?
• Identify your KPIs
• Don’t graph everything
  • More graphs == less overview
• Combine metrics
• Stack clusters
36. 36
Correlate!
• Include other metrics into your graphs
  • Deployments
  • Failover(s)
• Combine application metrics with your database
• Other influences
  • Solar flares
  • Start of the new Maya calendar
37. 37
Graphite Graphing Engine
• URI-based rendering API
• Support for wildcards
  • stats.prod.syseng.mysql.*.status.com_select
  • sumSeries(stats.prod.syseng.mysql.*.status.com_select)
  • aliasByNode(stats.prod.syseng.mysql.*.status.com_select, 4)
• Many functions
  • Nth percentile
  • Holt-Winters forecast
  • Timeshift
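Since every graph is just a URI, graphs can be generated from scripts. The sketch below builds a render URL for the `sumSeries` target shown above. The base URL is a placeholder, and the one-hour window and PNG format are arbitrary choices for the example.

```python
from urllib.parse import urlencode

def render_url(base_url, target, time_from="-1h", fmt="png"):
    """Build a Graphite render API URL for a single target."""
    query = urlencode({"target": target, "from": time_from, "format": fmt})
    return f"{base_url}/render?{query}"

# Placeholder host name; the target is the wildcard example from this slide.
url = render_url("http://graphite.example.com",
                 "sumSeries(stats.prod.syseng.mysql.*.status.com_select)")
print(url)
```

Swapping `sumSeries(...)` for `aliasByNode(..., 4)` in the target would draw one line per host instead of a single summed line.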
46. 46
What challenges do we have?
• MySQL_statsd rewrite necessary (not open source yet)
• No alerting through Graphite (yet)
• Machine learning
• Eternal hunger for more metrics
• Abuse of the system
47. 47
What lessons have we learned?
• Persistent connections + repeatable read
  • History list skyrocketed
• Too many metrics slow down graphing
• Too many metrics can kill a host
• EstatsD for Erlang
49. 49
Practical links
• Graphite: http://graphite.readthedocs.org/en/latest/
• Collectd: https://collectd.org/
• StatsD on GitHub by Etsy: https://github.com/etsy/statsd/wiki
• Etsy on StatsD: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
50. 50
Thank you!
• Presentation can be found at: http://spil.com/perconasc2013
• If you wish to contact me: art@spilgames.com
• Don’t forget to rate my talk!