Zurg part 1


chenshuo.com

ZURG PART 1 OF N
2012/04 Shuo Chen

What is it?

 An example of muduo protorpc
 A toy C++ project that can be useful
 https://github.com/chenshuo/muduo-protorpc

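 The several levels of deployment, monitoring, and process management in distributed systems (in Chinese)
 http://www.cnblogs.com/Solstice/archive/2011/05/09/2041306.html

 When multithreaded servers are the right fit (in Chinese)
 http://blog.csdn.net/Solstice/article/details/5334243

 An engineering approach to developing distributed systems (in Chinese)
 http://blog.csdn.net/solstice/article/details/5950190 (slides)
 http://techparty.org/2010/10/19/2010q4summary/ (video)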


Overview

 Master-Slave structure
 Communicates with bi-directional RPC
 Command line tool to change and view status

 A web frontend may follow, if I have time to learn web development

 Central configuration of service placements
 Zurg slave is memoryless; it doesn’t store anything
 This differs from supervisord

 Also serves as a name server
 Master looks like a SPOF, but that can be overcome

Why not just run services as daemons?
 It’s fine to do so on 5 hosts, how about 50? 500?
 Not easy to upgrade apps
 Usually you need to ssh to every host and restart apps
 Not transparent
 Is every application running well?
 A monitoring system has to be deployed anyway
 And notification of app crashes is not real-time
 Auto-restarting daemons could hide the real problem and confuse the monitoring system

Zurg slave – functionalities

 Process management
 Run a command (short-lived child process)
 Start/stop a service (long-lived child process)
 Not standard services, but programs written by yourself

 Detect child death in real time and report to master
 Not polling with pids or process names

 Collecting performance metrics
 Monitor system health
 Both regular heartbeats and event notifications to Master

Zurg slave – design decisions

 All-in-one single-threaded process
 Don’t keep running iostat/vmstat/top/netstat/XXXstat
 Replaces(?) nagios/monit/ganglia/munin/supervisord
 No plugins, just compile what you need into one binary

 C++ for efficiency and lower resource usage
 It runs on every host, so every little bit helps
 Monitoring tools* often use too many resources

 No local configuration, easy to deploy & upgrade
 Just point it to the master
 Start it in init.d, it will take over everything else

Zurg slave – NOT in scope

 Configuration management
 System administration
 Use Puppet instead
 Deployment of in-house software
 Although it can be done with ‘wget’ followed by ‘tar xf’


Run a command

 Start a child process
 Wait until it finishes (asynchronously, of course)
 Capture stdout/stderr
 No other open files in the parent should leak to the child; set FD_CLOEXEC on every fd

 Sounds like reinventing the Python subprocess module?
 Not exactly!
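
A minimal sketch of the FD_CLOEXEC step, assuming Linux and glibc. The helper names `setCloseOnExec` and `makeCloexecPipe` are made up for illustration and are not taken from the zurg source. `pipe2(2)` is Linux-specific; it sets the flag atomically at creation time, which avoids a window between `pipe(2)` and `fcntl(2)`.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Marks an existing fd close-on-exec so that it is not inherited by a
// program started with exec*(2). Returns true on success.
bool setCloseOnExec(int fd) {
  int flags = ::fcntl(fd, F_GETFD);
  if (flags < 0) return false;  // e.g. EBADF for an invalid fd
  return ::fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0;
}

// Creates a pipe with both ends already marked close-on-exec.
bool makeCloexecPipe(int fds[2]) {
  assert(fds != nullptr);
  return ::pipe2(fds, O_CLOEXEC) == 0;
}
```

The fds that the child is supposed to keep (its new stdout and stderr) are installed with dup2(2), which clears the close-on-exec flag on the duplicate.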


The easy part of process mgmt

 Start a new process
 fork(2)/exec*(2)

 How to get errno if exec() fails? It’s in the child process
 “The self-pipe trick” http://cr.yp.to/docs/selfpipe.html

 Get notification when a child terminates
 SIGCHLD, either signalfd(2) or legacy signal handler
 Signals are not reliable, so call wait(2) periodically (non-blocking)

 Get exit status of a terminated child process
 wait4(2) tells everything incl. memory/CPU usage
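
The steps on this slide can be sketched roughly as follows, assuming Linux. This is an illustration, not the zurg implementation: `spawn` and `reap` are hypothetical names, and both calls block, whereas an event-driven slave would react to readiness notifications instead. The child reports a failed exec through a close-on-exec pipe: if exec succeeds the kernel closes the write end and the parent reads EOF, otherwise the child writes its errno.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks and execs argv[0]. On success returns the child pid and sets
// *execErrno to 0. On failure returns -1 with the errno in *execErrno.
pid_t spawn(char* const argv[], int* execErrno) {
  assert(argv != nullptr && argv[0] != nullptr && execErrno != nullptr);
  *execErrno = 0;
  int fds[2];
  if (::pipe2(fds, O_CLOEXEC) < 0) { *execErrno = errno; return -1; }
  pid_t pid = ::fork();
  if (pid < 0) {
    *execErrno = errno;
    ::close(fds[0]);
    ::close(fds[1]);
    return -1;
  }
  if (pid == 0) {                      // child
    ::close(fds[0]);
    ::execvp(argv[0], argv);
    int err = errno;                   // reached only if exec failed
    ssize_t ignored = ::write(fds[1], &err, sizeof err);
    (void)ignored;
    ::_exit(127);
  }
  ::close(fds[1]);                     // parent keeps only the read end
  int err = 0;
  ssize_t n;
  do { n = ::read(fds[0], &err, sizeof err); } while (n < 0 && errno == EINTR);
  ::close(fds[0]);
  if (n == static_cast<ssize_t>(sizeof err)) {  // exec failed in the child
    ::waitpid(pid, nullptr, 0);        // reap it so no zombie is left
    *execErrno = err;
    return -1;
  }
  return pid;                          // EOF: exec succeeded
}

// Blocks until pid terminates; fills in the wait status and resource usage.
bool reap(pid_t pid, int* status, struct rusage* usage) {
  pid_t r;
  do { r = ::wait4(pid, status, 0, usage); } while (r < 0 && errno == EINTR);
  return r == pid;
}
```

After `reap`, `usage->ru_utime`, `ru_stime`, and `ru_maxrss` carry the CPU time and peak memory figures mentioned above.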

A simple challenge

 Limit the runtime of a command, not CPU time
 Typical timeout of 60 seconds
 Remember the pid when the command starts

 Set up a timer; kill(2) the process when it times out

 How do you know that the process you are going to kill is the one that you created for the cmd?
 Set a timer to kill pid 9527, 60 seconds later
 What if process 9527 dies just before the timer event,
 and a new process is created with the same pid (?!)


Pid is unique, but not always

 Pid wraps (in minutes or seconds)
 Pids are unique within a snapshot of all processes
 But they are not unique as time moves on

 The range of possible pids is small (1~32767)
 /proc/sys/kernel/pid_max default 32768
 /proc/loadavg lastpid 3387
 /proc/stat processes 423666
 There is a tiny time window between timer wakeup and kill(2)ing, anything could happen in between
 And there is no mutex or lock for this race condition

How to kill a child properly?

 So it is not safe to kill by pid: you may kill someone else’s child process by mistake
 How about checking ppid first?
 You may kill your own new child, if another RunCommand reuses the pid just before the timer fires.
 The pid + start_time combination is unique in space and time
 Start time is in /proc/pid/stat, in jiffies since boot
 Remember the start time after fork()ing a child*

 Check start time before killing the child
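
A rough sketch of the check-before-kill idea, assuming Linux procfs, where the start time is field 22 of /proc/pid/stat. The names `readStartTime` and `killIfSame` are hypothetical and not taken from the zurg source. Field 2 (the command name) may itself contain spaces or parentheses, so the parser skips to the last `)` before counting fields.

```cpp
#include <cassert>
#include <csignal>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Returns the start time (field 22 of /proc/<pid>/stat), or -1 if it
// cannot be read, for example because the process no longer exists.
long long readStartTime(pid_t pid) {
  std::ifstream in("/proc/" + std::to_string(pid) + "/stat");
  std::string line;
  if (!std::getline(in, line)) return -1;
  std::string::size_type pos = line.rfind(')');  // end of field 2 (comm)
  if (pos == std::string::npos) return -1;
  std::istringstream rest(line.substr(pos + 1));
  std::string token;
  for (int field = 3; field <= 22; ++field) {    // leaves field 22 in token
    if (!(rest >> token)) return -1;
  }
  try {
    return std::stoll(token);
  } catch (...) {
    return -1;
  }
}

// Sends sig only if pid still refers to the process that was started at
// startTime. Returns true if the signal was sent.
bool killIfSame(pid_t pid, long long startTime, int sig) {
  assert(pid > 0);
  if (startTime < 0 || readStartTime(pid) != startTime) return false;
  return ::kill(pid, sig) == 0;
}
```

A caller would store `readStartTime(pid)` right after fork() and pass it back from the timer callback, e.g. `killIfSame(pid, savedStartTime, SIGKILL)`.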

Why is it safe?

 If two processes start at almost the same time, their pids must be different
 If two processes happen to have the same pid, their start times must be different
 It takes seconds for pids to wrap, and start time is monotonic
 Since zurg slave is single-threaded, there is no race condition between checking and killing
 Don’t run zurg slave as root (it quits if euid == 0)
 Don’t run two zurg slaves with same uid on a box


Capture stdout & stderr, simple?

 Two pipes are needed: dup2() the write fds to 1 and 2 in the child, read the other ends of the two pipes in the parent.
 Keep the data in memory and send it back when the command finishes
 Command ‘cat /dev/zero’ will blow up zurg slave
 We must limit the size of stdout and stderr
 The default size is 1024KiB
 Two approaches, when size breaches limit:
 Stop reading, i.e. block writing, wait until timeout
 Close the read side of pipe, i.e. kill child with SIGPIPE
 Directly sending a SIGPIPE signal doesn’t work
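
The second approach (close the read side once the limit is hit) might look like the sketch below. It is a blocking simplification for illustration only; the function name `readCapped` is hypothetical, and the real slave is event-driven rather than looping on read(2).

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Reads from fd until EOF or until `limit` bytes have been captured, then
// closes fd. If the writer is still producing data, its next write(2)
// fails with EPIPE, or kills it with SIGPIPE under the default disposition.
// Returns true if EOF was seen before the limit was reached.
bool readCapped(int fd, size_t limit, std::string* out) {
  assert(out != nullptr);
  char buf[4096];
  bool eof = false;
  while (out->size() < limit) {
    size_t want = limit - out->size();
    if (want > sizeof buf) want = sizeof buf;
    ssize_t n = ::read(fd, buf, want);
    if (n < 0 && errno == EINTR) continue;  // interrupted, try again
    if (n <= 0) {
      eof = (n == 0);
      break;
    }
    out->append(buf, static_cast<size_t>(n));
  }
  ::close(fd);
  return eof;
}
```

With this shape, a command such as ‘cat /dev/zero’ costs the parent at most `limit` bytes of memory per stream.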

Race condition at process exits

 When a child exits, all its open fds will be closed
 The parent’s read(2) then returns 0; it should close the fd, otherwise POLLHUP will cause a busy loop
 A child could close them purposefully before dying

 The events of process exit and std{out,err} fds being closed can arrive in no particular order
 Is there any in-flight data that has not been received?
 The lifetime management of Process/Pipe objects is also subtle, as fds are reused so aggressively
 Read the code to find out how to do it correctly
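
One possible way to cope with the unordered events, shown only as an illustration and not as what the zurg code does: hold back the response until all three events (exit, stdout closed, stderr closed) have been seen. The class name `CompletionTracker` is hypothetical.

```cpp
#include <cassert>

// Tracks the three events that may arrive in any order for one command.
// Each on*() call returns true exactly once: on the call that completes
// the set, which is the point where the response can be sent safely.
class CompletionTracker {
 public:
  bool onExit(int waitStatus) {
    assert(!exited_);  // a child is reaped only once
    exited_ = true;
    status_ = waitStatus;
    return checkDone();
  }
  bool onStdoutClosed() { stdoutClosed_ = true; return checkDone(); }
  bool onStderrClosed() { stderrClosed_ = true; return checkDone(); }
  int status() const { return status_; }

 private:
  bool checkDone() {
    if (reported_ || !exited_ || !stdoutClosed_ || !stderrClosed_) return false;
    reported_ = true;
    return true;
  }

  bool exited_ = false;
  bool stdoutClosed_ = false;
  bool stderrClosed_ = false;
  bool reported_ = false;
  int status_ = 0;
};
```

A caveat with this sketch: if the command hands its stdout to a grandchild that outlives it, the pipe never reaches EOF, so a timeout is still needed as a backstop.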

Run Command Request

message RunCommandRequest {
  required string command = 1;
  optional string cwd = 2 [default = "/tmp"];
  repeated string args = 3;
  repeated string envs = 4;
  optional bool envs_only = 5 [default = false];
  optional int32 max_stdout = 6 [default = 1048576];  // bytes, i.e. 1024 KiB
  optional int32 max_stderr = 7 [default = 1048576];  // bytes, i.e. 1024 KiB
  optional int32 timeout = 8 [default = 60];  // seconds
  optional int32 max_memory_mb = 9 [default = 32768];
}


Run Command Response

message RunCommandResponse {
  required int32 error_code = 1;
  optional int32 pid = 2;
  optional int32 status = 3;
  optional bytes std_output = 4;
  optional bytes std_error = 5;
  optional int64 start_time_us = 16;
  optional int64 finish_time_us = 17;
  optional float user_time = 18;
  optional float system_time = 19;
  optional int64 memory_maxrss_kb = 20;
  // optional int64 ctxsw = 21;
  optional int32 exit_status = 30 [default = 0];
  optional int32 signaled = 31 [default = 0];
  optional bool coredump = 32 [default = false];
}
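
As an illustration of where several response fields could come from, the sketch below decodes the status and rusage returned by wait4(2) into a plain struct. The struct `ExitInfo` and the function `decode` are hypothetical, and the mapping is an assumption: in particular, it assumes `signaled` carries the terminating signal number and that the time fields are in seconds, neither of which the message definition states.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/wait.h>

// Plain struct mirroring a few RunCommandResponse fields (illustrative only).
struct ExitInfo {
  int exit_status = 0;
  int signaled = 0;        // assumed: terminating signal number, 0 if none
  bool coredump = false;
  double user_time = 0.0;  // assumed: seconds
  double system_time = 0.0;
  long memory_maxrss_kb = 0;
};

// Decodes the status and rusage reported by wait4(2).
ExitInfo decode(int status, const struct rusage& ru) {
  ExitInfo info;
  if (WIFEXITED(status)) {
    info.exit_status = WEXITSTATUS(status);
  } else if (WIFSIGNALED(status)) {
    info.signaled = WTERMSIG(status);
#ifdef WCOREDUMP
    info.coredump = WCOREDUMP(status) != 0;
#endif
  }
  info.user_time = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
  info.system_time = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
  info.memory_maxrss_kb = ru.ru_maxrss;  // reported in kilobytes on Linux
  return info;
}
```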

Run Script

 RunCommand with script file content provided in the request
 A programmatic way to run slightly different scripts on many hosts


Application management

 Start/monitor/stop applications
 Applications, a.k.a. services: long-running processes
 Apps can be written in C++/Java/Python/etc.

 Shares most functionality with RunCommand
 stdout/stderr redirected to files, not captured
 No timeout
 Intrusive vs. non-intrusive
 Can zurg_slave manage any application?
 Should the managed application follow some rules?


How to detect app exiting

 Polling (pid and start time)
 Not real-time, always delayed by a poll interval
 How do you know that a given process is the application?

 SIGCHLD
 Not 100% reliable, so call wait(2) periodically
 Pipe: leave the write side in the child process and read in zurg_slave; when the app exits, read(2) returns 0
 Reliable and prompt
 The application must not close the fd* (intrusive!)
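
The pipe approach can be sketched as follows, assuming Linux. `forkWatched` and `waitForExit` are hypothetical names used for illustration. Note that the write end is deliberately not close-on-exec here, so it would survive an exec of the application; the kernel closes it when the last process holding it exits.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct WatchedChild {
  pid_t pid;
  int watchFd;  // read end of the pipe, kept by the parent
};

// Forks a child that keeps the write end of a pipe open for its lifetime.
// childMain stands in for the application; a real slave would exec here.
WatchedChild forkWatched(void (*childMain)()) {
  assert(childMain != nullptr);
  WatchedChild child = {-1, -1};
  int fds[2];
  if (::pipe(fds) < 0) return child;
  pid_t pid = ::fork();
  if (pid < 0) {
    ::close(fds[0]);
    ::close(fds[1]);
    return child;
  }
  if (pid == 0) {     // child: holds fds[1] until it exits
    ::close(fds[0]);
    childMain();
    ::_exit(0);
  }
  ::close(fds[1]);    // parent must not keep a copy of the write end
  child.pid = pid;
  child.watchFd = fds[0];
  return child;
}

// Waits up to timeoutMs for the write end to be closed everywhere.
// Returns true once read(2) reports EOF.
bool waitForExit(int watchFd, int timeoutMs) {
  struct pollfd pfd = {watchFd, POLLIN, 0};
  int r;
  do { r = ::poll(&pfd, 1, timeoutMs); } while (r < 0 && errno == EINTR);
  if (r <= 0) return false;  // timeout or error: no exit observed yet
  char byte;
  return ::read(watchFd, &byte, 1) == 0;
}
```

In an event loop, `watchFd` would simply be registered for readability alongside the other fds instead of being polled on its own.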


What if zurg_slave crashes?

 How to prevent starting duplicate services?
 SIGCHLD and pipe(2) are nonrenewable
 Sockets? App reconnects to localhost zurg slave
 i.e. heartbeat between app and zurg slave
 Even more intrusive, retry logic in all languages

 Other thoughts?
 Another layer of indirection?


To be continued

 Collecting health & performance data
 Periodic heartbeats to master
 Process status, performance metrics

 Zurg slave is 50% done as of end of April 2012


Zurg Master

 A multithreaded program
 Its status is all retrievable from outside
 Easy to build Web/GUI frontends

 Have not started coding yet.

