1. Zurg, Part 1 of N
Shuo Chen, 2012/04
chenshuo.com
2. What is it?
An example of muduo protorpc
A toy C++ project that can be useful
https://github.com/chenshuo/muduo-protorpc
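Levels of deployment, monitoring, and process management in distributed systems (in Chinese)
http://www.cnblogs.com/Solstice/archive/2011/05/09/2041306.html
When multithreaded servers are appropriate (in Chinese)
http://blog.csdn.net/Solstice/article/details/5334243
An engineering approach to developing distributed systems (in Chinese)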
http://blog.csdn.net/solstice/article/details/5950190 (slides)
http://techparty.org/2010/10/19/2010q4summary/ (video)
2012/04 chenshuo.com
3. Overview
Master-Slave structure
Communicates with bi-directional RPC
Command line tool to change and view status
A web frontend in the future, if I have time to learn web development
Central configuration of service placements
Zurg slave is memory-less; it doesn’t store anything
This differs from supervisord
Also serves as a name server
The master looks like a SPOF, but that can be overcome
4. Why not just run services as daemons?
It’s fine to do so on 5 hosts, but how about 50? 500?
Not easy to upgrade apps
You usually need to ssh to every host and restart apps
Not transparent
Is every application running well?
You have to deploy a monitoring system anyway
And notification of app crashes is not real-time
Auto-restarting daemons could hide the real
problem and confuse the monitoring system
5. Zurg slave – functionalities
Process management
Run a command (short-lived child process)
Start/stop a service (long-lived child process)
Not standard services, but programs written by yourself
Detect child death in real time and report to master
No polling by pid or process name
Collect performance metrics
Monitor system health
Both regular heartbeats and event notifications
to the master
6. Zurg slave – design decisions
All-in-one single-threaded process
Don’t keep running iostat/vmstat/top/netstat/XXXstat
Replaces(?) nagios/monit/ganglia/munin/supervisord
No plugins; just compile what you need into one binary
C++ for efficiency and lower resource usage
It runs on every host, so every little bit helps
Monitoring tools* often use too many resources
No local configuration, easy to deploy & upgrade
Just point it to the master
Start it from init.d and it will take over everything else
7. Zurg slave – NOT in scope
Configuration management
System administration
Use Puppet instead
Deployment of in-house software
Although it can be done with ‘wget’ followed by ‘tar xf’
8. Run a command
Start a child process
Wait until it finishes (asynchronously, of course)
Capture stdout/stderr
No other open files in the parent should leak
to the child; set FD_CLOEXEC on every fd
Sounds like reinventing Python’s subprocess module?
Not exactly!
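A minimal sketch of the FD_CLOEXEC point above, assuming a POSIX system. The helper names setCloseOnExec and isCloseOnExec are hypothetical and are not taken from zurg; only fcntl(2) with F_GETFD/F_SETFD is the real interface.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: mark one descriptor close-on-exec so that it is
// not inherited by a program started with exec*(). Returns true on success.
bool setCloseOnExec(int fd)
{
  assert(fd >= 0);
  int flags = ::fcntl(fd, F_GETFD);
  if (flags < 0)
    return false;
  return ::fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == 0;
}

// Hypothetical helper: report whether the close-on-exec flag is set.
bool isCloseOnExec(int fd)
{
  int flags = ::fcntl(fd, F_GETFD);
  return flags >= 0 && (flags & FD_CLOEXEC) != 0;
}
```

On Linux the flag can also be requested at creation time, for example with O_CLOEXEC in open(2) or pipe2(2), which avoids a window between creating the fd and flagging it.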
9. The easy part of process mgmt
Start a new process
fork(2)/exec*(2)
How do you get errno if exec() fails? It is set in the child process
“The self-pipe trick” http://cr.yp.to/docs/selfpipe.html
Get notification when a child terminates
SIGCHLD, either signalfd(2) or legacy signal handler
Signals are not reliable, so call wait(2) periodically (non-blocking)
Get exit status of a terminated child process
wait4(2) reports everything, including memory and CPU usage
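The exec-errno question above can be sketched as follows. This is an illustration under assumptions, not zurg's actual code: the names SpawnResult, spawn, and reap are hypothetical. The idea is a pipe whose ends are close-on-exec: a successful exec closes the write end and the parent reads EOF, while a failed exec lets the child write its errno before exiting.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type for this sketch.
struct SpawnResult
{
  pid_t pid;  // child pid, or -1 if pipe() or fork() failed
  int error;  // 0 on success, otherwise errno from pipe/fork/exec
};

SpawnResult spawn(const char* path, char* const argv[])
{
  assert(path != nullptr);
  SpawnResult result = { -1, 0 };
  int fds[2];
  if (::pipe(fds) < 0)
  {
    result.error = errno;
    return result;
  }
  // Close-on-exec: a successful exec closes the child's write end.
  ::fcntl(fds[0], F_SETFD, FD_CLOEXEC);
  ::fcntl(fds[1], F_SETFD, FD_CLOEXEC);

  pid_t pid = ::fork();
  if (pid < 0)
  {
    result.error = errno;
    ::close(fds[0]);
    ::close(fds[1]);
    return result;
  }
  if (pid == 0)
  {
    // Child: exec only returns on failure; report errno to the parent.
    ::close(fds[0]);
    ::execv(path, argv);
    int err = errno;
    ssize_t written = ::write(fds[1], &err, sizeof err);
    (void)written;
    ::_exit(127);
  }
  // Parent: EOF means exec succeeded, sizeof(int) bytes carry the errno.
  ::close(fds[1]);
  int err = 0;
  ssize_t n;
  do
  {
    n = ::read(fds[0], &err, sizeof err);
  } while (n < 0 && errno == EINTR);
  ::close(fds[0]);
  result.pid = pid;
  result.error = (n == static_cast<ssize_t>(sizeof err)) ? err : 0;
  return result;
}

// Blocks until `pid` terminates; returns the wait status (or -1) and
// fills *usage with the child's resource usage via wait4(2).
int reap(pid_t pid, struct rusage* usage)
{
  int status = 0;
  pid_t r;
  do
  {
    r = ::wait4(pid, &status, 0, usage);
  } while (r < 0 && errno == EINTR);
  return r == pid ? status : -1;
}
```

The read in the parent blocks only until the child either execs or fails to, which is brief. In an event loop one would pass WNOHANG to wait4(2) instead of blocking as reap does here. The rusage fields such as ru_maxrss and ru_utime carry the memory and CPU figures mentioned above.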
10. A simple challenge
Limit the wall-clock runtime of a command, not its CPU time
Typical timeout of 60 seconds
Remember the pid when the command starts
Set up a timer and kill(2) the process when it fires
How do you know that the process you are going
to kill is the one that you created for the cmd?
Set a timer to kill pid 9527 60 seconds later
What if process 9527 dies just before the timer fires,
and a new process is created with the same pid (?!)
11. Pid is unique but not always
Pids wrap around (in minutes or seconds)
A pid is unique within a snapshot of all processes
But it is not unique over time
The range of possible pid values is small (1 to 32767)
/proc/sys/kernel/pid_max defaults to 32768
/proc/loadavg lastpid 3387
/proc/stat processes 423666
There is a tiny time window between timer wakeup
and kill(2)ing, anything could happen in between
And there is no mutex or lock for this race condition
12. How to kill a child properly?
So it is not safe to kill-by-pid, you may kill
someone else’s child process by mistake
How about checking ppid first?
You may kill your own new child if another
RunCommand reuses the pid just before the timer fires.
The pid + start_time combination is unique in
space and time
Start time is in /proc/pid/stat, in jiffies since boot
Remember the start time after fork()ing a child*
Check start time before killing the child
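The check-before-kill idea above might look like the sketch below, assuming Linux with /proc mounted. The names getStartTime and killIfSame are hypothetical. Per proc(5), the start time is field 22 of /proc/pid/stat, expressed in clock ticks on Linux 2.6 and later. The command name in field 2 can contain spaces and parentheses, so parsing starts after the last ')'. The sketch only reads and compares the value; how zurg records it at fork time is not shown here.

```cpp
#include <cassert>
#include <fstream>
#include <signal.h>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: read the start time (field 22 of /proc/<pid>/stat).
// Returns false if the process does not exist or the line cannot be parsed.
bool getStartTime(pid_t pid, unsigned long long* startTime)
{
  assert(startTime != nullptr);
  std::ifstream in("/proc/" + std::to_string(pid) + "/stat");
  std::string line;
  if (!std::getline(in, line))
    return false;
  // Field 2 is "(comm)" and may itself contain ')' or spaces.
  std::string::size_type rparen = line.rfind(')');
  if (rparen == std::string::npos)
    return false;
  std::istringstream fields(line.substr(rparen + 1));
  std::string skipped;
  for (int field = 3; field < 22; ++field)  // skip fields 3..21
  {
    if (!(fields >> skipped))
      return false;
  }
  unsigned long long value = 0;
  if (!(fields >> value))
    return false;
  *startTime = value;
  return true;
}

// Hypothetical helper: send `sig` only if `pid` still has the start time
// that was remembered when the child was created.
bool killIfSame(pid_t pid, unsigned long long expectedStart, int sig)
{
  unsigned long long current = 0;
  if (!getStartTime(pid, &current) || current != expectedStart)
    return false;  // process is gone or the pid was reused: do not signal
  return ::kill(pid, sig) == 0;
}
```

For example, killIfSame(pid, rememberedStart, SIGKILL) would be called from the timeout handler in place of a bare kill(2).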
13. Why is it safe?
If two processes start at almost the same time,
their pids must be different
If two processes happen to have the same pid,
their start time must be different
It takes seconds for pids to wrap, and start time is monotonic
Since zurg slave is single-threaded, no race
condition between checking and killing
Don’t run zurg slave as root (it quits if euid == 0)
Don’t run two zurg slaves with same uid on a box
14. Capture stdout & stderr, simple?
Two pipes are needed: dup2() the write ends to fds 1 and 2
in the child, and read the other ends of both pipes in the parent.
Keep the data in memory and send it back when the command finishes
Command ‘cat /dev/zero’ will blow up zurg slave
We must limit the size of stdout and stderr
The default limit is 1024 KiB
Two approaches when the size exceeds the limit:
Stop reading, i.e. block the writer, and wait until timeout
Close the read side of the pipe, i.e. kill the child with SIGPIPE
Directly sending a SIGPIPE signal doesn’t work
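The second approach above (close the read side once the limit is reached) can be sketched as follows. This is a simplified, blocking illustration, not zurg's event-driven code: captureStdout is a hypothetical name, only stdout is captured (stderr would use a second pipe in the same way), and no timeout is applied.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs path/argv, keeps at most `limit` bytes of its
// stdout in *output, and returns the wait status (-1 if pipe/fork failed).
int captureStdout(const char* path, char* const argv[],
                  size_t limit, std::string* output)
{
  assert(output != nullptr);
  int fds[2];
  if (::pipe(fds) < 0)
    return -1;
  pid_t pid = ::fork();
  if (pid < 0)
  {
    ::close(fds[0]);
    ::close(fds[1]);
    return -1;
  }
  if (pid == 0)
  {
    // Child: make the write end its stdout, then exec.
    ::close(fds[0]);
    if (fds[1] != STDOUT_FILENO)
    {
      ::dup2(fds[1], STDOUT_FILENO);
      ::close(fds[1]);
    }
    ::signal(SIGPIPE, SIG_DFL);  // an ignored SIGPIPE would survive exec
    ::execv(path, argv);
    ::_exit(127);
  }
  // Parent: read until EOF or until the limit is reached.
  ::close(fds[1]);
  char buf[4096];
  while (output->size() < limit)
  {
    ssize_t n = ::read(fds[0], buf, sizeof buf);
    if (n > 0)
    {
      size_t room = limit - output->size();
      size_t got = static_cast<size_t>(n);
      output->append(buf, got < room ? got : room);
    }
    else if (n == 0 || errno != EINTR)
    {
      break;
    }
  }
  // Closing the read end makes the child's next write raise SIGPIPE.
  ::close(fds[0]);
  int status = 0;
  while (::waitpid(pid, &status, 0) < 0 && errno == EINTR)
  {
  }
  return status;
}
```

A child that ignores SIGPIPE or simply stops writing would keep this sketch waiting in waitpid(2), which is one reason the runtime limit discussed earlier is still needed.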
15. Race condition at process exits
When a child exits, all its open fds will be closed
The parent’s read(2) returns 0; it should then close the fd,
otherwise POLLHUP will cause a busy loop
A child could close them purposefully before dying
The process-exited event and the std{out,err}-closed
events can arrive in any order
Is there any in-flight data that has not been received yet?
Lifetime management of Process/Pipe objects is
also subtle, as fds are reused so aggressively
Read the code to find out how to do it correctly
18. Run Script
RunCommand with script file content provided
in the request
A programmatic way to run slightly different
scripts on many hosts
19. Application management
Start/monitor/stop applications
Applications, a.k.a. services: long-running processes
Apps can be written in C++/Java/Python/etc.
Shares most functionality with RunCommand
stdout/stderr redirected to files, not captured
No timeout
Intrusive vs. non-intrusive
Can zurg_slave manage any application?
Should the managed application follow some rules?
20. How to detect app exiting
Polling (pid and start time)
Not real-time; there is always a poll interval
How do you know a given process is the application?
SIGCHLD
Not 100% reliable, so call wait(2) periodically
Pipe: leave the write side open in the child process and read
in zurg_slave; when the app exits, read(2) returns 0
Reliable and prompt
The application must not close the fd* (intrusive!)
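The pipe approach above can be sketched like this, assuming POSIX. Watched, startWatched, and waitForExit are hypothetical names for illustration only. The write end is deliberately left without FD_CLOEXEC so that it survives exec in the application. The parent holds only the read end, which reports EOF once every holder of the write end has exited or closed it.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handle: the child's pid plus the read end of its pipe.
struct Watched
{
  pid_t pid;  // -1 on failure
  int fd;     // read end kept by the parent, -1 on failure
};

Watched startWatched(const char* path, char* const argv[])
{
  assert(path != nullptr);
  Watched w = { -1, -1 };
  int fds[2];
  if (::pipe(fds) < 0)
    return w;
  pid_t pid = ::fork();
  if (pid < 0)
  {
    ::close(fds[0]);
    ::close(fds[1]);
    return w;
  }
  if (pid == 0)
  {
    // Child: keep the write end open across exec and never write to it.
    ::close(fds[0]);
    ::execv(path, argv);
    ::_exit(127);
  }
  // Parent: drop its copy of the write end, or EOF would never arrive.
  ::close(fds[1]);
  w.pid = pid;
  w.fd = fds[0];
  return w;
}

// Returns true if the write end was closed (EOF) within timeoutMs.
bool waitForExit(int fd, int timeoutMs)
{
  assert(fd >= 0);
  struct pollfd pfd = { fd, POLLIN, 0 };
  if (::poll(&pfd, 1, timeoutMs) <= 0)
    return false;
  char c;
  return ::read(fd, &c, 1) == 0;
}
```

In an event loop the read end would be registered once and the EOF handled as an ordinary readable event. Any grandchild that inherits the write end delays the EOF, which is part of why this approach is intrusive.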
21. What if zurg_slave crashes?
How to prevent starting duplicate services?
SIGCHLD and pipe(2) cannot be re-established after a restart
Sockets? The app reconnects to the zurg slave on localhost,
i.e. a heartbeat between app and zurg slave
Even more intrusive: retry logic is needed in every language
Other thoughts?
Another layer of indirection?
22. To be continued
Collecting health & performance data
Periodic heartbeats to the master
Process status, performance metrics
Zurg slave is 50% done as of end of April 2012
23. Zurg Master
A multithreaded program
All of its status is retrievable from outside
Easy to build Web/GUI frontends
Coding has not started yet.
Editor’s notes
* Written in scripting languages
* Must be done in the child process and passed back to the parent