We upgraded from BackgrounDRb to Resque. The pagers have stopped buzzing, and we are very pleased with the migration.
Getting the last 5% of the Resque setup right was a little tricky. This presentation shares some of the implementation details (code and config files) to help others make their Resque setup rock solid.
2. Summary
- Background queues let us defer logic outside the browser request and response.
- BackgrounDRb was crashing often for us. We moved to Resque and it hasn't crashed since.
- BackgrounDRb is easier to run out of the box.
- Adding just a little code makes Resque just as easy, without sacrificing any of the added flexibility.
3. Why did we upgrade?
- bdrb paged Boss 4 times during my first weekend
- memory leaks caused crashes
- monit can't restart workers in backgroundrb
- move to an active project (à la heroku, github, redis)
4. What does each bring to the table?
                               bdrb   resque
adhoc (out of request)         yes    yes
delay (run/remind)             yes    resque-scheduler
schedule (cron)                yes    resque-scheduler
mail (invisible/out of req)    code   resque_mailer
status reporting               code   resque-meta, web
backgroundrb does most of what we need out of the box
resque has plugins to make up the difference
5. Bdrb Components
[Diagram: rails, enqueue, main queue, queue manager, scheduler, workers (work, mailer), bdrb yml. Legend: we started / monitored / data.]
simple w/ 1 queue (add started_at for delayed jobs)
scheduler is a special worker - managed by 1 process (is a runner/worker)
6. Resque Components
[Diagram, parts numbered 1-6: rails, enqueue, delayed queue, scheduler, schedule, main queue, workers (rake), resque web, mailer. Legend: we started / monitored / data.]
many moving parts
simplified in that all workers are the same
scheduler simply adds entries in the queue (instead of MetaWorker/running jobs)
web ui is a nice touch
7. 1. Ad-hoc Enqueuing
                      bdrb   resque
args                  hash   ruby, checked
enqueue AR objects    yes
mail (invisible)      yes    yes
AR objects - crept into the action_mailer deliver calls
Looks like bdrb wins here, but not enqueuing AR objects is best practice
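Why plain ids are the safer thing to enqueue: Resque stores job arguments as JSON, so only simple values survive the trip through the queue. The sketch below imitates that round trip with Ruby's standard `json` library rather than a real Resque queue. `Account` and `enqueue_payload` are hypothetical names used only for illustration.

```ruby
require 'json'

# Hypothetical stand-in for an ActiveRecord model.
Account = Struct.new(:id, :email)

# Imitates what a JSON-backed queue does to job arguments:
# serialize on enqueue, parse again when a worker picks the job up.
def enqueue_payload(job_name, *args)
  JSON.parse(JSON.generate('class' => job_name, 'args' => args))
end

account = Account.new(42, 'boss@example.com')

# Passing the id: the worker gets back exactly what was enqueued
# and can load a fresh record itself.
by_id = enqueue_payload('WelcomeMailer', account.id)

# Passing the object: it is flattened to a string and is no longer an Account.
by_object = enqueue_payload('WelcomeMailer', account)

puts by_id['args'].inspect
puts by_object['args'].first.class
```

Enqueuing the id also means the worker reads the current row when it runs, not a copy taken at enqueue time.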
8. Ad-hoc/Delayed (bdrb)
class JobWorker < BackgrounDRb::MetaWorker
  set_worker_name :job_worker

  def purge_job_logs()
    JobLog.purge_expired!
    persistent_job.finish!
  end

  # enqueue to run as soon as a worker is free
  def self.perform_later(*args)
    MiddleMan.worker(:job_worker).enq_purge_job_logs(
      :job_key => new_job_key, :arg => args)
  end

  # enqueue to run at a given time; the time is the first argument
  def self.perform_at(*args)
    time = args.shift
    MiddleMan.worker(:job_worker).enq_purge_job_logs(
      :job_key => new_job_key, :arg => args, :scheduled_at => time)
  end

  def self.new_job_key()
    "purge_job_logs_#{ActiveSupport::SecureRandom.hex(8)}"
  end
end
don't need to do a command pattern (our code didn't)
scheduled_at = beauty of SQL
parent class
enqueue knows queue name (code not loaded)
9. Ad-hoc/Delayed (resque)
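class PurgeJobLogs
  @queue = :job_worker

  # Resque workers call the class method named perform
  def self.perform(*args)
    JobLog.purge_expired!
  end

  def self.perform_later(*args)
    Resque.enqueue(self, *args)
  end

  # the time is the first argument; enqueue_at comes from resque-scheduler
  def self.perform_at(*args)
    time = args.shift
    Resque.enqueue_at(time, self, *args)
  end
end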
Enqueue needs worker class to know the name of the queue
(even if called directly into Resque)
interface only (perform_{at,later}) -> abstracted out to parent?
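One way the `perform_later` / `perform_at` interface could be pulled out of each job is a small mixin. This is a sketch under assumptions, not the code from our app: `Deferrable` and `RecordingBackend` are hypothetical names, and the backend is injected so the example runs without Redis. In the application the backend would be `Resque` itself, with `enqueue_at` supplied by resque-scheduler.

```ruby
# Hypothetical mixin: gives any job class perform_later / perform_at.
module Deferrable
  class << self
    # Anything that responds to enqueue(klass, *args) and
    # enqueue_at(time, klass, *args); Resque in the real application.
    attr_accessor :backend
  end

  def perform_later(*args)
    Deferrable.backend.enqueue(self, *args)
  end

  def perform_at(time, *args)
    Deferrable.backend.enqueue_at(time, self, *args)
  end
end

# In-memory stand-in for Resque so the sketch runs on its own.
class RecordingBackend
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(klass, *args)
    @jobs << [:now, klass, args]
  end

  def enqueue_at(time, klass, *args)
    @jobs << [time, klass, args]
  end
end

class PurgeJobLogs
  extend Deferrable
  @queue = :job_worker

  def self.perform(*args)
    # JobLog.purge_expired! in the real job
  end
end

Deferrable.backend = RecordingBackend.new
PurgeJobLogs.perform_later
PurgeJobLogs.perform_at(Time.at(0), 'nightly')
```

Each job class then only declares its queue and its `perform` method.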
10. 2. Scheduled Enqueuing
                    bdrb        resque
sched any method    yes (x2)    command
scheduler           yes         yes+
adhoc jobs                      yes
bdrb: need to define the schedule in 2 places, yml and ruby.
We ran into a case where this caused a problem
resque: web ui makes it easy to kick off commands ad hoc (very useful in staging)
11. Scheduled (bdrb)
:backgroundrb:
  :ip: 127.0.0.1
  :port: 11006
  :environment: development
:schedules:
  :scheduled_worker:
    :purge_job_logs:
      :trigger_args: 0 */5 * * * *
Evidence of framework - scheduled_worker defined here, need meta worker (so it can be run)
12. Scheduled (bdrb)
class ScheduledWorker < BackgrounDRb::MetaWorker
  extend BdrbUtils::CronExtensions
  set_worker_name :scheduled_worker
  threaded_cron_job(:purge_job_logs) { JobLog.purge_expired! }
end
scheduler = MetaWorker. The job is defined 2 times (yml and ruby). Because the worker calls your code, it can call "any static method"
13. Scheduled (resque)
---
clear_logs:
  cron: "*/10 * * * *"
  class: PurgeJobLogs
  queue: job_worker
  description: Remove old logs
queue_name (so scheduler does not need to load worker into memory to enqueue)
cron is standard format (remove 'seconds') - commands
scheduler in separate process. (can run when workers are stopped / changed) - minimal env
scheduler injects into queue (vs runs jobs) - so can adhoc inject via web
no ruby code for this
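Because the schedule is plain YAML, it is easy to sanity-check before deploying. The helper below is a hypothetical sketch (`schedule_problems` is not part of resque-scheduler). It assumes our convention that every entry names its cron, class and queue, and it flags cron strings that still carry bdrb's extra leading seconds field.

```ruby
require 'yaml'

SCHEDULE_YAML = <<~YAML
  ---
  clear_logs:
    cron: "*/10 * * * *"
    class: PurgeJobLogs
    queue: job_worker
    description: Remove old logs
YAML

# Hypothetical helper: returns a list of human-readable problems.
def schedule_problems(yaml_text)
  problems = []
  YAML.load(yaml_text).each do |name, entry|
    # Assumed convention: every entry names its cron, class and queue.
    %w[cron class queue].each do |key|
      problems << "#{name}: missing #{key}" unless entry[key]
    end
    # Standard cron has 5 fields; a bdrb trigger has a 6th (seconds) in front.
    fields = entry['cron'].to_s.split
    if entry['cron'] && fields.length != 5
      problems << "#{name}: cron has #{fields.length} fields, expected 5"
    end
  end
  problems
end

puts schedule_problems(SCHEDULE_YAML).inspect
```

A check like this could run in a test or a deploy task, so a schedule copied over from the bdrb yml is caught early.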
15. worker list (resque)
primary:
  queues: background,mail
secondary:
  queues: mail,background
can have multiple workers running the same queues
can have multiple queues in 1 worker
worker pool can be: generalized, response focused, schedule focused, changed at runtime
inverted priority list - prevents starvation
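A Resque worker checks its queues in the order they are listed, which is why the two workers above list them in opposite order. The simulation below is a sketch with in-memory arrays standing in for Redis. `reserve` is a hypothetical helper that mimics that lookup order and is not Resque's own code.

```ruby
# Takes the next job from the first non-empty queue in the given order.
def reserve(queues, order)
  order.each do |name|
    job = queues[name].shift
    return [name, job] if job
  end
  nil
end

queues = {
  'background' => %w[report_1 report_2 report_3],
  'mail'       => %w[welcome_1 welcome_2]
}

primary   = %w[background mail]  # prefers background work
secondary = %w[mail background]  # prefers mail

first_for_primary   = reserve(queues, primary)
first_for_secondary = reserve(queues, secondary)

puts first_for_primary.inspect
puts first_for_secondary.inspect
```

With both workers running, a long backlog in one queue cannot starve the other, and either worker falls through to the second queue when its preferred one is empty.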
16. 4. Running Workers
namespace :resque do
  desc 'start all background resque daemons'
  task :start_daemons do
    mrake_start "resque_scheduler resque:scheduler"
    workers_config.each do |worker, config|
      mrake_start "resque_#{worker} resque:work QUEUE=#{config['queues']}"
    end
  end

  desc 'stop all background resque daemons'
  task :stop_daemons do
    sh "./script/monit_rake stop resque_scheduler"
    workers_config.each do |worker, config|
      sh "./script/monit_rake stop resque_#{worker} -s QUIT"
    end
  end

  def self.workers_config
    YAML.load(File.open(ENV['WORKER_YML'] || 'config/resque_workers.yml'))
  end

  def self.mrake_start(task)
    sh "nohup ./script/monit_rake start #{task} RAILS_ENV=#{ENV['RAILS_ENV']} >> log/daemons.log &"
  end
end
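To see which daemons the tasks above would start for a given worker list, the mapping from YAML to `monit_rake` arguments can be written as a dry run. This is only a sketch: `daemon_targets` is a hypothetical helper, and the YAML is the worker list from the earlier slide.

```ruby
require 'yaml'

WORKERS_YAML = <<~YAML
  primary:
    queues: background,mail
  secondary:
    queues: mail,background
YAML

# Hypothetical dry-run helper: one "<pid name> <rake target> [args]" per daemon.
def daemon_targets(config)
  targets = ['resque_scheduler resque:scheduler']
  config.each do |worker, settings|
    targets << "resque_#{worker} resque:work QUEUE=#{settings['queues']}"
  end
  targets
end

puts daemon_targets(YAML.load(WORKERS_YAML))
```

Adding a worker is then a one-entry change to the YAML file, with no change to the rake or monit code.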
17. Deploying (cap)
namespace :resque do
  desc "Stop the resque daemon"
  task :stop, :roles => :resque do
    run "cd #{current_path} && RAILS_ENV=#{rails_env} WORKER_YML=#{resque_workers_yml} rake resque:stop_daemons; true"
  end

  desc "Start the resque daemon"
  task :start, :roles => :resque do
    run "cd #{current_path} && RAILS_ENV=#{rails_env} WORKER_YML=#{resque_workers_yml} rake resque:start_daemons"
  end
end
18. 5. Monitoring Workers (monit.erb)
check process resque_scheduler
  with pidfile <%= @rails_root %>/tmp/pids/resque_scheduler.pid
  group resque
  alert errors@domain.com
  start program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake start resque_scheduler resque:scheduler'"
  stop program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake stop resque_scheduler'"
<% YAML.load_file(File.join(Rails.root.to_s, 'config/production/resque/resque_workers.yml')).each_pair do |worker, config| %>
check process resque_<%=worker%>
  with pidfile <%= @rails_root %>/tmp/pids/resque_<%=worker%>.pid
  group resque
  alert errors@domain.com
  start program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake start resque_<%=worker%> resque:work QUEUE=<%=config['queues']%>'"
  stop program = "/bin/sh -c 'cd <%= @rails_root %>; RAILS_ENV=production ./script/monit_rake stop resque_<%=worker%>'"
<% end %>
use template to generate monit file
19. Monitoring Rake Processes
#!/bin/sh
# wrapper to daemonize rake tasks: see also http://mmonit.com/wiki/Monit/FAQ#pidfile
usage() {
  echo "usage: ${0} [start|stop] name target [arguments]"
  printf '\tname is used to create or read the log and pid file names\n'
  printf '\tfor start: target and arguments are passed to rake\n'
  printf '\tfor stop: target and arguments are passed to kill (e.g.: -n 3)\n'
  exit 1
}
[ $# -lt 2 ] && usage
cmd=$1
name=$2
shift ; shift
pid_file=./tmp/pids/${name}.pid
log_file=./log/${name}.log
# ...
20. Monitoring Processes
case $cmd in
  start)
    if [ ${#} -eq 0 ] ; then
      printf '\nERROR: missing target\n\n'
      usage
    fi
    pid=`cat ${pid_file} 2> /dev/null`
    if [ -n "${pid}" ] ; then
      ps ${pid}
      if [ $? -eq 0 ] ; then
        echo "ensure process ${name} (pid: ${pid_file}) is not running"
        exit 1
      fi
    fi
    # exec replaces this shell with rake, so the pid written here stays valid
    echo $$ > ${pid_file}
    exec 2>&1 rake $* 1>> ${log_file} ;;
  stop)
    pid=`cat ${pid_file} 2> /dev/null`
    [ -n "${pid}" ] && kill $* ${pid}
    rm -f ${pid_file} ;;
  *) usage ;;
esac
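The pid-file check in the start branch can also be done from Ruby, for example in a status task. This is a minimal sketch, not part of the script above: `process_running?` and `read_pid` are hypothetical helpers, and sending signal 0 only tests whether the process exists.

```ruby
# Hypothetical helper: true when a process with this pid exists.
def process_running?(pid)
  Process.kill(0, pid)   # signal 0 checks existence without delivering a signal
  true
rescue Errno::ESRCH
  false                  # no such process
rescue Errno::EPERM
  true                   # process exists but belongs to another user
end

# Reads a pid file like the ones monit_rake writes.
# Returns nil when the file is missing or does not hold a pid.
def read_pid(pid_file)
  pid = File.read(pid_file).to_i
  pid.zero? ? nil : pid
rescue Errno::ENOENT
  nil
end

pid = read_pid('./tmp/pids/resque_scheduler.pid')
puts(pid && process_running?(pid) ? 'running' : 'not running')
```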
22. 6. Running Web
namespace :resque do
  task :setup => :environment

  desc 'kick off resque-web'
  task :web => :environment do
    $stdout.sync=true
    $stderr.sync=true
    puts `env RAILS_ENV=#{RAILS_ENV} resque-web #{RAILS_ROOT}/config/initializers/resque.rb`
  end
end
24. 5. Monitoring Work
                  bdrb                 resque
ad-hoc queries    SQL                  redis query
did it run?       custom               resque-meta
did it fail?      hoptoad              yes
rerun                                  yes
have id           yes                  resque-meta
queue health      sample controller    yes
Did the job run?
resque assumes everything worked and only reports failures. That was not good enough for us
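One way to answer "did it run?" is to record successes from a hook. Resque runs class methods whose names start with `after_perform` once `perform` returns, passing the job's arguments. The sketch below assumes that hook contract. `RecordsCompletion` is a hypothetical mixin and an in-memory array stands in for Redis or a database table, which is why the hook is called by hand at the end.

```ruby
# Hypothetical mixin: remembers every job that finished without raising.
module RecordsCompletion
  # Stand-in for a Redis list or a database table.
  def completions
    @completions ||= []
  end

  # Hook: called with the job's arguments after perform returns.
  def after_perform_record_completion(*args)
    completions << { :job => name, :args => args, :finished_at => Time.now }
  end
end

class PurgeJobLogs
  extend RecordsCompletion
  @queue = :job_worker

  def self.perform(*args)
    # JobLog.purge_expired! in the real job
  end
end

# What a worker would do, spelled out by hand so the sketch runs without Resque:
PurgeJobLogs.perform('nightly')
PurgeJobLogs.after_perform_record_completion('nightly')

puts PurgeJobLogs.completions.length
```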
25. Pausing Workers
signal        what happens                     when to use
quit          wait for child & exit            graceful shutdown
term / int    immediately kill child & exit    shutdown now
usr1          immediately kill child           stale child
usr2          don't start any new jobs
cont          start to process new jobs
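The signals in the table above can be sent from Ruby with `Process.kill`, for example from a rake task that pauses workers during a deploy. This is a sketch: the action names and the `signal_worker` helper are hypothetical, and only the signal names come from the table.

```ruby
# Signal names as listed in the table above; the action names are ours.
WORKER_SIGNALS = {
  :graceful_stop => 'QUIT',
  :stop_now      => 'TERM',
  :kill_child    => 'USR1',
  :pause         => 'USR2',
  :resume        => 'CONT'
}.freeze

def signal_for(action)
  WORKER_SIGNALS.fetch(action) do
    raise ArgumentError, "unknown action #{action.inspect}"
  end
end

# Sends the matching signal to a worker process.
def signal_worker(pid, action)
  Process.kill(signal_for(action), pid)
end
```

For example, `signal_worker(File.read('tmp/pids/resque_primary.pid').to_i, :pause)` would stop that worker from picking up new jobs until it receives `:resume`.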
26. Testing Worker
                    bdrb        resque
testing queue       mid-easy    resque_unit
testing command                 yes
all workers same                yes
interface only                  yes
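Because a Resque job is a plain class with a `perform` class method, the command can be tested by calling it directly, with no queue or worker involved. The sketch below is an assumption about how such a test might look. `FakeJobLog` and the injectable `log_store` are hypothetical and exist only so the example runs without a database.

```ruby
# Hypothetical job: the collaborator is injectable so the test needs no database.
class PurgeJobLogs
  @queue = :job_worker

  class << self
    attr_accessor :log_store   # JobLog in the real application
  end

  def self.perform
    log_store.purge_expired!
  end
end

# Test double that records whether the purge was requested.
class FakeJobLog
  attr_reader :purged

  def purge_expired!
    @purged = true
  end
end

fake = FakeJobLog.new
PurgeJobLogs.log_store = fake
PurgeJobLogs.perform   # call the command directly, no queue involved
raise 'purge was not triggered' unless fake.purged
```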
28. Extending with Hooks
                  resque hooks
around_enqueue    no
after_enqueue     yes
before_perform    yes
around_perform    yes/no
after_perform     yes
all plugins want to extend enqueue - not compatible
need to be able to alter arguments (e.g.: add id for meta plugins)
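Until the hooks allow it, arguments can be altered in a thin wrapper around enqueue. The sketch below prepends a generated id so the job can be tracked later. `enqueue_with_id` and `RecordingBackend` are hypothetical, and in the application the backend would be `Resque`, whose `enqueue(klass, *args)` has the same shape.

```ruby
require 'securerandom'

# Hypothetical wrapper: prepends a generated job id to the arguments
# and returns it so the caller can look the job up later.
def enqueue_with_id(backend, job_class, *args)
  job_id = SecureRandom.hex(8)
  backend.enqueue(job_class, job_id, *args)
  job_id
end

# In-memory stand-in for Resque so the sketch runs without Redis.
class RecordingBackend
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def enqueue(job_class, *args)
    @jobs << [job_class, args]
  end
end

backend = RecordingBackend.new
job_id = enqueue_with_id(backend, 'PurgeJobLogs', 'nightly')
puts job_id
```

The job's `perform` then has to accept the id as its first argument, which is the kind of coupling a proper enqueue hook would avoid.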
29. Conclusion
- Boss got no pages in the first month of implementation
- no memory leaks, great uptime (don't need monit...)
- Fast
- generalized workers increase throughput (nightly vs 1 hour)
- minimal custom code
- still some intimidation
- Eating flavor of the month