SlideShare uma empresa Scribd logo
1 de 29
Baixar para ler offline
paexec – distributes tasks over network or CPUs
Aleksey Cheusov
vle@gmx.net
Minsk, Belarus, 2013
Problem
◮ Huge amount of data to process
◮ Typical desktop machines have more than one CPU
◮ Unlimited resources are available on the Internet
◮ Heterogeneous environment (*BSD, Linux, Windows...)
Solution
Usage
paexec [OPTIONS] 
-n ’machines or CPUs’ 
-t ’transport program’ 
-c ’calculator’ < tasks
example
ls -1 *.wav | 
paexec -x -c ’flac -s’ -n +4 > /dev/null
example
paexec 
-n ’host1 host2 host3’ 
-t /usr/bin/ssh 
-c ~/bin/toupper < tasks
Example 1: toupper
Our goal is to convert strings to upper case in parallel
˜/bin/toupper
#!/usr/bin/awk -f
{
print " ", toupper($0)
print "" # empty line -- end-of-task marker!
fflush() # We must flush stdout!
}
˜/tmp/tasks
apple
bananas
orange
Example 1: toupper
~/bin/toupper script will be run only once on remote servers
“server1” and “server2”. It takes tasks from stdin (one task per
line). Transport program is ssh(1).
paexec invocation
$ paexec -t ssh -c ~/bin/toupper 
-n ’server1 server2’ < tasks > results
$ cat results
BANANAS
ORANGE
APPLE
$
Example 1: toupper
Options -l and -r add the task number and server where this task
is processed to stdout.
paexec -lr invocation
$ paexec -lr -t ssh -c ~/bin/toupper 
-n ’server1 server2’ < tasks > results
$ cat results
server2 2 BANANAS
server2 3 ORANGE
server1 1 APPLE
$
Example 1: toupper
The same as above but four instances of ~/bin/toupper are ran
locally.
paexec invocation
$ paexec -n +4 -c ~/bin/toupper
< tasks > results
$ cat results
BANANAS
ORANGE
APPLE
$
Example 1: toupper
The same as above but without ~/bin/toupper. In this example
we run AWK program for each individual task. At the same time
we still make only one ssh connection to each server regardless of a
number of tasks given on input.
paexec invocation
$ paexec -x -t ssh -n ’server1 server2’ 
-c "awk ’BEGIN {print toupper(ARGV[1])}’ " 
< tasks > results
$ cat results
ORANGE
BANANAS
APPLE
$
Example 1: toupper
If we want to ”shquate” less one can use option -C instead of -c
and specify command after options.
paexec invocation
$ paexec -x -C -t ssh -n ’server1 server2’ 
awk ’BEGIN {print toupper(ARGV[1])}’ 
< tasks > results
$ cat results
ORANGE
BANANAS
APPLE
$
Example 1: toupper
With options -z or -Z we can easily run our tasks on unreliable
hosts. If we cannot connect or lost connection to the server,
paexec will redistribute failed tasks to another server.
paexec invocation
$ paexec -Z240 -x -t ssh 
-n ’server1 badhostname server2’ 
-c "awk ’BEGIN {print toupper(ARGV[1])}’ " 
< tasks > results
ssh: Could not resolve hostname badhostname:
No address associated with hostname
badhostname 1 fatal
$ cat results
ORANGE
BANANAS
APPLE
$
Example 2: parallel banner(1)
what is banner(1)?
$ banner -f @ NetBSD
@ @ @@@@@@ @@@@@ @@@@@@
@@ @ @@@@@@ @@@@@ @ @ @ @ @ @
@ @ @ @ @ @ @ @ @ @
@ @ @ @@@@@ @ @@@@@@ @@@@@ @ @
@ @ @ @ @ @ @ @ @ @
@ @@ @ @ @ @ @ @ @ @
@ @ @@@@@@ @ @@@@@@ @@@@@ @@@@@@
$
Example 2: parallel banner(1)
~/bin/pbanner is wrapper for banner(1) for reading tasks line
by line. Magic line is used instead empty line as an end-of-task
marker.
˜/bin/pbanner
#!/usr/bin/env sh
while read task; do
banner -f M "$task" |
echo ”$PAEXEC EOT” # end-of-task marker
done
tasks
pae
xec
paexec invocation
$ paexec -l -mt=’SE@X-L0S0!&’ -c ~/bin/pbanner 
-n +2 < tasks > result
$
Example 2: parallel banner(1)
paexec(1) reads calculator’s output asynchronously. So, its
output is sliced.
Sliced result
$ cat result
2
2
2 M M MMMMMM MMMM
2 M M M M M
1
1
1 MMMMM MM MMMMMM
1 M M M M M
1 M M M M MMMMM
1 MMMMM MMMMMM M
2 MM MMMMM M
2 MM M M
2 M M M M M
2 M M MMMMMM MMMM
2
1 M M M M
1 M M M MMMMMM
1
$
Example 2: parallel banner(1)
paexec reorder(1) normalizes paexec(1)’s output.
Ordered result
$ paexec_reorder -mt=’SE@X-L0S0!&’ results
MMMMM MM MMMMMM
M M M M M
M M M M MMMMM
MMMMM MMMMMM M
M M M M
M M M MMMMMM
M M MMMMMM MMMM
M M M M M
MM MMMMM M
MM M M
M M M M M
M M MMMMMM MMMM
$
Example 2: parallel banner(1)
The same as above but using magic end-of-task marker provided
by paexec(1).
Ordered result
$ paexec -y -lc ~/bin/pbanner -n+2 < tasks | paexec_reorder -y
MMMMM MM MMMMMM
M M M M M
M M M M MMMMM
MMMMM MMMMMM M
M M M M
M M M MMMMMM
M M MMMMMM MMMM
M M M M M
MM MMMMM M
MM M M
M M M M M
M M MMMMMM MMMM
$
Example 2: parallel banner(1)
For this trivial task wrapper like ~/bin/pbanner is not needed.
We can easily run banner(1) directly.
Sliced result
$ paexec -l -x -c banner -n+2 < tasks
2
2
2 M M MMMMMM MMMM
2 M M M M M
2 MM MMMMM M
2 MM M M
2 M M M M M
1
1
1 MMMMM MM MMMMMM
1 M M M M M
1 M M M M MMMMM
1 MMMMM MMMMMM M
2 M M MMMMMM MMMM
2
1 M M M M
1 M M M MMMMMM
1
$
Example 3: dependency graph of tasks
paexec(1) is able to build tasks taking into account their
“dependencies”. Here “devel/gmake” and others are pkgsrc
packages. Our goal in this example is to build pkgsrc package
audio/abcde and all its build-time and compile-time dependencies.
audio/cd-discid
audio/abcde
textproc/gsed audio/cdparanoia audio/id3v2 audio/id3misc/mkcue shells/bash
devel/libtool-base
audio/id3libdevel/gmake
devel/m4
devel/bison
lang/f2c
Example 3: dependency graph of tasks
‘‘paexec -g’’ takes a dependency graph on input (in tsort(1)
format). Tasks are separated by space (paexec -md=).
˜/tmp/packages to build
audio/cd-discid audio/abcde
textproc/gsed audio/abcde
audio/cdparanoia audio/abcde
audio/id3v2 audio/abcde
audio/id3 audio/abcde
misc/mkcue audio/abcde
shells/bash audio/abcde
devel/libtool-base audio/cdparanoia
devel/gmake audio/cdparanoia
devel/libtool-base audio/id3lib
devel/gmake audio/id3v2
audio/id3lib audio/id3v2
devel/m4 devel/bison
lang/f2c devel/libtool-base
devel/gmake misc/mkcue
devel/bison shells/bash
Example 3: dependency graph of tasks
If option -g is applied, every task may succeed or fail. In case of
failure all dependants fail recursively. For this to work we have to
slightly adapt “calculator”.
˜/bin/pkg builder
#!/usr/bin/awk -f
{
print "build " $0
print ”success” # build succeeded! (paexec -ms=)
print "" # end-of-task marker
fflush() # we must flush stdout
}
Example 3: dependency graph of tasks
paexec -g invocation (no failures)
$ paexec -g -l -c ~/bin/pkg_builder -n ’server2 server1’ 
-t ssh < ~/tmp/packages_to_build | paexec_reorder > result
$ cat result
build textproc/gsed
success
build devel/gmake
success
build misc/mkcue
success
build devel/m4
success
build devel/bison
success
...
build audio/id3v2
success
build audio/abcde
success
$
Example 3: dependency graph of tasks
Let’s suppose that “devel/gmake” fails to build.
˜/bin/pkg builder
#!/usr/bin/awk -f
{
print "build " $0
if ($0 == "devel/gmake")
print ”failure” # Oh no...
else
print "success" # build succeeded!
print "" # end-of-task marker
fflush() # we must flush stdout
}
Example 3: dependency graph of tasks
Package “devel/gmake” and all dependants are marked as failed.
Even if failures happen, the build continues.
paexec -g invocation (with failures)
$ paexec -gl -c ~/bin/pkg_builder -n ’server2 server1’ 
-t ssh < ~/tmp/packages_to_build | paexec_reorder > result
$ cat result
build audio/cd-discid
success
build audio/id3
success
build devel/gmake
failure
devel/gmake audio/cdparanoia audio/abcde audio/id3v2 misc/mkcue
build devel/m4
success
build textproc/gsed
success
...
$
Example 3: dependency graph of tasks
paexec is resistant not only to network failures but also to
unexpected calculator exits or crashes.
˜/bin/pkg builder
#!/usr/bin/awk -f
{
"hostname -s" | getline hostname
print "build " $0 " on " hostname
if (hostname == "server1" && $0 == "textproc/gsed")
exit 139
# Damn it, I’m dying...
# Take a note that exit status doesn’t matter.
else
print "success" # Yes! :-)
print "" # end-of-task marker
fflush() # we must flush stdout
}
Example 3: dependency graph of tasks
“textproc/gsed” failed on “server1” but then succeeded on
“server2”. Every 300 seconds we try to reconnect to “server1”.
Keywords “success”, “failure” and “fatal” may be changed with a
help of -ms=, -mf= and -mF= options respectively.
paexec -Z300 invocation (with failure)
$ paexec -gl -Z300 -t ssh -c ~/bin/pkg_builder 
-n ’server2 server1’ < ~/tmp/packages_to_build 
| paexec_reorder > result
$ cat result
build audio/cd-discid on server2
success
build textproc/gsed on server1
fatal
build textproc/gsed on server2
success
build audio/id3 on server2
success
...
$
Example 4: Converting .wav files to .flac or .ogg
In trivial cases we don’t need relatively complex “calculator”.
Running a trivial command may be enough. Below we run three
.wav to .flac/.ogg convertors in parallel. If -x is applied, task is
passed to calculator as an argument.
paexec -x invocation
$ ls -1 *.wav | paexec -x -c ’flac -s’ -n+3 >/dev/null
$
paexec -x invocation
$ ls -1 *.wav | paexec -ixC -n+3 oggenc -Q | grep .
01-Crying_Wolf.wav
02-Autumn.wav
03-Time_Heals.wav
04-Alice_(Letting_Go).wav
05-This_Side_Of_The_Looking_Glass.wav
...
$
Example 5: paexec -W
If different tasks take different amount of time to process, than it
makes sense to process “heavier” ones earlier in order to minimize
total calculation time. For this to work one can weigh each tasks.
Note that this mode enables “graph” mode automatically.
˜/bin/calc
#!/bin/sh
# $1 -- task given on input
if test $1 = huge; then
sleep 6
else
sleep 1
fi
echo "task $1 done"
Example 5: paexec -W
This is how we run unweighted tasks. The whole process takes 8
seconds.
paexec invocation
$ printf ’small1nsmall2nsmall3nsmall4nsmall5nhugen’ |
time -p paexec -c ~/bin/calc -n +2 -xg | grep -v success
task small2 done
task small1 done
task small3 done
task small4 done
task small5 done
task huge done
real 8.04
user 0.03
sys 0.03
$
Example 5: paexec -W
If we say paexec that the task “huge” is performed 6 times longer
than others, it starts“huge” first and then others. In total we
spend 6 seconds for all tasks.
paexec -W1 invocation
$ printf ’small1nsmall2nsmall3nsmall4nweight: huge 6n’ |
time -p paexec -c ~/bin/calc -n +2 -x -W1 | grep -v success
task small1 done
task small2 done
task small3 done
task small4 done
task small5 done
task huge done
real 6.02
user 0.03
sys 0.02
$
For details see the manual page.
The End

Mais conteúdo relacionado

Mais procurados

Using Node.js to Build Great Streaming Services - HTML5 Dev Conf
Using Node.js to  Build Great  Streaming Services - HTML5 Dev ConfUsing Node.js to  Build Great  Streaming Services - HTML5 Dev Conf
Using Node.js to Build Great Streaming Services - HTML5 Dev Conf
Tom Croucher
 
32 shell-programming
32 shell-programming32 shell-programming
32 shell-programming
kayalkarnan
 
Monitoring with Syslog and EventMachine (RailswayConf 2012)
Monitoring  with  Syslog and EventMachine (RailswayConf 2012)Monitoring  with  Syslog and EventMachine (RailswayConf 2012)
Monitoring with Syslog and EventMachine (RailswayConf 2012)
Wooga
 
Quize on scripting shell
Quize on scripting shellQuize on scripting shell
Quize on scripting shell
lebse123
 

Mais procurados (20)

Any event intro
Any event introAny event intro
Any event intro
 
Using Node.js to Build Great Streaming Services - HTML5 Dev Conf
Using Node.js to  Build Great  Streaming Services - HTML5 Dev ConfUsing Node.js to  Build Great  Streaming Services - HTML5 Dev Conf
Using Node.js to Build Great Streaming Services - HTML5 Dev Conf
 
Perl: Coro asynchronous
Perl: Coro asynchronous Perl: Coro asynchronous
Perl: Coro asynchronous
 
Information security programming in ruby
Information security programming in rubyInformation security programming in ruby
Information security programming in ruby
 
32 shell-programming
32 shell-programming32 shell-programming
32 shell-programming
 
Monitoring with Syslog and EventMachine (RailswayConf 2012)
Monitoring  with  Syslog and EventMachine (RailswayConf 2012)Monitoring  with  Syslog and EventMachine (RailswayConf 2012)
Monitoring with Syslog and EventMachine (RailswayConf 2012)
 
More than syntax
More than syntaxMore than syntax
More than syntax
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
Asynchronous Programming FTW! 2 (with AnyEvent)
Asynchronous Programming FTW! 2 (with AnyEvent)Asynchronous Programming FTW! 2 (with AnyEvent)
Asynchronous Programming FTW! 2 (with AnyEvent)
 
Puppet Node Classifiers Talk - Patrick Buckley
Puppet Node Classifiers Talk - Patrick BuckleyPuppet Node Classifiers Talk - Patrick Buckley
Puppet Node Classifiers Talk - Patrick Buckley
 
Implementações paralelas
Implementações paralelasImplementações paralelas
Implementações paralelas
 
Redis as a message queue
Redis as a message queueRedis as a message queue
Redis as a message queue
 
Final opensource record 2019
Final opensource record 2019Final opensource record 2019
Final opensource record 2019
 
Quize on scripting shell
Quize on scripting shellQuize on scripting shell
Quize on scripting shell
 
Ubic
UbicUbic
Ubic
 
The promise of asynchronous php
The promise of asynchronous phpThe promise of asynchronous php
The promise of asynchronous php
 
Memory Manglement in Raku
Memory Manglement in RakuMemory Manglement in Raku
Memory Manglement in Raku
 
ReactPHP
ReactPHPReactPHP
ReactPHP
 
Linux-Fu for PHP Developers
Linux-Fu for PHP DevelopersLinux-Fu for PHP Developers
Linux-Fu for PHP Developers
 
Linux shell
Linux shellLinux shell
Linux shell
 

Semelhante a Paexec — distributes tasks over network or CPUs

101 3.4 use streams, pipes and redirects
101 3.4 use streams, pipes and redirects101 3.4 use streams, pipes and redirects
101 3.4 use streams, pipes and redirects
Acácio Oliveira
 
Lecture 3 Perl & FreeBSD administration
Lecture 3 Perl & FreeBSD administrationLecture 3 Perl & FreeBSD administration
Lecture 3 Perl & FreeBSD administration
Mohammed Farrag
 
Shell Scripts
Shell ScriptsShell Scripts
Shell Scripts
Dr.Ravi
 
Aws dc elastic-mapreduce
Aws dc elastic-mapreduceAws dc elastic-mapreduce
Aws dc elastic-mapreduce
beaknit
 
Aws dc elastic-mapreduce
Aws dc elastic-mapreduceAws dc elastic-mapreduce
Aws dc elastic-mapreduce
beaknit
 

Semelhante a Paexec — distributes tasks over network or CPUs (20)

Paexec -- distributed tasks over network or cpus
Paexec -- distributed tasks over network or cpusPaexec -- distributed tasks over network or cpus
Paexec -- distributed tasks over network or cpus
 
101 3.4 use streams, pipes and redirects
101 3.4 use streams, pipes and redirects101 3.4 use streams, pipes and redirects
101 3.4 use streams, pipes and redirects
 
Tres Gemas De Ruby
Tres Gemas De RubyTres Gemas De Ruby
Tres Gemas De Ruby
 
Perl - laziness, impatience, hubris, and one liners
Perl - laziness, impatience, hubris, and one linersPerl - laziness, impatience, hubris, and one liners
Perl - laziness, impatience, hubris, and one liners
 
EC2
EC2EC2
EC2
 
Workshop on command line tools - day 2
Workshop on command line tools - day 2Workshop on command line tools - day 2
Workshop on command line tools - day 2
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on Lab
 
One-Liners to Rule Them All
One-Liners to Rule Them AllOne-Liners to Rule Them All
One-Liners to Rule Them All
 
From Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOSFrom Zero to Application Delivery with NixOS
From Zero to Application Delivery with NixOS
 
Lecture 3 Perl & FreeBSD administration
Lecture 3 Perl & FreeBSD administrationLecture 3 Perl & FreeBSD administration
Lecture 3 Perl & FreeBSD administration
 
Shell Scripts
Shell ScriptsShell Scripts
Shell Scripts
 
Fullstack conf 2017 - Basic dev pipeline end-to-end
Fullstack conf 2017 - Basic dev pipeline end-to-endFullstack conf 2017 - Basic dev pipeline end-to-end
Fullstack conf 2017 - Basic dev pipeline end-to-end
 
Aws dc elastic-mapreduce
Aws dc elastic-mapreduceAws dc elastic-mapreduce
Aws dc elastic-mapreduce
 
Aws dc elastic-mapreduce
Aws dc elastic-mapreduceAws dc elastic-mapreduce
Aws dc elastic-mapreduce
 
Sge
SgeSge
Sge
 
Dtalk shell
Dtalk shellDtalk shell
Dtalk shell
 
Perly Parsing with Regexp::Grammars
Perly Parsing with Regexp::GrammarsPerly Parsing with Regexp::Grammars
Perly Parsing with Regexp::Grammars
 
DevSecCon London 2017 - MacOS security, hardening and forensics 101 by Ben Hu...
DevSecCon London 2017 - MacOS security, hardening and forensics 101 by Ben Hu...DevSecCon London 2017 - MacOS security, hardening and forensics 101 by Ben Hu...
DevSecCon London 2017 - MacOS security, hardening and forensics 101 by Ben Hu...
 
What's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
What's New in Docker 1.12 by Mike Goelzer and Andrea LuzzardiWhat's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
What's New in Docker 1.12 by Mike Goelzer and Andrea Luzzardi
 
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea LuzzardiWhat's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
What's New in Docker 1.12 (June 20, 2016) by Mike Goelzer & Andrea Luzzardi
 

Mais de Minsk Linux User Group

Mais de Minsk Linux User Group (20)

Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
 Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P... Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
Vladimir ’mend0za’ Shakhov — Linux firmware for iRMC controller on Fujitsu P...
 
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
Андрэй Захарэвіч — Hack the Hackpad: Першая спроба публічнага кіравання задач...
 
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасцьСвятлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
Святлана Ермаковіч — Вікі-дапаможнік. Як узмацніць беларускую вікі-супольнасць
 
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использованияТимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
Тимофей Титовец — Elastic+Logstash+Kibana: Архитектура и опыт использования
 
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSDАндрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
 
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
Vitaly  ̈_Vi ̈ Shukela - My FOSS projectsVitaly  ̈_Vi ̈ Shukela - My FOSS projects
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
 
Vitaly ̈_Vi ̈ Shukela - Dive
Vitaly  ̈_Vi ̈ Shukela - DiveVitaly  ̈_Vi ̈ Shukela - Dive
Vitaly ̈_Vi ̈ Shukela - Dive
 
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизниAlexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
 
Vikentsi Lapa — How does software testing become software development?
Vikentsi Lapa — How does software testing  become software development?Vikentsi Lapa — How does software testing  become software development?
Vikentsi Lapa — How does software testing become software development?
 
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? ПродолжениеМихаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
 
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPNМаксим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
 
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
 
MajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного ДомаMajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного Дома
 
Максим Салов - Отладочный монитор
Максим Салов - Отладочный мониторМаксим Салов - Отладочный монитор
Максим Салов - Отладочный монитор
 
Максим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overviewМаксим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overview
 
Константин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о JenkinsКонстантин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о Jenkins
 
Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»
 
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
 
Vikentsi Lapa - Tools for testing
Vikentsi Lapa - Tools for testingVikentsi Lapa - Tools for testing
Vikentsi Lapa - Tools for testing
 
Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Paexec — distributes tasks over network or CPUs

  • 1. paexec – distributes tasks over network or CPUs Aleksey Cheusov vle@gmx.net Minsk, Belarus, 2013
  • 2. Problem ◮ Huge amount of data to process ◮ Typical desktop machines have more than one CPU ◮ Unlimited resources are available on the Internet ◮ Heterogeneous environment (*BSD, Linux, Windows...)
  • 3. Solution Usage paexec [OPTIONS] -n ’machines or CPUs’ -t ’transport program’ -c ’calculator’ < tasks example ls -1 *.wav | paexec -x -c ’flac -s’ -n +4 > /dev/null example paexec -n ’host1 host2 host3’ -t /usr/bin/ssh -c ~/bin/toupper < tasks
  • 4. Example 1: toupper Our goal is to convert strings to upper case in parallel ˜/bin/toupper #!/usr/bin/awk -f { print " ", toupper($0) print "" # empty line -- end-of-task marker! fflush() # We must flush stdout! } ˜/tmp/tasks apple bananas orange
  • 5. Example 1: toupper ~/bin/toupper script will be run only once on remote servers “server1” and “server2”. It takes tasks from stdin (one task per line). Transport program is ssh(1). paexec invocation $ paexec -t ssh -c ~/bin/toupper -n ’server1 server2’ < tasks > results $ cat results BANANAS ORANGE APPLE $
  • 6. Example 1: toupper Options -l and -r add the task number and server where this task is processed to stdout. paexec -lr invocation $ paexec -lr -t ssh -c ~/bin/toupper -n ’server1 server2’ < tasks > results $ cat results server2 2 BANANAS server2 3 ORANGE server1 1 APPLE $
  • 7. Example 1: toupper The same as above but four instances of ~/bin/toupper are ran locally. paexec invocation $ paexec -n +4 -c ~/bin/toupper < tasks > results $ cat results BANANAS ORANGE APPLE $
  • 8. Example 1: toupper The same as above but without ~/bin/toupper. In this example we run AWK program for each individual task. At the same time we still make only one ssh connection to each server regardless of a number of tasks given on input. paexec invocation $ paexec -x -t ssh -n ’server1 server2’ -c "awk ’BEGIN {print toupper(ARGV[1])}’ " < tasks > results $ cat results ORANGE BANANAS APPLE $
  • 9. Example 1: toupper If we want to ”shquate” less one can use option -C instead of -c and specify command after options. paexec invocation $ paexec -x -C -t ssh -n ’server1 server2’ awk ’BEGIN {print toupper(ARGV[1])}’ < tasks > results $ cat results ORANGE BANANAS APPLE $
  • 10. Example 1: toupper With options -z or -Z we can easily run our tasks on unreliable hosts. If we cannot connect or lost connection to the server, paexec will redistribute failed tasks to another server. paexec invocation $ paexec -Z240 -x -t ssh -n ’server1 badhostname server2’ -c "awk ’BEGIN {print toupper(ARGV[1])}’ " < tasks > results ssh: Could not resolve hostname badhostname: No address associated with hostname badhostname 1 fatal $ cat results ORANGE BANANAS APPLE $
  • 11. Example 2: parallel banner(1) what is banner(1)? $ banner -f @ NetBSD @ @ @@@@@@ @@@@@ @@@@@@ @@ @ @@@@@@ @@@@@ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @ @@@@@ @ @@@@@@ @@@@@ @ @ @ @ @ @ @ @ @ @ @ @ @ @@ @ @ @ @ @ @ @ @ @ @ @@@@@@ @ @@@@@@ @@@@@ @@@@@@ $
  • 12. Example 2: parallel banner(1) ~/bin/pbanner is wrapper for banner(1) for reading tasks line by line. Magic line is used instead empty line as an end-of-task marker. ˜/bin/pbanner #!/usr/bin/env sh while read task; do banner -f M "$task" | echo ”$PAEXEC EOT” # end-of-task marker done tasks pae xec paexec invocation $ paexec -l -mt=’SE@X-L0S0!&’ -c ~/bin/pbanner -n +2 < tasks > result $
  • 13. Example 2: parallel banner(1) paexec(1) reads calculator’s output asynchronously. So, its output is sliced. Sliced result $ cat result 2 2 2 M M MMMMMM MMMM 2 M M M M M 1 1 1 MMMMM MM MMMMMM 1 M M M M M 1 M M M M MMMMM 1 MMMMM MMMMMM M 2 MM MMMMM M 2 MM M M 2 M M M M M 2 M M MMMMMM MMMM 2 1 M M M M 1 M M M MMMMMM 1 $
  • 14. Example 2: parallel banner(1) paexec reorder(1) normalizes paexec(1)’s output. Ordered result $ paexec_reorder -mt=’SE@X-L0S0!&’ results MMMMM MM MMMMMM M M M M M M M M M MMMMM MMMMM MMMMMM M M M M M M M M MMMMMM M M MMMMMM MMMM M M M M M MM MMMMM M MM M M M M M M M M M MMMMMM MMMM $
  • 15. Example 2: parallel banner(1) The same as above but using magic end-of-task marker provided by paexec(1). Ordered result $ paexec -y -lc ~/bin/pbanner -n+2 < tasks | paexec_reorder -y MMMMM MM MMMMMM M M M M M M M M M MMMMM MMMMM MMMMMM M M M M M M M M MMMMMM M M MMMMMM MMMM M M M M M MM MMMMM M MM M M M M M M M M M MMMMMM MMMM $
  • 16. Example 2: parallel banner(1) For this trivial task wrapper like ~/bin/pbanner is not needed. We can easily run banner(1) directly. Sliced result $ paexec -l -x -c banner -n+2 < tasks 2 2 2 M M MMMMMM MMMM 2 M M M M M 2 MM MMMMM M 2 MM M M 2 M M M M M 1 1 1 MMMMM MM MMMMMM 1 M M M M M 1 M M M M MMMMM 1 MMMMM MMMMMM M 2 M M MMMMMM MMMM 2 1 M M M M 1 M M M MMMMMM 1 $
  • 17. Example 3: dependency graph of tasks paexec(1) is able to build tasks taking into account their “dependencies”. Here “devel/gmake” and others are pkgsrc packages. Our goal in this example is to build pkgsrc package audio/abcde and all its build-time and compile-time dependencies. audio/cd-discid audio/abcde textproc/gsed audio/cdparanoia audio/id3v2 audio/id3misc/mkcue shells/bash devel/libtool-base audio/id3libdevel/gmake devel/m4 devel/bison lang/f2c
  • 18. Example 3: dependency graph of tasks ‘‘paexec -g’’ takes a dependency graph on input (in tsort(1) format). Tasks are separated by space (paexec -md=). ˜/tmp/packages to build audio/cd-discid audio/abcde textproc/gsed audio/abcde audio/cdparanoia audio/abcde audio/id3v2 audio/abcde audio/id3 audio/abcde misc/mkcue audio/abcde shells/bash audio/abcde devel/libtool-base audio/cdparanoia devel/gmake audio/cdparanoia devel/libtool-base audio/id3lib devel/gmake audio/id3v2 audio/id3lib audio/id3v2 devel/m4 devel/bison lang/f2c devel/libtool-base devel/gmake misc/mkcue devel/bison shells/bash
  • 19. Example 3: dependency graph of tasks If option -g is applied, every task may succeed or fail. In case of failure all dependants fail recursively. For this to work we have to slightly adapt “calculator”. ˜/bin/pkg builder #!/usr/bin/awk -f { print "build " $0 print ”success” # build succeeded! (paexec -ms=) print "" # end-of-task marker fflush() # we must flush stdout }
  • 20. Example 3: dependency graph of tasks paexec -g invocation (no failures) $ paexec -g -l -c ~/bin/pkg_builder -n ’server2 server1’ -t ssh < ~/tmp/packages_to_build | paexec_reorder > result $ cat result build textproc/gsed success build devel/gmake success build misc/mkcue success build devel/m4 success build devel/bison success ... build audio/id3v2 success build audio/abcde success $
  • 21. Example 3: dependency graph of tasks Let’s suppose that “devel/gmake” fails to build. ˜/bin/pkg builder #!/usr/bin/awk -f { print "build " $0 if ($0 == "devel/gmake") print ”failure” # Oh no... else print "success" # build succeeded! print "" # end-of-task marker fflush() # we must flush stdout }
  • 22. Example 3: dependency graph of tasks Package “devel/gmake” and all dependants are marked as failed. Even if failures happen, the build continues. paexec -g invocation (with failures) $ paexec -gl -c ~/bin/pkg_builder -n ’server2 server1’ -t ssh < ~/tmp/packages_to_build | paexec_reorder > result $ cat result build audio/cd-discid success build audio/id3 success build devel/gmake failure devel/gmake audio/cdparanoia audio/abcde audio/id3v2 misc/mkcue build devel/m4 success build textproc/gsed success ... $
  • 23. Example 3: dependency graph of tasks paexec is resistant not only to network failures but also to unexpected calculator exits or crashes. ˜/bin/pkg builder #!/usr/bin/awk -f { "hostname -s" | getline hostname print "build " $0 " on " hostname if (hostname == "server1" && $0 == "textproc/gsed") exit 139 # Damn it, I’m dying... # Take a note that exit status doesn’t matter. else print "success" # Yes! :-) print "" # end-of-task marker fflush() # we must flush stdout }
  • 24. Example 3: dependency graph of tasks “textproc/gsed” failed on “server1” but then succeeded on “server2”. Every 300 seconds we try to reconnect to “server1”. Keywords “success”, “failure” and “fatal” may be changed with a help of -ms=, -mf= and -mF= options respectively. paexec -Z300 invocation (with failure) $ paexec -gl -Z300 -t ssh -c ~/bin/pkg_builder -n ’server2 server1’ < ~/tmp/packages_to_build | paexec_reorder > result $ cat result build audio/cd-discid on server2 success build textproc/gsed on server1 fatal build textproc/gsed on server2 success build audio/id3 on server2 success ... $
  • 25. Example 4: Converting .wav files to .flac or .ogg In trivial cases we don’t need relatively complex “calculator”. Running a trivial command may be enough. Below we run three .wav to .flac/.ogg convertors in parallel. If -x is applied, task is passed to calculator as an argument. paexec -x invocation $ ls -1 *.wav | paexec -x -c ’flac -s’ -n+3 >/dev/null $ paexec -x invocation $ ls -1 *.wav | paexec -ixC -n+3 oggenc -Q | grep . 01-Crying_Wolf.wav 02-Autumn.wav 03-Time_Heals.wav 04-Alice_(Letting_Go).wav 05-This_Side_Of_The_Looking_Glass.wav ... $
  • 26. Example 5: paexec -W If different tasks take different amount of time to process, than it makes sense to process “heavier” ones earlier in order to minimize total calculation time. For this to work one can weigh each tasks. Note that this mode enables “graph” mode automatically. ˜/bin/calc #!/bin/sh # $1 -- task given on input if test $1 = huge; then sleep 6 else sleep 1 fi echo "task $1 done"
  • 27. Example 5: paexec -W This is how we run unweighted tasks. The whole process takes 8 seconds. paexec invocation $ printf ’small1nsmall2nsmall3nsmall4nsmall5nhugen’ | time -p paexec -c ~/bin/calc -n +2 -xg | grep -v success task small2 done task small1 done task small3 done task small4 done task small5 done task huge done real 8.04 user 0.03 sys 0.03 $
  • 28. Example 5: paexec -W If we say paexec that the task “huge” is performed 6 times longer than others, it starts“huge” first and then others. In total we spend 6 seconds for all tasks. paexec -W1 invocation $ printf ’small1nsmall2nsmall3nsmall4nweight: huge 6n’ | time -p paexec -c ~/bin/calc -n +2 -x -W1 | grep -v success task small1 done task small2 done task small3 done task small4 done task small5 done task huge done real 6.02 user 0.03 sys 0.02 $
  • 29. For details see the manual page. The End