How to avoid Benchmark Stuff ("BS") when evaluating the performance of code. This installment uses "time" to compare the execution speed of Perl and various shell commands, with and without plumbing.
2. “Perl is too slow”
Heard that before? Yeah...
Mostly wrong – can't refute it without data.
Need to benchmark the times.
3. Damn lies...
Good benchmarks find realistic times.
Most benchmarks prove a point.
They get ignored.
Ignored results are not lazy.
4. Benchmarking perl
The *NIX “time” command.
Good enough to answer most questions.
Avoids much Benchmarking Stuff (“BS”).
5. Simplest tool: “time”
real, system, and user times.
real time heavily affected by system load.
system + user better indication of “work”.
real – work = blocked.
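The arithmetic on this slide can be sketched in a few lines of shell. This is only an illustration: "blocked_time" is a hypothetical helper name, and the three figures passed to it are example values in seconds of the kind "time" reports.

```shell
#!/usr/bin/env bash
# Sketch: blocked = real - (user + sys), all values in seconds.
# blocked_time is a hypothetical helper, not part of any standard tool.
blocked_time() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.3f\n", r - (u + s) }'
}

# Illustrative figures: 0.080s real, 0.020s user, 0.050s sys.
blocked_time 0.080 0.020 0.050    # prints 0.010
```

A result close to the real time means the command mostly waited; a result close to zero means it mostly worked.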
6. “bash takes less time to start up”
perl isn't any slower:
Zero work for both.
Real is all blocked.
$ time perl -e 0
real    0m0.005s
user    0m0.000s
sys     0m0.000s
$ time bash /dev/null
real    0m0.005s
user    0m0.000s
sys     0m0.000s
7. BS: Startup Times
If something just ran it is probably in core.
Saves overhead running it the second time.
Run everything twice to benchmark startups.
Multiple runs or single-user mode help manage background noise.
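A minimal sketch of "run everything twice", assuming bash (it relies on the bash "time" keyword and its TIMEFORMAT variable). "warm_time" is a hypothetical helper: it runs the command once to get it into core, discards that run, then prints real, user and sys seconds for each timed run.

```shell
#!/usr/bin/env bash
# Sketch: warm up once, then time N runs of the same command.
# warm_time is a hypothetical helper; it needs bash for "time"/TIMEFORMAT.
warm_time() {
    local runs=$1; shift
    local TIMEFORMAT='%R %U %S'      # real, user, sys in seconds
    local i=0

    "$@" >/dev/null 2>&1             # warm-up run, timing discarded

    while [ "$i" -lt "$runs" ]; do
        # "time" reports on stderr of the group; send it to stdout.
        { time "$@" >/dev/null 2>&1; } 2>&1
        i=$((i + 1))
    done
}

# Three timed runs of the startup test from slide 6.
warm_time 3 bash /dev/null
```

Comparing the lines shows how much the numbers wander between runs; the smallest real time is usually the one least disturbed by background noise.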
8. Minimizing startup issues
Save kernel calls, context switches, interrupts, latency, transfer I/O...
tmpfs on linux minimizes overhead.
Test with un-loaded system.
Avoid “virtual” systems (CPU, EMC) unless that is what you are testing.
9. What does startup time tell us?
Opterons are fast?
Useless by itself.
Necessary baseline.
Differences are a warning.
10. Analyzing startup times.
Big differences usually indicate a problem:
Mis-compiled: “-O0” “-g” on production code.
Mixing 32- and 64-bit code and O/S.
Background noise from other running jobs.
Botched startups leave everything else suspect.
11. Do something!
OK, let's time an operation.
Listing a directory is common enough.
“ls” lists the contents, sorts lexically.
Perl's “glob” is similar.
12. Trivial pursuit: ls vs glob.
Mostly blocked: 7ms bash vs. 19ms perl.
Failing to clear the screen can skew results!
Remote display, virtual machines.
lembark@dizzy etl $ time bash -c '/bin/ls -d /tmp/*'
real    0m0.007s
user    0m0.000s
sys     0m0.000s
lembark@dizzy etl $ time perl -e '$\="\n"; $,=" "; print glob "/tmp/*"'
real    0m0.019s
user    0m0.010s
sys     0m0.000s
13. BS: Milliseconds matter
Really care about 12ms? OK, perl is slower.
Most of the difference is in blocked time.
Hint: perl and shell block at the same rate.
perl compiles a statement, which adds overhead.
Use “ls” for what it is.
14. Doing more
Search files using their basenames:
Find all of the basenames from “2012.05.05” through “2012.05.16”.
First step: How many files are there?
15. Times
Compare File::Find with /bin/find.
Roughly same system time, added user for compile.
Shell is faster because it is single-purpose.
$ time find . -type f | wc -l;
18583
real    0m0.080s
user    0m0.020s
sys     0m0.050s
$ time perl -MFile::Find -e 'my $i = 0; find sub { -l or -d or ++$i },"."; print $i, "\n"'
18583
real    0m0.274s
user    0m0.220s
sys     0m0.050s
16. Multi-layer pipes
Compare the basename to a regex.
Shell:
find . -type f | xargs -l1 basename |
egrep -E '2012.05.(?:0[5-9]|1[0-6])'
Find files, extract basenames, and search with extended syntax (largely borrowed from Perl).
One-liner with perl, File::Find & File::Basename.
17. BS: Forks & pipes are “free”.
Real, user, and system time are higher for bash.
xargs has to fork/exec many copies of basename.
system overhead from buffering pipes is also higher.
Plumbing is expensive!
$ time find . -type f | xargs -l1 basename | egrep -E '2012.05.(?:0[5-9]|1[0-6])' | wc -l
1604
real    0m29.823s
user    0m0.710s
sys     0m4.220s
$ time perl -MFile::Find=find -MFile::Basename=basename -e 'my $i=0; find sub { -l || -d and return;
/2012.05.(?:0[5-9]|1[0-6])/ and ++$i }, "."; print $i, "\n"'
1604
real    0m0.301s
user    0m0.170s
sys     0m0.130s
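One way to check how much of the shell time comes from fork/exec rather than from the search itself is to strip the directory part with a single sed process instead of one basename per file. The sketch below is not from the original talk: "count_matches" is a hypothetical helper, the group is written without "?:" to stay within plain extended regex syntax, and it assumes file names contain no newlines.

```shell
#!/usr/bin/env bash
# Sketch: same search, but one sed process replaces thousands of
# basename forks. count_matches is a hypothetical helper.
count_matches() {
    find "$1" -type f |
        sed 's|.*/||' |                          # keep text after the last "/"
        grep -Ec '2012.05.(0[5-9]|1[0-6])'       # count matching basenames
}

# Usage (not run here): time count_matches .
```

The pipeline still has three stages, but each stage is a single process, so the remaining cost is the pipe buffering rather than the forks.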
18. Replacing content “in place”
perl's “-i” replaces files in place.
Shell pre-opens files, can't “sort -d < a > a”.
Shell requires “sort -d < a > b && mv b a”.
Now imagine filtering a few thousand files...
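For comparison, the temp-file-and-rename pattern applied to a list of files might look like the sketch below. "sort_in_place" is a hypothetical helper, and the ".XXXXXX" temporary name is just a mktemp template; note that the renamed file takes on the temporary file's permissions.

```shell
#!/usr/bin/env bash
# Sketch: "in place" sorting from the shell, one temp file per input.
# sort_in_place is a hypothetical helper; f and tmp are plain globals.
sort_in_place() {
    for f in "$@"; do
        tmp=$(mktemp "$f.XXXXXX") || return 1
        if sort -d < "$f" > "$tmp"; then
            mv "$tmp" "$f"           # replace the original only on success
        else
            rm -f "$tmp"             # leave the original untouched on failure
            return 1
        fi
    done
}

# Usage (not run here): sort_in_place *.txt
```

Each file costs a mktemp, a sort and a mv, which is the per-file fork overhead the previous slides measured.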
19. perl -n & -p with -i
Say you have to update the package names for a
few hundred modules from “::Source” to “::RDS”.
Mixing shell with perl:
find . -type f |
xargs perl -i -p -e 's/::Source\b/::RDS/g';
Exercise: Try writing this in pure shell.
20. Running it doesn't take long either
Nice division of labor:
find & xargs deal with the names.
perl deals with the regex.
not much typing either way.
not much time either.
$ time find . -type f | xargs perl -i -p -e 's/::Source\b/::RDS/g'
real    0m0.112s
user    0m0.044s
sys     0m0.016s
21. What this means to you.
Plumbing and forks are not free.
Single-purpose programs faster for one thing.
Chaining the simpler tools adds overhead.
Languages faster for multi-stage tasks.