Linux intro 5 extra: makefiles
1. Programming for Evolutionary Biology
March 17th - April 1st 2012
Leipzig, Germany
Introduction to Unix systems
Extra: writing simple pipelines with make
Giovanni Marco Dall'Olio
Universitat Pompeu Fabra
Barcelona (Spain)
2. GNU/make
make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters
It is a declarative programming language
It belongs to a class of software called 'automated build tools'
3. Simplest Makefile example
The simplest Makefile contains just the name of a task and the commands associated with it:
print_hello:
	echo 'Hello, world!!'
print_hello is a makefile 'rule': it stores the commands needed to print 'Hello, world!!' to the screen.
4. Simplest Makefile example
Annotated figure of a Makefile rule:
the target of the rule
the commands associated with the rule
the indentation before each command is a tabulation (not 8 spaces)
5. Simplest Makefile example
Create a file on your computer and save it as 'Makefile'.
Write these instructions in it:
print_hello:
	echo 'Hello, world!!'
(the command line starts with a tabulation: the <Tab> key)
Then, open a terminal and type:
make -f Makefile print_hello
7. Simplest Makefile example - explanation
When invoked, the program 'make' looks for a file in the
current directory called 'Makefile'
When we type 'make print_hello', it executes the rule whose target is 'print_hello' in the makefile
It then shows the commands executed and their output
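The whole sequence can be tried in one go. This is only a sketch, assuming GNU make and a POSIX shell are available; it works in a temporary directory, and printf is used here just so that the tab is visible as '\t'.

```shell
# Work in a throwaway directory so no existing Makefile is overwritten.
cd "$(mktemp -d)"

# printf '%b\n' prints one line per argument and turns '\t' into a real tab.
printf '%b\n' \
  'print_hello:' \
  '\techo "Hello, world!!"' > Makefile

# make first echoes the command line, then the command prints its message.
make print_hello
```

Two lines should appear on the screen: the echo command itself, and then its output.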
8. Tip1: the 'Makefile' file
The '-f' option allows you to name the file which contains the instructions for make
If you omit this option, make will look for a file called 'Makefile' in the current directory (GNU make also accepts 'GNUmakefile' and 'makefile')
make -f Makefile all
is equivalent to:
make all
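A small sketch of the difference, assuming GNU make and a POSIX shell; the file name 'pipeline.mk' is hypothetical and only used for illustration.

```shell
cd "$(mktemp -d)"

# Save the rules under a non-default name.
printf '%b\n' \
  'all:' \
  '\t@echo "rules read from pipeline.mk"' > pipeline.mk

# No file called 'Makefile' exists here, so the file must be named with -f.
make -f pipeline.mk all

# Once the same rules are saved as 'Makefile', the -f option can be dropped.
cp pipeline.mk Makefile
make all
```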
9. A slightly longer example
You can add as many commands as you like to a rule
For example, this 'print_hello' rule contains 5 commands
Note: ignore the '@' prefix, it is only there to disable verbose mode (explained later)
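The rule shown on the original slide is not reproduced here, so the five commands below are only a hypothetical stand-in (assuming GNU make and a POSIX shell):

```shell
cd "$(mktemp -d)"

# One rule, five commands; each command sits on its own tab-indented line.
printf '%b\n' \
  'print_hello:' \
  '\t@echo "Hello, world!!"' \
  '\t@echo "This rule runs five commands."' \
  '\t@echo "The current directory is:"' \
  '\t@pwd' \
  '\t@echo "Done."' > Makefile

# The commands are executed one after the other, from top to bottom.
make print_hello
```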
11. Make - advantages
Make allows you to save shell commands along with their parameters and re-execute them;
It allows you to use command-line tools, which are more flexible;
Combined with revision control software, it makes it possible to reproduce all the operations made to your data;
13. The target syntax
Makefile syntax:
<target>: (prerequisites)
	<commands associated to the rule>
14. The target syntax
The target of a rule can be either a title for the task, or a file name.
Every time you call a make rule (example: 'make all'), the program looks for a file named after the target (e.g. 'all', 'clean', 'inputdata.txt', 'results.txt')
The rule is executed only if that file doesn't exist.
15. Filename as target names
In this makefile, we have two rules: 'testfile.txt' and 'clean'
16. Filename as target names
When we call 'make testfile.txt', make checks if a file called 'testfile.txt' already exists.
17. Filename as target names
The commands associated with the rule 'testfile.txt' are executed only if that file doesn't exist already
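The makefile from the slides is not available, so here is a minimal sketch of two such rules, assuming GNU make and a POSIX shell; the commands inside the rules are my own guess.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'testfile.txt:' \
  '\t@echo "creating testfile.txt"' \
  '\t@touch testfile.txt' \
  'clean:' \
  '\t@rm -f testfile.txt' > Makefile

make testfile.txt   # the file is missing, so the commands are executed
make testfile.txt   # the file exists now, so make reports it is up to date
make clean          # no file called 'clean' exists, so this rule always runs
make testfile.txt   # the file is gone again, so it is recreated
```

Note that 'clean' works as a task name only as long as no file called 'clean' appears in the directory.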
18. Multiple target definition
A target can also be a list of files
You can retrieve the matched target with the special variable $@
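A possible sketch of a rule with several targets, assuming GNU make and a POSIX shell; the file names and the extra 'all' rule are hypothetical.

```shell
cd "$(mktemp -d)"

# One rule, three targets: $@ is replaced by whichever file is being built.
printf '%b\n' \
  'all: file1.txt file2.txt file3.txt' \
  'file1.txt file2.txt file3.txt:' \
  '\ttouch $@' > Makefile

# The same command is executed three times, once per target.
make all
ls
```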
19. Special characters
The % character can be used as a wildcard
For example, a rule with the target:
%.txt:
	....
would be activated by any file ending with '.txt'
'make 1.txt', 'make 2.txt', etc.
We will be able to retrieve the matched expression (the part that replaced the %) with '$*'
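To see the difference between '$@' and '$*', something like this can be tried (a sketch, assuming GNU make and a POSIX shell; the file names are arbitrary):

```shell
cd "$(mktemp -d)"

# $@ is the full target name, $* is only the part matched by %.
printf '%b\n' \
  '%.txt:' \
  '\t@echo "target: $@, matched: $*"' > Makefile

make 1.txt
make sample.txt
```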
21. Makefile – cluster support
Note that in the previous example we created three files with a single call to make, by executing the command 'touch' three times
If we use the '-j' option when invoking make (for example 'make -j 3'), the three processes will be launched in parallel
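A rough way to observe the effect, assuming GNU make and a POSIX shell: each hypothetical target below sleeps for one second, so the parallel run should finish noticeably sooner than the sequential one. The elapsed times depend on the machine, so none are shown.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'all: file1.txt file2.txt file3.txt' \
  'file1.txt file2.txt file3.txt:' \
  '\tsleep 1; touch $@' \
  'clean:' \
  '\trm -f file1.txt file2.txt file3.txt' > Makefile

# Sequential run: the three commands are executed one after the other.
start=$(date +%s)
make all
echo "sequential: $(( $(date +%s) - start )) seconds"

make clean

# Parallel run: up to three commands are executed at the same time.
start=$(date +%s)
make -j 3 all
echo "parallel: $(( $(date +%s) - start )) seconds"
```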
22. The commands syntax
Makefile syntax:
<target>: (prerequisites)
	<commands associated to the rule>
23. Deactivating verbose mode
By default make prints every command before executing it. You can deactivate this verbose mode for a line by adding '@' at its beginning
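The difference can be seen with two hypothetical rules, identical except for the '@' (a sketch, assuming GNU make and a POSIX shell):

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'loud:' \
  '\techo "verbose mode"' \
  'quiet:' \
  '\t@echo "silent mode"' > Makefile

make loud    # prints the command line, then its output (two lines)
make quiet   # prints only the output (one line)
```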
24. Skipping errors
The modifier '-' tells make to ignore errors returned by a command
Example:
'mkdir /var' will cause an error (the '/var' directory already exists) and cause GNU make to exit
'-mkdir /var' will cause an error anyway, but GNU make will ignore it
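The same behaviour can be reproduced safely with a directory of our own instead of '/var'. This is a sketch, assuming GNU make and a POSIX shell; 'existing_dir' and the two rule names are hypothetical.

```shell
cd "$(mktemp -d)"
mkdir existing_dir

printf '%b\n' \
  'strict:' \
  '\tmkdir existing_dir' \
  '\t@echo "never reached"' \
  'lenient:' \
  '\t-mkdir existing_dir' \
  '\t@echo "still reached"' > Makefile

# mkdir fails because the directory exists: make stops at that line.
make strict || echo "make strict stopped with an error"

# Same failure, but the '-' prefix lets make carry on with the next command.
make lenient
```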
25. Moving through directories
A big issue with make is that every line is executed as a different shell process.
So, this:
lsvar:
	cd /var
	ls
won't work (it will list only the files in the current directory, not /var)
The solution is to put everything in a single process:
lsvar:
	(cd /var; ls)
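A sketch of both versions on a directory of our own (assuming GNU make and a POSIX shell; 'subdir' and 'inside.txt' are hypothetical names). The working variant here uses 'cd subdir && ls', which behaves like the parenthesised form but also skips 'ls' if the 'cd' fails.

```shell
cd "$(mktemp -d)"
mkdir subdir
touch subdir/inside.txt

printf '%b\n' \
  'broken:' \
  '\tcd subdir' \
  '\tls' \
  'fixed:' \
  '\tcd subdir && ls' > Makefile

# Each line runs in its own shell, so 'ls' lists the current directory.
make -s broken

# Both commands share one shell, so 'ls' lists the content of subdir.
make -s fixed
```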
27. The commands syntax
Makefile syntax:
<target>: (prerequisites)
	<commands associated to the rule>
We will now look at the 'prerequisites' part of a make rule, which was skipped before
28. Real Makefile-rule syntax
Complete syntax for a Makefile rule:
<target>: <list of prerequisites>
	<commands associated to the rule>
Example:
result1.txt: data1.txt data2.txt
	cat data1.txt data2.txt > result1.txt
	@echo 'result1.txt has been calculated'
Prerequisites are files (or rules) that need to exist already in order to create the target file.
If 'data1.txt' and 'data2.txt' don't exist, and there is no rule to create them, 'make result1.txt' will exit with an error
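Both situations (prerequisites missing, then present) can be tried like this. It is a sketch, assuming GNU make and a POSIX shell; the content of the data files is made up.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'result1.txt: data1.txt data2.txt' \
  '\tcat data1.txt data2.txt > result1.txt' \
  '\t@echo "result1.txt has been calculated"' > Makefile

# The prerequisites are missing and no rule creates them: make gives up.
make result1.txt || echo "make stopped: the prerequisites do not exist yet"

# Once both input files exist, the rule can be executed.
echo "first half" > data1.txt
echo "second half" > data2.txt
make result1.txt
cat result1.txt
```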
29. Piping Makefile rules together
You can pipe two Makefile rules together by defining prerequisites
30. Piping Makefile rules together
The rule 'result1.txt' depends on the rule 'data1.txt', which should be executed first
31. Piping Makefile rules together
Let's look at this example again: what happens if we remove the file 'result1.txt' we just created?
32. Piping Makefile rules together
The second time we run the 'make result1.txt' command, it is not necessary to create data1.txt
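The makefile used on these slides is not shown, so the two rules below are only a guess at its shape (assuming GNU make and a POSIX shell):

```shell
cd "$(mktemp -d)"

# result1.txt needs data1.txt, and data1.txt has its own rule.
printf '%b\n' \
  'result1.txt: data1.txt' \
  '\t@echo "calculating result1.txt"' \
  '\t@cp data1.txt result1.txt' \
  'data1.txt:' \
  '\t@echo "creating data1.txt"' \
  '\t@echo "some input" > data1.txt' > Makefile

# First run: data1.txt is created first, then result1.txt.
make result1.txt

# Remove only the result: data1.txt is still there and is not recreated.
rm result1.txt
make result1.txt
```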
33. Other pipe example
all: result1.txt result2.txt
result1.txt: data1.txt calculate_result.py
	python calculate_result.py --input data1.txt
result2.txt: data2.txt
	cut -f 1,3 data2.txt > result2.txt
'make all' will calculate result1.txt and result2.txt if they don't exist already (or if they are older than their prerequisites)
34. Conditional execution by modification date
We have seen how make can be used to create a file, if it doesn't exist.
# if file.txt doesn't exist, then create it:
file.txt:
	echo 'contents of file.txt' > file.txt
We can do better: create or update a file only if it is missing or older than its prerequisites
35. Conditional execution by modification date
Let's have a better look at this example:
result1.txt: data1.txt calculate_result.py
	python calculate_result.py --input data1.txt
A great feature of make is that it executes a rule not only if the target file doesn't exist, but also if it has a 'last modification date' earlier than any of its prerequisites
36. Conditional execution by modification date
result1.txt: data1.txt
	@sed 's/b/B/i' data1.txt > result1.txt
	@echo 'result1.txt has been calculated'
In this example, result1.txt will be recalculated every time 'data1.txt' is modified
$: touch data1.txt calculate_result.py
$: make result1.txt
result1.txt has been calculated
$: make result1.txt
make: 'result1.txt' is up to date.
$: touch data1.txt
$: make result1.txt
result1.txt has been calculated
37. Conditional execution - applications
This 'conditional execution by modification date comparison' feature of make is very useful
Let's say you discover an error in one of your input data files: you will be able to repeat the analysis by executing only the operations needed
You can also use it to recalculate results every time you modify a script:
result.txt: scripts/calculate_result.py
	python scripts/calculate_result.py > result.txt
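The same idea can be tried without Python, using a tiny shell script as the prerequisite. This is a sketch, assuming GNU make and a POSIX shell; 'calculate_result.sh' is a hypothetical stand-in for the real analysis script.

```shell
cd "$(mktemp -d)"
mkdir scripts
echo 'echo "result from version 1"' > scripts/calculate_result.sh

printf '%b\n' \
  'result.txt: scripts/calculate_result.sh' \
  '\tsh scripts/calculate_result.sh > result.txt' > Makefile

make result.txt   # result.txt is missing, so the script is executed
make result.txt   # nothing changed, so nothing is executed

# Wait a moment so the edited script gets a clearly newer timestamp.
sleep 1
echo 'echo "result from version 2"' > scripts/calculate_result.sh

make result.txt   # the script is newer than result.txt: recalculate
cat result.txt
```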
40. Variables and functions
You may have already noticed that Make's syntax is really old :)
In fact, it is a ~40-year-old language
It uses special variables like $@ and $^, and it can be worse than Perl!!!
(Perl developers - please don't get mad at me :) )
41. Variables
Variables are declared with a '=' and by convention are upper case.
They are referenced by wrapping their name in '$()'
For example, WORKING_DIR is a variable, and $(WORKING_DIR) gives its value
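A short sketch of a variable in use, assuming GNU make and a POSIX shell; the value 'analysis' and the two rules are hypothetical, only the name WORKING_DIR comes from the slide.

```shell
cd "$(mktemp -d)"

# WORKING_DIR is declared once and reused in a target and in commands.
printf '%b\n' \
  'WORKING_DIR = analysis' \
  'show_dir:' \
  '\t@echo "working directory: $(WORKING_DIR)"' \
  '$(WORKING_DIR)/notes.txt:' \
  '\tmkdir -p $(WORKING_DIR)' \
  '\techo "saved in $(WORKING_DIR)" > $@' > Makefile

make show_dir
make analysis/notes.txt
cat analysis/notes.txt
```

Changing the first line of the Makefile is then enough to move the whole analysis to another directory.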
42. Special variables - $@
Make uses some custom variables, with a syntax similar to Perl
'$@' always corresponds to the target name:
$: cat >Makefile
%.txt:
	echo $@
$: make filename.txt
echo filename.txt
filename.txt
($@ took the value of 'filename.txt')
43. Other special variables
$@    The rule's target
$<    The rule's first prerequisite
$?    All the rule's out-of-date prerequisites
$^    All prerequisites
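One way to get familiar with these variables is to print them from inside a rule. This is a sketch, assuming GNU make and a POSIX shell; all the file names are hypothetical.

```shell
cd "$(mktemp -d)"
echo "one" > part1.txt
echo "two" > part2.txt

printf '%b\n' \
  'report.txt: part1.txt part2.txt' \
  '\t@echo "target: $@"' \
  '\t@echo "first prerequisite: $<"' \
  '\t@echo "all prerequisites: $^"' \
  '\t@echo "newer than target: $?"' \
  '\t@cat $^ > $@' > Makefile

# report.txt does not exist yet, so every prerequisite counts as newer.
make report.txt

# Wait a moment, then modify only part2.txt: $? now lists just that file.
sleep 1
touch part2.txt
make report.txt
```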
44. Functions
Usually you don't want to declare functions in make, but there are some builtin utilities that can be useful
Most frequently used functions:
$(addprefix <prefix>,<list>)
→ adds a prefix to each element of a space-separated list
example:
FILES = file1 file2 file3
$(addprefix /home/user/data/,$(FILES))
→ /home/user/data/file1 /home/user/data/file2 /home/user/data/file3
$(addsuffix <suffix>,<list>) works similarly
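Both functions can be checked quickly by printing their result from a rule (a sketch, assuming GNU make and a POSIX shell; the 'show' rule is hypothetical):

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'FILES = file1 file2 file3' \
  'show:' \
  '\t@echo "$(addprefix /home/user/data/,$(FILES))"' \
  '\t@echo "$(addsuffix .txt,$(FILES))"' > Makefile

# Prints the list with the prefix added, then the list with the suffix added.
make show
```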
45. Full makefile example
INPUTFILES = lower_DAF lower_maf upper_maf lower_daf upper_daf
RESULTSDIR = ./results
RESULTFILES = $(addprefix $(RESULTSDIR)/,$(addsuffix _filtered.txt,$(INPUTFILES)))
help:
	@echo 'type "make all" to calculate results'
all: $(RESULTFILES)
$(RESULTSDIR)/%_filtered.txt: data/%.txt src/filter_genes.py
	python src/filter_genes.py --genes data/Genes.txt --window $< --output $@
It looks very complicated, but in the end you always use the same Makefile structure
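The same structure can be tried on toy data. The sketch below assumes GNU make and a POSIX shell, and replaces the filter_genes.py step with a plain 'grep', so the sample names, the data and the filter are all hypothetical.

```shell
cd "$(mktemp -d)"
mkdir data
printf 'gene1\nother\n' > data/sample_a.txt
printf 'gene2\nother\n' > data/sample_b.txt

# Same layout as above: a list of inputs, a list of results built with
# addprefix/addsuffix, an 'all' rule and one pattern rule doing the work.
printf '%b\n' \
  'INPUTFILES = sample_a sample_b' \
  'RESULTSDIR = results' \
  'RESULTFILES = $(addprefix $(RESULTSDIR)/,$(addsuffix _filtered.txt,$(INPUTFILES)))' \
  'all: $(RESULTFILES)' \
  '$(RESULTSDIR)/%_filtered.txt: data/%.txt' \
  '\t@mkdir -p $(RESULTSDIR)' \
  '\tgrep gene $< > $@' > Makefile

# One call produces one filtered file per input sample.
make all
ls results
```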
47. Testing a makefile
make -n: only shows the commands that would be executed, without running them
You can pass variables to make:
$: make say_hello MYNAME="Giovanni"
hello, Giovanni
Strongly suggested: use a Revision Control Software with support for branching (git, hg, bazaar) and create a branch for testing
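The 'say_hello' rule itself is not shown on the slide, so the version below is a guess (assuming GNU make and a POSIX shell); the default value 'stranger' and the 'output.txt' rule are hypothetical.

```shell
cd "$(mktemp -d)"

printf '%b\n' \
  'MYNAME = stranger' \
  'say_hello:' \
  '\t@echo "hello, $(MYNAME)"' \
  'output.txt:' \
  '\ttouch output.txt' > Makefile

# The value given on the command line replaces the one in the Makefile.
make say_hello
make say_hello MYNAME="Giovanni"

# With -n the command is printed but not executed: no file is created.
make -n output.txt
ls
```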
48. Another complex Makefile example
# make masked sequence
myseq.m: myseq
	rmask myseq > myseq.m
# run blast on masked seq
blastout: mydb.psq myseq.m
	blastx mydb myseq.m > blastout
	echo "ran blast!"
# index blastable db
mydb.psq: mydb
	formatdb -p T mydb
# rules follow this pattern:
# target: subtarget1, ..., subtargetN
#	shell command 1
#	shell command 2...
our starting point is the file myseq, the end point is the blast results blastout
we first want to mask out any repeats using rmask to create myseq.m
we then blastx myseq.m against a protein db called mydb
before blastx is run the protein db must be indexed using formatdb
(slide taken from biomake web site)
49. The “make” command
# run blast on masked seq
blastout: mydb.psq myseq.m
	blastx mydb myseq.m > blastout
	echo "ran blast!"
# index blastable db
mydb.psq: mydb
	formatdb -p T mydb
# make masked sequence
myseq.m: myseq
	rmask myseq > myseq.m
% make blastout
formatdb -p T mydb
rmask myseq > myseq.m
blastx mydb myseq.m > blastout
% make blastout
make: 'blastout' is up to date
% cat newseqs >> mydb
% make blastout
formatdb -p T mydb
blastx mydb myseq.m > blastout
make uses unix file modification timestamps when checking dependencies
if a subtarget is more recent than the goal target, then reexecute action
(slide taken from biomake web site)
50. BioMake and alternatives
BioMake is an alternative to make, designed for use in bioinformatics
Developed to annotate the Drosophila melanogaster genome (Berkeley university)
Cleaner syntax, derived from Prolog
Separates the rule's name from the name of the target files
51. A BioMake example
formatdb(DB)
req: DB
run: formatdb DB
comment: prepares blastdb for blasting (wublast)
rmask(Seq)
flat: masked_seqs/Seq.masked
req: Seq
srun: RepeatMasker -lib $(LIB) Seq
comment: masks out repeats from input sequence
mblastx(Seq,DB)
flat: blast_results/Seq.DB.blastx
req: formatdb(DB) rmask(Seq)
srun: blastx -filter SEG+XNU DB rmask(Seq)
comment: this target is for the results of running blastx on
a masked input genomic sequence (wublast)
(slide taken from biomake web site)
52. Other alternatives
There are many other alternatives to make:
BioMake (Prolog?)
o/q/dist/etc.. make
Ant (Java)
SCons (Python)
Paver (Python)
Waf (Python)
This list is biased because I am a Python programmer :)
These tools are more oriented to software development
53. Conclusions
Make is very basic for bioinformatics
It is useful for the simpler tasks:
Logging the operations made to your data files
Working with clusters
Avoiding recalculations
Applying a pipeline to different datasets
It is installed on almost any Unix system and has a standard syntax (interchangeable, reproducible)
Study it and understand its logic. Use it in the most basic way,
without worrying about prerequisites and special variables.
Later you can look for easier tools (biomake, rake, taverna,
54. Suggested readings
Software Carpentry for bioinformatics
http://swc.scipy.org/lec/build.html
A Makefile is a pipeline
http://www.nodalpoint.org/2007/03/18/a_pipeline_is_a_makefil
BioMake and SKAM
http://skam.sourceforge.net/
BioWiki Make Manifesto
http://biowiki.org/MakefileManifesto
Discussion on the BIP mailing list
http://www.mailarchive.com/biologyinpython@lists.idyll.org
GNU Make manual by R. Stallman and R. McGrath
http://theory.uwinnipeg.ca/gnu/make/make_toc.html