Understanding and using
GNU/Linux
by Giuseppe Profiti
Updated September 2015
Tutorial for the Programming for Bioinformatics course,
International Master of Bioinformatics
University of Bologna, Italy
http://www.biocomp.unibo.it/lsbioinfo/
First version: 12/2013
Last version: 11/2015
November 2015 Giuseppe Profiti 2/66
Goals and means
● Goals
– Understanding what an Operating System is
– Knowing how to use GNU/Linux proficiently
● Means
– Simple examples (maybe biology-inspired)
– Exercises and hands-on
● Not covered
– Formal details
– “How do I use <your favourite software>?”
What is an Operating System?
● It is a piece of software
● It manages hardware and software resources
● It is useful for general-purpose and heterogeneous hardware systems
Image from Wikimedia Commons, Public Domain
Hardware, OS and software
[Figure: layered diagram labelled Hardware and Operating system]
Image from Flickr, released under Creative Commons BY by Petr Dosek
Same OS, different software
Image from Wikimedia Commons, Public Domain (NASA)
Image from Flickr, Creative Commons BY by Texas A&M University
Another example
Image from Flickr, released under Creative Commons BY by Andrea Arden
Different hardware, different OS and software
GNU/Linux
● Unix-like: its design comes from Unix
● Linux is the kernel
– Manages the hardware, memory and so on
● GNU is a set of software and tools
– They run on top of Linux
– Provide most of the user-level functionality (e.g. the shell and the core utilities)
● Multi-user, multi-threaded
● Many distributions: Ubuntu, Lubuntu, Xubuntu, Debian, Red Hat, ...
● MacOS is based on Unix too
What's the difference?
Image from Wikimedia Commons, GNU GPL license
A Linux distribution includes
● The Kernel (Linux)
● An installer for the distribution
● Drivers
– Let the system manage specific hardware
● A package manager
– To install and update software
– Usually different from one distribution to the other
Login
● Once started, the system asks for your
– Username
– Password
● Each user has their own home directory on disk
● Users have different access rights
● The superuser (called “root”) can do everything
● On Ubuntu, the main user created during installation can run programs as root when needed (with sudo)
Shell
● It is the main interface with the system
● Can be used to
– Navigate the file system
– Execute tools
– Install software
– Connect to other machines
– Edit files
– … everything the system can do
● Often also called the console or the terminal
What a shell looks like
Image from Wikimedia Commons, licensed as Public Domain by User:AVRS
“It's a trap!”
Every time you use the mouse in a shell,
you are doing something wrong.
Image by Manuel R., Wikimedia Commons, CC-BY
Exercise 1: Open a shell
● If you don't use the Graphical User Interface
– You are already in a shell
● If you use the Graphical User Interface
– In Ubuntu: click the logo, type “terminal”, select it
– Other systems: look for the terminal icon in the menus
● The terminal may have a black, white or coloured background
– Whatever the colour, it works in the same way
The prompt
● It is a string saying that the shell is ready
● It may state the current directory
● It ends with $, %, > or # (# usually means you are root)
● After that, you can type a command
● After a command, press the Enter key
Exercise 2: create a directory
● To create a directory (or folder) type:
mkdir tutorial-p2b
● and press the Enter key ↵
● What do you see?
● To check that the new directory exists:
ls
● and press the Enter key ↵
Upper-case and lower-case
● The shell is CASE SENSITIVE
– Upper-case and lower-case are different
● LS is different from ls
● Tutorial-p2b is not tutorial-p2b
● So, to run a program, you have to type its name exactly
● You can use the TAB key ↹ to complete a filename after typing its first letters
– Only IF the system can tell which file you mean
Exercise 3: look inside a directory
● Type:
ls tutorial-p2b ↵
● Type:
ls Tutorial-p2b ↵
● Type:
ls tut
● Then the TAB key ↹ , then the Enter key ↵
File system
● It stores both data files and programs
● Directories are lists of files
● Hierarchical structure
● The root of the tree is the directory /
/
├── home
│   ├── me
│   └── you
├── etc
└── bin
Filesystem
● Files and directories are stored in a filesystem
● The filesystem is like a tree:
– It has one root directory “/”
– Each subdirectory is a branch in the tree
– Each file is a leaf
Path
● A path specifies a location in the filesystem
● It indicates the branches to follow
● The directory names along the path are separated by /
● The path can be absolute or relative
● Absolute: always starts from the root
– e.g. “/home/Alice/Desktop/vacation/sunset.jpg”
● Relative: starts from your current directory
– e.g. “Desktop/vacation/sunset.jpg” if you are in /home/Alice/
Special directories
● The current directory is “.”
– So “sunset.jpg” and “./sunset.jpg” are the same file
● The parent directory (one level up) is “..”
– e.g. if you are in “/home/Alice/Desktop/work/”, you write “../vacation/sunset.jpg”
– If you are in “/home/Alice/experiment/data/”, you write “../../Desktop/vacation/sunset.jpg”
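The paths above can be tried in a throwaway directory without touching the real /home. This is a minimal sketch, assuming a scratch directory created with mktemp (not covered in these slides) that plays the role of the root /, so /home/Alice becomes $base/home/Alice; mkdir -p also creates any missing parent directories, and the file content is a made-up placeholder. Every cat should print the same line, sunset.

```shell
# $base is a scratch directory standing in for the root /
base=$(mktemp -d)
mkdir -p "$base/home/Alice/Desktop/vacation" \
         "$base/home/Alice/Desktop/work" \
         "$base/home/Alice/experiment/data"
echo "sunset" > "$base/home/Alice/Desktop/vacation/sunset.jpg"

# Absolute path: spelled out from the (stand-in) root
cat "$base/home/Alice/Desktop/vacation/sunset.jpg"

# Relative paths: each one depends on the current directory
cd "$base/home/Alice" && cat Desktop/vacation/sunset.jpg
cd "$base/home/Alice/Desktop/work" && cat ../vacation/sunset.jpg
cd "$base/home/Alice/experiment/data" && cat ../../Desktop/vacation/sunset.jpg
cd "$base/home/Alice/Desktop/vacation" && cat ./sunset.jpg
```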
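[Figure: example directory tree rooted at /, with directories HOME, WORK, A and B, and files 1.TXT, 2.TXT and 3.TXT; some names appear in more than one place]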
Exercise 4: path
While in /HOME/ check the following relative paths:
● A/1.TXT
● ../WORK/1.TXT
● ../WORK/A/../1.TXT
● ../WORK/A/../../HOME/B/../A/1.TXT
Specify the absolute paths for
the following files:
● leftmost and rightmost 3.TXT
● leftmost and rightmost 1.TXT
File permissions
● Files can be read, written and executed
● The owner of a file can restrict these operations
– For herself
– For other members of the group
– For everyone else
Examples:
● Experiment data that should not be overwritten
● Data shared read-only with group members
File permissions
● Permissions can be changed using chmod
● The shortcuts are:
– User (u), Group (g), Others (o), All (a)
– adding (+), removing (-)
– Read (r), Write (w) and eXecute (x)
● To remove the write permission from the group:
chmod g-w filename
Show file permissions
● To show the permissions use
ls -l
● For each file, at the beginning you get
-rw-r-xr--
– rw- (characters 2 to 4) are for the user: read and write
– r-x (characters 5 to 7) are for the group: read and execute
– r-- (characters 8 to 10) are for others: read only
● The first character shows the file type, e.g. d for directories and - for regular files
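The notation can be checked by building the slide's -rw-r-xr-- step by step with chmod and reading it back with ls -l. A small sketch, assuming a scratch directory from mktemp and an empty file created with touch (neither is covered in the slides); cut -c 1-10 keeps only the type and permission characters of the ls -l line.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
touch data.txt                  # create an empty file
chmod a-rwx data.txt            # remove every permission:  ----------
chmod u+rw,g+rwx,o+r data.txt   # add some back:            -rw-rwxr--
chmod g-w data.txt              # the slide's example:      -rw-r-xr--
ls -l data.txt | cut -c 1-10    # prints -rw-r-xr--
```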
File types
● Extensions mean nothing to the system
– .doc, .jpg and so on are just conventions
● Text and binary files
– Text files can be printed and read by humans
● Plain text, CSV, XML are all text-based
– Binary files are meant to be read by programs
● Data and programs
– A program can be executed by the system
(setting the executable permission does not turn a data file into a program)
Programs and processes
● An executable program sits on disk
● A running program becomes a process
– You can have multiple processes spawned from the same program: e.g. several blastall runs at the same time
● Each process has a unique identifier (pid)
● To inspect the running processes: ps or top
● To stop a running process, press CTRL+c in its shell, or use
kill <pid>
Exercise 5: processes
● Open two shells
● In one shell run the following command
sleep 20m
● In the other shell, run ps to find the pid of sleep
● Kill the process using
kill <pid>
● Note: on remote servers, CTRL+c only works while your connection to that shell is still open
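The same exercise can be done from a single shell by starting sleep in the background with &. This is a sketch: $! (the pid of the last background process), kill -0 (which only checks that a process exists, used here instead of ps so the example does not depend on ps being installed) and wait are shell features not covered in the slides. An exit status above 128 means the process was terminated by a signal.

```shell
sleep 1200 &            # like "sleep 20m"; "&" gives the prompt back at once
pid=$!                  # $! is the pid of the most recent background process
kill -0 "$pid" && echo "process $pid is running"
kill "$pid"             # as in the exercise: sends the default signal (SIGTERM)
status=0
wait "$pid" || status=$?
echo "sleep ended with status $status"   # above 128: killed by a signal
```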
Parameters vs arguments
● Arguments are what the command operates on
– ls /home/Alice/Desktop
– kill 260046
● Parameters (or options) modify the behaviour
– ls -l /home/Beatrix/Desktop
– top -h
● Parameters usually start with a minus sign
– A single one for one-letter parameters (-h, -p, -t)
– A double one for longer parameters (--help, --out)
Moving files around
● You can copy files using the command cp
– cp path/of/original/file path/of/copy
● You can move files using mv
– mv path/of/original/file new/path
● You can delete files using rm
– rm file/to/delete
– Warning: deletion is permanent, there is no trash bin
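The three commands can be tried safely in a scratch directory. A minimal sketch, assuming mktemp (not covered in the slides) for the directory and a made-up one-line file.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
echo "ACGT" > original.txt
cp original.txt copy.txt      # two files with the same content
mv copy.txt renamed.txt       # same file, new name: copy.txt is gone
rm original.txt               # deleted for good
ls                            # prints: renamed.txt
```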
Redirection
● You can save the result of commands to a file
● The output is redirected using >
ls > files.list
● The file is emptied (or created) before ls runs, so its old content is lost
● To keep the existing content, append with >>
– Adds the output to the end of the file
● Error messages are not “output”: redirect them with 2>
● In bash, both output and errors are redirected with &>
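A sketch of the four redirections, assuming a scratch directory from mktemp. Because the shell creates files.list before ls runs, the listing contains files.list itself. The last line uses > all.txt 2>&1, the portable spelling of bash's &> all.txt; the || true parts only keep the expected ls failures from stopping a script.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
ls > files.list                  # files.list already exists when ls runs
echo "first" > log.txt           # > empties (or creates) log.txt first
echo "second" >> log.txt         # >> appends: log.txt now has two lines
ls missing.txt 2> errors.txt || true            # the error message goes to errors.txt
ls log.txt missing.txt > all.txt 2>&1 || true   # output and errors both go to all.txt
```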
Inspecting a file
● head prints the first 10 lines
● tail prints the last 10 lines
– You can change the number of lines for both head and tail with the -n parameter, e.g. head -n 20 filename
● cat shows the whole file
– Beware of long files
● more shows the whole file, one page at a time
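A quick way to see head and tail at work on a file whose content is easy to predict. This sketch assumes seq (a coreutils tool not covered in the slides) to write the numbers 1 to 100, one per line.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
seq 1 100 > numbers.txt    # 100 lines: 1, 2, ..., 100
head numbers.txt           # lines 1 to 10
head -n 3 numbers.txt      # lines 1 to 3
tail -n 2 numbers.txt      # lines 99 and 100
```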
Editing a file
● There are too many editors to list them all; here are a few
● On the shell
– cat > filename writes everything you type to the file
● CTRL+d ends the input
– nano, pico: easy to use
– vim, emacs: more advanced
● On the GUI
– gedit
– gvim
Finding text: grep
● It prints the lines containing a match
grep "pattern" filename
● The pattern can be a string or a regular expression
● Useful parameters
– -w matches whole words only (the match cannot be part of a longer word)
– -x matches whole lines only
– -i ignores case (uppercase = lowercase)
– -v inverts the match (prints the lines NOT containing the pattern)
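The effect of each parameter is easiest to see on one small file. The four lines here are a made-up example, written with printf in a scratch directory (assumptions, not part of the slides); the comments list the lines each command prints.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'ATG start\nstart codon\nSTART\nrestart\n' > demo.txt
grep "start" demo.txt      # ATG start, start codon, restart
grep -w "start" demo.txt   # ATG start, start codon (not restart)
grep -x "START" demo.txt   # START
grep -i "start" demo.txt   # all four lines
grep -v "start" demo.txt   # START
```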
Exercise 6: grep
● Download the following file
http://profiti.web.cs.unibo.it/res/p2b/ex.tar.gz
● Move it to the working directory
● Uncompress it
tar -xvf ex.tar.gz
Exercise 6: grep
● Find all the lines containing “m” in test1.txt
grep "m" test1.txt
● Find all the lines NOT containing “m” in test1.txt
grep -v "m" test1.txt
Finding text: grep /2
● You can provide a file of patterns
grep -f patterns.txt data.txt
● Each line of patterns.txt is used as a separate pattern
● It may take a while if the two files are big
Comparing
● Finds the differences between two similar files
diff file1 file2
● Compares the two files line by line
● Output
– Line numbers of the differing lines
– “<” for lines only in file1
– “>” for lines only in file2
● Its output takes some practice to read
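Here is what the output looks like for two made-up three-line files that differ only in line 2. A sketch in a scratch directory; diff exits with status 1 when the files differ, so || true keeps a script from stopping there.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a\nb\nc\n' > file1
printf 'a\nB\nc\n' > file2
diff file1 file2 > diff.out || true
cat diff.out
# 2c2     line 2 of file1 changes into line 2 of file2
# < b     the line only in file1
# ---
# > B     the line only in file2
```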
Sorting
● Comparing is easier when the data are sorted
sort filename
● Useful parameters:
– -n numerical sort (otherwise text order puts 100 before 2)
– -r reverse sort
– -k x sort starting from column x (-k x,x uses column x only)
– -t x uses x as column separator (default: blanks)
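The difference between text order and numeric order, and sorting on one column. A sketch on made-up numbers fed through a pipe; the last command uses -k 2,2 so that only column 2 is the sort key.

```shell
printf '100\n2\n33\n' | sort          # text order:    100, 2, 33
printf '100\n2\n33\n' | sort -n       # numeric order: 2, 33, 100
printf '100\n2\n33\n' | sort -n -r    # reversed:      100, 33, 2
# Comma-separated lines sorted on column 2: b,1  c,2  a,3
printf 'a,3\nb,1\nc,2\n' | sort -t ',' -k 2,2 -n
```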
Getting columns
● Printing a specific column with cut (e.g. the 3rd)
cut -f 3 filename
● Columns are separated by TAB by default; specify another separator with -d
● Useful arguments for -f:
– N prints the Nth column, counted starting from 1
– N- prints from the Nth to the end of the line
– N-M prints from Nth to Mth (included)
– -M prints from 1 up to Mth (included)
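The four forms of -f on one made-up colon-separated line, plus the default TAB separator. The comments show what each command prints.

```shell
line='chr1:1000:2000:geneA'
echo "$line" | cut -d ':' -f 3      # 2000
echo "$line" | cut -d ':' -f 2-     # 1000:2000:geneA
echo "$line" | cut -d ':' -f 2-3    # 1000:2000
echo "$line" | cut -d ':' -f -2     # chr1:1000
printf 'x\ty\tz\n' | cut -f 2       # y (TAB is the default separator)
```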
Pipe: motivation
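● Example: I want the names of all the files with rwx permissions
● Solution with redirection:
ls -l > files.list
grep "rwx" files.list > wanted-files.list
tr -s " " < wanted-files.list > squeezed.list
cut -f 9- -d " " squeezed.list > result.list
● ls -l aligns its columns with repeated spaces; tr -s " " squeezes them into one, so that cut counts the columns correctly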
Pipe
● Too many intermediate files
– Possibly big: disk space issues
– Hard to remember: do I need myfiles.list or my.list?
● Rule of thumb: keep an intermediate result only if you need it later for further analysis
● For everything else, use a pipe |
ls -l | grep "rwx" | tr -s " " | cut -f 9- -d " " > result.list
● A pipe sends the output of a command to the input of the next one
Pipe
● All the previous commands also work without an input file, reading from a pipe instead
● The first 10 lines of a list of files
ls | head
● The first column of the last line of a sorted file
sort file.txt | tail -n 1 | cut -f 1
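The last pipeline can be tried on a made-up tab-separated file (gene name, count). Sorting in text order and sorting numerically on column 2 give different last lines, so the same tail and cut steps pick different names.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'geneB\t7\ngeneA\t3\ngeneC\t5\n' > file.txt
sort file.txt | tail -n 1 | cut -f 1          # geneC: last in text order
sort -n -k 2 file.txt | tail -n 1 | cut -f 1  # geneB: largest count
```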
Pipe vs sequence
● A pipe sends the output to the next command
● To simply execute commands one after the other, separate them with ;
ls; head test.txt
● To run the second command only if the first one succeeds, use &&
python my.py > a.txt && sort a.txt
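A sketch contrasting ; and &&, with cat standing in for the slide's python my.py (an assumption, so the example runs anywhere). The last line runs ls on a file that does not exist: ls fails, so the echo after && is skipped. /dev/null just discards the error message.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
# ";" runs the commands one after the other
echo "two" > a.txt; echo "one" >> a.txt
# "&&" runs sort only if the command before it succeeded
cat a.txt > copy.txt && sort copy.txt > sorted.txt
# ls fails here (no such file), so "never printed" is never printed
ls missing.txt 2> /dev/null && echo "never printed"
```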
Shell scripting
● What if the command is very long and you have
to use it again?
● What if you have to repeat the same operations
for many inputs?
● Shell scripting is programming for the shell
● The same building blocks as other programming languages
– IF choices, FOR loops
– Parameters, variables
Shell scripting /2
● Save commands to a text file
● Add execution permissions to the file
● Call the file from the shell
● Example:
for i in $(ls *.fasta); do
  echo $i, $(grep "^>" $i | wc -l);
done | sort -n -k 2 > $1
Shell scripting /3
for i in $(ls *.fasta); do
  echo $i, $(grep "^>" $i | wc -l);
done | sort -n -k 2 > $1
● $( ) is replaced by the output of the commands inside it
● Useful with cat and any other command that prints something
Shell scripting /4
for i in $(ls *.fasta); do
  echo $i, $(grep "^>" $i | wc -l);
done | sort -n -k 2 > $1
● * is a wildcard: it matches any string, including the empty one
● In this case, every file name ending with “.fasta”
● Other wildcards are:
– ? matches any single character
– [] groups choices, e.g. [ae] matches either a or e
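The wildcards can be checked with echo, which simply prints the file names the shell produced. A sketch with made-up empty files created by touch in a scratch directory; the order in which * lists files depends on the locale, so the first command only claims which files match.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
touch a.fasta b.fasta ab.fasta e.txt
echo *.fasta      # the three .fasta files
echo ?.fasta      # a.fasta b.fasta (one character before .fasta)
echo [ae].*       # a.fasta e.txt
```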
Shell scripting /5
for i in $(ls *.fasta); do
  echo $i, $(grep "^>" $i | wc -l);
done | sort -n -k 2 > $1
● for executes the commands between do and done once for each value in the list
● i is the iteration variable: at each iteration it takes one of the values (in the example, a file name), and you access its value with $i
Shell scripting /6
for i in $(ls *.fasta); do
  echo $i, $(grep "^>" $i | wc -l);
done | sort -n -k 2 > $1
● The combined output of all the iterations is passed to sort
● This script returns a list of FASTA files, each with its number of entries (lines starting with >), sorted by that number
Shell scripting /7
for i in $(ls *.fasta); do
  echo $i, $(grep "^>" $i | wc -l);
done | sort -n -k 2 > $1
● The final result is redirected to a file whose name is given on the command line ($1 is the first argument)
● Examples:
bash myscript.sh result1.txt
bash myscript.sh result2.txt
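The whole workflow (save, add execution permission, call with an output file name) can be tried on two tiny, made-up FASTA files. This sketch writes the script with a here-document (cat > file <<'EOF', not covered in the slides), adds quotes around $i and $1 so names with spaces do not break, and runs it with sh rather than bash so it also works where bash is missing; for i in *.fasta would be an even safer loop than $(ls *.fasta).

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '>s1\nACGT\n>s2\nGG\n>s3\nT\n' > big.fasta
printf '>s1\nA\n' > small.fasta
cat > myscript.sh <<'EOF'
for i in $(ls *.fasta); do
  echo "$i", $(grep "^>" "$i" | wc -l);
done | sort -n -k 2 > "$1"
EOF
chmod u+x myscript.sh          # needed only to run it as ./myscript.sh
sh myscript.sh result1.txt
cat result1.txt                # small.fasta, 1  then  big.fasta, 3
```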
Awk
● Awk executes a series of commands for each
line of the input
● It can execute different commands for different
lines, using matching regular expressions
● It can be faster than other tools for the same task
● It is easy to use and powerful
Awk /2
awk '/<regex>/ {<commands>}' a.txt
● You can specify multiple regular expressions
● Commands can contain if and assignments
● Two special keywords can be used instead of a regex
– BEGIN runs before the first line is read
– END runs after the last line has been read
Awk /3
awk 'BEGIN {a=0} {a=a+1} END{print a}'
● It counts the number of lines
● Before the first line, sets the variable a to zero
● For each line, increases the counter
– There is no regex, so each line matches
● At the end, prints the value of the counter
● Unlike wc -l, it also counts a last line that has no trailing newline
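The difference shows up on a file whose last line is not terminated by a newline: wc -l counts newline characters, while awk counts lines. A sketch on a made-up three-line file written with printf in a scratch directory.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'line1\nline2\nline3' > no-newline.txt            # no newline after line3
wc -l < no-newline.txt                                   # 2
awk 'BEGIN {a=0} {a=a+1} END {print a}' no-newline.txt   # 3
```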
Awk /4
awk '{print $2,$3}'
● Prints the second and the third column
● By default, columns are separated by spaces and tabs
● You can specify a different separator with -F
awk -F "," '{print $2,$3}'
● NF is the number of columns (or “fields”)
● $NF is the value of the last column
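The column commands on a made-up line, first space-separated and then comma-separated. print with a comma between values separates them with a space in the output.

```shell
echo 'chr1 1000 2000 geneA' | awk '{print $2,$3}'          # 1000 2000
echo 'chr1,1000,2000,geneA' | awk -F "," '{print $2,$3}'   # 1000 2000
echo 'chr1 1000 2000 geneA' | awk '{print NF, $NF}'        # 4 geneA
```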
Awk /5
awk '/^ATOM/ {if ($5=="A") print $7,$8,$9}'
● Prints the coordinates (columns 7, 8 and 9) of each atom in chain A of a PDB file
● It matches only lines starting with “ATOM”
● You can select lines not matching a pattern
awk '!/(TAG)|(TAA)|(TGA)/ {print $3,$4}'
● The ! means “not matching”
● Round brackets group patterns
● | is for alternatives
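Both commands on tiny hypothetical inputs. The PDB-like lines are simplified and made up: real PDB files are fixed-width, so splitting on spaces only works when neighbouring fields do not touch. The second file is an invented table whose third column is a codon; lines containing a stop codon are skipped.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' \
  'ATOM 1 N MET A 1 1.0 2.0 3.0' \
  'ATOM 2 CA MET B 1 4.0 5.0 6.0' \
  'HETATM 3 O HOH A 2 7.0 8.0 9.0' > mini.pdb
awk '/^ATOM/ {if ($5=="A") print $7,$8,$9}' mini.pdb   # 1.0 2.0 3.0

printf 'g1 + ATG 10\ng2 + TAA 20\ng3 - TGG 30\n' > codons.txt
awk '!/(TAG)|(TAA)|(TGA)/ {print $3,$4}' codons.txt    # ATG 10, TGG 30
```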
Awk exercise 1
● Using the example files from
http://profiti.web.cs.unibo.it/res/p2b/ex.tar.gz
1. Print lines containing m in test1.txt
2. Print lines not containing m in test1.txt
3. Print lines with A in the second column of test1.txt
4. Print the third column of test1.txt
(a) Use comma as separator
(b) Use E as separator
Awk /6
awk 'BEGIN {name=""}
/^>/ {name=$0; d[name]=0}
!/^>/ {d[name]=d[name]+length($0)}
END {for (i in d)
print substr(i,2,length(i)),d[i]}'
● Uses an array d, which works like a Python dictionary
● $0 is the whole line
● substr(s, start, len) extracts a substring; positions start from 1
● Prints a list of FASTA entries (header without >) and their sequence lengths
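The script can be run on a made-up two-entry FASTA text produced by printf. for (i in d) visits the keys in no guaranteed order, so this sketch pipes the result through sort to make the output predictable.

```shell
result=$(printf '>seq1\nACGT\nAC\n>seq2\nGGG\n' |
  awk 'BEGIN {name=""}
       /^>/ {name=$0; d[name]=0}
       !/^>/ {d[name]=d[name]+length($0)}
       END {for (i in d) print substr(i,2,length(i)),d[i]}' |
  sort)
echo "$result"    # seq1 6, then seq2 3
```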
Awk exercise 2
● Print the sum of the elements of the third
column of test1.txt
● Print the average of the elements of the fourth
column of test1.txt
● Take a look at data1.txt and data2.txt
– Did you just open them with an editor?
– Did you just use cat?
Awk exercise 3
● How many lines in data1.txt and data2.txt?
$ wc -l data*
2999997 data1.txt
2999999 data2.txt
● Is that right?
– data1.txt contains 2999998 lines
– data2.txt contains 3000000 lines
● They contain the same numbers, except for 2
● Which ones?
Awk /7
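awk 'BEGIN {while ((getline < "patterns.txt") > 0) diz[$1]=0}
{if ($1 in diz) print $0}'
● Similar to grep -f patterns.txt, but it only compares the first column, and only for exact equality
● getline reads patterns.txt one line at a time
● The first column of each pattern line becomes a key in the array diz
● The first column of each input line is then looked up among the keys
● For big files, it can be much faster than grep -f
– Roughly O(N*M) (every line against every pattern) vs O(N+M) (one lookup per line)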
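A small comparison of the awk lookup and grep -f on made-up files. The two are not equivalent: grep treats each pattern as a regular expression that may match anywhere in the line, so id2 also matches id22, while the awk version only keeps lines whose first column is exactly one of the patterns.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'id2\nid4\n' > patterns.txt
printf 'id1 10\nid2 20\nid3 30\nid4 40\nid22 50\n' > data.txt
awk 'BEGIN {while ((getline < "patterns.txt") > 0) diz[$1]=0}
     {if ($1 in diz) print $0}' data.txt > awk.out
grep -f patterns.txt data.txt > grep.out
cat awk.out     # id2 20, id4 40
cat grep.out    # id2 20, id4 40 and also id22 50
```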
Awk exercise 3, solution
diff <(sort data1.txt) <(sort data2.txt)
● diff is picky, and its output is not very helpful here
– Took 14 seconds on a test computer
– <( ) (process substitution) is a bash feature
grep -v -f data1.txt data2.txt
● Good luck, it may take a while
– It may freeze your computer
● Awk takes 4 seconds on a test computer
Awk vs Python
● Reading fasta, awk style
awk 'BEGIN {name=""}
/^>/ {name=$0; d[name]=0}
!/^>/ {d[name]=d[name]+length($0)}
END {for (i in d)
print substr(i,2,length(i)),d[i]}'
● Note: awk scripts can be saved to a file
● Use the -f option to run the saved file, e.g. awk -f script.awk file.fasta
Awk vs Python
● Reading fasta, Python style
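import sys

# Sequence length of each FASTA entry, keyed by its header (without ">")
d = {}
name = ""
with open(sys.argv[1]) as f:
    for r in f:
        r = r.rstrip()
        if r.startswith(">"):   # header line: start a new entry
            name = r[1:]
            d[name] = 0
        else:                    # sequence line: add its length
            d[name] += len(r)
for k in d:
    print(k, d[k])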
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 

Introduction to Linux

  • 1. Understanding and using GNU/Linux
    by Giuseppe Profiti, updated September 2015
    Tutorial for the Programming for Bioinformatics course, International Master of Bioinformatics, University of Bologna, Italy
    http://www.biocomp.unibo.it/lsbioinfo/
    First version: 12/2013. Last version: 11/2015
  • 2. Goals and means
    November 2015 Giuseppe Profiti 2/66
    ● Goals
      – Understanding what an Operating System is
      – Knowing how to use GNU/Linux proficiently
    ● Means
      – Simple examples (sometimes biology-inspired)
      – Exercises and hands-on practice
    ● Not covered
      – Formal details
      – “How do I use <your favourite software>?”
  • 3. What is an Operating System?
    ● It is a piece of software
    ● It manages hardware and software resources
    ● It is useful for general-purpose systems and heterogeneous hardware
    Image from Wikimedia Commons, Public Domain
  • 4. Hardware, OS and software
    [Figure with the labels “Hardware” and “Operating system”]
    Image from Flickr, released under Creative Commons BY by Petr Dosek
  • 5. Same OS, different software
    Image from Wikimedia Commons, Public Domain (NASA)
    Image from Flickr, Creative Commons BY by Texas A&M University
  • 6. Another example
    Different hardware, different OS and software
    Image from Flickr, released under Creative Commons BY by Andrea Arden
  • 7. GNU/Linux
    ● Originates from Unix
    ● Linux is the kernel
      – It manages the hardware, the memory and so on
    ● GNU is a set of software and tools
      – They run on top of Linux
      – They provide functionality
    ● Multi-user, multi-threaded
    ● Distributions: Ubuntu, Lubuntu, Xubuntu, Debian, Red Hat...
    ● MacOS is based on Unix too
  • 8. What's the difference?
    Image from Wikimedia Commons, GNU GPL license
  • 9. A Linux distribution includes
    ● The kernel (Linux)
    ● An installation system for the distribution
    ● Drivers
      – They let the system manage specific hardware
    ● A package manager
      – To install and update software
      – Usually different from one distribution to another
  • 10. Login
    ● Once started, the system asks for your
      – Username
      – Password
    ● Each user has a separate home folder on disk
    ● Users have different access rights
    ● The superuser (called “root”) can do everything
    ● On Ubuntu, the main user created during installation can run programs as root when needed, using sudo
  • 11. Shell
    ● It is the main interface with the system
    ● It can be used to
      – Navigate the file system
      – Execute tools
      – Install software
      – Connect to other machines
      – Edit files
      – … everything the system can do
    ● Also called Console or Terminal
  • 12. What a shell looks like
    Image from Wikimedia Commons, licensed as Public Domain by User:AVRS
  • 13. “It's a trap!”
    Every time you use the mouse in a shell, you are doing something wrong.
    Image by Manuel R., Wikimedia Commons, CC-BY
  • 14. Exercise 1: Open a shell
    ● If you don't use the Graphical User Interface
      – You are already in a shell
    ● If you use the Graphical User Interface
      – In Ubuntu: click the logo, type “terminal” and select it
      – Other systems: look for the terminal icon among the applications
    ● The terminal may have a black, white or coloured background
      – Whatever the colour, it works the same way
  • 15. The prompt
    ● It is a string showing that the shell is ready
    ● It may show the current directory
    ● It ends with $, %, > or #
    ● After it, you can type a command
    ● After typing a command, press the Enter key
  • 16. Exercise 2: create a directory
    ● To create a directory (or folder), type the following and press the Enter key ↵
        mkdir tutorial-p2b
    ● What do you see?
    ● To check that the new directory exists, type the following and press the Enter key ↵
        ls
  • 17. Upper-case and lower-case
    ● The shell is CASE SENSITIVE
      – Upper-case and lower-case letters are different
    ● LS is different from ls
    ● Tutorial-p2b is not tutorial-p2b
    ● So, to run a program, you have to type its name exactly, with the right case
    ● You can press the TAB key ↹ to complete a filename after typing its first letters
      – Only IF those letters identify a single file
  • 18. Exercise 3: look inside a directory
    ● Type: ls tutorial-p2b ↵
    ● Type: ls Tutorial-p2b ↵
    ● Type: ls tut
    ● Then press the TAB key ↹, then the Enter key ↵
  • 19. File system
    ● It stores both data files and programs
    ● Directories are lists of files
    ● Hierarchical structure
    ● The root of the tree is the directory /
    [Figure: tree with / at the root, containing home, etc and bin; home contains me and you]
  • 20. Filesystem
    ● Files and directories are stored in a filesystem
    ● The filesystem is like a tree:
      – It has one root directory “/”
      – Each subdirectory is a branch of the tree
      – Each file is a leaf
  • 21. Path
    ● A path specifies a location in the filesystem
    ● It indicates the branches to follow
    ● The directory names in a path are separated by /
    ● A path can be absolute or relative
    ● Absolute: always starts from the root
      – e.g. “/home/Alice/Desktop/vacation/sunset.jpg”
    ● Relative: starts from your current directory
      – e.g. “Desktop/vacation/sunset.jpg” if you are in /home/Alice/
  • 22. Special directories
    ● The current directory is “.”
      – So “sunset.jpg” and “./sunset.jpg” are the same file
    ● The parent directory (one level up) is “..”
      – e.g. if you are in “/home/Alice/Desktop/work/”, you write “../vacation/sunset.jpg”
      – If you are in “/home/Alice/experiment/data/”, you write “../../Desktop/vacation/sunset.jpg”
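A minimal sketch for trying the relative paths above without touching real files. It assumes a throw-away directory created with mktemp stands in for /home/Alice and recreates only the directories the examples need:

```shell
# Assumption: a throw-away directory stands in for /home/Alice
cd "$(mktemp -d)" || exit 1
alice=$(pwd)
mkdir -p Desktop/vacation Desktop/work experiment/data
touch Desktop/vacation/sunset.jpg

# From Desktop/work, ".." is Desktop, so ../vacation is Desktop/vacation
cd "$alice/Desktop/work" || exit 1
ls ../vacation/sunset.jpg

# From experiment/data, two ".." steps climb back to the home directory
cd "$alice/experiment/data" || exit 1
ls ../../Desktop/vacation/sunset.jpg

# "." is the current directory, so both names refer to the same file
cd "$alice/Desktop/vacation" || exit 1
ls sunset.jpg ./sunset.jpg
```

If a path is wrong, ls prints an error instead of the file name, which makes it a quick way to check an answer.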
  • 23. Exercise 4: path
    [Figure: directory tree rooted at /, with the directories HOME (containing A and B) and WORK (containing A), and files named 1.TXT, 2.TXT and 3.TXT at several positions]
    While in /HOME/, check the following relative paths:
    ● A/1.TXT
    ● ../WORK/1.TXT
    ● ../WORK/A/../1.TXT
    ● ../WORK/A/../../HOME/B/../A/1.TXT
    Specify the absolute paths for the following files:
    ● leftmost and rightmost 3.TXT
    ● leftmost and rightmost 1.TXT
  • 24. File permissions
    ● Files can be read, written and executed
    ● The owner of a file can restrict these operations
      – For herself
      – For the other members of her group
      – For everyone else
    Examples:
    ● Experiment data that should not be overwritten
    ● Data shared only with group members, for reading
  • 25. File permissions
    ● Permissions can be changed using chmod
    ● The shortcuts are:
      – User (u), Group (g), Others (o), All (a)
      – adding (+), removing (-)
      – Read (r), Write (w) and eXecute (x)
    ● To remove the write permission from the group:
        chmod g-w filename
  • 26. Show file permissions
    ● To show the permissions, use ls -l
    ● For each file, the line begins with a string such as -rw-r-xr--
      – characters 2-4 (rw-) are for the user: read and write
      – characters 5-7 (r-x) are for the group: read and execute
      – characters 8-10 (r--) are for others: read only
    ● The first character shows the file type, e.g. d for directories and - for regular files
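A small sketch that produces the -rw-r-xr-- string discussed above and then protects the file from overwriting. The file name data.txt is hypothetical, and the "=" form of chmod (set permissions exactly) is not covered in the slides; it is used here only to start from a known state:

```shell
# Work on a hypothetical data file in a throw-away directory
cd "$(mktemp -d)" || exit 1
echo "ACGT" > data.txt

chmod u=rw,g=rwx,o=r data.txt        # "=" sets exactly these: -rw-rwxr--
chmod g-w data.txt                   # remove write for the group
ls -l data.txt                       # line starts with -rw-r-xr--
perms=$(ls -l data.txt | cut -c 1-10)

# Experiment data that should not be overwritten: remove write for all
chmod a-w data.txt
ls -l data.txt                       # line starts with -r--r-xr--
perms_ro=$(ls -l data.txt | cut -c 1-10)
```

cut -c 1-10 keeps only the type and permission characters, ignoring any extra marker some systems append after them.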
  • 27. File types
    ● Extensions mean nothing to the system
      – .doc, .jpg and so on are just conventions
    ● Text and binary files
      – Text files can be printed and read by humans
        ● Plain text, CSV, XML are all text-based
      – Binary files are meant to be read by programs
    ● Data and programs
      – A program can be executed by the system (the execute permission alone does not make a file a program)
  • 28. Programs and processes
    ● An executable program sits on the disk
    ● A running program becomes a process
      – You can have multiple processes spawned from the same program: e.g. many blastall instances running at once
    ● Each process has a unique identifier (pid)
    ● To inspect the running processes: ps or top
    ● To stop a running process, press CTRL+c in its shell, or use kill <pid>
  • 29. Exercise 5: processes
    ● Open two shells
    ● In one shell, run the following command:
        sleep 20m
    ● In the other shell, run ps -u $USER to find the pid of sleep (plain ps only lists the processes of the current terminal)
    ● Kill the process using kill <pid>
    ● Note: on remote servers you can't use CTRL+c unless you keep the connection open
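The same exercise can be sketched in a single shell. This version assumes two things not covered in the slides: "&" starts sleep in the background, and $! holds the pid of the last background process, so no second shell is needed:

```shell
sleep 1200 &                 # 20 minutes, like "sleep 20m"; & runs it in the background
pid=$!                       # pid of the background sleep
ps -p "$pid"                 # the sleep process is listed
kill "$pid"                  # ask the process to terminate
wait "$pid" 2> /dev/null     # collect its exit status
if ps -p "$pid" > /dev/null; then
    echo "still running"
else
    echo "process $pid is gone"
fi
```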
  • 30. Parameters vs arguments
    ● Arguments are the subject of the operation
      – ls /home/Alice/Desktop
      – kill 260046
    ● Parameters (or options) modify the behaviour
      – ls -l /home/Beatrix/Desktop
      – top -h
    ● Parameters usually start with a minus sign
      – A single one for single-letter parameters (-h, -p, -t)
      – A double one for longer parameters (--help, --out)
  • 31. Moving files around
    ● You can copy files using cp
        cp path/of/original/file path/of/copy
    ● You can move (or rename) files using mv
        mv path/of/original/file new/path
    ● You can delete files using rm
        rm file/to/delete
      – Warning: deletion is permanent, there is no trash bin
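A short sketch of the three commands on a hypothetical FASTA file, inside a throw-away directory so that nothing real is deleted:

```shell
cd "$(mktemp -d)" || exit 1
mkdir raw backup
echo ">seq1" > raw/sample.fasta                  # hypothetical file

cp raw/sample.fasta backup/sample.fasta          # copy: the original stays
mv raw/sample.fasta raw/sample_v1.fasta          # mv also renames files
rm backup/sample.fasta                           # permanent: no trash bin
ls raw backup                                    # raw: sample_v1.fasta, backup: empty
```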
  • 32. Redirection
    ● You can save the output of a command to a file
    ● The output is redirected using >
        ls > files.list
    ● The file is created (empty) before ls runs, so any previous content is lost
    ● To keep the previous content, append with >>
      – It adds the output to the end of the file
    ● Errors are not “output”: redirect them with 2>
    ● Redirect both output and errors with &> (a bash feature)
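A sketch of the four redirections, assuming bash (needed for &>). Because files.list is created before ls runs, the listing contains files.list itself, which is an easy way to see the order of operations:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
touch a.txt b.txt

ls > files.list          # files.list is created first, so ls lists it too
ls >> files.list         # append: the 3 names now appear twice
wc -l < files.list       # 6

ls missing.txt 2> errors.log        # only the error message goes to errors.log
ls a.txt missing.txt &> all.log     # bash: output and errors in the same file
cat all.log
```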
  • 33. Inspecting a file
    ● head prints the first 10 lines
    ● tail prints the last 10 lines
      – You can change the number of lines for both head and tail with a parameter, e.g. head -n 5
    ● cat shows the whole file
      – Beware of long files
    ● more shows the whole file, one page at a time
  • 34. Editing a file
    ● There are too many editors to list them all; here are just a few
    ● On the shell
      – cat > filename writes everything you type to the file
        ● CTRL+d ends the input
      – nano, pico: easy to use
      – vim, emacs: more advanced
    ● On the GUI
      – gedit
      – gvim
  • 35. Finding text: grep
    ● It prints the lines containing a match:
        grep "pattern" filename
    ● The pattern can be a string or a regular expression
    ● Useful parameters
      – -w matches whole words only (the match cannot be part of a longer word)
      – -x matches whole lines only
      – -i ignores case (uppercase = lowercase)
      – -v inverts the match (i.e. prints the lines NOT containing the pattern)
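A sketch comparing the four parameters on the same small, made-up file, so the effect of each one is visible in the number of matching lines:

```shell
cd "$(mktemp -d)" || exit 1
printf 'ATG start\nstart codon\nrestart\nSTART\n' > words.txt   # made-up lines

grep "start" words.txt        # 3 lines: plain substring match
grep -w "start" words.txt     # 2 lines: "restart" is skipped
grep -x "restart" words.txt   # 1 line: the whole line must match
grep -i "start" words.txt     # 4 lines: "START" matches too
grep -v "start" words.txt     # 1 line: "START", the only one without "start"
```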
  • 36. Exercise 6: grep
    ● Download the following file:
        http://profiti.web.cs.unibo.it/res/p2b/ex.tar.gz
    ● Move it to the working directory
    ● Uncompress it:
        tar -xvf ex.tar.gz
  • 37. Exercise 6: grep
    ● Find all the lines containing “m” in test1.txt
        grep "m" test1.txt
    ● Find all the lines NOT containing “m” in test1.txt
        grep -v "m" test1.txt
  • 38. Finding text: grep /2
    ● You can provide a file of patterns:
        grep -f patterns.txt data.txt
    ● Each line of patterns.txt is treated as a separate pattern
    ● It may take a while if the two files are big
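A sketch of a pattern file in action. The identifiers are hypothetical, one pattern per line; the last command adds -F, a standard grep option that treats the patterns as plain strings instead of regular expressions:

```shell
cd "$(mktemp -d)" || exit 1
printf 'P12345\nQ99999\n' > patterns.txt                              # hypothetical IDs
printf 'P12345 kinase\nO00001 transporter\nQ99999 receptor\n' > data.txt

grep -f patterns.txt data.txt      # lines matching ANY of the patterns
grep -v -f patterns.txt data.txt   # lines matching NONE of them
grep -F -f patterns.txt data.txt   # -F: patterns are plain strings, not regexes
```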
  • 39. Comparing
    ● To look for the differences between two similar files:
        diff file1 file2
    ● It compares the two files line by line
    ● Output
      – Line numbers of the differing lines
      – “<” for lines only in file1
      – “>” for lines only in file2
    ● It is not very easy to use
  • 40. Sorting
    ● Diffing is easier when the data are sorted:
        sort filename
    ● Useful parameters:
      – -n numerical sort (otherwise, as text, 100 < 2)
      – -r reverse sort
      – -k x sort on column number x (the key runs to the end of the line; use -k x,x for column x only)
      – -t x use x as the column separator
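A sketch showing why -n matters, on made-up counts. Sorted as text, "100" comes before "2" because the first characters are compared; sorted with -n, the values are compared as numbers:

```shell
cd "$(mktemp -d)" || exit 1
printf 'geneA 100\ngeneB 2\ngeneC 30\n' > counts.txt   # made-up counts

sort -k 2 counts.txt          # as text: 100, 2, 30
sort -n -k 2 counts.txt       # as numbers: 2, 30, 100
sort -n -r -k 2 counts.txt    # reversed: 100, 30, 2
printf 'geneA,100\ngeneB,2\n' | sort -t , -n -k 2,2   # comma as separator
```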
  • 41. Getting columns
    ● Print a specific column (e.g. the 3rd) with cut:
        cut -f 3 filename
    ● Columns are separated by TAB by default; specify a different separator with -d
    ● Useful arguments for -f:
      – N prints the Nth column, counting from 1
      – N- prints from the Nth column to the end of the line
      – N-M prints from the Nth to the Mth column (included)
      – -M prints from the 1st up to the Mth column (included)
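A sketch of the -f ranges on a small, made-up TAB-separated table, plus -d for a comma-separated line:

```shell
cd "$(mktemp -d)" || exit 1
printf 'id\tchr\tstart\tend\n' > table.tsv       # TAB is cut's default separator
printf 'g1\tchr2\t100\t900\n' >> table.tsv

cut -f 3 table.tsv      # 3rd column: start, 100
cut -f 2- table.tsv     # from the 2nd column to the end
cut -f 2-3 table.tsv    # 2nd and 3rd columns
cut -f -2 table.tsv     # 1st and 2nd columns
printf 'g1,chr2,100\n' | cut -d , -f 2   # comma-separated input: chr2
```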
  • 42. Pipe: motivation
    ● Example: I want the names of all the files with rwx permissions
    ● Solution with redirection:
        ls -l > files.list
        grep "rwx" files.list > wanted-files.list
        cut -f 10- -d" " wanted-files.list > result.list
    ● Note: cut counts every single space as a separator, so the right field number depends on how ls -l pads its columns
  • 43. Pipe
    ● Too many intermediate files
      – Possibly big: disk space issues
      – Hard to remember: do I need myfiles.list or my.list?
    ● Rule of thumb: keep an intermediate result only if you need it later for further analysis
    ● For everything else, use the pipe |
        ls -l | grep "rwx" | cut -f 10- -d" " > result.list
    ● A pipe sends the output of a command to the input of the next one
  • 44. Pipe
    ● All the previous commands also work with a pipe instead of an input file
    ● The first 10 lines of a list of files:
        ls | head
    ● The first column of the last line of a sorted file:
        sort file.txt | tail -1 | cut -f 1
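A sketch that runs the pipeline above on a made-up, TAB-separated file.txt and repeats the same steps with intermediate files, to show what the pipe saves:

```shell
cd "$(mktemp -d)" || exit 1
printf 'geneB\t2\ngeneA\t100\ngeneC\t30\n' > file.txt   # made-up data

ls | head                                # first 10 names of the listing
sort file.txt | tail -1 | cut -f 1       # geneC

# Without a pipe: two intermediate files that must be cleaned up afterwards
sort file.txt > sorted.tmp
tail -1 sorted.tmp > last.tmp
cut -f 1 last.tmp                        # geneC again
rm sorted.tmp last.tmp
```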
  • 45. Pipe vs sequence
    ● A pipe sends the output to the next command
    ● To execute commands in sequence, separate them with ;
        ls; head test.txt
    ● What if the second command depends on the first? Use &&: the second command runs only if the first one succeeds
        python my.py > a.txt && sort a.txt
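A sketch of the difference between ; and &&. Here the built-in false and a successful printf stand in for a failing and a working python my.py (an assumption, so the example runs without any Python script):

```shell
cd "$(mktemp -d)" || exit 1
printf 'b\na\n' > test.txt

ls; head test.txt                          # ";": both run, one after the other
false; echo "runs anyway"                  # ";" ignores the failure of false
false && echo "never printed"              # "&&" stops when the first command fails
printf '3\n1\n2\n' > a.txt && sort a.txt   # printf succeeds, so sort runs
```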
  • 46. Shell scripting
    ● What if the command is very long and you have to use it again?
    ● What if you have to repeat the same operations for many inputs?
    ● Shell scripting is programming for the shell
    ● It has the same primitives as programming languages
      – IF choices, FOR loops
      – Parameters, variables
  • 47. Shell scripting /2
    ● Save the commands to a text file
    ● Add execution permission to the file
    ● Call the file from the shell
    ● Example:
        for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1
  • 48. Shell scripting /3
        for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1
    ● $( ) returns the output of the commands inside it
    ● Useful with cat and any command that prints content
  • 49. Shell scripting /4
        for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1
    ● * is a wildcard meaning “any string”
    ● In this case, any name ending with “.fasta”
    ● Other wildcards are:
      – ? means any single character
      – [] groups choices, e.g. [ae] means either a or e
  • 50. Shell scripting /5
        for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1
    ● for executes the commands between do and done once for each iteration
    ● i is the loop variable: at each iteration it takes one of the values (in the example, a file name); you read its value with $i
  • 51. Shell scripting /6
        for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1
    ● The combined output of all the loop iterations is passed to sort
    ● This script returns a list of FASTA files, each with its number of entries, sorted by that number
  • 52. Shell scripting /7
        for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1
    ● The final result is redirected to a file specified on the command line ($1 is the first argument)
    ● Examples:
        bash myscript.sh result1.txt
        bash myscript.sh result2.txt
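A sketch that puts the whole script to work: it creates three made-up FASTA files with 2, 1 and 3 entries, saves the command from the slides in myscript.sh, and runs it as in the examples above. The quoted 'EOF' keeps the shell from expanding $i and $1 while the file is being written:

```shell
cd "$(mktemp -d)" || exit 1
printf '>s1\nACGT\n>s2\nGG\n' > a.fasta          # 2 entries (made up)
printf '>s1\nTTTT\n' > b.fasta                   # 1 entry
printf '>x\nA\n>y\nC\n>z\nG\n' > c.fasta         # 3 entries

cat > myscript.sh <<'EOF'
#!/bin/bash
for i in $(ls *.fasta); do echo $i, $(grep "^>" $i | wc -l); done | sort -n -k 2 > $1
EOF
chmod u+x myscript.sh            # execution permission, needed for ./myscript.sh
bash myscript.sh result1.txt     # as in the slides
cat result1.txt                  # b.fasta, 1 / a.fasta, 2 / c.fasta, 3
```

The loop relies on file names without spaces; for such simple names, $(ls *.fasta) and the plain glob *.fasta give the same list.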
  • 53. Awk
    ● Awk executes a series of commands for each line of the input
    ● It can execute different commands for different lines, selected by matching regular expressions
    ● It may be faster than other tools
    ● It is easy to use and powerful
  • 54. Awk /2
        awk '/<regex>/ {<commands>}' a.txt
    ● You can specify multiple regular expressions, each with its own commands
    ● Commands can contain if statements and assignments
    ● Two special keywords can be used instead of a regex
      – BEGIN matches the beginning of the input, before the first line
      – END matches the end of the input, after the last line
  • 55. Awk /3
        awk 'BEGIN {a=0} {a=a+1} END {print a}'
    ● It counts the number of lines
    ● Before the first line, it sets the variable a to zero
    ● For each line, it increases the counter
      – There is no regex, so every line matches
    ● At the end, it prints the value of the counter
    ● It works better than wc -l, which counts newline characters and therefore misses a last line without a trailing newline
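A sketch of the case where wc -l and the awk counter disagree: a made-up file whose last line has no trailing newline. The last command uses NR, awk's built-in record counter, which is not covered in the slides:

```shell
cd "$(mktemp -d)" || exit 1
printf 'line1\nline2\nline3' > no_newline.txt   # no newline after line3

wc -l < no_newline.txt                                   # 2: counts newline characters
awk 'BEGIN {a=0} {a=a+1} END {print a}' no_newline.txt   # 3: counts lines
awk 'END {print NR}' no_newline.txt                      # 3: NR counts lines too
```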
  • 56. Awk /4
        awk '{print $2,$3}'
    ● Prints the second and the third column
    ● By default, columns are separated by spaces or tabs
    ● You can specify a different separator with -F
        awk -F "," '{print $2,$3}'
    ● NF is the number of columns (or “fields”)
    ● $NF is the value of the last column
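A sketch of column selection, -F and NF on made-up data in a space-separated and a comma-separated version:

```shell
cd "$(mktemp -d)" || exit 1
printf 'g1 chr2 100 900\ng2 chr5 50 75\n' > genes.txt   # made-up data
printf 'g1,chr2,100,900\n' > genes.csv

awk '{print $2,$3}' genes.txt          # chr2 100 / chr5 50
awk -F "," '{print $2,$3}' genes.csv   # chr2 100
awk '{print NF, $NF}' genes.txt        # 4 900 / 4 75: field count and last field
```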
  • 57. Awk /5
        awk '/^ATOM/ {if ($5=="A") print $7,$8,$9}'
    ● Prints the coordinates of each atom in chain A (of a PDB file)
    ● It matches only the lines starting with “ATOM”
    ● You can select the lines not matching a pattern:
        awk '!/(TAG)|(TAA)|(TGA)/ {print $3,$4}'
    ● The ! means “not matching”
    ● Round brackets group patterns
    ● | separates alternatives
  • 58. Awk exercise 1
    ● Using the example files from http://profiti.web.cs.unibo.it/res/p2b/ex.tar.gz
      1. Print the lines containing m in test1.txt
      2. Print the lines not containing m in test1.txt
      3. Print the lines with A in the second column of test1.txt
      4. Print the third column of test1.txt
        (a) Use comma as separator
        (b) Use E as separator
  • 59. Awk /6
        awk 'BEGIN {name=""} /^>/ {name=$0; d[name]=0} !/^>/ {d[name]=d[name]+length($0)} END {for (i in d) print substr(i,2,length(i)),d[i]}'
    ● Uses an array d, which works like a Python dictionary
    ● $0 is the whole line
    ● substr extracts a substring; positions start from 1
    ● Prints a list of FASTA entries and their lengths, in no particular order
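The same program can be spread over several lines with comments, which makes it easier to read. This sketch runs it on a made-up FASTA file; it drops the BEGIN block (awk variables start empty anyway) and pipes the result to sort, because for (i in d) visits the keys in no guaranteed order:

```shell
cd "$(mktemp -d)" || exit 1
printf '>seq1 test\nACGT\nAC\n>seq2\nGGGGG\n' > sample.fasta   # made-up FASTA

lengths=$(awk '
  /^>/  { name = $0; d[name] = 0 }                # header: start a new entry
  !/^>/ { d[name] = d[name] + length($0) }        # sequence line: add its length
  END   { for (i in d) print substr(i, 2), d[i] } # drop the leading ">"
' sample.fasta | sort)
echo "$lengths"      # seq1 test 6 / seq2 5
```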
  • 60. Awk exercise 2
    ● Print the sum of the elements of the third column of test1.txt
    ● Print the average of the elements of the fourth column of test1.txt
    ● Take a look at data1.txt and data2.txt
      – Did you just open them with an editor?
      – Did you just use cat?
  • 61. Awk exercise 3
    ● How many lines are in data1.txt and data2.txt?
        $ wc -l data*
        2999997 data1.txt
        2999999 data2.txt
    ● Is it true?
      – data1.txt contains 2999998 lines
      – data2.txt contains 3000000 lines
    ● They contain the same numbers, except for 2
    ● Which ones?
  • 62. Awk /7
        awk 'BEGIN {while ((getline<"patterns.txt")>0) diz[$1]=0} {if ($1 in diz) print $0}'
    ● Works like grep -f patterns.txt, but only exact matches of the whole first column count
    ● getline reads the file one line at a time
    ● The first field of each line becomes a key in the array
    ● The input is then checked against the existing keys
    ● For big files, it is faster than grep
      – O(N*M) for grep vs O(N+M) for awk
  • 63. Awk exercise 3, solution
        diff <(sort data1.txt) <(sort data2.txt)
    ● diff is picky, the result is not that good
      – It took 14 seconds on a test computer
        grep -v -f data1.txt data2.txt
    ● Good luck, it may take a while
      – It may freeze your computer
    ● Awk takes 4 seconds on a test computer
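The awk command is not shown on the slide; a possible version, adapted from the getline example, loads data1.txt into an array and prints the lines of data2.txt that are not in it. This sketch uses tiny stand-ins for the two files and assumes, as the line counts suggest, that the extra numbers are in data2.txt (swap the file names to check the other direction). data1.txt deliberately lacks a final newline, the same trap that fools wc -l:

```shell
cd "$(mktemp -d)" || exit 1
printf '5\n3\n9\n1' > data1.txt            # tiny stand-in, no final newline
printf '1\n7\n9\n3\n5\n2\n' > data2.txt    # tiny stand-in with 2 extra numbers

# A pattern with no action prints the matching lines
extra=$(awk 'BEGIN {while ((getline < "data1.txt") > 0) seen[$1] = 1} !($1 in seen)' data2.txt)
echo "$extra"        # 7 and 2
```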
  • 64. Awk vs Python
    ● Reading FASTA, awk style:
        awk 'BEGIN {name=""} /^>/ {name=$0; d[name]=0} !/^>/ {d[name]=d[name]+length($0)} END {for (i in d) print substr(i,2,length(i)),d[i]}'
    ● Note: awk scripts can be saved to a file
    ● Use the -f option to run the saved file
  • 65. Awk vs Python
    ● Reading FASTA, Python style:
        import sys

        d = {}
        name = ""
        with open(sys.argv[1]) as f:
            for r in f:
                r = r.rstrip()
                if not r:                  # skip empty lines
                    continue
                if r.startswith(">"):      # header: start a new entry
                    name = r[1:]
                    d[name] = 0
                else:                      # sequence line: add its length
                    d[name] += len(r)
        for k in d:
            print(k, d[k])