blast bioinformatics

By
Harpreet Singh Kalsi
Hans Raj College

BIOINFORMATICS
Bioinformatics is an emerging field of science which uses computer
technology for storage, retrieval, manipulation and distribution of
information related to biological data specifically for DNA, RNA and
proteins.

DATABASE
Databases are repositories in which biological data is stored in a
computer-readable form. They are classified on several bases, such as
data type, data source and organism.

TOOLS
Tools are software developed to perform various tasks over the
stored data such as searches, analysis, submission, annotation, etc.

RESIDUE
A residue is the building block of a macromolecule stored in the
databases: a nucleotide for DNA and RNA, an amino acid for proteins.

On basis of Data Type
• Genome Databases
• Sequence Databases
• Structure Databases
• Microarray Databases
• Chemical Databases
• Metabolic Databases
• Enzyme Databases
• Disease Databases
• Literature Databases
• Taxonomy Database

On basis of Data Source
• Primary Databases
• Secondary Databases
• Special Categories
• Integrated Database
• Composite Database

IMPORTANT DATABASES
• NCBI (Integrated database)
• EMBL (Nucleotide database)
• DDBJ (Nucleotide database)
• GenBank (Nucleotide database)
• SWISS-PROT (Protein database)
• OMIM (Disease database)
• PDB (Structure database)
• KEGG (Metabolic database)
• PubMed (Literature database)
• Enzymes (Enzyme database)
• PANDIT (taxonomy database)
• ArrayExpress (microarray database)

IMPORTANT TOOLS
• BLAST (search and homology tool)
• FASTA (search and homology tool)
• BankIt (submission tool)
• Sequin (submission tool)
• ORF Finder (analysis tool)
• TXSearch (retrieval tool for taxonomy database)
• SAKURA (submission tool in DDBJ)
• ClustalW (multiple sequence alignment)
• MSDFold (protein secondary structure comparison tool)

• BLAST stands for Basic Local Alignment Search Tool.

• BLAST is a program that uses scoring matrices (such as PAM or
BLOSUM) to perform sequence-similarity searches against a variety of
sequence databases, reporting the high-scoring local segments shared
by related sequences.

• It is complex: a search involves multiple steps and many parameters.

• The BLAST algorithm is fast, accurate and web-accessible.

• It is faster than most other sequence-similarity search tools.

• It offers several programs for different types of analysis.

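Program   Input query   Search database   Comparisons
blastn    DNA           DNA               1
blastp    protein       protein           1
blastx    DNA           protein           6
tblastn   protein       DNA               6
tblastx   DNA           DNA               36

Comparisons counts the reading-frame combinations searched: six frames
on one side for blastx and tblastn, and 6 x 6 = 36 for tblastx.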


blastn    compares a DNA query sequence against a DNA database,
          allowing for gaps.
blastp    compares a protein query sequence against a protein
          database, allowing for gaps.
blastx    compares a DNA query sequence translated into six reading
          frames against a protein database, allowing for gaps.
tblastn   compares a protein query sequence against a DNA database
          translated into six reading frames, allowing for gaps.
tblastx   compares a DNA query sequence translated into six reading
          frames against a DNA database translated into six reading
          frames. tblastx doesn’t allow for gaps.
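The "six reading frames" used by blastx, tblastn and tblastx can be illustrated with a short Python sketch. This is only an illustration, not BLAST's own code: the function names are hypothetical, and it assumes the standard genetic code and an uppercase DNA sequence containing only A, C, G and T.

```python
from itertools import product

# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return the three forward (+1..+3) and three reverse (-1..-3) translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCC").items():
        print(name, protein)
```

A translated search compares each of these six protein strings instead of the DNA itself, which is why blastx and tblastn involve 6 comparisons and tblastx involves 36.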

• MEGABLAST: for comparison of large sets of long DNA sequences

• RPS-BLAST: conserved domain detection

• BLAST 2 Sequences: for pairwise alignment of two chosen sequences

• Genomic BLAST: for alignments against selected human, microbial or
malarial genomes

• PSI-BLAST: constructs a multiple alignment from the matches and
searches again with the resulting profile

• PHI-BLAST: specify a pattern that hits must match

• Make specific primers with Primer-BLAST
• Search trace archives
• Find conserved domains in your sequence (cds)
• Find sequences with similar conserved domain architecture (cdart)
• Search sequences that have gene expression profiles (GEO)
• Search immunoglobulins (IgBLAST)
• Search using SNP flanks
• Screen sequence for vector contamination (vecscreen)
• Align two (or more) sequences using BLAST (bl2seq)
• Search protein or nucleotide targets in PubChem BioAssay
• Search SRA transcript and genomic libraries
• Constraint Based Protein Multiple Alignment Tool
• Needleman-Wunsch Global Sequence Alignment Tool
• Search RefSeqGene
http://blast.ncbi.nlm.nih.gov/Blast.cgi

How BLAST works is somewhat involved, but in brief it proceeds in two
steps:

1. BLAST first breaks the query into short regions of a given length
(W) called “words” (or substrings). Using a substitution matrix, it
then looks in the database sequences (“target sequences”) for words
that score at least T when compared with a query word.

2. For every pair of sequences (query and target) that share one or
more word hits, BLAST extends the alignment in both directions to
find alignments whose score exceeds a threshold (S), that is,
alignments that are sufficiently similar. These alignments are called
high-scoring segment pairs (HSPs); the highest-scoring HSPs are called
maximal segment pairs (MSPs).
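The two steps above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the real BLAST implementation. It assumes exact word matches only (no neighbourhood words scored against T), a plain match/mismatch score instead of a substitution matrix, and ungapped extension that stops once the score falls more than a hypothetical `x_drop` below the best seen so far. All function and parameter names are made up for this example.

```python
def find_seeds(query, target, w):
    """Step 1: return (query_pos, target_pos) for every exact word match of length w."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(target) - w + 1):
        for i in index.get(target[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, target, i, j, w, match=1, mismatch=-1, x_drop=3):
    """Step 2: extend a seed in both directions without gaps.

    Returns (score, query_start, query_end, target_start) of the best
    segment found, with query_end exclusive.
    """
    def walk(qi, tj, step):
        # Move outward from the seed, remembering the best score gain
        # and how many residues were needed to reach it.
        best = gain = kept = n = 0
        while 0 <= qi < len(query) and 0 <= tj < len(target):
            gain += match if query[qi] == target[tj] else mismatch
            n += 1
            if gain > best:
                best, kept = gain, n
            elif best - gain > x_drop:
                break
            qi += step
            tj += step
        return best, kept

    right_gain, right_len = walk(i + w, j + w, 1)
    left_gain, left_len = walk(i - 1, j - 1, -1)
    score = w * match + left_gain + right_gain
    return score, i - left_len, i + w + right_len, j - left_len


if __name__ == "__main__":
    query, target, w = "ACGTACGTTT", "GGACGTACGAAA", 4
    for i, j in find_seeds(query, target, w):
        score, q_start, q_end, t_start = extend_seed(query, target, i, j, w)
        print(score, query[q_start:q_end], "at target position", t_start)
```

A segment would be reported as an HSP only if its score exceeded the threshold S. Several seeds on the same diagonal extend to the same segment, so a real implementation would also merge duplicates.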

[Figure: the BLAST search strategy. The query sequence is broken into
“words” (subsequences of the query). Query words are compared to the
database (target sequences) and exact matches identified. For each
word match, the alignment is extended in both directions to find
alignments that score greater than some threshold (maximal segment
pairs, or MSPs). Source: Schneider and La Rota 2000.]

BLAST can help answer many questions that commonly arise in the
research laboratory. Some of the most common are:
• Which bacterial species have a protein that is related to a
protein whose amino-acid sequence I know?
• Where does the DNA I’ve sequenced come from?
• What other genes encode proteins that exhibit structures
similar to the one I’ve just determined?
• What does the protein structure look like?
• What is the function of the gene or the protein that I've
sequenced? (If it is not known, then you have some work to do.)
• What are the probable functions of the sequence I have?


To answer such questions we search a database with BLAST and then
analyse the results it produces. As an example, suppose our
experiments with Mycobacterium tuberculosis have given us the
following protein sequence.
Sequence:

To see whether this protein resembles proteins in other organisms,
and so to understand its importance, we run a BLAST search at the
following URL:
http://blast.ncbi.nlm.nih.gov/
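Besides the web form, a search can be submitted programmatically. The sketch below only builds a request URL and does not send it. It assumes the CMD, PROGRAM, DATABASE and QUERY parameters of NCBI's BLAST URL API and the database name `swissprot`. The query shown is a hypothetical placeholder, not the sequence from the experiment, and the function name is invented for this example.

```python
from urllib.parse import urlencode

# Address of the NCBI BLAST page, as given in these slides.
BLAST_URL = "http://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(query, program="blastp", database="swissprot"):
    """Return a URL that would submit `query` as a new BLAST search.

    Parameter names follow NCBI's BLAST URL API as assumed here;
    check the current NCBI documentation before relying on them.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # e.g. blastp for protein vs protein
        "DATABASE": database,  # e.g. swissprot
        "QUERY": query,        # the sequence to search with
    }
    return BLAST_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical placeholder sequence, for illustration only.
    placeholder_query = "MKTAYIAKQRQISFVKSHFSRQ"
    print(build_blast_request(placeholder_query))
```

Submitting the request returns a request ID that is later used to fetch the results; that part is left out of this sketch.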


• After running BLAST against a chosen database, or against every
database, we analyse the results.
• A chosen entry is shown below.

This entry shows that the query sequence, searched against a database
(here Swiss-Prot), has 88% identity with the hit “Full=Single-stranded
DNA-binding protein”, accession number P46390.2.
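The percent identity reported for a hit is the share of alignment columns in which query and subject carry the same residue. A minimal Python sketch follows. The function name and the toy alignment are hypothetical, and gaps are assumed to be written as "-".

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns with identical residues.

    Both strings must be rows of the same alignment, so they have the
    same length; gap characters ('-') never count as identical.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * identical / len(aligned_query)


if __name__ == "__main__":
    # Toy alignment: 6 identical columns out of 8.
    print(percent_identity("MSAG-TKL", "MSAGETRL"))
```

The denominator here is the alignment length including gap columns, so an 88% identity means that 88 of every 100 aligned columns match exactly.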


Each entry shows a score that describes the quality of the match
between the database sequence and the query we sequenced in our
experiment.
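As a rough guide to how such a score is read, the sketch below uses the Karlin-Altschul relations that underlie BLAST statistics: a raw score S is normalised to a bit score S' = (lambda*S - ln K) / ln 2, and the expected number of chance hits is E = m * n * 2^(-S'), where m and n are the query and database lengths. The lambda and K values in the example are illustrative assumptions (values often quoted for BLOSUM62 with gap costs 11 and 1). A real BLAST report uses its own parameters and effective lengths.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalise a raw alignment score to bits: (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Illustrative parameters only; see the assumptions in the text above.
    lam, k = 0.267, 0.041
    bits = bit_score(100, lam, k)
    print(round(bits, 1), expect_value(bits, query_len=200, db_len=10**6))
```

Each additional bit halves the E-value, so a higher score corresponds to a match that is less likely to have arisen by chance.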

Using the accession number obtained from a BLAST search, we can look
up many details about the matched entry, including:
• The organism from which it came
• The function of the protein
• The region of DNA encoding the gene
• The length of the sequence
• The taxonomy of the organism
• The FASTA sequence of the protein
• Links to the 3D structure, if one has been determined

In the same way we can see whether the sequence we have sequenced is
similar, and therefore possibly homologous, to any sequence in the
database we are searching. As mentioned, we can search any database of
interest to find out its function or the function of similar
structures.

• BLAST is arguably the most important program in bioinformatics
(maybe in all of biology).
• BLAST is based on sound statistical principles, which are key to its
speed and sensitivity.
• A basic understanding of these principles is key to using and
interpreting BLAST output.
• BLAST can play an essential role in helping us propose:
   - the structure of a protein
   - the function of a sequence
   - the relation of a sequence to an organism
• Use blastn or MEGABLAST for DNA searches.
• Use PSI-BLAST for protein searches.

BOOKS

• BIOINFORMATICS by Pevsner
• BIOINFORMATICS by Jin Xiong
• BIOINFORMATICS by Ghosh and Malik

INTERNET

• SlideShare: www.slideshare.com
• NCBI: www.blast.ncbi.nlm.nih.gov/Blast.cgi
• UniProt/Swiss-Prot: www.uniprot.org
