1. BLAST is a program that uses computer algorithms to compare a query DNA or protein sequence to sequence databases and identify sequences that resemble the query sequence above a certain threshold.
2. BLAST works by searching for short, exact matches between the query and database sequences, then extends the matches to find similar though not exact alignments.
3. Analyzing the BLAST results can provide information about the evolutionary relationship between the query sequence and matched sequences, such as whether they come from the same gene or protein family.
2. BIOINFORMATICS
Bioinformatics is an emerging field of science which uses computer
technology for the storage, retrieval, manipulation and distribution of
information related to biological data, specifically DNA, RNA and
proteins.
DATABASE
Databases are repositories in which biological data is stored in
computer-readable form. They are classified on various bases, such as
data type, data source and organism.
TOOLS
Tools are software developed to perform various tasks over the
stored data such as searches, analysis, submission, annotation, etc.
RESIDUE
The term stands for the building blocks of the macromolecules in the
databases: for example, nucleotides for DNA and RNA, and amino acids
for proteins.
3. On basis of Data Type
Genome Databases
Sequence Databases
Structure Databases
Microarray Databases
Chemical Databases
Metabolic Databases
Enzyme Databases
Disease Databases
Literature Databases
Taxonomy Database
On basis of Data Source
Primary Databases
Secondary Databases
Special Categories
Integrated Database
Composite Database
5. BLAST stands for Basic Local Alignment Search Tool
BLAST is a program which uses specific scoring matrices (like
PAM or BLOSUM) to perform sequence-similarity
searches against a variety of sequence databases, giving us
high-scoring ungapped segments among related sequences.
Complex: requires multiple steps and many parameters
The BLAST algorithm is fast, accurate, and web-accessible
Is relatively fast compared with other sequence-similarity search tools
Provides the ability to perform analyses with different types
of programs
6. Program   Input query   Database searched   Reading-frame comparisons
blastn       DNA           DNA                 1
blastp       protein       protein             1
blastx       DNA           protein             6
tblastn      protein       DNA                 6
tblastx      DNA           DNA                 36
7. blastn compares a DNA query sequence against a DNA
database, allowing for gaps
blastp compares a protein query sequence against a
protein database, allowing for gaps
blastx compares a DNA query sequence translated into
six reading frames against a protein database,
allowing for gaps
tblastn compares a protein query sequence against a
DNA database translated into six reading frames,
allowing for gaps
tblastx compares a DNA query sequence translated into
six reading frames against a DNA database
translated into six reading frames. tblastx doesn’t
allow for gaps.
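The six reading frames used by blastx, tblastn and tblastx can be illustrated with a small sketch. It assumes the standard genetic code and an upper- or lower-case DNA string containing only A, C, G and T; the function names are illustrative and are not part of BLAST itself. Three frames come from the given strand and three from its reverse complement, which is also why tblastx, translating both query and database, ends up with 6 x 6 = 36 frame combinations.

```python
from itertools import product

# Standard genetic code, codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are shown as `*`; a real search program then compares each translated frame against the protein sequences.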
8. MEGABLAST - for comparison of large sets of long DNA
sequences
RPS-BLAST - Conserved Domain Detection
BLAST 2 Sequences - for performing pair-wise alignments for
2 chosen sequences
Genomic BLAST - for alignments against select human,
microbial or malarial genomes
PSI-BLAST - constructs a multiple alignment from
matches and uses it as a position-specific profile for further, iterative searches
PHI-BLAST - specify a pattern that hits must match
9. Make specific primers with Primer-BLAST
Search trace archives
Find conserved domains in your sequence (cds)
Find sequences with similar conserved domain architecture
(cdart)
Search sequences that have gene expression profiles (GEO)
Search immunoglobulins (IgBLAST)
Search using SNP flanks
Screen sequence for vector contamination (vecscreen)
Align two (or more) sequences using BLAST (bl2seq)
Search protein or nucleotide targets in PubChem BioAssay
Search SRA transcript and genomic libraries
Constraint Based Protein Multiple Alignment Tool
Needleman-Wunsch Global Sequence Alignment Tool
Search RefSeqGene
http://blast.ncbi.nlm.nih.gov/Blast.cgi
10. How BLAST works is somewhat complicated and lengthy; in brief,
it works in the following two steps:
1. BLAST first breaks the query sequence into short regions of a
given length (W) called “words” (or substrings), and searches the
sequences in the database (“target sequences”) for words that score
at least “T” when aligned with a query word using a substitution matrix.
2. For every pair of sequences (query and target) that have a word
or words in common, BLAST extends the alignment in both
directions to find alignments that score greater (are more
similar) than a certain score threshold (S). These alignments are
called high-scoring segment pairs or HSPs; the maximal scoring HSPs are
called maximal segment pairs or MSPs.
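The role of the threshold T in step 1 can be shown with a toy example. The sketch below lists every word of the same length as a query word whose score against it is at least T. The scoring function `toy_score` and the four-letter alphabet are hypothetical stand-ins chosen to keep the example small; a real protein search would use a substitution matrix such as PAM or BLOSUM over all twenty amino acids.

```python
from itertools import product

# Hypothetical toy scoring, NOT a real PAM or BLOSUM matrix:
# identical residues score +4, two hydrophobic residues (I, L, V) score +2,
# any other pair scores -2.
HYDROPHOBIC = set("ILV")


def toy_score(x, y):
    if x == y:
        return 4
    if x in HYDROPHOBIC and y in HYDROPHOBIC:
        return 2
    return -2


def word_score(word_a, word_b, score=toy_score):
    """Sum of per-position scores for two words of equal length."""
    return sum(score(x, y) for x, y in zip(word_a, word_b))


def neighborhood(query_word, alphabet, t, score=toy_score):
    """All words over `alphabet` scoring at least `t` against `query_word`."""
    return [
        "".join(candidate)
        for candidate in product(alphabet, repeat=len(query_word))
        if word_score(query_word, candidate, score) >= t
    ]


if __name__ == "__main__":
    # The query word "IK" with T = 6 also accepts "LK" and "VK" as seeds.
    print(neighborhood("IK", "ILVK", t=6))
```

Raising T shrinks the list towards exact matches only, which makes the search faster but less sensitive; lowering T has the opposite effect.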
11. Query Sequence
“words” (subsequences of the query sequence)
Query words are compared to the database (target sequences)
and exact matches identified
For each word match, alignment is extended in both directions to
find alignments that score greater than some threshold
(maximal segment pairs, or MSPs)
(Schneider and La Rota 2000)
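The word-match and extension flow above can be sketched in a few lines. This is a simplified illustration, not the actual BLAST implementation: it assumes exact word matches as seeds, a hypothetical +1/-1 match/mismatch score instead of a substitution matrix, ungapped extension only, and a simple drop-off rule (`x_drop`) to decide when to stop extending. All function and parameter names are chosen for this example.

```python
def find_seeds(query, target, w):
    """Return (query_pos, target_pos) pairs where a length-w word matches exactly."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(target) - w + 1):
        for i in index.get(target[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_one_way(query, target, qi, tj, step, match, mismatch, x_drop):
    """Extend without gaps in one direction; return (best_gain, best_length)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= tj < len(target):
        score += match if query[qi] == target[tj] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score fell too far below the best seen so far
        qi += step
        tj += step
    return best, best_len


def extend_seed(query, target, qi, tj, w, match=1, mismatch=-1, x_drop=3):
    """Extend a seed both ways; return (score, query_start, target_start, length)."""
    right_gain, right_len = extend_one_way(
        query, target, qi + w, tj + w, 1, match, mismatch, x_drop)
    left_gain, left_len = extend_one_way(
        query, target, qi - 1, tj - 1, -1, match, mismatch, x_drop)
    score = w * match + left_gain + right_gain
    return score, qi - left_len, tj - left_len, left_len + w + right_len


def find_hsps(query, target, w=3, s=5, **scoring):
    """Return distinct extended segments scoring greater than s, best first."""
    hsps = set()
    for qi, tj in find_seeds(query, target, w):
        hsp = extend_seed(query, target, qi, tj, w, **scoring)
        if hsp[0] > s:
            hsps.add(hsp)
    return sorted(hsps, reverse=True)


if __name__ == "__main__":
    print(find_hsps("GATTACA", "CCGATTACACC"))
```

Several seeds usually extend to the same segment, so the results are collected in a set to report each segment once.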
12. BLAST can handle various questions which commonly arise
in the research laboratory. Some of the
most common questions are:
Which bacterial species have a protein that is related to a
protein whose amino-acid sequence I know?
Where does the DNA I’ve sequenced come from?
What other genes encode proteins that exhibit structures
similar to the one I’ve just determined?
What does the protein structure look like?
What is the function of the gene or the protein that I've
sequenced? (if it’s not known then you have some work to do)
What are the probable functions of the sequence I have?
13. To answer these questions we use BLAST to search
the database and then analyse the results which it produces.
Here is an example to explain this.
We have the following protein sequence from our
experiments with Mycobacterium tuberculosis
Sequence:
To see whether this protein has any similarity
to proteins in other organisms, we perform a BLAST search to
understand its importance. To perform BLAST we go to
the following URL
http://blast.ncbi.nlm.nih.gov/
14. After running BLAST against a chosen database or every database, we
analyse the result
A chosen entry is shown below
This entry shows that the sequence for which we ran BLAST
against a database (here Swiss-Prot) has 88% identity with the entry
Full=Single-stranded DNA-binding protein, accession number P46390.2
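A percent-identity figure like the one above can be reproduced from an aligned pair of sequences. The sketch below assumes identity is counted as identical columns divided by the total number of alignment columns, with `-` marking a gap; the two aligned strings in the usage example are made up for illustration and are not the hit described on this slide.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns where both sequences have the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for a, b in zip(aligned_query, aligned_subject)
        if a == b and a != "-"  # a gap column never counts as an identity
    )
    return 100.0 * identical / len(aligned_query)


if __name__ == "__main__":
    # Hypothetical alignment: 4 identical columns out of 6.
    print(round(percent_identity("MKT-AY", "MKTLAF"), 1))
```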
15. The entry shows a score which describes the quality of the match between
the database entry and the query which we sequenced in our experiment.
With the accession number obtained from the
BLAST search we can easily access information about
many aspects of the entry. Some of them are listed below
• The organism from which it came
• Function of the protein
• Region of DNA encoding the gene
• Length of the sequence
• Taxonomy of the organism
• FASTA sequence of the protein
• Links for the 3D structure if it has been found
Similarly we can see whether the sequence which we have sequenced is
homologous (similar) to any of the sequences in the database
which we are searching. As mentioned, we can search any
database of interest to check its function or the function of similar
structures.
16. BLAST is the most important program in bioinformatics
(maybe all of biology)
BLAST is based on sound statistical principles (key to its
speed and sensitivity)
A basic understanding of its principles is key for
using/interpreting BLAST output
BLAST can play an essential role in helping us to propose
the following
Structure of a protein
Function of a sequence
Relation with an organism
Use blastn or MEGABLAST for DNA
Use PSI-BLAST for protein searches