BLAST
06/01/2012
Introduction:
- BLAST is an acronym for Basic Local Alignment Search Tool.
- The BLAST program was developed by Stephen Altschul et al. at NCBI in 1990.
- Like FASTA, it is a heuristic method.
- It is one of the most popular programs for sequence analysis.
- It enables a researcher to compare a query sequence with a library (database) of sequences and to identify library sequences that resemble the query sequence above a certain score threshold.
- The objective is to find high-scoring ungapped segments among related sequences.

Using BLAST
http://www.ncbi.nlm.nih.gov/BLAST
1. Select the BLAST program to use (blastn, blastp, blastx, tblastn, tblastx).
2. Select the database to search; different BLAST programs have different databases.
3. Enter the query sequence.
4. Submit the search.
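The same submission can also be scripted. The sketch below only builds the request URL that would submit a search; it assumes NCBI's BLAST URL API endpoint and its CMD=Put, PROGRAM, DATABASE and QUERY parameters, which should be checked against NCBI's own documentation. The helper name build_put_url is hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed endpoint of NCBI's BLAST URL API; verify against NCBI's documentation.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

# The five classic BLAST programs named in step 1.
PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_put_url(program: str, database: str, query: str) -> str:
    """Build, but do not send, a URL that would submit a search (CMD=Put)."""
    if program not in PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program!r}")
    # Steps 1-3: program, database and query become URL parameters.
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": query}
    return f"{BLAST_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_put_url("blastp", "nr", "GTQITVEDLFYNIATRRKALKN"))
```

A real client would send this request (step 4), receive a request ID (RID) and poll for the results; that part is deliberately left out of the sketch.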
Steps in BLAST
- The query sequence is optionally filtered to remove low-complexity regions (e.g. AGAGAG…).
- The next step is to create a list of words from the query sequence. Each word is typically 3 residues long for protein sequences and 11 residues long for DNA sequences.
- The list includes every overlapping word that can be extracted from the query sequence. This step is also called seeding.
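As a rough illustration of the optional filtering step, the sketch below masks every window whose residue composition has low Shannon entropy, so a repeat such as AGAGAG… is hidden before seeding. This is a toy stand-in, not the SEG (protein) or DUST (nucleotide) filters that NCBI BLAST actually uses; the function names, the 12-residue window, the 1.5-bit cut-off and the mask character are all arbitrary assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 1.5,
                        mask_char: str = "N") -> str:
    """Replace every residue inside a low-entropy window with mask_char.

    Toy filter for illustration only (not SEG or DUST); window size and
    entropy cut-off are arbitrary choices.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    # A made-up DNA sequence with an AG repeat in the middle.
    print(mask_low_complexity("ACGTTGCAAGAGAGAGAGAGAGCTTACG"))
```

Because whole windows are masked, a few flanking residues next to the repeat may be masked as well; real filters handle the boundaries more carefully.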
PROTEIN WORDS
Query: GTQITVEDLFYNIATRRKALKN
Word size = 3 (word size can be 2 or 3; default = 3)
Words: GTQ, TQI, QIT, ITV, TVE, VED, EDL, DLF, ...
Make a lookup table of words.
Neighborhood words (for the word ITV): LTV, MTV, ISV, LSV, etc.
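The word list above is just every overlapping substring of length w. A minimal sketch, with hypothetical helper names, that produces the words and a lookup table mapping each word to the query positions where it starts:

```python
def query_words(query: str, w: int = 3) -> list[str]:
    """Every overlapping word of length w, in the order it occurs in the query."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def lookup_table(query: str, w: int = 3) -> dict[str, list[int]]:
    """Map each distinct word to the query positions where it starts."""
    table: dict[str, list[int]] = {}
    for pos, word in enumerate(query_words(query, w)):
        table.setdefault(word, []).append(pos)
    return table


if __name__ == "__main__":
    protein = query_words("GTQITVEDLFYNIATRRKALKN", 3)
    print(protein[:8])  # ['GTQ', 'TQI', 'QIT', 'ITV', 'TVE', 'VED', 'EDL', 'DLF']
    dna = query_words("GTACTGGACATGGACCCTACAGGAA", 11)
    print(dna[:3])      # ['GTACTGGACAT', 'TACTGGACATG', 'ACTGGACATGG']
```

A query of length L yields L - w + 1 words, so the 22-residue protein query gives 20 three-letter words. The same function with w = 11 gives nucleotide words.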
NUCLEOTIDE WORDS
Query: GTACTGGACATGGACCCTACAGGAA
Word size = 11 (minimum word size = 7; blastn default = 11; megablast default = 28)
Words: GTACTGGACAT, TACTGGACATG, ACTGGACATGG, CTGGACATGGA, TGGACATGGAC, GGACATGGACC, GACATGGACCC, ACATGGACCCT, ...
Make a lookup table of words.

The third step is to search a sequence database for occurrences of these words. This step identifies the database sequences that contain matching words.
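For DNA words, a "matching word" means an exact match of w letters. A minimal sketch, using hypothetical helper names and a made-up database sequence, that slides a w-letter window along the database sequence and looks each window up in the query's word table:

```python
def build_lookup(query: str, w: int) -> dict[str, list[int]]:
    """Index every w-letter query word by its start positions."""
    table: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        table.setdefault(query[i:i + w], []).append(i)
    return table


def find_word_hits(query: str, subject: str, w: int = 11) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a w-letter word matches exactly."""
    table = build_lookup(query, w)
    hits = []
    # Slide a w-letter window along the database (subject) sequence.
    for j in range(len(subject) - w + 1):
        for i in table.get(subject[j:j + w], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    query = "GTACTGGACATGGACCCTACAGGAA"
    subject = "TTTT" + query[:15] + "TTTT"   # made-up database sequence
    print(find_word_hits(query, subject))
    # [(0, 4), (1, 5), (2, 6), (3, 7), (4, 8)]
```

All five hits lie on the same diagonal (subject_pos - query_pos = 4), which is the kind of region the following extension step grows into an alignment.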
- Using a substitution score matrix, each query word is scored against words in the database sequences; because the matrix entries are log-odds scores, the scores of the aligned residues are simply added.
- A cut-off score (T) is selected to reduce the number of matches to the most significant ones.
- This procedure is repeated for each word in the query sequence.
- The remaining high-scoring words are organized into an efficient search tree and rapidly compared with the database sequences.
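For protein words, this scoring step is what produces the "neighborhood words" listed earlier. The sketch below enumerates every three-letter word and keeps those whose summed BLOSUM62 score against a query word reaches T. BLOSUM62 is assumed here as the substitution matrix and T is a free parameter; the helper names are hypothetical, and real BLAST builds this list far more efficiently than brute-force enumeration.

```python
from itertools import product

AA = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 scores; rows and columns follow the order of AA.
_BLOSUM62_ROWS = [
    " 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0",  # A
    "-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3",  # R
    "-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3",  # N
    "-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3",  # D
    " 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1",  # C
    "-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2",  # Q
    "-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2",  # E
    " 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3",  # G
    "-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3",  # H
    "-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3",  # I
    "-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1",  # L
    "-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2",  # K
    "-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1",  # M
    "-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1",  # F
    "-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2",  # P
    " 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2",  # S
    " 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0",  # T
    "-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3",  # W
    "-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1",  # Y
    " 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4",  # V
]

BLOSUM62 = {
    (a, b): int(score)
    for a, row in zip(AA, _BLOSUM62_ROWS)
    for b, score in zip(AA, row.split())
}


def word_score(w1: str, w2: str) -> int:
    """Ungapped score of two equal-length words: sum of per-position BLOSUM62 scores."""
    return sum(BLOSUM62[a, b] for a, b in zip(w1, w2))


def neighborhood(word: str, t: int) -> list[str]:
    """All words of the same length scoring at least t against `word`, best first."""
    candidates = ("".join(p) for p in product(AA, repeat=len(word)))
    keep = [c for c in candidates if word_score(word, c) >= t]
    return sorted(keep, key=lambda c: -word_score(word, c))


if __name__ == "__main__":
    for w in ("ITV", "LTV", "MTV", "ISV", "LSV"):
        print(w, word_score("ITV", w))  # ITV 13, LTV 11, MTV 10, ISV 9, LSV 7
    print(neighborhood("ITV", 11))
```

Under BLOSUM62 the neighbors named on the slide score 11 (LTV), 10 (MTV), 9 (ISV) and 7 (LSV) against ITV, so how many of them survive depends directly on the chosen T: a higher T gives fewer seeds and a faster but less sensitive search.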



If a good match (a word hit) is found, an alignment is extended from the match area in both directions for as long as the score continues to grow.

The extension continues until the alignment score, because of mismatches, falls more than a set amount (the drop-off threshold) below the best score reached so far (the drop threshold is twenty-two for proteins and twenty for DNA).
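The extension rule above is usually called an X-drop extension. A minimal sketch for DNA, assuming simple match/mismatch scores of +1/-3 and an x_drop measured in the same raw-score units (both are illustrative choices, not BLAST's internal settings); the function names are hypothetical. It grows an exact word hit in both directions and keeps the best-scoring stretch, which is the HSP.

```python
def extend(q: str, s: str, qi: int, si: int, step: int,
           x_drop: int, match: int, mismatch: int) -> tuple[int, int]:
    """Extend from (qi, si) in direction step (+1 right, -1 left).

    Returns (best score gained, number of positions giving that best score).
    Stops when the running score falls more than x_drop below the best seen.
    """
    score = best = best_len = length = 0
    while 0 <= qi < len(q) and 0 <= si < len(s):
        score += match if q[qi] == s[si] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break
        qi += step
        si += step
    return best, best_len


def ungapped_extend(q: str, s: str, q_start: int, s_start: int, w: int,
                    x_drop: int = 20, match: int = 1, mismatch: int = -3):
    """Grow an exact w-letter hit at (q_start, s_start) into an ungapped HSP.

    Returns (score, (q_from, q_to), (s_from, s_to)) with half-open ranges.
    """
    right, r_len = extend(q, s, q_start + w, s_start + w, +1, x_drop, match, mismatch)
    left, l_len = extend(q, s, q_start - 1, s_start - 1, -1, x_drop, match, mismatch)
    score = w * match + left + right  # the seed word itself is an exact match
    return (score,
            (q_start - l_len, q_start + w + r_len),
            (s_start - l_len, s_start + w + r_len))


if __name__ == "__main__":
    q = "GTACTGGACATGGACCCTACAGGAA"
    s = "TTTT" + q[:15] + "TTTT"          # made-up database sequence
    print(ungapped_extend(q, s, 0, 4, 11))  # (15, (0, 15), (4, 19))
```

Only the best-scoring prefix of each extension is kept, so the trailing mismatches explored before the drop-off do not end up in the HSP.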
The resulting contiguous aligned segment pair without gaps is called a high-scoring segment pair (HSP).
- In the original version of BLAST, the highest-scoring HSPs are presented as the final report.

A later improvement in the implementation of BLAST (gapped BLAST, published in 1997) is the ability to produce gapped alignments.
- In gapped BLAST, the highest-scoring segment is chosen to be extended in both directions using dynamic programming, in which gaps may be introduced.
- The extension continues as long as the alignment score stays above a certain threshold; otherwise it is terminated.
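Gapped BLAST runs an X-drop form of dynamic programming outward from the seed. The sketch below swaps in plain Smith-Waterman local alignment with a linear gap penalty over the whole pair, which is simpler than what BLAST does but shows why allowing gaps can raise the score. The scores (+2 match, -1 mismatch, -2 per gap position) are arbitrary assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # DP row for a[:i-1]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below 0 (start afresh).
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # b has one extra T: the best ungapped segment scores 10,
    # while one gap lets all 8 residues of a align (8 * 2 - 2 = 14).
    print(smith_waterman("ACGTACGT", "ACGTTACGT"))
```

Only two rows of the dynamic-programming matrix are kept, so memory grows with the length of b rather than with the product of both lengths.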

BLAST Output
1. An introduction that tells where the search occurred and which database and query were compared.
2. A list of the database sequences containing segment pairs whose scores were least likely to occur by chance.
3. Alignments of the high-scoring segment pairs, showing identical and similar residues.
4. A complete list of the parameter settings used for the search.
BLAST Variants
Program | Query sequence | Database sequence
BLASTP | protein | protein
BLASTN | nucleic acid | nucleic acid
BLASTX | translated nucleic acid | protein
TBLASTN | protein | translated nucleic acid
TBLASTX | translated nucleic acid | translated nucleic acid
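BLASTX, TBLASTN and TBLASTX work on translated nucleic acid, meaning the DNA is translated in all six reading frames (three on each strand) before protein-level comparison. A minimal sketch of six-frame translation, assuming the standard genetic code only; the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ; '*' = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate codon by codon; codons with unknown bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Frames +1, +2, +3 on the given strand, then +1, +2, +3 on the reverse complement."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna[f:]) for f in range(3)] + [translate(rev[f:]) for f in range(3)]


if __name__ == "__main__":
    for frame in six_frames("ATGGCCTAA"):
        print(frame)
```

Incomplete codons at the end of a frame are simply dropped.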
Databases available on the BLAST web server (database: description)
A. Peptide sequence databases
1. nr: translations of GenBank DNA sequences with redundancies removed, plus PDB, SwissProt, PIR, and PRF
2. month: new or revised entries or updates to nr in the previous 30 days
3. Swissprot: latest release of the SwissProt protein sequence database
4. Drosophila genome: provided by Celera and the Berkeley Drosophila Genome Project
5. yeast: yeast (Saccharomyces cerevisiae) genomic sequences
6. E. coli: E. coli genomic sequences
7. pdb: sequences of proteins of known three-dimensional structure from the Brookhaven Protein Data Bank
8. yeast: yeast (S. cerevisiae) protein sequences
9. E. coli: E. coli genomic coding sequence translations
10. kabat [kabatpro]: Kabat's database of sequences of immunological interest
11. Alu: translations of select Alu repeats from REPBASE, a database of sequence repeats
B. Nucleotide sequence databases
1. nr: GenBank, EMBL, DDBJ, and PDB sequences with redundancies removed (EST, STS, GSS, and HTGS sequences excluded)
2. month: new or revised entries or updates to nr in the previous 30 days
3. dbest: EST sequences from GenBank, EMBL, and DDBJ with redundancies removed
4. dbsts: STS sequences from GenBank, EMBL, and DDBJ with redundancies removed
5. htgs: high-throughput genomic sequences
6. kabat [kabatnuc]: Kabat's database of sequences of immunological interest
7. vector: vector subset of GenBank
8. mito: database of mitochondrial sequences
9. alu: select Alu repeats from REPBASE, a database of sequence repeats; suitable for masking Alu repeats from query sequences
10. epd: eukaryotic promoter database
11. gss: genome survey sequences, including single-pass genomic data, exon-trapped sequences, and Alu PCR sequences
Difference between BLAST and FASTA
BLAST | FASTA
Uses a substitution matrix to find matching words | Uses a hashing procedure
Word size: protein = 3, DNA = 11 | k-tuple: protein = 2, DNA = 4-6
Faster than FASTA | Slower than BLAST
Higher specificity than FASTA, owing to low-complexity masking | Lower specificity
E-value (expectation value)
- An important statistical indicator in sequence alignment.
- It gives the number of alignments with at least the observed score that would be expected purely by random chance in a database search of this size; for small values it is close to the probability of such a chance match.
- The E-value therefore indicates how likely it is that a given sequence match arose purely by chance.
- The lower the E-value, the less likely the database match is a result of random chance, and therefore the more significant the match.
Formula
The E-value is determined by the equation
E = m × n × P
where
- m is the total number of residues in the database,
- n is the number of residues in the query sequence, and
- P is the probability that an HSP alignment is a result of random chance.
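The factor P can be made concrete with the Karlin-Altschul statistics on which BLAST's E-values are based: for an alignment with raw score s, E = K × m × n × e^(−λs), so K × e^(−λs) plays the role of P. The sketch below computes this, plus the probability of at least one chance hit, 1 − e^(−E), which is why small E-values and probabilities nearly coincide. Real BLAST uses effective lengths corrected for edge effects, which this sketch ignores; the λ and K values in the usage are illustrative assumptions, and the function names are hypothetical.

```python
import math


def karlin_altschul_evalue(s: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance alignments scoring >= s: E = K * m * n * exp(-lambda * s)."""
    return k * m * n * math.exp(-lam * s)


def prob_at_least_one(e: float) -> float:
    """Probability of at least one chance alignment this good: 1 - exp(-E)."""
    return -math.expm1(-e)


if __name__ == "__main__":
    # Illustrative numbers only: lambda and K depend on the scoring system used.
    e = karlin_altschul_evalue(s=120, m=10**9, n=250, lam=0.267, k=0.041)
    print(e, prob_at_least_one(e))
```

Doubling the database size m doubles E, so the same raw score becomes less significant in a larger database.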

Bit Score
A bit score is another prominent statistical indicator used in addition to the E-value in BLAST output. The bit score measures sequence similarity independently of query sequence length and database size; it is obtained by normalizing the raw pairwise alignment score.
Formula
The bit score (S) is determined by the following formula:
S = (λ × s − ln K) / ln 2
where
- λ is the Gumbel distribution constant (a scaling parameter of the scoring system),
- s is the raw alignment score, and
- K is a constant associated with the scoring matrix used.
Thus, the bit score (S) is linearly related to the raw alignment score (s). Hence, the higher the bit score, the more significant the match.
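Because λ and K are folded into the bit score, the E-value can be written as E = m × n × 2^(−S) without referring to the scoring system again. The sketch below computes S from a raw score and then E from S; the λ = 0.267 and K = 0.041 values are only example inputs (a BLAST report prints the Lambda and K actually used), and the function names are hypothetical.

```python
import math


def bit_score(s: float, lam: float, k: float) -> float:
    """Normalized score S = (lambda * s - ln K) / ln 2, in bits."""
    return (lam * s - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """E = m * n * 2**(-S); lambda and K are already folded into S."""
    return m * n * 2.0 ** (-bits)


if __name__ == "__main__":
    lam, k = 0.267, 0.041   # example values; read the real ones from the BLAST report
    for raw in (50, 100):
        bits = bit_score(raw, lam, k)
        print(raw, round(bits, 1), evalue_from_bits(bits, m=10**9, n=250))
```

Each extra bit halves the E-value, which is why bit scores from searches with different matrices can be compared directly.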

Blast bioinformatics

