Sequence comparison techniques
Ms. Ruchi Yadav, Lecturer
Amity Institute of Biotechnology, Amity University, Lucknow (UP)
Sequence comparison techniques
• Pairwise alignment
    • Local alignment (Smith-Waterman algorithm)
    • Global alignment (Needleman-Wunsch algorithm)
• Multiple alignment
• Heuristic methods
    • Rather than struggling to find the optimal alignment, we can save a great deal of time by using heuristic algorithms.
    • Execution time is much shorter.
    • They may completely miss the optimal alignment.
    • Examples: FASTA and BLAST.
Heuristic methods: the problem with dynamic programming

    A T T G A C T T A A G
    1 1 1 1 1 1 1 1 1 1 1  G
    2 2 2 2 1 1 1 1 1 1 1  G
    2 2 2 2 2 2 2 2 2 2 1  A
    3 3 3 3 3 3 3 3 2 2 1  T
    4 4 4 4 4 4 3 3 2 2 1  C
    5 5 5 5 4 4 3 3 2 2 1  G
    6 5 5 5 5 4 3 3 3 2 1  A

Dynamic programming computes scores over a large area of the matrix that is irrelevant to the optimal alignment. FASTA instead focuses on the diagonal regions.
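As a reference point for the heuristics that follow, here is a minimal sketch of the two exact pairwise methods from the outline, computing scores only (no traceback). The scoring values (match +1, mismatch -1, linear gap -1) are illustrative assumptions, not values from the lecture. Both functions fill every cell of the (m + 1) x (n + 1) matrix, which is exactly the cost that FASTA and BLAST try to avoid.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score: fills every cell of the (m+1) x (n+1) matrix."""
    m, n = len(a), len(b)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap  # a[:i] aligned against leading gaps
    for j in range(1, n + 1):
        F[0][j] = j * gap  # b[:j] aligned against leading gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # match/mismatch
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
    return F[m][n]


def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Local alignment score: same full matrix, but scores never drop below 0."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])  # a local alignment may end anywhere
    return best


print(needleman_wunsch("ACGT", "AGT"))         # 2
print(smith_waterman("TTACGTAA", "GGACGTCC"))  # 4
```

The only difference between the two recurrences is the floor at 0 and taking the maximum over all cells, which is what turns a global score into a local one.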
Heuristic idea
A good local alignment should contain some exactly matching subsequences. FASTA focuses on these regions.
Heuristic methods: FASTA and BLAST
• FASTA: the first fast sequence-searching algorithm for comparing a query sequence against a database.
• BLAST (Basic Local Alignment Search Tool): an improvement on FASTA in search speed, ease of use and statistical rigor.
FASTA algorithm
(a) Find runs of identical words. Identify the regions shared by the two sequences that have the highest density of single identities (ktup = 1) or of two consecutive identities (ktup = 2).
(b) Re-score using a PAM matrix. The best diagonal runs are scored again using the PAM250 matrix (or another substitution matrix). The score of the best single region is saved as the init1 score.
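The re-scoring in step (b) can be sketched as follows: along one diagonal (a fixed offset, position in the subject minus position in the query), the best-scoring ungapped segment is found with a running-sum scan that restarts whenever the score goes negative, and init1 is the best such score over the candidate diagonals. The function `simple_score` (+5 for an identity, -4 otherwise) is a hypothetical stand-in for PAM250, which is not reproduced here, and the candidate offsets are illustrative.

```python
def simple_score(x: str, y: str) -> int:
    """Hypothetical stand-in for a substitution matrix such as PAM250."""
    return 5 if x == y else -4


def diagonal_run_score(query: str, subject: str, offset: int, score=simple_score) -> int:
    """Best ungapped segment score on the diagonal where subject_pos - query_pos == offset."""
    best = run = 0
    for i in range(len(query)):
        j = i + offset
        if 0 <= j < len(subject):
            # restart the segment whenever the running score would go negative
            run = max(0, run + score(query[i], subject[j]))
            best = max(best, run)
    return best


query, subject = "GAATTCAGTTA", "GGATCGA"
candidate_offsets = [0, 1, -6]  # illustrative: diagonals that contain word hits
init1 = max(diagonal_run_score(query, subject, d) for d in candidate_offsets)
print(init1)  # 11
```

Because mismatches only lower the running score instead of ending the run, a diagonal region can bridge a few non-identical positions, which is the point of re-scoring with a substitution matrix.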
[Figure: FASTA algorithm, init1 region (ktup = 2)]
FASTA algorithm (continued)
(c) Join segments using gaps and eliminate other segments. Neighbouring high-scoring diagonal regions are joined, and the score of the joined region is called initn. This score may be lower than the sum of the joined regions because a penalty is charged for each gap.
(d) Use DP to create the optimal alignment. Construct an optimal local alignment of the query sequence and the library sequence with the Smith-Waterman (SW) algorithm. This score is reported as the optimized score.
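Step (c) can be sketched as a small chaining problem: each region is a diagonal segment with a score, a region may follow another only if it starts after the previous one ends in both sequences, and each join pays a penalty. This is a simplified sketch of the idea behind initn, not FASTA's exact joining rule; the `join_penalty` default of 5 and the region coordinates in the example are hypothetical.

```python
from typing import List, Tuple

# (query_start, query_end, subject_start, subject_end, score); ends are exclusive
Region = Tuple[int, int, int, int, int]


def initn_score(regions: List[Region], join_penalty: int = 5) -> int:
    """Best total score of a chain of co-linear, non-overlapping regions,
    charging join_penalty for each join (a simplified view of initn)."""
    regions = sorted(regions)  # by query start, then the other fields
    best = [r[4] for r in regions]  # best chain score ending with region k
    for k, (q_start, _, s_start, _, score) in enumerate(regions):
        for p in range(k):
            _, p_q_end, _, p_s_end, _ = regions[p]
            # region p may precede region k only if it ends first in both sequences
            if p_q_end <= q_start and p_s_end <= s_start:
                best[k] = max(best[k], best[p] + score - join_penalty)
    return max(best, default=0)


regions = [(0, 3, 0, 3, 20), (5, 8, 7, 10, 15), (2, 6, 1, 5, 10)]
print(initn_score(regions))  # 30: first two regions joined, 20 + 15 - 5
```

The third region overlaps the first in both sequences, so it cannot be chained with it; the join penalty is why the joined score is less than the plain sum of 35.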
[Figure: FASTA alignments, initn]
FASTA algorithm: finding runs of identical words
A look-up table showing the positions of each word of length k (a k-tuple) is constructed for each sequence. The relative position (offset) of each word shared by the two sequences is then calculated by subtracting its position in the first sequence from its position in the second. Words with the same offset are in phase and reveal a region of alignment between the two sequences.
FASTA algorithm: using the look-up table
Query:    G A A T T C A G T T A   (positions 1 to 11)
Sequence: G G A T C G A

Look-up table for the query:
    Q (residue)   Location
    A             2, 3, 7, 11
    C             6
    G             1, 8
    T             4, 5, 9, 10

[Figure: dot matrix of the query against the sequence]
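The look-up table and offset calculation can be sketched in Python for ktup = 1, using the slide's query and sequence. Positions are 1-based as on the slide, and the offset is taken as position in the sequence minus position in the query; the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_lookup(seq: str, k: int) -> dict:
    """Map each k-tuple to the 1-based positions where it starts in seq."""
    table = defaultdict(list)
    for p in range(len(seq) - k + 1):
        table[seq[p:p + k]].append(p + 1)
    return dict(table)


def diagonal_counts(query: str, subject: str, k: int) -> Counter:
    """Count shared k-tuples per offset (position in subject minus position in query)."""
    lookup = build_lookup(query, k)
    counts = Counter()
    for p in range(len(subject) - k + 1):
        for q_pos in lookup.get(subject[p:p + k], []):
            counts[(p + 1) - q_pos] += 1
    return counts


query, subject = "GAATTCAGTTA", "GGATCGA"
print(build_lookup(query, 1))
# {'G': [1, 8], 'A': [2, 3, 7, 11], 'T': [4, 5, 9, 10], 'C': [6]}
print(diagonal_counts(query, subject, 1).most_common(1))  # [(0, 4)]
```

The largest count is on offset 0 (identities at positions 1, 3, 4 and 7), so that diagonal would be the first candidate region. Each sequence position costs one table look-up, which is why this step is linear in the length of the database sequence.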
FASTA algorithm: banded dynamic programming
Dynamic programming is applied only in a restricted band around the best-scoring initial alignment, to find an alignment that scores higher than that initial alignment. The width of this band is a parameter.
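A minimal sketch of banded dynamic programming: the Smith-Waterman recurrence is evaluated only for cells whose offset (subject index minus query index) lies within `width` of the chosen `diagonal`. The match/mismatch/gap values are assumptions, and only in-band cells are stored, so time and memory grow with the band rather than with the full m x n matrix. Out-of-band neighbours are read as 0, which is safe here because local scores are never negative.

```python
def banded_smith_waterman(query: str, subject: str, diagonal: int, width: int,
                          match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Local alignment score restricted to cells with |(j - i) - diagonal| <= width."""
    H = {}  # only in-band cells are stored; missing cells read as 0
    best = 0
    for i in range(1, len(query) + 1):
        lo = max(1, i + diagonal - width)
        hi = min(len(subject), i + diagonal + width)
        for j in range(lo, hi + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            h = max(0,
                    H.get((i - 1, j - 1), 0) + s,   # stay on the same diagonal
                    H.get((i - 1, j), 0) + gap,     # gap in subject
                    H.get((i, j - 1), 0) + gap)     # gap in query
            H[(i, j)] = h
            best = max(best, h)
    return best


print(banded_smith_waterman("TTACGTAA", "GGACGTCC", diagonal=0, width=2))   # 4
print(banded_smith_waterman("AAAAGGGG", "GGGGCCCC", diagonal=0, width=1))   # 1
print(banded_smith_waterman("AAAAGGGG", "GGGGCCCC", diagonal=-4, width=0))  # 4
```

In the second pair the best local alignment (GGGG) lies on offset -4, so a band centred on offset 0 misses it. This is why the band is placed around the best initial region and why its width is a tunable parameter.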
FASTA: complexity
Steps 1 and 2 (selecting the best 10 diagonal runs): let n be the length of a database sequence. These steps take O(n) time because step 1 only uses the look-up table, and O(n) << O(mn) for typical lengths of m, n = 100 to 200.
FASTA: complexity (continued)
The final step computes only a partial DP matrix. Its cost depends on the size of the restricted band (whose width is a parameter) and is less than O(mn). Therefore FASTA is faster than full DP.
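A rough cell count illustrates the saving. Assume m = n = 200 (the upper end of the range above) and an illustrative band half-width w = 16, so the band holds at most 2w + 1 cells in each of the m query rows:

```latex
\begin{aligned}
\text{full DP:}\quad   & m \times n = 200 \times 200 = 40\,000 \text{ cells}\\
\text{banded DP:}\quad & (2w+1)\,m = 33 \times 200 = 6\,600 \text{ cells} \qquad (w = 16)
\end{aligned}
```

Under these assumptions the banded step evaluates about 16.5% of the full matrix.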
[Figure: FASTA step 1, finding seeds]
[Figure: FASTA step 2, re-scoring segments and keeping the top 10]
[Figure: FASTA step 3, eliminating unlikely segments]
[Figure: FASTA step 4, finding the best alignment]
Versions of FASTA
• FASTA compares a query protein sequence to a protein sequence library to find similar sequences. FASTA also compares a DNA sequence to a DNA sequence library.
• TFASTA compares a query protein sequence to a DNA sequence library, after translating the DNA library in all six reading frames.
• FASTX and FASTY translate a query DNA sequence in all three forward reading frames and compare all three frames to a protein sequence database.
• TFASTX and TFASTY compare a query protein sequence to a DNA sequence database, translating each DNA sequence in all six possible reading frames.
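The translated searches above depend on translating DNA in several reading frames. A minimal sketch of six-frame translation with the standard genetic code follows (stop codons as `*`, codons containing any other character as `X`). The function names are hypothetical and this is not FASTA's own code; for FASTX/FASTY only the first three (forward) frames would be used.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; '*' = stop, 'X' = unknown."""
    protein = []
    for p in range(0, len(dna) - 2, 3):
        codon = dna[p:p + 3]
        if all(base in BASES for base in codon):
            i, j, k = (BASES.index(base) for base in codon)
            protein.append(CODE[16 * i + 4 * j + k])
        else:
            protein.append("X")
    return "".join(protein)


def six_frames(dna: str) -> list:
    """Three forward frames, then the three frames of the reverse complement."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(dna[f:]) for f in range(3)] + [translate(reverse[f:]) for f in range(3)]


print(translate("ATGAAATAG"))  # MK*
print(six_frames("ATGGCC"))    # ['MA', 'W', 'G', 'GH', 'A', 'P']
```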
BLAST
Publications:
• Ungapped BLAST: Altschul et al., 1990
• Gapped BLAST and PSI-BLAST: Altschul et al., 1997
Basic Local Alignment Search Tool (Altschul et al. 1990, 1994, 1997)
• A heuristic method for local alignment
• Designed specifically for database searches
• Based on the same assumption as FASTA: good alignments contain short stretches of exact matches
Basic Local Alignment Search Tool (BLAST)
Input:
• Query (target) sequence: DNA, RNA or protein
• Scoring scheme: gap penalties, a substitution matrix for proteins, identity/mismatch scores for DNA/RNA
• Word length W: typically W = 3 for proteins and W = 11 for DNA/RNA
Output: statistically significant matches
[Figure: BLAST algorithm parameters]
Algorithm of BLAST
BLAST works in three distinct steps:
• Step 1: query preprocessing
• Step 2: scanning the database for hits
• Step 3: extension of hits
BLAST algorithm, step 1: query preprocessing
Create neighbourhood words for each query word. A query of length L yields at most L - w + 1 query words of length w.
[Figure: a query word and its neighbourhood words]
BLAST algorithm, step 1: query preprocessing (continued)
Build a list of words of length 3 for a protein query (word length 11 is used for DNA sequences).
BLAST: query preprocessing
Compile the list of short, high-scoring words from the query. Here the query word length is 3. Neighbourhood words that score below the threshold are not pursued further.
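Step 1 can be sketched as follows: split the query into its L - w + 1 overlapping words and, for each word, enumerate every word over the alphabet that scores at least a threshold T against it. Protein searches normally use a matrix such as BLOSUM62 with w = 3; to keep the sketch self-contained, the example uses a hypothetical +5/-4 score and small alphabet instead.

```python
from itertools import product


def query_words(query: str, w: int) -> list:
    """Overlapping words of length w; a query of length L yields L - w + 1 of them."""
    return [query[p:p + w] for p in range(len(query) - w + 1)]


def neighbourhood(word: str, alphabet: str, score, T: int) -> list:
    """Every word over `alphabet` whose ungapped score against `word` is at least T.
    Brute force over len(alphabet) ** len(word) candidates (8,000 for proteins, w = 3)."""
    return ["".join(cand) for cand in product(alphabet, repeat=len(word))
            if sum(score(x, y) for x, y in zip(word, cand)) >= T]


def toy_score(x: str, y: str) -> int:
    """Hypothetical +5/-4 score; a protein search would use e.g. BLOSUM62 instead."""
    return 5 if x == y else -4


print(len(query_words("GAATTCAGTTA", 3)))                # 9
print(len(neighbourhood("ACG", "ACGT", toy_score, T=6)))  # 10: the word + 9 single substitutions
print(neighbourhood("ACG", "ACGT", toy_score, T=7))       # ['ACG']
```

Raising T shrinks the neighbourhood, which speeds up the scan in step 2 at the cost of sensitivity.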
BLAST algorithm, step 2: scanning the database for hits
For each word in the neighbourhood word list, identify all exact matches in the database sequences.
[Figure: a query word, its neighbourhood word list (step 1), and exact matches in database sequences 1 and 2 (step 2)]
Steps 1 and 2 serve the same purpose as the corresponding steps in FASTA.
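Step 2 can be sketched as a single pass over a database sequence, looking each of its words up in an index built from the query. For simplicity this sketch indexes only the exact query words (as in a DNA search); for proteins the index keys would be the neighbourhood words from step 1. Positions are 0-based and the function name is hypothetical.

```python
from collections import defaultdict


def scan_for_hits(query: str, subject: str, w: int) -> list:
    """Return (query_pos, subject_pos) pairs, 0-based, where a w-letter word matches exactly."""
    index = defaultdict(list)  # word -> starting positions in the query
    for q in range(len(query) - w + 1):
        index[query[q:q + w]].append(q)
    hits = []
    for s in range(len(subject) - w + 1):  # a single pass over the database sequence
        for q in index.get(subject[s:s + w], []):
            hits.append((q, s))
    return hits


print(scan_for_hits("GAATTCAGTTA", "CCATTCAGG", 3))
# [(2, 2), (3, 3), (4, 4), (5, 5)]
```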
Step 3: extension of the hits
Every hit generated in step 2 is now extended in both directions, without gaps, to determine whether it is part of a longer segment pair with a higher score.
Step 3: extension of the hits (continued)
• HSP (high-scoring segment pair): if the extended segment pair has a score greater than or equal to S (set as a parameter of the program), it is called an HSP.
• MSP (maximal segment pair): in a comparison, the best-scoring HSP for each sequence in the database is called the MSP.
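Step 3 can be sketched as an ungapped extension with an X-drop rule: the hit is extended to the right and then to the left, and extension in each direction stops once the running score falls more than `x_drop` below the best score seen so far. The +5/-4 scores, `x_drop = 10` and the threshold `S = 25` are hypothetical values chosen for illustration.

```python
def toy_score(x: str, y: str) -> int:
    """Hypothetical +5/-4 score used only for this illustration."""
    return 5 if x == y else -4


def extend_hit(query, subject, q, s, w, score=toy_score, x_drop=10):
    """Ungapped X-drop extension of a w-letter hit at query[q], subject[s].
    Returns (score, q_start, q_end, s_start, s_end) with exclusive ends."""
    best = run = sum(score(query[q + k], subject[s + k]) for k in range(w))
    right = w  # length kept to the right of the hit start
    k = w
    while q + k < len(query) and s + k < len(subject):
        run += score(query[q + k], subject[s + k])
        k += 1
        if run > best:
            best, right = run, k
        elif best - run > x_drop:
            break  # score fell too far below the best: stop extending
    run, left = best, 0  # extend left from the best right-hand end
    k = 1
    while q - k >= 0 and s - k >= 0:
        run += score(query[q - k], subject[s - k])
        if run > best:
            best, left = run, k
        elif best - run > x_drop:
            break
        k += 1
    return best, q - left, q + right, s - left, s + right


S = 25  # hypothetical HSP score threshold
hsp = extend_hit("GAATTCAGTTA", "CCATTCAGG", q=2, s=2, w=3)
print(hsp, hsp[0] >= S)  # (30, 2, 8, 2, 8) True
```

Since each result tuple starts with its score, the MSP among several HSPs for one database sequence is simply `max(hsps)`.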
[Figure: high-scoring segment pair (HSP)]
[Figure: maximal segment pair (MSP)]
[Figure: BLAST step 2, extracting seeds]
[Figure: BLAST step 3, finding HSPs]
[Figure: BLAST step 4, combining HSPs]
BLAST
[Figure: Basic BLAST]
Specialized BLAST:
• Search trace archives
• Find conserved domains in your sequence (cds)
• Find sequences with similar conserved domain architecture (cdart)
• Search sequences that have gene expression profiles (GEO)

• Search for SNPs (snp)
• Screen sequence for vector contamination (vecscreen)
• Align two (or more) sequences using BLAST (bl2seq)
• Search protein or nucleotide targets in PubChem BioAssay
• Search SRA transcript and genomic libraries
• Constraint-based protein multiple alignment tool

[Figure: databases available on the BLAST web server]
[Figure: options and parameter settings available on the BLAST server]