SlideShare uma empresa Scribd logo
1 de 52
Baixar para ler offline
Sequence Alignment
Outline
1. Global Alignment
2. Scoring Matrices
3. Local Alignment
4. Alignment with Affine Gap Penalties
Section 1:
Global Alignment
From LCS to Alignment: Change the Scoring
• Recall: The Longest Common Subsequence (LCS) problem
allows only insertions and deletions (no mismatches).
• In the LCS Problem, we scored 1 for matches and 0 for indels,
so our alignment score was simply equal to the total number of
matches.
• Let’s consider penalizing mismatches and indels instead.
• Simplest scoring schema: For some positive numbers μ and σ:
• Match Premium: +1
• Mismatch Penalty: –μ
• Indel Penalty: –σ
• Under these assumptions, the alignment score becomes as
follows:
Score = #matches – μ(#mismatches) – σ(#indels)
• Our specific choice of µ and σ depends on how we wish to
penalize mismatches and indels.
From LCS to Alignment: Change the Scoring
The Global Alignment Problem
• Input : Strings v and w and a scoring schema
• Output : An alignment with maximum score
• We can use dynamic programming to solve the Global
Alignment Problem:
m : mismatch penalty
σ : indel penalty
if vi = wj
si, j
 max
si 1, j 1
 1
si 1, j 1
 m
si 1, j
 
si, j 1
 







if vi ≠ wj
Section 2:
Scoring Matrices
Scoring Matrices
• To further generalize the scoring of alignments, consider a
(4+1) x (4+1) scoring matrix δ.
• The purpose of the scoring matrix is to score one nucleotide
against another, e.g. A matched to G may be ―worse‖ than C
matched to T.
• The addition of 1 is to include the score for comparison of a
gap character ―-‖.
• This will simplify the
algorithm to the dynamic
formula at right:
Note: For amino acid sequence comparison, we need a (20 + 1) x (20 + 1) matrix.
si, j
 max
si 1, j 1
  v i
, w j 
si 1, j
  v i
,  
si, j 1
  , w j 







Scoring Matrices: Example
A G T C —
A 1 -0.8 -0.2 -2.3 -0.6
G -0.8 1 -1.1 -0.7 -1.5
T -0.2 -1.1 1 -0.5 -0.9
C -2.3 -0.7 -0.5 1 -1
— -0.6 -1.5 -0.9 -1 n/a
A GTC A
CGTTGG
Score: –0.6 – 1 + 1 + 1 – 0.5 – 1.5 – 0.8 = –2.4
• Say we want to align AGTCA and CGTTGG with the
following scoring matrix:
Sample Alignment:
How Do We Make a Scoring Matrix?
• Scoring matrices are created based on biological evidence.
• Alignments can be thought of as two sequences that differ due
to mutations.
• Some of these mutations have little effect on the protein’s
function, therefore some penalties, δ(vi , wj ), will be less harsh
than others.
• This explains why we would want to have a scoring matrix to
begin with.
Scoring Matrix: Positive Mismatches
A R N K
A 5 -2 -1 -1
R -2 7 -1 3
N -1 -1 7 0
K -1 3 0 6
• Notice that although R and
K are different amino acids,
they have a positive
mismatch score.
• Why? They are both
positively charged amino
acids  this mismatch will
not greatly change the
function of the protein.
Scoring Matrix: Positive Mismatches
A R N K
A 5 -2 -1 -1
R -2 7 -1 3
N -1 -1 7 0
K -1 3 0 6
• Notice that although R and
K are different amino acids,
they have a positive
mismatch score.
• Why? They are both
positively charged amino
acids  this mismatch will
not greatly change the
function of the protein.
Scoring Matrix: Positive Mismatches
A R N K
A 5 -2 -1 -1
R -2 7 -1 3
N -1 -1 7 0
K -1 3 0 6
• Notice that although R and
K are different amino acids,
they have a positive
mismatch score.
• Why? They are both
positively charged amino
acids  this mismatch will
not greatly change the
function of the protein.
Mismatches with Low Penalties
• Amino acid changes that tend to preserve the physicochemical
properties of the original residue:
• Polar to Polar
• Aspartate to Glutamate
• Nonpolar to Nonpolar
• Alanine to Valine
• Similarly-behaving residues
• Leucine to Isoleucine
Scoring Matrices: Amino Acid vs. DNA
• Two commonly used amino acid substitution matrices:
1. PAM
2. BLOSUM
• DNA substitution matrices:
• DNA is less conserved than protein sequences
• It is therefore less effective to compare coding regions at
the nucleotide level
• Furthermore, the particular scoring matrix is less
important.
PAM
• PAM: Stands for Point Accepted Mutation
• 1 PAM = PAM1 = 1% average change of all amino acid
positions.
• Note: This doesn’t mean that after 100 PAMs of evolution,
every residue will have changed:
• Some residues may have mutated several times.
• Some residues may have returned to their original state.
• Some residues may not changed at all.
PAMX
• PAMx = PAM1
x (x iterations of PAM1)
• Example: PAM250 = PAM1
250
• PAM250 is a widely used scoring matrix:
Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys ...
A R N D C Q E G H I L K ...
Ala A 13 6 9 9 5 8 9 12 6 8 6 7 ...
Arg R 3 17 4 3 2 5 3 2 6 3 2 9
Asn N 4 4 6 7 2 5 6 4 6 3 2 5
Asp D 5 4 8 11 1 7 10 5 6 3 2 5
Cys C 2 1 1 1 52 1 1 2 2 2 1 1
Gln Q 3 5 5 6 1 10 7 3 7 2 3 5
...
Trp W 0 2 0 0 0 0 0 0 1 0 1 0
Tyr Y 1 1 2 1 3 1 1 1 3 2 2 1
Val V 7 4 4 4 4 4 4 4 5 4 15 10
BLOSUM
• BLOSUM: Stands for Blocks Substitution Matrix
• Scores are derived from observations of the frequencies of
substitutions in blocks of local alignments in related proteins.
• BLOSUM62 was created
using sequences sharing
no more than 62% identity.
• A sample of BLOSUM62
is shown at right.
C S T P … F Y W
C 9 -1 -1 3 … -2 -2 -2
S -1 4 1 -1 … -2 -2 -3
T -1 1 4 1 … -2 -2 -3
P 3 -1 1 7 … -4 -3 -4
… … … … … … … … …
F -2 -2 -2 -4 … 6 3 1
Y -2 -2 -2 -3 … 3 7 2
W -2 -3 -3 -4 … 1 2 11
http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm
The Blosum50 Scoring Matrix
Most entries
are scored
negatively
Differences between PAM and BLOSUM
1. PAM matrices are based on an explicit evolutionary model (i.e.
replacements are counted on the branches of a phylogenetic tree),
BLOSUM matrices are based on an implicit model of evolution.
2. The PAM matrices are based on mutations observed throughout a global
alignment, this includes both highly conserved and highly mutable regions.
The BLOSUM matrices are based only on highly conserved regions in
series of alignments forbidden to contain gaps.
3. The method used to count the replacements is different: unlike the PAM
matrix, the BLOSUM procedure uses groups of sequences within which not
all mutations are counted the same.
4. Higher numbers in the PAM matrix naming scheme denote larger
evolutionary distance, while larger numbers in the BLOSUM matrix naming
scheme denote higher sequence similarity and therefore smaller
evolutionary distance.
Section 3:
Local Alignment
Local Alignment: Why?
• Two genes in different species may be similar over short
conserved regions and dissimilar over remaining regions.
• Example: Homeobox genes have a short region called the
homeodomain that is highly conserved among species.
• A global alignment would not find the homeodomain
because it would try to align the entire sequence.
• Therefore, we search for an alignment which has a positive
score locally, meaning that an alignment on substrings of
the given sequences has a positive score.
Local Alignment: Illustration
Global alignment
Compute a “mini”
Global Alignment to
get Local Alignment
Local vs. Global Alignment: Example
• Global Alignment:
• Local Alignment—better alignment to find conserved segment:
--T—-CC-C-AGT—-TATGT-CAGGGGACACG—A-GCATGCAGA-GAC
| || | || | | | ||| || | | | | |||| |
AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG—T-CAGAT--C
tccCAGTTATGTCAGgggacacgagcatgcagagac
||||||||||||
aattgccgccgtcgttttcagCAGTTATGTCAGatc
The Local Alignment Problem
• Goal: Find the best local alignment between two strings.
• Input : Strings v and w as well as a scoring matrix δ
• Output : Alignment of substrings of v and w whose alignment
score is maximum among all possible alignments of all
possible substrings of v and w.
Local Alignment: How to Solve?
• We have seen that the Global Alignment Problem tries to find
the longest path between vertices (0,0) and (n,m) in the edit
graph.
• The Local Alignment Problem tries to find the longest path
among paths between arbitrary vertices (i,j) and (i’, j’) in the
edit graph.
Local Alignment: How to Solve?
• We have seen that the Global Alignment Problem tries to find
the longest path between vertices (0,0) and (n,m) in the edit
graph.
• The Local Alignment Problem tries to find the longest path
among paths between arbitrary vertices (i,j) and (i’, j’) in the
edit graph.
• Key Point: In the edit graph with negatively-scored edges,
Local Alignment may score higher than Global Alignment.
Global alignment
Local alignment
The Problem with This Setup
• In the grid of size n x n there
are ~n2 vertices (i,j) that may
serve as a source.
The Problem with This Setup
• In the grid of size n x n there
are ~n2 vertices (i,j) that may
serve as a source.
• For each such vertex
computing alignments from
(i,j) to (i’,j’) takes O(n2)
time.
• In the grid of size n x n there
are ~n2 vertices (i,j) that may
serve as a source.
• For each such vertex
computing alignments from
(i,j) to (i’,j’) takes O(n2)
time.
The Problem with This Setup
The Problem with This Setup
• In the grid of size n x n there
are ~n2 vertices (i,j) that may
serve as a source.
• For each such vertex
computing alignments from
(i,j) to (i’,j’) takes O(n2)
time.
• In the grid of size n x n there
are ~n2 vertices (i,j) that may
serve as a source.
• For each such vertex
computing alignments from
(i,j) to (i’,j’) takes O(n2)
time.
The Problem with This Setup
• In the grid of size n x n there
are ~n2 vertices (i,j) that may
serve as a source.
• For each such vertex
computing alignments from
(i,j) to (i’,j’) takes O(n2)
time.
The Problem with This Setup
• In the grid of size n x n there
are ~n2 vertices (i,j) that may
serve as a source.
• For each such vertex
computing alignments from
(i,j) to (i’,j’) takes O(n2)
time.
• This gives an overall runtime
of O(n4), which is a bit too
slow…can we do better?
The Problem with This Setup
• Long run time O(n4):
• This can be remedied by
giving free rides
The Problem with This Setup
Local Alignment Solution: Free Rides
• The solution actually comes from adding edges to the edit
graph.
• The dashed edges represent the
―free rides‖ from (0, 0) to every
other node.
• Each ―free ride‖ is assigned
an edge weight of 0.
• If we start at (0, 0) instead of
(i, j) and maximize the longest
path to (i’, j’), we will obtain
the local alignment.
Yeah, a free ride!
Smith-Waterman Local Alignment Algorithm
• The largest value of si,j over the whole edit graph is the score
of the best local alignment.
• The recurrence:
• Notice that the 0 is the only difference between the global
alignment recurrence…hence our new algorithm is O(n2)!

si, j
 max
0
si 1, j 1
  v i
, w j 
si 1, j
  v i
, w j 
si, j 1
  , w j 









The Local Alignment Recurrence
• The largest value of si,j over the whole edit graph is the score of
the best local alignment.
• The recurrence:
0
si-1,j-1 + δ (vi, wj)
s i-1,j + δ (vi, -)
s i,j-1 + δ (-, wj)
Power of ZERO: there is
only this change from the
original recurrence of a
Global Alignment - since
there is only one ―free ride‖
edge entering into every
vertex
si,j = max
Section 4:
Alignment with Affine Gap
Penalties
Scoring Indels: Naïve Approach
• In our original scoring schema, we assigned a fixed penalty σ
to every indel:
• -σ for 1 indel
• -2σ for 2 consecutive indels
• -3σ for 3 consecutive indels
• Etc.
• However…this schema may be too severe a penalty for a
series of 100 consecutive indels.
Affine Gap Penalties
• In nature, a series of k indels often come as a single event
rather than a series of k single nucleotide events:
• Example:
Normal scoring would
give the same score
for both alignments
More Likely Less Likely
Accounting for Gaps
• Gap: Contiguous sequence of spaces in one of the rows of an
alignment.
• Affine Gap Penalty for a gap of length x is:
-(ρ + σx)
• ρ > 0 is the gap opening penalty: penalty for introducing a
gap.
• σ > 0 is the gap extension penalty: penalty for each indel in
a gap.
• ρ should be large relative to σ, since starting a gap should be
penalized more than extending it.
Affine Gap Penalties
• Gap penalties:
• -ρ – σ when there is 1 indel,
• -ρ – 2σ when there are 2 indels,
• -ρ – 3σ when there are 3 indels,
• -ρ – x·σ when there are x indels.
Affine Gap Penalties and the Edit Graph
• To reflect affine gap penalties,
we have to add ―long‖
horizontal and vertical edges
to the edit graph. Each such
edge of length x should have
weight

   x  
Affine Gap Penalties and the Edit Graph
• To reflect affine gap penalties,
we have to add ―long‖
horizontal and vertical edges
to the edit graph. Each such
edge of length x should have
weight
• There are many such edges!

   x  
Affine Gap Penalties and the Edit Graph
• To reflect affine gap penalties,
we have to add ―long‖
horizontal and vertical edges
to the edit graph. Each such
edge of length x should have
weight
• There are many such edges!
• Adding them to the graph
increases the running time of
alignment by a factor of n to O(n3).

   x  
• The three recurrences for the scoring algorithm creates a 3-
layered graph.
• The main level extends matches and mismatches.
• The lower level creates/extends gaps in sequence v.
• The upper level creates/extends gaps in sequence w.
• A jumping penalty is assigned to moving from the main level
to either the upper level or the lower level (-ρ – σ).
• There is a gap extension penalty for each continuation on a
level other than the main level (- σ).
Affine Gap Penalties and 3 Layer Manhattan Grid
Visualizing Edit Graph: Manhattan in 3 Layers
δ
δ
δ
δ
δ
δ
δ
δ
δ
δ
Visualizing Edit Graph: Manhattan in 3 Layers
ρ
ρ
σ
σ
δ
δ
δ
δ
δ
Visualizing Edit Graph: Manhattan in 3 Layers
The 3-leveled Manhattan Grid
Affine Gap Penalty Recurrences
si, j

 max
si 1, j

  Continue gap in w (deletion)
si 1, j
     Start gap in w (deletion) : from middle




si, j

 max
si, j 1

  Continue gap in v (insertion)
si, j 1
     Start gap in v (insertion) : from middle




si, j
 max
si 1, j 1
  v i
, w j  Match or mismatch
si, j

End deletion : from top
si, j

End insertion : from bottom








Mais conteúdo relacionado

Mais procurados

Needleman-Wunsch Algorithm
Needleman-Wunsch AlgorithmNeedleman-Wunsch Algorithm
Needleman-Wunsch AlgorithmProshantaShil
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-naveed ul mushtaq
 
Pairwise sequence alignment
Pairwise sequence alignmentPairwise sequence alignment
Pairwise sequence alignmentavrilcoghlan
 
Ab Initio Protein Structure Prediction
Ab Initio Protein Structure PredictionAb Initio Protein Structure Prediction
Ab Initio Protein Structure PredictionArindam Ghosh
 
Chou fasman algorithm for protein structure prediction
Chou fasman algorithm for protein structure predictionChou fasman algorithm for protein structure prediction
Chou fasman algorithm for protein structure predictionRoshan Karunarathna
 
Dynamic programming and pairwise sequence alignment
Dynamic programming and pairwise sequence alignmentDynamic programming and pairwise sequence alignment
Dynamic programming and pairwise sequence alignmentGeethanjaliAnilkumar2
 
Flux balance analysis
Flux balance analysisFlux balance analysis
Flux balance analysisJyotiBishlay
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02PILLAI ASWATHY VISWANATH
 
Phylogenetic analysis
Phylogenetic analysis Phylogenetic analysis
Phylogenetic analysis Nitin Naik
 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure predictionSamvartika Majumdar
 
Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)Safa Khalid
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshitaHarshita Bhawsar
 
Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. localbenazeer fathima
 
Protein Folding-biophysical and cellular aspects, protein denaturation
Protein Folding-biophysical and cellular aspects, protein denaturationProtein Folding-biophysical and cellular aspects, protein denaturation
Protein Folding-biophysical and cellular aspects, protein denaturationAnishaMukherjee5
 
multiple sequence alignment
multiple sequence alignmentmultiple sequence alignment
multiple sequence alignmentharshita agarwal
 

Mais procurados (20)

Needleman-Wunsch Algorithm
Needleman-Wunsch AlgorithmNeedleman-Wunsch Algorithm
Needleman-Wunsch Algorithm
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Pairwise sequence alignment
Pairwise sequence alignmentPairwise sequence alignment
Pairwise sequence alignment
 
Ab Initio Protein Structure Prediction
Ab Initio Protein Structure PredictionAb Initio Protein Structure Prediction
Ab Initio Protein Structure Prediction
 
Chou fasman algorithm for protein structure prediction
Chou fasman algorithm for protein structure predictionChou fasman algorithm for protein structure prediction
Chou fasman algorithm for protein structure prediction
 
Dynamic programming and pairwise sequence alignment
Dynamic programming and pairwise sequence alignmentDynamic programming and pairwise sequence alignment
Dynamic programming and pairwise sequence alignment
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Flux balance analysis
Flux balance analysisFlux balance analysis
Flux balance analysis
 
Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02Sequencealignmentinbioinformatics 100204112518-phpapp02
Sequencealignmentinbioinformatics 100204112518-phpapp02
 
Parsimony methods
Parsimony methodsParsimony methods
Parsimony methods
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Phylogenetic analysis
Phylogenetic analysis Phylogenetic analysis
Phylogenetic analysis
 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)Dot matrix Analysis Tools (Bioinformatics)
Dot matrix Analysis Tools (Bioinformatics)
 
Homology Modelling
Homology ModellingHomology Modelling
Homology Modelling
 
Needleman-wunch algorithm harshita
Needleman-wunch algorithm  harshitaNeedleman-wunch algorithm  harshita
Needleman-wunch algorithm harshita
 
Sequence alignment global vs. local
Sequence alignment  global vs. localSequence alignment  global vs. local
Sequence alignment global vs. local
 
Protein Folding-biophysical and cellular aspects, protein denaturation
Protein Folding-biophysical and cellular aspects, protein denaturationProtein Folding-biophysical and cellular aspects, protein denaturation
Protein Folding-biophysical and cellular aspects, protein denaturation
 
multiple sequence alignment
multiple sequence alignmentmultiple sequence alignment
multiple sequence alignment
 

Destaque

B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentRai University
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignmentKubuldinho
 
sequence alignment
sequence alignmentsequence alignment
sequence alignmentammar kareem
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction CS, NcState
 
Global local alignment
Global local alignmentGlobal local alignment
Global local alignmentScott Hamilton
 
Prediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein featuresPrediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein featuresLars Juhl Jensen
 
Sequence alignments complete coverage
Sequence alignments complete coverageSequence alignments complete coverage
Sequence alignments complete coveragePrasanthperceptron
 
Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)Pritom Chaki
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment Parinda Rajapaksha
 
The Smith Waterman algorithm
The Smith Waterman algorithmThe Smith Waterman algorithm
The Smith Waterman algorithmavrilcoghlan
 
Brand Management MBA Project Adam's Youghurt
Brand Management MBA Project Adam's YoughurtBrand Management MBA Project Adam's Youghurt
Brand Management MBA Project Adam's YoughurtHanan Rasool
 
summer internship project report on DAHI CONSUMPTION
summer internship project report on DAHI CONSUMPTIONsummer internship project report on DAHI CONSUMPTION
summer internship project report on DAHI CONSUMPTIONMayank Patel
 
The Needleman Wunsch algorithm
The Needleman Wunsch algorithmThe Needleman Wunsch algorithm
The Needleman Wunsch algorithmavrilcoghlan
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...Elufer Akram
 
RNA secondary structure prediction
RNA secondary structure predictionRNA secondary structure prediction
RNA secondary structure predictionMuhammed sadiq
 

Destaque (20)

B.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignmentB.sc biochem i bobi u 3.1 sequence alignment
B.sc biochem i bobi u 3.1 sequence alignment
 
Introduction to sequence alignment
Introduction to sequence alignmentIntroduction to sequence alignment
Introduction to sequence alignment
 
Global alignment
Global alignmentGlobal alignment
Global alignment
 
sequence alignment
sequence alignmentsequence alignment
sequence alignment
 
Ch06 rna
Ch06 rnaCh06 rna
Ch06 rna
 
Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction Local vs. Global Models for Effort Estimation and Defect Prediction
Local vs. Global Models for Effort Estimation and Defect Prediction
 
Blast
BlastBlast
Blast
 
Global local alignment
Global local alignmentGlobal local alignment
Global local alignment
 
Prediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein featuresPrediction of protein function from sequence derived protein features
Prediction of protein function from sequence derived protein features
 
Bioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-simBioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-sim
 
Sequence alignments complete coverage
Sequence alignments complete coverageSequence alignments complete coverage
Sequence alignments complete coverage
 
Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)Global and local alignment (bioinformatics)
Global and local alignment (bioinformatics)
 
The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment The Needleman-Wunsch Algorithm for Sequence Alignment
The Needleman-Wunsch Algorithm for Sequence Alignment
 
The Smith Waterman algorithm
The Smith Waterman algorithmThe Smith Waterman algorithm
The Smith Waterman algorithm
 
Brand Management MBA Project Adam's Youghurt
Brand Management MBA Project Adam's YoughurtBrand Management MBA Project Adam's Youghurt
Brand Management MBA Project Adam's Youghurt
 
Data retrieval tools
Data retrieval toolsData retrieval tools
Data retrieval tools
 
summer internship project report on DAHI CONSUMPTION
summer internship project report on DAHI CONSUMPTIONsummer internship project report on DAHI CONSUMPTION
summer internship project report on DAHI CONSUMPTION
 
The Needleman Wunsch algorithm
The Needleman Wunsch algorithmThe Needleman Wunsch algorithm
The Needleman Wunsch algorithm
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
 
RNA secondary structure prediction
RNA secondary structure predictionRNA secondary structure prediction
RNA secondary structure prediction
 

Semelhante a Ch06 alignment

lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadfalizain9604
 
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013Prof. Wim Van Criekinge
 
20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07Computer Science Club
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satChenYiHuang5
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastRai University
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastRai University
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfsriaisvariyasundar
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfssusera1eccd
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020NECST Lab @ Politecnico di Milano
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningSungchul Kim
 
351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdf
351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdf351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdf
351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdfYousefDessouki
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxssuser01e301
 

Semelhante a Ch06 alignment (20)

Ch06 multalign
Ch06 multalignCh06 multalign
Ch06 multalign
 
02-alignment.pdf
02-alignment.pdf02-alignment.pdf
02-alignment.pdf
 
lecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadflecture4.ppt Sequence Alignmentaldf sdfsadf
lecture4.ppt Sequence Alignmentaldf sdfsadf
 
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
Bioinformatica t3-scoring matrices-wim_vancriekinge_v2013
 
20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07
 
MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...
MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...
MUMS: Transition & SPUQ Workshop - Gradient-Free Construction of Active Subsp...
 
Paper study: Learning to solve circuit sat
Paper study: Learning to solve circuit satPaper study: Learning to solve circuit sat
Paper study: Learning to solve circuit sat
 
Biological sequences analysis
Biological sequences analysisBiological sequences analysis
Biological sequences analysis
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blast
 
B.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blastB.sc biochem i bobi u 3.2 algorithm + blast
B.sc biochem i bobi u 3.2 algorithm + blast
 
Blast fasta 4
Blast fasta 4Blast fasta 4
Blast fasta 4
 
BLAST and sequence alignment
BLAST and sequence alignmentBLAST and sequence alignment
BLAST and sequence alignment
 
Sequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdfSequence-analysis-pairwise-alignment.pdf
Sequence-analysis-pairwise-alignment.pdf
 
Dot matrix seminar
Dot matrix seminarDot matrix seminar
Dot matrix seminar
 
PCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdfPCB_Lect02_Pairwise_allign (1).pdf
PCB_Lect02_Pairwise_allign (1).pdf
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
 
PR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation LearningPR-305: Exploring Simple Siamese Representation Learning
PR-305: Exploring Simple Siamese Representation Learning
 
351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdf
351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdf351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdf
351_27435_EE411_2015_1__1_1_0 3 EE411 Lec6,7 compensation RL.pdf
 
Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014Bioinformatica t3-scoringmatrices v2014
Bioinformatica t3-scoringmatrices v2014
 
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
 

Mais de BioinformaticsInstitute

Comparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsComparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsBioinformaticsInstitute
 
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес... Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...BioinformaticsInstitute
 
Вперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкВперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкBioinformaticsInstitute
 
"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр ПредеусBioinformaticsInstitute
 
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...BioinformaticsInstitute
 
Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)BioinformaticsInstitute
 
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...BioinformaticsInstitute
 
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)BioinformaticsInstitute
 

Mais de BioinformaticsInstitute (20)

Graph genome
Graph genome Graph genome
Graph genome
 
Nanopores sequencing
Nanopores sequencingNanopores sequencing
Nanopores sequencing
 
A superglue for string comparison
A superglue for string comparisonA superglue for string comparison
A superglue for string comparison
 
Comparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphsComparative Genomics and de Bruijn graphs
Comparative Genomics and de Bruijn graphs
 
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес... Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
Биоинформатический анализ данных полноэкзомного секвенирования: анализ качес...
 
Вперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днкВперед в прошлое. Методы генетической диагностики древней днк
Вперед в прошлое. Методы генетической диагностики древней днк
 
Knime & bioinformatics
Knime & bioinformaticsKnime & bioinformatics
Knime & bioinformatics
 
"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус"Зачем биологам суперкомпьютеры", Александр Предеус
"Зачем биологам суперкомпьютеры", Александр Предеус
 
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
Иммунотерапия раковых опухолей: взгляд со стороны системной биологии. Максим ...
 
Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)Рак 101 (Мария Шутова, ИоГЕН РАН)
Рак 101 (Мария Шутова, ИоГЕН РАН)
 
Плюрипотентность 101
Плюрипотентность 101Плюрипотентность 101
Плюрипотентность 101
 
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
Секвенирование как инструмент исследования сложных фенотипов человека: от ген...
 
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
Инвестиции в биоинформатику и биотех (Андрей Афанасьев)
 
Biodb 2011-everything
Biodb 2011-everythingBiodb 2011-everything
Biodb 2011-everything
 
Biodb 2011-05
Biodb 2011-05Biodb 2011-05
Biodb 2011-05
 
Biodb 2011-04
Biodb 2011-04Biodb 2011-04
Biodb 2011-04
 
Biodb 2011-03
Biodb 2011-03Biodb 2011-03
Biodb 2011-03
 
Biodb 2011-01
Biodb 2011-01Biodb 2011-01
Biodb 2011-01
 
Biodb 2011-02
Biodb 2011-02Biodb 2011-02
Biodb 2011-02
 
Ngs 3 1
Ngs 3 1Ngs 3 1
Ngs 3 1
 

Último

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Último (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Ch06 alignment

  • 2. Outline 1. Global Alignment 2. Scoring Matrices 3. Local Alignment 4. Alignment with Affine Gap Penalties
  • 4. From LCS to Alignment: Change the Scoring • Recall: The Longest Common Subsequence (LCS) problem allows only insertions and deletions (no mismatches). • In the LCS Problem, we scored 1 for matches and 0 for indels, so our alignment score was simply equal to the total number of matches. • Let’s consider penalizing mismatches and indels instead.
  • 5. • Simplest scoring schema: For some positive numbers μ and σ: • Match Premium: +1 • Mismatch Penalty: –μ • Indel Penalty: –σ • Under these assumptions, the alignment score becomes as follows: Score = #matches – μ(#mismatches) – σ(#indels) • Our specific choice of µ and σ depends on how we wish to penalize mismatches and indels. From LCS to Alignment: Change the Scoring
  • 6. The Global Alignment Problem • Input : Strings v and w and a scoring schema • Output : An alignment with maximum score • We can use dynamic programming to solve the Global Alignment Problem: m : mismatch penalty σ : indel penalty if vi = wj si, j  max si 1, j 1  1 si 1, j 1  m si 1, j   si, j 1          if vi ≠ wj
  • 8. Scoring Matrices • To further generalize the scoring of alignments, consider a (4+1) x (4+1) scoring matrix δ. • The purpose of the scoring matrix is to score one nucleotide against another, e.g. A matched to G may be ―worse‖ than C matched to T. • The addition of 1 is to include the score for comparison of a gap character ―-‖. • This will simplify the algorithm to the dynamic formula at right: Note: For amino acid sequence comparison, we need a (20 + 1) x (20 + 1) matrix. si, j  max si 1, j 1   v i , w j  si 1, j   v i ,   si, j 1   , w j        
  • 9. Scoring Matrices: Example A G T C — A 1 -0.8 -0.2 -2.3 -0.6 G -0.8 1 -1.1 -0.7 -1.5 T -0.2 -1.1 1 -0.5 -0.9 C -2.3 -0.7 -0.5 1 -1 — -0.6 -1.5 -0.9 -1 n/a A GTC A CGTTGG Score: –0.6 – 1 + 1 + 1 – 0.5 – 1.5 – 0.8 = –2.4 • Say we want to align AGTCA and CGTTGG with the following scoring matrix: Sample Alignment:
  • 10. How Do We Make a Scoring Matrix? • Scoring matrices are created based on biological evidence. • Alignments can be thought of as two sequences that differ due to mutations. • Some of these mutations have little effect on the protein’s function, therefore some penalties, δ(vi , wj ), will be less harsh than others. • This explains why we would want to have a scoring matrix to begin with.
  • 11. Scoring Matrix: Positive Mismatches A R N K A 5 -2 -1 -1 R -2 7 -1 3 N -1 -1 7 0 K -1 3 0 6 • Notice that although R and K are different amino acids, they have a positive mismatch score. • Why? They are both positively charged amino acids  this mismatch will not greatly change the function of the protein.
  • 12. Scoring Matrix: Positive Mismatches A R N K A 5 -2 -1 -1 R -2 7 -1 3 N -1 -1 7 0 K -1 3 0 6 • Notice that although R and K are different amino acids, they have a positive mismatch score. • Why? They are both positively charged amino acids  this mismatch will not greatly change the function of the protein.
  • 13. Scoring Matrix: Positive Mismatches A R N K A 5 -2 -1 -1 R -2 7 -1 3 N -1 -1 7 0 K -1 3 0 6 • Notice that although R and K are different amino acids, they have a positive mismatch score. • Why? They are both positively charged amino acids  this mismatch will not greatly change the function of the protein.
  • 14. Mismatches with Low Penalties • Amino acid changes that tend to preserve the physicochemical properties of the original residue: • Polar to Polar • Aspartate to Glutamate • Nonpolar to Nonpolar • Alanine to Valine • Similarly-behaving residues • Leucine to Isoleucine
  • 15. Scoring Matrices: Amino Acid vs. DNA • Two commonly used amino acid substitution matrices: 1. PAM 2. BLOSUM • DNA substitution matrices: • DNA is less conserved than protein sequences • It is therefore less effective to compare coding regions at the nucleotide level • Furthermore, the particular scoring matrix is less important.
  • 16. PAM • PAM: Stands for Point Accepted Mutation • 1 PAM = PAM1 = 1% average change of all amino acid positions. • Note: This doesn’t mean that after 100 PAMs of evolution, every residue will have changed: • Some residues may have mutated several times. • Some residues may have returned to their original state. • Some residues may not changed at all.
  • 17. PAMX • PAMx = PAM1 x (x iterations of PAM1) • Example: PAM250 = PAM1 250 • PAM250 is a widely used scoring matrix: Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys ... A R N D C Q E G H I L K ... Ala A 13 6 9 9 5 8 9 12 6 8 6 7 ... Arg R 3 17 4 3 2 5 3 2 6 3 2 9 Asn N 4 4 6 7 2 5 6 4 6 3 2 5 Asp D 5 4 8 11 1 7 10 5 6 3 2 5 Cys C 2 1 1 1 52 1 1 2 2 2 1 1 Gln Q 3 5 5 6 1 10 7 3 7 2 3 5 ... Trp W 0 2 0 0 0 0 0 0 1 0 1 0 Tyr Y 1 1 2 1 3 1 1 1 3 2 2 1 Val V 7 4 4 4 4 4 4 4 5 4 15 10
  • 18. BLOSUM • BLOSUM: Stands for Blocks Substitution Matrix • Scores are derived from observations of the frequencies of substitutions in blocks of local alignments in related proteins. • BLOSUM62 was created using sequences sharing no more than 62% identity. • A sample of BLOSUM62 is shown at right. C S T P … F Y W C 9 -1 -1 3 … -2 -2 -2 S -1 4 1 -1 … -2 -2 -3 T -1 1 4 1 … -2 -2 -3 P 3 -1 1 7 … -4 -3 -4 … … … … … … … … … F -2 -2 -2 -4 … 6 3 1 Y -2 -2 -2 -3 … 3 7 2 W -2 -3 -3 -4 … 1 2 11 http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm
  • 19. The Blosum50 Scoring Matrix Most entries are scored negatively
  • 20. Differences between PAM and BLOSUM 1. PAM matrices are based on an explicit evolutionary model (i.e. replacements are counted on the branches of a phylogenetic tree), BLOSUM matrices are based on an implicit model of evolution. 2. The PAM matrices are based on mutations observed throughout a global alignment, this includes both highly conserved and highly mutable regions. The BLOSUM matrices are based only on highly conserved regions in series of alignments forbidden to contain gaps. 3. The method used to count the replacements is different: unlike the PAM matrix, the BLOSUM procedure uses groups of sequences within which not all mutations are counted the same. 4. Higher numbers in the PAM matrix naming scheme denote larger evolutionary distance, while larger numbers in the BLOSUM matrix naming scheme denote higher sequence similarity and therefore smaller evolutionary distance.
  • 22. Local Alignment: Why? • Two genes in different species may be similar over short conserved regions and dissimilar over remaining regions. • Example: Homeobox genes have a short region called the homeodomain that is highly conserved among species. • A global alignment would not find the homeodomain because it would try to align the entire sequence. • Therefore, we search for an alignment which has a positive score locally, meaning that an alignment on substrings of the given sequences has a positive score.
  • 23. Local Alignment: Illustration Global alignment Compute a “mini” Global Alignment to get Local Alignment
  • 24. Local vs. Global Alignment: Example • Global Alignment: • Local Alignment—better alignment to find conserved segment: --T—-CC-C-AGT—-TATGT-CAGGGGACACG—A-GCATGCAGA-GAC | || | || | | | ||| || | | | | |||| | AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG—T-CAGAT--C tccCAGTTATGTCAGgggacacgagcatgcagagac |||||||||||| aattgccgccgtcgttttcagCAGTTATGTCAGatc
  • 25. The Local Alignment Problem • Goal: Find the best local alignment between two strings. • Input : Strings v and w as well as a scoring matrix δ • Output : Alignment of substrings of v and w whose alignment score is maximum among all possible alignments of all possible substrings of v and w.
  • 26. Local Alignment: How to Solve? • We have seen that the Global Alignment Problem tries to find the longest path between vertices (0,0) and (n,m) in the edit graph. • The Local Alignment Problem tries to find the longest path among paths between arbitrary vertices (i,j) and (i’, j’) in the edit graph.
  • 27. Local Alignment: How to Solve? • We have seen that the Global Alignment Problem tries to find the longest path between vertices (0,0) and (n,m) in the edit graph. • The Local Alignment Problem tries to find the longest path among paths between arbitrary vertices (i,j) and (i’, j’) in the edit graph. • Key Point: In the edit graph with negatively-scored edges, Local Alignment may score higher than Global Alignment.
  • 28. Global alignment Local alignment The Problem with This Setup • In the grid of size n x n there are ~n2 vertices (i,j) that may serve as a source.
  • 29. The Problem with This Setup • In the grid of size n x n there are ~n2 vertices (i,j) that may serve as a source. • For each such vertex computing alignments from (i,j) to (i’,j’) takes O(n2) time.
  • 30. • In the grid of size n x n there are ~n2 vertices (i,j) that may serve as a source. • For each such vertex computing alignments from (i,j) to (i’,j’) takes O(n2) time. The Problem with This Setup
  • 31. The Problem with This Setup • In the grid of size n x n there are ~n2 vertices (i,j) that may serve as a source. • For each such vertex computing alignments from (i,j) to (i’,j’) takes O(n2) time.
  • 32. • In the grid of size n x n there are ~n2 vertices (i,j) that may serve as a source. • For each such vertex computing alignments from (i,j) to (i’,j’) takes O(n2) time. The Problem with This Setup
  • 33. • In the grid of size n x n there are ~n2 vertices (i,j) that may serve as a source. • For each such vertex computing alignments from (i,j) to (i’,j’) takes O(n2) time. The Problem with This Setup
  • 34. • In the grid of size n x n there are ~n2 vertices (i,j) that may serve as a source. • For each such vertex computing alignments from (i,j) to (i’,j’) takes O(n2) time. • This gives an overall runtime of O(n4), which is a bit too slow…can we do better? The Problem with This Setup
  • 35. • Long run time O(n4): • This can be remedied by giving free rides The Problem with This Setup
  • 36. Local Alignment Solution: Free Rides • The solution actually comes from adding edges to the edit graph. • The dashed edges represent the ―free rides‖ from (0, 0) to every other node. • Each ―free ride‖ is assigned an edge weight of 0. • If we start at (0, 0) instead of (i, j) and maximize the longest path to (i’, j’), we will obtain the local alignment. Yeah, a free ride!
  • 37. Smith-Waterman Local Alignment Algorithm • The largest value of si,j over the whole edit graph is the score of the best local alignment. • The recurrence: • Notice that the 0 is the only difference between the global alignment recurrence…hence our new algorithm is O(n2)!  si, j  max 0 si 1, j 1   v i , w j  si 1, j   v i , w j  si, j 1   , w j          
  • 38. The Local Alignment Recurrence • The largest value of si,j over the whole edit graph is the score of the best local alignment. • The recurrence: 0 si-1,j-1 + δ (vi, wj) s i-1,j + δ (vi, -) s i,j-1 + δ (-, wj) Power of ZERO: there is only this change from the original recurrence of a Global Alignment - since there is only one ―free ride‖ edge entering into every vertex si,j = max
  • 39. Section 4: Alignment with Affine Gap Penalties
  • 40. Scoring Indels: Naïve Approach • In our original scoring schema, we assigned a fixed penalty σ to every indel: • -σ for 1 indel • -2σ for 2 consecutive indels • -3σ for 3 consecutive indels • Etc. • However…this schema may be too severe a penalty for a series of 100 consecutive indels.
  • 41. Affine Gap Penalties • In nature, a series of k indels often come as a single event rather than a series of k single nucleotide events: • Example: Normal scoring would give the same score for both alignments More Likely Less Likely
  • 42. Accounting for Gaps • Gap: Contiguous sequence of spaces in one of the rows of an alignment. • Affine Gap Penalty for a gap of length x is: -(ρ + σx) • ρ > 0 is the gap opening penalty: penalty for introducing a gap. • σ > 0 is the gap extension penalty: penalty for each indel in a gap. • ρ should be large relative to σ, since starting a gap should be penalized more than extending it.
  • 43. Affine Gap Penalties • Gap penalties: • -ρ – σ when there is 1 indel, • -ρ – 2σ when there are 2 indels, • -ρ – 3σ when there are 3 indels, • -ρ – x·σ when there are x indels.
  • 44. Affine Gap Penalties and the Edit Graph • To reflect affine gap penalties, we have to add ―long‖ horizontal and vertical edges to the edit graph. Each such edge of length x should have weight     x  
  • 45. Affine Gap Penalties and the Edit Graph • To reflect affine gap penalties, we have to add ―long‖ horizontal and vertical edges to the edit graph. Each such edge of length x should have weight • There are many such edges!     x  
  • 46. Affine Gap Penalties and the Edit Graph • To reflect affine gap penalties, we have to add ―long‖ horizontal and vertical edges to the edit graph. Each such edge of length x should have weight • There are many such edges! • Adding them to the graph increases the running time of alignment by a factor of n to O(n3).     x  
  • 47. • The three recurrences for the scoring algorithm creates a 3- layered graph. • The main level extends matches and mismatches. • The lower level creates/extends gaps in sequence v. • The upper level creates/extends gaps in sequence w. • A jumping penalty is assigned to moving from the main level to either the upper level or the lower level (-ρ – σ). • There is a gap extension penalty for each continuation on a level other than the main level (- σ). Affine Gap Penalties and 3 Layer Manhattan Grid
  • 48. Visualizing Edit Graph: Manhattan in 3 Layers δ δ δ δ δ
  • 49. δ δ δ δ δ Visualizing Edit Graph: Manhattan in 3 Layers
  • 52. Affine Gap Penalty Recurrences si, j   max si 1, j    Continue gap in w (deletion) si 1, j      Start gap in w (deletion) : from middle     si, j   max si, j 1    Continue gap in v (insertion) si, j 1      Start gap in v (insertion) : from middle     si, j  max si 1, j 1   v i , w j  Match or mismatch si, j  End deletion : from top si, j  End insertion : from bottom       