Saul B. Needleman & Christian D. Wunsch (1969)
KAVINDRI DILSHANI
H.M.K.G BANDARA
PARINDA RAJAPAKSHE
ABOUT RESEARCH
Title
• “A general method applicable to the search for similarities
in the amino acid sequence of two proteins”
Authors
• Saul B. Needleman & Christian D. Wunsch, Department of Biochemistry,
Northwestern University & Nuclear Medicine Service, V.A. Research
Hospital, Chicago, USA. (1969) [Cited by 8474]
S.B. Needleman & C.D. Wunsch, “A general method applicable to the search for similarities in
the amino acid sequence of two proteins”, J. Mol. Biol. (1970) 48, 443-453.
OUTLINE
• Introduction
- Sequence Alignment
- Approaches
- Needleman-Wunsch Algorithm vs. Dynamic Programming
• Example
- Optimal Alignment Score
- Optimal Alignment
- Algorithm Cost
• Applications
- Results & Discussion
- Methodology
- Usefulness
INTRODUCTION
SEQUENCE ALIGNMENT
• Sequence alignment is a way of arranging two or more
sequences of characters to identify regions of similarity.
• Identification of residue-residue correspondences
• Sequence : Can be taken as ordered strings of letters.
• Sequences in Bio-Informatics ?
• DNA sequences
• RNA sequences
• Protein sequences
MOTIVATION
• Find homologous proteins
– Allows prediction of structure and function
• Locate similar subsequences in DNA
– e.g. allows identification of regulatory elements
– Infer biological similarities
• Locate DNA sequences that might overlap
– Helps in sequence assembly
SEQUENCE ALIGNMENT - RESULTS
• Input: Two sequences over the same alphabet.
- GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA
• Output: An alignment of the two sequences.
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Annotations on the alignment: insertions / deletions (indels), perfect matches, mismatches
APPROACHES
Sequence Alignment
• Qualitative
- Dot-plot
• Quantitative
- Global
- Local
- Multiple
QUALITATIVE
• Dot-plot
- Pictorial representation of the relationship between two sequences
- Uses a table or a matrix
- Does not quantify the similarity
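A dot-plot can be sketched in a few lines. This is a minimal illustration, assuming Python and a hypothetical helper name `dot_plot`; it marks a cell with `*` wherever the two residues are identical and does not attempt any windowing or filtering.

```python
def dot_plot(s1: str, s2: str) -> list[str]:
    """Return one text row per residue of s2; '*' marks identical residues."""
    return ["".join("*" if a == b else "." for a in s1) for b in s2]


if __name__ == "__main__":
    # Sequences taken from the alignment example in the slides.
    s1, s2 = "SIMILARITY", "PILLAR"
    print("  " + s1)
    for residue, row in zip(s2, dot_plot(s1, s2)):
        print(residue + " " + row)
```

Diagonal runs of `*` suggest regions of similarity, but the plot itself gives no numeric score.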
QUANTITATIVE
• Construction of the best alignment between
the sequences.
• Assessment of the similarity from the
alignment (quantified numerically).
GLOBAL SEQUENCE ALIGNMENT
• The best alignment over the entire length of two
sequences.
• Suitable when: the two sequences are of similar length,
with a significant degree of similarity throughout.
LOCAL SEQUENCE ALIGNMENT
• Compares short portions of sequence or a whole
library of sequences with short portions of
another.
• Suitable when: comparing substantially different
sequences, which possibly differ significantly in
length and have only short patches of similarity.
MULTIPLE SEQUENCE ALIGNMENT
• Simultaneous alignment of more than two
sequences.
• Suitable when: searching for subtle
conserved sequence patterns in a protein family,
and more than two sequences of the protein
family are available.
EXAMPLE
S1 : SIMILARITY S2 : PILLAR S3 : MOLARITY
Global (S1, S2):
SIMILARITY
PI-LLAR---
Local (S1, S2):
MILAR
ILLAR
Multiple (S1, S2, S3):
SIMILARITY
PI-LLAR---
--MOLARITY
HOW TO QUANTIFY ?
• Introduce a scoring scheme
• A set of rules that assigns an alignment score to
any given alignment of two sequences.
• Alignment score : goodness of the alignment
• Scoring scheme
- Substitution scores
- Gap penalties
THE SUBSTITUTION MATRIX
• Simple scoring scheme for residue substitution
• The residue substitution costs can be expressed with
an N x N matrix (N is 4 for DNA and 20 for proteins).
C T A G
C 1 -1 -1 -1
T -1 1 -1 -1
A -1 -1 1 -1
G -1 -1 -1 1
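The 4 x 4 matrix above can be held as a simple lookup table. This is a minimal sketch, assuming Python; the names `ALPHABET`, `SUBSTITUTION` and `sigma` are illustrative and not from the slides.

```python
# Identity-style DNA substitution matrix: +1 on the diagonal, -1 elsewhere.
ALPHABET = "CTAG"

SUBSTITUTION = {
    (a, b): (1 if a == b else -1)
    for a in ALPHABET
    for b in ALPHABET
}


def sigma(a: str, b: str) -> int:
    """Look up the substitution score for aligning residue a with residue b."""
    return SUBSTITUTION[(a, b)]


if __name__ == "__main__":
    print(" ", *ALPHABET)
    for a in ALPHABET:
        print(a, *(sigma(a, b) for b in ALPHABET))
```

A protein matrix such as BLOSUM62 would use the same lookup shape with 20 x 20 entries and non-uniform values.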
EXAMPLE
• Consider the "best" alignment of ATGGCGT and
ATGAGT
• +1 as a reward for a match, -1 as the penalty for a mismatch,
and ignore gaps (score 0)
ATGGCGT
ATG_AGT
Score: +1 + 1 + 1 + 0 - 1 + 1 + 1 = 4
Alternative alignment
ATGGCGT
A_TGAGT
Score: +1 + 0 - 1 + 1 - 1 + 1 + 1 = 2
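The two scores in this example can be checked with a short function. This is a sketch under the slide's assumptions (+1 match, -1 mismatch, gaps ignored, `_` as the gap symbol); the function name `score_alignment` is hypothetical.

```python
def score_alignment(top: str, bottom: str,
                    match: int = 1, mismatch: int = -1, gap: int = 0) -> int:
    """Sum the column scores of an already-aligned pair of strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "_" or b == "_":
            total += gap        # gaps are ignored in this example
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    print(score_alignment("ATGGCGT", "ATG_AGT"))  # 4
    print(score_alignment("ATGGCGT", "A_TGAGT"))  # 2
```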
BETTER MATRIX
• Certain changes in DNA /Protein sequences are more
likely to occur naturally than the others.
• Proteins are composed of twenty amino acids, and
physico-chemical properties of individual amino acids
vary considerably.
• Important to incorporate evolutionary relationships
for this substitution schema.
EVOLUTIONARY SUBSTITUTION MATRIX
• PAM ("point accepted mutation") family
- PAM250, PAM120, etc.
• BLOSUM ("Blocks substitution matrix") family
- BLOSUM62, BLOSUM50, etc.
• Derived from the analysis of known alignments of
closely related proteins
• Assigns variable weights to different substitution
operations.
BLOSUM 62
GAPS
• A gap is a consecutive run of spaces in an
alignment and may be introduced in either sequence
(it represents the insertion or deletion of one or more residues).
• Objective: an optimal sequence alignment that is
still biologically meaningful.
• Is it good?
– A gap interrupts the entire polymer chain
– In DNA it shifts the reading frame
– Gaps are therefore penalized
GAP PENALTIES
• Constant: whatever its size, a gap receives the same
negative penalty: -g
• Linear: the penalty depends linearly on the size of the gap.
Parameter: -g is the penalty per unit length of the gap.
• Affine: gap introduction (opening) cost > gap extension cost.
For a gap of length L with opening cost o and extension cost e,
g = o + (L-1)e, with |e| < |o|.
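The three gap models can be compared side by side. This is a minimal sketch, assuming Python; the function names and the default parameter values (g = 2, o = 5, e = 1) are illustrative choices, not values from the slides. Each function returns the (negative) score contribution of one gap of the given length.

```python
def constant_gap(length: int, g: int = 2) -> int:
    """Same penalty -g for any gap, regardless of its length."""
    return -g if length > 0 else 0


def linear_gap(length: int, g: int = 2) -> int:
    """Penalty -g per gapped position."""
    return -g * length


def affine_gap(length: int, o: int = 5, e: int = 1) -> int:
    """Opening cost o for the first position, extension cost e for each further one."""
    if length <= 0:
        return 0
    return -(o + (length - 1) * e)


if __name__ == "__main__":
    for length in (1, 2, 3, 10):
        print(length, constant_gap(length), linear_gap(length), affine_gap(length))
```

With an affine penalty, one long gap costs less than several short gaps of the same total length, which is the usual reason for choosing it.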
ENRICHED SCORING SCHEME
• A scoring scheme provides a quantitative
measure of how good an alignment is relative to
alternative alignments.
• Does this scoring scheme tell us how to find the
best alignment?
BASIC APPROACH
• Brute-force approach
- Generate the list of all possible alignments between two
sequences and score them
- Select the alignment with the best score
– The number of possible global alignments between
two sequences of length N is approximately
$$\frac{2^{2N}}{\sqrt{\pi N}}$$
- For two sequences of 250 residues this is ~ $10^{149}$
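The size of this search space can be checked numerically. This sketch assumes that the count being approximated is the binomial coefficient C(2N, N), which is the expression the Performance slide uses (2nCn), and compares it with the estimate 2^(2N) / sqrt(πN). The function names are hypothetical.

```python
import math


def exact_count(n: int) -> int:
    """Binomial coefficient C(2n, n), computed exactly with integers."""
    return math.comb(2 * n, n)


def log10_estimate(n: int) -> float:
    """log10 of 2^(2n) / sqrt(pi * n), computed in log space to avoid overflow."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)


if __name__ == "__main__":
    n = 250
    digits = len(str(exact_count(n)))
    print(f"C({2 * n}, {n}) has {digits} decimal digits")
    print(f"estimate: about 10^{log10_estimate(n):.2f}")
```

Both figures agree with the order of magnitude quoted on the slide for 250 residues.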
NEEDLEMAN-WUNSCH ALGORITHM
• Reduces the massive number of possibilities that
need to be considered, yet still guarantees that the
best solution will be found.
• Global sequence alignment technique.
• Builds up the best alignment by using optimal
alignments of smaller subsequences.
• Uses dynamic programming
DYNAMIC PROGRAMMING
• Dynamic Programming is an algorithmic
paradigm.
- Breaking problem into sub problems
- Stores results of sub problems
- Avoids computing the same results again.
• Main properties of the problem
- Overlapping Sub problems
- Optimal substructure
OVERLAPPING SUB PROBLEMS
• Segregate main problem into sub-problems.
• Mainly used when solutions of same sub problems are
needed again and again.
• Computed solutions to sub problems are stored in a
table/matrix so that they do not have to be recomputed.
• Dynamic Programming is not useful when there are no
common (overlapping) sub problems.
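Overlapping sub problems can be made visible with a memoized recursion. The sketch below is a top-down variant, not the table-filling form used in the worked example; it assumes Python's `functools.lru_cache` as the "table" and uses the sequences and scores from the worked example in this deck (match +1, mismatch -1, gap -2). The name `nw_score_recursive` is hypothetical.

```python
from functools import lru_cache


def nw_score_recursive(s1: str, s2: str,
                       match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (optimal global score, number of distinct sub problems solved)."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # best(i, j): optimal score for aligning s1[:i] with s2[:j]
        if i == 0:
            return j * gap
        if j == 0:
            return i * gap
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        return max(best(i - 1, j - 1) + sub,
                   best(i - 1, j) + gap,
                   best(i, j - 1) + gap)

    score = best(len(s1), len(s2))
    return score, best.cache_info().currsize


if __name__ == "__main__":
    score, solved = nw_score_recursive("TGGTG", "ATCGT")
    print(score, solved)  # -2 36
```

Only (5 + 1) x (5 + 1) = 36 distinct sub problems exist for two sequences of length 5, even though the unmemoized recursion would revisit them many times.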
OPTIMAL SUBSTRUCTURE
• A problem has optimal substructure if an optimal solution
can be constructed efficiently from optimal solutions of its sub problems.
• Optimal global solution contains the optimal
solutions of all its sub problems.
• Dynamic Programming is not useful when the problem
has no optimal substructure.
HOW IT WORKS?
• Governed by three steps
- Break the problem into smaller sub problems.
- Solve the smaller problems optimally
- Use the sub-problem solutions to construct an
optimal solution for the original problem
• The Needleman-Wunsch algorithm applies the
dynamic programming paradigm to obtain the optimal global
alignment and the corresponding score.
WORKOUT
Definitions
• A scoring function (σ)
defines the score to give to a match or a substitution (mismatch)
e.g. +1 for a match, -1 for a mismatch
• A gap penalty
defines the score to give to an insertion or deletion
e.g. -1
• A recurrence relation
defines what actions we repeat at each iteration (step) of the
algorithm
$$T(i,j) = \max \begin{cases} T(i-1,\,j-1) + \sigma(S_1(i), S_2(j)) \\ T(i-1,\,j) + \text{gap penalty} \\ T(i,\,j-1) + \text{gap penalty} \end{cases}$$
Steps
• Step 1
– Fill up a matrix (table) T using the recurrence
relation
• Step 2
– The trace back step uses the filled-in matrix T to
work out the best alignment
Work Out
• Sequences
S1= TGGTG
S2= ATCGT
• Scoring function
For matches : +1
For mismatches : -1
Substitution Matrix
A C G T
A +1 -1 -1 -1
C -1 +1 -1 -1
G -1 -1 +1 -1
T -1 -1 -1 +1
Work Out cont..
• Initializing the table: S1 = TGGTG labels the columns (i = 1 to 5) and S2 = ATCGT labels the rows (j = 1 to 5), with an extra leading column (i = 0) and row (j = 0).
• The table is filled left to right, top to bottom.
Step 1 - The value of T(0,0) is set to zero at the start
Work Out cont..
T(0,0) = 0; every other cell is filled with the recurrence
$$T(i,j) = \max \begin{cases} T(i-1,\,j-1) + \sigma(S_1(i), S_2(j)) & \text{(previous column \& previous row)} \\ T(i-1,\,j) + \text{gap penalty} & \text{(previous column \& same row)} \\ T(i,\,j-1) + \text{gap penalty} & \text{(same column \& previous row)} \end{cases}$$
Gap penalty = -2
A C G T
A +1 -1 -1 -1
C -1 +1 -1 -1
G -1 -1 +1 -1
T -1 -1 -1 +1
Work Out cont..
Rows j = 0 and j = 1 filled in:
        i=0  i=1  i=2  i=3  i=4  i=5
               T    G    G    T    G
j=0       0   -2   -4   -6   -8  -10
j=1  A   -2   -1   -3   -5   -7   -9
j=2  T
j=3  C
j=4  G
j=5  T
Gap penalty = -2
Work Out cont..
Completed table:
        i=0  i=1  i=2  i=3  i=4  i=5
               T    G    G    T    G
j=0       0   -2   -4   -6   -8  -10
j=1  A   -2   -1   -3   -5   -7   -9
j=2  T   -4   -1   -2   -4   -4   -6
j=3  C   -6   -3   -2   -3   -5   -5
j=4  G   -8   -5   -2   -1   -3   -4
j=5  T  -10   -7   -4   -3    0   -2
Gap penalty = -2
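The table-filling step for this example can be sketched as follows. This is a minimal illustration, assuming Python and a hypothetical helper name `fill_matrix`; as in the slides, columns are indexed by i (S1) and rows by j (S2), so the cell T(i, j) is stored as `T[j][i]`.

```python
def fill_matrix(s1: str, s2: str,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> list[list[int]]:
    """Fill the Needleman-Wunsch score table row by row, left to right."""
    rows, cols = len(s2) + 1, len(s1) + 1
    T = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):            # first row: only gaps in S2
        T[0][i] = i * gap
    for j in range(1, rows):            # first column: only gaps in S1
        T[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            T[j][i] = max(T[j - 1][i - 1] + sub,   # previous column & previous row
                          T[j][i - 1] + gap,       # previous column & same row
                          T[j - 1][i] + gap)       # same column & previous row
    return T


if __name__ == "__main__":
    for row in fill_matrix("TGGTG", "ATCGT"):
        print(" ".join(f"{value:4d}" for value in row))
```

The bottom-right cell holds the optimal global alignment score, here -2.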
Work Out cont..
Trace Back
• Start at the bottom-right cell T(5,5) = -2 and, at each step, move to the neighbouring cell whose value produced the current one, until T(0,0) is reached.
• Path (i, j): (5,5) → (4,5) → (3,4) → (2,3) → (1,2) → (0,1) → (0,0)
S1= TGGTG
S2= ATCGT
Optimal alignment:
S1: - T G G T G
      |   | |
S2: A T C G T -
→ Score = 3 matches - 1 mismatch - 2 gaps = 3-1-4 = -2
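The fill and trace back steps can be combined into one short function. This is a sketch, assuming Python and a hypothetical function name `needleman_wunsch`; when several moves are possible it prefers the diagonal, then a gap in S2, then a gap in S1, which is an arbitrary tie-breaking choice and returns only one optimal alignment.

```python
def needleman_wunsch(s1: str, s2: str,
                     match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned S1, aligned S2) for one optimal global alignment."""
    rows, cols = len(s2) + 1, len(s1) + 1
    T = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        T[0][i] = i * gap
    for j in range(1, rows):
        T[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            T[j][i] = max(T[j - 1][i - 1] + sub, T[j][i - 1] + gap, T[j - 1][i] + gap)

    # Trace back from the bottom-right cell to T(0, 0).
    i, j = len(s1), len(s2)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if T[j][i] == T[j - 1][i - 1] + sub:       # diagonal: residue pair
                top.append(s1[i - 1])
                bottom.append(s2[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and T[j][i] == T[j][i - 1] + gap:     # left: gap in S2
            top.append(s1[i - 1])
            bottom.append("-")
            i -= 1
        else:                                          # up: gap in S1
            top.append("-")
            bottom.append(s2[j - 1])
            j -= 1
    return T[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, a1, a2 = needleman_wunsch("TGGTG", "ATCGT")
    print(score)  # -2
    print(a1)     # -TGGTG
    print(a2)     # ATCGT-
```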
Work Out cont..
Two possible trace backs ?
match: +1, mismatch: -1, gap: -2
          W    H    A    T
     0   -2   -4   -6   -8
W   -2    1   -1   -3   -5
H   -4   -1    2    0   -2
Y   -6   -3    0    1   -1
The bottom-right cell (-1) can be reached either diagonally (T/Y mismatch) or from the left (gap), so two alignments share the optimal score of -1.
Pink traceback:
WHAT
||
WH-Y
Orange traceback:
WHAT
||
WHY-
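When several moves explain the value of a cell, every one of them leads to an optimal alignment. The sketch below enumerates all of them by following every valid move during trace back. It assumes Python; the function name `all_optimal_alignments` is hypothetical, and the recursion is only practical for short sequences because the number of co-optimal alignments can grow quickly.

```python
def all_optimal_alignments(s1: str, s2: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return a sorted list of every optimal (aligned S1, aligned S2) pair."""
    rows, cols = len(s2) + 1, len(s1) + 1
    T = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):
        T[0][i] = i * gap
    for j in range(1, rows):
        T[j][0] = j * gap
    for j in range(1, rows):
        for i in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            T[j][i] = max(T[j - 1][i - 1] + sub, T[j][i - 1] + gap, T[j - 1][i] + gap)

    results = []

    def walk(i: int, j: int, top: str, bottom: str) -> None:
        # top/bottom are built back to front and reversed at the end.
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            if T[j][i] == T[j - 1][i - 1] + sub:
                walk(i - 1, j - 1, top + s1[i - 1], bottom + s2[j - 1])
        if i > 0 and T[j][i] == T[j][i - 1] + gap:
            walk(i - 1, j, top + s1[i - 1], bottom + "-")
        if j > 0 and T[j][i] == T[j - 1][i] + gap:
            walk(i, j - 1, top + "-", bottom + s2[j - 1])

    walk(len(s1), len(s2), "", "")
    return sorted(results)


if __name__ == "__main__":
    for top, bottom in all_optimal_alignments("WHAT", "WHY"):
        print(top)
        print(bottom)
        print()
```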
Performance
• The N-W algorithm takes time proportional to $n^2$
• Assessing all possible alignments one by one: $\binom{2n}{n}$
$n^2 < \binom{2n}{n}$
N-W is much faster than assessing all possible alignments
one by one
APPLICATIONS
Role of weighting factors in evaluating a maximum match
• Proteins expected to exhibit homology
- Whale myoglobin
- Human β-hemoglobin
• Proteins not expected to exhibit homology
- Bovine pancreatic ribonuclease
- Hen’s egg lysozyme
APPLICATION OF THE METHOD
• Identification of the types of amino acid pairs
• Establish variable sets consisting of values to be assigned
to each type of pair
• Determine a value for the penalty
TYPES OF AMINO ACID PAIRS
• Type 3: Pairs having a maximum of three
corresponding bases in their codons
• Type 2: Pairs having a maximum of two
corresponding bases in their codons
• Type 1: Pairs having a maximum of one
corresponding base in their codons
• Type 0: Pairs having no possible
corresponding bases in their codons
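The pair type can be computed from a codon table. This is a sketch under two assumptions: "corresponding bases" is read as identical bases at the same codon position, and only a small illustrative subset of the standard genetic code is included. The names `CODONS` and `pair_type` are hypothetical.

```python
from itertools import product

# Small subset of the standard genetic code (RNA codons), for illustration only.
CODONS = {
    "Met": ["AUG"],
    "Trp": ["UGG"],
    "Phe": ["UUU", "UUC"],
    "Lys": ["AAA", "AAG"],
    "Asn": ["AAU", "AAC"],
}


def pair_type(aa1: str, aa2: str) -> int:
    """Maximum number of matching codon positions over all codon pairs (0 to 3)."""
    return max(
        sum(x == y for x, y in zip(c1, c2))
        for c1, c2 in product(CODONS[aa1], CODONS[aa2])
    )


if __name__ == "__main__":
    for pair in [("Met", "Met"), ("Lys", "Asn"), ("Met", "Trp"), ("Phe", "Lys")]:
        print(pair, "-> type", pair_type(*pair))
```

Identical amino acids are always type 3, while a pair such as Phe/Lys shares no base at any position in any of its codons.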
METHODOLOGY
• Reading the amino acid sequences to be compared into the
computer
• Maximum-Correspondence array
– Contains all possible pairs of amino acids
– Identifies the type of each pair
• Generating the two-dimensional array row-by-row
• Assigning the variable set containing the type values, and the
appropriate value from that set, to the appropriate cell of the
comparison array
[Figure: Nucleotide sequences of RNA codons recognized by AA-tRNA*]
*Marshall RE, Caskey CT, Nirenberg M. Fine
structure of RNA codewords recognized by
bacterial, amphibian, and mammalian transfer
RNA. Science. 1967 Feb 17;155(3764):820–826.
METHODOLOGY
• Determination of the maximum match by the procedure
of successive summations
• Randomizing the amino acid sequence of only one
member of the protein pair
– Sequences of β-hemoglobin and ribonuclease
– Randomization procedure: a sequence-shuffling routine based
on computer-generated random numbers
• Repeating the cycle of sequence randomization &
maximum-match determination
• Estimating the average and standard deviation for the
random values of each variable set
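The randomization cycle can be sketched as follows. This is an illustration only: it uses a plain Needleman-Wunsch score (match +1, mismatch -1, gap -2) as a stand-in for the paper's maximum-match value, a fixed random seed for repeatability, and hypothetical function names. It shuffles one sequence, re-scores it against the other, and summarizes the random scores.

```python
import random
import statistics


def global_score(s1: str, s2: str,
                 match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score only, keeping one table row in memory at a time."""
    previous = [i * gap for i in range(len(s1) + 1)]
    for j in range(1, len(s2) + 1):
        current = [j * gap]
        for i in range(1, len(s1) + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            current.append(max(previous[i - 1] + sub,
                               previous[i] + gap,
                               current[i - 1] + gap))
        previous = current
    return previous[-1]


def randomization_test(fixed: str, shuffled: str, trials: int = 10, seed: int = 0):
    """Return (real score, mean random score, standard deviation, random scores)."""
    rng = random.Random(seed)
    real = global_score(fixed, shuffled)
    random_scores = []
    for _ in range(trials):
        residues = list(shuffled)
        rng.shuffle(residues)              # randomize only one member of the pair
        random_scores.append(global_score(fixed, "".join(residues)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    return real, mean, sd, random_scores


if __name__ == "__main__":
    real, mean, sd, _ = randomization_test("TGGTG", "ATCGT")
    print(f"real = {real}, random mean = {mean:.2f}, sd = {sd:.2f}")
```

A real score that lies several standard deviations above the random mean is the kind of evidence the slides describe as a significant difference between real and random proteins.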
RESULTS AND DISCUSSION
• A small random sample size (ten)
• Assumption: For each set of variables the random
values would be distributed in the fashion
of the normal-error curve
• The values of the first six random sets in the β-hemoglobin–
myoglobin comparison were converted to standard measures
• Probit plot
[Table: β-hemoglobin – myoglobin maximum matches]
[Table: ribonuclease – lysozyme maximum matches]
USEFULNESS
• To detect homology and define its nature
• Assumption:
– Homologous proteins are the result of gene duplication
and subsequent mutations
• Construct several hypothetical amino-acid sequences that would
be expected to show homology
– Following the duplications, point mutations occur at a
constant or variable rate
• After a relatively short period of time pairs will have nearly
identical sequences
DETECTION OF THE HIGH DEGREE
OF HOMOLOGY PRESENT
• Use of values for non-identical pairs
• Assigning a relatively high penalty for gaps
• Attaching substantial weight
• Reducing the penalty
• Assessing a very small or even negative penalty factor
THE NATURE OF HOMOLOGY
• Indication?
–Variables which maximize the significance of
the difference between real and random
proteins
EVOLUTIONARY DIVERGENCE
• Similar populations accumulate differences over
evolutionary time, and so become increasingly
distinct
EVOLUTIONARY DIVERGENCE
• “Divergent evolution” can also be applied to molecular
biology characteristics:
• to genes and proteins derived from two or more
homologous genes
• Assignment of weight to type 2 pairs
– Enhances the significance of the results
– Substantial Evolutionary Divergence
EVOLUTIONARY DIVERGENCE
• Exception??
– Evolutionary divergence manifested by
cytochrome and other heme proteins
• Non-random mutations along the genes
THE DEGREE
& TYPE OF HOMOLOGY
• Differ between protein pairs
• Due to the difference
– No a priori best set of cell and operation values
– No best set of values to detect only slight
homology
METHODS OF DETERMINING
THE DEGREE OF HOMOLOGY
• Counting the number of non-identical pairs in the
homologous comparison
• Counting the number of mutations represented by the
non-identical pairs
• Measure of evolutionary distance
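The first of these counts can be read directly off an alignment. This is a minimal sketch, assuming Python, `-` as the gap symbol and a hypothetical function name `count_pairs`; it does not attempt the second measure, which needs codon information to turn non-identical pairs into a number of mutations.

```python
def count_pairs(top: str, bottom: str, gap_char: str = "-"):
    """Return (identical pairs, non-identical pairs, gapped columns)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    identical = non_identical = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            gaps += 1
        elif a == b:
            identical += 1
        else:
            non_identical += 1
    return identical, non_identical, gaps


if __name__ == "__main__":
    # Optimal alignment from the worked example in this deck.
    print(count_pairs("-TGGTG", "ATCGT-"))  # (3, 1, 2)
```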
QUICK WRAPUP
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...masabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 

Último (20)

OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
SHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions PresentationSHRMPro HRMS Software Solutions Presentation
SHRMPro HRMS Software Solutions Presentation
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 

The Needleman-Wunsch Algorithm for Sequence Alignment

  • 1. Saul B. Needleman & Christian D. Wunsch (1969) KAVINDRI DILSHANI H.M.K.G BANDARA PARINDA RAJAPAKSHE
  • 2. ABOUT RESEARCH Title • “A general method applicable to the search for similarities in the amino acid sequence of two proteins” Authors • Saul B. Needleman & Christian D. Wunsch, Department of Biochemistry, Northwestern University & Nuclear Medicine Service, V.A. Research Hospital, Chicago, USA (1969) [Cited by 8474] S.B. Needleman & C.D. Wunsch, “A general method applicable to the search for similarities in the amino acid sequence of two proteins”, J. Mol. Biol. (1970) 48, 443-453.
  • 3. OUTLINE • Introduction - Sequence Alignment - Approaches - Needleman-Wunsch Algorithm vs. Dynamic Programming • Example - Optimal Alignment Score - Optimal Alignment - Algorithm Cost • Applications - Results & Discussion - Methodology - Usefulness
  • 5. SEQUENCE ALIGNMENT • Sequence alignment is a way of arranging two or more sequences of characters to identify regions of similarity. • Identification of residue-residue correspondences • Sequence : Can be taken as ordered strings of letters. • Sequences in Bio-Informatics ? • DNA sequences • RNA sequences • Protein sequences
  • 6. MOTIVATION • Find homologous proteins – Allows prediction of structure and function • Locate similar subsequences in DNA – e.g. allows identification of regulatory elements – Infer biological similarities • Locate DNA sequences that might overlap – Helps in sequence assembly
  • 7. SEQUENCE ALIGNMENT - RESULTS • Input: Two sequences over the same alphabet. - GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA • Output: An alignment of the two sequences.
        -GCGC-ATGGATTGAGCGA
        TGCGCCATTGAT-GACC-A
      Each column is an insertion/deletion (indel), a perfect match, or a mismatch.
  • 9. QUALITATIVE • Dot-plot - Pictorial representation of the relationship between two sequences - Uses a table or a matrix - Does not quantify the similarity
  • 10. QUANTITATIVE • Construction of the best alignment between the sequences. • Assessment of the similarity from the alignment. ( Numerically Quantifies)
  • 11. GLOBAL SEQUENCE ALIGNMENT • The best alignment over the entire length of two sequences. • Suitable : Two sequences are of similar length, with a significant degree of similarity throughout .
  • 12. LOCAL SEQUENCE ALIGNMENT • Compares short portions of a sequence, or a whole library of sequences, with short portions of another. • Suitable: Comparing substantially different sequences, which possibly differ significantly in length and have only short patches of similarity.
  • 13. MULTIPLE SEQUENCE ALIGNMENT • Simultaneous alignment of more than two sequences. • Suitable: When searching for subtle conserved sequence patterns in a protein family, and when more than two sequences of the protein family are available.
  • 14. EXAMPLE S1: SIMILARITY S2: PILLAR S3: MOLARITY
      Global:
        SIMILARITY
        PI-LLAR---
      Local:
        MILAR
        ILLAR
      Multiple:
        SIMILARITY
        PI-LLAR---
        --MOLARITY
  • 15. HOW TO QUANTIFY? • Introduce a scoring scheme • A set of rules that assigns an alignment score to any given alignment of two sequences. • Alignment score: goodness of the alignment • The scoring scheme consists of substitution scores and gap penalties
  • 16. THE SUBSTITUTION MATRIX • Simple scoring scheme for residue substitution • Residue substitution costs can be expressed with an N x N matrix (N is 4 for DNA and 20 for proteins).
             C   T   A   G
         C   1  -1  -1  -1
         T  -1   1  -1  -1
         A  -1  -1   1  -1
         G  -1  -1  -1   1
  • 17. EXAMPLE • Consider the "best" alignment of ATGGCGT and ATGAGT • +1 as a reward for a match, -1 as the penalty for a mismatch, and ignore gaps
        ATGGCGT
        ATG_AGT
      Score: +1 + 1 + 1 + 0 - 1 + 1 + 1 = 4
      Alternative alignment
        ATGGCGT
        A_TGAGT
      Score: +1 + 0 - 1 + 1 - 1 + 1 + 1 = 2
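The column-by-column scoring in the example above can be sketched in a few lines of Python. The function name `score_alignment` and the convention that `_` or `-` marks a gap are assumptions made for this illustration; the default scores (+1 match, -1 mismatch, 0 for gaps) are the ones stated on the slide.

```python
def score_alignment(top: str, bottom: str,
                    match: int = 1, mismatch: int = -1, gap: int = 0) -> int:
    """Sum the column scores of two already-aligned, equal-length strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a in "_-" or b in "_-":   # a gap in either row (ignored on the slide: score 0)
            total += gap
        elif a == b:                 # identical residues
            total += match
        else:                        # different residues
            total += mismatch
    return total


print(score_alignment("ATGGCGT", "ATG_AGT"))  # 4
print(score_alignment("ATGGCGT", "A_TGAGT"))  # 2
```

Passing a negative `gap` value turns the same function into a scorer for the gap-penalized schemes discussed on the later slides.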
  • 18. BETTER MATRIX • Certain changes in DNA /Protein sequences are more likely to occur naturally than the others. • Proteins are composed of twenty amino acids, and physico-chemical properties of individual amino acids vary considerably. • Important to incorporate evolutionary relationships for this substitution schema.
  • 19. EVOLUTIONARY SUBSTITUTION MATRIX • PAM ("point accepted mutation") family - PAM250, PAM120, etc. • BLOSUM ("Blocks substitution matrix") family - BLOSUM62, BLOSUM50, etc. • Derived from the analysis of known alignments of closely related proteins • Assigns variable weights to different substitution operations.
  • 21. GAPS • A gap is a consecutive run of spaces in an alignment and may be introduced in either sequence (an insertion or a deletion of residues). • Objective: optimal sequence alignment with meaningful alignments. • Is it good? – Interrupts the entire polymer chain – In DNA, shifts the reading frame • Gaps are therefore penalized.
  • 22. GAP PENALTIES • Constant: whatever its size, a gap receives the same negative penalty, -g. • Linear: the penalty depends linearly on the size of the gap; the parameter -g is the penalty per unit length. • Affine: the gap introduction (opening) cost is greater than the gap extension cost: g = o + (L-1)e with |e| < |o|, where o is the opening cost, e the extension cost, and L the gap length.
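The three gap-penalty models can be compared side by side with a minimal sketch. The function names and the numeric defaults below are illustrative assumptions, not values from the slides; penalties are passed as positive magnitudes and returned as negative scores.

```python
def constant_gap(length: int, g: float = 2.0) -> float:
    """Constant model: any gap, whatever its length, costs -g."""
    return -g if length > 0 else 0.0


def linear_gap(length: int, g: float = 2.0) -> float:
    """Linear model: -g per gap position."""
    return -g * length


def affine_gap(length: int, opening: float = 5.0, extension: float = 1.0) -> float:
    """Affine model: o + (L - 1) * e, with the extension cheaper than the opening."""
    if length <= 0:
        return 0.0
    return -(opening + (length - 1) * extension)


for L in (1, 3, 10):
    print(L, constant_gap(L), linear_gap(L), affine_gap(L))
```

With these assumed defaults, a single long gap is much cheaper under the affine model than several short ones, which is the behaviour the |e| < |o| condition is meant to produce.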
  • 23. ENRICHED SCORING SCHEMA • Scoring scheme provides us with the quantitative measure of how good is some alignment relative to alternative alignments . • Does this scoring scheme tell us how to find the best alignment ?
  • 24. BASIC APPROACH • Brute-force approach - Generate the list of all possible alignments between two sequences and score them - Select the alignment with the best score – The number of possible global alignments between two sequences of length N is approximately $2^{2N} / \sqrt{\pi N}$ - For two sequences of 250 residues this is ~ $10^{149}$
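The size of the brute-force search space can be checked numerically. The sketch below assumes that the slide's expression is the standard large-N approximation of the binomial coefficient C(2N, N), which is the count quoted on the later performance slide; the function names are made up for this example.

```python
import math


def central_binomial(n: int) -> int:
    """Exact value of C(2n, n)."""
    return math.comb(2 * n, n)


def approx_alignments(n: int) -> float:
    """The slide's closed form: 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)


n = 250
exact = central_binomial(n)
approx = approx_alignments(n)
print(f"exact C(2n, n)       = {exact:.3e}")
print(f"2^(2n) / sqrt(pi n)  = {approx:.3e}")
print(f"ratio approx / exact = {approx / exact:.5f}")
```

For N = 250 both values are on the order of 10^149, in line with the figure on the slide.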
  • 26. NEEDLEMAN-WUNSCH ALGORITHM • Reduces the massive number of possibilities that need to be considered, yet still guarantees that the best solution will be found. • Global sequence alignment technique. • Builds up the best alignment by using optimal alignments of smaller subsequences. • Dynamic programming
  • 27. DYNAMIC PROGRAMMING • Dynamic Programming is an algorithmic paradigm. - Breaking problem into sub problems - Stores results of sub problems - Avoids computing the same results again. • Main properties of the problem - Overlapping Sub problems - Optimal substructure
  • 28. OVERLAPPING SUB PROBLEMS • Segregate the main problem into sub-problems. • Mainly used when solutions of the same sub problems are needed again and again. • Computed solutions to sub problems are stored in a table/matrix so that they do not have to be recomputed. • Dynamic Programming is not useful when there are no common (overlapping) sub problems.
  • 29. OPTIMAL SUBSTRUCTURE • A problem has optimal substructure if an optimal solution can be constructed efficiently from optimal solutions of its sub problems. • The optimal global solution contains the optimal solutions of all its sub problems. • Dynamic Programming is not useful when there is no optimal substructure in the problem.
  • 30. HOW IT WORKS? • Governed by three steps - Break the problem into smaller sub problems. - Solve the smaller problems optimally - Use the sub-problem solutions to construct an optimal solution for the original problem • The Needleman-Wunsch algorithm follows the dynamic programming paradigm and yields the optimal global alignment and the corresponding score.
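The two properties above can be seen in a small top-down sketch: the best score for two prefixes is built from the best scores of shorter prefixes (optimal substructure), and `functools.lru_cache` stores each sub-problem so it is solved only once (overlapping sub-problems). The function name, the default scores (+1, -1, gap -2, taken from the later worked example) and the use of memoized recursion rather than a table are assumptions for this illustration.

```python
from functools import lru_cache


def memoized_score(s1: str, s2: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (optimal global alignment score, number of distinct sub-problems solved)."""

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> int:
        # Best score for aligning the prefixes s1[:i] and s2[:j].
        if i == 0:
            return j * gap          # only gaps remain
        if j == 0:
            return i * gap
        sub = match if s1[i - 1] == s2[j - 1] else mismatch
        return max(best(i - 1, j - 1) + sub,   # align the two residues
                   best(i - 1, j) + gap,       # gap in s2
                   best(i, j - 1) + gap)       # gap in s1

    score = best(len(s1), len(s2))
    return score, best.cache_info().misses


score, solved = memoized_score("TGGTG", "ATCGT")
print(score, solved)  # -2 36
```

Only 36 sub-problems, one per pair of prefix lengths, are solved for two sequences of length 5, even though the plain recursion would revisit them many times.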
  • 32. Definitions • A scoring function (σ) defines the score to give to a substitution mutation, e.g. +1 for a match, -1 for a mismatch • A gap penalty defines the score to give to an insertion or deletion, e.g. -1 • A recurrence relation defines what actions we repeat at each iteration (step) of the algorithm:
      $T(i,j) = \max \begin{cases} T(i-1,\,j-1) + \sigma(S_1(i), S_2(j)) \\ T(i-1,\,j) + \text{gap penalty} \\ T(i,\,j-1) + \text{gap penalty} \end{cases}$
  • 33. Steps • Step 1 – Fill up a matrix (table) T using the recurrence relation • Step 2 – The trace-back step uses the filled-in matrix T to work out the best alignment
  • 34. Work Out • Sequences: S1 = TGGTG, S2 = ATCGT • Scoring function: +1 for matches, -1 for mismatches • Substitution matrix:
              A   C   G   T
         A   +1  -1  -1  -1
         C   -1  +1  -1  -1
         G   -1  -1  +1  -1
         T   -1  -1  -1  +1
  • 35. Work Out cont.. • Initializing the table: S1 = T G G T G labels the columns (i = 0 to 5, left to right) and S2 = A T C G T labels the rows (j = 0 to 5, top to bottom). • Step 1 - The value of T(0,0) is set to zero at the start
  • 36. Work Out cont.. • T(0,0) = 0 • Gap penalty = -2 • Substitution matrix as on slide 34 • Recurrence:
      $T(i,j) = \max \begin{cases} T(i-1,\,j-1) + \sigma(S_1(i), S_2(j)) & \text{(previous column and previous row)} \\ T(i-1,\,j) + \text{gap penalty} & \text{(previous column, same row)} \\ T(i,\,j-1) + \text{gap penalty} & \text{(same column, previous row)} \end{cases}$
  • 37. Work Out cont.. • First two rows filled in (gap penalty = -2, substitution matrix and recurrence as on slide 36):
                   T    G    G    T    G
              0   -2   -4   -6   -8  -10
         A   -2   -1   -3   -5   -7   -9
  • 38. Work Out cont.. • Completed table (gap penalty = -2):
                   T    G    G    T    G
              0   -2   -4   -6   -8  -10
         A   -2   -1   -3   -5   -7   -9
         T   -4   -1   -2   -4   -4   -6
         C   -6   -3   -2   -3   -5   -5
         G   -8   -5   -2   -1   -3   -4
         T  -10   -7   -4   -3    0   -2
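The table-filling step can be reproduced with a short bottom-up sketch. The function name `fill_matrix` and the list-of-lists layout (rows indexed by S2, columns by S1, matching the slide's picture) are choices made for this illustration; the scores are the ones used in the worked example.

```python
def fill_matrix(s1: str, s2: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """T[r][c] is the best score for aligning s2[:r] (rows) with s1[:c] (columns)."""
    rows, cols = len(s2) + 1, len(s1) + 1
    T = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):          # first row: prefix of s1 against gaps only
        T[0][c] = c * gap
    for r in range(1, rows):          # first column: prefix of s2 against gaps only
        T[r][0] = r * gap
    for r in range(1, rows):
        for c in range(1, cols):
            sub = match if s2[r - 1] == s1[c - 1] else mismatch
            T[r][c] = max(T[r - 1][c - 1] + sub,   # diagonal: substitution
                          T[r - 1][c] + gap,       # from above: gap in s1
                          T[r][c - 1] + gap)       # from the left: gap in s2
    return T


for row in fill_matrix("TGGTG", "ATCGT"):
    print(" ".join(f"{v:>4}" for v in row))
```

The bottom-right cell holds the optimal global alignment score, -2 for this pair of sequences.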
  • 43. Work Out cont.. • Trace back through the completed table of slide 38, from the bottom-right cell to the top-left cell. S1 = TGGTG, S2 = ATCGT
        S1: -TGGTG
             | ||
        S2: ATCGT-
      Score = 3 - 1 - 4 = -2 (three matches, one mismatch, two gaps)
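The trace-back step can be sketched as follows. The function fills the table and then walks from the bottom-right cell back to the top-left, at each cell choosing a move whose score explains the current value. The name `needleman_wunsch` and the tie-breaking order (diagonal, then left, then up) are assumptions; when several optimal alignments exist, this sketch returns only one of them.

```python
def needleman_wunsch(s1: str, s2: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned s1, aligned s2) for one optimal global alignment."""
    rows, cols = len(s2) + 1, len(s1) + 1
    T = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        T[0][c] = c * gap
    for r in range(1, rows):
        T[r][0] = r * gap
    for r in range(1, rows):
        for c in range(1, cols):
            sub = match if s2[r - 1] == s1[c - 1] else mismatch
            T[r][c] = max(T[r - 1][c - 1] + sub, T[r - 1][c] + gap, T[r][c - 1] + gap)

    a1, a2 = [], []                       # alignment columns, collected right to left
    r, c = len(s2), len(s1)
    while r > 0 or c > 0:
        if r > 0 and c > 0:
            sub = match if s2[r - 1] == s1[c - 1] else mismatch
            if T[r][c] == T[r - 1][c - 1] + sub:       # diagonal move
                a1.append(s1[c - 1]); a2.append(s2[r - 1])
                r -= 1; c -= 1
                continue
        if c > 0 and T[r][c] == T[r][c - 1] + gap:     # move left: gap in s2
            a1.append(s1[c - 1]); a2.append("-")
            c -= 1
        else:                                          # move up: gap in s1
            a1.append("-"); a2.append(s2[r - 1])
            r -= 1
    return T[-1][-1], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("TGGTG", "ATCGT")
print(score)   # -2
print(top)     # -TGGTG
print(bottom)  # ATCGT-
```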
  • 44. Work Out cont.. • match: +1, mismatch: -1, gap: -2
                  W    H    A    T
             0   -2   -4   -6   -8
         W  -2    1   -1   -3   -5
         H  -4   -1    2    0   -2
         Y  -6   -3    0    1   -1
    • Two possible trace backs?
        Pink traceback:      Orange traceback:
          WHAT                 WHAT
          ||                   ||
          WH-Y                 WHY-
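When a cell's value can be explained by more than one move, every such move leads to an equally good alignment. The sketch below follows all of them recursively and collects every optimal alignment. The function name `all_optimal_alignments` is an assumption, and the exhaustive enumeration is only practical for short sequences because the number of co-optimal alignments can grow quickly.

```python
def all_optimal_alignments(s1: str, s2: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, list of (aligned s1, aligned s2)) covering every optimal trace back."""
    rows, cols = len(s2) + 1, len(s1) + 1
    T = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        T[0][c] = c * gap
    for r in range(1, rows):
        T[r][0] = r * gap
    for r in range(1, rows):
        for c in range(1, cols):
            sub = match if s2[r - 1] == s1[c - 1] else mismatch
            T[r][c] = max(T[r - 1][c - 1] + sub, T[r - 1][c] + gap, T[r][c - 1] + gap)

    results = []

    def walk(r: int, c: int, a1: str, a2: str) -> None:
        # a1 and a2 hold the alignment columns found so far, in reverse order.
        if r == 0 and c == 0:
            results.append((a1[::-1], a2[::-1]))
            return
        if r > 0 and c > 0:
            sub = match if s2[r - 1] == s1[c - 1] else mismatch
            if T[r][c] == T[r - 1][c - 1] + sub:
                walk(r - 1, c - 1, a1 + s1[c - 1], a2 + s2[r - 1])
        if c > 0 and T[r][c] == T[r][c - 1] + gap:
            walk(r, c - 1, a1 + s1[c - 1], a2 + "-")
        if r > 0 and T[r][c] == T[r - 1][c] + gap:
            walk(r - 1, c, a1 + "-", a2 + s2[r - 1])

    walk(len(s2), len(s1), "", "")
    return T[-1][-1], results


score, alignments = all_optimal_alignments("WHAT", "WHY")
print(score)
for top, bottom in alignments:
    print(top, bottom)
```

For WHAT and WHY this yields the two alignments shown on the slide, both with score -1.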
  • 46. Performance • The N-W algorithm takes time proportional to $n^2$ • Assessing all possible alignments one by one takes time proportional to $\binom{2n}{n}$ • Since $n^2 < \binom{2n}{n}$, N-W is much faster than assessing all possible alignments one by one
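A few concrete numbers make the gap between the two costs visible. This sketch assumes the dynamic-programming cost is measured as the number of table cells, (n+1)^2 for two sequences of length n, and uses the slide's binomial count for the one-by-one approach; both function names are made up for the example.

```python
import math


def dp_cells(n: int) -> int:
    """Cells in the (n + 1) x (n + 1) table filled by the dynamic-programming approach."""
    return (n + 1) ** 2


def brute_force_alignments(n: int) -> int:
    """The slide's count for one-by-one enumeration: C(2n, n)."""
    return math.comb(2 * n, n)


for n in (5, 10, 20, 50):
    print(f"n={n:>2}  table cells={dp_cells(n):>5}  C(2n, n)={brute_force_alignments(n):.3e}")
```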
  • 48. Role of weighting factors in evaluating a maximum match • Proteins expected to exhibit homology: whale myoglobin and human β-hemoglobin • Proteins not expected to exhibit homology: bovine pancreatic ribonuclease and hen's egg lysozyme
  • 49. APPLICATION OF THE METHOD • Identification of the types of amino acid pairs • Establish variable sets consisting of values to be assigned to each type of pair • Determine a value for the penalty
  • 50. TYPES OF AMINO ACID PAIRS • Type 3: pairs having a maximum of three corresponding bases in their codons • Type 2: pairs having a maximum of two corresponding bases in their codons • Type 1: pairs having a maximum of one corresponding base in their codons • Type 0: pairs having no possible corresponding bases in their codons
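The pair types can be computed by comparing every codon of one amino acid with every codon of the other and keeping the largest number of positions that agree. The sketch below uses a small hand-picked subset of the standard RNA codon table, chosen as an assumption for brevity; the function name `pair_type` is likewise made up, and a full implementation would need all twenty amino acids.

```python
from itertools import product

# Assumption: a small subset of the standard RNA codon table, for illustration only.
CODONS = {
    "Met": ["AUG"],
    "Trp": ["UGG"],
    "Tyr": ["UAU", "UAC"],
    "Phe": ["UUU", "UUC"],
    "Leu": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
}


def pair_type(aa1: str, aa2: str, codons=CODONS) -> int:
    """Maximum number of corresponding bases over all codon pairs of two amino acids."""
    return max(
        sum(x == y for x, y in zip(c1, c2))       # matching positions in one codon pair
        for c1, c2 in product(codons[aa1], codons[aa2])
    )


print(pair_type("Phe", "Phe"))  # 3: identical amino acids
print(pair_type("Phe", "Leu"))  # 2: e.g. UUU vs UUA
print(pair_type("Met", "Trp"))  # 1: AUG vs UGG share only the third base
print(pair_type("Met", "Tyr"))  # 0: no position agrees
```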
  • 51. METHODOLOGY • Reading the amino acid sequences to be compared into the computer • Maximum-correspondence array – Contains all possible pairs of amino acids – Identifies each pair with its corresponding type • Generating the two-dimensional array row by row • Assigning the variable set containing the type values, and the appropriate value from that set, to the appropriate cell of the comparison array
  • 52. Nucleotide sequences of RNA codons recognized by AA-tRNA* *Marshall RE, Caskey CT, Nirenberg M. Fine structure of RNA codewords recognized by bacterial, amphibian, and mammalian transfer RNA. Science. 1967 Feb 17;155(3764):820–826.
  • 53. METHODOLOGY • Determination of the maximum match by the procedure of successive summations • Randomizing the amino acid sequence of only one member of the protein pair – Sequences of β-hemoglobin and ribonuclease – Randomization procedure: a sequence-shuffling routine based on computer-generated random numbers • Repeating the cycle of sequence randomization and maximum-match determination • Estimating the average and standard deviation for the random values of each variable set
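The randomization cycle can be sketched as a small experiment: score the real pair, shuffle one sequence several times, score each shuffle, and summarize the random scores by their mean and standard deviation. Everything specific here is an assumption for illustration: the scoring function is the simple +1/-1/-2 global score from the worked example rather than the paper's maximum-match values, the sequences are the DNA toy example from the earlier slide rather than proteins, and the seed and trial count are arbitrary.

```python
import random
import statistics


def nw_score(s1: str, s2: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score, keeping only the previous table row in memory."""
    prev = [c * gap for c in range(len(s1) + 1)]
    for r in range(1, len(s2) + 1):
        cur = [r * gap]
        for c in range(1, len(s1) + 1):
            sub = match if s2[r - 1] == s1[c - 1] else mismatch
            cur.append(max(prev[c - 1] + sub, prev[c] + gap, cur[c - 1] + gap))
        prev = cur
    return prev[-1]


def shuffled_scores(s1: str, s2: str, trials: int = 10, seed: int = 0) -> list:
    """Scores of s1 against randomly shuffled copies of s2 (only one member is shuffled)."""
    rng = random.Random(seed)
    letters = list(s2)
    scores = []
    for _ in range(trials):
        rng.shuffle(letters)
        scores.append(nw_score(s1, "".join(letters)))
    return scores


a, b = "GCGCATGGATTGAGCGA", "TGCGCCATTGATGACCA"
real = nw_score(a, b)
rand = shuffled_scores(a, b)
mean, sd = statistics.mean(rand), statistics.stdev(rand)
print(f"real score = {real}, random mean = {mean:.2f}, random sd = {sd:.2f}")
if sd > 0:
    print(f"real score lies {(real - mean) / sd:.2f} standard deviations from the random mean")
```

A real score far above the random mean, measured in standard deviations, is the kind of evidence the procedure uses to argue that a similarity is not due to chance.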
  • 54. RESULTS AND DISCUSSION • A small random sample size (ten) • Assumption: for each set of variables, the random values would be distributed in the fashion of the normal-error curve • The values of the first six random sets in the β-hemoglobin-myoglobin comparison were converted to standard measures • Probit plot
  • 63. USEFULNESS • To detect homology and define its nature • Assumption: – Homologous proteins are the result of gene duplication and subsequent mutations • Construct several hypothetical amino-acid sequences that would be expected to show homology – Following the duplications, point mutations occur at a constant or variable rate • After a relatively short period of time, pairs will have nearly identical sequences
  • 64. DETECTION OF THE HIGH DEGREE OF HOMOLOGY PRESENT • Use of values for non-identical pairs • Assigning a relatively high penalty for gaps • Attaching substantial weight • Reducing the penalty • Assessing a very small or even negative penalty factor
  • 65. THE NATURE OF HOMOLOGY • Indication? –Variables which maximize the significance of the difference between real and random proteins
  • 66. EVOLUTIONARY DIVERGENCE • Similar populations accumulate difference over evolutionary time, and so become increasingly distinct
  • 67. EVOLUTIONARY DIVERGENCE • "Divergent evolution" can be applied to molecular biology characteristics, • i.e. to genes and proteins derived from two or more homologous genes • Assignment of weight to type 2 pairs – Enhances the significance of the results – Substantial evolutionary divergence
  • 68. EVOLUTIONARY DIVERGENCE • Exception?? – Evolutionary divergence manifested by cytochrome and other heme proteins • Non-random mutations along the genes
  • 69. THE DEGREE & TYPE OF HOMOLOGY • Differ between protein pairs • Because of this difference – There is no a priori best set of cell and operation values – There is no best set of values to detect only slight homology
  • 70. METHODS OF DETERMINING THE DEGREE OF HOMOLOGY • Counting the number of non-identical pairs in the homologous comparison • Counting the number of mutations represented by the non-identical pairs • Measure of evolutionary distance