Sequence alignment
1. 1 Department of Zoology , GACW (2018-2019)
SEQUENCE ALIGNMENT
Introduction:
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA,
RNA, or protein to identify regions of similarity that may be a consequence of
functional, structural, or evolutionary relationships between the sequences. It is an
important first step toward structural and functional analysis of newly determined
sequences. As new biological sequences are being generated at an exponential rate,
sequence comparison is becoming increasingly important for drawing functional and
evolutionary inferences.
A sequence alignment is made between a known sequence and an unknown
sequence, or between two unknown sequences.
The known sequence is called the reference sequence.
The unknown sequence is called the query sequence.
Types of Sequence Alignment
Sequence Alignment is of two types, namely:
Global Alignment
Local Alignment
GLOBAL ALIGNMENT:
Global alignment programs are based on the Needleman-Wunsch algorithm.
In global alignment, two sequences to be aligned are assumed to be generally similar
over their entire length. Alignment is carried out from beginning to end of both
sequences to find the best possible alignment across the entire length between the
two sequences.
Input: treat the two sequences as potentially equivalent
Goal: identify conserved regions and differences
Applications:
- Comparing two genes with the same function (e.g., in human vs. mouse).
- Comparing two proteins with similar function.
LOCAL ALIGNMENT:
Local alignment programs are based on the Smith-Waterman algorithm.
Local alignment, on the other hand, does not assume that the two sequences in
question have similarity over the entire length.
It only finds local regions with the highest level of similarity between the two
sequences and aligns these regions without regard for the alignment of the rest of
the sequence regions.
The three primary methods of producing pairwise alignments are dot-matrix
methods, dynamic programming, and word methods.
Input: The two sequences may or may not be related
Goal: see whether a substring in one sequence aligns well with a substring in the
other
Applications:
Searching for local similarities in large sequences (e.g., newly sequenced
genomes).
Looking for conserved domains or motifs in two proteins.
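The idea of finding the best-matching substrings can be sketched in a few lines of Python. This is a minimal illustration of a Smith-Waterman style local alignment, not a production implementation: the function name `smith_waterman`, the scores (match +1, mismatch -1, gap -1) and the two test sequences are assumptions chosen for the example.

```python
def smith_waterman(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Return (best score, aligned part of seq1, aligned part of seq2).

    The scoring values are illustrative assumptions, not taken from the text.
    """
    rows, cols = len(seq2) + 1, len(seq1) + 1
    h = [[0] * cols for _ in range(rows)]      # first row/column stay 0
    best, best_pos = 0, (0, 0)

    # Fill: every cell is clamped at 0, so a poor region never drags down
    # a good region that starts later.
    for r in range(1, rows):
        for c in range(1, cols):
            s = match if seq1[c - 1] == seq2[r - 1] else mismatch
            h[r][c] = max(0,
                          h[r - 1][c - 1] + s,   # align the two residues
                          h[r][c - 1] + gap,     # gap in seq2
                          h[r - 1][c] + gap)     # gap in seq1
            if h[r][c] > best:
                best, best_pos = h[r][c], (r, c)

    # Traceback: start at the highest cell and stop at the first 0.
    top, bottom = [], []
    r, c = best_pos
    while r > 0 and c > 0 and h[r][c] > 0:
        s = match if seq1[c - 1] == seq2[r - 1] else mismatch
        if h[r][c] == h[r - 1][c - 1] + s:
            top.append(seq1[c - 1])
            bottom.append(seq2[r - 1])
            r -= 1
            c -= 1
        elif h[r][c] == h[r][c - 1] + gap:
            top.append(seq1[c - 1])
            bottom.append("_")
            c -= 1
        else:
            top.append("_")
            bottom.append(seq2[r - 1])
            r -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, part1, part2 = smith_waterman("AAAAGATTACACCCC", "TTGATTACAGG")
print(score)
print(part1)
print(part2)
```

Only the shared region is reported; the unrelated flanking residues of both sequences are left out of the alignment.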
PAIRWISE SEQUENCE ALIGNMENT
Pairwise sequence alignment methods are used to find the best-matching piecewise
(local or global) alignments of two query sequences.
Pairwise alignments can only be used between two sequences at a time, but they are
efficient to calculate.
The three primary methods of producing pairwise alignments are:
1. Dot matrix method (old method)
2. Dynamic programming (DP) algorithm (advanced method)
3. Word or k-tuple methods
DOT MATRIX ANALYSIS
A dot matrix is a grid system where the similar nucleotides of two DNA sequences are represented as dots.
It is also called a dot plot.
It is a pairwise sequence alignment made on the computer.
The dots appear as colorless dots on the computer screen.
In a dot matrix, the nucleotides of one sequence are written from left to right along the
top row and those of the other sequence are written from top to bottom along the
left side (column) of the matrix. At every point where the two nucleotides are the
same, a dark dot is placed at the intersection of the row and column. Runs of
consecutive matches appear as diagonal lines, and the resulting graph is called a dot
plot. A dot plot is a type of recurrence plot. Each dot in the plot represents a matching
nucleotide or amino acid. The dot matrix method is a qualitative and simple way to
analyze sequences. However, it takes much time to analyze large sequences.
Dot matrix method is useful for the following studies:
Sequence similarity between two nucleotide sequences or two amino acid
sequences.
Insertion of short stretches in DNA or amino acid sequence.
Deletion of short stretches from a DNA or amino acid sequence.
Repeats or inverted repeats in a DNA or amino acid sequence.
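The construction of a dot matrix can be sketched as follows. This is a minimal Python illustration under simple assumptions: the function names `dot_matrix` and `render` are hypothetical, every single matching residue gets a dot (no window or threshold filtering), and the two sequences are the ones used in the dynamic programming example of this document.

```python
def dot_matrix(seq_top, seq_left):
    """Return a grid of booleans: True where the two residues are identical.

    Rows follow seq_left (top to bottom), columns follow seq_top (left to right).
    """
    return [[a == b for a in seq_top] for b in seq_left]


def render(seq_top, seq_left):
    """Draw the grid as text, with '*' for a dot and '.' for an empty cell."""
    grid = dot_matrix(seq_top, seq_left)
    lines = ["  " + " ".join(seq_top)]
    for residue, row in zip(seq_left, grid):
        cells = " ".join("*" if hit else "." for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


print(render("GAATTCAGTTA", "GGATCGA"))
```

Diagonal runs of `*` mark stretches that match in both sequences; a shift from one diagonal to another suggests an insertion or deletion.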
DYNAMIC PROGRAMMING METHOD
It was introduced by Richard Bellman in the 1950s.
The word programming here denotes finding an acceptable plan of action, not
computer programming.
It is useful in aligning the nucleotide sequences of DNA and the amino acid sequences
of the proteins encoded by that DNA.
Dynamic programming is a three-step process that involves:
1) Breaking the problem into small subproblems.
2) Solving the subproblems using recursive methods.
3) Constructing the optimal solution to the original problem from the optimal
solutions to the subproblems.
Global alignment programs are based on the Needleman-Wunsch algorithm and
local alignment programs on the Smith-Waterman algorithm. Both algorithms are
derived from the basic dynamic programming algorithm.
Example:
Alignment:
Sequence 1: G A A T T C A G T T A
Sequence 2: G G A T C G A
So M = 11 and N = 7 (the length of sequence #1 and sequence #2, respectively)
A simple scoring scheme is assumed where
S_{i,j} = 1 if the residue at position i of sequence #1 is the same as the residue at position
j of sequence #2 (match score); otherwise
S_{i,j} = 0 (mismatch score)
w = 0 (gap penalty)
Three steps in dynamic programming
1. Initialization
2. Matrix fill (scoring)
3. Traceback (alignment)
Initialization Step
The first step in the global alignment dynamic programming approach is to create
a matrix with M + 1 columns and N + 1 rows, where M and N correspond to the
sizes of the sequences to be aligned.
Because the gap penalty in this example is 0, the first row and first column of the
matrix can be initially filled with 0.
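A minimal Python sketch of the initialization step is given here. The function name `init_matrix` and the `gap` parameter are illustrative assumptions; with the gap penalty of 0 used in this example every border cell is 0, while a nonzero penalty would put multiples of the penalty in the first row and column.

```python
seq1 = "GAATTCAGTTA"   # sequence #1, M = 11
seq2 = "GGATCGA"       # sequence #2, N = 7


def init_matrix(seq1, seq2, gap=0):
    """Create an (N + 1) x (M + 1) matrix with its first row and column set."""
    cols = len(seq1) + 1          # M + 1 columns
    rows = len(seq2) + 1          # N + 1 rows
    matrix = [[0] * cols for _ in range(rows)]
    for c in range(cols):
        matrix[0][c] = c * gap    # all 0 when gap == 0
    for r in range(rows):
        matrix[r][0] = r * gap    # all 0 when gap == 0
    return matrix


matrix = init_matrix(seq1, seq2)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```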
Matrix Fill Step
One possible (inefficient) solution of the matrix fill step finds the maximum global
alignment score by starting in the upper left-hand corner of the matrix and
finding the maximal score M_{i,j} for each position in the matrix.
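Written in the symbols of this example, the standard Needleman-Wunsch recurrence for the maximal score at each position is the following (the layout as a LaTeX formula is an addition; the rule itself is the usual one for a linear gap penalty w):

```latex
M_{i,j} = \max
\begin{cases}
M_{i-1,j-1} + S_{i,j} & \text{(match or mismatch, diagonal move)} \\
M_{i,j-1} + w         & \text{(gap in sequence \#1)} \\
M_{i-1,j} + w         & \text{(gap in sequence \#2)}
\end{cases}
```

Each cell therefore depends only on its three already-filled neighbours: the one diagonally before it and the two that differ from it in a single index.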
After filling in all of the values the score matrix is as follows:
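The filled matrix for this example can be regenerated with a short script. The following is a minimal Python sketch rather than the original lecture's code: the function name `fill_matrix` and its parameters are illustrative, and the defaults mirror the scoring scheme above (match 1, mismatch 0, gap 0). Rows correspond to sequence #2 and columns to sequence #1.

```python
def fill_matrix(seq1, seq2, match=1, mismatch=0, gap=0):
    """Fill the global alignment score matrix row by row."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    m = [[0] * cols for _ in range(rows)]

    # Border cells: only gaps are possible there.
    for c in range(1, cols):
        m[0][c] = m[0][c - 1] + gap
    for r in range(1, rows):
        m[r][0] = m[r - 1][0] + gap

    # Inner cells: best of diagonal (match/mismatch), left (gap), up (gap).
    for r in range(1, rows):
        for c in range(1, cols):
            s = match if seq1[c - 1] == seq2[r - 1] else mismatch
            m[r][c] = max(m[r - 1][c - 1] + s,
                          m[r][c - 1] + gap,
                          m[r - 1][c] + gap)
    return m


seq1 = "GAATTCAGTTA"
seq2 = "GGATCGA"
scores = fill_matrix(seq1, seq2)

print("    " + " ".join(seq1))
for label, row in zip(" " + seq2, scores):
    print(label + " " + " ".join(str(value) for value in row))
```

For the two example sequences, the bottom-right cell, which holds the global alignment score, is 6.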
Traceback Step
The traceback step determines the actual alignment(s) that result in the
maximum score.
Giving an alignment of:
G A A T T C A G T T A
|   |   | |   |     |
G G A _ T C _ G _ _ A
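The traceback can be sketched in Python as follows. This is a minimal, self-contained illustration: the function name `needleman_wunsch` is hypothetical, the scoring defaults are those of the example (match 1, mismatch 0, gap 0), and the order in which the three moves are tried is an arbitrary choice. Several alignments reach the same maximum score, so the one returned may differ from the alignment shown above while having the same number of matches.

```python
def needleman_wunsch(seq1, seq2, match=1, mismatch=0, gap=0):
    """Return (score, aligned seq1, aligned seq2) for a global alignment."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    m = [[0] * cols for _ in range(rows)]
    for c in range(1, cols):
        m[0][c] = m[0][c - 1] + gap
    for r in range(1, rows):
        m[r][0] = m[r - 1][0] + gap
    for r in range(1, rows):
        for c in range(1, cols):
            s = match if seq1[c - 1] == seq2[r - 1] else mismatch
            m[r][c] = max(m[r - 1][c - 1] + s,
                          m[r][c - 1] + gap,
                          m[r - 1][c] + gap)

    # Traceback from the bottom-right corner back to the top-left corner,
    # following any move that explains the value of the current cell.
    top, bottom = [], []
    r, c = rows - 1, cols - 1
    while r > 0 or c > 0:
        if r > 0 and c > 0:
            s = match if seq1[c - 1] == seq2[r - 1] else mismatch
        if r > 0 and c > 0 and m[r][c] == m[r - 1][c - 1] + s:
            top.append(seq1[c - 1])       # residues aligned to each other
            bottom.append(seq2[r - 1])
            r -= 1
            c -= 1
        elif c > 0 and m[r][c] == m[r][c - 1] + gap:
            top.append(seq1[c - 1])       # gap in sequence #2
            bottom.append("_")
            c -= 1
        else:
            top.append("_")               # gap in sequence #1
            bottom.append(seq2[r - 1])
            r -= 1
    return m[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


score, row1, row2 = needleman_wunsch("GAATTCAGTTA", "GGATCGA")
print(score)
print(" ".join(row1))
print(" ".join(row2))
```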
WORD METHOD OR K-TUPLE METHOD
It is a heuristic method that is not guaranteed to find an optimal alignment
solution, but it is significantly more efficient than dynamic programming.
This method is useful in large-scale database searches to find whether there is
significant match available with the query sequence.
Word method is used in the database search tools FASTA and the BLAST
family.
These tools identify a series of short subsequences (words) of the
query sequence, which are then matched against candidate database sequences.
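The word lookup at the heart of this approach can be sketched in Python. This is a simplified illustration of the seeding idea only, not the actual FASTA or BLAST implementation: the function names `query_words` and `find_seeds`, the word length `k = 3` and the test sequences are assumptions, and the `step` parameter controls whether the query words overlap (`step=1`) or not (`step=k`).

```python
from collections import defaultdict


def query_words(query, k, step=1):
    """Cut the query into words of length k, taking one every `step` positions."""
    return [(i, query[i:i + k]) for i in range(0, len(query) - k + 1, step)]


def find_seeds(query, target, k=3, step=1):
    """Return (query position, target position) pairs where a word matches exactly."""
    # Index every word of the target once, so each query word is a dictionary lookup.
    index = defaultdict(list)
    for pos in range(len(target) - k + 1):
        index[target[pos:pos + k]].append(pos)

    seeds = []
    for q_pos, word in query_words(query, k, step):
        for t_pos in index.get(word, []):
            seeds.append((q_pos, t_pos))
    return seeds


for q_pos, t_pos in find_seeds("GATTACA", "CCGATTACAGG"):
    # Seeds with the same offset (t_pos - q_pos) lie on one diagonal.
    print(q_pos, t_pos, "diagonal", t_pos - q_pos)
```

Several seeds falling on the same diagonal indicate a region worth extending into a full alignment, which is far cheaper than filling a complete dynamic programming matrix for every database sequence.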
MULTIPLE SEQUENCE ALIGNMENT
Introduction:
Multiple Sequence Alignment (MSA) is generally the alignment of three or more
biological sequences (protein or nucleic acid) of similar length. From the output,
homology can be inferred and the evolutionary relationships between the
sequences can be studied.
Types of MSA:
o Dynamic Programming approach
o Progressive method
o Iterative method
Dynamic Programming approach
In fact, dynamic programming is applicable to align any number of
sequences.
Computes an optimal alignment for a given score function.
Because of its high running time, it is not typically used in practice.
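The running-time remark can be made concrete: the dynamic programming table has one axis per sequence, so the number of cells is the product of (length + 1) over all sequences. A small Python sketch, with the hypothetical helper name `dp_cells`:

```python
def dp_cells(lengths):
    """Number of cells in a DP table with one axis per sequence."""
    cells = 1
    for length in lengths:
        cells *= length + 1
    return cells


print(dp_cells([11, 7]))       # the pairwise example: 12 * 8 = 96 cells
print(dp_cells([100] * 5))     # five sequences of length 100: 101**5 cells
```

Adding one more sequence multiplies the table size by roughly the sequence length, which is why the exact approach is limited to very few short sequences.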
Progressive method:
In this method, pairwise global alignment is performed for all the possible
pairs of sequences, and these pairs are aligned together on the basis of their similarity.
The most similar sequences are aligned together and then less related
sequences are added to it progressively one-by-one until a complete
multiple query set is obtained.
This method is also called hierarchical method or tree method
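The order in which sequences are added can be sketched in Python. This is a deliberately simplified illustration, not how any particular tool works: it only computes a joining order from pairwise global scores and does not build the alignment itself, whereas production tools typically construct a guide tree. The function names `global_score` and `joining_order`, the scoring scheme (match 1, mismatch 0, gap 0) and the four test sequences are assumptions.

```python
from itertools import combinations


def global_score(a, b, match=1, mismatch=0, gap=0):
    """Global alignment score of a and b, keeping only two rows in memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s, curr[j - 1] + gap, prev[j] + gap))
        prev = curr
    return prev[-1]


def joining_order(seqs):
    """Start with the most similar pair, then add the closest remaining sequence."""
    names = list(seqs)
    score = {frozenset(pair): global_score(seqs[pair[0]], seqs[pair[1]])
             for pair in combinations(names, 2)}

    first = max(combinations(names, 2), key=lambda pair: score[frozenset(pair)])
    order = list(first)
    remaining = [name for name in names if name not in order]
    while remaining:
        # Pick the sequence most similar to anything already in the set.
        nxt = max(remaining,
                  key=lambda n: max(score[frozenset((n, m))] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


seqs = {"s4": "TTTTTTT", "s3": "GACCACA", "s1": "GATTACA", "s2": "GATTTCA"}
print(joining_order(seqs))
```

Here `s1` and `s2` are the most similar pair and are joined first; `s3` follows and the most divergent sequence, `s4`, is added last.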
Iterative Method:
A method of performing a series of steps to produce successively better
approximation to align many sequences step-by-step is called iterative
method.
Here the Pairwise sequence alignment is totally avoided.
Iterative methods attempt to improve on the weak point of the progressive
methods: their heavy dependence on the accuracy of the initial pairwise
alignments.
Tools in MSA:
Clustal W, Clustal W2, Clustal Omega, Kalign, MAFFT, MUSCLE, MView, T-Coffee, etc.
Applications of MSA:
Detecting similarities between sequences (closely or distantly related).
Detecting conserved regions or motifs in sequences.
Detecting structural homologies.
Thus, assisting the improved prediction of secondary and tertiary
structures of proteins.