The S-W algorithm performs in local sequence alignment for determining two similar regions between two strings nucleotide sequences or protein sequence.
Instead of looking for entire sequence, S-W algorithm compares sequence of all possible lengths and optimizes similarity length.
2. WhatisS-W
algorithm??
The S-W algorithm performs in local sequence
alignment for determining two similar regions between
two strings nucleotide sequences or protein sequence.
Instead of looking for entire sequence, S-W algorithm
compares sequence of all possible lengths and
optimizes similarity length.
3. The algorithm was first proposed by Temple F.
Smith and Michael S. Waterman in 1981. Like
the Needleman–Wunsch algorithm, of which it is a variation,
Smith–Waterman is a dynamic programming algorithm.
As such, it has the desirable property that it is guaranteed to
find the optimal local alignment with respect to the scoring
system being used (which includes the substitution
matrix and the gap-scoring scheme).
The main difference to the Needleman–Wunsch algorithm is
that negative scoring matrix cells are set to zero, which
renders the (thus positively scoring) local alignments visible.
Traceback procedure starts at the highest scoring matrix cell
and proceeds until a cell with score zero is encountered,
yielding the highest scoring local alignment.
4. Substitution
matrix and gap
penalty schemes
A substitution matrix assigns each pair of bases or amino
acids a score for match or mismatch.
Usually matches get positive scores, whereas
mismatches get relatively lower scores.
A gap penalty function determines the score cost for
opening or extending gaps.
It is suggested that users choose the appropriate scoring
system based on the goals.
In addition, it is also a good practice to try different
combinations of substitution matrices and gap penalties.
5. Scoringmatrix
The dimensions of the scoring matrix are 1+length of
each sequence respectively.
All the elements of the first row and the first column are
set to 0.
The extra first row and first column make it possible to
align one sequence to another at any position, and setting
them to 0 makes the terminal gap free from penalty.
6.
7.
8. Scoring
Score each element from left to right, top to bottom in the
matrix, considering the outcomes of substitutions
(diagonal scores) or adding gaps (horizontal and vertical
scores).
If none of the scores are positive, this element gets a 0.
Otherwise the highest score is used and the source of
that score is recorded.
11. Traceback
Starting at the element with the highest score, traceback
based on the source of each score recursively, until 0 is
encountered.
The segments that have the highest similarity score
based on the given scoring system is generated in this
process.
To obtain the second best local alignment, apply the
traceback process starting at the second highest score
outside the trace of the best alignment.