PRESENTATION BY
Mr. Nitin Maruti Naik
(M. Sc. SET, PGDBI, PGDGC & CP.)
Phylogenetic Analysis
INTRODUCTION
• A phylogenetic tree, also known as a phylogeny, is a diagram that depicts the lines of evolutionary descent of different species, organisms, or genes from a common ancestor.
– Attempts to reconstruct evolutionary ancestors
– Estimates the time of divergence from a common ancestor
• Phylogenies can be used to address a number of interesting problems:
– Forensics
• HIV mutates rapidly
– Predicting the evolution of influenza viruses
– Predicting the functions of uncharacterized genes (orthologue detection)
– Drug discovery
– Vaccine development
• Targeting the inferred common ancestor
Objectives
• Evolution,
• Elements of phylogeny,
• Methods of phylogenetic analysis,
• Phylogenetic tree of life,
• Comparison of the genetic sequences of organisms,
• Phylogenetic analysis tools:
– Phylip,
– ClustalW.
Evolution
• Speciation
– The evolution of new organisms is driven by:
• Mutations
– The DNA sequence can change through single-base substitutions, deletion/insertion of DNA segments, etc.
• Selection
– Speciation events lead to the creation of different species.
– Speciation can be caused by physical separation into groups in which different genetic variants become dominant.
• Any two species share a (possibly distant) common ancestor.
• The molecular clock hypothesis: substitutions accumulate at a roughly constant rate over time, so genetic distance can be used to estimate divergence time.
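As a rough worked illustration of the molecular clock: if each lineage accumulates substitutions at a constant rate r (substitutions per site per year), two sequences that split T years ago are separated by about K = 2rT substitutions per site, because both lineages change independently after the split. The sketch below simply inverts this to T = K / (2r); the distance and rate values are hypothetical, and real rates vary between genes and lineages.

```python
def divergence_time(k: float, rate: float) -> float:
    """Estimate divergence time T = K / (2r), assuming a strict molecular clock.

    k    -- substitutions per site separating the two sequences (K)
    rate -- substitutions per site per year along ONE lineage (r)
    The factor of 2 reflects that both lineages accumulate changes after the split.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


# Hypothetical values: K = 0.1 substitutions/site, r = 1e-9 substitutions/site/year
t = divergence_time(0.1, 1e-9)
print(f"{t:.3g} years")  # 5e+07 years
```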
Elements of phylogeny
A phylogenetic tree
• A phylogenetic tree is a graph that reflects the approximate distances between a set of objects (species, genes, proteins, families) in a hierarchical fashion:
▪ Leaves: current species, or sequences from current species
▪ Internal nodes: hypothetical common ancestors
▪ Branch (edge) length: “time” from one speciation event to the next (each branching represents speciation into two new species)
Fig. Example of a Rooted Tree for Taxa A, B, C and D, labelling the root of the tree, a split (bipartition), terminal nodes (leaves), interior nodes (vertices) and branches (edges), with branch lengths.
Fig. Example of an Unrooted Tree for 4 Taxa, with branch lengths.
Fig. Terms Used in Representing a Phylogenetic Tree (taxa A to E): split (bipartition), terminal node (leaf), interior node (vertex), branch (edge) and root of tree.
Methods of phylogenetic analysis
Methods for constructing phylogenetic trees
Distance Methods
• Also called phenetic methods
• Trees are constructed from the similarity (pairwise distance) of sequences.
• The tree is called a dendrogram.
• Does not necessarily reflect evolutionary relationships.
• E.g.
– UPGMA clustering
– Neighbour Joining
– Fitch-Margoliash
Character Methods
• Also called cladistic methods
• Trees are calculated by considering the various possible pathways of evolution.
• Based on parsimony or likelihood methods
• The tree is called a cladogram.
• Each alignment position is used as evolutionary information to build a tree.
• E.g.
– Maximum parsimony
– Maximum likelihood
– Bayesian
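Distance methods all start from a matrix of pairwise distances between aligned sequences. A minimal sketch, assuming ungapped DNA sequences of equal length: the p-distance is the fraction of sites that differ, and the Jukes-Cantor (JC69) correction d = -3/4 ln(1 - 4p/3), used here as one common choice, adjusts for multiple substitutions at the same site. The sequence names and data are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance; infinite (saturated) when p >= 0.75."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs, dist=jc69_distance):
    """Symmetric lookup {(name1, name2): distance} for every pair of sequences."""
    names = list(seqs)
    d = {}
    for i, x in enumerate(names):
        d[(x, x)] = 0.0
        for y in names[i + 1:]:
            d[(x, y)] = d[(y, x)] = dist(seqs[x], seqs[y])
    return d


seqs = {"A": "ACGTACGTAC", "B": "ACGTACGTTC", "C": "ACGAACGTTA"}
for (x, y), value in sorted(distance_matrix(seqs).items()):
    if x < y:  # print each unordered pair once
        print(x, y, round(value, 4))
```

The corrected distance is always at least as large as the p-distance, and the gap widens as sequences diverge.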
Distance Methods
• UPGMA clustering
• Neighbour Joining
• Fitch-Margoliash
UPGMA
(Unweighted Pair Group Method with Arithmetic Mean)
➢ Originally developed for numerical taxonomy by Sokal and Michener (1958).
➢ One of the oldest distance methods.
➢ Uses a sequential clustering algorithm.
➢ Produces rooted trees.
➢ Assumes that the tree is ultrametric, i.e., that the rate of substitution is constant along all branches of the tree (a molecular clock), so all leaves are equidistant from the root.
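A distance matrix is ultrametric when, for every three taxa, the two largest of their three pairwise distances are equal (the three-point condition); on such data UPGMA recovers the tree exactly. A small checker, assuming a symmetric lookup d[(x, y)] and an illustrative floating-point tolerance; the matrix is hypothetical.

```python
from itertools import combinations


def is_ultrametric(names, d, tol=1e-9):
    """True if every triple satisfies the three-point condition on lookup d[(x, y)]."""
    for i, j, k in combinations(names, 3):
        _, middle, largest = sorted((d[(i, j)], d[(i, k)], d[(j, k)]))
        if abs(largest - middle) > tol:  # the two largest distances must tie
            return False
    return True


def from_matrix(names, rows):
    """Turn a square matrix (list of rows) into a lookup d[(x, y)]."""
    return {(x, y): rows[i][j] for i, x in enumerate(names) for j, y in enumerate(names)}


names = ["A", "B", "C", "D"]
clock = from_matrix(names, [[0, 2, 6, 8],
                            [2, 0, 6, 8],
                            [6, 6, 0, 8],
                            [8, 8, 8, 0]])
print(is_ultrametric(names, clock))  # True
```

Changing d(A, C) to 5 breaks the condition for the triple A, B, C (distances 2, 5, 6), signalling that a strict clock does not hold.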
This method follows a clustering procedure:
(1) Initially, each species is a cluster on its own.
(2) Join the two closest clusters, then recalculate the distance from the joined pair to every other cluster by taking the average of the distances between their members.
(3) Repeat this process until all species are connected in a single cluster.
UPGMA employs a sequential clustering algorithm:
I. From among all the OTUs (operational taxonomic units), identify the two that are most similar to each other and treat them as a new single OTU.
II. Then, from among the new group of OTUs, identify the pair with the highest similarity, and so on.
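A minimal Python sketch of the procedure above, assuming a symmetric distance lookup d[(x, y)] and clock-like input. Merged clusters get a hypothetical label such as "A+B", the distance to a merged cluster is the size-weighted average of its members' distances, and each new node is placed at half the distance between the pair it joins. The result is a rooted tree in Newick notation with branch lengths.

```python
def from_matrix(names, rows):
    """Turn a square matrix (list of rows) into a lookup d[(x, y)]."""
    return {(x, y): rows[i][j] for i, x in enumerate(names) for j, y in enumerate(names)}


def upgma(names, d):
    """Rooted UPGMA tree in Newick format from a symmetric distance lookup d[(x, y)]."""
    # cluster label -> (Newick subtree, number of taxa, height of its root node)
    clusters = {n: (n, 1, 0.0) for n in names}
    dist = {(x, y): d[(x, y)] for x in names for y in names if x != y}

    while len(clusters) > 1:
        # Join the two closest clusters
        a, b = min(((x, y) for x in clusters for y in clusters if x < y),
                   key=lambda pair: dist[pair])
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = dist[(a, b)] / 2.0  # new node halfway between the pair
        subtree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        new = f"{a}+{b}"  # hypothetical label; assumes taxon names contain no "+"
        # Distance to the merged cluster = size-weighted average over its members
        for k in clusters:
            avg = (size_a * dist[(a, k)] + size_b * dist[(b, k)]) / (size_a + size_b)
            dist[(new, k)] = dist[(k, new)] = avg
        clusters[new] = (subtree, size_a + size_b, height)

    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


names = ["A", "B", "C", "D"]
rows = [[0, 2, 6, 8],
        [2, 0, 6, 8],
        [6, 6, 0, 8],
        [8, 8, 8, 0]]
print(upgma(names, from_matrix(names, rows)))  # (((A:1,B:1):2,C:3):1,D:4);
```

On this clock-like matrix A and B join first at height 1, C joins them at height 3, and D joins last at height 4.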
• Advantage
– Fast
– Can handle many sequences
• Disadvantage
– Unreliable when rates of substitution are unequal among lineages
– Does not correct for multiple substitutions at the same site
NEIGHBOUR JOINING METHOD
• Neighbour-joining methods apply general data-clustering techniques to sequence analysis, using genetic distance as the clustering metric.
• Developed in 1987 by Saitou and Nei.
• The simple neighbour-joining method produces unrooted trees, and it does not assume a constant rate of evolution (i.e., a molecular clock) across lineages.
• It begins with an unresolved, star-like tree.
• Each pair of taxa is evaluated as a candidate for joining, and the sum of all branch lengths of the resulting tree is calculated.
• The pair that yields the smallest sum is considered to be the closest neighbours and is joined.
• A new branch is inserted between this pair and the rest of the tree, and the branch lengths are recalculated.
• This process is repeated until all taxa are joined into a single, fully resolved tree.
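A compact sketch of neighbour joining. Rather than literally recomputing the total tree length for every candidate pair, it uses the Q-criterion Q(i, j) = (n - 2)d(i, j) - r(i) - r(j), where r(i) is the sum of row i; this formulation (due to Studier and Keppler) selects the same pair as minimising the total branch length. Internal nodes receive hypothetical labels n1, n2 and so on, and the five-taxon matrix is an illustrative additive example, so the exact tree is recovered.

```python
def from_matrix(names, rows):
    """Turn a square matrix (list of rows) into a lookup d[(x, y)]."""
    return {(x, y): rows[i][j] for i, x in enumerate(names) for j, y in enumerate(names)}


def neighbor_joining(names, d):
    """Unrooted NJ tree as a list of (node, node, branch_length) edges; needs >= 3 taxa."""
    nodes = list(names)
    dist = {(x, y): d[(x, y)] for x in nodes for y in nodes if x != y}
    edges, counter = [], 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(dist[(x, y)] for y in nodes if y != x) for x in nodes}
        # Q(i, j) = (n - 2) d(i, j) - r(i) - r(j); the smallest Q marks the neighbours
        i, j = min(((x, y) for idx, x in enumerate(nodes) for y in nodes[idx + 1:]),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        counter += 1
        u = f"n{counter}"  # hypothetical label for the new internal node
        li = 0.5 * dist[(i, j)] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, li), (j, u, dist[(i, j)] - li)]
        for k in nodes:
            if k not in (i, j):
                dist[(u, k)] = dist[(k, u)] = 0.5 * (dist[(i, k)] + dist[(j, k)] - dist[(i, j)])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last three nodes to one central node
    a, b, c = nodes
    centre = f"n{counter + 1}"
    edges += [(a, centre, 0.5 * (dist[(a, b)] + dist[(a, c)] - dist[(b, c)])),
              (b, centre, 0.5 * (dist[(a, b)] + dist[(b, c)] - dist[(a, c)])),
              (c, centre, 0.5 * (dist[(a, c)] + dist[(b, c)] - dist[(a, b)]))]
    return edges


names = ["a", "b", "c", "d", "e"]
rows = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
for u, v, length in neighbor_joining(names, from_matrix(names, rows)):
    print(u, v, round(length, 3))
```

For this matrix, a and b are joined first, and the leaf branch lengths come out as a = 2, b = 3, c = 4, d = 2 and e = 1.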
DRAWBACKS
• It produces only one tree and neglects other possible trees, which might be as good as the NJ tree, if not significantly better.
• Moreover, since errors in distance estimates are exponentially larger for longer distances, under some conditions this method will yield a biased tree.
Fig. The Neighbour Joining Method
FITCH-MARGOLIASH METHOD
• Proposed by Fitch and Margoliash in 1967
• Produces unrooted trees
• Provides a criterion for fitting trees to distance matrices
• Uses a weighted least-squares method for clustering based on genetic distance.
• Closely related sequences are given more weight in the tree-construction process, to compensate for the greater inaccuracy in measuring distances between distantly related sequences.
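The Fitch-Margoliash criterion can be written as the weighted sum Σ (D_ij - d_ij)² / D_ij², where D_ij is the observed distance and d_ij the path length between taxa i and j on a candidate tree; dividing by D_ij² is what gives closely related pairs more weight. The sketch below only evaluates this score for one given tree (a full program also searches for the tree and branch lengths that minimise it). The `power` parameter (2 for Fitch-Margoliash weights, 0 for unweighted least squares) and the example tree are illustrative assumptions.

```python
from collections import defaultdict


def path_lengths(edges):
    """Path length between every pair of nodes of a tree given as (node, node, length) edges."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    result = {}
    for start in list(adj):
        stack, seen = [(start, 0.0)], {start}
        while stack:  # depth-first walk; a tree has exactly one path to each node
            node, total = stack.pop()
            result[(start, node)] = total
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, total + w))
    return result


def fitch_margoliash_score(names, observed, edges, power=2):
    """Sum of (D_ij - d_ij)^2 / D_ij^power over all taxon pairs."""
    tree = path_lengths(edges)
    score = 0.0
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            big_d = observed[(x, y)]
            score += (big_d - tree[(x, y)]) ** 2 / big_d ** power
    return score


edges = [("A", "n1", 1), ("B", "n1", 2), ("n1", "n2", 1), ("C", "n2", 1), ("D", "n2", 2)]
names = ["A", "B", "C", "D"]
pairs = {("A", "B"): 3, ("A", "C"): 3, ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 5, ("C", "D"): 3}
observed = {**pairs, **{(y, x): v for (x, y), v in pairs.items()}}
print(fitch_margoliash_score(names, observed, edges))  # 0.0: the tree fits exactly
```

If the observed A-B distance were 4 instead of 3, the same tree would score (4 - 3)² / 4² = 0.0625.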
Fig. The Fitch-Margoliash Method
Character-based methods
• Maximum Parsimony
• Maximum Likelihood (ML)
Maximum Parsimony
(Fitch, 1977)
• Parsimony: economy in the use of resources.
• The basic principle behind parsimony is Occam’s Razor:
• Given a choice between competing explanations, the simplest one (requiring the fewest assumptions) should be preferred.
• Parsimony assumes that the relationship requiring the fewest mutations to explain the current state of the sequences being considered is the relationship most likely to be correct.
Concept of parsimony
• Parsimony is the central idea behind the maximum parsimony family of character-based methods of phylogenetic reconstruction.
• The two fundamental ideas of biological parsimony are:
– Mutations are exceedingly rare events;
– The more unlikely events a model invokes, the less likely the model is to be correct.
• As a result, the relationship that requires the fewest mutations to explain the current state of the sequences being considered is the relationship most likely to be correct.
Example
• For a parsimony approach, the positions of a multiple sequence alignment fall into two categories in terms of their information content: those that carry information (informative) and those that do not (uninformative).
Example alignment (four sequences, six positions):
Position:  1 2 3 4 5 6
Seq 1      G G G G G G
Seq 2      G G G A G T
Seq 3      G G A T A G
Seq 4      G A T C A T
• Position 1 is invariant and therefore uninformative, because all trees invoke the same number of mutations (0).
• Position 2 is uninformative because 1 mutation occurs in all three possible unrooted trees.
• Position 3 requires 2 mutations, and position 4 requires 3 mutations, in all possible trees, so both are also uninformative.
• Positions 5 and 6 are informative, because one of the trees invokes only one mutation, while the other two alternative trees both require two mutations.
• In general, regardless of how many sequences are aligned, a position is informative only if it has at least two different nucleotides that are each present at least twice (other nucleotides may also occur once).
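The informative-site rule can be checked mechanically, column by column. A minimal sketch, assuming an ungapped alignment stored as equal-length strings (here the four sequences from the example): a column is flagged when at least two different character states each occur at least twice.

```python
from collections import Counter


def is_parsimony_informative(column: str) -> bool:
    """At least two different states must each occur at least twice in the column."""
    return sum(1 for count in Counter(column).values() if count >= 2) >= 2


alignment = ["GGGGGG", "GGGAGT", "GGATAG", "GATCAT"]  # sequences 1-4 from the example
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
informative = [i + 1 for i, col in enumerate(columns) if is_parsimony_informative(col)]
print(informative)  # [5, 6]
```

A column such as AACCT would also count as informative, even though T appears only once.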
• The maximum parsimony algorithm searches for the minimum number of genetic events (nucleotide substitutions or amino acid changes) needed to infer the most parsimonious tree from a set of sequences.
• The best tree is the one that requires the fewest changes.
• Maximum Parsimony (positive points):
– Does not reduce sequence information to a single number
– Tries to provide information on the ancestral sequences
– Evaluates different trees
• Maximum Parsimony (negative points):
– Is slow in comparison with distance methods
– Does not use all the sequence information (only informative sites are used)
– Does not correct for multiple mutations (does not assume an explicit model of evolution)
– Does not provide information on the branch lengths
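One standard way to count the minimum number of changes that a fixed tree requires is Fitch's small-parsimony algorithm: working from the leaves towards the root, a node takes the intersection of its children's state sets if it is non-empty, and otherwise their union at the cost of one change. The sketch below, assuming ungapped nucleotide data and trees written as nested pairs, scores the three possible unrooted trees for the four-sequence example; the search over all possible trees, which is the slow part of maximum parsimony, is not shown.

```python
def fitch(tree, states):
    """Fitch small parsimony for one site: returns (state set at this node, changes below it).

    tree   -- a taxon name (leaf) or a pair (left, right) of subtrees
    states -- mapping from taxon name to its nucleotide at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change required


def parsimony_score(tree, seqs):
    """Minimum number of changes over all sites (independent of where the tree is rooted)."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})[1] for i in range(n_sites))


seqs = {"1": "GGGGGG", "2": "GGGAGT", "3": "GGATAG", "4": "GATCAT"}
# The three unrooted trees for four taxa, each written rooted on its internal branch
trees = {"(1,2)|(3,4)": (("1", "2"), ("3", "4")),
         "(1,3)|(2,4)": (("1", "3"), ("2", "4")),
         "(1,4)|(2,3)": (("1", "4"), ("2", "3"))}
for label, tree in trees.items():
    print(label, parsimony_score(tree, seqs))
```

Here the trees grouping (1,2) and (1,3) both need 9 changes and the (1,4) grouping needs 10: positions 5 and 6 favour different trees, so the most parsimonious tree is not unique for this small alignment.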
Bootstrapping
• Bootstrap analysis:
• is a statistical method for obtaining an estimate of error;
• is used to evaluate the reliability of a tree;
• is used to examine how often a particular cluster in a tree appears when the nucleotide or amino acid positions (alignment columns) are resampled with replacement.
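A minimal sketch of the resampling step, assuming an ungapped alignment stored as equal-length strings: each replicate draws alignment columns with replacement until it has the original number of sites. In a full analysis a tree would be built from every replicate, and the bootstrap support for a cluster is the percentage of replicate trees that contain it; that tree-building step is not shown here. The seed value is arbitrary.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(alignment[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


alignment = ["GGGGGG", "GGGAGT", "GGATAG", "GATCAT"]
rng = random.Random(42)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]
print(replicates[0])
```

Because columns are drawn with replacement, a replicate usually repeats some sites and omits others, which is what lets the method probe how strongly the data support each cluster.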
Fig. Maximum Parsimony
Maximum likelihood
• This approach is a purely statistical method.
• Probabilities are considered for every individual nucleotide substitution in a set of aligned sequences.
• For example, transitions (A↔G, C↔T) are commonly observed about three times as often as transversions. At a site where three sequences carry C, T and G, it can therefore reasonably be argued that the sequences with C and T are more likely to be closely related to each other than either is to the sequence with G.
• The calculation of probabilities is complicated by the fact that the sequence of the common ancestor of the sequences considered is unknown.
• Furthermore, multiple substitutions may have occurred at one or more sites, and sites are not necessarily independent or equivalent.
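The transition/transversion distinction in the example above can be made explicit: transitions exchange two purines (A, G) or two pyrimidines (C, T), while transversions exchange a purine for a pyrimidine. A small helper, assuming only A, C, G or T (either case) as input:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def substitution_type(x: str, y: str) -> str:
    """Classify a change between two nucleotides as 'none', 'transition' or 'transversion'."""
    x, y = x.upper(), y.upper()
    if x not in NUCLEOTIDES or y not in NUCLEOTIDES:
        raise ValueError("expected A, C, G or T")
    if x == y:
        return "none"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"  # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"  # purine <-> pyrimidine


print(substitution_type("C", "T"))  # transition
print(substitution_type("C", "G"))  # transversion
print(substitution_type("T", "G"))  # transversion
```

In the three-sequence example, C and T differ by a transition, while each differs from G by a transversion.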
• Notes:
• 1. This is the best-justified method from a theoretical viewpoint.
• 2. ML estimates the branch lengths of the final tree.
• 3. ML methods are usually statistically consistent.
• 4. Sequence simulation experiments have shown that this method works better than all others in most cases.
• Drawback: it needs a long computation time to construct a tree.
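A minimal sketch of maximum likelihood in the simplest possible setting: estimating the single branch length d that separates two aligned DNA sequences under the Jukes-Cantor (JC69) model, which is assumed here for simplicity (real programs use richer models and also search over tree topologies). The likelihood is maximised by a crude grid search; for JC69 the maximum should agree with the closed-form distance -3/4 ln(1 - 4p/3). The site counts are hypothetical.

```python
import math


def jc69_log_likelihood(d: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of two aligned sequences separated by distance d under JC69.

    Each site starts from an equiprobable base (1/4); after distance d the chance of
    seeing the same base is 1/4 + 3/4 e^(-4d/3), and a specific other base 1/4 - 1/4 e^(-4d/3).
    """
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)
    p_diff = 0.25 * (0.25 - 0.25 * e)
    return n_same * math.log(p_same) + n_diff * math.log(p_diff)


def ml_distance(n_same: int, n_diff: int, step: float = 1e-4, d_max: float = 3.0) -> float:
    """Crude grid search for the branch length with the highest likelihood."""
    grid = [step * k for k in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_same, n_diff))


n_same, n_diff = 90, 10  # hypothetical: 10 differences in 100 aligned sites
p = n_diff / (n_same + n_diff)
print(round(ml_distance(n_same, n_diff), 3))          # grid-search ML estimate
print(round(-0.75 * math.log(1 - 4 * p / 3), 3))     # closed-form JC69 distance
```

Both estimates come out at about 0.107 substitutions per site, slightly above the raw 10% of differing sites, because the model accounts for unseen multiple substitutions.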
Fig. Maximum Likelihood (ML)
Advantages and disadvantages of character-based methods
• Advantages
– MP tries to provide information on the ancestral sequences
– ML tends to outperform alternative methods such as parsimony or distance methods, even with very short sequences
• Disadvantages
– Slow in comparison with distance methods
– MP does not use all the sequence information
– ML results depend on the model of evolution used
Applications
• There is a wide array of applications of phylogenetic analysis, including:
– Evolutionary studies
– Medical research and epidemiology
– Ecology
– Criminal (forensic) investigations
– Identifying orthologues and paralogues
Fig. Comparison of the genetic sequences of organisms
Phylogenetic analysis tools:
• Phylip,
• ClustalW/X.
PHYLIP (Phylogeny Inference Package)
http://evolution.genetics.washington.edu/phylip.html
• Available free for Windows, Mac OS and Linux systems
• Parsimony, distance matrix and likelihood methods (with bootstrapping and consensus trees)
• Data can be molecular sequences, gene
frequencies, restriction sites and fragments,
distance matrices and discrete characters
• PHYLIP (the PHYLogeny Inference Package) is a package of programs for
inferring phylogenies (evolutionary trees).
• It is available free over the Internet and is written to work on as many different kinds of computer systems as possible.
• The source code (in C) is distributed, and executables are also distributed.
• In particular, already-compiled executables are available for Windows (95/98/NT/2000/ME/XP/Vista), Mac OS X, and Linux systems.
• Older executables are also available for Mac OS 8 or 9 systems.
• Complete documentation is available in the documentation files that come with the package.
The PHYLIP Manual
• is an excellent source of information.
• Brief one-line descriptions of each program are also provided.
• The easiest way to run PHYLIP programs is via a command-line menu (similar to ClustalW).
• A program is invoked by clicking on its icon or by typing the program name at the command line, e.g.:
• > protdist
• > neighbor
• If there is no file called "infile", the program responds with:
• [gogarten@carrot gogarten]$ seqboot
• seqboot: can't find input file "infile"
• Please enter a new file name>
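Because most PHYLIP programs read their data from a file called "infile", it can be convenient to write that file from a script. A minimal sketch, assuming already-aligned sequences and the classic (strict, sequential) PHYLIP layout: a first line giving the number of sequences and the number of sites, then each name padded or truncated to 10 characters, followed by its sequence. The sequence names are hypothetical; check the PHYLIP documentation for the interleaved variant and other options.

```python
def to_phylip(seqs: dict) -> str:
    """Aligned sequences in strict sequential PHYLIP format (10-character name field)."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    lines = [f"{len(seqs)} {lengths.pop()}"]  # number of sequences, number of sites
    for name, seq in seqs.items():
        lines.append(f"{name[:10]:<10}{seq}")  # name truncated/padded to exactly 10 chars
    return "\n".join(lines) + "\n"


seqs = {"Seq1": "GGGGGG", "Seq2": "GGGAGT", "Seq3": "GGATAG", "Seq4": "GATCAT"}
text = to_phylip(seqs)
print(text, end="")
# To use it with PHYLIP, save the text as a file named "infile" in the working directory:
# with open("infile", "w") as handle:
#     handle.write(text)
```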
Methods
• Methods that are available in the package include
– parsimony,
– distance matrix, and
– likelihood methods, including
• bootstrapping and
• consensus trees.
• Data types that can be handled include
– molecular sequences,
– gene frequencies,
– restriction sites and fragments,
– distance matrices, and
– discrete characters.
Programs
• The programs are controlled through a menu, which asks the users
which options they want to set, and allows them to start the
computation.
• The data are read into the program from a text file, which the user can prepare using any word processor or text editor. It is important that this text file not be saved in the word processor's special format; it should instead be in "flat ASCII" or "Text Only" format.
• Some sequence analysis programs, such as the ClustalW alignment program, can write data files in the PHYLIP format.
• Most of the programs look for the data in a file called "infile"; if they do not find this file, they ask the user to type in the name of the data file.
• Output is written onto special files with names like "outfile" and
"outtree".
• Trees written onto "outtree" are in the Newick format, an informal
standard agreed to in 1986 by authors of a number of major
phylogeny packages.
• At the time of writing, PHYLIP does not have a mouse/windows interface.
• PHYLIP is probably the most widely distributed phylogeny package.
• It is the sixth most frequently cited phylogeny package, after MrBayes, PAUP*, RAxML, PhyML, and MEGA.
• PHYLIP is also the oldest widely-distributed package.
• It has been in distribution since October, 1980, and has
over 30,000 registered users.
• It is still being updated.
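The Newick format used for "outtree" nests each group of descendants in parentheses, separates siblings with commas, appends an optional ":length" to each node, and ends the tree with a semicolon. A minimal writer, assuming a hypothetical in-memory representation in which every node is a pair (label or list of child nodes, branch length or None):

```python
def to_newick(tree) -> str:
    """Serialise a (content, branch_length) node to a Newick string.

    content is a leaf label (str) or a list of child nodes; branch_length may be None.
    """
    def fmt(node):
        content, length = node
        if isinstance(content, str):
            text = content
        else:
            text = "(" + ",".join(fmt(child) for child in content) + ")"
        return text if length is None else f"{text}:{length:g}"

    return fmt(tree) + ";"


# ((A,B),C) with branch lengths; the root has no branch length
tree = ([([("A", 1), ("B", 1)], 2), ("C", 3)], None)
print(to_newick(tree))  # ((A:1,B:1):2,C:3);
```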
Fig. PHYLIP program folder
Fig. PHYLIP menu interface
CLUSTAL W
• www.ebi.ac.uk/clustalw/
• Clustal is a widely used progressive multiple sequence alignment (MSA) program, available either as a stand-alone or as an online program.
The latest version (at the time of writing) is 2.0. There are two main variations:
• ClustalW: command-line interface
• ClustalX: graphical user interface
• It is available for Windows, Mac OS, and Unix/Linux.
• The program is available from the Clustal homepage or the European Bioinformatics Institute FTP server.
There are three main steps:
• Perform pairwise alignments of all the sequences
• Create a phylogenetic (guide) tree from the pairwise results, or use a user-defined tree
• Use the guide tree to carry out the multiple alignment
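The first step, pairwise alignment, can be illustrated with a basic Needleman-Wunsch global alignment score. This is a sketch under simple, hypothetical scoring assumptions (match +1, mismatch -1, linear gap penalty -2), not Clustal W's own scoring scheme; Clustal W derives distances from its pairwise alignments and uses them to build the guide tree.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty (two-row DP)."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


seqs = ["ACGTAC", "ACGAC", "AGTTAC"]
for i in range(len(seqs)):
    for j in range(i + 1, len(seqs)):
        print(seqs[i], seqs[j], nw_score(seqs[i], seqs[j]))
```

Keeping only two rows of the dynamic-programming table is enough for the score; recovering the alignment itself would require the full table or a traceback.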
Phylogenetic analysis

  • 1. PRESENTATION BY Mr. Nitin Maruti Naik (M. Sc. SET, PGDBI, PGDGC & CP.) Phylogenetic Analysis
  • 2. INTRODUCTION • A phylogenetic tree also known as a phylogeny is a diagram that depicts the lines of evolutionary descent of different species, organisms, or genes from a common ancestor. – Attempt to reconstruct evolutionary ancestors – Estimate time of divergence from ancestor
  • 3. • Can be used to solve a number of interesting problems – Forensics • HIV virus mutates rapidly – Predicting evolution of influenza viruses – Predicting functions of uncharacterized genes - orthologue detection – Drug discovery – Vaccine development • Target inferred common ancestor
  • 4. Objectives • Evolution, • Elements of phylogeny, • Methods of phylogenetic analysis, • Phylogenetic tree of life, • Comparison of genetic sequence of organisms, • Phylogenetic analysis tools- – Phylip, – ClustalW.
  • 5. Evolution • Speciation – Evolution of new organisms is driven by • Mutations – The DNA sequence can be changed due to single base changes, deletion/insertion of DNA segments, etc. • Selection bias – Speciation events lead to creation of different species. – Speciation caused by physical separation into groups where different genetic variants become dominant • Any two species share a (possibly distant) common ancestor • The molecular clock hypothesis
  • 7. A phylogenetic tree • A phylogenetic tree is a graph reflecting the approximate distances between a set of objects (species, genes, proteins, families) in a hierarchical fashion ▪ Leaves – current species; sequences in current species ▪ Internal nodes - hypothetical common ancestors ▪ Branches (Edges) length - “time” from one speciation to the next (branching represents speciation into two new species)
  • 8. Example of Rooted tree Split (Bipartition) Terminal Nodes (Leaf) Interior Nodes (Vertex) Branch (Edge) Root of Tree Taxon A Taxon B Taxon C Taxon D 2 2 1 2 1 1
  • 9. Taxon A Taxon B Taxon C Taxon C 1 1 4 1 2 Fig. Example of an Unrooted Tree for 4 Taxa
  • 10. A B C D E Split (Bipartition) Terminal Node (Leaf) Interior Node (Vertex) Branch (Edge) Root of Tree Fig. Terms Used in representing a phylogenetic Tree
  • 11.
  • 13. Methods for analysing phylogenetic tree Distance Methods • Also called Phenetic • Trees are constructed by similarity of sequences. • Tree is called Dendrogram • Does not necessarily reflect evolutionary relationship. • E.g. – UPGMA clustering, – Neighbour Joining – Fitch-Margolish Character Methods • Also called Cladistic • Trees are calculated by considering various possible pathway of evolution. • Based on parsimony or likelihood methods • Tree is called Cladogram. • Use each alignment position as evolutionary information to build a tree. • E.g. – Maximum parsimony – Maximum likelihood – Bayesian
  • 14. Distance Methods • UPGMA clustering, • Neighbour Joining • Fitch-Margolish
  • 20. UPGMA (Unweighted Pair Group Method with Arithmetic mean) ➢ Originally developed for numerical taxonomy by Sokal and Michener (1958). ➢ Uses a sequential clustering algorithm. ➢ One of the oldest distance methods. ➢ Produces rooted trees. ➢ Assumes that the tree is ultrametric, i.e. that the rate of substitution is constant along all branches of the tree.
  • 21. This method follows a clustering procedure: (1) Initially, each species is a cluster on its own. (2) Join the two closest clusters and recalculate the distance from the new cluster to every other cluster as the average of the distances of its members. (3) Repeat this process until all species are connected in a single cluster.
  • 22. Employs a sequential clustering algorithm: I. Identify, among all the OTUs (operational taxonomic units), the two that are most similar to each other and treat them as a new single OTU. II. From the new set of OTUs, identify the pair with the highest similarity, and so on.
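The clustering steps above can be sketched in Python. This is a minimal illustration, assuming a precomputed distance matrix and writing the result as a rooted Newick string; the function name `upgma` and the toy distances are hypothetical, not taken from any package.

```python
from itertools import combinations


def upgma(dist, names):
    """Minimal UPGMA sketch. dist is a symmetric list-of-lists of pairwise
    distances between the taxa in names; returns a rooted Newick string."""
    # cluster id -> (Newick subtree, number of OTUs in it, height of its root)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Step (1)/(I): find the closest pair of clusters
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        # Ultrametric assumption: the new node sits at half the pair distance
        height = dij / 2.0
        merged = f"({tree_i}:{height - h_i:g},{tree_j}:{height - h_j:g})"
        # Step (2): distance from the new cluster to each other cluster is the
        # average over all OTUs it contains (each original OTU weighted equally)
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            new_d[(k, next_id)] = (size_i * dik + size_j * djk) / (size_i + size_j)
        # Step (3): forget the two old clusters and repeat
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        d.update(new_d)
        clusters[next_id] = (merged, size_i + size_j, height)
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree + ";"


names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
print(upgma(dist, names))  # (D:3,(C:2,(A:1,B:1):1):1);
```

Weighting each new distance by cluster size is what makes the average "unweighted" in UPGMA's sense: every original OTU contributes equally, however late it joined.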
  • 23. • Advantages – Fast – Can handle many sequences • Disadvantages – Cannot be used when rates of substitution are unequal – Does not consider multiple substitutions.
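The last disadvantage concerns multiple substitutions: the observed proportion of differing sites (p) underestimates the true number of changes when a site has mutated more than once. Distances are therefore often corrected with a substitution model before clustering. The sketch below uses the Jukes-Cantor (JC69) correction as one common choice; it is an illustrative addition, not part of UPGMA itself, and the sequences are hypothetical.

```python
import math


def p_distance(s1, s2):
    """Proportion of sites that differ between two aligned, gap-free sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69_distance(s1, s2):
    """Jukes-Cantor (1969) estimate of substitutions per site,
    d = -3/4 * ln(1 - 4p/3), which allows for multiple hits at one site."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        raise ValueError("p-distance too large for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


x, y = "ACGTACGTAC", "ACGTTCGAAC"
print(p_distance(x, y))     # 0.2 (2 of 10 sites differ)
print(jc69_distance(x, y))  # about 0.233
```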
  • 24. NEIGHBOUR JOINING METHOD • Neighbor-joining methods apply general data-clustering techniques to sequence analysis, using genetic distance as the clustering metric. • Developed in 1987 by Saitou and Nei. • The simple neighbor-joining method produces unrooted trees and, unlike UPGMA, does not assume a constant rate of evolution (i.e., a molecular clock) across lineages.
  • 25. • It begins with an unresolved, star-like tree. • Each pair is evaluated for being joined, and the sum of all branch lengths of the resulting tree is calculated. • The pair that yields the smallest sum is considered the closest neighbours and is joined. • A new branch is inserted between them and the rest of the tree, and the branch lengths are recalculated. • This process is repeated until the tree is fully resolved (no star-like node remains).
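The joining loop above can be sketched in Python. As an assumption, the sketch uses the Q-criterion formulation (Studier and Keppler, 1988), which selects the same pair as Saitou and Nei's smallest-sum-of-branch-lengths rule without rebuilding every candidate tree. The function name and the toy distances (taken from an additive tree) are hypothetical.

```python
def neighbor_joining(dist, names):
    """NJ sketch returning an unrooted Newick string.
    dist: symmetric list-of-lists distance matrix; names: unique taxon labels."""
    D = {}  # frozenset({node_a, node_b}) -> distance between current nodes
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            D[frozenset((names[i], names[j]))] = dist[i][j]

    def d(a, b):
        return D[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d(a, k) for k in nodes if k != a) for a in nodes}
        # Q-criterion: the pair with the smallest Q gives the smallest total tree length
        a, b = min(((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
                   key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from the joined pair to their new ancestral node u
        la = d(a, b) / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d(a, b) - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from u to every remaining node
        for k in nodes:
            if k not in (a, b):
                D[frozenset((u, k))] = (d(a, k) + d(b, k) - d(a, b)) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Three nodes left: resolve the final star with three branch lengths
    a, b, c = nodes
    la = (d(a, b) + d(a, c) - d(b, c)) / 2
    lb = d(a, b) - la
    lc = d(a, c) - la
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


names = ["A", "B", "C", "D"]
dist = [[0, 3, 5, 6], [3, 0, 6, 7], [5, 6, 0, 7], [6, 7, 7, 0]]
print(neighbor_joining(dist, names))  # (C:3,D:4,(A:1,B:2):1);
```

Because these distances fit a tree exactly, the branch lengths are recovered exactly and do not need to be equal, which is the practical difference from UPGMA.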
  • 26. DRAWBACKS • It produces only one tree and neglects other possible trees, which might be as good as the NJ tree, if not significantly better. • Moreover, since errors in distance estimates are exponentially larger for longer distances, under some conditions this method will yield a biased tree.
  • 30. FITCH – MARGOLIASH METHOD • Proposed in 1967 by Fitch and Margoliash • Produces unrooted trees • Provides a criterion for fitting trees to distance matrices • Uses a weighted least-squares method for clustering based on genetic distance. • Closely related sequences are given more weight in the tree-construction process, to correct for the greater inaccuracy in measuring distances between distantly related sequences.
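The weighted least-squares criterion can be illustrated for one fixed topology. This sketch is a simplification under stated assumptions: it fits branch lengths for a single hypothetical four-taxon tree ((A,B),(C,D)) using weights 1/D², the weighting usually associated with Fitch-Margoliash. A full implementation would also search over topologies and keep the tree with the smallest criterion.

```python
# Hypothetical fixed unrooted topology ((A,B),(C,D)): branches a, b lead to A, B;
# c, d lead to C, D; e is the internal branch. Each row marks the branches
# on the path between one pair of taxa.
PAIRS = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
PATHS = [
    # a  b  c  d  e
    [1, 1, 0, 0, 0],  # A-B
    [1, 0, 1, 0, 1],  # A-C
    [1, 0, 0, 1, 1],  # A-D
    [0, 1, 1, 0, 1],  # B-C
    [0, 1, 0, 1, 1],  # B-D
    [0, 0, 1, 1, 0],  # C-D
]


def solve(A, y):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [list(map(float, row)) + [float(y[i])] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fitch_margoliash_fit(observed):
    """Branch lengths minimising sum((D_ij - d_ij)**2 / D_ij**2), where D_ij is the
    observed distance and d_ij the path length on the tree (weight = 1 / D_ij**2)."""
    D = [observed[p] for p in PAIRS]
    w = [1.0 / dij ** 2 for dij in D]
    rows, k = range(len(D)), len(PATHS[0])
    # Weighted normal equations: (X^T W X) b = X^T W D
    xtwx = [[sum(w[r] * PATHS[r][i] * PATHS[r][j] for r in rows) for j in range(k)]
            for i in range(k)]
    xtwd = [sum(w[r] * PATHS[r][i] * D[r] for r in rows) for i in range(k)]
    lengths = solve(xtwx, xtwd)
    fitted = [sum(PATHS[r][i] * lengths[i] for i in range(k)) for r in rows]
    criterion = sum(w[r] * (D[r] - fitted[r]) ** 2 for r in rows)
    return lengths, criterion


observed = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
            ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 7}
lengths, criterion = fitch_margoliash_fit(observed)
print([round(x, 6) for x in lengths], criterion)
```

Dividing each squared error by D² means a given absolute error on a long distance costs less than the same error on a short one, which is how closely related sequences receive more weight.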
  • 35. Character-Based Methods • Maximum Parsimony (MP) • Maximum Likelihood (ML)
  • 36. Maximum Parsimony (Fitch, 1977) • Parsimony: carefulness in the use of resources. • The basic underlying principle behind parsimony is Occam's razor: • among competing explanations, the simplest one is preferred over the more complex ones. • Parsimony assumes that the relationship that requires the fewest mutations to explain the current state of the sequences being considered is the relationship most likely to be correct.
  • 37. Concept of parsimony • The concept of parsimony is at the heart of parsimony-based character methods of phylogenetic reconstruction. • The two fundamental ideas of biological parsimony are: – mutations are exceedingly rare events; – the more unlikely events a model invokes, the less likely the model is to be correct. • As a result, the relationship that requires the fewest mutations to explain the current state of the sequences being considered is the relationship most likely to be correct.
  • 38. Example • For a parsimony approach, the positions of a multiple sequence alignment fall into two categories in terms of their information content: those that carry information (are informative) and those that do not (are uninformative). Example:
Position:  1  2  3  4  5  6
Seq 1      G  G  G  G  G  G
Seq 2      G  G  G  A  G  T
Seq 3      G  G  A  T  A  G
Seq 4      G  A  T  C  A  T
• Position 1 is invariant and therefore uninformative, because all trees invoke the same number of mutations (0). • Position 2 is uninformative because 1 mutation occurs in all three possible trees. • Positions 3 and 4 are also uninformative: position 3 requires 2 mutations and position 4 requires 3 mutations in all possible trees. • Positions 5 and 6 are informative, because one of the trees invokes only one mutation while the other two alternative trees both require 2 mutations. • In general, regardless of how many sequences are aligned, a position is informative only if it has at least 2 different nucleotides, each of which is present at least twice.
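The informativeness rule can be checked mechanically. A minimal sketch, assuming the alignment is given as equal-length strings; `is_informative` is a hypothetical helper name.

```python
from collections import Counter


def is_informative(column):
    """Parsimony-informative: at least two different states, each of which
    occurs in at least two sequences."""
    counts = Counter(column)
    return sum(1 for n in counts.values() if n >= 2) >= 2


# The four aligned sequences of the example
alignment = ["GGGGGG", "GGGAGT", "GGATAG", "GATCAT"]
columns = ["".join(seq[i] for seq in alignment) for i in range(len(alignment[0]))]
informative = [i + 1 for i, col in enumerate(columns) if is_informative(col)]
print(informative)  # [5, 6]
```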
  • 39. • The maximum parsimony algorithm searches for the minimum number of genetic events (nucleotide substitutions or amino acid changes) needed to infer the most parsimonious tree from a set of sequences. • The best tree is the one that needs the fewest changes.
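Searching for the fewest changes needs a way to count changes on a given tree. A minimal sketch, assuming the Fitch (1971) small-parsimony count for one site on a binary tree and the four-sequence example above. For four taxa there are only three unrooted topologies, so all of them can be scored exhaustively; real programs must search tree space heuristically. The tree encoding is a hypothetical nested-tuple representation.

```python
def fitch_site_cost(tree, states):
    """Fitch small-parsimony count for one site on a rooted binary tree.
    tree: a leaf name (str) or a (left, right) tuple; states: leaf name -> base.
    Returns (candidate state set, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site_cost(tree[0], states)
    right_set, right_cost = fitch_site_cost(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint sets: one change is needed on one of the two branches
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total number of changes over all sites (the parsimony score)."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    return sum(
        fitch_site_cost(tree, {name: alignment[name][i] for name in names})[1]
        for i in range(n_sites)
    )


alignment = {"1": "GGGGGG", "2": "GGGAGT", "3": "GGATAG", "4": "GATCAT"}
# The three unrooted topologies for four taxa, each rooted on its central edge
trees = {
    "((1,2),(3,4))": (("1", "2"), ("3", "4")),
    "((1,3),(2,4))": (("1", "3"), ("2", "4")),
    "((1,4),(2,3))": (("1", "4"), ("2", "3")),
}
scores = {label: tree_length(t, alignment) for label, t in trees.items()}
print(scores)  # ((1,2),(3,4)) and ((1,3),(2,4)): 9 changes; ((1,4),(2,3)): 10
```

Positions 1 to 4 add 0 + 1 + 2 + 3 changes to every tree; only positions 5 and 6 separate the trees. In this small example each of them favours a different tree, so two trees tie as most parsimonious.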
  • 40. • Maximum Parsimony (positive points): – Does not reduce sequence information to a single number – Tries to provide information on the ancestral sequences – Evaluates different trees • Maximum Parsimony (negative points): – Is slow in comparison with distance methods – Does not use all the sequence information (only informative sites are used) – Does not correct for multiple mutations (does not use an explicit model of evolution) – Does not provide information on the branch lengths
  • 41. Bootstrapping • Bootstrap analysis: • is a statistical method for obtaining an estimate of error; • is used to evaluate the reliability of a tree; • is used to examine how often a particular cluster in a tree appears when the alignment positions (nucleotides or amino acids) are resampled with replacement.
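The resampling step can be sketched as follows. As a simplifying assumption, each pseudo-replicate is scored with a hypothetical stand-in (the pair with the fewest differences, i.e. the first pair UPGMA would join) instead of a full tree-building run; in practice each replicate is passed through the complete method (in PHYLIP, for example, replicates are generated with seqboot). The alignment is hypothetical.

```python
import random
from itertools import combinations


def bootstrap_replicate(alignment, rng):
    """Pseudo-replicate: sample alignment columns with replacement,
    keeping the original number of columns."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Hypothetical stand-in for a tree-building method: the pair of sequences
    with the fewest differences. Ties go to the first pair in alphabetical order."""
    names = sorted(alignment)
    return min(combinations(names, 2),
               key=lambda p: sum(a != b for a, b in zip(alignment[p[0]], alignment[p[1]])))


def bootstrap_support(alignment, pair, n_reps=100, seed=1):
    """Fraction of replicates in which `pair` is grouped together."""
    rng = random.Random(seed)
    hits = sum(closest_pair(bootstrap_replicate(alignment, rng)) == pair
               for _ in range(n_reps))
    return hits / n_reps


aln = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "TTGAACCTAG", "D": "GCATTCGAAT"}
print(bootstrap_support(aln, ("A", "B")))  # a proportion between 0 and 1
```

A grouping supported by many independent sites keeps reappearing across replicates, while one that rests on a few columns appears only when those columns happen to be drawn.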
  • 44. Maximum likelihood • This approach is a purely statistical method. • Probabilities are considered for every individual nucleotide substitution in a sequence alignment. • For example, transitions are observed roughly 3 times as often as transversions, so at a site where the sequences carry C, T and G, it can reasonably be argued that the sequences with C and T (which differ by a transition) are more likely to be closely related to each other than to the sequence with G. • Calculating these probabilities is complicated by the fact that the sequence of the common ancestor of the sequences considered is unknown. • Furthermore, multiple substitutions may have occurred at one or more sites, and sites are not necessarily independent or equivalent.
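The likelihood calculation, including the unknown ancestral sequence, can be sketched with Felsenstein's pruning algorithm, which sums over all possible ancestral bases. For illustration, the sketch assumes the Kimura two-parameter (K2P) model, whose transition/transversion ratio `kappa` captures the transition bias described above. It also assumes a fixed hypothetical tree with fixed branch lengths and treats sites as independent. A real ML program would also optimise branch lengths and search over topologies.

```python
import math
from itertools import product

BASES = "ACGT"
PURINES = set("AG")


def k2p_prob(i, j, d, kappa):
    """K2P probability that base i becomes base j along a branch of length d
    (expected substitutions per site); kappa = transition/transversion rate ratio."""
    e1 = math.exp(-4.0 * d / (kappa + 2.0))
    e2 = math.exp(-2.0 * d * (kappa + 1.0) / (kappa + 2.0))
    if i == j:
        return 0.25 + 0.25 * e1 + 0.5 * e2
    if (i in PURINES) == (j in PURINES):  # A<->G or C<->T: transition
        return 0.25 + 0.25 * e1 - 0.5 * e2
    return 0.25 - 0.25 * e1               # transversion


def partial(node, site, kappa):
    """Pruning step. A leaf is a taxon name; an internal node is a list of
    (child, branch_length). Returns P(data below node | base at node) per base,
    summing over every possible (unknown) ancestral base below it."""
    if isinstance(node, str):
        return {b: 1.0 if site[node] == b else 0.0 for b in BASES}
    like = {b: 1.0 for b in BASES}
    for child, length in node:
        child_like = partial(child, site, kappa)
        for b in BASES:
            like[b] *= sum(k2p_prob(b, c, length, kappa) * child_like[c] for c in BASES)
    return like


def site_likelihood(tree, site, kappa):
    # K2P has equal equilibrium frequencies (1/4) for the root base
    return sum(0.25 * p for p in partial(tree, site, kappa).values())


# Hypothetical rooted tree ((X:0.1,Y:0.1):0.1,Z:0.2) and one site carrying C, T, G
tree = [([("X", 0.1), ("Y", 0.1)], 0.1), ("Z", 0.2)]
print(site_likelihood(tree, {"X": "C", "Y": "T", "Z": "G"}, kappa=3.0))
# Sanity check: likelihoods of all 64 possible site patterns sum to 1
print(sum(site_likelihood(tree, dict(zip("XYZ", p)), 3.0) for p in product(BASES, repeat=3)))
```

Relabelling the leaves (for example, putting G on Y and T on Z) and comparing the site likelihoods shows how the model ranks alternative groupings.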
  • 45. • Notes: • 1. This is the best-justified method from a theoretical viewpoint. • 2. ML estimates the branch lengths of the final tree. • 3. ML methods are usually statistically consistent (given an appropriate model of evolution). • 4. Sequence simulation experiments have shown that this method works better than the other methods in most cases. • Drawback: ML needs long computation times to construct a tree.
  • 47. Advantages and disadvantages of character based methods • Advantages – MP tries to provide information on the ancestral sequences – ML tends to outperform alternative methods such as parsimony or distance methods even with very short sequences • Disadvantages – Slow in comparison with distance methods – MP does not use all the sequence information – ML result is dependent on the model of evolution used
  • 49. Applications • There is a wide array of applications of phylogenetic analysis, including: – Evolution studies – Medical research and epidemiology – Ecology – Criminal (forensic) studies – Finding orthologues and paralogues
  • 51. Comparison of genetic sequences of organisms
  • 52. Phylogenetic analysis tools- • Phylip, • ClustalW/X.
  • 53. PHYLIP (Phylogeny Inference Package) http://evolution.genetics.washington.edu/phylip.html • Available free in Windows/MacOS/Linux systems • Parsimony, distance matrix and likelihood methods (bootstrapping and consensus trees) • Data can be molecular sequences, gene frequencies, restriction sites and fragments, distance matrices and discrete characters
  • 54. PHYLIP (Phylogeny Inference Package) http://evolution.genetics.washington.edu/phylip.html • PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). • It is available free over the Internet, and written to work on as many different kinds of computer systems as possible. • The source code is distributed (in C), and executables are also distributed. • In particular, already-compiled executables are available for Windows (95/98/NT/2000/me/xp/Vista), Mac OS X, and Linux systems. • Older executables are also available for Mac OS 8 or 9 systems. • Complete documentation is available on documentation files that come with the package.
  • 55. The PHYLIP Manual • is an excellent source of information, and includes brief one-line descriptions of the programs. • The easiest way to run PHYLIP programs is via a command-line menu (similar to ClustalW). • A program is invoked by clicking on its icon, or by typing the program name at the command line: • > protdist • > neighbor • If there is no file called infile, the program responds with: • [gogarten@carrot gogarten]$ seqboot • seqboot: can't find input file "infile" • Please enter a new file name>
  • 56. Methods • Methods that are available in the package include – parsimony, – distance matrix, and – likelihood methods, including • bootstrapping and • consensus trees. • Data types that can be handled include – molecular sequences, – gene frequencies, – restriction sites and fragments, – distance matrices, and – discrete characters.
  • 57. Programs • The programs are controlled through a menu, which asks the users which options they want to set, and allows them to start the computation. • The data are read into the program from a text file, which the user can prepare using any word processor or text editor (but it is important that this text file not be in the special format of that word processor -- it should instead be in "flat ASCII" or "Text Only" format). • Some sequence analysis programs such as the ClustalW alignment program can write data files in the PHYLIP format. • Most of the programs look for the data in a file called "infile" -- if they do not find this file they then ask the user to type in the file name of the data file.
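A data file in this plain-text format can also be produced from a script. A minimal sketch, assuming the classic PHYLIP sequential layout: a first line with the number of sequences and sites, then each name padded to exactly 10 characters and followed by its sequence. The helper `to_phylip` is hypothetical; the exact input rules for each program are given in the PHYLIP documentation.

```python
def to_phylip(alignment):
    """Format aligned sequences as a PHYLIP sequential input file: a first line
    with the number of sequences and sites, then each name padded (or cut)
    to 10 characters followed by its sequence."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("all sequences must have the same length")
    lines = [f"{len(names)} {n_sites}"]
    for name in names:
        lines.append(f"{name[:10]:<10}{alignment[name]}")
    return "\n".join(lines) + "\n"


text = to_phylip({"Seq1": "GGGGGG", "Seq2": "GGGAGT", "Seq3": "GGATAG", "Seq4": "GATCAT"})
print(text, end="")
```

Saving the returned text as a flat text file named infile in the working directory lets a program such as dnadist find it without prompting for a file name.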
  • 58. • Output is written onto special files with names like "outfile" and "outtree". • Trees written onto "outtree" are in the Newick format, an informal standard agreed to in 1986 by authors of a number of major phylogeny packages. • At this stage we do not have a mouse-windows interface for PHYLIP. • PHYLIP is probably the most widely-distributed phylogeny package.
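The Newick format writes a tree as nested parentheses, with each branch length after a colon and a semicolon at the end. A minimal writer sketch, assuming a home-made nested-list tree representation (not a PHYLIP data structure) and ignoring taxon names that would need quoting:

```python
def to_newick(node):
    """Serialise a tree as a Newick string. A leaf is a name (str); an internal
    node is a list of (child, branch_length) pairs."""
    def walk(n):
        if isinstance(n, str):
            return n
        return "(" + ",".join(f"{walk(child)}:{length:g}" for child, length in n) + ")"
    return walk(node) + ";"


# Hypothetical rooted tree ((A,B),C) with branch lengths
tree = [([("A", 1), ("B", 2)], 0.5), ("C", 3)]
print(to_newick(tree))  # ((A:1,B:2):0.5,C:3);
```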
  • 59. • It is the sixth most frequently cited phylogeny package, after MrBayes, PAUP*, RAxML, Phyml, and MEGA. • PHYLIP is also the oldest widely-distributed package. • It has been in distribution since October, 1980, and has over 30,000 registered users. • It is still being updated.
  • 64. CLUSTAL W • www.ebi.ac.uk/clustalw/ • Clustal is a progressive multiple sequence alignment (MSA) program, available either as a stand-alone or as an online program. • Clustal is a widely used multiple sequence alignment computer program.
  • 65. The version described here is 2.0. There are two main variants: • ClustalW: command-line interface • ClustalX: graphical user interface • It is available for Windows, Mac OS, and Unix/Linux. • The program is available from the Clustal homepage or the European Bioinformatics Institute FTP server.
  • 66. There are three main steps: • Perform pairwise alignments of all the sequences • Create a phylogenetic guide tree (or use a user-defined tree) • Use the guide tree to carry out a multiple alignment
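Step 1, and the distances it supplies for the guide tree, can be sketched as follows. This is a simplified assumption, not ClustalW's actual scoring. It uses Needleman-Wunsch global alignment with a hypothetical match/mismatch/linear-gap scheme and defines distance as 1 minus the fraction of identical residues among gap-free aligned positions. ClustalW then builds the guide tree from such a distance matrix (by neighbour joining) and aligns the sequences progressively along it; those steps are not shown.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.
    Returns the two aligned strings ('-' marks a gap)."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j), S[i - 1][j] + gap, S[i][j - 1] + gap)
    # Traceback from the bottom-right cell
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def pairwise_distance(a, b):
    """1 minus the fraction of identical residues among gap-free aligned positions."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


def distance_matrix(seqs):
    names = list(seqs)
    return [[0.0 if i == j else pairwise_distance(seqs[names[i]], seqs[names[j]])
             for j in range(len(names))] for i in range(len(names))]


seqs = {"s1": "GATTACA", "s2": "GATTGCA", "s3": "GTTACA"}
for row in distance_matrix(seqs):
    print(["%.2f" % v for v in row])
```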