Comparative genomics
in eukaryotes
Gene family analysis



  Klaas Vandepoele, PhD


Professor Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium


                 http://www.bits.vib.be
Workflow




2
Applications of clustering the
        proteome(s)
       Gene families form the basis for the evolutionary
        (or phylogenetic) analysis of
          Detection of orthologs and paralogs
          Gene duplication, family expansions,
           pseudogene formation and gene loss
          Species taxonomies
          Horizontal Gene Transfer (HGT)
          Evolution of gene structure
             • Introns
             • Protein domain organisation &
               (re)arrangements
          Base composition and codon usage

3
I. Structural annotation: genome-
        wide versus family-wise
       Rationale for family-wise annotation
           Every gene has its own (sequence) characteristics, and
            different genes evolve at different rates. Using these
            family-specific characteristics to determine
            homologous gene models improves the overall quality
            of the structural annotation
       Properties:
           Slow & nearly-manual procedure
           High-quality gene models revealing novel
            biological findings

4
Workflow family-wise annotation
            procedure

  1. Collecting experimental representatives
  2. MSA of the experimental representatives
  3. HMMbuild: family HMM profile
  4. BLAST / HMMsearch against the Species X proteome
     (ab initio gene prediction)
  5. Putative homologs
  6. Correction of gene models (EST/cDNA, protein motifs)
  7. Classification using phylogenetic trees
  8. Detailed characterization

  http://hmmer.janelia.org/
5
Experimental representatives


InterProScan




PFAM HMM logo
     Clustalw + JalView




6
BLAST / HMMsearch


    1. Use multiple sequence
       alignment to create HMM profile
    2. Use HMM profile to search for
       similar proteins
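
The two steps above can be sketched in Python as follows. This is a minimal sketch, not the procedure used in the course: it only assembles the `hmmbuild` and `hmmsearch` command lines and parses the `--tblout` table that `hmmsearch` writes. The file names, the E-value cutoff of 1e-5 and the helper function names are assumptions made for illustration.

```python
from typing import List, Tuple


def hmmbuild_cmd(msa_path: str, hmm_path: str) -> List[str]:
    """Step 1: command line that builds a profile HMM from an alignment."""
    # hmmbuild takes the output HMM file first, then the alignment file.
    return ["hmmbuild", hmm_path, msa_path]


def hmmsearch_cmd(hmm_path: str, proteome_fasta: str, tbl_path: str,
                  evalue: float = 1e-5) -> List[str]:
    """Step 2: command line that searches a proteome with the profile."""
    # --tblout writes one line per target sequence; -E sets the reporting cutoff.
    return ["hmmsearch", "--tblout", tbl_path, "-E", str(evalue),
            hmm_path, proteome_fasta]


def parse_tblout(text: str, max_evalue: float = 1e-5) -> List[Tuple[str, float, float]]:
    """Return (target, full-sequence E-value, bit score), best hit first."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    return sorted(hits, key=lambda hit: hit[1])
```

The command lists can be passed to `subprocess.run(cmd, check=True)` on a machine where HMMER is installed; the parser then turns the resulting table into a list of putative homologs.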




7
Representatives + putative homologs

                                                                        BioEdit Sequence Editor




Suffix finalcds indicates a corrected gene model compared to the original gene model
generated by the ab initio gene prediction


             Multiple sequence alignments assist in the detection and
              correction of errors in the structural annotation (missed exon)
8
Representatives + putative homologs




Suffix finalcds indicates a corrected gene model compared to the original gene model
generated by the ab initio gene prediction


             Multiple sequence alignments assist in the detection of errors
              in the structural annotation (false first exon)
9
Examples of family-specific protein
         motifs




        B-type cyclins have HxKF signature
        Cyclin destruction boxes (B1-type cyclin R-[AV]LGDIGN)

10
Examples of family-specific protein
         motifs

     [Figure: Arabidopsis and Rice]




                      D-type cyclins contain LxCxE Rb-binding motif
                      Low conservation of phylogenetic signal at primary sequence level
                      General rules are rarely general: exceptions (i.e. missing protein
                       motifs) are frequent and might indicate functional divergence
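
A short sketch of how such family-specific motifs could be located in protein sequences. It reads the LxCxE and HxKF signatures as "x = any residue" and expresses them as regular expressions; that reading, the function name and the toy sequences are assumptions, not material from the slides. An empty result for a family member is the kind of exception that might point to functional divergence.

```python
import re
from typing import Dict, List, Tuple

# Motif name -> regular expression ("x" in the motif is read as any residue).
MOTIFS = {
    "LxCxE": r"L.C.E",  # Rb-binding motif of D-type cyclins
    "HxKF": r"H.KF",    # signature of B-type cyclins
}


def find_motifs(sequence: str) -> Dict[str, List[Tuple[int, str]]]:
    """Return, per motif, the 1-based start position and matched residues."""
    found = {}
    for name, pattern in MOTIFS.items():
        # The lookahead makes overlapping occurrences visible as well.
        regex = re.compile(f"(?=({pattern}))")
        found[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]
    return found
```

For the made-up sequence `MSSLLCDEAA`, the function reports one LxCxE occurrence starting at position 4 and no HxKF occurrence.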
11
Classification using phylogenetic
                tree construction
        A- and B-type cyclins
          are mitotic cyclins


                                                                           D-type cyclins are
                                                                               G1-specific



     H-type cyclins regulate activity
       of CDK-activating kinases




         • The complexity of the cyclin gene family appears to be higher in plants than in
         mammals
         • Whether there is functional redundancy within A- and B-type cyclins or different
         regulation (and expression) of some cyclin subclasses remains to be analyzed
12
Unraveling functional divergence using
     large-scale expression compendia

     [Figure: expression compendium; axes: Genes, Plant tissues]

13
Unraveling functional divergence using
             large-scale expression compendia

     [Figure: expression profiles of A-type, B-type and D-type cyclins;
      axes: Genes, Plant tissues; source: Genevestigator]

14
II. Orthology & paralogy

        A major goal of sequence analysis is evolutionary
         reconstruction. It is critical to distinguish between two
         principal types of homologous relationships, which differ
         in their evolutionary history and functional implications.

        Orthologs, defined as homologous genes evolved
         through speciation (~evolutionary counterparts derived
         from a single ancestral gene in the last common ancestor
         of the given two species)

        Paralogs, which are homologous genes evolved through
         duplication within the same (perhaps ancestral) genome.

        These definitions were first introduced by Fitch (1970)

15
Orthology & paralogy inference

     [Figure: organism phylogeny (species tree) of species A, B and C,
      with speciation events, next to two gene phylogenies a) and b)
      that each contain a gene duplication (genes a1, a2, b1, b2, c1, c2);
      labels: Inparalogs, Outparalogs]

16
In- and outparalogy




17   Sonnhammer & Koonin: Orthology, paralogy and proposed classification for paralog subtypes
Tree reconciliation

        The automatic detection of speciation and duplication
         events using a species tree and gene family tree
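
One common way to do this is LCA reconciliation: every gene-tree node is mapped to the smallest species-tree clade that contains all of its species, and a node is called a duplication when it maps to the same clade as one of its children. The sketch below illustrates that idea only. Trees are written as nested tuples, and the rule that gene `a1` belongs to species `A` is a naming convention assumed for the example.

```python
from typing import Callable, FrozenSet, List, Tuple


def species_clades(tree) -> List[FrozenSet[str]]:
    """Leaf set of every node in the species tree."""
    clades = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    walk(tree)
    return clades


def lca(clades, species):
    """Smallest species-tree clade containing all given species."""
    return min((c for c in clades if species <= c), key=len)


def reconcile(gene_tree, species_tree,
              species_of: Callable[[str], str]) -> List[Tuple[List[str], str]]:
    """Label each internal gene-tree node as 'speciation' or 'duplication'."""
    clades = species_clades(species_tree)
    events = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node]), lca(clades, frozenset([species_of(node)]))
        results = [walk(child) for child in node]
        genes = frozenset().union(*(g for g, _ in results))
        mapped = lca(clades, frozenset().union(*(m for _, m in results)))
        # A node that maps to the same clade as one of its children is a duplication.
        is_dup = any(m == mapped for _, m in results)
        events.append((sorted(genes), "duplication" if is_dup else "speciation"))
        return genes, mapped

    walk(gene_tree)
    return events
```

With the species tree `(("A", "B"), "C")` and the gene tree `((("a1", "b1"), "c1"), (("a2", "b2"), "c2"))`, only the root is labelled a duplication, so the 1 and 2 copies are paralogs that predate the speciation events.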




18
III. Types of proteome analysis




19
The evolution of multi-domain
     proteins




20
Interpreting the output of an all-
       against-all similarity search




     Metrics for sequence similarity:
     • E-value, Bit score or percent identity
     • alignment coverage
21
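
As an illustration of these metrics, the sketch below reads BLAST tabular output (the default 12 columns of `-outfmt 6`) and adds the query alignment coverage, which is not part of that default table. The query lengths are supplied separately; the thresholds and function names are assumptions chosen for the example, not recommended values.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float          # percent identity
    evalue: float
    bitscore: float
    query_coverage: float  # aligned fraction of the query, 0..1


def parse_hits(tabular: str, query_lengths: Dict[str, int]) -> List[Hit]:
    """Parse default -outfmt 6 lines and compute query coverage."""
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        qstart, qend = int(f[6]), int(f[7])
        coverage = (qend - qstart + 1) / query_lengths[f[0]]
        hits.append(Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]), coverage))
    return hits


def filter_hits(hits: List[Hit], max_evalue: float = 1e-5,
                min_coverage: float = 0.5) -> List[Hit]:
    """Keep hits that are both significant and cover enough of the query."""
    return [h for h in hits
            if h.evalue <= max_evalue and h.query_coverage >= min_coverage]
```

Filtering on coverage in addition to the E-value helps to avoid linking proteins that only share a single domain.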
Clustering of similar sequences




             Proteins = vertices ~ nodes
        Sequence similarity relationship = edges
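
The simplest clustering on such a graph takes its connected components (single linkage): two proteins end up in the same cluster when a chain of similarity edges connects them. The sketch below implements that with a union-find structure. It is an illustrative baseline only and not the method of any specific tool; the protein names are placeholders.

```python
from typing import Dict, Iterable, List, Tuple


def cluster(proteins: Iterable[str],
            edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group proteins (nodes) connected by similarity edges."""
    parent: Dict[str, str] = {p: p for p in proteins}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in groups.values())
```

Single linkage tends to merge unrelated families through multi-domain proteins, which is one reason why more advanced clustering methods are used in practice.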
22
Clustering of similar sequences




23
Advanced methods for protein
         (orthology) clustering
        Sequence similarity-based
            COG (RBH)         [Tatusov 1997]
            InParanoid        [Remm et al., 2001]
            Tribe-MCL         [Van Dongen 2000]
            OrthoMCL          [Li et al., 2003]

        Phylogenetic tree-based
            PhylomeDB         [Huerta-Cepas et al., 2007]
            Ensembl Compara   [Vilella et al., 2008]
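
The reciprocal best hit (RBH) criterion behind the first group of methods can be sketched as follows: gene a from species A and gene b from species B form a pair when b is the best hit of a and a is the best hit of b. The sketch ranks hits by bit score; the input layout and the function names are assumptions for illustration and do not reproduce any of the listed tools.

```python
from typing import Dict, Iterable, List, Tuple

ScoredHit = Tuple[str, str, float]  # (query, subject, bit score)


def best_hits(hits: Iterable[ScoredHit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[ScoredHit],
                         b_vs_a: Iterable[ScoredHit]) -> List[Tuple[str, str]]:
    """Pairs (a, b) that are each other's best hit in both search directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

RBH yields at most one partner per gene, so it misses co-orthologs that arise from lineage-specific duplications (inparalogs).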


24
Overview methodologies

     [Figure: overview of orthology inference methods: BBH, Inparanoid,
      COG, species overlap, reconciliation. Source: Gabaldon, 2008]

25
IV. Resources




26
Resources (bis)

        Ensembl (Vertebrates)
        Ensembl Genomes (Metazoa, Protists,
         Fungi, Plants & Bacteria)

        OrthoMCLDB 5 (150 genomes)
        YGOB (>15 Fungi)




27
Hands-on

        Goal: identify and characterize gene family
         members encoding talin 2 (TLN2)

         1.   Select Query gene
         2.   Retrieve homo/orthologs
         3.   Create multiple sequence alignment
         4.   Identify conserved positions
         5.   Create phylogenetic tree and identify
              ortho/paralogous genes
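
Step 4 can be approximated with a simple column scan over the alignment. The sketch below reports the columns in which one residue reaches a chosen fraction of all sequences, counting gaps against conservation. The threshold, the function name and the toy alignment are assumptions for illustration; alignment viewers offer richer conservation scores.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def conserved_columns(alignment: Sequence[str], min_fraction: float = 1.0,
                      gap: str = "-") -> List[Tuple[int, str]]:
    """Return (1-based column, residue) for sufficiently conserved columns."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for i in range(length):
        # Count residues only; gaps lower the fraction but are never reported.
        counts = Counter(row[i] for row in alignment if row[i] != gap)
        if not counts:
            continue  # column consists of gaps only
        residue, count = counts.most_common(1)[0]
        if count / len(alignment) >= min_fraction:
            conserved.append((i + 1, residue))
    return conserved
```

For the toy alignment `["MKLAC", "MKIAC", "MR-AC"]`, columns 1, 4 and 5 are fully conserved, and lowering `min_fraction` to 0.6 adds column 2.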



28

Mais conteúdo relacionado

Mais procurados

Comparative Genomics and Visualisation - Part 2
Comparative Genomics and Visualisation - Part 2Comparative Genomics and Visualisation - Part 2
Comparative Genomics and Visualisation - Part 2Leighton Pritchard
 
Gene mapping & its role in evolution
Gene mapping & its role in evolutionGene mapping & its role in evolution
Gene mapping & its role in evolutionmehwishmanzoor4
 
Gene mapping and gene cloning
Gene mapping and gene cloningGene mapping and gene cloning
Gene mapping and gene cloningChallaLasya
 
Cisgenesis and Intragenesis
Cisgenesis and IntragenesisCisgenesis and Intragenesis
Cisgenesis and Intragenesissharadabgowda
 
Comparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organellesComparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organellesKAUSHAL SAHU
 
3.1 genes (2)
3.1 genes (2)3.1 genes (2)
3.1 genes (2)lucascw
 
chloroplast genome ppt.
chloroplast genome ppt.chloroplast genome ppt.
chloroplast genome ppt.dbskkv
 
Molecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingMolecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingFOODCROPS
 
genetic linkage and gene mapping
genetic linkage and gene mappinggenetic linkage and gene mapping
genetic linkage and gene mappingMahammed Faizan
 
Linkage mapping and QTL analysis_Lecture
Linkage mapping and QTL analysis_LectureLinkage mapping and QTL analysis_Lecture
Linkage mapping and QTL analysis_LectureSameer Khanal
 
Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...Promila Sheoran
 
Gene mapping and cloning of disease gene
Gene mapping and cloning of disease geneGene mapping and cloning of disease gene
Gene mapping and cloning of disease geneDineshk117
 
Mapping the genome of bacteria
Mapping the genome of bacteriaMapping the genome of bacteria
Mapping the genome of bacteriaMeisam Ruzbahani
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomicsNikhil Aggarwal
 

Mais procurados (20)

Mapping population ppt
Mapping population pptMapping population ppt
Mapping population ppt
 
Comparative Genomics and Visualisation - Part 2
Comparative Genomics and Visualisation - Part 2Comparative Genomics and Visualisation - Part 2
Comparative Genomics and Visualisation - Part 2
 
Gene mapping & its role in evolution
Gene mapping & its role in evolutionGene mapping & its role in evolution
Gene mapping & its role in evolution
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Gene mapping and gene cloning
Gene mapping and gene cloningGene mapping and gene cloning
Gene mapping and gene cloning
 
Pradeep.ii
Pradeep.iiPradeep.ii
Pradeep.ii
 
Cisgenesis and Intragenesis
Cisgenesis and IntragenesisCisgenesis and Intragenesis
Cisgenesis and Intragenesis
 
Comparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organellesComparative genomics in eukaryotes, organelles
Comparative genomics in eukaryotes, organelles
 
Mapping
MappingMapping
Mapping
 
3.1 genes (2)
3.1 genes (2)3.1 genes (2)
3.1 genes (2)
 
Gene mapping
Gene mappingGene mapping
Gene mapping
 
chloroplast genome ppt.
chloroplast genome ppt.chloroplast genome ppt.
chloroplast genome ppt.
 
Genome Mapping
Genome MappingGenome Mapping
Genome Mapping
 
Molecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breedingMolecular marker and its application to genome mapping and molecular breeding
Molecular marker and its application to genome mapping and molecular breeding
 
genetic linkage and gene mapping
genetic linkage and gene mappinggenetic linkage and gene mapping
genetic linkage and gene mapping
 
Linkage mapping and QTL analysis_Lecture
Linkage mapping and QTL analysis_LectureLinkage mapping and QTL analysis_Lecture
Linkage mapping and QTL analysis_Lecture
 
Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...Tetrad analysis, positive and negative interference, mapping through somatic ...
Tetrad analysis, positive and negative interference, mapping through somatic ...
 
Gene mapping and cloning of disease gene
Gene mapping and cloning of disease geneGene mapping and cloning of disease gene
Gene mapping and cloning of disease gene
 
Mapping the genome of bacteria
Mapping the genome of bacteriaMapping the genome of bacteria
Mapping the genome of bacteria
 
Comparative genomics and proteomics
Comparative genomics and proteomicsComparative genomics and proteomics
Comparative genomics and proteomics
 

Destaque

BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS
 
Productivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsProductivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsBITS
 
BITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry dataBITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry dataBITS
 
The structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsThe structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsBITS
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingmikaelhuss
 
RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5BITS
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2BITS
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysismikaelhuss
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1BITS
 
Exchange your knowledge on plant gene families
Exchange your knowledge on plant gene familiesExchange your knowledge on plant gene families
Exchange your knowledge on plant gene familiesBioversity International
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionRai University
 
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...Daksh Raj Chopra
 
BITS - Search engines for mass spec data
BITS - Search engines for mass spec dataBITS - Search engines for mass spec data
BITS - Search engines for mass spec dataBITS
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsmikaelhuss
 

Destaque (20)

BITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics dataBITS - Genevestigator to easily access transcriptomics data
BITS - Genevestigator to easily access transcriptomics data
 
BITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra toolBITS - Comparative genomics: the Contra tool
BITS - Comparative genomics: the Contra tool
 
Productivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformaticsProductivity tips - Introduction to linux for bioinformatics
Productivity tips - Introduction to linux for bioinformatics
 
BITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry dataBITS - Protein inference from mass spectrometry data
BITS - Protein inference from mass spectrometry data
 
The structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformaticsThe structure of Linux - Introduction to Linux for bioinformatics
The structure of Linux - Introduction to Linux for bioinformatics
 
RNA-seq quality control and pre-processing
RNA-seq quality control and pre-processingRNA-seq quality control and pre-processing
RNA-seq quality control and pre-processing
 
RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5RNA-seq for DE analysis: detecting differential expression - part 5
RNA-seq for DE analysis: detecting differential expression - part 5
 
RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2RNA-seq: analysis of raw data and preprocessing - part 2
RNA-seq: analysis of raw data and preprocessing - part 2
 
RNA-seq differential expression analysis
RNA-seq differential expression analysisRNA-seq differential expression analysis
RNA-seq differential expression analysis
 
RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1RNA-seq: general concept, goal and experimental design - part 1
RNA-seq: general concept, goal and experimental design - part 1
 
Exchange your knowledge on plant gene families
Exchange your knowledge on plant gene familiesExchange your knowledge on plant gene families
Exchange your knowledge on plant gene families
 
Analyzing and integrating probabilistic and deterministic computational model...
Analyzing and integrating probabilistic and deterministic computational model...Analyzing and integrating probabilistic and deterministic computational model...
Analyzing and integrating probabilistic and deterministic computational model...
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene prediction
 
IntelliGO semantic similarity measure for Gene Ontology annotations
IntelliGO semantic similarity measure for Gene Ontology annotationsIntelliGO semantic similarity measure for Gene Ontology annotations
IntelliGO semantic similarity measure for Gene Ontology annotations
 
Central dogma of dna
Central dogma of dnaCentral dogma of dna
Central dogma of dna
 
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSIN...
 
SCoT and RAPD
SCoT and RAPDSCoT and RAPD
SCoT and RAPD
 
Bioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-simBioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-sim
 
BITS - Search engines for mass spec data
BITS - Search engines for mass spec dataBITS - Search engines for mass spec data
BITS - Search engines for mass spec data
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 

Semelhante a Comparative genomics gene family analysis

Detection of genomic homology in eukaryotic genomes
Detection of genomic homology in eukaryotic genomesDetection of genomic homology in eukaryotic genomes
Detection of genomic homology in eukaryotic genomesKlaas Vandepoele
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomicsprateek kumar
 
Life science grade 12
Life science grade 12Life science grade 12
Life science grade 12seleka moema
 
HHMI Research poster -6-9-2014 Bipolar
HHMI Research poster -6-9-2014 BipolarHHMI Research poster -6-9-2014 Bipolar
HHMI Research poster -6-9-2014 BipolarHana (Hoang) Willner
 
Expression systems
Expression systemsExpression systems
Expression systemsAmjad Afridi
 
Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...CSCJournals
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS
 
13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.ppt13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.pptsoniiKolhi
 
13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.ppt13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.pptPedramKashiani
 
13 miller-chap-5a-lecture
13 miller-chap-5a-lecture13 miller-chap-5a-lecture
13 miller-chap-5a-lectureAmit Gupta
 
miller-chap-5a
 miller-chap-5a miller-chap-5a
miller-chap-5aAmit Gupta
 
4_BCOR12_4develop_2008.ppt
4_BCOR12_4develop_2008.ppt4_BCOR12_4develop_2008.ppt
4_BCOR12_4develop_2008.pptGuillermo Lopez
 
Dissecting plant genomes with the PLAZA 2.5 comparative genomics platform
Dissecting plant genomes with the PLAZA 2.5 comparative genomics platformDissecting plant genomes with the PLAZA 2.5 comparative genomics platform
Dissecting plant genomes with the PLAZA 2.5 comparative genomics platformKlaas Vandepoele
 
Molecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contructionMolecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contructionUdayBhanushali111
 

Semelhante a Comparative genomics gene family analysis (20)

Detection of genomic homology in eukaryotic genomes
Detection of genomic homology in eukaryotic genomesDetection of genomic homology in eukaryotic genomes
Detection of genomic homology in eukaryotic genomes
 
BITS - Introduction to comparative genomics
BITS - Introduction to comparative genomicsBITS - Introduction to comparative genomics
BITS - Introduction to comparative genomics
 
Comparative genomics
Comparative genomicsComparative genomics
Comparative genomics
 
Life science grade 12
Life science grade 12Life science grade 12
Life science grade 12
 
HHMI Research poster -6-9-2014 Bipolar
HHMI Research poster -6-9-2014 BipolarHHMI Research poster -6-9-2014 Bipolar
HHMI Research poster -6-9-2014 Bipolar
 
Expression systems
Expression systemsExpression systems
Expression systems
 
Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...Biological Significance of Gene Expression Data Using Similarity Based Biclus...
Biological Significance of Gene Expression Data Using Similarity Based Biclus...
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequences
 
13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.ppt13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.ppt
 
13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.ppt13-miller-chap-5a-lecture.ppt
13-miller-chap-5a-lecture.ppt
 
13 miller-chap-5a-lecture
13 miller-chap-5a-lecture13 miller-chap-5a-lecture
13 miller-chap-5a-lecture
 
miller-chap-5a
 miller-chap-5a miller-chap-5a
miller-chap-5a
 
Microbiology Assignment Help
Microbiology Assignment HelpMicrobiology Assignment Help
Microbiology Assignment Help
 
Asnmnt 4
Asnmnt 4Asnmnt 4
Asnmnt 4
 
4_BCOR12_4develop_2008.ppt
4_BCOR12_4develop_2008.ppt4_BCOR12_4develop_2008.ppt
4_BCOR12_4develop_2008.ppt
 
Dissecting plant genomes with the PLAZA 2.5 comparative genomics platform
Dissecting plant genomes with the PLAZA 2.5 comparative genomics platformDissecting plant genomes with the PLAZA 2.5 comparative genomics platform
Dissecting plant genomes with the PLAZA 2.5 comparative genomics platform
 
THE human genome
THE human genomeTHE human genome
THE human genome
 
Molecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contructionMolecular basis of evolution and softwares used in phylogenetic tree contruction
Molecular basis of evolution and softwares used in phylogenetic tree contruction
 
genomic comparison
genomic comparison genomic comparison
genomic comparison
 
2014 intro-genetics
2014 intro-genetics2014 intro-genetics
2014 intro-genetics
 

Mais de BITS

RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4RNA-seq for DE analysis: extracting counts and QC - part 4
RNA-seq for DE analysis: extracting counts and QC - part 4BITS
 
RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6RNA-seq for DE analysis: the biology behind observed changes - part 6
RNA-seq for DE analysis: the biology behind observed changes - part 6BITS
 
RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3RNA-seq: Mapping and quality control - part 3
RNA-seq: Mapping and quality control - part 3BITS
 
Text mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformaticsText mining on the command line - Introduction to linux for bioinformatics
Text mining on the command line - Introduction to linux for bioinformaticsBITS
 
Managing your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformaticsManaging your data - Introduction to Linux for bioinformatics
Managing your data - Introduction to Linux for bioinformaticsBITS
 
Introduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsIntroduction to Linux for bioinformatics
Introduction to Linux for bioinformaticsBITS
 
BITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysisBITS - Overview of sequence databases for mass spectrometry data analysis
BITS - Overview of sequence databases for mass spectrometry data analysisBITS
 
BITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS - Introduction to proteomics
BITS - Introduction to proteomicsBITS
 
BITS - Introduction to Mass Spec data generation
BITS - Introduction to Mass Spec data generationBITS - Introduction to Mass Spec data generation
BITS - Introduction to Mass Spec data generationBITS
 
BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2BITS training - UCSC Genome Browser - Part 2
BITS training - UCSC Genome Browser - Part 2BITS
 
Marcs (bio)perl course
Marcs (bio)perl courseMarcs (bio)perl course
Marcs (bio)perl courseBITS
 
Basics statistics
Basics statistics Basics statistics
Basics statistics BITS
 
Cytoscape: Integrating biological networks
Cytoscape: Integrating biological networksCytoscape: Integrating biological networks
Cytoscape: Integrating biological networksBITS
 
Cytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networksCytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networksBITS
 
Genevestigator
GenevestigatorGenevestigator
GenevestigatorBITS
 
BITS: UCSC genome browser - Part 1
BITS: UCSC genome browser - Part 1BITS: UCSC genome browser - Part 1
BITS: UCSC genome browser - Part 1BITS
 
Vnti11 basics course
Vnti11 basics courseVnti11 basics course
Vnti11 basics courseBITS
 
Bits protein structure
Bits protein structureBits protein structure
Bits protein structureBITS
 
BITS: Introduction to Linux - Software installation the graphical and the co...
BITS: Introduction to Linux -  Software installation the graphical and the co...BITS: Introduction to Linux -  Software installation the graphical and the co...
BITS: Introduction to Linux - Software installation the graphical and the co...BITS
 

Comparative genomics gene family analysis

  • 1. Comparative genomics in eukaryotes: Gene family analysis. Klaas Vandepoele, PhD, Professor, Ghent University; Comparative & Integrative Genomics, VIB – Ghent University, Belgium. http://www.bits.vib.be
  • 3. Applications of clustering the proteome(s)
      • Gene families form the basis for the evolutionary (or phylogenetic) analysis of:
          • Detection of orthologs and paralogs
          • Gene duplication, family expansions, pseudogene formation and gene loss
          • Species taxonomies
          • Horizontal Gene Transfer (HGT)
          • Evolution of gene structure: introns; protein domain organisation & (re)arrangements
          • Base composition and codon usage
  • 4. I. Structural annotation: genome-wide versus family-wise
      • Rationale for family-wise annotation:
          • Since every gene has different (sequence) characteristics and different genes evolve at different rates, using these characteristics to determine homologous gene models will improve the overall structural annotation quality
      • Properties:
          • Slow & nearly-manual procedure
          • High-quality gene models revealing novel biological findings
  • 5. Workflow family-wise annotation procedure [Diagram; box labels: Collecting experimental representatives; MSA experimental representatives; HMMbuild; Family HMM profile; EST/cDNA; BLAST; Species X proteome; Protein motifs; Ab initio gene prediction; Correction gene model; Putative homologs; HMMsearch; Classification using phylogenetic trees; Detailed characterization] http://hmmer.janelia.org/
  • 7. BLAST / HMMsearch
      1. Use multiple sequence alignment to create HMM profile
      2. Use HMM profile to search for similar proteins
  • 8. Representatives + putative homologs (BioEdit Sequence Editor)
      • Suffix finalcds indicates a corrected gene model compared to the original gene model generated by the ab initio gene prediction
      • Multiple sequence alignments assist in the detection and correction of errors in the structural annotation (missed exon)
  • 9. Representatives + putative homologs
      • Suffix finalcds indicates a corrected gene model compared to the original gene model generated by the ab initio gene prediction
      • Multiple sequence alignments assist in the detection of errors in the structural annotation (false first exon)
  • 10. Examples of family-specific protein motifs
      • B-type cyclins have HxKF signature
      • Cyclin destruction boxes (B1-type cyclin R-[AV]LGDIGN)
  • 11. Examples of family-specific protein motifs [Figure panels: Arabidopsis, Rice]
      • D-type cyclins contain LxCxE Rb-binding motif
      • Low conservation of phylogenetic signal at primary sequence level
      • General rules are rarely general: exceptions (i.e. missing protein motifs) are frequent and might indicate functional divergence
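
The motif signatures on slides 10 and 11 can be checked against candidate sequences with plain regular expressions. The following is a minimal sketch, assuming that "x" in the slide notation means "any residue" and that the hyphen in R-[AV]LGDIGN is only a position separator; the function name `scan_motifs` and the toy sequences are hypothetical and are not real cyclins.

```python
import re

# Patterns transcribed from the slides; "x" (any residue) becomes "." in regex.
# Assumption: the hyphen in "R-[AV]LGDIGN" is a separator, not part of the motif.
MOTIFS = {
    "B-type HxKF signature": re.compile(r"H.KF"),
    "B1-type destruction box": re.compile(r"R[AV]LGDIGN"),
    "D-type LxCxE Rb-binding motif": re.compile(r"L.C.E"),
}


def scan_motifs(sequence):
    """Return {motif name: [0-based start positions]} for motifs found in a protein sequence."""
    hits = {}
    for name, pattern in MOTIFS.items():
        positions = [m.start() for m in pattern.finditer(sequence.upper())]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical toy sequences for illustration only.
toy_b_type = "MSSRALGDIGNLVTAAHSKFQELL"
toy_d_type = "MELLCCEGTRHAPRA"
toy_divergent = "MKTAYIAKQRQISFVKSHFSRQ"

for label, seq in [("B-like", toy_b_type), ("D-like", toy_d_type), ("divergent", toy_divergent)]:
    print(label, scan_motifs(seq))
```

A sequence that returns no hits is not necessarily outside the family: as the slide notes, missing motifs are frequent and may point to functional divergence, so a motif scan is best read alongside the alignment and the tree.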
  • 12. Classification using phylogenetic tree construction
      • A- and B-type cyclins are mitotic cyclins
      • D-type cyclins are G1-specific
      • H-type cyclins regulate activity of CDK-activating kinases
      • The complexity of the cyclin gene family appears to be higher in plants than in mammals
      • Whether there is functional redundancy within A- and B-type cyclins or different regulation (and expression) of some cyclin subclasses remains to be analyzed
  • 13. Unraveling functional divergence using large-scale expression compendia [Figure axes: Genes, Plant tissues]
  • 14. Unraveling functional divergence using large-scale expression compendia [Figure axes: Genes, Plant tissues; gene groups: A-type cyclin, B-type cyclin, D-type cyclin; source: Genevestigator]
  • 15. II. Orthology & paralogy
      • A major goal of sequence analysis is evolutionary reconstruction. It is critical to distinguish between two principal types of homologous relationships, which differ in their evolutionary history and functional implications.
      • Orthologs are homologous genes that evolved through speciation (evolutionary counterparts derived from a single ancestral gene in the last common ancestor of the two given species).
      • Paralogs are homologous genes that evolved through duplication within the same (perhaps ancestral) genome.
      • These definitions were first introduced by Fitch (1970).
  • 16. Orthology & paralogy inference [Figure: organism phylogeny (species tree) for species A, B and C next to gene phylogenies a) and b) for genes a1, a2, b1, b2, c1 and c2, with gene duplication and speciation events marked and the resulting outparalogs and inparalogs labelled]
  • 17. In- and outparalogy (Sonnhammer & Koonin: Orthology, paralogy and proposed classification for paralog subtypes)
  • 18. Tree reconciliation
      • The automatic detection of speciation and duplication events using a species tree and gene family tree
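
One common way to do this is lowest-common-ancestor (LCA) reconciliation: every gene-tree node is mapped to the smallest species-tree clade that contains all of its species, and a node is called a duplication when it maps to the same clade as one of its children. The sketch below is illustrative only and is not taken from the slides; trees are nested tuples, and the gene names, the species tree and the `species_of` rule (first letter of the gene name) are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def lca_clade(species_tree, species_set):
    """Return the leaf set of the smallest species-tree clade containing species_set."""
    if not isinstance(species_tree, str):
        for child in species_tree:
            if species_set <= leaves(child):
                return lca_clade(child, species_set)
    return frozenset(leaves(species_tree))


def reconcile(gene_tree, species_tree, species_of, events=None):
    """Label internal gene-tree nodes; return (mapped clade, list of events)."""
    if events is None:
        events = []
    if isinstance(gene_tree, str):
        return frozenset({species_of(gene_tree)}), events
    child_maps = [reconcile(child, species_tree, species_of, events)[0]
                  for child in gene_tree]
    mapped = lca_clade(species_tree, set().union(*child_maps))
    # A node mapping to the same clade as a child implies a duplication.
    kind = "duplication" if mapped in child_maps else "speciation"
    events.append((sorted(leaves(gene_tree)), kind))
    return mapped, events


# Hypothetical example: species A and B are sisters, C is the outgroup.
species_tree = (("A", "B"), "C")
gene_tree = ((("a1", "b1"), ("a2", "b2")), "c1")
_, events = reconcile(gene_tree, species_tree, lambda gene: gene[0].upper())
for genes, kind in events:
    print(kind, genes)
```

In this toy tree the node joining the (a1, b1) and (a2, b2) subtrees is labelled a duplication, so a1 and b1 are orthologs while a1 and b2 are paralogs. Production tools additionally handle gene loss, unresolved nodes and tree uncertainty, which this sketch ignores.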
  • 19. III. Types of proteome analysis
  • 20. The evolution of multi-domain proteins
  • 21. Interpreting the output of an all-against-all similarity search
      • Metrics for sequence similarity:
          • E-value, bit score or percent identity
          • Alignment coverage
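
These metrics can be read directly from tabular search output. The sketch below assumes the default 12-column BLAST tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and takes query lengths from a separate dictionary in order to compute coverage. The thresholds, function names and toy hits are hypothetical choices for illustration, not values recommended by the slides.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query subject pident aln_len evalue bitscore query_cov")


def parse_hits(lines, query_lengths):
    """Parse 12-column tabular similarity-search lines into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        qstart, qend = int(f[6]), int(f[7])
        # Fraction of the query covered by the aligned region.
        coverage = (abs(qend - qstart) + 1) / query_lengths[f[0]]
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[10]), float(f[11]), coverage))
    return hits


def filter_hits(hits, max_evalue=1e-5, min_coverage=0.5):
    """Keep non-self hits that pass both the E-value and the coverage threshold."""
    return [h for h in hits
            if h.query != h.subject
            and h.evalue <= max_evalue
            and h.query_cov >= min_coverage]


# Hypothetical toy hits: p2 is a full-length match, p3 a short local match,
# p4 a long but statistically weak match.
toy_lines = [
    "p1\tp2\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t350",
    "p1\tp3\t40.0\t50\t30\t0\t10\t59\t1\t50\t1e-6\t60",
    "p1\tp4\t30.0\t180\t126\t0\t1\t180\t1\t180\t0.5\t25",
]
all_hits = parse_hits(toy_lines, {"p1": 200})
kept = filter_hits(all_hits)
print([h.subject for h in kept])
```

Combining an E-value cut-off with a coverage cut-off is one way to keep short hits that share a single domain (such as p3 here) from linking otherwise unrelated multi-domain proteins.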
  • 22. Clustering of similar sequences
      • Proteins = vertices (nodes)
      • Sequence similarity relationship = edges
  • 23. Clustering of similar sequences
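
The simplest clustering on such a graph is single linkage: every connected component becomes one family. The sketch below uses a small union-find structure; the function name and the toy protein identifiers are hypothetical. It illustrates the graph view only and is not the Markov clustering used by tools such as Tribe-MCL or OrthoMCL.

```python
def cluster_proteins(proteins, edges):
    """Group proteins into connected components (single-linkage clusters)."""
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the component root, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), []).append(p)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(members) for members in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical toy graph: p1-p2-p3 are chained, p4-p5 are linked, p6 is a singleton.
toy_proteins = ["p1", "p2", "p3", "p4", "p5", "p6"]
toy_edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
families = cluster_proteins(toy_proteins, toy_edges)
print(families)
```

Note that p1 and p3 end up in the same family although no edge links them directly. This chaining effect is the main weakness of single linkage, especially when multi-domain proteins bridge unrelated families.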
  • 24. Advanced methods for protein (orthology) clustering
      • Sequence similarity-based:
          • COG (RBH) [Tatusov 1997]
          • InParanoid [Remm et al., 2001]
          • Tribe-MCL [Van Dongen 2000]
          • OrthoMCL [Li et al., 2003]
      • Phylogenetic tree-based:
          • PhylomeDB [Huerta-Cepas et al., 2007]
          • Ensembl Compara [Vilella et al., 2008]
  • 25. Overview methodologies [Figure labels: BBH, Inparanoid, COG, species overlap, reconciliation] (Gabaldon, 2008)
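
The reciprocal best hit (RBH) idea, also called bidirectional best hit (BBH), can be sketched in a few lines: two genes from different species are paired when each is the other's top-scoring hit. This is a simplified illustration under stated assumptions, not the full procedure of any listed tool; the hit tuples `(query, subject, bitscore)`, the function names and the scores are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject; hits are (query, subject, bitscore)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores: a1 and b1 are each other's best hit, while a2 also
# prefers b1 but is not the best hit of any gene in species B.
a_vs_b = [("a1", "b1", 300), ("a1", "b2", 120), ("a2", "b1", 150), ("a2", "b2", 90)]
b_vs_a = [("b1", "a1", 295), ("b1", "a2", 140), ("b2", "a1", 110), ("b2", "a2", 95)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Because each gene can appear in at most one pair, plain RBH yields one-to-one relationships only and misses inparalogs (co-orthologs) created by lineage-specific duplications.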
  • 27. Resources (bis)
      • Ensembl (Vertebrates)
      • Ensembl Genomes (Metazoa, Protists, Fungi, Plants & Bacteria)
      • OrthoMCLDB 5 (150 genomes)
      • YGOB (>15 Fungi)
  • 28. Hands-on
      • Goal: identify and characterize gene family members encoding talin 2 (TLN2)
      1. Select query gene
      2. Retrieve homologs/orthologs
      3. Create multiple sequence alignment
      4. Identify conserved positions
      5. Create phylogenetic tree and identify orthologous/paralogous genes
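
Step 4 of the hands-on can be approximated with a simple column scan over the alignment. The sketch below reports columns in which a single residue reaches a chosen fraction of the sequences; the function name, the `min_fraction` parameter and the toy alignment are hypothetical and are not TLN2 sequences.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=1.0):
    """Return (0-based column, residue) pairs where one residue reaches min_fraction of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col, residues in enumerate(zip(*alignment)):
        residue, count = Counter(residues).most_common(1)[0]
        # Columns dominated by gaps are never reported as conserved.
        if residue != "-" and count / len(residues) >= min_fraction:
            conserved.append((col, residue))
    return conserved


# Hypothetical toy alignment; "-" marks a gap.
toy_alignment = [
    "MKL-VHAKF",
    "MRLAVHSKF",
    "MKLAIHTKF",
]
strict = conserved_columns(toy_alignment)
relaxed = conserved_columns(toy_alignment, min_fraction=0.6)
print(strict)
```

Identity-based conservation is the crudest possible measure. Alignment viewers usually also score conservative substitutions (for example I versus V), which this sketch counts as mismatches.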