Protein function and bioinformatics



Outline of talk

● Why do we need bioinformatics?
● What tools do we need?
● Case study: The Methanococcoides burtonii genome




                                        Neil Saunders
                                        76-455
                                        n.saunders@uq.edu.au
                                        www.uq.edu.au/~uqnsaun1/

Why do we need bioinformatics?

● Rapid increase in data due to genomics
● Too much data to characterise genes/proteins individually
● Bioinformatics = “smart use” of information
● Ideally, computational and experimental biology are partners

The ideal computational – wet lab cycle

[Diagram: a cycle linking Biological system, Biological objects, Computational objects, Analyses, Biological inferences and Experiments]

Bioinformatics is about helping biologists solve problems

Introduction to genomics

● Genomes Online database: www.genomesonline.org

  Published/complete     413
  Bacteria in progress   977
  Eukarya in progress    629
  Archaea in progress     57
  Metagenomes             56

10-50% of genes in a new genome may have no known function

Computational skills for genomics

"So what new skills will postdocs need to ensure that
they don't become science relics? The answer is math,
statistics, and knowledge of a scripting language for
computers."

- The Scientist, "Bioinformatics Knowledge Vital to Careers"
  Volume 16 | Issue 17 | 53 | Sep. 2, 2002
  www.the-scientist.com

Using WWW resources

● The best web resources provide:
    - useful tools for analysis
    - integrated data from many sources

Good examples
● InterPro database          http://www.ebi.ac.uk/interpro/
● Expasy                     http://au.expasy.org
● UniProt                    http://www.uniprot.org/
● CBS Prediction servers     http://www.cbs.dtu.dk/services/
● IMG Database               http://img.jgi.doe.gov/

But...
● Web services are no good for genome-scale analyses
● There are usually limits on data input (with good reason)

Nucleic Acids Research publishes annual database and
web server issues:          http://nar.oxfordjournals.org/

Computational infrastructure for genomics

[Diagram: biological objects and analyses, with computational objects between them]

Biological objects        Analysis (limitless)
Genome                    Sequence analysis
Assembly                  Regulatory motifs
Gene sequence             Structural modeling
Protein sequence          Phylogeny
Protein structure         Comparative genomics
Pathway                   Pathway reconstruction

Key points
● Appropriate hardware: workstation v. cluster
● Linux Linux Linux!
● Freely-available, open source software is all you need
● Toolkits and libraries (e.g. BioPerl) to build your own solutions
● Philosophy of “many small tools plus glue” - scripting language
● Website + database skills - sharing

BioPerl: a life sciences computational toolkit

● Website: http://www.bioperl.org
● A collection of Perl modules for biology
● Handles many common tasks in sequence/structure analysis, e.g.
    - read/write various sequence formats
    - run BLAST and parse the output
    - read/write/analyse sequence alignments
    - access local or remote databases
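
To make the first task in the list concrete, here is a minimal sketch of reading and writing FASTA, one common sequence format. It uses only the Python standard library rather than BioPerl itself, so the names `read_fasta` and `write_fasta` and the example records are illustrative assumptions, not BioPerl's API.

```python
from io import StringIO
from typing import Iterable, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO,
                width: int = 60) -> None:
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Made-up example input; the sequences are not real proteins.
example = StringIO(">seq1 hypothetical protein\nMKV\nLAA\n>seq2\nMTT\n")
records = list(read_fasta(example))
```

A toolkit such as BioPerl wraps this kind of parsing for many formats behind one interface, which is the main reason to prefer it over hand-written parsers.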

Annotation (or not) using BLAST

BLAST: Basic Local Alignment Search Tool
● Is useful for finding similar sequences quickly
● Not sensitive – less useful for weakly-similar sequences
● Not much good at all for annotation

Why not?
● “Hypothetical”: the database sequence is unique
● “Conserved hypothetical”: several hits but no known function
● Multi-domain proteins
● BLAST database contains incorrect annotations
● Annotation is at the whim of whoever deposited the sequence

Classic example: IMPDH
Wu et al. (2003)
Comp. Biol. Chem. 27: 37-47
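
The failure modes listed above can be illustrated with a deliberately naive annotation-transfer rule. This is a sketch in Python, not how any particular pipeline works: the function name `naive_annotation`, the keyword list and the example hit descriptions are all assumptions made for illustration.

```python
# Words that signal a hit carries no functional information (illustrative list).
UNINFORMATIVE = ("hypothetical", "unknown function", "uncharacterized",
                 "uncharacterised")


def naive_annotation(hit_descriptions):
    """Label a query from BLAST hit descriptions, ordered best hit first."""
    if not hit_descriptions:
        # No similar sequence in the database at all.
        return "hypothetical protein"
    informative = [
        d for d in hit_descriptions
        if not any(word in d.lower() for word in UNINFORMATIVE)
    ]
    if not informative:
        # Similar sequences exist, but none has a known function.
        return "conserved hypothetical protein"
    # Copies whatever the depositor of the best informative hit wrote,
    # whether or not it is correct or applies to the whole protein.
    return informative[0]


label = naive_annotation(["hypothetical protein", "IMP dehydrogenase"])
```

The last branch is where the trouble lies: the label is only as good as the database entry it was copied from, and a hit to a single domain of a multi-domain protein looks the same as a full-length match.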

A better annotation tool: InterProScan

● IPRScan is a tool to search the InterPro database
● It uses sequence signature profiles – more sensitive than BLAST
● Integrates the search results from multiple databases
● A good first step to characterise a new sequence
● Available as standalone package and runs on clusters
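
Once a standalone run has finished, the per-database matches usually need collapsing into one list of InterPro entries per protein. The sketch below assumes the tab-separated output of recent InterProScan releases, where column 1 is the protein ID and columns 12 and 13 hold the InterPro accession and description. Older iprscan versions use a different layout, so check the column order before relying on it. The function name `interpro_entries` and every identifier in the example rows are placeholders.

```python
import csv
from collections import defaultdict
from io import StringIO


def interpro_entries(handle):
    """Map each protein ID to the set of (InterPro accession, description) hit."""
    entries = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 13:
            continue  # row has no InterPro columns
        protein_id, ipr_acc, ipr_desc = row[0], row[11], row[12]
        if ipr_acc and ipr_acc != "-":
            entries[protein_id].add((ipr_acc, ipr_desc))
    return dict(entries)


# Made-up rows: none of these accessions is real.
rows = [
    ["protA", "md5-placeholder", "70", "Pfam", "PF_EXAMPLE", "example domain",
     "1", "66", "1e-20", "T", "01-01-2000", "IPR_EXAMPLE", "Example domain entry"],
    ["protA", "md5-placeholder", "70", "SMART", "SM_EXAMPLE", "example domain",
     "2", "65", "3e-15", "T", "01-01-2000", "IPR_EXAMPLE", "Example domain entry"],
    ["protB", "md5-placeholder", "120", "Coils", "Coil", "Coil",
     "10", "30", "-", "T", "01-01-2000", "-", "-"],
]
tsv_text = "\n".join("\t".join(r) for r in rows) + "\n"
summary = interpro_entries(StringIO(tsv_text))
```

Two member databases matching the same InterPro entry collapse to a single annotation for `protA`, which is the integration the slide refers to.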

Structure prediction: threading and modelling

● The structure of a protein often explains how it functions
● However, structural determination is laborious, difficult and time-consuming
● Modelling can be useful in cases where the sequence is similar to a known structure

Threading:           fit query sequence to fold database
Homology modelling:  assume similar sequence = similar structure
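
Whether the "similar sequence = similar structure" assumption is reasonable depends on how similar the query is to the template. A minimal sketch of one common measure, percent identity over aligned (non-gap) positions, is shown below in Python. The function name and the toy alignment are assumptions, and other definitions of identity use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Toy alignment (made up): one gap column, one mismatch.
identity = percent_identity("MKV-LA", "MKIQLA")
```

When identity to every known structure is low, threading against a fold library is the usual fallback, as the left-hand column of the slide indicates.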

Some modelling tools and databases

● SwissModel:   http://swissmodel.expasy.org/
● MODELLER:     http://www.salilab.org/modeller/
● PROSPECT:     http://compbio.ornl.gov/structure/prospect2/
● ModBase:      http://modbase.compbio.ucsf.edu/

Introduction to M. burtonii

[Images: M. burtonii; Ace Lake, Vestfold Hills; The Archaea]

Methanococcoides burtonii
● Isolated from Ace Lake, Antarctica (1-2 °C)
● Grows optimally at 23 °C
● Is an archaeon
● Is a psychrophilic methanogen

The M. burtonii genome

What features of this genome are related to cold adaptation?

Discovery of CSP-like proteins in M. burtonii

● CSP = cold shock protein
● Expressed in bacteria at low temperature
● Functions as RNA chaperone to facilitate transcription at low temperature
● Present in some Archaea, including M. frigidum, but not M. burtonii

Discovery of CSP-like proteins in M. burtonii

Workflow: protein sequences, then PROSPECT (thread v. CSD folds), then MODELLER (structural model)

[Images: structures labelled d1sro__ and M. burtonii YP_564958]

● Both proteins are expressed (proteomics)
● Located in a putative exosome/proteasome superoperon
● This is consistent with their proposed function

Integrating information: structural RNA study

[Plot: tRNA % GC (stems and all bases) against OGT (°C)]

Is tRNA GC content related to optimum growth temperature (OGT)?
● tRNAscan-SE finds tRNAs in genomes
● GC content calculated using Perl scripts

Dihydrouridine in M. burtonii
● tRNA contains > 1 hU/tRNA
● Maintains flexibility at low temperature
● DUS gene identified using iprscan
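
The GC-content step is simple enough to sketch. The talk used Perl scripts; the version below is a Python stand-in using only the standard library, with a hand-written Pearson correlation to relate % GC to OGT. The function names, the toy sequences and the temperatures are all made up for illustration and are not the study's data.

```python
from math import sqrt


def gc_content(seq: str) -> float:
    """Percent G+C among A/C/G/T/U characters; other symbols are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T") + seq.count("U")
    return 100.0 * gc / total if total else 0.0


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical (tRNA fragment, OGT in degrees C) pairs, invented for the example.
toy_data = [
    ("GCATTAGCTAAGTTGGTA", 23.0),
    ("GCGGTAGCTCAGCTGGGA", 60.0),
    ("GGGCCCGTAGCTCAGCCG", 95.0),
]
gc_values = [gc_content(seq) for seq, _ in toy_data]
ogt_values = [ogt for _, ogt in toy_data]
r = pearson(gc_values, ogt_values)
```

To reproduce the "stems" versus "all bases" comparison, the same function would be applied to the stem positions only, which requires the secondary-structure output of the tRNA finder.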

Pyrrolysine: a problem for bioinformatics

● Proteomics used to identify expressed proteins
● One is trimethylamine methyltransferase (TMA-MT)
● It shows post-translational modification
● It also maps to 2 ORFs in the genome sequence

● The ORFs are actually one gene with a read-through UAG codon
● Pyrrolysine is incorporated at the UAG
● This is the 22nd genetically-encoded amino acid
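
The reason a standard gene finder splits such a gene in two can be shown with a small translation sketch. It uses the standard genetic code and optionally reads TAG (UAG in the mRNA) as pyrrolysine, whose one-letter code is O. The function name `translate`, its `readthrough_tag` parameter and the toy sequence are assumptions for illustration, not a real TMA-MT gene.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, readthrough_tag: bool = False) -> str:
    """Translate reading frame 1 up to the first stop codon.

    With readthrough_tag=True, TAG is decoded as pyrrolysine ("O")
    instead of terminating translation.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3].upper()
        aa = CODON_TABLE.get(codon, "X")  # X for codons with ambiguous bases
        if codon == "TAG" and readthrough_tag:
            aa = "O"
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_gene = "ATGAAATAGGGTTGGTAA"  # invented sequence with an in-frame TAG
standard = translate(toy_gene)
with_pyl = translate(toy_gene, readthrough_tag=True)
```

Under the standard code the product stops at the in-frame TAG, so the downstream part is called as a separate ORF. With read-through the two parts join into one protein, which is what the proteomics data pointed to.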

Statistical analysis of protein properties

Input data
  Archaea:   27 organisms,  62 338 ORFs
  Bacteria:  52 organisms, 165 192 ORFs

Workflow
  1. Amino acid frequency (bioperl)
  2. Data matrix: organisms (rows) x composition (columns)
  3. PCA: principal components (R stats package)
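
The workflow above can be sketched end to end on toy data. The talk used BioPerl for the frequencies and R for the PCA; the Python version below swaps in a plain counting function and a power-iteration estimate of the first principal component, so it shows the idea rather than the original analysis. The function names, organism names and sequences are invented.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteins):
    """Fraction of each of the 20 standard amino acids over a list of sequences."""
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def first_principal_component(matrix, iterations=200):
    """Return (loadings, scores) for PC1 via power iteration on the covariance."""
    n, p = len(matrix), len(matrix[0])
    if n < 2:
        raise ValueError("need at least two rows")
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in matrix]
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Non-uniform start vector: a uniform one lies in the null space of the
    # covariance matrix when every row sums to 1, as compositions do.
    vec = [float(j + 1) for j in range(p)]
    norm = sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centred]
    return vec, scores


# Invented proteomes; real input would be every ORF of each genome.
proteomes = {
    "organism_A": ["MKQSTTHD", "MSQTTD"],
    "organism_B": ["MLLWEEL", "MLEWLL"],
    "organism_C": ["MKQLSTEL", "MSLTED"],
}
names = list(proteomes)
matrix = [composition(proteomes[name]) for name in names]
loadings, scores = first_principal_component(matrix)
```

Each organism gets one score per component, and the loadings say which amino acids drive that component. Further components would need deflation or a full eigendecomposition, which is what a statistics package provides.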

Principal components analysis of composition

● 2 components explain most of the variation in amino acid composition
● PC1 correlates with genome GC content
● PC2 correlates with optimum growth temperature
● The psychrophilic archaea are distinguished by PC2 score
● Their proteins contain:  more Gln, Ser, Thr, His, Asp
                           less Leu, Trp and Glu

Conclusions

● Computational biology and bioinformatics are essential to modern biology
● Many tools are available to annotate proteins: web-based and standalone
● Without experiments, bioinformatics is just predictions
● Data integration is our biggest problem

www.uq.edu.au/~uqnsaun1/

More Related Content

What's hot (20)

Cath
CathCath
Cath
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
methods for protein structure prediction
methods for protein structure predictionmethods for protein structure prediction
methods for protein structure prediction
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Primary and secondary database
Primary and secondary databasePrimary and secondary database
Primary and secondary database
 
MULTIPLE SEQUENCE ALIGNMENT
MULTIPLE  SEQUENCE  ALIGNMENTMULTIPLE  SEQUENCE  ALIGNMENT
MULTIPLE SEQUENCE ALIGNMENT
 
Genomic databases
Genomic databasesGenomic databases
Genomic databases
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In Bioinformatics
 
Protein database
Protein databaseProtein database
Protein database
 
BITS: UCSC genome browser - Part 1
BITS: UCSC genome browser - Part 1BITS: UCSC genome browser - Part 1
BITS: UCSC genome browser - Part 1
 
Protein Data Bank
Protein Data BankProtein Data Bank
Protein Data Bank
 
biological detabase
biological detabasebiological detabase
biological detabase
 
Gen bank databases
Gen bank databasesGen bank databases
Gen bank databases
 
Swiss PROT
Swiss PROT Swiss PROT
Swiss PROT
 
BLAST
BLASTBLAST
BLAST
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 
Biological Databases
Biological DatabasesBiological Databases
Biological Databases
 
Functional annotation
Functional annotationFunctional annotation
Functional annotation
 

Viewers also liked

4.3 proteins
4.3   proteins4.3   proteins
4.3 proteinsSMKTA
 
Classification and properties of protein
Classification and properties of proteinClassification and properties of protein
Classification and properties of proteinMark Philip Besana
 
Protein structure: details
Protein structure: detailsProtein structure: details
Protein structure: detailsdamarisb
 
Protein Structure & Function
Protein Structure & FunctionProtein Structure & Function
Protein Structure & Functioniptharis
 

Viewers also liked (6)

Protein classification
Protein classificationProtein classification
Protein classification
 
4.3 proteins
4.3   proteins4.3   proteins
4.3 proteins
 
Protein
ProteinProtein
Protein
 
Classification and properties of protein
Classification and properties of proteinClassification and properties of protein
Classification and properties of protein
 
Protein structure: details
Protein structure: detailsProtein structure: details
Protein structure: details
 
Protein Structure & Function
Protein Structure & FunctionProtein Structure & Function
Protein Structure & Function
 

Similar to Protein function and bioinformatics

Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
BITS: Basics of sequence databases
BITS: Basics of sequence databasesBITS: Basics of sequence databases
BITS: Basics of sequence databasesBITS
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsAyeshaYousaf20
 
Bioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of NatureBioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of NatureRobert Cormia
 
Introduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptxIntroduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptxRAJESHKUMAR428748
 
Genomics of cold-adapted microorganisms
Genomics of cold-adapted microorganismsGenomics of cold-adapted microorganisms
Genomics of cold-adapted microorganismsNeil Saunders
 
Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Sijo A
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuKAUSHAL SAHU
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple nadeem akhter
 
Introduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdfIntroduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdfkigaruantony
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wdWagied Davids
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 
Thesis def
Thesis defThesis def
Thesis defJay Vyas
 
B.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 databaseB.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 databaseRai University
 
Proteomics resources at the EBI & ExPASy
Proteomics resources at the EBI & ExPASyProteomics resources at the EBI & ExPASy
Proteomics resources at the EBI & ExPASyChrist College, Rajkot
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS
 

Similar to Protein function and bioinformatics (20)

Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
BITS: Basics of sequence databases
BITS: Basics of sequence databasesBITS: Basics of sequence databases
BITS: Basics of sequence databases
 
BioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomicsBioInformatics Tools -Genomics , Proteomics and metablomics
BioInformatics Tools -Genomics , Proteomics and metablomics
 
Bioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of NatureBioinformatics - Discovering the Bio Logic Of Nature
Bioinformatics - Discovering the Bio Logic Of Nature
 
Introduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptxIntroduction to Biological database ppt(1).pptx
Introduction to Biological database ppt(1).pptx
 
Genomics of cold-adapted microorganisms
Genomics of cold-adapted microorganismsGenomics of cold-adapted microorganisms
Genomics of cold-adapted microorganisms
 
Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)Bioinformatics (Exam point of view)
Bioinformatics (Exam point of view)
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
bioinformatics simple
bioinformatics simple bioinformatics simple
bioinformatics simple
 
Introduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdfIntroduction to Bioinformatics-1.pdf
Introduction to Bioinformatics-1.pdf
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
 
Data retrieval
Data retrievalData retrieval
Data retrieval
 
Research presentation-wd
Research presentation-wdResearch presentation-wd
Research presentation-wd
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Thesis def
Thesis defThesis def
Thesis def
 
B.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 databaseB.sc biochem i bobi u 2 database
B.sc biochem i bobi u 2 database
 
Proteomics resources at the EBI & ExPASy
Proteomics resources at the EBI & ExPASyProteomics resources at the EBI & ExPASy
Proteomics resources at the EBI & ExPASy
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
BITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequencesBITS: Overview of important biological databases beyond sequences
BITS: Overview of important biological databases beyond sequences
 
RML NCBI Resources
RML NCBI ResourcesRML NCBI Resources
RML NCBI Resources
 

More from Neil Saunders

Online bioinformatics forums: why do we keep asking the same questions?
Online bioinformatics forums: why do we keep asking the same questions?Online bioinformatics forums: why do we keep asking the same questions?
Online bioinformatics forums: why do we keep asking the same questions?Neil Saunders
 
Should I be dead? a very personal genomics
Should I be dead? a very personal genomicsShould I be dead? a very personal genomics
Should I be dead? a very personal genomicsNeil Saunders
 
Learning from complete strangers: social networking for bioinformaticians
Learning from complete strangers: social networking for bioinformaticiansLearning from complete strangers: social networking for bioinformaticians
Learning from complete strangers: social networking for bioinformaticiansNeil Saunders
 
SQL, noSQL or no database at all? Are databases still a core skill?
SQL, noSQL or no database at all? Are databases still a core skill?SQL, noSQL or no database at all? Are databases still a core skill?
SQL, noSQL or no database at all? Are databases still a core skill?Neil Saunders
 
Data Integration: What I Haven't Yet Achieved
Data Integration: What I Haven't Yet AchievedData Integration: What I Haven't Yet Achieved
Data Integration: What I Haven't Yet AchievedNeil Saunders
 
Building A Web Application To Monitor PubMed Retraction Notices
Building A Web Application To Monitor PubMed Retraction NoticesBuilding A Web Application To Monitor PubMed Retraction Notices
Building A Web Application To Monitor PubMed Retraction NoticesNeil Saunders
 
Version Control in Bioinformatics: Our Experience Using Git
Version Control in Bioinformatics: Our Experience Using GitVersion Control in Bioinformatics: Our Experience Using Git
Version Control in Bioinformatics: Our Experience Using GitNeil Saunders
 
What can science networking online do for you
What can science networking online do for youWhat can science networking online do for you
What can science networking online do for youNeil Saunders
 
Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...Neil Saunders
 
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
Predikin and PredikinDB:  tools to predict protein kinase peptide specificityPredikin and PredikinDB:  tools to predict protein kinase peptide specificity
Predikin and PredikinDB: tools to predict protein kinase peptide specificityNeil Saunders
 
The Viking labelled release experiment: life on Mars?
The Viking labelled release experiment:  life on Mars?The Viking labelled release experiment:  life on Mars?
The Viking labelled release experiment: life on Mars?Neil Saunders
 

More from Neil Saunders (11)

Online bioinformatics forums: why do we keep asking the same questions?
Online bioinformatics forums: why do we keep asking the same questions?Online bioinformatics forums: why do we keep asking the same questions?
Online bioinformatics forums: why do we keep asking the same questions?
 
Should I be dead? a very personal genomics
Should I be dead? a very personal genomicsShould I be dead? a very personal genomics
Should I be dead? a very personal genomics
 
Learning from complete strangers: social networking for bioinformaticians
Learning from complete strangers: social networking for bioinformaticiansLearning from complete strangers: social networking for bioinformaticians
Learning from complete strangers: social networking for bioinformaticians
 
SQL, noSQL or no database at all? Are databases still a core skill?
SQL, noSQL or no database at all? Are databases still a core skill?SQL, noSQL or no database at all? Are databases still a core skill?
SQL, noSQL or no database at all? Are databases still a core skill?
 
Data Integration: What I Haven't Yet Achieved
Data Integration: What I Haven't Yet AchievedData Integration: What I Haven't Yet Achieved
Data Integration: What I Haven't Yet Achieved
 
Building A Web Application To Monitor PubMed Retraction Notices
Building A Web Application To Monitor PubMed Retraction NoticesBuilding A Web Application To Monitor PubMed Retraction Notices
Building A Web Application To Monitor PubMed Retraction Notices
 
Version Control in Bioinformatics: Our Experience Using Git
Version Control in Bioinformatics: Our Experience Using GitVersion Control in Bioinformatics: Our Experience Using Git
Version Control in Bioinformatics: Our Experience Using Git
 
What can science networking online do for you
What can science networking online do for youWhat can science networking online do for you
What can science networking online do for you
 
Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...
 
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
Predikin and PredikinDB:  tools to predict protein kinase peptide specificityPredikin and PredikinDB:  tools to predict protein kinase peptide specificity
Predikin and PredikinDB: tools to predict protein kinase peptide specificity
 
The Viking labelled release experiment: life on Mars?
The Viking labelled release experiment:  life on Mars?The Viking labelled release experiment:  life on Mars?
The Viking labelled release experiment: life on Mars?
 

Recently uploaded

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Protein function and bioinformatics

  • 1. Protein function and bioinformatics: Outline of talk
      ● Why do we need bioinformatics?
      ● What tools do we need?
      ● Case study: The Methanococcoides burtonii genome
      Neil Saunders, 76-455, n.saunders@uq.edu.au, www.uq.edu.au/~uqnsaun1/
  • 2. Why do we need bioinformatics?
      ● Rapid increase in data due to genomics
      ● Too much data to characterise genes/proteins individually
      ● Bioinformatics = “smart use” of information
      ● Ideally, computational and experimental biology are partners
  • 3. The ideal computational – wet lab cycle
      [Cycle diagram, boxes listed clockwise as laid out on the slide: Biological system; Biological objects; Computational objects; Analyses; Biological inferences; Experiments]
      Bioinformatics is about helping biologists solve problems
  • 4. Introduction to genomics
      ● Genomes Online database: www.genomesonline.org
          - Published/complete: 413
          - Bacteria in progress: 977
          - Eukarya in progress: 629
          - Archaea in progress: 57
          - Metagenomes: 56
      10-50% of genes in a new genome may have no known function
  • 5. Computational skills for genomics
      "So what new skills will postdocs need to ensure that they don't become science relics? The answer is math, statistics, and knowledge of a scripting language for computers."
      The Scientist, "Bioinformatics Knowledge Vital to Careers", Volume 16 | Issue 17 | 53 | Sep. 2, 2002, www.the-scientist.com
  • 6. Using WWW resources
      ● The best web resources provide:
          - useful tools for analysis
          - integrated data from many sources
      Good examples
      ● InterPro database: http://www.ebi.ac.uk/interpro/
      ● Expasy: http://au.expasy.org
      ● UniProt: http://www.uniprot.org/
      ● CBS Prediction servers: http://www.cbs.dtu.dk/services/
      ● IMG Database: http://img.jgi.doe.gov/
      But...
      ● Web services are no good for genome-scale analyses
      ● There are usually limits on data input (with good reason)
      Nucleic Acids Research publishes annual database and web server editions: http://nar.oxfordjournals.org/
  • 7. Computational infrastructure for genomics
      [Two-column diagram, columns interleaved in extraction]
      Biological objects / Computational objects: Genome; Gene sequence; Protein sequence; Protein structure; Pathway
      Analysis (limitless): Sequence analysis; Assembly; Regulatory motifs; Structural modeling; Phylogeny; Comparative genomics; Pathway reconstruction
      Key points
      ● Appropriate hardware: workstation v. cluster
      ● Linux Linux Linux!
      ● Freely-available, open source software is all you need
      ● Toolkits and libraries (e.g. BioPerl) to build your own solutions
      ● Philosophy of “many small tools plus glue” - scripting language
      ● Website + database skills - sharing
  • 8. BioPerl: a life sciences computational toolkit
      ● Website: http://www.bioperl.org
      ● A collection of Perl modules for biology
      ● Handles many common tasks in sequence/structure analysis, e.g.
          - read/write various sequence formats
          - run BLAST and parse the output
          - read/write/analyse sequence alignments
          - access local or remote databases
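To make the "read/write various sequence formats" task concrete, here is a minimal sketch in plain Python of reading and writing FASTA, one of the most common sequence formats. It is not BioPerl and does not reproduce its API; the function names `read_fasta` and `format_fasta` and the example records are hypothetical, chosen only for illustration.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted file handle."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # The identifier is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            identifier, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def format_fasta(identifier, sequence, width=60):
    """Return one FASTA record as text, wrapping the sequence at `width` columns."""
    lines = [">" + identifier]
    lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


# Hypothetical two-record input, held in memory instead of a file.
example = StringIO(">seq1 example protein\nMKV\nLAA\n>seq2\nMTT\n")
for name, seq in read_fasta(example):
    print(name, len(seq))
```

A toolkit such as BioPerl covers many more formats and edge cases than this; the sketch only shows the shape of the task.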
  • 9. Annotation (or not) using BLAST
      BLAST: Basic Local Alignment Search Tool
      ● Is useful for finding similar sequences quickly
      ● Not sensitive: less useful for weakly-similar sequences
      ● Not much good at all for annotation
      Why not?
      ● “Hypothetical”: the database sequence is unique
      ● “Conserved hypothetical”: several hits but no known function
      ● Multi-domain proteins
      ● BLAST database contains incorrect annotations
      ● Annotation is at the whim of whoever deposited the sequence
      Classic example: IMPDH, Wu et al. (2003) Comp. Biol. Chem. 27: 37-47
  • 10. A better annotation tool: InterProScan
      ● IPRScan is a tool to search the InterPro database
      ● It uses sequence signature profiles, which are more sensitive than BLAST
      ● Integrates the search results from multiple databases
      ● A good first step to characterise a new sequence
      ● Available as standalone package and runs on clusters
  • 11. Structure prediction: threading and modelling
      ● The structure of a protein often explains how it functions
      ● However, structural determination is laborious, difficult and time-consuming
      ● Modelling can be useful in cases where the sequence is similar to that of a known structure
      Threading: fit query sequence to fold database
      Homology modelling: assume similar sequence = similar structure
  • 12. Some modelling tools and databases
      ● SwissModel: http://swissmodel.expasy.org/
      ● MODELLER: http://www.salilab.org/modeller/
      ● PROSPECT: http://compbio.ornl.gov/structure/prospect2/
      ● ModBase: http://modbase.compbio.ucsf.edu/
  • 13. Introduction to M. burtonii
      [Images: M. burtonii; Ace Lake, Vestfold Hills; The Archaea]
      Methanococcoides burtonii
      ● Isolated from Ace Lake, Antarctica (1-2 °C)
      ● Grows optimally at 23 °C
      ● Is an archaeon
      ● Is a psychrophilic methanogen
  • 14. The M. burtonii genome
      What features of this genome are related to cold adaptation?
  • 15. Discovery of CSP-like proteins in M. burtonii
      ● CSP = cold shock protein
      ● Expressed in bacteria at low temperature
      ● Functions as RNA chaperone to facilitate transcription at low temperature
      ● Present in some Archaea, including M. frigidum, but not M. burtonii
  • 16. Discovery of CSP-like proteins in M. burtonii
      [Workflow figure: Protein sequences; PROSPECT thread v. CSD folds; MODELLER; structural model. Structures labelled d1sro__ and M. burtonii YP_564958]
      ● Both proteins are expressed (proteomics)
      ● Located in a putative exosome/proteasome superoperon
      ● This is consistent with their proposed function
  • 17. Integrating information: structural RNA study
      [Plot: % GC (stems, all bases) against OGT (°C), where OGT is optimum growth temperature]
      Is tRNA GC content related to OGT?
      ● tRNAScan: find tRNA in genomes
      ● GC content calculated using Perl scripts
      Dihydrouridine in M. burtonii
      ● tRNA contains > 1 hU/tRNA
      ● Maintains flexibility at low temperature
      ● DUS gene identified using iprscan
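The slide says GC content was calculated using Perl scripts, which are not shown. As an illustration only, a GC calculation might look like the following Python sketch. The function name `gc_percent` and the example sequences are assumptions, and restricting the calculation to stem positions (which needs the tRNA secondary structure) is not attempted here.

```python
def gc_percent(sequence):
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T, U).

    Ambiguity codes such as N are ignored rather than counted in the total.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T") + seq.count("U")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Hypothetical short sequences, not real tRNA genes.
for name, seq in [("example_1", "GCGGAUUUAGCUC"), ("example_2", "ggccaann")]:
    print(name, round(gc_percent(seq), 1))
```

Applying the same function to every tRNA sequence found in a genome, then averaging per organism, would give one % GC value per organism to compare against OGT.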
  • 18. Pyrrolysine: a problem for bioinformatics
      ● Proteomics used to identify expressed proteins
      ● One is trimethylamine methyltransferase (TMA-MT)
      ● It shows post-translational modification
      ● It also maps to 2 ORFs in the genome sequence
      ● The ORFs are actually one gene with a read-through UAG codon
      ● Pyrrolysine is incorporated at the UAG
      ● This is the 22nd genetically-encoded amino acid
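The reason a read-through UAG splits one gene into two predicted ORFs can be shown with a small translation sketch. This is an illustration under stated assumptions, not the method used in the study: the function name `translate`, the `read_through_tag` flag and the toy sequence are hypothetical. The codon table is the standard genetic code, and `O` is the one-letter code for pyrrolysine.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, read_through_tag=False):
    """Translate from the first base until the first stop codon.

    If read_through_tag is True, TAG (UAG) is decoded as pyrrolysine ('O')
    instead of terminating translation.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if read_through_tag and codon == "TAG":
            protein.append("O")
            continue
        aa = CODON_TABLE.get(codon, "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy sequence with an in-frame TAG, not a real TMA-MT gene.
toy = "ATGAAATAGGGTTAA"
print(translate(toy))                         # stops at the in-frame TAG
print(translate(toy, read_through_tag=True))  # reads through the TAG
```

A standard gene finder behaves like the first call, so it reports a truncated ORF ending at the UAG and may call a second ORF downstream.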
  • 19. Statistical analysis of protein properties
      [Workflow figure]
      Archaea: 27 organisms, 62 338 ORFs
      Bacteria: 52 organisms, 165 192 ORFs
      Amino acid frequency (bioperl)
      Data matrix: organisms (rows) x composition (columns)
      PCA: principal components (R stats package)
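The data matrix step can be sketched as follows. The slide names bioperl for the amino acid frequencies and the R stats package for PCA; this Python version is only an illustration of the matrix layout (organisms as rows, the 20 standard amino acids as columns). The function names and the toy proteomes are hypothetical, and the PCA step itself is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, fixed column order


def composition(proteins):
    """Return the fractional frequency of each standard amino acid, pooled over proteins."""
    counts = Counter()
    for protein in proteins:
        counts.update(protein.upper())
    # Non-standard symbols (X, *, etc.) are left out of the total.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("no standard amino acids found")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_matrix(proteomes):
    """Build the organisms (rows) x composition (columns) matrix.

    `proteomes` maps an organism name to a list of its protein sequences.
    Returns (row_names, matrix) with rows sorted by organism name.
    """
    names = sorted(proteomes)
    return names, [composition(proteomes[name]) for name in names]


# Toy input, not real proteomes.
names, matrix = composition_matrix({"org_b": ["AAAA"], "org_a": ["AC", "CA"]})
for name, row in zip(names, matrix):
    print(name, [round(value, 2) for value in row[:3]])
```

Each row sums to 1, so the matrix can be passed directly to a PCA routine, with centring and scaling chosen there.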
  • 20. Principal components analysis of composition
      ● 2 components explain most of the variation in amino acid composition
      ● PC1 correlates with genome GC content
      ● PC2 correlates with optimum growth temperature
      ● The psychrophilic archaea are distinguished by PC2 score
      ● Their proteins contain more Gln, Ser, Thr, His, Asp and less Leu, Trp and Glu
  • 21. Conclusions
      ● Computational biology and bioinformatics are essential to modern biology
      ● Many tools are available to annotate proteins: web-based, standalone
      ● Without experiments, bioinformatics is just predictions
      ● Data integration is our biggest problem
      www.uq.edu.au/~uqnsaun1/