Basic bioinformatics concepts,
                      databases and tools
                                                       Module 4
                                       Beyond the sequences

                                                    Dr. Joachim Jacob
                                                http://www.bits.vib.be

Updated Nov 2011
http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod4-intro_H1_2011_otherRelevantData.pdf
Module 4 broadens our view
To understand life, we need not only
sequences, but many other concepts
      
          Bioinformatics is also storing and analyzing
             −   gene information: variations, isoforms,...
             −   Expression data
             −   3D protein structure data
             −   Interaction data
             −   Pathways and networks


                     “Storing all relevant biological data”
Schematic view II
GeneA                sequence     annotations – gene expr – pathway – struct,...

GeneB                sequence     annotations – gene expr – pathway – struct,...

GeneC                sequence     annotations – gene expr – pathway – struct,...


[Figure labels: primary database, other sequence databases, analysis, results, additional information sources]
The indispensable databases
      
          Gene Ontology – structuring
      
          KEGG – biochemical pathways
      
          PDB – Structure of proteins
      
          IntAct – Interaction data
      
          dbSNP – database of genomic variation
      
          Expression sources – Microarray data
Gene Ontology structures the way we
communicate about life




Gene translation                  Protein production                 Protein synthesis



                                            http://www.arabidopsis.org/help/tutorials/go1.jsp
  http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
Gene Ontology structures life
               http://www.geneontology.org/
               Agreement on standardized keywords (often referred to as
                 'controlled vocabularies'), describing all natural processes in a
                 hierarchical way (ontology).
               Keywords are assigned to genes based on different types of evidence
               Keywords are ordered in a hierarchical, tree-like structure (a 'directed
                 acyclic graph')
               Three GO 'trees' exist, describing:
                                 "Biological Process"
                                 "Cellular Component"
                                 "Molecular Function"
A gene can be given
different GO terms

 Example, cytochrome c:

     molecular function: oxidoreductase activity,

     biological process: oxidative phosphorylation and
 induction of cell death,

     cellular component: mitochondrial matrix and
 mitochondrial inner membrane.

 In each tree, the terms are organised in a directed acyclic
 graph: a network consisting of parents and child-terms (as
 nodes) and lines between them as relationships.
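
 The parent/child structure described above can be sketched as a small graph traversal. This is a minimal illustration only: the parent links below are simplified assumptions chosen for readability, not the actual GO hierarchy, and the function name `ancestors` is hypothetical.

```python
# Toy directed acyclic graph: each term maps to its parent terms.
# A term may have more than one parent, which is why GO is a DAG
# rather than a strict tree. The edges are illustrative assumptions.
PARENTS = {
    "oxidative phosphorylation": ["cellular respiration", "phosphorylation"],
    "cellular respiration": ["metabolic process"],
    "phosphorylation": ["metabolic process"],
    "metabolic process": ["biological process"],
    "biological process": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    for name in sorted(ancestors("oxidative phosphorylation")):
        print(name)
```

 A gene annotated with a child term is implicitly associated with all of that term's ancestors, which is what makes retrieval by a broad term possible.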
Different evidence codes can assign a
degree of confidence to the assignment
         http://www.geneontology.org/GO.evidence.shtml

         Evidence codes can be grouped by:
         
             Experimental (e.g. IDA – inferred from direct assay)
         
             Computational analysis
         
             Author statement
         
             Curator statement
         
             Inferred from electronic annotation (IEA)
         If available, each annotation also has a reference
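
         The grouping above can be used to filter annotations by how much you trust them. The sketch below is an assumption-laden illustration: only IDA and IEA are named in the source, the remaining codes are filled in from the GO evidence code guide and should be checked against the current GO documentation, and the sample annotations are invented.

```python
# Evidence codes per group. Only IDA and IEA come from the slide;
# the other codes are an assumption to verify against the GO guide.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational analysis": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author statement": {"TAS", "NAS"},
    "curator statement": {"IC", "ND"},
    "electronic annotation": {"IEA"},
}


def evidence_group(code):
    """Return the group name for an evidence code, or 'unknown'."""
    code = code.strip().upper()
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


def keep_experimental(annotations):
    """Keep (gene, term, evidence_code) tuples backed by experiments."""
    return [a for a in annotations if evidence_group(a[2]) == "experimental"]


# Invented annotations, for illustration only.
SAMPLE_ANNOTATIONS = [
    ("geneA", "oxidoreductase activity", "IDA"),
    ("geneA", "mitochondrial matrix", "IEA"),
    ("geneB", "oxidative phosphorylation", "TAS"),
]

if __name__ == "__main__":
    for annotation in keep_experimental(SAMPLE_ANNOTATIONS):
        print(annotation)
```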
Gene Ontology structures all genes
according to their biological significance
         The GO structure and its terms can be explored with a browser
           called AmiGO.
         QuickGO from EBI offers some nice visualisations
         An excellent GO wiki answers most questions
GO can be used to retrieve all gene
(products) related to one specific term
         You can search broadly, e.g. an AmiGO search for Diabetes
           leads to the following GO term
         http://amigo.geneontology.org/
GO is also useful to analyze and compare
different gene lists
          Many tools that work with GO are listed on the GO website.




                                http://www.geneontology.org/GO.tools.shtml
Some things to know about GO
         For analyses, one can make use of reduced GO sets,
           the so-called GO slims
                –   GO slims are cut-down subsets of broad, biologically
                    relevant GO terms (available per species)
                –   GO ontologies can be downloaded in .obo
                    format.
         Not all information is captured by GO; some needs to be
           retrieved from other databases
                Metabolic pathways: KEGG, …
                Phenotype/diseases
                       •   Mapping files exist, e.g. kegg2go
                              http://www.geneontology.org/GO.slims.shtml
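
         To give a feel for the .obo format mentioned above, here is a minimal reader for `[Term]` stanzas. It is a sketch, not a full parser: it only handles simple `tag: value` lines, the identifiers with the `EX:` prefix are made up so they cannot be mistaken for real GO terms, and the function name `parse_obo_terms` is hypothetical.

```python
# Two invented stanzas in OBO style. Real GO files use "GO:" identifiers.
OBO_SNIPPET = """\
[Term]
id: EX:0000001
name: example child term
namespace: biological_process
is_a: EX:0000002 ! example parent term

[Term]
id: EX:0000002
name: example parent term
namespace: biological_process
"""


def parse_obo_terms(text):
    """Return {term_id: {"name": ..., "namespace": ..., "is_a": [...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "is_a":
                # Drop the human-readable comment after " ! ".
                current["is_a"].append(value.split(" ! ", 1)[0])
            else:
                current[tag] = value
            if tag == "id":
                terms[value] = current
    return terms


if __name__ == "__main__":
    for term_id, fields in parse_obo_terms(OBO_SNIPPET).items():
        print(term_id, fields["name"], fields["is_a"])
```

         The `is_a` lines are what encode the parent links of the directed acyclic graph.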
Biological pathways databases organise
genes by molecular reactions
        3 important databases on biological pathways
        
            http://www.kegg.jp/




           http://www.reactome.org/ - EBI
           http://metacyc.org
Proteins with enzymatic function receive
an Enzyme Commission (EC) number
        http://www.chem.qmul.ac.uk/iubmb/enzyme/
        EC 1   Oxidoreductases
        EC 2   Transferases
        EC 3   Hydrolases
        EC 4   Lyases
        EC 5   Isomerases
        EC 6   Ligases
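
        The first digit of an EC number gives the enzyme class, so the table above translates directly into a lookup. A minimal sketch, covering only the six classes listed on the slide; the function name `ec_class` is hypothetical.

```python
# Top-level EC classes exactly as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the class name for strings such as 'EC 1.1.1.1' or '3.4.21.4'."""
    digits = ec_number.upper().replace("EC", "").strip()
    first = int(digits.split(".")[0])
    return EC_CLASSES.get(first, "unknown class")


if __name__ == "__main__":
    print(ec_class("EC 1.1.1.1"))
    print(ec_class("3.4.21.4"))
```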
IntAct database contains interaction
information of proteins
         http://www.ebi.ac.uk/intact
         Three types of interactions stored
            
                Protein-protein
            
                Protein-DNA
            
                Protein-small molecule
IntAct database represents all
interactions as binary: caution!
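
One plausible reason for the caution: an experiment that pulls down a whole complex has to be expanded into pairs before it can be stored as binary interactions, and the result depends on the expansion rule. The sketch below contrasts two commonly described rules, "spoke" and "matrix". The protein names are invented, and which rule a given database applies is an assumption to check in its documentation.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey; preys are not paired with each other."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(bait, preys):
    """Pair every member of the complex with every other member."""
    return list(combinations([bait, *preys], 2))


if __name__ == "__main__":
    bait, preys = "A", ["B", "C", "D"]  # hypothetical complex
    print("spoke: ", spoke_expansion(bait, preys))
    print("matrix:", matrix_expansion(bait, preys))
```

Either way, some of the reported pairs may never touch each other directly, so a binary record is not proof of physical contact.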
Interaction networks can be analysed on
your computer using Cytoscape




                    Cytoscape training material on the BITS website
PDB hosts 3-dimensional
structural data on molecules

         PDB = Protein Data Bank
             http://www.pdb.org/pdb/home/home.do
         Only experimentally determined structures, resolved through NMR,
           X-ray crystallography or other accurate techniques, of:
         
             Proteins
         
             DNA
         
             RNA
         
             Ligands

         Understanding PDB data: tutorial
PDB files can be read by a lot of different
  tools to display the structure
                       Every entry in PDB has its own PDB accession
                         number (four characters: one digit followed by three
                         letters or digits)
                       The PDB file contains the 3D coordinates of every
                         single atom in the structure, together with the
                         occupancy and temperature factor (B-factor) of that
                         position (the two numeric columns after the coordinates)




http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203817:protein-structure-
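
The per-atom data described above sit in fixed-width `ATOM` records, so they are read by column position rather than by splitting on whitespace. A minimal sketch, assuming the classic PDB column layout; the sample record is invented and the function name `parse_atom_record` is hypothetical.

```python
# An invented ATOM record, assembled field by field so that the
# fixed column positions are easy to verify.
SAMPLE_ATOM = (
    "ATOM  "      # cols  1-6   record name
    "    1"       # cols  7-11  atom serial number
    " "           # col  12     unused
    " N  "        # cols 13-16  atom name
    " "           # col  17     alternate location
    "MET"         # cols 18-20  residue name
    " "           # col  21     unused
    "A"           # col  22     chain identifier
    "   1"        # cols 23-26  residue sequence number
    "    "        # cols 27-30  insertion code and padding
    "  27.340"    # cols 31-38  x coordinate (angstrom)
    "  24.430"    # cols 39-46  y coordinate
    "   2.614"    # cols 47-54  z coordinate
    "  1.00"      # cols 55-60  occupancy
    "  9.67"      # cols 61-66  temperature factor (B-factor)
    "          "  # cols 67-76  padding
    " N"          # cols 77-78  element symbol
)


def parse_atom_record(line):
    """Extract the main fields from one ATOM line using column slices."""
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }


if __name__ == "__main__":
    print(parse_atom_record(SAMPLE_ATOM))
```

For real work, an established library is a safer choice than hand-written slicing, since actual files contain many other record types.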
         Tools to visualize structures, some of which can
           also analyze them (see BITS wiki)




                      http://www.bits.vib.be/wiki/index.php/Protein_structure
Finding a structure for your protein
  sequence means searching for similarity
               Homology modeling
               Similarity at the sequence level, projected onto a structure
                    BLAST your query against the PDB database with cblast, or at ExPASy
                    PSI-BLAST can detect sequences with similar structures
                     (twilight zone!)
                    If still no success: 3D-Jury (a meta approach, including fold
                     recognition and local structure prediction)
               Similarity at the structural level: aligning structures
                    VAST (structure)
                    DALI (Distance mAtrix aLIgnment)

                                             BITS training on protein structure analysis
                http://www.ii.uib.no/~slars/bioinfocourse/PDFs/structpred_tutorial.pdf
Tools at EBI                           http://consurf.tau.ac.il/pe/protexpl/psbiores.htm
Structural information is used to classify
proteins
[Figure: database cross-references in a PDB entry]




             
                 SCOP
             Groups proteins based on evolutionary, domain
               architecture and structural information.
             
                 CATH
             Manually curated classification of protein domains

                                           http://scop.mrc-lmb.cam.ac.uk/scop/
                                                        http://www.cathdb.info/
dbSNP is a public-domain archive for
simple genetic polymorphisms
      
          Single Nucleotide Polymorphism database (NCBI)
      
          Each dbSNP entry has an accession of the form rsxx (RefSNP)
          or ssxx (submitted SNP), where xx is a number
          
              single-base nucleotide substitutions (also known as
              single nucleotide polymorphisms or SNPs),
          
              small-scale multi-base deletions or insertions (also
              called deletion insertion polymorphisms or DIPs)
          
              retroposable element insertions and microsatellite
              repeat variations (also called short tandem repeats or
              STRs).
      
          Synchronized with new genome builds
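
          The rs/ss naming scheme is easy to check programmatically. A minimal sketch; the accession numbers used are arbitrary placeholders, not references to particular variants, and the function name `classify_dbsnp_id` is hypothetical.

```python
import re

# "rs" or "ss" followed by digits only.
ACCESSION = re.compile(r"^(rs|ss)(\d+)$", re.IGNORECASE)


def classify_dbsnp_id(accession):
    """Return (kind, number) for an rs/ss accession, or None if malformed."""
    match = ACCESSION.match(accession.strip())
    if match is None:
        return None
    prefix = match.group(1).lower()
    kind = "RefSNP" if prefix == "rs" else "submitted SNP"
    return kind, int(match.group(2))


if __name__ == "__main__":
    for acc in ["rs12345", "ss67890", "snp42"]:  # placeholder accessions
        print(acc, "->", classify_dbsnp_id(acc))
```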
Expression data can be sequence-based
or hybridisation-based
      Sequence-based (ESTs - RNA seq - SAGE)
        
            Digital gene expression/northern
      Microarray databases – hybridisation based:
        
            GEO: gene expression omnibus (NCBI)
             −   Platform: GPLxxxxxxx
             −   Series (experiment): GSExxxxxx (= several samples)
             −   Sample: GSMxxxxxxxx
             −   Some experiments are curated into DataSets: GDSxxxxx (online
                 analysis possible)
        
            ArrayExpress (EBI)
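
        The GEO accession prefixes listed above tell you what kind of record you are looking at. A minimal sketch of that lookup; the accession numbers in the demo are arbitrary placeholders and the function name `geo_record_type` is hypothetical.

```python
# Prefix meanings as listed on the slide.
GEO_PREFIXES = {
    "GPL": "platform",
    "GSE": "series (experiment, several samples)",
    "GSM": "sample",
    "GDS": "curated dataset",
}


def geo_record_type(accession):
    """Return the record type for a GEO accession, or None if unrecognised."""
    acc = accession.strip().upper()
    prefix, digits = acc[:3], acc[3:]
    if prefix in GEO_PREFIXES and digits.isdigit():
        return GEO_PREFIXES[prefix]
    return None


if __name__ == "__main__":
    for acc in ["GPL100", "GSE200", "GSM300", "GDS400", "ABC500"]:
        print(acc, "->", geo_record_type(acc))
```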
Example of expression data at GEO
Example at ArrayExpress
Entrez interconnects the databases at
NCBI for easy querying
        
            UniGene : sequences grouped by gene
        
            PopSet : sequence alignments for population
            studies and phylogeny
        
            Structure : 3D structures (PDB)
        
            Genome : genomic maps of chromosomes and
            plasmids
        
            UniSTS (Sequence Tagged Sites)
        
            PubMed : literature abstracts (MEDLINE,…)
        
            OMIM (Online Mendelian Inheritance in Man) :
            literature reviews
        
            MeSH (Medical Subject Headings) : keywords
        
            Taxonomy
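
Because the Entrez databases share one query system, the same request shape works for each of them; only the database name changes. The sketch below builds search URLs for NCBI's E-utilities `esearch` endpoint without sending any request. The search term is an arbitrary example, the function name `esearch_url` is hypothetical, and the available database names and usage limits should be checked in the E-utilities documentation.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that asks `db` for records matching `term`."""
    query = urlencode(
        {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    )
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


if __name__ == "__main__":
    # Same term, different Entrez databases.
    for db in ["pubmed", "structure", "taxonomy"]:
        print(esearch_url(db, "cytochrome c"))
```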
Finding relevant data
Summarizing most important links to
discover everything you need ...
             Protein data
               Interpro (heavily integrated with EBI resources)
               http://www.interpro.org

             Gene data
               Entrez at NCBI : 'Entrez Gene'
               http://www.ncbi.nlm.nih.gov/Entrez/
               EB-eye search at EBI : excellent for cross-species searches
               http://www.ebi.ac.uk/ebisearch/
Hold your horses!

            Phew, where do I place this all?
Bioinformatics is all about different data,
as versatile as life itself
            Due to the strong cross-references between
              different databases, new databases and
              relevant info are rapidly integrated in existing
              databases.
            You can discover them by taking time to read the
              entries.
New tools are emerging every day to
enable you to browse all data sources...
         BioGPS, all in one window!
Integrative resources are increasingly
being organised on a species basis
        
            EMAGE database of in situ gene expression in mouse
        
            OMIM Database of diseases in man
        
            Websites providing an interface that integrates all
            these data are increasingly important
        
            Often organized on a species basis
             −   TAIR
             −   FlyBase
             −   WormBase
Organizing biological information
by species

                     By species, why?
  There is one biological information resource which stays
           more or less unchanged per species ...

Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...RKavithamani
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 

Último (20)

microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
Privatization and Disinvestment - Meaning, Objectives, Advantages and Disadva...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 

BITS: Overview of important biological databases beyond sequences

  • 1. Basic bioinformatics concepts, databases and tools Module 4 Beyond the sequences Dr. Joachim Jacob http://www.bits.vib.be Updated Nov 2011 http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod4-intro_H1_2011_otherRelevantData.pdf
  • 2. Module 4 broadens our view
  • 3. To understand life, we need not only sequences, but many other concepts
      Bioinformatics is also about storing and analyzing:
      - gene information: variations, isoforms, ...
      - expression data
      - 3D protein structure data
      - interaction data
      - pathways and networks
      “Storing all relevant biological data”
  • 4. Schematic view II
      [Diagram: GeneA, GeneB and GeneC each have a sequence (primary database, other sequence databases) linked to additional information sources: annotations, gene expression, pathway, structure, ... Analysis of these data yields results.]
  • 5. The indispensable databases
      - Gene Ontology – structuring
      - KEGG – biochemical pathways
      - PDB – structure of proteins
      - IntAct – interaction data
      - dbSNP – database of genomic variation
      - Expression sources – microarray data
  • 6. Gene Ontology structures the way we communicate about life
      Different phrases are used for the same process: gene translation, protein production, protein synthesis.
      http://www.arabidopsis.org/help/tutorials/go1.jsp
      http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
  • 7. Gene Ontology structures life
      http://www.geneontology.org/
      - Agreement on standardized keywords (often referred to as 'controlled vocabularies') that describe all natural processes in a hierarchical way (an ontology).
      - Keywords are assigned to genes based on different kinds of evidence.
      - Keywords are ordered in a hierarchical, tree-like structure (a 'directed acyclic graph').
      - Three GO 'trees' exist, describing "Biological Process", "Cellular Component" and "Molecular Function".
      http://www.arabidopsis.org/help/tutorials/go1.jsp
      http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
  • 8. A gene can be given different GO terms
      Example, cytochrome c:
      - molecular function: oxidoreductase activity
      - biological process: oxidative phosphorylation and induction of cell death
      - cellular component: mitochondrial matrix and mitochondrial inner membrane
      In each tree, the terms are organised in a directed acyclic graph: a network with parent and child terms as nodes and the relationships between them as lines.
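The parent/child structure described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how GO itself is stored: the term names and the `PARENTS` table are made up for the example, and the only point is that a term may have more than one parent, which is what makes the structure a directed acyclic graph rather than a tree.

```python
# Toy GO-like graph: each term maps to its parent terms.
# Names and relationships are illustrative assumptions, not real GO content.
PARENTS = {
    "biological_process": [],
    "metabolic process": ["biological_process"],
    "cellular process": ["biological_process"],
    "cellular metabolic process": ["metabolic process", "cellular process"],
    "oxidative phosphorylation": ["cellular metabolic process"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("oxidative phosphorylation")))
```

A gene annotated with a child term is implicitly annotated with all ancestors of that term, which is why retrieving the genes for a broad term also returns genes annotated with its more specific descendants.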
  • 9.
  • 10. Different evidence codes can assign a degree of confidence to the assignment
      http://www.geneontology.org/GO.evidence.shtml
      Evidence codes can be grouped by:
      - Experimental (e.g. IDA – inferred from direct assay)
      - Computational analysis
      - Author statement
      - Curator statement
      - Inferred from electronic annotation (IEA)
      If available, each annotation also has a reference.
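As a rough sketch of how evidence codes can be used in practice, the snippet below keeps only annotations backed by experimental evidence. The gene names and annotation rows are hypothetical; the evidence codes (IDA, IMP, ISS, TAS, IC, IEA) are real GO codes, but the table covers only a handful of them.

```python
# Partial mapping of GO evidence codes to the groups named on the slide.
EVIDENCE_GROUP = {
    "IDA": "experimental",            # inferred from direct assay
    "IMP": "experimental",            # inferred from mutant phenotype
    "ISS": "computational analysis",  # inferred from sequence or structural similarity
    "TAS": "author statement",        # traceable author statement
    "IC": "curator statement",        # inferred by curator
    "IEA": "electronic annotation",   # inferred from electronic annotation
}

# Hypothetical annotations: (gene, GO term, evidence code).
annotations = [
    ("geneA", "oxidoreductase activity", "IDA"),
    ("geneB", "oxidoreductase activity", "IEA"),
    ("geneC", "oxidative phosphorylation", "TAS"),
]


def filter_by_group(rows, allowed_groups):
    """Keep rows whose evidence code belongs to one of the allowed groups."""
    return [row for row in rows if EVIDENCE_GROUP.get(row[2]) in allowed_groups]


experimental_only = filter_by_group(annotations, {"experimental"})
print(experimental_only)
```

Dropping IEA annotations in this way trades coverage for confidence, since electronic annotations are not individually reviewed by a curator.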
  • 11. Different evidence codes can assign a degree of confidence to the assignment
  • 12. Gene Ontology structures all genes according to their biological significance
      - The GO structure and its terms can be browsed with a browser called AmiGO.
      - QuickGO from EBI offers some nice visualisations.
      - There is an excellent GO wiki for all your questions.
  • 13. GO can be used to retrieve all genes (gene products) related to one specific term
      You can search broadly: e.g. an AmiGO search for 'diabetes' leads to the following GO term.
      http://amigo.geneontology.org/
  • 14. GO can be used to retrieve all genes (gene products) related to one specific term
      AmiGO search for 'diabetes'
  • 15. GO can be used to retrieve all genes (gene products) related to one specific term
      AmiGO search for 'diabetes'
  • 16. GO is also useful to analyze and compare different gene lists
      Many GO tools are listed on the GO website.
      http://www.geneontology.org/GO.tools.shtml
  • 17. Some things to know about GO
      For analyses, one can use reduced GO sets, the so-called GO slims:
      - GO slims are subsets of the biologically more relevant GO terms (available per species).
      - GO ontologies can be downloaded in .obo format.
      Not all information is captured by GO; some needs to be retrieved from other databases:
      - Metabolic pathways: KEGG, …
      - Phenotypes/diseases
      - Mapping files exist, e.g. kegg2go
      http://www.geneontology.org/GO.slims.shtml
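To give an idea of what a downloaded .obo file looks like, here is a minimal sketch that reads `[Term]` stanzas from OBO-formatted text. The two stanzas are abridged (real GO stanzas carry more tags such as `namespace` and `def`), and the parser is a simplified assumption that only handles the `id`, `name` and `is_a` tags.

```python
# Abridged OBO text with two real GO terms; other tags are omitted.
OBO_TEXT = """\
[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0008152
name: metabolic process
is_a: GO:0008150 ! biological_process
"""


def parse_obo_terms(text):
    """Return {term id: {"name": ..., "is_a": [parent ids]}} for [Term] stanzas."""
    terms = {}
    current = None

    def store(stanza):
        if stanza and "id" in stanza:
            terms[stanza["id"]] = stanza

    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "[Term]":
            store(current)
            current = {"is_a": []}
        elif line.startswith("["):
            # Other stanza types (e.g. [Typedef]) are skipped in this sketch.
            store(current)
            current = None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            value = value.split(" ! ")[0]  # drop the trailing human-readable comment
            if key == "is_a":
                current["is_a"].append(value)
            elif key in ("id", "name"):
                current[key] = value
    store(current)
    return terms


terms = parse_obo_terms(OBO_TEXT)
print(terms["GO:0008152"])
```

For real analyses an established OBO parser is a safer choice than a hand-written one; the sketch only shows that the format is plain text with one `tag: value` pair per line.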
  • 18. Biological pathway databases organise genes by molecular reactions
      Three important databases on biological pathways:
      - http://www.kegg.jp/
      - http://www.reactome.org/ (EBI)
      - http://metacyc.org
  • 19. Proteins with enzymatic function receive an Enzyme Commission (EC) number
      http://www.chem.qmul.ac.uk/iubmb/enzyme/
      - EC 1 Oxidoreductases
      - EC 2 Transferases
      - EC 3 Hydrolases
      - EC 4 Lyases
      - EC 5 Isomerases
      - EC 6 Ligases
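The first digit of an EC number gives the enzyme class listed on this slide. A small sketch of that lookup follows; the function name `ec_class` is made up for the example, and the table holds only the six classes shown here (the IUBMB has since added a seventh class, the translocases, which this 2011 deck predates).

```python
# The six top-level EC classes from the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the top-level class name for an EC number such as 'EC 1.1.1.1'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    first_field = text.split(".")[0]
    # Returns None for classes not in the table above.
    return EC_CLASSES.get(int(first_field))


print(ec_class("EC 1.1.1.1"))  # alcohol dehydrogenase
print(ec_class("3.4.21.4"))    # trypsin
```

The remaining three fields narrow the class down to subclass, sub-subclass and the individual enzyme entry.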
  • 20. The IntAct database contains interaction information on proteins
      http://www.ebi.ac.uk/intact
      Three types of interactions are stored:
      - Protein-protein
      - Protein-DNA
      - Protein-small molecule
  • 21. IntAct database represents all interactions as binary: caution!
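One plausible reason for the caution, offered here as an assumption rather than as the slide's own explanation, is that an experiment which purifies a whole complex must be turned into pairs before it can be stored as binary interactions. The sketch below contrasts two common ways of doing that expansion; the protein names are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each co-purified protein (one pair per prey)."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(members):
    """Pair every member of the complex with every other member."""
    return list(combinations(members, 2))


# Hypothetical pull-down: bait A co-purifies with B, C and D.
spoke_pairs = spoke_expansion("A", ["B", "C", "D"])
matrix_pairs = matrix_expansion(["A", "B", "C", "D"])

print(len(spoke_pairs), spoke_pairs)
print(len(matrix_pairs), matrix_pairs)
```

Either way, some of the listed pairs may never touch each other directly in the cell, so a binary record is not by itself proof of physical contact.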
  • 22. Interaction networks can be analysed on your computer using Cytoscape
      Cytoscape training material is available on the BITS website.
  • 24. PDB hosts 3-dimensional structural data on molecules
      PDB = Protein Data Bank
      http://www.pdb.org/pdb/home/home.do
      Only structures resolved through NMR and X-ray crystallography (or other accurate techniques):
      - Proteins
      - DNA
      - RNA
      - Ligands
      Understanding PDB data: tutorial
  • 25. PDB files can be read by a lot of different tools to display the structure
      - Every entry in PDB has its own PDB accession number (four characters, typically one digit followed by three letters or digits).
      - The PDB file contains the 3D coordinates of every single atom in the structure, together with the variability of that position (last two digits).
      http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203817:protein-structure-
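To make the per-atom content of a PDB file concrete, the sketch below builds one `ATOM` record column by column and parses it back. The column positions follow the fixed-width PDB format; the coordinate and B-factor values are invented for illustration. The temperature factor (B-factor) is presumably what the slide means by the variability of a position, which is an interpretation on my part.

```python
# One ATOM record assembled field by field (1-based column ranges in comments).
# The numeric values are made up for this example.
SAMPLE_LINE = "".join([
    "ATOM".ljust(6),   # 1-6   record name
    f"{1:>5}",         # 7-11  atom serial number
    " ",               # 12    blank
    " N".ljust(4),     # 13-16 atom name
    " ",               # 17    alternate location indicator
    "MET",             # 18-20 residue name
    " ",               # 21    blank
    "A",               # 22    chain identifier
    f"{1:>4}",         # 23-26 residue sequence number
    " " * 4,           # 27-30 insertion code and blanks
    f"{27.340:8.3f}",  # 31-38 x coordinate (angstrom)
    f"{24.430:8.3f}",  # 39-46 y coordinate
    f"{2.614:8.3f}",   # 47-54 z coordinate
    f"{1.00:6.2f}",    # 55-60 occupancy
    f"{9.67:6.2f}",    # 61-66 temperature factor (B-factor)
    " " * 10,          # 67-76 blanks
    " N",              # 77-78 element symbol
])


def parse_atom_record(line):
    """Extract the main fields of an ATOM record using fixed column slices."""
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


atom = parse_atom_record(SAMPLE_LINE)
print(SAMPLE_LINE)
print(atom)
```

Because the format is column-based rather than whitespace-delimited, slicing by position is safer than calling `split()`, since adjacent numeric fields can run together without a space.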
  • 26. PDB files can be read by a lot of different tools to display the structure
      Tools to visualize (and some to analyze) structures: see the BITS wiki
      http://www.bits.vib.be/wiki/index.php/Protein_structure
  • 27. Finding a structure for your protein sequence means searching for similarity
      Homology modeling
      Similarity on sequence level, projected onto a structure:
      - BLAST your query against the PDB database with cblast, or at ExPASy
      - PSI-BLAST can detect sequences with similar structures (twilight zone!)
      - If still no success: 3D-Jury (a meta approach, including fold recognition and local structure prediction)
      Similarity on structural level, by aligning structures:
      - VAST (structure)
      - DALI (Distance mAtrix aLIgnment)
      BITS training on protein structure analysis
      http://www.ii.uib.no/~slars/bioinfocourse/PDFs/structpred_tutorial.pdf
      Tools at EBI
      http://consurf.tau.ac.il/pe/protexpl/psbiores.htm
  • 28. Structural information is used to classify proteins
      Database cross-references in a PDB entry:
      - SCOP: groups proteins based on evolutionary, domain architecture and structural information
      - CATH: manually curated classification of protein domains
      http://scop.mrc-lmb.cam.ac.uk/scop/
      http://www.cathdb.info/
  • 29. dbSNP is a public-domain archive for simple genetic polymorphisms
      - Single Nucleotide Polymorphism database (NCBI)
      - Each dbSNP entry has a code rsxx (RefSNP) or ssxx (submitted SNP)
      - Single-base nucleotide substitutions (also known as single nucleotide polymorphisms or SNPs)
      - Small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs)
      - Retroposable element insertions and microsatellite repeat variations (also called short tandem repeats or STRs)
      - Synchronized with new genome builds
  • 30. Expression data can be sequence-based or hybridisation-based
      Sequence-based (ESTs, RNA-seq, SAGE):
      - Digital gene expression/northern
      Microarray databases (hybridisation-based):
      - GEO: Gene Expression Omnibus (NCBI)
          - Platform: GPLxxxxxxx
          - Experiment: GSExxxxxx (= several samples)
          - Sample: GSMxxxxxxxx
          - Some experiments are curated: GDSxxxxx (online analysis possible)
      - ArrayExpress (EBI)
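The GEO accession prefixes above can be told apart with a short lookup. This is a small sketch: the function name `classify_geo_accession` and the example accession numbers are hypothetical, and only the four prefixes named on the slide are recognised.

```python
import re

# Prefix meanings as given on the slide.
GEO_PREFIXES = {
    "GPL": "platform",
    "GSE": "experiment (series of samples)",
    "GSM": "sample",
    "GDS": "curated dataset",
}

_ACCESSION = re.compile(r"^(GPL|GSE|GSM|GDS)(\d+)$")


def classify_geo_accession(accession):
    """Return the record type for a GEO accession, or None if it is not recognised."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        return None
    return GEO_PREFIXES[match.group(1)]


# The numbers below are placeholders, not references to real records.
for example in ["GSE12345", "gsm100", "GPL570", "rs123"]:
    print(example, "->", classify_geo_accession(example))
```

Checking the prefix first is a cheap way to catch mixed-up identifiers, for instance a sample accession passed where a whole experiment is expected.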
  • 31. Example of expression data at GEO
  • 32. Example of expression data at GEO
  • 33. Example of expression data at GEO
  • 36. Entrez interconnects the databases at NCBI for easy querying
      - UniGene: sequences grouped by gene
      - PopSet: sequence alignments for population studies and phylogeny
      - Structure: 3D structures (PDB)
      - Genome: genomic maps of chromosomes and plasmids
      - UniSTS (Sequence Tagged Sites)
      - PubMed: literature abstracts (MEDLINE, …)
      - OMIM (Online Mendelian Inheritance in Man): literature reviews
      - MeSH (Medical Subject Headings): keywords
      - Taxonomy
  • 38. Summarizing the most important links to discover everything you need ...
      Protein data
      - InterPro (heavily integrated with EBI resources): http://www.interpro.org
      Gene data
      - Entrez at NCBI ('Entrez Gene'): http://www.ncbi.nlm.nih.gov/Entrez/
      - EB-eye search at EBI (excellent for cross-species searches): http://www.ebi.ac.uk/ebisearch/
  • 39. Hold your horses! Phew, where do I put all this?
  • 40. Bioinformatics is all about different data, as versatile as life itself
      Thanks to the strong cross-references between databases, new databases and relevant information are rapidly integrated into existing ones. You can discover them by taking the time to read the entries.
  • 41. New tools are emerging every day to enable you to browse all data sources...
      BioGPS, all in one window!
  • 42. New tools are emerging every day to enable you to browse all data sources...
  • 43. Integrative resources are increasingly being organised on a species basis
      - EMAGE: database of in situ gene expression in mouse
      - OMIM: database of diseases in man
      - Websites providing an interface to integrate all this data are increasingly important
      - Often organized on a species basis:
          - TAIR
          - FlyBase
          - WormBase
  • 44. Organizing biological data by species
      By species, why? There is one biological information resource which stays more or less unchanged per species ...

Editor's notes

  1. 'translation', whereas another uses the phrase 'protein synthesis',
  2. GO hierarchy can be downloaded (obo format). GO Slim: selection of categories.
  3. Different types: ribbon, cartoon, ball and stick, space filling.