Hospital Universitari Vall d’Hebron
Institut de Recerca - VHIR
Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)
Bioinformàtica per la
Recerca Biomèdica
http://ueb.vhir.org/2014BRB
Alex Sánchez
alex.sanchez@vhir.org
13/05/2014
STORING AND ACCESSING INFORMATION
DATABASES AND QUERIES
PRESENTATION OUTLINE
1. Data banks and databases
● Information in the genomics era
● Distinct DB usages
● To take into account
● Main resources providers
2. Types of databases
● EMBL vs NCBI
● Bibliography DB
● Taxonomy DB
● Nucleotide DB
● Genome DB
● Protein DB
● Microarray DB
● Other DB
● Lists of DB
3. Structure and formats of the databases
● Structure of the DB
● Formats of the DB
● Sequence FASTA format
● GenBank entry example
● EMBL entry example
4. Submitting data
● Submitting sequences
● Submitting expression data
5. Tools for DB exploitation
● ENTREZ
● Cross-search tables
● Entrez queries
● Entrez fields
● Help system
Data banks and databases
INFORMATION IN THE GENOMICS ERA
• Genomics era: huge amounts of
data
• To be usable, this information
must be properly stored
• Access to that information
– Must be quick
– Must be flexible
• That is possible thanks to
– The creation of databases
– Their online availability
DISTINCT DB USAGES
• Information search
– By keyword, accession number, authors…
• Homology search
– Is there any sequence identical or similar to mine?
• Pattern search
– Does my sequence contain any known pattern?
• Predictions
– Can I find proteins of already known function that are similar to
mine?
Bioinformatics reagent: Databases
Organized array of information
Place where you put things in, and (if all is well)
you should be able to get them out again.
Resource for other databases and tools.
Simplify the information space by specialization.
Bonus: Allows you to make discoveries.
Important question to ask:
what is the data model?
Bioinformatics experiments:
BLAST search, sequence alignment
Reagents:
• Sequence
• Databases
Method (query vs database; N = nucleotide, P = protein):
• P-P: BLASTP
• N-P: BLASTX
• P-N: TBLASTN
• N-N: BLASTN
• N (P) – N (P): TBLASTX (both sides translated)
Interpretation:
•Similarity
•Hypothesis testing
Know
your reagents
Know
your methods
Do your controls
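The method list on this slide is essentially a lookup table from query type and database type to a BLAST program. A minimal sketch of that lookup follows; the function name `choose_blast` and the `translate_both` flag are hypothetical names introduced only for illustration.

```python
# Query type / database type -> BLAST program, as listed on the slide.
# "N" = nucleotide, "P" = protein.
BLAST_PROGRAMS = {
    ("P", "P"): "blastp",   # protein query vs protein database
    ("N", "P"): "blastx",   # translated nucleotide query vs protein database
    ("P", "N"): "tblastn",  # protein query vs translated nucleotide database
    ("N", "N"): "blastn",   # nucleotide query vs nucleotide database
}


def choose_blast(query_type, db_type, translate_both=False):
    """Return the BLAST program name for a query/database combination."""
    q, d = query_type.upper(), db_type.upper()
    if translate_both:
        # tblastx translates both a nucleotide query and a nucleotide database
        if (q, d) != ("N", "N"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(q, d)]


print(choose_blast("N", "P"))
```

Writing the choice down as data makes the "know your methods" advice concrete: the program follows from what the query and the database contain.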
Nature 409:452
Bioinformatics Citizenship: What it means,
and what does it cost?
Databases: four layers, with examples of each
• Data: GenBank flat file, COSMIC record, interaction record, title of a book, book
• Storage system: boxes, Oracle, MySQL, PC binary files, Unix text files, bookshelves
• Query system: a list you look at, a catalogue, indexed files, SQL, grep
• Information system: the Library of Congress, Google, Entrez, Ensembl, UCSC genome browser
TO TAKE INTO ACCOUNT
Information organization
• Resource providers: organizations or centers devoted to offering and maintaining the databases
• Databases: diverse and very different information
• Tools: to find, check, and export information into/from the DB
MAIN RESOURCES PROVIDERS
• The National Center for Biotechnology Information
(NCBI) offers data banks, databases and tools in the
USA
• The European Bioinformatics Institute (EBI) plays a
similar role in Europe
• GenomeNet gathers several databases from Japan
Types of databases
TYPES OF DB
• There are hundreds of DB, so it is not feasible to
enumerate them all (although some listings attempt it)
• We can classify them by multiple criteria
• The structural organization of the EMBL and the
NCBI resources is radically different
EMBL vs NCBI
• EMBL
– Bibliographic DB
– Taxonomic DB
– Nucleotide DB
– Genomic DB
– Protein DB
– Microarray DB
…
• NCBI
– PubMed
– Entrez
– OMIM
– Books
– TaxBrowser
– Structure
…
BIBLIOGRAPHY DB
• Collection of papers published in
scientific journals
– PubMed (NCBI)
– Medline (EBI)
– Biocatalog: papers organized by
concrete molecular biology topics
TAXONOMY DB
• Information on the
classification of living things
– basically hierarchical
– and based on molecular
evidence
• Aims to classify any organism
from which at least one
nucleic acid sequence has
been determined
• There is indeed some
controversy in the scientific
community
NUCLEOTIDE DB
• Sequences from experimental laboratories
• Updated daily
• Contents exchanged daily among
– GenBank (NCBI)
– EMBL (EBI)
– DDBJ (Japan)
Sequences NOT in NucleotideDB
• WGS: whole genome shotgun
• TPA: third party annotations
• SNPs
• SAGE tags (serial analysis of gene expression)
• RefSeq (Genomic, mRNA, or protein)
• Consensus sequences
GENOME DB
• Sequences and annotations of
whole genomes
– Ensembl (EBI)
– Genome viewer (NCBI)
– Goldenpath (UCSC)
• Specialized genomic resources
– TRANSFAC
– EST
– UTRDB
– SpliceSitesDB
…
PROTEIN DB (I)
• Amino acid primary
sequences
– Without human revision
• TrEMBL (EBI)
• NR (NCBI)
– With curated annotation
• UniProt (EBI)
– Proteome DB
• Proteome analysis (EBI)
PROTEIN DB (II)
• Secondary structures or protein domains
• They depend on the protein source and the analysis
performed on them
– PROSITE: regular expressions over Swiss-Prot
– PRINTS: set of motifs that define a family over Swiss-Prot/TrEMBL
– BLOCKS: aligned motifs from PROSITE/PRINTS
– PFAM: hidden Markov models over Swiss-Prot
– INTERPRO: integrates information from several domain-focused databases.
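PROSITE describes motifs with its own pattern notation, which maps closely onto ordinary regular expressions. The sketch below converts a pattern written in that notation into a Python regex. It assumes a well-formed pattern, covers only the common elements (`x`, `[...]`, `{...}`, repeat counts, `<` and `>` anchors), and the function name `prosite_to_regex` and the test sequence are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repeat count such as (3) or (2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} = any residue except P
        # [ST] (one of S or T) and single letters are already valid regex
        if repeat:
            core += "{" + repeat + "}"
        parts.append(("^" if anchored_start else "") + core
                     + ("$" if anchored_end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
hit = re.search(regex, "AANGSAA")  # made-up peptide used only as a test input
print(regex, hit.group())
```

Here `N-{P}-[ST]-{P}` becomes `N[^P][ST][^P]`, so any regex engine can scan a sequence for the motif.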
PROTEIN DB (III)
• 3D structures with coordinates
of each atom
– PDB: Reference protein 3D
structure (x-ray, NMR) database
– CATH: Classification of the PDB
into different functional and
structural groups
– MMDB: subset of the PDB
maintained by the NCBI
– MSD: subset of the PDB
maintained by the EBI
MICROARRAY DB
• Expression array results
– ArrayExpress
– caArray
– Gene Expression Omnibus
OTHER DB (1)
• Biological annotations
– Gene Ontology
– KEGG
– GeneCards
• Therapeutic targets
– Therapeutic targets database
– PharmGKB
…
Historical perspective on the Human
Genome Data
Human Expressed Seq Tags (mRNA) sequencing
Human genome mapping and sequencing
Population analysis and polymorphism measurements
Genome Wide Association Studies
<the Homer paper>
The Cancer Genome Atlas pilot
The 1000 Genomes Project
The Cancer Genome Atlas
The International Cancer Genome Consortium
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
– Region of residence
– Risk factors
– Examination
– Surgery
– Drugs
– Radiation
– Sample
– Slide
– Specific histological features
– Analyte
– Aliquot
– Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
ICGC OA Datasets
• Cancer Pathology
– Histologic type or subtype
– Histologic nuclear grade
• Patient/Person
– Gender
– Age range
• Gene Expression (normalized)
• DNA methylation
• Genotype frequencies
• Computed Copy Number and Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV
Main source of Cancer Data: ICGC
http://dcc.icgc.org/
Module 2a bioinformatics.ca
Another important source of cancer data: COSMIC
http://www.sanger.ac.uk/genetics/CGP/cosmic/
What is Cancer Data?
Structured Clinical Data about the patient
Structured Clinical Data about the treatment
Structured Clinical Data about the tumor
Associated with a number of
positions (hundreds, if not
thousands) in the nucleotide
coordinate system of one
reference genome.
ICGC is implementing NCBI’s bioprojects
http://www.ncbi.nlm.nih.gov/bioproject
LISTS OF DB
Nucleic Acids Research Database Listing
– Annual Database issue
http://www.oxfordjournals.org/nar/database/c/
– Supplement that comes with each year’s January issue
– The 2013 issue describes 1512 databases, sorted into 14
categories and 41 subcategories.
– They are added to the Nucleic Acids Research
online Molecular Biology Database Collection
– Good starting point for selecting the appropriate DB
Structure and formats
of the DB
STRUCTURE OF THE DB
• The way data are organized in any DB
depends mainly on the model or architecture
on which it is based
• There are multiple models
Relational, Hierarchical, Network-based…
but the most common is relational
– Several tables, which can have relationships
between them
– The relationships are established through key fields
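The relational model described above can be sketched with Python's built-in `sqlite3` module: two tables linked through a key field and queried with a join. The table and column names are hypothetical; the two rows reuse the accession numbers and organism names from the FASTA example in this deck.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,   -- key field
    name        TEXT NOT NULL
);
CREATE TABLE sequence (
    accession   TEXT PRIMARY KEY,
    description TEXT,
    organism_id INTEGER REFERENCES organism(organism_id)  -- link to organism
);
""")
conn.execute("INSERT INTO organism VALUES (1, 'Human echovirus 29')")
conn.execute("INSERT INTO organism VALUES (2, 'Human echovirus 6')")
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", [
    ("AF405321.1", "5' UTR, partial sequence", 1),
    ("AF405325.1", "5' UTR, partial sequence", 2),
])

# Follow the relationship between the two tables through the key field
rows = conn.execute("""
    SELECT s.accession, o.name
    FROM sequence AS s
    JOIN organism AS o ON s.organism_id = o.organism_id
    ORDER BY s.accession
""").fetchall()
print(rows)
```

Each organism is stored once and referenced by its key, which is the main reason for splitting data across tables.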
FORMATS OF THE DB
• Working with relational DB implies the use of
plain (flat) data formats
– Text files
– Some kind of labels to specify the contents of
every line or region of the file
• There are multiple formats, so a good
program or application should be able to
recognize (and even convert between) them.
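As an illustration of "labels to specify the contents of every line", here is a minimal sketch of a parser for a flat file whose lines start with a two-letter label. It assumes an EMBL-style layout (label in the first two columns, content from the sixth column, `//` closing a record). The sample record is made up and the function name `parse_tagged_records` is hypothetical.

```python
def parse_tagged_records(text):
    """Parse a flat file where each line starts with a two-letter label.

    Assumes the label sits in columns 1-2, the content starts at column 6,
    and a line starting with '//' closes each record.
    """
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):
            records.append(current)
            current = {}
            continue
        if not line.strip():
            continue
        label, content = line[:2], line[5:].strip()
        # A label may repeat (continuation lines), so keep a list per label
        current.setdefault(label, []).append(content)
    if current:
        records.append(current)
    return records


# Made-up record, for illustration only
sample = """\
ID   EXAMPLE1
AC   U12345;
DE   Made-up description, first line
DE   and its continuation
//
"""
records = parse_tagged_records(sample)
print(records[0]["DE"])
```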
SEQUENCE FASTA FORMAT
1st line: >identifier, followed by additional info
Following lines: sequence
>gi|15341523|gb|AF405321.1| Human echovirus 29 strain JV-10 5' UTR, partial sequence
CAAGCACTTCTGTTTCCCCGGACTGAGTATCAATAGACTGCTCACGCGGTTGAAGGAGAAAACGTTCGTT
ATCCGGCCAACTACTTCGAGAAACCTAGTAACGCCATGGAAGTTGTGGAGTGTTTCGCTCAGCACTACCC
CAGTGTAGATCAGGTTGATGAGTCACCGCATTCCCCACGGGTGACCGTGGCGGTGGCTGCGTTGGCGGCC
TGCCCATGGGGAAACCCATGGGACGCTCTTATACAGACATGGTGCGAAGAGTCTATTGAGCTAGTTGGTA
GTCCTCCGGCCCCTGAATGCGGCTAATCCCAACTGCGGAGCATACACTCTCAAGCCAGAGGGTAGTGTGT
CGTAATGGGCAACTCTGCAGCGGAACCGACTACTTTGGGT
>gi|15341527|gb|AF405325.1| Human echovirus 6 strain D' Amori 5' UTR, partial sequence
CAAGCACTTCTGTTTCCCCGGACCGAGTATCAATAAGCTGCTCACGCGGCTGAAGGAGAAAGTGTTCGTT
ACCCGGCTAGTTACTTCGAGAAACCTAGTACCACCATGAAGGTTGCGCAGCGTTTCGCTCCGCACAACCC
CAGTGTAGATCAGGTCGATGAGTCACCGCGTTCCCCACGGGCGACCGTGGCGGTGGCTGCGTTGGCGGCC
TGCCCATGGGGCAACCCATGGGACGCTTCAATACTGACATGGTGCGAAGAGTCTATTGAGCTAACTAGTA
GTCCTCCGGCCCCTGAATGCGGATAATCTTAACTGCGGAGCAGGTGCTCACAATCCAGTGGGTGGCCTGT
CGTAACGGGCAACTCTGCAGCGGAACCGACTACTTTGGGT
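Reading this format takes only a few lines of code. The sketch below splits FASTA text into (header, sequence) pairs and separates the identifier from the additional info. The function names are hypothetical, and the example record is shortened from the first entry above.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_header(header):
    """Split a header into the identifier and the additional info."""
    identifier, _, info = header.partition(" ")
    return identifier, info


# Shortened version of the first record shown above
example = (
    ">gi|15341523|gb|AF405321.1| Human echovirus 29 strain JV-10 5' UTR, partial sequence\n"
    "CAAGCACTTC\n"
    "TGTTTCCCCG\n"
)
records = parse_fasta(example)
identifier, info = split_header(records[0][0])
print(identifier, len(records[0][1]))
```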
GENBANK ENTRY EXAMPLE
EMBL ENTRY EXAMPLE
Submitting data
SUBMITTING DATA
• Several biological databases are public, so
any (properly identified) user can contribute
by uploading new data
• There are multiple types of data to upload,
but the most common are
– Sequences
– Expression data (from microarrays)
SUBMITTING SEQUENCES
How to submit your sequences to…
• EMBL
– http://www.ebi.ac.uk/embl/Submission/
• GenBank
– http://www.nlm.nih.gov/pubs/factsheets/sdgenbk.html
SUBMITTING EXPRESSION DATA
And your expression data to…
• ArrayExpress (EBI)
– http://www.ebi.ac.uk/microarray/submissions.html
• Gene Expression Omnibus (NCBI)
– https://www.ncbi.nlm.nih.gov/geo/info/faq.html
Tools for DB exploitation
ENTREZ
• It is the NCBI’s search system
• Great power and versatility, but less intuitive
than SRS
• It doesn’t provide forms for each field
• Usually used in a top-down manner
– Perform a first query
– Refine the results until you reach what you are
looking for.
CROSS-SEARCH TABLES
ENTREZ QUERIES
• Operators: AND, OR, NOT (Boolean); “” (exact phrase); * (wildcard)
• AND is applied by default
• Query by Accession Numbers (AC) in
– GenBank / EMBL / DDBJ:
• 1 letter + 5 digits (U12345)
• 2 letters + 6 digits (AF123456)
– SwissProt / PIR:
• 1 letter + 5 digits (P12345)
• Refine queries with the reserved word LIMITS
• Combine queries with HISTORY
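The accession number formats and Boolean combination listed above can be sketched in a few lines. The code checks an accession against the two formats on this slide and joins search terms with AND. As an assumption beyond the slides, it also builds a URL for NCBI's E-utilities `esearch` endpoint, the programmatic counterpart of the Entrez web interface. Function names are hypothetical, the field tags `[Gene Name]` and `[Organism]` are used as example field qualifiers, and no network request is made.

```python
import re
from urllib.parse import urlencode

ONE_PLUS_FIVE = re.compile(r"[A-Z]\d{5}")    # e.g. U12345, P12345
TWO_PLUS_SIX = re.compile(r"[A-Z]{2}\d{6}")  # e.g. AF123456


def accession_format(accession):
    """Return '1+5', '2+6', or None for an accession number."""
    acc = accession.split(".")[0].upper()  # drop a version suffix such as .1
    if ONE_PLUS_FIVE.fullmatch(acc):
        return "1+5"
    if TWO_PLUS_SIX.fullmatch(acc):
        return "2+6"
    return None


def esearch_url(db, *terms):
    """Combine terms with AND and build an esearch URL (not sent anywhere)."""
    query = " AND ".join(terms)
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
    return base + urlencode({"db": db, "term": query})


print(accession_format("AF405321.1"))
print(esearch_url("nucleotide", "MLH1[Gene Name]", "Homo sapiens[Organism]"))
```

Only the two formats named on the slide are recognized; other accession formats return `None` in this sketch.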
ENTREZ AVAILABLE FIELDS
HELP AND INFORMATION SYSTEM
We are interested in the human MLH1 gene, implicated in colon
cancer
– Separate the wheat from the chaff: identify a representative,
well-annotated mRNA sequence of the MLH1 gene.
– Retrieve the associated literature and the protein sequence.
– Identify similar proteins.
– Identify conserved domains within the protein.
– Identify known mutations in the gene or the protein.
– Find the three-dimensional structure of the protein, if it is
known; otherwise, identify structures of homologous sequences.
– View the genomic context of the gene and download the region that contains it.
Vall d'Hebron Institut de Recerca 21/06/2011
Entrez search examples
Direct query (1.1)
Direct query (1.2) Limits
Direct query (1.3) Filters
Direct query (1.4) Record
Query (2) Links to other DB
Query (3) Sequences
Query (4) Protein
Query (5.1) Mutations
Query (5.2) SNPs
Query (5.3) OMIM
Query (6.1) Structures
Mouse over the residues of NP_000240 until the grey footer bar shows ‘gi
4557757, loc 67’ (Glycine). Click on the corresponding Glycine residue in
1H7U_A (loc 74) to highlight it.
In the structure window use the left mouse button to spin the 3D structure until
you can clearly see and identify the highlighted residue. Is it possibly in
the active site? For example, is it within 5 Å of the ATPS molecule?
Double click on the Mg-complexed ATPS to highlight it. Then use the menu bar
option called ‘Show/Hide|Select By Distance|Residues Only’ to highlight
all residues within 5 Å of the ATPS. Indeed, the Glycine at position #74 is
within 5 Å and is likely part of the active site for this energy-producing
domain. This hints at the possible problems a Gly → Trp mutation might
cause at that position.
Query (6.2) Sequence and structure alignment
Query (7) Visualization in genomic context

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptxArvind Kumar
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxSilpa
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Silpa
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 

Último (20)

FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 

Storing and Accessing Information. Databases and Queries (UEB-UAT Bioinformatics Course - Session 1.2 - VHIR, Barcelona)

  • 1. STORING AND ACCESSING INFORMATION: DATABASES AND QUERIES
      Hospital Universitari Vall d’Hebron, Institut de Recerca (VHIR); Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)
      Course: Bioinformàtica per la Recerca Biomèdica (Bioinformatics for Biomedical Research), http://ueb.vhir.org/2014BRB
      Alex Sánchez, alex.sanchez@vhir.org, 13/05/2014
  • 2. PRESENTATION OUTLINE
      1. Data banks and databases
         ● Information in the genomics era
         ● Distinct DB usages
         ● To take into account
         ● Main resources providers
      2. Types of databases
         ● EMBL vs NCBI
         ● Bibliography DB
         ● Taxonomy DB
         ● Nucleotide DB
         ● Genome DB
         ● Protein DB
         ● Microarray DB
         ● Other DB
         ● Lists of DB
      3. Structure and formats of the databases
         ● Structure of the DB
         ● Formats of the DB
         ● Sequence FASTA format
         ● GenBank entry example
         ● EMBL entry example
      4. Submitting data
         ● Submitting sequences
         ● Submitting expression data
      5. Tools for DB exploitation
         ● ENTREZ
         ● Cross-search tables
         ● Entrez queries
         ● Entrez fields
         ● Help system
  • 3. Data banks and databases
  • 4. INFORMATION IN THE GENOMICS ERA
      • Genomics era: a huge amount of data
      • To be usable, this information must be properly stored
      • Access to that information
        – Must be quick
        – Must be flexible
      • This is possible thanks to
        – The creation of databases
        – Their online availability
  • 5. DISTINCT DB USAGES
      • Information search
        – By keyword, accession number, authors…
      • Homology search
        – Is there any sequence identical or similar to mine?
      • Pattern search
        – Does my sequence contain any known pattern?
      • Predictions
        – Can I find proteins of known function that are similar to mine?
  • 6. Bioinformatics reagent: Databases
      • An organized array of information
      • A place where you put things in and (if all is well) should be able to get them out again
      • A resource for other databases and tools
      • Simplify the information space by specialization
      • Bonus: allow you to make discoveries
      • Important question to ask: what is the data model?
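  • 7. Bioinformatics experiments: BLAST search / sequence alignment
      Reagents:
      • Sequence
      • Databases
      Method (query type – database type: program; P = protein, N = nucleotide):
      • P – P: BLASTP
      • N – P: BLASTX
      • P – N: TBLASTN
      • N – N: BLASTN
      • N (P) – N (P): TBLASTX (nucleotide query and nucleotide database, both compared as translated protein)
      Interpretation:
      • Similarity
      • Hypothesis testing
      Know your reagents. Know your methods. Do your controls.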
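The method table on this slide can be read as a simple lookup from the query and database types to a program name. The following is a minimal sketch of that lookup only; the function name `choose_blast` and the `translate_both` flag are hypothetical, and no BLAST software is actually invoked.

```python
# Lookup table taken from the slide: (query type, database type) -> program.
# "P" = protein, "N" = nucleotide.
BLAST_PROGRAMS = {
    ("P", "P"): "blastp",
    ("N", "P"): "blastx",
    ("P", "N"): "tblastn",
    ("N", "N"): "blastn",
}


def choose_blast(query_type, db_type, translate_both=False):
    """Return the BLAST program name for a query/database pair.

    translate_both is a hypothetical flag for the TBLASTX case, where a
    nucleotide query and a nucleotide database are both compared as
    translated protein.
    """
    if translate_both:
        if (query_type, db_type) != ("N", "N"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return BLAST_PROGRAMS[(query_type, db_type)]
```

For instance, `choose_blast("N", "P")` selects the program for a nucleotide query searched against a protein database.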
  • 8. Nature 409:452. Bioinformatics Citizenship: What it means, and what does it cost?
  • 10. Databases: information system = query system + storage system + data
      Data examples: a GenBank flat file, a COSMIC record, an interaction record, the title of a book, a book
  • 11. Databases: information system = query system + storage system + data
      Storage system examples: boxes, Oracle, MySQL, PC binary files, Unix text files, bookshelves
  • 12. Databases: information system = query system + storage system + data
      Query system examples: a list you look at, a catalogue, indexed files, SQL, grep
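Two of the query systems named here, grep and indexed files, differ in how they reach the data: one reads every record, the other consults a catalogue built in advance. A minimal sketch of that contrast, using the two FASTA descriptions shown later in the deck as data; the function names `scan` and `build_index` are illustrative assumptions.

```python
# Two records taken from the FASTA slide of this deck: (accession, description).
records = [
    ("AF405321", "Human echovirus 29 strain JV-10 5' UTR, partial sequence"),
    ("AF405325", "Human echovirus 6 strain D' Amori 5' UTR, partial sequence"),
]


def scan(records, word):
    """grep-like query: read every record and keep those containing the word."""
    return [acc for acc, text in records if word.lower() in text.lower()]


def build_index(records):
    """Catalogue-like query: map each lower-cased word to its accessions."""
    index = {}
    for acc, text in records:
        for token in text.lower().replace(",", " ").split():
            index.setdefault(token, set()).add(acc)
    return index


index = build_index(records)
hits_by_scan = scan(records, "JV-10")
hits_by_index = index.get("jv-10", set())
```

The scan costs one pass over all data per query, while the index is built once and then answers each whole-word query with a single dictionary lookup.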
  • 13. Databases: information system = query system + storage system + data
      Information system examples: the Library of Congress, Google, Entrez, Ensembl, UCSC genome browser
  • 14. TO TAKE INTO ACCOUNT
      Information organization:
      • Resource providers: organizations or centers devoted to offering and maintaining the databases
      • Databases: diverse and very different information
      • Tools: to find, check, and export information into/from the DB
  • 15. MAIN RESOURCES PROVIDERS
      • The National Center for Biotechnology Information (NCBI) offers data banks, databases and tools in the USA
      • The European Bioinformatics Institute (EBI) performs a similar function in Europe
      • GenomeNet gathers several databases from Japan
  • 17. TYPES OF DB
      • There are hundreds of DB, so it is not feasible to enumerate them all (though some have tried)
      • We can classify them by multiple criteria
      • The structural organization of the EMBL and the NCBI resources is radically different
  • 18. EMBL vs NCBI
      • EMBL
        – Bibliographic DB
        – Taxonomic DB
        – Nucleotide DB
        – Genomic DB
        – Protein DB
        – Microarray DB …
      • NCBI
        – PubMed
        – Entrez
        – OMIM
        – Books
        – TaxBrowser
        – Structure …
  • 19. BIBLIOGRAPHY DB
      • Collection of papers published in scientific journals
        – PubMed (NCBI)
        – Medline (EBI)
        – Biocatalog: papers organized by specific molecular biology topics
  • 20. TAXONOMY DB
      • Information on the classification of living things
        – Basically hierarchical
        – Based on molecular evidence
      • Aims to classify any organism for which at least one nucleic acid sequence has been determined
      • There is indeed some controversy in the scientific community
  • 21. NUCLEOTIDE DB
      • Sequences from experimental laboratories
      • Updated daily
      • Contents exchanged daily among
        – GenBank (NCBI)
        – EMBL (EBI)
        – DDBJ (Japan)
  • 22. Sequences NOT in Nucleotide DB
      • WGS: whole genome shotgun
      • TPA: third party annotations
      • SNPs
      • SAGE tags (serial analysis of gene expression)
      • RefSeq (genomic, mRNA, or protein)
      • Consensus sequences
  • 23. GENOME DB
      • Sequences and annotations of whole genomes
        – Ensembl (EBI)
        – Genome viewer (NCBI)
        – Goldenpath (UCSC)
      • Specialized genomic resources
        – TRANSFAC
        – EST
        – UTRDB
        – SpliceSitesDB …
  • 24. PROTEIN DB (I)
      • Amino acid primary sequences
        – Without human revision
          • TrEMBL (EBI)
          • NR (NCBI)
        – With curated annotation
          • UniProt (EBI)
        – Proteome DB
          • Proteome analysis (EBI)
  • 25. PROTEIN DB (II)
      • Secondary structures or protein domains
      • They depend on the protein source and the analysis performed on it
        – PROSITE: regular expressions over Swiss-Prot
        – PRINTS: sets of motifs that define a family, over Swiss-Prot/TrEMBL
        – BLOCKS: aligned motifs from PROSITE/PRINTS
        – PFAM: hidden Markov models over Swiss-Prot
        – INTERPRO: integrates information from several domain-focused databases
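To make "regular expressions over Swiss-Prot" concrete, the sketch below translates a pattern written in PROSITE syntax into a Python regular expression. It is a simplified illustration that handles only the common elements (`x`, `[...]`, `{...}`, repetition counts, and `<`/`>` anchors); the function name `prosite_to_regex`, the zinc-finger-style example pattern, and the test peptide are assumptions chosen for illustration, not taken from the slides.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: elements are separated by '-', 'x' means any residue,
    [ABC] lists allowed residues, {ABC} lists forbidden residues, (n) or
    (n,m) gives a repetition count, '<' and '>' anchor the sequence ends.
    """
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")
        at_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repetition suffix such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # [ABC] classes and single letters are already valid regex syntax.
        piece = core + ("{" + repeat + "}" if repeat else "")
        pieces.append(("^" if at_start else "") + piece + ("$" if at_end else ""))
    return "".join(pieces)


# Illustrative zinc-finger-style pattern and a made-up peptide that fits it.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
peptide = "GG" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"
found = re.search(regex, peptide)
```

Scanning a sequence database for a motif then amounts to running `re.search` with the translated expression over each protein sequence.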
  • 26. PROTEIN DB (III)
      • 3D structures with the coordinates of each atom
        – PDB: reference database of protein 3D structures (X-ray, NMR)
        – CATH: classification of the PDB into functional and structural groups
        – MMDB: subset of the PDB maintained by the NCBI
        – MSD: subset of the PDB maintained by the EBI
  • 27. MICROARRAY DB
      • Expression array results
        – ArrayExpress
        – caArray
        – Gene Expression Omnibus
  • 28. OTHER DB (1)
      • Biological annotations
        – Gene Ontology
        – KEGG
        – GeneCards
      • Therapeutic targets
        – Therapeutic Targets Database
        – PharmGKB …
  • 29. Historical perspective on the Human Genome Data
      • Human Expressed Sequence Tags (mRNA) sequencing
      • Human genome mapping and sequencing
      • Population analysis and polymorphism measurements
      • Genome Wide Association Studies
      • <the Homer paper>
      • The Cancer Genome Atlas pilot
      • The 1000 Genomes Project
      • The Cancer Genome Atlas
      • The International Cancer Genome Consortium
  • 30. Main source of Cancer Data: ICGC (http://goo.gl/w4mrV)
      ICGC Controlled Access Datasets
      • Detailed Phenotype and Outcome data
      • Region of residence
      • Risk factors
      • Examination
      • Surgery
      • Drugs
      • Radiation
      • Sample
      • Slide
      • Specific histological features
      • Analyte
      • Aliquot
      • Donor notes
      • Gene Expression (probe-level data)
      • Raw genotype calls
      • Gene-sample identifier links
      • Genome sequence files
      ICGC Open Access (OA) Datasets
      • Cancer Pathology: histologic type or subtype, histologic nuclear grade
      • Patient/Person: gender, age range
      • Gene Expression (normalized)
      • DNA methylation
      • Genotype frequencies
      • Computed Copy Number and Loss of Heterozygosity
      • Newly discovered somatic variants
  • 33. Another important source of Cancer Data: COSMIC, http://www.sanger.ac.uk/genetics/CGP/cosmic/
  • 34. What is Cancer Data? (Module 2a, bioinformatics.ca)
      • Structured clinical data about the patient
      • Structured clinical data about the treatment
      • Structured clinical data about the tumor
      • All associated with a number of positions (hundreds, if not thousands) in the nucleotide coordinate system of one reference genome
  • 35. ICGC is implementing NCBI’s bioprojects http://www.ncbi.nlm.nih.gov/bioproject
  • 36. LISTS OF DB
      Nucleic Acids Research Database Listing
      – Annual Database issue: http://www.oxfordjournals.org/nar/database/c/
      – Supplement that comes with each year’s January issue
      – The 2013 edition describes 1512 databases (2009: 179), sorted into 14 categories and 41 subcategories
      – They are added to the Nucleic Acids Research online Molecular Biology Database Collection
      – Good starting point for selecting the appropriate DB
  • 38. Structure and formats of the DB
  • 39. STRUCTURE OF THE DB
      • The way data are organized in any DB depends mainly on the model or architecture on which it is based
      • There are multiple models (relational, hierarchical, network-based…), but the most common is the relational one
        – Several tables that can have relationships between them
        – The relationships are established through key fields
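The relational model described on this slide, several tables linked through key fields, can be sketched with Python's built-in `sqlite3` module. The two tables and their column names below are hypothetical, invented only to show the idea; the sample rows reuse the two echovirus records from the FASTA slide of this deck.

```python
import sqlite3

# In-memory database with two tables; the schema is an illustrative assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organism (
    organism_id INTEGER PRIMARY KEY,   -- key field
    name        TEXT NOT NULL
);
CREATE TABLE sequence (
    accession   TEXT PRIMARY KEY,      -- key field
    strain      TEXT,
    organism_id INTEGER REFERENCES organism(organism_id)  -- relationship
);
""")
conn.executemany(
    "INSERT INTO organism VALUES (?, ?)",
    [(1, "Human echovirus 29"), (2, "Human echovirus 6")],
)
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?, ?)",
    [("AF405321", "JV-10", 1), ("AF405325", "D' Amori", 2)],
)

# Follow the relationship between the tables through the shared key field.
rows = conn.execute("""
    SELECT s.accession, o.name
    FROM sequence AS s
    JOIN organism AS o ON o.organism_id = s.organism_id
    ORDER BY s.accession
""").fetchall()
```

The `JOIN` is what turns two separate tables back into one answer: each sequence row is matched to its organism row through `organism_id`.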
  • 40. FORMATS OF THE DB
      • Working with relational DB implies the use of flat (plain-text) data formats
        – Text files
        – Some kind of label to specify the contents of every line or region of the file
      • There are multiple formats, so a good program or application should be able to recognize (and even convert between) them
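A labelled flat file can be read with very little code, because the label at the start of each line says what the rest of the line contains. The sketch below parses a simplified record that uses EMBL-style two-letter line codes (`ID`, `AC`, `DE`) and the `//` terminator; the record text is a made-up miniature built from the first FASTA header in this deck, not a real database entry, and `parse_labelled_record` is a hypothetical name.

```python
# Simplified, made-up record in the style of a labelled flat file.
RECORD = """\
ID   AF405321
AC   AF405321;
DE   Human echovirus 29 strain JV-10 5' UTR,
DE   partial sequence
//
"""


def parse_labelled_record(text):
    """Group line contents by their leading label; stop at the '//' terminator."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("//"):
            break
        if not line.strip():
            continue
        # The label is everything before the first space on the line.
        label, _, value = line.partition(" ")
        fields.setdefault(label, []).append(value.strip())
    # Lines that repeat a label (here DE) are joined into one value.
    return {label: " ".join(values) for label, values in fields.items()}


parsed = parse_labelled_record(RECORD)
```

Continuation lines are the main design point: a description that spans two `DE` lines comes back as a single string.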
  • 41. SEQUENCE FASTA FORMAT
      The first line of each record starts with ">" and carries the identifier followed by additional info; the sequence follows on the next lines.

      >gi|15341523|gb|AF405321.1| Human echovirus 29 strain JV-10 5' UTR, partial sequence
      CAAGCACTTCTGTTTCCCCGGACTGAGTATCAATAGACTGCTCACGCGGTTGAAGGAGAAAACGTTCGTT
      ATCCGGCCAACTACTTCGAGAAACCTAGTAACGCCATGGAAGTTGTGGAGTGTTTCGCTCAGCACTACCC
      CAGTGTAGATCAGGTTGATGAGTCACCGCATTCCCCACGGGTGACCGTGGCGGTGGCTGCGTTGGCGGCC
      TGCCCATGGGGAAACCCATGGGACGCTCTTATACAGACATGGTGCGAAGAGTCTATTGAGCTAGTTGGTA
      GTCCTCCGGCCCCTGAATGCGGCTAATCCCAACTGCGGAGCATACACTCTCAAGCCAGAGGGTAGTGTGT
      CGTAATGGGCAACTCTGCAGCGGAACCGACTACTTTGGGT
      >gi|15341527|gb|AF405325.1| Human echovirus 6 strain D' Amori 5' UTR, partial sequence
      CAAGCACTTCTGTTTCCCCGGACCGAGTATCAATAAGCTGCTCACGCGGCTGAAGGAGAAAGTGTTCGTT
      ACCCGGCTAGTTACTTCGAGAAACCTAGTACCACCATGAAGGTTGCGCAGCGTTTCGCTCCGCACAACCC
      CAGTGTAGATCAGGTCGATGAGTCACCGCGTTCCCCACGGGCGACCGTGGCGGTGGCTGCGTTGGCGGCC
      TGCCCATGGGGCAACCCATGGGACGCTTCAATACTGACATGGTGCGAAGAGTCTATTGAGCTAACTAGTA
      GTCCTCCGGCCCCTGAATGCGGATAATCTTAACTGCGGAGCAGGTGCTCACAATCCAGTGGGTGGCCTGT
      CGTAACGGGCAACTCTGCAGCGGAACCGACTACTTTGGGT
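The FASTA layout shown on this slide (a ">" header line holding the identifier and additional info, then the sequence over one or more lines) is simple enough to parse by hand. The following is a minimal sketch, not a replacement for an established library; `parse_fasta` is a hypothetical helper, and the example text is the beginning of the first record on the slide.

```python
def parse_fasta(text):
    """Return a list of (identifier, additional_info, sequence) tuples."""
    records = []
    identifier, info, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                records.append((identifier, info, "".join(chunks)))
            # Identifier = text up to the first space; the rest is free text.
            identifier, _, info = line[1:].partition(" ")
            chunks = []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        records.append((identifier, info, "".join(chunks)))
    return records


# First header and first two sequence lines of the record on the slide.
example = """\
>gi|15341523|gb|AF405321.1| Human echovirus 29 strain JV-10 5' UTR, partial sequence
CAAGCACTTCTGTTTCCCCGGACTGAGTATCAATAGACTGCTCACGCGGTTGAAGGAGAAAACGTTCGTT
ATCCGGCCAACTACTTCGAGAAACCTAGTAACGCCATGGAAGTTGTGGAGTGTTTCGCTCAGCACTACCC
"""
records = parse_fasta(example)
```

Joining the sequence lines removes the line breaks, which are only a display convention and carry no biological meaning.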
  • 45. SUBMITTING DATA
      • Several biological databases are public, so any (properly identified) user can contribute by uploading new data
      • There are multiple types of data to upload, but the most usual are
        – Sequences
        – Expression data (from microarrays)
  • 46. SUBMITTING SEQUENCES
      How to submit your sequences to…
      • EMBL
        – http://www.ebi.ac.uk/embl/Submission/
      • GenBank
        – http://www.nlm.nih.gov/pubs/factsheets/sdgenbk.html
  • 47. SUBMITTING EXPRESSION DATA
      And your expression data to…
      • ArrayExpress (EBI)
        – http://www.ebi.ac.uk/microarray/submissions.html
      • Gene Expression Omnibus (NCBI)
        – https://www.ncbi.nlm.nih.gov/geo/info/faq.html
  • 48. Tools for DB exploitation
  • 49. ENTREZ
      • It is the NCBI’s search system
      • Great power and versatility, but less intuitive than SRS
      • It doesn’t provide forms for each field
      • Usually used in a top-down manner
        – Perform a first query
        – Refine the results until you reach what you are looking for
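The top-down use of Entrez, a first query followed by successive refinements, can be sketched as building up a query string step by step. The example below uses the human MLH1 gene from the worked example later in the deck. It also builds a URL for NCBI's E-utilities `esearch` service, the programmatic interface to Entrez; E-utilities is not mentioned in the slides and is an assumption here, as are the helper names `refine` and `esearch_url`. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Base address of the E-utilities esearch service (assumption: programmatic
# access is wanted; the slides themselves use the Entrez web interface).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def refine(query, extra_term):
    """Narrow an existing query by AND-ing one more term onto it."""
    return f"({query}) AND {extra_term}"


def esearch_url(db, term, retmax=20):
    """Build (but do not send) an esearch request URL."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Step 1: a first, broad query.
query = "MLH1[Gene Name]"
# Step 2: refine the results by restricting the organism.
query = refine(query, "Homo sapiens[Organism]")

url = esearch_url("nucleotide", query)
```

Each call to `refine` mirrors one round of narrowing in the web interface; further terms can be added in the same way until the result list is small enough to inspect by hand.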
  • 51. ENTREZ QUERIES
      • Boolean operators: AND, OR, NOT; also quotation marks (“”) for exact phrases and * as a wildcard
      • AND is applied by default
      • Query by accession number (AC) in
        – GenBank / EMBL / DDBJ:
          • 1 letter + 5 digits (U12345)
          • 2 letters + 6 digits (AF123456)
        – SwissProt / PIR:
          • 1 letter + 5 digits (P12345)
      • Refine queries with the reserved word LIMITS
      • Combine queries with HISTORY
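The accession number layouts listed on this slide can be checked with two regular expressions. This sketch covers only the formats named on the slide (newer, longer accession formats exist and are not handled); `candidate_databases` is a hypothetical helper. Note that the one-letter, five-digit layout is shared by the nucleotide and protein databases, so the format alone cannot tell them apart.

```python
import re


def candidate_databases(accession):
    """Return the database groups whose accession layout matches the input.

    Only the layouts listed on the slide are recognized. A trailing version
    suffix such as ".1" is ignored.
    """
    acc = accession.strip().upper().split(".")[0]
    if re.fullmatch(r"[A-Z]\d{5}", acc):
        # 1 letter + 5 digits: used by both groups, so both are candidates.
        return ["GenBank/EMBL/DDBJ", "SwissProt/PIR"]
    if re.fullmatch(r"[A-Z]{2}\d{6}", acc):
        # 2 letters + 6 digits: nucleotide databases only.
        return ["GenBank/EMBL/DDBJ"]
    return []


nucleotide_hit = candidate_databases("AF405321.1")
ambiguous_hit = candidate_databases("P12345")
no_hit = candidate_databases("MLH1")
```

A check like this is useful before sending a query: a string that matches no layout, such as a gene symbol, is better searched as a keyword than as an accession number.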
  • 53. HELP AND INFORMATION SYSTEM
  • 54. Entrez search examples (Vall d'Hebron Institut de Recerca, 21/06/2011)
      We are interested in the human MLH1 gene, which is implicated in colon cancer
      – Separate the wheat from the chaff: identify a representative, well-annotated mRNA sequence of the MLH1 gene.
      – Retrieve the associated literature and the protein sequence.
      – Identify similar proteins.
      – Identify conserved domains within the protein.
      – Identify known mutations in the gene or the protein.
      – Find the three-dimensional structure of the protein if it is known; if not, identify structures with homologous sequence.
      – View the genomic context of the gene and download the region that contains it.
  • 55. Direct query (1.1)
  • 56. Direct query (1.2) Limits
  • 57. Direct query (1.3) Filters
  • 58. Direct query (1.4) Record
  • 59. Query (2) Links to other DB
  • 60. Query (3) Sequences
  • 61. Query (4) Protein
  • 62. Query (5.1) Mutations
  • 63. Query (5.2) SNPs
  • 64. Query (5.3) OMIM
  • 65. Query (6.1) Structures
  • 66. Query (6.2) Sequence and structure alignment
      • Mouse over the residues of NP_000240 until the grey footer bar shows ‘gi 4557757, loc 67’ (Glycine).
      • Click on the corresponding Glycine residue in 1H7U_A (loc 74) to highlight it.
      • In the structure window, use the left mouse button to spin the 3D structure until you can clearly see and identify the highlighted residue.
      • Is it possibly in the active site? For example, is it within 5 Å of the ATPS molecule?
      • Double click on the Mg-complexed ATPS to highlight it. Then use the menu bar option called ‘Show/Hide|Select By Distance|Residues Only’ to highlight all residues within 5 Å of the ATPS.
      • Indeed, the Glycine at position #74 is within 5 Å and is likely part of the active site for this energy-producing domain. This hints at the possible problems a Gly → Trp mutation might cause at that position.
  • 67. Query (7) Visualization in genomic context