Biological Databases
•Bioinformatics relies heavily on vast amounts of data on key
biological molecules
•Enormous amounts of biological data are generated every
day
•These raw data form the base from which biological information
is obtained
•From this primary information, logical interpretations can be
drawn by applying known principles of molecular biology
•This secondary information forms the basis for 'secondary'
databases for creating still more information
•The databases effectively store, manage, connect and distribute
data.
•Handling this rich data resource is a challenging
task that requires considerable knowledge and skill.
This is where the bioinformatician comes in.
Creating Databases
•Large quantities of data produced daily
•To make data easily accessible, a filing system and network of
biological information are needed
•Biological database: a collection of files containing
records of biological data in machine-readable form,
arranged in fields, which can be accessed, added to,
retrieved, manipulated and modified
•Data arranged as files in fields
•The data in a database are arranged according to sets
of rules, which are programmed into the
software that manages the data: the Database
Management System (DBMS).
•These rules are set by the owners of the database
•Large databases use sophisticated
methods of arranging data in
arrays and tables (structured databases).
•This structure helps in efficient and rapid data mining
Interaction with the database is made user-
friendly by creating Graphical User Interfaces or
GUIs.
GUIs provide pictorial representations or icons
that enable the user to interact through simple &
easily understandable mouse-driven commands
•GUIs also allow the uploading of raw data
•Each database will have its own file formats and
DBMS for storing and managing these data.
• Some of the data formats are Text, Sequence,
Structure, Links, etc.
Biological databases can thus be genomic
databases, nucleic acid and amino acid sequence
databases, protein databases, metabolic pathway
databases, protein family databases, structure
databases, taxonomic databases, bibliographic
databases, etc.
Accession number
•Unique identifier for a sequence record.
•Does not change, even if information in the record
is changed at the author's request.
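As a minimal sketch of how a stable accession number can coexist with changing record content: GenBank-style identifiers append a version suffix (ACCESSION.VERSION), so the accession stays fixed while the version increments when the sequence is revised. The helper name `split_accession` and the second identifier below are illustrative assumptions, not part of any database API.

```python
from typing import NamedTuple, Optional


class SequenceId(NamedTuple):
    accession: str          # stable part of the identifier
    version: Optional[int]  # None when there is no ".N" suffix


def split_accession(identifier: str) -> SequenceId:
    """Split 'ACCESSION.VERSION' into its stable and versioned parts."""
    accession, _, version = identifier.strip().partition(".")
    return SequenceId(accession, int(version) if version else None)


# "U49845.2" is a hypothetical later version used only for illustration.
old = split_accession("U49845.1")
new = split_accession("U49845.2")
print(old.accession == new.accession)  # the accession itself is unchanged
print(old.version, new.version)
```

Keeping the two parts separate lets a citation refer either to the record in general (accession only) or to one exact revision (accession plus version).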
Searching Databases
•The usual mode of search is through the use of
appropriate keywords
•The software that accompanies the database turns these
keywords into queries; relational databases typically express
them in Structured Query Language (SQL).
•The most popular text-based search is available in
PubMed.
•This database is a repository of citations to biomedical and life-science
literature in refereed journals, giving details of each
publication, including hyperlinks to the original
articles.
•GenBank - DNA sequences
•UniProt- amino acid sequences
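A keyword search of this kind can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table name `records`, its columns, and the `DEMO…` accessions are invented for the example and do not reflect the schema of PubMed, GenBank, or UniProt.

```python
import sqlite3
from typing import List

# In-memory toy database: each row is a record, each column a field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (accession TEXT PRIMARY KEY, organism TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("DEMO0001", "Homo sapiens", "hemoglobin subunit beta mRNA"),
        ("DEMO0002", "Mus musculus", "insulin precursor gene"),
        ("DEMO0003", "Homo sapiens", "insulin receptor mRNA"),
    ],
)


def keyword_search(keyword: str) -> List[str]:
    """Return accessions whose description contains the keyword."""
    rows = conn.execute(
        "SELECT accession FROM records WHERE description LIKE ? ORDER BY accession",
        (f"%{keyword}%",),  # parameter binding keeps user input out of the SQL text
    )
    return [row[0] for row in rows]


print(keyword_search("insulin"))
```

The user only supplies the keyword; the SQL statement is what the database software runs on their behalf.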
Categories of Databases
1. Categories Based on Type of Data
•Primary database contains original data in the form of
primary sequence data or structural data as submitted by
the scientific community.
•These are unique data obtained through laboratory experiments
and retained as the original data. They are generally not
curated.
•Also known as archival databanks
Example: Nucleic acid databases: EMBL, GenBank,
DDBJ
Protein databases: Swiss-Prot (manually curated, unlike the others), PDB, PIR, TrEMBL
Metabolite databases: KEGG, EcoCyc, MetaCyc
Secondary databases
• Also known as pattern databases.
•These contain information that has been processed &
derived from the raw data available in primary
databases.
•Here, the data are classified according to their
structure, models, common characteristics of sequence
classes, structure of domains and motifs, etc.
•Value added databases - derivative databases
Examples: PROSITE, PRINTS, BLOCKS, Pfam, etc.
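To illustrate what a "pattern" in a pattern database looks like in practice, the sketch below converts a PROSITE-style pattern string into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex`, the example pattern, and the toy sequence are assumptions for illustration; only the common syntax elements (`x`, `[..]`, `{..}`, repeat counts, `<` and `>`) are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}: any residue except P
        element = element.replace("(", "{").replace(")", "}")    # x(2,4): repeat 2 to 4 times
        element = element.replace("x", ".")                      # x: any residue
        element = element.replace("<", "^").replace(">", "$")    # anchors to sequence ends
        parts.append(element)
    return "".join(parts)


# A zinc-finger-like pattern written in PROSITE style (illustrative).
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(regex)

# Toy sequence built to contain one match.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
match = re.search(regex, sequence)
print(match.start() if match else None)
```

The order of the replacements matters: exclusion braces are rewritten before parentheses become repeat counts, so the two uses of `{` never collide.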
Composite database
•Database that amalgamates a number of primary
sources, using a set of defined criteria.
•The choice of different data sources and the
application of different criteria result in the
emergence of composite databases, each of
which has its own particular format.
Eg. OWL and NRDB (non-redundant protein sequence
databases) and SWISS-PROT+TrEMBL.
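A rough sketch of the amalgamation step: several primary sources are merged in priority order and redundant entries are dropped. The criterion assumed here is exact sequence identity (ignoring case), which is only one possible rule; real composite databases define their own criteria. The function name and all records are invented for the example.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (identifier, sequence)


def merge_non_redundant(sources: Iterable[Iterable[Record]]) -> List[Record]:
    """Merge sources in priority order, keeping the first copy of each sequence."""
    seen: Dict[str, str] = {}  # normalized sequence -> identifier of first occurrence
    for source in sources:
        for identifier, sequence in source:
            key = sequence.upper()
            if key not in seen:
                seen[key] = identifier
    return [(identifier, sequence) for sequence, identifier in seen.items()]


# Hypothetical entries from two primary sources; B1 duplicates A1.
source_a = [("A1", "MKTAYIAK"), ("A2", "MSTNPKPQ")]
source_b = [("B1", "mktayiak"), ("B2", "MVLSPADK")]
print(merge_non_redundant([source_a, source_b]))
```

Changing the order of the sources or the definition of `key` yields a different composite, which is why each composite database ends up with its own content and format.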
2. Categories Based on Composition of Data Type
Sequence databases: either nucleotide or amino acid sequences, or
may contain both.
Genome databases: repositories of whole genome nucleotide
sequences of various organisms.
Micro-array databases: They contain data obtained from empirical
micro-array based experiments.
Metabolite databases: data on biochemical pathways, metabolites,
enzymes, etc. in different organisms.
Structure databases: They carry data on the 3D structure of proteins
and nucleic acids.
Chemical databases: They store the data on chemical structures,
their composition, functional groups, etc.
Bibliographic databases: These are repositories of scientific
publications from accredited and peer-reviewed journals.
Eg PubMed
PubMed
•Bibliographic database.
•Free database accessing the MEDLINE
database of citations, abstracts and some full text
articles on life sciences and related fields
•Developed and maintained by the National
Center for Biotechnology Information (NCBI),
at the U.S. National Library of Medicine (NLM),
located at the National Institutes of Health
(NIH).
•Provides access to additional relevant Web sites
and links to the other NCBI molecular biology
resources.
3. Categories based on database configuration
•Flat file databases, relational databases,
object-oriented databases and hypertext databases
•A flat file database is the simplest database model, in
which all the information is stored in text files.
•ASN.1 (Abstract Syntax Notation One) is an
International Organization for Standardization (ISO) data
representation format.
•NCBI uses this notation
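The flat file model can be sketched as plain text in which each line starts with a short field code and each record ends with a terminator line. The layout below is loosely modeled on the two-letter line codes and `//` terminator of EMBL-style flat files, but it is heavily simplified, and the records themselves are hypothetical.

```python
from typing import Dict, List

# Simplified toy flat file: "CODE   value" lines, records separated by "//".
FLAT_FILE = """\
ID   DEMO0001
DE   hypothetical demo entry one
SQ   acgtacgtac
//
ID   DEMO0002
DE   hypothetical demo entry two
SQ   ttgacca
//
"""


def parse_flat_file(text: str) -> List[Dict[str, str]]:
    """Parse records made of 'CODE   value' lines, each terminated by '//'."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if line.strip() == "//":  # end of the current record
            if current:
                records.append(current)
            current = {}
        elif line.strip():
            code, _, value = line.partition(" ")
            current[code] = value.strip()
    return records


for record in parse_flat_file(FLAT_FILE):
    print(record["ID"], len(record["SQ"]))
```

Everything lives in one text file and must be read sequentially, which is what makes the model simple and also what makes large flat files slow to search without a separate index.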
Primary Database
•Composed of an array of nucleotide sequence
entries.
•These databases are data repositories that
accept nucleic acid sequence data and make it
freely available to the public.
Eg. EMBL, DDBJ and GenBank of NCBI.
NCBI (GenBank)
GenBank is hosted by the National Center for Biotechnology
Information (NCBI)
•This offers all publicly available nucleotide sequences, their
protein translations, and their bibliographic and annotated
information.
•It also facilitates and encourages direct submission of sequence
data by providing a very simple and user friendly process.
•You can access the data in NCBI free of cost over the Internet
through their site, http://www.ncbi.nlm.nih.gov/genbank/
•Data can be submitted directly and are released after a quality assurance
check
DDBJ, DNA Data Bank of Japan
•Started in 1986.
•It is now hosted at the National Institute of Genetics.
•DDBJ can be accessed through the Internet via the DDBJ
homepage, http://www.ddbj.nig.ac.jp/.
•It collects nucleotide sequences from researchers and
issues internationally recognized accession numbers
to data submitters.
•Each database entry includes details of sequences,
submitter's details, bibliographic references, biological
significance, and the scientific name and taxonomy of
the organism.
EMBL (European Molecular Biology Laboratory)
•Nucleotide sequence database (of DNA and
RNA) hosted in the UK by the EMBL European
Bioinformatics Institute (EMBL-EBI).
•EMBL collects nucleotide sequence data from
individual researchers, genome sequencing
projects and patent applications.
•EMBL, the laboratory, was established in 1974; its
nucleotide sequence database followed in the early 1980s
•Sequences are stored in the database as they
would exist in the biological state.
•The stored data generally correspond to wild type
sequences without mutation or genetic
manipulations.
•OMIM (Online Mendelian Inheritance in Man) is a
human gene database.
•OMIM focuses on the relationship between
phenotype and genotype.
•OMIM was developed for the World Wide Web
by NCBI
•It can be accessed at the URL:
https://www.ncbi.nlm.nih.gov/omim
The Basic Local Alignment Search Tool (BLAST) is one of the most widely used
sequence analysis tools for comparing primary biological sequence
information.
The BLAST program can be accessed over the Web or downloaded from
http://ncbi.nlm.nih.gov/BLAST/ at NCBI
Programs
BLASTn - nucleotide query sequence against nucleotide sequence database.
BLASTp - protein query sequence against protein sequence database.
BLASTx - translated nucleotide query sequence against protein sequence
database.
tBLASTn - protein query sequence against translated nucleotide sequence
database.
tBLASTx - translated nucleotide query sequence against translated nucleotide
sequence database.
PSI-BLAST - finds distant relatives of a protein.
MEGABLAST - faster program used when large numbers of input sequences
are compared.
Protein-level BLAST searches are generally more sensitive than DNA-level
searches for detecting distant relationships.
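The choice among the basic programs above depends only on the type of the query, the type of the database, and whether nucleotide sequences should be compared after translation. A small lookup, written as a sketch (the function name and the string labels are assumptions for illustration, not part of the BLAST software), makes the rule explicit:

```python
def choose_blast_program(query: str, database: str, translate: bool = False) -> str:
    """Pick a basic BLAST program from query and database sequence types.

    `translate` only matters when both query and database are nucleotide:
    it selects tblastx (protein-level comparison) instead of blastn.
    """
    table = {
        ("nucleotide", "nucleotide"): "tblastx" if translate else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # query is translated
        ("protein", "nucleotide"): "tblastn",  # database is translated
    }
    try:
        return table[(query, database)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query!r} vs {database!r}") from None


print(choose_blast_program("nucleotide", "protein"))
print(choose_blast_program("nucleotide", "nucleotide", translate=True))
```

PSI-BLAST and MEGABLAST are left out of the lookup because they are variants chosen for sensitivity or speed rather than for a different combination of sequence types.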
FASTA is a popular DNA and protein sequence
alignment and database scanning program created by W. R.
Pearson and D. J. Lipman in 1988.
Programs
fasta: compares a query sequence and a group of
sequences of the same type (nucleotide or protein).
fastx: compares a translated nucleotide query sequence
and a group of protein sequences.
fasty: compares a translated DNA sequence to a protein sequence
database.
fasts: compares a set of short peptide fragments against a
protein database
