SlideShare a Scribd company logo
1 of 24
Proteins Databases
Hafiz.M.Zeeshan.Raza
Research Assistant_HEC_NRPU
hafizraza26@gmail.com
COMSATS UNIVERSITY Islamabad
SAHIWAL Campus, Punjab, Pakistan
Overview
• Introduction
• Sections of Database
• Importance of GenBank
Introduction
• Protein databases have become a crucial part of modern biology. Huge
amounts of data for protein structures, functions, and particularly
sequences are being generated.
• These data cannot be handled without using computer databases.
Searching databases is often the first step in the study of a new protein.
• Without the prior knowledge obtained from such searches, known
information about the protein could be missed, or an experiment could be
repeated unnecessarily.
• Comparison between proteins and protein classification provide
information about the relationship between proteins within a genome or
across different species, and hence offer much more information than can
be obtained by studying only an isolated protein.
Continue…
• Thanks to the Human Genome Project and other sequencing efforts, new
sequences have been generated at a prodigious rate.
• These sequences provide a rich information source and are the core of the
revolutionary movement toward “large-scale biology.”
• The protein sequences can be computationally annotated from these genomic
sequences. Various databases contain protein sequences with different focuses.
• Most protein databases have interactive search engines so that users can specify
their needs and obtain the related information interactively.
• Many protein databases also allow submitters to deposit data, and database
servers can check the format of the data and provide immediate feedback.
Protein Sequence Databases
• Protein bioinformatics databases can be primarily classified as sequence
databases, 2D gel databases, 3D structure databases, chemistry
databases, enzyme and pathway databases, family and domain databases,
gene expression databases, genome annotation databases, organism
specific databases, phylogenomic databases, polymorphism and mutation
database , protein-protein interaction databases, proteomic databases,
PTM databases, ontologies, specialized protein databases, and other
(miscellaneous) databases.
Protein Sequence Databases
• Among all protein sequence databases, UniProt is the most widely used
one. It provides more annotations than any other sequence database with
a minimal level of redundancy through human input or integration with
other databases. UniProtKB has three components:
1. Protein knowledgebase, including Swiss-Prot (manually annotated and
reviewed) and TrEMBL (automatically annotated).
2. UniRef (sequence clusters for fast sequence similarity searches).
3. UniParc (sequence archive for keeping track of sequences and their
identifiers).
Continue…
• In addition to Swiss-Prot and TrEMBL, UniProtKB includes information from Protein
Sequence Database (PSD) in the Protein Identification Resource, which builds a
complete and non-redundant database from a number of protein and nucleic acid
sequence databases together with bibliographic and annotated information.
• The National Center for Biotechnology Information (NCBI;
http://www.ncbi.nlm.nih.gov) also provides rich information and a number of
useful tools for protein sequences.
• It includes entries from the non-redundant GenBank translations, UniProt, PIR,
Protein Research Foundation (PRF) in Japan, and the Protein Data Bank (PDB).
Continue…
• UniProt, as a curated protein sequence database, offers a portal to
a wide range of annotations, covering areas such as function,
family, domain parsing, post-translational modifications, and
variants. UniProt can be accessed at http://www.uniprot.org.
• Human vitronectin is used here as an example for searching protein
sequence databases. To locate the UniProt entry for this protein,
one can search either the entry name (VTNC_HUMAN) or the
accession number (P04004) obtained from a BLAST search.
Continue…
• Each entry contains the following items shown in table format in the NiceProt View layout:
1. Name and origin
2. Protein attributes
3. General annotation
4. Ontologies (gene functions)
5. Binary protein-protein interactions
6. Sequence annotation (features)
7. Sequence
8. References (literature citation)
9. Web resources
10. Cross-references (links to other databases)
11. Entry information, and
12. Relevant documents.
RefSeq Database
• The National Center for Biotechnology Information Reference Sequence (NCBI
RefSeq) database provides curated non-redundant sequences of genomic regions,
transcripts and proteins for taxonomically diverse organisms including Archaea,
Bacteria, Eukaryotes, and Viruses.
• RefSeq database is derived from the sequence data available in the redundant
archival database GenBank. RefSeq sequences include coding regions, conserved
domains, variations etc. and enhanced annotations such as publications, names,
symbols, aliases, Gene IDs, and database cross-references.
• The sequences and annotations are generated using a combined approach of
collaboration, automated prediction, and manual curation.
Continue…
• The RefSeq records can be directly accessed from NCBI web sites by
search of the Nucleotide or Protein databases, BLAST searches
against selected databases and FTP downloads.
• RefSeq records are also available through indirect links from other
NCBI resources such as Gene, Genome, BioProject, dbSNP, ClinVar
and Map Viewer etc.
• In addition, RefSeq supports programmatic access through Entrez
Programming Utilities.
PROTEIN STRUCTURAL DATABASES
• Searching structure databases is becoming more and more popular in molecular
biology.
• The three-dimensional structures of proteins not only define their biological
functions, but also hold a key in rational drug design.
• Traditionally, protein structures were solved at a low throughput mode.
• However, advances in new technologies, such as synchrotron radiation sources and
high-resolution nuclear magnetic resonance (NMR), accelerate the rate of protein
structure determination substantially.
• The only international repository for the processing and distribution of protein
structures is the PDB.
Continue…
• The worldwide PDB (wwPDB, http://www.wwpdb.org) was established in 2003
as an international collaboration to maintain a single and publicly available
Protein Data Bank Archive (PDB Archive) of macro-molecular structural data.
• The wwPDB member includes Protein Data Bank in Europe (PDBe), Protein
Data Bank Japan (PDBj), Research Collaboratory for Structural Bioinformatics
Protein Data Bank (RCSB PDB), and Biological Magnetic Resonance Bank
(BMRB).
• The “PDB Archive” is a collection of flat files in three different formats: the
legacy PDB format; the PDBx/mmCIF (http:// deposit.pdb.org/mmcif/) format;
and the Protein Data Bank Markup Language (PDBML) format.
• Each member site serves as a deposition, data processing and distribution site
for the PDB Archive, and each provides its own view of the primary data and a
variety of tools and resources.
Protein Family Databases
• Proteins can be classified according to their sequence, evolutionary,
structural, or functional relationships.
• A protein in the context of its family is much more informative than the
single protein itself.
• For example, residues conserved across the family often indicate special
functional roles.
• Two proteins classified in the same functional family may suggest that
they share similar structures, even when their sequences do not have
significant similarity.
Continue…
• There is no unique way to classify proteins into families. Boundaries between
different families may be subjective.
• The choice of classification system depends in part on the problem; in general, the
author suggests looking into classification systems from different databases and
comparing them.
• Three types of classification methods are widely adopted based upon the similarity
of sequence, structure, or function.
• Sequence-based methods are applicable to any proteins whose sequences are
known, while structure-based methods are limited to the proteins of known
structures, and function-based methods depend on the functions of proteins being
annotated.
Continue…
• Sequence- and structure-based classifications can be automated and are scalable
to high-throughput data, whereas function-based classification is typically carried
out manually.
• Structure- and function-based methods are more reliable, while sequence-based
methods may result in a false positive result when sequence similarity is weak (i.e.,
two proteins are classified into one family by chance rather than by any biological
significance).
• In addition, since protein structure and function are better conserved than
sequence, two proteins having similar structures or similar functions may not be
identified through sequence-based methods
InterPro Database
• InterPro is an integrated resource of predictive models or ‘signatures’ representing
protein domains, families, regions, repeats and sites from major protein signature
databases including CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom ,
PROSITE, SMART, SUPERFAMILY and TIGRFAMs.
• Each entry in the InterPro database is annotated with a descriptive abstract name and
cross references to the original data sources, as well as to specialized functional
databases.
• The search by sequence or domain architecture is provided by InterPro web site. The
InterPro signatures in XML format are available via anonymous FTP download.
• InterPro also provides a software package InterProScan that can be used locally to scan
protein sequences against InterPro’s signatures.
Pfam Database
• Pfam is a database of protein families represented as multiple sequence alignments and
Hidden Markov Models (HMMs).
• Pfam entries can be classified as Family (related protein regions), Domain (protein
structural unit), Repeat (multiple short protein structural units), Motifs (short protein
structural unit outside global domains).
• Related Pfam entries are grouped into clans based on sequence, structure or profile-
HMM similarity.
• The Pfam database web site provides search interface for querying by sequence,
keyword, domain architecture, taxonomy, and browse interfaces for analyzing protein
sequences for Pfam matches and viewing Pfam annotations in domain architectures,
sequence alignments, interactions, species and protein structures in PDB.
PIRSF Database
• The PIRSF classification system provides comprehensive and non overlapping clustering
of UniProtKB sequences into a hierarchical order to reflect their evolutionary
relationships based on whole proteins rather than on the component domains.
• The PIRSF system classifies the protein sequences into families, whose members are
both homologous (evolved from a common ancestor) and homeomorphic (sharing full-
length sequence similarity and a common domain architecture).
• The PIRSF family classification results are expert-curated based on literature review and
integrative sequence and functional analysis.
• The current release of PIRSF contains 11,800 families, which cover 5,407,000 UniProtKB
protein sequences.
PROSITE
• PROSITE is a database of documentation entries describing protein domains, families
and functional sites as well as associated patterns and profiles to identify them.
• The entries are derived from multiple alignments of homologous sequences and have
the advantage of identifying distant relationships between sequences.
• PROSITE includes a collection of ProRules based on profiles and patterns of functionally
and/or structurally critical amino acids that can be used to increase PROSITE’s
discriminatory power.
• The PROSITE web site provides keyword-based search and allows browsing by
documentation entry, ProRule description, taxonomic scope and number of positive
hits.
PRIDE Database
• The PRoteomics IDEntifications database (PRIDE) is a repository for mass-
spectrometry based proteomics data including identifications of proteins, peptides
and post-translational modifications that have been described in the scientific
literature, together with supporting mass spectra and related technical and
biological metadata.
• PRIDE supports tandem MS (MS/MS) and Peptide Fingerprinting datasets with
search/analysis workflows originally analyzed by the submitters.
• PIRDE provides several services such as the Protein Identifier Cross-Reference
(PICR), the Ontology Lookup Service (OLS) and Database on Demand.
MEROPS (Metalloprotease) Database
• MEROPS is an integrated database of information about peptidases (also termed proteases,
proteinases and proteolytic enzymes) and the proteins that inhibit them.
• A homologous set of peptidases and protein inhibitors are grouped into peptidase and
inhibitor species.
• Protein species are grouped into family that contains statistically significant similarities in
amino acid sequence. Families are grouped into clans that contain related structures.
• Both family (sub-family) and clan can be browsed by index page with links to their summary
page. Each peptidase has a summary page that can be browsed by Name, Identifier, Gene
name, Organism and Substrates.
• The peptidase summary page includes information of Gene Structure, Alignment, Tree,
Sequences and their features, Distribution, Structure, Literature, Human EST, Mouse EST,
Substrates, Inhibitors and Pharma.
Importance of Databases
• Protein databases have become a crucial part of modern biology. Huge amounts of data for
protein structures, functions, and particularly sequences are being generated.
• Searching databases is often the first step in the study of a new protein. Comparison
between proteins or between protein families provides information about the relationship
between proteins within a genome or across different species, and hence offers much more
information than can be obtained by studying only an isolated protein.
• In addition, secondary databases derived from experimental databases are also
widely available. These databases reorganize and annotate the data or provide predictions.
The use of multiple databases often helps researchers understand the structure and function
of a protein.
• Although some protein databases are widely known, they are far from being fully utilized in
the protein science community.
Proteins databases

More Related Content

What's hot

What's hot (20)

Introduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASEIntroduction OF BIOLOGICAL DATABASE
Introduction OF BIOLOGICAL DATABASE
 
Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-Sequence alig Sequence Alignment Pairwise alignment:-
Sequence alig Sequence Alignment Pairwise alignment:-
 
DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)DNA data bank of japan (DDBJ)
DNA data bank of japan (DDBJ)
 
Blast and fasta
Blast and fastaBlast and fasta
Blast and fasta
 
Cath
CathCath
Cath
 
Protein data bank
Protein data bankProtein data bank
Protein data bank
 
blast bioinformatics
blast bioinformaticsblast bioinformatics
blast bioinformatics
 
Protein database
Protein databaseProtein database
Protein database
 
Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment   Clustal W - Multiple Sequence alignment
Clustal W - Multiple Sequence alignment
 
Primary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyanaPrimary and secondary databases ppt by puneet kulyana
Primary and secondary databases ppt by puneet kulyana
 
Ddbj
DdbjDdbj
Ddbj
 
Composite and Specialized databases
Composite and Specialized databasesComposite and Specialized databases
Composite and Specialized databases
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
Multiple sequence alignment
Multiple sequence alignmentMultiple sequence alignment
Multiple sequence alignment
 
Major databases in bioinformatics
Major databases in bioinformaticsMajor databases in bioinformatics
Major databases in bioinformatics
 
Biological databases
Biological databasesBiological databases
Biological databases
 
SEQUENCE ANALYSIS
SEQUENCE ANALYSISSEQUENCE ANALYSIS
SEQUENCE ANALYSIS
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)BLAST (Basic local alignment search Tool)
BLAST (Basic local alignment search Tool)
 
Protein sequence databases
Protein sequence databasesProtein sequence databases
Protein sequence databases
 

Similar to Proteins databases

biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptxscience lover
 
Primary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptxPrimary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptxVandana Yadav03
 
Data retreival system
Data retreival systemData retreival system
Data retreival systemShikha Thakur
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...Elufer Akram
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBioinformaticsCentre
 
protein databases.ppt
protein databases.pptprotein databases.ppt
protein databases.pptSanthiyaAK
 
Bioinformatics biological databases
Bioinformatics biological databasesBioinformatics biological databases
Bioinformatics biological databasesSangeeta Das
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...SBituila
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...BibiQuinah
 
Bioinformatics مي.pdf
Bioinformatics  مي.pdfBioinformatics  مي.pdf
Bioinformatics مي.pdfnedalalazzwy
 

Similar to Proteins databases (20)

Important protein databases and proteomics softwares
Important protein databases and proteomics softwaresImportant protein databases and proteomics softwares
Important protein databases and proteomics softwares
 
Protein database
Protein  databaseProtein  database
Protein database
 
PIR- Protein Information Resource
PIR- Protein Information ResourcePIR- Protein Information Resource
PIR- Protein Information Resource
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
biological databases.pptx
biological databases.pptxbiological databases.pptx
biological databases.pptx
 
Primary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptxPrimary Bioinformatics Database.pptx
Primary Bioinformatics Database.pptx
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Data Retrieval Systems
Data Retrieval SystemsData Retrieval Systems
Data Retrieval Systems
 
Protein Databases
Protein DatabasesProtein Databases
Protein Databases
 
Data retreival system
Data retreival systemData retreival system
Data retreival system
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
 
Protein Database
Protein DatabaseProtein Database
Protein Database
 
Biological databases
Biological databases Biological databases
Biological databases
 
Biological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdfBiological Database (1)pptxpdfpdfpdf.pdf
Biological Database (1)pptxpdfpdfpdf.pdf
 
protein databases.ppt
protein databases.pptprotein databases.ppt
protein databases.ppt
 
Bioinformatics biological databases
Bioinformatics biological databasesBioinformatics biological databases
Bioinformatics biological databases
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...Sequence and Structural Databases of DNA and Protein, and its significance in...
Sequence and Structural Databases of DNA and Protein, and its significance in...
 
Biological database
Biological databaseBiological database
Biological database
 
Bioinformatics مي.pdf
Bioinformatics  مي.pdfBioinformatics  مي.pdf
Bioinformatics مي.pdf
 

More from Hafiz Muhammad Zeeshan Raza

Car manufacturing is a complex and fascinating industry that plays a signific...
Car manufacturing is a complex and fascinating industry that plays a signific...Car manufacturing is a complex and fascinating industry that plays a signific...
Car manufacturing is a complex and fascinating industry that plays a signific...Hafiz Muhammad Zeeshan Raza
 
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...Hafiz Muhammad Zeeshan Raza
 
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...Hafiz Muhammad Zeeshan Raza
 
Quality control of sequencing with fast qc obtained with
Quality control of sequencing with fast qc obtained withQuality control of sequencing with fast qc obtained with
Quality control of sequencing with fast qc obtained withHafiz Muhammad Zeeshan Raza
 
DNA transcription & Post Transcriptional Modification
DNA transcription & Post Transcriptional ModificationDNA transcription & Post Transcriptional Modification
DNA transcription & Post Transcriptional ModificationHafiz Muhammad Zeeshan Raza
 

More from Hafiz Muhammad Zeeshan Raza (13)

Car manufacturing is a complex and fascinating industry that plays a signific...
Car manufacturing is a complex and fascinating industry that plays a signific...Car manufacturing is a complex and fascinating industry that plays a signific...
Car manufacturing is a complex and fascinating industry that plays a signific...
 
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
Experience of New Graduate Nurses Feeling Not Ready for Professional Role on ...
 
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...
TO ANALYZE THE ROLE OF RURAL WOMAN'S TO ENSURE CHILD NUTRITION IN DISTRICT RA...
 
OMANTEL
OMANTELOMANTEL
OMANTEL
 
Quality control of sequencing with fast qc obtained with
Quality control of sequencing with fast qc obtained withQuality control of sequencing with fast qc obtained with
Quality control of sequencing with fast qc obtained with
 
Cell organelles
Cell organellesCell organelles
Cell organelles
 
Human genome project
Human genome projectHuman genome project
Human genome project
 
Translation & Post Translational Modifications
Translation & Post Translational ModificationsTranslation & Post Translational Modifications
Translation & Post Translational Modifications
 
DNA transcription & Post Transcriptional Modification
DNA transcription & Post Transcriptional ModificationDNA transcription & Post Transcriptional Modification
DNA transcription & Post Transcriptional Modification
 
Recombinant DNA technology
Recombinant DNA technologyRecombinant DNA technology
Recombinant DNA technology
 
Restriction Fragment Length Polymorphism (RFLP)
Restriction Fragment Length Polymorphism (RFLP)Restriction Fragment Length Polymorphism (RFLP)
Restriction Fragment Length Polymorphism (RFLP)
 
Mendeley software beginers
Mendeley software beginersMendeley software beginers
Mendeley software beginers
 
Bioinformatics introduction
Bioinformatics introductionBioinformatics introduction
Bioinformatics introduction
 

Recently uploaded

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxAArockiyaNisha
 

Recently uploaded (20)

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 

Proteins databases

  • 2. Overview • Introduction • Sections of Database • Importance of GenBank
  • 3. Introduction • Protein databases have become a crucial part of modern biology. Huge amounts of data for protein structures, functions, and particularly sequences are being generated. • These data cannot be handled without using computer databases. Searching databases is often the first step in the study of a new protein. • Without the prior knowledge obtained from such searches, known information about the protein could be missed, or an experiment could be repeated unnecessarily. • Comparison between proteins and protein classification provide information about the relationship between proteins within a genome or across different species, and hence offer much more information than can be obtained by studying only an isolated protein.
  • 4. Continue… • Thanks to the Human Genome Project and other sequencing efforts, new sequences have been generated at a prodigious rate. • These sequences provide a rich information source and are the core of the revolutionary movement toward “large-scale biology.” • The protein sequences can be computationally annotated from these genomic sequences. Various databases contain protein sequences with different focuses. • Most protein databases have interactive search engines so that users can specify their needs and obtain the related information interactively. • Many protein databases also allow submitters to deposit data, and database servers can check the format of the data and provide immediate feedback.
  • 5. Protein Sequence Databases • Protein bioinformatics databases can be primarily classified as sequence databases, 2D gel databases, 3D structure databases, chemistry databases, enzyme and pathway databases, family and domain databases, gene expression databases, genome annotation databases, organism specific databases, phylogenomic databases, polymorphism and mutation database , protein-protein interaction databases, proteomic databases, PTM databases, ontologies, specialized protein databases, and other (miscellaneous) databases.
  • 6. Protein Sequence Databases • Among all protein sequence databases, UniProt is the most widely used one. It provides more annotations than any other sequence database with a minimal level of redundancy through human input or integration with other databases. UniProtKB has three components: 1. Protein knowledgebase, including Swiss-Prot (manually annotated and reviewed) and TrEMBL (automatically annotated). 2. UniRef (sequence clusters for fast sequence similarity searches). 3. UniParc (sequence archive for keeping track of sequences and their identifiers).
  • 7. Continue… • In addition to Swiss-Prot and TrEMBL, UniProtKB includes information from Protein Sequence Database (PSD) in the Protein Identification Resource, which builds a complete and non-redundant database from a number of protein and nucleic acid sequence databases together with bibliographic and annotated information. • The National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov) also provides rich information and a number of useful tools for protein sequences. • It includes entries from the non-redundant GenBank translations, UniProt, PIR, Protein Research Foundation (PRF) in Japan, and the Protein Data Bank (PDB).
  • 8. Continue… • UniProt, as a curated protein sequence database, offers a portal to a wide range of annotations, covering areas such as function, family, domain parsing, post-translational modifications, and variants. UniProt can be accessed at http://www.uniprot.org. • Human vitronectin is used here as an example for searching protein sequence databases. To locate the UniProt entry for this protein, one can search either the entry name (VTNC_HUMAN) or the accession number (P04004) obtained from a BLAST search.
  • 9. Continue… • Each entry contains the following items shown in table format in the NiceProt View layout: 1. Name and origin 2. Protein attributes 3. General annotation 4. Ontologies (gene functions) 5. Binary protein-protein interactions 6. Sequence annotation (features) 7. Sequence 8. References (literature citation) 9. Web resources 10. Cross-references (links to other databases) 11. Entry information, and 12. Relevant documents.
  • 10. RefSeq Database • The National Center for Biotechnology Information Reference Sequence (NCBI RefSeq) database provides curated non-redundant sequences of genomic regions, transcripts and proteins for taxonomically diverse organisms including Archaea, Bacteria, Eukaryotes, and Viruses. • RefSeq database is derived from the sequence data available in the redundant archival database GenBank. RefSeq sequences include coding regions, conserved domains, variations etc. and enhanced annotations such as publications, names, symbols, aliases, Gene IDs, and database cross-references. • The sequences and annotations are generated using a combined approach of collaboration, automated prediction, and manual curation.
  • 11. Continue… • The RefSeq records can be directly accessed from NCBI web sites by search of the Nucleotide or Protein databases, BLAST searches against selected databases and FTP downloads. • RefSeq records are also available through indirect links from other NCBI resources such as Gene, Genome, BioProject, dbSNP, ClinVar and Map Viewer etc. • In addition, RefSeq supports programmatic access through Entrez Programming Utilities.
  • 12. PROTEIN STRUCTURAL DATABASES • Searching structure databases is becoming more and more popular in molecular biology. • The three-dimensional structures of proteins not only define their biological functions, but also hold a key in rational drug design. • Traditionally, protein structures were solved at a low throughput mode. • However, advances in new technologies, such as synchrotron radiation sources and high-resolution nuclear magnetic resonance (NMR), accelerate the rate of protein structure determination substantially. • The only international repository for the processing and distribution of protein structures is the PDB.
  • 13. Continue… • The worldwide PDB (wwPDB, http://www.wwpdb.org) was established in 2003 as an international collaboration to maintain a single and publicly available Protein Data Bank Archive (PDB Archive) of macro-molecular structural data. • The wwPDB member includes Protein Data Bank in Europe (PDBe), Protein Data Bank Japan (PDBj), Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), and Biological Magnetic Resonance Bank (BMRB). • The “PDB Archive” is a collection of flat files in three different formats: the legacy PDB format; the PDBx/mmCIF (http:// deposit.pdb.org/mmcif/) format; and the Protein Data Bank Markup Language (PDBML) format. • Each member site serves as a deposition, data processing and distribution site for the PDB Archive, and each provides its own view of the primary data and a variety of tools and resources.
  • 14. Protein Family Databases • Proteins can be classified according to their sequence, evolutionary, structural, or functional relationships. • A protein in the context of its family is much more informative than the single protein itself. • For example, residues conserved across the family often indicate special functional roles. • Two proteins classified in the same functional family may suggest that they share similar structures, even when their sequences do not have significant similarity.
  • 15. Continue… • There is no unique way to classify proteins into families. Boundaries between different families may be subjective. • The choice of classification system depends in part on the problem; in general, the author suggests looking into classification systems from different databases and comparing them. • Three types of classification methods are widely adopted based upon the similarity of sequence, structure, or function. • Sequence-based methods are applicable to any proteins whose sequences are known, while structure-based methods are limited to the proteins of known structures, and function-based methods depend on the functions of proteins being annotated.
  • 16. Continue… • Sequence- and structure-based classifications can be automated and are scalable to high-throughput data, whereas function-based classification is typically carried out manually. • Structure- and function-based methods are more reliable, while sequence-based methods may result in a false positive result when sequence similarity is weak (i.e., two proteins are classified into one family by chance rather than by any biological significance). • In addition, since protein structure and function are better conserved than sequence, two proteins having similar structures or similar functions may not be identified through sequence-based methods
  • 17. InterPro Database • InterPro is an integrated resource of predictive models or ‘signatures’ representing protein domains, families, regions, repeats and sites from major protein signature databases including CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom , PROSITE, SMART, SUPERFAMILY and TIGRFAMs. • Each entry in the InterPro database is annotated with a descriptive abstract name and cross references to the original data sources, as well as to specialized functional databases. • The search by sequence or domain architecture is provided by InterPro web site. The InterPro signatures in XML format are available via anonymous FTP download. • InterPro also provides a software package InterProScan that can be used locally to scan protein sequences against InterPro’s signatures.
  • 18. Pfam Database • Pfam is a database of protein families represented as multiple sequence alignments and Hidden Markov Models (HMMs). • Pfam entries can be classified as Family (related protein regions), Domain (protein structural unit), Repeat (multiple short protein structural units), Motifs (short protein structural unit outside global domains). • Related Pfam entries are grouped into clans based on sequence, structure or profile- HMM similarity. • The Pfam database web site provides search interface for querying by sequence, keyword, domain architecture, taxonomy, and browse interfaces for analyzing protein sequences for Pfam matches and viewing Pfam annotations in domain architectures, sequence alignments, interactions, species and protein structures in PDB.
  • 19. PIRSF Database • The PIRSF classification system provides comprehensive and non overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships based on whole proteins rather than on the component domains. • The PIRSF system classifies the protein sequences into families, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full- length sequence similarity and a common domain architecture). • The PIRSF family classification results are expert-curated based on literature review and integrative sequence and functional analysis. • The current release of PIRSF contains 11,800 families, which cover 5,407,000 UniProtKB protein sequences.
  • 20. PROSITE • PROSITE is a database of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them. • The entries are derived from multiple alignments of homologous sequences and have the advantage of identifying distant relationships between sequences. • PROSITE includes a collection of ProRules based on profiles and patterns of functionally and/or structurally critical amino acids that can be used to increase PROSITE’s discriminatory power. • The PROSITE web site provides keyword-based search and allows browsing by documentation entry, ProRule description, taxonomic scope and number of positive hits.
  • 21. PRIDE Database • The PRoteomics IDEntifications database (PRIDE) is a repository for mass- spectrometry based proteomics data including identifications of proteins, peptides and post-translational modifications that have been described in the scientific literature, together with supporting mass spectra and related technical and biological metadata. • PRIDE supports tandem MS (MS/MS) and Peptide Fingerprinting datasets with search/analysis workflows originally analyzed by the submitters. • PIRDE provides several services such as the Protein Identifier Cross-Reference (PICR), the Ontology Lookup Service (OLS) and Database on Demand.
  • 22. MEROPS (Metalloprotease) Database • MEROPS is an integrated database of information about peptidases (also termed proteases, proteinases and proteolytic enzymes) and the proteins that inhibit them. • A homologous set of peptidases and protein inhibitors are grouped into peptidase and inhibitor species. • Protein species are grouped into family that contains statistically significant similarities in amino acid sequence. Families are grouped into clans that contain related structures. • Both family (sub-family) and clan can be browsed by index page with links to their summary page. Each peptidase has a summary page that can be browsed by Name, Identifier, Gene name, Organism and Substrates. • The peptidase summary page includes information of Gene Structure, Alignment, Tree, Sequences and their features, Distribution, Structure, Literature, Human EST, Mouse EST, Substrates, Inhibitors and Pharma.
  • 23. Importance of Databases • Protein databases have become a crucial part of modern biology. Huge amounts of data for protein structures, functions, and particularly sequences are being generated. • Searching databases is often the first step in the study of a new protein. Comparison between proteins or between protein families provides information about the relationship between proteins within a genome or across different species, and hence offers much more information than can be obtained by studying only an isolated protein. • In addition, secondary databases derived from experimental databases are also widely available. These databases reorganize and annotate the data or provide predictions. The use of multiple databases often helps researchers understand the structure and function of a protein. • Although some protein databases are widely known, they are far from being fully utilized in the protein science community.