Protein databases contain information on protein sequences, structures, and functions. The major protein databases include:
- Protein Data Bank (PDB), which contains experimentally determined 3D structures of proteins and other biological macromolecules, solved mainly by X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy.
- Swiss-Prot, which contains manually annotated and reviewed protein sequences and functional information.
- TrEMBL, which supplements Swiss-Prot with automatically annotated translations of DNA coding sequences.
Protein databases are important for comparing proteins, understanding relationships between proteins, and aiding the study of new proteins. Searching databases is often the first step in protein research.
PROTEIN DATABASES
What Are Protein Databases?
Types of Protein Databases
• Protein Information Resource (PIR)
• SWISS-PROT
• Protein Data Bank (PDB)
Importance of Protein Databases
Protein Information Resource (PIR)
History
The Protein Information Resource (PIR) is an integrated
public bioinformatics resource to support genomic, proteomic and
systems biology research and scientific studies.
PIR was established in 1984 by the National Biomedical
Research Foundation (NBRF) as a resource to assist researchers in
the identification and interpretation of protein sequence
information.
For over four decades, beginning with the Atlas of Protein
Sequence and Structure, PIR has provided protein databases and
analysis tools, including the Protein Sequence Database (PSD),
freely accessible to the scientific community.
In 2002, PIR, along with its international partners, the
European Bioinformatics Institute (EBI) and the Swiss Institute
of Bioinformatics (SIB), was awarded a grant from the NIH to
create UniProt, a single worldwide database of protein sequence
and function, by unifying the PIR-PSD, Swiss-Prot, and TrEMBL
databases.
Today, PIR maintains staff at the University of Delaware (UD)
and Georgetown University Medical Center (GUMC) and continues
to offer world-leading resources to assist with proteomic and
genomic data integration and the propagation and
standardization of protein annotation.
Protein Data Bank (PDB)
• PDB is a primary protein structure database: an archive
of experimentally determined three-dimensional structures
of large biological molecules, such as proteins.
• Despite its name, the PDB archives three-dimensional
structures not only of proteins but also of other
biologically important molecules, such as nucleic acid
fragments, RNA molecules, large peptides such as the
antibiotic gramicidin, and complexes of proteins and
nucleic acids.
• Most of its data come from three experimental methods:
X-ray crystallography, NMR spectroscopy, and, increasingly,
cryo-electron microscopy. Purely theoretical (computational)
models are no longer accepted into the archive.
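To make the archive's contents concrete, the sketch below reads two record types from a structure file in the legacy fixed-column PDB format: EXPDTA, which names the experimental method, and ATOM/HETATM, which carry atomic coordinates. It assumes the column positions of the wwPDB PDB format description and ignores every other record; the function name `parse_pdb` and the two-line sample are hypothetical. The wwPDB now distributes PDBx/mmCIF as its primary format, so this only illustrates the older layout.

```python
# A minimal sketch for reading legacy fixed-column PDB-format text.
# Assumption: only the EXPDTA, ATOM and HETATM records are handled; every
# other record (HEADER, REMARK, SEQRES, ...) is ignored.

def parse_pdb(text):
    """Return (experimental methods, atoms) from PDB-format text.

    Each atom is a tuple (atom_name, residue_name, chain_id, x, y, z).
    """
    methods, atoms = [], []
    for line in text.splitlines():
        record = line[0:6].strip()
        if record == "EXPDTA":
            # Columns 11-79: technique(s), separated by ';' for hybrid methods.
            methods.extend(m.strip() for m in line[10:79].split(";") if m.strip())
        elif record in ("ATOM", "HETATM"):
            atoms.append((
                line[12:16].strip(),  # atom name, columns 13-16
                line[17:20].strip(),  # residue name, columns 18-20
                line[21],             # chain identifier, column 22
                float(line[30:38]),   # x coordinate (angstroms), columns 31-38
                float(line[38:46]),   # y coordinate, columns 39-46
                float(line[46:54]),   # z coordinate, columns 47-54
            ))
    return methods, atoms


# Hypothetical two-line fragment, not taken from a real entry.
SAMPLE = (
    "EXPDTA    X-RAY DIFFRACTION\n"
    "ATOM      1  N   MET A   1      38.198  19.582  28.365\n"
)

if __name__ == "__main__":
    methods, atoms = parse_pdb(SAMPLE)
    print(methods)   # ['X-RAY DIFFRACTION']
    print(atoms[0])  # ('N', 'MET', 'A', 38.198, 19.582, 28.365)
```

Slicing fixed columns rather than splitting on whitespace matters here, because in the PDB format adjacent fields (for example large coordinates) can run together without a separating space.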
SWISS-PROT
• Another well-known and extensively used
protein database is SWISS-PROT.
• The data in each entry can be divided into
core data and annotation.
• The core data consist of the sequence, written
in the standard one-letter amino acid code,
together with the related references and
bibliography. The taxonomy of the organism
from which the sequence was obtained is also
part of this core information.
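The split between core data and annotation is visible in the Swiss-Prot flat-file format, where each line starts with a two-letter code (ID, AC, OS, OC, OX, RN, SQ and so on). The sketch below, assuming that layout, collects only the core data: identifier, accessions, organism and lineage, taxonomy identifier, number of references, and the sequence. `TOY_ENTRY` is a made-up, heavily trimmed record and `parse_core` is a hypothetical helper, not part of any UniProt tool.

```python
import re

# Hypothetical, heavily trimmed entry in the Swiss-Prot flat-file layout
# (two-letter line code, three spaces, value). Not a real UniProt record.
TOY_ENTRY = """\
ID   TEST_HUMAN              Reviewed;          12 AA.
AC   TEST01;
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Mammalia; Primates; Hominidae; Homo.
OX   NCBI_TaxID=9606;
RN   [1]
RA   Doe J.;
CC   -!- FUNCTION: Toy example protein.
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QL
//
"""


def parse_core(entry):
    """Extract core data (identifier, accessions, taxonomy, reference
    count and sequence) from one flat-file entry; annotation is skipped."""
    core = {"id": None, "accessions": [], "organism": "", "lineage": [],
            "taxon_id": None, "references": 0, "sequence": ""}
    in_sequence = False
    for line in entry.splitlines():
        if in_sequence:
            if line.startswith("//"):
                break  # end of entry
            core["sequence"] += line.replace(" ", "")  # residues come in blocks of 10
            continue
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            core["id"] = value.split()[0]
        elif code == "AC":
            core["accessions"] += [a.strip() for a in value.split(";") if a.strip()]
        elif code == "OS":
            core["organism"] = (core["organism"] + " " + value).strip()
        elif code == "OC":
            core["lineage"] += [t.strip().rstrip(".") for t in value.split(";") if t.strip()]
        elif code == "OX":
            match = re.search(r"NCBI_TaxID=(\d+)", value)
            core["taxon_id"] = int(match.group(1)) if match else None
        elif code == "RN":
            core["references"] += 1  # each RN line opens one reference block
        elif code == "SQ":
            in_sequence = True  # sequence lines follow until "//"
    core["organism"] = core["organism"].rstrip(".")
    return core


if __name__ == "__main__":
    print(parse_core(TOY_ENTRY)["sequence"])  # MKTAYIAKQRQL
```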
The annotation contains information on the
function or functions of the protein; post-translational
modifications such as phosphorylation and acetylation;
functional and structural domains and sites, such as
calcium-binding regions, ATP-binding sites, and zinc
fingers; known secondary structural features such as
alpha helices and beta sheets; the quaternary structure
of the protein; similarities to other proteins, if any;
and associated diseases. Sequence conflicts, which may
arise because different authors published different
sequences for the same protein, and variants caused by
mutations in different strains of an organism are also
described as part of the annotation.
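Much of this annotation is stored in the feature table (FT lines) of a Swiss-Prot entry, where each feature has a key such as DOMAIN, MOD_RES (modified residue), ZN_FING or HELIX and a residue range. A minimal sketch, assuming the current flat-file style in which the location is written as `start..end` (or a single position) and a note follows as a `/note=` qualifier line; the sample lines and the `parse_features` helper are hypothetical.

```python
import re

# Hypothetical feature-table (FT) lines: a feature key and a location,
# followed by optional qualifier lines such as /note="...". The keys mirror
# annotation types named above: a domain, a post-translationally modified
# residue and a helix.
TOY_FEATURES = """\
FT   DOMAIN          2..8
FT                   /note="Toy domain"
FT   MOD_RES         5
FT                   /note="Phosphotyrosine"
FT   HELIX           3..7
"""

# "start..end" or a single position; '<' / '>' mark uncertain ends.
LOCATION = re.compile(r"[<>]?(\d+)(?:\.\.[<>]?(\d+))?")


def parse_features(entry):
    """Return a list of {"type", "start", "end", "note"} dicts from FT lines."""
    features = []
    for line in entry.splitlines():
        if not line.startswith("FT "):
            continue  # not part of the feature table
        body = line[2:].strip()
        if body.startswith("/"):
            # Qualifier line: attach a /note to the most recent feature.
            if features and body.startswith("/note="):
                features[-1]["note"] = body[len("/note="):].strip('"')
            continue
        parts = body.split()
        key = parts[0]
        location = parts[1] if len(parts) > 1 else ""
        match = LOCATION.match(location)
        start = int(match.group(1)) if match else None
        end = int(match.group(2) or match.group(1)) if match else None
        features.append({"type": key, "start": start, "end": end, "note": ""})
    return features


if __name__ == "__main__":
    for f in parse_features(TOY_FEATURES):
        print(f["type"], f["start"], f["end"], f["note"])
```

Splitting on whitespace instead of fixed columns keeps the sketch tolerant of small layout differences; multi-line notes and other qualifiers are simply ignored.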
TrEMBL (Translated EMBL)
TrEMBL is a computer-annotated protein
sequence database released as a supplement
to SWISS-PROT. It contains the translations
of all coding sequences in the EMBL
Nucleotide Sequence Database that have not
yet been fully annotated and integrated into
SWISS-PROT. It may therefore contain the
sequences of predicted proteins that are
never actually expressed or identified in
the organism.
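Because TrEMBL entries are produced by computationally translating coding sequences, the core step can be sketched as reading a DNA sequence three bases at a time and looking each codon up in the genetic code. This sketch assumes the standard genetic code (NCBI translation table 1) and a sequence that already starts in the correct reading frame; the real pipeline also relies on the coding-sequence annotation of the nucleotide entry and on organism-specific translation tables. `translate` and `CODON_TABLE` are illustrative names.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# T, C, A, G order at the first, second and third positions; AMINO lists the
# encoded residue for each codon, with '*' marking a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds, to_stop=True):
    """Translate a coding DNA (or RNA) sequence codon by codon.

    Unknown codons (e.g. containing N) become 'X'. With to_stop=True the
    translation ends at the first stop codon; a trailing partial codon
    is ignored.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")
        if residue == "*" and to_stop:
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAATAG"))                 # MAK
    print(translate("ATGGCCAAATAG", to_stop=False))  # MAK*
```

The example also hints at why TrEMBL can contain proteins never observed in the organism: nothing in the translation step checks whether the protein is actually expressed.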
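Since the creation of UniProt, both databases form the
UniProt Knowledgebase (UniProtKB), which has two sections:
• UniProtKB/Swiss-Prot, which is manually
annotated and reviewed, and
• UniProtKB/TrEMBL, which is automatically
annotated and not reviewed.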
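The reviewed/unreviewed distinction carries over into the FASTA files that UniProt distributes: a header starting with `>sp|` marks a UniProtKB/Swiss-Prot entry and one starting with `>tr|` marks a UniProtKB/TrEMBL entry. The sketch below, assuming that header convention, sorts FASTA records into the two sections; the records and accessions are invented, and `split_by_review_status` is a hypothetical helper.

```python
# UniProtKB FASTA headers begin with a database code, "sp" for
# UniProtKB/Swiss-Prot (reviewed) or "tr" for UniProtKB/TrEMBL (unreviewed),
# followed by "|accession|entry_name description...".
STATUS = {"sp": "reviewed", "tr": "unreviewed"}

# Hypothetical records; accessions and sequences are made up.
TOY_FASTA = """\
>sp|TEST01|TEST_HUMAN Toy reviewed protein OS=Homo sapiens OX=9606
MKTAYIAKQR
QL
>tr|TEST02|TEST02_HUMAN Toy unreviewed protein OS=Homo sapiens OX=9606
MGSSHHHHHH
"""


def split_by_review_status(fasta_text):
    """Group FASTA records into {"reviewed": {...}, "unreviewed": {...}},
    each mapping accession -> sequence."""
    groups = {"reviewed": {}, "unreviewed": {}}
    target, accession = None, None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            db, accession, _rest = line[1:].split("|", 2)
            target = groups[STATUS[db]]  # KeyError for non-UniProtKB headers
            target[accession] = ""
        elif target is not None:
            target[accession] += line  # sequence lines may wrap
    return groups


if __name__ == "__main__":
    groups = split_by_review_status(TOY_FASTA)
    print(groups["reviewed"])    # {'TEST01': 'MKTAYIAKQRQL'}
    print(groups["unreviewed"])  # {'TEST02': 'MGSSHHHHHH'}
```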
Importance of Protein Databases
Huge amounts of data on protein structures,
functions, and particularly sequences are being
generated. Searching databases is often the first
step in the study of a new protein. Databases serve
the following purposes:
• Comparison between proteins or between
protein families reveals the relationships
between proteins within a genome or across
different species, and hence offers much more
information than can be obtained by studying
an isolated protein alone.
• Secondary databases derived from primary
(experimental) databases are also widely
available. These databases reorganize and
annotate the data or provide predictions.
• Using multiple databases together often helps
researchers understand the structure and
function of a protein.
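The first use listed above, comparing proteins, usually starts with a pairwise sequence alignment. As a minimal sketch, the code below implements Needleman-Wunsch global alignment with deliberately simple, assumed scores (match +1, mismatch -1, gap -2) instead of the substitution matrices such as BLOSUM62 and the affine gap penalties that real search tools use, and then reports the percent identity of the aligned columns. The function names and toy sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of sequences a and b.

    Returns (score, aligned_a, aligned_b) with '-' marking gaps.
    The scores are a toy scheme, not a substitution matrix.
    """
    n, m = len(a), len(b)

    def pair(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns holding identical residues."""
    if not aligned_a:
        return 0.0
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    score, top, bottom = needleman_wunsch("ACDE", "ACE")
    print(top)
    print(bottom)
    print(score, percent_identity(top, bottom))
```

With these scores, aligning `ACDE` with `ACE` places one gap opposite `D`, giving `ACDE` over `AC-E`, a score of 1 and 75.0% identity.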