Protein Information Resource (PIR)
Introduction
• An integrated, publicly accessible bioinformatics resource that supports
genomic and proteomic research and scientific discovery.
• Established in 1984 by the National Biomedical Research Foundation
(NBRF) at Georgetown University Medical Center, Washington, D.C.,
USA.
• A source of annotated protein databases and analysis tools for
researchers.
• Serves as a primary resource for the exploration of protein information.
• Accessible by text search for entry and list retrieval, as well as by BLAST
search and peptide match.
Features of PIR
• Comprehensive, non-redundant, annotated database containing protein
sequences from prokaryotes, eukaryotes, viruses, phages and archaea.
• Data is well organized: entries are classified into protein families
and superfamilies.
• The Protein Sequence Database (PSD) cross-references other public
genomic and proteomic databases.
• Updated weekly; full releases are published quarterly.
• Provides cross-references between its own databases.
Database Organization and Annotation
• Database organization and annotation are based on structuring entries
according to protein family relationships.
• On that basis, the database is structured at three levels:
1. Superfamilies and families: full-length sequence similarity
2. Homology domains: local functional and structural units
3. Motifs: functional and structural sites
Resources of PIR
The resources of PIR can be broadly classified into two
categories:
1. Data retrieval systems
2. Databases
Data Retrieval in PIR
Data retrieval in PIR consists of three types of search engine:
1. Interactive text-based search engine
• Boolean queries of text fields, evaluated as 0 (false) or 1 (true)
2. Standard sequence similarity search engines
• Peptide match
• Pattern match
• BLAST
• FASTA
• Pair-wise alignment
• Multiple alignment
3. Advanced search engines
• Combine sequence similarity and annotation searches
• Evaluation of gene-family relationships
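The two simplest retrieval modes above, a Boolean query over a text field and an exact peptide match, can be sketched in a few lines of Python. This is only a toy illustration: the `Entry` class, the entry identifiers and the sequences are invented for the example and are not PIR records or the PIR implementation.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: str   # hypothetical identifier, not a real PIR accession
    organism: str
    keywords: set
    sequence: str


# Invented example entries (assumption: not taken from any database).
ENTRIES = [
    Entry("E1", "Homo sapiens", {"kinase", "membrane"}, "MKTAYIAKQRQISFVKSHFSRQ"),
    Entry("E2", "Escherichia coli", {"kinase"}, "MSDKIAKQRAAAGLLV"),
    Entry("E3", "Homo sapiens", {"protease"}, "MGGSHFSRQLLP"),
]


def boolean_query(entries, must_have=(), must_not_have=()):
    """AND / NOT query on the keyword field; each entry evaluates to true or false."""
    hits = []
    for e in entries:
        has_all = all(k in e.keywords for k in must_have)
        has_none = not any(k in e.keywords for k in must_not_have)
        if has_all and has_none:
            hits.append(e.entry_id)
    return hits


def peptide_match(entries, peptide):
    """Exact peptide match: (entry_id, 1-based start) for every occurrence."""
    hits = []
    for e in entries:
        start = e.sequence.find(peptide)
        while start != -1:
            hits.append((e.entry_id, start + 1))
            start = e.sequence.find(peptide, start + 1)
    return hits
```

For example, `boolean_query(ENTRIES, must_have=["kinase"], must_not_have=["membrane"])` keeps only entries tagged "kinase" and not "membrane", while `peptide_match(ENTRIES, "IAKQR")` reports where that peptide occurs in each sequence.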
Databases of PIR
UniProt- Universal Protein Resource
• UniProt = PIR + EBI (European Bioinformatics Institute) + SIB (Swiss
Institute of Bioinformatics)
• A unified protein database: the central resource for protein sequence
and function
UniProt- Universal Protein Resource
UniProt consists of the following three databases:
1. UniProt Knowledgebase (UniProtKB)
2. UniProt Reference Clusters (UniRef)
3. UniProt Archive (UniParc)
UniProt Knowledgebase (UniProtKB)
• Central database of protein sequences with annotation and functional information.
• Provides a single record for all protein products derived from a given gene in a
given species.
• For each form of the derived protein, gives details such as accession number,
alternative splicing, proteolytic cleavage and post-translational modifications.
UniProtKB has two parts:
1. UniProtKB/Swiss-Prot: contains manually annotated records
2. UniProtKB/TrEMBL: contains computationally analyzed records, which have yet
to be manually annotated
UniProt Reference Clusters (UniRef)
• Provides non-redundant data collections based on the UniProt
Knowledgebase and UniParc, giving complete coverage of sequence
space at several resolutions.
Three separate datasets compress sequence space at different resolutions:
• Sequences that are 100% identical (UniRef100 database)
• Sequences that are >= 90% identical (UniRef90 database)
• Sequences that are >= 50% identical (UniRef50 database)
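The idea of compressing sequence space at different resolutions can be illustrated with a toy greedy clustering. This is a sketch under strong assumptions, not the UniRef pipeline: `identity` simply counts matching positions in equal-length, pre-aligned sequences, the sequences are invented, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions (toy measure: equal-length, pre-aligned input)."""
    if len(a) != len(b):
        raise ValueError("toy identity expects equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first cluster whose representative meets the threshold."""
    clusters = []  # list of (representative_id, [member_ids])
    for sid, seq in seqs.items():
        for rep_id, members in clusters:
            if identity(seqs[rep_id], seq) >= threshold:
                members.append(sid)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append((sid, [sid]))
    return clusters


# Invented sequences, chosen so the identity to s1 is easy to read off.
SEQS = {
    "s1": "AAAAAAAAAA",
    "s2": "AAAAAAAAAA",  # 100% identical to s1
    "s3": "AAAAAAAAAC",  # 90% identical to s1
    "s4": "AAAAACCCCC",  # 50% identical to s1
    "s5": "CCCCCCCCCC",  # 0% identical to s1
}

for threshold in (1.0, 0.9, 0.5):
    print(threshold, greedy_cluster(SEQS, threshold))
```

Lowering the threshold merges more sequences into each cluster, so the same input is represented by fewer clusters at 50% than at 100%.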
UniProt Archive (UniParc)
• Provides a stable, comprehensive, non-redundant sequence collection
by storing the complete body of publicly available protein sequence
data.
• When a new or revised protein sequence is added, a UniParc sequence
version is assigned or incremented, which makes it possible to track the
history of sequence changes across all the source databases.
• To avoid redundancy, each unique sequence is assigned a unique
identifier and is stored only once.
• The basic information stored with each UniParc entry is the identifier, the
sequence, a cyclic redundancy check (CRC) number, the source database with
accession or version number, and a time stamp.
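The storage rules above (one record per unique sequence, a checksum, cross-references to source databases, and a version that increases when a source entry is revised) can be sketched as a small in-memory archive. Everything here is an assumption made for illustration: the `ToyArchive` class, the `TOY…` identifier format and the use of `zlib.crc32` as the checksum are not UniParc's actual scheme.

```python
import zlib
from datetime import datetime, timezone


class ToyArchive:
    """Toy non-redundant sequence archive (illustrative, not UniParc itself)."""

    def __init__(self):
        self.records = {}    # sequence -> record dict, so each sequence is stored once
        self.versions = {}   # (source_db, accession) -> (version, record id)

    def add(self, sequence, source_db, accession):
        record = self.records.get(sequence)
        if record is None:
            record = {
                "id": f"TOY{len(self.records) + 1:06d}",  # hypothetical ID format
                "sequence": sequence,
                "crc": zlib.crc32(sequence.encode("ascii")),  # stand-in checksum
                "xrefs": [],
            }
            self.records[sequence] = record

        key = (source_db, accession)
        previous = self.versions.get(key)
        if previous is None:
            version = 1                      # first time this source entry is seen
        elif previous[1] == record["id"]:
            return record["id"], previous[0]  # same sequence again: nothing changes
        else:
            version = previous[0] + 1        # source entry now has a different sequence

        self.versions[key] = (version, record["id"])
        record["xrefs"].append({
            "db": source_db,
            "accession": accession,
            "version": version,
            "timestamp": datetime.now(timezone.utc),
        })
        return record["id"], version
```

Adding the same sequence from two source databases returns the same identifier both times, while re-adding an existing accession with a changed sequence returns a new identifier and a higher version for that accession.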
iProClass- Integrated Protein
Knowledgebase
• Provides a comprehensive description of protein family, function and
structure for UniProt protein sequences, and serves as a framework for
data integration in a distributed networking environment.
• Contains non-redundant protein sequences from PIR-PSD, Swiss-Prot and
TrEMBL.
iProClass covers:
• Family relationships, at the global level (superfamily, family) and the
local level (domain, motif, site)
• Structural classifications
• Functional classifications
Types of Protein sequence reports
iProClass provides two types of report:
1. First type: covers information on structure, function, family, genetics,
disease, ontology, taxonomy and literature, with references to relevant
molecular databases
2. Second type: a superfamily report with length, taxonomy and keyword
statistics, and a complete member listing
PIRSF-Protein Family Classification
System
• PIR extended its superfamily concept and developed the SuperFamily
classification system (PIRSF).
• It facilitates the sensible propagation and standardization of protein
annotation and the systematic detection of annotation errors.
• Consists of two datasets: preliminary clusters and curated families.
• Curated families include family name, protein membership, parent-child
relationship, domain architecture, optional description and
bibliography.
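The fields listed for a curated family map naturally onto a small record type, and the parent-child relationship onto a link between records. The sketch below is only one possible representation: the `CuratedFamily` class, the `lineage` helper, and the family identifiers and names are all invented and do not reflect the actual PIRSF schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CuratedFamily:
    family_id: str                      # hypothetical identifier
    name: str
    members: list                       # protein membership
    parent_id: Optional[str] = None     # parent-child relationship
    domain_architecture: list = field(default_factory=list)
    description: str = ""               # optional description
    bibliography: list = field(default_factory=list)


def lineage(families, family_id):
    """Walk from a family up through its parents; stops on a missing or repeated ID."""
    chain = []
    seen = set()
    current = family_id
    while current is not None and current in families and current not in seen:
        seen.add(current)
        chain.append(current)
        current = families[current].parent_id
    return chain


# Invented example families.
FAMILIES = {
    f.family_id: f
    for f in [
        CuratedFamily("FAM_A", "Toy kinase superfamily", ["P1", "P2", "P3"]),
        CuratedFamily("FAM_B", "Toy kinase subfamily", ["P1", "P2"], parent_id="FAM_A"),
    ]
}
```

Calling `lineage(FAMILIES, "FAM_B")` walks from the child family to its parent, which is the kind of traversal a parent-child hierarchy makes possible when propagating annotation.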
iProLINK
Integrated Protein Literature INformation and Knowledge
Provides annotated literature, a protein name directory and other
information to facilitate text mining for literature-based database
curation, protein ontology development and named entity
recognition.
