RDF
[Figure: plot over the years 1982 to 2010; y-axis ticks at 38B, 75B, 113B and 150B, with the unit B defined on the slide]
ID

Gene Ontology, EC, etc.
RDF
• UniProt
• PDBJ, DDBJ
• Bio2RDF, BioGateway
RDF
UniProt RDF
• UniProt
• UniProt RDF
UniProt

  Name        Description                                   Source               File size   #triples
  uniprot     Protein annotation data                       UniProt consortium   14G         3.3 B
  uniref      Clusters of proteins with similar sequences   UniProt consortium   7G          900M
  uniparc     Non-redundant archive of UniProt sequences    UniProt consortium   65G         1B
  citations   Literature citations                          UniProt consortium   1355M       10,177,308
  taxonomy    Classification of organisms                   UniProt consortium   421M        5,041,437
  journals    Journals                                      UniProt consortium   3M          34,850
  pathways    Pathways                                      UniProt consortium   1000K       8,865
  keywords    Keywords                                      UniProt consortium   940K        8,449
  locations   Subcellular locations                         UniProt consortium   468K        4,476
  tissues     Tissues                                       UniProt consortium   572K        7,439
  components  Cellular components (Organelles)              UniProt consortium   6K          43
  go          Gene Ontology                                 GO consortium        25M         263,944
  enzymes     Classification of enzymes                     SIB                  4M          4,476
  core.owl    Classes and properties for UniProt RDF        UniProt consortium   152K
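The #triples column mixes notations (3.3 B, 900M, plain integers). Here is a minimal Python sketch that normalizes the column and totals it. It assumes B = 10^9, M = 10^6 and K = 10^3. The helper name `parse_count` is hypothetical, and core.owl has no count, so it is left out.

```python
from decimal import Decimal

# #triples column of the UniProt RDF table, copied as written.
TRIPLES = {
    "uniprot": "3.3 B",
    "uniref": "900M",
    "uniparc": "1B",
    "citations": "10,177,308",
    "taxonomy": "5,041,437",
    "journals": "34,850",
    "pathways": "8,865",
    "keywords": "8,449",
    "locations": "4,476",
    "tissues": "7439",
    "components": "43",
    "go": "263,944",
    "enzymes": "4,476",
}

# Assumption: decimal suffixes, with B meaning billion.
MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9}

def parse_count(text):
    """Turn '3.3 B', '900M' or '10,177,308' into an integer."""
    cleaned = text.replace(",", "").replace(" ", "").upper()
    suffix = cleaned[-1]
    if suffix in MULTIPLIERS:
        # Decimal avoids float rounding for values like 3.3 * 10**9.
        return int(Decimal(cleaned[:-1]) * MULTIPLIERS[suffix])
    return int(cleaned)

total = sum(parse_count(value) for value in TRIPLES.values())
print(f"{total:,} triples")  # prints 5,215,551,287 triples
```

Under that assumption, uniprot, uniref and uniparc contribute 5.2 billion of the total. All the other datasets together add about 15.5 million.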
Triple store   Language   #triples
Sesame         Java       70M
4store         C          15B
5store         C
Virtuoso       C          15.4B
Jena           Java       1.7B
Bigdata        Java       12.7B
ARC            PHP
AllegroGraph   Lisp       1B
Source: http://esw.w3.org/LargeTripleStores
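These figures can be set against the UniProt RDF size with a small filter. This is a sketch only. The numbers are the loads reported on the cited page, not guaranteed limits, and real capacity depends on hardware and configuration. The threshold (uniprot + uniref + uniparc, 3.3B + 900M + 1B, taken from the UniProt table) and the function name are our choices for illustration.

```python
# Largest triple counts reported on the slide; 5store and ARC have no figure and are omitted.
REPORTED_TRIPLES = {
    "Sesame": 70_000_000,
    "4store": 15_000_000_000,
    "Virtuoso": 15_400_000_000,
    "Jena": 1_700_000_000,
    "Bigdata": 12_700_000_000,
    "AllegroGraph": 1_000_000_000,
}

def stores_reporting_at_least(n_triples):
    """Names of stores whose reported count is >= n_triples, largest first."""
    matches = [(count, name) for name, count in REPORTED_TRIPLES.items() if count >= n_triples]
    return [name for count, name in sorted(matches, reverse=True)]

# Illustrative threshold: uniprot + uniref + uniparc from the UniProt table.
needed = 3_300_000_000 + 900_000_000 + 1_000_000_000
print(stores_reporting_at_least(needed))  # ['Virtuoso', '4store', 'Bigdata']
```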
Protein  UniProt
Components  encodedIn
core.owl

<owl:ObjectProperty rdf:about="encodedIn">
    <rdfs:label rdf:datatype="&xsd;string">encoded in</rdfs:label>
    <rdfs:comment rdf:datatype="&xsd;string"
        >The subcellular location where a protein is encoded.</rdfs:comment>
    <rdfs:domain rdf:resource="Protein"/>
    <rdfs:range rdf:resource="Subcellular_Location"/>
</owl:ObjectProperty>
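The relative names in this fragment (`encodedIn`, `Protein`, `Subcellular_Location`) only become full URIs once they are resolved against a base. The following standard-library sketch reads the property's label, domain and range and resolves them. It makes two assumptions. First, the fragment is wrapped in an `rdf:RDF` root whose `xml:base` is `http://purl.uniprot.org/core/`. Second, `&xsd;` is written out as the XSD namespace. The function name is hypothetical, and a full RDF/XML parser would do this resolution itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Slide fragment wrapped in an rdf:RDF root; the xml:base value is an assumption.
FRAGMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}"
         xml:base="http://purl.uniprot.org/core/">
  <owl:ObjectProperty rdf:about="encodedIn">
    <rdfs:label rdf:datatype="{XSD}string">encoded in</rdfs:label>
    <rdfs:comment rdf:datatype="{XSD}string"
      >The subcellular location where a protein is encoded.</rdfs:comment>
    <rdfs:domain rdf:resource="Protein"/>
    <rdfs:range rdf:resource="Subcellular_Location"/>
  </owl:ObjectProperty>
</rdf:RDF>"""

def describe_object_property(xml_text):
    """Label, domain and range of the first owl:ObjectProperty, with URIs made absolute."""
    root = ET.fromstring(xml_text)
    base = root.get(XML_BASE, "")
    prop = root.find(f"{{{OWL}}}ObjectProperty")

    def absolute(element, attr):
        # Resolve rdf:about / rdf:resource against xml:base.
        return urljoin(base, element.get(f"{{{RDF}}}{attr}"))

    return {
        "property": absolute(prop, "about"),
        "label": prop.findtext(f"{{{RDFS}}}label"),
        "domain": absolute(prop.find(f"{{{RDFS}}}domain"), "resource"),
        "range": absolute(prop.find(f"{{{RDFS}}}range"), "resource"),
    }

print(describe_object_property(FRAGMENT))
```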
RDF  purl
http://purl.uniprot.org/{database}/{identifier}
UniProt
http://purl.uniprot.org/core/
Gene  URI
http://purl.uniprot.org/core/Gene
type
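The URI pattern above can be built and taken apart mechanically. The following Python sketch does both. It treats `core` as if it were a `{database}` segment, which is our reading of the slide. `P12345` is only an illustrative accession, and the function names are hypothetical.

```python
from urllib.parse import urlparse

PURL_BASE = "http://purl.uniprot.org"

def purl(database, identifier):
    """Build a URI following http://purl.uniprot.org/{database}/{identifier}."""
    return f"{PURL_BASE}/{database}/{identifier}"

def split_purl(uri):
    """Split a purl.uniprot.org URI back into (database, identifier)."""
    parsed = urlparse(uri)
    if parsed.netloc != "purl.uniprot.org":
        raise ValueError(f"not a purl.uniprot.org URI: {uri}")
    database, _, identifier = parsed.path.lstrip("/").partition("/")
    return database, identifier

# Classes of the UniProt core vocabulary, such as Gene, sit under core/.
print(purl("core", "Gene"))  # http://purl.uniprot.org/core/Gene
print(split_purl(purl("uniprot", "P12345")))
```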
PDBJ, DDBJ  RDF
• PDBJ  47  4.7B
• http://www.pdbj.org/rdf  ID
• DDBJ  INSD: International Nucleotide Sequence Database  1.2  76  7.6B
• mulgara (http://mulgara.org/)
RDF
KEGG Taxonomy                     23,238
KEGG GENES Cyanobacteria         708,745
KEGG OC                       10,384,602
hmmer Pfam-A vs Cyano         11,881,212
hmmer Pfam-B vs Cyano          7,007,154
Kazusa Annotation              2,807,879
1
• Synechococcus
• 1.0e-20
• Pfam
1 SPARQL
SPARQL
PREFIX hmmer: <http://hmmer.janelia.org/>
PREFIX kegg:  <http://www.kegg.jp/>
PREFIX kg:    <http://www.kegg.jp/entry/>
PREFIX pfam:  <http://pfam.sanger.ac.uk/>
PREFIX kt:    <http://www.kegg.jp/taxon/>
# For each pair of distinct Pfam domains hit in the same gene,
# count the Synechococcus organisms in which the pair occurs.
SELECT ?pfam1 ?pfam2 (COUNT(DISTINCT ?org) AS ?species)
WHERE {
  GRAPH <hmmer_pfam_a_cyano> {
    ?gene hmmer:hit      ?n1 .
    ?gene hmmer:hit      ?n2 .
    ?n1   pfam:pfam_id   ?pfam1 .
    ?n1   hmmer:i-evalue ?eval1 .
    ?n2   pfam:pfam_id   ?pfam2 .
    ?n2   hmmer:i-evalue ?eval2 .
  }
  GRAPH <http://www.kegg.jp/genes> {
    ?gene kegg:belongs_to ?org .
  }
  GRAPH <http://www.kegg.jp/taxonomy> {
    ?org kegg:belongs_to kt:Synechococcus .
  }
  FILTER (?eval1 < 1.0e-10 && ?eval2 < 1.0e-10 && ?pfam1 != ?pfam2)
}
GROUP BY ?pfam1 ?pfam2
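The query joins three graphs. It takes hmmer hits per gene, the organism of each gene, and the taxon of each organism. It keeps domain pairs whose hits both pass the e-value cut-off and counts distinct organisms per pair. The plain-Python sketch below mirrors that logic on hypothetical toy data; none of the genes, organisms or e-values come from the slides. As in the query's `?pfam1 != ?pfam2` filter, each pair is counted in both orders. A condition such as `STR(?pfam1) < STR(?pfam2)` would keep only one order.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy data standing in for the three named graphs.
# hmmer graph: gene -> list of (Pfam ID, i-evalue)
HITS = {
    "g1": [("G6PD_N", 1e-30), ("G6PD_C", 1e-25)],
    "g2": [("G6PD_N", 1e-40), ("G6PD_C", 1e-5)],   # second hit too weak
    "g3": [("G6PD_N", 1e-22), ("G6PD_C", 1e-18)],
    "g4": [("EFG_C", 1e-15), ("EFG_IV", 1e-20)],
}
GENE_TO_ORG = {"g1": "org1", "g2": "org2", "g3": "org2", "g4": "org3"}        # genes graph
ORG_TO_TAXON = {"org1": "Synechococcus", "org2": "Synechococcus", "org3": "OtherTaxon"}  # taxonomy graph

def co_occurring_domains(taxon="Synechococcus", max_evalue=1.0e-10):
    """Map each ordered pair of distinct domains to the number of organisms containing it."""
    orgs_per_pair = defaultdict(set)
    for gene, hits in HITS.items():
        org = GENE_TO_ORG.get(gene)
        if ORG_TO_TAXON.get(org) != taxon:
            continue
        # Domains whose hit passes the e-value filter in this gene.
        strong = {pfam for pfam, evalue in hits if evalue < max_evalue}
        for pfam1, pfam2 in permutations(strong, 2):
            orgs_per_pair[(pfam1, pfam2)].add(org)
    return {pair: len(orgs) for pair, orgs in orgs_per_pair.items()}

for (pfam1, pfam2), n_species in sorted(co_occurring_domains().items()):
    print(pfam1, pfam2, n_species)
```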
10

Domain I          Domain II          #genes   #species
RNA_pol_Rpb2_3    RNA_pol_Rpb2_1          9          9
G6PD_N            G6PD_C                  9          9
5_3_exonuc_N      5_3_exonuc              9          9
HIT               DcpS_C                  9          9
Glyco_hydro_38    Glyco_hydro_38C         9          9
RNA_pol_Rpb2_6    RNA_pol_Rpb2_3          9          9
GARS_N            GARS_C                  9          9
DSHCT             DEAD                    9          9
adh_short         KR                     12          9
EFG_C             EFG_IV                 10          9
....  171  9  Synechococcus
2
• KEGG  OC
• Cyanobacteria
• Kazusa Annotation  PubMed
• KO  KEGG Orthology
2 SPARQL
SPARQL
PREFIX kegg: <http://www.kegg.jp/>
PREFIX kg:   <http://www.kegg.jp/entry/>
PREFIX kt:   <http://www.kegg.jp/taxon/>
PREFIX kns:  <http://a.kazusa.or.jp/ns/>
# For each ortholog cluster (OC), cyanobacterial gene and optional KO,
# count the distinct PubMed IDs reachable through CyanoBase.
SELECT ?oc ?gene ?ko (COUNT(DISTINCT ?pm) AS ?pmids)
WHERE {
  GRAPH <http://www.kegg.jp/oc> {
    ?gene kegg:belongs_to ?oc .
  }
  GRAPH <http://www.kegg.jp/genes> {
    ?gene kegg:belongs_to ?taxon .
    ?gene kegg:linked_to  ?cb_gene .
    OPTIONAL {
      ?gene kg:ortholog ?ko .
    }
  }
  GRAPH <http://www.kegg.jp/taxonomy> {
    ?taxon kegg:belongs_to kt:Cyanobacteria .
  }
  GRAPH <http://kazusa.or.jp/cyanobase> {
    ?cb_gene ?p1 ?bm .
    ?bm      ?p2 ?pm .
  }
}
GROUP BY ?oc ?gene ?ko
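The query returns a count for each combination of OC, gene and KO. The results table, however, reports per OC the number of genes with a PubMed ID and the number of distinct PubMed IDs. The sketch below shows that coarser roll-up in plain Python on hypothetical bindings; every identifier is made up. In SPARQL, the same roll-up would select `(COUNT(DISTINCT ?gene) AS ?genes)` and `(COUNT(DISTINCT ?pm) AS ?pmids)` grouped by `?oc` alone.

```python
from collections import defaultdict

# Hypothetical (oc, gene, pubmed_id) bindings standing in for ?oc, ?gene and ?pm.
ROWS = [
    ("OC_A", "gene1", "pmid100"),
    ("OC_A", "gene1", "pmid101"),
    ("OC_A", "gene2", "pmid101"),
    ("OC_B", "gene3", "pmid200"),
    ("OC_B", "gene3", "pmid200"),  # same paper reached through two paths
]

def summarize_by_oc(rows):
    """Per ortholog cluster: (oc, distinct genes with a PMID, distinct PMIDs)."""
    genes = defaultdict(set)
    pmids = defaultdict(set)
    for oc, gene, pmid in rows:
        genes[oc].add(gene)
        pmids[oc].add(pmid)
    summary = [(oc, len(genes[oc]), len(pmids[oc])) for oc in genes]
    # Clusters with the most distinct PubMed IDs first, as in the results table.
    summary.sort(key=lambda row: row[2], reverse=True)
    return summary

for oc, n_genes, n_pmids in summarize_by_oc(ROWS):
    print(oc, n_genes, n_pmids)
```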
PubMed ID  10

OC              #genes with PMID   #PMID
Genes_537709           3            1296
Genes_565278           3             761
Genes_710476           2             527
Genes_189668           1             497
Genes_710587           1             479
Genes_710480           1             416
Genes_711471           1             407
Genes_71824            1             393
Genes_75617            5             381
Genes_711511           1             376
Semantic Web
• URI
• W3C

Semantic Web
• SPARQL ->
• ->
• ->
Linked Data for integrating life-science databases
