Tutorial on the DisGeNET Discovery Platform, with a special focus on its use in the Semantic Web, showing how to retrieve DisGeNET data and integrate it with other RDF linked datasets.
DisGeNET Tutorial SWAT4LS 2015-12-07
1. A discovery platform for translational research
Núria Queralt Rosinach
Integrative Biomedical Informatics Group (IBI)
Research Programme on Biomedical Informatics (GRIB)
Hospital del Mar Research Institute (IMIM)
Pompeu Fabra University (UPF)
Barcelona
Usage Tutorial
2. Outline
• How can DisGeNET help your research?
• DisGeNET Discovery Platform Overview
• DisGeNET Linked Open Data
– Introduction
• RDF-LD Description: Data Model, VoID, Interlinking
• Implementation
• Accessibility
• Documentation
• Use Cases
– Querying the DisGeNET-RDF
• Hands-on
3. How can DisGeNET help your research?
4. Big Questions 4 Big Data
[Diagram: Understanding Human Diseases]
• Genotype, Phenotype
• Environment (life-style, chemicals, radiation, infections, clinical care intervention, …)
• Human Biology, Medical Sciences
• PPI, DDI, Comorbidities
• Data sources: EMR, EHR, IoT; Imaging; Patient registries; Clinical trials; Epidemiologic studies; …
• Data sources: Data Bases; Literature; OMICS; Animal models; …
5. Translational Research
[Diagram: Understanding Human Diseases]
• Genotype, Phenotype, Environment
• Molecular, Patient
• Data sources: EMR, EHR, IoT; Imaging; Patient registries; Clinical trials; Epidemiologic studies; …
• Data sources: Data Bases; Literature; OMICS; Animal models; …
Key in Translational Research
• Decision-making
• Prevention
• Diagnosis
• Therapies
• Research Discovery
7. Access to Gene-Disease Associations
Mental retardation -?- SOX3
SOX3 as annotated in different sources:
• OMIM:300123; OMIM:312000
• ORPHA393; ORPHA90695; ORPHA3157; ORPHA79495; ORPHA67045
• Mental Retardation; Panhypopituitarism; 46,XX sex reversal 3
• MESH:C538613; MESH:C538613
• No Data
Lack of:
• Normalization
• Semantic integration
• Data model harmonization
• Unified access
8. http://www.disgenet.org/
• Piñero et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database (2015) Vol. 2015: article ID bav028
• Knowledge platform on human gene-disease associations (GDAs)
• Integrates information from expert-curated databases and from the literature (text mining)
• All disease areas
• Supporting evidence
• Analysis tools
9. Research Questions
ANALYSIS
KNOWLEDGE
DISCOVERY
ACTIONABLE
INFORMATION
Evidence
• Which genes are associated with Marfan syndrome?
• Which disease genes have approved drugs annotated?
• Which disease genes have differential expression?
• Which disease genes share a pathway?
• Is there genetic variation related to the MECP2 and Rett Syndrome association?
• What evidence supports the association between the APP gene and Alzheimer Disease?
• Which genes and evidence support the comorbidity between Chronic Kidney disease and Diabetes Mellitus, Type 2?
25. Data Model
• How to describe an association?
a) As a property: Gene (S) -> associated (P) -> Disease (O)
b) As a class: Association (S) -> P -> Gene (O); Association (S) -> P -> Disease (O)
• Provenance and Evidence: RDF triples
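The two modelling options can be sketched in Python, with plain tuples standing in for RDF triples. The URIs and the predicates sio:SIO_000628 and sio:SIO_000772 are taken from later slides of this tutorial; the `ex:associated` predicate, the helper name `objects`, and the pairing of this particular gene, disease, association ID, and article are illustrative assumptions, not actual DisGeNET records.

```python
# Triples are (subject, predicate, object) tuples; prefixed names stay as strings.
GENE = "http://identifiers.org/ncbigene/4728"
DISEASE = "http://linkedlifedata.com/resource/umls/id/C0023264"
PAPER = "http://identifiers.org/pubmed/9837812"

# a) Association as a property: one triple, with no node to attach evidence to.
#    "ex:associated" is a made-up predicate for illustration.
as_property = [(GENE, "ex:associated", DISEASE)]

# b) Association as a class: the association is a resource of its own,
#    so provenance and evidence become ordinary triples about it.
GDA = "http://rdf.disgenet.org/resource/gda/DGNf5cb3969d75871f05a5d5f984f8dfc34"
as_class = [
    (GDA, "rdf:type", "sio:SIO_001122"),
    (GDA, "sio:SIO_000628", GENE),
    (GDA, "sio:SIO_000628", DISEASE),
    (GDA, "sio:SIO_000772", PAPER),  # illustrative pairing only
]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(as_class, GDA, "sio:SIO_000628"))
```

With option b), asking for the evidence of an association is just another lookup on the association node, which is the reason the class-based model is the one queried in the hands-on part.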
26. Data Model
• Ontology-based integration
• DisGeNET Standards
– Shared IDs
– Standard ontologies
[Diagram: Association (S) -> P -> Gene (O); Association (S) -> P -> Disease (O); Association rdf:type DisGeNET Association Type Ontology]
http://semanticscience.org/ontology/sio.owl
31. Data Model
• URIs in DisGeNET: shared, cool & dereferenceable
– ID Normalization
– DisGeNET URIs (unique association attributes): http://rdf.disgenet.org/resource/entity/ID
– Stable URIs from primary data providers
– Identifiers.org: http://identifiers.org/data-collection-namespace/ID
32. Data Model
• URIs in DisGeNET: shared, cool & dereferenceable
– ID Normalization
– Gene-Disease Association::DisGeNET ID
Entity | URI | Semantics
Gene-Disease Association | http://rdf.disgenet.org/resource/gda/DGNf5cb3969d75871f05a5d5f984f8dfc34 | sio:SIO_001122
PubMed article | http://identifiers.org/pubmed/9837812 | ncit:C47902
Source | http://rdf.disgenet.org/v3.0.0/void/uniprot-20150221 | dctypes:Dataset, dcat:Distribution
Score | http://rdf.disgenet.org/resource/gda/ncbigene:4728_umls:C0023264_association_DisGeNETScore | ncit:C25338
SNP | http://identifiers.org/dbsnp/rs28939679 | ncit:C18279
33. Data Model
• URIs in DisGeNET: shared, cool & dereferenceable
– ID Normalization
– Gene::NCBI Gene ID
Entity | URI | Semantics
Gene | http://identifiers.org/ncbigene/4728 | ncit:C16612
HGNC Gene Symbol | http://identifiers.org/hgnc.symbol/NDUFS8 | ncit:C43568
Protein | http://identifiers.org/uniprot/O00217 | ncit:C17021
Panther Class | http://rdf.disgenet.org/resource/panther.classification/PC00211 | rdfs:Class
Pathway | http://identifiers.org/reactome/REACT_111217 | ncit:C20633
34. Data Model
• URIs in DisGeNET: shared, cool & dereferenceable
– ID Normalization
– Disease::UMLS Concept Unique Identifier (CUI)
Entity | URI | Semantics
Disease | http://linkedlifedata.com/resource/umls/id/C0023264 | ncit:C7057
MeSH Class | http://rdf.imim.es/rh-mesh.owl#C18 | rdfs:Class
UMLS SemanticType | http://biotop.googlecode.com/svn/trunk/umlssn.owl#T047 | rdfs:Class
Phenotype | http://purl.obolibrary.org/obo/HP_0004633 | sio:SIO_010056
Cross References | http://identifiers.org/vocab-namespace/ID | Human Disease Ontology, MeSH, OMIM, Orphanet, Decipher, NCIt, ICD9, Human Phenotype Ontology
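The URI patterns on the Data Model slides (http://identifiers.org/data-collection-namespace/ID and http://rdf.disgenet.org/resource/entity/ID) are simple enough to build by string formatting. The following is a minimal sketch; the function and constant names are hypothetical and not part of any DisGeNET tooling.

```python
IDENTIFIERS_ORG = "http://identifiers.org"
DISGENET_RESOURCE = "http://rdf.disgenet.org/resource"


def identifiers_org_uri(namespace: str, local_id: str) -> str:
    """Build an Identifiers.org URI: http://identifiers.org/<namespace>/<ID>."""
    return f"{IDENTIFIERS_ORG}/{namespace}/{local_id}"


def disgenet_uri(entity: str, local_id: str) -> str:
    """Build a DisGeNET resource URI: http://rdf.disgenet.org/resource/<entity>/<ID>."""
    return f"{DISGENET_RESOURCE}/{entity}/{local_id}"


# Examples reusing identifiers shown on the slides.
print(identifiers_org_uri("ncbigene", "4728"))
print(identifiers_org_uri("hgnc.symbol", "NDUFS8"))
print(disgenet_uri("gda", "DGNf5cb3969d75871f05a5d5f984f8dfc34"))
```

Note that diseases use a third pattern (http://linkedlifedata.com/resource/umls/id/CUI), so a real helper would need one more case.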
40. DisGeNET as Linked Open Data
• Interlinking: 4,962,315 RDF links to RDF datasets in the LOD
https://datahub.io/dataset/disgenet
(more statistics)
41. Federated Query Support
• SPARQL 1.1: SERVICE <sparql endpoint> {}
[Diagram: GDA ID, Disease ID, Gene ID; skos:exactMatch]
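As a rough illustration of the SERVICE keyword, the sketch below assembles a federated query as a Python string: a local gene-disease pattern (the same one used in the hands-on queries) plus a SERVICE block evaluated at a remote endpoint. The function name, the remote endpoint URL, and the remote graph pattern are placeholders chosen for this example, not real DisGeNET federation targets.

```python
def federated_query(remote_endpoint: str, remote_pattern: str, limit: int = 10) -> str:
    """Return a SPARQL 1.1 query that joins local GDA data with a remote endpoint."""
    return (
        "PREFIX sio: <http://semanticscience.org/resource/>\n"
        "PREFIX ncit: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>\n"
        "SELECT DISTINCT ?gda ?gene ?disease\n"
        "WHERE {\n"
        "  ?gda sio:SIO_000628 ?gene, ?disease .\n"
        "  ?gene a ncit:C16612 .\n"
        "  ?disease a ncit:C7057 .\n"
        # The SERVICE block is sent to the remote endpoint; shared variables join the results.
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder endpoint and pattern, for illustration only.
query = federated_query("http://example.org/sparql", "?gene ?p ?o .")
print(query)
```

The join happens on variables that appear both inside and outside the SERVICE block (here ?gene), which is why shared identifiers matter for federation.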
42. Implementation
• DisGeNET RDF data, VoID dataset description, and six OWL ontologies loaded into the RDF Store
• Total number of triples: 24,882,432 (8.5G)
[Diagram: SPARQL Endpoint; Faceted Browser; LODEStar: SPARQL + LD Browser]
Hardware:7.1.0
Usage Restrictions
• SPARQL:
• only SELECT, DESCRIBE, ASK, CONSTRUCT
• performance opt:
• Max # of rows per result
• Max query cost estimation time
• Max query execution time
Security: basic setup
46. SPARQL QUERIES
• Not easy
• RDF Schema-aware
• Performance issues
• Optimal queries: there is a trade-off between the time you spend analyzing and transforming the query and the performance gains of those transformations
• Technology-dependent
• Crossing a lot of information decreases speed (and can make the system fail): better done locally
• Other approaches in development
• Q/A based on natural language
• Linked Data Fragments
• ElasticSearch
47. Querying DisGeNET
• SPARQL Queries over DisGeNET data
http://rdf.disgenet.org/sparql/
http://rdf.disgenet.org/lodestar/sparql
• Contains all DisGeNET data
• Free access
• SPARQL 1.1 Standard
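The queries in this tutorial use prefixes such as sio: and ncit: without declaring them, which works only if the endpoint predefines them. A minimal sketch, using just the Python standard library, of how a query could be sent programmatically: it prepends explicit PREFIX declarations and builds (but does not send) an HTTP GET request following the SPARQL protocol. The helper names are hypothetical, and whether the endpoint is still reachable at this address is not something this sketch checks.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://rdf.disgenet.org/sparql/"

# Namespaces for the prefixes used in the tutorial queries.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "sio": "http://semanticscience.org/resource/",
    "ncit": "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#",
}


def with_prefixes(query: str) -> str:
    """Prepend PREFIX declarations so the query does not rely on endpoint defaults."""
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items())
    return header + "\n" + query


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a SPARQL-protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": with_prefixes(query)})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(
    "SELECT DISTINCT ?gda WHERE { ?gda rdf:type sio:SIO_001122 } LIMIT 100"
)
# To execute: urllib.request.urlopen(request), then parse the JSON response body.
print(request.full_url)
```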
51. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Minimal Resource Description Graph
53. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
[Diagram: ?gda rdf:type sio:SIO_001122]
SELECT DISTINCT ?gda
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda rdf:type sio:SIO_001122 .
}
LIMIT 100
56. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
• What is the sio:SIO_001122 class?
SELECT DISTINCT ?gda ?type ?label
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda rdf:type ?type .
  FILTER(?type = sio:SIO_001122)
  ?type rdfs:label ?label
}
LIMIT 100
58. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
• For each ?gda, show me the ?gene and the ?disease associated, and the ?typeOfAssociation
SELECT DISTINCT ?gda ?gene ?disease ?type ?label
FROM <http://rdf.disgenet.org>
WHERE {
  # sio:SIO_000628 ("refers to") links the association to its gene and disease
  ?gda rdf:type ?type ;
       sio:SIO_000628 ?gene, ?disease .
  ?type rdfs:label ?label .
  ?gene a ncit:C16612 .
  ?disease a ncit:C7057
}
LIMIT 50
60. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
• For each ?gda, show me the ?gene and the ?disease associated, the ?paper, and the ?sentence description of the relationship in the paper
SELECT DISTINCT ?gda ?gene ?disease ?paper ?sentence
FROM <http://rdf.disgenet.org>
WHERE {
  # sio:SIO_000772 ("has evidence") points to the supporting article
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000772 ?paper ;
       dcterms:description ?sentence .
  ?gene a ncit:C16612 .
  ?disease a ncit:C7057
}
LIMIT 50
61. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
• For each ?gda, show me the ?gene and the ?disease associated, the ?paper, and the ?sentence description of the relationship in the paper
SELECT DISTINCT ?gda ?gene ?disease ?paper ?sentence
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000772 ?paper ;
       dcterms:description ?sentence .
  FILTER(regex(str(?sentence), "syndrome", "i"))
  ?gene a ncit:C16612 .
  ?disease a ncit:C7057
}
LIMIT 50
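For readers new to SPARQL filters, the effect of FILTER(regex(str(?sentence), "syndrome", "i")) can be mimicked in Python: keep only the rows whose sentence contains the pattern, ignoring case. This is only an analogy (SPARQL uses XPath regular expressions, which behave the same as Python's for a plain word like this one), and the example sentences are invented.

```python
import re


def matches_filter(sentence: str, pattern: str = "syndrome") -> bool:
    """Mimic regex(str(?sentence), pattern, "i"): case-insensitive substring match."""
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


# Invented sentences standing in for dcterms:description values.
sentences = [
    "Mutations in gene X cause a rare Syndrome.",
    "Gene Y is overexpressed in tumour tissue.",
]

kept = [s for s in sentences if matches_filter(s)]
print(kept)
```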
63. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
• For each ?gda show me the ?gene, ?disease, ?source, and the level of ?evidence of the association
PREFIX wi: <http://purl.org/ontology/wi/core#>
SELECT DISTINCT ?gda ?gene ?disease ?source ?evidence
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000253 ?source .
  ?gene a ncit:C16612 .
  ?disease a ncit:C7057 .
  ?source wi:evidence ?evidence
}
LIMIT 50
65. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
• For each gene-disease pair show me the ?number of evidences and the score ?value
SELECT DISTINCT ?gene ?disease (COUNT(DISTINCT ?gda) AS ?numberOfEvidences) ?scoreValue
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000216 ?score .
  ?gene a ncit:C16612 .
  ?disease a ncit:C7057 .
  ?score sio:SIO_000300 ?scoreValue
}
# Standard SPARQL 1.1 requires the parenthesized aggregate and an explicit GROUP BY
GROUP BY ?gene ?disease ?scoreValue
ORDER BY DESC(?numberOfEvidences) DESC(?scoreValue)
LIMIT 50
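The counting query above returns one row per gene, disease, and score value, together with the number of distinct ?gda nodes in that group, sorted by count and then by score. The same grouping logic is sketched below in Python over a handful of invented rows; all identifiers and scores here are hypothetical.

```python
from collections import defaultdict

# Invented solution rows, as the WHERE clause might bind them: (gda, gene, disease, score)
rows = [
    ("gda1", "geneA", "diseaseX", 0.9),
    ("gda2", "geneA", "diseaseX", 0.9),
    ("gda2", "geneA", "diseaseX", 0.9),  # repeated binding, counted once (DISTINCT)
    ("gda3", "geneB", "diseaseY", 0.4),
]

# Group by (gene, disease, score) and collect the distinct associations per group.
groups = defaultdict(set)
for gda, gene, disease, score in rows:
    groups[(gene, disease, score)].add(gda)

# Sort like ORDER BY DESC(?numberOfEvidences) DESC(?scoreValue).
result = sorted(
    ((gene, disease, len(gdas), score) for (gene, disease, score), gdas in groups.items()),
    key=lambda r: (-r[2], -r[3]),
)
print(result)
```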
67. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene-Disease Association Graph
• For each ?gda show me the ?snp
SELECT DISTINCT ?gda ?gene ?disease ?snp
FROM <http://rdf.disgenet.org>
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease ;
       sio:SIO_000001 ?snp .
  ?gene a ncit:C16612 .
  ?disease a ncit:C7057 .
}
LIMIT 50
• Go to the Web and understand and execute Q1.1-Q1.4
69. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene Graph
[Diagram: ?gene rdf:type Gene]
SELECT DISTINCT ?gene
FROM <http://rdf.disgenet.org>
WHERE {
  ?gene rdf:type ncit:C16612 .
}
LIMIT 100
71. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Gene Graph
• For each ?gene show me:
• ?identifier, ?name, ?geneSymbol
• ?protein(s)
• ?panther class(es) and ?panther class name
• ?pathway(s) and ?pathway name
• Go to the Web and understand/execute Q1.5
73. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Disease Graph
[Diagram: ?disease rdf:type Disease]
SELECT DISTINCT ?disease
FROM <http://rdf.disgenet.org>
WHERE {
  ?disease a ncit:C7057 .
}
LIMIT 100
75. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Disease Graph
• For the disease <http://linkedlifedata.com/resource/umls/id/C0596263> show me:
• the disease ?name, MeSH disease class ?label, and the umlsSTY ?title
• show all cross-references to other disease terminologies
• Go to the Web and understand/execute Q1.6
76. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Disease mapping to other ontologies
SELECT DISTINCT ?disease
FROM <http://rdf.disgenet.org>
WHERE {
  ?disease skos:exactMatch ?ontology .
}
[Diagram: Disease ?link ?ontology; COVERAGE]
77. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Ontology Walking queries
• Grouping of similar instances
• Filtering data
• Query data by classes
• Ontologies loaded in our RDF triple store: SIO, DO, ORDO, NCIT, HPO, and ECO (OWL)
• Go to the Web and understand/execute Q1.7 and Q1.11
?child rdfs:subClassOf+ ?parent
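The property path rdfs:subClassOf+ follows one or more subclass links, so it returns every ancestor of a class, not just its direct parent. A small Python sketch of that traversal over a toy hierarchy follows; the class names and the `ancestors` helper are invented for illustration and do not come from any of the loaded ontologies.

```python
# Invented direct rdfs:subClassOf edges: child -> set of direct parents.
SUBCLASS_OF = {
    "ex:RareHeartDisease": {"ex:HeartDisease"},
    "ex:HeartDisease": {"ex:CardiovascularDisease"},
    "ex:CardiovascularDisease": {"ex:Disease"},
}


def ancestors(cls: str) -> set:
    """Return all classes reachable via one or more subClassOf steps (the '+' path)."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("ex:RareHeartDisease")))
```

Because the path requires at least one step, the starting class is not among its own ancestors; a query that should also match the class itself would use rdfs:subClassOf* instead.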
79. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Disease-Phenotype Association Graph (curated from HPO)
• Why this model?
SELECT DISTINCT ?disease (COUNT(DISTINCT ?hpdisease) AS ?hpdiseases) (COUNT(DISTINCT ?phenotype) AS ?phenotypes)
WHERE {
  ?disease rdf:type ncit:C7057 .
  ?disease skos:exactMatch ?hpdisease .
  ?hpdisease sio:SIO_000341 ?phenotype .
}
GROUP BY ?disease
ORDER BY DESC(?hpdiseases)
LIMIT 100

SELECT DISTINCT ?disease ?hpdisease (COUNT(DISTINCT ?phenotype) AS ?phenotypes)
WHERE {
  ?disease rdf:type ncit:C7057 .
  ?disease skos:exactMatch ?hpdisease .
  ?hpdisease sio:SIO_000341 ?phenotype .
  FILTER (?disease = <http://linkedlifedata.com/resource/umls/id/C3280766>)
}
GROUP BY ?disease ?hpdisease
81. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Disease-Phenotype Association Graph (curated from HPO)
• How many phenotypes are associated with Orphanet:209?
SELECT DISTINCT ?disease ?hpdisease (COUNT(DISTINCT ?phenotype) AS ?phenotypes)
WHERE {
  ?disease rdf:type ncit:C7057 .
  ?disease skos:exactMatch ?hpdisease .
  ?hpdisease sio:SIO_000341 ?phenotype .
  FILTER (?hpdisease = <http://identifiers.org/orphanet/209>)
}
GROUP BY ?disease ?hpdisease
83. Querying DisGeNET
• SPARQL Queries over DisGeNET data
• Disease-Phenotype Association Graph (curated from HPO)
• How many diseases are associated with a phenotype?
SELECT DISTINCT ?phenotype ?phenotypeName (COUNT(DISTINCT ?disease) AS ?diseases)
WHERE {
  ?hpdisease sio:SIO_000341 ?phenotype .
  ?phenotype dcterms:title ?phenotypeName .
  ?disease skos:exactMatch ?hpdisease .
  ?disease rdf:type ncit:C7057 ;
           dcterms:title ?diseaseName .
}
GROUP BY ?phenotype ?phenotypeName
ORDER BY DESC(?diseases)
LIMIT 100
• Go to the Web and understand/execute Q1.10 and Q1.12
84. Querying DisGeNET + LOD cloud
• Federated Queries: DisGeNET + external datasets
• Go to the Web and understand/execute the Federated Queries
85. Use Cases
• What genes are associated with Marfan syndrome?
• What evidence supports the association between the APP gene and Alzheimer Disease?
• What disease classes are associated with the APP gene?
• Which genes and evidence support the comorbidity between Chronic Kidney disease and Diabetes Mellitus, Type 2?
• What SNPs are related to the MECP2 and Rett Syndrome association?
• Which diseases are associated with the post-translational modification type of association?
• What disease genes are hit by compounds in ChEMBL?
• What disease genes have differential expression in Gene Expression Atlas?
• What disease genes are in WikiPathways?
• Find compounds (from ChEMBL) that target genes (from DisGeNET) that participate in the same pathway (from WikiPathways)