Linked Cancer Genome Atlas Database
Muhammad Saleem, Shanmukha Sampath Padmanabhuni, Axel-Cyrille Ngonga Ngomo, Jonas S. Almeida, Stefan Decker, Helena F. Deus
Linked Data Cup, I-Semantics 2013, September 04 - 06 2013, Graz, Austria
Agenda
• Cancer Genome Atlas (TCGA) introduction
• Problem statement
• Linked TCGA: a scalable solution
• Cancer treatment using Linked TCGA
• Demo of the use cases
• Conclusion
TCGA Introduction
• A publicly accessible atlas of cancer-related data
from the National Cancer Institute (NCI)
– 9000 patients
– 33 cancer types
– 147,645 raw data files
– total of 12.7 terabytes of data
• This is only 46% of the total expected data, with new
data being submitted every day
• The goal is to enable cancer researchers to make and
validate important discoveries
Problem Statement
• Data in the TCGA is organized as text archives
with no remote querying interface
– Download very large archives and wait in queues
– Parse the relevant text
– Collect the critical covariates necessary for analysis
• Various types of experimental results are not
connected biologically
• TCGA data should be made publicly available for
remote querying and virtual integration
Linked TCGA, a Scalable Solution: RDFization
Text to RDF Conversion: Data Refiner
Raw
composite element REF   gene_symbol   chromosome   position    beta_value
cg00000292              ATP2A1        16           28890100    0.439271303584937
cg00002426              SLMAP         3            57743543    0.245147665381461
cg00003994              MEOX2         7            15725862    0.0440161061196347
cg00005847              HOXD3         2            177029073   0.741342927038953
cg00006414              ZNF425        7            148822837   NA
cg00007981              PANX1         11           93862594    0.0290713821114479
cg00008493              COX8C         14           93813777    0.985555436681019
cg00008713              IMPA2         18           11980953    0.0109832005732912
cg00009407              TTC8          14           89290921    0.0104525957219692
Refined
chromosome   position    beta_value
16           28890100    0.439271303584937
3            57743543    0.245147665381461
7            15725862    0.0440161061196347
2            177029073   0.741342927038953
11           93862594    0.0290713821114479
14           93813777    0.985555436681019
18           11980953    0.0109832005732912
14           89290921    0.0104525957219692
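The difference between the Raw and Refined tables suggests what the Data Refiner does: it drops the probe identifier and gene symbol columns and skips rows whose beta_value is NA. The following is only a sketch of that step, not the authors' implementation. The tab-separated input, the `refine` function, and the `KEEP` tuple are assumptions made for illustration.

```python
import csv
import io

# Hypothetical raw excerpt in the layout shown above.
# Tab-separated input is an assumption, not something the slides state.
RAW = (
    "composite element REF\tgene_symbol\tchromosome\tposition\tbeta_value\n"
    "cg00000292\tATP2A1\t16\t28890100\t0.439271303584937\n"
    "cg00006414\tZNF425\t7\t148822837\tNA\n"
    "cg00007981\tPANX1\t11\t93862594\t0.0290713821114479\n"
)

# Columns that survive in the Refined table.
KEEP = ("chromosome", "position", "beta_value")


def refine(raw_text):
    """Keep only the KEEP columns and skip rows without a beta_value."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="\t")
    refined = []
    for row in reader:
        if row["beta_value"] == "NA":
            continue  # the NA row (ZNF425) does not appear in the Refined table
        refined.append(tuple(row[name] for name in KEEP))
    return refined


rows = refine(RAW)
for row in rows:
    print(" ".join(row))
```

Values are kept as strings so that the long beta_value decimals pass through unchanged.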
Text to RDF Conversion: RDFizer
RDFized
@prefix b:<http://tcga.deri.ie/>.
@prefix d:<http://tcga.deri.ie/schema/bcr_patient_barcode>.
@prefix r:<http://tcga.deri.ie/schema/result>.
@prefix c:<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.
@prefix w:<http://tcga.deri.ie/schema/dna_methylation_result>.
@prefix m:<http://tcga.deri.ie/schema/chromosome>.
@prefix v:<http://tcga.deri.ie/schema/position>.
@prefix u:<http://tcga.deri.ie/schema/beta_value>.
b:TCGA-A2-A0CX d: "TCGA-A2-A0CX".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d1 .
b:TCGA-A2-A0CX-d1 c: w: ; m: "16"; v: "28890100"; u: "0.439271303584937".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d2 .
b:TCGA-A2-A0CX-d2 c: w: ; m: "3"; v: "57743543"; u: "0.245147665381461".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d3 .
b:TCGA-A2-A0CX-d3 c: w: ; m: "7"; v: "15725862"; u: "0.0440161061196347".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d4 .
b:TCGA-A2-A0CX-d4 c: w: ; m: "2"; v: "177029073"; u: "0.741342927038953".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d5 .
b:TCGA-A2-A0CX-d5 c: w: ; m: "11"; v: "93862594"; u: "0.0290713821114479".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d6 .
b:TCGA-A2-A0CX-d6 c: w: ; m: "14"; v: "93813777"; u: "0.985555436681019".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d7 .
b:TCGA-A2-A0CX-d7 c: w: ; m: "18"; v: "11980953"; u: "0.0109832005732912".
b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d8 .
b:TCGA-A2-A0CX-d8 c: w: ; m: "14"; v: "89290921"; u: "0.0104525957219692".
Pipeline: Raw -> Data Refiner -> Refined -> RDFizer -> RDFized
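The RDFized sample follows a regular pattern: one barcode triple per patient, then, for each refined row, a link from the patient to a numbered result node that carries a type, chromosome, position, and beta value. The sketch below reproduces that pattern for illustration only. The `rdfize` and `count_triples` functions are hypothetical, and the actual RDFizer may work differently.

```python
# Prefix block copied from the RDFized sample; each prefix expands to a full IRI,
# so a bare "d:" or "m:" stands for a complete property.
PREFIXES = """\
@prefix b:<http://tcga.deri.ie/>.
@prefix d:<http://tcga.deri.ie/schema/bcr_patient_barcode>.
@prefix r:<http://tcga.deri.ie/schema/result>.
@prefix c:<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>.
@prefix w:<http://tcga.deri.ie/schema/dna_methylation_result>.
@prefix m:<http://tcga.deri.ie/schema/chromosome>.
@prefix v:<http://tcga.deri.ie/schema/position>.
@prefix u:<http://tcga.deri.ie/schema/beta_value>.
"""


def rdfize(barcode, rows):
    """Return Turtle lines for one patient and its refined methylation rows."""
    lines = [f'b:{barcode} d: "{barcode}".']
    for index, (chromosome, position, beta) in enumerate(rows, start=1):
        result = f"b:{barcode}-d{index}"
        lines.append(f"b:{barcode} r: {result} .")
        lines.append(
            f'{result} c: w: ; m: "{chromosome}"; v: "{position}"; u: "{beta}".'
        )
    return lines


def count_triples(row_count):
    """One barcode triple, plus five triples per row (link, type, three values)."""
    return 1 + 5 * row_count


refined_rows = [
    ("16", "28890100", "0.439271303584937"),
    ("3", "57743543", "0.245147665381461"),
]
lines = rdfize("TCGA-A2-A0CX", refined_rows)
print(PREFIXES + "\n".join(lines))
```

Under this pattern each refined row contributes five triples, which helps explain why the triple counts grow so quickly with the number of result rows.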
Linked TCGA Data Workflow
Linked TCGA Tumors Statistics
Tumor Type                           Original     Refined      RDFized      Triples
                                     Size (GB)    Size (GB)    Size (GB)    (Million)
Cervical (CESC)                      8.75         2.44         8.86         400.19
Rectal adenocarcinoma (READ)         8.07         2.25         9.04         413.31
Papillary Kidney (KIRP)              10.40        2.90         10.4         469.65
Bladder cancer (BLCA)                12.16        3.39         12.3         556.38
Acute Myeloid Leukemia (LAML)        14.85        4.14         15.1         684.05
Lower Grade Glioma (LGG)             17.08        4.76         17.1         778.82
Prostate adenocarcinoma (PRAD)       18.05        5.03         18.1         821.01
Lung squamous carcinoma (LUSC)       20.63        5.75         20.5         927.08
Cutaneous melanoma (SKCM)            23.22        6.47         23.2         1050.94
Head and neck squamous cell (HNSC)   27.6         7.69         27.5         1245.37
• A total of 7.36 billion triples for 10 small tumor datasets
• Linked TCGA in total: more than 30 billion triples (largest dataset of LOD)
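The per-tumor total can be cross-checked against the Triples column. The sketch below simply sums the ten values as they appear in the statistics table. The dictionary name is an illustrative choice, and the figures are the rounded values from the slide, so the sum may differ slightly from the quoted total.

```python
# Triples (in millions) per tumor type, as listed in the statistics table.
triples_million = {
    "CESC": 400.19,
    "READ": 413.31,
    "KIRP": 469.65,
    "BLCA": 556.38,
    "LAML": 684.05,
    "LGG": 778.82,
    "PRAD": 821.01,
    "LUSC": 927.08,
    "SKCM": 1050.94,
    "HNSC": 1245.37,
}

total_million = sum(triples_million.values())
print(f"{len(triples_million)} tumor types, {total_million / 1000:.2f} billion triples")
```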
Linking to Linked Open Data
Source            Target       Class        #Links
DNA27             HGNC         Gene         23181
DNA27             Homologene   Gene         27654
DNA27             HGNC         Gene         15171
DNA450            Homologene   Gene         489643
DNA450            OMIM         Gene         212284
DNA27             HGNC         Chromosome   108662
DNA27             OMIM         Chromosome   16039535
Methylation       HGNC         Chromosome   97530
Methylation       OMIM         Chromosome   14407269
Gene Expression   HGNC         Chromosome   86052
Gene Expression   OMIM         Chromosome   12535829
• Links are generated using LIMES
http://aksw.org/Projects/LIMES.html
Cancer Treatment using Linked TCGA
Linked TCGA Use Cases
1. Targeted cancer treatment
– Decide whether a specific drug can be used to treat a tumor,
using the genomic data of patients with the same tumor type
2. Mechanism-based treatment
– Decide whether a combination of drugs can be applied to treat
a specific tumor, using data from similar patients
3. Survival outcome
– Use a mathematical model to predict outcomes, such as
survival, for a new patient
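Use Cases 1 and 2: SPARQL query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# tcga: namespace as used by the properties in the RDFized sample
PREFIX tcga: <http://tcga.deri.ie/schema/>
SELECT ?patient ?mean
WHERE
{
?uri tcga:tumour_type "BRCA".
?uri tcga:bcr_patient_barcode ?patient.
?patient rdf:type tcga:expression_gene_results.
?patient tcga:gene_symbol ?gene.
?patient tcga:scaled_estimate ?mean.
# match either gene; an object list "HER2","ER" would require both values on one result
FILTER (?gene IN ("HER2", "ER"))
}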
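A query like the one above is typically sent to a SPARQL endpoint over HTTP. As a minimal sketch, the standard SPARQL protocol accepts a GET request with the query text in a `query` parameter. The endpoint URL below is a placeholder rather than the real Linked TCGA endpoint, and `build_request` is a hypothetical helper. Nothing is sent over the network here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Placeholder endpoint; replace with the SPARQL endpoint actually in use.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tcga: <http://tcga.deri.ie/schema/>
SELECT ?patient ?mean
WHERE {
  ?uri tcga:tumour_type "BRCA" .
  ?uri tcga:bcr_patient_barcode ?patient .
  ?patient rdf:type tcga:expression_gene_results .
  ?patient tcga:gene_symbol ?gene .
  ?patient tcga:scaled_estimate ?mean .
  FILTER (?gene IN ("HER2", "ER"))
}
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
# Round-trip check: the query survives URL encoding unchanged.
sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(request.get_method(), sent_query == QUERY)
```

Passing the request to `urllib.request.urlopen` would execute it against a live endpoint. The response would then contain the bindings for `?patient` and `?mean` in the SPARQL JSON results format.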
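Use Cases 1 and 2: Querying LOD DrugBank
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tcga: <http://tcga.deri.ie/schema/>
# drugbank: namespace is an assumption; adjust it to the DrugBank dataset in use
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
SELECT ?drugname
WHERE
{
?patient rdf:type tcga:expression_gene_results.
?patient tcga:gene_symbol ?targetname .
?patient tcga:scaled_estimate ?mean.
# Threshold is a placeholder; replace it with a numeric expression cutoff
FILTER (?mean > Threshold)
?drug drugbank:target ?target.
?drug drugbank:genericName ?drugname .
?target drugbank:synonym ?targetname .
# single | separates alternatives; || would add an empty alternative that matches everything
FILTER REGEX (?targetname, "HER2|estrogenreceptor|ERBB2", "i")
}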
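The REGEX filter in this query depends on alternation with a single `|`. A doubled `||` introduces an empty alternative, which matches any string, so the filter would no longer restrict the target names. The sketch below demonstrates the effect with Python's `re` module as a stand-in for SPARQL's REGEX function. The two regex dialects differ in details, and the candidate names are made up for illustration.

```python
import re

# Pattern with doubled bars: the empty alternative between them matches anything.
DOUBLED = "HER2||estrogenreceptor||ERBB2"
# Pattern with single bars: only the three listed alternatives match.
SINGLE = "HER2|estrogenreceptor|ERBB2"

# Hypothetical target names, not taken from DrugBank.
names = ["ERBB2", "Her2 receptor", "Albumin"]


def matching(pattern, candidates):
    """Return the candidates that contain a case-insensitive match for pattern."""
    return [name for name in candidates if re.search(pattern, name, re.IGNORECASE)]


print(matching(DOUBLED, names))
print(matching(SINGLE, names))
```

With the doubled bars every candidate passes, including the unrelated one. With single bars only the names containing one of the listed alternatives remain.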
Use Case 3: Query
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# tcga: namespace as used by the properties in the RDFized sample
PREFIX tcga: <http://tcga.deri.ie/schema/>
SELECT ?patient ?mean
WHERE
{
?uri tcga:tumour_type "BRCA".
?uri tcga:bcr_patient_barcode ?patient.
?patient rdf:type tcga:clinical.
?patient tcga:tumour_stage ?tumour_stage.
?patient tcga:age_at_initial_patalogical_diagnosis ?age.
?patient tcga:relevant_biomarker "BRCA1","CDKN2A", "CDH1".
?patient tcga:beta_value ?mean
}
Demo1
Demo2
Everything is Public
• TopFed: https://code.google.com/p/topfed/
• Linked TCGA : http://tcga.deri.ie/
saleem@informatik.uni-leipzig.de
AKSW, University of Leipzig, Germany
Thanks
Muhammad Saleem
saleem.muhammd@gmail.com

Mais conteúdo relacionado

Destaque

Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationNils Gehlenborg
 
Clinical research training - Dr Blanaid Mee - Dec 7th 2016
Clinical research training - Dr Blanaid Mee - Dec 7th 2016Clinical research training - Dr Blanaid Mee - Dec 7th 2016
Clinical research training - Dr Blanaid Mee - Dec 7th 2016ipposi
 
City of hope research informatics common data elements
City of hope research informatics common data elementsCity of hope research informatics common data elements
City of hope research informatics common data elementsAbdul-Malik Shakir
 
Patient profiling disaggregating the data
Patient profiling disaggregating the dataPatient profiling disaggregating the data
Patient profiling disaggregating the datanhsnwHELP
 
Patient-Generated Data for Cancer Treatment and Management
Patient-Generated Data for Cancer Treatment and ManagementPatient-Generated Data for Cancer Treatment and Management
Patient-Generated Data for Cancer Treatment and ManagementTommy Snitz
 
FluxGraph: a time-machine for your graphs
FluxGraph: a time-machine for your graphsFluxGraph: a time-machine for your graphs
FluxGraph: a time-machine for your graphsdatablend
 
iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...
iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...
iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...Health IT Conference – iHT2
 
Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...
Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...
Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...Emad Shash
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationMuhammad Saleem
 
Elective Care Conference: the role of the MDT coordinator role
Elective Care Conference: the role of the MDT coordinator role Elective Care Conference: the role of the MDT coordinator role
Elective Care Conference: the role of the MDT coordinator role NHS Improvement
 
2015 Micromedex使用者大會 如何在臨床工作中找到實證解答
2015 Micromedex使用者大會 如何在臨床工作中找到實證解答2015 Micromedex使用者大會 如何在臨床工作中找到實證解答
2015 Micromedex使用者大會 如何在臨床工作中找到實證解答建豪 陳
 
National Cancer Data Ecosystem and Data Sharing
National Cancer Data Ecosystem and Data SharingNational Cancer Data Ecosystem and Data Sharing
National Cancer Data Ecosystem and Data SharingWarren Kibbe
 
Swedish National Board of Health and Welfare Mona Heurgren
Swedish National Board of Health and Welfare Mona Heurgren Swedish National Board of Health and Welfare Mona Heurgren
Swedish National Board of Health and Welfare Mona Heurgren HIQAHI
 
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?Health Catalyst
 
Clinical Data Management
Clinical Data ManagementClinical Data Management
Clinical Data Managementbiinoida
 
NCRI Kerri Clough Gorr
NCRI Kerri Clough GorrNCRI Kerri Clough Gorr
NCRI Kerri Clough GorrHIQAHI
 
Human Resource planning
Human Resource planningHuman Resource planning
Human Resource planningAnything Group
 

Destaque (19)

Malmo 11.11.2008
Malmo 11.11.2008Malmo 11.11.2008
Malmo 11.11.2008
 
Visual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient StratificationVisual Exploration of Clinical and Genomic Data for Patient Stratification
Visual Exploration of Clinical and Genomic Data for Patient Stratification
 
how to sell
how to sellhow to sell
how to sell
 
Clinical research training - Dr Blanaid Mee - Dec 7th 2016
Clinical research training - Dr Blanaid Mee - Dec 7th 2016Clinical research training - Dr Blanaid Mee - Dec 7th 2016
Clinical research training - Dr Blanaid Mee - Dec 7th 2016
 
City of hope research informatics common data elements
City of hope research informatics common data elementsCity of hope research informatics common data elements
City of hope research informatics common data elements
 
Patient profiling disaggregating the data
Patient profiling disaggregating the dataPatient profiling disaggregating the data
Patient profiling disaggregating the data
 
Patient-Generated Data for Cancer Treatment and Management
Patient-Generated Data for Cancer Treatment and ManagementPatient-Generated Data for Cancer Treatment and Management
Patient-Generated Data for Cancer Treatment and Management
 
FluxGraph: a time-machine for your graphs
FluxGraph: a time-machine for your graphsFluxGraph: a time-machine for your graphs
FluxGraph: a time-machine for your graphs
 
iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...
iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...
iHT² Health IT Summit New York - Cancer Care Ontario Presentation "Transformi...
 
Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...
Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...
Impact of Multidisciplinary Discussion on Treatment Outcome For Gynecologic C...
 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
 
Elective Care Conference: the role of the MDT coordinator role
Elective Care Conference: the role of the MDT coordinator role Elective Care Conference: the role of the MDT coordinator role
Elective Care Conference: the role of the MDT coordinator role
 
2015 Micromedex使用者大會 如何在臨床工作中找到實證解答
2015 Micromedex使用者大會 如何在臨床工作中找到實證解答2015 Micromedex使用者大會 如何在臨床工作中找到實證解答
2015 Micromedex使用者大會 如何在臨床工作中找到實證解答
 
National Cancer Data Ecosystem and Data Sharing
National Cancer Data Ecosystem and Data SharingNational Cancer Data Ecosystem and Data Sharing
National Cancer Data Ecosystem and Data Sharing
 
Swedish National Board of Health and Welfare Mona Heurgren
Swedish National Board of Health and Welfare Mona Heurgren Swedish National Board of Health and Welfare Mona Heurgren
Swedish National Board of Health and Welfare Mona Heurgren
 
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
Clinical Data Repository vs. A Data Warehouse - Which Do You Need?
 
Clinical Data Management
Clinical Data ManagementClinical Data Management
Clinical Data Management
 
NCRI Kerri Clough Gorr
NCRI Kerri Clough GorrNCRI Kerri Clough Gorr
NCRI Kerri Clough Gorr
 
Human Resource planning
Human Resource planningHuman Resource planning
Human Resource planning
 

Semelhante a Linked Cancer Genome Atlas Database

Medicilon KRAS-targeted Drugs R&D Service.pdf
Medicilon KRAS-targeted Drugs R&D Service.pdfMedicilon KRAS-targeted Drugs R&D Service.pdf
Medicilon KRAS-targeted Drugs R&D Service.pdfmedicilonz
 
Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121Jane Landolin
 
A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...
A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...
A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...CancerImagingInforma
 
Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...
Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...
Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...Weihua Liu
 
Meaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchMeaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchNolan Nichols
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowHorizonDiscovery
 
TCIA Data Harmonization Project
TCIA Data Harmonization ProjectTCIA Data Harmonization Project
TCIA Data Harmonization Projectimgcommcall
 
A method to improve survival prediction using mutual information based network
A method to improve survival prediction using mutual information based networkA method to improve survival prediction using mutual information based network
A method to improve survival prediction using mutual information based networkSOYEON KIM
 
Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...
Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...
Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...Kate Barlow
 
Presentation july 31_2015
Presentation july 31_2015Presentation july 31_2015
Presentation july 31_2015gkoytiger
 
Presentatie maastricht
Presentatie maastrichtPresentatie maastricht
Presentatie maastrichtriannefijten
 
Hupo2017 wessels mb2021 Glycopeptide profiling
Hupo2017 wessels mb2021 Glycopeptide profilingHupo2017 wessels mb2021 Glycopeptide profiling
Hupo2017 wessels mb2021 Glycopeptide profilingHans Wessels
 
Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...
Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...
Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...Thermo Fisher Scientific
 

Semelhante a Linked Cancer Genome Atlas Database (20)

Medicilon KRAS-targeted Drugs R&D Service.pdf
Medicilon KRAS-targeted Drugs R&D Service.pdfMedicilon KRAS-targeted Drugs R&D Service.pdf
Medicilon KRAS-targeted Drugs R&D Service.pdf
 
Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121Open pacbiomodelorgpaper j_landolin_20150121
Open pacbiomodelorgpaper j_landolin_20150121
 
A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...
A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...
A practical guide to using The Cancer Imaging Archive for QIN Challenges and ...
 
Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...
Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...
Mutation Profiling of CRC ctDNA using AmpliSeq CHP2 Cancer Panel AACR_NCI_EOR...
 
Meaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine researchMeaningful (meta)data at scale: removing barriers to precision medicine research
Meaningful (meta)data at scale: removing barriers to precision medicine research
 
CRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and HowCRISPR Screening: the What, Why and How
CRISPR Screening: the What, Why and How
 
TCIA Data Harmonization Project
TCIA Data Harmonization ProjectTCIA Data Harmonization Project
TCIA Data Harmonization Project
 
RNA (gene expression) analysis of Prostate cancers and non-cancerous tissues t
RNA (gene expression) analysis of Prostate cancers and non-cancerous tissues tRNA (gene expression) analysis of Prostate cancers and non-cancerous tissues t
RNA (gene expression) analysis of Prostate cancers and non-cancerous tissues t
 
Next Generation Sequencing (NGS) Approach to Investigate Role of Small RNAs i...
Next Generation Sequencing (NGS) Approach to Investigate Role of Small RNAs i...Next Generation Sequencing (NGS) Approach to Investigate Role of Small RNAs i...
Next Generation Sequencing (NGS) Approach to Investigate Role of Small RNAs i...
 
Next Generation Sequencing (NGS) Approach to Investigate ​ Role of Small RNA...
Next Generation Sequencing (NGS) Approach to Investigate ​  Role of Small RNA...Next Generation Sequencing (NGS) Approach to Investigate ​  Role of Small RNA...
Next Generation Sequencing (NGS) Approach to Investigate ​ Role of Small RNA...
 
A method to improve survival prediction using mutual information based network
A method to improve survival prediction using mutual information based networkA method to improve survival prediction using mutual information based network
A method to improve survival prediction using mutual information based network
 
Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...
Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...
Clinical Utility of Droplet Digital PCR on Liquid Biopsies from Patients with...
 
Mobile CRISPRi
Mobile CRISPRiMobile CRISPRi
Mobile CRISPRi
 
Presentation july 31_2015
Presentation july 31_2015Presentation july 31_2015
Presentation july 31_2015
 
Illumina sequencing introduction
Illumina sequencing introductionIllumina sequencing introduction
Illumina sequencing introduction
 
2023 GIAB AMP Update
2023 GIAB AMP Update2023 GIAB AMP Update
2023 GIAB AMP Update
 
Presentatie maastricht
Presentatie maastrichtPresentatie maastricht
Presentatie maastricht
 
undergrad thesis
undergrad thesisundergrad thesis
undergrad thesis
 
Hupo2017 wessels mb2021 Glycopeptide profiling
Hupo2017 wessels mb2021 Glycopeptide profilingHupo2017 wessels mb2021 Glycopeptide profiling
Hupo2017 wessels mb2021 Glycopeptide profiling
 
Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...
Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...
Development of a Multi-Variant Frequency Ladder™ for Next Generation Sequenci...
 

Mais de Muhammad Saleem

QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...
QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...
QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...Muhammad Saleem
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...Muhammad Saleem
 
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationCostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationMuhammad Saleem
 
SQCFramework: SPARQL Query containment Benchmark Generation Framework
SQCFramework: SPARQL Query containment  Benchmark Generation Framework SQCFramework: SPARQL Query containment  Benchmark Generation Framework
SQCFramework: SPARQL Query containment Benchmark Generation Framework Muhammad Saleem
 
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...Muhammad Saleem
 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedMuhammad Saleem
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsMuhammad Saleem
 
SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016Muhammad Saleem
 
FEASIBLE-Benchmark-Framework-ISWC2015
FEASIBLE-Benchmark-Framework-ISWC2015FEASIBLE-Benchmark-Framework-ISWC2015
FEASIBLE-Benchmark-Framework-ISWC2015Muhammad Saleem
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialMuhammad Saleem
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataMuhammad Saleem
 
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint FederationHiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint FederationMuhammad Saleem
 
Fostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataFostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataMuhammad Saleem
 

Mais de Muhammad Saleem (15)

QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...
QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...
QaldGen: Towards Microbenchmarking of Question Answering Systems Over Knowled...
 
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
How Representative Is a SPARQL Benchmark? An Analysis of RDF Triplestore Benc...
 
LargeRDFBench
LargeRDFBenchLargeRDFBench
LargeRDFBench
 
Extended LargeRDFBench
Extended LargeRDFBenchExtended LargeRDFBench
Extended LargeRDFBench
 
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint FederationCostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
CostFed: Cost-Based Query Optimization for SPARQL Endpoint Federation
 
SQCFramework: SPARQL Query containment Benchmark Generation Framework
SQCFramework: SPARQL Query containment  Benchmark Generation Framework SQCFramework: SPARQL Query containment  Benchmark Generation Framework
SQCFramework: SPARQL Query containment Benchmark Generation Framework
 
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
Question Answering Over Linked Data: What is Difficult to Answer? What Affect...
 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
 
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation SystemsFine-grained Evaluation of SPARQL Endpoint Federation Systems
Fine-grained Evaluation of SPARQL Endpoint Federation Systems
 
SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016SPARQL Querying Benchmarks ISWC2016
SPARQL Querying Benchmarks ISWC2016
 
FEASIBLE-Benchmark-Framework-ISWC2015
FEASIBLE-Benchmark-Framework-ISWC2015FEASIBLE-Benchmark-Framework-ISWC2015
FEASIBLE-Benchmark-Framework-ISWC2015
 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
 
Federated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of DataFederated SPARQL query processing over the Web of Data
Federated SPARQL query processing over the Web of Data
 
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint FederationHiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
HiBISCuS: Hypergraph-Based Source Selection for SPARQL Endpoint Federation
 
Fostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked DataFostering Serendipity through Big Linked Data
Fostering Serendipity through Big Linked Data
 

Linked Cancer Genome Atlas Database

  • 1. Linked Cancer Genome Atlas Database Muhammad Saleem, Shanmukha Sampath Padmanabhuni, Axel-Cyrille Ngonga Ngomo, Jonas S. Almeida, Stefan Decker, Helena F. Deus. Linked Data Cup, I-Semantics 2013, September 04 - 06 2013, Graz, Austria
  • 2. Agenda
      • Cancer Genome Atlas (TCGA) introduction
      • Problem statement
      • Linked TCGA: a scalable solution
      • Cancer treatment using Linked TCGA
      • Demo of the use cases
      • Conclusion
  • 3. TCGA Introduction
      • A publicly accessible atlas of cancer-related data from the National Cancer Institute (NCI)
          – 9000 patients
          – 33 cancer types
          – 147,645 raw data files
          – 12.7 terabytes of data in total
      • This is only 46% of the total expected data; new data is submitted every day
      • The goal is to enable cancer researchers to make and validate important discoveries
  • 4. Problem Statement
      • Data in the TCGA is organized as text archives with no remote querying interface, so researchers must:
          – Download very large archives and wait in queues
          – Parse the relevant text
          – Collect the critical covariates necessary for analysis
      • The various types of experimental results are not connected biologically
      • TCGA data should be made publicly available for remote querying and virtual integration
  • 5. Linked TCGA, a Scalable Solution: RDFization
  • 6. Text to RDF Conversion (Raw → Data Refiner → Refined)
      Raw:
      composite element REF  gene_symbol  chromosome  position   beta_value
      cg00000292             ATP2A1       16          28890100   0.439271303584937
      cg00002426             SLMAP        3           57743543   0.245147665381461
      cg00003994             MEOX2        7           15725862   0.0440161061196347
      cg00005847             HOXD3        2           177029073  0.741342927038953
      cg00006414             ZNF425       7           148822837  NA
      cg00007981             PANX1        11          93862594   0.0290713821114479
      cg00008493             COX8C        14          93813777   0.985555436681019
      cg00008713             IMPA2        18          11980953   0.0109832005732912
      cg00009407             TTC8         14          89290921   0.0104525957219692
      Refined:
      chromosome  position   beta_value
      16          28890100   0.439271303584937
      3           57743543   0.245147665381461
      7           15725862   0.0440161061196347
      2           177029073  0.741342927038953
      11          93862594   0.0290713821114479
      14          93813777   0.985555436681019
      18          11980953   0.0109832005732912
      14          89290921   0.0104525957219692
  • 9. Text to RDF Conversion (Raw → Data Refiner → Refined → RDFizer → RDFized)
      Raw and Refined: same tables as on slide 6.
      RDFized:
      @prefix b: <http://tcga.deri.ie/> .
      @prefix d: <http://tcga.deri.ie/schema/bcr_patient_barcode> .
      @prefix r: <http://tcga.deri.ie/schema/result> .
      @prefix c: <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> .
      @prefix w: <http://tcga.deri.ie/schema/dna_methylation_result> .
      @prefix m: <http://tcga.deri.ie/schema/chromosome> .
      @prefix v: <http://tcga.deri.ie/schema/position> .
      @prefix u: <http://tcga.deri.ie/schema/beta_value> .
      b:TCGA-A2-A0CX d: "TCGA-A2-A0CX" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d1 .
      b:TCGA-A2-A0CX-d1 c: w: ; m: "16" ; v: "28890100" ; u: "0.439271303584937" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d2 .
      b:TCGA-A2-A0CX-d2 c: w: ; m: "3" ; v: "57743543" ; u: "0.245147665381461" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d3 .
      b:TCGA-A2-A0CX-d3 c: w: ; m: "7" ; v: "15725862" ; u: "0.0440161061196347" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d4 .
      b:TCGA-A2-A0CX-d4 c: w: ; m: "2" ; v: "177029073" ; u: "0.741342927038953" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d5 .
      b:TCGA-A2-A0CX-d5 c: w: ; m: "11" ; v: "93862594" ; u: "0.0290713821114479" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d6 .
      b:TCGA-A2-A0CX-d6 c: w: ; m: "14" ; v: "93813777" ; u: "0.985555436681019" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d7 .
      b:TCGA-A2-A0CX-d7 c: w: ; m: "18" ; v: "11980953" ; u: "0.0109832005732912" .
      b:TCGA-A2-A0CX r: b:TCGA-A2-A0CX-d8 .
      b:TCGA-A2-A0CX-d8 c: w: ; m: "14" ; v: "89290921" ; u: "0.0104525957219692" .
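The refine-then-RDFize step on slide 9 can be sketched roughly as below. This is a minimal illustration, not the authors' tool: the function names `refine` and `rdfize` are hypothetical, and the raw file is assumed to be tab-separated with a single header row. The prefixes and the triple layout follow the Turtle shown on the slide.

```python
import csv
import io

SCHEMA = "http://tcga.deri.ie/schema/"

# Prefix table copied from slide 9; each prefix stands for one full IRI.
PREFIXES = [
    ("b", "http://tcga.deri.ie/"),
    ("d", SCHEMA + "bcr_patient_barcode"),
    ("r", SCHEMA + "result"),
    ("c", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
    ("w", SCHEMA + "dna_methylation_result"),
    ("m", SCHEMA + "chromosome"),
    ("v", SCHEMA + "position"),
    ("u", SCHEMA + "beta_value"),
]


def refine(raw_text):
    """Keep chromosome, position and beta_value; drop rows whose beta_value is NA.

    Assumption: tab-separated input with one header row.
    """
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="\t")
    rows = []
    for row in reader:
        if row["beta_value"] == "NA":
            continue
        rows.append((row["chromosome"], row["position"], row["beta_value"]))
    return rows


def rdfize(barcode, rows):
    """Emit Turtle in the layout of slide 9 for one patient barcode."""
    lines = ["@prefix %s: <%s> ." % p for p in PREFIXES]
    patient = "b:" + barcode
    lines.append('%s d: "%s" .' % (patient, barcode))
    for i, (chrom, pos, beta) in enumerate(rows, start=1):
        result = "%s-d%d" % (patient, i)
        lines.append("%s r: %s ." % (patient, result))
        lines.append(
            '%s c: w: ; m: "%s" ; v: "%s" ; u: "%s" .' % (result, chrom, pos, beta)
        )
    return "\n".join(lines)


# Three rows taken from the raw table on slide 6.
RAW = (
    "composite element REF\tgene_symbol\tchromosome\tposition\tbeta_value\n"
    "cg00000292\tATP2A1\t16\t28890100\t0.439271303584937\n"
    "cg00006414\tZNF425\t7\t148822837\tNA\n"
    "cg00002426\tSLMAP\t3\t57743543\t0.245147665381461\n"
)

if __name__ == "__main__":
    print(rdfize("TCGA-A2-A0CX", refine(RAW)))
```

Dropping the identifier columns and the NA rows before serialization mirrors the difference between the raw and refined tables on the slides.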
  • 10. Linked TCGA Data Workflow
  • 11. Linked TCGA Tumors Statistics
      Tumor Type                          Original Size (GB)  Refined Size (GB)  RDFized Size (GB)  Triples (Million)
      Cervical (CESC)                     8.75                2.44               8.86               400.19
      Rectal adenocarcinoma (READ)        8.07                2.25               9.04               413.31
      Papillary Kidney (KIRP)             10.40               2.90               10.4               469.65
      Bladder cancer (BLCA)               12.16               3.39               12.3               556.38
      Acute Myeloid Leukemia (LAML)       14.85               4.14               15.1               684.05
      Lower Grade Glioma (LGG)            17.08               4.76               17.1               778.82
      Prostate adenocarcinoma (PRAD)      18.05               5.03               18.1               821.01
      Lung squamous carcinoma (LUSC)      20.63               5.75               20.5               927.08
      Cutaneous melanoma (SKCM)           23.22               6.47               23.2               1050.94
      Head and neck squamous cell (HNSC)  27.6                7.69               27.5               1245.37
      • A total of 7.36 billion triples for 10 small tumors
      • Total Linked TCGA > 30 billion triples (largest dataset of LOD)
  • 12. Linking to Linked Open Data
      Source           Target      Class       #Links
      DNA27            HGNC        Gene        23181
      DNA27            Homologene  Gene        27654
      DNA27            HGNC        Gene        15171
      DNA450           Homologene  Gene        489643
      DNA450           OMIM        Gene        212284
      DNA27            HGNC        Chromosome  108662
      DNA27            OMIM        Chromosome  16039535
      Methylation      HGNC        Chromosome  97530
      Methylation      OMIM        Chromosome  14407269
      Gene Expression  HGNC        Chromosome  86052
      Gene Expression  OMIM        Chromosome  12535829
      • Links are generated using LIMES (http://aksw.org/Projects/LIMES.html)
  • 13. Cancer Treatment using Linked TCGA
  • 14. Linked TCGA Use Cases
      1. Targeted cancer treatment
          – Determine whether a specific drug can be used to treat a tumor, using the genomic data of patients with the same tumor
      2. Mechanism-based treatment
          – Determine whether a combination of drugs can be applied to treat a specific tumor, using similar patients' data
      3. Survival outcome
          – Use a mathematical model to predict outcomes such as survival for a new patient
  • 15. Use Cases 1 and 2: SPARQL Query
      SELECT ?patient ?mean WHERE {
        ?uri tcga:tumour_type "BRCA" .
        ?uri tcga:bcr_patient_barcode ?patient .
        ?patient rdf:type tcga:expression_gene_results .
        ?patient tcga:gene_symbol "HER2", "ER" .
        ?patient tcga:scaled_estimate ?mean
      }
  • 16. Use Cases 1 and 2: Querying LOD DrugBank
      SELECT ?drugname WHERE {
        ?patient rdf:type tcga:expression_gene_results .
        ?patient tcga:gene_symbol ?targetname .
        ?patient tcga:scaled_estimate ?mean .
        FILTER (?mean > Threshold)
        ?drug drugbank:target ?target .
        ?drug drugbank:genericName ?drugname .
        ?target drugbank:synonym ?targetname .
        FILTER REGEX (?targetname, "HER2|estrogenreceptor|ERBB2", "i")
      }
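The DrugBank query on slide 16 leaves `Threshold` as a placeholder, so a client has to fill it in before sending the query. The sketch below is one way to do that, under stated assumptions: `build_drug_query` is a hypothetical helper, the `tcga:` namespace is inferred from the schema IRIs on slide 9, and the `drugbank:` namespace is assumed to be the one used by the LOD DrugBank dump. Target names are restricted to alphanumeric strings so that no regex escaping is needed and no alternative is empty (an empty alternative would match every target name).

```python
# Namespace IRIs are assumptions; only the prefixes themselves appear on the slides.
PREFIXES = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX tcga: <http://tcga.deri.ie/schema/>
PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
"""

QUERY_TEMPLATE = """\
SELECT ?drugname WHERE {
  ?patient rdf:type tcga:expression_gene_results .
  ?patient tcga:gene_symbol ?targetname .
  ?patient tcga:scaled_estimate ?mean .
  FILTER (?mean > %s)
  ?drug drugbank:target ?target .
  ?drug drugbank:genericName ?drugname .
  ?target drugbank:synonym ?targetname .
  FILTER REGEX(?targetname, "%s", "i")
}
"""


def build_drug_query(threshold, target_names):
    """Return the slide-16 query with a numeric threshold and a target pattern."""
    # str.isalnum() is False for the empty string, so empty names are rejected too.
    if not target_names or not all(name.isalnum() for name in target_names):
        raise ValueError("target names must be non-empty and alphanumeric")
    pattern = "|".join(target_names)
    return PREFIXES + QUERY_TEMPLATE % (float(threshold), pattern)


if __name__ == "__main__":
    print(build_drug_query(0.5, ["HER2", "estrogenreceptor", "ERBB2"]))
```

The threshold value 0.5 in the usage line is arbitrary; the slides do not state which cut-off was used.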
  • 17. Use Case 3: Query
      SELECT ?patient ?mean WHERE {
        ?uri tcga:tumour_type "BRCA" .
        ?uri tcga:bcr_patient_barcode ?patient .
        ?patient rdf:type tcga:clinical .
        ?patient tcga:tumour_stage ?tumour_stage .
        ?patient tcga:age_at_initial_patalogical_diagnosis ?age .
        ?patient tcga:relevant_biomarker "BRCA1", "CDKN2A", "CDH1" .
        ?patient tcga:beta_value ?mean
      }
  • 19. Everything is Public
      • TopFed: https://code.google.com/p/topfed/
      • Linked TCGA: http://tcga.deri.ie/
      saleem@informatik.uni-leipzig.de
      AKSW, University of Leipzig, Germany