Data Integration: what I haven’t yet achieved
Neil Saunders

MATHEMATICS, INFORMATICS AND STATISTICS
www.csiro.au
My main project

Ludwig colorectal cancer study

Data integration 2 of 21
Multiple “omics” platforms

• exon expression
• methylation
• copy number

We want to “integrate” these data

but what does that mean?

Integration can mean “portals”

Integration can mean “visualization”

Integration can mean “correlation”
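As a rough illustration of integration as correlation, the sketch below computes a per-gene Pearson correlation between two platforms across the same samples. The gene names, the values and the helper names `pearson` and `correlate_platforms` are hypothetical and are not taken from the study; the sketch also assumes both platforms list their samples in the same order.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return sxy / math.sqrt(sxx * syy)


def correlate_platforms(platform_a, platform_b):
    """Correlate each gene measured on both platforms across the samples."""
    shared_genes = sorted(set(platform_a) & set(platform_b))
    return {g: pearson(platform_a[g], platform_b[g]) for g in shared_genes}


# Hypothetical toy data: gene -> values for the same four samples, same order.
copy_number = {
    "GENE_A": [2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 2.0, 2.0, 2.0],
    "GENE_C": [1.0, 2.0, 3.0, 4.0],
}
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [5.0, 6.0, 5.5, 6.5],
    "GENE_C": [8.0, 6.0, 4.0, 2.0],
    "GENE_D": [3.0, 3.5, 2.5, 3.0],
}

correlations = correlate_platforms(copy_number, expression)
for gene, r in correlations.items():
    print(gene, r)
```

Genes present on only one platform (here `GENE_D`) are dropped, and a gene with no variation on one platform gets `nan` rather than a misleading zero.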

What do we think integration means?

A + B + C

More information when combined than when separate

What’s already “out there”? PubMed
PubMed Search: "data integration"
[Figure: articles / 100 000 (y-axis) by Year, 2002 to 2010 (x-axis)]

What’s already “out there”? CiteULike

http://www.citeulike.org/user/neils/tag/integration

Buzz-word compliant

Quote from integIRTy paper

“These methods can be roughly grouped into four categories: stepwise, regression-based, correlation-based and latent variable models”

integIRTy: a method to identify genes altered in cancer by accounting for multiple mechanisms of regulation using item response theory
Bioinformatics, Vol. 28, No. 22 (15 November 2012), pp. 2861-2869

Regression: SIM

Integrated analysis of DNA copy number and gene expression microarray data using gene sets
BMC Bioinformatics 2009, 10:203
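To make the idea of a regression-based approach concrete, here is a minimal sketch that fits an ordinary least-squares line of expression on copy number for a single gene. This is not the SIM method itself, only a simple stand-in for the general idea; the function name `fit_line` and the toy values are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variation, slope is undefined")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_resid / ss_total if ss_total > 0 else float("nan")
    return slope, intercept, r_squared


# Hypothetical toy data for one gene across four samples.
copy_number = [1.0, 2.0, 3.0, 4.0]
expression = [2.1, 3.9, 6.2, 7.8]

slope, intercept, r_squared = fit_line(copy_number, expression)
print(slope, intercept, r_squared)
```

A positive slope with a high R-squared would suggest that expression tracks copy number for that gene in these samples; applied gene by gene, the slopes and fits give one crude way of ranking candidates.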

Correlation: DR-Integrator

[Figure: correlation (scale 0 to 1) by chromosome (Chr, 1 to 22), labelled with sample identifiers]

Latent variable: iCluster

(file under impractical)

Basics that are never explained 1/2

Integration across groups or description of samples?

Basics that are never explained 2/2

Genes x Samples
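One basic step that a genes x samples layout implies is restricting every platform to the genes and samples they all share, in a common order. The sketch below shows one way to do that with plain dictionaries; the function name `align`, the platform names and all values are hypothetical, and real data would need identifier mapping first.

```python
def align(platforms):
    """Restrict each genes x samples table to the genes and samples
    shared by all platforms, in a common sorted order.

    platforms: {platform_name: {gene: {sample: value}}}
    Returns (genes, samples, {platform_name: matrix}), where each matrix
    is a list of rows (one per gene) with one value per sample.
    """
    if not platforms:
        raise ValueError("need at least one platform")
    tables = list(platforms.values())
    genes = sorted(set.intersection(*(set(t) for t in tables)))
    if genes:
        samples = sorted(
            set.intersection(*(set(t[g]) for t in tables for g in genes))
        )
    else:
        samples = []
    aligned = {
        name: [[table[g][s] for s in samples] for g in genes]
        for name, table in platforms.items()
    }
    return genes, samples, aligned


# Hypothetical toy data: three platforms with partly overlapping genes/samples.
platforms = {
    "expression": {
        "GENE_A": {"S1": 5.1, "S2": 4.8, "S3": 6.0},
        "GENE_B": {"S1": 2.2, "S2": 2.9, "S3": 3.1},
    },
    "methylation": {
        "GENE_A": {"S1": 0.2, "S2": 0.7},
        "GENE_C": {"S1": 0.9, "S2": 0.8},
    },
    "copy_number": {
        "GENE_A": {"S2": 2.0, "S1": 3.0, "S4": 2.0},
        "GENE_B": {"S2": 2.0, "S1": 2.0, "S4": 1.0},
    },
}

genes, samples, aligned = align(platforms)
print(genes, samples)
for name, matrix in aligned.items():
    print(name, matrix)
```

Taking the intersection is the simplest choice but it can discard a lot of data; keeping the union and marking missing values is the obvious alternative.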

Conclusions 1/3

We’re not the first people doing this...
...but it’s becoming a “hot topic”

Conclusions 2/3

Room for improvement in software, much of which is:

• Poorly-written
• Poorly-documented
• Difficult to implement

Conclusions 3/3

Too much for one individual!

CSIRO Mathematics, Informatics and Statistics
Neil Saunders
t +61 2 9325 3144
e Neil.Saunders@csiro.au
w Mathematics, Informatics and Statistics web

