Data Management for Predictive Tools
Paul Fearn, MBA
NLM Informatics Research Fellow, Biomedical and Health Informatics
University of Washington | Fred Hutchinson Cancer Research Center, Seattle, Washington
PROSTATE CANCER: PREDICTIVE MODELS FOR DECISION MAKING
April 7th – 9th, 2011 - MSKCC - New York, NY
Data Management Requirements
  - Need to assemble large datasets for predictive modeling
      - Pooling data across sites, systems and countries
      - Linking data across clinical, specimen and lab repositories
  - Quality assurance (for reproducibility of results)
      - Tradeoffs between accuracy and reproducibility of data points
      - Transparency of data processing
      - Complete and up-to-date datasets
  - Easy to access, sort, filter and export data
      - Statistical analysis in Stata, R, SPSS, SAS, Excel
      - SQL queries and reports
  - Sustainability
      - Secondary (N-ary) use of clinical and research data
      - Cumulative cost of data entry
      - Cumulative cost of staff training and turnover
      - Cumulative risks and opportunity costs of staff entrenchment
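The linking, SQL query and export requirements above can be illustrated with a small sketch. It is not taken from the slides or from any system named in them: the table names, column names and sample rows are all hypothetical, and an in-memory SQLite database stands in for whatever clinical, specimen and lab repositories a site actually runs. It joins the three sources on a shared patient identifier and writes the result as CSV, a format that Stata, R, SPSS, SAS and Excel can all import.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with three hypothetical repositories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients  (patient_id INTEGER PRIMARY KEY, site TEXT);
        CREATE TABLE specimens (specimen_id INTEGER PRIMARY KEY,
                                patient_id INTEGER, tissue TEXT);
        CREATE TABLE labs      (lab_id INTEGER PRIMARY KEY,
                                patient_id INTEGER, test TEXT, value REAL);
        -- Made-up rows, for illustration only.
        INSERT INTO patients  VALUES (1, 'Site A'), (2, 'Site B');
        INSERT INTO specimens VALUES (10, 1, 'prostate'), (11, 2, 'prostate');
        INSERT INTO labs      VALUES (100, 1, 'PSA', 4.2), (101, 2, 'PSA', 6.8);
        """
    )
    return conn


# Link clinical, specimen and lab records through the shared patient_id.
LINK_QUERY = """
    SELECT p.patient_id, p.site, s.tissue, l.test, l.value
    FROM patients AS p
    JOIN specimens AS s ON s.patient_id = p.patient_id
    JOIN labs      AS l ON l.patient_id = p.patient_id
    ORDER BY p.patient_id
"""


def export_linked_csv(conn):
    """Run the linking query and return the result as CSV text."""
    cur = conn.execute(LINK_QUERY)
    header = [col[0] for col in cur.description]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cur.fetchall())
    return buf.getvalue()


if __name__ == "__main__":
    print(export_linked_csv(build_demo_db()))
```

Keeping the linking step as one named query is a deliberate choice here: the query text documents how the export was produced, which supports the transparency and reproducibility points on the same slide.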
The Growth Problem
  Source: Lu Z. PubMed and Beyond. Database 2011;2011:baq036. 21245076[pmid]
The Growth Problem
  Source: http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html
The Growth Problem
  Source: http://www.ncbi.nlm.nih.gov/books/NBK44423/
The Breaking Point
  1000 cases
The Growth Problem
  Microsoft Access databases
    1999  ProstateDB 1.0
    2000  PRDB / Prostabase
  ColdFusion & SQL Server web-based database
    2002  Valhalla 1.0 – 1.1: Prostate
    2003  Valhalla 1.2 (7,994 patients): Billing/EMR compliant populated clinic forms
  ASP.NET & SQL Server web-based database
    2004  CAISIS 2.0 – 2.1 (26,470 patients): Integrated bladder, kidney, testis
    2005  CAISIS 3.0 – 3.1 (44,000 patients): Prostatectomy eForm, protocol manager, tumor maps
    2006  CAISIS 3.5 (55,000 patients): GU and Urology Prostate Follow-up eForms
    2007  CAISIS 4.0 (80,000 patients): Metadata, dynamic forms, new diseases and eForms
    2008  CAISIS 4.1 (98,000 patients): Email eForms, advanced find, specimen tracking
    2009  CAISIS 4.5 (120,000+ patients): Project tracking, patient education, virtual fields, reporting module
    2010  CAISIS 5.0x
The Curation Problem
  - Increasing volume of data
  - More data points for annotation
      - Clinical / patient
      - Genomic / biological
      - Public health / environment
  - Parallel curation issues in modern clinical and biological research databases (Krallinger 2008*)
  - Development of NLP system to support clinical research operations (Savova 2010**)
  *18834499[pmid], **20819853[pmid]
On the Other Hand…
  - Long tail of research efforts
  - Small heterogeneous labs and projects
  - Subsets of data
  - Specialized requirements
  - Innovative approaches
Spectrum of Approaches
  - One dataset per project (i.e. study based systems)
  - Registry databases (i.e. one treatment or disease)
  - Data warehouse or data repository
      - Common schema (data model)
  - “Amalgamation” of heterogeneous datasets
      - Common security and access
      - Common syntax (data format)
      - Defined links between records
      - Indexed for searching and retrieval
  - Federation / grid of semantically integrated data
      - Common vocabulary / terminology
      - Formal models (caBIG)
Loosely Linking Data
  http://www.ncbi.nlm.nih.gov/sites/gquery
Tightly Integrating Data
  - Vocabulary / Terminology
      - NCI Thesaurus (NCIt)
      - NLM UMLS
  - Standard data models
      - caBIG / caDSR
      - HL7/FDA/NCI
      - CDISC / BRIDG
  - Web services*
  - Common syntax / format
  *Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20 12000935[pmid]
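As a rough illustration of what a common vocabulary buys when pooling data, the sketch below maps site-specific treatment labels onto shared concept identifiers before records are combined. Everything in it is an assumption made for the example: the labels, the `CONCEPT:` identifiers (which are placeholders, not real NCIt or UMLS codes), and the function names. A real deployment would look terms up in a maintained terminology rather than a hand-written dictionary.

```python
# Hypothetical mapping from local labels to placeholder concept identifiers.
# These are NOT real NCI Thesaurus or UMLS codes.
SHARED_TERMS = {
    "radical prostatectomy": "CONCEPT:0001",
    "prostatectomy, radical": "CONCEPT:0001",
    "rp": "CONCEPT:0001",
    "external beam radiation": "CONCEPT:0002",
    "ebrt": "CONCEPT:0002",
}


def normalize(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def harmonize(records, field="treatment"):
    """Split records into those mapped to a shared concept and those not.

    Unmapped records are returned separately so a curator can review them
    instead of having them silently dropped or guessed at.
    """
    mapped, unmapped = [], []
    for rec in records:
        concept = SHARED_TERMS.get(normalize(rec[field]))
        if concept is None:
            unmapped.append(rec)
        else:
            mapped.append({**rec, "concept": concept})
    return mapped, unmapped


if __name__ == "__main__":
    pooled = [
        {"site": "A", "treatment": "RP"},
        {"site": "B", "treatment": " Radical  Prostatectomy"},
        {"site": "C", "treatment": "cryotherapy"},
    ]
    mapped, unmapped = harmonize(pooled)
    print(mapped)
    print(unmapped)
```

With the sample input, the records from sites A and B resolve to the same placeholder concept despite different local spellings, while the site C record is set aside for manual curation.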
The CAISIS System
Appendix: 394 people at 60 sites visited from Aug 2008 to Jun 2009
  Legend: Driving, Flying
Costly curation and support of research databases
Widespread and large scale implementation of EMRs

Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 

Último (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for predictive tools

  • 1. Data Management for Predictive Tools. Paul Fearn, MBA, NLM Informatics Research Fellow, Biomedical and Health Informatics, University of Washington | Fred Hutchinson Cancer Research Center, Seattle, Washington. PROSTATE CANCER: PREDICTIVE MODELS FOR DECISION MAKING, April 7th – 9th, 2011, MSKCC, New York, NY
  • 2. Data Management Requirements. Need to assemble large datasets for predictive modeling: pooling data across sites, systems and countries; linking data across clinical, specimen and lab repositories. Quality assurance (for reproducibility of results): tradeoffs between accuracy and reproducibility of data points; transparency of data processing; complete and up-to-date datasets. Ease to access, sort, filter and export data: statistical analysis in Stata, R, SPSS, SAS, Excel; SQL queries and reports. Sustainability: secondary (N-ary) use of clinical and research data; cumulative cost of data entry; cumulative cost of staff training and turnover; cumulative risks and opportunity costs of staff entrenchment
  • 3. The Growth Problem Lu Z. PubMed and Beyond. Database 2011;2011:baq036 21245076[pmid]
  • 4. The Growth Problem http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html
  • 5. The Growth Problem http://www.ncbi.nlm.nih.gov/books/NBK44423/
  • 6. The Breaking Point 1000 cases
  • 7. The Growth Problem
      Microsoft Access databases
        1999: ProstateDB 1.0
        2000: PRDB / Prostabase
      ColdFusion & SQL Server web-based database
        2002: Valhalla 1.0 – 1.1 Prostate
        2003: Valhalla 1.2 (7,994 patients). Billing/EMR compliant populated clinic forms
      ASP.NET & SQL Server web-based database
        2004: CAISIS 2.0 – 2.1 (26,470 patients). Integrated bladder, kidney, testis
        2005: CAISIS 3.0 – 3.1 (44,000 patients). Prostatectomy eForm, protocol manager, tumor maps
        2006: CAISIS 3.5 (55,000 patients). GU and Urology Prostate Follow-up eForms
        2007: CAISIS 4.0 (80,000 patients). Metadata, dynamic forms, new diseases and eForms
        2008: CAISIS 4.1 (98,000 patients). Email eForms, advanced find, specimen tracking
        2009: CAISIS 4.5 (120,000+ patients). Project tracking, patient education, virtual fields, reporting module
        2010: CAISIS 5.0x
  • 8. The Curation Problem. Increasing volume of data. More data points for annotation: clinical / patient; genomic / biological; public health / environment. Parallel curation issues in modern clinical and biological research databases (Krallinger 2008*). Development of NLP system to support clinical research operations (Savova 2010**). *18834499[pmid], **20819853[pmid]
  • 9. On the Other Hand… Long tail of research efforts: small heterogeneous labs and projects; subsets of data; specialized requirements; innovative approaches
  • 10. Spectrum of Approaches. One dataset per project (i.e. study based systems). Registry databases (i.e. one treatment or disease). Data warehouse or data repository: common schema (data model). “Amalgamation” of heterogeneous datasets: common security and access; common syntax (data format); defined links between records; indexed for searching and retrieval. Federation / grid of semantically integrated data: common vocabulary / terminology; formal models (caBIG)
  • 11. Loosely Linking Data http://www.ncbi.nlm.nih.gov/sites/gquery
  • 12. Tightly Integrating Data. Vocabulary / terminology: NCI Thesaurus (NCIt); NLM UMLS. Standard data models: caBIG / caDSR; HL7/FDA/NCI; CDISC / BRIDG. Web services*. Common syntax / format. *Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20. 12000935[pmid]
  • 14. Appendix: 394 people at 60 sites visited from Aug 2008 to Jun 2009 (map legend: driving, flying)
  • 16. Costly curation and support of research databases
  • 17. Widespread and large scale implementation of EMRs
  • 18. Development of data warehouses and repositories
  • 20. Difficulties accessing and retrieving research data
  • 21. Skewed distribution of data systems
  • 22. Prevalence of Microsoft Access and Excel solutions
  • 23. Shifts to less expensive and more open source platforms
  • 24. REDCap, CAISIS, caTissue, Python and Bioconductor (slide title: Appendix: Site Visit Findings)
  • 25. Appendix: Clinical Systems. Surgical reports; radiation therapy reports; pathology reports; laboratory reports; radiology reports; review of systems and patient reported outcomes; electronic medical / health records; registration / demographics; clinical trials eligibility and recruitment; scheduling and operations
  • 26. Appendix: Engaging Patients in Data Management. Pre-first visit questionnaires; web-based survey systems (e.g. REDCap); patient reported outcomes; longitudinal follow-up process; tablets, iPads and mobile applications
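
The pooling requirement on slide 2 and the common schema and common vocabulary ideas on slides 10 and 12 can be illustrated with a minimal Python sketch. Everything named here is hypothetical and chosen only for illustration: the two site exports, their field names, the `COMMON_FIELDS` schema and the mapper functions do not come from the slides or from Caisis.

```python
# Minimal sketch: pool rows from two sites into one common schema.
# All field names and values below are hypothetical examples.
COMMON_FIELDS = ("patient_id", "site", "psa_ng_ml", "gleason_sum")

SITE_A = [{"MRN": "A-001", "PSA": "6.2", "Gleason": "3+4"}]
SITE_B = [{"id": "B-17", "psa_value": "11.0", "gleason_total": "8"}]


def gleason_sum(value):
    """Accept a pattern string such as '3+4' or a plain total such as '7'."""
    return sum(int(part) for part in value.split("+"))


def from_site_a(row):
    """Map a site A row onto the common schema."""
    return {
        "patient_id": row["MRN"],
        "site": "A",
        "psa_ng_ml": float(row["PSA"]),
        "gleason_sum": gleason_sum(row["Gleason"]),
    }


def from_site_b(row):
    """Map a site B row onto the common schema."""
    return {
        "patient_id": row["id"],
        "site": "B",
        "psa_ng_ml": float(row["psa_value"]),
        "gleason_sum": gleason_sum(row["gleason_total"]),
    }


def pool(sources):
    """Apply each site's mapper and check every record fits the schema."""
    pooled = []
    for mapper, rows in sources:
        for row in rows:
            record = mapper(row)
            if set(record) != set(COMMON_FIELDS):
                raise ValueError(f"record does not match schema: {record}")
            pooled.append(record)
    return pooled


pooled = pool([(from_site_a, SITE_A), (from_site_b, SITE_B)])
```

Each additional site costs one more mapper in this arrangement, which is one way to read the slide's point that pooling effort accumulates with every heterogeneous source.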

Editor's Notes

  1. I hope you will give a broad overview of the key features of a database that would allow the development of optimal predictive models, demonstrate how Caisis works to collect clinical and research data, and show how it has proved so valuable to the development of predictive models.
  2. Constraints on data entry increase reproducibility, but may decrease accuracy. Conducive to quantitative research and hypothesis testing. Open fields / coding may increase accuracy, but decrease reproducibility. Conducive to qualitative research and discovery.
  3. Krallinger et al. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol (2008) vol. 9 Suppl 2 pp. S8. Savova et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc (2010) vol. 17 (5) pp. 507-13
  4. Caisis is a data repository. One data model to rule them all
  5. How much time and effort does it take to pool databases and spreadsheets for predictive modeling? Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20. 12000935[pmid]. If there is a need for large aggregated datasets from heterogeneous sources to support predictive modeling, we need to plan for this model. Building for one site and rolling out to other sites successfully is rare.
  6. Most people proclaimed that they did not want to “reinvent the wheel”, but proceeded to do so. Disconnect between beliefs and actions. Harris et al. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform (2009) vol. 42 (2) pp. 377-81
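
The tradeoff in note 2 between constrained entry and open fields can be sketched in a few lines of Python. The pick list, the field (a surgical margin status) and the example entries are all hypothetical, invented here for illustration rather than taken from the talk.

```python
# Minimal sketch of constrained versus open data entry.
# The pick list and the sample entries are hypothetical.
ALLOWED_MARGIN_STATUS = {"negative", "positive", "unknown"}


def enter_constrained(value):
    """Constrained entry: accept only values from the pick list."""
    normalized = value.strip().lower()
    if normalized not in ALLOWED_MARGIN_STATUS:
        raise ValueError(f"not in pick list: {value!r}")
    return normalized


def enter_free_text(value):
    """Open entry: keep whatever the abstractor typed."""
    return value.strip()


# Two abstractors record the same finding.
constrained = [enter_constrained(v) for v in ["Positive", "positive "]]
free = [enter_free_text(v)
        for v in ["focally positive at apex", "pos. margin (apex)"]]

# Constrained values agree exactly, but the detail about extent and
# location is gone. Free-text values keep that detail, but no longer
# match each other without further curation.
values_agree_constrained = len(set(constrained)) == 1
values_agree_free = len(set(free)) == 1
```

The constrained path is the one that suits hypothesis testing on pooled data, while the free-text path preserves the detail that later curation or NLP would have to recover.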