Data Management for Predictive Tools
Paul Fearn, MBA
NLM Informatics Research Fellow, Biomedical and Health Informatics
University of Washington | Fred Hutchinson Cancer Research Center, Seattle, Washington
PROSTATE CANCER: PREDICTIVE MODELS FOR DECISION MAKING
April 7th – 9th, 2011 - MSKCC - New York, NY

Data Management Requirements
- Need to assemble large datasets for predictive modeling
  - Pooling data across sites, systems and countries
  - Linking data across clinical, specimen and lab repositories
- Quality assurance (for reproducibility of results)
  - Tradeoffs between accuracy and reproducibility of data points
  - Transparency of data processing
- Complete and up-to-date datasets
- Easy to access, sort, filter and export data
  - Statistical analysis in Stata, R, SPSS, SAS, Excel
  - SQL queries and reports
- Sustainability
  - Secondary (N-ary) use of clinical and research data
  - Cumulative cost of data entry
  - Cumulative cost of staff training and turnover
  - Cumulative risks and opportunity costs of staff entrenchment

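
The linking, SQL query and export requirements above can be illustrated with a small sketch. It is not taken from the talk: the table names, column names and sample values are hypothetical, and an in-memory SQLite database stands in for separate clinical and specimen repositories. It joins the two on a shared patient identifier and writes the result as CSV, a format that Stata, R, SPSS, SAS and Excel can all import.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical repositories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clinical (patient_id TEXT PRIMARY KEY, site TEXT, psa REAL);
        CREATE TABLE specimen (specimen_id TEXT PRIMARY KEY, patient_id TEXT, gleason INTEGER);
        INSERT INTO clinical VALUES ('P1', 'SiteA', 4.2), ('P2', 'SiteB', 9.8), ('P3', 'SiteA', 6.1);
        INSERT INTO specimen VALUES ('S1', 'P1', 6), ('S2', 'P2', 8);
        """
    )
    return conn


def linked_rows(conn):
    """Link clinical and specimen records on patient_id."""
    # LEFT JOIN keeps patients without a specimen, so missing data stays visible.
    cur = conn.execute(
        """
        SELECT c.patient_id, c.site, c.psa, s.gleason
        FROM clinical AS c
        LEFT JOIN specimen AS s ON s.patient_id = c.patient_id
        ORDER BY c.patient_id
        """
    )
    return cur.fetchall()


def to_csv(rows):
    """Serialize linked rows as CSV text for export to a statistics package."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["patient_id", "site", "psa", "gleason"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(linked_rows(build_demo_db())))
```

Keeping the query in one place, rather than hand-editing exports, is one way to make the data processing transparent and repeatable.
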
The Growth Problem
Lu Z. PubMed and Beyond. Database 2011;2011:baq036. 21245076[pmid]

The Growth Problem
http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html

The Growth Problem
http://www.ncbi.nlm.nih.gov/books/NBK44423/

The Breaking Point
- 1000 cases

The Growth Problem
Microsoft Access databases
- 1999: ProstateDB 1.0
- 2000: PRDB / Prostabase
ColdFusion & SQL Server web-based database
- 2002: Valhalla 1.0 – 1.1: Prostate
- 2003: Valhalla 1.2 (7,994 patients): Billing/EMR compliant populated clinic forms
ASP.NET & SQL Server web-based database
- 2004: CAISIS 2.0 – 2.1 (26,470 patients): Integrated bladder, kidney, testis
- 2005: CAISIS 3.0 – 3.1 (44,000 patients): Prostatectomy eForm, protocol manager, tumor maps
- 2006: CAISIS 3.5 (55,000 patients): GU and Urology Prostate Follow-up eForms
- 2007: CAISIS 4.0 (80,000 patients): Metadata, dynamic forms, new diseases and eForms
- 2008: CAISIS 4.1 (98,000 patients): Email eForms, advanced find, specimen tracking
- 2009: CAISIS 4.5 (120,000+ patients): Project tracking, patient education, virtual fields, reporting module
- 2010: CAISIS 5.0x

The Curation Problem
- Increasing volume of data
- More data points for annotation
  - Clinical / patient
  - Genomic / biological
  - Public health / environment
- Parallel curation issues in modern clinical and biological research databases (Krallinger 2008*)
- Development of NLP system to support clinical research operations (Savova 2010**)
*18834499[pmid], **20819853[pmid]

On the Other Hand…
- Long tail of research efforts
- Small heterogeneous labs and projects
- Subsets of data
- Specialized requirements
- Innovative approaches

Spectrum of Approaches
- One dataset per project (i.e. study-based systems)
- Registry databases (i.e. one treatment or disease)
- Data warehouse or data repository
  - Common schema (data model)
- “Amalgamation” of heterogeneous datasets
  - Common security and access
  - Common syntax (data format)
  - Defined links between records
  - Indexed for searching and retrieval
- Federation / grid of semantically integrated data
  - Common vocabulary / terminology
  - Formal models (caBIG)

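
As a rough illustration of the "common schema" idea above, the sketch below pools records from two sites that name the same fields differently. Everything in it is an assumption made for the example: the site names, the field names and the three-field common schema are hypothetical and do not come from CAISIS or any other system named in the talk.

```python
# Hypothetical per-site mappings: site-specific field name -> common schema name.
FIELD_MAPS = {
    "site_a": {"mrn": "patient_id", "psa_ng_ml": "psa", "dx_date": "diagnosis_date"},
    "site_b": {"PatientID": "patient_id", "PSA": "psa", "DateOfDiagnosis": "diagnosis_date"},
}

# Hypothetical common schema shared by every pooled record.
COMMON_FIELDS = ("patient_id", "psa", "diagnosis_date")


def to_common(site, record):
    """Translate one site-specific record into the common schema."""
    mapping = FIELD_MAPS[site]
    # Start with every common field present so missing values are explicit.
    out = {field: None for field in COMMON_FIELDS}
    for key, value in record.items():
        if key in mapping:  # fields outside the common schema are dropped
            out[mapping[key]] = value
    out["source_site"] = site  # keep provenance for later quality checks
    return out


def pool(datasets):
    """Pool records from all sites into one list in the common schema."""
    return [
        to_common(site, record)
        for site, records in datasets.items()
        for record in records
    ]


if __name__ == "__main__":
    pooled = pool(
        {
            "site_a": [{"mrn": "A1", "psa_ng_ml": 4.2}],
            "site_b": [{"PatientID": "B7", "PSA": 9.8, "DateOfDiagnosis": "2009-03-01"}],
        }
    )
    for row in pooled:
        print(row)
```

Recording the source site on every pooled record keeps the link back to the originating dataset, which helps when a data point has to be traced or corrected.
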
Loosely Linking Data
http://www.ncbi.nlm.nih.gov/sites/gquery

Tightly Integrating Data
- Vocabulary / Terminology
  - NCI Thesaurus (NCIt)
  - NLM UMLS
- Standard data models
  - caBIG / caDSR
  - HL7/FDA/NCI
  - CDISC / BRIDG
- Web services*
- Common syntax / format
*Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20. 12000935[pmid]

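
A common vocabulary can be sketched as a lookup from locally entered terms to one shared concept. The example below is only an illustration under stated assumptions: the codes ("EX:0001", "EX:0002") are placeholders invented for this sketch, not real NCI Thesaurus or UMLS identifiers, and a production system would query an actual terminology service instead of a hard-coded table.

```python
# Hypothetical local term -> (placeholder code, preferred label).
# These codes are NOT real NCIt or UMLS identifiers.
TERM_MAP = {
    "radical prostatectomy": ("EX:0001", "Radical Prostatectomy"),
    "prostatectomy, radical": ("EX:0001", "Radical Prostatectomy"),
    "rp": ("EX:0001", "Radical Prostatectomy"),
    "brachytherapy": ("EX:0002", "Brachytherapy"),
}


def normalize(term):
    """Return (code, label) for a local term, or None if it is unmapped."""
    # Lowercase and collapse whitespace so trivial variants map to one key.
    key = " ".join(term.lower().split())
    return TERM_MAP.get(key)


def split_mapped(terms):
    """Separate terms that map to a concept from those needing curation."""
    mapped, unmapped = {}, []
    for term in terms:
        concept = normalize(term)
        if concept is None:
            unmapped.append(term)
        else:
            mapped[term] = concept
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = split_mapped(["RP", "Radical  Prostatectomy", "cryotherapy"])
    print(mapped)
    print(unmapped)
```

Returning the unmapped terms separately, instead of guessing a code for them, leaves the remaining work for a curator in plain view.
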
The CAISIS System
Appendix: 394 people at 60 sites visited from Aug 2008 to Jun 2009 (figure legend: Driving, Flying)
Costly curation and support of research databases
Widespread and large-scale implementation of EMRs

Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Último (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

NY Prostate Cancer Conference - P.A. Fearn - Session 1: Data management for predictive tools

  • 1. Data Management for Predictive Tools Paul Fearn, MBA NLM Informatics Research Fellow Biomedical and Health Informatics University of Washington | Fred Hutchinson Cancer Research Center Seattle, Washington PROSTATE CANCER: PREDICTIVE MODELS FOR DECISION MAKING April 7th – 9th, 2011 - MSKCC - New York, NY
  • 2. Data Management Requirements
      ◦ Need to assemble large datasets for predictive modeling
          ◦ Pooling data across sites, systems and countries
          ◦ Linking data across clinical, specimen and lab repositories
      ◦ Quality assurance (for reproducibility of results)
          ◦ Tradeoffs between accuracy and reproducibility of data points
          ◦ Transparency of data processing
          ◦ Complete and up-to-date datasets
      ◦ Ease to access, sort, filter and export data
          ◦ Statistical analysis in Stata, R, SPSS, SAS, Excel
          ◦ SQL queries and reports
      ◦ Sustainability
          ◦ Secondary (N-ary) use of clinical and research data
          ◦ Cumulative cost of data entry
          ◦ Cumulative cost of staff training and turnover
          ◦ Cumulative risks and opportunity costs of staff entrenchment
  • 3. The Growth Problem Lu Z. PubMed and Beyond. Database 2011;2011:baq036 21245076[pmid]
  • 4. The Growth Problem http://www.ncbi.nlm.nih.gov/genbank/genbankstats.html
  • 5. The Growth Problem http://www.ncbi.nlm.nih.gov/books/NBK44423/
  • 6. The Breaking Point 1000 cases
  • 7. The Growth Problem
      ◦ Microsoft Access databases
          ◦ 1999: ProstateDB 1.0
          ◦ 2000: PRDB / Prostabase
      ◦ ColdFusion & SQL Server web-based database
          ◦ 2002: Valhalla 1.0 – 1.1; Prostate
          ◦ 2003: Valhalla 1.2 (7,994 patients); Billing/EMR compliant populated clinic forms
      ◦ ASP.NET & SQL Server web-based database
          ◦ 2004: CAISIS 2.0 – 2.1 (26,470 patients); Integrated bladder, kidney, testis
          ◦ 2005: CAISIS 3.0 – 3.1 (44,000 patients); Prostatectomy eForm, protocol manager, tumor maps
          ◦ 2006: CAISIS 3.5 – (55,000 patients); GU and Urology Prostate Follow-up eForms
          ◦ 2007: CAISIS 4.0 – (80,000 patients); Metadata, dynamic forms, new diseases and eForms
          ◦ 2008: CAISIS 4.1 – (98,000 patients); Email eForms, advanced find, specimen tracking
          ◦ 2009: CAISIS 4.5 – (120,000+ patients); Project tracking, patient education, virtual fields, reporting module
          ◦ 2010: CAISIS 5.0x
  • 8. The Curation Problem
      ◦ Increasing volume of data
      ◦ More data points for annotation
          ◦ Clinical / patient
          ◦ Genomic / biological
          ◦ Public health / environment
      ◦ Parallel curation issues in modern clinical and biological research databases (Krallinger 2008*)
      ◦ Development of NLP system to support clinical research operations (Savova 2010**)
      ◦ *18834499[pmid], **20819853[pmid]
  • 9. On the Other Hand…
      ◦ Long tail of research efforts
      ◦ Small heterogeneous labs and projects
      ◦ Subsets of data
      ◦ Specialized requirements
      ◦ Innovative approaches
  • 10. Spectrum of Approaches
      ◦ One dataset per project (i.e. study based systems)
      ◦ Registry databases (i.e. one treatment or disease)
      ◦ Data warehouse or data repository
          ◦ Common schema (data model)
          ◦ “Amalgamation” of heterogeneous datasets
          ◦ Common security and access
          ◦ Common syntax (data format)
          ◦ Defined links between records
          ◦ Indexed for searching and retrieval
      ◦ Federation / grid of semantically integrated data
          ◦ Common vocabulary / terminology
          ◦ Formal models (caBIG)
  • 11. Loosely Linking Data http://www.ncbi.nlm.nih.gov/sites/gquery
  • 12. Tightly Integrating Data
      ◦ Vocabulary / Terminology
          ◦ NCI Thesaurus (NCIt)
          ◦ NLM UMLS
      ◦ Standard data models
          ◦ caBIG / caDSR
          ◦ HL7/FDA/NCI CDISC / BRIDG
      ◦ Web services*
      ◦ Common syntax / format
      ◦ *Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20. 12000935[pmid]
  • 14. Appendix: 394 people at 60 sites visited from Aug 2008 to Jun 2009 (map legend: Driving / Flying)
  • 15. Appendix: Site Visit Findings
      ◦ Costly curation and support of research databases
      ◦ Widespread and large scale implementation of EMRs
      ◦ Development of data warehouses and repositories
      ◦ Difficulties accessing and retrieving research data
      ◦ Skewed distribution of data systems
      ◦ Prevalence of Microsoft Access and Excel solutions
      ◦ Shifts to less expensive and more open source platforms
      ◦ REDCap, CAISIS, caTissue, Python and Bioconductor
  • 25. Appendix: Clinical Systems
      ◦ Surgical Reports
      ◦ Radiation Therapy Reports
      ◦ Pathology Reports
      ◦ Laboratory Reports
      ◦ Radiology Reports
      ◦ Review of Systems and Patient Reported Outcomes
      ◦ Electronic Medical / Health Records
      ◦ Registration / demographics
      ◦ Clinical trials eligibility and recruitment
      ◦ Scheduling and operations
  • 26. Appendix: Engaging Patients in Data Management
      ◦ Pre-first visit questionnaires
      ◦ Web-based survey systems (e.g. REDCap)
      ◦ Patient reported outcomes
      ◦ Longitudinal follow-up process
      ◦ Tablets, iPads and mobile applications
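
Slide 2 lists "linking data across clinical, specimen and lab repositories" and "SQL queries and reports" as requirements. The sketch below shows, in rough terms, what such a link can look like using Python's built-in sqlite3 module. The table names, column names and values are hypothetical, made up for illustration only; they are not taken from Caisis or from any other system named in the slides.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding two hypothetical repositories."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clinical (patient_id INTEGER PRIMARY KEY, site TEXT, psa REAL);
        CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY,
                               patient_id INTEGER, gleason INTEGER);
        """
    )
    # Made-up demo rows; patient 3 deliberately has no specimen record.
    conn.executemany(
        "INSERT INTO clinical VALUES (?, ?, ?)",
        [(1, "site_a", 4.2), (2, "site_b", 7.9), (3, "site_a", 5.1)],
    )
    conn.executemany(
        "INSERT INTO specimen VALUES (?, ?, ?)",
        [(10, 1, 6), (11, 2, 7)],
    )
    return conn


def linked_rows(conn):
    """Link clinical rows to specimen rows on the shared patient_id key.

    A LEFT JOIN keeps patients with no specimen, so gaps in the pooled
    dataset stay visible (as None) instead of silently disappearing.
    """
    query = """
        SELECT c.patient_id, c.site, c.psa, s.gleason
        FROM clinical AS c
        LEFT JOIN specimen AS s ON s.patient_id = c.patient_id
        ORDER BY c.patient_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in linked_rows(build_demo_db()):
        print(row)
```

The left join is a design choice for this sketch: an inner join would drop unlinked patients, which works against the "complete and up-to-date datasets" item on the same slide.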

Editor's Notes

  1. I hope you will give a broad overview of the key features of the database that would allow the development of optimal predictive models, and demonstrate how Caisis works to collect clinical and research data and how it has proved to be so valuable to the development of predictive models.
  2. Constraints on data entry increase reproducibility, but may decrease accuracy; they are conducive to quantitative research and hypothesis testing. Open fields / coding may increase accuracy, but decrease reproducibility; they are conducive to qualitative research and discovery.
  3. Krallinger et al. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. Genome Biol (2008) vol. 9 Suppl 2 pp. S8. Savova et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc (2010) vol. 17 (5) pp. 507-13.
  4. Caisis is a data repository. One data model to rule them all
  5. How much time and effort does it take to pool databases and spreadsheets for predictive modeling? Stein. Creating a bioinformatics nation. Nature (2002) vol. 417 (6885) pp. 119-20. 12000935[pmid]. If there is a need for large aggregated datasets from heterogeneous sources to support predictive modeling, we need to plan for this model. Building for one site and rolling out to other sites successfully is rare.
  6. Most people proclaimed that they did not want to “reinvent the wheel”, but proceeded to do so. Disconnect between beliefs and actions. Harris et al. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform (2009) vol. 42 (2) pp. 377-81.
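
Note 2 describes a tradeoff: constrained entry favors reproducibility, open fields favor accuracy. One way to hedge between the two, sketched below, is to store a coded value when the input matches a controlled vocabulary and always keep the verbatim text alongside it. The vocabulary and function name are hypothetical, chosen only to illustrate the idea; the slides do not describe this mechanism.

```python
# Hypothetical controlled vocabulary, used only for this illustration.
ALLOWED_STAGES = {"T1", "T2", "T3", "T4"}


def record_stage(raw_text):
    """Return a record with a coded value (or None) plus the verbatim entry.

    The coded field is reproducible because it can only hold vocabulary
    terms; the verbatim field preserves whatever the person actually wrote.
    """
    cleaned = raw_text.strip().upper()
    coded = cleaned if cleaned in ALLOWED_STAGES else None
    return {"coded": coded, "verbatim": raw_text}


if __name__ == "__main__":
    print(record_stage(" t2 "))
    print(record_stage("T2, possibly T3"))
```

Entries that do not match the vocabulary are left uncoded rather than forced into the nearest term, so a later curation pass can review them against the original wording.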