effective data sharing for a learning healthcare system
1.
2. Title: Implementing data standards for effective data sharing
and essential steps towards optimising medical research.
Authors: Paul Houston1, Larry Callahan2, Michael Braxenthaler4, Lauren Becnel1, Sam Hume1,
Dorina Bratfalean1, Martin Romaker3, Philipe Roca-Serra3, Andreas Tilman, Susanna Sansone,
Ibrahim Emam, Kerstin Forsberg,
Affiliations:
1. CDISC and CDISC European Foundation
2. FDA – Office of Health Informatics, Silver Spring, MD, USA
3. Oxford Research University –
4. Roche
5. Biosci Consulting
6. AstraZeneca
eTRIKS IMI Grant agreement number 115446
Draft publication – release q3 2018
3. Effective data sharing definitions
• ‘Effective data sharing’ – relies upon ‘high quality
structured data’ being fully shared, without limitations,
for open interrogation and for aggregation with
complementary data sets.
• ‘Effective data’ or ‘high quality structured data’ –
implementation of a rigorous data standards
environment while also following approaches such as
FAIR (Findable, Accessible, Interoperable and
Reusable) and the Dublin Core interoperability levels.
• Semantic Interoperability – when high quality data is
inter-connected on the internet in a meaningful way to
create new knowledge and medical breakthroughs
4. Questions.
Who is in sales here?
Who does some sales?
We all need to sell medical research and
promote the sharing of data.
5. Recent history and the data sharing landscape
• Enshrined in European Public Policy – ‘open access to
document’ Vienna agreement
• Cochrane Institute and Ben Goldacre consistent voices
• Reaction is EMA Policy 0070: ‘study reports’ and ‘summary
data and results’; next steps will be patient-level data
• Pharma response: Datasphere.com,
ClinicalStudyDataRequest.com, academic YODA, AllTrials,
Vivli
• European Commission making research data open by
default
• WHO, Wellcome, Médecins Sans Frontières etc statement
7. Motivations/Why?
• Connect the world’s healthcare data: achieving and
maintaining semantic interoperability is estimated to cost
$300-700 million per year, against potential savings in
healthcare of $30 billion per year in the USA alone
• McKinsey has made other estimates of cost savings
from open data in healthcare of up to $300 billion.
• Why? To find cures, develop more precise medicines, improve
public health, save lives and create efficiencies
8. Our Model proposes:
The more we collaborate to create and share,
high quality structured data
the higher our achievements will be and
the greater the cost effectiveness will be
9.
10. 5 top problems
1. Data standards are not consistently
mandated/recommended by govt/funders/foundations
2. Standards are not applied consistently: implementation
curves are too steep and metadata registry use is currently low
3. Data isn’t shared fully: fenestrated data, licensing or
anonymization issues
4. More resources: creating and maintaining semantic
interoperability and standards is complex; the sector needs
more hands, more funding, more government support.
European Open Science cloud estimate - 500,000 data
stewards.
5. More co-ordination less innovation - and less duplication of
terminologies and Domain Models.
11. 1. Creating data atoms of a quality high enough to be
Interoperable and Re-useable/reproducible , the
essential I & R of the FAIR data principles.
2. For data sharing it is essential to use established
anonymization techniques, e.g. PhUSE – Ferran, Emman et
al 2015
3. It is the clinical research findings that are shared which
are the foundation to stratifying medicines for precision
medicine
4. Controlled vocabularies and shared ontologies,
Biomedical concepts, CDISC RDF and RDFS and SKOS for
the semantic Web.
12. Use Case: Improve registries via structured and connected
data: Clinical research & Patient/Disease
Avoiding a Vioxx incident: had the data for Vioxx (a COX-2 inhibitor) and related
drugs such as naproxen been made available, many of the estimated 40,000 deaths
might have been avoided and Merck would have avoided the $4.85 billion joint
lawsuit.
• Reporting of clinical trials is now mandatory in the USA and Europe, but that reporting
must be timely – late submissions have been ignored despite over 1 billion in potential fines
• Standards exist for ingredients of drugs to be recorded in a structured format –
FDA working on IPD dictionary – so medicines using like ingredients can be
cross-analysed
• Publishing results of preclinical studies is not mandated by law in Europe,
whereas SEND is mandated in the USA. ‘Pre-clinical data is an essential ethical right’
(Anderson & Kimmelman 2012)
• Early detection of ADRs, e.g. BIA 10-2474: preclinical studies showed deaths in
primates at high dosage, and the first-in-man study saw 1 death and several
serious ADRs.
13. • C. Glenn Begley et al identified 53 preclinical ‘landmark’ oncology papers in top
journals. 47 of the 53 could not be replicated
New knowledge or scientific facts
Which ultimately rests upon
all scientific models
and findings
Rest upon the integrity of each atom of
data and its supporting data
and metadata
14. • ‘The loss of empirical studies are sinkholes in the medical landscape’
• Grave danger in clinical interventions being made on poor data assumptions
• With the future being AI and deep learning, data must be reliable and structured
Unreliable facts at any point make poor scientific assumptions…
15. Califf’s big idea: Build a database for research done before clinical
trials
Could an official database of experiments on lab rats help medical science
overcome some of its most pressing problems?
• Internal preclinical data is scattered and lacks scientific rigour and
granularity
• Knowing what the dead ends are would make things more efficient
for the system as a whole
• Knowing what failed as well as what succeeded will improve the
reproducibility issue
• SEND now mandated by FDA, PMDA now looking to utilize
preclinical…EMA?
C. Glenn Begley identified 53 "landmark" publications --
papers in top journals, from reputable labs -- for his
team to reproduce. Begley sought to double-check the
findings before trying to build on them for drug
development.
Result: 47 of the 53 could not be replicated
16. • Broad consensus on Domain Models : – BRIDG,
OMOP, CIMI, IDMP
• Implement metadata registries for
consistent standards implementation, e.g. SHARE
• Data sharing efforts: OpenTrials,
CDSR, Datasphere
• Precompetitive efforts such as IMI showing
success
Collaboration has already facilitated:
17. • Complete beginning to end standards:
extending protocol, trial registration and
results summaries
• Convergence of standards to enrich the
data, e.g. CDISC with IDMP, ODM2.0/FHIR
• Further sharing of pre-competitive data
for de-duplication of effort, e.g. drug
repurposing
Deeper collaboration needed…
18. • Registry information is key to informing decision
making: better registries mean better business
analytics for pharma and future AI initiatives,
• better pharmacovigilance,
• De-duplication of effort i.e. more effective drug
repurposing
• A CTR2 standard would better inform future EDC
templates
• Efficiency and cost savings in disclosure of trials
CTR2 Case Study – https://www.cdisc.org/ctr2-project
19. Figure 3. Moving registries towards fully structured registries with broad integrated
data sources, in keeping with the ‘hierarchy of data returns’ paradigm
20. “life sciences involves modeling of an incomplete and ever-changing
model of how our bodies work and what we know about it.”
21. • Once expressed in RDF, information can be
represented, accessed, computed, integrated, and
exchanged without the need for any translations
• provides a universal, mathematically precise, and
computable language that can express a wide range
of information – ideal for integrating wide data
sources
• platform independence and semantic interoperability
are inherent
• Anonymisation - RDF is ideal to represent knowledge
and facts without including patient level details
Linking data with RDF…the potential
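As a rough illustration of the integration and anonymisation points above, the following sketch merges two small sets of subject-predicate-object statements and prints them in an N-Triples style. The `http://example.org/` namespace, the two sources and every name in them are hypothetical placeholders, not CDISC, IDMP or PubChemRDF terms, and a real pipeline would more likely use a dedicated RDF library.

```python
# Illustrative sketch: the example.org namespace and all names below are
# hypothetical placeholders, not CDISC, IDMP or PubChemRDF identifiers.
EX = "http://example.org/"


def uri(local_name):
    """Build a full URI in the placeholder namespace."""
    return EX + local_name


# Source A: a hypothetical substance registry.
registry = {
    (uri("DrugZ"), uri("hasActiveSubstance"), uri("SubstanceX")),
}

# Source B: hypothetical preclinical findings, with no patient-level detail.
preclinical = {
    (uri("SubstanceX"), uri("hasPreclinicalEvent"), uri("PrimateDeath")),
}

# Two sets of triples that share URIs merge by plain set union;
# no schema mapping or translation step is needed.
merged = registry | preclinical


def to_ntriples(triples):
    """Serialise URI-only triples in N-Triples style, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


print(to_ntriples(merged))
```

Because both sources use the same URI for the substance, the merged set links the drug to the preclinical event without either source knowing about the other.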
22. RDF matches and complements many of the goals
in the CDISC Vision
23. The CDISC Mission and Principles :
• Recognize the ultimate goal of creating regulatory
submissions that allow for flexibility in scientific content
and are easily interpreted, understood, and navigated
by regulatory reviewers.
• Acknowledge that the data content, structure, and
quality of the standard data models are of paramount
importance, independent of implementation strategy
and platform.
• Work with other professional groups to encourage that
there is maximum sharing of information and minimum
duplication of efforts.
24. Figure 4.1 ‘Effective Data Sharing’, Houston, Callahan et al. Graph use case for BIA 10-2474-101, fatty acid amide
hydrolase inhibitor: graph view of potential clinical and preclinical data connections
Legend: SEND/SDTM RDF Data; RDF Data point; RDF Open Data Base
Data sources: Pharma RDF Mashup; BIO2RDF (ADRs via SIDER); PubChemRDF; Preclinical RDF db (not in existence; workaround achievable via linked data)
It is possible to create an RDF mashup of active substance codes from PubChemRDF and research information
from clinical research linked via PubMed to Springer FTP dump files
Nodes: Drug Z; ZINC113812628; UniCode-5AP1ZW859M; BIA 10-2474-101; NewStudyi.d; Event - death; JournalCitationDOI
Edges: <hasDrug.>, <hasActiveSubstance>, <hasDose.>, <hasEvent>, <hasPreclinicalEvents>, <haspreclinicalissues>, <haspubmedMappings>, <hasCitations>, <IDMPprefferedname>
Preclinical issues: 1. DosingRegimeIssues; 2. Pk data Ignored; 3. dosagethreshhold
25. RDF Triples GRAPH USE CASE FOR BIA 10-2474-101
ID  Subject          Predicate             Object
1   Drug Z           <hasActiveSubstance>  ZINC113812628
2   ZINC113812628    <hasMappings>         UNII-5AP1ZW859M
3   UNII-5AP1ZW859M  <IDMPPrefferedName>   BIA10-2474-101
4   BIA10-2474-101   <hascitations>        Journal DOI
5   Journal DOI      <hasEvents>           Event:Death of Monkeys
You can keep the primary sources of data in standard relational or
XML-database format, but export key “facts” as triples in RDF.
https://www.w3.org/2001/sw/sweo/public/UseCases/Pfizer/
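One way to picture the "keep the relational source, export key facts as triples" approach is the sketch below. It assumes a hypothetical one-table SQLite layout (`drug` with four columns), which is not taken from the slides. The identifiers and predicate strings are copied verbatim from the triples on this slide, and plain tuples stand in for real RDF terms.

```python
import sqlite3

# Hypothetical relational source: table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drug (name TEXT, zinc_id TEXT, unii TEXT, idmp_name TEXT)"
)
conn.execute(
    "INSERT INTO drug VALUES (?, ?, ?, ?)",
    ("Drug Z", "ZINC113812628", "UNII-5AP1ZW859M", "BIA10-2474-101"),
)


def export_triples(connection):
    """Export each row as three subject-predicate-object facts."""
    triples = []
    query = "SELECT name, zinc_id, unii, idmp_name FROM drug"
    for name, zinc_id, unii, idmp_name in connection.execute(query):
        triples.append((name, "hasActiveSubstance", zinc_id))
        triples.append((zinc_id, "hasMappings", unii))
        triples.append((unii, "IDMPPrefferedName", idmp_name))
    return triples


def follow(triples, start):
    """Walk outgoing links from `start`, taking the first link at each node."""
    path, current, seen = [], start, {start}
    while True:
        outgoing = [(p, o) for s, p, o in triples if s == current]
        if not outgoing or outgoing[0][1] in seen:
            return path
        predicate, obj = outgoing[0]
        path.append((predicate, obj))
        seen.add(obj)
        current = obj


triples = export_triples(conn)
chain = follow(triples, "Drug Z")
for predicate, obj in chain:
    print(f"<{predicate}> {obj}")
```

The relational table stays the system of record; only the exported facts need to be published, and following the links from Drug Z reaches the IDMP name in three hops.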
27. By September 2011 there were 31 billion RDF statements and 504 million RDF links
28. Today, we even have medical doctors playing with Cypher queries,
demonstrating another important implication of this project: a shift in
mindset towards truly interdisciplinary effort of clinicians and data
29. Top 5 Recommendations to reach level 4
(from the 25 recommendations of the publication)
1. Improve registries via structured and connected data: clinical
research & Patient/Disease
2. Merge siloed data sharing efforts through collaborative models,
e.g. amalgamate AllTrials with Vivli and others.
3. Wider publication and sharing of data concepts and semantic
relationships, particularly between ontologies; a formal LOD
diagram that connects all domain models, e.g. BRIDG, OMOP,
CIMI, IDMP, PubMed etc.
4. Increase investment in precompetitive data and knowledge
sharing/sustainability (too many RDF data sets not updated
regularly)
5. Validating the representations of the terminologies using
constraints and semantic standards
• expressed in RDF Data Shapes using SHACL - extend CDISC RDF
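The kind of constraint recommendation 5 has in mind can be sketched in plain Python. This is only a stand-in for the idea behind a SHACL shape (cardinality and pattern checks on a property). It is not SHACL syntax and not the CDISC RDF schema, and the shape, the pattern and the sample data are hypothetical.

```python
import re

# Hypothetical shape: every drug needs exactly one active substance,
# and its code must look like a ZINC identifier.
DRUG_SHAPE = {
    "hasActiveSubstance": {
        "min_count": 1,
        "max_count": 1,
        "pattern": r"ZINC\d+",
    },
}


def validate(triples, subject, shape):
    """Return a list of human-readable violations for one subject."""
    violations = []
    for predicate, rule in shape.items():
        values = [o for s, p, o in triples if s == subject and p == predicate]
        if len(values) < rule["min_count"]:
            violations.append(f"{subject}: too few values for {predicate}")
        if len(values) > rule["max_count"]:
            violations.append(f"{subject}: too many values for {predicate}")
        for value in values:
            if not re.fullmatch(rule["pattern"], value):
                violations.append(f"{subject}: bad {predicate} value {value!r}")
    return violations


good = [("Drug Z", "hasActiveSubstance", "ZINC113812628")]
bad = [
    ("Drug Y", "hasActiveSubstance", "unknown"),
    ("Drug Y", "hasActiveSubstance", "ZINC1"),
]

print(validate(good, "Drug Z", DRUG_SHAPE))
print(validate(bad, "Drug Y", DRUG_SHAPE))
```

Declaring the constraints as data, separate from the checking code, mirrors how shapes are published alongside a vocabulary so that every implementer validates against the same rules.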
30. Level 4 is an achievable pot of gold at the end of the rainbow
But we all need to be on board with selling the data sharing agenda
31. With Thanks
• Larry Callahan2, Michael Braxenthaler4, Lauren Becnel1,
Sam Hume1, Dorina Bratfalean1, Martin Romaker3, Philipe
Roca-Serra3, Andreas Tilman, Susanna Sansone, Ibrahim
Emam, Kerstin Forsberg, Frederik Malfait, John Ezzell, Frank
Petavy, John Owen, Dave Iberson Hurst
• Phuse RDF team:
• Phil Ashworth, Scott Bahlavooni, Daniel Boisvert, Susan
DeHaven, Nathan Freimark, Josephine-Anne Gough, Laura
Hollink, Dave Jordan, Ron Katriel, Kirsten Langendorf, Geoff
Low, Frederik Malfait, Mitra Rocca, Gary Walker
• Also recognized are the efforts by Robert Wynne and Erin
Muhlbradt
Editor's Notes
If I give you a share of my profits, particularly as I understand you are helping my business, would I tell you where, how or when to spend that money?
A rigorous standards environment – ’The hierarchy of data returns’
Discussion with Kavita and
This is a promising start, but today I want to present what we can do better. Peter Gøtzsche -
Standards create paths through all the data, illuminating knowledge; the data surrounding that knowledge is the context, the provenance and the proof. One piece missing or one piece that isn’t accurate in that equation and the whole scientific premise collapses like a house of cards
https://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf
https://en.wikipedia.org/wiki/Expenditures_in_the_United_States_federal_budget
McKinsey Global Institute, Open data: Unlocking innovation and performance with liquid information
Gates Foundation, Wellcome Trust, UK Medical Research Council, Médecins Sans Frontières etc. also published a statement for all funded research results to be disclosed, as have the ICMJE. Wellcome Trust. Sharing research data to improve public health: full joint statement by funders of health research. 2011. www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Public-health-and-epidemiology/WTDV030690.htm.
http://www.icmje.org/recommendations/browse/publishing-and-editorial-issues/clinical-trial-registration.html
The committee for the European Open Science Cloud has finalised a report recommending the need for 500,000 expert data stewards to move Europe towards an open and maintainable cluster of federated research environments spanning Europe https://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf
Anonymization techniques that have been developed: 1. Ferran, Emman et al 2015, PhUSE, De-identification of CDISC http://www.phusewiki.org/docs/Conference%202015%20DH%20Papers/DH01.pdf
to ensure patient safety and efficacy: Valkenhoef et al, 2012, Deficiencies in the transfer and availability of clinical trials evidence: a review of existing systems and standards https://doi.org/10.1186/1472-6947-12-95
Harlan & Hines et al, BMJ. 2007 Jan 20; 334(7585): 120–123. doi: 10.1136/bmj.39024.487720.68
C. Glenn Begley, Nature 483, 531–533 (29 March 2012) doi:10.1038/483531a
pre-clinical data is an essential ethical right Anderson & Kimmelman PMC4516408/ 2012
experimental fatty acid amide hydrolase inhibitor
The ISO IDMP standards , discussed below, focus on creating traceable data points , such as active ingredients and formulation specific details, from preclinical and clinical research through to post marketing analysis.
In the eTRIKS project, the open source Neo4j graph database was installed to query a prototype protein-centric network to provide biological context to asthma-related genes and CDISC patient data (figure 2 above). The graph database provided a flexible solution for the integration of multiple types of biological and patient data, which also proved excellent for data mining to support hypothesis generation. Lysenko A, Roznovăţ IA, Saqi M, Mazein A, Rawlings CJ, Auffray C., Representing and querying disease networks using graph databases, BioData Min. 2016 Jul 25;9:23. doi: 10.1186/s13040-016-0102-8. PMID: 27462371
Figure 2: Example of a Cypher query on the network analysis platform. For a given class of drugs, this query finds the biomarkers that may be affected by intake of this drug and the mechanisms through which this link occurs. This helps clinicians to decide what biomarkers to focus on in specific patient subpopulations.