Provenance abstraction for
implementing security policies
Learning Health System and securing provenance of
health data
Dr Vasa Curcin
King’s College London
Overview
• Learning Health System
• LHS requirements for provenance data
• TRANSFoRm project
• Transformation-oriented Access Control Language
for Provenance (TACLP)
Learning Health System
“ ... one in which progress in science,
informatics, and care culture align to generate
new knowledge as an ongoing, natural by-
product of the care experience, and
seamlessly refine and deliver best practices for
continuous improvement in health and health
care.” (Institute of Medicine)
We can’t afford to waste data!
A Learning Health System for the Nation
• Participants shown: Pharmaceutical Firm, Beacon Community, Integrated Delivery System, Community Practice, Health Information Organization, Health Center Network, Federal Agencies, State Public Health
• Cross-cutting elements shown: Governance, Patient Engagement, Trust, Analysis, Dissemination
Learning Health System
Defining functions of a LHS are to:
1. routinely and securely aggregate data from disparate sources
2. convert the data to knowledge
3. disseminate that knowledge, in actionable forms, to everyone who can
benefit from it.
c/o C. Friedman
Learning Health System take-up
• US medical/academic centres
o Mayo, Duke, Vanderbilt
o PCORI
• National data aggregators
o Clinical Practice Research Datalink
o NIVEL
• EHR vendors
o CSC, Asseco, TPP, InPractice Systems
• European academic-industrial
collaborations
o TRANSFoRm, EHR4CR, Semantic
HealthNet
…and Bill
Example: Clinical trial challenges
• Major motivation for the LHS work
• Trials too expensive and difficult to run
• Efficacy-effectiveness gap (EEG)
o Disconnect between outcomes from clinical trials and
information needed for clinical practice
o Interaction of drug effect and real-life contextual factors
o Challenge to identify contextual factors
• LHS provides context and workflow
LHS for Clinical Trials
• EHR integration
o Eligibility checking done automatically from EHR data
o eCRFs partially filled based on EHR information
o All collected data stored in the EHR system as well as the
research database
• Closing the loop
o eCRF data enriches the EHR
o Helps the clinician
o Adds value to the EHR system
• Data does not go to waste!
Trust in the LHS
• Research community is struggling to ensure transparency
and correctness of published research
• Reasons complex and interleaving (positive bias, intractable
analysis, deluge of journals)
• Bayer Healthcare team published work showing that only
25% of the academic studies they examined could be
replicated
o Prinz et al. Nat. Rev. Drug Discov. 10, 712, 2011
• Of 53 oncology studies from 2001-2011, each highlighting
big new apparent advances in the field, only 11% (6) could
be robustly replicated.
o Begley & Ellis Nature 483, 531–533, 2012
Trust in the LHS (cont.)
• The problem is by no means restricted to preclinical studies
• Twelve randomised clinical trials tested 52 observational claims and
failed to reproduce a single one
o Young SS, Karr A. Deming, data and observational studies. Significance Sep
2011; 8(3):116–120
• Replication of 100 experiments published in 2008 in three high-ranking
psychology journals: fewer than half of the findings replicated
o Estimating the reproducibility of psychological science. Science Aug
2015;349(6251)
• Random sample of 441 biomedical journal articles 2000 – 2014: none
made all their data available, one provided full protocol, majority did
not disclose funding or conflicts of interest
o Iqbal et al. Reproducible Research Practices and Transparency across the
Biomedical Literature. PLoS biology 2016; 14(1)
• Cost of irreproducible research in the life sciences is estimated at $28
billion per year in the US
o Freedman LP, Cockburn IM, Simcoe TS. The Economics of Reproducibility in
Preclinical Research. PLOS Biology Jun 2015; 13(6)
Data in the Learning Health System
• Each component in the healthcare system produces and consumes data:
o Epidemiological research using record linkages
o Research data embedded in the EHR
o Decision support for diagnosis
• Provenance infrastructure required to support all these domains
• Specific research data
o Clinical trials
o Controlled populations
o Well-defined questions
• Routinely collected data
o EHR systems
o Wide coverage
o Vast quantity
o May lack detail and quality
• Actionable data
o Distilled scientific findings
o Usable in clinical practice
o Decision support
TRANSFoRm project
• €7.5M European Commission 2010-2015
• Funded under the Patient Safety Work Program of FP7
• Developing methods, models, services, validated
architectures and demonstrations to support:
o Epidemiological research using GP records, including genotype-
phenotype studies and other record linkages
o Clinical trials embedded in the EHR
o Decision support for diagnosis
www.transformproject.eu
TRANSFoRm software landscape
• RCT tools (Electronic Data Collection)
• Epidemiological study tools (Data queries)
• Diagnostic support tools
• Middleware
• Secure data transport
• Authentication framework
• Data source connectivity module
• Provenance framework
• Vocabulary service
Use case 1: Type 2 Diabetes
• Research Question: In type 2 diabetic
patients, are selected single nucleotide
polymorphisms (SNPs) associated with
variations in drug response to oral antidiabetic
drugs (Sulfonylurea)?
• Design: Case-control study
• Data: primary care databases (phenotype
data) pre-linked to genomic databases
(genetic risk factors) – data federation
Use case 2: Gastro-oesophageal reflux disease (GORD)
• Research Question: What gives the best symptom relief
and improvement in Quality of Life: continuous or on
demand Proton Pump Inhibitor use?
• Design: Randomised Controlled Trial (RCT)
• Data: Collection through EHR & web-based questionnaire:
electronic case report forms AND mobile Patient Reported
Outcome Measures
• Provenance and security
Use case 3: Diagnostic Decision Support
• Early diagnostic suggestions for presenting problems:
• chest pain
• abdominal pain
• shortness of breath
• Clinical Prediction Rule web service (with underlying
ontology)
• Prototype Decision Support System integrated with a
commercial electronic health record system
• Vision by InPractice Systems
Provenance challenge for TRANSFoRm
• Viable methods for adoption in a heterogeneous
software environment
o No shared workflow middleware to rely on
• Need to achieve domain specificity
• Able to demonstrate conformance to standards
o Title 21 of the Code of Federal Regulations; Electronic
Records; Electronic Signatures (21 CFR Part 11)
o Good Clinical Practice (GCP)
o EudraLex Vol. 4 Annex 11: Computerised Systems in EU
o CONSORT, STROBE, RECORD
Semantic annotations
• Semantic concepts in the provenance graph defined using
TRANSFoRm ontologies:
o Clinical Research Information Model (CRIM)
o Software infrastructure ontology
o Clinical evidence ontology
• Ontology concept annotations on provenance nodes
• Provenance templates define domain actions that map to
provenance fragments
• Ontologies shown: PCROM (UML Model), Randomised Clinical Trial Ontology (RCTO), Randomised Clinical Trial Provenance Ontology (RCTPO)
Provenance templates
• Components shown: existing tools, provenance server, provenance database
• Links shown: API service calls; OPM graphs annotated with OWL
1. Tools are agnostic to provenance representation
2. Service invocation matches some provenance template in the provenance server
3. Template is instantiated into a provenance graph fragment with OWL concept annotations
4. Graphs are merged inside the database
Example: Provenance of diagnostic recommendation
Provenance security
• Use a single provenance graph for:
o Full trial audit
o Reporting studies
o Publication review
o Collaborators
o Readers
• Need to abstract parts of the graph
• Access control and view generation for provenance graphs
o Danger R, Curcin V, Missier P, Bryans J. Future Generation Computer
Systems, Volume 49, August 2015, Pages 8-27
Basic idea
• The aim of an access control strategy is not only to
determine whether the resource can be viewed, but
also to construct a view of the graph which satisfies the
security constraints
• The goal is to retain the maximum amount of
information
• NB Based on TRANSFoRm use cases but not
implemented in the live system
Access control
• Ensuring that a principal (person, process, etc.) can
only access the services or data in a system that
they are authorized to access
• Implemented through security policies that try to
enforce a certain protection goal such as to prevent
unauthorized disclosure (secrecy) and intentional or
accidental unauthorized changes (integrity)
• Authorizations for some resource can be:
o Positive (allow)
o Negative (deny)
Access control (cont.)
• Two classical approaches:
o Closed policy
• deny-by-default
• Access to a resource is only granted if a corresponding positive
authorization policy exists
o Open policy
• Permit-by-default
• Access is granted unless a corresponding negative authorization policy exists.
• Combined approach used to support policy exceptions
• Conflict resolution needed if multiple policies apply,
e.g.
o denials-take-precedence
o most-specific-takes-precedence
o priority levels
o time-dependent access.
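The closed and open policies and the denials-take-precedence rule above can be illustrated with a minimal sketch. It is not taken from the slides: the function name, the tuple layout of an authorization and the example principals are all assumptions made for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (principal, resource, sign), sign is "allow" or "deny".
Authorization = Tuple[str, str, str]


def is_permitted(principal: str, resource: str,
                 authorizations: Iterable[Authorization],
                 default: str = "deny") -> bool:
    """Decide access, resolving conflicts with denials-take-precedence.

    default="deny" models a closed policy, default="allow" an open policy.
    """
    signs = {sign for who, what, sign in authorizations
             if who == principal and what == resource}
    if "deny" in signs:        # any matching denial wins the conflict
        return False
    if "allow" in signs:       # otherwise a matching positive authorization grants access
        return True
    return default == "allow"  # nothing matched: fall back to the policy default


auths = [
    ("gp_smith", "ehr/42", "allow"),
    ("gp_smith", "ehr/42", "deny"),   # conflicts with the line above
    ("auditor", "ehr/42", "allow"),
]
print(is_permitted("auditor", "ehr/42", auths))
print(is_permitted("gp_smith", "ehr/42", auths))
print(is_permitted("patient", "ehr/42", auths, default="allow"))
```

Swapping the order of the two `if` tests would give a permit-takes-precedence variant; the other resolution strategies listed (most-specific, priority levels, time-dependent) would need extra fields on each authorization.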
Access control languages for provenance
• Qun Ni et al.
o Semantic description of subjects (user roles) and resources
to be accessed
o Conditions under which restrictions are applied
o Four different types of access permissions
• Cadenhead et al
o Added regular expressions for resource and condition
descriptions
• Transformation-oriented Access Control Language for
Provenance (TACLP)
o Allows users to define subgraphs to be transformed, with
three levels of abstraction (hide, minimum
and maximum).
Indirect relations
• Introduce some new relations to be used in
abstraction
External effects and causes
• External effects and causes of the set of nodes S
w.r.t. a set of nodes R
o Set of nodes that represent the immediate
effects/causes of S that would be affected by removal of
nodes in R from the graph V (𝑆 ⊆ 𝑅 ⊆ 𝑉)
o If S=R, then denote as ef(R) and ca(R)
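One possible reading of the definition above can be sketched in a few lines. The assumptions are loud ones: an edge is stored as an (effect, cause) pair meaning "effect depends on cause", and "external" is taken to mean "immediate neighbour lying outside R". The formal definition in the cited paper may differ, and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumption: an edge (effect, cause) means "effect depends on cause".
Edge = Tuple[str, str]


def external_causes(edges: Iterable[Edge], s: Set[str], r: Set[str]) -> Set[str]:
    """Nodes outside R that some node in S directly depends on."""
    return {cause for effect, cause in edges if effect in s and cause not in r}


def external_effects(edges: Iterable[Edge], s: Set[str], r: Set[str]) -> Set[str]:
    """Nodes outside R that directly depend on some node in S."""
    return {effect for effect, cause in edges if cause in s and effect not in r}


# report <- analysis <- cleaned <- raw, with the two middle nodes to be abstracted.
edges = [("report", "analysis"), ("analysis", "cleaned"), ("cleaned", "raw")]
r = {"analysis", "cleaned"}

print(external_causes(edges, {"cleaned"}, r))    # causes of S={cleaned} w.r.t. R
print(external_effects(edges, {"analysis"}, r))  # effects of S={analysis} w.r.t. R
print(external_causes(edges, r, r), external_effects(edges, r, r))  # ca(R), ef(R)
```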
Basic operations
• Node removal
o Subgraph needs to be hidden
o e.g. if it is unnecessary for an analysis or user access to it
has been restricted.
• Node replacement
o removing details of data and operations in a subgraph
while retaining some information (abstract entity) of the
existence of such subgraph.
Operation: node removal
• Let Prov = (V , E , type) and R ⊆ V be a set of nodes to be
removed. Result is a new provenance graph Prov′
=(V′,E′,type′), such that:
Operation: node replacement
• As before, with operation AR replacing node set R
with node va
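The formal definition of the replacement operation is not reproduced in this transcript, so the following is only a sketch of the idea: collapse the node set R into a single abstract node and redirect every edge that crossed the boundary of R. Edges are assumed to be (effect, cause) pairs, the function name is hypothetical, and no distinction is made here between direct, indirect and soft dependencies.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # assumed layout: (effect, cause)


def replace_nodes(nodes: Iterable[str], edges: Iterable[Edge],
                  r: Set[str], abstract_node: str):
    """Replace node set R by one abstract node, redirecting boundary edges."""
    new_nodes = (set(nodes) - r) | {abstract_node}
    new_edges = set()
    for effect, cause in edges:
        a = abstract_node if effect in r else effect
        b = abstract_node if cause in r else cause
        if a != b:  # edges internal to R collapse onto the abstract node and are dropped
            new_edges.add((a, b))
    return new_nodes, new_edges


nodes = ["report", "analysis", "cleaned", "raw"]
edges = [("report", "analysis"), ("analysis", "cleaned"), ("cleaned", "raw")]

new_nodes, new_edges = replace_nodes(nodes, edges, {"analysis", "cleaned"}, "va")
print(sorted(new_nodes))
print(sorted(new_edges))
```

In this example the chain report, analysis, cleaned, raw becomes report, va, raw: the existence of intermediate processing stays visible while its details are hidden.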
Abstract nodes and edges
• Dummy nodes introduced during entity
replacement
• Preserve the causality of the rest of the graph
• Two types of dependencies:
o Indirect
• Denoted with double lines
• Represent multi-step dependencies (wdf+, u+, wgb+, wtb+: transitive closures of the OPM relations wasDerivedFrom, used, wasGeneratedBy, wasTriggeredBy)
o Soft dependencies
• Denoted with double dashed lines
• Generic transitive relationship which is not one of the above
Removal and Replacement
Replace (A,B)
Remove (A,B)
False dependencies
• False dependencies introduce a previously non-
existent path in the new graph, e.g. removing A, B
Causality preserving transformation
• A transformation is called causality preserving if it
does not introduce false dependencies.
• Given a provenance graph and a set of entities to be
abstracted/hidden, the question is how these
entities can be joined or removed from the graph using
only causality-preserving transformations
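A false dependency can be detected mechanically by comparing reachability before and after a transformation. The sketch below assumes edges are (effect, cause) pairs and that a false dependency is a pair of surviving nodes connected by a path only in the transformed graph; the helper names are hypothetical and this is not the procedure from the cited paper.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # assumed layout: (effect, cause)


def reachable(edges: List[Edge], start: str) -> Set[str]:
    """All nodes that `start` transitively depends on."""
    succ: Dict[str, Set[str]] = {}
    for effect, cause in edges:
        succ.setdefault(effect, set()).add(cause)
    seen: Set[str] = set()
    stack = [start]
    while stack:
        for nxt in succ.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def false_dependencies(old_edges: List[Edge], new_edges: List[Edge],
                       kept: Set[str]) -> Set[Tuple[str, str]]:
    """Pairs (x, y) of kept nodes where x reaches y only in the new graph."""
    return {(x, y)
            for x in kept
            for y in reachable(new_edges, x) & kept
            if y not in reachable(old_edges, x)}


# A and B are unrelated: e1 depends on A, and B depends on c2.
old = [("e1", "A"), ("B", "c2")]
# Merging A and B into one abstract node va links e1 to c2.
merged = [("e1", "va"), ("va", "c2")]
print(false_dependencies(old, merged, {"e1", "c2"}))

# If A already depended on B, the same merge adds no new path.
old_chain = [("e1", "A"), ("A", "B"), ("B", "c2")]
print(false_dependencies(old_chain, merged, {"e1", "c2"}))
```

A transformation for which `false_dependencies` returns the empty set is causality preserving in the sense used above.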
Causality preserving partition and transformation
• Given a set of nodes R ⊆ V, a causality preserving
partition ℘ of R is such that removing or replacing
any set of nodes 𝑃 ∈ ℘ will not introduce false causal
dependencies
• A graph transformation by partition ℘ of R is then a
sequential application of Rem_P or Rep_P
• The necessary and sufficient condition for such
transformation to be causality preserving is that for
each 𝑃 ∈ ℘ all of P’s external causes and effects are
connected
Optimal causality preserving partition
• Default partition of R consists of singletons, i.e.
each node in R is a set in the partition.
• Optimal partition is such that no two of its sets have
the same sets of external causes and effects w.r.t. R
• Partitioning algorithm
o Step 1, determine external causes and effects for default
partition
o Step 2, gradually merge the partitions until optimal.
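The two steps above can be sketched as a pairwise merge loop. This is an illustration under assumptions rather than the published algorithm: edges are (effect, cause) pairs, external causes and effects are taken to be immediate neighbours outside R, and two sets are merged whenever those neighbour sets coincide.

```python
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]  # assumed layout: (effect, cause)


def signature(edges: List[Edge], block: Set[str],
              r: Set[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """External causes and effects of a block w.r.t. R (immediate neighbours outside R)."""
    causes = frozenset(c for e, c in edges if e in block and c not in r)
    effects = frozenset(e for e, c in edges if c in block and e not in r)
    return causes, effects


def partition(edges: List[Edge], r: Set[str]) -> List[Set[str]]:
    blocks = [{n} for n in sorted(r)]  # Step 1: default partition of singletons
    merged = True
    while merged:                      # Step 2: merge until no two blocks match
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if signature(edges, blocks[i], r) == signature(edges, blocks[j], r):
                    blocks[i] |= blocks.pop(j)
                    merged = True
                    break
            if merged:
                break
    return blocks


# a and b share the external cause "in" and the external effect "out"; c does not.
edges = [("out", "a"), ("out", "b"), ("a", "in"), ("b", "in"), ("c", "other")]
result = partition(edges, {"a", "b", "c"})
print(sorted(sorted(block) for block in result))
```

The nested loops over pairs of blocks are what make this kind of approach quadratic in the number of nodes to abstract.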
Provenance graph transformation algorithm
• Once the partition is computed, the
transformations are iteratively applied to each
element in the partition
• Labels input provides names for generated abstract
nodes
• Levels input provides abstraction level for each
partition
o Hide
• remove operation
o Minimum abstraction, maximum abstraction
• replace operation
• isolated singletons removed as a special case.
Computational efficiency
• Transformation algorithm performance depends on
the performance of the partition algorithm
• The other steps are linear in the cardinality of the
partition ℘ and its edges
• The partition algorithm considers pair-wise
combinations of nodes.
• Overall complexity is O(|R|²), where R is the set of
nodes to abstract
Experimental results
• Provenance view transformation algorithm was
implemented in Python 2.7 using the NetworkX API.
• Experiments were executed on Ubuntu 12.04, Intel
Core i7-3687U CPU at 2.10 GHz with 8 GB RAM
• Synthetic provenance graphs used, randomly
generating edges for each node within the degree
range 2-10
• Two parameters:
o the percentage of nodes to abstract (from 5 to 25 with a step
of 5)
o the percentage of nodes to abstract which are causally
dependent (from 0 to 100 with a step of 25)
• Each configuration was executed 10 times and the plots
presented show the averages of these executions.
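The slides do not say how the synthetic graphs were built beyond the degree range, so the generator below is only a guess at a comparable set-up. It uses the standard library rather than NetworkX, keeps the graph acyclic by linking each node only to earlier nodes (an assumption), and does not model the second parameter (the share of abstracted nodes that are causally dependent). All names are hypothetical.

```python
import random
from typing import Set, Tuple


def synthetic_provenance_graph(n_nodes: int, min_deg: int = 2, max_deg: int = 10,
                               seed: int = 0) -> Set[Tuple[int, int]]:
    """Random edges (effect, cause); each node points to 2-10 earlier nodes."""
    rng = random.Random(seed)
    edges = set()
    for node in range(1, n_nodes):
        # Only link to earlier nodes so the graph stays acyclic; the first few
        # nodes have fewer candidates than the requested degree.
        k = min(rng.randint(min_deg, max_deg), node)
        for cause in rng.sample(range(node), k):
            edges.add((node, cause))
    return edges


def pick_nodes_to_abstract(n_nodes: int, percent: int, seed: int = 0) -> Set[int]:
    """Choose the given percentage of nodes uniformly at random."""
    rng = random.Random(seed)
    return set(rng.sample(range(n_nodes), n_nodes * percent // 100))


graph = synthetic_provenance_graph(200)
to_abstract = pick_nodes_to_abstract(200, 5)
print(len(graph), len(to_abstract))
```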
Performance behaviour
• Execution time (Y) in seconds as a function of the
number of nodes (X) and the percentage of nodes
to abstract (Z)
• Quadratic time
Use case: Access to health data
• Access control for the provenance data collected from an
Electronic Health Record (EHR) and clinical trial systems
• Rules:
o Auditors. Healthcare system auditors or law enforcement agencies can
access the whole provenance graph during the auditing process.
o Family doctors and patients. Electronic health records and their data
provenance can only be accessed by patients during weekends, and by
family doctors (FDs) during weekdays.
o Active FDs. Active FDs have access to the EHRs of their patients and
the associated provenance data.
o Clinical trial 1. If some data comes from a clinical trial, the FD needs to
be a participant in the trial to see the subgraph associated with that trial.
o Clinical trial 2. Patients do not have access to clinical trial processes.
o Laboratory. Patients do not have access to laboratory processes.
o Automatic diagnosis recommendation. Patients have no access to any
information related to the automatic diagnosis recommendation, nor to
the graph segment connecting it with the clinical evidence.
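The weekday and weekend rule in the list above is an example of a time-dependent condition. A minimal sketch of how such a check might look follows; the role names and the function are hypothetical, and the other rules (auditors, trial membership, laboratory processes) are deliberately left out.

```python
from datetime import date


def may_access_ehr(role: str, day: date) -> bool:
    """Patients on weekends, family doctors on weekdays; other roles not covered here."""
    weekend = day.weekday() >= 5  # Monday is 0, so 5 and 6 are Saturday and Sunday
    if role == "patient":
        return weekend
    if role == "family_doctor":
        return not weekend
    return False


saturday = date(2016, 1, 2)
monday = date(2016, 1, 4)
print(may_access_ehr("patient", saturday), may_access_ehr("family_doctor", saturday))
print(may_access_ehr("patient", monday), may_access_ehr("family_doctor", monday))
```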
TACLP
• Transformation-oriented Access Control Language
for Provenance (TACLP)
• Extends the works of Ni and Cadenhead by
introducing transformations
• A policy consists of:
o Target
o Effect
o Transformation
o Condition (optional)
o Obligation (optional)
TACLP Target
• Subject element
o Set of users to which the policy should be
applied, expressed through IRI references
• Record element
o Set of resources to which the policy should be applied,
expressed through IRI references
• Restriction element (optional)
o A conditional expression under which the policy is applied
o Either a relational comparison between a value in a property
path and a literal, or a full logical expression.
• Scope element (optional)
o Whether the policy is ‘transferable’ or ‘non-transferable’ with
respect to subjects
o Whether it applies to all the ancestors of matched elements
in the graph, or to the matched elements only.
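The parts of a TACLP policy and of its target can be pictured as plain data records. The sketch below is an illustration only: TACLP's concrete syntax is not shown in the slides, so the field names, the default values and the example IRIs are assumptions, while the effect and abstraction-level strings are the ones the slides name for the language.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    subjects: List[str]                # IRI references identifying users
    records: List[str]                 # IRI references identifying resources
    restriction: Optional[str] = None  # conditional expression, kept as text here
    transferable: bool = False         # scope: transferable w.r.t. subjects (default assumed)
    include_ancestors: bool = False    # scope: also apply to ancestors of matched elements


@dataclass
class Policy:
    target: Target
    effect: str          # "absolute permit", "deny", "necessary permit" or "permit"
    transformation: str  # "hide", "minimum abstraction" or "maximum abstraction"
    condition: Optional[str] = None
    obligation: Optional[str] = None


# Hypothetical policy: patients may not see laboratory processes.
no_lab_for_patients = Policy(
    target=Target(subjects=["http://example.org/role/Patient"],
                  records=["http://example.org/process/Laboratory"]),
    effect="deny",
    transformation="hide",
)
print(no_lab_for_patients.effect, no_lab_for_patients.target.include_ancestors)
```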
TACLP Effect
• Specifies the intended outcome
• Four possibilities:
o Absolute permit guarantees access to the graph regardless
of the effect of other policies
• e.g. for allowing access to auditors or law enforcement agencies, and
avoids the need for additional conditions in deny policies
o Deny guarantees that certain parts of the graph will not be
accessed by users in the subject element.
o Necessary permit is used to describe the necessary, but not
always sufficient, conditions for accessing certain parts of the
graph
o Permit is used to describe those parts of the graph that can
be accessed if there are no other policies denying access to
them.
TACLP Transformation
• How to transform the provenance graph in order to
hide certain resources
• Specification of which nodes need to be hidden and
Removal/Replace operations to be applied to them
• Set of policies comprising
o Policy type (target, record, condition, effect,
transformation element and obligation)
o Policy evaluation type (deny-takes-precedence or
permit-takes-precedence)
TACLP Transformation (cont.)
• Abstraction level
o Hide
• matched nodes of the subgraph have to be completely hidden
(removed) from the graph
• Remove transformation is applied;
o Minimum abstraction
• Replace transformation is applied
• No caused-by relationship (soft dependencies) will appear in
the transformed graph.
o Maximum abstraction
• Replace transformation is applied
• Soft dependencies can appear in the transformed graph.
Access control evaluation algorithms
• Aim to produce an abstracted graph that satisfies
the constraints
• Deny-takes-precedence
1. Absolute permit policies evaluated first
2. Necessary permit and deny policies
3. Permit policies
• Permit-takes-precedence
1. Absolute permit evaluated first
2. Necessary permit policies
3. Permit policies
4. Deny policies
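The two evaluation orders above differ only in when deny policies are considered. The sketch below captures just that ordering; it does not show how each phase transforms the graph, and the policy names and dictionary layout are hypothetical.

```python
from typing import Dict, List, Set

# Phases in the order listed on the slide; policies in one phase are evaluated together.
DENY_TAKES_PRECEDENCE: List[Set[str]] = [
    {"absolute permit"}, {"necessary permit", "deny"}, {"permit"},
]
PERMIT_TAKES_PRECEDENCE: List[Set[str]] = [
    {"absolute permit"}, {"necessary permit"}, {"permit"}, {"deny"},
]


def evaluation_order(policies: List[Dict[str, str]],
                     phases: List[Set[str]]) -> List[str]:
    """Return policy names in the order the given phases prescribe."""
    ordered = []
    for phase in phases:
        ordered.extend(p["name"] for p in policies if p["effect"] in phase)
    return ordered


policies = [
    {"name": "patients-no-lab", "effect": "deny"},
    {"name": "fd-own-patients", "effect": "permit"},
    {"name": "auditors", "effect": "absolute permit"},
    {"name": "trial-member", "effect": "necessary permit"},
]
print(evaluation_order(policies, DENY_TAKES_PRECEDENCE))
print(evaluation_order(policies, PERMIT_TAKES_PRECEDENCE))
```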
Example: Source provenance graph
Example: Abstracted provenance graph
Summary
• The Learning Health System presents a new set of
challenges for medical and informatics communities
• Provenance can help establish trust in the LHS
• Methods needed to verify trust
• Abstraction of provenance traces needed to address
requirements of multiple stakeholders
o Researchers
o Regulators
o Publishers
• Future work
o Projects running on provenance of decision support and
visual analytics for health data
o Looking for partnerships to investigate applications of the
security work
Acknowledgements
• Thanks to:
o Roxana Danger
o Paolo Missier
o Jeremy Bryans
o Derek Corrigan
o Brendan Delaney
Questions?
Thank you!

Mais conteúdo relacionado

Mais procurados

AI applications in life sciences - drug development
AI applications in life sciences - drug developmentAI applications in life sciences - drug development
AI applications in life sciences - drug developmentJayanthi Repalli, PhD
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-finalPeter Embi
 
2016 CRI Year-in-Review
2016 CRI Year-in-Review2016 CRI Year-in-Review
2016 CRI Year-in-ReviewPeter Embi
 
Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014Peter Embi
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-finalPeter Embi
 
Open Data in Medicine. Application of Mind Maping automation to visualize inf...
Open Data in Medicine. Application of Mind Maping automation to visualize inf...Open Data in Medicine. Application of Mind Maping automation to visualize inf...
Open Data in Medicine. Application of Mind Maping automation to visualize inf...José M. Guerrero
 
Research Data Management for Clinical Trials and Quality Improvement
Research Data Management for Clinical Trials and Quality ImprovementResearch Data Management for Clinical Trials and Quality Improvement
Research Data Management for Clinical Trials and Quality ImprovementMargaret Henderson
 
Health advances ai in diagnostic development
Health advances ai in diagnostic developmentHealth advances ai in diagnostic development
Health advances ai in diagnostic developmentHealth Advances
 
The Future of Personalized Medicine
The Future of Personalized MedicineThe Future of Personalized Medicine
The Future of Personalized MedicineEdgewater
 
AMIA 2015 CRI Year-in-Review
AMIA 2015 CRI Year-in-ReviewAMIA 2015 CRI Year-in-Review
AMIA 2015 CRI Year-in-ReviewPeter Embi
 
AllTrials AAAS 2015 - Access to anonymised patient level data
AllTrials AAAS 2015 - Access to anonymised patient level dataAllTrials AAAS 2015 - Access to anonymised patient level data
AllTrials AAAS 2015 - Access to anonymised patient level dataSenseAboutSci
 
Informatics and Clinical Decision Support in Precision Medicine
Informatics and Clinical Decision Support in Precision MedicineInformatics and Clinical Decision Support in Precision Medicine
Informatics and Clinical Decision Support in Precision MedicineAndre Dekker
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the futurePistoia Alliance
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineJoel Saltz
 
A Roadmap for SAS Programmers to Clinical Statistical Programming
A Roadmap for SAS Programmers to Clinical Statistical ProgrammingA Roadmap for SAS Programmers to Clinical Statistical Programming
A Roadmap for SAS Programmers to Clinical Statistical ProgrammingMohammad Majharul Alam
 
NS1450X - Computerized Systems in Clinical Research
NS1450X - Computerized Systems in Clinical ResearchNS1450X - Computerized Systems in Clinical Research
NS1450X - Computerized Systems in Clinical ResearchJudson Chase
 
Natasha Curry: Risk Prediction in England
Natasha Curry: Risk Prediction in EnglandNatasha Curry: Risk Prediction in England
Natasha Curry: Risk Prediction in EnglandNuffield Trust
 
Wsdanjohncleland
WsdanjohnclelandWsdanjohncleland
Wsdanjohncleland3GDR
 

Mais procurados (20)

AI applications in life sciences - drug development
AI applications in life sciences - drug developmentAI applications in life sciences - drug development
AI applications in life sciences - drug development
 
Peter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-ReviewPeter Embi's 2011 AMIA CRI Year-in-Review
Peter Embi's 2011 AMIA CRI Year-in-Review
 
Embi cri review-2012-final
Embi cri review-2012-finalEmbi cri review-2012-final
Embi cri review-2012-final
 
2016 CRI Year-in-Review
2016 CRI Year-in-Review2016 CRI Year-in-Review
2016 CRI Year-in-Review
 
Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014Clinical Research Informatics (CRI) Year-in-Review 2014
Clinical Research Informatics (CRI) Year-in-Review 2014
 
Embi cri review-2013-final
Embi cri review-2013-finalEmbi cri review-2013-final
Embi cri review-2013-final
 
Open Data in Medicine. Application of Mind Maping automation to visualize inf...
Open Data in Medicine. Application of Mind Maping automation to visualize inf...Open Data in Medicine. Application of Mind Maping automation to visualize inf...
Open Data in Medicine. Application of Mind Maping automation to visualize inf...
 
Research Data Management for Clinical Trials and Quality Improvement
Research Data Management for Clinical Trials and Quality ImprovementResearch Data Management for Clinical Trials and Quality Improvement
Research Data Management for Clinical Trials and Quality Improvement
 
Health advances ai in diagnostic development
Health advances ai in diagnostic developmentHealth advances ai in diagnostic development
Health advances ai in diagnostic development
 
The Future of Personalized Medicine
The Future of Personalized MedicineThe Future of Personalized Medicine
The Future of Personalized Medicine
 
AMIA 2015 CRI Year-in-Review
AMIA 2015 CRI Year-in-ReviewAMIA 2015 CRI Year-in-Review
AMIA 2015 CRI Year-in-Review
 
AllTrials AAAS 2015 - Access to anonymised patient level data
AllTrials AAAS 2015 - Access to anonymised patient level dataAllTrials AAAS 2015 - Access to anonymised patient level data
AllTrials AAAS 2015 - Access to anonymised patient level data
 
Informatics and Clinical Decision Support in Precision Medicine
Informatics and Clinical Decision Support in Precision MedicineInformatics and Clinical Decision Support in Precision Medicine
Informatics and Clinical Decision Support in Precision Medicine
 
Data for AI models, the past, the present, the future
Data for AI models, the past, the present, the futureData for AI models, the past, the present, the future
Data for AI models, the past, the present, the future
 
Introduction to clinical sas
Introduction to clinical sasIntroduction to clinical sas
Introduction to clinical sas
 
Digital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision MedicineDigital Pathology, FDA Approval and Precision Medicine
Digital Pathology, FDA Approval and Precision Medicine
 
A Roadmap for SAS Programmers to Clinical Statistical Programming
A Roadmap for SAS Programmers to Clinical Statistical ProgrammingA Roadmap for SAS Programmers to Clinical Statistical Programming
A Roadmap for SAS Programmers to Clinical Statistical Programming
 
NS1450X - Computerized Systems in Clinical Research
NS1450X - Computerized Systems in Clinical ResearchNS1450X - Computerized Systems in Clinical Research
NS1450X - Computerized Systems in Clinical Research
 
Natasha Curry: Risk Prediction in England
Natasha Curry: Risk Prediction in EnglandNatasha Curry: Risk Prediction in England
Natasha Curry: Risk Prediction in England
 
Wsdanjohncleland
WsdanjohnclelandWsdanjohncleland
Wsdanjohncleland
 

Semelhante a Provenance abstraction for implementing security: Learning Health System and securing provenance of health data

Brendan Delany – Chair in Medical Informatics and Decision Making, Imperial...
  Brendan Delany – Chair in Medical Informatics and Decision Making, Imperial...  Brendan Delany – Chair in Medical Informatics and Decision Making, Imperial...
Brendan Delany – Chair in Medical Informatics and Decision Making, Imperial...HIMSS UK
 
Computer System Validation - privacy zones, eSource and EHR data in clinical ...
Computer System Validation - privacy zones, eSource and EHR data in clinical ...Computer System Validation - privacy zones, eSource and EHR data in clinical ...
Computer System Validation - privacy zones, eSource and EHR data in clinical ...Wolfgang Kuchinke
 
Computer System Validation with privacy zones, e-source and clinical trials b...
Computer System Validation with privacy zones, e-source and clinical trials b...Computer System Validation with privacy zones, e-source and clinical trials b...
Computer System Validation with privacy zones, e-source and clinical trials b...Wolfgang Kuchinke
 
Computer validation of e-source and EHR in clinical trials-Kuchinke
Computer validation of e-source and EHR in clinical trials-KuchinkeComputer validation of e-source and EHR in clinical trials-Kuchinke
Computer validation of e-source and EHR in clinical trials-KuchinkeWolfgang Kuchinke
 
Combining Patient Records, Genomic Data and Environmental Data to Enable Tran...
Combining Patient Records, Genomic Data and Environmental Data to Enable Tran...Combining Patient Records, Genomic Data and Environmental Data to Enable Tran...
Combining Patient Records, Genomic Data and Environmental Data to Enable Tran...Perficient, Inc.
 
The Learning Health System: Thinking and Acting Across Scales
The Learning Health System: Thinking and Acting Across ScalesThe Learning Health System: Thinking and Acting Across Scales
The Learning Health System: Thinking and Acting Across ScalesPhilip Payne
 
The Continuous Update Project: Novel approach to reviewing mechanistic evide...
 The Continuous Update Project: Novel approach to reviewing mechanistic evide... The Continuous Update Project: Novel approach to reviewing mechanistic evide...
The Continuous Update Project: Novel approach to reviewing mechanistic evide...World Cancer Research Fund International
 
Simplifying semantics for biomedical applications
Simplifying semantics for biomedical applicationsSimplifying semantics for biomedical applications
Simplifying semantics for biomedical applicationsSemantic Web San Diego
 
Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...Next generation electronic medical records and search a test implementation i...
Next generation electronic medical records and search a test implementation i...lucenerevolution
 
Safti net kick off 12092010-mrs
Safti net kick off 12092010-mrsSafti net kick off 12092010-mrs
Safti net kick off 12092010-mrsMarion Sills
 
The Human Variome Database in Australia in 2014 - Graham Taylor
The Human Variome Database in Australia in 2014 - Graham TaylorThe Human Variome Database in Australia in 2014 - Graham Taylor
The Human Variome Database in Australia in 2014 - Graham TaylorHuman Variome Project
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Amit Sheth
 
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdf
2022-11-23 DTL Future of data-driven life sciences, Utrecht, Alain van Gool.pdfAlain van Gool
 
Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsEnabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsLuis Marco Ruiz
 
Enabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsEnabling Clinical Data Reuse with openEHR Data Warehouse Environments
Enabling Clinical Data Reuse with openEHR Data Warehouse EnvironmentsLuis Marco Ruiz
 
Evolution 2013: Prof. Dr. Georges De Moor, EuroRec on Liberating Health Data ...
Evolution 2013: Prof. Dr. Georges De Moor, EuroRec on Liberating Health Data ...Evolution 2013: Prof. Dr. Georges De Moor, EuroRec on Liberating Health Data ...
Evolution 2013: Prof. Dr. Georges De Moor, EuroRec on Liberating Health Data ...Life Sciences Network marcus evans
 
Grand round whsiao_may2015
Grand round whsiao_may2015Grand round whsiao_may2015
Grand round whsiao_may2015IRIDA_community
 

Provenance abstraction for implementing security: Learning Health System and securing provenance of health data

  • 1. Provenance abstraction for implementing security policies Learning Health System and securing provenance of health data Dr Vasa Curcin King’s College London
  • 2. Overview • Learning Health System • LHS requirements for provenance data • TRANSFoRm project • Transformation-oriented Access Control Language for Provenance (TACLP)
  • 3. Learning Health System “ ... one in which progress in science, informatics, and care culture align to generate new knowledge as an ongoing, natural by-product of the care experience, and seamlessly refine and deliver best practices for continuous improvement in health and health care.” (Institute of Medicine) We can’t afford to waste data!
  • 4. A Learning Health System for the Nation [Figure: pharmaceutical firm, Beacon community, integrated delivery system, community practice, health information organization, health center network, federal agencies and state public health, linked through governance, patient engagement, trust, analysis and dissemination] Learning Health System Defining functions of a LHS are to: 1. routinely and securely aggregate data from disparate sources 2. convert the data to knowledge 3. disseminate that knowledge, in actionable forms, to everyone who can benefit from it. c/o C. Friedman
  • 5. Learning Health System take-up • US medical/academic centres o Mayo, Duke, Vanderbilt o PCORI • National data aggregators o Clinical Practice Research Datalink o NIVEL • EHR vendors o CSC, Asseco, TPP, InPractice Systems • European academic-industrial collaborations o TRANSFoRm, EHR4CR, Semantic HealthNet …and Bill
  • 6. Example: Clinical trial challenges • Major motivation for the LHS work • Trials too expensive and difficult to run • Efficacy-effectiveness gap (EEG) o Disconnect between outcomes from clinical trials and information needed for clinical practice o Interaction of drug effect and real-life contextual factors o Challenge to identify contextual factors • LHS provides context and workflow
  • 7. LHS for Clinical Trials • EHR integration o Eligibility checking done automatically from EHR data o eCRFs partially filled based on EHR information o All collected data stored in the EHR system as well as the research database • Closing the loop o eCRF data enriches the EHR o Helps the clinician o Adds value to the EHR system • Data does not go to waste!
  • 8. Trust in the LHS • Research community is struggling to ensure transparency and correctness of published research • Reasons are complex and interleaved (positive bias, intractable analysis, deluge of journals) • Bayer Healthcare team published work showing that only 25% of the academic studies they examined could be replicated o Prinz et al. Nat. Rev. Drug Discov. 10, 712, 2011 • Of 53 oncology studies from 2001-2011, each highlighting big new apparent advances in the field, only 11% (6) could be robustly replicated. o Begley & Ellis Nature 483, 531–533, 2012
  • 9. Trust in the LHS (cont.) • The problem is by no means restricted to preclinical studies • Twelve randomised clinical trials tested 52 observational claims and failed to reproduce a single one o Young SS, Karr A. Deming, data and observational studies. Significance Sep 2011; 8(3):116–120 • Replication of 100 experiments published in 2008 in three high-ranking psychology journals: fewer than half of the findings replicated o Estimating the reproducibility of psychological science. Science Aug 2015; 349(6251) • Random sample of 441 biomedical journal articles from 2000–2014: none made all their data available, one provided a full protocol, and the majority did not disclose funding or conflicts of interest o Iqbal et al. Reproducible Research Practices and Transparency across the Biomedical Literature. PLoS Biology 2016; 14(1) • Cost of irreproducible research in life science is estimated at $28 billion per year in the U.S. o Freedman LP, Cockburn IM, Simcoe TS. The Economics of Reproducibility in Preclinical Research. PLOS Biology Jun 2015; 13(6)
  • 10. Data in the Learning Health System • Each component in the healthcare system produces and consumes data: • Epidemiological research using record linkages • Research data embedded in the EHR • Decision support for diagnosis • Provenance infrastructure required to support all these domains • Specific research data: clinical trials, controlled populations, well-defined questions • Routinely collected data: EHR systems, wide coverage, vast quantity, may lack in detail and quality • Actionable data: distilled scientific findings, usable in clinical practice, decision support
  • 11. TRANSFoRm project • €7.5M European Commission 2010-2015 • Funded under the Patient Safety Work Program of FP7 • Developing methods, models, services, validated architectures and demonstrations to support: o Epidemiological research using GP records, including genotype-phenotype studies and other record linkages o Clinical trials embedded in the EHR o Decision support for diagnosis www.transformproject.eu
  • 12. TRANSFoRm software landscape [Figure: middleware, secure data transport, RCT tools (electronic data collection), epidemiological study tools (data queries), authentication framework, diagnostic support tools, data source connectivity module, provenance framework, vocabulary service]
  • 13. Use case 1: Type 2 Diabetes • Research Question: In type 2 diabetic patients, are selected single nucleotide polymorphisms (SNPs) associated with variations in drug response to oral antidiabetic drugs (Sulfonylurea)? • Design: Case-control study • Data: primary care databases (phenotype data) pre-linked to genomic databases (genetic risk factors) – data federation
  • 14. Use case 2: Gastro-oesophageal reflux disease (GORD) • Research Question: What gives the best symptom relief and improvement in Quality of Life: continuous or on demand Proton Pump Inhibitor use? • Design: Randomised Controlled Trial (RCT) • Data: Collection through EHR & web based questionnaire – electronic case report forms AND mobile Patient Related Outcome Measures • Provenance and security
  • 15. Use case 3: Diagnostic Decision Support • Early diagnostic suggestions for presenting problems: • chest pain • abdominal pain • shortness of breath • Clinical Prediction Rule web service (with underlying ontology) • Prototype Decision Support System integrated with a commercial electronic health record system • Vision by InPractice Systems
  • 16. Provenance challenge for TRANSFoRm • Viable methods for adoption in a heterogeneous software environment o No shared workflow middleware to rely on • Need to achieve domain specificity • Able to demonstrate conformance to standards o Title 21 of the Code of Federal Regulations; Electronic Records; Electronic Signatures (21 CFR Part 11) o Good Clinical Practice (GCP) o EudraLex Vol. 4 Annex 11: Computerised Systems in EU o CONSORT, STROBE, RECORD
  • 17. Semantic annotations • Semantic concepts in the provenance graph defined using TRANSFoRm ontologies: o Clinical Research Information Model (CRIM) o Software infrastructure ontology o Clinical evidence ontology • Ontology concept annotations on provenance nodes • Provenance templates define domain actions that map to provenance fragments [Figure: PCROM (UML model), Randomised Clinical Trial Ontology (RCTO), Randomised Clinical Trial Provenance Ontology (RCTPO)]
  • 18. Provenance templates 1. Tools are agnostic to provenance representation 2. Service invocation matches some provenance template in the provenance server 3. Template is instantiated into a provenance graph fragment with OWL concept annotations 4. Graphs are merged inside the database [Figure: existing tools make API service calls to the provenance server, which writes OPM graphs annotated with OWL to the provenance database]
  • 19. Example: Provenance of diagnostic recommendation
  • 20. Provenance security • Use a single provenance graph for: o Full trial audit o Reporting studies o Publication review o Collaborators o Readers • Need to abstract parts of the graph • Access control and view generation for provenance graphs o Future Generation Computer Systems, Volume 49, August 2015, Pages 8-27 Roxana Danger, Vasa Curcin, Paolo Missier, Jeremy Bryans
  • 21. Basic idea • The aim of an access control strategy is not only to determine if the resource can be viewed or not, but to construct a view of the graph which satisfies the security constraints • The goal is for maximum amount of information to be retained • NB Based on TRANSFoRm use cases but not implemented in the live system
  • 22. Access control • Ensuring that a principal (person, process, etc.) can only access the services or data in a system that they are authorized to • Implemented through security policies that try to enforce a certain protection goal such as to prevent unauthorized disclosure (secrecy) and intentional or accidental unauthorized changes (integrity) • Authorizations for some resource can be: o Positive (allow) o Negative (deny)
  • 23. Access control • Two classical approaches: o Closed policy • Deny-by-default • Access to a resource is only granted if a corresponding positive authorization policy exists o Open policy • Permit-by-default • Access is granted unless a corresponding negative authorization policy exists. • Combined approach used to support policy exceptions • Conflict resolution needed if multiple policies apply, e.g. o denials-take-precedence o most-specific-takes-precedence o priority levels o time-dependent access.
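The closed and open defaults, combined with denials-take-precedence conflict resolution, can be sketched in a few lines. This is a minimal illustration rather than anything from the slides: the function name `decide` and the string labels `"permit"`, `"deny"`, `"closed"` and `"open"` are assumptions chosen for the example.

```python
def decide(matching_authorizations, default="closed"):
    """Decide access from the authorizations that match a request.

    matching_authorizations: list of "permit" / "deny" strings (hypothetical labels).
    default: "closed" (deny-by-default) or "open" (permit-by-default).
    """
    # Conflict resolution: denials-take-precedence.
    if "deny" in matching_authorizations:
        return False
    if "permit" in matching_authorizations:
        return True
    # No policy applies: fall back to the closed or open default.
    return default == "open"


print(decide([], default="closed"))   # no matching policy, closed default
print(decide(["permit", "deny"]))     # conflicting policies
```

Other resolution strategies listed on the slide (most-specific-takes-precedence, priority levels) would replace the first two checks with a ranking over the matching authorizations.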
  • 24. Access control languages for provenance • Qin Ni et al o Semantic description of subjects (user roles) and resources to be accessed o conditions under which restrictions are applied, o four different types of access permissions. • Cadenhead et al o Added regular expressions for resource and condition descriptions • Transformation-oriented Access Control Language for Provenance (TACLP) o Allows users to define subgraphs to be transformed, with three different levels of abstractions (namely hide, minimal and maximal).
  • 25. Indirect relations • Introduce some new relations to be used in abstraction
  • 26. External effects and causes • External effects and causes of the set of nodes S w.r.t. a set of nodes R o Set of nodes that represent the immediate effects/causes of S that would be affected by removal of nodes in R from the graph V (𝑆 ⊆ 𝑅 ⊆ 𝑉) o If S=R, then denote as ef(R) and ca(R)
  • 28. Basic operations • Node removal o Subgraph needs to be hidden o e.g. if it is unnecessary for an analysis or user access to it has been restricted. • Node replacement o removing details of data and operations in a subgraph while retaining some information (abstract entity) of the existence of such subgraph.
  • 29. Operation: node removal • Let Prov = (V , E , type) and R ⊆ V be a set of nodes to be removed. Result is a new provenance graph Prov′ =(V′,E′,type′), such that:
  • 30. Operation: node replacement • As before, with operation AR replacing node set R with node va
  • 31. Abstract nodes and edges • Dummy nodes introduced during entity replacement • Preserve the causality of the rest of the graph • Two types of dependencies: o Indirect • Denoted with double lines • Represent multi-step dependences (wdf+, u+, wgb+, wtb+) o Soft dependencies • Denoted with double dashed lines • Generic transitive relationship which is not one of the above
  • 32. Removal and Replacement Replace (A,B) Remove (A,B)
  • 33. Removal and Replacement Replace (A,B) Remove (A,B)
  • 34. False dependencies • False dependencies introduce a previously non- existent path in the new graph, e.g. removing A, B
  • 35. Causality preserving transformation • A transformation is called causality preserving if it does not introduce false dependencies. • Given a provenance graph and a set of entities to be abstracted/hidden, the question is how can these entities be joined or removed from the graph using only causality-preserving transformations?
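One way to test a transformation for false dependencies is to compare reachability among the surviving nodes before and after the transformation. The sketch below assumes a graph is given as a list of directed `(source, target)` edge pairs; the helper names `reachable_pairs` and `false_dependencies` are hypothetical and not taken from the slides or the paper.

```python
from collections import defaultdict


def reachable_pairs(edges, nodes):
    """Return all (u, v) with u, v in `nodes` such that a path u -> ... -> v exists."""
    successors = defaultdict(set)
    for source, target in edges:
        successors[source].add(target)
    pairs = set()
    for start in nodes:
        stack, seen = list(successors[start]), set()
        while stack:  # depth-first search from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(successors[node])
        pairs |= {(start, node) for node in seen if node in nodes}
    return pairs


def false_dependencies(original_edges, transformed_edges, kept_nodes):
    """Paths between kept nodes that exist only in the transformed graph."""
    kept = set(kept_nodes)
    return reachable_pairs(transformed_edges, kept) - reachable_pairs(original_edges, kept)


# A and B are unrelated in the original graph; merging them into one
# abstract node N creates a path from X to Y that did not exist before.
original = [("X", "A"), ("B", "Y")]
merged = [("X", "N"), ("N", "Y")]
print(false_dependencies(original, merged, {"X", "Y"}))
```

Under this reading, a transformation is causality preserving exactly when `false_dependencies` returns an empty set.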
  • 36. Causality preserving partition and transformation • Given a set of nodes R ⊆ V, a causality preserving partition ℘ of R is such that removing or replacing any set of nodes 𝑃 ∈ ℘ will not introduce false dependencies • A graph transformation by partition ℘ of R is then a sequential application of Rem_P or Rep_P • The necessary and sufficient condition for such transformation to be causality preserving is that for each 𝑃 ∈ ℘ all of P’s external causes and effects are connected
  • 37. Optimal causality preserving partition • Default partition of R consists of singletons, i.e. each node in R is a set in the partition. • Optimal partition is such that none of its sets have the same sets of external causes and effects w.r.t. R • Partitioning algorithm o Step 1, determine external causes and effects for default partition o Step 2, gradually merge the partitions until optimal.
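The partitioning idea can be sketched as grouping nodes of R by their external causes and effects. Several simplifying assumptions apply: edges are `(effect, cause)` pairs, the external causes (effects) of a node are taken to be the nodes outside R reached by following cause (effect) edges through R only, and nodes are grouped by signature in one pass instead of being merged gradually. The names `external` and `partition` are hypothetical, and the exact definitions in the paper may differ.

```python
from collections import defaultdict


def external(node, restricted, neighbours):
    """Nodes outside `restricted` reached from `node` via restricted nodes only."""
    found, stack, seen = set(), [node], {node}
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in restricted:
                found.add(nxt)          # first node outside R on this path
            elif nxt not in seen:
                seen.add(nxt)           # keep walking inside R
                stack.append(nxt)
    return frozenset(found)


def partition(edges, restricted):
    """Group nodes of `restricted` that share external causes and effects."""
    causes_of, effects_of = defaultdict(set), defaultdict(set)
    for effect, cause in edges:
        causes_of[effect].add(cause)
        effects_of[cause].add(effect)
    groups = defaultdict(set)
    for node in restricted:
        signature = (external(node, restricted, causes_of),
                     external(node, restricted, effects_of))
        groups[signature].add(node)
    return sorted(sorted(group) for group in groups.values())


# X depends on A, A on B, B on Y; separately, P depends on C and C on Q.
edges = [("X", "A"), ("A", "B"), ("B", "Y"), ("P", "C"), ("C", "Q")]
print(partition(edges, {"A", "B", "C"}))
```

Here A and B share the same external cause (Y) and effect (X), so they can be abstracted together, while C stays in its own set.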
  • 38. Provenance graph transformation algorithm • Once the partition is computed, the transformations are iteratively applied to each element in the partition • Labels input provides names for generated abstract nodes • Levels input provides abstraction level for each partition o Hide • remove operation o Minimum abstraction, maximum abstraction • replace operation • isolated singletons removed as a special case.
  • 39. Computational efficiency • Transformation algorithm performance depends on the performance of the partition algorithm • The other steps are linear in the cardinality of the set of partitions ℘ and its edges • The partition algorithm considers pair-wise combinations of nodes. • Overall complexity is O(|R|^2), where R is the set of nodes to abstract
  • 40. Experimental results • Provenance view transformation algorithm was implemented in Python 2.7 using the NetworkX API. • Experiments were executed on Ubuntu 12.04, Intel Core i7-3687U CPU at 2.10GHz with 8GB RAM • Synthetic provenance graphs used, randomly generating edges for each node within the degree range 2-10 • Two parameters: o the percentage of nodes to abstract (from 5 to 25 with a step of 5) o the percentage of nodes to abstract which are causally dependent (from 0 to 100 with a step of 25) • Each configuration was executed 10 times and the plots presented show the averages of these executions.
  • 41. Performance behaviour • Execution time (Y) in seconds as a function of the number of nodes (X) and the percentage of nodes to abstract (Z) • Quadratic time
  • 42. Use case: Access to health data • Access control for the provenance data collected from an Electronic Health Record (EHR) and clinical trial systems • Rules: o Auditors. Healthcare system auditors or law enforcement agencies can access the whole provenance graph during the auditing process. o Family doctors and patients. Electronic health records and their data provenance can only be accessed by patients during weekends, and by family doctors (FDs) during weekdays. o Active FDs. Active FDs have access to the EHRs of their patients and the associated provenance data. o Clinical trial 1. If some data comes from a clinical trial, the GP needs to be a participant of the trial to see the subgraph associated with that trial. o Clinical trial 2. Patients do not have access to clinical trial processes. o Laboratory. Patients do not have access to laboratory processes. o Automatic diagnosis recommendation. Patients have no access to any information related to the automatic diagnosis recommendation nor to the graph segment connecting it with the clinical evidence.
  • 43. TACLP • Transformation-oriented Access Control Language for Provenance (TACLP) • Extends the works of Ni and Cadenhead by introducing transformations • A policy consists of: o Target o Effect o Transformation o Condition (optional) o Obligation (optional)
  • 44. TACLP Target • Subject element o Set of users (subject element) to which the policy should be applied, expressed through IRI references • Record element o Set of resources to which the policy should be applied, expressed through IRI references • Restriction element (optional) o A conditional expression under which the policy is applied o Either a relational comparison between a value in a property path and a literal, or a full logical expression. • Scope element (optional) o If the policy is ‘transferable’ or ‘non-transferable’ with respect to subjects o Whether it applies to all the ancestors of matched elements in the graph, or to the matched elements only.
  • 45. TACLP Effect • Specifies the intended outcome • Four possibilities: o Absolute permit guarantees access to the graph regardless of the effect of other policies • e.g. for allowing access to auditors or law enforcement agencies, and avoids the need for additional conditions in deny policies o Deny guarantees that certain parts of the graph will not be accessed by users in the subject element. o Necessary permit is used to describe the necessary, but not always sufficient, conditions for accessing certain parts of the graphs o Permit is used to describe those parts of the graph that can be accessed if there are no other policies denying access to it.
  • 46. TACLP Transformation • How to transform the provenance graph in order to hide certain resources • Specification of which nodes need to be hidden and Removal/Replace operations to be applied to them • Set of policies comprising o Policy type (target, record, condition, effect, transformation element and obligation) o Policy evaluation type (deny-takes-precedence or permit-takes-precedence)
  • 47. TACLP Transformation • Abstraction level o Hide • matched nodes of the subgraph have to be completely hidden (removed) from the graph • Remove transformation is applied; o Minimum abstraction • Replace transformation is applied • No caused-by relationship (soft dependencies) will appear in the transformed graph. o Maximum abstraction • Replace transformation is applied • Soft dependencies can appear in the transformed graph.
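The policy structure described on the TACLP slides can be pictured as a small record type. The sketch below is only an illustration of the listed elements: the class and field names, the string labels for effects and abstraction levels, and the `example.org` IRIs are all assumptions, and it does not reproduce the actual TACLP syntax.

```python
from dataclasses import dataclass
from typing import List, Optional

EFFECTS = {"absolute_permit", "deny", "necessary_permit", "permit"}
LEVELS = {"hide", "minimum", "maximum"}  # abstraction levels


@dataclass
class Target:
    subjects: List[str]                 # IRI references to users or roles
    records: List[str]                  # IRI references to resources
    restriction: Optional[str] = None   # optional conditional expression
    scope: Optional[str] = None         # optional, e.g. "transferable"


@dataclass
class Policy:
    target: Target
    effect: str
    transformation: str                 # abstraction level to apply
    condition: Optional[str] = None
    obligation: Optional[str] = None

    def __post_init__(self):
        # Reject labels outside the four effects and three levels.
        if self.effect not in EFFECTS:
            raise ValueError(f"unknown effect: {self.effect}")
        if self.transformation not in LEVELS:
            raise ValueError(f"unknown abstraction level: {self.transformation}")


# "Patients do not have access to laboratory processes", with hypothetical IRIs.
lab_policy = Policy(
    target=Target(subjects=["http://example.org/role/Patient"],
                  records=["http://example.org/process/Laboratory"]),
    effect="deny",
    transformation="hide",
)
print(lab_policy.effect, lab_policy.transformation)
```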
  • 48. Access control evaluation algorithms • Aim to produce an abstracted graph that satisfies the constraints • Deny-takes-precedence 1. Absolute permit policies evaluated first 2. Necessary permit and deny policies 3. Permit policies • Allow-takes-precedence 1. Absolute permit evaluated first 2. Necessary permit policies 3. Permit policies 4. Deny policies
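The two evaluation orders can be written down as ordered stages of policy effects. This sketch only groups policies into the stages listed on the slide; how each stage then transforms the graph is not shown. The effect labels and the function name `evaluation_stages` are assumptions for illustration.

```python
STAGES = {
    "deny-takes-precedence": [
        {"absolute_permit"},
        {"necessary_permit", "deny"},
        {"permit"},
    ],
    "allow-takes-precedence": [
        {"absolute_permit"},
        {"necessary_permit"},
        {"permit"},
        {"deny"},
    ],
}


def evaluation_stages(policies, mode):
    """Group (name, effect) pairs into the ordered stages for `mode`."""
    return [[name for name, effect in policies if effect in stage]
            for stage in STAGES[mode]]


policies = [("p1", "permit"), ("p2", "deny"),
            ("p3", "absolute_permit"), ("p4", "necessary_permit")]
print(evaluation_stages(policies, "deny-takes-precedence"))
print(evaluation_stages(policies, "allow-takes-precedence"))
```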
  • 51. Summary • Learning Health System presenting new set of challenges for medical and informatics communities • Provenance can help establish trust in the LHS • Methods needed to verify trust • Abstraction of provenance traces needed to address requirements of multiple stakeholders o Researchers o Regulators o Publishers • Future work o Projects running on provenance of decision support and visual analytics for health data o Looking for partnerships to investigate applications of the security work
  • 52. Acknowledgements • Thanks to: o Roxana Danger o Paolo Missier o Jeremy Bryant o Derek Corrigan o Brendan Delaney
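
The two evaluation orders listed on slide 48 can be sketched in Python as follows. This is only an illustration of the ordering, not the TACLP implementation: the `Effect` enum, the `(name, effect)` policy tuples, the policy names and the `evaluation_order` helper are all hypothetical, and the slides do not say how conflicts inside a stage are resolved. Slide 46 calls the second algorithm permit-takes-precedence, while slide 48 calls it allow-takes-precedence; the sketch uses the former name.

```python
from enum import Enum


class Effect(Enum):
    """The four TACLP effects named on slide 45."""
    ABSOLUTE_PERMIT = "absolute permit"
    DENY = "deny"
    NECESSARY_PERMIT = "necessary permit"
    PERMIT = "permit"


# Evaluation stages as listed on slide 48. Effects that share a stage
# (necessary permit and deny under deny-takes-precedence) share a rank.
DENY_TAKES_PRECEDENCE = [
    {Effect.ABSOLUTE_PERMIT},
    {Effect.NECESSARY_PERMIT, Effect.DENY},
    {Effect.PERMIT},
]
PERMIT_TAKES_PRECEDENCE = [
    {Effect.ABSOLUTE_PERMIT},
    {Effect.NECESSARY_PERMIT},
    {Effect.PERMIT},
    {Effect.DENY},
]


def evaluation_order(policies, stages):
    """Sort (name, effect) pairs by the stage in which the effect is evaluated.

    The sort is stable, so policies in the same stage keep their input order.
    """
    rank = {effect: i for i, stage in enumerate(stages) for effect in stage}
    return sorted(policies, key=lambda policy: rank[policy[1]])


# Hypothetical policies, named only for this example.
policies = [
    ("hide-lab-details", Effect.DENY),
    ("researcher-view", Effect.PERMIT),
    ("auditor-access", Effect.ABSOLUTE_PERMIT),
    ("consent-required", Effect.NECESSARY_PERMIT),
]

deny_first = [name for name, _ in evaluation_order(policies, DENY_TAKES_PRECEDENCE)]
permit_first = [name for name, _ in evaluation_order(policies, PERMIT_TAKES_PRECEDENCE)]
print(deny_first)
print(permit_first)
```

In both orders the absolute permit policy comes first; the only difference is whether deny policies are considered alongside necessary permits or after every kind of permit.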

Editor's Notes

  1. The US health system is going digital: ~30% now, ~80% by 2019. In many EU countries, primary care has 100% usage of EHRs, with more than 50% completely paperless. • If each care provider, patient and researcher uses their own data only for immediate needs, we are failing to realize the potential • If comparable data are shared, we can learn and improve • The key is to figure out how to do this routinely. We can’t afford to waste data.
  2. The overall goal is a healthcare system that draws on the best evidence to provide the most appropriate care for each patient, focuses on prevention and health promotion, delivers the most value, and adds to learning and improvements with each care experience. LHS: “ ... one in which progress in science, informatics, and care culture align to generate new knowledge as an ongoing, natural by-product of the care experience, and seamlessly refine and deliver best practices for continuous improvement in health and health care.” (Institute of Medicine) Examples: 1. Nationwide post-market surveillance of a new drug quickly reveals that personalized dosage algorithms require modification. A modified decision support rule is created and is implemented in EHR systems. 2. During an epidemic, new cases are reported directly from EHRs. As the disease spreads into new areas, clinicians are alerted. 3. A patient faces a difficult medical decision. She bases that decision on the experiences of other patients like her. The key is to move beyond individual knowledge silos: there are some wonderful solutions out there, particularly in the US, which do brilliant work locally but do not consider interoperability with the wider world. Researchers are increasingly asking: how portable is this, and how can we pick it up?
  3. Feedback loop
  4. Part of a wider reproducibility challenge. Potential reasons: • incorrect or inappropriate statistical analysis of results, or insufficient sample sizes • pressure to publish sometimes results in negligence over the control or reporting of experimental conditions • bias towards publishing positive results • many initially rejected papers get published in other journals without substantial changes or improvements. Important not to overreact: being unable to reproduce the findings does not automatically mean that the study is flawed; however, it does open the research to questions. The number of citations for the unreproducible findings actually outpaced those with reproducible findings! (ibid)
  5. Front end tools sharing the same set of reusable components in middleware and data connectivity package
  6. We start from domain ontologies and map them to provenance ontologies, using OPM concepts. Our project predates the current W3C standard, but mapping from OPM to PROV is straightforward (the other way, not necessarily so). The challenge was that our tools are heterogeneous, some are user-facing, and they don’t share an execution environment. Thus we introduced provenance templates.
  7. The provenance server exports a service interface based on the templates. Templates: abstract provenance graph fragments with semantic annotations. Client applications provide the details (investigators, data set references, study parameters). These are sent to the provenance interface, converted into full provenance graphs, and stored in the database. This is a very non-intrusive way of embedding provenance into a software ecosystem.
  8. Our work builds upon existing work in the field.
  9. Key are the generic causality relations and the indirect relations. An indirect relation essentially composes the original relation with the transitive closure of "was derived from".
  10. R is the set that is being removed; we are observing the effects and causes from its subset S.
  11. Highlight the difference when R=S
  12. Entity removal transformation (RemR) is used when a subgraph needs to be hidden, e.g. if it is unnecessary for an analysis or user access to it has been restricted.
  13. Entity replacement (RepR) is used for removing details of data and operations in a subgraph while retaining some information (an abstract entity) about the existence of that subgraph.
  14. Removal and replacement transformations do not introduce cycles in the new graph as long as the original graph is acyclic, as OPM provenance graphs are. However, using these transformations on an arbitrary set of nodes can introduce false dependencies, that is, causal links that were not present in the original graph.
  15. Soft edge introduced in remove
  16. Removal and replacement transformations do not introduce cycles in the new graph. However, using these transformations on an arbitrary set of nodes can introduce false dependencies, that is, causal links that were not present in the original graph. The entity replacement transformation introduces false dependencies when entities A and B are joined. In this case, paths from 2 to 5, D, and E do not exist in the original graph.
  17. Proof in the paper
  18. Not in the live system
  19. Remove this?
  20. Use of c* - the most general connectivity in the provenance graph
  21. The graph shows the evolution of a patient’s EHR during two visits and the subsequent actions. In the first visit the patient (Ag1) visited a general practitioner (Ag3), and an EHR system (Ag2) was used to record all the details of the visit. First, a new item creation process (P1) was executed, generating a new EHR version (EHR v20 - A2) based on the previous version of the patient’s EHR (EHR v19 - A1). After the patient detailed the symptoms, the GP gave them a prescription (A3) to be followed, created a blood test form (A4) for the test to be performed, and updated the data in the EHR system, generating a new version of the record (EHR v21 - A5). The blood test form was used to prepare the instrumentation and conduct the measurement (P3). All these operations were controlled by the laboratory system (Ag4) and a laboratory technician (Ag5). As part of this process, a laboratory condition report was generated (A6), and it triggered the blood test report creation process (P5), which generated the test report (A7) and a new version of the EHR containing the results of the test (EHR v22 - A9). The test and the laboratory condition reports (A7 and A6) were both used during the creation of an electronic Case Report Form, eCRF, (P4), as the patient is involved in a clinical trial and their progress is also followed by the clinical trial researcher (Ag6). The result of this action is the eCRF (A8). In the second visit, a new EHR item process (P6) was executed again, producing the new version of the EHR (A10). Following this, the doctor used a decision support system (Ag10) to confirm their diagnostic hypothesis. They opened the application and entered the patient details (P7) and a set of diagnostic cues (A11) that were extracted from the patient’s EHR. These were then compared (P8) with the clinical evidence repository (A12) of the decision support system.
A diagnostic recommendation (A13) was then obtained and given as a possible option to the GP, who used it to generate their final diagnosis (A15). A variable containing the recommendation chosen by the GP (A14) is also generated and maintained by the decision support tool. Once the GP had the diagnosis, they proceeded to update the data in the EHR system, generating a new prescription for the patient (A16) and a new version of the EHR (A17).
  22. Notice that the labels properly describe the aim of the abstracted entities in the cases of laboratory and clinical trial, and the whole subgraph corresponding to the automatic diagnosis decision support processing is removed.
  23. Ultimately, the LHS is about scaling up the health system, and consequently the associated research that the health system is built upon. If this scaling is to succeed, we have to install mechanisms to verify trust in the system inside our research instruments. In a research world increasingly reliant on electronic tools, provenance gives us a lingua franca to achieve traceability, which we have shown to be essential to building these mechanisms. The idea was evaluated in a provenance infrastructure that was implemented in the TRANSFoRm project in three distinct LHS domains: clinical trials, decision support systems and cohort studies. The challenge now is to address the provenance gap that exists between the provenance metadata collected and the reporting requirements of different domains, and this will require a joint effort by a range of stakeholders, including medical scientists, informaticians, publishers and regulators. However, this work is essential if the quality of translation from research into practice in the LHS is to improve with the growing volume of data and research and not deteriorate.
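
Notes 14 and 16 observe that joining nodes can introduce false dependencies, that is, causal links absent from the original graph. The effect can be reproduced with a minimal Python sketch. It uses a plain node merge on a toy graph, which is an assumption made for illustration and not the RepR definition from the paper; the node names `X`, `A`, `B`, `Y`, the merged label `AB`, and both helper functions are hypothetical. Edges are written as (effect, cause) pairs, so an edge reads "effect depends on cause".

```python
def reachable(edges, start):
    """Return every node reachable from start by following (effect, cause) edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for effect, cause in edges:
            if effect == node and cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen


def merge_nodes(edges, group, label):
    """Collapse all nodes in group into one abstract node called label."""
    merged = set()
    for effect, cause in edges:
        new_effect = label if effect in group else effect
        new_cause = label if cause in group else cause
        if new_effect != new_cause:  # skip self-loops created inside the group
            merged.add((new_effect, new_cause))
    return merged


# Toy graph: X depends on A, and B depends on Y. A and B are unrelated.
original = {("X", "A"), ("B", "Y")}
abstracted = merge_nodes(original, {"A", "B"}, "AB")

before = reachable(original, "X")
after = reachable(abstracted, "X")
print(before)
print(after)
```

In the original graph `X` reaches only `A`. After `A` and `B` are merged, `X` also reaches `Y` through the abstract node, a dependency that the original graph never contained.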