CDISC EU Interchange 2012




Semantic Models for CDISC Based Standards and Metadata Management


Introduction
We may have come to a critical turning point in the way clinical data can be managed, used
and reused within and across organizations. The coverage and maturity of existing CDISC
standards, the establishment of these standards within the industry at large, the use of these
standards as a foundation for metadata driven systems, and the emerging role of semantic
standards are all converging to create new and unique opportunities. In this presentation we look
at the implications and challenges of integrating CDISC standards, metadata, and information
models into a single framework. We also show how semantic standards can provide a solid
foundation for building such a framework.


CDISC Standards
The role of data standards for the management of clinical data has shifted significantly over the
past few years, largely due to the establishment of CDISC standards across the pharmaceutical
industry. Not so long ago, sponsors had to consider whether and when they should use SDTM
standards for FDA submissions. Today, the question is no longer if and when, but how best to
adopt CDISC based data standards. This change in mindset is in itself a major step forward, but it
also leads to formidable challenges: for CDISC as the owner of the standards, for sponsors
integrating these standards into their own organizations, for vendors providing products and
services, and for regulatory organizations reviewing submitted data.

A key challenge for any set of standards is to be consistent and complete. Looking at the CDISC
standards, we see a variety of standards at different levels of maturity. The SDTM standards,
domains, and terminology seem to have the highest level of adoption to date, but as more
sponsors submit data according to those standards, their shortcomings become magnified. SDTM
is an informal model and in many instances open to interpretation. This leads to inconsistencies
in how collected data is mapped to SDTM, potentially across studies from a single sponsor, and
certainly across studies from different sponsors. As sponsors become comfortable adopting the
SDTM standards, they naturally venture into the CDASH and ADaM standards. These standards
have had a shorter lifetime, have not yet reached the maturity level of SDTM, and suffer from
similar problems. In addition, questions of consistency at the content and representational
levels across the CDISC standards come into focus as well. This is highlighted by the disconnect
between the standards just mentioned and the BRIDG model, a comprehensive domain analysis
model for protocol-driven biomedical and clinical research, captured as a UML model.

Sponsors adopting CDISC have to deal with these issues. They also face the challenge of managing
and integrating CDISC based data standards within their own organizations at the level of
information architecture, processes, and application systems. In the following sections we
outline some fundamental principles that can help meet these challenges.


Information Architecture
We have already indicated the importance of a set of standards being complete and consistent.
Formal models make these notions precise. Another observation is that the content of the CDISC
standards depends on the meaning of what is studied in the biological and clinical reality (often
referred to as concepts) and on how these concepts are represented by data elements from
protocol to submission; in other words, we are dealing with semantic and metadata information
about biomedical and clinical research knowledge and data. The conclusion is immediate and
striking: an information architecture that takes this into account needs to be based on a formal
ontological metadata model.

Semantic models based on the W3C semantic web standards (RDF, OWL, SKOS) are well placed to
get the job done. These standards provide the means to define a formal representation of a
body of knowledge. In short, the Resource Description Framework (RDF) specifies a general
model in which any piece of knowledge is represented by statements of the form
Subject-Predicate-Object or Subject-Predicate-Value, called triples. Each part of a triple (except
a Value) is identified by a Uniform Resource Identifier (URI), and triples can be aggregated into
graphs with subjects and objects as nodes and predicates as arcs. The Web Ontology Language
(OWL) adds a typing mechanism to classify subjects and objects into a hierarchy of classes and
defines modeling constructs to express knowledge about predicates. This gives a rich modeling
vocabulary to build schemas and the capability to derive new triples from existing triples
(inference). Finally, the Simple Knowledge Organization System (SKOS) is a thin RDF based
vocabulary that can be used to build terminologies. See [2] for more information on RDF based
standards.
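To make the triple model and inference concrete, here is a minimal sketch in plain Python (no RDF library), assuming a hypothetical `http://example.org/cdisc#` namespace and hypothetical class names. It stores triples as tuples and applies two RDFS subclass rules, which OWL builds on, until no new triples can be derived. It illustrates the idea of inference only; it is not an OWL reasoner.

```python
from typing import NamedTuple

# Standard RDF/RDFS predicate URIs; the EX namespace is a hypothetical placeholder.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/cdisc#"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # a URI, or a literal value


def infer(graph: set[Triple]) -> set[Triple]:
    """Apply two RDFS entailment rules until a fixed point is reached:
    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(graph)
    while True:
        sub = {(t.subject, t.obj) for t in closure if t.predicate == RDFS_SUBCLASS_OF}
        new = {Triple(a, RDFS_SUBCLASS_OF, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {
            Triple(t.subject, RDF_TYPE, b)
            for t in closure
            if t.predicate == RDF_TYPE
            for a, b in sub
            if t.obj == a
        }
        if new <= closure:  # nothing new was derived
            return closure
        closure |= new


# A tiny, hypothetical schema plus one data element typed against it.
graph = {
    Triple(EX + "SDTMVariable", RDFS_SUBCLASS_OF, EX + "DataElement"),
    Triple(EX + "DataElement", RDFS_SUBCLASS_OF, EX + "AdministeredItem"),
    Triple(EX + "AETERM", RDF_TYPE, EX + "SDTMVariable"),
    Triple(EX + "AETERM", RDFS_LABEL, "Reported Term for the Adverse Event"),
}
entailed = infer(graph)
# AETERM was only declared an SDTMVariable, but is now also inferred to be
# a DataElement and an AdministeredItem.
```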

A knowledge base written in RDF can easily be shared between systems by serializing it into
formats such as RDF/XML. RDF knowledge bases are also easy to federate and cross-reference,
as witnessed by the growth of the Linked Open Data (LOD) cloud, a large collection of open,
cross-linked RDF data sets available on the web today. In this context it should be noted that
an OWL version of the NCI Thesaurus (the source of CDISC’s controlled terminologies) is
freely available today in RDF/XML format. Also, an effort is well under way to port the
BRIDG UML model to an OWL based ontology.
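The text above mentions RDF/XML; for brevity, the sketch below swaps in N-Triples, a simpler line-based W3C serialization of the same RDF model, and covers only URIs and plain string literals (no blank nodes, datatypes, or language tags). It shows the two properties claimed here: a graph survives a round trip through a text format, and federating two graphs is a set union in which shared URIs connect the data. All `example.org` URIs are hypothetical placeholders, for instance standing in for NCI Thesaurus concept URIs.

```python
import re
from typing import NamedTuple, Union


class Lit(NamedTuple):
    """A plain string literal (datatypes and language tags omitted)."""
    value: str


Triple = tuple[str, str, Union[str, Lit]]  # a str object is a URI

_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {v: k for k, v in _ESCAPES.items()}
_LINE = re.compile(r'<([^>]*)> <([^>]*)> (?:<([^>]*)>|"((?:[^"\\]|\\.)*)") \.')


def to_ntriples(graph: set[Triple]) -> str:
    """Serialize a graph as N-Triples: one '<s> <p> o .' statement per line."""
    lines = []
    for s, p, o in graph:
        if isinstance(o, Lit):
            obj = '"' + "".join(_ESCAPES.get(ch, ch) for ch in o.value) + '"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "".join(line + "\n" for line in sorted(lines))


def from_ntriples(text: str) -> set[Triple]:
    """Parse the N-Triples subset written by to_ntriples."""
    graph = set()
    for line in filter(None, text.split("\n")):
        m = _LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        s, p, uri, lit = m.groups()
        if uri is not None:
            graph.add((s, p, uri))
        else:
            # Only the escapes emitted by to_ntriples are supported.
            value = re.sub(r"\\.", lambda esc: _UNESCAPES[esc.group(0)], lit)
            graph.add((s, p, Lit(value)))
    return graph


LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SPONSOR = "http://example.org/sponsor#"
TERMS = "http://example.org/terminology#"  # stand-in for a terminology source

sponsor = {(SPONSOR + "AETERM", SPONSOR + "hasConcept", TERMS + "AdverseEvent")}
terms = {(TERMS + "AdverseEvent", LABEL, Lit("Adverse Event"))}

# Exchange both graphs as text, then federate them with a set union: the shared
# URI TERMS + "AdverseEvent" links the two data sets without any mapping step.
merged = from_ntriples(to_ntriples(sponsor)) | from_ntriples(to_ntriples(terms))
```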

Looking across the CDISC standards, we notice that their content is itself metadata; hence the
RDF schema we have in mind corresponds to a level 3 meta-model. A good starting point here is
the ISO 11179 standard for metadata registries (MDR). This standard is somewhat elaborate and not
widely adopted, but it does provide a good basis for developing a small and generic OWL
vocabulary for metadata models, most notably including the capability of item level versioning
for anything that goes into a metadata registry. Using an ISO 11179 based OWL vocabulary, it is
fairly straightforward to create a knowledge base for the CDASH, SDTM, and ADaM standards.
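Item level versioning can be illustrated with a small registry sketch. It assumes a heavily simplified, hypothetical subset of ISO 11179: each registered item keeps a stable identifier, every change creates a new numbered version, and the previous version is kept with the status "Superseded" instead of being overwritten. The identifier scheme `SDTM.AE.AETERM` and the class names are illustrative, not taken from the standard.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ItemVersion:
    """One version of a registered item (hypothetical, simplified ISO 11179 subset)."""
    identifier: str           # stable across all versions of the item
    version: int
    definition: str
    registration_status: str  # e.g. "Candidate", "Standard", "Superseded"


class MetadataRegistry:
    def __init__(self) -> None:
        self._versions: dict[str, list[ItemVersion]] = {}

    def register(self, identifier: str, definition: str,
                 status: str = "Candidate") -> ItemVersion:
        """Add a new version; earlier versions are kept but marked superseded."""
        history = self._versions.setdefault(identifier, [])
        if history:
            history[-1] = replace(history[-1], registration_status="Superseded")
        item = ItemVersion(identifier, len(history) + 1, definition, status)
        history.append(item)
        return item

    def current(self, identifier: str) -> ItemVersion:
        return self._versions[identifier][-1]

    def history(self, identifier: str) -> list[ItemVersion]:
        return list(self._versions[identifier])


registry = MetadataRegistry()
registry.register("SDTM.AE.AETERM", "Reported term for the adverse event", "Standard")
registry.register("SDTM.AE.AETERM", "Verbatim term for the adverse event as reported", "Standard")
```

Because old versions are never deleted, any study that was built against version 1 can still resolve exactly the definition it used, while new work picks up `current()`.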

Finally, there is a need to eliminate room for interpretation and to guarantee consistency
between the different CDISC standards. A biomedical concept model, representing the meaning
of what is studied in the biological and clinical reality, can provide the glue that holds everything
together. It provides common and precise semantic content for every CDASH, SDTM, and ADaM
data element, and restricts these standards to a purely representational role. On the other
side of the coin, an RDF based biomedical concept model can link directly into other RDF
sources with semantic content, such as the NCI Thesaurus, and into BRIDG once its OWL
representation is available.
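The role of the biomedical concept model as glue can be sketched as a lookup over triples: each CDASH, SDTM, and ADaM data element points to a shared concept, so one query returns every representation of that concept, and a simple check flags data elements linked to no concept or to several. The `mdr:` and `bc:` prefixed names are hypothetical, and the pairing of CDASH `AESTDAT`, SDTM `AESTDTC`, and ADaM `ASTDT` is used for illustration only.

```python
# Hypothetical vocabulary: prefixed names stand in for full URIs.
IN_STANDARD = "mdr:inStandard"
HAS_CONCEPT = "mdr:hasConcept"

graph = {
    ("cdash:AESTDAT", IN_STANDARD, "CDASH"),
    ("sdtm:AESTDTC", IN_STANDARD, "SDTM"),
    ("adam:ASTDT", IN_STANDARD, "ADaM"),
    ("sdtm:AETERM", IN_STANDARD, "SDTM"),
    ("cdash:AESTDAT", HAS_CONCEPT, "bc:AdverseEventStartDate"),
    ("sdtm:AESTDTC", HAS_CONCEPT, "bc:AdverseEventStartDate"),
    ("adam:ASTDT", HAS_CONCEPT, "bc:AdverseEventStartDate"),
    # sdtm:AETERM is deliberately left without a concept link.
}


def representations(graph: set[tuple[str, str, str]], concept: str) -> dict[str, set[str]]:
    """Group the data elements that represent `concept` by standard."""
    standard_of = {s: o for s, p, o in graph if p == IN_STANDARD}
    result: dict[str, set[str]] = {}
    for s, p, o in graph:
        if p == HAS_CONCEPT and o == concept:
            result.setdefault(standard_of.get(s, "unknown"), set()).add(s)
    return result


def consistency_issues(graph: set[tuple[str, str, str]]) -> list[str]:
    """Data elements linked to zero concepts or to more than one concept."""
    concepts: dict[str, set[str]] = {}
    for s, p, o in graph:
        if p == IN_STANDARD:
            concepts.setdefault(s, set())
        elif p == HAS_CONCEPT:
            concepts.setdefault(s, set()).add(o)
    return sorted(e for e, linked in concepts.items() if len(linked) != 1)
```

Requiring exactly one concept per data element is one simple way to express "common and precise semantic content" as a checkable rule.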

Our considerations on an information architecture for CDISC standards based on semantic web
standards lead to the following RDF based information stack.


                                    Sponsor Extensions

                  CDASH                      SDTM                       ADaM

                               Biomedical Concept Model

                          ISO 11179 MDR Schema (subset)

              BRIDG and ISO 21090                           NCI Thesaurus

                    RDF                       OWL                        SKOS
                                             Figure 1


Notice that the top layer offers sponsors the opportunity to extend content based on existing RDF
schemas, e.g. sponsors may add additional SDTM data elements as supplemental qualifiers, or
introduce additional RDF schemas to cover new types of content.


CDISC Considerations
The CDISC standards have come a long way, both in terms of maturity and adoption, but also
face considerable challenges as more sponsors use the standards, and even more so as substantial
content is expected to be added for therapeutic areas. A layered information architecture based
on semantic standards can provide a solid foundation to systematically address these challenges.

The CDISC SHARE project may be the best place to get such an effort under way, but it will
require substantial commitment from CDISC as a whole to be successful. We have recently
provided a first draft OWL model to give a home to the ideas that the SHARE team has been
working on over the past few years. The future roadmap, however, seems unclear at best,
with no firm commitment to implementation goals and timelines. At the same time, the SHARE
team is already producing a great deal of valuable content that fits extremely well into the
biomedical concept model.


Sponsor and Vendor Considerations
Right now we seem to have come to a turning point, driven by widespread adoption of CDISC
standards and an emerging need for sponsors to establish a standards management function
within their organizations. Large organizations have increasing difficulty just dealing with
the resulting workload of managing and applying clinical data standards. This naturally
leads to the need for a metadata repository (MDR).

The same arguments for the information architecture given earlier apply even more strongly here.
RDF/XML can serve as the interface format for MDR content. As indicated before, it can
easily be shared and federated, but it can also be loaded into a triple store database. Since an
RDF knowledge base can carry its own schema and everything is represented by triples, loading
it into a triple store requires no further transformation, and the RDF knowledge base directly
represents the MDR content.
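The claim that an RDF knowledge base loads directly into a triple store can be illustrated with a toy in-memory store. This is a sketch, not a model of any product: it indexes triples by predicate and answers single triple patterns where `None` acts as a wildcard; the two-pattern join at the end is roughly what a SPARQL query with two patterns sharing one variable does. The `mdr:` names and the data are hypothetical, apart from the SDTM variable labels.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    """A toy in-memory triple store, indexed by predicate."""

    def __init__(self) -> None:
        self._by_predicate: defaultdict[str, set[tuple[str, str]]] = defaultdict(set)

    def load(self, triples: Iterable[Triple]) -> None:
        # Schema and content are both just triples, so loading needs no mapping.
        for s, p, o in triples:
            self._by_predicate[p].add((s, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None matches anything."""
        predicates = [p] if p is not None else list(self._by_predicate)
        for pred in predicates:
            for subj, obj in self._by_predicate.get(pred, ()):
                if (s is None or subj == s) and (o is None or obj == o):
                    yield subj, pred, obj


store = TripleStore()
store.load([
    ("mdr:SDTMVariable", "rdfs:subClassOf", "mdr:DataElement"),  # a schema triple
    ("sdtm:AETERM", "mdr:domain", "sdtm:AE"),
    ("sdtm:AETERM", "rdfs:label", "Reported Term for the Adverse Event"),
    ("sdtm:AESTDTC", "mdr:domain", "sdtm:AE"),
    ("sdtm:AESTDTC", "rdfs:label", "Start Date/Time of Adverse Event"),
    ("sdtm:VSORRES", "mdr:domain", "sdtm:VS"),
])

# Labels of all AE domain variables: two patterns joined on the variable.
ae_labels = sorted(
    (var, label)
    for var, _, _ in store.match(p="mdr:domain", o="sdtm:AE")
    for _, _, label in store.match(s=var, p="rdfs:label")
)
```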

Two examples show how sponsors have started to implement semantic standards and apply linked
data principles. At Roche, this is being done by implementing an internally built MDR (see more
details below). At AstraZeneca, the requirements for a commercial MDR product will include an
interface to MDR content based on semantic standards and linked data principles. This is part of
a larger effort called integrative informatics (i2), which establishes the components to let a
Linked Data cloud grow across AstraZeneca R&D.





MDR Based Standards Implementation at Roche
In a first phase, Roche successfully defined a set of clinical trial data standards based on
CDISC, ISO 11179 MDR, and the W3C semantic standards, following the architecture shown
earlier in Figure 1. In this implementation, the biomedical concept model has deliberately been
designed as a thin layer, in anticipation that CDISC SHARE will provide this part of the stack
later on. BRIDG can be added as soon as its OWL representation becomes available. The data
collection and data tabulation standards cover all of safety and the Roche therapeutic areas, but
are only partially based on CDASH. Data analysis standards are still in their infancy.

In a second phase, Roche built an MDR and an application infrastructure in 2011. This
includes a controlled mechanism to publish the RDF stack to a triple store database, a web
browser application to deliver the content to end users, and a set of web services to provide
access to other applications. The MDR includes item level versioning following ISO 11179 and
is deployed in a high availability IT production environment. The next release is scheduled to
include semantic search and linking from the biomedical concept model into the NCI Thesaurus.
The good news for sponsors is that semantic technology has proven to work at all levels, from
W3C standards to semantic toolsets such as modeling workbenches, triple store databases, and
application programming interfaces (APIs).

Roche is now entering a third phase to establish MDR driven workflow automation from
protocol to submission. The idea is to implement a semantic representation of the protocol and
data analysis plan, and from there use the MDR content to support study build, provide data
transformation services to derive SDTM mappings, and finally support the production of data
analysis and submission deliverables.
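The idea of deriving SDTM mappings from MDR content can be sketched as a metadata-driven transformation: a mapping table, which in practice would be read from the MDR, states which collected field feeds which SDTM variable and through which transformation. The mapping entries, the transformation names, and the assumption that dates are collected as complete DD-MON-YYYY values are all hypothetical; partial dates and unit conversions are out of scope.

```python
# Hypothetical mapping metadata, as it might be queried from the MDR:
# collected CDASH field -> (target SDTM variable, transformation name).
MAPPINGS = {
    "AETERM": ("AETERM", "copy"),
    "AESTDAT": ("AESTDTC", "date_to_iso8601"),
}

MONTH_ABBREVIATIONS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
                       "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTHS = {name: number for number, name in enumerate(MONTH_ABBREVIATIONS, start=1)}


def date_to_iso8601(value: str) -> str:
    """Convert a complete DD-MON-YYYY date (e.g. 12-MAR-2012) to ISO 8601."""
    day, month, year = value.split("-")
    return f"{int(year):04d}-{MONTHS[month.upper()]:02d}-{int(day):02d}"


TRANSFORMATIONS = {
    "copy": lambda value: value,
    "date_to_iso8601": date_to_iso8601,
}


def to_sdtm(record: dict[str, str]) -> dict[str, str]:
    """Apply the mapping metadata to one collected record; unmapped fields are dropped."""
    result = {}
    for field, value in record.items():
        if field in MAPPINGS:
            target, transformation = MAPPINGS[field]
            result[target] = TRANSFORMATIONS[transformation](value)
    return result


sdtm_record = to_sdtm({"AETERM": "HEADACHE", "AESTDAT": "12-MAR-2012", "COMMENT": "n/a"})
# sdtm_record == {"AETERM": "HEADACHE", "AESTDTC": "2012-03-12"}
```

Keying the transformations by names stored as metadata means a new mapping can be added in the MDR without touching the transformation code, which is one way to read "MDR driven workflow automation".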


References
1. To read more on knowledge systems and semantic modeling, the following books are recommended.
   - Dean Allemang and Jim Hendler. Semantic Web for the Working Ontologist. Second
     Edition. Morgan Kaufmann, 2011. This is an excellent, well-written book, specifically on
     the modeling aspects of RDF and OWL.
   - Christopher Walton. Agency and the Semantic Web. Oxford University Press, 2007. This
     book gives a broad outlook on knowledge systems and the semantic web, including more
     academic background on the computational aspects of the subject.
   - Dragan Gasevic, Dragan Djuric, and Vladan Devedzic. Model Driven Engineering and
     Ontology Development. Second Edition. Springer, 2009. This book provides valuable
     insight into knowledge engineering and the relationship between the different modeling
     spaces.





2. Here is a good entry page to locate the W3C standards for the semantic web, in particular the
   RDF, RDFS, OWL, and SKOS standards:
   http://www.w3.org/2001/sw/wiki/Main_Page
3. To see what the National Cancer Institute (NCI) is doing in the area of controlled
   terminologies and ontology modeling, have a look here:
   https://cabig.nci.nih.gov/concepts/EVS/
4. The National Center for Biomedical Ontology (NCBO) is a great resource for biomedical
   ontologies and related technologies. It can be accessed here:
   http://www.bioontology.org/




 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Último (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Semantic models for cdisc based standards and metadata management (1)

CDISC EU Interchange 2012


Semantic Models for CDISC Based Standards and Metadata Management


Introduction
We have possibly come to a critical turning point in the way clinical data can be managed, used, and reused within and across organizations. The coverage and maturity of existing CDISC standards, the establishment of these standards within the industry at large, the use of these standards as a foundation for metadata driven systems, and the upcoming role of semantic standards are all converging to create new and unique opportunities. In this presentation we look at the implications and challenges of integrating CDISC standards, metadata, and information models into a single framework. We also show how semantic standards can provide a solid foundation for building such a framework.


CDISC Standards
The role of data standards for the management of clinical data has shifted significantly over the past few years, largely due to the establishment of CDISC standards across the pharmaceutical industry. Not so long ago, sponsors had to consider if and when they should use SDTM standards for FDA submissions. Today, those questions have changed: not if and when, but how best to adopt CDISC based data standards is becoming the leading question. This change in mindset is in itself a major step forward, but it also leads to formidable challenges for CDISC as the owner of the standards, for sponsors integrating these standards into their own organizations, for vendors providing products and services, and for regulatory organizations reviewing submitted data.

A key challenge for any set of standards is to be consistent and complete. Looking at the CDISC standards, we see a variety of standards at different levels of maturity. The SDTM standards, domains, and terminology seem to have the highest level of adoption to date, but as more sponsors submit data according to those standards, their shortcomings become magnified.
SDTM is an informal model and in many instances open to interpretation. This leads to inconsistencies in how collected data is mapped to SDTM, potentially across studies from a single sponsor, and definitely across studies from different sponsors.

As sponsors get comfortable adopting the SDTM standards, they naturally venture into the CDASH and ADaM standards. These standards have had a shorter lifetime and have not yet reached the maturity level of SDTM, while suffering from similar problems. In addition, issues of consistency at the content and representational levels across the CDISC standards come into focus as well. This is highlighted by the disconnect between the standards just mentioned and the BRIDG model, a comprehensive domain analysis model for protocol-driven biomedical and clinical research, captured as a UML model.

Sponsors adopting CDISC have to deal with these issues. They also face the challenge of managing and integrating CDISC based data standards within their respective organizations at the information architecture, process, and systems application levels. In the following sections we outline some fundamental principles that can help meet these challenges.


Information Architecture
We have already indicated the importance of a set of standards being complete and consistent; formal models make these notions precise. Another observation is that the content of the CDISC standards depends on the meaning of what is studied in the biological and clinical reality (often referred to as concepts), and on how these concepts are represented by data elements from protocol to submission. In other words, we are dealing with semantic and metadata information about biomedical and clinical research knowledge and data. The conclusion is immediate and striking: an information architecture that takes this into account needs to be based on a formal ontological metadata model. Semantic models based on the W3C semantic web standards (RDF, OWL, SKOS) are well placed to get the job done.
These standards provide the means to define a formal representation of a body of knowledge. In short, the Resource Description Framework (RDF) specifies a general model of how any piece of knowledge can be represented by statements of the form Subject-Predicate-Object or Subject-Predicate-Value, called triples. Each part of a triple (except the Value) is identified by a Uniform Resource Identifier (URI), and triples can be aggregated into graphs with subjects and objects as nodes and predicates as arcs. The Web Ontology Language (OWL) adds a typing mechanism to classify subjects and objects into a hierarchy of classes, and defines modeling constructs to express knowledge about predicates. This gives a rich modeling vocabulary for building schemas and the capability to derive new triples from existing triples (inference). Finally, the Simple Knowledge Organization System (SKOS) is a thin RDF based vocabulary that can be used to build terminologies. See [2] for more information on RDF based standards.

A knowledge base written in RDF can easily be shared between systems by serializing it into formats such as RDF/XML. RDF knowledge bases are also easy to federate and cross-reference, as witnessed by the growth of the Linked Open Data (LOD) cloud, a large collection of open and cross-linked RDF data sets available on the web today. In this context it should be noted that an OWL version of the NCI Thesaurus (the source of CDISC’s controlled terminologies) is freely available today in RDF/XML format. Also, an effort is well under way to port the BRIDG UML model to an OWL based ontology.

Looking across the CDISC standards, we notice that their content is itself metadata; hence the RDF schema we have in mind corresponds to a level 3 meta-model. A natural candidate here is the ISO 11179 standard for metadata registries (MDR). This standard is somewhat elaborate and not that widely adopted, but it does provide a good starting point for developing a small and generic OWL vocabulary for metadata models, including, most notably, the capability of item level versioning for anything that goes into a metadata registry. Using an ISO 11179 based OWL vocabulary, it is fairly straightforward to create a knowledge base for the CDASH, SDTM, and ADaM standards.

Finally, there is a need to eliminate any room for interpretation and to guarantee consistency between the different CDISC standards. A biomedical concept model, representing the meaning of what is studied in the biological and clinical reality, can provide the glue that holds everything together. It provides common and precise semantic content for every CDASH, SDTM, and ADaM data element, and restricts these standards to a purely representational role. On the other side of the coin, an RDF based biomedical concept model can link directly into other RDF sources with semantic content, such as the NCI Thesaurus, and into BRIDG once its OWL representation is available.

Our considerations on an information architecture for CDISC standards based on semantic web standards lead to the following RDF based information stack.

Figure 1. RDF based information stack (layers from top to bottom): Sponsor Extensions; CDASH, SDTM, ADaM; Biomedical Concept Model; ISO 11179 MDR Schema (subset); BRIDG and ISO 21090; NCI Thesaurus; RDF, OWL, SKOS.
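
As a minimal illustration of the biomedical concept layer and ISO 11179 style item level versioning described above, the sketch below uses plain Python tuples as triples instead of a real RDF toolkit. It shows CDASH, SDTM, and ADaM data elements sharing one meaning through a single concept, with each version of an item modeled as a resource of its own. All URIs under http://example.org/ and the properties representsConcept, versionOf, and version are hypothetical placeholders, not taken from ISO 11179, CDISC SHARE, or any published vocabulary.

```python
# Sketch: a thin biomedical concept layer that gives CDASH, SDTM and ADaM
# data elements one shared meaning, plus item-level versioning.
# Every URI and property name here is a hypothetical placeholder.
EX = "http://example.org/mdr/"

REPRESENTS = EX + "representsConcept"  # data element -> biomedical concept
VERSION_OF = EX + "versionOf"          # versioned resource -> item it versions
VERSION = EX + "version"               # versioned resource -> version number

SYSTOLIC_BP = EX + "concept/SystolicBloodPressure"
SDTM_ITEM = EX + "sdtm/VS/VSORRES_SYSBP"

triples = {
    # One concept, represented in each standard (twice in SDTM,
    # because the SDTM element exists in two versions).
    (EX + "cdash/VS/SYSBP/v1", REPRESENTS, SYSTOLIC_BP),
    (SDTM_ITEM + "/v1", REPRESENTS, SYSTOLIC_BP),
    (SDTM_ITEM + "/v2", REPRESENTS, SYSTOLIC_BP),
    (EX + "adam/ADVS/AVAL_SYSBP/v1", REPRESENTS, SYSTOLIC_BP),
    # Item-level versioning: each version is a resource in its own right.
    (SDTM_ITEM + "/v1", VERSION_OF, SDTM_ITEM),
    (SDTM_ITEM + "/v1", VERSION, 1),
    (SDTM_ITEM + "/v2", VERSION_OF, SDTM_ITEM),
    (SDTM_ITEM + "/v2", VERSION, 2),
}


def elements_for_concept(concept: str) -> list[str]:
    """All data elements, across standards, whose meaning is the given concept."""
    return sorted(s for (s, p, o) in triples if p == REPRESENTS and o == concept)


def meaning_of(element: str) -> str:
    """The single concept an element represents; anything else is a consistency error."""
    concepts = {o for (s, p, o) in triples if s == element and p == REPRESENTS}
    if len(concepts) != 1:
        raise ValueError(f"{element} maps to {len(concepts)} concepts, expected 1")
    return concepts.pop()


def latest_version(item: str) -> str:
    """The version resource of `item` with the highest version number."""
    versions = [s for (s, p, o) in triples if p == VERSION_OF and o == item]
    numbers = {s: o for (s, p, o) in triples if p == VERSION}
    return max(versions, key=lambda v: numbers[v])


print(len(elements_for_concept(SYSTOLIC_BP)))     # 4
print(latest_version(SDTM_ITEM).endswith("/v2"))  # True
print(meaning_of(EX + "adam/ADVS/AVAL_SYSBP/v1") == SYSTOLIC_BP)  # True
```

Modeling each version as a separate resource is only one possible reading of item level versioning; it lets existing studies keep pointing at v1 while new studies pick up v2, and meaning_of turns the "one data element, one concept" rule into a simple consistency check.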

Notice that the top layer offers sponsors the opportunity to extend content based on existing RDF schemas; for example, sponsors may add their own SDTM data elements as supplemental qualifiers, or introduce additional RDF schemas to cover new types of content.


CDISC Considerations
The CDISC standards have come a long way in terms of both maturity and adoption, but they also face considerable challenges as more sponsors use them, and even more so as substantial content is expected to be added for therapeutic areas. A layered information architecture based on semantic standards can provide a solid foundation for systematically addressing these challenges. The CDISC SHARE project may be the best place to get such an effort under way, but it will require substantial commitment from CDISC as a whole to be successful. Just recently we provided a first draft OWL model to give a home to the ideas that the SHARE team has been working on over the past few years. The future roadmap, however, seems unclear at best, with no firm commitment to implementation goals and timelines. At the same time, the SHARE team is already producing much valuable content that fits extremely well into the biomedical concept model.


Sponsor and Vendor Considerations
Right now we seem to have come to a turning point, driven by the widespread adoption of CDISC standards and an emerging need for sponsors to establish a standards management function within their respective organizations. Large organizations have increasing difficulty just dealing with the resulting workload of managing and applying clinical data standards. This naturally leads to the need for a metadata repository (MDR). The arguments given earlier for the information architecture apply even more strongly here. RDF/XML provides an interface format for MDR content. As indicated before, such content can easily be shared and federated, but it can also be loaded into a triple store database.
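
Sharing and federating RDF content can be sketched as follows, assuming two independently maintained sources: a sponsor MDR and a terminology standing in for something like the NCI Thesaurus. For brevity the sketch serializes to N-Triples, a line-based W3C RDF syntax, instead of RDF/XML, and uses plain Python rather than an RDF library. The URIs and the representsConcept property are hypothetical; skos:prefLabel is the actual SKOS property for a preferred label.

```python
from dataclasses import dataclass

# Hypothetical namespaces: a sponsor MDR and a separately maintained terminology.
MDR = "http://example.org/mdr/"
TERM = "http://example.org/terminology/"
REPRESENTS = MDR + "representsConcept"  # hypothetical property
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"  # SKOS property


@dataclass(frozen=True)
class Literal:
    """An RDF literal value, kept distinct from URI strings."""
    value: str


def term_to_nt(term) -> str:
    """Render a URI as <...> and a literal as an escaped, quoted N-Triples string."""
    if isinstance(term, Literal):
        escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        return f'"{escaped}"'
    return f"<{term}>"


def to_ntriples(graph: set) -> str:
    """Serialize (s, p, o) triples as N-Triples; lines are sorted for stable output."""
    lines = sorted(" ".join(term_to_nt(t) for t in triple) + " ." for triple in graph)
    return "\n".join(lines) + "\n"


def labels_for(graph: set, element: str) -> list[str]:
    """Follow element -> concept -> skos:prefLabel across the merged sources."""
    concepts = {o for (s, p, o) in graph if s == element and p == REPRESENTS}
    return sorted(o.value for (s, p, o) in graph
                  if s in concepts and p == SKOS_PREF_LABEL and isinstance(o, Literal))


element = MDR + "sdtm/VS/VSORRES_SYSBP"
concept = TERM + "SystolicBloodPressure"
sponsor_mdr = {(element, REPRESENTS, concept)}
terminology = {(concept, SKOS_PREF_LABEL, Literal("Systolic Blood Pressure"))}

# Federation: both sources use the same URI for the concept,
# so merging is a set union and the cross-reference resolves.
federated = sponsor_mdr | terminology

print(labels_for(sponsor_mdr, element))  # []
print(labels_for(federated, element))    # ['Systolic Blood Pressure']
print(to_ntriples(federated), end="")
```

The merge is a plain set union only because both sources agree on the URI for the shared concept; that agreement on identifiers, not the serialization format, is what makes the cross-reference work.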

Since an RDF knowledge base can carry its own schema and everything is represented by triples, loading it into a triple store is immediate, and the RDF knowledge base directly represents the MDR content.

Two examples show how sponsors have started to implement semantic standards and apply linked data principles:
- At Roche this is done by implementing an internally built MDR; more details follow below.
- At AstraZeneca the requirements on a commercial MDR product will include an interface to MDR content based on semantic standards and linked data principles. This is part of a larger effort called integrative informatics (i2), which is establishing the components to let a Linked Data cloud grow across AstraZeneca R&D.
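
The point that an RDF knowledge base carries its own schema can be illustrated with a toy in-memory triple store (a standard-library sketch, not a real triple store product). Schema triples (rdfs:subClassOf) and MDR content (rdf:type) are loaded together as one set, and two RDFS entailment rules, subclass transitivity (rdfs11) and type propagation (rdfs9), derive new triples from existing ones. The class names borrow ISO 11179 terminology (administered item, data element), but the URIs are hypothetical and not part of any published OWL rendering of that standard.

```python
from typing import Optional

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/mdr/"  # hypothetical namespace

Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory triple store: schema and data are both just triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def load(self, triples: set[Triple]) -> None:
        # Loading is a plain set union: no separate schema definition step.
        self.triples |= triples

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return triples matching a pattern; None acts as a wildcard."""
        return [t for t in self.triples
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)]

    def infer(self) -> int:
        """Apply RDFS rules rdfs11 and rdfs9 until no new triples appear.
        Returns the number of derived triples."""
        before = len(self.triples)
        while True:
            derived: set[Triple] = set()
            for (a, _, b) in self.match(p=RDFS_SUBCLASS_OF):
                # rdfs11: subClassOf is transitive.
                for (_, _, c) in self.match(s=b, p=RDFS_SUBCLASS_OF):
                    derived.add((a, RDFS_SUBCLASS_OF, c))
                # rdfs9: an instance of a subclass is an instance of its superclass.
                for (x, _, _) in self.match(p=RDF_TYPE, o=a):
                    derived.add((x, RDF_TYPE, b))
            if derived <= self.triples:
                return len(self.triples) - before
            self.triples |= derived


schema = {
    (EX + "DataElement", RDFS_SUBCLASS_OF, EX + "AdministeredItem"),
    (EX + "SDTMVariable", RDFS_SUBCLASS_OF, EX + "DataElement"),
}
content = {
    (EX + "VSORRES", RDF_TYPE, EX + "SDTMVariable"),
}

store = TripleStore()
store.load(schema | content)
derived = store.infer()
administered_items = sorted(
    s for (s, _, _) in store.match(p=RDF_TYPE, o=EX + "AdministeredItem"))

print(derived)             # 3
print(administered_items)  # ['http://example.org/mdr/VSORRES']
```

Because the schema travels with the data, the same infer call handles any extension loaded later, for instance a sponsor-defined subclass of DataElement, without changing code or a database schema.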


MDR Based Standards Implementation at Roche
In a first phase, Roche has successfully defined a set of clinical trial data standards based on CDISC, ISO 11179 MDR, and the W3C semantic standards, following the architecture shown earlier in Figure 1. In this implementation, the biomedical concept model has deliberately been designed as a thin layer, in anticipation that CDISC SHARE will provide this part of the stack later on. BRIDG can be added as soon as its OWL representation becomes available. The data collection and data tabulation standards cover all of safety and the Roche therapeutic areas, but are only partially based on CDASH. Data analysis standards are still in their infancy.

In a second phase, Roche built an MDR and an application infrastructure in 2011. This includes a controlled mechanism to publish the RDF stack to a triple store database, a web browser application to deliver the content to end-users, and a set of web services to provide access to other applications. The MDR includes item level versioning following ISO 11179 and is deployed in a high availability IT production environment. The next release is scheduled to include semantic search and linking from the biomedical concept model into the NCI Thesaurus. The good news for sponsors is that semantic technology has proven to work at all levels, from W3C standards to semantic toolsets such as modeling workbenches, triple store databases, and application programming interfaces (APIs).

Roche is now entering a third phase to establish MDR driven workflow automation from protocol to submission. The idea is to implement a semantic representation of the protocol and data analysis plan, and from there use the MDR content to support study build, provide data transformation services to derive SDTM mappings, and finally support the production of data analysis and submission deliverables.


References
1. To read more on knowledge systems and semantic modeling, the following books are recommended:
   - Dean Allemang and Jim Hendler. Semantic Web for the Working Ontologist. Second Edition. Morgan Kaufmann, 2011. An excellent, well-written book, specifically on the modeling aspects of RDF and OWL.
   - Christopher Walton. Agency and the Semantic Web. Oxford University Press, 2007. A broad outlook on knowledge systems and the semantic web, including more academic background on the computational aspects of the subject.
   - Dragan Gasevic, Dragan Djuric, and Vladan Devedzic. Model Driven Engineering and Ontology Development. Second Edition. Springer, 2009. Valuable insight on knowledge engineering and the relationship between the different modeling spaces.
2. A good entry page for locating the W3C standards for the semantic web, in particular the RDF, RDFS, OWL, and SKOS standards: http://www.w3.org/2001/sw/wiki/Main_Page
3. The work of the National Cancer Institute (NCI) in the area of controlled terminologies and ontology modeling: https://cabig.nci.nih.gov/concepts/EVS/
4. The National Center for Biomedical Ontology (NCBO), a great resource for biomedical ontologies and related technologies: http://www.bioontology.org/