Current labs can greatly benefit from a digital transformation.
FAIR data principles are crucial in this process.
Laying a solid data governance foundation is an invaluable long-term move.
2. Slide 2
Which experiments did we already run for this substance?
We need all analytical results for samples of this batch.
3. Slide 3
Example: Drivers of digital transformation in CMC
Evolving technology advancements
• Advanced algorithms: enabling complex simulations/in-silico experiments and insights generation
• Data as an asset: building integrated data assets from large-scale data sets and platforms along the CMC value chain
• IoT: automating collection and analysis of laboratory data at the point of data creation
• Continuous manufacturing: improving cost & quality through continuous manufacturing assets and in-process quality control
• Distributed collaboration platforms: sharing real-time data and insights with remote/3rd-party research partners
Changing customer needs
• Shared developments: sharing data & developing new approaches across companies & industries (e.g. GSK & McLaren)
• Accelerated development: continuous requirement to increase speed and improve the quality of trial supply
• New product modalities & combinations: adjusting the CMC process to specific requirements for combination products (e.g. regulations)
• Future clinical trial design: smaller local trials with near real-time supply of finished goods to patients at the point of care
• M&A readiness: being prepared to rapidly integrate M&A assets (scale-up)
Increasing regulatory requirements
• Transparency: increased requirement to provide transparency on regulatory data
• Regulatory scope: increased scope of regulatory data collection in early research
• Dossier management: structured content management system to automate dossier management and submission
• System validation: need for automated data submission/GxP validation
• PQ/CMC standardization: increasing data integrity through standardization and algorithm-driven evaluation of PQ/CMC data (KASA, an FDA initiative)
As a strategic development partner, CMC is faced with increasingly complex requirements from internal and external customers. Technological advancements represent significant challenges for CMC, but also the key to coping with increasing regulatory and customer requirements. To maintain their license to operate, CMC must address increasing regulatory (data) requirements.
CMC = Chemistry, Manufacturing and Control
4. Slide 4
Illustrative phases of CMC process [1]
(Gantt chart over an 18-month timeline; activities shown:)
• IPC and quality control (GMP)
• Engineering production drug product
• Analytical methods development
• IND filing
• Preliminary development assays & standards
• Impurity identification
• Salt screening, polymorphism screening etc. (DoE)
• Phase I (GMP) drug substance production
• Process & analytical documentation
• Qualification & transfer to QC
• Critical parameter selection & validation (QTPP)
• Safety assessment
• Clinical reference standards
• Pre-formulation development
• Drug product production
• Process & analytical documentation
• Product characterization
• Preclinical/engineering drug substance production
• Preliminary drug substance stability studies
• Tech transfer to commercial site
• Clinical formulation development
• IND filing preparation – CMC section
These trends materialize in a multitude of use cases that make it possible to significantly improve time-to-market and efficiency.
Digital R&D use-case long list*:
• Data standardization 'Allotrope'
• Reference & master data
• Digital archive
• Digital lab
• Digital analytical methods
• Late lead optimization
• In-silico first (experiments)
• Predictive stability study
• Digitized RIMS process
• Distributed collaboration
• Digital twin of drug product
• Predictive quality by design/optimal batch trajectory
• Speed-up study start-up (clinical supply)
• In-silico trials (toxicity and safety)
• Structured authoring and reporting
• Digital submissions
• […]
(Figure: selected use cases mapped onto the CMC process, with lanes for Analytical Development, Process Chemistry, Formulation Development, Drug Substance Manufacturing, Drug Product Manufacturing, and Clinical Study & Regulatory; duration in months.)
• In-silico experiments: simulating experiments digitally and using wet lab experiments only for validation/stability studies. ~30% fewer wet lab experiments while generating more insights.
• Predictive trials: correlation of pre-clinical data (e.g. stability data) and clinical data to lower the attrition rate. ~30% faster study startup.
• Structured authoring & reporting (KASA): machine-readable and semantically enriched data to automate digital submissions. ~30% faster authoring and adjusting of (post-approval) e-submissions.
• Optimal batch trajectory (quality by design): batch comparison and identification of patterns for multivariate process optimization. ~25% increase in productivity of batch throughput and quality.
• Process simulation (incl. scale-up)
• Digital lab: system integration, connecting instruments and lab applications (CDS, LIMS). ~30% increase in efficiency and improved quality.
Digital use cases will have a positive impact on R&D value drivers:
• Reduction of time-to-market: 25%
• Increase in efficiency: 30%
• Generating more novel insights: 25%
• Decrease in compliance-related costs: 20%
[1] Timeline for Key Chemistry, Manufacturing, and Control Tasks for Therapeutic Monoclonal Antibody Product Development
[2] Source: Contractpharma, Strategy&/ OSTHUS Project Experience & Analysis
6. Slide 6
Guiding Principles for Scientific Data Management and Stewardship
https://www.nature.com/articles/sdata201618 (2016)
7. Slide 7
Many initiatives: This one gives an overview and practical guidance
https://fairtoolkit.pistoiaalliance.org/
Led by Ian Harrow
8. Slide 8
Poll 2: How many data initiatives relate to FAIR inside your organization?
A. None
B. Some
C. Many
9. Slide 9
Key Observations
• FAIR is a good mindset and activates people
• Too much effort to make everything FAIR
• Difficult to measure FAIRness
• Align-o-mania
10. Slide 10
The FAIR Principles
F1: (Meta)data are assigned globally unique and persistent identifiers
F2: Data are described with rich metadata
F3: Metadata clearly and explicitly include the identifier of the data they describe
F4: (Meta)data are registered or indexed in a searchable resource
A1: (Meta)data are retrievable by their identifier using a standardized communication protocol
A1.1: The protocol is open, free and universally implementable
A1.2: The protocol allows for an authentication and authorization procedure, where necessary
A2: Metadata should be accessible even when the data is no longer available
I1: (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2: (Meta)data use vocabularies that follow the FAIR principles
I3: (Meta)data include qualified references to other (meta)data
R1: (Meta)data are richly described with a plurality of accurate and relevant attributes
R1.1: (Meta)data are released with a clear and accessible data usage license
R1.2: (Meta)data are associated with detailed provenance
R1.3: (Meta)data meet domain-relevant community standards
Good explanations at https://www.go-fair.org/fair-principles/
11. Slide 11
F1: (Meta)data are assigned globally unique and persistent identifiers
Examples:
• human polycystin-1 protein http://www.uniprot.org/uniprot/P98161
• Publication “FAIR Guiding Principles…” https://doi.org/10.1038/sdata.2016.18
• Internal product identification https://pid.yourcompany.com/products/USR48E0zaS
You separate the identification of data resources from their (current) location.
(Diagram: user applications resolve the identifier via a registry to the current location.)
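As a minimal illustration of separating identification from location, the sketch below assumes the persistent identifier is an HTTP(S) URI (such as the DOI above) and simply follows the resolver's redirects to wherever the resource currently lives:

```python
# Minimal sketch: resolving a persistent identifier to its current location.
# Assumes the identifier is an HTTP(S) URI (e.g., a DOI) whose resolver
# redirects to the current location of the resource.
import requests

def resolve_pid(pid_uri: str) -> str:
    """Follow redirects from a persistent identifier to its current location."""
    response = requests.get(pid_uri, allow_redirects=True, timeout=10)
    response.raise_for_status()
    return response.url  # current location; the PID itself never changes

if __name__ == "__main__":
    # Example identifier taken from the slide (FAIR Guiding Principles paper).
    print(resolve_pid("https://doi.org/10.1038/sdata.2016.18"))
```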
12. Slide 12
F4: (Meta)data are registered or indexed in a searchable resource
Most powerful with some high-level semantic types, e.g., product, substance, device, experiment, sample, batch…
(Diagram: a user performs a simple lookup against the searchable resource.)
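The sketch below is a hypothetical, in-memory illustration of such a searchable resource: each record carries a persistent identifier, a high-level semantic type and metadata, and can be looked up by identifier or searched by type. All identifiers and metadata values are made-up examples.

```python
# Illustrative, in-memory sketch of a searchable resource (F4): records carry a
# persistent identifier, a semantic type and metadata, and can be looked up by
# identifier or searched by type. Identifiers and metadata are made up.
from dataclasses import dataclass, field

@dataclass
class RegistryRecord:
    pid: str                                      # globally unique, persistent identifier (F1)
    semantic_type: str                            # e.g., "product", "substance", "sample", "batch"
    metadata: dict = field(default_factory=dict)  # rich metadata (F2)

class Registry:
    def __init__(self):
        self._by_pid = {}                         # pid -> RegistryRecord

    def register(self, record: RegistryRecord) -> None:
        self._by_pid[record.pid] = record         # F4: indexed in a searchable resource

    def lookup(self, pid: str) -> RegistryRecord:
        return self._by_pid[pid]                  # A1: retrievable by identifier

    def search(self, semantic_type: str, **filters):
        return [
            r for r in self._by_pid.values()
            if r.semantic_type == semantic_type
            and all(r.metadata.get(k) == v for k, v in filters.items())
        ]

registry = Registry()
registry.register(RegistryRecord("https://pid.yourcompany.com/samples/S-001",
                                 "sample", {"batch": "B-42", "substance": "SUB-7"}))
print([r.pid for r in registry.search("sample", batch="B-42")])
```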
13. Slide 13
Managing data as a product with distributed reference & master data
(Diagram: a governance node holds the registry, model repository, operating model, mappings, concepts and reference & master entities; authorized sources publish reference & master entities such as Substance and Product, and further authorized sources can be added; authorized consumers register and subscribe, with lookup and resolution provided via the registry.)
Whitepaper by H. Oberkampf, C. Senger, M. Chisholm to be published soon.
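A hypothetical sketch of the governance-node idea above (illustrative only, not the whitepaper's design): authorized sources publish reference & master entities to a registry, consumers subscribe to changes, and entities are resolved by persistent identifier.

```python
# Hypothetical sketch: authorized sources publish reference & master entities
# (e.g., Substance, Product) to a governance node; authorized consumers
# subscribe to an entity type and resolve entities by persistent identifier.
# All names and identifiers are illustrative.
from collections import defaultdict

class GovernanceNode:
    def __init__(self):
        self._entities = {}                    # registry: pid -> entity record
        self._subscribers = defaultdict(list)  # entity type -> consumer callbacks

    def publish(self, source, entity_type, pid, payload):
        """An authorized source publishes (or updates) a reference/master entity."""
        entity = {"pid": pid, "type": entity_type, "source": source, **payload}
        self._entities[pid] = entity
        for callback in self._subscribers[entity_type]:
            callback(entity)                   # notify subscribed consumers

    def subscribe(self, entity_type, callback):
        """An authorized consumer subscribes to changes of an entity type."""
        self._subscribers[entity_type].append(callback)

    def resolve(self, pid):
        """Lookup/resolution of an entity by its persistent identifier."""
        return self._entities[pid]

node = GovernanceNode()
node.subscribe("Substance", lambda e: print("consumer received:", e["pid"]))
node.publish("LIMS", "Substance",
             "https://pid.yourcompany.com/substances/SUB-1", {"name": "Example substance"})
print(node.resolve("https://pid.yourcompany.com/substances/SUB-1")["name"])
```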
14. Slide 14
Slowing down to speed up*
Invest in things that matter long term and take time to develop, so that we can move quickly and become successful over the long haul.
* McKinsey on Leadership: https://www.mckinsey.com/business-functions/organization/our-insights/the-organization-blog/slowing-down-to-speed-up
15. Slide 15
1. Identification of Key Challenges & Questions
Related to People, Processes, Data, Tech, e.g.,
• Business information needs & regulatory pressure
• Cross-functional collaboration
• Assumption that “IT will solve it”
• Data is not managed as an asset
2. Agreement on Guiding Principles & Policies
• E.g., FAIR, data accountability, standardization
• Target data and IT architecture
3. Practical Implementation with Concrete and
Coherent Actions
• Prioritization of data assets, applications and a use-case pipeline
• Creation of PoCs to validate the strategy while creating value
• Operationalization and roll-out with organizational framework
• Enterprise scaling/alignment
Moving from application-centric to a data-centric FAIR organization
requires a good data strategy
Data &
Analytics
Centric
Application
Centric
16. Slide 16
Blueprints
Re-use is faster than re-invent
Adaptable to fit your needs
Operationalize your data strategy
Scale across functions
17. Slide 17
Poll 2: Which Blueprints are relevant for you?
A. Combined process and data perspectives
B. Use-Case Pipeline
C. Data Governance Framework
D. Data Management Maturity Model
E. Data Governance Organization
F. Data Architecture
G. Reference & Master Data Management
18. Slide 18
(Diagram, simplified & illustrative: business processes such as chemistry, analytics, in-silico development, formulation, characterization, upstream and downstream run across multiple applications; business domains include stability testing, product profile, regulatory requirements, critical quality attributes and lead optimization; business entities such as compound, API, batch, experiment, protocol, test and result are connected to the applications through a mapping and alignment layer; the path leads from process analysis via data inventory, process-data mapping and gap analysis to an integrated data model. A sketch of such a mapping layer follows after the approach list below.)
Approach:
• Use-case driven
• Bridging data & business
• Agile & MVPs
• TOGAF principles
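To make the mapping and alignment layer concrete, here is an illustrative sketch (all application and field names are hypothetical) that maps application-specific records onto a canonical business entity:

```python
# Illustrative sketch of a mapping & alignment layer: application-specific
# records are translated into a canonical business entity (here, Result) via
# per-application field mappings. Application and field names are hypothetical.
CANONICAL_ENTITY = "Result"

# Per-application field mappings onto the canonical attribute names.
FIELD_MAPPINGS = {
    "lims_app": {"sample_no": "sample_id", "assay": "test", "value": "result_value"},
    "cds_app":  {"inj_sample": "sample_id", "method": "test", "peak_area": "result_value"},
}

def align(application: str, record: dict) -> dict:
    """Map one application record onto the canonical Result entity."""
    mapping = FIELD_MAPPINGS[application]
    aligned = {canonical: record[source]
               for source, canonical in mapping.items() if source in record}
    aligned["entity_type"] = CANONICAL_ENTITY
    return aligned

print(align("lims_app", {"sample_no": "S-001", "assay": "purity", "value": 99.2}))
print(align("cds_app",  {"inj_sample": "S-001", "method": "purity", "peak_area": 12345}))
```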
19. Slide 19
Use-Case Pipeline: Structured Process with focus on value and scaling
Collect across functions
Assess feasibility
Prioritize by business value
Realize an MVP
Operationalize & Scale
20. Slide 20
In a second step, more detailed information is collected regarding:
• Fit-gap of functional requirements (technical capabilities), mapped against the existing platform
• Data requirements and MVP description
• Effort estimation
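As a purely hypothetical sketch of a use-case pipeline entry, the snippet below combines business value, feasibility and effort into a simple priority score; the scoring heuristic and the numbers are illustrative, not a prescribed method.

```python
# Hypothetical sketch: use cases collected across functions are scored for
# business value and feasibility (from the fit-gap assessment) plus a rough
# effort estimate, then prioritized. Scores and the heuristic are illustrative.
from dataclasses import dataclass

@dataclass
class UseCase:
    name: str
    business_value: int  # e.g., 1 (low) .. 5 (high)
    feasibility: int     # e.g., 1 (hard) .. 5 (easy)
    effort_pd: int       # rough effort estimate in person-days

    @property
    def priority(self) -> float:
        # Simple illustrative heuristic: favor high value and feasibility, low effort.
        return (self.business_value * self.feasibility) / max(self.effort_pd, 1)

pipeline = [
    UseCase("Digital lab", 5, 3, 120),
    UseCase("Predictive stability study", 4, 2, 200),
    UseCase("Structured authoring & reporting", 5, 4, 150),
]
for uc in sorted(pipeline, key=lambda u: u.priority, reverse=True):
    print(f"{uc.name}: priority={uc.priority:.3f}")
```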
21. Slide 21
Data Governance Framework
Management defines the business strategy, Data Governance defines the data strategy, and Data Management executes the IT and business processes that touch data.
WHY: Mission; Goals, Metrics/Success Measures, Value Identification & Funding; Focus Areas
WHAT: Data Rules, Definitions & Policies to achieve the goals; Policies & Rules of Engagement; Quality Monitoring; Data Modelling; Reference & Master Data
HOW: Control Mechanisms; Decision Rights; Accountabilities; Initiatives and priority use-cases
WHO: Data Governance Organization with Data Stakeholders, Data Council and Data Stewards
WHEN: Data Management Maturity Model
• Level 1: Set up Data Council
• Level 2: Agree Data Scope
• Level 3: Capture Data Assets: Data Elements, Sources & Flows
• Level 4: Implement Data Quality & Ownership
• Level 5: Publish Data Controls
• Level 6: Review Data Controls & Transition to RTO
• Ongoing effort: Maintain Data Quality & Data Assets
Adapted from the Data Governance Institute
22. Slide 22
Main tasks per maturity level, assessed across the dimensions People, Process, Data and Technology:
• Maturity 6 (ongoing; run the organization): Maintain Data Quality & Data Assets; continuously assess gaps as producers and consumers interact
• Maturity 5: Publish Data Controls
• Maturity 4: Implement Data Quality & Ownership
• Maturity 3: Capture Data Assets: Flows, Sources & Data Elements
• Maturity 2: Agree on the Data Scope
• Maturity 1 (one-time effort): Set up Data Council
Example criteria per dimension (listed from highest to lowest maturity):
People:
• Sufficiently resourced and highly skilled
• Data citizenship established
• Federated data management
• Assigned ownership and training of data owners
• Adequately resourced and optimally skilled
• People involved in data capture are informed and trained
• Key DG stakeholders are identified
• Assigned resources for governing the data in scope
• Training plan developed, awareness created
• Organized but poorly resourced
• Basic training of council members
• Understanding of DG needs
Process:
• Continuously measured and improved
• Best practices are shared with peers and industry
• Service Level Agreements and Operational Level Agreements
• Pro-active management of regulatory changes
• Metrics are used end-to-end, including variance, prediction and quantitative analysis
• Monitoring and issue reporting
• Data cleansing procedures started
• Standard processes are consistently followed; for specific needs, standard process and policy are used
• Feedback mechanisms in place
• Planned and executed in accordance with policy
• Definition of process control needs
• Ad hoc, primarily reactive at the project level; DG is not applied across business areas
Data:
• New data domains are on-boarded as needed in agile mode
• Seen as critical to survival
• High degree of data interoperability and reuse
• Data integrity and accuracy is measured and maintained
• Data quality requirements are identified and KPIs defined
• Consistent, accurate and valid
• Clear picture of critical data elements and data flows
• Authorized data sources known
• Core data assets are available, but of poor quality
• Identification of data criticality
• Business glossary defined
• Unavailable and/or of poor quality; missing context; dependency on key experts
Technology:
• Contributes to efficiency of the capacity
• Easy to use for different user groups
• Technological fit/gap is measured and improved in agile mode
• Supports business and IT needs
• Data quality tool operational
• Supports the business needs
• Many data processing black boxes
• Technical way to access data, partially with outdated technology
• Missing operational support
• Difficult to use or frequent errors
• Lack of technology requires manual/paper-based actions
No Data Governance exists and awareness of Data Governance is missing. People, process, data and technology are not aligned
DG States: Performed Managed Defined Measured Optimized
Extension to standard DCAM from EDM Council https://edmcouncil.org/ with dimensions for people, processes, data and technology
CDE = critical data element, DG = Data Governance
Iterative Data Governance Maturity Scale
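One possible, purely illustrative way to record an assessment against this scale per dimension and roll it up; the weakest-dimension rule is an assumption for illustration, not part of the DCAM standard.

```python
# Hypothetical sketch: record a maturity assessment per dimension (1..6, per the
# scale above) and roll it up. The "overall = weakest dimension" rule is an
# illustrative choice, not prescribed by DCAM or the extension described here.
from dataclasses import dataclass

@dataclass
class MaturityAssessment:
    people: int
    process: int
    data: int
    technology: int

    @property
    def overall(self) -> int:
        # Conservative roll-up: only as mature as the weakest dimension.
        return min(self.people, self.process, self.data, self.technology)

assessment = MaturityAssessment(people=3, process=2, data=3, technology=2)
print(f"Overall data governance maturity: level {assessment.overall}")
```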
23. Slide 23
Federated Data Governance Organization
(Org chart spanning central and federated data management:)
• Central Data Management: Enterprise Data Committee, Chief Data Office, Data Architecture Council
• Federated Data Management: Divisional/Regional Data Offices, Data Councils, Working Groups / Resolution Teams
• Roles: Enterprise Data Officer, Chief Data Architect, Divisional Data Officers, other council members, Data Definition Owner, Data Content Owner, Data Architect, Data Platform Owner, Data Steward (representative), Data Consumers
• Responsibilities: Data Strategy, Data Architecture, Data Mgmt. Programs, Regulatory Engagement, Data Insights, Data Quality Mgmt., Culture & Capabilities
24. Slide 24
High-level Data Architecture Blueprint
• Enterprise Information Model: glossary, taxonomy/ontology, data models, metadata models
• Reference data and master data
• Alignment & transformation: mappings
• Industry standards: IDMP, Allotrope, QUDT, ChEBI, …
• General standards: e.g., UML, RDF, SKOS, DCTerms, DCAT, …
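As a small illustration of the general standards listed above, the sketch below uses rdflib to model a glossary concept with SKOS and DCTerms and to add an alignment mapping to an external vocabulary; the company namespace, concept, definition text and the specific ChEBI identifier are illustrative choices, not prescribed by the blueprint.

```python
# Illustrative sketch only: modeling one glossary concept with SKOS/DCTerms via
# rdflib and aligning it to an external vocabulary. Namespace, concept,
# definition text and the ChEBI identifier are hypothetical examples.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF, SKOS

EX = Namespace("https://pid.yourcompany.com/vocabulary/")  # assumed company namespace

g = Graph()
g.bind("skos", SKOS)
g.bind("dcterms", DCTERMS)
g.bind("ex", EX)

substance = EX["Substance"]
g.add((substance, RDF.type, SKOS.Concept))
g.add((substance, SKOS.prefLabel, Literal("Substance", lang="en")))
g.add((substance, SKOS.definition,
       Literal("A chemical or biological material handled along the CMC value chain.", lang="en")))
g.add((substance, DCTERMS.creator, Literal("Data Governance Office")))
# Alignment/mapping to an industry vocabulary (identifier shown for illustration).
g.add((substance, SKOS.closeMatch, URIRef("http://purl.obolibrary.org/obo/CHEBI_24431")))

print(g.serialize(format="turtle"))
```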
25. Slide 25
Building Data Models Modularized & Reusable
(Layered diagram along an axis of more detail, structure & dependencies; elements:)
• Persistent identifiers
• Selected standards
• Common semantic concepts
• Common attributes & common relationships
• Common patterns
• Extensions & perspectives
Modules are connected via alignments and mappings.
Governance across the layers: common versioning, meta model, authority, data source, information security & life-cycle.
26. Slide 26
"Virtually everything in business today is an undifferentiated commodity, except how a company manages its information. How you manage information determines whether you win or lose." – Bill Gates
Vendor-agnostic service provider for digitalization: data strategy, governance, integration and analytics, leveraging the full stack of AI.
Digital Integration Hub: provides a highly scalable enterprise platform for information lifecycle management and digital preservation.
Digital Analytics Engine: virtually integrates knowledge and data across any kind of data source.
Digital Registry for Data Governance: provides reliable and fast access to distributed reference and master data.
Disruptive Innovation at Enterprise Scale