Standard models in evidence synthesis work well in settings characterized by a large evidence base, the absence of effect modifiers, and connected networks. Handling sparse data, substantial between-study heterogeneity, and disconnected networks, however, poses challenges to researchers and requires advanced methodology.
In the absence of head-to-head studies, evidence synthesis is a well-established technique to indirectly compare novel and established interventions in various disease areas. In standard settings, the established methods for the various outcome types work well and result in realistic effect estimates. However, there are a variety of situations in which standard methods may no longer be sufficient:
- if there is only a sparse network of evidence
- if there is a large amount of between-study heterogeneity
- if the network is disconnected
Key Topics Include:
- General introduction to the objectives of conducting evidence synthesis
- Description of typical situations of “non-standard” data, including sparse networks of evidence, a large amount of between-study heterogeneity, or disconnected networks
- Advanced methods to address non-standard data, including the use of informative priors, subgroup analyses, meta-regression and multilevel network meta-regression (ML-NMR), and matching-adjusted indirect comparisons (MAICs)
- Case studies illustrating how these advanced methods of evidence synthesis are applied to real data
Evidence Synthesis for Sparse Evidence Base, Heterogeneous Studies, and Disconnected Networks
1. Evidence Synthesis for Sparse Evidence Base, Heterogeneous Studies, and Disconnected Networks
Copyright 2022. All Rights Reserved. Contact presenter for permission.
Matthias Hunger, MSc, Dr. rer. biol. hum.
Lead Epidemiologist
Global HEOR & Epidemiology
ICON
Nathan Green, PhD
Senior Research Fellow
Department of Statistical Science
University College London
Katrin Haeussler, MSc, PhD
Senior Health Economist
Global HEOR & Epidemiology
ICON
2. Agenda
– Introduction to evidence synthesis
– Heterogeneity
– Case study on choosing suitable priors
– Matching-adjusted indirect comparison (MAIC)
– Multilevel network meta-regression (ML-NMR)
4. What is evidence synthesis?
– All relevant randomized controlled trials (RCTs) of high quality in a medical area are identified through a systematic review of the literature
– These RCTs are all put together "in one melting pot"
– Statistical methodology is used to calculate a pooled treatment effect (e.g. a relative effect in terms of an odds ratio or relative risk, or an absolute effect in terms of a risk difference)
– This pooled treatment effect can help to draw conclusions on the comparative efficacy and safety of the interventions of interest
Image source: Irish Medieval Food – Pottage, Lora O'Brien (Irish Author & Guide)
5. Why do we conduct evidence synthesis?
– One RCT alone is not enough evidence to reach a final conclusion on the most effective and safest treatment
– RCTs often result in contradictory conclusions
– Quality assessment can help investigate which RCTs have the best design and are therefore most reliable
– Evidence synthesis of high-quality RCTs can help overcome contradictory conclusions of individual RCTs
Drawing by Maki Naro; source: https://slate.com/technology/2015/04/vaccines-and-autism-a-new-study-shows-no-connection.html
6. Which methods of evidence synthesis exist?
– Head-to-head studies are available comparing the experimental intervention (Tx A) to the comparator of interest (Tx B)
– → Simplest approach: pairwise meta-analysis
– Weights are assigned to individual studies based on their variance
– No head-to-head studies are available
– → More complex approaches involving indirect comparisons
– Bucher method, based on simple equations (a minimal sketch follows the figure note below)
– Network meta-analysis (NMA), based on generalized linear models
– Bayesian and frequentist approaches available
[Figure: left, a pairwise meta-analysis network of head-to-head studies of Tx A vs. Tx B (Studies A–E); right, an indirect comparison of A vs. B in which Tx A and Tx B are each compared to PBO in separate studies (Studies G–L)]
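The "simple equations" behind the Bucher method can be written down directly. Below is a minimal sketch in R, with made-up log odds ratios standing in for the two pairwise meta-analysis results (all numbers are hypothetical):

```r
# Bucher indirect comparison of Tx A vs. Tx B via the common comparator PBO
d_A_PBO  <- -0.65   # hypothetical log OR, Tx A vs. PBO
se_A_PBO <-  0.20
d_B_PBO  <- -0.40   # hypothetical log OR, Tx B vs. PBO
se_B_PBO <-  0.25

d_AB  <- d_A_PBO - d_B_PBO                 # indirect log OR, A vs. B
se_AB <- sqrt(se_A_PBO^2 + se_B_PBO^2)     # variances add (independent trials)

# Point estimate and 95% CI on the odds ratio scale
exp(d_AB + c(est = 0, lo = -1, hi = 1) * qnorm(0.975) * se_AB)
```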
7. Comparison of Bayesian and frequentist approaches
Frequentist
– Based on repeated experiments
– In a frequentist NMA, hypothesis testing takes place, and results can be interpreted as showing a statistically significant difference or the absence thereof
– The 95% CIs computed over many samples would contain the true population parameter 95% of the time; CIs cannot be interpreted in terms of probabilities
– No additional information can be included; the analysis is based solely on the observed data
– Ranking through P-score
– Can either be based on weighted regression models following Rücker or conducted by means of the Bucher method; the Bucher ITC is based on simple equations
Bayesian
– Formal combination of a prior distribution with the likelihood to obtain a posterior distribution
– Every parameter is defined as a random variable
– No hypothesis testing takes place in a Bayesian NMA; therefore, comparability of treatments can be shown directly and we do not speak of statistical significance. Treatments are deemed comparable, or one treatment is favorable over another
– Interpretation of 95% credible intervals is not based on repeated experimentation and can therefore be stated as: with 95% probability, a certain value lies within the credible interval
– Additional information can be included in the priors to strengthen the evidence base
– Ranking through SUCRA
– Probabilities of each treatment being better than each comparator can be estimated (see the sketch below)
– Following NICE DSU guidelines
Hackenberger B.K.: Bayes or not Bayes, is this the question? Croat Med J 2019;60(1):50-52.
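As a sketch of how ranking probabilities fall out of a Bayesian analysis, the snippet below computes P(best) and SUCRA from a matrix of posterior draws; the draws here are simulated stand-ins, not output from a real NMA:

```r
set.seed(1)
# Posterior draws of relative effects vs. a reference (lower = better);
# in practice these come from the MCMC output of the NMA
draws <- cbind(ref = 0,
               txA = rnorm(4000, -0.5, 0.2),
               txB = rnorm(4000, -0.3, 0.3))

ranks  <- t(apply(draws, 1, rank))      # rank of each treatment per draw
p_best <- colMeans(ranks == 1)          # probability of being ranked first

# SUCRA: average of the cumulative rank probabilities
K <- ncol(draws)
rank_prob <- sapply(1:K, function(k) colMeans(ranks == k))  # treatments x ranks
sucra <- apply(rank_prob, 1, function(p) sum(cumsum(p[1:(K - 1)])) / (K - 1))
round(rbind(p_best, sucra), 3)
```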
10. Similarity assumption
– The key assumption for indirect treatment comparisons is the similarity assumption
– Similarity means that one would expect the relative effect of A vs. PBO to remain unchanged if the study was conducted under the conditions of the B vs. PBO study (and vice versa)
– Similarity means that the pairs of trials analyzed are comparable regarding potential effect modifiers
12. Feasibility of NMA
Do the patients in all trials match the target population for decision/inference?
– Yes, all trials → combine in NMA
– No, only a subset → are there reasons to think that there are differences in treatment effect?
– No → combine in NMA
– Yes → three options: (1) restrict the NMA to the relevant trials; (2) consider subgroup analysis / meta-regression; (3) use methods addressing effect modification (MAIC, ML-NMR)
13. Networks we like… and do not like…
[Figure: three example evidence networks – a large connected network, a sparse network, and a disconnected network, each built from nodes linked by Studies 1–13 and A–Z]
Large network of evidence
– Standard case
– NICE DSU guidelines can be followed straightforwardly
Sparse network of evidence
– Informative priors could be investigated
– Different model assumptions in Bayesian and frequentist approaches can result in different degrees of confidence in the results
– Bayesian model: crucial to elicit suitable priors
– Frequentist model: no priors used, yet not as flexible
Disconnected network of evidence
– MAIC would be a possibility to compare to a disconnected single-arm study
– If effect modification is a concern: MAIC, ML-NMR
15. Non-informative priors on between-study standard deviation τ – uniform priors
– As per NICE DSU, standard non-informative priors on the between-study standard deviation are usually uniform(0,5) or uniform(0,2)
– With these priors, the highest posterior density of the SD is usually at low values around 0–0.5, with a long tail of the distribution
– These are considered to be non-informative and can therefore lead to unrealistically wide credible interval bounds in the case of a sparse evidence base
– A possible alternative would be to use so-called "weakly informative" priors instead
– Commonly, a variety of half-normal and gamma distributions are used in the literature (see the sketch below for how these choices differ)
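A quick way to see the difference between these choices is to compare how much prior probability each places on large heterogeneity; the half-normal scale of 0.15 below is only an illustrative value:

```r
# Prior probability that tau exceeds 0.5 ("substantial" heterogeneity)
sigma <- 0.15                                  # illustrative half-normal scale
p_hn  <- 2 * (1 - pnorm(0.5, 0, sigma))        # HN(0, 0.15^2): ~0.001
p_uni <- 1 - punif(0.5, 0, 5)                  # uniform(0, 5): 0.9

# Upper 95% prior quantile of tau under each prior
q_hn  <- sigma * qnorm(0.975)                  # half-normal: ~0.29
q_uni <- qunif(0.95, 0, 5)                     # uniform: 4.75
round(c(p_hn = p_hn, p_uni = p_uni, q_hn = q_hn, q_uni = q_uni), 3)
```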
Ren S., Oakley J.E., Stevens J.W.: Incorporating Genuine Prior Information about Between-Study Heterogeneity in Random Effects Pairwise and Network Meta-analyses. Medical Decision Making 2018, 38(4):531-542.
16. Informative priors on between-study standard deviation τ
– Informative priors on the between-study heterogeneity parameter τ are assumed to follow t and log-normal distributions for continuous and binary outcomes, respectively
– If the number of trials is small (4 studies or fewer), the "default" practice of using non-informative priors for the between-trial standard deviation (τ ~ Uni(0,5)) is likely to result in posteriors that allow for unrealistically high levels of heterogeneity (i.e. extremely wide 95% credible intervals)
– The solution advised by NICE is to use informative priors, based on expert opinion or on meta-epidemiological data (NICE DSU TSD3, p. 16)
– Rhodes et al. present informative priors obtained through predictive distributions for continuous outcomes. The authors found that heterogeneity was substantially lower in meta-analyses related to respiratory diseases and therefore report separate results for this area
– Turner et al. present informative priors for binary outcomes
Rhodes et al.: Predictive distributions were developed for the extent of heterogeneity in meta-analyses of continuous outcome data. J Clin Epidemiol. 2015; 68(1):52-60.
Turner et al.: Predictive distributions for between-study heterogeneity and simple methods for their application in Bayesian meta-analysis. Statist. Med. 2015, 34 984-998.
17. Weakly informative priors on between-study standard deviation τ – half-normal priors
"In practice, the half-normal distribution is quite commonly used; the reasons for its popularity are probably its simple and familiar form, its near-uniform behavior at the origin along with a reasonably quickly decaying upper tail, as well as considerations of numerical stability."
Röver C., Bender R., Dias S., Schmid C.H., Schmidli H., Sturtz S., Weber S., Friede T.: On weakly informative prior distributions for the heterogeneity parameter in Bayesian random-effects meta-analysis. Res Syn Meth. 2021;12:448-474.
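For concreteness, the three prior choices discussed here could be written as the following JAGS fragments (shown as an R string; the likelihood block of the random-effects model is omitted, and the informative values are those reported on the results slide below):

```r
prior_fragment <- "
  ## (a) Non-informative uniform prior
  # tau ~ dunif(0, 5)

  ## (b) Weakly informative half-normal prior, HN(0, 0.15^2)
  # tau ~ dnorm(0, pow(0.15, -2)) T(0,)     # T(0,) truncates at zero

  ## (c) Informative t prior on log(tau^2), t(-5.18, 2.47^2, 5)
  log.tau2 ~ dt(-5.18, pow(2.47, -2), 5)    # JAGS dt() takes a precision
  tau <- sqrt(exp(log.tau2))
"
```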
18. Network of evidence on count outcome
[Figure: evidence network connecting Intervention X with Comparators A–I via Studies 1–7]
19. Results using different priors

Prior on heterogeneity | Model fit (DIC) | Between-study SD, 95% CrI
Non-informative uniform(0,5) on τ | -6.71 | 0.39 [0.01; 2.25]
Informative t(-5.18, 2.47², 5) on log(τ²) | -7.23 | 0.073 [0.003; 0.523]
Weakly informative HN(0, 0.15²) on τ | -5.88 | 0.38 [0.04; 1.60]
Weakly informative HN(0, 0.16²) on τ | -5.65 | 0.13 [0.01; 0.35]

[Forest plots: IR ratio (CrI) of Intervention X vs. each comparator under each prior; left of the null favors Intervention X, right favors the comparator]
20. Results on weakly informative HN(0, τ²) prior – threshold analysis

τ | Intervention X favored vs.
0.11 | Comparators G, E, D, C, B
0.12 | Comparators G, D, B
0.13 | Comparators G, D, B
0.14 | Comparators G, B
0.15 | Comparator G
0.16 | – (none)
21. Justification of prior selection
– A suitable prior on the between-study heterogeneity τ has to be chosen with care, and the selection has to be justified
– One possibility is to focus on informative priors from the literature, such as Turner et al. or Rhodes et al.
– If a particular half-normal, gamma, exponential, … prior is to be selected, this has to be justified by assessing relative and absolute model fit
– Relative model fit in terms of the Deviance Information Criterion (DIC)
– Absolute model fit to the data in terms of posterior predictive checks
22. Posterior predictive checks
– Assessing model fit to the data
– How much do posterior inferences change when other prior distributions are used?
– If the model fits, replicated data generated under the model should look similar to the observed data: the observed data should look plausible under the posterior predictive distribution
– An observed discrepancy can be due to model misfit or chance
– Simulated values are drawn from the posterior predictive distribution of replicated data and compared to the observed data
– Any systematic differences between the simulations and the data indicate potential failings of the model
– Graphical posterior predictive checks
– The data are displayed alongside simulated data from the fitted model, and systematic discrepancies between real and simulated data are searched for
– All data can be displayed directly
– Data summaries or parameter inferences can be displayed in the case of large datasets
– Graphs of residuals or other measures of discrepancy between model and data can be shown
– Numerical posterior predictive checks (see the sketch below)
– Specify a test quantity and an appropriate predictive distribution
– The discrepancy between test quantities can be summarized by a p-value
– If the p-value is in a reasonable range between 0.05 and 0.95, the model fit to the data is deemed acceptable
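A numerical check of this kind can be sketched in a few lines of R; everything below (arm sizes, event counts, and the stand-in posterior draws) is hypothetical:

```r
set.seed(1)
n     <- c(120, 115, 130, 98)    # arm sizes
r_obs <- c(30, 12, 41, 20)       # observed event counts
# Stand-in for posterior draws of the arm-level event probabilities
p_draws <- matrix(rbeta(4000 * length(n), 2, 6), ncol = length(n))

# Replicated data sets from the posterior predictive distribution
r_rep <- apply(p_draws, 1, function(p) rbinom(length(n), n, p))

# Test quantity: total number of events; p-value near 0 or 1 signals misfit
T_obs <- sum(r_obs)
T_rep <- colSums(r_rep)
mean(T_rep >= T_obs)             # posterior predictive p-value
```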
24. Matching-adjusted indirect comparisons (i)
– Matching-adjusted indirect comparisons (MAIC) can overcome limitations of classical ITCs in the following situations:
– Important effect modifiers (such as disease severity) bias the indirect comparison of A vs. B (violation of the similarity assumption)
AND/OR
– Treatments of interest A and B cannot be connected through a common comparator
[Figure: example evidence networks illustrating the two situations, including a disconnected single-arm study (Studies 1–13, A–W)]
25. Matching-adjusted indirect comparisons (ii)
– MAICs can adjust for the effect of treatment effect modifiers in an anchored indirect comparison
– MAICs can also remove some of the biases that unadjusted (naïve) direct comparisons of outcomes can have when a common comparator arm is missing (unanchored comparison)
– MAICs require individual patient data (IPD) from clinical trials for one treatment (typically the company's own treatment), but only published, aggregate data on baseline characteristics and outcomes for the comparator treatment
28. How does MAIC work? – Steps in a nutshell
– Data collection: individual patient data for treatment A; published aggregate data for treatment B
– Matching criteria: select patient and disease characteristics known as effect modifiers (and prognostic variables)
– Matching: apply weighting to patients receiving treatment A to match the characteristics of patients receiving treatment B
– Recalculating outcomes: compare weighted outcomes for treatment A to observed outcomes for treatment B
29. Matching criteria
– The choice of variables to be matched/weighted on should be carefully considered:
– Including too many variables will reduce precision (by reducing the effective sample size)
– Failure to include relevant variables will result in bias
– For anchored comparisons: all effect modifiers, but no purely prognostic variables
– For unanchored comparisons: all effect modifiers and prognostic variables
– Evidence that a variable is an effect modifier/prognostic variable for the outcome of interest should be based on quantitative evidence, expert opinion, or systematic literature reviews
– Conduct sensitivity analyses using different sets of matching variables
30. Matching
– Matching is accomplished by re-weighting patients in the IPD trial by their odds of having been enrolled in the comparator trial
– The approach is similar to propensity score weighting and is performed for all the selected matching criteria simultaneously
– For anchored comparisons, matching is also performed for the placebo arms; it is then possible to compare relative treatment effects (vs. placebo) between A and B
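A minimal sketch of this weighting step in base R, following the usual method-of-moments approach (Signorovitch-style); the IPD and the aggregate baseline means below are simulated placeholders:

```r
set.seed(1)
ipd <- data.frame(age = rnorm(300, 52, 10), male = rbinom(300, 1, 0.6))
agd_means <- c(age = 49, male = 0.45)    # published baseline of the B trial

# Center IPD covariates at the aggregate means; weights w_i = exp(x_i' alpha)
# are chosen so that the weighted IPD means match the aggregate means
X <- sweep(as.matrix(ipd), 2, agd_means)
Q <- function(alpha) sum(exp(X %*% alpha))       # convex objective
fit <- optim(rep(0, ncol(X)), Q, method = "BFGS")
w <- as.vector(exp(X %*% fit$par))

# Balance check: weighted means should reproduce the aggregate means
colSums(w * as.matrix(ipd)) / sum(w)
```

Matching on standard deviations as well would add squared covariate terms to X in the same way.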
31. Recalculating outcomes
– Weighted outcomes can be calculated for any statistic of the outcome of interest
– Ideally, statistical methods that take into account uncertainty around the estimated weights (such as generalized estimating equations) should be used
– After matching, the effective sample size (ESS) can be calculated
– If the populations were balanced, each patient in the index trial would get a weight close to 1 and the effective sample size would be similar to the original sample size
– A low effective sample size may occur when the populations differ substantially in one or more of the matched characteristics
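Continuing the weighting sketch from the previous slide, the ESS and a weighted outcome are one-liners (the response data are again hypothetical):

```r
# Effective sample size: shrinks as the weights become more variable
ess <- sum(w)^2 / sum(w^2)

# Weighted response rate for treatment A, to be set against the
# published rate for treatment B from the comparator trial
resp_A <- rbinom(nrow(ipd), 1, 0.55)             # hypothetical responses
p_A_wtd <- sum(w * resp_A) / sum(w)
c(ess = ess, p_A_weighted = p_A_wtd, p_B_published = 0.48)
```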
32. Recommendations by NICE DSU
A Technical Support Document (TSD) by the NICE Decision Support Unit (DSU) was released in December 2016, in which a number of key recommendations are made¹:
1. Only use unanchored comparisons when a connected network is not available
2. Anchored comparisons must demonstrate that there are effect modifiers and that there is imbalance in the effect modifiers
3. All effect modifiers, but no purely prognostic variables, should be matched in an anchored comparison
4. Unanchored comparisons must include assessments of error due to unaccounted-for covariates
5. Indirect comparisons should be carried out on the linear predictor scale
6. The target population must be explicitly stated
¹ Phillippo DM et al.: NICE DSU Technical Support Document 18: methods for population-adjusted indirect comparisons in submission to NICE. https://research-information.bris.ac.uk/en/publications/nice-dsu-technical-support-document-18-methods-for-population-adj
33. MAIC Case Study: Secukinumab in AS
– Ankylosing spondylitis (AS) is a form of arthritis causing inflammation of the spinal joints that can lead to severe, chronic pain and discomfort¹
– Secukinumab, a new antibody against interleukin 17A, has shown efficacy for up to 104 weeks in patients with active ankylosing spondylitis²
OBJECTIVE
– To compare the efficacy of secukinumab with that of adalimumab using matching-adjusted indirect comparison (MAIC) in patients with active AS, in terms of ASAS20³ and ASAS40
¹ Spondylitis Association of America. https://www.spondylitis.org/Ankylosing-Spondylitis
² Baeten D, et al., 2015 N Engl J Med 373:2534-48
³ Mapi Trust. https://eprovide.mapi-trust.org/instruments/assessment-in-ankylosing-spondylitis-response-criteria
34. Case study: Matching approach
[Figure from Maksymowych et al., presented at AMCP 2017]
35. Case study: Baseline characteristics before/after matching
[Figure from Maksymowych et al., presented at AMCP 2017]
36. Case study: Efficacy outcomes after matching (i)
– Placebo-adjusted (anchored) comparisons were feasible at week 8 and week 12
– There is no evidence that ASAS20 or ASAS40 responses differed significantly between secukinumab and adalimumab at week 12
[Figure from Maksymowych et al., presented at AMCP 2017]
37. Case study: Efficacy outcomes after matching (ii)
– As the unbiased placebo phase ended at week 12, comparisons at week 16 and week 24 were unanchored
– There was weak evidence (p=0.047) that ASAS20 responses were higher with secukinumab than with adalimumab at week 16
– At week 24, there was up to moderate evidence (p=0.017/0.012) that ASAS20 and ASAS40 responses were higher with secukinumab
39. Bias
– Constancy of relative effects: d_AB(AB) = d_AB(AC), i.e. the relative effect of A vs. B is assumed to be the same in the AB study population as in the AC study population
– The comparison is biased if there are differences in effect modifiers between studies
40. External validity of the target population for HTA decision-making
– MAIC and STC are restricted to contrasting treatments in the study B sample
– This sample may not be representative of the target population of patients eligible for study B
– It may differ from the target population of routine clinical practice in the jurisdiction
– A valid estimate of the treatment effect in one context is not necessarily valid in another
– MAIC and STC were designed for pairwise comparisons
41. What we would like
– Synthesis of treatment networks of any size
– Avoidance of aggregation bias
– Estimates produced in the target population
42. Multilevel network meta-regression (ML-NMR)
– Population-adjustment methods aim to relax this assumption (constancy of relative effects), using IPD to adjust for differences in effect modifiers between studies
– Ideally, we would have IPD for every study, but more typically we have IPD for only a subset
– ML-NMR builds on Jackson et al. (2006, 2008)
– ML-NMR synthesizes mixtures of IPD and AgD and performs population adjustment in networks of any size
1. Define the individual-level regression model
– an IPD network meta-regression
2. Average (integrate) over the aggregate study population to form the aggregate-level model
– using efficient and general numerical integration
Phillippo et al. (2020): Multilevel Network Meta-Regression for population-adjusted treatment comparisons. J R Stat Soc: A 183(3).
47. Equations
– The individual level is straightforward in many cases, e.g. a sum of Normal or Poisson outcomes
– Aggregate-level integration is easier in some cases, e.g. an identity link or discrete covariates, but in general numerical integration is used (see the schematic below)
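Schematically, with notation simplified from Phillippo et al. (2020), the two levels of ML-NMR can be written as follows (g is the link function, β₁ are prognostic effects, β₂,k are effect-modifier interactions, and γ_k is the treatment effect):

– Individual level, for a patient with covariates x in study j on treatment k:
g(θ_jk(x)) = μ_j + xᵀ(β₁ + β₂,k) + γ_k

– Aggregate level, averaging the individual-level model over the covariate distribution f_jk(x) of the aggregate study population:
θ̄_jk = ∫ g⁻¹(μ_j + xᵀ(β₁ + β₂,k) + γ_k) f_jk(x) dx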
48. Open questions
– Application to disconnected networks (unanchored scenario) is unclear
– Extension to the survival analysis setting is required
– The current implementation targets a conditional treatment effect as opposed to a marginal one
– As with STC, a "standardization" step is required for population-level reimbursement decisions
49. Software
– R software package (multinma)
– Active maintenance and development
– Freely available and easy to use (a minimal usage sketch follows)
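A minimal usage sketch, following the pattern in the multinma documentation; the data frames ipd_df and agd_df, their column names, and the covariate choices are assumptions for illustration, not taken from the talk:

```r
library(multinma)

# Combine one IPD study with aggregate (arm-level) studies
net <- combine_network(
  set_ipd(ipd_df, study = study, trt = trt, r = r),            # binary outcome
  set_agd_arm(agd_df, study = study, trt = trt, r = r, n = n)
)

# Numerical integration points over each aggregate population's covariates
net <- add_integration(net,
  age  = distr(qnorm, mean = age_mean, sd = age_sd),
  male = distr(qbern, prob = male_prop),
  n_int = 64)

# ML-NMR: individual-level model with treatment-covariate interactions,
# averaged over the aggregate populations
fit <- nma(net,
           trt_effects = "random",
           link = "logit",
           regression = ~ (age + male) * .trt,
           prior_het = half_normal(scale = 0.5))
```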
53. MAIC vs. ML-NMR – current situation

MAIC | ML-NMR
Can only remove bias when the aggregate-data population is entirely contained within the population of the IPD study | Method of integrating over the covariate distribution is more flexible and has conceptual advantages
Only applicable to two-study scenarios | Generalizable to larger treatment networks
Limited to the target population of the aggregate-data trial | Comparisons may be provided in any target population, given sufficient information on the covariate distribution
Well-established and well-known to decision-makers | Decision-makers have limited or no experience with this new method
As a weighting method, easily applicable to any outcome data, including time-to-event | Further research required to extend ML-NMR to time-to-event data
Applicable to unanchored comparisons | Not (yet) applicable to incorporating data from single-arm studies, but an extension is conceptually possible
54. Conclusion
– Handling sparse data, heterogeneity, and disconnected studies in evidence synthesis poses challenges to researchers and requires advanced methodology
– If the network is sparse, informative priors on between-study heterogeneity parameters represent an alternative to the non-informative priors used in the standard case
– In pairwise comparisons, MAICs use IPD from one trial to adjust for effect modification (in the anchored case) and population imbalance (in the unanchored case)
– ML-NMR is a novel method and a direct extension of the standard network meta-analysis framework, which uses IPD to adjust for differences in effect modifiers between studies