| Title: | Bayesian Evaluation, Analysis, and Simulation Software Tools for Trials |
|---|---|
| Description: | Bayesian dynamic borrowing with covariate adjustment via inverse probability weighting for simulations and data analyses in clinical trials. This makes it easy to use propensity score methods to balance covariate distributions between external and internal data. This methodology based on Psioda et al (2025) <doi:10.1080/10543406.2025.2489285>. |
| Authors: | Christina Fillmore [aut, cre] (ORCID: <https://orcid.org/0000-0003-0595-2302>), Nate Bean [aut] (ORCID: <https://orcid.org/0000-0001-9946-0119>), Abi Terry [aut], Ben Arancibia [aut], GlaxoSmithKline Research & Development Limited [cph, fnd], Trustees of Columbia University [cph] (R/stanmodels.R, configure, configure.win) |
| Maintainer: | Christina Fillmore <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.0.3.9000 |
| Built: | 2026-05-21 09:49:04 UTC |
| Source: | https://github.com/gsk-biostatistics/beastt |
Inverse Probability Weighted Bayesian Dynamic Borrowing
Stan Development Team (NA). RStan: the R interface to Stan. R package version 2.32.3. https://mc-stan.org
Converts a multivariate normal distribution for Weibull parameters (or a mixture of these distributions) into an approximate beta distribution for the survival probability at a specific time point. This is particularly useful for visualizing survival probabilities in sweet spot plots
approx_mvn_at_time(x, time)approx_mvn_at_time(x, time)
x |
A vector of distributional objects that must be either multivariate normal distributions or mixtures of multivariate normal distributions. For Weibull models, these represent distributions of the log(shape) and log(scale) parameters. |
time |
A numeric value specifying survival time at which to calculate the survival probability. |
The function performs the following steps:
For each multivariate normal distribution, it generates 10,000 samples of the Weibull parameters
Calculates the corresponding survival probabilities at the specified time using the Weibull survival function
Fits a beta distribution to match the mean and variance of these survival probabilities
For mixture distributions, it performs this approximation for each component and creates a new mixture with the same weights
The conversion uses the relationship between Weibull parameters and survival probability: S(t) = exp(-(t*exp(log_scale))^exp(log_shape)).
A vector of beta distributional (or mixture of beta distributional) objects approximating the survival probabilities at the specified time point. If the input is a mixture distribution, the output will be a mixture of beta distributions with the same weights.
library(distributional) # Create a multivariate normal distribution for Weibull parameters # (log(shape), log(scale)) mvn_dist <- dist_multivariate_normal( mu = list(c(0, -1)), # log(shape) = 0, log(scale) = -1 sigma = list(matrix(c(0.1, 0, 0, 0.1), nrow = 2)) ) # Approximate as beta distribution for survival at time t=12 beta_approx <- approx_mvn_at_time(mvn_dist, time = 12)library(distributional) # Create a multivariate normal distribution for Weibull parameters # (log(shape), log(scale)) mvn_dist <- dist_multivariate_normal( mu = list(c(0, -1)), # log(shape) = 0, log(scale) = -1 sigma = list(matrix(c(0.1, 0, 0, 0.1), nrow = 2)) ) # Approximate as beta distribution for survival at time t=12 beta_approx <- approx_mvn_at_time(mvn_dist, time = 12)
Compute a single "average" distribution from a vector of distributional objects. This function calculates the mean of each hyperparameter across all input distributions and returns a new distributional object of the same family with these averaged hyperparameters.
avg_dist(x)avg_dist(x)
x |
A vector of distributional objects of the same family (beta, normal, multivariate normal, or mixture). |
The function supports four distribution families:
Beta distributions: Averages the shape1 and shape2 hyperparameters
Normal distributions: Averages the mean and standard deviation hyperparameters
Multivariate normal distributions: Averages the location vectors and covariance matrices
Mixture distributions: Same as above for each distribution type, where averaging is done by component. Also averages the mixture weight.
For multivariate normal distributions, both the location vector and covariance matrix are averaged element-wise.
A single distributional object of the same family as the input, with hyperparameters set equal to the average of all input distribution hyperparameters.
library(distributional) # Beta distributions beta_dists <- c( dist_beta(shape1 = 2, shape2 = 5), dist_beta(shape1 = 3, shape2 = 3), dist_beta(shape1 = 4, shape2 = 2) ) avg_dist(beta_dists) |> parameters() # Normal distributions norm_dists <- c( dist_normal(mu = 0, sigma = 1), dist_normal(mu = 2, sigma = 2), dist_normal(mu = 4, sigma = 3) ) avg_dist(norm_dists) |> parameters() # Multivariate normal distributions mvn_dists <- c( dist_multivariate_normal(mu = list(c(0, 0)), sigma = list(matrix(c(1, 0, 0, 1), nrow = 2))), dist_multivariate_normal(mu = list(c(1, 1)), sigma = list(matrix(c(2, 0, 0, 2), nrow = 2))) ) avg_dist(mvn_dists) |> parameters()library(distributional) # Beta distributions beta_dists <- c( dist_beta(shape1 = 2, shape2 = 5), dist_beta(shape1 = 3, shape2 = 3), dist_beta(shape1 = 4, shape2 = 2) ) avg_dist(beta_dists) |> parameters() # Normal distributions norm_dists <- c( dist_normal(mu = 0, sigma = 1), dist_normal(mu = 2, sigma = 2), dist_normal(mu = 4, sigma = 3) ) avg_dist(norm_dists) |> parameters() # Multivariate normal distributions mvn_dists <- c( dist_multivariate_normal(mu = list(c(0, 0)), sigma = list(matrix(c(1, 0, 0, 1), nrow = 2))), dist_multivariate_normal(mu = list(c(1, 1)), sigma = list(matrix(c(2, 0, 0, 2), nrow = 2))) ) avg_dist(mvn_dists) |> parameters()
This is an example of output from a simulation study that investigates the
operating characteristics of inverse probability weighted Bayesian dynamic
borrowing for the case with a binary outcome. This output was generated
based on the binary simulation template. For this simulation study, only the
degree of covariate imbalance (as indicated by population) and the
marginal treatment effect were varied.
binary_sim_dfbinary_sim_df
binary_sim_df A data frame with 255 rows and 6 columns:Populations defined by different covariate imbalances
Marginal treatment effect
True control response rate on the marginal scale
Probability of rejecting the null hypothesis in the case with borrowing
Probability of rejecting the null hypothesis without borrowing
Vector of power priors (or some other informative prior distribution for the control marginal parameter of interest based on the external data) as distributional objects
Bootstrap Covariate Data
bootstrap_cov( external_dat, n, imbal_var = NULL, imbal_prop = NULL, ref_val = 0 )bootstrap_cov( external_dat, n, imbal_var = NULL, imbal_prop = NULL, ref_val = 0 )
external_dat |
Data frame of the external data from which to bootstrap covariate vectors |
n |
Number of rows in the output dataset |
imbal_var |
Optional variable indicating which covariate's distribution
should be altered to incorporate imbalance compared to the external data.
If left |
imbal_prop |
Optional imbalance proportion, required if an imbalance variable is specified. This defines the proportion of individuals with the reference value of the imbalance variable in the returned dataset. This can either be a single proportion or a vector of proportions, in which case a list of datasets is returned. |
ref_val |
Optional value corresponding to the reference level of the binary imbalance variable, if specified |
Covariate data can be generated for n individuals enrolled in
the internal trial by bootstrap sampling entire covariate vectors from the
external data, thus preserving the correlation between the covariates. If
both imbal_var = NULL and imbal_prop = NULL, the function returns
a single data frame in which the distributions of each covariate align
with the covariate distributions from the external data (i.e., balanced
covariate distributions across the two trials). Alternatively, covariate
imbalance can be incorporated into the generated sample with respect to a
binary covariate (imbal_var) such that a specified proportion
(imbal_prop) of individuals in the resulting sample will have the
reference level (ref_val) of this imbalance covariate. In this case,
stratified bootstrap sampling is employed with the imbalance covariate as
the stratification factor.
Multiple samples with varying degrees of imbalance can be generated
simultaneously by defining imbal_prop to be a vector of values. The
function then returns a list of data frames with a length equal to the
number of specified imbalance proportions.
Data frame with the same number of columns as the external data frame
and n number of rows (if the length of imbal_prop is 0 or 1); otherwise,
a list of data frames with a length equal to that of imbal_prop
# Return one data frame with covariate distributions similar to external data samp_balance <- bootstrap_cov(ex_binary_df, n = 1000) # Return a list of two data frames that incorporate imbalance w.r.t. covariate 2 samp_imbalance <- bootstrap_cov(ex_binary_df, n = 1000, imbal_var = cov2, imbal_prop = c(0.25, 0.5), ref_val = 0)# Return one data frame with covariate distributions similar to external data samp_balance <- bootstrap_cov(ex_binary_df, n = 1000) # Return a list of two data frames that incorporate imbalance w.r.t. covariate 2 samp_imbalance <- bootstrap_cov(ex_binary_df, n = 1000, imbal_var = cov2, imbal_prop = c(0.25, 0.5), ref_val = 0)
In order to properly generate binary response data for the internal trial as part of a simulation study that investigates inverse probability weighting, we need to translate the desired marginal drift and treatment effect to the corresponding conditional drift and treatment effect that can then be added into a binary outcome model (e.g., logistic regression model) used to simulate response data.
calc_cond_binary(population, glm, marg_drift, marg_trt_eff)calc_cond_binary(population, glm, marg_drift, marg_trt_eff)
population |
A very large data frame (e.g., number of rows |
glm |
Logistic regression model object fit using the external data |
marg_drift |
Vector of marginal drift values |
marg_trt_eff |
Vector of marginal treatment effect values |
In simulation studies that investigate the properties of inverse probability weighted Bayesian dynamic borrowing, scenarios should be considered in which the underlying response rates for the internal and external control populations differ by varying amounts due to unmeasured confounding (i.e., drift, where positive values indicate a higher response rate for the internal population). While values of drift and treatment effect (i.e., risk difference) can be defined on the marginal scale for simulation studies, we must first convert these values to the conditional scale and then include these terms, along with covariates, in a logistic regression outcome model when generating response data for the internal arms. Doing so allows us to assume a relationship between the covariates and the response variable while properly accounting for drift and treatment effect.
To identify the conditional drift and treatment effect that correspond to
specified values of marginal drift and treatment effect, we first bootstrap
covariate vectors from the external data (e.g., ) to
construct a "population" that represents both the internal trial
(possibly incorporating intentional covariate imbalance) and the external
trial after standardizing it to match the covariate distributions
of the internal trial (allowing us to control for measured confounding
from potential imbalance in the covariate distributions). Measured
confounding can be incorporated into the data generation by bootstrapping
a very large data frame (population) in which the distribution of at
least one covariate is intentionally varied from that of the external data;
additional unmeasured drift can be incorporated through the
translation of specified marginal values (marg_drift) to conditional
values.
Let and denote the marginal and conditional drift,
respectively. For a specified value of , we can identify the
corresponding as the value that, when added as an additional
term in the logistic regression model (i.e., change in the intercept) for
each individual in the population, increases/decreases the
population-averaged conditional probabilities of response by an amount
approximately equal to . That is, the optimal
minimizes
where is the vector of regression coefficients
from the logistic regression model (glm) fit to the external control data
(assumed here to be the "true" covariate effects when generating response
data) and is a vector of covariates (including an
intercept term) from the bootstrapped population of size . In the
formula above, the first and second terms correspond to the
population-averaged conditional probabilities (i.e., the marginal response
rates) of the internal control population with drift and the external
control population (with covariate distributions standardized to match the
internal trial), respectively.
If we now denote the marginal and conditional treatment effect by
and , respectively, we can use a similar process
to identify the optimal that approximately corresponds to the
specified value of , which is done by minimizing the following:
where the first term is the average of the conditional probabilities of response (i.e., the marginal response rate) of the internal treated population.
See here for a simulation example with a binary outcome.
tibble of all combinations of the marginal drift and treatment effect. For each row the conditional drift and treatment effect has been calculated as well as the true response rates for the control and treated populations.
library(dplyr) # Model "true" regression coefficients using the external data logit_mod <- glm(y ~ cov1 + cov2 + cov3 + cov4, data = ex_binary_df, family = binomial) # Bootstrap internal control "population" with imbalance w.r.t. covariate 2 pop_int_ctrl <- bootstrap_cov(ex_binary_df, n = 100000, imbal_var = cov2, imbal_prop = 0.25, ref_val = 0) |> select(-subjid, -y) # keep only covariate columns # Convert the marginal drift and treatment effects to conditional calc_cond_binary(population = pop_int_ctrl, glm = logit_mod, marg_drift = c(-.1, 0, .1), marg_trt_eff = c(0, .15))library(dplyr) # Model "true" regression coefficients using the external data logit_mod <- glm(y ~ cov1 + cov2 + cov3 + cov4, data = ex_binary_df, family = binomial) # Bootstrap internal control "population" with imbalance w.r.t. covariate 2 pop_int_ctrl <- bootstrap_cov(ex_binary_df, n = 100000, imbal_var = cov2, imbal_prop = 0.25, ref_val = 0) |> select(-subjid, -y) # keep only covariate columns # Convert the marginal drift and treatment effects to conditional calc_cond_binary(population = pop_int_ctrl, glm = logit_mod, marg_drift = c(-.1, 0, .1), marg_trt_eff = c(0, .15))
In order to properly generate time-to-event (TTE) outcome data for the internal trial as part of a simulation study that investigates inverse probability weighting, we need to translate the desired marginal drift and treatment effect to the corresponding conditional drift and treatment effect that can then be added into a TTE outcome model (e.g., Weibull proportional hazards regression model) used to simulate response data.
calc_cond_weibull( population, weibull_ph_mod, marg_drift, marg_trt_eff, analysis_time )calc_cond_weibull( population, weibull_ph_mod, marg_drift, marg_trt_eff, analysis_time )
population |
A very large data frame (e.g., number of rows |
weibull_ph_mod |
|
marg_drift |
Vector of marginal drift values |
marg_trt_eff |
Vector of marginal treatment effect values |
analysis_time |
A single time point when survival probabilities will be calculated |
In simulation studies that investigate the properties of inverse
probability weighted Bayesian dynamic borrowing, scenarios should be
considered in which the underlying survival probabilities at some
prespecified time (analysis_time) for the internal and external
control populations differ by varying amounts due to unmeasured confounding
(i.e., drift, where positive values indicate a higher survival probability
for the internal population). While values of drift and treatment effect
(i.e., difference between the survival probabilities at time for
the treated and control populations) can be defined on the marginal scale
for simulation studies, we must first convert these values to the
conditional scale and then include these terms, along with covariates, in a
Weibull proportional hazards (PH) regression outcome model when generating
time-to-event (TTE) data for the internal arms. Doing so allows us to
assume a relationship between the covariates and the response variable
while properly accounting for drift and treatment effect.
To identify the conditional drift and treatment effect that correspond to
specified values of marginal drift and treatment effect, we first bootstrap
covariate vectors from the external data (e.g., ) to
construct a "population" that represents both the internal trial
(possibly incorporating intentional covariate imbalance) and the external
trial after standardizing it to match the covariate distributions
of the internal trial (allowing us to control for measured confounding
from potential imbalance in the covariate distributions). Measured
confounding can be incorporated into the data generation by bootstrapping
a very large data frame (population) in which the distribution of at
least one covariate is intentionally varied from that of the external data;
additional unmeasured drift can be incorporated through the
translation of specified marginal values (marg_drift) to conditional
values.
Let and denote the marginal and conditional drift,
respectively. For a specified value of , we can identify the
corresponding as the value that, when added as an additional
term in the Weibull PH model survival function (i.e., additive change in
the intercept) for each individual in the population, increases/decreases
the population-averaged conditional probabilities of survival at time
by an amount approximately equal to . That is, the
optimal minimizes
where is the Weibull shape parameter,
is a vector of regression coefficients, and
is a vector of covariates (including an intercept
term) from the bootstrapped population of size . We note that
and are calculated as functions of the scale parameter
() and coefficients ()
estimated by the survreg object that was fit to the external data, and we
assume here that these estimates are the "true" shape and covariate effects
when generating response data. In the formula above, the first and second
terms correspond to the population-averaged conditional survival functions
(i.e., the marginal survival probabilities) at time for the
internal control population with drift and the external control population
(with covariate distributions standardized to match the internal trial),
respectively.
If we now denote the marginal and conditional treatment effect by
and , respectively, we can use a similar process
to identify the optimal that approximately corresponds to the
specified value of , which is done by minimizing the following:
where the first term is the average of the conditional survival functions
(i.e., the marginal survival probabilities) at time for the
internal treated population.
See here for a simulation example with a time-to-event outcome.
tibble of all combinations of the marginal drift and treatment
effect. For each row the conditional drift and treatment effect has been
calculated as well as the true marginal survival probabilities at time t
for the control and treatment populations.
library(dplyr) library(survival) # Model "true" regression coefficients using the external data weibull_ph_mod <- survreg(Surv(y, event) ~ cov1 + cov2 + cov3 + cov4, data = ex_tte_df, dist = "weibull") # Bootstrap internal control "population" with imbalance w.r.t. covariate 2 pop_int_ctrl <- bootstrap_cov(ex_tte_df, n = 100000, imbal_var = cov2, imbal_prop = 0.25, ref_val = 0) |> select(c(cov1, cov2, cov3, cov4)) # keep only covariate columns # Convert the marginal drift and treatment effects to conditional calc_cond_weibull(population = pop_int_ctrl, weibull_ph_mod, marg_drift = c(-.1, 0, .1), marg_trt_eff = c(0, .10), analysis_time = 12)library(dplyr) library(survival) # Model "true" regression coefficients using the external data weibull_ph_mod <- survreg(Surv(y, event) ~ cov1 + cov2 + cov3 + cov4, data = ex_tte_df, dist = "weibull") # Bootstrap internal control "population" with imbalance w.r.t. covariate 2 pop_int_ctrl <- bootstrap_cov(ex_tte_df, n = 100000, imbal_var = cov2, imbal_prop = 0.25, ref_val = 0) |> select(c(cov1, cov2, cov3, cov4)) # keep only covariate columns # Convert the marginal drift and treatment effects to conditional calc_cond_weibull(population = pop_int_ctrl, weibull_ph_mod, marg_drift = c(-.1, 0, .1), marg_trt_eff = c(0, .10), analysis_time = 12)
Calculate a posterior distribution that is beta (or a mixture of beta components). Only the relevant treatment arms from the internal dataset should be read in (e.g., only the control arm if constructing a posterior distribution for the control response rate).
calc_post_beta(internal_data, response, prior)calc_post_beta(internal_data, response, prior)
internal_data |
A tibble of the internal data. |
response |
Name of response variable |
prior |
A distributional object corresponding to a beta distribution or a mixture distribution of beta components |
For a given arm of an internal trial (e.g., the control arm or an
active treatment arm) of size , suppose the response data are binary
such that , . The
posterior distribution for is written as
where is the likelihood of the
response data from the internal arm and is a prior
distribution on (either a beta distribution or a mixture
distribution with an arbitrary number of beta components). The posterior
distribution for is either a beta distribution or a mixture of
beta components depending on whether the prior is a single beta
distribution or a mixture distribution.
distributional object
library(dplyr) library(distributional) calc_post_beta(internal_data = filter(int_binary_df, trt == 1), response = y, prior = dist_beta(0.5, 0.5))library(dplyr) library(distributional) calc_post_beta(internal_data = filter(int_binary_df, trt == 1), response = y, prior = dist_beta(0.5, 0.5))
Calculate a posterior distribution that is normal (or a mixture of normal components). Only the relevant treatment arms from the internal dataset should be read in (e.g., only the control arm if constructing a posterior distribution for the control mean).
calc_post_norm(internal_data, response, prior, internal_sd = NULL)calc_post_norm(internal_data, response, prior, internal_sd = NULL)
internal_data |
A tibble of the internal data. |
response |
Name of response variable |
prior |
A distributional object corresponding to a normal distribution, a t distribution, or a mixture distribution of normal and/or t components |
internal_sd |
Standard deviation of internal response data if
assumed known. It can be left as |
For a given arm of an internal trial (e.g., the control arm or an
active treatment arm) of size , suppose the response data are normally
distributed such that , .
If is assumed known, the posterior distribution for
is written as
where is the
likelihood of the response data from the internal arm and
is a prior distribution on (either a normal distribution, a
distribution, or a mixture distribution with an arbitrary number of
normal and/or components). Any components of the prior for
are approximated with a mixture of two normal distributions.
If is unknown, the marginal posterior distribution for
is instead written as
In this case, the prior for is chosen to be
such that
becomes a non-standardized distribution. This integrated likelihood
is then approximated with a mixture of two normal distributions.
If internal_sd is supplied a positive value and prior corresponds to a
single normal distribution, then the posterior distribution for
is a normal distribution. If internal_sd = NULL or if other types of prior
distributions are specified (e.g., mixture or t distribution), then the
posterior distribution is a mixture of normal distributions.
distributional object
library(distributional) library(dplyr) post_treated <- calc_post_norm(internal_data = filter(int_norm_df, trt == 1), response = y, prior = dist_normal(0.5, 10), internal_sd = 0.15)library(distributional) library(dplyr) post_treated <- calc_post_norm(internal_data = filter(int_norm_df, trt == 1), response = y, prior = dist_normal(0.5, 10), internal_sd = 0.15)
Calculate a posterior distribution for the probability of surviving past a given analysis time(s) for time-to-event data with a Weibull likelihood. Only the relevant treatment arms from the internal dataset should be read in (e.g., only the control arm if constructing a posterior distribution for the control survival probability).
calc_post_weibull(internal_data, response, event, prior, analysis_time, ...)calc_post_weibull(internal_data, response, event, prior, analysis_time, ...)
internal_data |
This can either be a propensity score object or a tibble of the internal data. |
response |
Name of response variable |
event |
Name of event indicator variable (1: event; 0: censored) |
prior |
A distributional object corresponding to a multivariate normal distribution or a mixture of 2 multivariate normals. The first element of the mean vector and the first row/column of covariance matrix correspond to the log-shape parameter, and the second element corresponds to the intercept (i.e., log-inverse-scale) parameter. |
analysis_time |
Vector of time(s) when survival probabilities will be calculated |
... |
rstan sampling option. This overrides any default options. For more
information, see |
For a given arm of an internal trial (e.g., the control arm or an
active treatment arm) of size , suppose the response data are time to event
such that , where
. Define
where is the intercept parameter of a Weibull
proportional hazards regression model. The posterior distribution for
is written as
where
is the likelihood of the response data from the internal arm with event indicator
and survival function , and is a prior
distribution on (either a multivariate normal distribution or a
mixture of two multivariate normal distributions). Note that the posterior distribution
for does not have a closed form, and thus MCMC samples
for and are drawn from the posterior. These MCMC
samples are used to construct samples from the posterior distribution
for the probability of surviving past the specified analysis time(s) for the
given arm.
To construct a posterior distribution for the treatment difference (i.e., the difference in survival probabilities at the specified analysis time), obtain MCMC samples from the posterior distributions for the survival probabilities under each arm and then subtract the control-arm samples from the treatment-arm samples.
stan posterior object
library(distributional) library(dplyr) library(rstan) mvn_prior <- dist_multivariate_normal( mu = list(c(0.3, -2.6)), sigma = list(matrix(c(1.5, 0.3, 0.3, 1.1), nrow = 2))) post_treated <- calc_post_weibull(filter(int_tte_df, trt == 1), response = y, event = event, prior = mvn_prior, analysis_time = 12, warmup = 5000, iter = 15000) # Extract MCMC samples of survival probabilities at time t=12 surv_prob_treated <- as.data.frame(extract(post_treated, pars = c("survProb")))[,1]library(distributional) library(dplyr) library(rstan) mvn_prior <- dist_multivariate_normal( mu = list(c(0.3, -2.6)), sigma = list(matrix(c(1.5, 0.3, 0.3, 1.1), nrow = 2))) post_treated <- calc_post_weibull(filter(int_tte_df, trt == 1), response = y, event = event, prior = mvn_prior, analysis_time = 12, warmup = 5000, iter = 15000) # Extract MCMC samples of survival probabilities at time t=12 surv_prob_treated <- as.data.frame(extract(post_treated, pars = c("survProb")))[,1]
Calculate a (potentially inverse probability weighted) beta power prior for the control response rate using external control data.
calc_power_prior_beta(external_data, response, prior)calc_power_prior_beta(external_data, response, prior)
external_data |
This can either be a |
response |
Name of response variable |
prior |
A beta distributional object that is the initial prior for the control response rate before the external control data are observed |
Weighted participant-level response data from the control arm of an
external study are incorporated into an inverse probability weighted (IPW)
power prior for the control response rate . When borrowing
information from an external control arm of size , the components
of the IPW power prior for are defined as follows:
with
weights :
Defining the weights to equal 1 results in a
conventional beta power prior.
Beta power prior object
Other power prior:
calc_power_prior_norm(),
calc_power_prior_weibull()
library(distributional) library(dplyr) # This function can be used directly on the data calc_power_prior_beta(external_data = ex_binary_df, response = y, prior = dist_beta(0.5, 0.5)) # Or this function can be used with a propensity score object ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) calc_power_prior_beta(ps_obj, response = y, prior = dist_beta(0.5, 0.5))library(distributional) library(dplyr) # This function can be used directly on the data calc_power_prior_beta(external_data = ex_binary_df, response = y, prior = dist_beta(0.5, 0.5)) # Or this function can be used with a propensity score object ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) calc_power_prior_beta(ps_obj, response = y, prior = dist_beta(0.5, 0.5))
Calculate a (potentially inverse probability weighted) normal power prior using external data.
calc_power_prior_norm( external_data, response, prior = NULL, external_sd = NULL )calc_power_prior_norm( external_data, response, prior = NULL, external_sd = NULL )
external_data |
This can either be a |
response |
Name of response variable |
prior |
Either |
external_sd |
Standard deviation of external response data if assumed
known. It can be left as |
Weighted participant-level response data from an external study are
incorporated into an inverse probability weighted (IPW) power prior for the
parameter of interest (e.g., the control mean if borrowing
from an external control arm). When borrowing information from an external
dataset of size , the IPW likelihood of the external response
data with weights is defined as
The prior argument should be either a distributional object with a family
type of normal or NULL, corresponding to the use of a normal initial
prior or an improper uniform initial prior (i.e., ), respectively.
The external_sd argument can be a positive value if the external standard
deviation is assumed known or left as NULL otherwise. If external_sd = NULL, then prior must be NULL to indicate the use of an improper
uniform initial prior for , and an improper prior is defined
for the unknown external standard deviation such that . The details of the IPW power prior for each
case are as follows:
external_sd = positive value ( known):With
either a proper normal or an improper uniform initial prior, the IPW
weighted power prior for is a normal distribution.
external_sd = NULL ( unknown):With improper
priors for both and , the marginal IPW weighted
power prior for after integrating over is
a non-standardized distribution.
Defining the weights to equal 1 results in a
conventional normal (or ) power prior if the external standard
deviation is known (unknown).
Normal power prior object
Other power prior:
calc_power_prior_beta(),
calc_power_prior_weibull()
library(distributional) library(dplyr) # This function can be used directly on the data calc_power_prior_norm(ex_norm_df, response = y, prior = dist_normal(0.5, 10), external_sd = 0.15) # Or this function can be used with a propensity score object ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) calc_power_prior_norm(ps_obj, response = y, prior = dist_normal(0.5, 10), external_sd = 0.15)library(distributional) library(dplyr) # This function can be used directly on the data calc_power_prior_norm(ex_norm_df, response = y, prior = dist_normal(0.5, 10), external_sd = 0.15) # Or this function can be used with a propensity score object ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) calc_power_prior_norm(ps_obj, response = y, prior = dist_normal(0.5, 10), external_sd = 0.15)
Calculate an approximate (potentially inverse probability weighted) multivariate normal power prior for the log-shape and log-inverse-scale parameters of a Weibull likelihood for external time-to-event control data.
calc_power_prior_weibull( external_data, response, event, intercept, shape, approximation = c("Laplace", "MCMC"), ... )calc_power_prior_weibull( external_data, response, event, intercept, shape, approximation = c("Laplace", "MCMC"), ... )
external_data |
This can either be a |
response |
Name of response variable |
event |
Name of event indicator variable (1: event; 0: censored) |
intercept |
Normal distributional object that is the initial prior for the intercept (i.e., log-inverse-scale) parameter |
shape |
Integer value that is the scale of the half-normal prior for the shape parameter |
approximation |
Type of approximation to be used. Either |
... |
Arguments passed to |
Weighted participant-level response data from the control arm of an
external study are incorporated into an approximated inverse probability
weighted (IPW) power prior for the parameter vector
, where
is the intercept parameter of a Weibull proportional hazards regression model
and and are the Weibull shape and scale parameters,
respectively. When borrowing information from an external dataset of size ,
the IPW likelihood of the external response data with
event indicators and weights
is defined as
The initial priors for the intercept parameter and the shape parameter
are assumed to be normal and half-normal, respectively, which can
be defined using the intercept and shape arguments.
The power prior for does not have a closed form, and
thus we approximate it via a bivariate normal distribution; i.e.,
If approximation = Laplace, then is the mode vector
of the IPW power prior and is the negative
inverse of the Hessian of the log IPW power prior evaluated at the mode. If
approximation = MCMC, then MCMC samples are obtained from the IPW power prior,
and and
are the estimated mean vector and covariance matrix of these MCMC samples.
Note that the Laplace approximation method is faster due to its use of
optimization instead of MCMC sampling.
The first element of the mean vector and the first row/column of covariance
matrix correspond to the log-shape parameter (), and the
second element corresponds to the intercept (, the log-inverse-scale)
parameter.
Multivariate Normal Distributional Object
Other power prior:
calc_power_prior_beta(),
calc_power_prior_norm()
library(distributional) library(dplyr) # This function can be used directly on the data calc_power_prior_weibull(ex_tte_df, response = y, event = event, intercept = dist_normal(0, 10), shape = 50, approximation = "Laplace") # Or this function can be used with a propensity score object ps_obj <- calc_prop_scr(internal_df = filter(int_tte_df, trt == 0), external_df = ex_tte_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) calc_power_prior_weibull(ps_obj, response = y, event = event, intercept = dist_normal(0, 10), shape = 50, approximation = "Laplace")library(distributional) library(dplyr) # This function can be used directly on the data calc_power_prior_weibull(ex_tte_df, response = y, event = event, intercept = dist_normal(0, 10), shape = 50, approximation = "Laplace") # Or this function can be used with a propensity score object ps_obj <- calc_prop_scr(internal_df = filter(int_tte_df, trt == 0), external_df = ex_tte_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) calc_power_prior_weibull(ps_obj, response = y, event = event, intercept = dist_normal(0, 10), shape = 50, approximation = "Laplace")
Calculate the propensity scores and ATT inverse probability weights for participants from internal and external datasets. Only the relevant treatment arms from each dataset should be read in (e.g., only the control arm from each dataset if creating a hybrid control arm).
calc_prop_scr(internal_df, external_df, id_col, model, ...)calc_prop_scr(internal_df, external_df, id_col, model, ...)
internal_df |
Internal dataset with one row per subject and all the variables needed to run the model |
external_df |
External dataset with one row per subject and all the variables needed to run the model |
id_col |
Name of the column in both datasets used to identify each subject. It must be the same across datasets |
model |
Model used to calculate propensity scores |
... |
Optional arguments |
For the subset of participants in both the external and internal studies for which we want to balance the covariate distributions (e.g., external control and internal control participants if constructing a hybrid control arm), we define a study-inclusion propensity score for each participant as
where denotes a vector of baseline covariates for the th
participant and denotes the indicator that the participant is
enrolled in the internal trial ( if internal,
if external). The estimated propensity score is obtained
using logistic regression.
An ATT inverse probability weight is calculated for each individual as
In a weighted estimator, data from participants in the external study
are given a weight of whereas data
from participants in the internal trial are given a weight of 1.
prop_scr_obj object, with the internal and the external data and
the propensity score and inverse probability weight calculated for each
subject.
# This can be used for both continuous and binary data library(dplyr) # Continuous calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Binary calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4)# This can be used for both continuous and binary data library(dplyr) # Continuous calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Binary calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4)
Calculate the Analysis Time Based on a Target Number of Events and/or Target Follow-up Time
calc_study_duration( study_time, observed_time, event_indicator, target_events = NULL, target_follow_up = NULL )calc_study_duration( study_time, observed_time, event_indicator, target_events = NULL, target_follow_up = NULL )
study_time |
Vector of study times (accrual time + observed time) |
observed_time |
Vector of observed times (event time or censoring time) |
event_indicator |
Vector of boolean values (TRUE/FALSE or 1/0) indicating if the observed time value is an event or censoring time |
target_events |
Target number of events, where the analysis time is
determined once this number of events is reached. Default is |
target_follow_up |
Target follow-up for each participant, where the
analysis time is determined once each participant in the risk set is
followed up for this amount of time (i.e., minimum follow-up time). Default
is |
This function calculates the analysis time for a study with a
time-to-event endpoint for which the target number of events
(target_events) and/or target follow-up time (target_follow_up) are
specified. If only target_events is specified, the analysis will occur
at the time when the target number of events has been reached. If only
target_follow_up is specified, the analysis will occur once the
last-enrolled participant who is still in the risk set has been followed up
for this amount of time. If both target_events and target_follow_up are
specified, the analysis time will be based on whichever occurs first.
Time of analysis
library(dplyr) # Determining analysis time by reaching a target number of events ex_tte_df |> mutate( analysis_time = calc_study_duration(study_time = total_time, observed_time = y, event_indicator = event, target_events = 30) ) # Determining analysis time by a target follow-up time ex_tte_df |> mutate( analysis_time = calc_study_duration(study_time = total_time, observed_time = y, event_indicator = event, target_follow_up = 12) ) # Or use both (whichever happens first) ex_tte_df |> mutate( analysis_time = calc_study_duration(study_time = total_time, observed_time = y, event_indicator = event, target_events = 30, target_follow_up = 12) )library(dplyr) # Determining analysis time by reaching a target number of events ex_tte_df |> mutate( analysis_time = calc_study_duration(study_time = total_time, observed_time = y, event_indicator = event, target_events = 30) ) # Determining analysis time by a target follow-up time ex_tte_df |> mutate( analysis_time = calc_study_duration(study_time = total_time, observed_time = y, event_indicator = event, target_follow_up = 12) ) # Or use both (whichever happens first) ex_tte_df |> mutate( analysis_time = calc_study_duration(study_time = total_time, observed_time = y, event_indicator = event, target_events = 30, target_follow_up = 12) )
This is a simulated dataset used to illustrate Bayesian dynamic borrowing in the case when borrowing from an external control arm with a binary endpoint, where the baseline covariate distributions of the internal and external data are balanced via inverse probability weighting.
ex_binary_dfex_binary_df
ex_binary_dfA data frame with 150 rows and 6 columns:
Unique subject ID
Covariate 1, which is normally distributed around 65 with a SD of 10
Covariate 2, which is binary (0 vs. 1) with about 30% of participants having level 1
Covariate 3, which is binary (0 vs. 1) with about 40% of participants having level 1
Covariate 4, which is binary (0 vs. 1) with about 50% of participants having level 1
Response, which is binary (0 vs. 1)
This is a simulated dataset used to illustrate Bayesian dynamic borrowing in the case when borrowing from an external control arm with a normal endpoint, where the baseline covariate distributions of the internal and external data are balanced via inverse probability weighting.
ex_norm_dfex_norm_df
ex_norm_dfA data frame with 150 rows and 6 columns:
Unique subject ID
Covariate 1, which is normally distributed around 50 with a SD of 10
Covariate 2, which is binary (0 vs. 1) with about 20% of participants having level 1
Covariate 3, which is binary (0 vs. 1) with about 60% of participants having level 1
Covariate 4, which is binary (0 vs. 1) with about 30% of participants having level 1
Response, which is normally distributed with a SD of 0.15
This is a simulated dataset used to illustrate Bayesian dynamic borrowing in the case when borrowing from an external control arm with a time-to-event endpoint, where the baseline covariate distributions of the internal and external data are balanced via inverse probability weighting.
ex_tte_dfex_tte_df
ex_tte_dfA data frame with 150 rows and 9 columns:
Unique subject ID
Response (observed time at which the participant either had an event or was censored)
Enrollment time
Time from study start
Event indicator (1: event; 0: censored)
Covariate 1, which is normally distributed around 65 with a SD of 10
Covariate 2, which is binary (0 vs. 1) with about 30% of participants having level 1
Covariate 3, which is binary (0 vs. 1) with about 40% of participants having level 1
Covariate 4, which is binary (0 vs. 1) with about 50% of participants having level 1
This is a simulated dataset used to illustrate Bayesian dynamic borrowing in the case when borrowing from an external control arm with a binary endpoint, where the baseline covariate distributions of the internal and external data are balanced via inverse probability weighting.
int_binary_dfint_binary_df
int_binary_dfA data frame with 160 rows and 7 columns:
Unique subject ID
Covariate 1, which is normally distributed around 62 with an sd of 8
Covariate 2, which is binary (0 vs. 1) with about 40% of participants having level 1
Covariate 3, which is binary (0 vs. 1) with about 40% of participants having level 1
Covariate 4, which is binary (0 vs. 1) with about 60% of participants having level 1
Treatment indicator, where 0 = control and 1 = active treatment
Response, which is binary (0 vs. 1)
This is a simulated dataset used to illustrate Bayesian dynamic borrowing in the case when borrowing from an external control arm with a normal endpoint, where the baseline covariate distributions of the internal and external data are balanced via inverse probability weighting.
int_norm_dfint_norm_df
int_norm_dfA data frame with 120 rows and 7 columns:
Unique subject ID
Covariate 1, which is normally distributed around 55 with a SD of 8
Covariate 2, which is binary (0 vs. 1) with about 30% of participants having level 1
Covariate 3, which is binary (0 vs. 1) with about 50% of participants having level 1
Covariate 4, which is binary (0 vs. 1) with about 30% of participants having level 1
Treatment indicator, where 0 = control and 1 = active treatment
Response, which is normally distributed with a SD of 0.15
This is a simulated dataset used to illustrate Bayesian dynamic borrowing in the case when borrowing from an external control arm with a time-to-event endpoint, where the baseline covariate distributions of the internal and external data are balanced via inverse probability weighting.
int_tte_dfint_tte_df
int_tte_dfA data frame with 160 rows and 10 columns:
Unique subject ID
Response (observed time at which the participant either had an event or was censored)
Enrollment time
Time from study start
Event indicator (1: event; 0: censored)
Treatment indicator, where 0 = control and 1 = active treatment
Covariate 1, which is normally distributed around 62 with a SD of 8
Covariate 2, which is binary (0 vs. 1) with about 40% of participants having level 1
Covariate 3, which is binary (0 vs. 1) with about 40% of participants having level 1
Covariate 4, which is binary (0 vs. 1) with about 60% of participants having level 1
Inverse Logit Function
inv_logit(x)inv_logit(x)
x |
Real number(s) to take the inverse logit of |
This function is a short hand to .
Vector of probabilities between 0 and 1
inv_logit(0.5)inv_logit(0.5)
Test If Propensity Score Object
is_prop_scr(x)is_prop_scr(x)
x |
Object to test |
Boolean
library(dplyr) x <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) is_prop_scr(x)library(dplyr) x <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) is_prop_scr(x)
Extract Means of Mixture Components
mix_means(x)mix_means(x)
x |
A mixture distributional object |
If a distributional object that is a mixture of two or more normal distributions is read in, the function will return a numeric object with the means of each normal component. If the distributional object is a mixture of two or more multivariate normal distributions, the function will return a list with the mean vectors of each multivariate normal component.
numeric or list object
library(distributional) mix_norm <- dist_mixture(comp1 = dist_normal(1, 10), comp2 = dist_normal(1.5, 12), weights = c(.5, .5)) mix_means(mix_norm)library(distributional) mix_norm <- dist_mixture(comp1 = dist_normal(1, 10), comp2 = dist_normal(1.5, 12), weights = c(.5, .5)) mix_means(mix_norm)
Extract Standard Deviations of Mixture Components
mix_sigmas(x)mix_sigmas(x)
x |
A mixture distributional object |
If a distributional object that is a mixture of two or more normal distributions is read in, the function will return a numeric object with the standard deviations of each normal component. If the distributional object is a mixture of two or more multivariate normal distributions, the function will return a list with the covariance matrices of each multivariate normal component.
numeric or list object
library(distributional) mix_norm <- dist_mixture(comp1 = dist_normal(1, 10), comp2 = dist_normal(1.5, 12), weights = c(.5, .5)) mix_sigmas(mix_norm)library(distributional) mix_norm <- dist_mixture(comp1 = dist_normal(1, 10), comp2 = dist_normal(1.5, 12), weights = c(.5, .5)) mix_sigmas(mix_norm)
Plot Distribution
plot_dist(...)plot_dist(...)
... |
Distributional object(s) to plot. When passing multiple objects naming them will change the labels in the plot, else they will use the distributional format |
ggplot object that is the density of the provided distribution
library(distributional) plot_dist(dist_normal(0, 1)) plot_dist(dist_multivariate_normal(mu = list(c(1, 2)), sigma = list(matrix(c(4, 2, 2, 3), ncol=2)))) #Plotting Multiple plot_dist(dist_normal(0, 1), dist_normal(10, 5)) plot_dist('Prior' = dist_normal(0, 1), 'Posterior' = dist_normal(10, 5))library(distributional) plot_dist(dist_normal(0, 1)) plot_dist(dist_multivariate_normal(mu = list(c(1, 2)), sigma = list(matrix(c(4, 2, 2, 3), ncol=2)))) #Plotting Multiple plot_dist(dist_normal(0, 1), dist_normal(10, 5)) plot_dist('Prior' = dist_normal(0, 1), 'Posterior' = dist_normal(10, 5))
Propensity Score Cloud Plot
prop_scr_cloud(x, trimmed_prop_scr = NULL)prop_scr_cloud(x, trimmed_prop_scr = NULL)
x |
A |
trimmed_prop_scr |
A trimmed |
ggplot object
library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) ps_obj_trimmed <- trim_ps(ps_obj, low = 0.1, high = 0.6) # Plotting the Propensity Scores prop_scr_cloud(ps_obj, trimmed_prop_scr = ps_obj_trimmed)library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) ps_obj_trimmed <- trim_ps(ps_obj, low = 0.1, high = 0.6) # Plotting the Propensity Scores prop_scr_cloud(ps_obj, trimmed_prop_scr = ps_obj_trimmed)
Plot overlapping density curves of the propensity scores for both the internal and external participants, or plot external IPWs.
prop_scr_dens( x, variable = c("propensity score", "ps", "inverse probability weight", "ipw"), ... )prop_scr_dens( x, variable = c("propensity score", "ps", "inverse probability weight", "ipw"), ... )
x |
Propensity score object |
variable |
Variable to plot. It must be either a propensity score ("ps" or "propensity score") or inverse probability weight ("ipw" or "inverse probability weight") |
... |
Optional arguments for |
ggplot object
library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Plotting the Propensity Scores prop_scr_dens(ps_obj) # Or plotting the inverse probability weights prop_scr_dens(ps_obj, variable = "ipw")library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Plotting the Propensity Scores prop_scr_dens(ps_obj) # Or plotting the inverse probability weights prop_scr_dens(ps_obj, variable = "ipw")
Plot overlapping histograms of the propensity scores for both the internal and external participants, or plot external IPWs.
prop_scr_hist( x, variable = c("propensity score", "ps", "inverse probability weight", "ipw"), ... )prop_scr_hist( x, variable = c("propensity score", "ps", "inverse probability weight", "ipw"), ... )
x |
Propensity score object |
variable |
Variable to plot. It must be either a propensity score ("ps" or "propensity score") or inverse probability weight ("ipw" or "inverse probability weight") |
... |
Optional arguments for |
ggplot object
library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Plotting the Propensity Scores prop_scr_hist(ps_obj) # Or plotting the inverse probability weights prop_scr_hist(ps_obj, variable = "ipw")library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Plotting the Propensity Scores prop_scr_hist(ps_obj) # Or plotting the inverse probability weights prop_scr_hist(ps_obj, variable = "ipw")
Plot the unadjusted and IPW-adjusted absolute standardized mean differences for each covariate.
prop_scr_love(x, reference_line = NULL, ...)prop_scr_love(x, reference_line = NULL, ...)
x |
Propensity score object |
reference_line |
Numeric value of where along the x-axis the vertical reference line should be placed |
... |
Optional options for |
ggplot object
library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Plotting the Propensity Scores prop_scr_love(ps_obj, reference_line = 0.1)library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_norm_df, trt == 0), external_df = ex_norm_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # Plotting the Propensity Scores prop_scr_love(ps_obj, reference_line = 0.1)
prop_scr objectRescale a prop_scr object
rescale_ps(x, n = NULL, scale_factor = NULL)rescale_ps(x, n = NULL, scale_factor = NULL)
x |
a |
n |
Desired sample size that the external data should effectively
contribute to the analysis of the internal trial data. This will be used to
scale the external weights if |
scale_factor |
Value to multiple all weights by. This will be used to
scale the external weights if |
a prop_scr object with rescaled weights
library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # weights in a propensity score object can be rescaled to achieve a desired # effective sample size (i.e., sum of weights) rescale_ps(ps_obj, n = 75) # Or by a predetermined factor rescale_ps(ps_obj, scale_factor = 1.5)library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) # weights in a propensity score object can be rescaled to achieve a desired # effective sample size (i.e., sum of weights) rescale_ps(ps_obj, n = 75) # Or by a predetermined factor rescale_ps(ps_obj, scale_factor = 1.5)
Adds vague normal component, where the level of vagueness is controlled by
the n parameter
robustify_mvnorm(prior, n, weights = c(0.5, 0.5))robustify_mvnorm(prior, n, weights = c(0.5, 0.5))
prior |
Multivariate Normal distributional object |
n |
Number of theoretical participants (or events, for time-to-event data) |
weights |
Vector of weights, where the first number corresponds to the informative component and the second is the vague |
In cases with a time-to-event endpoint, a robust mixture prior can be
created by adding a vague multivariate normal component to any multivariate
normal prior with mean vector and covariance matrix
. The vague component is calculated to have the
same mean vector and covariance matrix equal to
, where n is the specified number of
theoretical events.
mixture distribution
library(distributional) robustify_mvnorm( dist_multivariate_normal(mu = list(c(1, 0)), sigma = list(c(10, 5))), n = 15)library(distributional) robustify_mvnorm( dist_multivariate_normal(mu = list(c(1, 0)), sigma = list(c(10, 5))), n = 15)
Adds vague normal component, where the level of vagueness is controlled by
the n parameter
robustify_norm(prior, n, weights = c(0.5, 0.5))robustify_norm(prior, n, weights = c(0.5, 0.5))
prior |
Normal or Multivariate Normal distributional object |
n |
Number of theoretical participants (or events, for time-to-event data) |
weights |
Vector of weights, where the first number corresponds to the informative component and the second is the vague |
In cases with a normal endpoint, a robust mixture prior can be created by
adding a vague normal component to any normal prior with mean
and variance .The vague component is calculated to have the
same mean and variance equal to , where
n is the specified number of theoretical participants. If robustifying a normal
power prior that was calculated from external control data and n is defined as
the number of external control participants, and the vague component would
then correspond to one external control participant's worth of data.
mixture distribution
library(distributional) robustify_norm(dist_normal(0,1), n = 15)library(distributional) robustify_norm(dist_normal(0,1), n = 15)
Simulate Participant Accrual Times
sim_accrual(n, accrual_periods, accrual_props)sim_accrual(n, accrual_periods, accrual_props)
n |
Number of participants |
accrual_periods |
Vector of right endpoints defining the time periods of accrual, e.g., c(6,8) defines 0<=x<6, 6<=x<8. |
accrual_props |
Vector indicating the proportion of participants that are expected to be enrolled during each of the defined accrual periods. Should sum to 1, otherwise these proportions will be normalized. |
Simulate the accrual times for each participant, where
accrual_periods defines the right time points for each accrual period
(with the last element corresponding to the end of accrual), and
accrual_props defines the proportion of study participants who are
enrolled during each of these periods. The simulated accrual times for
participants within a given accrual period are assumed to be uniformly
distributed.
Vector of accrual times corresponding to when each participant enters the study
at <- sim_accrual(n = 100000, accrual_periods = c(6, 8), accrual_props = c(.5, .5)) hist(at, breaks = 100, main = "Histogram of Enrollment Times", xlab = "Enrollment Time")at <- sim_accrual(n = 100000, accrual_periods = c(6, 8), accrual_props = c(.5, .5)) hist(at, breaks = 100, main = "Histogram of Enrollment Times", xlab = "Enrollment Time")
Simulate Event Times for Each Individual from a Piecewise Constant Hazard Model
sim_pw_const_haz(n, hazard_periods = NULL, hazard_values)sim_pw_const_haz(n, hazard_periods = NULL, hazard_values)
n |
Number of individuals |
hazard_periods |
Vector of break points between time periods with separate constant hazards, e.g., c(6,8) defines [0,6), [6,8), [8, infinity). Leave as NULL if defining only one hazard period. |
hazard_values |
Vector of constant hazard values associated with the time intervals |
Simulate the event or censoring times for each participant using a
piecewise constant hazard model where each time period is defined to have
a different constant hazard. Here, hazard_periods defines the right
time points for each time period (with the exception of the last time
period which extends to infinity), and hazard_values defines the constant
hazards for each time period.
Vector of simulated times from the time-to-event distribution
tte_dat <- sim_pw_const_haz(n = 100000, hazard_periods = c(6, 8), hazard_values = c(0.1, 0.1, 0.1)) hist(tte_dat, breaks = 100, main = "Event Time Distribution", xlab = "Event Time")tte_dat <- sim_pw_const_haz(n = 100000, hazard_periods = c(6, 8), hazard_values = c(0.1, 0.1, 0.1)) hist(tte_dat, breaks = 100, main = "Event Time Distribution", xlab = "Event Time")
Simulate Event Times for Each Participant from a Weibull Proportional Hazards Regression Model
sim_weib_ph(weibull_ph_mod, samp_df, cond_drift = 0, cond_trt_effect = 0)sim_weib_ph(weibull_ph_mod, samp_df, cond_drift = 0, cond_trt_effect = 0)
weibull_ph_mod |
|
samp_df |
Data frame of covariates corresponding to the sample arm
(control or treated) for which event times should be simulated. The column
names should correspond to the covariate names in the |
cond_drift |
Optional value of the conditional drift by which the intercept in the Weibull proportional hazards regression model should be increased/decreased to incorporate the impact of unmeasurable sources of drift. Default is 0. |
cond_trt_effect |
Optional value of the conditional treatment effect by which the intercept in the Weibull proportional hazards regression model should be increased/decreased if simulating event data for a treated arm. Default is 0. |
Simulate the event times for each participant using a Weibull
proportional hazards (PH) regression model. The "true" parameter values
for the Weibull shape and the regression coefficients
are assumed to be equal to the parameter estimates
from a survreg object (weibull_ph_mod) fit using external data (note
that the Weibull shape parameter is defined as the inverse of
the scale parameter reported by survreg).
For participant , let denote the time-to-event random
variable and the
vector of covariates (row of samp_df) that correspond
to the -dimensional vector of regression coefficients
. The density function of the Weibull PH regression
model is
where . Here, and
denote the conditional drift (cond_drift) and conditional treatment
effect (cond_trt_effect), respectively, that can be calculated using
calc_cond_weibull() for desired values of the marginal drift and marginal
treatment effect.
Vector of simulated event times from a Weibull proportional hazards regression model
library(dplyr) library(survival) # Model "true" regression coefficients and shape parameter using the external data weibull_ph_mod <- survreg(Surv(y, event) ~ cov1 + cov2 + cov3 + cov4, data = ex_tte_df, dist = "weibull") # Sample covariates for internal control arm via bootstrap from external data samp_int_ctrl <- bootstrap_cov(ex_tte_df, n = 100) |> select(c(cov1, cov2, cov3, cov4)) # keep only covariate columns tte_dat <- sim_weib_ph(weibull_ph_mod, samp_df = samp_int_ctrl)library(dplyr) library(survival) # Model "true" regression coefficients and shape parameter using the external data weibull_ph_mod <- survreg(Surv(y, event) ~ cov1 + cov2 + cov3 + cov4, data = ex_tte_df, dist = "weibull") # Sample covariates for internal control arm via bootstrap from external data samp_int_ctrl <- bootstrap_cov(ex_tte_df, n = 100) |> select(c(cov1, cov2, cov3, cov4)) # keep only covariate columns tte_dat <- sim_weib_ph(weibull_ph_mod, samp_df = samp_int_ctrl)
Create visualization plots to help identify the "sweet spot" in borrowing
strategies across different simulation scenarios. For each unique scenario
defined by the combination of variables in scenario_vars, the function
produces a plot showing power, type I error, and the distribution of the
design prior for the control marginal parameter for approaches with and
without borrowing.
sweet_spot_plot( .data, scenario_vars, trt_diff, control_marg_param, h0_prob, h0_prob_no_borrowing, design_prior = NULL, highlight = TRUE )sweet_spot_plot( .data, scenario_vars, trt_diff, control_marg_param, h0_prob, h0_prob_no_borrowing, design_prior = NULL, highlight = TRUE )
.data |
A data frame containing iteration-level simulation results. |
scenario_vars |
A vector of quoted column names corresponding to variables used to define unique simulation scenarios. Each unique combination of values in these columns will generate a separate plot. |
trt_diff |
An unquoted column name representing the treatment difference. Used to identify scenarios with null effect (trt_diff = 0) for type I error calculation. |
control_marg_param |
An unquoted column name to be used as the x-axis in the plots. This is typically the control endpoint of interest on the marginal scale (e.g., control response rate). |
h0_prob |
An unquoted column name containing the probability of rejecting the null hypothesis when when borrowing external data. |
h0_prob_no_borrowing |
An unquoted column name containing the probability of rejecting the null hypothesis when not borrowing external data. |
design_prior |
An unquoted column name containing distributional objects
that represent the design prior distribution for the control marginal
parameter (e.g., posterior distribution using the external control data).
Used to aid visualization of which values of the control marginal parameter
are assumed to be plausible. Default is |
highlight |
Logical value to indicate if you want sweet spot
highlighting or not. If |
The function calculates power and type I error rates for BDB approaches
that borrow from external data (e.g., use of a robust mixture prior with positive
weight on the informative component) and an approach that does not
borrow from external data (e.g., use of a vague prior) under each scenario
and visualizes them together as a function of the underlying control marginal
parameter of interest (e.g., control response rate for binary outcomes) that
may vary as a result of drift. This helps identify the "sweet spot" where borrowing
results in higher power and lower type I error rates compared to not borrowing.
Type I error is calculated using scenarios where trt_diff equals 0, and power
is calculated for all scenarios with positive values of trt_diff.
If design_prior is non-NULL, the design prior distribution is included
in the plot to provide insight into which values of the control marginal
parameter are plausible given this assumed design prior. We note that
design_prior can represent any informative prior that potentially
incorporates the external control data (e.g., the posterior distribution of
the control marginal parameter constructed using the external data and a
vague prior). Each element of the vector corresponding to design_prior must
be a distributional object with a family equal to "beta", "normal", or
"mixture" (where each component is either "beta" or "normal"). For the
time-to-event case in which a multivariate normal prior is assumed for the
control log-shape and intercept of a Weibull proportional hazards model,
this distribution must first be translated into a univariate beta design
prior for the control survival probability at some prespecified time.
This approximation can be done using approx_mvn_at_time(). If the
design priors in the vector indicated by design_prior differ across
iterations within a given scenario (e.g., using the IPW power prior as the
iteration-specific design prior), then the average distribution will be
plotted (i.e., a distribution of the same family with the hyperparameters
averaged across iterations).
A list of ggplot objects, one for each unique scenario defined by
scenario_vars. Each plot shows:
Power curves for the cases with and without borrowing
Type I error rates for the cases with and without borrowing
Distribution of the design prior (if design_prior is specified)
Best, N., Ajimi, M., Neuenschwander, B., Saint-Hilary, G., & Wandel, S. (2024). Beyond the Classical Type I Error: Bayesian Metrics for Bayesian Designs Using Informative Priors. Statistics in Biopharmaceutical Research, 17(2), 183–196. doi:10.1080/19466315.2024.2342817
library(dplyr) # Assuming binary_sim_df is a data frame with simulation results in the shape # of binary template code plots <- sweet_spot_plot( .data = binary_sim_df, scenario_vars = c("population", "marg_trt_eff"), trt_diff = marg_trt_eff, control_marg_param = true_control_RR, h0_prob = reject_H0_yes, h0_prob_no_borrowing = no_borrowing_reject_H0_yes, design_prior = pwr_prior ) # Display the first plot plots[[1]] tte_plots <- tte_sim_df |> mutate(beta_appox = approx_mvn_at_time(mix_prior, time = 12)) |> sweet_spot_plot( scenario_vars = c("population", "marg_trt_eff"), trt_diff = marg_trt_eff, control_marg_param = true_control_surv_prob, h0_prob = reject_H0_yes, h0_prob_no_borrowing = no_borrowing_reject_H0_yes, design_prior = beta_appox ) tte_plots[[1]]library(dplyr) # Assuming binary_sim_df is a data frame with simulation results in the shape # of binary template code plots <- sweet_spot_plot( .data = binary_sim_df, scenario_vars = c("population", "marg_trt_eff"), trt_diff = marg_trt_eff, control_marg_param = true_control_RR, h0_prob = reject_H0_yes, h0_prob_no_borrowing = no_borrowing_reject_H0_yes, design_prior = pwr_prior ) # Display the first plot plots[[1]] tte_plots <- tte_sim_df |> mutate(beta_appox = approx_mvn_at_time(mix_prior, time = 12)) |> sweet_spot_plot( scenario_vars = c("population", "marg_trt_eff"), trt_diff = marg_trt_eff, control_marg_param = true_control_surv_prob, h0_prob = reject_H0_yes, h0_prob_no_borrowing = no_borrowing_reject_H0_yes, design_prior = beta_appox ) tte_plots[[1]]
Tidy a(n) prop_scr object
## S3 method for class 'prop_scr' tidy(x, ...)## S3 method for class 'prop_scr' tidy(x, ...)
x |
a |
... |
Unused, included for generic consistency only. |
A tidy tibble::tibble() summarizing the results of the propensity
score weighting. The tibble will have the id column of the external data,
an internal column to indicate all the data is external, a ps column
with the propensity scores and a weight column with the inverse
probability weights
library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) tidy(ps_obj)library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) tidy(ps_obj)
prop_scr objectTrim a prop_scr object
trim_ps(x, low = NULL, high = NULL, quantile = FALSE)trim_ps(x, low = NULL, high = NULL, quantile = FALSE)
x |
A |
low |
Low cut-off such that all participants with propensity scores less
than this value (or quantile if |
high |
High cut-off such that all participants with propensity scores
greater than this value (or quantile if |
quantile |
True/False value to determine if the cut-off values are based directly on the propensity scores (false) or their quantiles (true). By default this is false. |
This function uses R's default method of quantile calculation (type 7)
a prop_scr object with a trimmed propensity score distribution
library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) trim_ps(ps_obj, low = 0.3, high = 0.7)library(dplyr) ps_obj <- calc_prop_scr(internal_df = filter(int_binary_df, trt == 0), external_df = ex_binary_df, id_col = subjid, model = ~ cov1 + cov2 + cov3 + cov4) trim_ps(ps_obj, low = 0.3, high = 0.7)
This is an example of output from a simulation study that investigates the
operating characteristics of inverse probability weighted Bayesian dynamic
borrowing for the case with a time-to-event outcome. This output was generated
based on the time-to-event simulation template. For this simulation study, only the
degree of covariate imbalance (as indicated by population) and the
marginal treatment effect were varied.
tte_sim_dftte_sim_df
tte_sim_df A data frame with 18 rows and 7 columns:Populations defined by different covariate imbalances
Marginal treatment effect
True control survival probability at time t=12 months on the marginal scale
Probability of rejecting the null hypothesis in the case with borrowing
Probability of rejecting the null hypothesis without borrowing
Vector of IPW power priors as distributional objects
Vector of mixture priors (i.e., the robustified IPW power priors) as distributional objects