Multiple - methods for trials with multiple endpoints • drugdevelopR

Suppose we are planning a drug development program testing the superiority of an experimental treatment over a control treatment. Our drug development program consists of an exploratory phase II trial which is, in case of promising results, followed by a confirmatory phase III trial.

The drugdevelopR package enables us to optimally plan such programs using a utility-maximizing approach. To get a brief introduction, we presented a very basic example on how the package works in Introduction to planning phase II and phase III trials with drugdevelopR. Contrary to the basic setting, where only one phase II and one phase III trial were conducted, in this scenario we investigate what happens when multiple endpoints are of interest. The functions currently implemented only allow for the conduction of trials with up to two different endpoints. The drugdevelopR package enables programs where only one endpoints has to show a significant positive treatment effect or all endpoints have to show a significant positive treatment effect.

After installing the package according to the installation instructions, we can load it using the following code, and start right into the two examples:

library(drugdevelopR)
#> Loading required package: doParallel
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel

Only one endpoint has to be significant

Suppose we are developing a new tumor treatment, exper. The two endpoints we consider interesting are overall survival (OS) and progression-free survival (PFS). These are time-to-event outcome variables. Therefore, we will use the function optimal_multiple_tte, which calculates optimal sample sizes and threshold decisions values for time-to-event outcomes when multiple endpoints are investigated.

Within our drug development program, we will compare our experimental treatment to the control treatment contro. The treatment effect measure of exper for each endpoint $i \in \{1,2\}$ is given by $\theta_i = −\log(HR_i)$ , where the hazard ratio $HR_i$ is the ratio of the hazard rates between the experimental treatment and the control treatment, for each endpoint. For more information on the hazard ratio, see the vignette on time-to-event endpoints.

Defining all necessary parameters

The parameters in the setting with multiple endpoints differ slightly from the other cases (bias adjustment and multitrial). As in the basic setting, the treatment effect may be fixed (as in this example) or may follow a prior distribution (see Fixed or Prior) Note that in this case, contrary to the other settings, the second treatment effect describes the effect of the second endpoint and is also relevant for fixed = TRUE. Furthermore, some options to adapt the program to your specific needs are also available in this setting (see More parameters), however skipping phase II and choosing different treatment effects in phase II and III are not possible.

hr1 is the assumed hazard ratio of the endpoint OS. hr2 is the assumed hazard ratio of the endpoint PFS. We assume that our experimental treatment exper leads to a hazard reduction of 80% compared to the control treatment contro for endpoint OS and a reduction of 75% for PFS. Therefore, we set hr1 = 0.8 and hr2 = 0.75.
n2min and n2max specify the minimal and maximal number of participants (not events) for the phase II trial. The package will search for the optimal sample size within this region. For now, we want the program to search for the optimal sample size in the interval between 20 and 400 participants. In addition, we will tell the program to search this region in steps of ten participants at a time by setting stepn2 = 10.
Differing from the basis setting, one has to provide two sets of benefits triples, one triple b11, b21, b31 in the case endpoint OS is significant (independent of PFS beiing significant or not) and one triple b12, b22, b32 in the case OS is not significant. The intuition is that overall survival is the “more relevant” endpoint, i.e. if this endpoints shows a significant result, we have higher gains than if only the “less important” endpoint PFS shows a significant result. We assign benefit triples of {1000, 2000, 3000} for the case OS is significant and {1000, 1500, 2000} if not (all in units of 10^5$).
Finally, the parameter rho determines the correlation between the treatment effects of the two endpoints. In our case, we set rho = 0.6, indicating the correlation between the treatment effects of OS and PFS is 0.6.

set.seed(123)
res1 <- optimal_multiple_tte(hr1 = 0.8, hr2 = 0.75,                   # define assumed true HRs
                    id1 = NULL, id2 = NULL,
                    n2min = 20, n2max = 400, stepn2 = 10,             # define optimization set for n2
                    hrgomin = 0.7, hrgomax = 0.9, stephrgo = 0.05,    # define optimization set for HRgo
                    alpha = 0.025, beta = 0.1,                        # drug development planning parameters
                    c2 = 0.75, c3 = 1, c02 = 100, c03 = 150,          # define costs for phase II and III
                    K = Inf, N = Inf, S = -Inf,                       # set constraints
                    steps1 = 1,  stepm1 = 0.95,  stepl1 = 0.85,       # effect size categories
                    b11 = 1000, b21 = 2000, b31 = 3000,
                    b12 = 1000, b22 = 1500, b32 = 2000,               # define expected benefits (both categories)
                    rho = 0.6, fixed = TRUE,                          # correlation and treatment effect
                    num_cl = 2)

Interpreting the output

After setting all these input parameters and running the function, let’s take a look at the output of the program.

res1
#> Optimization result:
#>  Utility: 106.61
#>  Sample size:
#>    phase II: 90, phase III: 301, total: 391
#>  Probability to go to phase III: 0.79
#>  Total cost:
#>    phase II: 168, phase III: 420, cost constraint: Inf
#>  Fixed cost:
#>    phase II: 100, phase III: 150
#>  Variable cost per patient:
#>    phase II: 0.75, phase III: 1
#>  Effect size categories:
#>   small: 1 medium: 0.95 large: 0.85
#>  Expected gains if endpoint 1 is significant:
#>   small: 1000 medium: 2000 large: 3000
#>  Expected gains if only endpoint 2 is significant:
#>   small: 1000 medium: 1500 large: 2000
#>  Success probability: 0.4
#>  Joint probability of success and observed effect of size ... in phase III:
#>    medium: 0.21, large: 0.07
#>  Probability of endpoint 1 being significant in phase III: 0.71
#>  Significance level: 0.025
#>  Targeted power: 0.9
#>  Decision rule threshold: 0.85 [HRgo] 
#>  Assumed true effects [HR]: 
#>    endpoint 1: 0.8,  endpoint2 2: 0.75
#>  Correlation between endpoints: 0.6

The program returns a data frame, with the following results:

res1$n2 is the optimal number of participants for phase II and res$n3 the resulting number of events for phase III. We see that the optimal scenario requires 90 participants in phase II and 301 participants in phase III.
res1$HRgo is the optimal threshold value for the go/no-go decision rule. We see that we need a hazard ratio of less than 0.85 in phase II in order to proceed to phase III.
res1$u is the expected utility of the program for the optimal sample size and threshold value. In our case it amounts to 106.61, i.e. we have an expected utility of 10 661 000$.
res1$OS is the probability that endpoint OS is significant, given that the program is successful overall. The probability of a successful program is 0.4, thus, if the program is successful the probability that the endpoint OS is significant and we obtain higher benefits is 0.71.

Both endpoints have to be significant

In this scenario, suppose we are developing a new Alzheimer medication, exper. The two endpoints we consider interesting are improving cognition (cognitive endpoint) and improving activities of daily living (functional endpoint). We want both outcomes to be show statistically significant differences for the trial to be successful. These are normally distributed outcome variables. Therefore, we will use the function optimal_multiple_normal, which calculates optimal sample sizes and threshold decisions values for normally distributed outcomes when multiple endpoints are investigated.

Within our drug development program, we will compare our experimental treatment to the control treatment contro. The treatment effect measure of exper for each endpoint $i \in \{1,2\}$ is given by $\Delta_i = \mu_{exper} - \mu_{contro}$ , which is the difference in mean between the experimental treatment and the control treatment, for both endpoints. Note that this is not divided by the standard deviation, hence is it not standardized. For more information, see the vignette on normally distributed outcomes.

Defining all necessary parameters

Again, the parameters in the setting with multiple endpoints differ slightly from the other cases (bias adjustment and multitrial). As in the basic setting, the treatment effect may be fixed (as in this example) or may follow a prior distribution (see Fixed or Prior) Note, that in this case, contrary to the other settings, the second treatment effect describes the effect of the second endpoint and is also relevant for fixed = TRUE. Furthermore, some options to adapt the program to your specific needs are also available in this setting (see More parameters), however skipping phase II and choosing different treatment effects in phase II and III are not possible.

Delta1 is our assumed true treatment effect given as difference in means between exper and contro for the cognitive endpoint and Delta2 is our assumed true treatment effect given as difference in means between exper and contro for the functional endpoint. We assume true treatment effects to be 0.8 and 0.5 for each endpoint respectively, hence Delta1 = 0.8 and Delta2 = 0.5.
n2min and n2max specify the minimal and maximal number of participants for the phase II trial. The package will search for the optimal sample size within this region. For now, we want the program to search for the optimal sample size in the interval between 20 and 200 participants. In addition, we will tell the program to search this region in steps of ten participants at a time by setting stepn2 = 10.
We have to include two parameters that amount for the standard deviations $\sigma_i$ for $i \in \{1,2\}$ . We set sigma1 = 2 and sigma2 = 1 for our example.
The parameter relaxed decides how the effect sizes of the two endpoints are combined into one overall effect size (large, medium or small overall treatment effect). If relaxed = FALSE , we consider a strict rule assigning a large overall effect if both endpoints show an effect of large size, a small overall effect if at least one of the endpoints shows a small effect, and a medium overall effect otherwise. If relaxed = TRUE, we investigate the characteristics for a more relaxed rule assigning a large overall effect if at least one of the endpoints shows a large effect, a small effect if both endpoints show a small effect, and a medium overall effect otherwise. For our example, we set relaxed = TRUE
Finally, the parameter rho determines the correlation between the treatment effects of the two endpoints. In our case, we set rho = 0.5, indicating the correlation between the treatment effects is 0.5.

set.seed(123)
res2 <- optimal_multiple_normal(Delta1 = 0.8, Delta2 = 0.5,  # define assumed true treatment effects
   in1= NULL, in2= NULL, sigma1 = 2, sigma2= 1,              # standard deviations
   n2min = 20, n2max = 200, stepn2 = 10,                     # define optimization set for n2
   kappamin = 0.02, kappamax = 0.2, stepkappa = 0.02,        # define optimization set for HRgo
   alpha = 0.025, beta = 0.1,                                # drug development planning parameters
   c2 = 0.75, c3 = 1, c02 = 100, c03 = 150,                  # define fixed and variable costs for phase II and III
   K = Inf, N = Inf, S = -Inf,                               # set constraints
   steps1 = 0,  stepm1 = 0.5, stepl1 = 0.8,                  # benefit categories
   b1 = 1000, b2 = 2000, b3 = 3000,                          # define expected benefits 
   rho = 0.5, relaxed = TRUE,                                # relaxed "TRUE"
   fixed = TRUE,                                             # fixed treatment effect 
   num_cl = 2)

Interpreting the output

After setting all these input parameters and running the function, let’s take a look at the output of the program.

res2
#> Optimization result:
#>  Utility: 342.07
#>  Sample size:
#>    phase II: 120, phase III: 171, total: 291
#>  Probability to go to phase III: 0.98
#>  Total cost:
#>    phase II: 190, phase III: 318, cost constraint: Inf
#>  Fixed cost:
#>    phase II: 100, phase III: 150
#>  Variable cost per patient:
#>    phase II: 0.75, phase III: 1
#>  Effect size categories (expected gains):
#>   small: 0 (1000), medium: 0.5 (2000), large: 0.8 (3000)
#>  Success probability: 0.824
#>  Joint probability of success and observed effect of size ... in phase III:
#>    small: 0.798, medium: 0.026, large: 0
#>  Significance level: 0.025
#>  Targeted power: 0.9
#>  Decision rule threshold: 0.02 [Kappa] 
#>  Assumed true effects [Delta]: 
#>    endpoint 1: 0.8, endpoint 2: 0.5
#>  Correlation between endpoints: 0.5

The program returns a data frame, with the following results:

res$n2 is the optimal number of participants for phase II and res$n3 the resulting number of events for phase III. We see that the optimal scenario requires 120 participants in phase II and 171 participants in phase III.
res$Kappa is the optimal threshold value for the go/no-go decision rule. We see that we need a hazard ratio of less than 0.02 in phase II in order to proceed to phase III.
res$u is the expected utility of the program for the optimal sample size and threshold value. In our case it amounts to 342.07, i.e. we have an expected utility of 34 207 000$.

Where to go from here

In this article we presented examples when multiple endpoints are of interest.

For more information on how to use the package, see:

Introduction to drugdevelopR: See how the package works in a basic example.
Different outcomes: Apply it to binary endpoints and time-to-event endpoints.
Interpreting the rest of the output: Obtain further details on your drug development program.
Fixed or prior: Model the assumed treatment effect on a prior distribution.
More parameters: Define custom effect size categories. Put constraints on the optimization by defining maximum costs, the total expected sample size of the program or the minimum expected probability of a successful program. Define an expected difference in treatment effect between phase II and III. Skip phase II.
Complex drug development programs: Adapt to situations with biased effect estimators, multiple phase III trials or multi-arm trials.
Parallel computing: Be faster at calculating the optimum by using parallel computing.