Author manuscript; available in PMC: 2026 Mar 7.
Published before final editing as: Clin Trials. 2026 Mar 3:17407745261423479. doi: 10.1177/17407745261423479

On flexible covariate adjustment under covariate-constrained randomization

Bingkai Wang 1, Fan Li 2,3
PMCID: PMC12965748  NIHMSID: NIHMS2143160  PMID: 41776765

Abstract

Covariate-constrained randomization is an effective treatment allocation procedure for controlling imbalance across multiple baseline covariates in randomized trials. Motivated by the GroupPMPlus cluster randomized trial, we introduce the asymptotic theory for a broad class of estimators, known as M-estimators, under covariate-constrained randomization. Here, M-estimators refer to estimators obtained by optimizing an objective function, such as a log-likelihood function, and include commonly used methods such as analysis of covariance and linear mixed models. We show that M-estimators remain consistent in this setting but can exhibit non-Gaussian asymptotic distributions depending on the specification. Using examples of common M-estimators, we delineate conditions under which covariate-constrained randomization can be safely ignored in statistical analysis. Our results extend to stratified covariate-constrained randomization and semiparametric efficient estimators based on data-adaptive machine learning methods. We illustrate these theoretical findings using the GroupPMPlus study to evaluate the causal effect of a psychological treatment on mental health outcomes following a disaster.

Keywords: Asymptotic theory, constrained randomization, covariate adjustment, causal inference, M-estimation, machine learning

Introduction

Covariate-constrained randomization refers to a restricted randomization procedure in which treatment assignments are repeatedly generated until a predefined balance criterion on baseline covariates is met. Compared with simple randomization or stratified randomization,1 this approach can simultaneously balance several covariates of different types and allows continuous control over the randomization space through a design parameter. Compared with minimization, which balances multiple categorical covariates, covariate-constrained randomization can accommodate both continuous and categorical covariates, offering additional flexibility. These advantages have contributed to its increasing use in both economics,2 where it is often referred to as rerandomization, and biomedical research,3,4 particularly in the context of cluster randomized trials to improve baseline balance.5–7

Our work is motivated by the GroupPMPlus trial,8 a cluster randomized study in Nepal designed to improve the mental health of individuals affected by humanitarian crises such as pandemics, armed conflict, and environmental disasters. The intervention, Group Problem Management Plus, consisted of five weekly sessions and was compared with standard care. A total of 72 wards, the smallest administrative units in Nepal, were enrolled, comprising 609 participants. The intervention and standard care were both implemented at the ward level, and wards were equally randomized to receive intervention or standard care. The trial employed stratified covariate-constrained randomization, with stratification based on gender (uniform within wards) and covariate-constrained randomization based on three binary cluster-level covariates: access to mental health services (high or low), disaster risk (high or low), and rural or urban geographical location. The primary outcome was the GHQ-12 score, a continuous measure of psychological distress, assessed 3 months after treatment initiation.

Several methodological questions arose as we approached the analysis of the GroupPMPlus trial. First, although mixed-effects models are routinely used to adjust for baseline covariates while accounting for within-cluster correlations,9 their asymptotic validity under covariate-constrained randomization has not been fully established; see Wang et al.10 for the asymptotic theory of linear mixed model analysis of cluster randomized trials under simple randomization. Second, incorporating covariate-constrained randomization into the statistical analysis typically requires knowledge of the balancing threshold, which was not reported in the primary publication of this study. The implications of the balancing threshold for the asymptotic properties of the treatment effect estimator remain elusive. Third, while data-adaptive machine learning methods have shown potential for maximizing efficiency gains in randomized trials,11,12 it is unclear whether this property carries over to covariate-constrained randomization. Addressing these methodological questions can not only strengthen the statistical analysis of the GroupPMPlus trial but also provide a sound theoretical foundation for future individually randomized and cluster randomized studies that adopt covariate-constrained randomization.

In this paper, we expand the asymptotic theory of M-estimators and efficient estimators—in which nuisance functions are estimated via data-adaptive machine learning algorithms—to the settings of covariate-constrained randomization and its stratified extension.13 These large-sample results generalize earlier work that focuses on linearly adjusted estimators1416 to a much wider class of treatment effect estimators commonly used in analyzing clinical trials.

Methods

Definitions and assumptions

As a general setup, we consider a randomized clinical trial with n individuals. For each individual i (i = 1, …, n), we observe a real-valued outcome Yi, a treatment allocation variable Ai (Ai = 1 for treatment and 0 for control), and a vector of baseline covariates Xi. We define Yi(a) as the potential outcome if individual i were assigned to treatment a, a ∈ {0, 1}, and assume causal consistency such that Yi = Yi(Ai). Furthermore, we assume that the vectors (Yi(1), Yi(0), Xi) for i = 1, …, n are independent and identically distributed. Under the super-population causal inference framework, the target estimand is the average treatment effect, defined as Δ* = E[Y(1) − Y(0)], where the expectation is taken with respect to a notional super-population that generates the full data vector.17

Covariate-constrained randomization controls for imbalance on a pre-specified set of baseline covariates, which we denote as Xir, with Xir being a subset of Xi, and involves the following three steps. First, we independently generate A1*, …, An* from a Bernoulli distribution with P(Ai* = 1) = π ∈ (0, 1), as in simple randomization. Next, we compute the imbalance statistic and its variance estimator as

I = \frac{1}{N_1}\sum_{i=1}^{n} A_i^* X_i^r - \frac{1}{N_0}\sum_{i=1}^{n} (1 - A_i^*) X_i^r,
\widehat{\mathrm{Var}}(I) = \frac{1}{N_1 N_0}\sum_{i=1}^{n} \left(X_i^r - \bar{X}^r\right)\left(X_i^r - \bar{X}^r\right)^\top,

where N1 = ∑i=1n Ai*, N0 = n − N1, and X̄r = (1/n) ∑i=1n Xir. Finally, given a pre-specified balance threshold t > 0, we check a Mahalanobis distance type condition of whether I^⊤{Var̂(I)}^{−1} I < t. If true, the final treatment assignment (A1, …, An) is set to (A1*, …, An*); otherwise, we return to the first step and regenerate the treatment assignment until the condition is met.18 As a result, the output is a set of treatment assignments satisfying the balancing criterion.
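The three-step accept/reject procedure can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions (numpy, and the helper name `constrained_randomization`), not the authors' implementation:

```python
import numpy as np

def constrained_randomization(X_r, pi=0.5, t=1.0, rng=None, max_draws=100_000):
    """Accept/reject covariate-constrained randomization.

    X_r : (n, p) array of balancing covariates.
    Repeatedly draws Bernoulli(pi) assignments until the Mahalanobis-type
    imbalance statistic falls below the threshold t.
    """
    rng = np.random.default_rng(rng)
    n, p = X_r.shape
    Xc = X_r - X_r.mean(axis=0)          # centered covariates
    for _ in range(max_draws):
        A = rng.binomial(1, pi, size=n)
        N1, N0 = A.sum(), n - A.sum()
        if N1 == 0 or N0 == 0:
            continue                      # degenerate draw; regenerate
        # imbalance statistic I and its variance estimator
        I = X_r[A == 1].mean(axis=0) - X_r[A == 0].mean(axis=0)
        V = (Xc.T @ Xc) / (N1 * N0)
        if I @ np.linalg.solve(V, I) < t:
            return A                      # balance criterion met
    raise RuntimeError("no acceptable allocation found; consider a larger t")
```

By construction, the returned allocation satisfies the balance criterion; a smaller `t` simply raises the expected number of rejected draws.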

Under covariate-constrained randomization, a smaller t corresponds to a higher rejection rate for the generated allocation and stronger control over chance imbalance. When covariate-constrained randomization is combined with stratified randomization (i.e. stratified covariate-constrained randomization), the first step (generating the treatment allocation under simple randomization) is replaced by stratified randomization within pre-specified discrete baseline strata; the second and third steps remain unchanged. As a result, the strata variables will typically be perfectly balanced because covariate-constrained randomization is implemented within each stratum.

Asymptotic results for M-estimators

M-estimators19 broadly refer to the class of estimators obtained by solving estimating equations (or, equivalently, optimizing an objective function). Denoting Oi = (Yi, Ai, Xi), we first specify an estimating function ψ(Oi; θ), where θ is a vector of parameters, and then estimate θ by solving ∑i=1n ψ(Oi; θ) = 0. The resulting estimator θ̂ is called an M-estimator for θ. In our setting, the average treatment effect Δ will be a parameter in θ (or a function of θ), such that this procedure outputs an M-estimator Δ̂ that converges to the probability limit Δ*.

This framework encompasses many estimation methods. For instance, maximum likelihood estimators are M-estimators in which ψ(Oi; θ) is the score function. To contextualize the development, we describe two additional examples of M-estimators that will be applied to the GroupPMPlus trial. In the context of cluster randomized trials with simple or stratified randomization, these two M-estimators were previously discussed by Wang et al.10 and were shown to provide model-robust estimators of the average treatment effect under arbitrary misspecification.

Example 1 (The ANCOVA estimator). Consider fitting a working model E[Y | A, X] = β0 + βA A + βX⊤X with ordinary least squares to obtain estimators (β̂0, β̂A, β̂X) for the parameters (β0, βA, βX). The analysis of covariance (ANCOVA) estimator of the average treatment effect is Δ̂ = β̂A. In the context of cluster randomization, because the ordinary least squares approach is used for point estimation, Δ̂ can also be considered the generalized estimating equations estimator under working independence.
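As a minimal sketch of Example 1, the ANCOVA estimator is simply the treatment coefficient from an OLS fit. The helper below (`ancova_estimate` is our naming, with numpy assumed; the simulated data are illustrative):

```python
import numpy as np

def ancova_estimate(Y, A, X):
    """ANCOVA estimator: OLS of Y on an intercept, treatment, and covariates.

    Returns the coefficient on A, i.e. the estimated average treatment effect.
    """
    Z = np.column_stack([np.ones_like(A, dtype=float), A, X])
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return beta[1]  # coefficient on the treatment indicator

# Illustrative use on simulated data with a true treatment effect of 2
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 0.5, size=n).astype(float)
Y = 2.0 * A + X @ np.array([1.0, -0.5]) + rng.normal(size=n)
print(ancova_estimate(Y, A, X))
```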

Example 2 (The Mixed-ANCOVA estimator). In cluster randomized trials, the outcomes for each cluster form an Ni-dimensional vector Yi = (Yi1, …, YiNi), where Ni is the size of cluster i. In addition, the covariate Xi is a collection of individual covariates (Xi1, …, XiNi). Then, covariate-constrained randomization is based on Xir being a summary function of (Xi1r, …, XiNir), such as cluster averages. In this context, we fit the linear mixed model Yij = β0 + βA Ai + βX⊤Xij + δi + εij, where δi ~ N(0, τ2) is the random intercept and εij ~ N(0, σ2) is the independent error. Under maximum likelihood estimation, the mixed-ANCOVA estimator of the average treatment effect is Δ̂ = β̂A.
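Example 2 can be fit with standard mixed-model software. The sketch below uses the `MixedLM` routine from statsmodels (assumed available) on simulated cluster data with a random intercept per cluster; the data-generating values are hypothetical and not related to the GroupPMPlus analysis:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
m, n_per = 40, 15                                        # 40 clusters of 15 individuals
cluster = np.repeat(np.arange(m), n_per)
A = np.repeat(rng.binomial(1, 0.5, size=m), n_per)       # cluster-level treatment
x = rng.normal(size=m * n_per)                           # individual-level covariate
delta = np.repeat(rng.normal(scale=0.5, size=m), n_per)  # random intercepts
y = 1.0 * A + 0.8 * x + delta + rng.normal(size=m * n_per)

df = pd.DataFrame({"y": y, "A": A, "x": x, "cluster": cluster})
# Linear mixed model with a random intercept per cluster
fit = smf.mixedlm("y ~ A + x", df, groups=df["cluster"]).fit()
print(fit.params["A"])  # mixed-ANCOVA estimate of the average treatment effect
```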

Under mild regularity conditions, M-estimators have the following properties. First, an M-estimator retains the same consistency (asymptotic unbiasedness) under simple and covariate-constrained randomization. Second, its asymptotic distribution is typically non-Gaussian; rather, it is the distribution of a weighted sum of a Gaussian random variable and a truncated Gaussian random variable. Third, it has no larger asymptotic variance under covariate-constrained randomization than under simple randomization, with greater variance reduction for a smaller t (a tighter balance threshold, or equivalently stronger balance control) or higher prognostic value of Xr. These statistical properties are illustrated in Figure 1. When stratified randomization is combined with covariate-constrained randomization, the same results hold with stratified randomization replacing simple randomization (the original version of covariate-constrained randomization). A more rigorous presentation of these technical results can be found in Wang and Li.13
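A small Monte Carlo experiment illustrates the third property: with a prognostic balancing covariate, the unadjusted difference-in-means is noticeably less variable under covariate-constrained randomization than under simple randomization. All settings below (n = 50, t = 1, R² = 0.8, 2000 replications) are our illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, t = 50, 2000, 1.0
x = rng.normal(size=n)                 # single prognostic balancing covariate
y0 = x + 0.5 * rng.normal(size=n)     # control potential outcomes, R^2 = 0.8
y1 = y0 + 1.0                          # constant treatment effect of 1

def diff_in_means(A):
    return y1[A == 1].mean() - y0[A == 0].mean()

def accept(A):
    # Mahalanobis-type balance criterion for one covariate
    N1, N0 = A.sum(), n - A.sum()
    if N1 == 0 or N0 == 0:
        return False
    I = x[A == 1].mean() - x[A == 0].mean()
    V = ((x - x.mean()) ** 2).sum() / (N1 * N0)
    return I * I / V < t

simple, constrained = [], []
for _ in range(reps):
    A = rng.binomial(1, 0.5, size=n)
    simple.append(diff_in_means(A))     # simple randomization draw
    while not accept(A):                # reject until the criterion is met
        A = rng.binomial(1, 0.5, size=n)
    constrained.append(diff_in_means(A))

# The unadjusted estimator is less variable under constrained randomization
print(np.var(simple), np.var(constrained))
```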

Figure 1.


Illustration of the asymptotic distribution under different randomization schemes and parameter settings. The blue solid curve shows the distribution under simple randomization (standard normal). The red dashed curve corresponds to covariate-constrained randomization with t=1 and a moderately prognostic covariate Xir (with R2=0.5 in the linear regression of Yi on Xir). The green dotted curve corresponds to covariate-constrained randomization with t=1 and a strongly prognostic covariate Xir (with R2=0.75 in the linear regression of Yi on Xir).

According to these theoretical results, statistical analysis ignoring the covariate-constrained randomization, for example, using a normal approximation for hypothesis testing, will generally be valid but conservative.13 This is because covariate-constrained randomization brings additional precision gain by balancing baseline covariates. To avoid such power loss, one can make statistical inferences with the exact asymptotic non-Gaussian distribution. Alternatively, appropriate covariate adjustment via outcome modeling is also an effective approach to restore the nominal rejection rate of the hypothesis testing procedure. Specifically, in Examples 1 and 2, since we have included Xr in the regression model, under equal randomization (i.e. π = 0.5), these estimators will have the same asymptotic distribution under simple or covariate-constrained randomization.13 Therefore, valid yet non-conservative statistical inference for these estimators can be performed as if simple randomization were used at the design stage. This theoretical finding provides a foundation for previously observed patterns in simulation studies (see, for example, Li et al.5,6) when analyzing continuous and binary outcomes in cluster randomized trials. Furthermore, this result holds regardless of the choice of t, which obviates the need to know t for valid inference in large samples. For other estimators, such as ANCOVA2 (ANCOVA with treatment–covariate interaction terms20), G-computation with a working logistic regression model,21 and doubly robust estimators to address missing outcomes,22 adjusting for Xr in the outcome regression can similarly capture the precision gains induced by covariate-constrained randomization. Formal justification is provided in Wang and Li,13 which extends results established under stratified permuted block randomization in Ye et al.23 and Wang et al.24 to covariate-constrained randomization.

Extension to efficient estimation with debiased machine learning

In the analysis of randomized trials, statistical precision through covariate adjustment can be maximized using the debiased machine learning (DML) approach with cross-fitting.11 For example, Wang et al.12 developed the DML approach for the analysis of cluster randomized trials with the additional complication of within-cluster subsampling. Compared to traditional regression methods, DML has been proven to achieve semiparametric efficiency in randomized trials with simple randomization. The procedure is as follows. First, we randomly partition the data into K folds of approximately equal size. Next, for each fold k = 1, …, K, we use the other K − 1 folds to train a machine learning algorithm for E[Y | A, X], which we denote as η̂(k)(A, X). Then, we compute η̂(k)(A, X) for individuals in the kth fold. Going through all K folds, we obtain a machine learning prediction for each individual i = 1, …, n, which we denote as η̂(Ai, Xi). Finally, the DML estimator of the average treatment effect is constructed based on the efficient influence function as

\hat{\Delta}_{\mathrm{dml}} = \frac{1}{n}\sum_{i=1}^{n}\left\{\frac{A_i - \pi}{\pi(1-\pi)}\left(Y_i - \hat{\eta}(A_i, X_i)\right) + \hat{\eta}(1, X_i) - \hat{\eta}(0, X_i)\right\}. (1)

To derive the asymptotic results for this estimator, we assume η̂(k)(A, X) is consistent in L2 norm, which can be achieved by many existing machine learning methods, including random forests,25 deep neural networks,26 and the highly adaptive lasso.27 When stratified covariate-constrained randomization is used, the above sample splitting needs to be stratified on the randomization strata to preserve treatment balance; the detailed procedure is described in Rafi28 and Wang and Li.13
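The cross-fitting procedure and equation (1) can be sketched compactly. For illustration the nuisance model η̂ is ordinary least squares fit within each training split; in practice a flexible machine learning regressor would be substituted. The function name `dml_estimate` and the numpy usage are our assumptions:

```python
import numpy as np

def dml_estimate(Y, A, X, pi=0.5, K=5, rng=None):
    """Cross-fitted DML estimator of the average treatment effect, equation (1).

    The nuisance eta(a, x) = E[Y | A=a, X=x] is fit by OLS on the training
    folds and evaluated on the held-out fold (any ML regressor could be used).
    """
    rng = np.random.default_rng(rng)
    n = len(Y)
    folds = rng.permutation(n) % K            # random, balanced fold labels
    eta1, eta0 = np.empty(n), np.empty(n)
    for k in range(K):
        train, test = folds != k, folds == k
        # fit the outcome model on the other K-1 folds
        Z = np.column_stack([np.ones(train.sum()), A[train], X[train]])
        beta, *_ = np.linalg.lstsq(Z, Y[train], rcond=None)
        # predict under both treatment levels for the held-out fold
        ones, zeros = np.ones(test.sum()), np.zeros(test.sum())
        eta1[test] = np.column_stack([ones, ones, X[test]]) @ beta
        eta0[test] = np.column_stack([ones, zeros, X[test]]) @ beta
    eta_A = np.where(A == 1, eta1, eta0)
    # efficient-influence-function construction of equation (1)
    return np.mean((A - pi) / (pi * (1 - pi)) * (Y - eta_A) + eta1 - eta0)

# Illustrative use on simulated data with a true treatment effect of 1.5
rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 2))
A = rng.binomial(1, 0.5, size=n)
Y = 1.5 * A + X @ np.array([1.0, 1.0]) + rng.normal(size=n)
print(dml_estimate(Y, A, X, rng=1))
```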

Under covariate-constrained randomization or stratified covariate-constrained randomization, we can show that Δ̂dml is consistent, asymptotically normal, and as efficient as under simple randomization. This property holds regardless of the value of t or the choice of Xr, provided that Xr is included as a predictor in the machine learning algorithms. Thus, inference can proceed as if the trial used simple randomization, and statistical precision remains optimal with respect to the semiparametric variance lower bound. A detailed presentation of this result can be found in Section 6 of Wang and Li.13 An important implication of this result is that it justifies the use of flexible, data-adaptive covariate adjustment methods under (stratified) covariate-constrained randomization and offers a recipe for valid statistical inference as long as E[Y | A, X] is consistently estimated via machine learning methods.

Illustrative analysis of the GroupPMPlus trial

We demonstrate the theoretical findings by reanalyzing the GroupPMPlus cluster randomized trial. Here, we implemented the unadjusted, ANCOVA, and DML estimators at the cluster level. Specifically, we first take cluster averages of the outcomes and covariates and then conduct the analysis using these cluster-level data. The ANCOVA estimator is defined as in Example 1, and the unadjusted estimator is defined as ANCOVA without covariates. The DML estimator is defined in equation (1), and we use an ensemble learner of generalized linear models, regression trees, and neural networks to construct η̂(Ai, Xi). We also fit the mixed-ANCOVA estimator of Example 2 using individual-level data. Baseline covariates included the baseline GHQ-12 score and all variables used in stratified covariate-constrained randomization.
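The cluster-level aggregation step described above (averaging outcomes and covariates within clusters before running cluster-level analyses) amounts to a simple group-by mean. A numpy sketch with a hypothetical helper:

```python
import numpy as np

def cluster_means(values, cluster_ids):
    """Average an individual-level variable within each cluster.

    Returns one mean per unique cluster id, in sorted id order.
    """
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    sums = np.bincount(inverse, weights=values)    # per-cluster totals
    counts = np.bincount(inverse)                  # per-cluster sizes
    return sums / counts
```

Applying `cluster_means` to the outcome and to each covariate yields one row per cluster, after which cluster-level estimators such as those in Example 1 and equation (1) operate on the aggregated data.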

Since we are unable to obtain information about the balancing threshold t used in the randomization procedure from the published report, we carry out our analysis assuming simple randomization; that is, we compute the standard errors of each estimator and their confidence intervals under normal approximation. This choice will lead to conservative confidence intervals for an unadjusted analysis but has no impact on the covariate-adjusted estimators, as supported by our theory. That is, as long as the randomization variables are controlled by the adjusted estimators, the covariate-constrained randomization procedure should be safely ignorable asymptotically, and statistical inference procedures established under simple randomization can be used without modification.

The results are summarized in Figure 2. While all estimators have similar point estimates, their standard error estimates differ. Because the design parameter t is unknown and the analysis therefore ignores the constrained randomization, the unadjusted estimator has a conservative standard error estimate of 0.78, leading to a failure to reject the null at the 5% level. In contrast, all three covariate-adjusted estimators fully account for the precision gain from stratified covariate-constrained randomization, thus leading to valid estimation of the variance. This is achieved even without knowing the exact randomization parameters, which further demonstrates the benefit of adjusting for randomization variables. Among the three covariate-adjusted estimators, the cluster-level ANCOVA estimator has the highest precision, while the DML estimator leads to the least variance reduction in this particular data analysis. This may be because the sample size is too limited for machine learning methods to fully realize their asymptotic efficiency gain, or because the true data-generating distribution is nearly linear in the baseline covariates (hence cluster-level ANCOVA approximated the data-generating process reasonably well).

Figure 2.


Summary of data application results. The left panel presents point estimates with 95% confidence intervals, where negative values indicate a treatment benefit. The right panel shows relative efficiency compared with the unadjusted estimator, calculated as the variance of the unadjusted estimator divided by the variance of each alternative estimator. Larger values correspond to greater precision.

Conclusion and recommendation

Covariate adjustment in randomized clinical trials can be incorporated at both the design stage, through approaches such as covariate-constrained randomization, and the analysis stage, via outcome modeling and other established covariate-adjustment techniques. Each stage offers distinct but complementary opportunities to improve statistical precision while preserving validity, as noted in recent regulatory guidance.29 Our theoretical results demonstrate that, similar to conventional covariate-adaptive designs such as stratified randomization,24 covariate-constrained randomization never increases the asymptotic variance of M-estimators for Δ and, under appropriate modeling, can yield substantial precision gains.

From a methodological standpoint, our asymptotic theory for M-estimators under covariate-constrained randomization clarifies when and how the randomization scheme affects the large-sample distributional properties of treatment effect estimators under the potential outcomes framework. While consistency is retained under a broad class of outcome models due to randomization, the asymptotic distribution may deviate from normality, particularly when the randomization variables are insufficiently adjusted for in the analysis. This has implications for statistical inference under covariate-constrained randomization; that is, direct reliance on standard errors that ignore the randomization constraints can lead to mis-calibrated tests and confidence intervals that are typically conservative. This has been previously shown in simulation studies for cluster randomized trials,5,6 where the unadjusted F-test under a linear mixed model can grow substantially conservative when only a few covariates are used in the covariate-constrained randomization procedure.

From a practical standpoint, linear adjustment for the randomization variables in ANCOVA offers substantial efficiency gains, is simple to carry out, and has strong theoretical support30,31 in many common trial settings. For example, under equal allocation in two-arm trials, the ANCOVA estimator has been shown to be asymptotically equivalent to the semiparametric efficient estimator (when the true outcome surface is linear),32,33 and can fully capture the precision gain from covariate-constrained randomization as long as the randomization variables are sufficiently adjusted for. While ANCOVA is attractive for its simplicity and robustness, it may not fully capture a nuanced data-generating process (especially under unequal allocation or when the true outcome surface is nonlinear), making more flexible, data-adaptive covariate adjustment methods such as DML appealing. These methods can approximate complex relationships and, in large samples, achieve semiparametric efficiency under weak assumptions. However, there is a trade-off between model complexity and theoretical optimality. That is, DML requires a sufficiently large number of independent units to accurately estimate complex nuisance functions and realize its asymptotic benefits. In our motivating example, with 72 clusters as independent units, the advantage of DML over ANCOVA may be limited. In contrast, in individually randomized trials or cluster randomized trials with over 100 clusters, DML may have greater potential to outperform ANCOVA. A similar discussion of this balance between complexity and efficiency under simple randomization in cluster randomized trials can be found in Wang et al.,24 and the same considerations apply in the context of covariate-constrained randomization designs.

For implementation, covariate-constrained randomization encompasses many variations, differing in balance metrics, weighting of covariates, and acceptance thresholds, each of which may influence both balance properties and downstream analytic requirements. Our theoretical results provide formal justification for these variants and guidance for aligning the analytic model with the specific form of constrained randomization used. In practice, we recommend that trial statisticians document the chosen balancing algorithm in sufficient detail to permit replication and to inform the choice of analytic model. Specifically, our results reinforce the recommendation of aligning the design and analysis in clinical trials (e.g. those that were discussed in Li et al.5,6 based on their extensive simulation findings). That is, one is recommended to ensure that the analysis model incorporates randomization variables used in covariate-constrained randomization, to the extent possible, such that inference procedures developed under simple randomization can be directly applicable.

Covariate-constrained randomization is primarily used in settings where trial participants are predetermined, such as cluster randomized trials where clusters are often recruited prior to randomization or individual randomized trials involving a closed cohort identified prior to randomization. In contrast, many clinical trials enroll participants sequentially, so covariate information is only observed at the time of enrollment. To address this setting, sequential rerandomization34 has been developed, in which participants are enrolled in batches and covariate-constrained randomization is applied within each batch. A formal theoretical investigation of this design is left for future research.

Finally, our work can be extended to several future directions involving high-dimensional covariate adjustment, adaptive treatment allocation, or complex missing data mechanisms. Such work will further bridge the gap between rigorous statistical theory and the increasingly rich data structures encountered in modern clinical trials. By integrating careful design choices with robust and efficient covariate-adjusted analyses, investigators can maximize both the precision and credibility of their trial findings.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Research in this article was supported in part by a Patient-Centered Outcomes Research Institute (PCORI) contract (award no. ME-2022C2–27676) and by the National Institutes of Health (NIH) grant R00AI173395. All statements in this article, including its findings and conclusions, are solely the responsibilities of the authors and do not necessarily represent the views of PCORI, its Board of Governors, Methodology Committee or NIH.

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Authors’ note

This is a summary of an invited presentation at the 17th Annual Conference on Statistical Issues in Clinical Trials, based on the arXiv preprint arXiv:2406.02834 authored by the same research team.

References

1. Zelen M. The randomization and stratification of patients to clinical trials. J Chron Dis 1974; 27(7): 365–375, http://www.sciencedirect.com/science/article/pii/0021968174900150
2. Bruhn M and McKenzie D. In pursuit of balance: randomization in practice in development field experiments. Am Econ J Appl Econ 2009; 1(4): 200–232.
3. Ivers NM, Halperin IJ, Barnsley J, et al. Allocation techniques for balance at baseline in cluster randomized trials: a methodological review. Trials 2012; 13: 1–9.
4. Turner EL, Li F, Gallis JA, et al. Review of recent methodological developments in group-randomized trials: part 1—design. Am J Public Health 2017; 107(6): 907–915.
5. Li F, Lokhnygina Y, Murray DM, et al. An evaluation of constrained randomization for the design and analysis of group-randomized trials. Stat Med 2016; 35(10): 1565–1579.
6. Li F, Turner EL, Heagerty PJ, et al. An evaluation of constrained randomization for the design and analysis of group-randomized trials with binary outcomes. Stat Med 2017; 36(24): 3791–3806.
7. Zhou Y, Turner EL, Simmons RA, et al. Constrained randomization and statistical inference for multi-arm parallel cluster randomized controlled trials. Stat Med 2022; 41(10): 1862–1883.
8. Jordans MJD, Kohrt BA, Sangraula M, et al. Effectiveness of group problem management plus, a brief psychological intervention for adults affected by humanitarian disasters in Nepal: a cluster randomized controlled trial. PLoS Med 2021; 18(6): e1003621.
9. Turner EL, Prague M, Gallis JA, et al. Review of recent methodological developments in group-randomized trials: part 2—analysis. Am J Public Health 2017; 107(7): 1078–1086.
10. Wang B, Harhay MO, Small DS, et al. On the mixed-model analysis of covariance in cluster-randomized trials. arXiv preprint arXiv:2112.00832, 2021.
11. Chernozhukov V, Chetverikov D, Demirer M, et al. Double/debiased machine learning for treatment and structural parameters. Econom J 2018; 21(1): C1–C68.
12. Wang B, Park C, Small DS, et al. Model-robust and efficient covariate adjustment for cluster-randomized experiments. J Am Stat Assoc 2024; 119(548): 2959–2971.
13. Wang B and Li F. Asymptotic inference with flexible covariate adjustment under rerandomization and stratified rerandomization. arXiv preprint arXiv:2406.02834, 2024.
14. Li X, Ding P and Rubin DB. Asymptotic theory of rerandomization in treatment–control experiments. Proc Natl Acad Sci USA 2018; 115(37): 9157–9162.
15. Li X and Ding P. Rerandomization and regression adjustment. J R Stat Soc Ser B Stat Methodol 2020; 82(1): 241–268.
16. Wang X, Wang T and Liu H. Rerandomization in stratified randomized experiments. J Am Stat Assoc 2023; 118(542): 1295–1304.
17. Fay MP and Li F. Causal interpretation of the hazard ratio in randomized clinical trials. Clin Trials 2024; 21(5): 623–635.
18. Morgan KL and Rubin DB. Rerandomization to improve covariate balance in experiments. Ann Stat 2012; 40(2): 1263–1282.
19. van der Vaart AW. Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press, 1998.
20. Tsiatis AA, Davidian M, Zhang M, et al. Covariate adjustment for two-sample treatment comparisons in randomized clinical trials: a principled yet flexible approach. Stat Med 2008; 27(23): 4658–4677, https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.3113
21. Moore KL and van der Laan MJ. Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Stat Med 2009; 28(1): 39–64.
22. Robins J, Sued M, Lei-Gomez Q, et al. Comment: performance of double-robust estimators when “inverse probability” weights are highly variable. Stat Sci 2007; 22(4): 544–559.
23. Ye T, Shao J, Yi Y, et al. Toward better practice of covariate adjustment in analyzing randomized clinical trials. J Am Stat Assoc 2023; 118(544): 2370–2382.
24. Wang B, Susukida R, Mojtabai R, et al. Model-robust inference for clinical trials that improve precision by stratified randomization and covariate adjustment. J Am Stat Assoc 2023; 118(542): 1152–1163.
25. Wager S and Athey S. Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc 2018; 113(523): 1228–1242.
26. Farrell MH, Liang T and Misra S. Deep neural networks for estimation and inference. Econometrica 2021; 89(1): 181–213.
27. Benkeser D and van der Laan M. The highly adaptive lasso estimator. In: 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Montreal, QC, Canada, 17–19 October 2016, pp. 689–696. New York: IEEE.
28. Rafi A. Efficient semiparametric estimation of average treatment effects under covariate adaptive randomization. arXiv preprint arXiv:2305.08340, 2023.
29. FDA. Adjusting for covariates in randomized clinical trials for drugs and biological products: guidance for industry, 2023, https://www.fda.gov/media/148910/download
30. Lin W. Agnostic notes on regression adjustments to experimental data: reexamining Freedman’s critique. Ann Appl Stat 2013; 7(1): 295–318.
31. Wang B, Ogburn EL and Rosenblum M. Analysis of covariance in randomized trials: more precision and valid confidence intervals, without model assumptions. Biometrics 2019; 75(4): 1391–1400.
32. Shen C, Li X and Li L. Inverse probability weighting for covariate adjustment in randomized studies. Stat Med 2014; 33(4): 555–568.
33. Zeng S, Li F, Wang R, et al. Propensity score weighting for covariate adjustment in randomized clinical trials. Stat Med 2021; 40(4): 842–858.
34. Zhou Q, Ernst PA, Morgan KL, et al. Sequential rerandomization. Biometrika 2018; 105(3): 745–752.
