Environmental Health Perspectives. 2025 Jun 19;133(6):067019. doi: 10.1289/EHP15305

Statistical Methods for Chemical Mixtures: A Roadmap for Practitioners Using Simulation Studies and a Sample Data Analysis in the PROTECT Cohort

Wei Hao 1, Amber L Cathey 2, Max M Aung 3, Jonathan Boss 1, John D Meeker 2, Bhramar Mukherjee 4,
PMCID: PMC12178341  PMID: 40392783

Abstract

Background:

Quantitative characterization of the health impacts associated with exposure to chemical mixtures has received considerable attention in current environmental and epidemiological studies. With many existing statistical methods and emerging approaches, it is important for practitioners to understand which method is best suited for their inferential goals.

Objective:

The goal of this paper is to provide empirical simulation-based evidence regarding performance of mixture methods to help guide researchers on selecting the best available methods to address three scientific questions in mixtures analysis: identifying important components of a mixture, identifying interactions among mixture components, and creating a summary score for risk stratification and prediction.

Methods:

We conducted a review and comparison of 11 analytical methods available for use in mixtures research through extensive simulation studies for continuous and binary outcomes. In addition, we carried out an illustrative data analysis using the PROTECT birth cohort from Puerto Rico to examine the associations between exposure to chemical mixtures—metals, polycyclic aromatic hydrocarbons (PAHs), phthalates, and phenols—and birth outcomes.

Results:

Our simulation results suggest that the choice of methods depends on the goal of analysis and that there is no clear winner across the board. For selection of important toxicants in the mixtures and for identifying interactions, Elastic net (Enet) by Zou et al., Lasso for Hierarchical Interactions (HierNet) by Bien et al., and selection of nonlinear interactions by a forward stepwise algorithm (SNIF) by Narisetty et al. have the most stable performance across simulation settings. For overall summary or a cumulative measure, we find that using the Super Learner to combine multiple environmental risk scores can lead to improved risk stratification and prediction properties.

Conclusions:

We developed an integrated R package, “CompMix,” that provides a platform on which practitioners can implement a pipeline combining several approaches for mixtures analysis. Our study offers guidelines for selecting appropriate statistical methods to address specific scientific questions in mixtures research, and we identify critical gaps where new and better methods are needed. https://doi.org/10.1289/EHP15305

Introduction

In recent years, many environmental health studies have explored chemical mixtures using a variety of statistical methods aimed at characterizing the mixtures and assessing their effects on health outcomes. These chemical mixtures, or multipollutants, may include phthalates, phenols, polycyclic aromatic hydrocarbons (PAHs), per- and polyfluoroalkyl substances (PFAS), metals, and more. Traditional studies of the health impacts of environmental exposures have examined individual agents one at a time, primarily because of limitations in statistical methods and the prohibitive sample sizes required. In reality, however, humans are simultaneously exposed to a wide range of chemicals in their environments via various pathways, and the effects of these chemicals should be modeled jointly. Studying the joint health effect of mixtures poses significant statistical challenges. For example, the chemicals may exhibit complex dependence; the dose–response associations are often highly nonlinear and nonadditive; and the number of pollutants and their interactions can be large, with effect sizes that are potentially small and hard to detect relative to the larger effects of demographic covariates. These challenges are difficult to address satisfactorily with standard regression models.

The National Institute of Environmental Health Sciences (NIEHS) identified mixtures analysis as a high-priority research area in 2013 and 2015,1,2 and in 2017 it launched the Powering Research through Innovative Methods for Mixtures in Epidemiology (PRIME) funding program to address methodological challenges in mixtures research.3 Despite the development and availability of numerous mixtures methods, there is continued debate about which methods are best suited to a given researcher’s hypothesis for a given data set. The central goal of this paper is to provide empirical simulation-based evidence on the performance of mixture methods to help guide researchers in selecting the best available methods to address three scientific questions in data analysis: a) identifying the important toxic components of the mixture as related to the health outcome; b) identifying the interaction effects of combinations of pollutants on the outcome; and c) predicting the health outcome and identifying high-risk mixture strata.

Our most important goal is to streamline implementation so that practitioners are able to explore a variety of methods on a single platform. Having one package that runs all analyses can make results more robust and help identify consensus findings. To this end, we have developed an R package, “CompMix: A comprehensive toolkit for environmental mixtures analysis” (doi:10.32614/CRAN.package.CompMix). The package offers the flexibility to perform various tasks, such as variable selection, interaction detection, construction of composite summary risk scores, and comparison of certain performance metrics across fitted models. Our vision going forward is to update CompMix with emerging methods as they become available. We hope that this paper and the accompanying software will reduce the barriers to implementing a diverse array of modern statistical methods for mixtures analysis.

Our study is motivated by a large-scale longitudinal birth cohort study, funded by the National Institutes of Health and taking place in Puerto Rico, known as the PROTECT study, which aims to increase diversity and representation of historically neglected communities in biomedical research and investigate how exposures to a range of chemicals in the environment, including phthalates, phenols, PAHs, and metals, negatively impact birth outcomes and women’s health. Puerto Rico has 18 Superfund sites and suffers extensively from environmental contamination. At the same time, the population in Puerto Rico has higher rates of preterm birth and low birth weight (LBW), with 11.6% of live births being preterm and 10.2% being LBW, in comparison with 10.1% and 8.2%, respectively, in the general US population in 2020.4 Adverse birth outcomes, such as preterm birth (<37 wk of gestation) and LBW (birth weight <2.5 kg), are global health concerns linked to increased risks of developing conditions such as diabetes and cardiovascular disease in adulthood.5,6 Previous studies using the PROTECT cohort have observed links between individual environmental chemical exposures during pregnancy and a greater risk of preterm birth.7 However, due to unique statistical barriers, much remains unknown about the impact of exposure to environmental toxicant mixtures during pregnancy on these adverse birth outcomes.

The Landscape of Statistical Methods

Popular approaches for identifying important mixture components include high-dimensional penalized regression methods such as Lasso,8 Elastic Net (Enet),9 and Group Lasso (G-Lasso).10 More flexible nonparametric approaches include machine learning methods such as random forest (RF),11 neural networks, support vector machines, and Bayesian kernel machine regression (BKMR).12,13 Identifying interactions among exposures is an important goal in mixtures research, which prompted the development of the hierarchical integrative group least absolute shrinkage and selection operator (HigLasso),14 selection of nonlinear interactions by a forward stepwise algorithm (SNIF),15 and factor analysis for interactions (FIN).16 Lasso for Hierarchical Interactions (HierNet)17 is a general method for interaction detection that has also been used for mixtures analysis. Ensemble learning is a special area of machine learning, and its representative method, Super Learner (SL),18 targets optimal prediction but can also be used to identify variables by constructing a variable importance score and thresholding it.

Moreover, methods have also been developed to characterize summary measures of environmental mixtures, including weighted quantile sum regression (WQS)19 and quantile g-computation (Q-gcomp).20 In particular, the environmental risk score (ERS), a general approach that uses a diverse range of predictive models to construct a one-dimensional risk score,21,22 has attracted attention and has been broadly applied to quantify the health effects of pollutant mixtures.

Several publications have provided overviews of the statistical approaches available for studying the health effects of chemical mixtures. Table S1 in our supplementary material summarizes the literature analyzing the health impacts of chemical mixtures.3,23–29 Braun et al.24 summarized the methodological challenges and the research questions that can be addressed in mixtures analysis. Davalos et al.25 reviewed approaches used to examine air pollution exposures and classified them into five classes. Hamra et al.27 formulated research questions on mixtures and compared different statistical tools. Gibson et al.26 provided an extensive overview of the methods and illustrated their use with the National Health and Nutrition Examination Survey (NHANES) dataset. In particular, they categorized the methods as unsupervised or supervised: unsupervised methods focus on dimension reduction or clustering of exposures, whereas supervised methods model the impacts of exposures on a health outcome. Park et al.22 focused on machine learning approaches for constructing the ERS, using an NHANES data analysis as an illustration. These publications focused on real data analyses, in which the true contributing toxicants and associations are unknown, making it difficult to evaluate how accurately each method selects pollutants. In contrast, simulation studies are a powerful tool for comprehensively comparing methods under a broad range of data-generating mechanisms. Agier et al.23 evaluated the performance of several linear regression–based methods in a simulation setting with a continuous outcome but did not include more recent methods such as BKMR, WQS, Q-gcomp, HigLasso, or SNIF.
Other sporadic simulation studies for mixtures analysis have emerged,28,29 but a systematic evaluation of popular mixtures methods under diverse data scenarios (varied sample sizes and numbers of pollutants, with continuous and binary outcomes) is still lacking. To address this gap, we used simulation studies to perform a head-to-head comparison of mixtures methods under different data settings and to provide guidance for practitioners.

The present study proposes an analytical framework that uses variable selection/prioritization/ranking techniques, including Lasso, Enet, G-Lasso, BKMR, RF, HigLasso, HierNet, and SNIF, to identify the pollutants and interactions that are associated with the health outcome. We characterize the health effects of exposure to the mixtures by constructing ERSs, and we adapt the ensemble learning approach of Super Learner to combine the ERSs derived from various methods for improved prediction. We compare the different ERSs with summary risk measures derived from two other approaches: WQS and Q-gcomp. The evaluation metrics that we consider in our simulation study include measures of variable selection accuracy, prediction accuracy under different outcome types, ability to stratify high-risk individuals, and computational cost. By varying the sample size, the number of pollutants, and the signal-to-noise ratio, we aim to provide a comprehensive evaluation of the representative methods and to gain insight into their advantages and limitations in the context of mixtures analysis.

In the field of environmental health research, two lines of expansion are happening in tandem. One is the collection of large-scale data on multiple exposures, with the hope of understanding the health effects of complex mixtures based on sources of pollution and toxicological characterization of exposure–response functions. The second is the exploding landscape of statistical learning methods for analyzing complex mixtures. The coding details and analytic complexity associated with the latter can often overwhelm the core scientific goal of a mixtures analysis. A pipeline like the one we present in the software package “CompMix,” which automates and streamlines multiple data analysis approaches for various environmental mixtures contexts, reduces the analytic barrier and allows researchers to focus on synthesizing results from multiple methods to paint a bigger picture. Instead of committing to one method or one task, researchers can now examine the health effects of mixtures holistically from multiple angles in a relatively quick and comprehensive way. The investigator can focus more on interpreting the results than on writing elaborate code and getting lost in numerical details.

Methods

General Framework

We selected 11 representative statistical approaches and categorized them into three groups according to the three main objectives of a typical mixtures analysis plan: a) identifying the important toxic components of the mixture; b) identifying the interaction effects of combinations of pollutants; and c) evaluating the predictive performance of summary measures and risk stratification. An important facet of this framework is that all of these goals are defined with respect to a specific health outcome. The grouping of methods is presented in Figure 1, and further details of each method are provided later in the “Methods” section.

Figure 1.

Figure 1 is a flowchart with three steps, each leading to Super Learner-ERS. Step 1, pollutant selection: penalized regressions (Lasso, elastic net, group Lasso) and machine learning (BKMR, random forests). Step 2, interaction detection: targeted interaction search methods (HierNet, HigLasso, SNIF). Step 3, outcome prediction and risk stratification: summary measures (environmental risk score, WQS, Q-gcomp).

Methods for mixtures analysis categorized in three groups, depending on inferential goals.

For objective 1, which involves pollutant selection, we considered methods that perform variable selection or provide variable importance scores that can be used to rank the input variables and thresholded for variable selection. The commonly used methods for this objective include penalized regressions such as Lasso, Enet, and G-Lasso, which are among the most highly cited methods for variable selection in a regression setting. In addition, we considered two flexible nonparametric machine learning methods, BKMR and RF, which provide pollutant importance rankings and can capture nonlinear associations between exposures and outcomes as well as exposure-by-exposure interactions. Notably, BKMR was developed specifically to address challenges in chemical mixtures analysis and has become one of the most popular approaches in this area.

For objective 2, which involves interaction detection, we considered three methods that were specifically developed for interaction selection: HigLasso, HierNet, and SNIF. HigLasso and SNIF were directly motivated by problems in chemical mixtures analysis. Although these interaction search methods are less commonly used in mixtures applications, we aimed to raise practitioners’ awareness of them, because interaction detection is a very important aspect of mixtures analysis. Some of the methods for pollutant selection, such as Lasso, Enet, and G-Lasso, can also be used for interaction selection if the interaction terms are specified in the underlying models. However, these methods were not designed for interaction selection, so they do not account for key assumptions such as the heredity principle.14 Under weak heredity, an interaction term can be selected only if at least one of its two main effects is also selected; under strong heredity, both main effects must be selected. HierNet allows users to specify weak or strong heredity, and it selects linear main, interaction, and quadratic terms. HigLasso and SNIF both impose strong heredity and are able to select nonlinear main and interaction effects.
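To make the two heredity rules concrete, the following minimal Python sketch checks whether a set of selected main effects and pairwise interactions satisfies strong or weak heredity. It is only a post hoc check, under the assumption that selections are reported as pollutant names and name pairs; HierNet, HigLasso, and SNIF enforce heredity during fitting rather than by filtering afterward. The function name and pollutant names are hypothetical.

```python
def satisfies_heredity(main_effects, interactions, strong=True):
    """Check selected pairwise interactions against the heredity principle.

    main_effects : collection of selected pollutant names
    interactions : iterable of (a, b) name pairs selected as interactions
    strong       : True -> both parent main effects required (strong heredity);
                   False -> at least one parent required (weak heredity)
    """
    selected = set(main_effects)
    needed = 2 if strong else 1
    for a, b in interactions:
        n_parents = int(a in selected) + int(b in selected)
        if n_parents < needed:
            return False  # this interaction violates the chosen heredity rule
    return True


# Hypothetical selections: main effects were selected for Pb and Hg only.
mains = {"Pb", "Hg"}
print(satisfies_heredity(mains, [("Pb", "Hg")], strong=True))   # True
print(satisfies_heredity(mains, [("Pb", "Cd")], strong=True))   # False
print(satisfies_heredity(mains, [("Pb", "Cd")], strong=False))  # True
```

A check like this can flag Lasso-MI or Enet-MI fits in which an interaction was selected without its parent main effects.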

For objective 3, which involves prediction of health outcomes and risk stratification, we considered four methods: ERS, WQS, Q-gcomp, and SL-ERS. ERS and SL-ERS use predictive models to create a summary risk score. WQS and Q-gcomp, both designed to quantify the joint effects of mixtures, also construct a summary measure or burden index using a weighted average of exposures. These methods are widely used for risk prediction and risk stratification. The original SL uses cross-validation to create a weighted combination of different learners to improve prediction. Motivated by this fundamental idea, we developed our own version of SL for ensembling various ERSs, which we refer to as SL-ERS or $\mathrm{ERS}_{\mathrm{SL}}$.
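The ensembling idea can be sketched as follows, assuming that each column of `cv_scores` holds out-of-fold (cross-validated) ERS predictions from one component method. This minimal Python sketch mirrors the default `method.NNLS` metalearner of the R SuperLearner package, which fits nonnegative least squares of the outcome on the candidate predictions and rescales the weights to sum to one; the authors' SL-ERS may differ in its details. NNLS is solved here by projected gradient descent, and the function names and data are hypothetical.

```python
import numpy as np


def nnls_projected_gradient(A, b, n_iter=5000):
    """Nonnegative least squares, min ||b - A w||^2 s.t. w >= 0, by projected gradient."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / largest eigenvalue of A^T A
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)              # gradient of 0.5 * ||b - A w||^2
        w = np.maximum(w - step * grad, 0.0)  # project onto the nonnegative orthant
    return w


def ensemble_weights(cv_scores, y):
    """Weights for combining component ERSs: NNLS fit, rescaled to sum to one."""
    w = nnls_projected_gradient(np.asarray(cv_scores, float), np.asarray(y, float))
    total = w.sum()
    if total == 0:
        return np.full(w.size, 1.0 / w.size)  # equal-weight fallback: an assumption of this sketch
    return w / total


# Hypothetical example: three component ERSs; the outcome is built from the first two.
rng = np.random.default_rng(0)
cv_scores = rng.normal(size=(200, 3))
y = 0.7 * cv_scores[:, 0] + 0.3 * cv_scores[:, 1]
weights = ensemble_weights(cv_scores, y)
combined_ers = cv_scores @ weights           # ensembled risk score for each subject
print(np.round(weights, 3))                  # weights close to 0.7, 0.3, and 0
```

Constraining the weights to be nonnegative and to sum to one keeps the ensembled score a convex combination of the component ERSs, so a method that adds no predictive value can receive zero weight.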

Mixture Methods

We provide a detailed overview of current statistical methods for mixtures analysis (Table 1), with a focus on supervised methods that can be used to construct an ERS. We begin by introducing the notation and problem setup. Consider a random sample of $n$ subjects. For subject $i$ ($i = 1, \ldots, n$), let $x_{i,p}$ denote the exposure to the $p$th environmental pollutant ($p = 1, \ldots, P$), $z_{i,k}$ the $k$th confounding factor ($k = 1, \ldots, K$), and $y_i$ the one-dimensional continuous or binary outcome of interest. Let $x_p = \{x_{i,p}\}_{i=1}^{n}$, $z_k = \{z_{i,k}\}_{i=1}^{n}$, and $y = \{y_i\}_{i=1}^{n}$. Let $D_i = (x_{i,1}, \ldots, x_{i,P}, z_{i,1}, \ldots, z_{i,K}, y_i)$ represent the observed data for subject $i$. In addition, we define $\|a\|_2^2 = a^\top a$ for any vector $a$. For simplicity of illustration and without loss of generality, we assume that confounders are not present when reviewing some of the existing methods.

Table 1.

Summary characteristics of various methods for analysis of chemical mixtures.

Method | Input | Output | Covariate adjustment | Heredity principle | Strengths | Limitations | R package
Lasso-M | Pollutants and covariates | Coefficients for selected pollutants; selected or adjusted covariates | Penalized or always adjusted | No | Linear main effects; easy interpretation | Lasso tends to select one pollutant from a group of correlated pollutants. | glmnet
Lasso-MI | Pollutants, covariates, and the interactions of interest | Coefficients for selected pollutants; selected interactions; selected or adjusted covariates | Penalized or always adjusted | No | Linear main and interaction effects with interactions modeled as product terms | Lasso tends to select one pollutant from a group of correlated pollutants; it is likely that interactions are selected without the selection of either main effect. | glmnet
Enet-M | Pollutants and covariates | Coefficients for selected pollutants; selected or adjusted covariates | Penalized or always adjusted | No | Linear main effects; easy interpretation; Enet tends to select correlated pollutants together. | May not fit the data with nonlinear associations or interactions well | glmnet
Enet-MI | Pollutants, covariates, and the interactions of interest | Coefficients for selected pollutants; selected interactions; selected or adjusted covariates | Penalized or always adjusted | No | Linear main and interaction effects; Enet tends to select correlated pollutants together. | It is likely that interactions are selected without the selection of either main effect. | glmnet
G-Lasso-M | Pollutants, covariates and groups of pollutants | Coefficients for selected pollutants; selected or adjusted covariates | Penalized or always adjusted | No | Requires prespecified groups of variables | Including or excluding the groups of variables | gglasso
G-Lasso-MI | Pollutants, covariates and groups of pollutants and interactions | Coefficients for selected pollutants; selected interactions; selected or adjusted covariates | Penalized or always adjusted | No | Requires prespecified groups of variables | Including or excluding the groups of variables. It is likely that interactions are selected without the selection of either main effect. | gglasso
BKMR | Pollutants and covariates | Posterior inclusion probabilities for all the pollutants | Covariates adjusted linearly or subject to selection inside the kernel | No | Component-wise or hierarchical pollutant selection; nonlinear pollutant effects and interactions | Computationally intensive for large datasets as well as large number of pollutants | bkmr
RF | Pollutants and covariates | Importance score for pollutants and covariates | Pollutants and covariates cannot be treated separately, no way to force confounders in the model. | No | Nonparametric assumptions; providing measures of importance for predictors; robust to outliers | Not a direct method to perform variable selection; may be difficult to interpret | randomForest
HigLasso | Pollutants and covariates | Selected pollutants; covariates and the interactions | Pollutants and covariates cannot be treated separately, no way to force confounders in the model. | Strong | Nonlinear main and interaction effects, focused on selection | Prediction for outcome is not available; computationally intensive for large datasets. | higlasso
HierNet | Pollutants and covariates | Selected pollutant; covariates; the interactions and the quadratic terms | Pollutants and covariates cannot be treated separately. | Weak or strong | Linear main, interaction and quadratic main effects | Computationally intensive with a large number of pollutants | hierNet
SNIF | Pollutants and covariates | Selected pollutants; covariates and the interactions | Pollutants and covariates cannot be treated separately | Strong | Nonlinear main and interaction effects | Not available to binary outcome. The variable selection may be conservative. | snif
Summary risk scores
WQS | Pollutants; covariates; and the interactions of interest | Positive weights for pollutants; weighted index for mixture effects | Covariates treated via parametric models. | NA | Easy interpretation for weights and mixture index. Interaction of mixture index can be included. | Not a direct method to perform pollutant selection. Directional homogeneity may be strong. | gWQS
Q-gcomp | Pollutants; covariates; and the interactions of interest | Positive or negative weights for pollutants; weighted index for mixture effects | Covariates treated via parametric models | NA | Easy interpretation for mixture index; positive or negative weights both allowed; computationally efficient | Not a direct method to perform pollutant selection | qgcomp
SL-ERS | Candidate learners, pollutants, covariates or input risk scores | Weight for component ERS estimated adaptively | Different covariate adjustments depending on the learner | NA | Aims to provide the optimal combination of input generated risk scores to improve prediction | Interpreting the associations between pollutants and outcome is complex due to ensembling of several learners. | SuperLearner, CompMix

Note: BKMR, Bayesian kernel machine regression; Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; ERS, environmental risk score; G-Lasso-M, group lasso for main effects; G-Lasso-MI, group lasso for main effects and interactions; HierNet, lasso for hierarchical interactions; HigLasso, hierarchical integrative group lasso; Lasso-M, lasso for main effects; Lasso-MI, lasso for main effects and interactions; NA, not applicable; Q-gcomp, quantile g-computation; RF, random forest; SL-ERS, Super Learner used to adaptively combine component ERS through weighting; SNIF, selection of nonlinear interactions by a forward stepwise algorithm; WQS, weighted quantile sum regression.

Pollutant identification.

This group of approaches aims to select the main effects of the pollutants. These approaches can be divided into two categories: penalized regression approaches and machine learning approaches.

Lasso.

The least absolute shrinkage and selection operator (Lasso) was proposed by Tibshirani in 1996.8 It is a widely used linear regression method that performs feature selection by penalizing the sum of the absolute values of the coefficients, termed the L1 penalty. It was developed to improve prediction accuracy by selecting the most important predictors while shrinking the coefficients of less relevant ones to zero. Lasso minimizes the following objective function:

$$\left\| y - \beta_0 - \sum_{p=1}^{P} x_p \beta_p \right\|_2^2 + \lambda \sum_{p=1}^{P} |\beta_p|,$$

where $\beta_p$ ($p = 0, \ldots, P$) are model parameters and $\lambda \ge 0$ is a tuning parameter that controls the strength of shrinkage and hence how many coefficients are set to zero. When $\lambda = 0$, Lasso is equivalent to ordinary least squares regression; as $\lambda$ increases, more regression coefficients $\beta_p$ are shrunk to zero. In our analysis, we selected the $\lambda$ value that gives the minimum mean cross-validated error. One limitation of Lasso is that, when handling a group of correlated predictors, it often selects one predictor from the group while shrinking the coefficients of the others to zero. We implemented Lasso via the R package “glmnet” (version 4.1-4).30
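As an illustration of how the Lasso objective above can be minimized, the following Python sketch uses cyclic coordinate descent with soft-thresholding, the same general strategy that glmnet uses. It is a minimal sketch, not glmnet itself: it minimizes the objective exactly as written above, whereas glmnet scales the squared-error term by $1/(2n)$ and standardizes predictors by default, so $\lambda$ values are not directly comparable. The intercept is left unpenalized by centering, and the cross-validated choice of $\lambda$ (in glmnet, `lambda.min` from `cv.glmnet`) is not shown. Function names and data are hypothetical.

```python
import numpy as np


def soft_threshold(a, t):
    """Soft-thresholding operator S(a, t) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * max(abs(a) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize ||y - b0 - X b||_2^2 + lam * sum_p |b_p| by cyclic coordinate descent."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering leaves the intercept unpenalized
    col_ss = (Xc ** 2).sum(axis=0)           # x_p^T x_p for each pollutant
    beta = np.zeros(X.shape[1])
    resid = yc.copy()                        # equals yc - Xc @ beta throughout
    for _ in range(n_iter):
        for p in range(X.shape[1]):
            resid += Xc[:, p] * beta[p]      # partial residual without pollutant p
            rho = Xc[:, p] @ resid
            beta[p] = soft_threshold(rho, lam / 2.0) / col_ss[p]
            resid -= Xc[:, p] * beta[p]
    b0 = y_mean - x_mean @ beta
    return b0, beta


# Hypothetical data: 10 exposures, only exposures 0 and 3 affect the outcome.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=200)
b0, beta = lasso_cd(X, y, lam=100.0)
print(np.flatnonzero(beta))  # indices of the selected exposures
```

Setting `lam=0.0` reproduces ordinary least squares, and increasing `lam` sets more coefficients exactly to zero, matching the behavior described above.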

Enet.

Enet was proposed by Zou et al. in 2005.9 It is also a penalized regression method that performs variable selection, and it addresses this limitation of the L1 penalty used by Lasso by adding an L2 penalty, the sum of the squared coefficients, which encourages correlated predictors to be selected together. This property is especially appealing in mixtures analysis, where exposures to pollutants within the same chemical class tend to be highly correlated. Enet minimizes the following objective function:

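$$\left\| y - \beta_0 - \sum_{p=1}^{P} x_p \beta_p \right\|_2^2 + \lambda \sum_{p=1}^{P} \left[ \alpha |\beta_p| + (1 - \alpha) \beta_p^2 \right],$$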

where $0 \le \alpha \le 1$ is a second tuning parameter, in addition to $\lambda$, that controls the relative weights of the L1 and L2 penalties. In our analysis, we specified $\alpha = 0.5$ and selected the $\lambda$ value that gives the minimum mean cross-validated error. We implemented Enet via the R package “glmnet” (version 4.1-4).30
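The elastic net objective above can be minimized with the same coordinate descent idea used for the Lasso; only the coordinate update changes, because the L2 term adds $\lambda(1-\alpha)$ to the denominator. The Python sketch below is hypothetical and minimizes the objective as written in this section (not glmnet's rescaled version). Its example constructs two nearly collinear exposures, a made-up setup, to show the grouping behavior described above: with $\alpha = 0.5$, both receive similar nonzero coefficients.

```python
import numpy as np


def enet_cd(X, y, lam, alpha=0.5, n_iter=500):
    """Minimize ||y - b0 - X b||^2 + lam * sum_p [alpha |b_p| + (1 - alpha) b_p^2]
    by cyclic coordinate descent (alpha = 1 gives the Lasso)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centering leaves the intercept unpenalized
    col_ss = (Xc ** 2).sum(axis=0)
    beta = np.zeros(X.shape[1])
    resid = yc.copy()                        # equals yc - Xc @ beta throughout
    for _ in range(n_iter):
        for p in range(X.shape[1]):
            resid += Xc[:, p] * beta[p]      # partial residual without pollutant p
            rho = Xc[:, p] @ resid
            # L1 part soft-thresholds; L2 part adds lam * (1 - alpha) to the denominator
            shrunk = np.sign(rho) * max(abs(rho) - lam * alpha / 2.0, 0.0)
            beta[p] = shrunk / (col_ss[p] + lam * (1.0 - alpha))
            resid -= Xc[:, p] * beta[p]
    return y_mean - x_mean @ beta, beta


# Hypothetical data: exposures 0 and 1 are near-copies of one latent exposure z.
rng = np.random.default_rng(2)
z = rng.normal(size=200)
X = np.column_stack([z + 0.05 * rng.normal(size=200),
                     z + 0.05 * rng.normal(size=200),
                     rng.normal(size=(200, 4))])
y = 2.0 * z + rng.normal(scale=0.5, size=200)
b0, beta = enet_cd(X, y, lam=200.0, alpha=0.5)
print(np.round(beta, 2))
```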

G-Lasso.

G-Lasso was proposed by Yuan et al. in 2006.10 It is an extension of Lasso that performs variable selection on prespecified groups of variables, so that an entire group of variables is either included in or excluded from the model. As with Enet, this feature is particularly suitable for mixtures, because pollutants are often correlated and grouped into chemical classes. However, not all pollutants in a group may be relevant to the outcome, despite being highly correlated; G-Lasso may therefore have a high false discovery rate (FDR). Suppose the data consist of $G$ groups of exposures. For each group $g = 1, \ldots, G$, let $X_g$ denote the design matrix of the exposure variables in group $g$. The objective function is as follows:

$$\left\| y - \sum_{g=1}^{G} X_g \beta_g \right\|_2^2 + \lambda \sum_{g=1}^{G} \|\beta_g\|_2,$$

where $\beta_g$ ($g = 1, \ldots, G$) are vectors of coefficients and $\|\cdot\|_2$ denotes the L2 norm. In our analysis, we selected the $\lambda$ value that gives the minimum mean cross-validated error. We implemented G-Lasso via the R package “gglasso” (version 1.5).31
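A minimal way to minimize the G-Lasso objective above is proximal gradient descent: take a gradient step on the squared-error term, then apply block soft-thresholding, which shrinks each group's coefficient vector toward zero and sets it exactly to zero when its norm is small enough. This Python sketch assumes centered data (no intercept, as in the objective). It is an illustration only: gglasso uses a different algorithm (groupwise majorization descent) and by default weights each group's penalty by the square root of the group size, which this sketch omits. Function names, groupings, and data are hypothetical.

```python
import numpy as np


def group_lasso_pg(X, y, groups, lam, n_iter=3000):
    """Minimize ||y - X b||_2^2 + lam * sum_g ||b_g||_2 by proximal gradient (ISTA).
    Assumes X and y are centered, since the objective has no intercept."""
    groups = np.asarray(groups)
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (y - X @ beta)           # gradient of the squared-error term
        v = beta - step * grad
        for g in np.unique(groups):
            idx = groups == g
            norm_g = np.linalg.norm(v[idx])
            # block soft-thresholding: shrink the whole group, or set it exactly to zero
            scale = max(0.0, 1.0 - step * lam / norm_g) if norm_g > 0 else 0.0
            beta[idx] = scale * v[idx]
    return beta


# Hypothetical grouping: three metals, three phthalates, two phenols.
groups = [0, 0, 0, 1, 1, 1, 2, 2]
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 8))
X -= X.mean(axis=0)
y = X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200)  # only 2 of the 3 metals matter
y -= y.mean()
beta = group_lasso_pg(X, y, groups, lam=150.0)
print(np.round(beta, 2))
```

In this example the whole first group is retained, including a pollutant with no true effect, which illustrates why G-Lasso can have a high FDR.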

BKMR.

BKMR was proposed by Bobb et al. in 2015.13 It is a machine learning approach developed to address the statistical challenges in estimating the simultaneous effects from exposure to multiple pollutants. We consider the following model:

$$y = h(x_1, \ldots, x_P) + \alpha^\top z + \varepsilon,$$

where $h$ is an unknown function of the exposures to be estimated. Covariates $z$ enter the model linearly; however, if nonlinear relationships between covariates and the outcome are suspected, covariates can also be included in the $h$ function to improve overall prediction accuracy and selection. BKMR performs pollutant selection by providing a posterior inclusion probability (PIP, between 0 and 1) for each variable included in the $h$ function. The PIP can be viewed as an importance score, with a higher PIP indicating greater importance of the variable. Unlike shrinkage methods, BKMR does not directly perform variable selection; researchers therefore need to prespecify a PIP threshold (e.g., 0.50) and select the pollutants whose PIPs exceed it. Note also that although BKMR specifies the outcome as a nonlinear function of the exposures that implicitly accounts for interaction effects, it does not provide direct inference for interaction selection. Therefore, BKMR cannot be included in comparisons of interaction selection accuracy among methods. BKMR requires specifying the number of iterations of the Markov chain Monte Carlo (MCMC) sampler; we ran 2,000 iterations for all simulations and 10,000 iterations for the data analysis. We implemented BKMR via the R package “bkmr” (version 0.2.0).32
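In BKMR with component-wise selection, the PIP for pollutant $p$ is the posterior mean of a 0/1 inclusion indicator, estimated as the fraction of retained MCMC draws in which that indicator equals 1. The Python sketch below computes PIPs from a matrix of indicator draws and applies the 0.50 threshold described above. It assumes the draws have already been extracted from a fitted model; the function names, the `burn_in` argument, and the pollutant names are hypothetical.

```python
import numpy as np


def pips_from_draws(delta_draws, burn_in=0):
    """PIP_p = share of retained MCMC draws in which the inclusion indicator delta_p is 1."""
    kept = np.asarray(delta_draws, dtype=float)[burn_in:]
    return kept.mean(axis=0)


def select_by_pip(pips, names, threshold=0.5):
    """Keep pollutants whose PIP is strictly higher than the threshold."""
    return [name for name, pip in zip(names, pips) if pip > threshold]


# Hypothetical indicator draws (rows = MCMC iterations, columns = pollutants).
draws = [[1, 0, 1],
         [1, 0, 0],
         [1, 1, 1],
         [0, 0, 1]]
pips = pips_from_draws(draws)
print(pips)                                     # [0.75 0.25 0.75]
print(select_by_pip(pips, ["Pb", "Hg", "Mn"]))  # ['Pb', 'Mn']
```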

RF.

RF was proposed by Breiman in 2001.11 It is an ensemble learning method that combines many randomly generated tree-structured classifiers (or regression trees for continuous outcomes). It has been applied successfully in a broad range of areas, including environmental health. It is relatively robust to outliers and noise, and it provides a variable importance score. For RF, setting a threshold as in BKMR would be challenging, because the importance score describes the loss of accuracy when a pollutant is removed rather than a probability between 0 and 1 like the PIP. To carry out pollutant selection, we propose using k-means clustering33 to group the pollutant importance scores into two clusters; the pollutants in the cluster with the higher importance scores are selected. Note that for all methods other than RF, predictions are based on the selected variables and the corresponding model fit. For RF, however, the variables selected by k-means serve only as guidelines, because RF is not designed as a variable selection tool; its predictions are based on a model fit using all the pollutants, not just the selected ones. We implemented RF via the R package “randomForest” (version 4.6-14).34
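The k-means selection rule for RF importance scores can be sketched in Python as follows. This is a minimal one-dimensional version of Lloyd's k-means with $k = 2$ and deterministic initialization at the smallest and largest scores; it is an assumption for illustration, and R's `kmeans` (whose default is the Hartigan–Wong algorithm) may give different clusters in borderline cases. In practice the scores would come from a fitted RF; here they are made-up numbers for hypothetical pollutants.

```python
import numpy as np


def kmeans_select(scores, n_iter=100):
    """Split 1-D importance scores into two clusters (Lloyd's k-means, k = 2) and
    return a boolean mask marking the cluster with the higher center."""
    s = np.asarray(scores, dtype=float)
    centers = np.array([s.min(), s.max()])  # deterministic start at the extremes
    for _ in range(n_iter):
        labels = np.abs(s[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([s[labels == k].mean() if np.any(labels == k) else centers[k]
                                for k in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.abs(s[:, None] - centers[None, :]).argmin(axis=1)
    return labels == np.argmax(centers)


# Hypothetical RF importance scores for six pollutants.
names = ["Pb", "Hg", "Mn", "MEHP", "BPA", "Cd"]
importance = [0.10, 0.20, 0.15, 5.00, 4.50, 0.05]
mask = kmeans_select(importance)
print([n for n, keep in zip(names, mask) if keep])  # ['MEHP', 'BPA']
```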

Interaction detection.

This group of approaches aims to perform interaction detection in addition to main effect selection.

HierNet.

HierNet was proposed by Bien et al. in 2013.35 HierNet extends the Lasso to select linear main and interaction effects under heredity constraints, and users can choose between strong and weak heredity. With strong heredity, an interaction term is selected into the model only when both of its corresponding main effects are selected, whereas with weak heredity, an interaction term can be selected if at least one of its corresponding main effects is selected. In addition to selecting interactions, HierNet also screens for quadratic terms; because quadratic terms are not of primary interest, they are not summarized in this paper. We implemented HierNet via the R package “hierNet” (version 1.9).17

HigLasso.

HigLasso was proposed by Boss et al. in 2021.14 It was developed as a general shrinkage method that selects nonlinear main and interaction effects of exposures, allowing for nonlinear associations between exposures and health outcomes. To model complex nonlinear relationships, HigLasso adopts a basis expansion approach under a strong heredity constraint, and it performs variable selection by imposing sparsity on the coefficient estimates using G-Lasso penalties. We implemented HigLasso via the R package “higlasso” (version 0.9.0).36

SNIF.

Selection of nonlinear interactions by a forward stepwise algorithm (SNIF) was proposed by Narisetty et al. in 2019.15 It was motivated by the need to identify chemicals that affect health outcomes and was developed under a general regression framework for selecting interaction effects. Like HigLasso, SNIF adopts a basis expansion approach for modeling nonlinear exposure effects and performs interaction selection under the strong heredity constraint. However, SNIF has the additional flexibility to retain linear effects of exposures in its selection path, which reduces the number of parameters in the model when a linear model fits the data well. We implemented SNIF via the R package “snif” (version 0.5.0).37

Outcome prediction and risk stratification.

This group of approaches aims to optimize the accuracy of predictions for health outcomes while constructing summary risk measures from exposure to chemical mixtures and covariates.

ERS.

The classic Environmental Risk Score (ERS) was proposed by Park et al. in 2014 and 2017.21,22 It is a one-dimensional risk score constructed through various predictive models. However, a limitation of the classic ERS is that it is restricted to the estimated health effects of pollutants. Penalization methods such as Lasso, Enet, and G-Lasso can be used to compute the classic ERS as a weighted sum of pollutants or their interactions while controlling for covariates. However, more flexible machine learning algorithms such as BKMR, RF, and SNIF are not readily applicable for computing the classic ERS, because they estimate health effects as complex joint functions of pollutants and covariates, without a direct separation of pollutant-only effects from other effects. Thus, in this paper we redefine the ERS as the prediction of the continuous health outcome or of the logit of the probability of the binary outcome. With this updated definition, we can compare ERSs computed through various statistical methods in terms of predictive power, interpretability, and risk stratification. Consider the model $y = g(x_1,\ldots,x_P, z_1,\ldots,z_K) + \varepsilon$, $\varepsilon \sim N(0,\sigma^2)$, where we use the training data to obtain $\hat{g}$ and define the ERS as $\hat{y} = \hat{g}(x_1,\ldots,x_P, z_1,\ldots,z_K)$ for a collection of pollutants $x_1,\ldots,x_P$ and covariates $z_1,\ldots,z_K$.
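Under this redefinition, any fitted prediction function yields an ERS. The minimal sketch below assumes ridge regression as a stand-in for $\hat{g}$ (it is none of the four methods used in the paper) and uses simulated pollutants and covariates; the ERS is simply $\hat{g}$ evaluated on held-out subjects. For a binary outcome, the same idea would use the fitted logit instead of the fitted mean. `fit_ridge` is a hypothetical helper.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Ridge regression with an unpenalized intercept; returns a prediction function g_hat.

    Ridge is only a stand-in learner; Enet, BKMR, HierNet, or SNIF could play the role of g.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda X_new: y_mean + (X_new - x_mean) @ beta


rng = np.random.default_rng(0)
n, P, K = 400, 5, 2
X = rng.normal(size=(n, P))                  # pollutants x_1..x_P
Z = rng.normal(size=(n, K))                  # covariates z_1..z_K
y = X[:, 0] + 0.5 * Z[:, 0] + rng.normal(size=n)

XZ = np.column_stack([X, Z])                 # g takes pollutants and covariates jointly
train, test = np.arange(n // 2), np.arange(n // 2, n)
g_hat = fit_ridge(XZ[train], y[train])
ers_test = g_hat(XZ[test])                   # ERS = y_hat = g_hat(x, z) on the test set
```

Because covariates enter $\hat{g}$ jointly with pollutants, this ERS does not need to isolate a pollutant-only component, which is what allows nonlinear learners to be compared on the same footing.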

We selected four methods to construct ERS, namely Enet, BKMR, HierNet, and SNIF, each with distinct strengths in modeling: linear data with easy interpretation (Enet), nonlinear exposure–response relationships (BKMR), linear data with interaction detection (HierNet), and nonlinear data with interaction detection (SNIF). We constructed four such method-specific ERSs, i.e., $\mathrm{ERS}_{\text{Enet-MI}}$, $\mathrm{ERS}_{\text{BKMR}}$, $\mathrm{ERS}_{\text{HierNet}}$, and $\mathrm{ERS}_{\text{SNIF}}$, where Enet-MI refers to the Enet model with main and interaction effects.

SL-ERS.

To enhance the prediction accuracy of these four ERSs, and inspired by the original Super Learner, we then construct SL-ERS as $w_{\text{Enet-MI}}\mathrm{ERS}_{\text{Enet-MI}} + w_{\text{BKMR}}\mathrm{ERS}_{\text{BKMR}} + w_{\text{HierNet}}\mathrm{ERS}_{\text{HierNet}} + w_{\text{SNIF}}\mathrm{ERS}_{\text{SNIF}}$, where the four unknown weights $w_{\text{Enet-MI}}$, $w_{\text{BKMR}}$, $w_{\text{HierNet}}$, and $w_{\text{SNIF}}$ each range from 0 to 1 and sum to 1. We iteratively solved for these weights using a coordinate descent algorithm. The SL-ERS construction is explained below.
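The text does not spell out the coordinate descent update. Changing a single weight would break the sum-to-one constraint, so the sketch below assumes a pairwise variant: each step moves mass between two weights by the exact minimizer of the squared-error loss along that direction, clipped so both weights stay nonnegative. `simplex_weights_pairwise_cd` is a hypothetical name, and `P` is assumed to hold the learners' validation-set predictions, one column per learner.

```python
import numpy as np


def simplex_weights_pairwise_cd(P, y, n_sweeps=200, tol=1e-10):
    """Minimize ||y - P @ w||^2 subject to w >= 0 and sum(w) = 1.

    Each update sets w_a += d and w_b -= d, which keeps sum(w) = 1. The step d is
    the exact 1-D minimizer r.(p_a - p_b) / ||p_a - p_b||^2, clipped to [-w_a, w_b].
    """
    n, J = P.shape
    w = np.full(J, 1.0 / J)                  # start from equal weights
    r = y - P @ w                            # current residual
    for _ in range(n_sweeps):
        max_step = 0.0
        for a in range(J):
            for b in range(a + 1, J):
                diff = P[:, a] - P[:, b]
                denom = diff @ diff
                if denom == 0.0:             # identical learners: nothing to move
                    continue
                d = (r @ diff) / denom       # unconstrained minimizer along the pair
                d = min(max(d, -w[a]), w[b])  # keep w_a + d >= 0 and w_b - d >= 0
                if d != 0.0:
                    w[a] += d
                    w[b] -= d
                    r -= d * diff            # residual after the prediction changes
                    max_step = max(max_step, abs(d))
        if max_step < tol:
            break
    return w


rng = np.random.default_rng(0)
a, b, c = rng.normal(size=(3, 200))          # three hypothetical learners' predictions
y = 0.7 * a + 0.3 * b                        # outcome is an exact convex combination
w = simplex_weights_pairwise_cd(np.column_stack([a, b, c]), y)
```

Moving mass between pairs keeps every iterate feasible, so no separate projection or renormalization step is needed.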

Super Learner (SL) was proposed by van der Laan et al. in 2007.18 It is a prediction algorithm that finds the optimal combination of a collection of candidate learners to minimize the overall risk. The main challenge with SL-ERS is determining the optimal weights for the four learners using the training data. To achieve this, on the training dataset with sample size $n_{\text{train}}$, we used a 10-fold cross-validation scheme in which, for each fold, the data were randomly split into 60% for estimation and 40% for validation. We then applied the coordinate descent algorithm under the constraint that the four weights are nonnegative and sum to 1. For the $t$th fold ($t=1,\ldots,10$), let $I_{\text{est}}^{(t)} \subset \{1,\ldots,n_{\text{train}}\}$ and $I_{\text{valid}}^{(t)} \subset \{1,\ldots,n_{\text{train}}\}$ denote the indices of the 60% and 40% of subjects randomly drawn from the training data for estimation and validation, respectively, i.e., $|I_{\text{est}}^{(t)}| = 0.6\,n_{\text{train}}$ and $|I_{\text{valid}}^{(t)}| = 0.4\,n_{\text{train}}$, where $|\cdot|$ denotes the cardinality of a set. Let $\mathcal{J} = \{\text{Enet-MI}, \text{BKMR}, \text{HierNet}, \text{SNIF}\}$ be the collection of four learners. For each learner $j \in \mathcal{J}$, we estimated $g_j$ using the $t$th-fold estimation data $\{(y_i, x_{i,1},\ldots,x_{i,P}, z_{i,1},\ldots,z_{i,K}) : i \in I_{\text{est}}^{(t)}\}$ and denote the estimate by $\hat{g}_j^{(t)}$. To optimize the learner weights, following the Super Learner idea of optimizing weights when combining multiple predictions, we minimized the loss function on the validation data. For a continuous outcome, this is

$$\min_{\mathbf{w}} \sum_{t=1}^{10} \sum_{i \in I_{\text{valid}}^{(t)}} \Big[ y_i - \sum_{j \in \mathcal{J}} w_j\, \hat{g}_j^{(t)}(x_{i,1},\ldots,x_{i,P}, z_{i,1},\ldots,z_{i,K}) \Big]^2,$$

where $\mathbf{w} = (w_j)_{j \in \mathcal{J}}$. For a binary outcome, we used a cross-entropy loss function. Once the weights $\mathbf{w}$ are obtained from the training data, we compute the predicted outcome for the test data, $\hat{y}_{\text{test}} = \mathrm{ERS}_{\text{SL}} = \sum_{j \in \mathcal{J}} w_j\, \mathrm{ERS}_j$, where $\mathrm{ERS}_j = \hat{g}_j(x_{\text{test},1},\ldots,x_{\text{test},P}, z_{\text{test},1},\ldots,z_{\text{test},K})$ is the ERS constructed by learner $j$ on the test data. This is the value of SL-ERS.
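The fold structure described above (as we read it, 10 independent random 60%/40% splits of the training data) and the two loss functions can be sketched as follows. The learner interface (`fit(X, y)` returning a prediction function), the two toy learners, and the assumption that each binary-outcome column is a logit-scale ERS are illustrative choices, not the authors' code; the stacked matrix is what a constrained weight optimizer would consume.

```python
import numpy as np


def random_est_valid_splits(n_train, n_folds=10, est_frac=0.6, seed=0):
    """Draw n_folds independent random estimation/validation splits of the training set."""
    rng = np.random.default_rng(seed)
    n_est = int(round(est_frac * n_train))
    splits = []
    for _ in range(n_folds):
        perm = rng.permutation(n_train)
        splits.append((perm[:n_est], perm[n_est:]))
    return splits


def stack_validation_predictions(X, y, learners, splits):
    """Fit each learner on I_est(t), predict I_valid(t), and stack rows over all folds t.

    learners: dict mapping a name to fit(X, y) -> predict(X_new) (hypothetical interface).
    Returns y_stack (N,) and P_stack (N, J), one column per learner.
    """
    ys, blocks = [], []
    for est, val in splits:
        preds = [fit(X[est], y[est])(X[val]) for fit in learners.values()]
        ys.append(y[val])
        blocks.append(np.column_stack(preds))
    return np.concatenate(ys), np.vstack(blocks)


def squared_error_loss(w, y_stack, P_stack):
    """Continuous outcome: squared residuals summed over folds t and i in I_valid(t)."""
    r = y_stack - P_stack @ w
    return float(r @ r)


def cross_entropy_loss(w, y_stack, P_stack):
    """Binary outcome, assuming each column of P_stack is a learner's logit-scale ERS."""
    eta = P_stack @ w
    return float(np.sum(np.logaddexp(0.0, eta) - y_stack * eta))  # negative log-likelihood


# Two hypothetical learners, for illustration only.
def fit_mean(X, y):
    m = y.mean()
    return lambda X_new: np.full(len(X_new), m)


def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=500)
splits = random_est_valid_splits(len(y))
y_stack, P_stack = stack_validation_predictions(X, y, {"mean": fit_mean, "ols": fit_ols}, splits)
```

Stacking all validation rows into one matrix turns the double sum over folds and subjects into a single least-squares (or cross-entropy) objective in $\mathbf{w}$.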

WQS.

WQS was proposed by Carrico et al. in 2015.19 It aims to estimate a one-dimensional disease risk score, called the WQS index, from exposure to mixtures of chemicals. The WQS index is calculated as a weighted sum of individual exposure quantiles. The model is given below:

$$g(\mu) = \beta_0 + \beta_1 \sum_{p=1}^{P} w_p x_p^{q} + \boldsymbol{\alpha}^{T}\mathbf{z},$$

where $g$ is the link function of the generalized linear model,38 and $\mu$ is the mean outcome. The term $\sum_{p=1}^{P} w_p x_p^{q}$ is defined as the WQS index, and the weights $w_p$ ($p=1,\ldots,P$) are estimated using bootstrapping on the training data, which comprise 40% of the total samples by default in the R package “gWQS” (version 3.0.4);39,40 we used the gwqs function without repeated holdout validation. The weights $w_p$ lie between 0 and 1 and sum to 1. The categorical variable $x_{i,p}^{q}$ is the quantile category of the $p$th pollutant exposure for the $i$th subject. The quantile transformation has the advantage of standardizing the exposures, so the weights describe the relative contribution of each chemical to the joint effect on the health outcome. The remaining 60% of the data are used to test the significance of the coefficient $\beta_1$ for the WQS index on the health outcome. The model can also include a set of covariates $\mathbf{z}$, which enter the model linearly, with $\boldsymbol{\alpha}$ denoting a vector of regression coefficients.
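A minimal sketch of the quantile scoring and the weighted sum follows, assuming quartiles ($q=4$) and fixed, already-known weights. It does not reproduce gWQS, which estimates the weights by bootstrapping on the training split; `quantile_scores` and `wqs_index` are hypothetical helpers.

```python
import numpy as np


def quantile_scores(X, q=4):
    """Replace each exposure column with its quantile category 0, ..., q-1."""
    X = np.asarray(X, dtype=float)
    scores = np.empty(X.shape, dtype=int)
    probs = np.linspace(0.0, 1.0, q + 1)[1:-1]          # interior cut probabilities
    for p in range(X.shape[1]):
        cuts = np.quantile(X[:, p], probs)
        scores[:, p] = np.searchsorted(cuts, X[:, p], side="right")
    return scores


def wqs_index(scores, w):
    """Weighted quantile sum: sum_p w_p * x_p^q, with w_p in [0, 1] summing to 1."""
    w = np.asarray(w, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be nonnegative and sum to 1")
    return scores @ w


X = np.column_stack([np.arange(1.0, 9.0), np.arange(8.0, 0.0, -1.0)])
S = quantile_scores(X)                # column 0 -> 0,0,1,1,2,2,3,3
index = wqs_index(S, [0.25, 0.75])
```

Because every exposure is mapped to the same 0 to $q-1$ scale, the weights are directly comparable across chemicals, which is the standardization property noted above.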

WQS offers a straightforward interpretation by creating a summary index that captures the joint effect of multiple pollutants, with the relative importance of each pollutant characterized by the magnitude of its weight. However, the validity of the directional homogeneity assumption, which requires all components of the mixture to share the same direction of association with the outcome, should be carefully considered. In addition, transforming continuous pollutant exposures into categorical ones may lead to a loss of information and may distort both the correlation structure among pollutants and their true association with the outcome. The package allows users to include interaction terms or quadratic terms of the WQS index to characterize nonlinear associations, but these terms are usually treated as covariates rather than pollutant effects of primary interest.

Q-gcomp.

Quantile g-computation (Q-gcomp) was proposed by Keil et al. in 2020.20 It extends the framework of WQS by relaxing the assumption of directional homogeneity and allowing for positive and negative effects of pollutants. The model without covariates is given by:

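$$g(\mu) = \beta_0 + \sum_{p=1}^{P} \beta_p x_p^{q} = \beta_0 + \psi^{+} \sum_{p:\,\beta_p>0} w_p x_p^{q} + \psi^{-} \sum_{p:\,\beta_p<0} w_p x_p^{q},$$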

where $\psi^{+} = \sum_{p:\,\beta_p>0} \beta_p$, $\psi^{-} = \sum_{p:\,\beta_p<0} \beta_p$, and $w_p = \frac{\beta_p}{\psi^{+}} I(\beta_p>0) + \frac{\beta_p}{\psi^{-}} I(\beta_p<0)$. A linear regression model is fitted to obtain the coefficients $\beta_p$ ($p=0,\ldots,P$), which determine the estimate of $\psi = \psi^{+} + \psi^{-}$ for the summary index and the weight $w_p$ for each chemical. The parameter estimation procedures for WQS and Q-gcomp differ in that WQS first estimates the weights on the training data and then estimates the coefficient for the WQS index and its p-value on the validation data, whereas Q-gcomp uses all the data to estimate $\beta_p$ and obtain $\psi$. We implemented Q-gcomp via the R package “qgcomp” (version 2.8.5).41
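The decomposition above can be checked numerically. The sketch assumes an ordinary least squares fit on already-quantized exposures with no covariates (matching the covariate-free model shown) and then forms $\psi^{+}$, $\psi^{-}$, $\psi$, and the weights $w_p$. It is not the qgcomp package, and both function names are hypothetical.

```python
import numpy as np


def fit_linear_qgcomp(Xq, y):
    """OLS of y on an intercept and quantized exposures; returns beta_1..beta_P."""
    design = np.column_stack([np.ones(len(y)), Xq])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]                                   # drop the intercept beta_0


def qgcomp_decompose(beta):
    """Return psi+, psi-, psi = psi+ + psi-, and the weights w_p."""
    beta = np.asarray(beta, dtype=float)
    pos, neg = beta > 0, beta < 0
    psi_pos, psi_neg = beta[pos].sum(), beta[neg].sum()
    w = np.zeros_like(beta)
    if pos.any():
        w[pos] = beta[pos] / psi_pos                  # positive-direction weights sum to 1
    if neg.any():
        w[neg] = beta[neg] / psi_neg                  # negative-direction weights sum to 1
    return psi_pos, psi_neg, psi_pos + psi_neg, w


rng = np.random.default_rng(3)
Xq = rng.integers(0, 4, size=(60, 3)).astype(float)  # quartile scores 0..3
y = 1.0 + Xq @ np.array([0.5, 0.3, -0.2])            # noiseless outcome, for checking
beta = fit_linear_qgcomp(Xq, y)
psi_pos, psi_neg, psi, w = qgcomp_decompose(beta)
```

With coefficients $(0.5, 0.3, -0.2)$ this gives $\psi^{+}=0.8$, $\psi^{-}=-0.2$, $\psi=0.6$, and weights $(0.625, 0.375, 1)$: the weights are normalized within each direction, which is how Q-gcomp relaxes the single-direction assumption of WQS.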

Although both WQS and Q-gcomp provide meaningful summary risk scores for mixtures and rank exposure importance by chemical weights, they do not perform variable selection, which may limit their effectiveness in high-dimensional settings. ERS, on the other hand, has two advantages over WQS and Q-gcomp. First, ERS can be constructed using a wide range of statistical prediction approaches, allowing us to incorporate methods with unique strengths to create candidate ERSs that address questions such as variable selection and interaction detection. These candidate ERSs can then be combined into a weighted ERS, referred to as SL-ERS or $\mathrm{ERS}_{\text{SL}}$ (as described above), to achieve better outcome prediction. Second, because each ERS is constructed from the pollutant measurements rather than their quantiles, it avoids the information loss caused by categorization, which can lead to more accurate prediction and association detection.

Simulation Methods

Simulated data settings.

The simulation studies investigated the associations between chemical mixtures, their interactions, and continuous or binary health outcomes in a cross-sectional study design. We simulated 20 pollutants $x_1,\ldots,x_{20}$, partitioned into three groups of sizes seven, six, and seven to represent three families of chemicals, namely phthalates, PAHs, and metals. They were drawn from a multivariate normal distribution with mean zero and unit marginal variance. The true features are $x_1$ and $x_2$, $x_3$ and $x_4$, and $x_5$, drawn from the three groups, respectively, so that each group includes important toxicants. Let $q=5$ denote the number of true features ($x_1$ to $x_5$). The correlation matrix among the 20 pollutants follows the grouping structure, with within-group and between-group correlations set to 0.6 and 0.1, respectively. Figure S2 shows the heatmap of Pearson correlations among the simulated pollutants. The mean function of the continuous outcome $y$ was generated under four settings: linear main effect model (LM), linear main and interaction effect model (LMI), nonlinear main effect model (NM), and nonlinear main and interaction effect model (NMI). In the LMI and NMI settings, the 10 pairwise interactions of the five true features were also associated with the outcome. For each of the four settings, we considered four scenarios: sample size $n=1{,}000$, $p=20$, and $R^2=0.2$ or $0.1$; and sample size $n=2{,}000$, $p=40$, and $R^2=0.2$ or $0.1$. For binary outcomes, we also generated data from four settings: logit link main effect model (Logit), logit link main and interaction effect model (LogitI), logit link nonlinear main effect model (Nlogit), and logit link nonlinear main and interaction effect model (NlogitI). We adopted the specific forms of the mean function $g$ from Boss et al.14 A full list of simulation settings and parameter specifications can be found in Table 2.
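A sketch of the exposure-generating step, under stated assumptions: columns are ordered by group (7, 6, 7); the true features are placed at columns 0 and 1 (group 1), 7 and 8 (group 2), and 13 (group 3), a layout chosen only for illustration; and the noise variance is calibrated as $\mathrm{Var}(f)(1-R^2)/R^2$ so that the signal explains roughly the target $R^2$. The paper follows Boss et al. for these details, which may differ, and all function names are hypothetical.

```python
import numpy as np


def block_correlation(group_sizes=(7, 6, 7), within=0.6, between=0.1):
    """Correlation matrix with constant within-group and between-group correlations."""
    labels = np.repeat(np.arange(len(group_sizes)), group_sizes)
    same = labels[:, None] == labels[None, :]
    R = np.where(same, within, between)
    np.fill_diagonal(R, 1.0)
    return R


def simulate_exposures(n, R, seed=0):
    """Draw n rows from N(0, R); marginal variances are 1 because diag(R) = 1."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(R.shape[0]), R, size=n)


def add_noise_for_r2(f, r2, seed=1):
    """Add Gaussian noise with variance Var(f) * (1 - r2) / r2 (assumed calibration rule)."""
    rng = np.random.default_rng(seed)
    sigma2 = f.var() * (1.0 - r2) / r2
    return f + rng.normal(scale=np.sqrt(sigma2), size=f.shape)


R = block_correlation()
X = simulate_exposures(1000, R)
true_cols = [0, 1, 7, 8, 13]       # hypothetical layout: x1, x2 | x3, x4 | x5
f = X[:, true_cols].sum(axis=1)    # mean function (1): x1 + ... + x5
y = add_noise_for_r2(f, r2=0.2)
```

With a low target $R^2$ such as 0.2 or 0.1, most of the outcome variance is noise, which is part of what makes selection among correlated exposures difficult.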

Table 2.

Complete list of simulation settings.

Model   | Outcome    | Sample size | No. of pollutants | No. of true effects | $R^2$ | Mean function^a
LM      | Continuous | 1,000       | 20                | 5                   | 0.2   | (1)
LMI     | Continuous | 1,000       | 20                | 5                   | 0.2   | (2)
NM      | Continuous | 1,000       | 20                | 5                   | 0.2   | (3)
NMI     | Continuous | 1,000       | 20                | 5                   | 0.2   | (4)
Logit   | Binary     | 1,000       | 20                | 5                   | 0.2   | (1)
LogitI  | Binary     | 1,000       | 20                | 5                   | 0.2   | (2)
Nlogit  | Binary     | 1,000       | 20                | 5                   | 0.2   | (3)
NlogitI | Binary     | 1,000       | 20                | 5                   | 0.2   | (4)
LM      | Continuous | 2,000       | 40                | 5                   | 0.2   | (1)
LMI     | Continuous | 2,000       | 40                | 5                   | 0.2   | (2)
NM      | Continuous | 2,000       | 40                | 5                   | 0.2   | (3)
NMI     | Continuous | 2,000       | 40                | 5                   | 0.2   | (4)
Logit   | Binary     | 2,000       | 40                | 5                   | 0.2   | (1)
LogitI  | Binary     | 2,000       | 40                | 5                   | 0.2   | (2)
Nlogit  | Binary     | 2,000       | 40                | 5                   | 0.2   | (3)
NlogitI | Binary     | 2,000       | 40                | 5                   | 0.2   | (4)
LM      | Continuous | 1,000       | 20                | 5                   | 0.1   | (1)
LMI     | Continuous | 1,000       | 20                | 5                   | 0.1   | (2)
NM      | Continuous | 1,000       | 20                | 5                   | 0.1   | (3)
NMI     | Continuous | 1,000       | 20                | 5                   | 0.1   | (4)
Logit   | Binary     | 1,000       | 20                | 5                   | 0.1   | (1)
LogitI  | Binary     | 1,000       | 20                | 5                   | 0.1   | (2)
Nlogit  | Binary     | 1,000       | 20                | 5                   | 0.1   | (3)
NlogitI | Binary     | 1,000       | 20                | 5                   | 0.1   | (4)
LM      | Continuous | 2,000       | 40                | 5                   | 0.1   | (1)
LMI     | Continuous | 2,000       | 40                | 5                   | 0.1   | (2)
NM      | Continuous | 2,000       | 40                | 5                   | 0.1   | (3)
NMI     | Continuous | 2,000       | 40                | 5                   | 0.1   | (4)
Logit   | Binary     | 2,000       | 40                | 5                   | 0.1   | (1)
LogitI  | Binary     | 2,000       | 40                | 5                   | 0.1   | (2)
Nlogit  | Binary     | 2,000       | 40                | 5                   | 0.1   | (3)
NlogitI | Binary     | 2,000       | 40                | 5                   | 0.1   | (4)

Note: LM, linear main effects; LMI, linear main effects and interactions; Logit, logit link linear main effects; LogitI, logit link linear main effects and interactions; NM, nonlinear main effects; NMI, nonlinear main effects and interactions; Nlogit, logit link nonlinear main effects; NlogitI, logit link nonlinear main effects and interactions.

^a Mean functions:
(1) $x_1+x_2+x_3+x_4+x_5$.
(2) $x_1+x_2+x_3+x_4+x_5+x_1x_2+x_1x_3+x_1x_4+x_1x_5+x_2x_3+x_2x_4+x_2x_5+x_3x_4+x_3x_5+x_4x_5$.
(3) $x_1 I(x_1>0)+\exp(x_2)+|x_3|+x_4^2+(x_5+1)^2$.
(4) $x_1 I(x_1>0)+\exp(x_2)+|x_3|+x_4^2+(x_5+1)^2+x_1x_2 I(x_1>0)I(x_2<0)+x_1x_3 I(x_1>0)I(x_3>0.5)+x_1x_4 I(x_1>0)I(x_4>0)+x_1x_5 I(x_1>0)I(x_5<0.5)+x_2x_3 I(x_2<0)I(x_3>0.5)+x_2x_4 I(x_2<0)I(x_4>0)+x_2x_5 I(x_2<0)I(x_5<0.5)+x_3x_4 I(x_3>0.5)I(x_4>0)+x_3x_5 I(x_3>0.5)I(x_5<0.5)+x_4x_5 I(x_4>0)I(x_5<0.5)$.
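The four mean functions translate directly into code. This sketch takes the five true features as the columns of `X` and follows the thresholds exactly as printed in the footnote; the function names are hypothetical, and how the binary settings map these functions onto the logit scale is not specified here. In function (4), each feature carries its own indicator, so every interaction term is the product of the two features' indicators.

```python
import numpy as np
from itertools import combinations


def ind(cond):
    """Indicator I(cond) as a float array."""
    return cond.astype(float)


def mean_function(X, setting):
    """Mean functions (1)-(4) of Table 2; X is (n, 5) with columns x1, ..., x5."""
    x = [X[:, k] for k in range(5)]
    pairs = list(combinations(range(5), 2))                 # the 10 pairwise interactions
    if setting == 1:                                        # LM / Logit
        return sum(x)
    if setting == 2:                                        # LMI / LogitI
        return sum(x) + sum(x[j] * x[k] for j, k in pairs)
    main = x[0] * ind(x[0] > 0) + np.exp(x[1]) + np.abs(x[2]) + x[3] ** 2 + (x[4] + 1) ** 2
    if setting == 3:                                        # NM / Nlogit
        return main
    if setting == 4:                                        # NMI / NlogitI
        gate = [ind(x[0] > 0), ind(x[1] < 0), ind(x[2] > 0.5), ind(x[3] > 0), ind(x[4] < 0.5)]
        return main + sum(x[j] * x[k] * gate[j] * gate[k] for j, k in pairs)
    raise ValueError("setting must be 1, 2, 3, or 4")


X = np.ones((1, 5))
values = [mean_function(X, s)[0] for s in (1, 2, 3, 4)]
```

At $x_1=\cdots=x_5=1$, only the indicators for $x_1$, $x_3$, and $x_4$ are on, so function (4) adds just the three interactions among those features to the nonlinear main effects.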

Evaluation criteria.

Under our analytical framework for assessing multipollutant mixtures, we used standard criteria to evaluate the performance of a collection of statistical methods. For each of the 500 datasets, we randomly split the 1,000 samples into training and testing datasets of 500 samples each ($n_{\text{train}} = n_{\text{test}} = 500$). We evaluated feature selection and interaction detection on the training dataset and compared the predictive power of three summary scores on the testing dataset. In each dataset under the continuous outcome setting, to assess feature selection and interaction detection, we considered 20 pollutants and their 190 pairwise interactions ($\binom{20}{2} = 190$), totaling 210 predictors for Lasso, Enet, and G-Lasso (Lasso-MI/Enet-MI/G-Lasso-MI). For comparison, we also fit these three regularization methods with the 20 pollutants as main effects only (Lasso-M/Enet-M/G-Lasso-M). We supplied only the 20 pollutants to BKMR, RF, HigLasso, HierNet, and SNIF, because BKMR and RF do not separate main and interaction effects, whereas HigLasso, HierNet, and SNIF automatically screen for pairwise interactions among the 20 exposures. For binary outcomes, the settings were similar, except that HigLasso and SNIF were not available and BKMR produced unstable simulation results, so we omitted these three methods from the analysis of binary outcomes.
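The 210-column design used by Lasso-MI, Enet-MI, and G-Lasso-MI, together with the 500/500 split, can be built as follows. This is a sketch with a hypothetical helper name, using independent standard normal placeholders rather than the correlated exposures of the simulation.

```python
import numpy as np
from itertools import combinations


def main_plus_pairwise(X):
    """Return [X, all pairwise products x_j * x_k (j < k)] and the list of pairs."""
    pairs = list(combinations(range(X.shape[1]), 2))
    products = np.column_stack([X[:, j] * X[:, k] for j, k in pairs])
    return np.column_stack([X, products]), pairs


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))          # placeholder exposures
design, pairs = main_plus_pairwise(X)    # 20 main effects + 190 interactions = 210 columns
perm = rng.permutation(len(X))
train_idx, test_idx = perm[:500], perm[500:]
```

The explicit product columns are needed only for the penalized regressions; HigLasso, HierNet, and SNIF construct their interaction candidates internally from the 20 exposures.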

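To evaluate the accuracy of selecting important pollutants, we used sensitivity, specificity, and false discovery rate (FDR). These metrics are defined as follows, where $J=500$ is the number of simulated datasets and each dataset has 5 true and 15 null main effects:

$$\text{Sensitivity (Sen)} = \frac{1}{J}\sum_{j=1}^{J} \frac{\text{number of true effects identified}}{\text{number of true effects}},$$
$$\text{Specificity (Spe)} = \frac{1}{J}\sum_{j=1}^{J} \frac{\text{number of null effects not selected}}{\text{number of null effects}}, \quad \text{and}$$
$$\text{FDR} = \frac{1}{J}\sum_{j=1}^{J} \frac{\text{number of null effects among the selected effects}}{\text{number of selected effects}}.$$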
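The three metrics for a single dataset, and their averages over $J$ datasets, can be computed as below. Setting FDR to 0 when nothing is selected is our assumption, since the text does not define that case; the helper names are hypothetical.

```python
import numpy as np


def selection_metrics(selected, truth, n_total):
    """Sensitivity, specificity, and FDR for one dataset.

    selected, truth: sets of predictor indices; n_total: number of candidate predictors.
    FDR is set to 0 when nothing is selected (assumed convention).
    """
    nulls = set(range(n_total)) - truth
    sen = len(selected & truth) / len(truth)
    spe = len(nulls - selected) / len(nulls)          # nulls correctly left out
    fdr = len(selected & nulls) / len(selected) if selected else 0.0
    return sen, spe, fdr


def average_metrics(selections, truth, n_total):
    """Average each metric over J simulated datasets (one selected set per dataset)."""
    return tuple(np.mean([selection_metrics(s, truth, n_total) for s in selections], axis=0))


truth = set(range(5))                                 # 5 true effects among 20 pollutants
print(selection_metrics(set(range(20)), truth, 20))   # (1.0, 0.0, 0.75)
```

Selecting all 20 exposures gives specificity 0 and FDR $15/20 = 0.75$, the pattern reported for G-Lasso-MI in the results.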

To evaluate the accuracy of interaction detection, we calculated the same three metrics for four settings with interactions: LMI, NMI, LogitI, and NlogitI, where there were 10 true and 180 null interaction effects. In the absence of true interaction effects (settings of LM, NM, Logit, and Nlogit), we used the false positive rate (FPR) to assess the proportion of falsely selected interactions out of all 190 interactions. Specifically, FPR was defined as 1 minus Specificity. The higher the Sensitivity and Specificity values and the lower the FDR and FPR values, the better the feature selection and interaction detection.

For the continuous outcome, to evaluate the prediction performance of three main summary scores (ERS/WQS/Q-gcomp) under various model specifications, we used the testing data. For main effect models, we predicted ERS using Enet-M, and we fit and predicted WQS and Q-gcomp models with either pollutants selected by Enet (WQS-M*/Q-gcomp-M*) or all 20 pollutants (WQS-M/Q-gcomp-M). For models considering both main and interactions, we constructed SL-ERS using Enet-MI, BKMR, HierNet, and SNIF, where the weights of SL-ERS were obtained from training data. We also fit and predicted WQS and Q-gcomp models with either pollutants and their interactions selected by Enet-MI (WQS-MI*/Q-gcomp-MI*) or all 210 effects (WQS-MI/Q-gcomp-MI).

We evaluated the predictive performance of different methods in three aspects. First, we calculated the correlation coefficient (Corr) and the sum of squared errors (SSE) between the predicted and observed continuous outcome. Second, to assess the predictive power of summary scores for a binary outcome, we dichotomized the continuous outcome at its 90th percentile, coding values above the 90th percentile as 1, and calculated the AUC, which measures the probability that a summary score ranks a randomly chosen case above a randomly chosen non-case. Third, to evaluate the risk stratification property, we stratified each summary measure on the testing data using two thresholds, the 25th (Q1) and 75th (Q3) percentiles of the summary measure from the training data. We defined test samples below Q1 as the low-risk group and those above Q3 as the high-risk group, and fit a logistic regression of the binary outcome on these subsets of samples to obtain the odds ratio (OR) of having an extreme outcome between the group in the lowest quartile of the summary measure and the group in the highest quartile. For the binary outcome, in addition to AUC, we also calculated the Brier score, defined as the mean squared difference between the predicted probability and the observed binary outcome. In cases where methods (Enet-M, Enet-MI, BKMR, HierNet, and SNIF) failed to select any predictors, we set Corr=0 and OR=1, indicating no predictive power and no risk stratification from the ERS. Figure 2 illustrates the simulation procedure and comparisons among the methods for a continuous outcome; the procedure for analyzing the binary outcome is similar. R code for generating the datasets and evaluating the performance metrics is available at https://github.com/haowei72.
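The evaluation quantities above can be sketched in a few functions. Assumptions: AUC is computed in its rank (Mann-Whitney) form, and the OR compares the above-Q3 group with the below-Q1 group using the 2x2 table, which equals the maximum-likelihood OR of a logistic regression on a single binary group indicator. Empty cells, which would make the OR infinite or undefined, are not handled. All names are hypothetical.

```python
import numpy as np


def corr_and_sse(pred, obs):
    """Pearson correlation and sum of squared errors between predicted and observed values."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    return float(np.corrcoef(pred, obs)[0, 1]), float(np.sum((obs - pred) ** 2))


def extreme_outcome(y, pct=90):
    """Code outcomes above the pct-th percentile as 1, others as 0."""
    return (y > np.percentile(y, pct)).astype(int)


def auc_rank(score, label):
    """AUC: probability a random case scores above a random non-case (ties count 1/2)."""
    score, label = np.asarray(score, float), np.asarray(label, bool)
    pos, neg = score[label], score[~label]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def quartile_group_or(score_test, label_test, score_train):
    """OR of an extreme outcome, above-Q3 vs below-Q1 group; cut points from training data."""
    q1, q3 = np.percentile(score_train, [25, 75])
    low, high = label_test[score_test < q1], label_test[score_test > q3]
    a, b = high.sum(), len(high) - high.sum()   # high group: cases, non-cases
    c, d = low.sum(), len(low) - low.sum()      # low group: cases, non-cases
    return (a * d) / (b * c)


def brier(prob, label):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return float(np.mean((np.asarray(prob, float) - np.asarray(label, float)) ** 2))


print(auc_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))   # 0.75
```

Using training-set quartiles as fixed cut points means the test-set groups need not contain exactly 25% of subjects each, which mirrors how a score would be applied to new data.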

Figure 2.

[Figure 2: Four-step workflow of the simulation study. Step 1: model setting ($n$, $p$, $q$, $R^2$, mean functions) and generation of training and testing data under LM, LMI, NM, and NMI. Step 2: exposure or interaction selection on the training data (Lasso, Enet, G-Lasso, HigLasso, HierNet, SNIF) and exposure selection (BKMR, RF), with selection accuracy in Table S4. Step 3: model fitting and prediction with Enet-MI, BKMR, HierNet, SNIF, and the WQS and Q-gcomp variants. Step 4: evaluation of summary measures by SSE and Corr (continuous outcome) and AUC and OR (binary outcome), summarized in Table 3.]

Schematic diagram of the simulation study. Note: BKMR, Bayesian kernel machine regression; Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; ERS, environmental risk score; G-Lasso-M, group lasso for main effects; G-Lasso-MI, group lasso for main effects and interactions; HierNet, lasso for hierarchical interactions; HigLasso, hierarchical integrative group lasso; Lasso-M, lasso for main effects; Lasso-MI, lasso for main effects and interactions; LM, linear main effects; LMI, linear main effects and interactions; NM, nonlinear main effects; NMI, nonlinear main effects and interactions; Q-gcomp, quantile g-computation; Q-gcomp-M*, Q-gcomp for main effects selected by Enet-M; Q-gcomp-MI*, Q-gcomp for main effects and interactions selected by Enet-MI; Q-gcomp-M, Q-gcomp for main effects; Q-gcomp-MI, Q-gcomp for main effects and interactions; RF, random forest; SL-ERS, Super Learner environmental risk score, which adaptively combines the component ERSs through weighting; SNIF, selection of nonlinear interactions by a forward stepwise algorithm; WQS, weighted quantile sum regression; WQS-M*, WQS for main effects selected by Enet-M; WQS-MI*, WQS for main effects and interactions selected by Enet-MI; WQS-M, WQS for main effects; WQS-MI, WQS for main effects and interactions.

Methods for PROTECT Study

Data description.

We applied the proposed framework to data from the PROTECT study, a large-scale, ongoing prospective birth cohort study initiated in 2010. The goal of the study is to assess the impacts of exposure during pregnancy to mixtures from four chemical classes (metals, phthalates, phenols, and PAHs) on birth outcomes, such as birth weight and preterm birth. Recruitment for the study has been described previously.42,43 Women were eligible to participate if they were between the ages of 18 and 40 years, had their first clinic visit before their 20th week of pregnancy, did not use in vitro fertilization to become pregnant, did not use oral contraceptives within 3 months of becoming pregnant, and had no known preexisting medical or obstetric conditions. Spot urine samples were collected at three visits, at approximately 18, 22, and 26 wk of gestation. The PROTECT study was approved by the research and ethics committees of the University of Michigan School of Public Health, University of Puerto Rico, Northeastern University, and all participating hospitals and clinics. All participants provided full informed consent prior to participation.

Exposures, outcomes, and covariates.

We started with a dataset that included 61 pollutants (19 phthalates, 12 phenols, 8 PAHs, and 22 metals) measured in urine samples collected from 1,747 women at three visits during gestation. We then created reduced datasets including only individuals with complete data on each outcome (birth weight, preterm birth status) and covariates, which resulted in sample sizes of 1,348 for birth weight (kg) and 1,379 for preterm birth (yes/no). In addition to analyzing birth weight as a continuous variable, we dichotomized it at 2.5 kg, following the World Health Organization’s definition of low birth weight,44 and at 4.0 kg for high birth weight (formally termed fetal macrosomia45). Accordingly, each participant had a continuous birth weight measurement along with two binary indicators of whether their birth weight was low (<2.5 kg) or high (>4.0 kg). Among our samples, 93 of 1,348 children had low birth weight (6.90%) and 55 of 1,348 had high birth weight (4.08%). Preterm birth is defined as birth before 37 wk of gestation; 126 of 1,379 children were born preterm (9.14%). For the birth weight analysis, we adjusted for infant sex, maternal education (high school or less, some college, college or above), maternal age at recruitment (years), and gestational age at delivery (weeks). For the preterm birth analysis, we adjusted for the same covariates except gestational age, which is used to define preterm birth. The covariates were selected based on a priori knowledge from the study. Please refer to Table S2 for descriptive statistics of the study participants’ demographic characteristics.
We imputed exposure concentrations (ng/mL) measured below the limit of detection (LOD) with LOD/2 and corrected them for urine specific gravity (SG) using $C_{i,p,v}\,\frac{SG_{\text{med}}-1}{SG_{i,v}-1}$, where $SG_{\text{med}}$ is the median urine SG in this dataset (1.019), $SG_{i,v}$ is the urine SG of individual $i$ at visit $v$, and $C_{i,p,v}$ is the concentration of the $p$th pollutant for individual $i$ at visit $v$. Because the SG-adjusted concentrations were right-skewed, we applied a base-10 logarithmic transformation to them.
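The three preprocessing steps in this paragraph (LOD/2 substitution, SG correction, and the base-10 log) compose as in the sketch below. Treating a value exactly equal to the LOD as detected is our assumption, and the function name is hypothetical.

```python
import numpy as np


def preprocess_concentration(conc, lod, sg, sg_median=1.019):
    """LOD/2 substitution, specific-gravity correction, then log10 transform.

    conc: raw concentrations (ng/mL); lod: limit of detection; sg: specimen urine SG.
    Values strictly below the LOD are replaced by LOD/2 before SG correction.
    """
    conc = np.asarray(conc, dtype=float)
    sg = np.asarray(sg, dtype=float)
    filled = np.where(conc < lod, lod / 2.0, conc)
    sg_adjusted = filled * (sg_median - 1.0) / (sg - 1.0)
    return np.log10(sg_adjusted)


out = preprocess_concentration([0.05, 10.0], lod=0.1, sg=[1.019, 1.038])
```

A sample at the median SG is left unchanged by the correction, while a more concentrated urine sample (here SG 1.038) is scaled down by half before the log transform.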

After evaluating the percentage of samples measured above the LOD for each pollutant at each visit, we eliminated 14 pollutants (4 phthalates, 1 phenol, 1 PAH, and 8 metals) from our analysis because more than 30% of their samples were below the LOD at some visit. Furthermore, we excluded eight additional pollutants (four phthalates and four phenols) that were missing in more than 20% of samples after taking the mean across the three visits. We imputed missing values for all remaining chemicals (4.23%–15.50% missing) via the R package “missForest” (version 1.4)46,47 using single imputation. After these preprocessing steps, our final dataset consisted of 39 chemical exposures (11 phthalates, 7 phenols, 7 PAHs, and 14 metals), 4 covariates (three covariates for preterm birth), and 903 pairwise interactions among all chemicals and covariates ($43 \times 42/2 = 903$). Figure 3 presents a correlation heatmap of the log-transformed geometric means of SG-adjusted concentrations across three prenatal visits for the 39 chemical exposures from urine samples in the PROTECT study. These pollutants are from four chemical classes (metals, PAHs, phthalates, and phenols), forming four blocks in the heatmap. On the left side of the figure, we show everyday products that may contain the pollutants. On the right side of the figure, we show the birth outcomes of interest. The pollutants’ full names, LOD values, numbers of observations below the LOD, and descriptive statistics are presented in Table S3.
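The two exclusion rules and the visit averaging can be expressed as array operations. This sketch assumes values stored as a (subjects × visits × pollutants) array with NaN for missing measurements, and it averages log10 values across visits, which equals the log10 of the geometric mean. The helper names are hypothetical, and the missForest imputation step is not reproduced.

```python
import numpy as np
from math import comb


def keep_by_detection(below_lod, max_frac=0.30):
    """below_lod: bool array (subjects, visits, pollutants), True where a value is < LOD.

    Keep a pollutant only if at every visit at most max_frac of samples are below the LOD.
    """
    frac_below = below_lod.mean(axis=0)            # shape (visits, pollutants)
    return np.all(frac_below <= max_frac, axis=0)


def visit_average_log10(log10_conc):
    """Mean of log10 concentrations over visits (axis 1), skipping NaN visits.

    Averaging log10 values equals taking log10 of the geometric mean. Subjects missing
    at every visit stay NaN (NumPy emits a RuntimeWarning for those rows).
    """
    return np.nanmean(log10_conc, axis=1)          # shape (subjects, pollutants)


def keep_by_missingness(avg, max_frac=0.20):
    """Keep pollutants with at most max_frac of subjects still missing after averaging."""
    return np.isnan(avg).mean(axis=0) <= max_frac


n_main = 39 + 4                 # chemicals plus covariates in the birth weight models
n_pairs = comb(n_main, 2)       # number of pairwise interactions
```

Applying the detection rule per visit, rather than pooled over visits, is stricter: a pollutant poorly detected at a single visit is dropped even if it is well detected overall.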

Figure 3.

[Figure 3: Correlation heatmap of the 39 urinary pollutants (14 metals, 7 PAHs, 11 phthalates, and 7 phenols), ordered by chemical family on both axes; color scale from -0.1 to 1.]

Standard structure of a mixtures analysis: correlation heatmap of log-transformed geometric means of specific gravity-adjusted concentrations across three visits for 39 pollutants from urine samples in the PROTECT study. The chemicals are ordered by four families (metals, PAHs, phthalates, and phenols), forming four blocks in the heatmap. On the left side of the figure, everyday products that may contain the pollutants are shown. On the right side of the figure, the birth outcomes of interest are shown. Please refer to Excel Table S1 for correlation matrix values. The illustration images in this figure were designed by Freepik. Note: PAHs, polycyclic aromatic hydrocarbons.

Statistical analysis.

For birth weight, we created 100 training and testing datasets by randomly splitting the 1,348 samples into 674 samples each for training and testing. For each training dataset, for feature selection, we first fit Enet with 39 main effects (Enet-M) adjusting for four covariates. Next, we fit the 43 main effects (39 chemicals and 4 covariates) along with their 903 pairwise interactions (Enet-MI). For BKMR, we included the 39 chemicals in the nonlinear function and adjusted for the 4 covariates linearly, using a posterior inclusion probability (PIP) cutoff of 0.50. For HierNet and SNIF, we fit the models with the 43 main effects, and both methods screened the main effects and the 903 interaction effects by default. For each testing dataset, we compared three covariate-adjusted main effect models: ERS obtained from Enet ($\mathrm{ERS}_{\text{Enet-M}}$), WQS (WQS-M), and Q-gcomp (Q-gcomp-M). Unlike in our simulations, we did not fit WQS and Q-gcomp with the main effects selected by Enet-M, because Enet failed to select any chemicals in 22 of the 100 training datasets. For main and interaction effect models, we compared five ERSs (Enet-MI, BKMR, HierNet, SNIF, and the combined SL-ERS) using the following metrics. First, we calculated Corr and SSE between the observed birth weight and the ERSs. Second, we used low birth weight (Yes=1/No=0) as the binary outcome and calculated AUC to evaluate the predictive power of the ERSs. Third, we categorized the ERSs from the testing data using Q1 obtained from the training data, defined subjects with ERS below Q1 as the high-risk group, and computed the OR of low birth weight between the high-risk group and the rest of the samples to evaluate the risk discrimination ability of each ERS. For high birth weight, we calculated the OR of high birth weight for subjects with ERS above Q3 vs. the rest of the subjects. We repeated the entire model fitting and validation procedure 100 times and reported the pollutants and interactions that each method selected in at least 30% of the 100 repetitions.
For preterm birth (Yes=1/No=0), the analysis was similar. For main and interaction models, we compared five ERSs (Lasso-MI, Enet-MI, RF, HierNet, and SL-ERS), reported Brier scores, and defined subjects with ERS below Q1 as the low-risk group and those above Q3 as the high-risk group, reflecting that a higher ERS corresponds to a higher probability of preterm birth.
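A sketch of the repeated 674/674 splitting, the 30% selection-frequency rule, and the Q1/Q3 grouping follows. Whether the group below Q1 is the high-risk or the low-risk group depends on the outcome (high risk for low birth weight, low risk for preterm birth), so the grouping function only returns position labels. All names, including the example chemical labels, are hypothetical.

```python
import numpy as np
from collections import Counter


def half_splits(n, n_rep=100, seed=0):
    """n_rep random splits of n subjects into equal-sized training and testing halves."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_rep):
        perm = rng.permutation(n)
        out.append((perm[: n // 2], perm[n // 2:]))
    return out


def stable_selections(selected_per_rep, min_frac=0.30):
    """Terms selected in at least min_frac of the repetitions, mapped to their frequency."""
    n_rep = len(selected_per_rep)
    counts = Counter(term for rep in selected_per_rep for term in set(rep))
    return {term: c / n_rep for term, c in counts.items() if c / n_rep >= min_frac}


def quartile_position(ers_test, ers_train):
    """Label test subjects relative to the training-set Q1 and Q3 of the ERS."""
    q1, q3 = np.percentile(ers_train, [25, 75])
    return np.where(ers_test < q1, "below_Q1", np.where(ers_test > q3, "above_Q3", "between"))


splits = half_splits(1348, n_rep=3)     # 674 training and 674 testing subjects per split
reps = [{"Pb", "Hg"}, {"Pb"}, {"Pb", "MEHP"}, {"Hg"}] + [set()] * 6
print(stable_selections(reps))          # {'Pb': 0.3}
```

Reporting only terms that recur across resamples guards against selections that depend on a single lucky split, at the cost of some sensitivity.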

Results

Simulation Results

Pollutant identification.

For continuous outcomes (Figure 4 and Table S4: main/marginal), G-Lasso-MI had the lowest specificity (0.000) and the highest FDR (0.75) across all models, indicating that it selected all 20 exposures (with 5 true effects, selecting all 20 gives FDR = 15/20 = 0.75). G-Lasso either selects a whole group of correlated predictors or shrinks the whole group to zero. In our simulation settings, each group contained a true predictor, so G-Lasso selected all the exposures. Comparing Lasso-M with Lasso-MI, or Enet-M with Enet-MI, under all data settings, including interactions in the models consistently decreased the sensitivity slightly (e.g., from 0.975 to 0.954 in LM for Lasso), increased the specificity (from 0.643 to 0.775), and decreased the FDR (from 0.497 to 0.391). BKMR showed low specificity in LM and LMI (0.082, 0.052) but high specificity in NM and NMI (0.913, 0.769). Among the three methods designed for interactions, HierNet had the highest sensitivity and the highest FDR, whereas SNIF had the highest specificity and the lowest FDR across all settings.

Figure 4.

[Figure 4: Eight dot plots, four panels per row for LM, NM, LMI, and NMI. Top row: specificity vs. sensitivity; bottom row: FDR vs. sensitivity; both axes range from 0 to 1. Points show main and interaction selection for Lasso-M, Enet-M, G-Lasso-M, Lasso-MI, Enet-MI, G-Lasso-MI, BKMR, RF, HigLasso, HierNet, and SNIF.]

Selection accuracy for main and interaction identification among 11 methods, where the continuous outcome is generated from LM, NM, LMI, and NMI. Means of sensitivity, specificity, and FDR are obtained from 500 data replications with $n_{\text{train}}=500$, $p=20$, $q=5$, and $R^2=0.2$. Please refer to Table S4 in the Supplemental Material for details. Note: FDR, false discovery rate; LM, linear main effects; LMI, linear main effects and interactions; NM, nonlinear main effects; NMI, nonlinear main effects and interactions.

For binary outcomes (Figure 5 and Table S5: main/marginal), similar to the results for continuous outcomes, G-Lasso-M and G-Lasso-MI showed very low specificity and the highest FDR. Comparing Lasso-M with Lasso-MI, or Enet-M with Enet-MI, under all data settings, we saw the same trends as for continuous outcomes: sensitivity decreased, specificity increased, and FDR decreased after including interactions. Excluding G-Lasso, Enet-M had the highest sensitivity in Logit and LogitI (0.978, 0.954), and HierNet had the highest sensitivity in Nlogit and NlogitI (0.780, 0.846). Lasso-MI had the highest specificity and the lowest FDR in all four settings (e.g., 0.803 and 0.367 in Logit).

Figure 5.

[Eight dot plots for panels Logit, Nlogit, LogitI, and NlogitI: the top row plots specificity and the bottom row plots false discovery rate (y-axes, 0.0 to 1.0) against sensitivity (x-axes, 0.0 to 1.0) for Lasso-M, Enet-M, G-Lasso-M, Lasso-MI, Enet-MI, G-Lasso-MI, RF, and HierNet, with main and interaction effects plotted separately.]

Selection accuracy for main and interaction identification among seven methods, where the binary outcome is generated from Logit, Nlogit, LogitI, and NlogitI. Means of sensitivity, specificity, and FDR are obtained from 500 data replications with n_train = 500, p = 20, q = 5, and R² = 0.2. Please refer to Table S5 in the supplementary material for details. Note: FDR, false discovery rate; Logit, logit link linear main effects; LogitI, logit link linear main effects and interactions; Nlogit, logit link nonlinear main effects; NlogitI, logit link nonlinear main effects and interactions.

Interaction detection.

For continuous outcomes (Figure 4 and Table S4: interaction), G-Lasso-MI again had a specificity of zero under LMI and NMI. Among the remaining five methods, SNIF achieved the lowest false positive rate (FPR), 0.000, in both LM and NM, where no true interactions exist, whereas HierNet and Enet-MI had the highest FPR in LM (0.060) and NM (0.105), respectively. In LMI and NMI, Enet-MI had the highest sensitivity (0.661 and 0.310), and SNIF again showed the highest specificity (0.999 and 1.000) and lowest FDR (0.168 and 0.016).
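Interaction metrics are computed over candidate pairs rather than single exposures. As a minimal sketch, assuming the candidate set consists of all pairwise cross-products of p = 20 exposures (ignoring any quadratic terms a method may also screen), there are 20 × 19 / 2 = 190 candidate pairs. In LM and NM, where no true interactions exist, every selected pair is a false positive, so under this assumption an FPR of 0.060 corresponds to roughly 11 falsely selected pairs per replication. The helper name below is hypothetical.

```python
from itertools import combinations


def interaction_fpr(selected_pairs, true_pairs, p):
    """False positive rate over all pairwise interactions among p exposures.

    Pairs are tuples (j, k) with j < k, as produced by itertools.combinations.
    """
    candidates = set(combinations(range(p), 2))   # all C(p, 2) candidate pairs
    nulls = candidates - set(true_pairs)          # pairs with no true interaction
    false_pos = set(selected_pairs) & nulls
    return len(false_pos) / len(nulls)


p = 20
print(len(list(combinations(range(p), 2))))       # 190 candidate pairs
# hypothetical LM-type replication: no true interactions, 12 pairs selected
selected = list(combinations(range(p), 2))[:12]
print(round(interaction_fpr(selected, [], p), 3))  # 0.063
```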

For binary outcomes (Figure 5 and Table S5: interaction), G-Lasso-MI had high FPR and low specificity. Lasso-MI and HierNet had the lowest FPR in Logit and Nlogit (0.057, 0.053). In LogitI and NlogitI, Enet-MI had the highest sensitivity (0.343 and 0.251) and the highest FDR (0.816 and 0.867) among the three methods, whereas HierNet had slightly higher specificity and lower FDR. Overall, Lasso-MI, Enet-MI, and HierNet produced similar results in interaction selection. However, sensitivity for interactions in LogitI and NlogitI was low for all three methods, suggesting that identifying interactions is challenging for binary outcomes.

Outcome prediction and risk stratification.

For continuous outcomes (Table 3: “Continuous outcome and continuous ERS/WQS/Q-gcomp”), in the LM setting, Enet-M showed the highest correlation and smallest SSE (Corr = 0.43, SSE = 36.8), followed by SL-ERS (Corr = 0.42, SSE = 37.2). In LMI, Enet-MI and SL-ERS had comparable Corr (0.39 for both) and SSE (155.4 and 155.7). Because the true LMI model includes interactions, models that accounted for interactions generally fit the data much better than main-effects-only models; for instance, Corr was 0.19 for Enet-M and 0.39 for Enet-MI. This finding emphasizes the importance of including interactions in model fitting when true interactive associations exist. Among the five ERSs that accounted for interactions, BKMR performed the weakest, with the lowest Corr (0.26). In NM, SL-ERS had the highest Corr (0.40, tied with ERS HierNet) and the lowest SSE (69.4). Note that even though the true NM model has only main effects, the ERS models considering interactions outperformed the main-effects-only models. This is not surprising, because interaction terms can partially capture nonlinear associations. Similar results hold in the NMI setting, where SL-ERS fit the data best and models considering interactions outperformed main-effects-only models. In addition, BKMR appeared to fit nonlinear models better than linear ones. When comparing WQS-M* with WQS-M, or Q-gcomp-M* with Q-gcomp-M, the models with variable selection outperformed those without under most data scenarios. Similarly, WQS-MI* showed better or very similar Corr and SSE compared with WQS-MI. However, Q-gcomp-MI showed a substantial reduction in Corr and a large increase in SSE, suggesting that Q-gcomp-MI does not fit the full data well. It is also noteworthy that WQS and Q-gcomp performed similarly to the other methods under the LM setting, but their predictions were worse in the settings with interactions and/or nonlinearity.
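An ERS is commonly built as a weighted sum of exposures (plus cross-products for the -MI variants), with weights taken from coefficients estimated in training data, and is then evaluated in independent test data. The Python sketch below illustrates this under stated assumptions: exposures are independent standard normal, five exposures have a hypothetical effect of 0.25, ordinary least squares stands in for Enet-M, and SSE is taken as the plain sum of squared differences between the test outcome and the score. The paper's exact SSE convention and data-generating design may differ, so the resulting values are illustrative only.

```python
import numpy as np


def ers(X, beta, intercept=0.0):
    """Environmental risk score: weighted sum of exposures using fitted weights."""
    return intercept + X @ beta


def corr_and_sse(y, score):
    """Pearson correlation and sum of squared errors between outcome and score."""
    r = np.corrcoef(y, score)[0, 1]
    sse = float(np.sum((y - score) ** 2))
    return float(r), sse


rng = np.random.default_rng(2025)
n, p = 500, 20
beta_true = np.zeros(p)
beta_true[:5] = 0.25                       # hypothetical: 5 active exposures
X_train, X_test = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y_train = X_train @ beta_true + rng.normal(size=n)
y_test = X_test @ beta_true + rng.normal(size=n)

# OLS stands in for Enet-M here; any fitted coefficient vector can be plugged in
design = np.column_stack([np.ones(n), X_train])
coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
score_test = ers(X_test, coef[1:], intercept=coef[0])
print(corr_and_sse(y_test, score_test))
```

Because the weights are fixed from the training split, the test-set metrics reflect out-of-sample performance rather than in-sample fit.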

Table 3.

Risk prediction performance by different statistical methods when data are generated from LM, NM, LMI, and NMI. Means of Corr, SSE, and AUC, and the median of OR, are obtained from 500 data replications with n_test = 500, p = 20, q = 5, and R² = 0.2.

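Continuous outcome and continuous ERS/WQS/Q-gcomp

| Data | Metric | ERS Enet-M | WQS-M* | WQS-M | Q-gcomp-M* | Q-gcomp-M | ERS Enet-MI | ERS BKMR | ERS HierNet | ERS SNIF | SL-ERS | WQS-MI* | WQS-MI | Q-gcomp-MI* | Q-gcomp-MI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LM | Corr | 0.43 | 0.41 | 0.41 | 0.40 | 0.39 | 0.42 | 0.32 | 0.42 | 0.41 | 0.42 | 0.39 | 0.33 | 0.37 | 0.20 |
| LM | SSE | 36.8 | 37.6 | 37.6 | 37.8 | 38.3 | 37.4 | 40.8 | 37.3 | 37.6 | 37.2 | 38.4 | 40.0 | 39.4 | 66.0 |
| LMI | Corr | 0.19 | 0.19 | 0.20 | 0.18 | 0.16 | 0.39 | 0.26 | 0.36 | 0.28 | 0.39 | 0.34 | 0.34 | 0.32 | 0.17 |
| LMI | SSE | 176.1 | 176.9 | 176.5 | 178.0 | 180.8 | 155.4 | 172.6 | 159.2 | 176.3 | 155.7 | 162.4 | 161.7 | 170.7 | 275.0 |
| NM | Corr | 0.31 | 0.28 | 0.28 | 0.27 | 0.26 | 0.37 | 0.38 | 0.40 | 0.37 | 0.40 | 0.30 | 0.29 | 0.28 | 0.14 |
| NM | SSE | 74.7 | 76.1 | 76.2 | 76.6 | 77.8 | 71.8 | 70.6 | 69.5 | 80.5 | 69.4 | 75.3 | 76.0 | 78.6 | 128.9 |
| NMI | Corr | 0.30 | 0.28 | 0.27 | 0.27 | 0.25 | 0.37 | 0.37 | 0.39 | 0.35 | 0.39 | 0.31 | 0.31 | 0.29 | 0.15 |
| NMI | SSE | 111.7 | 113.6 | 113.9 | 114.3 | 116.2 | 106.4 | 106.4 | 104.4 | 121.9 | 104.3 | 111.4 | 111.5 | 116.1 | 189.5 |

Binary outcome and continuous ERS/WQS/Q-gcomp

| Data | Metric | ERS Enet-M | WQS-M* | WQS-M | Q-gcomp-M* | Q-gcomp-M | ERS Enet-MI | ERS BKMR | ERS HierNet | ERS SNIF | SL-ERS | WQS-MI* | WQS-MI | Q-gcomp-MI* | Q-gcomp-MI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LM | AUC | 0.73 | 0.72 | 0.72 | 0.72 | 0.72 | 0.73 | 0.68 | 0.73 | 0.72 | 0.73 | 0.72 | 0.70 | 0.71 | 0.62 |
| LMI | AUC | 0.65 | 0.65 | 0.65 | 0.64 | 0.63 | 0.73 | 0.67 | 0.72 | 0.68 | 0.73 | 0.72 | 0.71 | 0.71 | 0.64 |
| NM | AUC | 0.70 | 0.68 | 0.68 | 0.68 | 0.67 | 0.72 | 0.73 | 0.74 | 0.73 | 0.74 | 0.69 | 0.68 | 0.68 | 0.60 |
| NMI | AUC | 0.70 | 0.69 | 0.69 | 0.68 | 0.67 | 0.72 | 0.72 | 0.73 | 0.72 | 0.73 | 0.70 | 0.69 | 0.69 | 0.61 |

Binary outcome and categorical ERS/WQS/Q-gcomp

| Data | Metric | ERS Enet-M | WQS-M* | WQS-M | Q-gcomp-M* | Q-gcomp-M | ERS Enet-MI | ERS BKMR | ERS HierNet | ERS SNIF | SL-ERS | WQS-MI* | WQS-MI | Q-gcomp-MI* | Q-gcomp-MI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LM | OR | 11.2 | 10.0 | 9.8 | 9.8 | 9.4 | 10.4 | 8.0 | 10.8 | 10.3 | 10.8 | 9.0 | 6.1 | 8.2 | 3.0 |
| LMI | OR | 2.9 | 2.9 | 3.0 | 2.8 | 2.7 | 6.7 | 5.8 | 5.6 | 4.3 | 6.9 | 6.0 | 5.7 | 5.4 | 3.1 |
| NM | OR | 5.6 | 5.0 | 4.9 | 5.1 | 4.7 | 7.2 | 7.8 | 8.7 | 7.0 | 8.7 | 5.8 | 4.9 | 5.1 | 2.4 |
| NMI | OR | 5.5 | 5.0 | 4.9 | 4.9 | 4.7 | 7.2 | 6.8 | 7.4 | 6.3 | 7.8 | 5.5 | 5.1 | 5.3 | 2.6 |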

Note: AUC, area under the receiver operating characteristic curve; BKMR, Bayesian kernel machine regression; Corr, correlation; Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; ERS, environmental risk score; HierNet, lasso for hierarchical interactions; LM, linear main effects; LMI, LM effects and interactions; NM, nonlinear main effects; NMI, NM and interactions; OR, odds ratio; Q-gcomp, quantile g-computation; Q-gcomp-M, Q-gcomp for main effects; Q-gcomp-M*, Q-gcomp for main effects selected by Enet-M; Q-gcomp-MI, Q-gcomp for main effects and interactions; Q-gcomp-MI*, Q-gcomp for main effects and interactions selected by Enet-MI; SL-ERS, Super Learner used to adaptively combine component ERS through weighting; SNIF, selection of nonlinear interactions by a forward stepwise algorithm; SSE, sum of squared errors; WQS, weighted quantile sum regression; WQS-M, WQS for main effects; WQS-M*, WQS for main effects selected by Enet-M; WQS-MI, WQS for main effects and interactions; WQS-MI*, WQS for main effects and interactions selected by Enet-MI.

For binary outcomes dichotomized from the continuous outcome and scored with the continuous ERS/WQS/Q-gcomp, AUC results were consistent with those for Corr and SSE. In LM, Enet-M, Enet-MI, HierNet, and SL-ERS all achieved the highest AUC (0.73), whereas in the remaining three settings, the five ERS methods considering interactions had higher AUCs than the methods with only main effects. SL-ERS achieved the highest AUC (alone or tied) in all settings. For the categorical ERS/WQS/Q-gcomp, Enet-M performed best in LM, with the highest risk stratification OR (11.2), followed by HierNet and SL-ERS (10.8), and SL-ERS had the highest OR in LMI, NM, and NMI. In summary, SL-ERS was the top- or near-top-performing method across all settings in Table 3, demonstrating the strength of an ensemble algorithm that combines multiple learners.
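The risk stratification OR for a categorical score compares the odds of the outcome in an extreme quartile of the score with the odds in the rest of the sample (for example, Q3 vs. rest or Q1 vs. rest in Table 7). A minimal Python sketch of that 2 × 2 calculation follows; the function name and toy data are hypothetical, the quartile cut points are computed from the sample itself, and the code assumes that no cell of the 2 × 2 table is empty.

```python
import numpy as np


def quartile_or(score, y, upper=True):
    """Odds ratio of the outcome in an extreme score quartile vs. the rest of the sample.

    upper=True compares the top quartile (above Q3) with the rest;
    upper=False compares the bottom quartile (below Q1) with the rest.
    Assumes none of the four 2 x 2 cells is empty.
    """
    score = np.asarray(score, dtype=float)
    y = np.asarray(y, dtype=int)
    if upper:
        flagged = score > np.quantile(score, 0.75)
    else:
        flagged = score < np.quantile(score, 0.25)
    a = np.sum(flagged & (y == 1))    # flagged quartile, cases
    b = np.sum(flagged & (y == 0))    # flagged quartile, non-cases
    c = np.sum(~flagged & (y == 1))   # rest of sample, cases
    d = np.sum(~flagged & (y == 0))   # rest of sample, non-cases
    return float((a * d) / (b * c))


# hypothetical toy data: 12 subjects whose score equals their position
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0]
print(quartile_or(np.arange(12), y))   # 7.0
```

In the simulations, the reported value is the median of this OR over the 500 test sets.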

For binary outcomes (Table 4), in the Logit setting Enet-M had the highest AUC (0.812), lowest Brier score (0.096), and highest OR (33.5), followed by SL-ERS for AUC (0.801), Lasso-MI and Enet-MI for Brier score (0.098), and WQS-M* for OR (28.6). In LogitI, the three models that considered only main effects had higher or similar AUC and higher OR compared with the five ERS methods that incorporated interactions. In Nlogit and NlogitI, HierNet and SL-ERS achieved the highest AUC (0.753 in Nlogit and 0.780 in NlogitI) and high ORs (11.2 to 11.3 in Nlogit and 13.9 to 14.1 in NlogitI). For WQS and Q-gcomp, the models with variable selection outperformed the full models in every setting, with higher AUC, lower or similar Brier scores, and higher risk stratification ORs. Variable selection can therefore improve prediction accuracy for these two methods, markedly so for the interaction models (e.g., AUC of 0.763 for Q-gcomp-MI* vs. 0.594 for Q-gcomp-MI in Logit).
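AUC measures how well a score ranks cases above non-cases, whereas the Brier score is the mean squared error of predicted probabilities. The Python sketch below computes both. It assumes the ERS has already been converted to predicted probabilities (for example, by a logistic model fitted on the score), which is an assumption about the pipeline rather than a description of the paper's code, and it computes AUC through its pairwise (Mann–Whitney) interpretation. Function names and toy values are hypothetical.

```python
import numpy as np


def auc(score, y):
    """AUC: probability that a random case outscores a random non-case (ties count 1/2)."""
    score, y = np.asarray(score, dtype=float), np.asarray(y, dtype=int)
    cases, controls = score[y == 1], score[y == 0]
    diff = cases[:, None] - controls[None, :]        # every case-control pair
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def brier(prob, y):
    """Brier score: mean squared difference between predicted probability and outcome."""
    prob, y = np.asarray(prob, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((prob - y) ** 2))


prob = [0.1, 0.4, 0.35, 0.8]    # hypothetical predicted probabilities
y = [0, 0, 1, 1]
print(auc(prob, y))              # 0.75
print(round(brier(prob, y), 3))  # 0.158
```

As a reference point, a constant prediction equal to the outcome prevalence π has a Brier score of π(1 − π), about 0.118 for the 13.6% mean prevalence in the Logit setting, so values below it indicate that the score adds information beyond prevalence alone. The pairwise AUC form uses memory proportional to (number of cases) × (number of non-cases), which is modest for test sets of a few hundred subjects.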

Table 4.

Risk prediction performance by different statistical methods when data are generated from Logit, LogitI, Nlogit, and NlogitI. Means of AUC and Brier score, and the median of OR, are obtained from 500 data replications with n_test = 500, p = 20, q = 5, and R² = 0.2.

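Binary outcome and continuous ERS/WQS/Q-gcomp

| Data | Metric | ERS Enet-M | WQS-M* | WQS-M | Q-gcomp-M* | Q-gcomp-M | ERS Lasso-MI | ERS Enet-MI | ERS RF | ERS HierNet | SL-ERS | WQS-MI* | WQS-MI | Q-gcomp-MI* | Q-gcomp-MI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Logit | AUC | 0.812 | 0.799 | 0.798 | 0.795 | 0.787 | 0.800 | 0.800 | 0.778 | 0.796 | 0.801 | 0.770 | 0.740 | 0.763 | 0.594 |
| Logit | Brier | 0.096 | 0.099 | 0.099 | 0.100 | 0.103 | 0.098 | 0.098 | 0.101 | 0.100 | 0.099 | 0.102 | 0.104 | 0.109 | 0.273 |
| LogitI | AUC | 0.771 | 0.762 | 0.761 | 0.758 | 0.749 | 0.756 | 0.757 | 0.735 | 0.761 | 0.759 | 0.732 | 0.721 | 0.723 | 0.589 |
| LogitI | Brier | 0.094 | 0.096 | 0.097 | 0.097 | 0.099 | 0.094 | 0.094 | 0.098 | 0.095 | 0.094 | 0.097 | 0.098 | 0.104 | 0.271 |
| Nlogit | AUC | 0.751 | 0.733 | 0.731 | 0.727 | 0.714 | 0.750 | 0.748 | 0.732 | 0.753 | 0.753 | 0.712 | 0.698 | 0.702 | 0.561 |
| Nlogit | Brier | 0.108 | 0.112 | 0.112 | 0.114 | 0.116 | 0.109 | 0.109 | 0.110 | 0.109 | 0.108 | 0.114 | 0.114 | 0.121 | 0.297 |
| NlogitI | AUC | 0.778 | 0.760 | 0.759 | 0.755 | 0.743 | 0.779 | 0.776 | 0.762 | 0.780 | 0.780 | 0.742 | 0.727 | 0.729 | 0.586 |
| NlogitI | Brier | 0.085 | 0.088 | 0.088 | 0.089 | 0.092 | 0.084 | 0.085 | 0.086 | 0.085 | 0.085 | 0.089 | 0.089 | 0.097 | 0.254 |

Binary outcome and categorical ERS/WQS/Q-gcomp

| Data | Metric | ERS Enet-M | WQS-M* | WQS-M | Q-gcomp-M* | Q-gcomp-M | ERS Lasso-MI | ERS Enet-MI | ERS RF | ERS HierNet | SL-ERS | WQS-MI* | WQS-MI | Q-gcomp-MI* | Q-gcomp-MI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Logit | OR | 33.5 | 28.6 | 27.2 | 27.0 | 23.4 | 23.9 | 24.1 | 15.4 | 25.5 | 25.6 | 13.6 | 8.7 | 14.2 | 2.1 |
| LogitI | OR | 10.5 | 10.2 | 9.8 | 9.8 | 9.4 | 9.0 | 9.2 | 4.8 | 9.5 | 9.1 | 7.1 | 6.6 | 6.6 | 2.0 |
| Nlogit | OR | 11.0 | 9.6 | 9.0 | 9.3 | 8.1 | 10.7 | 10.5 | 6.7 | 11.3 | 11.2 | 7.3 | 6.0 | 6.6 | 1.6 |
| NlogitI | OR | 13.6 | 11.9 | 11.4 | 11.7 | 10.6 | 13.7 | 12.9 | 8.7 | 13.9 | 14.1 | 8.7 | 7.9 | 8.7 | 2.0 |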

Note: Mean prevalence of the outcome over 500 replicates is 13.6%, 12.9%, 14.5%, and 11.2% for Logit, LogitI, Nlogit, and NlogitI, respectively. AUC, area under the receiver operating characteristic curve; Brier, Brier score; Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; ERS, environmental risk score; HierNet, lasso for hierarchical interactions; Lasso-MI, lasso for main effects and interactions; Logit, logit link linear main effects; LogitI, logit link linear main effects and interactions; Nlogit, logit link nonlinear main effects; NlogitI, logit link nonlinear main effects and interactions; OR, odds ratio; Q-gcomp, quantile g-computation; Q-gcomp-M*, Q-gcomp for main effects selected by Enet-M; Q-gcomp-M, Q-gcomp for main effects; Q-gcomp-MI*, Q-gcomp for main effects and interactions selected by Enet-MI; Q-gcomp-MI, Q-gcomp for main effects and interactions; RF, random forest; SL-ERS, Super Learner used to adaptively combine component ERS through weighting; SNIF, selection of nonlinear interactions by a forward stepwise algorithm; WQS, weighted quantile sum regression; WQS-M, WQS for main effects; WQS-M*, WQS for main effects selected by Enet-M; WQS-MI, WQS for main effects and interactions; WQS-MI*, WQS for main effects and interactions selected by Enet-MI.

Summary.

Based on the simulation results presented in Tables 3 and 4 and Tables S4–S17, we summarize our method recommendations in Table 5 for different sample sizes, numbers of pollutants, and small or medium signal strengths, separately for continuous and binary outcomes. For continuous outcomes, the results consistently suggest that HierNet, SNIF, and Enet-MI provide the most stable selection of pollutants and their interactions. For pollutant selection, HierNet almost always shows the highest sensitivity, whereas SNIF exhibits the lowest sensitivity but the highest specificity and lowest FDR; that is, although SNIF may miss important pollutants, the pollutants it does select tend to be truly associated with the outcome. Depending on the specific research question, and on whether sensitivity or FDR control is the priority, we recommend that researchers apply both HierNet and SNIF in real data analyses and compare the results. For interaction selection, Enet-MI and SNIF seem to perform better than the other methods, but in general sensitivity is very low and FDR is high, which points to a clear need for new methods for detecting interactions. For prediction and as a summary score, SL-ERS and HierNet perform better than the rest of the methods. SL-ERS has the advantage of ensemble learning from the predictions of multiple methods, and HierNet, although a linear method, can fit nonlinear data by selecting interactions and quadratic terms.

Table 5.

Recommendation table of the methods under various data settings based on our simulation study.

| Outcome and task | Medium sample size (n=1,000/p=20), medium signal | Medium sample size, small signal | Large sample size (n=2,000/p=40), medium signal | Large sample size, small signal |
|---|---|---|---|---|
| Continuous outcome: pollutant selection | HierNet, SNIF | HierNet, SNIF | HierNet, SNIF | HierNet, SNIF |
| Continuous outcome: interaction detection | Enet-MI, SNIF | Enet-MI, SNIF | Enet-MI, SNIF | Enet-MI, SNIF |
| Continuous outcome: prediction | SL-ERS, HierNet | SL-ERS, HierNet | SL-ERS, HierNet | SL-ERS, Enet-M |
| Continuous outcome: supporting tables | S4 and 3 | S6 and S7 | S10 and S11 | S14 and S15 |
| Binary outcome: pollutant selection | Enet-M, Lasso-MI, HierNet | Enet-M, Lasso-MI | Lasso-MI, HierNet | Enet-M, Lasso-MI, HierNet |
| Binary outcome: interaction detection | Enet-MI, HierNet | Enet-MI, HierNet | Lasso-MI, Enet-MI | Enet-MI, HierNet |
| Binary outcome: prediction | Enet-M, SL-ERS | Enet-M | Enet-M, SL-ERS | Enet-M, SL-ERS |
| Binary outcome: supporting tables | S5 and 4 | S8 and S9 | S12 and S13 | S16 and S17 |

Note: Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; ERS, environmental risk score; HierNet, lasso for hierarchical interactions; Lasso-MI, lasso for main effects and interactions; SL-ERS, Super Learner used to adaptively combine component ERS through weighting; SNIF, selection of nonlinear interactions by a forward stepwise algorithm.

For binary outcomes, only a limited number of methods are available. For pollutant selection, Enet-M, Lasso-MI, and HierNet exhibit satisfactory sensitivity, but their FDRs are relatively high. For interaction detection, G-Lasso-MI has low specificity, leaving Enet-MI, Lasso-MI, and HierNet as the remaining options. Unfortunately, these methods suffer from low sensitivity and high FDR, making the selection of interactions for binary outcomes quite challenging. For prediction, the ERS constructed from Enet-M outperforms many methods under various settings, suggesting that a parsimonious model might achieve the same or better prediction accuracy than larger models for binary outcomes. For WQS and Q-gcomp, the models with variable selection provide higher AUC, lower or similar Brier scores, and higher risk stratification ORs than the models without variable selection, regardless of setting.
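WQS regression summarizes a mixture as a weighted sum of quantile-scored exposures, with nonnegative weights that sum to one; the starred variants (e.g., WQS-M*) apply it only to the exposures retained by Enet-M. The Python sketch below shows only the index construction under those assumptions. The weights are supplied by hand (hypothetical) rather than estimated from data as in WQS regression, quartile scoring is assumed, and the function names are illustrative.

```python
import numpy as np


def quantile_scores(X, n_quantiles=4):
    """Score each exposure column 0..n_quantiles-1 by its sample quantile category."""
    X = np.asarray(X, dtype=float)
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)[1:-1]       # e.g., 0.25, 0.5, 0.75
    cuts = np.quantile(X, probs, axis=0)                        # shape (n_quantiles - 1, p)
    # count how many interior cut points each value exceeds
    return (X[None, :, :] > cuts[:, None, :]).sum(axis=0)


def wqs_index(X, weights, selected=None):
    """Weighted quantile sum index, optionally over a pre-selected subset of exposures."""
    q = quantile_scores(X)
    if selected is not None:              # WQS-M*-style: keep only exposures chosen upstream
        q = q[:, selected]
    w = np.asarray(weights, dtype=float)
    if w.size != q.shape[1] or np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be nonnegative, sum to 1, and match the exposures")
    return q @ w


# hypothetical exposures: two columns, eight subjects
X = np.column_stack([np.arange(1, 9), np.arange(8, 0, -1)]).astype(float)
print(quantile_scores(X)[:, 0])    # [0 0 1 1 2 2 3 3]
print(wqs_index(X, [0.5, 0.5]))    # every subject scores 1.5 here
```

Restricting the index to pre-selected columns is what distinguishes the starred models in this sketch; the quantile scoring itself is unchanged.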

Computing time.

To compare computing time, Table S18 lists the mean computing time in seconds for each method under various data settings with small signals. Specifically, we considered settings with n_train = 500 and p = 20 or n_train = 1,000 and p = 40, with R² = 0.1, for continuous and binary outcomes. We ran each setting on 100 data replications and calculated the mean time per fit. Computing time differed substantially among methods: Q-gcomp was the fastest, requiring as little as 0.03 s, followed by RF, Enet, Lasso, SNIF, and WQS, all of which had negligible computing time, whereas BKMR (2,000 MCMC iterations) and HigLasso were the slowest, requiring 2,000–50,000 s. Computing time varied only slightly across true data settings for the same sample size and number of pollutants. However, it increased substantially when the sample size increased from 500 to 1,000 and the number of pollutants increased from 20 to 40.

PROTECT Data Analysis Results

Pollutant identification and interaction detection.

Table 6 reports, for each method, the variables selected in at least 30% of the 100 fits on randomly sampled training data. For birth weight, Enet-M selects two metals, barium (Ba) and arsenic (As), and one phthalate, mono carboxyisooctyl phthalate (MCOP). Figure S1 in the Supplementary Material shows the distributions of the 100 coefficient estimates for these three chemicals, indicating a positive association with birth weight for Ba and negative associations for As and MCOP when they are selected. Note that Enet-M selects only among the 39 chemicals, because the covariates are forced into the model without penalization. Enet-MI selects only one main effect, gestational age, and tends to select interactions over main effects compared with Enet-M. Because BKMR does not select any chemical in more than 30 of the 100 fits, we report its five most frequently selected chemicals: Ba, manganese (Mn), As, cobalt (Co), and tin (Sn), all of which are metals. MCMC convergence is assessed with the Gelman-Rubin diagnostic statistic; see Figure S3 in the Supplementary Material for details. HierNet selects 12 main effects for chemicals (seven of which are metals) and seven interactions, six of which involve gestational age (five with a chemical and one with maternal education). SNIF selects only main effects of metals and covariates; because SNIF had the lowest FDR in the simulations, it tends to be conservative, and the exposures or covariates it selects are usually true predictors. Comparing selections across methods, metals (Ba, As, Co, and Mn), phthalates (monobenzyl phthalate [MBZP] and MCOP), and the phenol bisphenol A (BPA) are frequently selected as main effects or in interactions.
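The selection percentages in Table 6 summarize how often each term is chosen across fits on random training subsets, keeping terms chosen in at least 30% of fits. The Python sketch below shows that bookkeeping; the function name is hypothetical, and the toy selections (10 fits rather than 100) are invented for illustration and do not reproduce Table 6.

```python
from collections import Counter


def selection_frequencies(selections, threshold=0.30):
    """Terms selected in at least `threshold` of the fits, sorted by frequency.

    selections: one iterable of selected term names per random training split.
    """
    n_fits = len(selections)
    # set(fit) guards against a term being listed twice within one fit
    counts = Counter(term for fit in selections for term in set(fit))
    kept = {term: c / n_fits for term, c in counts.items() if c / n_fits >= threshold}
    return sorted(kept.items(), key=lambda kv: (-kv[1], kv[0]))


# hypothetical toy run with 10 fits instead of 100
fits = [["Ba", "As"], ["Ba"], ["MCOP", "Ba"], ["As"], [], ["Ba", "MCOP"],
        ["As", "MCOP"], ["Co"], ["Ba"], []]
print(selection_frequencies(fits))   # [('Ba', 0.5), ('As', 0.3), ('MCOP', 0.3)]
```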

Table 6.

Main and interaction effects selected in at least 30% of the 100 randomly sampled training datasets in the PROTECT study.

| Outcome | Method | Selected term (selection percentage) |
|---|---|---|
| Birth weight | Enet-M | Ba (43%), MCOP (33%), As (31%) |
| Birth weight | Enet-MI | Ga (100%), Age × Ga (78%), Age × Sex (70%), Sex × Ga (62%), BPA × Co (54%), Hg × Mn (44%), Mo × Ga (43%), As × MCOP (40%), Co × Mn (38%), Cd × MEHP (36%), As × Cd (33%), Ni × Sex (31%) |
| Birth weight | BKMR^a | Ba (20%), Mn (18%), As (17%), Co (15%), Sn (15%) |
| Birth weight | HierNet | Ga (100%), Sex (87%), Age (86%), Cu (67%), Ba (65%), Sn (59%), Tl × Ga (50%), Mn × Ga (49%), Ni (48%), MBZP × Age (47%), MCOP (45%), As (44%), MIBP × Ga (43%), Co (42%), MBZP (42%), Edu × Ga (38%), Cd (36%), BPA (34%), 1-OH-NAP (33%), MIBP (33%), Zn × Ga (32%), Co × Ga (31%) |
| Birth weight | SNIF | Co (100%), Ga (99%), Sex (78%), Ba (48%), Age (45%), Mn (41%) |
| Preterm birth | Enet-M | BP-3 (67%), Mn (65%), Cd (61%), MBP (58%), Mo (54%), P-PB (52%), Zn (46%), MCNP (44%), 2-OH-NAP (41%), MEHP (39%), 1-OH-PYR (35%) |
| Preterm birth | Lasso-MI | Cd × MBP (40%), BP-3 × Edu (34%) |
| Preterm birth | Enet-MI | Cd × MBP (44%), BP-3 × Edu (42%), Hg × Mn (35%), Mo × Zn (33%), Mo × Edu (33%) |
| Preterm birth | RF | MEOHP (100%), MEHHP (100%), MECPP (99%), 2-OH-FLU (85%), 1-OH-PHE (77%), MBP (76%), 2-3-OH-PHE (61%), 1-OH-PYR (55%), MCNP (50%), MEHP (50%), MIBP (47%), MCOP (43%), 4-OH-PHE (39%), MBZP (33%), MCPP (31%) |
| Preterm birth | HierNet | Mn (52%), Ba (33%), BP-3 (31%) |

Note: The covariates accounted for in the birth weight models are infant sex (Sex), gestational age at delivery in weeks (Ga), maternal education (Edu; high school or less, some college, college or above), and maternal age at recruitment in years (Age); the covariates accounted for in the preterm birth models are the same variables except gestational age. The covariates are adjusted for and not penalized in the Enet-M and BKMR models, whereas covariates can be selected in Enet-MI, Lasso-MI, HierNet, SNIF, and RF. —, no data; 1-OH-NAP, 1-hydroxynaphthalene; 1-OH-PHE, 1-hydroxyphenanthrene; 1-OH-PYR, 1-hydroxypyrene; 2-OH-FLU, 2-hydroxyfluorene; 2-OH-NAP, 2-hydroxynaphthalene; 2-3-OH-PHE, 2,3-hydroxyphenanthrene; 4-OH-PHE, 4-hydroxyphenanthrene; As, arsenic; Ba, barium; BKMR, Bayesian kernel machine regression; BP-3, benzophenone-3; BPA, bisphenol A; Cd, cadmium; Co, cobalt; Cu, copper; Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; Hg, mercury; HierNet, lasso for hierarchical interactions; Lasso-MI, lasso for main effects and interactions; MBP, mono-n-butyl phthalate; MBZP, monobenzyl phthalate; MCOP, mono carboxyisooctyl phthalate; MCNP, mono carboxyisononyl phthalate; MCPP, mono-3-carboxypropyl phthalate; MECPP, mono-2-ethyl-5-carboxypentyl phthalate; MEHP, mono-2-ethylhexyl phthalate; MEHHP, mono-2-ethyl-5-hydroxyhexyl phthalate; MEOHP, mono-2-ethyl-5-oxohexyl phthalate; MIBP, mono-isobutyl phthalate; Mn, manganese; Mo, molybdenum; Ni, nickel; P-PB, propyl paraben; RF, random forest; Sn, tin; SNIF, selection of nonlinear interactions by a forward stepwise algorithm; Tl, thallium; Zn, zinc.

^a Top five compounds that are most frequently selected.

For preterm birth, Lasso-MI and Enet-MI tended to select interactions over main effects, whereas HierNet selected only main effects. RF selected 15 pollutants, 10 phthalates and 5 PAHs, with mono-2-ethyl-5-oxohexyl phthalate (MEOHP) and mono-2-ethyl-5-hydroxyhexyl phthalate (MEHHP) being the most frequently selected. The metals Mn and cadmium (Cd), or their interactions with other chemicals, were selected across methods. The only covariate selected was maternal education. Note that for both birth weight and preterm birth, HierNet also screens quadratic terms, but we do not summarize them here because quadratic term selection is not a focus of this paper.

Outcome prediction and risk stratification.

Tables 7 and 8 compare the predictive performance of the ERS, WQS, and Q-gcomp summary scores for the different outcomes. For birth weight, the mean SL-ERS weights across 100 random splits for Enet-MI, BKMR, HierNet, and SNIF are 56.3%, 1.0%, 36.0%, and 6.7%, respectively, indicating that Enet-MI and HierNet contribute most to the overall prediction. For continuous birth weight, Enet-M has the highest Corr (0.544) and lowest SSE (0.198) of all methods, and among the main-and-interaction models SL-ERS performs comparably to the best individual ERS (Corr 0.535 vs. 0.537 for ERS BKMR; SSE 0.201 for SL-ERS, ERS Enet-MI, and ERS BKMR). For the low birth weight binary outcome, SL-ERS achieves the highest AUC (0.839), and HierNet achieves the highest OR for low birth weight (12.95) when comparing the lowest quartile of the ERS with the rest of the sample. For the high birth weight binary outcome, Enet-M achieves the highest AUC of all methods (0.664), suggesting that main-effect models fit the data better for this outcome. WQS-M yields the highest OR for high birth weight (2.64) when comparing the highest quartile of predicted values with the rest of the sample.

Table 7.

Comparison of risk prediction performance by different methods for the birth weight outcome in PROTECT analysis.

| Outcome (score) | Metric | ERS Enet-M | WQS-M | Q-gcomp-M | ERS Enet-MI | ERS BKMR | ERS HierNet | ERS SNIF | SL-ERS |
|---|---|---|---|---|---|---|---|---|---|
| Continuous birth weight (continuous ERS) | Corr | 0.544 | 0.541 | 0.512 | 0.535 | 0.537 | 0.523 | 0.481 | 0.535 |
| Continuous birth weight (continuous ERS) | SSE | 0.198 | 0.199 | 0.209 | 0.201 | 0.201 | 0.204 | 0.307 | 0.201 |
| Low birth weight (continuous ERS) | AUC | 0.835 | 0.836 | 0.827 | 0.838 | 0.835 | 0.838 | 0.821 | 0.839 |
| Low birth weight (categorical ERS, Q1 vs. rest) | OR | 12.00 | 12.14 | 10.04 | 12.66 | 12.34 | 12.95 | 12.46 | 12.15 |
| High birth weight (continuous ERS) | AUC | 0.664 | 0.661 | 0.630 | 0.656 | 0.661 | 0.656 | 0.629 | 0.657 |
| High birth weight (categorical ERS, Q3 vs. rest) | OR | 2.52 | 2.64 | 2.00 | 2.33 | 2.54 | 2.50 | 2.07 | 2.45 |

Note: The covariates include infant sex, gestational age at delivery (weeks), maternal education (high school or less, some college, college or above) and maternal age at recruitment (years). The covariates are adjusted and not penalized in models of Enet-M, WQS-M, Q-gcomp-M, and BKMR, whereas covariates can be selected in Enet-MI, HierNet, and SNIF. AUC, area under the receiver operating characteristic curve; BKMR, Bayesian kernel machine regression; Corr, correlation; Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; ERS, environmental risk score; HierNet, lasso for hierarchical interactions; Q1, 25th percentile; Q3, 75th percentile; Q-gcomp, quantile g-computation; Q-gcomp-M, Q-gcomp for main effects; SSE, sum of squared error; OR, odds ratio; SL-ERS, Super Learner used to adaptively combine component ERS through weighting; SNIF, selection of nonlinear interactions by a forward stepwise algorithm; WQS, weighted quantile sum regression; WQS-M, WQS for main effects.

Table 8.

Comparison of risk prediction performance by different methods for the preterm birth outcome in PROTECT analysis

| Outcome (score) | Metric | ERS Enet-M | WQS-M | Q-gcomp-M | ERS Lasso-MI | ERS Enet-MI | ERS RF | ERS HierNet | SL-ERS |
|---|---|---|---|---|---|---|---|---|---|
| Preterm birth (continuous ERS) | AUC | 0.597 | 0.570 | 0.594 | 0.547 | 0.544 | 0.549 | 0.527 | 0.553 |
| Preterm birth (continuous ERS) | Brier | 0.083 | 0.083 | 0.088 | 0.083 | 0.083 | 0.086 | 0.083 | 0.083 |
| Preterm birth (ERS, high vs. low risk) | OR | 2.31 | 1.91 | 2.31 | 1.56 | 1.49 | 2.00 | 1.23 | 1.73 |

Note: The covariates include infant sex, maternal education (high school or less, some college, college or above) and maternal age at recruitment (years). The covariates are adjusted and not penalized in models of Enet-M, WQS-M, and Q-gcomp-M, whereas covariates can be selected in Lasso-MI, Enet-MI, and HierNet. AUC, area under the receiver operating characteristic curve; Brier, Brier score; Enet-M, elastic net for main effects; Enet-MI, elastic net for main effects and interactions; ERS, environmental risk score; HierNet, lasso for hierarchical interactions; Lasso-MI, lasso for main effects and interactions; OR, odds ratio; Q-gcomp, quantile g-computation; Q-gcomp-M, Q-gcomp for main effects; RF, random forest; SL-ERS, Super Learner used to adaptively combine component ERS through weighting; WQS, weighted quantile sum regression; WQS-M, WQS for main effects.

For the binary preterm birth outcome, the mean SL-ERS weights across 100 random splits for Lasso, Enet, RF, and HierNet were 20.8%, 11.1%, 10.8%, and 57.3%, respectively, indicating that HierNet provides better overall predictions than the other three approaches. The main-effect models gave higher AUCs than the four individual ERSs accounting for interactions, with ERS Enet-M having the highest AUC (0.597). Enet-M, WQS-M, Lasso-MI, Enet-MI, HierNet, and SL-ERS all achieved the smallest Brier score (0.083). Enet-M and Q-gcomp-M had the highest OR for preterm birth (2.31) when comparing the highest and lowest quartiles of the summary measures. Among the main-and-interaction models, SL-ERS had the best AUC (0.553) and shared the lowest Brier score, and RF had the highest OR (2.00).

Discussion

This paper presents an analytical framework for studying the association between exposure to chemical mixtures and health outcomes. We evaluated several statistical methods for three research questions in mixtures analysis through simulation studies ranging from simple linear models to complex nonlinear models. Although the methods evaluated in this paper are not exhaustive, they represent a diverse set of approaches with distinct strengths that can be used to answer specific research questions. To enhance prediction accuracy across ERSs from different learners, we propose SL-ERS, a method inspired by SL in which we iteratively solve for a weight for each candidate ERS and combine the candidate ERSs as a weighted sum. We have also developed an R package, “CompMix: A comprehensive toolkit for environmental mixtures analysis,” for practitioners to analyze their data and compare results across a variety of methods.
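The weighted combination behind SL-ERS can be sketched as follows. Following the usual convex Super Learner metalearner, this Python sketch assumes nonnegative weights that sum to one (consistent with the PROTECT weights reported above, which sum to 100%) and estimates them by projected gradient descent on the squared-error loss, using a standard sort-based projection onto the probability simplex. This is one way to solve for such weights iteratively, not necessarily the algorithm implemented in CompMix; in practice the columns of Z should be cross-validated predictions from each candidate ERS, and a log-likelihood loss may be preferable for binary outcomes. All names and data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]   # last index meeting the condition
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def sl_weights(Z, y, n_iter=5000):
    """Convex weights minimizing the mean squared error of Z @ w.

    Z: (n, K) matrix whose columns are candidate ERS predictions; y: outcome.
    """
    Z, y = np.asarray(Z, dtype=float), np.asarray(y, dtype=float)
    n, K = Z.shape
    w = np.full(K, 1.0 / K)                         # start from equal weights
    step = n / (2.0 * np.linalg.norm(Z, 2) ** 2)    # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * Z.T @ (Z @ w - y) / n
        w = project_to_simplex(w - step * grad)
    return w


# hypothetical example: three candidate ERSs, outcome built from the first two
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3))
y = 0.7 * Z[:, 0] + 0.3 * Z[:, 1]
w = sl_weights(Z, y)
sl_ers = Z @ w                                      # the combined SL-ERS-style score
print(np.round(w, 3))                               # close to [0.7, 0.3, 0.0]
```

Constraining the weights to the simplex keeps the combined score interpretable as a weighted average of the candidate ERSs and lets uninformative candidates receive zero weight.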

Lessons Learned from Simulation Studies

Our simulation studies for continuous outcomes demonstrate that for pollutant selection, HierNet almost always shows the highest sensitivity; for interaction detection, Enet-MI and SNIF seem to perform better than the other methods; and for prediction, SL-ERS and the ERS constructed from HierNet outperform other methods across settings, highlighting the strength of SL-ERS as an ensemble algorithm that combines multiple learners. For pollutant and interaction selection with a binary outcome, all the investigated methods exhibited either high sensitivity with high FDR or low sensitivity. For prediction, Enet-M outperforms many methods under various settings, suggesting that a parsimonious model might achieve the same or better prediction accuracy than larger models. Furthermore, regardless of whether the true data are generated with interactions, fitting models that account for nonlinearity (such as BKMR) or include interactions generally yields better results than fitting models with only main effects. Thus, we recommend considering models that accommodate interactions and nonlinearity in addition to linear models. As the sample size and the number of pollutants increase, methods such as Lasso, Enet, HierNet, RF, Q-gcomp, and WQS tend to remain relatively inexpensive computationally, primarily because their current implementations do not flexibly model nonlinear associations and interactions, whereas methods such as BKMR, with a fully Bayesian implementation, become much more computationally intensive and require substantial computing resources.

New Insights from PROTECT Data Analysis

Metals (Ba, As, and Co), phthalates (MBZP and MCOP), and a phenol (BPA) are more likely to be associated with birth weight after adjustment for potential confounders such as maternal age and gestational age at delivery. In particular, the interaction between Co and BPA on birth weight was identified more frequently than other interactions. Our analysis also indicates that the metals Mn and Cd and their interactions may have a strong impact on preterm birth. These findings were identified by multiple methods and deserve further investigation by environmental epidemiologists.

Software

To facilitate implementation of these statistical methods by practitioners, we have developed an open-source R package, “CompMix: A comprehensive toolkit for environmental mixtures analysis,” which currently implements seven methods for continuous outcomes (Lasso, Enet, BKMR, RF, HierNet, WQS, and Q-gcomp) and six methods for binary outcomes (Lasso, Enet, RF, HierNet, WQS, and Q-gcomp). The package supports three tasks, a) pollutant selection, b) interaction detection, and c) outcome prediction and risk stratification, so that users can determine which models fit their data best. It offers several features that distinguish it from existing software. First, it provides an easy-to-use interface with few input arguments. All tuning parameters have default values tested in extensive simulation studies, which greatly facilitates off-the-shelf tuning parameter selection. At the same time, the package provides an interface for modifying the tuning parameters and model specifications for statisticians who are more familiar with the underlying packages. Second, it provides a comprehensive summary of model fit, which helps users select the appropriate methods for their data. We will update the software regularly and add emerging methods as they become available. The package is available from the Comprehensive R Archive Network (doi:10.32614/CRAN.package.CompMix).

Limitations of the Current Study

First, our simulation studies did not include any covariates in the models when comparing different methods, because the methods adjust for covariates in distinct ways. For example, Lasso and Enet can either force covariates into the model without penalization or perform variable selection on covariates. By contrast, methods like HierNet, SNIF, HigLasso, and RF treat covariates in the same manner as pollutants; they do not offer the flexibility to include certain covariates in the model and carry out the subsequent model-building or tree-construction steps conditional on those covariates. In addition, BKMR models nonlinear associations between pollutants and the outcome while adjusting for covariates linearly. As a result, it would be very challenging to compare performance across methods with different covariate specifications. Confounder selection in mixtures analysis merits a separate paper in its own right.48,49 Second, interactions are scale dependent. A statistical interaction observed on one scale may disappear when the exposure or outcome is transformed, which runs counter to the intuitive notion of a biological interaction.50 In this study we continued to evaluate interaction models using cross-product terms. For binary outcomes, a growing literature suggests that additive interaction on the risk-difference scale may be more pertinent and relevant to public health than multiplicative interaction.51 In the future, it may be warranted to extend some of the large-scale penalized methods to characterize additive interactions, or to test for additive interactions after fitting a multiplicative model in the spirit of several existing methods.52–55 Third, the datasets we generated in the simulation studies cover only a small fraction of possible parametric configurations, so the conclusions may not generalize to datasets with greater complexity. We performed one simulation setting for linear main effect models in which three true effects belong to the same chemical group; the results can be found in Tables S19 and S20. Fourth, our simulation studies focus only on datasets with complete observations, whereas some chemicals in the PROTECT study may involve high percentages of measurements below the LOD or missing measurements across the three visits. The impact of imputation methods on the data analysis and scientific findings is worth further investigation. In our analysis of PROTECT data, we used single imputation because of its simplicity and computational advantage. Combining variable selection and cross-validation across multiply imputed datasets in a more principled way requires methods that are beyond the scope of this paper.56 Finally, our analysis of the PROTECT data did not include all possible confounders, such as family income, parity, and occupation, which may influence both exposures and outcome.
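The scale dependence of interactions can be made concrete with two binary exposures. The Python sketch below is illustrative only: the baseline risk of 0.05 and the risk ratios of 2 and 3 are hypothetical values, not PROTECT estimates, and the function names are our own. Risks are generated from a purely multiplicative model, so the interaction ratio RR11/(RR10 × RR01) equals 1, yet the relative excess risk due to interaction, RERI = RR11 − RR10 − RR01 + 1, an additive-scale measure discussed in the interaction literature,52,54 is not zero.

```python
"""Scale dependence of interaction: a minimal sketch with two binary exposures.

Risks come from a purely multiplicative (log-linear) model, so the cross-product
term is zero on the log-risk scale but not on the risk scale. The baseline risk
and risk ratios used below are hypothetical illustration values.
"""


def cell_risks(baseline: float, rr_a: float, rr_b: float) -> dict:
    """Risks in the four exposure cells under a model with no multiplicative interaction."""
    return {
        "00": baseline,
        "10": baseline * rr_a,
        "01": baseline * rr_b,
        "11": baseline * rr_a * rr_b,
    }


def _risk_ratios(r: dict) -> tuple:
    """Risk ratios of each exposed cell relative to the doubly unexposed cell."""
    return r["10"] / r["00"], r["01"] / r["00"], r["11"] / r["00"]


def multiplicative_interaction(r: dict) -> float:
    """RR11 / (RR10 * RR01); equals 1 when there is no interaction on the ratio scale."""
    rr10, rr01, rr11 = _risk_ratios(r)
    return rr11 / (rr10 * rr01)


def reri(r: dict) -> float:
    """Relative excess risk due to interaction: RR11 - RR10 - RR01 + 1 (0 means additivity)."""
    rr10, rr01, rr11 = _risk_ratios(r)
    return rr11 - rr10 - rr01 + 1


def additive_interaction(r: dict) -> float:
    """Interaction contrast on the risk-difference scale: p11 - p10 - p01 + p00."""
    return r["11"] - r["10"] - r["01"] + r["00"]


if __name__ == "__main__":
    risks = cell_risks(baseline=0.05, rr_a=2.0, rr_b=3.0)
    print(f"multiplicative: {multiplicative_interaction(risks):.3f}")  # 1.000: none
    print(f"RERI:           {reri(risks):.3f}")                        # 2.000: positive
    print(f"risk diff IC:   {additive_interaction(risks):.3f}")        # 0.100
```

With these values the same four risks show no interaction on the ratio scale but a positive interaction on the additive scale (RERI of 2, risk-difference contrast of 0.10). In logistic or case-control analyses, odds ratios are often substituted for risk ratios when the outcome is rare; that approximation is an assumption not shown in this sketch.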

Promises and Perils of Mixtures Analysis

Nearly 20 years have passed since the notion of the exposome was introduced as a parallel to the human genome.57 Over these two decades, we have seen tremendous progress in the collection and analysis of data on multiple pollutants in environmental health studies. However, the reproducibility and success of the genome world have not been mirrored in the exposome world. The reasons for these differences are many: a) exposure data are far more complex, with measurement error, temporal variability, correlation, interaction, and nonlinearity; and b) the culture of forming global consortia and sharing data, which results in massive sample sizes, has not permeated the exposome world. The statistical truth is that it is nearly impossible to tease apart complex multivariate relationships in such modest-sized datasets.

Although we propose a pipeline for analyzing mixtures in this paper, we recognize that there is no default, one-size-fits-all solution for mixtures analysis. Our goal is to make it easier to explore and navigate the multiverse of packages and models. ERS construction is a general framework that can incorporate many statistical approaches, including the methods discussed in this paper and new methods that will be developed in this field over time. However, we must note that an ERS is useful only for risk prediction and stratification. We are still very far from a framework for developing regulatory policies that target mixtures or ERSs. Cumulating exposure effects into a summary risk score often sacrifices the specificity of individual exposures and the underlying biology. An agnostic statistical or machine learning–based approach should be supplemented by more targeted causal models with strong conceptual and contextual tenets.58,59 It would be audacious to trivialize the complexity and subtlety of mixtures analysis by running a single software package such as CompMix and reporting its output. We want to emphasize this point for users of these methods.
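As a rough illustration of how an ERS supports risk stratification, the Python sketch below assumes simulated, standardized exposures with no covariates and uses ordinary least squares as a stand-in for whichever selection or shrinkage method supplies the weights; the function names `cross_fitted_ers` and `stratify` are hypothetical. Each subject's score is the weighted sum of their exposures, with weights estimated on the other folds, and subjects are then grouped into score quartiles.

```python
"""Environmental risk score (ERS): a minimal cross-fitted sketch.

Assumptions (hypothetical, for illustration only): simulated standardized
exposures, no covariates, a linear main-effects model fitted by ordinary least
squares, and quartile cut points. Any method that yields exposure weights could
replace the OLS step.
"""
import numpy as np


def cross_fitted_ers(X: np.ndarray, y: np.ndarray, n_folds: int = 5, seed: int = 0) -> np.ndarray:
    """Score each subject with weights estimated on the other folds to limit overfitting."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels 0..n_folds-1
    ers = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        design = np.column_stack([np.ones(train.sum()), X[train]])  # intercept + exposures
        beta, *_ = np.linalg.lstsq(design, y[train], rcond=None)
        ers[test] = X[test] @ beta[1:]  # weighted sum of exposures; intercept dropped
    return ers


def stratify(score: np.ndarray, n_groups: int = 4) -> np.ndarray:
    """Assign each subject to a score quantile group labeled 0..n_groups-1."""
    cuts = np.quantile(score, np.linspace(0, 1, n_groups + 1)[1:-1])
    return np.searchsorted(cuts, score, side="right")


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, p = 1000, 10
    X = rng.normal(size=(n, p))        # simulated standardized exposures
    true_beta = np.zeros(p)
    true_beta[:3] = [0.5, 0.3, 0.2]    # three hypothetical active toxicants
    y = X @ true_beta + rng.normal(size=n)
    groups = stratify(cross_fitted_ers(X, y))
    for g in range(4):
        print(f"ERS quartile {g + 1}: mean outcome = {y[groups == g].mean():.2f}")
```

Cross-fitting keeps each subject out of the fit that produced their score, which matters when the same data are used for stratification. Combining several candidate scores in the spirit of the Super Learner18 would add a second step, for example a nonnegative combination of the cross-fitted scores; that step is not shown here.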

Open Problems and Future Directions

To further study the health impact of mixtures, new and efficient statistical methods are urgently needed to address many important issues, including missing data and measurement errors,60 longitudinal exposures to detect critical time windows,61–66 longitudinal outcomes, nonlinear interaction detection,67 integrating multi-omics data,68 mediation analysis with mixtures,69,70 and causal inference with mixtures.71 Although we focused on statistical approaches toward variable selection, interaction identification, and risk summarization, there exists a large body of literature on alternative ways of combining exposures using their molecular structure or toxicological profiles.72 Combining statistical and biological approaches in a happy scientific marriage is an important direction to pursue. Estimating the health effect from exposure to chemical mixtures is a complex and challenging topic that requires a multidisciplinary team comprising epidemiologists, statisticians, and toxicologists. This team must work together to formulate the scientific questions, identify the statistical barriers, interpret the study findings, and understand the limitations of the research. Through close collaborations, innovative methods can be designed and implemented in mixtures research to enhance our understanding of the health impacts from exposure to chemical mixtures.

Supplementary Material

ehp15305.s001.acco.pdf (1.4MB, pdf)

Acknowledgments

This research work was supported by grants from the Puerto Rico PROTECT birth cohort (P42ES017198), Applying and Advancing Modern Approaches for Studying the Joint Impacts of Environmental Chemicals on Pregnancy Outcomes (R01ES031591), the Michigan Center on Lifestage Environmental Exposure and Disease (P30ES017885), and the Michigan Cancer and Research on the Environment (UH3 CA267907).

Conclusions and opinions are those of the individual authors and do not necessarily reflect the policies or views of EHP Publishing or the National Institute of Environmental Health Sciences.

References

  • 1. Carlin DJ, Rider CV, Woychik R, Birnbaum LS. 2013. Unraveling the health effects of environmental mixtures: an NIEHS priority. Environ Health Perspect 121(1):A6–A8, PMID: 23409283, 10.1289/ehp.1206182.
  • 2. Taylor KW, Joubert BR, Braun JM, Dilworth C, Gennings C, Hauser R, et al. 2016. Statistical approaches for assessing health effects of environmental chemical mixtures in epidemiology: lessons from an innovative workshop. Environ Health Perspect 124(12):A227–A229, PMID: 27905274, 10.1289/EHP547.
  • 3. Joubert BR, Kioumourtzoglou M-A, Chamberlain T, Chen HY, Gennings C, Turyk ME, et al. 2022. Powering Research through Innovative Methods for mixtures in Epidemiology (PRIME) program: novel and expanded statistical methods. Int J Environ Res Public Health 19(3):1378, PMID: 35162394, 10.3390/ijerph19031378.
  • 4. Osterman MJK, Hamilton BE, Martin JA, Driscoll AK, Valenzuela CP. 2022. Births: Final Data for 2020. National Center for Health Statistics (US). National Vital Statistics Reports. https://stacks.cdc.gov/view/cdc/112078 [accessed 11 June 2025].
  • 5. Kajantie E, Osmond C, Barker DJ, Eriksson JG. 2010. Preterm birth–a risk factor for type 2 diabetes? The Helsinki Birth Cohort study. Diabetes Care 33(12):2623–2625, PMID: 20823347, 10.2337/dc10-0912.
  • 6. Lewandowski AJ, Levy PT, Bates ML, McNamara PJ, Nuyt AM, Goss KN. 2020. Impact of the vulnerable preterm heart and circulation on adult cardiovascular disease risk. Hypertension 76(4):1028–1037, PMID: 32816574, 10.1161/HYPERTENSIONAHA.120.15574.
  • 7. Ferguson KK, Rosen EM, Rosario Z, Feric Z, Calafat AM, McElrath TF, et al. 2019. Environmental phthalate exposure and preterm birth in the PROTECT birth cohort. Environ Int 132:105099, PMID: 31430608, 10.1016/j.envint.2019.105099.
  • 8. Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J R Stat Soc Series B Stat Methodol 58(1):267–288, 10.1111/j.2517-6161.1996.tb02080.x.
  • 9. Zou H, Hastie T. 2005. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol 67(2):301–320, 10.1111/j.1467-9868.2005.00503.x.
  • 10. Yuan M, Lin Y. 2006. Model selection and estimation in regression with grouped variables. J R Stat Soc Series B Stat Methodol 68(1):49–67, 10.1111/j.1467-9868.2005.00532.x.
  • 11. Breiman L. 2001. Random forests. Machine Learning 45(1):5–32, 10.1023/A:1010933404324.
  • 12. Bobb JF, Claus Henn B, Valeri L, Coull BA. 2018. Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environ Health 17(1):67, PMID: 30126431, 10.1186/s12940-018-0413-y.
  • 13. Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, et al. 2015. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics 16(3):493–508, PMID: 25532525, 10.1093/biostatistics/kxu058.
  • 14. Boss J, Rix A, Chen Y-H, Narisetty NN, Wu Z, Ferguson KK, et al. 2021. A hierarchical integrative group least absolute shrinkage and selection operator for analyzing environmental mixtures. Environmetrics 32(8):e2698, PMID: 34899005, 10.1002/env.2698.
  • 15. Narisetty NN, Mukherjee B, Chen YH, Gonzalez R, Meeker JD. 2019. Selection of nonlinear interactions by a forward stepwise algorithm: application to identifying environmental chemical mixtures affecting health outcomes. Stat Med 38(9):1582–1600, PMID: 30586682, 10.1002/sim.8059.
  • 16. Ferrari F, Dunson DB. 2021. Bayesian factor analysis for inference on interactions. J Am Stat Assoc 116(535):1521–1532, PMID: 34898761, 10.1080/01621459.2020.1745813.
  • 17. Bien J, Tibshirani R. 2020. hierNet: a Lasso for Hierarchical Interactions. R package version 1.9. https://CRAN.R-project.org/package=hierNet.
  • 18. Van der Laan M, Polley E, Hubbard A. 2007. Super learner. Stat Appl Genet Mol Biol 6(1):Article25, PMID: 17910531, 10.2202/1544-6115.1309.
  • 19. Carrico C, Gennings C, Wheeler DC, Factor-Litvak P. 2015. Characterization of weighted quantile sum regression for highly correlated data in a risk analysis setting. J Agric Biol Environ Stat 20(1):100–120, PMID: 30505142, 10.1007/s13253-014-0180-3.
  • 20. Keil AP, Buckley JP, O’Brien KM, Ferguson KK, Zhao S, White AJ. 2020. A quantile-based g-computation approach to addressing the effects of exposure mixtures. Environ Health Perspect 128(4):047004, PMID: 32255670, 10.1289/EHP5838.
  • 21. Park SK, Tao Y, Meeker JD, Harlow SD, Mukherjee B. 2014. Environmental risk score as a new tool to examine multi-pollutants in epidemiologic research: an example from the NHANES study using serum lipid levels. PLoS One 9(6):e98632, PMID: 24901996, 10.1371/journal.pone.0098632.
  • 22. Park SK, Zhao Z, Mukherjee B. 2017. Construction of environmental risk score beyond standard linear models using machine learning methods: application to metal mixtures, oxidative stress and cardiovascular disease in NHANES. Environ Health 16(1):102–117, PMID: 28950902, 10.1186/s12940-017-0310-9.
  • 23. Agier L, Portengen L, Chadeau-Hyam M, Basagaña X, Giorgis-Allemand L, Siroux V, et al. 2016. A systematic comparison of linear regression–based statistical methods to assess exposome-health associations. Environ Health Perspect 124(12):1848–1856, PMID: 27219331, 10.1289/EHP172.
  • 24. Braun JM, Gennings C, Hauser R, Webster TF. 2016. What can epidemiological studies tell us about the impact of chemical mixtures on human health? Environ Health Perspect 124(1):A6–A9, PMID: 26720830, 10.1289/ehp.1510569.
  • 25. Davalos AD, Luben TJ, Herring AH, Sacks JD. 2017. Current approaches used in epidemiologic studies to examine short-term multipollutant air pollution exposures. Ann Epidemiol 27(2):145–153.e1, PMID: 28040377, 10.1016/j.annepidem.2016.11.016.
  • 26. Gibson EA, Nunez Y, Abuawad A, Zota AR, Renzetti S, Devick KL, et al. 2019. An overview of methods to address distinct research questions on environmental mixtures: an application to persistent organic pollutants and leukocyte telomere length. Environ Health 18(1):1–16, 10.1186/s12940-019-0515-1.
  • 27. Hamra GB, Buckley JP. 2018. Environmental exposure mixtures: questions and methods to address them. Curr Epidemiol Rep 5(2):160–165, PMID: 30643709, 10.1007/s40471-018-0145-0.
  • 28. Hoskovec L, Benka-Coker W, Severson R, Magzamen S, Wilson A. 2021. Model choice for estimating the association between exposure to chemical mixtures and health outcomes: a simulation study. PLoS One 16(3):e0249236, PMID: 33765068, 10.1371/journal.pone.0249236.
  • 29. Sun Z, Tao Y, Li S, Ferguson KK, Meeker JD, Park SK, et al. 2013. Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons. Environ Health 12(1):1–19, 10.1186/1476-069X-12-85.
  • 30. Friedman J, Hastie T, Tibshirani R. 2010. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1–22, PMID: 20808728.
  • 31. Yang Y, Zou H, Bhatnagar S. 2020. gglasso: Group Lasso Penalized Learning Using a Unified BMD Algorithm. R package version 1.5. https://CRAN.R-project.org/package=gglasso.
  • 32. Bobb JF. 2017. bkmr: Bayesian Kernel Machine Regression. R package version 0.2.0. https://CRAN.R-project.org/package=bkmr.
  • 33. MacQueen J. 1967. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Vol. 5.1: Statistics. Le Cam LM, Neyman J, eds. Berkeley, CA: University of California Press. 1:281–297.
  • 34. Liaw A, Wiener M. 2002. Classification and regression by randomForest. R News 2(3):18–22.
  • 35. Bien J, Taylor J, Tibshirani R. 2013. A lasso for hierarchical interactions. Ann Stat 41(3):1111–1141, PMID: 26257447, 10.1214/13-AOS1096.
  • 36. Rix A, Boss J. 2020. higlasso: Hierarchical Integrative Group LASSO. R package version 0.9.0. https://CRAN.R-project.org/package=higlasso.
  • 37. Rix A. 2021. snif: selection of Nonlinear Interactions by a Forward Stepwise Algorithm. R package version 0.5.0.
  • 38. Nelder JA, Wedderburn RW. 1972. Generalized linear models. J R Stat Soc Ser A 135(3):370–384, 10.2307/2344614.
  • 39. Renzetti S, Curtin P, Just AC, Bello G, Gennings C. 2021. gWQS: generalized Weighted Quantile Sum Regression. R package version 3.0.4. https://CRAN.R-project.org/package=gWQS.
  • 40. Renzetti S, Gennings C, Curtin PC. 2019. gWQS: an R package for linear and generalized weighted quantile sum (WQS) regression. J Stat Softw 1–9.
  • 41. Keil A. 2021. qgcomp: quantile G-Computation. R package version 2.8.5. https://github.com/alexpkeil1/qgcomp/.
  • 42. Meeker JD, Cantonwine DE, Rivera-González LO, Ferguson KK, Mukherjee B, Calafat AM, et al. 2013. Distribution, variability, and predictors of urinary concentrations of phenols and parabens among pregnant women in Puerto Rico. Environ Sci Technol 47(7):3439–3447, PMID: 23469879, 10.1021/es400510g.
  • 43. Cantonwine DE, Cordero JF, Rivera-González LO, Anzalota Del Toro LV, Ferguson KK, Mukherjee B, et al. 2014. Urinary phthalate metabolite concentrations among pregnant women in Northern Puerto Rico: distribution, temporal variability, and predictors. Environ Int 62:1–11, PMID: 24161445, 10.1016/j.envint.2013.09.014.
  • 44. World Health Organization. 2004. International Statistical Classification of Diseases and Related Health Problems: Alphabetical Index. Vol. 3. Geneva, Switzerland: World Health Organization.
  • 45. Langer O. 2000. Fetal macrosomia: etiologic factors. Clin Obstet Gynecol 43(2):283–297, PMID: 10863626, 10.1097/00003081-200006000-00006.
  • 46. Stekhoven DJ. 2013. missForest: nonparametric missing value imputation using random Forest. R package version 1.4.
  • 47. Stekhoven DJ, Bühlmann P. 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1):112–118, PMID: 22039212, 10.1093/bioinformatics/btr597.
  • 48. Dominici F, Wang C, Crainiceanu C, Parmigiani G. 2008. Model selection and health effect estimation in environmental epidemiology. Epidemiology 19(4):558–560, PMID: 18552590, 10.1097/EDE.0b013e31817307dc.
  • 49. Wilson A, Zigler CM, Patel CJ, Dominici F. 2018. Model-averaged confounder adjustment for estimating multivariate exposure effects with linear regression. Biometrics 74(3):1034–1044, PMID: 29569228, 10.1111/biom.12860.
  • 50. Aschard H. 2016. A perspective on interaction effects in genetic association studies. Genet Epidemiol 40(8):678–688, PMID: 27390122, 10.1002/gepi.21989.
  • 51. Kim S, Wang M, Tyrer JP, Jensen A, Wiensch A, Liu G, et al. 2019. A comprehensive gene–environment interaction analysis in ovarian cancer using genome‐wide significant common variants. Int J Cancer 144(9):2192–2205, PMID: 30499236, 10.1002/ijc.32029.
  • 52. Knol MJ, VanderWeele TJ. 2012. Recommendations for presenting analyses of effect modification and interaction. Int J Epidemiol 41(2):514–520, PMID: 22253321, 10.1093/ije/dyr218.
  • 53. Knol MJ, VanderWeele TJ, Groenwold RH, Klungel OH, Rovers MM, Grobbee DE. 2011. Estimating measures of interaction on an additive scale for preventive exposures. Eur J Epidemiol 26(6):433–438, PMID: 21344323, 10.1007/s10654-011-9554-9.
  • 54. Richardson DB, Kaufman JS. 2009. Estimation of the relative excess risk due to interaction and associated confidence bounds. Am J Epidemiol 169(6):756–760, PMID: 19211620, 10.1093/aje/kwn411.
  • 55. Tchetgen Tchetgen EJ, Shi X, Wong BH, Sofer T. 2019. A general approach to detect gene (G)‐environment (E) additive interaction leveraging G‐E independence in case‐control studies. Stat Med 38(24):4841–4853, PMID: 31441522, 10.1002/sim.8337.
  • 56. Du J, Boss J, Han P, Beesley LJ, Kleinsasser M, Goutman SA, et al. 2022. Variable selection with multiply-imputed datasets: choosing between stacked and grouped methods. J Comput Graph Stat 31(4):1063–1075, PMID: 36644406, 10.1080/10618600.2022.2035739.
  • 57. Wild CP. 2005. Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomarkers Prev 14(8):1847–1850, PMID: 16103423, 10.1158/1055-9965.EPI-05-0456.
  • 58. Antonelli J, Zigler C. 2024. Causal analysis of air pollution mixtures: estimands, positivity, and extrapolation. Am J Epidemiol 193(10):1392–1398, PMID: 38872350, 10.1093/aje/kwae115.
  • 59. Traini E, Huss A, Portengen L, Rookus M, Verschuren WMM, Vermeulen RCH, et al. 2022. A multipollutant approach to estimating causal effects of air pollution mixtures on overall mortality in a large, prospective cohort. Epidemiology 33(4):514–522, PMID: 35384897, 10.1097/EDE.0000000000001492.
  • 60. Eick SM, Hüls A. 2022. Invited perspective: challenges and opportunities for missing data in the context of environmental mixture methods. Environ Health Perspect 130(11):111305, PMID: 36416735, 10.1289/EHP12118.
  • 61. Antonelli J, Wilson A, Coull BA. 2023. Multiple exposure distributed lag models with variable selection. Biostatistics 25(1):1–19, PMID: 36073640, 10.1093/biostatistics/kxac038.
  • 62. Liu SH, Bobb JF, Claus Henn B, Schnaas L, Tellez-Rojo MM, Gennings C, et al. 2018. Modeling the health effects of time‐varying complex environmental mixtures: mean field variational bayes for lagged kernel machine regression. Environmetrics 29(4):e2504, PMID: 30686915, 10.1002/env.2504.
  • 63. Liu SH, Bobb JF, Lee KH, Gennings C, Claus Henn B, Bellinger D, et al. 2018. Lagged kernel machine regression for identifying time windows of susceptibility to exposures of complex mixtures. Biostatistics 19(3):325–341, PMID: 28968676, 10.1093/biostatistics/kxx036.
  • 64. Mork D, Wilson A. 2022. Treed distributed lag nonlinear models. Biostatistics 23(3):754–771, PMID: 33527997, 10.1093/biostatistics/kxaa051.
  • 65. Mork D, Wilson A. 2023. Estimating perinatal critical windows of susceptibility to environmental mixtures via structured Bayesian regression tree pairs. Biometrics 79(1):449–461, PMID: 34562017, 10.1111/biom.13568.
  • 66. Wilson A, Hsu HL, Chiu YM, Wright RO, Wright RJ, Coull BA. 2022. Kernel machine and distributed lag models for assessing windows of susceptibility to environmental mixtures in children’s health studies. Ann Appl Stat 16(2):1090–1110, PMID: 36304836, 10.1214/21-aoas1533.
  • 67. Antonelli J, Mazumdar M, Bellinger D, Christiani D, Wright R, Coull B. 2020. Estimating the health effects of environmental mixtures using Bayesian semiparametric regression and sparsity inducing priors. Ann Appl Stat 14(1):257–275, 10.1214/19-AOAS1307.
  • 68. Koh EJ, Hwang SY. 2019. Multi-omics approaches for understanding environmental exposure and human health. Mol Cell Toxicol 15(1):1–7, 10.1007/s13273-019-0001-4.
  • 69. Aung MT, Song Y, Ferguson KK, Cantonwine DE, Zeng L, McElrath TF, et al. 2020. Application of an analytical framework for multivariate mediation analysis of environmental data. Nat Commun 11(1):5624, PMID: 33159049, 10.1038/s41467-020-19335-2.
  • 70. Ferguson KK, Chen Y-H, VanderWeele TJ, McElrath TF, Meeker JD, Mukherjee B. 2017. Mediation of the relationship between maternal phthalate exposure and preterm birth by oxidative stress with repeated measurements across pregnancy. Environ Health Perspect 125(3):488–494, PMID: 27352406, 10.1289/EHP282.
  • 71. Avery CL, Howard AG, Ballou AF, Buchanan VL, Collins JM, Downie CG, et al. 2022. Strengthening causal inference in exposomics research: application of genetic data and methods. Environ Health Perspect 130(5):055001, PMID: 35533073, 10.1289/EHP9098.
  • 72. Monosson E. 2005. Chemical mixtures: considering the evolution of toxicology and chemical assessment. Environ Health Perspect 113(4):383–390, PMID: 15811826, 10.1289/ehp.6987.


Articles from Environmental Health Perspectives are provided here courtesy of National Institute of Environmental Health Sciences
