Summary:
An important goal of environmental health research is to assess the risk posed by mixtures of environmental exposures. Two popular classes of models for mixtures analyses are response-surface methods and exposure-index methods. Response-surface methods estimate high-dimensional surfaces and are thus highly flexible but difficult to interpret. In contrast, exposure-index methods decompose coefficients from a linear model into an overall mixture effect and individual index weights; these models yield easily interpretable effect estimates and efficient inferences when model assumptions hold, but, like most parsimonious models, incur bias when these assumptions do not hold. In this paper we propose a Bayesian multiple index model framework that combines the strengths of each, allowing for non-linear and non-additive relationships between exposure indices and a health outcome, while reducing the dimensionality of the exposure vector and estimating index weights with variable selection. This framework contains response-surface and exposure-index models as special cases, thereby unifying the two analysis strategies. This unification increases the range of models possible for analyzing environmental mixtures and health, allowing one to select an appropriate analysis from a spectrum of models varying in flexibility and interpretability. In an analysis of the association between telomere length and 18 organic pollutants in the National Health and Nutrition Examination Survey (NHANES), the proposed approach fits the data as well as more complex response-surface methods and yields more interpretable results.
Keywords: Environmental health, kernel machine regression, multiple index models, variable selection
1. Introduction
An important goal of environmental health research is to quantify the risk posed by the environmental exposures that humans encounter. Exposures can include those from multiple domains, such as chemical stressors (e.g. metals, persistent organic pollutants, particles) and non-chemical stressors (e.g. psychosocial stress, diet). Research has historically focused on the effects of individual environmental exposures, but this is unrealistic as humans are not exposed to a single pollutant in isolation. As such, a priority of the National Institute of Environmental Health Sciences (NIEHS), and environmental health research broadly, is to understand the impact of environmental mixtures (Carlin et al., 2013; Taylor et al., 2016).
Numerous statistical approaches have been proposed to estimate the health effects of mixtures, including: index methods (Carrico et al., 2015; Keil et al., 2020), exposure-response surface methods (Bobb et al., 2015, 2018), Bayesian shrinkage and selection priors (Dunson et al., 2008; Herring, 2010), clustering approaches like profile regression (Molitor et al., 2010, 2011, 2014) and related Dirichlet process mixture models (Dunson et al., 2008, 2007), among others (for recent reviews, see Davalos et al., 2017; Hamra and Buckley, 2018; Tanner et al., 2020). When choosing an appropriate method, practitioners first identify the scientific question of interest (Gibson et al., 2019) and then choose an approach to answer that question. This choice often includes deciding between highly flexible methods that are difficult to interpret and more restrictive methods that yield interpretable results. Two of the most popular models in environmental mixtures research, exposure-response surface methodology and index-based methods, exemplify this trade-off.
Exposure-response surface methods, such as Bayesian kernel machine regression (BKMR; Bobb et al., 2015, 2018) or Gaussian process regression more generally (Williams and Rasmussen, 2006; Ferrari and Dunson, 2019), estimate a multi-dimensional exposure-response surface non-parametrically. These models allow for non-linear and non-additive relationships between exposures and response, and can thus describe a broad array of exposure-response relationships. In the case of BKMR, interpretation is primarily based on posterior analysis of the high-dimensional exposure-response surface and is highly reliant on visualization. When there is a moderate to large number of components in the mixture, succinct interpretation can be difficult due to the large number of main effects and interactions. For example, in the analysis of 18 exposures considered in this paper, BKMR requires visual inspection of 18 main effect exposure-response functions and 306 pairwise interaction surface plots.
In contrast, exposure-index methods analyze the relationship between a response and a linear multi-exposure index. Among the most popular approaches in the environmental health sciences are weighted quantile sum regression (WQS; Carrico et al., 2015; Renzetti et al., 2019; Colicino et al., 2019) and quantile G-computation (QGC; Keil et al., 2020). These approaches first transform predictors to the quantile scale and then fit a linear model. The regression coefficients are then decomposed into (1) an “overall index effect” and (2) index weights indicating the relative contribution of each mixture component to the overall effect. This decomposition eases interpretation because there is a single parameter that characterizes the overall effect of the mixture, and a set of weights that reflects the relative importance of each exposure to this overall effect. In addition, model parsimony results in efficient inferences when model assumptions hold. However, these methods are ultimately instances of linear models and may not be sufficiently flexible to accurately model a more complex exposure-response relationship. Though previous work has incorporated higher order terms or interactions (Keil et al., 2020), one must specify these parametrically, and moreover this comes at the cost of the convenient interpretation of index weights.
From a statistical perspective, these index-based methods can be viewed as a single index model with an assumed linear association between the outcome and an index formed as a weighted average of quantiled exposures. More general formulations of the single index model (SIM) relax the linearity assumptions and model the relationship between the response and the exposure index non-parametrically (Powell et al., 1989; Hardle et al., 1993; Hristache et al., 2001; Yu and Ruppert, 2002; Lin and Kulasekera, 2007; Wang and Yang, 2009; Wang et al., 2010). When there are natural groupings of components within a mixture, one can further allow for multiple indices and interactions among them in a multiple index model (MIM) framework (Stoker, 1986; Ichimura and Lee, 1991; Samarov, 1993; Horowitz, 2012). These have received substantial attention in the econometric and statistical literature but have not enjoyed the same popularity in environmental health: recent reviews cite WQS/QGC and BKMR, but not MIMs (e.g., Davalos et al., 2017; Hamra and Buckley, 2018; Tanner et al., 2020). Wang et al. (2020) applied a frequentist SIM for exposure mixtures, but we are unaware of any application of MIMs to mixtures. While Bayesian SIMs have been developed (Antoniadis et al., 2004; Park et al., 2005; Choi et al., 2011; Gramacy and Lian, 2012; Alquier and Biau, 2013; Poon and Wang, 2013), we are aware of little work on Bayesian MIMs outside of Reich et al. (2011) in a very different mixture modelling framework.
In this paper, we propose a Bayesian multiple index model (BMIM), which provides a unified framework for estimating the association between exposure mixtures and a health outcome. Specifically, the BMIM represents a class of models that includes both response surface and index-based models. This class of models also encompasses a spectrum of models between these two extremes that vary in terms of model flexibility and interpretability, thus giving analysts greater control in selecting an appropriate method for any given data analysis.
The proposed approach: (1) facilitates formal Bayesian inference on interpretable index weights; (2) allows for a non-linear relation between an index and the outcome; (3) permits non-additive interactions among multi-exposure indices; and (4) incorporates Bayesian variable selection on mixture components. Moreover, we extend the BMIM framework to accommodate binary responses as well as cluster-correlated data.
2. NHANES Case Study: Telomere Length and a Mixture of Pollutants
We consider a case study of the association between a mixture of environmental pollutants and leukocyte telomere length (LTL), a biomarker of cellular aging (Mitro et al., 2016). To investigate this question, Mitro et al. (2016), and later Gibson et al. (2019), analyzed data from the 2001–2002 National Health and Nutrition Examination Survey (NHANES). From the original cohort of 11,039, both analyzed a subset of 1003 people aged 20 years or older who had complete data for relevant variables. Following Gibson et al. (2019), we consider an exposure mixture including 18 persistent organic pollutants that have been grouped naturally into sets of pollutants hypothesized to act similarly: Group 1 includes eight non-dioxin-like polychlorinated biphenyls (PCBs 74, 99, 138, 153, 170, 180, 187, and 194); Group 2 includes two non-ortho PCBs (PCBs 126 and 169); Group 3 includes PCB 118, four furans, and three dioxins. The outcome of interest is log-LTL, which is thought to be susceptible to environmental exposures (Mitro et al., 2016). Further details and inclusion criteria have been reported in depth elsewhere (Mitro et al., 2016; Gibson et al., 2019).
We analyze the same sample to characterize the relationship between log-LTL and the multi-pollutant mixture as well as the individual pollutant contributions.
3. Existing Methods
First let $y_i$ be the outcome of interest (log-LTL), $\mathbf{x}_i = (x_{i1}, \dots, x_{iP})^T$ be the exposures ($P = 18$ pollutants), and $\mathbf{z}_i$ be a vector of covariates for the $i$th observation, $i = 1, \dots, n$.
3.1. Quantile G-Computation (QGC)
Exposure index methods such as WQS (Gennings et al., 2010; Carrico et al., 2015) and QGC (Keil et al., 2020) fit a linear model as follows:
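$$y_i = \beta_0 + \beta \sum_{j=1}^{P} w_j q_{ij} + \mathbf{z}_i^T \boldsymbol{\gamma} + \epsilon_i, \qquad (1)$$
where $q_{ij}$ represent pre-transformed versions of the exposures $x_{ij}$, scored as quantiles (e.g., 0, 1, 2, 3 if $x_{ij}$ falls in the first, second, third or fourth quartile group), $w_j$ are component weights, $\mathbf{z}_i$ are covariates with coefficients $\boldsymbol{\gamma}$, and $\epsilon_i$ is a residual error. The parameter $\beta$ represents the linear association between the index $\sum_j w_j q_{ij}$ and the outcome. WQS and QGC regression simultaneously estimate an overall index association and component weights. By convention WQS and QGC model exposures on the quantile scale, but one could equivalently construct an index model with continuous exposures, implying slightly different parameter interpretations.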
To identify both the weights and the overall effect, some identifiability constraint is needed. WQS requires the weights $w_j$ to satisfy $w_j \geq 0$ and $\sum_{j} w_j = 1$ (which we do not pursue here). QGC relaxes the positivity assumption and allows for both positive and negative associations. Ultimately, the QGC approach estimates a linear model and then decomposes the estimated coefficients into an overall effect $\beta$ and component weights post hoc. When estimates are of opposite sign, QGC typically reports positive weights and negative weights separately (see the qgcomp package; Keil et al., 2020). As such, each positive weight is interpreted as the “proportion of the positive association,” and analogously so for the negative weights (Keil et al., 2020).
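The post-hoc decomposition described above can be sketched in a few lines. This is only an illustration on simulated data under our reading of the approach (overall effect as the sum of the fitted exposure coefficients, weights as each coefficient's share of the same-sign total); it is not the qgcomp implementation, and the helper names `quantize` and `decompose` are hypothetical.

```python
import numpy as np

def quantize(x, q=4):
    """Score each column of x as 0..q-1 according to its empirical quantile bin."""
    x = np.asarray(x, dtype=float)
    scores = np.empty(x.shape, dtype=int)
    for j in range(x.shape[1]):
        # interior cut points, e.g. the 25th, 50th and 75th percentiles for q=4
        cuts = np.quantile(x[:, j], np.linspace(0, 1, q + 1)[1:-1])
        scores[:, j] = np.searchsorted(cuts, x[:, j], side="right")
    return scores

def decompose(coefs):
    """Split linear-model coefficients into an overall effect and two weight sets."""
    coefs = np.asarray(coefs, dtype=float)
    overall = coefs.sum()                      # assumed definition of the overall effect
    pos = np.where(coefs > 0, coefs, 0.0)
    neg = np.where(coefs < 0, coefs, 0.0)
    pos_w = pos / pos.sum() if pos.sum() > 0 else pos   # shares of the positive total
    neg_w = neg / neg.sum() if neg.sum() < 0 else neg   # shares of the negative total
    return overall, pos_w, neg_w

# Simulated example (all values are illustrative, not from the NHANES analysis)
rng = np.random.default_rng(0)
n, p = 500, 4
x = rng.lognormal(size=(n, p))
q = quantize(x)
true_coefs = np.array([0.3, 0.1, -0.2, 0.0])
y = 1.0 + q @ true_coefs + rng.normal(scale=0.5, size=n)

# Ordinary least squares on the quantile scores, then post-hoc decomposition
design = np.column_stack([np.ones(n), q])
fit, *_ = np.linalg.lstsq(design, y, rcond=None)
overall, pos_w, neg_w = decompose(fit[1:])
```

Because the weights are simple functions of point estimates, this sketch returns no uncertainty for them, which mirrors the limitation discussed for the index weights in the linear setting.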
3.2. Bayesian Kernel Machine Regression (BKMR)
We consider BKMR as a popular response surface method for environmental mixtures. BKMR is a flexible approach to modelling mixtures that allows non-linear associations and non-additive interactions among exposures. As BKMR estimates a smooth response surface, we include continuous exposures rather than quantized versions. The BKMR model is
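$$y_i = h(x_{i1}, \dots, x_{iP}) + \mathbf{z}_i^T \boldsymbol{\gamma} + \epsilon_i, \qquad (2)$$
where $h(\cdot)$ is an unknown and potentially non-linear function represented via a kernel. We assume $h$ lies in a function space $\mathcal{H}_K$ defined by a positive semidefinite reproducing kernel $K(\cdot, \cdot)$.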
The choice of kernel function uniquely determines a set of basis functions (Cristianini et al., 2000). A common choice is the Gaussian kernel, $K(\mathbf{x}_i, \mathbf{x}_{i'}) = \exp\{-\sum_{j=1}^{P} \rho_j (x_{ij} - x_{i'j})^2\}$, where $\rho_j$ are feature weights and $\rho_j \geq 0$. This corresponds to radial basis functions (Liu et al., 2007).
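As a concrete illustration of the Gaussian kernel with feature weights, the sketch below builds the kernel matrix for a small simulated exposure matrix. It assumes the usual form $\exp\{-\sum_j \rho_j (x_{ij} - x_{i'j})^2\}$ with non-negative weights $\rho_j$; the function name `gaussian_kernel` is hypothetical and this is not the bkmr implementation.

```python
import numpy as np

def gaussian_kernel(x, rho):
    """Kernel matrix with entries exp(-sum_j rho_j * (x_ij - x_i'j)^2)."""
    x = np.asarray(x, dtype=float)
    rho = np.asarray(rho, dtype=float)
    if np.any(rho < 0):
        raise ValueError("feature weights must be non-negative")
    diff = x[:, None, :] - x[None, :, :]              # pairwise differences, shape (n, n, P)
    return np.exp(-np.einsum("ijk,k->ij", diff ** 2, rho))

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 3))

# A zero feature weight removes the second exposure from the kernel entirely,
# so the result matches the kernel built from the remaining two exposures.
K = gaussian_kernel(x, rho=[0.5, 0.0, 2.0])
K_drop = gaussian_kernel(x[:, [0, 2]], rho=[0.5, 2.0])
```

The equality of `K` and `K_drop` is what makes selection priors on the feature weights act as variable selection on the exposures.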
Estimation leverages an equivalence with a linear mixed effects model (Liu et al., 2007). Under a Kernel representation for , model (2) can be written as
where is the kernel matrix with elements , and is a tuning parameter that determines model complexity with small favoring a more flexible model (Liu et al., 2007). Estimation is based on the marginal likelihood of with respect to . The model is completed by specifying priors for . We adopt default priors from Bobb et al. (2015, 2018) throughout.
Component-wise variable selection is incorporated via spike and slab priors on the kernel feature weights. To identify effects of exposures that may be highly correlated, Bobb et al. (2015) also introduced a hierarchical variable selection scheme. This scheme permits only one pollutant among a group of exposures to enter the model at a time, which may be too restrictive in some situations.
4. Proposed Approach: Bayesian Multiple Index Models
4.1. Model Framework
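We propose a Bayesian multiple index model (BMIM) framework to combine the flexibility of response surface methods with the interpretability of more parsimonious index models. Suppose the exposure vector $\mathbf{x}_i$ can be partitioned into $M$ mutually exclusive groups denoted $\mathbf{x}_{im}$, each of length $L_m$, for $m = 1, \dots, M$. The proposed model can be written
$$y_i = h(\mathbf{x}_{i1}^T \boldsymbol{\theta}_1, \dots, \mathbf{x}_{iM}^T \boldsymbol{\theta}_M) + \mathbf{z}_i^T \boldsymbol{\gamma} + \epsilon_i, \qquad (3)$$
where $\boldsymbol{\theta}_m$ are $L_m$-vectors of index weights subject to the identifiability constraints $\mathbf{1}_{L_m}^T \boldsymbol{\theta}_m \geq 0$, where $\mathbf{1}_{L_m}$ is the length-$L_m$ vector of ones, and $\boldsymbol{\theta}_m^T \boldsymbol{\theta}_m = 1$ for $m = 1, \dots, M$. Contrast these constraints with those of the linear index models: like QGC, this allows for positive and negative associations, but rather than summing to 1 the weights have norm 1.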
The key notion is that, in contrast to BKMR, one need only estimate an exposure-response surface of dimension $M$, which may remain small even when $P$ is large. Moreover, within the $m$th index, we can interpret the contributions of each exposure via the index weights $\boldsymbol{\theta}_m$.
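We again employ a kernel function, which is now a function of the vector of indices. That is, the Gaussian kernel can be written as $K(\mathbf{E}_i, \mathbf{E}_{i'}) = \exp\{-\sum_{m=1}^{M} \rho_m (E_{im} - E_{i'm})^2\}$ for the $m$th index $E_{im} = \mathbf{x}_{im}^T \boldsymbol{\theta}_m$ and $\rho_m \geq 0$. To reduce the computational burden of sampling a vector from this constrained space, we reparameterize $\boldsymbol{\theta}_m^* = \rho_m^{1/2} \boldsymbol{\theta}_m$ as in Wilson et al. (2020). The Gaussian kernel is then $K(\mathbf{E}_i, \mathbf{E}_{i'}) = \exp\{-\sum_{m=1}^{M} [(\mathbf{x}_{im} - \mathbf{x}_{i'm})^T \boldsymbol{\theta}_m^*]^2\}$. The weights can be fully identified and recovered from the posterior sample of $\boldsymbol{\theta}_m^*$. Specifically, we have $\rho_m = \boldsymbol{\theta}_m^{*T} \boldsymbol{\theta}_m^*$ and $\boldsymbol{\theta}_m = \rho_m^{-1/2} \boldsymbol{\theta}_m^*$.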
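The reparameterization can be checked numerically. The sketch below is a minimal illustration under stated assumptions: an unconstrained vector is taken to equal the square root of the index-level weight times a unit-norm weight vector, and the sign convention (weights summing to a non-negative value) is our assumption. The names `index_kernel` and `recover` are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def index_kernel(x_groups, theta_star):
    """Gaussian kernel on indices under the unconstrained parameterization.

    x_groups: list of (n, L_m) arrays; theta_star: list of length-L_m vectors.
    """
    # column m holds x_im^T theta*_m, i.e. sqrt(rho_m) times the m-th index
    scaled = np.column_stack([np.asarray(xm, dtype=float) @ np.asarray(t, dtype=float)
                              for xm, t in zip(x_groups, theta_star)])
    diff = scaled[:, None, :] - scaled[None, :, :]
    return np.exp(-(diff ** 2).sum(axis=2))

def recover(theta_star_m):
    """Map one unconstrained vector to (rho_m, theta_m) with unit-norm theta_m."""
    t = np.asarray(theta_star_m, dtype=float)
    rho = float(t @ t)
    if rho == 0.0:
        return 0.0, None              # index selected out; weights undefined
    theta = t / np.sqrt(rho)
    if theta.sum() < 0:               # assumed sign convention: entries sum to >= 0
        theta = -theta
    return rho, theta

# Illustrative values only: two indices with three and two exposures
rng = np.random.default_rng(2)
x_groups = [rng.normal(size=(5, 3)), rng.normal(size=(5, 2))]
theta = [np.array([0.6, 0.8, 0.0]), np.array([1.0, 0.0])]
rho = [0.7, 1.5]
theta_star = [np.sqrt(r) * t for r, t in zip(rho, theta)]

K = index_kernel(x_groups, theta_star)
rho_hat, theta_hat = recover(theta_star[0])
```

Note that the kernel depends on each unconstrained vector only through squared projections, so flipping its sign leaves the kernel unchanged; this is why a sign convention is needed to identify the weights.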
4.2. Prior Specification and Posterior Inference
We specify a prior directly for the unconstrained . In particular, we allow for variable selection on the exposure component weights via spike and slab priors:
where , and is a point mass at zero (George and McCulloch, 1997). The spike and slab priors directly apply selection at the component level but can also exert selection at the index level. When all of the components in an index are selected out of the model the index itself is selected out of the model. We specify equal prior inclusion probability at the component level, which results in higher prior inclusion probability for indices with more components. One possible alternative is to apply equal prior inclusion probability to each index. Moreover, one might even specify spike-and-slab priors directly on the for explicit indexwise selection, which might also reduce the computational burden of updating all required by both BKMR and BMIM.
To complete the model specification, we adopt a flat prior for and , and we set , and . We draw samples from the posterior via MCMC (see supporting information for details).
We summarize the posterior for the weights as follows. We first report the posterior inclusion probability (PIP) for an entire index, i.e. the posterior probability that the index-level kernel weight $\rho_m \neq 0$. When a whole index is selected out of the model ($\rho_m = 0$), the component weights $\boldsymbol{\theta}_m$ are undefined. We therefore describe the conditional posterior for each weight $\theta_{ml}$ given $\rho_m \neq 0$ via its PIP, mean and 95% credible interval. To summarize component-wise variable selection, we recover marginal PIPs for the component weights by multiplying the PIP for $\rho_m$ by the conditional PIP for $\theta_{ml}$, as $P(\theta_{ml} \neq 0) = P(\theta_{ml} \neq 0 \mid \rho_m \neq 0)\, P(\rho_m \neq 0)$.
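The relationship between index-level, conditional and marginal PIPs can be illustrated with a toy set of posterior draws. The sketch assumes draws of the unconstrained weights for one index are stored as rows, with exact zeros marking components selected out; the function name `summarize_index` and the storage layout are hypothetical.

```python
import numpy as np

def summarize_index(draws):
    """draws: (S, L_m) array of posterior draws for one index (zeros = selected out)."""
    draws = np.asarray(draws, dtype=float)
    included = np.any(draws != 0, axis=1)        # draw keeps the index in the model
    index_pip = included.mean()
    if not included.any():
        return {"index_pip": 0.0, "cond_pip": None,
                "marginal_pip": np.zeros(draws.shape[1])}
    kept = draws[included]
    cond_pip = (kept != 0).mean(axis=0)          # inclusion given the index is in the model
    return {"index_pip": index_pip, "cond_pip": cond_pip,
            "marginal_pip": index_pip * cond_pip}

# Four toy draws for an index with two components
draws = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.5, 0.5],
                  [0.0, 0.0]])
summary = summarize_index(draws)
```

In this toy case the index is included in half of the draws, so the marginal PIPs are half of the conditional ones, and they coincide with the raw proportion of non-zero draws per component.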
4.3. Indexwise Curves and Other Associations of Interest
As with BKMR, we can predict the response surface at arbitrary exposure levels. As such, BMIM estimation yields analogous component-wise response curves by varying a single exposure along a grid and holding others fixed. This also allows one to report an “overall mixture effect” analogous to that of QGC by simultaneously increasing all exposures by a quantile, or an “overall index effect” by increasing all exposures within an index.
The BMIM structure lends itself to reporting index-wise response curves. Consider the mth index, and set to some fixed quantile of posterior means of for each and . Then set (i.e., the mth column of ) to a grid of constants, e.g., equally spaced values between the 5th and 95th percentiles across observations of the posterior means for each . This is convenient in that it naturally captures how exposures vary jointly. In contrast, increasing exposures simultaneously by a quantile would not necessarily capture their joint variability unless exposures are highly correlated. Note that this takes the weights as fixed and thus isolates uncertainty in the shape of the index-response curve. That is, it ignores uncertainty in the weights in the index of interest. This is akin to making inferences about $\beta$ in model (1), or constructing indices via (fixed) toxic equivalency factors as is common in toxicology (e.g., Mitro et al., 2016).
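One way to build the grid of index values for such a curve is sketched below. It reflects our interpretation of the procedure (weights fixed at point estimates, other indices held at a fixed quantile, the index of interest moved between its 5th and 95th percentiles); the function name `index_grid` and all inputs are hypothetical, and the fitted surface that would be evaluated on the grid is not shown.

```python
import numpy as np

def index_grid(x_groups, theta_hat, m, n_grid=20, fixed_q=0.5):
    """Index values at which to evaluate an index-wise curve for index m (0-based).

    x_groups: list of (n, L_m) exposure arrays; theta_hat: list of weight vectors
    treated as fixed (e.g. posterior means). Returns an (n_grid, M) array.
    """
    # observed index values under the fixed weights, one column per index
    E = np.column_stack([np.asarray(xm, dtype=float) @ np.asarray(t, dtype=float)
                         for xm, t in zip(x_groups, theta_hat)])
    # hold every index at a fixed quantile (the median by default)
    grid = np.tile(np.quantile(E, fixed_q, axis=0), (n_grid, 1))
    # then move the m-th index from its 5th to its 95th percentile
    lo, hi = np.quantile(E[:, m], [0.05, 0.95])
    grid[:, m] = np.linspace(lo, hi, n_grid)
    return grid

rng = np.random.default_rng(4)
x_groups = [rng.normal(size=(100, 3)), rng.normal(size=(100, 2))]
theta_hat = [np.array([0.6, 0.8, 0.0]), np.array([0.0, 1.0])]
grid = index_grid(x_groups, theta_hat, m=1, n_grid=5)
```

Because the grid is defined on the index scale rather than exposure by exposure, it moves along the direction in which the weighted exposures actually vary in the sample.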
The parsimonious set of index-wise exposure-response functions simplifies the interpretation of a fitted model, as one can plot $M$ index-wise curves rather than $P$ component-wise curves. The contrast is even starker for interactions, as one could present $M(M-1)$ two-way index-wise interaction plots rather than $P(P-1)$. In the NHANES example, this corresponds to 306 component-wise interaction plots under BKMR and only 6 index-wise plots under a MIM with 3 indices. Nevertheless, one could still extract the same component-wise curves while reflecting uncertainty associated with the estimation of the weights.
4.4. Relation to Existing Methods and a Spectrum of Models
We highlight here two special cases. First, when $M = 1$, the BMIM reduces to a single index model. If one further adopts a polynomial kernel of degree 1 (and pre-transforms exposures accordingly), one could recover the index model in (1), with the benefit of uncertainty quantification on the weights. Even in more flexible single-index specifications, e.g. with a quadratic or Gaussian kernel, one can still estimate well-defined and interpretable weights. Hence, the class of index models is contained in the BMIM, including standard linear models and more flexible models with a nonlinear association.
At the other extreme, setting $M = P$ (that is, a BMIM containing $P$ single-exposure indices) corresponds to the usual BKMR. This shows that the two common approaches to analyzing environmental mixtures in fact lie at opposite ends of a spectrum of models that vary in their flexibility. Rather than forcing analysts to choose between these two extremes, the BMIM framework permits one to specify the level of flexibility and interpretability most appropriate for a given analysis.
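A short worked derivation may help show why the case of $P$ single-exposure indices reduces to BKMR. It assumes a Gaussian kernel on the indices with index-level weights $\rho_m \geq 0$, and unit-norm index weights $\theta_m$ whose entries sum to a non-negative value; this notation is our reading of the model and should be treated as an assumption.

```latex
% Each index contains a single exposure, so \theta_m is a scalar.
% Unit norm gives \theta_m^2 = 1 and the non-negative sum gives \theta_m \ge 0,
% hence \theta_m = 1 and each index equals its exposure.
\begin{aligned}
E_{im} &= x_{im}\,\theta_m = x_{im}, \qquad m = 1, \dots, P, \\
K(\mathbf{E}_i, \mathbf{E}_{i'})
  &= \exp\Big\{-\sum_{m=1}^{P} \rho_m \,(E_{im} - E_{i'm})^2\Big\}
   = \exp\Big\{-\sum_{m=1}^{P} \rho_m \,(x_{im} - x_{i'm})^2\Big\}.
\end{aligned}
```

The final expression is a Gaussian kernel with one feature weight per exposure, so the index-level weights play the role of the component-wise feature weights in BKMR.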
4.5. Software and Extensions
In addition to the methods described here—including both Gaussian and polynomial kernels—we have extended the BMIM to several common scenarios. When outcomes are cluster-correlated, the BMIM can incorporate random intercepts. It can also accommodate binary outcomes via a probit-latent variable formulation. Details of these extensions can be found in the supplemental material. Associated software is available at github.com/glenmcgee/BMIM.
5. Simulations
5.1. Setup
We conducted a series of simulations to compare the behaviour of Bayesian single- and multiple-index models to the more flexible BKMR and the parsimonious QGC. Using the real exposures and covariates in the NHANES sample, we generated outcomes from , where included age (standardized), age$^2$, male (0,1), and indicators of BMI (25–29.9; 30+). We set to loosely reflect effect sizes in the data application, and we set . We assumed two different structures for :
Scenario A follows from a single index model. First generate where such that and , standardized to have mean 0 and standard deviation 1. Then , where is S-shaped.
Scenario B follows from a multiple index model. Generate and , where , standardized as above, and and are as defined above. Then , where is a unimodal function, is flat (null), and is sigmoidal.
Details on the exposure response curves can be found in the supporting information.
In each setting, we generated 100 datasets of 500 observations. We split each sample into a training set of 300 observations, and held out the remaining 200 observations to evaluate model fit. To each training set, we fit QGC, a Bayesian single index model (BSIM), a 3-index BMIM using the same indices as above (BMIM-3), BKMR, and BKMR with hierarchical selection using the three groups of exposures (BKMR-H), with a Gaussian kernel.
In scenario (A), we would expect the BSIM to perform best as it is correctly specified (with unknown response curve). We expect the BMIM (with $M = 3$) and BKMR to be flexible enough to still perform well here, albeit with more variability due to less model structure. In scenario (B), the BSIM is misspecified and should perform poorly. Here the BMIM is correctly specified (with unknown response curve) and should perform better than the more flexible BKMR. QGC is misspecified in both scenarios, as it assumes linearity on the quantile scale.
We evaluated model fit via mean squared error (MSE) for the unknown exposure-response function $h$ in the test set, as well as MSE for the outcomes in the test set and in 4-fold cross validation (CV). For MSE, we report the mean as well as the standard deviation across datasets in order to contextualize differences between models. We also report 95% interval coverage (Cvg) and average standard errors (SE) for $h$ in the test set.
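The evaluation metrics can be sketched as follows for a single simulated dataset. This is a minimal illustration, assuming posterior draws of the exposure-response function at the test points are available as an array, that intervals are equal-tailed 95% posterior intervals, and that the reported standard error corresponds to the posterior standard deviation; the function name `evaluate` and the toy draws are hypothetical.

```python
import numpy as np

def evaluate(h_true, h_draws):
    """h_true: (n,) true surface at test points; h_draws: (S, n) posterior draws."""
    h_true = np.asarray(h_true, dtype=float)
    h_draws = np.asarray(h_draws, dtype=float)
    post_mean = h_draws.mean(axis=0)
    lower, upper = np.quantile(h_draws, [0.025, 0.975], axis=0)
    return {
        "mse": float(np.mean((post_mean - h_true) ** 2)),
        "coverage": float(np.mean((lower <= h_true) & (h_true <= upper))),
        "avg_se": float(h_draws.std(axis=0, ddof=1).mean()),
    }

rng = np.random.default_rng(3)

# Split 500 observations into 300 for training and 200 held out for testing
perm = rng.permutation(500)
train_idx, test_idx = perm[:300], perm[300:]

# Toy stand-in for a fitted model: the true surface is zero and the draws are N(0, 1)
h_true = np.zeros(test_idx.size)
h_draws = rng.normal(size=(2000, test_idx.size))
metrics = evaluate(h_true, h_draws)
```

Averaging these per-dataset summaries over the simulated datasets, and recording the standard deviation of the MSE across datasets, gives quantities of the kind reported in the simulation results.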
5.2. Simulation Results
Under the single index data-generating mechanism (A), the BSIM had the lowest MSE. The next lowest MSE was for BMIM-3 followed by BKMR. Under the three-index data-generating mechanism (B), BMIM-3 had the lowest MSE followed by BKMR. The incorrectly specified BSIM performed worse than both. In both scenarios, QGC exhibited very high MSE relative to the other approaches, as it was misspecified. As described previously, one could incorporate higher order terms into the QGC approach, but we found that incorporating quadratic terms for all exposures still did not perform well (see supporting information). Cross validation MSE followed the same pattern, suggesting this can be used for model selection.
Interval coverage behaved similarly, with the simplest correctly specified model achieving near nominal coverage. Interestingly, under the single-index scenario (A), the more flexible alternatives (BMIM-3 and BKMR) had progressively lower coverage, potentially because of bias due to increased shrinkage; nevertheless both greatly outperformed QGC. In scenario (B), BMIM-3 and BKMR both achieved near nominal coverage whereas the misspecified BSIM and QGC had very low coverage. Naturally, the more structured BSIM had smaller SEs than BMIM-3, which had lower SEs than BKMR in (A) and (B).
Interestingly, when we repeated the simulations in low-signal settings (setting or 2.0), the simpler models were no longer the clear winners. Instead, the BSIM, BMIM-3 and BKMR all performed about the same in terms of MSE, interval coverage, and even average SEs. Indeed, cross validation MSEs were effectively equal in low-signal settings. These models fit about equally well, and it would be difficult to choose one that best fits the data. Facing such a setting in practice, one would likely favor the model with simpler interpretations.
Note that we report here results for BKMR using the same priors for as in the BMIM and BSIM. In the supporting information we report results for BKMR using the default priors for from the bkmr package (Bobb et al., 2018), which performed very similarly to our implementation when was small but worse when was high. BKMR with hierarchical variable selection generally performed worse in terms of MSE and interval coverage but achieved the lowest average SEs (see supporting information), because it imposes the most shrinkage by allowing at most one exposure per group to enter the model at a time. It is therefore misspecified for the data-generating mechanisms considered here.
6. Analysis of NHANES Case Study
6.1. Models and Analyses
We analyzed the NHANES data in order to compare the different conclusions that can be drawn under each approach. We considered several goals: characterizing the overall outcome association of the mixture, quantifying the contributions of each mixture component, characterizing the outcome associations of each index, and investigating two-way interactions.
We conducted a QGC analysis (with exposures binned into deciles), full BKMR (with componentwise as well as hierarchical variable selection via spike-and-slab priors), a BSIM analysis, and a BMIM analysis with three indices (BMIM-3; $M = 3$) corresponding to the three exposure groups identified by Gibson et al. (2019), within which exposures are expected to act similarly. In the kernel-based methods, continuous exposures were first standardized to have mean 0 and standard deviation 1, and we adopted a Gaussian kernel. All models were adjusted for age (linear and quadratic), sex, and BMI.
6.2. Results
Overall Mixture Association.
All of the methods can estimate an “overall” association between mixture and health, but the interpretations differ somewhat. QGC quantifies the overall association through its $\beta$ parameter. This parameter represents the mean difference in log-LTL comparing a population to another with all exposures one decile group higher. We estimate this to be 0.069 (95% confidence interval: [0.030, 0.108]). The kernel methods can estimate arbitrary contrasts, so we estimated an analogous “overall” mixture-health association, this time comparing a population with all exposures at their 60th percentile versus all exposures set to their 50th percentile. The three kernel methods all produced similar estimates of this overall effect: the BSIM estimate was 0.034 (95% credible interval [−0.006, 0.075]), the BMIM-3 estimate was 0.037 (95% CI [−0.004, 0.077]), and the BKMR estimate was 0.037 (95% CI [0.005, 0.069]). There are two likely reasons that the estimates differ between QGC and the kernel-based methods. First, the kernel approaches allow non-linearity, so the estimate may depend on the specific quantiles being compared. Second, despite their similar interpretations, the estimands are slightly different, as QGC compares two quantile categories rather than two specific exposure values.
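The percentile contrast used for the kernel methods can be sketched generically. The example assumes that posterior draws of the exposure-response surface are available as callables that map an exposure vector to a value; this interface, the function name `overall_contrast`, and the toy linear surfaces are all hypothetical and do not correspond to any particular package.

```python
import numpy as np

def overall_contrast(h_draws, x, q_hi=0.60, q_lo=0.50):
    """Posterior mean and 95% interval for h(all exposures at q_hi) - h(all at q_lo).

    h_draws: iterable of callables, each one posterior draw of the surface h.
    x: (n, P) exposure matrix used to compute the percentiles.
    """
    x = np.asarray(x, dtype=float)
    x_hi = np.quantile(x, q_hi, axis=0)          # every exposure at its upper percentile
    x_lo = np.quantile(x, q_lo, axis=0)          # every exposure at its lower percentile
    diffs = np.array([h(x_hi) - h(x_lo) for h in h_draws])
    lower, upper = np.quantile(diffs, [0.025, 0.975])
    return diffs.mean(), (lower, upper)

# Toy example: two exposures on regular grids and three linear "posterior draws"
x = np.column_stack([np.linspace(0, 10, 101), np.linspace(0, 20, 101)])
h_draws = [lambda v, a=a: a * v.sum() for a in (0.5, 1.0, 1.5)]
est, (lo, hi) = overall_contrast(h_draws, x)
```

With a non-linear surface the result would change with the pair of percentiles chosen, which is one reason such a contrast need not agree with a single linear slope on the quantile scale.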
Individual Component Contributions.
All of the methods provide a measure of variable or component importance. The interpretation and inference vary between methods.
QGC provides two sets of weights (positive and negative), and each weight represents the proportion of the positive or negative effect that can be attributed to that exposure. Table 2 presents estimates of these weights. Among the ten exposures with positive weights, Furan1 contributed the most, with a weight of 0.18, and PCB99 and Furan4 each had weights of 0.14. Among the negative weights, PCB180 contributed the most, with a weight of 0.29, and the others were no more than 0.16. A key limitation is that the current software provides no estimates of uncertainty for the weights. Hence, inferential statements about the relative size of weights, or even the sign of weights, cannot be readily made.
Table 2:
Summarizing exposure weights in BKMR and the MIM. For BKMR we report posterior inclusion probabilities (PIPs) for each exposure. For MIM, we report the PIP for the entire index via ρ; we also summarize the distribution of weights θ conditional on ρ ≠ 0 (otherwise it is not well defined). Est is the posterior mean standardized to satisfy the constraints; CI is 95% credible interval.
| Group | Exposure | QGC weight (Pos or Neg) | BKMR PIP | BSIM (M=1) index PIP | BSIM PIP | BSIM Est | BSIM CI | BMIM-3 (M=3) index PIP | BMIM-3 PIP | BMIM-3 Est | BMIM-3 CI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PCB074 | 0.10 | 0.11 | 1.00 | 0.17 | 0.04 | (−0.22, 0.58) | 0.60 | 0.37 | 0.39 | (−0.49, 1.00) |
| | PCB099 | 0.14 | 0.18 | | 0.16 | 0.06 | (−0.06, 0.60) | | 0.39 | 0.60 | (−0.38, 1.00) |
| | PCB138 | 0.02 | 0.12 | | 0.15 | 0.02 | (−0.21, 0.37) | | 0.33 | 0.29 | (−0.46, 1.00) |
| | PCB153 | 0.11 | 0.13 | | 0.17 | 0.01 | (−0.35, 0.48) | | 0.40 | 0.38 | (−0.60, 1.00) |
| | PCB170 | 0.16 | 0.13 | | 0.15 | −0.02 | (−0.35, 0.13) | | 0.37 | 0.25 | (−0.62, 1.00) |
| | PCB180 | 0.29 | 0.13 | | 0.17 | −0.04 | (−0.45, 0.21) | | 0.41 | 0.28 | (−0.62, 1.00) |
| | PCB187 | 0.02 | 0.11 | | 0.15 | 0.02 | (−0.21, 0.41) | | 0.34 | 0.21 | (−0.50, 0.99) |
| | PCB194 | 0.08 | 0.09 | | 0.14 | −0.01 | (−0.32, 0.20) | | 0.36 | 0.28 | (−0.53, 1.00) |
| 2 | PCB126 | 0.09 | 0.09 | | 0.16 | 0.07 | (−0.03, 0.67) | 0.44 | 0.51 | 0.46 | (−0.18, 1.00) |
| | PCB169 | 0.10 | 0.18 | | 0.25 | 0.13 | (−0.07, 0.85) | | 0.79 | 0.89 | (0.00, 1.00) |
| 3 | PCB118 | 0.10 | 0.11 | | 0.30 | 0.24 | (−0.02, 0.91) | 0.97 | 0.25 | 0.15 | (−0.11, 0.94) |
| | Dioxin1 | 0.05 | 0.10 | | 0.13 | 0.01 | (−0.20, 0.30) | | 0.19 | −0.02 | (−0.48, 0.37) |
| | Dioxin2 | 0.11 | 0.09 | | 0.14 | −0.03 | (−0.37, 0.07) | | 0.23 | −0.04 | (−0.61, 0.49) |
| | Dioxin3 | 0.16 | 0.07 | | 0.11 | −0.02 | (−0.28, 0.02) | | 0.17 | 0.00 | (−0.36, 0.41) |
| | Furan1 | 0.18 | 0.88 | | 0.81 | 0.95 | (0.00, 1.00) | | 0.82 | 0.98 | (−0.12, 1.00) |
| | Furan2 | 0.02 | 0.10 | | 0.16 | 0.05 | (−0.13, 0.47) | | 0.21 | 0.06 | (−0.44, 0.76) |
| | Furan3 | 0.14 | 0.11 | | 0.15 | 0.05 | (−0.05, 0.46) | | 0.24 | 0.08 | (−0.38, 0.87) |
| | Furan4 | 0.14 | 0.21 | | 0.11 | 0.01 | (−0.11, 0.25) | | 0.22 | 0.11 | (−0.30, 0.94) |
BKMR does not estimate exposure weights. Instead, we quantify exposure importance via posterior inclusion probabilities (PIPs; Table 2), i.e. the posterior probability that the kernel feature weight $\rho_j \neq 0$. The PIP gives the probability that a particular component contributes to the exposure-response function but does not provide a measure of effect size. This is in contrast to the weights in QGC, which identify only the relative size of the association but not its uncertainty. BKMR also identified Furan1 as having the strongest signal, with a PIP of 0.88; all others had PIPs below 0.30. PCB180, which had the largest component weight in the QGC analysis, had a PIP of 0.13, indicating weak support for an association with the outcome.
The BSIM and BMIM-3 estimate PIPs as well as component weights, along with 95% credible intervals for the weights. The BSIM estimates of the weights for components in the first group (non-dioxin-like PCBs) were all close to zero (between −0.04 and 0.06; PIPs between 0.14 and 0.17). In contrast, two exposures in this group—PCB99 and PCB180—had among the strongest weights under QGC. Among the 10 other exposures, the signs of the weights matched those of QGC for all but Furan3. Overall, the results were much more similar to those of BKMR, in that Furan1 had by far the strongest weight, with a PIP of 0.81 and an estimated weight of 0.95 (95% CI [0.00,1.00]); all other estimated weights were far smaller in magnitude and had PIPs of no more than 0.30.
The BMIM-3 results were similar to those of the BSIM: Furan1 still had by far the strongest association (estimated weight 0.98; 95% CI: [−0.12, 1.00]). In index 1, the estimated weight for PCB74 (0.39; 95% CI [−0.49, 1.00]) was nearly twice that of PCB187 (0.21; 95% CI [−0.50, 0.99]), indicating the association was nearly twice as strong for PCB74 in the direction of the response curve for index 1. However, index 1 as a whole was very weakly associated with the outcome.
As Furan1 was identified as the strongest mixture component by each of the kernel methods, we plot the corresponding estimated exposure-response curves (with all other exposures set to their medians) under each of these approaches in Figure 1. The three plots are nearly identical, indicating that BKMR, the BSIM and the BMIM-3 lead to roughly the same conclusions for the association between Furan1 and the outcome.
Figure 1: Exposure-response curve for Furan1 in the NHANES analysis. Panel A shows the fitted curve using the bkmr package in R; panel B is based on the single index model; panel C is based on the 3-index model.
Indexwise Exposure-Response Curves.
An advantage of the index models is that they facilitate studying response curves at the index level, rather than reporting individual exposure-response curves or increasing every exposure simultaneously. Under the BSIM, we visualize the index-wise estimated response curve in Figure 2 (a), which appears slightly sub-linear. Under the BMIM-3, the three estimated index-wise curves are not very different from that of the BSIM (panel b), with each increasing sub-linearly. Specifically, the estimated response curve for index 1 is close to null and has the widest credible intervals; that of index 2 corresponds to a slightly stronger association, and that of index 3 is the strongest. Note in particular the similarity between the response curve for index 3 and the component-wise curve for Furan1. Matching this observation, the index-level PIPs were lower for the first two indices (0.60 and 0.44), whereas that of index 3 was 0.97.
Figure 2: Estimated index-wise response curves from the NHANES analysis. The top panel shows the exposure-response function for the BSIM. The bottom panel shows the exposure-response functions for each of the three indices in the BMIM-3. For the BMIM-3, each index's exposure-response function is plotted with the other indices fixed at their median values. The plot also shows 95% credible intervals.
Interactions.
A feature of the BMIM and BKMR is that we can consider interactions, although we consider a different set of interactions in each. Under BKMR, we compare fitted exposure-response curves for one exposure when a second exposure is set to its 10th, 50th or 90th percentile (and all others to their medians). We display these for all pairs in Figure 3 (a), although it is not straightforward to interpret the resulting plots. By contrast, the BMIM-3 framework characterizes interactions between whole indices, so rather than scanning 306 plots, we need only investigate the pairwise combinations of the three indices. Moreover, as indexwise plots more naturally characterize how exposures within an index vary jointly, they are less prone to the sparsity issues that arise in the 306 component-wise interaction plots. We plot these two-way indexwise interactions in panel (b), and there appears to be some indication of interaction between index 1 and the others, although the evidence is weak relative to the level of uncertainty.
Figure 3: Visualizing two-way interactions under BKMR (left) and the BMIM-3 (right) in the NHANES analysis. Each panel shows the exposure-response relation for one exposure (panel [a]) or one index (panel [b]) with all other exposures or indices fixed at a given percentile (0.1, 0.5, or 0.9). Parallel lines indicate no interaction between components or indices, while deviations from parallel indicate interactions.
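The count of 306 is consistent with one plot per ordered pair of distinct exposures among the 18 pollutants (18 × 17). As a small worked check, and assuming the same ordered-pair convention carries over to the three indices, the sketch below counts both sets of plots; the helper `n_ordered_pairs` is a hypothetical name introduced only for illustration.

```python
from itertools import permutations

def n_ordered_pairs(k):
    """Number of ordered pairs (a, b) with a != b among k items."""
    return sum(1 for _ in permutations(range(k), 2))

# One plot per (exposure, modifying exposure) pair among 18 exposures.
n_component_plots = n_ordered_pairs(18)
# One plot per (index, modifying index) pair among 3 indices (assumed convention).
n_index_plots = n_ordered_pairs(3)

print(n_component_plots, n_index_plots)  # 306 6
```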
Model Fit.
We conducted 5-fold cross validation to compare model fit via MSE. Of the four models, QGC had the highest MSE (0.790), and BKMR had the lowest (0.774). The MSEs for BSIM and BMIM-3 were effectively equal to that of BKMR (less than 0.3% higher). As the results of the kernel approaches identified one strong association, we also fit BKMR with hierarchical variable selection, but it fit no better than the standard BKMR (0.774). Practically speaking, the simpler index models were able to achieve virtually the same fit as the more complex BKMR, all while simplifying the burden of interpretation substantially.
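For readers less familiar with the procedure, the following is a minimal sketch of 5-fold cross-validated MSE. It is not the analysis code: an ordinary least squares fit stands in for the four models compared above, the data are simulated, and pooling squared errors over all held-out observations is an assumption about how fold-level errors are combined. All function names are hypothetical.

```python
import numpy as np

def kfold_mse(X, y, fit, predict, k=5, seed=0):
    """Cross-validated MSE: each observation is predicted by a model
    trained on the k-1 folds that do not contain it."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    sq_err = np.empty(len(y))
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        model = fit(X[train_idx], y[train_idx])
        sq_err[test_idx] = (y[test_idx] - predict(model, X[test_idx])) ** 2
    return sq_err.mean()

# Stand-in model (an assumption): least squares with an intercept.
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

# Simulated data with noise variance 0.25.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(scale=0.5, size=200)

cv_mse = kfold_mse(X, y, fit_ols, predict_ols)
print(round(cv_mse, 3))
```

Swapping `fit_ols` and `predict_ols` for wrappers around another model leaves the fold logic unchanged, which is what makes the MSE values comparable across methods.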
7. Discussion
The BMIM framework represents a spectrum of models, with linear index models and BKMR at either end. At present it is common in environmental mixture studies to adopt one of these two extremes. By bridging the gap between the two, the BMIM framework allows analysts to select a model with an appropriate balance of flexibility and interpretability. With a single index, the BMIM extends linear index models to allow for variable selection and non-linear relations. With multiple indices, and absent interactions, it can also accommodate additive index models. Even when a single multi-exposure index is of interest, the BMIM allows one to examine interactions between this index and one or more individual covariates, such as age or socioeconomic status (SES). Rather than including such covariates in the covariate vector z_i, the BMIM allows them to be included in the kernel alongside the index. This allows for interactions between a multi-exposure index and covariates, or even interaction with a multiple-modifier index. As another special case, one could incorporate a time-varying exposure in an index with smooth structure imposed on the weights (as in a BKMR-distributed lag model; Wilson et al., 2020) to allow interactions between time-varying exposures and other indices or exposures measured cross-sectionally.
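To illustrate the idea of placing a covariate in the kernel, the sketch below builds a Gaussian kernel on index values and optionally appends a modifier (e.g. age) as its own one-variable index. The kernel form exp{−Σ_m ρ_m (E_im − E_jm)²}, the function name `index_kernel`, and all parameter names and values are assumptions made for illustration; they are not taken from the paper's displayed equations.

```python
import numpy as np

def index_kernel(groups, thetas, rhos, modifier=None):
    """Gaussian kernel on index values (assumed form).

    groups:   list of (n, L_m) exposure arrays, one per index
    thetas:   list of (L_m,) index weight vectors
    rhos:     nonnegative scale per kernel input (one per index,
              plus one more if a modifier is supplied)
    modifier: optional (n,) covariate entering the kernel directly
    """
    # Each index is a weighted sum of the exposures in its group.
    E = np.column_stack([X @ th for X, th in zip(groups, thetas)])
    if modifier is not None:
        E = np.column_stack([E, modifier])
    rhos = np.asarray(rhos, dtype=float)
    diff = E[:, None, :] - E[None, :, :]          # shape (n, n, inputs)
    return np.exp(-((diff ** 2) * rhos).sum(axis=-1))

# Toy example: two exposure groups plus a hypothetical modifier.
rng = np.random.default_rng(0)
n = 5
groups = [rng.normal(size=(n, 3)), rng.normal(size=(n, 2))]
thetas = [np.array([0.6, 0.8, 0.0]), np.array([1.0, 0.0])]
age = rng.normal(size=n)

K = index_kernel(groups, thetas, rhos=[1.0, 0.5, 0.2], modifier=age)
K_no_mod = index_kernel(groups, thetas, rhos=[1.0, 0.5])
K_zero = index_kernel(groups, thetas, rhos=[1.0, 0.5, 0.0], modifier=age)
print(K.shape)
```

Because the modifier shares the kernel with the indices, the fitted surface can vary jointly in both; setting the modifier's scale to zero (as in `K_zero`) recovers the kernel without it.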
In the NHANES case study, some commonalities and differences emerged across methods. The BSIM, BMIM-3, and BKMR all indicated Furan1 had the strongest association with the outcome; QGC estimated it to have the largest positive weight. One key difference was that QGC reported an even larger negative weight for PCB180, whereas the Bayesian kernel methods did not weight it highly. This was because while the estimated coefficient for this exposure in the underlying linear model was the largest in magnitude, it was small relative to the uncertainty of the estimate. The Bayesian methods reflect this uncertainty and therefore the strength of evidence of an association, whereas the convention for these index models has been to not report these uncertainties. Our results suggest this is worth doing even if one opts for an existing frequentist index method.
As with BKMR, a limitation of the BMIM is the computational burden, as MCMC requires repeatedly inverting an n × n matrix, where n is the sample size. However, ongoing research on computational methods for BKMR could allow BKMR and BMIMs to scale to large samples. In any case, the sample size is more restrictive than the number of exposures, and we have successfully applied the method to over a hundred exposures grouped into dozens of indices.
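The source of this cost can be seen in a small sketch: each MCMC iteration must factorize a matrix whose dimension equals the sample size, which scales cubically in n regardless of the number of exposures. The matrix form I + τ²K below is an assumption (a form common in kernel machine regression), and the names `kernel_solve`, `tau2` and `resid` are hypothetical.

```python
import numpy as np

def kernel_solve(K, tau2, resid):
    """Solve (I + tau2 * K) x = resid via a Cholesky factorization.

    The factorization costs O(n^3) in the sample size n and would be
    repeated whenever K changes (e.g. at each MCMC update of the weights).
    """
    n = K.shape[0]
    V = np.eye(n) + tau2 * K
    L = np.linalg.cholesky(V)            # V = L @ L.T
    z = np.linalg.solve(L, resid)        # generic solve of L z = resid
    return np.linalg.solve(L.T, z)       # generic solve of L.T x = z

# Toy example with a Gaussian kernel on a single simulated index.
rng = np.random.default_rng(0)
n = 50
E = rng.normal(size=n)
K = np.exp(-(E[:, None] - E[None, :]) ** 2)
resid = rng.normal(size=n)
tau2 = 2.0

x = kernel_solve(K, tau2, resid)
print(x.shape)
```

Note that the matrix dimension depends only on n, which is consistent with sample size being the more restrictive factor.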
Existing index models are often employed in causal analyses. In this article we focus on the statistical—rather than causal—issues of mixtures analyses, but we note that the BMIM could be adapted for causal inference as well. This, however, would require detailed discussion of context-specific assumptions, and warrants further investigation.
While there is very often a natural way to partition exposures into indices, a limitation is that index groupings are fixed a priori and hence may be misspecified. In simulations, overly restrictive models (i.e., assuming a single index when there is a 3-index structure, or fitting a 3-index model to data generated under a full BKMR; see supporting information) led to worse accuracy. The same is true if one correctly specifies the number of indices but incorrectly assigns exposures to groups (see supporting information). We recommend letting biological knowledge drive index assignment where possible; a conservative approach would be to not group exposures in the absence of prior knowledge. One might also investigate model fit across candidate exposure groupings. A natural next step would be to develop flexible strategies for adaptively selecting index groupings to avoid misspecification of any fixed grouping. However, extensions to group exposures adaptively or average over different groupings would likely come with more difficult interpretations, decreased power, and potential identifiability issues. In effect, this expanded model formulation with unknown variable groupings would represent a new class of models that merits its own investigation.
Table 1: Simulation results across 100 datasets. The table shows mean squared error (MSE), average standard errors (Avg SE) and 95% interval coverage (Cvg) for the estimated h function in the test dataset, as well as MSE for y evaluated on a test dataset (MSE(y)) and via cross-validation (CV-MSE(y)). BKMR-H is BKMR with hierarchical variable selection.
| σ | M | Model | MSE(h) Mean | MSE(h) SD | Avg SE(h) | Cvg(h) | MSE(y) Mean | MSE(y) SD | CV-MSE(y) Mean | CV-MSE(y) SD |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 1 | QGC | 0.150 | 0.015 | 0.19 | 0.63 | 0.41 | 0.04 | 0.44 | 0.04 |
| | | BSIM | 0.031 | 0.010 | 0.17 | 0.94 | 0.28 | 0.03 | 0.30 | 0.03 |
| | | BMIM-3 | 0.051 | 0.011 | 0.21 | 0.91 | 0.30 | 0.03 | 0.34 | 0.03 |
| | | BKMR | 0.076 | 0.012 | 0.22 | 0.83 | 0.32 | 0.04 | 0.37 | 0.03 |
| | | BKMR-H | 0.085 | 0.014 | 0.14 | 0.60 | 0.33 | 0.04 | 0.38 | 0.03 |
| | 3 | QGC | 0.188 | 0.024 | 0.19 | 0.61 | 0.41 | 0.04 | 0.42 | 0.04 |
| | | BSIM | 0.122 | 0.027 | 0.17 | 0.54 | 0.34 | 0.04 | 0.35 | 0.04 |
| | | BMIM-3 | 0.035 | 0.012 | 0.19 | 0.95 | 0.28 | 0.03 | 0.29 | 0.03 |
| | | BKMR | 0.048 | 0.013 | 0.23 | 0.94 | 0.29 | 0.03 | 0.30 | 0.03 |
| | | BKMR-H | 0.085 | 0.021 | 0.14 | 0.62 | 0.33 | 0.04 | 0.35 | 0.03 |
| 1.0 | 1 | QGC | 0.226 | 0.048 | 0.33 | 0.81 | 1.21 | 0.13 | 1.27 | 0.11 |
| | | BSIM | 0.086 | 0.038 | 0.30 | 0.94 | 1.07 | 0.12 | 1.13 | 0.10 |
| | | BMIM-3 | 0.113 | 0.033 | 0.31 | 0.90 | 1.10 | 0.12 | 1.17 | 0.10 |
| | | BKMR | 0.125 | 0.034 | 0.30 | 0.87 | 1.11 | 0.12 | 1.19 | 0.10 |
| | | BKMR-H | 0.142 | 0.035 | 0.22 | 0.65 | 1.13 | 0.13 | 1.17 | 0.09 |
| | 3 | QGC | 0.263 | 0.059 | 0.33 | 0.77 | 1.21 | 0.12 | 1.27 | 0.12 |
| | | BSIM | 0.157 | 0.042 | 0.30 | 0.81 | 1.13 | 0.11 | 1.15 | 0.10 |
| | | BMIM-3 | 0.090 | 0.035 | 0.33 | 0.95 | 1.07 | 0.11 | 1.11 | 0.11 |
| | | BKMR | 0.094 | 0.035 | 0.34 | 0.95 | 1.08 | 0.11 | 1.11 | 0.10 |
| | | BKMR-H | 0.141 | 0.046 | 0.23 | 0.71 | 1.12 | 0.12 | 1.15 | 0.10 |
| 2.0 | 1 | QGC | 0.528 | 0.177 | 0.62 | 0.91 | 4.43 | 0.47 | 4.63 | 0.42 |
| | | BSIM | 0.227 | 0.113 | 0.48 | 0.92 | 4.17 | 0.45 | 4.28 | 0.36 |
| | | BMIM-3 | 0.229 | 0.103 | 0.48 | 0.92 | 4.18 | 0.44 | 4.33 | 0.37 |
| | | BKMR | 0.232 | 0.104 | 0.48 | 0.92 | 4.18 | 0.45 | 4.34 | 0.34 |
| | | BKMR-H | 0.291 | 0.120 | 0.37 | 0.75 | 4.24 | 0.46 | 4.29 | 0.35 |
| | 3 | QGC | 0.565 | 0.186 | 0.62 | 0.89 | 4.43 | 0.45 | 4.63 | 0.44 |
| | | BSIM | 0.228 | 0.096 | 0.50 | 0.94 | 4.18 | 0.42 | 4.28 | 0.39 |
| | | BMIM-3 | 0.200 | 0.097 | 0.52 | 0.96 | 4.15 | 0.42 | 4.29 | 0.40 |
| | | BKMR | 0.194 | 0.101 | 0.52 | 0.96 | 4.15 | 0.42 | 4.29 | 0.40 |
| | | BKMR-H* | 0.277 | 0.118 | 0.38 | 0.80 | 4.22 | 0.44 | 4.28 | 0.37 |

*Based on 99 datasets, due to computational instability.
Acknowledgements
This research was supported by NIH grants ES028800, ES028811, ES028688, and ES000002, and by USEPA grants RD-835872–01 and RD-839278. Its contents are solely the responsibility of the grantee and do not represent the official views of the USEPA. Further, USEPA does not endorse the purchase of any commercial products or services mentioned in the publication.
Supporting Information
Web Appendices, Tables, and Figures referenced in Sections 4, 5 and 7 are available with this paper at the Biometrics website on Wiley Online Library. R code for analysis and simulations is available with this paper at the Biometrics website on Wiley Online Library and is also available at github.com/glenmcgee/BMIM.
Data Availability Statement
The data that support the findings in this paper are openly available on GitHub at https://github.com/lizzyagibson/SHARP.Mixtures.Workshop, published alongside Gibson et al. (2019).
References
- Alquier P and Biau G (2013). Sparse single-index model. Journal of Machine Learning Research 14, 243–280.
- Antoniadis A, Grégoire G, and McKeague IW (2004). Bayesian estimation in single-index models. Statistica Sinica pages 1147–1164.
- Bobb JF, Henn BC, Valeri L, and Coull BA (2018). Statistical software for analyzing the health effects of multiple concurrent exposures via Bayesian kernel machine regression. Environmental Health 17, 67.
- Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, and Coull BA (2015). Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics 16, 493–508.
- Carlin DJ, Rider CV, Woychik R, and Birnbaum LS (2013). Unraveling the health effects of environmental mixtures: an NIEHS priority.
- Carrico C, Gennings C, Wheeler DC, and Factor-Litvak P (2015). Characterization of weighted quantile sum regression for highly correlated data in a risk analysis setting. Journal of Agricultural, Biological, and Environmental Statistics 20, 100–120.
- Choi T, Shi JQ, and Wang B (2011). A Gaussian process regression approach to a single-index model. Journal of Nonparametric Statistics 23, 21–36.
- Colicino E, Pedretti NF, Busgang S, and Gennings C (2019). Per- and poly-fluoroalkyl substances and bone mineral density: results from the Bayesian weighted quantile sum regression. medRxiv page 19010710.
- Cristianini N, Shawe-Taylor J, et al. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press.
- Davalos AD, Luben TJ, Herring AH, and Sacks JD (2017). Current approaches used in epidemiologic studies to examine short-term multipollutant air pollution exposures. Annals of Epidemiology 27, 145–153.
- Dunson DB, Herring AH, and Engel SM (2008). Bayesian selection and clustering of polymorphisms in functionally related genes. Journal of the American Statistical Association 103, 534–546.
- Dunson DB, Herring AH, and Siega-Riz AM (2008). Bayesian inference on changes in response densities over predictor clusters. Journal of the American Statistical Association 103, 1508–1517.
- Dunson DB, Pillai N, and Park J-H (2007). Bayesian density regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69, 163–183.
- Ferrari F and Dunson DB (2019). Identifying main effects and interactions among exposures using Gaussian processes. arXiv preprint arXiv:1911.01910.
- Gennings C, Sabo R, and Carney E (2010). Identifying subsets of complex mixtures most associated with complex diseases: polychlorinated biphenyls and endometriosis as a case study. Epidemiology pages S77–S84.
- George EI and McCulloch RE (1997). Approaches for Bayesian variable selection. Statistica Sinica pages 339–373.
- Gibson EA, Goldsmith J, and Kioumourtzoglou M-A (2019). Complex mixtures, complex analyses: an emphasis on interpretable results. Current Environmental Health Reports 6, 53–61.
- Gibson EA, Nunez Y, Abuawad A, Zota AR, Renzetti S, Devick KL, Gennings C, Goldsmith J, Coull BA, and Kioumourtzoglou M-A (2019). An overview of methods to address distinct research questions on environmental mixtures: an application to persistent organic pollutants and leukocyte telomere length. Environmental Health 18, 76.
- Gramacy RB and Lian H (2012). Gaussian process single-index models as emulators for computer experiments. Technometrics 54, 30–41.
- Hamra GB and Buckley JP (2018). Environmental exposure mixtures: questions and methods to address them. Current Epidemiology Reports 5, 160–165.
- Hardle W, Hall P, and Ichimura H (1993). Optimal smoothing in single-index models. The Annals of Statistics pages 157–178.
- Herring AH (2010). Nonparametric Bayes shrinkage for assessing exposures to mixtures subject to limits of detection. Epidemiology (Cambridge, Mass.) 21, S71.
- Horowitz JL (2012). Semiparametric methods in econometrics, volume 131. Springer Science & Business Media.
- Hristache M, Juditsky A, and Spokoiny V (2001). Direct estimation of the index coefficient in a single-index model. Annals of Statistics pages 595–623.
- Ichimura H and Lee L-F (1991). Semiparametric least squares estimation of multiple index models: single equation estimation. Nonparametric and semiparametric methods in econometrics and statistics pages 3–49.
- Keil AP, Buckley JP, O’Brien KM, Ferguson KK, Zhao S, and White AJ (2020). A quantile-based g-computation approach to addressing the effects of exposure mixtures. Environmental Health Perspectives 128, 047004.
- Lin W and Kulasekera K (2007). Identifiability of single-index models and additive-index models. Biometrika 94, 496–501.
- Liu D, Lin X, and Ghosh D (2007). Semiparametric regression of multidimensional genetic pathway data: Least-squares kernel machines and linear mixed models. Biometrics 63, 1079–1088.
- Mitro SD, Birnbaum LS, Needham BL, and Zota AR (2016). Cross-sectional associations between exposure to persistent organic pollutants and leukocyte telomere length among US adults in NHANES, 2001–2002. Environmental Health Perspectives 124, 651–658.
- Molitor J, Brown IJ, Chan Q, Papathomas M, Liverani S, Molitor N, Richardson S, Van Horn L, Daviglus ML, Dyer A, et al. (2014). Blood pressure differences associated with optimal macronutrient intake trial for heart health (OmniHeart)–like diet compared with a typical American diet. Hypertension 64, 1198–1204.
- Molitor J, Papathomas M, Jerrett M, and Richardson S (2010). Bayesian profile regression with an application to the National Survey of Children’s Health. Biostatistics 11, 484–498.
- Molitor J, Su JG, Molitor N-T, Rubio VG, Richardson S, Hastie D, Morello-Frosch R, and Jerrett M (2011). Identifying vulnerable populations through an examination of the association between multipollutant profiles and poverty. Environmental Science & Technology 45, 7754–7760.
- Park CG, Vannucci M, and Hart JD (2005). Bayesian methods for wavelet series in single-index models. Journal of Computational and Graphical Statistics 14, 770–794.
- Poon W-Y and Wang H-B (2013). Bayesian analysis of generalized partially linear single-index models. Computational Statistics & Data Analysis 68, 251–261.
- Powell JL, Stock JH, and Stoker TM (1989). Semiparametric estimation of index coefficients. Econometrica: Journal of the Econometric Society pages 1403–1430.
- Reich BJ, Bondell HD, and Li L (2011). Sufficient dimension reduction via Bayesian mixture modeling. Biometrics 67, 886–895.
- Renzetti S, Gennings C, and Curtin PC (2019). gwqs: An R package for linear and generalized weighted quantile sum (WQS) regression. Journal of Statistical Software.
- Samarov AM (1993). Exploring regression structure using nonparametric functional estimation. Journal of the American Statistical Association 88, 836–847.
- Stoker TM (1986). Consistent estimation of scaled coefficients. Econometrica: Journal of the Econometric Society pages 1461–1481.
- Tanner E, Lee A, and Colicino E (2020). Environmental mixtures and children’s health: identifying appropriate statistical approaches. Current Opinion in Pediatrics 32, 315–320.
- Taylor KW, Joubert BR, Braun JM, Dilworth C, Gennings C, Hauser R, Heindel JJ, Rider CV, Webster TF, and Carlin DJ (2016). Statistical approaches for assessing health effects of environmental chemical mixtures in epidemiology: lessons from an innovative workshop. Environmental Health Perspectives 124, A227–A229.
- Wang J-L, Xue L, Zhu L, Chong YS, et al. (2010). Estimation for a partial-linear single-index model. The Annals of Statistics 38, 246–274.
- Wang L and Yang L (2009). Spline estimation of single-index models. Statistica Sinica pages 765–783.
- Wang Y, Wu Y, Jacobson M, Lee M, Jin P, Trasande L, and Liu M (2020). A family of partial-linear single-index models for analyzing complex environmental exposures with continuous, categorical, time-to-event, and longitudinal health outcomes. Environmental Health 19.
- Williams CK and Rasmussen CE (2006). Gaussian processes for machine learning, volume 2. MIT Press, Cambridge, MA.
- Wilson A, Hsu H-HL, Mathilda Chiu Y-H, Wright RO, Wright RJ, and Coull BA (2020). Kernel machine and distributed lag models for assessing windows of susceptibility to mixtures of time-varying environmental exposures in children’s health studies. arXiv preprint arXiv:1904.12417.
- Yu Y and Ruppert D (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association 97, 1042–1054.
