Abstract
Background:
Environmental health researchers often aim to identify sources or behaviors that give rise to potentially harmful environmental exposures.
Objective:
We adapted principal component pursuit (PCP)—a robust and well-established technique for dimensionality reduction in computer vision and signal processing—to identify patterns in environmental mixtures. PCP decomposes the exposure mixture into a low-rank matrix containing consistent patterns of exposure across pollutants and a sparse matrix isolating unique or extreme exposure events.
Methods:
We adapted PCP to accommodate nonnegative data, missing data, and values below a given limit of detection (LOD), yielding principal component pursuit with limit of detection (PCP-LOD). We simulated data to represent environmental mixtures of two sizes with increasing proportions of values below the LOD and three noise structures, and we applied PCP-LOD to evaluate its performance in comparison with principal component analysis (PCA). We next applied PCP-LOD to an exposure mixture of 21 persistent organic pollutants (POPs) measured in 1,000 U.S. adults from the 2001–2002 National Health and Nutrition Examination Survey (NHANES). We applied singular value decomposition to the estimated low-rank matrix to characterize the patterns.
Results:
PCP-LOD recovered the true number of patterns through cross-validation for all simulations; based on an a priori specified criterion, PCA recovered the true number of patterns in 32% of simulations. PCP-LOD achieved lower relative predictive error than PCA for all simulated data sets with up to 50% of the data below the LOD. When 75% of values were below the LOD, PCP-LOD outperformed PCA only when noise was low. In the POP mixture, PCP-LOD identified a rank-three underlying structure and separated 6% of values as extreme events. One pattern represented comprehensive exposure to all POPs. The other patterns grouped chemicals based on known structure and toxicity.
Discussion:
PCP-LOD serves as a useful tool to express multidimensional exposures as consistent patterns that, if found to be related to adverse health, are amenable to targeted public health messaging. https://doi.org/10.1289/EHP10479
Introduction
To assess exposure to multiple chemicals simultaneously, researchers must consider the high dimensionality of environmental exposures and the complex correlation structure among chemicals. Environmental researchers often aim to represent patterns of exposure using dimension reduction techniques.1 Grouping chemicals may allow for interpretation of underlying sources or behaviors that give rise to these highly correlated multipollutant exposures. Identification of exposure routes is a consequential research question in environmental health because common sources may prove more modifiable or preventable than single chemical exposures and, thus, more amenable to regulatory action or targeted interventions.
When analyzing high-dimensional data, a major challenge is how to recover low-dimensional patterns from noisy, incomplete, or erroneous measurements.2 In environmental health, observations below the analytic limit of detection (LOD) provide an example of incomplete data. Depending on the laboratory, these observations may be flagged as <LOD and not reported, or they may be reported as measured values with less certainty than those above the LOD.3,4 Identifying exposure patterns in data sets with large proportions of <LOD observations is challenging.5
Traditional methods to handle <LOD observations include single and multiple imputation, the most common implementation being single imputation with LOD/√2.6 This method was proposed in 1990 as providing more accurate estimation of the mean and standard deviation than imputation with LOD/2 and improved computational efficiency over a maximum likelihood method.7 However, predictive accuracy is not often the goal in environmental epidemiology, and computational speed is no longer a barrier to new methods. Furthermore, substituting <LOD values with a fixed value (e.g., LOD/√2), especially when some information about them is available, alters the distribution of the data and can severely affect exposure pattern identification in the study population.8
Here, we introduce a novel technique to identify patterns in environmental mixtures, adapting a robust and well-established method for data dimensionality reduction and pattern recognition in computer vision applications, principal component pursuit (PCP). PCP decomposes the exposure data matrix into a low-rank matrix (to identify underlying patterns of exposure across the pollutants) and a sparse matrix (to identify unusual, unique, or extreme exposure events).9 PCP has several advantages over traditional methods for pattern recognition in environmental mixtures. In a recent PCP extension, square root PCP (√PCP), Zhang et al. derived a new formulation with a universal choice of regularization parameter.10 Thus, the user is not required to choose or tune hyperparameters (i.e., parameters that control the strength of the penalty term or the amount of shrinkage). We combined this with a separate extension introducing a nonconvex penalty on the low-rank matrix that performs well with data that may not have a strong underlying structure.11,12 Estimation of the sparse matrix is especially advantageous. Traditional methods are sensitive to unusual or extreme exposure events13; the patterns identified by PCP are not influenced by outlying values. Instead, exposures that are not explained by patterns in the low-rank matrix are separated into the sparse matrix and remain available to the researcher for further analysis.
To our knowledge, this is the first time that PCP has been considered for pattern identification in environmental health or epidemiology. Additionally, we have included three novel extensions designed uniquely for chemical mixtures: a) a distinct penalty for <LOD observations (PCP-LOD) that has improved distributional assumptions over single imputation and adapts to study-specific confidence in measurement, b) a nonnegativity constraint on the low-rank matrix to improve interpretability of results, and c) procedures to accommodate missing values. We also implemented a cross-validation approach so that the choice of the number of estimated components is replicable and free from researchers’ implicit assumptions. In this work, we conducted a simulation study based on real multipollutant exposures, simulating increasing proportions of observations measured below the LOD and varying levels and structures of added noise. We used these simulations to compare PCP-LOD performance with that of PCA with <LOD values imputed as LOD/√2. Finally, we applied PCP-LOD to an environmental health data set of persistent organic pollutants (POPs) measured in the 2001–2002 cycle of the National Health and Nutrition Examination Survey (NHANES) to identify consistent patterns of POP exposure while isolating unique or extreme events.
Methods
PCP
We present PCP as a robust method for dimensionality reduction and pattern identification.9 Given an exposure data matrix $D \in \mathbb{R}^{n \times p}$, with individual entries $d_{ij}$, where $n$ is the number of observations and $p$ is the number of pollutants, PCP seeks to express $D$ as the sum of two matrices: a low-rank matrix $L \in \mathbb{R}^{n \times p}$, with individual entries $l_{ij}$, where $\operatorname{rank}(L) = r \ll \min(n, p)$, and a sparse matrix $S \in \mathbb{R}^{n \times p}$, with individual entries $s_{ij}$, where most entries are zero. Because $L$ is of rank $r$, its rank can correspond to underlying patterns in exposure, such as specific sources or certain behaviors. $L$ is still defined in terms of the original variables, i.e., patterns are not directly estimated. $S$ captures unusual or uniquely high or low events that cannot be explained by the patterns identified in $L$; it does not capture events relevant to individual chemicals. PCP may be paired with various matrix factorization techniques [e.g., principal component analysis (PCA), factor analysis, or nonnegative matrix factorization (NMF)] to extract chemical loadings and individual scores. In traditional PCP, the rank of $L$ and the number or location of nonzero entries in $S$ do not need to be defined a priori.
We incorporated two existing PCP extensions that suit features of environmental mixtures data. First, Zhang et al. recently proposed √PCP with a noise-independent universal choice of regularization parameters.10 Previous formulations of PCP required knowledge of the true noise level to determine the appropriate parameters.12,14 This requirement is problematic in environmental mixtures, where we cannot know or accurately estimate the underlying noise level, and it would leave the researcher with the subjective task of tuning parameters on a per–data set basis. √PCP therefore provides a more practical approach to pattern recognition in environmental mixtures.
As first proposed, PCP minimizes a weighted combination of the nuclear norm of $L$, $\lVert L \rVert_*$, and the entry-wise $\ell_1$ norm of $S$, $\lVert S \rVert_1$.9,14,15 A norm measures the size of a vector or matrix and induces a distance; for example, Euclidean distance is induced by the Euclidean norm. These norms map the matrix on which they are applied (or the difference between matrices, as in $\lVert L + S - D \rVert_F$ below) to a nonnegative real number. The nuclear norm of $L$ is the sum of its singular values and is often used to search for low-rank matrices. The entry-wise $\ell_1$ norm of $S$ is the sum of the absolute values of its entries; this encourages $S$ to be sparse. The Frobenius norm in Equation 1 is the square root of the sum of the squared entries of the matrix.16 Notably, this formulation has the desirable quality of convexity, meaning that every local optimum is also a global optimum. This, together with the particular structure of the $\ell_1$ and nuclear norms, guarantees that the resulting optimization problem can be solved efficiently.17–19 In practice, however, the nuclear norm assumes a stronger low-rank structure (i.e., rapidly decaying singular values) than is the case in many real-life environmental mixtures (e.g., POPs or air pollution). To address unsatisfactory performance with the nuclear norm, we replaced it with a rank-$r$ projection (Supplemental Materials, “Section S1: Indicator for the matrices”).19 Although the nuclear norm is convex, the rank-$r$ projection is not, meaning that the algorithm could find a local optimum that is not the global one. However, closely related nonconvex formulations are accompanied by theoretical guarantees of performance equivalent to that of the convex implementation.11,12,20–22 These provide strong motivation for our framework but do not prove its success; our method contains a number of additional proposals that are helpful for processing health data but fall outside the bounds of the strongest existing theoretical results.23,24 Combining a nonconvex rank projection with √PCP,10 we solve the following optimization problem:
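To make the three norms concrete, here is a minimal NumPy sketch (Python stands in for the authors’ R code; the function names are hypothetical). It evaluates each norm on a toy rank-1 matrix plus a single sparse spike, the kind of structure PCP is designed to separate.

```python
import numpy as np


def nuclear_norm(L: np.ndarray) -> float:
    """Sum of the singular values of L; convex PCP uses it to favor low rank."""
    return float(np.linalg.svd(L, compute_uv=False).sum())


def l1_norm(S: np.ndarray) -> float:
    """Entry-wise l1 norm: sum of absolute values; favors a sparse S."""
    return float(np.abs(S).sum())


def frobenius_norm(X: np.ndarray) -> float:
    """Square root of the sum of squared entries."""
    return float(np.sqrt(np.sum(X ** 2)))


# Toy example: a rank-1 "pattern" matrix plus one sparse spike.
L = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])  # rank 1
S = np.zeros((3, 2))
S[0, 1] = 5.0                               # single nonzero entry
print(nuclear_norm(L), l1_norm(S), frobenius_norm(L + S))
```

For a rank-1 matrix built as an outer product, the nuclear norm equals the product of the two vectors’ Euclidean lengths, here $\sqrt{14}\cdot\sqrt{2} = \sqrt{28}$.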
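$$\min_{L,\,S}\; \mathcal{I}_{\operatorname{rank}(L) \le r} + \lambda \lVert S \rVert_1 + \mu \lVert L + S - D \rVert_F \qquad (1)$$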
where $D$ denotes the original data matrix. The two parameters, $\lambda$ and $\mu$, are not tuned by the researcher; instead, each is set to a single universal value, $\lambda = 1/\sqrt{\max(n, p)}$ from Candès et al. and $\mu = \sqrt{\min(n, p)/2}$ from Zhang et al., which have been shown theoretically to yield near-optimal estimation performance.9,10 The first term, $\mathcal{I}_{\operatorname{rank}(L) \le r}$, is the rank-$r$ projection: an indicator function that is zero when $L$ has rank at most $r$ and infinite otherwise, thereby constraining $L$ to be of rank at most $r$. The value of $r$ must be specified; to address this, we implemented a cross-validation approach to choose $r$, described in the “Implementation and Evaluation” section. The final term is the error between the predicted ($L + S$) and the observed ($D$) values, which favors a solution that is close to the original data.
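A minimal NumPy sketch of Equation 1, assuming the universal values of $\lambda$ and $\mu$ stated above. `project_rank` computes the closest rank-$r$ matrix by truncated SVD (the Eckart-Young result), which is how a rank-$r$ projection step is typically carried out inside iterative solvers; the function names are hypothetical, and this is not the authors’ full ADMM solver (Supplemental Materials, Section S2).

```python
import numpy as np


def default_params(n: int, p: int) -> tuple:
    """Universal values: lambda = 1/sqrt(max(n, p)), mu = sqrt(min(n, p) / 2)."""
    lam = 1.0 / np.sqrt(max(n, p))
    mu = np.sqrt(min(n, p) / 2.0)
    return lam, mu


def project_rank(X: np.ndarray, r: int) -> np.ndarray:
    """Closest matrix of rank <= r to X in Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def objective(D, L, S, r, lam, mu):
    """Equation 1: rank indicator (0 or inf) + lam * ||S||_1 + mu * ||L + S - D||_F."""
    indicator = 0.0 if np.linalg.matrix_rank(L) <= r else np.inf
    return indicator + lam * np.abs(S).sum() + mu * np.linalg.norm(L + S - D, "fro")


lam, mu = default_params(500, 24)  # e.g., 500 observations, 24 chemicals
```

Because the indicator is infinite for any $L$ of rank above $r$, a solver only ever evaluates the objective on rank-constrained candidates, which is why the projection step replaces the nuclear norm.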
Environmental health-relevant extensions.
To better adapt nonconvex √PCP for use with environmental data, we extended this method in three ways. First, we modified the algorithm to allow for missing values. This is beneficial for environmental data sets, which often include participants with missing exposure measurements, and it also enables the cross-validation procedure outlined in the “Methods” section titled “Implementation and Evaluation.” Missing values are handled by applying the LOD penalty only to entries that are observed or <LOD, consistent with the literature on robust matrix completion.25,26 Second, we constrained the low-rank matrix $L$ to be nonnegative. Nonnegativity in $L$ keeps individual pattern scores and chemical loadings on the same support as the original chemical distributions. The nonnegativity constraint is enforced by using a splitting approach in which we introduce an auxiliary nonnegative matrix variable and add an additional equality constraint (Supplemental Materials, “Section S2: Optimization via Alternating Directions Method of Multipliers”).24 We tailored the third extension to observations <LOD. We introduced a diverging penalty in the solution to accommodate <LOD values when their measured values are not available to the user (Equation 2), as is most commonly the case. This penalty treats all estimated values from zero to the LOD as equally good approximations (Equation 3, line 2), removing the error term for those entries from the objective function:
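The two ingredients of the first two extensions can be sketched in NumPy, assuming missing entries are coded as NaN: a Frobenius error computed only over observed entries, and the projection onto the nonnegative orthant that an ADMM splitting would apply to the auxiliary variable. The names are hypothetical, and this is not the full algorithm in Supplemental Section S2.

```python
import numpy as np


def masked_frobenius(X: np.ndarray, D: np.ndarray) -> float:
    """Frobenius norm of X - D over observed entries only (missing entries coded as NaN)."""
    observed = ~np.isnan(D)
    diff = np.where(observed, X - D, 0.0)  # missing entries contribute nothing
    return float(np.linalg.norm(diff, "fro"))


def project_nonnegative(X: np.ndarray) -> np.ndarray:
    """Euclidean projection onto the nonnegative orthant (update for a nonnegative auxiliary variable)."""
    return np.maximum(X, 0.0)


D = np.array([[1.0, np.nan], [2.0, 3.0]])
X = np.array([[2.0, 100.0], [2.0, 3.0]])
print(masked_frobenius(X, D))                     # the NaN cell is ignored
print(project_nonnegative(np.array([-1.0, 2.0])))
```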
| (2) |
with
| (3) |
Here, the LOD values are supplied as a matrix whose $(i, j)$th entry is the observation-specific LOD. This is an attribute of the data specified by the researcher; it can be common across all chemicals, chemical-specific, or chemical- and individual-specific, depending on the measurements. If all observations are ≥LOD, this equation simplifies to Equation 1. For observations <LOD with estimated values <0 (Equation 3, line 3) or >LOD (Equation 3, line 4), we included more stringent penalties than in Equation 1, which act to push estimates into the known range. Negative estimates were penalized the most because measured concentrations cannot be negative. For observations <LOD, estimated values >LOD were penalized less than negative values but more than estimates for observations ≥LOD, because we had prior information that the values of these observations should be <LOD. For observations <LOD with estimated values between zero and the LOD (Equation 3, line 2), we placed no penalty, so that all values between zero and the LOD were equally optimal.
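The case structure just described can be illustrated with a short NumPy sketch. It does not reproduce the exact penalties of Equation 3: the weights `W_NEG` and `W_ABOVE` are hypothetical, chosen only so that negative estimates are penalized more than estimates above the LOD, which in turn are penalized more than ordinary residuals. Entries that are missing (NaN and not flagged <LOD) contribute nothing.

```python
import numpy as np

# Hypothetical weights: only their ordering (W_NEG > W_ABOVE > 1) follows the text.
W_NEG, W_ABOVE = 3.0, 2.0


def lod_residual(X, D, LOD, below):
    """Entry-wise residuals under an LOD-aware loss (illustrative sketch only).

    X: estimates (L + S); D: data, with NaN where missing or <LOD;
    LOD: matrix of observation-specific LODs; below: boolean mask of <LOD entries.
    """
    R = np.where(below | np.isnan(D), 0.0, X - D)            # observed >= LOD: ordinary residual
    R = np.where(below & (X < 0), W_NEG * X, R)               # <LOD but negative estimate
    R = np.where(below & (X > LOD), W_ABOVE * (X - LOD), R)   # <LOD but estimate above the LOD
    return R  # <LOD with 0 <= X <= LOD stays 0: every value in [0, LOD] is equally good


X = np.array([[-1.0, 0.5, 2.0, 1.5]])
D = np.array([[np.nan, np.nan, np.nan, 1.0]])
below = np.array([[True, True, True, False]])
R = lod_residual(X, D, np.ones((1, 4)), below)
```

The Frobenius norm of `R` would then take the place of $\lVert L + S - D \rVert_F$; when no entry is <LOD, `R` is the ordinary residual and the objective reduces to Equation 1.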
The proposed LOD penalty introduced no additional difficulty for optimization. To enforce additional structure, such as nonnegativity, on the low-rank term, we employed an Alternating Directions Method of Multipliers (ADMM) splitting technique (see Supplemental Materials, Section S2, for a detailed algorithmic description).24
Simulations
We simulated 100 exposure matrices for each combination of two mixture sizes, three noise structures, and three detection proportions (1,800 total). Each data set contained 500 observations, $D \in \mathbb{R}^{500 \times p}$, where each row represents an exposure profile with $p$ mixture components. We specified $r = 4$ underlying patterns and investigated two mixture sizes ($p = 24$ and $p = 48$) to represent medium and large environmental mixtures. We first simulated chemical loadings to represent realistic environmental patterns in which some chemicals were distinct to a single pattern and others appeared in multiple patterns. Each pattern included chemicals that loaded distinctly and chemicals that overlapped with a second pattern. Distinct chemicals were given a loading of 1 on the single pattern on which they loaded and a loading of 0 on the remaining patterns. One-third of the chemicals appeared in only one pattern; two-thirds appeared in two patterns. This design corresponds to multiple environmental sources giving rise to the chemicals in the mixture. Loadings for overlapping chemicals were drawn from a Dirichlet distribution so that they would sum to 1 over all patterns: of the four loadings across the four patterns for each overlapping chemical, two were drawn from the Dirichlet distribution and two were set to zero. This introduced variability into the overlapping chemical loadings (Figure 1A).
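The loading design can be sketched in NumPy as follows. The choice of which two patterns an overlapping chemical shares (uniformly at random) and the Dirichlet parameters (1, 1) are assumptions for illustration, not the article’s settings; `simulate_loadings` is a hypothetical name.

```python
import numpy as np


def simulate_loadings(p: int, r: int = 4, rng=None) -> np.ndarray:
    """Return an r x p loading matrix: one-third distinct chemicals, two-thirds shared by two patterns."""
    rng = np.random.default_rng(rng)
    n_distinct = p // 3
    V = np.zeros((r, p))
    # Distinct chemicals: loading 1 on exactly one pattern, spread evenly across patterns.
    for j in range(n_distinct):
        V[j % r, j] = 1.0
    # Overlapping chemicals: two patterns chosen at random (assumed), Dirichlet(1, 1) weights.
    for j in range(n_distinct, p):
        pair = rng.choice(r, size=2, replace=False)
        V[pair, j] = rng.dirichlet([1.0, 1.0])
    return V


V = simulate_loadings(24, rng=0)  # with p = 24, two distinct chemicals per pattern
```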
Figure 1.
(A) Representative simulated chemical loadings. We simulated 100 examples for mixture size $p = 24$ (depicted here) and $p = 48$. Here, two chemicals load solely on each of the four patterns. The remaining chemicals appear in two patterns each. (B) Correlation matrix of one simulated data set with high noise.
We next generated individual scores. We drew scores independently from a right-skewed distribution because chemical concentration distributions are generally concentrated at low values with long right tails. We created the simulated data from the matrix product of individual scores and chemical loadings with added noise, replacing negative values with zero. We generated noise in one of three ways: a) low Gaussian noise, b) high Gaussian noise, or c) low Gaussian noise with high sparse events, which most closely aligns with the assumptions of PCP-LOD. We included low- and high-noise scenarios to describe performance under ideal (i.e., low-noise) and extreme (i.e., high-noise) circumstances, and low noise with sparse events to evaluate the method’s ability to identify rare events. Figure 1B shows an example simulated correlation matrix. Finally, to provide representative examples of method performance on data with low to high proportions of values <LOD, we designated a quantile (25th, 50th, or 75th) and set all values below that threshold as <LOD.
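A NumPy sketch of the remaining simulation steps, taking a loading matrix `V` as input. Every distributional choice here is a placeholder rather than the article’s setting: Exp(1) scores, the noise standard deviation, positive exponential sparse events, and a single <LOD threshold shared by all chemicals are all assumptions.

```python
import numpy as np


def simulate_data(V, n=500, noise_sd=0.1, sparse_frac=0.0, sparse_scale=5.0,
                  q=0.25, rng=None):
    """Simulate D = scores @ loadings + noise, clip at zero, and flag values below quantile q as <LOD."""
    rng = np.random.default_rng(rng)
    r, p = V.shape
    U = rng.exponential(1.0, size=(n, r))            # right-skewed scores (assumed Exp(1))
    truth = U @ V                                     # noise-free "truth"
    noise = rng.normal(0.0, noise_sd, size=(n, p))    # Gaussian noise (low or high via noise_sd)
    if sparse_frac > 0:                               # optional rare, large positive events
        events = rng.random((n, p)) < sparse_frac
        noise += events * rng.exponential(sparse_scale, size=(n, p))
    D = np.maximum(truth + noise, 0.0)                # replace negative values with zero
    lod = np.quantile(D, q)                           # one threshold for all chemicals (assumed)
    below = D < lod                                   # entries to be treated as <LOD
    return truth, D, below, lod


truth, D, below, lod = simulate_data(np.full((4, 24), 0.25), rng=0)
```

Keeping `truth` separate from `D` is what allows the relative predictive error against the noise-free data to be computed later.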
Study Population
For pattern recognition in an environmental mixture with varying detection limits across chemicals, we chose a mixture of dioxins, furans, and polychlorinated biphenyls (PCBs) measured in U.S. adults from the 2001–2002 NHANES cycle. POP concentrations from this NHANES cycle are well characterized and have been used in prior environmental mixture analyses.27,28 NHANES inclusion criteria have been reported previously.29 For the chosen cycle, 11,039 participants were interviewed. One-third of participants age 12 y and older were eligible for environmental chemical analysis. We removed individuals below 18 y of age or without any POP measurements, resulting in a final study sample of 1,000. Eighteen PCBs, seven dioxins, and nine furans were measured. Exposure assessment of POPs measured in blood serum in NHANES has been described previously.30,31 All POP values were lipid-adjusted by the U.S. Centers for Disease Control and Prevention (U.S. CDC).32 Of the POPs measured, 21 detected in at least 50% of all samples were included in our main analyses. Written informed consent was obtained from all participants, and NHANES data collection was approved by the National Center for Health Statistics (NCHS) Ethics Review Board.
Implementation and Evaluation
We determined the appropriate rank for PCP-LOD and the number of components to retain from PCA in the same manner for all experiments and the application. For PCA, we defined our component retention criterion a priori as the first components that together explained a prespecified proportion of the variance in the data, as seen previously in environmental mixtures applications.27 Although it is possible to perform cross-validation on PCA,33,34 it is not a common practice in applied environmental health research. For PCP-LOD we used the default parameters for $\lambda$ and $\mu$ and cross-validated to select the rank of the $L$ matrix. We set an initial grid of rank values from 1 to 10 for all scenarios. We performed this cross-validation approach on a single representative data set for each combination of simulated mixture sizes ($p = 24$ and 48), proportions <LOD (25%, 50%, and 75%), and noise structures (low, high, and sparse) and for the POP mixture.
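The PCA retention rule can be written as a small function: return the smallest number of leading components whose cumulative share of variance reaches a threshold. The threshold is left as a parameter because its value is an assumption here; `n_components_for_variance` is a hypothetical name.

```python
def n_components_for_variance(eigenvalues, threshold):
    """Smallest k such that the first k components explain at least `threshold` of total variance."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Eigenvalues of a hypothetical 4-variable covariance matrix.
k = n_components_for_variance([5.0, 3.0, 1.0, 1.0], threshold=0.7)
```

Unlike cross-validation, this rule is fixed before seeing the data, so its answer depends entirely on the chosen threshold.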
To cross-validate PCP-LOD on a single data set $D$, we repeated the following steps 100 times for each candidate rank $r$: a) we randomly corrupted 20% of the entries as missing (i.e., set the value to NA) to serve as a held-out test set, denoted $\Omega$, yielding the corrupted matrix $\tilde{D}$; b) we ran PCP-LOD on $\tilde{D}$ to obtain $L$ and $S$; and c) we recorded the relative recovery error of $L$ against the observed data in the held-out set, calculated via the Frobenius norm, $\lVert \mathcal{P}_{\Omega}(D - L) \rVert_F / \lVert \mathcal{P}_{\Omega}(D) \rVert_F$, where $\mathcal{P}_{\Omega}$ keeps the held-out entries and sets all others to zero. Finally, for each rank, we averaged the relative recovery error across the 100 runs and chose the optimal rank, $\hat{r}$, as the one with the lowest mean relative recovery error on the held-out set. We subsequently ran PCP-LOD on the full data set with the selected rank $\hat{r}$.
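The cross-validation loop can be sketched as follows. To keep the sketch self-contained and runnable, a simple iterative rank-$r$ SVD imputation (“hard impute”) stands in for PCP-LOD; it is a different, swapped-in method, and `fit` can be replaced by any function that accepts a matrix with NaN entries and a rank. All names are hypothetical.

```python
import numpy as np


def lowrank_fit(D_obs, r, n_iter=100):
    """Stand-in for PCP-LOD: rank-r SVD imputation ("hard impute") of NaN entries."""
    missing = np.isnan(D_obs)
    X = np.where(missing, np.nanmean(D_obs), D_obs)   # initial fill with the overall mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        L = (U[:, :r] * s[:r]) @ Vt[:r, :]            # best rank-r approximation
        X = np.where(missing, L, D_obs)                # keep observed values, refresh the rest
    return L


def cv_rank(D, ranks=range(1, 11), n_runs=100, frac=0.2, rng=None, fit=lowrank_fit):
    """Choose the rank with the lowest mean relative error on randomly held-out entries."""
    rng = np.random.default_rng(rng)
    n_hold = int(round(frac * D.size))
    mean_err = {}
    for r in ranks:
        errs = []
        for _ in range(n_runs):
            held_out = np.zeros(D.size, dtype=bool)
            held_out[rng.choice(D.size, size=n_hold, replace=False)] = True
            held_out = held_out.reshape(D.shape)       # a) hold out 20% of entries
            L = fit(np.where(held_out, np.nan, D), r)  # b) fit on the corrupted matrix
            err = np.linalg.norm((D - L)[held_out]) / np.linalg.norm(D[held_out])
            errs.append(err)                            # c) relative error on the held-out set
        mean_err[r] = float(np.mean(errs))
    best = min(mean_err, key=mean_err.get)
    return best, mean_err
```

On a matrix with a clear rank-2 signal, for example, `cv_rank(D, ranks=range(1, 4), n_runs=5)` should report a much larger held-out error for rank 1 than for rank 2.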
We ran PCP-LOD and PCA on all simulated data sets and compared them to assess their relative performance when faced with large proportions of nondetectable observations. For PCA, we replaced <LOD observations with LOD/√2. For PCP-LOD we estimated the rank of $L$, the sparsity of $S$, and their relative change to assess the stability of the solution across increasing proportions of data <LOD. Because the sparse matrix may contain nonzero values so close to zero that they are effectively zero, we set a threshold above which values were regarded as legitimate extreme exposures: we counted as sparse events only entries of $S$ lying more than two standard deviations of the model residuals, computed per chemical, from zero, i.e., $|s_{ij}| > 2\hat{\sigma}_j$, where $\hat{\sigma}_j$ is the standard deviation of the residuals for chemical $j$ computed over the “observed” values above the LOD in the simulated data.
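A NumPy sketch of the sparse-event threshold. It assumes the model residuals are $D - L - S$ and uses the population standard deviation per chemical over entries flagged as observed above the LOD; both are our reading of the text, and `flag_sparse_events` is a hypothetical name.

```python
import numpy as np


def flag_sparse_events(D, L, S, observed, k=2.0):
    """Keep S entries whose magnitude exceeds k residual SDs of their chemical (column).

    Residual SDs are computed per chemical from entries flagged `observed` (above the LOD).
    """
    resid = np.where(observed, D - L - S, np.nan)
    sd = np.nanstd(resid, axis=0)                 # one SD per chemical (column)
    return np.where(np.abs(S) > k * sd, S, 0.0)   # broadcasting compares each column to its own cut


L0 = np.zeros((4, 2))
S0 = np.array([[0.05, 0.0], [0.0, 3.0], [0.5, 1.5], [0.0, 0.0]])
R0 = np.array([[0.1, 1.0], [-0.1, -1.0], [0.1, 1.0], [-0.1, -1.0]])
events = flag_sparse_events(L0 + S0 + R0, L0, S0, np.ones((4, 2), dtype=bool))
```

In this toy case the cutoffs are 0.2 for the first chemical and 2.0 for the second, so only the entries 0.5 and 3.0 survive.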
For both PCP-LOD and PCA, we calculated relative predictive error as the ratio of the Frobenius norm of the error to that of the truth: $\lVert \hat{D} - D_{\text{true}} \rVert_F / \lVert D_{\text{true}} \rVert_F$. For PCP-LOD we interpreted $L$ as the predicted values, and for PCA we constructed predicted values as the product of the score matrix (i.e., the coordinates of the rotated data on the principal components) and the rotation matrix (i.e., right eigenvectors), truncated at the chosen rank. We defined the “truth” as the simulated values before noise or sparse events were added. Finally, we assessed the stability of the identified patterns using the relative prediction error of the singular vectors obtained by singular value decomposition (SVD), compared with those of the noise-free simulated data.
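A NumPy sketch of the error metric and of a rank-$k$ PCA prediction. It assumes column-centered PCA with the column means added back to the prediction (R’s `prcomp` centers by default); whether the article added the means back is not stated. The optional mask restricts the error to a subset of entries, such as values ≥LOD.

```python
import numpy as np


def relative_error(pred, truth, mask=None):
    """||pred - truth||_F / ||truth||_F, optionally restricted to entries where mask is True."""
    if mask is None:
        mask = np.ones(truth.shape, dtype=bool)
    return float(np.linalg.norm((pred - truth)[mask]) / np.linalg.norm(truth[mask]))


def pca_reconstruct(X, k):
    """Rank-k PCA prediction: scores (centered data x rotation) times rotation^T, plus column means."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    R = Vt[:k].T           # rotation: first k right singular vectors
    scores = Xc @ R        # coordinates of the centered data on the components
    return scores @ R.T + mu
```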
Application
Prior to the application to the NHANES POP mixture, we examined distributional plots and descriptive statistics for all variables. We scaled all POP concentrations by their standard deviations to make variances comparable across chemicals, so that high-variance pollutants could not dominate the solution. For PCA, we replaced <LOD observations with LOD/√2. We used PCP-LOD to separate unique events from underlying patterns. Following PCP-LOD, we extracted individual scores and pattern loadings from $L$ using SVD. We compared scores, loadings, and overall relative error with those obtained from PCA. We present unique events and interpret observed patterns. To better characterize sparse events, we applied hierarchical clustering to the sparse matrix to identify individuals with similar profiles of extreme events. We used Ward’s minimum variance linkage method to grow the dendrogram (i.e., the tree-based representation of observations),35 and we cut it based on subject-specific knowledge to choose the appropriate number of clusters.
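Scaling and SVD-based pattern extraction can be sketched in NumPy as follows. Here the “share” of each component is the proportion of squared singular values of $L$, which is our assumption about how variance explained in the low-rank matrix is computed; the function names are hypothetical.

```python
import numpy as np


def scale_by_sd(D):
    """Divide each chemical (column) by its standard deviation; NaNs are ignored and kept."""
    return D / np.nanstd(D, axis=0, ddof=1)


def svd_patterns(L, r):
    """Chemical loadings, individual scores, and share of squared singular values per component."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    loadings = Vt[:r].T                  # p x r: right singular vectors
    scores = U[:, :r] * s[:r]            # n x r: left singular vectors scaled by singular values
    share = s[:r] ** 2 / np.sum(s ** 2)  # proportion of the (uncentered) variation in L
    return loadings, scores, share


L = np.array([[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
loadings, scores, share = svd_patterns(L, 2)
```

With singular values 3 and 1, the two components account for 90% and 10% of the squared singular values, and `scores @ loadings.T` reproduces `L`.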
As a sensitivity analysis, we repeated this application using a higher LOD cut point for POP inclusion, retaining only chemicals that were detected in at least 75% of samples. All analyses were conducted using R version 4.0.4 (https://www.r-project.org/). Code to implement PCP-LOD, along with simulations, NHANES data, and analyses conducted in this work are available at github.com/lizzyagibson/PCP-LOD (Supplemental Material, “R code”).
Results
Simulations
We ran PCP-LOD and PCA on all simulated data sets. PCP-LOD had lower relative prediction error across the majority of mixture size ($p = 24$ and 48), proportion <LOD (25%, 50%, and 75%), and noise structure (low, high, and sparse) combinations. PCP-LOD outperformed PCA on all simulations with low noise, on simulations with high noise with up to 50% <LOD, and on simulations with low noise and added sparse events with up to 50% <LOD (Figure 2 and Supplemental Table S1). Figures 2 and 3 present simulations for one of the two mixture sizes; corresponding figures for the other size are included in Supplemental Figures S1 and S2.
Figure 2.
Overall relative predictive error of principal component pursuit with limit of detection (PCP-LOD) and PCA on simulated data across increasing proportions of data below the limit of detection. The panels show results for different structures of added noise. Box plots display summary statistics for each method across 100 simulations. The bottom and top hinges of the boxes correspond to the first and third quartiles (the 25th and 75th percentiles), respectively. The upper (lower) whiskers extend from the hinge to the largest (smallest) value no farther than 1.5 × IQR from the hinge (where IQR is the interquartile range, or distance between the first and third quartiles). See Supplemental Table S1 for corresponding numeric data. Note: IQR, interquartile range; LOD, limit of detection; PCA, principal component analysis; PCP, principal component pursuit.
Figure 3.
Relative predictive error of principal component pursuit with limit of detection (PCP-LOD) and PCA on simulated data, stratified by detection. The panel columns separate results from different structures of added noise, and the panel rows separate values that were simulated as observed (top row) from those simulated as below the LOD (bottom row). Box plots display summary statistics for each method across 100 simulations. See Supplemental Table S2 for corresponding numeric data. Note: LOD, limit of detection; PCA, principal component analysis; PCP, principal component pursuit.
PCP-LOD was more affected by the proportion of data <LOD, as seen in the larger step size between box plots in Figure 2. The decline in PCP-LOD predictive accuracy as the proportion of <LOD values increased appears to be driven by poorer performance on <LOD values in high-noise scenarios (Figure 3 and Supplemental Table S2). Relative prediction error for ≥LOD values was approximately constant for both PCP-LOD and PCA. Supplemental Tables S3, S4, and S5 contain the median and interquartile range (IQR) of relative error for predicted values overall and stratified by LOD.
Next, we assessed the stability of the identified patterns using the SVD of the simulated data before noise or sparse events were added and compared this with the SVD of the PCP-LOD $L$ matrix and of the PCA results (Figure 4 and Supplemental Table S6). Figure 4 depicts the relative prediction error comparing the left eigenvectors (comparable to scaled individual scores) of the PCP-LOD and PCA solutions with those of the simulated “truth.” PCP-LOD’s median relative prediction error was generally lower than PCA’s for the larger mixture size and higher than PCA’s for the smaller mixture size. However, these patterns appear quite stable over increasing proportions of data <LOD for both methods. PCP-LOD solutions achieved lower relative prediction error on chemical loadings (i.e., right eigenvectors) across all simulations (Supplemental Figure S3 and Table S6).
Figure 4.
Relative estimation error of principal component pursuit with limit of detection (LOD) and PCA solution scores (i.e., left eigenvectors) compared with those of the simulated data before noise was added. The panel columns separate results from different structures of added noise, and panel rows present two simulated mixture sizes. Box plots display summary statistics for each method across 100 simulations. See Supplemental Table S6 for corresponding numeric data. Note: LOD, limit of detection; PCA, principal component analysis; PCP, principal component pursuit.
Across PCP-LOD solutions, between 2% and 10% of entries in $S$ were nonsparse (nonzero). Sparsity decreased as the proportion <LOD increased, with 3% (IQR: 2%, 4%), 6% (IQR: 4%, 7%), and 7% (IQR: 3%, 8%) unique events, on average, in simulations with 25%, 50%, and 75% <LOD, respectively (Supplemental Table S7). For simulations that included sparse events in the noise structure, PCP-LOD correctly placed 69% (IQR: 67%, 71%), 70% (IQR: 68%, 72%), and 65% (IQR: 62%, 67%) of sparse values in the $S$ matrix, on average, for simulations with 25%, 50%, and 75% <LOD, respectively (Supplemental Table S8).
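The two sparsity summaries above can be computed from an estimated sparse matrix and a boolean mask of the simulated sparse events. This is a minimal NumPy sketch with hypothetical names; it treats any nonzero entry of the estimate as an identified event.

```python
import numpy as np


def sparse_summary(S_hat, true_events):
    """Percent of nonzero entries in S_hat and percent of true sparse events it captures."""
    detected = S_hat != 0
    pct_nonzero = 100.0 * detected.mean()
    pct_recovered = 100.0 * (detected & true_events).sum() / true_events.sum()
    return float(pct_nonzero), float(pct_recovered)


S_hat = np.array([[0.0, 2.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]])
true_events = np.array([[False, True, False, True], [False, False, False, False]])
pct_nonzero, pct_recovered = sparse_summary(S_hat, true_events)
```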
Application
Thirty-four POPs were measured in the NHANES 2001–2002 cycle. Detection frequency is presented in Figure 5 and Supplemental Table S9. Fourteen PCBs, four furans, and three dioxins were detected in ≥50% of samples. POP concentrations were all positively correlated (Figure 6A).
Figure 5.
Detection frequency of persistent organic pollutants measured in NHANES 2001–2002. All congeners to the right of the vertical dashed line were detected in ≥50% of samples and included in the analysis. See Supplemental Table S9 for corresponding numeric data. Note: LOD, limit of detection; NHANES, National Health and Nutrition Examination Survey.
Figure 6.

(A) Spearman correlation matrix of 21 persistent organic pollutants measured in NHANES 2001–2002. Observations below the limit of detection were handled by case-wise deletion. (B) Spearman correlation matrix of low-rank structure across POPs estimated by principal component pursuit with limit of detection (LOD). Note: LOD, limit of detection; NHANES, National Health and Nutrition Examination Survey; PCP, principal component pursuit; POP, persistent organic pollutant.
We applied PCP-LOD to identify underlying patterns of POP exposure, and extreme exposure events not explained by these patterns, without making a priori assumptions about the number of patterns or sparse events. PCP-LOD returned a low-rank matrix $L$ of rank three, which corresponds to three patterns of POP exposure. Figure 6B depicts the correlation matrix of $L$ alongside the correlation matrix of the raw data (Figure 6A). By removing sparse events and residual noise, PCP-LOD increased the correlations between POPs. To characterize underlying patterns, we extracted principal components from $L$ using SVD.
The three components distinguished by PCP-LOD included one component of overall POP exposure, a component that separated dioxins and furans from PCBs, and a third component that separated higher molecular weight PCBs from lower molecular weight PCBs (Figure 7 and Supplemental Table S10). The first component explained 79.4% of the variance in the low-rank matrix, the second explained 14.6%, and the third explained 6.0%.
Figure 7.
SVD-identified components of the principal component pursuit with limit of detection (PCP-LOD) low-rank matrix of underlying POP exposure across 21 POPs with at least 50% of measured concentrations in 2001–2002 NHANES participants. PCP-LOD chose a rank-three solution based on random hold-out cross-validation. See Supplemental Table S10 for corresponding numeric data. Note: LOD, limit of detection; NHANES, National Health and Nutrition Examination Survey; PCA, principal component analysis; PCP, principal component pursuit; POP, persistent organic pollutant; SVD, singular value decomposition.
PCA conducted on the POP mixture retained three components under the prespecified variance criterion and returned loadings and scores much the same as those from PCP-LOD (Supplemental Figure S4 and Table S11). Using the three chosen components, the relative prediction error on ≥LOD values was 0.30 for PCA, similar to the relative error of 0.32 for PCP-LOD when comparing only $L$ with the original data. However, when $S$ was included in the solution ($L + S$), the relative error for PCP-LOD on ≥LOD values was 0.07. This is comparable to the PCA solution retaining all 21 components (0.06), which accomplishes no dimension reduction. Because the true values of <LOD observations are unknown in this application, relative prediction error could be calculated only for ≥LOD values.
PCP-LOD partitioned the variation unexplained by the low-rank structure into a sparse matrix of large outlying values and the remaining residuals. The $S$ matrix contained mostly zero values, with 5.7% of entries being nonsparse. Sparse events were generally weakly correlated across chemicals, with 70% of Spearman correlations between sparse chemical exposure events small in absolute value (Supplemental Figure S5). Table 1 summarizes the number of individuals with uniquely high or low exposure events. Figure 8 describes participant-specific sparse events. Most participants had no extreme exposures (44%) or only extremely low exposures (18%). A total of 22% had one high unique event on a single chemical, and 16% had between two and six high exposures across the 21 chemicals that were left unexplained by the identified patterns (Table 1 and Supplemental Table S12).
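The cross-tabulation in Table 1 can be sketched in plain Python: for each participant (row of the sparse matrix), count negative entries as low events and positive entries as high events, then tally participants by that pair. The names are hypothetical, and the toy matrix below is not NHANES data.

```python
from collections import Counter


def event_crosstab(S):
    """Count participants by (number of low events, number of high events).

    S is a sequence of rows (one per participant); positive entries are high
    events, negative entries are low events, and zeros are not events.
    """
    counts = Counter()
    for row in S:
        n_low = sum(1 for v in row if v < 0)
        n_high = sum(1 for v in row if v > 0)
        counts[(n_low, n_high)] += 1
    return counts


S = [[0.0, 0.0], [1.2, 0.0], [-0.5, 2.0], [0.0, 0.0]]
table = event_crosstab(S)

# Margins as in Table 1: participants by number of low events and by number of high events.
low_totals, high_totals = Counter(), Counter()
for (n_low, n_high), c in table.items():
    low_totals[n_low] += c
    high_totals[n_high] += c
```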
Table 1.
Summary of extreme events captured in the sparse component of principal component pursuit with limit of detection (LOD) from a mixture of 21 POPs measured in 1,000 participants in NHANES 2001–2002. Entries are counts of participants with uniquely low and/or high events, organized by row and column, respectively.
| Low unique events | High = 0 | High = 1 | High = 2 | High = 3 | High = 4 | High = 5 | High = 6 | Row total^a |
|---|---|---|---|---|---|---|---|---|
| 0 | 439 | 141 | 41 | 11 | 5 | 0 | 1 | 638 |
| 1 | 147 | 46 | 30 | 13 | 5 | 1 | 0 | 242 |
| 2 | 27 | 19 | 11 | 6 | 3 | 1 | 0 | 67 |
| 3 | 8 | 11 | 9 | 8 | 2 | 1 | 1 | 40 |
| 4 | 0 | 1 | 2 | 3 | 2 | 0 | 1 | 9 |
| 5 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 2 |
| 6 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |
| Column total^b | 621 | 218 | 93 | 42 | 18 | 4 | 4 | 1,000 |

Note: LOD, limit of detection; NHANES, National Health and Nutrition Examination Survey; PCP, principal component pursuit; POP, persistent organic pollutant.
^a Row sums of uniquely low events.
^b Column sums of uniquely high events.
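The counts in Table 1 and the share of nonsparse entries can be tabulated directly from the estimated sparse matrix. The sketch below makes three assumptions: a positive entry is a uniquely high event, a negative entry is a uniquely low event, and entries with absolute value at or below a small tolerance `tol` count as zero. The function name and the tolerance are hypothetical choices for illustration.

```python
import numpy as np


def tally_sparse_events(S, tol=1e-8):
    """Hypothetical helper: cross-tabulate low (S < 0) and high (S > 0) events.

    S   : (n, p) sparse matrix; rows are participants, columns are chemicals
    tol : entries with |S| <= tol are treated as zero (no event)
    Returns (table, frac_nonsparse), where table[i, j] counts participants with
    i low events and j high events, and frac_nonsparse is the share of nonzero entries.
    """
    S = np.asarray(S, dtype=float)
    n_low = (S < -tol).sum(axis=1)     # uniquely low events per participant
    n_high = (S > tol).sum(axis=1)     # uniquely high events per participant
    size = int(max(n_low.max(), n_high.max())) + 1
    table = np.zeros((size, size), dtype=int)
    np.add.at(table, (n_low, n_high), 1)   # one count per participant
    frac_nonsparse = float(np.mean(np.abs(S) > tol))
    return table, frac_nonsparse
```

The row and column totals in Table 1 then correspond to `table.sum(axis=1)` and `table.sum(axis=0)`.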
Figure 8.
Principal component pursuit with limit of detection (LOD) solution matrix of sparse events of POP exposure in 2001–2002 NHANES participants. To facilitate visualization, we have categorized sparse values into high and low exposure events. Dark orange indicates an extremely high exposure event (positive sparse value); light purple indicates an extremely low exposure event (negative sparse value). White indicates no sparse event (a zero entry). POPs (columns) and NHANES participants (rows) are hierarchically clustered to further facilitate visualization. Note: LOD, limit of detection; NHANES, National Health and Nutrition Examination Survey; PCP, principal component pursuit; POP, persistent organic pollutant.
We identified three clusters grouping individuals with similar profiles of extreme events. The first cluster included the 439 individuals without any unique events. The second grouped 448 individuals with few sparse events per person [interquartile range (IQR): (1, 2)] relative to cluster 3. The third cluster of 113 individuals had, on average, more sparse events per person [IQR: (2, 5)] and more extreme values (Supplemental Figures S6 and S7 and Table S13).
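Participant clusters like those described above can be formed with agglomerative clustering using Ward's criterion (reference 35). The sketch below assumes each participant's row of the sparse matrix is first recoded as +1 (high event), −1 (low event), or 0, mirroring the categorization used in Figure 8. The recoded rows are then clustered with Euclidean distance using SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster`. The recoding, the distance, and the function name are our assumptions. They do not describe the exact procedure behind Supplemental Figures S6 and S7.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_sparse_profiles(S, n_clusters=3, tol=1e-8):
    """Hypothetical helper: Ward clustering of participants by extreme-event profile.

    Each entry of S is recoded to +1 (high event), -1 (low event) or 0 (|S| <= tol)
    before clustering; this recoding is an assumption made for illustration.
    """
    S = np.asarray(S, dtype=float)
    categories = np.sign(S) * (np.abs(S) > tol)          # values in {-1, 0, +1}
    Z = linkage(categories, method="ward")                # Ward criterion, Euclidean distance
    return fcluster(Z, t=n_clusters, criterion="maxclust")  # cluster labels 1..n_clusters
```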
In a sensitivity analysis restricted to POPs detected in at least 75% of samples, we included 11 POPs. PCP-LOD returned a low-rank matrix of rank two, corresponding to two patterns of POP exposure. These two patterns were similar to the first two components from the main analysis (Supplemental Figure S8 and Table S14). The first component included all POPs loading in the same direction, and the second separated dioxins and furans from PCBs. In the low-rank matrix, the first component explained 79.6% of the variance, and the second explained 20.4%. The sparse matrix contained mostly zero values, with 11.4% of entries being nonsparse.
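The variance shares reported for the low-rank matrix can be read off its singular values. The minimal NumPy sketch below assumes that the share for component k is its squared singular value divided by the sum of all squared singular values. It also assumes no column centering unless requested. The centering choice and the function name `svd_patterns` are our assumptions.

```python
import numpy as np


def svd_patterns(L, rank, center=False):
    """Hypothetical helper: SVD of an estimated low-rank matrix L (n x p).

    Returns loadings (p x rank), scores (n x rank), and the share of the total
    sum of squared singular values carried by each of the first `rank` components.
    """
    L = np.asarray(L, dtype=float)
    if center:
        L = L - L.mean(axis=0)               # optional column centering (assumption)
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    share = s**2 / np.sum(s**2)              # proportion of variance per component
    loadings = Vt[:rank].T                   # chemical-by-pattern loadings
    scores = U[:, :rank] * s[:rank]          # participant-by-pattern scores
    return loadings, scores, share[:rank]
```

For a rank-two matrix the two shares necessarily sum to 1, consistent with 79.6% + 20.4% = 100%.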
Discussion
We propose PCP-LOD as a new approach to identify patterns, and the extreme events left unexplained by those patterns, underlying environmental chemical mixtures in the presence of values below the LOD. Our simulation studies highlighted three main advantages of PCP-LOD over PCA for identifying patterns in environmental mixtures: a) reduced error in estimated patterns of exposure, b) identification of extreme or unique events, and c) improved estimation of values below the LOD.
Patterns identified by PCP-LOD are more robust to noise and incomplete data than those from more traditional pattern identification methods. This is because patterns in the low-rank matrix are not influenced by events captured in the sparse matrix. PCP-LOD estimated the underlying low-rank structure of the data with lower relative error than PCA under all realistic simulation scenarios. PCA outperformed PCP-LOD for two error structures when 75% of the data set was simulated as below the LOD. In that case, PCP-LOD used the 25% of values above the LOD to reconstruct the remaining 75%, so poorer performance was expected. However, an environmental health researcher is unlikely to face a chemical mixture with 75% of all values below the LOD. In our application to POPs detected in over 50% of measurements among NHANES participants, 76% of all observations were above the LOD. In the entire POP mixture of 34 chemicals, with five chemicals never detected, 52% of all observations were . Across all simulations, we observed the highest relative prediction error for values below the LOD. This held for PCA as well and applies to all methods that address censored or missing data.
Although we compared PCP-LOD with PCA in this work, PCA is not the only existing tool used to characterize chemical mixtures. Other traditional methods, such as factor analysis and NMF, have also been employed to address research questions around patterns of environmental exposures.36–38 Additional techniques, such as frequent itemset mining (FIM) or perturbed factor analysis (PFA), have been borrowed from other fields, adapted, or developed for environmental health data. FIM, a data mining technique commonly employed for retail analysis, has been used to isolate chemical combinations (i.e., patterns) based on their prevalence.39,40 PFA captures similarities and differences in exposures and has been used to express shared exposure profiles within groups and to evaluate differences in exposure across groups.41 These methods lack some of PCP-LOD's distinctive features (an LOD-specific penalty, a nonnegativity constraint, and a procedure to accommodate missing values). However, they have other characteristics, which PCP-LOD may lack, that make them well suited for environmental mixtures research. Having a collection of tools to answer research questions about multipollutant exposures is useful and allows for improved public health communication.
PCP-LOD may be paired with various dimension reduction techniques. In our simulations and application to NHANES data, we paired PCP-LOD with SVD to make results comparable with those of PCA. The SVD solution does include negative values. However, because of the nonnegativity constraint on the low-rank matrix, PCP-LOD can also be paired with any nonnegative dimension reduction technique (e.g., NMF). This provides results interpretable on an additive scale with a parts-based representation.42
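Because the low-rank matrix is constrained to be nonnegative, it can be passed to a nonnegative factorization instead of SVD. The sketch below implements, in NumPy, the standard Lee–Seung multiplicative update rules for the squared Frobenius objective. It illustrates NMF in general. The rank, iteration count, random initialization, and small constant `eps` are hypothetical settings, not choices made in this study.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Factor a nonnegative matrix V (n x p) as W @ H with W, H >= 0 by
    minimizing ||V - W @ H||_F^2 with Lee-Seung multiplicative updates."""
    V = np.asarray(V, dtype=float)
    if np.any(V < 0):
        raise ValueError("NMF requires a nonnegative input matrix")
    rng = np.random.default_rng(seed)
    n, p = V.shape
    W = rng.random((n, rank)) + eps          # participant scores (nonnegative)
    H = rng.random((rank, p)) + eps          # chemical loadings (nonnegative)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)     # multiplicative step keeps H >= 0
        W *= (V @ H.T) / (W @ H @ H.T + eps)     # multiplicative step keeps W >= 0
    return W, H
```

Unlike SVD, this factorization depends on the random initialization and is not unique, so in practice one would compare results across several seeds.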
The three components underlying the NHANES mixture distinguished by PCP-LOD represent one pattern of exposure to all POPs and two patterns grouped by known structural and toxicological properties. More than 90% of human exposure to PCBs, dioxins, and furans is through the food supply, mainly meat, dairy, and seafood.43–45 Thus, the first component of comprehensive exposure may be interpreted as a dietary source of these POPs. The second component separated dioxins and furans, which are generally more toxic, from PCBs.46 Accordingly, the second component could potentially be interpreted as a measure of toxicity. Notably, in a sensitivity analysis restricted to the 11 POPs detected in at least 75% of samples, the first two identified patterns remained largely unchanged. This supports PCP-LOD's ability to extract the underlying structure in the presence of values below the LOD. The third component separated lower molecular weight PCBs from higher molecular weight PCBs; higher congener numbers generally indicate more chlorine atoms and larger molecules. More highly chlorinated congeners tend to bioaccumulate more than less chlorinated congeners.47,48 The third pattern identified in the main analysis was not found in the sensitivity analysis. This is because four of the lower molecular weight PCBs and four of the higher molecular weight PCBs from the main analysis were not included. Depending on the research question, any or all of these components could be included in subsequent analyses with health outcomes.
In the original POP mixture, individuals with high values on any chemical were likely to have high values on other chemicals; equivalently, individuals with low values on any chemical were likely to have low values on others. PCP-LOD captured this in a component representing overall mixture exposure. Once the underlying patterns described by the low-rank matrix were removed, high (or low) exposure events on individual chemicals did not indicate high (or low) exposure to other chemicals; i.e., sparse events were not highly correlated. About half of the unique low exposure events were below the LOD in the original mixture; these values were not explained by overall low exposure or by the identified patterns.
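Whether sparse events are weakly correlated can be checked with Spearman correlations between the columns (chemicals) of the sparse matrix. Most sparse entries are zero, so the ranking must handle ties. The NumPy sketch below gives tied values their average rank and then computes Pearson correlation on those ranks, which is the Spearman coefficient. It also reports the share of off-diagonal correlations below a chosen cutoff. The cutoff passed to `share_weak` is the analyst's choice, and all function names are hypothetical. Chemicals with no sparse events have constant ranks, so their correlations are undefined (NaN) and are dropped.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x; tied values receive the mean of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    start = 0
    while start < len(x):
        end = start
        while end + 1 < len(x) and sorted_x[end + 1] == sorted_x[start]:
            end += 1
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0   # mean rank of tie block
        start = end + 1
    return ranks


def spearman_matrix(S):
    """Spearman correlation between columns (chemicals) of the sparse matrix S."""
    S = np.asarray(S, dtype=float)
    R = np.column_stack([average_ranks(S[:, j]) for j in range(S.shape[1])])
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.corrcoef(R, rowvar=False)      # NaN where a column is constant


def share_weak(rho, threshold):
    """Share of finite off-diagonal correlations with |rho| below `threshold`."""
    off = rho[np.triu_indices_from(rho, k=1)]
    off = off[np.isfinite(off)]
    return float(np.mean(np.abs(off) < threshold))
```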
The ability to identify and separate extreme events is a distinctive feature of PCP-LOD that other methods do not offer. These unique or extreme events not captured in the low-rank matrix may themselves be risk factors (e.g., wildfires, which are unique events not explained by commonly recognized air pollution sources, for asthma emergency admissions).49 Alternatively, they may modify an association with one of the components (e.g., a Saharan dust episode might modify the association with traffic-related pollution).50 Next steps could include entering sparse-matrix exposures, along with patterns identified from the low-rank matrix, in a health model with some form of penalization (e.g., lasso or elastic net). Because exposure variables in the sparse matrix are not highly correlated, they do not pose the same collinearity problems as the original mixture.
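A penalized health model of the kind suggested above could use pattern scores and sparse-matrix exposures as predictors. The minimal sketch below assumes a continuous outcome and the elastic net objective used by glmnet, (1/2n)‖y − Xb‖² + λ[α‖b‖₁ + (1 − α)‖b‖²/2]. It fits coefficients by cyclic coordinate descent on standardized predictors. The function names, default settings, and linear-outcome assumption are ours. A real analysis would also need covariates and cross-validated tuning of λ. Columns with no sparse events should be removed beforehand, because their standard deviation is zero.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Scalar soft thresholding: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=1000, tol=1e-8):
    """Minimize (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||^2)
    by cyclic coordinate descent (alpha=1 gives the lasso). Columns of X are
    standardized and y is centered internally, so coefficients are on the
    standardized scale and the intercept is not penalized."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)    # each column: mean 0, (1/n) x'x = 1
    r = y - y.mean()                              # residual for b = 0
    b = np.zeros(p)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            z = Xs[:, j] @ r / n + b[j]           # fit of column j to its partial residual
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            if new != b[j]:
                r -= Xs[:, j] * (new - b[j])      # keep the residual in sync with b
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return b
```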
Although PCP-LOD addresses several drawbacks of existing methods, it does not overcome all limitations of pattern identification in environmental mixtures. First, in multipollutant exposures the “true” originating mechanism is almost never known; thus, PCP-LOD cannot provide the “correct” answer. PCP-LOD also cannot guarantee interpretable results, although the nonnegativity constraint and the sparse matrix were added, in part, to enhance interpretability. PCP-LOD, like other methods employed in our field, should be used in conjunction with subject-area expertise, on which the interpretability of results relies. This limitation applies to all methods used to address research questions concerning patterns of environmental exposures. Second, a health model that includes scores from any dimension reduction technique paired with PCP-LOD ignores the uncertainty inherent in the solution selection. This results in confidence intervals that are too narrow and, potentially, spurious results.51 Third, some data sets will likely be high-dimensional, with a large number of correlated chemical measurements for each participant. In this situation, PCP-LOD still performs well, provided the rank of the target matrix is small enough in comparison with (e.g., , where is a constant, as Zhou et al. presented at an information theory symposium).14 Fourth, PCP methods have not yet been extended to accommodate repeated measures such as clustered observations or longitudinal data, which are common in environmental health applications. Additionally, our application findings should be interpreted in light of their limitations. First, as is the case when using chemical biomarkers, our study is susceptible to exposure measurement error. In a noisy setting, any method will exhibit an inaccuracy in the estimated left singular vectors that is commensurate with the noise level. Nevertheless, even in this setting, the results produced by PCP-LOD are stable with respect to noise.14 Second, our results may not be generalizable beyond the study population. Although NHANES includes a nationally representative sample of the general noninstitutionalized U.S. population,52 we did not account for the complex sampling design and weights of the study.53 Thus, the PCP-LOD-identified patterns may represent sources or behaviors specific to the study participants.
PCP-LOD also has numerous strengths compared with existing methods to identify exposure patterns in environmental mixtures. Those methods require strong assumptions and have key limitations, and as a consequence their use has resulted in heterogeneous and inconsistent findings across studies.1 Moreover, results from methods that are not generalizable or interpretable are difficult to use in the design and development of regulations, policies, and targeted interventions. Original PCP has few assumptions, namely that the low-rank component is not sparse and that the sparse component is not low rank.9 This is an appealing feature of a tool when the underlying truth is not known. PCP-LOD directly addresses several additional limitations of existing methods: a) its solution is not necessarily orthogonal, allowing correlations between patterns; b) its solution is nonnegative, so patterns exist in an interpretable space; c) its parameters do not require tuning by the researcher, so the choice of the number of patterns in the low-rank matrix is not subjective; and d) it is robust to extreme values because of the sparse matrix.
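For reference, the original PCP of Candès et al. (reference 9) solves: minimize ‖L‖_* (nuclear norm) + λ‖S‖₁ subject to L + S = M. Its theory-backed default is λ = 1/√max(n, p). The NumPy sketch below solves this convex problem with the augmented Lagrangian (alternating directions) iterations described by Candès et al. It alternates singular value thresholding for L with entrywise soft thresholding for S. The default step parameter μ = np/(4‖M‖₁) follows that paper. The sketch deliberately omits everything specific to PCP-LOD (the LOD penalty, nonnegativity constraint, missing-data handling, and noise term), so it illustrates the original method only.

```python
import numpy as np


def shrink(X, tau):
    """Entrywise soft thresholding (proximal operator of tau * ||.||_1)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svt(X, tau):
    """Singular value thresholding (proximal operator of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Original principal component pursuit (not PCP-LOD):
        minimize ||L||_* + lam * ||S||_1  subject to  L + S = M."""
    M = np.asarray(M, dtype=float)
    n, p = M.shape
    lam = 1.0 / np.sqrt(max(n, p)) if lam is None else lam
    mu = n * p / (4.0 * np.abs(M).sum()) if mu is None else mu
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)     # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)  # sparse update
        residual = M - L - S
        Y += mu * residual                    # dual ascent on the constraint
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```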
To our knowledge, this work represents the first instance of decomposing the structure among chemicals in an additive manner. By separating unique events from underlying patterns, PCP-LOD provides the opportunity to include extreme events in analyses, where they previously may have been suppressed or discarded. The theory-backed parameter selection and cross-validation enhance the reproducibility of PCP-LOD, ensuring that two different research groups with the same data set will identify the same optimal number of patterns. PCP-LOD may be employed when environmental epidemiologists have research questions concerning sources or behaviors leading to chemical exposure or patterns underlying exposure to multiple pollutants, especially when data are noisy, incomplete, or may contain extreme exposure events.
Supplementary Material
Acknowledgments
This work was partially supported by the National Institute of Environmental Health Sciences (NIEHS) individual fellowship grant F31 ES030263, as well as PRIME R01 ES028805 and P30 ES009089.
References
- 1. Gibson EA, Goldsmith J, Kioumourtzoglou M-A. 2019. Complex mixtures, complex analyses: an emphasis on interpretable results. Curr Environ Health Rep 6(2):53–61, 10.1007/s40572-019-00229-5.
- 2. Gull SF, Daniell GJ. 1978. Image reconstruction from incomplete and noisy data. Nature 272(5655):686–690, 10.1038/272686a0.
- 3. Helsel DR. 2005. More than obvious: better methods for interpreting nondetect data. Environ Sci Technol 39(20):419A–423A, 10.1021/es053368a.
- 4. Helsel DR. 2005. Nondetects and Data Analysis. Statistics for Censored Environmental Data. Hoboken, NJ: Wiley-Interscience.
- 5. U.S. EPA (U.S. Environmental Protection Agency). 2000. Guidance for Data Quality Assessment. Practical Methods for Data Analysis: EPA QA/G-9, QA00 version. Washington, DC: U.S. Environmental Protection Agency, Office of Environmental Information.
- 6. Barr DB, Landsittel D, Nishioka M, Thomas K, Curwin B, Raymer J, et al. 2006. A survey of laboratory and statistical issues related to farmworker exposure studies. Environ Health Perspect 114(6):961–968, 10.1289/ehp.8528.
- 7. Hornung RW, Reed LD. 1990. Estimation of average concentration in the presence of nondetectable values. Appl Occup Environ Hyg 5(1):46–51, 10.1080/1047322X.1990.10389587.
- 8. Helsel DR. 1990. Less than obvious—statistical treatment of data below the detection limit. Environ Sci Technol 24(12):1766–1774, 10.1021/es00082a001.
- 9. Candès EJ, Li X, Ma Y, Wright J. 2011. Robust principal component analysis? J ACM 58(3):1–37, 10.1145/1970392.1970395.
- 10. Zhang J, Yan J, Wright J. 2021. Square root principal component pursuit: tuning-free noisy robust matrix recovery. Adv Neural Inf Process Syst 34:29464–29475.
- 11. Netrapalli P, Niranjan UN, Sanghavi S, Anandkumar A, Jain P. 2014. Non-convex robust PCA. Adv Neural Inf Process Syst 27. https://proceedings.neurips.cc/paper/2014/file/443cb001c138b2561a0d90720d6ce111-Paper.pdf [accessed 7 September 2021].
- 12. Chen Y, Fan J, Ma C, Yan Y. 2020. Bridging convex and nonconvex optimization in robust PCA: noise, outliers, and missing data. Ann Stat 49(5):2948–2971, 10.1214/21-AOS2066.
- 13. Wold S, Esbensen K, Geladi P. 1987. Principal component analysis. Chemometr Intell Lab Syst 2(1–3):37–52, 10.1016/0169-7439(87)80084-9.
- 14. Zhou Z, Li X, Wright J, Candès E, Ma Y. 2010. Stable principal component pursuit. In: 2010 IEEE International Symposium on Information Theory Proceedings, 1518–1522, 10.1109/ISIT.2010.5513535.
- 15. Chandrasekaran V, Sanghavi S, Parrilo PA, Willsky AS. 2011. Rank-sparsity incoherence for matrix decomposition. SIAM J Optim 21(2):572–596, 10.1137/090761793.
- 16. Blackledge JM. 2006. Chapter 8, Vector and Matrix Norms. In: Digital Signal Processing: Mathematical and Computational Methods, Software Development and Applications. 2nd ed. Chichester, UK: Horwood.
- 17. Lin Z, Liu R, Su Z. 2011. Linearized alternating direction method with adaptive penalty for low-rank representation. In: Proceedings of 25th Annual Conference on Neural Information Processing Systems 2011. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. 12–14 December 2011. Granada, Spain: Adv Neural Inf Process Syst 24, 612–620. https://proceedings.neurips.cc/paper/2011/hash/18997733ec258a9fcaf239cc55d53363-Abstract.html [accessed 7 September 2021].
- 18. Boyd S, Vandenberghe L. 2004. Convex Optimization. Cambridge, UK: Cambridge University Press.
- 19. Wright J, Ma Y. 2021. High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications. Cambridge, UK: Cambridge University Press.
- 20. Ge R, Jin C, Zheng Y. 2017. No spurious local minima in nonconvex low rank problems: a unified geometric analysis. In: ICML’17–Proceedings of the 34th International Conference on Machine Learning, vol. 70. 6–11 August 2017. Sydney, Australia: JMLR, 1233–1242.
- 21. Yi X, Park D, Chen Y, Caramanis C. 2016. Fast algorithms for robust PCA via gradient descent. arXiv, 10.48550/arXiv.1605.07784.
- 22. Cherapanamjeri Y, Gupta K, Jain P. 2017. Nearly optimal robust matrix completion. In: ICML’17: Proceedings of the 34th International Conference on Machine Learning, 797–805.
- 23. Gao W, Goldfarb D, Curtis FE. 2020. ADMM for multiaffine constrained optimization. Optim Methods Softw 35(2):257–303, 10.1080/10556788.2019.1683553.
- 24. Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. 2010. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3(1):1–122, 10.1561/2200000016.
- 25. Chen Y, Jalali A, Sanghavi S, Caramanis C. 2011. Low-rank matrix recovery with errors and erasures. IEEE Trans Inf Theory 59(7):4324–4337, 10.1109/TIT.2013.2249572.
- 26. Li X. 2013. Compressed sensing and matrix completion with constant proportion of corruptions. Constructive Approximation 37(1):73–99.
- 27. Gibson EA, Nunez Y, Abuawad A, Zota AR, Renzetti S, Devick KL, et al. 2019. An overview of methods to address distinct research questions on environmental mixtures: an application to persistent organic pollutants and leukocyte telomere length. Environ Health 18(1):76, 10.1186/s12940-019-0515-1.
- 28. McGee G, Wilson A, Webster TF, Coull BA. 2021. Bayesian multiple index models for environmental mixtures. Biometrics 1–13. Preprint posted online September 25, 2021, 10.1111/biom.13569.
- 29. Zipf G, Chiappa M, Porter KS, Ostchega Y, Lewis BG, Dostal J. 2013. National Health and Nutrition Examination Survey: plan and operations, 1999–2010. Vital Health Stat 1 56:1–37.
- 30. U.S. CDC (U.S. Centers for Disease Control and Prevention). 2002. Laboratory Procedure Manual: PCBs and Persistent Pesticides in Serum. 2001–2002. Atlanta, GA: U.S. CDC. https://www.cdc.gov/nchs/data/nhanes/nhanes_01_02/l28poc_b_met_pcb_pesticides.pdf [accessed 7 September 2021].
- 31. U.S. CDC. 2002. Laboratory Procedure Manual: PCDDs, PCDFs, and cPCBs in Serum. 2001–2002. Atlanta, GA: U.S. CDC. https://www.cdc.gov/nchs/data/nhanes/nhanes_01_02/l28poc_b_met_dioxin_pcb.pdf [accessed 7 September 2021].
- 32. Akins JR, Waldrep K, Bernert JT Jr. 1989. The estimation of total serum lipids by a completely enzymatic ‘summation’ method. Clin Chim Acta 184(3):219–226, 10.1016/0009-8981(89)90054-5.
- 33. Krzanowski W. 1987. Cross-validation in principal component analysis. Biometrics 43(3):575–584, 10.2307/2531996.
- 34. Diana G, Tommasi C. 2002. Cross-validation methods in principal component analysis: a comparison. Stat Methods Appl 11(1):71–82, 10.1007/BF02511446.
- 35. Ward JH Jr. 1963. Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244, 10.1080/01621459.1963.10500845.
- 36. Zhuang LH, Chen A, Braun JM, Lanphear BP, Hu JM, Yolton K, et al. 2021. Effects of gestational exposures to chemical mixtures on birth weight using Bayesian factor analysis in the Health Outcome and Measures of Environment (HOME) Study. Environ Epidemiol 5(3):e159, 10.1097/EE9.0000000000000159.
- 37. Traoré T, Forhan A, Sirot V, Kadawathagedara M, Heude B, Hulin M, et al. 2018. To which mixtures are French pregnant women mainly exposed? A combination of the second French total diet study with the EDEN and ELFE cohort studies. Food Chem Toxicol 111:310–328, 10.1016/j.fct.2017.11.016.
- 38. Paatero P, Hopke PK, Hoppenstock J, Eberly SI. 2003. Advanced factor analysis of spatial distributions of PM2.5 in the eastern United States. Environ Sci Technol 37(11):2460–2476, 10.1021/es0261978.
- 39. Kapraun DF, Wambaugh JF, Ring CL, Tornero-Velez R, Setzer RW. 2017. A method for identifying prevalent chemical combinations in the US population. Environ Health Perspect 125(8):087017, 10.1289/EHP1265.
- 40. Stanfield Z, Addington CK, Dionisio KL, Lyons D, Tornero-Velez R, Phillips KA, et al. 2021. Mining of consumer product ingredient and purchasing data to identify potential chemical coexposures. Environ Health Perspect 129(6):67006, 10.1289/EHP8610.
- 41. Roy A, Lavine I, Herring AH, Dunson DB. 2021. Perturbed factor analysis: accounting for group differences in exposure profiles. Ann Appl Stat 15(3):1386–1404, 10.1214/20-AOAS1435.
- 42. Lee DD, Seung HS. 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401(6755):788–791, 10.1038/44565.
- 43. ATSDR (Agency for Toxic Substances and Disease Registry). 2000. Toxicological Profile for Polychlorinated Biphenyls (PCBs).
- 44. World Health Organization. 2000. WHO Fact Sheets: Dioxins and Their Effects on Human Health. https://www.who.int/news-room/fact-sheets/detail/dioxins-and-their-effects-on-human-health [accessed 7 September 2021].
- 45. Loganathan BG, Masunaga S. 2020. PCBs, dioxins, and furans: Human exposure and health effects. In: Handbook of Toxicology of Chemical Warfare Agents. 3rd ed. London, UK: Elsevier, 267–278.
- 46. IARC Working Group on the Evaluation of Carcinogenic Risks to Humans. Chemical Agents and Related Occupations. Lyon (FR): International Agency for Research on Cancer. 2012. (IARC Monographs on the Evaluation of Carcinogenic Risks to Humans, No. 100F.) https://www.ncbi.nlm.nih.gov/books/NBK304416/ [accessed 7 September 2021].
- 47. Steele G, Stehr-Green P, Welty E. 1986. Estimates of the biologic half-life of polychlorinated biphenyls in human serum. N Engl J Med 314(14):926, 10.1056/NEJM198604033141418.
- 48. Hopf NB, Ruder AM, Succop P. 2009. Background levels of polychlorinated biphenyls in the US population. Sci Total Environ 407(24):6109–6119, 10.1016/j.scitotenv.2009.08.035.
- 49. Delfino RJ, Brummel S, Wu J, Stern H, Ostro B, Lipsett M, et al. 2009. The relationship of respiratory and cardiovascular hospital admissions to the Southern California wildfires of 2003. Occup Environ Med 66(3):189–197, 10.1136/oem.2008.041376.
- 50. Karanasiou A, Moreno N, Moreno T, Viana M, De Leeuw F, Querol X. 2012. Health effects from sahara dust episodes in Europe: literature review and research gaps. Environ Int 47:107–114, 10.1016/j.envint.2012.06.012.
- 51. Kioumourtzoglou M-A, Coull BA, Dominici F, Koutrakis P, Schwartz J, Suh H. 2014. The impact of source contribution uncertainty on the effects of source-specific PM2.5 on hospital admissions: a case study in Boston. J Expo Sci Environ Epidemiol 24(4):365–371, 10.1038/jes.2014.7.
- 52. Johnson CL, Paulose-Ram R, Ogden CL, Carroll MD, Kruszan-Moran D, Dohrmann SM, et al. 2013. National Health and Nutrition Examination Survey. Analytic Guidelines 1999–2010. Vital Health Stat 2 161:1–24.
- 53. Curtin LR, Mohadjer LK, Dohrmann SM, Montaquila JM, Kruszan-Moran D, Mirel LB, et al. 2012. The National Health and Nutrition Examination Survey: sample design, 1999–2006. Vital Health Stat 2 155:1–39.