Abstract
Distributed lag models (DLMs) are often used to estimate lagged associations and identify critical exposure windows. In a simulation study of prenatal nitrogen dioxide (NO2) exposure and birth weight, we demonstrate that bias amplification and variance inflation can manifest under certain combinations of DLM estimation approaches and time-trend adjustment methods when using low-spatial-resolution exposures with extended lags. Our simulations showed that when using high-spatial-resolution exposure data, any time-trend adjustment method produced low bias and nominal coverage for the distributed lag estimator. When using either low- or no-spatial-resolution exposures, bias due to time trends was amplified for all adjustment methods. Variance inflation was higher in low- or no-spatial-resolution DLMs when using a long-term spline to adjust for seasonality and long-term trends due to concurvity between a distributed lag function and secular function of time. NO2–birth weight analyses in a Massachusetts-based cohort showed that associations were negative for exposures experienced in gestational weeks 15–30 when using high-spatial-resolution DLMs; however, associations were null and positive for DLMs with low- and no-spatial-resolution exposures, respectively, which is likely due to bias amplification. DLM analyses should jointly consider the spatial resolution of exposure data and the parameterizations of the time trend adjustment and lag constraints.
Keywords: air pollution, bias amplification, birth weight, concurvity, distributed lag models, spatial resolution, variance inflation, Z-bias
Abbreviations
- BW: birth weight
- CI: confidence or credible interval
- df: degrees of freedom
- DLM: distributed lag model
- NO2: nitrogen dioxide
- ns-DLM: natural spline–based distributed lag model
- ppb: parts per billion
- RMSE: root mean square error
- TDLM: tree-based distributed lag model
Environmental exposures often have delayed or lagged health effects. Epidemiologists commonly use distributed lag models (DLMs) to capture potentially complex lag patterns. Recent studies have used DLMs over extended lag periods (which we call “extended DLMs” throughout). For example, DLMs in air pollution and temperature time-series studies of acute health outcomes now examine lags up to 30 days (1), and those in birth outcomes research typically use an extended exposure period of up to 40 weeks to identify critical windows during pregnancy in which exposure is associated with pregnancy, birth, or children’s health outcomes (2–5). In such studies, it is typical to control for a time trend by, for example, including smooth functions of time or indicators of year and season to remove the influence of seasonality and long-term trends. Not doing so or doing so inadequately can lead to biased exposure-health associations. However, depending on the spatial resolution in the data, the DLM estimation approach, and the time trend adjustment method, extended DLMs can be prone to bias amplification and variance inflation.
DLMs can be applied to exposure data of various spatial resolutions. Common scenarios in which there could be low or no spatial resolution in the exposure data include individual-level analyses that rely on a single monitor or weather station for exposure assignment (e.g., many temperature and climate studies), studies of rare outcomes that require aggregating data to a larger spatial unit to ensure a sufficient number of outcome events (e.g., aggregating census tract data to a city/county and then conducting a time-series analysis by taking the average exposure in that geographic area to see whether changes in exposure correlate to changes in the outcome), and case-only records for which there is often information on zip codes but not exact residential addresses (e.g., hospital admissions).
In this paper, we focus on 2 perils that can occur when fitting an extended DLM to spatially unresolved exposure data: 1) bias amplification (also known as “Z-bias” in the epidemiologic and econometrics literature (6–11)) in the presence of residual confounding by time trends; and 2) variance inflation that arises from concurvity (i.e., the nonparametric analogue of multicollinearity) between a distributed lag function and a secular function of time (12, 13). We characterize the impact of these 2 issues under different spatial resolutions in the exposure data when estimating the time-varying association between prenatal exposure to nitrogen dioxide (NO2), a traffic-related emissions surrogate, and birth weight (BW). We first estimate the NO2-BW lag-response function using both a natural spline–based DLM (ns-DLM) and a tree-based DLM (TDLM) in a Massachusetts-based cohort. Then, using the lag-response relationships estimated from these data, we illustrate through simulations that bias amplification and variance inflation can manifest under certain combinations of spatial resolution, DLM estimation approach, and time trend adjustment method.
METHODS
Data analysis
Study population.
We used data on all singleton, term deliveries (≥37 weeks of gestation) at Beth Israel Deaconess Medical Center in Boston, Massachusetts, that were conceived from January 1, 2000, to December 31, 2015 (n = 71,440). We considered 3 spatial resolutions for the NO2 exposure data: high (1-km), low (county-level), and no resolution; a detailed description of how these spatial resolutions were constructed and how they were assigned can be found in Web Appendix 1 (available at https://doi.org/10.1093/aje/kwac220). We restricted our analyses to exposures during the first 37 weeks of pregnancy to have a common exposure period for all observations. Individual- and area-level variables that were considered and how they were operationalized in the analyses are described in Web Appendix 2. We excluded births with missing covariate information and those with implausible BW, which we defined as 4 standard deviations (SDs) from the mean of the cohort. Furthermore, we excluded participants who lived >20 km from Beth Israel Deaconess Medical Center because for simulation scenarios with no-spatial-resolution exposures, we needed to specify a spatial unit to average the exposure estimates. We obtained a final sample size of 46,153 participants (Web Figure 1). The institutional review boards of Harvard T. H. Chan School of Public Health and Beth Israel Deaconess Medical Center approved this study.
Analytical treatment.
We fitted 2 types of DLMs, which are described in more detail in Web Appendix 3, to the data: 1) ns-DLMs with 4 degrees of freedom (df) for the lag response, and 2) TDLMs, which use an ensemble of Bayesian regression trees that assumes a piecewise constant relationship across the lag-response space (14, 15). For both approaches, we estimated using the 3 different spatial resolutions of the exposure (high, low, and no resolution): 1) the lag-specific NO2-BW association for each gestational week, and 2) the cumulative association, which is the expected change in BW associated with a simultaneous change in NO2 at each gestational week. We considered the following 6 methods to adjust for time trends: 1) no adjustment, 2) natural spline for seasonality and long-term trends (a 64-df spline for the entire 16-year study period, which we call “long-term spline”), 3) year indicator and natural spline for seasonality (i.e., a spline term for day of the year, which we call “seasonal spline”) with 4 df, 4) year indicator and harmonics (sine-cosine pair), 5) year and month indicators, and 6) year and season indicators. To adjust for confounding by temperature, we included distributed nonlinear lags; we used 4 df for the lag constraint and 3 df for the temperature-response function to account for potential nonlinear relationships within each gestational week. Finally, all other covariates previously described were included in the model, with linear and quadratic terms used for continuous variables.
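As a rough illustration of the lag constraint described above, the sketch below builds a natural cubic spline basis over the 37 gestational weeks, projects a weekly exposure matrix onto it, and recovers lag-specific and cumulative estimates by ordinary least squares. It is not the authors' implementation (their models are described in Web Appendix 3). The function names, knot placement, synthetic exposures, and the choice to count the intercept column toward the 4 df are all assumptions, and covariates and time trend terms are omitted for brevity.

```python
import numpy as np

def natural_spline_basis(x, knots):
    """Natural cubic spline basis in truncated-power form (len(knots) columns)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    last = knots[-1]

    def d(k):
        # Scaled difference of truncated cubics; keeps the fit linear past the last knot.
        num = np.clip(x - knots[k], 0.0, None) ** 3 - np.clip(x - last, 0.0, None) ** 3
        return num / (last - knots[k])

    n_knots = len(knots)
    cols = [np.ones_like(x), x]
    cols += [d(k) - d(n_knots - 2) for k in range(n_knots - 2)]
    return np.column_stack(cols)

def fit_constrained_dlm(y, X_lag, basis):
    """OLS fit of y on weekly exposures projected onto the lag basis (no covariates)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X_lag @ basis])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    cov_theta = (sigma2 * np.linalg.pinv(design.T @ design))[1:, 1:]
    theta = coef[1:]
    lag_beta = basis @ theta                      # lag-specific associations
    lag_se = np.sqrt(np.einsum("ij,jk,ik->i", basis, cov_theta, basis))
    weights = basis.sum(axis=0)                   # sums the lag curve over all weeks
    cumulative = weights @ theta
    cumulative_se = np.sqrt(weights @ cov_theta @ weights)
    return lag_beta, lag_se, cumulative, cumulative_se

# Synthetic check with a hypothetical lag curve that lies in the span of the basis.
rng = np.random.default_rng(1)
n_weeks, n_obs = 37, 5000
lag_position = np.linspace(0.0, 1.0, n_weeks)     # weeks rescaled to [0, 1]
basis = natural_spline_basis(lag_position, knots=[0.0, 1 / 3, 2 / 3, 1.0])
theta_true = np.array([0.2, -1.0, 3.0, -2.0])     # hypothetical coefficients
beta_true = basis @ theta_true
X_lag = rng.normal(size=(n_obs, n_weeks))         # independent weekly exposures
y = 10.0 + X_lag @ beta_true + rng.normal(size=n_obs)
lag_beta, lag_se, cumulative, cumulative_se = fit_constrained_dlm(y, X_lag, basis)
```

Because the constraint reduces 37 weekly coefficients to 4 basis coefficients, the lag-specific estimates and their standard errors are linear transformations of the fitted basis coefficients, and the cumulative association is simply the sum of the lag curve.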
Simulation study
We simulated a cohort over the same 16-year study period as the Massachusetts data. To ensure realistic autocorrelation and seasonal trends in exposure, we used the same NO2 exposure as in the real-data analysis. We simulated the BW data to have the same distribution and sample size as the real data, where these simulations had 2 input parameters: 1) the NO2-BW lag response from the high-spatial-resolution ns-DLM with a long-term spline in the real-data analysis (Figure 1), and 2) time trends of BW estimated by fitting a natural spline with 4 df/year to the real data (Web Figure 2).
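The data-generating step described above can be sketched as follows, under simplifying assumptions. The source specifies only the 2 inputs (a lag response and a time trend); the additive structure, the Gaussian residuals, and every numeric value below are hypothetical stand-ins chosen to resemble the cohort loosely, not the values used in the study.

```python
import numpy as np

def simulate_birth_weight(X_lag, lag_response, trend, baseline, resid_sd, rng):
    """Baseline + time trend + lag-weighted exposure + Gaussian noise (assumed form)."""
    signal = baseline + trend + X_lag @ lag_response
    return signal + resid_sd * rng.standard_normal(X_lag.shape[0])

rng = np.random.default_rng(2)
n_obs, n_weeks = 2000, 37
conception_day = rng.uniform(0, 16 * 365.25, size=n_obs)   # 16-year study period

# Hypothetical weekly NO2 (ppb): a seasonal cycle plus noise.
week_day = conception_day[:, None] + 7.0 * np.arange(n_weeks)[None, :]
X_lag = (29.0 + 5.0 * np.cos(2 * np.pi * week_day / 365.25)
         + rng.normal(0.0, 3.0, size=(n_obs, n_weeks)))

# Hypothetical lag response (g per ppb): negative in weeks 15-30, zero elsewhere.
weeks = np.arange(1, n_weeks + 1)
lag_response = np.where((weeks >= 15) & (weeks <= 30), -0.1, 0.0)

# Hypothetical seasonal time trend in birth weight (g).
trend = 20.0 * np.sin(2 * np.pi * conception_day / 365.25)

y = simulate_birth_weight(X_lag, lag_response, trend, 3400.0, 450.0, rng)
y_noiseless = simulate_birth_weight(X_lag, lag_response, trend, 3400.0, 0.0, rng)
```

Keeping the exposure matrix fixed and regenerating only the outcome, as the study does with the observed NO2 data, preserves the autocorrelation and seasonality of the exposure across replicates.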
Figure 1.
The time-varying association between weekly nitrogen dioxide (NO2) exposure and birth weight in term deliveries at Beth Israel Deaconess Medical Center (n = 46,153), Boston, Massachusetts, 2000–2016. Estimates were made using natural spline–based distributed lag models (ns-DLMs) (A) and tree-based distributed lag models (TDLMs) (B) under scenarios with 3 different spatial resolutions for exposure (high, low, and no resolution) and 6 methods to adjust for time trends (no adjustment, long-term spline, year and spline, year and harmonics, year and month, and year and season). Black solid lines show the lag-response estimates, gray shaded areas show the 95% confidence intervals for ns-DLMs and 95% Bayesian credible intervals for TDLMs, and the black dashed line indicates the null hypothesis of no effect across all weeks. Web Figure 3 is a zoomed-out version (expanded y-axis) of the no-spatial-resolution ns-DLM with a long-term spline. g, grams; ppb, parts per billion
We fitted ns-DLMs and TDLMs to the simulated data, varying the spatial resolution of the exposure (3 levels) and the time trend adjustment method (the same 6 methods used in the real-data analyses). In total, we considered 36 combinations of data-generating scenarios and analysis models (2 types of DLMs, 3 exposure resolutions, and 6 methods to adjust for time trends), and each of these scenarios was simulated 500 times.
For each scenario, we computed average root mean square error (RMSE) and coverage for the cumulative effect, which is the expected change in BW due to a simultaneous increase of 10 parts per billion (ppb) in NO2 for every week of pregnancy. Coverage was calculated as the percentage of times the 95% confidence interval for the ns-DLM or 95% Bayesian credible interval for the TDLM estimated from simulated data contained the true cumulative effect (for simplicity, we will abbreviate both confidence and credible intervals as CIs henceforth). We also computed the coverage of the pointwise 95% CIs for the lag-specific estimates across the 37-week exposure period.
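The 2 performance metrics can be computed from replicate output along the following lines. This is a minimal sketch: the function names and the synthetic replicates are assumptions, and because the source does not spell out how the RMSE was averaged, the version here simply takes the root mean square error across replicates.

```python
import numpy as np

def rmse(estimates, truth):
    """Root mean square error of replicate estimates around the true value."""
    err = np.asarray(estimates, dtype=float) - truth
    return float(np.sqrt(np.mean(err ** 2)))

def coverage(lower, upper, truth):
    """Share of replicates whose interval contains the truth.

    For 2-D inputs (replicates x weeks) the result has one value per week,
    which gives the pointwise coverage of the lag-specific intervals.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return np.mean((lower <= truth) & (truth <= upper), axis=0)

# Hypothetical output from 500 replicates of one scenario.
rng = np.random.default_rng(3)
true_cumulative, se = -20.0, 4.0
est = rng.normal(true_cumulative, se, size=500)
cumulative_rmse = rmse(est, true_cumulative)
cumulative_coverage = coverage(est - 1.96 * se, est + 1.96 * se, true_cumulative)
```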
We also examined statistical power by calculating the probability of detecting a critical window (i.e., the percentage of simulations in which the 95% CI for at least 1 lag-specific effect excluded zero) in gestational weeks 18–24, which was the exposure window with negative lag-specific estimates in the real data (i.e., the high-spatial-resolution ns-DLM with a long-term spline in Figure 1). Furthermore, we calculated the probability of detecting a critical window over the entire 37 weeks (i.e., whether any lag-specific 95% CI excluded zero) to assess whether our simulations also identified the positive lag-specific estimates at the beginning and end of pregnancy.
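The detection probability defined above reduces to a simple check on the replicate intervals. The sketch below assumes the lower and upper 95% limits are stored as arrays with one row per replicate and one column per gestational week; the function name and the toy intervals are hypothetical.

```python
import numpy as np

def detection_probability(lower, upper, first_week=1, last_week=None):
    """Share of replicates with at least one CI excluding zero in the window.

    lower, upper: arrays of shape (replicates, weeks); weeks are 1-indexed
    and the window is inclusive of both first_week and last_week.
    """
    lower = np.asarray(lower, dtype=float)[:, first_week - 1:last_week]
    upper = np.asarray(upper, dtype=float)[:, first_week - 1:last_week]
    excludes_zero = (lower > 0) | (upper < 0)
    return float(excludes_zero.any(axis=1).mean())

# Toy example: 3 replicates, 37 weeks, all intervals spanning zero by default.
lower = np.full((3, 37), -1.0)
upper = np.full((3, 37), 1.0)
lower[0, 19], upper[0, 19] = -2.0, -0.5   # replicate 1: week 20 interval below zero
lower[1, 4], upper[1, 4] = 0.2, 1.0       # replicate 2: week 5 interval above zero

p_window = detection_probability(lower, upper, 18, 24)   # weeks 18-24 only
p_any = detection_probability(lower, upper)              # all 37 weeks
```

Note that this criterion counts an interval that excludes zero in either direction, so the whole-pregnancy version also picks up positive estimates such as those at the beginning and end of pregnancy.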
We conducted several additional analyses. First, we assessed the performance of the models in simulations using alternate data-generating mechanisms—1) no time trends or NO2 effect, 2) NO2 effect with no time trends, 3) time trends without NO2 effect—to see if inferences were robust to the data-generating process. Second, we also considered a smaller sample size of n = 1,000—generated by randomly sampling 1,000 individuals with replacement from the full simulated data set—to explore whether bias amplification and variance inflation were more pronounced in smaller data sets.
RESULTS
Data analysis
Characteristics of the Massachusetts cohort can be found in Table 1 and are described in Web Appendix 1. Figure 1 shows the NO2-BW lag responses when using ns-DLMs and TDLMs with 3 different spatial resolutions of the exposure data. The high-spatial-resolution ns-DLM showed that NO2 was associated with reduced BW in weeks 18–24 when using a long-term spline to adjust for the time trend (Figure 1). The association was strongest around the 20th week of gestation; we estimated a change in BW of −1.22 g (95% CI: −2.35, −0.09) per 10-ppb increase during week 20. When using other time trend adjustment methods (seasonal spline, harmonics, month, season), the association was strongest during the 25th week of gestation, and the magnitude of the association was similar to that when using a long-term spline. The corresponding high-spatial-resolution TDLMs produced a similar lag-response relationship, although the estimates were attenuated (Figure 1).
Table 1.
Maternal and Child Characteristics of Singleton Term Deliveries at Beth Israel Deaconess Medical Center (n = 46,153), Boston, Massachusetts, 2000–2016
| Characteristic | Mean (SD) | No. | % |
|---|---|---|---|
| Age, years | 31 (5) | | |
| Median income, $thousands | 67 (27) | | |
| Median home value, $thousands | 400 (170) | | |
| Pregnancy average NO2 exposure, ppb | 29 (6) | | |
| Pregnancy average temperature, °C | 10 (3) | | |
| Birth weight, grams | 3,400 (450) | | |
| Race/ethnicity | | | |
| White | | 22,304 | 48 |
| Black | | 6,042 | 13 |
| Asian | | 7,932 | 17 |
| Hispanic | | 2,917 | 6 |
| Other | | 6,958 | 15 |
| Education | | | |
| College or higher | | 12,771 | 28 |
| Lower than college | | 8,967 | 19 |
| Not specified | | 24,415 | 53 |
| Mode of delivery | | | |
| Vaginal | | 31,408 | 68 |
| Caesarean | | 13,288 | 29 |
| Instrumental | | 1,457 | 3 |
| Parity | | | |
| Nulliparous | | 22,835 | 49 |
| Parous | | 23,318 | 51 |
| Child sex | | | |
| Female | | 22,670 | 49 |
| Male | | 23,483 | 51 |
| Insurance | | | |
| Private | | 37,492 | 81 |
| Public/uninsured | | 8,661 | 19 |
Abbreviations: NO2, nitrogen dioxide; ppb, parts per billion; SD, standard deviation.
In low- and no-spatial-resolution DLMs, we observed lag responses that were different and 95% CIs that were wider compared with those estimated from DLMs using high-spatial-resolution exposure data (Figure 1). When using low-spatial-resolution exposure data, we found that the peak association was in the later part of pregnancy when controlling for time trends with methods other than a long-term spline in both ns-DLMs and TDLMs. However, the CIs were larger than those estimated with the high-spatial-resolution exposure data and contained the null for all lags. The exception to this is the model that controlled for time with a long-term spline. The lag-response relationship when using a long-term spline and ns-DLM identified weeks 10–19 as a critical window, with the association appearing strongest in the 14th week of gestation; for every 10-ppb increase in NO2 during that week, the mean change in BW was −3.59 g (95% CI: −6.32, −0.86). The results were similar but attenuated with TDLMs. For no-spatial-resolution DLMs, we found that associations were mostly positive for both the ns-DLM and TDLM regardless of the time trend adjustment method. For models that controlled for time trend with methods other than a long-term spline, the associations appeared strongest around the 11th week of gestation; for example, when using harmonics, for every 10-ppb increase in NO2 in week 11, the mean change in BW was 3.11 g (95% CI: 0.66, 5.57). The lag-response relationship when using a long-term spline was substantially different from those using other methods of adjustment and had much wider CIs (Figure 1; expanded y-axis shown in Web Figure 3). The association was strongest in the 25th week of gestation, and for every 10-ppb increase in NO2 during that week, the mean change in BW was 9.0 g with a very wide confidence interval indicating instability of the estimate (95% CI: −11.5, 29.6).
Simulation study
Results of the simulation study are shown in Figures 2–3 and Tables 2–3. Several scenarios produced biased estimates of the lag-specific effects (Figure 2). Models without time trend adjustment had the largest bias and lowest coverage, which was expected as these were meant to show the shape of the underlying confounding bias by time trends (i.e., upward confounding at the beginning and end of pregnancy). Any time trend adjustment method in the high-spatial-resolution DLMs produced nominal coverage and low average RMSEs (Table 2). The residual confounding by time trends was negligible when using high-spatial-resolution exposures, as shown by only minor deviation from the null on average in Figure 2. This negligible bias with high-spatial-resolution exposure data was amplified when using exposure estimates with coarser spatial resolution; that is, bias amplification was present in DLMs using low- or no-spatial-resolution exposure estimates (Figure 2; Table 2). The amplification was larger for ns-DLMs than for TDLMs. Coverage of the cumulative effect was slightly below the nominal level for low- or no-spatial-resolution ns-DLMs but was still achieved for the corresponding TDLMs (Table 2). This difference was driven by the lower coverage for lags prior to week 15 and slightly lower than nominal coverage from weeks 15–30 for the ns-DLMs (Figure 3).
Figure 2.
Simulation results, showing the time-varying association between nitrogen dioxide (NO2) exposure and birth weight, of natural spline–based distributed lag models (A) and tree-based distributed lag models (B), across all weeks using 6 methods to adjust for time trends (no adjustment, long-term spline, year and spline, year and harmonics, year and month, and year and season). Simulation inputs include time trends and NO2 effects from the Massachusetts data. Gray lines show the lag-response relationships from each iteration of the simulation, the black solid line indicates the average across all simulations (500 replicates), the black dashed line indicates the null hypothesis of no effect across all weeks, and the black dotted line indicates the true simulated lag-response relationship. g, grams; ppb, parts per billion.
Figure 3.
Coverage of lag-specific associations between nitrogen dioxide (NO2) and birth weight in simulations using natural spline–based distributed lag models (DLMs) (A) and tree-based distributed lag models (TDLMs) (B), with 6 methods to adjust for time trends (no adjustment, long-term spline, year and spline, year and harmonics, year and month, and year and season). Simulation inputs include time trends and NO2 effects from the Massachusetts data. The solid line and points indicate the proportion of simulations over 500 replicates in which the lag-specific 95% confidence interval (or credible intervals for TDLMs) contained the true simulated lag-specific effect, and the black dashed line indicates the 95% nominal coverage.
Table 2.
Simulation Results Showing Coverage and Average Root Mean Square Error Across All Simulations (500 Replicates) for the Cumulative Effect Using Natural Spline–Based Distributed Lag Models and Tree-Based Distributed Lag Models While Varying the Time Trend Adjustment Method
| Adjustment Method | ns-DLM, High Resolution: Coverage | ns-DLM, High Resolution: RMSE, g | ns-DLM, Low Resolution: Coverage | ns-DLM, Low Resolution: RMSE, g | ns-DLM, No Resolution: Coverage | ns-DLM, No Resolution: RMSE, g | TDLM, High Resolution: Coverage | TDLM, High Resolution: RMSE, g | TDLM, Low Resolution: Coverage | TDLM, Low Resolution: RMSE, g | TDLM, No Resolution: Coverage | TDLM, No Resolution: RMSE, g |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No adjustment | 0.00 | 17.48 | 0.00 | 28.38 | 0.00 | 34.08 | 0.00 | 16.64 | 0.00 | 27.54 | 0.00 | 33.02 |
| Long-term spline | 0.96 | 3.95 | 0.95 | 7.98 | 0.96 | 154.47 | 0.96 | 3.56 | 0.97 | 6.82 | 1.00 | 33.93 |
| Year and spline | 0.95 | 3.88 | 0.94 | 7.95 | 0.90 | 21.27 | 0.96 | 3.56 | 0.96 | 6.45 | 0.96 | 13.88 |
| Year and harmonics | 0.97 | 3.87 | 0.95 | 7.95 | 0.92 | 18.47 | 0.95 | 3.62 | 0.96 | 6.71 | 0.95 | 14.85 |
| Year and month | 0.94 | 3.90 | 0.95 | 7.81 | 0.95 | 19.16 | 0.96 | 3.56 | 0.95 | 6.57 | 0.95 | 13.93 |
| Year and season | 0.97 | 3.75 | 0.94 | 7.89 | 0.94 | 17.99 | 0.96 | 3.53 | 0.96 | 6.71 | 0.96 | 13.51 |
Abbreviations: DLM, distributed lag model; ns-DLM, natural spline–based distributed lag model; RMSE, root mean square error; TDLM, tree-based distributed lag model.
Table 3.
Simulation Results Under Data-Generating Mechanisms With Both Time Trends and Nitrogen Dioxide Effect Showing the Probability of Detecting a Critical Window in Weeks 18–24 and During the 37 Weeks of Pregnancy for Natural Spline–Based Distributed Lag Models and Tree-Based Distributed Lag Models While Varying the Time Trend Adjustment Method
| Adjustment Method and Period | ns-DLM, High Resolution | ns-DLM, Low Resolution | ns-DLM, No Resolution | TDLM, High Resolution | TDLM, Low Resolution | TDLM, No Resolution |
|---|---|---|---|---|---|---|
| Probability of detecting a critical window in weeks 18–24 | | | | | | |
| No adjustment | 1.00 | 0.99 | 0.99 | 0.44 | 0.24 | 0.17 |
| Long-term spline | 0.68 | 0.30 | 0.06 | 0.10 | 0.02 | 0.00 |
| Year and spline | 0.89 | 0.71 | 0.43 | 0.24 | 0.12 | 0.11 |
| Year and harmonics | 0.90 | 0.72 | 0.53 | 0.25 | 0.14 | 0.08 |
| Year and month | 0.91 | 0.78 | 0.55 | 0.25 | 0.15 | 0.08 |
| Year and season | 0.97 | 0.90 | 0.71 | 0.43 | 0.29 | 0.24 |
| Probability of detecting a critical window during the entire pregnancy | | | | | | |
| No adjustment | 1.00 | 1.00 | 1.00 | 0.97 | 1.00 | 1.00 |
| Long-term spline | 0.73 | 0.42 | 0.25 | 0.12 | 0.03 | 0.02 |
| Year and spline | 0.91 | 0.81 | 0.70 | 0.27 | 0.15 | 0.20 |
| Year and harmonics | 0.92 | 0.81 | 0.77 | 0.29 | 0.20 | 0.18 |
| Year and month | 0.92 | 0.85 | 0.79 | 0.29 | 0.19 | 0.17 |
| Year and season | 0.98 | 0.93 | 0.87 | 0.52 | 0.37 | 0.32 |
Abbreviations: DLM, distributed lag model; ns-DLM, natural spline–based distributed lag model; TDLM, tree-based distributed lag model.
Importantly, ns-DLMs fitted to low- or no-spatial-resolution exposure data with a long-term spline to control for time trends produced an overall unbiased estimate with nominal coverage, but the lag-specific and cumulative effect estimates had much higher variance than with other adjustment methods or when there was more spatial resolution in the exposure data (Figure 2; Table 2). For example, the average RMSE for an ns-DLM with a long-term spline was 3.95, 7.98, and 154.47 g when using high-, low-, and no-spatial-resolution exposure data, respectively (Table 2). Estimates from TDLMs were more stable than those from ns-DLMs (Figure 2; Table 2).
We also observed that DLMs using high-spatial-resolution exposure data were more likely to identify critical windows in gestational weeks 18–24 and during the entire pregnancy compared with those using low- or no-spatial-resolution exposure data, with ns-DLMs possessing more statistical power than TDLMs (Table 3). Among the time trend adjustment methods, the long-term spline was less likely to detect the critical window compared with the other methods, whereas the year-and-season-indicator method performed relatively well (Table 3).
Results under the 3 alternate data-generating mechanisms (no time trends or NO2 effect; NO2 effect with no time trends; and time trends without NO2 effect) were similar to the primary simulations and can be found in Web Figures 4–7 and Web Tables 1–2. When using low- or no-spatial-resolution exposures, bias due to time trends (if present) was amplified. Furthermore, variance inflation due to concurvity was higher in low- or no-spatial-resolution DLMs when using a long-term spline to adjust for seasonality and long-term trends (regardless of whether there were time trends or NO2 effects). Overall, in concordance with our primary simulations, we found that high-spatial-resolution data produced low bias and nominal coverage of the distributed lag estimator irrespective of the data-generating mechanism (Web Figures 4–7; Web Tables 1–2).
For simulations using smaller data sets (n = 1,000), we found that bias was larger compared with our main simulations (Web Figures 8–12; Web Tables 3–4). We observed the same pattern of bias and coverage when comparing across spatial resolutions and DLM constraints. That is, models that used low- or no-spatial-resolution exposure data produced more bias and lower coverage compared with those with higher-spatial-resolution data, and RMSEs were larger for ns-DLMs compared with TDLMs (Web Figures 8–12; Web Table 3). Furthermore, we observed that no-spatial-resolution DLMs with a long-term spline produced highly unstable estimates (more so than those in the main simulation). Although statistical power was lower than in our main simulations, ns-DLMs with higher spatial resolution of the exposure were still more likely to identify critical windows compared with those that used coarser exposure data; however, TDLMs were not able to detect the critical window in these smaller data sets (Web Table 4).
DISCUSSION
While DLMs were initially popularized in air pollution time-series studies looking at daily exposures up to a week prior to acute outcomes (e.g., mortality or cardiovascular hospital admissions) (16–19), recent studies have used DLMs over longer periods of time. This has been particularly true for studies of temperature effects, where 21-day lags have been common (20). However, depending on the spatial resolution, the DLM estimation approach, and the method to control for time trends, these extended DLMs may be vulnerable to bias amplification and variance inflation, which can compromise our ability to identify biologically relevant exposure windows. Here, we demonstrate these issues using the example of NO2 and BW through a simulation study and in a real-data analysis.
Our simulation study showed that the magnitude of the bias due to time trends was a function of the spatial resolution of the exposure data. We showed that if we estimate DLMs with high-spatial-resolution exposure data, then any of the methods we considered to adjust for confounding by time trends will lead to low bias and nominal coverage. We showed that with increasing coarseness of the exposure data, the bias was amplified (bias amplification was higher in no-spatial-resolution models compared with those using low-spatial-resolution exposures, which in turn had more bias than high-spatial-resolution models). This is because the estimate of an individual’s NO2 exposure is influenced primarily by 4 components: spatial variability (e.g., exposure differences across geographic areas), short-term temporal variability (e.g., day-to-day changes), variation due to longer-term time trends (i.e., seasonality and long-term trends), and random error. By decreasing the spatial component of the exposure, time trends would explain a larger fraction of the remaining variation in the exposure estimate. Thus, any residual confounding by time trends would manifest as a larger bias compared with scenarios using exposures with higher spatial resolution (i.e., the bias is amplified). This phenomenon is akin to previous manifestations of “Z-bias” (6–11), but the scenarios are dissimilar in that bias amplification here would arise from either the exposure assessment method (e.g., single-monitor studies) or the study design choice (e.g., time-series) rather than from the statistical analysis implemented by study investigators.
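The variance-decomposition argument above can be illustrated with a deliberately simplified, single-exposure analogue (not a DLM, and not the authors' simulation). Assume the exposure is the sum of independent spatial, residual time-trend, and noise components, and that the outcome depends on the residual time-trend component that the model fails to adjust for. The large-sample bias of the unadjusted slope is then the confounder effect multiplied by the share of exposure variance attributable to the trend, so removing spatial variance enlarges the bias. All names and variances below are hypothetical.

```python
import numpy as np

def residual_confounding_bias(gamma, var_spatial, var_trend, var_noise):
    """Large-sample OLS bias when the trend component is left unadjusted."""
    return gamma * var_trend / (var_spatial + var_trend + var_noise)

rng = np.random.default_rng(4)
n = 200_000
beta, gamma = 0.0, 1.0                    # hypothetical: no true exposure effect
trend = rng.standard_normal(n)            # residual (unadjusted) time-trend component
noise_x = rng.standard_normal(n)          # short-term variation and error in exposure
noise_y = rng.standard_normal(n)

bias = {}
for label, var_spatial in [("high", 9.0), ("low", 1.0), ("none", 0.0)]:
    spatial = np.sqrt(var_spatial) * rng.standard_normal(n)
    x = spatial + trend + noise_x
    y = beta * x + gamma * trend + noise_y
    slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    # Store (simulated bias, closed-form bias) for each resolution.
    bias[label] = (slope - beta, residual_confounding_bias(gamma, var_spatial, 1.0, 1.0))
```

With these assumed variances, the closed-form bias is 1/11 with high spatial variance, 1/3 with low spatial variance, and 1/2 with none, even though the confounder and its effect on the outcome are identical in all 3 settings.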
We also observed in our simulations that low- or no-spatial-resolution DLMs with a long-term spline were prone to variance inflation, with coarser resolutions leading to higher instability of the estimates. This is due to concurvity in that the spline basis functions for time trend and the lag dimension of the ns-DLM are too similar. The issue persisted, just to a lesser extent (estimates were closer to the null and 95% CIs were narrower), when using TDLMs. This is partially due to lower concurvity because of the piecewise constant parameterization and because the TDLM approach employs a shrinkage estimator with a Bayesian prior (14). All other methods to adjust for time trends did not suffer from variance inflation as they did not rely on a similar spline parameterization as the lag dimension of the DLM. In our simulations with smaller data sets (n = 1,000), we showed that the degree of variance inflation is affected by sample size, with more-pronounced impact in small studies—the RMSE for the cumulative association estimated by ns-DLMs with a long-term spline was almost 1 kg.
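One informal way to gauge concurvity of this kind is to regress each distributed lag term on the time trend basis and inspect the resulting R². The sketch below does this on synthetic data. It is an illustration only: the linear "hat" basis with 4 knots per year stands in for the 64-df long-term natural spline, the 2-column lag basis stands in for the ns-DLM lag constraint, and the exposure series and variances are hypothetical.

```python
import numpy as np

def r_squared_on_basis(Z, T):
    """R^2 from regressing each column of Z on the columns of T plus an intercept."""
    design = np.column_stack([np.ones(len(T)), T])
    fitted = design @ np.linalg.lstsq(design, Z, rcond=None)[0]
    ss_res = ((Z - fitted) ** 2).sum(axis=0)
    ss_tot = ((Z - Z.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(5)
n_obs, n_lags, n_years = 4000, 37, 16
total_weeks = 52 * n_years + n_lags
week = np.arange(total_weeks)

# Hypothetical area-wide weekly exposure series: seasonal cycle plus noise.
series = 29.0 + 5.0 * np.cos(2 * np.pi * week / 52) + rng.normal(0.0, 2.0, total_weeks)
conception_week = rng.integers(0, 52 * n_years, size=n_obs)
shared = series[conception_week[:, None] + np.arange(n_lags)[None, :]]

# Stand-in long-term trend basis: linear hat functions, 4 knots per year (assumed).
t_years = conception_week / 52.0
knots = np.linspace(0.0, n_years, 4 * n_years + 1)
T = np.clip(1.0 - np.abs(t_years[:, None] - knots[None, :]) / 0.25, 0.0, None)

# Stand-in lag basis: constant and linear in lag (assumed).
lag_basis = np.column_stack([np.ones(n_lags), np.linspace(0.0, 1.0, n_lags)])

location_effect = rng.standard_normal((n_obs, 1))   # person-specific spatial offset
r2 = {}
for label, spatial_sd in [("high", 3.0), ("none", 0.0)]:
    X_lag = shared + spatial_sd * location_effect
    r2[label] = r_squared_on_basis(X_lag @ lag_basis, T)
```

When there is no spatial variation, every distributed lag term is a smooth function of conception date alone, so a flexible function of time reproduces it almost exactly; this near-collinearity is what inflates the variance of the lag estimates.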
In the analysis of the Massachusetts data, we showed that high-spatial-resolution ns-DLMs and TDLMs using several time trend adjustment methods identified the same wide critical window of a negative association for weeks 15–30. We are confident in this finding for 2 reasons. First, our simulations showed that if we use high-spatial-resolution exposures in our models, then any of the time trend adjustment methods lead to low bias and nominal coverage for both ns-DLM and TDLM approaches. Second, these weeks are biologically sensitive windows for fetal development. Placental blood flow stabilizes at around the 16th week (21, 22); thus, around this time, the fetus may be more susceptible to the oxidative stress burst due to increased NO2 exposure (23, 24). Furthermore, there is evidence that NO2 can induce acute placental and pulmonary inflammation (25), which can interfere with transplacental nutrient exchange and deprive the fetus of adequate nutrition throughout pregnancy (hence the wide critical window).
Models with low- and no-spatial-resolution exposure data fitted to the Massachusetts data produced different lag-specific estimates with wider 95% CIs compared with those using high-spatial-resolution exposures. From comparing the high-spatial-resolution DLMs without any adjustment with those that did adjust for time trends, we deduced that the confounding bias due to time trends was upward in this setting. Thus, our findings of null and positive associations for DLMs with low- and no-spatial-resolution exposures, respectively, were in line with our simulations. This is because the upward bias would be amplified for both resolutions, but more so for the no-spatial-resolution models as a higher proportion of the exposure variation can be explained by time trends. Because bias amplification here arises from the underlying data structure, no parameterizations of either the lag dimension or the time trend adjustment were immune from it. That is, unlike scenarios involving instrumental variables (6–10), we could not eliminate the issue by choosing to not condition on the instrument. We also observed high levels of variance inflation in models with a long-term spline (estimates appeared more unstable for no-spatial-resolution DLMs) as both the time trend and the lag constraint for the 37-week exposure window were parameterized similarly. We would expect lower concurvity if we looked at trimester-average NO2 exposures because the time trend and distributed lag function operate on different temporal scales (but there is a trade-off in reducing the temporal scale of the exposure in this manner as it can lead to other types of biases (2)). By the same logic, we would expect to see low concurvity for exposures and outcomes that follow different trends and/or for models with different length lags (e.g., DLM analyses looking at daily exposures for acute outcomes with a short lag period such as one week (18)). 
The importance of scale described here is similar to that previously discussed for spatial confounding (26, 27), in that a different scale for the exposure compared with the spatial confounder will improve the stability of the estimate. This is because if both exist on the same scale, then their effects cannot be disentangled, which can lead to computational instability. From a causal inference perspective, this is known as a violation of the positivity assumption (27).
Based on our simulations, we recommend the use of DLM analyses that can leverage high-spatial-resolution exposure data to identify critical windows. Either ns-DLMs or TDLMs are suitable, as they performed similarly in settings with low levels of concurvity. TDLMs may be preferable when we do not know the true underlying lag pattern a priori because they make fewer assumptions on the shape of the lag function, but only if there is sufficient statistical power to conduct the analysis (e.g., using data from large administrative data sets). When high-spatial-resolution data are unavailable or infeasible to obtain for a study design, our simulations showed that a year indicator in combination with harmonics, month indicator, or season indicator all performed well, with the year-and-season indicator performing best in this context, perhaps because of the very periodic pattern of seasonality within each year. Yet this may not always be the case, as it likely depends on the time trends of the exposure and outcome, as well as how the model is constructed and parameterized. None of the models are immune from bias amplification (as it is a function of the data structure), but there are several steps that can be taken to assess whether bias amplification is a threat to internal validity. One option is to assess the seasonality and long-term trends of the exposure and outcome. If no time trend is obvious, then bias amplification is unlikely to be an issue as there is no confounding by time trends (i.e., there is no bias to amplify). However, if there is an underlying time trend in the outcome, then researchers could repeat the analyses in settings with different seasonal patterns and long-term trends, if such data are available. If the lag-specific estimates are consistent across different time trends (i.e., the confounding structure likely varies across the data sets), then it would be compelling evidence that the correct window has been identified.
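A quick way to carry out the suggested check for seasonality and long-term trends is to regress the exposure and the outcome, separately, on simple functions of time and look at how much variance they explain. The sketch below uses one sine-cosine pair plus a linear trend; this particular specification, the function name, and the synthetic series are assumptions rather than a prescribed diagnostic.

```python
import numpy as np

def time_trend_r_squared(values, day, period=365.25):
    """R^2 from regressing a series on one sine-cosine pair plus a linear trend."""
    values = np.asarray(values, dtype=float)
    day = np.asarray(day, dtype=float)
    angle = 2 * np.pi * day / period
    design = np.column_stack(
        [np.ones_like(day), np.sin(angle), np.cos(angle), day / period]
    )
    fitted = design @ np.linalg.lstsq(design, values, rcond=None)[0]
    ss_res = np.sum((values - fitted) ** 2)
    ss_tot = np.sum((values - values.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical series over a 16-year period: one seasonal, one without any trend.
rng = np.random.default_rng(6)
day = rng.uniform(0, 16 * 365.25, size=3000)
seasonal_series = 29.0 + 6.0 * np.cos(2 * np.pi * day / 365.25) + rng.normal(0.0, 3.0, 3000)
flat_series = 29.0 + rng.normal(0.0, 3.0, 3000)

r2_seasonal = time_trend_r_squared(seasonal_series, day)
r2_flat = time_trend_r_squared(flat_series, day)
```

An R² near zero for the outcome would suggest little confounding by time trends to amplify, whereas a large value for both exposure and outcome signals that the choice of adjustment method deserves closer scrutiny.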
In smaller studies, we recommend caution when implementing DLM analyses, as our simulations showed that DLMs with extended lags can produce imprecise and highly variable estimates. In such studies, it may be better to use moving average models (28), which, although they may misspecify the lag pattern, would yield more stable estimates.
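A moving average model replaces the vector of lag-specific exposures with its mean over a chosen window, which is equivalent to constraining all lag coefficients within that window to be equal. The sketch below illustrates the exposure construction only; the window boundaries, the weekly values, and the function name are hypothetical.

```python
def moving_average_exposure(weekly_exposure, start_week, end_week):
    """Mean exposure over gestational weeks start_week..end_week (1-indexed, inclusive)."""
    if not 1 <= start_week <= end_week <= len(weekly_exposure):
        raise ValueError("window must lie within the observed exposure period")
    window = weekly_exposure[start_week - 1:end_week]
    return sum(window) / len(window)


# Hypothetical weekly NO2 concentrations (ppb) for one pregnancy, weeks 1-37.
weekly_no2 = [20.0 + (week % 4) for week in range(37)]

# One coefficient per window instead of one per week (windows are assumptions).
first_window = moving_average_exposure(weekly_no2, 1, 13)
second_window = moving_average_exposure(weekly_no2, 14, 26)
whole_period = moving_average_exposure(weekly_no2, 1, 37)
```

Each averaged exposure would then enter the outcome model as a single covariate, trading flexibility in the lag pattern for precision.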
A key strength of this work is that we had high-spatial-resolution exposure data to create 3 exposure resolutions for the analyses, which allowed us to explore the issues of bias amplification and concurvity under several scenarios. We also had a large data set with rich covariate data to conduct a real-data analysis alongside our simulation study. However, some notable limitations should also be mentioned. First, our illustrative example of NO2 and BW in Massachusetts had specific time trends of NO2 and BW. Research questions using different exposures and/or outcomes, examining a different number of windows or windows of different duration, or conducted in settings with different seasonal and long-term trends could be affected by bias amplification and concurvity differently. Second, we only considered 2 formulations of the DLM, the ns-DLM and the TDLM; however, there are other DLM approaches that could have been used, including unconstrained DLMs, other penalization approaches, or a cross-basis for nonlinear effects. Although we did not explore these other approaches here, they are also expected to experience concurvity issues if the parameterizations of the time trend and the lag constraint are similar, as well as bias amplification, since this is a function of the underlying data structure. Furthermore, model misspecification could have induced bias in the real-data analysis. That is, we assumed linear lag-specific associations when they could potentially be nonlinear; however, this was done to simplify our simulation study and make the corresponding results easier to interpret (the simulation study did not suffer from such misspecification, as the true effect was simulated to be linear). Finally, our simulations were based on the lag-response relationship estimated from the real-data analysis, and so fitting the same analysis models to the simulated data produced performance metrics (i.e., RMSE and coverage) that were likely optimistic. However, we chose to use this lag-response curve because we wanted to assess how other time trend adjustment methods performed if we used the model with the long-term spline as the reference (i.e., the model that generated the simulated data). Interestingly, the model that matches the data-generating mechanism (i.e., the model with the long-term spline) is not the model that we recommend (harmonics or indicators for month/season performed better).
In conclusion, we extend the discussions of concurvity typically presented within the context of generalized additive models (GAMs) (12, 13) and spatial confounding (26, 27), and we present a previously undiscussed manifestation of bias amplification. The discussions we present here are not intended to dissuade investigators from conducting DLM analyses using low-spatial-resolution exposure data but rather to point out several caveats when approaching and interpreting such analyses and to provide guidance on best practices for controlling for time trends. Ultimately, the spatial resolution, time trend adjustment method, and parameterization of the lag constraint should all be jointly considered in DLM analyses.
ACKNOWLEDGMENTS
Author affiliations: Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts (Michael Leung, Michele R. Hacker, Joel Schwartz, Marc G. Weisskopf); Department of Environmental Health, Harvard T. H. Chan School of Public Health, Boston, Massachusetts (Michael Leung, Brent A. Coull, Joel Schwartz, Marc G. Weisskopf); Department of Environmental Health Sciences, Columbia University Mailman School of Public Health, New York City, New York (Sebastian T. Rowland, Marianthi-Anna Kioumourtzoglou); Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts (Brent A. Coull); Department of Obstetrics and Gynecology, Beth Israel Deaconess Medical Center, Boston, Massachusetts (Anna M. Modest, Michele R. Hacker); Department of Obstetrics, Gynecology and Reproductive Biology, Harvard Medical School, Boston, Massachusetts (Anna M. Modest, Michele R. Hacker); and Department of Statistics, Colorado State University, Fort Collins, Colorado (Ander Wilson).
This work was supported by the National Institute of Environmental Health Sciences (grants P30 ES009098, P30 ES000002, R01 ES030616, R01 ES029943, and R01 AG065276). This publication was also made possible in part by support from the Environmental Protection Agency (grant RD-835872-01).
The temperature data are available from Phase 2 of the North American Land Data Assimilation System at the NASA Earth Sciences Data and Information Services Center. ZIP Code Tabulation Area-level annual median household income and median value of owner-occupied housing units are available from the 2000 and 2010 US Census and the American Community Survey. The NO2 data are available from the corresponding author on reasonable request. To protect the privacy of participants, the Beth Israel Deaconess Medical Center data set with individual-level covariates cannot be shared.
Presented at the 2022 annual meeting of the International Society for Environmental Epidemiology, September 18–21, 2022, Athens, Greece.
The views expressed in this article are those of the authors and do not represent the official views of the National Institutes of Health or the US Environmental Protection Agency. Further, the US Environmental Protection Agency does not endorse the purchase of any commercial products or services mentioned in the publication.
Conflict of interest: none declared.
REFERENCES
1. He MZ, Kinney PL, Li T, et al. Short- and intermediate-term exposure to NO2 and mortality: a multi-county analysis in China. Environ Pollut. 2020;261:114165.
2. Wilson A, Chiu Y-HM, Hsu H-HL, et al. Potential for bias when estimating critical windows for air pollution in children’s health. Am J Epidemiol. 2017;186(11):1281–1289.
3. Kioumourtzoglou MA, Raz R, Wilson A, et al. Traffic-related air pollution and pregnancy loss. Epidemiology. 2019;30(1):4–10.
4. Jahn JL, Krieger N, Agénor M, et al. Gestational exposure to fatal police violence and pregnancy loss in US core based statistical areas, 2013–2015. EClinicalMedicine. 2021;36:100901.
5. Huang Y, Kioumourtzoglou M-A, Mittleman MA, et al. Air pollution and risk of placental abruption: a study of births in New York City, 2008–2014. Am J Epidemiol. 2021;190(6):1021–1033.
6. Heckman J, Navarro-Lozano S. Using matching, instrumental variables, and control functions to estimate economic choice models. Rev Econ Stat. 2004;86(1):30–57.
7. Bhattacharya J, Vogt W. Do instrumental variables belong in propensity scores? National Bureau of Economic Research Technical Working Paper Series, no. 343. https://www.nber.org/papers/t0343. Accessed March 3, 2022.
8. Middleton JA, Scott MA, Diakow R, et al. Bias amplification and bias unmasking. Polit Anal. 2016;24(3):307–323.
9. Wooldridge JM. Should instrumental variables be used as matching variables? Res Econ. 2016;70(2):232–237.
10. Ding P, Vanderweele TJ, Robins JM. Instrumental variables as bias amplifiers with general outcome and confounding. Biometrika. 2017;104(2):291–302.
11. Weisskopf MG, Seals RM, Webster TF. Bias amplification in epidemiologic analysis of exposure to mixtures. Environ Health Perspect. 2018;126(4):047003.
12. Ramsay TO, Burnett RT, Krewski D. The effect of concurvity in generalized additive models linking mortality to ambient particulate matter. Epidemiology. 2003;14(1):18–23.
13. Ramsay T, Burnett R, Krewski D. Exploring bias in a generalized additive model for spatial air pollution data. Environ Health Perspect. 2003;111(10):1283–1288.
14. Mork D, Wilson A. Treed distributed lag nonlinear models. Biostatistics. 2022;23(3):754–771.
15. Mork D, Wilson A. Estimating perinatal critical windows of susceptibility to environmental mixtures via structured Bayesian regression tree pairs [published online ahead of print September 25, 2021]. Biometrics. (10.1111/biom.13568).
16. Schwartz J. The distributed lag between air pollution and daily deaths. Epidemiology. 2000;11(3):320–326.
17. Zanobetti A, Wand MP, Schwartz J, et al. Generalized additive distributed lag models: quantifying mortality displacement. Biostatistics. 2000;1(3):279–292.
18. Zanobetti A, Schwartz J, Dockery DW. Airborne particles are a risk factor for hospital admissions for heart and lung disease. Environ Health Perspect. 2000;108(11):1071–1077.
19. Zanobetti A, Schwartz J, Samoli E, et al. The temporal pattern of respiratory and heart disease mortality in response to air pollution. Environ Health Perspect. 2003;111(9):1188–1193.
20. Gasparrini A, Guo Y, Hashizume M. Mortality risk attributable to high and low ambient temperature: a multicountry observational study. Lancet. 2015;386(9991):369–375.
21. Jauniaux E, Watson AL, Hempstock J, et al. Onset of maternal arterial blood flow and placental oxidative stress: a possible factor in human early pregnancy failure. Am J Pathol. 2000;157(6):2111–2122.
22. Jauniaux E, Watson A, Burton G. Evaluation of respiratory gases and acid-base gradients in human fetal fluids and uteroplacental tissue between 7 and 16 weeks’ gestation. Am J Obstet Gynecol. 2001;184(5):998–1003.
23. Kelly F. Oxidative stress: its role in air pollution and adverse health effects. Occup Environ Med. 2003;60(8):612–616.
24. Romieu I, Castro-Giner F, Kunzli N, et al. Air pollution, oxidative stress and dietary supplementation: a review. Eur Respir J. 2008;31(1):179–197.
25. Bobak M. Outdoor air pollution, low birth weight, and prematurity. Environ Health Perspect. 2000;108(2):173–176.
26. Paciorek CJ. The importance of scale for spatial-confounding bias and precision of spatial regression estimators. Stat Sci. 2010;25(1):107–125.
27. Gilbert B, Datta A, Ogburn E. Approaches to spatial confounding in geostatistics [preprint]. arXiv. 2021. 10.48550/arxiv.2112.14946. Accessed March 3, 2022.
28. Gasparrini A. Modelling lagged associations in environmental time series data: a simulation study. Epidemiology. 2016;27(6):835–842.