Abstract
Fetal alcohol spectrum disorders (FASD) are thought to be the leading cause of developmental disabilities worldwide. Accurate estimates of the prevalence of FASD have been lacking. An improved estimate of FASD prevalence in the U.S. was recently reported in a study where multistage methods of active case ascertainment of first-grade children with FASD were utilized in four regions in the U.S. Each method relied on parental consent and therefore had potential non-response bias. We consider weighted approaches, where the weights were formed using the distribution of observed variables in the population from which the samples were drawn. However, there are likely other unobserved variables that affect both non-response and FASD outcome. We describe sensitivity analyses using methodology developed for causal inference. The results show feasible regions of FASD prevalence under certain assumptions, and provide a framework for explaining the non-response bias due to the unobserved variables.
Keywords: Causal inference, Feasible lower bound, Inverse probability weighting, Non-response, Sampling bias, Unobserved confounding
INTRODUCTION
Fetal alcohol exposure occurs when a woman drinks alcohol while pregnant. Data from prenatal clinics and postnatal studies suggest that 20 to 30 percent of women consume alcohol at some time during pregnancy (1). Alcohol can negatively influence fetal development at any stage during a pregnancy, including at the earliest stages in gestation, often before a woman knows she is pregnant (2). Alcohol consumption during pregnancy can affect the developing embryo or fetus, leading to specific facial features, growth deficiencies, and a range of characteristic cognitive and behavioral problems that may become apparent at any time during childhood. Fetal alcohol spectrum disorders (FASD) is the umbrella term for the spectrum of effects of prenatal alcohol exposure, which include fetal alcohol syndrome (FAS), partial fetal alcohol syndrome (pFAS), alcohol-related neurodevelopmental disorder (ARND), and alcohol-related birth defects (ARBD). The estimated lifetime cost (for example, lost productivity, housing, medical services) for one individual diagnosed with FAS, the most severe end of the spectrum, is 2.9 million U.S. dollars (3); and the cumulative costs of care for all FAS-affected individuals have been estimated at 6 billion U.S. dollars annually (4).
Primary care physicians often lack sufficient training or confidence to ascertain accurate prenatal alcohol exposure histories from women or to recognize the clinical features of FASD in children. This results in the failure of referral of FASD-affected individuals for appropriate evaluation and treatment, and under-reporting of the disorder. The magnitude of the potential cost of FASD for affected individuals and families substantiates the critical public health significance of the disorder. For these reasons, more accurate U.S. estimates of the prevalence of FASD are needed in order to establish a baseline against which to measure the effectiveness of prevention and intervention programs. Accurate estimates are also needed to better inform the need for specific services and interventions to ultimately improve health outcomes for families and children.
The Collaboration on FASD Prevalence (CoFASP) research consortium, funded by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) beginning in September 2010, sought to establish the prevalence of FASD among school-aged children in several U.S. communities, using active case ascertainment methodology. The primary findings of the CoFASP consortium, recently reported in May et al. (2018) (5), were conservatively estimated prevalences of FASD ranging from 1.1% to 5.0%.
The study design used by CoFASP was an epidemiological survey of children in the first grade in school in the general population of each of four U.S. communities, where case ascertainment, i.e. classification of FASD, required complex multistage procedures. Depending on the site, these steps included: seeking parental permission to contact, consent to and completion of screening of children using growth measures and/or a developmental questionnaire, and consent to and completion of a comprehensive evaluation for physical features of FASD, child neurodevelopment, and a maternal interview to capture prenatal exposure to alcohol. Each stage of the procedure reduced the number of children who eventually provided sufficient data to be classified as having an FASD or not, leaving the remainder as “non-response” with respect to the primary outcome of interest. This non-response portion ranged from 49 to 81 percent across the eight samples (5).
Weighted approaches are routinely used in public health surveillance, including surveys conducted by the Centers for Disease Control and Prevention (CDC) as well as state health departments, to account for stratified design, oversampling, non-response, and non-coverage. Weighting each record appropriately can help to adjust for the differences between the sample and the target population. Weighting the observed values by inverse probabilities of selection was initially proposed by Horvitz and Thompson (6) and has since been widely used in missing data problems and causal inference (see, for example, Robins et al and Hogan et al) (7, 8).
Forming weights for estimation of the prevalence of FASD requires that data be available on the distribution of the relevant variables in the target population, that is, variables that predict non-response as well as FASD outcome; some examples are specific patterns of alcohol use in pregnancy, socioeconomic status, maternal age, parity, and race/ethnicity. However, the availability of such a distribution is limited in practice. The implication of such limitations is that we are very likely to have uncontrolled “confounding” when using the available variables to form weights to estimate FASD prevalence rates, and therefore sensitivity analysis is imperative. Here the word “confounding” is borrowed from the causal inference literature, and we elaborate on this further in the Methods Section below.
In this analysis we sought to address the potential non-response bias in the estimation of FASD prevalence with weighted approaches and a sensitivity analysis, using as an example one of the regions in the CoFASP study to derive a feasible lower bound for the prevalence of FASD in that population.
METHODS
Study design and samples
The design of the parent study was described in May et al. (2018).(5) First grade children with parental consent were recruited from public and private elementary schools in four geographic locations in the United States using a cross-sectional design over two academic years. In order to construct the weighted estimates of the FASD prevalence rate, it was necessary to have 1) the distribution of selected variables in the target population such as maternal alcohol use, and 2) these same variables having been captured in the sample data. To illustrate the sensitivity analysis approach, two samples were selected from the CoFASP primary outcome dataset: the Pacific Southwestern City site for the 2012-2013 and 2013-2014 academic years.
Inverse-probability weighting and sensitivity analysis
In the following, suppose that we have a random sample of units i = 1, … , N from an infinite population. Let Yi be the primary outcome of interest; in our case Yi = 1 or 0 indicating diagnosis ‘yes’ or ‘no’ FASD (or sub-categories as appropriate) for child i. Let Ri = 1 or 0 be the response indicator of whether a child has received an FASD (or subcategory) diagnosis; i.e. whether Yi is observed. In our case only those children who completed the comprehensive evaluation and received classification as FASD ‘yes’ (including subcategories FAS, pFAS and ARND) or ‘no’ are responders. Let Xi be a vector of observed covariates such as race/ethnicity, maternal alcohol use, etc. Our goal is to obtain an (asymptotically) unbiased estimate of the population mean E(Y). If we can assume that Yi and Ri are conditionally independent given Xi, then the response (or equivalently, missingness) mechanism is said to satisfy strong ignorability (9), also referred to as ‘no uncontrolled confounding’, which implies that the missing Yi’s are missing at random (MAR) (10). Under this assumption, if we can use Xi to predict the probabilities of response, then inverse-probability weighting (IPW) can be used to provide a consistent estimate of the population mean, i.e. prevalence in this case. We note that the MAR assumption cannot be tested using the observed data, and we will base our sensitivity analysis on various departures from this assumption.
MCAR estimate
The primary finding for the estimates in May et al. (2018)(5) assumed that data were missing completely at random (MCAR), that is, children in the eligible sample who were not evaluated had the same prevalence of FASD as those who completed the evaluation within each subpopulation. This included children not consented or those who did not complete all components of the evaluation. The subpopulations were then combined using the known weights; see May et al. (2018)(5) for more details.
MAR estimate
This estimate assumes that the outcome is missing at random (MAR). It uses IPW with weights based on the propensity of response estimated using X. More specifically, the MAR estimate is obtained using (∑wiRiYi)/N, where wi = 1/P(Ri = 1|Xi). In our case, since X is categorical (it can itself be a combination of several categorical variables with all possible interactions, as described below), the propensity of response P(Ri = 1|Xi) is estimated by the observed frequency of response within each category of the distribution of X; this is equivalent to fitting a so-called saturated parametric propensity model.
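The MAR estimate described above can be sketched in a few lines. This is a minimal illustration only, not the code used in the study: the function name `ipw_prevalence`, the record layout `(x, r, y)`, and the toy counts are all hypothetical.

```python
from collections import defaultdict


def ipw_prevalence(records):
    """IPW estimate (sum_i w_i R_i Y_i) / N with a saturated propensity model.

    records: list of (x, r, y) tuples, where x is a categorical covariate,
    r is 1 for responders and 0 otherwise, and y is ignored when r == 0.
    """
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for x, r, _ in records:
        n_total[x] += 1
        n_resp[x] += r
    # Saturated model: P(R = 1 | X = x) is the observed response frequency.
    propensity = {x: n_resp[x] / n_total[x] for x in n_total}
    weighted_sum = 0.0
    for x, r, y in records:
        if r == 1:
            weighted_sum += y / propensity[x]  # w_i = 1 / P(R_i = 1 | X_i)
    return weighted_sum / len(records)


# Hypothetical toy data: 10 children in category "yes" (5 responders, 2 cases)
# and 30 children in category "no" (10 responders, 1 case).
toy = (
    [("yes", 1, 1)] * 2 + [("yes", 1, 0)] * 3 + [("yes", 0, None)] * 5
    + [("no", 1, 1)] * 1 + [("no", 1, 0)] * 9 + [("no", 0, None)] * 20
)
print(ipw_prevalence(toy))
```

In this toy example the unweighted responder prevalence is 3/15, while the weighted estimate down-weights the over-represented "yes" category. A category with no responders at all would contribute nothing to the sum, so the sketch is only sensible when every category of X contains at least one responder.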
For the distribution of X in the target population, maternal race/ethnicity and alcohol use were obtained from the 2005-2007 California Maternal and Infant Health Assessment (MIHA) for the Pacific Southwestern City. The MIHA survey was conducted postnatally among recent mothers who were randomly sampled for participation from birth certificates in those years (11). More specifically, “Any Alcohol Use” in our CoFASP samples was defined as any alcohol consumption reported during pregnancy, and in MIHA was defined as any alcohol consumption during the first or third trimester.
Sensitivity analysis
In order to carry out the sensitivity analysis, we followed the setup of Shen et al (12) for causal inference of a treatment effect. While we do not have a treatment under concern, it is well known that under the potential outcomes framework for causal inference, within each treatment group and the corresponding population, the observed treatment assignment is equivalent to our response indicator here, and the average treatment effect is the difference between the two population means that one needs to estimate. The MAR assumption is equivalent to no unobserved confounding.
Following Shen et al (12), we assess the sensitivity of the MAR estimate obtained using IPW to an unmeasured ‘confounder’, denoted U. It is assumed that Y and R are conditionally independent given X and U. The following parameters enter the sensitivity analysis:
- τ = 0.1, 0.2, 0.3, and 0.4, which reflects the effect of U on the response indicator R, corresponding to an odds ratio (OR) in a range of roughly 1-3, assuming a probit model for the propensity of response (12). More precisely, assume that
$$S^* = P(R = 1 \mid X, U) = \Phi\{g(X) + \tau U\}, \tag{1}$$
where Φ(·) is the cumulative distribution function (CDF) of the standard normal, g(X) is some function of X, and U ~ N(0,1) is independent of X. Shen et al (12) showed that τ values between 0.1 and 0.4 correspond to an OR of roughly 1 to 3 for the effect of a one interquartile range change in U on S*.
- ρ = Corr(Y, ϵ), where ϵ = S*/S, S* = E(R| X, U) is the true propensity of response, and S = E(R| X) is called the pseudo propensity score. In other words, ρ is the correlation between the outcome Y and the error in (modeling) the propensity of response because of the unobserved U. It can be seen as capturing the effect of U on Y in addition to its effect on R. We will further explore the values of ρ below using simulations.
- λ, which is a nonparametric version of τ that does not assume a probit model for the propensity of response. For more details see Shen et al (12).
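To make the link between τ and the odds ratio concrete, the following sketch computes the OR for the response probability when U moves from its 25th to its 75th percentile. It assumes the probit form S* = Φ{g(X) + τU} and evaluates the OR at a hypothetical baseline value g(X) = 0; the function name `odds_ratio_for_iqr_shift` and the choice of baseline are illustrative assumptions, not taken from Shen et al (12).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for Phi and its quantiles


def odds_ratio_for_iqr_shift(tau, g_value=0.0):
    """OR of response for U at its 75th versus 25th percentile.

    Assumes P(R = 1 | X, U) = Phi(g(X) + tau * U) with U ~ N(0, 1);
    g_value is an assumed baseline value of g(X).
    """
    q25 = STD_NORMAL.inv_cdf(0.25)
    q75 = STD_NORMAL.inv_cdf(0.75)
    p_low = STD_NORMAL.cdf(g_value + tau * q25)
    p_high = STD_NORMAL.cdf(g_value + tau * q75)
    return (p_high / (1 - p_high)) / (p_low / (1 - p_low))


for tau in (0.1, 0.2, 0.3, 0.4):
    print(tau, round(odds_ratio_for_iqr_shift(tau), 2))
```

Under these assumptions the computed ORs stay between 1 and 3 over the τ grid, in line with the range quoted above; other baseline values of g(X) would give somewhat different numbers.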
The sensitivity of the MAR estimates is obtained by solving a quadratic equation for E(Y), equation (7) in Shen et al (12), for given values of ρ and τ or λ. See Shen et al (12) for more details.
Simulation to determine the range of ρ
While the interpretation of τ and λ is clear from the above as the effect of U on R, the meaning of the values of ρ is less clear. This is because while it captures the dependence of Y on U, in generating the data it can also depend on the values of τ. To determine a realistic range for ρ to be used in practice, we carried out extensive simulation experiments for various plausible scenarios including parameter values. The simulation setup is briefly described below, with more details provided in the Supplement File.
Since we can choose the distribution of X in simulations, without loss of generality we may assume g(·) to be the identity function. From (1) we have (Shen et al (12))
$$S = E(S^* \mid X) = \Phi\left(\frac{X}{\sqrt{1 + \tau^2}}\right). \tag{2}$$
For given values of τ, X, and U, we can compute ϵ = S*/S, and also generate Y according to
$$\operatorname{logit} P(Y = 1 \mid X, U) = \beta_0 + \beta_1 X + \beta_2 U. \tag{3}$$
We can then compute the Monte Carlo sample-based correlation coefficient for ρ = Corr(Y, ϵ).
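The Monte Carlo computation of ρ can be sketched as follows. This is one illustrative scenario, not the simulation code of the study: it assumes X ~ N(0,1), g equal to the identity, the probit response model with pseudo propensity S = Φ(X/√(1+τ²)), and a logistic outcome model with hypothetical coefficients `beta0`, `beta1` (for X) and `beta2` (for U). The function names are likewise made up for illustration.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_rho(tau, beta0, beta1, beta2, n=20000, seed=1):
    """Monte Carlo estimate of rho = Corr(Y, eps), with eps = S*/S."""
    rng = random.Random(seed)
    eps, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # observed covariate (assumed N(0, 1))
        u = rng.gauss(0.0, 1.0)  # unobserved confounder
        s_true = PHI(x + tau * u)                      # S* = E(R | X, U)
        s_pseudo = PHI(x / math.sqrt(1 + tau ** 2))    # S = E(R | X)
        # Assumed logistic outcome model for Y given X and U.
        p_y = 1 / (1 + math.exp(-(beta0 + beta1 * x + beta2 * u)))
        eps.append(s_true / s_pseudo)
        ys.append(1 if rng.random() < p_y else 0)
    return pearson(ys, eps)


print(simulate_rho(tau=0.4, beta0=0.0, beta1=0.5, beta2=1.0))
print(simulate_rho(tau=0.4, beta0=0.0, beta1=0.5, beta2=-1.0))
```

With τ > 0, a positive `beta2` (U raises the odds of Y = 1) yields a positive sample correlation and a negative `beta2` yields a negative one, which is the qualitative pattern the simulations are meant to explore.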
We considered various distributions for X including normal, uniform, binary and Poisson. Various magnitudes of β1 and β2 in (3) were also considered that reflected the different effects of the unobserved U on Y relative to the effects of the observed X. In addition, during an early stage of the CoFASP project we also let X follow the 2012-2013 joint distribution of child race/ethnicity and eligibility for free or reduced-price meal plan in the Pacific SW City. See the Supplement File for more details.
Imputing predictors of response
There were two ways X values could be missing. In the first, in the CoFASP samples, only the children who received an FASD classification yes or no (i.e. responders) had variables such as maternal race/ethnicity and alcohol use captured during the comprehensive evaluation. These variables were not observed for the remainder of the subjects. For these non-responders, we can impute their X values from the distribution below, which follows from the law of total probability:
$$P(X = x \mid R = 0) = \frac{P(X = x) - P(X = x \mid R = 1)\,P(R = 1)}{P(R = 0)},$$
where P(X = x) is obtained from the population MIHA survey, P(X = x|R = 1) is estimated directly from the observed sample data, and P(R = 1) and P(R = 0) are estimated by the proportions out of the N children with observed and unobserved FASD outcomes in the samples, respectively.
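The distribution of X among non-responders follows from the law of total probability, P(X = x) = P(X = x | R = 1) P(R = 1) + P(X = x | R = 0) P(R = 0). A minimal sketch of this calculation is given below; the function name and the input numbers are hypothetical, and the clipping of negative values followed by renormalisation is an added assumption for cases where sampling variability makes the raw expression negative, not a step described in the source.

```python
import random


def nonresponder_distribution(p_pop, p_resp, response_rate):
    """P(X = x | R = 0) = [P(X = x) - P(X = x | R = 1) P(R = 1)] / P(R = 0).

    p_pop: population distribution of X (e.g. from an external survey).
    p_resp: distribution of X among responders.
    response_rate: estimated P(R = 1).
    """
    p_nonresp = 1.0 - response_rate
    raw = {
        x: (p_pop[x] - p_resp[x] * response_rate) / p_nonresp for x in p_pop
    }
    # Assumption: clip negative values to zero and renormalise to sum to 1.
    clipped = {x: max(value, 0.0) for x, value in raw.items()}
    total = sum(clipped.values())
    return {x: value / total for x, value in clipped.items()}


# Hypothetical inputs, for illustration only.
dist = nonresponder_distribution(
    p_pop={"yes": 0.20, "no": 0.80},
    p_resp={"yes": 0.30, "no": 0.70},
    response_rate=0.20,
)
print(dist)

# One imputed draw of X for each of five non-responders.
rng = random.Random(0)
print(rng.choices(list(dist), weights=list(dist.values()), k=5))
```

Repeating the final draw with different seeds gives the multiple imputed data sets needed for a multiple-imputation analysis.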
In the second source of missing X values, maternal race/ethnicity and alcohol use data were missing for a number of the responders, i.e., children who received the comprehensive evaluation and an FASD classification yes or no. These subjects typically had all their other maternal characteristic variables missing as well, i.e., had not completed the maternal interview. We considered imputation using multivariate imputation by chained equations (MICE) (13), using the FASD classification as the predictor.
Multiple imputation (MI) was employed for both types of missingness described above. Ten imputations were carried out, and the results from each imputation were combined using the standard MI approach to obtain the final point estimate and its estimated variance (14).
In addition, for the second type of missingness, we also imputed all of the missing values as the category with the highest and the lowest FASD prevalence in the sample; for example, all missing values of the variable “Any Alcohol Use” were imputed as ‘yes’ or as ‘no’, respectively.
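The standard approach to combining results across imputed data sets is commonly known as Rubin's rules: the pooled point estimate is the average of the per-imputation estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below assumes that this is the combining rule intended; the function name `pool_rubin` and the numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and variances with Rubin's rules.

    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from three imputed data sets.
pooled, total_var = pool_rubin([0.08, 0.09, 0.10], [0.0004, 0.0004, 0.0004])
print(pooled, total_var)
```

With ten imputations, as used here, the inflation factor on the between-imputation variance is 1.1.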
RESULTS
Simulation results
Simulation results are given in the Supplement File Section 2. They show that ρ tends to be increasingly positive as U increases the odds of Y = 1, and increasingly negative as U decreases the odds of Y = 1.
From the extensive simulation results we see that the values of ρ lie between (−0.31, 0.31) for all the situations considered, and between (−0.3, 0.3) when both odds ratios (ORs) in the tables of the Supplement are within 0.33 and 3. We note that odds ratios between 0.33 and 3 are commonly used to set the sensitivity parameters (15). In the sensitivity analysis below (−0.3, 0.3) was the range used for ρ.
CoFASP data results
Table 1 shows the distribution of the variables any alcohol use, and any alcohol use by race/ethnicity, among the n = 404 and 483 children classified as FASD yes or no (including subcategories) in the Pacific Southwestern City in the 2012-2013 and 2013-2014 sample years, respectively. It also shows the distribution of these variables from the 2005-2007 MIHA survey. After imputing the missing values as described above, these distributions were then used to compute the MAR estimates, which are given in Table 2.
Table 1.
Distribution of Weighting Variables by Fetal Alcohol Spectrum Disorder (FASD) in the Pacific Southwestern City in 2012-2013 and 2013-2014.
| Variable | Level | 2012-2013 (n=404): FASD | 2012-2013 (n=404): No FASD | 2013-2014 (n=483): FASD | 2013-2014 (n=483): No FASD | MIHA Rates¹ |
|---|---|---|---|---|---|---|
| Any Alcohol | Yes | 29 (69.0%) | 82 (22.7%) | 33 (67.3%) | 109 (25.1%) | 20.00% |
| | No | 8 (19.0%) | 261 (72.1%) | 12 (24.5%) | 312 (71.9%) | 80.00% |
| | Missing | 5 (11.9%) | 19 (5.2%) | 4 (8.2%) | 13 (3.0%) | |
| Any Alcohol by Race/Ethnicity | Yes-Latina | 13 (31.0%) | 33 (9.1%) | 19 (38.8%) | 40 (9.2%) | 5.30% |
| | Yes-Other | 4 (9.5%) | 12 (3.3%) | 6 (12.2%) | 24 (5.5%) | 2.30% |
| | Yes-White | 12 (28.6%) | 36 (9.9%) | 8 (16.3%) | 45 (10.4%) | 11.80% |
| | No-Latina | 3 (7.1%) | 148 (40.9%) | 10 (20.4%) | 228 (52.5%) | 39.50% |
| | No-Other | 2 (4.8%) | 45 (12.4%) | 1 (2.0%) | 49 (11.3%) | 12.90% |
| | No-White | 3 (7.1%) | 68 (18.8%) | 1 (2.0%) | 35 (8.1%) | 28.20% |
| | Missing | 5 (11.9%) | 20 (5.5%) | 4 (8.2%) | 13 (3.0%) | |

¹ Maternal and Infant Health Assessment (MIHA) during 2005-2007 for the Pacific Southwestern City.
Table 2.
Estimates and Feasible Lower Bounds of FASD Prevalence (per 100 Children) Using Any Alcohol or Any Alcohol by Race/Ethnicity to Predict the Propensity of Response for the Pacific Southwestern City in 2012-2013 and 2013-2014.
| Year (n, N)¹ | Diagnosis | Cases | MCAR² | ALB³ | Imputation Method⁴ | Any Alcohol: MAR (95% CI) | Any Alcohol: FLB⁵ | Any Alcohol × Race/Ethnicity: MAR (95% CI) | Any Alcohol × Race/Ethnicity: FLB⁵ |
|---|---|---|---|---|---|---|---|---|---|
| 2012-2013 (n = 404, N = 2,238) | FAS | 1 | 0.20 | 0.04 | A1 | 0.27 (0.03, 1.44) | 0.04 | 0.23 (0.02, 1.24) | 0.04 |
| | | | | | B | 0.28 (0.03, 1.48) | 0.04 | 0.25 (0.02, 1.32) | 0.04 |
| | | | | | C | 0.30 (0.03, 1.56) | 0.04 | 0.26 (0.03, 1.38) | 0.04 |
| | pFAS | 19 | 3.88 | 0.85 | A1 | 4.27 (2.64, 6.49) | 1.75 | 4.14 (2.49, 6.43) | 1.62 |
| | | | | | B | 4.10 (2.51, 6.28) | 1.64 | 4.02 (2.39, 6.30) | 1.55 |
| | | | | | C | 3.87 (2.35, 5.98) | 1.48 | 3.72 (2.14, 5.96) | 1.35 |
| | ARND | 22 | 4.92 | 0.98 | A1 | 4.28 (2.78, 6.25) | 1.75 | 4.35 (2.78, 6.43) | 1.74 |
| | | | | | B | 3.83 (2.47, 5.63) | 1.48 | 3.83 (2.44, 5.68) | 1.45 |
| | | | | | C | 3.30 (2.17, 4.79) | 1.17 | 3.50 (2.24, 5.20) | 1.23 |
| | Total FASD | 42 | 9.00 | 1.88 | A1 | 8.82 (6.49, 11.58) | 4.73 | 8.73 (6.33, 11.59) | 4.56 |
| | | | | | B | 8.21 (5.98, 10.87) | 4.28 | 8.10 (5.82, 10.85) | 4.13 |
| | | | | | C | 7.47 (5.43, 9.92) | 3.71 | 7.48 (5.32, 10.11) | 3.63 |
| 2013-2014 (n = 483, N = 2,171) | FAS | 3 | 0.49 | 0.14 | A2 | 0.52 (0.14, 1.47) | 0.14 | 0.34 (0.09, 1.00) | 0.14 |
| | | | | | B | 0.51 (0.14, 1.46) | 0.14 | 0.33 (0.09, 0.97) | 0.14 |
| | | | | | C | 0.50 (0.13, 1.45) | 0.14 | 0.32 (0.08, 0.96) | 0.14 |
| | pFAS | 24 | 4.14 | 1.11 | A2 | 4.60 (2.99, 6.71) | 2.13 | 4.12 (2.45, 6.45) | 1.48 |
| | | | | | B | 4.52 (2.93, 6.62) | 2.07 | 4.02 (2.35, 6.37) | 1.40 |
| | | | | | C | 4.35 (2.80, 6.41) | 1.95 | 3.88 (2.21, 6.27) | 1.30 |
| | ARND | 22 | 3.82 | 1.01 | A2 | 3.29 (2.13, 4.84) | 1.33 | 3.86 (2.05, 6.57) | 1.34 |
| | | | | | B | 3.03 (1.96, 4.45) | 1.17 | 2.93 (1.53, 5.08) | 1.07 |
| | | | | | C | 2.77 (1.82, 4.05) | 1.03 | 2.59 (1.61, 3.94) | 1.01 |
| | Total FASD | 49 | 8.44 | 2.26 | A2 | 8.41 (6.30, 10.90) | 4.77 | 8.33 (5.69, 11.59) | 4.02 |
| | | | | | B | 8.06 (6.00, 10.49) | 4.50 | 7.29 (4.94, 10.23) | 3.30 |
| | | | | | C | 7.63 (5.67, 9.95) | 4.15 | 6.79 (4.72, 9.36) | 2.94 |

Abbreviations: Absolute Lower Bound (ALB); Feasible Lower Bound (FLB); Missing At Random (MAR); Missing Completely At Random (MCAR).
¹ n is the number of children with a complete evaluation and FASD classification; N is the total first-grade enrollment.
² MCAR is the weighted prevalence estimate reported in May et al. (2018) (5).
³ ALB is the conservative prevalence estimate reported in May et al. (2018) (5).
⁴ Imputation methods for missing covariate values among responders:
- A1 - category with the lowest prevalence of Total FASD: “No” for Any Alcohol and “No-Latina” for Any Alcohol × Race/Ethnicity;
- A2 - category with the lowest prevalence of Total FASD: “No” for Any Alcohol and “No-Other” for Any Alcohol × Race/Ethnicity;
- B - using multivariate imputation by chained equations (MICE);
- C - category with the highest prevalence of Total FASD: “Yes” for Any Alcohol and “Yes-Latina” for Any Alcohol × Race/Ethnicity.
⁵ FLB is the feasible lower bound computed using the nonparametric approach with λ = 0.088 and ρ = 0.3.
In Table 2, the various estimates were computed for the entire samples of N = 2,238 and 2,171 eligible first-grade children in the schools selected for participation in those two school years. The MCAR estimates were the estimates from May et al. (2018) (5). The absolute lower bounds (ALB) were also from May et al. (2018) (5), using the number of FASD and subcategory cases divided by the total number (N) of eligible children.
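As a quick arithmetic check, the absolute lower bounds in Table 2 can be reproduced from the case counts and the total enrollment N, expressed per 100 children. The helper name `absolute_lower_bound` and the dictionary layout are illustrative; the counts are those shown in Table 2.

```python
def absolute_lower_bound(cases, enrolled):
    """Cases per 100 eligible children, treating all non-responders as non-cases."""
    return 100.0 * cases / enrolled


# (cases, N) taken from Table 2.
table2_counts = {
    ("2012-2013", "FAS"): (1, 2238),
    ("2012-2013", "pFAS"): (19, 2238),
    ("2012-2013", "ARND"): (22, 2238),
    ("2012-2013", "Total FASD"): (42, 2238),
    ("2013-2014", "FAS"): (3, 2171),
    ("2013-2014", "pFAS"): (24, 2171),
    ("2013-2014", "ARND"): (22, 2171),
    ("2013-2014", "Total FASD"): (49, 2171),
}

alb = {
    key: round(absolute_lower_bound(cases, enrolled), 2)
    for key, (cases, enrolled) in table2_counts.items()
}
for key, value in alb.items():
    print(key, value)
```

The bound is conservative because every child without a completed evaluation is counted as unaffected.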
It is seen that different imputation methods for missing covariate information among the n children gave different MAR estimates, typically from higher to lower (A – C). The only exception is FAS in 2012-2013, where the only case occurred in the “no” category of “any alcohol use” and the “no-Latina” category of any alcohol × race/ethnicity.
The sensitivity analysis results are illustrated in Figure 1, for 2012-2013 and using MICE for imputation. For each panel within the figure, for example, Total FASD and using “any alcohol use” as the weighting variable (first row, last panel), the lower right end of the solid lines (nonparametric with λ = 0.088 and ρ = 0.3) gives the lowest possible value of the estimated Total FASD prevalence among the sensitivity analysis results shown in that panel. This value is given as the corresponding feasible lower bound (FLB) in Table 2. The FLBs are guaranteed to be no lower than the ALBs in the implementation of the method (12). On the other hand, the wide vertical ranges of these lines, both solid (for the nonparametric approach) and dashed (for the parametric approach), show the ranges of plausible prevalence values depending on the amount and direction of the uncontrolled confounding as discussed earlier. Additional sensitivity analysis plots similar to Figure 1 are given in the Supplement.
Figure 1.
Sensitivity analysis for FASD and sub-category prevalence for the Pacific Southwestern city in 2012-2013. Abbreviations: Fetal Alcohol Syndrome (FAS); partial Fetal Alcohol Syndrome (pFAS); Alcohol-Related Neurodevelopmental Disorder (ARND); Fetal Alcohol Spectrum Disorder (FASD).
The overall FLB is the lowest of the six FLB’s among the six combinations using two different predictors of response, and three different imputation methods; these are shown in bold typeface in Table 2. It is seen, for example, that the overall FLB for Total FASD prevalence in 2012-2013 was 3.63 per 100 children, compared to the ALB of 1.88. Similarly, in 2013-2014, the overall FLB for Total FASD prevalence was 2.94 per 100 children, compared to the ALB of 2.26. Similar comparisons can also be drawn for the subcategories of fetal alcohol syndrome (FAS), partial fetal alcohol syndrome (pFAS) and alcohol related neurodevelopmental disorder (ARND). An important immediate message is the following: by carrying out careful sensitivity analysis, we were able to derive tighter bounds for FASD prevalence than previously published.
DISCUSSION
For epidemiological surveys like the CoFASP, multiple stages of the active case ascertainment process can create bias for estimating FASD prevalence that is unknown both in direction and in magnitude. This is in contrast to population-based approaches used by the CDC for diseases like cancer, although similar issues exist for other conditions like autism (16). Weighted estimates, which are equivalent to our MAR estimates, have been commonly used for survey data. Our sensitivity analysis shows, however, that if uncontrolled confounding exists, it can have substantial impacts on the estimated prevalence rates.
The primary finding in May et al (2018) (5) also used weights, but the weights were only based on the sampling probabilities of subpopulations, such as screened positive versus negative, or being less than 25th percentile on growth etc., while assuming no consent bias or any bias from children who did not complete the full evaluation. The weights in the present analysis are different in that they make use of predictors of non-response, and coupled with sensitivity analysis, they address the consent bias and any other bias resulting from missing outcomes.
The sensitivity analysis presented here utilized the nonparametric approach of Shen et al (12) so that no specific models were assumed. The actual derivation of the Shen et al (12) results is mathematically complex, and is not the focus of our application; we refer interested readers to their original work. The framework of Shen et al (12) quantifies the strength of the effects of the unobserved variable U on the response indicator R and on the FASD outcome Y. The two parameters of the analysis control the effects of U on R and Y, respectively. While the meaning of λ (τ) is straightforward in terms of the odds ratio of U on R, we had to carry out extensive simulation in order to determine the feasible values of ρ. Careful selection of these parameter values helps to ensure that the results of the sensitivity analysis are useful in practice. In this case, the feasible lower bounds of FASD and subcategory prevalence rates were improved compared to the absolute lower bound published in May et al. (2018) (5).
In the Methods Section above we have briefly pointed out the connection between prevalence estimate in one population and causal inference in two or more populations. The latter is more complex as there are distinct issues such as selection bias, unobserved confounding, etc. In our simpler one-sample and one-population problem, however, these are all equivalent to non-response bias.
In this paper, we have restricted our attention to the methodology of sensitivity analysis of the weighted MAR estimate. An additional issue is the handling of missing covariates. Multiple imputation assumed that these covariates were missing at random. The additional sensitivity to this assumption was not investigated in the current work.
A limitation of this approach was that we selected the variable of “any alcohol use in pregnancy” as this was captured in the CoFASP study by maternal interview, and was available in the reference population survey through MIHA data. However, low levels of any alcohol use in early pregnancy are not clearly associated with risk for FASD. The more informative covariate would have been multiple “binge” episodes of multiple drinks per occasion in pregnancy, as this is the pattern of alcohol consumption most predictive of FASD. Although the variable of two or more occasions of three or more drinks per occasion was used to define risky alcohol exposure in the CoFASP study, a comparable quantity/frequency measure was not available in the MIHA survey data for the Pacific Southwestern city for the relevant years 2005-2007.
A second limitation is that it is unknown to what extent maternal alcohol use was under-reported or mis-reported in the maternal interviews either in the CoFASP samples, or in the MIHA data.
A third limitation is that we have restricted our sensitivity analysis to unobserved confounding only. For estimation of prevalence rates, it is common to consider sensitivity to the classification criteria. There are a number of different diagnostic schema for FASD currently in use, and when these are alternatively applied to the same sample of children, the resulting prevalence of FASD varies widely, as recently demonstrated by Coles et al (17). In the case of CoFASP, if alternative classification criteria were applied, the absolute lower bound prevalence estimates, and therefore the feasible lower bound estimates, would have differed.
In summary, in epidemiological survey studies, and in particular in the study of a disorder such as FASD, a number of biases may influence the results. Maternal alcohol consumption may not be accurately reported, and mothers with affected children may be less (or more) likely to participate. Thus, consent bias may substantially impact estimates of prevalence in either direction. Sensitivity analyses are needed to better understand the range of potential prevalence.
Supplementary Material
HIGHLIGHTS.
Follow-up work to a recent JAMA article that received wide media coverage (NYtimes, CNN, etc)
Sensitivity analyses on estimates of fetal alcohol spectrum disorders (FASD) prevalence
Methodological implications to epidemiological surveys using weights where unobserved ‘confounding’ (for non-response) might be present
Acknowledgements
The authors would like to thank the California Department of Public Health for providing the Maternal and Infant Health Assessment results. We thank Dr. Changyu Shen at Beth Israel Deaconess Medical Center for providing the R program for the sensitivity analysis and many related discussions.
This work was supported by the National Institute on Alcohol Abuse and Alcoholism of the National Institutes of Health.
The views expressed in this paper are those of the authors and not necessarily those of any funding body or others whose support is acknowledged. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Sources of funding: This work was supported by the National Institute on Alcohol Abuse and Alcoholism of the National Institutes of Health Grant U01 AA019879.
Footnotes
Conflicts of interests: None to report
References
- 1. National Institutes of Health. Fetal Alcohol Spectrum Disorders Fact Sheet. National Institutes of Health; October 2010.
- 2. Feldman HS, Jones KL, Lindsay S, et al. Prenatal alcohol exposure patterns and alcohol-related birth defects and growth deficiencies: a prospective study. Alcohol Clin Exp Res. 2012;36(4):670–6. doi: 10.1111/j.1530-0277.2011.01664.x
- 3. National Institute. Updating Estimates of the Economic Costs of Alcohol Abuse in the United States: Estimates, Update Methods, and Data. 2000. https://pubs.niaaa.nih.gov/publications/economic-2000/. Accessed August 3, 2018.
- 4. National Institute on Alcohol Abuse and Alcoholism. Secondary Analyses of Existing Alcohol Research Data (R03). 2017. https://grants.nih.gov/grants/guide/pa-files/PA-17-468.html. Accessed August 3, 2018.
- 5. May PA, Chambers CD, Kalberg WO, et al. Prevalence of Fetal Alcohol Spectrum Disorders in 4 US Communities. JAMA. 2018;319(5):474–82. doi: 10.1001/jama.2017.21896
- 6. Horvitz DG, Thompson DJ. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association. 1952;47(260):663–85.
- 7. Robins J, Rotnitzky A, Zhao L. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association. 1994;89:846–66.
- 8. Hogan JW, Lancaster T. Instrumental variables and inverse probability weighting for causal inference from longitudinal observational studies. Statistical Methods in Medical Research. 2004;13:17–48.
- 9. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
- 10. Rubin DB. Inference and missing data (with discussion). Biometrika. 1976;63:581–92.
- 11. California Department of Public Health. Maternal and Infant Health Assessment (MIHA). https://www.cdph.ca.gov/Programs/CFH/DMCAH/MIHA/Pages/default.aspx.
- 12. Shen C, Li X, Li L, Were MC. Sensitivity analysis for causal inference using inverse probability weighting. Biometrical Journal. 2011;53:822–37.
- 13. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software. 2011;45(3):1–67. doi: 10.18637/jss.v045.i03
- 14. Little RJA, Rubin DB. Statistical Analysis with Missing Data (2nd Ed.). John Wiley & Sons, Inc.; 2002.
- 15. Rosenbaum PR. Observational Studies. Springer-Verlag; New York; 2002.
- 16. Elsabbagh M, Divan G, Koh Y, et al. Global prevalence of autism and other pervasive developmental disorders. Autism Research. 2012;5:160–79.
- 17. Coles CD, Gailey AR, Mulle JG, Kable JA, Lynch ME, Jones KL. A comparison among 5 methods for the clinical diagnosis of fetal alcohol spectrum disorders. Alcohol Clinical and Experimental Research. 2016;40:1000–9.