Abstract
Purpose:
To evaluate the use of data from population-based surveys such as the National Health and Nutrition Examination Survey (NHANES) for external adjustment for confounders imperfectly measured in health care databases in the United States.
Methods:
Our example study used Medicaid Analytic eXtract (MAX) data to estimate the relative risk (RR) for prenatal exposure to serotonin-norepinephrine reuptake inhibitors (SNRIs) and cardiac defects. Smoking and obesity are known confounders poorly captured in databases. NHANES collects information on lifestyle factors, depression, and prescription medications. External adjustment requires information on the prevalence of confounders and their association with SNRI use, which was obtained from NHANES. It also requires estimates of their association with the outcome, which were based on the literature and allowed us to correct the RR using sensitivity analyses.
Results:
In MAX, the RR for the association between prenatal SNRI exposure and cardiac defects was 1.51 unadjusted and 1.20 adjusted for measured confounders and restricted to women with depression. In NHANES, among women of childbearing age with depression, the prevalence of smoking was 60.2% (95% Confidence Interval 43.2, 74.3) for SNRI users and 44.1% (39.6, 48.8) for non-users of antidepressants. The corresponding estimates for obesity were 59.2% (43.2, 74.3) and 40.5% (35.9, 45.0), respectively. If the associations between smoking and obesity with cardiac defects are independent from each other and from other measured confounders, additional adjustment for smoking and obesity would move the RR from 1.20 to around 1.10.
Conclusion:
National surveys like NHANES are readily available sources of information on potential confounders and they can be used to assess and improve the validity of RR estimates from observational studies missing data on known risk factors.
Keywords: NHANES, sensitivity analyses, external adjustment, confounding, bias
BACKGROUND
Health care utilization databases have become the standard source of information for health service research and are increasingly used to evaluate the comparative safety and effectiveness of alternative medical interventions.1,2 These databases offer large sample sizes and prospectively recorded data on diagnoses, procedures and prescriptions across the full range of health care settings in routine clinical practice. Moreover, although the cost of processing claims data can be high, it is usually lower than the cost of primary data collection. However, health care utilization databases also present several challenges. When nonrandomized studies use these databases for causal inferences on the risks and benefits of interventions, a major threat to validity is often the lack of comparability between the exposed and reference groups, i.e. confounding bias.
To reduce the possibility of confounding, investigators abstract information on a variety of factors that might be associated with the exposure and outcome of interest. Heterogeneity between exposed and reference groups is then controlled in the analyses through restriction, standardization, stratification, matching or modeling. Although claims are generated for reimbursement purposes and not for research, when carefully organized by experienced researchers with both the clinical knowledge and the understanding of non-clinical influences on coding processes, they can provide accurate clinical information.3,4 Yet, health care databases typically lack information on selected socio-demographic and lifestyle factors which have the potential to confound study results.
When key covariates cannot be measured in claims data, information obtained from external sources can be used to adjust for such unmeasured confounders through binary algebraic solutions or propensity score calibration.5–8 In the United States, the Medicare Current Beneficiary Survey has been used with these methods in pharmacoepidemiology to indirectly adjust relative risk estimates from Medicare claims data cohorts.9 However, access to a random sample of patients identified in administrative data to gather additional covariate information is often unfeasible or unaffordable. Without any suitable external sources (e.g., surveys, preferably from the same study source population), researchers often resort to making an educated guess about the information necessary to control for an unmeasured confounder (i.e., its prevalence, its association with the exposure and its association with the outcome).10 While sensitivity analyses using this information can provide valuable insight into the magnitude of potential residual bias,11 they are based on assumptions that may not rest on a solid empirical foundation. The uncertainty would diminish if investigators could at least obtain external information on the prevalence of crucial confounders in the exposed and reference groups.
Many countries conduct periodic nationwide surveys to gather information on the health status of their citizens and inform policies. For example, the National Health and Nutrition Examination Survey (NHANES) is an ongoing nationally-representative health examination survey conducted by the National Center for Health Statistics of the United States Centers for Disease Control and Prevention (CDC)12,13 to assess the health of the population. The NHANES has been previously used to describe the prevalence of diseases and risk factors, set national standards for health indicators and evaluate their time trends. Here we demonstrate how national health surveys can be used to perform external confounding adjustment via sensitivity analyses and thereby improve the validity of epidemiologic studies when it is not feasible to collect additional information for the study population itself. Specifically, we describe an application of this method to account for potential confounding in a study of medication safety in pregnancy.
METHODS
Application: Serotonin-norepinephrine reuptake inhibitors (SNRIs) and congenital cardiac malformations
We illustrate the use of NHANES data to inform sensitivity analyses for external confounding adjustment with an example from a health care utilization database study where there is concern about potential residual confounding bias by lifestyle factors. The methods and results of the study have previously been described in detail.14 Briefly, in this study we evaluated the risk of major cardiac defects in liveborn infants associated with maternal first trimester use of specific antidepressants using data from the Medicaid Analytic eXtract (MAX) from 2000 to 2007.15,16 The population included 949,504 pregnant women enrolled in Medicaid. In addition to estimating crude relative risks, we restricted the cohort to women with depression (one of the main indications for SNRIs) and used propensity score (PS) stratification to control for depression severity and other potential confounders including risk factors for congenital malformations (e.g., diabetes). Propensity scores were derived from the predicted probability of treatment estimated in a logistic regression model with these pre-specified confounders, and the outcome models were stratified on 100 equally sized propensity score strata. In unadjusted analyses, we observed an elevated relative risk for SNRIs, which was attenuated with confounding adjustment. However, we were still concerned about potential confounding by poorly measured factors such as smoking and obesity. Here, we conduct sensitivity analyses to account for potential residual confounding by these factors. Although we had an imperfect measurement of these variables (i.e., there are claims for obesity and tobacco use disorder), for illustrative purposes we will assume they are completely unmeasured.
Sensitivity Analyses
We used the methods described by Greenland17 and Walker10 to study the potential impact of an unmeasured confounder and to estimate the relative risk (RRED) relating exposure (E) and disease (D) within levels of the confounder (C). These methods have been described in detail elsewhere, but we briefly summarize them in the APPENDIX. They generate a range of plausible values for the adjusted (stratified) RRED as a function of i) the unadjusted RRED, ii) the prevalence of the confounder in the reference group (P0), iii) the association between the confounder and the exposure of interest, i.e., the prevalence of the confounder among exposed (P1) and iv) the association between the confounder and the disease (RRCD). The stratified RRED can be estimated as:
$$RR_{ED}^{\text{stratified}} = \frac{RR_{ED}^{\text{unadjusted}}}{\dfrac{P_1\,(RR_{CD} - 1) + 1}{P_0\,(RR_{CD} - 1) + 1}}$$
The graphical (or tabular) display of the stratified RRED for a range of plausible values of the bias parameters provides full disclosure of uncertainty due to the unmeasured confounders with no assumptions regarding the likelihood distribution of the parameters within the ranges. However, it does not consider random errors.
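A minimal sketch of this deterministic calculation in Python, assuming the usual bias formula for a single binary confounder with a constant RRCD in exposed and reference subjects. The function and variable names are illustrative and are not the authors' implementation (their sensitivity analyses were run in Excel); the inputs are the crude RRED of 1.51 and the smoking prevalence among non-antidepressant users on Medicaid reported in Table 1.

```python
def externally_adjusted_rr(rr_observed, p1, p0, rr_cd):
    """Divide the observed RR by the bias factor for one binary confounder.

    p1, p0: confounder prevalence among exposed and reference subjects.
    rr_cd: confounder-disease relative risk, assumed equal in both groups.
    """
    bias = (p1 * (rr_cd - 1.0) + 1.0) / (p0 * (rr_cd - 1.0) + 1.0)
    return rr_observed / bias


# Tabular display: rows are P1 values, columns are RRCD values.
rr_cd_values = [1.0, 1.1, 1.2]
p1_values = [0.50, 0.65, 0.85]
p0_smoking = 0.306  # non-antidepressant users on Medicaid (Table 1)

grid = {
    p1: [round(externally_adjusted_rr(1.51, p1, p0_smoking, rr), 2)
         for rr in rr_cd_values]
    for p1 in p1_values
}

print("P1    " + "  ".join(f"RRCD={rr:.1f}" for rr in rr_cd_values))
for p1, row in grid.items():
    print(f"{p1:.2f}  " + "  ".join(f"{value:8.2f}" for value in row))
```

Each cell is a corrected point estimate only; as noted above, this display carries no information about random error.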
Alternatively, probabilistic sensitivity analyses provide a point estimate with a 95% simulation interval, which combines both the random and systematic errors.11 This approach draws the bias parameters (P0, P1, RRCD) from pre-specified distributions, re-arranges data within the 2×2 table based on the selected values for the bias parameters, estimates an adjusted RRED, and repeats this process for a pre-specified number of iterations. The bias-corrected estimate of association is the median of the resulting distribution of adjusted RRED and the 95% simulation interval reflects the 2.5th and 97.5th percentile.
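The loop described above can be sketched in Python as follows. This is a simplification under stated assumptions, not the authors' procedure: it works from the summary RR and its 95% CI rather than re-arranging 2×2 table counts (which are not given in the text), it assumes triangular distributions for the bias parameters, and it adds random error as a normal deviate on the log scale with a standard error back-calculated from the CI. All function and argument names are hypothetical.

```python
import math
import random
import statistics


def bias_factor(p1, p0, rr_cd):
    """Bias factor for one binary confounder with a constant RRCD."""
    return (p1 * (rr_cd - 1.0) + 1.0) / (p0 * (rr_cd - 1.0) + 1.0)


def probabilistic_adjustment(rr_obs, ci_low, ci_high, p1_tri, p0_tri,
                             rr_cd_tri, n_iter=5000, seed=2024):
    """Return (median, 2.5th, 97.5th percentile) of the bias-corrected RR.

    Each *_tri argument is a (minimum, mode, maximum) triple.
    """
    rng = random.Random(seed)
    # Standard error of log(RR), recovered from the reported 95% CI.
    se_log_rr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    draws = []
    for _ in range(n_iter):
        # random.triangular takes (low, high, mode), in that order.
        p1 = rng.triangular(p1_tri[0], p1_tri[2], p1_tri[1])
        p0 = rng.triangular(p0_tri[0], p0_tri[2], p0_tri[1])
        rr_cd = rng.triangular(rr_cd_tri[0], rr_cd_tri[2], rr_cd_tri[1])
        systematic = math.log(rr_obs / bias_factor(p1, p0, rr_cd))
        # Combine systematic and random error on the log scale.
        draws.append(math.exp(systematic + rng.gauss(0.0, se_log_rr)))
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return statistics.median(draws), cuts[0], cuts[-1]


# Smoking, unadjusted analysis: distributions as listed in Table 2.
median, low, high = probabilistic_adjustment(
    1.51, 1.20, 1.90,
    p1_tri=(0.29, 0.65, 0.70),
    p0_tri=(0.22, 0.31, 0.44),
    rr_cd_tri=(1.0, 1.1, 1.2),
)
print(f"{median:.2f} ({low:.2f}, {high:.2f})")
```

Because the draws are random and the error model is simplified, the output should be close to, but need not match, the values reported in Table 2.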
Both methods (deterministic and probabilistic) require information on the bias parameters (P0, P1, RRCD). In our application, the magnitude of the bias will depend on the prevalence of smoking/obesity among SNRI users and non-users in the population and the association between smoking/obesity and cardiac malformations within levels of exposure (assuming no effect measure modification). When this information is not available within the study population it can sometimes be obtained from surrogate populations or, if not available, from the literature. We used empirical data from NHANES on the prevalence of smoking and obesity in the US population and their association with SNRI exposure. Information on the strength of the association between these confounders and cardiac defects was obtained from the literature.
NHANES: External Source for Confounder Prevalence and Association with Exposure
The NHANES includes a nationally representative sample of about 5,000 persons each year located in counties across the US. The interview contains demographic, socioeconomic, dietary, and health-related questions. The physical examination component consists of medical, dental, and physiological measurements as well as collection of biological samples for laboratory tests.13 Per NHANES analytic guidelines, the sample data are weighted to produce data representative of the US civilian population. The weights account for oversampling and survey non-response.
For the present illustrative application, we used six files from NHANES data from 1999 to 2010, a period overlapping with the MAX study (eTable 1): 1) The Demographic Variables and Sample Weights file, which provides selected demographic variables such as gender, age, income, educational level, and examination sample weights; 2) The Body Measures file of the examination component; 3) The Smoking file of the questionnaire component, which provides self-reported smoking status at the time of interview; 4) The Patient Health Questionnaire survey (Depression Screener/DPQ file), which provides information on depression but is only available for years 2005–2010; 5) The Health Insurance file of the questionnaire component, which provides self-reported information on Medicaid coverage and allows restriction to this population if needed; 6) The Prescription Medication Section of the questionnaire component, which provides self-reported information on the use of prescription medications in the month before the interview. These files were linked using the unique survey participant identifier number.18
Integration of NHANES data in sensitivity analyses
We defined a range of plausible values for P0, P1 and RRCD based on the NHANES data (P0, P1) and the literature (RRCD). The estimated crude cardiac defects RRED for SNRIs in MAX was 1.51 (95% CI: 1.20–1.90). The use of NHANES for sensitivity analyses involves five steps: 1) Define the population closest to the study population in measured characteristics (e.g., age, sex, insurance). To obtain P0 and P1 estimates from a comparable population, from the 59,367 subjects surveyed within NHANES 1999–2010,19 we identified the subpopulation of 15,736 women of childbearing age (12–55 years old) and further restricted the sample to 2,347 women insured through Medicaid. 2) Ascertain the exposure of interest (e.g., users of a medication in the last 30 days). We identified participants treated with SNRIs (our exposure of interest) and those not treated with antidepressants (our reference group). 3) Calculate the frequency of potential confounders incompletely measured in the data (e.g., BMI, smoking, race) within exposure strata using sampling weights. We calculated the proportion of cigarette smokers and the proportion of obese participants among SNRI users (P1) and non-antidepressant-users (P0) considering sampling weights. 4) Obtain from the literature or from experts’ opinion plausible values for the association between the unmeasured confounders and the outcome of interest. To make informed assumptions about RRCD we reviewed the literature on the association of smoking and obesity with cardiac defects overall. The range of plausible RRCD values considered for the analyses was 1.0 to 1.2 for smoking20,21 and 1.1 to 1.4 for obesity22,23. Note that the association between the confounder and the exposure and outcome can be due to direct or indirect effects, or to shared common causes. 5) Lastly, using these bias parameters, we conducted the sensitivity analyses. 
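Step 3 above, the weighted frequency of a confounder within exposure strata, can be sketched in Python as shown below. The records are made up for illustration and are not NHANES data; the field names (`exposure`, `weight`, `smoker`) are hypothetical. The sketch returns point estimates only: confidence intervals would require design-based variance estimation, which is not shown.

```python
def weighted_prevalence(records, exposure_group, confounder):
    """Weighted share of records with `confounder` true in one exposure group.

    Records with a missing confounder value (None) are skipped, so the
    estimate uses the subsample with available information.
    """
    numerator = 0.0
    denominator = 0.0
    for rec in records:
        if rec["exposure"] != exposure_group or rec[confounder] is None:
            continue
        denominator += rec["weight"]
        if rec[confounder]:
            numerator += rec["weight"]
    if denominator == 0:
        raise ValueError(f"no usable records for {exposure_group!r}")
    return numerator / denominator


# Made-up records, NOT NHANES data; field names are hypothetical.
toy = [
    {"exposure": "snri", "weight": 1200.0, "smoker": True},
    {"exposure": "snri", "weight": 800.0, "smoker": False},
    {"exposure": "snri", "weight": 500.0, "smoker": None},
    {"exposure": "no_ad", "weight": 3000.0, "smoker": True},
    {"exposure": "no_ad", "weight": 7000.0, "smoker": False},
]

p1 = weighted_prevalence(toy, "snri", "smoker")   # prevalence among exposed
p0 = weighted_prevalence(toy, "no_ad", "smoker")  # prevalence in reference
print(p1, p0)
```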
In the deterministic analyses, we estimated stratified RRED relating SNRIs and cardiac malformations within levels of smoking and obesity and created three- and two-dimensional grids using the plausible range of parameters. In the probabilistic analyses, we specified the distributions for the bias parameters. In both analyses smoking and obesity were considered as independent confounders, which would lead to over-adjustment if these factors were correlated.
However, in MAX, the RRED for SNRIs was 1.20 (0.91–1.57) upon restriction of the population to women with depression (to reduce confounding by indication) and adjustment for other measured confounders through PS stratification. Therefore, we repeated the sensitivity analyses restricting the NHANES population to women of reproductive age with depression and using the adjusted RRED estimate as the starting point, assuming a similar magnitude of residual confounding by unmeasured factors. Correlation between the measured and unmeasured confounders can result in an overestimation of the magnitude of residual bias for the adjusted estimate.7 Unfortunately, NHANES data do not enable stratification on all variables included in the PS used in the MAX study and therefore do not allow estimation of the distribution of the confounders among exposed and unexposed within PS strata. Similarly, the NHANES sample size did not allow simultaneous restriction to women of reproductive age with depression and on Medicaid to estimate the prevalence of confounders within SNRI users.
NHANES data were analyzed using SAS for Windows, version 9.3. Sensitivity analyses were performed using Excel.
RESULTS
Of the 15,736 women of childbearing age surveyed in NHANES, 155 (weighted prevalence=1.6%) were taking SNRIs within the 30 days preceding the interview; 10.3% were using antidepressants overall. Compared to non-antidepressant-users, a higher proportion of women on SNRIs had a body mass index (BMI) ≥30 kg/m2 and were smokers. Differences were attenuated in comparisons within women on Medicaid and when the sample was restricted to women with depression (Table 1). Among women of childbearing age on Medicaid, 209 (8.9%) were on antidepressants based on the NHANES questionnaire, close to the 6.8% found in MAX based on pharmacy dispensing.
TABLE 1.
Weighted proportion of smokers and obese women of childbearing age and 95% confidence intervals for those either unexposed to antidepressants (AD) or exposed to SNRIs. NHANES 1999–2010.
Categories | Overall: SNRI (n=355) | Overall: no-AD (n=56,360) | Women 12–55: SNRI (n=155) | Women 12–55: no-AD (n=14,610) | Women 12–55 on Medicaid: SNRI (n=30) | Women 12–55 on Medicaid: no-AD (n=2,138) | Women 18–55 with depression: SNRI (n=37) | Women 18–55 with depression: no-AD (n=448)
---|---|---|---|---|---|---|---|---
Cigarette Smoking | 28.6% (24.2, 33.6) | 22.4% (22.1, 22.7) | 41.9% (34.4, 49.8) | 22.0% (21.3, 22.7) | 65.0% (48.6, 81.7) | 30.6% (28.7, 32.6) | 60.2% (43.2, 74.3) | 44.1% (39.6, 48.8)
Obese | 45.7% (40.5, 50.8) | 24.7% (24.3, 25.1) | 49.2% (41.2, 56.9) | 28.7% (28.0, 29.4) | 58.7% (41.9, 76.2) | 36.3% (34.3, 38.4) | 59.2% (43.2, 74.3) | 40.5% (35.9, 45.0)
Note: Not all the subjects had information on smoking, BMI, or depression. We used the subsamples with available information and applied sampling weights accordingly. Restriction to women with moderate to severe depression included years 2005–2010 and ages 18 to 55, where information was collected.
Deterministic Sensitivity Analyses
Figure 1 and eTable 2 present the external adjustment of the unadjusted observed RRED of 1.51 for smoking and for obesity, each considered separately. The graphs present the RRED adjusted for each confounder given a range of plausible associations of these confounders with cardiac defects (x-axis, RRCD) and with SNRI use (y-axis, the % of smokers among SNRI users, which represents a range of RRCE given a fixed estimated % in the reference group). For smoking, we evaluated the impact of an RRCD between smoking and cardiac defects overall ranging from 1 to 1.2, using the estimated prevalence of smoking of 31% among non-antidepressant users based on NHANES data for women of childbearing age on Medicaid. For SNRI users the estimated prevalence of smoking was 65% based on NHANES (Table 1), but we considered a range from 50% to 85% (given its 95% CI of 49% to 82%). For obesity, we evaluated a range of 1 to 1.4 for RRCD and considered a prevalence of obesity of 36% among non-antidepressant users and of 59% for SNRI users, ranging from 40% to 75%. The externally adjusted RRED varies depending on the assumptions. Under the scenarios considered, the corrected RRED estimate is moderately lower than the observed estimate and does not cross the null.
Figure 1.
Deterministic sensitivity analysis for external adjustment of the observed crude relative risk (RRED) using NHANES data to estimate the prevalence of confounders in exposed and reference groups.
Note: Graphs present the adjustment of the observed RRED of 1.51 between SNRIs and cardiac defects overall for smoking (left graph) and obesity (right graph), considering a range of associations between these confounders and the outcome (RRCD) informed by the literature, using the estimated prevalence of smoking (30.6%) and obesity (36.3%) in the reference group of non-antidepressant users based on NHANES prevalence for women 12–55 on Medicaid, and a plausible range of smoking (50% to 85%) and obesity (40% to 75%) among the SNRI-exposed subjects based on the 95% CIs in Table 1. The bolded line in the middle of the grid reflects the most likely prevalence of smoking (65%) and obesity (59%) among exposed based on estimates from NHANES; Figure 2 focuses on these estimates.
Figure 2 (left panel) and eTable 3 present the external adjustment for both smoking and obesity using their most likely frequencies based on NHANES (Table 1). In the upper bound scenario of maternal smoking being associated with cardiac malformations overall with a RRCD of 1.2, the smoking-corrected RRED for SNRIs would be 1.42. The corresponding obesity-corrected RRED would be 1.40 if obesity were associated with a RRCD of 1.4. If both smoking and obesity were independently associated with a 1.2 and 1.4 increased risk of cardiac malformations, respectively, the corrected RRED would be 1.32.
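These three figures can be checked by hand, assuming the single-binary-confounder bias formula summarized in the Methods and the Medicaid-restricted prevalences in Table 1 (smoking: P1 = 0.650, P0 = 0.306; obesity: P1 = 0.587, P0 = 0.363). Treating the two confounders as independent means their bias factors multiply.

$$
\begin{aligned}
\text{Bias}_{\text{smoking}} &= \frac{0.650\,(1.2 - 1) + 1}{0.306\,(1.2 - 1) + 1} = \frac{1.1300}{1.0612} \approx 1.065, &\quad \frac{1.51}{1.065} &\approx 1.42 \\
\text{Bias}_{\text{obesity}} &= \frac{0.587\,(1.4 - 1) + 1}{0.363\,(1.4 - 1) + 1} = \frac{1.2348}{1.1452} \approx 1.078, &\quad \frac{1.51}{1.078} &\approx 1.40 \\
\text{Bias}_{\text{both}} &= 1.065 \times 1.078 \approx 1.148, &\quad \frac{1.51}{1.148} &\approx 1.32
\end{aligned}
$$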
Figure 2.
Deterministic Sensitivity Analysis for external adjustment of the observed crude (left panel) and adjusted (right panel) relative risk (RRED) using the NHANES data to estimate the prevalence of confounders in the exposed and reference groups.
Note: The left graph presents the external adjustment of the observed crude RRED of 1.51 for obesity considering a range of associations between obesity and the outcome (RRCD) from 1 to 1.4. It also presents the RRED adjusted for both obesity and smoking assuming independence and considering an association between smoking and cardiac defects overall of 1.2 (the upper bound of CI reported in published meta-analyses). For simplicity, only the range for obesity is presented since it is wider (1 to 1.4) than that for smoking (1 to 1.2) based on published meta-analyses. In NHANES, the estimated prevalence among women on Medicaid for smoking and obesity was 30.6% and 36.3% respectively for non-antidepressant users, and 65.0% and 58.7% respectively for SNRI exposed subjects. The right graph presents the corresponding results for the fully adjusted and depression-restricted RRED of 1.20 using information from the NHANES sample restricted to women with moderate to severe depression. The estimated prevalence among women with depression for smoking and obesity was 44.1% and 40.5% respectively for non-antidepressant users, and 60.2% and 59.2% respectively for SNRI exposed subjects.
Figure 2 (right panel) also presents the external adjustment of the observed fully adjusted RRED of 1.20 among women with depression. Based on NHANES data for women with depression, we estimated a prevalence of smoking of 44% among non-antidepressant users and 60% among SNRI users (Table 1). The estimated prevalence of obesity was 41% among non-antidepressant users and 59% among SNRI users, within women with depression. If maternal smoking and obesity were associated with a 1.2 and 1.4 increased risk of cardiac malformations, respectively, independently from each other and from other adjusted characteristics (e.g. diabetes), the corrected RRED for SNRIs would be 1.10. If these assumptions do not hold, the estimate would be closer to 1.20.
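The corrected value for the depression-restricted analysis can be reproduced with a short Python sketch, assuming the single-binary-confounder bias formula and full independence of smoking, obesity, and the measured confounders, so that the two bias factors multiply. The function names are illustrative; the prevalences are those in Table 1 for women with depression.

```python
from math import prod


def bias_factor(p1, p0, rr_cd):
    """Bias factor for one binary confounder with a constant RRCD."""
    return (p1 * (rr_cd - 1.0) + 1.0) / (p0 * (rr_cd - 1.0) + 1.0)


def adjust_for_independent_confounders(rr_observed, confounders):
    """Correct an RR for several confounders assumed mutually independent.

    confounders: iterable of (p1, p0, rr_cd) triples.
    """
    return rr_observed / prod(bias_factor(*c) for c in confounders)


# (P1, P0, RRCD) among women with depression (Table 1), with RRCD set
# to the upper bound of the range taken from the literature.
smoking = (0.602, 0.441, 1.2)
obesity = (0.592, 0.405, 1.4)

corrected = adjust_for_independent_confounders(1.20, [smoking, obesity])
print(f"{corrected:.2f}")
```

Because the independence assumption gives the largest correction, this value is best read as a lower bound for the corrected estimate under the stated RRCD values.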
Probabilistic Sensitivity Analyses
The results from the probabilistic sensitivity analyses are summarized in Table 2. The observed unadjusted RRED of 1.51 (1.20–1.90) is reduced to 1.47 after adjustment for smoking with a 95% simulation interval from 1.17 to 1.84. The same RRED adjusted for obesity was estimated at 1.44 (1.15–1.82). The adjusted RRED of 1.20 (0.91–1.57) is further reduced to 1.19 (0.94–1.47) after additional adjustment for smoking, and to 1.16 (0.92–1.44) after additional adjustment for obesity.
Table 2.
Probabilistic Sensitivity Analyses of the observed crude and adjusted relative risk (RR) using NHANES data
Columns P1 and P0 give the prevalence distribution³; the error columns give the RR simulation results (N=1,000).
Analysis | P1 min | P1 mode | P1 max | P0 min | P0 mode | P0 max | Systematic error: median | Systematic error: 2.5th–97.5th percentile | Total error: median | Total error: 2.5th–97.5th percentile
---|---|---|---|---|---|---|---|---|---|---
Smoking | | | | | | | | | |
Unadjusted¹ | 0.29 | 0.65 | 0.70 | 0.22 | 0.31 | 0.44 | 1.48 | (1.44 – 1.50) | 1.47 | (1.17 – 1.84)
Adjusted² | 0.29 | 0.60 | 0.65 | 0.22 | 0.44 | 0.50 | 1.19 | (1.17 – 1.20) | 1.19 | (0.94 – 1.47)
Obesity | | | | | | | | | |
Unadjusted¹ | 0.46 | 0.59 | 0.65 | 0.25 | 0.36 | 0.41 | 1.45 | (1.40 – 1.49) | 1.44 | (1.15 – 1.82)
Adjusted² | 0.46 | 0.59 | 0.65 | 0.25 | 0.41 | 0.45 | 1.16 | (1.12 – 1.19) | 1.16 | (0.92 – 1.44)
¹ Unadjusted RR (95% CI) = 1.51 (1.20–1.90).
² Adjusted RR in depression-restricted cohort (95% CI) = 1.20 (0.91–1.57).
³ Mode for unadjusted analyses based on NHANES prevalence for women 12–55 on Medicaid; mode for adjusted analyses based on NHANES prevalence for women 18–55 with depression.
Minimum and maximum of the triangular distribution are informed by the range of prevalence values for NHANES across populations. If the mode corresponded to the upper or lower bound of the NHANES prevalence range, the bound was shifted by 0.05.
RRCD ranges from 1.0 to 1.2 for smoking and from 1.0 to 1.4 for obesity, with a mode at 1.1 and 1.2, respectively, in all analyses.
Prevalence distributions assumed to be the same for unadjusted and adjusted relative risk corrections, thus assuming independence from confounders included in the propensity score.
DISCUSSION
Pharmaco-epidemiologic studies using health care databases are commonly criticized because of the potential residual confounding from unmeasured non-clinical factors. We used publicly-available files from the NHANES12 to assess the association between our exposure of interest (use of SNRIs) and potential confounders (obesity and smoking) in the US population. This information was used to quantify the magnitude of residual confounding and adjust relative risk estimates for the association between SNRI use during pregnancy and cardiac defects.
We had reported an association between SNRIs and cardiac defects with a crude RR of 1.51, which was corrected to 1.20 using covariates measured in MAX including proxies for severity of depression, diabetes, concomitant medications, and other clinical and health-care utilization characteristics. Further adjustment for smoking and obesity using NHANES data to inform sensitivity analyses yielded RR estimates between 1.20 and 1.10. Hence, in this example, under plausible conditions, residual confounding by obesity and smoking was likely to bias the results upward somewhat. However, the original adjusted RR of 1.20 was unlikely to be fully explained by residual confounding by obesity or smoking alone. There would have to be a substantial (and implausible) increased risk of cardiac defects overall associated with these maternal factors to account for the observed association between SNRIs and cardiac defects. Nonetheless, other unmeasured confounders could further move a modest RR in the 1.1 to 1.2 range to 1.0. Examples include genetic determinants associated with both the risk of depression (and thus antidepressant use) and cardiac malformations.
To illustrate the usefulness of national surveys in general, and NHANES data in particular, we used simple sensitivity analysis methods that allow adjustment for binary variables, assumed constant relative risks across strata, and assumed that measured and unmeasured covariates are independent given exposure.5 To account for potential correlation between covariates, one approach would be to repeat the estimations within levels of important measured confounders and then average the results across strata, provided that the confounder is available in the external data source and the sample is large enough to yield stable stratified confounder distributions.17 For our application, the NHANES sample of SNRI-exposed women was not large enough to stratify by multiple confounders simultaneously. Flexible methods for external adjustment have been developed that allow multivariate adjustment for unmeasured confounders, combined corrections of different biases, or sensitivity analysis of confidence intervals.6,24
Both deterministic and probabilistic sensitivity analyses have advantages and disadvantages. When the unmeasured confounding variable is binary and there is no effect modification and no time-dependent effects, deterministic analyses are somewhat easier to implement and communicate.25 In graphical and tabular displays, one can observe the change in the point estimate as a function of RRCD, P1, and P0. It is left to the reader to decide which combinations of bias parameters are more likely. Probabilistic bias analysis requires the researcher to specify distributions for the bias parameters, which will often be based on (informed) assumptions. Such analysis provides both a bias-corrected point estimate and a 95% simulation interval, which incorporates both random and systematic error. It is less transparent, however, how the point estimate and simulation interval would change as a function of the distributions specified for the bias parameters.
Both approaches are valuable and to some extent complementary. The objective of the current paper was to illustrate the use of NHANES for sensitivity analyses (either deterministic or probabilistic), rather than to compare the available methods for sensitivity analyses.
One important limitation of using NHANES for external adjustment is that the information is cross-sectional, i.e., the exposed group includes both initiators and prevalent users of prescription medications. Therefore, for new-user approaches,26 NHANES data would be suitable when we can assume that the distribution of the characteristics of interest at the time of the survey is not very different from that at treatment initiation. Also, any correlation between the measured and unmeasured confounders can result in an overestimation of the magnitude of bias when correcting an adjusted estimate. However, a transparent disclosure of the plausible range of corrected RRs provides useful information on the direction and upper limit of confounding bias, which is always more desirable than ignoring residual confounding and reporting only the uncorrected RR. Lastly, the population within NHANES is only a surrogate for the study population rather than a random sample of it. For example, we identified women of reproductive age in NHANES to estimate the prevalence of characteristics in the first trimester of pregnancy in the original MAX cohort, which may overestimate the frequency of factors such as smoking during pregnancy. However, the impact of this misclassification would be limited if it is non-differential with respect to the exposure. Finally, one limitation of sensitivity analyses in general is that they correct for only a few known unmeasured confounders; residual confounding (in any direction) by other factors might therefore remain.
In conclusion, after all efforts have been made to reduce confounding in the design and analysis, external adjustment through sensitivity analyses can provide a quantitative assessment of potential bias due to unmeasured confounders. National health surveys such as NHANES can be a valuable source to provide empirical estimates on the prevalence of confounders and their association with prescription medications and other exposures of interest in the US population. Moreover, the survey population can be restricted to subgroups more similar to specific study populations and data sources, e.g., the elderly, Medicaid/Medicare beneficiaries. We used pregnancy research as an example, but the same principles would apply to a wide range of epidemiologic studies with missing information on important covariates captured in health surveys.
Supplementary Material
KEY POINTS.
Health care databases often lack information on lifestyle factors, which can lead to residual confounding in pharmacoepidemiologic studies.
Collection of additional information on confounders from a subsample to conduct external adjustment is often unfeasible.
Sensitivity analyses substitute this information with assumptions on the prevalence of the confounder and its association with both the exposure and the outcome of interest.
The National Health and Nutrition Examination Survey (NHANES) is a public source of information on the prevalence of confounders and their association with use of prescription drugs.
National surveys such as NHANES can inform sensitivity analyses to generate a range of plausible externally adjusted relative risk estimates.
ACKNOWLEDGMENTS:
Work supported by the National Institute of Mental Health R01 MH116194. S Hernandez-Diaz reports being an investigator on research grants to her institution from Eli Lilly, GSK and Takeda, and having consulted for Roche and UCB; all outside the submitted work. BT Bateman reports being an investigator on grants from Eli Lilly, Baxalta, GSK and Pacira outside the submitted work; and reports personal fees from Aetion and from Alosa Foundation outside the submitted work. KF Huybrechts reports being an investigator on grants from Eli Lilly and GSK outside the submitted work. K Palmsten was supported by a career development award from the Eunice Kennedy Shriver National Institute of Child Health & Human Development, National Institutes of Health (R00HD082412).
Abbreviations:
- CDC: Centers for Disease Control and Prevention
- NCHS: National Center for Health Statistics
- NHANES: National Health and Nutrition Examination Survey
REFERENCES
- 1. Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. Journal of clinical epidemiology 2005;58:323–37.
- 2. Huybrechts KF, Bateman BT, Hernandez-Diaz S. Use of real-world evidence from healthcare utilization data to evaluate drug safety during pregnancy. Pharmacoepidemiology and drug safety 2019.
- 3. Kiyota Y, Schneeweiss S, Glynn RJ, Cannuscio CC, Avorn J, Solomon DH. Accuracy of Medicare claims-based diagnosis of acute myocardial infarction: estimating positive predictive value on the basis of review of hospital records. American heart journal 2004;148:99–104.
- 4. Palmsten K, Huybrechts KF, Kowal MK, Mogun H, Hernandez-Diaz S. Validity of maternal and infant outcomes within nationwide Medicaid data. Pharmacoepidemiology and drug safety 2014;23:646–55.
- 5. Cornfield J, Haenszel W, Hammond EC, Lilienfeld AM, Shimkin MB, Wynder EL. Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute 1959;22:173–203.
- 6. Hernan MA, Robins JM. Method for conducting sensitivity analysis. Biometrics 1999;55:1316–7.
- 7. Schneeweiss S. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiology and drug safety 2006;15:291–303.
- 8. Sturmer T, Schneeweiss S, Avorn J, Glynn RJ. Adjusting effect estimates for unmeasured confounding with validation data using propensity score calibration. Am J Epidemiol 2005;162:279–89.
- 9. Schneeweiss S, Glynn RJ, Tsai EH, Avorn J, Solomon DH. Adjusting for unmeasured confounders in pharmacoepidemiologic claims data using external information: the example of COX2 inhibitors and myocardial infarction. Epidemiology 2005;16:17–24.
- 10. Walker AM. Observation and inference. An introduction to the methods of epidemiology. Newton Lower Falls: Epidemiology Resources Inc.; 1991.
- 11. Lash TL, Schmidt M, Jensen AO, Engebjerg MC. Methods to apply probabilistic bias analysis to summary estimates of association. Pharmacoepidemiology and drug safety 2010;19:638–44.
- 12. McDowell A, Engel A, Massey J, Maurer K. Plan and operation of the second National Health and Nutrition Examination Survey, 1976–80. National Center for Health Statistics. Vital Health Stat 1981;1(15).
- 13. http://www.cdc.gov/nchs/data/nhanes/nhanes_13_14/2013-14_overview_brochure.pdf. 2013-2014.
- 14. Palmsten K, Huybrechts KF, Mogun H, et al. Harnessing the Medicaid Analytic eXtract (MAX) to Evaluate Medications in Pregnancy: Design Considerations. PloS one 2013;8:e67405.
- 15. Huybrechts KF, Palmsten K, Avorn J, et al. Antidepressant use in pregnancy and the risk of cardiac defects. The New England journal of medicine 2014;370:2397–407.
- 16. Huybrechts KF, Palmsten K, Mogun H, et al. National trends in antidepressant medication treatment among publicly insured pregnant women. General hospital psychiatry 2013;35:265–71.
- 17. Greenland S. Basic methods for sensitivity analysis of biases. International journal of epidemiology 1996;25:1107–16.
- 18. National Health and Nutrition Examination Survey. NHANES 1999–2000 Data Files. www.cdc.gov/nchs/about/major/nhanes/nhanes99_00.htm. 2005.
- 19. Analytic and reporting guidelines, The National Health and Nutrition Examination Survey (NHANES). 2005. (Accessed February 26, 2013, at http://www.cdc.gov/nchs/data/nhanes/nhanes_03_04/nhanes_analytic_guidelines_dec_2005.pdf )
- 20. Lee LJ, Lupo PJ. Maternal smoking during pregnancy and the risk of congenital heart defects in offspring: a systematic review and metaanalysis. Pediatr Cardiol 2013;34:398–407.
- 21. Zhang D, Cui H, Zhang L, Huang Y, Zhu J, Li X. Is maternal smoking during pregnancy associated with an increased risk of congenital heart defects among offspring? A systematic review and meta-analysis of observational studies. The journal of maternal-fetal & neonatal medicine : the official journal of the European Association of Perinatal Medicine, the Federation of Asia and Oceania Perinatal Societies, the International Society of Perinatal Obstet 2017;30:645–57.
- 22. Zheng Z, Yang T, Chen L, et al. Increased maternal Body Mass Index is associated with congenital heart defects: An updated meta-analysis of observational studies. Int J Cardiol 2018;273:112–20.
- 23. Zhu Y, Chen Y, Feng Y, Yu D, Mo X. Association between maternal body mass index and congenital heart defects in infants: A meta-analysis. Congenit Heart Dis 2018;13:271–81.
- 24. Greenland S. Multiple-bias modelling for analysis of observational data. J R Statist Soc A 2005;168:267–306.
- 25. Vanderweele TJ, Arah OA. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology 2011;22:42–52.
- 26. Ray WA. Evaluating Medication Effects Outside of Clinical Trials: New-User Designs. Am J Epidemiol 2003;158:915–20.