Published in final edited form as: Stata J. 2012 Oct 1;12(4):718–725.

Incorporating Complex Sample Design Effects When Only Final Survey Weights are Available

Brady T West 1, Sean Esteban McCabe 2

Abstract

This article considers the situation that arises when a survey data producer has collected data from a sample with a complex design (possibly featuring stratification of the population, cluster sampling, and/or unequal probabilities of selection) and, for various reasons, only provides secondary analysts of those survey data with a final survey weight for each respondent and "average" design effects for survey estimates computed from the data. In general, these "average" design effects, presumably computed by the data producer in a way that fully accounts for all of the complex sampling features, already incorporate possible increases in sampling variance due to the use of the survey weights in estimation. A secondary analyst who then 1) uses the provided information to compute weighted estimates, 2) computes design-based standard errors reflecting variance in the weights (using Taylor Series Linearization, for example), and 3) inflates the estimated variances using the "average" design effects provided is applying a "double" adjustment to the standard errors for the effect of weighting, leading to overly conservative inferences. We propose a simple method for preventing this problem and provide a Stata program for applying appropriate adjustments to variance estimates in this situation. We illustrate two applications of the method using survey data from the Monitoring the Future (MTF) study, and conclude with suggested directions for future research in this area.

Background

Standard practice in the design-based analysis of complex sample survey data requires data analysts to identify variables containing final survey weights (possibly compensating for unequal probabilities of selection, nonresponse adjustments, and/or post-stratification adjustments) and either variables identifying first-stage sampling error strata and first-stage sampling error computation units (SECUs, or "ultimate clusters"), or variables containing replicate survey weights, which enable the use of replicated variance estimation procedures (Heeringa et al., 2010, Chapter 4). In both cases, the final survey weights enable computation of unbiased estimates of descriptive parameters and regression parameters in finite populations. In the former case, one estimates the sampling variance of a parameter estimate using a first-order Taylor Series approximation (approximating the parameter estimate as a linear function of weighted sample totals, and then computing the variance of this approximation). This method is known as "Taylor Series Linearization," or TSL, and it introduces a slight positive bias in variance estimates (and thus slightly conservative inferences about the population of interest); see Wolter (2007) for more details. Replicated variance estimation methods, such as Jackknife Repeated Replication (JRR) and Balanced Repeated Replication (BRR), are also possible when these design codes are available. In the latter case, when only the final survey weight and replicate survey weights are available, JRR and BRR can be used to estimate variances. Asymptotically, TSL, JRR, and BRR converge to very similar variance estimates for most parameter estimates (Rao and Wu, 1985).

The Stata software provides data analysts with several easy-to-use tools implementing these analysis procedures, including svyset (for identifying complex design features), svydes (for simple descriptive analyses of the identified design features), and the svy: prefix, which can be inserted before a wide variety of descriptive (e.g., mean) and model-based (e.g., regress) commands to implement appropriate design-based analyses. Unfortunately, not all public-use survey data sets contain all of the variables needed for these variance estimation procedures. Stratum and SECU codes are often excluded from public-use survey data sets to maintain respondent confidentiality and limit disclosure risk (Lu and Sitter, 2008). Some data sets may also exclude replicate weights, providing data analysts with only the final survey weight (e.g., Johnston et al., 2008). In this case, the data producer needs to provide the data user with design effects for a wide variety of key parameters (e.g., Johnston et al., 2011, pp. 532-552), which the data user can then use to adjust variance estimates and associated confidence intervals, correctly accounting for complex sampling features (e.g., Thomas et al., 2005).
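To make the contrast concrete, the following is a minimal sketch of the three svyset declarations implied above. The variable names (stratum, secu, finalwt, jkw1-jkw50) are hypothetical placeholders, not variables from any particular data set:

* (a) Full design codes available: strata, clusters, and weights
*     (TSL is Stata's default variance estimator in this case).
svyset secu [pweight = finalwt], strata(stratum)
* (b) Replicate weights available: e.g., jackknife replicate weights.
svyset [pweight = finalwt], jkrweight(jkw1-jkw50) vce(jackknife)
* (c) Only a final survey weight available: the weights enter the
*     variance estimates, but stratification and clustering do not.
svyset [pweight = finalwt]

Declaration (c) is the situation this article addresses: variance estimates reflect the weights alone, and any stratification and clustering effects must be introduced afterward via design effect adjustments.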

In general, following the notation used by Park et al. (2003), the design effect for a given parameter estimate $\hat{\theta}$ is defined as follows:

$$\text{Deft}^2(\hat{\theta}) = \frac{\mathrm{Var}(\hat{\theta})_{CD}}{\mathrm{Var}(\hat{\theta})_{SRSWR}} = \text{Deft}^2_{strat} \times \text{Deft}^2_{clust} \times \text{Deft}^2_{weights} \qquad (1)$$

In (1), $\mathrm{Var}(\hat{\theta})_{SRSWR}$ refers to the variance of the estimate computed ignoring the weights and assuming simple random sampling with replacement (SRSWR). The total design effect $\text{Deft}^2(\hat{\theta})$ (which is estimated in practice) accounts for the multiplicative change in the variance of an estimate under SRSWR due to complex sample design (CD) features, including without-replacement selection, stratified cluster sampling (generally resulting in an increase in the variance due to the cluster sampling, i.e., $\text{Deft}^2_{clust} > 1$, and a decrease in the variance due to stratified sampling, i.e., $\text{Deft}^2_{strat} < 1$), and the use of weights in estimation (generally resulting in an increase in the variance, i.e., $\text{Deft}^2_{weights} > 1$; see Heeringa et al., 2010, Section 2.5).

Previous work has shown that the total design effect for an estimated mean is an approximate function of complex interactions between the various sample design features, the relationship of the variable of interest with the sampling weights, and the distribution of the variable of interest (Park and Lee, 2004). For practical purposes, we write the total design effect in (1) as the simple product of three separate design effects due to each complex sampling feature, as suggested by Park et al. (2003). Importantly, this result only holds if the survey variable of interest and the survey weights are independent (Park and Lee, 2004). Under this assumption, if a data producer only provides a data user with a final survey weight and average total design effects, the data user can compute weighted estimates and linearized variance estimates based on the weights (introducing $\text{Deft}^2_{weights}$), and then further adjust the estimated variances to incorporate the additional design effects due to stratified cluster sampling.

Because total design effects include effects on the variance due to all of the elements of complex sampling (weighting, stratification, and clustering), one does not want to "doubly adjust" for the effects of weighting if a linearized variance estimator incorporating the weights has already been used. For example, suppose that a data producer provides the public with a survey weight in a data file, possibly incorporating compensations for unequal probabilities of selection, differential nonresponse, and post-stratification. The data producer also provides the data user with average design effects for many estimates of interest, several of which suggest that $\text{Deft}^2(\hat{\theta}) = 2.0$. Importantly, this average design effect of 2.0 already includes multiplicative increases in the variance due to weighting (as computed by the data producer), as shown in (1). The data user then computes weighted estimates and linearized variance estimates for those weighted estimates (which incorporate increases in variance due to variability in the weights), and proceeds to multiply the estimated variance by the average design effect of 2.0 (which already includes the increase in variance due to weighting), as instructed by the data producer. The net result is an unnecessary inflation of the variance of the estimate, and overly conservative inferences.
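To see the size of the problem concretely, consider the first estimate in the illustration below, where estat effects reports a weighting design effect of about 1.35. The linearized variance already reflects that factor, so multiplying it by the full average design effect of 2.0 yields an effective inflation of

$$1.3528 \times 2.0 \approx 2.71$$

relative to the variance under SRSWR, rather than the intended 2.0. The standard error is therefore overstated by a factor of $\sqrt{1.3528} \approx 1.16$.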

A simple adjustment procedure [based on the approximation in (1)] can be used if the data user is only provided with a final survey weight and an average total design effect for a number of key statistics. The design effect due to the use of weights (which we denote as $\text{Deft}^2_{weights}$) can first be estimated using the estat effects command in Stata, after estimating a parameter and using TSL to estimate the variance of the parameter estimate. The average design effect provided by the data producer (denoted $\overline{\text{Deft}^2}$) can then be divided by the design effect due to weighting to "extract" the approximate portion of the overall average design effect due to stratified cluster sampling:

$$\frac{\overline{\text{Deft}^2}}{\text{Deft}^2_{weights}} \approx \text{Deft}^2_{strat} \times \text{Deft}^2_{clust} \qquad (2)$$

This extracted portion of the overall design effect can then be used to adjust estimated variances and corresponding test statistics and confidence intervals. If degrees of freedom based on the complex sample are also provided by the data producer, these can be incorporated into the adjustments as well. If not, large-sample critical values based on the standard normal distribution can be used to compute p-values for standard test statistics and construct confidence intervals.
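Written out on the standard error scale (the form used by the program below), the adjustment multiplies the linearized standard error by the square root of the ratio in (2), and the adjusted standard error is then used for confidence intervals in the usual way:

$$\text{SE}_{adj}(\hat{\theta}) = \text{SE}_{TSL}(\hat{\theta}) \times \sqrt{\frac{\overline{\text{Deft}^2}}{\text{Deft}^2_{weights}}}, \qquad \hat{\theta} \pm z_{1-\alpha/2} \, \text{SE}_{adj}(\hat{\theta}).$$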

The following Stata .ado file defines a simple command that takes five arguments [a weighted parameter estimate (1), the linearized estimate of its standard error (2), an average total design effect (3), a design effect due to weighting (4), and an indicator of whether exponentiated forms of the parameter estimate are desired (5)] and implements this type of design effect adjustment.

program define deft2corr
    * Arguments: `1' = weighted parameter estimate, `2' = linearized SE,
    * `3' = average total design effect, `4' = design effect due to
    * weighting, `5' = 1 to exponentiate the estimate and CI, 0 otherwise.
    capture log close
    di " "
    if `5' == 1 {
        di "Exponentiated Estimate: " exp(`1')
        di "95% CI LL: " exp(`1' - 1.96*`2'*(sqrt(`3')/sqrt(`4')))
        di "95% CI UL: " exp(`1' + 1.96*`2'*(sqrt(`3')/sqrt(`4')))
    }
    if `5' == 0 {
        di "Estimate: " `1'
        di "95% CI LL: " `1' - 1.96*`2'*(sqrt(`3')/sqrt(`4'))
        di "95% CI UL: " `1' + 1.96*`2'*(sqrt(`3')/sqrt(`4'))
    }
    * Z statistic based on the adjusted SE, with a two-sided p-value
    * from the standard normal distribution.
    di "Z statistic: " `1'/(`2'*(sqrt(`3')/sqrt(`4')))
    if `1'/(`2'*(sqrt(`3')/sqrt(`4'))) > 0 di "p-value: " 2*(1 - normal(`1'/(`2'*(sqrt(`3')/sqrt(`4')))))
    if `1'/(`2'*(sqrt(`3')/sqrt(`4'))) <= 0 di "p-value: " 2*normal(`1'/(`2'*(sqrt(`3')/sqrt(`4'))))
end
exit
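Note that the factor sqrt(`3')/sqrt(`4') appearing throughout the program is simply the square root of the ratio in (2), applied on the standard error scale as shown above. Consistent with the earlier discussion, the program uses large-sample critical values and p-values based on the standard normal distribution, since complex-sample degrees of freedom are assumed to be unavailable.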

We illustrate the use of the deft2corr command by analyzing data from the Monitoring the Future (MTF) study (years 2007-2009).

Illustration

Nonmedical use of prescription opioids is a growing public health problem in the United States. Previous research focusing on young adults has found that more than 1 in every 10 lifetime nonmedical users of prescription opioids reports intranasal administration (snorting) (McCabe et al., 2007). Furthermore, approximately 67% of intranasal users screened positive for drug abuse in the past year, compared with approximately 6% of non-users and 26% of nonmedical users who reported oral administration only. The use of prescription opioids via intranasal and other non-oral routes of administration is an extremely dangerous drug use behavior that has been linked to a number of adverse physical consequences (Jewers et al., 2005; Watson et al., 2004; Yewell et al., 2002). Additionally, the rate at which a drug is delivered to the brain correlates directly with its abuse potential, and intranasal and other non-oral routes of administration deliver drugs to the brain at a much faster rate than oral administration (Kollins, 2003; Roset et al., 2001).

In this example, we analyze survey data from the Monitoring the Future (MTF) study (years 2007-2009) and focus on two research objectives for 12th-grade students in the United States. The first objective is to estimate the proportion of nonmedical users of prescription opioids among high school seniors (modal age 18) who use via intranasal administration. The second objective is to use logistic regression modeling to estimate differences in the odds of intranasal administration between nonmedical users only, nonmedical users who began using nonmedically prior to medical use, and nonmedical users who began using medically before nonmedical use (adjusting for race/ethnicity, year, geographic region of the school, and metropolitan statistical area). The MTF study only provides data users with a final survey weight (the variable V5; Johnston et al., 2008), and includes appendices of total design effects for a variety of estimates, enabling the computation of average total design effects (Johnston et al., 2011, Appendix C). For this illustration, we use an average MTF total design effect of 2.0 (Johnston et al., 2011).

Prior to running the analyses, we examined the critical assumption underlying the result for the total design effect in (1), and found that the correlation of the survey weight variable (V5) and the nasal administration indicator (V1615) was negligible (r = 0.01, p = 0.82).
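This kind of check is straightforward in Stata; a minimal sketch, assuming the MTF variables are already loaded, is:

* Check the independence assumption underlying (1): correlation of the
* survey weight (V5) with the nasal administration indicator (V1615).
pwcorr V5 V1615, sig

The following Stata commands then implement the analyses and illustrate the use of the deft2corr command: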

* Set up survey weight and linearized variance estimation (default).
svyset [pweight = V5]
* Estimate the proportion of nonmedical users (subpop) using nasal
* admin (V1615).
svy, subpop(if usehistory == 3 | usehistory == 4 | usehistory == 5): prop V1615
* Request DEFT2 due to weighting (only).
estat effects
* Apply deft2corr, inputting the estimate, linearized SE, average
* total DEFT2, and DEFT2 component due to weights.
deft2corr 0.3627 0.0257 2 1.3528 0
* Estimate logistic regression model for odds of nasal admin.
svy, subpop(if usehistory == 3 | usehistory == 4 | usehistory == 5): logit V1615 ib3.usehistory i.V1151 i.V1 i.V13 i.V17
* Request DEFT2 estimates due to weighting (only).
estat effects
* Apply deft2corr for each parameter estimate, requesting
* exponentiated estimates (adjusted odds ratios).
deft2corr 1.7938 0.5310 2 1.4987 1
deft2corr 2.0390 0.5172 2 1.5098 1
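As a quick check on the first deft2corr call, the corrected interval for the proportion can be reproduced by hand using the standard error adjustment implied by (2):

$$0.3627 \pm 1.96 \times 0.0257 \times \sqrt{2.0 / 1.3528} = 0.3627 \pm 0.0612 = (0.3015,\ 0.4239),$$

which matches the final column of Table 1.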

Table 1 presents results from running the commands above (columns 2 and 5), along with the resulting 95% confidence intervals when only using the weights without any adjustment for stratification and clustering (column 3), and the confidence intervals computed when applying the average total design effect on top of the linearized standard error already incorporating the weights (the “double adjustment”; column 4).

Table 1.

Comparisons of Design-Based 95% Confidence Intervals for Selected Parameters using Alternative Design Effect Adjustments (Source: MTF 2007-2009).

Parameter | Weighted Estimate | 95% CI: Without Adjustment for Stratification and Clustering | 95% CI: With "Double Adjustment" for Weighting | 95% CI: With Correct Adjustment
Proportion: Nasal Administration | 0.3627 | (0.3124, 0.4130) | (0.2915, 0.4339) | (0.3015, 0.4239)
AOR: NMPM vs. MPNM | 6.0120 | (2.1230, 17.0245)** | (1.3798, 26.1973)* | (1.8067, 20.0071)**
AOR: NMO vs. MPNM | 7.6826 | (2.7874, 21.1746)*** | (1.8320, 32.2206)** | (2.3923, 24.6734)***

NOTES: AOR = Adjusted Odds Ratio. NMPM = Nonmedical Use Prior to Medical Use. MPNM = Medical Use Prior to Nonmedical Use. NMO = Nonmedical Use Only. t-test of null hypothesis that regression parameter is equal to 0: *** p < 0.001; ** p < 0.01; * p < 0.05.

An estimated 36.27% of nonmedical users of prescription opioids in the 12th grade (modal age 18) during 2007-2009 administered the opioids intranasally. Nonmedical users who used nonmedically prior to any medical use and nonmedical users only (with no medical use ever) had 6.01 and 7.68 times the odds, respectively, of intranasal administration relative to nonmedical users who used medically first, holding the other covariates fixed.

The results in Table 1 clearly show that the correct design effect adjustment (last column) produces 95% confidence intervals for the parameters of interest with lower and upper limits that lie between the overly conservative limits computed using the "double" adjustment for the effect of weighting and the overly liberal limits computed without adjusting for the effects of stratification and clustering. For the two adjusted odds ratios, the level of significance varies depending on the adjustment used, suggesting that the overly conservative "double" adjustment could affect tests of significance in other cases and contexts as well.

Notably, the average total design effect provided by a data producer (2.0 in this case) is essentially arbitrary and can have a substantial impact on inferences. For this reason, it is essential for data users to obtain an appropriate average total design effect for the subpopulation and the estimates in which they are interested when using these adjustment methods. We also acknowledge that more complex corrections may be needed if the provided survey weights are informative, that is, strongly related to the survey variables of interest. Future research on this scenario should consider whether the results of Park and Lee (2004) can be applied to enable data users who are only provided with survey weights and average design effects to correctly compute estimated standard errors that fully reflect all of the complex features of a given sample design.

Acknowledgements

The development of this article was supported by a research grant R01DA031160 from the National Institute on Drug Abuse, National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Drug Abuse or the National Institutes of Health. The authors would like to thank the Substance Abuse and Mental Health Data Archive for providing access to these data, and one anonymous reviewer for helpful comments on earlier versions of this paper.

Biography

Brady T. West, M.A., Ph.D. is a Research Assistant Professor in the Survey Methodology Program, located within the Survey Research Center at the Institute for Social Research on the University of Michigan-Ann Arbor (U-M) campus. He also serves as a Statistical Consultant at the U-M Center for Statistical Consultation and Research. A winner of the Edward C. Bryant Scholarship for outstanding academic achievement in survey statistics (sponsored by Westat and the American Statistical Association) as a doctoral student, his current research interests include the implications of measurement error in auxiliary variables and survey paradata for survey estimation, survey nonresponse, interviewer variance, analysis of complex sample survey data, and multilevel regression models for clustered and longitudinal data. He is the lead author of a book comparing different statistical software packages in terms of their mixed-effects modeling procedures (Linear Mixed Models: A Practical Guide using Statistical Software, Chapman Hall/CRC Press, 2007), with a second edition currently being written, and he is a co-author of a second book entitled Applied Survey Data Analysis (with Steven Heeringa and Pat Berglund), which was published by Chapman Hall in April 2010.

Sean Esteban McCabe, M.A., M.S.W., Ph.D. is a Research Associate Professor at the University of Michigan Substance Abuse Research Center (UMSARC) and the Institute for Research on Women and Gender (IRWG). Dr. McCabe is an internationally recognized scholar in the areas of web-based data collection, prescription drug misuse, gender, sexual orientation, and epidemiology of substance abuse. Dr. McCabe has served as the Director of the Substance Abuse Research Center and the Office of Student Conflict Resolution at the University of Michigan. Dr. McCabe was recently the recipient of the University of Michigan Research Faculty Recognition Award and he has been the recipient of two NIH Junior Investigator Awards. Dr. McCabe reviews manuscripts for 24 health and substance abuse journals and he reviews grant applications for several organizations including the National Institutes of Health and Department of Education. He has served as principal investigator of eight NIH-funded projects in the past five years, as well as a participating investigator and faculty mentor on a number of NIH-funded projects and he has authored over 90 peer-reviewed articles.

Contributor Information

Brady T. West, Survey Methodology Program, Survey Research Center, Institute for Social Research, University of Michigan-Ann Arbor, Ann Arbor, MI (bwest@umich.edu)

Sean Esteban McCabe, Institute for Research on Women and Gender, Substance Abuse Research Center, University of Michigan-Ann Arbor, Ann Arbor, MI (plius@umich.edu).

References

1. Heeringa SG, West BT, Berglund PA. Applied Survey Data Analysis. Boca Raton, FL: Chapman and Hall/CRC Press; 2010.
2. Jewers WM, Rawal YB, Allen CM, Kalmar JR, Fox E, Chacon GE, Sedghizadeh PP. Palatal Perforation Associated with Intranasal Prescription Narcotic Abuse. Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, Endodontology. 2005;99:594–597. doi:10.1016/j.tripleo.2004.04.006.
3. Johnston LD, O'Malley PM, Bachman JG, Schulenberg JE. Monitoring the Future: A Continuing Study of American Youth (12th-Grade Survey), 2007. Form 1 Data Codebook. Ann Arbor, MI: Inter-University Consortium for Political and Social Research, University of Michigan; 2008.
4. Johnston LD, O'Malley PM, Bachman JG, Schulenberg JE. Monitoring the Future National Survey Results on Drug Use, 1975-2010. Volume I: Secondary School Students. Ann Arbor, MI: Institute for Social Research, University of Michigan; 2011.
5. Kollins SH. Comparing the Abuse Potential of Methylphenidate Versus Other Stimulants: A Review of Available Evidence and Relevance to the ADHD Patient. Journal of Clinical Psychiatry. 2003;64:14–18.
6. Lu W, Sitter RR. Disclosure Risk and Replication-Based Variance Estimation. Statistica Sinica. 2008;18:1669–1687.
7. McCabe SE, Cranford JA, Boyd CJ, Teter CJ. Motives, Diversion and Routes of Administration Associated with Nonmedical Use of Prescription Opioids. Addictive Behaviors. 2007;32:562–575. doi:10.1016/j.addbeh.2006.05.022.
8. Park I, Lee H. Design Effects for the Weighted Mean and Total Estimators under Complex Survey Sampling. Survey Methodology. 2004;30(2):183–193.
9. Park I, Winglee M, Clark J, Rust K, Sedlak A, Morganstein D. Design Effects and Survey Planning. Proceedings of the Section on Survey Research Methods, Joint Statistical Meetings; 2003. pp. 3179–3186.
10. Rao JNK, Wu CFJ. Inference from Stratified Samples: Second-Order Analysis of Three Methods for Nonlinear Statistics. Journal of the American Statistical Association. 1985;80:620–630.
11. Roset PN, Farre M, de la Torre R, Mas M, Menoyo E, Hernandez C. Modulation of Rate of Onset and Intensity of Drug Effects Reduces Abuse Potential in Healthy Males. Drug and Alcohol Dependence. 2001;64:285–298. doi:10.1016/s0376-8716(01)00127-2.
12. Thomas SL, Heck RH, Bauer KW. Weighting and Adjusting for Design Effects in Secondary Data Analyses. Chapter 4 in New Directions for Institutional Research. 2005;2005(127).
13. Watson WA, Litovitz TL, Klein-Schwartz W, Rodgers GC, Youniss J, Reid N, Rouse WG, Rembert RS, Borys D. 2003 Annual Report of the American Association of Poison Control Centers Toxic Exposure Surveillance System. American Journal of Emergency Medicine. 2004;22:335–404. doi:10.1016/j.ajem.2004.06.001.
14. Wolter KM. Introduction to Variance Estimation. 2nd ed. New York: Springer-Verlag; 2007.
15. Yewell J, Haydon R, Archer S, Manaligod JM. Complications of Intranasal Prescription Narcotic Abuse. Annals of Otology, Rhinology and Laryngology. 2002;111:174–177. doi:10.1177/000348940211100212.
