American Journal of Public Health. 2015 Nov;105(11):2214–2215. doi: 10.2105/AJPH.2015.302842

Weighted Multilevel Models: A Case Study

Brady T West, Linda Beer, Garrett W Gremel, John Weiser, Christopher H Johnson, Shikha Garg, Jacek Skarbinski
PMCID: PMC4605185  PMID: 26378830

Recent advances in statistical software1 have enabled public health researchers to fit multilevel models to a variety of outcome variables. Multilevel models facilitate inferences regarding unexplained variability among randomly sampled clusters of units (e.g., hospitals) in outcomes of interest and identify covariates that explain the variance in a given outcome at each level of a particular data hierarchy (e.g., patients within hospitals).2,3 Models with random intercepts enable researchers to accommodate correlations within higher-level units resulting from longitudinal or clustered study designs, and models with random coefficients enable researchers to identify higher-level covariates that explain between-cluster variance in relationships of interest.2,3
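The random-intercept idea described above can be sketched with a small simulation. This is a hypothetical example, not data or code from this study: patients are nested within hospitals, each hospital contributes its own random intercept, and `statsmodels` recovers the fixed effects and the between-hospital variance.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulate 30 hospitals with 20 patients each; each hospital gets its own
# random intercept u_j, which induces within-hospital correlation.
n_hosp, n_pat = 30, 20
hosp = np.repeat(np.arange(n_hosp), n_pat)
u = rng.normal(0.0, 1.0, n_hosp)               # random hospital effects
x = rng.normal(size=n_hosp * n_pat)            # patient-level covariate
y = 2.0 + 0.5 * x + u[hosp] + rng.normal(0.0, 1.0, n_hosp * n_pat)

df = pd.DataFrame({"y": y, "x": x, "hospital": hosp})

# Random-intercept linear model: y_ij = b0 + b1 * x_ij + u_j + e_ij
fit = smf.mixedlm("y ~ x", df, groups=df["hospital"]).fit()
print(fit.params)
```

The estimated slope for `x` should land near the true value of 0.5, and the reported group variance reflects the unexplained between-hospital variability the text describes.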

Public-use survey data sets collected from large national samples, such as the National Health and Nutrition Examination Survey, also have become widely available.4 The samples underlying these data sets are often “complex” in nature for 2 reasons: (1) the use of stratified multistage cluster sampling to increase sampling and cost efficiency and (2) unequal probabilities of selection from target populations for sampled elements, often as a result of oversampling of key subgroups (leading to the need to use weights for generating unbiased population estimates). Secondary analysts can accommodate these design complexities statistically by using “design-based” analyses, which ensure that population inferences are unbiased with respect to the sample design.4 However, these design-based approaches generally do not enable the types of cluster-specific inferences afforded by multilevel models,2,3 and researchers are now considering multilevel models for complex sample survey data.
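The effect of unequal selection probabilities on population estimates can be illustrated with a toy weighted mean. The numbers below are hypothetical; the weighted (Hájek) estimator down-weights an oversampled subgroup that the unweighted mean overrepresents.

```python
import numpy as np

# Hypothetical sample: outcome y and selection probabilities p; the first
# three units belong to an oversampled subgroup (twice the selection chance).
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
p = np.array([0.2, 0.2, 0.2, 0.1, 0.1, 0.1])
w = 1.0 / p                                    # design weights

unweighted = y.mean()                          # biased toward oversampled group
weighted = np.sum(w * y) / np.sum(w)           # Hajek estimator of the pop. mean
print(unweighted, weighted)
```

Here the unweighted mean is 0.5, while the weighted estimate is 1/3: the oversampled subgroup represents fewer population units per sampled case, so its members receive smaller weights.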

Multilevel modeling represents a “model-based” approach to survey data analysis, in which dependencies in the data introduced by complex sampling features are generally accounted for by sound specification of the underlying probability model.5,6 Advocates of this approach argue that any information contained in the sample design features should be accounted for in the model specification, making the sampling uninformative.5 However, analysts may not have access to covariates capturing all of this information. In this case, the use of weighted estimation when fitting multilevel models provides some protection against potential biases introduced by informative sampling.6 Informed by recent methodological and computational developments in this area,1–3,6,7 we show that changes in inferences are possible when fitting multilevel models to complex sample survey data and ignoring the sampling weights.

We analyzed data from the 2013 Medical Monitoring Project HIV Provider Survey, sponsored by the Centers for Disease Control and Prevention, for which a probability sample of HIV care providers was selected from outpatient HIV care facilities in 16 states and Puerto Rico.8,9 Briefly, the provider survey followed a 2-stage probability-proportionate-to-size sample design, first sampling states and territories and then HIV facilities and selecting all providers within a facility. Unbiased estimation of multilevel model parameters requires the use of weights at all levels of a given data hierarchy,7 so we used previously calculated sampling weights adjusted for nonresponse at the facility level and inverses of estimated response probabilities at the provider level.
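The general logic of building weights for a multistage design can be sketched as follows. All probabilities below are invented for illustration and do not reproduce the Medical Monitoring Project weighting scheme: the base weight is the inverse of the overall selection probability (the product of the stage-specific probabilities), and a nonresponse adjustment divides by an estimated response probability.

```python
import numpy as np

# Hypothetical two-stage design: stage-1 (state/territory) and stage-2
# (facility) selection probabilities, plus estimated response propensities.
p_state = np.array([0.50, 0.50, 0.25])
p_facility = np.array([0.10, 0.20, 0.10])
p_response = np.array([0.80, 1.00, 0.50])

# Base weight = inverse of the overall selection probability;
# the nonresponse adjustment divides by the response probability.
base_w = 1.0 / (p_state * p_facility)
final_w = base_w / p_response
print(final_w)
```

Each final weight can be read as the number of population units a responding case represents after accounting for both selection and nonresponse.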

We focus only on facilities with multiple responding providers and include covariates that are both theoretically relevant for the dependent variables described later in this article and related to the sampling weights (e.g., an indicator of the provider serving more than 200 patients). Details about computation of the Medical Monitoring Project sampling weights for both providers and facilities are available on request.10 We scaled the final provider-level weights to sum to the sample sizes within each facility. A failure to do this would overstate actual sample sizes within each higher-level unit (facility), possibly resulting in biased estimates of model parameters.2,3,7
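The weight-scaling step described above can be sketched directly. The weights below are hypothetical; within each facility, the scaled weights are forced to sum to that facility's realized sample size, which prevents the software from treating the weights as inflated cluster sizes.

```python
import pandas as pd

# Hypothetical provider-level weights nested within facilities.
df = pd.DataFrame({
    "facility": ["A", "A", "A", "B", "B"],
    "w": [10.0, 20.0, 30.0, 5.0, 15.0],
})

# Scale so weights sum to the number of responding providers per facility:
# w_scaled = w * n_j / sum of w within facility j.
grp = df.groupby("facility")["w"]
df["w_scaled"] = df["w"] * grp.transform("size") / grp.transform("sum")
print(df)
```

After scaling, facility A's three weights sum to 3 and facility B's two weights sum to 2, while the relative sizes of the weights within each facility are preserved.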

We fit multilevel logistic regression models to 2 binary dependent variables, indicating whether the responding provider delivered adequate drug use risk reduction and sexual risk reduction services to patients (defined as delivering approximately 70% of recommended risk reduction services to most or all of the patients). The models included random intercepts to capture between-facility variation in each proportion, in addition to fixed effects of several provider- and facility-level covariates of interest. We fit these models with the GLIMMIX procedure11 in SAS/STAT version 13.1 (SAS Institute, Cary, NC), which can fit multilevel models to complex sample survey data. Identical results can be obtained with the new svy: melogit command in Stata version 14 (StataCorp LP, College Station, TX).
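In standard notation (following the multilevel modeling literature cited above, not equations reproduced from this article), the random-intercept logistic model for provider $i$ in facility $j$ can be written as

$$
\operatorname{logit}\,\Pr(y_{ij} = 1 \mid u_j) \;=\; \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j,
\qquad u_j \sim N(0, \sigma_u^2),
$$

where $\mathbf{x}_{ij}$ collects the provider- and facility-level covariates, $\boldsymbol{\beta}$ are the fixed effects, and $\sigma_u^2$ is the between-facility variance. Under weighted (pseudo-maximum-likelihood) estimation, each provider's contribution to the log-likelihood is weighted by the scaled level-1 weight and each facility's contribution by the level-2 weight.7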

We did not test whether the parameter differences in the weighted and unweighted models were significant,12 but we did observe several shifts in inference when using weighted estimation (Table A; available as a supplement to the online version of this article at http://www.ajph.org). In both models, the intercept became more negative and significant, suggesting that the probability of using adequate risk reduction was being overstated for the type of provider represented by zeroes on all of the covariates (which may not be entirely meaningful in all models). For drug risk reduction, the coefficient for delivering care in a language other than English became nonsignificant. For the sexual risk reduction outcome, the male provider coefficient became significant, and the Black provider, nurse practitioner, and integrated team effects became even stronger. Finally, the estimated variability of the random facility intercepts was clearly being overstated when ignoring the weights, and the weighted models explained more of the variance in the outcomes at each level.

The weights at each level were clearly informative about the parameters defining these models, and ignoring them in analysis would have led to erroneous inferences with respect to the sample design used. Notably, these results held despite the inclusion of available covariates related to the sampling weights in the models. In practice, covariates used to compute the weights or the weights at each level of the data hierarchy may not be available to the public, making appropriate design-adjusted estimation of multilevel models difficult or impossible. We encourage analysts fitting multilevel models to survey data to carefully examine the variables available for weighted estimation in these data sets, make use of the powerful software1–3,11 that has been developed in this area, and (when possible) examine whether weighted estimation or adjustment for covariates related to the weights affects their inferences.

Acknowledgments

We thank the participating Medical Monitoring Project providers, facilities, project areas, and Provider and Community Advisory Board members. We also acknowledge the contributions of the Clinical Outcomes Team, the Behavioral and Clinical Surveillance Branch, other members of the Division of HIV/AIDS Prevention at Centers for Disease Control and Prevention, and the Medical Monitoring Project 2013 Study Group members (http://www.cdc.gov/hiv/statistics/systems/mmp/resources.html#StudyGroupMembers). Finally, we would like to thank the Altarum Institute data collection team.

Human Participant Protection

In accordance with federal human participant protection regulations [Protection of Human Subjects, US Federal Code Title 45 Part 46 (2009). Available at: http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.html. Accessed February 4, 2014] and guidelines for defining public health research [Centers for Disease Control and Prevention. Distinguishing Public Health Research and Public Health Nonresearch. 2010. Available at: http://www.cdc.gov/od/science/integrity/docs/cdc-policy-distinguishing-public-health-research-nonresearch.pdf. Accessed February 4, 2014], the Medical Monitoring Project Provider Survey was determined by Centers for Disease Control and Prevention to be a nonresearch, public health surveillance activity used for disease control program or policy purposes. Recruitment materials explained the voluntary nature of the survey; informed consent was not obtained.

References

1. Galecki AT, West BT. Software for fitting multilevel models. In: Scott MA, Simonoff JS, Marx BD, editors. The SAGE Handbook of Multilevel Modeling. Los Angeles, CA: Sage Publishing; 2013. pp. 465–484.
2. Carle AC. Fitting multilevel models in complex survey data with design weights: recommendations. BMC Med Res Methodol. 2009;9:49–62. doi: 10.1186/1471-2288-9-49.
3. Rabe-Hesketh S, Skrondal A. Multilevel modelling of complex survey data. J R Stat Soc Ser A. 2006;169(4):805–827.
4. Sakshaug JW, West BT. Important considerations when analyzing health survey data collected using a complex sample design. Am J Public Health. 2014;104(1):15–16. doi: 10.2105/AJPH.2013.301515.
5. Little RJ. To model or not to model? Competing modes of inference for finite population sampling. J Am Stat Assoc. 2004;99(466):546–556.
6. Pfeffermann D. Modelling of complex survey data: why model? Why is it a problem? How can we approach it? Surv Methodol. 2011;37:115–136.
7. Pfeffermann D, Skinner CJ, Holmes DJ, Goldstein H, Rasbash J. Weighting for unequal selection probabilities in multilevel models. J R Stat Soc Ser B. 1998;60:23–40.
8. Frankel MR, McNaghten A, Shapiro MF, et al. A probability sample for monitoring the HIV-infected population in care in the U.S. and in selected states. Open AIDS J. 2012;6:67–76. doi: 10.2174/1874613601206010067.
9. Behavioral and Clinical Characteristics of Persons Receiving Medical Care for HIV Infection—Medical Monitoring Project, United States, 2010. Atlanta, GA: Centers for Disease Control and Prevention; 2010. HIV Surveillance Special Report 9.
10. Centers for Disease Control and Prevention. HIV/AIDS: Medical Monitoring Project (MMP). 2015. Available at: http://www.cdc.gov/hiv/statistics/systems/mmp/contact.html. Accessed June 29, 2015.
11. Zhu M. Analyzing Multilevel Models With the GLIMMIX Procedure. Cary, NC: SAS Institute Inc; 2014. Paper SAS026–2014.
12. Nordberg L. Generalized linear modeling of sample survey data. J Off Stat. 1989;5:223–239.
