Abstract
Data in health research are frequently structured hierarchically. For example, data may consist of patients nested within physicians, who in turn may be nested in hospitals or geographic regions. Fitting regression models that ignore the hierarchical structure of the data can lead to false inferences being drawn from the data. Implementing a statistical analysis that takes into account the hierarchical structure of the data requires special methodologies.
In this paper, we introduce the concept of hierarchically structured data and present an introduction to hierarchical regression models. We then compare the performance of a traditional regression model with that of a hierarchical regression model on a dataset relating test utilization at the annual health exam to patient and physician characteristics. Comparing the resulting models shows that false inferences can be drawn by ignoring the structure of the data.
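The risk described above can be illustrated with a small simulation. The sketch below is not the analysis from the paper: it assumes a continuous outcome, a hypothetical binary physician-level trait with no true effect, and made-up sample sizes and variances. It compares a naive analysis that treats every patient as independent with a comparison of physician means, which is about the simplest analysis that respects the nesting in a balanced design.

```python
import random
import statistics
from math import sqrt


def simulate(n_physicians=40, patients_per_physician=20,
             sd_physician=1.0, sd_patient=1.0, seed=0):
    """Simulate patients nested within physicians (all values are assumptions).

    The physician-level covariate x has NO true effect on the outcome.
    """
    rng = random.Random(seed)
    rows = []  # each row: (physician_id, physician_covariate, outcome)
    for j in range(n_physicians):
        x = j % 2                           # hypothetical binary physician trait
        u = rng.gauss(0.0, sd_physician)    # effect shared by this physician's patients
        for _ in range(patients_per_physician):
            rows.append((j, x, u + rng.gauss(0.0, sd_patient)))
    return rows


def diff_and_se(values_0, values_1):
    """Difference in means (group 1 minus group 0) and its standard error."""
    diff = statistics.mean(values_1) - statistics.mean(values_0)
    se = sqrt(statistics.variance(values_0) / len(values_0)
              + statistics.variance(values_1) / len(values_1))
    return diff, se


def compare(rows):
    """Return ((diff, se) ignoring clustering, (diff, se) using physician means)."""
    # Naive analysis: every patient is treated as an independent observation.
    naive = diff_and_se([y for _, x, y in rows if x == 0],
                        [y for _, x, y in rows if x == 1])

    # Cluster-aware analysis: one observation (the mean) per physician.
    by_physician = {}
    for j, x, y in rows:
        by_physician.setdefault((j, x), []).append(y)
    means_0 = [statistics.mean(v) for (_, x), v in by_physician.items() if x == 0]
    means_1 = [statistics.mean(v) for (_, x), v in by_physician.items() if x == 1]
    clustered = diff_and_se(means_0, means_1)
    return naive, clustered


if __name__ == "__main__":
    (naive_diff, naive_se), (cl_diff, cl_se) = compare(simulate())
    print(f"naive:     diff={naive_diff:.3f}  se={naive_se:.3f}")
    print(f"clustered: diff={cl_diff:.3f}  se={cl_se:.3f}")
```

Both analyses give the same point estimate in this balanced setup, but the naive standard error is considerably smaller because it counts patients of the same physician as independent information. A standard error that is too small inflates test statistics, which is one way a traditional regression model can produce false inferences about physician-level characteristics.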
Résumé
In health research, data are often structured hierarchically. For example, data may comprise patients linked to physicians, who in turn are linked to a hospital or a geographic region. Developing regression models that neglect this hierarchical structure can lead to erroneous conclusions. Carrying out a statistical analysis that accounts for the hierarchy of the data requires specific methods.
In this article, we present the concept of hierarchically structured data and introduce the reader to hierarchical regression models. We then compare the results of a traditional regression model with those of a hierarchical model applied to a dataset linking test utilization at annual health examinations to the characteristics of the patients and physicians involved. The comparison of the two models shows that false conclusions can be drawn if the structure of the data is not taken into account.
Footnotes
Dr. Goel is supported in part by a National Health Scholar Award from Health Canada. Dr. van Walraven was an R. Samuel McLaughlin Foundation research fellow at ICES when part of this study was conducted and is currently an Arthur Bond Scholar of the Physicians Services Incorporated Foundation. The views expressed herein are solely those of the authors and do not represent the views of any of the sponsoring organizations.
