Abstract
OBJECTIVES: In many sources of health data in Canada, postal codes are the only available geographic identifier. To support geographic analyses, postal codes are routinely geocoded to census geography so that records can be linked to ecological (area-level) data. Despite the widespread use of this method, the extent of the resulting geographic misclassification is poorly understood. We estimated the misclassification errors that arise when postal codes are geocoded to census geography in Nova Scotia, Canada.
METHODS: We compared building counts and match rates between postal-code-geocoded locations and actual building locations in Nova Scotia at two levels of census geography: dissemination areas (DAs) and census subdivisions (CSDs). Actual locations were taken from provincial government data recording the latitude and longitude of buildings. We also assessed how misclassification varied by rurality, using Statistics Canada’s classification of rurality.
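The abstract does not give formulas for the two comparison measures. The sketch below shows one plausible way to compute them. It assumes each building has one actual area (from its coordinates) and one geocoded area (from its postal code). Under that assumption, the percentage count difference for an area is taken as |geocoded count − actual count| / actual count × 100. The match rate is the percentage of buildings actually located in the area that were also geocoded to it. These definitions and the function name `area_comparison` are illustrative assumptions, not necessarily the study's exact method.

```python
from collections import Counter


def area_comparison(pairs):
    """Compare geocoded vs. actual area assignments (illustrative definitions).

    pairs: iterable of (actual_area, geocoded_area), one tuple per building.
    Returns {area: (pct_count_difference, match_rate_pct)} for every area
    that contains at least one building by actual location. Areas with zero
    actual buildings are skipped, since both measures would divide by zero.
    """
    pairs = list(pairs)
    actual_counts = Counter(actual for actual, _ in pairs)
    geocoded_counts = Counter(geocoded for _, geocoded in pairs)
    # Buildings whose geocoded area agrees with their actual area.
    matched = Counter(actual for actual, geocoded in pairs if actual == geocoded)

    results = {}
    for area, n_actual in actual_counts.items():
        n_geocoded = geocoded_counts.get(area, 0)
        pct_diff = 100.0 * abs(n_geocoded - n_actual) / n_actual
        match_rate = 100.0 * matched[area] / n_actual
        results[area] = (pct_diff, match_rate)
    return results
```

Buildings wrongly geocoded out of an area can be offset by others wrongly geocoded into it. For example, `area_comparison([("A", "B"), ("B", "A")])["A"]` gives a 0% count difference but also a 0% match rate. This is one reason to report both measures rather than counts alone.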
RESULTS: Outside two urban areas (Halifax Metro and Sydney), where differences in counts were <10%, many DAs had count differences >30%. Match rates followed a similar pattern: the vast majority of non-urban DAs had match rates <40%. Even in major urban areas, 10% of DAs had large misclassification errors. At the CSD level, misclassification errors remained too large for counts or rates to be estimated without further aggregation of areas.
CONCLUSION: Routine postal-code geocoding should be replaced by geocoding that uses additional location identifiers, such as civic addresses or latitude and longitude. If data holders performed this geocoding in-house before releasing data to researchers, the accuracy of geographic analyses, and the capacity to conduct them, would improve while confidentiality is protected.
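Geocoding from latitude and longitude, as recommended above, amounts to a point-in-polygon test against census boundary polygons. The sketch below uses the even-odd (ray-casting) rule under several assumptions:

- coordinates are planar (x, y), such as projected easting/northing;
- boundaries are simple polygons given as vertex lists;
- the area IDs and the helper names `point_in_polygon` and `assign_area` are hypothetical.

In practice, a GIS library such as shapely (`Polygon.contains`) and official boundary files would more likely be used. Points lying exactly on a shared boundary would also need an explicit tie-breaking rule.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd (ray-casting) test: cast a ray to +x and count edge crossings.

    polygon: list of (x, y) vertices of a simple polygon, in order.
    Points exactly on an edge are not handled consistently.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_area(x, y, areas):
    """Return the first area ID whose polygon contains the point, else None.

    areas: dict mapping hypothetical area IDs to vertex lists.
    """
    for area_id, polygon in areas.items():
        if point_in_polygon(x, y, polygon):
            return area_id
    return None
```

For two adjacent hypothetical squares, `{"DA1": [(0, 0), (10, 0), (10, 10), (0, 10)], "DA2": [(10, 0), (20, 0), (20, 10), (10, 10)]}`, the point (5, 5) is assigned to `"DA1"`. A point outside both squares returns `None`.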
Key words: Geocoding, postal code, data linkage, small-area analysis, population health
Footnotes
Conflict of Interest: None to declare
