Injury Prevention. 2004 Jun;10(3):186–191. doi: 10.1136/ip.2003.004580

Practical introduction to record linkage for injury research

D Clark
PMCID: PMC1730090  PMID: 15178677

Abstract

The frequency of early fatality and the transient nature of emergency medical care mean that a single database will rarely suffice for population based injury research. Linking records from multiple data sources is therefore a promising method for injury surveillance or trauma system evaluation. The purpose of this article is to review the historical development of record linkage, provide a basic mathematical foundation, discuss some practical issues, and consider some ethical concerns.

Clerical or computer assisted deterministic record linkage methods may suffice for some applications, but probabilistic methods are particularly useful for larger studies. The probabilistic method attempts to simulate human reasoning by comparing each of several elements from the two records. The basic mathematical specifications are derived algebraically from fundamental concepts of probability, although the theory can be extended to include more advanced mathematics.
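The field-by-field comparison described above can be sketched in a few lines. This is a minimal illustration, not the article's own specification: it assumes the widely used Fellegi-Sunter style of agreement and disagreement weights (the abstract does not name a particular model), and the field names and the m and u probabilities below are hypothetical values chosen only for demonstration.

```python
import math

# Hypothetical probabilities for three illustrative fields (assumed values).
# m = P(field agrees | the two records refer to the same person)
# u = P(field agrees | the two records refer to different people)
FIELD_PROBS = {
    "birth_date": (0.95, 0.003),
    "sex": (0.98, 0.5),
    "postcode": (0.90, 0.01),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum log2 likelihood ratios over the compared fields.

    Agreement on a field adds log2(m / u); disagreement adds
    log2((1 - m) / (1 - u)), which is negative whenever m > u.
    Fields are treated as independent, a simplifying assumption.
    """
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


ambulance = {"birth_date": "1960-04-12", "sex": "F", "postcode": "04102"}
hospital = {"birth_date": "1960-04-12", "sex": "F", "postcode": "04103"}
print(round(match_weight(ambulance, hospital), 2))
```

A rare, reliable identifier such as birth date contributes a large positive weight when it agrees, while agreement on sex contributes little because unrelated records agree on it about half the time.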

Probabilistic, deterministic, and clerical techniques may be combined in different ways depending upon the goal of the record linkage project. If a population parameter is being estimated for a purely statistical study, a completely probabilistic approach may be most efficient; for other applications, where the purpose is to make inferences about specific individuals based upon their data contained in two or more files, the need for a high positive predictive value would favor a deterministic method or a probabilistic method with careful clerical review. Whatever techniques are used, researchers must realize that the combination of data sources entails additional ethical obligations beyond the use of each source alone.
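One way to picture the combination of probabilistic scoring with clerical review is a two-threshold rule, sketched below. The threshold values and function names are hypothetical, and the positive predictive value is computed here simply as the share of accepted links that a reviewer confirmed as true, which is an assumed working definition rather than the article's procedure.

```python
def classify(weight, lower=0.0, upper=10.0):
    """Assign a candidate pair to one of three groups by its match weight.

    The thresholds are illustrative; raising `upper` sends more pairs to
    clerical review and should raise the positive predictive value of the
    automatically accepted links.
    """
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "clerical review"


def positive_predictive_value(true_links, false_links):
    """Proportion of accepted links that are true matches."""
    accepted = true_links + false_links
    if accepted == 0:
        raise ValueError("no accepted links to evaluate")
    return true_links / accepted


for w in (14.2, 4.7, -6.3):
    print(w, classify(w))
```

For a purely statistical estimate a single threshold with no review band may be acceptable; when inferences are made about specific individuals, a wider review band trades clerical effort for fewer false links.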



Articles from Injury Prevention are provided here courtesy of BMJ Publishing Group
