Proceedings of the AMIA Symposium. 2002:757–761.

Identification of patient name references within medical documents using semantic selectional restrictions.

Ricky K Taira 1, Alex A T Bui 1, Hooshang Kangarloo 1
PMCID: PMC2244274  PMID: 12463926

Abstract

De-identification of a patient's personal data from medical records is a protective legal requirement imposed before medical documents can be used for research purposes or transferred to other healthcare providers (e.g., teachers, students, tele-consultations). This de-identification process is tedious if performed manually, and is known to be error-prone when performed with direct search-and-replace strategies [9]. In this paper, we report on the identification step of this process. The proposed algorithm is based on estimating the fitness of candidate patient name references to a set of semantic selectional restrictions. The semantic restrictions place tight contextual requirements upon candidate words in the report text and are determined automatically from a manually tagged corpus of training reports. Maximum entropy classifiers are used to provide a probabilistic measure of the belief that a given candidate token satisfies a given semantic restriction. We report on the design and preliminary evaluation of the system within the domain of pediatric urology.
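
The abstract does not list the features, the restriction set, or the training procedure, so the following is only a minimal sketch of the general idea: a binary maximum entropy classifier (equivalent to logistic regression) that scores a candidate token from its surrounding context rather than from the token itself. The feature names, the single "patient name slot" restriction, the toy sentences, and the hyperparameters are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def context_features(tokens, i):
    """Hypothetical contextual features for the candidate token at index i.

    The candidate word itself is deliberately left out, so the score
    depends only on the context the candidate appears in.
    """
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return {
        "bias": 1.0,
        "prev=" + prev_tok: 1.0,
        "next=" + next_tok: 1.0,
        "capitalized": 1.0 if tokens[i][:1].isupper() else 0.0,
    }


def probability(weights, feats):
    """Binary maximum entropy (logistic) probability that the candidate fits."""
    z = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=200, lr=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            error = label - probability(weights, feats)
            for name, value in feats.items():
                weights[name] += lr * error * value
    return weights


# Invented toy corpus: (sentence, index of candidate token, 1 if patient name).
TOY_DATA = [
    ("Mr. Smith is a boy", 1, 1),
    ("The patient Jones was seen today", 2, 1),
    ("Mrs. Lee presents with reflux", 1, 1),
    ("The Foley catheter was removed", 1, 0),
    ("A Doppler study was performed", 1, 0),
    ("This Wilms tumor was resected", 1, 0),
]


def score(weights, sentence, i):
    """Probability that token i of a whitespace-tokenized sentence is a name."""
    return probability(weights, context_features(sentence.split(), i))


weights = train(
    [(context_features(s.split(), i), label) for s, i, label in TOY_DATA]
)

print(score(weights, "Mr. Brown was seen today", 1))
print(score(weights, "A Foley catheter was placed", 1))
```

In this sketch the unseen surname "Brown" is scored purely from its context (a preceding title and a following verb), while "Foley" is scored from its context as a device eponym. A system along the lines the abstract describes would presumably combine one such probabilistic score per semantic restriction, with restrictions learned from a tagged corpus rather than from a handful of invented sentences.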

Selected References

These references are in PubMed. This may not be the complete list of references from this article.

  1. Bui Alex A. T., Dionisio John David N., Morioka Craig A., Sinha Usha, Taira Ricky K., Kangarloo Hooshang. DataServer: an infrastructure to support evidence-based radiology. Acad Radiol. 2002 Jun;9(6):670–678. doi: 10.1016/s1076-6332(03)80312-4.
  2. Metz C. E. Basic principles of ROC analysis. Semin Nucl Med. 1978 Oct;8(4):283–298. doi: 10.1016/s0001-2998(78)80014-2.
  3. Quantin C., Bouzelat H., Allaert F. A., Benhamiche A. M., Faivre J., Dusserre L. Automatic record hash coding and linkage for epidemiological follow-up data confidentiality. Methods Inf Med. 1998 Sep;37(3):271–277.

Articles from Proceedings of the AMIA Symposium are provided here courtesy of American Medical Informatics Association
