Abstract
Objectives. We determined whether statistical text mining (STM) can identify fall-related injuries in electronic health record (EHR) documents and the impact on STM models of training on documents from a single or multiple facilities.
Methods. We obtained fiscal year 2007 records for Veterans Health Administration (VHA) ambulatory care clinics in the southeastern United States and Puerto Rico, resulting in a total of 26 010 documents for 1652 veterans treated for fall-related injury and 1341 matched controls. We used the results of an STM model to predict fall-related injuries at the visit and patient levels and compared them with a reference standard based on chart review.
Results. STM models based on training data from a single facility resulted in accuracy of 87.5% and 88.1%, F-measure of 87.0% and 90.9%, sensitivity of 92.1% and 94.1%, and specificity of 83.6% and 77.8% at the visit and patient levels, respectively. Results from training data from multiple facilities were almost identical.
Conclusions. STM has the potential to improve identification of fall-related injuries in the VHA, providing a model for wider application in the evolving national EHR system.
Approximately one third of all adults older than 65 years fall each year.1 Fall injury is a leading cause of death and disability among older adults.2 Adults aged 65 years and older had more than 2.1 million emergency department (ED) visits from injurious falls in 2006, accounting for 1 in 10 of all ED visits nationally. Direct-care costs of fall injuries in the United States for people aged 65 years and older are estimated to be approximately $20 billion annually.3 A fall the previous year is the strongest clinical predictor of subsequent falls and should target patients for fall prevention programs.4 Although most estimates of treatment of fall-related injuries have come from hospital ED data, a recent national survey estimated that treatment of more than 50% of 76 million nonfatal acute injuries (most of which were fall injuries) occurred in ambulatory care settings outside of hospital EDs.5 With the evolution of the electronic health record (EHR), new opportunities to measure the impact of this important public health issue will be available. In this article, we describe results of using statistical text mining (STM) of clinical documents from an integrated EHR to improve the identification of fall-related injuries in ambulatory care.
The Veterans Health Administration’s (VHA’s) EHR supports both ambulatory and inpatient care nationally. It connects VHA facilities’ workstations and PCs through the Computerized Patient Record System, a graphical user interface that allows full management of the health record.6 Services or encounters with patients are documented in the EHR in 2 ways: as structured or coded data using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM)7 codes, and as written notes, which are entered directly by the provider and saved as separate files. The VHA’s EHR is the largest integrated EHR in the United States,8 and it contains large amounts of administrative data and approximately 2.5 billion text-based documents (e.g., progress notes, lab reports).9
Previously, we described patterns of fall-related ambulatory care encounters in the VHA administrative data from the EHR.10 That study was based on the analysis of fall-related E-codes (ICD-9-CM codes E880–E889), part of the ICD-9-CM coding system that permits the “supplementary classification of external causes of injury and poisoning.”11(p 81) For example, the primary diagnosis code for an encounter might be “fracture of the neck of the femur” (ICD-9-CM 820) with the fall-related E-code for “fall from a ladder” (ICD-9-CM E881.0). Although nearly half of the encounters occurred in the emergency or urgent care setting, fall-related injuries led to services across a spectrum of medical and surgical providers and departments.10 A single-institution study demonstrated that STM could be used to identify fall-related injuries in VHA ambulatory care documents when no E-code was present.12
We conducted a multi-institutional study that used STM to identify fall-related injuries at both the outpatient visit level (≥ 1 outpatient encounter in a given day) and patient level (across 1 year of data) and extended previous work to determine the best STM model for identifying fall-related injuries at the document level.13 We explored whether STM can be successfully applied to documents generated across a large health care system. The evidence that practice patterns vary across facilities in large systems is considerable, and if documentation practices also vary, training on a sample from 1 facility would have an impact on the results. Because STM analysis requires a large reference set of documents that have been reviewed and classified by expert human review, which can be very costly, we also investigated the effect on STM results of selecting documents from a single or multiple institutions.
METHODS
The analytic sample was developed in 4 steps. In step 1, we identified a cohort of patients coded for falls using ICD-9-CM codes and matched controls likely to have been treated for a fall, but with no fall codes. We identified patients who received outpatient services for injuries during fiscal year 2007 from 4 VHA facilities’ ambulatory care clinics and Community Based Outpatient Clinics in the southeastern United States and Puerto Rico. All encounters with diagnosis codes for an injury (ICD-9-CM codes 800–999) or a fall-related E-code (ICD-9-CM codes E880–E889) as the primary reason for the encounter were extracted from the Medical SAS Outpatient (Encounters) database.12 In step 2, we selected from the pool of patients treated for injuries as many as 2 matched controls who did not receive a fall-related E-code, matching on facility, gender, type of injury (ICD-9-CM code), and age within 10 years. We also identified patients who had a fall-related E-code as the primary diagnosis code rather than an injury code. We could not select matched controls for this group because no injury diagnosis was available for matching.
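The matching rule in step 2 can be illustrated with a short sketch. The `Patient` record, the `match_controls` function, and the greedy, without-replacement selection order are assumptions made for illustration; the text specifies only the matching criteria (facility, gender, injury code, age within 10 years) and the maximum of 2 controls per case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical record holding only the fields used for matching."""
    patient_id: str
    facility: str
    gender: str
    injury_code: str      # primary ICD-9-CM injury code, e.g. "820"
    age: int
    has_fall_ecode: bool


def match_controls(cases, pool, max_controls=2, age_window=10):
    """Greedily pick up to `max_controls` controls per case.

    Assumption: each control is used at most once and candidates are taken
    in pool order; the study does not describe the selection order.
    """
    used = set()
    matches = {}
    for case in cases:
        chosen = []
        for cand in pool:
            if len(chosen) == max_controls:
                break
            if cand.patient_id in used or cand.has_fall_ecode:
                continue
            if (cand.facility == case.facility
                    and cand.gender == case.gender
                    and cand.injury_code == case.injury_code
                    and abs(cand.age - case.age) <= age_window):
                chosen.append(cand.patient_id)
                used.add(cand.patient_id)
        matches[case.patient_id] = chosen
    return matches


# Made-up example data.
cases = [Patient("c1", "A", "M", "820", 74, True)]
pool = [
    Patient("p1", "A", "M", "820", 80, False),  # matches
    Patient("p2", "A", "M", "820", 90, False),  # age gap too large
    Patient("p3", "B", "M", "820", 75, False),  # different facility
    Patient("p4", "A", "M", "820", 66, False),  # matches
    Patient("p5", "A", "M", "820", 70, False),  # not needed, 2 already chosen
]
matches = match_controls(cases, pool)
print(matches)
```

A production matching step would likely randomize the candidate order within each matching stratum rather than rely on pool order.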
In step 3, we extracted for the cohort identified in step 2 all outpatient administrative encounters with an injury or fall-related E-code documented as the primary diagnosis code during the study year, with encounters on a given day grouped together to form ambulatory “visits.” In step 4, we identified EHR documents (progress notes, reports, etc.) associated with the visits. There is no one-to-one link between clinic encounters and documents in the VHA’s EHR because multiple documents may be generated by 1 or more clinicians during a single encounter. To ensure that we included all potentially relevant clinical documents, we extracted documents within 48 hours of the encounter or visit. VHA guidelines require providers to complete documentation immediately after the encounter; however, our experience has suggested that during busy clinics documentation might be completed the next day. We used the 48-hour window to maximize the capture of information about the fall, even if the document was created as much as 2 days later.
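Steps 3 and 4 amount to grouping encounters by patient and calendar day and then attaching documents by time window. The sketch below shows one way this could be done; the function names and example timestamps are hypothetical, and the two-sided window (48 hours before or after any encounter in the visit) is an assumption, because the text does not say whether documents created before the encounter were included.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_into_visits(encounters):
    """Group (patient_id, timestamp) encounters into per-day visits."""
    visits = defaultdict(list)
    for patient_id, ts in encounters:
        visits[(patient_id, ts.date())].append(ts)
    return dict(visits)


def documents_for_visit(encounter_times, documents, window=timedelta(hours=48)):
    """Return ids of documents created within `window` of any encounter.

    `documents` is a list of (doc_id, created_at) pairs for the same patient.
    Assumption: the window is applied symmetrically around each encounter.
    """
    return [
        doc_id
        for doc_id, created_at in documents
        if any(abs(created_at - t) <= window for t in encounter_times)
    ]


# Made-up example: two encounters on one day, one encounter a week later.
encounters = [
    ("p1", datetime(2007, 3, 1, 9, 0)),
    ("p1", datetime(2007, 3, 1, 14, 30)),
    ("p1", datetime(2007, 3, 9, 10, 0)),
]
documents = [
    ("d1", datetime(2007, 3, 1, 9, 20)),   # same day as the first visit
    ("d2", datetime(2007, 3, 2, 16, 0)),   # completed the next day
    ("d3", datetime(2007, 3, 5, 8, 0)),    # outside both windows
]

visits = group_into_visits(encounters)
docs_by_visit = {
    key: documents_for_visit(times, documents) for key, times in visits.items()
}
print(docs_by_visit)
```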
Chart Review (Annotation)
To create a reference standard for training STM models, we conducted chart review (annotation). Written guidelines defining text that documented a fall were developed and reviewed by clinical experts, and each document was labeled as fall or not fall. A fall was defined according to the World Health Organization as “inadvertently coming to rest on the ground, floor or other lower level, excluding intentional change in position to rest in furniture, wall or other objects.”14(p 1) Three clinicians experienced in chart review were trained to implement the annotation guidelines on an initial set of 50 randomly selected documents. The annotators were instructed to identify text that indicated that the patient had suffered a fall. On the basis of feedback during the initial training, the guidelines were revised, and 100 additional documents were randomly selected for further training. We compared the results of this process with annotations conducted by a clinical expert. We used these 150 documents as a training reference standard (based on the clinical expert’s annotations) to assess how well the chart reviewers performed according to the guidelines. Agreement on document classification between the annotators and the clinical expert was calculated using Cohen’s κ.15 When agreement levels of all annotators exceeded a κ of 0.80, formal annotation of the data set began.16 To ensure validity and reliability of the annotation process, we conducted spot checks at the rate of approximately 10 per 1000 notes annotated.
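For readers unfamiliar with the agreement statistic, Cohen’s κ compares observed agreement with the agreement expected by chance from each rater’s label frequencies. The sketch below is a generic implementation with made-up labels, not the study’s annotation data; the function name `cohens_kappa` is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[label] * freq_b[label] for label in set(freq_a) | set(freq_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up example with 100 documents: 45 both "fall", 47 both "not fall",
# 3 where only the annotator said "fall", 5 where only the expert did.
annotator = ["fall"] * 45 + ["not fall"] * 47 + ["fall"] * 3 + ["not fall"] * 5
expert = ["fall"] * 45 + ["not fall"] * 47 + ["not fall"] * 3 + ["fall"] * 5

kappa = cohens_kappa(annotator, expert)
print(round(kappa, 2))  # 0.84, which would exceed the 0.80 threshold
```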
Statistical Text Mining
We used STM to classify documents as containing patterns of text that indicated whether the patient had fallen or not. STM “aims to extract useful knowledge from textual data or documents.”17(p 6) STM involves learning patterns from existing data or information and is built on machine learning and artificial intelligence. Machine learning applied in the medical domain can be described as “the study of computer algorithms that improve automatically through the analysis of data.”17(p 7) Here we employed STM using supervised learning by labeling each document as fall or not fall.
In our previous document-level analysis, we iteratively applied all combinations of several weighting schemes, matrix reductions, and classification algorithms to identify the STM model that best classified fall documents13 on the basis of the area under the receiver operating characteristic curve. We used the best-performing model employing a linear support vector machine18,19 to train an STM model for this study. More specifically, we transformed documents into a term-by-document matrix by converting all text to lowercase, tokenizing, removing tokens with fewer than 3 characters or no alphabetical characters, normalizing terms using the National Library of Medicine lexical tool Norm,20 removing stop words, and removing terms that only occurred once in the matrix. We then weighted the matrix using 3 factors: a log transformation on term frequency, χ2 weighting of terms, and normalization of documents using cosine normalization.21 Features from the weighted matrix were then selected from among the top 50 weighted terms and generated using the top 200 singular value decomposition dimensions by means of latent semantic analysis.22 These features were then provided to the linear support vector machine to train a model. We performed all analyses using RapidMiner23 version 5.2 (RapidMiner USA, Cambridge, MA) with the Text Processing version 5.2.1 and Weka version 5.1.1 extensions, as well as some custom-built components.
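The text-to-matrix steps described above can be sketched in plain Python. This is an illustration only, not the RapidMiner workflow used in the study: the stop-word list is a tiny placeholder, "occurred once" is read as a total corpus count of 1, and the way the three weighting factors are combined (log term frequency multiplied by the term's χ2 statistic, followed by cosine normalization) is an assumption. Lexical normalization with Norm, top-term selection, the singular value decomposition, and the support vector machine are omitted.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "and", "for", "with", "was"}  # placeholder list


def tokenize(text):
    """Lowercase, split, and drop short, non-alphabetic, and stop-word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [
        t for t in tokens
        if len(t) >= 3 and re.search(r"[a-z]", t) and t not in STOP_WORDS
    ]


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 term-presence by class table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def weighted_vectors(docs, labels):
    """Return (vocabulary, list of sparse unit-length document vectors)."""
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(t for toks in tokenized for t in toks)
    vocab = sorted(t for t, count in totals.items() if count > 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    chi = {}
    for term in vocab:
        n11 = sum(1 for toks, y in zip(tokenized, labels) if y and term in toks)
        n10 = sum(1 for toks, y in zip(tokenized, labels) if not y and term in toks)
        chi[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        raw = {t: math.log(1 + tf[t]) * chi[t] for t in tf if t in chi}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()} if norm else {})
    return vocab, vectors


# Made-up snippets; label 1 = fall, 0 = not fall.
docs = [
    "Pt fell at home and hit left hip",
    "Pt fell down stairs, hip pain",
    "Routine follow-up for hip pain",
    "Medication refill, no pain",
]
labels = [1, 1, 0, 0]
vocab, vectors = weighted_vectors(docs, labels)
print(vocab)  # ['fell', 'hip', 'pain']
```

In this toy corpus "fell" appears only in fall documents, so it receives the largest χ2 weight and dominates the vectors of the first two documents.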
To evaluate the STM model on data that were not used to develop models, we divided data from each site into training (70%) and testing (30%) data sets, stratified by whether a patient had at least 1 document annotated for the presence of a fall or not. We split these data using 2 strategies to compare the effect of sampling on model performance. In the first case, we used all of the available training data from the facility with the largest number of patients to train a model that was then applied to the test data set from that facility and the other facilities. In the second case, we randomly drew the same number of documents from the training data of all 4 facilities combined to provide a second training set.
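A stratified 70/30 split of this kind might look like the sketch below, applied once per site. It assumes the split is made at the patient level so that all of a patient's documents fall on the same side; the function name, the seed, and the example data are hypothetical.

```python
import random


def stratified_patient_split(patient_has_fall, train_fraction=0.7, seed=0):
    """Split patient ids into train/test sets within each stratum.

    `patient_has_fall` maps patient id -> True if the patient has at least
    one document annotated as a fall. Assumption: whole patients, not
    individual documents, are assigned to a side.
    """
    rng = random.Random(seed)
    train, test = set(), set()
    for stratum in (True, False):
        ids = sorted(p for p, flag in patient_has_fall.items() if flag == stratum)
        rng.shuffle(ids)
        cut = round(len(ids) * train_fraction)
        train.update(ids[:cut])
        test.update(ids[cut:])
    return train, test


# Made-up site with 100 patients, 60 of whom have an annotated fall.
site_patients = {f"p{i}": i < 60 for i in range(100)}
train_ids, test_ids = stratified_patient_split(site_patients)
print(len(train_ids), len(test_ids))  # 70 30
```

Stratifying keeps the proportion of fallers the same in both sets, so test metrics are not distorted by an unlucky draw.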
RESULTS
For this study, we identified a cohort of 2241 patients from among whom the analytic sample was chosen. This cohort contained 631 (28.2%) patients coded with both a primary injury code and a fall-related E-code, 995 (44.4%) matched controls and 615 (27.4%) patients who had a fall-related E-code (as the primary diagnosis code) but no primary injury code. After applying the sampling strategy to the cohort, 4597 outpatient visits and 26 010 documents were identified within the 48-hour window. Document titles in the VHA EHR reflect either a place of service or the type of clinician entering the note, for example, emergency department notes, nursing triage notes, or orthopedic surgery consult notes. The sample included 614 unique document titles. We grouped the most common titles into clinically relevant groups and identified 52 titles written by nurses (n = 6194; 24% of documents), 17 titles from emergency or urgent care (n = 2668; 10% of documents), 70 titles from surgery (n = 2112; 8% of documents), and 22 titles from primary care (n = 1991; 8% of documents). The cohort was 91.8% men aged 21 to 98 years (median = 74.0). κ values for all spot checks (n = 308) of annotator agreement with a clinical expert exceeded the minimum level established for the study; the κ for all spot check data combined was 0.90.
A total of 18 654 documents from 1568 patients were available for training, and 7356 documents from 673 patients were available for testing. In the first sampling strategy, we used training data for 6496 documents from 589 patients from a single facility to build an STM model that was subsequently applied to the test data set from that facility and the other facilities. In the second sampling strategy, we randomly drew an equal number of documents (6496) from all 1568 training patients from all 4 facilities.
When applied to the test data, models built on data from a single facility resulted in accuracy of 87.5%, F-measure of 87.0%, sensitivity of 92.1%, specificity of 83.6%, positive predictive value of 82.5%, and negative predictive value of 92.6% at the visit level. Models based on data from multiple institutions resulted in almost identical results (Table 1). Patient-level test results for single-facility models resulted in accuracy of 88.1%, F-measure of 90.9%, sensitivity of 94.1%, specificity of 77.8%, positive predictive value of 87.9%, and negative predictive value of 88.5%. Again, almost identical results were seen when the training data were taken from multiple facilities.
TABLE 1—
Level | Single Facility Training, % | Multiple Facility Training, % |
Visit level | | |
Accuracy | 87.5 | 87.1 |
F-measure^a | 87.0 | 86.6 |
Sensitivity | 92.1 | 91.8 |
Specificity | 83.6 | 83.1 |
PPV | 82.5 | 82.0 |
NPV | 92.6 | 92.3 |
Patient level | | |
Accuracy | 88.1 | 88.4 |
F-measure^a | 90.9 | 91.3 |
Sensitivity | 94.1 | 95.8 |
Specificity | 77.8 | 75.8 |
PPV | 87.9 | 87.2 |
NPV | 88.5 | 91.3 |
Note. NPV = negative predictive value; PPV = positive predictive value; STM = statistical text mining. For the visit-level analysis, the sample size was n = 1383. For the patient-level analysis, the sample size was n = 673.
^a Harmonic mean of sensitivity and positive predictive value.
The Venn diagrams in Figure 1 describe the relationship between patients identified as having a fall-related injury by chart review (top circle; areas A–D), STM (left circle; areas B, D, E, and G), and E-codes (right circle; areas C, D, F, and G) in the test data set on the basis of the visit- and patient-level models built from multiple facilities. Of the 425 patients identified as having a fall-related injury on the basis of chart review, 305 (71.8%) were identified by both STM and E-codes (area D). A total of 102 (24.0%) were identified by STM but not E-codes (area B), and 10 (2.4%) were identified by E-codes but not STM (area C). Eight patients (1.9%) were not identified by either STM or E-codes as having a fall-related injury (area A). A total of 107 patients were identified as having a fall-related injury without any evidence having been documented in the text (areas E–G): 31 (29.0%) by STM only, 47 (43.9%) by E-codes only, and 29 (27.1%) by both STM and E-codes. Combining STM and E-codes resulted in identification of 98.1% of fallers in the sample.
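The patient-level counts above are enough to rebuild the confusion matrices for each method. The sketch below does so under one assumption: true negatives are obtained by subtracting from the patient-level test sample size (n = 673) given in the Table 1 note. With that assumption, the STM figures reproduce the multiple-facility patient-level column of Table 1. The function names are illustrative.

```python
# Venn-diagram areas reported in the text (patient level, test data set).
AREAS = {"A": 8, "B": 102, "C": 10, "D": 305, "E": 31, "F": 47, "G": 29}
N_PATIENTS = 673  # patient-level test sample size from the Table 1 note


def confusion_counts(hit_areas, areas=AREAS, n=N_PATIENTS):
    """Return (tp, fp, fn, tn) for a method that flags the given areas.

    Areas A-D are fallers per chart review; E-G were flagged by a method
    without evidence of a fall in the text.
    """
    fallers = sum(areas[k] for k in "ABCD")
    tp = sum(areas[k] for k in hit_areas if k in "ABCD")
    fp = sum(areas[k] for k in hit_areas if k in "EFG")
    fn = fallers - tp
    tn = n - fallers - fp
    return tp, fp, fn, tn


def summarize(tp, fp, fn, tn):
    """Percentages rounded to 1 decimal, as in Table 1."""
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    values = {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f_measure": 2 * sensitivity * ppv / (sensitivity + ppv),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
    }
    return {name: round(100 * v, 1) for name, v in values.items()}


stm = summarize(*confusion_counts("BDEG"))         # STM circle
ecodes = summarize(*confusion_counts("CDFG"))      # E-code circle
combined = summarize(*confusion_counts("BCDEFG"))  # flagged by either method

print(stm)
print(ecodes["sensitivity"], combined["sensitivity"])  # 74.1 98.1
```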
We conducted an error analysis on the results of the STM model. We selected all of the incorrectly classified documents for as many as 10 patients from each site from areas A, C, E, and G. We identified 10 categories of errors, with the most common errors shown first in Table 2. Except for when the reason for an error could not be determined (unsure category), many errors resulted from the presence of the word fall in contexts other than a patient having had a recent fall (fall risk, template, fall-related information, history, etc.). Less common errors resulted from the presence of negation, misspellings, and incorrect word usage. Among the group of errors we classified as “other” was the use of the word fall in reference to other people or inanimate objects.
TABLE 2—
Category of Misclassification | Example |
Unsure | “Pt fell down . . . sustaining head trauma” |
Fall risk | “Pt taken off Warfarin due to fall risk” |
Template | “5 or greater indicates a high risk for falls” |
Annotator judgment | “Swelling after twisting right ankle last night going down stairs” |
Fall-related information | “Pt goals: be able to performed [sic] safely without falling down” |
History | “3 falls in the last 3 years” |
Incorrect annotation | “Hx of LBP since fall off of fire truck in 1986” |
Negation | “Very painful, didn’t fall. Heard a pop” |
Semantic | “Felll [sic] hit head and left hip and leg” |
Other | “Wife residing with daughter secondary to fall and fractured skull” |
DISCUSSION
The goal of this study was to demonstrate the potential of STM to improve identification of fall-related injuries in the EHR of a large integrated health system. We purposely selected a large sample of patients, visits, and documents from multiple institutions to determine how robust STM might be in practice in an EHR that includes documents written by a variety of clinicians in emergency and other ambulatory care settings. Our results suggest that STM has the potential to improve the identification of fall-related injuries at both the visit and the patient level. As part of an integrated EHR, STM results could be used to supplement traditional analyses based on administrative data to provide alerts to clinicians concerning previous falls or to better estimate fall injury incidence and prevalence.
In VHA ambulatory care, E-codes are recorded by the clinician providing care, not by professional coders. After each encounter, a clinician enters ICD-9-CM diagnosis and procedure codes, including E-codes if appropriate, and also writes a progress note. Although E-codes assigned by professional coders in hospitals have been shown to be very accurate,24 we could find no published study addressing the completeness of E-codes in ambulatory care. Our results showed that supplemental fall-related E-codes were assigned to 74.1% of the patients who were classified as receiving services related to a fall by chart review. We should note that of the 1246 patients who received fall-related E-codes, only 50.6% (n = 631) also received a primary ICD-9-CM injury code as recommended in coding practices. This result likely reflects the fact that many clinicians are aware of the need to document falls but are less clear on the proper coding practices because coding is not part of their daily routine. Although efforts to improve coding may be warranted, the reality is that given the workload of clinicians in the VHA, the coding will likely never reach the level of accuracy of professional coders in the hospital setting. In our large sample of data from the VHA’s integrated EHR, it appears that most clinicians document a fall as being the reason for a visit as a matter of course. Statements such as “Mr. XXXX is a 75-year-old male presenting with a contusion to his right lower leg due to a fall at home” are common, and STM represents a useful method to extract information from this type of documentation.
A human-labeled reference data set of documents is needed to conduct the supervised STM used here. Logistically, it is often easiest to obtain labeled data from 1 facility. However, patterns of documentation may be unique to facilities. In this study, we found that an STM model developed from data at 1 facility performed as well as a model developed from data at all facilities. When interpreting these findings, note that the sample sizes of training sets for both the single-institution and the multi-institution training samples were relatively large. In a recent study, we found that sample sizes of 1500 and more resulted in similar performance results.25
A recent comprehensive review of the literature suggested that computerized approaches “offer the potential to further develop and standardize the analysis of narrative text for injury surveillance.”26(p 354) In this study, E-codes correctly identified 74.1% (n = 315) of the 425 fallers identified through chart review, and STM identified 95.8% (n = 407). This increase of roughly 22 percentage points in identified cases likely underrepresents the total number of unreported fallers in the VHA facilities because our sampling strategy matched approximately 1.6 controls per patient coded with both an injury and a fall-related ICD-9-CM code. Together, the 2 methods identified 98.1% of the fallers in the sample.
We included documents written by a wide variety of clinicians in this study, which suggests that STM is robust for identifying fall-related injuries. Some of the success of our STM models is likely the result of document selection. We chose documents written within 48 hours of outpatient encounters known or suspected to be related to fall injuries, which likely reduced the opportunity for errors because the encounters during that time period typically related to treating an injury. Even so, our error analysis suggests that incorporating preprocessing steps to identify negated terms such as “the patient denies falling” or to extract templates that reference fall risk (e.g., “Have you had a fall in the last month? Y/N”) would likely improve the performance of STM models. The decision of whether to use STM alone or in combination with other text extraction techniques would depend on the cost of additional analytic steps versus the goals of the analysis. That being said, the results reported here support the conclusion that the targeted use of STM in combination with ICD-9-CM coding can be very effective.
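As a rough illustration of the preprocessing suggested by the error analysis, the sketch below drops template-like lines and masks simple negated fall mentions before tokenization. The regular expressions, the `NEGATED_FALL` placeholder token, and the function names are hypothetical and deliberately simplistic; they were not part of the study, and a real system would more likely use an established clinical negation algorithm.

```python
import re

# Negation cue followed, within two intervening words, by a fall term.
NEGATED_FALL = re.compile(
    r"\b(?:denies|denied|did not|didn['’]t|no|not|without)\s+"
    r"(?:\w+\s+){0,2}?(?:fall(?:s|ing|en)?|fell)\b",
    re.IGNORECASE,
)

# Lines that look like screening templates rather than narrative text.
TEMPLATE_LINE = re.compile(r"\bY\s*/\s*N\b|risk for falls", re.IGNORECASE)


def strip_template_lines(text):
    """Remove whole lines that match a template pattern."""
    kept = [line for line in text.splitlines() if not TEMPLATE_LINE.search(line)]
    return "\n".join(kept)


def mask_negated_falls(text):
    """Replace negated fall mentions with a single placeholder token."""
    return NEGATED_FALL.sub("NEGATED_FALL", text)


note = (
    "Very painful, didn’t fall. Heard a pop\n"
    "Have you had a fall in the last month? Y/N\n"
    "5 or greater indicates a high risk for falls"
)
cleaned = mask_negated_falls(strip_template_lines(note))
print(cleaned)  # Very painful, NEGATED_FALL. Heard a pop
```

Affirmative statements such as "Pt fell down stairs" pass through unchanged, so the classifier still sees the evidence it needs.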
A major advantage of STM is that all that is needed to train the models is a reference standard of documents labeled fall or not fall. Although we used costly traditional chart review to create the reference standard, automated systems based on feedback from clinicians could develop and update a reference standard. During clinical practice, a random sample of records could be presented to clinicians for classification. After an initial training period, this sample could be very small, perhaps just several records a week, but would provide a continuously updated reference standard data set. STM models could regularly be retrained on the basis of the updated reference set to extend the shelf life of applications.
Currently, surveillance of fall-related injuries is based on E-codes, primarily assigned in hospital EDs, although much care for fall-related injuries is provided outside of the ED. One reason for this is the lack of a national integrated ambulatory care EHR. However, the Health Information Technology for Economic and Clinical Health Act,27 which represents a major effort to expand and integrate EHRs nationally (including the ambulatory care setting), has the potential to create a platform for automated surveillance systems. A recent article described the Electronic Medical Record Support for Public Health program, which extracts raw data from EHRs, analyzes them for conditions such as notifiable diseases and diabetes, and automatically transmits them to state health departments.30 An STM-based system, such as the one described in this article, could provide an additional avenue for automated identification of such conditions. These data could be combined with traditional structured data through programs such as the Observational Medical Outcomes Partnership’s common data model.28,29 The common data model is used to standardize the format and content of the observational data, allowing for the application of standardized surveillance applications, tools, and methods. The integrated VHA electronic health record system provides a laboratory for development of surveillance systems that leverage both structured and unstructured data and that could eventually be deployed in an evolving national EHR.
This study was the first attempt to our knowledge to apply STM to identify fall-related injuries in text-based ambulatory care documents generated from a large integrated EHR. The tremendous resource of text-based data in the VHA’s electronic health record represents an opportunity to explore new ways, such as STM, to better monitor this important clinical issue. The results of this study suggest that using STM on ambulatory care documents has the potential to improve surveillance of fall-related injuries in the VHA and provides a model for wider application in regional or national integrated EHR systems. At the same time, STM algorithms embedded into the EHR system could flag patients with previous falls for fall prevention interventions in real time, thus improving care and reducing burden on providers.
Acknowledgments
Funding for this work was provided by the Veterans Health Administration (Health Services Research and Development grants IIR 05-120-3 and SDR HIR 09-002).
We thank Blesila R. Vasquez, MD, for her assistance with the completion of this study.
Note. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the Department of Veterans Affairs or the US government.
Human Participant Protection
This study was approved by the institutional review boards at the University of South Florida, the University of Florida, Bay Pines VA Healthcare System, and the VA Caribbean Healthcare System.
References
- 1. Hausdorff JM, Rios DA, Edelberg HK. Gait variability and fall risk in community-living older adults: a 1-year prospective study. Arch Phys Med Rehabil. 2001;82(8):1050–1056. doi: 10.1053/apmr.2001.24893.
- 2. Alamgir H, Muazzam S, Nasrullah M. Unintentional falls mortality among elderly in the United States: time for action. Injury. 2012;43(12):2065–2071. doi: 10.1016/j.injury.2011.12.001.
- 3. Stevens JA, Corso PS, Finkelstein EA, Miller TR. The costs of fatal and non-fatal falls among older adults. Inj Prev. 2006;12(5):290–295. doi: 10.1136/ip.2005.011015.
- 4. Ganz DA, Bao Y, Shekelle PG, Rubenstein LZ. Will my patient fall? JAMA. 2007;297(1):77–86. doi: 10.1001/jama.297.1.77.
- 5. Betz ME, Li G. Epidemiologic patterns of injuries treated in ambulatory care settings. Ann Emerg Med. 2005;46(6):544–551. doi: 10.1016/j.annemergmed.2005.07.009.
- 6. Computerized Patient Record System. Available to VHA employees and researchers at: http://vaww.virec.research.va.gov/VistA/Overview.htm. Accessed August 1, 2014.
- 7. International Classification of Diseases, Ninth Revision, Clinical Modification. Hyattsville, MD: National Center for Health Statistics; 1980. DHHS publication PHS 80-1260.
- 8. Jha AK, DesRoches CM, Campbell EG et al. Use of electronic health records in US hospitals. N Engl J Med. 2009;360(16):1628–1638. doi: 10.1056/NEJMsa0900592.
- 9. VA Information Resource Center. VIReC Frequencies: Record Counts, Null Counts and Discrete Value Frequencies: Corporate Data Warehouse TIU Schema. Hines, IL: US Department of Veterans Affairs, Health Services Research and Development Service, VA Information Resource Center; 2013. Available to VHA employees and researchers at: http://vaww.virec.research.va.gov/MedSAS/Outpatient.htm. Accessed August 1, 2014.
- 10. Luther SL, French DD, Powell-Cope G, Rubenstein LZ, Campbell R. Using administrative data to track fall-related ambulatory care services in the Veterans Administration Healthcare system. Aging Clin Exp Res. 2005;17(5):412–418. doi: 10.1007/BF03324631.
- 11. Centers for Medicare and Medicaid Services, National Center for Health Statistics. ICD-9-CM official guidelines for coding and reporting. Available at: http://www.cdc.gov/nchs/data/icd9/icd9cm_guidelines_2011.pdf. Accessed April 30, 2012.
- 12. Tremblay M, Berndt D, Luther S, Foulis P, French D. Identifying fall-related injuries: text mining the electronic medical record. Inf Tech Manag. 2009;10(4):253–265.
- 13. McCart JA, Berndt DJ, Jarman J, Finch DK, Luther SL. Finding falls in ambulatory care clinical documents using statistical text mining. J Am Med Inform Assoc. 2013;20(5):906–914. doi: 10.1136/amiajnl-2012-001334.
- 14. World Health Organization. WHO global report on falls prevention in older age. Available at: http://www.who.int/ageing/publications/Falls_prevention7March.pdf. Accessed March 24, 2014.
- 15. Di Eugenio B, Glass M. The kappa statistic: a second look. Computational Linguist. 2004;30(1):95–101.
- 16. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–174.
- 17. Chen H, Fuller S, Friedman C, Hersh W. Knowledge management, data mining, and text mining in medical informatics. In: Chen H, Fuller SS, Friedman C, Hersh W, editors. Medical Informatics: Knowledge Management and Data Mining in Biomedicine. New York, NY: Springer; 2005. pp. 3–33.
- 18. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–297.
- 19. Fan RE, Chang KW, Hsieh CJ et al. LIBLINEAR: a library for large linear classification. J Mach Learn Res. 2008;9(2008):1871–1874.
- 20. McCray AT, Srinivasan S, Browne AC. Lexical methods for managing variation in biomedical terminologies. Proc Annu Symp Comput Appl Med Care. 1994:235–239.
- 21. Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Inf Process Manag. 1988;24(5):513–523.
- 22. Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R. Indexing by latent semantic analysis. J Am Soc Inf Sci. 1990;41(6):391–407.
- 23. Mierswa I, Wurst M, Klinkenberg R. YALE: rapid prototyping for complex data mining tasks. In: Proceedings of the 12th ACM KDD International Conference on Knowledge Discovery and Data Mining. Philadelphia, PA: ACM; 2006. pp. 935–940.
- 24. Coben JH, Steiner CA, Barrett M, Merrill CT, Adamson D. Completeness of cause of injury coding in healthcare administrative databases in the United States, 2001. Inj Prev. 2006;12(3):199–201. doi: 10.1136/ip.2005.010512.
- 25. Berndt DJ, McCart JA, Finch DK, Luther SL. A case study of data quality in text mining clinical progress notes. ACM Trans Manag Inf Syst. 2015;6(1) Article 1.
- 26. McKenzie K, Scott DA, Campbell MA, McClure RJ. The use of narrative text for injury surveillance research: a systematic review. Accid Anal Prev. 2010;42(2):354–363. doi: 10.1016/j.aap.2009.09.020.
- 27. Blumenthal D. Launching HITECH. N Engl J Med. 2010;362(5):382–385. doi: 10.1056/NEJMp0912825.
- 28. Stang PE, Ryan PB, Racoosin JA et al. Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Ann Intern Med. 2010;153(9):600–606. doi: 10.7326/0003-4819-153-9-201011020-00010.
- 29. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. J Am Med Inform Assoc. 2012;19(1):54–60. doi: 10.1136/amiajnl-2011-000376.
- 30. Klompas M, McVetta J, Lazarus R et al. Integrating clinical practice and public health surveillance using electronic medical record systems. Am J Prev Med. 2012;42(6 suppl 2):S154–S162. doi: 10.1016/j.amepre.2012.04.005.