Abstract
Peripheral arterial disease (PAD) is a chronic disease that affects millions of people worldwide and yet remains underdiagnosed and undertreated. Early detection is important, because PAD is strongly associated with an increased risk of mortality and morbidity. In this study, we built a PAD surveillance system using natural language processing (NLP) for early detection of PAD from narrative clinical notes. Our NLP algorithm had excellent positive predictive value (0.93) and identified 41% of PAD cases before the initial ankle-brachial index (ABI) test date, while in 12% of cases the NLP algorithm detected PAD on the same date as the ABI (the gold standard for comparison). Hence, our system ascertains PAD patients in a timely and accurate manner. In conclusion, our PAD surveillance NLP algorithm has the potential for translation to clinical practice for use in reminding clinicians to order ABI tests in patients with suspected PAD and to reinforce the implementation of guideline-recommended risk modification strategies in patients diagnosed with PAD.
Introduction
Peripheral arterial disease (PAD) is a common disease that affects 8.5 million adults in the United States. 1 PAD patients are at high risk for adverse outcomes including death, myocardial infarction, stroke and limb amputation. 2,3 Adverse vascular events often lead to poor quality of life and may also contribute to the high rate of depression in these patients. 4,5 However, PAD is often underdiagnosed and undertreated. 2,6,7 Lack of physician and public awareness of PAD-associated risks for adverse outcomes likely contributes to this public health problem. 6 The total annual cost associated with vascular hospitalization of PAD patients in the United States in 2004 was estimated to be in excess of 21 billion dollars and this number will increase as the population ages. 8 Timely detection and prompt implementation of guideline-recommended therapies for risk modification in PAD may lead to reduction of the risk for adverse outcomes as well as reduced costs.
Automated surveillance of clinical notes from electronic health records (EHRs) may promptly identify PAD cases. The main objective of disease surveillance is detection of individuals with disease. 9 Manual methods for disease surveillance are costly, time-consuming and inconsistent. 10 With a computerized approach for disease surveillance, cases are detected by applying case definitions or algorithmic approaches to clinical data. 9 EHRs have the potential to enhance surveillance efforts as they contain a rich variety of information that facilitates timely and efficient surveillance. Accordingly, we built a PAD surveillance system using natural language processing (NLP) for early detection of PAD symptoms from narrative clinical notes.
Background
PAD is confirmed by measuring the ankle-brachial index (ABI) but this method remains underutilized, 11 and PAD remains an under-diagnosed condition in the primary care setting. 12,13 Delayed diagnosis of PAD contributes to high rates of morbidity, limb amputation and death. 14
ABI is the ratio of blood pressure (BP) at the ankle to BP in the arm and is the gold standard for PAD diagnosis. 2 However, results of ABI testing may not be available at the point of care and consequently clinicians need to manually review clinical notes to seek information to support the diagnosis. Manual review of the medical record is labor intensive, time consuming and often impractical for busy clinicians evaluating patients with multiple complex health conditions.
Previously, NLP systems were successfully applied to clinical notes for case identification 15 such as for bipolar disorder, 16 binge eating disorder, 17 diabetes and celiac disease. 18 We previously used NLP to identify PAD cases from radiology notes 19 and also from narrative clinical notes. 20 However, NLP systems have been underused for disease surveillance. During the surveillance process, relevant data and information are collected and analyzed to generate knowledge that may be promptly distributed to healthcare providers so that they can implement risk modification strategies, which may lower risks for adverse outcomes. 21 Prior surveillance studies focused on infectious diseases, birth defects, mental health issues, drug abuse and environmental exposures. 22 Most previous efforts to harness EHRs for population health surveillance have used International Classification of Diseases, Ninth Revision (ICD-9) codes to extract structured data elements. For example, in Italy, health authorities used ICD-9 codes, death certificates and pathology reports to monitor the incidence of birth defects, a main reason for infant mortality in that region. 23 Another study used an algorithm based on structured and unstructured data (clinical notes using NLP) for potential surveillance of post-operative surgical complications. 24 To the best of our knowledge, no prior study developed an NLP algorithm for PAD surveillance from narrative clinical notes. The goal of the present study was to build a PAD surveillance system using NLP for early detection of PAD from narrative clinical notes.
Methods
Study Setting and Population
This study took place at Mayo Clinic, Rochester, Minnesota and used the resources of the Rochester Epidemiology Project (REP) 25 to compile a community-based PAD case-control cohort from Olmsted County. The institutional review boards of participating medical centers approved this study.
Gold Standard
All patients underwent ABI testing at the Mayo noninvasive vascular laboratory. 1 The ABI reports were in PDF format and were not part of clinical notes. Controls were patients with normal ABI. PAD cases were patients with abnormal ABI, defined as ABI ≤ 0.9 at rest or 1 minute after exercise, or by the presence of poorly compressible arteries (ABI ≥ 1.40 or ankle systolic blood pressure > 255 mmHg). 1 In addition to PAD status, the date of ABI testing was also recorded as an index date for all patients.
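The gold standard definition can be written as a small decision function. The following is only an illustrative sketch of the thresholds stated in this section (ABI at or below 0.9 at rest or after exercise, or poorly compressible arteries with ABI at or above 1.40 or ankle systolic pressure above 255 mmHg); the function and parameter names are hypothetical and are not taken from the study's software.

```python
from typing import Optional


def ankle_brachial_index(ankle_systolic_mmhg: float, arm_systolic_mmhg: float) -> float:
    """ABI is the ratio of ankle blood pressure to arm blood pressure."""
    return ankle_systolic_mmhg / arm_systolic_mmhg


def abi_status(resting_abi: float,
               post_exercise_abi: Optional[float] = None,
               ankle_systolic_mmhg: Optional[float] = None) -> str:
    """Return 'PAD' or 'control' using the thresholds quoted in the text."""
    # Abnormal ABI: <= 0.9 at rest or 1 minute after exercise.
    low_abi = resting_abi <= 0.9 or (
        post_exercise_abi is not None and post_exercise_abi <= 0.9
    )
    # Poorly compressible arteries: ABI >= 1.40 or ankle systolic BP > 255 mmHg.
    poorly_compressible = resting_abi >= 1.40 or (
        ankle_systolic_mmhg is not None and ankle_systolic_mmhg > 255
    )
    return "PAD" if (low_abi or poorly_compressible) else "control"
```

For example, a patient with a normal resting ABI of 1.1 whose ABI falls to 0.8 after exercise would be labelled a PAD case under this sketch.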
Study Design
The automated NLP algorithm was validated by comprehensive manual medical record review. Figure 1 shows the overall design of the study. All retrieved clinical notes for each patient were used to ascertain patient PAD status as an output.
PAD Status by NLP Algorithm
We identified a list of PAD-related terms to build the NLP algorithm prototype. For this purpose, an expert clinician manually reviewed clinical notes of 20 patients with PAD and 20 patients without PAD. The clinician highlighted the words/phrases in each clinical note that were used to determine PAD status. Examples of sentences abstracted from clinical notes that were used to identify PAD-related terms are shown in Table 1.
Table 1:
Words highlighted in grey are examples of the best keywords for confirmation of PAD. Examples of the best keywords for exclusion are underlined. These keywords were used to create a list of appropriate terms of PAD status (Table 3). These notes were excluded from subsequent analysis.
Table 3:
Confirmation Key Words: Disease Location | Confirmation Key Words: Diagnosis | Exclusion Key Words |
---|---|---|
leg/legs; lower limbs/limb; lower extremities/extremity; iliac/femoral/tibial/popliteal artery/arteries; distal/infrarenal/abdominal aorta/aorto-(bi)iliac/aorto(bi)iliac/aorto(bi)-iliac; aorto-(bi)femoral; foot, toe, toes, shin; plantar, heel, ankle, interdigital; below/above knee, claudication/calf pain; | ischemic ulcer/ulcers; ASO/Arteriosclerosis obliterans/arterial sclerosis obliterans/atherosclerotic disease; PAD/peripheral arterial disease/peripheral vascular disease/peripheral arterial occlusive disease; arterial occlusive disease/occlusion/occluded; stenosis; NCV/non compressible vessels; NCA/non compressible arteries; PCV/poorly compressible vessels; stiff vessels/arteries ischemia; positive ABI/ankle brachial index/vascular labs/extremities study/arterial studies; revascularization/recanalization/bypass/angioplasty/PTA/stenting/stent/graft/endarterectomy/endarterectomies; thrombectomy/thromboembolectomy/thrombosis/embolectomy/embolectomies. | family history of, upper extremities/upper extremity; arm/arms, hand(s); brachial artery, axillary artery, radial artery, ulnar artery; carotid, innominate artery, subclavian artery; mesenteric artery; celiac artery; AAA/abdominal aortic aneurysm/abd aortic aneurysm; renal arteries/artery; coronaries, coronary arteries/artery/cerebrovascular-disease/arteries/artery; pseudoclaudication/pseudoclaudicatory pain. Amputation; traumatic/trauma; sarcoma/osteoma; diabetic foot, hammer toe/toes; vascular calcification; varicose veins; lower extremity/extremities edema/cellulitis/venous system; carotid artery disease/spinal ischemia. |
The list of keywords was further refined by manual review of charts conducted by a board-certified cardiologist during the interactive validation of this algorithm. A detailed description of this approach has been previously reported. 20 In our prior study, we split the cohort into training and testing datasets. The training dataset was used to interactively refine the PAD NLP algorithm with refinement of PAD-related keywords and rules. During this interactive refinement, we identified note types, note sections and service groups that were relevant for ascertainment of PAD. 20 Table 2 contains a list of included note types, note sections and service groups used in the present study. For each patient, we retrieved from the EHR all clinical notes created up to the ABI test date plus 21 days (the time interval allowed for a subsequent clinic visit to review test results).
Table 2:
Note Types | Note Sections | Service Groups |
---|---|---|
Consult | Impression / Report / Plan | Primary Care |
Subsequent Visit | Diagnosis | Hospital Internal Medicine |
Patient Progress | Principal/primary Diagnosis | General Medicine |
Supervisory | Secondary Diagnoses | Family Medicine |
Limited Exam | Past Medical/Surgical History | Critical Care |
Specialty Evaluation | Ongoing Care | Urgent Care |
Multisystem Evaluation | Immunizations | Cardiology |
Injection | Key Findings / Test Results | Vascular |
Educational Visit | Pre-Procedure Information | Pulmonary |
Hospital Service Transfer | Post-Procedure Information | Oncology |
| | Vital Signs | Nephrology |
| | Current Medications | Neurology |
| | Revision History | Pathology |
| | Special Instructions | Gastroenterology |
| | Advance Directives | Vascular Wound Care |
| | Discharge Activity | Vascular Surgery |
| | Final Pathology Diagnosis | Cardiac Surgery |
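The note selection step described above (notes created no later than the ABI test date plus 21 days, restricted to the note types, sections and service groups in Table 2) might be sketched as follows. The `Note` record layout, the function name, and the requirement that all three attributes match are assumptions made for illustration; the sets below are abbreviated subsets of Table 2, not the full lists.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

REVIEW_WINDOW_DAYS = 21  # interval for a follow-up visit to review ABI results

# Abbreviated subsets of Table 2, for illustration only.
NOTE_TYPES = {"Consult", "Subsequent Visit", "Limited Exam"}
NOTE_SECTIONS = {"Diagnosis", "Impression / Report / Plan", "Past Medical/Surgical History"}
SERVICE_GROUPS = {"Primary Care", "Cardiology", "Vascular"}


@dataclass
class Note:
    """Hypothetical minimal representation of one clinical note section."""
    note_type: str
    section: str
    service_group: str
    created: date


def eligible_notes(notes: List[Note], abi_date: date) -> List[Note]:
    """Keep notes inside the retrieval window that match the Table 2 filters."""
    cutoff = abi_date + timedelta(days=REVIEW_WINDOW_DAYS)
    return [
        n for n in notes
        if n.created <= cutoff
        and n.note_type in NOTE_TYPES
        and n.section in NOTE_SECTIONS
        and n.service_group in SERVICE_GROUPS
    ]
```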
NLP algorithm
The NLP algorithm had two main components: text-processing and patient classification. The text-processing component identified concepts in text that matched specified criteria while the patient classification component defined the PAD status on the basis of available evidence from clinical notes.
The NLP algorithm used MedTagger, 26 an open source NLP pipeline built on the Apache unstructured information management architecture (UIMA) framework. The NLP algorithm used the keywords described in Table 3 for patient classification. The following rules were used:
Rules to define PAD cases:
Any diagnostic keyword + any disease location keyword within two sentences of the same note
Rules for non-PAD cases:
If the definition for a PAD case was not satisfied OR
If exclusion keywords were present in the clinical note
Whenever the NLP algorithm classified a patient as PAD, it also provided the note type, the inception date and the part of the clinical note (+/- 2 sentences) containing the evidence used by the system to classify the patient.
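The rules above can be illustrated with a short, self-contained sketch. This is not the MedTagger implementation: the sentence splitter is deliberately naive, the keyword lists are small subsets of Table 3, "within two sentences" is interpreted here as a sliding window of two consecutive sentences, and the exclusion rule is applied to the whole note. All names are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Small illustrative subsets of Table 3.
LOCATION_TERMS = ["leg", "legs", "lower extremity", "lower extremities", "femoral artery", "calf pain"]
DIAGNOSIS_TERMS = ["peripheral arterial disease", "PAD", "stenosis", "occlusion", "ischemic ulcer"]
EXCLUSION_TERMS = ["family history of", "upper extremity", "carotid", "varicose veins"]


def _pattern(terms: List[str]) -> "re.Pattern[str]":
    # Word boundaries prevent "pad" from matching inside "padding".
    return re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)


LOCATION_RE, DIAGNOSIS_RE, EXCLUSION_RE = map(_pattern, (LOCATION_TERMS, DIAGNOSIS_TERMS, EXCLUSION_TERMS))


@dataclass
class Evidence:
    note_type: str
    note_date: date
    snippet: str  # matching sentence with up to 2 sentences on either side


def split_sentences(text: str) -> List[str]:
    # Naive splitter; a clinical sentence detector would be used in practice.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_evidence(note_type: str, note_date: date, text: str) -> Optional[Evidence]:
    if EXCLUSION_RE.search(text):  # exclusion keyword anywhere in the note
        return None
    sentences = split_sentences(text)
    for i in range(len(sentences)):
        window = " ".join(sentences[i:i + 2])  # two consecutive sentences
        if DIAGNOSIS_RE.search(window) and LOCATION_RE.search(window):
            lo, hi = max(0, i - 2), min(len(sentences), i + 3)
            return Evidence(note_type, note_date, " ".join(sentences[lo:hi]))
    return None


def classify_patient(notes: List[Tuple[str, date, str]]) -> Tuple[str, Optional[Evidence]]:
    """Return ('PAD', earliest evidence) or ('non-PAD', None)."""
    hits = [e for e in (find_evidence(*n) for n in notes) if e is not None]
    if not hits:
        return "non-PAD", None
    return "PAD", min(hits, key=lambda e: e.note_date)
```

In this sketch the date of the earliest note with qualifying evidence plays the role of the NLP inception date.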
Results
The dataset processed by the NLP algorithm consisted of 1569 patients (806 cases and 763 controls). The total number of clinical notes in the dataset was 512,471 and on average each note had 386 words. The average age of patients was 71.2 years, 44% were women and 90% were white. Table 4 summarizes the results of the NLP algorithm performance presented as positive predictive value (PPV), sensitivity, negative predictive value (NPV) and specificity.
Table 4:
Metric | Value |
---|---|
PPV | 0.93 |
Sensitivity | 0.70 |
NPV | 0.80 |
Specificity | 0.95 |
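For readers less familiar with these four measures, the sketch below shows how each is computed from a confusion matrix of NLP output against the ABI gold standard. The counts in the usage example are invented for illustration and are not the counts from this study.

```python
from typing import Dict


def performance(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # of patients flagged as PAD, fraction truly PAD
        "sensitivity": tp / (tp + fn),  # of true PAD cases, fraction flagged
        "npv": tn / (tn + fn),          # of patients flagged non-PAD, fraction truly controls
        "specificity": tn / (tn + fp),  # of true controls, fraction flagged non-PAD
    }


# Hypothetical counts, NOT taken from the study.
example = performance(tp=90, fp=10, tn=80, fn=20)
```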
We compared the NLP algorithm inception date (the date on which the NLP algorithm classified the patient as PAD) with the gold standard index date for each PAD patient. For true positive cases, the difference between the NLP algorithm inception date and the gold standard index date was measured in days. We categorized whether the NLP algorithm identified PAD “before”, “at” the same time as or “after” the gold standard index date. We found that in 329 cases (41%) the NLP algorithm identified PAD before the gold standard index date, in 93 cases (12%) the NLP algorithm inception date and the gold standard index date were the same, and in 141 cases (18%) the NLP algorithm inception date was after the gold standard index date but within the 21-day window (Figure 2).
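The timing comparison amounts to taking the signed difference in days between the two dates and binning it. A minimal sketch follows, with hypothetical function names and made-up dates; the "outside window" label is an assumption added for completeness, since notes after the 21-day window were not retrieved in this study.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

WINDOW_DAYS = 21


def timing_category(nlp_date: date, abi_date: date) -> str:
    """Bin the NLP inception date relative to the ABI index date."""
    delta_days = (nlp_date - abi_date).days
    if delta_days < 0:
        return "before"
    if delta_days == 0:
        return "at"
    if delta_days <= WINDOW_DAYS:
        return "after"
    return "outside window"


def tally(pairs: Iterable[Tuple[date, date]]) -> Counter:
    """Count true positive cases per timing category."""
    return Counter(timing_category(nlp, abi) for nlp, abi in pairs)
```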
Discussion
The extensive use of EHRs holds great promise for population health surveillance strategies as the ability to rapidly extract information from EHRs may benefit individual health, healthcare delivery and the health of populations. 27 For an aging population with multiple coexisting chronic conditions, ascertainment of relevant characteristics at the point of care can be extremely time-consuming and challenging as data are buried within the EHR. 28
Clinically, the ABI test is used to confirm PAD as recommended by clinical practice guidelines. 29 However, PAD remains an under-diagnosed and undertreated disease. 12,13 For the present study, we used the ABI test as the gold standard for comparison for development and validation of the NLP algorithm for PAD. We have previously applied the NLP-PAD algorithm in PAD patients who did not undergo ABI testing 30 and the rules derived from that process have also been incorporated in the final version of the algorithm used in the present study.
The novel findings of the present study were that an NLP algorithm accurately identified PAD cases from clinical notes, with high positive predictive value, and did so prior to the date of the gold standard diagnostic test in 41% of the cases from the community. There was a time interval between documentation of evidence of PAD (e.g. symptoms) in narrative clinical notes and PAD diagnosis by ABI test. Hence, our data show a delay in establishing the PAD diagnosis despite the presence of PAD symptoms.
The accurate and timely assessment of PAD could 1) remind clinicians to order ABI testing for patients with suspected PAD and 2) prompt implementation of standardized risk modification strategies in patients with diagnosed PAD. Early identification of PAD and subsequent implementation of risk modification therapies are important for the management of these patients. The risk modification strategies recommended by clinical practice guidelines 31 include smoking cessation, and therapy with aspirin, statin medications, and angiotensin-converting enzyme inhibitors. Implementation of these recommendations is associated with significant reduction in adverse outcomes in PAD patients. 31 However, despite the evidence, patients with PAD continue to receive suboptimal risk modification strategies. 6, 31
This NLP algorithm may be incorporated into EHRs to identify both patients with previously diagnosed PAD and patients with PAD symptoms (i.e. suspected PAD). The PAD surveillance NLP algorithm could be linked to clinical decision support (CDS) to remind clinicians to order the diagnostic test (ABI). This may result in diagnosis of PAD earlier in the course of disease progression. 32 In addition, after the test results are available and documented in the clinical notes, automated reminders for implementation of risk modification strategies in PAD could be generated, with the ultimate goal of preventing and reducing adverse outcomes in PAD patients. Future EHR-based studies will evaluate the impact of the NLP-based CDS system on outcomes in PAD patients.
In conclusion, this PAD surveillance NLP algorithm has the potential for translation to clinical practice for use in CDS tools to remind clinicians to order ABI tests in patients with suspected PAD and to reinforce the implementation of guideline-recommended risk modification strategies in patients diagnosed with PAD.
Acknowledgements
This study was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under award number K01HL124045 and the NHGRI’s eMERGE (Electronic Records and Genomics) Network grants HG04599 and HG006379. This study was made possible using the resources of the Rochester Epidemiology Project, which is supported by the National Institute on Aging of the National Institutes of Health under award number R01AG034676 and the NLP framework established through the NIGMS award R01GM102283A1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We thank Carin Y. Smith and Bradley R. Lewis for statistical support and preparation of datasets.
References
- 1.Kullo IJ, Rooke TW. Peripheral Artery Disease. N Engl J Med. 2016;374:861–71. doi: 10.1056/NEJMcp1507631.
- 2.Hirsch AT, Haskal ZJ, Hertzer NR, et al. ACC/AHA 2005 Guidelines for the Management of Patients With Peripheral Arterial Disease (Lower Extremity, Renal, Mesenteric, and Abdominal Aortic): A Collaborative Report from the American Association for Vascular Surgery/Society for Vascular Surgery, Society for Cardiovascular Angiography and Interventions, Society for Vascular Medicine and Biology, Society of Interventional Radiology, and the ACC/AHA Task Force on Practice Guidelines (Writing Committee to Develop Guidelines for the Management of Patients With Peripheral Arterial Disease) J Am Coll Cardiol. 2006;47:e1–e192. doi: 10.1097/01.RVI.0000240426.53079.46.
- 3.Criqui MH, Denenberg JO, Langer RD, Fronek A. The epidemiology of peripheral arterial disease: importance of identifying the population at risk. Vascular Medicine. 1997;2:221–6. doi: 10.1177/1358863X9700200310.
- 4.McDermott MM, Greenland P, Guralnik JM, et al. Depressive symptoms and lower extremity functioning in men and women with peripheral arterial disease. J Gen Intern Med. 2003;18:461–7. doi: 10.1046/j.1525-1497.2003.20527.x.
- 5.Regensteiner JG, Hiatt WR, Coll JR, et al. The impact of peripheral arterial disease on health-related quality of life in the Peripheral Arterial Disease Awareness, Risk, and Treatment: New Resources for Survival (PARTNERS) Program. Vasc Med. 2008;13:15–24. doi: 10.1177/1358863X07084911.
- 6.Hirsch AT, Criqui MH, Treat-Jacobson D, et al. Peripheral arterial disease detection, awareness, and treatment in primary care. JAMA. 2001;286:1317–24. doi: 10.1001/jama.286.11.1317.
- 7.Oka RK, Umoh E, Szuba A, Giacomini JC, Cooke JP. Suboptimal intensity of risk factor modification in PAD. Vasc Med. 2005;10:91–6. doi: 10.1191/1358863x05vm611oa.
- 8.Mahoney EM, Wang K, Cohen DJ, et al. One-year costs in patients with a history of or at risk for atherothrombosis in the United States. Circ Cardiovasc Qual Outcomes. 2008;1:38–45. doi: 10.1161/CIRCOUTCOMES.108.775247.
- 9.Tsui F, Wagner M, Cooper G, et al. Probabilistic case detection for disease surveillance using data in electronic medical records. Online journal of public health informatics. 2011;3 doi: 10.5210/ojphi.v3i3.3793.
- 10.Mendonca EA, Haas J, Shagina L, Larson E, Friedman C. Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform. 2005;38:314–21. doi: 10.1016/j.jbi.2005.02.003.
- 11.Daddato S, Tartagni E, Dormi A, et al. Can peripheral arterial disease be early screened for in a podiatric setting? A preliminary study in a cohort of asymptomatic adults. Eur Rev Med Pharmacol Sci. 2012;16:1646–50.
- 12.Bernstein J, Esterhai JL, Staska M, Reinhardt S, Mitchell ME. The prevalence of occult peripheral arterial disease among patients referred for orthopedic evaluation of leg pain. Vasc Med. 2008;13:235–8. doi: 10.1177/1358863X08091970.
- 13.El-Menyar A, Amin H, Rashdan I, et al. Ankle-brachial index and extent of atherosclerosis in patients from the Middle East (the AGATHA-ME study): a cross-sectional multicenter study. Angiology. 2009;60:329–34. doi: 10.1177/0003319708321585.
- 14.Walker CM, Bunch FT, Cavros NG, Dippel EJ. Multidisciplinary approach to the diagnosis and management of patients with peripheral arterial disease. Clin Interv Aging. 2015;10:1147–53. doi: 10.2147/CIA.S79355.
- 15.Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc. 2016;23:1007–15. doi: 10.1093/jamia/ocv180.
- 16.Castro VM, Minnier J, Murphy SN, et al. Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am J Psychiatry. 2015;172:363–72. doi: 10.1176/appi.ajp.2014.14030423.
- 17.Bellows BK, LaFleur J, Kamauu AW, et al. Automated identification of patients with a diagnosis of binge eating disorder from narrative electronic health records. J Am Med Inform Assoc. 2014;21:e163–8. doi: 10.1136/amiajnl-2013-001859.
- 18.Ludvigsson JF, Leffler DA, Bai JC, et al. The Oslo definitions for coeliac disease and related terms. Gut. 2013;62:43–52. doi: 10.1136/gutjnl-2011-301346.
- 19.Savova GK, Fan J, Ye Z, et al. Discovering peripheral arterial disease cases from radiology notes using natural language processing. AMIA Annu Symp Proc. 2010:722–6.
- 20.Afzal N, Sohn S, Abram S, et al. Mining Peripheral Arterial Disease Cases from Narrative Clinical Notes using Natural Language Processing. In Press Journal of Vascular Surgery. 2017 doi: 10.1016/j.jvs.2016.11.031.
- 21.Thacker SB, Redmond S, Rothenberg RB, Spitz SB, Choi K, White MC. A controlled trial of disease surveillance strategies. Am J Prev Med. 1986;2:345–50.
- 22.Thacker SB, Qualters JR, Lee LM, Centers for Disease Control and Prevention. Public health surveillance in the United States: evolution and challenges. MMWR Suppl. 2012;61:3–9.
- 23.Tagliabue G, Tessandori R, Caramaschi F, et al. Descriptive epidemiology of selected birth defects, areas of Lombardy, Italy, 1999. Popul Health Metr. 2007;5 doi: 10.1186/1478-7954-5-4.
- 24.FitzHenry F, Murff HJ, Matheny ME, et al. Exploring the frontier of electronic health record surveillance: the case of postoperative complications. Med Care. 2013;51:509–16. doi: 10.1097/MLR.0b013e31828d1210.
- 25.St Sauver JL, Grossardt BR, Yawn BP, et al. Data resource profile: the Rochester Epidemiology Project (REP) medical records-linkage system. Int J Epidemiol. 2012;41:1614–24. doi: 10.1093/ije/dys195.
- 26.Liu H, Bielinski SJ, Sohn S, et al. An Information Extraction Framework for Cohort Identification Using Electronic Health Records. AMIA Jt Summits Transl Sci Proc. 2013:149–53.
- 27.Paul MM, Greene CM, Newton-Dame R, et al. The state of population health surveillance using electronic health records: a narrative review. Popul Health Manag. 2015;18:209–16. doi: 10.1089/pop.2014.0093.
- 28.Furukawa MF. Meaningful use: a roadmap for the advancement of health information exchange. Isr J Health Policy Res. 2013;2:1. doi: 10.1186/2045-4015-2-26.
- 29.Gerhard-Herman MD, Gornik HL, Barrett C, et al. 2016 AHA/ACC Guideline on the Management of Patients With Lower Extremity Peripheral Artery Disease: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. J Am Coll Cardiol. 2016 doi: 10.1016/j.jacc.2016.11.007.
- 30.Afzal N, Sohn S, Abram S, Liu H, Kullo IJ, Arruda-Olson AM. 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI) IEEE; 2016. Identifying peripheral arterial disease cases using natural language processing of clinical notes; pp. 126–131. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7455851
- 31.Armstrong EJ, Chen DC, Westin GG, et al. Adherence to guideline-recommended therapy is associated with decreased major adverse cardiovascular events and major adverse limb events among patients with peripheral arterial disease. J Am Heart Assoc. 2014;3:e000697. doi: 10.1161/JAHA.113.000697.
- 32.Berner ES. Clinical decision support systems: state of the art. AHRQ publication. 2009;90069