AMIA Annual Symposium Proceedings. 2020 Mar 4;2019:228–237.

Towards Reliable ARDS Clinical Decision Support: ARDS Patient Analytics with Free-text and Structured EMR Data

Emilia Apostolova 1, Amit Uppal 2, Jessica E Galarraga 3,4, Ioannis Koutroulis 5, Tim Tschampel 6, Tony Wang 7, Tom Velez 6
PMCID: PMC7153087  PMID: 32308815

Abstract

In this work, we utilize a combination of free-text and structured data to build Acute Respiratory Distress Syndrome (ARDS) prediction models and ARDS phenotype clusters. We derived ‘Patient Context Vectors’ representing patient-specific contextual ARDS risk factors, utilizing deep-learning techniques on ICD and free-text clinical notes data. The Patient Context Vectors were combined with structured data from the first 24 hours of admission, such as vital signs and lab results, to build an ARDS patient prediction model and an ARDS patient mortality prediction model achieving AUC of 90.16 and 81.01 respectively. The ability of Patient Context Vectors to summarize patients’ medical history and current conditions is also demonstrated by the automatic clustering of ARDS patients into clinically meaningful phenotypes based on comorbidities, patient history, and presenting conditions. To our knowledge, this is the first study to successfully combine free-text and structured data, without any manual patient risk factor curation, to build real-time ARDS prediction models.

Introduction

Critical care Clinical Decision Support (CDS) systems aim at the early identification and timely treatment of rapidly progressive, life-threatening conditions. In particular, ARDS (Acute Respiratory Distress Syndrome) is a significant cause of morbidity and mortality in the USA and worldwide1,2. Early recognition and evidence-based management of ARDS can limit the propagation of lung injury and significantly improve patient outcomes3.

To date, there exists no accurate and reliable way to anticipate which patients presenting with respiratory distress are likely to develop ARDS. Numerous prediction scores have been developed to assess ARDS prognosis and risk of death, such as Lung Injury Score (LIS)4, Lung Injury Prediction Score (LIPS)5, APPS (Age, Plateau, PaO2/FiO2 Score)6, Early Acute Lung Injury (EALI)7, and Modified ARDS Prediction Score (MAPS)8. Using a consensus process, a panel of experts convened in 2011 to develop the Berlin definition, focusing on addressing a number of limitations of prior definitions9. Still, the predictive validities of these tools and definitions have proven to be only moderate, as measured, for example, by the area under the receiver operating characteristic curve (AUC).

The difficulty in analyzing and predicting ARDS outcomes stems from the fact that ARDS is both a rare and, at the same time, a highly heterogeneous condition10. ARDS involves the interaction of multiple risk factors, past history, and current conditions, signs, and symptoms. Hospital alert systems typically rely on highly sensitive screening of structured data, such as vital signs and lab results, which, in the case of such rare conditions, is often associated with false clinical alarms resulting in “alarm fatigue”11.

EMR data depends on what the clinician deems necessary to measure and record in the act of caring for the patient. EMR data is typically entered for the purposes of clinical documentation and billing12, and is thus not centered around the needs of real-time surveillance-based CDS systems. The combined physician-care-related variables and underlying patient-related contextual factors needed for a reliable ARDS risk evaluation are typically dispersed across the patient EMR record, and available at different times throughout the patient stay. Patient demographics, past medical and visit history, chronic conditions, risk factors, and current signs and symptoms can be found in diverse combinations of structured elements and clinical notes (e.g. nursing notes, radiology reports, etc.) that record diagnosis and procedure codes, vital signs, lab orders and results, ventilation parameters, etc. The challenge for real-time surveillance-based CDS systems is accommodating the variability and availability of real-time electronic data and enabling accurate contextual interpretation of real-time patient data.

In this work, we utilize all available EMR patient information, in the form of structured data and free-text, for real-time predictive modeling. While our experiments are focused on identifying ARDS cases, the described method is applicable to a variety of disease surveillance CDS use cases, needing information dispersed across the EMR patient record.

A second goal of this study is to utilize the combination of clinician knowledge and experience, and a data-driven approach to identify ARDS patients’ phenotypes and risk factors, acknowledging the need for targeted personalized treatments reflecting differences in treatment outcomes across patient subtypes13–15.

Dataset

Clinical encounter data of adult patients were extracted from the MIMIC3 Intensive Care Unit (ICU) database16. MIMIC3 consists of retrospective ICU encounter data of patients admitted into Beth Israel Deaconess Medical Center from 2001 to 2012. Included ICUs are medical, surgical, trauma-surgical, coronary, cardiac surgery recovery, and medical/surgical care units. MIMIC3 includes time series data recorded in the EMR during encounters (e.g. vital signs/diagnostic laboratory results, free text clinical notes, medications, procedures, etc.). The dataset contains data associated with over 58,000 ICU visits, including over 2 million free-text clinical notes and over 650,000 diagnosis codes.

For this study, in accordance with previous literature17, we identified ARDS in adult patients older than 18 years using ICD-9 codes for severe acute respiratory failure and use of continuous invasive mechanical ventilation, excluding those with codes for acute asthma, COPD and CHF exacerbations (see footnote 1). This resulted in 4,624 ARDS cases from a total of 48,399 adult ICU admissions. The ICU mortality rate in this population was approximately 59%, somewhat higher than expected for ARDS18, suggesting that the algorithm used captures the most severe cases of ARDS and thus introduces some level of noise, with true ARDS cases possibly marked as negative examples.

Our ARDS predictive model utilized data in the form of free-text clinical notes, ICD codes, and structured physiological and ventilator data. The structured data included in this analysis consists of anion gap (aniongap), albumin, bands, bicarbonate, bilirubin, creatinine, chloride, glucose, hematocrit, hemoglobin, lactate, platelet, potassium, partial thromboplastin time (ptt), international normalized ratio (inr), prothrombin time (pt), sodium, blood urea nitrogen (bun), white blood cell count (wbc), heart rate (heartrate), systolic blood pressure (sysbp), diastolic blood pressure (diasbp), mean blood pressure (meanbp), respiratory rate (resperate), body temperature (tempc), peripheral capillary oxygen saturation (spo2), body mass index (bmi), gender, age, and urine output (urine1). All variables are included as min, max, and mean values and are measured over the first 24 hours of ICU admission. The first-24-hour timeframe was chosen because it has been reported that ARDS develops at a median of 30 hours after hospital admission19. Thus, a 24-hour window allows enough structured data to be gathered, while still being early enough for real-time CDS.
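As a minimal sketch of how such first-24-hour summary features could be derived, the following Python snippet aggregates timestamped measurements into min, max, and mean values per variable. The function name `summarize_first_24h`, the tuple layout, and the sample values are illustrative assumptions; they are not part of the MIMIC3 schema or of the pipeline used in this study.

```python
from collections import defaultdict
from statistics import mean


def summarize_first_24h(measurements, window_hours=24.0):
    """Aggregate (hours_since_admission, variable, value) tuples into
    min/max/mean features over the first `window_hours` hours."""
    values = defaultdict(list)
    for hours, variable, value in measurements:
        # Keep only measurements recorded inside the observation window.
        if 0.0 <= hours < window_hours:
            values[variable].append(value)

    features = {}
    for variable, vals in values.items():
        features[f"{variable}_min"] = min(vals)
        features[f"{variable}_max"] = max(vals)
        features[f"{variable}_mean"] = mean(vals)
    return features


# Hypothetical measurements for a single admission.
example = [
    (1.0, "heartrate", 88.0),
    (6.5, "heartrate", 112.0),
    (30.0, "heartrate", 70.0),  # outside the 24-hour window, ignored
    (2.0, "spo2", 95.0),
    (12.0, "spo2", 91.0),
]
features = summarize_first_24h(example)
```

Variables with no measurement in the window simply produce no features here; how missing values were handled in the study is not specified in this section.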

Descriptive Analytics

ARDS patient characteristics and risk factors were first gathered with the help of experienced clinicians. Expert knowledge was gathered in the form of Concept Maps (Cmaps)20. Cmaps is a tool developed at the Institute for Human and Machine Cognition (IHMC) that enables collaborative knowledge creation, in the form of concepts, relations, and ontologies, with links to external resources and publications. A snippet of the ARDS Cmap developed by our clinical research team is shown in Figure 1.

Figure 1: ARDS Cmap. Clinician-coded representation of ARDS patient characteristics and risk factors.

For example, clinician-coded ARDS precipitating causes include sepsis, aspiration, traumatic injuries, burns, and drugs, both illicit drugs, such as cocaine and heroin, and prescription drugs, such as chemotherapeutic agents.

Initially, the ARDS Cmap was used as a screening rule engine. In addition, the Cmap was used to identify risk factors that were later used in predictive models. Although rule engines based on Cmaps are evidence-based and tend to be highly sensitive, they tend to have subpar specificity (e.g. the Berlin definition achieved an AUC of 0.577). Such rule engines are also highly sensitive to missing data. In contrast, data-driven and Machine Learning algorithms have the potential to improve on rule engines by training on large datasets and learning from a variety of clinical data and response patterns, and they are able to handle missing data. However, for machine learning to be effective in predicting highly heterogeneous conditions such as ARDS, training data requires both high-precision labeling and the identification of features with adequate numbers of samples to separate ARDS classes and subclasses/phenotypes from non-ARDS patients.

Footnote 1. Inclusion ICD9 codes: 51881, 51882, 51884, 51851, 51852, 51853, 5184, 5187, 78552, 99592, 9670, 9671, 9672; exclusion ICD9 codes: 49391, 49392, 49322, 4280.
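The inclusion and exclusion codes in the footnote above can be turned into a simple cohort filter. The sketch below is one plausible reading only: the text does not state exactly how the codes were combined, so the split into diagnosis codes and ventilation procedure codes (9670, 9671, 9672), the requirement that both groups be present, and the function name `is_ards_case` are all assumptions made for illustration.

```python
# Assumed grouping of the footnote's inclusion codes (not stated in the text).
INCLUSION_DIAGNOSES = {
    "51881", "51882", "51884", "51851", "51852", "51853",
    "5184", "5187", "78552", "99592",
}
VENTILATION_PROCEDURES = {"9670", "9671", "9672"}
EXCLUSION_DIAGNOSES = {"49391", "49392", "49322", "4280"}


def is_ards_case(age, diagnosis_codes, procedure_codes):
    """Label an admission as an ARDS case under the assumed rule:
    adult, at least one inclusion diagnosis, at least one ventilation
    procedure code, and no exclusion diagnosis."""
    diagnoses = set(diagnosis_codes)
    procedures = set(procedure_codes)
    return (
        age > 18
        and bool(diagnoses & INCLUSION_DIAGNOSES)
        and bool(procedures & VENTILATION_PROCEDURES)
        and not (diagnoses & EXCLUSION_DIAGNOSES)
    )


# Hypothetical admissions.
case = is_ards_case(64, ["51881", "486"], ["9671"])
excluded = is_ards_case(64, ["51881", "4280"], ["9671"])
```

A rule of this kind is cheap to apply but, as noted in the Dataset section, it labels from billing codes and therefore introduces some label noise.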

ICD Embeddings and Patient Vectors

We then looked for a data-driven approach to provide additional insight into ARDS patient characteristics, risk factors, and phenotypes.

Intuitively, even without any additional patient EMR data, clinicians viewing properly coded patient diagnosis codes (e.g. ICD codes in a problem list) are typically able to create a mental summary of the overall patient condition, including medical history, risk factors, and presenting conditions. ICD codes are used to describe not only current diagnoses (e.g. Pneumonia, unspecified organism: ICD9 486), but also a variety of additional patient information. For example, ICD codes can describe ARDS risk factors, such as the patient’s history and chronic conditions (e.g. Chronic kidney disease; Personal history of malignant neoplasm), and information regarding past and current treatments and procedures (e.g. Infection due to other bariatric procedure). In some cases, ICD codes contain information such as the patient age group and/or susceptibilities (e.g. Sepsis of newborn; Elderly multigravida); expected outcome (e.g. Encounter for palliative care); patient social history (e.g. Adult emotional/psychological abuse; Cocaine dependence); and the reason for the visit (e.g. Railway accidents; Motor Vehicle accidents).

Using ICD codes for statistical analysis and predictive models, however, poses a series of challenges. Patient ICD codes in EMRs tend to be sparse. There are numerous ICD codes (around 15,000 ICD9 codes and around 68,000 ICD10 codes), with only a very small subset of these applicable to a particular patient (e.g. MIMIC3 admissions have an average of 11 ICD codes). ICD codes also tend to co-occur and overlap. In addition, ICD coding can be, in some cases, subjective and dependent on numerous external factors21–23.

However, the co-occurrence and mutual information of ICD codes over large data repositories can be utilized. For example, the fact that Pneumonia ICD codes are often accompanied by ICD codes describing Cough, Fever, Pleural effusion, etc. can be utilized to generate vector representations of ICD codes. Inspired by deep learning representations, such as word embeddings24, it has been suggested that this medical code co-occurrence can be exploited to generate low-dimensional representations of ICD codes25–27 that may facilitate EMR data-based exploratory analysis and predictive modeling28.
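To make the co-occurrence signal concrete, the following sketch counts how often pairs of codes appear within the same admission. It only illustrates the raw statistic that embedding methods exploit; it is not the embedding training procedure itself. The function name `cooccurrence_counts` and the toy admissions (written as readable labels rather than real ICD codes) are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(admissions):
    """Count how often each unordered pair of codes appears in the same admission."""
    counts = Counter()
    for codes in admissions:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(codes)), 2):
            counts[(a, b)] += 1
    return counts


# Toy admissions with illustrative labels instead of real ICD codes.
admissions = [
    ["pneumonia", "cough", "fever"],
    ["pneumonia", "fever", "pleural_effusion"],
    ["fracture", "fall"],
]
counts = cooccurrence_counts(admissions)
```

Codes that frequently share an admission (here, pneumonia and fever) end up with high counts, which is the kind of regularity that pushes their learned vectors close together.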

In this study, we utilized available MIMIC3 patient data to generate the ICD embeddings following the approach of Choi et al.25. In our approach, we generated a low-dimensional representation of the patient history, symptoms, risk factors, diagnosis, etc., by averaging the patient ICD code embeddings. We refer to this representation of the patient’s medical history and clinical condition as Patient Context Vectors.
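The averaging step can be sketched in a few lines of Python. The embedding values below are made up and three-dimensional for readability, and the helper name `patient_context_vector` and the choice to skip codes without an embedding are assumptions rather than details taken from the study.

```python
def patient_context_vector(icd_codes, embeddings):
    """Average the embeddings of a patient's ICD codes, element-wise.

    Codes without a known embedding are skipped (an assumption);
    returns None when no code has an embedding."""
    vectors = [embeddings[code] for code in icd_codes if code in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy embeddings with invented values.
toy_embeddings = {
    "486": [0.2, 0.4, 0.0],   # Pneumonia, organism unspecified
    "0389": [0.6, 0.0, 0.2],  # Unspecified septicemia
}
vec = patient_context_vector(["486", "0389", "unknown"], toy_embeddings)
```

Because the result lives in the same space as the code embeddings, a patient vector can be compared directly with individual ICD code vectors.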

To generate ARDS patient groups sharing similar characteristics and risk factors, we then clustered the ARDS Patient Context Vectors via k-means clustering29. Clinical review of the generated ARDS clusters determined the optimal number of clusters to be 10.
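For readers unfamiliar with k-means, a bare-bones version of the algorithm is sketched below. The distance measure, initialization, and iteration count used in the study are not given in this section, so the squared Euclidean distance, the deterministic "first k points" initialization, and the fixed number of iterations are simplifying assumptions.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with squared Euclidean distance.

    Initial centroids are the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy two-dimensional "patient vectors" forming two obvious groups.
toy_vectors = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2]]
labels, centroids = kmeans(toy_vectors, k=2)
```

In practice the number of clusters is a free parameter; in this study it was settled by clinical review rather than by a purely numerical criterion.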

The Patient Context Vectors were able to clearly separate ARDS patient risk factors and conditions into clinically valid categories, such as Malignancy or Chronic Hepatic Disease. Figure 2 shows the frequency of patients in various clusters, sorted left-to-right according to mortality rate. Table 1 lists the 15 most representative ICD code descriptions (cluster centroids) for the 10 ARDS clusters. The cluster description was provided by clinicians reviewing the corresponding cluster centroids.

Figure 2: Frequency and mortality rate of MIMIC3 ARDS patient clusters formed by clustering of averaged ICD embeddings.

Table 1: The top 15 most representative ICD codes for various clusters, based on cosine similarity to the cluster centroid.

Malignancy | Chronic Hepatic Disease
Malignant neoplasm of liver, secondary | Other and unspecified coagulation defects
Secondary malignant neoplasm of bone and bone marrow | Other ascites
Anemia in neoplastic disease | Hepatic encephalopathy
Secondary malignant neoplasm of lung | Thrombocytopenia, unspecified
Secondary malignant neoplasm of other specified sites | Portal hypertension
Neoplasm related pain (acute) (chronic) | Alcoholic cirrhosis of liver
Personal history of irradiation, presenting hazards to health | Hepatorenal syndrome
Secondary and unspecified malignant neoplasm of intrathoracic lymph nodes | Acquired coagulation factor deficiency
Hypercalcemia | Spontaneous bacterial peritonitis
Personal history of antineoplastic chemotherapy | Acute and subacute necrosis of liver
Encounter for palliative care | Esophageal varices in diseases classified elsewhere, without mention of bleeding
Antineoplastic and immunosuppressive drugs causing adverse effects in therapeutic use | Cirrhosis of liver without mention of alcohol
Secondary malignant neoplasm of pleura | Other sequelae of chronic liver disease
Malignant pleural effusion | Other shock without mention of trauma
Secondary malignant neoplasm of brain and spinal cord | Portal vein thrombosis

Nosocomial Complications | Vascular Disease Complications
Urinary tract infection, site not specified | Acute kidney failure, unspecified
Intestinal infection due to Clostridium difficile | Congestive heart failure, unspecified
Pressure ulcer, lower back | Hyperpotassemia
Acute respiratory failure | Anemia in chronic kidney disease
Hyperosmolality and/or hypernatremia | Hypertensive chronic kidney disease, unspecified, with chronic kidney disease stage V or end stage renal disease
Pneumonitis due to inhalation of food or vomitus | Candidiasis of other urogenital sites
Anemia of other chronic disease | End stage renal disease
Candidiasis of other urogenital sites | Intestinal infection due to Clostridium difficile
Pneumonia, organism unspecified | Long-term (current) use of insulin
Pseudomonas infection in conditions classified elsewhere and of unspecified site | Below knee amputation status
Acute and chronic respiratory failure | Chronic kidney disease, unspecified
Pressure ulcer, stage II | Severe sepsis
Alkalosis | Acute kidney failure with lesion of tubular necrosis
Mixed acid-base balance disorder | Pressure ulcer, lower back
Pressure ulcer, buttock | Diabetes with renal manifestations, type II or unspecified type, not stated as uncontrolled

Multi-organ Failure | Neurologic Comorbidities
Severe sepsis | Encephalopathy, unspecified
Septic shock | Intracerebral hemorrhage
Acute respiratory failure | Cerebral artery occlusion, unspecified with cerebral infarction
Unspecified septicemia | Dysphagia, unspecified
Acute kidney failure with lesion of tubular necrosis | Hemiplegia, unspecified, affecting unspecified side
Acidosis | Obstructive hydrocephalus
Thrombocytopenia | Cerebral edema
Other and unspecified coagulation defects | Subdural hemorrhage
Streptococcal septicemia | Compression of brain
Intestinal infection due to Clostridium difficile | Hyperosmolality and/or hypernatremia
Hyposmolality and/or hyponatremia | Other encephalopathy
Candidiasis of other urogenital sites | Cerebral embolism with cerebral infarction
Defibrination syndrome | Other disorders of neurohypophysis
Candidiasis of mouth | Grand mal status
Acute kidney failure | Other convulsions

Surgical Source Control | Complex Comorbidities
Paralytic ileus | Anemia, unspecified
Unspecified pleural effusion | Unspecified acquired hypothyroidism
Unspecified septicemia | Diabetes mellitus without mention of complication, type II or unspecified type, not stated as uncontrolled
Acute kidney failure with lesion of tubular necrosis | Chronic airway obstruction, not elsewhere classified
Acute vascular insufficiency of intestine | Atrial flutter
Intestinal infection due to Clostridium difficile | Mixed acid-base balance disorder
Perforation of intestine | Atrial fibrillation
Streptococcal septicemia | Acute kidney failure, unspecified
Other postoperative infection | Pneumonia, organism unspecified
Disseminated candidiasis | Delirium due to conditions classified elsewhere
Peritoneal abscess | Do not resuscitate status
Acidosis | Chronic kidney disease, unspecified
Candidiasis of other urogenital sites | Other and unspecified hyperlipidemia
Other suppurative peritonitis | Personal history of tobacco use
Unspecified intestinal obstruction | Hypoxemia

Trauma/Fx | Toxic Ingestion
Closed fracture of dorsal [thoracic] vertebra without mention of spinal cord injury | Bipolar disorder, unspecified
Traumatic pneumothorax without mention of open wound into thorax | Poisoning by tricyclic antidepressants
Contusion of lung without mention of open wound into thorax | Alcohol withdrawal
Open wound of scalp, without mention of complication | Poisoning by benzodiazepine-based tranquilizers
Closed fracture of lumbar vertebra without mention of spinal cord injury | Poisoning by other antipsychotics, neuroleptics, and major tranquilizers
Other closed skull fracture with cerebral laceration and contusion, with loss of consciousness of unspecified duration | Lack of housing
Open wound of forehead, without mention of complication | Rhabdomyolysis
Nontraffic accident involving other off-road motor vehicle injuring motorcyclist | Other, mixed, or unspecified drug abuse, unspecified
Accidental fall on or from other stairs or steps | Poisoning by aromatic analgesics, not elsewhere classified
Motor vehicle traffic accident of unspecified nature injuring passenger in motor vehicle | Suicide and self-inflicted poisoning by analgesics, antipyretics, and antirheumatics
Injury by unspecified means, undetermined whether accidentally or purposely inflicted | Substance abuse in family
Traumatic shock | Suicide and self-inflicted poisoning by tranquilizers and other psychotropic agents
Closed fracture of clavicle, unspecified part | Toxic effect of ethyl alcohol
Closed fracture of seventh cervical vertebra | Posttraumatic stress disorder
Closed fracture of scapula, unspecified part | Other and unspecified alcohol dependence, continuous
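The ranking behind Table 1, ICD codes ordered by cosine similarity to a cluster centroid, can be sketched as follows. The embeddings and centroid are toy values, the helper names are hypothetical, and the sketch assumes all vectors are non-zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_representative_codes(centroid, embeddings, top_n=15):
    """Return the top_n code names ranked by cosine similarity to the centroid."""
    ranked = sorted(
        embeddings,
        key=lambda code: cosine_similarity(embeddings[code], centroid),
        reverse=True,
    )
    return ranked[:top_n]


# Toy two-dimensional embeddings and a centroid close to "code_a".
toy = {"code_a": [1.0, 0.0], "code_b": [0.7, 0.7], "code_c": [0.0, 1.0]}
top = most_representative_codes([1.0, 0.1], toy, top_n=2)
```

Since Patient Context Vectors are averages of ICD code embeddings, centroids and individual codes share one vector space, which is what makes this lookup meaningful.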

Interestingly, the manually clinician-curated Cmap appears to overlap to a large extent with the automatically derived ARDS patient clusters. For example, clinicians listed chemotherapeutic agents as a distinct risk factor for ARDS. Our data-driven analysis showed a distinct cluster of ARDS patients with ICD codes describing malignancy. Further analysis could delineate if this is a direct relationship between chemotherapeutic agents and the development of ARDS, or rather reflects an indirect relationship. For example, perhaps chemotherapy-induced immunosuppression acts as a risk factor for sepsis which, in turn, acts as a risk factor for ARDS.

Furthermore, the mortality associated with different clusters was also consistent with clinician experience. Clinicians recognize that patients with advanced malignancy may develop severe infections and ARDS as a final common pathway in their advanced disease. In this context, ARDS may represent a manifestation of their advanced underlying disease, and therapies directed at ARDS and its precipitant may not significantly impact mortality. Indeed, the cluster of malignancy-associated ARDS had a very high mortality rate of 90%.

In contrast, patients who develop ARDS as a result of trauma typically were well enough to be engaged in the activity leading to trauma, and thus ARDS may truly be the primary disease as opposed to a symptom of advanced underlying disease. The likelihood of mortality in this group may be more significantly influenced by the treatments targeted at ARDS. In our analysis, the cluster of trauma-associated ARDS had a significantly lower mortality rate (30%) than the other clusters.

Predictive Analytics

Clustering and manual evaluation by clinicians of Patient Context Vectors proved to be a useful tool in summarizing the overall patient condition, risk factors and history. Real-time CDS systems, however, might not have access to the full set of the patient ICD codes as they might be entered in the EMR system at a later stage. It has been observed that clinical notes, specifically nursing and physician notes, typically contain all of the information available from ICD codes. Furthermore, while past medical history and presenting conditions might not always be ICD-coded, they are typically available in the form of free-text notes. In previous work30, we have successfully utilized free-text for predicting Patient Context Vectors, in the case of missing ICD codes. Additionally, we were able to successfully combine information available in ICD codes and in nursing notes and produce an average Patient Context Vector.

A word-level Convolutional Neural Network (CNN) was trained to predict from free-text notes the Patient Context Vector (averaged ICD code embedding). Structured data, in the form of vital signs and lab results, was then combined with the predicted Patient Context Vector and utilized in a machine learning model trained to predict ARDS patients.
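The forward pass of a generic word-level CNN regressor can be illustrated without any deep learning framework. The sketch below is not the architecture used in this work (filter widths, number of filters, and training loss are not given here); it only shows the usual pattern of convolving filters over consecutive word vectors, applying ReLU, max-pooling over time, and mapping the pooled features linearly to an output vector. All weights, word vectors, and names are invented, and training is omitted.

```python
def conv1d_max_pool(token_vectors, filters, width):
    """Slide each filter over windows of `width` consecutive word vectors,
    apply ReLU, and keep the maximum activation per filter."""
    pooled = []
    for f in filters:  # each filter is a flat list of width * dim weights
        best = 0.0  # ReLU output is never below zero
        for start in range(len(token_vectors) - width + 1):
            window = [x for vec in token_vectors[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(f, window))
            best = max(best, activation)
        pooled.append(best)
    return pooled


def predict_context_vector(tokens, word_vectors, filters, width, output_weights):
    """Map a tokenized note to an output vector via conv, max-pool, and a linear layer."""
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    pooled = conv1d_max_pool(vectors, filters, width)
    return [sum(w * p for w, p in zip(row, pooled)) for row in output_weights]


# Invented two-dimensional word vectors and weights.
word_vectors = {"pt": [1.0, 0.0], "intubated": [0.0, 1.0], "overnight": [1.0, 1.0]}
filters = [[1.0, 0.0, 0.0, 1.0], [-1.0, 0.0, 0.0, -1.0]]
output_weights = [[0.5, 1.0], [0.0, 1.0], [1.0, 0.0]]
predicted = predict_context_vector(
    ["pt", "intubated", "overnight"], word_vectors, filters, 2, output_weights
)
```

In a trained model the filters and output weights would be learned so that the output approximates the averaged ICD code embedding of the admission the note belongs to.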

Figure 3 summarizes the system workflow during prediction time.

Figure 3: Real-time ARDS prediction workflow. Nursing notes available at prediction time are used to predict Patient Context Vectors. ICD codes available at prediction time are also converted to Patient Context Vectors by averaging ICD code embeddings. Patient Context Vectors are used together with structured EMR data to predict the patient ARDS status.

The ARDS prediction model was trained utilizing structured data from the first 24 hours of admission, as described above in the Dataset Section. In addition, the first half of the available patient nursing notes and the first half of entered ICD codes were used to produce Patient Context Vectors of size 50. The patient ICD codes were used to look up the corresponding ICD code embeddings, which were then averaged. Each nursing note was used to predict the Patient Context Vector via the trained word-level CNN. All Patient Context Vectors were then averaged and used in addition to the structured MIMIC3 data. As the MIMIC3 dataset ICD codes lack timestamps, we were unable to identify ICD codes available during the first 24 hours of admission. The MIMIC3 ICD codes, however, are ordered, and we used the first half of the ICD codes, and the first half of notes, as an approximation of data available early in the patient stay.
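The feature assembly described above can be sketched as follows, assuming one reading of the text: the first half of the ICD codes yields one averaged vector, each note in the first half of the notes yields one predicted vector, and all of these are averaged and appended to the structured features. Rounding the "first half" up for odd-length lists, the helper names, and the toy two-dimensional vectors are assumptions.

```python
def first_half(items):
    """First half of an ordered list, rounded up (the rounding is an assumption)."""
    return items[: (len(items) + 1) // 2]


def average_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_feature_row(structured, icd_codes, note_vectors, embeddings):
    """Concatenate structured features with the averaged Patient Context Vector."""
    candidates = []
    code_vectors = [embeddings[c] for c in first_half(icd_codes) if c in embeddings]
    if code_vectors:
        candidates.append(average_vectors(code_vectors))
    # Each entry of note_vectors stands for a CNN prediction from one note.
    candidates.extend(first_half(note_vectors))
    context = average_vectors(candidates) if candidates else []
    return list(structured) + context


# Toy inputs; real Patient Context Vectors in the study have size 50.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [9.0, 9.0]}
row = build_feature_row(
    structured=[88.0, 112.0],
    icd_codes=["a", "b", "c", "c"],
    note_vectors=[[0.5, 1.5], [7.0, 7.0]],
    embeddings=embeddings,
)
```

Only the early codes "a" and "b" and the first note contribute here, mimicking the approximation of data available early in the stay.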

A Gradient Boosting Machine (GBM) model31,32 was used to predict ARDS patients from the total population of adult patients. A GBM model was also used to predict the mortality among all ARDS patients. Table 2 shows the result from the experiments. All results were produced via 10-fold cross validation.
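The evaluation protocol can be illustrated with two small helpers: one that produces 10-fold train/test index splits and one that computes precision, recall, and F1 for the positive class. The text does not say whether folds were shuffled or stratified, so the contiguous, unshuffled folds below are a simplification, and the function names and toy labels are assumptions.

```python
def k_fold_indices(n_samples, k=10):
    """Split indices 0..n_samples-1 into k nearly equal, contiguous test folds."""
    folds = []
    start = 0
    for i in range(k):
        # The first (n_samples % k) folds receive one extra sample.
        size = n_samples // k + (1 if i < n_samples % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n_samples) if j < start or j >= start + size]
        folds.append((train, test))
        start += size
    return folds


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


folds = k_fold_indices(25, k=10)
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With a rare positive class such as ARDS, precision and recall for the positive class are more informative than overall accuracy, which is why they are reported alongside AUC.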

Table 2: 10-fold cross-validation GBM results of predicting ARDS patients and predicting mortality among ARDS patients. P = Precision, R = Recall, F1 = F1-score for the positive class. The Baseline set of features consists of vital signs, lab results, Glasgow Coma Scale score, gender and age, in the form of structured data. “Baseline + first half of notes/ICD” also includes the average of the embeddings of the first half of the entered visit ICD codes, and Patient Context Vectors predicted from the first half of the visit nursing notes.

ARDS Prediction
Features | AUC | P | R | F1
Baseline | 79.74 | 29.77 | 67.80 | 37.17
Baseline + first half of notes/ICD | 90.16 | 49.46 | 78.85 | 48.63

ARDS Mortality Prediction
Features | AUC | P | R | F1
Baseline | 78.26 | 69.19 | 92.60 | 79.20
Baseline + first half of notes/ICD | 82.11 | 73.66 | 89.99 | 81.01

In both prediction models, the inclusion of Patient Context Vectors significantly increased the overall model performance. Intuitively, the patient medical history and overall clinical condition are important predictive factors. Results demonstrate that Patient Context Vectors can be successfully utilized to represent additional knowledge of a patient’s condition. The importance of the Patient Context Vector is also demonstrated by the scaled GBM variable importance shown in Figure 4. Together with critical vital signs measurements, various Patient Context Vector dimensions (shown with the prefix “embedding”) play an important role in predicting the patient’s ARDS outcome. A known limitation of low-dimensional representations is the lack of interpretability of individual dimensions (e.g. embedding 39 lacks the interpretability of systolic blood pressure or temperature). Future work will focus on more interpretable Patient Context Vector embeddings, amenable to clinical interpretation.

Figure 4: GBM scaled variable importance of Baseline prediction model features plus Patient Context Vectors from first half of ICD codes/notes.

In terms of practical application, the proposed system can be used in addition to existing high-recall/low-precision hospital alert systems to prioritize alerts, mitigating the effects of alert fatigue. Furthermore, the imperfect and noisy nature of the automatically created dataset likely results in an overly pessimistic evaluation. It has been shown that ML classification algorithms are able to achieve high performance at relatively high levels of noise and that ML models generated from noisy datasets perform significantly better when evaluated on clean test sets (10 to 30% classification accuracy improvement at high training set noise levels)33–35. Future work will focus on creating a clean, clinician-reviewed ARDS test dataset for more precise evaluation of the proposed approach.
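As a small illustration of alert prioritization, the sketch below reorders alerts raised by an existing high-recall system according to a model-predicted risk score. The patient identifiers, scores, and the choice to place patients without a score last are hypothetical.

```python
def prioritize_alerts(alerts, risk_scores):
    """Order alerted patient ids by descending predicted risk.

    Patients without a score are treated as risk 0.0 (an assumption)."""
    return sorted(alerts, key=lambda pid: risk_scores.get(pid, 0.0), reverse=True)


# Hypothetical alerts from an existing screening system and model scores.
alerts = ["p1", "p2", "p3"]
risk_scores = {"p1": 0.2, "p2": 0.9}
ordered = prioritize_alerts(alerts, risk_scores)
```

The underlying alert system is left untouched in this arrangement; the model only changes the order in which clinicians see its alerts.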

Related Work

Numerous prediction scores have been developed to assess ARDS prognosis. Gajic et al.5 developed a Lung Injury Prediction Score (LIPS) formula including predisposing conditions, such as sepsis, shock, pneumonia, alcohol abuse, chemotherapy, FIO2 and respiratory rate measures, found useful in predicting ARDS and mortality in surgical critical care patients36. Interestingly, over 70% of the score points in the LIPS calculation are based on patient context data rather than vitals, labs, and symptoms.

Other tools, such as that of Villar et al.6, base their ARDS prediction score on age, PaO2/FIO2, and plateau pressure. Levitt et al.7 developed an Early Acute Lung Injury (EALI) score including risk factors, respiratory rate, and oxygen requirement. Xie et al.8 developed a modified ARDS prediction score (MAPS) based on a hand-crafted set of risk factors, risk modifiers, vital signs, etc.

A number of studies focus on predicting mortality among ARDS patients. For example, similar to our cluster findings, Hyers37 reports that patients who develop bacterial sepsis and multiple organ dysfunction are at high risk of dying and patients who develop ARDS from trauma or other noninfectious causes have a better prognosis. Navarrete-Navarro et al.38 report that mortality among ARDS patients correlates with the PaO2/FIO2 ratio on the 3rd day of ARDS, the APACHE III score, and the development of multiple system organ failure. Timmons et al.39 studied children with ARDS and found significant differences between survivors and nonsurvivors based on intrapulmonary venous admixture, mean airway pressure, alveolar-arterial oxygen tension difference, oxygenation index, and peak inspiratory pressure. Villar et al.40 built an ARDS mortality prediction model based on tertiles of patient age, plateau airway pressure, and PaO2/FIO2 at the time the patient meets ARDS criteria. Similarly, Spicer et al.41 studied pediatric ARDS patients and determined that oxygenation index and hematopoietic stem cell transplant / cancer history can be used on Day 1 or Day 3 of ARDS to predict hospital mortality without the need for more complex models.

Unlike the current work, the described previous studies utilized only structured patient data, and ICD codes/risk factors, when used, consisted of manually crafted lists.

In a broader context, a large volume of literature on combining structured and free-text EMR data applies Medical Concept detection to the free-text notes for manually curated lists of risk factors and other disease-relevant medical concepts. Ford et al.42 present a review of various approaches to Medical Concept detection from free-text notes for the purpose of detecting cases of a clinical condition, often in conjunction with structured data.

More recently, deep learning has been used to utilize free-text and structured EMR data. Shickel et al.43 present a survey of various deep learning techniques. Miotto et al.44 build a Deep Patient representation in an unsupervised manner via denoising autoencoders; however, similar to previous approaches, they first pre-process the free-text notes by extracting medical concepts with an off-the-shelf tool. Various studies25–27,45 use deep learning techniques to generate low-dimensional representations of diagnosis codes and patients utilizing structured data (diagnosis codes, medications, and procedures). Unlike previous work, we combine free-text and structured EMR data for obtaining low-dimensional patient representations, without the use of medical concept detection.

Conclusion

This work demonstrates the utility of deep learning techniques to summarize a patient’s medical history, risk factors, comorbidities, and current signs and symptoms in the form of Patient Context Vectors. Automatically generated ARDS patient clusters agree with manually curated clinician knowledge and provide additional insight into the complexities and risk factors associated with ARDS. More importantly, Patient Context Vectors, derived from available ICD codes and nursing notes, can be easily combined with structured EMR data to build real-time ARDS CDS tools, with potential to improve patient outcomes and reduce mortality among ARDS patients.

Acknowledgements

Research reported in this publication was supported by an NIH SBIR award to CTA from the National Heart, Lung, and Blood Institute of the National Institutes of Health under award number 1R43HL135909-01A1.


References

  • 1. Máca J, Jor O, Holub M, Sklienka P, Burša F, Burda M, et al. Past and present ARDS mortality rates: a systematic review. Respiratory care. 2017;62(1):113–122. doi: 10.4187/respcare.04716.
  • 2. Bellani G, Laffey JG, Pham T, Fan E, Brochard L, Esteban A, et al. Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries. JAMA. 2016;315(8):788–800. doi: 10.1001/jama.2016.0291.
  • 3. Fan E, Del Sorbo L, Goligher EC, Hodgson CL, Munshi L, Walkey AJ, et al. An official American Thoracic Society/European Society of Intensive Care Medicine/Society of Critical Care Medicine clinical practice guideline: mechanical ventilation in adult patients with acute respiratory distress syndrome. American journal of respiratory and critical care medicine. 2017;195(9):1253–1263. doi: 10.1164/rccm.201703-0548ST.
  • 4. Murray JF, Matthay MA, Luce JM, Flick MR, et al. An expanded definition of the adult respiratory distress syndrome. Am Rev Respir Dis. 1988;138(3):720–723. doi: 10.1164/ajrccm/138.3.720.
  • 5. Gajic O, Dabbagh O, Park PK, Adesanya A, Chang SY, Hou P, et al. Early identification of patients at risk of acute lung injury: evaluation of lung injury prediction score in a multicenter cohort study. American journal of respiratory and critical care medicine. 2011;183(4):462–470. doi: 10.1164/rccm.201004-0549OC.
  • 6. Villar J, Ambrós A, Soler JA, Martínez D, Ferrando C, Solano R, et al. Age, PaO2/FIO2, and Plateau pressure score: a proposal for a Simple Outcome score in patients with the Acute Respiratory distress syndrome. Critical care medicine. 2016;44(7):1361–1369. doi: 10.1097/CCM.0000000000001653.
  • 7. Levitt JE, Calfee CS, Goldstein BA, Vojnik R, Matthay MA. Early acute lung injury: criteria for identifying lung injury prior to the need for positive pressure ventilation. Critical care medicine. 2013;41(8):1929. doi: 10.1097/CCM.0b013e31828a3d99.
  • 8. Xie J, Liu L, Yang Y, Yu W, Li M, Yu K, et al. A modified acute respiratory distress syndrome prediction score: a multicenter cohort study in China. Journal of thoracic disease. 2018;10(10):5764. doi: 10.21037/jtd.2018.09.117.
  • 9. ARDS Definition Task Force, Ranieri V, Rubenfeld G, et al. Acute respiratory distress syndrome. JAMA. 2012;307(23):2526–2533. doi: 10.1001/jama.2012.5669.
  • 10. Shaver CM, Bastarache JA. Clinical and biological heterogeneity in acute respiratory distress syndrome: direct versus indirect lung injury. Clinics in chest medicine. 2014;35(4):639–653. doi: 10.1016/j.ccm.2014.08.004.
  • 11. Sendelbach S, Funk M. Alarm fatigue: a patient safety concern. AACN advanced critical care. 2013;24(4):378–386. doi: 10.1097/NCI.0b013e3182a903f9.
  • 12. Wasserman RC. Electronic medical records (EMRs), epidemiology, and epistemology: reflections on EMRs and future pediatric clinical research. Academic pediatrics. 2011;11(4):280–287. doi: 10.1016/j.acap.2011.02.007.
  • 13. Calfee CS, Delucchi K, Parsons PE, Thompson BT, Ware LB, Matthay MA, et al. Subphenotypes in acute respiratory distress syndrome: latent class analysis of data from two randomised controlled trials. The Lancet Respiratory Medicine. 2014;2(8):611–620. doi: 10.1016/S2213-2600(14)70097-9.
  • 14. Sinha P, Delucchi KL, Thompson BT, McAuley DF, Matthay MA, Calfee CS, et al. Latent class analysis of ARDS subphenotypes: a secondary analysis of the statins for acutely injured lungs from sepsis (SAILS) study. Intensive care medicine. 2018;44(11):1859–1869. doi: 10.1007/s00134-018-5378-3.
  • 15. Zhang Z. Identification of three classes of acute respiratory distress syndrome using latent class analysis. PeerJ. 2018;6:e4592. doi: 10.7717/peerj.4592.
  • 16. Johnson AE, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Scientific data. 2016;3:160035. doi: 10.1038/sdata.2016.35.
  • 17. Bime C, Poongkunran C, Borgstrom M, Natt B, Desai H, Parthasarathy S, et al. Racial Differences in Mortality from Severe Acute Respiratory Failure in the United States, 2008–2012. Annals of the American Thoracic Society. 2016;13(12):2184–2189. doi: 10.1513/AnnalsATS.201605-359OC.
  • 18. Rawal G, Yadav S, Kumar R. Acute respiratory distress syndrome: An update and review. Journal of Translational Internal Medicine. 2016. doi: 10.1515/jtim-2016-0012.
  • 19. Shari G, Kojicic M, Li G, Cartin-Ceba R, Alvarez CT, Kashyap R, et al. Timing of the onset of acute respiratory distress syndrome: a population-based study. Respiratory care. 2011;56(5):576–582. doi: 10.4187/respcare.00901.
  • 20. Cañas AJ, Hill G, Carff R, Suri N, Lott J, Gómez G, et al. CmapTools: A knowledge modeling and sharing environment. 2004.
  • 21. O’Malley KJ, Cook KF, Price MD, Wildes KR, Hurdle JF, Ashton CM. Measuring diagnoses: ICD code accuracy. Health services research. 2005;40(5p2):1620–1639. doi: 10.1111/j.1475-6773.2005.00444.x.
  • 22. Stausberg J, Lehmann N, Kaczmarek D, Stein M. Reliability of diagnoses coding with ICD-10. International journal of medical informatics. 2008;77(1):50–57. doi: 10.1016/j.ijmedinf.2006.11.005.
  • 23. Burles K, Innes G, Senior K, Lang E, McRae A. Limitations of pulmonary embolism ICD-10 codes in emergency department administrative data: let the buyer beware. BMC medical research methodology. 2017;17(1):89. doi: 10.1186/s12874-017-0361-1.
  • 24. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. 2013.
  • 25. Choi Y, Chiu CYI, Sontag D. Learning low-dimensional representations of medical concepts. AMIA Summits on Translational Science Proceedings. 2016;2016:41.
  • 26. Choi E, Bahadori MT, Searles E, Coffey C, Thompson M, Bost J, et al. Multi-layer representation learning for medical concepts. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM; 2016. p. 1495–1504.
  • 27. Kartchner D, Christensen T, Humpherys J, Wade S. Code2Vec: Embedding and Clustering Medical Diagnosis Data. In: 2017 IEEE International Conference on Healthcare Informatics (ICHI). IEEE; 2017. p. 386–390.
  • 28. Bai T, Chanda AK, Egleston BL, Vucetic S. EHR phenotyping via jointly embedding medical concepts and words into a unified vector space. BMC medical informatics and decision making. 2018;18(4):123. doi: 10.1186/s12911-018-0672-0.
  • 29. MacQueen J, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. vol. 1. Oakland, CA, USA; 1967. p. 281–297.
  • 30. Apostolova E, Wang T, Tschampel T, Koutroulis I, Velez T. Combining Structured and Free-text Electronic Medical Record Data for Real-time Clinical Decision Support. To appear in BioNLP. 2019.
  • 31. Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of statistics. 2001. p. 1189–1232.
  • 32. h2o.ai; Accessed: 2019-01-30. https://www.h2o.ai/
  • 33. Frénay B, Verleysen M. Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems. 2013;25(5):845–869. doi: 10.1109/TNNLS.2013.2292894.
  • 34. Rolnick D, Veit A, Belongie S, Shavit N. Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694. 2017.
  • 35. Kreek RA, Apostolova E. Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data. In: Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text; 2018. p. 104–109.
  • 36. Bauman ZM, Gassner MY, Coughlin MA, Mahan M, Watras J. Lung injury prediction score is useful in predicting acute respiratory distress syndrome and mortality in surgical critical care patients. Critical care research and practice. 2015;2015. doi: 10.1155/2015/157408.
  • 37. Hyers T. Prediction of survival and mortality in patients with adult respiratory distress syndrome. New horizons (Baltimore, Md). 1993;1(4):466–470.
  • 38. Navarrete-Navarro P, Ruiz-Bailén M, Rivera-Fernández R, Guerrero-López F, Pola-Gallego-de Guzmán MD, Vázquez-Mata G. Acute respiratory distress syndrome in trauma patients: ICU mortality and prediction factors. Intensive care medicine. 2000;26(11):1624–1629. doi: 10.1007/s001340000683.
  • 39. Timmons OD, Dean JM, Vernon DD. Mortality rates and prognostic variables in children with adult respiratory distress syndrome. The Journal of pediatrics. 1991;119(6):896–899. doi: 10.1016/s0022-3476(05)83039-2.
  • 40. Villar J, Pérez-Méndez L, Basaldúa S, Blanco J, Aguilar G, Toral D, et al. A risk tertiles model for predicting mortality in patients with acute respiratory distress syndrome: age, plateau pressure, and PaO2/FiO2 at ARDS onset can predict mortality. Respiratory care. 2011;56(4):420–428. doi: 10.4187/respcare.00811.
  • 41. Spicer AC, Calfee CS, Zinter MS, Khemani RG, Lo VP, Alkhouli MF, et al. A Simple and Robust Bedside Model for Mortality Risk in Pediatric Patients with ARDS. Pediatric critical care medicine: a journal of the Society of Critical Care Medicine and the World Federation of Pediatric Intensive and Critical Care Societies. 2016;17(10):907. doi: 10.1097/PCC.0000000000000865.
  • 42. Ford E, Carroll JA, Smith HE, Scott D, Cassell JA. Extracting information from the text of electronic medical records to improve case detection: a systematic review. Journal of the American Medical Informatics Association. 2016;23(5):1007–1015. doi: 10.1093/jamia/ocv180.
  • 43. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE journal of biomedical and health informatics. 2018;22(5):1589–1604. doi: 10.1109/JBHI.2017.2767063.
  • 44. Miotto R, Li L, Kidd BA, Dudley JT. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Scientific reports. 2016;6:26094. doi: 10.1038/srep26094.
  • 45. Choi E, Xiao C, Stewart W, Sun J. Mime: Multilevel medical embedding of electronic health records for predictive healthcare. In: Advances in Neural Information Processing Systems. 2018. p. 4547–4557.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association