PLOS One. 2021 Apr 8;16(4):e0249622. doi: 10.1371/journal.pone.0249622

Mining heterogeneous clinical notes by multi-modal latent topic model

Zhi Wen 1,#, Pratheeksha Nair 1,#, Chih-Ying Deng 2, Xing Han Lu 1, Edward Moseley 2, Naomi George 3, Charlotta Lindvall 2,*, Yue Li 1,*
Editor: Ivan Olier
PMCID: PMC8031429  PMID: 33831055

Abstract

Latent knowledge can be extracted from the electronic notes that are recorded during patient encounters with the health system. Using these clinical notes to decipher a patient’s underlying comorbidities, symptom burdens, and treatment courses is an ongoing challenge. Latent topic models are efficient Bayesian methods that can model each patient’s clinical notes as “documents” and the words in the notes as “tokens”. However, standard latent topic models assume that all of the notes follow the same topic distribution, regardless of the type of note or the domain expertise of the author (such as doctors or nurses). We propose a novel application of latent topic modeling, using a multi-note topic model (MNTM) to jointly infer distinct topic distributions for notes of different types. We applied our model to clinical notes from the MIMIC-III dataset to infer distinct topic distributions over the physician and nursing note types. Based on manual assessments made by clinicians, we observed a significant improvement in topic interpretability using MNTM modeling over the baseline single-note topic models that ignore the note types. Moreover, our MNTM model led to a significantly higher prediction accuracy for prolonged mechanical ventilation and mortality using only the first 48 hours of patient data. By correlating the patients’ topic mixtures with hospital mortality and prolonged mechanical ventilation, we identified several diagnostic topics that are associated with poor outcomes. Because of its elegant and intuitive formulation, we envision a broad application of our approach in mining multi-modality text-based healthcare information that goes beyond clinical notes. Code is available at https://github.com/li-lab-mcgill/heterogeneous_ehr.

Introduction

Multitudes of clinical notes are generated within the electronic health records (EHR) for each encounter between a patient and healthcare providers. These notes are written by clinical experts with specialized domain knowledge and include a plethora of rich information not otherwise captured within the EHR’s laboratory, imaging, billing, and administrative documentation. Importantly, there exist overlapping sub-domains of medical knowledge that depend on the particular expertise of the author. Owing to their distinct medical domain knowledge, different note types often involve different clinical vocabularies. In particular, clinical notes authored by physicians may differ considerably in vocabulary and content from those authored by registered nurses. While Latent Dirichlet Allocation (LDA) [1] is a popular approach for extracting meaningful topics from documents, it assumes that all of the documents follow the same topic distributions. We hypothesize that by modeling each note type with a distinct discrete distribution using a multi-modal latent topic model, we can improve the interpretability of the latent topics learned from the notes and generate a more accurate risk stratification of patients.

To this end, we propose a multi-note topic model (MNTM) that jointly infers distinct latent topic distributions corresponding to each distinct note type. As a proof-of-concept, we use the clinical notes from the Medical Information Mart for Intensive Care III (MIMIC-III) data [2] for 17,000 patients in the intensive care unit (ICU). Our goal is to develop an early prediction model of the risk of prolonged mechanical ventilation (PMV) and in-hospital mortality among ICU patients based solely on clinical notes data accrued during the first 48 hours of their ICU admission. Early prediction was selected as the unit of analysis because of its high clinical relevance. PMV and in-hospital mortality were selected because they are the conventional outcomes for early prognostication in the critical care literature [3].

Related methods

Our method of latent topic modeling is distinct from several previous methods [1, 4–6]. While previous investigators have employed latent topic models for mining clinical notes, to the best of our knowledge, none of these methods treat distinct note types differently. Chen et al. (2015) applied LDA directly to EHR data without considering multi-modality [4]. Pivovarov et al. (2015) described a multi-modal LDA that infers topics by data type, where clinical notes are one of four data types (billing codes, laboratory tests, clinical notes, and prescriptions) [5], but it does not distinguish between note types and only works with a fixed set of data types. Li et al. (2020) described a multi-modal topic model called MixEHR that jointly infers distinct topic distributions for each data type while imputing non-missing-at-random laboratory test results [7]. While MixEHR can generalize to any arbitrary data type, it has not been applied to the current problem of multi-note-type modeling. Therefore, we consider our current approach a novel application of the multi-modal topic model.

Methods

Multi-modal latent topic model

We propose a multi-modal latent topic model (Fig 1). Suppose there are K latent disease topics. Each topic k ∈ {1, …, K} under note type t ∈ {1, …, T} represents a distribution over the vocabulary, namely a vector of unknown word frequencies $\boldsymbol{\phi}_k^{(t)} = [\phi_{wk}^{(t)}]_{w=1}^{W^{(t)}}$ for the $W^{(t)}$ distinct words in the type-t vocabulary. We assume that the topic-specific word frequencies $\boldsymbol{\phi}_k^{(t)}$ follow a Dirichlet distribution with unknown hyperparameters $\beta_{wt}$. For each patient j ∈ {1, …, D}, the disease mixture membership $\boldsymbol{\theta}_j$ is generated from a K-dimensional Dirichlet distribution Dir(α) with unknown asymmetric hyperparameters $\alpha_k$. To generate note token i for patient j under note type t, a latent topic $z_{ij}^{(t)}$ is first drawn from the categorical distribution $\boldsymbol{\theta}_j$. Then a clinical feature $x_{ij}^{(t)}$ is drawn from a categorical distribution with parameter $\boldsymbol{\phi}_{z_{ij}^{(t)}}^{(t)}$.

Fig 1. Proposed multi-note latent topic model.


Formally, we first generate global variables for the K topics:

$$\boldsymbol{\phi}_k^{(t)} \sim \mathrm{Dir}(\boldsymbol{\beta}_t): \quad \frac{\Gamma\left(\sum_w \beta_{wt}\right)}{\prod_w \Gamma(\beta_{wt})} \prod_w \left[\phi_{wk}^{(t)}\right]^{\beta_{wt}-1}$$

where t is the note types (e.g., t ∈ {physician note, nursing note}). We then generate local variables for the patient topic mixture:

$$\boldsymbol{\theta}_j \sim \mathrm{Dir}(\boldsymbol{\alpha}): \quad \frac{\Gamma\left(\sum_k \alpha_k\right)}{\prod_k \Gamma(\alpha_k)} \prod_k \theta_{jk}^{\alpha_k-1}$$

Given the topic mixture, we sample a topic for each token in note type t of each patient’s note:

$$z_{ij}^{(t)} \sim \mathrm{Cat}(\boldsymbol{\theta}_j): \quad \prod_k \theta_{jk}^{[z_{ij}^{(t)}=k]}$$

We then sample a word for token i from the topic distribution under topic $k = z_{ij}^{(t)}$:

$$x_{ij}^{(t)} \sim \mathrm{Cat}\left(\boldsymbol{\phi}_k^{(t)}\right): \quad \prod_w \left(\phi_{kw}^{(t)}\right)^{[x_{ij}^{(t)}=w]}$$

Notably, the topic mixture θj is shared across note types and can therefore facilitate “borrowing” information between different note types when learning the topic distribution ϕ(t).
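The generative process above can be sketched in Python. This is a toy simulation; the dimensions, hyperparameter values, and variable names are illustrative, not those used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 4, 10                              # topics, patients
W = {"physician": 30, "nursing": 25}      # vocabulary size per note type
N = {"physician": 50, "nursing": 40}      # tokens per note type per patient
alpha = np.full(K, 0.5)                   # Dirichlet prior on topic mixtures
beta = {t: np.full(w, 0.1) for t, w in W.items()}  # priors on topic-word dists

# Global variables: a topic-word distribution per topic, per note type
phi = {t: rng.dirichlet(beta[t], size=K) for t in W}   # shape (K, W[t])

notes = {}
for j in range(D):
    theta_j = rng.dirichlet(alpha)        # shared across both note types
    for t in W:
        z = rng.choice(K, size=N[t], p=theta_j)        # topic per token
        x = np.array([rng.choice(W[t], p=phi[t][k]) for k in z])
        notes[(j, t)] = x
```

Because `theta_j` is drawn once per patient and reused for every note type, information is shared across types exactly as described above, while each type keeps its own topic-word distributions `phi[t]`.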

To learn the model, we implemented a collapsed variational Bayesian algorithm [8]. Briefly, we first integrate out the Dirichlet variables because they are conjugate to the multinomial distribution of the tokens, which makes the resulting inference much more efficient. We then approximate the expectations by first deriving the conditional distribution for the topic assignments $z_{ij}^{(t)}$ and then approximating their sufficient statistics by the variational parameters:

$$\gamma_{ijk}^{(t)} \propto \left(\alpha_k + \tilde{n}_{\cdot jk}^{-(i,j)}\right) \left(\frac{\beta_{x_{ij}^{(t)} t} + \left[\tilde{n}_{x_{ij}^{(t)} \cdot k}^{(t)}\right]^{-(i,j)}}{\sum_w \left(\beta_{wt} + \left[\tilde{n}_{w \cdot k}^{(t)}\right]^{-(i,j)}\right)}\right) \tag{1}$$

where the superscript $-(i,j)$ indicates the exclusion of token i of patient j’s clinical note, and the sufficient statistics are

$$\tilde{n}_{\cdot jk}^{-(i,j)} = \sum_{t=1}^{T} \sum_{i' \neq i}^{N_j^{(t)}} \gamma_{i'jk}^{(t)} \tag{2}$$

$$\left[\tilde{n}_{w \cdot k}^{(t)}\right]^{-(i,j)} = \sum_{j'=1}^{D} \sum_{i'=1}^{N_{j'}^{(t)}} [x_{i'j'}^{(t)} = w]\,\gamma_{i'j'k}^{(t)} - [x_{ij}^{(t)} = w]\,\gamma_{ijk}^{(t)} \tag{3}$$

The learning algorithm therefore follows a variational Bayes expectation-maximization (EM) algorithm: the E-step infers the $\gamma_{ijk}^{(t)}$’s with Eq (1); the M-step updates the sufficient statistics $\tilde{n}_{\cdot jk}$ and $\tilde{n}_{w \cdot k}^{(t)}$ with Eqs (2) and (3), respectively. The EM update guarantees maximizing the evidence lower bound (ELBO) of the model under the mean-field variational distribution with independent topic assignments (i.e., $q(\mathbf{z}) = \prod_{t,i,j} q(z_{ij}^{(t)} \mid \gamma_{ij}^{(t)})$) [8].
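One sweep of the collapsed variational E-step (Eq 1), with the sufficient statistics of Eqs (2) and (3) kept in sync after each token update, can be sketched as follows. This is a toy implementation under simplifying assumptions not made in the paper: symmetric scalar hyperparameters, dense arrays, and equal note lengths per type.

```python
import numpy as np

def cvb_e_step(gamma, x, alpha, beta):
    """One pass of the collapsed variational E-step (Eq 1). Each token's
    responsibility is recomputed from counts that exclude that token.
    gamma[t]: (D, N_t, K) responsibilities; x[t]: (D, N_t) word indices
    for note type t; alpha, beta: symmetric scalar hyperparameters."""
    # Patient-topic counts pooled over note types (cf. Eq 2)
    n_jk = sum(g.sum(axis=1) for g in gamma.values())          # (D, K)
    # Word-topic counts per note type (cf. Eq 3)
    n_wk = {}
    for t, g in gamma.items():
        W_t = int(x[t].max()) + 1
        onehot = np.eye(W_t)[x[t]]                             # (D, N, W)
        n_wk[t] = np.einsum('dnw,dnk->wk', onehot, g)          # (W, K)
    for t, g in gamma.items():
        W_t = n_wk[t].shape[0]
        D, N, K = g.shape
        for d in range(D):
            for i in range(N):
                old = g[d, i].copy()
                w = x[t][d, i]
                nj = n_jk[d] - old                 # exclude token (i, d)
                nw = n_wk[t][w] - old
                nk = n_wk[t].sum(axis=0) - old
                new = (alpha + nj) * (beta + nw) / (W_t * beta + nk)
                new /= new.sum()                   # normalize over topics
                g[d, i] = new
                n_jk[d] += new - old               # keep counts in sync
                n_wk[t][w] += new - old
    return gamma

rng = np.random.default_rng(0)
x = {"phys": rng.integers(0, 8, size=(3, 5)),
     "nurs": rng.integers(0, 6, size=(3, 4))}
gamma = {t: rng.dirichlet(np.ones(4), size=v.shape) for t, v in x.items()}
gamma = cvb_e_step(gamma, x, alpha=0.5, beta=0.1)
```

Repeating this sweep until the ELBO converges corresponds to the EM iteration described above.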

Upon convergence of ELBO, we infer the respective variational expectations of the patient topic mixture and topics distribution:

$$\hat{\theta}_{jk} = \frac{\alpha_k + \tilde{n}_{\cdot jk}}{\sum_{k'} \left(\alpha_{k'} + \tilde{n}_{\cdot jk'}\right)}, \qquad \hat{\phi}_{wk}^{(t)} = \frac{\beta_{wt} + \tilde{n}_{w \cdot k}^{(t)}}{\sum_{w'} \left(\beta_{w't} + \tilde{n}_{w' \cdot k}^{(t)}\right)}$$

Furthermore, we update the hyper-parameters by maximizing the marginal likelihood under the variational expectations via empirical Bayes fixed-point update [9, 10]:

$$\alpha_k^{*} \leftarrow \frac{a_\alpha - 1 + \alpha_k \sum_j \left[\Psi\left(\alpha_k + \tilde{n}_{\cdot jk}\right) - \Psi(\alpha_k)\right]}{b_\alpha + \sum_j \left[\Psi\left(\sum_k \left(\tilde{n}_{\cdot jk} + \alpha_k\right)\right) - \Psi\left(\sum_k \alpha_k\right)\right]} \tag{4}$$

$$\beta_{wt}^{*} \leftarrow \frac{a_\beta - 1 + \beta_{wt} \left[\sum_k \sum_w \Psi\left(\beta_{wt} + \tilde{n}_{w \cdot k}^{(t)}\right) - K W_t \Psi(\beta_{wt})\right]}{b_\beta + \sum_k \Psi\left(W_t \beta_{wt} + \sum_w \tilde{n}_{w \cdot k}^{(t)}\right) - K \Psi(W_t \beta_{wt})} \tag{5}$$

where Ψ(·) is the digamma function, $W_t$ is the vocabulary size of clinical note type t, and the Gamma parameters are set to fixed values, mainly for numerical stability: $a_\alpha = 1$, $b_\alpha = 0$, $a_\beta = 1$, $b_\beta = 100$.
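The fixed-point update for the topic-mixture hyperparameters (Eq 4) can be sketched as below. This is only a sketch: it assumes the expected patient-topic counts are available as a dense (D, K) matrix (here simulated), and it uses SciPy's digamma.

```python
import numpy as np
from scipy.special import digamma

def update_alpha(alpha, n_jk, a_alpha=1.0, b_alpha=0.0):
    """Empirical-Bayes fixed-point update of the asymmetric Dirichlet
    hyperparameters alpha_k (Eq 4), given expected patient-topic counts
    n_jk of shape (D, K). Gamma-prior constants as fixed in the text."""
    num = a_alpha - 1 + alpha * (digamma(alpha + n_jk)
                                 - digamma(alpha)).sum(axis=0)
    den = b_alpha + (digamma(n_jk.sum(axis=1) + alpha.sum())
                     - digamma(alpha.sum())).sum()
    return num / den

rng = np.random.default_rng(0)
n_jk = rng.gamma(2.0, 5.0, size=(100, 4))   # toy expected counts, 100 x 4
alpha = np.full(4, 0.5)
for _ in range(50):                         # iterate to a fixed point
    alpha = update_alpha(alpha, n_jk)
```

Because the counts and the digamma differences are positive, the update keeps every $\alpha_k$ strictly positive; the update for $\beta_{wt}$ (Eq 5) follows the same fixed-point pattern.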

MIMIC-III note processing

From the entire cohort (all patients admitted to the ICU), we selected a subset, which we call the day-2 cohort. This subset includes the notes of patients who were mechanically ventilated for at least two consecutive days. We used the entire cohort, excluding the day-2 cohort, to train our unsupervised topic model, and then used the trained topic model to infer topic mixtures for the notes in the day-2 cohort, which were used for mechanical ventilation prediction.

For both cohorts, we performed a standard text preprocessing procedure that included converting letters to lower case and removing punctuation, white space, stop words provided by the Natural Language Toolkit library (https://www.nltk.org/), and words that appeared in fewer than 5 notes or in more than 15% of notes. After preprocessing, each note had around 300 words on average. The vocabulary for physician notes contained 8,948 words and the vocabulary for nursing notes contained 8,076 words. In our study, the notes of an admission, rather than of a patient, were grouped together as one document and were therefore assumed to have one topic composition. While notes written during different admissions may focus on different topics, it is reasonable to assume that notes within a single admission, including those written by different professionals, mostly share the same topics.
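The preprocessing steps above can be sketched as follows. This is a minimal re-implementation, not the paper's code: a regex stands in for a proper tokenizer, and the stop-word list (NLTK's in the paper) is passed in by the caller.

```python
import re
from collections import Counter

def preprocess(notes, stop_words, min_docs=5, max_frac=0.15):
    """Lowercase, strip punctuation, drop stop words, then drop words
    occurring in fewer than `min_docs` notes or in more than `max_frac`
    of notes, as described in the text."""
    tokenized = [
        [w for w in re.sub(r"[^\w\s]", " ", note.lower()).split()
         if w not in stop_words]
        for note in notes
    ]
    # Document frequency: number of notes each word appears in
    doc_freq = Counter(w for toks in tokenized for w in set(toks))
    n_docs = len(tokenized)
    keep = {w for w, c in doc_freq.items()
            if c >= min_docs and c <= max_frac * n_docs}
    return [[w for w in toks if w in keep] for toks in tokenized]

# Tiny example with relaxed thresholds (the paper uses 5 and 15%)
notes = ["Pt. intubated; sedated.", "Pt. extubated today.", "Family meeting held."]
clean = preprocess(notes, stop_words={"today"}, min_docs=2, max_frac=1.0)
```

With these toy thresholds, only "pt" survives the document-frequency filter, illustrating how both rare and ubiquitous words are pruned from the vocabulary.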

For the single-note-type model, we processed the notes in two different ways: (1) the same word from different note types was assigned the same word ID, and its frequency was the total sum over all note types (referred to as “single-note-type (same word)”); (2) the same word from different note types was assigned different word IDs, and its frequencies were computed separately (referred to as “single-note-type (diff. word)”). For example, the word ‘heartbeat’ may occur in both a physician’s note and a nursing note but is represented separately (as ‘physician-heartbeat’ and ‘nurse-heartbeat’). For the proposed multi-note model, we differentiated such words by assigning different note types to them.
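The three vocabulary schemes can be contrasted with a small sketch. The helper below is hypothetical (not from the paper's released code); it maps one admission's notes to (word ID, note type) pairs.

```python
def tokenize_admission(admission_notes, scheme):
    """Illustrate the three vocabulary schemes compared above.
    admission_notes: {note type: list of words} for one admission."""
    if scheme == "same_word":      # shared word IDs, counts pooled
        return [(w, None) for t, ws in admission_notes.items() for w in ws]
    if scheme == "diff_word":      # type-prefixed word IDs, one modality
        return [(f"{t}-{w}", None)
                for t, ws in admission_notes.items() for w in ws]
    if scheme == "multi_note":     # shared word IDs, counted per note type
        return [(w, t) for t, ws in admission_notes.items() for w in ws]
    raise ValueError(f"unknown scheme: {scheme}")

adm = {"physician": ["heartbeat"], "nurse": ["heartbeat"]}
```

Under "same_word" the two occurrences of 'heartbeat' are indistinguishable; under "diff_word" they become two unrelated vocabulary entries; under "multi_note" they share a word ID but are counted under separate note-type modalities, which is what lets the multi-note model share the vocabulary while keeping type-specific topic-word distributions.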

We evaluated our model’s predictive performance by 5-fold cross-validation. Prolonged mechanical ventilation was defined as ≥ 7 days because this time period represents a major clinical decision branch in a patient’s care [11–13].

Qualitative evaluation

We performed a qualitative evaluation of the topic cohesiveness. Topic cohesiveness was defined a priori as “relatedness of each term within the topic to a central disease process or health state”. Cohesiveness was measured by a blinded physician using a 5-point scale. A second blinded physician with content expertise in critical care medicine reviewed the word clouds of each model in aggregate and provided a determination of the relative cohesiveness of the two models.

Results

Multi-note model improves PMV and mortality prediction

In each validation fold, we trained both single-note models (single-note-type (same words) and single-note-type (diff. words)) and the multi-note model on the training set, followed by a logistic regression model, fit on the same training set, to predict the binary PMV outcome. We experimented with 10, 30, 50, and 100 topics by measuring perplexity on held-out documents, and used the best-performing setting of 50 topics for each of the three topic models going forward.

We then predicted the binary PMV outcome on the validation set (Fig 2). We observed consistent improvement in terms of area under the receiver operating characteristic (ROC) curve (AUROC: 66.8% for multi-note-type, 66.0% for single-note-type (diff. words), 60.7% for single-note-type (same words)) and area under the precision-recall curve (AUPRC: 40.8% for multi-note-type, 39.2% for single-note-type (diff. words), 33.9% for single-note-type (same words)). In particular, the multi-note-type model achieved an AUROC of 0.668 with a standard deviation (std) of 0.008. Hence, the 95% confidence interval (CI) was $[0.668 - 1.96 \times 0.008/\sqrt{10},\ 0.668 + 1.96 \times 0.008/\sqrt{10}] = [0.6630, 0.6730]$. The best single-note-type model (diff. words) achieved on average 0.660 ± 0.008 std (i.e., a 95% CI of [0.6550, 0.6650]). Therefore, the AUROC of the multi-note model was higher than that of the best single-note model, but the difference was not statistically significant at the 95% level. However, AUROC tends to be insensitive to unbalanced data, so we also examined AUPRC. In terms of AUPRC, the multi-note model achieved on average 0.408 ± 0.007 (std), while the best single-note model achieved on average 0.392 ± 0.008 (std); the 95% confidence intervals were [0.404, 0.412] and [0.387, 0.397], respectively. This shows that the AUPRC of the multi-note model was significantly higher than that of the best single-note model at the 95% level.
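The confidence intervals above follow the normal approximation mean ± 1.96 · std/√n over the n = 10 random splits, which can be reproduced directly:

```python
import math

def ci95(mean, std, n_splits=10):
    """Normal-approximation 95% CI for a cross-validated metric:
    mean +/- 1.96 * std / sqrt(n_splits), as used in the text."""
    half = 1.96 * std / math.sqrt(n_splits)
    return (round(mean - half, 3), round(mean + half, 3))

print(ci95(0.668, 0.008))   # multi-note AUROC -> (0.663, 0.673)
print(ci95(0.660, 0.008))   # best single-note AUROC -> (0.655, 0.665)
```

The intervals overlap for AUROC but not for AUPRC ([0.404, 0.412] vs. [0.387, 0.397]), which is why only the AUPRC improvement is reported as statistically significant.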

Fig 2. ROC and precision-recall curve for binary PMV prediction.


We trained the two single-note topic models and the multi-note topic model on the first 48 hours of clinical notes for each patient. We then trained a separate logistic regression classifier that took the patient-note topic mixture as input and predicted whether the patient would stay on MV for more than 7 days. The trained topic models and logistic classifiers were then applied to the test patients to predict PMV. Prediction accuracy was evaluated by ROC and precision-recall curves. The figure inset shows the AUROC and AUPRC values for each model, with the standard deviations across 10 random splits in parentheses.

To further illustrate the benefits of modeling multiple note types, we applied our approach to mortality prediction. Here we used the first 48 hours of nursing and physician notes to predict in-hospital mortality. As in the PMV application, we trained a 50-topic model for each approach and used the topic mixture memberships as input to a logistic regression classifier for predicting mortality. We performed 5-fold CV to evaluate each method: each fold included 1,560 admissions for evaluation, and the remaining four folds, totaling 6,233 admissions, were used to train each topic model. We found that the multi-note model performed slightly better than the single-note models, as measured by AUROC and AUPRC (S2 Fig in S1 File). On mortality prediction, the multi-note model achieved on average 0.861 ± 0.004 (std) in terms of AUROC, with a 95% CI of [0.859, 0.863]. The best single-note (same-word) model achieved on average 0.845 ± 0.004 (std) with a 95% CI of [0.843, 0.847]. In terms of AUPRC, the multi-note model achieved on average 0.419 ± 0.011 (std) with a 95% CI of [0.412, 0.426], while the single-note model achieved on average 0.404 ± 0.008 (std) with a 95% CI of [0.399, 0.409]. These results indicate that both the AUPRC and AUROC of the multi-note model are significantly higher than those of the best single-note model at the 95% level.

By construction, the single-type (diff. word) model operates over a vocabulary roughly twice as large as that of the single-type (same word) model, because the same word coming from the two note types is treated as two different words. The multi-type model, on the other hand, operates on the same vocabulary as single-type (same word) but counts the same word differently depending on its note type. Therefore, to compare more fairly by controlling for the effective “vocabulary size” (the number of unique words seen by the models), we focused our subsequent analysis on the comparison between the multi-note-type model and the single-note-type (diff. words) model. For ease of reference, we rename the single-note-type (diff. words) model simply the single-note model. We focus our analysis on PMV henceforth as it is less explored than mortality.

Evaluating the topic interpretability of single-note and multi-note topic models

To evaluate the interpretability of the single-note versus the multi-note topic model, we generated a word cloud representing each of the 50 topics in both models (i.e., 100 word clouds in total). Each topic’s word cloud comprised the top 100 words within the topic, based on the inferred word probabilities under that topic (S1 Fig in S1 File and Fig 3).

Fig 3. Word clouds of the 50 topics from multi-note model.


Red indicates the words written by physicians and black indicates the words written by nurses.

For the single-note model, the most common topic themes were “mixed” topics, followed by topics pertaining to cardiology, gastroenterology, neurology, and respiratory issues. The most common topic themes for the multi-note model were those pertaining to cardiology, gastroenterology, respiratory issues, and neurology. The topics generated by the multi-note model were significantly more cohesive than those generated by the single-note model. In the multi-note model, most word clouds comprised words, phrases, or abbreviations that tracked closely with that topic’s theme. By comparison, the topics extracted by the single-note model contained a greater number of noisy, unrelated words. For example, the single-note model generated a topic themed “hematological” in which ‘pillow’ was the most common word, and a topic themed “stroke” in which ‘adenoca’ was a common word.

In addition, we sought an unbiased quantitative evaluation of the topic interpretability. We asked a physician to manually review the general medical cohesiveness of each word cloud in the single-note and multi-note models and rate it from 1 (poor; irrelevant) to 5 (excellent; sticks to one common disease topic).

Quantitatively, the average interpretability score was 3.46 (± 1.15 standard deviation (std)) for the single-note model and 4.22 (± 1.15 std) for the multi-note model (Fig 4). We conducted a two-sided t-test between the physician ratings of the multi-note topic model and the single-note topic model (i.e., the standard LDA model) in R and obtained a p-value of 0.001298. This indicates that the difference between the two models in terms of physician ratings is statistically significant. This trend was further confirmed by the content expert reviewer. The details of the topic diseases and cohesiveness scores are listed in S1 and S2 Tables in S1 File.
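The paper's t-test was run in R; an equivalent two-sided test in Python is sketched below. The rating vectors here are simulated stand-ins matching the reported means and std (the real per-topic scores are in S1 and S2 Tables), so the resulting p-value is illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated stand-ins for the 50 per-topic physician ratings of each model
single = np.clip(rng.normal(3.46, 1.15, size=50), 1, 5)
multi = np.clip(rng.normal(4.22, 1.15, size=50), 1, 5)

t_stat, p_value = stats.ttest_ind(multi, single)   # two-sided by default
```

With 50 ratings per model and a mean difference of about 0.76 against a std of 1.15, such a test is well powered to detect the reported gap.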

Fig 4. Topic scores over 50 latent topics inferred by single-note and multi-note latent topic models from the 17,000 clinical notes.


The horizontal line in each box represents the median, and the box spans the interquartile range (25th to 75th percentile) of the data.

Correlating topics with mechanical ventilation duration

To gain further insights from the 50 learned topics, we inferred the 50-topic patient mixture memberships using the trained topic model. We then correlated each patient’s 50-topic mixture with the patient’s total mechanical ventilation (MV) duration, using only notes recorded within 48 hours of their ICU admission (Fig 5, top panels). We chose Pearson’s correlation coefficient because it is a normalized metric whose magnitude reflects the strength of linear correlation, in the range of -1 to 1, and because the range restriction of the variables has no impact on the correlation. We also tried Spearman’s and Kendall’s correlation coefficients and observed similar results. We visualized the top 3 most positively correlated topics and the top 3 most negatively correlated topics for the single-note model and the multi-note model (Fig 5, bottom panels). The multi-note model clearly revealed more meaningful topics related to MV duration. For example, the topic most correlated with MV duration in the multi-note model was associated with septic shock, followed by pneumonia. In contrast, the topic most correlated with MV duration in the single-note model was associated with ‘javascript system error’ along with some discrete and irrelevant terms and concepts. The most negatively correlated topic was chronic obstructive pulmonary disease (COPD) with acute exacerbation (AE) in the multi-note model, and liver transplant with some sparse and unrelated terms in the single-note model.
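The correlation analysis above amounts to a Pearson correlation between each column of the patient-topic mixture matrix and the MV duration vector, which can be sketched on toy data (the mixtures and durations here are simulated, not MIMIC-III values):

```python
import numpy as np

def topic_outcome_correlations(theta, outcome):
    """Pearson correlation between each topic's mixture proportion
    (theta: D patients x K topics) and a length-D outcome vector."""
    theta_c = theta - theta.mean(axis=0)        # center each topic column
    y_c = outcome - outcome.mean()
    return (theta_c * y_c[:, None]).sum(axis=0) / np.sqrt(
        (theta_c ** 2).sum(axis=0) * (y_c ** 2).sum())

rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(50), size=200)    # toy: 200 patients x 50 topics
y = rng.gamma(2.0, 3.0, size=200)               # toy MV durations in days
r = topic_outcome_correlations(theta, y)
top3 = np.argsort(r)[-3:][::-1]                 # most positively correlated
```

Ranking the topics by `r` and plotting the top and bottom three reproduces the kind of bar plots shown in Fig 5.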

Fig 5. Topic correlations with the MV duration.


We correlated the topic mixture with the MV duration and displayed the correlations as bar plots. We then visualized the top words for the topics that are most positively and negatively correlated with PMV using (a) the single-note model and (b) the multi-note model. Only the 3 most positively and 3 most negatively correlated topics are shown for each model.

Discussion

Different types of medical specialists, such as physicians and nurses, hold distinct domains of medical knowledge. These differences are reflected in the language and terms that populate clinical notes. Existing latent topic models treat notes authored by different types of medical specialists identically, by assuming all notes follow a homogeneous topic distribution. To the best of our knowledge, we are the first group to propose a model whose analysis depends on the author type of the notes. Our simple and elegant multi-modal topic model showed the advantage of inferring distinct distributions of latent topics between physician and nursing notes. We demonstrated that the proposed multi-note model extracts more meaningful topics and improves the interpretability of the knowledge learned from the notes compared to the single-note model. We also showed that our model confers slightly, but statistically significantly, more accurate prediction of MV duration, a highly relevant clinical question among specialists caring for patients in critical condition.

As future work, we will explore supervised topic models [14] to learn the topics and the predictions simultaneously. There are also more flexible neural network language models, such as ClinicalBERT, that can learn more abstract representations [15, 16]; we will compare our simpler topic model with ClinicalBERT. Moreover, we will explore a powerful combination of a recurrent neural network and a topic model (TopicRNN) [17], which learns both the global context with the topic model and the local context with the RNN. An application using an analogous idea, predicting readmission of ICU patients from billing codes, has also shown promising results [18]. Lastly, our method is not limited to the healthcare domain. For example, we can model documents written in different languages, or book reviews by literary scholars from different fields. Together, we envision that our current model can succeed in many application domains where knowledge is manifested as free-form text in natural language drawn from diverse domains of expertise.

Supporting information

S1 File

(PDF)

Data Availability

All relevant data are available on Github: https://github.com/li-lab-mcgill/heterogeneous_ehr.

Funding Statement

a) YL is supported by Natural Sciences and Engineering Research Council (NSERC) Discovery Grant (RGPIN-2019-0621), Fonds de recherche Nature et technologies (FRQNT) New Career (NC-268592), and Microsoft Research. b) The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. c) No author receives salary from any of the above funders.

References

  • 1. Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. The Journal of Machine Learning Research. 2003;3:993–1022. [Google Scholar]
  • 2. Johnson AEW, Pollard TJ, Shen L, Lehman LwH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Scientific Data. 2016;3:160035–160039. 10.1038/sdata.2016.35 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Vincent JL, Moreno R. Clinical review: scoring systems in the critically ill. Critical care. 2010;14(2):207. 10.1186/cc8204 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Chen Y, Ghosh J, Bejan CA, Gunter CA, Gupta S, Kho A, et al. Building bridges across electronic health record systems through inferred phenotypic topics. JOURNAL OF BIOMEDICAL INFORMATICS. 2015;55(C):82–93. 10.1016/j.jbi.2015.03.011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Pivovarov R, Perotte AJ, Grave E, Angiolillo J, Wiggins CH, Elhadad N. Learning probabilistic phenotypes from heterogeneous EHR data. J Biomed Inform. 2015;58(C):156–165. 10.1016/j.jbi.2015.10.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6. Li Y, Kellis M. A latent topic model for mining heterogenous non-randomly missing electronic health records data. arXiv. 2018.
  • 7. Li Y, Nair P, Lu XH, Wen Z, Wang Y, Dehaghi AAK, et al. Inferring multimodal latent topics from electronic health records. Nature Communications. 2020;11(1):1–17. 10.1038/s41467-020-16378-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Teh YW, Newman D, Welling M. A collapsed variational Bayesian inference algorithm for latent Dirichlet allocation. Advances in neural information processing systems. 2006.
  • 9.Minka T. Estimating a Dirichlet distribution. Technical Report. 2000.
  • 10.Asuncion A, Welling M, Smyth P, Teh YW. On Smoothing and Inference for Topic Models. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. UAI’09. Arlington, Virginia, United States: AUAI Press; 2009. p. 27–34. Available from: http://dl.acm.org/citation.cfm?id=1795114.1795118.
  • 11. Griffiths J, Barber VS, Morgan L, Young JD. Systematic review and meta-analysis of studies of the timing of tracheostomy in adult patients undergoing artificial ventilation. Bmj. 2005;330(7502):1243. 10.1136/bmj.38467.485671.E0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Pappalardo F, Franco A, Landoni G, Cardano P, Zangrillo A, Alfieri O. Long-term outcome and quality of life of patients requiring prolonged mechanical ventilation after cardiac surgery. European journal of cardio-thoracic surgery. 2004;25(4):548–552. 10.1016/j.ejcts.2003.11.034 [DOI] [PubMed] [Google Scholar]
  • 13. Boles J, Bion J, Connors A, Herridge M, B M, C M. Weaning from mechanical ventilation. Eur Respir J. 2007;29:1033–1056. 10.1183/09031936.00010206 [DOI] [PubMed] [Google Scholar]
  • 14.McAuliffe JD, Blei DM. Supervised Topic Models. In: Platt JC, Koller D, Singer Y, Roweis ST, editors. Advances in Neural Information Processing Systems 20. Curran Associates, Inc.; 2008. p. 121–128.
  • 15. Huang K, Altosaar J, Ranganath R. ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. arXiv. 2019.
  • 16.Alsentzer E, Murphy JR, Boag W, Weng W, Jin D, Naumann T, et al. Publicly Available Clinical BERT Embeddings. CoRR. 2019;abs/1904.03323.
  • 17.Dieng AB, Wang C, Gao J, Paisley JW. TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency. CoRR. 2016;abs/1611.01702.
  • 18. Xiao C, Ma T, Dieng AB, Blei DM, Wang F. Readmission prediction via deep contextual embedding of clinical concepts. PloS one. 2018;13(4):e0195024–15. 10.1371/journal.pone.0195024 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Ivan Olier

22 Sep 2020

PONE-D-20-09700

Mining heterogeneous clinical notes by multi-modal latent topic model

PLOS ONE

Dear Dr. Li,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 06 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Ivan Olier, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure:

 [The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.].

At this time, please address the following queries:

  1. Please clarify the sources of funding (financial or material support) for your study. List the grants or organizations that supported your study, including funding received from your institution.

  2. State what role the funders took in the study. If the funders had no role in your study, please state: “The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

  3. If any authors received a salary from any of your funders, please state which authors and which funders.

  4. If you did not receive any funding for this study, please state: “The authors received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have proposed an interesting study on text modelling for electronic health records. In particular, the authors conceived that different note types should be modelled differently within a single framework, and they have proposed a modelling solution accordingly. Overall, the idea is well motivated and presented in a logical manner. In particular, I would like to highlight that the authors sought physicians' involvement in the proposed study. Such experiments reflect the authors' efforts and the study's clinical relevance. I have the following minor comments:

1. It could be nice if the authors could show other methods for comparisons.

2. The time complexity analysis or running time could be provided.

3. The source code could be released.

Reviewer #2: # Review of Manuscript PONE-D-20-09700

## Mining heterogeneous clinical notes by multi-modal latent topic model

## Summary

I appreciated the opportunity to read this interesting paper. The authors propose an extension of the seminal Latent Dirichlet Allocation topic model of Blei et al. (2003) such that documents (i.e., medical notes) about a given patient written by different authors can be accommodated. This is accomplished by introducing multiple topic-word probability distribution matrices that are author-specific. They describe a variational Bayes estimation procedure to approximate the posterior distribution of the model parameters. They evaluated generalizability (i.e., prediction quality) using cross-validation on a subset of the MIMIC-III dataset (Johnson et al., 2016). Results suggest marginal or no improvement in prediction quality over two alternative LDA models. Two facets of topic interpretability from the three models were evaluated by two external raters with relevant expertise. Overall, the proposed model is a useful and extensible generalization of LDA that could be applied in a variety of fields beyond medicine.

## Major Issues

+ Page 3, 4: All notes for a given patient are assumed to have the same topic proportion vector $\theta_j$. This seems to be a very strong and unrealistic assumption since each note may quite naturally contain different topics or different compositions of topics, a much more plausible scenario that the present model does not accommodate. For example, a nurse may write a note that focuses on a different subset of the K topics than a doctor.

+ Page 4: The notation in the variational algorithm equations is inconsistent and symbols and subscripts are not always clearly defined. Please carefully define the notation of the different counts $n$. In particular, Equation (5) introduces new notation that is never defined, specifically $\Psi$ and $W_t$.

+ While an estimation algorithm for the proposed model is introduced, its performance is not evaluated. Without a simulation study to evaluate the quality of the point estimates obtained from the variational algorithm, it is impossible to know (a) if it can correctly recover the parameters of the proposed model (b) the necessary data requirements (e.g., number of documents, length of documents) and (c) the impact of potential complicating factors on model recovery (e.g., vocabulary size, the number of note types, impact of missing note types for some participants). Many applications of topic models in social science and medicine apply these models to much smaller data sets, particularly, short documents and a small number of documents where the models can break down. It would be valuable for both further methodological development and good practice in application to study the statistical performance of this model systematically.

+ Page 5, 6: What predictors were used in the logistic regression models? The topic proportion estimates $\hat{\theta}_j$?

+ Page 6: Why was the number of topics set to 50? Was this arbitrary or was this chosen by cross-validation on training data? The use of 50 topics could very likely contribute to overfitting especially given the large number of additional parameters being estimated in the proposed model. Since there is very little noticeable improvement in prediction quality, the data simply may not support a 50-topic model.

+ Page 6: The ROCs in Figure S2 do not support the claim that the "multi-note model achieved superior performance". The difference in AUROC is 86% vs. 85% and is presented without uncertainty estimates. Accounting for uncertainty, I would conjecture that the models are equivalent in prediction on this problem. This model does not, of course, have to be a major predictive breakthrough as its potential improvements to topic interpretability are interesting. However, I encourage the authors to (a) use less biased language when comparing the models' performance and (b) provide confidence intervals for the AUROC estimates throughout the paper.

+ Page 6, 7: Please define "cohesiveness". The claim that the "topics generated by the multi-note model had significantly more cohesiveness" is not supported unless I am missing something. This appears to be an entirely subjective claim.

+ Page 9: The claims in the last two sentences are overly strong given the evidence provided. It is not clear that the proposed model necessarily provides more meaningful topics (see comments elsewhere on this point). There is certainly limited or no evidence on this data set that the proposed model provides more accurate prediction of MV duration. As noted elsewhere, the differences in AUROC between the different models are negligible. Of course, this model *may* provide better predictive performance on other data sets. Indeed, as the authors hint at on p. 10, because the topics are not linked in the model to the outcome of interest as in, for example, supervised topic models (Blei & McAuliffe, 2008), there is no reason for the topic proportion estimates to be related at all to the outcome.

+ Please provide code to reproduce the analyses described throughout the paper.

## Minor Issues

+ Figure 1, p. 3: $z_{ijk}$ and $x_{ij}$ should have superscripts $(t)$ to differentiate the different notes for patient $j$. These superscripts appear in the distributional specification of both random variables on page 3.

+ Page 3, 4: References to the multinomial distribution should be replaced with the **categorical** distribution. This is also consistent with the distribution definitions for $z_{ij}^{(t)}$ and $x_{ij}^{(t)}$, which lack the normalization constants of a multinomial distribution but are consistent with the categorical distribution.

+ Page 5. Summary statistics of the document/note text should be provided. How long are these notes? How large is the vocabulary of each type of note?

+ Page 5. Was the impact of the pre-processing choices on model performance evaluated at all?

+ Page 5, 6: The outcome being predicted in the empirical application was dichotomized from a count variable (number of days), which can often reduce the information available and is not generally recommended. Please justify this decision or, perhaps better, use the original outcome.

+ Page 6 and elsewhere: Language used to compare the proposed model to the comparison models is overly optimistic and potentially misleading. A difference in AUROC of 66% vs. 65% is very small. Indeed, confidence intervals for the AUROC should be provided for all models as I suspect that they will indicate substantial uncertainty.

+ Page 7: What do the numbers in parentheses for the average interpretability scores represent? Standard deviations? Confidence limits? On a related note, if the ratings are between 1 and 5, a rating of 4.22 + 1.15 = 5.37 is impossible. Finally, a direct test comparing the interpretability ratings of the rater(s) would be useful to avoid overinterpreting what appear to be quite noisy estimates of interpretability. Accounting for uncertainty, there may be no statistically significant difference in interpretability between the two models.

+ Page 8: What measure of correlation was used? Pearson's correlation (the usual default) is inappropriate for correlating mixture proportion estimates with a duration variable because of the range restriction on the mixture proportion estimates. A better option would be Spearman's or Kendall's correlation coefficients.
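The reviewer's point can be illustrated with a small Python sketch (scipy assumed; `theta` and `duration` are made-up stand-ins for a topic proportion and an MV-duration outcome, not the paper's data): rank-based coefficients are invariant to the monotone-but-nonlinear link that a range-restricted mixture proportion typically has with a duration variable.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical stand-ins: a topic proportion bounded in [0, 1] and a
# duration-like outcome related to it monotonically but nonlinearly.
theta = rng.uniform(0, 1, size=200)
duration = np.exp(3 * theta) + rng.normal(0, 1, size=200)

rho, p_rho = stats.spearmanr(theta, duration)   # rank-based
tau, p_tau = stats.kendalltau(theta, duration)  # rank-based
r, p_r = stats.pearsonr(theta, duration)        # linear; range-sensitive
print(f"Spearman rho={rho:.3f}, Kendall tau={tau:.3f}, Pearson r={r:.3f}")
```

Both rank coefficients recover the strong monotone association regardless of the exponential link, which is the reviewer's rationale for preferring them here.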

+ Figure 4, Page 9: Please describe the components of this boxplot, as different graphing software uses different conventions. Are the whiskers quartiles? Confidence limits? What do the bounds of the box indicate? What do the points indicate, and are they overplotted (jitter the points if so)?

## Typos

+ Page 4: After Equation (3), please remove the fragment "The hyperparameters $\alpha_k$ and $\beta_{tk}$".

Reviewer #3: The authors introduced a novel hierarchical topic model method for clinical notes retrieved from Harvard teaching hospitals. The method and its interpretation are interesting from both machine learning/NLP and clinical informatics perspectives. I have the following comments:

1. The topic modelling method, by its nature, belongs to unsupervised distributional embedding learning, a traditional and widely used branch of text embedding. In recent years, the clinical NLP community has been dominated by "BERTologists" (the group of researchers focusing on transformer-based models). Could the authors clarify the different use cases and advantages of the method presented in this manuscript compared with transformer-based models?

2. One interesting feature of this method is topic discovery and the interpretability of the model. I would suggest the authors reflect these points in the abstract and even the title for readers without much machine learning background.

3. When predicting mortality, the models presented in this manuscript have AUROC < 0.70. As the MIMIC-III dataset is from the ICU and many publications claim AUROC > 0.80, it is necessary for the authors to justify this relatively low performance. If the focus is not performance, that should be clearly stated early in the text.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Ivan Olier

20 Jan 2021

PONE-D-20-09700R1

Mining heterogeneous clinical notes by multi-modal latent topic model

PLOS ONE

Dear Dr. Li,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Mar 06 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Ivan Olier, Ph.D.

Academic Editor

PLOS ONE

[Note: HTML markup is below. Please do not edit.]

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed my comments satisfactorily. In particular, the authors have collaborated with the medical side which is very nice.

Reviewer #2: ## Summary

I thank the authors for addressing most of the questions and comments I had from the first version of their manuscript. In particular, I applaud them for conducting and reporting a simulation study to evaluate the performance of their estimation algorithm. This provides valuable information about the proposed algorithm's ability to successfully estimate the proposed model, particularly for data sets with similar characteristics. I also thank the authors for providing access to the code for the proposed algorithm and its application via Github. I have several remaining points for the authors to consider, which I list below.

## Interpretability

+ Page 7, Line 159-160: As I mentioned in my review of the original manuscript, a direct test comparing the interpretability rating of the rater(s) would be useful to directly test the difference in interpretability ratings between the topics from the proposed model and the single-note model. I did so using paired t-tests (under a range of assumed correlations since I don't have the original scores) and find that the difference is statistically significant across different potential correlations, so I agree that there is improvement in physician rating of interpretability. My main suggestion here is to report such a test directly in addition to the means and standard deviations to provide clear evidence of the claim that is highlighted in both the abstract and in the Discussion (Line 190-193) that interpretability may be better using the multimodal topic model instead of the standard LDA model in this example.

## Prediction

+ Page 5, Line 131-132: As I mentioned in my review of the original manuscript, the ROCs in Figure 2 and in Figure S2 do not support the claim here that the "multi-note model performed better compared to single-note models". For example, the difference in AUROC from Figure S2 when predicting whether a patient will be on a mechanical ventilator for more than 7 days or not is 86.0% vs. 85.2%, an arguably negligible difference, which is presented without uncertainty estimates such as the standard deviation across the 5-fold cross-validation. Accounting for uncertainty, I would conjecture that the models are equivalent in prediction on this problem. Again, I encourage the authors to provide confidence intervals for the AUROC estimates throughout the paper (e.g., via bootstrapping as in the pROC R software package; Robin et al., 2011, doi: 10.1186/1471-2105-12-77).

+ As I mentioned in my original review and in the first bullet point above, claims in the results sections and discussion (e.g., Lines 130-132, 193-194) are not supported by the evidence provided: The authors claim that the proposed model "performed better compared to single-note models, as measured by AUROC and AUPRC" and "our model confers more accurate prediction of duration of MV". However, the AUROC and AUPRC values are, as I mentioned above, quite similar. There is no strong statistical evidence (unless confidence intervals for each AUROC/AUPRC value per model and for the difference in AUROC/AUPRC between models are provided) that prediction performance differed between the two models. I would echo comments from other reviewers that even if these AUROC values differ significantly from a purely statistical inference perspective, the very small magnitude of such a difference needs to be defended. Alternatively, the authors could omit claims that their model's predictive performance is better than single-note models and focus on the improved interpretability of the topics extracted by their model. To clarify, I do not believe that they need to show better predictive performance for this paper to be a valuable contribution to the literature. However, I do not want their claims in the results and discussion to potentially mislead a casual reader.

## Errata

While the overall conclusions and results in the paper are clear, the paper would benefit from careful proofreading for minor typographical and grammatical errors (e.g., definite and indefinite articles such as "a" and "the" are omitted throughout the paper) before publication.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 2

Ivan Olier

23 Mar 2021

Mining heterogeneous clinical notes by multi-modal latent topic model

PONE-D-20-09700R2

Dear Dr. Li,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Ivan Olier, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed my comments already. I don't have any further comment. Thank you very much.

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Ivan Olier

29 Mar 2021

PONE-D-20-09700R2

Mining heterogeneous clinical notes by multi-modal latent topic model

Dear Dr. Li:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Ivan Olier

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Attachment

    Submitted filename: Response to Reviewers.pdf

    Data Availability Statement

    All relevant data are available on Github: https://github.com/li-lab-mcgill/heterogeneous_ehr.

