2022 Sep 14;13:946387. doi: 10.3389/fpsyt.2022.946387

TABLE 7.

Comparison of NLP performance for classification from clinicians’ notes.

| Author | Task | NLP model | Embeddings and Corpus | AUC |
|---|---|---|---|---|
| Xia et al. (77) | Identify patients with complex neurological disorder | cTAKES-based | Train corpus: ∼600 clinical notes; Test corpus: ∼500 clinical notes | 0.96 |
| Wissel et al. (104) | Identify candidates for epilepsy surgery | Multiple linear regression | Train corpus: ∼1,100 clinical notes; Test corpus: ∼8,340 clinical notes | 0.9 |
| Heo et al. (85) | Prediction of stroke outcomes | CNN + LSTM + Multilayer perceptron | Train corpus: ∼1,300 clinical notes; Test corpus: ∼500 clinical notes | 0.81 |
| Lineback et al. (78) | Prediction of 30-day readmission after stroke | Logistic regression + naïve Bayes + SVM + RF + Gradient boosting + XGBoost | Train corpus: ∼2,300 clinical notes; Test corpus: ∼550 clinical notes | 0.64 |
| Lin et al. (81) | Identify UAU in hospitalized patients | Logistic regression | Train corpus: ∼58 k clinical notes | 0.91 |
| Bacchi et al. (87) | Prediction of cause of TIA-like presentations | RNN + CNN | Corpus: 2,201 clinical notes (∼150 words each) | 0.88 |

NLP, natural language processing; AUC, area under curve; cTAKES, clinical text analysis and knowledge extraction system; CNN, convolutional neural network; LSTM, long short-term memory; SVM, support vector machine; RF, random forest; UAU, unhealthy alcohol use; ISP-D, internet-based self-assessment program for depression.
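
For readers unfamiliar with the AUC column, the following is a minimal sketch of how an area under the ROC curve can be computed from a classifier's scores, using the pairwise rank formulation (the fraction of positive/negative pairs in which the positive note is scored higher, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are hypothetical illustrations; they are not taken from any of the studies in Table 7, and the studies themselves may have computed AUC differently.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise rank formulation.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one; ties count as 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (assumption): 1 = condition present, 0 = absent.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes, which gives a scale for comparing the values reported in the table.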