2024 Mar 26;8(12):2991–3000. doi: 10.1182/bloodadvances.2023012200

Table 1.

Data extraction from 13 included studies

Reference	Setting (country, hospital type)	Type of free-text data and language	Cohort size (number of reports), characteristics, true positives	Training approach	Text processing approach	ML approach	Performance measures*
Banerjee et al,¹⁶ 2018 PMID 29175548	United States, academic	Radiology (CT chest) reports, English	4512 reports from 1 hospital; 254/858 true positives in external validation set	3512 for training, 1000 for testing; 10-fold cross validation	Intelligent word embedding; combines semantic-dictionary mapping and neural embedding	Binary LR models (LASSO)	PE. Internal validation (n = 1000): AUC, 0.95; Precision, 97.25%; Recall, 96.70%; F1 score, 0.97. External validation (n = 858): AUC, 0.96; Precision, 93.03%; Recall, 93.02%; F1 score, 0.94
Banerjee et al,¹⁷ 2019 PMID 30477892	United States, academic	Radiology reports (CT chest), English	4512 reports from 1 hospital; true positives not reported	2512 reports for training, 1000 for calibration, 1000 for testing	Global Vectors for Word Representation (GloVe); novel domain phrase hierarchy	CNN; HNN (without attention mechanism); A-HNN; DPA-HNN	PE, DPA-HNN. Internal validation (n = 1000): AUC, 0.99; Precision, 0.99; Recall, 0.99; F1 score, 0.99. External validation 1 (n = 1000): AUC, 0.94; Precision, 0.94; Recall, 0.81; F1 score, 0.86. External validation 2 (n = 1000): AUC, 0.93; Precision, 0.80; Recall, 0.80; F1 score, 0.80. External validation 3 (n = 858): AUC, 0.95; Precision, 0.87; Recall, 0.87; F1 score, 0.87
Chen et al,¹⁸ 2018 PMID 29135365	United States, academic	Radiology (CT chest) reports, English	117 915 reports from 1 hospital; 38/1000 true positives in internal validation set; 279/859 true positives in external validation set	2500 for training with resampling, 1000 for calibration, 1000 for testing	GloVe	CNN model using TensorFlow	PE. Internal validation (n = 1000): Sensitivity, 0.950; Specificity, 0.997; Accuracy, 0.995; F1 score, 0.938. External validation (n = 859): Sensitivity, 0.952; Specificity, 0.905; Accuracy, 0.921; F1 score, 0.891
Danilov et al,¹⁹ 2022 PMID 35062094	Russia, academic	All clinical notes, Russian	621 medical cases from 1 hospital; 139/621 true positives	300 for training with resampling; training/testing ratio 80%/20%	Semiautomatic IEA	RF, LR, SVM (linear, radial, and polynomial kernels), and k-nearest neighbors	PE, RF. Sensitivity, 0.959; Specificity, 0.976; PPV, 0.920; Accuracy, 0.950; F1 score, 0.937
Dantes et al,²⁰ 2018 PMID 29087984	United States, academic	Radiology reports (duplex ultrasound of extremity, CTA chest, or MRI chest), English	2551 reports from 1 hospital; true positives not reported	4-5 reports for training	IDEAL-X	IDEAL-X online ML mode, not further specified	DVT/PE. Sensitivity, 92% (95% CI, 88.3-96.1); Specificity, 99% (95% CI, 98.5-99.4)
Fiszman et al,²⁷ 1998 PMID 9929341	United States, community	Radiology reports (V/Q lung scans), English	572 reports from 1 hospital; true positives not reported	200 for training, 372 for testing	Rule-based	Bayesian networks	PE. Precision, 0.88; Recall, 0.92
Pham et al,²¹ 2014 PMID 25099227	France, academic	Radiology reports (CTA/CTV chest), French	573 reports from 1 hospital; true positives not reported	100 randomly selected reports formed the test set; in the remaining reports, the number of positive reports was tripled and the number of negative reports increased to match, forming the training set	Human annotation with simple segmentation and tokenization	Naïve Bayes classifier in Weka initially used to identify optimal feature sets, followed by Wapiti implementations of SVM and maximum entropy (MaxEnt)	DVT/PE, MaxEnt. Precision, 1.00; Recall, 0.96; F1 score, 0.98
Rochefort et al,²² 2014 PMID 25332356	Canada, academic	Radiology reports, English	2000 reports from 1649 patients from 5 hospitals; 121/2000 true positives for PE; 259/2000 true positives for DVT	10-fold cross validation	Bag of words	SVM	DVT. Sensitivity, 0.80 (95% CI, 0.76-0.85); PPV, 0.89 (95% CI, 0.85-0.93); AUC, 0.98 (95% CI, 0.97-0.99). PE. Sensitivity, 0.79 (95% CI, 0.73-0.85); PPV, 0.84 (95% CI, 0.75-0.92); AUC, 0.99 (95% CI, 0.98-1.00)
Selby et al,²³ 2018 PMID 30056994	United States, academic	Radiology reports (duplex ultrasound of extremity or CTA chest), English	2746 reports from 2206 postoperative patients from 1 hospital; 27/506 true positives for PE; 259/2000 true positives for DVT	70% for training, 30% for testing	Bag of words	Weka; specific model not specified	DVT. Sensitivity, 85.1%; Specificity, 94.6%; PPV, 78.4%; NPV, 96.5%. PE. Sensitivity, 90.0%; Specificity, 98.7%; PPV, 81.8%; NPV, 99.3%
Shah et al,²⁶ 2020 PMID 32600201	United States, academic	All clinical notes, English	1000 notes from 1 hospital; true positives not reported	400 for training, 600 for testing	Rule-based	Model not specified; used the Extractor tool from CloudMedX	DVT/PE. Accuracy, 90.0%; Sensitivity, 97.0%; Specificity, 86.0%
Weikert et al,²⁴ 2020 PMID 32135443	Switzerland, academic	Radiology reports (CTA chest), German	4397 reports from 1 hospital; 209/1377 true positives	2801 reports (all reports from 2016-2017) for training, 1377 reports (from 2018) for testing; 3-fold cross validation	Term frequency-inverse document frequency (tf-idf) and word2vec	SVM and RF using scikit-learn; CNN using TensorFlow	PE, CNN. Sensitivity, 97.7% (95% CI, 94.6-99.2); Specificity, 99.4% (95% CI, 98.8-99.8); PPV, 96.8% (95% CI, 93.5-98.4); NPV, 99.6% (95% CI, 99.0-99.8); Accuracy, 99.1% (95% CI, 98.5-99.6); F1 score, 0.972 (95% CI, 0.963-0.981)
Wendelboe et al,²⁸ 2022 PMID 37206160	United States, academic	Radiology reports (CTA chest, duplex ultrasound of extremity, V/Q lung scans), English	3078 reports in total (1591 from 1 hospital, 1487 from another); 1204/3078 true positives	Training based on Dantes et al²⁰	IDEAL-X	IDEAL-X online ML mode, not further specified	DVT/PE. Accuracy, 93.7% (95% CI, 93.7-93.8); Sensitivity, 96.3% (95% CI, 96.2-96.4); Specificity, 92% (95% CI, 91.9-92); PPV, 89.1% (95% CI, 89-89.2); NPV, 97.3% (95% CI, 97.3-97.4)
Yu et al,²⁵ 2014 PMID 25117751	United States, academic	Radiology reports (CTA chest), English	10 330 reports from 1 hospital; 1972/10 330 true positives	50% for training, 50% for testing	Rule-based NILE system; output converted to numeric features	LR with adaptive LASSO penalty	PE. PPV, 0.95; NPV, 0.99; AUC, 0.998 ± 0.005; F1 score, 0.96

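A-HNN, attention-based hierarchical neural network; AUC, area under the receiver operating characteristic curve; CI, confidence interval; CNN, convolutional neural network; CT, computed tomography; CTA, computed tomography angiography; CTV, computed tomography venography; DPA-HNN, domain phrase attention-based hierarchical neural network; DVT, deep vein thrombosis; HNN, hierarchical neural network; IEA, information extraction algorithm; LASSO, least absolute shrinkage and selection operator; LR, logistic regression; MaxEnt, maximum entropy; ML, machine learning; MRI, magnetic resonance imaging; NILE, narrative information linear extraction; NPV, negative predictive value; PE, pulmonary embolism; PPV, positive predictive value; RF, random forest; SVM, support vector machine; V/Q, ventilation/perfusion.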

*If multiple models were used, the model with the best performance is reported.
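Except for AUC, the performance measures in Table 1 are all ratios drawn from the same 2×2 confusion matrix, which compares classifier labels with a reference standard. The sketch below is an illustration of these standard definitions, not code from any included study. The names `ConfusionMatrix`, `diagnostic_metrics`, and `f1_from_precision_recall` and the example counts are hypothetical. It also shows that F1 can be recomputed from reported precision and recall, for example from the internal-validation values of Banerjee et al.¹⁶

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for counts from comparing classifier labels with a reference standard."""
    tp: int  # positive by classifier and by reference
    fp: int  # positive by classifier, negative by reference
    fn: int  # negative by classifier, positive by reference
    tn: int  # negative by classifier and by reference


def _ratio(num: float, den: float) -> float:
    # Return NaN rather than raising when a denominator is zero (e.g., no predicted positives).
    return num / den if den else float("nan")


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return _ratio(2 * precision * recall, precision + recall)


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute the confusion-matrix measures used in Table 1 from raw counts."""
    sensitivity = _ratio(cm.tp, cm.tp + cm.fn)  # identical to recall
    specificity = _ratio(cm.tn, cm.tn + cm.fp)
    ppv = _ratio(cm.tp, cm.tp + cm.fp)  # identical to precision
    npv = _ratio(cm.tn, cm.tn + cm.fn)
    accuracy = _ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.fn + cm.tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1_from_precision_recall(ppv, sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical counts for 1000 reports, for illustration only.
    print(diagnostic_metrics(ConfusionMatrix(tp=90, fp=10, fn=5, tn=895)))
    # Precision 97.25% and recall 96.70% as reported by Banerjee et al (internal validation).
    print(round(f1_from_precision_recall(0.9725, 0.9670), 2))  # 0.97
```

Because sensitivity equals recall and PPV equals precision, studies that report different metric sets can still be partially compared. AUC cannot be derived from a single confusion matrix, because it requires the classifier's continuous scores across all decision thresholds.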