2024 Mar 26;8(12):2991–3000. doi: 10.1182/bloodadvances.2023012200

Table 1.

Data extraction from 13 included studies

Columns: Reference; Setting (country, hospital type); Type of free-text data and language; Cohort size (number of reports), characteristic, true positive; Training approach; Text processing approach; ML approach; Performance measure
Banerjee et al,16 2018 (PMID 29175548)
Setting: United States, academic
Data type and language: Radiology (CT chest) reports, English
Cohort: 4512 reports from 1 hospital; 254/858 true positives in external validation set
Training approach: 3512 for training, 1000 for testing; 10-fold cross validation
Text processing approach: Intelligent word embedding; combines semantic-dictionary mapping and neural embedding
ML approach: Binary LR models (LASSO)
Performance measure (PE):
Internal validation (n = 1000)
AUC, 0.95
Precision, 97.25%
Recall, 96.70%
F1 score, 0.97
External validation (n = 858)
AUC, 0.96
Precision, 93.03%
Recall, 93.02%
F1 score, 0.94
Banerjee et al,17 2019 (PMID 30477892)
Setting: United States, academic
Data type and language: Radiology reports (CT chest), English
Cohort: 4512 reports from 1 hospital; true positives not reported
Training approach: 2512 reports for training, 1000 for calibration, 1000 for testing
Text processing approach: Global Vectors for Word Representation (GloVe); novel domain phrase hierarchy
ML approach: CNN model; HNN without attention mechanism; A-HNN; DPA-HNN
Performance measure (PE, DPA-HNN):
Internal validation (n = 1000)
AUC, 0.99
Precision, 0.99
Recall, 0.99
F1 score, 0.99
External validation 1 (n = 1000)
AUC, 0.94
Precision, 0.94
Recall, 0.81
F1 score, 0.86
External validation 2 (n = 1000)
AUC, 0.93
Precision, 0.80
Recall, 0.80
F1 score, 0.80
External validation 3 (n = 858)
AUC, 0.95
Precision, 0.87
Recall, 0.87
F1 score, 0.87
Chen et al,18 2018 (PMID 29135365)
Setting: United States, academic
Data type and language: Radiology (CT chest) reports, English
Cohort: 117 915 reports from 1 hospital; 38/1000 true positives in internal validation set; 279 of 859 true positives in external validation set
Training approach: 2500 for training with resampling, 1000 reports for calibration, 1000 for testing
Text processing approach: GloVe
ML approach: CNN model using TensorFlow
Performance measure (PE):
Internal validation (n = 1000)
Sensitivity, 0.950
Specificity, 0.997
Accuracy, 0.995
F1 score, 0.938
External validation (n = 859)
Sensitivity, 0.952
Specificity, 0.905
Accuracy, 0.921
F1 score, 0.891
Danilov et al,19 2022 (PMID 35062094)
Setting: Russia, academic
Data type and language: All clinical notes, Russian
Cohort: 621 medical cases from 1 hospital; 139/621 true positives
Training approach: 300 for training with resampling, training/testing ratio 80%/20%
Text processing approach: Semiautomatic IEA
ML approach: RF, LR, SVM with kernel types linear, radial, and polynomial (poly), and K-nearest neighbors
Performance measure (PE, RF):
Sensitivity, 0.959
Specificity, 0.976
PPV, 0.920
Accuracy, 0.950
F1 score, 0.937
Dantes et al,20 2018 (PMID 29087984)
Setting: United States, academic
Data type and language: Radiology reports (duplex ultrasound of extremity, CTA chest, or MRI chest), English
Cohort: 2551 reports from 1 hospital; true positives not reported
Training approach: 4-5 reports for training
Text processing approach: IDEAL-X
ML approach: IDEAL-X online ML mode, not further specified
Performance measure (DVT/PE):
Sensitivity, 92% (95% CI, 88.3-96.1)
Specificity, 99% (95% CI, 98.5-99.4)
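Fiszman et al,27 1998 (PMID 9929341)
Setting: United States, community
Data type and language: Radiology reports (V/Q lung scans), English
Cohort: 572 reports from 1 hospital; true positives not reported
Training approach: 200 for training, 372 for testing
Text processing approach: Rule-based
ML approach: Bayesian networks
Performance measure (PE):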
Precision, 0.88
Recall, 0.92
Pham et al,21 2014 (PMID 25099227)
Setting: France, academic
Data type and language: Radiology reports (CTA/CTV chest), French
Cohort: 573 reports from 1 hospital; true positives not reported
Training approach: Randomly selected 100 reports to form the test set. From the remaining reports, tripled the number of positive reports and increased negative reports to match that number; this formed the training set.
Text processing approach: Human annotation with simple segmentation and tokenization
ML approach: Initially used a Naïve Bayes classifier in Weka to identify optimal feature sets, then used Wapiti implementations of SVM and maximum entropy (MaxEnt)
Performance measure (DVT/PE, MaxEnt):
Precision, 1.00
Recall, 0.96
F1 score, 0.98
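Rochefort et al,22 2014 (PMID 25332356)
Setting: Canada, academic
Data type and language: Radiology reports, English
Cohort: 2000 reports from 1649 patients from 5 hospitals; 121/2000 true positives for PE, 259 of 2000 true positives for DVT
Training approach: 10-fold cross validation
Text processing approach: Bag of words
ML approach: SVM
Performance measure:
DVT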
Sensitivity, 0.80 (95% CI, 0.76-0.85)
PPV, 0.89 (95% CI, 0.85-0.93)
AUC, 0.98 (95% CI, 0.97-0.99)
PE
Sensitivity, 0.79 (95% CI, 0.73-0.85)
PPV, 0.84 (95% CI, 0.75-0.92)
AUC, 0.99 (95% CI, 0.98-1.00)
Selby et al,23 2018 (PMID 30056994)
Setting: United States, academic
Data type and language: Radiology reports (duplex ultrasound of extremity or CTA chest), English
Cohort: 2746 reports from 2206 post-operative patients from 1 hospital; 27/506 true positives for PE, 259/2000 true positives for DVT
Training approach: Data set split into 70% for training, 30% for testing
Text processing approach: Bag of words
ML approach: Weka; specific model was not specified
Performance measure:
DVT
Sensitivity, 85.1%
Specificity, 94.6%
PPV, 78.4%
NPV, 96.5%
PE
Sensitivity, 90.0%
Specificity, 98.7%
PPV, 81.8%
NPV, 99.3%
Shah et al,26 2020 (PMID 32600201)
Setting: United States, academic
Data type and language: All clinical notes, English
Cohort: 1000 notes from 1 hospital; true positives not reported
Training approach: 400 for training, 600 for testing
Text processing approach: Rule-based
ML approach: Model not specified, used the tool Extractor from CloudMedX
Performance measure (DVT/PE):
Accuracy, 90.0%
Sensitivity, 97.0%
Specificity, 86.0%
Weikert et al,24 2020 (PMID 32135443)
Setting: Switzerland, academic
Data type and language: Radiology reports (CTA chest), German
Cohort: 4397 reports from 1 hospital; 209 of 1377 true positives
Training approach: 2801 reports (all reports from years 2016-2017) used for training, 1377 reports (from year 2018) used for testing; 3-fold cross validation
Text processing approach: Term frequency-inverse document frequency (tf-idf) and word2vec model
ML approach: SVM and RF using scikit-learn; CNN using TensorFlow
Performance measure (PE, CNN):
Sensitivity, 97.7% (95% CI, 94.6-99.2)
Specificity, 99.4% (95% CI, 98.8-99.8)
PPV, 96.8% (95% CI, 93.5-98.4)
NPV, 99.6% (95% CI, 99.0-99.8)
Accuracy, 99.1% (95% CI, 98.5-99.6)
F1 score, 0.972 (95% CI, 0.963-0.981)
Wendelboe et al,28 2022 (PMID 37206160)
Setting: United States, academic
Data type and language: Radiology reports (CTA chest, duplex ultrasound of extremity, V/Q lung scans), English
Cohort: 1591 reports from 1 hospital, 1487 reports from another hospital for a total of 3078 reports; 1204 of 3078 true positives
Training approach: Training based on Dantes et al20
Text processing approach: IDEAL-X
ML approach: IDEAL-X online ML mode, not further specified
Performance measure (DVT/PE):
Accuracy, 93.7% (95% CI, 93.7-93.8)
Sensitivity, 96.3% (95% CI, 96.2-96.4)
Specificity, 92% (95% CI, 91.9-92)
PPV, 89.1% (95% CI, 89-89.2)
NPV, 97.3% (95% CI, 97.3-97.4)
Yu et al,25 2014 (PMID 25117751)
Setting: United States, academic
Data type and language: Radiology reports (CTA chest), English
Cohort: 10 330 reports from 1 hospital; 1972/10 330 true positives
Training approach: 50% for training, 50% for testing
Text processing approach: Rule-based NILE system, output converted to numeric features
ML approach: LR with adaptive LASSO penalty
Performance measure (PE):
PPV, 0.95
NPV, 0.99
AUC, 0.998 ± 0.005
F1 score, 0.96

A-HNN, attention-based hierarchical neural network; CTA, computed tomography angiography; CTV, computed tomography venography; DPA-HNN, domain phrase attention-based hierarchical neural network; HNN, hierarchical neural network; IEA, information extraction algorithm; LASSO, least absolute shrinkage and selection operator; MaxEnt, maximum entropy; MRI, magnetic resonance imaging; NILE, narrative information linear extraction; PE, pulmonary embolism; RF, random forest; V/Q, ventilation/perfusion.

If multiple models were used, the model with the best performance measure is reported.
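The studies in Table 1 report different subsets of measures (sensitivity/recall, specificity, PPV/precision, NPV, accuracy, F1 score), which makes rows hard to compare directly. The following is a minimal sketch, assuming the standard confusion-matrix definitions of these measures; the function names and the example counts are hypothetical and are not taken from any included study. It also assumes all denominators are nonzero.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute common classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true-positive, false-positive, true-negative, and
    false-negative report counts. Assumes every denominator is nonzero.
    """
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # also called precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1_score(ppv, sensitivity),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in diagnostic_metrics(tp=90, fp=10, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")

    # Precision 1.00 and recall 0.96 give an F1 score of 0.98 after rounding.
    print(round(f1_score(1.00, 0.96), 2))
```

Because F1 depends only on precision and recall, it can be recomputed for rows that report both; for example, a precision of 1.00 with a recall of 0.96 corresponds to an F1 score of 0.98, which agrees with the Pham et al row. Specificity, NPV, and accuracy cannot be recovered without the underlying counts.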