2017 Jan 17;5(1):e3. doi: 10.2196/medinform.6690

Table 2.

Best predictive performance of each model using the random forest patient similarity metric (PSM), measured as mean area under the receiver operating characteristic curve and mean area under the precision-recall curve, compared with the same models using the cosine PSM and with traditional models that use no PSM. All cosine PSM results are from Lee et al [14].


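Performance values are mean (95% CI). The "similar patients" columns give the number of similar patients at which each model reached its best AUROC or AUPRC.

| Model | Similar patients at best AUROC, RF PSM | Similar patients at best AUROC, cosine PSM | Similar patients at best AUPRC, RF PSM | Similar patients at best AUPRC, cosine PSM | AUROC, RF PSM | AUROC, cosine PSM | AUROC, no PSM | AUPRC, RF PSM | AUPRC, cosine PSM | AUPRC, no PSM |
|---|---|---|---|---|---|---|---|---|---|---|
| DC | 260 | 100 | 230 | 60 | 0.801 (0.792-0.811) | 0.797 (0.791-0.803) | 0.693 (0.679-0.707) | 0.429 (0.409-0.449) | 0.393 (0.378-0.407) | 0.280 (0.263-0.297) |
| LR | 5000 | 6000 | 9000 | 6000 | 0.824 (0.815-0.832) | 0.830 (0.825-0.836) | 0.811 (0.799-0.821) | 0.460 (0.437-0.484) | 0.474 (0.460-0.488) | 0.449 (0.430-0.468) |
| DT | 5000 | 2000 | 7000 | 4000 | 0.779 (0.775-0.784) | 0.753 (0.742-0.764) | 0.721 (0.705-0.738) | 0.352 (0.337-0.367) | 0.347 (0.335-0.358) | 0.339 (0.324-0.353) |
| RF | 15000 | | 4000 | | 0.839 (0.835-0.844) | | 0.839 (0.835-0.844) | 0.507 (0.486-0.527) | | 0.505 (0.487-0.523) |
| CSRF | | | | | 0.832 (0.821-0.843) | | | 0.496 (0.471-0.520) | | |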

AUROC: area under the receiver operating characteristic curve.

AUPRC: area under the precision-recall curve.

RF: random forest.

PSM: patient similarity metric.

DC: death counting.

LR: logistic regression.

DT: decision tree.

CSRF: case-specific random forest.
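The table reports how many similar patients each model used and how well it scored, but not how those quantities are computed. As a rough illustration only, the Python sketch below assumes that death counting (DC) predicts an index patient's mortality risk as the fraction of deaths among the k most similar patients, which is one way to read the "similar patients" columns. It then scores predictions with AUROC (the probability that a patient who died is ranked above a patient who survived, with ties counted as one half) and with average precision, one common estimator of AUPRC. The function names `death_counting`, `auroc`, and `average_precision` are hypothetical, the similarity scores are assumed to come from some PSM (random forest or cosine), and this is not the study's own code.

```python
from typing import Sequence


def death_counting(similarities: Sequence[float], died: Sequence[int], k: int) -> float:
    """Hypothetical DC predictor (assumed definition, not the study's code):
    fraction of the k most similar training patients who died."""
    if not 1 <= k <= len(similarities):
        raise ValueError("k must be between 1 and the number of training patients")
    # Training patients ordered from most to least similar to the index patient
    order = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    return sum(died[i] for i in order[:k]) / k


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUPRC estimated as average precision: mean of the precision at the rank of each positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank predictions from highest to lowest score (tied scores keep input order)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Toy data: similarity of 5 training patients to one index patient, and their outcomes
    print(round(death_counting([0.9, 0.8, 0.7, 0.2, 0.1], [1, 0, 1, 0, 0], k=3), 3))  # 0.667
    # Toy predicted risks and observed outcomes for 5 test patients
    risks, outcomes = [0.9, 0.3, 0.6, 0.4, 0.1], [1, 1, 0, 0, 0]
    print(round(auroc(risks, outcomes), 3))  # 0.667
    print(round(average_precision(risks, outcomes), 3))  # 0.75
```

Tied scores are ordered by input position in `average_precision`, so an estimator intended for real data should handle ties explicitly; library implementations may give slightly different values when scores tie.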