2020 Nov 5;28(1):104–112. doi: 10.1093/jamia/ocaa220

Table 3.

10-fold cross-validation results for the classification task, based on the predicted label for each note when it was in the cross-validation test set

Experiment      Metric            Precision  Recall  F1-score  Support
Main            Retained          0.931      0.966   0.948     2964
                LTFU              0.861      0.748   0.801     838
                micro average     0.918      0.918   0.918     3802
                macro average     0.896      0.857   0.875     3802
                weighted average  0.916      0.918   0.916     3802
Keep all notes  Retained          0.903      0.925   0.914     6375
                LTFU              0.302      0.248   0.273     838
                micro average     0.846      0.846   0.846     7213
                macro average     0.603      0.586   0.593     7213
                weighted average  0.834      0.846   0.839     7213

Note: LTFU = lost to follow-up; tp = number of true positives; fp = number of false positives; fn = number of false negatives; precision = tp/(tp + fp); recall = tp/(tp + fn); F1-score = 2 * (precision * recall)/(precision + recall); support = number of occurrences of a class. Micro averaging computes the metrics globally from the total true positives, false negatives, and false positives; macro averaging takes the unweighted mean of the per-label metrics; weighted averaging computes the metrics for each label and averages them weighted by support. These metric definitions follow scikit-learn, and the table was produced with scikit-learn's classification_report method. The linear model, trained with stochastic gradient descent and elastic net regularization, was evaluated with the best hyperparameter found: alpha = 5e-05 for the main experiment and alpha = 1e-05 for the "keep all notes" setup.
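The micro, macro, and weighted averages defined in the note can be reproduced directly from per-class tp/fp/fn counts. The sketch below uses small hypothetical counts for two classes (not the paper's actual confusion matrix, which is not given here) to illustrate how the three averaging schemes differ:

```python
# Hedged sketch of the averaging schemes from the note, using
# HYPOTHETICAL per-class counts, not the study's real confusion matrix.

def prf(tp, fp, fn):
    """precision = tp/(tp+fp); recall = tp/(tp+fn); F1 = harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for the two labels used in the table.
counts = {
    "Retained": {"tp": 90, "fp": 10, "fn": 5},
    "LTFU":     {"tp": 20, "fp": 5,  "fn": 10},
}
# support = occurrences of each class = tp + fn
support = {c: v["tp"] + v["fn"] for c, v in counts.items()}

per_class = {c: prf(**v) for c, v in counts.items()}

# micro: pool tp/fp/fn over all classes, then compute the metrics once
totals = {k: sum(v[k] for v in counts.values()) for k in ("tp", "fp", "fn")}
micro = prf(**totals)

# macro: unweighted mean of the per-class metrics
macro = tuple(
    sum(m[i] for m in per_class.values()) / len(per_class) for i in range(3)
)

# weighted: mean of the per-class metrics weighted by support
n = sum(support.values())
weighted = tuple(
    sum(per_class[c][i] * support[c] for c in per_class) / n for i in range(3)
)
```

With these toy counts, micro precision and recall coincide (110/125 = 0.88) because the pooled fp and fn totals happen to be equal, while macro precision is the plain mean (0.9 + 0.8)/2 = 0.85; the gap between micro/weighted and macro mirrors the table, where the small LTFU class drags the macro average down.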