AUC and F-scores (at specificity = 0.95) of algorithms trained with n gold-standard labels, using features selected by each of the following methods: EXPERT, expert-curated features; M2, the main ICD and NLP features only; AFEP, the original AFEP procedure; A5, the expanded set of 5 sources with AFEP selection (frequency control and rank correlation selection); A5V, the expanded set of 5 sources with majority voting and AFEP selection; S2, the original 2 sources with surrogate-assisted selection; and SAFE, the proposed procedure, which combines 5 sources and majority voting with surrogate-assisted selection.
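The caption does not say how the two metrics are computed, so the following is only a minimal sketch under stated assumptions. It assumes AUC is estimated by the Mann-Whitney statistic, and that the F-score is the harmonic mean of precision and recall at the lowest cutoff whose specificity reaches the target. The function names `auc` and `f_score_at_specificity` and the toy scores are hypothetical, not taken from the source.

```python
def auc(scores, labels):
    """Mann-Whitney estimate of AUC: P(case score > control score); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f_score_at_specificity(scores, labels, specificity=0.95):
    """F-score at the lowest cutoff whose specificity is at least the target.

    A patient is classified as a case when score > cutoff (an assumed convention).
    """
    pairs = list(zip(scores, labels))
    neg = [s for s, y in pairs if y == 0]
    cutoff = max(scores)  # fallback: at this cutoff specificity is always 1
    for c in sorted(set(scores)):
        true_neg = sum(1 for q in neg if q <= c)
        if true_neg / len(neg) >= specificity:
            cutoff = c
            break
    tp = sum(1 for s, y in pairs if y == 1 and s > cutoff)
    fp = sum(1 for s, y in pairs if y == 0 and s > cutoff)
    fn = sum(1 for s, y in pairs if y == 1 and s <= cutoff)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (illustrative only): 20 controls and 5 cases.
toy_scores = list(range(1, 21)) + [15.5, 25, 30, 35, 40]
toy_labels = [0] * 20 + [1] * 5
print(auc(toy_scores, toy_labels))
print(f_score_at_specificity(toy_scores, toy_labels))
```

Fixing specificity before computing the F-score puts all feature selection methods at the same false positive rate, so differences in the F-score reflect differences in sensitivity and positive predictive value only.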