Author manuscript; available in PMC: 2020 Mar 1.
Published in final edited form as: J Biomed Inform. 2019 Feb 7;91:103122. doi: 10.1016/j.jbi.2019.103122

Table 1.

Methodology comparison between AFEP, SAFE, and SEDFE.

Commonality (AFEP, SAFE, and SEDFE)
All three: Applies NER to online articles about the target phenotype to find an initial list of clinical concepts as candidate features.

Feature selection method
AFEP: Frequency control, then threshold by rank correlation with the NLP feature representing the target phenotype.
SAFE: Frequency control, majority voting, then use sparse regression to predict the silver-standard labels derived from surrogate features.
SEDFE: Majority voting; use concept embedding to determine feature relatedness; use semantic combination and the BIC to determine the number of needed features.

Data requirement
AFEP: EHR data (hospital dependent and not sharable).
SAFE: EHR data (hospital dependent and not sharable).
SEDFE: A biomedical corpus for training word embedding (usually sharable).

Tuning parameters
AFEP: Threshold for the rank correlation.
SAFE: (1) Upper and lower thresholds of the surrogate features for creating the silver-standard labels, which are affected by the distribution of the features and are therefore phenotype dependent; (2) the number of patients to sample, which affects the number of selected features.
SEDFE: The word embedding parameters, which are not overly sensitive. The embedding is done only once for all phenotypes.
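
The AFEP selection step in the table (threshold by rank correlation with the NLP feature representing the target phenotype) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Spearman correlation as the rank correlation (the table does not say which one is used), and the function names, the per-patient counts, and the threshold of 0.5 are all hypothetical values chosen for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / sqrt(vx * vy)


def select_by_rank_correlation(target_counts, candidate_counts, threshold):
    """Keep candidates whose rank correlation with the target meets the threshold."""
    return [
        name
        for name, counts in candidate_counts.items()
        if spearman(target_counts, counts) >= threshold
    ]


# Hypothetical per-patient NLP mention counts (five patients).
target = [0, 1, 3, 5, 8]  # counts of the target phenotype concept
candidates = {
    "concept_a": [0, 2, 2, 6, 9],  # rises with the target
    "concept_b": [4, 0, 3, 1, 2],  # largely unrelated to the target
}
selected = select_by_rank_correlation(target, candidates, threshold=0.5)
```

In this toy data only `concept_a` passes the threshold. The single threshold is the tuning parameter the table attributes to AFEP; the frequency-control step that precedes it is not shown here.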