Table 1.
Methodology comparison between AFEP, SAFE, and SEDFE.
| | AFEP | SAFE | SEDFE |
|---|---|---|---|
| Commonality | All three methods apply NER to online articles about the target phenotype to find an initial list of clinical concepts as candidate features | | |
| Feature selection method | Frequency control, then threshold by rank correlation with the NLP feature representing the target phenotype | Frequency control, majority voting, then sparse regression to predict the silver-standard labels derived from surrogate features | Majority voting; concept embedding to determine feature relatedness; semantic combination and the BIC to determine the number of needed features |
| Data requirement | EHR data (hospital dependent and not sharable) | EHR data (hospital dependent and not sharable) | A biomedical corpus for training word embedding (usually sharable) |
| Tuning parameters | Threshold for the rank correlation | (1) Upper and lower thresholds of the surrogate features for creating the silver standard labels, which are affected by the distribution of the features, and therefore phenotype dependent; (2) The number of patients to sample, which affects the number of selected features | The word embedding parameters, which are not overly sensitive. The embedding is done only once for all phenotypes |
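
To make the AFEP column concrete, the following is a minimal sketch of "frequency control, then threshold by rank correlation", not the published implementation. It assumes per-patient mention counts for each candidate concept and uses Spearman correlation as the rank correlation; the function names, the `min_freq` and `min_corr` values, and the toy data are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (var_x * var_y) ** 0.5


def select_features(counts, target, min_freq=0.05, min_corr=0.3):
    """Keep concepts that are frequent enough and rank-correlated with the target.

    counts: dict mapping concept name -> list of per-patient mention counts.
    target: name of the NLP feature representing the target phenotype.
    min_freq, min_corr: illustrative thresholds (assumptions, not from the paper).
    """
    n = len(counts[target])
    selected = []
    for name, col in counts.items():
        if name == target:
            continue
        # Frequency control: fraction of patients with at least one mention.
        if sum(1 for c in col if c > 0) / n < min_freq:
            continue
        # Rank-correlation threshold against the target phenotype feature.
        if spearman(col, counts[target]) >= min_corr:
            selected.append(name)
    return selected


# Toy data: six patients, invented counts.
counts = {
    "rheumatoid_arthritis": [0, 1, 3, 5, 0, 2],
    "methotrexate": [0, 1, 2, 4, 0, 1],
    "headache": [2, 0, 1, 0, 3, 0],
    "rare_term": [0, 0, 0, 0, 0, 0],
}
selected = select_features(counts, "rheumatoid_arthritis")
```

The correlation threshold is the single tuning parameter listed for AFEP in the table. Because the sketch needs patient-level counts, it also illustrates why AFEP depends on EHR data, whereas SEDFE only needs a corpus for training embeddings.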