JMIR Med Inform. 2017 Oct 31;5(4):e42. doi: 10.2196/medinform.8531

Table 1.

Performance of different natural language processing systems on the evaluation set under 4 conditions, using 100, 200, 500, and 1000 target-domain training examples^a.

                      AUC-ROC^b                           Average precision
System                100      200      500      1000     100      200      500      1000
SourceOnly            0.739    0.739    0.739    0.739    0.811    0.811    0.811    0.811
TargetOnly            0.728    0.749    0.769    0.782    0.799    0.816    0.833    0.844
ADS-fsa^c             0.746    0.756   *0.776*  *0.790*   0.815    0.823   *0.839*  *0.850*
ADS-sds^d            *0.751*  *0.759*   0.775    0.786   *0.819*  *0.826*   0.838    0.847
ADS-fsa vs ADS-sds^e
  t99                 4.25     2.79              8.78     3.81     3.04              11.58
  P values            <.001    .01               <.001    <.001    .003              <.001

^a The highest score for each metric and condition is italicized.

^b AUC-ROC: area under the receiver operating characteristic curve.

^c ADS-fsa: adapted distant supervision-feature space augmentation.

^d ADS-sds: adapted distant supervision-supervised distant supervision.

^e For the differences between ADS-fsa and SourceOnly, ADS-sds and SourceOnly, ADS-fsa and TargetOnly, and ADS-sds and TargetOnly, P<.001 (t99 ranging from 4.84 to 133.31) for all metrics under all conditions. For the difference between ADS-fsa and ADS-sds, P values are reported only where P≤.05, together with the corresponding t99 values; t99 denotes the t statistic with 99 degrees of freedom.
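The table scores each system with two ranking metrics, AUC-ROC and average precision. As a minimal illustration of what these numbers measure, the Python sketch below computes both from binary labels (1 = positive, 0 = negative) and real-valued system scores. It assumes a binary classification setting, which the table itself does not state. It uses the pairwise (Mann-Whitney) formulation of AUC-ROC and the step-wise definition of average precision, AP = sum over thresholds of (R_n - R_{n-1}) * P_n, which is the definition documented for scikit-learn's `average_precision_score`. The function names `auc_roc` and `average_precision` are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win
    (the Mann-Whitney formulation). Quadratic time; fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision: sum over thresholds of (R_n - R_{n-1}) * P_n.

    Examples that share a score are treated as a single threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before measuring P and R.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    y = [1, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.1]
    print(auc_roc(y, s))            # 0.75
    print(average_precision(y, s))  # about 0.833 (= 0.5 * 1 + 0.5 * 2/3)
```

In this toy ranking, one negative (0.8) outranks one positive (0.7), so 3 of the 4 positive-negative pairs are ordered correctly (AUC-ROC 0.75), and the second positive is retrieved at precision 2/3, pulling average precision below 1.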
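Footnote e reports t statistics with 99 degrees of freedom (t99). One reading consistent with that notation, though the table does not say so, is a paired t test over 100 matched measurements of each metric for the two systems being compared. Under that assumption, the sketch below computes a paired t statistic and its two-sided P value using only the Python standard library. The tail probability uses the closed-form series for Student's t with integer degrees of freedom given in Abramowitz and Stegun, section 26.7. The function names `student_t_two_sided_p` and `paired_t_test` are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def student_t_two_sided_p(t: float, df: int) -> float:
    """Two-sided P value P(|T| >= |t|) for Student's t with integer df >= 1.

    Evaluates A(t|df) = P(|T| < |t|) with the closed-form series for
    integer df (Abramowitz and Stegun, section 26.7) and returns 1 - A.
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    cos2 = cos_t * cos_t
    if df % 2 == 1:
        # Odd df: A = (2/pi) * (theta + sin * [cos + (2/3)cos^3 + ...]).
        series = 0.0
        if df > 1:
            term = cos_t
            series = term
            for j in range(3, df - 1, 2):  # powers cos^3 ... cos^(df-2)
                term *= cos2 * (j - 1) / j
                series += term
        a = (2.0 / math.pi) * (theta + sin_t * series)
    else:
        # Even df: A = sin * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...].
        term = 1.0
        series = 1.0
        for j in range(2, df - 1, 2):  # powers cos^2 ... cos^(df-2)
            term *= cos2 * (j - 1) / j
            series += term
        a = sin_t * series
    # Clamp tiny negative values caused by rounding when |t| is very large.
    return max(0.0, 1.0 - a)


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int, float]:
    """Paired t test on matched measurements; returns (t, df, two-sided P)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long sequences with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    df = n - 1  # 100 pairs would give the 99 degrees of freedom of t99
    return t, df, student_t_two_sided_p(t, df)
```

For instance, `student_t_two_sided_p(2.79, 99)` returns a value between .005 and .01. The series form is chosen only so the sketch needs no third-party packages; a statistics library would normally be used instead.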