2012 Sep 28;4(9):75. doi: 10.1186/gm376

Table 5.

Performance using gene2pubmed as the gene-literature data source

| Scoring method | Novel MEDLINE validation AUC (02/2007-01/2009) | Novel MEDLINE validation AUC (02/2007-04/2010) | Pre-existing CTD validation AUC (11/2008) | Novel CTD validation AUC (11/2008-04/2010) | Pre-existing MEDLINE validation AUC (02/2007) | Mean AUC | Rank |
|---|---|---|---|---|---|---|---|
| Cosine distance of term frequency-inverse document frequency | 0.92 | 0.91 | 0.95 | 0.93 | 0.98 | 0.94 | 2 |
| Cosine distance of P-values | 0.53 | 0.51 | 0.65 | 0.63 | 0.53 | 0.57 | 16 |
| Cosine distance of term fractions | 0.90 | 0.89 | 0.93 | 0.91 | 0.96 | 0.92 | 5 |
| Sum of the log of combined P-values | 0.91 | 0.89 | 0.94 | 0.94 | 0.94 | 0.92 | 3 |
| Sum of the differences of log P-values | 0.91 | 0.91 | 0.77 | 0.83 | 0.93 | 0.87 | 7 |
| L2 of log-p of overlapping terms only | 0.96 | 0.95 | 0.92 | 0.94 | 0.99 | 0.95 | 1 |
| L2 of term fractions of overlapping terms only | 0.64 | 0.62 | 0.57 | 0.60 | 0.53 | 0.59 | 15 |
| L2 of log of P-values | 0.90 | 0.90 | 0.76 | 0.83 | 0.93 | 0.86 | 10 |
| L2 of P-values | 0.89 | 0.89 | 0.75 | 0.81 | 0.92 | 0.86 | 12 |
| L2 of term fractions | 0.92 | 0.90 | 0.91 | 0.92 | 0.95 | 0.92 | 4 |
| L2 of term frequency | 0.90 | 0.90 | 0.76 | 0.82 | 0.93 | 0.86 | 11 |
| Term coverage | 0.90 | 0.91 | 0.77 | 0.83 | 0.93 | 0.87 | 8 |
| Term overlap | 0.91 | 0.89 | 0.90 | 0.92 | 0.90 | 0.90 | 6 |
| Number of gene MeSH terms | 0.85 | 0.82 | 0.85 | 0.88 | 0.83 | 0.85 | 13 |
| Number of disease MeSH terms | 0.90 | 0.90 | 0.76 | 0.83 | 0.93 | 0.86 | 9 |
| Gene ID | 0.75 | 0.73 | 0.78 | 0.79 | 0.74 | 0.76 | 14 |

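The described scoring methods were tested on the validation sets and their AUCs compared. Mean AUC is the mean of the five validation AUCs, and Rank orders the methods by mean AUC (1 = highest). AUC, area under the curve; CTD, Comparative Toxicogenomics Database.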
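As a sketch of how the Mean AUC and Rank columns relate to the five validation AUCs, the Python below recomputes them from the two-decimal values printed in Table 5. It assumes Mean AUC is the unweighted mean of the five validation columns, which the printed values are consistent with; the names `TABLE5_AUCS` and `rank_by_mean_auc` are hypothetical and not from the source. Because the published AUCs are rounded, a recomputed mean can differ in the last digit (for example 0.85 rather than 0.86 for L2 of P-values), and exact ties such as L2 of log of P-values versus Number of disease MeSH terms cannot be ordered from these rounded values alone.

```python
from statistics import mean

# Hypothetical name. Five validation AUCs per scoring method, copied from Table 5
# in column order: novel MEDLINE (02/2007-01/2009), novel MEDLINE (02/2007-04/2010),
# pre-existing CTD (11/2008), novel CTD (11/2008-04/2010), pre-existing MEDLINE (02/2007).
TABLE5_AUCS = {
    "Cosine distance of term frequency-inverse document frequency": [0.92, 0.91, 0.95, 0.93, 0.98],
    "Cosine distance of P-values": [0.53, 0.51, 0.65, 0.63, 0.53],
    "Cosine distance of term fractions": [0.90, 0.89, 0.93, 0.91, 0.96],
    "Sum of the log of combined P-values": [0.91, 0.89, 0.94, 0.94, 0.94],
    "Sum of the differences of log P-values": [0.91, 0.91, 0.77, 0.83, 0.93],
    "L2 of log-p of overlapping terms only": [0.96, 0.95, 0.92, 0.94, 0.99],
    "L2 of term fractions of overlapping terms only": [0.64, 0.62, 0.57, 0.60, 0.53],
    "L2 of log of P-values": [0.90, 0.90, 0.76, 0.83, 0.93],
    "L2 of P-values": [0.89, 0.89, 0.75, 0.81, 0.92],
    "L2 of term fractions": [0.92, 0.90, 0.91, 0.92, 0.95],
    "L2 of term frequency": [0.90, 0.90, 0.76, 0.82, 0.93],
    "Term coverage": [0.90, 0.91, 0.77, 0.83, 0.93],
    "Term overlap": [0.91, 0.89, 0.90, 0.92, 0.90],
    "Number of gene MeSH terms": [0.85, 0.82, 0.85, 0.88, 0.83],
    "Number of disease MeSH terms": [0.90, 0.90, 0.76, 0.83, 0.93],
    "Gene ID": [0.75, 0.73, 0.78, 0.79, 0.74],
}


def rank_by_mean_auc(aucs):
    """Return (method, mean AUC) pairs sorted from highest to lowest mean AUC.

    Hypothetical helper: assumes the mean is unweighted across validation sets.
    Equal means keep their input order (Python's sort is stable).
    """
    means = {method: mean(values) for method, values in aucs.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for rank, (method, mean_auc) in enumerate(rank_by_mean_auc(TABLE5_AUCS), start=1):
        print(f"{rank:2d}  {mean_auc:.2f}  {method}")
```

Sorting on the unrounded recomputed means, rather than on the two-decimal Mean AUC column, is what separates the three methods printed as 0.92 into ranks 3, 4 and 5 in the same order as the table.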