2012 Sep 28;4(9):75. doi: 10.1186/gm376

Table 3.

Explanation of the scoring functions evaluated

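| Scoring method | Description |
| --- | --- |
| Cosine distance of term frequency-inverse document frequency | $\frac{\sum_{j \in M} g_i(j)\, d_i(j)}{\sqrt{\sum_{j \in M} g_i(j)^2}\,\sqrt{\sum_{j \in M} d_i(j)^2}}$ |
| Cosine distance of P-values | $\frac{\sum_{i \in M} g_p(i)\, d_p(i)}{\sqrt{\sum_{i \in M} g_p(i)^2}\,\sqrt{\sum_{i \in M} d_p(i)^2}}$ |
| Cosine distance of term fractions | $\frac{\sum_{i \in M} g_f(i)\, d_f(i)}{\sqrt{\sum_{i \in M} g_f(i)^2}\,\sqrt{\sum_{i \in M} d_f(i)^2}}$ |
| Sum of the log of combined P-values | $\sum_{i \in M} \log\left(g_p(i) + d_p(i) - g_p(i)\, d_p(i)\right)$ |
| Sum of the differences of log P-values | $\sum_{i \in M} \log\frac{g_p(i)}{d_p(i)} = \sum_{i \in M} \left(\log g_p(i) - \log d_p(i)\right)$ |
| L2 of log-p of overlapping terms only | $\sqrt{\sum_{i \in (G \cap D)} \left(\log g_p(i) - \log d_p(i)\right)^2}$ |
| L2 of term fractions of overlapping terms only | $\sqrt{\sum_{i \in (G \cap D)} \left(g_f(i) - d_f(i)\right)^2}$ |
| L2 of log of P-values | $\sqrt{\sum_{i \in M} \left(\log\frac{g_p(i)}{d_p(i)}\right)^2} = \sqrt{\sum_{i \in M} \left(\log g_p(i) - \log d_p(i)\right)^2}$ |
| L2 of P-values | $\sqrt{\sum_{i \in M} \left(g_p(i) - d_p(i)\right)^2}$ |
| L2 of term fractions | $\sqrt{\sum_{i \in M} \left(g_f(i) - d_f(i)\right)^2}$ |
| L2 of term frequency | $\sqrt{\sum_{i \in M} \left(g(i) - d(i)\right)^2}$ |
| Term coverage | $\lvert G \cup D \rvert$ |
| Term overlap | $\lvert G \cap D \rvert$ |
| Number of gene MeSH terms | $\lvert G \rvert$ |
| Number of disease MeSH terms | $\lvert D \rvert$ |
| Gene ID | Entrez Gene ID of the gene |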

M refers to the set of all MeSH terms; G and D refer to the sets of MeSH terms in the gene and disease profiles, respectively. For a MeSH term i in the gene profile, g(i), g_f(i), g_p(i) and g_i(i) denote the frequency, term fraction, hypergeometric P-value and term frequency-inverse document frequency, respectively. d(i), d_f(i), d_p(i) and d_i(i) denote the same four quantities for the MeSH term i in the disease profile.
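
Several of the scoring functions in Table 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes each profile is a dictionary mapping a MeSH term to a value (a P-value, term fraction, or similar), that the L2 scores are Euclidean norms, and that a term missing from a profile takes a P-value of 1.0. The function names and the `default_p` parameter are hypothetical, chosen only for this example.

```python
import math


def cosine(g, d, terms):
    """Cosine of two term profiles (dicts: term -> value) over `terms`."""
    dot = sum(g.get(t, 0.0) * d.get(t, 0.0) for t in terms)
    norm_g = math.sqrt(sum(g.get(t, 0.0) ** 2 for t in terms))
    norm_d = math.sqrt(sum(d.get(t, 0.0) ** 2 for t in terms))
    if norm_g == 0.0 or norm_d == 0.0:
        return 0.0  # convention chosen here for empty profiles
    return dot / (norm_g * norm_d)


def l2_log_p(gp, dp, terms, default_p=1.0):
    """Euclidean norm of log P-value differences over `terms`.

    Assumption: a term absent from a profile has P-value `default_p`.
    P-values must be strictly positive, since log(0) is undefined.
    """
    return math.sqrt(sum(
        (math.log(gp.get(t, default_p)) - math.log(dp.get(t, default_p))) ** 2
        for t in terms
    ))


def sum_log_combined_p(gp, dp, terms, default_p=1.0):
    """Sum over `terms` of log(p + q - p*q) for gene P-value p, disease P-value q."""
    total = 0.0
    for t in terms:
        p = gp.get(t, default_p)
        q = dp.get(t, default_p)
        total += math.log(p + q - p * q)
    return total


def term_overlap(gene_terms, disease_terms):
    """Number of MeSH terms shared by both profiles."""
    return len(set(gene_terms) & set(disease_terms))


def term_coverage(gene_terms, disease_terms):
    """Number of MeSH terms present in either profile."""
    return len(set(gene_terms) | set(disease_terms))


# Toy P-value profiles with made-up term names, for illustration only.
gene_p = {"term_a": 0.01, "term_b": 0.5}
disease_p = {"term_a": 0.01, "term_c": 0.2}
all_terms = set(gene_p) | set(disease_p)      # stands in for M
shared_terms = set(gene_p) & set(disease_p)   # overlapping terms only

score_all = l2_log_p(gene_p, disease_p, all_terms)
score_shared = l2_log_p(gene_p, disease_p, shared_terms)
```

Passing `shared_terms` instead of `all_terms` gives the "overlapping terms only" variants, which avoids having to choose a default for terms that are missing from one profile.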