Table 3.
Explanation of the scoring functions evaluated
Scoring method | Description |
---|---|
Cosine distance of term frequency-inverse document frequency | |
Cosine distance of P-values | |
Cosine distance of term fractions | |
Sum of the log of combined P-values | |
Sum of the differences of log P-values | |
L2 of log-p of overlapping terms only | |
L2 of term fractions of overlapping terms only | |
L2 of log of P-values | |
L2 of P-values | |
L2 of term fractions | |
L2 of term frequency | |
Term coverage | \|G∪D\| |
Term overlap | \|G∩D\| |
Number of gene MeSH terms | \|G\| |
Number of disease MeSH terms | \|D\| |
Gene ID | Entrez Gene ID of the gene |
M refers to the set of all MeSH terms, G and D to the MeSH terms for the gene and disease profile, respectively. g(i), gf(i), gp(i) and gi(i) refer to the frequency, term fraction, hypergeometric P-value and term frequency-inverse document frequency for the MeSH term i of the gene profile. d(i), df(i), dp(i) and di(i) refer to the frequency, term fraction, hypergeometric P-value and term frequency-inverse document frequency for the MeSH term i of the disease profile.
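
To make the notation concrete, the following is a minimal Python sketch of four of the scoring families named in the table: term overlap, term coverage, cosine distance and L2 distance. It is an illustration under stated assumptions, not the exact formulas evaluated. It assumes a profile is a dictionary mapping each MeSH term i to one of the per-term values defined above (for example g(i) or gf(i)), that a term missing from a profile contributes 0, and that "cosine distance" means one minus the cosine similarity. The function names and the example profiles are hypothetical.

```python
import math


def term_overlap(gene, disease):
    """|G ∩ D|: number of MeSH terms present in both profiles."""
    return len(gene.keys() & disease.keys())


def term_coverage(gene, disease):
    """|G ∪ D|: number of MeSH terms present in either profile."""
    return len(gene.keys() | disease.keys())


def cosine_distance(gene, disease):
    """1 - cosine similarity; terms absent from a profile count as 0 (assumption)."""
    terms = gene.keys() | disease.keys()
    dot = sum(gene.get(t, 0.0) * disease.get(t, 0.0) for t in terms)
    norm_g = math.sqrt(sum(v * v for v in gene.values()))
    norm_d = math.sqrt(sum(v * v for v in disease.values()))
    if norm_g == 0 or norm_d == 0:
        # Assumption: an empty or all-zero profile is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_g * norm_d)


def l2_distance(gene, disease, overlap_only=False):
    """Euclidean distance over G ∪ D, or over G ∩ D when overlap_only is True."""
    if overlap_only:
        terms = gene.keys() & disease.keys()
    else:
        terms = gene.keys() | disease.keys()
    return math.sqrt(
        sum((gene.get(t, 0.0) - disease.get(t, 0.0)) ** 2 for t in terms)
    )


# Hypothetical term-frequency profiles, for illustration only.
gene_profile = {"Neoplasms": 3, "Apoptosis": 4}
disease_profile = {"Apoptosis": 4, "Mutation": 3}

overlap = term_overlap(gene_profile, disease_profile)
coverage = term_coverage(gene_profile, disease_profile)
cos_dist = cosine_distance(gene_profile, disease_profile)
l2_all = l2_distance(gene_profile, disease_profile)
l2_shared = l2_distance(gene_profile, disease_profile, overlap_only=True)
```

Passing term fractions, P-values or their logarithms instead of raw frequencies would give the corresponding variants in the table, provided the same zero-fill convention is appropriate for that quantity. For P-values it may not be, since an absent term is more naturally assigned a P-value of 1 than 0.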