Table 3. Explanation of the scoring functions evaluated
| Scoring method | Description |
|---|---|
| Cosine distance of term frequency-inverse document frequency | |
| Cosine distance of P-values | |
| Cosine distance of term fractions | |
| Sum of the log of combined P-values | |
| Sum of the differences of log P-values | |
| L2 of log-p of overlapping terms only | |
| L2 of term fractions of overlapping terms only | |
| L2 of log of P-values | |
| L2 of P-values | |
| L2 of term fractions | |
| L2 of term frequency | |
| Term coverage | \|G ∪ D\| |
| Term overlap | \|G ∩ D\| |
| Number of gene MeSH terms | \|G\| |
| Number of disease MeSH terms | \|D\| |
| Gene ID | Entrez Gene ID of the gene |
M refers to the set of all MeSH terms; G and D refer to the sets of MeSH terms in the gene and disease profiles, respectively. g(i), gf(i), gp(i) and gi(i) refer, respectively, to the frequency, term fraction, hypergeometric P-value and term frequency-inverse document frequency of MeSH term i in the gene profile. d(i), df(i), dp(i) and di(i) are the corresponding quantities for MeSH term i in the disease profile.
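
To make the families of scores in the table concrete, the following is a minimal Python sketch of the set-based scores (term coverage, term overlap) and of generic cosine and L2 distances over two MeSH profiles. It is not the authors' implementation. It assumes that each profile is stored as a dictionary mapping a MeSH term to a numeric value (a frequency, term fraction, P-value or TF-IDF weight), that cosine distance means one minus cosine similarity, and that a term absent from a profile takes a default value (0 here, which suits frequencies and fractions; a different default, such as 1, may be more appropriate for P-values). The function names and the example terms and counts are hypothetical.

```python
import math


def term_overlap(g, d):
    """|G ∩ D|: number of MeSH terms shared by both profiles."""
    return len(set(g) & set(d))


def term_coverage(g, d):
    """|G ∪ D|: number of MeSH terms present in either profile."""
    return len(set(g) | set(d))


def cosine_distance(g, d):
    """1 - cosine similarity of two sparse profiles (assumed definition).

    Terms missing from a profile are treated as 0.
    """
    dot = sum(g[t] * d[t] for t in g.keys() & d.keys())
    norm_g = math.sqrt(sum(v * v for v in g.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_g == 0 or norm_d == 0:
        return 1.0  # convention chosen here for an all-zero profile
    return 1.0 - dot / (norm_g * norm_d)


def l2_distance(g, d, overlap_only=False, missing=0.0):
    """Euclidean (L2) distance between two sparse profiles.

    With overlap_only=True, only terms in G ∩ D contribute;
    otherwise all terms in G ∪ D are used and absent terms
    take the value `missing` (an assumption of this sketch).
    """
    terms = (g.keys() & d.keys()) if overlap_only else (g.keys() | d.keys())
    return math.sqrt(
        sum((g.get(t, missing) - d.get(t, missing)) ** 2 for t in terms)
    )


# Hypothetical term frequencies for one gene and one disease profile.
gene_tf = {"Neoplasms": 3, "Apoptosis": 1}
disease_tf = {"Neoplasms": 4, "Mutation": 2}

print(term_overlap(gene_tf, disease_tf))
print(term_coverage(gene_tf, disease_tf))
print(cosine_distance(gene_tf, disease_tf))
print(l2_distance(gene_tf, disease_tf))
print(l2_distance(gene_tf, disease_tf, overlap_only=True))
```

The same two distance functions can be applied to any of the per-term quantities defined above (for example, term fractions or log-transformed P-values) by building the dictionaries from the corresponding values before calling them.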