Author manuscript; available in PMC: 2021 Oct 10.
Published in final edited form as: J Biomed Inform. 2017 Nov 1;76:41–49. doi: 10.1016/j.jbi.2017.10.013

Table 2.

Methods used to combine signal statistics

Method Description
Random Forests Uses the R package ‘randomForest’. Predictors: s1, s2, …, sK.
Parameters: ntree=100, mtry=2 (ntree: number of trees in the forest, mtry: number of variables randomly sampled as candidates at each tree split). Uses out-of-sample predictions.
Logistic Regression Uses the R function ‘glm’ (stats package). Predictors: s1, s2, …, sK.
Support Vector Machines Uses the R package ‘e1071’. Predictors: s1, s2, …, sK.
Parameters: kernel=“radial”, type=“C-classification” (for classification vs regression). The remaining parameters were set to their default values.
Naive Bayes Smoothing Model: $P(GT \mid s_1, s_2, \ldots, s_K) = \eta \prod_k P(GT \mid s_k) / P(GT)^{K-1}$.
η is a normalizing constant such that the class probabilities on the LHS sum to 1. P(GT|sk) is obtained by fitting a logistic regression to the set of test case signal statistics generated from data source k. The model combines these probabilities by assuming that P(s1, s2, …, sK|GT) = ∏k P(sk|GT), hence the name Naive Bayes. P(GT) was set to 0.5.
Arithmetic Average Model: $\bar{s} = (s_1 + s_2 + \cdots + s_K)/K$
Geometric Average Model: $\bar{s} = \sqrt[K]{s_1 s_2 \cdots s_K}$
Fixed Effects Model: $\bar{s} = \sum_k w_k \log(s_k) / \sum_k w_k$
$w_k = 1/\mathrm{VAR}(\log(s_k))$
Random Effects Model: $\bar{s} = \sum_k w_k \log(s_k) / \sum_k w_k$
$w_k = 1/(\mathrm{VAR}(\log(s_k)) + \tau^2)$,
$\tau^2$ is estimated using the DerSimonian and Laird method [52].
Empirical Bayes Model: $\bar{s} = \alpha \sum_k w_k \log(s_k) / \sum_k w_k + (1 - \alpha)\theta$
$w_k = 1/\mathrm{VAR}(\log(s_k))$, $\alpha = \tau^2/(\tau^2 + v^2)$, v2=kVAR(log(sk)/kVAR(log(sk)
$\tau^2$ and $\theta$ are estimated via the EM algorithm.

sk: signal statistic (ratio of eq. 1) produced from data source k for a given association (test case). s1, s2, …, sK: set of signal statistics produced from data sources 1 to K for a given association. GT: ground truth assigned to a test case (true/false).
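
As an illustration only, the closed-form combination rules in Table 2 (arithmetic and geometric averages, fixed and random effects, and the Naive Bayes smoothing formula) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the per-source variances VAR(log(sk)) and the per-source probabilities P(GT|sk) are assumed to be supplied by the caller, and the DerSimonian and Laird estimate uses the standard moment formula truncated at zero. The fixed and random effects functions return the combined statistic on the log scale, as the table's formulas do.

```python
import math


def arithmetic_average(s):
    """Arithmetic mean of the signal statistics s_1..s_K."""
    return sum(s) / len(s)


def geometric_average(s):
    """K-th root of the product, computed in log space (requires s_k > 0)."""
    return math.exp(sum(math.log(x) for x in s) / len(s))


def fixed_effects(log_s, var_log_s):
    """Inverse-variance weighted mean of log(s_k), on the log scale."""
    w = [1.0 / v for v in var_log_s]
    return sum(wk * y for wk, y in zip(w, log_s)) / sum(w)


def dersimonian_laird_tau2(log_s, var_log_s):
    """Moment estimate of the between-source variance tau^2 (assumed form)."""
    if len(log_s) < 2:
        return 0.0  # heterogeneity cannot be estimated from a single source
    w = [1.0 / v for v in var_log_s]
    sw = sum(w)
    mean_fe = sum(wk * y for wk, y in zip(w, log_s)) / sw
    # Cochran's Q: weighted squared deviations from the fixed effects mean
    q = sum(wk * (y - mean_fe) ** 2 for wk, y in zip(w, log_s))
    c = sw - sum(wk ** 2 for wk in w) / sw
    return max(0.0, (q - (len(w) - 1)) / c)


def random_effects(log_s, var_log_s):
    """Weighted mean of log(s_k) with weights 1 / (VAR(log(s_k)) + tau^2)."""
    tau2 = dersimonian_laird_tau2(log_s, var_log_s)
    w = [1.0 / (v + tau2) for v in var_log_s]
    return sum(wk * y for wk, y in zip(w, log_s)) / sum(w)


def naive_bayes_combine(posteriors, prior=0.5):
    """Combine per-source P(GT=true | s_k) into P(GT=true | s_1..s_K).

    Each class score is prod_k P(class | s_k) / P(class)^(K-1); dividing by
    the sum of both class scores plays the role of the constant eta.
    """
    k = len(posteriors)
    true_score = math.prod(posteriors) / prior ** (k - 1)
    false_score = math.prod(1.0 - p for p in posteriors) / (1.0 - prior) ** (k - 1)
    return true_score / (true_score + false_score)
```

For example, two sources that each assign a probability of 0.8 to a true association combine to roughly 0.94 under the Naive Bayes rule with P(GT) = 0.5, whereas the arithmetic average of the same two values stays at 0.8.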