Table 2.
Method | Description |
---|---|
Random Forests | Uses the R package ‘randomForest’. Predictors: s1, s2, …, sK. Parameters: ntree=100, mtry=2 (ntree: number of trees in the forest, mtry: number of variables randomly sampled as candidates at each tree split). Uses out-of-sample predictions. |
Logistic Regression | Uses the R function ‘glm’. Predictors: s1, s2, …, sK. |
Support Vector Machines | Uses the R package ‘e1071’. Predictors: s1, s2, …, sK. Parameters: kernel=“radial”, method = “C-classification” (i.e., classification rather than regression). The remaining parameters were set to their default values. |
Naive Bayes Smoothing | Model: P(GT \| s1, s2, …, sK) = η∏k P(GT \| sk)/P(GT)^(K−1), where η is a normalizing constant such that the class-conditional probabilities on the left-hand side sum to 1. P(GT \| sk) is obtained by fitting a logistic regression to the set of test-case signal statistics generated from data source k. The model combines these probabilities by assuming that P(s1, s2, …, sK \| GT) = ∏k P(sk \| GT), hence “Naive Bayes”. P(GT) was set to 0.5. |
Arithmetic Average | Model: ŝ = (1/K) ∑k sk, the mean of the K signal statistics. |
Geometric Average | Model: ŝ = (∏k sk)^(1/K). |
Fixed Effects | Model: ŝ = ∑k wk sk / ∑k wk, with weights wk = 1/vk², where vk² is the variance of sk. |
Random Effects | Model: ŝ = ∑k wk sk / ∑k wk, with weights wk = 1/(vk² + τ²); τ² is estimated using the DerSimonian and Laird method [52]. |
Empirical Bayes | Model: ŝk = αsk + (1 − α)θ, where α = τ²/(τ² + v²); τ² and θ are estimated via the EM algorithm. |
sk: signal statistic (ratio of eq. 1) produced from data source k for a given association (test case). s1, s2, …, sK: set of signal statistics produced from data sources 1 to K for a given association. GT: ground truth assigned to a test case (true/false).
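As a rough illustration of the Naive Bayes smoothing row, a minimal Python sketch of the combination formula P(GT \| s1, …, sK) = η∏k P(GT \| sk)/P(GT)^(K−1); the per-source probabilities passed in are hypothetical placeholders for the per-source logistic-regression outputs P(GT \| sk) described in the table:

```python
import math

def naive_bayes_combine(p_ks, prior=0.5):
    """Combine per-source posteriors P(GT=true | s_k) under the Naive Bayes
    independence assumption P(s_1..s_K | GT) = prod_k P(s_k | GT).
    `prior` is P(GT=true); eta normalizes the two class scores to sum to 1."""
    K = len(p_ks)
    # Unnormalized scores for GT = true and GT = false.
    score_true = math.prod(p_ks) / prior ** (K - 1)
    score_false = math.prod(1 - p for p in p_ks) / (1 - prior) ** (K - 1)
    eta = 1.0 / (score_true + score_false)
    return eta * score_true
```

With uninformative sources (all p_k = 0.5) the combined posterior stays at the prior; concordant sources push it toward 0 or 1 faster than any single source.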
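The meta-analytic rows (fixed effects, random effects, empirical Bayes) follow standard formulas; the Python sketch below is a minimal illustration under the assumption that per-source estimates sk and their variances vk² are known inputs (in the empirical-Bayes row the paper estimates τ² and θ via EM; here they are passed in directly):

```python
def fixed_effects(s, v):
    """Inverse-variance weighted average: w_k = 1 / v_k^2 (v holds variances)."""
    w = [1.0 / vk for vk in v]
    return sum(wk * sk for wk, sk in zip(w, s)) / sum(w)

def dersimonian_laird_tau2(s, v):
    """DerSimonian-Laird moment estimator of the between-source variance tau^2."""
    w = [1.0 / vk for vk in v]
    sw = sum(w)
    s_fixed = sum(wk * sk for wk, sk in zip(w, s)) / sw
    q = sum(wk * (sk - s_fixed) ** 2 for wk, sk in zip(w, s))  # Cochran's Q
    c = sw - sum(wk ** 2 for wk in w) / sw
    return max(0.0, (q - (len(s) - 1)) / c)  # truncate at zero

def random_effects(s, v):
    """Random-effects estimate: weights w_k = 1 / (v_k^2 + tau^2)."""
    tau2 = dersimonian_laird_tau2(s, v)
    w = [1.0 / (vk + tau2) for vk in v]
    return sum(wk * sk for wk, sk in zip(w, s)) / sum(w)

def empirical_bayes_shrink(sk, theta, tau2, v2):
    """Shrink one estimate toward theta with alpha = tau^2 / (tau^2 + v^2)."""
    alpha = tau2 / (tau2 + v2)
    return alpha * sk + (1 - alpha) * theta
```

With equal per-source variances the fixed- and random-effects estimates both reduce to the arithmetic average; they differ only when variances are unequal and between-source heterogeneity (τ² > 0) is present.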