Author manuscript; available in PMC: 2021 Oct 10.

Published in final edited form as: J Biomed Inform. 2017 Nov 1;76:41–49. doi: 10.1016/j.jbi.2017.10.013

Table 2.

Methods used to combine signal statistics

Method	Description
Random Forests	Uses the R package ‘randomForest’. Predictors: s₁, s₂, …, s_K. Parameters: ntree=100, mtry=2 (ntree: number of trees in the forest, mtry: number of variables randomly sampled as candidates at each tree split). Uses out-of-sample predictions.
Logistic Regression	Uses the R function ‘glm’ (part of the base ‘stats’ package). Predictors: s₁, s₂, …, s_K.
Support Vector Machines	Uses the R package ‘e1071’. Predictors: s₁, s₂, …, s_K. Parameters: kernel = “radial”, type = “C-classification” (selects classification rather than regression). The remaining parameters were set to their default values.
Naive Bayes Smoothing	Model: P(GT | s₁, s₂, …, s_K) = η ∏_k P(GT | s_k) / P(GT)^{K−1}. η is a normalizing constant such that the class conditional probabilities on the left-hand side sum to 1. P(GT | s_k) is obtained by fitting a logistic regression to the set of test case signal statistics generated from data source k. The model combines these probabilities by assuming that P(s₁, s₂, …, s_K | GT) = ∏_k P(s_k | GT), hence Naive Bayes. P(GT) was set to 0.5.
Arithmetic Average	Model: $\bar{s} = (s_{1} + s_{2} + \dots + s_{K}) / K$
Geometric Average	Model: $\bar{s} = \sqrt[K]{s_{1} \cdot s_{2} \cdot \dots \cdot s_{K}}$
Fixed Effects	Model: $\bar{s} = \sum_{k} w_{k} \log(s_{k}) / \sum_{k} w_{k}$, $w_{k} = 1 / \mathrm{VAR}(\log(s_{k}))$
Random Effects	Model: $\bar{s} = \sum_{k} w_{k} \log(s_{k}) / \sum_{k} w_{k}$, $w_{k} = 1 / (\mathrm{VAR}(\log(s_{k})) + \tau^{2})$; τ² is estimated using the DerSimonian and Laird method [52].
Empirical Bayes	Model: $\bar{s} = \alpha \sum_{k} w_{k} \log(s_{k}) / \sum_{k} w_{k} + (1 - \alpha)\theta$, $w_{k} = 1 / \mathrm{VAR}(\log(s_{k}))$, $\alpha = \tau^{2} / (\tau^{2} + v^{2})$, $v^{2} = \prod_{k} \mathrm{VAR}(\log(s_{k})) / \sum_{k} \mathrm{VAR}(\log(s_{k}))$; τ² and θ are estimated via the EM algorithm.

s_k: signal statistic (ratio of eq. 1) produced from data source k for a given association (test case). s₁, s₂, …, s_K: set of signal statistics produced from data sources 1 to K for a given association. GT: ground truth assigned to a test case (true/false).
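
The closed-form rows of Table 2 can be illustrated with a minimal Python sketch. This is not the authors' implementation (the study reports using R). The function names are hypothetical, and the variances VAR(log(s_k)) are assumed to be supplied by the caller, since the table does not say how they are computed. The DerSimonian and Laird estimate of τ² uses the standard method-of-moments formula, truncated at zero. As in the table's formulas, the Fixed Effects and Random Effects functions return the pooled statistic on the log scale.

```python
import math

def arithmetic_average(s):
    """Arithmetic mean of the K signal statistics."""
    return sum(s) / len(s)

def geometric_average(s):
    """K-th root of the product, computed on the log scale for stability."""
    return math.exp(sum(math.log(x) for x in s) / len(s))

def fixed_effects(s, var_log_s):
    """Inverse-variance weighted mean of log(s_k).

    var_log_s holds VAR(log(s_k)) per source (assumed given by the caller).
    """
    w = [1.0 / v for v in var_log_s]
    return sum(wk * math.log(sk) for wk, sk in zip(w, s)) / sum(w)

def dersimonian_laird_tau2(s, var_log_s):
    """Method-of-moments estimate of the between-source variance tau^2."""
    y = [math.log(x) for x in s]
    w = [1.0 / v for v in var_log_s]
    y_fe = sum(wk * yk for wk, yk in zip(w, y)) / sum(w)
    q = sum(wk * (yk - y_fe) ** 2 for wk, yk in zip(w, y))
    c = sum(w) - sum(wk ** 2 for wk in w) / sum(w)
    if c <= 0:  # happens when K = 1; no heterogeneity can be estimated
        return 0.0
    return max(0.0, (q - (len(s) - 1)) / c)

def random_effects(s, var_log_s):
    """Weighted mean of log(s_k) with weights 1 / (VAR(log(s_k)) + tau^2)."""
    tau2 = dersimonian_laird_tau2(s, var_log_s)
    w = [1.0 / (v + tau2) for v in var_log_s]
    return sum(wk * math.log(sk) for wk, sk in zip(w, s)) / sum(w)

def naive_bayes_combine(p, prior=0.5):
    """Combine per-source posteriors P(GT=true | s_k) into P(GT=true | s_1..s_K).

    Evaluates eta * prod_k P(GT | s_k) / P(GT)^(K-1) for both classes and
    normalizes so the two class probabilities sum to 1.
    """
    k = len(p)
    pos = math.prod(p) / prior ** (k - 1)
    neg = math.prod(1.0 - pk for pk in p) / (1.0 - prior) ** (k - 1)
    return pos / (pos + neg)
```

With the prior set to 0.5, as in the table, the prior terms cancel in `naive_bayes_combine`, so the result depends only on the per-source probabilities. For example, two sources that each give P(GT | s_k) = 0.8 combine to 0.64 / (0.64 + 0.04), roughly 0.94.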