Table 3:
Method name | Strategy | Main advantages | Main limitations | Citation |
---|---|---|---|---|
ComBat | Empirical Bayes | Removes batch effect in most cases | Removes biological signal in most cases | [4] |
RUV | Linear model | Effective with spike-in controls | Individual variants make specific assumptions about the data | [79] |
removeBatchEffect | Linear model | Generalizable to most transcriptomic data types | May be less effective in complex experimental designs | [80] |
SVA | Linear model | Generalizable to many cases | Assumes that feature similarities between datasets are due to biology | [81] |
mnnCorrect | Mutual nearest neighbours | Accounts for heterogeneity within sample groups | Restricted to single-cell data | [78] |
MINT | Multivariate model | Robust to overfitting and strong multidimensional technical variation | Minimum sample count requirement | [82] |
Scanorama | Mutual nearest neighbours | Scales to very large sample sizes. Robust to overcorrection | Restricted to single-cell data | [83] |
MultiCluster | Tensor decomposition | Accounts for multiple batch variables simultaneously | Restricted to 3-way variable comparisons | [84] |
zeroSum | Zero sum regression | Generalizable across different technologies and platforms | Weak or non-linear features may be masked by strong features | [85] |
Batch correction is a special case of intra-modality harmonization and is included for completeness because many of the underlying strategies also apply to broader data integration. All methods listed are restricted to a single data modality, transcriptomics. For each method, the table gives the name, the strategy, a few major advantages and limitations, and a citation to the original publication, where full details can be found.
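
To make the "Linear model" strategy in the table concrete, the following is a minimal sketch of regression-based batch removal in Python with NumPy. It is written in the spirit of that strategy and is not the implementation of any of the cited packages. The function name `remove_batch_linear`, its arguments, and the choice of sum-to-zero coding for the batch term are assumptions made for illustration. The idea is to fit each feature with a model containing both the biological grouping and the batch label, then subtract only the fitted batch component.

```python
import numpy as np

def remove_batch_linear(expr, batch, group=None):
    """Subtract a fitted additive batch term from each feature (hypothetical helper).

    expr  : array of shape (features, samples), e.g. log-expression values
    batch : length-samples array of batch labels
    group : optional length-samples array of biological labels to protect
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    n = expr.shape[1]

    # Sum-to-zero coding for batch: k levels give k-1 columns and the last
    # level is coded as -1, so the fitted batch effects sum to zero.
    levels = np.unique(batch)
    B = np.zeros((n, len(levels) - 1))
    for j, lev in enumerate(levels[:-1]):
        B[batch == lev, j] = 1.0
    B[batch == levels[-1], :] = -1.0

    # Design for what should be kept: intercept plus group indicators.
    cols = [np.ones((n, 1))]
    if group is not None:
        group = np.asarray(group)
        for lev in np.unique(group)[1:]:
            cols.append((group == lev).astype(float)[:, None])
    D = np.hstack(cols)

    # Least-squares fit of every feature on [kept terms | batch terms].
    X = np.hstack([D, B])
    coef, *_ = np.linalg.lstsq(X, expr.T, rcond=None)

    # Remove only the batch component of the fit.
    batch_coef = coef[D.shape[1]:, :]
    return expr - (B @ batch_coef).T
```

Passing `group` matters when batches are unbalanced with respect to biology: without it, part of the biological difference can be absorbed into the batch term and removed, which is the kind of signal loss the limitations column warns about. When batch and group are fully confounded, the model cannot separate the two effects and no linear adjustment of this kind is reliable.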