2020 Jun 16;9(6):giaa064. doi: 10.1093/gigascience/giaa064

Table 3: Intra-modality data harmonization approaches

| Method name | Strategy | Main advantages | Main limitations | Citation |
|---|---|---|---|---|
| ComBat | Empirical Bayes | Removes batch effect in most cases | Can also remove biological signal | [4] |
| RUV | Linear model | Effective with spike-in controls | Individual variants make specific assumptions about the data | [79] |
| removeBatchEffect | Linear model | Generalizable to most transcriptomic data types | May be less effective in complex experimental designs | [80] |
| SVA | Linear model | Generalizable to many cases | Assumes that feature similarities between datasets are due to biology | [81] |
| mnnCorrect | Mutual nearest neighbours | Accounts for heterogeneity within sample groups | Restricted to single-cell data | [78] |
| MINT | Multivariate model | Robust to overfitting and strong multidimensional technical variation | Minimum sample count requirement | [82] |
| Scanorama | Mutual nearest neighbours | Scales to very large sample sizes; robust to overcorrection | Restricted to single-cell data | [83] |
| MultiCluster | Tensor decomposition | Accounts for multiple batch variables simultaneously | Restricted to 3-way variable comparisons | [84] |
| zeroSum | Zero-sum regression | Generalizable across different technologies and platforms | Weak or non-linear features may be masked by strong features | [85] |

Batch correction is a special case of intra-modality harmonization and is included for completeness because many of the underlying strategies are applicable to broader data integration. All methods listed are restricted to a single data modality, transcriptomics. For each method, the name, strategy, and a few major advantages and limitations are given, together with a citation to the original publication, where full details can be found.
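
To make the "linear model" strategy in the table concrete, the following is a minimal sketch of its simplest case: a model with batch as the only covariate, fitted per feature, whose estimated batch offsets are then subtracted. It is an illustration only and not the implementation of removeBatchEffect, RUV, or SVA; the function name `center_batches` and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_batches(values, batches):
    """Remove additive batch shifts from one feature's values.

    values  : expression values for one feature, one per sample
    batches : batch label for each sample, same length as values
    (Hypothetical helper for illustration; not a library API.)
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    # Group the values by batch label.
    by_batch = defaultdict(list)
    for v, b in zip(values, batches):
        by_batch[b].append(v)

    # Per-batch means, and a reference level (the mean of the batch means).
    batch_means = {b: mean(vs) for b, vs in by_batch.items()}
    reference = mean(list(batch_means.values()))

    # Subtract each sample's estimated batch offset from its value.
    return [v - (batch_means[b] - reference) for v, b in zip(values, batches)]


# Toy data: batch B is shifted upwards relative to batch A.
expression = [5.0, 6.0, 7.0, 9.0, 10.0, 11.0]
batch = ["A", "A", "A", "B", "B", "B"]
corrected = center_batches(expression, batch)
```

The sketch treats every difference between batch means as technical. If the biological groups are unevenly distributed across batches, the same subtraction also removes biological signal, which is why practical linear-model methods include the biological covariates in the design.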