2012 Nov 6;45(1):1–16. doi: 10.1152/physiolgenomics.00082.2012

Table 1.

Algorithms developed for CNV detection using array-based data

Platform	Data Type	Algorithm	Advantages	Disadvantages	Size Range of CNVs	Ref. List No.
PennCNV	Illumina (mainly) and Affymetrix	Hidden Markov Model (HMM)	Detects smaller CNVs and uses family trio information to detect inherited and novel CNVs	Assumes that the model parameters are known, which is not true when normal tissue contamination across samples requires sample-specific model parameters	Kilo-bp resolution, with a median size of 12 kb	85
PennCNV	Illumina (mainly) and Affymetrix	Hidden Markov Model (HMM)	Pedigree information significantly improved the sensitivity and false discovery rate	Sensitive to intensity noise	Kilo-bp resolution, with a median size of 12 kb	85
QuantiSNP	Illumina	Objective Bayes-HMM	Ability to combine data from several platforms of differing resolution	Detected CNVs are very large (>190 kb)	>190 kb	13
			Provides probabilistic quantification of state classifications
			Significantly increased the accuracy of segmental aneuploidy identification and mapping
GenoCNV	Illumina	HMM	Data-driven parameter estimation	Designed for Illumina SNP arrays	NA	80
			Effects of tissue contamination are explicitly modeled in tumor samples	An appropriate normalization and transformation method is needed for Affymetrix SNP arrays
			Detects cancer CNVs in tumor samples contaminated with normal tissue
			Reports genotype information within CNV regions
MixHMM	Illumina	HMM	Deals directly with heterogeneity caused by stromal contamination in clinical samples, allowing detection of tumor CNV events in heterogeneous tumor samples	Is not intended to distinguish among multiple clones	NA	51
			Detects CNVs with copy numbers ≤7	Works only for detection in autosomes
			Allows more complete and accurate description of other forms of allelic imbalance
ACNE	Affymetrix	Multisample summarization method based on nonnegative matrix factorization	Deals directly with cross-hybridization between probes within a SNP probeset; estimates allele-specific CNVs	Relies on the data given by CRMA v2 after background and offset removal	Mb resolution	65
CRLMM	Affymetrix SNP 6.0	Linear model to estimate batch- and locus-specific parameters	Specifically designed to address batch effects and uncertainty	For the study of germline traits, the model is most useful when 25 or more samples have been processed together in a batch	NA	75
			Does not require reference samples to estimate model parameters	The assumption of fixed batch effect and linearity between average intensity and allelic dosage is not always true
			Uses biallelic genotype calls from experimental data to estimate batch-specific and locus-specific parameters of background and signal without the requirement of training data
ITALICS	Affymetrix	Uses GLAD algorithm to estimate biological signal; estimates nonrelevant effects by multiple linear regression	Estimates both biological and nonrelevant effects in an alternating, iterative manner; accurately eliminates nonrelevant effects	Is based solely on perfect match (PM) probes; needs a reference dataset for calculating the quartet effect	NA	71
ITALICS	Affymetrix		Corrects for experimental artifacts caused by spatial effects		NA	71
CNstream	Illumina BeadChip	Heuristics and parametric statistics to assign a confidence score to each sample at each probe	Combines joint clustering analysis of all intensity samples at each probe with the computational speed and analytical accuracy of estimating copy number polymorphisms (CNPs) from segments of consecutive probes	Designed for Illumina arrays	10–100 kb	3
			Allows a whole-genome analysis to be carried out without the need for previously defined CNP maps
			High level of accuracy was achieved by analyzing the measures of each intensity channel separately and combining information from multiple samples
cn.FARMS	Affymetrix, Illumina and Agilent array	Probabilistic latent variable model, which is optimized by a Bayesian maximum a posteriori approach	Controls for false positive rate	Does not do segmentation of integer copy number estimation	NA	12
cn.FARMS	Affymetrix, Illumina and Agilent array		Faster computation with low computational load	Does not do segmentation of integer copy number estimation	NA	12
GADA	CGH, Affymetrix and Illumina	Compact linear algebra representation for the genome copy number based on normalized probe intensities; then uses sparse Bayesian learning (SBL) and backward elimination (BE) for breakpoint identification	Fast computational speed	Does not include allele-specific copy number data	NA	69
GADA	CGH, Affymetrix and Illumina		Low false discovery rate	Does not include allele-specific copy number data	NA	69
CNVworkshop	A variety of SNP array platforms	Genotype-specific extension of the circular binary segmentation (CBS) algorithm	Integration with the UCSC Genome Browser	Predominantly uses the CBS algorithm, which carries a heavy computational cost	NA	26
			Tabular displays of genomic and pathogenic attributes for each CNV
			Results are easily queried, sorted, filtered, and visualized via a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data
R-Gada	Illumina, Affymetrix and aCGH arrays	Uses Bayesian learning and BE for segmentation, parallel computation	Fast and accurate segmentation algorithm to call CNVs	Relies on other packages for normalization and segmentation procedures; uses only Log R ratio in the analysis; sensitivity, given by the maximum breakpoint sparseness, is controlled by the hyperparameter a	2 kb–8.5 Mb	68
			Multivariate analysis of population structure
			Has tools for performing genome wide association studies
			Has a complete set of tools for visualizing and reporting copy number alterations
NEXUS	CGH and SNP arrays, data generated by array-image analysis software	Rank Segmentation, SNPRank Segmentation, FASST Segmentation and SNP-FASST Segmentation	Convenient tool to explore the data on the population level by performing various statistical analyses, such as class comparisons, clustering, enrichment analysis, and survival analysis	NA	NA	17
SCIMMkit	Illumina Infinium II and GoldenGate SNP assays	Two rounds of mixture likelihood-based clustering	Enables targeted interrogation of CNVs using Illumina Infinium II and GoldenGate SNP assays	Does not explicitly include batch effects in its statistical model	1.6–560 kb	93
			Can be applied to standardized genome-wide SNP arrays and customized multiplexed SNP panels
			Provides economy, efficiency and flexibility in experimental design
GLAD	Array-based, comparative genomic hybridization (CGH) data	Local constant Gaussian regression model using a weighted maximum likelihood estimator	Automatic detection of breakpoints	Fails to detect very fine structure	NA	36
GLAD	Array-based, comparative genomic hybridization (CGH) data		Detects large regions with high accuracy	Fails to detect very fine structure	NA	36
CBS	Array-based, comparative genomic hybridization (CGH) data	Maximum of a likelihood ratio statistic is used recursively to detect narrower segments of aberration	Consistent	Heavy computational cost, with the majority of its computational time spent on significance evaluation	NA	35, 64
CBS	Array-based, comparative genomic hybridization (CGH) data		Low FDR		NA	35, 64

* NA, not available.
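
Several entries in the table (CBS, CNVworkshop, GLAD, GADA) rest on the idea of locating breakpoints in a sequence of probe intensities. The sketch below illustrates the recursive step described for CBS in a deliberately simplified form: plain (non-circular) binary segmentation on log R ratio-like values, with a fixed threshold in place of the permutation-based significance evaluation that the table identifies as the main computational cost. The function names, the `sigma` noise parameter, and the default `threshold` and `min_len` values are all assumptions made for illustration; this is not the published CBS implementation.

```python
import math


def best_split(values):
    """Return (index, score) for the split that maximizes a two-sample
    mean-difference statistic; index is None if no split scores above 0."""
    n = len(values)
    total = sum(values)
    best_i, best_score = None, 0.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        right_sum = total - left_sum
        diff = left_sum / i - right_sum / (n - i)
        # Scale by the standard error factor of a difference of two means.
        score = abs(diff) / math.sqrt(1.0 / i + 1.0 / (n - i))
        if score > best_score:
            best_i, best_score = i, score
    return best_i, best_score


def segment(values, sigma, threshold=3.0, min_len=2, start=0):
    """Recursively split `values` into half-open (start, end) segments.

    `sigma` is an assumed, known noise standard deviation; a real tool
    would estimate it and test significance (e.g. by permutation).
    """
    n = len(values)
    if n < 2 * min_len:
        return [(start, start + n)]
    i, score = best_split(values)
    if i is None or score / sigma < threshold or min(i, n - i) < min_len:
        return [(start, start + n)]
    left = segment(values[:i], sigma, threshold, min_len, start)
    right = segment(values[i:], sigma, threshold, min_len, start + i)
    return left + right


# Toy profile: a 10-probe gain flanked by two normal regions.
profile = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
print(segment(profile, sigma=0.2))  # [(0, 10), (10, 20), (20, 30)]
```

The circular variant used by CBS tests pairs of breakpoints at once, which makes it better suited to short aberrations embedded in a long normal region; the one-breakpoint-at-a-time version above is only meant to show the recursion.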