2012 Nov 6;45(1):1–16. doi: 10.1152/physiolgenomics.00082.2012

Table 1.

Algorithms developed for CNV detection using array-based data

| Platform | Data Type | Algorithm | Advantages | Disadvantages | Size Range of CNVs | Ref. List No. |
|---|---|---|---|---|---|---|
| PennCNV | Illumina (mainly) and Affymetrix | Hidden Markov Model (HMM) | Detects smaller CNVs and uses family trio information to detect inherited and novel CNVs; pedigree information significantly improved the sensitivity and false discovery rate | Assumes that the model parameters are known, which is not true when normal tissue contamination across samples requires sample-specific model parameters; sensitive to intensity noise | Kilo-bp resolution, with a median size of 12 kb | 85 |
| QuantiSNP | Illumina | Objective Bayes-HMM | Ability to combine data from several platforms of differing resolution; provides probabilistic quantification of state classifications; significantly increased the accuracy of segmental aneuploidy identification and mapping | Size of detected CNVs is very large (>190 kb) | >190 kb | 13 |
| GenoCNV | Illumina | HMM | Data-driven parameter estimation; effects of tissue contamination are explicitly modeled in tumor samples; detects cancer CNVs in tumor samples contaminated with normal tissue; reports genotype information within CNV regions | Designed for Illumina SNP arrays; an appropriate normalization and transformation method is needed for Affymetrix SNP arrays | NA | 80 |
| MixHMM | Illumina | HMM | Deals directly with heterogeneity caused by stromal contamination in clinical samples, allowing detection of tumor CNV events in heterogeneous tumor samples; CNV detection for copy numbers ≤7; allows more complete and accurate description of other forms of allelic imbalance | Is not intended to distinguish among multiple clones; works only for detection in autosomes | NA | 51 |
| ACNE | Affymetrix | Multisample summarization method based on nonnegative matrix factorization | Deals directly with the cross hybridization between probes within a SNP probeset; estimates the allele-specific CNVs | Relies on the data given by CRMA v2 after background and offset removal | Mb resolution | 65 |
| CRLMM | Affymetrix SNP 6.0 | Linear model to estimate batch- and locus-specific parameters | Specifically designed to address batch effects and uncertainty; does not require reference samples to estimate model parameters; uses biallelic genotype calls from experimental data to estimate batch-specific and locus-specific parameters of background and signal without the requirement of training data | For the study of germline traits, the model is most useful when 25 or more samples have been processed together in a batch; the assumption of fixed batch effect and linearity between average intensity and allelic dosage is not always true | NA | 75 |
| ITALICS | Affymetrix | Uses GLAD algorithm to estimate biological signal; estimates nonrelevant effects by multiple linear regression | Estimates both biological and nonrelevant effects in an alternate, iterative manner, accurately eliminates irrelevant effects; corrects for experimental artifacts caused by spatial effects | Is based solely on perfect match (PM) probes; needs to obtain a reference dataset for calculating the quartet effect | NA | 71 |
| CNstream | Illumina Beadchip | Heuristics and parametrical statistics to assign a confidence score to each sample at each probe | Joint clustering analysis of all the intensity samples at each probe with the computational speed and analytical accuracy of estimating copy number polymorphisms (CNPs) from segments of consecutive probes; allows a whole-genome analysis to be carried out without the need for previously defined CNP maps; high level of accuracy was achieved by analyzing the measures of each intensity channel separately and combining information from multiple samples | Designed for Illumina arrays | 10–100 kb | 3 |
| cn.FARMS | Affymetrix, Illumina and Agilent arrays | Probabilistic latent variable model, which is optimized by a Bayesian maximum a posteriori approach | Controls for false positive rate; faster computation with low computational load | Does not do segmentation of integer copy number estimation | NA | 12 |
| GADA | CGH, Affymetrix and Illumina | Compact linear algebra representation for the genome copy number based on normalized probe intensities, then uses sparse Bayesian learning (SBL) and backward elimination (BE) for breakpoint identification | Fast computational speed; low false discovery rate | Does not include allele-specific copy number data | NA | 69 |
| CNVworkshop | A variety of SNP array platforms | Genotype-specific extension of the Circular Binary Segmentation algorithm | Integration with the UCSC Genome Browser; tabular displays of genomic and pathogenic attributes for each CNV; results are easily queried, sorted, filtered, and visualized via a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data | Predominantly uses the circular binary segmentation (CBS) algorithm, which carries a heavy computational cost | NA | 26 |
| R-Gada | Illumina, Affymetrix and aCGH arrays | Uses Bayesian learning and BE for segmentation, parallel computation | Fast and accurate segmentation algorithm to call CNVs; multivariate analysis of population structure; has tools for performing genome-wide association studies; has a complete set of tools for visualizing and reporting copy number alterations | Relies on other packages for normalization and segmentation procedure; uses only Log R ratio in the analysis; sensitivity, given by the maximum breakpoint sparseness, is controlled by the hyperparameter a | 2 kb–8.5 Mb | 68 |
| NEXUS | CGH and SNP arrays, data generated by array-image analysis software | Rank Segmentation, SNPRank Segmentation, FASST Segmentation and SNP-FASST Segmentation | Convenient tool to explore the data on the population level by performing various statistical analyses, such as class comparisons, clustering, enrichment analysis, survival analysis | NA | NA | 17 |
| SCIMMkit | Illumina Infinium II and GoldenGate SNP assays | Two rounds of mixture likelihood-based clustering | For targeted interrogation of CNVs using Illumina Infinium II and GoldenGate SNP assays; can be applied to standardized genome-wide SNP arrays and customized multiplexed SNP panels; provides economy, efficiency and flexibility in experimental design | Does not explicitly include batch effects in its statistical model | 1.6–560 kb | 93 |
| GLAD | Array-based comparative genomic hybridization (CGH) data | Local constant Gaussian regression model using a weighted maximum likelihood estimator | Automatic detection of breakpoints; detects large regions with high accuracy | Fails to detect very fine structure | NA | 36 |
| CBS | Array-based comparative genomic hybridization (CGH) data | Maximum of a likelihood ratio statistic is used recursively to detect narrower segments of aberration | Consistent; low FDR | Heavy computation cost, with the majority of its computational time spent on significance evaluation | NA | 35, 64 |

* NA, not available.
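
The CBS entry in Table 1 describes a likelihood ratio statistic that is maximized and then applied recursively to narrower segments. The recursive idea can be illustrated with a minimal Python sketch. This is plain binary segmentation, not the circular variant used by CBS, and it swaps the permutation-based significance evaluation for a fixed cutoff. The function names (`best_split`, `segment`), the noise level `sigma`, the `threshold` of 4.0, and the toy log R ratio values are all assumptions made for illustration; none of them come from the cited tools.

```python
from math import sqrt


def best_split(values, sigma, min_len):
    """Return (index, statistic) of the split with the largest standardized
    difference in means, or (None, 0.0) when no split beats zero.

    For Gaussian noise with known standard deviation `sigma`, this statistic
    is the square root of the likelihood ratio statistic for one change in mean.
    """
    n = len(values)
    total = sum(values)
    best_i, best_stat = None, 0.0
    left_sum = sum(values[:min_len - 1])
    for i in range(min_len, n - min_len + 1):
        left_sum += values[i - 1]  # left_sum now equals sum(values[:i])
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        stat = abs(left_mean - right_mean) / (sigma * sqrt(1.0 / i + 1.0 / (n - i)))
        if stat > best_stat:
            best_i, best_stat = i, stat
    return best_i, best_stat


def segment(values, sigma, threshold=4.0, min_len=2, offset=0):
    """Recursively split `values`; return a list of (start, end, mean)
    tuples with `end` exclusive. The fixed `threshold` is an assumption
    standing in for a proper significance test."""
    n = len(values)
    if n == 0:
        return []
    if n >= 2 * min_len:
        i, stat = best_split(values, sigma, min_len)
        if i is not None and stat > threshold:
            left = segment(values[:i], sigma, threshold, min_len, offset)
            right = segment(values[i:], sigma, threshold, min_len, offset + i)
            return left + right
    return [(offset, offset + n, sum(values) / n)]


# Toy log R ratio profile (hypothetical): normal, a 6-probe loss, normal.
values = (
    [0.02, -0.03, 0.01, -0.01, 0.03, -0.02, 0.00, 0.01, -0.02, 0.01]
    + [-0.58, -0.62, -0.61, -0.59, -0.60, -0.63]
    + [0.01, -0.01, 0.02, -0.02, 0.00, 0.03, -0.03, 0.01]
)
segments = segment(values, sigma=0.1)
for start, end, seg_mean in segments:
    print(start, end, round(seg_mean, 3))
```

On this toy profile the sketch separates the six-probe loss from its flanking regions. A real analysis would estimate the noise level from the data and assess each candidate breakpoint by permutation or a comparable test, which is where Table 1 notes that CBS spends most of its computation time.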