Table 1.
Algorithms developed for CNV detection using array-based data
Tool | Platform/Data Type | Algorithm | Advantages | Disadvantages | Size Range of CNVs | Ref. List No. |
---|---|---|---|---|---|---|
PennCNV | Illumina (mainly) and Affymetrix | Hidden Markov model (HMM) | Detects smaller CNVs; uses family trio information to detect inherited and de novo CNVs; pedigree information significantly improves sensitivity and reduces the false discovery rate | Assumes that the model parameters are known, which does not hold when normal-tissue contamination varies across samples and sample-specific model parameters are required; sensitive to intensity noise | Kilobase resolution, with a median size of 12 kb | 85 |
QuantiSNP | Illumina | Objective Bayes HMM | Can combine data from several platforms of differing resolution; provides probabilistic quantification of state classifications; significantly increases the accuracy of segmental aneuploidy identification and mapping | Detected CNVs are very large (>190 kb) | >190 kb | 13 |
GenoCNV | Illumina | HMM | Data-driven parameter estimation; explicitly models the effects of normal-tissue contamination, so it detects cancer CNVs in tumor samples contaminated with normal tissue; reports genotype information within CNV regions | Designed for Illumina SNP arrays; Affymetrix SNP arrays require an appropriate normalization and transformation method | NA | 80 |
MixHMM | Illumina | HMM | Directly handles the heterogeneity caused by stromal contamination in clinical samples, allowing detection of tumor CNV events in heterogeneous tumor samples; detects CNVs with copy numbers up to 7; allows a more complete and accurate description of other forms of allelic imbalance | Not intended to distinguish among multiple clones; works only on autosomes | NA | 51 |
ACNE | Affymetrix | Multisample summarization method based on nonnegative matrix factorization | Deals directly with cross-hybridization between probes within a SNP probe set; estimates allele-specific copy numbers | Relies on the data produced by CRMA v2 after background and offset removal | Mb resolution | 65 |
CRLMM | Affymetrix SNP 6.0 | Linear model to estimate batch- and locus-specific parameters | Specifically designed to address batch effects and uncertainty; does not require reference samples to estimate model parameters; uses biallelic genotype calls from the experimental data to estimate batch- and locus-specific background and signal parameters without training data | For studies of germline traits, the model is most useful when 25 or more samples are processed together in a batch; the assumptions of a fixed batch effect and of linearity between average intensity and allelic dosage do not always hold | NA | 75 |
ITALICS | Affymetrix | Uses the GLAD algorithm to estimate the biological signal; estimates nonrelevant effects by multiple linear regression | Estimates biological and nonrelevant effects alternately and iteratively, accurately removing irrelevant effects; corrects for experimental artifacts caused by spatial effects | Based solely on perfect match (PM) probes; requires a reference dataset to calculate the quartet effect | NA | 71 |
CNstream | Illumina BeadChip | Heuristics and parametric statistics to assign a confidence score to each sample at each probe | Joint clustering of all intensity samples at each probe, combining computational speed with accuracy in estimating copy number polymorphisms (CNPs) from segments of consecutive probes; allows whole-genome analysis without previously defined CNP maps; achieves high accuracy by analyzing each intensity channel separately and combining information from multiple samples | Designed for Illumina arrays | 10–100 kb | 3 |
cn.FARMS | Affymetrix, Illumina and Agilent arrays | Probabilistic latent variable model optimized by a Bayesian maximum a posteriori approach | Controls the false positive rate; fast computation with low computational load | Does not perform segmentation or integer copy number estimation | NA | 12 |
GADA | aCGH, Affymetrix and Illumina | Compact linear algebra representation of genome copy number based on normalized probe intensities, followed by sparse Bayesian learning (SBL) and backward elimination (BE) for breakpoint identification | Fast computation; low false discovery rate | Does not include allele-specific copy number data | NA | 69 |
CNVworkshop | A variety of SNP array platforms | Genotype-specific extension of the circular binary segmentation algorithm | Integration with the UCSC Genome Browser; tabular displays of genomic and pathogenic attributes for each CNV; results are easily queried, sorted, filtered and visualized through a web-based presentation layer that includes a GBrowse-based graphical representation of CNV content and relevant public data | Relies predominantly on the circular binary segmentation (CBS) algorithm, which carries a heavy computational cost | NA | 26 |
R-GADA | Illumina, Affymetrix and aCGH arrays | Sparse Bayesian learning and BE for segmentation, with parallel computation | Fast and accurate segmentation for calling CNVs; multivariate analysis of population structure; tools for genome-wide association studies; a complete set of tools for visualizing and reporting copy number alterations | Relies on other packages for normalization and segmentation procedures; uses only the log R ratio in the analysis; sensitivity, given by the maximum breakpoint sparseness, is controlled by the hyperparameter a | 2 kb–8.5 Mb | 68 |
NEXUS | aCGH and SNP arrays; data generated by array-image analysis software | Rank Segmentation, SNPRank Segmentation, FASST Segmentation and SNP-FASST Segmentation | Convenient tool for exploring data at the population level through statistical analyses such as class comparison, clustering, enrichment analysis and survival analysis | NA | NA | 17 |
SCIMMkit | Illumina Infinium II and GoldenGate SNP assays | Two rounds of mixture likelihood-based clustering | Targeted interrogation of CNVs using Illumina Infinium II and GoldenGate SNP assays; can be applied to standardized genome-wide SNP arrays and customized multiplexed SNP panels; provides economy, efficiency and flexibility in experimental design | Does not explicitly include batch effects in its statistical model | 1.6–560 kb | 93 |
GLAD | Array-based comparative genomic hybridization (aCGH) data | Local constant Gaussian regression model using a weighted maximum likelihood estimator | Automatic detection of breakpoints; detects large regions with high accuracy | Fails to detect very fine structure | NA | 36 |
CBS | Array-based comparative genomic hybridization (aCGH) data | The maximum of a likelihood ratio statistic is used recursively to detect narrower segments of aberration | Consistent; low false discovery rate (FDR) | Heavy computational cost, with most of its computation time spent on significance evaluation | NA | 35, 64 |
* NA, not available.
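Several tools in Table 1 (PennCNV, QuantiSNP, GenoCNV, MixHMM) call CNVs with hidden Markov models. The shared idea can be sketched with the Viterbi algorithm, which assigns each probe its most likely copy-number state. This is a minimal sketch under stated assumptions, not any listed tool's model: it uses a single data channel (log R ratio, LRR), Gaussian emissions with illustrative fixed means and standard deviation for copy numbers 0 to 4, a uniform initial distribution, and a hypothetical "sticky" transition probability `STAY_PROB`. PennCNV, for example, also models B allele frequency. GenoCNV and MixHMM, in contrast, estimate parameters from the data and account for normal-tissue contamination.

```python
import math

# Illustrative copy-number states and the mean log R ratio (LRR) assumed for each.
# These values are assumptions for this sketch, not parameters taken from any tool.
STATES = [0, 1, 2, 3, 4]
LRR_MEANS = {0: -3.5, 1: -0.66, 2: 0.0, 3: 0.4, 4: 0.68}
LRR_SD = 0.2      # assumed common emission standard deviation
STAY_PROB = 0.99  # assumed probability of keeping the same state at the next probe


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution (the emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_transition(a, b):
    """Log probability of moving from state a to state b between adjacent probes."""
    if a == b:
        return math.log(STAY_PROB)
    return math.log((1 - STAY_PROB) / (len(STATES) - 1))


def viterbi_copy_number(lrr):
    """Return the most likely copy-number state for each probe (Viterbi path)."""
    if not lrr:
        return []
    log_init = math.log(1.0 / len(STATES))  # uniform initial distribution (assumed)
    score = [{s: log_init + log_gaussian(lrr[0], LRR_MEANS[s], LRR_SD) for s in STATES}]
    back = [{}]
    for x in lrr[1:]:
        prev_score = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this probe.
            best_prev = max(STATES, key=lambda p: prev_score[p] + log_transition(p, s))
            cur[s] = (prev_score[best_prev] + log_transition(best_prev, s)
                      + log_gaussian(x, LRR_MEANS[s], LRR_SD))
            ptr[s] = best_prev
        score.append(cur)
        back.append(ptr)
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back[1:]):
        state = ptr[state]
        path.append(state)
    return path[::-1]


lrr = [0.02, -0.05, 0.01, -0.7, -0.6, -0.65, -0.72, 0.03, 0.0, -0.02]
print(viterbi_copy_number(lrr))  # → [2, 2, 2, 1, 1, 1, 1, 2, 2, 2]
```

Each switch between states costs about 6 nats here, so a short run of probes is called as a deletion only when its emission fit outweighs the cost of entering and leaving it. The transition probability therefore acts as a smoothing knob that trades sensitivity to small CNVs against robustness to intensity noise.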
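The CBS entry in Table 1 finds the arc of probes whose mean differs most from the rest of the segment, splits there, and recurses. For Gaussian noise with known variance, maximizing the standardized difference in means used below is equivalent to maximizing the likelihood ratio statistic. The sketch assumes a known noise standard deviation `sd` and a hypothetical minimum segment length `min_len`. It also replaces the permutation-based significance test, where the table notes CBS spends most of its time, with a fixed, hypothetical `threshold`. The function names and defaults are illustrative, not those of any published implementation.

```python
import math


def best_arc(x, sd, min_len):
    """Find the arc x[i:j] whose mean differs most from the rest of x.

    Returns (|Z|, i, j). Arcs are restricted so that every resulting piece
    (x[:i], x[i:j], x[j:]) is either empty or at least min_len probes long.
    """
    n = len(x)
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)
    total = prefix[-1]
    best = (0.0, 0, n)
    for i in range(n):
        if 0 < i < min_len:
            continue
        for j in range(i + min_len, n + 1):
            if 0 < n - j < min_len:
                continue
            inside = j - i
            outside = n - inside
            if outside == 0:
                continue  # the arc must leave some probes outside it
            s_in = prefix[j] - prefix[i]
            diff = s_in / inside - (total - s_in) / outside
            z = abs(diff) / (sd * math.sqrt(1.0 / inside + 1.0 / outside))
            if z > best[0]:
                best = (z, i, j)
    return best


def segment(x, sd=0.2, threshold=5.0, min_len=2, offset=0):
    """Recursively split x; returns (start, end, mean) tuples with end exclusive."""
    n = len(x)
    z, i, j = best_arc(x, sd, min_len)
    if z <= threshold:
        return [(offset, offset + n, sum(x) / n)]
    pieces = []
    # Split into up to three pieces around the detected arc and recurse on each.
    for a, b in ((0, i), (i, j), (j, n)):
        if b > a:
            pieces.extend(segment(x[a:b], sd, threshold, min_len, offset + a))
    return pieces


log_ratios = [0.0] * 10 + [-0.7] * 5 + [0.0] * 10
for start, end, mean in segment(log_ratios):
    print(start, end, round(mean, 2))  # three segments: 0-10, 10-15 (about -0.7), 15-25
```

The double loop over (i, j) makes each split quadratic in segment length. Real CBS also repeats this search across many permuted copies of the data to assess significance, which is why its computational cost is high.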