2012 Nov 6;45(1):1–16. doi: 10.1152/physiolgenomics.00082.2012

Table 3.

Algorithms developed for CNV detection using direct genomic sequencing data

| Platform | Data Type | Algorithm | Advantages | Disadvantages | Size Range of CNVs | Ref. List No. |
| --- | --- | --- | --- | --- | --- | --- |
| Control-FREEC | Single-end, mate-pair or paired-end genomic sequencing | Uses a sliding window approach to calculate read count (RC) in nonoverlapping windows (raw copy number polymorphism). Uses a LASSO-based algorithm for the segmentation | Designed for cancer deep sequencing when no control is available and/or the genome is polyploid; able to identify the genotype status despite contamination of the tumor sample by normal cells (estimated percent of tumor cells was 60%) | Designed mainly for Illumina single-end, mate-pair or paired-end sequencing | NA | 7, 33 |
| ExomeCNV | Exome sequencing data | Uses normalized depth-of-coverage and B allele frequency to infer CNV and LOH | Does not assume random, unbiased distribution of sequence reads or continuity of search space; able to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data; for detecting deletions, 95% power is achieved for segments of size 500 bp or more | The same level of consistency was not observed when single-end data were compared with paired-end data; resolution of CNV detection with ExomeCNV is limited largely by the probe design | 120 bp–240 Mb | 74 |
| SeqGene | Exome and transcriptome sequencing data | Uses circular binary segmentation (CBS) for CNV detection | Detects CNV; supports gene expression quantification, mutation and coverage visualization, and pathway analysis workflows | A reference (such as a normal DNA sample or the average of a group of pooled samples) is needed for absolute copy number calls | NA | 20, 83 |
| mrsFast | Short-read sequencing data | Simple cache-oblivious, all-to-all list comparison algorithm | Rapidly finds all mapping locations of a collection of short reads; substantially faster and more accurate | Mainly a mapping algorithm; requires another algorithm, VariationHunter, to find CNVs | NA | 31 |
| CNV-seq | Shotgun sequencing data | Statistical analysis derived from aCGH | Can be applied to relatively low sequence coverage with good specificity and sensitivity; a window size of 1,947 bp gives a specificity of 95.4%; overall specificity between 91.7 and 99.9% and sensitivity between 72.2 and 96.5% | Assumes sampling of DNA fragments is random, which may not be true when data were generated by different sequencing methods | 1 kb–2.9 Mb | 91 |
| MoDIL | Clone-end sequencing data | Uses an expectation-maximization algorithm and appropriate Bayesian priors | Detects small indels in the range of 20–50 bp | Designed for detecting small indels | 20–50 bp | 47 |
| BreakDancer | Mate-pair genomic sequencing | Infers CNVs based on discordant mate pairs that have larger outer-distance deviations than a fixed threshold | Predicts a wide variety of structural variants, including insertions, deletions, inversions and translocations; BreakDancerMini focuses on detecting small indels (typically 10–100 bp); overall high and consistent specificity and sensitivity | Detection of inversions is limited by library insert size | 10 bp–1 Mb | 9 |
| CNVer | Mated short-read sequencing data | Uses a computational framework, the donor graph, for CNV calling based on both paired-end mapping and depth of coverage | Uses mate-pair mapping and depth-of-coverage information jointly to achieve better accuracy and sensitivity in detecting CNVs flanked by segmental duplications; the insert size does not limit the size of the variants that can be detected | Cannot identify the locations where the duplicated sequence is inserted; unable to identify novel insertions | Near-perfect resolution in breakpoint identification | 60 |
| SplazerS | Paired-end and single-end sequencing data | More sensitive alignment method for prefix and suffix matches, allowing for mismatches and small gaps | Detects medium-sized insertions and long deletions with precise breakpoints; shows highest sensitivity, especially in variant-rich regions; robust in the presence of sequencing errors as well as alignment errors; applicable to anchored paired-end as well as unanchored and single-end data; can be used on reads of variable lengths; is not constrained to short read lengths; able to analyze complex structural variations | | >10 bp, with exact breakpoint identification | 23 |
| SVMerge | Single-end and paired-end sequencing data | Uses a collection of SV callers, which use paired-end mapping, split mapping, clusters of one-end-mapped reads, read depth and targeted insertion calling | Integrates calls from several existing structural variant callers; enhanced structural variant detection; breakpoint refinement using local de novo assembly; a lower FDR; new SV calling methods may be incorporated into the analysis pipeline | Users must provide standard Binary Alignment Map (BAM) files; CNV calls in highly repetitive regions of the genome are removed for quality control; difficulty in detecting true insertions | >100 bp | 89 |
| AGE | Nucleotide or protein sequences | Alignment with gap excision: simultaneously aligns the 5′ and 3′ ends of two given sequences and introduces a "large-gap jump" between the local end alignments to maximize the total alignment score | Defines the exact location of breakpoints of SVs by performing precise local alignment at the flanking ends; can be applied to tandem duplications, inversions and complex events; can be universally applied in various biological studies relying on alignment; can be applied to the alignment of protein sequences | Because of limited computational scalability, the algorithm is most practical for aligning sequences with only one deletion, insertion or inversion | 50 bp–1 Mb | 1 |
| SRiC | Split-read sequencing data | Built upon BLAT, which maps sequence reads to the reference genome with gapped alignment | Able to pinpoint the exact breakpoints; reveals the actual sequence content of insertions and covers the whole size spectrum of deletions; can achieve 70–80% call accuracy | Requires a reference genome; with nonsplit reads, SRiC becomes unnecessarily time-consuming | 1 bp–700 kb | 96 |
| inGAP-sv | Paired-end mapping (PEM) data | Uses both paired-end mapping and depth-of-coverage strategies to identify different types of SVs | Can detect many more large insertions and complex variants with lower FDR; users can identify, visualize, annotate and manually edit SVs; can detect 58–80% of deletions in gold-standard sets for individual NA12878 | Failed to detect some small indels (<50 bp) | >50 bp | 70 |
| CNVnator | Single-end and paired-end sequencing data | Statistical analysis of mapping density based on a mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction); assumes uniformity of coverage across the genome | High sensitivity (86–96%); low FDR (3–20%); high genotyping accuracy (93–95%); high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage) | Misses CNVs created by retrotransposable elements | From a few hundred bases to megabases | 2 |
| HMM | Short-read sequencing data | Uses an HMM that integrates both depth of coverage and mate-pair relationship to calculate emission probability | Detects small deletions (200–2,000 bp) with low coverage (<10×); exceeds 80% discovery rate at ≥3× depth of coverage for medium-size (400–800 bp) CNVs | Hemizygous deletions are not modeled | 200–2,000 bp | 77 |
| BIC-seq | Short-read sequencing data | Read-depth-based algorithm that detects CNVs by minimizing the Bayesian information criterion (BIC) | Detects small CNVs (10 bp resolution) with high sensitivity and true positive rate; performs well for detecting large deletions (e.g., one-copy loss of size >50 kb); 80% of calls validated by qPCR | Copy gains are harder to detect than losses | 40 bp–5.7 Mb | 90 |
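
Several of the read-depth methods listed in Table 3 start from the same step: counting reads in nonoverlapping windows and comparing each window with a baseline. The Python sketch below illustrates only that shared idea. It is not the implementation of any listed tool; the function names, the median baseline, the pseudocount, and the fixed log2 threshold are all assumptions made for illustration, and real tools add normalization (for example GC correction) and a segmentation step.

```python
import math
from statistics import median


def window_read_counts(read_starts, window_size, chrom_length):
    """Count reads whose start position falls in each nonoverlapping window."""
    n_windows = math.ceil(chrom_length / window_size)
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // window_size] += 1
    return counts


def log2_ratios(counts, pseudocount=0.5):
    """log2 of each window count relative to the median window count.

    The pseudocount (an illustrative choice) avoids log2(0) for empty windows.
    """
    baseline = median(counts)
    return [math.log2((c + pseudocount) / (baseline + pseudocount)) for c in counts]


def candidate_windows(ratios, threshold=0.5):
    """Flag windows whose log2 ratio exceeds a fixed, hypothetical threshold."""
    calls = []
    for index, ratio in enumerate(ratios):
        if ratio >= threshold:
            calls.append((index, "gain"))
        elif ratio <= -threshold:
            calls.append((index, "loss"))
    return calls


# Toy data: four 100-bp windows, with twice as many reads in the third window.
toy_read_starts = (
    list(range(0, 100, 10))
    + list(range(100, 200, 10))
    + list(range(200, 300, 5))
    + list(range(300, 400, 10))
)
toy_counts = window_read_counts(toy_read_starts, window_size=100, chrom_length=400)
toy_ratios = log2_ratios(toy_counts)
toy_calls = candidate_windows(toy_ratios)
```

In this toy example only the third window is flagged as a candidate gain. Choosing the window size is a trade-off: smaller windows improve resolution but make the counts noisier at low coverage.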
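
The paired-end methods in Table 3 rely on a different signal: mate pairs whose mapped outer distance deviates from the expected library insert size by more than a threshold. The sketch below shows that idea in Python under simplifying assumptions. It is not BreakDancer's actual algorithm; the `MatePair` class, the function names, and the numeric thresholds are hypothetical, and orientation, mapping quality, and statistical confidence scoring are ignored.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    left: int   # leftmost mapped coordinate of the pair
    right: int  # rightmost mapped coordinate of the pair

    @property
    def outer_distance(self):
        return self.right - self.left


def discordant_pairs(pairs, expected_distance, max_deviation):
    """Keep pairs whose outer distance deviates from the expected
    insert size by more than a fixed threshold."""
    return [
        p for p in pairs
        if abs(p.outer_distance - expected_distance) > max_deviation
    ]


def cluster_by_position(pairs, max_gap):
    """Group discordant pairs whose left coordinates lie within max_gap
    of the previous pair; each cluster is one candidate variant."""
    clusters = []
    for pair in sorted(pairs, key=lambda p: p.left):
        if clusters and pair.left - clusters[-1][-1].left <= max_gap:
            clusters[-1].append(pair)
        else:
            clusters.append([pair])
    return clusters


# Toy library with an expected outer distance of about 300 bp.
toy_pairs = [
    MatePair(1000, 1300),
    MatePair(2000, 2310),
    MatePair(5000, 6200),
    MatePair(5050, 6260),
    MatePair(5100, 6290),
    MatePair(9000, 9290),
]
toy_discordant = discordant_pairs(toy_pairs, expected_distance=300, max_deviation=100)
toy_clusters = cluster_by_position(toy_discordant, max_gap=200)
```

Here the three pairs near position 5,000 map about 1,200 bp apart, far more than expected, and form a single cluster, which is the pattern a deletion in the sample relative to the reference would produce. Because the signal depends on the insert size, the library design bounds what this approach can detect, as the Disadvantages column notes for inversions.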