2012 Nov 6;45(1):1–16. doi: 10.1152/physiolgenomics.00082.2012

Table 3.

Algorithms developed for CNV detection using direct genomic sequencing data

Platform	Data Type	Algorithm	Advantages	Disadvantages	Size Range of CNVs	Ref. List No.
Control-FREEC	Single-end, mate-pair or paired-end genomic sequencing	Uses a sliding window approach to calculate read count (RC) in nonoverlapping windows (raw copy number polymorphism). Uses a LASSO-based algorithm for the segmentation	Designed for cancer deep sequencing when no control is available and/or the genome is polyploid	Designed mainly for Illumina single-end, mate-pair or paired-end sequencing	NA	7, 33
Control-FREEC	Single-end, mate-pair or paired-end genomic sequencing		Able to identify the genotype status despite contamination of the tumor sample by normal cells (estimated percent of tumor cells was 60%)		NA	7, 33
ExomeCNV	Exome sequencing data	Uses normalized depth-of-coverage and B allele frequency to infer CNV and LOH	Does not assume random, unbiased distribution of sequence reads, and continuity of search space	Same level of consistency was not observed when single-end data were compared with paired-end data	120 bp–240 Mb in size	74
			Ability to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data	Resolution of CNV detection with ExomeCNV is limited largely by the probe design
			For detecting deletions, 95% power is achieved for segments of size 500 bp or more
SeqGene	Exome and transcriptome sequencing data	Uses circular binary segmentation (CBS) for CNV detection	Detects CNV	Reference (such as a normal DNA sample or the average of a group of pooled samples) is needed for absolute copy number calls	NA	20, 83
SeqGene	Exome and transcriptome sequencing data	Uses circular binary segmentation (CBS) for CNV detection	Supports gene expression quantification, mutation and coverage visualization and pathway analysis workflows		NA	20, 83
mrsFast	Short-read sequencing data	Simple cache-oblivious, all-to-all list comparison algorithm	Rapidly finds all mapping locations of a collection of short reads	Mainly a mapping algorithm	NA	31
mrsFast	Short-read sequencing data		Substantially faster and more accurate	Requires another algorithm, VariationHunter, to find CNVs	NA	31
CNV-seq	Shotgun sequencing data	Statistical analysis derived from aCGH	Method can be applied to relatively low sequence coverage with good specificity and sensitivity	Assumes sampling of DNA fragments is random, which may not be true when data were generated by different sequencing methods	1 kb–2.9 Mb	91
			A window size of 1,947 bp gives a specificity of 95.4%
			Overall specificity of 91.7–99.9% and sensitivity of 72.2–96.5%
MoDIL	Clone-end sequencing data	Uses an expectation-maximization algorithm with appropriate Bayesian priors	Detects small indels in the range of 20–50 bp	Designed for detecting small indels	20–50 bp	47
BreakDancer	Mate-pair genomic sequencing	Infers CNVs based on discordant mate pairs that have larger outer-distance deviations than a fixed threshold	Predicts a wide variety of structural variants, including insertions, deletions, inversions and translocations	Detection of inversions is limited by library insert size	10 bp–1 Mb	9
			BreakDancerMini focuses on detecting small indels (typically 10–100 bp)
			Overall high and consistent specificity and sensitivity
CNVer	Mated short reads sequencing data	Uses a computational framework, the donor graph, for CNV calling based on both paired-end mapping and depth of coverage	Uses both mate-pair mapping and depth-of-coverage information jointly to achieve better accuracy and sensitivity of detecting CNVs flanked by segmental duplications	Cannot identify the locations where the duplicated sequence is inserted	Near-perfect resolution in breakpoint identification	60
CNVer	Mated short reads sequencing data		The insert size does not limit the size of the variants that can be detected	Inability to identify novel insertions	Near-perfect resolution in breakpoint identification	60
SplazerS	Paired-end and single-end sequencing data	More sensitive alignment method for prefix and suffix matches, allowing for mismatches and small gaps	Detects medium-sized insertions and long deletions with precise breakpoints	Shows highest sensitivity, especially in variant-rich regions	>10 bp, with exact breakpoint identification	23
			Robust in the presence of sequencing errors as well as alignment errors	Applicable to anchored paired-end as well as unanchored and single-end data
			Can be used on reads of variable lengths	Is not constrained to short read lengths
			Ability to analyze complex structural variations	Is not constrained to short read lengths
SVMerge	Single-end and paired-end sequencing data	Uses a collection of SV callers, which use paired-end mapping, split-mapping, clusters of one-end-mapped reads, read-depth and targeted insertion calling	Integrates calls from several existing structural variant callers	Users need to provide standard Binary Alignment Map (BAM) files; CNV calls in highly repetitive regions of the genome are removed for quality control; difficulty in detecting true insertions	>100 bp	89
			Enhanced structural variant detection
			Breakpoint refinement using local de novo assembly
			A lower FDR
			New SV calling methods may be incorporated into the analysis pipeline
AGE	Nucleotide or protein sequences	Alignment with gap excision: simultaneously aligns the 5′ and 3′ ends of two given sequences and introduces a “large-gap jump” between the local end alignments to maximize the total alignment score	Defines the exact location of breakpoints of SVs by performing precise local alignment at the flanking ends	Because of limited computational scalability, this algorithm is most practical for aligning sequences with only one deletion, insertion or inversion	50 bp–1 Mb	1
			Can be applied to tandem duplications, inversions and complex events
			Can be universally applied in various biological studies relying on alignment
			Can be applied to the alignment of protein sequences
SRiC	Split-read sequencing data	Built upon BLAT, which maps sequence reads to the reference genome with gapped alignment	Ability to pinpoint the exact breakpoints	Requires a reference genome; with nonsplit reads, SRiC becomes unnecessarily time-consuming	1 bp–700 kb	96
			Reveals the actual sequence content of insertions and covers the whole size spectrum of deletions
			Can achieve 70–80% call accuracy
inGAP-sv	Paired-end mapping (PEM) data	Uses both paired-end mapping and depth-of-coverage strategies to identify different types of SVs	Can detect many more large insertions and complex variants with lower FDR	Fails to detect some small indels (<50 bp)	>50 bp	70
			Users can identify, visualize, annotate and manually edit SVs
			Can detect 58–80% of deletions in gold-standard sets in individual NA12878
CNVnator	Single-end and paired-end sequencing data	Statistical analysis of mapping density based on a mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction); assumes uniformity of coverage across the genome	High sensitivity (86–96%)	Misses CNVs created by retrotransposable elements	From a few hundred bases to megabases	2
			Low FDR (3–20%)
			High genotyping accuracy (93–95%)
			High resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage)
HMM	Short-read sequencing data	Uses an HMM by integrating both depth of coverage and mate-pair relationship to calculate emission probability	Detects small deletions (200–2,000 bp) with low coverage (<10×)	Hemizygous deletions are not modeled	200–2,000 bp	77
HMM	Short-read sequencing data		Exceeds 80% discovery rate at ≥3× depth of coverage for medium-size (400–800 bp) CNVs	Hemizygous deletions are not modeled	200–2,000 bp	77
BIC-seq	Short-read sequencing data	Read-depth-based algorithm that detects CNVs by minimizing the Bayesian information criterion (BIC)	Detects small CNVs (10 bp resolution) with high sensitivity and true positive rate	Copy gains are harder to detect than losses	40 bp–5.7 Mb	90
			Performs well for detecting large deletions (e.g., one-copy loss of size >50 kb)
			80% of calls validated by qPCR
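
Several of the read-depth methods in this table (for example Control-FREEC, CNV-seq and CNVnator) start from read counts in genomic windows. The sketch below illustrates only that first step under simplifying assumptions: reads are represented by 0-based start positions on a single chromosome, windows are nonoverlapping and of fixed size, and each window is compared with the median window by a log2 ratio. The function names, the `pseudocount` parameter and the toy positions are hypothetical and are not taken from any of the listed tools; GC correction, normalization against a control sample and segmentation are all omitted.

```python
import math
from statistics import median


def window_read_counts(read_starts, chrom_length, window_size):
    """Count reads whose start position falls in each nonoverlapping window."""
    n_windows = math.ceil(chrom_length / window_size)
    counts = [0] * n_windows
    for pos in read_starts:
        # Ignore positions outside the chromosome (assumed 0-based coordinates).
        if 0 <= pos < chrom_length:
            counts[pos // window_size] += 1
    return counts


def log2_ratios(counts, pseudocount=0.5):
    """Log2 ratio of each window's count to the median window count.

    The pseudocount (an assumption of this sketch) avoids log2(0)
    for windows without reads.
    """
    baseline = median(counts)
    return [
        math.log2((c + pseudocount) / (baseline + pseudocount))
        for c in counts
    ]


# Toy example with hypothetical read start positions.
counts = window_read_counts([0, 5, 10, 15, 25], chrom_length=30, window_size=10)
ratios = log2_ratios(counts)
```

In this toy example the third window holds fewer reads than the median window, so its log2 ratio is negative, which is the kind of signal a read-depth caller would pass on to a segmentation step when looking for a copy loss.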