Table 3.
Algorithms developed for CNV detection using direct genomic sequencing data
Platform | Data Type | Algorithm | Advantages | Disadvantages | Size Range of CNVs | Ref. List No. |
---|---|---|---|---|---|---|
Control-FREEC | Single-end, mate-pair or paired-end genomic sequencing | Uses a sliding-window approach to calculate read count (RC) in nonoverlapping windows (raw copy number polymorphism). Uses a LASSO-based algorithm for segmentation | Designed for cancer deep sequencing when no control is available and/or the genome is polyploid; able to identify the genotype status despite contamination of the tumor sample by normal cells (estimated percent of tumor cells was 60%) | Designed mainly for Illumina single-end, mate-pair or paired-end sequencing | NA | 7, 33 |
ExomeCNV | Exome sequencing data | Uses normalized depth-of-coverage and B allele frequency to infer CNV and LOH | Does not assume random, unbiased distribution of sequence reads, and continuity of search space; able to detect copy-number variation (CNV) and loss of heterozygosity (LOH) from exome sequencing data; for detecting deletions, 95% power is achieved for segments of size 500 bp or more | Same level of consistency was not observed when single-end data were compared with paired-end data; resolution of CNV detection with ExomeCNV is limited largely by the probe design | 120 bp–240 Mb | 74 |
SeqGene | Exome and transcriptome sequencing data | Uses circular binary segmentation (CBS) for CNV detection | Detects CNV; supports gene expression quantification, mutation and coverage visualization and pathway analysis workflows | Reference (such as a normal DNA sample or the average of a group of pooled samples) is needed for absolute copy number calls | NA | 20, 83 |
mrsFast | Short-read sequencing data | Simple cache-oblivious, all-to-all list comparison algorithm | Rapidly finds all mapping locations of a collection of short reads; substantially faster and more accurate | Mainly a mapping algorithm; requires another algorithm, VariationHunter, to find CNVs | NA | 31 |
CNV-seq | Shotgun sequencing data | Statistical analysis derived from aCGH | Can be applied to relatively low sequence coverage with good specificity and sensitivity; a window size of 1947 bp gives a specificity of 95.4%; overall specificity of 91.7–99.9% and sensitivity of 72.2–96.5% | Assumes sampling of DNA fragments is random, which may not be true when data were generated by different sequencing methods | 1 kb–2.9 Mb | 91 |
MoDIL | Clone-end sequencing data | Uses an expectation-maximization algorithm with appropriate Bayesian priors | Detects small indels in the range of 20–50 bp | Designed for detecting small indels | 20–50 bp | 47 |
BreakDancer | Mate-pair genomic sequencing | Infers CNVs based on discordant mate pairs that have larger outer-distance deviations than a fixed threshold | Predicts a wide variety of structural variants, including insertions, deletions, inversions and translocations; BreakDancerMini focuses on detecting small indels (typically 10–100 bp); overall high and consistent specificity and sensitivity | Detection of inversions is limited by library insert size | 10 bp–1 Mb | 9 |
CNVer | Mated short-read sequencing data | Uses a computational framework, the donor graph, for CNV calling based on both paired-end mapping and depth of coverage | Uses both mate-pair mapping and depth-of-coverage information jointly to achieve better accuracy and sensitivity in detecting CNVs flanked by segmental duplications; the insert size does not limit the size of the variants that can be detected | Cannot identify the locations where the duplicated sequence is inserted; unable to identify novel insertions | Near-perfect resolution in breakpoint identification | 60 |
SplazerS | Paired-end and single-end sequencing data | More sensitive alignment method for prefix and suffix matches, allowing for mismatches and small gaps | Detects medium-sized insertions and long deletions with precise breakpoints; robust in the presence of sequencing errors as well as alignment errors; can be used on reads of variable lengths; able to analyze complex structural variations | Shows highest sensitivity, especially in variant-rich regions; applicable to anchored paired-end as well as unanchored and single-end data; is not constrained to short read lengths | >10 bp, with exact breakpoint identification | 23 |
SVMerge | Single-end and paired-end sequencing data | Uses a collection of SV callers, which use paired-end mapping, split-mapping, clusters of one-end-mapped reads, read-depth and targeted insertion calling | Integrates calls from several existing structural variant callers; enhanced structural variant detection; breakpoint refinement using local de novo assembly; lower false discovery rate (FDR); new SV calling methods may be incorporated into the analysis pipeline | Users need to provide standard Binary Alignment Map (BAM) files; CNV calls in highly repetitive regions of the genome are removed for quality control; difficulty in detecting true insertions | >100 bp | 89 |
AGE | Nucleotide or protein sequences | Alignment with gap excision (AGE): simultaneously aligns the 5′ and 3′ ends of two given sequences and introduces a “large-gap jump” between the local end alignments to maximize the total alignment score | Defines the exact location of breakpoints of SVs by performing precise local alignment at the flanking ends; can be applied to tandem duplications, inversions and complex events; can be universally applied in various biological studies relying on alignment; can be applied to the alignment of protein sequences | Because of limited computational scalability, this algorithm is most practical for aligning sequences with only one deletion, insertion or inversion | 50 bp–1 Mb | 1 |
SRiC | Split-read sequencing data | Built upon BLAT, which maps sequence reads to the reference genome with gapped alignment | Able to pinpoint the exact breakpoints; reveals the actual sequence content of insertions and covers the whole size spectrum of deletions; can achieve 70–80% call accuracy | Requires a reference genome; with nonsplit reads, SRiC becomes unnecessarily time-consuming | 1 bp–700 kb | 96 |
inGAP-sv | Paired-end mapping (PEM) data | Uses both paired-end mapping and depth-of-coverage strategies to identify different types of SVs | Can detect many more large insertions and complex variants with lower FDR; users can identify, visualize, annotate and manually edit SVs; can detect 58–80% of deletions in gold-standard sets in individual NA12878 | Fails to detect some small indels (<50 bp) | >50 bp | 70 |
CNVnator | Single-end and paired-end sequencing data | Statistical analysis of mapping density based on a mean-shift approach with additional refinements (multiple-bandwidth partitioning and GC correction); assumes uniformity of coverage across the genome | High sensitivity (86–96%); low FDR (3–20%); high genotyping accuracy (93–95%); high resolution in breakpoint discovery (<200 bp in 90% of cases with high sequencing coverage) | Misses CNVs created by retrotransposable elements | A few hundred bases to megabases | 2 |
HMM | Short-read sequencing data | Uses a hidden Markov model (HMM) that integrates both depth of coverage and mate-pair relationships to calculate emission probabilities | Detects small deletions (200–2,000 bp) with low coverage (<10×); exceeds 80% discovery rate at ≥3× depth of coverage for medium-size (400–800 bp) CNVs | Hemizygous deletions are not modeled | 200–2,000 bp | 77 |
BIC-seq | Short-read sequencing data | Read-depth-based algorithm that detects CNVs by minimizing the Bayesian information criterion (BIC) | Detects small CNVs (10 bp resolution) with high sensitivity and true positive rate; performs well for detecting large deletions (e.g., one-copy loss of size >50 kb); 80% of calls validated by qPCR | Copy gains are harder to detect than losses | 40 bp–5.7 Mb | 90 |
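
Several of the read-depth methods in Table 3 (for example Control-FREEC and CNV-seq) start from read counts in fixed windows. The Python sketch below illustrates that general idea only and is not the implementation of any tool listed: it counts read start positions in nonoverlapping windows for a test and a control sample, normalizes for total read number, and labels each window by its log2 ratio. The function names and the gain/loss thresholds are hypothetical choices for illustration, and the segmentation and statistical testing steps that the real tools perform are omitted.

```python
import math
from collections import Counter


def window_counts(read_starts, window_size):
    """Count reads per nonoverlapping window, keyed by window index."""
    return Counter(pos // window_size for pos in read_starts)


def log2_ratios(test_starts, control_starts, window_size, n_windows):
    """Return the per-window log2(test/control) ratio, or None without data."""
    if not test_starts or not control_starts:
        raise ValueError("both samples need at least one read")
    test = window_counts(test_starts, window_size)
    control = window_counts(control_starts, window_size)
    # Scale test counts so both samples have the same total read number.
    scale = len(control_starts) / len(test_starts)
    ratios = []
    for w in range(n_windows):
        if test[w] == 0 or control[w] == 0:
            ratios.append(None)  # ratio undefined for empty windows
        else:
            ratios.append(math.log2(test[w] * scale / control[w]))
    return ratios


def call_windows(ratios, gain=0.4, loss=-0.6):
    """Label windows using illustrative (hypothetical) log2 thresholds."""
    calls = []
    for r in ratios:
        if r is None:
            calls.append("no_data")
        elif r >= gain:
            calls.append("gain")
        elif r <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


if __name__ == "__main__":
    # Toy data: 4 windows of 100 bp; the test sample has half the reads
    # in window 1 and twice the reads in window 2.
    control = [w * 100 + i * 10 for w in range(4) for i in range(10)]
    test = ([i * 10 for i in range(10)]
            + [100 + i * 20 for i in range(5)]
            + [200 + i * 5 for i in range(20)]
            + [300 + i * 10 for i in range(10)])
    print(call_windows(log2_ratios(test, control, 100, 4)))
```

In practice, window size trades resolution against noise, which is one reason the size ranges and sensitivities reported in the table differ so widely between read-depth methods.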