2019 Feb;16(1):189–204. doi: 10.20892/j.issn.2095-3941.2018.0142

Table 2. A brief description of the procedure for clinical tumor NGS testing

| Step | Description | Tools and databases | Output |
| --- | --- | --- | --- |
| Base calling and duplicate removal | Base calling and duplicate removal, also known as initial analysis. | Sequencing platform configuration software | FASTQ format |
| Primer removal | Primer sequences used for amplicon sequencing must be removed from the reads. | CutAdapt, BWA, etc. | FASTQ or BAM format |
| Adaptor removal | Adaptor sequences are removed from the ends of reads. Untrimmed adaptors may interfere with alignment and cause false-positive or false-negative variant calls. | CutAdapt, BWA, Trimmomatic, SeqPrep, etc. | FASTQ or BAM format |
| Low-quality base removal | Low-quality bases may also interfere with alignment and cause false results. They should usually be trimmed from the ends of reads. | CutAdapt, BWA, Trimmomatic, SeqPrep, etc. | FASTQ or BAM format |
| Alignment | Paired-end or single-end reads are aligned to the reference genome. SNVs and small indels can be recognized in this step. | BWA, Novoalign, Stampy, SOAP2, LifeScope, Bowtie, etc. | BAM format |
| Duplicate removal (optional) | Duplicates can be introduced by PCR amplification during library construction and sequencing. Implausible duplicates of the original DNA decrease calling accuracy and should be removed. Probe hybridization capture sequencing generates fewer duplicates because DNA is randomly fragmented during library construction. Amplicon sequencing does not require deduplication if there are no allele barcodes, but does require it if there are. | Picard MarkDuplicates, SAMtools, etc. | BAM format |
| Indel realignment (optional) | Misalignment is common around indels, especially at the beginning or end of reads, and can cause false results. Local realignment identifies these locations, minimizes the error, and increases accuracy. | GATK RealignerTargetCreator and IndelRealigner, SRMA, etc. | BAM format |
| Base quality score recalibration (optional) | Base quality scores can be recalibrated after alignment or realignment to decrease the false-positive rate. | GATK BaseRecalibrator and PrintReads, ReQON, etc. | BAM format |
| Variant calling | Variant calling is the detection and description of variants (including SNVs and small indels) based on differences between the sequencing data and the reference genome. | GATK UnifiedGenotyper, GATK HaplotypeCaller, SAMtools, MuTect, VarScan, Platypus, etc. | VCF format |
| Annotation | Variant interpretation relies on detailed annotation. Basic annotation includes the gene name, gene structure region (exon, splicing region, intron, intergenic region, etc.), and coding information. SNP information, pathogenicity, and other references can also be included. | ANNOVAR, SnpEff, Cartagenia Bench Lab NGS, dbSNP, 1000 Genomes, ESP6500, SIFT, PhyloP, MutationTaster, COSMIC, OMIM, ClinVar, HGMD, etc. | CSV, TSV, TXT, Excel, etc. |
| Filtering | Disease-related variants are identified by strictly filtering the large number of annotated variant calls. Typical filtering criteria remove low-quality variants, variants in non-coding regions (e.g., intronic and intergenic regions), synonymous SNVs, and known low-frequency SNPs in healthy populations. Labs should set up an internal database of the false positives that recur on their own platforms and rigorously filter them out. | Cartagenia Bench Lab NGS, SnpSift, etc. | CSV, TSV, TXT, Excel, database, etc. |
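
The filtering criteria in the last row of the table can be illustrated with a minimal Python sketch. It is not the workflow of any tool named above: the `AnnotatedVariant` record, its field names, the region and effect labels, and the `MIN_QUALITY` threshold are all hypothetical, chosen only to show how the listed criteria and a lab-specific false-positive list might be combined.

```python
from dataclasses import dataclass

# Hypothetical values: real cut-offs and labels depend on the lab's
# platform, variant caller, and annotation tool.
MIN_QUALITY = 30.0
KEPT_REGIONS = {"exonic", "splicing"}


@dataclass(frozen=True)
class AnnotatedVariant:
    """One annotated variant call (illustrative fields only)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    region: str                 # e.g. "exonic", "splicing", "intronic", "intergenic"
    effect: str                 # e.g. "missense", "synonymous", "frameshift"
    quality: float              # caller-reported quality score
    known_population_snp: bool  # recorded as a SNP in healthy populations

    @property
    def key(self):
        """Identifier used to look the variant up in the lab's own database."""
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_filters(variant, lab_false_positives=frozenset()):
    """Return True if the variant survives every filtering criterion."""
    if variant.quality < MIN_QUALITY:          # low-quality variant
        return False
    if variant.region not in KEPT_REGIONS:     # non-coding region
        return False
    if variant.effect == "synonymous":         # synonymous SNV
        return False
    if variant.known_population_snp:           # known SNP in healthy populations
        return False
    if variant.key in lab_false_positives:     # recurrent platform artifact
        return False
    return True


def filter_variants(variants, lab_false_positives=frozenset()):
    """Keep only the variants that pass all filters, preserving input order."""
    return [v for v in variants if passes_filters(v, lab_false_positives)]
```

Here `lab_false_positives` stands in for the internal database the table recommends. In this sketch it is simply a set of `(chrom, pos, ref, alt)` tuples that the lab has repeatedly seen as artifacts on its own platform.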