Sci Rep. 2017 Aug 22;7:9058. doi: 10.1038/s41598-017-09089-1

Table 1.

Computational steps, dependency conditions and their execution time in the NGS workflow.

| Step ID | Job step name | Application name | Application module | Input file name | Application parameters | Output file name | Recommended no. of cores | Job dependency condition | % of execution time |
|---|---|---|---|---|---|---|---|---|---|
| S1 | Map to Reference | BWA KIT | Seqtk, trimadap, SamTools, bwa mem, samblaster | *.fastq.gz | Default | *.bam | N/M | none | 6.5% |
| S2 | Build a standard BAM INDEX | sambamba | Index | *.bam | Default | *.bam.bai | 1 | S1 | 0.5% |
| S3 | Realigner TargetCreator | GATK | Target creator | *.aln.bam | -T RealignerTargetCreator, -R hs37d5.fa, -known Mills_and_1000G_gold_standard.indels.vcf.gz | *.realigner.intervals | 4 or 8 | S2 | 3% |
| S4 | Indel Realigner | GATK | INDEL | *.aln.bam, *.realigner.intervals | -T IndelRealigner, -R hs37d5.fa, -known Mills_and_1000G_gold_standard.indels.vcf.gz, -knownIntervals | *.realigned.bam | 1 | S3 | 2% |
| S5 | Base Recalibrator | GATK | Base Recalibration | *.realigned.bam | -T BaseRecalibrator, -R hs37d5.fa, -knownSites dbsnp_138.vcf.gz | *.recal.table | N/M | S4 | 13% |
| S6 | Print Reads | GATK | Analyse the Reads | *.realigned.bam, *.recal.table | -T PrintReads, -R hs37d5.fa, -BQSR | *.realigned.recal.bam | 2 or 4 | S5 | 25% |
| S7 | Haplotype Caller | GATK | Haplotype | *.realigned.recal.bam | -T HaplotypeCaller, -R hs37d5.fa, -pairHMM VECTOR_LOGLESS_CACHING, --emitRefConfidence GVCF, --variant_index_type LINEAR, --variant_index_parameter 128000, --dbsnp Mills_and_1000G_gold_standard.indels.vcf.gz | *.raw.snps.indels.g.vcf | 4 or 8 | S6 | 43% |
| S8 | Variant Recalibrator | GATK | Variant recalibration | *.realigned.bam, *.recal.table | -T BaseRecalibrator, -R hs37d5.fa, -known Mills_and_1000G_gold_standard.indels.vcf.gz, -BQSR | *.after_recal.table | N | S5 | 6% |
| S9 | Analyze Covariates | GATK | Analyse the variant | *.recal.table, *.after_recal.table | -T AnalyzeCovariates -before -after | *.recal_plots.pdf | 1 | S8 | 1% |

N is the total number of cores and M is the number of CPUs.
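
The dependency column of Table 1 describes a small directed acyclic graph: S1 through S7 form a chain, while S8 and S9 branch off after S5. The Python sketch below is illustrative only and is not the authors' scheduler. It encodes the step IDs, dependencies and execution-time shares from the table, derives a valid execution order, and estimates how much of the serial runtime lies on the longest dependency chain. It assumes that each share can be treated as a fixed duration and that the S8/S9 branch can run concurrently with S6/S7 when enough cores are free; neither assumption is stated in the table. The names `STEPS`, `execution_order` and `finish_times` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Step ID -> (job dependency or None, % of execution time), copied from Table 1.
STEPS = {
    "S1": (None, 6.5),
    "S2": ("S1", 0.5),
    "S3": ("S2", 3.0),
    "S4": ("S3", 2.0),
    "S5": ("S4", 13.0),
    "S6": ("S5", 25.0),
    "S7": ("S6", 43.0),
    "S8": ("S5", 6.0),
    "S9": ("S8", 1.0),
}


def execution_order(steps):
    """Return the step IDs in an order that respects every dependency."""
    graph = {sid: ({dep} if dep else set()) for sid, (dep, _) in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def finish_times(steps):
    """Earliest finish time of each step, in % of the serial runtime.

    Assumption: a step starts as soon as its single dependency has finished,
    and independent branches never compete for cores.
    """
    finish = {}
    for sid in execution_order(steps):
        dep, share = steps[sid]
        start = finish[dep] if dep else 0.0
        finish[sid] = start + share
    return finish


if __name__ == "__main__":
    finish = finish_times(STEPS)
    serial_total = sum(share for _, share in STEPS.values())
    critical_path = max(finish.values())
    print("Execution order:", " -> ".join(execution_order(STEPS)))
    print(f"Serial total:  {serial_total:.1f}%")
    print(f"Critical path: {critical_path:.1f}%")
```

Under these assumptions the longest chain is S1 through S7, which accounts for 93% of the serial runtime; the S8/S9 branch (7%) is the only part that the dependency structure alone allows to overlap with other steps.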