Table 1.
Step ID | Job step name | Application name | Application module | Input file name | Application parameters | Output file name | Recommended no. of cores | Job dependency condition | % of execution time |
---|---|---|---|---|---|---|---|---|---|
S1 | Map to Reference | BWA KIT | Seqtk, trimadap, SamTools, bwa mem, samblaster | *.fastq.gz | Default | *.bam | N/M | — | 6.5% |
S2 | Build a standard BAM INDEX | sambamba | Index | *.bam | Default | *.bam.bai | 1 | S1 | 0.5% |
S3 | Realigner TargetCreator | GATK | Target creator | *.aln.bam | -T RealignerTargetCreator, -R hs37d5.fa, -known Mills_and_1000G_gold_standard.indels.vcf.gz | *.realigner.intervals | 4 or 8 | S2 | 3% |
S4 | Indel Realigner | GATK | INDEL | *.aln.bam, *.realigner.intervals | -T IndelRealigner, -R hs37d5.fa, -known Mills_and_1000G_gold_standard.indels.vcf.gz, -knownIntervals | *.realigned.bam | 1 | S3 | 2% |
S5 | Base Recalibrator | GATK | Base Recalibration | *.realigned.bam | -T BaseRecalibrator, -R hs37d5.fa, -knownSites dbsnp_138.vcf.gz | *.recal.table | N/M | S4 | 13% |
S6 | Print Reads | GATK | Analyse the Reads | *.realigned.bam, *.recal.table | -T PrintReads, -R hs37d5.fa, -BQSR | *.realigned.recal.bam | 2 or 4 | S5 | 25% |
S7 | Haplotype Caller | GATK | Haplotype | *.realigned.recal.bam | -T HaplotypeCaller, -R hs37d5.fa, -pairHMM VECTOR_LOGLESS_CACHING, --emitRefConfidence GVCF, --variant_index_type LINEAR, --variant_index_parameter 128000, --dbsnp Mills_and_1000G_gold_standard.indels.vcf.gz | *.raw.snps.indels.g.vcf | 4 or 8 | S6 | 43% |
S8 | Variant Recalibrator | GATK | Variant recalibration | *.realigned.bam, *.recal.table | -T BaseRecalibrator, -R hs37d5.fa, -known Mills_and_1000G_gold_standard.indels.vcf.gz, -BQSR | *.after_recal.table | N | S5 | 6% |
S9 | Analyze Covariates | GATK | Analyse the variant | *.recal.table, *.after_recal.table | -T AnalyzeCovariates, -before, -after | *.recal_plots.pdf | 1 | S8 | 1% |
Note: N is the total number of cores and M is the number of CPUs.
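
The job dependency column of Table 1 defines a small directed acyclic graph: S1 to S5 form a chain, after which S6 to S7 and S8 to S9 both branch from S5. The following is a minimal sketch, not the authors' scheduler, that encodes those dependencies and the reported share of execution time for each step, derives one valid execution order, and finds the longest dependency chain. The names `STEPS`, `execution_order`, and `critical_path` are illustrative assumptions, and the sketch assumes that the percentages are additive along a chain and that independent branches can run concurrently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Step -> (dependencies, % of execution time), transcribed from Table 1.
STEPS = {
    "S1": ([], 6.5),
    "S2": (["S1"], 0.5),
    "S3": (["S2"], 3.0),
    "S4": (["S3"], 2.0),
    "S5": (["S4"], 13.0),
    "S6": (["S5"], 25.0),
    "S7": (["S6"], 43.0),
    "S8": (["S5"], 6.0),
    "S9": (["S8"], 1.0),
}


def execution_order(steps):
    """Return one execution order that respects every dependency."""
    graph = {name: deps for name, (deps, _) in steps.items()}
    return list(TopologicalSorter(graph).static_order())


def critical_path(steps):
    """Return (path, cost) for the longest dependency chain by % of time."""
    finish = {}  # step -> (cumulative cost, predecessor on the longest chain)
    for name in execution_order(steps):
        deps, cost = steps[name]
        # Pick the dependency that finishes last; None for a root step.
        best = max(deps, key=lambda d: finish[d][0], default=None)
        start = finish[best][0] if best is not None else 0.0
        finish[name] = (start + cost, best)

    # Walk back from the step with the largest cumulative cost.
    end = max(finish, key=lambda s: finish[s][0])
    total = finish[end][0]
    path = []
    while end is not None:
        path.append(end)
        end = finish[end][1]
    return path[::-1], total


if __name__ == "__main__":
    print("order:", execution_order(STEPS))
    path, total = critical_path(STEPS)
    print("critical path:", " -> ".join(path), f"({total}% of serial time)")
```

Under these assumptions the longest chain runs from S1 through S7, so the S8 and S9 branch can overlap with S6 and S7 without extending the total run time. The sketch ignores the differing core counts recommended per step, which a real scheduler would also have to account for.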