2015 Sep 7;8:421. doi: 10.1186/s13104-015-1385-4

Table 7. GATK steps

| Step | Command | Description |
|---|---|---|
| Input BAM | `java -jar MarkDuplicates.jar INPUT=your_bam_file OUTPUT=step1.bam METRICS_FILE=Fmetrics_step1.bam ASSUME_SORTED=true` | Marking duplicates |
| Step 1. | `java -jar AddOrReplaceReadGroups.jar INPUT=step1.bam OUTPUT=step2.bam RGID=Read_Group_ID RGLB=Read_Group_Library RGPL=platform RGPU=platform_unit RGSM=sample_name RGDS=Read_Group_Description RGDT=Read_Group_Run_Date` | Replacing all read groups in the input file with a single new read group |
| Step 2. | `java -jar ReorderSam.jar INPUT=step2.bam OUTPUT=step3.bam REFERENCE=ucsc.hg19.fasta` | Reordering reads in the BAM file to match the contig ordering of the provided reference file |
| Step 3. | `java -jar SortSam.jar INPUT=step3.bam OUTPUT=step4.bam SORT_ORDER=coordinate` | Sorting the aligned reads by coordinate order |
| Step 4. | `java -jar BuildBamIndex.jar INPUT=step4.bam` | Generating the BAM index |
| | `java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ucsc.hg19.fasta -S STRICT -I step4.bam -o indels.intervals -allowPotentiallyMisencodedQuals` | Indel realignment I (creating a target list of intervals to be realigned) |
| | `java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ucsc.hg19.fasta -S STRICT -I step4.bam -targetIntervals indels.intervals -o step5.bam -known Mills_and_1000G_gold_standard.indels.hg19.vcf -known 1000G_phase1.indels.hg19.vcf -allowPotentiallyMisencodedQuals` | Indel realignment II (performing realignment of the target intervals) |
| Step 5. | `java -jar SortSam.jar INPUT=step5.bam OUTPUT=step6.bam SORT_ORDER=coordinate` | Sorting the aligned reads by coordinate order |
| Step 6. | `java -jar BuildBamIndex.jar INPUT=step6.bam` | Generating the BAM index |
| | `java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I step6.bam -R ucsc.hg19.fasta -S STRICT -knownSites dbsnp_138.hg19.vcf -o recal.grp --covariate QualityScoreCovariate --covariate ReadGroupCovariate --covariate ContextCovariate --covariate CycleCovariate --solid_nocall_strategy PURGE_READ --solid_recal_mode SET_Q_ZERO_BASE_N -allowPotentiallyMisencodedQuals` | Base quality score recalibration I (data-driven adjustment of base quality scores) |
| | `java -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -S STRICT -I step6.bam -T PrintReads -o step7.bam -BQSR recal.grp -allowPotentiallyMisencodedQuals` | Base quality score recalibration II (applying the recalibration to the sequence data) |
| Step 7. | `java -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T HaplotypeCaller -I step7.bam -S STRICT --dbsnp dbsnp_138.hg19.vcf -minPruning 3 -o step8.vcf -stand_call_conf 50 -stand_emit_conf 30` | Calling variants in the sequence data |
| Step 8. | `java -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T SelectVariants --variant step8.vcf -o step9_SNP.vcf -selectType SNP -S STRICT` | Selecting SNPs from the input file |
| Step 9. | `java -jar GenomeAnalysisTK.jar -T VariantRecalibrator --input step9_SNP.vcf -R ucsc.hg19.fasta -S STRICT -resource:1000G,known=false,training=true,truth=false,prior=10 1000G_phase1.snps.high_confidence.hg19.vcf -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_3.3.hg19.vcf -resource:omni,known=false,training=true,truth=true,prior=12.0 1000G_omni2.5.hg19.vcf -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.hg19.vcf -an QD -an MQRankSum -an ReadPosRankSum -an FS -an MQ --maxGaussians 4 -mode SNP -recalFile recal -tranchesFile tranches` | Building the SNP recalibration model |
| | `java -jar GenomeAnalysisTK.jar -R ucsc.hg19.fasta -T ApplyRecalibration -S STRICT --input step9_SNP.vcf -ts_filter_level 99.5 -mode SNP -tranchesFile tranches -recalFile recal -o step10_final.vcf` | Applying the SNP recalibration model |
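The Picard rows of Table 7 form a chain in which each command reads the file written by the previous one, and Picard expects its options as KEY=VALUE pairs with no spaces around the equals sign. A minimal sketch of how those preprocessing rows might be scripted is given below. The helper names `run`, `picard`, and `preprocess`, and the variables `DRY_RUN`, `PICARD_DIR`, and `INPUT_BAM`, are illustrative assumptions and are not part of the table. By default the script only prints the commands, so it can be inspected without Java or the jar files installed.

```shell
#!/bin/sh
# Sketch: chain the Picard preprocessing commands listed in Table 7.
# DRY_RUN, PICARD_DIR, INPUT_BAM, run(), picard() and preprocess() are
# hypothetical names introduced for this example only.
set -eu

DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands only; 0 = execute them
PICARD_DIR="${PICARD_DIR:-.}"            # assumed directory holding the Picard jar files
INPUT_BAM="${INPUT_BAM:-your_bam_file}"  # placeholder input name taken from the table

# Print a command line, then execute it unless DRY_RUN is 1.
run() {
  printf '%s\n' "$*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

# Invoke one Picard tool; remaining arguments are KEY=VALUE pairs.
picard() {
  tool="$1"
  shift
  run java -jar "$PICARD_DIR/$tool.jar" "$@"
}

# Each step reads the output file of the step before it.
preprocess() {
  picard MarkDuplicates INPUT="$INPUT_BAM" OUTPUT=step1.bam \
    METRICS_FILE=Fmetrics_step1.bam ASSUME_SORTED=true
  picard AddOrReplaceReadGroups INPUT=step1.bam OUTPUT=step2.bam \
    RGID=Read_Group_ID RGLB=Read_Group_Library RGPL=platform \
    RGPU=platform_unit RGSM=sample_name \
    RGDS=Read_Group_Description RGDT=Read_Group_Run_Date
  picard ReorderSam INPUT=step2.bam OUTPUT=step3.bam REFERENCE=ucsc.hg19.fasta
  picard SortSam INPUT=step3.bam OUTPUT=step4.bam SORT_ORDER=coordinate
  picard BuildBamIndex INPUT=step4.bam
}

preprocess
```

Setting `DRY_RUN=0` (and pointing `PICARD_DIR` at the jar files) would execute the commands instead of printing them; with `set -eu`, the script stops at the first failing step so that later steps never read an incomplete BAM file. The read-group values are the placeholders from the table and would need to be replaced with real metadata before a real run.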