Table 1.
Step | Program | Module/Call | Input | Parameters | Output | Number per genome | Number of active cores/node | Number of nodes | Concurrency |
---|---|---|---|---|---|---|---|---|---|
Extract fastq | Picard 1.98 | SamToFastq | bam | default | fastq | 2(#RGs) | 1 | 1 | 1 |
Alignment | BWA 0.5.9 | bwa aln | fastq | -qtrim 15 | sai file | 2(#RGs) | 24 | 2(#RGs) | 24(#RGs) |
Convert sai to sam | BWA tpx 0.5.9 | bwa sampe | sai | -T -X -P (BWA) | sam | 2(#RGs) | ∼24 | #RGs | 24(#RGs) |
Compression | samtools 0.1.18 | view | sam | -bo (samtools) | bam | 2(#RGs) | 3 | #RGs | 1(#RGs) |
Split | samtools 0.1.18 | view/merge | bam | -h (preserve readgroup) | bam | 25 | 24 | 1 | 25 |
Sort | Picard 1.98 | SortSam | bam | -coordinate | bam | 25 | 4 + GC | ∼6 | 25 |
Mark duplicates | Picard 1.98 | MarkDuplicates | bam | -REMOVE_DUPLICATES false | bam | 25 | 4 + GC | ∼6 | 25 (each with 3 threads) |
Reorder | Picard 1.98 | ReorderSam | bam | default | bam | 25 | 4 + GC | ∼6 | 25 (GC) |
Identify indel realignment targets | GATK 2.7-1 | GATK | bam | -T RealignerTargetCreator -L <chromosome ID> | intervals | 25 | 4 + GC | ∼6 | 25 (GC) |
Realign targeted intervals | GATK 2.7-1 | GATK | bam | -T IndelRealigner -targetIntervals <intervals> -LOD 5 -L <chromosome ID> | bam | 25 | 4 + GC | ∼6 | 25 (GC) |
Base recalibrator | GATK 2.7-1 | GATK | bam | -T BaseRecalibrator -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -knownSites dbSNP_135 | csv | 25 | 4 + GC | ∼6 | 25 (GC) |
Print reads | GATK 2.7-1 | GATK | bam | -T PrintReads -baq RECALCULATE -baqGOP 30 -BQSR <csv file> | bam | 25 | 4 + GC | ∼6 | 25 (GC) |
Call variants | GATK 2.7-1 | GATK | bam | -T HaplotypeCaller -L <chromosome ID> -D dbSNP_135.hg19.vcf -A AlleleBalance -A Coverage -A HomopolymerRun -A FisherStrand -A HaplotypeScore -A HardyWeinberg -A ReadPosRankSumTest -A QualByDepth -A MappingQualityRankSumTest -A VariantType -A MappingQualityZero -minPruning 10 -stand_call_conf 30.0 -stand_emit_conf 10.0 | vcf | 25 | | | |
Filter variants | GATK 2.7-1 | GATK | bam | -T VariantFiltration -L <chromosome ID> --clusterWindowSize 10 --filterExpression "(AB ?: 0) > 0.75 \|\| QUAL < 30.0 \|\| DP > 360 \|\| SB > -0.1 \|\| MQ0 >= 10" | vcf | 25 | 4 + GC | ∼6 | 25 (GC) |
Annotate variants | snpEff 2.0.5 | snpEff | vcf | default | vcf | 25 | 4 | ∼6 | |
Note: RG = read group; GC = 2 threads used for Java garbage collection.
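
The per-chromosome fan-out in the table (25 pieces per genome, each run with `-L <chromosome ID>`) can be illustrated with a minimal Python sketch that only builds the command lines for the RealignerTargetCreator step and runs nothing. Several details here are assumptions not stated in the table: that the 25 pieces are chromosomes 1-22, X, Y and M with hg19-style `chr` names, that the two garbage-collection threads are set through the JVM option `-XX:ParallelGCThreads`, and the hypothetical file names and the helper `realigner_target_commands`.

```python
from typing import List

# Assumption: the 25 per-genome pieces are chromosomes 1-22, X, Y and M,
# using hg19-style "chr" names; the table only says "<chromosome ID>".
CHROMOSOMES: List[str] = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def realigner_target_commands(
    gatk_jar: str, reference: str, bam_prefix: str, gc_threads: int = 2
) -> List[List[str]]:
    """Build one RealignerTargetCreator argv per chromosome (nothing is executed)."""
    commands = []
    for chrom in CHROMOSOMES:
        commands.append([
            "java",
            # Assumption: GC threads are limited with this JVM option.
            f"-XX:ParallelGCThreads={gc_threads}",
            "-jar", gatk_jar,
            "-T", "RealignerTargetCreator",
            "-R", reference,
            # Hypothetical naming scheme for the per-chromosome bam files.
            "-I", f"{bam_prefix}.{chrom}.bam",
            "-L", chrom,
            "-o", f"{bam_prefix}.{chrom}.intervals",
        ])
    return commands


if __name__ == "__main__":
    for argv in realigner_target_commands("GenomeAnalysisTK.jar", "hg19.fa", "sample"):
        print(" ".join(argv))
```

Each returned list could be handed to a scheduler or to `subprocess.run`; keeping the commands as argument lists rather than shell strings avoids quoting problems with file names.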