2014 Feb 12;30(11):1508–1513. doi: 10.1093/bioinformatics/btu071

Table 1.

Computational approach to genome analysis using the massively parallel Cray XE6 supercomputer

Step | Program | Module/Call | Input | Parameters | Output | Number per genome | Number of active cores/node | Number of nodes | Concurrency
Extract fastq | Picard 1.98 | SamToFastq | bam | default | fastq | 2(#RGs) | 1 | 1 | 1
Alignment | BWA 0.5.9 | bwa aln | fastq | -qtrim 15 | sai file | 2(#RGs) | 24 | 2(#RGs) | 24(#RGs)
Convert sai to sam | BWA tpx 0.5.9 | bwa sampe | sai | -T -X -P (BWA) | sam | 2(#RGs) | ∼24 | #RGs | 24(#RGs)
Compression | samtools 0.1.18 | view | sam | -bo (samtools) | bam | 2(#RGs) | 3 | #RGs | 1(#RGs)
Split | samtools 0.1.18 | view/merge | bam | -h (preserve readgroup) | bam | 25 | 24 | 1 | 25
Sort | Picard 1.98 | SortSam | bam | -coordinate | bam | 25 | 4 + GC | ∼6 | 25
Mark duplicates | Picard 1.98 | MarkDuplicates | bam | -REMOVE_DUPLICATES false | bam | 25 | 4 + GC | ∼6 | 25 (each with 3 threads)
Reorder | Picard 1.98 | ReorderSam | bam | default | bam | 25 | 4 + GC | ∼6 | 25 (GC[a])
Identify indel realignment targets | GATK 2.7-1 | GATK | bam | -T RealignerTargetCreator -L <chromosome ID> | intervals | 25 | 4 + GC | ∼6 | 25 (GC)
Realign targeted intervals | GATK 2.7-1 | GATK | bam | -T IndelRealigner -targetIntervals <intervals> -LOD 5 -L <chromosome ID> | bam | 25 | 4 + GC | ∼6 | 25 (GC)
Base recalibrator | GATK 2.7-1 | GATK | bam | -T BaseRecalibrator -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov ContextCovariate -knownSites dbSNP_135 | csv | 25 | 4 + GC | ∼6 | 25 (GC)
Print reads | GATK 2.7-1 | GATK | bam | -T PrintReads -baq RECALCULATE -baqGOP 30 -BQSR <csv file> | bam | 25 | 4 + GC | ∼6 | 25 (GC)
Call variants | GATK 2.7-1 | GATK | bam | -T HaplotypeCaller -L <chromosome ID> -D dbSNP_135.hg19.vcf -A AlleleBalance -A Coverage -A HomopolymerRun -A FisherStrand -A HaplotypeScore -A HardyWeinberg -A ReadPosRankSumTest -A QualByDepth -A MappingQualityRankSumTest -A VariantType -A MappingQualityZero -minPruning 10 -stand_call_conf 30.0 -stand_emit_conf 10.0 | vcf | 25 | | |
Filter variants | GATK 2.7-1 | GATK | bam | -T VariantFiltration -L <chromosome ID> --clusterWindowSize 10 --filterExpression "(AB?: 0) > 0.75 || -QUAL < 30.0 || DP > 360 || SB > -0.1 || MQ0 >= 10" | vcf | 25 | 4 + GC | ∼6 | 25 (GC)
Annotate variants | snpEff 2.0.5 | snpEff | vcf | default | vcf | 25 | 4 | ∼6 |

Note: RG = readgroup. [a] GC = 2 threads used for Java garbage collection.
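As an illustration of how one row of the table expands into concrete jobs, the following minimal Python sketch assembles one "Call variants" command line per chromosome from the parameters listed above. It is not the authors' pipeline code. Several things are assumptions made for illustration: that the 25 units per genome are chromosomes 1-22, X, Y and M with hg19-style `chr` names; the jar name `GenomeAnalysisTK.jar`; the file-naming scheme; and the helper names `haplotype_caller_cmd` and `per_chromosome_cmds`. The `-R`, `-I` and `-o` arguments are standard GATK 2.x options that the table does not list. The confidence thresholds are written with GATK's `-stand_call_conf` and `-stand_emit_conf` spelling. The sketch only builds argument lists; it does not run GATK.

```python
from typing import List

# Assumption: the 25 per-genome units in the table are chr1-chr22, chrX, chrY, chrM.
CHROMOSOMES: List[str] = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]

# Annotations passed with -A in the "Call variants" row of the table.
ANNOTATIONS: List[str] = [
    "AlleleBalance", "Coverage", "HomopolymerRun", "FisherStrand",
    "HaplotypeScore", "HardyWeinberg", "ReadPosRankSumTest", "QualByDepth",
    "MappingQualityRankSumTest", "VariantType", "MappingQualityZero",
]


def haplotype_caller_cmd(chrom: str, bam: str, reference: str, dbsnp: str,
                         out_vcf: str,
                         gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Build the argument list for one per-chromosome HaplotypeCaller job."""
    cmd = [
        "java", "-jar", gatk_jar,   # jar name is a placeholder
        "-T", "HaplotypeCaller",
        "-R", reference,            # reference FASTA (not listed in the table)
        "-I", bam,                  # recalibrated per-chromosome BAM
        "-L", chrom,
        "-D", dbsnp,
    ]
    for annotation in ANNOTATIONS:
        cmd += ["-A", annotation]
    cmd += [
        "-minPruning", "10",
        "-stand_call_conf", "30.0",
        "-stand_emit_conf", "10.0",
        "-o", out_vcf,
    ]
    return cmd


def per_chromosome_cmds(sample: str, reference: str, dbsnp: str) -> List[List[str]]:
    """Return one command per chromosome; file names follow a made-up scheme."""
    return [
        haplotype_caller_cmd(
            chrom,
            bam=f"{sample}.{chrom}.recal.bam",
            reference=reference,
            dbsnp=dbsnp,
            out_vcf=f"{sample}.{chrom}.vcf",
        )
        for chrom in CHROMOSOMES
    ]


if __name__ == "__main__":
    commands = per_chromosome_cmds("sample", "hg19.fa", "dbSNP_135.hg19.vcf")
    print(len(commands), "jobs")
    print(" ".join(commands[0]))
```

Keeping each command as a list of arguments rather than a single string means every job can be handed unchanged to a scheduler wrapper or to `subprocess.run`, and the per-chromosome split in the "Number per genome" column is explicit in the loop.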