Author manuscript; available in PMC: 2024 Sep 17.
Published in final edited form as: Nat Protoc. 2023 Nov 23;19(2):487–516. doi: 10.1038/s41596-023-00914-8

Table 4. Substeps in the data analysis pipeline

| Step | Software (main option) | Note |
|---|---|---|
| 82.1 | FastQC | QC of raw sequencing reads |
| 82.2 | GATK FastqToSam | Convert FASTQ files to an unmapped BAM file |
| 82.3 | GATK MarkIlluminaAdapters | Mark sequencing adapters in the unmapped BAM file |
| 82.4 | GATK SamToFastq; bwa; samtools | Sequence alignment |
| 82.5 | GATK MergeBamAlignment | Merge the mapped BAM file with the unmapped BAM file |
| 82.6 | GATK SortSam; GATK SetNmMdAndUqTags | Sort the BAM file by coordinate and fix the NM and UQ tag values |
| 82.7 | GATK MarkDuplicates | Mark duplicate reads to avoid counting non-independent observations |
| 82.8 | GATK BaseRecalibrator | Generate the base quality score recalibration model |
| 82.9 | GATK ApplyBQSR | Apply the base quality score recalibration model |
| 82.10 | GATK SortSam; GATK BuildBamIndex; md5sum | Sort by coordinate, index and calculate the MD5 checksum |
| 82.11 | samtools | Calculate depth and coverage |
| 82.12 | GATK HaplotypeCaller | Call germline heterozygous SNVs and INDELs (only for bulk DNA) |
| 85.1 | SCcaller | Run SCcaller on all candidate mutations |
| 85.2 | n.a. | Filter out low-quality and germline calls and keep only somatic mutations |
| 85.3 | bedtools | Calculate sensitivity |
| 85.4 | samtools | Calculate coverage of both bulk and single-cell samples |
| 85.5 | n.a. | Estimate SNV and INDEL burdens per single cell |
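
Because the pipeline in Table 4 chains several external programs, it can help to confirm that they are installed before starting Step 82. The following is a minimal sketch, not part of the protocol itself: it transcribes the substeps into a Python list and reports which executables are missing from the PATH. The command names (`fastqc`, `gatk`, `bwa`, `samtools`, `md5sum`, `bedtools`) are assumed to match a typical installation, and `sccaller` is a hypothetical name for however SCcaller is invoked on a given system; the helper functions are likewise illustrative.

```python
import shutil

# Substeps transcribed from Table 4 as (step, executables, note).
# Executable names are ASSUMPTIONS about a typical installation;
# "sccaller" is a hypothetical command name for the SCcaller script.
SUBSTEPS = [
    ("82.1", ["fastqc"], "QC of raw sequencing reads"),
    ("82.2", ["gatk"], "Convert FASTQ files to an unmapped BAM file"),
    ("82.3", ["gatk"], "Mark sequencing adapters in the unmapped BAM file"),
    ("82.4", ["gatk", "bwa", "samtools"], "Sequence alignment"),
    ("82.5", ["gatk"], "Merge mapped and unmapped BAM files"),
    ("82.6", ["gatk"], "Sort by coordinate and fix NM and UQ tags"),
    ("82.7", ["gatk"], "Mark duplicate reads"),
    ("82.8", ["gatk"], "Generate the recalibration model"),
    ("82.9", ["gatk"], "Apply the recalibration model"),
    ("82.10", ["gatk", "md5sum"], "Sort, index and calculate MD5"),
    ("82.11", ["samtools"], "Calculate depth and coverage"),
    ("82.12", ["gatk"], "Call germline SNVs and INDELs (bulk DNA only)"),
    ("85.1", ["sccaller"], "Run SCcaller on all candidate mutations"),
    ("85.2", [], "Filter calls and keep only somatic mutations"),
    ("85.3", ["bedtools"], "Calculate sensitivity"),
    ("85.4", ["samtools"], "Calculate coverage of bulk and single cell"),
    ("85.5", [], "Estimate SNV and INDEL burdens per single cell"),
]


def required_tools(substeps):
    """Return the sorted, de-duplicated executables used across all substeps."""
    return sorted({tool for _, tools, _ in substeps for tool in tools})


def missing_tools(substeps, which=shutil.which):
    """Return the required executables that `which` cannot locate on the PATH."""
    return [tool for tool in required_tools(substeps) if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(SUBSTEPS)
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All executables listed for Table 4 were found on the PATH.")
```

Substeps 85.2 and 85.5 list no software ("n.a.") in the table, so they carry an empty tool list here. The `which` parameter is injectable only so that the check can be exercised without the tools installed.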