. 2014 Oct 14;24(11):1311–1327. doi: 10.1038/cr.2014.131

Table 1. Error filters used in the computational pipeline.

| Filter name | Definition |
| --- | --- |
| Repetitive regions | We rejected nucleotide positions (“sites”) located in annotated repetitive DNA elements or in self-alignment regions with a similarity score > 80. |
| Homopolymers | We rejected sites located in homopolymers, defined as runs of four or more consecutive identical nucleotides, or in their flanking regions, defined as within 2 bp of homopolymers shorter than 6 nt or within 3 bp of longer homopolymers. |
| Base-calling error | We rejected sites for which the minor allele could be explained by random base-calling errors according to LoFreq [29]. |
| Extreme depth | We rejected sites whose sequencing depth was either too low (< 25) or too high (> 150) relative to the average sequencing depth of ∼80. |
| Misaligned reads | We rejected sites where > 50% of the reads supporting the major or minor allele had a high risk of misalignment. A read was considered at high risk when its BWA and BLAT alignments were inconsistent, when the site fell within 15 bp of the start or end of the aligned read, or when the site fell within 5 bp of a gap in the alignment. |
| Strand bias | We rejected sites where the majority of reads supporting the alternative allele came from only one strand direction. Fisher's exact test was used to compare the ratio of reads supporting the reference and alternative alleles between the two strand directions, and sites with P < 0.05 were rejected. |
| Clustered sites | We rejected sites located in, or within 20 kb of, genomic regions containing clusters of three or more sites with minor allele fractions between 10% and 35% and a maximal distance between two adjacent sites of < 10 kb. |
| Complete linkage | We rejected sites for which one allele showed complete coincidence with an allele at an adjacent polymorphic site within the same read pair. Fisher's exact test was performed on the numbers of read pairs supporting each of the four allele combinations, and sites with P < 0.01 and no more than one disagreeing read pair were rejected. |
| Within-read position | We rejected sites where, in the majority of reads supporting the alternative allele, the site was clustered at one end of the read. The Wilcoxon rank-sum test was used to compare the position of the site within the read between reads supporting the reference and alternative alleles, and sites with P < 0.05 were rejected. |
| Observed in common | We rejected sites whose allele fractions deviated strongly from germline expectations in two or more individuals. |
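
The homopolymer and extreme-depth rules in Table 1 act as simple per-site masks. The Python sketch below is only an illustration, not the pipeline's actual code. It assumes an uppercase reference string and an integer depth per site. The helper names `homopolymer_mask` and `passes_depth` are hypothetical. Runs of 6 nt or more receive the 3-bp flank, which reads "longer" as "not shorter than 6 nt".

```python
def homopolymer_mask(seq: str, min_run: int = 4) -> list[bool]:
    """Flag positions inside or near homopolymer runs (hypothetical helper).

    A run of >= min_run identical bases is masked together with a flank of
    2 bp if the run is shorter than 6 nt, or 3 bp otherwise (assumption:
    a 6-nt run counts as a "longer" homopolymer).
    """
    n = len(seq)
    mask = [False] * n
    i = 0
    while i < n:
        # Find the end (exclusive) of the run of identical bases starting at i.
        j = i
        while j < n and seq[j] == seq[i]:
            j += 1
        run_len = j - i
        if run_len >= min_run:
            flank = 2 if run_len < 6 else 3
            # Mask the run plus its flanks, clipped to the sequence bounds.
            for k in range(max(0, i - flank), min(n, j + flank)):
                mask[k] = True
        i = j
    return mask


def passes_depth(depth: int, low: int = 25, high: int = 150) -> bool:
    """Keep a site unless its depth is < low or > high (thresholds from Table 1)."""
    return low <= depth <= high


if __name__ == "__main__":
    ref = "ACGTAAAAGCTACGT"
    masked = [pos for pos, flag in enumerate(homopolymer_mask(ref)) if flag]
    print(masked)
    print(passes_depth(80), passes_depth(20), passes_depth(200))
```

In this example, the run AAAA at 0-based positions 4 to 7 is masked together with 2 bp on each side, so positions 2 to 9 are flagged. A depth of 80 passes, while depths of 20 and 200 are rejected.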
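
One way to read the clustered-sites rule is as a single-linkage grouping of sites along a chromosome. The Python sketch below works under stated assumptions and is not the authors' implementation. Sites on one chromosome are given as (position, minor allele fraction) pairs. The 10% and 35% bounds are treated as inclusive. A cluster region is taken to span from its first to its last qualifying site. The function name `clustered_site_mask` is hypothetical.

```python
def clustered_site_mask(sites, maf_low=0.10, maf_high=0.35,
                        max_gap=10_000, min_sites=3, pad=20_000):
    """Return one flag per input site: True if it lies in or within `pad` bp of a cluster.

    sites: list of (position, minor_allele_fraction) on a single chromosome.
    Assumptions (not stated in the source): MAF bounds are inclusive, and a
    cluster region spans from its first to its last qualifying site.
    """
    # Sites whose minor allele fraction qualifies them as cluster members, sorted by position.
    cand = sorted(pos for pos, maf in sites if maf_low <= maf <= maf_high)

    # Chain adjacent candidates spaced < max_gap apart; keep chains of >= min_sites.
    regions = []
    start = 0
    for k in range(1, len(cand) + 1):
        if k == len(cand) or cand[k] - cand[k - 1] >= max_gap:
            if k - start >= min_sites:
                regions.append((cand[start] - pad, cand[k - 1] + pad))
            start = k

    # Any site (qualifying or not) inside a padded cluster region is rejected.
    return [any(lo <= pos <= hi for lo, hi in regions) for pos, _ in sites]
```

As an example, three sites at 100, 105 and 112 kb with fractions between 0.15 and 0.30 form a cluster. Every site from 80 kb to 132 kb is then flagged, including sites whose own minor allele fraction lies outside the 10% to 35% range.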
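
The strand-bias and complete-linkage filters both reduce to a two-sided Fisher's exact test on a 2×2 table of read counts. The standard-library Python sketch below is an illustration under assumptions, and the function names are hypothetical. The two-sided P-value sums the probabilities of all tables that are no more probable than the observed one, which is also the convention of SciPy's `scipy.stats.fisher_exact`. Table 1 does not define "disagreeing read pairs" precisely, so this sketch assumes they are the smallest of the four allele-combination counts.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # P(top-left cell == x) given fixed row and column totals.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def strand_bias_reject(ref_fwd, ref_rev, alt_fwd, alt_rev, alpha=0.05):
    """Reject when the strand split of alt-supporting reads differs from ref reads (P < alpha)."""
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev) < alpha


def complete_linkage_reject(n_AB, n_Ab, n_aB, n_ab, alpha=0.01, max_disagree=1):
    """Reject a site whose alleles track those of an adjacent site within read pairs.

    Counts are read pairs carrying each allele combination (A/a at this site,
    B/b at the neighbour). Assumption: "disagreeing" read pairs are the
    smallest of the four counts, i.e. the combination that breaks the linkage.
    """
    disagree = min(n_AB, n_Ab, n_aB, n_ab)
    return (fisher_exact_two_sided(n_AB, n_Ab, n_aB, n_ab) < alpha
            and disagree <= max_disagree)
```

For the textbook table [[1, 9], [11, 3]], `fisher_exact_two_sided` returns about 0.0028. A site with 20/20 reference reads but 10/0 alternative reads on the forward/reverse strands would be rejected for strand bias.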
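
The within-read position filter compares two samples of site positions with a Wilcoxon rank-sum test. One sample comes from reads supporting the reference allele, the other from reads supporting the alternative allele. The standard-library Python sketch below relies on several assumptions. It uses the normal approximation without tie correction, the same approach as SciPy's `scipy.stats.ranksums`. Positions are measured from the start of the read. The function names are hypothetical, and the pipeline may have computed the test differently.

```python
import math


def _midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the block of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = _midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def read_position_reject(ref_positions, alt_positions, alpha=0.05):
    """Reject when site positions in alt-supporting reads differ from ref-supporting reads."""
    _, p = rank_sum_test(ref_positions, alt_positions)
    return p < alpha
```

If the alternative allele is seen only within the first few bases of its reads while reference reads cover the site across the whole read length, `read_position_reject` returns True. Identical position distributions give z = 0 and P = 1.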