Table 1.
Six informative genotype combinations to estimate heterogeneity/contamination ratio ᵃ
Combination | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
S1 genotype | AA | AA | AA | AT | AT | AT |
S2 genotype | TT | AT | TG | GG | AG | GC |
S2 ratio (SNP heterogeneity ratio) | T/(A + T) ᵇ | 2T/(A + T) ᵇ | (T + G)/(A + T + G) | G/(A + T + G) ᶜ | 2G/(A + T + G) ᶜ | (G + C)/(A + T + G + C) |
Nucleotide frequency pattern | Large A, small T | Large A, small T | Large A, small T and G | Large A and T, small G | Large A and T, small G | Large A and T, small G and C |
ᵃ S1 is the major component and S2 is the minor (contaminating) component of the mixed sample. Each combination uses specific nucleotides to represent a class of combinations; for example, combination 1 represents all cases in which S1 and S2 are homozygous for different nucleotides. In the S2 ratio formulas, each nucleotide letter denotes its read count in the NGS data.
ᵇ Combinations 1 and 2 cannot be distinguished from observed NGS data, so the midpoint of their two formulas, 1.5T/(A + T), is used for both.
ᶜ Combinations 4 and 5 cannot be distinguished from observed NGS data, so the midpoint of their two formulas, 1.5G/(A + T + G), is used for both.
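
The formulas in Table 1 reduce to two cases: when one nucleotide is unique to S2, the 1.5 coefficient from the footnotes applies, and when two nucleotides are unique to S2, their summed counts are divided by the total. The following is a minimal Python sketch of that arithmetic, not the authors' implementation. The function name `s2_ratio` and its arguments are hypothetical, and it assumes the caller has already decided which observed nucleotides belong to S1 and which are unique to S2.

```python
def s2_ratio(s1_counts, s2_only_counts):
    """Estimate the S2 ratio at one SNP from read counts (illustrative sketch).

    s1_counts      -- read counts of the S1 allele(s): one value if S1 is
                      homozygous, two if S1 is heterozygous.
    s2_only_counts -- read counts of the nucleotide(s) seen only in S2:
                      one value (combinations 1/2 and 4/5) or two values
                      (combinations 3 and 6).
    """
    if len(s1_counts) not in (1, 2):
        raise ValueError("S1 must contribute one or two alleles")
    if len(s2_only_counts) not in (1, 2):
        raise ValueError("S2 must contribute one or two unique nucleotides")

    total = sum(s1_counts) + sum(s2_only_counts)
    if total == 0:
        raise ValueError("no reads observed at this SNP")

    minor = sum(s2_only_counts)
    if len(s2_only_counts) == 1:
        # Combinations 1/2 (or 4/5) are indistinguishable, so the
        # table's footnotes use the coefficient 1.5 for both.
        return 1.5 * minor / total
    # Combinations 3 and 6: both S2 alleles are observed directly.
    return minor / total


# Hypothetical read counts, chosen only to illustrate the arithmetic.
print(s2_ratio([950], [50]))           # combinations 1/2: A=950, T=50
print(s2_ratio([900], [50, 50]))       # combination 3: A=900, T=50, G=50
print(s2_ratio([480, 470], [50]))      # combinations 4/5: A=480, T=470, G=50
print(s2_ratio([450, 450], [50, 50]))  # combination 6: A=450, T=450, G=50, C=50
```

The read counts in the usage lines are made up for illustration. Real data would also need sequencing-error filtering before small minor counts are treated as S2 signal, which this sketch does not attempt.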