Table 5. Genotype inference task.
We evaluated a single method (PCA with clustering) on the genotype inference task (which inversion genotype does a sample have?) using two benchmark test cases (positive from a single population and positive from multiple populations). Note that the two association-testing methods cannot infer genotypes. For each chromosome arm used, we indicate the known inversions, the number of genotypes present in the data set, the number of clusters, and the balanced accuracy calculated from the cluster predictions. The D. melanogaster 3R chromosome arm has three mutually exclusive inversions, which we list separately.
Test Case | Chrom. | Inversion | Present Genotypes | Clusters | Balanced Accuracy |
---|---|---|---|---|---|
Single | D. mel. 2L | In(2L)t | 3 | 3 | 93.3% |
Single | D. mel. 2R | In(2R)NS | 3 | 3 | 94.4% |
Single | D. mel. 3R | In(3R)Mo | 3 | | 60.7% |
Single | D. mel. 3R | In(3R)p | 3 | | 43.3% |
Single | D. mel. 3R | In(3R)K | 3 | | 55.0% |
Multiple | 150 An. gam. and col. 2L | 2La | 2 | 3 | 66.7% |
Multiple | 81 An. gam. 2L | 2La | 2 | 2 | 100.0% |
Multiple | 34 An. gam. and col. 2L | 2La | 3 | 4 | 100.0% |
We evaluated clustering by how accurately it inferred inversion genotypes. Inversion genotypes were retrieved from the original papers describing the data [17, 37–39]. Association of the known genotypes with the cluster labels was measured using balanced accuracy.
*Could not resolve multiple, mutually exclusive inversions.
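As an illustration of the metric, the following minimal Python sketch computes balanced accuracy (the mean of per-genotype recall) from cluster labels. The source does not state how clusters were matched to genotypes, so the majority-vote mapping here is an assumption, and the function names and toy data are hypothetical rather than taken from the study.

```python
from collections import Counter, defaultdict


def map_clusters_to_genotypes(cluster_labels, genotypes):
    """Assign each cluster the most common known genotype among its samples.

    Assumption: majority vote; the original analysis may have matched
    clusters to genotypes differently.
    """
    members = defaultdict(Counter)
    for cluster, genotype in zip(cluster_labels, genotypes):
        members[cluster][genotype] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in members.items()}


def balanced_accuracy(true_labels, predicted_labels):
    """Mean per-class recall over the classes present in true_labels."""
    correct = Counter()
    total = Counter()
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy data: nine samples, three inversion genotypes.
genotypes = ["std/std"] * 4 + ["std/inv"] * 3 + ["inv/inv"] * 2
clusters = [0, 0, 0, 1, 1, 1, 1, 2, 2]

mapping = map_clusters_to_genotypes(clusters, genotypes)
predicted = [mapping[c] for c in clusters]
score = balanced_accuracy(genotypes, predicted)
print(f"{score:.1%}")  # prints 91.7%
```

Because each genotype contributes equally to the mean regardless of how many samples carry it, a rare genotype that is misclustered lowers the score as much as a common one would.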