TABLE 8.
Testing data | One-step (%) | Two-step (%) | Ta (%) | FW (%) | DH (%) | Ta c (%) | FW c (%) | DH c (%) |
---|---|---|---|---|---|---|---|---|
Fθ | | | | | | | | |
sel(500, 0.001) | 99.8 | 99.8 | 26.6 | 79.0 | 41.6 | 73.8 | 71.8 | 67.8 |
sel(500, 0.2) | 98.4 | 98.4 | 26.8 | 23.2 | 28.0 | 66.4 | 12.2 | 20.4 |
sel(200, 0.001) | 93.8 | 93.8 | 11.0 | 25.8 | 21.4 | 51.0 | 52.0 | 50.0 |
sel(200, 0.2) | 87.6 | 87.6 | 11.6 | 8.4 | 12.0 | 42.6 | 11.2 | 17.0 |
bot random | 97.0 | 3.8 | 51.2 | 62.8 | 26.2 | 52.4 | 23.2 | 12.6 |
FK | | | | | | | | |
sel(500, 0.001) | 98.4 | 76.0 | 26.2 | 79.8 | 41.6 | 72.6 | 72.0 | 69.8 |
sel(500, 0.2) | 96.6 | 72.0 | 29.8 | 26.4 | 37.0 | 69.4 | 9.4 | 19.0 |
sel(200, 0.001) | 86.2 | 62.2 | 9.8 | 27.2 | 19.8 | 51.4 | 54.0 | 48.8 |
sel(200, 0.2) | 75.8 | 48.6 | 13.2 | 8.2 | 13.2 | 42.6 | 7.8 | 15.2 |
bot random | 55.8 | 3.0 | 52.8 | 62.4 | 26.4 | 62.4 | 24.0 | 12.0 |
Shown is the percentage of testing samples, simulated under different selection and bottleneck scenarios, that were predicted as selection. We compared boosting with the following approaches based on summary statistics: Ta, Tajima's D; FW, Fay and Wu's H; DH, DH test; c, center. First, these statistics were computed only once across the whole 40-kb region, which may weaken the selective signal through an averaging effect. Because the signal is usually strongest in the center of the region, we then computed the statistics from only the 4-kb center section of the region; these results are listed under Ta c, FW c, and DH c. “One-step” and “two-step” indicate one-step boosting and two-step boosting, respectively; the boosting results are the same as in Table 7. bot random = bot(N(0.02, 0.012), N(0.02, 0.012)). The type I error probability of boosting (both one-step and two-step) was adjusted to 5%, and for the other tests we likewise chose cutoff points according to the 5% quantile estimated from 50,000 simulated neutral samples. The samples were generated under both fixed θ (Fθ) and fixed K (FK). Overall, boosting performed much better at distinguishing selection from neutrality, although the difference between the methods was reduced slightly when Tajima's D, Fay and Wu's H, and the DH test were calculated only from the center section of the region. The advantage of boosting is particularly visible in the more difficult scenarios. Note that one-step boosting predicted most of the bottleneck samples as selection, whereas the DH test did not; two-step boosting, however, solved this problem.
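The calibration of the summary-statistic tests described above can be sketched as follows. This is a minimal illustration, not the authors' code: it computes Tajima's D from haplotypes coded as '0'/'1' (optionally restricted to the central 4 kb of a 40-kb region), takes the empirical 5% quantile of D over simulated neutral samples as the cutoff, and reports the percentage of test samples that fall below it. The helper names `tajimas_d`, `center_window`, `lower_tail_cutoff`, and `percent_predicted_selection` are hypothetical, and treating only the lower tail as evidence of selection is an assumption (selective sweeps typically push Tajima's D toward negative values); the caption does not state which tail was used.

```python
import math
from typing import List, Sequence


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's D for n aligned haplotypes coded '0' (ancestral) / '1' (derived)."""
    n = len(haplotypes)
    if n < 4:
        # For n = 2 or 3 the variance constants vanish and D is undefined.
        raise ValueError("need at least four sequences")
    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences
    for j in range(len(haplotypes[0])):
        k = sum(h[j] == "1" for h in haplotypes)  # derived-allele count at site j
        if 0 < k < n:
            seg_sites += 1
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    if seg_sites == 0:
        return 0.0  # no segregating sites: D is undefined, 0 by convention here
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1, e2 = c1 / a1, c2 / (a1 * a1 + a2)
    s = seg_sites
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def center_window(haplotypes: Sequence[str], positions: Sequence[int],
                  region_length: int = 40_000, window: int = 4_000) -> List[str]:
    """Keep only SNP columns whose position lies in the central half-open window."""
    start = (region_length - window) // 2
    end = start + window
    keep = [j for j, pos in enumerate(positions) if start <= pos < end]
    return ["".join(h[j] for j in keep) for h in haplotypes]


def lower_tail_cutoff(neutral_stats: Sequence[float], alpha: float = 0.05) -> float:
    """Empirical cutoff: at most alpha of the neutral values lie strictly below it."""
    ordered = sorted(neutral_stats)
    return ordered[int(alpha * len(ordered))]


def percent_predicted_selection(test_stats: Sequence[float], cutoff: float) -> float:
    """Percentage of test samples whose statistic falls below the cutoff."""
    return 100.0 * sum(s < cutoff for s in test_stats) / len(test_stats)
```

With 50,000 neutral samples, `lower_tail_cutoff` returns the 2,501st smallest value, so at most 5% of the neutral samples fall strictly below the cutoff. Passing each sample through `center_window` before `tajimas_d` would give the analogue of the "Ta c" column under the same assumptions.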