TABLE 3.
Detection power as a function of sequence length
Testing samples | l = 20 kb (%) | l = 8 kb (%) | l = 4 kb (%) | l = 2 kb (%) | l = 1 kb (%) |
---|---|---|---|---|---|
sel(500, 0.001) | 99.8 | 98.8 | 99.2 | 95.2 | 93.4 |
sel(500, 0.2) | 99.0 | 97.8 | 96.8 | 96.2 | 89.0 |
sel(200, 0.001) | 95.4 | 94.8 | 89.8 | 86.0 | 87.8 |
sel(200, 0.2) | 88.4 | 84.0 | 78.8 | 80.8 | 79.6 |
We consider samples of sequences of length l, with θ fixed at the same value in training and testing. Training was done with neu + sel(N(500, 200²), N(0.2, 0.1²)). The type I error probability (the probability of incorrectly classifying a neutral sample as selected) was adjusted to 5%. When l = 20, 8, or 4 kb, the length of the subsegments was chosen as 2 kb; when l = 2 or 1 kb, each subsegment was 0.5 kb. The summary statistics were computed independently for each subsegment. The predictive power remains high even for short regions: at l = 1 kb it is at least 79.6% in all four test settings.
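The two procedural steps in the note, cutting a region into equal subsegments and fixing the type I error at 5%, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, and the sketch assumes that the classifier returns a real-valued score for each sample, with larger values indicating stronger evidence for selection.

```python
def subsegment_bounds(length_bp, subsegment_bp):
    """Return (start, end) pairs of consecutive, non-overlapping subsegments."""
    if length_bp % subsegment_bp != 0:
        raise ValueError("region length must be a multiple of the subsegment length")
    return [(start, start + subsegment_bp)
            for start in range(0, length_bp, subsegment_bp)]


def calibrate_threshold(neutral_scores, alpha=0.05):
    """Pick a cutoff so that at most a fraction alpha of neutral scores exceed it.

    Assumption: a sample is called 'selected' when its score is strictly
    greater than the cutoff.
    """
    ranked = sorted(neutral_scores)
    n = len(ranked)
    allowed = int(alpha * n)  # neutral samples permitted above the cutoff
    return ranked[n - allowed - 1]


def detection_power(selection_scores, threshold):
    """Fraction of selected samples whose score exceeds the cutoff."""
    hits = sum(score > threshold for score in selection_scores)
    return hits / len(selection_scores)


if __name__ == "__main__":
    # Subsegment counts for the region lengths used in the table.
    for length_bp, sub_bp in [(20000, 2000), (8000, 2000), (4000, 2000),
                              (2000, 500), (1000, 500)]:
        print(length_bp, len(subsegment_bounds(length_bp, sub_bp)))
```

Under these assumptions, a 20-kb region yields ten 2-kb subsegments and a 1-kb region yields two 0.5-kb subsegments, so the number of per-subsegment summary statistics available to the classifier shrinks with l. The cutoff is calibrated on neutral samples only, and the power is then read off from the selected test samples at that fixed cutoff.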