Genetics. 2011 Jan;187(1):229–244. doi: 10.1534/genetics.110.122614

TABLE 3.

Detection power as a function of sequence length

Testing samples    l = 20 kb (%)   l = 8 kb (%)   l = 4 kb (%)   l = 2 kb (%)   l = 1 kb (%)
sel(500, 0.001)    99.8            98.8           99.2           95.2           93.4
sel(500, 0.2)      99.0            97.8           96.8           96.2           89.0
sel(200, 0.001)    95.4            94.8           89.8           86.0           87.8
sel(200, 0.2)      88.4            84.0           78.8           80.8           79.6

We consider samples of sequences of length l, with θ fixed at the same value in training and testing. Training was done with neu + sel(N(500, 200²), N(0.2, 0.1²)). The type I error probability (the probability of incorrectly classifying a neutral sample) was adjusted to 5%. For l = 20, 8, or 4 kb, each subsegment was 2 kb long; for l = 2 or 1 kb, each subsegment was 0.5 kb long. The summary statistics were computed independently for each subsegment. Detection power remains high even for short regions: at l = 1 kb it ranges from 79.6% to 93.4% across the four testing scenarios.
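
The setup in this note can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the lookup table, and the assumption that the classifier returns one numeric score per sample (larger meaning more selection-like) are all hypothetical. It shows how a region is divided into subsegments, how a cutoff can be calibrated on neutral samples to give a 5% type I error, and how detection power follows from that cutoff.

```python
import math

# Subsegment length (kb) for each region length l (kb), as stated in the table note.
SUBSEGMENT_KB = {20: 2.0, 8: 2.0, 4: 2.0, 2: 0.5, 1: 0.5}


def subsegment_bounds(region_kb):
    """Return (start, end) positions in kb of each subsegment of the region."""
    sub = SUBSEGMENT_KB[region_kb]
    n = round(region_kb / sub)
    return [(i * sub, (i + 1) * sub) for i in range(n)]


def calibrate_threshold(neutral_scores, alpha=0.05):
    """Cutoff such that at most a fraction alpha of neutral scores exceed it.

    Assumes (hypothetically) that larger scores indicate selection.
    """
    ordered = sorted(neutral_scores)
    k = math.ceil((1 - alpha) * len(ordered))  # number of scores at or below the cutoff
    return ordered[k - 1]


def detection_power(selection_scores, threshold):
    """Fraction of selection samples whose score exceeds the calibrated cutoff."""
    return sum(s > threshold for s in selection_scores) / len(selection_scores)
```

Under these assumptions, a 20 kb region yields ten 2 kb subsegments and a 1 kb region yields two 0.5 kb subsegments; the summary statistics would then be computed separately on each interval returned by `subsegment_bounds`.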