Table 2. Comparative analysis of statistical methods for the identification of outliers representing erroneous ESTs of PTS1 proteins. Three statistical methods (see text) were evaluated for their ability to identify outliers in OG-specific PWM score histograms. A method was rated too insensitive for an OG if it failed to identify additional apparent outliers and experimentally validated cytosolic sequences, and too unspecific if it flagged too many sequences as outliers. For very good outlier detection, we recommend applying all three methods and eliminating the sequences that at least two of them identify as outliers. ACX1, acyl-CoA oxidase; AGT, alanine (serine)-glyoxylate aminotransferase; ATF1/2, acetyltransferase; BSMDR, quinone oxidoreductase; GSTT1, glutathione S-transferase isoform theta 1; HPR, hydroxypyruvate reductase; MLS, malate synthase; SDRb/DECR, short-chain dehydrogenase-reductase B/2,4-dienoyl-CoA reductase; SCP2, sterol carrier protein isoform 2; seq., sequence(s); cyt., cytosolic; insens., insensitive; unspec., unspecific.
Methods: M1, "Standard deviation from mean value"; M2, "Positive deviation from median score"; M3, "Interquartile range".

| OG acronym | Total seq. number | M1: number of seq. excluded | M1: number of seq. excluded (%) | M1: number of erroneous (cyt.) seq. excluded | M1: conclusion | M2: number of seq. excluded | M2: number of seq. excluded (%) | M2: number of erroneous (cyt.) seq. excluded | M2: conclusion | M3: number of seq. excluded | M3: number of seq. excluded (%) | M3: number of erroneous (cyt.) seq. excluded | M3: conclusion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ACX1 | 88 | 3 | 3.4 | 0/0 | good | 3 | 3.4 | 0/0 | good | 3 | 3.4 | 0/0 | good |
| AGT | 94 | 15 | 16.0 | 1/1 | good | 15 | 16.0 | 1/1 | good | 14 | 14.9 | 1/1 | good |
| ATF1/2 | 61 | 4 | 6.6 | 0/0 | good | 5 | 8.2 | 0/0 | good | 6 | 9.8 | 0/0 | good |
| BSMDR | 52 | 3 | 5.8 | 0/0 | good | 2 | 3.8 | 0/0 | good | 2 | 3.8 | 0/0 | good |
| GSTT1 | 54 | 4 | 7.4 | 1/1 | good | 5 | 7.4 | 1/1 | good | 4 | 7.4 | 1/1 | good |
| HPR | 76 | 4 | 5.3 | 0/0 | too insens. | 10 | 13.2 | 0/0 | good | 7 | 9.2 | 0/0 | good |
| MLS | 47 | 13 | 27.7 | 1/1 | good | 13 | 27.7 | 1/1 | good | 0 | 0 | 0/1 | too insens. |
| SDRb/DECR | 72 | 9 | 12.5 | 1/1 | too unspec. | 5 | 12.5 | 1/1 | too unspec. | 2 | 2.8 | 1/1 | good |
| SCP2 | 91 | 6 | 6.6 | 2/2 | good | 10 | 11.0 | 2/2 | too unspec. | 6 | 6.6 | 2/2 | good |
| total | 635 | 61 | 9.6 | 6/6 | good | 68 | 10.7 | 6/6 | good | 44 | 6.9 | 5/6 | good |

| Combined method application | Number of seq. excluded | Number of seq. excluded (%) | Number of erroneous (cyt.) seq. excluded | Conclusion |
|---|---|---|---|---|
| total | 60 | 9.4 | 6/6 | very good |
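
The combination rule recommended in the caption (eliminate a sequence when at least two of the three methods flag it) can be sketched in Python as follows. This is only an illustration under stated assumptions: the exact definitions of the three methods are given in the main text and are not reproduced here, so the function names, the cutoffs `k`, the choice to flag only low-scoring sequences, and the way the median-based spread is estimated are all hypothetical placeholders.

```python
from statistics import mean, median, pstdev, quantiles


def sd_outliers(scores, k=2.0):
    """Method 1 (assumed form): flag scores more than k SDs below the mean."""
    m, s = mean(scores), pstdev(scores)
    return [x < m - k * s for x in scores]


def median_outliers(scores, k=2.0):
    """Method 2 (assumed form): flag scores far below the median.

    The spread is estimated from the positive deviations only (scores at or
    above the median), on the assumption that the upper half of the
    histogram contains no erroneous sequences.
    """
    med = median(scores)
    positive_dev = [x - med for x in scores if x >= med]
    spread = mean(positive_dev)
    return [x < med - k * spread for x in scores]


def iqr_outliers(scores, k=1.5):
    """Method 3 (assumed form): flag scores below Q1 - k * IQR."""
    q1, _, q3 = quantiles(scores, n=4)
    return [x < q1 - k * (q3 - q1) for x in scores]


def consensus_outliers(scores, min_votes=2):
    """Flag a sequence when at least min_votes of the three methods agree."""
    votes = zip(sd_outliers(scores), median_outliers(scores), iqr_outliers(scores))
    return [sum(v) >= min_votes for v in votes]


# Hypothetical PWM scores for one OG; the last sequence scores far lower.
example_scores = [10.0, 10.5, 11.0, 9.5, 10.2, 9.8, 10.7, 10.1, 9.9, 10.3, 2.0]
flags = consensus_outliers(example_scores)
excluded = [s for s, f in zip(example_scores, flags) if f]
```

Requiring two votes rather than one means a single over-eager method cannot exclude a sequence on its own, while a method that misses an outlier can still be outvoted by the other two.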