Table 3.
Comparison of prevalent compositionally biased regions for the whole proteome, translated intergenic DNA, known proteins, hypothetical proteins and dORFs in budding yeast
(a) Proteome
Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
S | 37,006 | S | 18,502 | S | 10,630 |
E | 21,163 | E | 9,147 | T | 5,900 |
L | 18,064 | T | 6,836 | E | 4,704 |
K | 17,067 | N | 6,462 (9.3) | Q | 3,924 (10.4) |
N | 15,577 (7.4) | Q | 5,212 (7.5) | N | 3,745 (10.0) |
A | 13,974 | K | 4,280 | P | 2,049 |
G | 12,927 | P | 3,831 | K | 1,910 |
D | 10,004 | L | 3,512 | D | 1,292 |
P | 9,892 | D | 3,176 | G | 961 |
T | 9,866 | A | 2,473 | A | 916 |
F | 8,934 | G | 2,115 | L | 554 |
Q | 8,689 (4.1) | C | 810 | C | 256 |
I | 6,939 | F | 764 | R | 204 |
R | 5,333 | H | 662 | H | 195 |
V | 4,121 | R | 509 | M | 163 |
C | 3,293 | I | 264 | F | 94 |
Y | 2,960 | Y | 262 | V | 90 |
H | 2,645 | M | 245 | Y | 33 |
W | 2,009 | V | 150 | W | 0 |
M | 850 | W | 0 | I | 0 |
Total | 211,313 | Total | 69,212 | Total | 37,620 |
(b) Translated igDNA*
Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
F | 28,949 | F | 5,692 | F | 1,211 |
C | 10,074 | C | 1,280 | H | 602 |
K | 7,800 | H | 908 | V | 490 |
R | 7,551 | V | 814 | T | 448 |
Y | 6,450 | K | 753 | C | 377 |
L | 6,283 | Y | 690 | L | 366 |
I | 3,789 | T | 681 | Y | 282 |
H | 3,157 | P | 675 | P | 243 |
P | 1,650 | R | 594 | S | 222 |
S | 1,613 | L | 576 | K | 186 |
V | 1,566 | S | 380 | I | 185 |
T | 1,299 | G | 380 | R | 178 |
G | 1,136 | I | 353 | N | 173 (3.2) |
N | 798 (0.9) | W | 299 | G | 166 |
W | 746 | N | 242 (1.7) | W | 98 |
Q | 498 (0.6) | Q | 125 (0.9) | Q | 51 (1.0) |
M | 282 | E | 85 | E | 39 |
A | 268 | M | 26 | D | 16 |
E | 241 | D | 16 | M | 15 |
D | 16 | A | 0 | A | 0 |
Total | 84,166 | Total | 14,569 | Total | 5,348 |
(c) Known yeast proteins†
Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
S | 27,539 | S | 15,328 | S | 9,819 |
E | 17,519 | E | 8,074 | T | 5,900 |
L | 13,928 | N | 5,716 (9.9) | E | 4,289 |
K | 13,785 | T | 5,413 | N | 3,551 (11.9) |
N | 12,854 (7.7) | Q | 4,520 (7.8) | Q | 3,348 (11.3) |
A | 12,482 | K | 3,653 | K | 1,723 |
G | 11,783 | L | 2,864 | P | 1,669 |
Q | 7,299 (4.4) | A | 2,434 | G | 899 |
P | 7,045 | G | 1,969 | L | 451 |
F | 6,154 | C | 608 | C | 207 |
I | 5,495 | H | 530 | H | 162 |
R | 3,973 | R | 447 | R | 155 |
V | 3,415 | F | 443 | M | 113 |
C | 2,400 | I | 264 | F | 78 |
Y | 2,158 | Y | 218 | V | 0 |
H | 1,536 | M | 195 | Y | 0 |
W | 1,484 | V | 60 | W | 0 |
M | 656 | W | 0 | I | 0 |
Total | 166,920 (13) | Total | 57,938 (38) | Total | 33,070 (19) |
(d) Hypothetical yeast proteins†
Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
S | 8,621 | S | 2,958 | T | 1,240 |
L | 3,905 | T | 1,423 | S | 772 |
E | 3,630 | E | 1,073 | Q | 576 (13.7) |
K | 3,043 | Q | 680 (6.8) | E | 415 |
F | 2,747 | N | 664 (6.6) | D | 262 |
N | 2,506 (6.4) | K | 602 | N | 194 (4.6) |
T | 2,050 | D | 600 | K | 187 |
D | 1,934 | L | 595 | P | 170 |
P | 1,883 | P | 453 | G | 62 |
A | 1,386 | F | 321 | V | 55 |
I | 1,267 | C | 202 | M | 50 |
R | 1,264 | G | 146 | L | 50 |
Q | 1,171 (3.0) | H | 106 | R | 49 |
G | 882 | R | 62 | C | 49 |
C | 863 | V | 55 | Y | 33 |
H | 528 | M | 50 | H | 33 |
W | 514 | Y | 44 | F | 16 |
Y | 512 | A | 14 | A | 0 |
V | 389 | I | 0 | W | 0 |
M | 179 | W | 0 | I | 0 |
Total | 39,274 (16) | Total | 10,048 (221) | Total | 4,213 (150) |
(e) dORFs
Residue | Pbias < 1 × 10⁻⁵ | Residue | Pbias < 1 × 10⁻⁹ | Residue | Pbias < 1 × 10⁻¹³ |
R | 459 | R | 254 | R | 254 |
H | 307 | L | 204 | L | 204 |
S | 288 | T | 138 | H | 122 |
G | 271 | Q | 129 (11.0) | T | 120 |
L | 248 | H | 122 | C | 99 |
Q | 225 (6.8) | C | 99 | Q | 74 (8.3) |
T | 208 | S | 82 | N | 23 (2.6) |
N | 172 (5.2) | P | 72 | A | 0 |
F | 168 | Y | 50 | D | 0 |
C | 163 | N | 23 (2.0) | E | 0 |
V | 151 | A | 0 | F | 0 |
A | 149 | D | 0 | G | 0 |
D | 111 | E | 0 | I | 0 |
I | 98 | F | 0 | K | 0 |
P | 84 | G | 0 | P | 0 |
Y | 67 | I | 0 | S | 0 |
E | 45 | K | 0 | V | 0 |
K | 37 | M | 0 | Y | 0 |
W | 23 | V | 0 | W | 0 |
M | 14 | W | 0 | M | 0 |
Total | 3,288 | Total | 1,173 | Total | 896 |
*Translated igDNA is intergenic DNA conceptually translated in all six reading frames. For budding yeast, we used the 'Not Feature' file of FASTA-format sequences distributed by SGD, which contains all genomic DNA that does not overlap an annotated feature [32]. Amino-acid compositional biases in the translations were tallied as for the annotated budding-yeast proteome.
A dORF is an open reading frame that is disrupted by one or more frameshifts or premature stop codons and is likely to be a pseudogene. A data set of dORFs has been derived previously for the budding-yeast genome [9].
†In the totals for known and hypothetical proteins, the number of protein residues per bias residue is given in parentheses.
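The six-frame conceptual translation described in the footnote can be sketched as follows. This is a minimal illustration, not the procedure actually used for the table: it assumes the standard genetic code, the function names (`reverse_complement`, `translate`, `six_frame_translations`, `tally_residues`) are hypothetical, and the final tally is a plain residue count over all frames rather than the Pbias-based identification of biased regions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
# Assumption: the standard code applies; '*' marks stop codons.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; ambiguous codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop the trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translations(seq):
    """Translate three forward frames, then three reverse-complement frames."""
    strands = (seq, reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def tally_residues(translations):
    """Count amino-acid residues over all frames, skipping stops and 'X'."""
    counts = Counter()
    for protein in translations:
        counts.update(aa for aa in protein if aa not in "*X")
    return counts


frames = six_frame_translations("ATGAAATTT")
print(frames)
print(tally_residues(frames).most_common(3))
```

In practice each sequence in the 'Not Feature' FASTA file would be passed through `six_frame_translations`, and the translated segments between stop codons would then be scanned for biased regions in the same way as the annotated proteins.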