2003 May 30;4(6):R40. doi: 10.1186/gb-2003-4-6-r40

Table 3.

Comparison of prevalent compositionally biased regions for the whole proteome, translated intergenic DNA, known proteins, hypothetical proteins and dORFs in budding yeast

(a) Proteome
Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³

S 37,006 S 18,502 S 10,630
E 21,163 E 9,147 T 5,900
L 18,064 T 6,836 E 4,704
K 17,067 N 6,462 (9.3) Q 3,924 (10.4)
N 15,577 (7.4) Q 5,212 (7.5) N 3,745 (10.0)
A 13,974 K 4,280 P 2,049
G 12,927 P 3,831 K 1,910
D 10,004 L 3,512 D 1,292
P 9,892 D 3,176 G 961
T 9,866 A 2,473 A 916
F 8,934 G 2,115 L 554
Q 8,689 (4.1) C 810 C 256
I 6,939 F 764 R 204
R 5,333 H 662 H 195
V 4,121 R 509 M 163
C 3,293 I 264 F 94
Y 2,960 Y 262 V 90
H 2,645 M 245 Y 33
W 2,009 V 150 W 0
M 850 W 0 I 0
Total 211,313 Total 69,212 Total 37,620
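
The parenthesized figures beside N and Q are not defined in the table itself; they are consistent with each residue's percentage of its column total. A quick check under that assumption, using counts from panel (a) and a hypothetical helper name:

```python
def percent_of_total(count: int, total: int) -> float:
    """Share of `count` in `total`, as a percentage rounded to one decimal.

    Assumption: the parenthesized values in the table are percentages of
    the column total; this helper is illustrative only.
    """
    return round(100 * count / total, 1)


# Panel (a), Pbias < 1e-5 column: N = 15,577 and Q = 8,689 of 211,313
print(percent_of_total(15_577, 211_313))
print(percent_of_total(8_689, 211_313))

# Panel (a), Pbias < 1e-9 column: N = 6,462 of 69,212
print(percent_of_total(6_462, 69_212))
```

The results can be compared with the values printed in parentheses next to N and Q in the corresponding columns.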

(b) Translated igDNA*

Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³

F 28,949 F 5,692 F 1,211
C 10,074 C 1,280 H 602
K 7,800 H 908 V 490
R 7,551 V 814 T 448
Y 6,450 K 753 C 377
L 6,283 Y 690 L 366
I 3,789 T 681 Y 282
H 3,157 P 675 P 243
P 1,650 R 594 S 222
S 1,613 L 576 K 186
V 1,566 S 380 I 185
T 1,299 G 380 R 178
G 1,136 I 353 N 173 (3.2)
N 798 (0.9) W 299 G 166
W 746 N 242 (1.7) W 98
Q 498 (0.6) Q 125 (0.9) Q 51 (1.0)
M 282 E 85 E 39
A 268 M 26 D 16
E 241 D 16 M 15
D 16 A 0 A 0
Total 84,166 Total 14,569 Total 5,348

(c) Known yeast proteins

Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³

S 27,539 S 15,328 S 9,819
E 17,519 E 8,074 T 5,900
L 13,928 N 5,716 (9.9) E 4,289
K 13,785 T 5,413 N 3,551 (11.9)
N 12,854 (7.7) Q 4,520 (7.8) Q 3,348 (11.3)
A 12,482 K 3,653 K 1,723
G 11,783 L 2,864 P 1,669
D 1,934 L 595 P 170
P 1,883 P 453 G 62
Q 7,299 (4.4) A 2,434 G 899
P 7,045 G 1,969 L 451
F 6,154 C 608 C 207
I 5,495 H 530 H 162
R 3,973 R 447 R 155
V 3,415 F 443 M 113
C 2,400 I 264 F 78
Y 2,158 Y 218 V 0
H 1,536 M 195 Y 0
W 1,484 V 60 W 0
M 656 W 0 I 0
Total 166,920 (13) Total 57,938 (38) Total 33,070 (19)

(d) Hypothetical yeast proteins

Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³

S 8,621 S 2,958 T 1,240
L 3,905 T 1,423 S 772
E 3,630 E 1,073 Q 576 (13.7)
K 3,043 Q 680 (6.8) E 415
F 2,747 N 664 (6.6) D 262
N 2,506 (6.4) K 602 N 194 (4.6)
T 2,050 D 600 K 187
D 1,934 L 595 P 170
P 1,883 P 453 G 62
A 1,386 F 321 V 55
I 1,267 C 202 M 50
R 1,264 G 146 L 50
Q 1,171 (3.0) H 106 R 49
G 882 R 62 C 49
C 863 V 55 Y 33
H 528 M 50 H 33
W 514 Y 44 F 16
Y 512 A 14 A 0
V 389 V 0 W 0
M 179 W 0 I 0
Total 39,274 (16) Total 10,048 (221) Total 4,213 (150)

(e) dORFs

Pbias < 1 × 10⁻⁵ Pbias < 1 × 10⁻⁹ Pbias < 1 × 10⁻¹³

R 459 R 254 R 254
H 307 L 204 L 204
S 288 T 138 H 122
G 271 Q 129 (11.0) T 120
L 248 H 122 C 99
Q 225 (6.8) C 99 Q 74 (8.3)
T 208 S 82 N 23 (2.6)
N 172 (5.2) P 72 A 0
F 168 Y 50 D 0
C 163 N 23 (2.0) E 0
V 151 A 0 F 0
A 149 D 0 G 0
D 111 E 0 I 0
I 98 F 0 K 0
P 84 G 0 P 0
Y 67 I 0 S 0
E 45 K 0 V 0
K 37 M 0 Y 0
W 23 V 0 W 0
M 14 W 0 M 0
Total 3,288 Total 1,173 Total 896

*Translated igDNA is intergenic DNA conceptually translated in all six reading frames. For budding yeast, we used the 'Not Feature' file of FASTA-format sequences distributed by SGD, which contains all genomic DNA that does not overlap an annotated feature [32]; amino-acid compositional biases in the translations were tallied as for the annotated budding-yeast proteome. A dORF is an open reading frame that is disrupted by one or more frameshifts or premature stop codons and is likely to be a pseudogene; a data set of dORFs was derived previously for the budding-yeast genome [9]. In the totals for known and hypothetical proteins, the number of bias residues per residue of protein is given in parentheses.