Table 6.
Gene Clusters Deduced from the X-Matrix, for a Selected Set of Complexes/Functional Units
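A. Percentage of each complex accumulated in each one of the nine clusters

| Complexes | No. of Prot. | Cluster 1 (51 prot.) | Cluster 2 (55 prot.) | Cluster 3 (58 prot.) | Cluster 4 (12 prot.) | Cluster 5 (17 prot.) | Cluster 6 (6 prot.) | Cluster 7 (15 prot.) | Cluster 8 (9 prot.) | Cluster 9 (54 prot.) |
|---|---|---|---|---|---|---|---|---|---|---|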
| PSI | 12 | 33.33 | 8.33 | 8.33 | 0 | 0 | 0 | 0 | 8.33 | 41.67 |
| PSII | 18 | 16.67 | 0 | 5.56 | 0 | 5.56 | 0 | 0 | 0 | 72.22 |
| ATPase | 8 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 75 |
| Cytb6f | 6 | 16.67 | 0 | 0 | 0 | 0 | 0 | 16.67 | 0 | 66.67 |
| NADHase | 11 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 |
| Phyb | 9 | 11.11 | 11.11 | 77.78 | 0 | 0 | 0 | 0 | 0 | 0 |
| RibProt | 43 | 46.51 | 4.65 | 2.33 | 0 | 2.33 | 0 | 0 | 9.3 | 34.88 |
| RNApol | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
| CellDiv | 5 | 20 | 40 | 0 | 40 | 0 | 0 | 0 | 0 | 0 |
| HypoProt | 73 | 8.22 | 30.14 | 24.66 | 2.74 | 1.37 | 6.85 | 17.81 | 1.37 | 6.85 |
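
B. Weight (in percentage) of each complex within each of the clusters

| Complexes | No. of Prot. | Cluster 1 (51 prot.) | Cluster 2 (55 prot.) | Cluster 3 (58 prot.) | Cluster 4 (12 prot.) | Cluster 5 (17 prot.) | Cluster 6 (6 prot.) | Cluster 7 (15 prot.) | Cluster 8 (9 prot.) | Cluster 9 (54 prot.) |
|---|---|---|---|---|---|---|---|---|---|---|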
| PSI | 12 | 7.84 | 1.82 | 1.72 | 0 | 0 | 0 | 0 | 11.11 | 9.26 |
| PSII | 18 | 5.88 | 0 | 1.72 | 0 | 5.88 | 0 | 0 | 0 | 24.07 |
| ATPase | 8 | 3.92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.11 |
| Cytb6f | 6 | 1.96 | 0 | 0 | 0 | 0 | 0 | 6.67 | 0 | 7.41 |
| NADHase | 11 | 0 | 0 | 0 | 0 | 64.71 | 0 | 0 | 0 | 0 |
| Phyb | 9 | 1.96 | 1.82 | 12.07 | 0 | 0 | 0 | 0 | 0 | 0 |
| RibProt | 43 | 39.22 | 3.64 | 1.72 | 0 | 5.88 | 0 | 0 | 44.44 | 27.78 |
| RNApol | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.41 |
| CellDiv | 5 | 1.96 | 3.64 | 0 | 16.67 | 0 | 0 | 0 | 0 | 0 |
| HypoProt | 73 | 11.76 | 40 | 31.03 | 16.67 | 5.88 | 83.33 | 86.67 | 11.11 | 9.26 |

C. Recovery of original complexes in the clusters and purity inside the clusters (the five rightmost columns mark the organisms best represented in each cluster)

| Cluster No. | Complexes | −ln(P-value) > 3 | Recovery % | Purity % | HypoProt % | Synecho. | Nongreen algae | Red algae | Green algae | Land plants |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | RibProt | 4.09 | 46.51 | 39.22 | 8.22 | × | × | |||
| 3 | Phyb | 3.12 | 77.78 | 12.07 | 24.66 | × | ||||
| 4 | CellDiv | (2.9) | 40 | 16.67 | 2.74 | × | ||||
| 5 | NADHase | 11.01 | 100 | 64.71 | 1.37 | × | × | |||
| 9 | PSII | 4.72 | 72.22 | 24.07 | 6.85 | × | × | × | × | × |
| Total | All clusters | >3 | 73.05 | 36.45 | ||||||
Cluster analysis of genes as deduced from the scores matrix. The optimal number of clusters was found to be nine. The tables include data on nine well-known chloroplast complexes (see Methods) and on the hypothetical proteins. (A) Percentage of each complex accumulated in each of the nine clusters. (B) Weight, as a percentage, of each complex within each cluster. (C) The most relevant functional units, as detected by statistical significance (P-value < 10⁻³). The P-value was derived assuming a background Poisson distribution (J.J. Lozano and A.R. Ortiz, in prep.). %R (Recovery %) is the percentage of each original complex recovered in the cluster. %P (Purity %) is the purity inside the cluster. %H (HypoProt %) is the percentage of functionally unknown proteins. Groups of genomes maximally represented in each cluster are marked by ×'s on the right of the table.
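
The percentages in parts A and B appear to follow from simple ratios of protein counts: the recovery divides the number of a complex's proteins found in a cluster by the size of the complex, and the purity divides the same number by the size of the cluster. The following is a minimal sketch of that arithmetic, not the authors' code. The function names are hypothetical, and the member counts (13 PSII proteins in cluster 9, 11 NADHase proteins in cluster 5) are an assumption, back-calculated from the tabulated percentages and sizes.

```python
def recovery(members_in_cluster: int, complex_size: int) -> float:
    """Percentage of a complex's proteins that fall in one cluster (part A)."""
    return 100.0 * members_in_cluster / complex_size


def purity(members_in_cluster: int, cluster_size: int) -> float:
    """Percentage of a cluster's proteins that belong to one complex (part B)."""
    return 100.0 * members_in_cluster / cluster_size


# (members in cluster, complex size, cluster size).
# Complex and cluster sizes are read from the table; the member counts are
# assumed values inferred from the reported percentages.
examples = {
    "PSII in cluster 9": (13, 18, 54),
    "NADHase in cluster 5": (11, 11, 17),
}

for label, (k, complex_size, cluster_size) in examples.items():
    print(
        f"{label}: recovery {recovery(k, complex_size):.2f}%, "
        f"purity {purity(k, cluster_size):.2f}%"
    )
```

Under these assumed counts the sketch gives 72.22% and 24.07% for PSII in cluster 9, and 100.00% and 64.71% for NADHase in cluster 5, which agrees with the corresponding entries in parts A, B, and C.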