2002 Apr;12(4):567–583. doi: 10.1101/gr.209402

Table 6.

Gene Clusters Deduced from the X-Matrix, for a Selected Set of Complexes/Functional Units

A. Percentage of each complex accumulated in each one of the nine clusters
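| Complex | No. of prot. | Cluster 1 (51 prot.) | Cluster 2 (55 prot.) | Cluster 3 (58 prot.) | Cluster 4 (12 prot.) | Cluster 5 (17 prot.) | Cluster 6 (6 prot.) | Cluster 7 (15 prot.) | Cluster 8 (9 prot.) | Cluster 9 (54 prot.) |
|---|---|---|---|---|---|---|---|---|---|---|
| PSI | 12 | 33.33 | 8.33 | 8.33 | 0 | 0 | 0 | 0 | 8.33 | 41.67 |
| PSII | 18 | 16.67 | 0 | 5.56 | 0 | 5.56 | 0 | 0 | 0 | 72.22 |
| ATPase | 8 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 75 |
| Cytb6f | 6 | 16.67 | 0 | 0 | 0 | 0 | 0 | 16.67 | 0 | 66.67 |
| NADHase | 11 | 0 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0 |
| Phyb | 9 | 11.11 | 11.11 | 77.78 | 0 | 0 | 0 | 0 | 0 | 0 |
| RibProt | 43 | 46.51 | 4.65 | 2.33 | 0 | 2.33 | 0 | 0 | 9.3 | 34.88 |
| RNApol | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
| CellDiv | 5 | 20 | 40 | 0 | 40 | 0 | 0 | 0 | 0 | 0 |
| HypoProt | 73 | 8.22 | 30.14 | 24.66 | 2.74 | 1.37 | 6.85 | 17.81 | 1.37 | 6.85 |
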
B. Weight (in percentage) of each complex within each of the clusters
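| Complex | No. of prot. | Cluster 1 (51 prot.) | Cluster 2 (55 prot.) | Cluster 3 (58 prot.) | Cluster 4 (12 prot.) | Cluster 5 (17 prot.) | Cluster 6 (6 prot.) | Cluster 7 (15 prot.) | Cluster 8 (9 prot.) | Cluster 9 (54 prot.) |
|---|---|---|---|---|---|---|---|---|---|---|
| PSI | 12 | 7.84 | 1.82 | 1.72 | 0 | 0 | 0 | 0 | 11.11 | 9.26 |
| PSII | 18 | 5.88 | 0 | 1.72 | 0 | 5.88 | 0 | 0 | 0 | 24.07 |
| ATPase | 8 | 3.92 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11.11 |
| Cytb6f | 6 | 1.96 | 0 | 0 | 0 | 0 | 0 | 6.67 | 0 | 7.41 |
| NADHase | 11 | 0 | 0 | 0 | 0 | 64.71 | 0 | 0 | 0 | 0 |
| Phyb | 9 | 1.96 | 1.82 | 12.07 | 0 | 0 | 0 | 0 | 0 | 0 |
| RibProt | 43 | 39.22 | 3.64 | 1.72 | 0 | 5.88 | 0 | 0 | 44.44 | 27.78 |
| RNApol | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.41 |
| CellDiv | 5 | 1.96 | 3.64 | 0 | 16.67 | 0 | 0 | 0 | 0 | 0 |
| HypoProt | 73 | 11.76 | 40 | 31.03 | 16.67 | 5.88 | 83.33 | 86.67 | 11.11 | 9.26 |
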
C. Recovery of original complexes in the clusters and Purity inside the clusters
| Cluster No. | Complex | −ln(P-value) (>3) | Recovery % | Purity % | HypoProt % | Organisms best represented (among Synecho., Nongreen algae, Red algae, Green algae, Land plants) |
|---|---|---|---|---|---|---|
| 1 | RibProt | 4.09 | 46.51 | 39.22 | 8.22 | × × |
| 3 | Phyb | 3.12 | 77.78 | 12.07 | 24.66 | × |
| 4 | CellDiv | (2.9) | 40 | 16.67 | 2.74 | × |
| 5 | NADHase | 11.01 | 100 | 64.71 | 1.37 | × × |
| 9 | PSII | 4.72 | 72.22 | 24.07 | 6.85 | × × × × × |
| Total | All clusters | >3 | 73.05 | 36.45 | | |

Cluster analysis of genes as deduced from the scores matrix. The optimal number of clusters was found to be nine. The tables cover nine well-known chloroplast complexes (see Methods) and the hypothetical proteins. (A) Percentage of each complex accumulated in each of the nine clusters. (B) Weight, in percentage, of each complex within each cluster. (C) The most relevant functional units, as detected by statistical significance (P-value < 10⁻³). The P-value was derived assuming a background Poisson distribution (J.J. Lozano and A.R. Ortiz, in prep.). %R is the percentage of recovery of the original complexes in the clusters, %P is the purity inside the clusters, and %H is the percentage of functionally unknown proteins. Groups of genomes maximally represented in each cluster are marked by ×'s on the right of the table.
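
The three panels are arithmetically linked, which the following minimal Python sketch tries to make explicit. It assumes (this is inferred from the numbers, not stated in the legend) that panel A reports the fraction of a complex's proteins that fall in a cluster, that panel B reports the same protein count divided by the cluster size, and that the recovery and purity of panel C are the panel A and panel B values at the cluster holding most of the complex. The names `CLUSTER_SIZES`, `PANEL_A`, `member_counts`, `weights`, and `best_cluster` are hypothetical and introduced only for illustration.

```python
# Cluster sizes as given in the table header (cluster number -> number of proteins).
CLUSTER_SIZES = {1: 51, 2: 55, 3: 58, 4: 12, 5: 17, 6: 6, 7: 15, 8: 9, 9: 54}

# A few panel A rows: complex -> (number of proteins, % of the complex in clusters 1..9).
PANEL_A = {
    "PSI": (12, [33.33, 8.33, 8.33, 0, 0, 0, 0, 8.33, 41.67]),
    "NADHase": (11, [0, 0, 0, 0, 100, 0, 0, 0, 0]),
    "RibProt": (43, [46.51, 4.65, 2.33, 0, 2.33, 0, 0, 9.3, 34.88]),
}


def member_counts(name):
    """Recover integer protein counts per cluster from the panel A percentages."""
    n_proteins, percentages = PANEL_A[name]
    return [round(n_proteins * p / 100) for p in percentages]


def weights(name):
    """Assumed panel B definition: proteins of the complex / cluster size, in percent."""
    clusters = sorted(CLUSTER_SIZES)
    return [
        round(100 * count / CLUSTER_SIZES[k], 2)
        for k, count in zip(clusters, member_counts(name))
    ]


def best_cluster(name):
    """Assumed panel C definition: cluster with the highest recovery, plus its purity."""
    _, percentages = PANEL_A[name]
    index = max(range(len(percentages)), key=lambda i: percentages[i])
    cluster = sorted(CLUSTER_SIZES)[index]
    return cluster, percentages[index], weights(name)[index]


if __name__ == "__main__":
    for complex_name in PANEL_A:
        print(complex_name, member_counts(complex_name), weights(complex_name))
    print(best_cluster("RibProt"))
```

Under these assumptions the sketch reproduces the panel B rows for PSI, NADHase, and RibProt, and the recovery and purity listed in panel C for RibProt (cluster 1) and NADHase (cluster 5). It does not attempt to reproduce the P-values or the totals row, whose derivation is not described in the legend.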