2002 Apr;12(4):567–583. doi: 10.1101/gr.209402

Table 6.

Gene Clusters Deduced from the X-Matrix, for a Selected Set of Complexes/Functional Units

A. Percentage of each complex accumulated in each one of the nine clusters
Complexes	Cluster No.	1	2	3	4	5	6	7	8	9
	No. of Prot.	51	55	58	12	17	6	15	9	54

PSI	12	33.33	8.33	8.33	0	0	0	0	8.33	41.67
PSII	18	16.67	0	5.56	0	5.56	0	0	0	72.22
ATPase	8	25	0	0	0	0	0	0	0	75
Cytb6f	6	16.67	0	0	0	0	0	16.67	0	66.67
NADHase	11	0	0	0	0	100	0	0	0	0
Phyb	9	11.11	11.11	77.78	0	0	0	0	0	0
RibProt	43	46.51	4.65	2.33	0	2.33	0	0	9.3	34.88
RNApol	4	0	0	0	0	0	0	0	0	100
CellDiv	5	20	40	0	40	0	0	0	0	0
HypoProt	73	8.22	30.14	24.66	2.74	1.37	6.85	17.81	1.37	6.85
B. Weight (in percentage) of each complex within each of the clusters
Complexes	Cluster No.	1	2	3	4	5	6	7	8	9
	No. of Prot.	51	55	58	12	17	6	15	9	54

PSI	12	7.84	1.82	1.72	0	0	0	0	11.11	9.26
PSII	18	5.88	0	1.72	0	5.88	0	0	0	24.07
ATPase	8	3.92	0	0	0	0	0	0	0	11.11
Cytb6f	6	1.96	0	0	0	0	0	6.67	0	7.41
NADHase	11	0	0	0	0	64.71	0	0	0	0
Phyb	9	1.96	1.82	12.07	0	0	0	0	0	0
RibProt	43	39.22	3.64	1.72	0	5.88	0	0	44.44	27.78
RNApol	4	0	0	0	0	0	0	0	0	7.41
CellDiv	5	1.96	3.64	0	16.67	0	0	0	0	0
HypoProt	73	11.76	40	31.03	16.67	5.88	83.33	86.67	11.11	9.26
C. Recovery of original complexes in the clusters and Purity inside the clusters
						Organisms best represented in each cluster
Cluster No.	Complexes	−ln(P-value) >3	Recovery %	Purity %	HypoProt %	*Synecho.*	Nongreen algae	Red algae	Green algae	Land plants

1	RibProt	4.09	46.51	39.22	8.22	×	×
3	Phyb	3.12	77.78	12.07	24.66			×
4	CellDiv	(2.9)	40	16.67	2.74				×
5	NADHase	11.01	100	64.71	1.37	×				×
9	PSII	4.72	72.22	24.07	6.85	×	×	×	×	×
Total	All clusters	>3	73.05	36.45

Cluster analysis of genes as deduced from the scores matrix. The optimal number of clusters was found to be nine. The tables include data on nine well-known chloroplast complexes (see Methods) and the hypothetical proteins. (A) Percentage of each complex accumulated in each of the nine clusters obtained. (B) Weight, in percentage, of each complex within each of the clusters. (C) The most relevant functional units, as detected by statistical significance (P-value < 10⁻³). The P-value was derived assuming a background Poisson distribution (J.J. Lozano and A.R. Ortiz, in prep.). Recovery % (%R) is the percentage of recovery of the original complexes in the clusters. Purity % (%P) is the purity inside the clusters. HypoProt % (%H) is the percentage of functionally unknown proteins. Groups of genomes maximally represented in each cluster are marked by ×'s on the right of the table.
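The percentages in panels A and B appear to come from the same protein count, divided by the complex size (panel A, reported as Recovery % in panel C) or by the cluster size (panel B, reported as Purity % in panel C). The Python sketch below checks that arithmetic for four rows of panel C. It rests on two assumptions: the per-cluster protein counts are not stated in the table and are back-calculated here from the published percentages, and the function names `recovery` and `purity` are hypothetical labels for illustration only.

```python
# Cluster sizes and complex sizes as given in the table headers
# and in the "No. of Prot." column.
CLUSTER_SIZE = {1: 51, 3: 58, 5: 17, 9: 54}
COMPLEX_SIZE = {"RibProt": 43, "Phyb": 9, "NADHase": 11, "PSII": 18}

# ASSUMPTION: these counts are not in the source; they are back-calculated
# from the percentages in panels A and B (e.g. 46.51% of 43 proteins = 20).
COUNT = {
    ("RibProt", 1): 20,
    ("Phyb", 3): 7,
    ("NADHase", 5): 11,
    ("PSII", 9): 13,
}


def recovery(complex_name, cluster):
    """Panel A: percentage of the complex's proteins found in the cluster."""
    n = COUNT[(complex_name, cluster)]
    return round(100 * n / COMPLEX_SIZE[complex_name], 2)


def purity(complex_name, cluster):
    """Panel B: percentage of the cluster's proteins that belong to the complex."""
    n = COUNT[(complex_name, cluster)]
    return round(100 * n / CLUSTER_SIZE[cluster], 2)


if __name__ == "__main__":
    for name, cluster in COUNT:
        print(name, cluster, recovery(name, cluster), purity(name, cluster))
```

With these assumed counts, the computed values agree with the Recovery % and Purity % columns of panel C for clusters 1, 3, 5, and 9, which supports reading the two columns as the same overlap normalized in two different ways.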