Table 1.
Counts of maximal cliques, P3s, and M-P3s in a real test dataset of more than 330,000 sequences
| Identity threshold (%) | Nodes | Average number of cliques per CC | Percent nodes in cliques (in MLvl cliques) | Percent nodes in H triangles (in MLvl H triangles) | Percent nodes in Syn triangles (in MLvl Syn triangles) | Percent nodes in P3 (in MLvl P3) | Percent nodes in M-P3 (in MLvl M-P3) |
|---|---|---|---|---|---|---|---|
| 50 | 295,606 | 35.1 | 46.8 (10.1) | 66.3 (40.4) | 27.4 (18.3) | 36.3 (11.4) | 28.9 (8.5) |
| 70 | 178,558 | 60.8 | 35.8 (2.7) | 59.4 (44.8) | 17.9 (14.9) | 16.7 (2) | 15.2 (1.1) |
| 90 | 104,851 | 0.2 | 36.9 (0.8) | 57.6 (52.4) | 14.1 (12.3) | 12.1 (0.3) | 10.9 (0.4) |
| 99 | 44,592 | 0.2 | 31.8 (0.7) | 55.3 (50.9) | 4.7 (4.2) | 11.5 (0.1) | 3.8 (0.1) |
| Examples | NA | NA | Fig. 1 | * | † | ‡ | § |
Maximal cliques of four or more nodes that were amenable to phylogenetic studies are referred to as cliques. Triangles based on homology edges only (called H triangles) or on the sharing of distinct genetic material (called Syn triangles), as well as P3s, were enumerated using in-house scripts, which are available from Philippe Lopez on request. P3s in which at least one of the two edges was not a homology edge were labeled M-P3s. Triangles, P3s, and cliques containing sequences from both cellular genomes and mobile genetic elements were labeled multilevel (MLvl). The percentage of sequences involved in each pattern was estimated. These percentages do not sum to 100% because a given sequence can simultaneously belong to several distinct patterns, each through a different set of neighbors. A few real examples of these patterns are provided for the network at the 50% identity threshold (GenInfo Identifier [GI] numbers are indicated). CC, connected component.
*Sharing of cyanophycin synthetase by A (Cyanothece sp. ATCC 51142_172037152), B (Nostoc punctiforme PCC 73102_186685868), and C (Gloeobacter violaceus PCC 7421_37523895). Sharing of fosfomycin resistance protein by A (a plasmid of Staphylococcus aureus_170780437), B (a chromosome of Bacillus cereus Q1_222095687), and C (a virus, Bacillus phage Cherry_77020211).
†The bifunctional protein HldE, glycerol-3-phosphate cytidylyltransferase, and ADP-heptose synthase of Thermodesulfovibrio yellowstonii DSM 11347_206890027, Fusobacterium nucleatum subsp. nucleatum ATCC 25586_19704265, and Bdellovibrio bacteriovorus HD100_42524647, respectively, follow this pattern. Late competence protein, S-layer protein, and β-lactamase domain protein of a virus, Geobacillus phage GBSV1_115334647, a chromosome of Bacillus cereus Q1_222096303, and a plasmid of Geobacillus sp. WCH70_239828744, respectively, follow this pattern.
‡Sharing of ammonium transporter Amt by A (Methanobrevibacter ruminantium M1_288560581), B (T. yellowstonii DSM 11347_206890102), and C (Leptospira interrogans serovar Lai str. 56601_294828399). Sharing of 6-phosphogluconate dehydrogenase-like protein by A (a virus, Synechococcus phage syn9_162290189), B (a plasmid of Anabaena variabilis ATCC 29413_75812812), and C (a chromosome of Chloroflexus aurantiacus J-10-fl_163846093).
§B (Bacteroides fragilis YCH46_53714858) shares parts of its bifunctional methionine sulfoxide reductase A/B with the methionine sulfoxide reductase of A (Clostridium acetobutylicum ATCC 824_15893384) and other parts with the methionine sulfoxide reductase B of C (Bordetella pertussis Tohama I_33594433). B (the chromosome of Rickettsia rickettsii str. Iowa_165933859) shares parts of its lysozyme with A (the lysozyme of a virus, Bacteriophage APSE-2_212499717) and other parts with the lysozyme of C (a plasmid of Azospirillum sp. B510_2_288961413).
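The pattern definitions in the table note can be made concrete with a short sketch. The Python code below is a minimal illustration, not the authors' in-house scripts. It assumes an undirected sequence network given as (u, v, label) edge triples, where the label "H" marks a homology edge and any other label (here the hypothetical "S") marks an edge reflecting the sharing of distinct genetic material, and that each node carries a hypothetical origin of "cellular" or "mge". It also treats a P3 as an induced path (the two end nodes are not adjacent), which is our reading rather than a stated definition. The sketch enumerates triangles and P3s, flags H triangles, M-P3s, and multilevel patterns, and computes the percentage of nodes covered by each pattern; Syn triangles and maximal cliques are not handled.

```python
from itertools import combinations

# Assumed edge labels: "H" marks a homology edge; any other label (here the
# hypothetical "S") marks sharing of distinct genetic material.
HOMOLOGY = "H"

# Hypothetical toy network: (node, node, edge label) triples.
EDGES = [
    ("A", "B", "H"), ("B", "C", "H"), ("A", "C", "H"),
    ("C", "D", "H"), ("D", "E", "S"),
]
# Hypothetical node origins: cellular chromosome or mobile genetic element ("mge").
KINDS = {"A": "cellular", "B": "cellular", "C": "cellular", "D": "cellular", "E": "mge"}


def build_adjacency(edges):
    """Map each node to {neighbor: edge label} for an undirected graph."""
    adj = {}
    for u, v, label in edges:
        adj.setdefault(u, {})[v] = label
        adj.setdefault(v, {})[u] = label
    return adj


def enumerate_patterns(adj):
    """Return (triangles, p3s) as tuples of node ids.

    A triangle is three mutually adjacent nodes, reported once in sorted order.
    A P3 (a, b, c) is a path a-b-c centred on b whose ends a and c are NOT
    adjacent (assumed induced path).
    """
    triangles, p3s = [], []
    for b in adj:
        for a, c in combinations(sorted(adj[b]), 2):
            if c in adj[a]:
                # Every triangle is seen from each of its three nodes; keep
                # only the view from its smallest node to avoid duplicates.
                if b < a:
                    triangles.append((b, a, c))
            else:
                p3s.append((a, b, c))
    return triangles, p3s


def is_h_triangle(adj, tri):
    """H triangle: all three edges are homology edges."""
    x, y, z = tri
    return all(adj[u][v] == HOMOLOGY for u, v in ((x, y), (y, z), (x, z)))


def is_m_p3(adj, p3):
    """M-P3: at least one of the two edges is not a homology edge."""
    a, b, c = p3
    return adj[a][b] != HOMOLOGY or adj[b][c] != HOMOLOGY


def is_multilevel(kinds, nodes):
    """MLvl: the pattern mixes cellular and mobile-genetic-element sequences."""
    found = {kinds[n] for n in nodes}
    return "cellular" in found and "mge" in found


def percent_nodes(patterns, n_nodes):
    """Percentage of nodes belonging to at least one of the given patterns."""
    covered = {n for pattern in patterns for n in pattern}
    return 100.0 * len(covered) / n_nodes


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    triangles, p3s = enumerate_patterns(adj)
    h_triangles = [t for t in triangles if is_h_triangle(adj, t)]
    m_p3s = [p for p in p3s if is_m_p3(adj, p)]
    n = len(adj)
    print("H triangles:", h_triangles)
    print("M-P3s:", m_p3s, "MLvl:", [p for p in m_p3s if is_multilevel(KINDS, p)])
    print(f"% nodes in H triangles: {percent_nodes(h_triangles, n):.1f}")
    print(f"% nodes in P3s: {percent_nodes(p3s, n):.1f}")
    print(f"% nodes in M-P3s: {percent_nodes(m_p3s, n):.1f}")
```

On this toy network, 60% of nodes lie in H triangles while 100% lie in P3s, so the per-pattern percentages add up to more than 100%. This mirrors the point made in the table note: a single sequence can belong to several patterns at once through different neighbors.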