TABLE 2.
Relative importance of various factors for genome size in a linear model (LM).
Dataset | Factor | Coefficient | P-value | Relative importance |
All | GC% | 0.086 | <2 × 10⁻¹⁶ | 91.84% |
All | Plasmid | 0.714 | <2 × 10⁻¹⁶ | 5.91% |
All | Virus | 0.454 | <2 × 10⁻¹⁶ | 2.22% |
All | CRISPR | −0.043 | 0.248 | 0.03% |
All | Virus∗plasmid | −0.130 | 0.104 | – |
No plasmids | GC% | 0.087 | <2 × 10⁻¹⁶ | 96.62% |
No plasmids | Virus | 0.454 | <2 × 10⁻¹⁶ | 3.18% |
No plasmids | CRISPR | −0.108 | 0.017 | 0.20% |
No viruses | GC% | 0.087 | <2 × 10⁻¹⁶ | 93.16% |
No viruses | Plasmid | 0.713 | <2 × 10⁻¹⁶ | 6.77% |
No viruses | CRISPR | 0.066 | 0.168 | 0.07% |
The LM formula for the “All” dataset is size ∼ GC% + plasmid + virus + CRISPR + virus∗plasmid; for the “No plasmids” dataset it is size ∼ GC% + virus + CRISPR; and for the “No viruses” dataset it is size ∼ GC% + plasmid + CRISPR. Here, size is the genome size; GC% is the GC content of the host genome; and plasmid, virus, and CRISPR indicate whether the host genome is associated with plasmids, viruses, and CRISPR, respectively. The “Coefficient” column gives the regression coefficients estimated by ordinary least squares. Relative importance was calculated using the “relaimpo” package (Groemping, 2006).
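To make the relative-importance column concrete, the following is a minimal sketch of how such shares can be computed. It is not the authors' code: relaimpo is an R package, whereas this sketch uses Python with only the standard library. It assumes the LMG metric (each predictor's gain in R², averaged over all orders in which predictors can enter the model), which relaimpo provides but which the note above does not name. The data are synthetic with arbitrary effect sizes, and the function names `ols_r2` and `lmg` are hypothetical. Shares are rescaled to sum to 100%, as the percentages in the table do within each dataset.

```python
import itertools
import random


def ols_r2(rows, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the chosen columns."""
    X = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda q: abs(A[q][i]))
        A[i], A[pivot] = A[pivot], A[i]
        for q in range(k):
            if q != i:
                f = A[q][i] / A[i][i]
                A[q] = [a - f * b for a, b in zip(A[q], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
                 for x, yi in zip(X, y))
    return 1.0 - ss_res / ss_tot


def lmg(rows, y, names):
    """Average each predictor's sequential R^2 gain over all entry orders."""
    shares = {name: 0.0 for name in names}
    orders = list(itertools.permutations(range(len(names))))
    for order in orders:
        used, prev = [], 0.0  # the intercept-only model has R^2 = 0
        for c in order:
            used.append(c)
            r2 = ols_r2(rows, y, used)
            shares[names[c]] += (r2 - prev) / len(orders)
            prev = r2
    return shares


# Synthetic illustration only: arbitrary effect sizes, not the study's data.
random.seed(0)
names = ["gc", "plasmid", "virus"]
rows = [[random.uniform(25, 75), random.randint(0, 1), random.randint(0, 1)]
        for _ in range(300)]
y = [0.09 * g + 0.7 * p + 0.45 * v + random.gauss(0, 0.5) for g, p, v in rows]

shares = lmg(rows, y, names)
total = sum(shares.values())  # equals the R^2 of the full model
for name in names:
    print(f"{name}: {100 * shares[name] / total:.2f}%")
```

Because the sequential gains telescope within each ordering, the unnormalized shares sum to the full model's R², which is why dividing by their total yields percentages that add up to 100%. An interaction term such as virus∗plasmid is left out of this sketch for simplicity.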