2019 Oct 16;10:2254. doi: 10.3389/fmicb.2019.02254

TABLE 2.

Relative importance of various factors for genome size in a linear model (LM).

| Dataset | Factor | Coefficient | P-value | Relative importance |
|---|---|---|---|---|
| All | GC% | 0.086 | <2 × 10⁻¹⁶ | 91.84% |
| | Plasmid | 0.714 | <2 × 10⁻¹⁶ | 5.91% |
| | Virus | 0.454 | <2 × 10⁻¹⁶ | 2.22% |
| | CRISPR | −0.043 | 0.248 | 0.03% |
| | Virus × plasmid | −0.130 | 0.104 | – |
| No plasmids | GC% | 0.087 | <2 × 10⁻¹⁶ | 96.62% |
| | Virus | 0.454 | <2 × 10⁻¹⁶ | 3.18% |
| | CRISPR | −0.108 | 0.017 | 0.20% |
| No viruses | GC% | 0.087 | <2 × 10⁻¹⁶ | 93.16% |
| | Plasmid | 0.713 | <2 × 10⁻¹⁶ | 6.77% |
| | CRISPR | 0.066 | 0.168 | 0.07% |

The LM fitted to the “All” dataset is size ∼ GC% + plasmid + virus + CRISPR + virus × plasmid; the model for the “No plasmids” dataset is size ∼ GC% + virus + CRISPR, and the model for the “No viruses” dataset is size ∼ GC% + plasmid + CRISPR. Here, size is the genome size; GC% is the genomic GC content of the host genome; and plasmid, virus, and CRISPR indicate whether the host genome is associated with plasmids, viruses, and CRISPR, respectively. The “Coefficient” column gives regression coefficients estimated by ordinary least squares. Relative importance was calculated with the R package “relaimpo” (Groemping, 2006) and is expressed as each factor’s percentage share of the variance explained by the model.
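The footnote combines two computations: an ordinary least squares fit of genome size on GC% and binary association indicators, and a decomposition of the explained variance into per-factor shares. The sketch below illustrates both in plain Python on synthetic data. It assumes the LMG metric (one of the metrics offered by relaimpo, which averages each predictor's R² gain over all orderings in which predictors can enter the model), normalized to 100%; the footnote does not state which metric was used. The helper names `solve`, `ols`, and `lmg`, the simulated genomes, and their coefficients are hypothetical, and the virus × plasmid interaction term is left out for simplicity. This is not the authors' pipeline.

```python
import random
from itertools import combinations
from math import factorial


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(y, columns):
    """Fit y ~ intercept + columns by OLS; return (R^2, coefficients)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    Xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(X[k][j] * beta[j] for j in range(p)) for k in range(n)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - ss_res / ss_tot, beta


def lmg(y, predictors):
    """LMG importance: R^2 gain of each predictor averaged over all orderings,
    reported as a percentage of the full-model R^2 (hypothetical helper)."""
    names = list(predictors)
    p = len(names)
    cache = {}

    def r2(subset):
        key = frozenset(subset)
        if key not in cache:
            # An intercept-only model explains no variance.
            cache[key] = ols(y, [predictors[nm] for nm in sorted(key)])[0] if key else 0.0
        return cache[key]

    shares = {}
    for nm in names:
        others = [o for o in names if o != nm]
        total = 0.0
        for size in range(p):
            for s in combinations(others, size):
                # Shapley weight = share of orderings in which exactly s precede nm.
                weight = factorial(size) * factorial(p - size - 1) / factorial(p)
                total += weight * (r2(s + (nm,)) - r2(s))
        shares[nm] = total
    full = r2(names)
    return {nm: 100.0 * v / full for nm, v in shares.items()}


# Synthetic genomes (hypothetical values, not data from the study).
rng = random.Random(0)
n = 500
data = {
    "gc": [rng.uniform(25, 75) for _ in range(n)],
    "plasmid": [float(rng.random() < 0.5) for _ in range(n)],
    "virus": [float(rng.random() < 0.5) for _ in range(n)],
    "crispr": [float(rng.random() < 0.5) for _ in range(n)],
}
size = [1.0 + 0.08 * g + 0.7 * pl + 0.45 * v + rng.gauss(0, 0.3)
        for g, pl, v in zip(data["gc"], data["plasmid"], data["virus"])]

names = ["gc", "plasmid", "virus", "crispr"]
r2_full, beta = ols(size, [data[nm] for nm in names])
shares = lmg(size, data)
for nm, coef in zip(names, beta[1:]):
    print(f"{nm:8s} coef={coef:7.3f} importance={shares[nm]:6.2f}%")
```

Because adding a regressor never lowers R² in an OLS model with an intercept, every LMG share is non-negative and the normalized shares sum to 100%, consistent with the percentages within each dataset of the table adding up to 100%.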