Applied and Environmental Microbiology. 2023 Jan 25;89(2):e01727-22. doi: 10.1128/aem.01727-22

Migration Rates on Swim Plates Vary between Escherichia coli Soil Isolates: Differences Are Associated with Variants in Metabolic Genes

Birgit M Prüß a, Shelley M Horne a, Erika Shay Bauer a, Collin Pirner a, Madelyn Schwartz a, Morgan L Petersen a, Peter W Bergholz a,b,c,*
Editor: Pablo Ivan Nikel d
PMCID: PMC9972950  PMID: 36695629

ABSTRACT

This study investigates migration phenotypes of 265 Escherichia coli soil isolates from the Buffalo River basin in Minnesota, USA. Migration rates on semisolid tryptone swim plates ranged from nonmotile to 190% of the migration rate of a highly motile E. coli K-12 strain. The nonmotile isolate, LGE0550, had mutations in flagellar and chemotaxis genes, including two IS3 elements in the flagellin-encoding gene fliC. A genome-wide association study (GWAS), associating the migration rates with genetic variants in specific genes, yielded two metabolic variants (rygD-serA and metR-metE) with previous implications in chemotaxis. As a novel way of confirming GWAS results, we used minimal medium swim plates to confirm the associations. Other variants in metabolic genes and genes that are associated with biofilm were positively or negatively associated with migration rates. A determination of growth phenotypes on Biolog EcoPlates yielded differential growth for the 10 tested isolates on d-malic acid, putrescine, and d-xylose, all of which are important in the soil environment.

IMPORTANCE E. coli is a Gram-negative, facultative anaerobic bacterium whose life cycle includes extra host environments in addition to human, animal, and plant hosts. The bacterium has the genomic capability of being motile. In this context, the significance of this study is severalfold: (i) the great diversity of migration phenotypes that we observed within our isolate collection supports previous (G. NandaKafle, A. A. Christie, S. Vilain, and V. S. Brözel, Front Microbiol 9:762, 2018, https://doi.org/10.3389/fmicb.2018.00762; Y. Somorin, F. Abram, F. Brennan, and C. O’Byrne, Appl Environ Microbiol 82:4628–4640, 2016, https://doi.org/10.1128/AEM.01175-16) ideas of soil promoting phenotypic heterogeneity, (ii) such heterogeneity may facilitate bacterial growth in the many different soil niches, and (iii) such heterogeneity may enable the bacteria to interact with human, animal, and plant hosts.

KEYWORDS: Escherichia coli, soil, Buffalo River basin, migration rate on swim plates, genome-wide association study, variants, motility diversity

INTRODUCTION

Escherichia coli is a Gram-negative, facultative anaerobic, often motile bacterium that is not part of the soil microbiome that is considered beneficial to plants, but whose fecal-oral route of transmission to humans involves frequent passages through extrahost environments such as soil. It has been speculated that much of the E. coli biodiversity that is needed to persist in the many different human or animal host environments (e.g., intestinal, extraintestinal, and urogenital) might be generated and maintained via passage through extrahost environments (1). Soil biotic and abiotic environments are highly heterogeneous over small spatial scales, making soil an ideal environment to select for phenotypic heterogeneity. E. coli typically ends up in soil due to the activity of cattle, other farm animals, or wildlife. From there, it can be internalized into food crop plants, such as mung beans (2), lettuce (3), spinach (4), carrots (4), or tomatoes (4). Because of the heterogeneity of soil and the limited availability of labile nutrient compounds outside the phytosphere, variation in motility and chemotaxis traits may be selected in soil environments.

The difference between the bacterial composition of soil and that of the plant rhizobiome was first observed in 1976 (5). Since that time, chemotaxis toward plant exudates has been recognized as the primary mechanism by which plants recruit bacteria to their rhizobiome (6). The root microbiome or rhizobiome of a plant contributes to plant growth, nutritional value, plant health, and crop yield (7). As part of the symbiosis, microbes benefit from the multitude of carbon metabolites that plants exude into the soil through their roots.

Indeed, plant exudates contain abundant labile nutrients, including monosaccharides, 5-carbon sugars, oligosaccharides, some amino acids, organic acids, phenolic compounds, and vitamins (8). Two examples of exudates that recruit beneficial bacteria to the rhizobiome include Arabidopsis thaliana exuding d-malic acid through the roots for the recruitment of Bacillus subtilis (9) and the growth of Azospirillum on sugars such as d-xylose (10) that promote plant colonization by the bacteria (11). Just as root exudates promote the growth of mutualistic microbes, many commensal and pathogenic microbes also seek out the rhizosphere. Plant pathogens using chemotaxis toward root exudates include (i) the ginseng soft-rot bacterium Pseudomonas gessardii, which performed chemotaxis toward three fractions of ginseng root exudate (12), and (ii) the soil fungus Trichoderma harzianum, which was attracted to stressed tomato plants by compounds they exuded (13).

The soil environment influences the genetic and phenotypic diversity of E. coli and has been discussed as a possible factor behind motility variation in E. coli. As one example, NandaKafle and coworkers demonstrated that E. coli O157:H7 grown in soil-extracted soluble organic matter (SESOM) remained culturable for 24 days, whereas controls grown in Luria-Bertani broth (LB) displayed a death phase (14). Proteome analysis revealed differences in the stress response between the SESOM-grown bacteria and the LB-grown ones as well as increased motility in the SESOM-grown E. coli. It was concluded that soil can serve as a reservoir for E. coli and not just the fecal matter itself (14). As a second example of soil promoting motility heterogeneity, Somorin et al. investigated the RpoS-dependent general stress response in long-term soil-persistent E. coli and concluded that the soil environment does not necessarily select for consistently high (or low) levels of motility, but genetic and phenotypic diversity in response to largely different environmental niches are essential (15). Therefore, the diverse niches encountered during passage through soil may act as diversifying selection for motility phenotypes. Yet, no study has been done to date where a collection of E. coli isolates from different soils has been profiled for their motility diversity.

This new study takes advantage of a previous collection of E. coli isolates that were obtained from soil of the Buffalo River basin in Minnesota, USA, and for which the genomes were sequenced using Illumina short-read sequencing by synthesis (16) (BioProject no. PRJNA416911). The total collection contained 3,329 isolates from 1,428 soil samples that were collected from 143 sites. The isolates comprised 43% phylogroup B1, 19% phylogroup B2, 18% phylogroup D, and 8% phylogroup E (16). Phylogroup B1 has shown little evidence of interaction between genes and the environment, while soil environments and soil populations of phylogroup B2 and E have shown evidence of tighter adaptation to the human host (16–18). For this study, we focused on phenotype and genomic diversity in phylogroup D because (i) the genomic divergence among these isolates is greater than those for other phylogroups and (ii) isolates from this phylogroup have shown evidence of persistence and interactions between genes and the environment in soil over long-term natural studies (17).

We hypothesized that E. coli from soil would exhibit diverse motility phenotypes that could be linked to genome sequence variation using genome-wide association study (GWAS) techniques. GWAS determines associations between sequence variants and phenotypes and can be accomplished using phylogenetic, machine learning, or multiple-regression techniques (19). Here, we used a phylogenetic method to determine associations between motility phenotypes and sequence variants across the clonal phylogeny of our isolate collection. As a measure of motility, we used tryptone swim plates and determined the migration rates of 265 isolates. Migration rates ranged from entirely nonmotile to 190% of the migration rate of an E. coli K-12 strain that is commonly used in research labs and considered highly motile. Since migration on swim plates depends not just on the motility of the tested strain but also on the growth rates (among other factors), we determined growth rates for the same isolates using the same temperature conditions and media. Migration rates did not correlate with growth rates for these isolates. The nonmotile isolate had mutations in flagellar/chemotaxis genes. GWAS determined associations with migration rates on tryptone swim plates and variants in two metabolic genes that had previously recognized functions in motility and/or chemotaxis and were confirmed experimentally. Phenotypic analysis of 10 isolates revealed differential growth on three carbon sources.

RESULTS AND DISCUSSION

Migration rates on swim plates were diverse and varied nearly 40-fold among isolates.

Swim plate assays on tryptone swim plates were performed on 265 phylogroup D soil isolates of E. coli (Fig. 1A); the diameter of the outermost (usually serine) ring was measured over time. Migration rates on tryptone swim plates did not follow a normal distribution and varied substantially, from entirely nonmotile (LGE0550) to a migration rate (LGE2730) that was 192% of the rate of the motile E. coli K-12 strain MC1000. To exclude the possibility that the differences in the migration rate merely reflected differences in the basal metabolic rate, the growth rates of the same isolates were determined in liquid tryptone broth (TB) medium at 34°C (Fig. 1B). Spearman’s rank correlation test yielded an r of −0.36, indicating a weak negative correlation between the migration rate and growth rate for these isolates (Fig. 1C); faster migration was therefore not associated with faster growth. As a result, we concluded that GWAS on the migration rate would not be significantly confounded by overall variation in metabolic rate during growth.
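The rank correlation used above can be illustrated with a short Python sketch. It is a generic implementation of Spearman's coefficient (Pearson correlation computed on average ranks), not the software used in the study, and the five value pairs are hypothetical placeholders rather than measurements from the isolate collection.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five isolates (NOT data from the study).
migration_pct = [0.0, 45.0, 100.0, 141.6, 192.0]  # % of MC1000
growth_rate = [0.90, 1.10, 0.80, 0.95, 0.70]      # log2 OD600 per hour
rho = spearman_rho(migration_pct, growth_rate)    # -0.5 for these values
```

A negative coefficient, as in this toy example, means that isolates ranking higher in migration tend to rank lower in growth rate.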

FIG 1.

Migration and growth rates in tryptone broth. Panel A shows the histogram of the migration rate on tryptone swim plates (millimeters per hour), expressed as percentage of MC1000. Panel B is the histogram of the growth rate in liquid TB, expressed in log2 OD600 growth per hour. Panel C shows the correlation between migration rate and growth from Spearman’s rank order correlation test.

The nonmotile isolate LGE0550 has IS3 element insertions in fliC.

Our migration rate analysis from this study yielded a single isolate that was completely nonmotile, LGE0550. Based on well-established literature (20–24), we hypothesized that this isolate might have mutations in structural flg and fli flagellar genes. To test this hypothesis, we compared the sequences of flagellar/chemotaxis proteins from LGE0550 and E. coli K-12 MG1655 (NC_000913) (25). For additional comparisons, E. coli O17:K52:H18 UMN026 (NC_011751) and E. coli O157:H7 Sakai (NC_002695) (26) were used.

The most dramatic difference between the flagellar proteins of LGE0550 and MG1655 (Table 1) was an insertion of 172 bp into the open reading frame of the flagellin-encoding gene fliC. This insertion contained an IS3 element between bp 44 and 103 as well as another IS3-like element from the IS2 family of transposases between bp 104 and 163. The insertion replaced a stretch of 165 bp of the fliC coding sequence, starting after the codon that encodes amino acid (aa) 183 in FliC. The N-terminal region of FliC contained additional amino acid differences. A second large difference was the absence of the N-terminal portion of Aer, where LGE0550 was missing the first 338 aa relative to the MG1655 sequence (Table 1). In addition, there were many single amino acid substitutions in about two-thirds of the flagellar proteins (Table 1).

TABLE 1.

Flagellar system of the nonmotile isolate LGE0550

Protein Characteristic(s) of:
MG1655 LGE0550a
FlhA 220I 220V
FlhD InsB-5 in flhD promoter Protein identical to MG1655 FlhD
No IS element in flhD promoter
FlhE 29I/50S/68S/97A/99A 29V/50A/68A/97P/99I
FliA 130A 130V
FliC 92V/100T/106E/110S/141N/154N/165Q 92I/100S/106D/110D/141D/154G,
Starting at aa 165, KIDSDTLGLNGFNVNGSGT Starting at aa 165, QIDAKTLGLDGFSVKNNDT
After aa 183 follows 172-bp insertion, including IS3-like element from IS2 family of transposases between bp 104 and 163 and IS3 transposase between bp 44 and 103; 165 bp of fliC replaced
FliF 285H 285Q
FliH 23F/25I/45A/69K 23M/25M/45W/69E
FliJ 61E 61M
FliK 4L/14T/15T 4I/14N/15A
Starting at aa 38, AGETTTDKAA Starting at aa 38, TGEATTDKAT
65I/68V/75N 65V/68L/75D/98N
Starting at aa 98, TTAQTMALAAVADKNTT Starting at aa 98, NTAQTITLAAAADNNTA
185Q/206P/363G 185E/206S/363E
FliL 41D/122N 41E/122K
FliN 36E/37T 36D/37A
FliP 15I 15V
FliR 2L/7E/11S/14N/56P/247I 2M/7D/11F/14S/56S/247M
FliS 43S 43R
FliT 54V 54I
FliZ 53T/63T/168R/178S 53M/63V/168G/178A
FlgA 3I/20T/30N/105A/140I 3A/20A/30T/105V/140V
FlgD 210S 210N
FlgE Starting at aa 166, TVTPFS Starting at aa 166, SVNAFD
Starting at aa 217, NSIAKTATTLEFNANGTLVDGAMANNIA Starting at aa 217, TGTAEPAMTLVFNANGVLTSNPTENIT
154E 154D
FlgF 92A 92D
FlgI 16A 16V
FlgJ 15A 15V
FlgK 438E/491A 438A/491T
FlgL Starting at aa 149, EKGKY Starting at aa 149, ADGEY
246V/316S 246I/316N
FlgM 21T/28S 21I/28T
FlgN 66T 66A
Aer Missing first 338 aa
Tar 131Y/230G/544A/168L/428S/433N/463Q/492V/534V 131H/230S/544T/168Q/428A/433S/463R/492T/524A
Tap 116G/154E 116A/154E
Trg 117P/175N/247L 117S/175S/247M
Starting at aa 537, AGE Starting at aa 537, VDK
MotA 224T 224S
CheA 143V/156S/200P/234T/259N 143M/156M/200T/234P/259S
CheB 120N/237S 120S/237A
CheR 44A 44V
CheW 94L 94F
(a) The following proteins were identical to those of MG1655: FlhB, FlhC, FliD, FliE, FliG, FliI, FliM, FliO, FliQ, FlgB, FlgC, FlgG, FlgH, MotB, CheY, and CheZ.

To assess the importance of single amino acid differences, we compared the FliC protein sequence of MG1655 to that of other E. coli strains. E. coli UMN026 and E. coli O157:H7 Sakai both have the same single amino acid differences from MG1655 in the N-terminal region of the protein that LGE0550 exhibits (92V/I, 106E/D, and 110S/D for MG1655/LGE0550). E. coli UMN026 has an insertion of 27 aa after aa 202, which E. coli O157:H7 Sakai and LGE0550 do not have. Many differences between the three FliC sequences follow after this point (data not shown). IS elements in fliC of nonmotile E. coli have been described previously (27). We therefore consider the two IS elements in the fliC open reading frame to be the most likely explanation for the lack of migration of LGE0550. An analysis of 8 chemotaxis and motor proteins (CheA, CheB, CheR, CheY, CheW, CheZ, MotA, and MotB) in MG1655, UMN026, and LGE1247 (poorly motile) showed no differences among the three strains (data not shown).

Since we did not have any other isolates that were either completely nonmotile or extremely poorly migratory, we did not perform any sequence comparisons between flagellar and chemotaxis proteins of the remaining isolates to those of MG1655. We did, however, compare sequences of the flhD promoter for several of the isolates since insertions or deletions in this promoter had previously been responsible for motility diversity in biofilm (28). The isolates LGE0440, LGE1939, LGE2066, LGE2139, LGE2758, LGE2838, LGE2839, LGE3035, LGE3151, and LGE3213 covered a range of migration rates from 33% to 160% of that of MC1000. None of those isolates carried an IS element upstream of the open reading frame for flhD (data not shown). Likewise, PCR analysis with two reactions that had previously been used for the detection of insertions and deletions within the flhD operon (28, 29) did not indicate any amplicon length polymorphism within the flhD operons of 14 isolates (Table 2). Transformation with the flhD-expressing plasmid pXL27 also failed to improve migration of five poorly motile isolates on tryptone swim plates. After analyzing the flhD operon of 10 isolates computationally and 14 additional isolates experimentally, we were unable to identify any IS element within the respective flhD operons. Mutations in the flhD operon therefore do not appear to be associated with the diversity of E. coli migration rates in the soil isolates.

TABLE 2.

Analysis of the flhD operon of poorly and highly motile E. coli soil isolates

Isolate        pXL27 complementation (a)   PCR1 fragment size (kb) (b)   PCR2 fragment size (kb) (b)   Migration (%)
MC1000 (c)     NA                          1.2                           2                             100
AJW678 (d)     NA                          1.2                           1.3                           45
Poorly motile isolates (60% or less)
 LGE0550       NM                          1.2                           1.3                           0
 LGE1742       NM                          1.2                           1.3                           5.7
 LGE2261       PM                          1.2                           1.3                           22.85
 LGE1603       PM                          1.2                           1.3                           25.7
 LGE3043       NA                          1.2                           1.3                           57.25
 LGE1882       NA                          1.2                           1.3                           60.2
Highly motile isolates (120% or more)
 LGE0072       NA                          1.2                           1.3                           120.5
 LGE0638       NA                          1.2                           1.3                           129.0
 LGE1036       NA                          1.2                           1.3                           141.6
 LGE1174       NA                          1.2                           1.3                           141.6
 LGE0168       NA                          1.2                           1.3                           142.4
 LGE0998       NA                          1.2                           1.3                           149.1
 LGE0212       NA                          1.2                           1.3                           154.4
 LGE0404       NA                          1.2                           1.3                           171.4
(a) NM, nonmotile; PM, partially motile; NA, not applicable.

(b) PCR1 includes 1,199 bp in the absence of insertions or deletions, and PCR2 includes 1,343 bp in the absence of insertions or deletions.

(c) MC1000 contains an IS element between the two forward primer sites for the PCRs. The migration rate on tryptone swim plates of MC1000 that was considered 100% was 8.3 mm/h.

(d) AJW678 does not contain an IS element in the flhD promoter.
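The percentages in Table 2 can be converted back to absolute migration rates with the MC1000 reference value given in footnote c. The Python sketch below shows the arithmetic; the function names are illustrative, and it assumes that the 8.3 mm/h reference applies to every row of the table.

```python
MC1000_RATE_MM_PER_H = 8.3  # reference rate from footnote c of Table 2


def percent_of_reference(rate_mm_per_h, reference=MC1000_RATE_MM_PER_H):
    """Express an absolute migration rate as a percentage of the reference."""
    return 100.0 * rate_mm_per_h / reference


def rate_from_percent(percent, reference=MC1000_RATE_MM_PER_H):
    """Convert a percentage of the reference back to mm/h."""
    return percent / 100.0 * reference


# LGE0404 is listed at 171.4% of MC1000, i.e. roughly 14.2 mm/h
# under the assumption stated above.
lge0404_rate = rate_from_percent(171.4)
```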

GWAS analysis associated migration rates on tryptone swim plates with variants in chemotaxis, metabolic/transporter genes, biofilm-associated genes, and others.

Over the past decade, GWAS has been used to detect associations between single nucleotide polymorphisms (SNPs) and multiple human diseases, including diabetes and cardiovascular disease (19). Bacterial GWAS presents special challenges due to high frequency of mutations, high levels of allelic diversity in populations, varied rates of homologous recombination over evolutionary histories, and frequent gene transfer events (30). The TreeWAS methodology that was developed for bacterial GWAS and was used in this study attempts to overcome those limitations by mapping sequence variants onto the clonal phylogeny of isolates and then comparing those results with a null distribution generated against random permutations of phenotypes across the tree (31) (https://github.com/caitiecollins/treeWAS). To judge whether our sample size was adequate, we compared it with a bacterial GWAS study that identified the iron uptake systems of E. coli as important in extraintestinal virulence of the pathogen (32). In that study, the GWAS was done with a sample size of n = 326 E. coli isolates initially and then a subset of n = 186. Our sample size of n = 265 is within the range of the two sample sizes in the previous study.
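The idea of comparing an observed association with a null distribution can be sketched in Python. This is a plain permutation test that ignores the phylogeny, so it is not the TreeWAS algorithm; the score (absolute difference in mean phenotype between carriers and noncarriers), the function names, and the ten genotype/phenotype pairs are all assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def association_score(genotypes, phenotypes):
    """Absolute difference in mean phenotype between carriers (1) and
    noncarriers (0). Both groups must be non-empty."""
    carriers = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    others = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    return abs(mean(carriers) - mean(others))


def permutation_p_value(genotypes, phenotypes, n_permutations=2000, seed=0):
    """Fraction of phenotype shufflings scoring at least as high as observed."""
    rng = random.Random(seed)
    observed = association_score(genotypes, phenotypes)
    shuffled = list(phenotypes)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if association_score(genotypes, shuffled) >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimate away from exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical toy data (NOT from the study): 1 = alternate allele.
genotypes = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
phenotypes = [150, 142, 160, 129, 171, 57, 60, 23, 45, 26]  # % of MC1000
p_value = permutation_p_value(genotypes, phenotypes)
```

Because closely related isolates share both variants and phenotypes by descent, a test of this simple kind would overstate significance in a clonal population, which is the limitation that phylogeny-aware methods are designed to address.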

We used TreeWAS to identify variants associated with migration rates on tryptone swim plates. E. coli UMN026 (NC_011751) was used as a reference strain. A total of 82 variants were obtained with a Bonferroni-adjusted P value of <0.05, of which 14 were intergenic (IG), 12 were missense (MS) variants, 1 variant had lost the start codon of a gene (LS), and 55 were synonymous variants. Table 3 summarizes the 27 nonsynonymous variants.
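The Bonferroni adjustment mentioned above multiplies each raw P value by the number of tests and caps the result at 1. The Python sketch below shows this generic form; the four raw P values are hypothetical, the number of tests in the study is not stated here, and the way TreeWAS applies the correction internally may differ.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values for four variants (NOT from the study).
raw = [0.00001, 0.0004, 0.02, 0.3]
adjusted = bonferroni_adjust(raw)
significant = [p < 0.05 for p in adjusted]  # [True, True, False, False]
```

The third variant illustrates the cost of the correction: a raw P value of 0.02 no longer passes the 0.05 threshold once it is multiplied by the four tests.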

TABLE 3.

Associations between migration rates on tryptone swim plates and variants in chemotaxis, metabolic, biofilm, and other genes

Scorea Variant typeb Variantc vAFd Variatione Product(s)f GO term(s)g
Variants in chemotaxis genes and metabolic genes related to chemotaxis
 −44.61 IG yqjI-aer 51.8 A→G NA NA
 46.284 IG metR-metE 21.9 T→A metE, 5-methyltetrahydropteroyltriglutamate-homocysteine S-methyltransferase metE: GO:0050667, homocysteine metabolic process; GO:0009086, methionine biosynthetic process; GO:0035999, tetrahydrofolate interconversion
metR, transcriptional regulator of metE metR: GO:0003700, DNA-binding transcription factor activity; GO:0009086, methionine biosynthetic process
 52.402 IG rygD-serA 42.8 A→G serA, d-3-phosphoglycerate dehydrogenase, 2-hydroxyglutarate dehydrogenase serA: GO:0006564, l-serine biosynthetic process; GO:0009070, serine family amino acid biosynthetic process; GO:0047545, 2-hydroxyglutarate dehydrogenase activity
rygD, small RNA upstream of variation rygD: GO:0035194, posttranscriptional gene silencing by RNA
Variants in metabolic and transporter genes
 41.529 IG bcp-hyfA 25 C→T hyfA, hydrogenase 4, 4Fe-4S subunit; bcp, located upstream of variation hyfA: GO:0006210, thymine catabolic process; GO:0006212, uracil catabolic process
 48.167 MS sgcQ 50.7 261G→C Putative nucleoside triphosphatase; KpLE2 phage-like element GO:0003824, catalytic activity, molecular function
 45.667 MS hypF 74.1 2168C→T Carbamoyl phosphate phosphatase and maturation protein for [NiFe] hydrogenases GO:0046944, protein carbamoylation; GO:0051604, protein maturation
 42.994 MS prkB 39.6 744G→T Phosphoribulokinase GO:0005975, carbohydrate metabolic process; GO:0015937, coenzyme A biosynthetic process
 −44.61 IG yjjV-yjjW 51.8 A→G NA NA
 41.349 MS ECUMN_2447 60.8 501C→G Molybdate metabolism protein
 44.751 LS eco 34 2T→G Ecotin (serine protease inhibitors) GO:0004867, serine-type endopeptidase inhibitor activity
 40.697 IG ftsK-lolA 33.8 G→T lolA, outer membrane lipoprotein carrier protein; ftsK, located upstream of variation lolA: GO:0072323, chaperone-mediated protein transport across periplasmic space; GO:0044874, lipoprotein localization to outer membrane; GO:0042953, lipoprotein transport
 40.697 IG ytfQ-ytfR 33.8 G→T ytfR, putative sugar ABC transporter ATP-binding protein ytfR: GO:0016887, ATPase activity; GO:0005524, ATP binding
ytfQ, located upstream of variation
 48.303 MS mhpT 55.4 336T→G Putative 3-hydroxyphenylpropionic transporter MhpT GO:0015293, symporter activity, molecular function
 42.712 IG ECUMN_tRNA29-amiC 33.1 C→G N-Acetylmuramoyl-l-alanine amidase GO:0071555, cell wall organization; GO:0043093, FtsZ-dependent cytokinesis; GO:0009253, peptidoglycan catabolic process
 45.481 IG rseX-ECUMN_2256 50 C→A ECUMN_2256, outer membrane porin OmpN ECUMN_2256: GO:0034219, carbohydrate transmembrane transport; GO:0006811, ion transport
Variants in biofilm genes and metabolic genes related to biofilm
 40.697 IG wcaA-wzc 33.8 G→T wcaA, tyrosine kinase
wzc, located upstream of variation
 45.366 MS yfaL 45 556A→G Adhesin and autotransporter
 43.252 MS entF 54.7 2215G→A Enterobactin synthase subunit F
 41.000 MS yahK 57.9 49T→C Putative oxidoreductase
Variants in other genes of other functions and hypothetical proteins
 52.650 MS ygeH 63.3 329G→T Transcriptional regulator GO:0000160, phosphorelay signal transduction system; GO:0006355, regulation of transcription, DNA templated
 40.830 MS ECUMN_1821 43.5 203T→C Putative tail length tape measure protein
 −40.719 IG insC-ECUMN_3360 19 G→T Transposase ORF A (fragment), IS3 family GO:0006313, transposition, DNA mediated
 47.604 MS ydfU 49.3 1034C→T Hypothetical protein
 47.031 MS ECUMN_0441 62.6 514T→C Hypothetical protein
 43.493 MS ECUMN_4471 39.6 1113C→G Hypothetical protein
 43.780 IG xylR-bax 32 G→A NA NA
 44.829 IG ygeH-ygeI 63.3 C→A ygeI, hypothetical protein
ygeH, located upstream of variation
(a) A positive score means that the variant (mutation) causes the isolate to move up in migration rate by the indicated number of places (ranked motility). A negative score means that the variant moves down in migration rate by the indicated number of places.

(b) Variant types: IG, intergenic (the mutation is an SNP in the region between two genes); MS, missense (the mutation is a single nucleotide change in the open reading frame and causes an amino acid change); LS, the mutation results in a loss of the start site.

(c) Variant, the designation of the variant.

(d) vAF, percentage of the isolates that carry the variant.

(e) Variation, the single nucleotide exchange within the intergenic region, the open reading frame of a gene, or at the start of the gene/protein.

(f) For missense variants, the name of the gene product is shown. For intergenic variants, the protein function is indicated for the 3′ gene when the two genes are oriented →→. This was the case for bcp-hyfA, ftsK-lolA, ytfQ-ytfR, wcaA-wzc, and ygeH-ygeI. The protein is indicated for both genes when the two genes are oriented ←→. This was the case for metR-metE. When the two genes are oriented →←, the function is indicated as NA (not applicable). This was the case for yqjI-aer, yjjV-yjjW, and xylR-bax. Gene orientations were taken from RegulonDB (http://regulondb.ccg.unam.mx). References that are indicative of protein function are provided in the text.

(g) GO terms from UniProt (https://www.uniprot.org) were used for the respective gene(s). Whenever possible, we used biological process terms. In some cases, there were not any biological process terms associated with the gene. In those cases, we used molecular function terms. We were unable to retrieve GO terms for the ECUMN-designated genes, unless there was a specific gene indicated.

Since mutations in structural flagellar or chemotaxis genes (designations fli and flg or che and mot, respectively) tend to lead to a complete or almost complete lack of motility, and we only had one completely nonmotile isolate and one very poorly motile one, we did not expect variants in structural flagellar or chemotaxis genes. The SNP in the yqjI-aer intergenic region occurs between the 3′ ends of both genes. We consider it unlikely that variations at the 3′ end of a gene would change transcription or translation of either of the two genes. Two other intergenic variants were associated with migration rates and were deemed to be candidates for affecting gene expression due to previous genetic association of the genes with motility (Table 3).

The first of these intergenic variants is the metR-metE variant, with a score of 46.284 at a sample size of 147, of which 21.9% carried the alternate sequence.

The metE gene encodes 5-methyltetrahydropteroyl triglutamate-homocysteine S-methyltransferase (33), which is involved in methionine biosynthesis. The metR gene encodes the transcriptional activator of metE (34). Methionine was described by Julius Adler as one of the amino acids toward which E. coli can perform chemotaxis through the aspartate receptor Tar (35). From methionine, the enzyme S-adenosylmethionine (SAM) synthetase (encoded by metK) leads to the production of SAM (36), which serves as donor of the methyl group for the methylation of the chemoreceptors and methyl-accepting chemotaxis proteins (MCPs) by CheR (37, 38). The positive association of the migration rate on tryptone swim plates with the metR-metE variant might be explained by the requirement of methionine and SAM in the adaptation response for chemotaxis. This detected association between migration rate and variant merits further experimentation because mutations in metE-metR are known to affect uropathogenic traits and have also been shown to be acquired through horizontal transfer of regulatory elements (39).

The second locus is the rygD-serA variant, with a score of 52.402 at a sample size of 196 genomes, of which 42.8% carried the mutation, suggesting the variant has a positive effect on the migration rate. The serA gene encodes an enzyme with d-3-phosphoglycerate dehydrogenase (40) and 2-hydroxyglutarate dehydrogenase activity (41). This association is consistent with a long-established concept underlying the tryptone swim plates that we used to determine migration rates. On semisolid agar plates, bacteria form concentric rings that follow dynamic equations for the distribution of cell density and chemoattractant concentration (42). When swimming through mixed amino acids, bacteria follow gradients of specific amino acids (43). Serine is of interest here, because the outermost of the rings is the serine ring, where bacteria migrate toward serine, which is facilitated by the serine receptor Tsr and characterized by a decline of serine across this ring (44). A recent study determined that inactivating the serA gene of Pseudomonas aeruginosa resulted in a decrease in swimming and swarming motility (45).

Twelve additional metabolic and transporter variants were associated with migration on tryptone swim plates in a positive or negative manner. These are bcp-hyfA (hydrogenase) (46), sgcQ (putative nucleoside triphosphatase), hypF (carbamoyl phosphate phosphatase) (47), prkB (phosphoribulokinase), ECUMN_2447 (molybdate metabolism protein), eco (ecotin) (48), ftsK-lolA (lipoprotein transport membrane) (49), ytfQ-ytfR (putative ABC transporter), mhpT [transporter for 3-(3-hydroxyphenyl) propionate] (50), ECUMN_tRNA29-amiC (N-acetylmuramoyl-l-alanine amidase, cell division) (51), and rseX-ECUMN_2256 (outer membrane porin OmpN).

This study identified four variants that were positively associated with the migration rate on tryptone swim plates, and those loci had a previously demonstrated involvement in biofilm (Table 3). These were as follows: (i) the variant in the wcaA-wzc intergenic region with a score of 40.7 at a sample size of 150, of which 34% carried the alternate sequence; (ii) the missense variant in yfaL, encoding an adhesin that enhances intracellular biofilm formation in uropathogenesis (52), with a score of 45.4 at a sample size of 145, of which 45% carried the variation; (iii) the missense variant in entF, encoding a subunit of enterobactin synthase (53), with a score of 43.3 at a sample size of 160, with an alternate allele frequency of 54.7%; and (iv) the missense variant in yahK (which encodes aldehyde reductase) (54), with a score of 41.5 at a sample size of 180, with an alternate allele frequency of 57.9%. The gene products of yfaL, entB, and yahK have a demonstrated involvement in biofilm formation from genetic or transcriptomic methods (55). The remaining eight variants were in genes that encoded one transcriptional regulator, a putative tail length tape measure protein, a transposase, and four hypothetical proteins.

Confirmation of the associations between migration on tryptone swim plates and variants in metR-metE and rygD-serA.

Our method of confirming GWAS data is novel according to our literature search. Published GWAS confirmation techniques are often computational (e.g., principal-component analysis [56] or random forest model [57]), but they can also be experimental (e.g., gene expression analysis [58]). Our approach is based upon previous knowledge about specific genes, which enables us to perform physiologically relevant experiments specific to each variant.

To test the hypothesis that the metR-metE variation might increase the ability of the bacteria to migrate toward any chemoattractant, we tested the migratory ability of 20 isolates with the metR-metE variant sequence and 20 isolates with the reference genome sequence on minimal succinate swim plates (Fig. 2). The migration rate of the bacteria that carried the metR-metE intergenic variation was superior to that of bacteria that had the UMN026 sequence in the metR-metE region. The average migration rate for the isolates with the UMN026 sequence in the metR-metE intergenic region was 0.91 ± 0.7 mm/h; the average migration rate for the metR-metE variants was 1.51 ± 0.83 mm/h (P = 0.02; df = 19). This indicates statistical significance of the difference. Altogether, the data support the hypothesis that the metR-metE variation increases the migratory ability of the bacteria on swim plates, at least toward succinate as chemoattractant.
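The size of the difference between the two groups can be checked from the reported summary statistics alone. The Python sketch below computes a Welch t statistic and its approximate degrees of freedom, treating the ± values as standard deviations of 20 isolates per group. The text does not state which test was used, and the reported df of 19 suggests a different design, so this is an illustration of the arithmetic rather than a reproduction of the published analysis.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom
    computed from group means, standard deviations, and sizes."""
    se_a2 = sd_a ** 2 / n_a  # squared standard error of group A
    se_b2 = sd_b ** 2 / n_b  # squared standard error of group B
    t = (mean_b - mean_a) / math.sqrt(se_a2 + se_b2)
    df = (se_a2 + se_b2) ** 2 / (
        se_a2 ** 2 / (n_a - 1) + se_b2 ** 2 / (n_b - 1)
    )
    return t, df


# Reported values: UMN026 sequence 0.91 +/- 0.7 mm/h, variant 1.51 +/- 0.83 mm/h.
# Group sizes of 20 and the reading of +/- as SD are assumptions.
t_stat, df = welch_t(0.91, 0.70, 20, 1.51, 0.83, 20)
# t_stat is about 2.47 and df is about 37 under these assumptions.
```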

FIG 2.

Difference in migration rates between 20 isolates containing the reference sequence and 20 isolates containing alternate genome sequence variants in the intergenic metR-metE region: Confirmation of the association of migration with the metR-metE variant. The migration rate on minimal succinate swim plates is presented in millimeters per hour for the two groups of isolates. Migration rates are expressed as the average and standard deviation from four replicate experiments. The UMN026 group of isolates contains the UMN026 sequence in the metR-metE region. The “Alternate” group is the variant group of isolates.

To test the hypothesis that the rygD-serA variation may impact the ability of the bacteria to migrate toward serine, we determined the migration rates for 20 isolates with rygD-serA alternate variants and 20 isolates with the reference genome sequence on minimal swim plates that contained serine as the chemoattractant. Figure 3 shows the difference in migration rate (millimeters per hour) between the swim plates that contained serine and those that did not contain serine for the two groups of isolates. Interestingly, the isolates carrying the alternate rygD-serA variant showed a bimodal distribution of migration rates, with seven isolates migrating between 0 and 1.5 mm/h and 10 isolates having migration rates of >2.5 mm/h. The group of isolates containing the UMN026 sequence in this region did not show this bimodality: in fact, all seven poorly motile isolates were part of the variant group. In synthesizing this with the overall association of the alternate rygD-serA variant with increased migration rates, we hypothesize that the rygD-serA variant contributes to the ability of the bacteria to migrate toward serine. It is highly likely that this locus is linked to other loci that may be required to enhance the phenotype.

FIG 3

Distribution of migration rates among 20 isolates containing the reference sequence and 20 isolates containing alternate variant sequences in the intergenic rygD-serA region: Confirmation of the association of migration with the rygD-serA variant. The difference in migration rate (millimeters per hour) between the minimal succinate swim plates that contained serine and those that did not contain serine is presented for the two groups of isolates (four replicates). The UMN026 group of isolates contains the UMN026 sequence in the rygD-serA region. The “Alternate” group is the variant group of isolates.

Phenotype microarrays revealed differential growth of 10 isolates on d-xylose, putrescine, and d-malic acid.

In our quest to determine the underlying genetic reasons for the phenotypic differences in the migration rates on tryptone swim plates, we have so far found multiple mutations in flagellar/chemotaxis genes and two variants in metabolic genes that already had a published connection to motility and chemotaxis. However, the majority of the GWAS variants were in metabolic genes that did not have a previously published connection to motility and chemotaxis. To further investigate the connection between metabolism and migration, we determined growth phenotypes of 10 isolates with the EcoPlate from the Biolog phenotype microarray system. The 10 isolates were selected based on their broad range of growth phenotypes (log10 optical density at 600 nm [OD600] of 0.096 to 0.168 h^−1), biofilm amounts (OD600 of 0.086 to 1.85), and migration rates on tryptone swim plates. Growth and biofilm data were taken from the previous study (59); migration rates on tryptone swim plates at 34°C and 15°C are summarized in Fig. 4. In addition to 34°C, we chose a second temperature of 15°C because this was the average soil temperature at which the isolates had been collected. The migration rate at 34°C was fastest at 160% of that of MC1000 for isolate LGE2066 and slowest at 60% for LGE1939. Isolates LGE2066 and LGE1939 also exhibited the fastest and slowest migration rates, respectively, at 15°C, at 586% and 104% of that of MC1000. The remaining isolates showed different patterns of migration rates at the two temperatures. We do not have an explanation for the low migration rate at 15°C for MC1000 but believe that the soil isolates may be better adapted to 15°C than MC1000.

FIG 4

Migration of 10 isolates on tryptone swim plates. The experiment was done at 34°C (blue bars) and 15°C (orange bars). Migration rates are expressed as percentages of the migration rate of MC1000; the average and standard deviations were calculated across 4 replicate experiments.

The EcoPlate contains 31 carbon sources and has been used for community analysis of soil bacteria (60, 61). All 10 tested isolates were able to respire/grow on d-galactonic acid, pyruvic acid methyl ester, d-galacturonic acid, d-mannitol, N-acetyl-d-glucosamine, glucose-1-phosphate, and α-d-lactose (Fig. 5A). Maximum respiration/growth was seen for LGE2066 at almost 600% of the average that was calculated across all carbon sources for the same isolate. Pathways for the metabolism of these carbon sources were determined with the Kyoto Encyclopedia of Genes and Genomes (KEGG; https://www.genome.jp/kegg). While we could not find pyruvic acid methyl ester in the KEGG database, the remaining six carbon sources all feed into the upper half of glycolysis, the latest at the level of glyceraldehyde-3-phosphate. The three carbon sources on which the 10 isolates exhibited differential respiration/growth are d-malic acid, putrescine, and d-xylose. Glycolysis therefore appears to be an essential pathway for growth of E. coli in soil, whereas metabolic diversity can be beneficial when it comes to growth on other carbon sources. Note that none of the remaining 21 carbon sources provided on the EcoPlates permitted any of the E. coli isolates to respire and grow (data not shown).

FIG 5

Phenotype microarrays for 10 isolates. Panel A shows the growth rates in OD600 per day, normalized to the average that was calculated for all carbon sources and expressed as a percentage of this average. Averages and standard deviations for each isolate and carbon source were calculated across four to five replicate experiments. Panel B shows the hierarchical clustering that was performed on the mean values of the above data and with the distance of all possible pairs of isolate and carbon source computed. The heat map is a two-dimensional graphical representation of the data, where the mean values are mapped to colors across a range.

A cluster analysis was performed with the growth data from the EcoPlate, and a heat map was produced. Mean values were mapped to colors across a range from red, indicating high growth, to blue, indicating poor growth (Fig. 5B). The carbon sources from columns 4 through 6 formed a cluster, where LGE0417, LGE2139, and LGE2758 showed good growth, followed by LGE1939, LGE2066, LGE2839, and LGE2838, which showed poor growth, and LGE3035 and LGE3213, which showed mixed growth. The carbon sources that form this cluster are d-malic acid, putrescine, and d-xylose, which supports our initial observation. The correlation between growth/respiration on these three carbon sources and migration rates on tryptone swim plates (Fig. 1A) was calculated with Pearson’s correlation. The correlation coefficient (r) for growth on d-malic acid and migration rates on tryptone swim plates was 0.16, with a P value of 0.33. The r value for growth on putrescine and migration rates on tryptone swim plates was −0.36, with a P value of 0.15. The r value for growth on d-xylose and migration rates on tryptone swim plates was −0.57, with a P value of 0.043. Using a P value below 0.05 as a cutoff, we believe there may be a moderate negative correlation between growth on d-xylose and the migration rate on tryptone swim plates. Note that there was a variant in xylR-bax that was positively associated with the migration rates on tryptone swim plates in the GWAS analysis (Table 3). However, the xylR and bax genes are oriented such that the variation is located between the two 3′ ends of the genes. It is questionable whether such variations can impact gene expression.

d-Malic acid, putrescine, and d-xylose are important compounds for many bacteria and plants in the soil environment. Roots of A. thaliana are capable of exuding d-malic acid, which helps beneficial bacteria such as Bacillus subtilis to be recruited to the rhizosphere (9). For E. coli, fumarate and, to a lesser extent, malate have been shown to act as switching factors for the flagellar motor, possibly acting as a connection between metabolism and chemotaxis (62). Chemotaxis toward d-malic acid is dependent on the aerotaxis sensor Aer (63). As one example of the beneficial effect of putrescine on plants, the addition of putrescine and Rhizobacterium to two wheat varieties in sandy soil improved drought tolerance (64). In a second example, putrescine enhanced growth, the metabolic state, and the ability to cope with manganese stress in the mustard plant Brassica juncea (65). An older study by Julius Adler (44) and a microfluidic device study from recent years (66) showed that E. coli can use d-xylose as a chemoattractant, although at a lower efficiency than what was observed for other carbon sources (e.g., d-mannose). Azospirillum is a nitrogen fixer for grass crops, such as corn or rice, and promotes plant growth (67). This bacterium grows well on sugars and organic acids (e.g., d-xylose, d-malic acid) (10) and uses chemotaxis for plant colonization (11). Whether E. coli can use d-malic acid, putrescine, and/or d-xylose to perform chemotaxis toward plant roots will need further experimentation. However, our data are consistent with the idea that growth on at least d-xylose may be associated with migration toward chemoattractants. At this point, the possibility that E. coli “hijacks” the chemotaxis mechanism by which plants recruit their plant-beneficial root microbiome from the soil cannot be excluded, and we plan to investigate the chemotactic behavior of soil-living bacteria toward plant-exuded metabolic intermediates in more detail in the future.

Conclusion.

In conclusion, our study found a large phenotypic heterogeneity among 265 E. coli soil isolates, when migration rates were determined on tryptone swim plates. Underlying genetic reasons for this diversity are similarly diverse: (i) mutations in flagellar and chemotaxis genes, (ii) variations in metabolic genes relating to serine and methionine biosynthesis, and (iii) possibly growth on d-xylose. A novel confirmation technique for GWAS (or TreeWAS) was introduced that was successful in the cases of the metR-metE and rygD-serA variants. Based on this wide range of phenotypic and genotypic diversity, we conclude that either soil is selectively neutral with regard to migratory phenotypes or diverse soil environments (microniches) might exert diverse selective effects on migration rates of E. coli. These effects could potentially be very local, resulting in the observed phenotypic heterogeneity across the entire collection of isolates.

MATERIALS AND METHODS

E. coli isolates.

A previous publication by the P. Bergholz laboratory described the collection of 3,329 isolates of Escherichia coli, which had been obtained from 1,430 soil samples collected from the Buffalo River basin in Minnesota and North Dakota (16). A total of 590 of these isolates were from phylogroup D. For this study, we used 265 of the phylogroup D isolates, for which a previous GWAS had been performed, associating biofilm phenotypes with genetic variations in specific genes (59). Genomes were sequenced to an average coverage of 92×, with a median of 91×, minimum of 37×, and maximum of 224×. The median Q30 base percentage was 83%, with a minimum of 78% and a maximum of 87% (59). Genome sequencing data were available for all these isolates (BioProject no. PRJNA416911) (16).

Swim plates.

To assess the motility of the 265 isolates, we used the swim assay described by Wolfe and Berg (43), which permits the measurement of a swim ring across a semisolid agar plate. A mixed amino acid bacterial growth medium was used that was designated tryptone broth (TB) (1% tryptone, 0.5% NaCl) and also contained 0.3% agar. Bacteria were inoculated into the center of this tryptone swim plate with an inoculation loop from a single colony that had been grown overnight on a Luria-Bertani broth agar plate (LB) (1% tryptone broth, 0.5% NaCl, 1.5% agar). Inoculated tryptone swim plates were incubated at 34°C, and the diameter of the outer serine ring was measured at 1- to 2-h intervals. Averages and standard deviations were calculated from at least three biological replicates that were obtained from different colonies of the same isolate, and migration rates were calculated as millimeters per hour. A moderately motile E. coli K-12 strain, MC1000 (68), was used to standardize the data set. Final migration rates on tryptone swim plates were expressed as a percentage of the migration rate of MC1000.
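The calculation described above can be sketched as follows. This is a minimal illustration, not the procedure used in this study: it assumes that the migration rate is taken as the least-squares slope of ring diameter against time, and the function names and readings are hypothetical.

```python
def migration_rate(times_h, diameters_mm):
    """Least-squares slope of ring diameter (mm) against time (h).

    Assumption: the rate is estimated as a regression slope; the text only
    states that rates were calculated as millimeters per hour.
    """
    n = len(times_h)
    if n < 2 or n != len(diameters_mm):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times_h) / n
    mean_d = sum(diameters_mm) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(times_h, diameters_mm))
    return sxy / sxx


def percent_of_reference(isolate_rate, reference_rate):
    """Express an isolate's rate as a percentage of the reference strain's rate."""
    return 100.0 * isolate_rate / reference_rate


# Hypothetical readings taken at 2-h intervals (not data from this study).
times = [2.0, 4.0, 6.0, 8.0]
mc1000_diameters = [4.0, 8.0, 12.0, 16.0]
isolate_diameters = [5.0, 11.0, 17.0, 23.0]

mc1000_rate = migration_rate(times, mc1000_diameters)
isolate_rate = migration_rate(times, isolate_diameters)
relative_rate = percent_of_reference(isolate_rate, mc1000_rate)
```

With these illustrative readings, the isolate migrates at 3 mm/h against 2 mm/h for the reference, i.e., 150% of the reference rate.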

For 10 isolates (LGE2839, LGE1939, LGE2066, LGE2838, LGE3035, LGE3213, LGE0417, LGE2139, LGE3151, and LGE2758), the migration rate on tryptone swim plates was determined at 15°C. The experiment was performed in four biological replicates; data analysis was performed as described above.

A modification of the tryptone swim plate was used to detect the ability to migrate toward individual chemoattractants in validation of GWAS data. This minimal medium swim plate consisted of 5 g/L NaCl, 10 mM K2HPO4, 10 mM KH2PO4, 1 mM (NH4)2SO4, 0.1 M MgSO4, 30 mM succinic acid, 0.1% thiamine, 0.5% leucine, 0.5% threonine, and 0.3% agar. Swim plates were incubated at 34°C, and the formation of the swim ring was observed for 28.5 h. For the metR-metE variant, the migration rate (millimeters per hour) was determined in four replicates on minimal swim plates without further supplements; bacteria were given succinic acid as chemoattractant and carbon source. For the rygD-serA variant, the minimal swim plate contained 1 mM serine as the chemoattractant; in four replicate experiments, the difference in migration rate (millimeters per hour) of each isolate between plates that contained serine and plates that did not was determined. Data are presented as violin plots. Statistical analysis of the data was done with a two-tailed unpaired t test when the data were normally distributed (or a Wilcoxon test when a nonparametric test was warranted), comparing the migration rates of 20 isolates that contained the variant to those of 20 isolates that did not contain the variant. A P value of 0.05 was used as a cutoff to define statistical significance of the difference. Variants that were used for these experiments are included in Table 4.
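As an illustration of the unpaired comparison, the sketch below computes a pooled-variance Student t statistic from group summary values. It is a hedged example rather than the analysis code of this study: the function name is hypothetical, the inputs are the means and standard deviations reported for the metR-metE comparison (treated here as between-isolate standard deviations, which is an assumption), and the degrees of freedom follow from the pooled-variance assumption and need not match those of the original analysis.

```python
import math


def pooled_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Student's unpaired t statistic and degrees of freedom from summary values.

    Assumes equal variances in the two groups (pooled variance).
    """
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Summary values reported for the metR-metE comparison (mm/h, 20 isolates each).
t_stat, df = pooled_t_from_summary(1.51, 0.83, 20, 0.91, 0.70, 20)
```

Converting the t statistic into a two-tailed P value requires the cumulative distribution function of the t distribution, which a statistics package provides; it is omitted here to keep the sketch free of dependencies.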

TABLE 4.

Variants for the experiments shown in Fig. 2 and 3

Variant | Swim plate | Figure illustrating expt | Isolates with variant^a | Isolates without variant^a
metR-metE | Minimal succinate | Fig. 2 | LGE0171, LGE0255, LGE0411, LGE0550, LGE0628, LGE0747, LGE1033, LGE1250, LGE1512, LGE1554, LGE1658, LGE2102, LGE2139, LGE2425, LGE2722, LGE2771, LGE2839, LGE2970, LGE3060, LGE3320 | LGE0072, LGE0108, LGE0360, LGE0483, LGE0674, LGE0733, LGE1160, LGE1483, LGE1525, LGE1541, LGE1603, LGE1746, LGE1848, LGE1900, LGE2165, LGE2404, LGE2727, LGE2795, LGE3035, LGE3151
rygD-serA | Minimal succinate ± serine | Fig. 3 | LGE0072, LGE0111, LGE0360, LGE0404, LGE0626, LGE0746, LGE0963, LGE1080, LGE1205, LGE1250, LGE1448, LGE1510, LGE1554, LGE1660, LGE1892, LGE2134, LGE2165, LGS2424, LGE2771, LGE3070 | LGE0221, LGE0242, LGE0340, LGE0395, LGE0457, LGE0560, LGE0604, LGE0832, LGE0911, LGE1110, LGE1459, LGE1513, LGE1605, LGE1640, LGE1722, LGE1808, LGE1947, LGE2075, LGE2400, LGE3210

^a For more information on the isolates, please see BioProject no. PRJNA416911 (16).

Growth of the isolates.

Cultures were grown at 34°C in MOPS (morpholinepropanesulfonic acid) minimal medium (69) supplemented with 0.5% (wt/vol) Casamino Acids. Isolates were maintained as freezer stocks at –80°C and cultivated on LB agar plates prior to each experiment. Cultures were transferred three times into growth medium and incubated at 34°C to allow for acclimatization to the assay conditions. Growth curves were collected on 96-well plates by measuring optical density at 600 nm (OD600) at 20-min intervals until the stationary phase was reached. R/nlsMicrobio was used to determine growth rates (log OD600 per hour) and yields (log OD600) by fitting a Baranyi-Roberts logistic growth model to the OD600 time curve data. Spearman’s rank order correlation test was used to test for associations between migration rate on tryptone swim plates and growth data at 34°C.

Whole-genome sequence alignments.

The genome sequences of all 265 isolates are associated with BioProject no. PRJNA416911 (16). The sequences for select isolates were downloaded and imported into Geneious 15.R.11.1.5. The genome sequences for E. coli O17:K52:H18 UMN026 (NC_011751), E. coli O157:H7 Sakai (NC_002695) (26), and MG1655 (NC_000913) (25) were directly imported into Geneious from NCBI. Whole-genome alignment was performed with Mauve, and multiple-sequence alignment was done with ClustalW.

Analysis of the flhD operon.

A selection of six isolates that exhibited a migration on tryptone swim plates of less than 60% of that of MC1000 and eight isolates whose migration rate was higher than 120% of that of MC1000 was subjected to analysis of the flhD operon. LGE0550 (nonmotile), LGE1603 (25.7% of the MC1000 migration rate), LGE1742 (5.7%), and LGE2261 (22.85%) were transformed with the flhD-expressing plasmid pXL27 (70). Migration of the transformants was tested with at least 3 colonies per transformation on tryptone swim plates. PCR1 and PCR2 were performed for all 14 isolates as described previously (29). PCR1 yields a fragment of 1,199 bp in the absence of insertions or deletions, and PCR2 yields a fragment of 1,343 bp. The positive-control strain MC1000 contains a 768-bp IS1 element between the forward primers for the two PCRs, increasing the size of the PCR2 fragment. The negative-control strain AJW678 (71) does not contain an insertion or deletion within the flhD promoter (72).

Variant calling was performed using kSNP v.3.1 with the complete genome sequence of E. coli UMN026 (NCBI nucleotide accession no. NC_011751.1) as a reference sequence and de novo assemblies of soil isolate genomes as the comparison genomes (73). The resulting list of variants, present in the reference sequence, was annotated using snpEff v.4.1 with the included E. coli UMN026 annotation database (74). Accessory gene content, including paralogs, of variants was identified and annotated during ROARY processing.

Core and accessory genome variants were subsequently processed in R v.3.3.4 (http://www.r-project.org). Genome variant matrices contain many “redundant” variants, which tightly covary with other sequence variants, possibly because of genetic linkage, and confound statistical association analysis. Many other sequence variants are “insignificant,” meaning that the minor allele fraction is too small to enable a useful test of association. As a result, we first filtered our sequence variant matrix to remove sequence variants that were present or absent in >90% of genomes. Of 223,992 core genome variants identified by kSNP3, only 75,873 had prevalence between 10 and 90% in our data set. Accordingly, variants were analyzed in this study only if they were observed in at least 28 genomes. A total of 28,118 genes were present in the pangenome of the 277 phylogroup D E. coli isolates analyzed. A total of 2,797 genes had prevalence in the pangenome of 10% to 90% and so were deemed sufficiently variable for analysis.
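The prevalence filter described above can be illustrated with a short sketch. It is written in Python for illustration only (the processing in this study was done in R); the function name, the dictionary layout of the variant matrix, and the toy data are hypothetical.

```python
import math


def filter_by_prevalence(variant_matrix, low=0.10, high=0.90):
    """Keep variants whose prevalence across genomes lies within [low, high].

    variant_matrix maps a variant identifier to a list of 0/1 calls,
    one call per genome (hypothetical layout).
    """
    kept = {}
    for variant_id, calls in variant_matrix.items():
        prevalence = sum(calls) / len(calls)
        if low <= prevalence <= high:
            kept[variant_id] = calls
    return kept


# Toy matrix with 20 genomes (not data from this study).
toy_matrix = {
    "snp_rare": [1] + [0] * 19,      # prevalence 0.05, removed
    "snp_mid": [1] * 8 + [0] * 12,   # prevalence 0.40, kept
    "snp_common": [1] * 19 + [0],    # prevalence 0.95, removed
}
filtered = filter_by_prevalence(toy_matrix)

# A 10% lower bound on 277 genomes corresponds to at least 28 genomes.
min_genomes = math.ceil(0.10 * 277)
```

In the toy matrix only `snp_mid` survives the filter, and the 10% lower bound on 277 genomes reproduces the minimum of 28 genomes stated in the text.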

Linear modeling (logistic regression [e.g., GEMMA]), phylogenetic (e.g., TreeWAS), and machine learning (e.g., random forest classifiers) methods have been utilized for GWAS on continuous phenotype data. Regardless of the method, the fact of strong clonal population structure in microbial species complicates the discovery of loci under selection (75). As this was our first large GWAS study, we evaluated several analysis methods, including (i) Firth’s penalized likelihood logistic regression of genotypes against phenotypes with three principal-component analysis (PCA) eigenvectors as indicators of population structure (76, 77), (ii) gradient forest (multivariate random forest) classifiers on genotypes against phenotypes with three PCA eigenvectors as indicators of population structure (78), and (iii) TreeWAS, a phylogenetic simulation using a complete core genome phylogenetic clonal frame as an indicator of the phylogenetic history of strains (31) (https://github.com/caitiecollins/treeWAS). All methods appeared sensitive and corresponded with respect to variants in iron metabolism, central metabolism, electron transport, and attachment factors. However, phylogenetic simulation analysis completely accounts for the population structure and history in microbes and has shown the lowest numbers of false positives (31). Thus, we opted to focus our analysis on using TreeWAS for detecting associations.

In detail, TreeWAS analysis was run separately on binary core SNP presence data and on binary accessory gene presence data. Because tree-based models like TreeWAS perform better on uniformly distributed phenotypes, phenotype data were rank transformed with ties averaged. TreeWAS simulates the null distribution of genome variants based on the provided clonal frame tree using maximum-likelihood techniques. TreeWAS performs tests of association between phenotype and genotype using three different scoring criteria (31). The P values of the association tests in TreeWAS were Bonferroni corrected with a threshold of 0.05 (P value of ~6.5 × 10^−7). For core genome variants, the “subsequent” scoring criterion detected all of the significant associations. That scoring criterion is designed to detect appearance of the new phenotype in a lineage subsequent to the appearance of a sequence variant. For accessory genome variants, the “simultaneous” scoring criterion detected all of the significant associations. That scoring criterion is designed to detect change of the phenotype at the moment of acquisition of a new gene in the phylogeny. The “terminal” scoring criterion did not detect any statistically significant associations in motility.
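Two of the steps above, the rank transformation with averaged ties and the Bonferroni threshold, can be sketched as follows. This is an illustrative Python sketch, not the TreeWAS implementation; the function names and phenotype values are hypothetical, and the number of tests is assumed here to be the 75,873 filtered core genome variants, which the text does not state explicitly.

```python
def rank_with_average_ties(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical migration rates (% of MC1000) for four isolates.
phenotype_ranks = rank_with_average_ties([120.0, 60.0, 120.0, 0.0])

# Assumption: one test per filtered core genome variant.
threshold = bonferroni_threshold(0.05, 75873)
```

Under this assumption the per-test threshold is about 6.6 × 10^−7, which is close to the approximate value given in the text.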

Phenotype microarrays.

Ten E. coli isolates were selected based upon their broad range of migration and biofilm phenotypes. Isolates were plated on LB agar and incubated at 15°C, which was the average soil temperature at the time of sample collection (16). Cultures were transferred into 5 mL of TB and grown for 3 days at 15°C. This transfer was repeated two more times: each time, 80 μL of the previous broth was used to inoculate the new 4 mL of broth. Bacteria were collected by centrifugation at 4,700 rpm for 10 min at ~15°C, washed twice with 4 mL of 0.9% saline, and centrifuged again. The final pellet was resuspended in 4 mL of 0.9% saline, the OD600 of the culture was determined, and inoculation of the 96-well plates was done at an OD600 of between 0.098 and 0.102.

The Biolog EcoPlates (Biolog, Hayward, CA) contain triplicates of 31 carbon sources and have been used for the analysis of bacterial communities in soil (79). Plates were inoculated as explained above with 100 μL of bacterial suspension per well. Three different biological isolates were used for the triplicates of the carbon sources that are contained on each plate. Plates were incubated at 15°C for up to 2 weeks. OD600 readings were taken daily. The experiment was performed with four to five replicates per isolate.

To analyze the data, the growth rate in each well was determined at its maximum and expressed as OD600 per day. The average growth across all carbon sources for one isolate and experiment was determined and used to calculate growth as a percentage of this average for each carbon source. The average and standard deviation of the percentage data were determined across four to five replicates per isolate. The 10 carbon sources that permitted at least one of the isolates to grow were presented and analyzed further. A cluster analysis was performed where the distance of all possible pairs of isolate and carbon source was computed. Mean data were used for this analysis, and average linkage analysis was performed, where the distance between two clusters is defined as the average of distances between any member of one cluster to any member of the other cluster. The resulting heat map is a two-dimensional graphical representation of the data in which the mean values are mapped to colors across a range. In a final analysis, Pearson’s correlation coefficient was used to calculate correlations between growth on each of three carbon sources and migration on tryptone swim plates at 15°C. A P value of <0.05 was used as a cutoff to determine the significance of the correlation.
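The normalization and correlation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of this study: the function names and the input values are hypothetical, and only three carbon sources are used in the toy example instead of the full plate.

```python
import math


def percent_of_isolate_mean(rates_by_source):
    """Express each growth rate as a percentage of the isolate's mean rate.

    rates_by_source maps a carbon source to a maximum growth rate
    (OD600 per day) for one isolate and one experiment.
    """
    mean_rate = sum(rates_by_source.values()) / len(rates_by_source)
    return {source: 100.0 * rate / mean_rate
            for source, rate in rates_by_source.items()}


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical growth rates for one isolate (OD600 per day).
normalized = percent_of_isolate_mean(
    {"d-xylose": 0.02, "d-mannitol": 0.06, "putrescine": 0.04}
)

# Hypothetical paired values: growth (% of average) and migration (% of MC1000).
r = pearson_r([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0])
```

In the toy example the three sources come out at 50%, 150%, and 100% of the isolate average, and the perfectly inverse pairs give r = −1.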

ACKNOWLEDGMENTS

This research was funded by National Science Foundation CAREER award no. DEB-1453397 to P.W.B. Hatch Act Federal Formula Funds projects no. ND02428 and ND2438 funded P.W.B. and B.M.P., respectively, through the North Dakota Agricultural Experiment Station. C.P. and E.S.B. were funded by a Duncan’s scholarship from the College of Agriculture, Food Systems, and Natural Resources. M.L.P. was funded by a Graduate Student Research Award from ND-EPSCoR.

We thank Barney Geddes for helping with Geneious and John McEvoy for critically reading and editing the manuscript.

Contributor Information

Birgit M. Prüß, Email: Birgit.Pruess@ndsu.edu.

Pablo Ivan Nikel, Novo Nordisk Foundation Center for Biosustainability.

REFERENCES

  • 1.Blount ZD. 2015. The unexhausted potential of E. coli. eLife 4:e05826. 10.7554/eLife.05826. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Deering AJ, Pruitt RE, Mauer LJ, Reuhs BL. 2011. Identification of the cellular location of internalized Escherichia coli O157:H7 in mung bean, Vigna radiata, by immunocytochemical techniques. J Food Prot 74:1224–1230. 10.4315/0362-028X.JFP-11-015. [DOI] [PubMed] [Google Scholar]
  • 3.Franz E. 2007. Ecology and risk assessment of E. coli O157: H7 and Salmonella Typhimurium in the primary production chain of lettuce. PhD thesis. Wageningen University, Wageningen, The Netherlands. [Google Scholar]
  • 4.Jablasone J, Warriner K, Griffiths M. 2005. Interactions of Escherichia coli O157:H7, Salmonella typhimurium and Listeria monocytogenes plants cultivated in a gnotobiotic system. Int J Food Microbiol 99:7–18. 10.1016/j.ijfoodmicro.2004.06.011. [DOI] [PubMed] [Google Scholar]
  • 5.Foster R, Rovira A. 1976. Ultrastructure of wheat rhizosphere. New Phytol 76:343–352. 10.1111/j.1469-8137.1976.tb01469.x. [DOI] [Google Scholar]
  • 6.Bais HP, Weir TL, Perry LG, Gilroy S, Vivanco JM. 2006. The role of root exudates in rhizosphere interactions with plants and other organisms. Annu Rev Plant Biol 57:233–266. 10.1146/annurev.arplant.57.032905.105159. [DOI] [PubMed] [Google Scholar]
  • 7.Lakshmanan V, Selvaraj G, Bais HP. 2014. Functional soil microbiome: belowground solutions to an aboveground problem. Plant Physiol 166:689–700. 10.1104/pp.114.245811. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Gunina A, Kuzyakov Y. 2015. Sugars in soil and sweets for microorganisms: review of origin, content, composition and fate. Soil Biol Biochem 90:87–100. 10.1016/j.soilbio.2015.07.021. [DOI] [Google Scholar]
  • 9.Rudrappa T, Czymmek KJ, Paré PW, Bais HP. 2008. Root-secreted malic acid recruits beneficial soil bacteria. Plant Physiol 148:1547–1556. 10.1104/pp.108.127613. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Hartmann A, Zimmer W. 1993. Physiology of Azospirillum, p 15–40. In Okon Y (ed), Azospirillum/plant associations. CRC Press, Boca Raton, FL. [Google Scholar]
  • 11.Zhulin IB, Armitage JP. 1992. The role of taxis in the ecology of Azospirillum. Symbiosis 13:199–206. [Google Scholar]
  • 12.Lei F, Fu J, Zhou R, Wang D, Zhang A, Ma W, Zhang L. 2017. Chemotactic response of Ginseng bacterial soft-rot to Ginseng root exudates. Saudi J Biol Sci 24:1620–1625. 10.1016/j.sjbs.2017.05.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Lombardi N, Vitale S, Turrà D, Reverberi M, Fanelli C, Vinale F, Marra R, Ruocco M, Pascale A, d'Errico G, Woo SL, Lorito M. 2018. Root exudates of stressed plants stimulate and attract Trichoderma soil fungi. Mol Plant Microbe Interact 31:982–994. 10.1094/MPMI-12-17-0310-R. [DOI] [PubMed] [Google Scholar]
  • 14.NandaKafle G, Christie AA, Vilain S, Brözel VS. 2018. Growth and extended survival of Escherichia coli O157:H7 in soil organic matter. Front Microbiol 9:762. 10.3389/fmicb.2018.00762. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Somorin Y, Abram F, Brennan F, O'Byrne C. 2016. The general stress response is conserved in long-term soil-persistent strains of Escherichia coli. Appl Environ Microbiol 82:4628–4640. 10.1128/AEM.01175-16. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Dusek N, Hewitt AJ, Schmidt KN, Bergholz PW. 2018. Landscape-scale factors affecting the prevalence of Escherichia coli in surface soil include land cover type, edge interactions, and soil pH. Appl Environ Microbiol 84:e02714-17. 10.1128/AEM.02714-17. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Bergholz PW, Noar JD, Buckley DH. 2011. Environmental patterns are imposed on the population structure of Escherichia coli after fecal deposition. Appl Environ Microbiol 77:211–219. 10.1128/AEM.01880-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Touchon M, Perrin A, de Sousa JAM, Vangchhia B, Burn S, O'Brien CL, Denamur E, Gordon D, Rocha EP. 2020. Phylogenetic background and habitat drive the genetic diversification of Escherichia coli. PLoS Genet 16:e1008866. 10.1371/journal.pgen.1008866. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Chang M, He L, Cai L. 2018. An overview of genome-wide association studies. Methods Mol Biol 1754:97–108. 10.1007/978-1-4939-7717-8_6. [DOI] [PubMed] [Google Scholar]
  • 20.Komeda Y, Silverman M, Matsumura P, Simon M. 1978. Genes for the hook-basal body proteins of the flagellar apparatus in Escherichia coli. J Bacteriol 134:655–667. 10.1128/jb.134.2.655-667.1978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Matsumura P, Silverman M, Simon M. 1977. Synthesis of mot and che gene products of Escherichia coli programmed by hybrid ColE1 plasmids in minicells. J Bacteriol 132:996–1002. 10.1128/jb.132.3.996-1002.1977.
  • 22. Silverman M, Simon M. 1973. Genetic analysis of bacteriophage Mu-induced flagellar mutants in Escherichia coli. J Bacteriol 116:114–122. 10.1128/jb.116.1.114-122.1973.
  • 23. Silverman M, Simon M. 1973. Genetic analysis of flagellar mutants in Escherichia coli. J Bacteriol 113:105–113. 10.1128/jb.113.1.105-113.1973.
  • 24. Silverman MR, Simon MI. 1972. Flagellar assembly mutants in Escherichia coli. J Bacteriol 112:986–993. 10.1128/jb.112.2.986-993.1972.
  • 25. Blattner FR, Plunkett G, III, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y. 1997. The complete genome sequence of Escherichia coli K-12. Science 277:1453–1462. 10.1126/science.277.5331.1453.
  • 26. Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, Han CG, Ohtsubo E, Nakayama K, Murata T, Tanaka M, Tobe T, Iida T, Takami H, Honda T, Sasakawa C, Ogasawara N, Yasunaga T, Kuhara S, Shiba T, Hattori M, Shinagawa H. 2001. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 8:11–22. 10.1093/dnares/8.1.11.
  • 27. Strauch E, Beutin L. 2006. Imprecise excision of insertion element IS5 from the fliC gene contributes to flagellar diversity in Escherichia coli. FEMS Microbiol Lett 256:195–202. 10.1111/j.1574-6968.2006.00100.x.
  • 28. Horne SM, Sayler J, Scarberry N, Schroeder M, Lynnes T, Prüß BM. 2016. Spontaneous mutations in the flhD operon generate motility heterogeneity in Escherichia coli biofilm. BMC Microbiol 16:262. 10.1186/s12866-016-0878-1.
  • 29. Barker CS, Prüss BM, Matsumura P. 2004. Increased motility of Escherichia coli by insertion sequence element integration into the regulatory region of the flhD operon. J Bacteriol 186:7529–7537. 10.1128/JB.186.22.7529-7537.2004.
  • 30. Didelot X. 2021. Phylogenetic methods for genome-wide association studies in bacteria. Methods Mol Biol 2242:205–220. 10.1007/978-1-0716-1099-2_13.
  • 31. Collins C, Didelot X. 2018. A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination. PLoS Comput Biol 14:e1005958. 10.1371/journal.pcbi.1005958.
  • 32. Galardini M, Clermont O, Baron A, Busby B, Dion S, Schubert S, Beltrao P, Denamur E. 2020. Major role of iron uptake systems in the intrinsic extra-intestinal virulence of the genus Escherichia revealed by a genome-wide association study. PLoS Genet 16:e1009065. 10.1371/journal.pgen.1009065.
  • 33. Hondorp ER, Matthews RG. 28 April 2006. Methionine. EcoSal Plus 10.1128/ecosalplus.3.6.1.7.
  • 34. Maxon ME, Redfield B, Cai XY, Shoeman R, Fujita K, Fisher W, Stauffer G, Weissbach H, Brot N. 1989. Regulation of methionine synthesis in Escherichia coli: effect of the MetR protein on the expression of the metE and metR genes. Proc Natl Acad Sci USA 86:85–89. 10.1073/pnas.86.1.85.
  • 35. Mesibov R, Adler J. 1972. Chemotaxis toward amino acids in Escherichia coli. J Bacteriol 112:315–326. 10.1128/jb.112.1.315-326.1972.
  • 36. Yu P, Zhu P. 2017. Improving the production of S-adenosyl-L-methionine in Escherichia coli by overexpressing metK. Prep Biochem Biotechnol 47:867–873. 10.1080/10826068.2017.1350976.
  • 37. Djordjevic S, Stock AM. 1998. Chemotaxis receptor recognition by protein methyltransferase CheR. Nat Struct Biol 5:446–450. 10.1038/nsb0698-446.
  • 38. Simms SA, Stock AM, Stock JB. 1987. Purification and characterization of the S-adenosylmethionine:glutamyl methyltransferase that modifies membrane chemoreceptor proteins in bacteria. J Biol Chem 262:8537–8543. 10.1016/S0021-9258(18)47447-9.
  • 39. Oren Y, Smith MB, Johns NI, Kaplan Zeevi M, Biran D, Ron EZ, Corander J, Wang HH, Alm EJ, Pupko T. 2014. Transfer of noncoding DNA drives regulatory rewiring in bacteria. Proc Natl Acad Sci USA 111:16112–16117. 10.1073/pnas.1413272111.
  • 40. Tobey KL, Grant GA. 1986. The nucleotide sequence of the serA gene of Escherichia coli and the amino acid sequence of the encoded protein, d-3-phosphoglycerate dehydrogenase. J Biol Chem 261:12179–12183. 10.1016/S0021-9258(18)67220-5.
  • 41. Zhao G, Winkler ME. 1996. A novel alpha-ketoglutarate reductase activity of the serA-encoded 3-phosphoglycerate dehydrogenase of Escherichia coli K-12 and its possible implications for human 2-hydroxyglutaric aciduria. J Bacteriol 178:232–239. 10.1128/jb.178.1.232-239.1996.
  • 42. Rosen G, Baloga S. 1976. On the structure of steadily propagating rings of chemotactic bacteria. J Mechanochem Cell Motil 3:225–228.
  • 43. Wolfe AJ, Berg HC. 1989. Migration of bacteria in semisolid agar. Proc Natl Acad Sci USA 86:6973–6977. 10.1073/pnas.86.18.6973.
  • 44. Adler J. 1969. Chemoreceptors in bacteria. Science 166:1588–1597. 10.1126/science.166.3913.1588.
  • 45. Yasuda M, Nagata S, Yamane S, Kunikata C, Kida Y, Kuwano K, Suezawa C, Okuda J. 2017. Pseudomonas aeruginosa serA gene is required for bacterial translocation through Caco-2 cell monolayers. PLoS One 12:e0169367. 10.1371/journal.pone.0169367.
  • 46. Andrews SC, Berks BC, McClay J, Ambler A, Quail MA, Golby P, Guest JR. 1997. A 12-cistron Escherichia coli operon (hyf) encoding a putative proton-translocating formate hydrogenlyase system. Microbiology (Reading) 143:3633–3647. 10.1099/00221287-143-11-3633.
  • 47. Maier T, Binder U, Böck A. 1996. Analysis of the hydA locus of Escherichia coli: two genes (hydN and hypF) involved in formate and hydrogen metabolism. Arch Microbiol 165:333–341. 10.1007/s002030050335.
  • 48. Lee HR, Seo JH, Kim OM, Lee CS, Suh SW, Hong YM, Tanaka K, Ichihara A, Ha DB, Chung CH. 1991. Molecular cloning of the ecotin gene in Escherichia coli. FEBS Lett 287:53–56. 10.1016/0014-5793(91)80014-t.
  • 49. Okuda S, Tokuda H. 2011. Lipoprotein sorting in bacteria. Annu Rev Microbiol 65:239–259. 10.1146/annurev-micro-090110-102859.
  • 50. Xu Y, Chen B, Chao H, Zhou NY. 2013. mhpT encodes an active transporter involved in 3-(3-hydroxyphenyl)propionate catabolism by Escherichia coli K-12. Appl Environ Microbiol 79:6362–6368. 10.1128/AEM.02110-13.
  • 51. Heidrich C, Templin MF, Ursinus A, Merdanovic M, Berger J, Schwarz H, de Pedro MA, Holtje JV. 2001. Involvement of N-acetylmuramyl-l-alanine amidases in cell separation and antibiotic-induced autolysis of Escherichia coli. Mol Microbiol 41:167–178. 10.1046/j.1365-2958.2001.02499.x.
  • 52. Berry RE, Klumpp DJ, Schaeffer AJ. 2009. Urothelial cultures support intracellular bacterial community formation by uropathogenic Escherichia coli. Infect Immun 77:2762–2772. 10.1128/IAI.00323-09.
  • 53. Pettis GS, McIntosh MA. 1987. Molecular characterization of the Escherichia coli enterobactin cistron entF and coupled expression of entF and the fes gene. J Bacteriol 169:4154–4162. 10.1128/jb.169.9.4154-4162.1987.
  • 54. Mohammadi Nargesi B, Sprenger GA, Youn JW. 2018. Metabolic engineering of Escherichia coli for para-amino-phenylethanol and para-amino-phenylacetic acid biosynthesis. Front Bioeng Biotechnol 6:201. 10.3389/fbioe.2018.00201.
  • 55. May T, Okabe S. 2011. Enterobactin is required for biofilm development in reduced-genome Escherichia coli. Environ Microbiol 13:3149–3162. 10.1111/j.1462-2920.2011.02607.x.
  • 56. Yano K, Morinaka Y, Wang F, Huang P, Takehara S, Hirai T, Ito A, Koketsu E, Kawamura M, Kotake K, Yoshida S, Endo M, Tamiya G, Kitano H, Ueguchi-Tanaka M, Hirano K, Matsuoka M. 2019. GWAS with principal component analysis identifies a gene comprehensively controlling rice architecture. Proc Natl Acad Sci USA 116:21262–21267. 10.1073/pnas.1904964116.
  • 57. Mageiros L, Méric G, Bayliss SC, Pensar J, Pascoe B, Mourkas E, Calland JK, Yahara K, Murray S, Wilkinson TS, Williams LK, Hitchings MD, Porter J, Kemmett K, Feil EJ, Jolley KA, Williams NJ, Corander J, Sheppard SK. 2021. Genome evolution and the emergence of pathogenicity in avian Escherichia coli. Nat Commun 12:765. 10.1038/s41467-021-20988-w.
  • 58. Li Y, Haug S, Schlosser P, Teumer A, Tin A, Pattaro C, Köttgen A, Wuttke M. 2020. Integration of GWAS summary statistics and gene expression reveals target cell types underlying kidney function traits. J Am Soc Nephrol 31:2326–2340. 10.1681/ASN.2020010051.
  • 59. Peterson ML. 2018. Biofilm formation of Escherichia coli from surface soils is influenced by variation in cell envelope, iron metabolism, and attachment factor genes. MSc in microbiology. North Dakota State University, ProQuest Dissertations Publishing, Ann Arbor, MI.
  • 60. Feigl V, Ujaczki É, Vaszita E, Molnár M. 2017. Influence of red mud on soil microbial communities: application and comprehensive evaluation of the Biolog EcoPlate approach as a tool in soil microbiological studies. Sci Total Environ 595:903–911. 10.1016/j.scitotenv.2017.03.266.
  • 61. Lopes LD, Weisberg AJ, Davis EW, II, Varize CS, Pereira ESMC, Chang JH, Loper JE, Andreote FD. 2019. Genomic and metabolic differences between Pseudomonas putida populations inhabiting sugarcane rhizosphere or bulk soil. PLoS One 14:e0223269. 10.1371/journal.pone.0223269.
  • 62. Barak R, Eisenbach M. 1992. Correlation between phosphorylation of the chemotaxis protein CheY and its activity at the flagellar motor. Biochemistry 31:1821–1826. 10.1021/bi00121a034.
  • 63. Greer-Phillips SE, Alexandre G, Taylor BL, Zhulin IB. 2003. Aer and Tsr guide Escherichia coli in spatial gradients of oxidizable substrates. Microbiology (Reading) 149:2661–2667. 10.1099/mic.0.26304-0.
  • 64. Khan N, Bano A, Babar MDA. 2019. The stimulatory effects of plant growth promoting rhizobacteria and plant growth regulators on wheat physiology grown in sandy soil. Arch Microbiol 201:769–785. 10.1007/s00203-019-01644-w.
  • 65. Hussain A, Nazir F, Fariduddin Q. 2019. Polyamines (spermidine and putrescine) mitigate the adverse effects of manganese induced toxicity through improved antioxidant system and photosynthetic attributes in Brassica juncea. Chemosphere 236:124830. 10.1016/j.chemosphere.2019.124830.
  • 66. Kim M, Kim T. 2010. Diffusion-based and long-range concentration gradients of multiple chemicals for bacterial chemotaxis assays. Anal Chem 82:9401–9409. 10.1021/ac102022q.
  • 67. Bashan Y, De-Bashan LE. 2010. How the plant growth-promoting bacterium Azospirillum promotes plant growth—a critical assessment. Adv Agron 108:77–136. 10.1016/S0065-2113(10)08002-8.
  • 68. Casadaban MJ, Cohen SN. 1980. Analysis of gene control signals by DNA fusion and cloning in Escherichia coli. J Mol Biol 138:179–207. 10.1016/0022-2836(80)90283-1.
  • 69. Neidhardt FC, Bloch PL, Smith DF. 1974. Culture medium for enterobacteria. J Bacteriol 119:736–747. 10.1128/jb.119.3.736-747.1974.
  • 70. Liu X, Matsumura P. 1994. The FlhD/FlhC complex, a transcriptional activator of the Escherichia coli flagellar class II operons. J Bacteriol 176:7345–7351. 10.1128/jb.176.23.7345-7351.1994.
  • 71. Kumari S, Beatty CM, Browning DF, Busby SJ, Simel EJ, Hovel-Miner G, Wolfe AJ. 2000. Regulation of acetyl coenzyme A synthetase in Escherichia coli. J Bacteriol 182:4173–4179. 10.1128/JB.182.15.4173-4179.2000.
  • 72. Prüss BM, Verma K, Samanta P, Sule P, Kumar S, Wu J, Christianson D, Horne SM, Stafslien SJ, Wolfe AJ, Denton A. 2010. Environmental and genetic factors that contribute to Escherichia coli K-12 biofilm formation. Arch Microbiol 192:715–728. 10.1007/s00203-010-0599-z.
  • 73. Gardner SN, Slezak T, Hall BG. 2015. kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome. Bioinformatics 31:2877–2878. 10.1093/bioinformatics/btv271.
  • 74. Cingolani P. 2022. Variant annotation and functional prediction: SnpEff. Methods Mol Biol 2493:289–314. 10.1007/978-1-0716-2293-3_19.
  • 75. Saber MM, Shapiro BJ. 2020. Benchmarking bacterial genome-wide association study methods using simulated genomes and phenotypes. Microb Genom 6:e000337. 10.1099/mgen.0.000337.
  • 76. Lees JA, Vehkala M, Välimäki N, Harris SR, Chewapreecha C, Croucher NJ, Marttinen P, Davies MR, Steer AC, Tong SY, Honkela A, Parkhill J, Bentley SD, Corander J. 2016. Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nat Commun 7:12797. 10.1038/ncomms12797.
  • 77. Zhou X, Stephens M. 2012. Genome-wide efficient mixed-model analysis for association studies. Nat Genet 44:821–824. 10.1038/ng.2310.
  • 78. Fountain-Jones NM, Craft ME, Funk WC, Kozakiewicz C, Trumbo DR, Boydston EE, Lyren LM, Crooks K, Lee JS, VandeWoude S, Carver S. 2017. Urban landscapes can change virus gene flow and evolution in a fragmentation-sensitive carnivore. Mol Ecol 26:6487–6498. 10.1111/mec.14375.
  • 79. Bissett A, Richardson AE, Baker G, Kirkegaard J, Thrall PH. 2013. Bacterial community response to tillage and nutrient additions in a long-term wheat cropping experiment. Soil Biol Biochem 58:281–292. 10.1016/j.soilbio.2012.12.002.

Articles from Applied and Environmental Microbiology are provided here courtesy of American Society for Microbiology (ASM)
