Abstract
The root-knot nematodes (genus Meloidogyne) are important plant parasites causing substantial agricultural losses. The Meloidogyne incognita group (MIG) of species, most of which are obligatory apomicts (mitotic parthenogens), are extremely polyphagous and important problems for global agriculture. While understanding the genomic basis for their variable success on different crops could benefit future agriculture, analyses of their genomes are challenging due to complex evolutionary histories that may incorporate hybridization, ploidy changes, and chromosomal fragmentation. Here, we sequence 19 genomes, representing five species of key root-knot nematodes collected from different geographic origins. We show that a hybrid origin that predated speciation within the MIG has resulted in each species possessing two divergent genomic copies. Additionally, the apomictic MIG species are hypotriploids, with a proportion of one genome present in a second copy. The hypotriploid proportion varies among species. The evolutionary history of the MIG genomes is revealed to be very dynamic, with noncrossover recombination both homogenizing the genomic copies, and acting as a mechanism for generating divergence between species. Interestingly, the automictic MIG species M. floridensis differs from the apomict species in that it has become homozygous throughout much of its genome.
Keywords: Meloidogyne, genome evolution, phylogenomics, coverage ratio, recombination
Introduction
The root-knot nematodes (RKN; genus Meloidogyne) are among the world’s most destructive crop pests, causing very significant reduction in yields in nearly all major agricultural crops (Moens et al. 2009). The most-studied species in this genus can be divided into three well-supported clades, and the tropical RKN species in Clade 1, especially the closely related Meloidogyne incognita group (MIG) species, are major global pests. They are found in agricultural areas on all continents that have mild winter temperatures (Trudgill and Blok 2001), and have been highlighted as one of the most serious threats to temperate agricultural regions as climate change progresses (Bebber et al. 2014).
Based on cytological examination of gamete development, most MIG nematodes have been determined to be mitotic parthenogens (apomicts) which do not undergo meiosis and reproduce asexually (Triantaphyllou 1963, 1981, 1985; Janssen et al. 2017). Although asexual organisms are often characterized as being less able to adapt to variable environments and interspecific competition than those with meiosis, the MIG apomicts are very successful, globally distributed, highly polyphagous crop pests (Trudgill and Blok 2001). Considerable variation in ability to break crop resistance and to reproduce on different crop species is observed both between and within species (Trudgill and Blok 2001; Williamson and Roberts 2009). Despite their global importance, relatively little information is available on genetic variation between and within species, or genomic diversity across pathogenicity groups and mating systems. Draft genomes are available for two MIG species: Meloidogyne incognita (Abad et al. 2008) and Meloidogyne floridensis (Lunt et al. 2014). A high-quality genome assembly is available for the distantly related Meloidogyne hapla, a facultative meiotic parthenogen from Meloidogyne Clade 2 (Opperman et al. 2008). It has been suggested on the basis of the two draft genome sequences that the MIG arose from a complex series of hybridization events, and that this hybrid origin is evident in the genome structure (Lunt et al. 2014). Genome-scale data are important if we are to accurately understand the complex origins and evolution of the MIG.
The MIG species have classically been differentiated using subtle morphological characters, isozymes, and host range (Hartman and Sasser 1985; Esbenshade and Triantaphyllou 1990; Eisenback and Triantaphyllou 1991; Carneiro et al. 2004; Hunt et al. 2009). The phylogenetic relationships between the closely related MIG species have been difficult to determine unequivocally (Triantaphyllou 1985; De Ley et al. 2002; Adams et al. 2009; Janssen et al. 2016). Nuclear genome sequencing has revealed that MIG species contain two very divergent copies of many loci, likely due to a past hybridization event (Lunt 2008). The different evolutionary histories of these copies, likely to have been brought together by hybridization (i.e., they are homoeologs), compromise both phylogenetic analyses and species identifications that use nuclear genome sequences. While mitochondrial DNA approaches can successfully discriminate some MIG species (Hugall et al. 1994; Blok and Powers 2009; Pagan et al. 2015; Janssen et al. 2016), there is little phylogenetic signal and these maternal-lineage mtDNA studies cannot report on hybrid origins. Large-scale sequencing of nuclear genomes from multiple species and isolates has the potential to provide a wealth of comparative data. Modern population genomic analysis techniques will then be available to discriminate closely related species, examine the agricultural spread and divergence of populations, and more fully represent the MIG’s evolutionary history and phylogenetic relationships.
We have sequenced the genomes of 19 new Meloidogyne isolates from five nominal species, including one to eight unique isolates of each species from diverse geographical locations. The new genomes include isolates of M. incognita, Meloidogyne javanica, and Meloidogyne arenaria, the three most widespread and agriculturally important MIG species. We also sequenced the genome of a second isolate of M. floridensis. Unlike most other MIG species, M. floridensis oocytes display some components of meiosis, including chromosome pairing into bivalents (Handoo et al. 2004). As outgroup, we sequenced Meloidogyne enterolobii (junior synonym M. mayaguensis), a highly pathogenic and invasive species which is also apomictic. Meloidogyne enterolobii is, like the MIG, a member Meloidogyne Clade 1, but clearly distinguished from them by mitochondrial and ribosomal RNA sequence comparisons (Holterman et al. 2009; Janssen et al. 2016). Here, we examine the phylogenomic relationships between Meloidogyne species, investigate the origins of the apomictic species, examine hybridization and ploidy, and describe levels of intra- and interspecific variation. Together, these comparative genomic approaches yield a detailed view of evolutionary history of these crop pests, and provide a valuable platform for future improvement of agricultural practices.
Materials and Methods
Reproducibility
In order to make our analyses as reproducible as possible, we provide a collection of Jupyter notebooks containing analysis scripts, specified parameters, descriptions, and details of the data versions used. It should be possible to recreate all figures from these notebooks. Phylogenomic analyses made use of ReproPhylo an environment for reproducible phylogenomics (Szitenberg et al. 2015). Raw data is published in BioProject PRJNA340324 and genome assemblies, intermediate data transformations, and methods notebooks can be found in the manuscript’s git repository (last accessed September 2017):
https://github.com/HullUni-bioinformatics/MIG-Phylogenomics#mig-phylogenomics de50fe4.
An independent and permanent freeze of this git repository is published at doi: 10.5281/zenodo.399475.
Samples and Sequencing
Meloidogyne J2 larvae, egg masses, or bulk genomic DNA samples were obtained either frozen or preserved in ethanol. The samples represented isolates taken from diverse geographic locations and grown in laboratory culture before harvesting (supplementary table S1, Supplementary Material online). Isolates were identified to species by the source laboratories, with identifications later confirmed by sequencing. High-coverage Illumina short-read data were produced from each isolate. The raw sequence data have been deposited with the international sequence databases under BioProject reference PRJNA340324. The sequencing data including library size and number of reads is listed in supplementary table S2, Supplementary Material online (in addition, https://github.com/HullUni-bioinformatics/MIG-Phylogenomics, DOI: 10.5281/zenodo.399475). Three additional samples are now available from Blanc-Mathieu et al. (2017). Unfortunately, these data were released concurrently with the submission of our manuscript and were not included.
Assembly and Annotation
We created five high-quality genome assemblies, one for each species included in this stud Sequence read quality filtering, de novo assembly, and genome annotation were carried out as detailed in the github repository. We used the Platanus assembler (Kajitani et al. 2014) because it is optimized for highly heterozygous genomes, and performed best in our hands with our data.
In addition to the five de novo reference assemblies (table 1), protein coding gene data was created via read mapping to whole gene reference sequences (including introns and exons). For each sample, the reference gene sequences were taken from its conspecific reference genome. Quality trimmed read pairs were mapped to the gene data set using the BWA package (Li and Durbin 2009). The resulting alignment was converted to fasta entries by creating a VCF file with FreeBayes (Garrison and Marth 2012), including SNPs and monomorphic positions. This VCF file, containing all the sequence positions, was then formatted as fasta. The resulting gene assembly was annotated using the protein2genome model in exonerate (Slater and Birney 2005), to delineate intron–exon boundaries. For each sample, protein query sequences were taken from the conspecific reference genome (in addition, section 2 in https://github.com/HullUni-bioinformatics/MIG-Phylogenomics, DOI: 10.5281/zenodo.399475).
Table 1.
Species | Strain or Isolate Code | Assembly Span (bp) | Number of Scaffolds | Longest Scaffold (bp) | Scaffold N50 (bp) | CEGMA |
Number of Predicted Genes | %GC | |
---|---|---|---|---|---|---|---|---|---|
C | A | ||||||||
Meloidogyne javanica | VW4 | 142,608,877 | 34,394 | 223,460 | 14,133 | 90% | 2.5 | 26,917 | 30.2 |
M. incognita | W1 | 122,043,328 | 33,735 | 248,829 | 16,498 | 83% | 2.4 | 24,714* | 30.6 |
M. arenaria | HarA | 163,770,989 | 46,509 | 163,224 | 10,504 | 91% | 2.7 | 30,308 | 30.3 |
M. enterolobii | L30 | 162,361,678 | 46,090 | 94,967 | 9,280 | 81% | 2.6 | 31,051 | 30.2 |
M. floridensis | SJF1 | 74,893,904 | 9,134 | 88,393 | 13,256 | 84% | 1.6 | 14,144* | 30.2 |
CEGMA, Core Eukaryotic genes Mapping Approach (Parra et al. 2007); C, % complete CEGMA orthologs; A, average copy number per CEGMA ortholog.
Previous gene number predictions were 17,999 in M. incognita (Abad et al. 2008) and 14,500 in M. floridensis (Lunt et al. 2014).
Orthology Definition
In order to conduct a phylogenetic and evolutionary analysis, we created a data set of orthologs. We used OrthoFinder (Emms and Kelly 2015) to define orthology among the proteins obtained from the reference genomes and from the mapping assemblies. The analysis included all the available data for M. floridensis, M. incognita, M. javanica, M. arenaria, and M. enterolobii. Although currently OrthoFinder incorporates a phylogenetic step, the version we have used (git commit 2bb1fe3), did not include this feature and we thus created a phylogenetic filtering step (see below). We tested inflation values of 1.1–20, and selected the inflation value 2, as the setting resulting in the most orthology groups (OGs) containing at least one copy from each sample. The inflation parameter in the Markov cluster linkage (MCL) step defines the relative similarity of the different proteins clustered. The lower it is set, the larger a cluster is allowed to be, and a lower number of clusters is obtained. The optimal value can differ among species with different evolutionary rates. For each resulting OG, we reconstructed a gene tree, using a nucleotide MAFFT l-ins-i alignment (Katoh and Standley 2013) and alignment trimming which allowed up to 30% missing data and at least 0.001 similarity score (TrimAl; Capella-Gutiérrez et al. 2009). Alignments that contained two (or more) sequences from a single species that had a small or no overlap (< 20 bp) in the alignment were discarded to avoid erroneous consideration of two exons of the same orthologue as separate orthologues. The trees were reconstructed with the default parameters in RAxML (Stamatakis 2014). Each gene tree was rooted with M. enterolobii, and the ingroup was traversed to collapse sister leaves belonging to the same sample into a single leaf, keeping the least derived sequence out of the two, using ete2 (Huerta-Cepas et al. 2010). Then the ingroup was clustered into two groups based on patristic distances, using a hierarchical clustering approach. OGs with more than one representative per sample in each cluster were discarded. OG contents were edited based on the results of the steps described earlier, and new alignments and gene trees were constructed (in addition, section 3 in https://github.com/HullUni-bioinformatics/MIG-Phylogenomics, DOI: 10.5281/zenodo.399475).
Phylogenomic Reconstruction and Tests for Phylogenetic Conflicts within the Data
We tested for the existence of conflicting phylogenetic signal in the nuclear protein coding gene trees. Conflicting signal is not unusual among nuclear gene trees, but in this study, we were additionally exploring possible conflict within trees that might exist between the subtrees of the first and second homoeologs. In a first approach, we calculated a weighted Robinson–Foulds distance matrix (Robinson and Foulds 1981) among all the gene trees and constructed a metric MDS plot to examine the number of gene tree clusters that existed in the data with the treeCl package (Gori et al. 2016). In a second approach, we created 100 sets of randomized gene trees, and built a coalescent tree for each randomized set using ASTRAL 4 (Mirarab et al. 2014). In the randomization process, we randomly assigned homoeolog identity to each homoeolog subtree, in each gene tree, such that for a given gene tree, subtree 1 would be denoted “homoeolog 1” in some of the randomized sets, and “homoeolog 2” in the other randomized sets, whereas subtree 2 would be called “homoeolog 2” and “homoeolog 1,” respectively (supplementary section 1.1, Supplementary Material online). We did not use the terms “genome A” and “genome B” when naming the homoeologs to indicate that we do not have synteny information for homoeologs from different orthology clusters. Using the suffixes “A” and “B” would erroneously create the impression that their genome copy assignment is known. The 100 resulting coalescent trees, each based on randomized gene trees of all the orthology clusters, were then combined into a strict consensus tree with RAxML (Stamatakis 2014), with the assumption that if the subtrees indeed share a phylogenetic history the nodes will be recovered in this consensus tree, despite this randomized shuffling. The third approach was based on a maximum likelihood (ML) phylogenetic reconstruction of 100 randomized supermatrices. For each gene in a given randomized supermatrix, the first and second homoeologs were randomly denoted as homoeolog A and B, or vice versa, thus affecting the concatenation process. For example, in one supermatrix, sequences of the first homoeolog from gene 1 were concatenated with sequences of the first homoeolog of gene 2, while in another supermatrix the concatenation was inverted (supplementary section 1.2, Supplementary Material online). The second and third approaches allowed us to test whether the two homoeologs had a shared phylogenetic history or distinctly different phylogenetic histories. We reasoned that if the two homoeologs had a shared organismal history and had coevolved since the MIG ancestor this would be reflected in shared phylogenetic histories, and neither of the randomization schemes would affect the resulting topology (in addition, section 4 in https://github.com/HullUni-bioinformatics/MIG-Phylogenomics, DOI: 10.5281/zenodo.399475).
Mitochondrial Genome Assembly, Annotation, and Tree Reconstruction
An iterative mapping and extension approach was followed to assemble mitochondrial genomes. We used MITObim (Hahn et al. 2013) to iteratively carry out a mapping assembly with the Mira assembler (Chevreux et al. 1999). Genes from publically available mitochondrial genomes were used as seed sequences. Genes from the mitochondrial genomes of M. incognita (NC_024097), M. javanica (NC_026556), and M. arenaria (NC_026554) were used as seeds for the assembly of their conspecifics with mismatch cutoffs of 1–4. Meloidogyne incognita (NC_024097) was also used as a reference for M. floridensis, M. enterolobii, and M. haplanaria, albeit with a more relaxed mismatch cutoff (6 and 15, respectively). Mitochondrial genes were annotated with exonerate (Slater and Birney 2005), using the protein2genome model and protein sequences from the reference mitochondrial genomes as queries. Mitochondrial ribosomal RNAs and the mitochondrial putative control regions were annotated with the est2genome model and the nucleotide sequences from the reference mitochondrial genome as queries. Two phylogenetic trees were reconstructed with mitochondrial genes. The first was rooted with sequences from M. enterolobii and the second was an unrooted tree that only included sequence from MIG species. The analysis was kept reproducible with ReproPhylo (Szitenberg et al. 2015). Single gene data sets were aligned with the l-ins-i algorithm in MAFFT (Katoh and Standley 2013) and trimmed with TrimAl (Capella-Gutiérrez et al. 2009) to exclude alignment columns with >10% missing data. Sequences were then concatenated into a supermatrix and a tree with reconstructed with RAxML (Stamatakis 2014) using the GTR-GAMMA model and allowing a separate model of evolution for each gene (in addition, see sections 5, 6, and 7 in https://github.com/HullUni-bioinformatics/MIG-Phylogenomics, DOI: 10.5281/zenodo.399475).
Coverage Ratio between Homoeologous Contigs
As per the working hypothesis of at least two genome copies in each individual, we used contig and scaffold coverage data to identify regions of the genome that had evidence of being present in two copies rather than one. This approach is analogous to approaches to identifying sex chromosomes in heterokaryotypic organisms. In the case of the Meloidogyne, we were attempting to identify homoeologous contigs where one homoeolog was present in two copies while the other was present in one copy. We paired homoeolog contigs that carried shared orthologous loci, and created a set of 400 contig pairs for each of the apomict MIG reference genomes, ensuring the sets were homologous among the genomes. For each apomict MIG sample, trimmed reads were mapped to contig pairs from its conspecific genomes. For each contig pair, the coverage values were aligned according to the contig sequence pair alignment. Finally, per position coverage ratios were calculated (designating the contig with the highest coverage as the numerator) and the median ratio was inferred for each contig pair. For each sample, we fitted functions with one and two Gaussian components and computed the residual error in each case, to determine which of the two functions fit the distribution of median coverage ratio better (in addition, section 9 in https://github.com/HullUni-bioinformatics/MIG-Phylogenomics, DOI: 10.5281/zenodo.399475).
Recombination
To quantify recombination, we took a sliding window BLAST approach to characterize changing similarity profiles between sequences (supplementary section 2, Supplementary Material online), as the assemblies contain homoeolog contig pairs which may or may not recombine (see Results). For each sample pair, we collected long scaffolds from the query genome (>10 kb). We then recovered the first two hits on the target assembly, for each window in the query scaffold. Windows were 5,000 bp with 2,500 bp overlap. We required matches to be at least 2,500 bp long, and have BLAST E values < 0.01. Cases in which two adjacent windows had the same two hits, but in reciprocal order were considered “events,” as long as the difference between the two hits was at least seven mismatches, and that they existed on different scaffolds. Events that were also recovered when exchanging the query and the target genomes were considered crossover events, otherwise they were considered noncrossover events. We note that this criterion is not valid for M. floridensis, which is a mostly haploid genome assembly. Noncrossover event rates were expressed as the fraction of scaffolds with an event out of the total scaffold count. The Pearson correlation coefficient was calculated between the noncrossover event rates and tree distances, for samples in which at least 3,000 long contigs (>10 kb) were recovered. This analysis is detailed in section 10 in https://github.com/HullUni-bioinformatics/MIG-Phylogenomics, DOI: 10.5281/zenodo.399475.
Result
Reference Genome Assembly and Annotation
We assembled one de novo reference genome for each of the five species, identified repeats and predicted genes on these assemblies, and then mapped data from other conspecific isolates to call variants. The isolates chosen for reference assembly (because of superior sample quality) were M. incognita W1, M. javanica VW4, M. arenaria HarA, M. enterolobii L30, and M. floridensis SJF1 (table 1). We excluded contamination from our assemblies using the TAGC “blobplot” pipeline (Kumar et al. 2013). Very few primary assembly scaffolds (<1% of the total span) were removed as likely contaminants. We tested several genome assembly approaches, and found that Platanus (Kajitani et al. 2014) produced the most complete assemblies for the apomictic species, based on representation of core eukaryotic genes (Parra et al. 2007) and expected genome sizes (table 1). For the automictic species M. floridensis, the Celera assembler (Myers et al. 2000) yielded the best assembly. Our assemblies are included in the archive associated with this publication (doi: 10.5281/zenodo.399475).
We also sequenced the genome of M. haplanaria isolate SJH1 (BioProject PRJNA340324). Meloidogyne haplanaria is outside of the MIG phylogenetically, and was only included in mitochondrial genome analyses (see below). We did not include the previously published M. incognita genome (2008) in our analyses as it was produced with older sequencing technologies, and older assembly algorithms, and without access to the raw sequencing data, we were unable to compare it appropriately (supplementary section 3, Supplementary Material online).
Our reference genomes ranged in span from 75 Mb (M. floridensis) to 164 Mb (M. arenaria) (table 1). The transposable element content of the genomes was characterized as described by Szitenberg et al. (2016) (supplementary section 4, Supplementary Material online). While estimates of the proportion of the genome occupied by mobile elements will be influenced by the accuracy of genome size estimates themselves, we found that Clade I RKN (MIG plus M. enterolobii) had a greater genome proportion of TEs than did M. hapla. However, there were no clear differences in transposable element content between the MIG species or in relation to reproductive mode, as also discussed in Szitenberg et al. (2016). We identified from 14,144 to 30,308 protein coding genes in the five reference genome assemblies (table 1). The number of genes we predicted in the M. incognita genome was higher than reported previously for this species (Abad et al. 2008), but for M. floridensis, we predicted a similar number of protein coding genes to that reported previously (Lunt et al. 2014). Protein coding gene sequences were recovered from other samples (listed in supplementary table S1, Supplementary Material online) using a mapping-and-assembly approach. Reads were mapped to the genes of the most closely related reference genome, based on their mitochondrial relationships (see below).
Divergent Genome Copies Are Common in MIG Genomes
Previous analyses of the M. incognita (2008) and M. floridensis (2014) genomes revealed that many loci were present as two divergent genomic copies, a pattern not observed in M. hapla (Opperman et al. 2008). These divergent copies have signatures of having been brought together by hybridization (Lunt et al. 2014). Our analyses of all five of the newly assembled genomes, using a range of methods, indicated that divergent gene copies are a common feature of all MIG genomes (figs. 1–3). Such divergent copies were not found in the Clade 2 automictic, diploid M. hapla. All the apomictic taxa show a leptokurtic peak of within-genome, nonself best BLAST matches with a mode at ∼97% identity (fig. 1). Notably while M. floridensis also had a peak of pairs at ∼97% identity, there were many fewer such loci in both our newly assembled M. floridensis SJF1 genome and the previous assembly (2014). To robustly define groups of orthologous genes across our species set, we clustered the protein sequences from the five reference genomes with OrthoFinder (Emms and Kelly 2015). We found that the default inflation parameter (1.5) merged what appeared to be distinct sets of orthologs, and so used a more restrictive inflation value of 2. A total of 29,315 orthology groups (OGs) were recovered. Within these, we selected the 4,675 OGs that contained from one to four gene copies in all Meloidogyne species. We filtered the OG predictions to remove likely artifacts of overclustering, and recent within-species duplications. We identified recent paralogs (in-paralogs; where sequences from the same species were close sisters) and retained only the less divergent of the two gene copies. To avoid incorporation of groups that contained more than one set of orthologs, we verified that the sequences from M. enterolobii were monophyletic. This was true of all OGs that had passed the first filter. This filtered data set contained 3,544 OGs, with one or two orthologs per species in most groups (fig. 2). Although a third divergent genomic copy appeared to be present for a few loci, our reanalyses of these data revealed that these were largely likely to be derived from in-paralogs or fragmented gene predictions (supplementary section 5, Supplementary Material online). For the three apomictic species, there were 1,632–1,920 orthology groups containing two copies, but many fewer were found in M. floridensis (477), consistent with the intragenomic analysis (fig. 1). Many groups contained two gene copies in more than one species, with 871–1,046 shared between apomictic species pairs and 225–246 shared between M. floridensis and the apomictic species. We selected a subset of 612 OGs which contained two copies in at least three of the five RKN species. We removed OGs containing possible gene prediction artifacts (i.e., where a species was represented by putative homoeologs with <20 bp overlap, as these may have derived from single, fragmented gene copy). While a few OGs included three gene copies from individual MIG species, the proportion of these “triples” was lower than reported previously (Abad et al. 2008; Lunt et al. 2014). These filters removed a further 75 OGs, leaving 533 OGs for further analysis.
Phylogenomics of MIG Species and Genomes
We collated sequences corresponding to the set of 533 OGs in which two gene copies existed in at least three of the MIG species from the nineteen whole genome sequenced isolates. We generated a supermatrix alignment where each MIG isolate and M. enterolobii were represented by two operational taxonomic units (OTUs), “A” and “B.” We randomly assigned the paired gene copies from each species to the corresponding “A” or “B” OTU. Maximum likelihood (ML) phylogenetic analysis yielded a well resolved tree with the MIG “A” and “B” OTUs as sister clades (fig. 3A). Within each of “A” and “B,” the several isolates of each species were robustly grouped together, and the branching order of these species was identical in the “A” and “B” subtrees, splitting the four taxa analyzed into two groups of two: M. floridensis with M. incognita, and M. javanica with M. arenaria. Based on the sequenced isolates, divergence within species between isolates was very low for M. incognita and M. javanica, and slightly larger for M. floridensis and M. arenaria (fig. 4A). The phylogenetic relationships between species for each genome copy were supported with maximal bootstrap support (black nodes in fig. 3A). We conducted two randomization analyses in which for each OG we shuffled homoeolog identity between OTUs “A” and “B” and reconstructed trees based on ML analysis of randomized supermatrices or on coalescent analysis of randomized gene trees (see Materials and Methods; supplementary section 1, Supplementary Material online). Both randomization tests supported all the same nodes (red nodes in fig. 3A). Coalescent phylogenomic analyses yielded the same topology (supplementary section 1, Supplementary Material online). The two M. enterolobii OTUs were resolved as monophyletic, outside the MIG clade, indicating that the divergent gene copies in M. enterolobii had an evolutionary origin distinct from the event that generated the MIG genome copies.
Simultaneous analysis of many loci in a phylogenomic reconstruction can obscure the presence of alternative phylogenetic histories for subsets of those loci. In order to test for this within our data set, ML trees were constructed for each OG and a pairwise weighted Robinson–Foulds distance matrix was computed between all gene tree pairs. Only a single cluster was recovered when embedding the distance matrix in 3 D space via metric MDS (fig. 3B). Attempts to highlight up to 10 distinct groups did not reveal any separation, supporting the existence of a single topology.
Mitochondrial coding sequences showed little divergence within and among MIG species (median identity across genes >99%, fig. 4B), and we were only able to resolve species relationships with complete mitochondrial genome or complete mitochondrial coding sequence data sets. Even with the mitochondrial genome data set there were just 78 out of 10,955 phylogenetically informative sites. The MIG were quite distant from the outgroups, with M. haplanaria closer than M. enterolobii. Despite the lack of strong phylogenetic signal, the mitochondrial phylogenies were congruent with the nuclear genome analyses, with the exception of the paraphyly of M. arenaria (supplementary section 6, Supplementary Material online).
Surprisingly, intraspecific sequence identity was lower in the nuclear genome (median values of 0.987–0.994; fig. 4A) than in the mitochondrial genome (identities were 1.0 with the exception of M. arenaria, where the median identity is 0.997; fig. 4B). Meloidogyne incognita and M. javanica had lower intraspecific diversity than did M. arenaria and M. floridensis (fig. 4A), even though fewer isolates were examined for the latter two species, and in accordance with branch lengths in figure 3A.
Evidence of Genetic Exchange between Homoeologs
Above we showed that in the MIG apomict species, there is strong evidence for presence of two distinct copies of many nuclear genes, and multiple analyses of the evolutionary histories of these copies were congruent. These findings suggest that the MIG species have two distinct genome copies generated by the same, major genome event and that these genomes came together or existed in an ancestor of the four MIG taxa analyzed. This event could explain the increased assembly span and elevated protein coding gene number observed in these species compared with the homozygous diploid species M. hapla. However, not all genes in the MIG taxa were present in divergent copies (fig. 1). In addition, while more genes in the apomict MIG taxa M. incognita, M. javanica, and M. arenaria were present in the assembly in divergent copies than as a single copy, in the automict M. floridensis only a small proportion of OGs contained two divergent copies (figs. 1 and 2).
If the MIG species’ genomes are the product of a process that resulted in, originally, two divergent copies of every gene, then loss of one copy of a subset of genes would need to have occurred. Stochastic deletion of second copies in a piecemeal fashion is one possible explanation. However, this appears a priori highly unlikely for the cytogenetically diploid M. floridensis. Another mechanism would be ameiotic, noncrossover recombination between homoeologous chromosomes resulting in two homozygotic copies on one side of the recombination, maintaining heterozygosity on the other side (apparent gene conversion), or a double strand break repair mechanism that would also result in gene conversion.
To identify and quantify such potential noncrossover recombination events, we used a sliding window BLAST approach within and between species (see Materials and Methods; supplementary section 2, Supplementary Material online). We identified locations in a query genome in which two overlapping windows in the query had the same top two best hits in the target genome, except that the best hit in the first window was the second best in the second, and vice versa. Each pair of such overlapping windows was counted as an event. The number of events per contig are in supplementary table S3, Supplementary Material online (supplementary section 2, Supplementary Material online) To distinguish gene conversion from reciprocal crossover, we checked that exchanging the query and the target did not yield the same locus. In the apomict species, all the detected events had the signature of noncrossover recombination and we did not detect reciprocal crossover. Gene conversion usually results in a short conversion tract (<100 bp) (LaFave and Sekelsky 2009). While we observed such short conversion tracts by manual inspection, further work is required to quantify this subtle signature throughout the genome and to determine the extent of its contribution to the observed genetic exchange between homoeologs.
For the meiotic M. floridensis, our assembly span was shorter than would be expected from a species carrying two distinct genome copies (table 1) and the genome appeared largely homozygous. A homozygous genome does not provide the contrast to detect crossover or recombination because such events will have no consequence on the sequence of either the identical genome copies. Within the apomicts, noncrossover recombination rates (supplementary table S3 and section 2, Supplementary Material online) are largely explained by phylogenetic distances between MIG species (Pearson’s r = 0.73), suggesting a steady rate of such recombination, although when M. floridensis is included this relationship weakens (r = 0.4) (supplementary section 2, Supplementary Material online).
Read Coverage Analyses and Estimation of Ploidy
Based on cytological examination, chromosomes in MIG apomicts are very small and their number varies within and between species (Triantaphyllou 1981, 1985). This chromosome number variation has led to the speculation that many of these species/isolates are aneuploid (hypotriploid). A previous assembly of the M. incognita genome also predicted some level of triploidy (Abad et al. 2008) but some doubt exists regarding the evidence presented there (supplementary section 5, Supplementary Material online). We assessed the likely ploidy represented by our assemblies through read coverage analysis, and the stoichiometry of the divergent gene copies. We identified ∼350 contig pairs, shared among the three apomictic MIG species, that had shared content of diverged gene pairs (only ∼100 contig pairs were shared between the MIG apomicts and M. enterolobii). We normalized the modal coverages for each contig by expressing them as a ratio of “alpha” to “beta” homeologues, where the copy with greater coverage was arbitrarily designated “alpha.” Thus, a coverage ratio of 1 indicates one copy each of alpha and beta, while a ratio of 2 would indicate two copies of each alpha segment to each beta segment, or three copies overall.
In all species, a bimodal Gaussian distribution fit the data better than did a unimodal Gaussian distribution (15 < Δ Residual error < 50) (fig. 5). This result conflicts with the presence of a simple stoichiometry (one copy each of alpha and beta) across the whole of the genomes of these apomicts, as some genomic regions fit a triploid model (alpha1, alpha2, beta). The alpha1 and alpha2 genomic copies are almost identical in sequence within a genome. Many of the same genomic regions are present in double stoichiometry across the MIG genomes (supplementary section 7, Supplementary Material online), suggesting that they were largely formed before speciation of the MIG apomicts.
The relative proportions of the genome assessed as being present in 1:1 stoichiometry versus 2:1 stoichiometry differed among species (fig. 5). Median coverage ratio distributions for all the isolates (supplementary section 7, Supplementary Material online, including a summary for species) showed that M. incognita had the highest fraction of the genome in the 2:1 pattern, and M. enterolobii the lowest. The proportion of the loci present as homoeologs in M. enterolobii and in the MIG apomicts was similar (fig. 1), but different orthology clusters were represented in the duplicated fraction of M. enterolobii. While more than 350 pairs of contigs carrying divergent copies among the MIG apomicts, only 112 of these genome segments were shared between the MIG apomicts and M. enterolobii. This finding supports the hypothesis, derived from phylogenetic analyses (fig. 3A), that the MIG and M. enterolobii represent independent origins of divergent genome copies.
Discussion
We have generated good-quality assemblies of the genomes of five agriculturally damaging species of plant-parasitic RKN. We also generated whole genome resequencing data for several isolates of the apomictic MIG species, and have analyzed our data for insights into the processes that have generated their peculiar genome structures.
We have presented four main findings. Firstly, the assemblies of the apomict MIG species (table 1) have spans and gene counts much greater than observed in the diploid M. hapla. Secondly, all MIG genomes harbor a substantial number of pairs of divergent gene copies (fig. 2). Similar divergent gene copies are absent from the diploid species M. hapla, and are reduced in number in the meiotic MIG species M. floridensis. Phylogenetic analyses of these divergent gene pairs (fig. 3) showed that they have congruent evolutionary histories, and that they likely arose from a single, major event before the speciation of the four MIG taxa analyzed.
Thirdly, analysis of read coverage of the apomictic genomes revealed that many of the duplicated segments had a 2:1 stoichiometry (fig. 5). Coupled with published karyotypic analysis of MIG species, which revealed greater than diploid chromosome numbers, these data are best reconciled with partial triploidy of the MIG species, with two copies of one genome and a single copy of the other. The two copies of one genome are very closely related (most genes are identical, and maximal divergence was 0.1%), whereas the other, divergent genome is ∼3% different in coding regions and much more divergent in noncoding regions.
Lastly, we found that not all genes were present as divergent duplicate copies, suggesting that there has been stochastic loss of heterozygosity in some divergent gene copies. Homozygous genes were frequently shared between MIG species, implying that these events have been ongoing through speciation. Exploring them, we found evidence for frequent recombination events, most likely noncrossover events, where discordant similarities mapped along assembly scaffolds showed replacement of one copy by its sister.
MIG Species Possess Divergent A and B Genomes
From these data, we suggest that the MIG species have complex, hypotriploid genomes, made up of divergent A and B homoeologous subgenomes. Arbitrarily, we designate the effectively diploid subgenome as A (and thus A1 and A2 copies), and the other genome as B.
The phylogenomic tree was well supported, and the data displayed no detectable conflicting signal (fig. 3). Randomly partitioning gene pairs into “A” and “B” sets generated congruent species phylogenies for each set (fig. 3). Genome assembly quality was not an issue, as the two isolates of M. floridensis grouped together robustly despite the more fragmented first M. floridensis draft genome (2014). Both of the divergent copies yield the same species phylogeny, and thus the uniting of the A and B components in a common progenitor must have occurred before the speciation of the MIG. Meloidogyne enterolobii is well separated from the MIG species (fig. 3), as has been reported previously (Tigano et al. 2005; Lunt 2008; Adams et al. 2009). Meloidogyne enterolobii is also an apomict, and also contains divergent genome copies, but the phylogeny showed that these copies arose as a different progenitor from the one at the base of the MIG. Meloidogyne enterolobii thus likely represents an independent origin of apomixis. Transitions to apomixis have happened on at least four separate occasions elsewhere in the genus Meloidogyne (Janssen et al. 2017). However, the relationship of cause and effect between the emergence of divergent genomes and the loss of meiosis remains to be deciphered.
The Origins of MIG A and B Genomes
We do not known what mechanisms led to the presence of the two divergent genomes in the MIG species. In addition, it is not known when in their evolutionary history apomixis arose. Below, we propose three broad mechanisms to account for the divergent A and B genomes found in the MIG (fig. 6).
Whole Genome Duplication
Whole genome duplication (WGD) in the ancestor of the MIG would create a tetraploid, which could be driven to hypotriploidy by gene loss. However, with no evidence of tetraploid regions in the MIG genomes, this model is unlikely. As we have identified signatures of noncrossover recombination homogenizing the A1 and A2 genomes and some homoeologous regions of the A and B genomes, it is hard to envisage a scenario where the products of WGD could diverge (to ∼3% in coding regions) in the presence of his strong homogenizing factor.
Frozen Allelic Sequence Divergence
A transition to apomixis could instantaneously freeze the original haploid chromosomes (alleles) of the parent as independently evolving entities. Organismal levels of frozen allelic sequence divergence (ASD) vary greatly across different taxa (Romiguier et al. 2014) and apomictic species can have very high ASD, due to lack of recombination. The resulting subgenome divergence under this model could be very similar to that of a hybrid genome. However, hypotriploidy is again not easy to explain in a frozen ASD model. The loci present in a triploid alpha1, alpha2, beta arrangement are largely shared by the MIG (supplementary section 7, Supplementary Material online) indicating that they most likely emerged in shared ancestral events. Explaining the pattern of variation in the hypotriploid MIG apomict genomes as ASD requires invocation of extensive additional segmental duplication before species divergence, followed by suppressed duplication since speciation. This complex scenario seems unlikely.
Hybridization
Hybridization between two diverse RKN strains followed by loss of meiosis is an attractive explanation for the genomic structure of MIG species since it accounts for all our observations in a single step. Hybridization allows contributing genomes to have diverged in allopatry (separate organisms, species, or regions) so that no recombination of any kind acted to homogenize the alpha and beta alleles. Upon hybridization, the gene sets are prediverged, and can be considered homoeologs. Hypotriploidy can be readily explained by an interspecific hybridization event involving one reduced (n) and one unreduced (2n) gamete followed by a rapid genomic turnover back toward (partial) diploidy. This is a process well described in the literature, especially with reference to interspecific hybridization (Mason and Pires 2015). We note that hybrid origins for intragenomic divergent alleles are very well documented in both animals and plants (Heliconius Genome Consortium 2012; Miller et al. 2012; Abbott et al. 2013; Marcussen et al. 2014; Vernot et al. 2016), whereas extreme ASD by asexual accumulation of mutations is controversial, with few if any clear examples. The bringing together of diverged chromosomes may be a mechanism contributing to the disruption of meiotic segregation and thus the origins of asexual modes of reproduction, and Janssen et al. (2017) indicate an association between cytological triploidy and apomixis in Meloidogyne.
We note that not all loci are present as divergent alpha and beta gene pairs, and the proportion of the genome with elevated coverage did not correspond to half of each assembly (fig. 2). The MIG species differ in their proportions of loci present as divergent pairs, indicative of independent change in each lineage.
The predictions of frozen ASD and hybridization models do diverge for the source populations. Hybridization predicts that divergent parental species exist (or existed) and that each have (had) genomes more related to one of the MIG homoeologous genomes than to the other. These parental lineages may not be common in agricultural environments and have not been recorded. The discovery of wild species with nonhybrid genomes, which were phylogenetic sisters to either the A or B MIG genomes, would be key to our understanding of MIG origins.
Another characteristic of MIG species that likely contributes to their genome complement and evolution is the chromosome structure of this group. As is true throughout the phylum Nematoda, Meloidogyne chromosomes are holocentric. In addition, based on cytological examination, chromosomes in MIG apomicts are very small and their number varies within and between species (Triantaphyllou 1981, 1985). For most isolates of M. incognita, the chromosome numbers are between 41 and 48 and hence they are considered to be hypotryploid. Considerable polymorphism between isolates has also been observed for relative size distribution of chromosomes (Triantaphyllou 1981). These features suggest that chromosome missegregation, breakage, translocation, inversions, or similar events may have contributed to the genome copy variations seen in the modern MIG species. Cytogenetic studies using modern techniques would be interesting and necessary to explore these possibilities.
Meloidogyne floridensis is a sibling not a parent of M. incognita
In contrast to the ameiotic species, M. floridensis appears to be effectively diploid from both cytological (Handoo et al. 2004) and genomic analyses (this study). We show (fig. 2) that the process of homoeolog loss is almost complete in M. floridensis, with the telling exception of the gene pairs we use in the phylogeny to reveal its more complex history. Despite sharing the same hybrid origins (fig. 3A), M. floridensis has regained or maintained the ability to carry out meiosis in a form of automixis. How it was able to maintain a meiotic reproductive system, during this turbulent and dynamic period of genome reorganization, and whether the more complete rediploidization is a cause or a consequence of the retention of meiosis, will require further study.
Lunt et al. (2014) hypothesized, based on shared homoeologous genes and difference in reproductive mode, that M. floridensis was a parent of the hybrid MIG apomicts. Our phylogenomic analyses, using much richer data, reject this hypothesis. Instead, M. floridensis is a close relative of M. incognita, is derived from the same parents, and nested within the apomict MIG species. Handoo et al. (2004) showed that M. floridensis is cytologically very different from the MIG apomicts, and enters meiosis during oogenesis. Eighteen chromosome pairs were observed, and bivalents were present before nuclear division and polar body formation, though no second meiotic division was observed. This strongly suggests that meiotic recombination occurs in M. floridensis, as is usual in automixis. The Clade 2 RKN M. hapla also reproduces by automixis (Liu et al. 2007) and in this species meiotic parthenogenesis results in rapid homozygosity (as meiosis II is not completed and diploidy is restored by rejoining of sister chromosomes). If meiosis in M. floridensis involves endo-reduplication and failure to complete meiosis II as suggested by Handoo et al. (2004) this would explain the substantial loss of heterozygosity compared with its closest relative, M. incognita.
Additionally, Lunt et al. (2014) proposed a complex double-hybridization to explain the relationship between the M. floridensis genome and the published M. incognita genome (Abad et al. 2008). Sections of the M. incognita genome were reported to be present in three diverged copies (Abad et al. 2008), implying that three separate parental genomes had been brought together from, we inferred, two hybridization events. There have been many suggestions in the literature that M. incognita is triploid or hypotriploid, based on cytological data, and the observation of triploidy indicated by three diverged copies of some loci by Abad et al. (2008) was biologically plausible. We also found some instances of three diverged copies in our initial assemblies, and M. arenaria has several loci present as three diverged copies (fig. 2). However careful re-examination of these (supplementary results, Supplementary Material online) revealed that they were all gene prediction artefacts, problematic orthology groups (merging two genes, or representing one fragmented ortholog as two separate copies), or paralogous gene family members. We find no convincing evidence for the presence of three homoeologs in any of our sequenced MIG genomes, or in our reanalyses of the published M. incognita or M. floridensis genomes. Although we discuss earlier that MIG species do indeed have hypotriploid genomes, none contains a third divergent genome copy, but rather a second copy of one of the two homoeologs found in all other MIG species.
Genome Size and Gene Number in the MIG
The genome sizes and gene numbers in the MIG species do not fully correlate with predictions of hypotriploidy. Many current genome assemblies come from organisms that are inbred, or naturally homozygous, leading to alleles being merged in the genome assembly, and the production of a collapsed, haploid assembly from a diploid organism. Assembly of the genomes of organisms that have greater divergence between alleles may result in a partially uncollapsed, near-diploid genome estimate, where some of the genome generates a collapsed haploid sequence, while divergent segments are independently assembled. If we assume, as seems likely from our analyses (fig. 2), that homoeologs derived from A and B genomes are present in the apomicts for about half of the genomic regions, and that they are represented separately in the assembly, then our reference genomes (table 1) will represent partially diploid assemblies, and be expected to be larger than a comparable haploid assembly. The 53-Mb genome assembly of M. hapla is high-quality and contiguous, and matches closely to experimental estimates of the haploid genome size (50 Mb) (Opperman et al. 2008). The sequenced M. hapla was highly homozygous, and the assembly is expected to contain no or very few regions where haploid segments have assembled independently due to sequence divergence. If M. hapla represents a base genome size for Meloidogyne, we would expect the assemblies from the MIG apomicts to be of the order of 75 Mb. However, these species generate assemblies that are 122–163 Mb. Genome size is a biological attribute and varies between even closely related species due to segmental duplications, expansions of repetitive elements and transposons, and long-term biases in insertion versus deletion. Szitenberg et al. (2016) showed that the MIG have an increased content of transposable elements compared with M. hapla (supplementary section 4, Supplementary Material online). The larger MIG apomict genome assemblies are thus likely a reflection of this biological variation in genome size. The M. floridensis assembly is ∼75 Mb in span but largely homozygous, and is, as expected, between M. hapla and the MIG apomicts in size. The peculiar structure of the MIG apomictic genomes makes it very challenging to accurately estimate either total genome size, or the proportion of the genome present at two and three copies. More accurate estimation of the proportion of triploid genome, and to what extent and for which loci this differs between isolates, will require a much more contiguous genome assembly as a framework for read-mapping based quantification.
Protein coding gene number estimates are influenced by annotation procedure and genome contiguity in addition to real, biological, sources of variation. Poor genome quality can lead to gene number inflation by fragmentation of predicted coding sequences, or gene number reduction by poor assembly. We found between 14,144 protein coding sequences in M. floridensis and up to 30,308 (M. arenaria) in the apomicts (table 1). The homozygous genome of M. hapla contains ∼14,700 protein coding genes (Opperman et al. 2008), very similar to the number predicted in the mostly homozygous M. floridensis. The elevated gene numbers in the MIG apomicts likely reflect the independent prediction of genes in the homoeologous segments.
Homoeolog Loss and the Evolution of MIG Genomes
Gene conversion is an outcome of noncrossover recombination between chromosomes and is a common and important force influencing genome structure and diversity (Chen et al. 2007; Pessia et al. 2012; Korunes and Noor 2017). Gene conversion is associated with double strand break repair and involves the replacement of one, typically allelic, sequence by another such that they become identical. It is common during meiosis, which is initiated by a double strand break, but additionally occurs during mitosis (Chen et al. 2007). We found multiple lines of evidence suggesting that this may be an important process in shaping MIG genomes.
We have provided evidence that a part of the genome of the apomict Meloidogyne is triploid containing alpha1, alpha2, and beta copies of homoeologous segments. The apomict alpha1 and alpha2 regions contain the same protein coding loci in different species (supplementary section 7, Supplementary Material online), strongly suggesting that they arose in a common event in the MIG ancestor. The alternative, that three independent recent duplications occurred involving exactly the same genomic regions, is a less parsimonious scenario. Alpha1 and alpha2 are almost identical in sequence and they do not independently assemble, map, or resolve phylogenetically to indicate sequence divergence. This is challenging for the evolutionary scenario shown in the phylogeny (fig. 3A). If the alpha1 and alpha2 copies derive from the two haploid copies present in the alpha parental species, and have been evolving independently since that event, then alpha1 and alpha2 should be at least as divergent as the alpha genomes in different MIG species. The MIG species are clearly distinct with ∼2% protein-coding sequence divergence between them in each genome copy, and yet within a species the alpha1 and alpha2 copies have remained essentially identical over the same time span. However, homogenization of sequence copies is known in other genomes. Concerted evolution is a type of gene conversion, found in most organisms, which operates repeatedly over large genomic regions (Liao 1999) to maintain their sequence identity. It has been well-characterized in many systems including the homogenization of eukaryotic rRNA repeats (Eickbush and Eickbush 2007) and the palindromes of the male specific region on human and chimp Y chromosomes (Rozen et al. 2003). Concerted evolution will be more frequent between highly similar sequences, such as the original alpha1 and alpha2 alleles, than between more divergent sequences, such as alpha and beta homoeologs (Chen et al. 2007), a scenario supported by our data.
Different apomict species have different proportions of alpha and beta homoeologs remaining in their genomes (fig. 2). One mechanism by which this could occur is the deletion of one of the divergent genomic copies. Deletions to restore diploidy are known in other polyploids (Schnable et al. 2009; Lien et al. 2016). Another mechanism by which homoeologs could be lost, but without deletion of any genomic region, is gene conversion. This process could homogenize A and B genomic copies, leading to a reduction of gene clusters containing two copies (fig. 2) with no way to regain these divergent sequences in an apomictic species. This process would likely be stochastic, occurring differently in each species lineage, and generating variability between the MIG apomicts. These outcomes are observed and in addition we see sequence homogenization. When the sequence of homologous contigs containing divergent alpha and beta gene pairs are analyzed, we can directly observe the action of noncrossover recombination between those sequences. Although converted sequences are homozygous, noncrossover recombination can be a mechanism that also increases diversity among mitotically reproducing organisms. For example, the hybrid plant-pathogenic oomycete Phytophthora sojae has a high rate of mitotic gene conversion between homoeologous genes, and different lineages of the pathogen have generated variation in avirulence by gene conversion of different genomic regions (Chamnanpunt et al. 2001).
Although M. floridensis possessed the same alpha and beta genomes, with the same evolutionary history, its genome is very different in homoeolog content to that of the apomicts. Only a minor component of the M. floridensis genome is still present in divergent alpha and beta homoeologous copies (fig. 2) despite the fact that it shares genomic origins with the other MIG. The reduced genome span and count of protein coding loci in M. floridensis compared with the apomict MIG (table 1) could also be a product of coassembly of homogenized homoeologous loci. Extensive noncrossover recombination may provide a mechanism to explain these results. Since gene conversion is initiated by double strand breaks, which are associated with meiotic recombination, the automict M. floridensis may have an increased rate of gene conversion compared with the apomicts. This is very challenging to study since the loss of homoeologs itself removes the variation necessary to measure this process, and will likely require more data from different M. floridensis lineages. Meiosis is a diverse and powerful process that provides other mechanisms than gene conversion however by which to homogenize homologous sequences. Meloidogyne hapla, also an automict, homogenizes its genome rapidly by the rejoining of sister chromosomes during meiosis (Liu et al. 2007). Although superficially similar to the automixis in M. hapla, due to the existence of a small heterozygote fraction, the exact nature of the meiotic division of M. floridensis is still unclear (Handoo et al. 2004). If, as suggested, endoduplication of the genome is involved then this could also be a mechanism for homogenization and loss of homoeologs not requiring high levels of gene conversion. Genetic pedigree and cytological studies will be informative in determining the exact nature of automixis in M. floridensis, but at present, we cannot distinguish the driving forces for its genome-wide homoeolog loss.
Whether the differences in retention of ancestral homoeologous variants may underlie differences in host range and pathogenicity between species is a challenging question. The genome-wide differential retention of alpha and beta genomes, as well as the presence of alpha1 and alpha2 copies, could be functionally very significant for these nematodes. The complexity of MIG genome structure, as well as the diverse genetic mechanisms that may be driving this, could be an important source of adaptive variation. These processes could operate to generate diversity between the asexual mitotic parthenogen species, but also by putting the meiotic parthenogen M. floridensis on a separate genomic trajectory. Understanding the nature and rate of adaptive evolution in MIG species, as they compete with the defenses produced by plant resistance genes, will be an important direction for future research.
Low Intraspecific Divergence Implies Recent Global Colonization
The most economically important MIG species are globally distributed in agricultural land across the tropics, and whether they are recent immigrants or more anciently endemic in those locations has been unclear (Trudgill and Blok 2001). Comparisons of mitochondrial genomes have revealed closely related haplotypes globally distributed (Janssen et al. 2016), suggesting that they result from recent migrations associated with modern agriculture. Our sequencing of eight isolates of M. incognita and five of M. javanica from multiple continents has also shown that there is remarkably little genetic variation between isolates, and that this is true of both nuclear or mitochondrial genomes (fig. 3A; supplementary section 6, Supplementary Material online). The lack of intraspecific sequence diversity, even between samples taken from different continents, strongly suggests that agricultural environments do not contain indigenous populations but rather isolates that have only expanded with modern agriculture in the last few hundred years.
Nuclear sequence diversity exceeded mitochondrial diversity in comparisons between the MIG species (fig. 4). Although this is not typical of animals, the mitochondrial mutation rate is known to vary greatly among lineages (Nabholz et al. 2009; Lavrov and Pett 2016). It is, however, difficult to conclude that a low mutation rate alone is the cause of this extreme interspecific mitochondrial sequence identity since comparisons between the MIG and outgroups show much more variation in mtDNA than nuclear genomes (fig. 4). Between these closely related species the extremely high AT-content in the mitochondrial genome (84% for all MIG) may play a role. With mostly just two nucleotide character states out of four represented in the sequence, and most segregating intraspecific polymorphism being third codon position synonymous changes, saturation and homoplasy could be prevalent, leading to underestimation of the true divergence.
More genetic diversity was present within M. arenaria and within M. floridensis than the other MIG species examined, and other studies have also indicated that M. arenaria contains considerable diversity (Blok et al. 1997; Adam et al. 2005; Carneiro et al. 2008). Our sampling for these two taxa was limited, and in order to understand the structure and level of diversity in the MIG species extensive geographic sampling for population genomics will be needed. Many additional RKN species are likely to fall within the MIG phylogenetic cluster based on classical and molecular characterization (Holterman et al. 2009; Pagan et al. 2015). As we have shown earlier, phylogenomics of the nuclear genome is able to separate closely related species, and even low coverage sequencing without genome assembly will, given appropriate analyses, be able to place species robustly within this phylogeny. In turn, phylogenetic understanding can be transformed into insights into global dispersal patterns, adaptive evolution and plant host specialization.
Conclusions
We have sequenced the genomes of a diverse set of apomicts from the Meloidogyne incognita group of species and shown that the divergent genome copies, most likely from a hybridization event, predates the formation of these species. This group therefore shares the same parental species and evolutionary history. The MIG genomes are hypotriploid and subject to the action of noncrossover recombination both in homogenizing the divergent loci, but also in generating distinctiveness between the genomes of different species. Noncrossover recombination is especially visible between the MIG genomes because of the divergent genomic copies present, and broader sampling of genomes will speak to its phenotypic importance in this important system. Our study demonstrates the power of comparative genomics to untangle biologically complex species histories and the data will provide an opportunity to study the complex evolution of sex and genome structure.
Supplementary Material
Supplementary data are available at Genome Biology and Evolution online.
Supplementary Material
Acknowledgments
This work was funded by the Natural Environment Research Council through research grant NE/J011355/1 awarded to D.H.L. and M.B. A.S. is funded through the “ICA in Israel” grant 031606a and the Israel Ministry of Science. D.R.L. was funded through a James Hutton Institute postgraduate fellowship. We thank colleagues in Edinburgh Genomics and The Liverpool Centre for Genome Research for expert sequencing support. We thank Philippe Castagnone-Sereno, Etienne Danchin, Bob Robbins, Tom Prior for additional Meloidogyne samples, discussions, and advice. Christoph Hahn, Georgios Koutsovoulos, and Sujai Kumar gave valuable bioinformatics advice. The University of Hull VIPER High Performance Compute cluster supported our analyses. We thank Africa Gómez for informative discussions and close reading of the article.
Literature Cited
- Abad P, et al. 2008. Genome sequence of the metazoan plant-parasitic nematode Meloidogyne incognita. Nat Biotechnol. 26(8):909–915. [DOI] [PubMed] [Google Scholar]
- Abbott R, et al. 2013. Hybridization and speciation. J Evol Biol. 26(2):229–246. [DOI] [PubMed] [Google Scholar]
- Adam MAM, Phillips MS, Blok VC.. 2005. Identification of Meloidogyne spp. from North East Libya and comparison of their inter- and intra-specific genetic variation using RAPDs. Nematology 7(4):599–609. [Google Scholar]
- Adams BJ, Dillman AR, Finlinson C.. 2009. Molecular taxonomy and phylogeny In: Perry RN, Moens M, Starr JL, editors. Root-knot nematodes. Wallingford: CABI: p. 119–138. [Google Scholar]
- Bebber DP, Holmes T, Gurr SJ.. 2014. The global spread of crop pests and pathogens. Glob Ecol Biogeogr. 23(12):1398–1407. [Google Scholar]
- Blanc-Mathieu R, et al. 2017. Hybridization and polyploidy enable genomic plasticity without sex in the most devastating plant-parasitic nematodes. PLoS Genet. 13(6):e1006777. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blok VC, Phillips MS, McNicol JW.. 1997. Genetic variation in tropical Meloidogyne spp. as shown by RAPDs. Fundam Appl. 20:127–133. [Google Scholar]
- Blok VC, Powers TO.. 2009. Biochemical and molecular identification In: Perry RN, Moens M, Starr JL, editors. Root-knot nematodes. Wallingford: CABI: p. 98–118. [Google Scholar]
- Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T.. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15):1972–1973. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Carneiro RMDG, et al. 2008. Diversity of Meloidogyne arenaria using morphological, cytological and molecular approaches. Nematology 10(6):819–834. [Google Scholar]
- Carneiro RMDG, Tigano MS, Randig O, Almeida MRA, Sarah J-L.. 2004. Identification and genetic diversity of Meloidogyne spp. (Tylenchida: Meloidogynidae) on coffee from Brazil, Central America and Hawaii. Nematology 6(2):287–298. [Google Scholar]
- Chamnanpunt J, Shan WX, Tyler BM.. 2001. High frequency mitotic gene conversion in genetic hybrids of the oomycete Phytophthora sojae. Proc Natl Acad Sci U S A. 98(25):14530–14535. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen J-M, Cooper DN, Chuzhanova N, Férec C, Patrinos GP.. 2007. Gene conversion: mechanisms, evolution and human disease. Nat Rev Genet. 8(10):762–775. [DOI] [PubMed] [Google Scholar]
- Chevreux B, Wetter T, Suhai S, Others. 1999. Genome sequence assembly using trace signals and additional sequence information In: German Conference on Bioinformatics; Vol. 99; Heidelberg; p. 45–56. [Google Scholar]
- De Ley IT, et al. 2002. Phylogenetic analyses of Meloidogyne small subunit rDNA. J Nematol. 34(4):319–327. [PMC free article] [PubMed] [Google Scholar]
- Eickbush TH, Eickbush DG.. 2007. Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics 175(2):477–485. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Eisenback JD, Triantaphyllou HH.. 1991. Root-knot nematodes: Meloidogyne species and races. Man Agric Nematol. 191–274. [Google Scholar]
- Emms DM, Kelly S.. 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16:157.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Esbenshade PR, Triantaphyllou AC.. 1990. Isozyme phenotypes for the identification of Meloidogyne species. J Nematol. 22(1):10–15. [PMC free article] [PubMed] [Google Scholar]
- Garrison E, Marth G.. 2012. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio.GN]. Available from: https://arxiv.org/abs/1207.3907.
- Gori K, Suchan T, Alvarez N, Goldman N, Dessimoz C.. 2016. Clustering genes of common evolutionary history. Mol Biol Evol. 33(6):1590–1605. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hahn C, Bachmann L, Chevreux B.. 2013. Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads–a baiting and iterative mapping approach. Nucleic Acids Res. 41(13):e129–e129. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Handoo ZA, et al. 2004. Morphological, molecular, and differential-host characterization of Meloidogyne floridensis n. sp. (Nematoda: Meloidogynidae), a root-knot nematode parasitizing peach in Florida. J Nematol. 36(1):20–35. [PMC free article] [PubMed] [Google Scholar]
- Hartman KM, Sasser JN.. 1985. Identification of Meloidogyne species on the basis of differential host test and perineal-pattern morphology In: Barker KR, Carter CC, Sasser JN, editors. An advanced treatise on Meloidogyne, Vol. II. Methodology. Raleigh: Department of Plant Pathology, North Carolina State University; p. 69–77. [Google Scholar]
- Heliconius Genome Consortium. 2012. Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature 487:94–98. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Holterman M, et al. 2009. Small subunit rDNA-based phylogeny of the Tylenchida sheds light on relationships among some high-impact plant-parasitic nematodes and the evolution of plant feeding. Phytopathology 99(3):227–235. [DOI] [PubMed] [Google Scholar]
- Huerta-Cepas J, Dopazo J, Gabaldón T.. 2010. ETE: a python environment for tree exploration. BMC Bioinformatics 11:24.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hugall A, Moritz C, Stanton J, Wolstenholme DR.. 1994. Low, but strongly structured mitochondrial DNA diversity in root knot nematodes (Meloidogyne). Genetics 136(3):903–912. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hunt DJ, Handoo ZA. Others 2009. Taxonomy, identification and principal species. Root-Knot Nematodes 1:55–88. [Google Scholar]
- Janssen T, Karssen G, Topalović O, Coyne D, Bert W.. 2017. Integrative taxonomy of root-knot nematodes reveals multiple independent origins of mitotic parthenogenesis. PLoS One 12(3):e0172190.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Janssen T, Karssen G, Verhaeven M, Coyne D, Bert W.. 2016. Mitochondrial coding genome analysis of tropical root-knot nematodes (Meloidogyne) supports haplotype based diagnostics and reveals evidence of recent reticulate evolution. Sci Rep. 6:22591.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kajitani R, et al. 2014. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Res. 24(8):1384–1395. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Katoh K, Standley DM.. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 30(4):772–780. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Korunes KL, Noor MAF.. 2017. Gene conversion and linkage: effects on genome evolution and speciation. Mol Ecol. 26(1):351–364. [DOI] [PubMed] [Google Scholar]
- Kumar S, Jones M, Koutsovoulos G, Clarke M, Blaxter M.. 2013. Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Front Genet. 4:237.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- LaFave MC, Sekelsky J.. 2009. Mitotic recombination: why? When? How? Where? PLoS Genet. 5(3):e1000411.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lavrov DV, Pett W.. 2016. Animal mitochondrial DNA as we do not know it: mt-genome organization and evolution in nonbilaterian lineages. Genome Biol Evol. 8(9):2896–2913. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Li H, Durbin R.. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liao D. 1999. Concerted evolution: molecular mechanism and biological implications. Am J Hum Genet. 64(1):24–30. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lien S, et al. 2016. The Atlantic salmon genome provides insights into rediploidization. Nature 533(7602):200–205. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu QL, Thomas VP, Williamson VM.. 2007. Meiotic parthenogenesis in a root-knot nematode results in rapid genomic homozygosity. Genetics 176(3):1483–1490. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lunt DH. 2008. Genetic tests of ancient asexuality in root knot nematodes reveal recent hybrid origins. BMC Evol Biol. 8:194.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Lunt DH, Kumar S, Koutsovoulos G, Blaxter ML.. 2014. The complex hybrid origins of the root knot nematodes revealed through comparative genomics. PeerJ 2:e356.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Marcussen T, et al. 2014. Ancient hybridizations among the ancestral genomes of bread wheat. Science 345(6194):1250092. [DOI] [PubMed] [Google Scholar]
- Mason AS, Pires JC.. 2015. Unreduced gametes: meiotic mishap or evolutionary mechanism? Trends Genet. 31(1):5–10. [DOI] [PubMed] [Google Scholar]
- Miller W, et al. 2012. Polar and brown bear genomes reveal ancient admixture and demographic footprints of past climate change. Proc Natl Acad Sci U S A. 109:E2382–E2390. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Mirarab S, et al. 2014. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17):i541–i548. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Moens M, Perry RN, Starr JL.. 2009. Meloidogyne species – a diverse group of novel and important plant parasites In: Perry RN, Moens M, Starr JL editors. Root-knot nematodes. Wallingford: CABI: p. 1–17. [Google Scholar]
- Myers EW, et al. 2000. A whole-genome assembly of Drosophila. Science 287(5461):2196–2204. [DOI] [PubMed] [Google Scholar]
- Nabholz B, Glémin S, Galtier N.. 2009. The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals. BMC Evol Biol. 9(1):54.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Opperman CH, et al. 2008. Sequence and genetic map of Meloidogyne hapla: a compact nematode genome for plant parasitism. Proc Natl Acad Sci U S A. 105(39):14802–14807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pagan C, et al. 2015. Mitochondrial haplotype-based identification of ethanol-preserved root-knot nematodes from Africa. Phytopathology 105(3):350–357. [DOI] [PubMed] [Google Scholar]
- Parra G, Bradnam K, Korf I.. 2007. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23(9):1061–1067. [DOI] [PubMed] [Google Scholar]
- Pessia E, et al. 2012. Evidence for widespread GC-biased gene conversion in eukaryotes. Genome Biol Evol. 4(7):675–682. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Robinson DF, Foulds LR.. 1981. Comparison of phylogenetic trees. Math Biosci. 53(1-2):131–147. [Google Scholar]
- Romiguier J, et al. 2014. Comparative population genomics in animals uncovers the determinants of genetic diversity. Nature 515(7526):261–263. [DOI] [PubMed] [Google Scholar]
- Rozen S, et al. 2003. Abundant gene conversion between arms of palindromes in human and ape Y chromosomes. Nature 423(6942):873–876. [DOI] [PubMed] [Google Scholar]
- Schnable PS, et al. 2009. The B73 maize genome: complexity, diversity, and dynamics. Science 326(5956):1112–1115. [DOI] [PubMed] [Google Scholar]
- Slater GSC, Birney E.. 2005. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6:31.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Stamatakis A. 2014. RAxML Version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30(9):1312–1313. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szitenberg A, et al. 2016. Genetic drift, not life history or RNAi, determine long-term evolution of transposable elements. Genome Biol Evol. 8(9):2964–2978. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Szitenberg A, John M, Blaxter ML, Lunt DH.. 2015. ReproPhylo: an environment for reproducible phylogenomics. PLoS Comput Biol. 11(9):e1004447.. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tigano MS, Carneiro RMDG, Jeyaprakash A, Dickson DW, Adams BJ.. 2005. Phylogeny of Meloidogyne spp. based on 18S rDNA and intergenic region of mitochondrial DNA sequences. Nematology. 7(6):851–862. [Google Scholar]
- Triantaphyllou AC. 1963. Polyploidy and parthenogenesis in the root-knot nematode Meloidogyne arenaria. J Morphol. 113:489–499. [DOI] [PubMed] [Google Scholar]
- Triantaphyllou AC. 1981. Oogenesis and the chromosomes of the parthenogenic root-knot nematode Meloidogyne incognita. J Nematol. 13(2):95–104. [PMC free article] [PubMed] [Google Scholar]
- Triantaphyllou AC. 1985. Cytogenetics, cytotaxonomy and phylogeny of root-knot nematodes In: Sasser JN, editor. An advanced treatise on Meloidogyne, Vol. I. Biology and control. Raleigh: North Carolina State University. [Google Scholar]
- Trudgill DL, Blok VC.. 2001. Apomictic, polyphagous root-knot nematodes: exceptionally successful and damaging biotrophic root pathogens. Annu Rev Phytopathol. 39:53–77. [DOI] [PubMed] [Google Scholar]
- Vernot B, et al. 2016. Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352(6282):235–239. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Williamson VM, Roberts PA.. 2009. Mechanisms and genetics of resistance In: Perry RN, Moens M, Starr JL, editors. Root-knot nematodes. Wallingford: CABI: p. 301–325. [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.