Comparative Analysis between Homoeologous Genome Segments of Brassica napus and Its Progenitor Species Reveals Extensive Sequence-Level Divergence

Foo Cheung; Martin Trick; Nizar Drou; Yong Pyo Lim; Jee-Young Park; Soo-Jin Kwon; Jin-A Kim; Rod Scott; J Chris Pires; Andrew H Paterson; Chris Town; Ian Bancroft

doi:10.1105/tpc.108.060376

2009 Jul;21(7):1912–1928.

PMCID: PMC2729604 PMID: 19602626

Abstract

Homoeologous regions of Brassica genomes were analyzed at the sequence level. These represent segments of the Brassica A genome as found in Brassica rapa and Brassica napus and the corresponding segments of the Brassica C genome as found in Brassica oleracea and B. napus. Analysis of synonymous base substitution rates within modeled genes revealed a relatively broad range of times (0.12 to 1.37 million years ago) since the divergence of orthologous genome segments as represented in B. napus and the diploid species. Similar, and consistent, ranges were also identified for single nucleotide polymorphism and insertion-deletion variation. Genes conserved across the Brassica genomes and the homoeologous segments of the genome of Arabidopsis thaliana showed almost perfect collinearity. Numerous examples of apparent transduplication of gene fragments, as previously reported in B. oleracea, were observed in B. rapa and B. napus, indicating that this phenomenon is widespread in Brassica species. In the majority of the regions studied, the C genome segments were expanded in size relative to their A genome counterparts. The considerable variation that we observed, even between the different versions of the same Brassica genome, for gene fragments and annotated putative genes suggests that the concept of the pan-genome might be particularly appropriate when considering Brassica genomes.

INTRODUCTION

Polyploidy is widespread in angiosperms and is thought to have been a predominant factor in the evolution and success of these species (Leitch and Bennett, 1997; Wendel, 2000). Understanding the mechanisms involved in the structural and functional evolution of genomes during the process of diploidization following polyploidy is of major importance to plant biology. The availability of the complete genome sequence for Arabidopsis thaliana (Arabidopsis Genome Initiative, 2000) has enabled the outcomes of the diploidization process to be analyzed not only at the sequence level directly within the genome of Arabidopsis by the identification of related genome segments (Blanc et al., 2000; Paterson et al., 2000), but also in relation to sequences from distantly related species, including tomato (Solanum lycopersicum; Ku et al., 2000) and rice (Oryza sativa; Mayer et al., 2001). However, studies involving very ancient genome duplication and speciation events, such as those represented in Arabidopsis (the most recent of which is termed the alpha genome duplication; Bowers et al., 2003), give little insight into the mechanisms involved.

The cultivated Brassica species are the group of crops most closely related to Arabidopsis, all of which are members of the Brassiceae tribe within the Brassicaceae family (Warwick and Black, 1991). In contrast with tomato and rice, the lineages of which diverged from that of Arabidopsis ∼150 and 200 million years ago (Mya), respectively (Yang et al., 1999; Wolfe et al., 1989), the Brassica and Arabidopsis lineages diverged only ∼20 Mya (Yang et al., 1999). The lineages of the species Brassica rapa and Brassica oleracea, which contain the Brassica A and C genomes, respectively, have been estimated to have diverged ∼3.7 Mya (Inaba and Nishio, 2002). Brassica napus is an allopolyploid, arising from the hybridization of A and C genome progenitors (U, 1935), probably during human cultivation (i.e., <10,000 years ago). Genetic mapping confirmed that the progenitor A and C genomes are essentially intact in B. napus and have not been rearranged (Parkin et al., 1995). Therefore, the Brassica species provide an opportunity to study the evolution of genome structure over a wide range of timescales. However, representatives of the precise ancestors of natural B. napus have yet to be identified, and the breeding of rapeseed is likely to have included crosses that could have introduced into the oilseed rape germplasm allelic variation from additional sources, such as B. rapa (Qiu et al., 2006).

Comparative studies conducted at the level of genetic linkage maps revealed extensive duplication within Brassica genomes (Kowalski et al., 1994; Lagercrantz and Lydiate, 1996), and segmental relationships were identified indicative of a mixture of single, duplicated, and triplicated genome segments relative to Arabidopsis (Lan et al., 2000; Schmidt et al., 2001; Babula et al., 2003; Lukens et al., 2003; Parkin et al., 2003). More recently, it was determined using a cytogenetic approach that a distinctive feature of the Brassiceae tribe is that its members contain extensively triplicated genomes (Lysak et al., 2005). Around the same time, a study based upon linkage mapping using sequenced restriction fragment length polymorphism markers demonstrated that 21 segments of the genome of Arabidopsis, representing almost its entirety, could be replicated and rearranged to generate the structure of the B. napus genome (Parkin et al., 2005). The majority of the Arabidopsis genome (11 segments) could each be aligned to six segments of the B. napus genome, indicative of triplication in the genomes of both progenitor species. However, there were numerous examples of segments having been detected in fewer than six copies, and some examples of more than six segments having been identified. A broader study across the Brassicaceae has identified 24 conserved chromosomal blocks, relating them to a proposed ancestral karyotype (n = 8) (Schranz et al., 2006). Although the most likely explanation for the structure of the Brassica genomes is paleohexaploidy followed by segmental loss and limited segmental duplication, other explanations are possible (Lukens et al., 2004), including paleotetraploidy followed by more extensive segmental duplication. Where analyses have been conducted on targeted regions of the genomes of B. oleracea, B. rapa, and B. napus using physical mapping techniques, the results have been consistent with the fundamentally triplicated nature of the diploid Brassica genomes (O'Neill and Bancroft, 2000; Park et al., 2005; Rana et al., 2004).

Two sequence-level studies, one in B. oleracea (Town et al., 2006) and one in B. rapa (Yang et al., 2006), have clarified aspects of genome evolution and organization in Brassica by taking a comparative approach using homoeologous regions of the genome of Arabidopsis. Between the studies, 11 Brassica genome segments were analyzed, totaling ∼2.8 Mb of contiguous sequences. The overall mean synonymous base substitution rate between Brassica genes and their Arabidopsis orthologs can be calculated as 0.51 (with a range for the individual segments of 0.46 to 0.58). Using the commonly adopted estimate of mutational rate of 1.5 × 10⁻⁸ synonymous substitutions per site per year (Koch et al., 2000), the estimate of the time that the Arabidopsis and Brassica lineages diverged can be refined to ∼17.0 Mya. The overall mean synonymous base substitution rate between genes in the related sets of Brassica genome segments can be calculated as 0.43 (with a range for each pair of segments of 0.36 to 0.57), allowing refinement of the estimate of the time that the replicated Brassica subgenomes diverged to ∼14.3 Mya.
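The arithmetic behind these two estimates can be sketched as follows, assuming the usual relationship T = Ks / (2r), in which the factor of two reflects substitutions accumulating independently along both diverging lineages. The function name is illustrative only and is not taken from the studies cited.

```python
# Mutation rate quoted in the text (Koch et al., 2000):
# synonymous substitutions per site per year.
RATE = 1.5e-8


def divergence_time_mya(ks, rate=RATE):
    """Estimate divergence time in millions of years from a mean Ks value.

    Assumes T = Ks / (2 * rate): both lineages accumulate substitutions
    at the same constant rate after the split.
    """
    years = ks / (2 * rate)
    return years / 1e6


# Mean Ks between Brassica genes and their Arabidopsis orthologs
print(round(divergence_time_mya(0.51), 1))  # 17.0
# Mean Ks between related sets of Brassica genome segments
print(round(divergence_time_mya(0.43), 1))  # 14.3
```

The same conversion applies to any of the Ks values discussed here, provided the constant-rate assumption is acceptable for the genes in question.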

The B. rapa study (Yang et al., 2006) characterized an additional segmental duplication, which occurred ∼0.8 Mya, resulting in the presence of four homoeologous genome segments in B. rapa for one segment in Arabidopsis. It is likely that such events, along with the segmental deletions, will account for the observed variances from genome triplication that have been observed (Parkin et al., 2005). In the larger study, of ∼2.2 Mb of B. oleracea sequences (Town et al., 2006), sequence annotation identified 177 genes in the B. oleracea genome segments that were in perfectly conserved collinear order with their orthologs in Arabidopsis. However, using Arabidopsis as an outgroup, it was shown that 35% of the genes inferred to be present when genome triplication occurred in the Brassica lineage have been lost, most likely via a deletion mechanism, in an interspersed pattern. In addition, evidence for the frequent insertion of gene fragments of nuclear genomic origin was identified, along with four examples of apparently intact genes in noncollinear positions in the B. oleracea and Arabidopsis genomes.

Brassica polyploids can be synthesized artificially. For example, B. napus can be resynthesized by hybridization of B. rapa and B. oleracea. Such lines initially display genome instability that has been shown to persist for at least five generations of self-pollination, leading to genetic changes in all lines studied (Gaeta et al., 2007). This instability has been interpreted as indicating that a high rate of genome evolution occurs in polyploids (Song et al., 1995). The genetic changes are thought to be homoeologous nonreciprocal transpositions and were correlated with qualitative changes in the expression of specific genes and with phenotypic variation (Gaeta et al., 2007). By contrast, only a small number of homoeologous recombination events have been observed in oilseed rape (B. napus) cultivars (Parkin et al., 1995; Sharpe et al., 1995; Udall et al., 2004). When compared with its progenitor species at the level of genome microstructure, using hybridization-based physical mapping approaches, natural B. napus appears to show relatively little change in gene content and order (Rana et al., 2004). One explanation for the difference between resynthesized and natural B. napus is that natural B. napus may have evolved or inherited a locus controlling homoeologous recombination (Jenczewski et al., 2003).

The complexity of Brassica genome structure caused by multiple rounds of duplication (either segmental or the result of polyploidy), along with chromosomal-scale rearrangements and gene-level deletions, causes immense difficulty for attempts to understand the evolutionary timescales by the analysis of sequences of individual genes. To our knowledge, there have been no sequence-level analyses reported across complete sets of homoeologous segments of the genomes of a polyploid Brassica (such as B. napus) and representatives of its ancestral diploid progenitor species (such as B. rapa and B. oleracea). To fill this knowledge gap, understand more of the evolutionary processes shaping the structure of polyploid genomes over relatively short timescales, and perhaps to begin reconciling the results from natural and resynthesized B. napus, we undertook such a study. We focused on a set of related Brassica genome segments that had already been characterized by BAC-based physical mapping (Rana et al., 2004). These represent six sets of homoeologous genome segments as described in previous studies in B. oleracea (O'Neill and Bancroft, 2000; Town et al., 2006) and an almost complete set of the homoeologous regions of the genomes of B. rapa and B. napus. In the case of B. napus, BAC clones representing both the A genome (from a B. rapa progenitor) and the C genome (from a B. oleracea progenitor) homoeologs were studied.

RESULTS

Generation of Sequence Contigs

BACs were selected for sequencing defined regions of the genomes of B. rapa and B. napus on the basis of previous physical mapping analyses (Rana et al., 2004; Park et al., 2005), with substitution of the B. napus Contig A BACs listed in Rana et al. (2004) for clones identified on the basis of BAC end sequence alignments. The clones represent all or overlapping parts of the BAC contigs assembled (O'Neill and Bancroft, 2000) and sequenced (Town et al., 2006) initially for B. oleracea ssp alboglabra A12DH. The BAC clones that were sequenced as part of this study, and their assignment to contigs and (for B. napus clones) genome, are shown in Table 1. These include KBrH138O03, which contains sequences from B. rapa ssp pekinensis Chiifu. This clone had been sequenced as part of the Brassica rapa Genome Sequencing project and overlaps part of the Contig E region. BACs were generally sequenced to GenBank phase 3 finished standards, although there were several intergenic regions that could not be completely sequenced and two physical gaps in BAC JBr037K23 (representing the B. rapa Contig F region). After the completion of sequencing, the BACs from B. napus C genome Contig E were assembled to produce a single sequence assembly. The lengths of the resulting sequences are summarized in Table 1, along with GenBank accession numbers.

Table 1.

BAC Clones Sequenced

BAC(s)	Genome Segment^a	Size (bp)	GenBank Accession	Represents
JBnB047N24	B. napus C genome contig A (oilseed rape var Tapidor DH)	181,916	AC236789	Single BAC
JBr087B14	B. rapa contig B (ssp trilocularis RO18)	114,153	AC236902	Single BAC
JBnB192G15	B. napus A genome contig B (oilseed rape var Tapidor DH)	157,438	AC236788	Single BAC
JBnB009P12	B. napus C genome contig B (oilseed rape var Tapidor DH)	122,957	AC236784	Single BAC
JBr085G14	B. rapa contig C (ssp trilocularis RO18)	113,669	AC236901	Single BAC
JBnB015G17	B. napus A genome contig C (oilseed rape var Tapidor DH)	137,645	AC236785	Single BAC
JBnB144L05	B. napus C genome contig C (oilseed rape var Tapidor DH)	159,019	AC236787	Single BAC
JBr027J19	B. rapa contig D (ssp trilocularis RO18)	176,779	AC236898	Single BAC
JBnB074N14	B. napus A genome contig D (oilseed rape var Tapidor DH)	126,210	AC236786	Single BAC
JBnB001K07	B. napus C genome contig D (oilseed rape var Tapidor DH)	115,667	AC236783	Single BAC
JBr38I20	B. rapa contig E (ssp trilocularis RO18)	138,819	AC236791	Single BAC
KBrH138O03	B. rapa contig E (ssp pekinensis Chiifu)	158,973	AC225403	Single BAC
JBnB169A13	B. napus A genome contig E (oilseed rape var Tapidor DH)	142,504	AC236790	Single BAC
JBnB179F10, JBnB167N6, JBnB161I24, JBnB33K19	B. napus C genome contig E (oilseed rape var Tapidor DH)	394,448	AC236792	BAC contig
JBr037K23	B. rapa contig F (ssp trilocularis RO18)	119,843	AC236899	Single BAC
JBnB187B24	B. napus A genome contig F (oilseed rape var Tapidor DH)	141,876	AC236897	Single BAC
JBnB122M23	B. napus C genome contig F (oilseed rape var Tapidor DH)	128,998	AC236896	Single BAC

^a Identified by BAC contigs as defined in O'Neill and Bancroft (2000) and Rana et al. (2004), except the B. napus clone representing Contig A, which was identified using public B. napus BAC end sequences.

Annotation of Sequence Contigs

Gene prediction was conducted using genemarkHMM (Lukashin and Borodovsky, 1998) with limited manual curation to resolve inconsistencies between paralogs. The B. rapa A genome and B. napus A genome and C genome sequences were newly derived. The B. oleracea C genome contigs were sequenced and published previously (Town et al., 2006). However, recent changes to genemarkHMM necessitated reannotation of these contigs so that gene counts and other features could be compared across the genome segments. Therefore, this analysis should be considered as including a new annotation of the preexisting B. oleracea sequences. The results are summarized in Table 2.

Table 2.

Summary of Annotated Features in Sequence Contigs Representing Sets of Homoeologous Genome Segments

Genome	Contig	Sequence Length	Total Genes Including Transposons	Genes with Functional Annotation	Transposons	Hypothetical Genes ≥100 Amino Acids	Hypothetical Genes <100 Amino Acids
B. oleracea	A	356,505	131	62	28	29	12
B. napus C	A	181,916	64	36	11	8	9
B. rapa	B	114,153	45	31	5	5	4
B. napus A	B	157,438	46	27	3	12	4
B. oleracea	B	284,024	82	37	16	18	11
B. napus C	B	122,957	38	18	5	8	7
B. rapa	C	113,669	37	25	0	10	2
B. napus A	C	137,645	44	37	0	7	0
B. oleracea	C	285,752	105	38	39	16	12
B. napus C	C	159,019	59	23	13	15	8
B. rapa	D	176,779	67	44	6	10	7
B. napus A	D	126,210	43	26	5	9	3
B. oleracea	D	353,037	120	49	23	22	26
B. napus C	D	115,667	43	22	6	9	6
B. rapa	E	138,819	31	21	1	5	4
B. napus A	E	142,504	32	20	0	7	5
B. oleracea	E	385,314	135	34	42	28	31
B. napus C	E	394,448	141	39	49	32	21
B. rapa	F	119,843	41	19	5	9	8
B. napus A	F	141,876	47	17	11	9	10
B. oleracea	F	335,918	119	28	42	32	17
B. napus C	F	128,998	47	10	17	11	9

After accounting for differences in the lengths of the sequenced segments, the relative densities of genes with functional annotations and those related to transposons differ both between homoeologous segments and between the A and C genomes. The highest densities of predicted genes with functional annotation, >20 per 100 kb, are in B. rapa Contigs B, C, and D and B. napus A genome Contigs C and D. The lowest densities, <10 per 100 kb, are in B. oleracea Contigs E and F and B. napus C genome Contigs E and F. The gene density is generally higher in the A genome, with mean values of ∼21 genes per 100 kb for B. rapa and ∼18 genes per 100 kb for B. napus A genome, than in the C genome, which has mean values of ∼12 genes per 100 kb for B. oleracea and 14 genes per 100 kb for B. napus C genome.

The density of transposon-related gene predictions shows the opposite trend, with generally higher density in the C genome, with mean values of ∼9.5 transposon-related gene predictions per 100 kb for B. oleracea and 8.2 transposon-related gene predictions per 100 kb for B. napus C genome, compared with mean values of ∼2.5 transposon-related gene predictions per 100 kb for B. rapa and ∼2.7 transposon-related gene predictions per 100 kb for B. napus A genome. The contrast in transposon-related content between genomes is greatest in Contigs C and E, which collectively contain only one transposon-related gene prediction across the four A genome segments, but 143 transposon-related gene predictions across the four C genome segments. These results are consistent with the relative expansion of genome regions being principally a consequence of the insertion of transposable elements rather than tandem or segmental duplications of genes or gene-containing sequences.
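The density figures quoted in the two preceding paragraphs can be reproduced from Table 2 with a short calculation. The sketch below assumes that each reported mean is the unweighted average of the per-contig densities (this is an inference from the numbers, not something the text states), and the variable and function names are illustrative.

```python
from collections import defaultdict

# (genome, contig, length in bp, genes with functional annotation, transposons)
# transcribed from Table 2.
TABLE2 = [
    ("B. oleracea", "A", 356505, 62, 28), ("B. napus C", "A", 181916, 36, 11),
    ("B. rapa", "B", 114153, 31, 5), ("B. napus A", "B", 157438, 27, 3),
    ("B. oleracea", "B", 284024, 37, 16), ("B. napus C", "B", 122957, 18, 5),
    ("B. rapa", "C", 113669, 25, 0), ("B. napus A", "C", 137645, 37, 0),
    ("B. oleracea", "C", 285752, 38, 39), ("B. napus C", "C", 159019, 23, 13),
    ("B. rapa", "D", 176779, 44, 6), ("B. napus A", "D", 126210, 26, 5),
    ("B. oleracea", "D", 353037, 49, 23), ("B. napus C", "D", 115667, 22, 6),
    ("B. rapa", "E", 138819, 21, 1), ("B. napus A", "E", 142504, 20, 0),
    ("B. oleracea", "E", 385314, 34, 42), ("B. napus C", "E", 394448, 39, 49),
    ("B. rapa", "F", 119843, 19, 5), ("B. napus A", "F", 141876, 17, 11),
    ("B. oleracea", "F", 335918, 28, 42), ("B. napus C", "F", 128998, 10, 17),
]
A_GENOMES = {"B. rapa", "B. napus A"}


def mean_density(column):
    """Mean per-contig density (features per 100 kb) for each genome.

    column: 3 for genes with functional annotation, 4 for transposons.
    """
    per_genome = defaultdict(list)
    for row in TABLE2:
        per_genome[row[0]].append(row[column] / row[2] * 100_000)
    return {g: sum(v) / len(v) for g, v in per_genome.items()}


gene_density = mean_density(3)
te_density = mean_density(4)
for genome in gene_density:
    print(f"{genome}: {gene_density[genome]:.1f} genes, "
          f"{te_density[genome]:.1f} transposons per 100 kb")

# Transposon-related predictions in Contigs C and E, split by genome type
te_a = sum(r[4] for r in TABLE2 if r[1] in "CE" and r[0] in A_GENOMES)
te_c = sum(r[4] for r in TABLE2 if r[1] in "CE" and r[0] not in A_GENOMES)
print(te_a, te_c)
```

Averaging per-contig densities gives each segment equal weight regardless of its length; pooling counts and lengths before dividing would weight the longer C genome contigs more heavily and give slightly different values.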

Overall Alignment of Homoeologous Genome Segments

We first compared the overall similarity of each of the homoeologous regions of the A and C genomes at the nucleotide level using MUMmer (Kurtz et al., 2004). The results for Contig E are shown, by way of an example, in Figures 1 to 4. The results for the remaining contigs are shown in Supplemental Figures 1 to 17 online. For each set of genome segments, the A genomes of B. rapa and B. napus (e.g., Figure 1) show a high degree of similarity along their entire length, as do the C genomes of B. oleracea and B. napus (e.g., Figure 2). In Contig E, there is one inversion, at the end of the A genome segments. In addition to annotated genes, the collinearity includes both intergenic regions and transposons and is punctuated by numerous small insertion/deletion (InDel) events. By contrast, comparisons between the A and C genomes, from either the two diploids (B. rapa and B. oleracea) (e.g., Figure 3) or the allotetraploid B. napus (e.g., Figure 4), show more fragmented collinearity. This is due primarily to transposon insertions in the C genomes relative to the A genomes and is most pronounced in Contigs C and E.

Figure 1. — Alignment of the Homoeologous Regions of *B. rapa* (JCVI ID = 97) versus *B. napus* A Genome (JCVI ID = 96) as Found in Contig E Using MUMmer.

Each red dot denotes a MUM (a region of maximum unique match) with the apparently continuous lines being produced by the proximity of adjacent matches. Matches on the reverse strand are shown in blue.

Figure 2. — Alignment of the Homoeologous Regions of *B. oleracea* (JCVI ID = 120) versus *B. napus* C Genome (JCVI ID = 98) as Found in Contig E Using MUMmer.

Each red dot denotes a MUM (a region of maximum unique match) with the apparently continuous lines being produced by the proximity of adjacent matches. Matches on the reverse strand are shown in blue.

Figure 3. — Alignment of the Homoeologous Regions of *B. rapa* (JCVI ID = 97) versus *B. oleracea* (JCVI ID = 120) as Found in Contig E Using MUMmer.

Each red dot denotes a MUM (a region of maximum unique match) with the apparently continuous lines being produced by the proximity of adjacent matches. Matches on the reverse strand are shown in blue.

Figure 4. — Alignment of the Homoeologous Regions of *B. napus* A Genome (JCVI ID = 96) versus *B. napus* C Genome (JCVI ID = 98) as Found in Contig E Using MUMmer.

Each red dot denotes a MUM (a region of maximum unique match) with the apparently continuous lines being produced by the proximity of adjacent matches. Matches on the reverse strand are shown in blue.
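How dot plots of this kind arise can be illustrated with a much-simplified stand-in for MUMmer. The sketch below does not use MUMmer's suffix-tree algorithm and does not extend matches to maximal length; it only reports k-mers that occur exactly once in each sequence, on either strand, as plot coordinates. The function names, the toy sequence, and the choice of k are all assumptions made for illustration.

```python
from collections import Counter


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def unique_kmers(seq, k):
    """Map each k-mer occurring exactly once in seq to its start position."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return {m: i for i, m in enumerate(kmers) if counts[m] == 1}


def anchors(ref, qry, k=8):
    """Return (forward, reverse) lists of (ref_start, qry_start) matches.

    Forward matches would be drawn in red and reverse-strand matches in
    blue in a plot such as Figures 1 to 4.
    """
    ref_u = unique_kmers(ref, k)
    fwd_u = unique_kmers(qry, k)
    rev_u = unique_kmers(revcomp(qry), k)
    fwd = sorted((ref_u[m], fwd_u[m]) for m in ref_u.keys() & fwd_u.keys())
    # Convert reverse-strand positions back to forward-strand coordinates.
    rev = sorted((ref_u[m], len(qry) - rev_u[m] - k)
                 for m in ref_u.keys() & rev_u.keys())
    return fwd, rev


toy = "ACGTTGCAAGGCTTACCGATAGGCTCAATG"
same_fwd, _ = anchors(toy, toy)             # identical sequences
_, flipped_rev = anchors(toy, revcomp(toy))  # fully inverted query
print(same_fwd[:3], flipped_rev[:3])
```

Collinear regions yield runs of forward matches along a diagonal, whereas an inversion, such as the one at the end of the Contig E A genome segments, yields reverse-strand matches along an anti-diagonal.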

Detailed Alignment of Sequence Annotations

VISTA plots (Frazer et al., 2004) provide highly informative visualizations of the similarities and differences between homoeologous chromosome regions. A query sequence is compared with a reference sequence and the annotation on that reference sequence. In the resulting plot, the nucleotide coordinates and annotation are those of the reference sequence, with the y axis showing percentage of identity between the two sequences (computed in 100-bp windows). Different colors are used to draw attention to conservation in exons and in noncoding sequences. Gaps reflect sequence that is present in the reference sequence but absent from the query sequence. For each set of homoeologous regions, VISTA plots were generated for reciprocal analyses of the two versions of the Brassica A genome and for the two versions of the Brassica C genome, as shown in Supplemental Figures 18 to 39 online. These reveal extensive sequence conservation between both coding and noncoding regions, as would be expected for such closely related genomes. They also show that there is extensive variation by InDel events throughout all of the contigs.
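The quantity plotted on the y axis can be sketched as below. This is not VISTA's implementation: it assumes a pairwise alignment is already available as two equal-length gapped strings, and it uses non-overlapping windows for simplicity. The function name and the toy alignment are illustrative.

```python
def window_identity(ref_aln, qry_aln, window=100):
    """Percentage identity per window, in reference coordinates.

    ref_aln and qry_aln are aligned strings of equal length with '-' for
    gaps. Columns where the reference has a gap are skipped because they
    have no reference coordinate; columns where the query has a gap count
    as mismatches (sequence present in the reference but absent from the
    query).
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    matches = [r == q for r, q in zip(ref_aln, qry_aln) if r != "-"]
    result = []
    for start in range(0, len(matches), window):
        chunk = matches[start:start + window]
        result.append(100.0 * sum(chunk) / len(chunk))
    return result


# Toy alignment with a 4-bp window: a deletion and a mismatch in the query
print(window_identity("ACGTACGT", "ACGTAC-A", window=4))  # [100.0, 50.0]
```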

Comparative Genome Analysis

We compared the genome segments with each other and with the corresponding region of the genome of Arabidopsis on the basis of their gene annotations. To identify sets of orthologous genes, each set of predicted proteins was searched against the Arabidopsis proteome. The results are summarized in Figure 5 and show that there is extensive conservation of both gene order and gene content across each set of related genome segments. The phylogenies of the protein families match those expected from the assignment of genomic regions to Contigs A to F. An example is shown in Figure 6. There were, however, several instances of genes being modeled for some, but not all, members of a set of segments related across genomes. To assess whether or not there were related sequences in these regions lacking an expected gene model, we analyzed all of the sequence contigs by deconstructing them to 1000-bp overlapping segments and used BLASTN to identify sequence similarity to the coding regions of genes annotated in Arabidopsis, as described previously (Town et al., 2006). The results of the analyses, for B. rapa and B. napus sequences, are available in Supplemental Table 1 online. The results of alignment to the homoeologous regions of the Arabidopsis genome are summarized in Supplemental Figure 40 online. This analysis revealed 21 instances (circled in Supplemental Figure 40 online) of the presence of sequences, in collinear positions, related to Arabidopsis gene models, but which had not been incorporated into gene models during the annotation process. In none of these cases could manual intervention produce intact homologous gene models. Rather, they represent instances of partial deletions of genes from particular Brassica genomic regions, leaving collinear conserved gene fragments as noted previously for B. oleracea (Town et al., 2006). 
None of these 21 instances, which occur across nine gene families, involve genes predicted to be involved in transcription or cellular communication/transduction. The analysis also enabled the identification in the sequences derived from B. rapa and B. napus of many interspersed gene fragments, as first described in Brassica species in B. oleracea (Town et al., 2006).
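The deconstruction step described above can be sketched as follows. The text specifies 1000-bp overlapping segments but not the degree of overlap, so the 500-bp step used here is an assumption, as is the function name. Each resulting segment would then be used as a BLASTN query against the Arabidopsis coding sequences; that search is not shown.

```python
def overlapping_segments(seq, size=1000, step=500):
    """Split seq into overlapping segments of up to `size` bases.

    Returns a list of (start, segment) pairs. A final segment is anchored
    to the end of the sequence so that no bases are left uncovered.
    The step of 500 bp is an assumed value, not taken from the source.
    """
    last = max(len(seq) - size, 0)
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)
    return [(s, seq[s:s + size]) for s in starts]


segments = overlapping_segments("A" * 2300)
print([start for start, _ in segments])  # [0, 500, 1000, 1300]
```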

Figure 5. — Relationships between Genes Modeled in the Sets of Genome Segments.

Symbols, which are color-coded to indicate the functional classification of proteins, denote the presence in a genome segment of a gene model with high similarity to a gene model of *Arabidopsis*, as shown by a connecting line.

Figure 6. — Phylogenetic Analysis of a Family of Sodium:Dicarboxylate Symporters Found on Contigs A, B, and C.

We identified one example of nontandem gene duplication. This involves, in Contig C, a second copy of a gene with high similarity to At5g47590 occurring between orthologs of At5g47600 and At5g47610. This is the only instance of disrupted collinearity with the corresponding regions of the genome of Arabidopsis. There is one example of a noncollinear (with respect to Arabidopsis) homologous sequence that is conserved across the A and C genomes: sequences with similarity to At3g43790 (annotated as a carbohydrate transporter, which has ∼88% identity in exon regions) in Contig E.

Timing of Genome Divergence

Previous studies have estimated that the Brassica A and C genomes, as represented in B. rapa and B. oleracea, diverged ∼3.7 Mya (Inaba and Nishio, 2002). The approach used (PCR amplification from genomic DNA of specific genes for sequencing) is problematic in polyploids such as B. napus, as both homoeologs tend to coamplify. Thus, the time of divergence of the A and C genomes as represented in natural B. napus and their homoeologs in B. rapa and B. oleracea, respectively, has not been determined. We used the sequences we had obtained to estimate the timing of these events for each genomic region separately. As summarized in Tables 3 to 8, the contigs contained varying numbers of complete sets of gene families conserved across all four homoeologous genome segments, from 3 in Contig F to 15 in Contig C. In addition, 23 genes are conserved between the Contig A homoeologs (which were identified in the C genome only). Synonymous base substitution rates, Ks values, were calculated between the B. oleracea and B. rapa orthologs for each of the genes in Contigs B to F.

Table 3.

Proteins Conserved across Sets of Brassica Contig A Regions and in Arabidopsis

Description	Arabidopsis	B. oleracea	B. napus C Genome
Hypothetical protein	AT5G47690	119.m000152	115.m000085
CCAAT-box binding transcription factor family protein/leafy cotyledon 1-related (L1L)	AT5G47670	119.m000214	115.m000114
DNA binding protein-related	AT5G47660	119.m000249	115.m000111
ADP-ribose diphosphatase (Nudix-related) binding/hydrolase	AT5G47650	119.m000257	115.m000092
CCAAT-box binding transcription factor family protein	AT5G47640	119.m000171	115.m000108
Mitochondrial acyl carrier protein	AT5G47630	119.m000182	115.m000080
Heterogeneous nuclear ribonucleoprotein, putative	AT5G47620	119.m000239	115.m000120
Zinc finger (C3HC4-type RING finger) family protein	AT5G47610	119.m000243	115.m000098
Hypothetical protein	AT5G47570	119.m000177	115.m000109
Malate transmembrane transporter/sodium:dicarboxylate symporter	AT5G47560	119.m000158	115.m000068
Cys protease inhibitor, putative	AT5G47550	119.m000167	115.m000066
Mo25 family protein	AT5G47540	119.m000161	115.m000113
Auxin-responsive protein, putative	AT5G47530	119.m000228	115.m000117
Rab GTPase	AT5G47520	119.m000139	115.m000124
SEC14 cytosolic factor family protein	AT5G47510	119.m000215	115.m000096
Pectinesterase family protein	AT5G47500	119.m000136	115.m000112
Nodulin MtN21 family protein	AT5G47470	119.m000225	115.m000078
Pentatricopeptide repeat-containing protein	AT5G47460	119.m000150	115.m000083
Hypothetical protein	AT5G47455	119.m000195	115.m000070
Tonoplast intrinsic protein	AT5G47450	119.m000140	115.m000093
Hypothetical protein, contains InterPro domain Trp RNA binding attenuator protein-like (InterPro:IPR016031)	AT5G47420	119.m000212	115.m000118
myb family transcription factor	AT5G47390	119.m000169	115.m000101
HAT2; transcription factor	AT5G47370	119.m000154	115.m000102

Table 4.

Proteins Conserved across Sets of Brassica Contig B Regions and in Arabidopsis

Description	Arabidopsis	B. napus A Genome	B. rapa	B. oleracea	B. napus C Genome
Cys protease inhibitor, putative	AT5G47550	109.m000047	114.m000056	117.m000158	101.m000045
Mo25 family protein	AT5G47540	109.m000090	114.m000078	117.m000099	101.m000055
Auxin-responsive protein, putative	AT5G47530	109.m000049	114.m000058	117.m000146	101.m000056
Rab GTPase	AT5G47520	109.m000078	114.m000060	117.m000138	101.m000058
Hypothetical protein	AT5G47490	109.m000083	114.m000053	117.m000115	101.m000057
Hypothetical protein contains InterPro domain Zinc finger, RING/FYVE/PHD-type	AT5G47430	109.m000064	114.m000086	117.m000139	101.m000060
myb family transcription factor	AT5G47390	109.m000060	114.m000061	117.m000118	101.m000073

Table 5.

Proteins Conserved across Sets of Brassica Contig C Regions and in Arabidopsis

Description	Arabidopsis	B. napus A Genome	B. rapa	B. oleracea	B. napus C Genome
Heat shock protein-related	AT5G47600	102.m000055	113.m000051	118.m000116	107.m000085
Heat shock protein-related	AT5G47590	102.m000076	113.m000060	118.m000167	107.m000102
Heat shock protein-related	AT5G47590	102.m000088	113.m000044	118.m000192	107.m000099
Unknown protein	AT5G47570	102.m000067	113.m000047	118.m000120	107.m000060
Malate transmembrane transporter/sodium:dicarboxylate symporter	AT5G47560	102.m000068	113.m000063	118.m000122	107.m000074
Cys protease inhibitor, putative	AT5G47550	102.m000080	113.m000069	118.m000158	107.m000112
Rab GTPase	AT5G47520	102.m000051	113.m000054	118.m000177	107.m000064
SEC14 cytosolic factor family protein	AT5G47510	102.m000053	113.m000055	118.m000174	107.m000106
Pectinesterase family protein	AT5G47500	102.m000062	113.m000073	118.m000175	107.m000063
Nodulin MtN21 family protein	AT5G47470	102.m000086	113.m000041	118.m000115	107.m000104
Hypothetical protein	AT5G47455	102.m000074	113.m000048	118.m000118	107.m000088
Tonoplast intrinsic protein	AT5G47450	102.m000069	113.m000066	118.m000185	107.m000118
Hypothetical protein contains InterPro domain Pleckstrin-like	AT5G47440	102.m000082	113.m000046	118.m000145	107.m000071
Hypothetical protein	AT5G47400	102.m000079	113.m000059	118.m000191	107.m000065
HAT2; transcription factor	AT5G47370	102.m000083	113.m000071	118.m000132	107.m000086

Table 6.

Proteins Conserved across Sets of Brassica Contig D Regions and in Arabidopsis

Description	Arabidopsis	B. napus A Genome	B. rapa	B. oleracea	B. napus C Genome
Myosin heavy chain-related	AT4G17210	104.m000083	110.m000112	122.m000154	100.m000076
Unknown protein	AT4G17215	104.m000079	110.m000084	122.m000132	100.m000057
Microtubule-associated protein	AT4G17220	104.m000076	110.m000083	122.m000216	100.m000083
Hypothetical protein	AT4G17250	104.m000050	110.m000111	122.m000137	100.m000067
l-lactate dehydrogenase, putative	AT4G17260	104.m000052	110.m000114	122.m000214	100.m000066
Mo25 family protein	AT4G17270	104.m000061	110.m000106	122.m000229	100.m000086
Hypothetical protein, contains InterPro domain Pleckstrin-like	AT4G17350	104.m000045	110.m000119	122.m000189	100.m000054
60S ribosomal protein L15	AT4G17390	104.m000084	110.m000103	122.m000206	100.m000056
myb family transcription factor	AT5G47390	104.m000068	110.m000129	122.m000207	100.m000064
Homeobox-leucine zipper DNA binding/transcription factor	AT4G17460	104.m000053	110.m000109	122.m000196	100.m000082
Palmitoyl protein thioesterase family protein	AT4G17470 AT4G17480 AT4G17483	104.m000077	110.m000089	122.m000183	100.m000059

Table 7.

Proteins Conserved across Sets of Brassica Contig E Regions and in Arabidopsis

Description	Arabidopsis	B. napus A Genome	B. rapa	B. oleracea	B. napus C Genome
60S ribosomal protein L15	AT4G17390	96.m000041	97.m000047	99.m000221	98.m000160
Palmitoyl protein thioesterase family protein	AT4G17483	96.m000051	97.m000036	99.m000255	98.m000253
Hypothetical protein	AT4G17486	96.m000050	97.m000061	99.m000211	98.m000196
Ethylene-responsive element binding factor	AT4G17490	96.m000064	97.m000037	99.m000230	98.m000209
Ethylene-responsive element binding factor	AT4G17500	96.m000038	97.m000048	99.m000259	98.m000270
Nuclear RNA binding protein, putative	AT4G17520	96.m000042	97.m000052	99.m000241	98.m000195
rab GTP binding protein	AT4G17530	96.m000033	97.m000034	99.m000163	98.m000278
Transporter-related	AT4G17550	96.m000049	97.m000039	99.m000170	98.m000210
Ribosomal protein L19 family protein	AT4G17560	96.m000063	97.m000035	99.m000234	98.m000254
Zinc finger (GATA type) family protein	AT4G17570	96.m000047	97.m000040	99.m000195	98.m000207
Aromatic-rich family protein	AT4G17650	96.m000034	97.m000051	99.m000162	98.m000249

Table 8.

Proteins Conserved across Sets of Brassica Contig F Regions and in Arabidopsis

Description	Arabidopsis	B. napus A Genome	B. rapa	B. oleracea	B. napus C Genome
Tonoplast intrinsic protein	AT4G17340	108.m000087	111.m000071	121.m000134	105.m000058
Hypothetical protein, contains InterPro domain Pleckstrin-like	AT4G17350	108.m000054	111.m000068	121.m000182	105.m000084
Hypothetical protein, contains zinc finger domain	AT4G17410	108.m000093	111.m000057	121.m000198	105.m000048

Using the commonly adopted estimate of mutational rate of 1.5 × 10⁻⁸ synonymous substitutions per site per year (Koch et al., 2000), the time at which the B. oleracea and B. rapa lineages diverged can be estimated. The mean values, as summarized in Table 9, are in excellent agreement with the previously estimated timing of this divergence, 3.7 Mya (Inaba and Nishio, 2002), validating the mean Ks values of these sets of genes as being an appropriate measure. We therefore used the same approach to quantify the relatedness and divergence times of the A and C genomes as represented in the diploid species (B. oleracea and B. rapa) and the allotetraploid B. napus. The results are shown in Table 9. The estimated time since divergence of the B. napus genome segments (as represented in Contigs A to F) from those representative of their progenitor species differed considerably between the regions studied. In the B. napus C genome, the most closely related segment to that in B. oleracea was Contig E, with a mean estimate of 0.12 Mya. The most distantly related was Contig D, with a mean estimate of 1.37 Mya. In the B. napus A genome, the most closely related segment to that in B. rapa was Contig F, with a mean estimate of 0.45 Mya. The most distantly related was Contig E, with a mean estimate of 1.25 Mya.

Table 9.

Divergence of Genome Segments Based on Synonymous Base Substitution Rates

	B. rapa–B. oleracea		B. oleracea–B. napus C Genome		B. rapa–B. napus A Genome
Region	Ks ± sd	Mya^a	Ks ± sd	Mya^a	Ks ± sd	Mya^a
Contig A			0.008 ± 0.014	0.27 ± 0.47
Contig B	0.077 ± 0.040	2.57 ± 1.33	0.018 ± 0.016	0.60 ± 0.53	0.031 ± 0.019	1.03 ± 0.63
Contig C	0.127 ± 0.054	4.23 ± 1.80	0.023 ± 0.027	0.77 ± 0.90	0.035 ± 0.025	1.17 ± 0.83
Contig D	0.110 ± 0.042	3.66 ± 1.40	0.041 ± 0.039	1.37 ± 1.31	0.028 ± 0.024	0.92 ± 0.81
Contig E	0.092 ± 0.042	3.08 ± 1.41	0.004 ± 0.004	0.12 ± 0.15	0.037 ± 0.019	1.25 ± 0.62
Contig F	0.089 ± 0.029	2.97 ± 0.96	0.014 ± 0.006	0.46 ± 0.20	0.014 ± 0.006	0.45 ± 0.19

^a Time since divergence of the genome segments: million years ago ± sd.
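
The conversion from mean Ks to divergence time can be illustrated with a short Python sketch. This is not code from the study. The relation T = Ks / (2r) is an assumption, chosen because it is consistent with the values in Table 9, and the function name `divergence_time_mya` is hypothetical. Times recomputed from the rounded Ks values in the table can differ slightly from the tabulated Mya values, presumably because the published estimates were derived from unrounded Ks.

```python
# Assumed relation: T = Ks / (2 * r), with r in substitutions per site per year.
SUBSTITUTION_RATE = 1.5e-8  # synonymous substitutions per site per year (Koch et al., 2000)


def divergence_time_mya(ks: float, rate: float = SUBSTITUTION_RATE) -> float:
    """Return the divergence time in millions of years for a mean Ks value."""
    if ks < 0 or rate <= 0:
        raise ValueError("ks must be non-negative and rate must be positive")
    # Both lineages accumulate substitutions, hence the factor of 2.
    return ks / (2.0 * rate) / 1e6


# Mean Ks values for B. rapa versus B. oleracea, taken from Table 9.
ks_rapa_vs_oleracea = {"Contig B": 0.077, "Contig C": 0.127, "Contig F": 0.089}

for region, ks in ks_rapa_vs_oleracea.items():
    print(f"{region}: {divergence_time_mya(ks):.2f} Mya")
```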

Divergence of Homoeologous Genome Segments by Single Nucleotide Polymorphisms and InDels

To quantify relative expansion or contraction in length of related Brassica genome segments, we calculated the length of sequence encompassing gene models representing complete sets of conserved genes, from the beginning of the first collinear gene model to the end of the last. The results are summarized in Table 10. Contigs C and E show considerable expansion in the C genome relative to the A genome, with the remainder of the regions being of more similar lengths in the two genomes. The A genome regions in B. napus are generally (four cases out of the five analyzed) shorter than in B. rapa, whereas the C genome in B. napus (four cases out of the six analyzed) is more frequently longer than in B. oleracea. The sequenced genome segments contain similar amounts of coding sequence, but the expanded C genome segments show a much increased content of transposon-related and noncoding sequences. The extent of sequence divergence between A and C genomes is of relevance for assessing the feasibility of developing homoeolog-specific molecular markers and of monitoring homoeolog-specific gene expression in B. napus. In addition, any differences in rates of polymorphism occurrence between different fractions of the genome may be indicative of differential constraints on polymorphism generation and retention.

Table 10.

Overall Lengths of Aligned Genome Segments (bp)

Region	Arabidopsis	B. napus A Genome	B. rapa	B. oleracea	B. napus C Genome
Contig A	AT5G47690 to AT5G47370			146,021	160,690
Contig B	AT5G47560 to AT5G47390	55,717	61,639	81,151	83,002
Contig C	AT5G47600 to AT5G47370	65,637	70,381	149,299	105,105
Contig D	AT4G17210 to AT4G17483	47,904	66,291	56,744	74,131
Contig E	AT4G17380 to AT4G17650	72,248	69,553	320,888	378,635
Contig F	AT4G17340 to AT4G17410	14,066	20,311	16,765	16,744
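
As an illustration of how the relative expansion of the C genome can be read from Table 10, the following Python sketch computes C/A length ratios for each region. The lengths are copied from the table. The function name `expansion_ratios` and the choice of a simple length ratio as the measure of expansion are assumptions made for this example, not part of the original analysis.

```python
# Segment lengths in bp, copied from Table 10 (Contig A has no A genome data).
LENGTHS = {
    #            napus A, rapa,   oleracea, napus C
    "Contig B": (55_717, 61_639, 81_151, 83_002),
    "Contig C": (65_637, 70_381, 149_299, 105_105),
    "Contig D": (47_904, 66_291, 56_744, 74_131),
    "Contig E": (72_248, 69_553, 320_888, 378_635),
    "Contig F": (14_066, 20_311, 16_765, 16_744),
}


def expansion_ratios(lengths):
    """Return (B. oleracea / B. rapa, B. napus C / B. napus A) length ratios."""
    ratios = {}
    for region, (napus_a, rapa, oleracea, napus_c) in lengths.items():
        ratios[region] = (oleracea / rapa, napus_c / napus_a)
    return ratios


for region, (diploid, napus) in expansion_ratios(LENGTHS).items():
    print(f"{region}: oleracea/rapa = {diploid:.2f}, napus C/A = {napus:.2f}")
```

A ratio well above 1 in both columns, as for Contigs C and E, corresponds to the expansion of the C genome segments described in the text.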

We assessed the single nucleotide polymorphism (SNP) content of the coding regions of the genes conserved across sets of related genome segments. This includes both synonymous and nonsynonymous polymorphisms and provides a measure of the polymorphism specifically within the transcriptome. The results are shown in Table 11. The relative rates of polymorphism are all consistent with the relative periods of time estimated since the divergence of the genome segments (as shown in Table 9). The lowest polymorphism rate was observed between the Contig E orthologs in B. oleracea and B. napus C genome: SNPs were present at a frequency of 0.16%. The greatest polymorphism rate was observed between the Contig E orthologs in B. rapa and B. napus A genome: SNPs were present at a frequency of 1.49%.

Table 11.

SNP and InDel Rates between Aligned Genome Segments

	B. oleracea–B. napus C Genome			B. rapa–B. napus A Genome
	Coding	Overall		Coding	Overall
Region	SNPs %	SNPs %	InDels/kb	SNPs %	SNPs %	InDels/kb
Contig A	0.45	0.39	0.64
Contig B	1.01	1.69	3.04	0.96	1.63	3.38
Contig C	1.15	1.46	2.13	1.45	1.98	4.39
Contig D	1.57	1.69	3.87	1.07	0.82	1.72
Contig E	0.16	0.47	0.55	1.49	1.73	3.73
Contig F	0.82	1.96	2.48	0.72	1.07	2.22

We then assessed the overall polymorphism content of the genome segments (including the coding sequences). This provides a measure of the polymorphism within A genomic sequences and within C genomic sequences (but not between A and C). The results are shown in Table 11. The relative rates of polymorphism are broadly consistent with the polymorphism rates observed within coding sequences. For example, the least polymorphic genome segments by both analyses are the C genome Contig A and Contig E regions, whereas A genome Contig C and Contig E are among the most polymorphic.

In addition to sequence evolution by single nucleotide mutation, leading to SNPs, insertion-deletion events can also give rise to polymorphisms termed InDels. We assessed the number and sizes of InDels between the genome segments in B. napus and the representatives of the progenitor sequences, including both coding and noncoding sequences. The results are shown in Table 11. The relative rates of InDel polymorphism are consistent with the SNP polymorphism rates observed both within coding sequences and overall. For example, the lowest InDel polymorphism rates were observed between the C genome Contig A and Contig E regions, whereas A genome Contig C and Contig E are among the most polymorphic. The size distribution of the InDels detected is shown in Figure 7 for the Contig E region and Supplemental Figures 41 to 45 for the remaining regions. Although the majority of InDels are very small, mostly under 4 bp, there are numerous larger ones.

Figure 7. Size Distribution, for the Contig E Region, of InDel Variation between B. rapa and B. napus A Genome Segments and between B. oleracea and B. napus C Genome Segments.

Comparison of the A Genome in B. napus, B. rapa ssp trilocularis, and B. rapa ssp pekinensis

Although a BAC-based physical map has been developed for B. rapa ssp trilocularis (http://brassica.bbsrc.ac.uk/IGF/), the Brassica rapa Genome Sequencing project selected a Chinese cabbage, B. rapa ssp pekinensis var Chiifu, for genome sequencing (http://brassica.bbsrc.ac.uk/brassica_genome_sequencing_concept.htm). Consequently, the extent to which the genome of B. rapa ssp pekinensis represents the A genome of B. napus is of particular importance.

A BAC library, named KBrH, has been constructed using genomic DNA of B. rapa ssp pekinensis var Chiifu (Park et al., 2005). We sequenced a portion of clone KBrH138O03 that overlaps with the B. rapa ssp trilocularis and B. napus A genome Contig E segments that we have analyzed at the sequence level. In total, 17,653 bp could be aligned across all three genomes. Over this overlapping region, the B. rapa ssp trilocularis and B. napus A genome sequences differ at 293 bases (1.66%), the B. rapa ssp trilocularis and B. rapa ssp pekinensis sequences differ at 111 bases (0.63%), and the B. rapa ssp pekinensis and B. napus sequences differ at 316 bases (1.79%). To ensure that the three sequences are of consistently high quality, 10 regions rich in polymorphisms were resequenced. All polymorphisms were validated. Therefore, we conclude that the B. rapa genomes are substantially more closely related to each other than they are to the A genome of B. napus and that the genome of B. rapa ssp trilocularis may be slightly more representative of the A genome of B. napus than is the genome of B. rapa ssp pekinensis, at least in this region.
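
The percentages quoted for the three-way comparison follow directly from the difference counts and the aligned length. A minimal Python sketch of the arithmetic is given here. The names `ALIGNED_LENGTH`, `DIFFERENCES`, and `percent_divergence` are hypothetical and introduced only for this example.

```python
ALIGNED_LENGTH = 17_653  # bp aligned across all three genomes

# Number of differing bases for each pair, as reported in the text.
DIFFERENCES = {
    ("B. rapa ssp trilocularis", "B. napus A genome"): 293,
    ("B. rapa ssp trilocularis", "B. rapa ssp pekinensis"): 111,
    ("B. rapa ssp pekinensis", "B. napus A genome"): 316,
}


def percent_divergence(differences: int, aligned_length: int) -> float:
    """Return the percentage of aligned positions that differ."""
    return 100.0 * differences / aligned_length


for (first, second), count in DIFFERENCES.items():
    print(f"{first} vs {second}: {percent_divergence(count, ALIGNED_LENGTH):.2f}%")
```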

DISCUSSION

Conservation of Gene Order in Brassica Genomes

As had been shown previously for the B. oleracea genome segments (Town et al., 2006), the homoeologous B. rapa and both B. napus genome segments show almost perfect conservation of gene order with the homoeologous regions of the Arabidopsis genome. Breakdown of collinearity of apparently intact genes between the genomes of Arabidopsis and Brassica species has been postulated to be the consequence of transposition of intact genes (Town et al., 2006). However, as these were present in only one paralogous Brassica genome segment and only one representative of the paralog (that of B. oleracea ssp alboglabra A12DH) was analyzed, it was unclear when the putative transposition took place. We have identified an additional example of an apparently intact gene in a noncollinear position, a gene very similar (∼88% nucleotide identity in exon regions) to At3g43790. This gene is in a position not covered by the B. oleracea sequence, but is present, in conserved positions, in the sequences from B. rapa and both genomes of B. napus. We have reexamined the sequences from the paralogous regions of the B. oleracea genome (Contigs D and F in O'Neill and Bancroft, 2000; Town et al., 2006), which do cover the corresponding region. The gene is not present in either. Therefore, we conclude that the most likely explanation is that the transposition of a gene with homology to At3g43790 occurred after the divergence of the lineages leading to Arabidopsis and Brassica, in only one of the paralogous ancestral Brassica genomes, but before the divergence of the Brassica A and C genome lineages.

We identified 21 instances of partial gene loss, where remnants of genes could be identified based on nucleotide sequence similarity, but which could not be included in gene models. None of these involved genes inferred to be involved in transcription or cellular communication/signal transduction, which is consistent with the hypothesis that dosage-sensitive genes are preferentially retained following genome duplication.

Chronology of Brassica Genome Divergence

Using the synonymous base substitution rates in sets of genes conserved across homoeologous genome segments, we estimated that the A and C genomes as represented in B. rapa and B. oleracea diverged between 2.57 and 4.23 Mya, in agreement with previous estimates (Inaba and Nishio, 2002). Our estimates of the timing of divergence of the B. napus genomes relative to the genomes of B. oleracea and B. rapa differed considerably between the genome regions studied, varying between mean estimates of 0.12 and 1.37 Mya. This is unlikely to be indicative of a difference in nucleotide substitution rates across different regions of the genome of B. napus. Rather, because the precise lines of B. oleracea and B. rapa that hybridized to form natural B. napus are unknown, these results more likely indicate that different parts of the genome of B. napus, as represented by European Winter oilseed rape variety Tapidor, were derived from different lines of B. rapa and B. oleracea, none of which were identical to the representatives of these species that we studied (i.e., B. oleracea ssp alboglabra A12DH and B. rapa ssp trilocularis RO18). Further analyses, based on overall sequence polymorphism rates, showed that the B. napus A genome, as represented in Contig E, may be slightly more diverged from that of B. rapa ssp pekinensis var Chiifu (which is the subject of the Brassica genome sequencing effort) than it is from B. rapa ssp trilocularis.

Genome Evolution by SNP and InDel Mechanisms

The majority of polymorphisms distinguishing the two Brassica A genomes or the two Brassica C genomes are SNPs. These vary in abundance approximately in proportion to the estimated time since divergence of the genome segments. For example, the relatively closely related C genome Contig E regions, which diverged ∼0.12 Mya, show an overall genomic SNP rate of 0.47%, whereas the more distantly related A genome Contig E regions, which diverged ∼1.25 Mya, show an overall genomic SNP rate of 1.73%. In addition to SNPs, the Brassica genomes differed by InDels. These occur at high frequency, on average 0.55 per kb between the relatively closely related C genome Contig E regions and 3.73 per kb between the more distantly related A genome Contig E regions. Their abundance in Brassica genomes is consistent with the previously observed ease of identification of molecular markers based on InDel differences (http://brassica.bbsrc.ac.uk/IMSORB/). In two of the regions of the genome that we studied, Contig C and Contig E, the C genome was found to be greatly expanded relative to the A genome, primarily by the insertion of transposable elements. Indeed, the size of the genome of B. oleracea is, at ∼600 Mb, significantly larger than that of B. rapa, which is ∼500 Mb (Arumuganathan and Earle, 1991). Therefore, this overall genome expansion may be attributable at least partly to transposon amplification in euchromatic regions.

Perspectives on Genome Evolution

The gross structure of the Brassica genomes appears to have evolved by a series of polyploidization, segmental duplication, and deletion events in varying proportions dependent upon whether a paleohexaploid or paleotetraploid ancestor was involved. Three complete sets of three related paralogous genome segments have been sequenced, two in B. oleracea (Town et al., 2006) and one in B. rapa (Yang et al., 2006). If evolution had proceeded via a paleotetraploid with subsequent segmental duplication, the extant representative genome segments in the diploid species would show evidence in the triplicated genomic regions of two distinct duplication events. In none of the three cases examined to date was this observed; rather, all three paralogs were approximately equally diverged in each case. This favors the hypothesis of a paleohexaploid ancestor. A later segmental duplication has been characterized at the sequence level and was estimated to have occurred ∼0.8 Mya (Yang et al., 2006) (i.e., very much later than the hypothesized hexaploidy). Definitive proof of segmental deletions is difficult to obtain, especially for small segments when using approaches based upon molecular markers and linkage maps, as have been conducted to date (Parkin et al., 2005), but such deletions are very likely to have occurred. Small-scale deletions have been observed at the level of genome microstructure and sequence in Brassica (O'Neill and Bancroft, 2000; Rana et al., 2004; Town et al., 2006; Yang et al., 2006). Thus, B. napus represents an excellent model system in which to study the process of diploidization following polyploidy.

There is clear evidence that resynthesized B. napus shows a high rate of genome change (Song et al., 1995; Udall et al., 2004; Lukens et al., 2004), and this continues for at least five generations following polyploidy, leading to qualitative changes in the expression of specific genes and phenotypic variation (Gaeta et al., 2007). The genetic changes are likely to involve homoeologous nonreciprocal transpositions (Gaeta et al., 2007). Natural B. napus may have evolved or inherited a locus controlling homoeologous recombination (Jenczewski et al., 2003), so such a high rate of genome change may not have occurred for long, if at all. We found no evidence within the B. napus genome of homoeologous exchanges (i.e., the genes in the B. napus A genome were most closely related to the genes in B. rapa, and the genes in the B. napus C genome were most closely related to the genes in B. oleracea).

Our studies were successful in providing estimates of the timing of the divergence of the A and C genomes as represented in B. napus and its progenitor species. These differed considerably between different regions of the B. napus genome, indicating that the genome of oilseed rape, as exemplified by var Tapidor, is likely to have been derived from multiple different progenitors with varying degrees of relatedness to B. oleracea ssp alboglabra A12DH and B. rapa ssp trilocularis RO18 or ssp pekinensis Chiifu. It is highly unlikely that we will be able to differentiate between the genome changes that occurred before the formation of B. napus and those which have occurred subsequently.

Our analyses confirm that interspersed gene fragments, first described in Brassica species in B. oleracea (Town et al., 2006), also occur in B. rapa and B. napus. These fragments contain introns so are of genomic origin. The process of incorporation into regions of the genome of unspliced fragments of unlinked cellular genes has been termed transduplication (Juretic et al., 2005) and has been observed to have been mediated by MULE, CACTA, and Helitron elements (Jiang et al., 2004; Lai et al., 2005; Zabala and Vodkin, 2005). Although the capture mechanism is not understood, it is likely a consequence of the rolling-circle mechanism of transposon replication (Feschotte and Wessler, 2001). The resulting insertions can contain fragments of many genes (Morgante et al., 2005). Although these are generally pseudogenes (Gupta et al., 2005), they frequently appear to be transcribed (Brunner et al., 2005). Transduplication of an apparently functional gene by a MULE has been reported in Arabidopsis (Hoen et al., 2006). Thus, both transposon-mediated assembly of novel genes and transposon-mediated dispersal of duplicates of functional genes to new positions within the genome have been described in plants. The Brassica genomes show evidence of the consequences of such genome evolutionary mechanisms and represent a new group of related plant species in which to study them. In addition, knowledge of these characteristics of Brassica genomes will be important for comparative genomic approaches for the exploitation of the emerging B. rapa genome sequence.

Our results are consistent with the plasticity of the genomes of Brassica species being similar to those of cereal genomes (Morgante, 2006). It seems likely that the genomes of many of the world's major crop species are evolving and diverging so quickly that we should expand our perspective to consider their pan-genomes, a concept that has been put forward for some bacterial species (Tettelin et al., 2005). The pan-genome comprises a core shared genome and a variable fraction partially shared between lines and acknowledges that the genome of a species is not fully represented by the genome sequence of any single line. Ongoing generation of genetic variation would be consistent with a hypothesis that the continued success in breeding improved varieties of crops such as oilseed rape and wheat (Triticum aestivum), despite very narrow genetic bases, is underpinned by the inherent properties of their genomes to evolve at the sequence level.

METHODS

BAC Sequencing

The KBrH138O03 clone, which was donated by the Korea Brassica Genome Resource Bank, was sequenced as described previously (Yang et al., 2006). The remaining BACs were sequenced essentially as described previously (Town et al., 2006).

Sequence Annotation

Gene predictions were made using Genemark.hmm (Lukashin and Borodovsky, 1998) version 3.3b 76 and the Arabidopsis thaliana matrix. This was also the default program used for gene calling both for Arabidopsis annotation (Haas et al., 2005) and the previous Brassica oleracea annotation (Town et al., 2006). Changes in the program since our previous annotation of B. oleracea contig E necessitated reannotating this contig with the newer version of the program to provide uniformity for comparisons across the contigs. Limited manual curation of gene models was performed to resolve inconsistencies between paralogous gene models uncovered during phylogenetic analysis. Gene models were assigned functions based upon database matches or HMM domain content as described previously. Gene predictions with similarity to known transposons were identified by searching against a curated set of transposon-encoded proteins (ftp://ftp.tigr.org/pub/data/TransposableElements/transposon_db.pep). Predicted proteins <100 amino acids in length with no database match were excluded from the final annotation.

VISTA plots (Frazer et al., 2004) were generated using the Web interface hosted at the Lawrence Berkeley Labs (http://genome.lbl.gov/vista/mvista/submit.shtml) using the AVID alignment option (Bray et al., 2003).

Phylogenetic Analysis of Protein Families

Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 (Tamura et al., 2007). Protein sequences were aligned using ClustalW using the following default parameters: for pairwise alignments, a gap opening penalty = 10 and a gap extension penalty = 0.1; for multiple alignments, a gap opening penalty = 10 and a gap extension penalty = 0.2. The protein weight matrix used was Gonnet with residue-specific penalties ON, hydrophilic penalties ON, a gap separation distance = 4, end gap separation OFF, use negative matrix OFF, and delay divergent cutoff = 30%. Phylogenetic trees were constructed by the neighbor-joining method using default parameters as follows: gaps/missing data, complete deletion; model, amino: Poisson correction; substitutions to include, all; pattern among lineages, same; rates among sites, uniform. The alignments used for phylogenetic analysis are available as Supplemental Data Set 1 online.
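
To make the distance settings concrete, the following Python sketch shows what "complete deletion" and "Poisson correction" mean for a small protein alignment. It is an illustration under stated assumptions, not MEGA code: the function names and the toy alignment are hypothetical, and only the standard Poisson-corrected distance d = -ln(1 - p), where p is the proportion of differing sites, is implemented. Tree construction by neighbor joining is not reproduced.

```python
import math
from typing import List


def complete_deletion(alignment: List[str]) -> List[str]:
    """Drop every column that contains a gap or missing residue in any row."""
    keep = [i for i in range(len(alignment[0]))
            if all(row[i] not in "-?" for row in alignment)]
    return ["".join(row[i] for i in keep) for row in alignment]


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned rows."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("rows must be non-empty and of equal length")
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Toy alignment (hypothetical sequences); the gapped column is removed first.
toy_alignment = ["MKTLLV-A", "MKSLLVGA", "MRTLIVGA"]
rows = complete_deletion(toy_alignment)
for i in range(len(rows)):
    for j in range(i + 1, len(rows)):
        print(i, j, round(poisson_distance(rows[i], rows[j]), 4))
```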

Calculation of Ks Values

The analysis was performed by comparing sets of four BACs or BAC contigs representing Brassica napus A genome, Brassica rapa A genome, B. napus C genome, and B. oleracea C genome. Varying numbers of hypothetical gene families were involved in the analyses, and all the contigs were compared against one another for each hypothetical gene family. The Bioperl (Stajich et al., 2002) script bp_pairwise_kaks.pl was used to perform the analyses. The script takes as input the two cDNA sequences to be compared, translates them to their corresponding protein sequences, aligns the protein sequences using ClustalW (Larkin et al., 2007), and then uses the protein alignment together with the cDNA sequences to calculate Ka, Ks, and the Ka/Ks ratio by implementing the yn00 method (Yang and Nielsen, 2000), which is part of the PAML distribution (Yang, 2007).
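
The step in which a protein alignment is combined with the cDNA sequences can be sketched as follows. This Python example is not the Bioperl script and does not estimate Ka or Ks. It only illustrates, under the assumption that each aligned residue corresponds to one codon, how a gapped protein row may be threaded back onto its coding sequence to give a codon alignment. The function name `thread_codons` and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cdna: str) -> str:
    """Map a gapped protein alignment row back onto its coding sequence."""
    codons = []
    position = 0
    for residue in aligned_protein:
        if residue == "-":
            codons.append("---")  # an alignment gap spans one whole codon
        else:
            codons.append(cdna[position:position + 3])
            position += 3
    # Tolerate one trailing stop codon that is absent from the protein row.
    if len(cdna) - position not in (0, 3):
        raise ValueError("cDNA length does not match the protein row")
    return "".join(codons)


# Toy example: the first sequence lacks one residue relative to the second.
row_a = thread_codons("MK-L", "ATGAAACTG")
row_b = thread_codons("MKTL", "ATGAAGACCCTT")
print(row_a)
print(row_b)
```

The resulting codon rows have equal length and keep the reading frame, which is the form of input that codon-based substitution methods such as yn00 operate on.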

Estimation of SNP and InDel Content

SNPs and InDels among the sequenced BACs were identified using MUMmer (Kurtz et al., 2004), with InDels identified between the genome segments by calculation of the difference in base pair coordinates of consecutive aligned SNP positions.
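
One way to read this description is that an InDel is inferred wherever the distance between two consecutive aligned SNP positions differs between the two sequences. The Python sketch below illustrates that interpretation. It is an assumption about the procedure, not the authors' implementation: the function name `infer_indels` and the toy coordinates are hypothetical, and the input is a plain list of coordinate pairs rather than actual MUMmer output.

```python
from typing import List, Tuple


def infer_indels(snps: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Infer InDels from consecutive aligned SNP coordinates.

    snps: (reference_position, query_position) pairs sorted by reference position.
    Returns (ref_start, ref_end, size) tuples; size > 0 means extra bases in
    the query, size < 0 means extra bases in the reference.
    """
    indels = []
    for (ref_a, qry_a), (ref_b, qry_b) in zip(snps, snps[1:]):
        # Compare the spacing of the two SNPs in each sequence.
        size = (qry_b - qry_a) - (ref_b - ref_a)
        if size != 0:
            indels.append((ref_a, ref_b, size))
    return indels


# Toy coordinates (hypothetical), not taken from the sequenced BACs.
example = [(100, 100), (250, 250), (400, 403), (520, 523), (700, 698)]
for ref_start, ref_end, size in infer_indels(example):
    kind = "insertion" if size > 0 else "deletion"
    print(f"{kind} of {abs(size)} bp in query between reference {ref_start} and {ref_end}")
```

Under this reading, only the net length difference between two consecutive SNPs is recovered, so several InDels falling within the same interval would be reported as one.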

Accession Numbers

Sequence data from this article can be found in the EMBL/GenBank data libraries under the accession numbers listed in Table 1.

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure 1. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig A, Using MUMmer, for B. napus C Genome versus B. oleracea.
Supplemental Figure 2. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig B, Using MUMmer, for B. napus A Genome versus B. rapa.
Supplemental Figure 3. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig B, Using MUMmer, for B. napus C Genome versus B. oleracea.
Supplemental Figure 4. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig B, Using MUMmer, for B. rapa versus B. oleracea.
Supplemental Figure 5. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig B, Using MUMmer, for B. napus A Genome versus B. napus C Genome.
Supplemental Figure 6. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig C, Using MUMmer, for B. napus A Genome versus B. rapa.
Supplemental Figure 7. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig C, Using MUMmer, for B. napus C Genome versus B. oleracea.
Supplemental Figure 8. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig C, Using MUMmer, for B. rapa versus B. oleracea.
Supplemental Figure 9. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig C, Using MUMmer, for B. napus A Genome versus B. napus C Genome.
Supplemental Figure 10. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig D, Using MUMmer, for B. napus A Genome versus B. rapa.
Supplemental Figure 11. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig D, Using MUMmer, for B. napus C Genome versus B. oleracea.
Supplemental Figure 12. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig D, Using MUMmer, for B. rapa versus B. oleracea.
Supplemental Figure 13. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig D, Using MUMmer, for B. napus A Genome versus B. napus C Genome.
Supplemental Figure 14. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig F, Using MUMmer, for B. napus A Genome versus B. rapa.
Supplemental Figure 15. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig F, Using MUMmer, for B. napus C Genome versus B. oleracea.
Supplemental Figure 16. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig F, Using MUMmer, for B. rapa versus B. oleracea.
Supplemental Figure 17. Alignment of the Homoeologous Regions of the A and C Genomes, as Represented in Contig F, Using MUMmer, for B. napus A Genome versus B. napus C Genome.
Supplemental Figure 18. VISTA Plots Showing, for Contig A, the Sequence Relationships between the Contigs from B. napus C Genome as Reference and B. oleracea as Query.
Supplemental Figure 19. VISTA Plots Showing, for Contig A, the Sequence Relationships between the Contigs from B. oleracea as Reference and B. napus C Genome as Query.
Supplemental Figure 20. VISTA Plots Showing, for Contig B, the Sequence Relationships between the Contigs from B. napus C Genome as Reference and B. oleracea as Query.
Supplemental Figure 21. VISTA Plots Showing, for Contig B, the Sequence Relationships between the Contigs from B. oleracea as Reference and B. napus C Genome as Query.
Supplemental Figure 22. VISTA Plots Showing, for Contig B, the Sequence Relationships between the Contigs from B. napus A Genome as Reference and B. rapa as Query.
Supplemental Figure 23. VISTA Plots Showing, for Contig B, the Sequence Relationships between the Contigs from B. rapa as Reference and B. napus A Genome as Query.
Supplemental Figure 24. VISTA Plots Showing, for Contig C, the Sequence Relationships between the Contigs from B. napus C Genome as Reference and B. oleracea as Query.
Supplemental Figure 25. VISTA Plots Showing, for Contig C, the Sequence Relationships between the Contigs from B. oleracea as Reference and B. napus C Genome as Query.
Supplemental Figure 26. VISTA Plots Showing, for Contig C, the Sequence Relationships between the Contigs from B. napus A Genome as Reference and B. rapa as Query.
Supplemental Figure 27. VISTA Plots Showing, for Contig C, the Sequence Relationships between the Contigs from B. rapa as Reference and B. napus A Genome as Query.
Supplemental Figure 28. VISTA Plots Showing, for Contig D, the Sequence Relationships between the Contigs from B. napus C Genome as Reference and B. oleracea as Query.
Supplemental Figure 29. VISTA Plots Showing, for Contig D, the Sequence Relationships between the Contigs from B. oleracea as Reference and B. napus C Genome as Query.
Supplemental Figure 30. VISTA Plots Showing, for Contig D, the Sequence Relationships between the Contigs from B. napus A Genome as Reference and B. rapa as Query.
Supplemental Figure 31. VISTA Plots Showing, for Contig D, the Sequence Relationships between the Contigs from B. rapa as Reference and B. napus A Genome as Query.
Supplemental Figure 32. VISTA Plots Showing, for Contig E, the Sequence Relationships between the Contigs from B. napus C Genome as Reference and B. oleracea as Query.
Supplemental Figure 33. VISTA Plots Showing, for Contig E, the Sequence Relationships between the Contigs from B. oleracea as Reference and B. napus C Genome as Query.
Supplemental Figure 34. VISTA Plots Showing, for Contig E, the Sequence Relationships between the Contigs from B. napus A Genome as Reference and B. rapa as Query.
Supplemental Figure 35. VISTA Plots Showing, for Contig E, the Sequence Relationships between the Contigs from B. rapa as Reference and B. napus A Genome as Query.
Supplemental Figure 36. VISTA Plots Showing, for Contig F, the Sequence Relationships between the Contigs from B. napus C Genome as Reference and B. oleracea as Query.
Supplemental Figure 37. VISTA Plots Showing, for Contig F, the Sequence Relationships between the Contigs from B. oleracea as Reference and B. napus C Genome as Query.
Supplemental Figure 38. VISTA Plots Showing, for Contig F, the Sequence Relationships between the Contigs from B. napus A Genome as Reference and B. rapa as Query.
Supplemental Figure 39. VISTA Plots Showing, for Contig F, the Sequence Relationships between the Contigs from B. rapa as Reference and B. napus A Genome as Query.
Supplemental Figure 40. Homologies to Arabidopsis Genes, as Identified by BLAST.
Supplemental Figure 41. Size Distribution, for the Contig A Region, of InDel Variation between B. rapa and B. napus A Genome Segments and between B. oleracea and B. napus C Genome Segments.
Supplemental Figure 42. Size Distribution, for the Contig B Region, of InDel Variation between B. rapa and B. napus A Genome Segments and between B. oleracea and B. napus C Genome Segments.
Supplemental Figure 43. Size Distribution, for the Contig C Region, of InDel Variation between B. rapa and B. napus A Genome Segments and between B. oleracea and B. napus C Genome Segments.
Supplemental Figure 44. Size Distribution, for the Contig D Region, of InDel Variation between B. rapa and B. napus A Genome Segments and between B. oleracea and B. napus C Genome Segments.
Supplemental Figure 45. Size Distribution, for the Contig F Region, of InDel Variation between B. rapa and B. napus A Genome Segments and between B. oleracea and B. napus C Genome Segments.
Supplemental Table 1. The Results of BLASTN Analysis of the B. rapa and B. napus Sequences Relative to Arabidopsis Gene Models, Presented in Excel Format.
Supplemental Data Set 1. Alignments Used for Phylogenetic Analysis.

Supplementary Material

[Supplemental Data]

tpc.108.060376_index.html (1KB, html)

Acknowledgments

We thank Paul Wilkinson of the University of Bath and members of the John Innes Centre Genome Laboratory for their contributions to the sequencing of the clones. This work was funded by the UK Biotechnology and Biological Sciences Research Council (BBS/B/07330, BB/E017363, and a competitive support grant to the John Innes Centre). The research of Y.P.L., J.-Y.P., S.-J.K., and J.-A.K. was supported by grants from the Rural Development Administration (BioGreen 21 Program 20050301034438 and National Academy of Agricultural Science Projects 2007139062200001502 and 200901FHT020710397), the Technology Development Program for Agriculture and Forestry, Ministry for Food, Agriculture, Forestry, and Fisheries (Project No. 607003-05), and the National Institute of Agricultural Biotechnology (Project 04-1-12-2), Korea. J.C.P., C.T., and A.H.P. were supported by the U.S. National Science Foundation (DBI-0638536).

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is: Ian Bancroft (ian.bancroft@bbsrc.ac.uk).

^[W] Online version contains Web-only data.

^[OA] Open access articles can be viewed online without a subscription.

www.plantcell.org/cgi/doi/10.1105/tpc.108.060376

References

Arabidopsis Genome Initiative (2000). Analysis of the genome of the flowering plant Arabidopsis thaliana. Nature 408: 796–815.
Arumuganathan, K., and Earle, E.D. (1991). Nuclear DNA content of some important plant species. Plant Mol. Biol. Rep. 9: 208–218.
Babula, D., Kaczmarek, M., Barakat, A., Delseny, M., Quiros, C.F., and Sadowski, J. (2003). Chromosomal mapping of Brassica oleracea based on ESTs from Arabidopsis thaliana: Complexity of the comparative map. Mol. Genet. Genomics 268: 656–665.
Blanc, G., Barakat, A., Guyot, R., Cooke, R., and Delseny, M. (2000). Extensive duplication and reshuffling in the Arabidopsis genome. Plant Cell 12: 1093–1101.
Bowers, J.E., Chapman, B.A., Rong, J., and Paterson, A.H. (2003). Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433–438.
Bray, N., Dubchak, I., and Pachter, L. (2003). AVID: A global alignment program. Genome Res. 13: 97–102.
Brunner, S., Pea, G., and Rafalski, A. (2005). Origins, genetic organization and transcription of a family of non-autonomous helitron elements in maize. Plant J. 43: 799–810.
Feschotte, C., and Wessler, S.R. (2001). Treasures in the attic: Rolling circle transposons discovered in eukaryotic genomes. Proc. Natl. Acad. Sci. USA 98: 8923–8924.
Frazer, K.A., Pachter, L., Poliakov, A., Rubin, E.M., and Dubchak, I. (2004). VISTA: Computational tools for comparative genomics. Nucleic Acids Res. 32: W273–W279.
Gaeta, R.T., Pires, J.C., Iniguez-Luy, F., Leon, E., and Osborn, T.C. (2007). Genomic changes in resynthesized Brassica napus and their effect on gene expression and phenotype. Plant Cell 19: 3403–3417.
Gupta, S., Gallavotti, A., Stryker, G.A., Schmidt, R.J., and Lal, S.K. (2005). A novel class of Helitron-related transposable elements in maize contain portions of multiple pseudogenes. Plant Mol. Biol. 57: 115–127.
Haas, B.J., Wortman, J.R., Ronning, C.M., Hannick, L.I., Smith, R.K., Jr., Maiti, R., Chan, A.P., Yu, C., Farzad, M., Wu, D., White, O., and Town, C.D. (2005). Complete reannotation of the Arabidopsis genome: Methods, tools, protocols and the final release. BMC Biol. 3: 7.
Hoen, D.R., Park, K.C., Elrouby, N., Yu, Z., Mohabir, N., Cowan, R.K., and Bureau, T.E. (2006). Transposon-mediated expansion and diversification of a family of ULP-like genes. Mol. Biol. Evol. 23: 1254–1268.
Inaba, R., and Nishio, T. (2002). Phylogenetic analysis of Brassiceae based on the nucleotide sequences of the S-locus related gene, SLR1. Theor. Appl. Genet. 105: 1159–1165.
Jenczewski, E., Eber, F., Grimaud, A., Huet, S., Lucas, M.O., Monod, H., and Chèvre, A.-M. (2003). PrBn, a major gene controlling homoeologous pairing in oilseed rape (Brassica napus) haploids. Genetics 164: 645–653.
Jiang, N., Bao, Z., Zhang, X., Eddy, S.R., and Wessler, S.R. (2004). Pack-MULE transposable elements mediate gene evolution in plants. Nature 431: 569–573.
Juretic, N., Hoen, D.R., Huynh, M.L., Harrison, P.M., and Bureau, T.E. (2005). The evolutionary fate of MULE-mediated duplications of host gene fragments in rice. Genome Res. 15: 1292–1297.
Koch, M.A., Haubold, B., and Mitchell-Olds, T. (2000). Comparative evolutionary analysis of chalcone synthase and alcohol dehydrogenase loci in Arabidopsis, Arabis, and related genera (Brassicaceae). Mol. Biol. Evol. 17: 1483–1498.
Kowalski, S.D., Lan, T.-H., Feldmann, K.A., and Paterson, A.H. (1994). Comparative mapping of Arabidopsis thaliana and Brassica oleracea chromosomes reveals islands of conserved gene order. Genetics 138: 499–510.
Ku, H.-M., Vision, T., Liu, J., and Tanksley, S.D. (2000). Comparing sequenced segments of the tomato and Arabidopsis genomes: Large-scale duplication followed by selective gene loss creates a network of synteny. Proc. Natl. Acad. Sci. USA 97: 9121–9126.
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S.L. (2004). Versatile and open software for comparing large genomes. Genome Biol. 5: R12.
Lagercrantz, U., and Lydiate, D. (1996). Comparative genome mapping in Brassica. Genetics 144: 1903–1910.
Lai, J., Li, Y., Messing, J., and Dooner, H.K. (2005). Gene movement by Helitron transposons contributes to the haplotype variability of maize. Proc. Natl. Acad. Sci. USA 102: 9068–9073.
Lan, T.H., DelMonte, T.A., Reischmann, K.P., Hyman, J., Kowalski, S., McFerson, J., Kresovich, S., and Paterson, A.H. (2000). An EST-enriched comparative map of Brassica oleracea and Arabidopsis thaliana. Genome Res. 10: 776–788.
Larkin, M.A., et al. (2007). ClustalW and ClustalX version 2. Bioinformatics 23: 2947–2948.
Leitch, I.J., and Bennett, M.D. (1997). Polyploidy in angiosperms. Trends Plant Sci. 2: 470–476.
Lukashin, A., and Borodovsky, M. (1998). GeneMark.hmm: New solutions for gene finding. Nucleic Acids Res. 26: 1107–1115.
Lukens, L., Zou, F., Lydiate, D., Parkin, I., and Osborn, T. (2003). Comparison of a Brassica oleracea genetic map with the genome of Arabidopsis thaliana. Genetics 164: 359–372.
Lukens, L.N., Quijada, P.A., Udall, J., Pires, J.C., Schranz, M.E., and Osborn, T.C. (2004). Genome redundancy and plasticity within ancient and recent Brassica crop species. Biol. J. Linn. Soc. Lond. 82: 665–674.
Lysak, M.A., Koch, M.A., Pecinka, A., and Schubert, I. (2005). Chromosome triplication found across the tribe Brassiceae. Genome Res. 15: 516–525.
Mayer, K., et al. (2001). Conservation of microstructure between a sequenced region of the genome of rice and multiple segments of the genome of Arabidopsis thaliana. Genome Res. 11: 1167–1174.
Morgante, M. (2006). Plant genome organisation and diversity: The year of the junk! Curr. Opin. Biotechnol. 17: 168–173.
Morgante, M., Brunner, S., Pea, G., Fengler, K., Zuccolo, A., and Rafalski, A. (2005). Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat. Genet. 37: 997–1002.
O'Neill, C.M., and Bancroft, I. (2000). Comparative physical mapping of segments of the genome of Brassica oleracea var. alboglabra that are homoeologous to sequenced regions of the chromosomes 4 and 5 of Arabidopsis thaliana. Plant J. 23: 233–243.
Park, J.Y., et al. (2005). Physical mapping and microsynteny of Brassica rapa ssp. pekinensis genome corresponding to a 222 kb gene-rich region of Arabidopsis chromosome 4 and partially duplicated on chromosome 5. Mol. Genet. Genomics 274: 579–588.
Parkin, I.A.P., Gulden, S.M., Sharpe, A.G., Lukens, L., Trick, M., Osborn, T.C., and Lydiate, D.J. (2005). Segmental structure of the Brassica napus genome based on comparative analysis with Arabidopsis thaliana. Genetics 171: 765–781.
Parkin, I.A.P., Sharpe, A.G., Keith, D.J., and Lydiate, D.J. (1995). Identification of the A and C genomes of amphidiploid Brassica napus (oilseed rape). Genome 38: 1122–1131.
Parkin, I.A.P., Sharpe, A.G., and Lydiate, D.J. (2003). Patterns of genome duplication within the Brassica napus genome. Genome 46: 291–303.
Paterson, A.H., Bowers, J.E., Burow, M.D., Draye, X., Elsik, C.G., Jiang, C., Katsar, C.S., Lan, T., Lin, Y., Ming, R., and Wright, R.J. (2000). Comparative genomics of plant chromosomes. Plant Cell 12: 1523–1539.
Qiu, D., et al. (2006). A comparative linkage map of oilseed rape and its use for QTL analysis of seed oil and erucic acid content. Theor. Appl. Genet. 114: 67–80.
Rana, D., van den Boogaart, T., O'Neill, C.M., Hynes, L., Bent, E., Macpherson, L., Park, J.Y., Lim, Y.P., and Bancroft, I. (2004). Conservation of the microstructure of genome segments in Brassica napus and its diploid relatives. Plant J. 40: 725–733.
Schranz, M.E., Lysak, M.A., and Mitchell-Olds, T. (2006). The ABC's of comparative genomics in the Brassicaceae: Building blocks of crucifer genomes. Trends Plant Sci. 11: 535–542.
Schmidt, R., Acarkan, A., and Boivin, K. (2001). Comparative structural genomics in the Brassicaceae family. Plant Physiol. Biochem. 39: 253–262.
Sharpe, A.G., Parkin, I.A.P., Keith, D.J., and Lydiate, D.J. (1995). Frequent non-reciprocal translocations in the amphidiploid genome of oilseed rape. Genome 38: 1112–1121.
Song, K., Lu, P., Tang, K., and Osborn, T.C. (1995). Rapid genome change in synthetic polyploids of Brassica and its implications for polyploid evolution. Proc. Natl. Acad. Sci. USA 92: 7719–7723.
Stajich, J.E., et al. (2002). The Bioperl Toolkit: Perl modules for the life sciences. Genome Res. 12: 1611–1618.
Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007). MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24: 1596–1599.
Tettelin, H., et al. (2005). Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome”. Proc. Natl. Acad. Sci. USA 102: 13950–13955.
Town, C.D., et al. (2006). Comparative genomics of Brassica oleracea and Arabidopsis thaliana reveals gene loss, fragmentation and dispersal following polyploidy. Plant Cell 18: 1348–1359.
U, N. (1935). Genome analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. Jpn. J. Bot. 7: 389–452.
Udall, J., Quijada, P., and Osborn, T.C. (2004). Detection of chromosomal rearrangements derived from homologous recombination in four mapping populations of Brassica napus L. Genetics 169: 967–979.
Warwick, S.I., and Black, L.D. (1991). Molecular systematics of Brassica and allied genera (Subtribe Brassicinae, Brassiceae) – Chloroplast genome and cytodeme congruence. Theor. Appl. Genet. 82: 81–92.
Wendel, J.F. (2000). Genome evolution in polyploids. Plant Mol. Biol. 42: 225–249.
Wolfe, K.H., Gouy, M., Yang, Y.W., Sharp, P.M., and Li, W.H. (1989). Date of the monocot-dicot divergence estimated from the chloroplast DNA sequence data. Proc. Natl. Acad. Sci. USA 86: 6201–6205.
Yang, T.J., et al. (2006). Sequence-level analysis of the diploidization process in the triplicated FLC region of Brassica rapa. Plant Cell 18: 1339–1347.
Yang, Y.W., Lai, K.N., Tai, P.Y., and Li, W.H. (1999). Rates of nucleotide substitution in angiosperm mitochondrial DNA sequences and dates of divergence between Brassica and other angiosperm lineages. J. Mol. Evol. 48: 597–604.
Yang, Z., and Nielsen, R. (2000). Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17: 32–43.
Yang, Z. (2007). PAML 4: A program package for phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24: 1586–1591.
Zabala, G., and Vodkin, L.O. (2005). The wp mutation of Glycine max carries a gene-fragment-rich transposon of the CACTA superfamily. Plant Cell 17: 2619–2632.

Supplementary Materials

[Supplemental Data]

tpc.108.060376_index.html (1KB, html)

tpc.108.060376_1.pdf (1.8MB, pdf)

tpc.108.060376_Bancroft_resubmission_Supplemental_Table_1.xls (401.5KB, xls)

tpc.108.060376_Bancroft_revised_Supplemental_Dataset_1.txt (6.1KB, txt)

