Two haplotype-resolved genome assemblies for AAB allotriploid bananas provide insights into banana subgenome asymmetric evolution and Fusarium wilt control

Wen-Zhao Xie; Yu-Yu Zheng; Weidi He; Fangcheng Bi; Yaoyao Li; Tongxin Dou; Run Zhou; Yi-Xiong Guo; Guiming Deng; Wenhui Zhang; Min-Hui Yuan; Pablo Sanz-Jimenez; Xi-Tong Zhu; Xin-Dong Xu; Zu-Wen Zhou; Zhi-Wei Zhou; Jia-Wu Feng; Siwen Liu; Chunyu Li; Qiaosong Yang; Chunhua Hu; Huijun Gao; Tao Dong; Jiangbo Dang; Qigao Guo; Wenguo Cai; Jianwei Zhang; Ganjun Yi; Jia-Ming Song; Ou Sheng; Ling-Ling Chen

doi:10.1016/j.xplc.2023.100766

Plant Commun. 2023 Nov 15;5(2):100766.


Wen-Zhao Xie¹,²,⁵, Yu-Yu Zheng²,⁵, Weidi He¹,⁵, Fangcheng Bi¹, Yaoyao Li¹, Tongxin Dou¹, Run Zhou², Yi-Xiong Guo², Guiming Deng¹, Wenhui Zhang², Min-Hui Yuan³, Pablo Sanz-Jimenez², Xi-Tong Zhu³, Xin-Dong Xu³, Zu-Wen Zhou³, Zhi-Wei Zhou², Jia-Wu Feng², Siwen Liu¹, Chunyu Li¹, Qiaosong Yang¹, Chunhua Hu¹, Huijun Gao¹, Tao Dong¹, Jiangbo Dang⁴, Qigao Guo⁴, Wenguo Cai³, Jianwei Zhang², Ganjun Yi¹,∗, Jia-Ming Song³,∗∗, Ou Sheng¹,∗∗∗, Ling-Ling Chen³,∗∗∗∗

PMCID: PMC10873913 PMID: 37974402

Abstract

Bananas (Musa spp.) are one of the world’s most important fruit crops and play a vital role in food security for many developing countries. Most banana cultivars are triploids derived from inter- and intraspecific hybridizations between the wild diploid ancestor species Musa acuminata (AA) and M. balbisiana (BB). We report two haplotype-resolved genome assemblies of the representative AAB-cultivated types, Plantain and Silk, and precisely characterize ancestral contributions by examining ancestry mosaics across the genome. Widespread asymmetric evolution is observed in their subgenomes, which can be linked to frequent homologous exchange events. We reveal the genetic makeup of triploid banana cultivars and verify that subgenome B is a rich source of disease resistance genes. Only 58.5% and 59.4% of Plantain and Silk genes, respectively, are present in all three haplotypes, and >50% of genes are differentially expressed alleles in different subgenomes. We observed that significantly more genes were upregulated in Plantain than in Silk at one week post-inoculation with Fusarium wilt tropical race 4 (Foc TR4), which confirms that Plantain can initiate defense responses faster than Silk. Additionally, we compared genomic and transcriptomic differences among genes related to carotenoid synthesis and starch metabolism between Plantain and Silk. Our study provides resources for better understanding the genomic architecture of cultivated bananas and has important implications for Musa genetics and breeding.

Key words: bananas, Plantain, Silk, asymmetric evolution, homologous exchange, Fusarium wilt

This study reports two haplotype-resolved genome assemblies for AAB banana cultivars (Plantain and Silk) and the characterization of the ancestry mosaics across their genomes. Widespread asymmetric evolution in their subgenomes is linked to frequent homologous exchange events. Plantain can initiate defense responses more quickly than Silk following banana Foc TR4 infection.

Introduction

Bananas (Musa spp.), the world’s largest herbaceous plants, are primarily grown in tropical and subtropical regions and are of great significance to human societies (Kema and Drenth, 2020). Dessert varieties, such as Cavendish, are some of the most widely traded fruits globally (FAOSTAT Crops, 2022). Starchy cooking varieties, such as Plantains, are staple crops that contribute significantly to the diets of many developing countries (Robinson and Sauco, 2010). Most cultivated bananas are seedless triploid varieties (2n = 3× = 33) that arose through intra- or interspecific hybridization between Musa acuminata (A genome) and Musa balbisiana (B genome) (Simmonds and Shepherd, 1955). Plantains are an important subgroup of cooking-type bananas that serve as a major dietary component in numerous African, Latin American, and Caribbean countries (Robinson and Sauco, 2010). In the major banana-producing countries, per capita consumption of Plantains ranges from 40 kg/year in the Democratic Republic of the Congo to 153 kg/year in Gabon (Akyeampong and Escalant, 1998). Plantain fruits have orange flesh and contain high levels of carotenoids (Sheng et al., 2023). Most Plantain varieties are resistant to Fusarium wilt caused by Fusarium oxysporum f. sp. cubense tropical race 4 (Foc TR4), the most destructive pathogen threatening banana production worldwide (Zhan et al., 2022). Genomic in situ hybridization studies have confirmed that Plantains with an AAB genome have 21 A and 12 B chromosomes (D’Hont et al., 2000). Silk is a dessert banana that is widely distributed across South Asia, Southeast Asia, South America, and Australia. It is a moderately vigorous plant that produces exceptionally flavorful fruits with white flesh and a sub-acid, apple-like flavor. In contrast to Plantains, the Silk subgroup is highly susceptible to Fusarium wilt (Zhan et al., 2022).

Polyploid genomes are difficult to assemble because of the multiple duplication events, including whole-genome duplications (WGDs) and segmental duplications, that often occur during plant evolution. These duplications often cause repetitive sequences to be merged into a single collapsed region during genome assembly, which can lead to incorrect links between multiple genomic regions. The first draft genome assembly of M. acuminata ssp. malaccensis (DH-Pahang) was published in 2012 (D’Hont et al., 2012) and refined in 2016 (Martin et al., 2016), and a telomere-to-telomere assembly was constructed in 2021 (Belser et al., 2021). The M. balbisiana (DH-PKW) genome was assembled in 2019 (Wang et al., 2019). However, no cultivated AAB banana genome had been sequenced before this study.

The complex nature of allotriploid genomes makes it challenging to investigate the molecular mechanisms underlying specific traits, mainly because allelic sequence variants are difficult to distinguish from one another. In this study, we present haplotype-resolved genomes of two allotriploid cultivated bananas: Plantain and Silk. Our findings reveal a highly complex origin of the A subgenome in cultivated bananas, and our comparison of the A1, A2, and B subgenomes provides key insights into the evolution, genetic diversity, and functional divergence of these subgenomes. Through transcriptomic and functional analyses, we show that Plantain has more differentially expressed genes (DEGs) than Silk at an earlier stage of Foc TR4 infection. We also examine the genetic differences underlying carotenoid production and starch metabolism in the two genomes by analyzing genomic and transcriptomic data from different developmental and postharvest stages. We find that an insertional mutation in the CRTISO gene in Plantain, and gene number variation in Silk, may be tightly correlated with banana quality. Our assemblies overcome limitations of the allotriploid genome assemblies available to date and provide a solid basis for understanding the origin, domestication, and genetic features of cultivated bananas.

Results

Haplotype assembly and annotation of two AAB banana genomes: Plantain and Silk

Karyotype analyses confirmed that Plantain and Silk bananas have highly complex allotriploid (2n = 3× = 33) genomes (Figure 1A and Supplemental Figure 1; Supplemental Table 1). The genome sizes of Plantain and Silk were estimated at 1.69 Gb and 1.52 Gb, with heterozygosity of 2.58% and 2.90%, respectively. Plantain and Silk were sequenced separately using 59 Gb (35×) and 38 Gb (25×) of PacBio HiFi reads, 272 Gb (161×) and 189 Gb (124×) of PacBio CLR long reads, 167 Gb (98×) and 233 Gb (153×) of Illumina reads, and 205 Gb (121×) and 230 Gb (151×) of Hi-C (high-throughput/resolution chromosome conformation capture) reads, respectively (Supplemental Table 2). Using the haplotype phasing and genome assembly pipeline shown in Figure 1B, we generated three haplotypes per cultivar with contig N50 values of 2.01–2.92 Mb (Table 1; Supplemental Figure 2). The switch error rates between haplotypes A and B were 0.51% and 0.46% for Plantain and Silk, respectively, and those between haplotypes A1 and A2 were 3.65% and 3.28% (Supplemental Figure 3). Over 90% of the Plantain reads and 93% of the Silk reads were anchored to the final chromosomes (Supplemental Figure 4; Supplemental Table 3). The centromeric regions spanned 0.3–3.7 Mb in Plantain and 0.3–6.5 Mb in Silk and contained 424 protein-coding genes for both banana types (Supplemental Figure 5; Supplemental Table 4). More than half of the telomeres were identified in the Plantain and Silk genomes (16–19 telomeres/subtelomeres of the 22 expected per haplotype; Table 1 and Supplemental Table 5).
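The depths in parentheses follow from dividing each data volume by the estimated genome size. The minimal Python sketch below applies that arithmetic to the numbers reported in this paragraph. The dictionary and function names are illustrative only. It reproduces the reported depths to within about 1× (for example, 167 Gb / 1.69 Gb ≈ 98.8×, reported as 98×):

```python
# Expected mean sequencing depth = total sequenced bases / estimated genome size.
GENOME_SIZE_GB = {"Plantain": 1.69, "Silk": 1.52}

# Data volumes (Gb) per platform, as reported in the text.
DATA_GB = {
    "Plantain": {"HiFi": 59, "CLR": 272, "Illumina": 167, "Hi-C": 205},
    "Silk": {"HiFi": 38, "CLR": 189, "Illumina": 233, "Hi-C": 230},
}


def fold_coverage(data_gb: float, genome_gb: float) -> float:
    """Return the expected mean depth (x) for data_gb of reads over a genome_gb genome."""
    if genome_gb <= 0:
        raise ValueError("genome size must be positive")
    return data_gb / genome_gb


if __name__ == "__main__":
    for cultivar, platforms in DATA_GB.items():
        for platform, gb in platforms.items():
            depth = fold_coverage(gb, GENOME_SIZE_GB[cultivar])
            print(f"{cultivar:<8} {platform:<8} {depth:6.1f}x")
```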

Figure 1. Overview of the Plantain and Silk genome assemblies and features.

**(A)** Karyotype of Plantain. Scale bar corresponds to 5 μm.

**(B)** Flowchart of genome assembly and haplotype phasing processes.

**(C)** Circos diagram of Plantain and Silk. Tracks from outer to inner: contigs and gaps (a); GC (guanine and cytosine) content (window size of 500 kb) (b); gene density (window size of 100 kb) (c); transposable element (TE) density (window size of 100 kb) (d); SNP density (window size of 100 kb) (e); and HiFi, CLR, and Illumina read coverage (window size of 100 kb) (f). For each track, the outer and inner layers refer to Plantain and Silk, respectively.

Table 1. Characteristics of the Plantain (Batard) and Silk (Figue Pomme Géante) assemblies.

| Genomic feature | Plantain A1 | Plantain A2 | Plantain B | Silk A1 | Silk A2 | Silk B |
|---|---|---|---|---|---|---|
| **Assembly** | | | | | | |
| Total size of assembled scaffold (bp) | 499 703 824 | 458 762 907 | 455 996 872 | 496 873 197 | 481 651 324 | 486 519 213 |
| Number of scaffolds | 26 | 19 | 110 | 20 | 20 | 73 |
| Contig N50 (Mb) | 2.64 | 2.31 | 2.01 | 2.92 | 2.91 | 2.09 |
| Number of telomeres/subtelomeres | 16 | 17 | 17 | 19 | 17 | 18 |
| GC content (%) | 39.44 | 39.48 | 38.31 | 39.13 | 39.15 | 38.27 |
| **Annotation** | | | | | | |
| Total number of anchored genes | 30 589 | 28 733 | 27 768 | 31 445 | 31 302 | 30 615 |
| Unanchored genes | 191 | 249 | 548 | 679 | 441 | 506 |
| Number of genes with annotated alleles | 28 232 | 26 718 | 21 414 | 28 055 | 28 392 | 23 702 |
| Number of genes with NLRs | 70 | 64 | 68 | 68 | 62 | 65 |
| Number of genes with WRKYs | 151 | 139 | 145 | 168 | 161 | 152 |
| Total size of transposable elements (bp) | 156 020 531 | 147 163 447 | 140 319 970 | 157 318 644 | 155 883 275 | 151 641 174 |
| **Assessment** | | | | | | |
| BUSCOs of assembly (%) | 93.3 | 90.0 | 85.8 | 93.5 | 96.4 | 92.9 |
| BUSCOs of annotation (%) | 89.4 | 86.9 | 80.3 | 89.3 | 91.9 | 86.9 |
| LAI score | 18.80 | 18.64 | 14.78 | 18.68 | 19.69 | 14.69 |
| Merqury quality value (completeness)ᵃ | 59.0286 (96.78%) | | | 60.7093 (96.81%) | | |

ᵃ A single Merqury value is reported per cultivar, spanning all three haplotypes.


The long terminal repeat (LTR) assembly indexes (LAIs) (Ou et al., 2018) of the six haplotypes ranged from 14.69 to 19.69, and on average ∼92% of BUSCO (Simão et al., 2015) plant reference genes were recovered per haplotype assembly (Table 1; Supplemental Figure 2; Supplemental Table 6). Consensus quality values estimated using Merqury (Rhie et al., 2020) were 59.03 (96.78% completeness) for Plantain and 60.71 (96.81%) for Silk (Table 1). The accuracy and completeness of the assemblies were further supported by high mapping rates of PacBio HiFi reads, PacBio long reads, and Illumina reads (Supplemental Table 6). Subgenomes A and B of Plantain and Silk were highly consistent with the published M. acuminata (A genome) and M. balbisiana (B genome) genomes (Wang et al., 2019; Belser et al., 2021) and were highly collinear with them (Supplemental Figures 2 and 6). PCR amplification of alleles confirmed the phasing accuracy of haplotypes A1 and A2 in Plantain and Silk (Supplemental Figure 2F; Supplemental Table 7). Together, these results indicate that the Plantain and Silk assemblies are of high quality.
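Merqury's QV is Phred-scaled: QV = −10 log₁₀(E), where E is the estimated per-base consensus error rate. As a hedged illustration, the sketch below converts the Table 1 values back to error rates, giving roughly one error per 0.8 Mb for Plantain and one per 1.2 Mb for Silk. The two functions are hypothetical helpers and are not part of Merqury:

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value (QV = -10 * log10(E)) to a per-base error rate E."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion; error_rate must lie in (0, 1]."""
    if not 0 < error_rate <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# Assembly-level QVs from Table 1.
for name, qv in {"Plantain": 59.0286, "Silk": 60.7093}.items():
    e = qv_to_error_rate(qv)
    print(f"{name}: QV {qv:.2f} -> {e:.2e} errors/base, about 1 per {1 / e / 1e6:.1f} Mb")
```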

The Plantain and Silk assemblies contained 56.47% and 54.37% transposable elements (TEs), respectively, consistent with findings in other banana varieties of the Musa genus (Supplemental Table 8). TEs in intergenic regions accounted for 74.90% and 72.35% of all TEs in Plantain and Silk, respectively, whereas TEs in exonic regions accounted for only 3.69% and 3.57%. Compared with diploid genomes, intact LTRs in the AAB genomes were inserted more recently and more frequently, suggesting that transposons are more active in triploid bananas (Supplemental Figure 7). The Plantain and Silk genomes contained 12,885 and 12,069 intact long terminal repeat retrotransposons (LTR-RTs), respectively, of which 66.74% in Plantain and 66.49% in Silk were inserted within the last one million years, after M. acuminata and M. balbisiana diverged. These recent insertions may have contributed to recent gene duplication events and banana domestication (Supplemental Figure 7; Supplemental Table 9).
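Insertion times of intact LTR-RTs are commonly estimated from the divergence K between an element's 5' and 3' LTRs, which are identical at insertion, using T = K/(2μ). This section does not state the substitution rate μ or the distance correction used. The sketch below therefore assumes a Jukes–Cantor correction and takes μ as a caller-supplied parameter. The 1% divergence and μ = 1.3 × 10⁻⁸ in the example are hypothetical inputs, not values from this study:

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ltr_insertion_time(p: float, rate: float) -> float:
    """Estimated years since insertion of an intact LTR-RT: T = K / (2 * rate).

    p: proportion of differing sites between the element's 5' and 3' LTRs.
    rate: substitutions per site per year (an assumed input; not given in the text).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return jukes_cantor_distance(p) / (2 * rate)


def fraction_younger_than(times_years, cutoff_years=1e6):
    """Fraction of elements inserted more recently than cutoff_years."""
    times = list(times_years)
    return sum(t < cutoff_years for t in times) / len(times) if times else 0.0


# Hypothetical inputs: 1% LTR divergence and an assumed rate of 1.3e-8.
t = ltr_insertion_time(0.01, 1.3e-8)
print(f"{t / 1e6:.2f} million years")
```

A summary like "66.74% inserted within the last one million years" corresponds to `fraction_younger_than` applied to the per-element times.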

The Plantain and Silk genomes encode 88,078 and 94,988 protein-coding genes, respectively, with an average coding sequence length of approximately 1.2 kb and an average of five exons per gene (Supplemental Table 10). Functional annotations were available for 97.84% and 98.28% of the genes in Plantain and Silk, respectively. In addition, 30,346 and 31,267 noncoding RNAs were annotated in Plantain and Silk, respectively (Supplemental Table 11). Most of the nucleotide-binding leucine-rich repeat (NLR) immune receptors identified in each accession were of the coiled-coil (CNL) type, followed by NB-and-LRR-only proteins (NLs), CCR-NLRs (RNLs), and TIR-NLRs (TNLs), and they were unevenly distributed across the chromosomes (Supplemental Figure 8; Supplemental Table 12). Furthermore, 435 and 481 putative WRKY genes were identified in Plantain and Silk, respectively. These genes were highly expressed in rhizomes, root tips, and roots, particularly in response to Foc TR4 infection (Supplemental Figure 8D).

Phylogenetic relationships between Musaceae and the ancestors of Plantain and Silk bananas

We constructed a phylogenetic tree of the Musaceae family to determine the evolutionary positions of Plantain and Silk within the family (Figure 2A; Supplemental Table 13). Plantain subgenomes A1 (Pa1) and A2 (Pa2) were more closely related to M. acuminata ssp. banksii than were Silk subgenomes A1 (Sa1) and A2 (Sa2), possibly owing to varietal differences. The B subgenomes (Pb for Plantain and Sb for Silk) fell in the same clade as M. balbisiana, which supports a previous study showing that Plantain and Silk originated from a cross between the AA and BB genomes (Cenci et al., 2021). A functional enrichment analysis revealed that expanded gene families in Plantain are enriched in the categories of protein kinase activity, transferase activity, and response to stress, whereas expanded gene families in Silk are enriched in organic substance biosynthetic process, phosphorus metabolic process, and protein metabolic process (Supplemental Figure 9A; Supplemental Table 14). Notably, the expanded gene families in Plantain are more closely related to stress resistance than those in Silk.

Figure 2. Phylogenetic relationships between members of the Musaceae family and genome ancestry mosaics for the triploid cultivars Plantain and Silk.

**(A)** Phylogenetic tree of Plantain, Silk, and nine other Musaceae species (*Ensete glaucum* [Snow banana], *M. textilis* [Abaca], *M. troglodytarum* [Utafun], *M. balbisiana* [DH-PKW], *M. schizocarpa* [Schizocarpa], *M. acuminata* ssp. *burmannica* [Calcutta 4], *M. acuminata* ssp. *zebrina* [Maia oa], *M. acuminata* ssp. *malaccensis* [DH-Pahang], and *M. acuminata* ssp. *banksii* [Banksii]), with divergence times estimated from single-copy orthologous gene families.

**(B)** Density distributions of the Ks values for homologous genes. Wvi, *Wurfbainia villosa*.

**(C)** Pedigree composition of subgenome A in the triploid cultivated bananas Plantain and Silk.

**(D)** Chromosome ancestry painting of Plantain. Contributions from ancestral groups are shown along the 11 chromosomes as segments of different colors (green, *M. acuminata* ssp. *banksii*; red, *M. acuminata* ssp. *zebrina*; blue, *M. schizocarpa*).

WGDs have played a significant role in angiosperm genome evolution. A previous study suggested that Musaceae underwent three WGD events, namely the α/β and γ events (Lescot et al., 2008). Based on Ks peaks in pairwise genome comparisons, we estimated that the α/β event occurred approximately 58.67–59.67 million years ago (Ks = 0.528–0.537), which differs from the timing of the WGDs in W. villosa of the Zingiberaceae family (Yang et al., 2021). The γ event occurred 98.56–100 million years ago (Ks = 0.887–0.900) and was detected in both the Musaceae and Zingiberaceae families (Figure 2B and Supplemental Figure 9B). Because the α and β WGD events occurred close together, the Ks values of their collinear blocks could not be fully separated. However, we did observe collinear regions on chromosomes 3, 6, 10, and 11 in subgenome A1 of Silk at Ks ≈ 0.5. Most paralogous gene clusters were related to three other clusters present in all of the subgenomes, indicating that more than two WGDs had occurred (Supplemental Figure 10).
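Ks-based dating uses T = Ks/(2r), where r is the synonymous substitution rate per site per year. This paragraph does not state the rate. However, r ≈ 4.5 × 10⁻⁹ reproduces all four reported endpoints (for example, 0.528/(2 × 4.5 × 10⁻⁹) ≈ 58.67 million years), so the sketch below uses that back-calculated value as an assumption:

```python
# Assumed synonymous substitution rate (substitutions/site/year), back-calculated
# from the Ks ranges and dates reported in the text rather than stated there.
RATE = 4.5e-9


def ks_to_mya(ks: float, rate: float = RATE) -> float:
    """Convert a Ks value to an age in millions of years via T = Ks / (2 * rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * rate) / 1e6


for event, (ks_low, ks_high) in {"alpha/beta": (0.528, 0.537), "gamma": (0.887, 0.900)}.items():
    print(f"{event}: {ks_to_mya(ks_low):.2f}-{ks_to_mya(ks_high):.2f} Mya")
```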

Understanding patterns of interspecific introgression can reveal the origins of cultivated bananas (Martin et al., 2023). We characterized the ancestral contributions to Plantain and Silk by examining the ancestry mosaics along their genomes (Supplemental Figures 11 and 12). The A subgenomes of both cultivars had multiple contributors: five in Plantain and four in Silk. In Plantain, we found a dominant contribution (85.54%) from M. acuminata ssp. banksii, along with introgressions from M. acuminata ssp. malaccensis (5.07%), M. acuminata ssp. zebrina (3.11%), M. schizocarpa (4.06%), and M. balbisiana (0.36%). Silk, by contrast, derived mainly (59.86%) from M. acuminata ssp. malaccensis, with additional regions acquired from M. acuminata ssp. banksii (29.22%), M. acuminata ssp. zebrina (9.55%), and M. schizocarpa (1.06%) (Figure 2C; Supplemental Table 15). These results indicate that subgenome A arose through an extremely complex hybridization process. Notably, we did not observe any contribution from M. acuminata ssp. burmannica in either the Plantain or the Silk triploid, and their B subgenomes were homogeneous (Figure 2D and Supplemental Figure 13). Our findings indicate that the origin of cultivated bananas is more complex than expected and involved multiple hybridization events.
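Ancestry percentages like those above summarize a chromosome painting: the length assigned to each ancestor divided by the total painted length. The sketch below computes that summary, assuming non-overlapping, half-open segments. The segments and the function name are hypothetical. The reported percentages sum to 98.14% for Plantain and 99.69% for Silk, so a small fraction of sequence is apparently not attributed to any listed ancestor. Such regions could be carried as a separate label:

```python
from collections import defaultdict


def ancestry_proportions(segments):
    """Percentage of painted length per ancestor.

    segments: iterable of (chrom, start, end, ancestor) with half-open [start, end)
    coordinates, assumed non-overlapping. Unassigned regions can use a label
    such as "unassigned".
    """
    totals = defaultdict(int)
    for _chrom, start, end, ancestor in segments:
        if end <= start:
            raise ValueError("segment end must be greater than start")
        totals[ancestor] += end - start
    painted = sum(totals.values())
    return {anc: 100 * length / painted for anc, length in totals.items()}


# Hypothetical segments for illustration only (not data from this study).
toy = [
    ("chr01", 0, 8_000_000, "banksii"),
    ("chr01", 8_000_000, 9_000_000, "zebrina"),
    ("chr02", 0, 1_000_000, "malaccensis"),
]
print(ancestry_proportions(toy))
```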

We further investigated genomic variations between the two AAB genomes. The alignment between the Plantain and Silk genomes revealed high collinearity. We found a total of 12,127,733 SNPs and 1,699,094 InDels between the two genomes, an average of approximately 8.42 SNPs and 1.18 InDels per kilobase (Supplemental Table 16). The distributions of SNPs and InDels were positively correlated, and both were more abundant in intergenic regions (Supplemental Figure 14). We identified 84.70 Mb of inversions between the Plantain and Silk genomes (Supplemental Table 17) and confirmed three of these inversions by aligning PacBio HiFi reads to the assemblies (Supplemental Figure 15). Between the haplotypes of Plantain and Silk, we identified 105–255 and 142–240 inversions, respectively. We also identified 55.81 Mb of translocations, consisting of 7435 interchromosomal and 4148 intrachromosomal translocations (Supplemental Table 17). We further characterized 3886–17,234 presence/absence variation (PAV) regions, with a cumulative length of 11.04–67.14 Mb, that were associated with 743–4262 genes (Supplemental Tables 17 and 18). A KEGG enrichment analysis revealed that messenger RNA biogenesis and starch and sucrose metabolism were the two most highly enriched pathways (Supplemental Figure 16).
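The genome-wide densities follow from count divided by compared length. Both 12,127,733 SNPs at 8.42 per kb and 1,699,094 InDels at 1.18 per kb imply roughly 1.44 Gb of compared sequence. Windowed densities, such as the 100-kb tracks in the Circos plot, can be computed as in the sketch below. It assumes 0-based variant positions on a single chromosome, and the function name is hypothetical:

```python
from collections import Counter


def variant_density_per_kb(positions, chrom_length, window=100_000):
    """Variants per kb in fixed, non-overlapping windows along one chromosome.

    positions: 0-based variant positions; positions outside the chromosome are ignored.
    Returns a list of (window_start, variants_per_kb); the last window may be shorter.
    """
    counts = Counter(pos // window for pos in positions if 0 <= pos < chrom_length)
    result = []
    for start in range(0, chrom_length, window):
        width = min(window, chrom_length - start)
        result.append((start, counts[start // window] / (width / 1000)))
    return result


print(variant_density_per_kb([0, 10, 150_000], 250_000))
```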

Asymmetric evolution between subgenomes in the allotriploid genomes

The loss of homologous genes is a common phenomenon that occurs after polyploidy (Zhao et al., 2017). We observed that regions of gene loss overlapped significantly with homologous exchange (HE) regions, suggesting the loss of chromosomal segments after HE is a key factor in gene loss (Figure 3A and Supplemental Figures 17 and 18). Compared with M. acuminata ssp. malaccensis and M. balbisiana, Plantain lost 6508 genes (3463 in Pa and 3045 in Pb), whereas Silk lost 5237 genes (2917 in Sa and 2320 in Sb). Interestingly, more genes were lost from subgenome A than subgenome B for both Plantain and Silk (Supplemental Table 19). Specifically, more genes were lost from the WRKY33 disease resistance gene family (Zhou et al., 2022) in subgenome A than in subgenome B (Figure 3B). A manual inspection of individual missing genes revealed that only 17.12%–50.75% of the identified lost genes were completely absent, whereas 25.06%–37.85% of the lost genes actually corresponded to gene alterations caused by SNPs, InDels, and TEs. The remaining 24.19%–45.02% of lost genes were not annotated as genes due to a lack of expression (Supplemental Figure 19).

Figure 3. Subgenomic differentiation, asymmetric fractionation, and expression of haplotypes.

**(A)** Asymmetry analysis between subgenomes Pa1 and Pb. All gene loss regions between Pa1 and Pb are shown as yellow blocks. The DEA percentage distribution is plotted above or below each chromosome in 100-kb bins. HEs are indicated by boxed areas. Black triangles indicate the presence of telomere sequence repeats. Collinear regions between Pa1 and Pb are linked by gray lines.

**(B)** Phylogenetic tree of the WRKY33 gene family in Plantain; the heatmap shows expression (log₂(TPM)) at each time point after pathogen infection. In the tree, "missing" marks genes lost from the A1 or A2 subgenome, and genetic variations resulting in functional changes are shown in brown type.

**(C)** Identification strategies and statistics for alleles. Allelic gene pairs were selected according to the following rules: (1) paired regions must be on homologous haplotypes; (2) for one-to-many pairs, the pair with the higher C-score, score(A, B)/max(score(A), score(B)), is selected; (3) three mutually paired genes are three alleles, two mutually paired genes are two alleles, and all others are one allele; and (4) syntenic gene pairs defined above must be double-checked manually.

**(D)** DEAs have higher Ka and Ka/Ks values than equivalently expressed alleles (EEAs) in Plantain. P values were calculated using a two-tailed Student’s t-test.

**(E)** SNP density in Plantain DEA/EEA features.
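Rule (2) in panel (C) can be sketched as follows. The caption gives only the ratio. This sketch assumes that score(A) is the best alignment score of gene A against any partner, as the C-score is commonly defined in MCScan-style synteny tools. That interpretation, the gene IDs, and the function names are all assumptions. Rules (1), (3), and (4) are not implemented:

```python
from collections import defaultdict


def c_scores(hits):
    """C-score for each candidate pair: score(A, B) / max(best(A), best(B)).

    hits: dict mapping (gene_a, gene_b) -> alignment score (e.g. a BLAST bit score).
    best(X) is taken as the highest score involving gene X in any pair (an assumption).
    """
    best = defaultdict(float)
    for (a, b), s in hits.items():
        best[a] = max(best[a], s)
        best[b] = max(best[b], s)
    return {pair: s / max(best[pair[0]], best[pair[1]]) for pair, s in hits.items()}


def resolve_one_to_many(hits):
    """For each gene on the first haplotype, keep the partner with the highest C-score."""
    chosen = {}
    for (a, b), c in c_scores(hits).items():
        if a not in chosen or c > chosen[a][1]:
            chosen[a] = (b, c)
    return {a: b for a, (b, _) in chosen.items()}


# Hypothetical hits between haplotype A1 and A2 genes.
hits = {("A1_g1", "A2_g1"): 500.0, ("A1_g1", "A2_g2"): 300.0, ("A1_g2", "A2_g2"): 450.0}
print(resolve_one_to_many(hits))
```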

Based on the phased haplotypes, 58.46% and 59.37% of the annotated genes in Plantain and Silk, respectively, were present in all three subgenomes, 28.25% and 25.01% were present in two subgenomes, and 12.18% and 13.91% were present in one subgenome. On average, each gene was present in 2.44 copies in Plantain and 2.42 copies in Silk (Figure 3C; Supplemental Table 20). We calculated Ka and Ks values between allelic pairs to assess the evolutionary rate of alleles and found that most alleles had low Ka/Ks values (<0.05) (Supplemental Figure 20). Approximately 3.81% and 4.35% (3352 and 4132) of the allelic pairs showed possible positive selection (Ka/Ks > 1) (Supplemental Table 21). We observed a positive correlation between allelic copy number and gene expression, consistent with previous analyses of the effects of copy number variation on gene expression (Pham et al., 2017) (Supplemental Figure 21). To compare homoeologous gene expression and its divergence among the three subgenomes, we compared genome-wide transcription levels of 17,162 and 18,799 homoeologous gene pairs between subgenomes A and B across different tissues in Plantain and Silk (Supplemental Figure 22). A total of 9014 and 10,015 homoeologous gene pairs (∼62.73% and ∼64.04%) showed more than two-fold expression differences in at least one tissue, including 4669/5774 and 5430/5578 homoeologs with higher expression in subgenomes A and B, respectively. Among these homoeologs, 3584/4437 and 4345/4241 had higher expression in all tissue types from subgenomes A and B, respectively, whereas 1085 subgenome A homoeologs and 1337 subgenome B homoeologs had tissue-dependent expression levels (Supplemental Table 22). These results reveal asymmetric expression patterns between subgenomes A and B in Plantain and Silk.
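Two calculations in this paragraph can be sketched briefly. The mean copy number is the copy-weighted sum of the reported percentages (3 × 58.46% + 2 × 28.25% + 1 × 12.18% ≈ 2.44 for Plantain). The homoeolog comparison applies a two-fold threshold (|log₂ ratio| > 1) in each tissue. The pseudocount and category labels in the classifier below are illustrative assumptions, not the authors' exact procedure:

```python
import math


def mean_copies(percent_by_copies):
    """Copy-weighted mean from {copies: percentage of genes}, e.g. {3: 58.46, 2: 28.25, 1: 12.18}."""
    return sum(copies * pct / 100 for copies, pct in percent_by_copies.items())


def homoeolog_bias(expr_a, expr_b, pseudocount=1.0):
    """Classify an A/B homoeolog pair across tissues with a two-fold threshold.

    expr_a, expr_b: expression values (e.g. TPM) of the A and B copies in the same
    tissues and order. The pseudocount avoids division by zero (an assumption).
    Returns "A" or "B" if that copy is >2-fold higher in every tissue, "variable" if
    at least one tissue exceeds two-fold but the bias is not consistent across all
    tissues, and "balanced" otherwise.
    """
    if len(expr_a) != len(expr_b) or not expr_a:
        raise ValueError("need matched, non-empty expression vectors")
    lfc = [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(expr_a, expr_b)]
    if all(x > 1 for x in lfc):
        return "A"
    if all(x < -1 for x in lfc):
        return "B"
    if any(abs(x) > 1 for x in lfc):
        return "variable"
    return "balanced"


print(f"Plantain: {mean_copies({3: 58.46, 2: 28.25, 1: 12.18}):.2f} copies/gene")
print(f"Silk: {mean_copies({3: 59.37, 2: 25.01, 1: 13.91}):.2f} copies/gene")
```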

We observed a log-linear increase in the number of differentially expressed alleles (DEAs) as the number of RNA-sequencing (RNA-seq) samples increased, which plateaued after 35 and 23 samples for Plantain and Silk, respectively (Supplemental Figure 23; Supplemental Table 23). A total of 52,338 and 49,388 DEAs were identified in Plantain and Silk, respectively (Figure 3D; Supplemental Table 24). Of these DEAs, 25.76% in Plantain and 23.97% in Silk showed significant expression differences among the three alleles. Notably, DEAs had significantly higher Ka (t-test, P < 2.2 × 10⁻¹⁶) and Ka/Ks (t-test, P = 2.0 × 10⁻⁷) values than non-DEAs, suggesting that DEAs evolve more rapidly. The promoters, exons, introns, 5' UTRs, and 3' UTRs of DEAs had higher SNP densities than those of non-DEAs, which may underlie their expression differences (Figure 3E and Supplemental Figure 23D).
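The Ka and Ka/Ks comparisons use a two-sample Student's t-test. The minimal standard-library sketch below computes the pooled-variance t statistic. It should match the statistic from `scipy.stats.ttest_ind` with its default `equal_var=True`. The two-tailed P value would then come from the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)`. Very small P values are commonly reported as < 2.2 × 10⁻¹⁶, the lower bound that R prints for `t.test`. The function name is illustrative:

```python
import math
from statistics import mean, variance


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance (equal-variance assumption).

    Returns (t, degrees_of_freedom).
    """
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


# Toy Ka values (hypothetical) for DEAs and non-DEAs.
print(student_t([0.021, 0.034, 0.029, 0.040], [0.012, 0.018, 0.015, 0.011]))
```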

We also investigated the expression of the NLR, WRKY22, and leucine-rich repeat receptor-like kinase (LRR-RLK) disease resistance gene families, as well as genes involved in carotenoid synthesis and the ethylene pathway, in different subgenomes. NLRs, WRKY22, and LRR-RLKs were expressed at higher levels in subgenome B than in subgenome A, which may be due to variations between the alleles (Supplemental Figures 24A–24F and 25). By contrast, carotenoid synthesis genes were expressed at higher levels in subgenome A than in subgenome B. This difference was particularly prominent in Silk, where expression from subgenome A was three times higher than that from subgenome B during fruit decomposition. Similarly, ethylene pathway genes were expressed at higher levels in subgenome A than in subgenome B, and at higher levels in Plantain than in Silk during fruit ripening (Supplemental Figure 24G–24N). These findings suggest that asymmetric evolution has significantly shaped the genetic basis of banana disease resistance: subgenome B appears to contribute more to disease resistance, whereas subgenome A is more involved in carotenoid metabolism and ethylene-induced ripening.

Plantain contains more DEGs at an earlier stage of Foc TR4 infection than Silk

We conducted field and pot experiments to assess differences in resistance to Foc TR4 between Plantain and Silk. Infected Silk plants showed typical disease symptoms, including leaf yellowing and pseudostem splitting, and had an average rhizome discoloration index (RDI) of 3.7 (Figure 4A, 4B, and Supplemental Table 25). By contrast, infected Plantain plants showed no discernible signs of infection and had an RDI of 1 (Figure 4A and Supplemental Table 25). Thickened lignin deposition in the endodermis has been associated with defense against diseases caused by multiple soil-borne pathogens, such as Aphanomyces euteiches (Djébali et al., 2009), Phytophthora sojae (Thomas et al., 2007), and nematodes (Holbein et al., 2019). We observed thicker lignin deposition in the endodermis and vasculature of Plantain roots than in Silk roots (Figure 4C). These findings confirm that Plantain is highly resistant to banana Fusarium wilt, whereas Silk is highly susceptible.

Figure 4. Differential responses between Plantain and Silk upon inoculation with *Foc* TR4.

**(A)** Rhizomes of Plantain and Silk plants inoculated with *Foc* TR4 in the field. A deep golden color develops on the inner rhizome in Silk.

**(B)** Rhizomes of Plantain and Silk inoculated with *Foc* TR4, cut in half longitudinally. The Plantain rhizome showed no brown discoloration in the lower and central regions, whereas the Silk rhizome developed extensive brown discoloration at 3 wpi.

**(C)** Lignin deposition in the roots of Plantain and Silk. Phloroglucinol staining (upper) and autofluorescence (lower) of lignin in the root section. Yellow and red arrowheads indicate thickened lignin deposition at the endodermis and phloem cell walls, respectively. Scale bar corresponds to 100 μm.

**(D)** Venn diagrams of differentially expressed genes from Plantain and Silk at 1 and 3 wpi.

**(E)** Pathway distribution of the 199 differentially expressed genes involved in plant hormone signal transduction. Blue indicates higher and orange indicates lower relative expression in Plantain compared to Silk.

**(F)** Heatmap of gene expression at 1–5 wpi with *Foc* TR4 from (E).

**(G)** Phylogenetic tree of differentially expressed MYBs in Plantain with other known cell-wall-associated MYB transcription factors.

**(H)** *Mp_B_07G08030* (MYB) expression led to enhanced luminescence intensity from LUC driven by the *Mp_A2_01G04240* (PAL) promoter relative to the control. The mean ± SD of three biological replicates is shown along with a Student’s t-test.

**(I)** Electrophoretic mobility shift assay showing a shifted band, indicating binding of MpMYB36 to the PAL promoter.

To better understand the mechanism underlying Fusarium wilt resistance in Plantain, we identified DEGs in both varieties at 0, 1, 2, 3, 4, and 5 weeks post-inoculation (wpi) with Foc TR4 (Supplemental Table 26). At 1 wpi, 1663 genes were upregulated in Plantain compared with 901 in Silk. However, the number of DEGs in Silk increased rapidly after 2 wpi. By 4 and 5 wpi, few DEGs were shared between Plantain and Silk (Figure 4D and Supplemental Figure 26). A KEGG enrichment analysis revealed that Plantain DEGs at 1 wpi were highly enriched in well-known disease resistance pathways such as plant–pathogen interaction, plant hormone signal transduction, and phenylpropanoid biosynthesis (Supplemental Figure 27). By contrast, Silk DEGs at 1 wpi were enriched in metabolic pathways unrelated to disease resistance, a trend that was reversed by 3 wpi (Supplemental Table 27). Overall, Plantain had more DEGs than Silk at an early stage of Foc TR4 infection.
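The paragraph does not name the statistical test behind the KEGG enrichment. A one-sided hypergeometric test is a common choice, so the sketch below assumes it. It returns the probability of observing at least k pathway genes among n DEGs drawn from N annotated genes, K of which belong to the pathway. In practice the resulting P values would also be corrected for multiple testing, for example with Benjamini–Hochberg. The function name and toy counts are hypothetical:

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for pathway over-representation.

    N: annotated background genes; K: background genes in the pathway;
    n: annotated DEGs; k: DEGs that fall in the pathway.
    """
    if not (0 <= k <= min(n, K) and n <= N and K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Toy example: 40 of 1000 DEGs hit a 300-gene pathway in a 20,000-gene background.
print(hypergeom_enrichment_p(40, 1000, 300, 20_000))
```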

To further investigate the differential regulation of pathways involved in plant–pathogen interactions, we focused on DEGs with putative roles in pathogen-associated molecular pattern-triggered immunity and effector-triggered immunity (Supplemental Figure 28). These DEGs included genes encoding peroxidases, as well as RPS2, CDPK, CEBiP, PTI6, PR1, and CML. RT-qPCR of seven of these genes confirmed the RNA-seq results (Supplemental Table 28). We also identified 199 DEGs across the six time points that belong to various plant hormone signaling and response pathways, including the auxin, abscisic acid, ethylene, jasmonic acid, and salicylic acid pathways (Supplemental Table 29). Of these 199 genes, 130 (65.33%) were expressed at higher levels in Plantain than in Silk (Figure 4E). These findings suggest that the banana response to Foc TR4 infection involves multiple phytohormone signaling pathways and responses (Figure 4F).

Phenylpropanoid and flavonoid biosynthesis are part of secondary metabolism and play an important role in plant defense by strengthening cell walls and producing phytoalexins. We examined the expression of lignin biosynthesis genes in Plantain and Silk and found that PAL, 4CL, HCT, CCoAOMT, CCR, CAD, and POD/LAC were induced earlier in Plantain than in Silk upon Foc TR4 infection (Supplemental Table 30). Plantain also had more F5H and COMT genes than Silk. The chalcone synthase gene (Ferrer et al., 1999), which is required for the biosynthesis of the antibacterial flavonoids phytocyanin and anthocyanin in plants, was expressed earlier in Plantain, and more copies were present in the Plantain genome than in the Silk genome (Supplemental Figure 29). These findings demonstrate that genes in the phenylpropanoid biosynthesis pathway are expressed earlier in Plantain than in Silk in response to Foc TR4 infection. We also found that several Plantain genes with putative roles in disease resistance were differentially expressed at 1 wpi, whereas in Silk they were not differentially expressed until 3 wpi (Supplemental Figure 30; Supplemental Table 31).

MYB genes encode a large family of transcription factors that play important roles in regulating lignin biosynthesis (Dubos et al., 2010). Phylogenetic analysis placed MpMYB36 (Mp_B_07G08030) in the same subfamily as AtMYB46, a second-layer master regulator of secondary cell wall biosynthesis (Zhong et al., 2007), and most MYBs in this cluster (Figure 4G) are involved in lignin biosynthesis (Yang et al., 2007; Zhong et al., 2007; McCarthy et al., 2009; Chen et al., 2019). Coexpression network analysis showed that, after Foc TR4 infection, 34 differentially expressed MYBs in Plantain were grouped with other lignin biosynthesis genes in distinct coexpression clusters (Supplemental Table 32). Among these MYBs, MpMYB36 expression was positively correlated with that of 33 other lignin biosynthesis genes (Supplemental Figure 31A and 31B). To further clarify the role of MpMYB36 in promoting secondary cell wall lignin deposition, we analyzed the promoter regions of 11 lignin biosynthesis-related genes. Each promoter contained at least one AC element (an A/C-rich consensus sequence commonly found in the promoters of lignin biosynthesis genes) or secondary cell wall MYB-responsive element (Supplemental Figure 32). In a transient dual-luciferase reporter assay, coexpression of MpMYB36 with LUC driven by the Mp_A2_01G04240 (PAL) or Mp_A2_10G20840 (HCT) promoter significantly increased the LUC/REN ratio (Figure 4H and Supplemental Figure 31C), suggesting that MpMYB36 activates Mp_A2_01G04240 and Mp_A2_10G20840, possibly through direct promoter binding. An electrophoretic mobility shift assay (EMSA) further confirmed that MpMYB36 binds the PAL promoter (Figure 4I). These results indicate that MpMYB36 may contribute to plant defense responses by promoting cell wall strengthening.
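A promoter scan of this kind amounts to searching both DNA strands for short consensus motifs. The Python sketch below is illustrative only: the AC-I/AC-II/AC-III sequences and the secondary wall MYB-responsive element consensus ACC(A/T)A(A/C)(T/C) are commonly cited motif definitions assumed here, not motif definitions taken from this study, and the function names are hypothetical.

```python
import re

# Assumed consensus motifs (not from this study); SMRE uses IUPAC-style classes.
MOTIFS = {
    "AC-I": "ACCTACC",
    "AC-II": "ACCAACC",
    "AC-III": "ACCTAAC",
    "SMRE": "ACC[AT]A[AC][TC]",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_promoter(seq: str) -> list[tuple[str, str, int]]:
    """Return (motif, strand, 0-based start on the + strand) for every hit."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = []
    for name, pattern in MOTIFS.items():
        # Zero-width lookahead so that overlapping matches are all reported.
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(seq):
            hits.append((name, "+", m.start()))
        for m in regex.finditer(rc):
            length = len(m.group(1))
            # Convert the reverse-strand start back to + strand coordinates.
            hits.append((name, "-", n - m.start() - length))
    return sorted(hits, key=lambda h: (h[2], h[0]))
```

Note that every AC element also matches the assumed SMRE consensus, so a single site can be reported under two motif names.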

Genomic insights into carotenoid synthesis and starch metabolism in cultivated banana

Carotenoids are abundant in bananas and can be converted into vitamin A in humans. We collected RNA from five fruit developmental stages (Figure 5A and Supplemental Figure 33) and found that the carotenoid content was higher in Plantain than in Silk at every stage of fruit development (Supplemental Figure 34). We identified 44 and 48 genes of the carotenoid synthesis pathway in Plantain and Silk, respectively (Supplemental Table 33). In both cultivars, early carotenoid biosynthesis genes were expressed at higher levels than late carotenoid biosynthesis genes, and the early genes were generally expressed more highly in Plantain than in Silk (Supplemental Figure 34, Supplemental Table 34). These results suggest that the early biosynthetic steps have a greater impact on carotenoid synthesis. CRTISO genes were also more highly expressed in Plantain, and carotenoids accumulated to higher levels in Plantain than in Silk (Figure 5B and Supplemental Figures 35 and 36). The coding regions of two Plantain CRTISO genes in subgenome A (CRTISO1 and CRTISO2) were identical to that of CRTISO3 in subgenome B. An 87-bp insertion between the fifth and sixth exons was present in Plantain CRTISO1 and CRTISO2 but absent from the CRTISO homologs of Silk, M. acuminata, and M. balbisiana (Figure 5C and Supplemental Figure 37). We also noticed that residues required for lycopene binding by CRTISO1 were not conserved between the Plantain and Silk homologs, and a kinetic simulation of binding showed that the two CRTISO1 homologs differ markedly in the stability of their interaction with lycopene (Supplemental Figure 38). The Plantain CRTISO1 and CRTISO2 genes were also expressed at higher levels than their Silk homologs, suggesting that the 87-bp insertion may enhance CRTISO1 and CRTISO2 activity in Plantain and thereby affect the carotenoid synthesis pathway.

Multiomics differential analysis of carotenoid and starch metabolism.

**(A)** Overview of the carotenoid synthesis pathway. The heatmaps show the expression of each gene at the five developmental stages in Plantain. Genes of the carotenoid synthesis pathway are divided into an "early" group (gray background) and a "late" group. Colors from blue to red indicate low to high expression. PSY, phytoene synthase; PDS, phytoene desaturase; ZISO, ζ-carotene isomerase; ZDS, ζ-carotene desaturase; *CRTISO*, carotenoid isomerase; LCYE, lycopene ε-cyclase; LCYB, lycopene β-cyclase; BCH, β-carotene hydroxylase; ECH, ε-carotene hydroxylase.

**(B)** Bar plot shows the gene expression profile (TPM) of *CRTISO1* at each of the five developmental stages in Plantain and Silk. Line graph shows changes in total carotenoid content.

**(C)** Comparison of *CRTISO* coding sequences (CDSs) in 11 species (Egl, *E. glaucum*; SY137, *M. troglodytarum*; U9, *M. textilis*; BB, *M. balbisiana*; SS, *M. schizocarpa*; Bur, *M. acuminata* ssp. *burmannica*; Zeb, *M. acuminata* ssp. *zebrina*; AA, *M. acuminata* ssp. *malaccensis*; Ban, *M. acuminata* ssp. *banksii*).

**(D)** Overview of the starch synthesis and degradation pathways. GBSS, granule-bound starch synthase; SSS, soluble starch synthase; SBE, starch branching enzyme; DBE, starch debranching enzyme; AMY, α-amylase; BMY, β-amylase; DPE, disproportionating enzyme.

**(E)** Bar graph showing gene expression profiles (TPM) at the eight postharvest stages in Plantain and Silk. The two graphs below show the hydrolysis of fruit starch at the same eight postharvest stages.

**(F)** Circular bar graph showing the number of starch metabolism-related genes in six gene families in four genomes: Plantain, Silk, AA, and BB.

Banana fruit is rich in starch with a high amylose-to-amylopectin ratio, and this starch can be converted into resistant starch by heat–moisture treatment. Resistant starch supports a healthier gut microbiota. To better understand the differences in starch content between Plantain and Silk, we collected fruit at five developmental stages (S1–S5) and eight postharvest stages for comparative analysis. We identified 90 starch metabolism-related genes in the Plantain genome, 28 in the starch synthesis pathway and 62 in the starch degradation pathway (Figure 5D), and 98 in the Silk genome, 30 in the synthesis pathway and 68 in the degradation pathway (Supplemental Table 35). During the early stages of starch synthesis, the average expression of starch synthesis genes was higher in Plantain than in Silk. Starch accumulated mainly during the early stages (S1–S3), and its content peaked at S4–S5 (Supplemental Figure 39). Plantain and Silk both carried more β-amylase genes than M. acuminata and M. balbisiana (Figure 5F; Supplemental Table 36), and both the number and the average expression level of BMY (β-amylase) genes were higher in Silk than in Plantain (Figure 5E). Notably, both amylose and amylopectin were degraded faster in Silk than in Plantain (Supplemental Tables 37 and 38). In summary, genomic and transcriptomic analyses show that Silk carries more starch degradation genes than Plantain and expresses them at higher levels.

Discussion

Most previously published genomes are mosaic assemblies, an approach that loses substantial information for highly heterozygous polyploid species, and producing haplotype-resolved genomes for such species remains challenging. Four main strategies are currently used for phasing. The first assembles contigs, identifies collapsed contigs from read depth and duplicates them, and then phases this augmented set together with the initially phased contigs to obtain a fully haplotype-resolved assembly (Zhang et al., 2021). The second, trio binning (Koren et al., 2018), recovers both parental haplotypes of an F1 individual by partitioning its reads according to parent-specific sequences before assembly, but it is time-consuming and laborious. The third infers regional haplotypes by aligning sequenced reads to reference genomes (Chin et al., 2016), but it is limited by the continuity of the available reference assembly. The fourth uses Hi-C data to create allele-resolved assemblies (Zhang et al., 2019). By combining ultra-high-accuracy PacBio HiFi reads, CLR reads, Hi-C reads, Illumina short reads, and the telomere-to-telomere gapless chromosomes of the ancestral species with both reference-guided read assignment and Hi-C-based phasing, we produced the first two haplotype-resolved assemblies of the allotriploid cultivated bananas Plantain and Silk. Contig N50, GC content, full-length transcript support, and other metrics indicate that these reference genomes are highly complete and accurate, and they provide a basis for further genetic studies of Musa.

Our genome mosaic results revealed complex and distinct hybrid origins for Plantain and Silk involving at least six ancestors. Subgenome A of both cultivars is more complex than initially expected: Plantain is a Banksii-rich cultivar, whereas Silk is a DH-Pahang-rich cultivar. Genomic regions of unknown origin point to additional, as yet unidentified ancestors. Ancestry inferred from whole-genome comparisons is more accurate than ancestry inferred from Illumina reads, most clearly because each locus is placed at its true chromosomal location in both Plantain and Silk. The evidence for recombination between the A and B genomes is clear, confirming that several interspecific hybridization steps occurred during their origin, as previously suggested (Cenci et al., 2021). We also identified M. schizocarpa contributions in both Plantain and Silk; such contributions were previously thought to be restricted to a few M. schizocarpa × M. acuminata cultivars and to East African highland bananas. Our results for Silk also differ substantially from those reported by Martin et al. (2023), which we speculate is because they analyzed a different Silk cultivar.

Musaceae species share three WGD events, and the variation and loss of genomic fragments following these WGDs have led to major changes in gene families across species. Our results indicate that the subgenomes of polyploid bananas diverged functionally after WGD. Homoeologous exchanges may obscure the signal of expression dominance in the subgenomes of allopolyploids and can trigger a series of rapid genetic and epigenetic modifications of agronomic traits (Bird et al., 2018). Asymmetric subgenome fractionation occurs in allopolyploids, primarily through the accumulation of small deletions in gene clusters via illegitimate recombination. We observed a large overlap between gene loss regions and homologous exchange (HE) regions, indicating that the loss of chromosomal segments after HE is a key factor in gene loss (Figure 3A). Differentially expressed alleles (DEAs) can profoundly affect growth and evolvability; such differential expression could arise from differences in the distribution of expression-associated SNPs in the promoter regions of adjacent genes. Asymmetric evolution has significantly shaped the genetic basis of disease resistance in banana: subgenome B contributes more to disease resistance than subgenome A, which contributes more to carotenoid degradation and ethylene-induced ripening. These findings provide new resources and guidance for genome-based, marker-assisted breeding in bananas.

Fusarium wilt, caused by Foc, is a destructive soil-borne fungal disease that severely threatens the sustainable development of the global banana industry. Foc TR4 can cause severe yield losses in Silk, whereas cooking bananas such as Plantain appear to be resistant (Zuo et al., 2018). MYB transcription factors play an important role in regulating lignin biosynthesis (Dubos et al., 2010), and we identified a MYB transcription factor on chr07B of Plantain whose expression is strongly positively correlated with that of 33 lignin biosynthesis genes (Supplemental Figure 31A and 31B). A dual-luciferase assay showed that this factor, MpMYB36 (Mp_B_07G08030), activates the promoters of PAL (Mp_A2_01G04240) and HCT (Mp_A2_10G20840), and an EMSA further confirmed that MpMYB36 binds the PAL (Mp_A2_01G04240) promoter (Figure 4H, 4I, and Supplemental Figure 31C). Genes of the phenylpropanoid biosynthesis pathway are also expressed earlier in Plantain than in Silk in response to Foc TR4 infection (Supplemental Figure 29; Supplemental Table 30). Phenylpropanoid biosynthesis is part of secondary metabolism and plays an important role in plant defense by strengthening cell walls. Although we lack direct evidence that MpMYB36 contributes to Foc TR4 resistance in Plantain, we speculate that, by regulating PAL, MpMYB36 promotes the expression of lignin biosynthesis-related genes, which may increase lignin accumulation and strengthen the cell wall in Plantain. Our findings on Foc TR4 resistance in Plantain are summarized in a schematic illustration (Supplemental Figure 40).

Bananas are a rich source of ascorbic acid (vitamin C), β-carotene (provitamin A), magnesium (Mg), and potassium (K) (Wall, 2006). Carotenoids in chromoplasts give flowers and fruits their distinct colors (Hirschberg, 2001). The carotenoid content of Plantain and Silk differs significantly. During fruit development, carotenoid synthesis genes were expressed at much higher levels than carotenoid degradation genes, suggesting that carotenoid accumulation is important for fruit development. CRTISO acts at an upstream step of the carotenoid metabolic pathway and plays an important role in carotenoid synthesis. The CRTISO homologs of Plantain and Silk differ markedly in genetic structure, which may account for the significantly higher expression of these genes in Plantain; we expect this higher expression to promote carotenoid accumulation. The reported starch content of the two cultivars ranges from 61.30% to 86.76% (Ravi and Mustaffa, 2013). During fruit development, Plantain has more differentially expressed starch synthesis genes than Silk. By contrast, after harvest and ripening, more amylolytic genes are differentially expressed in Silk than in Plantain, and these genes are also expressed at higher levels in Silk, which results in significantly less starch in Silk fruit and greatly reduces its economic value. The carotenoid and starch contents of Silk (dessert) and Plantain (cooking) fruits vary widely, but the molecular mechanisms underlying these differences remain unclear and require further in-depth study.

Methods

Plant materials

The Plantain variety used in this study was obtained from the Centre Africain de Recherche sur Bananiers et Plantains, and the Silk variety was obtained from the International Musa Germplasm Transit Center (ITC). The Plantain cultivar, French Horn, belongs to the starchy plantain subgroup and is one of the most popular cultivars in West Africa. Silk (ITC0769, https://doi.org/10.18730/9KGW1) is a dessert cultivar bearing sweet, acidic fruit with an apple-like flavor. Samples of both cultivars were collected at the National Center for Banana Genetic Improvement in Guangzhou, China.

SMRTbell library construction

The target size for SMRTbell library construction depends on the project goals and on the quality and quantity of the starting gDNA. For the SMRTbell libraries used in this study, gDNA was sheared with a g-TUBE to a target size of 10–20 kb and then concentrated using AMPure PB beads. ExoVII was used to remove single-stranded overhangs before DNA damage repair. T4 DNA polymerase was used to fill in 5′ overhangs and remove 3′ overhangs, and T4 PNK was used to phosphorylate 5′ hydroxyl groups. The SMRTbell hairpin adapters included in the Template Prep Kit were ligated to the repaired ends. Size selection was then performed using the BluePippin system, with the size cutoff set according to the project goals, and the SMRTbell templates were concentrated and purified using AMPure PB beads. Sequencing primers were annealed to both ends of the SMRTbell templates, and DNA polymerase was bound to the primed templates using the Binding Kit. Finally, the libraries were loaded onto SMRT cells using a DNA Sequencing Reagent Kit according to the manufacturer's manual (Pacific Biosciences).

Illumina short-read library preparation and sequencing

Prior to Illumina short-read sequencing, samples were run on a 1% agarose gel to check for any DNA degradation or contamination. DNA purity was checked using a NanoPhotometer spectrophotometer (IMPLEN, CA, USA). The DNA concentration was measured using the Qubit DNA Assay Kit in a Qubit 2.0 Fluorometer (Life Technologies, CA, USA). A total of 1.5 μg of DNA per sample was used as input for the DNA sample preparations. Sequencing libraries were generated using the TruSeq Nano DNA HT Sample Preparation Kit (Illumina, CA, USA) following the manufacturer’s recommendations, and index codes were added to trace reads back to their appropriate sample. Libraries were sequenced using an Illumina NovaSeq 6000 platform (Illumina, CA, USA) to generate 150-bp paired-end (PE) reads with insert sizes of approximately 350 bp.

Hi-C library preparation and sequencing

Hi-C libraries were constructed following the standard protocol with a few modifications (Belton et al., 2012). The tissue was ground in liquid nitrogen and crosslinked with 4% formaldehyde solution under vacuum at room temperature for 30 min. Glycine (2.5 M) was added to quench the crosslinking reaction for 5 min, and the samples were then placed on ice for 15 min. Samples were centrifuged at 2500 rpm at 4°C for 10 min. The pellet was washed with 500 μl of PBS and centrifuged for 5 min at 2500 rpm. The pellet was resuspended in 20 μl of lysis buffer (1 M Tris–HCl [pH 8], 1 M NaCl, 10% CA-630, and 13 units of protease inhibitor), and the supernatant was centrifuged at 5000 rpm at room temperature for 10 min. The pellet was washed twice in 100 μl of ice-cold 1× NEB buffer and centrifuged for 5 min at 5000 rpm. Nuclei were resuspended in 100 μl of NEB buffer, solubilized with dilute SDS, and incubated at 65°C for 10 min. After the SDS was quenched with Triton X-100, the chromatin was digested overnight at 37°C on a rocking platform with 400 units of the restriction enzyme DpnII (recognition site GATC). DNA ends were labeled with biotin-14-dCTP, and crosslinked fragments were blunt-end ligated. Proximal chromatin DNA was re-ligated using DNA ligase. Nuclear complexes were reverse-crosslinked by incubation with proteinase K at 65°C, and DNA was purified by phenol–chloroform extraction. Biotin was removed from unligated fragment ends using T4 DNA polymerase. The DNA was sheared by sonication to 200–600 bp, and the fragment ends were repaired using a mixture of T4 DNA polymerase, T4 polynucleotide kinase, and Klenow DNA polymerase. Biotin-labeled Hi-C fragments were enriched using streptavidin C1 magnetic beads. After A-tailing of the fragment ends, Illumina PE sequencing adapters were ligated. Hi-C sequencing libraries were then amplified by PCR (12–14 cycles) and sequenced on the Illumina NovaSeq 6000 platform with 150-bp PE reads.

RNA quantification and transcriptome sequencing

RNA degradation and contamination were evaluated by 1% agarose gel electrophoresis. RNA purity was assessed using a NanoPhotometer spectrophotometer (IMPLEN). RNA integrity was assessed using the RNA Nano 6000 Assay Kit with a Bioanalyzer 2100 system (Agilent Technologies, CA, USA). A total of 1 μg of RNA per sample was used as input material for RNA sample preparation. Sequencing libraries were generated using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB) following the manufacturer’s recommendations, and index codes were added to trace the reads back to their appropriate sample. Clustering of the index-coded samples was performed using a cBot Cluster Generation System with the TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer’s instructions. After cluster generation, the library preparations were sequenced using the Illumina NovaSeq 6000 platform with 150-bp PE reads.

Estimation of genome size and heterozygosity

The genome size of each cultivar was estimated by k-mer frequency analysis, which models the distribution of k-mer depths (approximately Poisson around the sequencing depth). Before assembly, we used Jellyfish v2.2.7 (Marçais and Kingsford, 2011) to count 17-mers in 167 Gb (Plantain) and 233 Gb (Silk) of Illumina short reads and uploaded the resulting k-mer histograms to the GenomeScope website (http://qb.cshl.edu/genomescope/). The estimated genome sizes were 1694.56 Mb with 2.58% heterozygosity for Plantain and 1520.59 Mb with 2.90% heterozygosity for Silk.
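GenomeScope fits a model to the k-mer histogram; the simpler estimate that this model refines is genome size ≈ (number of k-mers) / (k-mer depth of the main peak). The Python sketch below illustrates only that simplified estimate and is not GenomeScope: it counts canonical k-mers, discards low-depth k-mers assumed to be sequencing errors (the `min_depth` cutoff is a hypothetical choice), and divides by the most frequent depth. In heterozygous polyploids such as these AAB genomes the tallest peak need not be the depth the formula assumes, which is why model fitting is used in practice.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])


def kmer_histogram(reads, k):
    """Map k-mer depth -> number of distinct canonical k-mers with that depth."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(hist, min_depth=2):
    """Simplified estimate: total k-mers above an error cutoff / depth of the main peak."""
    usable = {d: n for d, n in hist.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the depth cutoff")
    total_kmers = sum(d * n for d, n in usable.items())
    peak_depth = max(usable, key=usable.get)
    return total_kmers / peak_depth
```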

Genome assembly and annotation

We phased PacBio HiFi and Hi-C reads from Plantain and Silk by aligning them to the publicly available AA and BB reference genomes using Minimap2 (2.18-r1015) (Li, 2018) with the settings -cx asm20 --secondary=no. Preliminary phasing was then performed using in-house Python scripts. A read whose alignments all fell on the AA genome was assigned to the AA group, and a read whose alignments all fell on the BB genome was assigned to the BB group. Reads that aligned to both the AA and BB genomes were assigned to an unknown group and phased in the next step. For these unknown reads, we used PP2PG (Feng et al., 2021), with the Minimap2 settings -ax splice -uf --secondary=no -C5 -O6,24 -B4 --MD and the MUMmer (4.0.0beta2) (Marçais et al., 2018) settings --maxgap=500 --mincluster=100, to evaluate SNP sites. Reads carrying the AA allele at SNP sites were assigned to the AA group, and those carrying the BB allele were assigned to the BB group.
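The preliminary assignment rule can be sketched from Minimap2's PAF output, where column 1 is the read name and column 6 the target sequence name. This Python sketch is not the authors' in-house script; it assumes, hypothetically, that the AA and BB references were concatenated with sequence names prefixed `AA_` and `BB_`. Reads that Minimap2 leaves unmapped do not appear in PAF output by default and are therefore not classified here.

```python
from collections import defaultdict


def classify_reads(paf_lines, aa_prefix="AA_", bb_prefix="BB_"):
    """Assign each read to 'AA', 'BB', or 'unknown' from its PAF alignments."""
    origins = defaultdict(set)
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        read_name, target_name = fields[0], fields[5]  # PAF columns 1 and 6
        if target_name.startswith(aa_prefix):
            origins[read_name].add("AA")
        elif target_name.startswith(bb_prefix):
            origins[read_name].add("BB")
        else:
            origins[read_name].add("other")
    groups = {}
    for read_name, hit_set in origins.items():
        if hit_set == {"AA"}:
            groups[read_name] = "AA"
        elif hit_set == {"BB"}:
            groups[read_name] = "BB"
        else:
            groups[read_name] = "unknown"  # passed on to SNP-based phasing
    return groups
```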

A flow chart of the genome assembly approach is shown in Figure 1B. We used an assembly scheme that combined phased PacBio HiFi reads and Hi-C reads to assemble the phased genome. For hap A1 and hap A2, Hifiasm (0.15.5-r352) was used with the default settings (Cheng et al., 2022) to build a haplotype assembly. Hifiasm corrects reads and produces a phased assembly graph. It then maps Hi-C short reads to the graph, links unitigs in the assembly graph that share mapped Hi-C fragments, and identifies a bipartitioning of unitigs so that unitigs linked by many Hi-C fragments tend to be grouped together. Hifiasm produces a haplotype-resolved assembly using both the unitig partition and assembly graph. Haplotype B was assembled using Hifiasm with default parameters. For initial assemblies, we used Khaper (Zhang et al., 2021) to select primary contigs and filter redundant sequences. Contigs were anchored onto pseudochromosomes using Juicer v1.6 and the 3D-DNA pipeline (Dudchenko et al., 2017). JuiceBox (Durand et al., 2016) was employed to visualize Hi-C data, and manual modifications were made to obtain the final haploid assemblies.

Repeat sequences were identified using a combined strategy of homology-based alignment and de novo searching. Tandem repeats were identified ab initio using TRF (Benson, 1999). Homology-based prediction was conducted against the Repbase database (Bao et al., 2015) using RepeatMasker and RepeatProteinMask (Zhi et al., 2006) with in-house scripts and default parameters. For de novo prediction, a repetitive element database was built using LTR_FINDER (Xu and Wang, 2007), RepeatScout, and RepeatModeler (Flynn et al., 2020) with default parameters. All repeat sequences longer than 100 bp and containing less than 5% N (gap) bases constituted the raw TE library. Repbase data and our de novo TE library were combined into a custom library, which was made nonredundant using UCLUST. RepeatMasker was then used with this custom library to identify DNA-level repeats. TEs were also predicted using EDTA (Ou et al., 2019), and the RepeatMasker and EDTA results were merged to produce the final TE annotation. A summary of the protein-coding gene and noncoding RNA annotation process is provided in the supporting information.
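The length and gap filter that defines the raw TE library can be sketched in a few lines of Python. The parser and function names are hypothetical, and the thresholds are the ones stated above (length > 100 bp, N content < 5%).

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines (wrapped sequences allowed)."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_repeat_library(records, min_len=100, max_n_frac=0.05):
    """Keep sequences longer than min_len bp whose N fraction is below max_n_frac."""
    kept = []
    for name, seq in records:
        if len(seq) <= min_len:
            continue
        n_frac = seq.upper().count("N") / len(seq)
        if n_frac < max_n_frac:
            kept.append((name, seq))
    return kept
```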

Genome assembly assessment

Assessment of scaffold assembly

The quality of the assembled genomes was assessed using multiple datasets. The LTR Assembly Index (LAI) (Ou et al., 2018) was calculated using LTR_retriever (v2.9.0) (Ou et al., 2018). The completeness of the genome assemblies and predicted gene sets was evaluated using BUSCO (v5.3.2) (Simão et al., 2015) with the embryophyta_odb10 database. Quality values were estimated using Merqury (Rhie et al., 2020), which compares k-mers in the raw sequencing reads with those in the genome assembly. Illumina PE reads were mapped to the assemblies using BWA (v0.7.10-r789) (Vasimuddin et al., 2019) with default parameters, and all third-generation sequencing reads were mapped to the genome using Minimap2 (2.18-r1015) (Li, 2018).
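Merqury's consensus quality value (QV) is derived from k-mers found in the assembly but absent from the reads. As we understand the Merqury paper, with K_asm such assembly-only k-mers among K_total assembly k-mers, the per-base error rate is E = 1 − (1 − K_asm/K_total)^(1/k) and QV = −10·log10(E). The Python sketch below restates that formula for illustration; it is not Merqury's code, and the function name is hypothetical.

```python
import math


def merqury_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Consensus QV from k-mers present in the assembly but not in the reads."""
    if total_asm_kmers <= 0:
        raise ValueError("total_asm_kmers must be positive")
    if asm_only_kmers == 0:
        return math.inf  # no unsupported k-mers: QV is unbounded
    frac_supported = 1 - asm_only_kmers / total_asm_kmers
    # Per-base error rate, assuming errors are independent across bases.
    error_rate = 1 - frac_supported ** (1 / k)
    return -10 * math.log10(error_rate)
```

For example, 10,000 unsupported k-mers among 10^9 assembly 21-mers give a QV of roughly 63.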

Assessment of phasing quality

We assessed phasing and switch errors between hap A and hap B in the Plantain and Silk assemblies using Merqury with hap-mers. Phasing between hap A1 and hap A2 was assessed using ONT ultralong reads: we sequenced 20 Gb of ONT ultralong reads with an average read length of 100 kb for Plantain and Silk and mapped them to the assemblies using Minimap2 (Li, 2018). A read that mapped uniquely to a single chromosome over more than 70% of its length was counted as supporting a correct assembly. A read with one segment aligned to position 1 and another segment aligned to position 2 was counted as an assembly error if both positions were in the same haplotype assembly and as a switch error if they were in different haplotype assemblies. A breakpoint was defined as a site lacking base coverage.
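The three rules above can be encoded as a small classifier. This Python sketch is an illustration, not the authors' pipeline: the `Segment` record is hypothetical, and it assumes that collinear split alignments of a read have already been merged, so that each segment represents a distinct locus.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    haplotype: str    # haplotype assembly, e.g. "A1" or "A2"
    chrom: str
    start: int
    aligned_len: int


def classify_ont_read(read_len: int, segments: list, min_frac: float = 0.7) -> str:
    """Apply the correct / assembly-error / switch-error rules to one read.

    `segments` is assumed to hold one entry per distinct locus.
    """
    if not segments:
        return "unmapped"
    if len(segments) == 1:
        # Unique alignment covering more than 70% of the read supports the assembly.
        if segments[0].aligned_len > min_frac * read_len:
            return "correct"
        return "unclassified"
    haplotypes = {seg.haplotype for seg in segments}
    return "switch_error" if len(haplotypes) > 1 else "assembly_error"
```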

Identification of centromeres

Tandem repeats and satellite DNA sequences are commonly found around the centromeres of many plant species and are classified as centromeric or pericentromeric (Song et al., 2021). Prior studies have indicated that Musa lacks a typical centromeric satellite; instead, its centromeres are composed of various retrotransposons, particularly Ty3/Gypsy-like elements and a LINE-like element named Nani’a (D’Hont et al., 2012; Belser et al., 2021; Hribová et al., 2010). Several elements of the chromovirus CRM clade, a lineage of Ty3/Gypsy retrotransposons, are restricted to these centromeric regions (Wang et al., 2022). To locate these transposons, we integrated the annotation results from RepeatMasker, RepeatScout, RepeatModeler, and EDTA to obtain the positions of LINE/L1 transposons (RIL code). We then integrated the annotation results from LTRharvest (Ellinghaus et al., 2008), LTR_Finder, and LTR_retriever (Ou et al., 2018) to obtain the positions of Gypsy-like transposons. Finally, we used TEsorter (Zhang et al., 2022) to further classify the LTR transposons identified above and to determine the locations of CRM elements. The final pericentromeric position was obtained by combining the LINE/L1, CRM, and Gypsy regions and manually adjusting them. TEs were classified following Wicker et al. (2007).
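Combining the LINE/L1, CRM, and Gypsy regions before manual adjustment is essentially an interval-merging step. The Python sketch below shows one way to do this for a single chromosome; the `max_gap` tolerance and the choice of the longest merged block as the candidate are hypothetical simplifications, not the authors' exact procedure.

```python
def merge_intervals(intervals, max_gap=0):
    """Merge half-open (start, end) intervals that overlap or lie within max_gap bp."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def candidate_pericentromere(line_l1, crm, gypsy, max_gap=50_000):
    """Return the longest merged LINE/L1 + CRM + Gypsy block on one chromosome."""
    merged = merge_intervals([*line_l1, *crm, *gypsy], max_gap)
    if not merged:
        return None
    return max(merged, key=lambda iv: iv[1] - iv[0])
```

The result is only a starting point; as described above, the final boundaries were adjusted manually.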

Identification of NLR and WRKY genes

To identify NLR genes, we searched for genes encoding proteins with an NB domain and either a TIR or CC domain. Proteins with an LRR or CC domain alone were not considered to be NLRs. We further defined NLRs with an N-terminal TIR domain as TNLs, an N-terminal CC domain as CNLs, an N-terminal RPW8 domain as RNLs, and NLRs with neither of these domains as NLs. Canonical NLRs contain an interior NB (Pfam accession PF00931) domain, a C-terminal LRR (PF00560, PF07725, PF13306, PF13855) domain, and either a TIR (PF01582), RPW8 (PF05659), or CC domain at the N-terminus (Van de Weyer et al., 2019).
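The classification rule above depends only on which domain precedes the NB domain. The Python sketch below encodes it for a protein whose domain calls are already ordered from N- to C-terminus; it assumes, hypothetically, that coiled-coil calls come from a separate predictor and are labeled "CC", because coiled coils have no Pfam accession. It also reports whether a C-terminal LRR is present, which distinguishes canonical NLRs.

```python
NB_ARC = "PF00931"
LRR = {"PF00560", "PF07725", "PF13306", "PF13855"}
TIR = "PF01582"
RPW8 = "PF05659"
CC = "CC"  # coiled-coil call from a separate predictor (assumed label)


def classify_nlr(domains):
    """Classify a protein from its domain list ordered N- to C-terminus.

    Returns (class, has_c_terminal_lrr), or None if there is no NB domain.
    """
    if NB_ARC not in domains:
        return None  # LRR-only or CC-only proteins are not NLRs
    nb_index = domains.index(NB_ARC)
    n_terminal = domains[:nb_index]
    has_lrr = any(d in LRR for d in domains[nb_index + 1:])
    if TIR in n_terminal:
        return ("TNL", has_lrr)
    if CC in n_terminal:
        return ("CNL", has_lrr)
    if RPW8 in n_terminal:
        return ("RNL", has_lrr)
    return ("NL", has_lrr)
```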

To comprehensively identify WRKY genes, the hidden Markov model (HMM) seed file of the WRKY domain (PF03106) was obtained from the Pfam database (http://pfam.sanger.ac.uk/). HMMER 3.3 (Mistry et al., 2013) was used to search the Plantain and Silk protein databases for WRKY genes with an E-value threshold of 1e-5. All nonredundant candidate WRKY protein sequences were then checked for WRKY domains by submitting them as queries to the Pfam and SMART (http://smart.embl.de/) databases. Finally, each candidate was manually examined to confirm that the conserved heptapeptide (typically WRKYGQK) was present at the N-terminal region of the predicted WRKY domain.
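The E-value cutoff and the heptapeptide check can be pre-screened automatically before manual inspection. This Python sketch is hypothetical: it takes 1-based inclusive domain coordinates (the convention HMMER uses for domain boundaries), and the 20-residue window used to define the "N-terminal region" is an assumed value. Known variants such as WRKYGKK can be tested by changing `motif`.

```python
def passes_wrky_checks(protein, dom_start, dom_end, evalue,
                       max_evalue=1e-5, motif="WRKYGQK", window=20):
    """Return True if the domain hit passes the E-value cutoff and carries the
    heptapeptide within its first `window` residues.

    dom_start/dom_end are 1-based inclusive coordinates on the protein.
    """
    if evalue > max_evalue:
        return False
    domain = protein[dom_start - 1:dom_end]
    return motif in domain[:window]
```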

Identification of SNPs, InDels, and structural variations

The Plantain and Silk genomes and their haplotype genomes were aligned against each other using MUMmer with the parameters -g 1000 -c 90 -L 40. Alignment blocks were then filtered to remove mapping noise, and one-to-one alignments were identified using delta-filter with the parameters -r -q. SNPs and InDels (<100 bp) were identified using show-snps with the parameter -ClrTH and annotated using SnpEff (Cingolani et al., 2012).

To identify inversions and translocations, we aligned the Plantain, Silk, and haplotype genomes against each other using MUMmer and filtered the raw alignments to retain unique alignment blocks longer than 1000 bp. SyRI (Goel et al., 2019) was then used to identify inversions and translocations on both sides. Genes with large structural variations were identified using the method of Sun et al. (2018), which maps each gene sequence, together with its 2-kb upstream and 2-kb downstream flanking regions, to the query genomes using BWA-MEM (Vasimuddin et al., 2019).

Identification of PAVs and HEs

Potential PAVs were identified in the Plantain, Silk, and haplotype genomes using show-diff in MUMmer (Kurtz et al., 2004). Sequences that intersected gap regions in the respective genome were excluded. Sequences of the BRK feature type, that is, nonreference sequences aligned to a gap-start or gap-end boundary, were also removed. Genes with >80% overlap with PAV regions were classified as PAV-related genes.
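The PAV-related gene criterion can be sketched as an overlap computation. The Python below assumes, as our interpretation, that ">80% overlap" means more than 80% of the gene's length lies inside PAV regions; it merges overlapping PAVs first so that no base is counted twice. Coordinates are half-open, and the function names are hypothetical.

```python
from collections import defaultdict


def _merge(intervals):
    """Merge overlapping half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def pav_related_genes(genes, pavs, min_frac=0.8):
    """genes: name -> (chrom, start, end); pavs: iterable of (chrom, start, end)."""
    pav_by_chrom = defaultdict(list)
    for chrom, start, end in pavs:
        pav_by_chrom[chrom].append((start, end))
    merged = {chrom: _merge(ivs) for chrom, ivs in pav_by_chrom.items()}
    related = []
    for name, (chrom, g_start, g_end) in genes.items():
        gene_len = g_end - g_start
        if gene_len <= 0:
            continue
        covered = sum(max(0, min(g_end, e) - max(g_start, s))
                      for s, e in merged.get(chrom, []))
        if covered / gene_len > min_frac:
            related.append(name)
    return related
```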

HEs were identified by aligning Illumina reads to the DH-Pahang and DH-PKW reference genomes with BWA-MEM (Vasimuddin et al., 2019) and retaining only uniquely aligned reads. HE loci were then identified from the depth of read coverage.
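The depth-based logic can be illustrated for an AAB genome: in a region without exchange, reads from two A copies and one B copy give A:B depths near 2:1 relative to the per-haplotype depth, whereas an HE shifts this toward 3:0 or 1:2. The Python sketch below is a hypothetical illustration of that reasoning, not the authors' procedure; it assumes that depths have already been computed for syntenic A/B window pairs and that the per-haplotype depth is known.

```python
# Expected (A copies, B copies) -> label; "AAB" means no exchange.
DOSAGE_LABELS = {(2, 1): "AAB", (3, 0): "AAA", (1, 2): "ABB", (0, 3): "BBB"}


def classify_he_window(depth_a: float, depth_b: float, haploid_depth: float) -> str:
    """Infer the A/B dosage of one homoeologous window pair from read depth."""
    if haploid_depth <= 0:
        raise ValueError("haploid_depth must be positive")
    copies_a = round(depth_a / haploid_depth)
    copies_b = round(depth_b / haploid_depth)
    return DOSAGE_LABELS.get((copies_a, copies_b), "ambiguous")
```

Windows labeled anything other than "AAB" would be HE candidates under these assumptions.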

Gene families of 11 bananas in Musaceae

Protein sequences from nine species (M. textilis [Abaca], M. troglodytarum [Utafun], M. schizocarpa [Schizocarpa], M. acuminata ssp. malaccensis [DH-Pahang], M. acuminata ssp. banksii [Banksii], M. acuminata ssp. burmannica [Calcutta 4], M. acuminata ssp. zebrina [Maia oa], M. balbisiana [DH-PKW], and Ensete glaucum) were downloaded from the Phytozome database to serve as references (Goodstein et al., 2012). For genes with alternative splice variants, the longest transcript was selected to represent the gene. Pairwise sequence similarities were calculated using BLASTP with an E-value cutoff of 1e-10. Gene families were then defined on the basis of overall sequence similarity using OrthoMCL v2.0.9 (Li et al., 2003) with default parameters and Markov clustering (MCL).

Phylogenomic analysis and ancestor traceability

A total of 2043 single-copy orthologous genes were extracted using OrthoFinder (Emms and Kelly, 2019), and their protein sequences were aligned using MAFFT (Katoh et al., 2009). Conserved sites were extracted from the multiple sequence alignments using Gblocks (Castresana, 2000). A phylogenetic tree was then constructed using RAxML (Stamatakis, 2015) with E. glaucum as the outgroup, and 1000 bootstrap replicates were performed to test the robustness of each branch. Divergence times were estimated using MCMCTree (Puttick, 2019) with two secondary calibration points: the split of M. balbisiana from M. acuminata (∼5.4 million years ago) and the split of E. glaucum from M. acuminata (∼9.8 million years ago). The phylogenetic tree was visualized using iTOL (Letunic and Bork, 2021). Gene families undergoing expansion or contraction in the 11 sequenced genomes were identified using CAFE (Han et al., 2013), with a P-value threshold of 0.05 and λ estimated automatically. Genes belonging to significantly expanded gene families were subjected to GO and KEGG functional enrichment analyses.

No subspecies have yet been defined in M. balbisiana, whereas M. acuminata is divided into multiple subspecies. At least four of these (M. acuminata ssp. banksii, M. acuminata ssp. zebrina, M. acuminata ssp. burmannica, and M. acuminata ssp. malaccensis) have been identified as contributors to cultivated banana varieties (Perrier et al., 2011). MUMmer (4.0.0beta2) (Kurtz et al., 2004) was used to map the Plantain and Silk genomes to M. acuminata ssp. banksii, M. acuminata ssp. malaccensis, M. acuminata ssp. zebrina, M. acuminata ssp. burmannica, M. schizocarpa, and M. balbisiana (DH-PKW + PKW). Alignments were filtered using delta-filter with the parameters -i 90 -L 1000, and SNPs between each pair of genomes were identified using show-snps with the parameters -C -T -r -l -x 1. The genome was divided into windows of 100, 200, 500, or 1000 kb, and BEDtools (Quinlan, 2014) coverage was used to count the SNPs in each window. For windows without alignments, the number of SNPs was manually set to NA. Each window was then assigned to an ancestor using a scoring scheme implemented in an in-house Python script. To reduce false positives, the SNP-based assignments were refined using collinear blocks (1, 10, and 20 kb) between the genomes. The results were then compared with those of a previous study (accession numbers: Plantain 148 and 149, Silk 139 and 140) (Martin et al., 2023).
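The in-house scoring scheme is not described in detail. The sketch below substitutes a simple stand-in rule and is not the authors' script. Within each window, the candidate ancestor with the fewest SNPs is taken as the contributor. NA (unaligned) entries are ignored, and a window is left as NA when nothing aligned or the two best ancestors tie. Window IDs and counts are illustrative.

```python
from typing import Dict, List, Optional, Tuple


def assign_window(snp_counts: Dict[str, Optional[int]]) -> str:
    """
    snp_counts: candidate ancestor -> SNPs in this window (None = window not aligned, i.e. NA).
    Returns the ancestor with the fewest SNPs, or "NA" if nothing aligned or the best is tied.
    """
    aligned = {anc: n for anc, n in snp_counts.items() if n is not None}
    if not aligned:
        return "NA"
    ranked = sorted(aligned.items(), key=lambda item: item[1])
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "NA"  # two ancestors equally close: leave unassigned
    return ranked[0][0]


def paint_windows(windows: List[Tuple[str, Dict[str, Optional[int]]]]) -> List[Tuple[str, str]]:
    """Assign every (window_id, counts) entry to its closest ancestor."""
    return [(window_id, assign_window(counts)) for window_id, counts in windows]
```

Comparing raw counts is reasonable only because every ancestor is scored on the same window. Comparisons across window sizes (100 to 1000 kb) would need SNPs per kb instead.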

Analysis of synteny and WGD

Syntenic blocks were identified using jcvi, the Python version of MCScan (Wang et al., 2012), with default parameters. Protein sequences were used as queries in searches against the genomes of other plant species to find the best matching pairs, and each syntenic block was taken to represent orthologous pairs derived from a common ancestor. The ratio of the nonsynonymous substitution rate (Ka) to the synonymous substitution rate (Ks), calculated with PAML, was used to assess selection on genes. The sequences of homologous gene pairs were imported into WGDI (Sun et al., 2022) to calculate the Ka and Ks values of each gene pair.

Statistics of lost homologous gene pairs and expression bias of homologous genes

We aligned the DH-Pahang and DH-PKW genomes and the subgenomes of Plantain and Silk using GeneTribe (Chen et al., 2020) and examined the presence/absence of orthologous pairs in the Pa/Pb and Sa/Sb genomes. We also extracted 1:1 orthologous pairs shared between DH-Pahang and DH-PKW and examined their presence/absence in the Pa/Sa and Pb/Sb genomes. To study the mechanism of gene fractionation, we then selected the subset of genes that had lost their orthologous partner in the Pa/Pb or Sa/Sb genomes but not in all genomes. A genome-wide chi-squared test was performed to determine whether the number of genes lost from DH-Pahang/Pa differed significantly from the number lost from DH-PKW/Pb (P ≤ 0.05). Genes in each orthologous pair were categorized as singletons or duplicates based on their duplication status in the DH-Pahang and DH-PKW genomes. For orthologous pairs in which one gene was a singleton and the other a duplicate, we calculated the percentage of lost genes in each category and tested for deviation from a 1:1 ratio using the chi-squared test. Ka/Ks values were measured for lost and conserved orthologous gene pairs based on their counterparts in the DH-Pahang and DH-PKW genomes. To study the segmentation/deletion mechanism, the sequences of lost genes from the M. acuminata or M. balbisiana genomes were used as queries and mapped back to the Plantain and Silk genomes using BLASTN v2.7.1 (E-value 1e-10; word_size 30; -qcov_hsp_perc 0.8) (Altschul et al., 1990).
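The 1:1 chi-squared test used above can be written with the standard library alone. This is a goodness-of-fit test with one degree of freedom and no continuity correction. For one degree of freedom, the chi-squared survival function equals erfc(√(x/2)). The counts in the example are illustrative and are not data from this study.

```python
import math
from typing import Tuple


def chi2_one_to_one(n1: int, n2: int) -> Tuple[float, float]:
    """
    Chi-squared goodness-of-fit test of two counts against a 1:1 expectation (df = 1),
    without continuity correction. Returns (statistic, P-value).
    """
    total = n1 + n2
    if total == 0:
        raise ValueError("both counts are zero")
    expected = total / 2
    stat = ((n1 - expected) ** 2 + (n2 - expected) ** 2) / expected
    # For one degree of freedom, the chi-squared survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, `chi2_one_to_one(60, 40)` gives a statistic of 4.0 and P ≈ 0.046, which would count as significant at P ≤ 0.05. If SciPy is available, `scipy.stats.chisquare([n1, n2])` performs the same test.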

Triallelic gene sets obtained using MCScan (Tang et al., 2008) were combined with expression estimates from RSEM (Li and Dewey, 2011) to quantify the expression level of each allele. The expression level of subgenome A was calculated as (A1 + A2)/2, and expressed genes with TPM ≥ 1 were selected as candidate genes. A chi-squared test (P ≤ 0.05) was then performed to determine whether the expression of subgenome A differed significantly from that of subgenome B, thereby identifying the subgenome with dominant expression. To examine the effect of TE insertions on gene expression, the distance from each gene to the nearest upstream TE insertion was determined using BEDtools (closest -id -D a) (Quinlan, 2014), and the correlations between this distance and gene expression were compared for orthologous pairs between the A and B genomes.
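The sketch below shows the subgenome comparison under two assumptions. First, a triad is kept when either subgenome A (the mean of A1 and A2) or subgenome B reaches TPM 1. Second, the chi-squared test is applied genome-wide to the numbers of A-higher and B-higher triads. The Methods do not state either choice explicitly. Function names are hypothetical.

```python
import math
from typing import Iterable, Tuple


def subgenome_bias(triads: Iterable[Tuple[str, float, float, float]],
                   min_tpm: float = 1.0) -> Tuple[int, int]:
    """
    triads: (gene_id, TPM_A1, TPM_A2, TPM_B) for each triallelic gene set.
    Subgenome A expression is the mean of its two alleles, (A1 + A2) / 2.
    Triads are kept only if A or B reaches `min_tpm` (assumed filter).
    Returns (number of triads with A > B, number with B > A); ties are ignored.
    """
    a_higher = b_higher = 0
    for _gene_id, a1, a2, b in triads:
        a = (a1 + a2) / 2
        if max(a, b) < min_tpm:
            continue
        if a > b:
            a_higher += 1
        elif b > a:
            b_higher += 1
    return a_higher, b_higher


def p_one_to_one(n1: int, n2: int) -> float:
    """P-value of a df = 1 chi-squared test of n1 vs n2 against a 1:1 expectation."""
    expected = (n1 + n2) / 2
    stat = ((n1 - expected) ** 2 + (n2 - expected) ** 2) / expected
    return math.erfc(math.sqrt(stat / 2))
```

Averaging A1 and A2 puts subgenome A on a per-allele scale comparable with the single B allele. Summing them would bias the comparison toward A by roughly two-fold.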

Identification of alleles and DEAs

To identify homologous regions among the three haplotypes of Plantain and Silk, we ran MCScan with the parameter --cscore=.99 to obtain reciprocal best hits and constructed syntenic blocks from well-aligned genes. Allelic gene pairs were selected according to the following rules: (1) paired regions must lie in homologous haplotypes; (2) when a gene has one-to-many pairs, the pair with the higher C-score is retained, where C-score = score(A, B)/max(best score(A), best score(B)); (3) three genes paired with each other are treated as three alleles, two genes paired with each other as two alleles, and all others as one allele; and (4) all syntenic gene pairs defined above are double-checked manually.
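The C-score divides each pair's alignment score by the better of the two genes' best scores, so a pair with C-score 1 is the top hit for at least one of its genes. The sketch below computes C-scores, applies the 0.99 cutoff, and keeps the highest-scoring partner per gene. For brevity it resolves one-to-many pairs only on the second haplotype's side. It is an illustration and is not jcvi's implementation. It assumes gene IDs are unique across haplotypes. The scores are illustrative bit scores.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # (gene in haplotype 1, gene in haplotype 2); IDs assumed unique


def c_scores(hits: Dict[Pair, float]) -> Dict[Pair, float]:
    """C-score(A, B) = score(A, B) / max(best score of A, best score of B)."""
    best: Dict[str, float] = {}
    for (a, b), score in hits.items():
        best[a] = max(best.get(a, 0.0), score)
        best[b] = max(best.get(b, 0.0), score)
    return {(a, b): score / max(best[a], best[b]) for (a, b), score in hits.items()}


def filter_cscore(hits: Dict[Pair, float], cutoff: float = 0.99) -> Dict[Pair, float]:
    """Keep pairs whose C-score reaches `cutoff` (cf. --cscore=.99)."""
    return {pair: c for pair, c in c_scores(hits).items() if c >= cutoff}


def resolve_one_to_many(cscores: Dict[Pair, float]) -> Dict[Pair, float]:
    """For each haplotype-2 gene, keep only the partner with the highest C-score."""
    chosen: Dict[str, Tuple[str, float]] = {}
    for (a, b), c in cscores.items():
        if b not in chosen or c > chosen[b][1]:
            chosen[b] = (a, c)
    return {(a, b): c for b, (a, c) in chosen.items()}
```

With hits A1–B1 = 500, A1–B2 = 300, A2–B2 = 400, and A2–B1 = 498, the 0.99 cutoff keeps both A1–B1 and A2–B1. The one-to-many rule then retains A1–B1.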

RNA-seq reads from 35 Plantain and 23 Silk samples were trimmed using Trimmomatic (Bolger et al., 2014) and mapped against the annotated gene models using STAR/2.7.3a with the parameters --twopassMode Basic --outSAMmultNmax 1 (Dobin et al., 2013), so that only the best alignment was retained for each read. RSEM (Li and Dewey, 2011) was then used to estimate TPM values.

For RNA data without biological replicates, we first sorted the TPM values of the alleles of each gene from high to low (I, II, III) and then identified DEAs using five criteria:
(1) TPM_A1 ≥ 1, TPM_A2 ≥ 1, or TPM_B ≥ 1;
(2) Count_A1 ≥ 10, Count_A2 ≥ 10, or Count_B ≥ 10;
(3) for genes with three alleles, a more than two-fold difference for I/II or II/III;
(4) for genes with two alleles, TPM_A1/TPM_A2 ≥ 2 or ≤ 0.5, TPM_A1/TPM_B ≥ 2 or ≤ 0.5, or TPM_A2/TPM_B ≥ 2 or ≤ 0.5;
(5) detected in at least two samples.

For RNA data with biological replicates, a DEA was identified if the log fold change of TPM values between the two alleles was > 2 with adjusted P < 0.05 and the difference was detected in at least two samples.
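The replicate-free DEA criteria can be expressed as a small set of checks. In the sketch below, each sample provides one TPM value and one read count per allele, listed in the same allele order in every sample. It also assumes that a zero TPM in the denominator counts as an infinite fold change, which the Methods do not specify. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fold(high: float, low: float) -> float:
    """high / low, treating a zero denominator as an infinite fold change (assumption)."""
    if low == 0:
        return math.inf if high > 0 else 1.0
    return high / low


def dea_in_sample(tpm: Sequence[float], counts: Sequence[int]) -> bool:
    """Apply the per-sample DEA checks to one sample (two or three alleles)."""
    # At least one allele with TPM >= 1 and at least one with a read count >= 10.
    if max(tpm) < 1 or max(counts) < 10:
        return False
    ranked = sorted(tpm, reverse=True)  # I >= II (>= III)
    if len(ranked) == 3:  # three alleles: I/II or II/III more than two-fold
        return fold(ranked[0], ranked[1]) > 2 or fold(ranked[1], ranked[2]) > 2
    if len(ranked) == 2:  # two alleles: ratio >= 2 or <= 0.5, i.e. high/low >= 2
        return fold(ranked[0], ranked[1]) >= 2
    raise ValueError("expected two or three alleles")


def is_dea(samples: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> bool:
    """A gene is a DEA if the allelic imbalance is detected in at least two samples."""
    return sum(dea_in_sample(tpm, counts) for tpm, counts in samples) >= 2
```

Sorting the TPM values lets one check cover both directions of the two-allele ratio, since "≥ 2 or ≤ 0.5" is the same as "high/low ≥ 2".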

Foc TR4 culture conditions and inoculant preparation

The Foc TR4 strain was provided by the Guangdong Provincial Key Laboratory of Tropical and Subtropical Fruit Tree Research. On an ultraclean workbench, a 1 cm² fungal sample was inoculated into 50 ml of sterile potato dextrose broth and cultured at 28°C for approximately 7 days. The spore suspension was then collected, and spores were counted under a light microscope using a hemocytometer. The stock suspension, at 1 × 10⁸ spores/ml, was diluted with sterile water to a final concentration of 1 × 10⁶ spores/ml.
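The dilution step follows the standard relation C₁V₁ = C₂V₂. With the stock and working concentrations given above, it is a 100-fold dilution. The 100 ml working volume in the example is hypothetical.

```latex
C_1 V_1 = C_2 V_2
\quad\Longrightarrow\quad
V_1 = \frac{C_2 V_2}{C_1}
    = \frac{(1\times10^{6}\ \text{spores/ml})\,V_2}{1\times10^{8}\ \text{spores/ml}}
    = \frac{V_2}{100}

% Example with a hypothetical working volume V_2 = 100 ml:
V_1 = \frac{100\ \text{ml}}{100} = 1\ \text{ml of stock} + 99\ \text{ml of sterile water}
```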

RNA-seq analysis of Plantain and Silk infected with Foc TR4

RNA-seq was performed on three biological replicates of inoculated and uninoculated Plantain and Silk rhizomes at five time points (1–5 weeks post-inoculation). Low-quality reads were removed using Trimmomatic (Bolger et al., 2014), and clean reads were mapped to the Plantain and Silk reference genomes using STAR/2.7.3a (Dobin et al., 2013) with the parameters --twopassMode Basic --outSAMmultNmax 1, so that only the best alignment was retained for each read. Mapped reads were assigned to transcripts, and TPM values were calculated using RSEM. Differentially expressed genes (DEGs) were identified using the Bioconductor R package DESeq2. A P-value cutoff of ≤0.05 was applied, with upregulation defined as log₂FC > 2 and downregulation as log₂FC < –2. Comparisons were made between inoculated and uninoculated plants at weeks 1–5 post-inoculation for both Plantain and Silk.
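The DEG thresholds above can be applied to DESeq2 output as shown below. The Methods do not say whether the cutoff was applied to raw or adjusted P-values. The sketch therefore uses whichever value is passed in, for example the `pvalue` or `padj` column of a DESeq2 results table. NA P-values, which DESeq2 reports for some genes, are treated as not significant. A log₂FC above 2 corresponds to more than a four-fold change.

```python
import math
from collections import Counter
from typing import Iterable, Optional, Tuple


def classify_deg(log2fc: float, pvalue: Optional[float],
                 p_cut: float = 0.05, lfc_cut: float = 2.0) -> str:
    """Return 'up', 'down', or 'ns' using P <= p_cut and |log2FC| > lfc_cut."""
    # NA P-values (None or NaN) are treated as not significant.
    if pvalue is None or math.isnan(pvalue) or pvalue > p_cut:
        return "ns"
    if log2fc > lfc_cut:   # more than four-fold higher in inoculated plants
        return "up"
    if log2fc < -lfc_cut:  # more than four-fold lower
        return "down"
    return "ns"


def summarize(results: Iterable[Tuple[float, Optional[float]]]) -> Counter:
    """Count up/down/ns calls for one inoculated-vs-uninoculated comparison."""
    return Counter(classify_deg(lfc, p) for lfc, p in results)
```

The explicit NaN check matters because `float("nan") > 0.05` is `False` in Python. Without it, NA P-values would pass the significance filter.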

Inoculation procedure and disease resistance evaluation

Banana plants at the four- to six-leaf stage were removed from the soil and rinsed. Roots were inoculated by complete immersion in the spore suspension for 30 min. The inoculated plants were transferred to nutrient cups (20 cm top diameter, 15 cm bottom diameter, 17.5 cm height) containing sterilized perlite, which were then placed in open plastic boxes (50 cm long × 30 cm wide × 10 cm high). Tap water was maintained at a depth of 1–2 cm at the bottom of each box, and Hoagland’s nutrient solution was added regularly. Silk was used as the susceptible control and Plantain as the resistant control. Before inoculation (week 0; uninoculated control group only) and in each of the first to fifth weeks after inoculation, three plants of each variety were taken from the inoculated and uninoculated groups and cut longitudinally. After photographing, the rhizomes were sampled and stored at –80°C. Disease resistance was evaluated using the rhizome discoloration index (RDI), calculated from the internal discoloration of the rhizome; an RDI score greater than 5 and less than or equal to 6 indicated high susceptibility.

The field evaluation was performed in the experimental field of the Banana and Vegetable Research Institute in Dongguan City, Guangdong Province. The soil was already infested with Foc TR4, and the disease incidence of Cavendish bananas planted in this field exceeded 70% (Zuo et al., 2018). No chemicals were applied during the test. Two evaluations were conducted, in November 2020 and November 2021. Rhizomes and pseudostems were cut and photographed once plants showed disease symptoms or, for plants that survived, at the end of the experiment.

Coexpression network analysis between differentially expressed MYB transcription factors and lignin biosynthesis genes after Foc TR4 inoculation

Coexpression modules were identified using the R package WGCNA (Langfelder and Horvath, 2008). Instead of a preset soft-thresholding power, candidate powers from 1 to 20 were evaluated, and the scale independence (scale-free topology fit) and mean connectivity were calculated for each power. A scale independence value of 0.8 was used as the cutoff for selecting the power. Once the power was established, modules were constructed using WGCNA and the genes in each module were examined. To ensure reliable results, the minimum module size was set to 30 genes. Coexpression networks were visualized using Cytoscape (Shannon et al., 2003). MYB transcription factors typically recognize AC-rich cis-elements (ACC(A/T)A(A/C)(T/C)) that are prevalent in the promoters of lignin biosynthesis genes such as PAL, 4CL, CCR, and CAD (Zhao and Dixon, 2011).
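The sketch below assumes the 0.8 cutoff refers to WGCNA's scale-free topology fit index, plotted as "scale independence". Under that assumption, the selection rule picks the smallest power whose signed fit, −sign(slope) × R², reaches 0.8. The sign term excludes powers with a high R² but a positive slope, which do not indicate scale-free topology. The (slope, R²) values are hypothetical. In WGCNA itself, `pickSoftThreshold()` reports these statistics in `fitIndices` and suggests a power as `powerEstimate`.

```python
import math
from typing import Dict, Optional, Tuple


def signed_fit(slope: float, r_squared: float) -> float:
    """Signed scale-free fit index: -sign(slope) * R^2."""
    return -math.copysign(1.0, slope) * r_squared


def pick_soft_power(fits: Dict[int, Tuple[float, float]], cutoff: float = 0.8) -> Optional[int]:
    """
    fits: power -> (slope, R^2) of the log10 p(k) ~ log10 k regression for that power.
    Returns the smallest power whose signed fit index reaches `cutoff`, or None.
    """
    for power in sorted(fits):
        slope, r_squared = fits[power]
        if signed_fit(slope, r_squared) >= cutoff:
            return power
    return None
```

Choosing the smallest qualifying power keeps the mean connectivity as high as possible while still meeting the scale-free criterion.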

EMSA

A 5'-biotin-labeled oligonucleotide (cgaagcccaccaacccccca) was used as the PAL probe, and the same unlabeled oligonucleotide was used as the competitor. A 5'-biotin-labeled oligonucleotide (cgaagccctttttttcccca) was used as the mutated PAL probe. The probe was incubated with the nuclear extract at room temperature for 30 min. The entire reaction mixture was run on a non-denaturing 6% polyacrylamide gel in 0.5× TBE for 1 h at 60 V and 4°C before being transferred to a Biodyne B nylon membrane (Pall Corporation, NY, USA). The probe signal was visualized using the reagents included in the kit and a ChemiDoc XRS system (Bio-Rad Laboratories, CA, USA).
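The AC-rich element recognized by MYB factors, ACC(A/T)A(A/C)(T/C), can be located in the probe sequences with a regular-expression scan of both strands. The sketch below finds one site (ACCAACC) in the wild-type PAL probe and none in the mutated probe. This is consistent with the mutated probe serving as a negative control. It illustrates motif matching only and does not describe how the probes were designed. Function names are hypothetical.

```python
import re
from typing import List, Tuple

# ACC(A/T)A(A/C)(T/C) as a regular expression; the lookahead also reports overlapping sites.
AC_ELEMENT = re.compile(r"(?=(ACC[AT]A[AC][TC]))")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are left unchanged)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_ac_elements(seq: str) -> List[Tuple[str, int, str]]:
    """Return (strand, 0-based start on the given strand's forward coordinates, site) for each match."""
    seq = seq.upper()
    hits = [("+", m.start(), m.group(1)) for m in AC_ELEMENT.finditer(seq)]
    rc, n = reverse_complement(seq), len(seq)
    # Convert minus-strand matches back to forward-strand coordinates.
    hits += [("-", n - m.start() - len(m.group(1)), m.group(1)) for m in AC_ELEMENT.finditer(rc)]
    return hits


if __name__ == "__main__":
    print(find_ac_elements("cgaagcccaccaacccccca"))  # [('+', 8, 'ACCAACC')]
    print(find_ac_elements("cgaagccctttttttcccca"))  # []
```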

Lignin detection

Fresh root tips were embedded in 5% low-melting agar (agarose type I; Sigma, MO, USA) and sectioned using a vibratome (Leica VT 1000S) to assess swelling and cell wall modifications. Lignin deposits were detected by a 10-min treatment with phloroglucinol-HCl solution (VWR Prolabo). Autofluorescence was detected using an excitation range of 340–380 nm with a 400 nm dichroic mirror and a 425 nm long-pass emission filter. All images were obtained using an inverted microscope (Leica DMIRBE) equipped with a CCD camera (color cooled view; Photonic Science).

Accession numbers

The Plantain and Silk genome assemblies have been deposited in the Genome Warehouse in the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation, under accession number GWHCAXY00000000 (https://ngdc.cncb.ac.cn/gwh/Assembly/reviewer/nqIfOTpCtsMRttrQgYRpJRkvCPGEfwMIxmXWyNgXMHUtjRNhyZUyDApTkrJKgPjt) and GWHCAXX00000000 (https://ngdc.cncb.ac.cn/gwh/Assembly/reviewer/RPWygVXQyldacPmLODuXkAvCNEcnYWnkcdQFokwGcMmwklEaOorTlArZVFLxSnLW).

Funding

This work was jointly funded by the Strategy of Rural Vitalization of Guangdong Province (2022-NPY-00-003, 2022-NJS-00-001), the National Natural Science Foundation of China (32270712), the earmarked fund for CARS (CARS-31-01), GDAAS (202102TD, R2020PY-JX002), the Ba-Gui Scholar Program of Guangxi (to Z.-G.H), the Laboratory of Lingnan Modern Agriculture Project (NT2021004), and the Maoming Branch Grant (2021TDQD003).

Author contributions

L.-L.C., O.S., J.-M.S., and G.Y. conceived and supervised this study. O.S., G.Y., W.H., F.B., Y.L., T.D., G.M., S.L., C.L., Q.Y., C.H., H.G., and T.D. collected samples and performed experiments. W.-Z.X., Y.-Y.Z., L.-L.C., J.-M.S., R.Z., Y.-X.G., W.Z., M.-H.Y., S.-J.P., X.-T.Z., X.-D.X., Z.-W.Z., J.-W.F., and J.Z. performed genome assembly and annotation, comparative genomics analysis, and transcriptome data analysis. J.D. and Q.G. performed karyotype analysis of Plantain and Silk bananas. W.-Z.X., Y.-Y.Z., J.-M.S., O.S., and L.-L.C. wrote and revised the paper.

Acknowledgments

We sincerely thank Professor Maojun Wang at Huazhong Agricultural University for his guidance on the asymmetric evolution of the Plantain and Silk genomes. We also acknowledge the National Key Laboratory of Crop Genetic Improvement at HZAU for providing the computational resources for this study. No conflict of interest is declared.

Published: November 15, 2023

Footnotes

Published by the Plant Communications Shanghai Editorial Office in association with Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and CEMPS, CAS.

Supplemental information is available at Plant Communications Online.

Contributor Information

Ganjun Yi, Email: yiganjun@vip.163.com.

Jia-Ming Song, Email: jmsong@gxu.edu.cn.

Ou Sheng, Email: shengou6@126.com.

Ling-Ling Chen, Email: llchen@gxu.edu.cn.

Supplemental information

Document S1. Supplemental Figures 1–40 and Supplemental Tables 1–3, 4b, 5–11, 13, 15–17, 19, 20, 22–26, 28, 30, and 33–38

mmc1.pdf (57.4 MB, PDF)

Data S1. Supplemental Tables 4a, 12, 14, 18, 21, 27, 29, 31, and 32

mmc2.xlsx (7.1 MB, XLSX)

Document S2. Article plus supplemental information

mmc3.pdf (64.6 MB, PDF)

References

Akyeampong E., Escalant J.V. In: Bananas and Food Security. Boto I., Fouré E., Ngalani J., Thornton T., Valat M., editors. CIRAD; 1998. Plantains in west and central Africa: an overview. Montpellier 10–11.
Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
Bao W., Kojima K.K., Kohany O. Repbase update, a database of repetitive elements in eukaryotic genomes. Mobile DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9.
Belser C., Baurens F.C., Noel B., Martin G., Cruaud C., Istace B., Yahiaoui N., Labadie K., Hřibová E., Doležel J., et al. Telomere-to-telomere gapless chromosomes of banana using nanopore sequencing. Commun. Biol. 2021;4:1047. doi: 10.1038/s42003-021-02559-3.
Belton J.M., McCord R.P., Gibcus J.H., Naumova N., Zhan Y., Dekker J. Hi-C: a comprehensive technique to capture the conformation of genomes. Methods. 2012;58:268–276. doi: 10.1016/j.ymeth.2012.05.001.
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
Bird K.A., VanBuren R., Puzey J.R., Edger P.P. The causes and consequences of subgenome dominance in hybrids and recent polyploids. New Phytol. 2018;220:87–93. doi: 10.1111/nph.15256.
Bolger A.M., Lohse M., Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–2120. doi: 10.1093/bioinformatics/btu170.
Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 2000;17:540–552. doi: 10.1093/oxfordjournals.molbev.a026334.
Cenci A., Sardos J., Hueber Y., Martin G., Breton C., Roux N., Swennen R., Carpentier S.C., Rouard M. Unravelling the complex story of intergenomic recombination in ABB allotriploid bananas. Ann. Bot. 2021;127:7–20. doi: 10.1093/aob/mcaa032.
Chen K., Song M., Guo Y., Liu L., Xue H., Dai H., Zhang Z. MdMYB46 could enhance salt and osmotic stress tolerance in apple by directly activating stress-responsive signals. Plant Biotechnol. J. 2019;17:2341–2355. doi: 10.1111/pbi.13151.
Chen Y., Song W., Xie X., Wang Z., Guan P., Peng H., Jiao Y., Ni Z., Sun Q., Guo W. A collinearity-incorporating homology inference strategy for connecting emerging assemblies in the Triticeae tribe as a pilot practice in the plant pangenomic era. Mol. Plant. 2020;13:1694–1708. doi: 10.1016/j.molp.2020.09.019.
Cheng H., Jarvis E.D., Fedrigo O., Koepfli K.P., Urban L., Gemmell N.J., Li H. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 2022;40:1332–1335. doi: 10.1038/s41587-022-01261-x.
Chin C.S., Peluso P., Sedlazeck F.J., Nattestad M., Concepcion G.T., Clum A., Dunn C., O'Malley R., Figueroa-Balderas R., Morales-Cruz A., et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods. 2016;13:1050–1054. doi: 10.1038/nmeth.4035.
Cingolani P., Platts A., Wang L.L., Coon M., Nguyen T., Wang L., Land S.J., Lu X., Ruden D.M. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012;6:80–92. doi: 10.4161/fly.19695.
D’Hont A., Denoeud F., Aury J.M., et al. The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012;488:213–217. doi: 10.1038/nature11241.
D’Hont A., Paget-Goy A., Escoute J., Carreel F. The interspecific genome structure of cultivated banana, Musa spp. revealed by genomic DNA in situ hybridization. Theor. Appl. Genet. 2000;100:177–183.
Djébali N., Jauneau A., Ameline-Torregrosa C., Chardon F., Jaulneau V., Mathé C., Bottin A., Cazaux M., Pilet-Nayel M.L., Baranger A., et al. Partial resistance of Medicago truncatula to Aphanomyces euteiches is associated with protection of the root stele and is controlled by a major QTL rich in proteasome-related genes. Mol. Plant Microbe Interact. 2009;22:1043–1055. doi: 10.1094/MPMI-22-9-1043.
Dobin A., Davis C.A., Schlesinger F., Drenkow J., Zaleski C., Jha S., Batut P., Chaisson M., Gingeras T.R. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
Dubos C., Stracke R., Grotewold E., Weisshaar B., Martin C., Lepiniec L. MYB transcription factors in Arabidopsis. Trends Plant Sci. 2010;15:573–581. doi: 10.1016/j.tplants.2010.06.005.
Dudchenko O., Batra S.S., Omer A.D., Nyquist S.K., Hoeger M., Durand N.C., Shamim M.S., Machol I., Lander E.S., Aiden A.P., et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
Durand N.C., Shamim M.S., Machol I., Rao S.S.P., Huntley M.H., Lander E.S., Aiden E.L. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
Ellinghaus D., Kurtz S., Willhoeft U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinf. 2008;9:18. doi: 10.1186/1471-2105-9-18.
Emms D.M., Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238. doi: 10.1186/s13059-019-1832-y.
FAOSTAT Crops. Food and Agriculture Organization of the United Nations; 2022. http://www.fao.org/faostat/en/#data/QC
Feng J.W., Lu Y., Shao L., Zhang J., Li H., Chen L.L. Phasing analysis of the transcriptome and epigenome in a rice hybrid reveals the inheritance and difference in DNA methylation and allelic transcription regulation. Plant Commun. 2021;2. doi: 10.1016/j.xplc.2021.100185.
Ferrer J.L., Jez J.M., Bowman M.E., Dixon R.A., Noel J.P. Structure of chalcone synthase and the molecular basis of plant polyketide biosynthesis. Nat. Struct. Biol. 1999;6:775–784. doi: 10.1038/11553.
Flynn J.M., Hubley R., Goubert C., Rosen J., Clark A.G., Feschotte C., Smit A.F. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 2020;117:9451–9457. doi: 10.1073/pnas.1921046117.
Goel M., Sun H., Jiao W.B., Schneeberger K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 2019;20:277. doi: 10.1186/s13059-019-1911-0.
Goodstein D.M., Shu S., Howson R., Neupane R., Hayes R.D., Fazo J., Mitros T., Dirks W., Hellsten U., Putnam N., et al. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40:D1178–D1186. doi: 10.1093/nar/gkr944.
Han M.V., Thomas G.W.C., Lugo-Martinez J., Hahn M.W. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol. Biol. Evol. 2013;30:1987–1997. doi: 10.1093/molbev/mst100.
Hirschberg J. Carotenoid biosynthesis in flowering plants. Curr. Opin. Plant Biol. 2001;4:210–218. doi: 10.1016/s1369-5266(00)00163-1.
Holbein J., Franke R.B., Marhavý P., Fujita S., Górecka M., Sobczak M., Geldner N., Schreiber L., Grundler F.M.W., Siddique S. Root endodermal barrier system contributes to defence against plant-parasitic cyst and root-knot nematodes. Plant J. 2019;100:221–236. doi: 10.1111/tpj.14459.
Hribová E., Neumann P., Matsumoto T., Roux N., Macas J., Dolezel J. Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing. BMC Plant Biol. 2010;10:204. doi: 10.1186/1471-2229-10-204.
Katoh K., Asimenos G., Toh H. Multiple alignment of DNA sequences with MAFFT. Methods Mol. Biol. 2009;537:39–64. doi: 10.1007/978-1-59745-251-9_3.
Kema G.H.J., Drenth A. Vol. 2. Burleigh Dodds Science Publishing; 2020. Achieving sustainable cultivation of bananas. (Germplasm and Genetic Improvement).
Koren S., Rhie A., Walenz B.P., Dilthey A.T., Bickhart D.M., Kingan S.B., Hiendleder S., Williams J.L., Smith T.P.L., Phillippy A.M. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 2018;36:1174–1182. doi: 10.1038/nbt.4277.
Kurtz S., Phillippy A., Delcher A.L., Smoot M., Shumway M., Antonescu C., Salzberg S.L. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. doi: 10.1186/gb-2004-5-2-r12.
Langfelder P., Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559. doi: 10.1186/1471-2105-9-559.
Lescot M., Piffanelli P., Ciampi A.Y., Ruiz M., Blanc G., Leebens-Mack J., da Silva F.R., Santos C.M.R., D'Hont A., Garsmeur O., et al. Insights into the Musa genome: syntenic relationships to rice and between Musa species. BMC Genom. 2008;9:58. doi: 10.1186/1471-2164-9-58.
Letunic I., Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–W296. doi: 10.1093/nar/gkab301.
Li B., Dewey C.N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinf. 2011;12:323. doi: 10.1186/1471-2105-12-323.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191.
Li L., Stoeckert C.J., Jr., Roos D.S. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 2003;13:2178–2189. doi: 10.1101/gr.1224503.
Marçais G., Delcher A.L., Phillippy A.M., Coston R., Salzberg S.L., Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput. Biol. 2018;14. doi: 10.1371/journal.pcbi.1005944.
Marçais G., Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27:764–770. doi: 10.1093/bioinformatics/btr011.
Martin G., Baurens F.C., Droc G., Rouard M., Cenci A., Kilian A., Hastie A., Doležel J., Aury J.M., Alberti A., et al. Improvement of the banana "Musa acuminata" reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genom. 2016;17:243. doi: 10.1186/s12864-016-2579-4.
Martin G., Cottin A., Baurens F.C., Labadie K., Hervouet C., Salmon F., Paulo-de-la-Reberdiere N., Van den Houwe I., Sardos J., Aury J.M., et al. Interspecific introgression patterns reveal the origins of worldwide cultivated bananas in New Guinea. Plant J. 2023;113:802–818. doi: 10.1111/tpj.16086.
McCarthy R.L., Zhong R., Ye Z.H. MYB83 is a direct target of SND1 and acts redundantly with MYB46 in the regulation of secondary cell wall biosynthesis in Arabidopsis. Plant Cell Physiol. 2009;50:1950–1964. doi: 10.1093/pcp/pcp139.
Mistry J., Finn R.D., Eddy S.R., Bateman A., Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 2013;41:e121. doi: 10.1093/nar/gkt263.
Ou S., Chen J., Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730.
Ou S., Su W., Liao Y., Chougule K., Agda J.R.A., Hellinga A.J., Lugo C.S.B., Elliott T.A., Ware D., Peterson T., et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. doi: 10.1186/s13059-019-1905-y.
Perrier X., De Langhe E., Donohue M., Lentfer C., Vrydaghs L., Bakry F., Carreel F., Hippolyte I., Horry J.P., Jenny C., et al. Multidisciplinary perspectives on banana (Musa spp.) domestication. Proc. Natl. Acad. Sci. USA. 2011;108:11311–11318. doi: 10.1073/pnas.1102001108.
Pham G.M., Newton L., Wiegert-Rininger K., Vaillancourt B., Douches D.S., Buell C.R. Extensive genome heterogeneity leads to preferential allele expression and copy number-dependent expression in cultivated potato. Plant J. 2017;92:624–637. doi: 10.1111/tpj.13706.
Puttick M.N. MCMCtreeR: functions to prepare MCMCtree analyses and visualize posterior ages on trees. Bioinformatics. 2019;35:5321–5322. doi: 10.1093/bioinformatics/btz554.
Quinlan A.R. BEDTools: the swiss-army tool for genome feature analysis. Curr. Protoc. Bioinformatics. 2014;47:1–34. doi: 10.1002/0471250953.bi1112s47.
Ravi I., Mustaffa M.M. Starch and amylose variability in banana cultivars. Indian J. Plant Physiol. 2013;18:83–87.
Rhie A., Walenz B.P., Koren S., Phillippy A.M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020;21:245. doi: 10.1186/s13059-020-02134-9.
Robinson J.C., Sauco V.G. 2nd ed. CABI Publishing; 2010. Bananas and Plantains.
Shannon P., Markiel A., Ozier O., Baliga N.S., Wang J.T., Ramage D., Amin N., Schwikowski B., Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. doi: 10.1101/gr.1239303.
Sheng O., Yin Z., Huang W., Chen M., Du M., Kong Q., Fernie A.R., Yi G., Yan S. Metabolic profiling reveals genotype-associated alterations in carotenoid content during banana postharvest ripening. Food Chem. 2023;403. doi: 10.1016/j.foodchem.2022.134380.
Simão F.A., Waterhouse R.M., Ioannidis P., Kriventseva E.V., Zdobnov E.M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
Simmonds N.W., Shepherd K. The taxonomy and origins of the cultivated bananas. Bot. J. Linn. Soc. 1955;55:302–312.
Song J.M., Xie W.Z., Wang S., Guo Y.X., Koo D.H., Kudrna D., Gong C., Huang Y., Feng J.W., Zhang W., et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant. 2021;14:1757–1767. doi: 10.1016/j.molp.2021.06.018.
Stamatakis A. Using RAxML to infer phylogenies. Curr. Protoc. Bioinformatics. 2015;51:6.14.1–6.14.14. doi: 10.1002/0471250953.bi0614s51.
Sun P., Jiao B., Yang Y., Shan L., Li T., Li X., Xi Z., Wang X., Liu J. WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol. Plant. 2022;15:1841–1851. doi: 10.1016/j.molp.2022.10.018.
Sun S., Zhou Y., Chen J., Shi J., Zhao H., Zhao H., Song W., Zhang M., Cui Y., Dong X., et al. Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes. Nat. Genet. 2018;50:1289–1295. doi: 10.1038/s41588-018-0182-0.
Tang H., Bowers J.E., Wang X., Ming R., Alam M., Paterson A.H. Synteny and collinearity in plant genomes. Science. 2008;320:486–488. doi: 10.1126/science.1153917.
Thomas R., Fang X., Ranathunge K., Anderson T.R., Peterson C.A., Bernards M.A. Soybean root suberin: anatomical distribution, chemical composition, and relationship to partial resistance to Phytophthora sojae. Plant Physiol. 2007;144:299–311. doi: 10.1104/pp.106.091090.
Van de Weyer A.L., Monteiro F., Furzer O.J., Nishimura M.T., Cevik V., Witek K., Jones J.D.G., Dangl J.L., Weigel D., Bemm F. A species-wide inventory of NLR genes and alleles in Arabidopsis thaliana. Cell. 2019;178:1260–1272.e14. doi: 10.1016/j.cell.2019.07.038.
Vasimuddin M., Misra S., Li H., Aluru S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS); 2019. IEEE.
Wall M.M. Ascorbic acid, vitamin A, and mineral composition of banana (Musa sp.) and papaya (Carica papaya) cultivars grown in Hawaii. J. Food Compos. Anal. 2006;19:434–445.
Wang Y., Tang H., Debarry J.D., Tan X., Li J., Wang X., Lee T.H., Jin H., Marler B., Guo H., et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49. doi: 10.1093/nar/gkr1293.
Wang Z., Miao H., Liu J., Xu B., Yao X., Xu C., Zhao S., Fang X., Jia C., Wang J., et al. Musa balbisiana genome reveals subgenome evolution and functional divergence. Nat. Plants. 2019;5:810–821. doi: 10.1038/s41477-019-0452-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang Z., Rouard M., Biswas M.K., Droc G., Cui D., Roux N., Baurens F.C., Ge X.J., Schwarzacher T., Heslop-Harrison P.J.S., et al. A chromosome-level reference genome of Ensete glaucum gives insight into diversity and chromosomal and repetitive sequence evolution in the Musaceae. GigaScience. 2022;11:giac027. doi: 10.1093/gigascience/giac027. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wicker T., Sabot F., Hua-Van A., Bennetzen J.L., Capy P., Chalhoub B., Flavell A., Leroy P., Morgante M., Panaud O., et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 2007;8:973–982. doi: 10.1038/nrg2165. [DOI] [PubMed] [Google Scholar]
Xu Z., Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007;35:W265–W268. doi: 10.1093/nar/gkm286. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yang C., Xu Z., Song J., Conner K., Vizcay Barrena G., Wilson Z.A. Arabidopsis MYB26/MALE STERILE35 regulates secondary thickening in the endothecium and is essential for anther dehiscence. Plant Cell. 2007;19:534–548. doi: 10.1105/tpc.106.046391. [DOI] [PMC free article] [PubMed] [Google Scholar]
Yang P., Zhao H.Y., Wei J.S., Zhao Y.Y., Lin X.J., Su J., Li F.P., Li M., Ma D.M., Tan X.K., et al. Chromosome-level genome assembly and functional characterization of terpene synthases provide insights into the volatile terpenoid biosynthesis of Wurfbainia villosa. Plant J. 2021;112:630–645. doi: 10.1111/tpj.15968. [DOI] [PubMed] [Google Scholar]
Zhan N., Kuang M., He W., Deng G., Liu S., Li C., Roux N., Dita M., Yi G., Sheng O. Evaluation of resistance of banana genotypes with AAB genome to Fusarium wilt Tropical Race 4 in China. J. Fungi. 2022;8:1274. doi: 10.3390/jof8121274. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang R.G., Li G.Y., Wang X.L., Dainat J., Wang Z.X., Ou S., Ma Y. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res. 2022;9 doi: 10.1093/hr/uhac017. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang X., Chen S., Shi L., Gong D., Zhang S., Zhao Q., Zhan D., Vasseur L., Wang Y., Yu J., et al. Haplotype-resolved genome assembly provides insights into evolutionary history of the tea plant Camellia sinensis. Nat. Genet. 2021;53:1250–1259. doi: 10.1038/s41588-021-00895-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhang X., Zhang S., Zhao Q., Ming R., Tang H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants. 2019;5:833–845. doi: 10.1038/s41477-019-0487-8. [DOI] [PubMed] [Google Scholar]
Zhao M., Zhang B., Lisch D., Ma J. Patterns and consequences of subgenome differentiation provide insights into the nature of paleopolyploidy in plants. Plant Cell. 2017;29:2974–2994. doi: 10.1105/tpc.17.00595. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhao Q., Dixon R.A. Transcriptional networks for lignin biosynthesis: more complex than we thought? Trends Plant Sci. 2011;16:227–233. doi: 10.1016/j.tplants.2010.12.005. [DOI] [PubMed] [Google Scholar]
Zhi D., Raphael B.J., Price A.L., Tang H., Pevzner P.A. Identifying repeat domains in large genomes. Genome Biol. 2006;7:R7. doi: 10.1186/gb-2006-7-1-r7. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhong R., Richardson E.A., Ye Z.H. The MYB46 transcription factor is a direct target of SND1 and regulates secondary wall biosynthesis in Arabidopsis. Plant Cell. 2007;19:2776–2792. doi: 10.1105/tpc.107.053678. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zhou J., Mu Q., Wang X., Zhang J., Yu H., Huang T., He Y., Dai S., Meng X. Multilayered synergistic regulation of phytoalexin biosynthesis by ethylene, jasmonate, and MAPK signaling pathways in Arabidopsis. Plant Cell. 2022;34:3066–3087. doi: 10.1093/plcell/koac139. [DOI] [PMC free article] [PubMed] [Google Scholar]
Zuo C., Deng G., Li B., Huo H., Li C., Hu C., Kuang R., Yang Q., Dong T., Sheng O., et al. Germplasm screening of Musa spp. for resistance to Fusarium oxysporum f. sp. cubense tropical race 4 (Foc TR4) Eur. J. Plant Pathol. 2018;151:723–734. [Google Scholar]

Associated Data


Supplementary Materials

Document S1. Supplemental Figures 1–40 and Supplemental Tables 1–3, 4b–11, 13, 15–17, 19, 20, 22–26, 28, 30, and 33–38

mmc1.pdf (57.4 MB, PDF)

Data S1. Supplemental Tables 4a, 12, 14, 18, 21, 27, 29, 31, and 32

mmc2.xlsx (7.1 MB, XLSX)

Document S2. Article plus supplemental information

mmc3.pdf (64.6 MB, PDF)


