Plant Communications. 2023 Feb 9;4(3):100557. doi: 10.1016/j.xplc.2023.100557

A high-quality, phased genome assembly of broomcorn millet reveals the features of its subgenome evolution and 3D chromatin organization

Zhiheng Wang 1,5, Shihui Huang 2,3,5, Zhengyue Yang 1, Jinsheng Lai 4, Xiang Gao 1,∗, Junpeng Shi 1,∗∗
PMCID: PMC10203449  PMID: 36760128

Dear Editor,

Grasses of the genus Panicum grow in natural and agricultural ecosystems worldwide and include about 450 species distributed throughout tropical and temperate regions. Most Panicum grasses remain unexploited, with the exceptions of broomcorn millet (P. miliaceum) (Shi et al., 2019; Zou et al., 2019), switchgrass (P. virgatum) (Lovell et al., 2021), Hall’s panicgrass (P. hallii) (Lovell et al., 2018), and a few other species that have been successfully domesticated into staple, forage, and energy crops. Broomcorn millet (2n = 4x = 36) is probably one of the earliest domesticated grain crops, originating in North China around 10 000 years ago. Switchgrass is a sister tetraploid (2n = 4x = 36) that has been widely researched for bioenergy in North America. With superior resistance to abiotic and biotic stresses, minimal input requirements, and high resource use efficiency, these and other Panicum species are ideal pioneer crops for large areas of marginal land and therefore hold great promise for contributing to global food and energy security. Draft genomes of broomcorn millet and switchgrass have recently been reported (Shi et al., 2019; Zou et al., 2019; Lovell et al., 2021), revealing highly conserved synteny despite apparent differences in genome size (∼850 vs. ∼1130 Mb). Although two close time points (∼6.7 and ∼5.9 million years ago [Mya]) associated with their tetraploidization have been established, it remains unclear whether one or two independent polyploidization events occurred. In addition, the subgenomes of these tetraploids have not been well partitioned owing to incomplete information on their diploid ancestors, and relatively little is known about differences in sequence and transcription between the subgenomes. Improved genome sequences are therefore needed to perform comparative analysis and promote molecular breeding.

To this end, we sequenced the broomcorn millet accession Longmi4 using PacBio HiFi technology, generating 42.3 Gb of clean reads (∼51×) with an N50 of ∼15.6 kb (Supplemental Figure 1). All HiFi reads were assembled with Hifiasm (Cheng et al., 2021) and then verified, filtered, and curated to produce the final Longmi_v2 assembly (∼846.0 Mb; Supplemental Figure 2; Supplemental Table 1). The contig N50 of Longmi_v2 has been dramatically improved to ∼26.2 Mb without a significant change in genome size compared with Longmi_v1 (∼848.4 Mb; Supplemental Table 2). A total of 525 gaps in Longmi_v1 have been closed, and ∼85.2% of these filled sequences came from transposable elements (TEs). Using Hi-C data, more than 99% of the contigs have been anchored to 18 pseudo-chromosomes (Figure 1A). About 98.9% of the Embryophyta Benchmarking Universal Single Copy Orthologs (BUSCOs) (Manni et al., 2021) were annotated, and the LTR Assembly Index (20.8) (Ou et al., 2018) was higher than that of Longmi_v1 (15.9) and switchgrass AP13 (17.3), indicating the high completeness of our assembly in both genic and non-genic regions (Supplemental Table 2). By integrating ab initio prediction, evidence-based prediction, and homology searches, we annotated 60 096 high-confidence genes. Although this gene number was lower than that of Longmi_v1 (63 671), the percentage of BUSCOs in the encoded proteins was substantially higher (96.9% vs. 83.7%). Functional annotations could be assigned to ∼94.9% of the genes using InterProScan (Blum et al., 2021). We annotated TEs in the Longmi_v2 genome using the EDTA (Extensive de-novo TE Annotator) pipeline (Ou et al., 2019). Approximately 57.9% of the Longmi_v2 genome was found to consist of TEs (Supplemental Table 3), slightly more than that of Longmi_v1 (∼55.4%), a difference that could be explained largely by Helitrons (6.8% vs. 5.1%). About 1.8% more Gypsy retroelements were annotated in Longmi_v2, and unclassified long terminal repeats decreased from 5.1% to 4%. These results suggest a better representation of TEs in the improved Longmi_v2 genome.
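The contig N50 quoted above is a contiguity summary: the length L such that contigs of length L or longer together contain at least half of the assembled bases. The following is a minimal sketch of that definition in Python, not the tool used for Longmi_v2; the function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum past half the
        # total defines the N50.
        if running >= half_total:
            return length


# Toy contig lengths in bp (illustrative only, not Longmi_v2 data).
toy_contigs = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

In practice the lengths would be read from the assembly FASTA or its index rather than typed in by hand.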

Figure 1. Assembly, annotation, and comparative analysis of the Longmi_v2 genome.

(A) The genome landscape of Longmi_v2. Collinear links between the two subgenomes are represented in the inner track.

(B) Clustering of 13-mers that enabled partitioning of the Longmi_v2 genome into two subgenomes. A total of 55 385 13-mers are specific to subgenome 1, and the remaining 29 401 13-mers are specific to subgenome 2. The maximum number of subgenome-specific k-mers on each chromosome, if greater than 30, was set to 30 to enable better heatmap representation.

(C) Syntenic relationships between the two subgenomes of Longmi_v2. Three major translocations (LM1A–LM7B, LM3A–LM8B, and LM8A–LM3B) are shown in yellow and blue.

(D–G) Comparison of gene expression levels of one-to-one syntenic genes (n = 16 359) between the two subgenomes of Longmi_v2 in the root (D), stem (E), seedling (F) and leaf (G). The x axis represents the log10-transformed transcripts per million value in the two subgenomes, and the y axis represents the number of expressed genes. The purple and green bars represent genes in subgenome 1 and subgenome 2, respectively.

(H) Ks distribution between subgenomes of broomcorn millet and switchgrass. LM1 and LM2 are the two subgenomes of broomcorn millet, and AP13K and AP13N are the two subgenomes of switchgrass.

(I) A better-resolved phylogenetic tree of Panicum inferred from the Ks values of mutual genome or subgenome comparisons. The molecular clock was estimated on the basis of the divergence time of sorghum and maize (Ks ∼ 0.152, 11.9 Mya). Allotetraploidization events in broomcorn millet and switchgrass are indicated with stars.

(J) An illustration showing the timing of the split between the two diploid ancestors (∼6.3 Mya) and their later combination (∼2 Mya) into an allotetraploid in broomcorn millet.

(K) A comparison of gene transcription levels from the A and B compartments of Longmi_v2.

The two subgenomes of broomcorn millet were partitioned using a novel k-mer-based approach (Figure 1B and Supplemental Figure 3) that identified 55 385 and 29 401 13-mers present specifically in subgenome 1 (∼470 Mb) and subgenome 2 (∼367 Mb), respectively. We also verified that this method could successfully reconstruct the K and N subgenomes of switchgrass AP13 (Lovell et al., 2021) (Supplemental Figure 3). Our results were consistent with another newly released broomcorn millet genome (Jinshu7) (Sun et al., 2022) in which the two subgenomes were phased using P. hallii as a diploid reference (Supplemental Figure 4). Three major translocation events were found between the two subgenomes of broomcorn millet (Figure 1C), which were supported by canonical Hi-C interaction of the involved chromosomes (Supplemental Figure 5). TEs, especially Gypsy (180.3 vs. 107.9 Mb) and Helitrons (26.1 vs. 17.6 Mb), can almost completely explain the size difference between the subgenomes (103.0 Mb) (Supplemental Table 3). Although more genes were annotated in subgenome 1 (31 043) than in subgenome 2 (28 567), gene density was higher in subgenome 2 (77 vs. 66 genes/Mb). We previously revealed that, unlike those of maize (Schnable et al., 2011), the subgenomes of broomcorn millet have experienced no apparent gene fractionation (Shi et al., 2019). RNA sequencing data from four tissues revealed overall similar transcript abundances for 16 359 one-to-one syntenic gene pairs between the two subgenomes (Figure 1D–1G). The number of syntenic genes that exhibited differential expression (>two-fold) between the two subgenomes was also similar (Supplemental Figure 6). For the remaining non-syntenic genes (14 684 vs. 12 208), their expression levels were also largely balanced (Supplemental Figure 7). These data suggest no apparent transcriptional dominance between the two subgenomes of broomcorn millet.
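The idea behind subgenome-specific k-mers can be illustrated with a small Python sketch. It is a simplified stand-in, not the authors' pipeline: it assumes a tentative assignment of chromosomes to two groups is already available, ignores reverse complements, and uses 4-mers on toy sequences instead of 13-mers on a genome. The names `count_kmers`, `group_specific_kmers`, and the toy chromosome labels are hypothetical.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers in one sequence.

    The sequence is uppercased and k-mers containing N are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts


def group_specific_kmers(chromosomes, groups, k, min_count=1):
    """Return {group label: set of k-mers} found in every chromosome of
    that group (at least min_count times) and in no other chromosome.

    chromosomes: dict mapping chromosome name -> sequence
    groups: dict mapping chromosome name -> tentative group label
    """
    per_chrom = {name: count_kmers(seq, k) for name, seq in chromosomes.items()}
    specific = {}
    for label in set(groups.values()):
        inside = [per_chrom[n] for n in chromosomes if groups[n] == label]
        outside = [per_chrom[n] for n in chromosomes if groups[n] != label]
        # k-mers present in all chromosomes of this group
        shared = set.intersection(
            *({km for km, c in counts.items() if c >= min_count} for counts in inside)
        )
        # k-mers seen anywhere outside this group
        seen_elsewhere = set().union(*(counts.keys() for counts in outside))
        specific[label] = shared - seen_elsewhere
    return specific


# Toy sequences (illustrative only, not broomcorn millet data).
toy_chromosomes = {
    "chr1A": "ACGTACGTTTGA",
    "chr2A": "GGACGTACCTTG",
    "chr1B": "ACGAACGATTGA",
    "chr2B": "GGACGAACCTTG",
}
toy_groups = {"chr1A": "A", "chr2A": "A", "chr1B": "B", "chr2B": "B"}
toy_specific = group_specific_kmers(toy_chromosomes, toy_groups, k=4)
print({label: sorted(kmers) for label, kmers in toy_specific.items()})
```

A genome-scale analysis would normally use a dedicated k-mer counter and cluster chromosomes from the k-mer profiles themselves, as in Figure 1B, rather than start from a known grouping.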

To better resolve the phylogeny, we calculated the synonymous substitution rate (Ks) of syntenic genes between the two subgenomes, revealing a major peak in switchgrass (∼0.074, ∼6.68 Mya) that slightly predated that in broomcorn millet (∼0.068, ∼6.30 Mya), consistent with previous reports (Shi et al., 2019; Lovell et al., 2021). The two subgenomes of broomcorn millet and switchgrass were cross compared, resulting in four nearly identical Ks peaks around 0.085 (Figure 1H), indicating that they shared the same lineage before ∼8.1 Mya. These results suggest that two independent polyploidization events were likely to have occurred in broomcorn millet and switchgrass, consistent with a previous report based on five nuclear genes (Bennetzen et al., 2012). Interestingly, although broomcorn millet is primarily grown in China and other Eurasian regions and has a geographic distribution distinct from that of switchgrass and P. hallii (both mainly in North America) (Lovell et al., 2018), it exhibits closer genetic relatedness to P. hallii than to switchgrass (Supplemental Figure 8), especially for subgenome 2, which shared the same lineage with P. hallii until 2 Mya (Figure 1I). Notably, the activity of both Gypsy and Copia elements was similar in the two subgenomes of broomcorn millet within the last 2 million years (Supplemental Figure 9), suggesting that its tetraploidization was very likely to have been completed near 2 Mya (Figure 1J). Relatively recent tetraploidization is also consistent with the high degree of gene retention and balanced gene expression between the two subgenomes.
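The link between a Ks peak and a divergence time is commonly written as Ks = 2rT, where r is the synonymous substitution rate per site per year and T is the time since divergence. The Python sketch below applies this textbook relation to the Ks and time pairs quoted in the text and in Figure 1I; it is an assumption that a single-rate relation of this form is a fair approximation, and it is not presented as the dating procedure behind the figure. The function and variable names are hypothetical.

```python
def rate_from_calibration(ks, time_mya):
    """Rate per synonymous site per year implied by one (Ks, time) pair,
    using Ks = 2 * r * T."""
    return ks / (2.0 * time_mya * 1e6)


def time_from_ks(ks, rate_per_year):
    """Divergence time in million years for a given Ks and rate."""
    return ks / (2.0 * rate_per_year) / 1e6


# (Ks, time in Mya) pairs as quoted in the text and the Figure 1 legend.
reported_pairs = {
    "sorghum vs. maize (calibration)": (0.152, 11.9),
    "switchgrass subgenomes": (0.074, 6.68),
    "broomcorn millet subgenomes": (0.068, 6.30),
    "millet vs. switchgrass subgenomes": (0.085, 8.1),
}

implied_rates = {
    label: rate_from_calibration(ks, t) for label, (ks, t) in reported_pairs.items()
}
for label, rate in implied_rates.items():
    print(f"{label}: {rate:.2e} substitutions per site per year")
```

Applied to these pairs, the implied rates fall between roughly 5.2 × 10^-9 and 6.4 × 10^-9 substitutions per synonymous site per year, so the reported dates do not sit exactly on a single rate. This may reflect rounding of the peak positions or dating from the tree rather than from one fixed clock.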

Finally, we investigated the chromatin organization of broomcorn millet using Hi-C data and found a much larger inactive B compartment (∼486.3 Mb) than active A compartment (∼346.5 Mb). All chromosomes exhibited a typical A–B–A pattern (two A compartments separated by a B compartment), except for LM7A and LM7B, which showed a truncated A–B pattern (Figure 1A). Interestingly, their homologous chromosome in foxtail millet (chr7) also showed a similar A–B pattern (Dong et al., 2020), suggesting that this kind of 3D chromatin structure was likely to have formed in the common ancestor of Panicum and Setaria. The B compartment was consistently larger than the A compartment in both subgenome 1 (285.6 vs. 181.0 Mb) and subgenome 2 (200.7 vs. 165.5 Mb; Supplemental Table 4). Considering the relatively minor size difference in the A compartment between the two subgenomes, compartment B and its highly associated Gypsy elements can largely explain the size difference between the two subgenomes. As expected, more genes were expressed (transcripts per million > 1) in the A compartment (68.8%) than in the B compartment (50.6%); by contrast, there were more than twice as many unexpressed genes (transcripts per million = 0) in the B compartment than in the A compartment (36.6% vs. 17.9%) (Figure 1K).
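The claim that compartment B accounts for most of the size difference between the subgenomes can be checked directly from the compartment sizes quoted above. The short Python calculation below uses only those figures; the variable names are illustrative.

```python
# Compartment sizes in Mb, as quoted in the text (Supplemental Table 4).
compartments = {
    "subgenome 1": {"A": 181.0, "B": 285.6},
    "subgenome 2": {"A": 165.5, "B": 200.7},
}

# Genome-wide totals per compartment.
total_a = sum(c["A"] for c in compartments.values())
total_b = sum(c["B"] for c in compartments.values())

# Size difference between the subgenomes within each compartment.
diff_a = compartments["subgenome 1"]["A"] - compartments["subgenome 2"]["A"]
diff_b = compartments["subgenome 1"]["B"] - compartments["subgenome 2"]["B"]

# Fraction of the compartment-assigned difference that lies in B.
share_b = diff_b / (diff_a + diff_b)

# Ratio of unexpressed-gene percentages, B versus A (Figure 1K).
ratio_unexpressed = 36.6 / 17.9

print(f"A total: {total_a:.1f} Mb, B total: {total_b:.1f} Mb")
print(f"A difference: {diff_a:.1f} Mb, B difference: {diff_b:.1f} Mb")
print(f"Share of difference in B: {share_b:.1%}")
print(f"Unexpressed genes, B/A ratio: {ratio_unexpressed:.2f}")
```

The per-subgenome values sum to the genome-wide totals given at the start of the paragraph (346.5 and 486.3 Mb), and about 85% of the compartment-assigned size difference between the subgenomes (84.9 of 100.4 Mb) falls in the B compartment.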

In summary, we report a platinum-grade genome of broomcorn millet with two well-partitioned subgenomes. Consistent with their balanced gene retention, the two subgenomes exhibited no apparent transcriptional dominance. Comparative analysis revealed that two independent polyploidization events likely occurred in broomcorn millet and switchgrass. The chromatin architecture of broomcorn millet was explored, and the transcriptionally inactive B compartment and its strongly associated Gypsy elements largely explained the size difference between the subgenomes. These data will expand our understanding of Panicum evolution and further accelerate genome-assisted breeding of Panicum crops.

Funding

This work was supported by the National Natural Science Foundation of China (31901596) and the Young Elite Scientists Sponsorship Program by CAST (2021QNRC001) to J.S.

Author contributions

J.S., X.G., and J.L. designed this research project. Z.W. and S.H. analyzed most of the data. Z.Y., X.G., and J.L. participated in data analysis and interpretation. X.G. was involved in sample collection and sequencing. J.S., Z.W., and S.H. wrote the manuscript. J.S. and X.G. finalized the manuscript. All authors have read and approved the manuscript.

Acknowledgments

We thank Professor Song Weibin, Dr. Zhao Hainan, Dr. Lu Qiong, and Dr. Bai Yuhe from China Agricultural University for preparing the plant materials and sequencing libraries and for helpful discussions. No conflict of interest is declared.

Published: February 9, 2023

Footnotes

Published by the Plant Communications Shanghai Editorial Office in association with Cell Press, an imprint of Elsevier Inc., on behalf of CSPB and CEMPS, CAS.

Supplemental information is available at Plant Communications Online.

Contributor Information

Xiang Gao, Email: gaox87@mail.sysu.edu.cn.

Junpeng Shi, Email: shijp6@mail.sysu.edu.cn.

Accession numbers

The genome assembly and annotations have been deposited in the Genome Warehouse at the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences/China National Center for Bioinformation, under accession number GWHAAEZ00000000.1. The RNA sequencing and PacBio HiFi data have been deposited in the Genome Sequence Archive at the National Genomics Data Center, Beijing Institute of Genomics, under accession number GSA: CRA009679 and at the NCBI with accession numbers SRR20315261–SRR20315266. The genome sequences have also been deposited in DDBJ/ENA/GenBank under accession PPDP00000000. The version described in this paper is PPDP03000000.

All custom code used in this study is available upon request or from GitHub (https://github.com/imhuangshihui/Broomcorn_millet_genome).

Supplemental information

Document S1. Supplemental methods, Supplemental Figures 1–9, and Supplemental Tables 1–4
mmc1.pdf (461KB, pdf)
Document S2. Article plus supplemental information
mmc2.pdf (1.2MB, pdf)

References

  1. Bennetzen J.L., Schmutz J., Wang H., Percifield R., Hawkins J., Pontaroli A.C., Estep M., Feng L., Vaughn J.N., Grimwood J., et al. Reference genome sequence of the model plant Setaria. Nat. Biotechnol. 2012;30:555–561. doi: 10.1038/nbt.2196.
  2. Blum M., Chang H., Chuguransky S., Grego T., Kandasaamy S., Mitchell A., Nuka G., Paysan-Lafosse T., Qureshi M., Raj S., et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021;49:D344–D354. doi: 10.1093/nar/gkaa977.
  3. Cheng H., Concepcion G.T., Feng X., Zhang H., Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.
  4. Dong P., Tu X., Li H., Zhang J., Grierson D., Li P., Zhong S. Tissue-specific Hi-C analyses of rice, foxtail millet and maize suggest non-canonical function of plant chromatin domains. J. Integr. Plant Biol. 2020;62:201–217. doi: 10.1111/jipb.12809.
  5. Lovell J.T., Jenkins J., Lowry D.B., Mamidi S., Sreedasyam A., Weng X., Barry K., Bonnette J., Campitelli B., Daum C., et al. The genomic landscape of molecular responses to natural drought stress in Panicum hallii. Nat. Commun. 2018;9:5213. doi: 10.1038/s41467-018-07669-x.
  6. Lovell J.T., MacQueen A.H., Mamidi S., Bonnette J., Jenkins J., Napier J.D., Sreedasyam A., Healey A., Session A., Shu S., et al. Genomic mechanisms of climate adaptation in polyploid bioenergy switchgrass. Nature. 2021;590:438–444. doi: 10.1038/s41586-020-03127-1.
  7. Manni M., Berkeley M.R., Seppey M., Simão F.A., Zdobnov E.M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol. 2021;38:4647–4654. doi: 10.1093/molbev/msab199.
  8. Ou S., Chen J., Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730.
  9. Ou S., Su W., Liao Y., Chougule K., Agda J.R.A., Hellinga A.J., Lugo C.S.B., Elliott T.A., Ware D., Peterson T., et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. doi: 10.1186/s13059-019-1905-y.
  10. Schnable J.C., Springer N.M., Freeling M. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc. Natl. Acad. Sci. USA. 2011;108:4069–4074. doi: 10.1073/pnas.1101368108.
  11. Shi J., Ma X., Zhang J., Zhou Y., Liu M., Huang L., Sun S., Zhang X., Gao X., Zhan W., et al. Chromosome conformation capture resolved near complete genome assembly of broomcorn millet. Nat. Commun. 2019;10:464. doi: 10.1038/s41467-018-07876-6.
  12. Sun Y., Liu Y., Shi J., et al. Biased mutations and gene losses underlying diploidization of the tetraploid broomcorn millet genome. Plant J. 2023;113:787–801. doi: 10.1111/tpj.16085.
  13. Zou C., Li L., Miki D., Li D., Tang Q., Xiao L., Rajput S., Deng P., Peng L., Jia W., et al. The genome of broomcorn millet. Nat. Commun. 2019;10:436. doi: 10.1038/s41467-019-08409-5.
