Chromosome-scale assembly of the wild wheat relative Aegilops umbellulata

Michael Abrouk; Yajun Wang; Emile Cavalet-Giorsa; Maxim Troukhan; Maksym Kravchuk; Simon G Krattinger

doi:10.1038/s41597-023-02658-2

2023 Oct 25;10:739. doi: 10.1038/s41597-023-02658-2


PMCID: PMC10600132 PMID: 37880246

Abstract

Wild wheat relatives have been explored in plant breeding to increase the genetic diversity of bread wheat, one of the most important food crops. Aegilops umbellulata is a diploid U genome-containing grass species that serves as a genetic reservoir for wheat improvement. In this study, we report the construction of a chromosome-scale reference assembly of Ae. umbellulata accession TA1851 based on corrected PacBio HiFi reads and chromosome conformation capture. The total assembly size was 4.25 Gb with a contig N50 of 17.7 Mb. In total, 36,268 gene models were predicted. We benchmarked the performance of hifiasm and LJA, two of the most widely used assemblers, using standard and corrected HiFi reads, revealing a positive effect of corrected input reads. Comparative genome analysis confirmed substantial chromosome rearrangements in Ae. umbellulata compared to bread wheat. In summary, the Ae. umbellulata assembly provides a resource for comparative genomics in Triticeae and for the discovery of agriculturally important genes.

Subject terms: Plant sciences, Genome informatics

Background & Summary

The genus Aegilops contains several grass species, commonly referred to as goatgrass. The genus comprises at least 23 diploid and polyploid species and six different genomes (C, D, M, N, S, and U)^1–4. Aegilops species belong to the same tribe as the major cereal crops bread wheat (Triticum aestivum, 2n = 6x = 42; AABBDD genome), durum wheat (Triticum durum, 2n = 4x = 28; AABB genome) and barley (Hordeum vulgare, 2n = 2x = 14). The genus has thus been explored to increase genetic diversity of wheat via wide hybridization and chromosome recombination^5,6.

Aegilops umbellulata (2n = 2x = 14, UU genome) is the only diploid Aegilops species containing the U genome (Fig. 1a). Compared to the bread wheat A, B and D genomes, the U genome contains several large chromosome rearrangements. In particular, chromosomes 4U, 6U, and 7U show multiple reciprocal translocations, inversions and intra-chromosomal translocations^7,8. The U genome is a source of disease resistance genes that have been transferred into wheat, including Lr9, Lr76, Yr70 and PmY39^9–11. Recently, the leaf rust resistance gene Lr9 was cloned and found to encode an unusual kinase fusion protein. Ae. umbellulata accession TA1851 was identified as the probable donor of Lr9¹². In this previous analysis, a contig-level assembly of TA1851 was generated to evaluate the Lr9 translocation in bread wheat. The TA1851 contig-level assembly was based on ~157 Gb (~35-fold coverage) of HiFi reads¹³.

In this current study, we first polished the TA1851 HiFi reads using the DeepConsensus¹⁴ pipeline in order to increase read accuracy and to improve the primary contig-level assembly. We then assembled an Ae. umbellulata chromosome-scale reference genome by integrating chromatin conformation capture (Omni-C) data. CpG methylation along the chromosomes was inferred from the PacBio CCS data. The high-quality Ae. umbellulata assembly obtained in this study provides a reference for the U genome of the Triticeae tribe. It will serve as the basis to study chromosome rearrangements across different Triticeae species and can be explored to detect U genome introgressions in durum and bread wheat.

Methods

Plant material, DNA extraction and sequencing

The DNA extraction and generation of PacBio HiFi reads were described previously¹². In brief, high molecular weight (HMW) DNA was extracted from young seedlings of Ae. umbellulata accession TA1851 using a modified Qiagen Genomic DNA extraction protocol (10.17504/protocols.io.bafmibk6)¹⁵. DNA was sheared to the appropriate size range (15–20 kb) using Megaruptor 3 (Diagenode) for the construction of PacBio HiFi sequencing libraries. Library preparation was done with the Express Template Prep Kit 2.0 (100-938-900) and Enzyme Clean up 2.0 (101-932-600), and size selection was performed with a PippinHT System (Sage Science, HTP0001). Sequencing was performed on PacBio Sequel II systems. The Omni-C library was prepared and sequenced at Cantata Bio using the Dovetail® Omni-C® Kit for plant tissues according to the manufacturer’s protocol. One library was sequenced on an Illumina MiSeq platform to generate ~776 million 2 × 150 bp read pairs for Ae. umbellulata accession TA1851.

Contig-level assembly benchmarking

We first compared contig-level assemblies generated by hifiasm¹⁶ and the La Jolla Assembler (LJA)¹⁷ using standard HiFi reads and corrected HiFi reads generated with DeepConsensus¹⁴. The raw subreads from five SMRT cells were processed using the ccs software (https://github.com/PacificBiosciences/ccs) or DeepConsensus (Table 1). The correction with DeepConsensus produced less HiFi data (~157 Gb and ~150 Gb for ccs and DeepConsensus, respectively), but increased the mean read QV (29.9 and 33.1 for ccs and DeepConsensus, respectively) (Table 1).

Table 1.

Comparison of read quality and yield per SMRT cell between ccs and DeepConsensus pipeline for the generation of HiFi reads.

SMRT cell	Pipeline	Mean read quality (QV)	Total bases (bp)
SMRT1	ccs	30.2	36,249,624,158
SMRT1	DeepConsensus	33.2	34,749,641,467
SMRT2	ccs	29	33,632,333,080
SMRT2	DeepConsensus	32.8	32,111,461,231
SMRT3	ccs	29.8	28,901,671,188
SMRT3	DeepConsensus	33.2	27,567,325,551
SMRT4	ccs	30.1	29,868,633,741
SMRT4	DeepConsensus	33.1	28,441,141,874
SMRT5	ccs	30.4	28,845,332,129
SMRT5	DeepConsensus	33.2	27,540,429,173
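
The aggregate figures quoted in the text (~157 Gb versus ~150 Gb, mean QV 29.9 versus 33.1) can be recomputed from Table 1. The following is a minimal sketch, assuming the reported mean QV is the unweighted mean over the five SMRT cells (the source does not state how it was averaged). The names `TABLE1`, `summarize` and `phred_to_error` are illustrative and not part of any published pipeline.

```python
# Per-SMRT-cell values copied from Table 1: (mean read QV, total bases).
TABLE1 = {
    "ccs": [
        (30.2, 36_249_624_158),
        (29.0, 33_632_333_080),
        (29.8, 28_901_671_188),
        (30.1, 29_868_633_741),
        (30.4, 28_845_332_129),
    ],
    "DeepConsensus": [
        (33.2, 34_749_641_467),
        (32.8, 32_111_461_231),
        (33.2, 27_567_325_551),
        (33.1, 28_441_141_874),
        (33.2, 27_540_429_173),
    ],
}


def summarize(cells):
    """Return (total yield in Gb, unweighted mean of the per-cell QV values)."""
    total_gb = sum(bases for _, bases in cells) / 1e9
    mean_qv = sum(qv for qv, _ in cells) / len(cells)
    return total_gb, mean_qv


def phred_to_error(qv):
    """Convert a Phred-scaled quality value into an error probability."""
    return 10 ** (-qv / 10)


for pipeline, cells in TABLE1.items():
    total_gb, mean_qv = summarize(cells)
    print(f"{pipeline}: {total_gb:.1f} Gb, mean QV {mean_qv:.1f}, "
          f"error probability ~{phred_to_error(mean_qv):.1e}")
```

On the Phred scale, a gain of about 3 QV units corresponds to roughly halving the per-base error probability, which puts the trade-off against the ~4.5% loss in yield into perspective.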


Contig-level assemblies generated with the different assemblers and data sets were assessed using basic summary statistics (Table 2). All four assemblies had similar total assembly sizes. For hifiasm, we observed marked increases of contig N50 (11.1 Mb to 14.0 Mb; +26%) and contig N90 (3.2 Mb to 3.8 Mb; +20%) when using corrected HiFi reads (Table 2). Overall, LJA outperformed hifiasm in terms of contiguity. In contrast to hifiasm, LJA showed no considerable increase of contig N50 with DeepConsensus-corrected reads (17.3 Mb to 17.7 Mb), while the contig N90 increased by 16% (4.5 Mb to 5.2 Mb). The highest contiguity was observed with LJA and DeepConsensus, showing a 59% and 63% increase in contig N50 and contig N90, respectively, compared to the hifiasm assembly with standard HiFi reads (Table 2). In terms of computational resources, all contig-level assemblies were performed on a single AMD node using 120 cores. Memory usage was higher with LJA than with hifiasm, by 61% and 20% for the standard and corrected HiFi reads, respectively. The computing time was also considerably higher with LJA (Table 2). Based on the overall performance, the LJA-DeepConsensus contig-level assembly was used to construct a chromosome-scale Ae. umbellulata assembly.
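
For readers unfamiliar with the contiguity metrics used here, the sketch below shows the usual definition of N50 and N90: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches 50% or 90% of the total assembly length. The function name `nx` and the toy contig lengths are hypothetical; assemblers and QC tools report these values directly.

```python
def nx(contig_lengths, fraction):
    """Return the Nx value: the length L such that contigs of length >= L
    together cover at least `fraction` of the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= target:
            return length
    # Only reachable through floating-point rounding at fraction == 1.
    return min(contig_lengths)


# Hypothetical toy assembly (lengths in arbitrary units), for illustration only.
toy_contigs = [50, 40, 30, 20, 10]
print("N50:", nx(toy_contigs, 0.5))
print("N90:", nx(toy_contigs, 0.9))
```

Because N90 depends on the shorter contigs as well, it can improve even when N50 barely changes, which is the pattern reported for LJA with corrected reads.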

Table 2.

Comparison of contig-level assembly metrics between hifiasm and LJA.

Metric	Standard HiFi reads + hifiasm	Corrected HiFi reads + hifiasm	Standard HiFi reads + LJA	Corrected HiFi reads + LJA
Memory used (GB of RAM)	161.21	149.42	259.57	178.94
Computing time	8 h 27 min	7 h 59 min	45 h 18 min	42 h 38 min
Contig number	1,379	1,521	1,625	1,306
Largest contig (bp)	57,092,498	49,335,673	64,890,551	63,887,064
Total assembly length (bp)	4,254,802,190	4,275,077,199	4,248,511,730	4,246,443,824
N50 (bp)	11,148,243	14,032,818	17,301,094	17,703,042
N90 (bp)	3,182,027	3,817,306	4,472,704	5,187,921
GC (%)	47.1	47.1	47.1	47.1
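
The percentage changes quoted in the text follow directly from Table 2. A short sketch that recomputes them is given below; the dictionary keys and the helper `pct_change` are illustrative names only.

```python
# Values copied from Table 2 (N50 and N90 in bp, peak memory as reported).
TABLE2 = {
    "hifiasm_standard": {"n50": 11_148_243, "n90": 3_182_027, "ram": 161.21},
    "hifiasm_corrected": {"n50": 14_032_818, "n90": 3_817_306, "ram": 149.42},
    "lja_standard": {"n50": 17_301_094, "n90": 4_472_704, "ram": 259.57},
    "lja_corrected": {"n50": 17_703_042, "n90": 5_187_921, "ram": 178.94},
}


def pct_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


def compare(baseline, other, metric):
    """Percent change of `metric` between two assemblies in TABLE2."""
    return pct_change(TABLE2[baseline][metric], TABLE2[other][metric])


# Effect of read correction within each assembler.
for assembler in ("hifiasm", "lja"):
    for metric in ("n50", "n90"):
        change = compare(f"{assembler}_standard", f"{assembler}_corrected", metric)
        print(f"{assembler} {metric}: {change:+.0f}%")

# Best assembly versus the hifiasm baseline with standard reads.
for metric in ("n50", "n90"):
    change = compare("hifiasm_standard", "lja_corrected", metric)
    print(f"LJA corrected vs hifiasm standard, {metric}: {change:+.0f}%")

# Memory overhead of LJA relative to hifiasm on the same input reads.
for reads in ("standard", "corrected"):
    change = compare(f"hifiasm_{reads}", f"lja_{reads}", "ram")
    print(f"LJA vs hifiasm memory ({reads} reads): {change:+.0f}%")
```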


Chromosome-scale assembly

Construction of the pseudomolecules was performed by integrating Omni-C read data using Juicer (v2; https://github.com/aidenlab/juicer)¹⁸ and the 3D-DNA pipeline (https://github.com/aidenlab/3d-dna)¹⁹. First, to generate the contact maps, Omni-C Illumina short reads were preprocessed with juicer.sh (parameters: -s none --assembly). The output file “merged_nodups.txt” and the primary assembly were then used to produce an assembly with 3D-DNA¹⁹ (using run-asm-pipeline.sh with the -r 0 parameter). We used Juicebox (v2.14.00)²⁰ to visualize the Hi-C contact matrix along the assembly and to manually curate the assembly. The orientation and the chromosome number of each pseudomolecule were determined based on an existing assembly of Ae. tauschii²¹, a close relative of Ae. umbellulata, using a dotplot comparison produced with chromeister (https://github.com/estebanpw/chromeister)²². There has been some inconsistency in naming the highly rearranged chromosomes 4U and 6U. We decided to follow the most common nomenclature, as used in the recent publication of Said et al.⁸. Contigs not anchored in the pseudomolecules were concatenated into an “unanchored chromosome”. The final Hi-C contact maps and assemblies were saved using run-asm-pipeline-post-review.sh from the 3D-DNA pipeline. The genome assembly resulted in seven pseudomolecules and one unanchored chromosome (Fig. 1b; Table 3).
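
The concatenation of unanchored contigs into a single “unanchored chromosome” can be illustrated with the sketch below. It is not the 3D-DNA implementation: the function name, the ordering of contigs by decreasing length and the gap size of 100 N characters are all assumptions made for illustration, since the source does not state them.

```python
def concatenate_unanchored(contigs, gap_size=100):
    """Join contigs into one sequence separated by runs of N.

    `contigs` maps contig name -> sequence. Returns the joined sequence and a
    list of (name, start, end) placements (0-based, end-exclusive) so that
    every contig can still be located in the concatenated record.
    Ordering by decreasing length and the gap size are assumptions.
    """
    gap = "N" * gap_size
    order = sorted(contigs, key=lambda name: len(contigs[name]), reverse=True)
    sequence = gap.join(contigs[name] for name in order)

    placements = []
    offset = 0
    for name in order:
        length = len(contigs[name])
        placements.append((name, offset, offset + length))
        offset += length + gap_size
    return sequence, placements


# Hypothetical toy contigs, for illustration only.
toy = {"ctg_a": "ACGT", "ctg_b": "GG", "ctg_c": "TTTTTT"}
chr_un, layout = concatenate_unanchored(toy, gap_size=3)
print(chr_un)
for name, start, end in layout:
    print(name, start, end)
```

Keeping the placement table alongside the sequence makes it possible to trace a gene model on the unanchored record back to its source contig.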

Table 3.

Statistics of the Aegilops umbellulata pseudomolecule assembly.

Chromosome	Length (bp)	Number of contigs	Number of gene models
chr1U_TA1851	494,422,770	44	3,506
chr2U_TA1851	646,201,372	66	5,363
chr3U_TA1851	587,623,253	77	4,444
chr4U_TA1851	663,525,381	83	4,794
chr5U_TA1851	626,841,358	52	5,522
chr6U_TA1851	543,353,244	42	5,075
chr7U_TA1851	664,393,216	66	5,590
chrUn_TA1851	20,213,230	878	1,974
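
As a consistency check, the per-chromosome values in Table 3 can be summed and compared with the totals reported elsewhere in the text (36,268 gene models; 4,246,443,824 bp of contig sequence for the corrected-reads LJA assembly in Table 2). The sketch below does this; variable names are illustrative. The interpretation of the length difference as gap sequence is an inference from the tables, not a statement from the source.

```python
# Rows copied from Table 3: (name, length in bp, contigs, gene models).
TABLE3 = [
    ("chr1U_TA1851", 494_422_770, 44, 3_506),
    ("chr2U_TA1851", 646_201_372, 66, 5_363),
    ("chr3U_TA1851", 587_623_253, 77, 4_444),
    ("chr4U_TA1851", 663_525_381, 83, 4_794),
    ("chr5U_TA1851", 626_841_358, 52, 5_522),
    ("chr6U_TA1851", 543_353_244, 42, 5_075),
    ("chr7U_TA1851", 664_393_216, 66, 5_590),
    ("chrUn_TA1851", 20_213_230, 878, 1_974),
]
CONTIG_TOTAL_BP = 4_246_443_824  # Table 2, corrected HiFi reads + LJA

total_length = sum(row[1] for row in TABLE3)
total_contigs = sum(row[2] for row in TABLE3)
total_genes = sum(row[3] for row in TABLE3)
anchored_length = sum(row[1] for row in TABLE3 if not row[0].startswith("chrUn"))

# Sequence present in the pseudomolecules but not in the contigs; this would
# be consistent with fixed-size gaps between joined contigs (an inference).
extra_bp = total_length - CONTIG_TOTAL_BP
joins = total_contigs - len(TABLE3)

print("total length (bp):", total_length)
print("gene models:", total_genes)
print(f"anchored fraction: {100 * anchored_length / total_length:.1f}%")
print("extra bp:", extra_bp, "over", joins, "contig joins")
```

The gene model counts sum exactly to the 36,268 reported in the text, and more than 99% of the sequence is placed on the seven pseudomolecules.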


Repeat annotation and gene model prediction

Transposable element annotation was performed using EDTA²³ (v2.0.0; parameters: --sensitive 1 --anno 1 --evaluate 1) using the current version of the TREP database (v19)²⁴ as a curated input library. Overall, 82.30% of the assembly was classified as repetitive sequences (Table 4).

Table 4.

Classification of repeat annotation in Aegilops umbellulata.

Type	Class	Count	% masked
LTR	Copia	395,484	17.25%
LTR	Gypsy	1,451,075	34.60%
LTR	unknown	867,939	17.40%
TIR	CACTA	183,488	2.45%
TIR	Mutator	171,834	1.95%
TIR	PIF_Harbinger	90,552	0.95%
TIR	Tc1_Mariner	420,310	3.14%
TIR	hAT	48,882	0.41%
nonTIR	helitron	391,265	4.16%
Total			82.30%


Gene model prediction was performed by combining a lifting approach using liftoff (v1.6.3)²⁵ and a genome-guided approach using transcriptomics data with HISAT2 (v2.2.1)²⁶, StringTie (v2.1.7)²⁷ and Transdecoder (v5.7.0)²⁸. Post-processing of gff3 files and filtering were performed using AGAT (https://github.com/NBISweden/AGAT)²⁹ and gffread (v0.11.7)³⁰. For the gene lifting, gene models of hexaploid wheat line Chinese Spring³¹, Ae. tauschii²¹, and Triticum monococcum accession TA299³² were independently transferred using liftoff (parameters: -a 0.9 -s 0.9 -copies -exclude_partial -polish). For the genome-guided approach, we used publicly available RNA-Seq data of 12 representative Ae. umbellulata accessions³³ and the RNA-Seq data of two bulks representing Ae. umbellulata leaf tissues³⁴. All the RNA-Seq data were mapped individually against the reference sequence using HISAT2 (parameters: --dta --very-sensitive), and the transcripts were assembled using StringTie (parameters: -m 200 -f 0.3) and merged into a single gtf file. The Transdecoder.LongOrfs script was used to identify open reading frames (ORFs) of at least 100 amino acids from the merged gtf file. The predicted protein sequences were compared to the UniProt (2021_03) and Pfam³⁵ databases using BLASTP³⁶ (parameters: -max_target_seqs 1 -outfmt 6 -evalue 1e-5) and hmmer3³⁷ (v3.3.2; parameters: hmmsearch -E 1e-10). The Transdecoder.Predict script was used with the BLASTP and hmmer results to select the best translation per transcript. Finally, the annotation gff3 file was computed using the perl script “cdna_alignment_orf_to_genome_orf.pl” provided in the Transdecoder package.
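
To make the 100-amino-acid ORF threshold concrete, the sketch below scans a transcript for complete open reading frames (ATG to stop codon) in the three forward frames and keeps those of at least `min_aa` codons. It is a deliberately simplified illustration and not the TransDecoder algorithm, which among other things also handles partial ORFs and the reverse strand. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_aa=100):
    """Return (start, end, frame) for complete forward-strand ORFs.

    `start` is the 0-based index of the A in ATG; `end` is exclusive and
    includes the stop codon. The length threshold counts amino acids, i.e.
    codons from ATG up to but excluding the stop codon.
    """
    seq = transcript.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                n_aa = (pos - start) // 3
                if n_aa >= min_aa:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical transcript: 2 nt leader, ATG, 120 alanine codons, stop, 1 nt.
toy_transcript = "CC" + "ATG" + "GCT" * 120 + "TAA" + "G"
print(find_orfs(toy_transcript))            # default threshold of 100 aa
print(find_orfs("ATG" + "GCT" * 10 + "TAA"))  # too short, nothing reported
```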

All the output gff files from the lifting and genome-guided approaches were merged into a single file using the perl script “agat_sp_merge_annotations.pl”. The merged file was then post-processed using gffread (parameters: --keep-genes -N -J) to retain transcripts with start and stop codons and to discard transcripts with (1) premature stop codons and/or (2) introns with non-canonical splice sites. In total, 36,268 gene models were predicted, for which putative functional annotations were assigned by protein comparison against the UniProt database (2021_03) using DIAMOND³⁸ (parameters: -f 6 -k 1 -e 1e-6). Pfam domain signatures and GO terms were assigned using InterProScan version 5.55–88.0³⁹.
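
The filtering criteria described above (complete start and stop codons, no premature stop codon, canonical splice sites only) can be expressed as a simple predicate. The sketch below is an illustration of the logic, not gffread's code. It assumes that GT-AG, GC-AG and AT-AC are the intron boundary pairs treated as canonical, that only ATG counts as a start codon, and that the coding sequence length is a multiple of three; all names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumed set of accepted (donor, acceptor) dinucleotide pairs.
CANONICAL_SPLICE_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def passes_filters(cds, introns):
    """Return True if a transcript would be retained.

    `cds` is the spliced coding sequence (5'->3', transcript strand) and
    `introns` is a list of intron sequences on the same strand.
    """
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Require a start codon and a terminal stop codon.
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Reject in-frame (premature) stop codons.
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        return False
    # Reject any intron whose boundary dinucleotides are not canonical.
    for intron in introns:
        intron = intron.upper()
        if (intron[:2], intron[-2:]) not in CANONICAL_SPLICE_PAIRS:
            return False
    return True


# Hypothetical toy transcripts, for illustration only.
print(passes_filters("ATGGCTTAA", ["GTAAGTTTAG"]))     # complete, GT-AG intron
print(passes_filters("ATGTAAGCTTAA", ["GTAAGTTTAG"]))  # premature stop codon
print(passes_filters("ATGGCTTAA", ["CTAAGTTTAG"]))     # non-canonical donor
```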

The synteny analysis against Ae. tauschii was computed using MCScanX⁴⁰ with default parameters, which allowed us to identify the main translocation events within the Ae. umbellulata genome (Fig. 1b).

PacBio DNA methylation profile

Methylation in the CpG context was inferred with ccsmeth (v0.3.2)⁴¹, a deep-learning method that detects DNA 5mCpGs using kinetics features from PacBio CCS reads. The methylation predictions for CCS reads were called using the model “model_ccsmeth_5mCpG_call_mods_attbigru2s_b21.v1.ckpt”. Then, the reads with the MM + ML tags were aligned to the pseudomolecules using BWA (v0.7.17)⁴², and the resulting BAM file was filtered for hard/soft clips and quality (MAPQ ≥ 60) using SAMtools (v1.8)⁴³. The methylation frequency was calculated at the genome level from the modbam files using the aggregate mode of ccsmeth with the model “model_ccsmeth_5mCpG_aggregate_attbigru_b11.v2.ckpt”.
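
The two post-alignment steps, read filtering and per-site aggregation, can be sketched as follows. The filter keeps alignments with MAPQ ≥ 60 and no soft (S) or hard (H) clipping in the CIGAR string. The aggregation shown is a plain count-based frequency with a hypothetical probability cutoff of 0.5; it is a swapped-in simplification and not the model-based aggregate mode of ccsmeth used in this study. All function names are illustrative.

```python
import re
from collections import defaultdict

# Matches any soft- or hard-clip operation in a SAM CIGAR string.
CLIP_OPS = re.compile(r"\d+[SH]")


def keep_alignment(mapq, cigar, min_mapq=60):
    """Keep alignments with sufficient MAPQ and no clipped bases."""
    return mapq >= min_mapq and CLIP_OPS.search(cigar) is None


def methylation_frequency(calls, threshold=0.5):
    """Count-based methylation frequency per CpG site.

    `calls` is an iterable of (chrom, pos, probability) tuples, one per read
    covering a site. A call counts as methylated when its probability is at
    least `threshold` (an assumed cutoff).
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated, total]
    for chrom, pos, prob in calls:
        site = counts[(chrom, pos)]
        site[1] += 1
        if prob >= threshold:
            site[0] += 1
    return {site: meth / total for site, (meth, total) in counts.items()}


# Hypothetical per-read calls, for illustration only.
toy_calls = [
    ("chr1U_TA1851", 100, 0.9),
    ("chr1U_TA1851", 100, 0.2),
    ("chr1U_TA1851", 100, 0.8),
    ("chr2U_TA1851", 5, 0.1),
]
print(keep_alignment(60, "15000M"), keep_alignment(60, "10S14990M"))
print(methylation_frequency(toy_calls))
```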

Genome visualization

The genome of Ae. umbellulata accession TA1851 was uploaded into the Persephone® multi-genome browser (https://web.persephonesoft.com/?data=genomes/TA1851). The available data tracks are the DNA sequence, the gene model prediction, and the CpG methylation. A BLAST³⁶ search and a synteny analysis with the hexaploid wheat line Chinese Spring (v2.1)⁴⁴ are also available (Fig. 2).

Fig. 2. Genome visualization with Persephone. (a) Persephone genome browser visualization. The upper panel represents the position along chromosome 3U. The middle panel shows an example of three gene models with their predicted isoforms. In the lower panel, the CpG methylation profile is represented in blue and red for the unmethylated and methylated bases, respectively. (b) Synteny matrix between the seven Ae. umbellulata chromosomes (x-axis) and the 21 chromosomes of the bread wheat line Chinese Spring v2.1 (y-axis). (c) Synteny comparison of the highly rearranged Ae. umbellulata chromosome 6U (in central position) with bread wheat chromosomes 1D, 2D, 4D, 6D and 7D. The links between chromosomes represent orthologous gene relationships.

Data Records

The corrected HiFi reads and the raw Omni-C reads were deposited in the Sequence Read Archive at NCBI under accession number ERP147844⁴⁵. The final chromosome assembly was deposited at NCBI under the accession number GCA_032464435.1⁴⁶.

The Ae. umbellulata assembly, gene model prediction, repeat annotations, methylation profile and Hi-C contact map are available on DRYAD Digital Repository⁴⁷ (10.5061/dryad.05qfttf82).

Technical Validation

Assessment of genome assembly and annotation

The Hi-C contact map was manually curated and assessed with Juicebox and showed a dense pattern along the diagonal, revealing no potential mis-assemblies (Fig. 3). The anti-diagonals are typical for Triticeae genomes and correspond to the Rabl configuration of Triticeae chromosomes^48,49. Chromosome 6U does not show the anti-diagonal, which is most likely due to the extreme acrocentric nature of this chromosome^50,51 (Fig. 3).

Fig. 3. Contact map after the integration of the Omni-C data and manual correction. Green and blue boxes represent contigs and pseudomolecules, respectively.

The BUSCO⁵² (v5.4.5 – poales_odb10) score of 98% (0.4% fragmented and 1.6% missing BUSCOs) at the genome level indicates a high completeness of the TA1851 assembly. The quality of the Ae. umbellulata assembly was assessed with Merqury⁵³ based on the PacBio HiFi reads using 19-mers. The QV (consensus quality value) and k-mer completeness scores were 59.3 and 98.1%, respectively. We further determined the LTR Assembly Index (LAI) and obtained a value of 16.42, which corresponds to a reference-quality genome⁵⁴. Telomeric repeats (TTTAGGG)n^55,56 were found at the extremities of all the pseudomolecules, except the short arms of chromosomes 1U and 5U, which correspond to the location of the rDNA loci in Ae. umbellulata⁵⁷.
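
Two of the checks above lend themselves to a short worked example. A Phred-scaled consensus QV converts to an error probability as 10^(-QV/10), so a QV of 59.3 corresponds to roughly 1.2 × 10^-6, or about one error per 850 kb. Telomeres can be screened for by counting copies of the plant telomeric unit TTTAGGG (or its reverse complement CCCTAAA) in a window at each end of a pseudomolecule. The sketch below illustrates both; the window size and the minimum unit count are hypothetical thresholds, and the source does not describe how the telomere search was actually performed.

```python
def qv_to_error_rate(qv):
    """Convert a Phred-scaled consensus quality value to an error probability."""
    return 10 ** (-qv / 10)


TELOMERE_UNIT = "TTTAGGG"  # plant-type telomeric repeat
REVCOMP_UNIT = "CCCTAAA"   # same repeat read from the opposite strand


def count_telomere_units(end_sequence):
    """Count non-overlapping telomeric units, taking the better strand."""
    seq = end_sequence.upper()
    return max(seq.count(TELOMERE_UNIT), seq.count(REVCOMP_UNIT))


def has_telomere(sequence, window=10_000, min_units=50):
    """Return (left_end, right_end) booleans for telomere-like repeats.

    `window` and `min_units` are assumed thresholds for illustration.
    """
    left = count_telomere_units(sequence[:window])
    right = count_telomere_units(sequence[-window:])
    return left >= min_units, right >= min_units


print(f"QV 59.3 -> error probability {qv_to_error_rate(59.3):.2e}")

# Hypothetical pseudomolecule with a telomere at the left end only.
toy_chromosome = "CCCTAAA" * 60 + "ACGT" * 1000 + "GATTACA" * 10
print(has_telomere(toy_chromosome, window=500, min_units=50))
```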

Completeness of the gene model prediction was evaluated using BUSCO and produced a score of 98.1% (0.3% fragmented and 1.6% missing BUSCOs). The number of predicted gene models (36,268) is in the range of a diploid Triticeae species (34,000–43,000 high-confidence gene models per haploid genome)⁵⁸.

Acknowledgements

We thank the KAUST Bioscience Core Laboratory for sequencing support, Lingli Zou (KAUST) for greenhouse support, and the KAUST supercomputing facilities (https://www.hpc.kaust.edu.sa) for providing computing resources. This publication is based upon work supported by the King Abdullah University of Science and Technology.

Author contributions

M.A. and S.G.K. designed the study. Y.W. performed the DNA extraction. M.A. and E.C-G. analyzed the data. M.T. and M.K. managed the visualization platform. M.A. and S.G.K. wrote the initial manuscript. All authors have read and approved the final manuscript.

Code availability

All software and pipelines were executed according to the manual and protocol of published tools. No custom code was generated for these analyses.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Michael Abrouk, Email: michael.abrouk@kaust.edu.sa.

Simon G. Krattinger, Email: simon.krattinger@kaust.edu.sa

References

1. Molnar-Lang, M., Ceoloni, C. & Dolezel, J. Alien introgression in wheat. (Springer, 2015).
2. Van Slageren, M. Wild wheats: a monograph of Aegilops L. and Amblyopyrum (Jaub. & Spach) Eig (Poaceae). (Agricultural University Wageningen, 1994).
3. Kimber, G. Genome symbols and plasma types in the wheat group. in Proc. 7th Intl. Wheat Genet. Symp. 1209–1211 (1988).
4. Kishii M. An Update of Recent Use of Aegilops Species in Wheat Breeding. Front Plant Sci. 2019;10:585. doi: 10.3389/fpls.2019.00585.
5. Kilian, B. et al. Aegilops, wild crop relatives, genomic and breeding resources. Cereal (Ed Kole, C), 1–76 (2011).
6. Schneider A, Molnar I, Molnar-Lang M. Utilisation of Aegilops (goatgrass) species to widen the genetic diversity of cultivated wheat. Euphytica. 2008;163:1–19.
7. Molnár I, et al. Dissecting the U, M, S and C genomes of wild relatives of bread wheat (Aegilops spp.) into chromosomes and exploring their synteny with wheat. The Plant Journal. 2016;88:452–467. doi: 10.1111/tpj.13266.
8. Said M, et al. Development of DNA Markers From Physically Mapped Loci in Aegilops comosa and Aegilops umbellulata Using Single-Gene FISH and Chromosome Sequences. Front Plant Sci. 2021;12:689031. doi: 10.3389/fpls.2021.689031.
9. Sears ER. Brookhaven Symposia in Biology. 1956;9:1–21.
10. Bansal M, et al. Aegilops umbellulata introgression carrying leaf rust and stripe rust resistance genes Lr76 and Yr70 located to 9.47-Mb region on 5DS telomeric end through a combination of chromosome sorting and sequencing. Theor Appl Genet. 2020;133:903–915. doi: 10.1007/s00122-019-03514-x.
11. Zhu ZD, et al. Microsatellite marker identification of a Triticum aestivum - Aegilops umbellulata substitution line with powdery mildew resistance. Euphytica. 2006;150:149–153.
12. Wang, Y. et al. An unusual tandem kinase fusion protein confers leaf rust resistance in wheat. Nature Genetics (2023).
13. Wenger AM, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. 2019;37:1155–1162. doi: 10.1038/s41587-019-0217-9.
14. Baid G, et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nature Biotechnology. 2023;41:232–238. doi: 10.1038/s41587-022-01435-7.
15. Driguez P, et al. LeafGo: Leaf to Genome, a quick workflow to produce high-quality de novo plant genomes using long-read sequencing technology. Genome Biol. 2021;22:256. doi: 10.1186/s13059-021-02475-z.
16. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.
17. Bankevich A, Bzikadze AV, Kolmogorov M, Antipov D, Pevzner PA. Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads. Nature Biotechnology. 2022;40:1075–1081. doi: 10.1038/s41587-022-01220-6.
18. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.
19. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.
20. Durand NC, et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012.
21. Luo MC, et al. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii. Nature. 2017;551:498–502. doi: 10.1038/nature24486.
22. Perez-Wohlfeil E, Diaz-Del-Pino S, Trelles O. Ultra-fast genome comparison for large-scale genomic experiments. Sci Rep. 2019;9:10274. doi: 10.1038/s41598-019-46773-w.
23. Ou S, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. doi: 10.1186/s13059-019-1905-y.
24. Wicker T, Matthews DE, Keller B. TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 2002;7:561–562.
25. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021;37:1639–1643. doi: 10.1093/bioinformatics/btaa1016.
26. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.
27. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology. 2015;33:290–295. doi: 10.1038/nbt.3122.
28. Haas, B. & Papanicolaou, A. TransDecoder (find coding regions within transcripts). http://transdecoder.github.io.
29. Dainat J. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. Version v0. 2020;4:10.5281.
30. Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020;9:304. doi: 10.12688/f1000research.23297.1.
31. International Wheat Genome Sequencing, C. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361 (2018).
32. Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature (2023).
33. Okada M, et al. RNA-seq analysis reveals considerable genetic diversity and provides genetic markers saturating all chromosomes in the diploid wild wheat relative Aegilops umbellulata. BMC Plant Biology. 2018;18:1–13. doi: 10.1186/s12870-018-1498-8.
34. Edae EA, Rouse MN. Bulked segregant analysis RNA-seq (BSR-Seq) validated a stem resistance locus in Aegilops umbellulata, a wild relative of wheat. PLoS One. 2019;14:e0215492. doi: 10.1371/journal.pone.0215492.
35. Mistry J, et al. Pfam: The protein families database in 2021. Nucleic Acids Research. 2021;49:D412–D419. doi: 10.1093/nar/gkaa913.
36. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:1–9. doi: 10.1186/1471-2105-10-421.
37. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Research. 2011;39:W29–W37. doi: 10.1093/nar/gkr367.
38. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nature Methods. 2015;12:59–60. doi: 10.1038/nmeth.3176.
39. Quevillon E, et al. InterProScan: protein domains identifier. Nucleic Acids Res. 2005;33:W116–120. doi: 10.1093/nar/gki442.
40. Wang Y, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49. doi: 10.1093/nar/gkr1293.
41. Ni P, et al. DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing. Nature Communications. 2023;14:4054. doi: 10.1038/s41467-023-39784-9.
42. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
43. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
44. Zhu T, et al. Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly. Plant J. 2021;107:303–314. doi: 10.1111/tpj.15289.
45. 2023. NCBI Sequence Read Archive. ERP147844
46. 2023. NCBI Assembly. GCA_032464435.1
47. Abrouk M. 2023. Data from: Chromosome-scale assembly of the wild wheat relative Aegilops umbellulata. Dryad Digital Repository. doi: 10.5061/dryad.05qfttf82.
48. Tiang CL, He Y, Pawlowski WP. Chromosome organization and dynamics during interphase, mitosis, and meiosis in plants. Plant Physiol. 2012;158:26–34. doi: 10.1104/pp.111.187161.
49. Mascher M, et al. A chromosome conformation capture ordered sequence of the barley genome. Nature. 2017;544:427–433. doi: 10.1038/nature22043.
50. Friebe B, Jiang J, Tuleen N, Gill B. Standard karyotype of Triticum umbellulatum and the characterization of derived chromosome addition and translocation lines in common wheat. Theoretical and Applied Genetics. 1995;90:150–156. doi: 10.1007/BF00221010.
51. Zhang H, Jia J, Gale M, Devos K. Relationships between the chromosomes of Aegilops umbellulata and wheat. Theoretical and Applied Genetics. 1998;96:69–75.
52. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
53. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology. 2020;21:1–27. doi: 10.1186/s13059-020-02134-9.
54. Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730.
55. Richards EJ, Ausubel FM. Isolation of a higher eukaryotic telomere from Arabidopsis thaliana. Cell. 1988;53:127–136. doi: 10.1016/0092-8674(88)90494-1.
56. Peska V, Garcia S. Origin, Diversity, and Evolution of Telomere Sequences in Plants. Front Plant Sci. 2020;11:117. doi: 10.3389/fpls.2020.00117.
57. Castilho A, Heslop-Harrison JS. Physical mapping of 5S and 18S–25S rDNA and repetitive DNA sequences in Aegilops umbellulata. Genome. 1995;38:91–96. doi: 10.1139/g95-011.
58. Poretti M, Praz CR, Sotiropoulos AG, Wicker T. A survey of lineage‐specific genes in Triticeae reveals de novo gene evolution from genomic raw material. Plant Direct. 2023;7:e484. doi: 10.1002/pld3.484.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

2023. NCBI Sequence Read Archive. ERP147844
2023. NCBI Assembly. GCA_032464435.1
Abrouk M, 2023. Data from:Chromosome-scale assembly of the wild wheat relative Aegilops umbellulata. Dryad Digital Repository. [DOI] [PMC free article] [PubMed]

Data Availability Statement

All software and pipelines were executed according to the manual and protocol of published tools. No custom code was generated for these analyses.

1. Molnar-Lang, M., Ceoloni, C. & Dolezel, J. Alien introgression in wheat. (Springer, 2015).

2. Van Slageren, M. Wild wheats: a monograph of Aegilops L. and Amblyopyrum (Jaub. & Spach) Eig (Poaceae). (Agricultural University Wageningen, 1994).

3. Kimber, G. Genome symbols and plasma types in the wheat group. in Proc. 7th Intl. Wheat Genet. Symp. 1209–1211 (1988).

4. Kishii M. An Update of Recent Use of Aegilops Species in Wheat Breeding. Front Plant Sci. 2019;10:585. doi: 10.3389/fpls.2019.00585.

5. Kilian, B. et al. Aegilops, wild crop relatives, genomic and breeding resources. Cereal (Ed Kole, C), 1–76 (2011).

6. Schneider A, Molnar I, Molnar-Lang M. Utilisation of Aegilops (goatgrass) species to widen the genetic diversity of cultivated wheat. Euphytica. 2008;163:1–19.

7. Molnár I, et al. Dissecting the U, M, S and C genomes of wild relatives of bread wheat (Aegilops spp.) into chromosomes and exploring their synteny with wheat. The Plant Journal. 2016;88:452–467. doi: 10.1111/tpj.13266.

8. Said M, et al. Development of DNA Markers From Physically Mapped Loci in Aegilops comosa and Aegilops umbellulata Using Single-Gene FISH and Chromosome Sequences. Front Plant Sci. 2021;12:689031. doi: 10.3389/fpls.2021.689031.

9. Sears ER. Brookhaven Symposia in Biology. 1956;9:1–21.

10. Bansal M, et al. Aegilops umbellulata introgression carrying leaf rust and stripe rust resistance genes Lr76 and Yr70 located to 9.47-Mb region on 5DS telomeric end through a combination of chromosome sorting and sequencing. Theor Appl Genet. 2020;133:903–915. doi: 10.1007/s00122-019-03514-x.

11. Zhu ZD, et al. Microsatellite marker identification of a Triticum aestivum - Aegilops umbellulata substitution line with powdery mildew resistance. Euphytica. 2006;150:149–153.

12. Wang, Y. et al. An unusual tandem kinase fusion protein confers leaf rust resistance in wheat. Nature Genetics (2023).

13. Wenger AM, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. 2019;37:1155–1162. doi: 10.1038/s41587-019-0217-9.

14. Baid G, et al. DeepConsensus improves the accuracy of sequences with a gap-aware sequence transformer. Nature Biotechnology. 2023;41:232–238. doi: 10.1038/s41587-022-01435-7.

15. Driguez P, et al. LeafGo: Leaf to Genome, a quick workflow to produce high-quality de novo plant genomes using long-read sequencing technology. Genome Biol. 2021;22:256. doi: 10.1186/s13059-021-02475-z.

16. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.

17. Bankevich A, Bzikadze AV, Kolmogorov M, Antipov D, Pevzner PA. Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads. Nature Biotechnology. 2022;40:1075–1081. doi: 10.1038/s41587-022-01220-6.

18. Durand NC, et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 2016;3:95–98. doi: 10.1016/j.cels.2016.07.002.

19. Dudchenko O, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–95. doi: 10.1126/science.aal3327.

20. Durand NC, et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 2016;3:99–101. doi: 10.1016/j.cels.2015.07.012.

21. Luo MC, et al. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii. Nature. 2017;551:498–502. doi: 10.1038/nature24486.

22. Perez-Wohlfeil E, Diaz-Del-Pino S, Trelles O. Ultra-fast genome comparison for large-scale genomic experiments. Sci Rep. 2019;9:10274. doi: 10.1038/s41598-019-46773-w.

23. Ou S, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275. doi: 10.1186/s13059-019-1905-y.

24. Wicker T, Matthews DE, Keller B. TREP: a database for Triticeae repetitive elements. Trends Plant Sci. 2002;7:561–562.

25. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021;37:1639–1643. doi: 10.1093/bioinformatics/btaa1016.

26. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4.

27. Pertea M, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology. 2015;33:290–295. doi: 10.1038/nbt.3122.

28. Haas, B. & Papanicolaou, A. TransDecoder (find coding regions within transcripts). http://transdecoder.github.io.

29. Dainat J. AGAT: Another Gff Analysis Toolkit to handle annotations in any GTF/GFF format. Version v0. 2020;4:10.5281.

30. Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare. F1000Res. 2020;9:304. doi: 10.12688/f1000research.23297.1.

31. International Wheat Genome Sequencing Consortium. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361 (2018).

32. Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature (2023).

33. Okada M, et al. RNA-seq analysis reveals considerable genetic diversity and provides genetic markers saturating all chromosomes in the diploid wild wheat relative Aegilops umbellulata. BMC Plant Biology. 2018;18:1–13. doi: 10.1186/s12870-018-1498-8.

34. Edae EA, Rouse MN. Bulked segregant analysis RNA-seq (BSR-Seq) validated a stem resistance locus in Aegilops umbellulata, a wild relative of wheat. PLoS One. 2019;14:e0215492. doi: 10.1371/journal.pone.0215492.

35. Mistry J, et al. Pfam: The protein families database in 2021. Nucleic Acids Research. 2021;49:D412–D419. doi: 10.1093/nar/gkaa913.

36. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:1–9. doi: 10.1186/1471-2105-10-421.

37. Finn RD, Clements J, Eddy SR. HMMER web server: interactive sequence similarity searching. Nucleic Acids Research. 2011;39:W29–W37. doi: 10.1093/nar/gkr367.

38. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nature Methods. 2015;12:59–60. doi: 10.1038/nmeth.3176.

39. Quevillon E, et al. InterProScan: protein domains identifier. Nucleic Acids Res. 2005;33:W116–120. doi: 10.1093/nar/gki442.

40. Wang Y, et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 2012;40:e49. doi: 10.1093/nar/gkr1293.

41. Ni P, et al. DNA 5-methylcytosine detection and methylation phasing using PacBio circular consensus sequencing. Nature Communications. 2023;14:4054. doi: 10.1038/s41467-023-39784-9.

42. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.

43. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.

44. Zhu T, et al. Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly. Plant J. 2021;107:303–314. doi: 10.1111/tpj.15289.

45. 2023. NCBI Sequence Read Archive. ERP147844

46. 2023. NCBI Assembly. GCA_032464435.1

47. Abrouk M, 2023. Data from: Chromosome-scale assembly of the wild wheat relative Aegilops umbellulata. Dryad Digital Repository.

48. Tiang CL, He Y, Pawlowski WP. Chromosome organization and dynamics during interphase, mitosis, and meiosis in plants. Plant Physiol. 2012;158:26–34. doi: 10.1104/pp.111.187161.

49. Mascher M, et al. A chromosome conformation capture ordered sequence of the barley genome. Nature. 2017;544:427–433. doi: 10.1038/nature22043.

50. Friebe B, Jiang J, Tuleen N, Gill B. Standard karyotype of Triticum umbellulatum and the characterization of derived chromosome addition and translocation lines in common wheat. Theoretical and Applied Genetics. 1995;90:150–156. doi: 10.1007/BF00221010.

51. Zhang H, Jia J, Gale M, Devos K. Relationships between the chromosomes of Aegilops umbellulata and wheat. Theoretical and Applied Genetics. 1998;96:69–75.

52. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.

53. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology. 2020;21:1–27. doi: 10.1186/s13059-020-02134-9.

54. Ou S, Chen J, Jiang N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 2018;46:e126. doi: 10.1093/nar/gky730.

55. Richards EJ, Ausubel FM. Isolation of a higher eukaryotic telomere from Arabidopsis thaliana. Cell. 1988;53:127–136. doi: 10.1016/0092-8674(88)90494-1.

56. Peska V, Garcia S. Origin, Diversity, and Evolution of Telomere Sequences in Plants. Front Plant Sci. 2020;11:117. doi: 10.3389/fpls.2020.00117.

57. Castilho A, Heslop-Harrison JS. Physical mapping of 5S and 18S–25S rDNA and repetitive DNA sequences in Aegilops umbellulata. Genome. 1995;38:91–96. doi: 10.1139/g95-011.

58. Poretti M, Praz CR, Sotiropoulos AG, Wicker T. A survey of lineage‐specific genes in Triticeae reveals de novo gene evolution from genomic raw material. Plant Direct. 2023;7:e484. doi: 10.1002/pld3.484.
