Pedigree genome data of an early-matured Geng/japonica glutinous rice mega variety Longgeng 57

Yuanbao Lei; Yunjiang Zhang; Linyun Xu; Wendong Ma; Ziqi Zhou; Jie Li; Pengyu Quan; Muhiuddin Faruquee; Dechen Yang; Fan Zhang; Yongli Zhou; Guangjun Quan; Xiuqin Zhao; Wensheng Wang; Bailong Liu; Zhikang Li; Jianlong Xu; Tianqing Zheng

doi:10.1038/s41597-024-03057-x

Sci Data. 2024 Feb 22;11:230. doi: 10.1038/s41597-024-03057-x

PMCID: PMC10884013 PMID: 38388638

Abstract

Using PacBio HiFi technology, we produced over 700 Gb of long-read sequencing (LRS) raw data, and using Illumina paired-end whole-genome shotgun (WGS) sequencing, we generated more than 70 Gb of short-read sequencing (SRS) data. With the LRS data, we assembled the genome of Longgeng 57 (LG57), an early-matured Geng/japonica glutinous rice mega variety that carries multiple elite traits including good grain quality and wide adaptability, and generated a set of annotation data for it. Together with the SRS data from the three parents of LG57, pedigree genome variations were called for three representative types of genes. These data sets can be used for deep variation mining, aid in the discovery of new insights into genome structure, function, and evolution, and help to provide essential support to biological research in general.

Subject terms: Plant breeding, Agricultural genetics

Background & Summary

In recent years, the planting area for rice (Oryza sativa L.) in Heilongjiang (HLJ) province of China has increased to around 4 million ha¹. For this region, the world's largest planting area for early Geng/japonica rice and about 2.6 times larger than the rice planting area of Japan², determining how to transfer its advantages in agriculture to other branches of the economy remains a significant challenge for agricultural researchers.

Early-matured Geng/japonica varieties provide the base for food security³ and supply critical agro-industrial materials, especially the glutinous varieties. Glutinous rice, also called sticky rice, is becoming increasingly popular because of growing public awareness of health issues⁴. Glutinous rice has health benefits in managing diabetes, inhibiting chronic diseases, enhancing digestion, and reducing inflammation⁵. In addition to being an elite cooking material for a low gluten diet and ‘good food’⁶, glutinous rice also provides raw materials for environment-friendly industry⁷⁻⁹. Longgeng 57 (LG57), a glutinous early variety, has favorable quality and stable yield in the early Geng/japonica planting region; it is now planted on more than 120,000 ha per year on average.

Grain quality traits of rice are largely controlled by major genes, such as Waxy for amylose content and OsNramp5 for mineral nutritional quality¹⁰⁻¹². Thus, further improving the grain quality of glutinous rice such as LG57 requires more genome information.

Currently, joint analysis has become a trend in biotechnology-based rice breeding in HLJ. For example, the Rice Molecular Breeding (RMB) laboratory of the Institute of Crop Science (ICS), Chinese Academy of Agricultural Sciences (CAAS), has set up a genome-based breeding scheme with the aid of both the core germplasm of 3K-RG¹³ and the Rice Functional Genomics Breeding (RFGB) information platform¹⁴. It also cooperates widely with local research institutes in HLJ, including Jiamusi Rice Research Institute (JMS-RRI) and Suihua RRI (SH-RRI)³. Herein, we present a dataset from a collaboration between the RMB laboratory and JMS-RRI for early-matured Geng/japonica rice including LG57. Information from this dataset on certain target genes, such as Waxy and OsNramp5, is also included as examples for data validation. This dataset comprises more than 770 Gb of pedigree genome data that will be useful for research in general.

Methods

Plant material and library construction

The early-matured Geng/japonica variety Longgeng 57 (LG57) was developed by our group and approved for release in 2017. It is now a mega variety with multiple elite traits and is widely planted (more than 120,000 hectares per year) in Heilongjiang province in Northeast China. High-molecular-weight genomic DNA was extracted from 10-day-old leaves of LG57 pedigree members (multiple seeds) using a modified CTAB method followed by two rounds of 0.5× bead purification. Each DNA sample was checked by 0.75% agarose gel assay and Nanodrop and then quantified with Qubit. The LG57 sample that met the standard was used to construct a PacBio HiFi library for long-read sequencing (LRS). Samples of the three parents (Longnuo 2 (LN2), Punian 8 (PN8), and Longgeng 29 (LG29)) were used to construct Illumina libraries for short-read sequencing (SRS) (Fig. 1).

Fig. 1 — Outlines of the workflow used to generate and analyze the pedigree genome data for Longgeng 57 (LG57).

Genomic data were generated for all pedigree members, as listed in Table 1. Among them, PacBio (Menlo Park, CA, USA) protocols were adopted for long-read sequencing of LG57 and Illumina (San Diego, CA, USA) protocols were used for short-read sequencing. The details are as follows.

Table 1.

Genomic data generated for pedigree of Longgeng 57.

Name	code	Tiller number	Panicle size	Genotyping method	Format	Size(Gb)
Longgeng 57	LG57	More	Smaller	PacBio	bam + fa	700 + 0.4
Longnuo 2	LN2	More	Smaller	Illumina sequencing	fq.gz	23.1
Punian 8	PN8	Fewer	Larger	Illumina sequencing	fq.gz	27.0
Longgeng 29	LG29	Fewer	Larger	Illumina sequencing	fq.gz	23.3

DNA sample testing

DNA was extracted from the samples using a routine method that meets the quality standard required for sequencing, as described in a previous study³. Sample purity and quantity were measured using a NanoPhotometer® (IMPLEN, Westlake Village, CA, USA) and a Qubit® 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA), respectively, in combination with agarose gel electrophoresis (concentration 1%, voltage 120 V for 45 min).

Library construction and quality inspection

Covaris® g-TUBE¹⁵ was used to shear the genomic DNA into suitably large fragments. Magnetic beads were then used for enrichment and purification. SageELF (Sage, Newcastle upon Tyne, UK) was used to select and purify the DNA fragments. An Annoroad® Universal DNA Library Prep Kit V2.0 (Annoroad Gene Technology, Beijing, China) was used for sample preparation, including end repair and ligation.

To ensure the quality of the library, a three-step quality check procedure was adopted as follows. After the library was constructed, the Qubit 3.0 was used for preliminary quantification. Then, the library was diluted to 1 ng/μL and the insert size was checked using an Agilent 2100 instrument (Agilent, Santa Clara, CA, USA). The effective concentration of the library was accurately quantified using quantitative real-time reverse transcription PCR (qRT-PCR) in a Bio-Rad CFX96 PCR instrument with a Bio-Rad IQ SYBR GRN Kit (both Bio-Rad, Hercules, CA, USA).

Sequencing

The single-molecule real-time (SMRT) method was adopted for long-read sequencing (LRS) according to the standard PacBio protocol. Short-read sequencing (SRS) was carried out on the NovaSeq 6000 S4 platform (Illumina) to obtain 250 bp paired-end reads.

Genome assembly, validation and annotation

For the LRS data obtained by HiFi library sequencing, the raw data (subreads) from PacBio sequencing were processed using SMRT Link v9.0.0.92188 (https://www.pacb.com/support/software-downloads/) with default parameters to obtain high-quality circular consensus sequencing (CCS) reads. The assembly was built from the CCS reads using hifiasm¹⁶ with default parameters. Merqury¹⁷ was used for the quality check of the LG57 assembly, and BUSCO (Benchmarking Universal Single-Copy Orthologs)¹⁸ was used to assess assembly completeness. BUSCO analysis was carried out with default parameters, using single-copy orthologous gene sets for several major evolutionary lineages from OrthoDB (http://cegg.unige.ch/orthodb). The assembled genome was compared with the embryophyta_odb10 gene set, and accuracy and completeness were assessed from the proportion and completeness of the aligned genes.
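
hifiasm writes its contigs in GFA format, whereas the assembly released with this dataset is in FASTA (Table 1). The conversion between the two formats can be sketched in Python as below. This is a minimal illustration rather than the authors' workflow; the function name, the line width, and the example contig name are assumptions.

```python
def gfa_to_fasta(gfa_text: str, width: int = 60) -> str:
    """Convert the segment ("S") lines of a GFA file into FASTA records."""
    out = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # A GFA segment line has the form: S <name> <sequence> [optional tags]
        if len(fields) >= 3 and fields[0] == "S":
            name, seq = fields[1], fields[2]
            out.append(f">{name}")
            # Wrap the sequence at a fixed line width
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


# Toy input: one header line, one contig, one link line (hypothetical names)
example_gfa = (
    "H\tVN:Z:1.0\n"
    "S\tptg000001l\tACGTACGT\tLN:i:8\n"
    "L\tptg000001l\t+\tptg000002l\t-\t0M\n"
)
print(gfa_to_fasta(example_gfa, width=4), end="")
```

Only segment lines carry sequence, so header and link lines are skipped.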

Based on the LG57 assembly, two strategies were adopted for genome annotation. The first was a homology-based strategy. RepeatMasker¹⁹ with default parameters, based on RepBase²⁰, was used to annotate repeats. For gene structures, BLAST²¹ with evalue = 1e-5 and GeMoMa²² with default parameters were used. Prediction of rRNAs, snRNAs, and miRNAs was carried out by aligning the assembly with known non-coding RNA libraries, e.g., Rfam²³. The second was a de novo strategy. For repeat analysis, RepeatModeler (https://www.repeatmasker.org/RepeatModeler/) with -engine ncbi was adopted. For protein-coding gene prediction, Augustus²⁴ with --genemodel=partial, SNAP (https://github.com/KorfLab/SNAP), and GeneMark²⁵ with default parameters were adopted. Based on the above predictions, EVidenceModeler (EVM)²⁶ with default parameters was used to integrate the gene sets predicted by the various strategies into a non-redundant gene set. The resulting gene set was annotated against functional databases including UniProt²⁷, NCBI (https://www.ncbi.nlm.nih.gov/nucleotide/), PFAM²⁸, eggNOG²⁹, GO (gene ontology)³⁰, and KEGG (Kyoto Encyclopedia of Genes and Genomes)³¹. For tRNA sequence prediction, we used tRNAscan-SE³² with the parameters -X 20 and -z 8.

The SRS data were aligned to the reference genome and variations were called using a pipeline comprising BWA³³, SAMtools³⁴, and GATK³⁵ with default parameters, with Nipponbare IRGSP 1.0³⁶ as the reference genome.
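
The exact command lines are not given in the text, so the following Python sketch only illustrates what a BWA, SAMtools, and GATK pipeline of this kind typically looks like for one parent. The file names, the thread count, the read-group fields, and the choice of HaplotypeCaller as the GATK caller are all assumptions made for illustration.

```python
import shlex


def build_commands(sample, ref, fq1, fq2, threads=4):
    """Return shell commands for aligning one sample and calling variants."""
    q = shlex.quote
    bam = f"{sample}.sorted.bam"   # hypothetical output name
    vcf = f"{sample}.vcf.gz"       # hypothetical output name
    # Read group with literal "\t" separators, as bwa mem expects
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # One-time preparation of the reference
        f"bwa index {q(ref)}",
        f"samtools faidx {q(ref)}",
        f"gatk CreateSequenceDictionary -R {q(ref)}",
        # Align paired-end reads and sort the alignments by coordinate
        f"bwa mem -t {threads} -R {q(read_group)} {q(ref)} {q(fq1)} {q(fq2)}"
        f" | samtools sort -@ {threads} -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # Call variants against the reference
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(bam)} -O {q(vcf)}",
    ]


# Hypothetical file names for the parent LN2
cmds = build_commands("LN2", "IRGSP-1.0.fa", "LN2_1.fq.gz", "LN2_2.fq.gz")
for cmd in cmds:
    print(cmd)
```

The sketch only prints the commands; running them requires the three tools to be installed and the input files to exist.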

Data Records

The assembly of LG57 is accessible at NCBI GenBank under accession JAXQPT000000000³⁷. The raw read data for LG57 in bam format are available under accession number SRR25376496³⁸. Sequencing data for the parents of LG57 are available under accessions SRR24688636 (PN8)³⁹, SRR24688637 (LN2)⁴⁰, and SRR24688635 (LG29)⁴¹. Annotation data for LG57 are accessible through figshare⁴². All of the above data except for the bam files are also accessible on the RFGB website (https://rfgb.rmbreeding.cn/download/publicDataDownload/download?dataset=3).

Technical Validation

A total of 1,671,418 reads were obtained. The average read length is 16,831.42 bp and the N50 value is more than 17 kb. The distribution of read lengths is shown in Fig. 2. A draft assembly of LG57 was generated and checked for quality using Merqury and BUSCO. Based on the Merqury output, the completeness of the assembly was 99.5% and the QV was 62.0 (Table 2). As shown in Table 3, the contig N50 exceeds 27 Mb, more than 10 times that of our previous work with SJ18³. As shown in Table 4, a total of 1,614 BUSCO groups were searched, of which about 98.8% were complete. The functional annotation of the genes predicted in LG57 against public databases is shown in Table 5. The repeat sequences identified by RepeatMasker total approximately 170 Mb, accounting for 43.13% of the whole LG57 genome (Table 6). Prediction results for different types of non-coding RNA, including miRNA, tRNA, rRNA, and snRNA, are listed in Table 7. Together, these RNAs span about 3.2 Mb, or roughly 0.8% of the LG57 genome. We also compared the annotation parameters of LG57 with those of other assemblies. The average gene length of LG57 is longer than those of the others (Table 8).
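
The read statistics quoted above can be related to sequencing depth with a short calculation. The Python sketch below defines N50 on a toy list of lengths (the real read lengths are not reproduced here) and multiplies the reported read count by the reported mean length. Using the 392.3 Mb total from Table 3 as the genome size is an assumption made for this estimate.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one length")


# Toy example: 20 bases in total, so the half-way point is reached at length 5
toy_n50 = n50([2, 3, 4, 5, 6])

# Values reported in the text
read_count = 1_671_418
mean_read_length = 16_831.42             # bp
assembly_size = 392.3e6                  # bp, total nucleotides in Table 3 (assumed genome size)

ccs_bases = read_count * mean_read_length   # about 28 Gb of CCS reads
depth = ccs_bases / assembly_size           # about 72-fold

print(toy_n50, round(ccs_bases / 1e9, 1), round(depth, 1))
```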

Fig. 2 — Distribution of lengths of circular consensus sequences (CCS) reads for Longgeng 57 (LG57).

Table 2.

Assembly quality assessment by Merqury for Longgeng 57.

Parameters	LG57
Completeness (%)	99.542
QV	62.0241
Error rate	6.27E-07
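
The QV and error rate in Table 2 are two views of the same quantity, because Merqury reports QV on the Phred scale, QV = -10 log10(error rate). A minimal Python check of that relationship, with function names chosen here for illustration:

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Phred-scaled consensus quality to per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Per-base error probability to Phred-scaled consensus quality."""
    return -10 * math.log10(error_rate)


error = qv_to_error_rate(62.0241)   # QV from Table 2
print(f"{error:.2E}")
```

A QV of about 62 therefore corresponds to fewer than one expected base error per million bases.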

Table 3.

Comparison of Longgeng 57 dataset with representative assemblies including mega varieties (MV) or standard references (SR).

	LG57	SJ18	MH63	Nipponbare	R498	9311	IR64
Variety types	MV (Early-matured Geng/japonica, glutinous)	MV (Early-matured Geng/japonica, aroma)	MV & SR (Three-line Xian/indica hybrid restorer)	SR (Medium-matured Geng/japonica, aroma)	SR (Three-line Xian/indica hybrid restorer)	MV & SR (Two-line Xian/indica hybrid restorer)	Mega (Xian/indica)
Total nucleotides (Mb)	392.3	418.9	359.9–395.8	373.2	390.3–423.2	426.3	367.1
N50 contig length (bp)	27,391,608	2,467,626	3,097,358 – gap free	7,711,345 – gap free	1,185,206	23,200	27,827,038
Total gene numbers	39,920	38,456	39,406	39,045	38,714	39,285	41,458

Table 4.

Assembly quality for Longgeng 57 presented by BUSCO.

BUSCO groups searched	Number	Percentage (%)
Complete (C)	1594	98.8
Complete and single-copy (S)	1560	96.7
Complete and duplicated (D)	34	2.1
Fragmented (F)	13	0.8
Missing (M)	7	0.4
Total	1614	100
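
The percentages in Table 4 follow directly from the counts. The short Python sketch below recomputes them and checks the two identities that BUSCO categories satisfy: complete equals single-copy plus duplicated, and complete, fragmented, and missing add up to the number of groups searched. The variable names are illustrative.

```python
# Counts from Table 4
busco = {
    "complete": 1594,
    "single_copy": 1560,
    "duplicated": 34,
    "fragmented": 13,
    "missing": 7,
}
total_groups = 1614

# Percentage of all searched groups, rounded to one decimal as in the table
percent = {name: round(100 * n / total_groups, 1) for name, n in busco.items()}

# Internal consistency of the categories
complete_ok = busco["single_copy"] + busco["duplicated"] == busco["complete"]
total_ok = busco["complete"] + busco["fragmented"] + busco["missing"] == total_groups

print(percent, complete_ok, total_ok)
```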

Table 5.

Functional annotation of genes predicted in Longgeng 57 against public databases.

Databases	Count	Percentage (%)
SwissProt	21346	53.5
GO	21515	53.9
KO	8123	20.4
KEGG PATHWAY	5277	13.2
NR	38111	95.5
NT	39240	98.3
PFAM	21951	55.0
eggNOG	18352	46.0
Total_anno	39877	99.9
Total_unigene	39920	100.0

Table 6.

Repeats predicted by different methods in Longgeng 57 assembly.

Type	Repeat length(bp)	% of genome
RepeatMasker	170634399	43.13%
ProteinMask	51165	0.01%
Denovo	181894436	45.97%
Trf	18598807	4.70%
Total	204035399	51.57%

Table 7.

Non-coding RNAs annotation results in Longgeng 57 assembly.

Class	Type	Copy	Average length (bp)	Total length (bp)	% of genome
miRNA	miRNA	9684	201.5	1951247	0.493
tRNA	tRNA	3039	75.3	228859	0.058
rRNA	18S	342	1739.2	594807	0.150
rRNA	28S	1317	144.5	190334	0.048
rRNA	5.8S	324	158.9	51483	0.013
rRNA	5S	1014	119.3	120950	0.031
snRNA	CD-box	562	106.9	60112	0.015
snRNA	HACA-box	64	130.2	8333	0.002
snRNA	splicing	85	147.8	12566	0.003
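
The share of the genome occupied by each non-coding RNA type is its total length divided by the genome length. The Python sketch below recomputes these shares from the total lengths in Table 7. The genome length is not stated directly in this table, so it is inferred here from the RepeatMasker row of Table 6 (170,634,399 bp at 43.13%), which is an assumption.

```python
# Total lengths (bp) from Table 7
total_length = {
    "miRNA": 1_951_247,
    "tRNA": 228_859,
    "rRNA 18S": 594_807,
    "rRNA 28S": 190_334,
    "rRNA 5.8S": 51_483,
    "rRNA 5S": 120_950,
    "snRNA CD-box": 60_112,
    "snRNA HACA-box": 8_333,
    "snRNA splicing": 12_566,
}

# Assumed genome length, back-calculated from Table 6 (about 395.6 Mb)
genome_bp = 170_634_399 / 0.4313

# Percentage of the genome covered by each type
percent = {name: 100 * bp / genome_bp for name, bp in total_length.items()}
ncrna_bp = sum(total_length.values())
ncrna_percent = 100 * ncrna_bp / genome_bp

for name, value in percent.items():
    print(f"{name}\t{value:.3f}")
print(f"all\t{ncrna_bp}\t{ncrna_percent:.2f}")
```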

Table 8.

Annotation results for the coding regions of the Longgeng 57 assembly compared with those of commonly used assemblies.

Data set	Number of proteins	Averaged gene length(bp)	Averaged cds length(bp)	Averaged exon length(bp)	Averaged intron length(bp)
MH63RS-3	60171	2610.38	1082.45	263.7	493.1
IRGSPv1.0	32441	2324.02	1047.75	264.1	431.14
ZS97RS-3	59737	2599.92	1093.13	267.86	490.07
Longgeng 57	39920	3291.76	1150.31	239.44	563.93

For the SRS data of the three parents (LN2, PN8, and LG29), we first aligned the reads against the reference genome IRGSPv1.0 to call genome variations. We then used the sequences of three representative types of major genes from IRGSPv1.0 as BLAST queries against the LG57 assembly to obtain the target sequences.
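
Locating a gene in the assembly from BLAST results comes down to picking the best-scoring hit for each query. The Python sketch below parses BLAST tabular output (the default 12-column `-outfmt 6` layout) and selects the hit with the highest bit score. The contig names, coordinates, and scores in the example are invented for illustration and are not results from this dataset.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float
    length: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6(text: str) -> List[Hit]:
    """Parse default BLAST tabular output: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[8]), int(f[9]), float(f[10]), float(f[11])))
    return hits


def best_hit(hits: List[Hit], query: str) -> Optional[Hit]:
    """Return the highest-scoring hit for one query, or None if it has no hits."""
    candidates = [h for h in hits if h.query == query]
    return max(candidates, key=lambda h: h.bitscore) if candidates else None


# Invented example rows (hypothetical contig names and values)
example = (
    "Waxy\tcontig_A\t99.80\t5000\t3\t1\t1\t5000\t120001\t125000\t0.0\t9100\n"
    "Waxy\tcontig_B\t85.00\t400\t50\t4\t10\t409\t5001\t5400\t1e-90\t340\n"
)
top = best_hit(parse_outfmt6(example), "Waxy")
print(top.subject, top.sstart, top.send)
```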

More details about data validation cases for three key genes in LG57 breeding, based on the pedigree genome data (especially the assembly of LG57 and the alignment data of its three parents), are listed in Table 9. The maturing time of Geng/japonica rice is largely affected by the Hd1 gene⁴³, which commonly harbors highly diverse variation in the rice genome⁴⁴. In this region, LG57 and its three early Geng/japonica parents show extremely high consistency. The grain quality of glutinous rice is mainly controlled by the Waxy gene⁴⁵. LG57 possesses better grain quality than other glutinous early Geng/japonica varieties, such as PN2 and LN2. There are three differences in the Waxy genes found between PN8 and LN2. Although a common variation in the 5th exon of Waxy was found in PN8, LN2, and their progeny, LG57, there is a unique 23 bp deletion in the 1st exon that is shared by LG57 and its non-glutinous parent, LG29. Variations in the major gene OsNramp5 affect the mineral concentrations in rice¹⁰. Notably, LG57 carries variations that differ from those of all three parents, which are presumed to have been caused by spontaneous mutations during the breeding process⁴⁶,⁴⁷. Three types of variation in three representative genes validated the genome data and indicate possible applications of this dataset. In summary, the quality of the pedigree genome data of LG57 is sufficient for public reuse.

Table 9.

Genome variations in three representative types of genes (Hd1 for maturing time, Waxy for amylose content, and OsNramp5 for mineral concentration). 0 represents the genotype of the reference genome³⁶ and 1 represents the first alternative genotype (ALT).

Target Loci	Position (bp)	Region	Ref	Alt	LG57	LN2	PN8	LG29
Hd1	9336605	1st exon	—	GAA insert	1/1	1/1	1/1	1/1
	9336784	1st exon	GC	AA	1/1	1/1	1/1	1/1
	9336855	1st exon	C	del	0/0	0/0	0/0	0/0
	9336944	1st exon	G	T	0/0	0/0	0/0	0/0
	9337002	1st exon	C	A	1/1	1/1	1/1	1/1
	9337005	1st exon	C	A	1/1	1/1	1/1	1/1
	9337023	1st exon	G	A	1/1	1/1	1/1	1/1
	9337038	1st exon	33 bp	del	1/1	1/1	1/1	1/1
	9337278	1st exon	43 bp	del	0/0	0/0	0/0	0/0
	9337404	2nd exon	TT	del	0/0	0/0	0/0	0/0
	9337623	2nd exon	AAGA	del	0/0	0/0	0/0	0/0
Waxy	1767032	1st exon	C	del	1/1	0/0	0/0	1/1
	1767036	1st exon	G	del	1/1	0/0	0/0	1/1
	1767037	1st exon	C	del	1/1	0/0	0/0	1/1
	1767039	1st exon	C	del	1/1	0/0	0/0	1/1
	1767041	1st exon	G	del	1/1	0/0	0/0	1/1
	1767044	1st exon	G	del	1/1	0/0	0/0	1/1
	1768006	5th exon	A	C, del	1/1	1/1	1/1	2/2
OsNramp5	8878343	2nd intron	TCTC	del	1/1	0/0	0/0	0/0
	8872443	12th intron	A	G	1/1	0/0	0/0	0/0
	8872467	12th intron	22 bp	del	1/1	1/1	1/1	0/0
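
The inheritance pattern at each site in Table 9 can be read off by comparing the LG57 genotype with those of its three parents. The Python sketch below does this for the Waxy and OsNramp5 rows, with genotypes transcribed from the table. The category labels and function name are illustrative choices, not terminology from the study.

```python
from collections import Counter

# (locus, position, LG57, LN2, PN8, LG29), transcribed from Table 9
ROWS = [
    ("Waxy", 1767032, "1/1", "0/0", "0/0", "1/1"),
    ("Waxy", 1767036, "1/1", "0/0", "0/0", "1/1"),
    ("Waxy", 1767037, "1/1", "0/0", "0/0", "1/1"),
    ("Waxy", 1767039, "1/1", "0/0", "0/0", "1/1"),
    ("Waxy", 1767041, "1/1", "0/0", "0/0", "1/1"),
    ("Waxy", 1767044, "1/1", "0/0", "0/0", "1/1"),
    ("Waxy", 1768006, "1/1", "1/1", "1/1", "2/2"),
    ("OsNramp5", 8878343, "1/1", "0/0", "0/0", "0/0"),
    ("OsNramp5", 8872443, "1/1", "0/0", "0/0", "0/0"),
    ("OsNramp5", 8872467, "1/1", "1/1", "1/1", "0/0"),
]


def classify(lg57: str, ln2: str, pn8: str, lg29: str) -> str:
    """Name the parents whose genotype matches that of LG57 at one site."""
    parents = {"LN2": ln2, "PN8": pn8, "LG29": lg29}
    matching = sorted(name for name, gt in parents.items() if gt == lg57)
    if not matching:
        return "not seen in any parent"
    if len(matching) == len(parents):
        return "shared by all parents"
    return "shared with " + "+".join(matching)


patterns = Counter((locus, classify(*gts)) for locus, _pos, *gts in ROWS)
for (locus, label), n in sorted(patterns.items()):
    print(locus, label, n)
```

With the table values, the six 1st-exon Waxy sites group with LG29, the 5th-exon Waxy site groups with LN2 and PN8, and two of the three OsNramp5 sites match none of the parents.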

Acknowledgements

This work was mainly supported by the National Key Research and Development Program of China (2020YFE0202300, 2023ZD04076); the National Natural Science Foundation of China (grant number 31871715); the Key Special Program (2022ZD0400404), Ministry of Science and Technology, China; the International Science & Technology Innovation Program of the Chinese Academy of Agricultural Sciences (grant numbers CAASTIPS, CAAS-ZDRW202109); the Guangxi Key Laboratory of Rice Genetics and Breeding (grant number 2022-36-Z01-KF10); the Hainan Yazhou Bay Seed Lab (B21HJ0216); and the Bill & Melinda Gates Foundation (grant number OPP1130530).

Author contributions

T.Q.Z., Y.J.Z. and J.L.X. designed and conceived the research; Y.B.L., L.Y.X., Z.Q.Z. and D.C.Y. prepared samples for sequencing; Y.B.L., J.L., P.Y.Q., M.F., Y.L.Z. and G.J.Q. performed data collection and analysis; X.Q.Z., B.L.L., W.S.W., F.Z. and Z.K.L. gave valuable suggestions and contributed new reagents/analytic tools; T.Q.Z. and J.L.X. wrote the paper. All authors read and approved the final manuscript.

Code availability

No custom code was used during this study for the curation and/or validation of the dataset.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Yuanbao Lei, Yunjiang Zhang, Linyun Xu.

Contributor Information

Jianlong Xu, Email: xujlcaas@126.com.

Tianqing Zheng, Email: tonyztq@163.com.

References

1. NBSC. National Data, https://data.stats.gov.cn/english/ (2022).
2. FAO. FAOSTAT-Crops and livestock products, https://www.fao.org/faostat/en/ (2022).
3. Nie SJ, et al. Assembly of an early-matured japonica (Geng) rice genome, Suijing18, based on PacBio and Illumina sequencing. Sci Data. 2017;4:170195. doi: 10.1038/sdata.2017.195.
4. EMR. Global Glutinous Rice Market Outlook. (2022).
5. Terashima Y, Nagai Y, Kato H, Ohta A, Tanaka Y. Eating glutinous brown rice for one day improves glycemic control in Japanese patients with type 2 diabetes assessed by continuous glucose monitoring. Asia Pac J Clin Nutr. 2017;26:421–426. doi: 10.6133/apjcn.042016.07.
6. Cadogan, M. Sticky rice & mango, https://www.bbcgoodfoodme.com/recipes/sticky-rice-and-mango/ (2022).
7. Ling HY, et al. Amylopectin from Glutinous Rice as a Sustainable Binder for High-Performance Silicon Anodes. Energy & Environmental Materials. 2021;4:263–268. doi: 10.1002/eem2.12143.
8. GAREFU. What Is Glutinous Rice Glue?, https://www.garefutech-paste.com/news/what-is-glutinous-rice-glue-60117817.html (2022).
9. Yao L, et al. Glutinous rice-derived carbon material for high-performance zinc-ion hybrid supercapacitors. Journal of Energy Storage. 2023;58:106378. doi: 10.1016/j.est.2022.106378.
10. Zhao FJ, Chang JD. A weak allele of OsNRAMP5 for safer rice. J Exp Bot. 2022;73:6009–6012. doi: 10.1093/jxb/erac323.
11. Liu CL, et al. Characterization of a major QTL for manganese accumulation in rice grain. Scientific Reports. 2017;7:17704. doi: 10.1038/s41598-017-18090-7.
12. Luo JS, et al. A defensin-like protein drives cadmium efflux and allocation in rice. Nature Communications. 2018;9:645. doi: 10.1038/s41467-018-03088-0.
13. Wang W, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018;557:43–49. doi: 10.1038/s41586-018-0063-9.
14. Wang CC, et al. Towards a deeper haplotype mining of complex traits in rice with RFGB v2.0. Plant Biotechnology Journal. 2020;18:14–16. doi: 10.1111/pbi.13215.
15. Covaris. g-TUBE, https://www.covaris.com/products-services/consumables/g-tube (2022).
16. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.
17. Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology. 2020;21:245. doi: 10.1186/s13059-020-02134-9.
18. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
19. Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;Chapter 4:4.10.11–14.10.14. doi: 10.1002/0471250953.bi0410s25.
20. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9.
21. McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 2004;32:W20–25. doi: 10.1093/nar/gkh435.
22. Keilwagen J, Hartung F, Grau J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol Biol. 2019;1962:161–177. doi: 10.1007/978-1-4939-9173-0_9.
23. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–441. doi: 10.1093/nar/gkg006.
24. Stanke M, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–439. doi: 10.1093/nar/gkl200.
25. Borodovsky M, Lomsadze A. Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Curr Protoc Bioinformatics. 2011;Chapter 4:4.6.1–4.6.10. doi: 10.1002/0471250953.bi0406s35.
26. Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7.
27. Consortium TU. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research. 2022;51:D523–D531. doi: 10.1093/nar/gkac1052.
28. Finn RD, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–230. doi: 10.1093/nar/gkt1223.
29. Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research. 2018;47:D309–D314. doi: 10.1093/nar/gky1085.
30. Ashburner M, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.
31. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.
32. Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research. 2021;49:9077–9096. doi: 10.1093/nar/gkab688.
33. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
34. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
35. McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.
36. Kawahara Y, et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice (N Y). 2013;6:4. doi: 10.1186/1939-8433-6-4.
37. Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Oryza sativa Japonica Group cultivar Early Geng isolate Longgeng 57, whole genome shotgun sequencing project, NCBI GenBank, https://identifiers.org/ncbi/insdc:JAXQPT000000000 (2023).
38. Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data of Long Geng 57 in bam format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR25376496 (2023).
39. Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data for PN8 in Fastq format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR24688636 (2023).
40. Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data for LN2 in Fastq format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR24688637 (2023).
41. Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data for LG29 in Fastq format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR24688635 (2023).
42. Zheng T-Q. 2023. Annotation files for Longgeng 57. figshare.
43. Leng Y, et al. Using Heading date 1 preponderant alleles from indica cultivars to breed high-yield, high-quality japonica rice varieties for cultivation in south China. Plant Biotechnology Journal. 2020;18:119–128. doi: 10.1111/pbi.13177.
44. Wu C-C, et al. Studies of rice Hd1 haplotypes worldwide reveal adaptation of flowering time to different environments. PLOS ONE. 2020;15:e0239028. doi: 10.1371/journal.pone.0239028.
45. Zeng D, et al. Rational design of high-yield and superior-quality rice. Nature Plants. 2017;3:17031. doi: 10.1038/nplants.2017.31.
46. Faruquee M, et al. Dominant early heading without yield drag in a sister-line BC breeding progeny DEH_229 is controlled by multiple genetic factors with main-effect loci. The Crop Journal. 2021;9:400–411. doi: 10.1016/j.cj.2020.06.014.
47. Li H, et al. A spontaneous thermo-sensitive female sterility mutation in rice enables fully mechanized hybrid breeding. Cell Res. 2022;32:931–945. doi: 10.1038/s41422-022-00711-0.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Citations

Zheng T-Q. 2023. Annotation files for Longgeng 57. figshare. [DOI]

Data Availability Statement

No custom code was used during this study for the curation and/or validation of the dataset.

[CR1] 1.NBSC. National Data, https://data.stats.gov.cn/english/ (2022).

[CR2] 2.FAO. FAOSTAT-Crops and livestock products, https://www.fao.org/faostat/en/ (2022).

[CR3] 3.Nie SJ, et al. Assembly of an early-matured japonica (Geng) rice genome, Suijing18, based on PacBio and Illumina sequencing. Sci Data. 2017;4:170195. doi: 10.1038/sdata.2017.195. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR4] 4.EMR. Global Glutinous Rice Market Outlook. (2022).

[CR5] 5.Terashima Y, Nagai Y, Kato H, Ohta A, Tanaka Y. Eating glutinous brown rice for one day improves glycemic control in Japanese patients with type 2 diabetes assessed by continuous glucose monitoring. Asia Pac J Clin Nutr. 2017;26:421–426. doi: 10.6133/apjcn.042016.07. [DOI] [PubMed] [Google Scholar]

[CR6] 6.Cadogan, M. Sticky rice & mango, https://www.bbcgoodfoodme.com/recipes/sticky-rice-and-mango/ (2022).

[CR7] 7.Ling HY, et al. Amylopectin from Glutinous Rice as a Sustainable Binder for High-Performance Silicon Anodes. ENERGY & ENVIRONMENTAL MATERIALS. 2021;4:263–268. doi: 10.1002/eem2.12143. [DOI] [Google Scholar]

[CR8] 8.GAREFU. What Is Glutinous Rice Glue?, https://www.garefutech-paste.com/news/what-is-glutinous-rice-glue-60117817.html (2022).

[CR9] 9.Yao L, et al. Glutinous rice-derived carbon material for high-performance zinc-ion hybrid supercapacitors. Journal of Energy Storage. 2023;58:106378. doi: 10.1016/j.est.2022.106378. [DOI] [Google Scholar]

[CR10] 10.Zhao FJ, Chang JD. A weak allele of OsNRAMP5 for safer rice. J Exp Bot. 2022;73:6009–6012. doi: 10.1093/jxb/erac323. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR11] 11.Liu CL, et al. Characterization of a major QTL for manganese accumulation in rice grain. Scientific Reports. 2017;7:17704. doi: 10.1038/s41598-017-18090-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR12] 12.Luo JS, et al. A defensin-like protein drives cadmium efflux and allocation in rice. Nature Communications. 2018;9:645. doi: 10.1038/s41467-018-03088-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Wang W, et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature. 2018;557:43–49. doi: 10.1038/s41586-018-0063-9. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Wang CC, et al. Towards a deeper haplotype mining of complex traits in rice with RFGB v2.0. Plant Biotechnology Journal. 2020;18:14–16. doi: 10.1111/pbi.13215. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Covaris. g-TUBE, https://www.covaris.com/products-services/consumables/g-tube (2022).

[CR16] 16.Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.

[CR17] 17.Rhie A, Walenz BP, Koren S, Phillippy AM. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology. 2020;21:245. doi: 10.1186/s13059-020-02134-9.

[CR18] 18.Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.

[CR19] 19.Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009;Chapter 4:4.10.11–14.10.14. doi: 10.1002/0471250953.bi0410s25.

[CR20] 20.Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA. 2015;6:11. doi: 10.1186/s13100-015-0041-9.

[CR21] 21.McGinnis S, Madden TL. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 2004;32:W20–25. doi: 10.1093/nar/gkh435.

[CR22] 22.Keilwagen J, Hartung F, Grau J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol Biol. 2019;1962:161–177. doi: 10.1007/978-1-4939-9173-0_9.

[CR23] 23.Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–441. doi: 10.1093/nar/gkg006.

[CR24] 24.Stanke M, et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006;34:W435–439. doi: 10.1093/nar/gkl200.

[CR25] 25.Borodovsky M, Lomsadze A. Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES. Curr Protoc Bioinformatics. 2011;Chapter 4:4.6.1–4.6.10. doi: 10.1002/0471250953.bi0406s35.

[CR26] 26.Haas BJ, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 2008;9:R7. doi: 10.1186/gb-2008-9-1-r7.

[CR27] 27.Consortium TU. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Research. 2022;51:D523–D531. doi: 10.1093/nar/gkac1052.

[CR28] 28.Finn RD, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42:D222–230. doi: 10.1093/nar/gkt1223.

[CR29] 29.Huerta-Cepas J, et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research. 2018;47:D309–D314. doi: 10.1093/nar/gky1085.

[CR30] 30.Ashburner M, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25:25–29. doi: 10.1038/75556.

[CR31] 31.Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28:27–30. doi: 10.1093/nar/28.1.27.

[CR32] 32.Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Research. 2021;49:9077–9096. doi: 10.1093/nar/gkab688.

[CR33] 33.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.

[CR34] 34.Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.

[CR35] 35.McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110.

[CR36] 36.Kawahara Y, et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice (N Y). 2013;6:4. doi: 10.1186/1939-8433-6-4.

[CR37] 37.Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Oryza sativa Japonica Group cultivar Early Geng isolate Longgeng 57, whole genome shotgun sequencing project, NCBI GenBank, https://identifiers.org/ncbi/insdc:JAXQPT000000000 (2023).

[CR38] 38.Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data of Long Geng 57 in bam format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR25376496 (2023).

[CR39] 39.Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data for PN8 in Fastq format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR24688636 (2023).

[CR40] 40.Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data for LN2 in Fastq format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR24688637 (2023).

[CR41] 41.Institute of Crop Sciences/National Key Facility for Crop Gene Resources and Genetic Improvement, C. Genomic data for LG29 in Fastq format, NCBI Sequence Read Archive, https://identifiers.org/ncbi/insdc.sra:SRR24688635 (2023).

[CR42] 42.Zheng T-Q. 2023. Annotation files for Longgeng 57. figshare.

[CR43] 43.Leng Y, et al. Using Heading date 1 preponderant alleles from indica cultivars to breed high-yield, high-quality japonica rice varieties for cultivation in south China. Plant Biotechnology Journal. 2020;18:119–128. doi: 10.1111/pbi.13177.

[CR44] 44.Wu C-C, et al. Studies of rice Hd1 haplotypes worldwide reveal adaptation of flowering time to different environments. PLOS ONE. 2020;15:e0239028. doi: 10.1371/journal.pone.0239028.

[CR45] 45.Zeng D, et al. Rational design of high-yield and superior-quality rice. Nature Plants. 2017;3:17031. doi: 10.1038/nplants.2017.31.

[CR46] 46.Faruquee M, et al. Dominant early heading without yield drag in a sister-line BC breeding progeny DEH_229 is controlled by multiple genetic factors with main-effect loci. The Crop Journal. 2021;9:400–411. doi: 10.1016/j.cj.2020.06.014.

[CR47] 47.Li H, et al. A spontaneous thermo-sensitive female sterility mutation in rice enables fully mechanized hybrid breeding. Cell Res. 2022;32:931–945. doi: 10.1038/s41422-022-00711-0.
