Improved maize reference genome with single-molecule technologies

Yinping Jiao; Paul Peluso; Jinghua Shi; Tiffany Liang; Michelle C Stitzer; Bo Wang; Michael S Campbell; Joshua C Stein; Xuehong Wei; Chen-Shan Chin; Katherine Guill; Michael Regulski; Sunita Kumari; Andrew Olson; Jonathan Gent; Kevin L Schneider; Thomas K Wolfgruber; Michael R May; Nathan M Springer; Eric Antoniou; W Richard McCombie; Gernot G Presting; Michael McMullen; Jeffrey Ross-Ibarra; R Kelly Dawe; Alex Hastie; David R Rank; Doreen Ware

doi:10.1038/nature22971

Nature. 2017 Jun 12;546(7659):524–527. doi: 10.1038/nature22971

PMCID: PMC7052699 PMID: 28605751

An improved reference genome for maize, using single-molecule sequencing and high-resolution optical mapping, enables characterization of structural variation and repetitive regions, and identifies lineage expansions of transposable elements that are unique to maize.

Supplementary information

The online version of this article (doi:10.1038/nature22971) contains supplementary material, which is available to authorized users.

Subject terms: Plant sciences, Genetics, Genome informatics

A better map of the maize genome

The maize genome was initially reported in 2009 but with some accuracy limitations. Doreen Ware and colleagues report a new reference genome for maize using single-molecule sequencing and high-resolution optical mapping. The new assembly improves the gene space, resolving gaps and misassemblies and correcting the order and orientation of genes. The authors characterize structural variation and repetitive regions, and identify transposable element lineage expansions unique to maize.

Abstract

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation¹. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions². Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome³, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing⁴. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.

Main

Maize is the most productive and widely grown crop in the world, as well as a foundational model for genetics and genomics⁵. An accurate genome assembly for maize is crucial for all forms of basic and applied research, which will enable increases in yield to feed the growing world population. The current assembly of the maize genome, based on Sanger sequencing, was first published in 2009 (ref. 3). Although this initial reference enabled rapid progress in maize genomics¹, the original assembly is composed of more than 100,000 small contigs, many of which are arbitrarily ordered and oriented, markedly complicating detailed analysis of individual loci⁶ and impeding investigation of intergenic regions crucial to our understanding of phenotypic variation^7,8 and genome evolution^9,10.

Here we report a vastly improved de novo assembly and annotation of the maize reference genome (Fig. 1). On the basis of 65× single-molecule real-time (SMRT) sequencing (Extended Data Fig. 1), we assembled the genome of the maize inbred line B73 into 2,958 contigs, in which half of the total assembly is made up of contigs larger than 1.2 Mb (Table 1, Extended Data Fig. 2a). The assembly of the long reads was then integrated with a high-quality optical map (Extended Data Fig. 1, Extended Data Table 1) to create a hybrid assembly consisting of 625 scaffolds (Table 1). To build chromosome-level super-scaffolds, we combined the hybrid assembly with a minimum tiling path generated from the bacterial artificial chromosomes (BACs)¹¹ and a high-density genetic map¹² (Extended Data Fig. 2b). After gap-filling and error correction using short sequence reads, the total size of maize B73 RefGen_v4 pseudomolecules was 2,106 Mb. The new reference assembly has 2,522 gaps, of which almost half (n = 1,115) have optical map coverage, giving an estimated mean gap length of 27 kb (Extended Data Fig. 2c). The new maize B73 reference genome has 240-fold higher contiguity than the recently published short-read genome assembly of maize cultivar PH207 (contig N50: 1,180 kb versus 5 kb)¹³.
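The contig N50 quoted above is the length at which contigs of that size or longer account for at least half of the assembly. The following is a minimal sketch of that calculation, not the authors' code; the toy contig lengths are hypothetical and only the 1,180 kb and 5 kb values come from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the summed assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("empty list of contig lengths")


# Hypothetical toy contig lengths in kb (not taken from the B73 assembly).
toy_contigs = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_contigs)
print("toy N50 (kb):", toy_n50)

# Ratio of the contig N50 values quoted in the text (B73 v4 versus PH207).
fold_change = 1180 / 5
print("N50 ratio:", fold_change)
```

The ratio of the two quoted N50 values is 236, which the text reports as roughly 240-fold.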

Figure 1. a, Workflow for genome construction. b, Ideograms of maize B73 version 4 reference pseudomolecules. The top track shows positions of 2,522 gaps in the pseudomolecules, including 1,115 gaps in which the lengths were estimated using optical genome maps (orange), whereas the remainder (purple) have undetermined lengths. More than half of the assembly is constituted of contigs longer than 1 Mb, which are shown as light grey bars in the bottom track.

Extended Data Figure 1 — a, Size distribution of single molecules for the optical maps. A total of 150 Gb (~60-fold coverage) of single-molecule raw data from BioNano chips was collected for map construction. The N50 of the single molecules was ~261 kb, and the label density was 11.6 per 100 kb. After assembly, the total size of the map reached 2.12 Gb with an N50 of 2.47 Mb. b, Length distribution of SMRT sequencing reads. Sequencing of 212 P6-C4 SMRT cells on the PacBio platform generated ~65-fold depth-of-coverage of the nuclear genome. Read lengths averaged 11.7 kb, with reads above 10 kb providing 53-fold depth-of-coverage. c, The accuracy of SMRT sequencing from a representative run. The sequencing error rate was estimated at 10% from the alignment with the maize B73 RefGen_v3 by BLASR. d, Plot of the fraction of alignable data per run (alignable bases/total bases per chip) versus total raw bases (per chip) for each B73 sequencing run. The trend in the data suggests that as the overall per-run raw base yield increases, the fraction of alignable bases decreases. This is because, in all runs, a subset of the zero-mode waveguides (ZMWs) will initially have more than one active sequencing enzyme in the observation field at the start of the sequencing run. A ZMW with more than one active polymerase will create unalignable bases while the two polymerases are simultaneously synthesizing DNA and yield a ‘merged sequencing signal from two independent polymerases’. As the loading of a chip increases (yield of bases), the probability of having two or more polymerases in a single ZMW increases.

Table 1.

Assembly statistics of the maize B73 RefGen_v4 genome

Assembly set	Number of contigs (scaffolds)	Mean length (Mb)	N50 size (Mb)	Maximum length (Mb)	Total assembly length (Mb)
Original optical maps	1,342	1.57	2.47	12.43	2,107
Original contigs from sequence assembly	3,303	0.64	1.04	5.65	2,105
Curated optical maps	1,356	1.56	2.47	12.47	2,114
Curated contigs from sequence assembly	2,958	0.71	1.18	7.26	2,104
Optical maps in hybrid scaffolds	1,287	1.62	2.49	12.47	2,080
Contigs in hybrid scaffolds	2,696	0.77	1.19	7.26	2,075
Hybrid scaffolds	356	5.97	9.73	38.53	2,075
Hybrid scaffolds and non-scaffolded contigs	625	3.45	9.56	38.53	2,105

Extended Data Figure 2 — a, Summary of the three assembly sets. b, How the scaffolds were ordered according to the order of the BACs. c, Size distribution of gaps in the pseudomolecules estimated using the optical map.

Comparison of the new assembly to the previous BAC-based maize reference genome assembly revealed more than 99.9% sequence identity and a 52-fold increase in the mean contig length, with 84% of the BACs spanned by a single contig from the long reads assembly. Alignment of chromatin-immunoprecipitation followed by sequencing (ChIP–seq) data for the centromere-specific histone H3 (CENH3)¹⁴ revealed that centromeres are accurately placed and largely intact. Several previously identified¹⁵ megabase-sized mis-oriented pericentromeric regions were also corrected (Extended Data Fig. 3a, b). Moreover, the ends of the chromosomes are properly identified on 14 out of the 20 chromosome arms based on the presence of tandem telomeric repeats and knob 180 sequences (Extended Data Fig. 3a, c).

Extended Data Figure 3 — a, Quality assessment of centromere and telomere using optical genome map. b, Locations of centromeres on pseudomolecules defined by ChIP–seq in the B73 RefGen_v3 and v4. c, Telomere repeats found in the B73 RefGen_v4 pseudomolecules.

Our assembly made substantial improvements in the gene space including resolution of gaps and misassemblies and correction of order and orientation of genes. We also updated the annotation of our new assembly, resulting in consolidation of gene models (Extended Data Fig. 4). Newly published full-length cDNA data⁴ improved the annotation of alternative splicing by more than doubling the number of alternative transcripts from 1.6 to 3.3 per gene (Extended Data Fig. 5a), with about 70% of genes supported by the full-length transcripts. Our reference assembly also vastly improved the coverage of regulatory sequences, decreasing the number of genes exhibiting gaps in the 3-kb region(s) flanking coding sequence from 20% to <1% (Extended Data Fig. 5b). The more complete sequence enabled notable improvements in the annotation of core promoter elements, especially the TATA-box, CCAAT-box, and Y patch motifs (Supplementary Information). Quantitative genetic analyses have shown that polymorphisms in regulatory regions explain a substantial majority of the genetic variation for many phenotypes^7,8, suggesting that the new reference will markedly improve our ability to identify and predict functional genetic variation.
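The statistic on genes with gaps in the 3-kb flanking regions can be illustrated with a simple interval check. This is a sketch under stated assumptions, not the authors' procedure: the coordinate layout, the function names and the toy genes and gaps are all hypothetical, and a gene is counted if either flank overlaps any assembly gap.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def fraction_with_flank_gaps(genes, gaps, flank=3000):
    """genes and gaps are lists of (chrom, start, end) tuples.
    Returns the fraction of genes with a gap in either flanking window."""
    hits = 0
    for chrom, start, end in genes:
        upstream = (max(0, start - flank), start)
        downstream = (end, end + flank)
        flagged = any(
            g_chrom == chrom
            and (overlaps(*upstream, g_start, g_end)
                 or overlaps(*downstream, g_start, g_end))
            for g_chrom, g_start, g_end in gaps
        )
        hits += flagged
    return hits / len(genes)


# Hypothetical toy data: three genes, two assembly gaps.
toy_genes = [("1", 10000, 12000), ("1", 50000, 52000), ("2", 10000, 12000)]
toy_gaps = [("1", 8000, 8500), ("2", 40000, 41000)]
toy_fraction = fraction_with_flank_gaps(toy_genes, toy_gaps)
print(f"{toy_fraction:.2%} of toy genes have a gap within 3 kb")
```

In this toy example only the first gene has a gap inside its upstream 3-kb window, so one gene in three is counted.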

Extended Data Figure 4 — a, The pipeline used to characterize high confidence gene models. b, Summary of B73 RefGen_v4 protein-coding gene annotation, and comparison with RefGen_v3 annotation.

Extended Data Figure 5 — a, Number of transcripts of each gene in v3 and v4 annotation. b, Percentages of genes with gaps in flanking regions in the v3 and v4 annotations.

After its divergence from Sorghum, the maize lineage underwent genome doubling followed by diploidization and gene loss. Previous work showed that gene loss is biased towards one of the parental genomes^3,16, but our new assembly and annotation instead suggest that 56% of syntenic sorghum orthologues map uniquely to the dominant maize subgenome (designated A, total size 1.16 Gb), whereas only 24% map uniquely to subgenome B (total size 0.63 Gb). Gene loss in maize has primarily been considered in the context of polyploidy and functional redundancy¹⁶, but we found that despite its polyploidy, maize has lost a larger proportion (14%) of the 22,048 ancestral gene orthologues than any of the other four grass species evaluated to date (Sorghum, rice, Brachypodium distachyon and Setaria italica; Extended Data Fig. 6). Nearly one-third of these losses are specific to maize, and analysis of a restricted high-confidence set revealed enrichment for genes involved in biotic and abiotic stresses (Extended Data Table 2), for example, NB-ARC domain disease-resistance genes¹⁷ and the serpin protease inhibitor involved in pathogen defence and programmed cell death¹⁸.

Extended Data Figure 6 — a, Species-membership in orthologue sets, giving counts and percentage of orthologue sets of which each species is a member. Numbers in parentheses give the percentage of orthologue sets with membership of all species and versions within the clade. na, not applicable. b, Venn diagram showing overlap of 6,539 orthologue sets rooted in the Poaceae (true grasses) that are deficient in gene membership among five species.

Table 2.

Structural variations from optical maps of two maize lines

Statistic	Ki11 map versus B73 RefGen_v4	W22 map versus B73 RefGen_v4
Total size of genome map (Mb)	2,216	2,280
Map aligned to reference genome (Mb)	722	893
Reference genome covered by map (Mb)	694	861
Region in B73 with insertion and deletion (Mb)	223	221
Ratio of region with insertion and deletion (%)	32.15	25.67
Number of insertions	1,794	1,614
Average insertion size (bp)	21,510	21,470
Number of deletions	1,701	1,597
Average deletion size (bp)	18,340	20,120
Number of deletion regions potentially affecting genes	636	621

Transposable elements were first reported in maize¹⁹ and have since been shown to have important roles in shaping genome evolution and gene regulatory networks of many species²⁰. Most of the maize genome is derived from transposable elements^3,21, and careful study of a few regions has revealed a characteristic structure of sequentially nested retrotransposons^21,22 and the effect of deletions and recombination on retrotransposon evolution²³. In the annotation of the original maize assembly, however, fewer than 1% of long terminal repeat (LTR) retrotransposon copies were intact²⁴. By applying a new homology-independent annotation pipeline to our assembly (Extended Data Table 3), we identified 1,268 Mb (130,604 copies) of structurally intact retrotransposons, of which 661 Mb (70,035 copies) are nested retrotransposon copies disrupted by the insertion of other transposable elements, 8.7 Mb (14,041 copies) are DNA terminal inverted repeat transposons, and 76 Mb (21,095 copies) are helitrons. To understand the evolutionary history of maize LTR retrotransposons, we also applied our annotation pipeline to the sorghum reference genome, and used reverse transcriptase protein domain sequences that were accessible owing to the improved assembly of the internal protein coding domains of maize LTR retrotransposons to reconstruct the phylogeny of maize and sorghum LTR retrotransposon families. Despite a higher overall rate of diversification of LTR transposable elements in the maize lineage consistent with its larger genome size, differences in LTR retrotransposon content between genomes were primarily the result of marked expansion of distinct families in both lineages (Fig. 2).

Extended Data Table 3.

Structural annotation of transposable elements

Figure 2. a, b, Both Ty3/Gypsy (a) and Ty1/Copia (b) superfamilies are present at higher copy number in maize (red) than in sorghum (blue). Bars (log₁₀-scaled) depict family copy numbers.

Maize exhibits tremendous genetic diversity²⁵, and both nucleotide polymorphisms and structural variations have important roles in its phenotypic variation^10,26. However, genome-wide patterns of structural variation in plant genomes are difficult to assess²⁷, and previous efforts have relied on short-read mapping, which misses the vast majority of intergenic spaces where most rearrangements are likely to occur¹⁰. To investigate structural variation at a genome-wide scale, we generated optical maps (Extended Data Table 1) for two additional maize inbred lines: the tropical line Ki11, one of the founders of the maize nested association mapping (NAM) population²⁸, and W22, which has served as a foundation for studies of maize genetics²⁹. Owing to the high degree of genomic diversity among these lines, only 32% of the assembled 2,216 Mb map of Ki11 and 39% of the 2,280 Mb W22 map could be mapped to our new B73 reference via common restriction patterns (Table 2, Fig. 3a and Extended Data Fig. 7). The high density of alignments across and near many of the exceedingly retrotransposon-rich centromeres reflects the comparatively low genetic diversity of most centromeres in domesticated maize¹⁵ and illustrates the ability of the combined optical mapping/single-molecule sequencing methodology to traverse large repeat-rich regions. Within the aligned regions, approximately 32% of the Ki11 and 26% of the W22 optical maps exhibited clear evidence of structural variation, including 3,408 insertions and 3,298 deletions (Table 2). The average indel size was approximately 20 kb, with a range from 1 kb to over 1 Mb (Fig. 3b). More than 90% of the indels were unique to one inbred or the other, indicating a high level of structural diversity in maize. As short-read sequence data are available from both Ki11 and W22 (ref. 10), we analysed 1,451 of the largest (>10 kb) deletions and found that 1,083 were supported by a clear reduction in read depth (Fig. 3c). 
The confirmed deletions occurred in regions of low gene density (4.4 genes per megabase compared to a genome-wide average of 18.7 genes per megabase). One-third (83 out of 257) of the genes missing in Ki11 or W22 lack putative orthologues in all four grasses (rice, sorghum, Brachypodium and Setaria), consistent with previous data³⁰.
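The percentages and totals quoted in this paragraph can be re-derived from the values in Table 2. The short sketch below does only that arithmetic; the dictionary keys are hypothetical names chosen for illustration, and small differences in the last digit are expected because the table entries are themselves rounded.

```python
# Values copied from Table 2 (sizes in Mb); key names are illustrative.
table2 = {
    "Ki11": {"map_mb": 2216, "aligned_mb": 722, "covered_mb": 694,
             "indel_mb": 223, "insertions": 1794, "deletions": 1701},
    "W22": {"map_mb": 2280, "aligned_mb": 893, "covered_mb": 861,
            "indel_mb": 221, "insertions": 1614, "deletions": 1597},
}

sv_percent = {}
for line, row in table2.items():
    aligned_pct = 100 * row["aligned_mb"] / row["map_mb"]
    sv_percent[line] = 100 * row["indel_mb"] / row["covered_mb"]
    print(f"{line}: {aligned_pct:.1f}% of map aligned, "
          f"{sv_percent[line]:.2f}% of covered region with indels")

total_insertions = sum(row["insertions"] for row in table2.values())
total_deletions = sum(row["deletions"] for row in table2.values())
print("insertions:", total_insertions, "deletions:", total_deletions)

# Share of the 1,451 large deletions supported by reduced read depth.
supported_fraction = 1083 / 1451
# Share of missing genes without putative orthologues in the four grasses.
lineage_specific_fraction = 83 / 257
print(f"supported: {supported_fraction:.1%}, "
      f"lineage-specific: {lineage_specific_fraction:.1%}")
```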

Figure 3. a, Alignment and structural variation called from Ki11 and W22 optical maps on chromosome 10. b, Size distribution of the insertions and deletions in Ki11 and W22. c, Example of using short-read alignment to verify a missing region mapped in Ki11.

Although maize is often considered to be a large-genome crop, most major food crops have even larger genomes with more complex repeat landscapes². Our improved assembly of the B73 genome, generated using single-molecule technologies, demonstrates that additional assemblies of other maize inbred lines and similar high-quality assemblies of other repeat-rich and large-genome plants are feasible. Further high-quality assemblies will in turn extend our understanding of the genetic diversity that forms the basis of the phenotypic diversity in maize and other economically important plants.

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.

Whole-genome sequencing using SMRT technology

DNA samples for SMRT sequencing were prepared using maize inbred line B73 from NCRPIS (PI550473), grown at University of Missouri. Seeds of this line were deposited at NCRPIS (tracking number PI677128). Etiolated seedlings were grown for 4–6 days in Pro-Mix at 37 °C in darkness to minimize chloroplast DNA. Batches of ~10 g were snap-frozen in liquid nitrogen. DNA was extracted following the PacBio protocol ‘Preparing Arabidopsis Genomic DNA for Size-Selected ~20 kb SMRTbell Libraries’ (http://www.pacb.com/wp-content/uploads/2015/09/Shared-Protocol-Preparing-Arabidopsis-DNA-for-20-kb-SMRTbell-Libraries.pdf).

Genomic DNA was sheared to a size range of 15–40 kb using either G-tubes (Covaris) or a Megaruptor device (Diagenode), and enzymatically repaired and converted into SMRTbell template libraries as recommended by Pacific Biosciences. In brief, hairpin adapters were ligated, after which the remaining damaged DNA fragments and those without adapters at both ends were eliminated by digestion with exonucleases. The resulting SMRTbell templates were size-selected by BluePippin electrophoresis (Sage Science), and templates ranging from 15 to 50 kb were sequenced on a PacBio RS II instrument using P6-C4 sequencing chemistry. To acquire long reads, all data were collected as either 5- or 6-h sequencing videos.

Construction of optical genome maps using the Irys system

High-molecular mass genomic DNA was isolated from 3 g of young ear tissue after fixing with 2% formaldehyde. Nuclei were purified and lysed in embedded agarose as previously described³¹. DNA was labelled at Nt.BspQI sites using the IrysPrep kit. Molecules collected from BioNano chips were de novo assembled as previously described³² using ‘optArgument_human’.

De novo assembly of the genome sequencing data

De novo assembly of the long reads from SMRT sequencing was performed using two assemblers, the Celera Assembler PBcR-MHAP pipeline³³ and Falcon³⁴, with different parameter settings. Quiver from SMRT Analysis v2.3.0 was used to polish base calling of contigs. The three independent assemblies were evaluated by aligning with the optical genome maps.

Contamination of contigs by bacterial and plasmid genomes was eliminated using the NCBI GenBank submission system³⁵. Curation of the assembly, including resolution of conflicts between the contigs and the optical map and removal of redundancy at the edges of contigs, is described in the Supplementary Information.

Hybrid scaffold construction

To create hybrid scaffolds, curated sequence contigs and optical maps were aligned and merged with RefAligner³² (P < 1 × 10⁻¹¹). These initial hybrid scaffolds were aligned again to the sequence contigs using a less stringent P value (1 × 10⁻⁸), and those contigs not previously merged were added if they aligned over 50% of their length and without overlapping previously merged contigs, thereby generating final hybrid scaffolds.
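The second, less stringent pass described above applies two acceptance rules to contigs that were not merged in the first pass. The sketch below illustrates only those two rules; it is not RefAligner, and the `ContigAlignment` record, the `second_pass` function and the toy coordinates are hypothetical. It also assumes that contigs accepted during the pass may not overlap each other, which the text does not state.

```python
from dataclasses import dataclass


@dataclass
class ContigAlignment:
    contig_id: str
    contig_length: int   # bp
    aligned_length: int  # bp of the contig covered by the alignment
    scaffold_id: str
    start: int           # alignment start on the scaffold (bp)
    end: int             # alignment end on the scaffold (bp, exclusive)


def second_pass(candidates, placed, min_fraction=0.5):
    """Return IDs of candidate contigs that align over more than min_fraction
    of their length and do not overlap intervals already occupied.
    placed maps scaffold_id -> list of (start, end) for merged contigs."""
    occupied = {sid: list(intervals) for sid, intervals in placed.items()}
    accepted = []
    for aln in candidates:
        if aln.aligned_length / aln.contig_length <= min_fraction:
            continue  # aligned over too little of its length
        taken = occupied.setdefault(aln.scaffold_id, [])
        if any(aln.start < e and s < aln.end for s, e in taken):
            continue  # would overlap a contig that is already placed
        taken.append((aln.start, aln.end))  # assumption: block later overlaps
        accepted.append(aln.contig_id)
    return accepted


# Hypothetical toy input: one scaffold with a contig merged at 0-4,000 bp.
already_placed = {"s1": [(0, 4000)]}
toy_candidates = [
    ContigAlignment("c1", 1000, 800, "s1", 5000, 5800),  # passes both rules
    ContigAlignment("c2", 1000, 400, "s1", 7000, 7400),  # only 40% aligned
    ContigAlignment("c3", 1000, 900, "s1", 3500, 4400),  # overlaps 0-4,000
]
added = second_pass(toy_candidates, already_placed)
print(added)
```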

Pseudomolecule construction

Sequences from BACs on the physical map that were used to build the maize v3 pseudomolecules were aligned to contigs using the MUMmer package³⁶ with the parameter settings ‘-l 100 -c 1000’ (-l, minimum length of a single match; -c, minimum length of a cluster of matches). To use only unique hits as markers, alignment hits were filtered with the parameters ‘-i 98 -l 10000’ (-i, minimum alignment identity; -l, minimum alignment length). Scaffolds were then ordered and oriented into pseudochromosomes using the order of BACs as a guide. For quality control, we mapped the SNP markers from a genetic map built from an intermated maize recombinant inbred line population (Mo17 × B73)¹². Contigs with markers not located in pseudochromosomes from the physical map were placed into the AGP (A Golden Path) using the genetic map.
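The filtering and ordering steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the hit records, the function names and the toy data are hypothetical, each scaffold is positioned by the median tiling-path rank of its BAC markers, and orientation is inferred simply by comparing the ranks of the first and last markers along the scaffold.

```python
from collections import defaultdict
from statistics import median


def filter_hits(hits, min_identity=98.0, min_length=10_000):
    """Keep alignment hits meeting the identity (%) and length (bp) thresholds."""
    return [h for h in hits
            if h["identity"] >= min_identity and h["length"] >= min_length]


def order_and_orient(hits, bac_rank):
    """hits: dicts with keys bac, scaffold, scaffold_pos.
    bac_rank: BAC name -> rank of that BAC in the tiling path.
    Returns (scaffold, orientation) pairs in tiling-path order."""
    by_scaffold = defaultdict(list)
    for h in hits:
        by_scaffold[h["scaffold"]].append((h["scaffold_pos"], bac_rank[h["bac"]]))
    placed = []
    for scaffold, pairs in by_scaffold.items():
        pairs.sort()  # walk the markers along the scaffold
        ranks = [rank for _, rank in pairs]
        orientation = "+" if ranks[-1] >= ranks[0] else "-"
        placed.append((median(ranks), scaffold, orientation))
    placed.sort()
    return [(scaffold, orientation) for _, scaffold, orientation in placed]


# Hypothetical toy data: four BACs in tiling-path order, two scaffolds.
toy_rank = {"b1": 0, "b2": 1, "b3": 2, "b4": 3}
toy_hits = [
    {"bac": "b1", "scaffold": "sB", "scaffold_pos": 90000, "identity": 99.5, "length": 50000},
    {"bac": "b2", "scaffold": "sB", "scaffold_pos": 10000, "identity": 99.0, "length": 40000},
    {"bac": "b3", "scaffold": "sA", "scaffold_pos": 5000, "identity": 99.9, "length": 60000},
    {"bac": "b4", "scaffold": "sA", "scaffold_pos": 80000, "identity": 99.9, "length": 60000},
    {"bac": "b4", "scaffold": "sB", "scaffold_pos": 50000, "identity": 95.0, "length": 60000},
    {"bac": "b3", "scaffold": "sB", "scaffold_pos": 60000, "identity": 99.0, "length": 5000},
]
kept = filter_hits(toy_hits)
layout = order_and_orient(kept, toy_rank)
print(layout)
```

In the toy data, scaffold sB carries its BAC markers in reverse tiling-path order, so it is placed first and flipped.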

Further polishing of pseudomolecules

Raw pseudomolecules were subjected to gap filling using PBJelly (-maxTrim = 0, -minReads = 2) and polished again using Quiver (SMRT Analysis v2.3.0). To increase the accuracy of the base calls, we performed two lanes of sequencing on the same genomic DNA sample (library size = 450 bp) using Illumina 2500 Rapid run, which generated about 100-fold 2 × 250 paired-end (PE) data. Reads were aligned to the assembly using BWA-mem³⁷ and parsed with SAMtools³⁹, retaining sequence and alignment quality scores above 20, and sequence errors were then corrected with the Pilon pipeline³⁸.

Annotation

For comprehensive annotation of transposable elements, we designed a structural identification pipeline incorporating several tools, including LTRharvest⁴⁰, LTRdigest⁴¹, SINE-Finder⁴², MGEScan-non-LTR⁴³, MITE-hunter⁴⁴, HelitronScanner⁴⁵, and others (details in Supplementary Information). The scripts, parameters, and intermediate files of each transposable element superfamily are available at https://github.com/mcstitzer/maize_v4_TE_annotation.

The MAKER-P pipeline was used to annotate protein-coding genes⁴⁶, integrating ab initio prediction with publicly available evidence from full-length cDNA⁴⁷, de novo assembled transcripts from short-read mRNA sequencing (mRNA-seq)⁴⁸, isoform-sequencing (Iso-Seq) full-length transcripts⁴, and proteins from other species. The gene models were filtered to remove transposons and low-confidence predictions. Additional alternative transcript isoforms were obtained from the Iso-Seq data. Further details on annotations, core promoter analysis, and comparative phylogenomics are described in Supplementary Information.

Structural variation

Leaves were used to prepare high-molecular-mass DNA, and optical genome maps were constructed as described above for B73. Structural variant calls were generated based on alignment to the reference map B73 v4 chromosomal assembly using the multiple local alignment algorithm (RefSplit)³². A structural variant was identified as an alignment outlier^32,49, defined as two well-aligned regions separated by a poorly aligned region with a large size difference between the reference genome and the map or by one or more unaligned sites, or alternatively as a gap between two local alignments. A confidence score was generated by comparing the non-normalized P values of the two well-aligned regions and the non-normalized log-likelihood ratio⁵⁰ of the unaligned or poorly aligned region. With a confidence score threshold of 3, RefSplit is sensitive to insertions and deletions as small as 100 bp (events smaller than 1 kb are generally compound or substitution and include label changes, not just spacing differences) and other changes such as inversions and complex events which could be balanced. Insertion and deletion calls were based on an alignment outlier P-value threshold of 1 × 10⁻⁴. Insertions or deletions that crossed gaps in the B73 pseudomolecules, or that were heterozygous in the optical genome maps, were excluded. Considering the resolution of the BioNano optical map, only insertions and deletions larger than 100 bp were used for subsequent analyses. To obtain high-confidence deletion sequences, sequencing reads from the maize HapMap2 project¹⁰ for Ki11 and W22 were aligned to our new B73 v4 reference genome using Bowtie2 (ref. 51). Read depth (minimum mapping quality >20) was calculated in 10-kb windows with step size of 1 kb. Windows with read depth below 10 in Ki11 and 20 in W22 (sequencing depths for Ki11 and W22 were 2.32× and 4.04×, respectively) in the deleted region were retained for further analysis.
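The read-depth check on candidate deletions can be sketched as a sliding-window count. This is a minimal illustration, not the authors' script: it assumes that "read depth" means the number of reads with mapping quality above 20 whose start falls in each 10-kb window, and the function names and toy reads are hypothetical.

```python
from bisect import bisect_left


def window_read_counts(reads, region_start, region_end,
                       window=10_000, step=1_000, min_mapq=20):
    """reads: list of (start_position, mapping_quality).
    Returns (window_start, window_end, count) for each sliding window,
    counting reads with mapping quality above min_mapq."""
    positions = sorted(pos for pos, mapq in reads if mapq > min_mapq)
    counts = []
    for w_start in range(region_start, region_end - window + 1, step):
        w_end = w_start + window
        n = bisect_left(positions, w_end) - bisect_left(positions, w_start)
        counts.append((w_start, w_end, n))
    return counts


def low_depth_windows(counts, threshold):
    """Windows whose read count falls below the threshold."""
    return [(start, end) for start, end, n in counts if n < threshold]


# Hypothetical toy region of 20 kb: 15 well-mapped reads near the start
# and one poorly mapped read that the quality filter discards.
toy_reads = [(pos, 30) for pos in range(0, 1500, 100)] + [(12000, 5)]
toy_counts = window_read_counts(toy_reads, 0, 20_000)
toy_low = low_depth_windows(toy_counts, threshold=10)  # Ki11 cutoff in the text
print(len(toy_counts), "windows,", len(toy_low), "below threshold")
```

With the thresholds in the text, the same function would be called with `threshold=10` for Ki11 and `threshold=20` for W22.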

Data availability

Raw reads, genome assembly sequences, and gene annotations have been deposited at the NCBI under BioProject number PRJNA10769 and BioSample number SAMN04296295. PacBio whole-genome sequencing data and Illumina data were deposited in the NCBI SRA database under accessions SRX1472849 and SRX1452310, respectively. The GenBank accession number of the genome assembly and annotation is LPUQ00000000. A genome browser including genome feature tracks and ftp is available from Gramene: http://ensembl.gramene.org/Zea_mays/Info/Index. All other data are available from the corresponding author upon reasonable request.

Supplementary information

Supplementary Information (PDF, 770.9 KB): this file contains the Supplementary Methods and references.

Acknowledgements

Y.J., B.W., J.S., M.S.C., X.W., S.K. and D.W. were supported by NSF Gramene grant IOS-1127112, NSF Cereal Gene Discovery grant 1032105, USDA-ARS CRIS 1907-21000-030-00D and NSF Plant Genome award 1238014. J.R.-I. would like to acknowledge support from USDA Hatch project CA-D-PLS-2066-H and NSF Plant Genome award 1238014. R.K.D. would like to acknowledge support from NSF Plant Genome award 1444514. G.G.P. acknowledges support from NSF grant 1444624 and USDA NIFA project HAW05022-H. M.S.C. would also like to acknowledge support from NSF PGRP PRFB 1523793. The authors thank S. Koren for sharing genome assembly-related scripts.

Extended data figures and tables

Extended Data Table 1.

Summary of the optical maps of three maize lines

Extended Data Table 2.

Overrepresented protein domains in sorghum genes that lack orthologues in maize but are conserved in syntenic positions in other grasses

*High-confidence sorghum genes with syntenic orthologues in rice, Brachypodium or Setaria outgroup species.

†Subset of background with no annotated orthologues in either maize v3 or v4 reference assemblies, have <50% LASTZ alignment coverage with v4, and fall within synteny blocks that map to singular assembly contigs in both the A and B subgenomes of maize. Only significantly enriched cases are shown, based on hypergeometric distribution followed by FDR correction.

Author Contributions

D.W. and Y.J. designed and conceived the research, M.M. and K.G. prepared DNA samples for PacBio SMRT sequencing, D.R.R., P.P., E.A. and W.R.M. performed PacBio SMRT sequencing, B.W., J.S., R.K.D., T.L. and A.H. generated the BioNano optical genome maps, M.R. generated Illumina sequencing data, Y.J., T.L., J.S., C.-S.C. and A.H. performed the genome assembly, J.C.S., M.S.C., X.W., B.W., Y.J. and S.K. performed gene annotation and evolutionary studies, M.C.S., M.R.M., N.M.S. and J.R.-I. performed transposable element analysis, J.G., J.S., R.K.D., K.L.S., T.K.W., G.G.P. and Y.J. performed the analysis of centromeres and telomeres. B.W., Y.J., J.S., T.L., A.H. and R.K.D. performed the structural variation study. X.W., J.C.S. and Y.J. contributed to the data release. Y.J., J.R.-I., R.K.D., G.G.P. and D.W. wrote the paper. All authors contributed to the revision of the manuscript.

Accession codes

Primary accessions

BioProject: PRJNA10769

NCBI Reference Sequence: LPUQ00000000

Sequence Read Archive

Competing interests

P.P., C.-S.C. and D.R.R. are full-time employees of Pacific Biosciences. J.S., T.L. and A.H. are employees of BioNano Genomics, Inc., and own company stock options. W.R.M. has participated in Illumina sponsored meetings over the past four years and received travel reimbursement and an honorarium for presenting at these events. Illumina had no role in decisions relating to the study/work to be published, data collection and analysis of data, or the decision to publish. W.R.M. has participated in Pacific Biosciences sponsored meetings over the past three years and received travel reimbursement for presenting at these events. W.R.M. is a founder and shareholder of Orion Genomics, which focuses on plant genomics and cancer genetics. W.R.M. is an SAB member for RainDance Technologies, Inc. All other authors declare no competing financial interests.

Footnotes

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Supplementary Materials

Supplementary Information (PDF, 770.9 KB)

This file contains the Supplementary Methods and references.
