Genome Biology. 2026 Mar 11;27:140. doi: 10.1186/s13059-026-04028-8

Comparative centromere genomics reveals evolutionary divergence in Solanaceae genomes

Penglong Wan 1,#, Ming Hu 1,#, Hongyu Jin 1,#, Shuyuan Tang 1,#, Min Zhong 1, Jiaowen Cheng 1, Zhangsheng Zhu 1, Bihao Cao 1, Guoju Chen 1, Changming Chen 1, Chengjie Chen 2, Jianwen Song 1, Yi Liao 1
PMCID: PMC13097598  PMID: 41814399

Abstract

Background

Centromeres are chromosomal loci epigenetically specified by the histone variant CENH3, where kinetochores assemble to ensure accurate chromosome segregation during cell division. Their repetitive and rapidly evolving DNA has long impeded large-scale characterization. Advances in long-read sequencing now enable complete genome assemblies across species and within populations, providing opportunities to investigate how centromeres evolve and diversify over timescales from thousands to millions of years.

Results

Here, we generate near-telomere-to-telomere genome assemblies for eggplant, African eggplant, and wild pepper. Using CENH3 ChIP–seq, we delineate functional centromeric chromatin in these assemblies and in the cultivated pepper ‘CA59’, tomato ‘Heinz 1706’, and a wild tomato accession. These genomes harbor satellite-free centromeres across all chromosomes except chromosome 3 in tomato and its wild progenitor. Instead, centromeres are primarily composed of Ty3/Gypsy LTR retrotransposons, whose clade composition, abundance, recent activity, and spatial distribution differ among species. Centromere size scales with genome size in Solanaceae crops. Comparisons of closely related genomes reveal frequent centromere positional shifts driven by pericentromeric inversions and centromere repositioning. Synteny decays more rapidly around centromeres, consistent with elevated breakage within CENH3-binding regions. Finally, centromere haplotypes vary within species, exemplified by multiple haplotypes on four African eggplant chromosomes.

Conclusions

These findings highlight the remarkable evolutionary dynamics and within-species variation of centromeres in Solanaceae crops, revealing distinct species-specific organizational patterns. This study positions Solanaceae as a promising model for comparative analyses of plant centromere evolution and provides a foundation for future research exploring how centromere variation contributes to phenotypic diversity.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13059-026-04028-8.

Background

Centromeres are specialized chromosomal regions where kinetochores assemble, mediating attachment to spindle microtubules and ensuring accurate chromosome segregation during cell division [1–3]. These regions are epigenetically marked by a centromere-specific histone H3 variant, known as CENP-A in animals, CENH3 in plants, and Cse4 in budding yeast [4–6]. While centromere function is conserved across eukaryotes, their size, structure, and genomic distribution vary widely among species [7–9]. The simplest centromeres, found in budding yeast, span as little as ~ 120 base pairs, occupy a single nucleosome, and are referred to as point centromeres [10]. More commonly, in most plants and animals, centromeres are confined to a single chromosomal region ranging from several kilobases to megabases in size, and are known as monocentric or regional centromeres [3]. In contrast, in organisms such as butterflies and peas, functional centromeric regions can extend over large portions of, or even the entire chromosome, forming holocentric or metapolycentric centromeres [9, 11, 12]. This evolutionary diversity and rapid turnover of centromeric DNA across species pose a puzzle, known as the “centromere paradox”, of how centromeres maintain their highly conserved and essential role in cell division [13]. Resolving this paradox may require comparative analyses across diverse taxa with varying centromere architectures. These analyses are essential for a comprehensive understanding of centromere biology and for advancing methods of centromere manipulation [14–16].

In most eukaryotes, centromeric DNA is primarily composed of tandem repeats. For instance, human centromeres are largely made up of megabase-scale alpha-satellite arrays [17]. Satellite-based centromeres are widespread across plant and animal species [18–22], although the satellite sequences themselves are often species-specific [23]. This prevalence has led to the hypothesis that satellites may be advantageous for centromere function [24]. Nevertheless, many species across the tree of life possess centromeres that lack satellite DNA [3, 25], and functional centromeres can form de novo at chromosomal sites without canonical satellite sequences, indicating that satellite DNA is not strictly required for centromere identity [26–29]. In plants, centromeres often contain other repetitive elements, particularly long terminal repeat retrotransposons (LTR-RTs), some of which localize preferentially to centromeric regions [30–33]. In some cases, centromeric retrotransposons can contribute to the emergence of new satellite repeats, and their amplification has been associated with centromeric chromatin spreading and centromere repositioning [24, 34–36]. Although no single DNA sequence fully dictates centromere identity, centromeric chromatin is generally incompatible with transcription of coding genes, consistent with the low gene density typical of centromeric and pericentromeric regions and with evidence that nearby genes can influence CENH3 deposition [37, 38]. Overall, these observations highlight the versatility of centromeric DNA and suggest that the basis of evolutionary sequence preferences remains incompletely understood.

The genetic and epigenetic features of centromeres and their flanking regions are distinctive and can shape gene and genome evolution in ways that differ from the rest of the genome. First, homologous exchange (crossing over) is strongly suppressed in these regions [39, 40]. Second, centromeric chromatin and adjacent heterochromatin are generally less permissive for transcription [41]. Third, the low density of genes and other functional elements around centromeres may create a relatively relaxed genomic context that allows greater tolerance of structural and sequence variation [42, 43]. Moreover, despite low or absent crossing over, gene conversion and nonhomologous recombination occur frequently in centromeric regions, making centromeres potential hotspots for evolutionary innovation [33, 44, 45]. These properties offer a mechanistic basis for understanding the evolution of genes, chromosomal architecture, and karyotype diversity.

Centromeric regions have historically posed significant challenges for sequencing and assembly due to their repetitive nature and large size [46]. However, recent advancements in long-read sequencing technologies have made chromosome-scale genome assemblies, including telomere-to-telomere (T2T) assemblies, more cost-effective and feasible, especially at the population or genus level. This progress enables researchers to explore centromere sequences and structures, as well as their dynamics in a broader evolutionary context [22, 47, 48]. Recent population-level and pan-genomic analyses of centromeres in various model and important species, such as maize [49], soybean [50], wheat [46], rice [51, 52], and Arabidopsis [53], have provided insights into centromere variation, the frequency of centromere repositioning, and haplotype diversity. However, these species typically possess satellite-based centromeres, which are difficult to delineate accurately using short-read ChIP-seq mapping. This limitation hinders the comprehensive capture of evolutionary changes and population diversity of centromeres in these species.

The Solanaceae (nightshade) family comprises approximately 100 genera and 3,000–4,000 species and is often recognized as the third most economically important plant family to humans, after Fabaceae (e.g., soybean, bean, and pea) and Poaceae (e.g., wheat, rice, and maize) [54]. It includes several important agricultural and economic crops, such as potato (Solanum tuberosum), tomato (Solanum lycopersicum), eggplant (Solanum melongena), pepper (Capsicum annuum), and tobacco (Nicotiana tabacum). This family also contains ornamental plants (e.g., Petunia hybrida) and medicinal species in genera such as Datura, Atropa, and Mandragora [55]. Solanaceae has emerged as a promising system for genomic and functional studies of centromeres. First, centromeres across this family show striking diversity in sequence composition and structural organization, including both satellite-based and satellite-free types [24, 56, 57]. Second, extensive genomic resources are now available, including chromosome-scale assemblies for most key crop species [56, 58–60], population-level genome assemblies for species such as potato [61] and tomato [62, 63], and taxa-scale genome assemblies, including for Solanum [64–66]. Together, these resources provide a strong foundation for centromere genomics across multiple evolutionary timescales. Finally, several Solanaceae species, notably tomato and tobacco, are widely used as experimental model systems, facilitating functional interrogation of centromere biology.

In this study, we assembled nearly complete genomes for eggplant, African eggplant, and wild pepper (C. annuum var. glabriusculum ‘Chiltepin’) to enable comparative sequence analyses of centromeric regions across major crops in the Solanaceae family. Utilizing CENH3 ChIP-seq (chromatin immunoprecipitation sequencing), we defined functional centromeric chromatin within these assemblies, as well as in three additional Solanaceae genomes: the cultivated pepper line ‘CA59’, tomato ‘Heinz 1706’, and the wild tomato progenitor S. pimpinellifolium. By integrating publicly available data for potato and tobacco, we investigated the evolutionary dynamics and variation of functional centromeric regions across multiple evolutionary timescales within the Solanaceae. Furthermore, we analyzed CENH3 ChIP-seq data from seven African eggplant individuals, two tomato accessions, two eggplant accessions, and two pepper accessions to assess centromere diversity within species. With these resources, we aim to (1) characterize and compare functional centromeric sequences and organization in different Solanaceae species; (2) quantify the frequency of centromere movement and infer the evolutionary mechanisms behind it; (3) evaluate how centromeres contribute to genome structure and potentially influence karyotype evolution; and (4) assess the extent of centromere haplotype diversity within species.

Results

Genome assemblies of selected Solanaceae crops for comparative centromere analyses

To explore the evolutionary dynamics of functional centromeres in the Solanaceae, we focused on comparative analyses of centromeric regions across six major crops: tomato, pepper, eggplant, African eggplant, potato, and tobacco (Fig. 1A). Additionally, we included the wild progenitors of tomato and pepper, enabling comparisons across both closely related and more distantly related lineages. Chromosome-scale or complete genome assemblies are available for tomato (‘Heinz 1706’) [63], potato (‘DM1-3 516 R44’) [58, 67], tobacco (the LAB strain) [56], and pepper (‘G1-36576’) [57]. Functional centromeres have also been identified in the genome assemblies of potato, tobacco, and pepper using CENH3 ChIP-seq [56–58, 67].

Fig. 1

Genome assemblies of major Solanaceae crops selected for comparative centromere genomics. A The phylogenetic relationships of eight selected Solanaceae species (see [64]), including six crops (tobacco, pepper, tomato, eggplant, African eggplant, and potato) and the wild progenitors of pepper and tomato, are shown. Representative leaf, flower, and fruit phenotypes of pepper, tomato, and wild tomato lines analyzed in this study are also displayed. Scale bars: 3 cm. B Hi-C maps of newly assembled chromosome-scale genomes for eggplant, African eggplant, and the wild relative of pepper. Representative leaf, flower, and fruit phenotypes are also shown. Scale bars: 3 cm. C Synteny plots of three pairs of closely related genomes: tomato vs. wild tomato, eggplant vs. African eggplant, and pepper vs. wild pepper. Two inter-chromosomal translocations were identified: one involving chromosomes 4 and 7 between eggplant and African eggplant, and the other involving chromosomes 1 and 8 between pepper and its wild progenitor, ‘Chiltepin 12’. D A global synteny map of the eight Solanaceae genomes based on orthologous genes

To broaden our comparative framework, we additionally assembled one genome each for S. aethiopicum (‘Sa01’), S. melongena (‘Sm01’), and C. annuum var. glabriusculum (‘Chiltepin 12’) using PacBio HiFi (high-fidelity) sequencing combined with Hi-C chromosome conformation capture (Fig. 1B). For these genomes, we obtained 41.71 Gb, 53.18 Gb, and 109.39 Gb of HiFi reads, with average read lengths of 20.3 kb, 17.0 kb, and 17.0 kb, respectively. These data correspond to approximately 37 ×, 45 ×, and 35 × coverage, based on their estimated genome sizes (Additional file 1: Table S1) [64, 68]. The resulting assemblies were highly contiguous and complete, with contig N50 values of 77.27 Mb, 57.28 Mb, and 240.86 Mb; Benchmarking Universal Single-Copy Orthologs (BUSCO) scores of 98.4%, 97.5%, and 98.3%; and only 13, 23, and 61 remaining sequence gaps, respectively (Additional file 1: Table S2).
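The reported depths can be sanity-checked with the relation coverage = total read bases / genome size. The minimal sketch below inverts that relation to back-calculate the genome size implied by each reported HiFi yield and rounded coverage. The implied sizes are derived here for illustration only; they are not the estimates listed in Table S1, and the function name is hypothetical.

```python
# Reported HiFi yield (Gb) and approximate coverage (x), taken from the text.
DATASETS = {
    "S. aethiopicum 'Sa01'": (41.71, 37),
    "S. melongena 'Sm01'": (53.18, 45),
    "C. annuum var. glabriusculum 'Chiltepin 12'": (109.39, 35),
}


def implied_genome_size_gb(yield_gb: float, coverage: float) -> float:
    """Genome size implied by coverage = yield / genome size."""
    return yield_gb / coverage


# Derived values (illustrative; coverage figures in the text are rounded).
implied = {
    name: implied_genome_size_gb(yield_gb, cov)
    for name, (yield_gb, cov) in DATASETS.items()
}

for name, size_gb in implied.items():
    print(f"{name}: ~{size_gb:.2f} Gb implied genome size")
```

Because the coverage values are rounded, the implied sizes are only approximate and should be read as a consistency check rather than as genome size estimates.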

We annotated genes and transposable elements in these new genomes, yielding gene counts and transposable element content comparable to those reported in previous studies [59, 69] (Additional file 1: Table S3). Full details of assembly and annotation procedures are provided in the Methods section. These near-complete genome assemblies offer a foundation for comparative centromere analyses across species and enable high-confidence global and local synteny comparisons across multiple evolutionary distances. Notably, we uncovered two interchromosomal translocations: one involving chromosomes 4 and 7 between eggplant and African eggplant, and another involving chromosomes 1 and 8 between pepper and its wild progenitor, Chiltepin (Fig. 1C, D).

Centromere size scales positively with genome size in Solanaceae crops

To accurately identify functional centromeric regions and measure their sizes across species, we developed an antibody targeting the full-length CENH3 protein from pepper (C. annuum) and performed ChIP–seq. In addition to the existing CENH3 ChIP–seq datasets for potato, tobacco, and pepper, we used this antibody to generate ChIP–seq data for six additional Solanaceae genomes: cultivated pepper (C. annuum ‘CA59’), wild pepper (C. annuum var. glabriusculum ‘Chiltepin 12’), eggplant (S. melongena ‘Sm01’), African eggplant (S. aethiopicum ‘Sa01’), tomato (S. lycopersicum ‘Heinz 1706’), and wild tomato (S. pimpinellifolium ‘LA1589’). We aligned ChIP–seq reads to their respective genome assemblies and detected clear primary CENH3-enriched peaks on every chromosome in all six genomes (Fig. 2A; Additional file 2: Figs. S1–S6). The pepper CENH3-binding profile generated here closely matched previous results, supporting the high specificity of the antibody (Additional file 2: Figs. S7-S8). To further validate the specificity of the pepper-derived antibody in eggplant and tomato, we also generated species-specific CENH3 antibodies. Using both the species-specific antibodies and the pepper-derived antibody, we produced ChIP–seq data for an additional eggplant accession (‘Sm02’) and tomato accession (‘JW23’). In both species, the pepper-derived and species-specific antibodies produced essentially identical CENH3-associated chromatin profiles (Additional file 2: Figs. S9–S10), indicating their strong concordance. This concordance is consistent with the high conservation of CENH3 amino acid sequences among these Solanaceae species (Fig. 2B).

Fig. 2

Delineating functional centromeric chromatin in six Solanaceae genomes using anti-CENH3 ChIP–seq and comparative analyses. A CENH3 ChIP–seq mapping profiles for chromosome 1 across six Solanaceae genomes; profiles for the remaining chromosomes are shown in Additional file 2: Fig. S1–S6. From top to bottom, each panel shows CENH3 ChIP–seq peaks; log-transformed CENH3 and input signals; the proportion of high-quality, uniquely mapped reads (MQ ≥ 30); the distribution of intact LTR retrotransposons within centromeric and flanking regions with estimated insertion times; and annotated protein-coding genes. Gray dashed rectangles highlight the defined centromeric chromatin. B Multiple alignment of CENH3 amino acid sequences across the Solanaceae species analyzed here, together with three distantly related model species (maize, rice, and Arabidopsis thaliana). C Proportion of uniquely mapped MQ ≥ 30 reads within centromeric regions (i.e., CENH3-binding domains) across species. D Estimated centromere sizes across Solanaceae genomes. E Correlation between centromere size and chromosome size across Solanaceae genomes. F Correlation between centromere size and genome size across Solanaceae genomes

We delineated centromere boundaries and quantified total CENH3-binding domain sizes using genome-wide profiles of log-transformed CENH3-to-input ChIP–seq signal intensity, with manual inspection applied when necessary. The estimated mean centromere sizes per chromosome in the six newly analyzed Solanaceae genomes were 1.56 Mb in African eggplant (S. aethiopicum; 1.26 Mb on chromosome 10 to 1.91 Mb on chromosome 7), 1.74 Mb in eggplant (S. melongena; 1.45 Mb on chromosome 5 to 2.11 Mb on chromosome 7), 3.01 Mb in wild pepper (C. annuum var. glabriusculum; 2.50 Mb on chromosome 12 to 3.77 Mb on chromosome 11), 2.43 Mb in cultivated pepper (C. annuum; 1.98 Mb on chromosome 2 to 2.87 Mb on chromosome 6), 1.44 Mb in wild tomato (S. pimpinellifolium; 0.71 Mb on chromosome 12 to 1.98 Mb on chromosome 9), and 1.55 Mb in cultivated tomato (S. lycopersicum; 1.26 Mb on chromosome 6 to 1.97 Mb on chromosome 4) (Additional file 1: Table S4). Within these functional centromeric domains, the percentage of uniquely mapped, high-quality CENH3 ChIP-seq reads (with a mapping quality of ≥ 30, or MQ30) varied significantly, ranging from 44.80% in African eggplant to 86.66% in wild pepper (Fig. 2C). These percentages are comparable to the background signal, which was calculated from input reads (Additional file 1: Table S4). This variation likely indicates substantial differences in centromeric repeat content among the species studied. Additionally, the high proportion of uniquely mapped reads suggests that a considerable portion of the functional centromeric regions across these six Solanaceae genomes comprises single- or low-copy sequences.
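The idea of calling a CENH3-binding domain from a log-transformed ChIP-to-input profile can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the window size, the pseudocount, the depth scaling, the enrichment threshold, the gap-merging rule, and the function names. The thresholds actually used are not given in this section, and the manual inspection step is not modeled.

```python
import math


def log2_ratio(chip, inp, pseudocount=1.0):
    """Per-window log2(ChIP/input) after scaling ChIP counts to input depth."""
    chip_total, inp_total = sum(chip), sum(inp)
    scale = inp_total / chip_total if chip_total else 1.0
    return [
        math.log2((c * scale + pseudocount) / (i + pseudocount))
        for c, i in zip(chip, inp)
    ]


def call_domain(ratios, window_bp, threshold=1.0, max_gap=2):
    """Return (start_bp, end_bp) of the longest enriched run, or None.

    A window is enriched when its ratio >= threshold (assumed cutoff).
    Enriched windows separated by at most max_gap non-enriched windows
    are merged into one run.
    """
    runs, start, last = [], None, None
    for idx, ratio in enumerate(ratios):
        if ratio < threshold:
            continue
        if start is None:
            start = idx
        elif idx - last - 1 > max_gap:
            runs.append((start, last))
            start = idx
        last = idx
    if start is not None:
        runs.append((start, last))
    if not runs:
        return None
    first, final = max(runs, key=lambda run: run[1] - run[0])
    return first * window_bp, (final + 1) * window_bp


# Toy read counts in 100-kb windows (made-up numbers, not data from the study).
chip_counts = [10] * 8 + [100, 120, 9, 110, 105, 95] + [10] * 6
input_counts = [10] * 20
domain = call_domain(log2_ratio(chip_counts, input_counts), window_bp=100_000)
print(domain)
```

In this toy profile the single low window inside the enriched block is bridged by the gap rule, which mimics how a CENH3-depleted island would still fall inside one delineated domain.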

We also assessed whether longer chromosomes tend to harbor larger centromeres across Solanaceae by performing correlation analyses between centromere size and both chromosome size and genome size. We included potato and tobacco for comparison. Their centromere sizes were taken from previous studies [56, 58], then re-evaluated using the same approach applied here and confirmed by manual inspection (Additional file 2: Fig. S11). Unlike the six genomes analyzed in this study, potato and tobacco contain multiple satellite-dominated centromeres (six in potato and eleven in tobacco) [24, 56] that are composed largely of megabase-scale satellite arrays. We found that centromere size is relatively uniform among chromosomes in the six selected genomes, whereas it varies substantially among chromosomes in potato and tobacco, likely reflecting the prevalence of large satellite arrays that complicate precise centromere boundary definition (Fig. 2D). Using centromere size estimates from all eight Solanaceae genomes, we observed a positive correlation between centromere size and chromosome size (Fig. 2E), and between total centromere size and genome size (Fig. 2F), consistent with patterns reported in grasses [70].

Rapid evolutionary shifts in centromere position among Solanaceae species

Centromere movements are common during plant genome evolution and can occur through centromere repositioning, in which centromere activity shifts without substantial sequence loss, or through pericentromeric inversions that rearrange adjacent chromosomal segments. Such changes can alter chromosome structure and contribute to karyotype variation and species diversification [35, 38, 49, 50, 71, 72]. To assess the occurrence and frequency of these events in Solanaceae, we first generated synteny maps based on conserved orthologous sequences between species pairs and used them to evaluate shifts in centromere positions. In comparisons of distantly related species (e.g., eggplant versus tomato, eggplant versus potato, tomato versus potato, and tomato versus pepper), synteny around centromeres was often highly disrupted (Fig. 3A), making it challenging to reconstruct evolutionary changes in centromere chromosomal positions.

Fig. 3

Centromere synteny analysis across Solanaceae crops. A Genome-wide synteny plots between distantly related Solanaceae crops (eggplant versus tomato, potato versus tomato, potato versus eggplant, and tomato versus pepper). Centromere positions are marked by gray circles. B Synteny alignments between tomato and its wild progenitor reveal extensive chromosomal rearrangements within and around centromeric regions. CENH3 ChIP–seq peaks for each chromosome are shown above or below the corresponding chromosomes. Orange rectangles indicate 6-Mb regions centered on the CENH3-binding domain of each chromosome. Gray rectangles indicate likely spurious CENH3 peaks associated with satellite repeats. C Summary of centromere synteny states (conserved, repositioned, shifted by a single inversion, and shifted by successive inversions) across three closely related species pairs: tomato versus wild tomato, eggplant versus African eggplant, and pepper versus wild pepper. D Example of centromere position conservation on chromosome 6 between pepper and its wild progenitor (Chiltepin). Track order matches Fig. 2A. E Example of centromere repositioning on chromosome 5 between pepper and its wild progenitor. F Example of centromere movement associated with a pericentromeric inversion on chromosome 12 between tomato and its wild progenitor (S. pimpinellifolium). G Example of centromere movement associated with successive inversions on chromosome 10 between tomato and its wild progenitor

We therefore next assessed centromere synteny across closely related species pairs to examine these shifts over more recent evolutionary timescales and to clarify the underlying mechanisms. Between eggplant (S. melongena ‘Sm01’) and African eggplant (S. aethiopicum ‘Sa01’), centromere positions showed little conservation across homologous chromosomes (Additional file 2: Fig. S12). Between tomato (S. lycopersicum ‘Heinz 1706’) and its wild progenitor (S. pimpinellifolium ‘LA1589’), centromere repositioning occurred on at least three chromosomes (Chr2, Chr7, and Chr8), whereas pericentromeric inversions shifted centromeres on four others (Chr9, Chr10, Chr11, and Chr12) (Fig. 3B). The remaining centromeres retained their physical positions yet showed pronounced sequence divergence, including large insertions and deletions. Similarly, in pepper (C. annuum ‘CA59’) and its wild progenitor (C. annuum var. glabriusculum), centromere repositioning occurred on chromosomes 4 and 5, whereas inversion-associated shifts occurred on chromosomes 3, 7, 8, and 9 (Additional file 2: Fig. S7). These cases can be grouped into four recurring modes of centromere synteny (Fig. 3C): (1) physical conservation with partial loss of conserved sequence (Fig. 3D); (2) repositioning (Fig. 3E); (3) movement via a single pericentromeric inversion (Fig. 3F); and (4) movement likely through successive inversions (Fig. 3G). Together, these results indicate that centromere positions in Solanaceae can change rapidly, even between closely related species, driven primarily by frequent pericentromeric inversions and recurrent repositioning events.

CENH3-binding regions are hotspots for synteny breaks

It is commonly observed that synteny decays more rapidly around centromeres than along chromosome arms during plant genome evolution [38, 73]. This pattern may reflect an elevated rate of DNA breakage near or within functional centromeric domains. Studies in yeast and humans have shown that centromeric regions are particularly susceptible to breakage during both cell division and periods of quiescence [74, 75]. However, it remains unclear whether similar mechanisms operate in plants or, alternatively (and not mutually exclusively), whether selection shapes the accumulation of synteny breaks in centromeric regions. To address this, we analyzed the genomic distribution of synteny breaks inferred from large-scale, high-quality genome assemblies in the Solanaceae, focusing primarily on the genus Solanum (Fig. 4A) [62–65]. We selected genomes with contig N50 > 5 Mb to improve breakpoint resolution and defined synteny breaks as breakpoints of inversions, translocations, and other complex chromosomal rearrangements > 20 kb. To capture patterns across evolutionary timescales, we compiled three genome sets: 46 tomato accessions [62, 63], 11 species closely related to tomato [65], and 28 more distantly related Solanum species from the major clade containing tomato [64, 66]. Aligning these genomes to the reference tomato genome ‘SL5’ identified 4,517, 14,030, and 31,632 nonredundant, high-confidence synteny breaks, respectively (Additional file 1: Table S5). To validate breakpoint calling, we randomly selected 60 synteny break events between SL5 and two wild tomato species (S. pimpinellifolium and S. galapagense). Eight events could not be evaluated due to low Hi-C signal quality, and 34 of the remaining 52 (65.4%) were clearly supported by Hi-C contact maps (Additional file 2: Fig. S13), indicating high reliability of synteny breaks inferred from these assemblies. Consistently, synteny blocks aligned to SL5 showed that synteny decays rapidly in centromeric and pericentromeric regions, even over only a few million years of divergence (Fig. 4B; Additional file 2: Fig. S14).

Fig. 4

Evolutionary patterns of synteny breaks in Solanum genomes. A Phylogenetic tree of 61 selected Solanaceae species with high-quality genome assemblies (contig N50 > 5 Mb), constructed using four-fold degenerate synonymous sites (4DTv). Among these, 51 species belong to the Solanum genus, which is divided into two major clades. A subtree comprising 11 wild relatives of S. lycopersicum (tomato) is shown below. B Multiple alignment of syntenic blocks for 28 Solanum species relative to S. lycopersicum (tomato), with each genome represented as a separate row. Purple, orange, and blue rectangles indicate synteny blocks inferred at different levels (top, secondary, and tertiary) from net files generated by the Minimap2–ChainNet–Netsynteny pipeline (see Methods), while gray and white rectangles represent genomic regions lacking synteny or containing sequence gaps, respectively. C Synteny block conservation and distribution of synteny breaks along chromosomes for the 28 Solanum genomes. The left Y-axis shows the number of genomes aligned for each chromosome segment window (orange dots), while the right Y-axis indicates the distribution of synteny breaks across 500 kb windows (deep blue bars). Chromosome 1 is shown; see Additional file 2: Fig. S15 for other chromosomes. The centromeric region is highlighted by a gray bar. D Same analysis as in (C) but for 11 closely related wild Solanum species relative to the tomato reference genome (SL5). E Same analysis as in (C) but for 46 diverse tomato accessions relative to the tomato reference genome (SL5). F Enrichment analysis of synteny breaks around CENH3-binding regions. Regions extend on both sides of the functional centromeres, with window sizes equal to the centromere size and stepped by half the centromere length for each chromosome. For each window, a permutation test with 10,000 simulations was performed to estimate the expected number of synteny breaks. Observed values are shown as orange dots, and blue lines indicate 99% confidence intervals. P-values from the permutation tests are provided for the CENH3-binding regions and flanking windows when statistically significant

Using synteny breaks from the three datasets, we first asked whether breaks accumulate preferentially in centromeric and pericentromeric regions (defined as 1 Mb on either side of the CENH3-binding domain) relative to the genome-wide background. As expected, all three datasets showed a broadly higher density of synteny breaks in these regions than the genome-wide average (Fig. 4C–E; Additional file 2: Fig. S15; Additional file 1: Table S6). We next tested whether this enrichment is centered specifically within CENH3-binding domains or instead reflects a broader elevation in the surrounding pericentromeric sequence. In the 46-tomato dataset, representing the most recent evolutionary events, synteny breaks were significantly enriched within CENH3-binding regions but not in the adjacent pericentromeric flanks (Fig. 4F). In the more divergent dataset of 11 closely related species, enrichment extended into the immediately flanking regions (Fig. 4F), consistent with progressive accumulation of breaks in the broader centromeric neighborhood as divergence increases. In contrast, we did not detect enrichment in the most divergent dataset (Additional file 2: Fig. S16), likely because extensive sequence turnover around centromeres limits reliable alignments and reduces breakpoint detectability. Together, these results suggest that synteny breaks preferentially arise within CENH3-binding domains and, over longer timescales, become distributed across broader centromeric and pericentromeric regions.
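A window-level permutation test of the kind used for the enrichment analysis in Fig. 4F can be sketched as follows. The null model used here, placing the same number of breaks uniformly at random along the chromosome, is an assumption made for illustration, as are the function name and the toy coordinates. The actual null model and implementation are not described in this section.

```python
import random


def permutation_enrichment(breaks, chrom_len, window, n_sim=10_000, seed=0):
    """Empirical one-sided test for break enrichment inside `window`.

    Null model (assumed): len(breaks) positions drawn uniformly at random
    along a chromosome of length chrom_len. Returns the observed count,
    the mean simulated count, and an empirical p-value.
    """
    start, end = window
    observed = sum(start <= b < end for b in breaks)
    rng = random.Random(seed)
    hits = 0
    total = 0
    for _ in range(n_sim):
        simulated = sum(
            start <= rng.randrange(chrom_len) < end for _ in range(len(breaks))
        )
        total += simulated
        if simulated >= observed:
            hits += 1
    expected = total / n_sim
    # Add-one correction keeps the empirical p-value above zero.
    p_value = (hits + 1) / (n_sim + 1)
    return observed, expected, p_value


# Toy example (made-up coordinates): 15 breaks clustered in a 2-Mb window
# on a 100-Mb chromosome, plus 35 breaks spread elsewhere.
clustered = [40_000_000 + i * 100_000 for i in range(15)]
background = [i * 2_800_000 + 1_000 for i in range(35)]
observed, expected, p_value = permutation_enrichment(
    clustered + background, chrom_len=100_000_000, window=(40_000_000, 42_000_000)
)
print(observed, round(expected, 2), p_value)
```

With 10,000 simulations the smallest attainable p-value under this scheme is 1/10,001, so very strong enrichment shows up as a floor value rather than an exact probability.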

Finally, we tested whether selection influences the occurrence of synteny breaks within or near centromeric regions (± 1 Mb around CENH3-binding domains) using the 46 tomato accessions. If selection favors particular breaks, they should be enriched at higher allele frequencies; conversely, purifying selection should shift them toward lower frequencies. The mean allele frequency of synteny breaks in centromeric/pericentromeric regions was 13.74%, compared to 14.76% genome-wide, a difference that was not significant (two-sample t-test, P = 0.223). We also found no enrichment of high-frequency synteny breaks (present in ≥ 10 of 46 accessions) in centromeric/pericentromeric regions (n = 27; 9.6%) relative to the genome-wide proportion (n = 477; 14.1%) (hypergeometric test, P = 0.993). These results suggest that, at least in tomato, selection has limited influence on the overall enrichment pattern of synteny breaks within or near centromeric regions.
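The hypergeometric comparison can be reproduced approximately from the reported counts. In the sketch below, the totals are back-calculated from the reported percentages (27 / 9.6% ≈ 281 centromeric or pericentromeric breaks; 477 / 14.1% ≈ 3,383 breaks genome-wide), so they are rounded approximations rather than values stated in the text. Treating the centromeric breaks as a draw from the genome-wide set and testing the upper tail are also assumptions, and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for a hypergeometric draw without replacement."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(population, draws)


# Back-calculated totals (assumptions derived from the reported percentages).
GENOME_WIDE_BREAKS = 3383      # ~477 / 0.141
GENOME_WIDE_HIGH_FREQ = 477    # reported
CENTROMERIC_BREAKS = 281       # ~27 / 0.096
CENTROMERIC_HIGH_FREQ = 27     # reported

p_enrichment = hypergeom_upper_tail(
    CENTROMERIC_HIGH_FREQ,
    GENOME_WIDE_BREAKS,
    GENOME_WIDE_HIGH_FREQ,
    CENTROMERIC_BREAKS,
)
print(f"P(X >= {CENTROMERIC_HIGH_FREQ}) = {p_enrichment:.3f}")
```

Under these assumptions about 40 high-frequency breaks would be expected in the centromeric set by chance, so observing 27 gives an upper-tail probability close to 1, in line with the reported lack of enrichment. Exact agreement with the published P-value is not expected because the totals are rounded.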

Rapid sequence evolution of functional centromeres in Solanaceae crops

We next analyzed and compared DNA sequences within functional centromeres across major Solanaceae crop genomes. Specifically, we focused on tandem repeats, transposable elements (TEs), and coding genes within the CENH3-binding regions of each genome. Functional centromeric regions were predominantly repetitive, with LTR retrotransposons (LTR-RTs) comprising 57.69% of centromeric sequence in tomato and up to 84.14% in African eggplant (Additional file 1: Table S7). Across the eight genomes surveyed, the number of intact LTR-RTs per genome ranged from 3,383 in potato to 29,894 in tobacco, of which 50–204 reside within functional centromeric regions (Additional file 1: Table S8). Phylogenetic analyses of all intact LTR-RTs in each genome (for example, eggplant in Fig. 5A and African eggplant in Fig. 5B) showed that centromere-associated elements are predominantly from the Ty3/Gypsy family and are enriched in centromeric and pericentromeric regions relative to chromosome arms. Pairwise similarity and phylogenetic comparisons of intact centromeric LTR-RTs across the eight genomes revealed declining conservation with increasing evolutionary distance (Fig. 5C), suggesting largely lineage-specific expansion and divergence. Notably, unlike potato and tobacco, which harbor multiple megabase-scale satellite-array centromeres [24, 56], the six newly analyzed genomes are largely satellite-depleted (< 7.0%), except for chromosome 3 in tomato and wild tomato (Additional file 2: Fig. S17). Below, we introduce the general sequence features of functional centromeres in each newly generated Solanaceae genome.

Fig. 5

Sequence composition and comparative analysis of functional centromeric regions across major Solanaceae crops. A Phylogenetic relationships of genome-wide intact LTR-retrotransposons in the eggplant genome. Blue circles indicate elements originating from the functional centromeric regions (n = 75) and predominantly belong to the Tekay subfamily of Ty3/Gypsy. B Phylogenetic relationships of genome-wide intact LTR-retrotransposons in African eggplant. Blue circles indicate elements originating from the functional centromeric regions (n = 288) and predominantly belong to the Tekay subfamily of Ty3/Gypsy. C Pairwise sequence similarity of intact LTR-retrotransposons from the functional centromeric regions of eight Solanaceae genomes. A phylogenetic tree of these elements is shown below, with the number of intact LTR-retrotransposons identified in the centromeric region of each genome indicated in parentheses following the genome name. D Verification of centromere-specific repeats in the centromeric regions of eggplant, African eggplant, tomato, and wild tomato by fluorescence in situ hybridization (FISH). LTR sequences from the most abundant centromeric LTR-retrotransposons in each species, along with a 211-bp tandem repeat in tomato, were used as probes to hybridize somatic metaphase chromosomes derived from root tips. E The centromere of chromosome 3 in tomato and wild tomato is predominantly composed of satellite repeats. F Protein-coding genes are frequently located in CENH3-free islands within centromeric regions. An example from centromere 9 in pepper is shown. G Distinct distribution patterns of young intact LTR-retrotransposons in the centromeric regions of African eggplant and eggplant. In eggplant, LTR-retrotransposons accumulate immediately adjacent to the functional centromeric regions, whereas in African eggplant, they are directly inserted within the functional centromeric regions

In tomato (S. lycopersicum) and its wild progenitor S. pimpinellifolium, functional centromeric regions are predominantly composed of Ty3/Gypsy LTR-RTs, which account for 49.1% of centromeric sequence content in tomato and 53.1% in wild tomato (Additional file 1: Table S7). These proportions are substantially higher than the genome-wide levels in each species (~ 29.3% and ~ 31.9%, respectively) (Additional file 1: Table S3). However, most of these LTR-RTs are incomplete fragments; only 102 and 95 intact elements were identified within the functional centromeric regions of tomato and wild tomato, respectively, with the majority (80/102 in tomato and 76/95 in wild tomato) belonging to the Tekay subfamily (Fig. 5C and Additional file 1: Table S8). The enrichment of Tekay elements in functional centromeric regions was further supported by fluorescence in situ hybridization (FISH) on metaphase chromosomes from root tips (Fig. 5D). Specifically, unlike other chromosomes, the functional centromere on chromosome 3 (CEN3) contains long arrays of a 211-bp satellite repeat, occupying ~ 668 kb (50.27%) of the centromeric sequence in S. lycopersicum and ~ 406 kb (47.49%) in S. pimpinellifolium (Fig. 5E; Additional file 1: Table S7). Other repeat sequences are interspersed within these satellite arrays. This satellite-based organization of CEN3 is conserved between the two species (Additional file 2: Fig. S18). Large arrays of the same 211-bp satellite (> 100 kb) were also detected on seven other chromosomes, spanning ~ 300 kb to ~ 4 Mb (Additional file 2: Fig. S19). Several of these arrays lie near or flank functional centromeres and can generate secondary, non-centromeric CENH3 ChIP–seq peaks (Fig. 3B). The broad abundance and chromosomal distribution of the 211-bp satellite were further supported by FISH experiments (Fig. 5D). Many genes were annotated within the functional centromeric regions, including 193 in S. lycopersicum and 233 in S. pimpinellifolium. 
Most of these genes have functional annotations in at least one database (NR, Swiss-Prot, eggNOG, GO, KEGG, or PFAM) (Additional file 1: Table S9). Although these genes are located within the functional centromeric chromatin, they typically reside in subregions depleted of CENH3 binding, a pattern also observed in other plant centromeres, including rice [37], maize [49], and potato [24] (Additional file 2: Fig. S5–S6). For genes overlapping regions with more substantial CENH3 enrichment, we found that they tend to show no or lower expression (Additional file 2: Fig. S20), based on public RNA-seq data [76, 77]. Together, the coexistence of satellite-free, gene-containing centromeres with a satellite-based CEN3 suggests ongoing and heterogeneous centromere evolution in tomato and its wild progenitor.

Similar to tomato, the functional centromeric regions of pepper and its wild progenitor are also dominated by Gypsy-type LTR retrotransposons, which comprise 52.54–73.5% of centromeric sequence content across chromosomes. Only 55 and 74 intact elements were identified within the functional centromeric regions of pepper and wild pepper, respectively, accounting for just 1.36% and 1.21% of total functional centromeric sequence (Additional file 1: Tables S7–S8), suggesting relatively low recent LTR-RT activity during centromere evolution. Most of these intact elements (27/55 in pepper and 38/74 in wild pepper) belong to the CRM subfamily (Additional file 1: Table S8). Notably, satellite repeats were rare (< 1.35%) within functional centromeric regions in both genomes (Additional file 1: Table S7). A small number of genes were annotated within centromeric regions in pepper (n = 82) and wild pepper (n = 100), but they are confined mainly to CENH3-depleted subregions, similar to the pattern observed in tomato (Fig. 5F; Additional file 2: Figs. S3–S4). Genes overlapping strong CENH3 signals tended to show little or no expression across tissues (Additional file 2: Fig. S20), based on published RNA-seq datasets [57, 59, 68].

The functional centromeric regions of eggplant and African eggplant show the highest abundance of Gypsy-type retrotransposons among the newly surveyed genomes, comprising 73.97% and 75.90% of centromeric sequence content, respectively (Additional file 1: Table S7). They also contain more intact elements: African eggplant harbors 309 intact elements within functional centromeric regions, accounting for 17.08% of the total centromeric sequence, whereas eggplant contains 154 intact elements, comprising 7.79% (Additional file 1: Table S8). Interestingly, we observed contrasting spatial distributions of young LTR-RTs between the two species: in African eggplant, young LTR-RTs preferentially insert within CENH3-binding domains, whereas in eggplant, they are more frequently inserted in regions flanking these domains, with clear examples on chromosomes 1, 3, 6, and 9 (Fig. 5G; Additional file 2: Figs. S1–S2). This pattern suggests that many secondary (non-centromeric) CENH3 ChIP–seq peaks near functional centromeres are the result of LTR-RT sequences similar to those enriched within centromeric chromatin (Additional file 2: Fig. S21). Moreover, young LTR insertions within CENH3-binding domains often coincide with local CENH3-depleted subregions, a pattern observed on nearly all chromosomes in African eggplant. Together, these results suggest distinct roles of LTR retrotransposons in shaping centromere organization and dynamics in the two species. Genes are also present within centromeric regions (n = 58 in eggplant; n = 22 in African eggplant), but they are consistently confined to CENH3-depleted subregions. RNA-seq analyses of both previously published datasets [78] and new data generated in this study show that genes overlapping CENH3-enriched chromatin exhibit low or undetectable expression, reinforcing a shared relationship between gene content and functional centromere architecture across Solanaceae (Additional file 1: Table S9; Additional file 2: Figs. S1–S2 and Fig. S20).

Intraspecific centromere diversity in Solanaceae crops

Given the extensive divergence of functional centromeric chromatin observed in interspecific comparisons, we also sought to assess its diversity among individuals within Solanaceae species. For eggplant, tomato, and pepper, we compared CENH3 profiling data between two divergent lines in each species. Between two pepper lines (‘CA59’ and ‘G1-36576’), we observed distinct CENH3 patterns on several chromosomes: putative large-scale deletions/insertions in CEN01 and CEN10, spreading into flanking regions in CEN03 and CEN05 (Additional file 2: Fig. S8), and a large pericentromeric inversion on CEN07 (Additional file 2: Fig. S7B). Between two eggplant accessions (‘Sm01’ and ‘Sm02’), centromere patterns were generally consistent, but slight spreading into flanking regions was common, especially at CEN01, CEN03, CEN06, CEN10, and CEN12 (Additional file 2: Fig. S9). Between tomato accessions ‘Heinz 1706’ and ‘JW23’, we observed two large-scale centromere repositioning events on chromosomes 1 and 2, as well as a marked contraction or expansion of the centromeric region on chromosome 12 (Additional file 2: Fig. S10).

For S. aethiopicum, we performed CENH3 ChIP-seq on six additional genotypes that have adapted well to South China after introduction from Africa (Fig. 6A). Phylogenetic analysis based on SNP polymorphisms showed that all genotypes examined here belong to the ‘Gilo’ and ‘Aculeatum’ groups, two of the four major groups of S. aethiopicum (Fig. 6B) [66, 79]. We aligned CENH3 ChIP-seq reads to the ‘Sa01’ reference and compared CENH3 loading profiles among accessions; mapping quality was comparable across accessions (Additional file 1: Table S10). For most chromosomes (8 of 12), we detected no noticeable differences (Additional file 2: Fig. S22). However, we observed apparent variation in CENH3 loading on four chromosomes (Chr1, Chr3, Chr6, and Chr11). Even with only seven accessions, we identified at least three, four, three, and two distinct centromere haplotypes on these chromosomes, respectively (Fig. 6C–F). A centromere repositioning event has occurred on chromosome 3, with centromeric chromatin shifted by ~ 1 Mb from its original location (Fig. 6D). On the other three chromosomes, variation in centromeric chromatin patterns likely reflects large insertion–deletion polymorphisms and possibly pericentromeric inversions or additional centromere repositioning events, which will require validation using high-quality genome assemblies. Furthermore, these centromere changes appear to occur largely independently across chromosomes, as the phylogenetic relationships based on the centromere haplotypes are not always consistent across chromosomes (Fig. 6C–F). Together, these results reveal substantial within-species polymorphism in centromere positioning across major Solanaceae crops.

Fig. 6

Extensive centromere haplotype diversity revealed among just seven African eggplant individuals. A Fruit and flower characteristics of six S. aethiopicum accessions. B Phylogenetic tree of 84 diverse S. aethiopicum accessions and wild relatives, including the seven investigated in this study and 77 from previous studies. C–F Distinct CENH3-binding profiles across the centromeres of chromosome 1 (C), chromosome 3 (D), chromosome 6 (E), and chromosome 11 (F). Results for the remaining chromosomes are shown in Additional file 2: Fig. S22. Haplotypes are distinguished by different shading colors. The clustering topology of centromeric haplotypes for each chromosome is shown on the left

Discussion

This study presents a comprehensive comparative sequence analysis of functional centromeres across six crop species in the family Solanaceae. Using nearly complete genome assemblies, CENH3 ChIP-seq, and FISH, we establish a framework to investigate the evolutionary dynamics of centromeres in Solanaceae across multiple timescales. Our investigation examines divergence both between distantly related species and closely related species, as well as intraspecific variation. Our findings reveal rapid sequence turnover and species-specific patterns in the evolution of functional centromeres among these species. Coupled with the extensive genomic resources already available at both the population level (e.g., pepper [57, 59, 80, 81] and tomato [63]) and the genus level (e.g., Solanum [64, 66]), these results highlight Solanaceae as a highly tractable system for dissecting centromere evolution and centromere-associated phenotypic effects in plants.

The increasing availability of T2T genomes has revealed that the DNA sequences underlying functional centromeric chromatin are more diverse than previously understood [48]. While satellite DNA typically dominates plant centromeres, satellite-free centromeres are common among the Solanaceae species examined. For instance, in pepper, eggplant, and African eggplant, all centromeres are satellite-free. In contrast, tomato, potato [24], and tobacco [56] exhibit both satellite-free and satellite-dominant centromeres. This satellite-free centromere structure has been proposed as an evolutionary intermediate stage that may eventually transition into satellite-based forms [24, 25]. However, given that satellite-free centromeres appear to be more prevalent than once recognized, are found throughout the Solanaceae phylogeny, and also exist in other plant [82] and animal taxa [83], it remains uncertain which centromere configurations are evolutionarily more stable. Accordingly, whether satellite-based centromeres are universally favored over evolutionary time remains an open question. To address this, we will need broader phylogenomic comparisons across genera to infer long-term evolutionary trends, together with within-species population analyses to test signatures of adaptation, controlled crosses to assess transmission and segregation across generations, and functional assays to evaluate the direct consequences of different centromere architectures or haplotypes.

LTR retrotransposons (LTR-RTs) are major sequence components of plant genomes. Some of these retrotransposons are localized specifically to centromeres, such as centromeric retrotransposons in rice (CRR) [20], maize (CRM) [21], and wheat (CRW) [35], as well as Celine in Populus species [84] and ATHILA [22, 85, 86] and Tal1 [30] in Arabidopsis. These elements preferentially occupy centromere cores, likely due to mechanisms that recognize CENH3 chromatin [33]. Our results reveal significant differences in the types, activities, and abundances of LTR-RTs within functional centromeres across Solanaceae species. The majority belong to the Ty3/Gypsy family, but different subfamilies predominate in different species, such as CRM in pepper, Tekay in eggplant and tomato, and Galadriel in tobacco. The proportion of centromeric sequence occupied by intact LTR-RT elements varies markedly, ranging from only 1.21% in wild pepper to 17.08% in African eggplant. This variation indicates diverse levels of centromeric retrotransposon activity among Solanaceae species, though this activity is lower than in larger plant genomes, such as maize [36, 87] and wheat [46]. The particularly low centromeric retrotransposon activity in pepper may reflect the unique evolutionary history of the genus Capsicum. Furthermore, the distribution patterns of LTR-RTs within functional centromeric regions differ among species. In African eggplant, young LTR-RTs are more concentrated within CENH3-containing chromatin, whereas in eggplant, they are more prevalent in the flanking areas. Together, these patterns suggest that LTR-RTs may play species-specific roles in shaping centromere organization and evolution.

A consistent observation across Solanaceae genomes in this study is the scarcity of coding genes in centromeric regions. When genes overlap with centromeric chromatin, they are typically found in areas depleted of CENH3. This pattern has also been observed in many other plant species, including rice [37], potato [24], maize [49], and wheat [35], collectively suggesting an antagonistic relationship between transcription and centromere formation [38]. Coding genes, or potentially other transcripts, located within or near centromeric chromatin may create CENH3-free islands. These islands may serve as physical barriers that impede the spread of centromeric chromatin, thereby stabilizing CENH3 deposition and contributing to the organization of centromeres during evolution [38]. Centromeric regions are heavily embedded in heterochromatin, which is generally unfavorable for transcription. However, centromeric transcripts derived from satellite arrays, LTR-RTs, and other non-coding repeats are commonly recognized for their functional roles in kinetochore assembly [88, 89]. Therefore, the impact of centromeric chromatin on transcription may differ between coding and non-coding sequences.

Our results provide evidence that chromosomal breakpoints are more likely to occur in regions where CENH3 binds, as indicated by synteny breaks over evolutionary timescales. We have demonstrated that these breaks are significantly enriched in the CENH3-binding areas but not in nearby pericentromeric regions, particularly when examining recent evolutionary events within species. This observation supports the notion that centromeric repeats are inherently unstable and more prone to recombination [90]. It aligns with previous reports that indicate centromeres can act as hotspots for chromosome breakage in animal cells [75, 91]. Overall, these findings help explain why synteny tends to decay rapidly around centromeres in plant genomes. Furthermore, when we analyze synteny breaks in more divergent genomes, we observe enrichment in adjacent pericentromeric regions. This likely reflects frequent shifts in centromere positions or the gradual expansion of CENH3-binding domains into neighboring areas over evolutionary timescales. Although we did not detect widespread signals of selection acting on synteny breaks within centromeres, it remains possible that specific chromosomal rearrangements involving centromeric chromatin could alter centromere strength, thereby promoting 'centromere drive' [92].

Within-species variation in centromere positioning has been observed across multiple Solanaceae crops, including African eggplant, eggplant, tomato, and pepper, although current sampling remains limited. This suggests substantial diversity in centromere location within species, consistent with patterns reported in other plants, including maize [49], soybean [50], Arabidopsis [53], and wheat [46]. These observations motivate several directions for future work. First, expanded sampling will be needed to characterize centromere haplotype diversity within each species and to test whether particular haplotypes are associated with adaptive or domestication-related traits, potentially shaping their frequencies in populations. Second, it will be important to evaluate whether centromeric haplotypes behave as “supergene-like” regions with extended linkage that influences agronomic traits. Third, controlled crosses can determine whether centromere haplotypes differ in transmission across generations, including the potential for segregation distortion associated with strong versus weak centromeres. Addressing these questions will clarify the evolutionary and functional roles of centromere variation, refine our understanding of crop genome diversity, and may open opportunities for centromere-informed breeding strategies.

Conclusions

This study characterizes functional centromeric regions in six newly generated Solanaceae genomes and highlights the prevalence of satellite-free centromeres in this family. Satellite-free centromeres were observed across the Solanaceae crops analyzed here, including eggplant, African eggplant, pepper, and tomato from newly generated genomes, as well as potato and tobacco from published genome resources. These satellite-free centromeres are enriched for Ty3/Gypsy LTR retrotransposons, which vary in type, abundance, recent activity, and spatial distribution across species. We also provide evidence that chromosomal breakpoints are more likely to occur within CENH3-binding regions, based on synteny-break analyses across multiple evolutionary timescales. In addition, we detect substantial within-species diversity in centromere positioning in many Solanaceae crops. Collectively, these findings position Solanaceae as a promising model for exploring centromere evolution and its potential phenotypic implications.

Methods

Plant materials and DNA isolation

We selected one highly inbred accession each from S. melongena (‘Sm01’) and S. aethiopicum (‘Sa01’) for de novo sequencing. Additionally, we improved the genome assembly of C. annuum var. glabriusculum (‘Chiltepin 12’), which had previously been sequenced using a short-read strategy [68]. All of these lines, along with several other African eggplant, eggplant, tomato, and pepper genotypes used in this study, were grown under standard field conditions in Guangzhou, China (23.1291° N, 113.2644° E). Fresh young leaves were harvested, flash-frozen in liquid nitrogen, and stored at –80 °C until genomic DNA extraction. High-molecular-weight DNA was isolated using a modified CTAB method [93] and evaluated for quality and integrity with a NanoDrop spectrophotometer, Qubit fluorometer, and agarose gel electrophoresis. We further assessed DNA size distribution using pulsed-field gel electrophoresis (PFGE), ensuring that only samples with high-quality DNA were used for sequencing.

PacBio HiFi sequencing, assembling, and annotation

To achieve high-quality genome assemblies, we conducted PacBio HiFi (High-Fidelity) sequencing for each species. Sequencing libraries were prepared from high-molecular-weight DNA using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences). We selected DNA fragments exceeding 15 kb and sequenced them using the PacBio Sequel II platform. Additionally, we generated in situ Hi-C data (chromatin interaction information) to aid in scaffolding, resulting in coverages of approximately 94.51 ×, 100.53 ×, and 2.28 × for the eggplant ‘Sm01,’ African eggplant ‘Sa01,’ and the pepper progenitor ‘Chiltepin,’ respectively (Additional file 1: Table S1). Hi-C libraries were prepared using the Phase Genomics Proximo Hi-C Kit and sequenced on the Illumina NovaSeq X Plus platform, yielding 150-bp paired-end reads.

We produced 53.18, 41.71, and 109.39 Gb of PacBio HiFi long reads for ‘Sm01’, ‘Sa01’, and ‘Chiltepin’, respectively. These data were used for the initial genome assembly with Hifiasm (v0.20.0) [94], using the parameters “-l 2 -n 4”, yielding 655, 766, and 523 contigs for the three species, respectively (Additional file 1: Table S2). We obtained chromosome-level assemblies by scaffolding these contigs with in situ Hi-C data using the 3D-DNA pipeline [95] with parameters “--editor-coarse-resolution 100000 --editor-fine-resolution 1000”. Chromatin interaction patterns were visualized using HiCExplorer (v3.7.5) [96]. We evaluated assembly quality using BUSCO (v5.7.1) [97] against the embryophyta_odb10 dataset (1,614 genes) to assess gene completeness, while contiguity metrics (e.g., contig N50 and L50) were calculated with QUAST (v5.0.2) [98].
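For readers unfamiliar with the contiguity metrics mentioned above, the following Python sketch shows how N50 and L50 are conventionally derived from a list of contig lengths. It is an illustration only and does not reproduce QUAST; the function name and the toy lengths are hypothetical.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of longest
    contigs that together cover at least half of the assembly; L50 is the
    number of contigs in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("no contig lengths supplied")

# Toy contig lengths in bp (illustrative only, not taken from the assemblies).
toy_contigs = [10, 8, 6, 4, 2]
print(n50_l50(toy_contigs))
```

A higher N50 together with a lower L50 indicates that a larger share of the assembly is contained in fewer, longer contigs.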

We annotated transposable elements (TEs) using EDTA (Extensive de-novo TE Annotator; v2.2.1) [99] with parameters “--u 2.35e-8 --anno 1 --sensitive 1 --evaluate 1”. Gene models were predicted through an integrative strategy combining homology, transcriptome, and ab initio evidence. Initially, the genomes were soft-masked with RepeatMasker (v4.1.5) using TE libraries from EDTA. For homology-based annotation, we removed redundant proteins from existing resources of the target species using CD-HIT (v0.13) [100]. The clean dataset was then mapped to the masked genomes using Miniprot (v0.13) [101] to generate homology-supported gene models. For transcriptome-supported annotation, RNA-seq reads were aligned using HISAT2 (v2.2.1) [102], transcript structures were reconstructed with StringTie (v2.2.1) [103], and coding regions were identified by TransDecoder (v5.7.1). For ab initio gene prediction, we utilized the non-redundant protein dataset and transcript alignment results to train gene models with GeneMark-ETP and AUGUSTUS within BRAKER3 (v3.0.8) [104]. Finally, we integrated the three annotation sources using EVidenceModeler (v2.1.0) [105], and gene models were further refined with PASA (Program to Assemble Spliced Alignments) (v2.5.3) [106] to adjust UTRs and exon boundaries. The completeness of the gene models was assessed using BUSCO (v5.7.1) based on the embryophyta_odb10 dataset.

Generation of anti-CENH3 antibodies

Polyclonal antibodies against the centromere-specific histone CENH3 of eggplant, pepper, and tomato were raised in rabbits. The full-length coding sequence (CDS) of CENH3 from each species was synthesized and expressed to produce recombinant protein antigens. New Zealand White rabbits were immunized with the purified proteins, and antibody titers were monitored using indirect ELISA. Once adequate titers were achieved, antisera were collected and affinity-purified. The resulting antibodies were evaluated by ELISA and validated through Western blot (WB) analysis. Antibodies that met the quality criteria were then used in chromatin immunoprecipitation (ChIP) experiments. To examine their cross-species performance, the pepper-derived antibody was compared with the eggplant- and tomato-derived antibodies in ChIP-seq assays on eggplant and tomato samples, producing consistent centromeric profiles. Notably, the pepper-derived antibody demonstrated stronger and cleaner enrichment; therefore, only the full-length pepper CENH3 antibody was employed for subsequent analyses in this study.

CENH3 ChIP-seq experiment

The ChIP-seq experiment was performed according to a previously described protocol [107]. Briefly, approximately 2 g of young leaves were used for nuclear extraction. To prepare the immunoprecipitated DNA, all buffers contained freshly prepared 1 × complete protease inhibitors (Roche, 11873580001). Frozen crosslinked cells were thawed on ice and lysed sequentially in lysis buffer I (50 mM HEPES–KOH pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40, 0.25% Triton X-100) and lysis buffer II (10 mM Tris–HCl pH 8.0, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA). After each lysis step, samples were rotated at 4 °C for 10 min and centrifuged at 1350 rcf for 5 min. The pellets were then resuspended in sonication buffer (20 mM Tris–HCl pH 8.0, 150 mM NaCl, 2 mM EDTA pH 8.0, 0.1% SDS, 1% Triton X-100) and sonicated for 10 cycles (30 s on, 60 s off on ice, 18–21 W) using a Misonix 3000 sonicator. Lysates were cleared by centrifugation at 16,000 rcf for 10 min at 4 °C. A 50 µL input sample was reserved, and the remaining lysate was incubated overnight at 4 °C with antibody-bound magnetic beads (pepper-derived anti-CENH3 antibody) to enrich DNA fragments. Beads were sequentially washed twice with wash buffers A, B, C, D, and TE buffer. DNA was eluted in 200 µL elution buffer (50 mM Tris–HCl pH 8.0, 10 mM EDTA, 1% SDS) at 65 °C for 1 h, followed by overnight crosslink reversal at 65 °C. RNA was degraded with 2.5 µL RNase A (37 °C, 2 h), and proteins were removed with 10 µL proteinase K (55 °C, 2 h). DNA was purified by phenol:chloroform:isoamyl alcohol extraction, ethanol precipitation, and resuspended in 50 µL TE buffer. For ChIP-seq library preparation, purified DNA was processed using the Illumina TruSeq DNA Sample Preparation v2 kit. Size selection (200–400 bp) was performed using a Pippin Prep system (Sage Science, 2% gel cassette). Libraries were quantified via qPCR (KAPA Biosystems Library Quantification Kit) and sequenced on the Illumina NovaSeq 6000 platform, generating 150 bp paired-end reads.

CENH3 protein sequence analysis

To examine the evolutionary relationships and divergence of CENH3 across Solanaceae and other plant lineages, we collected CENH3 protein sequences from 12 species (Additional file 1: Table S11): Zea mays, Oryza sativa, Arabidopsis thaliana, Nicotiana benthamiana, Nicotiana tabacum, Capsicum annuum var. glabriusculum, Capsicum annuum, Solanum aethiopicum, Solanum melongena, Solanum pimpinellifolium, Solanum lycopersicum, and Solanum tuberosum. Multiple sequence alignment was performed using ClustalW [108], and a phylogenetic tree was constructed in MEGA (v12) [109] using the minimum evolution method with the Poisson correction model for amino-acid substitutions, with 1,000 bootstrap replicates.
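The Poisson correction used for the amino-acid distances converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -ln(1 - p). The Python sketch below illustrates that formula on a pair of aligned sequences. It is not a substitute for MEGA; the function names, the pairwise removal of gapped columns, and the toy sequences are assumptions made for illustration.

```python
from math import log

def p_distance(seq_a, seq_b):
    """Proportion of differing residues over aligned columns without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Assumption: columns with a gap in either sequence are ignored.
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)

def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -log(1.0 - p)

# Hypothetical aligned fragments, not real CENH3 sequences.
print(poisson_distance("MARTKHQA", "MARTKQQA"))
```

The correction grows faster than p itself, reflecting the increasing chance that multiple substitutions have occurred at the same site in more divergent sequences.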

Analysis of LTR retrotransposons and sequence similarity in functional centromeric regions

We identified intact LTR-RT elements using EDTA (v2.2.1) and further classified them into subfamilies and clades with TEsorter (v1.4.7) [110]. Phylogenetic trees of Gypsy- and Copia-type retrotransposons were constructed using MEGA (v12) based on the reverse transcriptase (RT) domain sequences, employing the maximum-likelihood method with the Jones-Taylor-Thornton (JTT) model of amino acid substitution. Pairwise sequence similarity of all intact LTR-RTs within the centromeric regions across eight Solanaceae genomes was calculated using TBtools-II [111], and the resulting distance matrix was used to construct a phylogenetic tree and generate heatmaps. Regional centromeric sequence similarity was assessed using StainedGlass (v8.0) [112] with parameters “cooler_window = 100, window = 2000, and mm_f = 10000”. Tandem repeats were identified using TRASH (v2) [113] with parameters “-m 10000” and “-i 200”.

ChIP-seq data analysis

Raw CENH3 ChIP-seq reads were processed using fastp (v0.23.4) [114] with default parameters to remove adapters and low-quality reads. The cleaned reads were subsequently aligned to the corresponding reference genome using Bowtie2 (v2.5.3) with standard settings [115]. The alignment results were analyzed with deepTools (v3.5.5) [116] using the “--normalizeUsing CPM --scaleFactorsMethod None” options to calculate log₂(ChIP/input) ratios. CENH3-enriched peaks were identified with MACS3 (v3.0.1) [117] using the parameters “--broad --min-length 5000”. To determine the precise CENH3 loading regions and sizes, peaks and mapping results were visualized and manually inspected in the Integrative Genomics Viewer (IGV v2.18.4) [118]. The resulting data were then integrated with gene annotations, transposable elements (TEs), and TE insertion times in IGV for figure generation. The centromere coordinates and sizes for potato and tobacco were obtained from previous studies [56, 58] and were verified using the same procedure with manual inspection.
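The log₂(ChIP/input) signal described above can be illustrated with a short Python sketch that scales per-bin read counts to counts per million (CPM) and then takes the log ratio. This is a simplified illustration rather than the deepTools implementation. The function names and toy counts are hypothetical, the bins are assumed to cover the whole genome so that their sum equals the number of mapped reads, and the pseudocount is an assumption of this sketch to avoid division by zero.

```python
from math import log2

def cpm(counts):
    """Scale per-bin read counts to counts per million mapped reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped reads")
    return [c * 1_000_000 / total for c in counts]

def log2_ratio(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin log2(ChIP/input) after CPM scaling of each sample."""
    if len(chip_counts) != len(input_counts):
        raise ValueError("both samples must use the same bins")
    chip_cpm = cpm(chip_counts)
    input_cpm = cpm(input_counts)
    # The pseudocount (an assumption here) keeps empty bins finite.
    return [
        log2((c + pseudocount) / (i + pseudocount))
        for c, i in zip(chip_cpm, input_cpm)
    ]

# Toy counts for four bins; the third bin mimics a CENH3-enriched region.
chip = [10, 12, 90, 8]
control = [30, 28, 32, 30]
print([round(x, 2) for x in log2_ratio(chip, control)])
```

Positive values mark bins where the ChIP sample is enriched relative to input after accounting for sequencing depth.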

Fluorescence in situ hybridization (FISH)

FISH was performed according to previously described protocols [119, 120]. Root tips (2–3 cm) from each species were treated with nitrous oxide at 10 atm for two hours and fixed in ice-cold 90% acetic acid for 10 min. After three washes in distilled water, the meristematic regions were excised and digested in an enzyme solution containing 1% pectinase and 2% cellulase (prepared in 1 × citrate buffer) at 37 °C for 60 min. The digested tissue was washed with 70% ethanol and gently macerated in 100% acetic acid to prepare cell suspensions, which were dropped onto clean glass slides and air-dried. Chromosome preparations were UV crosslinked (125 mJ/cm²) and stored at –20 °C until use. DNA probes were amplified from long terminal repeats (LTRs, ~ 2000 bp) of abundant, species-specific LTR retrotransposons located in centromeric regions of each species. In addition, tandem repeats from the centromeric region of chromosome 3 in tomato were amplified and used as probes. The primers used for probe preparation are listed in Additional file 1: Table S12. All probes were labeled with 488-5-dUTP. Fluorescently labeled DNA probes were diluted to ~ 40 ng/μL in hybridization buffer (2 × SSC + 1 × TE), denatured together with chromosomal DNA at 95 °C for 5 min, and hybridized overnight at 55 °C in a humid chamber. After hybridization, slides were washed in 2 × SSC to remove excess probes, counterstained with DAPI, and examined under a fluorescence microscope. Images were processed using ZEN 2009 and Adobe Photoshop.

Genome-wide synteny construction

Pairwise genome-wide synteny was constructed based on syntenic and orthologous sequences. To identify these sequences, pairwise whole-genome alignments between a reference genome (i.e., SL5) and each query genome were performed using minimap2 (v2.2.27) [121]. The resulting PAF (Pairwise Alignment Format) files were converted to AXT format and processed with the axtChain/chainNet/netSyntenic pipeline [122] to produce netSyntenic files. These files were then filtered using tools from the SVGAP pipeline [123] according to genome complexity and divergence levels. The filtered files were used to generate pairwise alignment files in MAF format, with each block serving as a syntenic and orthologous marker for constructing synteny maps.
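For readers unfamiliar with the intermediate format, the following Python sketch reads the twelve mandatory tab-separated columns of a PAF record and derives an approximate alignment identity. It is an illustration only and is not part of the axtChain/chainNet/netSyntenic or SVGAP tooling; the class and function names are hypothetical, and the example record is invented.

```python
from typing import NamedTuple

class PafRecord(NamedTuple):
    """Subset of the mandatory PAF columns needed to place an alignment block."""
    query: str
    q_start: int
    q_end: int
    strand: str
    target: str
    t_start: int
    t_end: int
    matches: int    # number of matching bases (column 10)
    block_len: int  # alignment block length (column 11)
    mapq: int       # mapping quality (column 12)

def parse_paf_line(line):
    """Parse one PAF line; optional SAM-like tags after column 12 are ignored."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 12:
        raise ValueError("a PAF record needs at least 12 columns")
    return PafRecord(f[0], int(f[2]), int(f[3]), f[4], f[5],
                     int(f[7]), int(f[8]), int(f[9]), int(f[10]), int(f[11]))

def identity(rec):
    """Approximate identity: matching bases divided by alignment block length."""
    return rec.matches / rec.block_len

# Invented record: a reverse-strand block of a query contig on a reference chromosome.
example = "q1\t1000\t100\t600\t-\tchr1\t5000\t2000\t2500\t450\t500\t60\ttp:A:P"
rec = parse_paf_line(example)
```

Each parsed record gives the query and reference intervals plus strand, which is the information carried forward when alignments are chained into syntenic blocks.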

Identification and analysis of synteny breaks

Synteny breaks were identified following our previous protocol [124]. Specifically, netSyntenic files generated from the axtChain/chainNet/netSyntenic pipeline were used to identify synteny breakpoints, including those associated with inversions and translocations. Breakpoints were defined based on the start and end coordinates of the “top,” “syn,” “NonSyn,” and “inv” segments in the filtered netSyntenic files. To reduce assembly artifacts, only segments larger than 20 kb and not located near the terminal regions of contigs or chromosomes were considered. To verify the accuracy of synteny breaks, we examined Hi-C contact patterns by mapping Hi-C reads to both the query genome and the reference genome and assessing the consistency of the resulting interaction maps. To quantify the distribution of synteny breakpoints across chromosomes, each chromosome was divided into 500 kb windows. Breakpoint coordinates were extended by 10 kb on both sides, and the number of breakpoints within each window was then counted to assess their enrichment in specific genomic regions.
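The window-based counting can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the code used in the study: positions are taken as 0-based, a breakpoint whose extended interval straddles a window boundary is counted in every window it overlaps, and the function name is hypothetical.

```python
def count_breakpoints(breakpoints, chrom_len, window=500_000, flank=10_000):
    """Count breakpoints per fixed-size window along one chromosome.

    Each breakpoint is extended by `flank` bp on both sides and is counted
    once in every window that the extended interval overlaps (an assumption).
    """
    n_windows = -(-chrom_len // window)  # ceiling division
    counts = [0] * n_windows
    for pos in breakpoints:
        start = max(0, pos - flank)
        end = min(chrom_len - 1, pos + flank)
        for w in range(start // window, end // window + 1):
            counts[w] += 1
    return counts

# Toy chromosome of 1.5 Mb with three breakpoints; the second one lies
# within 10 kb of the boundary between the first and second windows.
counts = count_breakpoints([100_000, 495_000, 1_200_000], chrom_len=1_500_000)
```

Counting a boundary-straddling breakpoint in both neighboring windows is one possible convention; assigning it only to the window containing the original coordinate would be an equally simple alternative.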

To quantify the distribution of synteny breakpoints around CENH3-binding regions, 15 windows, each equal in size to the corresponding CENH3-binding region, were extended on both sides with a step size of 50% of the region length for each chromosome. Breakpoint occurrences were then summed across all chromosomes for each window to generate a vector, where each element represents one of the 30 flanking windows or the central CENH3-binding window. Additionally, for each window, we generated 10,000 random breakpoint sets as background controls for the permutation test.
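One way to read this design is sketched below in Python for a single chromosome. The window layout (31 overlapping windows offset by half the region length), the uniform placement of random breakpoints along the chromosome, the one-sided empirical p-value, and all function names are assumptions made for illustration; the study summed counts across chromosomes, which this sketch omits.

```python
import random

def flanking_windows(cen_start, cen_end, n_flank=15, step_frac=0.5):
    """Return 2 * n_flank + 1 windows, each as long as the CENH3-binding region.

    Windows are offset from the central region by multiples of
    step_frac * region length; the middle element is the region itself.
    Windows near chromosome ends may extend past the sequence and would
    need clipping in practice.
    """
    length = cen_end - cen_start
    step = int(length * step_frac)
    return [(cen_start + i * step, cen_start + i * step + length)
            for i in range(-n_flank, n_flank + 1)]

def count_in_window(breakpoints, window):
    """Count breakpoints falling in the half-open interval [start, end)."""
    start, end = window
    return sum(start <= b < end for b in breakpoints)

def permutation_pvalue(breakpoints, window, chrom_len, n_perm=10_000, seed=0):
    """Empirical one-sided p-value for breakpoint enrichment in one window.

    Background sets place the same number of breakpoints uniformly at
    random along the chromosome (an assumed null model).
    """
    rng = random.Random(seed)
    observed = count_in_window(breakpoints, window)
    hits = 0
    for _ in range(n_perm):
        simulated = [rng.randrange(chrom_len) for _ in breakpoints]
        if count_in_window(simulated, window) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy CENH3-binding region of 200 bp starting at position 1000.
windows = flanking_windows(1000, 1200)
```

Adding one to both the numerator and denominator keeps the empirical p-value above zero, so its smallest attainable value is set by the number of permutations.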

Genome and population data processing and phylogenetic tree construction

High-quality genome assemblies (Contig N50 > 5 Mb) from the Solanaceae family [64, 66] were selected to represent the phylogenetic relationships among the major crops investigated. These genomes were aligned to the tomato SL5 reference using minimap2 (v2.2.27), and single-nucleotide polymorphisms (SNPs) were identified using SVGAP. In addition, resequencing data from 66 African eggplant accessions [79] and seven accessions (Sa01–Sa07) generated in this study were used for SNP calling following the standard GATK pipeline. Genome assemblies for 11 additional African eggplant accessions and the wild progenitor of African eggplant [66] were also used for SNP calling and included in the phylogenetic analysis. Fourfold-degenerate (4D) sites were then extracted and used to construct a phylogenetic tree in MEGA (v12) using the neighbor-joining method with the p-distance model of nucleotide substitution, with 1,000 bootstrap replicates.
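The p-distance underlying the neighbor-joining tree is the proportion of compared sites at which two sequences differ. The Python sketch below computes it for aligned 4D-site strings; it does not reproduce MEGA, and the pairwise exclusion of sites with missing data as well as the function names and sample sequences are assumptions for illustration.

```python
def p_distance(seq_a, seq_b, missing="N-"):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a missing-data symbol are skipped
    (pairwise deletion, an assumed convention).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def distance_matrix(seqs):
    """Return all pairwise p-distances as a dict keyed by (name, name)."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x in seqs for y in seqs}

# Invented concatenated 4D sites for three hypothetical accessions.
sites = {"acc1": "ACGTACGT", "acc2": "ACGTTCGA", "acc3": "ACGNACGT"}
dist = distance_matrix(sites)
```

A matrix of this kind is the input that a neighbor-joining algorithm clusters, and bootstrap support is obtained by resampling sites with replacement and recomputing the matrix.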

Supplementary Information

13059_2026_4028_MOESM1_ESM.xlsx (623KB, xlsx)

Additional file 1: Tables S1-S12. Table S1. Summary of sequencing data generated in this study. Table S2. Summary statistics and quality metrics for the new genome assemblies. Table S3. Summary of repeat and gene annotation. Table S4. Genomic coordinates of CENH3 ChIP–seq–defined functional centromeric regions across Solanaceae genomes. Table S5. Numbers of synteny breakpoints identified from three genome datasets relative to the tomato reference ‘SL5’. Table S6. Enrichment of synteny breakpoints in centromeric and pericentromeric regions relative to the genome-wide background. Table S7. Sequence composition within functional centromeric regions across the analyzed Solanaceae genomes. Table S8. Classification and summary of intact LTR retrotransposons across the analyzed Solanaceae genomes. Table S9. Functional annotation of genes in functional centromeric regions. Table S10. Mapping quality of CENH3 ChIP-seq reads. Table S11. Sources and accession IDs of CENH3 protein sequences from 12 plant species. Table S12. Primers used for FISH probe preparation.

13059_2026_4028_MOESM2_ESM.pdf (9.7MB, pdf)

Additional file 2: Figs. S1-S22. Fig. S1: CENH3 ChIP–seq mapping profile of African eggplant. Fig. S2: CENH3 ChIP-seq mapping profile of eggplant. Fig. S3: CENH3 ChIP-seq mapping profile of wild pepper. Fig. S4: CENH3 ChIP-seq mapping profile of pepper. Fig. S5: CENH3 ChIP-seq mapping profile of wild tomato. Fig. S6: CENH3 ChIP–seq mapping profile of tomato. Fig. S7: Synteny maps of chromosomes Chr01–Chr06 among wild pepper ‘Chiltepin 12’, cultivated pepper ‘CA59’, and the T2T assembly ‘G1-36,576’. Fig. S8: Comparison of CENH3-binding profiles between two pepper accessions. Fig. S9: Comparison of CENH3 ChIP–seq mapping profiles in eggplant between two divergent accessions. Fig. S10: Comparison of CENH3 ChIP–seq mapping profiles in tomato between two divergent accessions. Fig. S11: CENH3 ChIP-seq mapping profile of tobacco. Fig. S12: Genome-wide synteny and CENH3 ChIP–seq mapping profiles between African eggplant and eggplant. Fig. S13: Examples of synteny breakpoint verification using Hi-C contact maps. Fig. S14: Alignment of synteny blocks from 28 distantly related Solanum species relative to tomato. Fig. S15: Conservation of syntenic sequence and distribution of synteny breaks along chromosomes inferred from alignments between tomato ‘SL5’ and three genome sets spanning different divergence levels. Fig. S16: Enrichment analysis of synteny breaks around CENH3-binding regions. Fig. S17: Sequence composition of the functional centromeric regions across eight Solanaceae genomes. Fig. S18: Pairwise sequence similarity between the 6-Mb pericentromeric regions of tomato and wild tomato Solanum pimpinellifolium. Fig. S19: Distribution of large arrays of 211-bp satellite repeats in the tomato genome. Fig. S20: Relationship between coding-gene expression and CENH3-binding intensity. Fig. S21: Pairwise sequence similarity around the functional centromeres of African eggplant. Fig. S22: Centromere diversity across seven African eggplant accessions.

Acknowledgements

We thank Dr. Qin You from the Vegetable Research Institute, Guangdong Academy of Agricultural Sciences for her valuable discussions on the project; Dr. Zhengkun Qiu from South China Agricultural University for kindly providing the germplasm resources of S. aethiopicum; and the members of the Liao lab for their helpful discussions and assistance with the project.

Peer review information

Andrew Cosgrove and Wenjing She were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.

Authors’ contributions

Y.L. and J.S. conceived and designed the project; J.S., J.C., Changming C., Z.Z., and P.W. collected and planted accessions of tomato, pepper, and eggplant; P.W., M.H., S.T., Chengjie C., and Y.L. conducted the analysis; Y.L., M.H., P.W., M.Z., J.C., and Z.Z. collected the data; Y.L., P.W., and M.H. wrote the original manuscript; Y.L., M.H., P.W., H.J., J.S., B.C., G.C., Chengjie C., and Z.Z. revised the manuscript. All authors read and approved the final version of the manuscript.

Funding

This work was supported by grants from the Guangdong Basic and Applied Basic Research Foundation (2024A1515010362, 2024A1515010470, and 2024A1515010403), the National Natural Science Foundation of China (32570277, 32302535, 32172564, U22A20497, and U21A20230), the Research Start-up Funding from South China Agricultural University to Yi Liao, Key-Area Research and Development Program of Guangdong Province (2022B0202080001), the Central Public-interest Scientific Institution Basal Research Fund for the Chinese Academy of Tropical Agricultural Sciences (1630032024026), and the Science and Technology Plan Projects of Guangzhou (2024A04J4361).

Data availability

The PacBio HiFi reads, Hi-C data, CENH3 ChIP–seq data, and RNA-seq data generated in this study have been deposited in the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC) under accession number CRA035506 [125]. The genome assembly and annotation files for African eggplant (‘Sa01’), eggplant (‘Sm01’), and the wild pepper (‘Chiltepin’) are available at Zenodo (https://doi.org/10.5281/zenodo.18006120) [126]. The previously published genome assemblies and corresponding raw sequencing data for C. annuum (CaT2T, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA962192/ [57] and CA59, https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA788020 [59]), Nicotiana benthamiana (NbT2T, https://ngdc.cncb.ac.cn/gwh/Assembly/86283/show) [56], wild and cultivated tomato accessions (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA809001) [60], the potato line ‘DM 1–3 516 R44’ (https://spuddb.uga.edu/dm_v6_1_download.shtml) [58], and additional Solanum (www.solpangenomics.com) [66] and Solanaceae genomes (https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA010759) [64] used in this study were downloaded from public repositories. Code used in the study is available on GitHub (https://github.com/Wan9299/CENH3) [127] under the MIT license. A version of the code has also been archived at Zenodo (https://doi.org/10.5281/zenodo.18006120) [126].

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Penglong Wan, Ming Hu, Hongyu Jin and Shuyuan Tang contributed equally to this work.

Contributor Information

Chengjie Chen, Email: ccj0410@gmail.com.

Jianwen Song, Email: songjianwen200@scau.edu.cn.

Yi Liao, Email: yiliao@scau.edu.cn.

References

  • 1. Cleveland DW, Mao Y, Sullivan KF. Centromeres and kinetochores: from epigenetics to mitotic checkpoint signaling. Cell. 2003;112:407–21.
  • 2. Fukagawa T, Earnshaw WC. The centromere: chromatin foundation for the kinetochore machinery. Dev Cell. 2014;30:496–508.
  • 3. Naish M, Henderson IR. The structure, function, and evolution of plant centromeres. Genome Res. 2024;34:161–78.
  • 4. Henikoff S, Dalal Y. Centromeric chromatin: what makes it unique? Curr Opin Genet Dev. 2005;15:177–84.
  • 5. Meluh PB, Koshland D. Budding yeast centromere composition and assembly as revealed by in vivo cross-linking. Genes Dev. 1997;11:3401–12.
  • 6. Talbert PB, Masuelli R, Tyagi AP, Comai L, Henikoff S. Centromeric localization and adaptive evolution of an Arabidopsis histone H3 variant. Plant Cell. 2002;14:1053–66.
  • 7. Wong CYY, Ling YH, Mak JKH, Zhu J, Yuen KWY. Lessons from the extremes: epigenetic and genetic regulation in point monocentromere and holocentromere establishment on artificial chromosomes. Exp Cell Res. 2020;390:111974.
  • 8. Hofstatter PG, Thangavel G, Lux T, Neumann P, Vondrak T, Novak P, et al. Repeat-based holocentromeres influence genome architecture and karyotype evolution. Cell. 2022;185:3153-68.e18.
  • 9. Kuo Y-T, Câmara AS, Schubert V, Neumann P, Macas J, Melzer M, et al. Holocentromeres can consist of merely a few megabase-sized satellite arrays. Nat Commun. 2023;14:3502.
  • 10. Furuyama S, Biggins S. Centromere identity is specified by a single centromeric nucleosome in budding yeast. Proc Natl Acad Sci U S A. 2007;104:14706–11.
  • 11. Macas J, Ávila Robledillo L, Kreplak J, Novák P, Koblížková A, Vrbová I, et al. Assembly of the 81.6 Mb centromere of pea chromosome 6 elucidates the structure and evolution of metapolycentric chromosomes. PLoS Genet. 2023;19:e1010633.
  • 12. Kuo Y-T, Schubert V, Marques A, Schubert I, Houben A. Centromere diversity: how different repeat-based holocentromeres may have evolved. BioEssays. 2024;46:e2400013.
  • 13. Henikoff S, Ahmad K, Malik HS. The centromere paradox: stable inheritance with rapidly evolving DNA. Science. 2001;293:1098–102.
  • 14. Zhou J, Liu Y, Guo X, Birchler JA, Han F, Su H. Centromeres: from chromosome biology to biotechnology applications and synthetic genomes in plants. Plant Biotechnol J. 2022;20:2051–63.
  • 15. Puchta H, Houben A. Plant chromosome engineering - past, present and future. New Phytol. 2024;241:541–52.
  • 16. Liu Y, Liu Q, Yi C, Liu C, Shi Q, Wang M, et al. Past innovations and future possibilities in plant chromosome engineering. Plant Biotechnol J. 2024;23:695–708.
  • 17. Altemose N, Logsdon GA, Bzikadze AV, Sidhwani P, Langley SA, Caldas GV, et al. Complete genomic and epigenetic maps of human centromeres. Science. 2022;376:eabl4178.
  • 18. Kyriacou E, Heun P. Centromere structure and function: lessons from Drosophila. Genetics. 2023;225:iyad170.
  • 19. Packiaraj J, Thakur J. DNA satellite and chromatin organization at mouse centromeres and pericentromeres. Genome Biol. 2024;25:52.
  • 20. Cheng Z, Dong F, Langdon T, Ouyang S, Buell CR, Gu M, et al. Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell. 2002;14:1691–704.
  • 21. Jin W, Melo JR, Nagaki K, Talbert PB, Henikoff S, Dawe RK, et al. Maize centromeres: organization and functional adaptation in the genetic background of oat. Plant Cell. 2004;16:571–81.
  • 22. Naish M, Alonge M, Wlodzimierz P, Tock AJ, Abramson BW, Schmücker A, et al. The genetic and epigenetic landscape of the centromeres. Science. 2021;374:eabi7489.
  • 23. Melters DP, Bradnam KR, Young HA, Telis N, May MR, Ruby JG, et al. Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution. Genome Biol. 2013;14:R10.
  • 24. Gong Z, Wu Y, Koblízková A, Torres GA, Wang K, Iovene M, et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell. 2012;24:3559–74.
  • 25. Nergadze SG, Piras FM, Gamba R, Corbo M, Cerutti F, McCarter JGW, et al. Birth, evolution, and transmission of satellite-free mammalian centromeric domains. Genome Res. 2018;28:789–99.
  • 26. Liu Y, Su H, Pang J, Gao Z, Wang X-J, Birchler JA, et al. Sequential de novo centromere formation and inactivation on a chromosomal fragment in maize. Proc Natl Acad Sci U S A. 2015;112:E1263–71.
  • 27. Liu Y, Su H, Zhang J, Shi L, Liu Y, Zhang B, et al. Rapid birth or death of centromeres on fragmented chromosomes in maize. Plant Cell. 2020;32:3113–23.
  • 28. Lomiento M, Jiang Z, D’Addabbo P, Eichler EE, Rocchi M. Evolutionary-new centromeres preferentially emerge within gene deserts. Genome Biol. 2008;9:R173.
  • 29. Palladino J, Chavan A, Sposato A, Mason TD, Mellone BG. Targeted de novo centromere formation in Drosophila reveals plasticity and maintenance potential of CENP-A chromatin. Dev Cell. 2020;53:129.
  • 30. Tsukahara S, Kawabe A, Kobayashi A, Ito T, Aizu T, Shin-i T, et al. Centromere-targeted de novo integrations of an LTR retrotransposon of Arabidopsis lyrata. Genes Dev. 2012;26:705–13.
  • 31. Gao D, Gill N, Kim H-R, Walling JG, Zhang W, Fan C, et al. A lineage-specific centromere retrotransposon in Oryza brachyantha. Plant J. 2009;60:820–31.
  • 32. Sharma A, Presting GG. Evolution of centromeric retrotransposons in grasses. Genome Biol Evol. 2014;6:1335–52.
  • 33. Tsukahara S, Bousios A, Perez-Roman E, Yamaguchi S, Leduque B, Nakano A, et al. Centrophilic retrotransposon integration via CENH3 chromatin in Arabidopsis. Nature. 2025;637:744–8.
  • 34. Sharma A, Wolfgruber TK, Presting GG. Tandem repeats derived from centromeric retrotransposons. BMC Genomics. 2013;14:142.
  • 35. Zhao J, Xie Y, Kong C, Lu Z, Jia H, Ma Z, et al. Centromere repositioning and shifts in wheat evolution. Plant Commun. 2023;4:100556.
  • 36. Wolfgruber TK, Sharma A, Schneider KL, Albert PS, Koo D-H, Shi J, et al. Maize centromere structure and evolution: sequence analysis of centromeres 2 and 5 reveals dynamic loci shaped primarily by retrotransposons. PLoS Genet. 2009;5:e1000743.
  • 37. Yan H, Talbert PB, Lee H-R, Jett J, Henikoff S, Chen F, et al. Intergenic locations of rice centromeric chromatin. PLoS Biol. 2008;6:e286.
  • 38. Liao Y, Zhang X, Li B, Liu T, Chen J, Bai Z, et al. Comparison of Oryza sativa and Oryza brachyantha genomes reveals selection-driven gene escape from the centromeric regions. Plant Cell. 2018;30:1729–44.
  • 39. Ellermeier C, Higuchi EC, Phadnis N, Holm L, Geelhood JL, Thon G, et al. RNAi and heterochromatin repress centromeric meiotic recombination. Proc Natl Acad Sci U S A. 2010;107:8701–5.
  • 40. Barra V, Fachinetti D. The dark side of centromeres: types, causes and consequences of structural abnormalities implicating centromeric DNA. Nat Commun. 2018;9:4340.
  • 41. Pidoux AL, Allshire RC. The role of heterochromatin in centromere function. Philos Trans R Soc Lond B Biol Sci. 2005;360:569–79.
  • 42. Wilkinson MJ, McLay K, Kainer D, Elphinstone C, Dillon NL, Webb M, et al. Centromeres are hotspots for chromosomal inversions and breeding traits in mango. New Phytol. 2025;245:899–913.
  • 43. Kirkpatrick M. How and why chromosome inversions evolve. PLoS Biol. 2010;8:e1000501.
  • 44. Shi J, Wolf SE, Burke JM, Presting GG, Ross-Ibarra J, Dawe RK. Widespread gene conversion in centromere cores. PLoS Biol. 2010;8:e1000327.
  • 45. Thakur J, Sanyal K. Efficient neocentromere formation is suppressed by gene conversion to maintain centromere function at native physical chromosomal loci in Candida albicans. Genome Res. 2013;23:638–52.
  • 46. Ma H, Ding W, Chen Y, Zhou J, Chen W, Lan C, et al. Centromere plasticity with evolutionary conservation and divergence uncovered by wheat 10+ genomes. Mol Biol Evol. 2023;40:msad176.
  • 47. Logsdon GA, Rozanski AN, Ryabov F, Potapova T, Shepelev VA, Catacchio CR, et al. The variation and evolution of complete human centromeres. Nature. 2024;629:136–45.
  • 48. Miga KH. Centromere studies in the era of “telomere-to-telomere” genomics. Exp Cell Res. 2020;394:112127.
  • 49. Schneider KL, Xie Z, Wolfgruber TK, Presting GG. Inbreeding drives maize centromere evolution. Proc Natl Acad Sci USA. 2016;113:E987–96.
  • 50. Liu Y, Yi C, Fan C, Liu Q, Liu S, Shen L, et al. Pan-centromere reveals widespread centromere repositioning of soybean genomes. Proc Natl Acad Sci USA. 2023;120:e2310177120.
  • 51. Qin P, Lu H, Du H, Wang H, Chen W, Chen Z, et al. Pan-genome analysis of 33 genetically diverse rice accessions reveals hidden genomic variations. Cell. 2021;184:3542-58.e16.
  • 52. Lv Y, Liu C, Li X, Wang Y, He H, He W, et al. A centromere map based on super pan-genome highlights the structure and function of rice centromeres. J Integr Plant Biol. 2024;66:196–207.
  • 53. Wlodzimierz P, Rabanal FA, Burns R, Naish M, Primetis E, Scott A, et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature. 2023;618:557–65.
  • 54. Knapp S. Tobacco to tomatoes: a phylogenetic perspective on fruit diversity in the Solanaceae. J Exp Bot. 2002;53:2001–22.
  • 55. Bombarely A, Moser M, Amrad A, Bao M, Bapaume L, Barry CS, et al. Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida. Nat Plants. 2016;2:16074.
  • 56. Chen W, Yan M, Chen S, Sun J, Wang J, Meng D, et al. The complete genome assembly of Nicotiana benthamiana reveals the genetic and epigenetic landscape of centromeres. Nat Plants. 2024;10:1928–43.
  • 57. Chen W, Wang X, Sun J, Wang X, Zhu Z, Ayhan DH, et al. Two telomere-to-telomere gapless genomes reveal insights into Capsicum evolution and capsaicinoid biosynthesis. Nat Commun. 2024;15:4295.
  • 58. Pham GM, Hamilton JP, Wood JC, Burke JT, Zhao H, Vaillancourt B, et al. Construction of a chromosome-scale long-read reference genome assembly for potato. Gigascience. 2020;9(9):giaa100.
  • 59. Liao Y, Wang J, Zhu Z, Liu Y, Chen J, Zhou Y, et al. The 3D architecture of the pepper genome and its relationship to function and evolution. Nat Commun. 2022;13:3479.
  • 60. Li H, Yang X, Shang Y, Zhang Z, Huang S. Vegetable biology and breeding in the genomics era. Sci China Life Sci. 2023;66:226–50.
  • 61. Tang D, Jia Y, Zhang J, Li H, Cheng L, Wang P, et al. Genome evolution and diversity of wild and cultivated potatoes. Nature. 2022;606:535–41.
  • 62. Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, et al. Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell. 2020;182:145-61.e23.
  • 63. Zhou Y, Zhang Z, Bao Z, Li H, Lyu Y, Zan Y, et al. Graph pangenome captures missing heritability and empowers tomato breeding. Nature. 2022;606:527–34.
  • 64. Wu Y, Li D, Hu Y, Li H, Ramstein GP, Zhou S, et al. Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding. Cell. 2023;186:2313-28.e15.
  • 65. Li N, He Q, Wang J, Wang B, Zhao J, Huang S, et al. Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species. Nat Genet. 2023;55:852–60.
  • 66. Benoit M, Jenike KM, Satterlee JW, Ramakrishnan S, Gentile I, Hendelman A, et al. Solanum pan-genetics reveals paralogues as contingencies in crop engineering. Nature. 2025;640:135–45.
  • 67. Yang X, Zhang L, Guo X, Xu J, Zhang K, Yang Y, et al. The gap-free potato genome assembly reveals large tandem gene clusters of agronomical importance in highly repeated genomic regions. Mol Plant. 2023;16:314–7.
  • 68. Qin C, Yu C, Shen Y, Fang X, Chen L, Min J, et al. Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc Natl Acad Sci U S A. 2014;111:5135–40.
  • 69. Wei Q, Wang J, Wang W, Hu T, Hu H, Bao C. A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant. Hortic Res. 2020;7:153.
  • 70. Zhang H, Dawe RK. Total centromere size and genome size are strongly correlated in ten grass species. Chromosome Res. 2012;20:403–12.
  • 71. Han Y, Zhang Z, Liu C, Liu J, Huang S, Jiang J, et al. Centromere repositioning in cucurbit species: implication of the genomic impact from centromere activation and inactivation. Proc Natl Acad Sci U S A. 2009;106:14937–41.
  • 72. Mandáková T, Pouch M, Brock JR, Al-Shehbaz IA, Lysak MA. Origin and evolution of diploid and allopolyploid genomes were accompanied by chromosome shattering. Plant Cell. 2019;31:2596–612.
  • 73. Tran TD, Cao HX, Jovtchev G, Neumann P, Novák P, Fojtová M, et al. Centromere and telomere sequence alterations reflect the rapid genome evolution within the carnivorous plant genus Genlisea. Plant J. 2015;84:1087–99.
  • 74. Blitzblau HG, Bell GW, Rodriguez J, Bell SP, Hochwagen A. Mapping of meiotic single-stranded DNA reveals double-stranded-break hotspots near centromeres and telomeres. Curr Biol. 2007;17:2003–12.
  • 75. Saayman X, Graham E, Nathan WJ, Nussenzweig A, Esashi F. Centromeres as universal hotspots of DNA breakage, driving RAD51-mediated recombination during quiescence. Mol Cell. 2023;83:523-38.e7.
  • 76. Moyle LC, Wu M, Gibson MJS. Reproductive proteins evolve faster than non-reproductive proteins among species. Front Plant Sci. 2021;12:635990.
  • 77. Su X, Wang B, Geng X, Du Y, Yang Q, Liang B, et al. A high-continuity and annotated tomato reference genome. BMC Genomics. 2021;22:898.
  • 78. Barchi L, Pietrella M, Venturini L, Minio A, Toppino L, Acquadro A, et al. A chromosome-anchored eggplant genome sequence reveals key events in Solanaceae evolution. Sci Rep. 2019;9:11769.
  • 79. Song B, Song Y, Fu Y, Kizito EB, Kamenya SN, Kabod PN, et al. Draft genome sequence of Solanum aethiopicum provides insights into disease resistance, drought tolerance, and the evolution of the genome. Gigascience. 2019;8:giz115.
  • 80. Zhang K, Wang X, Chen S, Liu Y, Zhang L, Yang X, et al. The gap-free assembly of pepper genome reveals transposable-element-driven expansion and rapid evolution of pericentromeres. Plant Commun. 2025;6:101177.
  • 81. Zhang K, Yu H, Zhang L, Cao Y, Li X, Mei Y, et al. Transposon proliferation drives genome architecture and regulatory evolution in wild and domesticated peppers. Nat Plants. 2025;11:359–75.
  • 82. Su H, Liu Y, Liu C, Shi Q, Huang Y, Han F. Centromere satellite repeats have undergone rapid changes in polyploid wheat subgenomes. Plant Cell. 2019;31:2035–51.
  • 83. Cappelletti E, Piras FM, Sola L, Santagostino M, Abdelgadir WA, Raimondi E, et al. Robertsonian fusion and centromere repositioning contributed to the formation of satellite-free centromeres during the evolution of Zebras. Mol Biol Evol. 2022;39:msac162.
  • 84. Xin H, Wang Y, Zhang W, Bao Y, Neumann P, Ning Y, et al. Celine, a long interspersed nuclear element retrotransposon, colonizes in the centromeres of poplar chromosomes. Plant Physiol. 2024;195:2787–98.
  • 85. Pélissier T, Tutois S, Deragon JM, Tourmente S, Genestier S, Picard G. Athila, a new retroelement from Arabidopsis thaliana. Plant Mol Biol. 1995;29:441–52.
  • 86. Shimada A, Cahn J, Ernst E, Lynn J, Grimanelli D, Henderson I, et al. Retrotransposon addiction promotes centromere function via epigenetically activated small RNAs. Nat Plants. 2024;10:1304–16.
  • 87. Ou S, Scheben A, Collins T, Qiu Y, Seetharam AS, Menard CC, et al. Differences in activity and stability drive transposable element variation in tropical and temperate maize. Genome Res. 2024;34:1140–53.
  • 88. Bury L, Moodie B, Ly J, McKay LS, Miga KH, Cheeseman IM. Alpha-satellite RNA transcripts are repressed by centromere-nucleolus associations. Elife. 2020;9:e59770.
  • 89. Ramakrishnan Chandra J, Kalidass M, Demidov D, Dabravolski SA, Lermontova I. The role of centromeric repeats and transcripts in kinetochore assembly and function. Plant J. 2024;118:982–96.
  • 90. Ma J, Bennetzen JL. Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice. Proc Natl Acad Sci USA. 2006;103:383–8.
  • 91. Guerrero AA, Gamero MC, Trachana V, Fütterer A, Pacios-Bras C, Díaz-Concha NP, et al. Centromere-localized breaks indicate the generation of DNA damage by the mitotic spindle. Proc Natl Acad Sci USA. 2010;107:4159–64.
  • 92. Talbert P, Henikoff S. Centromere drive: chromatin conflict in meiosis. Curr Opin Genet Dev. 2022;77:102005.
  • 93. Porebski S, Bailey LG, Baum BR. Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol Biol Rep. 1997;15:8–15.
  • 94. Cheng H, Asri M, Lucas J, Koren S, Li H. Scalable telomere-to-telomere assembly for diploid and polyploid genomes with double graph. Nat Methods. 2024;21:967–70.
  • 95. Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, et al. De novo assembly of the genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–5.
  • 96. Wolff J, Backofen R, Grüning B. Loop detection using Hi-C data with HiCExplorer. Gigascience. 2022;11:giac061.
  • 97. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021;38:4647–54.
  • 98. Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018;34:i142–50.
  • 99. Ou S, Su W, Liao Y, Chougule K, Ware D, Peterson T, et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 2019;20:275.
  • 100. Li W, Jaroszewski L, Godzik A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics. 2001;17:282–3.
  • 101. Li H. Protein-to-genome alignment with miniprot. Bioinformatics. 2023;39:btad014.
  • 102. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37:907–15.
  • 103. Kovaka S, Zimin AV, Pertea GM, Razaghi R, Salzberg SL, Pertea M. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 2019;20:278.
  • 104. Gabriel L, Brůna T, Hoff KJ, Ebel M, Lomsadze A, Borodovsky M, et al. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS, and TSEBRA. Genome Res. 2024;34:769–77.
  • 105. Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 2008;9:R7.
  • 106. Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK Jr, Hannick LI. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 2003;31:5654–66.
  • 107.Lee TI, Johnstone SE, Young RA. Chromatin immunoprecipitation and microarray-based analysis of protein location. Nat Protoc. 2006;1:729–48. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108.Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Kumar S, Stecher G, Suleski M, Sanderford M, Sharma S, Tamura K. MEGA12: molecular evolutionary genetic analysis version 12 for adaptive and green computing. Mol Biol Evol. 2024;41:msae263. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Zhang R-G, Li G-Y, Wang X-L, Dainat J, Wang Z-X, Ou S, et al. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic Res. 2022;9:uhac017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Chen C, Wu Y, Li J, Wang X, Zeng Z, Xu J, et al. TBtools-II: A “one for all, all for one” bioinformatics platform for biological big-data mining. Mol Plant. 2023;16:1733–42. [DOI] [PubMed] [Google Scholar]
  • 112.Vollger MR, Kerpedjiev P, Phillippy AM, Eichler EE. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics. 2022;38:2049–51. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Wlodzimierz P, Hong M, Henderson IR. TRASH: tandem repeat annotation and structural hierarchy. Bioinformatics. 2023;39:btad308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Chen S, Zhou Y, Chen Y, Gu J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014;42:W187–91. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118.Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14:178–92. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Liu Y, Su H, Liu Y, Zhang J, Dong Q, Birchler JA, et al. Cohesion and centromere activity are required for phosphorylation of histone H3 in maize. Plant J. 2017;92:1121–31. [DOI] [PubMed] [Google Scholar]
  • 120.Liu Y, Liu Q, Su H, Liu K, Xiao X, Li W, et al. Genome-wide mapping reveals R-loops associated with centromeric repeats in maize. Genome Res. 2021;31:1409–18. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution’s cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003;100:11484–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Hu M, Wan P, Chen C, Tang S, Chen J, Wang L, et al. Accurate, scalable structural variant genotyping in complex genomes at population scales. Mol Biol Evol. 2025;42:msaf180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Liao Y, Zhang X, Chakraborty M, Emerson JJ. Topologically associating domains and their role in the evolution of genome structure and function in Drosophila. Genome Res. 2021;31:397–410. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125.Wan P, Hu M, Jin H, Tang S, Zhong M, Cheng J, et al. Comparative centromere genomics of major crop species in the Solanaceae family. NGDC Genome Sequence Archive. 2025. https://ngdc.cncb.ac.cn/gsa/search?searchTerm=CRA035506.
  • 126.Wan P, Hu M, Jin H, Tang S, Zhong M, Cheng J, et al. Comparative centromere genomics of major crop species in the Solanaceae family. GitHub. 2025. https://github.com/Wan9299/CENH3.
  • 127.Wan P, Hu M, Jin H, Tang S, Zhong M, Cheng J, et al. Comparative centromere genomics of major crop species in the Solanaceae family. Zenodo. 2025. https://doi.org/10.5281/zenodo.18006120.

Associated Data


Supplementary Materials

13059_2026_4028_MOESM1_ESM.xlsx (623KB, xlsx)

Additional file 1: Tables S1–S12. Table S1. Summary of sequencing data generated in this study. Table S2. Summary statistics and quality metrics for the new genome assemblies. Table S3. Summary of repeat and gene annotation. Table S4. Genomic coordinates of CENH3 ChIP–seq–defined functional centromeric regions across Solanaceae genomes. Table S5. Numbers of synteny breakpoints identified from three genome datasets relative to the tomato reference ‘SL5’. Table S6. Enrichment of synteny breakpoints in centromeric and pericentromeric regions relative to the genome-wide background. Table S7. Sequence composition within functional centromeric regions across the analyzed Solanaceae genomes. Table S8. Classification and summary of intact LTR-retrotransposons across the analyzed Solanaceae genomes. Table S9. Functional annotation of genes in functional centromeric regions. Table S10. Mapping quality of CENH3 ChIP–seq reads. Table S11. Sources and accession IDs of CENH3 protein sequences from 12 plant species. Table S12. Primers used for FISH probe preparation.

13059_2026_4028_MOESM2_ESM.pdf (9.7MB, pdf)

Additional file 2: Figs. S1–S22. Fig. S1: CENH3 ChIP–seq mapping profile of African eggplant. Fig. S2: CENH3 ChIP–seq mapping profile of eggplant. Fig. S3: CENH3 ChIP–seq mapping profile of wild pepper. Fig. S4: CENH3 ChIP–seq mapping profile of pepper. Fig. S5: CENH3 ChIP–seq mapping profile of wild tomato. Fig. S6: CENH3 ChIP–seq mapping profile of tomato. Fig. S7: Synteny maps of chromosomes Chr01–Chr06 among wild pepper ‘Chiltepin 12’, cultivated pepper ‘CA59’, and the T2T assembly ‘G1-36,576’. Fig. S8: Comparison of CENH3-binding profiles between two pepper accessions. Fig. S9: Comparison of CENH3 ChIP–seq mapping profiles in eggplant between two divergent accessions. Fig. S10: Comparison of CENH3 ChIP–seq mapping profiles in tomato between two divergent accessions. Fig. S11: CENH3 ChIP–seq mapping profile of tobacco. Fig. S12: Genome-wide synteny and CENH3 ChIP–seq mapping profiles between African eggplant and eggplant. Fig. S13: Examples of synteny breakpoint verification using Hi-C contact maps. Fig. S14: Alignment of synteny blocks from 28 distantly related Solanum species relative to tomato. Fig. S15: Conservation of syntenic sequence and distribution of synteny breaks along chromosomes inferred from alignments between tomato ‘SL5’ and three genome sets spanning different divergence levels. Fig. S16: Enrichment analysis of synteny breaks around CENH3-binding regions. Fig. S17: Sequence composition of the functional centromeric regions across eight Solanaceae genomes. Fig. S18: Pairwise sequence similarity between the 6-Mb pericentromeric regions of tomato and wild tomato Solanum pimpinellifolium. Fig. S19: Distribution of large arrays of 211-bp satellite repeats in the tomato genome. Fig. S20: Relationship between coding-gene expression and CENH3-binding intensity. Fig. S21: Pairwise sequence similarity around the functional centromeres of African eggplant. Fig. S22: Centromere diversity across seven African eggplant accessions.

Data Availability Statement

The PacBio HiFi reads, Hi-C data, CENH3 ChIP–seq data, and RNA-seq data generated in this study have been deposited in the Genome Sequence Archive (GSA) at the National Genomics Data Center (NGDC) under accession number CRA035506 [125]. The genome assembly and annotation files for African eggplant (‘Sa01’), eggplant (‘Sm01’), and wild pepper (‘Chiltepin’) are available at Zenodo (https://doi.org/10.5281/zenodo.18006120) [127]. The previously published genome assemblies and corresponding raw sequencing data used in this study were downloaded from public repositories: C. annuum (CaT2T, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA962192/ [57]; CA59, https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA788020 [59]), Nicotiana benthamiana (NbT2T, https://ngdc.cncb.ac.cn/gwh/Assembly/86283/show) [56], wild and cultivated tomato accessions (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA809001) [60], the potato line ‘DM 1–3 516 R44’ (https://spuddb.uga.edu/dm_v6_1_download.shtml) [58], additional Solanum genomes (www.solpangenomics.com) [66], and additional Solanaceae genomes (https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA010759) [64]. Code used in the study is available on GitHub (https://github.com/Wan9299/CENH3) [126] under the MIT license. A version of the source code has also been archived at Zenodo (https://doi.org/10.5281/zenodo.18006120) [127].


Articles from Genome Biology are provided here courtesy of BMC
