BMC Genomics. 2025 Jul 17;26:672. doi: 10.1186/s12864-025-11831-3

The rps15-rpl32 intergenic locus is a phylogeographic marker for Latin American Zea mays landrace varieties and subspecies

Luciano Univaso 1, Francisca Peña 1, Celián Román-Figueroa 1, Manuel Paneque 2
PMCID: PMC12273293  PMID: 40676502

Abstract

Background

Maize (Zea mays L.) has a deep cultural significance in Latin America, with traditional and native varieties cultivated for millennia. Approximately 220 maize races have been identified in the region. These races have adapted to the diverse environmental conditions, resulting in a considerable diversity of varieties. Although DNA barcoding is widely used for species identification, distinguishing between varieties within a species remains challenging in plants, owing to the conserved nature of standard barcoding loci. Intragenic marker combinations are often used to address this limitation. However, they remain insufficient for variety-level resolution. Therefore, we developed a pipeline to identify robust markers that can distinguish maize varieties based on their phylogeographic origins.

Results

In this study, the small single-copy (SSC) region of the chloroplast genome exhibited the highest mutation rate per nucleotide. Furthermore, the intergenic regions rps15-rpl32 and ndhD-ndhF exhibited the highest mutation rates in the SSC. Comparatively, the coding genes in this region were more conserved. The rps15-rpl32 locus demonstrated improved resolution for phylogeographic analysis when concatenated with a short genetic anchor sequence. This marker accurately identified the geographic origin of maize samples. Overall, it was the most informative marker despite its relatively low SNP and InDel frequency and moderate divergence levels. Contrastingly, the ndhF-ndhD locus exhibited higher mutation rates but failed to effectively resolve phylogenetic relationships.

Conclusions

Our findings demonstrate that concatenated loci can accurately identify the geographic origin of Zea mays varieties and subspecies and elucidate their relationships. Moreover, the superior performance of rps15-rpl32 in the delineation of phylogenetic relationships among regional genomes shows its potential application as a marker for distinguishing closely related varieties and subspecies. This locus can facilitate the streamlined validation of Zea mays varieties for regional authenticity.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-025-11831-3.

Keywords: Maize, Phylogenetic markers, DNA barcoding, Chloroplast genome, Single-copy region

Background

Maize (Zea mays L.) is linked to various products, foods, practices, and traditions deeply rooted in the daily experiences of most Latin American countries [1, 2]. The significant role of maize in the diet of most Latin American nations is partly due to its millennia-old cultivation and close association with cultural identity in the region [1]. Each country throughout the Americas, especially in the Central and Southern regions, has developed its own landrace varieties and subspecies, reflecting the adaptation of maize to diverse local environments [1, 2].

Maize is a plant species belonging to the genus Zea in the grass family (Poaceae), which also includes important agricultural crops such as wheat, rice, oats, and rye [3]. Cultivated maize is a domesticated plant that has evolved alongside humans since ancient times [4]. Characteristics related to yield and palatability have been artificially selected for centuries, giving rise to the most widely recognized varieties today: Zea mays var. amylacea (flour maize); Zea mays var. indurata (hard-shelled maize); Zea mays var. indentata (dent maize); Zea mays var. everta (pop maize); and Zea mays var. saccharata, which includes sweet and supersweet maize [4]. Maize is an annual plant with summer growth [5] and excellent plasticity. Thus, it can adapt to different moisture, sunlight, altitude, and temperature conditions [6].

Global maize production increased from 800 to 1000 million tons between 2004 and 2022 [7], with 51.8% of the production coming from the Americas, mostly from the USA, Brazil, Argentina, Mexico, and Chile [7]. This makes maize a product of considerable economic and cultural importance.

Chile and other South American countries have shown increasing interest in the protection of traditional and native crop varieties, such as maize, through certification systems such as Geographical Indications and Denominations of Origin [8]. However, these certifications are often based on morphological or organoleptic traits, which may be subjective or environmentally influenced [8, 9]. This demonstrates the need for accessible, objective molecular tools that can reliably differentiate varieties by their regional origin. In this study, we investigated whether hypervariable regions of the chloroplast genome can provide such a marker system.

The differentiation of maize plant variants from similar products can be ambiguous or subjective, as morphological descriptions of closely related species are falling into disuse [10]. Several solutions to the subjectivity of morphological descriptions in taxonomy have been proposed. These mainly include molecular morphology, such as protein structure and genomic architecture [11], mRNA secondary structure [12, 13], massive sequencing techniques [14], the use of SSR [15], and SNPs [16]. Although these methods have been widely used, they are time-consuming and costly. This limits their application in the verification of regionally protected species.

An alternative method for identifying varieties, cultivars, ecotypes, or other biological entities that may be variants of the same species is DNA barcoding [17]. DNA barcoding is a taxonomic method that uses one or more short, standardized genetic markers to identify a particular species [18]. This method can be used to compare unknown DNA samples with registered species in a reference library [19]. The DNA barcode must contain sequence variations, conserved flanking loci, and a short target DNA region [20]. The advantages of this method include (i) universality: DNA barcode markers, such as the COI gene, have conserved regions across several species [21], which allows their application to numerous organisms, including animals [22], plants [23], and fungi [24]; and (ii) efficiency and simplicity: the region of a single gene, intergene, or spacer can be amplified and sequenced relatively quickly and cost-effectively using Sanger or next-generation sequencing technologies.

Plastid genomes have been widely used to identify various plant species and their phylogenetic relationships [25–27]. The chloroplast (cp) DNA from Zea mays is circular, double-stranded, and composed of 140,447 base pairs (bp) in the B73 reference genome (GenBank BioProject PRJNA10769 [28]). The circular cpDNA contains four main regions [29]: two inverted repeat regions (IRa and IRb) of approximately 22.7 kbp each, separated by small (SSC) and large (LSC) single-copy regions of approximately 12.5 kbp and 82.3 kbp, respectively [29].

DNA barcodes (DNAbc) from the chloroplast genome have been evaluated for universal application. Furthermore, the degree of sequence divergence of coding (rbcL, rpoB, rpoC1, and matK) and non-coding (trnH-psbA) regions, individually and in pairs, has been evaluated in a phylogenetically diverse set of plant genera [18]. However, these barcodes have some limitations, such as the low intraspecific divergence of conserved loci such as matK and rbcL [30]. This makes them suboptimal for distinguishing between varieties or subspecies within a species. Consequently, these two loci are typically concatenated with other loci, often intergenic ones [30].

The concatenated DNAbc locus has shown good results in medicinal orchids, with ITS1 + matK as the most resolutive combination [31]. Moreover, the matK + ycf1 sequence exhibited the highest genetic diversity across many plant taxonomic ranks [32].

At the species level, monocots, gymnosperms, mosses, and ferns have been identified using trnH–psbA DNAbc [33]. The double- and triple-DNAbc loci matK + trnH-psbA and rpoB + atpF-atpH + matK, respectively, differentiated between Vigna unguiculata (cowpea; Chinese bean) subspecies [34]. This demonstrates that plant species discrimination via DNAbc can be improved if more than one locus is used for analysis [35].

In the present study, we aimed to identify a hypervariable locus that can distinguish the regional origins of 54 maize varieties by comparing their chloroplast genomes. Our findings provide a basis for the streamlined validation of Zea mays subspecies and varieties for regional authenticity.

Methods

Genomic characterization of 54 Zea mays genomes

To characterize the analyzed genomes, we downloaded 22 paired-end Illumina sequencing datasets (Table 1) from the PRJNA479960 bioproject [36] in the FASTQ format. Read quality was then assessed using FastQC [37]. Genome assembly was performed using NOVOWRAP [38], and annotations in GFF3 files were obtained using GeSeq [39]. Assembly quality, including %GC content and assembly length, was evaluated using QUAST [40]. The boundaries of the four plastid regions (LSC, SSC, IRa, and IRb) were determined using cross data from NOVOWRAP and CPJSdraw [41].

Table 1.

Genomic characterization of Zea mays subspecies and cultivars/varieties

No. Subspecies Cultivar/Variety Country Accession number Length (bp) %GC LSC SSC IRa IRb
1 mays A188 USA KF241980.1 140,437 38.45 1–82,367 105,133–117672 117,673–140,437 82,368–105,132
2 mays Alazana Perú SRR7477692 140,489 38.44 1–82,379 105,145–117672 117,672–140,438 82,379–105,145
3 mays Atiti-Nhae-Kowoa Brazil SRR7477679 140,450 38.44 1–82,378 105,146–117685 117,686–140,451 82,379–105,145
4 mays B37C USA KP966115.1 140,457 38.44 1–82,385 105,152–117691 117,692–140,457 82,386–105,151
5 mays B37N USA KP966114.1 140,447 38.44 1–82,375 105,142–117681 117,682–140,447 82,376–105,141
6 mays B37S USA KP966116.1 140,534 38.44 1–82,462 105,229–117768 117,769–140,534 82,463–105,228
7 mays B37T USA KP966117.1 140,479 38.44 1–82,407 105,174–117713 117,714–140,479 82,408–105173
8 mays B73 USA KF241981 140,447 38.44 1–82,375 105,142–117681 117,682–140,447 82,376–105,141
9 mays Caracaraia Brazil SRR7477689 140,452 38.43 1–82,380 105,147–117686 117,687–140,452 82,381–105,146
10 mays Catetoa Brazil SRR7477706 140,453 38.45 1–82,381 105,148–117687 117,688–140,453 82,382–105,147
11 mays Chullpia Perú SRR7477683 140,452 38.43 1–82,380 105,147–117686 117,687–140,452 82,381–105,146
12 mays Confite Puntiagudoa Perú SRR7477704 140,459 38.44 1–82,402 105,168–117694 117,695–140,459 82,403–105167
13 mays Cristala Brazil SRR7477705 140,455 38.43 1–82,383 105,150–117689 117,690–140455 82,384–105,149
14 mays Cuzcoa Perú SRR7477696 140,453 38.43 1–82,381 105,148–117687 117,688–140,453 82,382–105,147
15 mays Dente de Burroa Brazil SRR7477712 140,451 38.44 1–82,379 105,146–117685 117,686–140,451 82,380–105145
16 mays Enanoa Perú SRR7477685 140,450 38.44 1–82,378 105,145–117684 117,685–140,450 82,379–105,144
17 mays Granadaa Perú SRR7477686 140,462 38.44 1–82,403 105,170–117696 117,697–140,462 82,404–105169
18 mays Guapia Paraguay SRR7477684 140,451 38.43 1–82,379 105,146–117685 117,686–140,451 82,380–105145
19 mays HaiKou (mays) China MT721151.1 140,452 38.44 1–82,382 105,148–117687 117,688–140,452 82,383–105,147
20 mays Huancavelicanoa Perú SRR7477690 140,464 38.44 1–82,405 105,172–117698 117,699–140,464 82,406–105171
21 mays INIA601 Perú ON920921 140,451 38.00 140,450–82394 105,159–117685 117,686–140,449 82,395–105,158
22 mays Iqueñoa Perú SRR7477691 140,461 38.43 1–82,402 105,169–117695 117,696–140,461 82,403–105168
23 mays Kcullia Perú SRR7477669 140,447 38.44 1–82,375 105,142–117681 117,682–140,447 82,376–105,141
24 mays Mocheroa Perú SRR7477697 140,483 38.43 11–82,404 105,163–117705 117,706–10 82,405–105162
25 mays Molea Brazil SRR7477709 140,462 38.44 1–82,403 105,170–117696 117,697–140,462 82,404–105169
26 mays Morochoa Perú SRR7477694 140,461 38.43 1–82,402 105,169–117695 117,696–140,461 82,403–105168
27 mays Morocho Cajabambinoa Perú SRR7477695 140,457 38.43 1–82,385 105,152–117691 117,692–140,457 82,386–105,151
28 mays Pagaladroga Raymondia Perú SRR7477699 140,454 38.44 1–82,382 105,149–117688 117,689–140,454 82,383–105,148
29 mays Palha-Roxa-Acreanaa Brazil SRR7477703 140,460 38.43 1–82,387 105,154–117694 117,695–140,460 82,388–105,153
30 mays Paroa Peru SRR7477677 140,464 38.44 1–82,405 105,172–117698 117,699–140,464 82,406–105171
31 Parviglumis PC_I11 Mexico BK061469.1 140,461 38.44 1–82,402 105,169–117695 117,696–140,461 82,403–105168
32 Parviglumis PC_I53 Mexico BK061463.1 140,369 38.44 1–82,312 105,078–117604 117,605–140369 82,313–105,077
33 Parviglumis PC_J14 Mexico BK061464.1 140,459 38.44 1–82,400 105,167–117693 117,694–140,459 82,401–105166
34 Parviglumis PC_J48 Mexico BK061467.1 140,457 38.44 1–82,398 105,165–117691 117,692–140,457 82,399–105,164
35 Parviglumis PC_K55 Mexico BK061468.1 140,425 38.45 1–82,369 105,135–117660 117,661–140,425 82,370–105134
36 Parviglumis PC_L06 Mexico BK061465.1 140,454 38.44 1–82,397 105,163–117689 117,690–140454 82,398–105,162
37 Parviglumis PC_L48 Mexico BK061470.1 140,463 38.44 1–82,404 105,171–117697 117,698–140,463 82,405–105170
38 Parviglumis PC_N10 Mexico BK061466.1 140,370 38.44 1–82,313 105,079–117605 117,606–140370 82,314–105,078
39 Parviglumis PC_N14 Mexico BK061471.1 140,462 38.44 1–82,403 105,170–117696 117,697–140,462 82,404–105169
40 Huehuetenangensis PI_441934 Guatemala KR873422.1 140,453 38.44 1–82,394 105,161–117687 117,688–140,453 82,395–105,160
41 mays Pontinhaa Brazil SRR7477700 140,458 38.44 1–82,401 105,167–117693 117,694–140,458 82,402–105166
42 Mexicana RIMME0018 Mexico BK061460.1 140,504 38.48 1–82,445 105,212–117738 117,739–140,504 82,446–105,211
43 Mexicana RIMME0019 Mexico BK061462.1 140,535 38.48 1–82,476 105,243–117769 117,770–140535 82,477–105,242
44 Mexicana RIMME0023 Mexico BK061461.1 140,537 38.48 1–82,480 105,246–117772 117,773–140,537 82,481–105,245
45 Mexicana RIMME0026 Mexico BK061458.1 140,539 38.48 1–82,480 105,247–117773 117,774–140,539 82,481–105,246
46 Mexicana RIMME0031 Mexico BK061459.1 140,495 38.47 1–82,436 105,203–117729 117,730–140495 82,437–105,202
47 mays Semi-Denta Brazil SRR7477687 140,449 38.44 1–82,377 105,144–117683 117,684–140,449 82,378–105,143
48 mays SUPG020 India OP832502.1 140,449 38.44 1–82,372 105,111–117720 117,721–140,449 82,373–105,110
49 mays SUPG021 India OP832501.1 140,411 38.43 1–82,333 105,067–117676 117,677–140,411 82,334–105,066
50 mays Sweet China OM317760.1 142,545 38.44 1–82,465 105,232–117772 117,773–140,538 82,466–105,231
51 mays Uchuquillaa Peru SRR7477681 140,450 38.44 1–82,378 105,145–117684 117,685–140,450 82,379–105,144
52 mays Waxy China OM317761.1 140,454 38.44 1–82,382 105,149–117688 117,689–140,454 82,383–105,148
53 mays Xingu-wauraa Brazil SRR7477701 140,484 38.44 1–82,426 105,192–117718 117,719–140,483 82,427–105,191
54 mays Zhengdan958 China MK348606.1 142,460 38.44 1–82,382 105,149–117688 117,689–140,454 82,383–105,148

a Genome assembled for this project. Bold name is the reference genome

LSC Large single-copy region, SSC Small single-copy region, IRa Inverted Repeat A, IRb Inverted Repeat B. %GC, percent of G + C at that zone
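As a consistency check on the coordinates in Table 1, the length of each plastid region can be recomputed from its inclusive start and end positions. The sketch below uses only the B73 row of Table 1; the function names are illustrative and are not part of the authors' pipeline.

```python
# Region boundaries for the B73 reference, copied from Table 1
# (1-based, inclusive coordinates on the linearized genome).
B73_REGIONS = {
    "LSC": (1, 82_375),
    "IRb": (82_376, 105_141),
    "SSC": (105_142, 117_681),
    "IRa": (117_682, 140_447),
}
B73_LENGTH = 140_447


def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def check_partition(regions: dict[str, tuple[int, int]], genome_length: int) -> bool:
    """True if the regions cover positions 1..genome_length with no gaps or overlaps."""
    ordered = sorted(regions.values())
    contiguous = all(prev[1] + 1 == cur[0] for prev, cur in zip(ordered, ordered[1:]))
    return contiguous and ordered[0][0] == 1 and ordered[-1][1] == genome_length


lengths = {name: region_length(*span) for name, span in B73_REGIONS.items()}
print(lengths)
print("regions tile the genome:", check_partition(B73_REGIONS, B73_LENGTH))
```

For B73, the four lengths sum to the reported genome length of 140,447 bp, and the two inverted repeats are equal in length, as expected.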

SNPs, InDels, and %GC in plastid regions (LSC, SSC, IRa, and IRb)

We used MUMmer [42] to identify SNPs and InDels and subsequently applied the nucmer module to compare the 22 genomes with the B73 reference genome [43]. Delta and coordinate files were processed using custom Python scripts to consolidate all mutations into a single CSV table. Each SNP and InDel was mapped to genes, intergenic regions, country, continent, and plastid regions (LSC, SSC, IRa, and IRb) using RStudio (v2024.12.0, Posit Software, Boston, MA, USA) with the dplyr [44] library. The mutation frequency at each locus across genomes was calculated with RStudio and visualized using PyGWalker [45].
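The consolidation step can be illustrated with a minimal Python sketch. It assumes that each difference has already been reduced to a reference base and a query base, with "." marking an absent nucleotide (the convention used in the InDel panels of Fig. 4). The record layout, function names, and demo values are hypothetical and do not reproduce the authors' scripts or any MUMmer output format.

```python
import csv
import io
from typing import NamedTuple


class Variant(NamedTuple):
    genome: str    # query genome label
    ref_pos: int   # 1-based position in the reference
    ref_base: str  # reference nucleotide, "." if absent
    qry_base: str  # query nucleotide, "." if absent


def classify(ref_base: str, qry_base: str) -> str:
    """Label one single-nucleotide difference as SNP, insertion, or deletion."""
    if ref_base == "." and qry_base == ".":
        raise ValueError("both bases are absent")
    if ref_base == ".":
        return "insertion"  # base present only in the query
    if qry_base == ".":
        return "deletion"   # base present only in the reference
    if ref_base == qry_base:
        raise ValueError("bases are identical; not a difference")
    return "SNP"


def to_csv(variants: list[Variant]) -> str:
    """Write all variants, with their mutation type, into one CSV table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["genome", "ref_pos", "ref_base", "qry_base", "type"])
    for v in variants:
        writer.writerow([*v, classify(v.ref_base, v.qry_base)])
    return buf.getvalue()


# Hypothetical records for illustration only (not data from the study).
demo = [
    Variant("genome_X", 100, "G", "A"),
    Variant("genome_X", 250, ".", "T"),
    Variant("genome_Y", 300, "C", "."),
]
table = to_csv(demo)
print(table)
```

Columns for gene, intergenic region, country, or plastid region could then be joined onto this table by position, which is the role dplyr plays in the workflow described above.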

The aligned genomes were segmented into four separate FASTA files representing the plastid regions, and the nucleotide composition (%A, %T, %C, %G, and %GC) was calculated using Python scripts. Mutation rates were determined by dividing the total number of mutation events (e.g., an InDel of 3 bp was treated as three events) by the length of each genomic region; the results were visualized using the ggplot2 library in RStudio [46].
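The two calculations in this paragraph can be sketched as follows. The counting rule (a k-bp InDel contributes k events and a SNP contributes one) comes from the text; the function names and the numbers in the usage example are assumptions made for illustration.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Percentages of A, T, C, G and GC, ignoring gaps and ambiguous symbols."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    pct = {f"%{base}": 100 * counts[base] / total for base in "ATCG"}
    pct["%GC"] = pct["%G"] + pct["%C"]
    return pct


def mutation_rate(event_lengths: list[int], region_length: int) -> float:
    """Mutation events per nucleotide of a region.

    Each entry of event_lengths is the number of nucleotides involved in
    one mutation: 1 for a SNP, k for a k-bp insertion or deletion.
    """
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return sum(event_lengths) / region_length


# Hypothetical example: two SNPs and one 3-bp InDel in a 1000-bp region.
rate = mutation_rate([1, 3, 1], 1000)
print(rate)
print(composition("AACG--T"))
```

Dividing by region length is what makes the short SSC comparable with the much longer LSC; without it, the longest region would accumulate the most events simply because it offers more sites.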

Gene and intergenic region characterization in the SSC region

CPJSdraw was used to represent the B73 chloroplast junctions, whereas Proksee was used to represent the SSC zone [47]. The mutation frequency per base pair was calculated by dividing the number of events by the locus length, and the results were plotted using PyGWalker. Bash and Python custom scripts were used to extract gene and intergenic sequences from the FASTA genomes using GFF3 associated with an individual FASTA file.
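A minimal sketch of the extraction step is shown below. It assumes a standard nine-column, tab-delimited GFF3 file with `gene` features and a `Name` or `ID` attribute, and it derives intergenic spacers as the gaps between consecutive genes. The helper names and the demo annotation are hypothetical; the authors' Bash and Python scripts are not reproduced here.

```python
def parse_genes(gff3_text: str) -> list[tuple[str, int, int]]:
    """Return (name, start, end) for every 'gene' feature, sorted by start."""
    genes = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        name = attrs.get("Name") or attrs.get("ID", "unknown")
        genes.append((name, int(cols[3]), int(cols[4])))
    return sorted(genes, key=lambda g: g[1])


def intergenic_spacers(genes: list[tuple[str, int, int]]) -> list[tuple[str, int, int]]:
    """Gaps between consecutive genes, named 'left-right' (1-based, inclusive)."""
    spacers = []
    if not genes:
        return spacers
    prev_name, prev_end = genes[0][0], genes[0][2]
    for name, start, end in genes[1:]:
        if start > prev_end + 1:  # overlapping or adjacent genes leave no spacer
            spacers.append((f"{prev_name}-{name}", prev_end + 1, start - 1))
        if end > prev_end:        # track the gene that reaches furthest right
            prev_name, prev_end = name, end
    return spacers


def extract(seq: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive interval out of a sequence string."""
    return seq[start - 1:end]


# Hypothetical annotation and sequence, for illustration only.
demo_gff3 = "\n".join([
    "##gff-version 3",
    "\t".join(["chr", "demo", "gene", "1", "10", ".", "+", ".", "ID=g1;Name=geneA"]),
    "\t".join(["chr", "demo", "gene", "21", "30", ".", "-", ".", "ID=g2;Name=geneB"]),
    "\t".join(["chr", "demo", "gene", "28", "40", ".", "+", ".", "ID=g3;Name=geneC"]),
])
demo_seq = "A" * 10 + "C" * 10 + "G" * 20

genes = parse_genes(demo_gff3)
spacers = intergenic_spacers(genes)
print(spacers)
print(extract(demo_seq, spacers[0][1], spacers[0][2]))
```

The events-per-bp value of a locus is then its event count divided by `end - start + 1`.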

Divergence analysis using K2P distances

Multiple sequence alignments of all loci were performed using MAFFT v7 [48] with default parameters. Pairwise genetic distances were calculated using the Kimura 2-parameter (K2P) model with Biopython-based scripts. Divergence was quantified using the following formula:

graphic file with name d33e2187.gif

where "n" represents the number of genome pairs with zero K2P distance. The maximum intraspecific K2P was calculated using Biopython and custom Python scripts through the Kimura equation for divergence values. Gene- and intergenic-level K2P distances were derived from all FASTA files across genomes.
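For reference, the standard Kimura 2-parameter distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. The sketch below implements this textbook formula in plain Python as a stand-in for the Biopython-based scripts; the gap handling (sites with a gap or ambiguity code are skipped) and the function names are assumptions, and the percentage-resolution formula defined above is not reproduced.

```python
import math
from itertools import combinations
from typing import Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # skip gaps and ambiguity codes (an assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1, term2 = 1 - 2 * p - q, 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P model")
    return max(0.0, -0.5 * math.log(term1) - 0.25 * math.log(term2))


def max_intraspecific(group: dict[str, str]) -> Optional[float]:
    """Largest pairwise K2P distance within one group; None for a single genome."""
    if len(group) < 2:
        return None
    return max(k2p_distance(a, b) for a, b in combinations(group.values(), 2))


# Hypothetical aligned sequences, for illustration only.
demo_group = {"g1": "AAAAAAAAAA", "g2": "GAAAAAAAAA", "g3": "AAAAAAAAAA"}
print(max_intraspecific(demo_group))
```

Returning `None` for single-genome groups mirrors the treatment of groups for which no within-group value can be computed.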

Phylogenetic analysis of selected loci

RStudio was used to map mutations (SNPs and InDels) at each locus, including reference nucleotide changes and mutation frequencies (i.e., the number of times the same event occurs) across genomes, which were plotted as a matrix for each nucleotide. Trees were generated with IQ-TREE 2 [49] using the following parameters: model = F81 + F; bootstrap = 1000. The log-likelihood values are shown in the description of each figure. The resulting Newick trees were plotted in RStudio using the ape [50] and ggtree libraries.
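The mutation matrix described above (reference nucleotide, query nucleotide, and count) can be sketched in Python as a stand-in for the RStudio step. The sketch also labels each substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion, a distinction that is standard but is not taken from the authors' code. All names and demo values are hypothetical.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def substitution_class(ref: str, qry: str) -> str:
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    ref, qry = ref.upper(), qry.upper()
    if ref not in VALID or qry not in VALID or ref == qry:
        raise ValueError(f"not a substitution: {ref!r} -> {qry!r}")
    same_class = {ref, qry} <= PURINES or {ref, qry} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally(pairs: list[tuple[str, str]]) -> Counter:
    """Count how often each (nREF, nQRY) substitution occurs across genomes."""
    return Counter((ref.upper(), qry.upper()) for ref, qry in pairs)


def as_matrix(counts: Counter) -> dict[str, dict[str, int]]:
    """Reshape the tally into a 4 x 4 nREF-by-nQRY table (diagonal stays 0)."""
    return {r: {q: counts.get((r, q), 0) for q in "ACGT"} for r in "ACGT"}


# Hypothetical substitutions pooled from several genomes (illustration only).
demo_pairs = [("G", "A"), ("G", "A"), ("G", "T"), ("A", "C")]
counts = tally(demo_pairs)
matrix = as_matrix(counts)
for ref_base, row in matrix.items():
    print(ref_base, row)
```

Under this definition, G-to-A and C-to-T changes are transitions, whereas A-to-C, G-to-T, and T-to-A changes are transversions.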

Results

Genomic characterization of 54 Zea mays genomes

We used four subspecies of Zea mays (Table 1) from different countries: mays (12 inbred and 27 landraces), huehuetenangensis (1), mexicana (5), and parviglumis (9). The Zea mays B73 [28] reference genome is shown in bold text.

Table 1 shows the species, subspecies, cultivars/varieties, %GC, and total length, along with the positions of the beginning and end of the LSC, SSC, IRa, and IRb regions.

Most genomes were similar in length (average length = 140,535 bp). The shortest belonged to the parviglumis subspecies (PC_I53). In contrast, two Chinese inbred lines, Zhengdan958 and Sweet, had the longest genomes (approximately 2000 bp longer than the average). In both cases, the minimum and maximum length differences were not explained by an extension of the four regions that comprise the plastid genome.

The %GC of each genome was almost equal to that of the reference genome, and the average %GC across all analyzed genomes (38.43%) was within the range shown in Table 1.

The LSC predominantly started at the same origin and spanned base pairs 1 to approximately 82,300. The beginning of this region was shifted by one nucleotide in INIA601; however, this did not affect the other three regions. Similarly, the start was shifted by 11 nucleotides in Mochero, which moved the frame of the IRa region. None of these shifts affected the standard organization of the loci (Table 1).

SNPs, InDels, and %GC in the LSC, SSC, IRa, and IRb regions and their associated loci

To identify regions with the highest mutation density and their potential impact, we analyzed the total number of mutation events (defined as the total number of nucleotides involved in SNPs and InDels) across all maize varieties and subspecies within four chloroplast genomic regions (LSC, SSC, IRa, and IRb) (Fig. 1). The LSC and SSC regions exhibited the highest mutation densities; the total number of affected nucleotides was predominantly associated with insertions in the LSC and deletions in the SSC (Fig. 1c). In contrast, the IRa and IRb regions showed the lowest mutation levels across all genomes (Fig. 1a). The SSC region predominantly consisted of SNPs and deletions, whereas the LSC region exhibited a higher incidence of insertions and SNPs (Fig. 1a, bottom panel). Most mutations were located in the intergenic regions of the SSC and LSC zones (Fig. 1a, top panel). LSC is the longest of the four regions and is consequently more prone to mutations. Therefore, we normalized the mutation counts based on the length of each zone (Fig. 1b). This normalization enabled the comparison of each chloroplast region, such that the differences were attributed to their mutation rate per nucleotide base; thus, the region length was not an interfering factor. Insertions were the most frequent mutation type across all regions after normalization (Fig. 1b).

Fig. 1.


Screening of SNPs and InDels in 53 varieties and subspecies versus the B73 reference genome. The main mutations were counted (as total nucleotides involved in each genome) and plotted for the four regions of the chloroplast genome. A The SNPs, deletions, and insertions counted at the four regions and the main type of locus (gene or intergene) exhibiting these. B Mutation rate and %GC (inside box). C Counts of the mutations by the genome country of origin

The IRa and IRb regions exhibited similar normalized mutation rates that were consistent with their comparable lengths, whereas the LSC region had the lowest normalized mutation rates (Fig. 1b). The SSC exhibited the highest total normalized mutation rate (Fig. 1b).

This pattern aligned with the GC content of the SSC region, which was the lowest among the four chloroplast regions (inset box, Fig. 1b). The SNP distribution of the SSC showed a strong bias toward G-to-A and G-to-T changes across the genomes (Figure S1A). In contrast, the least common changes were T to C, C to G, and A to T. This is consistent with the low %GC observed in the SSC.

Genome-wide analysis of the four regions further confirmed that the SSC and LSC regions consistently harbored most of the mutations (Fig. 1c). These findings indicate that the SSC region had the highest mutation rate across all analyzed genomes.

Characterization of genes and intergenes and co-localization of SNP-InDels at the SSC

We analyzed the SSC region to characterize both coding genes and non-coding intergenic regions (hereafter referred to as "intergenes"). We sought to identify conserved and variable sites within the SSC and identify the most divergent loci.

The SSC region spans 12,540 bp and is flanked by ndhF at the beginning and ndhH at the end (Fig. 2a). Analysis of mutations (SNPs, insertions, and deletions) in the SSC revealed markedly short loci within this region, leading to artifacts in the mutation rate calculations (Figure S5). This was particularly evident in the ndhI-ndhG locus. This locus was only 197 bp long and displayed high mutation rates per base pair (Fig. 2d). We restricted our analysis to loci longer than 1,000 bp (indicated by the orange cutoff line in Fig. 2c) to mitigate these artifacts.

Fig. 2.


SSC loci mutation rates. A The B73 reference plastid genome. B Relevant SSC intergenic regions (lines: green = rps15-rpl32; orange = ndhD-ndhF; red = ccsA-trnN-GUU; at the upper tracks), showing genes as blue arrows pointing in the direction of transcription, the junctions in pink, and the %GC levels. C Locus and variant lengths (orange line: 1000 bp; %GC in color scale). D Events per bp (blue cutoff line: third locus with highest events per bp; inside box: %A in each locus). E Distribution of events per bp, colored by mutation type

The intergenic regions rps15-rpl32 and ndhD-ndhF exhibited the highest mutation rates in SSC (Fig. 2). In contrast, the coding genes in this region were more conserved, with fewer mutations and a lower diversity of mutation types than those in other regions (Fig. 2c-d). The rps15-rpl32 and ndhD-ndhF loci also had low GC content (approximately 30%; 29.24% for ndhD-ndhF and 30.23% for rps15-rpl32). This was consistent with their higher mutation rates.

These findings indicate that the loci with the highest mutation rates within SSC are located at intergenic regions; rps15-rpl32 and ndhD-ndhF were the most divergent loci. This highlights their potential utility in phylogenetic and comparative genomic studies.

Hypervariable zones and conserved genetic anchors for phylogenetic analysis of varieties

After identifying the genes and non-coding regions with the most mutations in the variable zone of the chloroplast genome, we analyzed these loci as candidate DNA barcodes for building a phylogenetic tree that could identify every variety and subspecies.

Divergence by K2P of the highest mutant loci

We performed K2P divergence analyses across 54 genomes to evaluate the phylogenetic resolution of the rps15-rpl32 and ndhD-ndhF loci. Common DNA barcoding loci (rbcL and matK) were used as negative controls for divergence.

Table 2 summarizes the K2P divergence results. As expected, the intergenic loci had the highest %K2P divergence compared to the gene loci. The rps15-rpl32 intergenic region demonstrated the highest phylogenetic resolution at 17.6% divergence, whereas ccsA showed the lowest at 3.8%. Similarly, ndhD-ndhF displayed considerable %K2P divergence that almost matched that of rps15-rpl32.

Table 2.

Percentage of K2P-resolution based on regional groups

graphic file with name 12864_2025_11831_Tab2_HTML.jpg

Groups: PE Peru, MXP Mexico-Parviglumis, MXR Mexico-RIMME, CN China, US United States, BR Brazil, IN India. Colors represent the numeric value on a scale from green (highest) to red (lowest).

The most divergent loci in the SSC zone were rps15-rpl32 and ndhD-ndhF; rps15-rpl32 was the best of the group (17.6% resolution). In contrast, the worst was ccsA at 3.846% resolution (Table 2).

Although the intergenic loci had the highest %K2P resolution, the columns with K2P values of intraspecific divergence varied for the different groups, with the values categorized as a heatmap (Table 2). Although rps15-rpl32 exhibited discrete intraspecific divergence, it was more conserved than ndhD-ndhF.

At the group level, Peru had the most divergent genomes at the intraspecific level, followed by India. In contrast, the USA genomes were the least divergent, indicating that they were highly conserved. The Paraguay and Guatemala groups contained only one genome each; therefore, no intraspecific divergence was calculated for them.

These results showed that genes are highly conserved in the SSC. Moreover, the intra- and interspecific divergence values of these genes were comparable to those of the loci most commonly used as DNAbc, such as matK and rbcL, indicating that they have low mutation rates in chloroplast genomes. Furthermore, the analyzed loci showed varying degrees of divergence across different groups, indicating that the rate of mutation differed by group, individual genome, and locus.

rps15-rpl32 as possible regional marker

After K2P divergence analysis, we reconstructed the phylogenetic relationships of the 54 genomes using the best-performing loci. We used rbcL as a negative control for effective comparison, owing to its comparable K2P resolution to matK, along with its SNP and InDel profiles.

Phylogenetic trees based on the rps15-rpl32 and ndhD-ndhF loci

The phylogenetic trees inferred by Maximum Likelihood (ML) show branch colors corresponding to the regional origin of each accession, highlighting geographic clustering patterns and frequencies of SNPs and InDels for each locus (Fig. 3). These phylogenetic trees provide insight into sequence variation. Statistical and construction parameters are provided in the Methods section, whereas branch lengths and bootstrap (bt) values are shown at the tree nodes.

Fig. 3.


Phylogenetic inference by ML based on the best gene-locus, as analyzed via percentage of K2P-resolution. SNPs and InDels of rbcL (A) and ndhF (B) and their respective phylogenetic inferences (C, D). Branches are colored according to country of origin. Numbers on the side indicate the clade count. Numbers in grey italics represent bootstrap support values; black numbers indicate branch lengths. nREF = nucleotide in the B73 reference genome; nQRY = nucleotide in the query genomes; n = count. Consensus log-likelihood: rbcL tree (C): −1975.187 and ndhF tree (D): −3015.924

rbcL (Fig. 3c) exhibited limited phylogenetic resolution and failed to effectively represent the varieties. In contrast, ndhF (Fig. 3d) showed slightly better resolution (K2P = 7.6%) and formed better-defined clades. The first part of the tree contained a large share of the Latin American landraces (PE, PA, and BR) without significant internal differentiation. The first defined clade (bt = 53) consisted primarily of Peruvian maize, except for Mole. The second clade (bt = 84) contained most of the Parviglumis and Mexicana genomes (both from Mexico), and a third clade (bt = 71) contained the only two Indian genomes. The remaining genomes were arbitrarily arranged into a single fourth clade (bt = 71), indicating limited resolution for the rest of the dataset. Most of these internal nodes received moderate to high bootstrap support.

Analysis of the nucleotide frequencies of SNPs and InDels in rbcL and ndhF (Fig. 3a–b) revealed minor changes in rbcL (e.g., A-to-C substitutions and A and C deletions). In contrast, ndhF exhibited a higher frequency of changes (i.e., G-to-A substitutions, C deletions, and some A insertions) than rbcL. These observations suggest that the frequency and type of SNPs may define the phylogenetic relationships among genomes.

The ndhF-ndhD and rps15-rpl32 intergenic loci outperformed genes in resolving phylogenetic relationships. ndhF-ndhD formed six clades; however, these clades failed to align with the regional origins of the groups and appeared randomly distributed.

rps15-rpl32 (17.6% resolution) formed seven clades that grouped individuals of the same varieties and regional origins (Fig. 4d). Clade 1 (bt = 67) grouped three genomes from BR and PE that were not well-defined. The Mexican subspecies were grouped in clades 2 (bt = 19) and 4 (bt = 68), including Huehuetenangensis from GU, which shared Central American origins. Clade 3 (bt = 73) mostly grouped the PE varieties, except for Mole, whereas the others were grouped in clade 7 (bt = 75) with the BR group and a member of the PA group. Finally, clade 6 (bt = 85) mostly grouped the inbred strains (except for A188 and Zhengdan958) with the reference genome and CMS-B37 strains. In contrast, A188 and the Sweet variety did not cluster within the major clades identified in the phylogeny. Rather, they appeared as distinct lineages (Fig. 4d).

Fig. 4.


Phylogenetic inference by ML based on the best intergene-locus, as analyzed via percentage of K2P-resolution. SNPs and InDels of ndhF-ndhD (A) and rps15-rpl32 (B) and their respective phylogenetic inferences (C, D). Branches are colored according to country of origin. Numbers on the side indicate the clade count. Numbers in grey italics represent bootstrap support values; black numbers indicate branch lengths. nREF = nucleotide in the B73 reference genome; nQRY = nucleotide in the query genomes. The point (.) in the InDels box indicates the absence of nucleotides. Consensus trees log-likelihood: ndhF-ndhD (C): −3889.123 and rps15-rpl32 (D): −4235.487

Analysis of SNP and InDel frequencies revealed that intergenic loci exhibited significantly higher mutation rates than genic regions (Fig. 4), as expected from the %K2P-based divergence (Table 2).

T-to-A substitutions were predominant in the ndhF-ndhD intergenic region, followed by A-to-C, C-to-T, and G-to-T substitutions. This region also showed a high frequency of A and T deletions, whereas T and A insertions were less frequent. In contrast, the rps15-rpl32 locus displayed a discrete overall mutation rate, with SNPs primarily involving A-to-C substitutions and, less commonly, G-to-A and G-to-T substitutions. The InDel patterns in rps15-rpl32 were characterized by the prevalence of A deletions and T insertions.

These findings highlight the potential ability of intergenic loci to provide a higher phylogenetic resolution than genic regions. The rps15-rpl32 locus was the most informative marker, despite its relatively low SNP and InDel frequency and moderate levels of interspecific and intraspecific divergence. In contrast, ndhF-ndhD exhibited higher mutation rates but failed to effectively resolve the phylogenetic relationships at both the regional and broader scale.

The superior performance of rps15-rpl32 in the delineation of phylogenetic relationships among regional genomes underscores its potential as a marker for distinguishing closely related varieties and subspecies. These results suggest that the ideal locus for phylogenetic inference should balance moderate divergence with adequate mutation frequency to ensure clear resolution.

Concatenated loci provide the best phylogenetic tree

To enhance the resolution of the rps15-rpl32 phylogenetic tree, we concatenated this locus with other genic and intergenic regions to identify markers that could provide better phylogenetic resolution. We generated > 3900 combinations of rps15-rpl32 with various genes and intergenic regions of the plastid genomes. Of these, the concatenation of rps15-rpl32 with rpl23-rpl2 yielded the best phylogenetic resolution; a total K2P divergence of 25.49% was observed (Table 3).
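The concatenation screen can be pictured with a small sketch: each candidate locus is joined sample by sample to the focal locus, and the combined alignment is scored. This is only an illustration under stated assumptions. The toy sequences are invented, the helper names (`concatenate`, `distinct_haplotypes`, `rank_partners`) are hypothetical, and the score used here (number of distinct concatenated sequences) is a simple stand-in for the %K2P resolution used in the study.

```python
def concatenate(aln_a, aln_b):
    """Join two alignments (dict: sample -> aligned sequence) per shared sample."""
    shared = sorted(aln_a.keys() & aln_b.keys())
    return {name: aln_a[name] + aln_b[name] for name in shared}


def distinct_haplotypes(alignment):
    """Stand-in score: how many different sequences the alignment contains."""
    return len(set(alignment.values()))


def rank_partners(focal_name, loci, score=distinct_haplotypes):
    """Concatenate every other locus with the focal locus and rank by score."""
    focal = loci[focal_name]
    results = []
    for name, aln in loci.items():
        if name == focal_name:
            continue
        combined = concatenate(focal, aln)
        results.append((f"{name} + {focal_name}", score(combined)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Toy alignments (invented sequences, three hypothetical samples).
loci = {
    "rps15-rpl32": {"s1": "ACGT", "s2": "ACGA", "s3": "ACGT"},
    "rpl23-rpl2": {"s1": "GG", "s2": "GG", "s3": "GC"},
    "rbcL": {"s1": "TT", "s2": "TT", "s3": "TT"},
}
ranking = rank_partners("rps15-rpl32", loci)
for label, value in ranking:
    print(label, value)
```

Passing a different callable as `score` (for example, a K2P-based measure) would change the ranking criterion without altering the concatenation step.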

Table 3.

Percentage resolution based on the K2P divergence of simple and concatenated loci

Columns PE to IN: maximum intraspecific K2P distances

Locus | %K2P | PE | MXP | MXR | CN | US | BR | IN
rps15-rpl32 | 17.64 | 0.00379 | 0.001263 | 0.000632 | 0.000316 | 0.000316 | 0.001579 | 0.005054
rpl23-rpl2 | 1.92 | 0.26315 | 0 | 0 | 0 | 0 | 0.157895 | 0
rpl23-rpl2 + rps15-rpl32 | 25.49 | 0.00471 | 0.001256 | 0.000628 | 0.000314 | 0.000314 | 0.002512 | 0.005024

PE Peru, MXP Mexico-Parviglumis, MXR Mexico-RIMME, CN China, US United States, BR Brazil, IN India

The phylogenetic tree based on this concatenation is shown in Fig. 5a. Seven groups were identified: (i; bt = 91) SA1, formed mainly by Peruvian high-rate (HR) mutations, except for Mole, which is part of the Brazilian HR mutations (Fig. 5b); (ii; bt = 64) SA2, formed by Brazilian and Peruvian low-rate (LR) mutations and SemiDent of the Brazilian HR mutations (Fig. 5a–b). This could be due to the low (but not the lowest) value of intragenic divergence in the Brazilian group at this specific locus, which may not fully reflect the divergence of these genomes; (iii; bt = 66) MX1, formed by four LR mutations in Parviglumis and Mexicana-RIMME0023, the lowest in SNPs and InDels in the Mexicana subspecies (Fig. 5b); (iv; bt = 65) MX2, formed by HR mutations in both Parviglumis and Mexicana subspecies (Fig. 5b); (v; bt = 74) IN, formed only by the two SUPG individuals; (vi; bt = 75) CN, formed only by Chinese genomes; and (vii; bt = 42) US, formed by B37 and B73.

Fig. 5.

Phylogeographical relationship inference by ML between different varieties and subspecies, as determined based on the concatenated loci rpl23-rpl2 + rps15-rpl32 compared with the frequency of genome-level mutations. A The circular tree represents the regional origins and relationships between varieties and subspecies, colored by country. B The frequency of mutations in the 53 genomes compared with the B73 reference genome. SA1: South America Group 1; SA2: South America Group 2; MX1: Mexican Group 1; MX2: Mexican Group 2; CN: China Group; IN: Indian Group; US: United States Group. Numbers in grey italics represent bootstrap support values; black numbers indicate branch lengths. Consensus tree log-likelihood: −4320.162

These results show that the concatenated loci are powerful enough to generate phylogenetic dendrograms that explain the regional origin of Zea mays varieties and subspecies and elucidate their relationships, consequently reflecting the mutation rate in chloroplast genomes.

Discussion

SSC as the genomic region with the highest mutation rate

Of the four regions that constitute the chloroplast genome, the SSC region exhibited the highest mutation rate per nucleotide (Fig. 1a–b). This phenomenon has been documented in several studies. For instance, Shaw et al. [51] reported this pattern for angiosperm genomes, such as those of Atropa, Nicotiana, Saccharum, and Oryza spp. This is consistent with the information described for other genera, such as Capirona [52], Pulsatilla [53], and Pseudogalium [54]. These observations are consistent with our results and identify the SSC region as the segment of the chloroplast genome with the highest mutation frequency.

Divergence in genes and intergenes and co-localization of SNP-InDels at the SSC region

The SSC region was 12,540 bp in length and was flanked by ndhF and ndhH (Fig. 2a–b). A low GC content in this region has been reported across various genera [25, 27, 52, 54]. This is consistent with the distribution of SNPs that show a high tendency for G-to-A transitions, ultimately affecting the GC percentage in this region (Figure S1A).

We identified eight genes and six intergenic loci in this region. Genes showed fewer events per base pair than intergenic loci (Fig. 2d). The most divergent loci were rps15-rpl32 and ndhD-ndhF. These loci possessed different proportions of SNPs/InDels, with a high predominance of deletions observed in ndhD-ndhF. This divergence was associated with %A in the loci.

K2P divergence in genes and intergenes

The loci with the highest mutation rates in the SSC region were identified and evaluated using K2P analysis (Table 2). Their divergence was assessed based on resolution percentage to determine their suitability as phylogenetic markers. Intergenic loci were more divergent than the genic regions, with ndhD-ndhF and rps15-rpl32 emerging as the most effective markers. Although matK and rbcL are widely used for DNAbc [30–32, 55], they performed poorly at this level of phylogenetic proximity, making them unsuitable candidates. rps15-rpl32 was the best among the groups, with 17.6% resolution and modest levels of intra- and intergenic divergence.
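For reference, the K2P distance between two aligned sequences depends on the proportion of transitional differences (P) and transversional differences (Q): d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below implements this standard formula. Skipping gaps and ambiguous bases is an assumption made here, and the code does not reproduce the %K2P resolution calculation reported in Table 2.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: gaps and ambiguous bases are ignored
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError when the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten compared sites (invented example sequences).
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(round(d, 6))
```

For closely related plastid sequences such as those compared here, P and Q are small, so the K2P distance stays close to the raw proportion of differing sites.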

This shows that the Peru group (in this study) is the most divergent, as it has two pools of genomes with high and low mutation rates (Table 2, Fig. 5b).

Biological significance of rps15-rpl32 and ndhF mutation bias at the SSC region

The SSC region is a genomic segment that harbors the genes involved in electron transport and photosynthetic regulation. The high mutation frequency in this region in different Zea mays varieties, owing to distinct ancestral variety crosses, is likely due to the genes within this zone. The main loci identified in the present study are as follows: part of the ndh-cluster (subunits A, D, E, F, G, H, and I), psaC, ccsA, and rpl32.

The ndh-cluster is a chloroplast NAD(P)H dehydrogenase-like complex; most of its subunits are in the SSC region [56]. The NDH-complex is involved in the redox-level regulation of cyclic electron transport (CET) during photosynthesis. It is located in the thylakoid membrane, near photosystem I (PSI). This complex transfers electrons from PSI to the plastoquinone pool, using it as a motor force to pump protons across the thylakoid membrane [57]. This mechanism is necessary for the adjustment of the electron flux [56]. The NDH levels in C4 plants (such as maize) are higher than those in C3 plants [58], indicating that NDH-CET is highly necessary for their cellular process [59].

The NDH subunit NDH-F, encoded by ndhF, is located at the complementary strand in the 3′−5′ direction in the Zea mays plastid genome. It is co-localized within the rpl32-rps15 intergenic locus at the 5′−3′ template strand (Fig. 2b). NDH-F plays a role in the adaptive response to changes in light intensity in tobacco plants [60]. Therefore, the high frequency of mutations observed in this locus, along with the phylogenetic inferences based on the rpl32-rps15 locus (Fig. 4b), could be explained by the edaphoclimatic conditions and light radiation levels of the regions where the plants were cultivated. In these environments, the photosynthetic machinery may have adapted to local conditions, possibly through the SNPs and InDels observed in the rpl32-rps15/ndhF locus, which regulate photosynthesis by balancing electron flux and proton gradients across the thylakoid membrane.

Knockout studies in tobacco plants have shown that rpl32 and rps15 play key roles in plant development and photosynthesis: (i) rpl32 deletion results in abnormal leaf development, indicating its necessity for proper organ formation [61], whereas (ii) rps15 knockout leads to a lower chlorophyll a:b ratio and reduced quantum efficiency of photosystem II (PSII) [61]. These findings suggest that rpl32 and rps15 may be involved in the regulation of PSII components and factors that control leaf development.

ndhF plays a role in the adaptation of plants to changes in light intensity [60]. Similarly, rpl32 and rps15 affect leaf development and PSII efficiency [61]. Thus, the high mutational frequency observed in these loci may reflect adaptive responses to the specific edaphoclimatic and light conditions where these maize varieties were traditionally cultivated. This is consistent with the phylogenetic inferences shown in the present study.

rps15-rpl32 as a phylogeographic marker vs other common DNAbc loci

The use of rbcL as a control for our phylogenetic analysis showed null resolution, which can be explained by the low frequency of SNPs and indels (Fig. 4). ndhF performed better in the phylogenetic trees, possibly because of the almost triple-frequency levels of SNPs and InDels. Although changes from A to C or T were observed in both cases, deletions of A were more strongly related in the genomes analyzed [62].

The intergenes showed better results, owing to the high frequency of SNPs and InDels [63] (Fig. 4a–b). Although both regions were similar in length (~3000 bp) and %GC (~30%), they exhibited markedly different mutation frequencies per base pair. The mutation frequency of the ndhD-ndhF locus was almost six times higher than that observed for rps15-rpl32. Although the gene with the highest mutation frequency performed the best in the generation of phylogenetic trees, this pattern did not hold for intergenic regions.

Both loci exhibited a notable bias toward SNPs involving changes in A. Additionally, ndhD-ndhF displayed a high frequency of insertions, which were predominantly A and T insertions. In contrast, rps15-rpl32 was characterized by modest levels of T insertions and A deletions. This showed a clear tendency for genomic changes to favor adenine and thymine substitutions, with an observed increase in A/T enrichment as the number of mutated base pairs increased.

Complementary RSCU analysis revealed a strong tendency (> 1) for the third nucleotide position to favor A or T in rps15-rpl32. Although this is a non-coding region, it exhibits a distinct trinucleotide composition bias (Table S3), suggesting that it may be subject to selective pressure or mutational biases. This phenomenon has been widely described [62]. rps15 (encoding the 30S subunit of the 70S ribosome) and rpl32 (encoding the 50S subunit of the 70S ribosome) undergo polycistronic transcription in the chloroplast [64]. The intergenic region between these genes may have regulatory roles in the synthesis of polycistronic mRNA, as there are no Shine-Dalgarno sequences (data not shown). The high A/T composition of SNPs and InDels [62, 65] suggests that these regulatory roles involve some mRNA secondary structures, such as stem loops [66]. Furthermore, these changes toward A/T may be selected more because A/T-rich regions are often preferred in chloroplasts, as their replication and transcription consume less energy [67].
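Relative synonymous codon usage (RSCU) divides the observed count of a codon by the count expected if all synonymous codons for that amino acid were used equally, so values above 1 indicate over-representation. The sketch below computes RSCU with the standard genetic code. Reading triplets in frame from the first base is an assumption made for illustration; the frame and software used for Table S3 are not stated in this section.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequence):
    """Return {codon: RSCU} for every synonymous family seen in the sequence."""
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial triplet
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for members in families.values():
        total = sum(counts[c] for c in members)
        if total == 0:
            continue  # family not observed; RSCU undefined
        expected = total / len(members)  # equal-usage expectation
        for c in members:
            values[c] = counts[c] / expected
    return values


# Invented example: alanine codons GCT x3 and GCA x1.
example = rscu("GCTGCTGCTGCA")
print(example["GCT"], example["GCA"], example["GCC"])
```

A third-position A/T preference would appear as RSCU values above 1 for codons ending in A or T within each synonymous family.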

The rps15-rpl32 locus primarily showed good results because it has a moderate number of variable sites (14), InDels, and a few parsimony-informative sites (4), suggesting a balance between variability and conservatism (Table S2). This is ideal for phylogeny because a region with too many conserved sites provides minimal information, whereas too much variability can introduce noise [68, 69], as is the case for ndhF-ndhD. The ndhF-ndhD locus has a similar length and %GC to rps15-rpl32 but exhibits excessive mutation frequency (Fig. 4a–c). This is not suitable for the phylogeny of varieties of the same species, as they are genetically similar within the same variety but slightly differ from other varieties.
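The site categories mentioned above can be counted directly from an alignment. In the sketch below, a column is variable if it contains at least two different bases, and parsimony-informative if at least two bases each occur in at least two sequences; the remaining variable columns are singletons. The function name, the toy sequences, and the choice to ignore gaps and ambiguous characters are assumptions for illustration, not the procedure used for Table S2.

```python
from collections import Counter


def site_summary(aligned_seqs):
    """Count variable, parsimony-informative, and singleton columns."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")
    variable = informative = 0
    for column in zip(*aligned_seqs):
        # Assumption: gaps and ambiguous characters are ignored.
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        if len(counts) < 2:
            continue  # conserved column
        variable += 1
        # Informative: at least two states, each present in >= 2 sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    return {
        "variable": variable,
        "parsimony_informative": informative,
        "singleton": variable - informative,
    }


# Invented four-sequence alignment.
summary = site_summary(["ACGTA", "ACGTA", "ATGTC", "ATGAA"])
print(summary)
```

Only parsimony-informative columns can favor one tree topology over another under parsimony, which is why a locus with few variable sites can still resolve clades if those sites are shared among samples.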

High variability levels have been observed in noncoding regions at the SSC region (such as ndhF-rpl32 or rpl32-trnL [51, 54, 70–72]). However, these regions are not used for phylogenetic analysis, likely because they are not suitable for common ancestor phylogeny reconstruction and may lack the IRb-copy of rps15 [54, 71, 72].

The size and complexity of the variable and conserved sites of rps15-rpl32 is average, allowing it to retain sufficient phylogenetic information without being prone to irrelevant changes.

Phylogenetic trees based on high- and low-divergence intergenic regions

The locus with the highest %K2P value was the most effective for constructing a well-resolved phylogenetic tree of the 54 maize plastid genomes (Figs. 3 and 4). This was confirmed by the rps15-rpl32 tree, which delineated seven distinct clades. These results were supported by moderate-to-high bootstrap values ranging from 50 to 80 in each clade. Although the rps15-rpl32 tree showed the lowest log-likelihood value (–4235.5), this is expected because it has a longer sequence length [73] than other loci (rbcL: 1.4 kb; ndhF: 2.2 kb; ndhF-ndhD: 2.8 kb; rps15-rpl32: 3.1 kb). This trend is further supported by the minimal difference between the log-likelihood values of rps15-rpl32 and the combined rpl23-rpl2 + rps15-rpl32 tree (–4235.5 vs. –4320.2), which also reflects their similar alignment lengths. When interpreting phylogenetic resolution, the number of informative sites and clade support (bootstrap values) become more relevant than the log-likelihood parameter alone. Despite its moderate mutation frequency, rps15-rpl32 provided well-supported clades and was more effective in resolving relationships among regional genomes in the present study.
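The dependence of log-likelihood on sequence length can be illustrated by dividing each value by the number of sites. The sketch below uses the consensus-tree log-likelihoods from the Fig. 4 caption and the rounded locus lengths quoted above. Treating those rounded lengths as alignment lengths is an assumption, so the per-site values are indicative only, and log-likelihoods from different alignments are not strictly comparable.

```python
# Consensus-tree log-likelihoods reported in the Fig. 4 caption.
log_likelihood = {"ndhF-ndhD": -3889.123, "rps15-rpl32": -4235.487}

# Approximate lengths in bp, taken from the rounded kb values in the text
# (assumption: used here in place of the true alignment lengths).
length_bp = {"ndhF-ndhD": 2800, "rps15-rpl32": 3100}


def per_site_log_likelihood(lnl, n_sites):
    """Average log-likelihood contribution per alignment site."""
    if n_sites <= 0:
        raise ValueError("number of sites must be positive")
    return lnl / n_sites


per_site = {
    locus: per_site_log_likelihood(log_likelihood[locus], length_bp[locus])
    for locus in log_likelihood
}
for locus, value in per_site.items():
    print(f"{locus}: {value:.4f} per site")
```

On this rough per-site scale, the two loci are close to each other, which is in line with the argument that the lower total for rps15-rpl32 mainly reflects its greater length.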

This locus was concatenated with a short intergenic region of low divergence (rpl23-rpl2) to enhance the resolution, which raised the %K2P resolution from 17.64% to 25.49% (Table 3). As anticipated, the final tree successfully grouped the genomes into seven well-defined clades.

Five maize varieties (Huehuetenangensis, A188, Potinha, Xingu-Waura, and Confite Puntiagudo) were not placed in their expected clades. This discrepancy was likely attributable to the inherent challenge of identifying a single locus (or multiple loci) capable of adequately representing every genome. Such loci must strike a delicate balance in divergence, neither too high nor too low, to ensure that subspecies of the same variety are sufficiently similar while remaining distinct from those of other varieties. This narrow range of divergence poses a significant constraint on the resolutive capacity of the marker, a characteristic that was consistently highlighted in the present study. The opposite effect occurred for the SA2 group, where the branches at the clade level were not distinguishable because of a low number of mutation events.

These results confirm that rps15-rpl32 can reflect the actual evolution of the whole genome.

K2P may not be the best divergence indicator for phylogenetically close maize varieties

Are the K2P resolution values representative of divergence in close intraspecific varieties of the same species? Our findings align with those of Srivathsan and Meier [74], who reported that the widespread use of the K2P model in DNA barcoding does not necessarily enhance the resolution of closely related sequences (Tables 2 and 3). K2P distances were associated with locus divergence in the present study, producing lower and higher values for conserved and divergent loci, respectively. However, these distances do not fully represent the phylogenetic resolution observed in our data. Although the best-performing locus showed a 17% resolution, the phylogenetic tree delineated seven well-defined groups, suggesting a markedly higher effective resolution. This resolution reached 25.5% after concatenation with the same groups. The increased resolution indicates improved distribution, which is not consistent with our results. This limitation underscores the importance of using complementary metrics and visualization techniques, as K2P distances alone may underestimate the taxonomic resolution achievable with certain loci, especially in closely related maize varieties [74].

Association of rps15-rpl32 locus phylogeny-inference clades with SSR genetic diversity

Bedoya et al. [75] classified Uchuquilla, Kculli, and Chullpi within the South America–Andean region group (specifically in the Bolivian highland subgroup). This is consistent with our results (Fig. 5a), where they were clustered within the SA1 clade. In the same study, Morocho (classified as SA2 in our study) was placed in the Highland Andean subgroup within the South America–Andean region group, whereas Mochero (SA2) was assigned to the Central Highland Andean subgroup.

Our results are consistent with those of Bedoya et al. [75] and Vigouroux et al. [76] in terms of genomic-geographic background. The SA group represents the plastid genomic lineage of the South America–Andean group, with the SA1 clade corresponding to Bolivian highland maize and SA2 to the Highland Andean. Additionally, our results confirm the existence of a USA group [76] in our US clade (Fig. 5a) and a Highland Mexican group, which Vigouroux et al. [76] subdivided into two major clades in a neighbor-joining phylogeny (corresponding to MX1 and MX2 in Fig. 5a). Overall, our findings suggest that the previous SSR-based genetic background studies on Latin American maize races [75, 76] align with the plastid genomic background, showing a strong correlation.

Some genomes do not fit in the clades within the phylogenetic trees

Finding a locus or loci that can function in different varieties is challenging, especially when the maize varieties are phylogenetically close and the chloroplast evolves considerably slowly [77, 78]. This is mostly because of non-sexual uniparental inheritance [79], in which genes and intergenic regions evolve at different rates ([80, 81]; Fig. 1a) that differ among varieties/accessions [82] (Figs. 3 and 4). This effect was observed in phylogenetic analysis [76], not only for possible introgression in the regional germplasm but also because we used partial genomic information from only one small genome of the three that compose the plant cell.

Huehuetenangensis and A188 were just outside of their respective clades but close to the CN-USA group, likely because the loci used for the marker have evolutionary patterns similar to those that compose the phylogeography of the CN-USA clades (Fig. 5). However, Potinha, Xingu-Waura, and Confite Puntiagudo were identified as outgroups, as if they were from another species. They shared similar patterns of SNP and InDels that fell into an intermediate range (IR) of events, which likely could not be reflected in the selected phylogenetic-regional marker. This demonstrates the need for other loci that can facilitate the generation of improved results, such as ITS1 and ITS2 [31, 83, 84]. However, not every variety has a sequenced nuclear genome. Additionally, reliable morphological or genetic background information is not available for each genome used in this study. This limitation stems from the in-silico nature of our research and the inherent complexity of collecting such data. However, future studies could mitigate this challenge by incorporating complementary nuclear markers and morphological or ecological data where available. This approach would enable a more comprehensive interpretation of the observed evolutionary relationships, such as those observed in Xingu-Waura or Potinha, and facilitate the geographical classification of chloroplast genomes. Therefore, utilization of the plastid genome remains the best and most universal approach.

Finally, the varieties that should be conserved — those in the LR-group — as well as those considered ancestral, such as Kculli and Enano, or those derived from early inbreeding programs, such as B73 and A188, exhibit minimal divergence at the rps15-rpl32 locus. This suggests that these genomes have remained relatively stable over time, possibly representing ancient parental lineages. In contrast, more recent landraces that have emerged through traditional farming practices with minimal to zero artificial selection, such as Iqueño, or from recently improved varieties, such as INIA601, show greater divergence at this locus. This suggests that this locus is susceptible to evolutionary changes in populations not exposed to strong selective pressures, placing them within the HR-group. This pattern aligns with the classification provided by the Ministry of Agriculture and Irrigation of Peru [85], which categorized some of the varieties investigated in the present study as follows: (i) Primitive: Kculli (zero changes), Enano (LR), and Confite Puntiagudo (IR); (ii) Ancient: Chullpi (LR), Cuzco (LR), Morocho-Cajabambino (LR), Pagaladroga (LR), and Uchuquilla (LR); and (iii) Incipient: Huancavelicano (HR), INIA601 (HR), Iqueño (HR), Mochero (HR InDels), and Mole (HR).

Conclusions

This study demonstrated that concatenated loci (rps15-rpl32_rpl23-rpl2) can identify the regional origin of different maize varieties and subspecies, associate individuals of the same variety-subspecies, and differentiate between subpopulations with low- and high- intraspecific-variety divergence. The main properties of rps15-rpl32 include its suitable length for the identification of mutation, its classification as a noncoding sequence, as well as high %A and low %GC, owing to a high bias from A and T in SNPs and InDels. This observation suggests that the evolutionary dynamics of the chloroplast genome are shaped to preserve mutations favoring A and T. This pattern may represent an evolutionary signal reflecting selective pressures or functional constraints that prioritize the development and maintenance of A/T-rich sequences.

These results indicate that rps15-rpl32 is an effective phylogenetic marker for individuals with recently established geographical relation, rather than their common ancestors. The remarkable phylogenetic capabilities of the rps15-rpl32 intergenic locus may be due to its divergence in recent years. Highly conserved varieties, as well as those considered ancestral or products of ancient inbreeding programs, do not show significant divergence at this locus. In contrast, more recent varieties that have not been subjected to artificial selection have revealed that this locus is susceptible to substantial mutations.

The rps15-rpl32 locus suggests selective pressures in the SSC region. This likely facilitates the adaptations of C4 photosynthetic efficiency in response to environmental variation in areas where Zea mays landraces are cultivated. This characteristic makes this locus useful for different landraces and as a subspecies marker, as it facilitates the easy and rapid screening of Zea mays subspecies.

Supplementary Information

12864_2025_11831_MOESM1_ESM.docx (179.7KB, docx)

Supplementary Material 1. Figure S1. SNP frequency and patterns (A) and InDels (B) at the SSC region across all analyzed Zea mays genomes.

12864_2025_11831_MOESM2_ESM.xlsx (25.5KB, xlsx)

Supplementary Material 2. Table S1. Assembly quality analysis. Genomes were assembled and evaluated using NOVOwrap (Wu et al., 2021) and the QUAST tool (Gurevich et al., 2004), respectively. The variety assembled, as well as the GenBank and SRA accession numbers, are shown. Table S2. Nucleotide composition; GC content; and analysis of variable, conserved, informative, parsimony sites, and singletons for each locus. Table S3. RSCU analysis of the rps15-rpl32 region. Bold letters indicate the highest RSCU value

Supplementary Material 3.  (120.1KB, docx)

Abbreviations

SSR: Simple sequence repeats
SNP: Single nucleotide polymorphism
DNAbc: DNA barcoding
IRa: Inverted repeat A
IRb: Inverted repeat B
SSC: Small single copy
LSC: Large single copy
ML: Maximum likelihood
K2P: Kimura 2-parameter
PE: Peru
MXP: Mexico–Parviglumis
MXR: Mexico–RIMME
CN: China
US: United States of America
BR: Brazil
IN: India
bt: Bootstrap support value

Authors’ contributions

LU, FP, and MP: conceptualization, methodology, investigation, and formal analysis; LU: writing—original draft and visualization; FP, CR, and MP: writing—review and editing; MP: resources and supervision. All authors read and approved the final manuscript.

Funding

This work was supported by the FIC-Ñuble (grant number 40035912–0/2022). The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Data availability

All data generated or analyzed during this study are available in the main text, supplementary materials, public databases, or referenced permanent online repositories. The accession numbers of sequence data used in this study are listed in Table S1.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.López E. El maíz En América Latina: Contaminación del centro de origen del maíz. In: Revista Semillas. 2005.https://www.semillas.org.co/es/el-maz-en-amrica-latina-contaminacin-del-centro-de-origen-del-maz.
  • 2.Guzzon F, Arandia Rios LW, Caviedes Cepeda GM, Céspedes Polo M, Chavez Cabrera A, Muriel Figueroa J, et al. Conservation and use of Latin American maize diversity: Pillar of nutrition security and cultural heritage of humanity. Agronomy. 2021;11:172. 10.3390/agronomy11010172. [Google Scholar]
  • 3.Saavedra G. Clasificacion Botanica, germinacion y Desarrollo. In: Boletin INIA - Instituto de Investigaciones Agropecuarias, No. 303. Santiago. 2014. https://hdl.handle.net/20.500.14001/7803. Accessed 18 Oct 2023.
  • 4.Paliwal R, Granados G, Lafitte H, Violic A. El Maíz En Los Trópicos: Mejoramiento y producción. 2001. https://www.fao.org/3/x7650S/x7650s00.htm.
  • 5.Montoro AE, Ruiz MB. Ecofisiología del cultivo de maíz dulce (Zea mays L. var. saccharata). Hortic Argent. 2017;36:153–66 https://hdl.handle.net/20.500.12123/4402. [Google Scholar]
  • 6.OECD Environment Health and Safety Publications. Consensus Document on the Biology of Zea mays subsp. mays (maize). 2003. http://www.oecd.org/biotrack/.
  • 7.Food and Agriculture Organization of the United Nations. FAOSTAT: Crops and livestock products. 2024. https://www.fao.org/faostat. Accessed 13 Dec 2024.
  • 8.Eguillor P. ¿Qué son las Indicaciones Geográficas y las Denominaciones de Origen?. 2015. https://www.odepa.gob.cl/wp-content/uploads/2015/12/IDyDO2015.pdf.
  • 9.Olivos MC, Carrasco F. Agregar Valor a Los Productos Tradicionales de Chile Con El Sello de Origen.: Revista de la OMPI. 2016. https://www.wipo.int/es/web/wipo-magazine/articles/adding-value-to-chiles-heritage-products-with-the-emsello-de-origenem-39597.
  • 10.Salmeri C. Plant morphology: outdated or advanced discipline in modern plant sciences? Flora Mediterr. 2019;29:163–80. 10.7320/FlMedit29.163. [Google Scholar]
  • 11.Tessler M, Galen SC, DeSalle R, Schierwater B. Let’s end taxonomic blank slates with molecular morphology. Front Ecol Evol. 2022;10:1016412. 10.3389/fevo.2022.1016412. [Google Scholar]
  • 12.Zhao S, Chen X, Song J, Pang X, Chen S. Internal transcribed spacer 2 barcode: a good tool for identifying Acanthopanacis cortex. Front Plant Sci. 2015;6:840. 10.3389/fpls.2015.00840. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Zhang W, Tian W, Gao Z, Wang G, Zhao H. Phylogenetic utility of rRNA ITS2 sequence-structure under functional constraint. Int J Mol Sci. 2020;21:6395. 10.3390/ijms21176395. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Unamba CIN, Nag A, Sharma RK. Next generation sequencing technologies: The doorway to the unexplored genomics of non-model plants. Front Plant Sci. 2015;6:1074. 10.3389/fpls.2015.01074. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Salazar E, González M, Araya C, Mejía N, Carrasco B. Genetic diversity and intra-racial structure of Chilean Choclero corn (Zeamays L.) germplasm revealed by simple sequence repeat markers (SSRs). Sci Hortic. 2017;225:620–9. 10.1016/j.scienta.2017.08.006. [Google Scholar]
  • 16.Caldu-Primo JL, Mastretta-Yanes A, Wegier A, Piñero D. Finding a needle in a haystack: distinguishing mexican maize landraces using a small number of SNPs. Front Genet. 2017;8:45. 10.3389/fgene.2017.00045. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Kress WJ, García-Robledo C, Uriarte M, Erickson DL. DNA barcodes for ecology, evolution, and conservation. Trends Ecol Evol. 2015;30:25–35. 10.1016/j.tree.2014.10.008. [DOI] [PubMed] [Google Scholar]
  • 18.Kress WJ, Erickson DL. A two-locus global DNA barcode for land plants: The coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE. 2007;2:e508. 10.1371/journal.pone.0000508. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Ratnasingham S, Hebert PDN. BOLD: The barcode of life data system (http://www.barcodinglife.org). Mol Ecol Notes. 2007;7:355-64. 10.1111/j.1471-8286.2007.01678.x. [DOI] [PMC free article] [PubMed]
  • 20.Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, et al. Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res. 2007;35:e14. 10.1093/nar/gkl938. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Savolainen V, Cowan RS, Vogler AP, Roderick GK, Lane R. Towards writing the encyclopedia of life: an introduction to DNA barcoding. Philos Trans R Soc Lond B Biol Sci. 2005;360:1805–11. 10.1098/rstb.2005.1730. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Yang F, Ding F, Chen H, He M, Zhu S, Ma X, et al. DNA barcoding for the identification and authentication of animal species in traditional medicine. Evid Based Complement Alternat Med. 2018;2018:5160254. 10.1155/2018/5160254. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH. Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci U S A. 2005;102:8369–74. 10.1073/pnas.0503123102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Xu J. Fungal DNA barcoding. Genome. 2016;59:913–32. 10.1139/gen-2016-0046. [DOI] [PubMed] [Google Scholar]
  • 25.Yi DK, Lee HL, Sun BY, Chung MY, Kim KJ. The complete chloroplast DNA sequence of Eleutherococcus senticosus (Araliaceae); comparative evolutionary analyses with other three Asterids. Mol Cells. 2012;33:497–508. 10.1007/s10059-012-2281-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Wu L, Nie L, Wang Q, Xu Z, Wang Y, He C, et al. Comparative and phylogenetic analyses of the chloroplast genomes of species of Paeoniaceae. Sci Rep. 2021;11:14643. 10.1038/s41598-021-94137-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Long L, Li Y, Wang S, Liu Z, Wang J, Yang M. Complete chloroplast genomes and comparative analysis of Ligustrum species. Sci Rep. 2023;13:212. 10.1038/s41598-022-26884-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.GenBank. National Center for Biotechnology Information, Bethesda. Zea mays chloroplast genome, B73 reference genome (GENBank BioProject PRJNA10769). 2024. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA10769.
  • 29.Maier RM, Neckermann K, Igloi GL, Kössel H. Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol. 1995;251:614–28. 10.1006/jmbi.1995.0460. [DOI] [PubMed] [Google Scholar]
  • 30.Lahaye R, van der Bank M, Bogarin D, Warner J, Pupulin F, Gigot G, et al. DNA barcoding the floras of biodiversity hotspots. Proc Natl Acad Sci USA. 2008;105:2923–8. 10.1073/pnas.0709936105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Raskoti BB, Ale R. DNA barcoding of medicinal orchids in Asia. Sci Rep. 2021;11:23651. 10.1038/s41598-021-03025-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Li H, Xiao W, Tong T, Li Y, Zhang M, Lin X, et al. The specific DNA barcodes based on chloroplast genes for species identification of Orchidaceae plants. Sci Rep. 2021;11:1424. 10.1038/s41598-021-81087-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Pang X, Liu C, Shi L, Liu R, Liang D, Li H, et al. Utility of the trnH–psbA intergenic spacer region and its combinations as plant DNA barcodes: a meta-analysis. PLoS ONE. 2012;7:e48833. 10.1371/journal.pone.0048833. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Okoth P, Muoma J, Emmanuel M, Clabe W, Omayio DO, Angienda PO. The potential of DNA barcode-based delineation using seven putative candidate loci of the plastid region in inferring molecular diversity of cowpea at sub-species level. Am J Mol Biol. 2016;6:138–58. 10.4236/ajmb.2016.64014. [Google Scholar]
  • 35.Raskoti BB, Jin W, Xiang X, Schuiteman A, Li D, Li J, et al. A phylogenetic analysis of molecular and morphological characters of Herminium (Orchidaceae, Orchideae): evolutionary relationships, taxonomy, and patterns of character evolution. Cladistics. 2016;32:198–210. 10.1111/cla.12125. [DOI] [PubMed] [Google Scholar]
  • 36.Kistler L, Maezumi SY, Gregorio de Souza J, Przelomska NAS, Malaquias Costa F, Smith O, et al. Multiproxy evidence highlights a complex evolutionary legacy of maize in South America. Science. 2018;362:1309–13. 10.1126/science.aav0207. [DOI] [PubMed] [Google Scholar]
  • 37.Andrews S. FastQC: A quality control tool for high throughput sequence data [Computer software]. Babraham Bioinformatics, Cambridge. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
  • 38.Wu P, Xu C, Chen H, Yang J, Zhang X, Zhou S. NOVOWrap: An automated solution for plastid genome assembly and structure standardization. Mol Ecol Resour. 2021;21:2177–86. 10.1111/1755-0998.13410. [DOI] [PubMed] [Google Scholar]
  • 39.Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq – Versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45:W6–11. 10.1093/nar/gkx391. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5. 10.1093/bioinformatics/btt086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Li H, Guo Q, Xu L, Gao H, Liu L, Zhou X. CPJSdraw: analysis and visualization of junction sites of chloroplast genomes. PeerJ. 2023;11:e15326. 10.7717/peerj.15326. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14:e1005944. 10.1371/journal.pcbi.1005944. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Jiao Y, Peluso P, Shi J, Liang T, Stitzer MC, Wang B, et al. Improved maize reference genome with single-molecule technologies. Nature. 2017;546:524–7. 10.1038/nature22971. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Wickham H, François R, Henry L, Müller K. dplyr: A grammar of data manipulation. R package version 1.1.3. 2023. https://CRAN.R-project.org/package=dplyr
  • 45.Kanaries. PyGWalker: A python library for exploratory data analysis with visualization. 2023. https://github.com/Kanaries/pygwalker.
  • 46.Wickham H. ggplot2: Elegant graphics for data analysis. Springer-Verlag, Berlin. 2016. https://ggplot2.tidyverse.org
  • 47.Grant JR, Enns E, Marinier E, Mandal A, Herman EK, Chen CY, et al. Proksee: In-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. 2023;51:W484–92. 10.1093/nar/gkad326. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30:772–80. 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4. 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Paradis E, Schliep K. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–8. 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]
  • 51.Shaw J, Lickey EB, Schilling EE, Small RL. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: The tortoise and the hare III. Am J Bot. 2007;94:275–88. 10.3732/ajb.94.3.275. [DOI] [PubMed] [Google Scholar]
  • 52.Saldaña CL, Rodriguez-Grados P, Chávez-Galarza JC, Feijoo S, Guerrero-Abad JC, Vásquez HV, et al. Unlocking the complete chloroplast genome of a native tree species from the Amazon Basin, Capirona (Calycophyllum spruceanum, Rubiaceae), and its comparative analysis with other Ixoroideae species. Genes (Basel). 2022;13:113. 10.3390/genes13010113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Li QJ, Su N, Zhang L, Tong RC, Zhang XH, Wang JR, et al. Chloroplast genomes elucidate diversity, phylogeny, and taxonomy of Pulsatilla (Ranunculaceae). Sci Rep. 2020;10:19781. 10.1038/s41598-020-76699-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Yu W, Li XJ, Lv Z, Yang LE, Peng DL. The complete chloroplast genome sequences of monotypic genus Pseudogalium, and comparative analyses with its relative genera. BMC Genomics. 2025;26:93. 10.1186/s12864-025-11276-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Ho VT, Tran TKP, Vu TTT, Widiarsih S. Comparison of matK and rbcL DNA barcodes for genetic classification of jewel orchid accessions in Vietnam. J Genet Eng Biotechnol. 2021;19:93. 10.1186/s43141-021-00188-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Martín M, Sabater B. Plastid ndh genes in plant evolution. Plant Physiol Biochem. 2010;48:636–45. 10.1016/j.plaphy.2010.04.009. [DOI] [PubMed] [Google Scholar]
  • 57.Laughlin TG, Bayne AN, Trempe JF, Savage DF, Davies KM. Structure of the complex I-like molecule NDH of oxygenic photosynthesis. Nature. 2019;566:411–4. 10.1038/s41586-019-0921-0. [DOI] [PubMed] [Google Scholar]
  • 58.Ishikawa N, Takabayashi A, Noguchi K, Tazoe Y, Yamamoto H, von Caemmerer S, et al. NDH-mediated cyclic electron flow around photosystem I is crucial for C4 photosynthesis. Plant Cell Physiol. 2016;57:2020–8. 10.1093/pcp/pcw127. [DOI] [PubMed] [Google Scholar]
  • 59.Ma M, Liu Y, Bai C, Yong JWH. The significance of chloroplast NAD(P)H dehydrogenase complex and its dependent cyclic electron transport in photosynthesis. Front Plant Sci. 2021;12:661863. 10.3389/fpls.2021.661863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Martín M, Funk HT, Serrot PH, Poltnigg P, Sabater B. Functional characterization of the thylakoid Ndh complex phosphorylation by site-directed mutations in the ndhF gene. Biochim Biophys Acta. 2009;1787:920–8. 10.1016/j.bbabio.2009.03.001. [DOI] [PubMed] [Google Scholar]
  • 61.Fleischmann TT, Scharff LB, Alkatib S, Hasdorf S, Schöttler MA, Bock R. Nonessential plastid-encoded ribosomal proteins in tobacco: a developmental role for plastid translation and implications for reductive genome evolution. Plant Cell. 2011;23:3137–55. 10.1105/tpc.111.088906. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Dong W, Xu C, Wen J, Zhou S. Evolutionary directions of single nucleotide substitutions and structural mutations in the chloroplast genomes of the family Calycanthaceae. BMC Evol Biol. 2020;20:96. 10.1186/s12862-020-01661-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63.Blank D, Wolf L, Ackermann M, Silander OK. The predictability of molecular evolution during functional innovation. Proc Natl Acad Sci USA. 2014;111:3044–9. 10.1073/pnas.1318797111. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Hess WR, Prombona A, Fieder B, Subramanian AR, Börner T. Chloroplast rps15 and the rpoB/C1/C2 gene cluster are strongly transcribed in ribosome-deficient plastids: Evidence for a functioning non-chloroplast-encoded RNA polymerase. EMBO J. 1993;12:563–71. 10.1002/j.1460-2075.1993.tb05688.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Yang C, Wang K, Zhang H, Guan Q, Shen J. Analysis of the chloroplast genome and phylogenetic evolution of three species of Syringa. Mol Biol Rep. 2023;50:665–77. 10.1007/s11033-022-08004-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Chen HC, Stern DB. Specific binding of chloroplast proteins in vitro to the 3′ untranslated region of spinach chloroplast petD mRNA. Mol Cell Biol. 1991;11:4380–8. 10.1128/mcb.11.9.4380. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Chakraborty S, Sophiarani Y, Uddin A. Free energy of mRNA positively correlates with GC content in chloroplast transcriptomes of edible legumes. Genomics. 2021;113:2826–38. 10.1016/j.ygeno.2021.06.026. [DOI] [PubMed] [Google Scholar]
  • 68.Santos L, Alves A, Alves R. Evaluating multi-locus phylogenies for species boundaries determination in the genus Diaporthe. PeerJ. 2017;5:e3120. 10.7717/peerj.3120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and deletions: Computational methods, evolutionary dynamics, and biological applications. Mol Biol Evol. 2024;41:msae177. 10.1093/molbev/msae177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Kim K, Lee SC, Lee J, Lee HO, Joh HJ, Kim NH, et al. Comprehensive survey of genetic diversity in chloroplast genomes and 45S nrDNAs within Panax ginseng species. PLoS ONE. 2015;10:e0117159. 10.1371/journal.pone.0117159. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Ye J, Luo Q, Lang Y, Ding N, Jian YQ, Wu ZK, et al. Analysis of chloroplast genome structure and phylogeny of the traditional medicinal of Ardisia crispa (Myrsinaceae). Sci Rep. 2024;14:19045. 10.1038/s41598-024-66563-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Li DM, Pan YG, Liu HL, Yu B, Huang D, Zhu GF. Thirteen complete chloroplast genomes of the Costaceae family: Insights into genome structure, selective pressure and phylogenetic relationships. BMC Genomics. 2024;25:68. 10.1186/s12864-024-09996-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Smirnov V, Warnow T. Phylogeny estimation given sequence length heterogeneity. Syst Biol. 2021;70:268–82. 10.1093/sysbio/syaa058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 74.Srivathsan A, Meier R. On the inappropriate use of Kimura-2-parameter (K2P) divergences in the DNA-barcoding literature. Cladistics. 2012;28:190–4. 10.1111/j.1096-0031.2011.00370.x. [DOI] [PubMed] [Google Scholar]
  • 75.Bedoya CA, Dreisigacker S, Hearne S, Franco J, Mir C, Prasanna BM, et al. Genetic diversity and population structure of native maize populations in Latin America and the Caribbean. PLoS ONE. 2017;12:e0173488. 10.1371/journal.pone.0173488. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76.Vigouroux Y, Glaubitz JC, Matsuoka Y, Goodman MM, Sánchez GJ, Doebley J. Population structure and genetic diversity of New World maize races assessed by DNA microsatellites. Am J Bot. 2008;95:1240–53. 10.3732/ajb.0800097. [DOI] [PubMed] [Google Scholar]
  • 77.Provan J, Soranzo N, Wilson NJ, Goldstein DB, Powell W. A low mutation rate for chloroplast microsatellites. Genetics. 1999;153:943–7. 10.1093/genetics/153.2.943. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 78.Smith DR. Mutation rates in plastid genomes: They are lower than you might think. Genome Biol Evol. 2015;7:1227–34. 10.1093/gbe/evv069. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 79.Christie JR, Beekman M. Uniparental inheritance promotes adaptive evolution in cytoplasmic genomes. Mol Biol Evol. 2016;34:677–91. 10.1093/molbev/msw266. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 80.Andrews TD, Jermiin LS, Easteal S. Accelerated evolution of cytochrome b in simian primates: adaptive evolution in concert with other mitochondrial proteins? J Mol Evol. 1998;47:249–57. 10.1007/PL00006382. [DOI] [PubMed] [Google Scholar]
  • 81.Aoyagi Blue Y, Sakai S. Low mutation rates promote the evolution of advantageous traits by preventing interference from deleterious mutations. Genetica. 2020;148:101–8. 10.1007/s10709-020-00091-6. [DOI] [PubMed] [Google Scholar]
  • 82.Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–42. 10.1126/science.1138632. [DOI] [PubMed] [Google Scholar]
  • 83.Chen S, Yao H, Han J, Liu C, Song J, Shi L, et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS ONE. 2010;5:e8613. 10.1371/journal.pone.0008613. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Duan H, Wang W, Zeng Y, Guo M, Zhou Y. The screening and identification of DNA barcode sequences for Rehmannia. Sci Rep. 2019;9:17295. 10.1038/s41598-019-53752-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Ministerio de Desarrollo Agrario y Riego (MIDAGRI). El maíz morado peruano. Dirección General de Políticas Agrarias – Dirección de Estudios Económicos. 2021. https://repositorio.midagri.gob.pe/handle/20.500.13036/1152.

Associated Data


Supplementary Materials

12864_2025_11831_MOESM1_ESM.docx (179.7KB, docx)

Supplementary Material 1. Figure S1. SNP frequency and patterns (A) and InDels (B) at the SSC region across all analyzed Zea mays genomes.

12864_2025_11831_MOESM2_ESM.xlsx (25.5KB, xlsx)

Supplementary Material 2. Table S1. Assembly quality analysis. Genomes were assembled with NOVOWrap (Wu et al., 2021) and evaluated with QUAST (Gurevich et al., 2013). The assembled variety and its GenBank and SRA accession numbers are shown. Table S2. Nucleotide composition; GC content; and analysis of variable, conserved, informative, parsimony sites, and singletons for each locus. Table S3. RSCU analysis of the rps15-rpl32 region. Bold letters indicate the highest RSCU value.

Supplementary Material 3.  (120.1KB, docx)

Data Availability Statement

All data generated or analyzed during this study are available in the main text, supplementary materials, public databases, or referenced permanent online repositories. The accession numbers of sequence data used in this study are listed in Table S1.


Articles from BMC Genomics are provided here courtesy of BMC
