The rps15-rpl32 intergenic locus is a phylogeographic marker for Latin American Zea mays landrace varieties and subspecies

Luciano Univaso; Francisca Peña; Celián Román-Figueroa; Manuel Paneque

doi:10.1186/s12864-025-11831-3

2025 Jul 17;26:672.


PMCID: PMC12273293 PMID: 40676502

Abstract

Background

Maize (Zea mays L.) has deep cultural significance in Latin America, where traditional and native varieties have been cultivated for millennia. Approximately 220 maize races have been identified in the region. These races have adapted to diverse environmental conditions, resulting in considerable varietal diversity. Although DNA barcoding is widely used for species identification, distinguishing between varieties within a plant species remains challenging owing to the conserved nature of standard barcoding loci. Combinations of multiple marker loci are often used to address this limitation; however, they remain insufficient for variety-level resolution. Therefore, we developed a pipeline to identify robust markers that can distinguish maize varieties based on their phylogeographic origins.

Results

In this study, the small single-copy (SSC) region of the chloroplast genome exhibited the highest mutation rate per nucleotide. Within the SSC, the intergenic regions rps15-rpl32 and ndhD-ndhF exhibited the highest mutation rates, whereas the coding genes were more conserved. The rps15-rpl32 locus showed improved resolution for phylogeographic analysis when concatenated with a short genetic anchor sequence, and this marker accurately identified the geographic origin of maize samples. Overall, it was the most informative marker despite its relatively low SNP and InDel frequency and moderate divergence levels. In contrast, the ndhD-ndhF locus exhibited higher mutation rates but failed to resolve phylogenetic relationships effectively.

Conclusions

Our findings demonstrate that concatenated loci can accurately identify the geographic origin of Zea mays varieties and subspecies and elucidate their relationships. Moreover, the superior performance of rps15-rpl32 in the delineation of phylogenetic relationships among regional genomes shows its potential application as a marker for distinguishing closely related varieties and subspecies. This locus can facilitate the streamlined validation of Zea mays varieties for regional authenticity.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12864-025-11831-3.

Keywords: Maize, Phylogenetic markers, DNA barcoding, Chloroplast genome, Single-copy region

Background

Maize (Zea mays L.) is linked to various products, foods, practices, and traditions deeply rooted in the daily experiences of most Latin American countries [1, 2]. The significant role of maize in the diet of most Latin American nations is partly due to its millennia-old cultivation and close association with cultural identity in the region [1]. Each country throughout the Americas, especially in the Central and Southern regions, has developed its own landrace varieties and subspecies, reflecting the adaptation of maize to diverse local environments [1, 2].

Maize is a plant species of the genus Zea in the grass family (Poaceae), which also includes important agricultural crops such as wheat, rice, oats, and rye [3]. Cultivated maize is a domesticated plant that has evolved alongside humans since ancient times [4]. Traits related to yield and palatability have been artificially selected for centuries, giving rise to the most widely recognized varieties today: Zea mays var. amylacea (flour maize); Zea mays var. indurata (flint, or hard maize); Zea mays var. indentata (dent maize); Zea mays var. everta (popcorn); and Zea mays var. saccharata, which includes sweet and supersweet maize [4]. Maize is a summer-growing annual [5] with considerable plasticity, which allows it to adapt to different moisture, sunlight, altitude, and temperature conditions [6].

Global maize production increased from 800 million to 1,000 million tons between 2004 and 2022, with 51.8% of production coming from the Americas, mostly from the USA, Brazil, Argentina, Mexico, and Chile [7]. This makes maize a product of considerable economic and cultural importance.

Chile and other South American countries have shown increasing interest in protecting traditional and native crop varieties, such as maize, through certification systems such as Geographical Indications and Denominations of Origin [8]. However, these certifications are often based on morphological or organoleptic traits, which may be subjective or environmentally influenced [8, 9]. This demonstrates the need for accessible, objective molecular tools that can reliably differentiate varieties by their regional origin. In this study, we investigated whether hypervariable regions of the chloroplast genome can provide such a marker system.

The differentiation of maize plant variants from similar products can be ambiguous or subjective, as morphological descriptions of closely related species are falling into disuse [10]. Several solutions to the subjectivity of morphological descriptions in taxonomy have been proposed. These mainly include molecular morphology, such as protein structure and genomic architecture [11], mRNA secondary structure [12, 13], massively parallel sequencing techniques [14], simple sequence repeats (SSRs) [15], and single-nucleotide polymorphisms (SNPs) [16]. Although these methods have been widely used, they are time-consuming and costly, which limits their application in verifying regionally protected varieties.

An alternative method for identifying varieties, cultivars, ecotypes, or other biological entities that may be variants of the same species is DNA barcoding [17]. DNA barcoding is a taxonomic method that uses one or more short, standardized genetic markers to identify a particular species [18]. This method can be used to compare unknown DNA samples with registered species in a reference library [19]. A DNA barcode must contain sequence variation, conserved flanking loci, and a short target DNA region [20]. The advantages of this method include (i) universality: DNA barcode markers, such as the COI gene, have regions conserved across many species [21], which allows their application to numerous organisms, including animals [22], plants [23], and fungi [24]; and (ii) efficiency and simplicity: the region of a single gene, intergenic region, or spacer can be amplified and sequenced relatively quickly and cost-effectively using Sanger or next-generation sequencing technologies.

Plastid genomes have been widely used to identify various plant species and their phylogenetic relationships [25–27]. The chloroplast (cp) DNA of Zea mays is circular and double-stranded and comprises 140,447 base pairs (bp) in the B73 reference genome (GenBank BioProject PRJNA10769 [28]). The circular cpDNA contains four main regions [29]: two inverted repeat regions (IRa and IRb) of approximately 22.7 kbp each, separated by the small (SSC) and large (LSC) single-copy regions of approximately 12.5 kbp and 82.3 kbp, respectively [29].

DNA barcodes (DNAbc) from the chloroplast genome have been evaluated for universal application. Their degree of sequence divergence, in both coding (rbcL, rpoB, rpoC1, and matK) and non-coding (trnH-psbA) regions, individually and in pairs, has also been evaluated in a phylogenetically diverse set of plant genera [18]. However, these markers have limitations, such as low intraspecific divergence at conserved loci like matK and rbcL [30], which makes them suboptimal for distinguishing between varieties or subspecies within a species. Consequently, these two loci are typically concatenated with other loci, often intergenic ones [30].

Concatenated DNAbc loci have shown good results in medicinal orchids, with ITS1 + matK as the most resolving combination [31]. Moreover, the matK + ycf1 combination exhibited the highest genetic diversity across many plant taxonomic ranks [32].

At the species level, monocots, gymnosperms, mosses, and ferns have been identified using trnH–psbA DNAbc [33]. The double- and triple-DNAbc loci matK + trnH-psbA and rpoB + atpF-atpH + matK, respectively, differentiated between Vigna unguiculata (cowpea; Chinese bean) subspecies [34]. This demonstrates that plant species discrimination via DNAbc can be improved if more than one locus is used for analysis [35].

In the present study, we aimed to identify a hypervariable locus that can distinguish the regional origins of 54 maize varieties by comparing their chloroplast genomes. Our findings provide a basis for the streamlined validation of Zea mays subspecies and varieties for regional authenticity.

Methods

Genomic characterization of 54 Zea mays genomes

To characterize the analyzed genomes, we downloaded 22 paired-end Illumina sequencing datasets (Table 1) in FASTQ format from BioProject PRJNA479960 [36]. Read quality was assessed using FastQC [37]. Genomes were assembled using NOVOWRAP [38], and GFF3 annotations were obtained using GeSeq [39]. Assembly quality, including %GC content and assembly length, was evaluated using QUAST [40]. The boundaries of the four plastid regions (LSC, SSC, IRa, and IRb) were determined by cross-referencing the output of NOVOWRAP and CPJSdraw [41].

Table 1.

Genomic characterization of Zea mays subspecies and cultivars/varieties

No.	Subspecies	Cultivar/Variety	Country	Accession number	Length (bp)	%GC	LSC	SSC	IRa	IRb
1	mays	A188	USA	KF241980.1	140,437	38.45	1–82,367	105,133–117,672	117,673–140,437	82,368–105,132
2	mays	Alazan^a	Peru	SRR7477692	140,489	38.44	1–82,379	105,145–117,672	117,672–140,438	82,379–105,145
3	mays	Atiti-Nhae-Kowo^a	Brazil	SRR7477679	140,450	38.44	1–82,378	105,146–117,685	117,686–140,451	82,379–105,145
4	mays	B37C	USA	KP966115.1	140,457	38.44	1–82,385	105,152–117,691	117,692–140,457	82,386–105,151
5	mays	B37N	USA	KP966114.1	140,447	38.44	1–82,375	105,142–117,681	117,682–140,447	82,376–105,141
6	mays	B37S	USA	KP966116.1	140,534	38.44	1–82,462	105,229–117,768	117,769–140,534	82,463–105,228
7	mays	B37T	USA	KP966117.1	140,479	38.44	1–82,407	105,174–117,713	117,714–140,479	82,408–105,173
8	mays	B73	USA	KF241981	140,447	38.44	1–82,375	105,142–117,681	117,682–140,447	82,376–105,141
9	mays	Caracarai^a	Brazil	SRR7477689	140,452	38.43	1–82,380	105,147–117,686	117,687–140,452	82,381–105,146
10	mays	Cateto^a	Brazil	SRR7477706	140,453	38.45	1–82,381	105,148–117,687	117,688–140,453	82,382–105,147
11	mays	Chullpia	Peru	SRR7477683	140,452	38.43	1–82,380	105,147–117,686	117,687–140,452	82,381–105,146
12	mays	Confite Puntiagudo^a	Peru	SRR7477704	140,459	38.44	1–82,402	105,168–117,694	117,695–140,459	82,403–105,167
13	mays	Cristal^a	Brazil	SRR7477705	140,455	38.43	1–82,383	105,150–117,689	117,690–140,455	82,384–105,149
14	mays	Cuzco^a	Peru	SRR7477696	140,453	38.43	1–82,381	105,148–117,687	117,688–140,453	82,382–105,147
15	mays	Dente de Burro^a	Brazil	SRR7477712	140,451	38.44	1–82,379	105,146–117,685	117,686–140,451	82,380–105,145
16	mays	Enano^a	Peru	SRR7477685	140,450	38.44	1–82,378	105,145–117,684	117,685–140,450	82,379–105,144
17	mays	Granada^a	Peru	SRR7477686	140,462	38.44	1–82,403	105,170–117,696	117,697–140,462	82,404–105,169
18	mays	Guapi^a	Paraguay	SRR7477684	140,451	38.43	1–82,379	105,146–117,685	117,686–140,451	82,380–105,145
19	mays	HaiKou (mays)	China	MT721151.1	140,452	38.44	1–82,382	105,148–117,687	117,688–140,452	82,383–105,147
20	mays	Huancavelicano^a	Peru	SRR7477690	140,464	38.44	1–82,405	105,172–117,698	117,699–140,464	82,406–105,171
21	mays	INIA601	Peru	ON920921	140,451	38.00	140,450–82,394	105,159–117,685	117,686–140,449	82,395–105,158
22	mays	Iqueño^a	Peru	SRR7477691	140,461	38.43	1–82,402	105,169–117,695	117,696–140,461	82,403–105,168
23	mays	Kculli^a	Peru	SRR7477669	140,447	38.44	1–82,375	105,142–117,681	117,682–140,447	82,376–105,141
24	mays	Mochero^a	Peru	SRR7477697	140,483	38.43	11–82,404	105,163–117,705	117,706–10	82,405–105,162
25	mays	Mole^a	Brazil	SRR7477709	140,462	38.44	1–82,403	105,170–117,696	117,697–140,462	82,404–105,169
26	mays	Morocho^a	Peru	SRR7477694	140,461	38.43	1–82,402	105,169–117,695	117,696–140,461	82,403–105,168
27	mays	Morocho Cajabambino^a	Peru	SRR7477695	140,457	38.43	1–82,385	105,152–117,691	117,692–140,457	82,386–105,151
28	mays	Pagaladroga Raymondi^a	Peru	SRR7477699	140,454	38.44	1–82,382	105,149–117,688	117,689–140,454	82,383–105,148
29	mays	Palha-Roxa-Acreana^a	Brazil	SRR7477703	140,460	38.43	1–82,387	105,154–117,694	117,695–140,460	82,388–105,153
30	mays	Paro^a	Peru	SRR7477677	140,464	38.44	1–82,405	105,172–117,698	117,699–140,464	82,406–105,171
31	parviglumis	PC_I11	Mexico	BK061469.1	140,461	38.44	1–82,402	105,169–117,695	117,696–140,461	82,403–105,168
32	parviglumis	PC_I53	Mexico	BK061463.1	140,369	38.44	1–82,312	105,078–117,604	117,605–140,369	82,313–105,077
33	parviglumis	PC_J14	Mexico	BK061464.1	140,459	38.44	1–82,400	105,167–117,693	117,694–140,459	82,401–105,166
34	parviglumis	PC_J48	Mexico	BK061467.1	140,457	38.44	1–82,398	105,165–117,691	117,692–140,457	82,399–105,164
35	parviglumis	PC_K55	Mexico	BK061468.1	140,425	38.45	1–82,369	105,135–117,660	117,661–140,425	82,370–105,134
36	parviglumis	PC_L06	Mexico	BK061465.1	140,454	38.44	1–82,397	105,163–117,689	117,690–140,454	82,398–105,162
37	parviglumis	PC_L48	Mexico	BK061470.1	140,463	38.44	1–82,404	105,171–117,697	117,698–140,463	82,405–105,170
38	parviglumis	PC_N10	Mexico	BK061466.1	140,370	38.44	1–82,313	105,079–117,605	117,606–140,370	82,314–105,078
39	parviglumis	PC_N14	Mexico	BK061471.1	140,462	38.44	1–82,403	105,170–117,696	117,697–140,462	82,404–105,169
40	huehuetenangensis	PI_441934	Guatemala	KR873422.1	140,453	38.44	1–82,394	105,161–117,687	117,688–140,453	82,395–105,160
41	mays	Pontinha^a	Brazil	SRR7477700	140,458	38.44	1–82,401	105,167–117,693	117,694–140,458	82,402–105,166
42	mexicana	RIMME0018	Mexico	BK061460.1	140,504	38.48	1–82,445	105,212–117,738	117,739–140,504	82,446–105,211
43	mexicana	RIMME0019	Mexico	BK061462.1	140,535	38.48	1–82,476	105,243–117,769	117,770–140,535	82,477–105,242
44	mexicana	RIMME0023	Mexico	BK061461.1	140,537	38.48	1–82,480	105,246–117,772	117,773–140,537	82,481–105,245
45	mexicana	RIMME0026	Mexico	BK061458.1	140,539	38.48	1–82,480	105,247–117,773	117,774–140,539	82,481–105,246
46	mexicana	RIMME0031	Mexico	BK061459.1	140,495	38.47	1–82,436	105,203–117,729	117,730–140,495	82,437–105,202
47	mays	Semi-Dent^a	Brazil	SRR7477687	140,449	38.44	1–82,377	105,144–117,683	117,684–140,449	82,378–105,143
48	mays	SUPG020	India	OP832502.1	140,449	38.44	1–82,372	105,111–117,720	117,721–140,449	82,373–105,110
49	mays	SUPG021	India	OP832501.1	140,411	38.43	1–82,333	105,067–117,676	117,677–140,411	82,334–105,066
50	mays	Sweet	China	OM317760.1	142,545	38.44	1–82,465	105,232–117,772	117,773–140,538	82,466–105,231
51	mays	Uchuquilla^a	Peru	SRR7477681	140,450	38.44	1–82,378	105,145–117,684	117,685–140,450	82,379–105,144
52	mays	Waxy	China	OM317761.1	140,454	38.44	1–82,382	105,149–117,688	117,689–140,454	82,383–105,148
53	mays	Xingu-waura^a	Brazil	SRR7477701	140,484	38.44	1–82,426	105,192–117,718	117,719–140,483	82,427–105,191
54	mays	Zhengdan958	China	MK348606.1	142,460	38.44	1–82,382	105,149–117,688	117,689–140,454	82,383–105,148


^a Genome assembled for this project. B73 is the reference genome.

LSC, large single-copy region; SSC, small single-copy region; IRa, inverted repeat A; IRb, inverted repeat B; %GC, percentage of G + C bases. Region columns give the start–end positions (bp) of each region in the assembly.

SNPs, InDels, and %GC in plastid regions (LSC, SSC, IRa, and IRb)

We used the nucmer module of MUMmer [42] to align the 22 genomes against the B73 reference genome [43] and identify SNPs and InDels. The resulting delta and coordinate files were processed with custom Python scripts to consolidate all mutations into a single CSV table. Each SNP and InDel was then mapped to genes, intergenic regions, country, continent, and plastid region (LSC, SSC, IRa, or IRb) in RStudio (v2024.12.0, Posit Software, Boston, MA, USA) using the dplyr library [44]. The mutation frequency at each locus across genomes was calculated in RStudio and visualized using PyGWalker [45].
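The event classification and region assignment described above can be sketched in Python. This is a minimal sketch, not the authors' scripts. It assumes that each variant has already been parsed into a (B73 position, reference base, query base) record with one record per affected nucleotide, and that a '.' marks a gap. We take that convention to match MUMmer's show-snps output, but treat it as an assumption. The function names are hypothetical. Region boundaries are the B73 coordinates from Table 1.

```python
from collections import Counter

# B73 plastid regions (1-based, inclusive), from Table 1 (row 8).
B73_REGIONS = {
    "LSC": (1, 82_375),
    "IRb": (82_376, 105_141),
    "SSC": (105_142, 117_681),
    "IRa": (117_682, 140_447),
}


def classify(ref_base: str, qry_base: str) -> str:
    """Label one affected nucleotide; '.' marks a gap (assumed show-snps-style convention)."""
    if ref_base == ".":
        return "insertion"  # base present in the query genome but absent from B73
    if qry_base == ".":
        return "deletion"  # base present in B73 but absent from the query genome
    return "SNP"


def region_of(ref_pos: int) -> str:
    """Return the plastid region that contains a B73 coordinate."""
    for name, (start, end) in B73_REGIONS.items():
        if start <= ref_pos <= end:
            return name
    raise ValueError(f"position {ref_pos} lies outside the B73 plastome")


def tally(records):
    """Count (region, event type) pairs; each record is one affected nucleotide."""
    return Counter((region_of(pos), classify(ref, qry)) for pos, ref, qry in records)


# Hypothetical records: (B73 position, reference base, query base).
records = [(1_000, "G", "A"), (110_000, "A", "."), (110_001, "T", "."), (90_000, ".", "T")]
for (region, event), count in sorted(tally(records).items()):
    print(region, event, count)
```

Because every affected nucleotide is one record, a 2-bp deletion in the SSC (positions 110,000 and 110,001 above) contributes two deletion events.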

The aligned genomes were segmented into four separate FASTA files representing the plastid regions, and the nucleotide composition (%A, %T, %C, %G, and %GC) was calculated using Python scripts. Mutation rates were determined by dividing the total number of mutation events (e.g., an InDel of 3 bp was treated as three events) by the length of each genomic region; the results were visualized using the ggplot2 library in RStudio [46].
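The composition and normalization steps reduce to simple arithmetic. The following is a minimal sketch. It assumes each region is available as a plain string and that gap or ambiguity symbols should be ignored when computing percentages, which is our assumption. `composition` and `mutation_rate` are hypothetical helper names.

```python
def composition(seq: str) -> dict:
    """Percent A, T, C, G and GC of a region; gaps and ambiguous bases are ignored."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ATCG"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, T, C or G")
    percent = {base: 100 * n / total for base, n in counts.items()}
    percent["GC"] = percent["G"] + percent["C"]
    return percent


def mutation_rate(events: int, region_length: int) -> float:
    """Mutation events per nucleotide; an InDel of k bp contributes k events."""
    return events / region_length


print(composition("AATT-GCGC"))      # toy region with one gap position
print(mutation_rate(2 + 3, 12_540))  # two SNPs plus one 3-bp InDel in the B73 SSC
```

With this counting rule, two SNPs plus one 3-bp deletion in the 12,540-bp B73 SSC give five events, or about 4.0 × 10^-4 events per nucleotide.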

Gene and intergenic region characterization in the SSC region

CPJSdraw was used to represent the B73 chloroplast junctions, and Proksee [47] was used to represent the SSC region. The mutation frequency per base pair was calculated by dividing the number of events by the locus length, and the results were plotted using PyGWalker. Custom Bash and Python scripts were used to extract gene and intergenic sequences from each FASTA genome using its associated GFF3 annotation file.
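Deriving intergenic loci from a GFF3 annotation amounts to taking the gaps between consecutive gene features. The following sketch works under stated assumptions and is not the scripts used in the study. It reads only `gene` features and takes locus names from the `Name` attribute, falling back to `ID`. It also assumes genes are not nested inside one another, and it ignores the intergenic span that wraps around the origin of the circular plastome. All function names, and the toy seqid and source values, are hypothetical.

```python
def parse_genes(gff3_lines):
    """Return (start, end, name) tuples for 'gene' features, sorted by start."""
    genes = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, the version pragma and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        genes.append((int(cols[3]), int(cols[4]), attrs.get("Name", attrs.get("ID", "?"))))
    return sorted(genes)


def intergenic_regions(genes):
    """Yield (name, start, end) for the gap between each pair of consecutive genes."""
    for (_, end1, name1), (start2, _, name2) in zip(genes, genes[1:]):
        if start2 - end1 > 1:  # skip overlapping or directly adjacent genes
            yield f"{name1}-{name2}", end1 + 1, start2 - 1


def extract(genome: str, start: int, end: int) -> str:
    """Slice a 1-based, inclusive interval out of a genome string."""
    return genome[start - 1:end]


gff = [
    "##gff-version 3",
    "\t".join(["chr1", "GeSeq", "gene", "3", "5", ".", "+", ".", "ID=g1;Name=rps15"]),
    "\t".join(["chr1", "GeSeq", "gene", "10", "12", ".", "+", ".", "ID=g2;Name=rpl32"]),
]
genome = "AAAGGGTTTCCCAAA"
for name, start, end in intergenic_regions(parse_genes(gff)):
    print(name, start, end, extract(genome, start, end))
```

Naming each intergenic locus after its two flanking genes reproduces the "rps15-rpl32" style used in the text.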

Divergence analysis using K2P distances

Multiple sequence alignments of all loci were performed using MAFFT v7 [48] with default parameters. Pairwise genetic distances were calculated using the Kimura 2-parameter (K2P) model with Biopython-based scripts. Divergence was quantified using the following formula:

where n represents the number of genome pairs with zero K2P distance. The maximum intraspecific K2P distance was calculated with Biopython and custom Python scripts using the Kimura equation. Gene- and intergenic-level K2P distances were derived from the FASTA files of all genomes.
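For reference, the Kimura 2-parameter distance is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). Here P and Q are the proportions of compared sites that differ by a transition and a transversion, respectively. The sketch below implements this formula in plain Python and counts the genome pairs with zero distance (the n defined above). It is an illustration rather than the authors' Biopython-based code. Skipping sites with a gap in either sequence (pairwise deletion) is our assumption, and the helper names and toy alignment are hypothetical.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with a gap or non-ACGT symbol in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    # math.log raises ValueError if the sequences are saturated (argument <= 0).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def zero_distance_pairs(alignment: dict) -> int:
    """Number of genome pairs whose K2P distance is zero (n in the text)."""
    return sum(1 for s1, s2 in combinations(alignment.values(), 2) if k2p(s1, s2) == 0.0)


alignment = {  # toy alignment of one locus
    "B73": "ACGTACGTAC",
    "g2": "ACGTACGTAC",
    "g3": "GCGTACGTAC",  # one A->G transition relative to B73
    "g4": "CCGTACGTA-",  # one A->C transversion; the gapped site is skipped
}
print(round(k2p(alignment["B73"], alignment["g3"]), 4))
print(zero_distance_pairs(alignment))
```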

Phylogenetic analysis of selected loci

RStudio was used to map mutations (SNPs and InDels) at each locus, including the reference nucleotide changes and mutation frequencies (i.e., the number of times the same event occurs across genomes), which were plotted as a matrix by nucleotide. Trees were inferred with IQ-TREE 2 [49] using the following parameters: model = F81 + F; bootstrap = 1000. Log-likelihood values are given in the legend of each figure. The resulting Newick trees were plotted in RStudio using the ape [50] and ggtree libraries.
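The per-locus nucleotide change matrix described above can be tabulated as follows. This minimal sketch assumes the events for one locus are given as (reference, query) pairs, with '-' standing for a gap, so that deletions and insertions appear in the same matrix as SNPs. The input format and the `change_matrix` name are illustrative assumptions, not the study's R code.

```python
from collections import Counter

SYMBOLS = "ACGT-"  # '-' is a gap: ("A", "-") is an A deletion, ("-", "T") a T insertion


def change_matrix(events):
    """Tabulate reference->query changes at one locus, pooled across genomes."""
    counts = Counter((ref.upper(), qry.upper()) for ref, qry in events)
    return {ref: {qry: counts[(ref, qry)] for qry in SYMBOLS if qry != ref} for ref in SYMBOLS}


# Hypothetical events observed at one locus in several genomes.
events = [("G", "A"), ("G", "A"), ("G", "T"), ("A", "-"), ("-", "T")]
matrix = change_matrix(events)
print("ref\\qry", *SYMBOLS, sep="\t")
for ref in SYMBOLS:
    # Diagonal cells (no change) are left blank.
    print(ref, *(matrix[ref].get(qry, "") for qry in SYMBOLS), sep="\t")
```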

Results

Genomic characterization of 54 Zea mays genomes

We analyzed 54 genomes from four Zea mays subspecies from different countries (Table 1): mays (12 inbred lines and 27 landraces), huehuetenangensis (1), mexicana (5), and parviglumis (9). B73 [28] was used as the reference genome (Table 1).

Table 1 shows the species, subspecies, cultivars/varieties, %GC, and total length, along with the positions of the beginning and end of the LSC, SSC, IRa, and IRb regions.

Most genomes were of typical length (average length = 140,535 bp). The shortest belonged to the parviglumis subspecies (PC_I53). In contrast, two Chinese inbred lines, Zhengdan958 and Sweet, had the longest genomes (approximately 2,000 bp longer than the average). In both cases, the length differences relative to the average were not explained by an extension of the four regions that make up the plastid genome.
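The claim that the extra length of the two Chinese genomes is not explained by the four regions can be checked directly from the Table 1 coordinates. The following is a minimal sketch using values copied from Table 1. It treats coordinates as 1-based and inclusive, an assumption consistent with B73, whose four regions sum exactly to its total length. The helper name is hypothetical.

```python
# Region coordinates (1-based, inclusive) and total lengths copied from Table 1.
GENOMES = {
    "B73": {"length": 140_447, "LSC": (1, 82_375), "IRb": (82_376, 105_141),
            "SSC": (105_142, 117_681), "IRa": (117_682, 140_447)},
    "Sweet": {"length": 142_545, "LSC": (1, 82_465), "IRb": (82_466, 105_231),
              "SSC": (105_232, 117_772), "IRa": (117_773, 140_538)},
    "Zhengdan958": {"length": 142_460, "LSC": (1, 82_382), "IRb": (82_383, 105_148),
                    "SSC": (105_149, 117_688), "IRa": (117_689, 140_454)},
}


def region_lengths(entry: dict) -> dict:
    """Length of each region from its start and end coordinates."""
    return {name: end - start + 1 for name, value in entry.items()
            if name != "length" for start, end in [value]}


for name, entry in GENOMES.items():
    lengths = region_lengths(entry)
    covered = sum(lengths.values())
    print(f"{name}: {lengths} covered={covered:,} unassigned={entry['length'] - covered:,}")
```

Run as written, B73's four regions cover all 140,447 bp, whereas 2,007 bp of Sweet and 2,006 bp of Zhengdan958 fall outside the tabulated region boundaries. In both cases, the four regions themselves have lengths similar to those of B73.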

The %GC of every genome analyzed was nearly identical to that of the reference genome, with an average of 38.43%, close to the range shown in Table 1.

In most genomes, the LSC occupied the same initial span, beginning at position 1 and extending to approximately position 82,300 or beyond. In INIA601, the start of this region was shifted by one nucleotide; however, this did not affect the other three regions. Similarly, in Mochero the LSC start was shifted by 11 nucleotides, which moved the frame of the IRa region. None of these shifts affected the standard organization of the loci (Table 1).

SNPs, InDels, and %GC in the LSC, SSC, IRa, and IRb regions and their associated loci

To identify the regions with the highest mutation density and their potential impact, we analyzed the total number of mutation events (defined as the total number of nucleotides involved in SNPs and InDels) across all maize varieties and subspecies within the four chloroplast genomic regions (LSC, SSC, IRa, and IRb) (Fig. 1). The LSC and SSC regions exhibited the highest mutation densities; the affected nucleotides were predominantly associated with insertions in the LSC and deletions in the SSC (Fig. 1c). In contrast, the IRa and IRb regions showed the lowest mutation levels across all genomes (Fig. 1a). The SSC region predominantly contained SNPs and deletions, whereas the LSC region exhibited a higher incidence of insertions and SNPs (Fig. 1a, bottom panel). Most mutations were located in the intergenic regions of the SSC and LSC (Fig. 1a, top panel). Because the LSC is the longest of the four regions, it is expected to accumulate more mutations in absolute terms. Therefore, we normalized the mutation counts by the length of each region (Fig. 1b), so that differences between regions reflect the mutation rate per nucleotide rather than region length. After normalization, insertions were the most frequent mutation type across all regions (Fig. 1b).

The IRa and IRb regions exhibited similar normalized mutation rates that were consistent with their comparable lengths, whereas the LSC region had the lowest normalized mutation rates (Fig. 1b). The SSC exhibited the highest total normalized mutation rate (Fig. 1b).

This pattern aligned with the low GC content of the SSC region, the lowest among the four chloroplast regions (inset, Fig. 1b). SNPs in the SSC showed a strong bias toward G-to-A and G-to-T changes across the genomes (Figure S1A), whereas the least common changes were T to C, C to G, and A to T. This bias away from G is consistent with the low %GC observed in the SSC.

Genome-wide analysis of the four regions further confirmed that the SSC and LSC regions consistently harbored most of the mutations (Fig. 1c). These findings indicate that the SSC region had the highest mutation rate across all analyzed genomes.

Characterization of genes and intergenes and co-localization of SNP-InDels at the SSC

We analyzed the SSC region to characterize both coding genes and non-coding intergenic regions (hereafter referred to as "intergenes"). We sought to identify conserved and variable sites within the SSC and to identify the most divergent loci.

The SSC region spans 12,540 bp and is flanked by ndhF at the beginning and ndhH at the end (Fig. 2a). The analysis of mutations (SNPs, insertions, and deletions) revealed markedly short loci within this region that produced artifacts in the mutation rate calculations (Figure S5), because a few events in a very short locus yield an inflated per-base rate. This was particularly evident in the ndhI-ndhG locus, which was only 197 bp long and displayed a high mutation rate per base pair (Fig. 2d). To mitigate these artifacts, we restricted the analysis to loci longer than 1,000 bp (orange cutoff line in Fig. 2c).
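A short illustration of why a length cutoff is needed: a handful of events in a very short locus yields a per-base rate that outranks longer, genuinely variable loci. The locus names and counts below are hypothetical, and only the 1,000-bp threshold comes from the text. Following the wording "longer than 1,000 bp", the sketch keeps loci strictly longer than the cutoff.

```python
MIN_LENGTH = 1_000  # loci must be longer than this (bp), as stated in the text


def per_bp_rates(loci: dict, min_length: int = MIN_LENGTH) -> dict:
    """Map locus -> events per bp for loci longer than min_length, highest rate first."""
    rates = {
        name: events / length
        for name, (length, events) in loci.items()
        if length > min_length
    }
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


# Hypothetical loci: name -> (length in bp, number of mutation events).
loci = {"short_locus": (197, 6), "locus_A": (1_200, 20), "locus_B": (2_100, 12)}
print(per_bp_rates(loci, min_length=0))  # without a cutoff the short locus ranks first
print(per_bp_rates(loci))                # with the cutoff it is excluded
```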

The intergenic regions rps15-rpl32 and ndhD-ndhF exhibited the highest mutation rates in SSC (Fig. 2). In contrast, the coding genes in this region were more conserved, with fewer mutations and a lower diversity of mutation types than those in other regions (Fig. 2c-d). The rps15-rpl32 and ndhD-ndhF loci also had low GC content (approximately 30%; 29.24% for ndhD-ndhF and 30.23% for rps15-rpl32). This was consistent with their higher mutation rates.

These findings indicate that the loci with the highest mutation rates within the SSC are intergenic, with rps15-rpl32 and ndhD-ndhF being the most divergent. This highlights their potential utility in phylogenetic and comparative genomic studies.

Hypervariable regions and conserved genetic anchors for phylogenetic analysis of varieties

After identifying the genes and non-coding regions with the most mutations in the most variable region of the chloroplast genome, we analyzed these loci to build phylogenetic trees that could distinguish each variety and subspecies, evaluating each locus as a possible DNA barcode.

K2P divergence of the most highly mutated loci

We performed K2P divergence analyses across 54 genomes to evaluate the phylogenetic resolution of the rps15-rpl32 and ndhD-ndhF loci. Common DNA barcoding loci (rbcL and matK) were used as negative controls for divergence.

Table 2 summarizes the K2P divergence results. As expected, the intergenic loci had higher %K2P divergence than the genic loci. The rps15-rpl32 intergenic region showed the highest phylogenetic resolution (17.6% divergence), whereas ccsA showed the lowest (3.8%). Similarly, ndhD-ndhF displayed considerable %K2P divergence, nearly matching that of rps15-rpl32.

Table 2.

Percentage of K2P-resolution based on regional groups


Groups: PE Peru, MXP Mexico-Parviglumis, MXR Mexico-RIMME, CN China, US United States, BR Brazil, IN India. Colors represent the numeric value on a scale from green (highest) to red (lowest).

The most divergent loci in the SSC zone were rps15-rpl32 and ndhD-ndhF; rps15-rpl32 was the best of the group (17.6% resolution). In contrast, the worst was ccsA at 3.846% resolution (Table 2).

While the intergenic loci had the highest %K2P resolution, the maximum intraspecific K2P values varied among the groups (shown as a heatmap in Table 2). rps15-rpl32 showed modest intraspecific divergence and was more conserved than ndhD-ndhF.

At the group level, the Peruvian genomes showed the highest intraspecific divergence, followed by the Indian genomes. In contrast, the USA genomes showed the lowest intraspecific divergence, indicating that they were highly conserved. The Paraguay and Guatemala groups contained only one genome each; therefore, no intraspecific divergence was calculated for them.

These results show that genes are highly conserved in the SSC: their intra- and interspecific divergence values were comparable to those of the loci most commonly used as DNAbc, such as matK and rbcL, indicating low mutation rates in chloroplast genomes. Furthermore, the analyzed loci showed varying degrees of divergence across groups, indicating that mutation rates differed by group, individual genome, and locus.

rps15-rpl32 as a possible regional marker

After the K2P divergence analysis, we reconstructed the phylogenetic relationships of the 54 genomes using the best-performing loci. We used rbcL as a negative control because its K2P resolution and its SNP and InDel profiles were comparable to those of matK.

Phylogenetic trees based on the rps15-rpl32 and ndhD-ndhF loci

The phylogenetic trees inferred by maximum likelihood (ML) show branch colors corresponding to the regional origin of each accession, highlighting geographic clustering patterns, together with the frequencies of SNPs and InDels at each locus (Fig. 3), and thus provide insight into sequence variation. Statistical and construction parameters are given in the Methods; branch lengths and bootstrap support (bt) values are shown at the tree nodes.

rbcL (Fig. 3c) showed limited phylogenetic resolution and failed to distinguish the varieties effectively. In contrast, ndhF (Fig. 3d) showed slightly better resolution (%K2P = 7.6%) and formed better-defined clades. The first part of the tree contained a large share of the Latin American landraces (PE, PA, and BR) without significant internal differentiation. The first defined clade (bt = 53) consisted primarily of Peruvian maize, except for Mole. The second clade (bt = 84) contained most of the parviglumis and mexicana genomes (both from Mexico), and a third clade (bt = 71) contained the only two Indian individuals. The remaining genomes were arranged without a clear pattern in a single fourth clade (bt = 71), indicating limited resolution for the rest of the dataset. Most of these internal nodes received moderate to high bootstrap support.

Analysis of the nucleotide frequencies of SNPs and InDels in rbcL and ndhF (Fig. 3a–b) revealed few changes in rbcL (e.g., A-to-C substitutions and A and C deletions). In contrast, ndhF showed a higher frequency of changes (e.g., G-to-A substitutions, C deletions, and some A insertions) than rbcL. These observations suggest that the frequency and type of SNPs may shape the phylogenetic relationships inferred among genomes.

The ndhF-ndhD and rps15-rpl32 intergenic loci outperformed genes in resolving phylogenetic relationships. ndhF-ndhD formed six clades; however, these clades failed to align with the regional origins of the groups and appeared randomly distributed.

rps15-rpl32 (17.6% resolution) formed seven clades that grouped individuals of the same varieties and regional origins (Fig. 4c). Clade 1 (bt = 67) grouped three poorly resolved genomes from BR and PE. The Mexican subspecies were grouped in clades 2 (bt = 19) and 4 (bt = 68), together with huehuetenangensis from GU, which shares a Central American origin. Clade 3 (bt = 73) mostly grouped the PE varieties, except for Mole, whereas the others were grouped in clade 7 (bt = 75) with the BR group and a member of the PA group. Finally, clade 6 (bt = 85) mostly grouped the inbred strains (except for A188 and Zhengdan958) with the reference genome and the CMS-B37 strains. In contrast, A188 and the Sweet variety did not cluster within the major clades identified in the phylogeny; rather, they appeared as distinct lineages (Fig. 4d).

Analysis of SNP and InDel frequencies revealed that intergenic loci exhibited markedly higher mutation rates than genic regions (Fig. 4), as expected from the %K2P-based divergence (Table 2).

Substitutions from T to A predominated in the ndhF-ndhD intergenic region, followed by A to C, C to T, and G to T. This region also showed a high frequency of A and T deletions, whereas T and A insertions were less frequent. In contrast, the rps15-rpl32 locus displayed a modest overall mutation rate, with SNPs primarily involving A-to-C substitutions and, less commonly, G-to-A and G-to-T substitutions. InDels in rps15-rpl32 were dominated by A deletions and T insertions.
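Whether a substitution is a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion (between the two classes) follows directly from the bases involved. For example, T to A, A to C, and G to T are transversions, whereas C to T and G to A are transitions. A minimal sketch with a hypothetical helper name:

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a single-base substitution: {ref}->{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("T", "A"), ("A", "C"), ("C", "T"), ("G", "T"), ("G", "A")]:
    print(f"{ref} -> {alt}: {substitution_type(ref, alt)}")
```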

These findings highlight the capacity of intergenic loci to provide higher phylogenetic resolution than genic regions. The rps15-rpl32 locus was the most informative marker, despite its relatively low SNP and InDel frequency and moderate levels of interspecific and intraspecific divergence. In contrast, ndhF-ndhD exhibited higher mutation rates but failed to resolve phylogenetic relationships effectively at both the regional and broader scales.

The superior performance of rps15-rpl32 in delineating phylogenetic relationships among regional genomes underscores its potential as a marker for distinguishing closely related varieties and subspecies. These results suggest that an ideal locus for phylogenetic inference should balance modest divergence with adequate mutation frequency to ensure clear resolution.

Concatenated loci provide the best phylogenetic tree

To further improve the resolution of the rps15-rpl32 phylogenetic tree, we concatenated this locus with other genic and intergenic regions of the plastid genome, generating > 3,900 combinations. Of these, the concatenation of rps15-rpl32 with rpl23-rpl2 yielded the best phylogenetic resolution, with a total K2P divergence of 25.49% (Table 3).
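The concatenation search can be sketched as follows: join each genome's aligned rps15-rpl32 sequence with the same genome's sequence at a candidate locus, then score the result. This is a hypothetical illustration. The alignments are toy data. The score is the fraction of genome pairs whose concatenated sequences differ, a simple proxy we chose. It is not the %K2P resolution formula used in the study.

```python
from itertools import combinations


def concatenate(locus_alignments, genomes):
    """Join each genome's aligned sequences across loci, in the order given."""
    return {g: "".join(aln[g] for aln in locus_alignments) for g in genomes}


def fraction_resolved(alignment: dict) -> float:
    """Proxy score: share of genome pairs whose sequences are not identical."""
    pairs = list(combinations(alignment.values(), 2))
    return sum(s1 != s2 for s1, s2 in pairs) / len(pairs)


genomes = ["g1", "g2", "g3", "g4"]
anchor = {"g1": "AAAA", "g2": "AAAA", "g3": "AAAT", "g4": "AAAT"}  # stand-in for rps15-rpl32
candidates = {  # hypothetical aligned candidate loci
    "candidate_1": {"g1": "CC", "g2": "CG", "g3": "CC", "g4": "CG"},
    "candidate_2": {"g1": "TT", "g2": "TT", "g3": "TT", "g4": "TA"},
}
scores = {
    name: fraction_resolved(concatenate([anchor, aln], genomes))
    for name, aln in candidates.items()
}
best = max(scores, key=scores.get)
print(f"anchor alone: {fraction_resolved(anchor):.2f}", scores, "best:", best)
```

In this toy case the anchor alone separates 4 of 6 genome pairs, and the first candidate completes the separation, which mirrors the idea of pairing a variable locus with a complementary one.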

Table 3.

Percentage resolution based on the K2P divergence of simple and concatenated loci

		Maximum intraspecific K2P distances
Locus	%K2P	PE	MXP	MXR	CN	US	BR	IN
rps15-rpl32	17.64	0.00379	0.001263	0.000632	0.000316	0.000316	0.001579	0.005054
rpl23-rpl2	1.92	0.26315	0	0	0	0	0.157895	0
rpl23-rpl2 + rps15-rpl32	25.49	0.00471	0.001256	0.000628	0.000314	0.000314	0.002512	0.005024


PE Peru, MXP Mexico-Parviglumis, MXR Mexico-RIMME, CN China, US United States, BR Brazil, IN India
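The distances in Table 3 are Kimura 2-parameter (K2P) distances. For two aligned sequences with a proportion P of transition differences and Q of transversion differences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal Python sketch of this formula and of a per-group maximum intraspecific distance follows. The gap handling (pairwise deletion of gapped or ambiguous sites) and the group dictionary are assumptions, and the sketch does not reproduce how the %K2P resolution column was derived.

```python
import math
from itertools import combinations
from typing import Dict, List

PURINES, PYRIMIDINES = set("AG"), set("CT")


def k2p(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.
    Assumption: sites with a gap or non-ACGT character in either sequence are skipped."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    # math.log raises ValueError if the sequences are saturated (1 - 2p - q <= 0).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def max_intragroup_k2p(aln: Dict[str, str], groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Largest pairwise K2P distance within each group (hypothetical labels such as 'PE')."""
    result = {}
    for label, members in groups.items():
        pairs = combinations(members, 2)
        result[label] = max((k2p(aln[x], aln[y]) for x, y in pairs), default=0.0)
    return result
```

A group with a single genome has no pairs, so its maximum distance is reported as 0.0 in this sketch.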

The phylogenetic tree based on this concatenation is shown in Fig. 5a. Seven groups were identified: (i; bt = 91) SA1, formed mainly by Peruvian genomes of the high-rate (HR) mutation group, except for Mole, which belongs to the Brazilian HR group (Fig. 5b); (ii; bt = 64) SA2, formed by the Brazilian and Peruvian low-rate (LR) groups and SemiDent from the Brazilian HR group (Fig. 5a–b); (iii; bt = 66) MX1, formed by four Parviglumis LR genomes and Mexicana-RIMME0023, the Mexicana genome with the fewest SNPs and InDels (Fig. 5b); (iv; bt = 65) MX2, formed by HR genomes of both the Parviglumis and Mexicana subspecies (Fig. 5b); (v; bt = 74) IN, formed only by the two SUPG individuals; (vi; bt = 75) CN, formed only by Chinese genomes; and (vii; bt = 42) US, formed by B37 and B73. The placement of SemiDent in SA2 could be due to the low (but not the lowest) intraspecific divergence of the Brazilian group at this locus, which may not fully reflect the divergence of these genomes.

These results show that the concatenated loci can generate phylogenetic trees that reflect the regional origin of Zea mays varieties and subspecies and elucidate their relationships, while also reflecting differences in mutation rate among chloroplast genomes.

Discussion

SSC as the genomic region with the highest mutation rate

Of the four regions that constitute the chloroplast genome, the SSC region exhibited the highest mutation rate per nucleotide (Fig. 1a–b). This pattern has been documented in several studies. For instance, Shaw et al. [51] reported it for angiosperm genomes such as those of Atropa, Nicotiana, Saccharum, and Oryza spp., and it has also been described in other genera, such as Capirona [52], Pulsatilla [53], and Pseudogalium [54]. Together with our results, these observations identify the SSC region as the segment of the chloroplast genome with the highest mutation frequency.

Divergence in genes and intergenes and co-localization of SNP-InDels at the SSC region

The SSC region was 12,540 bp in length and was flanked by ndhF and ndhH (Fig. 2a–b). A low GC content in this region has been reported across various genera [25, 27, 52, 54]. This is consistent with the distribution of SNPs in our data, which show a strong tendency toward G-to-A transitions that lower the GC percentage of this region (Figure S1A).
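GC content, and the way a G-to-A transition lowers it, can be illustrated with a short Python helper. It is a sketch that assumes gaps and ambiguous bases are excluded from the denominator; alignment and annotation tools may count them differently.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Percentage of each nucleotide among unambiguous A/C/G/T bases.
    Assumption: gaps ('-') and ambiguity codes (e.g. 'N') are excluded."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100 * counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """%GC computed from the base composition above."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]
```

On a toy sequence, `gc_content("GGAATT")` is about 33.3%, and a single G-to-A change (`"AGAATT"`) lowers it to about 16.7% while raising %A.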

We identified eight genes and six intergenic loci in this region. Genes showed fewer events per base pair than intergenic loci (Fig. 2d). The most divergent loci were rps15-rpl32 and ndhD-ndhF. These loci possessed different proportions of SNPs/InDels, with a high predominance of deletions observed in ndhD-ndhF. This divergence was associated with %A in the loci.
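The comparison of events per base pair between genes and intergenic loci can be expressed as a pooled rate: total SNP and InDel events divided by total length for each locus class. The sketch below is illustrative only; the `Locus` record is a hypothetical structure, and its counts would come from an alignment rather than the toy values used in the check.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Locus:
    """Hypothetical per-locus summary; counts would come from an alignment."""
    name: str
    kind: str  # "gene" or "intergenic"
    length_bp: int
    snps: int
    indels: int

    @property
    def events_per_bp(self) -> float:
        return (self.snps + self.indels) / self.length_bp


def pooled_rate_by_kind(loci: Iterable[Locus]) -> Dict[str, float]:
    """Total SNP + InDel events divided by total length, per locus kind."""
    events: Dict[str, int] = {}
    lengths: Dict[str, int] = {}
    for locus in loci:
        events[locus.kind] = events.get(locus.kind, 0) + locus.snps + locus.indels
        lengths[locus.kind] = lengths.get(locus.kind, 0) + locus.length_bp
    return {kind: events[kind] / lengths[kind] for kind in events}
```

Pooling weights each locus by its length; averaging the per-locus `events_per_bp` values instead would give short loci the same weight as long ones.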

K2P divergence in genes and intergenes

The loci with the highest mutation rates in the SSC region were identified and evaluated using K2P analysis (Table 2), and their divergence was assessed as a resolution percentage to determine their suitability as phylogenetic markers. Intergenic loci were more divergent than genic regions, with ndhD-ndhF and rps15-rpl32 emerging as the most effective markers. Although matK and rbcL are widely used for DNAbc [30–32, 55], they performed poorly at this level of phylogenetic proximity, making them unsuitable candidates. rps15-rpl32 performed best, with 17.6% resolution and moderate levels of intraspecific and interspecific divergence.

These results also indicate that the Peruvian group is the most divergent in this study, as it comprises two pools of genomes with high and low mutation rates (Table 2, Fig. 5b).

Biological significance of rps15-rpl32 and ndhF mutation bias at the SSC region

The SSC region harbors genes involved in electron transport and photosynthetic regulation. The high mutation frequency in this region across Zea mays varieties, which derive from distinct ancestral crosses, is likely related to the genes located within it. The main loci identified in this region in the present study are part of the ndh cluster (subunits A, D, E, F, G, H, and I), psaC, ccsA, and rpl32.

The ndh cluster encodes the chloroplast NAD(P)H dehydrogenase-like (NDH) complex, and most of its subunits are encoded in the SSC region [56]. The NDH complex participates in the redox regulation of cyclic electron transport (CET) during photosynthesis. It is located in the thylakoid membrane near photosystem I (PSI), where it transfers electrons originating from PSI to the plastoquinone pool and couples this transfer to proton pumping across the thylakoid membrane [57]. This mechanism is necessary for adjusting electron flux [56]. NDH levels are higher in C4 plants (such as maize) than in C3 plants [58], suggesting that NDH-dependent CET is particularly important for their cellular processes [59].

ndhF, which encodes the NDH-F subunit, lies on the complementary strand (3′–5′ orientation) of the Zea mays plastid genome and overlaps the rpl32-rps15 intergenic locus, which lies on the 5′–3′ template strand (Fig. 2b). NDH-F plays a role in the adaptive response to changes in light intensity in tobacco [60]. Therefore, the high mutation frequency observed in this locus, together with the phylogenetic inferences based on the rpl32-rps15 locus (Fig. 4b), may be explained by the edaphoclimatic conditions and light levels of the regions where the plants were cultivated. In these environments, the photosynthetic machinery may have adapted to local conditions, possibly through the SNPs and InDels observed in the rpl32-rps15/ndhF locus, which would affect the regulation of photosynthesis by balancing electron flux and proton gradients across the thylakoid membrane.

Knockout studies in tobacco plants have shown that rpl32 and rps15 play key roles in plant development and photosynthesis: (i) rpl32 deletion results in abnormal leaf development, indicating its necessity for proper organ formation [61], whereas (ii) rps15 knockout leads to a lower chlorophyll a:b ratio and reduced quantum efficiency of photosystem II (PSII) [61]. These findings suggest that rpl32 and rps15 may be involved in the regulation of PSII components and factors that control leaf development.

ndhF plays a role in the adaptation of plants to changes in light intensity [60]. Similarly, rpl32 and rps15 affect leaf development and PSII efficiency [61]. Thus, the high mutational frequency observed in these loci may reflect adaptive responses to the specific edaphoclimatic and light conditions where these maize varieties were traditionally cultivated. This is consistent with the phylogenetic inferences shown in the present study.

rps15-rpl32 as a phylogeographic marker vs other common DNAbc loci

Using rbcL as a control in our phylogenetic analysis yielded null resolution, which can be explained by its low frequency of SNPs and InDels (Fig. 4). ndhF performed better in the phylogenetic trees, possibly because it carries almost three times as many SNPs and InDels. Although changes from A to C or T were observed in both genes, deletions of A were more prominent in the genomes analyzed [62].

The intergenic loci performed better, owing to their high frequency of SNPs and InDels [63] (Fig. 4a–b). Although both regions were similar in length (~3000 bp) and %GC (~30%), they exhibited markedly different mutation frequencies per base pair: the mutation frequency of the ndhD-ndhF locus was almost six times higher than that of rps15-rpl32. Thus, although the gene with the highest mutation frequency (ndhF) performed best among the genes, this pattern did not hold for the intergenic regions.

Both loci exhibited a notable bias toward SNPs involving changes in A. Additionally, ndhD-ndhF displayed a high frequency of deletions, predominantly of A and T. In contrast, rps15-rpl32 was characterized by moderate levels of T insertions and A deletions. Together, these patterns show a clear tendency for genomic changes to involve adenine and thymine, with A/T enrichment increasing as the number of mutated base pairs increased.
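The A/T bias described above can be quantified as the share of inserted or deleted bases that are A or T, and the share of SNPs whose derived base is A or T. The sketch below assumes hypothetical input formats, (kind, bases) tuples for InDels and (ref, alt) tuples for SNPs; it is not the procedure used to produce Figure S1.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def indel_base_counts(indels: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count inserted and deleted bases separately.
    Assumed input: (kind, bases) tuples, where kind is 'ins' or 'del'."""
    counts = {"ins": Counter(), "del": Counter()}
    for kind, bases in indels:
        counts[kind].update(bases.upper())
    return counts


def at_share(counter: Counter) -> float:
    """Fraction of counted bases that are A or T (0.0 if nothing was counted)."""
    total = sum(counter.values())
    return (counter["A"] + counter["T"]) / total if total else 0.0


def snp_to_at_share(snps: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (ref, alt) SNPs whose derived (alt) base is A or T."""
    derived = Counter(alt.upper() for _, alt in snps)
    return at_share(derived)
```

Keeping insertions and deletions in separate counters makes it possible to contrast, for example, A deletions with T insertions within the same locus.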

Complementary RSCU analysis revealed a strong preference (RSCU > 1) for A or T at the third position of trinucleotides in rps15-rpl32. Although this is a noncoding region, it exhibits a distinct trinucleotide composition bias (Table S3), suggesting that it may be subject to selective pressure or mutational bias. This phenomenon has been widely described [62]. rps15 (encoding a protein of the 30S subunit of the 70S ribosome) and rpl32 (encoding a protein of the 50S subunit) undergo polycistronic transcription in the chloroplast [64]. Because no Shine-Dalgarno sequences were found in the intergenic region between these genes (data not shown), this region may play regulatory roles in the synthesis of polycistronic mRNA. The high A/T composition of SNPs and InDels [62, 65] suggests that these regulatory roles may involve mRNA secondary structures, such as stem loops [66]. Furthermore, changes toward A/T may be favored because A/T-rich regions are often preferred in chloroplasts, as their replication and transcription consume less energy [67].
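RSCU for a codon is its observed count divided by the mean count of all codons for the same amino acid, so values above 1 indicate over-use relative to uniform synonymous usage. A minimal Python sketch follows, using the standard genetic code (NCBI translation table 1). Applying it to a noncoding region requires choosing a reading frame; frame 0 and treating stop codons as one synonymous group are assumptions here, and the exact procedure behind Table S3 may differ.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(seq: str, frame: int = 0) -> Dict[str, float]:
    """Relative synonymous codon usage for non-overlapping trinucleotides.
    Assumptions: reading frame `frame`; stop codons ('*') form one group;
    trinucleotides containing gaps or N are skipped."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    synonyms: Dict[str, list] = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    values: Dict[str, float] = {}
    for aa, group in synonyms.items():
        total = sum(counts[c] for c in group)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for its codons
        expected = total / len(group)  # count expected under equal synonymous usage
        for codon in group:
            values[codon] = counts[codon] / expected
    return values
```

For example, `rscu("AAAATG")` gives RSCU 2.0 for the A-ending lysine codon AAA, 0.0 for AAG, and 1.0 for the single methionine codon ATG.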

The rps15-rpl32 locus performed well primarily because it has a moderate number of variable sites (14), some InDels, and a few parsimony-informative sites (4), reflecting a balance between variability and conservation (Table S2). This balance is ideal for phylogenetic inference, because a region with too many conserved sites provides minimal information, whereas too much variability can introduce noise [68, 69], as is the case for ndhF-ndhD. The ndhF-ndhD locus has a length and %GC similar to those of rps15-rpl32 but exhibits an excessive mutation frequency (Fig. 4a–c). Such a locus is unsuitable for the phylogeny of varieties of the same species, which are genetically similar within a variety and differ only slightly from other varieties.
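The site categories in Table S2 follow the usual definitions: a variable site has at least two nucleotide states; a parsimony-informative site has at least two states that each occur in at least two sequences; and a singleton is variable but not parsimony-informative. A minimal Python sketch of these counts is shown below. Ignoring gaps and ambiguous bases within each column is an assumption; programs such as MEGA offer several gap-handling options that change the totals.

```python
from collections import Counter
from typing import Dict, List


def site_summary(seqs: List[str]) -> Dict[str, int]:
    """Classify alignment columns into conserved, variable, parsimony-informative,
    and singleton sites. Assumption: gaps and non-ACGT characters are ignored."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    summary: Counter = Counter()
    for column in zip(*(s.upper() for s in seqs)):
        states = Counter(b for b in column if b in "ACGT")
        if not states:
            continue  # column holds only gaps or ambiguous bases
        if len(states) == 1:
            summary["conserved"] += 1
            continue
        summary["variable"] += 1
        # Parsimony-informative: at least two states each seen in >= 2 sequences.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            summary["parsimony_informative"] += 1
        else:
            summary["singleton"] += 1
    return dict(summary)
```

In this scheme, every variable site is either parsimony-informative or a singleton, so those two counts always sum to the number of variable sites.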

High levels of variability have been observed in noncoding regions of the SSC (such as ndhF-rpl32 or rpl32-trnL [51, 54, 70–72]). However, these regions are not used for phylogenetic analysis, likely because they are unsuitable for reconstructing phylogenies based on common ancestry and may lack the IRb copy of rps15 [54, 71, 72].

The size and complexity of the variable and conserved sites of rps15-rpl32 are moderate, allowing the locus to retain sufficient phylogenetic information without being prone to uninformative changes.

Phylogenetic trees based on high- and low-divergence intergenic regions

The locus with the highest %K2P value was the most effective for constructing a well-resolved phylogenetic tree of the 54 maize plastid genomes (Figs. 3 and 4). This was confirmed by the rps15-rpl32 tree, which delineated seven distinct clades supported by moderate-to-high bootstrap values ranging from 50 to 80. Although the rps15-rpl32 tree showed the lowest log-likelihood value (–4235.5), this is expected because its alignment is longer [73] than those of the other loci (rbcL: 1.4 kb; ndhF: 2.2 kb; ndhF-ndhD: 2.8 kb; rps15-rpl32: 3.1 kb). This interpretation is further supported by the small difference between the log-likelihood values of the rps15-rpl32 tree and the combined rpl23-rpl2 + rps15-rpl32 tree (–4235.5 vs. –4320.2), which reflects their similar alignment lengths. When interpreting phylogenetic resolution, the number of informative sites and clade support (bootstrap values) are more relevant than the log-likelihood alone. Despite its moderate mutation frequency, rps15-rpl32 yielded well-supported clades and was more effective in resolving relationships among regional genomes in the present study.

To enhance the resolution, this locus was concatenated with a short intergenic region of low divergence (rpl23-rpl2), which increased the %K2P resolution from 17.64% to 25.49% (Table 3). As anticipated, the final tree grouped the genomes into seven well-defined clades.

Five maize varieties (Huehuetenangensis, A188, Potinha, Xingu-Waura, and Confite Puntiagudo) were not placed in their expected clades. This discrepancy is likely attributable to the inherent challenge of identifying one or more loci capable of adequately representing every genome. Such loci must strike a delicate balance in divergence, neither too high nor too low, so that members of the same variety or subspecies are sufficiently similar while remaining distinct from those of other varieties. This narrow range of suitable divergence constrains the resolving power of the marker, a limitation consistently highlighted in the present study. The opposite effect occurred in the SA2 group, where branches within the clade could not be distinguished because of the low number of mutation events.

These results suggest that rps15-rpl32 can largely reflect the evolutionary history of the whole plastid genome.

K2P may not be the best divergence indicator for closely related maize varieties

Are K2P resolution values representative of divergence among closely related varieties within a species? Our findings align with those of Srivathsan and Meier [74], who reported that the widespread use of the K2P model in DNA barcoding does not necessarily improve the resolution of closely related sequences (Tables 2 and 3). In the present study, K2P distances tracked locus divergence, producing lower values for conserved loci and higher values for divergent loci. However, these distances did not fully capture the phylogenetic resolution observed in our data. Although the best-performing single locus showed only 17% resolution, its phylogenetic tree delineated seven well-defined groups, suggesting a markedly higher effective resolution. After concatenation, the resolution rose to 25.5%, yet the tree recovered the same groups. Such an increase would imply improved separation among groups, which is not reflected in our trees. This limitation underscores the importance of using complementary metrics and visualization techniques, as K2P distances alone may underestimate the taxonomic resolution achievable with certain loci, especially in closely related maize varieties [74].

Association of rps15-rpl32 locus phylogeny-inference clades with SSR genetic diversity

Bedoya et al. [75] classified Uchuquilla, Kculli, and Chullpi within the South America–Andean region group (specifically in the Bolivian highland subgroup). This is consistent with our results (Fig. 5a), where they were clustered within the SA1 clade. In the same study, Morocho (classified as SA2 in our study) was placed in the Highland Andean subgroup within the South America–Andean region group, whereas Mochero (SA2) was assigned to the Central Highland Andean subgroup.

Our results are consistent with those of Bedoya et al. [75] and Vigouroux et al. [76] in terms of genomic-geographic background. The SA group represents the plastid genomic lineage of the South America–Andean group, with the SA1 clade corresponding to Bolivian highland maize and SA2 to the Highland Andean subgroup. Additionally, our results support the existence of a USA group [76], corresponding to our US clade (Fig. 5a), and of a Highland Mexican group, which Vigouroux et al. [76] subdivided into two major clades in a neighbor-joining phylogeny (corresponding to MX1 and MX2 in Fig. 5a). Overall, our findings suggest that previous SSR-based studies of the genetic background of Latin American maize races [75, 76] closely agree with the plastid genomic background.

Some genomes do not fit in the clades within the phylogenetic trees

Finding one or more loci that work across different varieties is challenging, especially when the maize varieties are phylogenetically close and the chloroplast genome evolves slowly [77, 78]. This is mostly because, under the non-sexual, uniparental inheritance of the chloroplast [79], genes and intergenic regions evolve at different rates ([80, 81]; Fig. 1a), and these rates differ among varieties and accessions [82] (Figs. 3 and 4). This effect was observed in the phylogenetic analysis [76], possibly because of introgression in the regional germplasm, and also because we used partial information from only one of the three genomes of the plant cell, and the smallest.

Huehuetenangensis and A188 fell just outside their respective clades but close to the CN-USA group, likely because the loci used for the marker have evolutionary patterns similar to those that define the phylogeography of the CN-USA clades (Fig. 5). In contrast, Potinha, Xingu-Waura, and Confite Puntiagudo appeared as outgroups, as if they belonged to another species. They shared similar patterns of SNPs and InDels that fell into an intermediate range (IR) of events, which the selected phylogenetic-regional marker likely could not capture. This highlights the need for additional loci, such as ITS1 and ITS2 [31, 83, 84], to improve these results. However, not every variety has a sequenced nuclear genome. Additionally, reliable morphological or genetic background information is not available for every genome used in this study. This limitation stems from the in silico nature of our research and the inherent complexity of collecting such data. Future studies could mitigate this challenge by incorporating complementary nuclear markers and morphological or ecological data where available. This approach would enable a more comprehensive interpretation of the observed evolutionary relationships, such as those of Xingu-Waura or Potinha, and facilitate the geographical classification of chloroplast genomes. Until such data are available, the plastid genome remains the most practical and universally applicable source of markers.

Finally, the varieties that should be conserved — those in the LR-group — as well as those considered ancestral, such as Kculli and Enano, or those derived from early inbreeding programs, such as B73 and A188, exhibit minimal divergence at the rps15-rpl32 locus. This suggests that these genomes have remained relatively stable over time, possibly representing ancient parental lineages. In contrast, more recent landraces that have emerged through traditional farming practices with minimal to zero artificial selection, such as Iqueño, or from recently improved varieties, such as INIA601, show greater divergence at this locus. This suggests that this locus is susceptible to evolutionary changes in populations not exposed to strong selective pressures, placing them within the HR-group. This pattern aligns with the classification provided by the Ministry of Agriculture and Irrigation of Peru [85], which categorized some of the varieties investigated in the present study as follows: (i) Primitive: Kculli (zero changes), Enano (LR), and Confite Puntiagudo (IR); (ii) Ancient: Chullpi (LR), Cuzco (LR), Morocho-Cajabambino (LR), Pagaladroga (LR), and Uchuquilla (LR); and (iii) Incipient: Huancavelicano (HR), INIA601 (HR), Iqueño (HR), Mochero (HR InDels), and Mole (HR).

Conclusions

This study demonstrated that concatenated loci (rps15-rpl32_rpl23-rpl2) can identify the regional origin of different maize varieties and subspecies, group individuals of the same variety or subspecies, and differentiate between subpopulations with low and high intraspecific divergence. The main properties of rps15-rpl32 are its suitable length for detecting mutations, its noncoding nature, and its high %A and low %GC, which reflect a strong A/T bias in SNPs and InDels. This observation suggests that the evolutionary dynamics of the chloroplast genome favor the retention of mutations toward A and T. This pattern may represent an evolutionary signal reflecting selective pressures or functional constraints that favor the development and maintenance of A/T-rich sequences.

These results indicate that rps15-rpl32 is an effective phylogenetic marker for individuals with recently established geographical relationships, rather than for inferring their common ancestors. The strong phylogenetic performance of the rps15-rpl32 intergenic locus may be due to its recent divergence. Highly conserved varieties, as well as those considered ancestral or derived from early inbreeding programs, show little divergence at this locus. In contrast, more recent varieties that have not been subjected to artificial selection show that this locus is susceptible to substantial mutation.

Variation at the rps15-rpl32 locus suggests selective pressure on the SSC region, which may facilitate the adaptation of C4 photosynthetic efficiency to environmental variation in the areas where Zea mays landraces are cultivated. This characteristic makes the locus useful as a marker for landraces and subspecies, enabling easy and rapid screening of Zea mays subspecies.

Supplementary Information


Supplementary Material 1. Figure S1. SNP frequency and patterns (A) and InDels (B) at the SSC region across all analyzed Zea mays genomes.


Supplementary Material 2. Table S1. Assembly quality analysis. Genomes were assembled and evaluated using NOVOWrap (Wu et al., 2021) and the QUAST tool (Gurevich et al., 2013), respectively. The assembled varieties and their GenBank and SRA accession numbers are shown. Table S2. Nucleotide composition, GC content, and numbers of variable, conserved, parsimony-informative, and singleton sites for each locus. Table S3. RSCU analysis of the rps15-rpl32 region. Bold letters indicate the highest RSCU value.

Supplementary Material 3.

Abbreviations

SSR: Simple sequence repeats
SNP: Single nucleotide polymorphism
DNAbc: DNA barcoding
IRa: Inverted repeat A
IRb: Inverted repeat B
SSC: Small single copy
LSC: Large single copy
ML: Maximum likelihood
K2P: Kimura 2-parameter
PE: Peru
MXP: Mexico–Parviglumis
MXR: Mexico–RIMME
CN: China
US: United States of America
BR: Brazil
IN: India
bt: Bootstrap support value

Authors’ contributions

LU, FP, and MP: conceptualization, methodology, investigation, and formal analysis; LU: writing—original draft and visualization; FP, CR, and MP: writing—review and editing; MP: resources and supervision. All authors read and approved the final manuscript.

Funding

This work was supported by the FIC-Ñuble (grant number 40035912–0/2022). The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.


Data availability

All data generated or analyzed during this study are available in the main text, supplementary materials, public databases, or referenced permanent online repositories. The accession numbers of sequence data used in this study are listed in Table S1.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1.López E. El maíz En América Latina: Contaminación del centro de origen del maíz. In: Revista Semillas. 2005.https://www.semillas.org.co/es/el-maz-en-amrica-latina-contaminacin-del-centro-de-origen-del-maz.
2.Guzzon F, Arandia Rios LW, Caviedes Cepeda GM, Céspedes Polo M, Chavez Cabrera A, Muriel Figueroa J, et al. Conservation and use of Latin American maize diversity: Pillar of nutrition security and cultural heritage of humanity. Agronomy. 2021;11:172. 10.3390/agronomy11010172. [Google Scholar]
3.Saavedra G. Clasificacion Botanica, germinacion y Desarrollo. In: Boletin INIA - Instituto de Investigaciones Agropecuarias, No. 303. Santiago. 2014. https://hdl.handle.net/20.500.14001/7803. Accessed 18 Oct 2023.
4.Paliwal R, Granados G, Lafitte H, Violic A. El Maíz En Los Trópicos: Mejoramiento y producción. 2001. https://www.fao.org/3/x7650S/x7650s00.htm.
5.Montoro AE, Ruiz MB. Ecofisiología del cultivo de maíz dulce (Zea mays L. var. saccharata). Hortic Argent. 2017;36:153–66 https://hdl.handle.net/20.500.12123/4402. [Google Scholar]
6.OECD Environment Health and Safety Publications. Consensus Document on the Biology of Zea mays subsp. mays (maize). 2003. http://www.oecd.org/biotrack/.
7.Food and Agriculture Organization of the United Nations. FAOSTAT: Crops and livestock products. 2024. https://www.fao.org/faostat. Accessed 13 Dec 2024.
8.Eguillor P. ¿Qué son las Indicaciones Geográficas y las Denominaciones de Origen?. 2015. https://www.odepa.gob.cl/wp-content/uploads/2015/12/IDyDO2015.pdf.
9.Olivos MC, Carrasco F. Agregar Valor a Los Productos Tradicionales de Chile Con El Sello de Origen.: Revista de la OMPI. 2016. https://www.wipo.int/es/web/wipo-magazine/articles/adding-value-to-chiles-heritage-products-with-the-emsello-de-origenem-39597.
10.Salmeri C. Plant morphology: outdated or advanced discipline in modern plant sciences? Flora Mediterr. 2019;29:163–80. 10.7320/FlMedit29.163. [Google Scholar]
11.Tessler M, Galen SC, DeSalle R, Schierwater B. Let’s end taxonomic blank slates with molecular morphology. Front Ecol Evol. 2022;10:1016412. 10.3389/fevo.2022.1016412. [Google Scholar]
12.Zhao S, Chen X, Song J, Pang X, Chen S. Internal transcribed spacer 2 barcode: a good tool for identifying Acanthopanacis cortex. Front Plant Sci. 2015;6:840. 10.3389/fpls.2015.00840. [DOI] [PMC free article] [PubMed] [Google Scholar]
13.Zhang W, Tian W, Gao Z, Wang G, Zhao H. Phylogenetic utility of rRNA ITS2 sequence-structure under functional constraint. Int J Mol Sci. 2020;21:6395. 10.3390/ijms21176395. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Unamba CIN, Nag A, Sharma RK. Next generation sequencing technologies: The doorway to the unexplored genomics of non-model plants. Front Plant Sci. 2015;6:1074. 10.3389/fpls.2015.01074. [DOI] [PMC free article] [PubMed] [Google Scholar]
15.Salazar E, González M, Araya C, Mejía N, Carrasco B. Genetic diversity and intra-racial structure of Chilean Choclero corn (Zeamays L.) germplasm revealed by simple sequence repeat markers (SSRs). Sci Hortic. 2017;225:620–9. 10.1016/j.scienta.2017.08.006. [Google Scholar]
16.Caldu-Primo JL, Mastretta-Yanes A, Wegier A, Piñero D. Finding a needle in a haystack: distinguishing mexican maize landraces using a small number of SNPs. Front Genet. 2017;8:45. 10.3389/fgene.2017.00045. [DOI] [PMC free article] [PubMed] [Google Scholar]
17.Kress WJ, García-Robledo C, Uriarte M, Erickson DL. DNA barcodes for ecology, evolution, and conservation. Trends Ecol Evol. 2015;30:25–35. 10.1016/j.tree.2014.10.008. [DOI] [PubMed] [Google Scholar]
18.Kress WJ, Erickson DL. A two-locus global DNA barcode for land plants: The coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE. 2007;2:e508. 10.1371/journal.pone.0000508. [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Ratnasingham S, Hebert PDN. BOLD: The barcode of life data system (http://www.barcodinglife.org). Mol Ecol Notes. 2007;7:355-64. 10.1111/j.1471-8286.2007.01678.x. [DOI] [PMC free article] [PubMed]
20.Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, et al. Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res. 2007;35:e14. 10.1093/nar/gkl938. [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Savolainen V, Cowan RS, Vogler AP, Roderick GK, Lane R. Towards writing the encyclopedia of life: an introduction to DNA barcoding. Philos Trans R Soc Lond B Biol Sci. 2005;360:1805–11. 10.1098/rstb.2005.1730. [DOI] [PMC free article] [PubMed] [Google Scholar]
22.Yang F, Ding F, Chen H, He M, Zhu S, Ma X, et al. DNA barcoding for the identification and authentication of animal species in traditional medicine. Evid Based Complement Alternat Med. 2018;2018:5160254. 10.1155/2018/5160254. [DOI] [PMC free article] [PubMed] [Google Scholar]
23.Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH. Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci U S A. 2005;102:8369–74. 10.1073/pnas.0503123102. [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Xu J. Fungal DNA barcoding. Genome. 2016;59:913–32. 10.1139/gen-2016-0046. [DOI] [PubMed] [Google Scholar]
25.Yi DK, Lee HL, Sun BY, Chung MY, Kim KJ. The complete chloroplast DNA sequence of Eleutherococcus senticosus (Araliaceae); comparative evolutionary analyses with other three Asterids. Mol Cells. 2012;33:497–508. 10.1007/s10059-012-2281-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
26.Wu L, Nie L, Wang Q, Xu Z, Wang Y, He C, et al. Comparative and phylogenetic analyses of the chloroplast genomes of species of Paeoniaceae. Sci Rep. 2021;11:14643. 10.1038/s41598-021-94137-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
27.Long L, Li Y, Wang S, Liu Z, Wang J, Yang M. Complete chloroplast genomes and comparative analysis of Ligustrum species. Sci Rep. 2023;13:212. 10.1038/s41598-022-26884-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
28.GenBank. National Center for Biotechnology Information, Bethesda. Zea mays chloroplast genome, B73 reference genome (GENBank BioProject PRJNA10769). 2024. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA10769.
29.Maier RM, Neckermann K, Igloi GL, Kössel H. Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol. 1995;251:614–28. 10.1006/jmbi.1995.0460. [DOI] [PubMed] [Google Scholar]
30.Lahaye R, van der Bank M, Bogarin D, Warner J, Pupulin F, Gigot G, et al. DNA barcoding the floras of biodiversity hotspots. Proc Natl Acad Sci USA. 2008;105:2923–8. 10.1073/pnas.0709936105. [DOI] [PMC free article] [PubMed] [Google Scholar]
31.Raskoti BB, Ale R. DNA barcoding of medicinal orchids in Asia. Sci Rep. 2021;11:23651. 10.1038/s41598-021-03025-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
32.Li H, Xiao W, Tong T, Li Y, Zhang M, Lin X, et al. The specific DNA barcodes based on chloroplast genes for species identification of Orchidaceae plants. Sci Rep. 2021;11:1424. 10.1038/s41598-021-81087-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Pang X, Liu C, Shi L, Liu R, Liang D, Li H, et al. Utility of the trnH–psbA intergenic spacer region and its combinations as plant DNA barcodes: a meta-analysis. PLoS ONE. 2012;7:e48833. 10.1371/journal.pone.0048833. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Okoth P, Muoma J, Emmanuel M, Clabe W, Omayio DO, Angienda PO. The potential of DNA barcode-based delineation using seven putative candidate loci of the plastid region in inferring molecular diversity of cowpea at sub-species level. Am J Mol Biol. 2016;6:138–58. 10.4236/ajmb.2016.64014. [Google Scholar]
35.Raskoti BB, Jin W, Xiang X, Schuiteman A, Li D, Li J, et al. A phylogenetic analysis of molecular and morphological characters of Herminium (Orchidaceae, Orchideae): evolutionary relationships, taxonomy, and patterns of character evolution. Cladistics. 2016;32:198–210. 10.1111/cla.12125. [DOI] [PubMed] [Google Scholar]
36.Kistler L, Maezumi SY, Gregorio de Souza J, Przelomska NAS, Malaquias Costa F, Smith O, et al. Multiproxy evidence highlights a complex evolutionary legacy of maize in South America. Science. 2018;2018(362):1309–13. 10.1126/science.aav0207. [DOI] [PubMed] [Google Scholar]
37. Andrews S. FastQC: A quality control tool for high throughput sequence data [Computer software]. Babraham Bioinformatics, Cambridge. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
38. Wu P, Xu C, Chen H, Yang J, Zhang X, Zhou S. NOVOWrap: An automated solution for plastid genome assembly and structure standardization. Mol Ecol Resour. 2021;21:2177–86. 10.1111/1755-0998.13410.
39. Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq – Versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45:W6–11. 10.1093/nar/gkx391.
40. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5. 10.1093/bioinformatics/btt086.
41. Li H, Guo Q, Xu L, Gao H, Liu L, Zhou X. CPJSdraw: analysis and visualization of junction sites of chloroplast genomes. PeerJ. 2023;11:e15326. 10.7717/peerj.15326.
42. Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14:e1005944. 10.1371/journal.pcbi.1005944.
43. Jiao Y, Peluso P, Shi J, Liang T, Stitzer MC, Wang B, et al. Improved maize reference genome with single-molecule technologies. Nature. 2017;546:524–7. 10.1038/nature22971.
44. Wickham H, François R, Henry L, Müller K. dplyr: A grammar of data manipulation. R package version 1.1.3. 2023. https://CRAN.R-project.org/package=dplyr.
45. Kanaries. PyGWalker: A Python library for exploratory data analysis with visualization. 2023. https://github.com/Kanaries/pygwalker.
46. Wickham H. ggplot2: Elegant graphics for data analysis. Springer-Verlag, New York. 2016. https://ggplot2.tidyverse.org.
47. Grant JR, Enns E, Marinier E, Mandal A, Herman EK, Chen CY, et al. Proksee: In-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. 2023;51:W484–92. 10.1093/nar/gkad326.
48. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30:772–80. 10.1093/molbev/mst010.
49. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4. 10.1093/molbev/msaa015.
50. Paradis E, Schliep K. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–8. 10.1093/bioinformatics/bty633.
51. Shaw J, Lickey EB, Schilling EE, Small RL. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: The tortoise and the hare III. Am J Bot. 2007;94:275–88. 10.3732/ajb.94.3.275.
52. Saldaña CL, Rodriguez-Grados P, Chávez-Galarza JC, Feijoo S, Guerrero-Abad JC, Vásquez HV, et al. Unlocking the complete chloroplast genome of a native tree species from the Amazon Basin, Capirona (Calycophyllum spruceanum, Rubiaceae), and its comparative analysis with other Ixoroideae species. Genes (Basel). 2022;13:113. 10.3390/genes13010113.
53. Li QJ, Su N, Zhang L, Tong RC, Zhang XH, Wang JR, et al. Chloroplast genomes elucidate diversity, phylogeny, and taxonomy of Pulsatilla (Ranunculaceae). Sci Rep. 2020;10:19781. 10.1038/s41598-020-76699-7.
54. Yu W, Li XJ, Lv Z, Yang LE, Peng DL. The complete chloroplast genome sequences of monotypic genus Pseudogalium, and comparative analyses with its relative genera. BMC Genomics. 2025;26:93. 10.1186/s12864-025-11276-8.
55. Ho VT, Tran TKP, Vu TTT, Widiarsih S. Comparison of matK and rbcL DNA barcodes for genetic classification of jewel orchid accessions in Vietnam. J Genet Eng Biotechnol. 2021;19:93. 10.1186/s43141-021-00188-1.
56. Martín M, Sabater B. Plastid ndh genes in plant evolution. Plant Physiol Biochem. 2010;48:636–45. 10.1016/j.plaphy.2010.04.009.
57. Laughlin TG, Bayne AN, Trempe JF, Savage DF, Davies KM. Structure of the complex I-like molecule NDH of oxygenic photosynthesis. Nature. 2019;566:411–4. 10.1038/s41586-019-0921-0.
58. Ishikawa N, Takabayashi A, Noguchi K, Tazoe Y, Yamamoto H, von Caemmerer S, et al. NDH-mediated cyclic electron flow around photosystem I is crucial for C4 photosynthesis. Plant Cell Physiol. 2016;57:2020–8. 10.1093/pcp/pcw127.
59. Ma M, Liu Y, Bai C, Yong JWH. The significance of chloroplast NAD(P)H dehydrogenase complex and its dependent cyclic electron transport in photosynthesis. Front Plant Sci. 2021;12:661863. 10.3389/fpls.2021.661863.
60. Martín M, Funk HT, Serrot PH, Poltnigg P, Sabater B. Functional characterization of the thylakoid Ndh complex phosphorylation by site-directed mutations in the ndhF gene. Biochim Biophys Acta. 2009;1787:920–8. 10.1016/j.bbabio.2009.03.001.
61. Fleischmann TT, Scharff LB, Alkatib S, Hasdorf S, Schöttler MA, Bock R. Nonessential plastid-encoded ribosomal proteins in tobacco: a developmental role for plastid translation and implications for reductive genome evolution. Plant Cell. 2011;23:3137–55. 10.1105/tpc.111.088906.
62. Dong W, Xu C, Wen J, Zhou S. Evolutionary directions of single nucleotide substitutions and structural mutations in the chloroplast genomes of the family Calycanthaceae. BMC Evol Biol. 2020;20:96. 10.1186/s12862-020-01661-0.
63. Blank D, Wolf L, Ackermann M, Silander OK. The predictability of molecular evolution during functional innovation. Proc Natl Acad Sci USA. 2014;111:3044–9. 10.1073/pnas.1318797111.
64. Hess WR, Prombona A, Fieder B, Subramanian AR, Börner T. Chloroplast rps15 and the rpoB/C1/C2 gene cluster are strongly transcribed in ribosome-deficient plastids: Evidence for a functioning non-chloroplast-encoded RNA polymerase. EMBO J. 1993;12:563–71. 10.1002/j.1460-2075.1993.tb05688.x.
65. Yang C, Wang K, Zhang H, Guan Q, Shen J. Analysis of the chloroplast genome and phylogenetic evolution of three species of Syringa. Mol Biol Rep. 2023;50:665–77. 10.1007/s11033-022-08004-w.
66. Chen HC, Stern DB. Specific binding of chloroplast proteins in vitro to the 3′ untranslated region of spinach chloroplast petD mRNA. Mol Cell Biol. 1991;11:4380–8. 10.1128/mcb.11.9.4380.
67. Chakraborty S, Sophiarani Y, Uddin A. Free energy of mRNA positively correlates with GC content in chloroplast transcriptomes of edible legumes. Genomics. 2021;113:2826–38. 10.1016/j.ygeno.2021.06.026.
68. Santos L, Alves A, Alves R. Evaluating multi-locus phylogenies for species boundaries determination in the genus Diaporthe. PeerJ. 2017;5:e3120. 10.7717/peerj.3120.
69. Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and deletions: Computational methods, evolutionary dynamics, and biological applications. Mol Biol Evol. 2024;41:msae177. 10.1093/molbev/msae177.
70. Kim K, Lee SC, Lee J, Lee HO, Joh HJ, Kim NH, et al. Comprehensive survey of genetic diversity in chloroplast genomes and 45S nrDNAs within Panax ginseng species. PLoS ONE. 2015;10:e0117159. 10.1371/journal.pone.0117159.
71. Ye J, Luo Q, Lang Y, Ding N, Jian YQ, Wu ZK, et al. Analysis of chloroplast genome structure and phylogeny of the traditional medicinal of Ardisia crispa (Myrsinaceae). Sci Rep. 2024;14:19045. 10.1038/s41598-024-66563-3.
72. Li DM, Pan YG, Liu HL, Yu B, Huang D, Zhu GF. Thirteen complete chloroplast genomes of the Costaceae family: Insights into genome structure, selective pressure and phylogenetic relationships. BMC Genomics. 2024;25:68. 10.1186/s12864-024-09996-4.
73. Smirnov V, Warnow T. Phylogeny estimation given sequence length heterogeneity. Syst Biol. 2021;70:268–82. 10.1093/sysbio/syaa058.
74. Srivathsan A, Meier R. On the inappropriate use of Kimura-2-parameter (K2P) divergences in the DNA-barcoding literature. Cladistics. 2012;28:190–4. 10.1111/j.1096-0031.2011.00370.x.
75. Bedoya CA, Dreisigacker S, Hearne S, Franco J, Mir C, Prasanna BM, et al. Genetic diversity and population structure of native maize populations in Latin America and the Caribbean. PLoS ONE. 2017;12:e0173488. 10.1371/journal.pone.0173488.
76. Vigouroux Y, Glaubitz JC, Matsuoka Y, Goodman MM, Sánchez GJ, Doebley J. Population structure and genetic diversity of New World maize races assessed by DNA microsatellites. Am J Bot. 2008;95:1240–53. 10.3732/ajb.0800097.
77. Provan J, Soranzo N, Wilson NJ, Goldstein DB, Powell W. A low mutation rate for chloroplast microsatellites. Genetics. 1999;153:943–7. 10.1093/genetics/153.2.943.
78. Smith DR. Mutation rates in plastid genomes: They are lower than you might think. Genome Biol Evol. 2015;7:1227–34. 10.1093/gbe/evv069.
79. Christie JR, Beekman M. Uniparental inheritance promotes adaptive evolution in cytoplasmic genomes. Mol Biol Evol. 2016;34:677–91. 10.1093/molbev/msw266.
80. Andrews TD, Jermiin LS, Easteal S. Accelerated evolution of cytochrome b in simian primates: adaptive evolution in concert with other mitochondrial proteins? J Mol Evol. 1998;47:249–57. 10.1007/PL00006382.
81. Aoyagi Blue Y, Sakai S. Low mutation rates promote the evolution of advantageous traits by preventing interference from deleterious mutations. Genetica. 2020;148:101–8. 10.1007/s10709-020-00091-6.
82. Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–42. 10.1126/science.1138632.
83. Chen S, Yao H, Han J, Liu C, Song J, Shi L, et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS ONE. 2010;5:e8613. 10.1371/journal.pone.0008613.
84. Duan H, Wang W, Zeng Y, Guo M, Zhou Y. The screening and identification of DNA barcode sequences for Rehmannia. Sci Rep. 2019;9:17295. 10.1038/s41598-019-53752-8.
85. Ministerio de Desarrollo Agrario y Riego (MIDAGRI). El maíz morado peruano. Dirección General de Políticas Agrarias – Dirección de Estudios Económicos. 2021. https://repositorio.midagri.gob.pe/handle/20.500.13036/1152.

Supplementary Materials

12864_2025_11831_MOESM1_ESM.docx (179.7KB, docx)

Supplementary Material 1. Figure S1. SNP frequency and patterns (A) and InDels (B) at the SSC region across all analyzed Zea mays genomes.

12864_2025_11831_MOESM2_ESM.xlsx (25.5KB, xlsx)

Supplementary Material 3 (120.1KB, docx).

Data Availability Statement

[CR1] 1.López E. El maíz En América Latina: Contaminación del centro de origen del maíz. In: Revista Semillas. 2005.https://www.semillas.org.co/es/el-maz-en-amrica-latina-contaminacin-del-centro-de-origen-del-maz.

[CR2] 2.Guzzon F, Arandia Rios LW, Caviedes Cepeda GM, Céspedes Polo M, Chavez Cabrera A, Muriel Figueroa J, et al. Conservation and use of Latin American maize diversity: Pillar of nutrition security and cultural heritage of humanity. Agronomy. 2021;11:172. 10.3390/agronomy11010172. [Google Scholar]

[CR3] 3.Saavedra G. Clasificacion Botanica, germinacion y Desarrollo. In: Boletin INIA - Instituto de Investigaciones Agropecuarias, No. 303. Santiago. 2014. https://hdl.handle.net/20.500.14001/7803. Accessed 18 Oct 2023.

[CR4] 4.Paliwal R, Granados G, Lafitte H, Violic A. El Maíz En Los Trópicos: Mejoramiento y producción. 2001. https://www.fao.org/3/x7650S/x7650s00.htm.

[CR5] 5.Montoro AE, Ruiz MB. Ecofisiología del cultivo de maíz dulce (Zea mays L. var. saccharata). Hortic Argent. 2017;36:153–66 https://hdl.handle.net/20.500.12123/4402. [Google Scholar]

[CR6] 6.OECD Environment Health and Safety Publications. Consensus Document on the Biology of Zea mays subsp. mays (maize). 2003. http://www.oecd.org/biotrack/.

[CR7] 7.Food and Agriculture Organization of the United Nations. FAOSTAT: Crops and livestock products. 2024. https://www.fao.org/faostat. Accessed 13 Dec 2024.

[CR8] 8.Eguillor P. ¿Qué son las Indicaciones Geográficas y las Denominaciones de Origen?. 2015. https://www.odepa.gob.cl/wp-content/uploads/2015/12/IDyDO2015.pdf.

[CR9] 9.Olivos MC, Carrasco F. Agregar Valor a Los Productos Tradicionales de Chile Con El Sello de Origen.: Revista de la OMPI. 2016. https://www.wipo.int/es/web/wipo-magazine/articles/adding-value-to-chiles-heritage-products-with-the-emsello-de-origenem-39597.

[CR10] 10.Salmeri C. Plant morphology: outdated or advanced discipline in modern plant sciences? Flora Mediterr. 2019;29:163–80. 10.7320/FlMedit29.163. [Google Scholar]

[CR11] 11.Tessler M, Galen SC, DeSalle R, Schierwater B. Let’s end taxonomic blank slates with molecular morphology. Front Ecol Evol. 2022;10:1016412. 10.3389/fevo.2022.1016412. [Google Scholar]

[CR12] 12.Zhao S, Chen X, Song J, Pang X, Chen S. Internal transcribed spacer 2 barcode: a good tool for identifying Acanthopanacis cortex. Front Plant Sci. 2015;6:840. 10.3389/fpls.2015.00840. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR13] 13.Zhang W, Tian W, Gao Z, Wang G, Zhao H. Phylogenetic utility of rRNA ITS2 sequence-structure under functional constraint. Int J Mol Sci. 2020;21:6395. 10.3390/ijms21176395. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR14] 14.Unamba CIN, Nag A, Sharma RK. Next generation sequencing technologies: The doorway to the unexplored genomics of non-model plants. Front Plant Sci. 2015;6:1074. 10.3389/fpls.2015.01074. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR15] 15.Salazar E, González M, Araya C, Mejía N, Carrasco B. Genetic diversity and intra-racial structure of Chilean Choclero corn (Zeamays L.) germplasm revealed by simple sequence repeat markers (SSRs). Sci Hortic. 2017;225:620–9. 10.1016/j.scienta.2017.08.006. [Google Scholar]

[CR16] 16.Caldu-Primo JL, Mastretta-Yanes A, Wegier A, Piñero D. Finding a needle in a haystack: distinguishing mexican maize landraces using a small number of SNPs. Front Genet. 2017;8:45. 10.3389/fgene.2017.00045. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR17] 17.Kress WJ, García-Robledo C, Uriarte M, Erickson DL. DNA barcodes for ecology, evolution, and conservation. Trends Ecol Evol. 2015;30:25–35. 10.1016/j.tree.2014.10.008. [DOI] [PubMed] [Google Scholar]

[CR18] 18.Kress WJ, Erickson DL. A two-locus global DNA barcode for land plants: The coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE. 2007;2:e508. 10.1371/journal.pone.0000508. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR19] 19.Ratnasingham S, Hebert PDN. BOLD: The barcode of life data system (http://www.barcodinglife.org). Mol Ecol Notes. 2007;7:355-64. 10.1111/j.1471-8286.2007.01678.x. [DOI] [PMC free article] [PubMed]

[CR20] 20.Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, et al. Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res. 2007;35:e14. 10.1093/nar/gkl938. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR21] 21.Savolainen V, Cowan RS, Vogler AP, Roderick GK, Lane R. Towards writing the encyclopedia of life: an introduction to DNA barcoding. Philos Trans R Soc Lond B Biol Sci. 2005;360:1805–11. 10.1098/rstb.2005.1730. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR22] 22.Yang F, Ding F, Chen H, He M, Zhu S, Ma X, et al. DNA barcoding for the identification and authentication of animal species in traditional medicine. Evid Based Complement Alternat Med. 2018;2018:5160254. 10.1155/2018/5160254. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR23] 23.Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH. Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci U S A. 2005;102:8369–74. 10.1073/pnas.0503123102. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR24] 24.Xu J. Fungal DNA barcoding. Genome. 2016;59:913–32. 10.1139/gen-2016-0046. [DOI] [PubMed] [Google Scholar]

[CR25] 25.Yi DK, Lee HL, Sun BY, Chung MY, Kim KJ. The complete chloroplast DNA sequence of Eleutherococcus senticosus (Araliaceae); comparative evolutionary analyses with other three Asterids. Mol Cells. 2012;33:497–508. 10.1007/s10059-012-2281-6. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR26] 26.Wu L, Nie L, Wang Q, Xu Z, Wang Y, He C, et al. Comparative and phylogenetic analyses of the chloroplast genomes of species of Paeoniaceae. Sci Rep. 2021;11:14643. 10.1038/s41598-021-94137-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR27] 27.Long L, Li Y, Wang S, Liu Z, Wang J, Yang M. Complete chloroplast genomes and comparative analysis of Ligustrum species. Sci Rep. 2023;13:212. 10.1038/s41598-022-26884-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR28] 28.GenBank. National Center for Biotechnology Information, Bethesda. Zea mays chloroplast genome, B73 reference genome (GENBank BioProject PRJNA10769). 2024. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA10769.

[CR29] 29.Maier RM, Neckermann K, Igloi GL, Kössel H. Complete sequence of the maize chloroplast genome: Gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J Mol Biol. 1995;251:614–28. 10.1006/jmbi.1995.0460. [DOI] [PubMed] [Google Scholar]

[CR30] 30.Lahaye R, van der Bank M, Bogarin D, Warner J, Pupulin F, Gigot G, et al. DNA barcoding the floras of biodiversity hotspots. Proc Natl Acad Sci USA. 2008;105:2923–8. 10.1073/pnas.0709936105. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR31] 31.Raskoti BB, Ale R. DNA barcoding of medicinal orchids in Asia. Sci Rep. 2021;11:23651. 10.1038/s41598-021-03025-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR32] 32.Li H, Xiao W, Tong T, Li Y, Zhang M, Lin X, et al. The specific DNA barcodes based on chloroplast genes for species identification of Orchidaceae plants. Sci Rep. 2021;11:1424. 10.1038/s41598-021-81087-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR33] 33.Pang X, Liu C, Shi L, Liu R, Liang D, Li H, et al. Utility of the trnH–psbA intergenic spacer region and its combinations as plant DNA barcodes: a meta-analysis. PLoS ONE. 2012;7:e48833. 10.1371/journal.pone.0048833. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR34] 34.Okoth P, Muoma J, Emmanuel M, Clabe W, Omayio DO, Angienda PO. The potential of DNA barcode-based delineation using seven putative candidate loci of the plastid region in inferring molecular diversity of cowpea at sub-species level. Am J Mol Biol. 2016;6:138–58. 10.4236/ajmb.2016.64014. [Google Scholar]

[CR35] 35.Raskoti BB, Jin W, Xiang X, Schuiteman A, Li D, Li J, et al. A phylogenetic analysis of molecular and morphological characters of Herminium (Orchidaceae, Orchideae): evolutionary relationships, taxonomy, and patterns of character evolution. Cladistics. 2016;32:198–210. 10.1111/cla.12125. [DOI] [PubMed] [Google Scholar]

[CR36] 36.Kistler L, Maezumi SY, Gregorio de Souza J, Przelomska NAS, Malaquias Costa F, Smith O, et al. Multiproxy evidence highlights a complex evolutionary legacy of maize in South America. Science. 2018;2018(362):1309–13. 10.1126/science.aav0207. [DOI] [PubMed] [Google Scholar]

[CR37] 37.Andrews S. FastQC: A quality control tool for high throughput sequence data [Computer software]. Babraham Bioinformatics, Cambridge. 2010. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.

[CR38] 38.Wu P, Xu C, Chen H, Yang J, Zhang X, Zhou S. NOVOWrap: An automated solution for plastid genome assembly and structure standardization. Mol Ecol Resour. 2021;21:2177–86. 10.1111/1755-0998.13410. [DOI] [PubMed] [Google Scholar]

[CR39] 39.Tillich M, Lehwark P, Pellizzer T, Ulbricht-Jones ES, Fischer A, Bock R, et al. GeSeq– Versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017;45:W6-11. 10.1093/nar/gkx391. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR40] 40.Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: Quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5. 10.1093/bioinformatics/btt086. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR41] 41.Li H, Guo Q, Xu L, Gao H, Liu L, Zhou X. CPJSdraw: analysis and visualization of junction sites of chloroplast genomes. PeerJ. 2023;11:e15326. 10.7717/peerj.15326. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR42] 42.Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol. 2018;14:e1005944. 10.1371/journal.pcbi.1005944. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR43] 43.Jiao Y, Peluso P, Shi J, Liang T, Stitzer MC, Wang B, et al. Improved maize reference genome with single-molecule technologies. Nature. 2017;546:524–7. 10.1038/nature22971. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR44] 44.Wickham H, François R, Henry L, Müller K. dplyr: A grammar of data manipulation. R package version 1.1.3. 2023. https://CRAN.R-project.org/package=dplyr

[CR45] 45.Kanaries. PyGWalker: A python library for exploratory data analysis with visualization. 2023. https://github.com/Kanaries/pygwalker.

[CR46] 46.Wickham H. ggplot2: Elegant graphics for data analysis. Springer-Verlag, Berlin. 2016. https://ggplot2.tidyverse.org

[CR47] 47.Grant JR, Enns E, Marinier E, Mandal A, Herman EK, Chen CY, et al. Proksee: In-depth characterization and visualization of bacterial genomes. Nucleic Acids Res. 2023;51:W484–92. 10.1093/nar/gkad326. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR48] 48.Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol Biol Evol. 2013;30:772–80. 10.1093/molbev/mst010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR49] 49.Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4. 10.1093/molbev/msaa015. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR50] 50.Paradis E, Schliep K. ape 5.0: An environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–8. 10.1093/bioinformatics/bty633. [DOI] [PubMed] [Google Scholar]

[CR51] 51.Shaw J, Lickey EB, Schilling EE, Small RL. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: The tortoise and the hare III. Am J Bot. 2007;94:275–88. 10.3732/ajb.94.3.275. [DOI] [PubMed] [Google Scholar]

[CR52] 52.Saldaña CL, Rodriguez-Grados P, Chávez-Galarza JC, Feijoo S, Guerrero-Abad JC, Vásquez HV, et al. Unlocking the complete chloroplast genome of a native tree species from the Amazon Basin, Capirona (Calycophyllum Spruceanum, Rubiaceae), and its comparative analysis with other Ixoroideae species. Genes (Basel). 2022;13:113. 10.3390/genes13010113. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR53] 53.Li QJ, Su N, Zhang L, Tong RC, Zhang XH, Wang JR, et al. Chloroplast genomes elucidate diversity, phylogeny, and taxonomy of Pulsatilla (Ranunculaceae). Sci Rep. 2020;10:19781. 10.1038/s41598-020-76699-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR54] 54.Yu W, Li XJ, Lv Z, Yang LE, Peng DL. The complete chloroplast genome sequences of monotypic genus Pseudogalium, and comparative analyses with its relative genera. BMC Genomics. 2025;26:93. 10.1186/s12864-025-11276-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR55] 55.Ho VT, Tran TKP, Vu TTT, Widiarsih S. Comparison of matK and rbcL DNA barcodes for genetic classification of jewel orchid accessions in Vietnam. J Genet Eng Biotechnol. 2021;19:93. 10.1186/s43141-021-00188-1. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR56] 56.Martín M, Sabater B. Plastid ndh genes in plant evolution. Plant Physiol Biochem. 2010;48:636–45. 10.1016/j.plaphy.2010.04.009. [DOI] [PubMed] [Google Scholar]

[CR57] 57.Laughlin TG, Bayne AN, Trempe JF, Savage DF, Davies KM. Structure of the complex I-like molecule NDH of oxygenic photosynthesis. Nature. 2019;566:411–4. 10.1038/s41586-019-0921-0. [DOI] [PubMed] [Google Scholar]

[CR58] 58.Ishikawa N, Takabayashi A, Noguchi K, Tazoe Y, Yamamoto H, von Caemmerer S, et al. NDH-mediated cyclic electron flow around photosystem I is crucial for C4 photosynthesis. Plant Cell Physiol. 2016;57:2020–8. 10.1093/pcp/pcw127. [DOI] [PubMed] [Google Scholar]

[CR59] 59.Ma M, Liu Y, Bai C, Yong JWH. The significance of chloroplast NAD(P)H dehydrogenase complex and its dependent cyclic electron transport in photosynthesis. Front Plant Sci. 2021;12:661863. 10.3389/fpls.2021.661863. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR60] 60.Martín M, Funk HT, Serrot PH, Poltnigg P, Sabater B. Functional characterization of the thylakoid Ndh complex phosphorylation by site-directed mutations in the ndhF gene. Biochim Biophys Acta. 2009;1787:920–8. 10.1016/j.bbabio.2009.03.001. [DOI] [PubMed] [Google Scholar]

[CR61] 61.Fleischmann TT, Scharff LB, Alkatib S, Hasdorf S, Schöttler MA, Bock R. Nonessential plastid-encoded ribosomal proteins in tobacco: a developmental role for plastid translation and implications for reductive genome evolution. Plant Cell. 2011;23:3137–55. 10.1105/tpc.111.088906. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR62] 62.Dong W, Xu C, Wen J, Zhou S. Evolutionary directions of single nucleotide substitutions and structural mutations in the chloroplast genomes of the family Calycanthaceae. BMC Evol Biol. 2020;20:96. 10.1186/s12862-020-01661-0. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR63] 63.Blank D, Wolf L, Ackermann M, Silander OK. The predictability of molecular evolution during functional innovation. Proc Natl Acad Sci USA. 2014;111:3044–9. 10.1073/pnas.1318797111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR64] 64.Hess WR, Prombona A, Fieder B, Subramanian AR, Börner T. Chloroplast rps15 and the rpoB/C1/C2 gene cluster are strongly transcribed in ribosome-deficient plastids: Evidence for a functioning non-chloroplast-encoded RNA polymerase. EMBO J. 1993;12:563–71. 10.1002/j.1460-2075.1993.tb05688.x. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR65] 65.Yang C, Wang K, Zhang H, Guan Q, Shen J. Analysis of the chloroplast genome and phylogenetic evolution of three species of Syringa. Mol Biol Rep. 2023;50:665–77. 10.1007/s11033-022-08004-w. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR66] 66.Chen HC, Stern DB. Specific binding of chloroplast proteins in vitro to the 3′ untranslated region of spinach chloroplast petD mRNA. Mol Cell Biol. 1991;11:4380–8. 10.1128/mcb.11.9.4380. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR67] 67.Chakraborty S, Sophiarani Y, Uddin A. Free energy of mRNA positively correlates with GC content in chloroplast transcriptomes of edible legumes. Genomics. 2021;113:2826–38. 10.1016/j.ygeno.2021.06.026. [DOI] [PubMed] [Google Scholar]

[CR68] 68.Santos L, Alves A, Alves R. Evaluating multi-locus phylogenies for species boundaries determination in the genus Diaporthe. PeerJ. 2017;5:e3120. 10.7717/peerj.3120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR69] 69.Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and deletions: Computational methods, evolutionary dynamics, and biological applications. Mol Biol Evol. 2024;41:msae177. 10.1093/molbev/msae177. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR70] 70.Kim K, Lee SC, Lee J, Lee HO, Joh HJ, Kim NH, et al. Comprehensive survey of genetic diversity in chloroplast genomes and 45S nrDNAs within panax ginseng species. PLoS ONE. 2015;10:e0117159. 10.1371/journal.pone.0117159. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR71] 71.Ye J, Luo Q, Lang Y, Ding N, Jian YQ, Wu ZK, et al. Analysis of chloroplast genome structure and phylogeny of the traditional medicinal of Ardisia crispa (Myrsinaceae). Sci Rep. 2024;14:19045. 10.1038/s41598-024-66563-3. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR72] 72.Li DM, Pan YG, Liu HL, Yu B, Huang D, Zhu GF. Thirteen complete chloroplast genomes of the costaceae family: Insights into genome structure, selective pressure and phylogenetic relationships. BMC Genomics. 2024;25:68. 10.1186/s12864-024-09996-4. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR73] 73.Smirnov V, Warnow T. Phylogeny estimation given sequence length heterogeneity. Syst Biol. 2021;70:268–82. 10.1093/sysbio/syaa058. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR74] 74.Srivathsan A, Meier R. On the inappropriate use of Kimura-2-parameter (K2P) divergences in the DNA-barcoding literature. Cladistics. 2012;28:190–4. 10.1111/j.1096-0031.2011.00370.x. [DOI] [PubMed] [Google Scholar]

[CR75] 75.Bedoya CA, Dreisigacker S, Hearne S, Franco J, Mir C, Prasanna BM, et al. Genetic diversity and population structure of native maize populations in Latin America and the Caribbean. PLoS ONE. 2017;12:e0173488. 10.1371/journal.pone.0173488. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR76] 76.Vigouroux Y, Glaubitz JC, Matsuoka Y, Goodman MM, Sánchez GJ, Doebley J. Population structure and genetic diversity of New World maize races assessed by DNA microsatellites. Am J Bot. 2008;95:1240–53. 10.3732/ajb.0800097. [DOI] [PubMed] [Google Scholar]

[CR77] 77.Provan J, Soranzo N, Wilson NJ, Goldstein DB, Powell W. A low mutation rate for chloroplast microsatellites. Genetics. 1999;153:943–7. 10.1093/genetics/153.2.943. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR78] 78.Smith DR. Mutation rates in plastid genomes: They are lower than you might think. Genome Biol Evol. 2015;7:1227–34. 10.1093/gbe/evv069. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR79] 79.Christie JR, Beekman M. Uniparental inheritance promotes adaptive evolution in cytoplasmic genomes. Mol Biol Evol. 2016;34:677–91. 10.1093/molbev/msw266. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR80] 80.Andrews TD, Jermiin LS, Easteal S. Accelerated evolution of cytochrome b in simian primates: adaptive evolution in concert with other mitochondrial proteins? J Mol Evol. 1998;47:249–57. 10.1007/PL00006382. [DOI] [PubMed] [Google Scholar]

[CR81] 81.Aoyagi Blue Y, Sakai S. Low mutation rates promote the evolution of advantageous traits by preventing interference from deleterious mutations. Genetica. 2020;148:101–8. 10.1007/s10709-020-00091-6. [DOI] [PubMed] [Google Scholar]

[CR82] 82.Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–42. 10.1126/science.1138632. [DOI] [PubMed] [Google Scholar]

[CR83] 83.Chen S, Yao H, Han J, Liu C, Song J, Shi L, et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS ONE. 2010;5:e8613. 10.1371/journal.pone.0008613. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR84] 84.Duan H, Wang W, Zeng Y, Guo M, Zhou Y. The screening and identification of DNA barcode sequences for Rehmannia. Sci Rep. 2019;9:17295. 10.1038/s41598-019-53752-8. [DOI] [PMC free article] [PubMed] [Google Scholar]

85. Ministerio de Desarrollo Agrario y Riego (MIDAGRI). El maíz morado peruano [Peruvian purple maize]. Dirección General de Políticas Agrarias, Dirección de Estudios Económicos; 2021. https://repositorio.midagri.gob.pe/handle/20.500.13036/1152.


