Biotechnology Reports. 2025 Aug 21;48:e00918. doi: 10.1016/j.btre.2025.e00918

Comparative genomic analysis of underutilized legumes: insights into evolutionary relationships, genome evolution and stress tolerance

Omena Bernard Ojuederie a,b, Ufuoma Lydia Akpojotor d,e, Adetomiwa Ayodele Adeniji b,g, Tina Chukwuyem Ojuederie d,f, Jacob Olagbenro Popoola h, Olubukola Oluranti Babalola b,c
PMCID: PMC12424240  PMID: 40949161

Highlights

  • Significant genetic diversity was observed among the four legumes; the closest evolutionary relationship was between Cowpea and Mung bean, while Winged bean was the most divergent.

  • Cowpea and Mung bean experienced significant gene expansions, while African yam bean and Winged bean had substantial gene losses.

  • The legumes showed adaptations to biotic and abiotic stresses, highlighting their potential as climate-resilient crops.

  • African yam bean had the highest potential for producing different secondary metabolites, with the highest gene counts for saccharide and terpene biosynthesis.

  • Mung bean had the highest gene clusters for alkaloids and polyketides biosynthesis.

  • Opportunity legumes, especially African yam bean, had higher levels of essential amino acids than Cowpea, suggesting that they can contribute to improved nutrition.

  • The identified genetic diversity of the legumes can be employed in the development of varieties with improved nutritional value and resilience to environmental stress.

  • Promoting the cultivation and consumption of these legumes can enhance food and nutrition security and help meet the SDGs of Zero Hunger and Good Health and Well-being.

Keywords: Bioactive compounds; Food security; Legume genomics; Orthologous genes; Opportunity crops; Secondary metabolites; Neglected and underutilized legumes

Abstract

African yam bean (AYB), mung bean, and winged bean are rich sources of nutrients and bioactive compounds and offer significant potential for food and nutrition security, yet they remain underutilized. A comparative genomic analysis of these legumes and cowpea was conducted to characterize their genome architecture and genomic profiles. Protein and genomic FASTA sequences were retrieved from NCBI GenBank; orthologous genes were investigated with OrthoVenn3, and secondary metabolite gene clusters were predicted with plantiSMASH. A total of 7761 single-copy clusters and 20,250 singleton (unique) genes were identified, reflecting both the genetic diversity and the conservation of these genomes. Phylogenetic analysis showed the closest relationship between cowpea and mung bean, with winged bean the most divergent. Cowpea and mung bean had significant gene family expansions (+1051), while AYB (−864) and winged bean (−643) had substantial gene family contractions. GO enrichment revealed genes contributing to the adaptation of the different legume species to biotic and abiotic stresses, highlighting their potential as climate-resilient crops. AYB had the highest protein-coding gene (enzyme) counts for saccharide (68) and terpene (18) biosynthesis, whereas mung bean had the most gene clusters for alkaloids (10) and polyketides (5) and the highest enzyme counts for alkaloid (32) and polyketide (17) biosynthesis. The underutilized legumes had higher essential amino acid levels than cowpea. These findings provide valuable insights for breeding programs and biotechnological interventions to improve the nutritional value and acceptance of these underutilized legumes, ultimately contributing to food and nutrition security.

1. Introduction

UN estimates indicate that the global population will reach 9.7 billion by 2050, so 70 % more food will need to be produced to feed everyone [1]. Rising demand for food has raised concerns about the availability of the necessary resources [2], and food insecurity is likely to increase if urgent steps are not taken, because current staple food production will not be sufficient to meet these needs. To avoid this situation, coordinated efforts must be made to ensure food security through the cultivation of alternative crops, particularly in sub-Saharan Africa (SSA). There, climate change is expected to pose a significant challenge because extreme weather events such as excessive heat, drought, and flooding have become more frequent and longer lasting [3], reducing crop production and yield. Numerous investigations have shown that crop productivity losses due to climate change are becoming more frequent [4], with yield losses of 10 to 25 % per degree Celsius of warming reported for wheat, maize, and rice [5].

A group of crops often called orphan, neglected, or underutilized crops is receiving increasing research attention because they can grow under diverse agroecological conditions and enhance food and nutrition security, especially in the developing world. These crops, which include some tuber crops (taro, cocoyam, sweet potato) and legumes such as African yam bean (Sphenostylis stenocarpa), winged bean (Psophocarpus tetragonolobus), mung bean (Vigna radiata), and Bambara groundnut (Vigna subterranea), could strengthen food security if incorporated into people's daily diets. Resource-poor subsistence farmers are the primary producers of these native crops. Underutilized crops are rich sources of mineral nutrients and antioxidants, have medicinal uses, and are well adapted to suboptimal growing conditions [6]. Nevertheless, they have received little attention from the scientific community because of limited investment in research and genetic enhancement, and because they are not traded internationally at rates comparable to the major staple crops [6]. Greater use of these crops can improve nutrition and combat hidden hunger, whereas their loss can harm the food security and nutritional status of the impoverished [7].

In combating malnutrition and food insecurity, legumes are vital for preserving food standards and improving the physicochemical characteristics of soil [2]. The underutilized legumes mentioned above are common in Africa and are valuable sources of nutrition and of bioactive compounds that promote the health of humans and livestock. Antioxidant-rich legume seeds can reduce the harmful effects of oxidative stress and enhance human health [8]. These underutilized legumes offer several health benefits because of their high content of dietary fiber, vitamins, and polyunsaturated fats, and they can be used as functional foods [9].

This study focused on the comparative genomics of African yam bean, winged bean, mung bean, and cowpea because of their growing importance as food and nutrition security crops in SSA. The study could support the genetic improvement of these legumes, food security, and agricultural sustainability, and help identify traits relevant to adaptation to environmental stresses. The Rockefeller Foundation recently supported the Vision for Adapted Crops and Soils (VACS) initiative to increase agricultural productivity and nutrition. VACS is a collaborative partnership between the African Union (AU), the UN Food and Agriculture Organization (FAO), and the United States Department of State's Office of the Special Envoy for Global Food Security. The partnership calls for more investment in underutilized indigenous and traditional food crops, including some legumes, cereals, fruits, roots and tubers, and vegetables, to boost nutrition in Africa and strengthen resilience to climate change. VACS refers to these crops as “Opportunity Crops”: crops with abundant untapped potential to enhance food and nutrition security in the context of climate change [10]. Cowpea and mung bean are among the listed opportunity crops in the legume category because of their potential to improve livelihoods and boost food and nutrition security globally. African yam bean (Sphenostylis stenocarpa) and winged bean (Psophocarpus tetragonolobus), two tuberous legumes consumed for both their seeds and tubers in Africa, also fall into the legume category of opportunity crops. Comparing them with cowpea and mung bean should give more insight into their evolutionary history and gene functions, especially with regard to adaptation to biotic and abiotic stresses.

African yam bean (AYB) is a dual-purpose crop eaten for its seeds and tubers in East and Central Africa, but only for its seeds in West Africa. It is rich in mineral nutrients (iron, potassium, magnesium, and nitrogen) and bioactive compounds, with high protein content in the seeds (37 %) and tubers (15 %) [8,9]. AYB flowers are cleistogamous, and the plant requires staking, so farmers often grow it with yams so that the same stakes can be shared. A typical AYB plant and its products are presented in Fig. 1. AYB originated in Ethiopia and spread to other parts of Africa. In different regions of Nigeria, AYB is grown by resource-poor farmers who hold its genetic resources [11,12]. It is used in a variety of ways, for example in bean cake (moi moi) and as a special delicacy at ceremonies [12,13].

Fig. 1.


African yam bean: (a) flowering, (b) pod formation, (c) tuber formation [15].

Despite the crop's benefits, several production constraints have been reported, including photoperiod sensitivity, which affects tuberization; a hard seed coat, which increases the fuel needed to cook the seeds; poor shelf life; and low seed yield [12]. AYB germplasm has been characterized using morphological and molecular approaches to identify accessions with useful traits, but no major breeding program has been undertaken for the crop. Understanding the genetic makeup of AYB, its genetic diversity, and its evolutionary relationships with related legumes may lead to its genetic improvement and wider utilization [[14], [15], [16]].

This can be accomplished using molecular and sequence data generated with molecular markers. Markers targeting chloroplast genes have been developed for plants and are used to analyze evolutionary patterns and sequence variants at the genus level. The rbcL gene, which encodes the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), is one such chloroplast gene. Popoola et al. [9] used partial rbcL gene sequences to assess intraspecific variability and phylogenetic relationships among AYB accessions. They found five haplotypes and thirteen polymorphic sites that could be exploited through mutation breeding, nuclear techniques, and modern biotechnologies to broaden the narrow genetic base of AYB. Single nucleotide polymorphisms (SNPs) could also be used to develop specific molecular markers and genetic linkage maps for AYB.

More recently, Osuagwu et al. [17] used the rbcL gene to assess variability among AYB and related legumes in the family Fabaceae. Their maternal lineage analysis suggested that AYB may have descended from Glycine max (soybean) and Cajanus cajan (pigeon pea) before evolving into its current genetic state over a long period.

Next-generation sequencing (NGS), a high-throughput DNA sequencing technology, together with other modern technologies, has enabled plant breeders to estimate genomic breeding values and to create markers for genotyping, diversity assessment, high-density genetic mapping, and population genetics research [2]. These markers can then be combined with existing breeding approaches to achieve crop improvement goals [2]. Next-generation DNA and RNA sequencing also make it feasible to compare the sequences of multiple genomes quickly, in addition to examining individual genomes, and have made it easier to identify single-nucleotide polymorphisms (SNPs) and use them to improve crops [18].

Genome sequencing allows breeders to select genes for desirable traits that enhance yield and nutritional quality. It also helps explain how genomes are organized, reveals the pathways associated with stress responses, and identifies mutations caused by insertions and deletions in the genome [2]. Sequencing the genomes of orphan crops enables their large-scale improvement and provides a solid foundation for their modification and development through techniques such as molecular breeding and genome editing. The draft genomes of the legumes used in this study have been sequenced and annotated [[19], [20], [21], [22]]. Comparing these underutilized legumes with cowpea can identify regions of synteny and thus provide more information on their genetic potential. Waweru et al. [21] developed the first chromosome-scale assembly of the AYB genome. They found syntenic relationships spanning two or more chromosomes with both lablab (6 AYB chromosomes) and common bean (5 AYB chromosomes), and annotated the AYB gene content using combined transcript and homology evidence [21]. The recent chromosome-level genome assembly of mung bean by Khanbo et al. [23] captured 98.3 % of the highly conserved orthologs in a Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment of the predicted genes. The authors conducted a comparative genomic analysis of mung bean, black gram, cowpea, adzuki bean, common bean, soybean, chickpea, peanut, melon, watermelon, rice, and Arabidopsis using single-copy orthologous genes. The phylogenetic analysis revealed a very close relationship between mung bean and black gram, which diverged about 4.17 million years ago [23].
In addition, 2764 essential gene families shared by the three mung bean accessions and the other plant species were identified, most of them implicated in secondary metabolite production. Comparative and functional genomics are therefore essential for understanding the mechanisms of plant evolution and adaptation to biotic and abiotic stresses. Comparative genomics helps identify genetic variation in wild and cultivated plants for biodiversity-based food solutions and is useful for understanding the genome dynamics of underutilized crops. With functional genomics, geneticists and breeders can design crops that support diverse production systems and maintain the environmental services associated with agrobiodiversity. This study investigated the evolutionary relationships and genome evolution of AYB through comparison with cowpea, mung bean, and winged bean. Because AYB provides two food products (seeds and tubers) and has rich nutritional and nutraceutical benefits, this study also identified the secondary metabolite gene clusters in the genomes of these legumes to support their better utilization and improvement for sustainable food production.

2. Materials and methods

2.1. Experimental site

The in-silico study was carried out in the biotechnology laboratory of the Department of Biological Sciences, Kings University, Nigeria.

2.2. Retrieval of nucleotide and amino acid sequences

We downloaded the complete protein FASTA files for the genomes of AYB, cowpea, winged bean, and mung bean from GenBank at the National Center for Biotechnology Information (NCBI) and used them for all analyses. Details of the genome assemblies are given in Table 1.
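The protein counts per species reported later (Table 4) amount to counting the records in each downloaded protein FASTA file. The Python sketch below shows one way to parse such a file; it assumes well-formed FASTA text in which each header starts with `>` and the first word of the header is a unique record ID. The toy records and IDs are hypothetical placeholders, not NCBI data.

```python
def read_fasta(fasta_text: str) -> dict[str, str]:
    """Parse FASTA text into {record_id: sequence}.

    The record ID is the first whitespace-delimited word after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)  # close the previous record
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)  # close the last record
    return records


# Hypothetical toy input; a real file would be read with open(path).read()
toy = (
    ">prot_A hypothetical protein 1\n"
    "MKTAYIAKQR\n"
    "QISFVKSHFS\n"
    ">prot_B hypothetical protein 2\n"
    "MSTNPKPQRK\n"
)
proteins = read_fasta(toy)
print(len(proteins), "proteins")
```

Because records are stored in a dictionary keyed by ID, a file with duplicated IDs would be undercounted; counting header lines directly avoids that if duplicates are possible.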

Table 1.

Information on the legume genome assemblies used in this study, retrieved from the NCBI database.

Crop | Scientific name | Genome assembly | Submitter of the genome | Date assembled
Cowpea (cultivar IT97K-499-35) | Vigna unguiculata | ASM411807v2 | University of California, Riverside | January 2019
Mung bean | Vigna radiata | Vradiata_ver6 | Seoul National University | October 2015
Winged bean | Psophocarpus tetragonolobus | Ptet_1 | Agricultural Genomics Institute at Shenzhen | April 2024
African yam bean | Sphenostylis stenocarpa | hybrid_with_Hi-C | International Livestock Research Institute | October 2023

2.3. Orthologous gene analysis

The downloaded protein FASTA sequences of each crop were uploaded to OrthoVenn3 for validation and analysis, with the OrthoMCL algorithm selected for clustering. OrthoVenn3 compared the protein-coding genes of the four species to identify shared and species-specific orthologous gene clusters. The analysis options (annotation, protein similarity, phylogenetic analysis, and gene family expansion and contraction) were selected and the analyses were run. Orthologous gene clusters and groups were identified, and OrthoVenn3 annotated the gene clusters using UniProt.
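OrthoMCL, the algorithm selected in OrthoVenn3, builds a graph of putative orthologous and paralogous protein pairs from all-against-all similarity searches and then partitions it with the Markov Cluster (MCL) algorithm. The Python sketch below only illustrates the bookkeeping that follows clustering. It replaces MCL with simple connected components (union-find) over a given list of ortholog pairs, labels proteins left outside any cluster as singletons, and flags clusters holding exactly one protein per species as single-copy. The species codes, protein IDs, and pairs are hypothetical, and the single-copy rule is an assumed definition.

```python
from collections import defaultdict


def cluster_proteins(proteins, ortholog_pairs):
    """Group proteins into clusters with union-find (connected components).

    proteins: dict of protein ID -> species code
    ortholog_pairs: iterable of (protein_a, protein_b) putative ortholog pairs
    Returns (clusters, singletons): clusters is a list of sets with >= 2 proteins;
    singletons is a sorted list of proteins that joined no cluster.
    NOTE: a simplification; OrthoMCL clusters with MCL, not connected components.
    """
    parent = {p: p for p in proteins}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    for a, b in ortholog_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    clusters = [g for g in groups.values() if len(g) > 1]
    singletons = sorted(p for g in groups.values() if len(g) == 1 for p in g)
    return clusters, singletons


def is_single_copy(cluster, proteins, all_species):
    """True if the cluster contains exactly one protein from every species."""
    species = [proteins[p] for p in cluster]
    return len(species) == len(all_species) and set(species) == set(all_species)


# Hypothetical codes: VU cowpea, VR mung bean, SS AYB, PT winged bean
species = {"VU", "VR", "SS", "PT"}
proteins = {"vu1": "VU", "vu2": "VU", "vr1": "VR", "vr2": "VR",
            "ss1": "SS", "ss2": "SS", "pt1": "PT"}
pairs = [("vu1", "vr1"), ("vr1", "ss1"), ("ss1", "pt1"), ("vu2", "vr2")]
clusters, singletons = cluster_proteins(proteins, pairs)
n_single_copy = sum(is_single_copy(c, proteins, species) for c in clusters)
print(len(clusters), "clusters;", n_single_copy, "single-copy;", singletons)
```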

2.4. Phylogenetic analysis

Highly conserved single-copy genes were used to build a phylogenetic tree describing the evolutionary relationships among the legume species. The sequences were aligned with MUSCLE, and a phylogenetic tree was then constructed by the maximum likelihood method with the JTT+CAT model to show the orthogroups in each species. CAFE5 (Computational Analysis of gene Family Evolution; https://github.com/hahnlab/CAFE5) was used to estimate gene family expansions and contractions. The aligned multiple FASTA sequences were exported to MEGA II for further analyses of the amino acid composition of each genome and of the conserved and variable sites within the sequences.
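The amino acid composition and the counts of conserved and variable sites were obtained in MEGA. As a rough illustration of what these quantities are, the Python sketch below computes them for a small, hypothetical protein alignment. Skipping any column that contains a gap (similar in spirit to a complete-deletion option) is an assumption of this sketch, not necessarily the setting used in the study.

```python
from collections import Counter


def aa_composition(seq):
    """Percentage of each amino acid in one sequence, ignoring gaps ('-')."""
    residues = [aa for aa in seq.upper() if aa != "-"]
    counts = Counter(residues)
    total = len(residues)
    return {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


def site_summary(alignment):
    """Count conserved and variable columns in an alignment.

    A column is conserved if all sequences share one residue, variable otherwise.
    Columns containing a gap are skipped (an assumption of this sketch).
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    conserved = variable = 0
    for column in zip(*alignment):
        if "-" in column:
            continue
        if len(set(column)) == 1:
            conserved += 1
        else:
            variable += 1
    return {"conserved": conserved, "variable": variable}


toy_alignment = ["MKV-LA", "MKVQLS", "MRVQLA", "MKVQLA"]  # hypothetical
print(site_summary(toy_alignment))  # {'conserved': 3, 'variable': 2}
print(aa_composition(toy_alignment[0]))
```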

2.5. Secondary metabolite prediction and comparison

Complete genome GenBank files for the plant genomes were downloaded from the NCBI website and submitted to the plantiSMASH portal for secondary metabolite analysis [24]. The amino acid sequences of the proteins in the predicted secondary metabolite gene clusters were extracted from the plantiSMASH output and submitted to the sequence search of the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) to obtain three-dimensional (3D) structures or models of the unique proteins in the plant genomes [25]. Default settings were used for these analyses.

3. Results

3.1. Genome assembly of legumes

The genome assembly statistics for each crop are presented in Table 3. Genome size ranged from 463.1 Mb (mung bean) to 709.8 Mb (winged bean); winged bean and AYB, both tuberous legumes, had the larger genomes. Three of the legumes had 11 chromosomes, whereas winged bean had 9. Cowpea and mung bean had the same GC (guanine-cytosine) content (33 %); AYB had the highest GC content (34.5 %) and winged bean the lowest (32 %). GC content plays a significant role in shaping genome architecture, evolution, and function, making it an important aspect of genomic research.

Table 2.

Statistics of the orthologous gene analysis.

Statistic | Value
Overlaps | 15
All clusters | 24,535
Single-copy clusters | 7761
All proteins | 145,376
All singletons | 20,250
Percentage of singletons | 13.93 %

The orthologous gene analysis identified a total of 24,535 clusters, of which 7761 were single-copy clusters (Table 2), that is, orthologous clusters in which each species contributes exactly one gene. A total of 145,376 proteins were obtained from the four genomes: cowpea (41,082), mung bean (42,284), winged bean (31,170), and AYB (30,840). Cowpea and mung bean therefore had more proteins than AYB and winged bean (Table 4). A total of 20,250 singletons (13.93 %) were identified, ranging from 2410 in cowpea to 8004 in AYB, with 3096 in mung bean and 6740 in winged bean (Table 4).

Table 3.

Genome assembly statistics of the four legumes.

Statistic | Vigna unguiculata | Vigna radiata | Sphenostylis stenocarpa | Psophocarpus tetragonolobus
Genome size | 518.6 Mb | 463.1 Mb | 649.8 Mb | 709.8 Mb
Total ungapped length | 515.9 Mb | 429.6 Mb | 649.6 Mb | 709.7 Mb
Number of chromosomes | 11 | 11 | 11 | 9
Number of scaffolds | 675 | 2497 | 11 | 362
Scaffold N50 | 41.7 Mb | 25.4 Mb | 69.5 Mb | 83.5 Mb
Scaffold L50 | 6 | 6 | 4 | 4
Number of contigs | 743 | 23,499 | 2003 | 423
Contig N50 | 10.9 Mb | 48.8 kb | 859.1 kb | 13.2 Mb
Contig L50 | 16 | 2292 | 170 | 19
GC content (%) | 33 | 33 | 34.5 | 32
Genome coverage | 91.0x | 300.0x | 60.0x | 91.0x
Assembly level | Chromosome | Chromosome | Chromosome | Chromosome
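N50 and L50 in Table 3 summarize assembly contiguity. After sorting scaffolds (or contigs) from longest to shortest, N50 is the length of the sequence at which the running total first reaches half of the assembly length, and L50 is how many sequences that takes; for example, the scaffold L50 of 4 for AYB means that its four longest scaffolds cover at least half of the assembly. The Python sketch below computes these values and the GC percentage for hypothetical toy inputs. Excluding ambiguous bases (such as N) from the GC denominator is an assumption and may differ from NCBI's calculation.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no sequence lengths given")


def gc_percent(seq):
    """GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 100.0 * gc / (gc + at)


toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # hypothetical lengths (total 300)
print(n50_l50(toy_lengths))       # (70, 2)
print(gc_percent("ATGCGCATNN"))   # 50.0
```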

Table 4.

Distribution of parameters from orthologous genes of cowpea, mung bean, winged bean and African yam bean.

Species | Proteins | Clusters | Singletons
Psophocarpus tetragonolobus | 31,170 | 16,564 | 6740
Vigna radiata | 42,284 | 21,796 | 3096
Vigna unguiculata | 41,082 | 21,903 | 2410
Sphenostylis stenocarpa | 30,840 | 19,866 | 8004
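The singleton percentage in Table 2 can be reproduced from the per-species counts in Table 4 by dividing the total number of singletons by the total number of proteins, as the short Python check below does. Per-species cluster counts are left out because a cluster shared by several species is counted once for each of them, so those counts do not add up to the 24,535 clusters overall.

```python
# Per-species counts copied from Table 4
table4 = {
    "Psophocarpus tetragonolobus": {"proteins": 31_170, "singletons": 6_740},
    "Vigna radiata": {"proteins": 42_284, "singletons": 3_096},
    "Vigna unguiculata": {"proteins": 41_082, "singletons": 2_410},
    "Sphenostylis stenocarpa": {"proteins": 30_840, "singletons": 8_004},
}

total_proteins = sum(row["proteins"] for row in table4.values())
total_singletons = sum(row["singletons"] for row in table4.values())
pct_singletons = round(100 * total_singletons / total_proteins, 2)
print(total_proteins, total_singletons, pct_singletons)  # 145376 20250 13.93

# Share of each species' proteins that fell outside any cluster
for species, row in table4.items():
    share = 100 * row["singletons"] / row["proteins"]
    print(f"{species}: {share:.1f} % singletons")
```

On these numbers, singletons make up roughly a quarter of AYB proteins and a fifth of winged bean proteins, against under 10 % for cowpea and mung bean.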

3.2. Venn diagram

The Venn diagram shows the distribution of orthologous gene clusters across the genomes of the four species (Fig. 2). It identifies clusters shared by all species, clusters unique to a single species, and clusters shared by particular pairs of species, which indicate lineage-specific evolutionary events. A total of 751 gene clusters were unique to mung bean, 647 to cowpea, 481 to winged bean, and 391 to AYB. Cowpea and mung bean shared 1464 clusters, while AYB and winged bean shared 580. Far fewer clusters were shared between cowpea and winged bean (90) and between winged bean and mung bean (73). All four legume species shared 13,674 orthologous gene clusters.
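Read in the usual way, each Venn region counts clusters present in exactly that combination of species and in no other; for example, the 1464 clusters shared by cowpea and mung bean would be absent from AYB and winged bean. A minimal Python sketch of this tally follows, assuming each cluster is represented by the set of species in which it occurs; the cluster IDs and species codes are hypothetical.

```python
from collections import Counter


def venn_regions(presence):
    """Count clusters in each exclusive Venn region.

    presence: dict of cluster ID -> set of species containing that cluster.
    Returns a dict mapping a sorted tuple of species to the number of clusters
    found in exactly that combination of species.
    """
    regions = Counter(tuple(sorted(species)) for species in presence.values())
    return dict(regions)


# Hypothetical clusters; VU cowpea, VR mung bean, SS AYB, PT winged bean
toy_presence = {
    "c1": {"VU", "VR", "SS", "PT"},
    "c2": {"VU", "VR"},
    "c3": {"VR"},
    "c4": {"VU", "VR"},
    "c5": {"SS", "PT"},
}
for combo, n in sorted(venn_regions(toy_presence).items()):
    print("+".join(combo), n)
```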

Fig. 2.


Venn diagram showing distribution of orthologous gene clusters in the four legume species.

3.3. Comparative genomics based on phylogeny

The phylogenetic tree of Vigna radiata, Psophocarpus tetragonolobus, Sphenostylis stenocarpa, and Vigna unguiculata based on single-copy gene clusters is presented in Fig. 3. The numbers on the branches represent evolutionary divergence between the species; a larger number indicates a greater genetic difference or a longer evolutionary time. Winged bean was the most distantly related of the four species, with a genetic distance of 0.094 from the others and a relatively long branch indicating significant genetic divergence. AYB had an evolutionary distance of 0.053 from winged bean and 0.039 from both cowpea and mung bean. Mung bean was more closely related to cowpea than to AYB, with a genetic distance of 0.028; the closest evolutionary relationship was therefore between cowpea and mung bean. The tree indicates that winged bean diverged first from the common ancestor of all four species, AYB then diverged from the lineage leading to cowpea and mung bean, and cowpea and mung bean diverged most recently from their common ancestor (Fig. 3).
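The distance between two species on such a tree is the sum of the branch lengths along the path that joins them (the patristic distance), which is why a short path signals a close relationship. The Python sketch below computes these distances on a rooted tree with the topology described above, (((cowpea, mung bean), AYB), winged bean). The branch lengths and internal node names are hypothetical round numbers chosen for illustration, not the values in Fig. 3.

```python
from itertools import combinations


def patristic(a, b, parent):
    """Sum of branch lengths on the path between nodes a and b.

    parent: dict of node -> (parent_node, branch_length); the root has no entry.
    """
    # Distance from a up to each of its ancestors (a itself at 0)
    dist_from_a = {a: 0.0}
    node, total = a, 0.0
    while node in parent:
        up, length = parent[node]
        total += length
        dist_from_a[up] = total
        node = up
    # Walk up from b until reaching an ancestor of a (the last common ancestor)
    node, total = b, 0.0
    while node not in dist_from_a:
        up, length = parent[node]
        total += length
        node = up
    return total + dist_from_a[node]


tree = {  # hypothetical branch lengths (substitutions per site)
    "winged bean": ("root", 0.05),
    "n1": ("root", 0.02),
    "AYB": ("n1", 0.03),
    "n2": ("n1", 0.01),
    "cowpea": ("n2", 0.015),
    "mung bean": ("n2", 0.012),
}
tips = ["cowpea", "mung bean", "AYB", "winged bean"]
closest = min(combinations(tips, 2), key=lambda pair: patristic(*pair, tree))
print(closest)  # ('cowpea', 'mung bean')
```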

Fig. 3.


Species phylogenetic tree (Newick) showing evolutionary relationships among legume species.

The maximum likelihood phylogenetic tree showing the number of orthologous groups in each species is presented in Fig. 4. In this topology, winged bean is positioned as the outgroup, suggesting that it diverged from the common ancestor earlier than the other three species. The number of orthologous groups varied among species: cowpea had the most (21,903), followed by mung bean (21,796), AYB (19,866), and winged bean (16,564). This variation suggests differences in gene content and in gene family expansion or contraction during the evolutionary history of these species.

Fig. 4.


Maximum Likelihood phylogenetic tree showing orthologous groups present in cowpea, winged bean, mung bean, and AYB.

The ultrametric tree in Fig. 5 summarizes the computational analysis of gene families in the four legume species (AYB, cowpea, winged bean, and mung bean) and shows the gene family expansions and contractions identified in each. A gene family expansion is an increase in the number of genes in a family relative to the inferred ancestral state, and a contraction is a decrease. The figure shows the number of gene families gained or lost along each branch against a time scale (x-axis) running from 10 million years ago to the present (0). The four legume species diverged from each other at different times, and gene family expansions and contractions occurred throughout their evolutionary history.

Fig. 5.


Ultrametric tree showing gene family expansions and contractions in the four legume species.

The tree revealed that the cowpea and mung bean lineage had the largest number of gene family expansions (+1051), indicating a significant increase in gene families relative to its common ancestor with the other species. After diverging from each other, cowpea and mung bean gained further gene families (+698 and +862, respectively). AYB (+33) and winged bean (+37) had relatively few gene family expansions. AYB had the largest number of gene family contractions (−864), indicating a substantial loss of gene families relative to its ancestral state, followed by winged bean (−643). Cowpea and mung bean had fewer gene family contractions (−153 and −159, respectively).
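CAFE5 infers ancestral gene family sizes with a birth-death model and tests which changes along each branch are significant. The Python sketch below covers only the final bookkeeping step and is not CAFE5's method: given hypothetical gene counts per family at a parent node and at a child node, it counts how many families grew and how many shrank along that branch. This is the kind of +/− tally shown in Fig. 5, assuming the figure reports numbers of families rather than numbers of genes.

```python
def tally_changes(ancestral, descendant):
    """Count gene families that expanded or contracted along one branch.

    ancestral, descendant: dicts of family ID -> gene count at the parent and
    child nodes. A family missing from either dict counts as 0 genes there.
    """
    expanded = contracted = 0
    for family in set(ancestral) | set(descendant):
        delta = descendant.get(family, 0) - ancestral.get(family, 0)
        if delta > 0:
            expanded += 1
        elif delta < 0:
            contracted += 1
    return {"expanded": expanded, "contracted": contracted}


# Hypothetical family sizes at a parent node and one descendant species
anc = {"F1": 2, "F2": 3, "F3": 1, "F4": 4}
desc = {"F1": 3, "F2": 5, "F3": 0, "F4": 2, "F5": 1}
result = tally_changes(anc, desc)
print(f"+{result['expanded']} / -{result['contracted']}")
```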

3.4. Pairwise heatmap

The pairwise heatmap provides a visual representation of the relationships between the four legume species (Fig. 6); the color intensity of each cell represents the strength of the relationship between the two species compared. Cowpea and mung bean had the strongest relationship, indicated by the most intense red in their cell, suggesting that they share the largest number of orthologous gene clusters. Winged bean and AYB appear to have a weaker relationship than cowpea and mung bean, as indicated by the lighter color of their cell, although they share more genes with each other than with cowpea or mung bean. Overall, the heatmap highlights the closer relationship between cowpea and mung bean compared with the other pairs.
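Assuming each heatmap cell encodes the number of orthologous clusters that two species have in common, regardless of which other species also carry them, the cell values can be computed as in the Python sketch below; the cluster table is hypothetical. These pairwise counts therefore include clusters that are also present in a third or fourth species.

```python
from itertools import combinations


def pairwise_shared(presence, species):
    """Number of clusters containing both species of each pair.

    presence: dict of cluster ID -> set of species containing the cluster.
    Returns a dict mapping (species_a, species_b) -> shared cluster count.
    """
    return {
        (a, b): sum(1 for members in presence.values() if a in members and b in members)
        for a, b in combinations(species, 2)
    }


species = ["cowpea", "mung bean", "AYB", "winged bean"]
toy_presence = {  # hypothetical clusters
    "c1": {"cowpea", "mung bean", "AYB", "winged bean"},
    "c2": {"cowpea", "mung bean"},
    "c3": {"mung bean"},
    "c4": {"cowpea", "mung bean"},
    "c5": {"AYB", "winged bean"},
}
shared = pairwise_shared(toy_presence, species)
for (a, b), n in shared.items():
    print(f"{a} / {b}: {n}")
```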

Fig. 6.


Pairwise heatmap of four legume species: Winged bean, Mung bean, Cowpea, and African yam bean.

3.5. Single copy gene clusters

Analysis of the single-copy gene clusters from the four legume genomes revealed the biological processes, molecular functions, and cellular components in which these clusters are involved. Each of these clusters contains a single copy of the gene in each legume genome, and their functions are conserved across all four genomes. Ten important biological processes, molecular functions, and cellular components associated with the single-copy gene clusters are shown in Fig. 7, with Gene Ontology (GO) annotations given for each cluster. The largest share of the single-copy gene clusters (25.17 %) was annotated with the general term biological process (GO:0008150).

Fig. 7.


Biological processes, molecular functions and cellular components controlled by single copy gene clusters.

This was followed closely by metabolic process (GO:0008152), with 13.66 % of the clusters. Metabolic process is split into primary metabolic process (GO:0044238) and macromolecule metabolic process (GO:0043170), with 5.81 % and 8.03 % of the single-copy gene clusters, respectively (Fig. 7). These processes contribute to nucleic acid metabolic process (GO:0090304) and RNA metabolic process (GO:0016070), respectively (Fig. 8). Clusters involved in macromolecule metabolic process and nucleobase-containing compound metabolic process interact in nucleic acid metabolism, which is essential for RNA metabolism in the legumes; these genes are therefore involved in RNA metabolism and in producing functional gene products in plants. Single-copy gene clusters associated with cellular process (11.1 %; GO:0009987) and cellular metabolic process (9.68 %; GO:0044237) were also identified. The term cellular metabolic process has since been replaced by metabolic process (GO:0008152). For molecular function, 16.96 % of the gene clusters were associated with transferase activity (GO:0016740), 14.76 % with nucleic acid binding (GO:0003676), and 14.1 % with ion binding (GO:0043167) (Fig. 7). Nucleic acid binding underlies DNA replication and protein synthesis, both of which are required for gene expression.

Fig. 8.


Metabolic processes of single copy gene clusters.

Key cellular components were also identified for the single-copy gene clusters. The largest share of clusters (22.28 %) was associated with membrane (GO:0016020). Single-copy gene clusters associated with cell part (GO:0044464) and cellular component (GO:0005575) were also abundant in the legumes. Membrane functions include those of the plasma membrane, mitochondrial membrane, nuclear membrane, and other cellular membranes, which maintain cellular homeostasis. About 8.65 % of the clusters were involved in transporter activity (GO:0005215), which is essential for moving substances into and out of the cell and for transporting assimilates from one part of the plant to another. Other notable cellular components associated with single-copy gene clusters include the nucleus (GO:0005634), mitochondrion (GO:0005739), and plastid (GO:0009536), accounting for 4.53 %, 10.87 %, and 3.99 % of clusters, respectively (Fig. 7).

3.6. Gene ontology (GO) enrichment

GO enrichment of the single-copy clusters provided insights into the biological processes, cellular components, and molecular functions associated with the conserved genes identified, and revealed the processes that are overrepresented among the gene clusters. Table 5 lists the stress-related GO terms enriched among the single-copy clusters in the four legumes. A total of 321 gene clusters were linked to response to water deprivation (GO:0009414), 295 to response to salt stress (GO:0009651), and 197 to response to oxidative stress (GO:0006979). Fewer gene clusters were associated with other abiotic stress-related terms, such as response to heat (83; GO:0009408), response to ethylene (31; GO:0009723), and response to temperature stimulus (14; GO:0009266). Single-copy genes for defense response to fungus (22; GO:0050832) and defense response to bacterium (18; GO:0042742) were identified, as well as single-copy genes for response to osmotic stress (53; GO:0006970) and a few (4) for the L-proline biosynthetic process (GO:0055129). Four genes with the molecular function acid phosphatase activity (GO:0003993) were also identified.

Table 5.

GO Enrichment of singly copy gene clusters for biotic and abiotic stress.

| GO ID | Type of process | Function | Count | P-value |
| --- | --- | --- | --- | --- |
| GO:0009414 | biological process | response to water deprivation | 321 | 8.90E-67 |
| GO:0009651 | biological process | response to salt stress | 295 | 1.43E-190 |
| GO:0006979 | biological process | response to oxidative stress | 197 | 2.85E-53 |
| GO:0009408 | biological process | response to heat | 83 | 2.39E-40 |
| GO:0009737 | biological process | response to abscisic acid | 80 | 1.30E-65 |
| GO:0048364 | biological process | root development | 67 | 4.48E-126 |
| GO:0006970 | biological process | response to osmotic stress | 53 | 4.35E-07 |
| GO:0009607 | biological process | response to biotic stimulus | 52 | 1.09E-20 |
| GO:0000302 | biological process | response to reactive oxygen species | 38 | 0.000996 |
| GO:0009723 | biological process | response to ethylene | 31 | 1.55E-20 |
| GO:0050832 | biological process | defense response to fungus | 22 | 1.22E-110 |
| GO:0080147 | biological process | root hair cell development | 22 | 0.002532 |
| GO:0042742 | biological process | defense response to bacterium | 18 | 1.12E-07 |
| GO:0009266 | biological process | response to temperature stimulus | 14 | 2.01E-09 |
| GO:0010311 | biological process | lateral root formation | 8 | 4.90E-07 |
| GO:0006749 | biological process | glutathione metabolic process | 6 | 2.98E-06 |
| GO:0055129 | biological process | L-proline biosynthetic process | 4 | 8.69E-05 |
| GO:0009610 | biological process | response to symbiotic fungus | 4 | 2.04E-08 |
| GO:0003993 | molecular function | acid phosphatase activity | 4 | 7.76E-06 |
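Over-representation of a GO term among a set of gene clusters is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The section does not state which test, background set or multiple-testing correction OrthoVenn3 applied to obtain the P-values in Table 5, so the following is only a minimal sketch assuming the hypergeometric test; the function names and the counts in the usage lines are hypothetical, not values from this study.

```python
from math import comb


def upper_tail_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: background gene clusters carrying any GO annotation
    K: background clusters annotated with the GO term of interest
    n: clusters in the study set (e.g. the single-copy clusters)
    k: study-set clusters annotated with the term
    """
    top = min(n, K)  # X cannot exceed the draws or the annotated background
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)  # exact big-integer ratio converted to float


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Share of the term in the study set divided by its share in the background."""
    return (k * N) / (n * K)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 200 study clusters vs 300 of 10,000 background clusters.
    print(fold_enrichment(30, 200, 300, 10_000))  # 5.0
    print(upper_tail_pvalue(30, 200, 300, 10_000))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the sketch needs no external library; for very large backgrounds a log-space implementation would be faster.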

GO enrichment also identified over-represented genes involved in metabolism, gene regulation, and plant physiological functions (Table 6). The largest counts were for regulation of growth (130, GO:0040008) and photosynthesis (107, GO:0015979). Enriched terms for other important biological processes included nodulation (39, GO:0009877), flower development (48, GO:0009908) and responses to plant growth hormones such as cytokinin (65, GO:0009735) and ethylene (4, GO:0071369, cellular response to ethylene stimulus), among others (Table 6). The GO enrichment identified in this study gives a better understanding of gene functions and relationships in the four legumes and has enabled the identification of novel biological processes common to the species.

Table 6.

GO Enrichment of single copy gene clusters for regulation, metabolism and physiology.

| GO ID | Type of process | Function | Count | P-value |
| --- | --- | --- | --- | --- |
| GO:0040008 | biological process | regulation of growth | 130 | 3.25E-20 |
| GO:0015979 | biological process | photosynthesis | 107 | 3.15E-08 |
| GO:0010228 | biological process | vegetative to reproductive phase transition of meristem | 97 | 9.95E-123 |
| GO:0009416 | biological process | response to light stimulus | 95 | 1.23E-33 |
| GO:0009845 | biological process | seed germination | 84 | 1.28E-19 |
| GO:0005975 | biological process | carbohydrate metabolic process | 70 | 4.61E-197 |
| GO:0006633 | biological process | fatty acid biosynthetic process | 70 | 1.27E-07 |
| GO:0009735 | biological process | response to cytokinin | 65 | 2.95E-07 |
| GO:0009908 | biological process | flower development | 48 | 1.44E-19 |
| GO:0010431 | biological process | seed maturation | 43 | 1.95E-10 |
| GO:0010114 | biological process | response to red light | 41 | 2.10E-08 |
| GO:0009793 | biological process | embryo development ending in seed dormancy | 40 | 1.98E-10 |
| GO:0009877 | biological process | nodulation | 39 | 1.31E-21 |
| GO:0009507 | cellular component | chloroplast | 36 | 5.02E-27 |
| GO:0009909 | biological process | regulation of flower development | 33 | 3.77E-06 |
| GO:0009637 | biological process | response to blue light | 24 | 7.07E-20 |
| GO:0008033 | biological process | tRNA processing | 22 | 3.92E-20 |
| GO:0006833 | biological process | water transport | 22 | 4.80E-07 |
| GO:0048510 | biological process | regulation of timing of transition from vegetative to reproductive phase | 15 | 6.14E-28 |
| GO:0010214 | biological process | seed coat development | 15 | 1.15E-09 |
| GO:0006865 | biological process | amino acid transport | 14 | 1.52E-07 |
| GO:0090351 | biological process | seedling development | 13 | 1.24E-20 |
| GO:0010337 | biological process | regulation of salicylic acid metabolic process | 12 | 2.06E-13 |
| GO:0048766 | biological process | root hair initiation | 10 | 0.026833 |
| GO:0080183 | biological process | response to photooxidative stress | 8 | 0.024811 |
| GO:0006006 | biological process | glucose metabolic process | 8 | 4.67E-06 |
| GO:0048765 | biological process | root hair cell differentiation | 7 | 1.53E-08 |
| GO:2000031 | biological process | regulation of salicylic acid mediated signaling pathway | 6 | 2.35E-08 |
| GO:0032465 | biological process | regulation of cytokinesis | 5 | 0.000482 |
| GO:0043087 | biological process | regulation of GTPase activity | 5 | 1.86E-07 |
| GO:0009639 | biological process | response to red or far-red light | 4 | 4.28E-40 |
| GO:0048768 | biological process | root hair cell tip growth | 4 | 3.00E-19 |
| GO:0048831 | biological process | regulation of shoot system development | 4 | 1.76E-05 |
| GO:0071369 | biological process | cellular response to ethylene stimulus | 4 | 0.014325 |
| GO:0010082 | biological process | regulation of root meristem growth | 4 | 4.28E-40 |

3.7. Amino acid composition of the legumes

The amino acid composition of the protein sequences of the four legume species is presented in Table 7. The underutilized legumes had the highest frequencies for seven of the nine essential amino acids (histidine, isoleucine, lysine, methionine, phenylalanine, tryptophan and valine), whereas cowpea led only for leucine and threonine. AYB had the highest percentages of seven amino acids: alanine (7.15 %), aspartic acid (5.38 %), glutamic acid (6.18 %), glycine (6.94 %), methionine (2.42 %), arginine (5.57 %, equal to mung bean) and serine (8.45 %). Winged bean had the highest percentages of cysteine (1.89 %), isoleucine (5.51 %), proline (4.95 %), glutamine (3.79 %) and tryptophan (1.51 %) (Table 7). Mung bean had the highest levels of the essential amino acids phenylalanine (4.55 %), lysine (6.10 %), valine (6.88 %) and histidine (2.53 %), as well as arginine (5.57 %) and tyrosine (3.02 %). Cowpea had the highest percentages only for leucine (9.38 %), threonine (5.29 %) and asparagine (4.88 %).

Table 7.

Amino acid frequencies (%) in the protein sequences of the four legumes.

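| Species | Ala | Cys | Asp | Glu | Phe | Gly | His | Ile | Lys | Leu | Met | Asn | Pro | Gln | Arg | Ser | Thr | Val | Trp | Tyr | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mung bean | 6.92 | 1.79 | 5.26 | 6.17 | 4.55 | 6.86 | 2.53 | 5.17 | 6.10 | 9.12 | 2.18 | 4.82 | 4.78 | 3.49 | 5.57 | 8.27 | 5.08 | 6.88 | 1.45 | 3.02 | 14,383.00 |
| Cowpea | 6.94 | 1.79 | 5.08 | 6.11 | 4.47 | 6.83 | 2.40 | 5.47 | 6.09 | 9.38 | 2.34 | 4.88 | 4.61 | 3.71 | 5.49 | 8.22 | 5.29 | 6.65 | 1.43 | 2.82 | 14,216.00 |
| African yam bean | 7.15 | 1.87 | 5.38 | 6.18 | 4.28 | 6.94 | 2.40 | 5.39 | 5.74 | 9.36 | 2.42 | 4.54 | 4.70 | 3.61 | 5.57 | 8.45 | 5.21 | 6.57 | 1.39 | 2.84 | 14,287.00 |
| Winged bean | 6.97 | 1.89 | 5.29 | 5.88 | 4.46 | 6.85 | 2.50 | 5.51 | 5.89 | 9.07 | 2.37 | 4.63 | 4.95 | 3.79 | 5.45 | 8.34 | 5.26 | 6.44 | 1.51 | 2.98 | 14,543.00 |
| Avg. | 6.99 | 1.84 | 5.25 | 6.08 | 4.44 | 6.87 | 2.46 | 5.38 | 5.96 | 9.23 | 2.33 | 4.72 | 4.76 | 3.65 | 5.52 | 8.32 | 5.21 | 6.63 | 1.44 | 2.92 | 14,357.25 |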
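Which species "leads" for each amino acid can be read directly off Table 7. The sketch below transcribes the table's percentages and reports, for every residue, the species with the highest value (keeping ties), plus the essential amino acids for which a given species leads. The nine-residue essential set is the conventional human classification; the helper names are illustrative only.

```python
# Amino acid frequencies (%) transcribed from Table 7 (Total column omitted).
RESIDUES = ["Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
            "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr"]
TABLE7 = {
    "Mung bean":        [6.92, 1.79, 5.26, 6.17, 4.55, 6.86, 2.53, 5.17, 6.10, 9.12,
                         2.18, 4.82, 4.78, 3.49, 5.57, 8.27, 5.08, 6.88, 1.45, 3.02],
    "Cowpea":           [6.94, 1.79, 5.08, 6.11, 4.47, 6.83, 2.40, 5.47, 6.09, 9.38,
                         2.34, 4.88, 4.61, 3.71, 5.49, 8.22, 5.29, 6.65, 1.43, 2.82],
    "African yam bean": [7.15, 1.87, 5.38, 6.18, 4.28, 6.94, 2.40, 5.39, 5.74, 9.36,
                         2.42, 4.54, 4.70, 3.61, 5.57, 8.45, 5.21, 6.57, 1.39, 2.84],
    "Winged bean":      [6.97, 1.89, 5.29, 5.88, 4.46, 6.85, 2.50, 5.51, 5.89, 9.07,
                         2.37, 4.63, 4.95, 3.79, 5.45, 8.34, 5.26, 6.44, 1.51, 2.98],
}
# Amino acids conventionally classed as essential for humans.
ESSENTIAL = {"His", "Ile", "Leu", "Lys", "Met", "Phe", "Thr", "Trp", "Val"}


def leaders(table):
    """Map each residue to the species with the highest frequency (ties keep all)."""
    result = {}
    for i, aa in enumerate(RESIDUES):
        best = max(row[i] for row in table.values())
        result[aa] = [sp for sp, row in table.items() if row[i] == best]
    return result


def essential_leads(leads, species):
    """Essential amino acids for which `species` has, or shares, the top frequency."""
    return sorted(aa for aa in ESSENTIAL if species in leads[aa])


if __name__ == "__main__":
    leads = leaders(TABLE7)
    for species in TABLE7:
        print(species, essential_leads(leads, species))
```

Exact float equality is safe here because every value is compared with a copy of the same literal from the table.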

3.8. Biosynthetic gene clusters (BGCs), secondary metabolite, and core domain enzyme prediction and comparison

The distribution of secondary metabolites in the legumes is shown in Fig. 9. Overall, Vigna radiata (mung bean) had 42 gene clusters, Vigna unguiculata (cowpea) 30, Psophocarpus tetragonolobus (winged bean) 35, and Sphenostylis stenocarpa (AYB) 45 (Supplementary Tables 1, 2, 3 and 4). Mung bean and AYB had the same number of terpene clusters (5), and mung bean and cowpea had the same number of putative gene clusters. Mung bean had the most gene clusters for alkaloids (10) and polyketides (5), whereas AYB had the most saccharide clusters (10) and putative gene clusters (10). The distribution of the protein gene (enzyme) count within the core cluster domains, predicted in plantiSMASH using the Cluster Database at High Identity with Tolerance (CD-HIT), is shown in Fig. 10. Among the legumes, AYB had the highest protein gene (enzyme) count for saccharide (68) and terpene (18) biosynthesis, followed by mung bean, whereas mung bean had the highest enzyme count for the biosynthesis of alkaloids (32) and polyketides (17) (Fig. 10). The unique protein genes identified in the core cluster domains of the plant genomes include cytochrome P450 genes, cellulose synthase-like genes, BAHD acyltransferase genes, strictosidine synthase-like genes, dirigent enzymes, lipoxygenases, terpene synthases, glycosyltransferases, copper amine oxidases, polyprenyl synthetases, prenyltransferases, and Pictet-Spengler enzymes (Bet v1) (Fig. 11; Supplementary Tables 1, 2, 3 and 4).

Fig. 9.


Distribution of the secondary metabolites in the legume species.

Fig. 10.


Total biosynthetic gene/enzyme count within the core cluster domains predicted by Cluster Database at High Identity with Tolerance (CD-HIT) in plantiSMASH.

Fig. 11.


Unique protein genes/enzymes identified in the core cluster domains of all the legume genomes by Cluster Database at High Identity with Tolerance (CD-HIT).

4. Discussion

Comparative genomics is crucial for understanding evolutionary relationships and genome evolution, as it compares genetic information across organisms to unravel gene structure, function, and evolution [26]. Phylogenetic and comparative genomic analyses provide insights into taxonomic affinities, evolutionary relationships, and genome similarities among closely related organisms, thereby refining our understanding of complex evolutionary processes [27]. The growing number of sequenced orphan legume crop genomes provides an ideal opportunity to re-evaluate phylogenetic and functional differences among these legumes. The clustering of single-copy genes in specific chromosomal regions suggests that they may be functionally related and may play a crucial role in maintaining core biological processes across these legumes. The high number of singletons identified in AYB suggests a greater degree of genomic divergence, possibly due to unique evolutionary pressures or an evolutionary path different from that of the other legumes. The orthologous gene clusters indicate that Sphenostylis stenocarpa and Psophocarpus tetragonolobus followed a more divergent evolutionary path, or possibly underwent a reduction in gene content under different selective pressures.

We generated a phylogenetic tree comparing Sphenostylis stenocarpa protein-coding sequences with those of other legumes in the Fabaceae family (Vigna radiata, Psophocarpus tetragonolobus, and Vigna unguiculata). The tree divided the legumes into two major clusters, with AYB sub-grouped with Vigna radiata and Vigna unguiculata. This implies that the evolutionary changes in the sequences of these three legumes are comparable, and that Vigna radiata and Vigna unguiculata are more closely related to Sphenostylis stenocarpa than Psophocarpus tetragonolobus is. Sphenostylis stenocarpa may therefore share agronomic and genetic traits with Vigna radiata and Vigna unguiculata. This is consistent with pod morphology: Sphenostylis stenocarpa, Vigna radiata, and Vigna unguiculata have similar pod shapes, unlike the distinct pods of Psophocarpus tetragonolobus. More broadly, this suggests that the genus Sphenostylis is closer to the genus Vigna than to Psophocarpus. The results are consistent with Osuagwu et al. [17], who also reported two major clades in the phylogenetic relationship of Sphenostylis stenocarpa and other legumes in the Fabaceae family based on an in silico approach. The results also corroborate those of Abdelsalam et al. [28], who reported two major clusters in the phylogenetic relationship of legumes in the Fabaceae based on DNA barcoding.

A second tree, constructed with the maximum likelihood method to compare Sphenostylis stenocarpa with other leguminous species, further supported this relationship. This corresponds with a previous report by Harms [29], who grouped the genus Sphenostylis with the genera Dolichos and Vigna [30]. The number of orthologous genes conserved across the four crops is consistent with their sharing a common ancestor. However, Sphenostylis stenocarpa shares more conserved genes with Psophocarpus tetragonolobus than with either Vigna radiata or Vigna unguiculata. This implies that Sphenostylis stenocarpa and Psophocarpus tetragonolobus may still share certain functional gene sets; notably, both legumes produce edible underground tubers. The heat map likewise showed a stronger relationship between Sphenostylis stenocarpa and Psophocarpus tetragonolobus, reflecting their larger number of shared conserved genes. Further investigation into the specific genes underlying this relationship could reveal novel traits that enable adaptation in these crops.
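The pairwise comparison of conserved genes behind the heat map can be illustrated with a small count over orthologous clusters. In the sketch below, which assumes each cluster is summarized as the number of member genes per species (OrthoVenn3 performs the clustering itself), the cluster IDs and gene counts are hypothetical toy data and the helper names are illustrative. It counts the clusters shared by each species pair and lists the clusters that are single-copy in every species.

```python
from itertools import combinations

# Hypothetical toy data: orthologous cluster ID -> {species code: member gene count}.
# Codes: Ss = Sphenostylis stenocarpa, Pt = Psophocarpus tetragonolobus,
#        Vr = Vigna radiata, Vu = Vigna unguiculata.
CLUSTERS = {
    "c1": {"Ss": 1, "Pt": 1, "Vr": 1, "Vu": 1},
    "c2": {"Ss": 1, "Pt": 1},
    "c3": {"Vr": 2, "Vu": 1},
    "c4": {"Ss": 1, "Pt": 2, "Vr": 1},
}


def shared_cluster_counts(clusters):
    """Number of clusters containing genes from both species of each pair."""
    species = sorted({sp for members in clusters.values() for sp in members})
    counts = {pair: 0 for pair in combinations(species, 2)}
    for members in clusters.values():
        # Sorting keeps each pair in the same order as the keys built above.
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts


def single_copy_clusters(clusters, species):
    """Cluster IDs with exactly one gene from every species listed."""
    return [cid for cid, members in clusters.items()
            if all(members.get(sp) == 1 for sp in species)]


if __name__ == "__main__":
    for (a, b), n in sorted(shared_cluster_counts(CLUSTERS).items()):
        print(f"{a}-{b}: {n}")
    print(single_copy_clusters(CLUSTERS, ["Ss", "Pt", "Vr", "Vu"]))
```

The resulting pair counts form the symmetric matrix that a heat map visualizes; the single-copy list corresponds to the kind of clusters used for the GO enrichment.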

The GC content of a sequence influences its stability during evolution [31]. Guanine-cytosine (G-C) pairs form three hydrogen bonds, compared with the two hydrogen bonds formed by adenine-thymine (A-T) pairs. As a result, sequences with higher GC content are more stable and contribute to the overall stability of the DNA double helix [32]. GC content can also affect the overall structure and function of the genome, including chromatin organization and replication timing [33]. In our results, Sphenostylis stenocarpa had the highest GC content (34.5 %), within the range observed for the other legumes, suggesting that they may share similar molecular stability, a common ancestor, and conserved regions. It may also indicate that Sphenostylis stenocarpa DNA is somewhat more thermostable than that of the other legumes and therefore better able to retain its structure and function under stress. This result corroborates that of Osuagwu et al. [17], who reported that the GC content of the Sphenostylis stenocarpa rbcL gene was similar to that of other legumes.
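The section does not state exactly how the GC percentages were computed. As a minimal sketch, assuming GC content is the share of G and C among unambiguous A/C/G/T bases pooled over all records of a nucleotide FASTA file (so ambiguity codes such as N are left out of the denominator), the calculation could look like this; the function names are illustrative only.

```python
def count_bases(seq: str) -> tuple[int, int]:
    """Return (G+C count, A+C+G+T count), case-insensitive; ambiguity codes ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return gc, gc + at


def gc_content(seq: str) -> float:
    """GC content (%) of a single sequence."""
    gc, acgt = count_bases(seq)
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


def gc_content_fasta(path: str) -> float:
    """GC content (%) pooled over every record in a nucleotide FASTA file."""
    gc_total = acgt_total = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):  # header line, not sequence
                continue
            gc, acgt = count_bases(line.strip())
            gc_total += gc
            acgt_total += acgt
    if acgt_total == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc_total / acgt_total


if __name__ == "__main__":
    print(gc_content("ATGCGCNNAT"))  # 50.0 (the two N bases are excluded)
```

Pooling counts across records, rather than averaging per-record percentages, weights each chromosome or scaffold by its length; whether the published figure used this convention is an assumption.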

We analyzed gene family expansions and contractions in the four leguminous crops using the CAFE program, which revealed variation among the species. The analysis showed a distinct divergence in gene family dynamics between two clades: one consisting of Vigna radiata and Vigna unguiculata, and the other of Sphenostylis stenocarpa and Psophocarpus tetragonolobus. The former clade appears to have diverged from the common ancestor primarily through gene family expansion, suggesting that these species experienced evolutionary pressures favoring diversification and increased adaptability. In contrast, the latter clade diverged through gene family contraction, potentially reflecting adaptation to specific, possibly more stable environments in which fewer gene families were required. This result is similar to earlier findings that also reported two major clusters within the legume family [34].

Our findings indicate that Sphenostylis stenocarpa and Psophocarpus tetragonolobus underwent notably fewer gene family expansions during their evolutionary history than Vigna radiata and Vigna unguiculata, implying less evolutionary change and greater conservation of the genetic regions shared with their common ancestor. Consequently, Sphenostylis stenocarpa and Psophocarpus tetragonolobus may exhibit lower functional diversity in their gene families than Vigna radiata and Vigna unguiculata. In terms of gene family expansion alone, Sphenostylis stenocarpa and Psophocarpus tetragonolobus appear more similar to the common ancestor than the other two species. Conversely, gene family contraction was significantly higher in Sphenostylis stenocarpa and Psophocarpus tetragonolobus than in Vigna radiata and Vigna unguiculata. This implies that a greater number of non-essential gene families were lost in these species, possibly resulting in a more streamlined and efficient genome, although such contraction might also indicate reduced genetic flexibility or diversity relative to Vigna radiata and Vigna unguiculata.

Considering gene family contraction alone, Sphenostylis stenocarpa and Psophocarpus tetragonolobus have diverged more from their common ancestor, reflecting greater evolutionary change. When the overall balance of gene family expansion and contraction is considered, these two species have also diverged substantially from the common ancestor, with fewer expansions and more contractions, indicating an evolutionary trajectory distinct from that of Vigna radiata and Vigna unguiculata.
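CAFE fits a stochastic birth-death model along the species tree to estimate ancestral gene family sizes and to test which changes are significant. The sketch below reproduces only the final bookkeeping step discussed above: classifying each family in one species as expanded, contracted or unchanged relative to an ancestral size, and forming an expansion-to-contraction ratio. It assumes the ancestral sizes are already available (for example, from CAFE's reconstruction), performs no significance testing, and uses hypothetical family names and sizes.

```python
from dataclasses import dataclass


@dataclass
class FamilyChanges:
    expanded: int = 0     # families larger than their ancestral size
    contracted: int = 0   # families smaller than their ancestral size
    unchanged: int = 0
    net_genes: int = 0    # genes gained minus genes lost, summed over families

    def expansion_to_contraction(self) -> float:
        """Ratio of expanded to contracted families (inf if none contracted)."""
        return self.expanded / self.contracted if self.contracted else float("inf")


def classify_families(ancestral: dict[str, int], extant: dict[str, int]) -> FamilyChanges:
    """Compare each family's size in one species with its ancestral size.

    A family missing from either dict is treated as having size 0 there.
    """
    result = FamilyChanges()
    for family in ancestral.keys() | extant.keys():
        delta = extant.get(family, 0) - ancestral.get(family, 0)
        result.net_genes += delta
        if delta > 0:
            result.expanded += 1
        elif delta < 0:
            result.contracted += 1
        else:
            result.unchanged += 1
    return result


if __name__ == "__main__":
    ancestor = {"F1": 3, "F2": 2, "F3": 1}   # hypothetical ancestral sizes
    species = {"F1": 5, "F2": 2, "F4": 1}    # hypothetical extant sizes
    changes = classify_families(ancestor, species)
    print(changes)  # FamilyChanges(expanded=2, contracted=1, unchanged=1, net_genes=2)
    print(changes.expansion_to_contraction())  # 2.0
```

A species with a low ratio, as described for Sphenostylis stenocarpa and Psophocarpus tetragonolobus, is one whose families more often shrank than grew relative to the ancestor.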

The single-copy gene clusters associated with cellular and cellular metabolic functions in this study may be housekeeping genes that the plants require for basic cell physiological processes. In addition, 147 single-copy gene clusters, accounting for 16.96 %, were identified for transferase activity. Transferases move functional groups from one molecule to another during metabolic processes. A very important transferase in plants is glycosyltransferase, which plays an essential role in glycosylation by modifying plant secondary metabolites. Glycosyltransferases are abundant in the genomes of organisms and regulate metabolic pathways with high specificity. They are involved in several processes, including the regulation of hormone homeostasis, the detoxification of xenobiotics, and the biosynthesis and storage of secondary compounds [35]. Glycosyltransferases are also involved in the biosynthesis of cell-wall polysaccharides [36]. The ready availability of plant genome sequences has allowed these roles to be tentatively confirmed; for example, several putative glycosyltransferase genes have been identified in the Arabidopsis genome [35]. Further studies are needed to determine the biochemical activity of the transferases present in the legumes investigated and to establish the exact functions of their gene products. About 14.1 % of the gene clusters were annotated with ion binding (GO:0043167), which describes the interaction between ions and molecules such as proteins or nucleic acids and is essential for various biological processes in plants.

We conducted GO enrichment of the legumes to determine the genes over-represented in the biological process, molecular function and cellular component categories. The GO enrichment analysis of single-copy genes revealed that the gene clusters were involved in a range of biological functions, including responses to both abiotic and biotic stresses. The largest numbers of gene clusters were associated with response to water deprivation, that is, drought (321; GO:0009414), and response to salt stress (295; GO:0009651), with fewer associated with response to heat (83; GO:0009408), response to ethylene (31; GO:0009723) and response to temperature stimulus (14; GO:0009266). These findings agree with the report by Shorinola et al. [37] that most underutilized crops are adapted to degraded, marginal or otherwise poor soils, and tolerate drought, cold or heat stresses.

The single-copy genes are conserved across species, and the GO enrichment helped to identify their biological, molecular, and cellular functions in the legume species investigated. The GO enrichment revealed functions that may contribute to adaptation in the different species, highlighting their potential as climate-resilient crops. It aided the annotation of single-copy genes with specific functions, thereby enhancing genome annotation, and guided the prediction of gene functions in these legume species. Because the single-copy genes identified in the four legumes are conserved, they likely share similar functions. The use of orthologous genes for the GO enrichment analysis means that the genes compared among the legume species are evolutionarily related, sharing a common ancestral gene. Moreover, using orthologs increases confidence in the genomic comparison, as it minimizes the likelihood of comparing genes of different evolutionary origins. The similarity of the GO enrichment results indicates that the orthologous genes in AYB, mung bean, winged bean, and cowpea are most likely functionally equivalent, that is, they perform the same roles in their respective species. In previous studies, AYB has been shown to be closely related to species in the genera Vigna, Glycine, and Phaseolus. Comparing the genome of AYB with those of other legumes using the current sequences has increased our knowledge of the unique relationship between AYB and winged bean, despite AYB's stronger overall relationship with mung bean and cowpea. Therefore, further research into the functional implications of these genes could be valuable for crop improvement and breeding programs.

The frequency of each amino acid reflects its relative abundance in the protein sequences [38]. AYB, mung bean, winged bean, and cowpea share an evolutionary lineage [39], yet each has developed unique traits and adaptations to suit its environment. The amino acid analysis indicates that AYB, winged bean, and mung bean together lead the commonly consumed cowpea in the frequencies of most essential amino acids. This suggests that these lesser-known legumes may offer superior nutritional value, particularly in terms of amino acid composition, which agrees with the evaluation of [40] on the chemical and nutritional composition of some underutilized legumes. Among them, AYB stood out with the highest frequencies for seven amino acids, both essential and non-essential: alanine (7.15 %), aspartic acid (5.38 %), glutamic acid (6.18 %), glycine (6.94 %), methionine (2.42 %), arginine (5.57 %, equal to mung bean), and serine (8.45 %). Notably, methionine is a sulfur-containing essential amino acid that is often limiting in legumes, making AYB a promising protein source in plant-based diets. This brings to the forefront the significance of AYB in both animal and human nutrition [41].
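The section does not describe how the per-residue frequencies in Table 7 were derived from the protein FASTA files (for example, whether residues were pooled across all proteins or averaged per protein). A minimal sketch, assuming residues are pooled across all protein sequences of a species and that non-standard symbols are skipped; the function names are illustrative only.

```python
from collections import Counter

STANDARD_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def aa_frequencies(sequences) -> dict[str, float]:
    """Percentage of each standard residue, pooled over all sequences.

    Other symbols (X, B, Z, U, '*' stop, '-' gap) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in STANDARD_RESIDUES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_RESIDUES}


def read_fasta_sequences(path: str):
    """Yield the sequences of a FASTA file, joining lines wrapped across records."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                    chunks = []
            elif line:
                chunks.append(line)
    if chunks:
        yield "".join(chunks)


if __name__ == "__main__":
    freqs = aa_frequencies(["MKKL", "AX*"])
    print(freqs["K"])  # 40.0 (2 of the 5 standard residues)
```

Pooling weights long proteins more heavily than short ones; averaging per-protein percentages instead would give slightly different values.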

Winged bean had the highest frequencies for five amino acids: the essential amino acids isoleucine (5.51 %) and tryptophan (1.51 %), as well as cysteine (1.89 %), proline (4.95 %), and glutamine (3.79 %). This profile underscores the potential of winged bean as a versatile protein-rich crop. Mung bean led in phenylalanine (4.55 %), lysine (6.10 %), valine (6.88 %), and histidine (2.53 %), all essential, as well as tyrosine (3.02 %) and arginine (5.57 %, equal to AYB), further emphasizing its high essential amino acid content, particularly lysine, which is often limiting in cereal-based diets.

In contrast, cowpea, although a staple legume in many African diets, had the highest frequencies for only three amino acids: leucine (9.38 %), threonine (5.29 %), and asparagine (4.88 %). Although leucine is critical for protein synthesis and muscle repair, cowpea leads in fewer amino acids than the underutilized species, which points to its relative nutritional limitations. These findings suggest that promoting underutilized legumes such as AYB, winged bean, and mung bean could enhance dietary amino acid intake, support food diversity, and help address protein-energy malnutrition in regions where legumes are a dietary staple. Moreover, their rich amino acid profiles make them strong candidates for crop improvement programs, functional food development, and sustainable agricultural practices [41]. This amino acid analysis also provides insights into the genetic affinity, functional differences, and potential adaptations of these legumes. Because conserved domains were used in this comparative analysis, amino acid composition is highly conserved across all four species; such conservation is essential for protein functions related to photosynthesis, nitrogen fixation, and seed development.

Advances in genomics necessitate thorough physiological and metabolic analyses of underutilized crops to bridge knowledge gaps, as these will give a more cohesive picture of their adaptation and metabolic phenotypes at the cell, organ, plant and species levels. plantiSMASH facilitated the identification, annotation, and expression analysis of plant biosynthetic gene clusters (BGCs) in the genomes of the four legume species. By combining a comprehensive library of profile hidden Markov models (pHMMs) for enzyme families known to be involved in plant biosynthetic pathways with CD-HIT clustering of predicted protein sequences belonging to the same family, it identifies genomic loci encoding multiple different (sub)families of specialized metabolic enzymes [42]. In addition, plantiSMASH can estimate the likelihood that the genes at each locus cooperate in a single pathway, through comparative genomic analysis and by examining gene expression patterns within these putative BGCs. The BGCs and secondary metabolites predicted in the four legume genomes showed that AYB had the highest protein gene (enzyme) count for saccharide (68) and terpene (18) biosynthesis, while mung bean had the most gene clusters for alkaloids (10) and polyketides (5) and the highest enzyme count for the biosynthesis of alkaloids (32) and polyketides (17). The predicted capacity of both opportunity crops to produce these bioactive compounds underlines their potential dietary value. Plant secondary metabolites are organic substances that, although produced by plants, are not directly involved in the plants' primary biological processes [[43], [44], [45]]. However, they are important for plant growth and development, innate immunity, defense response signaling, and responses to environmental stresses.
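The core idea described above, finding genomic loci where genes from several different biosynthetic enzyme families lie close together, can be illustrated with a simple co-localization scan. This is a simplified sketch of that idea only, not plantiSMASH's implementation: it assumes each gene already carries an enzyme-family label from a pHMM search, the parameters `max_gap` and `min_families` and all gene data are hypothetical, and the CD-HIT step, plantiSMASH's actual cluster rules and its coexpression scoring are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int                 # start coordinate (bp)
    family: Optional[str]      # enzyme family from a pHMM hit; None if no hit


def candidate_clusters(genes, max_gap=20_000, min_families=2):
    """Group enzyme-annotated genes that lie close together on one chromosome.

    Consecutive annotated genes whose start positions differ by at most
    `max_gap` bp are chained into one group; a group is reported only if it
    contains at least `min_families` distinct enzyme families.
    """
    annotated = sorted((g for g in genes if g.family), key=lambda g: (g.chrom, g.start))
    groups, current = [], []
    for gene in annotated:
        if current and (gene.chrom != current[-1].chrom
                        or gene.start - current[-1].start > max_gap):
            groups.append(current)
            current = []
        current.append(gene)
    if current:
        groups.append(current)
    return [grp for grp in groups if len({g.family for g in grp}) >= min_families]


if __name__ == "__main__":
    genes = [  # hypothetical genes and family labels
        Gene("g1", "chr1", 1_000, "P450"),
        Gene("g2", "chr1", 5_000, "terpene_synthase"),
        Gene("g6", "chr1", 8_000, None),
        Gene("g3", "chr1", 100_000, "P450"),
        Gene("g4", "chr2", 2_000, "glycosyltransferase"),
        Gene("g5", "chr2", 3_000, "glycosyltransferase"),
    ]
    for grp in candidate_clusters(genes):
        print([g.gene_id for g in grp])  # ['g1', 'g2']
```

Requiring several distinct families is what separates a candidate pathway locus from a simple tandem array of one enzyme family, such as g4 and g5 here.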
Some of the identified compounds have uses beyond agro-industrial purposes, including pharmaceutical, food additive, and cosmetic applications [43]. Some terpenes possess anti-malarial and anti-viral properties, are used in the treatment of cancer and diabetes, and can suppress short-term memory loss [43]. AYB, which had the highest protein gene count for terpene biosynthesis, should be investigated further to determine the types of terpenes and other bioactive compounds present; this would improve our knowledge of the crop's value for human health and encourage its wider cultivation and consumption. Terpenes may also contribute to the aroma and flavour of prepared foods and to the medicinal properties of these legumes. Likewise, alkaloids, which are regarded as plant defense molecules, have established cardioprotective, antimicrobial, anesthetic, and anti-inflammatory properties in clinical settings, with activities related to those of ephedrine, morphine, quinine, strychnine, and nicotine [43,[45], [46], [47]]. The composition of these bioactive compounds may vary with factors such as variety, growing conditions, processing methods, and cooking procedures.

Polyketides are also essential bioactive components of legumes and include flavonoids such as quercetin, the anthocyanins responsible for the red-purple colors of kidney beans, and isoflavonoids. Plant bioactives such as polyketides serve metabolic roles that support growth and development, especially in adverse environments, and have therapeutic value for human health [43]. Plant flavonoids are essential antioxidants that scavenge reactive oxygen species and are produced at increased levels under environmental stresses such as cold or heat [43]. Underutilized legumes with high flavonoid concentrations therefore show potential as climate-resilient crops, and AYB and mung bean fall into this category. Nyananyo and Nyingifa [47] investigated the phytochemicals in the seeds of three AYB genotypes with different seed colours (white, brown speckled, and red) and identified different flavonoids: the flavones apigenin and tricin in the white seeds, the flavone chrysoeriol and the isoflavone genistein in the brown speckled seeds, and apigenin (flavone) and genistein (isoflavone) in the red seeds. Similarly, Vijayakumar [48] reported that extracts of mung bean and other underutilized legumes exerted hepatoprotective effects owing to the presence of antioxidant and anti-inflammatory compounds [8]. Soetan et al. [49] confirmed the presence of the antioxidant-related phytochemicals phenolics and flavonoids in AYB. Hou et al. [50] comprehensively reviewed bioactive polyphenols, polysaccharides, and peptides in mung beans and observed that the main phenolic components of mung beans are flavonoids (1.49–1.78 mg catechin equivalent/g), phenolic acids (1.81–5.97 mg rutin equivalent/g), and tannins (1.00–5.75 mg/g). Flavonoids, the most common secondary metabolites in mung beans, occur there in five subclasses: anthocyanins, flavones, flavonols, isoflavonoids, and flavanols [41].
Polysaccharides are essential because they play major roles in several physiological activities. In this study, AYB had the highest number of saccharide gene clusters (10) and the highest protein gene (enzyme) count for saccharide biosynthesis (68), followed closely by mung bean. The saccharides produced by AYB need to be characterized to determine their usefulness to human health. Several polysaccharides have already been identified and characterized in mung bean and reported to show antioxidant [[50], [51]], radical-scavenging [50,52], and immunoregulatory activities [50,53]. These bioactive components of the neglected and underutilized legumes studied here (AYB, mung bean, and winged bean) have not been fully harnessed for improved health and well-being, as many consumers in Africa are unaware of their nutritional and health benefits [9].

Other enzymes identified in the core cluster domains of the investigated legumes include cytochrome P450s, terpenoid synthases, dirigent enzymes, glycosyltransferases, and lipoxygenases, which are involved in reactions such as the hydroxylation, demethylation, and epoxidation of various substrates. Whereas cytochrome P450s are key to the biosynthesis of flavonoids, isoflavonoids, and other secondary metabolites, terpenoid synthases catalyze the cyclization and/or isomerization of farnesyl pyrophosphate or geranyl pyrophosphate to produce various terpenoids and their derivatives. Dirigent enzymes, which are essential for the structural diversity of flavonoids, also facilitate the coupling of two phenols to form oligomeric flavonoids. Glycosyltransferases catalyze the transfer of a glycosyl group from a donor sugar to an acceptor molecule and participate in the glycosylation of flavonoids, isoflavonoids, and other secondary metabolites, regulating their solubility, stability, and biological activity. Lipoxygenases catalyze the oxidation of polyunsaturated fatty acids to hydroperoxides and facilitate the biosynthesis of jasmonic acid, a plant stress and defense hormone. The combined activity of these specialized enzymes produces the wide range of secondary metabolites that gives legumes their distinct chemical and biological characteristics. The analysis of biosynthetic gene clusters and secondary metabolites in this study identified a diverse array of bioactive compounds in the legumes, including alkaloids, polyketides, and terpenes, which have potential applications in pharmaceuticals, nutrition, and agriculture.

5. Conclusion

A comprehensive comparative genomic analysis of four legume species, Sphenostylis stenocarpa (African yam bean), Vigna radiata (mung bean), Psophocarpus tetragonolobus (winged bean), and Vigna unguiculata (cowpea), revealed significant genetic diversity. Sphenostylis stenocarpa was closely related to the genus Vigna, particularly Vigna radiata and Vigna unguiculata. However, Sphenostylis stenocarpa and Psophocarpus tetragonolobus exhibited a higher degree of genomic divergence, suggesting unique evolutionary pressures or a different evolutionary path. Gene family expansion and contraction analysis revealed distinct evolutionary trajectories, indicating different adaptive strategies, for two clades: one consisting of Vigna radiata and Vigna unguiculata and the other of Sphenostylis stenocarpa and Psophocarpus tetragonolobus. Additionally, the analysis of biosynthetic gene clusters and secondary metabolites identified a diverse array of bioactive compounds in these legumes, including alkaloids, polyketides, and terpenes, with potential applications in pharmaceuticals, nutrition, and agriculture. Overall, our comparative genomic analysis provides valuable insights into the evolutionary history and genetic diversity of these legumes, with implications for crop improvement, breeding programs, and the discovery of new bioactive compounds. Future research should explore the functional implications of the identified gene families and bioactive compounds.

Funding declaration

No funding was received for this research.

Ethics approval and consent to participate

Ethical approval was not required for this study.

CRediT authorship contribution statement

Omena Bernard Ojuederie: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Project administration, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Ufuoma Lydia Akpojotor: Writing – review & editing, Writing – original draft. Adetomiwa Ayodele Adeniji: Writing – review & editing, Writing – original draft, Software, Methodology, Investigation, Formal analysis, Data curation. Tina Chukwuyem Ojuederie: Writing – review & editing, Writing – original draft, Data curation. Jacob Olagbenro Popoola: Writing – review & editing, Writing – original draft, Formal analysis. Olubukola Oluranti Babalola: Writing – review & editing, Writing – original draft, Resources, Project administration.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Footnotes

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.btre.2025.e00918.

Contributor Information

Omena Bernard Ojuederie, Email: ob.ojuederie@kingsuniversity.edu.ng.

Olubukola Oluranti Babalola, Email: olubukola.babalola@nwu.ac.za.

Appendix. Supplementary materials

Supplementary information

mmc1.doc (194.5KB, doc)

Data availability

The protein and genome FASTA sequences used for the four legumes are freely available in the NCBI database.

References

1. FAO. Executive summary: how to feed the world 2050. 2020. https://www.fao.org/fileadmin/templates/wsfs/docs/expert_paper/How_to_Feed_the_World_in_2050.pdf Retrieved 13th July 2024.
2. Afzal M., Alghamdi S.S., Migdadi H.H., Khan M.A., Nurmansyah M.S.B., El-Harty E. Legume genomics and transcriptomics: from classic breeding to modern technologies. Saudi J. Biol. Sci. 2020;27:543–555. doi: 10.1016/j.sjbs.2019.11.018.
3. Kumar B., Singh A.K., Bahuguna R.N., Pareek A., Singla-Pareek S. Orphan crops: a genetic treasure trove for hunting stress tolerance genes. Food Energy Secur. 2023;12:e436. doi: 10.1002/fes3.436.
4. Zhu T., Fonseca De Lima C.F., De Smet I. The heat is on: how crop growth, development, and yield respond to high temperature. J. Exp. Bot. 2021;72:7359–7373. doi: 10.1093/jxb/erab308.
5. Deutsch C.A., Tewksbury J.J., Tigchelaar M., Battisti D.S., Merrill S.C., Huey R.B., Naylor R.L. Increase in crop losses to insect pests in a warming climate. Science. 2018;361:916–919. doi: 10.1126/science.aat3466.
6. Talabi A.O., Vikram P., Thushar S., Rahman H., Ahmadzai H., Nhamo N., Shahid M., Singh R.K. Orphan Crops: a Best Fit for Dietary Enrichment and Diversification in Highly Deteriorated Marginal Environments. Front. Plant Sci. 2022;13 doi: 10.3389/fpls.2022.839704.
7. Ojuederie O.B., Igwe D.O., Ludidi N.N., Ikhajiagbe B. Editorial: neglected and underutilized crop species for sustainable food and nutritional security: prospects and hidden potential. Front. Plant Sci. 2024;14 doi: 10.3389/fpls.2023.1358220.
8. Ojuederie O.B., Balogun M.O. African yam bean (Sphenostylis stenocarpa) tubers for nutritional security. J. Under. Legumes. 2019;1(1):56–68.
9. Popoola J.O., Ojuederie O.B., Aworunse O.S., Adelekan A., Oyelakin A.S., Oyesola O.L., Akinduti P.A., Dahunsi S.A., Adegboyega T.T., Oranusi S.O., et al. Nutritional, functional, and bioactive properties of African underutilized legumes. Front. Plant Sci. 2023;14 doi: 10.3389/fpls.2023.1105364.
10. Fredenberg E., Karl K., Passarelli S., Porciello J., Rattehalli V., Auguston A., Kozlowski N. Vision for Adapted Crops and Soils (VACS) research in action: opportunity crops for Africa. 2024. doi: 10.7916/3hd1-8t86.
11. Babalola O.O., editor. Food Security and Safety: African Perspectives. Springer; Cham: 2021. p. 907.
12. Ojuederie O.B., Popoola J.O., Aremu C., Babalola O.O. In: Food Security and Safety: African Perspectives. Babalola O.O., editor. Springer; Cham: 2021. Harnessing the Hidden Treasures in African Yam Bean (Sphenostylis stenocarpa), an Underutilised Grain Legume with Food Security Potentials; pp. 1–20.
13. Nnamani C.V., Ajayi S.A., Oselebe H.O., Atkinson C.J., Igboabuchi A.N., Ezigbo E.C. Sphenostylis stenocarpa (ex. A. Rich.) Harms, a fading genetic resource in a changing climate: prerequisite for conservation and sustainability. Plants. 2017;6(3):30. doi: 10.3390/plants6030030.
14. Osuagwu A.N., Edem U.L., Kalu S.E. Evaluation of genetic diversity in Aerial Yam (Dioscorea bulbifera L) Qualitative and Quantitative Morphological Traits. Glob Sci J. 2019;7(11):368–390.
15. Osuagwu A.N., Edem U. Evaluation of genetic diversity in Aerial Yam (Dioscorea bulbifera L) Using Simple Sequence Repeats (SSR) Markers. Agrotechnol. 2020;9(5):202. doi: 10.35248/2168-9881.20.9.202.
16. Ojuederie T.C. M.Sc. Thesis. University of Ibadan; Ibadan, Nigeria: 2021. 45 pp.
17. Osuagwu A.N., Edem U.L., Bassey O.P., Ije E.C. Evaluation of rbcL Gene in African Yam Bean (Sphenostylis stenocarpa Hochst Ex. A Rich Harms) and Related Legumes Using in Silico Approach. J Proteomics Bioinform. 2024;17:66.
18. Shitta N.S., Unachukwu N., Edemodu A.C., Abebe A.T., Oselebe H.O., Abtew W.G. Genetic diversity and population structure of an African yam bean (Sphenostylis stenocarpa) collection from IITA GenBank. Sci Rep. 2022;12(1):4437. doi: 10.1038/s41598-022-08271-4.
19. Chang Y., Liu H., Liu M., Liao X., Sahu S.K., Fu Y., Song B., et al. The draft genomes of five agriculturally important African orphan crops. GigaScience. 2019;8(3):giy152. doi: 10.1093/gigascience/giy152.
20. Lonardi S., Muñoz-Amatriaín M., Liang Q., Shu S., Wanamaker S.L., Lo S., Tanskanen J., Schulman A.H., Zhu T., Luo M.C., et al. The genome of cowpea (Vigna unguiculata [L.] Walp.). Plant J. 2019;98:767–782. doi: 10.1111/tpj.14349.
21. Waweru B., Njaci I., Murungi E., Paliwal R., Muli C., Maranga M., Kaimenyi D., Lyimo B., Nigussie H., Ahadi B.B., et al. Chromosome-scale assembly of the African yam bean genome. bioRxiv. 2023 doi: 10.1101/2023.10.31.564964.
22. Ho W.K., Tanzi A.S., Sang F., Tsoutsoura N., Shah C., Moore N., Bhosale R., Wright V., Massawe F., Mayes S. A genomic toolkit for winged bean Psophocarpus tetragonolobus. Nat. Commun. 2024;15(1):1901. doi: 10.1038/s41467-024-45048-x.
23. Khanbo S., Phadphon P., Naktang C., Sangsrakru D., Waiyamitra P., Narong N., Somta P. A chromosome-scale genome assembly of mungbean (Vigna radiata). PeerJ. 2024;12:e18771. doi: 10.7717/peerj.18771.
24. Kautsar S.A., Suarez Duran H.G., Blin K., Osbourn A., Medema M.H. plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters. Nucleic Acids Res. 2017;45(W1):W55–W63. doi: 10.1093/nar/gkx305.
25. RCSB Protein Data Bank (RCSB.org): delivery of experimentally-determined PDB structures alongside one million computed structure models of proteins from artificial intelligence/machine learning. Nucleic Acids Res. 2023;51(D1):D488–D508. doi: 10.1093/nar/gkac1077.
26. Feuillet C., Keller B. Comparative Genomics in the Grass Family: molecular Characterization of Grass Genome Structure and Evolution. Ann. Bot. 2002;89(1):3–10. doi: 10.1093/AOB/MCF008.
27. Burgess-Herbert S.L., Euling S.Y. Use of comparative genomics approaches to characterize interspecies differences in response to environmental chemicals: challenges, opportunities, and research needs. Toxicol. Appl. Pharmacol. 2013;271(3):372–385. doi: 10.1016/j.taap.2011.11.011.
28. Abdelsalam N.R., Hasan M.E., Javed T., Rabie S.M.A., El-Din H., El-Wakeel M.F., Zaitou A.F., Abdelsalam A.Z., Aly H.M., et al. Endorsement and phylogenetic analysis of some Fabaceae plants based on DNA barcoding. Mol. Biol. Rep. 2022;49:5645–5657. doi: 10.1007/s11033-022-07574-z.
29. Harms H. Leguminosae Africanae II. Bot. Jahrb. 1899;26:308–310.
30. Adewale B.D., Odoh C.N. A Review on Genetic Resources, Diversity and Agronomy of African Yam Bean (Sphenostylis stenocarpa (Hochst. Ex A. Rich.) Harms): a Potential Future Food Crop. Sustain. Agric. Res. 2013;2(1) doi: 10.22004/ag.econ.231332.
31. Forrest M.E., Pinkard O., Martin S., Sweet T.J., Hanson G., Coller J. Codon and amino acid content are associated with mRNA stability in mammalian cells. PLoS One. 2020;15(2) doi: 10.1371/journal.pone.0228730.
32. Kino K., Kawada T., Hirao-Suzuki M., Morikawa M., Miyazawa H. Products of Oxidative Guanine Damage Form Base Pairs with Guanine. Int. J. Mol. Sci. 2020;21(20):7645. doi: 10.3390/ijms21207645.
33. Lee C.S.K., Weiß M., Hamperl S. Where and when to start: regulating DNA replication origin activity in eukaryotic genomes. Nucleus. 2023;14(1) doi: 10.1080/19491034.2023.2229642.
34. Edem U., Osuagwu N.A. Evaluation of Secondary and Tertiary Protein Structure Variation in rbcl Gene of Selected Legumes using Computational Approach. J. Proteomics Bioinform. 2023;16 doi: 10.35248/0974-276X.23.16.642.
35. Gachon C.M., Langlois-Meurinne M., Saindrenan P. Plant secondary metabolism glycosyltransferases: the emerging functional analysis. Trends Plant Sci. 2005;10(11):542–549. doi: 10.1016/j.tplants.2005.09.007.
36. Keegstra K., Raikhel N. Plant glycosyltransferases. Curr. Opin. Plant Biol. 2001;4(3):219–224. doi: 10.1016/S1369-5266(00)00164-3.
37. Shorinola O., Marks R., Emmrich P., Jones C., Odeny D., Chapman M.A. Integrative and inclusive genomics to promote the use of underutilized crops. Nat. Commun. 2024;15:320. doi: 10.1038/s41467-023-44535-x.
38. Moura A., Savageau M.A., Alves R. Relative Amino Acid Composition Signatures of Organisms and Environments. PLoS One. 2013;8(10) doi: 10.1371/journal.pone.0077319.
39. George T.T., Obilana A.O., Oyeyinka S.A. The prospects of African yam bean: past and future importance. Heliyon. 2020;6(11) doi: 10.1016/j.heliyon.2020.e05458.
40. James S., Nwabueze T.U., Onwuka G.L., Ndife J., Usman M.A. Chemical and nutritional composition of some selected lesser known legumes indigenous to Nigeria. Heliyon. 2020;6(11) doi: 10.1016/j.heliyon.2020.e05497.
41. Odeku O.A., Ogunniyi Q.A., Ogbole O.O., Fettke J. Forgotten Gems: exploring the Untapped Benefits of Underutilized Legumes in Agriculture, Nutrition, and Environmental Sustainability. Plants. 2024;13(9):1208. doi: 10.3390/plants13091208.
42. Koprivova A., Kopriva S. Plant Secondary Metabolites Altering Root Microbiome Composition and Function. Curr. Opin. Plant Biol. 2022;67 doi: 10.1016/j.pbi.2022.102227.
43. El-Ramady H., Hajdú P., Törős G., Badgar K., Llanaj X., Kiss A., Abdalla N., Omara A.A., Elsakhawy T., Elbasiouny H., et al. Plant Nutrition for Human Health: a Pictorial Review on Plant Bioactive Compounds for Sustainable Agriculture. Sustainability. 2022;14:8329. doi: 10.3390/su14148329.
44. Twaij B.M., Hasan M.N. Bioactive Secondary Metabolites from Plant Sources: types, Synthesis, and Their Therapeutic Uses. Int. J. Plant Biol. 2022;13:4–14. doi: 10.3390/ijpb13010003.
45. Heinrich M., Mah J., Amirkia V. Alkaloids used as medicines: structural phytochemistry meets biodiversity—An Update and Forward Look. Molecules. 2021;26(7):1836. doi: 10.3390/molecules26071836.
46. Yeshi K., Crayn D., Ritmejerytė E., Wangchuk P. Plant Secondary Metabolites Produced in Response to Abiotic Stresses Has Potential Application in Pharmaceutical Product Development. Molecules. 2022;27:313. doi: 10.3390/molecules27010313.
47. Nyananyo B.L., Nyingifa A.L. Phytochemical investigation on the seed of Sphenostylis stenocarpa (Hochst ex A. Rich.) Harms (Family Fabaceae). J. Appl. Sci. Environ. Manage. 2011;15(3):419–423.
48. Vijayakumar V. In: Sustainable Agriculture Reviews. Guleria P., Kumar V., Lichtfouse E., editors. Vol. 51. Springer; Cham: 2021. Nutraceutical legumes: a brief review on the nutritional and medicinal values.
49. Soetan K., Adeola A. Comparative nutritional and functional properties of selected underutilized legumes. Nig. J. Anim. Product. 2018;45:96–106.
50. Hou D., Yousaf L., Xue Y., Hu J., Wu J., Hu X., Feng N., Shen Q. Mung Bean (Vigna radiata L.): bioactive Polyphenols, Polysaccharides, Peptides, and Health Benefits. Nutrients. 2019;11:1238. doi: 10.3390/nu11061238.
51. Lai F., Wen Q., Li L., Wu H., Li X. Antioxidant activities of water-soluble polysaccharide extracted from mung bean (Vigna radiata L.) hull with ultrasonic assisted treatment. Carbohydr. Polym. 2010;81:323–329. doi: 10.1016/j.carbpol.2010.02.011.
52. Zhong K., Lin W., Wang Q., Zhou S. Extraction and radicals scavenging activity of polysaccharides with microwave extraction from mung bean hulls. Int. J. Biol. Macromol. 2012;51:612–617. doi: 10.1016/j.ijbiomac.2012.06.032.
53. Yao Y., Zhu Y., Ren G. Antioxidant and immunoregulatory activity of alkali-extractable polysaccharides from mung bean. Int. J. Biol. Macromol. 2016;84:289–294. doi: 10.1016/j.ijbiomac.2015.12.045.


Articles from Biotechnology Reports are provided here courtesy of Elsevier
