Computational design of an mRNA vaccine targeting antifungal-resistant Lomentospora prolificans

Muhammad Bilal Iqbal Rehmani; Fizza Arshad; Muhammad Umer Khan; Hasan Ejaz; Umar Nishan; Amal Alotaibi; Riaz Ullah; Ke Chen; Suvash Chandra Ojha; Mohibullah Shah

doi:10.1038/s41598-025-14907-y

2025 Oct 1;15:34157.

PMCID: PMC12489060 PMID: 41034328

Abstract

Lomentospora prolificans is an emerging opportunistic pathogen that predominantly affects immunocompromised individuals but can also infect healthy individuals, often leading to disseminated disease with high mortality rates. Effective treatment is challenging due to its high intrinsic resistance to antifungal agents. To address this, we employed subtractive proteomics and reverse vaccinology approaches to identify potential antigenic proteins for the design of an mRNA-based multi-epitope vaccine (MEV). Our study identified four antigenic proteins as promising vaccine targets. A vaccine construct was developed using a combination of twelve cytotoxic T lymphocyte (CTL), nine helper T lymphocyte (HTL), and five linear B lymphocyte (LBL) epitopes. These epitopes were connected using appropriate linkers (AAY, GPGPG, and KK) and adjuvants to enhance antigenicity and immunogenicity. The vaccine construct was rigorously evaluated for its physicochemical properties, demonstrating high antigenicity, non-toxicity, non-allergenicity, stability, and solubility. Molecular docking studies were conducted to validate the interactions between the vaccine construct and human Toll-like receptor 4 (TLR4). Immune simulation studies further supported the vaccine’s potential to elicit a robust immune response. Additionally, molecular dynamics (MD) simulations, principal component analysis (PCA), dynamic cross-correlation matrix (DCCM) analysis, and binding free energy calculations were performed to assess the stability of the vaccine-receptor complex. Codon optimization and in-silico cloning were carried out to ensure efficient expression of the vaccine in Escherichia coli strain K12. The findings of this study suggest that the proposed vaccine construct holds significant promise as a novel mRNA-based therapeutic candidate against L. prolificans infections. Further experimental validation is recommended to advance this vaccine toward clinical application.

Keywords: Antifungal, Infections, Fatal, In-silico vaccine, Bioinformatics

Subject terms: Computational biology and bioinformatics, Immunology

Introduction

Lomentospora prolificans is an emerging opportunistic pathogen that predominantly infects immunocompromised individuals, often resulting in disseminated disease with high mortality rates. However, it can also cause infections in healthy populations¹. Effective treatment is challenging due to the pathogen’s high intrinsic resistance to antifungal agents. Successful management relies on three key pillars: rapid and accessible diagnostic methods, aggressive surgical debridement when applicable, and prompt administration of effective antifungal therapy¹. Sewage, contaminated waterways, and soil contain the environmental fungus L. prolificans. It is also spread by inhalation of spores and direct inoculation². Disseminated infection following a lung transplant and myeloblastic leukemia has also been linked to L. prolificans-induced morbidity. It was reported to be the source of bone infection after trauma and ocular infection during a lawn mower mishap in healthy people³. L. prolificans primarily affects individuals with weakened immune systems, causing a range of severe diseases, including pulmonary and disseminated infection.

This fungus was first identified by Malloch and Salkin in 1984 from an immunocompetent patient with osteomyelitis, marking its initial recognition as a human pathogen⁴. This was the first reported instance of a nosocomial outbreak of L. prolificans infection, occurring during hospital renovation activities. At that time, patients were treated in a temporary facility that lacked standard protective measures for granulocytopenic individuals. They were kept in solitary rooms with restricted access and required masks, wedges, and dressing gowns. This zone lacked air conditioning, and potted plants were isolated from the renovation zone. The four incidents happened in two adjacent rooms occupied by four patients over 28 days⁵. The role of L. prolificans in invasive fungal infections is becoming more widely acknowledged in regions like Australia, the US, and some countries of Europe.

Antifungal medications have not been linked to a reduced chance of mortality⁶. Because L. prolificans is resistant to various antifungal drugs, it causes infections that are difficult to treat and usually have adverse outcomes. Early identification and vigorous care are crucial because of the high fatality rate, which is particularly severe in cases of disseminated infection or delayed treatment. Current therapy guidelines advise amphotericin B-based formulations (AmB) along with surgery, but AmB frequently fails against L. prolificans⁷. The major chronic side effect of AmB is nephrotoxicity, and AmB likely causes kidney damage through a variety of mechanisms⁸. A combination of voriconazole and terbinafine can control widespread L. prolificans infections, but certain strains are resistant to this combination, and the therapy carries a risk of side effects such as liver toxicity and visual disturbances; an overall mortality rate of 87.3% has been reported⁴.

Current research focuses on developing effective treatments with minimal adverse effects. Because these diseases are complex, this requires a comprehensive understanding of their pathogenesis and prognosis. New sequencing methods have proliferated recently, allowing scientists to make significant advances in vaccine development. In recent years, subtractive proteomics and immunoinformatics approaches have gained popularity as effective and cost-efficient strategies for designing vaccines against a wide range of infectious diseases^9,10.

Therefore, to combat infections caused by L. prolificans, the current study employed computer-aided approaches to design a multi-epitope mRNA vaccine based on immunogenic B- and T-cell epitopes, aiming to elicit both humoral (e.g., B-cell activation and antibody production) and cellular (e.g., T-helper cell and cytotoxic T-cell activation) immune responses. The interaction of the proposed vaccine with TLR4 was assessed through molecular docking, MD simulations, PCA, DCCM, and binding free energy analysis. Using in-silico cloning, the vaccine was reverse-translated and expressed in E. coli (K12 strain). This study aims to provide an efficient vaccine candidate against L. prolificans, built from a unique combination of the fungus’s antigenic peptides using immunoinformatic approaches, for further experiments¹⁰. No mRNA vaccine targeting a fungal pathogen (e.g., Candida albicans) is commercially available. Experimental mRNA vaccine candidates against Candida have mostly targeted single antigens and have shown protective immune responses in preclinical models. However, there is little information on long-term immunity, and these vaccines have not yet been clinically approved¹¹. In contrast, our multi-epitope mRNA vaccine design includes immunodominant epitopes that are expected to stimulate strong humoral (IgM, IgG1, IgG2) and cellular (CD4⁺ and CD8⁺ T-cell) responses. Simulation results show improved antigen clearance, memory formation, and sustained immunoglobulin and T-cell activity after booster doses. The in-silico approach has limitations: it assumes linear epitope processing and does not account for mRNA kinetics such as lipid nanoparticle transport or intracellular antigen processing. Nevertheless, this technique of developing mRNA vaccines based on predicted linear epitopes has been used successfully in prior studies¹². Experimental validation is still required to confirm the efficacy of the constructed vaccine.

Materials and methods

The following workflow (Fig. 1) depicts the subtractive proteomics and reverse vaccinology technique to develop mRNA vaccines against L. prolificans infections.

Proteome retrieval

The reference proteome of L. prolificans (Proteome ID: UP000233524) was retrieved from the UniProtKB database. The proteome underwent subtractive proteomics analysis to identify promising candidates for the construction of a vaccine against L. prolificans.

Human non-homologous protein identification

To minimize the risk of autoimmune reactions, vaccine antigens should not resemble human proteins. The entire proteome of L. prolificans was therefore compared with the human proteome using BLASTp, and proteins were retained as non-homologous under previously described cutoff thresholds^13,14: pident < 35%, query coverage < 35%, bitscore < 100, and e-value > 1e-4.
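The filtering step can be illustrated with a minimal Python sketch. It assumes BLAST+ tabular output with the columns `qseqid sseqid pident qcovs bitscore evalue`, and it assumes that a hit counts as homologous only when it passes all four cutoffs at once; the text does not state how the thresholds were combined, so that rule, the function names, and the example data are hypothetical.

```python
import csv
import io

# Thresholds quoted in the text; combining them with AND is an assumption.
PIDENT, QCOV, BITSCORE, EVALUE = 35.0, 35.0, 100.0, 1e-4


def is_significant(pident, qcov, bitscore, evalue):
    """A hit is treated as homologous only if it passes every cutoff."""
    return (pident >= PIDENT and qcov >= QCOV
            and bitscore >= BITSCORE and evalue <= EVALUE)


def non_homologous(all_queries, blast_tsv):
    """Return query IDs that have no significant hit in tabular BLAST output."""
    homologous = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        qseqid, _sseqid, pident, qcov, bitscore, evalue = row
        if is_significant(float(pident), float(qcov),
                          float(bitscore), float(evalue)):
            homologous.add(qseqid)
    # Queries with no hit at all never enter the set, so they are kept.
    return [q for q in all_queries if q not in homologous]


# Hypothetical example: protA has a strong human hit, protB a weak one,
# and protC has no hit at all.
example = (
    "protA\thumanX\t62.1\t90\t310\t1e-80\n"
    "protB\thumanY\t28.4\t20\t41.2\t0.5\n"
)
kept = non_homologous(["protA", "protB", "protC"], example)
print(kept)
```

Inverting the final test (keeping queries that are in the `homologous` set) gives the complementary filter used when homologs are wanted, as in searches against essential-gene or virulence databases.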

Identification of gut non-homologous proteins

Numerous beneficial bacteria in the host body protect it against external contaminants¹⁵. The relationship between humans and gut flora is mutualistic and symbiotic rather than merely commensal. If a vaccine target resembles proteins of this microbiota, those proteins may be blocked or inhibited inadvertently, which could harm the host. The human non-homologous proteins were therefore compared by BLAST against the NCBI gut metagenome to obtain gut non-homologous proteins, using the same previously defined cutoffs^14,16,17: bitscore < 100, e-value > 1e-4, pident < 35%, and query coverage < 35%.

Identification of essential and virulent proteins

Essential proteins or genes play critical roles in pathogen survival and virulence¹⁸. Because they are conserved, they carry a lower risk of resistance, and they may elicit strong and long-lasting immune responses. These factors make essential proteins crucial to vaccine development. BLASTp was used to align the proteins prioritized in the previous step against the Database of Essential Genes (DEG)^19,20, with filtering criteria of bitscore > 100, query coverage > 35%, pident > 35%, and e-value < 1e-4. DEG is a regularly updated database of essential proteins, including those found in eukaryotes. Virulent proteins are also important when prioritizing vaccine candidates because of their roles in pathogenesis and host invasion. The Pathogen–Host Interactions Database (PHI-base) provides information about virulence factors from diverse pathogens²¹. To identify virulent proteins of L. prolificans, the gut non-homologous proteins were subjected to BLASTp against VFDB²² using cutoffs of pident > 35%, query coverage > 35%, bitscore > 100, and e-value < 1e-4.

Determining protein subcellular localization

In bioinformatics, the subcellular localization of proteins is indispensable for understanding biological functions, predicting protein function, and investigating disease mechanisms, because proteins perform functions according to their location. Extracellular and outer membrane proteins were prioritized as vaccine candidates, while cytoplasmic and mitochondrial proteins were considered putative drug targets. The Euk-mPLoc 2.0, CELLO v.2.5, and WoLF PSORT servers were used to identify the subcellular location of the target proteins. CELLO v.2.5, a multi-class support vector machine (SVM) classification system, was used to predict the location of fungal proteins in a cell. Euk-mPLoc 2.0²³ is a more powerful and adaptable technique for determining the subcellular localization of eukaryotic proteins²⁴. WoLF PSORT categorizes proteins into more than ten localization sites, including proteins that move between the cytosol and nucleus²⁵.

Screening of potential vaccine targets

The prioritized vaccine proteins were then submitted to the VaxiJen v2.0 server²⁶ for antigenicity prediction, an important step because antigenic proteins can generate an immune response. The AllerTOP v2.0 server was used to determine the allergenic profile of these proteins²⁷. Additionally, the TMHMM server²⁸ was used to verify that each protein had a topology value (number of predicted transmembrane helices) of 0 or 1. The ProtParam tool of the Expasy server was used to examine their physicochemical properties, such as molecular weight, GRAVY value, theoretical pI, number of amino acids, and stability²⁹.

MHC-I and MHC-II epitope prediction

Epitopes are specific regions of antigenic protein molecules that can attach to receptors on immune cells and trigger an immunological reaction. MHC I molecules, which are present on all nucleated cells, are crucial for presenting peptide antigens to cytotoxic T lymphocytes. The MHC I molecules recognized by cytotoxic T cells present peptides from intracellular and extracellular sources, triggering an immune response that destroys the presenting cell. In adaptive immunity, HTL cells are essential because they help cytotoxic T lymphocytes (CTLs) destroy infected cells and help B cells produce and release antibodies. After helper T cells are activated by antigen presented on antigen-presenting cells (APCs), they act as effectors and develop into the particular subset needed for a targeted immune response³⁰. The full HLA reference allele set was used so that the final epitopes would cover the HLA alleles present worldwide, and protein sequences were used as the input for the IEDB-recommended³¹ method of MHC-I and MHC-II epitope prediction. The Immune Epitope Database (IEDB) is a well-established and widely used manually curated database of experimentally characterized immune epitopes. IEDB’s prediction algorithms are trained and benchmarked on large sets of experimentally validated binders and non-binders, with negative control data incorporated during development and validation³². The resulting epitopes were ranked by percentile rank, with a cutoff of < 0.5%.
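The percentile-rank selection can be sketched as follows. This is only an illustration of the cutoff stated above: the record layout, the peptide strings, and the rank values are hypothetical and do not come from the study.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    peptide: str
    allele: str
    percentile_rank: float  # lower rank means stronger predicted binding


def select_binders(predictions, cutoff=0.5):
    """Keep predictions with rank below the cutoff, strongest binders first."""
    kept = [p for p in predictions if p.percentile_rank < cutoff]
    return sorted(kept, key=lambda p: p.percentile_rank)


# Hypothetical predictions for three placeholder 9-mers.
predictions = [
    Prediction("AAAWWWFFF", "HLA-A*02:01", 0.40),
    Prediction("KKKLLLVVV", "HLA-B*07:02", 1.20),
    Prediction("SSSTTTYYY", "HLA-A*01:01", 0.05),
]
selected = select_binders(predictions)
for p in selected:
    print(p.peptide, p.allele, p.percentile_rank)
```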

Prediction of linear B-cell epitopes

B-cell epitopes are essential for both the development of vaccines and the induction of an adaptive immune response. A bioinformatics tool called Bepipred Linear Epitope Prediction 2.0 was utilized to predict antigenic epitopes, or antibody-binding sites, within protein sequences. Its main goal was to determine the linear B-cell epitopes, or regions of proteins that antibodies can recognize.

Analysis of determined MHC-I, MHC-II, and B-cell epitopes

Epitopes used in vaccine development must be antigenic, non-toxic, soluble, and non-allergenic. VaxiJen v2.0 was used to determine the antigenicity score of all prioritized T-cell and B-cell epitopes, and AllerTOP v2.0 was used to predict their allergenicity. The solubility and toxicity of these epitopes were examined using the INNOVAGEN³³ and ToxinPred³⁴ servers, respectively.

Multi-epitope vaccine construction

To create the vaccine construct, linkers were used to join the selected HTL, CTL, and LBL epitopes³⁵. Specifically, the AAY, GPGPG, and KK peptide linkers were used to connect the putative CTL, HTL, and LBL epitopes, respectively, into a peptide vaccine³⁶. Linkers are short amino acid sequences that connect adjacent epitopes and adjuvants, enhance immune responses, and restrict epitope folding³⁰. The EAAAK linker was used to attach the adjuvant, beta-defensin, to the B- and T-cell epitopes.
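The assembly logic can be sketched in a few lines of Python. The block order used here (adjuvant, EAAAK, CTL block, HTL block, LBL block) and the choice of linker between blocks are assumptions made for illustration, and all sequences are short placeholders rather than the epitopes selected in this study.

```python
def build_construct(adjuvant, ctl, htl, lbl):
    """Join an adjuvant and three epitope groups with the linkers named in the text.

    Assumed layout: adjuvant-EAAAK-[CTL joined by AAY]-GPGPG-
    [HTL joined by GPGPG]-KK-[LBL joined by KK].
    """
    blocks = [
        adjuvant,
        "EAAAK",             # linker between adjuvant and first epitope
        "AAY".join(ctl),     # CTL epitopes
        "GPGPG",             # assumed linker between the CTL and HTL blocks
        "GPGPG".join(htl),   # HTL epitopes
        "KK",                # assumed linker between the HTL and LBL blocks
        "KK".join(lbl),      # LBL epitopes
    ]
    return "".join(blocks)


# Placeholder sequences, not real epitopes.
construct = build_construct(
    adjuvant="MMM",
    ctl=["WWW", "FFF"],
    htl=["HHH", "NNN"],
    lbl=["SSS", "TTT"],
)
print(construct)
print(len(construct))
```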

mRNA vaccine design

From the N-terminus to the C-terminus, the final vaccine construct begins with a tPA signal peptide, which facilitates the vaccine’s exit from the cell. The design should also include a suitable Kozak sequence encompassing the start codon³⁷. The 5′ cap and 5′ untranslated region (UTR) were added at the 5′ end of the vaccine sequence to stabilize the transcript, while the 3′ UTR and poly-A tail were placed at the 3′ end. Additionally, the Mfold³⁸ and RNAfold³⁹ servers were used to predict the mRNA secondary structure of the vaccine based on its minimum free energy. In general, an mRNA structure with a lower free energy is considered more stable⁴⁰.

Analysis of designed vaccine

The designed vaccine construct was examined for its antigenicity, topology, solubility, allergenicity, and physicochemical characteristics. VaxiJen v2.0 and ANTIGENpro⁴¹ were used to determine the vaccine’s antigenicity. The TMHMM server and AllerTOP v2.0 assessed the topology and allergenicity, respectively. The ProtParam tool of the Expasy server assessed physicochemical properties of the designed vaccine construct, such as its aliphatic index (AI), molecular weight (MW), theoretical isoelectric point (pI), instability index, and grand average of hydropathicity (GRAVY). The SOLpro tool and the INNOVAGEN server were used to predict the vaccine’s solubility.
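Two of these physicochemical properties have simple closed forms and can be sketched directly. The example below uses the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula, AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), with X in mole percent. It is an independent sketch, not the ProtParam implementation, and the input sequence is a placeholder.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Placeholder sequence, not the vaccine construct.
sequence = "AVILKKGPGPG"
print(round(gravy(sequence), 3), round(aliphatic_index(sequence), 2))
```

A negative GRAVY value indicates an overall hydrophilic sequence, and a higher aliphatic index is generally read as higher thermostability.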

Prediction of 2D and 3D structure

The PDBsum and SOPMA servers were used to predict the secondary (2D) structure of the designed vaccine^42,43. These self-optimized prediction techniques evaluate features including beta sheets, random coils, and bends. A 3D model of the vaccine structure was generated using the trRosetta server⁴⁴. PROCHECK, an online resource that describes the stereochemical characteristics of a protein, was used to evaluate the proportion of residues in the most favored regions of the Ramachandran plot, in which the phi/psi angles were examined to gain a thorough understanding of the protein backbone. The SAVES v6.1 server was used to obtain an ERRAT quality factor. The ERRAT plot serves as a verification procedure for protein structures based on non-bonded interactions and is useful for tracking the progress of crystallographic model building; a higher ERRAT score indicates a better-quality structure. MolProbity was also used to further validate the vaccine’s structure⁴⁵. MolProbity examines the molecule’s overall stereochemistry and can detect potential steric clashes or other structural concerns⁴⁶. Furthermore, the 3D model was evaluated using the SWISS-MODEL Structure Assessment tool, which validates the vaccine design using the QMEAN score⁴⁷. The QMEAN value provides a composite quality estimate that includes both global and local analysis of the model⁴⁸.

Post-translational modification analysis, globular regions, and proteasomal cleavage analysis

Post-translational modifications (PTMs) of vaccine candidates, peptides, and other molecules are crucial for improving vaccine immunogenicity⁴⁹. A PTM influences the entire protein structure and function in a variety of biological processes, including transcription, translation, apoptosis, cell signaling, and replication. Protein phosphorylation and glycosylation are considered among the most common and important PTMs because they activate or deactivate different enzymes and receptors. The NetNGlyc 1.0⁵⁰ and NetPhos 3.1⁵¹ servers were used to predict N-glycosylation and phosphorylation sites, respectively. The GlobPlot 2.3 server was used to determine the globular and disordered regions of the vaccine⁵². Disordered regions in proteins frequently contain short linear peptide motifs that are critical for protein activity⁵³. Moreover, proteasomal cleavage analysis was conducted to assess the vaccine’s capacity to activate cytotoxic T-cells. This analysis was performed with the Proteasome Cleavage Prediction Server of the Immunomedicine group (http://imed.med.ucm.es/Tools/pcps/).

Aggrescan3D and CABS-flex analysis

When designing vaccines, achieving structural stability is essential. Aggrescan3D was used to examine aggregation-prone regions, followed by coarse-grained simulation of the flexibility of the vaccine model using CABS-flex. The simulation used 50 cycles and an RNG seed of 7494. The global side-chain and global C-alpha restraint weights were set at 1.0⁵⁴.

Conformational B-cell epitopes prediction

The ElliPro⁵⁵ tool from the IEDB server was utilized to determine the conformational B-cell epitopes of the vaccine. A conformational or discontinuous B-cell epitope is the arrangement of amino acids or subunits that make up an antigen and can bind directly with an immune system receptor⁵⁶. ElliPro assigns a protrusion index (PI) score to each epitope that is anticipated. The PI scores of residues serve as the criterion for recognizing conformational B-cell epitopes.

Population coverage analysis

HLA allele dispersion and expression differ by ethnicity and geography around the world, influencing the successful production of vaccines. The population coverage was estimated using the IEDB population coverage tool, which took into account specified MHC I and MHC II epitopes, as well as matching HLA-binding alleles. Based on the distribution of human MHC-binding alleles, this tool forecasts the population coverage of each epitope for various locations around the world⁵⁷.

Prediction of binding pockets

The CASTp server⁵⁸ was used to determine the active sites or binding pockets of proteins. The CASTp offers accurate, comprehensive, and quantitative data on the topographical features of a protein. Active pockets on protein surfaces and the inside of three-dimensional structures were determined, which is crucial for forecasting the protein regions that engage in ligand-protein interactions.

Molecular docking between vaccine and receptor

A molecular docking study was carried out using the ClusPro 2.0 server⁵⁹ to determine the binding affinity between the developed vaccine (as the ligand) and TLR4 (PDB ID: 4G8A) as the receptor. Toll-like receptor 4 is a key lipopolysaccharide (LPS) receptor that recognizes pathogen infections and activates inflammatory responses, and it plays a vital role in the innate immune response. ClusPro, a tool for protein-protein docking, combines three sequential phases to yield binding affinity: rigid-body docking, clustering of the lowest-energy structures, and structural refinement by energy minimization. The PyMOL software was used to visualize the interactions between the vaccine and the receptor.

Normal mode evaluation

The dynamic motion of the docked complex was further investigated utilizing the iMOD server⁶⁰. For the normal mode simulation study, the docked complex PDB file with the lowest binding energy score was used. This tool utilizes deformability, covariance, B factor, eigenvalue, variance, and elastic network to forecast the range and direction of fundamental protein-protein complex motions.

MD simulation analysis

The Assisted Model Building with Energy Refinement suite v20 (AMBER20) (https://ambermd.org/) was used to conduct the MD simulation analysis as performed previously^61,62. Missing receptor atoms were added using AMBER20’s LEaP program. Each complex was solvated in TIP4P water in a truncated octahedral box with a border distance of 10.0 Å⁶³. Before the MD simulations were run, energy minimization was performed on each system to remove any steric clashes that could have arisen during system setup. Initially, all counterions (Na⁺ or Cl⁻) and water molecules were minimized while the ligand and protein were held fixed with a restraint potential of 500 kcal/(mol·Å²). The protein’s potential was determined using the AMBER20 force field⁶⁴, whereas the ligand’s potential was determined using the general AMBER force field (GAFF)⁶⁵. Protein side chains were then relaxed while backbone heavy atoms were restrained with a force constant of 5 kcal/(mol·Å²). Lastly, the complete system was minimized with no restraints. In each phase, structural optimization was performed using 2500 steps of steepest descent and 5000 steps of the conjugate gradient technique. After energy minimization, each system was gradually heated from 0 to 300 K over 200 ns in the NVT ensemble, with a force constant of 10 kcal/(mol·Å²) applied to the protein-ligand complex. The system then went through seven rounds of equilibration at 300 K, each lasting 1 ns, with decreasing restraint weights of 5, 3, 1, 0.5, 0.3, 0.1, and 0 kcal/(mol·Å²) and no restraints on the solvent. Finally, each system underwent a 200 ns MD production run in the NPT ensemble at 300 K and 1.0 atm pressure⁶⁶. In addition, a 100 ns MD simulation was performed with each complex solvated in TIP3P water, for comparison with the extended 200 ns simulation in the TIP4P model.

MD trajectories

The simulated trajectories were used to compute the solvent-accessible surface area (SASA), radius of gyration (Rg), root mean square deviation (RMSD), and root mean square fluctuation (RMSF)⁶⁷. The stability of each system was assessed using the RMSD of the TLR4 backbone atoms and vaccine chains from their initial equilibrated positions. The fluctuations of the side-chain atoms of TLR4 and the vaccine chains were investigated using RMSF. The system’s compactness during the MD simulation was examined using Rg. Furthermore, SASA was determined to gain insight into the solvent-accessible surface shared between TLR4 and the vaccine⁶⁸.
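The definitions behind three of these metrics can be made concrete with a small dependency-free Python sketch. It assumes that frames have already been superimposed on the reference and treats all atoms as having equal mass; production analyses would normally rely on a trajectory analysis package, and the coordinates below are toy values.

```python
import math


def rmsd(frame, ref):
    """Root mean square deviation between two frames of (x, y, z) tuples."""
    sq = sum((a - b) ** 2 for p, q in zip(frame, ref) for a, b in zip(p, q))
    return math.sqrt(sq / len(frame))


def radius_of_gyration(frame):
    """Unweighted radius of gyration about the geometric center."""
    n = len(frame)
    center = [sum(p[i] for p in frame) / n for i in range(3)]
    sq = sum((p[i] - center[i]) ** 2 for p in frame for i in range(3))
    return math.sqrt(sq / n)


def rmsf(traj):
    """Per-atom fluctuation about each atom's time-averaged position."""
    n_frames, n_atoms = len(traj), len(traj[0])
    result = []
    for j in range(n_atoms):
        mean = [sum(f[j][i] for f in traj) / n_frames for i in range(3)]
        sq = sum((f[j][i] - mean[i]) ** 2 for f in traj for i in range(3))
        result.append(math.sqrt(sq / n_frames))
    return result


# Toy trajectory: two atoms, two frames; the second frame is shifted by 1 in z.
traj = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)],
]
print([rmsd(f, traj[0]) for f in traj])
print([radius_of_gyration(f) for f in traj])
print(rmsf(traj))
```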

Binding free energy

To compute the system’s binding free energies, the MD simulation trajectories were analyzed with the AMBER20 MMPBSA program. MMPBSA estimates the overall binding free energy as the difference between the free energy of the complex (G_complex) and the sum of the free energies of the isolated protein (G_protein) and ligand (G_ligand), as given in Eq. 1:

$$\Delta G_{\mathrm{bind}} = G_{\mathrm{complex}} - \left(G_{\mathrm{protein}} + G_{\mathrm{ligand}}\right) \tag{1}$$

In Eq. (2), the solvation-free energy (ΔGsol) is calculated by adding the polar and nonpolar portions.

Compared to total binding free energy estimations, per-residue free energy breakdown can disclose each residue’s influence on the ligand. This allows for a more in-depth analysis of each ligand’s binding capability and selectivity. Equation (3) can be used for calculating it.

In the previous equation, ΔEvdW and ΔEele are van der Waals and electrostatic interactions estimated by AMBER20⁶⁹.
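For orientation, the decomposition commonly used in MM-PBSA calculations is sketched below. These are the standard textbook forms and are an assumption here: the article’s own numbered equations may differ, for example in whether the entropy term is included or in how the terms are labeled.

```latex
% Commonly used MM-PBSA decomposition (assumed form, not taken from the article)
\begin{align*}
\Delta G_{\mathrm{bind}} &= \Delta E_{\mathrm{MM}} + \Delta G_{\mathrm{sol}} - T\Delta S \\
\Delta E_{\mathrm{MM}} &= \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{ele}} \\
\Delta G_{\mathrm{sol}} &= \Delta G_{\mathrm{polar}} + \Delta G_{\mathrm{nonpolar}} \\
\Delta G_{\mathrm{residue}} &= \Delta E_{\mathrm{vdW}} + \Delta E_{\mathrm{ele}}
  + \Delta G_{\mathrm{polar}} + \Delta G_{\mathrm{nonpolar}}
\end{align*}
```

Here the last line is the per-residue contribution, obtained by restricting each energy term to the interactions of a single residue.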

PCA and free energy landscape analysis

Molecular dynamics (MD) simulations allow microscopic analysis of the structure and behavior of molecular systems. Principal component analysis (PCA) has become one of the most extensively used approaches for evaluating the mobility of proteins within this framework⁷⁰. PCA is a statistical approach used to reduce the number of dimensions needed to describe protein dynamics systematically. This is accomplished using a decomposition technique that orders the observed motions from the largest to the smallest spatial scales. PCA is a linear transformation that extracts the most significant components of the data from a covariance or correlation matrix. Two principal components (PC1 and PC2) were selected for examination and were then used to construct and examine the free energy landscape (FEL). In FEL plots, the deep valleys reflect the lowest energy states, and the boundaries between the deep valleys show intermediate conformations.
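A free energy landscape of this kind is often built by Boltzmann inversion of the two-dimensional population histogram, G = −k_B·T·ln(P/P_max). The sketch below assumes that this convention applies and that the per-frame projections onto PC1 and PC2 are already available; the bin count, temperature default, and input values are illustrative.

```python
import math
from collections import Counter

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_landscape(pc1, pc2, bins=10, temperature=300.0):
    """Map each populated (PC1, PC2) bin to a relative free energy in kcal/mol."""
    lo1, hi1 = min(pc1), max(pc1)
    lo2, hi2 = min(pc2), max(pc2)

    def bin_index(value, lo, hi):
        if hi == lo:
            return 0
        # Clamp so that the maximum value falls into the last bin.
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    counts = Counter(
        (bin_index(a, lo1, hi1), bin_index(b, lo2, hi2))
        for a, b in zip(pc1, pc2)
    )
    most_populated = max(counts.values())
    kt = K_B * temperature
    # The most populated bin defines the zero of the landscape.
    return {cell: -kt * math.log(n / most_populated) for cell, n in counts.items()}


# Toy projections: three frames in one basin, one frame in another.
fel = free_energy_landscape([0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0], bins=2)
for cell, energy in sorted(fel.items()):
    print(cell, round(energy, 3))
```

Unpopulated bins are simply absent from the result, which corresponds to regions of the landscape that were never sampled.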

DCCM analysis

The dynamic cross-correlation matrix (DCCM) analysis was used for insights into the correlated movements of residues in the V1-TLR4-docked complex during a 200 ns molecular dynamics (MD) simulation. The trajectory information gathered during the MD simulation was used to compute the cross-correlation coefficients between the variations of each residue pair⁷¹. This study identified locations within the complex that displayed substantial correlated or anti-correlated movements, offering observations into the vaccine-receptor-docked complexes’ dynamic behavior and potential functional connections.
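The cross-correlation coefficient between residues i and j is usually defined as C_ij = ⟨Δr_i·Δr_j⟩ / √(⟨Δr_i²⟩⟨Δr_j²⟩), where Δr is the displacement of a residue (typically its Cα atom) from its mean position. A minimal sketch of that definition follows, assuming aligned frames with one coordinate per residue; the toy trajectory is hypothetical.

```python
import math


def dccm(traj):
    """Cross-correlation matrix from frames of per-residue (x, y, z) positions."""
    n_frames, n_res = len(traj), len(traj[0])
    mean = [
        [sum(f[j][k] for f in traj) / n_frames for k in range(3)]
        for j in range(n_res)
    ]
    # Displacement of every residue from its mean position in every frame.
    disp = [
        [[f[j][k] - mean[j][k] for k in range(3)] for j in range(n_res)]
        for f in traj
    ]

    def avg_dot(i, j):
        return sum(
            sum(d[i][k] * d[j][k] for k in range(3)) for d in disp
        ) / n_frames

    var = [avg_dot(i, i) for i in range(n_res)]
    # Note: a residue that never moves has zero variance and would need
    # special handling before the division below.
    return [
        [avg_dot(i, j) / math.sqrt(var[i] * var[j]) for j in range(n_res)]
        for i in range(n_res)
    ]


# Toy trajectory: residues 0 and 1 move together, residue 2 moves oppositely.
traj = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
]
matrix = dccm(traj)
print(matrix[0][1], matrix[0][2])
```

Values near +1 indicate correlated motion and values near −1 indicate anti-correlated motion.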

Immune simulation

The C-ImmSim server was used to perform computational immune simulations to evaluate the vaccine’s efficacy and immunological characteristics. In practice, the minimum recommended interval between the first and second doses of most vaccines is 4 weeks, so three injections were given four weeks apart in our immune simulation. C-ImmSim measures simulation time in time steps, each corresponding to 8 h in real life. The total number of simulation steps was set to 1050, and the injections were set at time steps 1, 84, and 168⁷². Thus, the vaccine was administered on days 1, 28, and 56⁷³, a schedule that corresponds with clinical trials of mRNA vaccines⁷⁴. The remaining parameters were kept at their default values.
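The conversion between time steps and days follows directly from the 8 h step length, as the short check below shows. The function name is illustrative only.

```python
HOURS_PER_STEP = 8  # one simulation time step corresponds to 8 h


def step_to_day(step):
    """Convert a simulation time step to elapsed days."""
    return step * HOURS_PER_STEP / 24


injection_steps = [1, 84, 168]
injection_days = [step_to_day(s) for s in injection_steps]
total_days = step_to_day(1050)

print(injection_days)  # the first injection falls within the first day
print(total_days)
```

Steps 84 and 168 correspond to days 28 and 56, and 1050 steps cover 350 days of simulated time.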

In-silico cloning

Codon adaptation is a strategy in which codons in the construct’s cDNA sequence are modified to maximize expression of the model vaccine in a suitable expression system⁷⁵. The Optimizer tool was used to back-translate the vaccine amino acid sequence to DNA and to obtain Codon Adaptation Index (CAI) values. Expression levels in E. coli were estimated from the average GC content and CAI values of the adapted sequences. A CAI value of 1.0 is considered ideal, and a GC content between 30% and 70% is considered acceptable. To produce a recombinant plasmid, the codon-adapted sequence was inserted into the plasmid vector pET28a(+) using SnapGene 8.0.2 software (www.snapgene.com).
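Both quality measures are easy to compute once a codon weight table is available. The sketch below computes GC content and the CAI as the geometric mean of per-codon relative adaptiveness values. The weight table shown is a toy placeholder, not real E. coli K12 codon usage, and the function names are illustrative rather than part of any tool mentioned above.

```python
import math


def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def cai(dna, weights):
    """Codon Adaptation Index: geometric mean of relative adaptiveness values.

    `weights` maps codons to relative adaptiveness w (0 < w <= 1) for the
    host organism; codons absent from the table are skipped.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    scored = [weights[c] for c in codons if c in weights]
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


# Toy weight table for two leucine codons (placeholder values).
toy_weights = {"CTG": 1.0, "CTA": 0.25}

sequence = "CTGCTA"
gc = gc_content(sequence)
index = cai(sequence, toy_weights)
print(round(gc, 1), round(index, 3), 30.0 <= gc <= 70.0)
```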

Identification and analysis of novel drug targets

The mitochondrial and cytoplasmic proteins shortlisted in the subcellular localization step were subjected to BLASTp analysis against the FDA-approved drug targets available in DrugBank, using previously applied criteria^17,20 of qcovs < 35%, pident < 35%, bitscore < 100, and e-value > 1e-4. Targets that had no substantial sequence similarity with DrugBank entries were prioritized for further analysis as novel drug targets. The TMHMM–2.0 server was used to calculate the transmembrane topology of the proteins. Transmembrane proteins, which span the lipid membrane, are potential drug targets. The physicochemical parameters of the finalized proteins, such as molecular weight, theoretical pI, instability index, and grand average of hydropathicity, were investigated using the ProtParam tool⁷⁶.

String database analysis

The STRING 11.5 server was used to analyze the protein interaction networks of the drug target proteins⁷⁷. The minimum required interaction score was set to medium confidence (0.400). All other parameters were left at their default values.

3D model prediction and drug pocket screening

The 3D structure of the finalized proteins was predicted using the SWISS-MODEL server⁷⁸. To validate the most suitable model, the ERRAT value and Ramachandran plot of each model were assessed using the SAVES v6.0 server. The selected probable druggable proteins were then analyzed to identify putative binding pockets using DoGSiteScorer⁷⁹. DoGSiteScorer evaluated the druggability of the 3D structures. This tool calculates the pocket residue and druggability score, which ranges from 0 to 1. A score closer to 1 indicates a highly druggable protein cavity⁸⁰.

Results and discussion

Proteome retrieval and non-homologous protein identification

The entire proteome of L. prolificans contained 8533 proteins; after removing duplicates, 8341 proteins remained. These proteins were analyzed with the subtractive proteomics workflow to identify and characterize putative immune activation targets. To verify that these targets would not provoke adverse immune effects by cross-reacting with human proteins, a BLASTp analysis against the human proteome was performed, which classified 2075 proteins of the pathogen’s proteome as host non-homologous. Non-homology analysis against the human gut microbiome was then carried out to assess possible interactions and compatibility with the gut microbiota. This BLASTp analysis revealed that 2048 proteins were non-homologous to human gut microbiota proteins, suggesting that these proteins are structurally or functionally distinct from those found in the human gut microbiome (Fig. S1).

Identification of essential and virulent proteins

Essential proteins are believed to control processes that are vital for pathogen survival. For vaccine development, targeting such proteins raises the possibility of affecting pathogen virulence. Vaccines based on these targets might produce a stronger immune response and broader protection across strains, and they reduce the chance that genetic mutations will compromise vaccine effectiveness⁸¹. The Database of Essential Genes was used to identify 22 essential proteins, while the others were discarded. These 22 essential proteins were used in the next steps to identify effective vaccine targets.

Virulent proteins have been shown to play a vital role in pathogenesis, making them promising vaccine candidates. Because virulent proteins might elicit an immune response, a virulent protein analysis was undertaken. The analysis identified 7 virulent proteins, which were carried forward to subsequent analysis. The essential and virulent proteins were then combined, and duplicates were discarded. A total of 26 proteins were selected in this step for further analysis (Fig. S1).

Subcellular localization of proteins

Protein localization is one of the most important properties of a therapeutic target⁸². The WoLF PSORT, CELLO v.2.5, and Euk-mPLoc 2.0 bioinformatic tools were used to determine the subcellular localization of these 26 proteins (Table 1). Three proteins were linked to the inner membrane, 4 proteins served as either drug or vaccine targets based on their subcellular localization and functionality, and 19 were found in the cytoplasm and mitochondria and are regarded as drug targets (Fig. 2). Among the 4 dual-purpose proteins, the L. prolificans peptidase M4 C-terminal domain-containing protein is expected to be surface-exposed and involved in extracellular proteolysis, and is therefore accessible both to the immune system and to inhibitory drugs⁸³. Histidine kinases possess membrane-bound sensor domains and cytoplasmic kinase domains, and may therefore be suitable for both vaccine and drug applications by virtue of their involvement in signal transduction⁸⁴. L-ornithine N(5)-monooxygenase (PvdA) is not surface-exposed, but it plays a central role in siderophore biosynthesis and iron acquisition, which contribute to virulence; including epitopes of this protein may enhance protection by disrupting iron metabolic pathways and strengthening host immune responses⁸⁵. TauD/TfdA-like domain-containing proteins perform fundamental metabolic functions and are essential for the pathogen’s survival in the host because of their critical roles in oxidative stress adaptation and sulfur metabolism⁸⁶. Additionally, prior studies have shown that intracellular enzymes can support T-cell-mediated immune responses, particularly through MHC-I presentation pathways⁸⁷. Proteins located in the extracellular compartment, outer membrane, and periplasm are accessible to the host immune system and have been explored as vaccine candidates. These proteins are vital to host-pathogen interactions and are typically exposed to the outside environment³⁰.

Table 1.

Subcellular localization of 26 combined proteins.

Sr. No	GI numbers	Protein name	Localization
1	A0A2N3N042	SIS domain-containing protein	Cytoplasmic
2	A0A2N3NHG3	Thiamine phosphate synthase/TenI domain-containing protein	Cytoplasmic
3	A0A2N3NES6	Phosphoribosyltransferase domain-containing protein	Cytoplasmic
4	A0A2N3MXY2	Carbonic anhydrase	Cytoplasmic
5	A0A2N3NAZ6	Isocitrate lyase	Cytoplasmic
6	A0A2N3N8S9	AB hydrolase-1 domain-containing protein	Cytoplasmic
7	A0A2N3N1L7	TauD/TfdA-like domain-containing protein	Cytoplasmic, Extracellular
8	A0A2N3N082	Isocitrate lyase	Cytoplasmic
9	A0A2N3NL25	TauD/TfdA-like domain-containing protein	Cytoplasmic
10	A0A2N3MYD7	Nickel/cobalt efflux system	Membrane
11	A0A2N3NJ79	beta-glucosidase	Cytoplasmic
12	A0A2N3N8J1	NADH:flavin oxidoreductase/NADH oxidase N-terminal domain-containing protein	Cytoplasmic
13	A0A2N3N2I7	Carbohydrate kinase PfkB domain-containing protein	Mitochondria
14	A0A2N3MZ68	PrpF protein	Mitochondria
15	A0A2N3NAY7	AB hydrolase-1 domain-containing protein	Mitochondria
16	A0A2N3NIY2	Cation/H+ exchanger domain-containing protein	Membrane
17	A0A2N3N9S5	Fatty acid synthase subunit alpha	Cytoplasmic
18	A0A2N3N399	anthranilate synthase	Cytoplasmic
19	A0A2N3MZG5	pyridoxal 5’-phosphate synthase (glutamine hydrolyzing)	Cytoplasmic
20	A0A2N3NEZ7	Phospho-2-dehydro-3-deoxyheptonate aldolase	Cytoplasmic
21	A0A2N3N5D6	Fructose-bisphosphate aldolase	Cytoplasmic
22	A0A2N3NKV7	Tryptophan synthase	Cytoplasmic
23	A0A2N3NFZ1	Peptidase M4 C-terminal domain-containing protein	Cytoplasmic, Extracellular
24	A0A2N3NAB3	NodB homology domain-containing protein	Extracellular
25	A0A2N3N0P6	Histidine kinase	Cytoplasmic, Extracellular
26	A0A2N3NFW9	L-ornithine N(5)-monooxygenase	Cytoplasmic, Extracellular

Fig. 2. Schematic presentation of the subcellular localization of proteins using the CELLO v.2.5, Euk-mPLoc 2.0, and WoLF PSORT servers.

Analysis of prioritized vaccine candidates

The antigenicity and allergenicity profiles of the vaccine candidate proteins were determined with the VaxiJen v2.0 and AllerTop 2.0 servers. A protein’s ability to elicit an effective immune response and build protective immunity against infection depends on its antigenicity. Of the 7 vaccine targets, 5 proteins (A0A2N3N1L7, A0A2N3MYD7, A0A2N3NAB3, A0A2N3N0P6, and A0A2N3NFW9) were found to be non-allergenic and antigenic. In contrast, 2 proteins (A0A2N3NFZ1 and A0A2N3NIY2) were neither allergenic nor antigenic. The TMHMM server was used to detect transmembrane hydrophobic α-helices, which help membrane-bound proteins embed in the membrane⁸⁸. The GRAVY value reflects protein polarity, the aliphatic index reflects thermostability, and the instability index indicates whether the protein is stable. Protein stability is also influenced by molecular weight and theoretical pI. After this analysis, four proteins (A0A2N3N1L7, A0A2N3NAB3, A0A2N3N0P6, and A0A2N3NFW9) were finalized as vaccine candidates; they were antigenic, non-allergenic, stable, and hydrophilic, with a topology value of 0 (Table 2).

Table 2.

Physicochemical properties of prioritized vaccine candidates.

Protein ID	Protein name	No. of A.A	Topology value	Mol. Wt (Da)	Theoretical pI	Aliphatic index	GRAVY	Allergen	Antigen	Instability index
A0A2N3N1L7	TauD/TfdA-like domain-containing protein	382	0	42589.44	5.81	75.79	-0.543	No	Yes	Stable
A0A2N3MYD7	Nickel/cobalt efflux system	459	o33-55i99-121o141-163i210-232o242-259i312-334o361-383i	49665.17	6.02	116.01	0.421	No	Yes	Unstable
A0A2N3NIY2	Cation/H+ exchanger domain-containing protein	876	o47-69i76-95o105-127i140-162o175-197i209-231o246-268i288-310o335-357i364-386o396-415i427-449o	94214.21	6.16	113.63	0.417	No	No	Unstable
A0A2N3NFZ1	Peptidase M4 C-terminal domain-containing protein	397	0	44514.12	6.6	74.21	-0.52	No	No	Stable
A0A2N3NAB3	NodB homology domain-containing protein	424	0	44920.04	5.38	67.45	-0.331	No	Yes	Stable
A0A2N3N0P6	Histidine kinase	730	0	81107.73	5.55	92.59	-0.216	No	Yes	Stable
A0A2N3NFW9	L-ornithine N(5)-monooxygenase	582	0	63605.62	6.05	84.97	-0.252	No	Yes	Stable
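
The selection described for Table 2 can be reproduced with a short sketch. The rows are transcribed from the table; the rule that a finalist must be antigenic, non-allergenic, stable, hydrophilic (negative GRAVY), and free of predicted transmembrane helices is an assumption inferred from the text, and the helper `tm_helix_count` is a hypothetical name.

```python
import re

# (accession, TMHMM topology, GRAVY, allergen, antigen, stability), from Table 2.
TABLE_2 = [
    ("A0A2N3N1L7", "0", -0.543, False, True, "Stable"),
    ("A0A2N3MYD7", "o33-55i99-121o141-163i210-232o242-259i312-334o361-383i",
     0.421, False, True, "Unstable"),
    ("A0A2N3NIY2", "o47-69i76-95o105-127i140-162o175-197i209-231o246-268i288-310"
     "o335-357i364-386o396-415i427-449o", 0.417, False, False, "Unstable"),
    ("A0A2N3NFZ1", "0", -0.52, False, False, "Stable"),
    ("A0A2N3NAB3", "0", -0.331, False, True, "Stable"),
    ("A0A2N3N0P6", "0", -0.216, False, True, "Stable"),
    ("A0A2N3NFW9", "0", -0.252, False, True, "Stable"),
]

def tm_helix_count(topology: str) -> int:
    """Count the start-end residue ranges in a TMHMM topology string."""
    return len(re.findall(r"\d+-\d+", topology))

finalists = [
    accession
    for accession, topology, gravy, allergen, antigen, stability in TABLE_2
    if antigen and not allergen and stability == "Stable"
    and gravy < 0 and tm_helix_count(topology) == 0
]
print(finalists)
```

Under these assumed rules the filter returns the same four accessions that the text reports as finalized vaccine candidates.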

Prediction of CTL and HTL epitopes

Identifying CTL and HTL epitopes is essential for constructing an effective vaccine. Based on the receptors on their membranes, T cells are categorized as either CD4+ (HTLs) or CD8+ (CTLs). These cells recognize epitopes presented by MHC molecules: CTLs bind epitopes presented on MHC-I, while HTLs interact with epitopes presented on MHC-II. The interaction between MHC molecules and antigenic epitopes can generate a potent immune response. MHC-I-restricted CTLs eliminate tumor cells that display the corresponding antigens and destroy cells infected by foreign agents. MHC-II-restricted HTLs are crucial for initiating and optimizing the immune response. By instructing other cells to carry out these functions, they “mediate” the immune response and control the kind of immunological response that arises⁸⁹. Thus, predicting high-affinity epitopes is critical⁹⁰. The IEDB-recommended NetMHCpan EL 4.1 method was used to predict highly immunogenic CTL epitopes with a percentile rank of < 0.5%. For every protein, the top 10 non-overlapping epitopes with the lowest percentile ranks were selected. The selected epitopes were checked for antigenicity, toxicity, allergenicity, and water solubility. For the vaccine construct, 12 CTL epitopes from 4 proteins were chosen that were non-allergenic, highly antigenic, non-toxic, and water-soluble. These 12 epitopes (DYKEITTAR, RTHPVTGEK, HESGAATSL, STAEATTAR, KENGVVATF, VSKGLVEQW, KVTRDLTEW, AELRLISAY, TIKTRTPSL, YQYSPTERF, RTAHLTILK, and KVLSINHPR) were selected as promising candidates for vaccine design (Table S1).
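
The rank cutoff and the non-overlap rule can be sketched as a greedy selection. This is only one plausible reading of the procedure: the function name `select_epitopes`, the greedy strategy, and every peptide, position, and rank in `predictions` are hypothetical and do not come from the IEDB output of this study.

```python
def select_epitopes(predictions, max_rank=0.5, top_n=10):
    """Pick up to top_n non-overlapping peptides with percentile rank below max_rank.

    predictions: iterable of (peptide, start, percentile_rank), where start is the
    1-based position in the source protein and a lower rank means stronger binding.
    """
    chosen = []
    for peptide, start, rank in sorted(predictions, key=lambda p: p[2]):
        if rank >= max_rank or len(chosen) == top_n:
            break  # remaining candidates are weaker, or the quota is filled
        end = start + len(peptide) - 1
        # Keep the peptide only if it lies entirely before or after each chosen one.
        if all(end < s or start > s + len(p) - 1 for p, s, _ in chosen):
            chosen.append((peptide, start, rank))
    return chosen

# Hypothetical predictions for a single protein.
predictions = [
    ("PEPTIDEAA", 10, 0.05),
    ("EPTIDEAAK", 11, 0.08),  # overlaps the peptide above, so it is skipped
    ("KLMNPQRST", 40, 0.30),
    ("WYVTSRQPN", 80, 0.90),  # rank is above the 0.5 cutoff
]
print(select_epitopes(predictions))
```

Sorting by rank first means that, of two overlapping peptides, the one with the stronger predicted binding is retained.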

The MHC-II epitope analysis identified 9 unique epitopes with a percentile rank of less than 2.0, indicating high binding affinity. These epitopes were further assessed for antigenicity, allergenicity, toxicity, and water solubility. The 9 epitopes, FTRNIVGLKKEESDA, QGDYKEITTARYSDE, ELGYHVTNYNLDTKD, QSGRVTTQTPQTESA, LKNDFLANMSHEIRT, MDDYIAKPVNKQLLA, VEQWNRTEGEFIGRD, RSRFTFLNYLHENNR, and IAPVTGDDEPAVPEE, were selected as potential vaccine candidates (Table S2).