Scientific Reports. 2025 Jul 3;15:23776. doi: 10.1038/s41598-025-08107-x

Subtractive proteomics and molecular docking identify therapeutic targets and drug candidates in drug-resistant Klebsiella michiganensis THO-011

Abdullah R Alanzi 1, Ahmad Z Alanazi 2, Khalid Alhazzani 2, Munawar Abbas 3
PMCID: PMC12226713  PMID: 40610643

Abstract

Klebsiella michiganensis, an emerging multidrug-resistant pathogen, poses a significant public health threat. This study employed subtractive genomics to identify potential therapeutic targets in K. michiganensis THO-011. From 4,024 predicted open reading frames, we identified non-redundant, human non-homologous proteins and analyzed them for essentiality, subcellular localization, and metabolic pathway involvement. Two promising druggable targets, WP_004097788.1 and WP_219541799, vital for bacterial survival, were identified. Using AlphaFold-predicted structures, virtual screening of 10,000 natural compounds from the LOTUS database, alongside DrugBank controls, identified LTS0037797 and LTS0037810 as top inhibitors. Glide Gscores ranked these compounds, with validation via MM-GBSA binding energy and molecular dynamics simulations confirming their stability and binding efficacy. These findings highlight novel therapeutic strategies against K. michiganensis, with further in vitro studies necessary to advance these inhibitors to clinical application.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-08107-x.

Keywords: Klebsiella michiganensis, Subtractive genomics, ORFs, Druggable proteins, Molecular docking, MD simulation

Subject terms: Biophysics, Computational biology and bioinformatics, Drug discovery, Structural biology, Diseases, Health care, Pathogenesis

Introduction

Klebsiella michiganensis is a bacterium that belongs to the genus Klebsiella and is one of nine species that comprise the Klebsiella oxytoca complex. K. michiganensis has been identified as an emerging nosocomial pathogen, including clinical isolates containing carbapenemase-encoding genes1. It was first detected in 2012 in a toothbrush holder in a Michigan household2. This emerging pathogen has since been documented in clinical settings across numerous nations3–5. Although K. michiganensis, like other Klebsiella species, can be found in a variety of natural habitats, environmental reservoirs for these organisms have not been thoroughly studied. K. michiganensis isolates have also been found in wastewater, indicating that urban and hospital effluents from coastal areas may be possible sources of human infections6,7.

K. michiganensis has the potential to infect humans, particularly those with compromised immune systems or underlying medical problems. K. michiganensis is well-known for causing infections such as respiratory tract infections, urinary tract infections (UTIs), and bloodstream infections. These infections can be especially troublesome in hospitals, where patients may be more sensitive to bacterial infections because of underlying medical disorders, surgical operations, or the use of invasive medical devices such as catheters and ventilators. Effective drugs can improve patient outcomes and safety by providing reliable treatment options3,8.

Understanding microbial biology, genetics, and resistance mechanisms can be advanced by studying bacteria like K. michiganensis and designing drugs to tackle them. This information can be used to guide future drug development efforts9.

In response to the escalating threat posed by multidrug-resistant Klebsiella michiganensis, an emerging nosocomial pathogen responsible for severe and hard-to-treat infections, our study employed a subtractive proteomics approach to identify potential therapeutic targets. This method involves systematically subtracting proteins shared with the human host and gut microbiota to isolate those essential for the pathogen’s survival and unique to its metabolic pathways, thereby minimizing potential off-target effects. We identified two critical protein targets: WP_004097788.1 (replicative DNA helicase) and WP_219541799 (bifunctional phosphoribosylaminoimidazolecarboxamide formyltransferase/IMP cyclohydrolase). These proteins are indispensable for bacterial DNA replication and purine biosynthesis, respectively, both of which are vital for bacterial growth and proliferation. Targeting WP_004097788.1 could impede DNA replication and halt cell division, while inhibiting WP_219541799 would disrupt purine biosynthesis and impair RNA and DNA synthesis, ultimately curbing the pathogen’s ability to multiply. Through computational docking studies, we discovered natural product ligands from Beta vulgaris and Ganoderma species that exhibit high binding affinities for these targets. These findings align with our research goal of identifying candidate inhibitors for these proteins, laying the groundwork for developing novel therapeutic agents against K. michiganensis. By focusing on these essential and unique protein targets, our study offers promising avenues for the development of effective treatments for patients infected with this pathogen. Further research is necessary to experimentally validate the biological activities of these compounds.

Methods and materials

Data retrieval

The genome of Klebsiella michiganensis THO-011 was downloaded from NCBI (https://www.ncbi.nlm.nih.gov/)10.

ORF prediction

The Interpolated Context Model (ICM) is a statistical model used in gene prediction algorithms, particularly in the Glimmer gene prediction software. The ICM was first introduced with version 2 of Glimmer11, and the third version (Glimmer3) uses it to improve gene prediction accuracy. The open reading frames (ORFs) within the Klebsiella michiganensis genome were predicted using Glimmer3. Start codons were specified as the comma-separated list “atg, gtg, ttg”, and stop codons were specified according to the standard GenBank translation table entry.
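To make the role of the start and stop codons concrete, the following is a minimal sketch of a naive forward-strand ORF scan. It is not Glimmer3 and does not use an interpolated context model; the function name `find_orfs`, the `min_codons` parameter, and the stop-codon set (TAA, TAG, TGA, as in the standard table) are illustrative assumptions rather than settings taken from the study.

```python
# Naive ORF scan on the forward strand only.
# This is an illustration, NOT Glimmer3's interpolated context model.
START_CODONS = {"ATG", "GTG", "TTG"}  # start codons listed in the text
STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumed: stops of the standard table


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    Coordinates are 0-based and half-open; the stop codon is included.
    For each stop codon, only the most upstream in-frame start is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
```

A real gene finder also scans the reverse strand and scores candidate ORFs statistically, which this sketch deliberately leaves out.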

Translation

The Glimmer3 output consisted of nucleotide sequences representing the predicted genes. To elucidate the corresponding protein sequences, we employed the “Transeq” tool from the EMBOSS suite12. “Transeq” is a powerful tool for translating nucleotide sequences into their corresponding protein sequences, facilitating downstream functional analysis. Using “Transeq,” we translated the predicted genes from “Glimmer3” into amino acid sequences, thus obtaining a comprehensive protein dataset for further analysis. This dataset serves as a valuable resource for characterizing the potential gene products and predicting their functional roles in the Klebsiella michiganensis genome.
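The translation step can be illustrated with a short sketch. It is not the Transeq implementation; it simply maps codons to amino acids using the standard genetic code, and the helper name `translate` and the use of "X" for unrecognized codons are assumptions made for this example.

```python
# Standard genetic code, laid out in the usual T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt_seq):
    """Translate a nucleotide sequence in frame 1; '*' marks a stop codon."""
    nt_seq = nt_seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(nt_seq) - 2, 3):
        # Codons with ambiguous bases are reported as 'X' (an assumption).
        protein.append(CODON_TABLE.get(nt_seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Note that this literal translation renders an alternative start codon such as GTG as valine, which is worth keeping in mind when comparing against annotated protein sequences.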

Identification and removal of duplicate proteins

Following the translation of the predicted genes from Glimmer3 using the “Transeq” tool12, we aimed to reduce sequence redundancy and obtain a non-redundant protein dataset. For this purpose, we employed the “CD-HIT Tools” suite13, a software package widely used for sequence clustering and redundancy removal. The CD-HIT run used the following key parameters. The sequence identity threshold was set to 0.6, so that sequences with ≥ 60% identity were clustered together. A word size of 4 was used, as appropriate for identity thresholds in the range 0.6–0.7. The redundancy tolerance was set to 2 and the alignment bandwidth to 20 amino acids. Sequences shorter than 100 amino acids were excluded to maintain dataset quality. Running CD-HIT with these parameters yielded a non-redundant protein dataset, which reduced the computational cost of the downstream analyses.
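As a rough illustration, the parameters above could map onto a CD-HIT command line as sketched below. The text does not say whether the command-line tool or a web or Galaxy front end was used, so the flag mapping (`-c`, `-n`, `-t`, `-b`, `-l`) and the file names are assumptions; in particular, mapping the 100-residue length filter to `-l` is a guess. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def cdhit_command(fasta_in, fasta_out):
    """Build a cd-hit argument list for the parameters described in the text."""
    return [
        "cd-hit",
        "-i", fasta_in,   # input protein FASTA (hypothetical file name)
        "-o", fasta_out,  # output of representative sequences
        "-c", "0.6",      # sequence identity threshold
        "-n", "4",        # word size suited to thresholds of 0.6-0.7
        "-t", "2",        # redundancy tolerance
        "-b", "20",       # alignment bandwidth
        "-l", "100",      # assumed mapping of the length cut-off
    ]


cmd = cdhit_command("proteins.faa", "proteins_nr.faa")
print(shlex.join(cmd))
```

Building the command as a list keeps each option next to its value, so it can be handed to `subprocess.run` unchanged if CD-HIT is installed.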

Identification of human non-homologous proteins

To identify the human non-homologous proteins of the pathogen and remove any human homologs, we performed a series of bioinformatic analyses. First, we prepared a human protein database from the RefSeq NCBI database14, containing a comprehensive collection of human protein sequences. The DIAMOND software15 (version 2.3.0+) was used to compare the pathogen protein sequences against the human protein database using the BlastP algorithm. In this process, identical self-hits between sequences were suppressed to avoid redundancy in the results. To assess sequence similarity and statistical significance, we employed composition-based statistics in DIAMOND16. The BLOSUM62 scoring matrix was selected to score the sequence alignments. A maximum expected value (E-value) threshold of 0.001 was set, with alignments below this threshold considered statistically significant and indicative of potential homology. Running DIAMOND with these parameters gave a list of pathogen proteins with human homologs. We then removed these proteins, retaining only the human non-homologous proteins specific to the pathogen.
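The subtraction step described above can be sketched as a filter over tabular alignment output. The sketch assumes the default 12-column BLAST tabular format (e-value in the 11th column); the protein identifiers and hit values below are made up for illustration, and the function names are hypothetical.

```python
def human_homolog_ids(tabular_hits, max_evalue=1e-3):
    """Return query IDs with at least one hit at or below max_evalue.

    Assumes 12-column BLAST-style tabular output, e-value in column 11.
    """
    hits = set()
    for line in tabular_hits.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def non_homologous(all_ids, tabular_hits, max_evalue=1e-3):
    """Keep only proteins without a significant human hit, in input order."""
    homologs = human_homolog_ids(tabular_hits, max_evalue)
    return [pid for pid in all_ids if pid not in homologs]


# Made-up example rows: prot_1 has a strong human hit, prot_2 a weak one.
demo_rows = [
    ("prot_1", "NP_000001.1", "45.0", "200", "110", "0",
     "1", "200", "5", "204", "1e-50", "180"),
    ("prot_2", "NP_000002.1", "28.0", "60", "43", "0",
     "1", "60", "10", "69", "0.5", "25.0"),
]
demo_tsv = "\n".join("\t".join(row) for row in demo_rows)
print(non_homologous(["prot_1", "prot_2", "prot_3"], demo_tsv))
```

Proteins with no reported hit at all (here `prot_3`) are retained, which matches the intent of keeping everything that lacks a significant human counterpart.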

Prediction of essential genes

To determine genes vital for the pathogen’s survival, the Geptop 2.0 server was utilized. Geptop 2.0 is an online platform specifically developed to predict essential bacterial genes17. It applies a BlastP-based method with a user-specified threshold value, set at 1e−5 in this study, to identify potential essential genes within the submitted proteome. The server adopts a comparative genomics strategy, focusing on genes conserved among related bacterial species that are likely to serve essential biological functions17.

Comparative pathway analysis and Identification of Unique Pathways

For conducting a comparative pathway analysis between the pathogen and humans, the KEGG (Kyoto Encyclopedia of Genes and Genomes) database along with the KAAS (KEGG Automatic Annotation Server) was employed18. KEGG offers an extensive repository of pathway data and functional annotations related to genes and proteins. KAAS serves as an automated annotation tool that assigns KEGG Orthology (KO) identifiers to genes, facilitating the detection of genes associated with particular pathways19. To pinpoint pathogen-specific pathways absent in the human proteome, a thorough comparison of metabolic pathways between the two organisms was carried out.

Subcellular localization prediction

To predict the subcellular localization of the resulting proteins, the PSORTb web server was used20. PSORTb is a widely recognized tool for predicting bacterial protein localization within the cell. It applies a machine learning-based approach combined with a broad range of localization features to determine the likely subcellular site of a given protein20. Proteins associated with unique pathways were analyzed using the PSORTb server to predict their subcellular localizations. In bacterial cells, proteins are typically localized to one of five major compartments: the cytoplasm, periplasm, plasma membrane, extracellular space, or outer membrane21. Membrane proteins have been proposed as vaccine targets, and cytoplasmic proteins as drug targets22.

Virulent proteins identification

The virulence prediction of proteins was conducted using the VFDB (Virulence Factor Database) resource, which is designed to identify virulence factors in bacterial pathogens23. VFDB analysis aided in characterizing the potential virulence-associated functions of the identified proteins.

Proteins druggability potential

To explore potential drug targets, the DrugBank 3.0 database was utilized. DrugBank serves as an extensive resource that compiles data on drugs, their targets, and associated interactions24. The identified proteins were analyzed using a BLAST search against the DrugBank database, applying specific parameters: an E-value of 0.00001, gapped alignment, drug type set to “approved,” and protein type set to “target.” This analysis aimed to identify proteins that could act as drug targets and may be suitable for repurposing existing FDA-approved drugs.

Screening against gut microbiota proteins

The human gut microbiome plays a crucial role in supporting host health and regulating metabolism. To investigate potential interactions between the pathogen’s proteins and the gut microbiome, a sequence similarity analysis was performed. Proteins identified through the DrugBank 3.0 screening were used as query sequences in a BLASTP search against an extensive database of human gut microbiome or metagenome sequences. This analysis aimed to determine whether any of the pathogen proteins exhibited sequence similarity to known elements of the gut microbial community. BLASTP version 2.10.1+ was employed for this purpose25.

Virtual screening and ADMET analyses

A library of 10,000 natural products was obtained from the LOTUS database (https://lotus.naturalproducts.net/) and prepared with the LigPrep tool for virtual screening against the identified drug target proteins26. The 3D structures of the target proteins were retrieved from the AlphaFold database and prepared for docking using the Protein Preparation Wizard27. Prior to receptor preparation, missing loops in the protein structures were modeled using the Modeller tool28. The receptor preparation process involved multiple steps, including the formation of disulfide bonds, assignment of zero-order bonds to metals, and the addition of hydrogen atoms. Any ligands and water molecules were removed. During the optimization phase, the pKa values of ionizable residues were refined at pH 7.0 using the PROPKA program29. Subsequently, energy minimization was carried out using the OPLS_2005 force field. Following protein preparation, three-dimensional grids were generated at the predicted binding sites of both proteins, as identified by SiteMap in Schrödinger, to enable site-specific docking. The prepared compounds were docked using the SP (Standard Precision) mode in Glide30. Molecular interactions between the docked natural compounds and the target proteins were then analyzed using Discovery Studio. For ADME and toxicity analysis, the Swiss-ADME (https://www.swissadme.ch) and ProTox 3.0 (https://tox.charite.de/protox3/) online tools were used to evaluate the top hits and select the best candidates.

MD simulation

The complexes were analyzed for protein conformation and ligand stability by running a 100 ns simulation in Desmond31. The systems were solvated by placing them in an orthorhombic box of 10 Å filled with the TIP3P water model32. To mimic physiological conditions, counter ions were introduced to neutralize the system, and 0.15 M NaCl was added. The simulation was conducted under an NPT ensemble, maintaining the temperature at 300 K and the pressure at 1 atm. Following a relaxation phase, the production run was initiated, with trajectory data recorded at 50 ps intervals. The Simulation Interaction Diagram module in Desmond was employed to analyze the resulting trajectories. The MD simulations were run on a Dell workstation running Ubuntu, with 64 GB of system memory, a 12th Gen Intel(R) Core(TM) i7-12700F CPU, and an NVIDIA GeForce RTX 4070 GPU. This hardware, combined with the computational capabilities of the Desmond software, made the 100 ns simulations feasible within a practical time frame.
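Two of the settings above translate into simple arithmetic, sketched here: the approximate number of trajectory frames implied by a 100 ns run saved every 50 ps, and a back-of-the-envelope count of NaCl ion pairs for a 0.15 M solution. The box volume used for the ion estimate (a cube with 80 Å sides) is purely hypothetical, since the actual box dimensions are not reported, and the system builder may place ions differently.

```python
AVOGADRO = 6.02214076e23      # particles per mole
LITRES_PER_CUBIC_ANGSTROM = 1e-27


def n_frames(sim_time_ns, interval_ps):
    """Approximate number of saved frames (1 ns = 1000 ps)."""
    return round(sim_time_ns * 1000 / interval_ps)


def salt_ion_pairs(conc_molar, box_volume_A3):
    """Estimated ion pairs needed to reach conc_molar in the given volume."""
    return conc_molar * AVOGADRO * box_volume_A3 * LITRES_PER_CUBIC_ANGSTROM


print(n_frames(100, 50))                      # frames for 100 ns at 50 ps
print(round(salt_ion_pairs(0.15, 80.0 ** 3)))  # hypothetical 80 A cube
```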

Results

Proteome subtractive analysis

The genome of Klebsiella michiganensis THO-011 was retrieved from the NCBI database. In the initial step, open reading frames (ORFs) were predicted, revealing a total of 4024 ORFs within the genome. Accurate ORF prediction is essential for deciphering the genomic architecture of an organism and identifying potential genes and functional elements. The predicted genes were translated into amino acid sequences using the “Transeq” tool, generating a complete protein dataset for downstream analysis. Redundant sequences were removed using the CD-HIT tool suite with a 60% identity threshold, resulting in 2957 non-redundant proteins out of the initial 4024. These distinct sequences were then subjected to a BlastP analysis against the human proteome using DIAMOND, with an e-value cutoff of 0.00001, which left 2427 human non-homologous proteins. Further analysis with the Geptop 2.0 server identified 180 essential genes among these 2427 non-homologous proteins, which are likely critical for pathogen survival. Table 1 summarizes each step of the workflow and the corresponding number of proteins at each stage. Although the criteria used were more qualitative in nature, they were consistently applied to select genes and proteins for further investigation.

Table 1.

Number of K. michiganensis proteins retained after each step of the subtractive workflow.

| Sr. No. | Steps | K. michiganensis |
| 1 | ORF prediction | 4024 |
| 2 | Translation | 4024 |
| 3 | After removal of duplicate proteins | 2957 |
| 4 | After removal of human homologous proteins | 2427 |
| 5 | Essential genes | 180 |
| 6 | Total proteins in unique pathways | 46 |
| 7 | Cytoplasmic proteins via PSORTb | 32 |
| 8 | Virulent proteins | 16 |
| 9 | Druggable proteins | 2 |
| 10 | Non gut flora proteins | 2 |
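
The counts in Table 1 can be turned into a simple filtering funnel, as in the sketch below, which reports how many proteins each step removed and what fraction of the original ORFs remains. The counts are those in Table 1; the shortened step labels and the function name `funnel` are choices made for this illustration.

```python
# Protein counts per step, taken from Table 1 (labels shortened).
STEPS = [
    ("ORF prediction", 4024),
    ("Translation", 4024),
    ("Non-redundant (CD-HIT)", 2957),
    ("Human non-homologous", 2427),
    ("Essential", 180),
    ("In unique pathways", 46),
    ("Cytoplasmic (PSORTb)", 32),
    ("Virulent (VFDB)", 16),
    ("Druggable", 2),
    ("Non gut flora", 2),
]


def funnel(steps):
    """Return (name, kept, removed_at_this_step, percent_of_initial) rows."""
    initial = steps[0][1]
    previous = initial
    rows = []
    for name, kept in steps:
        rows.append((name, kept, previous - kept, 100.0 * kept / initial))
        previous = kept
    return rows


for name, kept, removed, pct in funnel(STEPS):
    print(f"{name:<26}{kept:>6}{removed:>7}{pct:>9.2f}%")
```

Laid out this way, the essentiality filter stands out as the most selective step, removing 2247 of the 2427 proteins that reach it.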

Unique pathways identification

A comparative metabolic pathway analysis of the non-homologous essential proteins was then conducted to determine which metabolic pathways they are associated with. The aim was to discover therapeutic targets among the enzymes of common and important bacterial pathways. Metabolic pathways that were unique to K. michiganensis and not identified in humans were selected. Thus, 46 proteins in unique metabolic pathways were chosen for further analysis. Table S1 lists the details of these unique pathways and the proteins involved in them.

Sub-cellular localization and virulence

As proteins may localize to various cellular compartments, subcellular localization is a key factor when identifying potential therapeutic targets. In this study, the 46 selected proteins were further analyzed to determine their localization within the bacterial cell. The results revealed that 32 of these proteins were located in the cytoplasm, while the remaining were associated with the cytoplasmic membrane. Cytoplasmic proteins can be used as drug targets, so the 32 cytoplasmic proteins underwent downstream analyses, while the other proteins were discarded. These proteins were then subjected to VFDB analysis, and 16 cytoplasmic proteins were found to be virulent (Table S2). Genes involved in unique pathways or associated with virulence factors were prioritized. This step aimed to identify genes with a potential impact on the pathogen’s biology and interactions with the host.

Druggability analysis

Druggability is another important criterion in evaluating potential therapeutic targets. It refers to the probability that a small-molecule compound can effectively modulate the function of a target protein. To assess the druggability of K. michiganensis virulence-associated proteins, their sequences were compared with known drug targets listed in the DrugBank database. This analysis identified two K. michiganensis proteins that showed strong similarity to targets of FDA-approved small-molecule drugs (Table 2). The schematic diagram of the whole process is shown in Fig. 1. The final proteins of interest were selected based on their importance in unique pathways, potential virulence factors, and predicted druggability for further analyses.

Table 2.

Druggability potential of the selected K. michiganensis proteins according to DrugBank 3.0.

| Sr. No | K. michiganensis (accession) | DrugBank drug | DrugBank ID |
| 1 | WP_004097788.1 | Dicoumarol | DB00266 |
| | | Cannabidiol | DB09061 |
| | | Orlistat | DB01083 |
| 2 | WP_219541799 | Pemetrexed | DB00642 |
| | | Methotrexate | DB00563 |

Fig. 1.

Schematic diagram of the methodological workflow, showing the whole process step by step.

Structure Validation, ADMET, and Molecular Docking

The quality of the 3D structures of the target proteins obtained from AlphaFold was validated by calculating the ERRAT quality factor and by inspecting the Ramachandran plots. For the ERRAT score, a good model typically exhibits an average overall quality factor of approximately 91%. The ERRAT quality factor was 96.33% for WP_004097788.1 and 98.50% for WP_219541799 (Fig. 2). Similarly, the Ramachandran plots for WP_004097788.1 (Fig. 3A) and WP_219541799 (Fig. 3B) showed that 95.6% and 95.8% of residues, respectively, were in the favored region, and no residues were observed in the disallowed region for either protein. The prepared natural compounds were then docked to both proteins, and the compounds with stronger binding affinities were selected for further analysis (Table S3). The DrugBank drugs associated with each target were employed as control ligands during the molecular docking studies. In the docking analysis of WP_004097788.1, the control compounds exhibited docking scores ranging from −4.127 to −2.736 kcal/mol, whereas the selected natural products demonstrated stronger affinities, ranging from −6.125 to −5.789 kcal/mol. Similarly, for WP_219541799, the control compounds showed docking scores between −3.78 and −3.02 kcal/mol, while the selected natural products achieved stronger scores between −6.388 and −5.692 kcal/mol. Of the 20 top compounds, only the two with the cleanest ADMET profiles (Table 3) and stronger binding affinities were selected for MD simulations.

Fig. 2.

The ERRAT quality factors of the target proteins. (A) WP_004097788.1 (B) WP_219541799.

Fig. 3.

The Ramachandran plots of target proteins. (A) WP_004097788.1 (B) WP_219541799. The yellow region shows the allowed region while the white region shows the disallowed regions.

Table 3.

ADME and toxicity profiles of the top compounds selected for MD simulations.

ADME profiling
| Properties | Parameters | LTS0037797 | LTS0037810 |
| Physicochemical properties | MW (g/mol) | 388.33 | 374.43 |
| | Heavy atoms | 28 | 27 |
| | Aromatic heavy atoms | 6 | 6 |
| | Rotatable bonds | 4 | 4 |
| | HBA | 8 | 6 |
| | HBD | 5 | 4 |
| | Molar refractivity | 102.03 | 100.81 |
| Lipophilicity | Log Po/w (consensus) | -2.31 | 2.61 |
| Water solubility | Log S (ESOL) | -2.32 | -3.79 |
| Pharmacokinetics | GI absorption | Low | High |
| Drug-likeness | Lipinski’s Rule of 5 (violations) | 0 | 0 |
| Medicinal chemistry | Synthetic accessibility | 4.43 | 4.50 |
Toxicity profiling
| End point | Target | LTS0037797 | LTS0037810 |
| Organ toxicity | Hepatotoxicity | Inactive | Inactive |
| | Neurotoxicity | Inactive | Inactive |
| | Cardiotoxicity | Inactive | Inactive |
| | Carcinogenicity | Inactive | Inactive |
| | Mutagenicity | Inactive | Inactive |
| | Cytotoxicity | Inactive | Inactive |
| Toxicity end points | Clinical toxicity | Inactive | Inactive |
| | BBB-barrier | Inactive | Inactive |
| Tox21-Nuclear receptor signaling pathways | Androgen Receptor Alpha | Inactive | Inactive |
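
As a cross-check of the drug-likeness row in Table 3, the classic Lipinski rule-of-five criteria can be applied to the tabulated values, as sketched below. The thresholds are the textbook ones (MW ≤ 500, log P ≤ 5, HBD ≤ 5, HBA ≤ 10); using the consensus Log Po/w as the log P input is an assumption, and SwissADME itself may use a different lipophilicity estimate and cut-off.

```python
def lipinski_violations(mw, logp, hbd, hba):
    """Count violations of the classic rule-of-five thresholds."""
    checks = [
        mw > 500,    # molecular weight above 500 g/mol
        logp > 5,    # lipophilicity above 5
        hbd > 5,     # more than 5 hydrogen-bond donors
        hba > 10,    # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


# Values copied from Table 3 (consensus Log Po/w used as logP: assumption).
COMPOUNDS = {
    "LTS0037797": {"mw": 388.33, "logp": -2.31, "hbd": 5, "hba": 8},
    "LTS0037810": {"mw": 374.43, "logp": 2.61, "hbd": 4, "hba": 6},
}

for name, props in COMPOUNDS.items():
    print(name, lipinski_violations(**props))
```

Both compounds come out with zero violations under these thresholds, in agreement with the value reported in the table.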

Molecular interactions and simulation analyses

Based on the glide scores, LTS0037797 and LTS0037810 were selected against WP_004097788.1 and WP_219541799 respectively for further investigation by MD simulation.

WP_004097788.1

Among all the natural compounds tested against WP_004097788.1, LTS0037797 exhibited the highest binding affinity. Detailed analysis of its molecular interactions revealed that the compound formed seven hydrogen bonds with the residues Gln222, Arg192, Gly338, Ile337, Lys254, Leu375, and Gln376. It also made a hydrophobic interaction with Pro223, as shown in Fig. 4. The root mean square deviation (RMSD) of the C-alpha atoms was calculated over a 100 ns simulation to assess structural changes in the complex, using the apo protein as a reference for comparison33,34. Throughout the simulation, the Cα atoms of the apo protein maintained RMSD values within the range of approximately 6–7 Å, whereas the complex showed higher RMSD values, fluctuating between 10 and 12 Å (Fig. 5A). Root mean square fluctuation (RMSF) values were calculated to investigate the dynamics of the protein residues upon interaction with the ligand35. The RMSF plots indicated that the majority of residues exhibited minimal fluctuations during the simulation, with RMSF values remaining below 2 Å. This suggests that the ligand did not induce significant flexibility in the protein, as the RMSF values of the complex were lower than those observed in the apo form (Fig. 5B). In the protein-ligand contact analysis (Fig. 5C), Asp124 showed the most persistent interaction of all residues, observed in 87% of the snapshots (Fig. 5D). Furthermore, the binding free energy of the complex was calculated using the Prime-MMGBSA module. The total energy includes contributions from Van der Waals, Coulombic, solvation, and covalent terms. Specifically, the Van der Waals energy contributed −31.43 kcal/mol, solvation energy was −11.18 kcal/mol, covalent energy accounted for 9.37 kcal/mol, and Coulombic energy was 1.68 kcal/mol. The overall binding free energy of the complex was determined to be −58.38 kcal/mol, as presented in Fig. 6.
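
For readers unfamiliar with the two trajectory metrics used above, the following minimal sketch shows how RMSD and RMSF are computed from atomic coordinates. It is not Desmond's implementation: it assumes the frames have already been superimposed on the reference structure, and the toy coordinates and function names are made up for illustration.

```python
import math


def rmsd(coords, ref):
    """Root mean square deviation between two equally sized coordinate sets.

    Assumes both sets are already superimposed (no fitting is done here).
    """
    if len(coords) != len(ref):
        raise ValueError("coordinate sets must have the same length")
    total = sum(
        (x - rx) ** 2 + (y - ry) ** 2 + (z - rz) ** 2
        for (x, y, z), (rx, ry, rz) in zip(coords, ref)
    )
    return math.sqrt(total / len(ref))


def rmsf(frames):
    """Per-atom fluctuation around each atom's mean position over all frames."""
    n_frames = len(frames)
    result = []
    for atom in range(len(frames[0])):
        mean = [sum(f[atom][d] for f in frames) / n_frames for d in range(3)]
        sq = sum(
            sum((f[atom][d] - mean[d]) ** 2 for d in range(3)) for f in frames
        )
        result.append(math.sqrt(sq / n_frames))
    return result


# Toy two-atom system: the second frame is shifted by 2 A along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(shifted, reference))
print(rmsf([reference, shifted]))
```

RMSD gives one value per frame (how far the whole structure has moved from the reference), whereas RMSF gives one value per atom or residue (how much it moves over the whole trajectory), which is why the two are plotted on different axes in Fig. 5.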

Fig. 4.

The molecular interactions of LTS0037797 against WP_004097788.1 protein target, Hydrogen bonds (green), Hydrophobic (magenta).

Fig. 5.

(A) The RMSD plot of the apo and complex WP_004097788.1 protein. (B) The residual fluctuation analysis. (C) The protein-ligand interactions. (D) Percentage of interactions observed in snapshots.

Fig. 6.

The binding free energy of the WP_004097788.1 complex and the contribution of its energy components.

WP_219541799

Docking analysis of WP_219541799 revealed that LTS0037810 had the highest binding affinity among all tested ligands, making it the top candidate for further investigation of molecular interactions and stability. Interaction analysis showed that LTS0037810 formed hydrogen bonds with six amino acid residues: Ser248, Glu241, Asp378, Asp269, Phe379, and Tyr239. Additionally, it established a hydrophobic interaction with Ala250 and a Pi-Cation interaction with Arg391, as illustrated in Fig. 7. The RMSD of the C-alpha atoms for both the apo protein and the complex remained within approximately 5 to 7 Å during the first half of the simulation, increasing slightly to 7–8 Å in the latter half (Fig. 8A). RMSF results showed that the apo form experienced greater fluctuations than the protein-ligand complex throughout the simulation (Fig. 8B). In the protein-ligand contact analysis, residues involved in hydrogen bonding included Ala237, Tyr239, Ser248, Ala250, Tyr262, Asp269, Leu377, Asp378, Phe379, and Lys380. Notably, Asp269 and Asp378 also participated in ionic interactions (Fig. 8C). Among these, Ala250 showed the most consistent interaction, appearing in 96% of the simulation frames (Fig. 8D). The binding free energy of the complex was calculated to be −41.26 kcal/mol, with individual energy contributions presented in Fig. 9.

Fig. 7.

The interactions of LTS0037810 against WP_219541799 protein.

Fig. 8.

(A) The RMSD plot of apo and complex of WP_219541799 protein. (B) The residual fluctuation analysis. (C) The protein-ligand interactions. (D) Percentage of interactions observed in snapshots.

Fig. 9.

The overall binding free energy in the WP_219541799 complex and the contribution of its energy components.

Discussion

Some bacteria, including K. michiganensis, can be resistant to multiple antibiotics, making infections challenging to treat. This can lead to increased morbidity, mortality, and healthcare costs. Effective drugs are crucial for addressing public health concerns associated with bacterial infections. To fight this life-threatening scenario, there is a pressing need to develop drugs against K. michiganensis immediately.

A subtractive proteomics approach was used in our investigation to screen therapeutic candidates against K. michiganensis. This approach identifies targets by selecting essential proteins of the pathogen that have no homologs in the host. It pinpoints pathogen-specific proteins critical for survival while minimizing potential cross-reactivity with the human host, an important step in precision drug design. Identifying therapeutic targets is an important stage in computer-based drug design techniques36. Recent breakthroughs in bioinformatics and computational biology have produced a number of approaches to drug design and in silico analysis, reducing the time and cost of the trial-and-error experimentation involved in drug development37.

The K. michiganensis THO-011 genome was downloaded from NCBI. In the initial stage, 4024 ORFs were predicted from the genome of K. michiganensis and subsequently translated into amino acid sequences. ORF prediction in a genome is an important step in genomics and bioinformatics, and its accuracy has a major impact on downstream analyses such as functional annotation and comparative genomics38. Redundancy in protein datasets must be reduced. It arises when one or more homologous sequences are present in the same dataset, and such sequences introduce unnecessary bias into an analysis39. Hence, the 4024 translated proteins were evaluated using CD-HIT, which removed all redundant proteins and provided 2957 non-redundant proteins. Some of these proteins could be human homologs, and drugs targeting them could disrupt human metabolism and be lethal. Selecting non-homologous proteins limits the potential for cross-reactivity and undesirable outcomes37. Human homologous proteins were therefore removed, and 2427 proteins were considered for further analyses. Bacteria depend on essential proteins and cannot survive if these proteins are damaged or altered, so bacteria can be killed by targeting them. Research on bacterial essential genes aids in understanding the nature of life and in identifying novel drug targets for treating pathogenic diseases40,41. Using this method, Shilpa S. et al. discovered 807 essential proteins in Eubacterium nodatum, Sakharkar et al. discovered 306 essential genes in Pseudomonas aeruginosa, and Chan-Eng Chong et al. discovered 312 essential proteins in Burkholderia pseudomallei42–44. Gene essentiality analysis was therefore performed to find such genes in the pathogen’s proteome.

A comparison of human and pathogen metabolic pathways using the KEGG database revealed that 23 pathways, associated with 46 essential proteins, are unique to the pathogen. These pathways include: Amoebiasis, Biosynthesis of various plant secondary metabolites, beta-Lactam resistance, One carbon pool by folate, Biosynthesis of ansamycins, Flagellar assembly, Biofilm formation - Escherichia coli, Naphthalene degradation, Methane metabolism, Monobactam biosynthesis, Chloroalkane and chloroalkene degradation, Carbon fixation in photosynthetic organisms, Two-component system, Bacterial secretion system, Lipopolysaccharide biosynthesis, Cell cycle – Caulobacter, Peptidoglycan biosynthesis, Vancomycin resistance, Quorum sensing, Biofilm formation - Vibrio cholerae, Human papillomavirus infection, Lysine biosynthesis, Viral carcinogenesis. The findings of the pathogen-specific pathway identification are consistent with those reported for L. interrogans, A. baumannii, and S. saprophyticus45–47. Protein subcellular localization is critical for drug discovery, protein function prediction, and genome annotation48. Of the 46 proteins, 32 were identified as cytoplasmic. The difficulty of purifying and analyzing membrane-bound proteins makes cytoplasmic proteins more desirable as therapeutic targets49. These proteins were then evaluated for virulence and druggability. Only two virulent and druggable proteins were found and taken forward for validation through molecular docking and dynamics simulations.

Two druggable proteins, WP_004097788.1 and WP_219541799, were identified as potential drug targets. By targeting WP_004097788.1 and WP_219541799, this study provides a roadmap for developing highly selective inhibitors to combat multidrug-resistant K. michiganensis. These proteins, vital for bacterial replication and purine biosynthesis, represent a promising avenue for therapeutic intervention, with the potential to reduce pathogen viability and limit the spread of infection in clinical settings. WP_004097788.1 is a replicative DNA helicase, an enzyme vital for bacterial DNA replication; without it, bacteria cannot replicate their DNA and cannot survive50. WP_219571799.1 has IMP cyclohydrolase activity in the purine biosynthetic pathway and is also involved in purine recycling, which makes it vital for bacterial growth. It has been described as a target for drugs such as methotrexate and allopurinol51. To determine whether homologous proteins exist in other bacteria, we conducted a preliminary sequence similarity search in public databases, as described in the “Prediction of essential genes” section. We identified homologous proteins in related bacterial species, suggesting that these proteins have conserved functions across bacterial strains. This conservation implies a fundamental role for these proteins in bacterial physiology.

This study not only underscores the critical roles of WP_004097788.1 and WP_219541799 in bacterial survival but also explores their potential as targets for inhibition, paving the way for identifying or designing compounds that could disrupt these essential functions and neutralize K. michiganensis. Inhibitors have been reported for the helicase family, such as acridines, fluoroquinolones52, and aminoglycosides53. To the best of our knowledge, no inhibitors specific to WP_004097788.1 have been reported in the literature to date, whereas methotrexate and allopurinol can be used as inhibitors of WP_219571799.1. In summary, the druggable proteins WP_004097788.1 and WP_219571799.1 are significant in bacteria because of their potential roles in essential functions.

In this study, we conducted virtual screening of natural products against these therapeutic target proteins. Molecular docking is a method for predicting the orientation of small molecules relative to their protein targets. These computational methods provide information on the compounds’ binding affinity and binding activity against their target proteins54. During docking, DrugBank compounds for the targets were employed as controls. The Glide Gscore was used to evaluate the docking results, and the top 10 compounds docked against each protein receptor were selected. The binding poses of the compounds with the highest binding affinity for the target proteins were examined to determine their molecular interactions. LTS0037797 had the highest binding affinity for WP_004097788.1, while LTS0037810 had the highest binding affinity for WP_219541799. The computational findings of this study also align with global efforts to combat antimicrobial resistance (AMR). Identifying natural product inhibitors such as LTS0037797 and LTS0037810 not only adds to the arsenal of potential drugs but also emphasizes the importance of leveraging natural compound libraries in addressing AMR challenges.
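
The selection of the top-scoring compounds per receptor can be sketched in Python as follows. This is an illustration only: it assumes the docking scores have been exported as (receptor, ligand, score) records, and it relies on the convention that a more negative Glide Gscore indicates a better predicted binding. The function name, receptor and ligand labels, and score values are hypothetical and do not reproduce the results of this study.

```python
from collections import defaultdict


def top_hits(scores, n=10):
    """Return the n best-scoring ligands for each receptor.

    scores : iterable of (receptor, ligand, gscore) records
    n      : number of ligands to keep per receptor
    A more negative score is treated as better, so sorting is ascending.
    """
    by_receptor = defaultdict(list)
    for receptor, ligand, gscore in scores:
        by_receptor[receptor].append((gscore, ligand))
    return {
        receptor: [ligand for _, ligand in sorted(entries)[:n]]
        for receptor, entries in by_receptor.items()
    }


# Toy records with made-up scores (kcal/mol).
records = [
    ("receptor_1", "lig_A", -6.2),
    ("receptor_1", "lig_B", -9.1),
    ("receptor_1", "lig_C", -7.5),
    ("receptor_2", "lig_A", -8.0),
    ("receptor_2", "lig_C", -5.4),
]
best = top_hits(records, n=2)
print(best)
```

Control compounds can be passed through the same function together with the screened library, so that their rank relative to the candidates is visible in one list.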

Our study used molecular dynamics simulations and MM-GBSA analysis to investigate the dynamics, stability, and binding affinity of the protein-ligand complexes. Focusing on two inhibitors per protein with notable binding affinity, these methods refined our docking predictions and assisted in ranking ligand candidates for future optimization. The analyses indicated that these compounds remain stably bound within the protein binding pocket, supporting their potential as inhibitors. Consequently, the drug targets identified hold promise for therapeutic applications, particularly in developing new drug formulations to combat Klebsiella michiganensis infections. Future in vitro and in vivo studies are needed to validate the efficacy and safety of these inhibitors. Such investigations would strengthen the case for WP_004097788.1 and WP_219541799 as therapeutic targets and accelerate the translation of these computational findings into clinically viable treatments for K. michiganensis infections.
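
For readers unfamiliar with MM-GBSA, the binding free energy is commonly estimated per snapshot as the energy of the complex minus the energies of the free receptor and the free ligand, and then averaged over the trajectory. The Python sketch below shows only this arithmetic. The function name and the energy values are hypothetical, and the sketch does not reproduce the Schrödinger workflow or the energies obtained in this study.

```python
from statistics import mean, stdev


def mmgbsa_binding_energy(frames):
    """Average MM-GBSA binding energy over trajectory snapshots.

    frames : list of (G_complex, G_receptor, G_ligand) tuples in kcal/mol
    For each snapshot: dG_bind = G_complex - G_receptor - G_ligand.
    Returns (mean dG_bind, sample standard deviation).
    """
    dg = [g_complex - g_receptor - g_ligand
          for g_complex, g_receptor, g_ligand in frames]
    spread = stdev(dg) if len(dg) > 1 else 0.0
    return mean(dg), spread


# Toy snapshots with made-up energies (kcal/mol).
snapshots = [
    (-1050.0, -1000.0, -10.0),
    (-1060.0, -1005.0, -11.0),
    (-1053.0, -1002.0, -9.0),
]
dg_mean, dg_sd = mmgbsa_binding_energy(snapshots)
print(dg_mean, dg_sd)
```

A more negative average indicates more favorable predicted binding, and the spread across snapshots gives a rough indication of how stable the estimate is along the trajectory.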

Conclusion

Recognizing the importance of developing effective pharmacological targets, this study aimed to conduct a comprehensive computational analysis of the human pathogen Klebsiella michiganensis to identify viable drug targets using various computational tools and methodologies. In the initial phase, two proteins were selected as potential drug targets based on their unique metabolic pathways and druggability profiles. The second phase involved a detailed structural analysis of these targets, which facilitated subsequent molecular docking and simulation studies. This research represents a significant advancement in the quest for new and effective treatments against K. michiganensis infections, highlighting the potential for these identified targets to contribute to the development of innovative therapeutic strategies.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1 (175.3KB, docx)

Acknowledgements

The authors would like to express their appreciation to the Ongoing Research Funding program (ORF-2025-885) at King Saud University, Riyadh, Saudi Arabia, for supporting this research.

Author contributions

Conceptualization, A.A; Methodology, A.Z; Supervision, K.A; Writing – original draft, M.A and A.Z; All authors reviewed the manuscript.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1. Yang, J. et al. Klebsiella oxytoca complex: update on taxonomy, antimicrobial resistance, and virulence. Clin. Microbiol. Rev. 35 (1), e00006–21 (2022).
  • 2. Saha, R. et al. Klebsiella michiganensis sp. nov., a new bacterium isolated from a tooth brush holder. Curr. Microbiol. 66, 72–78 (2013).
  • 3. Mediavilla, J. R. et al. Colistin- and carbapenem-resistant Escherichia coli harboring mcr-1 and blaNDM-5, causing a complicated urinary tract infection in a patient from the United States. mBio 7 (4), 01191–01116. 10.1128/mbio (2016).
  • 4. Chapman, P. et al. Genomic investigation reveals contaminated detergent as the source of an extended-spectrum-β-lactamase-producing Klebsiella michiganensis outbreak in a neonatal unit. J. Clin. Microbiol. 58 (5), 01980–01919. 10.1128/jcm (2020).
  • 5. Sands, K. et al. Characterization of antimicrobial-resistant Gram-negative bacteria that cause neonatal sepsis in seven low- and middle-income countries. Nat. Microbiol. 6 (4), 512–523 (2021).
  • 6. Vignaroli, C. et al. Multidrug-resistant and epidemic clones of Escherichia coli from natural beds of Venus clam. Food Microbiol. 59, 1–6 (2016).
  • 7. Citterio, B. et al. Plasmid replicon typing of antibiotic-resistant Escherichia coli from clams and marine sediments. Front. Microbiol. 11, 1101 (2020).
  • 8. Xu, L., Sun, X. & Ma, X. Systematic review and meta-analysis of mortality of patients infected with carbapenem-resistant Klebsiella pneumoniae. Ann. Clin. Microbiol. Antimicrob. 16, 1–12 (2017).
  • 9. Li, S. et al. A blaSIM-1 and mcr-9.2 harboring Klebsiella michiganensis strain reported and genomic characteristics of Klebsiella michiganensis. Front. Cell. Infect. Microbiol. 12, 973901 (2022).
  • 10. Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 49 (D1), D10 (2021).
  • 11. Delcher, A. L., Bratke, K. A., Powers, E. C. & Salzberg, S. L. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23 (6), 673–679 (2007).
  • 12. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 16, 276–277 (2000).
  • 13. Huang, Y. et al. CD-HIT suite: a web server for clustering and comparing biological sequences. Bioinformatics 26 (5), 680–682 (2010).
  • 14. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44 (D1), D733–D745 (2016).
  • 15. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12 (1), 59–60 (2015).
  • 16. Hauser, M. Taxonomic and functional marker genes for viruses. Curr. Opin. Microbiol. 31, 82–89 (2016).
  • 17. Wen, Q. F., Wei, W. & Guo, F. B. Geptop 2.0: accurately select essential genes from the list of protein-coding genes in prokaryotic genomes. Essential Genes and Genomes: Methods and Protocols 2022, 423–430 (2022).
  • 18. Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28 (1), 27–30 (2000).
  • 19. Moriya, Y. et al. KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res. 35 (suppl_2), W182–W185 (2007).
  • 20. Yu, N. Y. et al. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26 (13), 1608–1615 (2010).
  • 21. Maurya, S. et al. Subtractive proteomics for identification of drug targets in bacterial pathogens: a review. Int. J. Eng. Res. Technol. 9 (2020).
  • 22. Hema, K. et al. 202 subunit vaccine design against pathogens causing atherosclerosis. J. Biomol. Struct. Dyn. 33 (sup1), 135–136 (2015).
  • 23. Chen, L. et al. VFDB: a reference database for bacterial virulence factors. Nucleic Acids Res. 33 (suppl_1), D325–D328 (2005).
  • 24. Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46 (D1), D1074–D1082 (2018).
  • 25. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 1–9 (2009).
  • 26. LigPrep, L. P. Schrödinger (LLC, 2018).
  • 27. Schrödinger, L. J. S. S. Schrödinger (LLC, 2017).
  • 28. Webb, B. & Sali, A. Protein Structure Modeling with MODELLER (Springer, 2021).
  • 29. Kim, M. O. et al. Effects of histidine protonation and rotameric states on virtual screening of M. tuberculosis RmlC. J. Comput. Aided Mol. Des. 27 (3), 235–246 (2013).
  • 30. Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47 (7), 1739–1749 (2004).
  • 31. Bowers, K. J. et al. Scalable algorithms for molecular dynamics simulations on commodity clusters. In Proceedings of the ACM/IEEE Conference on Supercomputing (2006).
  • 32. Price, D. J. & Brooks, C. L. III. A modified TIP3P water potential for simulation with Ewald summation. J. Chem. Phys. 121 (20), 10096–10103 (2004).
  • 33. Sargsyan, K. et al. How molecular size impacts RMSD applications in molecular dynamics simulations. J. Chem. Theory Comput. 13 (4), 1518–1524 (2017).
  • 34. Dhankhar, P. et al. Computational guided identification of novel potent inhibitors of N-terminal domain of nucleocapsid protein of severe acute respiratory syndrome coronavirus 2. J. Biomol. Struct. Dyn. 40 (9), 4084–4099 (2022).
  • 35. Martínez, L. Automatic identification of mobile and rigid substructures in molecular dynamics simulations and fractional structural fluctuation analysis. PLoS One 10 (3), e0119264 (2015).
  • 36. Hosen, M. I. et al. Application of a subtractive genomics approach for in silico identification and characterization of novel drug targets in Mycobacterium tuberculosis F11. Interdiscipl. Sci. Comput. Life Sci. 6 (1), 48–56 (2014).
  • 37. Barh, D. et al. In silico subtractive genomics for target identification in human bacterial pathogens. Drug Dev. Res. 72 (2), 162–177 (2011).
  • 38. Brent, M. R. Genome annotation past, present, and future: how to define an ORF at each locus. Genome Res. 15 (12), 1777–1786 (2005).
  • 39. Sikic, K. & Carugo, O. Protein sequence redundancy reduction: comparison of various methods. Bioinformation 5 (6), 234 (2010).
  • 40. Chen, W. H. et al. OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines. Nucleic Acids Res. 2016, gkw1013 (2016).
  • 41. Dickerson, J. E. et al. Defining the role of essential genes in human disease. PLoS One 6, 11 (2011).
  • 42. Sakharkar, K. R., Sakharkar, M. K. & Chow, V. T. A novel genomics approach for the identification of drug targets in pathogens, with special reference to Pseudomonas aeruginosa. In Silico Biol. 4 (3), 355–360 (2004).
  • 43. Shiragannavar, S. S. et al. Subtractive genomics approach in identifying polysacharide biosynthesis protein as novel drug target against Eubacterium nodatum. Asian J. Pharm. Pharmacol. 5 (2), 382–392 (2019).
  • 44. Chong, C. E. et al. In silico analysis of Burkholderia pseudomallei genome sequence for potential drug targets. In Silico Biol. 6 (4), 341–346 (2006).
  • 45. Goyal, M., Citu, C. & Singh, N. In silico identification of novel drug targets in Acinetobacter baumannii by subtractive genomic approach. Asian J. Pharm. Clin. Res. 11, 230 (2018).
  • 46. Amineni, U., Pradhan, D. & Marisetty, H. In silico identification of common putative drug targets in Leptospira interrogans. J. Chem. Biol. 3 (4), 165–173 (2010).
  • 47. Shahid, F. et al. In silico subtractive proteomics approach for identification of potential drug targets in Staphylococcus saprophyticus. Int. J. Environ. Res. Public Health 17 (10), 3644 (2020).
  • 48. Su, E. C. Y. et al. Protein subcellular localization prediction based on compartment-specific features and structure conservation. BMC Bioinform. 8 (1), 330 (2007).
  • 49. Mondal, S. I. et al. Identification of potential drug targets by subtractive genome analysis of Escherichia coli O157:H7: an in silico approach. Adv. Appl. Bioinf. Chem. 8, 49 (2015).
  • 50. Willey, J. M., Sherwood, L. M. & Woolverton, C. J. Prescott’s Microbiology (McGraw-Hill, 2014).
  • 51. Raimondi, M. V. et al. DHFR inhibitors: reading the past for discovering novel anticancer agents. Molecules 24 (6), 1140 (2019).
  • 52. Simon, N. et al. Ciprofloxacin is an inhibitor of the Mcm2-7 replicative helicase. Biosci. Rep. 33 (5), e00072 (2013).
  • 53. Matsunaga, K. et al. Inhibition of DNA replication initiation by aminoglycoside antibiotics. Antimicrob. Agents Chemother. 30 (3), 468–474 (1986).
  • 54. Qamar, M. et al. In-silico identification and evaluation of plant flavonoids as dengue NS2B/NS3 protease inhibitors using molecular docking and simulation approach. Pak. J. Pharm. Sci. 30 (6), 2119–2137 (2017).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
