Genome-Based Drug Target Identification in Human Pathogen Streptococcus gallolyticus

Nosheen Afzal Qureshi; Syeda Marriam Bakhtiar; Muhammad Faheem; Mohibullah Shah; Ahmed Bari; Hafiz M Mahmood; Muhammad Sohaib; Ramzi A Mothana; Riaz Ullah; Syed Babar Jamal

doi:10.3389/fgene.2021.564056

Front Genet. 2021 Mar 25;12:564056. doi: 10.3389/fgene.2021.564056


PMCID: PMC8027347 PMID: 33841489

Abstract

Streptococcus gallolyticus (Sg) is an opportunistic Gram-positive, non-motile bacterium that causes infective endocarditis, an inflammation of the inner lining of the heart. Because Sg has acquired resistance to the available antibiotics, there is a dire need to find new therapeutic targets and potent drugs to prevent and treat this disease. In the current study, an in silico approach is used to link the genomic data of Sg with its proteome to identify putative therapeutic targets. A total of 1,138 core proteins were identified using a pan-genomic approach. Subtractive proteomic analysis then identified a set of 18 proteins that are essential for the bacterium and non-homologous to the host (human). Out of these 18 proteins, 12 cytoplasmic proteins were selected as potential drug targets. These selected proteins were subjected to molecular docking against drug-like compounds retrieved from the ZINC database, and the top docked compounds with the lowest binding energies were identified. In this work, we have identified novel drug and vaccine targets against Sg, some of which have already been reported and validated in other species. Owing to this experimental validation, we believe our methodology and results are a significant contribution to drug/vaccine target identification against Sg-caused infective endocarditis.

Keywords: Streptococcus gallolyticus, infective endocarditis, pan-genome, subtractive proteomics, drug prioritization

Introduction

Streptococcus gallolyticus (Sg) is a Gram-positive, non-motile bacterium previously referred to as Streptococcus bovis. It is a phenotypically diverse bacterium belonging to the Lancefield Group D streptococci (Pasquereau-Kotula et al., 2018; Arjun et al., 2020). This bacterium grows in chains or pairs and is non-γ-hemolytic or slightly γ-hemolytic but sometimes shows α-hemolytic activity on ovine blood agar plates (Rusniok et al., 2010; Hensler, 2011). Although it is a common member of the microflora, present in the gastrointestinal tract of approximately 2.5–15% of healthy individuals (Hinse et al., 2011), it can become an opportunistic pathogen causing various diseases, including infective endocarditis, colon cancer, meningitis, and septicemia.

The opportunistic pathogenesis of Sg depends on genes involved in polysaccharide production, including glucan mucopolysaccharide (a putative component of the biofilm produced by this species), and on three types of pili and a collagen-binding protein (Takamura et al., 2014). These genes provide protection from the host immune system and help in adherence to the epithelial lining of the heart (Rusniok et al., 2010), causing infection and resulting in endocarditis (Millar and Moore, 2004).

Over the last two decades, a significant rise in the incidence of infective endocarditis has been observed worldwide (Tripodi et al., 2005; Marmolin et al., 2016; Shahid et al., 2018; Arregle et al., 2019; Chamat-Hedemand et al., 2020). Between 2.6 and 7 cases of endocarditis per 100,000 population are reported each year, a significant proportion of which is attributed to streptococcal infections, with an incidence of 17% in North America, 31% in other European countries, 39% in South America, and 32% in the rest of the world (Holland et al., 2016). The disease mostly occurs in elderly patients (Firstenberg, 2016), and the median age of patients is ≥58 years (Vilcant and Hai, 2018). The risk of developing Sg endocarditis rises with the consumption of uncooked meat or fresh dairy products, a weakened immune system, a history of hepatic disease, and comorbidities such as diabetes mellitus and rheumatic disorders (Cãruntu et al., 2014).

In the presence of a primary infection, metabolic disorder, or immunocompromised state, Sg can cause endocardial injury. This injury triggers thrombus formation through the deposition of fibrin and platelets. After thrombus formation, the bacterium enters the bloodstream through the thrombus. Because of its virulence properties, Sg can enter the bloodstream in a paracellular manner without inducing a major immune response and adhere to the damaged collagen-rich surface of the cardiac valve (endocardium). Once attached to the endocardium, the bacterium proliferates and forms a biofilm, which inflames the lining of the heart and causes endocarditis (McDonald, 2009; Hensler, 2011).

Antibacterial drugs such as penicillin G combined with gentamicin or streptomycin are the preferred medical treatment for infective endocarditis. Other options include ceftriaxone combined with gentamicin, and vancomycin in patients allergic to penicillin (Satué-Bartolomé and Alonso-Sanz, 2009). For patients with persistent fever and resistance to medical therapy, an expensive surgical intervention may be needed (Grubitzsch et al., 2016). Sg is resistant to penicillin, and one strain of Sg has also been found to be resistant to tetracycline (Hinse et al., 2011). Therefore, an efficient treatment strategy against endocarditis, built on novel therapeutic targets and potent drugs, is urgently required.

For rapid target identification, many computational methods have been established, such as core genome and subtractive genomic approaches, which allow the identification of core essential genes that possess no homology with the human genome (Caputo et al., 2019). These approaches have been used in a number of human pathogens such as Corynebacterium diphtheriae (Jamal et al., 2017), Corynebacterium pseudotuberculosis (Tiwari et al., 2014), and Treponema pallidum (Jaiswal et al., 2017). This study was designed to exploit in silico approaches to link the genomic data of Sg with its proteome and to identify putative therapeutic targets. Such targets can be used to classify potent inhibitors that may contribute to the discovery of compounds that inhibit pathogenic development (Jamal et al., 2017). The proteomes from the seven genomes of Sg were compared using a pan-genome approach, and only those genes present in all strains of Sg were selected (Hinse et al., 2011). The predicted core genome was then filtered for non-homology to the host (human) and for essentiality to the bacterium, which left 18 proteins. Out of these 18 proteins, 12 cytoplasmic proteins were identified as drug targets. These essential and non-host homologous protein targets were subjected to virtual screening using a library of 11,993 compounds. The identified putative targets might be used to design peptide vaccines and to suggest novel lead druggable compounds that could bind to the proposed target proteins (Barh et al., 2011; Jamal et al., 2017; Uddin et al., 2019).

Materials and Methods

Genome Selection

In the current study, all strains of Sg with a complete genome available were considered for the pan-genome analysis. A total of seven strains of Sg were selected; gene and protein sequences were retrieved from NCBI¹.

Identification of Core Genomes

The core genome of Sg was identified from the pan-genome analysis using the EDGAR software (Blom et al., 2016). Only genes common to all strains of Sg were selected. The selection procedure in EDGAR was as follows: one strain was chosen as the reference, all remaining strains were compared against it, and the genes shared by every strain were retained as the core genome. The comparison relies on the protein Basic Local Alignment Search Tool (BLASTp) with the standard scoring matrix BLOSUM62 and an E-value cutoff of 1 × 10^–5 (Blom et al., 2016).
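The "present in every strain" criterion amounts to a set intersection, which can be sketched in a few lines of Python. This is only an illustration: the gene contents assigned to each strain are hypothetical, and the orthology calls that EDGAR derives from its BLASTp comparisons are assumed to be already available as shared identifiers.

```python
from functools import reduce


def core_genome(strain_genes):
    """Return the gene families present in every strain (set intersection)."""
    gene_sets = [set(genes) for genes in strain_genes.values()]
    if not gene_sets:
        return set()
    return reduce(set.intersection, gene_sets)


# Hypothetical ortholog assignments for three strains (illustrative only).
strains = {
    "UCN34": {"dnaA", "murF", "rpmB", "geneX"},
    "ATCC 43143": {"dnaA", "murF", "rpmB"},
    "DSM 16831": {"dnaA", "murF", "rpmB", "geneY"},
}

core = core_genome(strains)
print(sorted(core))
```

Genes found in only some strains (here the placeholder names geneX and geneY) fall outside the intersection and would belong to the accessory part of the pan-genome.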

Identification of Non-host Homologous Proteins

The identified core genome of Sg was then subjected to BLASTp against the human proteome to find the proteins that are non-homologous to the human host, with the following parameters: e-value = 0.0001, bit score ≥ 100, scoring matrix BLOSUM62, and identity ≥ 25%. Only those proteins that showed no hit against the human proteome database were selected (Jamal et al., 2017).
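A minimal sketch of this subtraction step is given here, assuming the BLASTp results are available in the standard 12-column tabular layout (query, subject, percent identity, ..., e-value, bit score). How the three thresholds were combined is not stated in the text; the sketch assumes a hit counts as a human homolog only when it satisfies all three at once, and the protein names in the example are made up.

```python
import csv
import io

EVALUE_MAX = 1e-4     # e-value threshold quoted in the text
BITSCORE_MIN = 100.0  # bit score threshold quoted in the text
IDENTITY_MIN = 25.0   # percent identity threshold quoted in the text


def host_homologs(blast_tsv):
    """Collect query IDs with at least one hit meeting all three thresholds."""
    homologs = set()
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row:
            continue
        query_id, pident = row[0], float(row[2])
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue <= EVALUE_MAX and bitscore >= BITSCORE_MIN and pident >= IDENTITY_MIN:
            homologs.add(query_id)
    return homologs


def non_host_homologous(all_queries, blast_tsv):
    """Keep the queries that have no qualifying hit against the host proteome."""
    hits = host_homologs(blast_tsv)
    return [q for q in all_queries if q not in hits]


# Made-up tabular output: protA has a strong human hit, protB only a weak one,
# and protC produced no hit at all.
example = io.StringIO(
    "protA\thumanX\t62.0\t300\t100\t4\t1\t300\t5\t304\t1e-80\t410\n"
    "protB\thumanY\t22.0\t80\t60\t2\t1\t80\t10\t90\t0.5\t28\n"
)
result = non_host_homologous(["protA", "protB", "protC"], example)
print(result)
```

Passing the full list of query IDs separately matters because proteins with no hit at all never appear in the BLAST output, yet they are exactly the ones this step is meant to keep.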

Identification of Essential Genes

The non-host homologous proteins were subjected to BLASTp against the Database of Essential Genes (DEG) with the standard scoring matrix BLOSUM62, e-value = 0.001, and identity ≥ 25% to find essential proteins that are indispensable for the survival of the pathogen. The DEG consists of experimentally validated data from eukaryotes, archaea, and prokaryotes, and it covers essential genes for 31 bacteria, containing more than 12,000 bacterial essential genes (Luo et al., 2014).
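The essentiality filter can be sketched as a best-hit selection per query protein. The e-value and identity cutoffs are the ones quoted above; ranking competing hits by percent identity is an assumption made for illustration, and the query and DEG identifiers in the example are placeholders rather than real records.

```python
def best_essential_hits(hits, evalue_max=1e-3, identity_min=25.0):
    """Map each query to its highest-identity DEG hit that passes both cutoffs.

    hits: iterable of (query_id, deg_id, percent_identity, evalue) tuples.
    """
    best = {}
    for query_id, deg_id, pident, evalue in hits:
        if evalue > evalue_max or pident < identity_min:
            continue  # hit does not support essentiality under these cutoffs
        current = best.get(query_id)
        if current is None or pident > current[1]:
            best[query_id] = (deg_id, pident)
    return best


# Placeholder hits: q1 has two passing hits, q2 fails the identity cutoff,
# q3 fails the e-value cutoff.
hits = [
    ("q1", "DEG_A", 54.7, 1e-40),
    ("q1", "DEG_B", 38.2, 1e-12),
    ("q2", "DEG_C", 22.0, 1e-6),
    ("q3", "DEG_D", 41.8, 0.5),
]

essential = best_essential_hits(hits)
print(essential)
```

Queries absent from the returned mapping would be treated as lacking evidence of essentiality and dropped from the candidate list.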

Drug Target Prioritization

Several factors are used to prioritize potential therapeutic targets, including molecular weight, molecular function, cellular localization, pathway analysis, and virulence (Agüero et al., 2008). Molecular weight (MW) was determined with the ProtParam tool². Targets whose MW is <100 kDa are considered the best therapeutic targets (Mondal et al., 2015). Molecular functions and biological processes of the target proteins were determined with UniProt³. Subcellular localization of the pathogen proteins was predicted with CELLO⁴. The cellular localization of a protein determines the environment in which it operates and affects its function by controlling the accessibility and availability of all types of molecular interaction partners. Knowledge of protein localization often plays an important role in characterizing the cellular function of hypothetical and newly discovered proteins (Scott et al., 2005). For pathway analysis, the Kyoto Encyclopedia of Genes and Genomes (KEGG) web tool⁵ was used to determine the role of the protein targets in different cellular and metabolic pathways (Kanehisa and Sato, 2020). To assess the virulence of the protein targets, the Virulence Factor Database (VFDB)⁶ was used.
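Two of these criteria, the molecular weight cutoff and the cytoplasmic localization, are simple enough to express as a filter. The sketch applies them literally to three entries whose molecular weights are taken from the target table in the Results (read as Da); the Candidate class and the prioritize function are illustrative names, not part of any tool used in the study.

```python
from dataclasses import dataclass

MW_LIMIT_KDA = 100.0  # cutoff quoted in the text (Mondal et al., 2015)


@dataclass
class Candidate:
    name: str
    mw_da: float        # molecular weight in daltons
    localization: str   # predicted subcellular localization


def prioritize(candidates, mw_limit_kda=MW_LIMIT_KDA, localization="Cytoplasmic"):
    """Keep candidates lighter than the cutoff and in the wanted compartment."""
    return [
        c for c in candidates
        if c.mw_da / 1000.0 < mw_limit_kda and c.localization == localization
    ]


candidates = [
    Candidate("Chromosomal replication initiator protein DnaA", 51401.48, "Cytoplasmic"),
    Candidate("Penicillin-binding protein 2A", 84763.57, "Cytoplasmic"),
    Candidate("DNA polymerase III subunit alpha", 165491.77, "Cytoplasmic"),
]

kept = [c.name for c in prioritize(candidates)]
print(kept)
```

Applied strictly, the weight cutoff would exclude DNA polymerase III subunit alpha (about 165 kDa), which the study nevertheless lists among its targets, so the published prioritization presumably weighed the criteria together rather than as hard filters.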

Catalytic Pocket Detection

The shortlisted potential druggable proteins were further screened for possible binding pockets by calculating a druggability score with DoGSiteScorer (Volkamer et al., 2012), an automated pocket detection tool that estimates the druggability of protein cavities. Because the tool requires a 3D structure as input, the 3D structures of the protein targets were first predicted with the SwissModel web tool (Nielsen et al., 2010). The druggability evaluation was then performed with DoGSiteScorer, which returns the pocket residues and a druggability score ranging from 0 to 1. A score closer to 1 indicates a highly druggable protein cavity (Jamal et al., 2017).

Retrieval of Ligands

A total of 11,993 drug-like molecules with a Tanimoto cutoff level of 60% were retrieved from the ZINC database (Sterling and Irwin, 2015). Partial charges were then calculated, and the energies of these compounds were minimized using an energy minimization algorithm with default parameters. All minimized structures were saved in an .mdb file. These prepared ligands were used as the input file for molecular docking (Wadood et al., 2014).
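For readers unfamiliar with the Tanimoto measure, it is the ratio of shared to total features between two molecular fingerprints. The short sketch represents each fingerprint as a set of "on" bit positions; the bit positions are arbitrary examples, and the text does not state how the 60% cutoff was applied when the ZINC subset was built.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Arbitrary example fingerprints (positions of set bits).
fp1 = {1, 4, 7, 9, 12}
fp2 = {1, 4, 9, 15}

similarity = tanimoto(fp1, fp2)
print(similarity)
```

The two example fingerprints share three bits out of six distinct ones, so their similarity is 0.5, just under a 60% threshold.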

Validation of 3D Structures

The quality of all 3D structures was further validated using the RAMPAGE and ERRAT tools. RAMPAGE performs Ramachandran plot analysis and provides a validity score for the 3D structure of each target protein; scores ≥80 were considered good (Batut and Gingeras, 2013). For further validation, the online tool ERRAT was used, which identifies poorly modeled regions of a protein structure. A quality factor ≥37% was considered good (Saddala and Adi, 2018).

Preparation of Protein for Docking

The predicted 3D structures were prepared for docking using the Molecular Operating Environment (MOE). This tool not only predicts the top-ranking poses but also reports the root-mean-square deviation (RMSD) along with the calculated energies of the docked molecules (Pagadala et al., 2017). The 3D structures were protonated and energy-minimized (Vilar et al., 2008), and the minimized structures were then used as templates for molecular docking.

Molecular Docking of Drug Targets

The minimized structures of the target proteins and the prepared ligands were subjected to molecular docking in MOE using MOE Dock (Figure 1), which predicts the favorable binding poses of the selected ligands in the active sites of the drug targets. Default parameters were used for docking. After docking, the best poses were analyzed for hydrogen bonding and π–π interactions, and the RMSD was calculated in MOE (Wadood et al., 2014). The orientation of the best docked molecules was further analyzed in Chimera.
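The RMSD between two ligand poses can be illustrated with a few lines of Python. The sketch assumes both poses list the same atoms in the same order and already share a coordinate frame, so no superposition is performed; the coordinates are invented, and MOE's own RMSD routine may differ in how it matches atoms.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Invented coordinates in angstroms: the second pose is the first shifted by 1 Å along z.
pose_ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose_docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

deviation = rmsd(pose_ref, pose_docked)
print(deviation)
```

Because every atom is displaced by exactly 1 Å, the RMSD of this example is 1 Å; identical poses give 0.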

FIGURE 1. Complete workflow of drug target identification in Sg using in silico approaches.

Results and Discussion

Genome Selection

The seven strains of Sg were retrieved from the National Center for Biotechnology Information (NCBI)⁷. Only strains with a complete genome were selected, to ensure the accuracy of the results. The details of the selected strains are summarized in Table 1.

TABLE 1.

Strains of Streptococcus gallolyticus with information on genome statistics and regions of isolation.

Strain	Genome size (Mb)	GC%	Total genes	Total proteins	Region of isolation
DSM 16831	2.4929	37.70	2,498	2,341	Australia
NCTC13773	2.49358	37.70	2,496	2,333	Australia
ATCC 43143	2.36224	37.50	2,357	2,229	–
ATCC BAA-2069	2.37721	37.60	2,377	2,218	Germany
UCN34	2.35091	37.60	2,345	2,215	–
ICDDRB-NRC-S1	2.0525	37.70	2,125	1,759	Bangladesh
NCTC8133	1.86767	37.50	1,845	1,733	–

TABLE 2.

Query ID	Subject ID	% Identity	Protein
GALLO_RS00005	DEG10330356	92.857	Chromosomal replication initiator protein DnaA
GALLO_RS00200	DEG10200056	80.769	Glucan-binding protein C
GALLO_RS00610	DEG10010101	54.688	Membrane protein insertase YidC
GALLO_RS00675	DEG10380051	53.659	Transcriptional regulator CtsR
SGGBAA2069_RS00890	DEG10280041	51.448	PTS fructose transporter subunit IIA
GALLO_RS00830	DEG10470198	50	Penicillin-binding protein 2A
SGGBAA2069_RS01250	DEG10180105	47.283	AraC family transcriptional regulator
GALLO_RS01215	DEG10110082	45.455	DNA polymerase III subunit alpha
GALLO_RS01760	DEG10060346	44	50S ribosomal protein L28
GALLO_RS01960	DEG10470004	41.793	2-isopropylmalate synthase
GALLO_RS02145	DEG10080178	40.355	Ribosome-binding factor A
GALLO_RS02350	DEG10050423	39.623	Amino acid ABC transporter substrate-binding protein, PAAT family/amino acid ABC transporter membrane protein, PAAT family
GALLO_RS02740	DEG10300014	38.71	DNA-binding response regulator
GALLO_RS02995	DEG10430209	38.197	16S rRNA methyltransferase B
GALLO_RS03395	DEG10180247	36.364	Glutamine ABC transporter permease
GALLO_RS03550	DEG10450136	35.789	Penicillin-binding protein 2B
GALLO_RS03570	DEG10460377	35.294	UDP-N-acetylmuramoyl-tripeptide–D-alanyl-D-alanine ligase
GALLO_RS03600	DEG10050249	35.135	1-acyl-sn-glycerol-3-phosphate acyltransferase

TABLE 3.

UniProt ID	Protein	Gene	Biological function^a	Molecular function^b	Subcellular localization^c	Virulent^d	Molecular weight^e (Da)	Pathway analysis^f
A0A139R4E3	Chromosomal replication initiator protein DnaA	dnaA	ATP binding, DNA replication origin binding	DNA replication initiation, regulation of DNA replication	Cytoplasmic	Yes	51,401.48	Two-component system
F5WXJ0	Transcriptional regulator CtsR	ctsR	DNA binding	Regulation of transcription, DNA-templated	Cytoplasmic	Yes	7598.78	Transcriptional regulator of stress and heat shock response
A0A3E2SCT8	PTS fructose transporter subunit IIA	DW662_04200	Phosphoenolpyruvate-dependent sugar phosphotransferase system	–	Cytoplasmic	Yes	14,982.13	No hit
A0A380K3P1	Penicillin-binding protein 2A	pbp2A	–	Penicillin binding, Transferase activity, transferring acyl groups	Cytoplasmic	Yes	84,763.57	Beta-lactam resistance
A0A380K803	AraC family transcriptional regulator	melR	Transcription, transcription regulation	DNA-binding transcription factor activity, sequence-specific DNA binding	Cytoplasmic	Yes	31,811.17	No hit
A0A380K8Y7	DNA polymerase III subunit alpha	dnaE	DNA replication	3′–5′ Exonuclease activity, DNA-directed DNA polymerase activity, nucleic acid binding	Cytoplasmic	Yes	165,491.77	DNA replication, mismatch repair, homologous recombination
A0A060RG19	50S ribosomal protein L28	rpmB	Translation	Structural constituent of ribosome	Cytoplasmic	Yes	6883.21	Ribosome
D3HCJ2	2-Isopropylmalate synthase	leuA	Leucine biosynthetic process	2-Isopropylmalate synthase activity	Cytoplasmic	Yes	33,415.6	Biosynthesis of secondary metabolites, 2-oxocarboxylic acid metabolism, biosynthesis of amino acids, valine, leucine, and isoleucine biosynthesis, pyruvate metabolism, metabolic pathways
F5WZ36	Ribosome-binding factor A	rbfA	Maturation of SSU-rRNA	–	Cytoplasmic	Yes	13,409.48	No hit
A0A139R8A5	DNA-binding response regulator	DW662_02135	Phosphorelay signal transduction system, regulation of transcription, DNA-templated	DNA binding	Cytoplasmic	Yes	23,939.71	No hit
A0A1S5WAD9	16S rRNA methyltransferase B	BTR42_02745	Regulation of transcription, DNA-templated	RNA binding, rRNA methyltransferase activity	Cytoplasmic	Yes	19,761.96	No hit
F5WZQ7	UDP-N-acetylmuramoyl-tripeptide–D-alanyl-D-alanine ligase	murF	Cell cycle, cell division, cell wall organization, peptidoglycan biosynthetic process, regulation of cell shape	ATP binding, UDP-N-acetylmuramoyl-tripeptide-D-alanyl-D-alanine ligase activity	Cytoplasmic	Yes	50,278.43	Vancomycin resistance, peptidoglycan biosynthesis, metabolic pathways, lysine biosynthesis

TABLE 4.

S. No	Protein name	ERRAT quality factor	RAMPAGE score
1	16S rRNA methyltransferase B	90.6699	92.30%
2	PTS fructose transporter subunit IIA	88.0435	90.80%
3	50S ribosomal protein L28	74.0741	87.50%
4	Chromosomal replication initiator protein DnaA	93.6747	92.60%
5	Penicillin-binding protein 2A	93.6823	91.30%
6	DNA polymerase III subunit alpha	89.1	88.90%
7	AraC family transcriptional regulator	100	97.00%
8	DNA-binding response regulator	93.0693	92.00%
9	Transcriptional regulator CtsR	100	100.00%
10	Ribosome-binding factor A	100	96.90%
11	UDP-N-acetylmuramoyl-tripeptide–D-alanyl-D-alanine ligase	94.7248	94.20%
12	2-isopropylmalate synthase	92.766	94.90%

ZINC ID	Number of interactions	Interacting residues	Minimized energy	Dock score
ZINC05835424	4	Ser 238, Asp 327, Lys 263, Ala 328	–12.453	–13.4218
ZINC13650894	4	Lys 339, Lys 285, Lys 263, Asp 327	–14.373	–12.7997
ZINC13520246	3	Lys 263, Gly 262, Ser 238	–14.238	–11.2852
ZINC07001187	3	Arg 338, Asp 235, Lys 263	–18.2	–11.2818
ZINC32714665	4	Ala 328, Lys 263, Asp 327, Gly 262	–32.289	–12.2473
ZINC1404930	3	Tyr 282, Lys 285, Lys 339	–14.545	–11.9735
ZINC01711849	4	Lys 346, Lys 285	–0.952	–14.8757
ZINC01532584	5	Lys 285, Lys 339, Cys 330	–22.145	–11.7779
ZINC05181663	3	Ser 331, Lys 339, Lys 285	–21.977	–13.2929
ZINC44551376	5	Asp 341, Lys 339, Ser 27, Asn 28	–8.625	–12.3284

ZINC ID	Number of interactions	Interacting residues	Minimized energy	Dock score
ZINC05839384	3	Lys 291, Asn 120, Lys 115	–12.715	–12.3924
ZINC07089629	2	Lys 291, Asn 120	–15.34	–16.1508
ZINC13540203	4	Arg 417, Lys 412, Asp 312	–21.772	–14.0893
ZINC71618824	2	Arg 417	–14.766	–11.6138
ZINC71782058	5	Arg 41, Lys 412	–24.383	–11.3505
ZINC72281564	3	Lys 291, Asn 120	–16.347	–17.7983
ZINC01585185	5	Lys 291, Asn 120, Lys 115	–12.005	–14.2479
ZINC01152242	2	Tyr 116, Lys 291	–13.191	–13.441
ZINC00387687	2	Lys 115, Glu 294	–0.384	–13.4207
ZINC01844424	2	Lys 291, Asn 113	–22.083	–12.841

ZINC ID	Number of interactions	Interacting residues	Minimized energy	Dock score
ZINC05839384	3	Asp 124	–22.285	–12.3374
ZINC06962237	1	Thr 111	–20.134	–9.68175
ZINC19510011	2	Arg 113	–9.167	–9.14539
ZINC71603173	1	Glu 114	–79.985	–11.3391
ZINC77504434	1	Thr 111	–18.51	–9.38867
ZINC79090716	3	Thr A111, Thr B111, Glu 114	–37.17	–9.17204
ZINC01672834	1	Thr B111	–19.633	–9.02475
ZINC04352554	1	Thr B111	–9.073	–
ZINC655337127	2	Thr A111, Thr B111	–10.149	–9.23486
ZINC65337127	1	Thr A111	–20.417	–9.20724

ZINC ID	Number of interactions	Interacting residues	Minimized energy	Dock score
ZINC18033182	4	Asp 58, Glu 85	–52.033	–11.38848
ZINC32714665	3	His 83, Glu 85, Asp 58	–65.244	–11.38571
ZINC17004087	3	Glu 85, Tyr 87, Asp 58	–12.11	–11.27542
ZINC72145573	4	Lys 3, Glu 85, Asp 58	–19.982	–10.27034
ZINC71780811	4	Lys 118, Gln 28, Glu 22	–23.974	–9.37229
ZINC01638334	3	Asp 58, Glu 85	–27.139	–11.97846
ZINC01613419	4	Glu 85, Asp 58	–22.839	–11.70157
ZINC04261883	4	Glu 85, His 83	–11.661	–10.3571
ZINC38292458	3	Glu 85, Asp 58	–15.396	–10.78736
ZINC49625635	4	Asp 58, Glu 85	–51.252	–10.71723

ZINC ID	Number of interactions	Interacting residues	Minimized energy	Dock score
ZINC05567030	2	Asp 408	–13.81	–3.86291
ZINC22048956	3	Tyr 456, Glu 421, Gln 424	–15.356	–13.931
ZINC19799513	2	Lys 166, Asp 382	–19.047	–13.643
ZINC17004087	3	Asp 382, Glu 381	–16.385	–13.3505
ZINC18045201	3	Arg 443, Gln 424	–16.838	–13.1728
ZINC20502353	3	Tyr 456, Gln 424	–1.255	–13.1531
ZINC20070370	2	Gly 425, Ser 424	–6.277	–12.7398
ZINC32628102	2	Arg 443, Gly 425	–13.827	–12.6254
ZINC16942644	4	Gln 424, Gly 425, Ala 423	–3.839	–12.581
