Core Proteomic Analysis of Unique Metabolic Pathways of Salmonella enterica for the Identification of Potential Drug Targets

Reaz Uddin; Muhammad Sufian

PLoS One. 2016 Jan 22;11(1):e0146796. doi: 10.1371/journal.pone.0146796

Editor: Dipshikha Chakravortty

PMCID: PMC4723313 PMID: 26799565

Abstract

Background

Infections caused by Salmonella enterica, a Gram-negative facultative anaerobic bacterium belonging to the family Enterobacteriaceae, are major threats to the health of humans and animals. The recent availability of complete genome data for pathogenic strains of S. enterica opens new avenues for the identification of drug targets and drug candidates. We used genomic and metabolic pathway data to identify pathways and proteins that are essential to the pathogen and absent from the host.

Methods

We took the whole proteome sequence data of 42 strains of S. enterica and of Homo sapiens, along with KEGG-annotated metabolic pathway data. We clustered the protein sequences using CD-HIT, identified essential genes using the DEG database, discarded S. enterica homologs of human proteins in unique metabolic pathways (UMPs), and characterized hypothetical proteins with SVM-Prot and InterProScan. Through this core proteomic analysis we identified enzymes essential to the pathogen.

Results

The main strength of the current study is the identification of 73 enzymes common to 42 strains of S. enterica. We propose all 73 unexplored enzymes as potential drug targets against infections caused by S. enterica. The study considers all pathogenic strains of S. enterica with complete sequence data simultaneously; to the best of our knowledge, it is the first subtractive core proteomic analysis of unique metabolic pathways applied to any pathogen for the identification of drug targets. We applied extensive computational methods to shortlist a few potential drug targets according to druggability criteria, i.e. non-homologous to the human host, essential to the pathogen, and playing a significant role in essential metabolic pathways of the pathogen. Subtractive proteomics was applied through a novel approach: only proteins of the unique metabolic pathways of the pathogen were considered, and the proteomic data of all completely sequenced strains of the pathogen were mined, thus improving the quality and applicability of the results. We believe that sharing the knowledge from this study may eventually lead to novel therapeutic regimens against infections caused by S. enterica.

Introduction

Salmonella enterica is a Gram-negative, facultative anaerobic, intracellular bacterium. According to the Kauffmann-White classification scheme [1], more than 2500 serological variants (serovars) are categorized into six subspecies [2, 3]. Most serovars have a broad host range, while some have adapted to specific hosts; the mechanism of adaptation is currently unclear [4]. Typically, S. enterica serovars infect the host through the oral route, leading to one of three major clinical outcomes (enterocolitis, bacteremia or enteric fever) or to asymptomatic chronic carriage [5]. Human pathogens include the serovars Typhi, Paratyphi, Typhimurium, Sendai, Choleraesuis, Dublin and many others [3].

Pathogenesis of Salmonella enterica begins with its entry into the host organism. Salmonella is usually acquired from the environment by contact with a carrier host or by oral intake of contaminated food or water. After ingestion, Salmonella survives the low pH of the stomach and enters the intestine, where it uses a type III secretion system to deliver effector proteins essential for intestinal invasion [6]. From this point, bacterial progression within the host differs between non-typhoidal and typhoidal Salmonella. Non-typhoidal Salmonella serovars induce a localized inflammation which, in immunocompetent persons, results in enterocolitis with the infiltration of polymorphonuclear leukocytes (PMNs) into the sub-mucosal epithelium [7]. In typhoidal Salmonella, intestinal inflammation is moderate and consists largely of macrophage infiltration [8]; the bacteria reach the blood either directly or via the mesenteric lymph nodes, or are transported within leukocytes, causing bacteremia [9]. Both types of Salmonella grow and persist in systemic tissues, where they adapt to the intracellular environment. The pathogen can escape from host cells using secretion systems [10].

A genome is the set of genes in a single functional organism, whereas the pangenome of a prokaryote is the non-redundant set of genes across its strains. It comprises a core genome of genes present in all strains, dispensable genes that are absent from one or more strains (but not all), and genes that are unique to a single strain [11]. Recently, microbial pangenomics has attracted the attention of the scientific community, driven by the availability of whole-genome sequence data for the strains of particular species [12–15]. In parallel, research on pan-proteomics was initiated to study similarities and differences at the protein level among the strains of a species [16–18]. As of October 13, 2015, only 45 target genes were reported in the DrugBank database for S. enterica, which corresponds to only 1.6% of its core genome of 2,800 genes [19]. Since the pathogen has developed resistance against conventional drugs, there is a pressing need to find new therapeutic drug targets.

In the present study, we took the whole proteome sequence data of 42 strains from 19 serovars of S. enterica and the KEGG-annotated metabolic pathway data of Homo sapiens, identified and discarded S. enterica homologs of human proteins in unique metabolic pathways (UMPs), and identified enzymes essential to the pathogen using the DEG database. We compared our results with a previous study [20] that searched for new antimicrobial targets by focusing on metabolic enzymes of a single serovar and comparing the results with other serovars at the genome level. In a more recent report, pangenomic analyses of 22 complete and 23 draft genome sequences were performed [19]. However, to the best of our knowledge, the current study is the first subtractive core proteomic analysis of unique metabolic pathways applied to any pathogen for the identification of drug targets, primarily essential enzymes.

Methodology

A schematic representation of the methodology is given in Fig 1. The 88 biological datasets used in our analyses were downloaded from online sources; details are given in S1 Table.

1. Identification of UMPs of S. enterica

KEGG Brite hierarchy files of H. sapiens and of the 42 strains of S. enterica, containing information about the genes of the respective metabolic pathways, were downloaded from the KEGG database [21]. The metabolic pathways unique to the serovars (i.e. missing in the human host) were identified using KEGG Orthology (KO) IDs, and the corresponding genes were extracted. UMPs that were absent in some strains were listed using in-house AWK scripts.
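The pathway comparison described above amounts to a set difference on pathway identifiers. The following is a minimal Python sketch of that idea, not the in-house AWK scripts used in the study. It assumes that the pathways of each organism have already been reduced to sets of KEGG map numbers; the function names and the toy input are illustrative only.

```python
def unique_pathways(pathogen_maps, host_maps):
    """Return pathway map numbers present in the pathogen but absent from the host."""
    return sorted(set(pathogen_maps) - set(host_maps))


def absent_in_some(strain_to_umps):
    """Map each UMP to the strains that lack it (UMPs found in every strain are omitted)."""
    all_umps = set().union(*strain_to_umps.values())
    missing = {}
    for ump in sorted(all_umps):
        lacking = sorted(s for s, umps in strain_to_umps.items() if ump not in umps)
        if lacking:
            missing[ump] = lacking
    return missing


# Toy input (assumption): a tiny subset of map numbers, not the real pathway lists.
host = {"00010", "00020", "00190"}
strains = {
    "stm": {"00010", "00020", "00550", "02030"},
    "sty": {"00010", "00550", "02030", "00791"},
}

umps = {code: unique_pathways(maps, host) for code, maps in strains.items()}
print(umps["stm"])           # ['00550', '02030']
print(absent_in_some(umps))  # {'00791': ['stm']}
```

Keeping the per-strain UMP lists separate makes it straightforward to report both the pathways unique to the pathogen and those missing from individual strains.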

2. Clustering common proteins of UMPs of 42 strains

The KEGG IDs of all genes from the UMPs were converted to the corresponding NCBI GIs using the KEGG API service [21]. Amino acid sequences were retrieved for the respective strains from the NCBI FTP server [22] using Fastblast [23]. Genes encoding tRNA and rRNA were excluded, since the aim was to propose enzymes as drug targets. Furthermore, plasmid-encoded genes were not considered essential for the survival of the cell, as per the information available in the Database of Essential Genes (DEG) [24]. Some NCBI GIs had been discontinued and replaced by new GIs; we linked the new GIs to the old ones and retrieved the sequences. CD-HIT [25] is a standalone command-line application that groups the sequences of a database on the basis of sequence identity. Orthologs within the 42 strains were identified by using CD-HIT (updated on August 27, 2012) to group protein sequences with at least 80% sequence identity into Clusters of Proteins (COPs), so that each COP could be analyzed as a unit in the subsequent steps of subtractive proteomics. The results were verified by comparison with the online ElimDupes server [26].
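As an illustration of how clusters shared by all strains can be picked out of CD-HIT output, the sketch below parses text in the `.clstr` layout that CD-HIT writes and keeps the clusters whose members cover every strain. This is a hedged sketch rather than the procedure used in the study: the sequence identifiers, the `seq_to_strain` lookup and the rule "at least one member per strain" are assumptions made for the example.

```python
import re
from collections import defaultdict

# In a .clstr file each member line holds the sequence name between '>' and '...'.
MEMBER = re.compile(r">(\S+?)\.\.\.")


def parse_clstr(text):
    """Parse CD-HIT .clstr text into {cluster_id: [sequence_id, ...]}."""
    clusters = defaultdict(list)
    current = None
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
        elif current is not None:
            match = MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def core_clusters(clusters, seq_to_strain, n_strains):
    """Return ids of clusters that contain at least one protein from every strain."""
    core = []
    for cluster_id, members in clusters.items():
        strains = {seq_to_strain[m] for m in members}
        if len(strains) == n_strains:
            core.append(cluster_id)
    return sorted(core)


# Toy input (assumption): three strains, two clusters, invented sequence ids.
example = """>Cluster 0
0\t443aa, >A1... *
1\t443aa, >B1... at 99.55%
2\t440aa, >C1... at 91.20%
>Cluster 1
0\t210aa, >A2... *
1\t210aa, >B2... at 88.00%
"""
seq_to_strain = {"A1": "stm", "B1": "sty", "C1": "sea", "A2": "stm", "B2": "sty"}

clusters = parse_clstr(example)
print(core_clusters(clusters, seq_to_strain, n_strains=3))  # [0]
```

With real data, `n_strains` would be 42 and the lookup would be built from the per-strain sequence files.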

3. Searching of non-homologous essential enzymes

To process all COPs in the subtractive proteomic analysis at once, a novel strategy comprising two approaches was applied. In the first approach, the proteins of all COPs were subjected to BLASTp [27] against the Homo sapiens proteome downloaded from the NCBI FTP server [28], and the output was analyzed for non-homologous proteins. In the second approach, 3 of the 42 strains were selected at random and the proteins of those strains were subjected to BLASTp against the human proteome. Both approaches are illustrated in S1 Fig. The BLASTp parameters are given in Table 1 (a). The results of both approaches were parsed with the BioPerl module SearchIO [29], and the better approach in terms of processing time was adopted for the next steps. The non-homologous COPs from the previous step were subjected to BLASTp against DEG version 10 [24] to identify essential genes of the pathogen; the parameters are given in Table 1 (b). The KEGG Brite hierarchy is one of the important features of the KEGG server and contains information about the enzymes of metabolic pathways. Enzymes were extracted from the non-homologous essential COPs of S. enterica using the hierarchy files of the 42 strains [21].
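The subtraction step can be illustrated with a short Python sketch that reads BLAST tabular output (the 12-column `-outfmt 6` layout of BLAST+) and reports the queries without any hit at or below an E-value cut-off. The study parsed its results with BioPerl SearchIO; this sketch is a stand-in written under the assumption that "non-homologous" means "no hit passing the E-value threshold of Table 1 (a)". The query names and hit lines are invented.

```python
import csv
import io

EVALUE_COLUMN = 10  # 0-based index of the evalue field in BLAST+ tabular format 6


def queries_with_hits(blast_tsv, max_evalue):
    """Return the set of query ids with at least one hit at or below max_evalue."""
    hits = set()
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if float(row[EVALUE_COLUMN]) <= max_evalue:
            hits.add(row[0])
    return hits


def non_homologous(all_queries, blast_tsv, max_evalue=1e-3):
    """Return the queries that have no significant hit against the subject database."""
    return sorted(set(all_queries) - queries_with_hits(blast_tsv, max_evalue))


# Toy input (assumption): cop1 has a strong human hit, cop2 only a weak one,
# and cop3 produced no hit lines at all.
blast_output = (
    "cop1\thumanX\t45.2\t300\t150\t4\t1\t300\t5\t310\t1e-50\t200\n"
    "cop2\thumanY\t25.0\t80\t55\t3\t10\t90\t20\t100\t0.5\t30.1\n"
)
queries = ["cop1", "cop2", "cop3"]

print(non_homologous(queries, blast_output))  # ['cop2', 'cop3']
```

The same function can be reused for the DEG search by inverting the logic, i.e. keeping the queries returned by `queries_with_hits` with the cut-off of Table 1 (b).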

Table 1. Parameters for BLASTp.

	a	b	c	d
Program	BLAST+ 2.2.28	BLASTp of DEG 10	BLAST+ 2.2.28	BLAST+ 2.2.28
Query name	COPs	Non-homologous COPs	Non-homologous COPs	Hypothetical proteins
No. of queries	241	198	198	114
Subject name	Human proteome	DEG	VFDB	PDB proteins
No. of subjects	68,939	12,379	2,447	252,484
E-value	1.00E-03	1.00E-05	1.00E-04	1.00E-05

4. Searching the virulent genes

The VFDB (Virulence Factors Database) [30], containing protein sequences of all virulent genes, was downloaded, and the non-homologous COPs from the 3 randomly selected strains were subjected to standalone BLASTp against the VFDB sequences to identify virulent genes with a sequence identity of 70% or more. The parameters are given in Table 1 (c).

5. Characterization of the hypothetical proteins

The hypothetical proteins were identified among the enzymes to characterize their structure and/or function. All the hypothetical protein sequences were subjected to standalone BLASTp against protein sequences available in PDB (Protein Data Bank) [31] obtained from PDB FTP server [32]. The parameter details are mentioned in Table 1 (d). The queries with significant hits against PDB database were verified from CD-HIT output and those with ‘no hits’ were subjected to SVM-Prot [33] and InterProScan version 4.0 [34] for protein family prediction. The results were manually cross-checked with CD-HIT output.

6. Validation from the literature

The non-homologous catalytic proteins considered as putative drug targets were validated against the DrugBank database [35] and the published results of Becker et al. [20]. To do so, the gene symbols of the essential enzymes [20] were converted to full names using the DAVID Bioinformatics tool [36] and then searched manually in both sources.

Results and Discussion

1. Identification of UMPs of S. enterica

Each of the metabolic pathways of the 42 strains of S. enterica was compared with the complete set of human metabolic pathways. On average, each strain has 117 metabolic pathways and at least 34 UMPs (Table 2), with all UMPs present in almost all strains. A heatmap showing the percentage presence of proteins in each pathway, and the pathways that are totally absent in individual strains, is given in Fig 2; the corresponding quantitative data are provided in S2 Table. Among the studied strains of S. enterica, only Typhi P-stx-12 was predicted to metabolize atrazine and thus may be resistant to it. However, the dataset lacked pathway information for β-lactam resistance and bisphenol degradation, which were the next most frequently absent pathways among the studied strains. The KEGG entries for the strains Heidelberg CFSAN002069 and Typhi CT18 were not up to date; therefore 22 and 11 NCBI GIs, respectively, were appended for these strains, as listed in S3 Table.

Table 2. Details of Metabolic Pathways and Genes of human and 42 strains of S. enterica.

S.No.	Organism name	Organism KEGG Code	No. of Pathways	Unique Pathways	KEGG ID	NCBI RefSeq ID	NCBI GIs	Sequences
	Homo sapiens	hsa	286	-	-	H_sapiens	-	-
1	Agona SL483	sea	118	32	428	NC_011149	426	417
2	Arizonae 62 z4 z23	ses	117	31	407	NC_010067	406	406
3	Bareilly CFSAN000189	see	119	33	429	NC_021844	428	426
4	Bovismorbificans 3114	senb	117	31	414	NC_022241	414	414
5	Choleraesuis SC B67	sec	119	33	410	NC_006905	410	408
6	Cubana CFSAN002050	seeb	117	31	416	NC_021818	416	415
7	Dublin CT 02021853	sed	116	31	425	NC_011205	425	416
8	Enteritidis P125109	setc	117	31	435	NC_011294	435	430
9	Gallinarum 287 91	sega	116	31	399	NC_011274	399	399
10	Gallinarum Pullorum CDC1983 67	seg	117	32	409	NC_022221	409	409
11	Gallinarum pullorum RKS5078	sel	116	31	395	NC_016831	395	395
12	Heidelberg 41578	seec	117	31	430	NC_021810	430	426
13	Heidelberg B182	shb	116	31	442	NC_017623	440	430
14	Heidelberg CFSAN002069	senh	116	31	451	NC_021812	451	440
15	Heidelberg SL476	seh	119	33	433	NC_011083	431	431
16	Javiana CFSAN001992	senj	115	31	419	NC_020307	419	419
17	Newport SL254	seeh	118	32	444	NC_011080	444	433
18	Newport USMARC S3124 1	senn	118	32	425	NC_021902	425	425
19	Paratyphi A AKU 12601	sek	117	32	405	NC_011147	404	404
20	Paratyphi A ATCC 9150	spt	117	32	408	NC_006511	407	407
21	Paratyphi B SPB7	spq	118	32	428	NC_010102	427	427
22	Paratyphi C RKS4594	sei	116	31	418	NC_012125	418	418
23	Pullorum S06004	seep	116	31	375	NC_021984	375	375
24	Schwarzengrund CVM19633	sew	119	33	436	NC_011094	435	424
25	Thompson RM6836	sene	117	31	421	NC_022525	421	421
26	Typhi CT18	sty	119	33	409	NC_003198	409	408
27	Typhi P stx 12	sex	117	32	407	NC_016832	407	406
28	Typhi Ty2	stt	117	32	409	NC_004631	409	409
29	Typhi Ty21a	sent	116	31	406	NC_021176	406	406
30	Typhimurium 08–1736	seen	117	31	420	NC_021820	420	420
31	Typhimurium 14028S	seo	117	31	433	NC_016856	433	433
32	Typhimurium 798	sef	117	31	430	NC_017046	430	430
33	Typhimurium D23580	sev	117	31	434	NC_016854	434	434
34	Typhimurium DT104	send	119	32	426	NC_022569	426	426
35	Typhimurium DT2	senr	117	31	428	NC_022544	428	428
36	Typhimurium LT2	stm	118	32	447	NC_003197	447	447
37	Typhimurium SL1344	sey	117	31	435	NC_016810	435	435
38	Typhimurium ST4 74	seb	117	31	437	NC_016857	437	437
39	Typhimurium T000240	sem	119	32	438	NC_016860	438	438
40	Typhimurium U288	setu	119	32	434	NC_021151	434	433
41	Typhimurium UK 1	sej	117	31	430	NC_016863	430	430
42	Typhimurium var 5 CFSAN001921	set	117	32	422	NC_021814	422	422

Fig 2. Heatmap showing the percentage presence and absence of genes in each metabolic pathway of the 42 strains of S. enterica.

2. Clustering common proteins of UMPs of 42 strains and searching of non-homologous essential enzymes

CD-HIT produced 537 COPs, each comprising more than one protein. Of these, 241 COPs contained at least 42 proteins belonging to the 42 strains of S. enterica. S4 Table lists the NCBI GIs of the orthologous proteins (genes) clustered in each group.

The complete human proteome was obtained from the NCBI FTP server (details in S1 Table). Non-homologous proteins are potential drug targets with a reduced risk of side effects or cross-reactivity of the drug with host proteins, so it is essential to assess the similarity of the shortlisted sequences to the human host. To do so, we compared each COP with the individual human proteins using two separate approaches (details in the Methodology section). Since the members of each COP share at least 80% sequence identity, comparing (i) every entry of the COPs or (ii) a few randomly selected entries of the COPs with the human host proteins should give the same outcome. We used both approaches to test this expectation. Both approaches gave exactly the same result: 198 of the 241 COPs were identified as non-homologous to humans (Table 3). The second approach was selected for the further steps of subtractive proteomics, as it was accurate and relatively fast. The COP names in Table 3 were assigned by the authors according to the most frequent or common name within the respective cluster. During tabulation of the data (Table 3), we observed that some COPs with exactly the same or closely similar names showed low similarity to one another.

These COPs include Cytochrome BD-II Ubiquinol Oxidase (COP # 139 and 221), D-alanyl-D-alanine Carboxypeptidase (COP # 127 and 190), Lipopolysaccharide core biosynthesis protein (COP # 250, 339 and 384), Peptidoglycan Synthetase FtsI (COP # 65 and 67), PTS system Ascorbate-specific transporter IIC (COP # 129 and 164), Transcriptional regulator (COP # 17 and 167), Tricarboxylate transport membrane protein (COP # 109 and 476), Two component response regulator (COP # 378, 410 and 411) and Type III Secretion apparatus protein SpaR (COP # 341 and 344). From these similarly named COPs, we randomly selected a few proteins and subjected them to online BLASTp, which showed low similarity in each case. There are two possible explanations: either these sets of COPs are isozymes, or there was human error during GenBank submission. For instance, the proteins with NCBI GIs 194443076 and 194443845 share only 29% identity by BLASTp, although they have the same name and belong to the same strain. The beta subunits of subtypes 1 and 2 of the enzyme nitrate reductase share more than 80% sequence similarity and were hence clustered in a single COP. The enzyme succinate dehydrogenase cytochrome b556 large membrane was not characterized as an enzyme in the KEGG analysis; hence its UniProt ID is given in Table 3.

Table 3. Functional characterization of non-homologous COPs.

COP Name	Subtype	COP #	Virulent	Essential	Enzyme	Becker 2006
[Citrate (pro-3S)-lyase] ligase		247
2-(5''-triphosphoribosyl)-3'-dephospho-CoA synthase		432
2-dehydro-3-deoxyphosphooctonate aldolase		328		Yes	Yes	Yes
3-deoxy-D-manno-octulosonic-acid transferase		192		Yes	Yes	Yes
3-deoxy-manno-octulosonate cytidylyltransferase		361		Yes	Yes	Yes
Acetate kinase		205		Yes	Yes
ADP-heptose—LPS heptosyltransferase	I	291		Yes	Yes	Yes
	II	261		Yes	Yes	Yes
Aerotaxis receptor		104		Yes
Alanine racemase		245		Yes	Yes
Alkylphosphonate utilization operon protein PhnA		498		Yes	Yes
Anti-sigma-28 factor	FlgM	507
Aspartate racemase		366
Bifunctional chorismate mutase/prephenate dehydrogenase		227
Carbon storage regulator		527		Yes
Chemotaxis methyltransferase	CheR	321		Yes	Yes
Chemotaxis protein	CheA	49		Yes	Yes	Dispensable
	CheW	276		Yes
	CheZ	409
	CheY	486	Yes	Yes
Chemotaxis-specific methylesterase		259	Yes	Yes	Yes
Chromosomal replication initiation protein		136		Yes
Citrate lyase	Gamma	505		Yes	Yes
Colanic acid capsular biosynthesis activation protein	A	416
Cytochrome BD-II ubiquinol oxidase	1	221		Yes	Yes
	1	139		Yes	Yes
	2	273		Yes	Yes
D-alanyl-D-alanine carboxypeptidase		127		Yes	Yes
		190		Yes	Yes
DNA-binding transcriptional activator	DcuR	376		Yes
	KdpE	395		Yes
	SdiA	355
	UhpA	404		Yes
DNA-binding transcriptional regulator	BaeR	374		Yes
	BasR	390	Yes	Yes
	CpxR	385		Yes
	PhoP	398	Yes	Yes
	QseB	336		Yes
	RstA	368		Yes
D-ribose transporter	RbsB	310		Yes
Flagella synthesis protein	FlgN	478
Flagellar assembly protein	FliH	370
Flagellar basal body L-ring protein		382	Yes
Flagellar basal body P-ring biosynthesis protein	FlgA	401
Flagellar basal body rod modification protein		383
Flagellar basal body rod protein	FlgB	479	Yes
	FlgC	484	Yes
	FlgF	354
	FlgG	343	Yes
Flagellar biosynthesis protein	FliJ	471		Yes
	FliO	487		Yes
	FliP	364	Yes
	FliQ	517	Yes
	FliR	340
	FliT	490
Flagellar hook protein	FlgE	201
Flagellar hook-associated protein	FlgL	290		Yes
Flagellar hook-basal body protein	FliE	503
Flagellar hook-length control protein		199
Flagellar motor protein	MotA	312		Yes
Flagellar motor switch protein	FliM	275	Yes
Flagellar motor switch protein	G	279	Yes
Flagellar MS-ring protein		68
Flagellar protein	FliS	481	Yes
Formate dehydrogenase-O	Gamma	412
Fructose 1,6-bisphosphate aldolase		244		Yes	Yes
Fumarate reductase	C	485
	D	492		Yes
Glutamate/aspartate ABC transporter permease	GltK	397		Yes
Hydrogenase 2	Large	72
	Small	230		Yes	Yes
Integral membrane protein	MviN	90		Yes
Invasion protein	InvA	48	Yes
Isochorismatase		326	Yes	Yes	Yes
Isochorismate synthase		174		Yes	Yes
Lipid A biosynthesis lauroyl acyltransferase		280		Yes	Yes	Yes
Lipid-A-disaccharide synthase		218		Yes	Yes	Yes
Lipopolysaccharide 1,2-glucosyltransferase		272		Yes	Yes
Lipopolysaccharide 1,3-galactosyltransferase		271		Yes	Yes
Lipopolysaccharide core biosynthesis protein		250		Yes	Yes
		339		Yes	Yes	Yes
		384		Yes	Yes
	RfaG	226		Yes	Yes
Maltose ABC transporter substrate-binding protein		132		Yes
Monofunctional biosynthetic peptidoglycan transglycosylase		369		Yes	Yes
Multidrug efflux system	MdtC	12		Yes
Nitrate reductase 1	Alpha	4		Yes	Yes
Nitrate reductase molybdenum cofactor assembly chaperone 1		381
Nitrate reductase (81 duplicates of 1 and 2)	Beta	98		Yes	Yes
Nitrogen regulation protein	NR(I)	135		Yes
	NR(II)	260		Yes	Yes
	P-II 1	497		Yes
O-antigen ligase		166		Yes	Yes	Yes
Osmolarity response regulator	OmpR	367		Yes
Osmolarity sensor protein	EnvZ	160		Yes	Yes
Outer membrane channel protein	TolC	120		Yes
Outer membrane lipoprotein		482
Outer membrane porin protein	C	223		Yes
Outer membrane protease		293	Yes
Outer membrane protein	F	238		Yes
Penicillin-binding protein	1b	33		Yes	Yes	Yes
	2	54		Yes
Peptide transport periplasmic protein	SapA	76		Yes
Peptidoglycan synthetase	1a	31		Yes	Yes	Yes
	FtsI	65		Yes	Yes	Yes
	FtsI	67		Yes	Yes	Yes
Phosphate ABC transporter substrate-binding protein		251		Yes
Phosphate acetyltransferase		43		Yes	Yes
Phosphate regulon sensor protein	PhoR	186		Yes	Yes
Phosphoenolpyruvate carboxylase		29		Yes	Yes
Phosphoenolpyruvate-protein phosphotransferase		40		Yes	Yes
Phosphoglyceromutase		93		Yes	Yes
Phospho-N-acetylmuramoyl-pentapeptide-transferase		242		Yes	Yes	Yes
PII uridylyl-transferase		23		Yes	Yes
Preprotein translocase	SecA	22		Yes
	SecB	459		Yes
	SecD	60		Yes
	SecE	477		Yes
	SecF	270		Yes
	SecG	439		Yes
	SecY	169		Yes
	YajC	500		Yes
PTS system ascorbate-specific transporter	IIC	129		Yes
	IIC	164		Yes
PTS system fructose-specific transporter	IIBC	74		Yes	Yes
PTS system glucitol/sorbitol-specific transporter	IIA	491
	IIB	284
PTS system glucose-specific transporter	IIA	447		Yes	Yes
	IIBC	119		Yes	Yes
PTS system lactose/cellobiose-specific transporter	IIB	515
PTS system L-ascorbate-specific transporter	IIA	460		Yes	Yes
PTS system mannitol-specific transporter	IIA	465		Yes	Yes
	IIABC	50		Yes	Yes
PTS system mannose-specific transporter	IIAB	285		Yes	Yes
	IIC	338
	IID	324
PTS system N,N'-diacetylchitobiose-specific transporter	IIA	496		Yes	Yes
	IIB	499		Yes	Yes
	IIC	158
PTS system phosphohistidinoprotein-hexose phosphotransferase	Hpr	514		Yes
	Npr	516		Yes
PTS system transporter subunit IIA-like nitrogen-regulatory protein	PtsN	452		Yes	Yes
Purine-binding chemotaxis protein		449	Yes	Yes
Respiratory nitrate reductase 1	Gamma	396
RNA polymerase sigma factor for flagellar biosynthesis		377	Yes	Yes
RNA polymerase sigma-54 factor		128		Yes
Sec-independent translocase		434
Secretion system apparatus protein	SsaU	255	Yes
	SsaV	45	Yes
Sensor protein	PhoQ	123	Yes	Yes	Yes
	BasS/ PmrB	246	Yes	Yes	Yes
	RstB	179		Yes	Yes
Signal transduction histidine-protein kinase	BaeS	140		Yes	Yes
Succinate dehydrogenase cytochrome b556 large membrane		483		Yes	K8TKP2
Surface presentation of antigens protein	SpaO	305	Yes
	SpaP	400	Yes
	SpaQ	521	Yes	Yes
	SpaS	249	Yes
Tetraacyldisaccharide 4'-kinase		282		Yes	Yes	Yes
Tetrathionate reductase complex	A	13	Yes
Transcriptional activator	FlhC	423	Yes
	FlhD	494	Yes
Transcriptional regulator	PhoB	387		Yes
		167		Yes
		17	Yes	Yes
	RcsB	386		Yes
Tricarboxylate transport membrane protein		109
		476
Twin arginine translocase	A	522		Yes
	E	525		Yes
Twin-arginine protein translocation system	TatC	345		Yes
Two component response regulator		410	Yes	Yes
		411	Yes	Yes
		378		Yes
Two-component sensor kinase protein		152		Yes	Yes
Type III secretion apparatus lipoprotein YscJ/HrcJ family		352	Yes	Yes
Type III secretion apparatus needle protein	PrgI	523	Yes
	SsaG	519	Yes
Type III secretion apparatus protein	SpaR	341	Yes	Yes
	SpaR	344	Yes	Yes
Type III secretion outer membrane pore		111	Yes
Type III secretion outer membrane protein YscC/HrcC family		73	Yes
Type III secretion system protein		286	Yes	Yes
	FliP	407	Yes
	InvE	229	Yes
UDP pyrophosphate phosphatase		333		Yes	Yes
UDP-2,3-diacylglucosamine hydrolase		372		Yes	Yes	Yes
UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase		265		Yes	Yes	Yes
UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase		301		Yes	Yes	Yes
UDP-N-acetylenolpyruvoylglucosamine reductase		263		Yes	Yes	Yes
UDP-N-acetylglucosamine 1-carboxyvinyltransferase		194		Yes	Yes	Yes
UDP-N-acetylglucosamine acyltransferase		342		Yes	Yes	Yes
UDP-N-acetylmuramate—L-alanine ligase		121		Yes	Yes	Yes
UDP-N-acetylmuramoylalanyl-D-glutamate—2,6-diaminopimelate ligase		113		Yes	Yes	Yes
UDP-N-acetylmuramoyl-L-alanyl-D-glutamate synthetase		177		Yes	Yes	Yes
UDP-N-acetylmuramoyl-tripeptide—D-alanyl-D-alanine ligase		157		Yes	Yes	Yes
Virulence membrane protein	PagC	430
Zinc resistance protein		467		Yes

Additionally, we searched for essential and virulent genes among the 198 COPs by applying the same subtractive proteomics approach. The Database of Essential Genes (DEG) is a well-curated open-access database of essential genes from various organisms, ranging from single-celled prokaryotes to multicellular eukaryotes. Bacteria harbor various virulent genes that lead to pathogenicity; identifying virulence factors in the genome could therefore help elucidate the molecular mechanism of bacterial pathogenicity. The VFDB [30] is an online server containing information about virulent genes present in various microorganisms. Similar results were obtained from the 3 randomly selected strains: 138 of the 198 COPs were predicted to be essential for the bacterium according to DEG (Table 3), and 42 of the 198 COPs were identified as virulent genes (Table 3). There were 73 enzymes among the 138 non-homologous essential COPs (Table 3). The NCBI GIs of each COP are presented in S5 Table, and S1 Text contains important information regarding the accessibility of the NCBI GIs mentioned in S5 Table. The data illustrated in the pie chart in Fig 3 and tabulated in Table 4 show that the largest group of targets (34%) belongs to the subclass ‘phosphoryl transferases’ or ‘kinases’, which are among the most favorable targets in drug discovery research [37].
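The successive filtering reported in this section can be summarized as a funnel. The short Python sketch below only restates the counts given in the text (537, 241, 198, 138 and 73) and computes the share retained at each step; the stage labels are paraphrases chosen for this example.

```python
# Counts taken from the text; the stage labels are paraphrased for this sketch.
stages = [
    ("COPs produced by CD-HIT", 537),
    ("COPs covering the 42 strains", 241),
    ("COPs non-homologous to human", 198),
    ("COPs essential according to DEG", 138),
    ("enzymes among the essential COPs", 73),
]


def retention(stages):
    """Return (stage name, count, percent retained from the previous stage)."""
    rows = []
    for (_, previous), (name, count) in zip(stages, stages[1:]):
        rows.append((name, count, round(100 * count / previous, 1)))
    return rows


for name, count, percent in retention(stages):
    print(f"{name}: {count} ({percent}% of previous stage)")
```

Read this way, the homology filter removes comparatively few clusters, while the essentiality and enzyme filters each remove a substantial fraction of what remains.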

Fig 3. The pie chart shows that 63% of the enzyme targets belong to the Transferase class, which is subdivided into phosphoryl (34%), glycosyl (19%) and other (10%) transferases.

Table 4. Enzyme Classification of 73 drug targets.

Enzyme name	E.C. Number	Enzyme Class	Enzyme Sub-class
Cytochrome BD-II ubiquinol oxidase 1	1.10.3.10	Oxidoreductase	diphenols as donors
Cytochrome BD-II ubiquinol oxidase 2	1.10.3.10	Oxidoreductase	diphenols as donors
Cytochrome BD-II ubiquinol oxidase 3	1.10.3.10	Oxidoreductase	diphenols as donors
Hydrogenase 3	1.12.-.-	Oxidoreductase	hydrogen as donor
UDP-N-acetylenolpyruvoylglucosamine reductase	1.3.1.98	Oxidoreductase	CH-CH group of donors
Nitrate reductase 1	1.7.99.4	Oxidoreductase	nitrogenous compounds as donors
Nitrate reductase 2	1.7.99.4	Oxidoreductase	nitrogenous compounds as donors
Chemotaxis methyltransferase	2.1.1.80	Transferase	One-Carbon group
Lipid A biosynthesis lauroyl acyltransferase	2.3.1.-	Transferase	acyl
UDP-N-acetylglucosamine acyltransferase	2.3.1.129	Transferase	acyl
UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase	2.3.1.191	Transferase	acyl
Phosphate acetyltransferase	2.3.1.8	Transferase	acyl
ADP-heptose—LPS heptosyltransferase 1	2.4.-.-	Transferase	glycosyl
ADP-heptose—LPS heptosyltransferase 2	2.4.-.-	Transferase	glycosyl
Lipopolysaccharide core biosynthesis protein 1	2.4.-.-	Transferase	glycosyl
Lipopolysaccharide core biosynthesis protein 2	2.4.-.-	Transferase	glycosyl
Lipopolysaccharide core biosynthesis protein 3	2.4.-.-	Transferase	glycosyl
Lipopolysaccharide core biosynthesis protein 4	2.4.-.-	Transferase	glycosyl
Peptidoglycan synthetase 1	2.4.1.129	Transferase	glycosyl
Peptidoglycan synthetase 2	2.4.1.129	Transferase	glycosyl
Peptidoglycan synthetase 3	2.4.1.129	Transferase	glycosyl
Lipid-A-disaccharide synthase	2.4.1.182	Transferase	glycosyl
Lipopolysaccharide 1,3-galactosyltransferase	2.4.1.44	Transferase	glycosyl
Lipopolysaccharide 1,2-glucosyltransferase	2.4.1.58	Transferase	glycosyl
Monofunctional biosynthetic peptidoglycan transglycosylase	2.4.2.-	Transferase	glycosyl
3-deoxy-D-manno-octulosonic-acid transferase	2.4.99.12	Transferase	glycosyl
2-dehydro-3-deoxyphosphooctonate aldolase	2.5.1.55	Transferase	alkyl
UDP-N-acetylglucosamine 1-carboxyvinyltransferase	2.5.1.7	Transferase	alkyl
Tetraacyldisaccharide 4'-kinase	2.7.1.130	Transferase	phosphorus
PTS system fructose-specific transporter	2.7.1.69	Transferase	phosphorus
PTS system glucose-specific transporter 1	2.7.1.69	Transferase	phosphorus
PTS system glucose-specific transporter 2	2.7.1.69	Transferase	phosphorus
PTS system L-ascorbate-specific transporter	2.7.1.69	Transferase	phosphorus
PTS system mannitol-specific transporter 1	2.7.1.69	Transferase	phosphorus
PTS system mannitol-specific transporter 2	2.7.1.69	Transferase	phosphorus
PTS system mannose-specific transporter	2.7.1.69	Transferase	phosphorus
PTS system N,N'-diacetylchitobiose-specific transporter 1	2.7.1.69	Transferase	phosphorus
PTS system N,N'-diacetylchitobiose-specific transporter 2	2.7.1.69	Transferase	phosphorus
PTS system transporter subunit IIA-like nitrogen-regulatory protein	2.7.1.69	Transferase	phosphorus
Chemotaxis protein	2.7.13.3	Transferase	phosphorus
Osmolarity sensor protein	2.7.13.3	Transferase	phosphorus
Phosphate regulon sensor protein	2.7.13.3	Transferase	phosphorus
Sensor protein 1	2.7.13.3	Transferase	phosphorus
Sensor protein 2	2.7.13.3	Transferase	phosphorus
Sensor protein 3	2.7.13.3	Transferase	phosphorus
Signal transduction histidine-protein kinase	2.7.13.3	Transferase	phosphorus
Acetate kinase	2.7.2.1	Transferase	phosphorus
Nitrogen regulation protein	2.7.3.-	Transferase	phosphorus
Two-component sensor kinase protein	2.7.3.-	Transferase	phosphorus
Phosphoenolpyruvate-protein phosphotransferase	2.7.3.9	Transferase	phosphorus
3-deoxy-manno-octulosonate cytidylyltransferase	2.7.7.38	Transferase	phosphorus
PII uridylyl-transferase	2.7.7.59	Transferase	phosphorus
Phospho-N-acetylmuramoyl-pentapeptide-transferase	2.7.8.13	Transferase	phosphorus
Chemotaxis-specific methylesterase	3.1.1.61	Hydrolase	Ester bond
Alkylphosphonate utilization operon protein PhnA	3.11.1.2	Hydrolase	phosphonoacetate
Isochorismatase	3.3.2.1	Hydrolase	Ether bond
D-alanyl-D-alanine carboxypeptidase 1	3.4.16.4	Hydrolase	peptidase
D-alanyl-D-alanine carboxypeptidase 2	3.4.16.4	Hydrolase	peptidase
Penicillin-binding protein	3.4.16.4	Hydrolase	peptidase
UDP-3-O-[3-hydroxymyristoyl] N-acetylglucosamine deacetylase	3.5.1.-	Hydrolase	linear amides
UDP pyrophosphate phosphatase	3.6.1.27	Hydrolase	acid anhydrides
UDP-2,3-diacylglucosamine hydrolase	3.6.1.54	Hydrolase	acid anhydrides
Phosphoenolpyruvate carboxylase	4.1.1.31	Lyase	Carbon-Carbon
Fructose 1,6-bisphosphate aldolase	4.1.2.13	Lyase	Carbon-Carbon
Citrate lyase	4.1.3.6	Lyase	Carbon-Carbon
Alanine racemase	5.1.1.1	Isomerase	Epimerases
Phosphoglyceromutase	5.4.2.-	Isomerase	Intramolecular transfer
Isochorismate synthase	5.4.99.6	Isomerase	Intramolecular transfer
O-antigen ligase	6.-.-.-	Ligase	Ligase
UDP-N-acetylmuramoyl-tripeptide—D-alanyl-D-alanine ligase	6.3.2.10	Ligase	Peptide Synthases
UDP-N-acetylmuramoylalanyl-D-glutamate—2,6-diaminopimelate ligase	6.3.2.13	Ligase	Peptide Synthases
UDP-N-acetylmuramate—L-alanine ligase	6.3.2.8	Ligase	Peptide Synthases
UDP-N-acetylmuramoyl-L-alanyl-D-glutamate synthetase	6.3.2.9	Ligase	Peptide Synthases


3. Characterization of the hypothetical proteins

Hypothetical proteins are those for which sequences are available but whose family and functional classification has not been established. As such, they may represent unidentified drug targets [38, 39]. Computational methods (e.g. Blast2GO, HMMscan, KEGG Automatic Annotation Server (KAAS), ProtParam server, PSORTb, SVMProt) are effective in annotating the functional and family classes of the large number of hypothetical sequences present in bacterial genomes [40–42]. Functional classification may in turn help to predict the metabolic pathway in which a protein is involved. To characterize the hypothetical proteins among the shortlisted COPs, we first counted how many of the proteins were hypothetical. The 73 COPs contained 3,105 proteins, of which 114 were hypothetical (Table 5). The identifier details of these 3,105 enzymes are provided in S6 Table.
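The counting step can be illustrated with a minimal Python sketch. It assumes that hypothetical proteins are recognized by the word "hypothetical" in their annotated name; this rule, the function name `count_hypothetical` and the example protein names are our assumptions for illustration and are not taken from the study's pipeline.

```python
from collections import Counter

def count_hypothetical(records, keyword="hypothetical"):
    """Count proteins overall and hypothetical proteins per cluster (COP).

    records: iterable of (cop_id, protein_name) pairs.
    A protein is treated as hypothetical when `keyword` occurs in its
    name (case-insensitive); this matching rule is an assumption.
    """
    per_cop = Counter()
    total = 0
    for cop_id, name in records:
        total += 1
        if keyword in name.lower():
            per_cop[cop_id] += 1
    return total, per_cop

# Illustrative records only: COP numbers and locus tags follow Table 5,
# but the full name strings are placeholders.
records = [
    (4, "hypothetical protein SARI_01190"),
    (4, "annotated enzyme (placeholder name)"),
    (139, "hypothetical protein SARI_02563"),
    (139, "Hypothetical protein SPAB_03237"),
]
total, per_cop = count_hypothetical(records)
print(total, sum(per_cop.values()), dict(per_cop))
```

Applied to the complete membership list of the 73 COPs (S6 Table), a tally of this kind would give the total number of proteins and the number flagged as hypothetical, provided the naming assumption holds.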

Table 5. BLASTp best hits in the PDB for hypothetical proteins of the non-homologous COPs.

NCBI-GI	Protein name	COP #	KEGG Organism code	NCBI RefSeq ID	PDB best hit (PDB ID)	Bit Score	Percent identity
161503125	SARI_01190	4	ses	NC_010067	1q16_A	1247	95.1
161613744	SPAB_01469	4	spq	NC_010102	1q16_A	1247	95.3
538362953	BN855_24210	43	senb	NC_022241	1xco_F	313	46.6
378959111	STBHUCCB_10250	49	sex	NC_016832	2lp4_A	227	84.1
538362544	BN855_20090	49	senb	NC_022241	2lp4_A	227	84.1
161505779	SARI_03955	50	ses	NC_010067	1j6t_A	146	97.9
161616759	SPAB_04578	50	spq	NC_010102	1j6t_A	146	97.9
161504756	SARI_02879	65	ses	NC_010067	4kqr_B	549	45.7
161612466	SPAB_00156	65	spq	NC_010102	4kqr_B	549	45.7
161503040	SARI_01104	67	ses	NC_010067	4kqr_B	539	43.0
161613654	SPAB_01378	67	spq	NC_010102	4kqr_B	539	43.4
538362430	BN855_18930	67	senb	NC_022241	4kqr_B	539	43.4
161503126	SARI_01191	98	ses	NC_010067	3ir7_B	511	92.8
161613745	SPAB_01470	98	spq	NC_010102	3ir7_B	511	93.2
161613976	SPAB_01714	98	spq	NC_010102	3ir7_B	506	80.2
378984138	STMDT12_C15970	98	sem	NC_016860	3ir7_B	506	80.0
538360694	BN855_1290	113	senb	NC_022241	1e8c_B	471	92.4
538361709	BN855_11640	119	senb	NC_022241	1o2f_B	90	93.3
161504450	SARI_02563	139	ses	NC_010067	No hit	-	-
161615466	SPAB_03237	139	spq	NC_010102	No hit	-	-
538362753	BN855_22200	140	senb	NC_022241	4i5s_B	226	30.5
378961722	STBHUCCB_37440	152	sex	NC_016832	4i5s_B	224	31.3
538364097	BN855_35800	160	senb	NC_022241	1bxd_A	161	90.7
29144086	t3806	166	stt	NC_004631	No hit	-	-
62182206	SC3636	166	sec	NC_006905	No hit	-	-
161505752	SARI_03928	166	ses	NC_010067	No hit	-	-
161616791	SPAB_04610	166	spq	NC_010102	No hit	-	-
488656245	TY21A_19335	166	sent	NC_021176	No hit	-	-
378959497	STBHUCCB_14250	179	sex	NC_016832	4i5s_B	222	27.9
538360809	BN855_2450	218	senb	NC_022241	No hit	-	-
161504097	SARI_02195	221	ses	NC_010067	No hit	-	-
161615026	SPAB_02786	221	spq	NC_010102	No hit	-	-
161505743	SARI_03919	226	ses	NC_010067	2iw1_A	374	85.8
161616800	SPAB_04619	226	spq	NC_010102	2iw1_A	374	86.4
538364332	BN855_38170	226	senb	NC_022241	2iw1_A	374	86.1
378962383	STBHUCCB_44400	246	sex	NC_016832	4i5s_B	225	29.8
538364321	BN855_38060	261	senb	NC_022241	1psw_A	346	92.5
161505747	SARI_03923	271	ses	NC_010067	1ss9_A	273	26.0
161616796	SPAB_04615	271	spq	NC_010102	1ga8_A	273	26.0
161505748	SARI_03924	272	ses	NC_010067	3tzt_B	252	27.0
161616795	SPAB_04614	272	spq	NC_010102	3tzt_B	252	25.8
161504449	SARI_02562	273	ses	NC_010067	No hit	-	-
161615465	SPAB_03236	273	spq	NC_010102	No hit	-	-
378960680	STBHUCCB_26520	273	sex	NC_016832	No hit	-	-
379699575	STM474_0375	273	seb	NC_016857	No hit	-	-
538361476	BN855_9270	282	senb	NC_022241	4itn_A	316	27.5
161503046	SARI_01110	285	ses	NC_010067	2jzh_A	170	94.7
161613660	SPAB_01384	285	spq	NC_010102	2jzh_A	170	95.3
378959115	STBHUCCB_10290	321	sex	NC_016832	1af7_A	274	99.3
161504232	SARI_02339	326	ses	NC_010067	2fq1_B	285	87.4
161615198	SPAB_02966	326	spq	NC_010102	2fq1_B	285	88.1
538361140	BN855_5890	326	senb	NC_022241	2fq1_B	285	88.4
538363806	BN855_32850	333	senb	NC_022241	No hit	-	-
161505744	SARI_03920	339	ses	NC_010067	No hit	-	-
161616799	SPAB_04618	339	spq	NC_010102	No hit	-	-
538364331	BN855_38160	339	senb	NC_022241	No hit	-	-
528818715	SN31241_20010	361	senn	NC_021902	1vh1_D	480	94
378960112	STBHUCCB_20620	361	sex	NC_016832	1vh1_D	479	94
16759510	Conserved	372	sty	NC_003198	No hit	-	-
56414314	SPA2188	372	spt	NC_006511	No hit	-	-
378698495	SL1344_0528	372	sey	NC_016810	No hit	-	-
378956078	SPUL_2424	372	sel	NC_016831	No hit	-	-
378444035	None	372	sev	NC_016854	No hit	-	-
383495341	UMN798_0581	372	sef	NC_017046	No hit	-	-
537437644	SPUCDC_2410	372	seg	NC_022221	No hit	-	-
549723245	Conserved	372	senr	NC_022544	No hit	-	-
550899973	Conserved	372	send	NC_022569	No hit	-	-
525841289	CFSAN001921_21865	384	set	NC_021814	No hit	-	-
525860398	CFSAN002050_25550	384	seeb	NC_021818	No hit	-	-
526221794	SE451236_02340	384	seen	NC_021820	No hit	-	-
525949065	SEEB0189_01285	384	see	NC_021844	No hit	-	-
529222678	I137_18460	384	seep	NC_021984	No hit	-	-
549482315	IA1_18065	384	sene	NC_022525	No hit	-	-
161502511	SARI_00555	465	ses	NC_010067	3oxp_B	147	44.2
161612923	SPAB_00629	465	spq	NC_010102	3oxp_B	147	44.2
378984906	STMDT12_C23650	465	sem	NC_016860	3oxp_B	147	44.2
16767539	STM4289	498	stm	NC_003197	2akl_A	110	68.2
16762971	Conserved	498	sty	NC_003198	2akl_A	110	68.2
29144458	t4196	498	stt	NC_004631	2akl_A	110	68.2
56416088	SPA4107	498	spt	NC_006511	2akl_A	110	68.2
62182738	SC4168	498	sec	NC_006905	2akl_A	110	68.2
161505231	SARI_03369	498	ses	NC_010067	2akl_A	92	66.3
161617431	SPAB_05288	498	spq	NC_010102	2akl_A	110	68.2
194444767	SNSL254_A4635	498	seeh	NC_011080	2akl_A	110	68.2
194448085	SeHA_C4635	498	seh	NC_011083	2akl_A	110	68.2
194735822	SeSA_A4544	498	sew	NC_011094	2akl_A	110	68.2
197365014	SSPA3814	498	sek	NC_011147	2akl_A	110	68.2
197249113	SeAg_B4551	498	sea	NC_011149	2akl_A	110	68.2
198243014	SeD_A4684	498	sed	NC_011205	2akl_A	110	68.2
205355060	SG4134	498	sega	NC_011274	2akl_A	110	67.3
207859443	SEN4060	498	setc	NC_011294	2akl_A	110	67.3
224586054	SPC_4352	498	sei	NC_012125	2akl_A	110	68.2
378702132	SL1344_4226	498	sey	NC_016810	2akl_A	110	68.2
378957845	SPUL_4281	498	sel	NC_016831	2akl_A	110	67.3
378962381	STBHUCCB_44380	498	sex	NC_016832	2akl_A	110	68.2
378447608	None	498	sev	NC_016854	2akl_A	110	68.2
378453234	STM14_5159	498	seo	NC_016856	2akl_A	110	68.2
378986964	STMDT12_C44240	498	sem	NC_016860	2akl_A	110	68.2
378991557	STMUK_4274	498	sej	NC_016863	2akl_A	110	68.2
383498867	UMN798_4648	498	sef	NC_017046	2akl_A	110	68.2
452121975	CFSAN001992_12425	498	senj	NC_020307	2akl_A	110	68.2
482906826	STU288_21535	498	setu	NC_021151	2akl_A	110	68.2
488656631	TY21A_21335	498	sent	NC_021176	2akl_A	110	68.2
525815577	SEEH1578_07620	498	seec	NC_021810	2akl_A	110	68.2
525828145	CFSAN002069_10645	498	senh	NC_021812	2akl_A	110	68.2
525839753	CFSAN001921_18970	498	set	NC_021814	2akl_A	110	68.2
525856209	CFSAN002050_04690	498	seeb	NC_021818	2akl_A	110	68.2
526218734	SE451236_04480	498	seen	NC_021820	2akl_A	110	68.2
525948743	SEEB0189_20995	498	see	NC_021844	2akl_A	110	68.2
529221780	I137_20500	498	seep	NC_021984	2akl_A	110	67.3
537439413	SPUCDC_4267	498	seg	NC_022221	2akl_A	110	67.3
549481441	IA1_20890	498	sene	NC_022525	2akl_A	110	68.2
549726803	Conserved	498	senr	NC_022544	2akl_A	110	68.2
550903633	Conserved	498	send	NC_022569	2akl_A	110	68.2


We then performed a BLASTp search using the 114 hypothetical sequences as the ‘query’ and the sequences of the PDB as the ‘database’, on the reasoning that homology to an already well-characterized PDB entry could help to classify a hypothetical protein. BLASTp returned hits in the PDB for 81 queries, while the remaining 33 queries showed no hits (Table 5). The names of the hits obtained for the 81 queries were manually matched with the corresponding 24 COPs. The 33 queries for which no similarity was found in the PDB were submitted to the bioinformatics tools SVM-Prot and InterProScan, and the results obtained for these 33 ‘no hits’ were confirmed by matching their names with the respective COPs. All results verified the output of the CD-HIT clustering.
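The separation of queries into ‘best hit’ and ‘no hit’ groups can be sketched as follows. This is a minimal illustration and not the script used in the study: it assumes the BLASTp results were saved in the 12-column tabular format of BLAST+ (`-outfmt 6`), and the helper names `best_hits` and `make_row` are hypothetical. Only the identity and bit score of the first example row come from Table 5; the second row and all other columns are placeholders.

```python
import csv
from io import StringIO

def best_hits(blast_tsv, query_ids):
    """Pick the highest-bit-score hit per query from BLAST tabular output.

    Assumes the 12 default columns of BLAST+ tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    Returns ({query: (subject, identity, bitscore)}, [queries without hits]).
    """
    best = {}
    for row in csv.reader(StringIO(blast_tsv), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        identity, bitscore = float(row[2]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, identity, bitscore)
    no_hits = [q for q in query_ids if q not in best]
    return best, no_hits

def make_row(query, subject, identity, bitscore):
    # Only identity and bit score are used here; the eight columns in
    # between are filled with "0" placeholders.
    return "\t".join([query, subject, str(identity)] + ["0"] * 8 + [str(bitscore)])

example = "\n".join([
    make_row("161503125", "1q16_A", 95.1, 1247),           # values from Table 5
    make_row("161503125", "placeholder_weaker_hit", 40.0, 100),  # invented row
])
queries = ["161503125", "161504450"]  # the second GI has no hit in Table 5
best, no_hits = best_hits(example, queries)
print(best)
print(no_hits)
```

Queries returned in `no_hits` correspond to the sequences that would be passed on to SVM-Prot and InterProScan.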

4. Validation from the literature

A similar study was performed by Becker et al. [20] using experimental techniques, so we compared its findings with the results of our in silico approach. We also searched DrugBank for existing drug target entries against Salmonella. DrugBank [35] reported 19 drug targets for S. enterica, of which 11 belonged to the human host and the remaining 8 to the bacterium. Of these 8, 16S rRNA is a nucleic acid and was outside the aim of this study, the oxygen-insensitive NADPH nitroreductase was common to only 35 strains, and five others did not belong to the UMPs. Only one of the 8 genes (i.e. penicillin-binding protein) was present in the output of the current strategy. The results are summarized in Table 6. Becker and coworkers [20] reported 155 essential enzymes for S. enterica serovar Typhimurium strain LT2 and compared them across various strains of S. enterica in an extensive experimental study. We compared our 73 identified enzymes with their results and observed that 24 enzymes were shared with the report of Becker et al. (Table 3). Furthermore, the enzyme CheA (chemotaxis protein, COP # 49) was found to be essential in the current study, whereas Becker et al. suggested that it is non-essential. This discrepancy may be due to recent updates in the DEG.
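The arithmetic behind this comparison is a simple set overlap, sketched below. How enzymes were actually matched between the two studies (by name, EC number or gene identifier) is not specified here, so the identifiers in the example are synthetic placeholders chosen only to reproduce the reported set sizes (73 enzymes in this study, 155 in Becker et al., 24 shared).

```python
def overlap_summary(ours, reference):
    """Summarize how many of our candidates also occur in a reference set."""
    shared = ours & reference
    return {
        "shared": len(shared),
        "novel": len(ours - reference),
        "percent_shared": round(100 * len(shared) / len(ours), 1),
    }

# Synthetic identifiers: 73 enzymes in our set, 155 in the reference set,
# constructed so that exactly 24 are common to both.
ours = {f"enzyme_{i}" for i in range(73)}
reference = {f"enzyme_{i}" for i in range(24)} | {
    f"reference_only_{i}" for i in range(131)
}
summary = overlap_summary(ours, reference)
print(summary)
```

With these set sizes, 24 of 73 enzymes are shared (about 33%) and 73 − 24 = 49 enzymes are not covered by the reference set.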

Table 6. The eight S. enterica genes listed as drug targets in DrugBank and their status in the current study.

Genes	Molecule	Output	Reason
16S rRNA	Nucleic Acid	excluded	Not the aim
30S ribosomal protein S10	Protein	X	Not in UMPs of SE
30S ribosomal protein S12	Protein	X	Not in UMPs of SE
DNA gyrase subunit A	Enzyme	X	Not in UMPs of SE
DNA topoisomerase 4 subunit A	Enzyme	X	Not in UMPs of SE
Oxygen-insensitive NADPH nitroreductase	Enzyme	X	In 35/42 strains
Penicillin-binding protein 2	Enzyme	present	Included
Probable pyruvate-flavodoxin oxidoreductase	Enzyme	X	Not in UMPs of SE
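The reasons given in Table 6 can be read as a short decision sequence, sketched below. The order of the checks, the `Target` record and the `triage` function are assumptions made for illustration; the essentiality and human non-homology filters that the full strategy also applies are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

TOTAL_STRAINS = 42  # number of S. enterica strains analysed in the study

@dataclass
class Target:
    name: str
    molecule: str            # "Nucleic Acid", "Protein" or "Enzyme"
    in_ump: bool             # member of a unique metabolic pathway?
    strains: Optional[int]   # strains containing it, if counted

def triage(target):
    """Return the (Output, Reason) pair used in Table 6 for one target."""
    if target.molecule == "Nucleic Acid":
        return ("excluded", "Not the aim")
    if not target.in_ump:
        return ("X", "Not in UMPs of SE")
    if target.strains is not None and target.strains < TOTAL_STRAINS:
        return ("X", f"In {target.strains}/{TOTAL_STRAINS} strains")
    return ("present", "Included")

examples = [
    Target("16S rRNA", "Nucleic Acid", False, None),
    Target("DNA gyrase subunit A", "Enzyme", False, None),
    Target("Oxygen-insensitive NADPH nitroreductase", "Enzyme", True, 35),
    Target("Penicillin-binding protein 2", "Enzyme", True, 42),
]
for t in examples:
    print(t.name, *triage(t), sep="\t")
```

Only a target that lies in a UMP and is found in all 42 strains reaches the final branch, which is why penicillin-binding protein 2 is the single DrugBank entry retained.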


Conclusion

We have performed an extensive computational analysis of S. enterica at the level of the core proteome to identify new potential drug targets. Subtractive proteomics was applied through a novel approach, i.e. by considering only proteins of the unique metabolic pathways of the pathogen and mining the proteomic data of all its completely sequenced strains, thus improving the quality and applicability of the results. We identified 73 enzymes that are common to 42 strains of S. enterica, belong to unique metabolic pathways, are essential for pathogen survival and have no human homologs. These four characteristics suggest that the enzymes are potential drug targets and should be tested experimentally. Comparison with the experimental data of Becker et al. [20] showed that 24 of the 73 (~33%) enzymes are current drug targets. The remaining 49 enzymes are new potential drug targets. We have annotated the function of 114 hypothetical proteins unique to S. enterica, providing additional new potential drug targets. Finally, our organization of the available core proteomic data (S2, S4, S5 and S6 Tables) into categories such as clusters, organism codes and NCBI RefSeq IDs provides a basis for further studies.

Supporting Information

S1 Fig. Strategy for subtractive proteomic analysis. (XLSX, 74.9 KB)

S1 Table. Details of downloaded biological datasets. (XLSX, 14 KB)

S2 Table. Number of Genes present in Unique Metabolic Pathways of 42 strains of S. enterica. (XLSX, 14.9 KB)

S3 Table. Discontinued and Updated NCBI GIs of Heidelberg CFSAN002069 and Typhi CT18. (XLSX, 11.2 KB)

S4 Table. Cluster of Proteins (COPs) formed using CD-HIT. (XLSX, 59 KB)

S5 Table. Non-homologous Essential Enzymes of S. enterica 42 strains as drug targets. (XLSX, 35.2 KB)

S6 Table. Protein Identifiers and Names of 73 COPs. (XLSX, 174.8 KB)

S1 Text. Accessibility of NCBI GIs mentioned in S5 Table. (DOCX, 305.9 KB)

Acknowledgments

The authors gratefully acknowledge the Higher Education Commission of Pakistan for providing a fellowship during the study.

Abbreviations

KEGG: Kyoto Encyclopedia of Genes and Genomes
CD-HIT: Cluster Database at High Identity with Tolerance
DEG: Database of Essential Genes
UMP: Unique Metabolic Pathways
SVM: Support Vector Machine
KO: KEGG Orthology
FTP: File Transfer Protocol
NCBI-GI: National Center for Biotechnology Information—GenInfo Identifier
COP: Cluster of Proteins
API: Application Programming Interface
BLAST: Basic Local Alignment Search Tool
BLASTp: Protein-Protein BLAST
VFDB: Virulence Factors Database
PDB: Protein Data Bank
SE: Salmonella enterica

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

The study was supported by International Foundation for Science (IFS) grant no. F/5378-1. The authors also gratefully acknowledge the Higher Education Commission of Pakistan for providing a fellowship during the study.

References

1. Popoff MY, Bockemuhl J, Gheesling LL. Supplement 2001 (no. 45) to the Kauffmann-White scheme. Research in microbiology. 2003;154(3):173–4. Epub 2003/04/23. doi:10.1016/s0923-2508(03)00025-1.
2. Betancor L, Yim L, Martinez A, Fookes M, Sasias S, Schelotto F, et al. Genomic Comparison of the Closely Related Salmonella enterica Serovars Enteritidis and Dublin. The open microbiology journal. 2012;6:5–13. Epub 2012/03/01. doi:10.2174/1874285801206010005. PubMed Central PMCID: PMC3282883.
3. Coburn B, Grassl GA, Finlay BB. Salmonella, the host and disease: a brief review. Immunology and cell biology. 2007;85(2):112–8. Epub 2006/12/06. doi:10.1038/sj.icb.7100007.
4. Sun JS, Hahn TW. Comparative proteomic analysis of Salmonella enterica serovars Enteritidis, Typhimurium and Gallinarum. The Journal of veterinary medical science / the Japanese Society of Veterinary Science. 2012;74(3):285–91. Epub 2011/10/15.
5. Fierer J, Guiney DG. Diverse virulence traits underlying different clinical outcomes of Salmonella infection. The Journal of clinical investigation. 2001;107(7):775–80. Epub 2001/04/04. doi:10.1172/jci12561. PubMed Central PMCID: PMC199580.
6. Patel JC, Galan JE. Manipulation of the host actin cytoskeleton by Salmonella—all in the name of entry. Current opinion in microbiology. 2005;8(1):10–5. Epub 2005/02/08. doi:10.1016/j.mib.2004.09.001.
7. Haraga A, Ohlson MB, Miller SI. Salmonellae interplay with host cells. Nature reviews Microbiology. 2008;6(1):53–66. Epub 2007/11/21. doi:10.1038/nrmicro1788.
8. Wangdi T, Winter SE, Baumler AJ. Typhoid fever: "you can't hit what you can't see". Gut microbes. 2012;3(2):88–92. Epub 2011/12/14. doi:10.4161/gmic.18602. PubMed Central PMCID: PMC3370952.
9. Carter PB, Collins FM. The route of enteric infection in normal mice. The Journal of experimental medicine. 1974;139(5):1189–203. Epub 1974/05/01. PubMed Central PMCID: PMC2139651.
10. Mastroeni P, Grant A. Dynamics of spread of Salmonella enterica in the systemic compartment. Microbes and infection / Institut Pasteur. 2013;15(13):849–57. Epub 2013/11/05. doi:10.1016/j.micinf.2013.10.003.
11. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". Proceedings of the National Academy of Sciences of the United States of America. 2005;102(39):13950–5. Epub 2005/09/21. doi:10.1073/pnas.0506758102. PubMed Central PMCID: PMC1216834.
12. Deng X, Phillippy AM, Li Z, Salzberg SL, Zhang W. Probing the pan-genome of Listeria monocytogenes: new insights into intraspecific niche expansion and genomic diversification. BMC genomics. 2010;11:500. Epub 2010/09/18. doi:10.1186/1471-2164-11-500. PubMed Central PMCID: PMC2996996.
13. Donati C, Hiller NL, Tettelin H, Muzzi A, Croucher NJ, Angiuoli SV, et al. Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species. Genome biology. 2010;11(10):R107. Epub 2010/11/03. doi:10.1186/gb-2010-11-10-r107. PubMed Central PMCID: PMC3218663.
14. Hao P, Zheng H, Yu Y, Ding G, Gu W, Chen S, et al. Complete sequencing and pan-genomic analysis of Lactobacillus delbrueckii subsp. bulgaricus reveal its genetic basis for industrial yogurt production. PloS one. 2011;6(1):e15964. Epub 2011/01/26. doi:10.1371/journal.pone.0015964. PubMed Central PMCID: PMC3022021.
15. Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, et al. The pangenome structure of Escherichia coli: comparative genomic analysis of E. coli commensal and pathogenic isolates. Journal of bacteriology. 2008;190(20):6881–93. Epub 2008/08/05. doi:10.1128/jb.00619-08. PubMed Central PMCID: PMC2566221.
16. Lilburn TG, Cai H, Gu J, editors. The Core and Pan-Genome of the Vibrionaceae. Bioinformatics, Systems Biology and Intelligent Computing, 2009 IJCBS'09 International Joint Conference on; 2009: IEEE.
17. Yang L, Tan J, O'Brien EJ, Monk JM, Kim D, Li HJ, et al. Systems biology definition of the core proteome of metabolism and expression is consistent with high-throughput data. Proceedings of the National Academy of Sciences of the United States of America. 2015;112(34):10810–5. doi:10.1073/pnas.1501384112.
18. Zhang L, Xiao D, Pang B, Zhang Q, Zhou H, Zhang L, et al. The core proteome and pan proteome of Salmonella Paratyphi A epidemic strains. PloS one. 2014;9(2):e89197. Epub 2014/03/04. doi:10.1371/journal.pone.0089197. PubMed Central PMCID: PMC3933413.
19. Jacobsen A, Hendriksen RS, Aaresturp FM, Ussery DW, Friis C. The Salmonella enterica pan-genome. Microbial ecology. 2011;62(3):487–504. Epub 2011/06/07. doi:10.1007/s00248-011-9880-1. PubMed Central PMCID: PMC3175032.
20. Becker D, Selbach M, Rollenhagen C, Ballmaier M, Meyer TF, Mann M, et al. Robust Salmonella metabolism limits possibilities for new antimicrobials. Nature. 2006;440(7082):303–7. Epub 2006/03/17. doi:10.1038/nature04616.
21. Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic acids research. 2014;42(Database issue):D199–205. Epub 2013/11/12. doi:10.1093/nar/gkt1076.
22. NCBI. NCBI FTP server 2013 [cited 2013 December 21]. Available: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/.
23. Hallam S. Fast Blast 2013 [cited December, 2014]. Available: http://www.cmde.science.ubc.ca/hallam/fastblast.php.
24. Luo H, Lin Y, Gao F, Zhang CT, Zhang R. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic acids research. 2014;42(Database issue):D574–80. Epub 2013/11/19. doi:10.1093/nar/gkt1131.
25. Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics (Oxford, England). 2012;28(23):3150–2. Epub 2012/10/13. doi:10.1093/bioinformatics/bts565. PubMed Central PMCID: PMC3516142.
26. HCV-Sequence-Database. ElimDupes [December, 2014]. Available: http://hcv.lanl.gov/content/sequence/ELIMDUPES/elimdupes.html.
27. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC bioinformatics. 2009;10:421. Epub 2009/12/17. doi:10.1186/1471-2105-10-421. PubMed Central PMCID: PMC2803857.
28. NCBI. NCBI FTP server 2014 [updated January 6; cited 2014 January 11]. Available: ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/.
29. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome research. 2002;12(10):1611–8. Epub 2002/10/09. doi:10.1101/gr.361602. PubMed Central PMCID: PMC187536.
30. Chen L, Xiong Z, Sun L, Yang J, Jin Q. VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic acids research. 2012;40(Database issue):D641–5. Epub 2011/11/10. doi:10.1093/nar/gkr989. PubMed Central PMCID: PMC3245122.
31. Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, Dutta S, et al. The RCSB Protein Data Bank: new resources for research and education. Nucleic acids research. 2013;41(Database issue):D475–82. Epub 2012/11/30. doi:10.1093/nar/gks1200. PubMed Central PMCID: PMC3531086.
32. PDB. RCSB PDB FTP server 2014 [updated October 1; cited 2014 January 18]. Available: ftp://ftp.wwpdb.org/pub/pdb/derived_data/.
33. Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ. SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic acids research. 2003;31(13):3692–7. Epub 2003/06/26. PubMed Central PMCID: PMC169006.
34. McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, et al. Analysis Tool Web Services from the EMBL-EBI. Nucleic acids research. 2013;41(Web Server issue):W597–600. Epub 2013/05/15. doi:10.1093/nar/gkt376. PubMed Central PMCID: PMC3692137.
35. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, et al. DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic acids research. 2011;39(Database issue):D1035–41. Epub 2010/11/10. doi:10.1093/nar/gkq1126. PubMed Central PMCID: PMC3013709.
36. Huang da W, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic acids research. 2007;35(Web Server issue):W169–75. Epub 2007/06/20. doi:10.1093/nar/gkm415. PubMed Central PMCID: PMC1933169.
37. Cohen P, Alessi DR. Kinase drug discovery—what's next in the field? ACS chemical biology. 2013;8(1):96–104. Epub 2013/01/02. doi:10.1021/cb300610s. PubMed Central PMCID: PMC4208300.
38. Teh BA, Choi SB, Musa N, Ling FL, Cun ST, Salleh AB, et al. Structure to function prediction of hypothetical protein KPN_00953 (Ycbk) from Klebsiella pneumoniae MGH 78578 highlights possible role in cell wall metabolism. BMC structural biology. 2014;14:7. Epub 2014/02/07. doi:10.1186/1472-6807-14-7. PubMed Central PMCID: PMC3927764.
39. Naqvi AA, Shahbaaz M, Ahmad F, Hassan MI. Identification of Functional Candidates amongst Hypothetical Proteins of Treponema pallidum ssp. pallidum. PloS one. 2015;10(4):e0124177. Epub 2015/04/22. doi:10.1371/journal.pone.0124177. PubMed Central PMCID: PMC4403809.
40. Ravooru N, Ganji S, Sathyanarayanan N, Nagendra HG. Insilico analysis of hypothetical proteins unveils putative metabolic pathways and essential genes in Leishmania donovani. Frontiers in genetics. 2014;5:291. Epub 2014/09/11. doi:10.3389/fgene.2014.00291. PubMed Central PMCID: PMC4144268.
41. Shahbaaz M, Hassan MI, Ahmad F. Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20. PloS one. 2013;8(12):e84263. Epub 2014/01/07. doi:10.1371/journal.pone.0084263. PubMed Central PMCID: PMC3877243.
42. Cui T, Zhang L, Wang X, He ZG. Uncovering new signaling proteins and potential drug targets through the interactome analysis of Mycobacterium tuberculosis. BMC genomics. 2009;10:118. Epub 2009/03/21. doi:10.1186/1471-2164-10-118. PubMed Central PMCID: PMC2671525.


[pone.0146796.ref024] 24.Luo H, Lin Y, Gao F, Zhang CT, Zhang R. DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements. Nucleic acids research. 2014;42(Database issue):D574–80. Epub 2013/11/19. 10.1093/nar/gkt1131 . [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref025] 25.Fu L, Niu B, Zhu Z, Wu S, Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics (Oxford, England). 2012;28(23):3150–2. Epub 2012/10/13. 10.1093/bioinformatics/bts565 ; PubMed Central PMCID: PMCPmc3516142. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref026] 26.HCV-Sequence-Database. ElimDupes [December, 2014]. Available: http://hcv.lanl.gov/content/sequence/ELIMDUPES/elimdupes.html.

[pone.0146796.ref027] 27.Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC bioinformatics. 2009;10:421 Epub 2009/12/17. 10.1186/1471-2105-10-421 ; PubMed Central PMCID: PMCPmc2803857. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref028] 28.NCBI. NCBI FTP server 2014 [updated January 6; cited 2014 January 11]. Available: ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/mRNA_Prot/.

[pone.0146796.ref029] 29.Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome research. 2002;12(10):1611–8. Epub 2002/10/09. 10.1101/gr.361602 ; PubMed Central PMCID: PMCPmc187536. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref030] 30.Chen L, Xiong Z, Sun L, Yang J, Jin Q. VFDB 2012 update: toward the genetic diversity and molecular evolution of bacterial virulence factors. Nucleic acids research. 2012;40(Database issue):D641–5. Epub 2011/11/10. 10.1093/nar/gkr989 ; PubMed Central PMCID: PMCPmc3245122. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref031] 31.Rose PW, Bi C, Bluhm WF, Christie CH, Dimitropoulos D, Dutta S, et al. The RCSB Protein Data Bank: new resources for research and education. Nucleic acids research. 2013;41(Database issue):D475–82. Epub 2012/11/30. 10.1093/nar/gks1200 ; PubMed Central PMCID: PMCPmc3531086. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref032] 32.PDB. RCSB PDB FTP server 2014 [updated October 1; cited 2014 January 18]. Available: ftp://ftp.wwpdb.org/pub/pdb/derived_data/.

[pone.0146796.ref033] 33.Cai CZ, Han LY, Ji ZL, Chen X, Chen YZ. SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic acids research. 2003;31(13):3692–7. Epub 2003/06/26. ; PubMed Central PMCID: PMCPmc169006. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref034] 34.McWilliam H, Li W, Uludag M, Squizzato S, Park YM, Buso N, et al. Analysis Tool Web Services from the EMBL-EBI. Nucleic acids research. 2013;41(Web Server issue):W597–600. Epub 2013/05/15. 10.1093/nar/gkt376 ; PubMed Central PMCID: PMCPmc3692137. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref035] 35.Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, et al. DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic acids research. 2011;39(Database issue):D1035–41. Epub 2010/11/10. 10.1093/nar/gkq1126 ; PubMed Central PMCID: PMCPmc3013709. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref036] 36.Huang da W, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic acids research. 2007;35(Web Server issue):W169–75. Epub 2007/06/20. 10.1093/nar/gkm415 ; PubMed Central PMCID: PMCPmc1933169. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref037] 37.Cohen P, Alessi DR. Kinase drug discovery—what's next in the field? ACS chemical biology. 2013;8(1):96–104. Epub 2013/01/02. 10.1021/cb300610s ; PubMed Central PMCID: PMCPmc4208300. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref038] 38.Teh BA, Choi SB, Musa N, Ling FL, Cun ST, Salleh AB, et al. Structure to function prediction of hypothetical protein KPN_00953 (Ycbk) from Klebsiella pneumoniae MGH 78578 highlights possible role in cell wall metabolism. BMC structural biology. 2014;14:7 Epub 2014/02/07. 10.1186/1472-6807-14-7 ; PubMed Central PMCID: PMCPmc3927764. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref039] 39.Naqvi AA, Shahbaaz M, Ahmad F, Hassan MI. Identification of Functional Candidates amongst Hypothetical Proteins of Treponema pallidum ssp. pallidum. PloS one. 2015;10(4):e0124177 Epub 2015/04/22. 10.1371/journal.pone.0124177 ; PubMed Central PMCID: PMCPmc4403809. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref040] 40.Ravooru N, Ganji S, Sathyanarayanan N, Nagendra HG. Insilico analysis of hypothetical proteins unveils putative metabolic pathways and essential genes in Leishmania donovani. Frontiers in genetics. 2014;5:291 Epub 2014/09/11. 10.3389/fgene.2014.00291 ; PubMed Central PMCID: PMCPmc4144268. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref041] 41.Shahbaaz M, Hassan MI, Ahmad F. Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20. PloS one. 2013;8(12):e84263 Epub 2014/01/07. 10.1371/journal.pone.0084263 ; PubMed Central PMCID: PMCPmc3877243. [DOI] [PMC free article] [PubMed] [Google Scholar]

[pone.0146796.ref042] 42.Cui T, Zhang L, Wang X, He ZG. Uncovering new signaling proteins and potential drug targets through the interactome analysis of Mycobacterium tuberculosis. BMC genomics. 2009;10:118 Epub 2009/03/21. 10.1186/1471-2164-10-118 ; PubMed Central PMCID: PMCPmc2671525. [DOI] [PMC free article] [PubMed] [Google Scholar]
