Bioinformatics. 2013 May 21;29(14):1821–1822. doi: 10.1093/bioinformatics/btt289

TiPs: a database of therapeutic targets in pathogens and associated tools

Rosalba Lepore 1, Anna Tramontano 1,2,3,*, Allegra Via 1,*
PMCID: PMC3702258  PMID: 23698860

Abstract

Motivation: The need for new drugs and new targets is particularly compelling in an era that is witnessing an alarming increase of drug resistance in human pathogens. The identification of new targets of known drugs is a promising approach, which has proven successful in several cases. Here, we describe a database that includes information on 5153 putative drug–target pairs for 150 human pathogens derived from available drug–target crystallographic complexes.

Availability and implementation: The TiPs database is freely available at http://biocomputing.it/tips.

Contact: anna.tramontano@uniroma1.it or allegra.via@uniroma1.it

1 INTRODUCTION

Novel mechanisms to escape therapy are constantly emerging among human pathogen populations. This calls for the development of new drugs to treat the resulting diseases and of rapid and effective methods to expand the landscape of available treatment options (Hopkins et al., 2011). In this context, computational studies can help identify novel therapeutic targets and characterize their interactions, and a number of such efforts are described in the literature (Aguero et al., 2008; Kinnings et al., 2010; Lepore et al., 2011; Orti et al., 2009). However, these are mostly devoted to the analysis of single targets or specific tropical disease pathogens.

The TiPs database has been developed with the aim of facilitating the identification of new therapeutic targets in >150 organisms responsible for human infections. We performed a large-scale analysis to systematically identify candidate targets in the proteomes of such organisms. The rationale of our approach is based on the intrinsic polypharmacological behaviour of compounds targeting homologous proteins (Paolini et al., 2006). We considered all drug–target pairs for which the 3D structure of the complex is experimentally known and used the sequence of the target to identify its homologues in human pathogens. The evolutionary conservation of such homologues and their 3D structures (available or predicted) were used to verify whether the original drug was in principle able to bind them as it does the original target. To this aim, stringent filters were applied to ensure that predicted binding sites and their interactions with the drug are as accurate as possible. Pathogen proteins predicted with high confidence to be therapeutic targets and the putative drugs interacting with them were collected and annotated in TiPs.

2 METHODS

More than 400 human pathogen species were obtained from ‘The Approved List of Biological Agents’ provided by the Advisory Committee on Dangerous Pathogens. To unambiguously assign an identifier (ID) to human pathogens, the names of the organisms were mapped onto the NCBI Taxonomy Database records (http://www.ncbi.nlm.nih.gov/Taxonomy/).

Drug compounds and information on their molecular targets were obtained from DrugBank (http://www.drugbank.ca). The SMILES strings of drugs annotated as ‘inhibitor’, ‘agonist’ or ‘antagonist’ were used to associate them with ligands present in the PDB structure entries (Berman et al., 2012). Only identical compounds were considered (Tanimoto coefficient = 1). A total of 308 distinct drugs were observed in complex with at least one PDB structure. About 40% of these (119/308) occur in complex with their actual pharmaceutical target. These were used as starting points to predict potential drug targets in pathogens. The search for homologues in pathogens was performed using BLAST+ (Camacho et al., 2009) with default parameters against the nr database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/). We only retained highly reliable hits, i.e. those showing at least 40% sequence identity to the original target and an e-value < 10⁻⁶. Pathogen taxonomic IDs were retrieved by matching the gi numbers of BLAST hits to the NCBI Taxonomy database.
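The hit-filtering step can be illustrated with a minimal sketch. It assumes the BLAST+ results were written in the default 12-column tabular format (`-outfmt 6`), which the text does not state; the function name `filter_hits` and the example rows are hypothetical. Only the two thresholds (at least 40% identity, e-value below 10⁻⁶) come from the text.

```python
import csv
import io

# Thresholds stated in the text.
MIN_IDENTITY = 40.0   # percent sequence identity to the original target
MAX_EVALUE = 1e-6     # hits must have an e-value strictly below this

def filter_hits(tabular_text, min_identity=MIN_IDENTITY, max_evalue=MAX_EVALUE):
    """Return (query, subject, identity, evalue) tuples passing both filters.

    Assumption: BLAST+ tabular output (-outfmt 6) with the default columns
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, evalue = float(row[2]), float(row[10])
        if identity >= min_identity and evalue < max_evalue:
            kept.append((query, subject, identity, evalue))
    return kept

# Hypothetical example rows: only hitA passes both thresholds.
example = "\n".join([
    "target1\thitA\t62.5\t200\t75\t0\t1\t200\t1\t200\t3e-80\t250",
    "target1\thitB\t35.0\t180\t117\t2\t5\t184\t3\t180\t1e-20\t90",
    "target1\thitC\t45.0\t40\t22\t0\t10\t49\t12\t51\t0.002\t30",
])
hits = filter_hits(example)
print([subject for _, subject, _, _ in hits])
```

Here hitB fails the identity threshold and hitC fails the e-value threshold, so applying both filters together is what removes short or weak matches.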

For each known drug–target complex, we defined the binding site as the subset of target residues having at least one atom within 3.5 Å of any atom of the drug. The drug-binding site residues in the predicted pathogen sequences were retrieved through a multiple sequence alignment (MSA) of the original target sequence with its homologues, generated with T-Coffee (Taly et al., 2011). The number and type of aligned residues were used to classify the local conservation of the binding site, both in terms of sequence coverage (percentage of binding site residues in the original target that could be aligned to the pathogen sequence) and identity (percentage of identical residues among the aligned binding site residues). Coverage and identity percentages were calculated separately for each pathogen sequence in the alignment. Only the 4215 pathogen proteins showing at least 80% coverage in their binding sites were considered further. Among these 4215 reliable putative targets, only 41 have a solved structure in the PDB. Homology modelling (Kopp and Schwede, 2004) was used to predict the structure of the remaining ones as follows: for each pathogen sequence, an MSA was generated using three iterations of HHblits (Remmert et al., 2012) (with default parameters) on the non-redundant UniProt database. The MSA was used as the HHsearch query to search for templates in the PDB70 database. We only selected templates with at least 40% sequence identity (and e-value < 10⁻⁵) with the pathogen query sequence. If more than one template was found, the one with the highest coverage of the pathogen sequence was selected. Models were generated using the Modeller software. Note that the best template used to build the model corresponds to the original structure in the drug–target complex in only 153 cases; in all other cases, the best template was a different structure.
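The binding site definition and the two conservation measures can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: atoms are given as plain coordinate tuples, the two alignment rows are taken from the same MSA, and the function names, residue numbers and sequences are hypothetical. The 3.5 Å cutoff and the definitions of coverage and identity follow the text.

```python
import math

CUTOFF = 3.5  # Å, distance cutoff stated in the text

def binding_site(protein_atoms, ligand_atoms, cutoff=CUTOFF):
    """Residues with at least one atom within `cutoff` of any ligand atom.

    protein_atoms: list of (residue_id, (x, y, z)); ligand_atoms: list of (x, y, z).
    """
    site = set()
    for res_id, coord in protein_atoms:
        if res_id in site:
            continue  # residue already selected through another atom
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            site.add(res_id)
    return sorted(site)

def site_conservation(target_aln, pathogen_aln, site_positions):
    """Return (coverage %, identity %) for one pathogen sequence.

    target_aln, pathogen_aln: two rows of the same alignment ('-' marks a gap).
    site_positions: 1-based binding site positions in the ungapped target.
    Coverage: share of site residues aligned to a pathogen residue.
    Identity: share of identical residues among the aligned site residues.
    """
    wanted = set(site_positions)
    aligned = identical = 0
    pos = 0
    for t, p in zip(target_aln, pathogen_aln):
        if t == "-":
            continue  # insertion in the pathogen, no target position consumed
        pos += 1
        if pos in wanted and p != "-":
            aligned += 1
            if p == t:
                identical += 1
    coverage = 100.0 * aligned / len(wanted) if wanted else 0.0
    identity = 100.0 * identical / aligned if aligned else 0.0
    return coverage, identity

# Hypothetical coordinates: residues 10 and 11 have an atom within 3.5 Å.
atoms = [(10, (0.0, 0.0, 0.0)), (10, (1.5, 0.0, 0.0)),
         (11, (3.0, 0.0, 0.0)), (12, (9.0, 0.0, 0.0))]
ligand = [(0.0, 3.0, 0.0), (5.0, 0.0, 0.0)]
site = binding_site(atoms, ligand)

# Hypothetical alignment rows and binding site positions.
coverage, identity = site_conservation("MKT-AILGV", "MRTSA-LGI", [2, 3, 5, 6, 8])
print(site, coverage, identity)
```

In the alignment example, four of the five binding site positions are aligned to a pathogen residue and two of those four are identical, so the sequence would just meet the 80% coverage threshold.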

The binding site residues of the original complex and of the predicted target were structurally superimposed using the LGA software (Zemla, 2003). Subsequently, the ligands were transferred into the structures or models of those pathogen proteins that could be successfully superimposed on the known target at a distance of <5 Å. Binding sites in the modelled structures were analysed for the occurrence of nearby insertions/deletions. Such cases are highlighted in the TiPs database search output, so that users can assess the likelihood that an insertion or deletion affects the conformation of the binding site.
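One simple way to flag insertions or deletions near a binding site is to look for gap columns in the modelling alignment that fall close to binding site columns. The sketch below makes that idea concrete; the text does not define "nearby", so the sequence window of five columns, the function name `indels_near_site` and the example sequences are all assumptions made for illustration.

```python
def indels_near_site(template_aln, query_aln, site_columns, window=5):
    """Return alignment columns holding a gap close to the binding site.

    template_aln, query_aln: the two rows of the alignment used for modelling.
    site_columns: 0-based alignment columns of the binding site residues.
    window: hypothetical neighbourhood size, in alignment columns.
    """
    gap_columns = [
        i for i, (t, q) in enumerate(zip(template_aln, query_aln))
        if t == "-" or q == "-"
    ]
    return [
        c for c in gap_columns
        if any(abs(c - s) <= window for s in site_columns)
    ]

# Hypothetical alignment: gaps at columns 3, 7 and 8; binding site at 10 and 11.
template = "ACDEFGH--IKLMNPQ"
query    = "ACD-FGHRSIKLMNPQ"
flagged = indels_near_site(template, query, [10, 11])
print(flagged)
```

A spatial criterion on the modelled structure would be an equally plausible reading of "nearby"; the column window is used here only because it needs nothing beyond the alignment.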

3 RESULTS

TiPs currently contains 4071 candidate pathogen target structures involved in 5153 different drug–target complexes in 150 pathogens. All entries are thoroughly annotated with both sequence and functional information. The database can be queried by organism name (genus or species name), protein family or function (EC number, GO terms and Pfam), as well as UniProt ID. The query returns a sortable table providing information about both known and predicted drug–target pairs, with links to specific information on the drug(s) (physicochemical properties, structure, indication and side effects) and the target(s) [UniProt annotation and PDB structure(s)], and to tools for visually analysing or downloading their 3D complexes. Ligplot (Laskowski and Swindells, 2011) drawings of both the known and inferred binding sites in complex with the drug are available as well (Fig. 1).

Fig. 1.

The figure shows the results of an ‘all pathogens’ query filtered by the ‘ATP binding’ GO term in the TiPs database. The output table lists all putative pathogen targets. Each table row reports the known and predicted target UniProt IDs, their overall sequence identity, their binding site identity and rmsd, whether there are clashes between the known drug and the predicted target, and whether there are insertions or deletions near the binding site in the alignment used to model the protein. For each hit, the system also shows details of the structure(s) and the binding site(s) in a Jmol window and the corresponding Ligplot drawings.

ACKNOWLEDGEMENT

The authors are grateful to all members of the group for useful suggestions.

Funding: This work was supported by the King Abdullah University of Science and Technology (KAUST), Award No. KUK-I1-012-43, PRIN 20108XYHJS and FIRB RBIN06E9Z8_005.

Conflict of Interest: none declared.

REFERENCES

  1. Aguero F, et al. Genomic-scale prioritization of drug targets: the TDR targets database. Nat. Rev. Drug Discov. 2008;7:900–907. doi: 10.1038/nrd2684.
  2. Berman HM, et al. The protein data bank at 40: reflecting on the past to prepare for the future. Structure. 2012;20:391–396. doi: 10.1016/j.str.2012.01.010.
  3. Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421. doi: 10.1186/1471-2105-10-421.
  4. Hopkins AL, et al. Rapid analysis of pharmacology for infectious diseases. Curr. Top. Med. Chem. 2011;11:1292–1300. doi: 10.2174/156802611795429130.
  5. Kinnings SL, et al. The Mycobacterium tuberculosis drugome and its polypharmacological implications. PLoS Comput. Biol. 2010;6:e1000976. doi: 10.1371/journal.pcbi.1000976.
  6. Kopp J, Schwede T. Automated protein structure homology modeling: a progress report. Pharmacogenomics. 2004;5:405–416. doi: 10.1517/14622416.5.4.405.
  7. Laskowski RA, Swindells MB. LigPlot+: multiple ligand-protein interaction diagrams for drug discovery. J. Chem. Inf. Model. 2011;51:2778–2786. doi: 10.1021/ci200227u.
  8. Lepore R, et al. Identification of the Schistosoma mansoni molecular target for the antimalarial drug artemether. J. Chem. Inf. Model. 2011;51:3005–3016. doi: 10.1021/ci2001764.
  9. Orti L, et al. A kernel for open source drug discovery in tropical diseases. PLoS Negl. Trop. Dis. 2009;3:e418. doi: 10.1371/journal.pntd.0000418.
  10. Paolini GV, et al. Global mapping of pharmacological space. Nat. Biotechnol. 2006;24:805–815. doi: 10.1038/nbt1228.
  11. Remmert M, et al. HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat. Methods. 2012;9:173–175. doi: 10.1038/nmeth.1818.
  12. Taly JF, et al. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures. Nat. Protoc. 2011;6:1669–1682. doi: 10.1038/nprot.2011.393.
  13. Zemla A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Res. 2003;31:3370–3374. doi: 10.1093/nar/gkg571.

Articles from Bioinformatics are provided here courtesy of Oxford University Press
