Author manuscript; available in PMC: 2011 Feb 15.
Published in final edited form as: Bioorg Med Chem Lett. 2010 Jan 11;20(4):1338–1343. doi: 10.1016/j.bmcl.2010.01.017

The Privileged Chemical Space Predictor (PCSP): A computer program that identifies privileged chemical space from screens of modularly assembled chemical libraries

Steven J Seedhouse 1, Lucas P Labuda 1, Matthew D Disney 1,*
PMCID: PMC2825671  NIHMSID: NIHMS169869  PMID: 20097562

Abstract

Modularly assembled combinatorial libraries are often used to identify ligands that bind to and modulate the function of a protein or a nucleic acid. Much of the data from screening these compounds, however, is not efficiently utilized to define structure-activity relationships (SAR). If SAR data are accurately constructed, it can enable the design of more potent binders. Herein, we describe a computer program called Privileged Chemical Space Predictor (PCSP) that statistically determines SAR from high-throughput screening (HTS) data and then identifies features in small molecules that predispose them for binding a target. Features are scored for statistical significance and can be utilized to design improved second generation compounds or more target-focused libraries. The program’s utility is demonstrated through analysis of a modularly assembled peptoid library that was screened for binding to and inhibiting a group I intron RNA from the fungal pathogen Candida albicans.


Combinatorial chemistry is commonly used to discover chemical genetics probes or therapeutics. The major advantage of combinatorial synthesis and screening is that it allows for the efficient synthesis and testing of large ligand libraries.1 In synthesizing a large library, however, one must compromise between how focused the library is (target oriented synthesis, TOS) and how diverse it is (diversity oriented synthesis, DOS).2–4 That is, does one test a narrowly focused landscape of chemical space such that the best ligand within this limited window is not overlooked, or should one probe vast landscapes of chemical space in the hope of identifying novel ligands? In the former case, novel binders may be missed, while in the latter case it is likely that the best inhibitor would not be identified. This trade-off is inevitable, but careful library design and analysis of screening data can mitigate it.

One approach that could merge these screening philosophies while also mitigating their drawbacks is to statistically analyze features in binders from a screen. This analysis would define privileged chemical space for a target and could be used to guide the rational design of second generation binders. Such an approach is most easily applied to modularly assembled libraries of ligands such as peptoids, peptides or carbohydrates,5–9 because each module and its relative positioning to other modules can be statistically analyzed to determine features that afford a potent ligand.

Small molecule-RNA interactions are one area where there is an underdeveloped knowledge base regarding the types of chemical ligands that are biased for binding a target10, although there continues to be progress in this area.11,12 Previously, a modularly assembled library of RNA-focused peptoids was designed, synthesized, and tested via microarray for binding to an essential group I ribozyme from the pathogenic fungus Candida albicans (Figure 1).13 In that report, a library of peptoids was constructed with nine different monomers assembled across three positions; only 14% of the total possible number of library members were synthesized and tested. The compound library was screened by microarray14 and manual statistical analysis of the binders identified features in ligands that predispose them for binding to and inhibiting the group I intron. This analysis allowed for the design of improved ligands that were not tested in the original screen but would have been members of a comprehensive library. Such manual analysis, however, is not practical with larger libraries.

Figure 1.


Example of a rational “ground-up” approach towards library design: a) building blocks are chosen and functionalized for combinatorial peptoid synthesis; the corresponding two-letter abbreviations are shown, b) privileged building blocks are modularly assembled onto a peptoid scaffold with an azide linker, c) compounds are immobilized onto a microarray surface via the azide linker using a Huisgen 1,3-dipolar cycloaddition reaction for high-throughput screening.

In order to streamline analysis of SAR data, a computational approach was developed and is described herein in which HTS data are statistically analyzed to identify privileged RNA-binding space. The Windows-based program that implements this approach is called Privileged Chemical Space Predictor (PCSP). The output of the program is an analysis of the features in the compounds that are predisposed for binding and inhibiting a target, or privileged chemical space. The PCSP program can be downloaded free of charge at http://www.nsm.buffalo.edu/Research/rna.

The user interface of PCSP was designed to provide sufficient control over data analysis and data output (Figure 2). First, the user can define a two-letter nomenclature for each building block of a modularly assembled ligand. Examples using the peptoid library described above are shown in Figure 1B (Bl, Ag, Tr, etc.). Next, data are uploaded into PCSP from a standard text file that contains columns with the identity of each compound and its signal for binding to, or its potency for inhibiting, a target. PCSP normalizes the binding or inhibition values and calculates statistical trends once a cut-off is assigned to score a ligand as a positive hit. For example, in our studies using microarrays, we consider a ligand to be a binder if it gives ≥20% of the signal relative to the best binder on an array. PCSP then determines statistical trends according to Z-scores that correspond to a confidence level >95%. Z-scores are calculated from Equation 1 (ref. 15):

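$$Z_{\mathrm{obs}} = \frac{p_1 - p_2}{\sqrt{\Phi(1-\Phi)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \qquad \Phi = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \tag{1}$$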

where n1 is the number of compounds containing the trend, n2 is the number of compounds without the trend, p1 is the proportion of binders amongst a population of compounds with a specific trend, p2 is the proportion of binders amongst compounds that do not contain the trend, and Φ is the pooled sample proportion.
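As an illustration of Equation 1, the following Python sketch normalizes a set of signals, applies the 20% cut-off described above, and computes Zobs for a single trend. It is not the PCSP source code: the compound signals are invented for illustration, and the names `raw_signal`, `normalize` and `z_obs` are hypothetical.

```python
from math import sqrt

# Hypothetical raw array signals (arbitrary units), invented for illustration.
raw_signal = {
    "Tr-Tr-Ii": 1000, "Tr-Ag-Pg": 600, "Ag-Tr-Ii": 400, "Tr-Bl-Ag": 100,
    "Bl-Ag-Ss": 50, "Ag-Bl-Pp": 150, "Ss-Pp-Ag": 300, "Pp-Ss-Bz": 100,
}


def normalize(signals):
    """Scale signals so that the best binder has a relative signal of 1.0."""
    best = max(signals.values())
    return {name: value / best for name, value in signals.items()}


def z_obs(is_hit, has_trend):
    """Two-proportion Z-score of Equation 1 for one trend.

    is_hit maps compound name -> True/False; has_trend is a predicate on the name.
    """
    with_trend = [hit for name, hit in is_hit.items() if has_trend(name)]
    without_trend = [hit for name, hit in is_hit.items() if not has_trend(name)]
    n1, n2 = len(with_trend), len(without_trend)
    if n1 == 0 or n2 == 0:
        raise ValueError("trend must be present in some compounds and absent from others")
    p1 = sum(with_trend) / n1        # proportion of binders with the trend
    p2 = sum(without_trend) / n2     # proportion of binders without the trend
    phi = (n1 * p1 + n2 * p2) / (n1 + n2)  # pooled sample proportion
    if phi in (0.0, 1.0):
        raise ValueError("Z is undefined when all or none of the compounds are hits")
    return (p1 - p2) / sqrt(phi * (1 - phi) * (1 / n1 + 1 / n2))


relative = normalize(raw_signal)
# A compound is scored as a binder at >= 20% of the best signal.
is_hit = {name: value >= 0.20 for name, value in relative.items()}
z = z_obs(is_hit, lambda name: "Tr" in name.split("-"))
print(f"Z_obs for 'contains Tr' = {z:.2f}")
```

In this toy data set three of the four Tr-containing compounds and one of the four remaining compounds are binders, so Zobs = 0.5/√0.125 ≈ 1.41. That is below the 1.96 needed for a two-tailed 95% confidence level, which is expected for so few compounds.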

Figure 2.


PCSP User Interface showing the results from the analysis of an uploaded data file and a trend query.

Z-scores are easily converted to p-values,16 which provide a direct measure of how strongly the data support the associated trend as privileged chemical space. The null hypothesis for this system is that if specific building blocks and their relative arrangement on peptoids have no particular bias for binding to a target, then populations of binders and non-binders should be similar in composition. A two-tailed p-value represents the probability that the observed proportion of a specific trend could occur if in fact there is no such bias. Therefore, a trend with an associated two-tailed p-value of <0.05 allows the null hypothesis to be rejected with >95% confidence.
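The conversion from a Z-score to a two-tailed p-value can also be done numerically instead of with a printed table. The sketch below uses the complementary error function from the Python standard library; the function name `two_tailed_p` is hypothetical and this is not necessarily how PCSP performs the conversion.

```python
from math import erfc, sqrt


def two_tailed_p(z):
    """Two-tailed p-value of a Z-score under the standard normal distribution.

    P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    """
    return erfc(abs(z) / sqrt(2))


# Z-scores quoted in the text for the C. albicans group I intron screen.
for z in (3.66, 3.19, 3.18, 2.78, 2.52):
    print(f"Z = {z:.2f}  two-tailed p = {two_tailed_p(z):.4f}")
```

The printed values can be compared directly with the two-tailed p-values quoted for these trends in the text.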

PCSP also provides weighted scores for each ligand binder. This value is calculated using a novel formula (Equation 2) that is analogous to that used to determine statistical Z-scores. In this weighted function, each ligand is normalized for its binding signal on an array or any other measured value (e.g. IC50). In this way, not only populations that statistically produce binders are taken into consideration, but also the relative affinity or potency of those binders is considered.

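$$Z_{\mathrm{weighted}} = \frac{\gamma_1 - \gamma_2}{\sqrt{\Phi(1-\Phi)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \qquad \Phi = \frac{n_1 \gamma_1 + n_2 \gamma_2}{n_1 + n_2} \tag{2}$$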

where n1 is the number of compounds containing the trend, n2 is the number of compounds without the trend, γ1 is the average relative binding signal amongst a population of compounds with a specific trend, γ2 is the average relative binding signal amongst those which do not contain that trend, and Φ is the pooled average relative binding signal, which takes the place of the pooled sample proportion in Equation 1.
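Equation 2 can be sketched in the same way as Equation 1, replacing the proportion of binders with the average relative signal. The normalized signals below are invented for illustration, the name `z_weighted` is hypothetical, and the sketch assumes signals already scaled to the range 0 to 1 so that Φ(1 − Φ) is non-negative.

```python
from math import sqrt

# Hypothetical normalized binding signals (best binder = 1.0), invented for illustration.
relative_signal = {
    "Tr-Tr-Ii": 1.00, "Tr-Ag-Pg": 0.60, "Ag-Tr-Ii": 0.40, "Tr-Bl-Ag": 0.10,
    "Bl-Ag-Ss": 0.05, "Ag-Bl-Pp": 0.15, "Ss-Pp-Ag": 0.30, "Pp-Ss-Bz": 0.10,
}


def z_weighted(signals, has_trend):
    """Weighted score of Equation 2 for one trend.

    signals maps compound name -> relative signal in [0, 1].
    """
    with_trend = [s for name, s in signals.items() if has_trend(name)]
    without_trend = [s for name, s in signals.items() if not has_trend(name)]
    n1, n2 = len(with_trend), len(without_trend)
    if n1 == 0 or n2 == 0:
        raise ValueError("trend must be present in some compounds and absent from others")
    g1 = sum(with_trend) / n1        # gamma_1: mean relative signal with the trend
    g2 = sum(without_trend) / n2     # gamma_2: mean relative signal without the trend
    phi = (n1 * g1 + n2 * g2) / (n1 + n2)  # pooled mean relative signal
    if not 0.0 < phi < 1.0:
        raise ValueError("score is undefined unless the pooled mean lies strictly between 0 and 1")
    return (g1 - g2) / sqrt(phi * (1 - phi) * (1 / n1 + 1 / n2))


zw = z_weighted(relative_signal, lambda name: "Tr" in name.split("-"))
print(f"Z_weighted for 'contains Tr' = {zw:.2f}")
```

Unlike Zobs, this score changes if a binder's signal changes even when the hit/non-hit classification stays the same, which is how the relative affinity of the binders enters the analysis.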

To validate the effectiveness of PCSP, data from the previously reported microarray screen of modularly assembled ligands and the C. albicans group I intron13 were loaded into PCSP and analyzed. Based on the raw microarray signals, PCSP identified numerous statistically significant trends (>95% confidence level) for composition of the ligands. A schematic of the analysis performed by PCSP is shown in Figure 3. Three classes of trends are revealed by PCSP that describe chemical space that predisposes a compound for binding. These classes include the occurrence or absence of particular modules, the position of each module, and the positioning of modules relative to each other.
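The three classes of trends can be made concrete with a short sketch that lists every trend a single three-position ligand belongs to. This is an assumed encoding, not the PCSP implementation: the function name `trends` is hypothetical, and including non-adjacent pairs such as Tr-xx-Ii among the relative-position trends is an assumption, since only adjacent pairs are reported in the results.

```python
from itertools import combinations

# The nine two-letter building block abbreviations that appear in Table 1.
BLOCKS = ["Ag", "Bl", "Bz", "Ii", "Pg", "Pp", "Ss", "Tp", "Tr"]


def trends(ligand, blocks=BLOCKS):
    """Return the set of trend labels that a ligand such as 'Tr-Tr-Ii' satisfies."""
    modules = ligand.split("-")
    found = set()

    # Class 1: number of occurrences of each building block, including zero.
    for block in blocks:
        found.add(f"{modules.count(block)} {block}")

    # Class 2: position of an individual building block ("xx" = any block).
    for i, block in enumerate(modules):
        pattern = ["xx"] * len(modules)
        pattern[i] = block
        found.add("-".join(pattern))

    # Class 3: positions of two building blocks relative to each other.
    # Assumption: every pair of positions is considered, adjacent or not.
    for i, j in combinations(range(len(modules)), 2):
        pattern = ["xx"] * len(modules)
        pattern[i], pattern[j] = modules[i], modules[j]
        found.add("-".join(pattern))

    return found


for label in sorted(trends("Tr-Tr-Ii")):
    print(label)
```

Each label returned here defines one partition of the library into compounds with and without the trend, which is the input needed to compute Zobs and Zweighted for that trend.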

Figure 3.


Schematic of PCSP function: a) each ligand, described by its building blocks’ abbreviations, is analyzed individually by PCSP, b) the composition of each binder and non-binder is recorded for statistical analysis of the two populations according to their composition, c) the composition of each ligand is scaled to its affinity as determined by a high-throughput screen and used to calculate weighted scores for each trend, and d) privileged chemical space is predicted according to the statistical trends found amongst binders and the weighted score.

All trends identified by PCSP with Zobs corresponding to >95% confidence level and a Zweighted >2 are listed in Table 1. The most significant trends for building block composition included compounds that contain exactly one Tr (Zobs = 3.18, two-tailed p-value = 0.0015), one Pg (Zobs = 3.19, two-tailed p-value = 0.0014), or 0 Bl (Zobs = 3.66, two-tailed p-value = 0.0003). Also, the weighted formula revealed that in addition to those peptoids containing 0 Bl building blocks, compounds containing two Tr building blocks displayed unusually high binding signals. Thus, for future design of peptoids targeting this RNA, the Bl building block should be avoided while the Tr building block should be included.

Table 1.

The privileged chemical space within a modularly assembled peptoid library identified for the C. albicans group I intron by PCSP. [a]

| Trend (by Zobs) | Zobs | Normalized Binding Signal | IC50 (µM) | Trend (by Zweighted) | Zweighted | Normalized Binding Signal | IC50 (µM) |
|---|---|---|---|---|---|---|---|
| Number of Building Blocks | | | | | | | |
| 0 Bl | 3.66 | | | 0 Bl | 10.10 | | |
| 1 Pg | 3.19 | 0.21–0.55 | 399–1733 | 2 Tr | 9.07 | 1.0 | 158 |
| 1 Tr | 3.18 | 0.32–0.76 | 150–817 | 0 Ss | 7.79 | | |
| 0 Pp | 2.23 | | | 0 Pp | 7.60 | | |
| | | | | 0 Ag | 7.16 | | |
| | | | | 1 Tr | 7.05 | 0.32–0.76 | 150–817 |
| | | | | 0 Bz | 6.61 | | |
| | | | | 1 Pg | 4.80 | 0.21–0.55 | 399–1733 |
| | | | | 0 Tp | 4.67 | | |
| | | | | 1 Ii | 4.43 | 0.32–1.0 | 158–1733 |
| | | | | 2 Ii | 4.34 | 0.21–0.76 | 157–2200 |
| | | | | 1 Tp | 3.98 | 0.32–0.61 | 150–2200 |
| | | | | 0 Pg | 3.65 | | |
| Position of Individual Building Blocks [b] | | | | | | | |
| xx-xx-Pg | 3.19 | 0.21–0.55 | 399–1733 | Tr-xx-xx | 8.99 | 0.32–1.0 | 150–158 |
| Tr-xx-xx | 2.78 | 0.32–1.0 | 150–158 | xx-Tr-xx | 7.65 | 0.38–1.0 | 158–817 |
| xx-Tr-xx | 2.52 | 0.38–1.0 | 158–817 | xx-xx-Ii | 6.04 | 0.32–1.0 | 150–2200 |
| | | | | xx-xx-Pg | 4.74 | 0.21–0.55 | 399–1733 |
| | | | | Ii-xx-xx | 2.32 | 0.21–0.55 | 399–2200 |
| Position of Building Blocks Relative to Each Other [b] | | | | | | | |
| xx-Tr-Ii | 4.52 | 0.38–1.0 | 158–817 | xx-Bz-Pg | 3.23 | 0.55 | - |

[a] Only Zobs corresponding to a >95% confidence level and Zweighted > 2 are shown.

[b] “xx” denotes any building block.

PCSP also determined that certain building blocks are favorable at specific positions with >95% confidence: xx-xx-Pg (Zobs = 3.19, two-tailed p-value = 0.0014), Tr-xx-xx (Zobs = 2.78, two-tailed p-value = 0.0054), and xx-Tr-xx (Zobs = 2.52, two-tailed p-value = 0.0117), where xx denotes any building block. The three most relevant weighted trends of this type were Tr-xx-xx, xx-Tr-xx, and xx-xx-Ii. In addition to the positioning of single building blocks, PCSP also analyzes the positions of two ligand modules relative to one another; of these, xx-Tr-Ii (Zobs = 4.52, two-tailed p-value <0.0001) is statistically significant. The weighted formula predicted that xx-Bz-Pg may also be a good combination of building blocks for high affinity binding.

In the previous report,13 new compounds were synthesized based on manual statistical analysis of the screening data. Two second generation compounds, Tr-Tr-Pg and Tr-Tp-Pg, were designed by replacing the building block in the third position of potent first generation inhibitors (Tr-Tr-Ii and Tr-Tp-Ii) with Pg (Table 1, two-tailed p-value = 0.0014). Both Tr-Tr-Pg and Tr-Tp-Pg are more potent than the respective parent compound. Other second generation compounds with four points of diversity were synthesized using the results of statistical analysis, which determined with the highest confidence level that the presence of Pg and Tr building blocks were important features for binding to the C. albicans group I intron. Therefore, Pg was placed in the 4th variable position as a side chain element on already potent ligands. This addition provided the best inhibitors, with compound Tr-Tr-Tr-Pg being the most potent of all compounds studied (~5-fold more potent than all first generation inhibitors). Interestingly, the presence of two Tr building blocks had the highest Zweighted score of any trend other than the absence of Bl (Table 1, 9.07 versus 10.10). The presence of Tr in the first position has a Zweighted score of 8.99 while Tr in the second position has a Zweighted score of 7.65 (Table 1). (It should be noted that Tr-Tr-Tr constituted only one member of the 109 peptoids screened in the study. Therefore, statistical analysis cannot meaningfully assign Zobs or Zweighted values. In contrast, multiple members of the library contained one or two Tr building blocks in different positions.) The presence of one Pg building block also represents privileged chemical space and has a Zweighted score of 4.80 (Table 1).

Taken together, the first and second generation structures and their relative potencies correspond directly to the statistical analysis performed by PCSP (Figure 4 and Table 1), indicating its potential utility in future RNA-ligand or protein-ligand screens of any scale. For example, one of the best first generation inhibitors, Tr-Tr-Ii, also has the highest binding signal on the array. This compound is composed of statistically significant trends with large Zobs and Zweighted values. Two Tr building blocks correspond to a Zweighted value of 9.07 while one Ii has a Zweighted value of 4.43. In addition, this compound contains trends that depend on the position of individual building blocks or the positions of the building blocks relative to each other. PCSP identified the following positional trends: Tr-xx-xx (Zweighted = 8.99), xx-Tr-xx (Zweighted = 7.65), xx-xx-Ii (Zweighted = 6.04), and xx-Tr-Ii (Zobs = 4.52). The program also identified chemical space that should be avoided. The absence of Bl building blocks has a Zweighted value of 10.10. Interestingly, Tr-Tr-Bl does not bind to the C. albicans group I ribozyme when displayed on the microarray surface.13 Another compound that contains Bl, Pp-Bl-Bz, does not bind the group I ribozyme and is a poor inhibitor with an IC50 >5 mM. This compound has three features that are selected against according to their Zweighted values: 0 Pp building blocks has a Zweighted value of 7.60; 0 Bl building blocks has a Zweighted value of 10.10; and 0 Bz building blocks has a Zweighted value of 6.61. Therefore, PCSP successfully identified both positive and negative features for inhibition of the C. albicans ribozyme by statistical analysis of the binding signals from the microarray.

Figure 4.


Statistical interpretation from PCSP is used to design 2nd generation compounds with increased potency by modifying initial hits to contain the identified trends.

In summary, the PCSP program was developed to statistically analyze a modularly assembled library to identify features that predispose ligands for binding RNA. The program was able to quickly determine significant trends from a published microarray study on the binding of RNA-focused peptoids to a validated RNA target.13 PCSP should also be generally useful for application to other biomolecular targets besides RNA. For example, modularly assembled peptoid libraries displayed on microarrays or beads have been screened to identify high affinity ligands that bind proteins.17–19 Moreover, PCSP could be used to probe SAR data and define privileged diversity element space and the privileged relative positioning of diversity elements in any combinatorial library. The computational approach using PCSP expedites analysis of SAR data, can reveal statistically relevant trends that might otherwise be overlooked, and can be easily applied to screens of larger modularly assembled ligand libraries. Such studies should streamline analysis of HTS and SAR data to establish more information on the types of modularly assembled ligands that are biased for binding RNA or other targets that are present in the genome.


Acknowledgments

We thank the National Institutes of Health for funding this work (R01 GM079235).

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Supplementary Data

Supplementary data associated with this article can be found, in the online version, at doi: (to be filled in). These data include the source code for the PCSP computer program.

References

  • 1. Geysen HM, Schoenen F, Wagner D, Wagner R. Nat. Rev. Drug Discovery. 2003;2:222. doi: 10.1038/nrd1035.
  • 2. Burke MD, Schreiber SL. Angew. Chem. Int. Ed. Engl. 2004;43:46. doi: 10.1002/anie.200300626.
  • 3. Fergus S, Bender A, Spring DR. Curr. Opin. Chem. Biol. 2005;9:304. doi: 10.1016/j.cbpa.2005.03.004.
  • 4. Schreiber SL. Science. 2000;287:1964. doi: 10.1126/science.287.5460.1964.
  • 5. Simon RJ, Kania RS, Zuckermann RN, Huebner VD, Jewell DA, Banville S, Ng S, Wang L, Rosenberg S, Marlowe CK, Spellmeyer DC, Tan RY, Frankel AD, Santi DV, Cohen FE, Bartlett PA. Proc. Natl. Acad. Sci. U. S. A. 1992;89:9367. doi: 10.1073/pnas.89.20.9367.
  • 6. Olivos HJ, Alluri PG, Reddy MM, Salony D, Kodadek T. Org. Lett. 2002;4:4057. doi: 10.1021/ol0267578.
  • 7. Liu Y, Palma AS, Feizi T. Biol. Chem. 2009;390:647. doi: 10.1515/BC.2009.071.
  • 8. Ratner DM, Adams EW, Disney MD, Seeberger PH. ChemBiochem. 2004;5:1375. doi: 10.1002/cbic.200400106.
  • 9. Winkler DF, Hilpert K, Brandt O, Hancock RE. Methods Mol. Biol. 2009;570:157. doi: 10.1007/978-1-60327-394-7_5.
  • 10. Thomas JR, Hergenrother PJ. Chem. Rev. 2008;108:1171. doi: 10.1021/cr0681546.
  • 11. Parsons J, Castaldi MP, Dutta S, Dibrov SM, Wyles DL, Hermann T. Nat. Chem. Biol. 2009;5:823. doi: 10.1038/nchembio.217.
  • 12. Marcheschi RJ, Mouzakis KD, Butcher SE. ACS Chem. Biol. 2009;4:844. doi: 10.1021/cb900167m.
  • 13. Labuda LP, Pushechnikov A, Disney MD. ACS Chem. Biol. 2009;4:299. doi: 10.1021/cb800313m.
  • 14. MacBeath G, Koehler AN, Schreiber SL. J. Am. Chem. Soc. 1999;121:7967.
  • 15. Weiss NA, Hassett MJ. Introductory statistics. Reading, Mass: Addison-Wesley Pub. Co.; 1982.
  • 16. To convert Zobs to a two-tailed p-value, standard statistics tables can be used. For example, see Table 1 in (15).
  • 17. Lim HS, Reddy MM, Xiao X, Wilson J, Wilson R, Connell S, Kodadek T. Bioorg. Med. Chem. Lett. 2009;19:3866. doi: 10.1016/j.bmcl.2009.03.153.
  • 18. Udugamasooriya DG, Dineen SP, Brekken RA, Kodadek T. J. Am. Chem. Soc. 2008;130:5744. doi: 10.1021/ja711193x.
  • 19. Zuckermann RN, Kodadek T. Curr. Opin. Mol. Ther. 2009;11:299.
