Abstract
Previously, we screened a series of arylcarboxylic acid hydrazide derivatives for their ability to induce macrophage tumor necrosis factor α (TNF-α) production and identified 16 such compounds. In the present study, we evaluated 23 additional arylcarboxylic acid hydrazides and found that seven of these compounds also induced macrophage TNF-α production, representing novel compounds with this activity. The total set of active compounds was then used for computational structure–activity relationship (SAR) analysis to further optimize lead molecules. A sequence of 1) linear discriminant analysis, 2) classification tree analysis with linear combination splits, and 3) classification tree analysis with univariate splits, all based on atom pair descriptors, led to the derivation of SAR rule-based algorithms with fitting accuracies of 96.5, 91.9, and 84.9%, respectively. The SAR rules obtained from classification tree analysis with univariate splits, which was based on only three atom pair descriptors, revealed that the main factors influencing agonist activity of arylcarboxylic acid hydrazide derivatives were the presence of a methyl or trifluoromethyl group in the benzene ring attached to the furan moiety, an alkoxy group in the aromatic ring near the methylenehydrazide linker, and two or more halogen atoms (chlorine or bromine) on one side of the dumbbell-shaped hydrazide molecule opposed by an aromatic moiety on the opposite side of the molecule. Thus, these rules represent a relatively simple classification approach for de novo design of small molecule inducers of macrophage TNF-α production.
Keywords: tumor necrosis factor α, macrophage, atom pairs, molecular descriptors, structure–activity relationship analysis
1. Introduction
Tumor necrosis factor α (TNF-α) is a key cytokine that contributes to immune and inflammatory reactions and is important for both innate and adaptive immunity.1 Currently, a significant effort is focused on the development of anti-TNF-α agents as therapeutics for treatment of chronic inflammatory conditions, such as rheumatoid arthritis and inflammatory bowel disease.2 However, TNF-α is also well known for its ability to induce apoptosis of tumor cells, resulting in tumor necrosis, and use of TNF-α in cancer treatment has been pursued.3 Unfortunately, the clinical use of TNF-α has been limited due to its proinflammatory activity.4 On the other hand, stimulation of endogenous TNF-α production is still considered a reasonable approach in tumor biotherapy, and several compounds have been found to induce TNF-α, inhibit tumor blood flow, and cause necrosis in experimental tumors.5 Indeed, a number of small molecule cytokine inducers have been identified and characterized for their ability to stimulate TNF-α production. For example, both natural and synthetic agents with antimicrobial and antitumor properties, such as imidazoquinolines and taxanes, have been shown to induce a broad range of cytokines in cell culture and/or in vivo.6,7 Recently, we identified several small molecule N-formyl peptide receptor agonists that potently induced TNF-α production in murine and human macrophages.8 Interestingly, these compounds all contained an arylcarboxylic acid hydrazide core structure, which is distinct from other known inducers of TNF-α production.
Our analysis of arylcarboxylic acid hydrazides showed that individual ring substituents had significant impact on the potency of these derivatives for inducing macrophage TNF-α production,8 suggesting that further structure–activity relationship (SAR) analysis of these compounds would contribute to our understanding of their mechanism of action and could lead to the development of additional compounds with enhanced efficacy. Indeed, SAR and quantitative SAR (QSAR) models have been instrumental in understanding molecular mechanisms of action of receptor agonists and antagonists, directing their design, and in virtual screening.9 To date, non-computational SAR analysis has been performed for a series of taxoids;10,11 however, there are currently no reported computational SAR models for small-molecule inducers of TNF-α production.
While a variety of molecular parameters can be used in the computational methods for (Q)SAR analysis,12,13 some of these parameters are complex physicochemical or geometrical descriptors whose calculation is associated with difficulties due to molecular flexibility and inadequate sampling of conformational space. In contrast, topological indices (i.e., 2D descriptors) obtained from the structural formula of a compound are very attractive because of their simplicity. Recently, we developed an improved approach to SAR methodology based on atom pair descriptors in combination with classical physicochemical and geometrical descriptors and showed that this methodology can detect specific combinations of substructure patterns that confer high or low inhibitory activity against neutrophil elastase.14 Here, we utilized a similar approach for computational SAR analysis of a large group of arylcarboxylic acid hydrazides, including our previously reported derivatives8 and several novel analogs identified here in further screening. These studies provide further optimization of these molecules as lead compounds that can induce macrophage TNF-α production and also provide clues to the molecular features required for agonist activity.
2. Results and Discussion
2.1. Identification of novel TNF-α inducers and selection of the molecular set
Previously, we screened a series of arylcarboxylic acid hydrazide derivatives for their ability to induce macrophage tumor necrosis factor α (TNF-α) production and found that 16 compounds induced production of modest-to-high levels of TNF-α by murine and human macrophages.8 Structures of these compounds and their activity, expressed as fold-increase (FI) in macrophage TNF-α production above solvent control, together with the inactive arylcarboxylic acid hydrazides that we evaluated previously, are shown in Table 1 (Compounds 1, 8–10, 23–50, and 52–82). FI was used to normalize the activity for experiment-to-experiment variations observed in background due to solvent (DMSO) alone. Variations in background activity are likely due to differences in batches of our cultured macrophages, as it is clear that the number of passages affects cell activity, and newer batches of cells exhibited much higher stimulated activity as well as much higher background activity. Since FI represents relative activity above background, use of FI values allowed us to compare results from a number of experiments regardless of background, and the average FI values from three independent experiments are provided.
Table 1.
Effect of arylcarboxylic acid hydrazide derivatives on macrophage TNF-α production
| A. (2-furyl)methylene-hydrazides of nicotinic acid | |||||||
|---|---|---|---|---|---|---|---|
![]() | |||||||
| Compound | R1 | R2 | R3 | R4 | R5 | R6 | FIa |
| 1 | H | H | H | Br | H | H | 50 |
| 2 | H | H | H | Cl | CH3 | H | 60 |
| 3 | H | Br | H | H | Cl | H | 25 |
| 4 | CH3 | H | H | Cl | CH3 | H | 21 |
| 5 | H | H | H | Cl | H | H | 17 |
| 6 | H | H | H | H | Cl | H | 15 |
| 7 | H | Br | H | Cl | CH3 | H | 10 |
| 8 | CH3 | H | H | CF3 | H | H | <5 |
| 9 | CH3 | H | Cl | H | Cl | Cl | <5 |
| 10 | H | H | H | COOH | OH | H | N.A. |
| 11 | H | H | H | H | Br | H | N.A. |
| 12 | CH3 | H | H | H | Cl | H | N.A. |
| 13 | CH3 | H | H | Cl | H | H | N.A. |
| 14 | H | H | Cl | H | Cl | Cl | N.A. |
| 15 | H | Br | Cl | H | Cl | H | N.A. |
| 16 | H | H | Cl | Cl | H | H | N.A. |
| 17 | H | H | Cl | H | H | H | N.A. |
| 18 | CH3 | H | H | Br | H | H | N.A. |
| 19 | CH3 | H | Cl | Cl | H | H | N.A. |
| 20 | CH3 | H | Cl | H | Cl | H | N.A. |
| 21 | CH3 | H | Cl | H | H | Cl | N.A. |
| 22 | CH3 | H | H | Cl | Cl | H | N.A. |
| B. (2-furyl)methylene-hydrazides of benzoic acid | ||||||||
|---|---|---|---|---|---|---|---|---|
![]() | ||||||||
| Compound | R1 | R2 | R3 | R4 | R5 | R6 | R7 | FI |
| 23 | H | F | H | H | CF3 | H | H | 35 |
| 24 | H | H | H | Cl | Cl | H | H | 8 |
| 25 | NO2 | H | H | H | CF3 | H | H | <5 |
| 26 | H | Cl | Cl | Cl | H | H | H | <5 |
| 27 | H | NO2 | H | Cl | H | H | Cl | <5 |
| 28 | I | H | Cl | H | CF3 | H | H | <5 |
| 29 | H | Br | H | H | CF3 | H | H | N.A. |
| 30 | H | H | NO2 | Cl | H | Cl | H | N.A. |
| 31 | H | H | OH | H | Cl | Cl | H | N.A. |
| 32 | H | NO2 | H | H | Cl | H | H | N.A. |
| 33 | H | OCH3 | H | H | COOH | Cl | H | N.A. |
| 34 | H | NO2 | H | H | Cl | OCH3 | H | N.A. |
| 35 | OH | H | H | H | H | NO2 | H | N.A. |
| 36 | H | t-butyl | H | H | H | NO2 | H | N.A. |
![]() | ||||||
|---|---|---|---|---|---|---|
| Compound | R1 | R2 | R3 | R4 | R5 | FI |
| 37 | H | Br | H | OH | H | N.A. |
| 38 | H | CH3 | H | H | CH3 | N.A. |
| 39 | H | H | H | H | H | N.A. |
| 40 | H | OH | H | OH | NO2 | N.A. |
| 41 | NO2 | H | NO2 | OH | NO2 | N.A. |
| 42 | H | Cl | H | H | H | N.A. |
| 43 | Br | H | H | OCH3 | CH3 | N.A. |
| 44 | H | Br | H | OCH3 | H | N.A. |
| 45 | H | ![]() | H | H | H | N.A. |
| 46 | H | H | H | H | Br | N.A. |
| 47 | H | H | F | H | H | N.A. |
| 48 | H | ![]() | H | H | H | N.A. |
| 49 | H | Cl | H | Cl | H | N.A. |
| 50 | H | H | H | F | I | N.A. |
| C. Other derivatives | ||
|---|---|---|
| N | Structure | FI |
| 51 | ![]() | 50 |
| 52 | ![]() | 35 |
| 53 | ![]() | 25 |
| 54 | ![]() | 7 |
| 55 | ![]() | <5 |
| 56 | ![]() | <5 |
| 57 | ![]() | <5 |
| 58 | ![]() | <5 |
| 59 | ![]() | N.A. |
| 60 | ![]() | N.A. |
| 61 | ![]() | N.A. |
| 62 | ![]() | N.A. |
| 63 | ![]() | N.A. |
| 64 | ![]() | N.A. |
| 65 | ![]() | N.A. |
| 66 | ![]() | N.A. |
| 67 | ![]() | N.A. |
| 68 | ![]() | N.A. |
| 69 | ![]() | N.A. |
| 70 | ![]() | N.A. |
| 71 | ![]() | N.A. |
| 72 | ![]() | N.A. |
| 73 | ![]() | N.A. |
| 74 | ![]() | N.A. |
| 75 | ![]() | N.A. |
| 76 | ![]() | N.A. |
| 77 | ![]() | N.A. |
| 78 | ![]() | N.A. |
| 79 | ![]() | N.A. |
| 80 | ![]() | N.A. |
| 81 | ![]() | N.A. |
| 82 | ![]() | N.A. |
| 83 | ![]() | N.A. |
| 84 | ![]() | N.A. |
| 85 | ![]() | N.A. |
| 86 | ![]() | N.A. |
Macrophage TNF-α production induced by 50 µM of the indicated compound is shown as fold-increase (FI) above response to vehicle (DMSO) control. Activity of Compounds 2–7, 11–22, 51, and 83–86 was evaluated in the present work. Data for Compounds 1, 8–10, 23–50, and 52–82 were from our previous report8.
To increase the molecular data set, we selected 23 additional arylcarboxylic acid hydrazide derivatives and evaluated their ability to stimulate TNF-α production. As shown in Table 1, we identified 7 additional novel compounds with varying levels of activity (Compounds 2–7 and 51). Derivatives of nicotinic acid (Compound 2) and isonicotinic acid (Compound 51) were the most active, inducing levels of TNF-α similar to those induced by control LPS (50 ng/ml) and by the most potent of our previously identified compounds (Figure 1). Activation of macrophage TNF-α production was not due to endotoxin contamination, since analysis of Compounds 2 and 51 for endotoxin using a limulus amebocyte lysate assay showed that these compounds contained no endotoxin (below detection limit; data not shown). Furthermore, treatment with the additional compounds, which included 7 active compounds (2–7 and 51) and 16 inactive compounds (11–22 and 83–86) from our set, had no effect on cell viability in J774.A1 macrophages, indicating lack of cytotoxicity at concentrations ≤50 µM (data not shown).
Figure 1.
Effect of the most potent arylcarboxylic acid hydrazides on macrophage TNF-α production. J774.A1 macrophages (2×10⁵ cells/well) were cultured in the presence of the indicated concentrations of Compound 2 (□), Compound 51 (■), or 50 ng/ml LPS (●) for 24 hr, and TNF-α was measured in the cell supernatants by ELISA. The data are presented as the fold-increase (FI) in TNF-α production above DMSO control and represent the mean±SD of three independent experiments with triplicate samples analyzed in each experiment.
In SAR studies a compound set under investigation is conventionally split into two or more classes (Active, Moderately active, Low-active, Non-active, etc.) rather than using individual activities. This allows formulation of more or less simple SAR classification rules, in contrast to a QSAR study where initial numerical values of activity are used (FI, for example). For SAR analysis here, the total set of the arylcarboxylic acid hydrazide derivatives (Compounds 1–86) was divided into two activity classes based on their experimentally-determined activity. Compounds that induced macrophage TNF-α production (FI≥2) were classified as “Active” (23 compounds), whereas inactive derivatives were placed in the non-active group labeled “NA” (63 compounds).
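The two-class split described above can be sketched in a few lines of Python. The FI ≥ 2 threshold comes from the text; the function name `activity_class` and the use of `None` for compounds reported as N.A. in Table 1 are illustrative assumptions, and the example FI values are taken from Table 1.

```python
from typing import Optional

FI_THRESHOLD = 2.0  # from the text: FI >= 2 is treated as "Active"


def activity_class(fi: Optional[float]) -> str:
    """Return "Active" or "NA" for a fold-increase value.

    None is an assumed encoding for compounds listed as N.A. in Table 1.
    """
    if fi is None:
        return "NA"
    return "Active" if fi >= FI_THRESHOLD else "NA"


# FI values for Compounds 1, 2, 24, and 10 as listed in Table 1.
examples = {1: 50.0, 2: 60.0, 24: 8.0, 10: None}
classes = {compound: activity_class(fi) for compound, fi in examples.items()}
print(classes)
```

Working with two classes rather than raw FI values is what allows the later models to be expressed as simple classification rules.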
2.2. Descriptors
Atom pairs were automatically generated from bond connectivity of the arylcarboxylic acid hydrazides and are specified in terms of types of the two atoms in a pair separated by the number of chemical bonds in the structural formula.15 As described previously,14 we used the atom type names from the MM+ force field, as implemented in HyperChem. According to this scheme, specific atom pairs are defined as T1_D_T2, where T1 and T2 are the atom types assigned by HyperChem, and D is the number of chemical bonds in the shortest path between the two atoms (see Experimental Section). HyperChem output in HIN file format was entered directly into our CHAIN program, which generated all possible atom pairs and frequencies of their occurrence in each of the 86 hydrazides. These frequencies were considered as values of the corresponding atom pair descriptors, and examples of atom pairs are shown in Figure 2. Note that atom pair descriptors are easily interpretable in terms of standard chemical formulae. For example, BR_11_CA indicates the simultaneous presence of a bromine atom and an aromatic ring on opposite sides of a molecule (see Figure 2). It should be noted that, although atom naming was taken from the MM+ force field, performing MM+ molecular mechanics optimization itself is not necessary because only bond connectivity, but not geometry, is important for the atom pair calculation.
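The T1_D_T2 scheme can be illustrated with a minimal Python sketch that counts atom pairs from bond connectivity alone. This is not the CHAIN program: the function names, the toy four-atom graph, and the alphabetical ordering of T1 and T2 (chosen because it matches names such as BR_11_CA and CA_12_CL in the text) are assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


def bond_distances(n_atoms, bonds, start):
    """Breadth-first search: number of bonds on the shortest path from `start`."""
    neighbours = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours[atom]:
            if nxt not in dist:
                dist[nxt] = dist[atom] + 1
                queue.append(nxt)
    return dist


def atom_pair_counts(atom_types, bonds):
    """Count T1_D_T2 descriptors for every pair of connected atoms."""
    n = len(atom_types)
    all_dist = [bond_distances(n, bonds, i) for i in range(n)]
    counts = Counter()
    for i, j in combinations(range(n), 2):
        if j not in all_dist[i]:
            continue  # atoms in disconnected fragments form no pair
        t1, t2 = sorted((atom_types[i], atom_types[j]))  # assumed ordering
        counts[f"{t1}_{all_dist[i][j]}_{t2}"] += 1
    return counts


# Hypothetical linear fragment BR-CA-CA-CL (not one of Compounds 1-86).
toy_counts = atom_pair_counts(["BR", "CA", "CA", "CL"], [(0, 1), (1, 2), (2, 3)])
print(toy_counts)
```

Because only the bond list is used, no 3D geometry or force-field optimization enters the calculation, which mirrors the point made in the paragraph above.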
Figure 2.
Examples of atom pair descriptors in selected active arylcarboxylic acid hydrazides. Atom pairs are depicted in red and indicated below the structure. Compound numbers correspond to those shown in Table 1.
In total, 836 unique atom pairs were generated for all 86 hydrazides, and a histogram of the number of atom pairs with different bond distances is presented in Figure 3A. Note that the histogram has two maxima at 5 and 10–11 chemical bonds, which is in agreement with the dumbbell shape of most compounds in our set. Indeed, all of the molecules contain two bulky moieties connected by the hydrazide linker. Hence, the relatively “short” atom pairs originated from the same moiety, as well as the much “longer” atom pairs representing atoms in the two different moieties prevail in the total number of 836 descriptors generated.
Figure 3.
Numbers of unique atom pairs in the set of arylcarboxylic acid hydrazides. The numbers are shown for each of the indicated bond distances initially generated for the 86 hydrazides (Panel A). Atom pairs subsequently included in the best LDA model are shown in Panel B.
2.3. Linear discriminant analysis
One of the most powerful pattern recognition techniques is linear discriminant analysis (LDA), and we recently applied it to SAR analysis of compounds with elastase inhibitory activity.14 Likewise, we used LDA here as a basic methodology for SAR classification of the 86 hydrazide derivatives. Taking into account that classical LDA is unable to handle as many as 836 descriptors for 86 compounds, we performed advanced LDA with the Forward Stepwise option available in STATISTICA 6.0. At each step, descriptors were successively included or excluded until no significant (p<0.05) improvement of the model was achieved. This procedure led to the selection of only 14 significant variables from the initially generated 836 descriptors. The following atom pairs were selected by Forward Stepwise LDA: C4_2_NA, C3_3_CL, C4_3_O2, CO_3_NA, CO_3_NO, C3_5_C4, C4_7_OF, BR_11_CA, CA_12_CL, CL_12_NA, BR_13_C4, C4_14_NO, BR_15_C4, and CL_15_O2. Use of the classification functions obtained with these pairs resulted in 95.3% correct classification: 20 of 23 active and 62 of 63 inactive hydrazides were correctly classified to their experimentally-determined activity. In addition, values of the 14 atom pairs selected were not mutually correlated with each other (r≤0.7), i.e. they can be regarded as independent variables.
In order to further decrease the number of descriptors, we performed LDA analysis with the Best Subset Search option, starting from the 14 atom pairs selected after the first run of the LDA procedure, and found that the best subset consisted of the 13 atom pairs listed above with C4_2_NA excluded. These variables provided the lowest misclassification error among all possible subsets of different sizes chosen from the 14 descriptors, and the SAR model obtained had an improved quality of classification: 96.5% of the compounds were classified correctly compared to their experimental activity (Tables 2 and 3). This LDA model can be presented by two classification functions F(Active) and F(NA):
| (Eq. 1) |
| (Eq. 2) |
Table 2.
Classification matrices for linear discriminant analysis (LDA) and classification tree analyses with linear combination splits (CTLCS) and univariate splits.
| Experimentally determined classification | LDA model: calculated Active | LDA model: calculated NA | LDA model: accuracy (%) | CTLCS model: calculated Active | CTLCS model: calculated NA | CTLCS model: accuracy (%) | Univariate splits: calculated Active | Univariate splits: calculated NA | Univariate splits: accuracy (%) |
|---|---|---|---|---|---|---|---|---|---|
| Active | **20** | 3 | 87.0 | **20** | 3 | 87.0 | **16** | 7 | 69.6 |
| NA | 0 | **63** | 100.0 | 4 | **59** | 93.7 | 6 | **57** | 90.5 |
| Total | 20 | 66 | 96.5 | 24 | 62 | 91.9 | 22 | 64 | 84.9 |
The number of compounds correctly classified by the model is indicated in bold.
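The accuracy columns of Table 2 follow directly from the classification counts. The short Python check below recomputes them; the dictionary layout and the function name `accuracies` are illustrative choices, while the counts are those given in Table 2.

```python
# For each model: experimental class -> (calculated Active, calculated NA), from Table 2.
matrices = {
    "LDA": {"Active": (20, 3), "NA": (0, 63)},
    "CTLCS": {"Active": (20, 3), "NA": (4, 59)},
    "Univariate tree": {"Active": (16, 7), "NA": (6, 57)},
}


def accuracies(matrix):
    """Per-class and total percentage of correctly classified compounds."""
    active_ok, active_missed = matrix["Active"]
    na_missed, na_ok = matrix["NA"]
    total = active_ok + active_missed + na_missed + na_ok
    return {
        "Active": 100.0 * active_ok / (active_ok + active_missed),
        "NA": 100.0 * na_ok / (na_missed + na_ok),
        "Total": 100.0 * (active_ok + na_ok) / total,
    }


for name, matrix in matrices.items():
    print(name, {k: round(v, 1) for k, v in accuracies(matrix).items()})
```

For example, the LDA model classifies 20 + 63 = 83 of 86 compounds correctly, which gives the reported 96.5%.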
Table 3.
Experimentally determined, SAR-calculated, and LOO-predicted classes of macrophage TNF-α inducing activity for all 86 arylcarboxylic acid hydrazide derivatives.
| Compound | Experimentally determined | LDA model: calculated | LDA model: LOO-predicted | CTLCS model: calculated | CTLCS model: LOO-predicted | C3_5_C4 (frequency) | CA_12_CL (frequency) | BR_11_CA (frequency) | Univariate splits: calculated |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Active | Active | Active | Active | Active | 0 | 0 | 1 → | ***NA*** |
| 2 | Active | Active | Active | Active | Active | 1 | | | Active |
| 3 | Active | Active | Active | Active | Active | 0 | 1 | 1 → | ***NA*** |
| 4 | Active | Active | Active | Active | Active | 1 | | | Active |
| 5 | Active | ***NA*** | ***NA*** | ***NA*** | ***NA*** | 0 | 2 | 0 → | ***NA*** |
| 6 | Active | ***NA*** | ***NA*** | ***NA*** | ***NA*** | 0 | 1 | 0 → | ***NA*** |
| 7 | Active | Active | Active | Active | Active | 1 | | | Active |
| 8 | Active | Active | Active | Active | Active | 1 | | | Active |
| 9 | Active | ***NA*** | ***NA*** | ***NA*** | ***NA*** | 0 | 4 | | Active |
| 10 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 11 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 12 | NA | NA | NA | NA | NA | 0 | 1 | 0 → | NA |
| 13 | NA | NA | NA | NA | NA | 0 | 2 | 0 → | NA |
| 14 | NA | NA | NA | NA | NA | 0 | 4 | | ***Active*** |
| 15 | NA | NA | ***Active*** | ***Active*** | ***Active*** | 0 | 2 | 1 → | NA |
| 16 | NA | NA | NA | NA | NA | 0 | 3 | 0 → | NA |
| 17 | NA | NA | NA | NA | NA | 0 | 1 | 0 → | NA |
| 18 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 19 | NA | NA | NA | NA | NA | 0 | 3 | 0 → | NA |
| 20 | NA | NA | NA | NA | NA | 0 | 2 | 0 → | NA |
| 21 | NA | NA | NA | NA | NA | 0 | 3 | 0 → | NA |
| 22 | NA | NA | NA | ***Active*** | ***Active*** | 0 | 3 | 0 → | NA |
| 23 | Active | Active | Active | Active | Active | 1 | | | Active |
| 24 | Active | Active | Active | Active | Active | 0 | 4 | | Active |
| 25 | Active | Active | Active | Active | Active | 1 | | | Active |
| 26 | Active | Active | Active | Active | Active | 0 | 5 | | Active |
| 27 | Active | Active | Active | Active | Active | 0 | 4 | | Active |
| 28 | Active | Active | Active | Active | Active | 1 | | | Active |
| 29 | NA | NA | NA | NA | NA | 1 | | | ***Active*** |
| 30 | NA | NA | ***Active*** | NA | ***Active*** | 0 | 3 | 0 → | NA |
| 31 | NA | NA | NA | NA | NA | 0 | 3 | 0 → | NA |
| 32 | NA | NA | NA | NA | NA | 0 | 2 | 0 → | NA |
| 33 | NA | NA | NA | NA | NA | 0 | 1 | 0 → | NA |
| 34 | NA | NA | NA | NA | NA | 0 | 2 | 0 → | NA |
| 35 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 36 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 37 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 38 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 39 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 40 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 41 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 42 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 43 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 44 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 45 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 46 | NA | NA | NA | ***Active*** | ***Active*** | 0 | 0 | 1 → | NA |
| 47 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 48 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 49 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 50 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 51 | Active | Active | Active | Active | Active | 1 | | | Active |
| 52 | Active | Active | Active | Active | Active | 0 | 2 | 0 → | ***NA*** |
| 53 | Active | Active | ***NA*** | Active | ***NA*** | 0 | 0 | 0 → | ***NA*** |
| 54 | Active | Active | Active | Active | Active | 1 | | | Active |
| 55 | Active | Active | ***NA*** | Active | ***NA*** | 0 | 3 | 0 → | ***NA*** |
| 56 | Active | Active | Active | Active | Active | 1 | | | Active |
| 57 | Active | Active | Active | Active | Active | 0 | 0 | 2 → | Active |
| 58 | Active | Active | Active | Active | Active | 0 | 0 | 2 → | Active |
| 59 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 60 | NA | NA | ***Active*** | NA | ***Active*** | 1 | | | ***Active*** |
| 61 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 62 | NA | NA | NA | NA | NA | 0 | 0 | 1 → | NA |
| 63 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 64 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 65 | NA | NA | NA | NA | NA | 0 | 0 | 2 → | ***Active*** |
| 66 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 67 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 68 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 69 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 70 | NA | NA | ***Active*** | NA | ***Active*** | 1 | | | ***Active*** |
| 71 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 72 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 73 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 74 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 75 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 76 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 77 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 78 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 79 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 80 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 81 | NA | NA | NA | NA | NA | 0 | 0 | 0 → | NA |
| 82 | NA | NA | NA | NA | ***Active*** | 0 | 1 | 0 → | NA |
| 83 | NA | NA | NA | NA | NA | 0 | 1 | 0 → | NA |
| 84 | NA | NA | NA | NA | NA | 0 | 3 | 0 → | NA |
| 85 | NA | NA | NA | NA | NA | 1 | | | ***Active*** |
| 86 | NA | NA | NA | ***Active*** | ***Active*** | 0 | 0 | 1 → | NA |
Incorrect classifications are indicated in bold italics. Arrows correspond to compound classification upon entering terminal nodes of the tree shown in Figure 5.
According to these equations, a compound will be classified as “Active” if the value of F(Active) > F(NA), and vice versa. The classifications observed and calculated by the LDA model for Compounds 1–86 are shown in Table 3, and values of all atom pair descriptors used in Equation 1 and Equation 2 are shown in Supplementary Table S1.
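The decision rule F(Active) > F(NA) can be sketched as follows. Each classification function is assumed here to have the usual LDA form of an intercept plus a weighted sum of descriptor values. The coefficients below are invented placeholders for two descriptors only; they are not the coefficients of Eq. 1 and Eq. 2, and the function names are hypothetical.

```python
def classification_score(intercept, weights, descriptors):
    """Linear classification function: intercept + sum(weight * descriptor value)."""
    return intercept + sum(w * descriptors.get(name, 0) for name, w in weights.items())


def classify(descriptors, f_active, f_na):
    """Assign "Active" when F(Active) > F(NA), otherwise "NA"."""
    score_active = classification_score(f_active[0], f_active[1], descriptors)
    score_na = classification_score(f_na[0], f_na[1], descriptors)
    return "Active" if score_active > score_na else "NA"


# Placeholder (intercept, weights) pairs, NOT the published coefficients.
F_ACTIVE = (-3.0, {"C3_5_C4": 4.0, "CA_12_CL": 1.0})
F_NA = (-0.5, {"C3_5_C4": 0.5, "CA_12_CL": 0.2})

with_pair = classify({"C3_5_C4": 1}, F_ACTIVE, F_NA)
without_pairs = classify({}, F_ACTIVE, F_NA)
print(with_pair, without_pairs)
```

With the real model, the `descriptors` dictionary would hold the 13 atom pair frequencies of a compound, such as those listed in Supplementary Table S1.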
The predictive ability of the LDA model was evaluated by the leave-one-out (LOO) procedure. The LOO prediction resulted in 89.5% correct classification, and 18 of 23 active and 59 of 63 inactive hydrazides were correctly predicted for their TNF-α induction activity classes (Table 3). Thus, these results confirm usefulness of the LDA model for a priori evaluation of macrophage TNF-α inducing activity of arylcarboxylic acid hydrazides.
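The leave-one-out procedure itself is independent of the classifier: each compound is removed in turn, the model is refitted on the remaining compounds, and the removed compound is predicted. The Python sketch below shows that loop with a nearest-centroid classifier standing in for LDA (a deliberate simplification, not the STATISTICA procedure) and with an invented six-point data set.

```python
def fit_centroids(X, y):
    """Stand-in classifier: mean descriptor vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(sorted(centroids), key=sq_dist)


def leave_one_out_accuracy(X, y):
    """Fraction of compounds predicted correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        model = fit_centroids(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Invented two-descriptor data, not taken from Compounds 1-86.
X_toy = [[2, 0], [3, 1], [2, 1], [0, 0], [0, 1], [1, 0]]
y_toy = ["Active", "Active", "Active", "NA", "NA", "NA"]
loo = leave_one_out_accuracy(X_toy, y_toy)
print(loo)
```

Because the held-out compound never contributes to the fitted model, the LOO accuracy (89.5% for the LDA model) is a more conservative figure than the fitting accuracy (96.5%).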
Although 13 atom pair descriptors were utilized in the derived LDA model, this number should not be regarded as too large. Conventionally, the recommended number of variables for SAR and QSAR models, from a statistical point of view, should be ≤20% of the number of compounds. Hence, the number of atom pairs selected is reasonable for 86 hydrazide derivatives investigated. Additionally, all coefficients of the classification functions (Eq. 1 and Eq. 2) were significant according to the Fisher criterion.
The atom pairs involved in Eq. 1 and Eq. 2 are not uniformly distributed in the number of chemical bonds D. Figure 3B shows that six atom pairs used in the LDA model have bond distances from 3 to 7, while the other seven descriptors are characterized by D values from 11 to 15. Indeed, this distribution is a reflection of total atom pair distribution (Figure 3A), which is conditioned by the dumbbell shape of the compounds investigated. On the other hand, the importance of “longer” atom pairs for SAR classification supports the supposition that a biological target interacts with the entire hydrazide molecule, rather than with metabolites of a smaller size.
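The split into "short" and "long" atom pairs can be verified from the descriptor names alone, since D is the middle field of T1_D_T2. The sketch below parses the 13 atom pairs retained in the LDA model (the Forward Stepwise selection minus C4_2_NA); the variable and function names are illustrative.

```python
from collections import Counter

# The 13 atom pairs of the best-subset LDA model, as listed in the text.
selected_pairs = [
    "C3_3_CL", "C4_3_O2", "CO_3_NA", "CO_3_NO", "C3_5_C4", "C4_7_OF",
    "BR_11_CA", "CA_12_CL", "CL_12_NA", "BR_13_C4", "C4_14_NO",
    "BR_15_C4", "CL_15_O2",
]


def bond_distance(pair_name):
    """Extract D, the number of bonds, from a T1_D_T2 descriptor name."""
    _, d, _ = pair_name.split("_")
    return int(d)


histogram = Counter(bond_distance(p) for p in selected_pairs)
n_short = sum(n for d, n in histogram.items() if 3 <= d <= 7)
n_long = sum(n for d, n in histogram.items() if 11 <= d <= 15)
print(sorted(histogram.items()), n_short, n_long)
```

The counts reproduce the six pairs with D from 3 to 7 and the seven pairs with D from 11 to 15 described above.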
2.4. Classification tree analysis with linear combination splits
In our previous SAR analysis of N-benzoylpyrazoles with elastase inhibitory activity, we also used LDA methodology;14 however, its use was preceded by application of one-way analysis of variance (ANOVA)16 for preliminary selection of descriptors having significant differences between in-class and total variances. This led to a substantial decrease in the number of atom pairs to reduce dimensionality of the data matrix for further SAR analysis. Since each descriptor selected by ANOVA has one-dimensional separation of classes, compounds from different groups are characterized by relatively distinct areas of data point projections on a single coordinate axis associated with a given descriptor (e.g., see Figure 4A).
Figure 4.
Simulated examples of descriptors with one-dimensional (A) and two-dimensional (B) separation. Active and non-active compounds are represented by open and closed circles, respectively.
It should be noted that in the case of hydrazides 1–86, the pre-selection of atom pairs by ANOVA did not result in a satisfactory SAR model for predicting their macrophage TNF-α inducing activity when the LDA method was applied to the ANOVA-selected descriptors. Instead, good classification was achieved by stepwise LDA applied to the initial non-reduced data matrix, as described above. Notably, only three atom pairs (C3_5_C4, CA_12_CL, and C4_14_NO) of the thirteen descriptors involved in Eq. 1 and Eq. 2 were selected in the trial run of ANOVA and thus had approximately one-dimensional class separation, as exemplified in Figure 4A.
The other 10 atom pairs had occurrences that did not differ significantly between the classes of active and non-active compounds. These atom pair descriptors clearly belong to another type, where the activity classes are separated in higher-dimensional subspaces of such descriptors (see two-dimensional example in Figure 4B). Although projections of data points for both classes in this example are approximately uniformly distributed on each coordinate axis, there exists a line of good separation, and such descriptors appear to be very useful for SAR analysis, as demonstrated above by LDA. In the more general case of higher dimensionality, there may exist a hyper-plane separating two classes of compounds. Taking into account the distribution character of data points for Compounds 1–86 in descriptor space, we attempted to apply a methodology known as classification tree analysis with linear combination splits (CTLCS).17
In this approach, a logical tree was created where a split condition for each tree node depends on a linear combination of several descriptors. We found that the best classification tree for Compounds 1–86 had just one split. The 13 atom pairs utilized in the LDA model (see Eq. 1 and Eq. 2) were used as a basis in the CTLCS approach, and all pairs were included in the function F(x) (Eq. 3), indicating again that all 13 descriptors were important for prediction of the correct biological activity class.
| (Eq. 3) |
According to the split condition, a compound would be classified as inactive if F(x) ≤ 0; otherwise a compound belongs to the “Active” class. The classification matrix obtained by the CTLCS method is shown in Table 2. The activity classes were predicted correctly for 20 of 23 active and 59 of 63 inactive hydrazides, resulting in a total fitting accuracy of 91.9%. The calculated and LOO-predicted classes for individual compounds are shown in Table 3. In 73 of 86 cases (84.9%), a priori prediction of activity class by the LOO procedure was correct. While LDA classification by Eq. 1 and Eq. 2 had better characteristics of fitting and prediction (Tables 2 and 3), the CTLCS model was two-fold simpler in the amount of calculation necessary to classify a compound. The satisfactory results obtained by the one-split tree based on a linear combination of variables indicate that the descriptor space is divided into two areas by a hyper-plane expressed by Eq. 3. Each of these areas preferentially contains data points for compounds of a single activity class, as in the simulated two-dimensional example given in Figure 4B. Such well-organized data in a space of atom pair descriptors demonstrate the powerful ability of atom pairs to separate compounds of different activity in SAR analysis.
It should be noted that most of the incorrect classifications by both the LDA and CTLCS methods were made in the subset of nicotinic acid hydrazide derivatives 1–22 (Table 3). Hence, some structural or physico-chemical peculiarities of nicotinic acid hydrazides (e.g., polarizability, dipole moment, etc.) may not be adequately reflected in the matrix of atom pair descriptors.
2.5. Classification tree analysis with univariate splits
Although the LDA and CTLCS models had high fitting and predictive abilities, it is difficult to formulate these models in a set of intuitively understandable “chemical” rules. The methodology of binary classification tree analysis with univariate splits18 is more suitable for deriving simplified SAR rules, while being less complex than the LDA or CTLCS methods. Based on the 13 descriptors selected in LDA above, we obtained the optimal classification tree with univariate splits shown in Figure 5. The atom pair descriptors involved in the optimal tree were selected automatically by STATISTICA 6.0 using an exhaustive univariate split selection method (see Experimental Section).
Figure 5.
Binary classification tree reflecting the simplified SAR rules for predicting macrophage TNF-α inducing activity of arylcarboxylic acid hydrazide derivatives.
According to this tree, the prediction of Compounds 1–86 as “Active” or “NA” depends on three atom pairs: C3_5_C4, CA_12_CL, and BR_11_CA (examples shown in Figure 2). Taking into account that atom pair descriptors adopt integer values only, the conditions present in Figure 5 can be interpreted as follows. If a compound has at least one C3_5_C4 atom pair, then the compound is classified as “Active.” Similarly, on the second and third splits, a compound is classified as “Active” if it has more than three CA_12_CL atom pairs or more than one BR_11_CA atom pair, respectively. An insufficient number of all the enumerated atom pairs leads to the left lowest terminal node where the compound is assigned as “NA.” In total, 84.9% of the compounds were classified correctly using only these three atom pairs (57 of 63 inactive and 16 of 23 active arylcarboxylic acid hydrazide derivatives were correctly classified) (Table 2). Classifications made by the tree for compounds 1–86 are shown in Table 3.
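The three-split rule set described above is small enough to write out directly. The Python sketch below encodes the conditions as stated in the text (at least one C3_5_C4, more than three CA_12_CL, more than one BR_11_CA); the function name is an illustrative choice, and the test compounds use the atom pair frequencies listed in Table 3.

```python
def classify_by_tree(c3_5_c4, ca_12_cl=0, br_11_ca=0):
    """Univariate-split rules as described in the text, evaluated in tree order."""
    if c3_5_c4 >= 1:   # first split: at least one C3_5_C4 pair
        return "Active"
    if ca_12_cl > 3:   # second split: more than three CA_12_CL pairs
        return "Active"
    if br_11_ca > 1:   # third split: more than one BR_11_CA pair
        return "Active"
    return "NA"        # left lowest terminal node


# (C3_5_C4, CA_12_CL, BR_11_CA) frequencies from Table 3; blank cells are passed as 0.
table3_examples = {
    2: (1, 0, 0),
    24: (0, 4, 0),
    57: (0, 0, 2),
    1: (0, 0, 1),
    5: (0, 2, 0),
}
tree_calls = {cid: classify_by_tree(*freqs) for cid, freqs in table3_examples.items()}
print(tree_calls)
```

Compounds 1 and 5 are experimentally active but fall into the "NA" node, which illustrates how the simplified rules trade some accuracy for interpretability.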
As indicated above, the BR_11_CA atom pair represents in a “chemical” sense the simultaneous presence of a bromine atom and an aromatic ring on the opposite sides of a molecule. The descriptor C3_5_C4 is characteristic of two types of compounds: one containing a methyl- or trifluoromethyl-substituted benzene ring attached to the furan moiety (see Figure 2, Compounds 7 and 25), and the other containing an alkoxy group in the aromatic ring connected to the azomethine carbon of the linker (see Figure 2, Compound 56). If activity is based on the presence of the CA_12_CL atom pair, at least four of these atom pairs are necessary for classification as “Active.” This atom pair is present when aromatic fragments are located on both sides of a dumbbell shaped molecule, with one aromatic moiety containing two or more chlorine atoms in ortho and meta positions (four such atom pairs in Compound 24 are shown in Figure 2).
Although the accuracy of classification by these simplified rules is slightly lower than that of the LDA or CTLCS approaches, it can be very useful for non-computational, logical prediction of the activity class for a given arylcarboxylic acid hydrazide derivative. Note that the classification tree model with univariate splits, like the LDA and CTLCS models, also includes “longer” atom pairs with 11 and 12 chemical bonds, which is in agreement with the proposed interaction of the entire non-metabolized molecule with a given biological target rather than smaller metabolites.
3. Conclusion
Previously, we identified a novel class of compounds that potently induced TNF-α production in macrophages via activation of N-formyl peptide receptors and found that the active compounds had an arylcarboxylic acid hydrazide core structure.8 Here, we identified additional arylcarboxylic acid hydrazide derivatives that induced macrophage TNF-α production. We then used the combined group of all 86 compounds for SAR analysis to further define the features of these molecules important for activity and developed a simple but accurate SAR model for predicting biological activity in future compound screening. A sequence of LDA, classification tree analyses with linear combination, and univariate splits based on the atom pair descriptors led to the derivation of SAR rule-based algorithms with 96.5, 91.9, and 84.9% predictive accuracy, respectively. Furthermore, LOO analysis confirmed the usefulness of these models for a priori evaluation of macrophage TNF-α inducing activity by arylcarboxylic acid hydrazides. The intuitively understandable rules obtained from the classification tree with univariate splits, which is based on three atom pair descriptors only, revealed that the main factors influencing the activity of a given arylcarboxylic acid hydrazide derivative were either 1) the presence of a methyl or trifluoromethyl group in the benzene ring attached to the furan moiety, 2) an alkoxy group in the aromatic ring near the methylenehydrazide linker, or 3) two or more halogen atoms (chlorine or bromine) on one side of the dumbbell-shaped molecule, with an aromatic fragment on the opposite side. The successful application of atom pairs to heterogeneous sets of compounds can be explained by their non-global nature, as this approach is based on simple local features of molecules rather than on certain chemical building blocks.
Overall, our data demonstrate that the use of atom pair descriptors is a valuable tool for developing different SAR rules for high-throughput screening of data sets and could provide a relatively simple classification useful for de novo design of macrophage TNF-α inducers with arylcarboxylic acid hydrazide scaffolds.
4. Experimental
4.1. Reagents
The additional 23 compounds (2–7, 11–22, 51, and 83–86) investigated were purchased from Princeton BioMolecular Research, Inc. (Monmouth Junction, NJ). Their purity and identity were verified by Princeton BioMolecular Research using NMR spectroscopy, elemental analysis, and mass spectrometry. 1H NMR spectra provided by Princeton BioMolecular Research for these compounds (10% solutions in deuterated dimethyl sulfoxide, DMSO-d6) were obtained with a Bruker Avance 200 MHz spectrometer (Bruker BioSpin, Billerica, MA) and are included in Supplementary Table S2.
4.2. Cell culture
Murine macrophage J774.A1 cells were cultured in DMEM supplemented with 10% (v/v) heat-inactivated fetal bovine serum (FBS), 10 mM HEPES, 100 µg/ml streptomycin, and 100 U/ml penicillin. Cells were grown in sterile tissue culture flasks at 37°C in a humidified atmosphere containing 5% CO2 and gently detached by scraping.
4.3. Determination of TNF-α
For treatments, cells were plated in 96-well microtiter plates at 2×10⁵ cells/well in culture medium in which FBS was reduced to 3% (v/v). The cells were treated for 24 hr with negative control DMSO, test compounds, or positive control LPS. A murine TNF-α enzyme-linked immunosorbent assay (ELISA) kit (BD Biosciences Pharmingen) was used to detect this cytokine in the cell supernatants. Cytokine concentrations were determined by extrapolation from the TNF-α standard curve, according to the manufacturer’s protocol.
4.4. Cytotoxicity assay
Cytotoxicity was analyzed with a CellTiter-Glo Luminescent Cell Viability Assay Kit (Promega, Inc., Madison, WI), according to the manufacturer’s protocol. Briefly, J774.A1 cells were cultured at a density of 3×10⁴ cells/well with the test compounds for 24 hr at 37°C and 5% CO2, substrate was added, and the luminescence signal in the samples was analyzed with a Fluoroskan Ascent FL microplate reader.
4.5. Endotoxin assay
Endotoxin was measured using the Limulus Amebocyte Lysate Pyrogent Plus kit (Cambrex Bio Science, Walkersville, MD). Briefly, the limulus amebocyte lysate was reconstituted in 250 µl of test compound solution (50 µM in endotoxin-free water/1% DMSO), and each vial was incubated at 37°C for 1 hr. At the end of the incubation period, each vial was inverted 180° to estimate gel formation in comparison with the control (endotoxin-free water).
4.6. Structure encoding by atom pairs
For the purpose of SAR analysis, we used an atom pair representation of molecular structures with each atom pair denoted as T1_D_T2, where T1 and T2 are the types of atoms in the pair and D is the topological (bond) distance or number of bonds in the shortest path between these atoms in the structural formula. As previously reported,14 T1 and T2 were defined with symbolic codes used in HyperChem, Version 7 (Hypercube, Inc., Gainesville, FL) for atom type representation within MM+ force field. For example, CA, CO, and C3 codes were used for sp2-hybridized aromatic, carbonyl, and furan carbon atoms, respectively. This approach allows easy generation of atom pairs directly from the output file containing the molecular structure (HIN file) built by HyperChem. As atom pairs T1_D_T2 and T2_D_T1 are equivalent, we used a unified definition with lexicographic order of type substrings (i.e., with T1≤T2).
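The T1_D_T2 encoding can be illustrated with a short Python sketch. This is not the CHAIN program and does not parse HIN files; the function names, the list-based input format, and the toy four-atom fragment are assumptions made for this example. The topological distance D is found by breadth-first search over the bond graph of non-hydrogen atoms, and the two type codes are sorted so that T1≤T2.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Breadth-first search: shortest bond count from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def atom_pairs(atom_types, bonds):
    """Count T1_D_T2 atom pairs for one molecule.

    atom_types: list of type codes for non-hydrogen atoms (hypothetical input format).
    bonds: list of (i, j) index pairs for bonded atoms.
    """
    adjacency = {i: [] for i in range(len(atom_types))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    counts = Counter()
    for i in range(len(atom_types)):
        for j, d in bond_distances(adjacency, i).items():
            if j > i:  # count each unordered pair of atoms once
                t1, t2 = sorted((atom_types[i], atom_types[j]))  # enforce T1 <= T2
                counts[f"{t1}_{d}_{t2}"] += 1
    return counts


# Toy linear fragment CL-CA-CA-BR (not one of the compounds in this study).
toy = atom_pairs(["CL", "CA", "CA", "BR"], [(0, 1), (1, 2), (2, 3)])
print(sorted(toy.items()))
```

Because the type codes are sorted before the descriptor name is built, the pair of a chlorine and an aromatic carbon is always reported as CA_D_CL, never as CL_D_CA.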
All 836 unique atom pairs possible for non-hydrogen atoms in the 86 derivatives of arylcarboxylic acid hydrazides were generated. This 86×836 data matrix was automatically built by our CHAIN program, based on HIN files created in HyperChem. A matrix element at the intersection of the ith row and jth column was equal to the jth atom pair occurrence in the ith molecule.
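The layout of this data matrix can be sketched in a few lines of Python. The function name and the two sets of atom pair counts below are hypothetical and serve only to show how rows (molecules) and columns (unique atom pairs) are arranged; they are not data from this study.

```python
def build_matrix(molecule_pair_counts):
    """Build an occurrence matrix from per-molecule atom pair counts.

    Returns the sorted list of unique atom pairs (columns) and a list of rows,
    where element [i][j] is the occurrence of the jth atom pair in the ith molecule.
    """
    columns = sorted({pair for counts in molecule_pair_counts for pair in counts})
    matrix = [[counts.get(pair, 0) for pair in columns]
              for counts in molecule_pair_counts]
    return columns, matrix


# Two hypothetical molecules with made-up counts.
molecules = [
    {"CA_1_CA": 6, "CA_12_CL": 4},
    {"CA_1_CA": 12, "BR_11_CA": 2},
]
columns, matrix = build_matrix(molecules)
print(columns)
print(matrix)
```

Applied to all 86 compounds, the same construction would give one row per compound and one column per unique atom pair.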
4.7. Derivation of SAR classification
Derivation of SAR classification was performed first by the LDA method with the “Forward Stepwise” option, using the corresponding module of STATISTICA 6.0. The statistical criterion for inclusion or exclusion of descriptors at each step was p≤0.05. The stepwise LDA allowed selection of 14 significant descriptors from the 836 atom pairs generated initially. The LDA run was then repeated with the “Best Subset Search” option on the basis of the 14 variables selected in the first LDA run. The best subset consisted of 13 atom pairs giving the lowest misclassification error of the LDA model.
Starting from 13 variables of the best subset, we developed binary classification tree models with discriminant-based linear combination splits (CTLCS) and with univariate splits. The classification trees were built with STATISTICA 6.0 using estimated prior probabilities and equal misclassification costs for classes.17,18 An exhaustive C&RT-style univariate split selection method was used, as described by Breiman et al.18
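The idea of an exhaustive univariate split search can be sketched as follows. This is a simplified illustration, not the STATISTICA procedure: it assumes the Gini index as the goodness-of-split measure (the measure is not stated here), ignores prior probabilities and misclassification costs, and finds only a single split. The function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity for a two-class node ("Active" vs. "NA")."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y == "Active") / n
    return 2 * p * (1 - p)


def best_univariate_split(rows, labels, names):
    """Try every descriptor and every threshold; return (gain, name, threshold)."""
    n = len(labels)
    parent = gini(labels)
    best = None
    for j, name in enumerate(names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted  # decrease in impurity achieved by this split
            if best is None or gain > best[0]:
                best = (gain, name, threshold)
    return best


# Toy data: six hypothetical compounds described by two atom pair counts.
names = ["C3_5_C4", "CA_12_CL"]
rows = [[1, 0], [2, 0], [0, 4], [0, 0], [0, 1], [0, 2]]
labels = ["Active", "Active", "Active", "NA", "NA", "NA"]
best = best_univariate_split(rows, labels, names)
print(best)
```

Because atom pair descriptors take integer values only, a threshold of 0.5 on a descriptor corresponds to the condition "at least one such atom pair."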
Supplementary Material
Acknowledgments
This work was supported in part by Department of Defense grant W9113M-04-1-0001, National Institutes of Health grants P20 RR-020185 and U54 AI-065357, National Institutes of Health contract HHSN266200400009C, an equipment grant from the M.J. Murdock Charitable Trust, and the Montana State University Agricultural Experimental Station. The U.S. Army Space and Missile Defense Command, 64 Thomas Drive, Frederick, MD 21702 is the awarding and administering acquisition office. The content of this report does not necessarily reflect the position or policy of the U.S. Government.
References
1. Beutler B. J. Invest. Med. 1995;43:227.
2. Wagner G, Laufer S. Med. Res. Rev. 2006;26:1. doi: 10.1002/med.20042.
3. Lejeune FJ, Lienard D, Matter M, Ruegg C. Cancer Immun. 2006;6:6.
4. Reed JC. Nat. Clin. Pract. Oncol. 2006;3:388. doi: 10.1038/ncponc0538.
5. Baguley BC. Curr. Opin. Investig. Drugs. 2001;2:967.
6. Burkhart CA, Berman JW, Swindell CS, Horwitz SB. Cancer Res. 1994;54:5779.
7. Schön M, Schön MP. Curr. Med. Chem. 2007;14:681. doi: 10.2174/092986707780059625.
8. Schepetkin IA, Kirpotina LN, Tian J, Khlebnikov AI, Ye RD, Quinn MT. Mol. Pharm. 2008;74:392. doi: 10.1124/mol.108.046946.
9. Andricopulo AD, Montanari CA. Mini. Rev. Med. Chem. 2005;5:585. doi: 10.2174/1389557054023224.
10. Kirikae T, Ojima I, Kirikae F, Ma Z, Kuduk SD, Slater JC, Takeuchi CS, Bounaud PY, Nakano M. Biochem. Biophys. Res. Commun. 1996;227:227. doi: 10.1006/bbrc.1996.1494.
11. Ojima I, Fumero-Oderda CL, Kuduk SD, Ma Z, Kirikae F, Kirikae T. Bioorg. Med. Chem. 2003;11:2867. doi: 10.1016/s0968-0896(03)00181-0.
12. Buttingsrud B, Ryeng E, King RD, Alsberg BK. J. Comput. Aided Mol. Des. 2006;20:361. doi: 10.1007/s10822-006-9058-y.
13. Khlebnikov AI, Schepetkin IA, Domina NG, Kirpotina LN, Quinn MT. Bioorg. Med. Chem. 2007;15:1749. doi: 10.1016/j.bmc.2006.11.037.
14. Khlebnikov AI, Schepetkin IA, Quinn MT. Bioorg. Med. Chem. 2008;16:2791. doi: 10.1016/j.bmc.2008.01.014.
15. Carhart RE, Smith DH, Venkataraghavan R. J. Chem. Inf. Comput. Sci. 1985;25:64.
16. Lindman HR. Analysis of Variance in Complex Experimental Designs. San Francisco: W. H. Freeman & Co; 1974.
17. Loh WY, Shih YS. Statistica Sinica. 1997;7:815.
18. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Monterey: Wadsworth & Brooks/Cole Advanced Books & Software; 1984.