ADMET & DMPK. 2020 Mar 4;8(1):29–77. doi: 10.5599/admet.766

Prediction of aqueous intrinsic solubility of druglike molecules using Random Forest regression trained with Wiki-pS0 database

Alex Avdeef 1
PMCID: PMC8915599  PMID: 35299775

Abstract

The accurate prediction of solubility of drugs is still problematic. It was thought for a long time that shortfalls had been due to the lack of high-quality solubility data from the chemical space of drugs. This study considers the quality of solubility data, particularly of ionizable drugs. A database is described, comprising 6355 entries of intrinsic solubility for 3014 different molecules, drawing on 1325 citations. In an earlier publication, many factors affecting the quality of the measurement had been discussed, and suggestions were offered to improve ways of extracting more reliable information from legacy data. Many of the suggestions have been implemented in this study. By correcting solubility for ionization (i.e., deriving intrinsic solubility, S0) and by normalizing temperature (by transforming measurements performed in the range 10-50 °C to 25 °C), it can now be estimated that the average interlaboratory reproducibility is 0.17 log unit. Empirical methods to predict solubility have at best hovered around a root mean square error (RMSE) of 0.6 log unit. Three prediction methods are compared here: (a) Yalkowsky’s general solubility equation (GSE), (b) Abraham solvation equation (ABSOLV), and (c) Random Forest regression (RFR) statistical machine learning. The latter two methods were trained using the new database. The RFR method outperforms the other two models, as anticipated. However, the ability to predict the solubility of drugs to the level of the quality of data is still out of reach. The data quality is not the limiting factor in prediction. The statistical machine learning methodologies are probably up to the task. Possibly what’s missing are solubility data from a few sparsely-covered regions of the chemical space of drugs (particularly of research compounds). Also, new descriptors which can better differentiate the factors affecting solubility between molecules could be critical for narrowing the gap between the accuracy of the prediction models and that of the experimental data.

Keywords: aqueous intrinsic solubility, druglike, interlaboratory experimental error, pDISOL-X, General Solubility Equation (GSE), Abraham Solvation Equation (ABSOLV), multiple linear regression (MLR), Random Forest regression (RFR), quantitative structure-property relationships (QSPR)

Introduction

In pharmaceutical research, the aqueous solubility of exploratory compounds is a very important physical property to assess [1, 2]. Peroral drugs with very low solubility may not release sufficient compound from the solid form during the intestinal transit to generate therapeutic benefit. Conversely, highly water-soluble drugs may not be able to permeate lipoidal barriers in the intestinal wall and in the barriers beyond, to reach the therapeutic site of action in sufficient concentration. Thus, not too little and not too much solubility is an important balancing act in compound advancement during drug development.

Given the large number of compounds tested in drug discovery, measurement of solubility is done by high-throughput methods, which generate “kinetic” values in buffers containing 0.5-5%v/v DMSO [2, 3]. Usually, amorphous solids precipitate from supersaturated solutions in the microtitre wells. Although kinetic solubility can be 10-100 times higher than equilibrium solubility, it is nevertheless suitable for anticipating whether a particular test compound will precipitate in an in-vitro bioassay, triggering a false positive test [3-6]. Compounds advanced into later stages of research are fewer in number. Justifiably, more rigorous methods are used to measure their equilibrium solubility, often in media more reflective of the biological fluids to which drugs are exposed [7].

It is beneficial to predict equilibrium solubility of research compounds at the start of discovery projects, as part of virtual screening of compound libraries, before any actual measurements are done, to assist in prioritizing molecules for the project. Numerous methods for predicting solubility of organic molecules have been described in the literature, based on quantitative structure-property relationships (QSPR), where the molecular structure is used to predict physicochemical properties [8].

This study concerns prediction of the equilibrium solubility of drugs. Perhaps more importantly, the focus is on the impact of molecules selected to train the prediction method. The details of the evolving Wiki-pS0 database (in-ADME Research) [9] of druglike molecules will be described. Since 2011, the focused searching of the primary literature for equilibrium measurements of aqueous solubility (especially as a function of pH) of druglike molecules has contributed to 6355 intrinsic solubility, log S0, entries. The pre-processing of the available solubility data to extract the underlying S0 values (normalized to 25 °C [10]) utilized the purpose-designed computer program, pDISOL-X (in-ADME Research) [11] (whose prototype FORTRAN version, STBLTY, was first coded in the late 1970s [12]). As part of the curation, data quality was assessed by interlaboratory comparisons of those molecules which were studied multiple times by different researchers. The log S0 values, along with their estimated standard deviations (SD), were then used to train two solubility prediction models: (i) weighted multiple linear regression (MLR) using Abraham solvation descriptors [13], and (ii) Random Forest regression (RFR) [14] using the diverse descriptor collection from the RDKit open-source chemoinformatics and machine-learning library [15]. The results were compared to those calculated by the general solubility equation (GSE), which requires no training [16, 17]. Four external test sets [18-20] were employed in the validation of the models, taking care to remove any of the test set molecules from the large training set. Three of the test sets (containing only druglike molecules) have appeared in landmark ‘Solubility challenges’ [19, 20].

Methods

Quantitative structure-property relationships (QSPR) models

General solubility equation (GSE)

In 1965, Irmann [21] described solubility prediction based on a group contribution approach. For solids, he included a term related to the entropy of fusion, coupled with the melting point (Tm). In 1968, Hansch et al. [22] recognized that the octanol-water partition coefficients, log P, are strongly correlated linearly with aqueous solubility values, log Sw, for nonionizable liquid samples. Expanding on the work of Irmann and Hansch, Yalkowsky and coworkers developed and popularized the general solubility equation (GSE), to enable the prediction of solubility of liquids and solids in water [16-18, 23-27]. Just two variables, Tm (°C) and log P, both experimentally determined, are used in the equation to predict solubility of organic compounds in water (in log molar units):

$$\log S_w = 0.5 - 0.01\,(T_m - 25) - \log P \qquad (1)$$

The equation requires no “training.” Although the GSE is rooted in sound thermodynamic principles, some assumptions had to be made in developing the equation: test compounds are taken to be nonionized and fully miscible in octanol (leading to the 0.5 intercept term), and the water and octanol phases are assumed not to be appreciably mutually soluble (although, according to [28], water-saturated octanol contains ~25 mol% water and the solubility of octanol in water is ~2 mM). The 0.01 factor rests on the implicit assumption that the entropy of fusion is nearly constant. This is in reasonable agreement with the relatively nonflexible aromatic solutes initially considered. A semi-empirical version of the GSE was proposed: the calculated log P could be used in place of the experimental value. More recently, a version was proposed entirely based on calculated descriptors [27]. Empirically-adjusted coefficients in Eq. (1), based on various training sets [16, 24, 29], did not result in substantially improved predictions of the solubility of druglike substances. The GSE is popular for its ease of use [17].
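As a minimal illustration, the GSE can be written as a two-argument function. The sketch below assumes the widely quoted form of the equation, log S = 0.5 − 0.01 (Tm − 25) − log P, and the usual convention that the melting-point term is dropped for compounds that are liquid at 25 °C. The function name and the example inputs are hypothetical and are not taken from the study.

```python
def gse_log_s(tm_celsius: float, log_p: float) -> float:
    """Estimate log molar aqueous solubility with the general solubility equation.

    Assumed form: log S = 0.5 - 0.01 * (Tm - 25) - log P.
    """
    # Assumption: for liquids at 25 °C the melting-point term is set to zero
    # (equivalent to treating Tm as 25 °C).
    melting_term = 0.01 * max(tm_celsius - 25.0, 0.0)
    return 0.5 - melting_term - log_p


if __name__ == "__main__":
    # Hypothetical solid: Tm = 125 °C, log P = 2.0
    print(gse_log_s(125.0, 2.0))
    # Hypothetical liquid: Tm = 14 °C, log P = 3.0 (melting term vanishes)
    print(gse_log_s(14.0, 3.0))
```

Because no coefficient is fitted, the only inputs that can be improved are Tm and log P themselves, which is why the choice between measured and calculated log P matters for this method.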

Yalkowsky and Banerjee [18] proposed an external test set of 21 molecules: 6 solid and 3 liquid poorly-soluble pesticides (log Sw -3.4 to -7.9), 11 simple drugs (log Sw 0.5 to -4.1), and a laxative/dye molecule (with somewhat uncertain solubility). As will be shown below (cf., Fig. 11a), the solubility of the above test set molecules is well predicted by Eq. (1). This test set has been widely used by other investigators.

Figure 11.

GSE (“untrained”) prediction (Eq. 1) of the four external test sets. RDKit log P was used.

Empirical prediction models

Dearden [30] and Taskinen and Norinder [31] thoroughly reviewed solubility prediction studies reported from 1992 to 2005 [25, 29, 32-47] which used the popular Yalkowsky-Banerjee external test set to assess the efficacy of the empirical methods. The average of the reported prediction root-mean-square errors (RMSE) is about 0.9 log unit, with individual values found to range from 0.6 to 1.4. The predictions of Raevsky et al. [29] (nearest-neighbor method, using HYBOT hydrogen bond descriptors) and Tetko et al. [40] (artificial neural network method, with electrotopological E-state indices) fared slightly better than those of others. Many of the training sets used in the prediction studies consisted of several hundred simple organic molecules, including aromatic hydrocarbons, polyhalogenated organic compounds, practically-insoluble agrochemicals and environmental pollutants, many in liquid form at room temperature, but only relatively few druglike molecules (resulting in spotty coverage of the chemical space resembling today’s pharmaceutical discovery compounds). As summarized in the reviews [30, 31], prediction methods included multiple-linear regression (MLR), principal components regression (PCR), partial least-squares (PLS), k-nearest neighbors (kNN), artificial neural networks (ANN), support vector regression (SVR), and Random Forest regression (RFR). Some of the QSPR methods were based on hundreds of calculated atomic and molecular 2D and 3D descriptors. In many of the studies, the most influential descriptors are two calculated physical properties: log P and molar refractivity, MR (which accounts for molecular size and polarizability). Other calculated 2D descriptors included partial-charge surface properties, atom and functional group counts, connectivity and topological and electrotopological indices, H-bond donor and acceptor counts; 3D descriptors included energy terms (total potential energy, electrostatic, molecular mechanics force-field energy), molecular shape, volumes, and water-accessible surface areas [48-55].

Wang and Hou [56] summarized solubility prediction efforts up to 2010, comparing the results of 16 studies. They discussed the improvements resulting from consensus modeling. Also, there was a discussion of using “local data” models to improve predictability, with the domain of applicability (DOA) identified by molecular descriptor similarity, rather than structural (e.g., Tanimoto indices) similarity.

Abraham solvation equation (ABSOLV)

Abraham and Le [13] amended the Abraham solvation equation [57] to predict solubility:

$$\log S_0 = c_0 + c_1 A + c_2 B + c_3 S_\pi + c_4 E + c_5 V + c_6\, A \cdot B \qquad (2)$$

In the MLR equation, the log S0 is the dependent variable (measured log intrinsic molar solubility) and the independent variables are the five solute descriptors accounting for the transfer of solute from one phase to another: A is the sum of H-bond acidity, B is the sum of H-bond basicity, Sπ is the dipolarity/polarizability (subscripted here, so as not to be confused with solubility), E is an excess molar refraction in units of (cm3∙mol-1)/10, and V is the McGowan characteristic volume in units of (cm3∙mol-1)/100. The c0-c6 coefficients in Eq. (2) are determined by MLR, trained on a set of intrinsic solubility values of a diverse collection of molecules. The five Abraham solvation descriptors may be calculated from 2D structure (introduced as a SMILES text or as coordinates in a ‘mol’ or ’sdf’ type file) using the program ABSOLV [58] (cf., www.acdlabs.com). The A∙B cross-term in Eq. (2) is intended to deal with intermolecular H-bond interactions between acid and base functional groups in the solid or liquid environment. Its inclusion, as an alternative to using the Tm term in Eq. (1), was intended to improve the prediction accuracy. Eq. (2) applied to the Yalkowsky-Banerjee external test set, using the MLR coefficients reported by Abraham and Le (their Eq. 11), with ABSOLV-calculated descriptors, resulted in RMSE = 1.71 log unit (prostaglandin-E2 was an extreme outlier; data not shown). In the present study, we re-determined the seven MLR coefficients using our own training data, with the data weighted according to estimated measurement errors, to find a much better fit, as will be shown below (cf., Fig. 12a).
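The weighted fit described above can be sketched with ordinary normal equations. The example below is only an illustration under stated assumptions: the regressors are an intercept, the five Abraham descriptors, and the A∙B cross-term (in an ordering chosen here for convenience); the weights are taken as 1/SD², which is a common choice but is not confirmed by the text; and the data are synthetic, generated from hypothetical coefficients. All function names are made up for the sketch.

```python
import random
from typing import List, Sequence, Tuple


def design_row(a: float, b: float, s_pi: float, e: float, v: float) -> List[float]:
    """Regressors for one molecule: intercept, A, B, S_pi, E, V, and A*B."""
    return [1.0, a, b, s_pi, e, v, a * b]


def solve_linear(m: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve m x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def weighted_mlr(rows: Sequence[Sequence[float]], log_s0: Sequence[float],
                 sd: Sequence[float]) -> List[float]:
    """Weighted least squares; each point is weighted by 1/SD^2 (assumed scheme)."""
    p = len(rows[0])
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for x, y, s in zip(rows, log_s0, sd):
        w = 1.0 / (s * s)
        for i in range(p):
            xtwy[i] += w * x[i] * y
            for j in range(p):
                xtwx[i][j] += w * x[i] * x[j]
    return solve_linear(xtwx, xtwy)


def demo_fit() -> Tuple[List[float], List[float]]:
    """Fit noise-free synthetic data and return (true, fitted) coefficients."""
    rng = random.Random(1)
    # Hypothetical coefficients, not fitted to any real solubility data.
    true_coef = [0.5, -1.0, 2.0, 0.8, -1.2, -4.0, -3.0]
    rows, y, sd = [], [], []
    for _ in range(40):
        row = design_row(*(rng.uniform(0.0, 2.0) for _ in range(5)))
        rows.append(row)
        y.append(sum(c * x for c, x in zip(true_coef, row)))
        sd.append(rng.uniform(0.1, 0.6))  # hypothetical measurement SDs
    return true_coef, weighted_mlr(rows, y, sd)


if __name__ == "__main__":
    true_coef, fitted = demo_fit()
    for t, f in zip(true_coef, fitted):
        print(f"{t:6.2f} {f:10.6f}")
```

With weights of this kind, entries with large estimated SD pull less on the coefficients than well-reproduced ones, which is the intent of weighting by measurement error.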

Figure 12.

ABSOLV weighted MLR prediction (Eq. 2) of the four external test sets. The Abraham Solvation Equation was trained with the druglike Wiki-pS0 database.

Random Forest regression

Of the new machine-learning statistical approaches, the Random Forest regression (RFR) method is thought to be among the top performers, in terms of prediction accuracy. The method was introduced in 2001 by Breiman [14], and is implemented in the open-source “randomForest” library for the R statistical software [59-61]. RFR may be appealing to new users because it can be employed “off the shelf,” requiring only minimal learning. In many applications, the default “tuning” parameters are nearly optimal. RFR works by constructing an ensemble of hundreds of decision trees [62].

To illustrate, in part, how RFR works, Figure 1 shows an example of a single recursive partition decision tree constructed (Algorithm Builder v.1.8, ACD/Labs, Toronto, Canada; www.acdlabs.com), using the 600 zwitterionic molecules in the Wiki-pS0 database, drawing on the five Abraham descriptors [57]. The process begins with the unsupervised selection of one of the descriptors (E in the example) and finding the optimal ‘splitting’ value (1.27 in the example) which divides the solubility data into two branches: the left branch grouping 369 molecules which have descriptors less than the splitting value and the right branch grouping 231 molecules with descriptors equal to or exceeding the splitting value. A criterion for the splitting can be based on minimizing the residual sum of squares at each node,

Figure 1.

Example of a calculated recursive partition decision tree (Algorithm Builder v.1.8), based on 600 zwitterionic molecules (Wiki-pS0 database), using Abraham descriptors. At each node, all five descriptors are queried to select the one best suited for further splitting of the data. In part, node splitting stops at 5 molecules. By comparison, the Random Forest method uses hundreds of trees (each containing a different subset of randomly-selected solubility values of molecules) and re-selects a subset of descriptors randomly for each node splitting

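$$\mathrm{RSS} = \sum_{i \in \mathrm{left}} \left( y_i - \langle y \rangle_{\mathrm{left}} \right)^2 + \sum_{j \in \mathrm{right}} \left( y_j - \langle y \rangle_{\mathrm{right}} \right)^2 \qquad (3)$$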

where i indexes the solubility values in the left branch and j indexes those in the right branch; y represents log S0 values; <y> is the average value in the left/right branch. Each of the two branches generates a new node. The process then repeats until the “terminal” nodes are reached, associated with a specified minimum of molecules (e.g., 5). In the above decision tree training, r2 = 0.70 and RMSE = 0.81 (average of the seven terminal “leafs”). Generally, the node splitting procedure yields ever more homogeneous groupings of molecules, and produces trees which bring together similar solubility values at the same node.
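The node-splitting step described above can be sketched for a single descriptor as follows. This is an illustration only: the function names are hypothetical, candidate thresholds are placed midway between adjacent sorted descriptor values (an assumption, since the text does not say how the splitting value is positioned), and the data in the usage example are invented.

```python
from typing import Sequence, Tuple


def rss(values: Sequence[float]) -> float:
    """Residual sum of squares of the values about their own mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)


def best_split(x: Sequence[float], y: Sequence[float],
               min_leaf: int = 1) -> Tuple[float, float]:
    """Return (threshold, total RSS) of the split on x that minimizes RSS.

    Molecules with x < threshold go to the left branch; the rest go right.
    """
    pairs = sorted(zip(x, y))
    best = (float("nan"), float("inf"))
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical descriptor values
        left = [yy for _, yy in pairs[:k]]
        right = [yy for _, yy in pairs[k:]]
        total = rss(left) + rss(right)
        if total < best[1]:
            # Assumption: threshold taken midway between neighbouring values.
            best = (0.5 * (pairs[k - 1][0] + pairs[k][0]), total)
    return best


if __name__ == "__main__":
    # Invented descriptor values and log S0 values for six molecules.
    e_descriptor = [0.5, 0.9, 1.1, 1.6, 2.0, 2.4]
    log_s0 = [-2.0, -2.2, -2.1, -5.0, -5.2, -4.9]
    print(best_split(e_descriptor, log_s0))
```

A full tree would repeat this search over every available descriptor at each node, keep the descriptor with the lowest total RSS, and recurse into both branches until the minimum node size is reached.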

The above example involved just one tree, where at each node, all of the descriptors were considered in the selection of the one best suited to split the node. RFR is different in a number of ways. Typically, 500 decision trees – a “forest” – are constructed.

Liaw [61] graphically illustrated the structure of a typical random forest. The entire data matrix comprises n rows of solubility values and p columns of chemical descriptors. Each tree in the forest is allocated a different bootstrap (with replacement) sample of the n rows – i.e., it contains a randomly-selected subset (e.g., two thirds) of the entire solubility data. For each tree, the “left out” molecules (e.g., one third) are called the ‘out-of-bag’ (OOB) sample. Each tree is grown to its maximum size by node splitting, as partly illustrated in Figure 1. In RFR, only a randomly-selected subset of the available descriptors (typically, p/3) is used at each node in each tree. Each tree is grown until the terminal nodes are reached, with each final “leaf” containing a specified minimum number of solubility values, the average of which is the predicted value for the particular tree. The final prediction for the regression model is made by averaging predictions from all trees. All the compounds that did not take part in the tree-growing process (OOB compounds) can be used as an internal validation set to estimate the error of the model.
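The sampling steps in the paragraph above can be sketched without building any trees. The example assumes nothing beyond what is described: rows are drawn with replacement, the rows never drawn form the OOB sample, about p/3 descriptors are drawn at each node, and the forest prediction is the mean over trees. The function names are hypothetical. For a bootstrap of n rows, the expected OOB fraction is (1 − 1/n)^n, which approaches 1/e (about 37%), consistent with the "one third" quoted above.

```python
import random
from typing import List, Tuple


def bootstrap_split(n_rows: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_rows row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    seen = set(in_bag)
    out_of_bag = [i for i in range(n_rows) if i not in seen]
    return in_bag, out_of_bag


def node_descriptor_subset(n_descriptors: int, rng: random.Random) -> List[int]:
    """Pick the descriptor columns tried at one node (p/3, at least one)."""
    m_try = max(1, n_descriptors // 3)
    return rng.sample(range(n_descriptors), m_try)


def forest_predict(tree_predictions: List[float]) -> float:
    """Forest prediction for one molecule: mean of the per-tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


def mean_oob_fraction(n_rows: int, n_trees: int, seed: int = 0) -> float:
    """Average fraction of rows left out of bag over n_trees bootstrap samples."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        _, oob = bootstrap_split(n_rows, rng)
        total += len(oob)
    return total / (n_rows * n_trees)


if __name__ == "__main__":
    print(mean_oob_fraction(n_rows=1000, n_trees=50))
    print(len(node_descriptor_subset(126, random.Random(0))))
```

Because each molecule is out of bag for roughly a third of the trees, averaging only those trees' predictions for it gives an error estimate that does not require a separate hold-out set.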

To assess the predictability of the models in the current study, we randomly split the solubility data into a training set (70%) and an internal test set (30%), as described by Walters [63]. Also, external test sets proposed by others were predicted based on the RFR model trained with all of the molecules (excluding any from the external test sets).

RFR is not sensitive to the presence of irrelevant descriptors, even those which are highly correlated. Hence, “over-fitting” the data is not expected. (However, it is noteworthy that if test set molecules are also included in the training set, then their RFR “prediction” will be very close to the user-provided measured values.) RFR includes built-in estimation of (i) prediction accuracy (as standard deviation of the predicted mean), (ii) descriptor importance (as a result of sensitivity testing of each descriptor), and (iii) similarity between molecules (as a result of the node filtering process). The application of the method to QSAR predictions has been described in detail by Svetnik et al. [64]. An inconvenience of the currently-developed RFR method is that it cannot extrapolate (in the sense that MLR methods can): it cannot predict any solubility value outside of the range encompassed by the training set. For example, the extremely-low (log S0 < -8) solubility of drugs like amiodarone, clofazimine, itraconazole, halofantrine, and probucol is not expected to be well estimated by RFR. The latter molecules are near the edge of the chemical space (defined by the descriptors used) that’s sparsely populated by molecules with similar solubility. The closest molecules are likely to be more soluble than the above test compounds.

The first applications of RFR to predict solubility appeared in 2007 [65, 66]. Schroeter et al. [65] used Sw and SpH data (mixed values not corrected for ionization) to train an RFR method, using ~4000 measurements mostly taken from secondary sources [35, 67, 68] and some from in-house (Bayer Schering Pharma) sources. For the Huuskonen data [35] as test set, RMSE = 0.66 (n=1290) was reported. For the solubility data in the domain of applicability (DOA) matching that of research compounds (10⁻³ to 10⁻⁷ M solubility), the RFR method indicated RMSE ~ 0.85 log unit. In the Palmer et al. [66] RFR analysis, aqueous solubility values of 998 structurally diverse druglike solid organic compounds were gathered from similar secondary sources: Handbook of Aqueous Solubility [69], Huuskonen [35], and Delaney [47]. (It was not reported how molecules were corrected for ionization.) The authors used the Molecular Operating Environment (MOE) [70] to generate 126 two-dimensional (log P, MR, charged-surface properties, atom, group, and H-bond counts, connectivity and topological indices) and 36 three-dimensional (total potential energy, electrostatic contributions, molecular shape, and solvent-accessible surface area) descriptors. Various values of the RFR tuning parameters, ntree, mtry, and nodesize, were explored in the model trained with all of the 2D descriptors, with the best parameter values found to be ntree = 500, mtry = 42, and nodesize = 5, which are the usual default values. The training set of compounds produced the statistics: r2 = 0.98, RMSE = 0.28, n = 988, bias = 0.007. As often pointed out, this is not an accurate measure of the predictability of solubility of molecules not used in the training process. Randomly splitting the entire data into a training set (70%) and an internal test set (30%) produces a good measure of the ability of the model to predict solubility of compounds not included in the training set: r2 = 0.89, RMSE = 0.69, n = 330, bias = 0.017. An external test set produced similar statistics. Including the 3D descriptors did not make substantial improvements to the model.
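The statistics quoted in such comparisons (r2, RMSE, n, bias) can be computed as in the sketch below. Two conventions are assumed here and may differ from those of the cited studies: bias is taken as the mean of (predicted − observed), and r2 as the coefficient of determination, 1 − SS_res/SS_tot, rather than the squared correlation coefficient. The function name and the example values are hypothetical.

```python
import math
from typing import Dict, Sequence


def prediction_stats(observed: Sequence[float],
                     predicted: Sequence[float]) -> Dict[str, float]:
    """Return n, RMSE, bias, and r2 for paired observed/predicted log S values."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "n": n,
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(residuals) / n,        # assumed sign: predicted - observed
        "r2": 1.0 - ss_res / ss_tot,       # assumed definition of r2
    }


if __name__ == "__main__":
    # Invented values: every prediction is 0.5 log unit too low.
    obs = [-1.0, -2.0, -3.0, -4.0]
    pred = [-1.5, -2.5, -3.5, -4.5]
    print(prediction_stats(obs, pred))
```

Computing the same function on the training set and on held-out molecules makes the contrast discussed above explicit: only the held-out figures say anything about predictive ability.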

The most influential descriptors in the Palmer et al. study were calculated to be those related to the fractional van der Waals surface area, VSA. The ten most important descriptors ranked by RFR were log P > negative VSA (PEOE_VSA_FNEG) > number of hydrophobic atoms (a_hyd) > MR > hydrophobic atoms VSA (vsa_hyd) > chi1v (topological) > polar VSA (PEOE_VSA_FPOL) > hydrophobic VSA (PEOE_VSA_FHYD) > MW > negative polar VSA (PEOE_VSA_FPNEG).

More recently, Walters [63] thoroughly compared the Huuskonen thermodynamic Sw values (n = 1274) [34, 35], the Llinas et al. thermodynamic S0 values (n = 94) [19] and PubChem (n=1000) kinetic high-throughput solubility [71] databases using the RFR framework. The publication serves as a very useful tutorial to the machine-learning method, and is highly recommended for those interested in trying RFR.

Gap between prediction and experiment

For 411 compounds characterized by multi-source solubility measurements, Katritzky et al. [72] found standard deviation, SD, to be 0.58 log in replicate values. According to Taskinen and Norinder [31], an AstraZeneca in-house database of solubility measurements of different batches of the same compound typically showed reproducibility of 0.49 log. Higher uncertainties had been discussed (Jorgensen and Duffy [73]; Palmer and Mitchell [74]). It has been a widely-shared view that interlaboratory measurement reproducibility is typically 0.6 log.
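One common way to condense replicate measurements into a single reproducibility figure is a pooled standard deviation over all compounds measured more than once. The sketch below illustrates that calculation only; it is an assumption that a pooled SD is an appropriate summary, and it is not claimed to be the procedure used in any of the cited studies. The function name and the data are hypothetical.

```python
import math
from typing import Dict, List


def pooled_sd(replicates: Dict[str, List[float]]) -> float:
    """Pooled SD of log S values over compounds with two or more measurements."""
    sum_sq = 0.0
    dof = 0
    for values in replicates.values():
        if len(values) < 2:
            continue  # a single measurement carries no reproducibility information
        mean = sum(values) / len(values)
        sum_sq += sum((v - mean) ** 2 for v in values)
        dof += len(values) - 1
    if dof == 0:
        raise ValueError("need at least one compound with two or more measurements")
    return math.sqrt(sum_sq / dof)


if __name__ == "__main__":
    # Invented log S values from different laboratories.
    data = {
        "compound_a": [-3.0, -3.2],
        "compound_b": [-5.0, -5.4, -5.2],
        "compound_c": [-1.0],
    }
    print(round(pooled_sd(data), 3))
```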

As mentioned previously, the solubility prediction errors are often in the 0.6 to 1.3 log unit range [30, 31, 56, 73, 74]. So, one might surmise that prediction methods are approaching the measurement error limit. But this may not be so.

First, many of the early prediction studies considered molecules from a chemical space occupied by relatively simple organic molecules and some complex agrochemicals, which were adequately represented by the then available training set data. In some of these studies, low RMSE values were achieved. Earlier training sets were under-represented in practically insoluble and highly lipophilic druglike molecules, whose physicochemical properties are not easy to measure accurately. In some cases, important descriptors, such as calculated log P, can be off by 1-2 log units (e.g., amiodarone). Since values of log P > 5 or < -2 are difficult to measure accurately by the shake-flask method [28], log P prediction methods can be uncertain for out-of-bounds molecules. At such extreme values, experimental log P values may not strongly correlate with the experimental log S values [75]. Since many of today’s research compounds have very low solubility, the earlier prediction methods that have shown low RMSE are not expected to do as well when subjected to predicting solubility of practically insoluble drug molecules, such as amiodarone and itraconazole, or novel research compounds synthesized in drug discovery programs, for which there may be a shortage of publicly available training set data.

Second, the perceived 0.6 log error in measured solubility may be upwardly biased, given how disparate legacy data have been handled in assembling large training sets. The relatively poor reproducibility may be the result of systematic errors arising from mixing different types of solubility values, measured at different temperatures, or simply gathered from poor-quality measurements. A ‘white paper’ drawing on expert consensus thoughts of researchers from six countries addressed the critical needs related to experimental assay design, and how legacy data can be better processed to reveal improved precision [76]. A related study [9] discussed at length the correction of data for ionization when solution complexity distorts the expected shape of the log S-pH profile predicted by the Henderson-Hasselbalch equation. When solubility values measured in the temperature range 10-50 °C are transformed to values at 25 °C, the estimates of the interlaboratory precision improve [10].
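For the simplest case, the ionization correction follows directly from the Henderson-Hasselbalch relation: for a monoprotic acid, log S = log S0 + log(1 + 10^(pH − pKa)), and for a monoprotic base, log S = log S0 + log(1 + 10^(pKa − pH)). The sketch below inverts these expressions. It assumes a single ionizable group, no salt precipitation, and no aggregation or complexation, which are exactly the complications that the more elaborate analysis discussed above is meant to handle. The function name and inputs are hypothetical.

```python
import math


def log_s0_from_log_s(log_s: float, ph: float, pka: float, kind: str) -> float:
    """Intrinsic log S0 from log S measured at a given pH (monoprotic HH model)."""
    if kind == "acid":
        ionization = math.log10(1.0 + 10.0 ** (ph - pka))
    elif kind == "base":
        ionization = math.log10(1.0 + 10.0 ** (pka - ph))
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    # The ionized fraction raises the total solubility, so it is subtracted
    # to recover the solubility of the uncharged species.
    return log_s - ionization


if __name__ == "__main__":
    # Hypothetical weak acid (pKa 4) measured at pH 6 with log S = -2.0
    print(round(log_s0_from_log_s(-2.0, ph=6.0, pka=4.0, kind="acid"), 3))
    # Hypothetical weak base (pKa 9) measured at pH 9 with log S = -3.0
    print(round(log_s0_from_log_s(-3.0, ph=9.0, pka=9.0, kind="base"), 3))
```

The acid example shows why uncorrected values inflate apparent scatter: two log units above the pKa, the measured solubility exceeds the intrinsic value by about two log units.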

The above two points suggest that the gap between prediction and experimental errors may still be substantial. Similarly, Palmer and Mitchell [74] made the case that it’s not the data that are limiting, but rather it’s the prediction methods (and/or descriptors) that need further improvements. In an earlier review, Faller and Ertl [77] suggested that “no really satisfactory approach to [drug] solubility prediction is available yet,” in spite of the large number of prediction studies.

Quality and chemical space of experimental data

It has been consistently shown that the best prediction models are devised from training set molecules that occupy very similar chemical space (defined by the descriptors used) as those in the test set [63]. For drug solubility prediction, the ideal training sets would consist of molecules of interest to discovery projects. Only a tiny fraction of such measurements are publicly available, and in-house pharma prediction studies are unlikely to be openly publicized.

Measuring equilibrium solubility of ionizable molecules is expensive and analytical-resource consuming. Even given high analytical investment, quality is not assured when results are based on poorly-designed assays.

Factors affecting reproducibility in published solubility data – ‘white paper’ summary

Many of the factors affecting the quality of equilibrium solubility measurement that have been discussed in the consensus report (‘white paper’ [76]) are summarized in the list:

  • dissolution of added solid has not reached equilibrium during the selected equilibration time,

  • solid state characterization not performed after equilibration - polymorphs, hydrates, solvates, nanoparticles, amorphous forms not identified,

  • formations of drug aggregates/oligomers (dimers, trimers, …), micelles, and drug-buffer complexes in solution at equilibrium [78],

  • poor wettability,

  • adsorption to filter/vial surface,

  • inappropriate phase separation methods used, e.g., (i) first centrifuging a saturated solution, then filtering the supernatant (without first saturating the filter); (ii) multiple re-centrifuging a centrifuged solution (without pre-saturating the vial surfaces); (iii) nano-sized particles passing through filter,

  • using unnecessarily high buffer concentrations, possibly effecting drug-buffer complexation [78],

  • not using buffers with low-soluble ionizable drugs (especially weak bases),

  • effect of impurities unaccounted, especially those which are ionizable when unbuffered solutions are used,

  • not measuring the final pH of the equilibrated saturated solution of ionizable drugs (buffered pH may be altered by the drug),

  • not taking into account the effect of ambient CO2 on the water solubility of low-soluble bases in unbuffered solutions,

  • inadequate pH electrode calibration at low/high pH (junction/asymmetry effects), and in drug-salt studies (high ionic strength),

  • compound instability at the extremes of pH or over long saturation times (e.g., indomethacin, acetylsalicylic acid, ascorbic acid),

  • stereoisomers (DL-, D-, L-), (R-/S-), or cis-/trans-isomers not stated,

  • limit of detection (LOD) - not sufficiently sensitive analytical methods used to determine drug concentration below LOD,

  • for ionizable compounds, inaccurate value of pKa used to calculate log S0 from log S-pH profile introduces systematic error.

The impact of the above factors can be minimized by employing good experimental practices and appropriate data analysis methods. However, in today’s solubility prediction methods, factors such as the formation of differing polymorphs, hydrates, solvates, amorphous solids, and the impact of stereoisomers, are not adequately addressed.

Data

Wiki-pS0 database

The intrinsic solubility database, Wiki-pS0 (in-ADME Research), contains 6355 log S0 (log molar) entries, based on measured aqueous solubility values of 3014 different compounds collected from 1325 cited references (as of April 2019). In the majority of the cases, the literature data were further processed, using pDISOL-X (in-ADME Research), to extract intrinsic solubility (S0) values from reported aqueous free-acid/base or salt solubilities (Sw), solubilities at specified pH (SpH), or log S-pH profiles [9, 11, 76, 78-81]. All of the molecules are solids at room temperature (except for propofol, whose Tm is 14 °C). There are 1078 log S0 entries derived from 9907 individual log S measurements at a particular pH (cf., Fig. 1a in [9]). About half of the data sources originate from secondary listings and the rest are from primary sources. In the case of secondary sources, the citations to the original work are generally available, and in many cases were consulted for clarifications. Differently named molecules were identified and reconciled by searching the database for matching Tanimoto structural fingerprint indices [15].
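The duplicate check mentioned at the end of the paragraph above rests on the Tanimoto index, the number of fingerprint bits two molecules share divided by the number of bits set in either. The sketch below shows the comparison on fingerprints represented as sets of "on" bit positions. The bit sets are invented for illustration and are not real fingerprints, the function names are hypothetical, and the threshold of 1.0 (identical fingerprints) is an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto index of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def duplicate_candidates(fingerprints: Dict[str, FrozenSet[int]],
                         threshold: float = 1.0) -> List[Tuple[str, str]]:
    """Pairs of differently named entries whose Tanimoto index meets the threshold."""
    names = sorted(fingerprints)
    return [
        (n1, n2)
        for n1, n2 in combinations(names, 2)
        if tanimoto(fingerprints[n1], fingerprints[n2]) >= threshold
    ]


if __name__ == "__main__":
    # Invented bit sets; a real workflow would compute them from structures.
    fps = {
        "aspirin": frozenset({1, 5, 9, 12}),
        "acetylsalicylic acid": frozenset({1, 5, 9, 12}),
        "salicylic acid": frozenset({1, 5, 9}),
    }
    print(duplicate_candidates(fps))
```

Identical fingerprints do not guarantee identical structures, so pairs flagged this way are best treated as candidates for manual reconciliation.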

For 3671 entries, comments were added to the database records (based on available information in the original sources), briefly noting experimental method used (mostly saturation shake-flask), temperature (23 °C assumed when ‘room temperature’ was stated or no value was provided), equilibration time, apparent quality of data, standard deviation in measured values (if reported), buffers/pH, polymorphic or hydrate form (if identified), method of solid separation, agitation method, etc.

The most reliable data had been determined by the saturation shake-flask (SSF) method (still the “gold standard” methodology in the minds of most experimentalists), especially when taken as a function of pH. Also, two potentiometric instruments have demonstrated their importance: pSOL [82] and CheqSol [83] (both now available from Pion Inc., Billerica, MA, USA). The characterization of solid forms (crystalline, amorphous, nanoparticle, etc.) and their impact on the measured solubility are important considerations (i.e., solvate, polymorph, racemate effects), but these are not always reported/detailed in the solubility studies.

Two websites, ChemSpider (Royal Society of Chemistry, UK; www.chemspider.com) and PubChem (https://pubchem.ncbi.nlm.nih.gov/), were valuable for checking names of molecules, obtaining CAS numbers, getting structure representations (SMILES), melting points (Tm), and the like. ACD/Labs ChemSketch was useful for drawing molecules and constructing SMILES representations for molecules. When measured Tm values were not found (as in 19% of the entries in Wiki-pS0), Tm values predicted by the Lang and Bradley model [84] were used, obtained from the QsarDB open repository of data and prediction tools (http://qsardb.org/repository/predictor/10967/104?model=rf).

Data added to Wiki-pS0 from multi-source compilations (‘low hanging fruit’)

  • PHYSPROP database [67] (Sept 1999 version: over 6000 measured water solubility, Sw): 1327 values were selected for molecules not appreciably ionized in water. Excluded compounds were: (a) Tm < 40 °C, (b) log Sw < -8 or > 0, (c) surfactants/long aliphatic chain molecules, (d) polycyclic aromatic hydrocarbons, (e) peroxides, (f) carboxylic acids, (g) salts/complexes, (h) dyes or names containing color, and (i) herbicides, pesticides, insecticides, rodenticides, and acaricides (as indicated by “tags” at the ChemSpider website). Of the selected 1327 compounds, the Sw values of 1210 nonionizable/nonionized molecules were taken to be S0. The other 117 compounds were processed by pDISOL-X to calculate S0 and pHsat (pH of saturated solution) from the given Sw and pKa, assuming pure water was the solvent, and the Henderson-Hasselbalch equation was valid [11]. Literature references (many from Merck Index and Beilstein – cf., below) were recorded in Wiki-pS0.

  • Handbook of aqueous solubility data [85]: 1130 Sw of druglike molecules were selected, with 776 values subjected to pDISOL-X analysis to determine S0 values. Some values were listed as intrinsic in the handbook, only requiring adjustments when the temperature was not 25 °C. Original references were recorded in Wiki-pS0. Many references were checked; however, references for 65 compounds could not be accessed online. Occasionally, reported Sw values for neutral compounds were actually those of drug-salt measurements, as clarified on checking the original literature.

  • Beilstein [68] (cf., [67]): Sw values of 474 compounds were used, after conversion to the S0 scale, where necessary.

  • Benet-Broccatelli-Oprea 927 BDDCS solubility list [86]. This compilation contains interesting drugs, but no references to original sources were cited and no experimental details were given. Of the drug solubilities listed, 333 were selected. In many cases, the original sources were recognized on cross checking with existing entries in Wiki-pS0. The Sw values were mostly of free bases/acids, but some were clearly of salts, which required careful effort to discern. All selected values were converted to the S0 scale using pDISOL-X.

  • Analytical profiles of drug substances (APDS) [87]. The first 39 volumes of the series of monographs were searched for quantitative solubility data. Monographs on 155 molecules were selected for pre-processing. Most of the reported solubility values of ionizable molecules were measured in pure water with unspecified saturation pH. For those ionizable molecules which were not drug salts, the intrinsic values were calculated by pDISOL-X. Unfortunately, the solubility reported in APDS is often devoid of experimental detail (e.g., temperature not always reported), some citing ‘personal communication’ as references. Nevertheless, there are several high-quality log S - pH original data sets in the monographs.

  • Merck index [88]. Sw values of 173 molecules were used, after conversion to the S0 scale. The Merck Index is often cited in older databases (e.g., [67]), but it may not be a sufficiently reliable general source for critical studies (literature references not usually given, details often lacking, etc.).

  • Biowaiver monographs for immediate release solid oral dosage forms [89]. Dressman and colleagues published a series of papers (2005-2018), from which 14 drug solubility values were added to Wiki-pS0, some of which had not been published previously.

  • Miscellaneous collections: Freier’s book [90] - 96 values were selected; Handbook of Biochemistry [91] - 54 values were used; Kühne et al. tabulation [33] - 53 values used; Mullin’s book [92] - 51 values used; Raevsky et al. tabulation [29] - 32 values used.
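The conversion of a water solubility, Sw, into S0 and pHsat, as described above for the PHYSPROP entries, can be illustrated with a minimal sketch for a monoprotic acid dissolved in pure water. This is not the pDISOL-X algorithm: it assumes ideal behavior (no activity or ionic-strength corrections), a valid Henderson-Hasselbalch relationship, and Kw = 1e-14. The function name and the example inputs are hypothetical.

```python
import math

KW = 1.0e-14  # ion product of water at 25 °C (assumed)


def s0_from_sw_monoprotic_acid(sw, pka):
    """Return (S0, pHsat) for a monoprotic acid HA saturated in pure water.

    Unknowns: a = [A-] and h = [H+]. Conditions used:
      mass balance    Sw = S0 + a
      charge balance  h  = a + Kw / h
      ionization      Ka = h * a / S0
    """
    ka = 10.0 ** (-pka)

    def h_of(a):
        # positive root of h**2 - a*h - Kw = 0 (charge balance)
        return 0.5 * (a + math.sqrt(a * a + 4.0 * KW))

    def f(a):
        # zero when the ionization equilibrium is satisfied
        return h_of(a) * a - ka * (sw - a)

    lo, hi = 0.0, sw  # f(lo) < 0 < f(hi), and f increases with a
    for _ in range(200):  # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    a = 0.5 * (lo + hi)
    return sw - a, -math.log10(h_of(a))


# hypothetical example: Sw = 1.0 mM, pKa = 4.0
s0, ph_sat = s0_from_sw_monoprotic_acid(1.0e-3, 4.0)
print(f"log S0 = {math.log10(s0):.2f}, pHsat = {ph_sat:.2f}")
```

For a base, or for a multiprotic molecule, the mass- and charge-balance expressions change accordingly; the bisection step stays the same.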

Single-source measurement of many compounds (‘quick catches’)

The small single-source databases below consist largely of intrinsic solubility values. Useful collections of original measurements included those of McFarland et al. [93], Bergström and coworkers [94-98], and Faller and Ertl [77].

  • Avdeef [80] - 39 values, not published elsewhere, were used.

  • Rytting et al. [99] - free-base/acid (no salts used) SSF-measured Sw: solubility of 113 molecules, all measured in one laboratory, with S0 calculated by pDISOL-X.

  • CheqSol log S0 at 25 °C (potentiometric) - 233 values for 145 molecules collected from several publications: Stuart and Box [83], Sköld et al. [100], Llinàs et al. [19, 101], Box and Comer [102], Hopfinger et al. [103], Narasimham et al. [104], Hsieh et al. [105], Comer et al. [106], Palmer and Mitchell [74], Etherson et al. [107]; Schönherr et al. [108]; Fornells et al. [109], and Baek et al. [110].

  • pSOL log S0 at 25 °C (potentiometric) – 75 published values were collected: Avdeef [111, 112], Avdeef et al. [82], Avdeef and Berger [113], Faller and Wohnsland [114], Bergström et al. [115], Fioritto et al. [116], and Ottaviani et al. [117].

Data from miscellaneous primary sources (‘deep-sea fishing’)

About 2000 solubility values were gathered from various primary (non-database) sources. Those publications which contained measurements as a function of pH were particularly valuable. A large fraction of the primary source data originated from a few journals: Int. J. Pharm., J. Pharm. Sci., Pharm. Res., J. Chem. Eng. Data, Eur. J. Pharm. Sci., AAPS PharmSciTech, AAPS J, J. Chem. Inf. Comput. Sci., and Ind. Eng. Chem. Res.

Sources of pKa data

The pKa values of the ionizable molecules were taken from Avdeef [80] (cf., www.in-ADME.com/wiki_pka.php/) and various other established sources. When no experimental values were found, the values calculated by MarvinSketch 5.3.7 (ChemAxon Ltd., www.chemaxon.com) were used. The pKa values were automatically adjusted for changes in the ionic strength [11, 80] and temperature [118] by pDISOL-X.

Units conversion

Solubility data have been reported in many concentration units: mol/L (molarity, M), mM, μM, mol/kg (molality, m), mole fraction (x), mg/mL, μg/mL, mg/100mL, mg/dL, %w/v, g%mL, mg/mL%, mg%, “1 in 15 of water,” “soluble in 3 parts of water,” “2% soluble in water,” units of IU/mL, etc. Mole fraction and molality units are almost always used when solubility is determined over a wide range of temperatures, since the units do not depend on the density of the solutions. In the clearly presented accounts, the equivalent molecular weight to use to convert the practical units (e.g., μg/mL) to molarity is stated (e.g., “concentration is expressed as free base equivalent”). In practice, it is all too easy to make a mistake in converting the reported units to the preferred molarity scale, so extra care is recommended.
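A minimal sketch of the unit conversion is given below. It covers only a few of the units listed above. The function name, the unit labels, and the example values are hypothetical, and the molecular weight must be that of the form in which the concentration is expressed (e.g., the free base when values are given as free base equivalent).

```python
import math

# factors converting each mass-based unit to g/L
TO_GRAMS_PER_LITER = {
    "mg/mL": 1.0,
    "ug/mL": 1.0e-3,
    "mg/100mL": 1.0e-2,  # same as mg/dL
    "%w/v": 10.0,        # g per 100 mL
}


def log_molarity(value, unit, mol_weight=None):
    """Convert a reported solubility to log10 of molarity (mol/L)."""
    if unit == "M":
        molar = value
    elif unit == "mM":
        molar = value * 1.0e-3
    elif unit == "uM":
        molar = value * 1.0e-6
    else:
        if mol_weight is None:
            raise ValueError("mass-based units need a molecular weight")
        molar = value * TO_GRAMS_PER_LITER[unit] / mol_weight
    return math.log10(molar)


# hypothetical example: 1 mg/mL of a compound with MW = 200 g/mol
print(round(log_molarity(1.0, "mg/mL", 200.0), 2))
```

Descriptive entries such as “1 in 15 of water” (taken as 1 g in 15 mL) would first need to be turned into a mass concentration by hand before a helper of this kind could be applied.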

It could be argued that solubility should be tabulated in logarithmic units (preferably based on molarity). (i) Direct values span over 12 orders of magnitude and cannot be accurately depicted in S-pH plots at the low end of the scale (sic - log of “zero” solubility is undefined). Unfortunately, raw S-pH data are often presented only in a plot, with points plotted at ~zero. (ii) Errors in log S values do not depend on the magnitude of the log S (whereas they do when direct units are considered). This is problematic when refinement of constants is based on S measurements and unit weights are assumed.
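Point (ii) can be made explicit with first-order error propagation. The relation below assumes that the measurement error in S is roughly proportional to S (constant relative error); that proportionality is an assumption made here for illustration.

```latex
\sigma_{\log S} \;\approx\; \left|\frac{d\,\log_{10} S}{dS}\right| \sigma_S
\;=\; \frac{1}{\ln 10}\,\frac{\sigma_S}{S}
\;\approx\; 0.434\,\frac{\sigma_S}{S}
```

Under that assumption the ratio σ_S/S is the same at both ends of the scale, so σ_log S is constant and unit weights are reasonable for log S, whereas σ_S itself would vary by many orders of magnitude across the database.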

In the Wiki-pS0 database, values reported in molality units are noted, but are seldom converted to those in molarity (by applying solution density), since the differences are small around the temperature range of interest, and since solution density is usually not reported.
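For reference, the relation between molality and molarity can be written as follows, with m in mol/kg, c in mol/L, the solution density ρ in g/mL, and the solute molar mass M in g/mol (symbols chosen here for illustration):

```latex
c \;=\; \frac{1000\,\rho\,m}{1000 + m\,M}
```

For a sparingly soluble compound in water, m·M is far smaller than 1000 and ρ is close to 1 g/mL, so c ≈ m, which is consistent with the small differences noted above.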

Interlaboratory reproducibility

There are 870 different molecules in Wiki-pS0 for which solubility was reported from at least two different sources. This formed the basis for estimating interlaboratory reproducibility. Some molecules had been studied in many different laboratories. For example, there were 33 different reports of the solubility of diclofenac found to date, with 17 of these measured at several different pH values, whose complicated profiles were reconciled and discussed by Bergström and Avdeef [79]. The next most-frequently studied molecules are phenytoin, barbital, and ketoprofen, with 30, 26, and 24 interlaboratory determinations, respectively. The average interlaboratory reproducibility, SDavg, based on the curated 870 replicated studies, has been determined to be 0.17 log unit, significantly lower than the experimental reproducibility suggested in past studies (~0.6 log unit) [72-74]. As noted above, many factors can lead to the perception of poor reproducibility of measurements. It takes some effort to factor in the possible sources of systematic error, to attain the low SDavg. Still, for some difficult-to-measure drug molecules, the intrinsic solubility is quite uncertain, with SD values exceeding 0.5 log unit [20, 79].
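The bookkeeping behind an interlaboratory SD can be sketched as follows. The exact averaging procedure used to obtain SDavg is not spelled out here, so the sketch assumes a simple mean of per-molecule SD values, with SD in the 1/n form given in the Abbreviations and definitions section. The function name and the example records are hypothetical.

```python
import math
from collections import defaultdict


def interlab_sd(records):
    """records: iterable of (molecule, source, log_s0).

    Returns ({molecule: SD}, SDavg), using only molecules reported by
    at least two different sources.
    """
    by_molecule = defaultdict(dict)
    for molecule, source, log_s0 in records:
        # keeps one value per source (a later duplicate overwrites)
        by_molecule[molecule][source] = log_s0

    sd = {}
    for molecule, per_source in by_molecule.items():
        values = list(per_source.values())
        if len(values) < 2:
            continue  # single-source molecules carry no interlab information
        mean = sum(values) / len(values)
        sd[molecule] = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    sd_avg = sum(sd.values()) / len(sd) if sd else float("nan")
    return sd, sd_avg


# made-up records, for illustration only
example_records = [
    ("drug_a", "lab1", -4.0), ("drug_a", "lab2", -4.2),
    ("drug_b", "lab1", -6.0), ("drug_b", "lab3", -6.4),
    ("drug_c", "lab1", -2.0),
]
sd_by_molecule, sd_avg = interlab_sd(example_records)
print(sd_by_molecule, round(sd_avg, 2))
```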

Physicochemical properties of database molecules

The 6355 intrinsic solubility set ranges in log S0 from -11.0 to +1.8 (log molarity), essentially with a Gaussian distribution: mean = -3.04, median = -3.00, SD = 1.88. Figure 2 shows the solubility distribution for the molecules. About 47% of the entries have log S0 between -7 and -3, the typical range (DOA – domain of applicability) of values for drugs and research compounds [65]. About 2% of the molecules have log S0 < -7. Some of the least-soluble molecules (log S0 < -8) in the database are amiodarone < clofazimine < itraconazole < halofantrine < ubiquinone < epristeride < vinorelbine < silafluofen < cosalane < etretinate < probucol < arotinoic acid < clomifene < motretinide < lasalocid < carbenoxolone. The most soluble (log S0 > 0) substances are amino acids, simple carboxylic acids, and carbohydrates.

Figure 2.

Distribution of intrinsic solubility values in Wiki-pS0.

Figure 3 shows the trend between measured log S0 and calculated log P (RDKit [15]), the most important descriptor in the prediction of solubility. The scatter is substantial, and the trend is perhaps nonlinear at the extremes of the scales. The measured extreme values of log S0 are possibly more accurate (since these are mostly determined from multi-point log S-pH profiles) than the corresponding calculated log P (cf., ubiquinone and amikacin log P values). The traditional shake-flask method for direct measurement of log P is thought to be limited to the range -2 to +5, so methods for prediction of log P would be hard pressed to extrapolate accurately beyond that range, in the absence of reliably measured log P training-set values.

Figure 3.

Plot of log S0 (largely SSF type) versus octanol-water partition coefficient, log P, calculated using the RDKit software [15].

Figure 4 shows the distribution of errors determined by averaging the log S0 of those replicate molecules measured in different laboratories. The average value of interlaboratory standard deviation is SDavg = 0.17 log unit. The individual SD values trend to higher values as solubility decreases (Fig. 4b).

Figure 4.

Interlaboratory reproducibility, as indicated by SD, was determined from averaging log S0 derived from different sources. (a) Error distribution for the 870 replicates. (b) Interlaboratory average log S0 plotted against the corresponding SD values. The trend suggests that the lowest solubility values have the highest errors, but the data scatter is high.

The molecule showing the poorest reproducibility, with SD = 0.93 log unit (avg. from five sources), is clofazimine. It is also among the least soluble molecules in the database, with average log S0 = -9.05. The weakly dibasic (pKa 3.83, 7.54 at 37 °C, I = 0.15 M [105]) phenazine antibiotic (MW 473.4 g/mol) is used to treat leprosy. The orally-bioavailable molecule has the very unusual characteristic of precipitating and accumulating as easily-visible red microcrystals in macrophages [119].

Rule of 5 characteristics

Figure 5 shows the distribution of properties used by Lipinski et al. [120] to define the Rule of 5 as an indicator of “drug-likeness.” Frame (a) shows the log P distribution, with the average value of 1.89. About 80% of the 6355 entries fall within the range of 0 to 5 (expected range for druglike molecules). Frame (b) shows the distribution of molecular weights, with the mean value 280 g/mol. About 95% of the molecules have MW < 500 g/mol (‘good’ range). Frame (c) considers H-bonding characteristics. The red bars (tallest) refer to H-bond donor counts (NHD), where 98% NHD ≤ 5 (‘good’). The black bars (extending to higher counts) refer to H-bond acceptors (NHA), where 97% NHA ≤ 10 (‘good’). For the most part, the database molecules are in the expected boundaries of drug-likeness, with log P showing some violations at the high end, and more so at the low end for about 20% of the entries.
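The tallies quoted above amount to simple threshold counts. The sketch below uses the ‘good’ limits as stated in this section and assumes that log P, MW, NHD and NHA have already been calculated elsewhere (e.g., with a cheminformatics toolkit). The class, the function names, and the three example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Props:
    log_p: float  # calculated octanol-water log P
    mw: float     # molecular weight, g/mol
    nhd: int      # H-bond donor count
    nha: int      # H-bond acceptor count


def rule_of_5_violations(p):
    """Count violations of the 'good' limits quoted in the text."""
    return sum([p.log_p > 5, p.mw >= 500, p.nhd > 5, p.nha > 10])


def fraction_passing(entries, check):
    """Fraction of entries for which check(entry) is true."""
    return sum(1 for e in entries if check(e)) / len(entries)


# made-up entries, for illustration only
entries = [
    Props(1.9, 280.0, 1, 4),
    Props(7.3, 645.0, 1, 4),
    Props(-3.0, 585.0, 13, 17),
]
print([rule_of_5_violations(e) for e in entries])
print(fraction_passing(entries, lambda e: 0 <= e.log_p <= 5))
```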

Figure 5.

Rule of 5 property distributions: (a) log P, (b) molecular weight (MW), and (c) number of H-bond donors (NHD) and acceptors (NHA). Most of the molecules have ‘druglike’ properties.

Results and discussion

Table 1 summarizes the results of the weighted multiple linear regression (MLR) analysis of the Abraham solvation equation (ABSOLV), and the ‘trained’ version of Yalkowsky’s general solubility equation (GSE). Also listed are the Random Forest regression (RFR) metrics. The 22 quaternary ammonium compounds were treated as a separate subset, using just some of the Abraham descriptors. The remaining 6333 solubility values were subjected to the full MLR analyses. Furthermore, the molecules were considered separately in each of four acid-base classes – with reference to predominant charge state at pH 7.4: acids(-), bases(+), neutrals(0), and zwitterions(±), as well as in combined classes.

Table 1. Results of log S0 prediction using three computational models (a).

[Table 1: image admet-8-766-t001.jpg in the original article]

(a) Descriptors defined in Abbreviations and definitions section. n(tr) = training set count; n(val) = count for internal test set validation. The calculations with n = 6333 did not include the 22 quaternary ammonium drugs.

Yalkowsky’s general solubility equation (GSE)

It was of interest to see how well the GSE (untrained) predicted solubility values in the database. Figure 6 shows the results of applying Eq. 1 to the acid-base subset data. The first three classes (Figs. 6a-c) have similar statistical metrics: r2 = 0.54 to 0.61, RMSE = 1.15 to 1.24, bias = -0.14 to -0.30, and MPP = 37-40% (measure of prediction performance: percentage of absolute residuals ≤ 0.5 log unit). The GSE did not perform as well for the zwitterions (Fig. 6d): r2 = 0.07, RMSE = 1.54, bias = +0.34, and MPP = 25%. The average calculated log P [15] for the zwitterion set is 0.07 (Table 1), suggesting that the GSE prediction of zwitterions is based largely on Tm contributions. When all the classes were combined (n = 6333, excluding 22 quaternary ammonium drugs), the untrained GSE prediction yielded r2 = 0.57 and RMSE = 1.23 (Table 1).
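A minimal sketch of the untrained GSE calculation follows. It assumes that Eq. 1 has the commonly published form, log S0 = 0.5 - log P - 0.01 (Tm - 25), with Tm in °C and the traditional coefficients (0.5, -1.0, -0.01). The function name and the example inputs are hypothetical.

```python
def gse_log_s0(log_p, tm_celsius):
    """Untrained GSE estimate of log S0 (log molarity)."""
    # Usual convention: the melting-point term vanishes for liquids,
    # i.e. for compounds melting at or below 25 °C.
    return 0.5 - log_p - 0.01 * max(tm_celsius - 25.0, 0.0)


# hypothetical solid with log P = 2.0 and Tm = 125 °C
print(gse_log_s0(2.0, 125.0))
```

Only two inputs are required, which is why the equation serves as a convenient benchmark; the quality of the result then depends directly on the quality of the log P and Tm values supplied.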

Figure 6.

Prediction of the Wiki-pS0 database log S0 values using Yalkowsky’s General Solubility Equation (GSE), Eq. 1. The molecules are divided into four acid-base classes with reference to predominant charge state at pH 7.4: acids(-), bases(+), neutrals(0), and zwitterions(±). The solid diagonal is the identity line. The dashed lines are displaced from the identity line by ±0.5 log. The pie chart refers to the percentage of ‘correct’ predictions, MPP (measure of prediction performance).

When the fixed coefficients in Eq. 1 (0.5, -1.0, -0.01) were subjected to regression using weighted MLR, the fit improved only slightly for the combined acid-base classes: r2 = 0.60, RMSE = 1.17, n = 6333, but the refined coefficients (-0.33, -0.83, -0.006) were quite different from the traditional values, especially for the intercept coefficient (Table 1). This may be due to the negative correlation between the intercept and the Tm terms (-82 to -97%). When the molecules were examined by the acid-base classes, the acids most resembled the results of the untrained GSE, with coefficients (0.62, -0.94, -0.0115) and metrics: r2 = 0.70, RMSE = 1.02, n = 1424. The bases and neutrals indicated a negative intercept, -0.28, with only slightly improved metrics (Table 1). The zwitterion class showed a reversal of signs for both the intercept and the Tm coefficients, with slightly improved metrics: r2 = 0.22, RMSE = 1.28, n = 600.

Weighted multiple linear regression using Abraham descriptors (ABSOLV)

Figure 7 displays, by acid-base classes, the results of the weighted MLR analysis using the five Abraham ABSOLV descriptors plus the A·B cross-product term. The statistical metrics were similar for the four classes: r2 = 0.61 to 0.73, RMSE = 0.77 (zwitterions) to 1.01 (neutrals), and 40-43% ‘correct’ values (MPP). The performance was slightly better than that of the GSE (trained or untrained), and a lot better in the case of zwitterions. The refined ABSOLV coefficients (Table 1) indicate acid-base class differences. These coefficients are not similar to the ones reported by Abraham and Le [13]. In MLR, such differences in coefficients can arise when different training sets are used, as a result of correlations between descriptors. It was found that const:A correlations ranged -50 to -83% and const:AB correlations ranged +57 to +79%.
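The general form of the regression described above, with five Abraham descriptors plus the A·B cross-product term, can be written as follows. The coefficient symbols (c0, e, s, a, b, v, k) are notation chosen here for illustration; the fitted values for each acid-base class are those listed in Table 1.

```latex
\log S_0 \;=\; c_0 + e\,E + s\,S_\pi + a\,A + b\,B + v\,V + k\,(A \cdot B)
```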

Figure 7.

MLR prediction of log S0 in the Wiki-pS0 database using Abraham Solvation Equation (ABSOLV), Eq. 2. The molecules are divided into four acid-base classes with reference to predominant charge state at pH 7.4: acids(-), bases(+), neutrals(0), and zwitterions(±). The solid diagonal is the identity line. The dashed lines are displaced from the identity line by ±0.5 log. The pie chart refers to the percentage of ‘correct’ predictions, MPP (measure of prediction performance).

Random Forest regression using RDKit combined with Abraham descriptors and melting points

Descriptors

For the RFR model building, the 193 RDKit (2014 version) descriptors calculated were pooled with the Tm (81% values measured, the rest calculated) and the calculated ABSOLV descriptors. The Abbreviations and definitions section below identifies and defines the most important descriptors used in the RFR algorithm.

Training set and internal validation

Figure 8a shows the entire training set RFR analysis, with the metrics: r2 = 0.95, RMSE = 0.40, bias = -0.007. This is not a good measure of the predictive power of the method. Rather, it indicates how well the model can incorporate the information represented by the descriptors and relate it to solubility in the training set [66]. The randomly selected internal test set of 1906 solubility values (30%) is a better indicator of the ability of the model to predict external test compounds which are unknown to the training process. Figure 8b shows the internal test set prediction results: r2 = 0.89, RMSE = 0.60, bias = 0.0002. This performance is to be expected for external test molecules which are well-represented by the chemical space of the database, as illustrated below.
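The random partition into training and internal test sets can be sketched as below. How the random selection was actually carried out is not described here, so the shuffling scheme, the seed, and the function name are assumptions; only the 30% fraction and the database size of 6355 entries come from the text.

```python
import random


def split_train_test(n_entries, test_fraction=0.30, seed=0):
    """Randomly partition entry indices into training and internal test sets."""
    indices = list(range(n_entries))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_test = int(n_entries * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_train_test(6355)
print(len(train_idx), len(test_idx))
```

With 6355 entries, a 30% fraction gives the 1906 internal test values quoted above. The model is then fitted on the training indices only and evaluated on the held-out indices.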

Figure 8.

Random Forest regression analysis. The solid diagonals are the identity lines, and the dashed lines refer to ±0.5 log deviations. The MPP pie charts refer to percentage of ‘correct’ prediction, with absolute residuals ≤0.5 log. (a) Training set using the entire database. (b) Internal test sets, based on 30% of the database. The unfilled-circle symbols correspond to the zwitterion internal test set (30% of 600).

The bottom section of Table 1 summarizes the analysis metrics, both for the entire data set and for the acid-base subsets. The best internal test set performance was found for the zwitterions: r2 = 0.91, RMSE = 0.45. The right-most column identifies the ten most-important descriptors in the analysis. For the overall data, and for the acid, base, and neutral subsets, the most important descriptor is log P. It’s particularly noteworthy that log P is not in the top-10 list for the zwitterions. In several of the cases, the second-most important descriptor is molecular refractivity (cf., Abbreviations and definitions for the RDKit terminology). Topological indices play particularly important roles in the zwitterion subset.

Principal component analysis of thirty of the most important RDKit descriptors

The principal component analysis (PCA) function prcomp() in R (part of the stats package), together with the factoextra library, was used to process the 30 most important descriptors identified in RFR. Figure 9 shows the loading plot based on the first two principal components, which account for 63% of the total variance in the descriptors. Only the HallKierAlpha descriptor has a negative PC1 value, with all of the rest of the descriptors being in the positive PC1 domain. The close proximity of many of the descriptors to each other suggests high correlation between them. Such correlations would be problematic in MLR analysis, but not in RFR.

Figure 9.

Principal components analysis loading plot for the 30-most important RFR descriptors. The zoom view identifies highly-correlated size-related descriptors. Circles represent the 10-most important descriptors; squares represent the second 10-most important descriptors; diamonds represent the remaining ranking.

Figure 10 shows the scores plots for the solubility data. Frame (a), which considers only the molecules with MW < 500 g/mol, shows a very dense but apparently symmetrical distribution about the origin. As MWs increase, the points shift in the direction of increasing PC1. Frame (b) shows the molecules with MW > 500 g/mol. The distribution is sparse and further shifted to increasing PC1 values, as MW values increase. Frame (c) shows all the data with the acid-base subset notation. Very large molecules are thinly represented in the bottom-right quadrant. Zwitterions tend to be in the negative PC2 half, evenly distributed in PC1.

Figure 10.

Principal components scores plot for the RFR training set. (a) Molecular weights < 500 g/mol; (b) MW > 500 g/mol; (c) “Comet-shaped” distribution for the entire database by acid-base classes. Symbols have the same meaning as in Figures 6 and 7. The green diamonds refer to quaternary ammonium drugs.

Validation against four external test sets

Four external test sets were selected to explore how well the GSE, ABSOLV, and RFR models perform. For each of the test sets, all the test molecules found in the training set were removed, so that the prediction was of truly “unknown” molecules. This was not necessary for the traditional GSE model, since it requires no training. The observed and calculated values are listed in Appendix Tables A1-A4.

Figure 11 displays the correlation plots of the GSE calculation for each of the four test sets, using RDKit-calculated log P. RMSE values range from 0.97 to 1.24, with 22-42% of the data ‘correctly’ predicted (MPP).

Figure 12 displays the correlation plots of the ABSOLV weighted MLR analysis for each of the four test sets. The ABSOLV model predicted the Hopfinger et al. Test Set 2 better than did the GSE model (RMSE 0.98 vs. 1.23), but did not do as well with Test Set 1 (RMSE 1.15 vs. 0.97). The performances with Test Sets 3 and 4 were comparable between GSE and ABSOLV models, with RMSE values ranging from 1.02 to 1.24.

Figure 13 displays the correlation plots of the RFR model for each of the four test sets. The overall statistics (r2 = 0.66-0.83, RMSE = 0.75-1.05) indicate that the predictions are better than those in the other two models.

Figure 13.

RFR prediction of the four external test sets. With 3 outliers removed (n=29) in (d), r2 = 0.82, RMSE=0.76, F=121, bias = -0.31, with 41% residuals falling inside the dashed lines.

However, there were two main problem areas in the RFR modeling, as indicated by poor fit: (i) Fig. 13a shows the outlier pesticides 4,4’-DDT, 2,2’,4,5,5’-PCB and chlordane; (ii) Fig. 13d shows the outlier drugs amiodarone, clofazimine, and itraconazole.

Case (i) can be remedied. The Wiki-pS0 database has very few agrochemicals and no DDT or PCB derivatives. We decided to temporarily augment our database with agrochemicals, to see if RFR prediction could be improved for Test Set 1 (Fig. 13a). The Huuskonen [35] database of 1297 organic molecules was screened with three filters: (a) only compounds with log Sw < -5 would be used; (b) only solids would be considered; and (c) Test Set 1 compounds would be excluded. This process resulted in 115 new entries to the augmented database. Figure 14 shows the improved results. By adding a few agrochemicals to the RFR training set, r2 increased from 0.83 to 0.90, RMSE decreased from 0.83 to 0.66, bias lowered from -0.23 to +0.02, and ‘correct’ predictions increased from 57 to 71%. The well-known adage that “like predicts like” is amply illustrated in this example.
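The three screening filters can be expressed as a short selection step. The record layout (keys 'name', 'log_sw', 'is_solid'), the function name, and the example records are hypothetical, and matching compounds by name is a simplification chosen for this sketch.

```python
def select_for_augmentation(candidates, test_set_names, max_log_sw=-5.0):
    """Apply the three screening filters described above.

    candidates: list of dicts with hypothetical keys
      'name', 'log_sw' (log molarity) and 'is_solid' (bool).
    """
    excluded = {name.lower() for name in test_set_names}
    return [
        c for c in candidates
        if c["log_sw"] < max_log_sw            # (a) poorly soluble only
        and c["is_solid"]                      # (b) solids only
        and c["name"].lower() not in excluded  # (c) not in Test Set 1
    ]


# made-up records, for illustration only
candidates = [
    {"name": "compound_x", "log_sw": -6.2, "is_solid": True},
    {"name": "compound_y", "log_sw": -4.1, "is_solid": True},
    {"name": "compound_z", "log_sw": -7.0, "is_solid": False},
    {"name": "compound_w", "log_sw": -7.8, "is_solid": True},
]
selected = select_for_augmentation(candidates, ["Compound_W"])
print([c["name"] for c in selected])
```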

Figure 14.

Prediction of Test Set 1 molecules with an augmented RFR training set.

Antipyrine appears to be poorly fit for reasons related to uncertainty in calculated log P (calc: 1.48, obs: 0.38). Replacement of the calculated with the observed value improved the antipyrine fit by 0.2 log units, suggesting that other descriptors may be problematic. (The improvement in the GSE calculation was 1.2 log units for antipyrine.)

Case (ii) remains problematic: it is a training-set “missing neighbors” problem. As is evident in Fig. 13d, amiodarone, clofazimine, and itraconazole are poorly predicted, in part because there are few other molecules in the database possessing the properties of these three compounds (cf., upper right edge in scores plot Fig. 10c), and also because RFR cannot extrapolate solubility beyond the range of its training data. From the PCA, the five nearest neighbors to amiodarone (log S0 = -10.4), based on three principal components, are halofantrine, irbesartan, butaperazine, mifepristone, and probucol. The log S0 values for these neighbors show high variance: -8.0, -3.7, -4.3, -5.2, and -8.4, respectively. The RFR-predicted value for amiodarone is log S0 = -6.8, which is much closer to the average value of the five nearest neighbors (-5.9) than to the measured value. To do better, the database needs new neighbors in the chemical space close to amiodarone, clofazimine, and itraconazole. Alternatively, better descriptors are needed to define the chemical space, so that truly ‘similar’ molecules will have nearly the same solubility values. With the three outliers removed, the metrics improve: r2 = 0.82, RMSE = 0.76, bias = -0.31, and MPP = 41%.
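The arithmetic behind the nearest-neighbor comparison can be checked directly from the values quoted above. The short script below only restates those numbers; the use of the population (1/n) standard deviation is a choice made here.

```python
from statistics import mean, pstdev

# log S0 of the five nearest neighbors of amiodarone, as quoted in the text
neighbors = {
    "halofantrine": -8.0,
    "irbesartan": -3.7,
    "butaperazine": -4.3,
    "mifepristone": -5.2,
    "probucol": -8.4,
}
observed, predicted = -10.4, -6.8  # amiodarone: measured and RFR-predicted

neighbor_mean = mean(neighbors.values())
neighbor_sd = pstdev(neighbors.values())  # 1/n form of the SD
print(f"neighbor mean = {neighbor_mean:.2f}, SD = {neighbor_sd:.2f}")
print(f"prediction - neighbor mean = {predicted - neighbor_mean:.2f}")
print(f"observed - prediction = {observed - predicted:.2f}")
```

The spread among the neighbors is close to the spread of the whole database (SD = 1.88), which illustrates why an average over such neighbors carries little information about amiodarone itself.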

Prediction of solubility of quaternary ammonium drugs

Quaternary ammonium compounds are salts, and so do not fall into the category of neutral species associated with the log S0 constants studied here. GSE and RFR methods did not produce satisfactory results (r2 ~ 0 in both cases) for this subclass of compounds. However, it was possible to come up with a modified ABSOLV model for this small group of molecules (n=22), based on the equation:

[Equation 4: image admet-8-766-e004.jpg in the original article] (4)

with r2 = 0.97 and RMSE = 0.27, where Sπ in Eq. (4) is the dipolarity/polarizability Abraham descriptor. Figure 15 compares the tested calculations. Strong H-bond donors (acids) decrease solubility, whereas strong H-bond acceptors (bases) have the opposite effect. High dipolarity/polarizability compounds are associated with low solubility.

Figure 15.

Prediction models for quaternary ammonium compounds. Here, S0 represents the quaternary ammonium salt solubility, SQA.

Summary

The properties of the chemical space of druglike molecules in the Wiki-pS0 database of intrinsic aqueous solubility were described in considerable detail. The database was used to train two solubility prediction models: multiple linear regression (MLR) and Random Forest regression (RFR). The predictivity of the models was tested with four external sets of compounds. The MLR model incorporated calculated Abraham solvation descriptors (ABSOLV). The RFR model used the aggregate set of Tm (mostly measured values), ABSOLV, and RDKit 2D (204 descriptors in all). As a comparative benchmark, the General Solubility Equation (GSE), which requires no training, was used to predict the intrinsic solubility of the Wiki-pS0 druglike molecules.

For the intrinsic solubility set, excluding the permanently-charged quaternary amines, the RMSE values were 1.23 (GSE), 1.00 (ABSOLV), and 0.28 (RFR) for the training sets. The intrinsic set was further divided into four subsets, based on dominant charge at pH 7.4: acids(-), bases(+), neutrals(0), and zwitterions(±). The performances of GSE and ABSOLV were comparable for acids, bases, and neutrals, but for the zwitterionic subset, ABSOLV was better.

For the permanently-charged quaternary amines (n=22), neither GSE nor RFR performed well (r2 ~ 0). It was possible to develop a simplified ABSOLV training-set model using just three of the solvation descriptors.

The above comparisons are not entirely satisfactory tests of the predictivity of the three methods. For the RFR method, the data were randomly separated into a training set (70%) and an internal test set (30%). For the internal test set calculation, RMSE = 0.60 and MPP = 76% (‘correct’ predictions). For the zwitterionic subset, RMSE = 0.45 and MPP = 91%.

The four external test sets allowed the comparisons of the three models in a uniform way. Test Set 1 (te1) was compiled by Yalkowsky and Banerjee [18] for testing the GSE. The other three test sets consisted of druglike molecules, all solids at room temperature, containing no agrochemicals. Test Set 2 (te2) molecules were originally used in the first Solubility Challenge [19, 103], and Test Sets 3 (te3) and 4 (te4) molecules were used in the second Solubility Challenge [20].

The GSE applied to simple organic compounds (te1) indicated RMSE = 0.97 and MPP = 29% ‘correct’ predictions. When experimental log P values are used in Eq. (1) [18], the te1 performance improves: RMSE = 0.72 and MPP = 52%.

RFR outperformed the other two methods on the whole. When Wiki-pS0 was augmented with 115 agrochemicals, te1 prediction improved (RMSE = 0.66, MPP = 71%), and was better than that of GSE. For te2 and te3 drug solubility RFR predictions, RMSE = 0.85 and 0.75, resp., whereas MPP = 50 and 57%, resp. There were three molecules in te4 that RFR did not predict well: amiodarone, clofazimine, and itraconazole. Apparently, the current database has limited chemical space coverage in the vicinity of these outliers. With the three outliers removed, RMSE = 0.76 and MPP = 41% for te4.

Conclusion

The GSE is popular for its simplicity and ease of calculation. It is a convenient benchmark against which to assess new prediction methods. Druglike molecules are expected to be predicted by GSE to within 1.1-1.2 log unit, or to within 0.5 log unit 22-42% of the time. However, its performance with zwitterionic molecules is limited. The ABSOLV method holds the middle position in the comparisons. The RFR method in this study is attractive, both for its predictive performance and ease of use. It is expected to predict druglike molecules similar to those in Wiki-pS0 to within 0.6 log unit of the measured values, or within 0.5 log unit 76% of the time. The RFR software is freely downloadable from open sources.

Evidently, the evaluated prediction methods cannot match the precision of measured equilibrium solubility data. The methods need to be further enhanced. More discriminating descriptors would be welcome additions to the openly-available collections. As the amiodarone, clofazimine, and itraconazole examples illustrate, there are still under-populated neighborhoods in the chemical space of the currently tested database. How effective Wiki-pS0 will be in predicting the solubility of newly-synthesized molecules in pharmaceutical research remains to be explored.

Abbreviations and definitions

DOA domain of applicability associated with druglike substances, determined by descriptor or structural (e.g., Tanimoto indices) similarity.
DTT Dissolution Titration Template potentiometric method used to determine intrinsic solubility, S0
HH Henderson-Hasselbalch equation [80]; e.g., for monoprotic base, $\log S = \log S_0 + \log\left(10^{\,\mathrm{p}K_a - \mathrm{pH}} + 1\right)$
OOB “Out-of-Bag” built-in validation set of compounds randomly selected by the RFR method, which have not been used to train the model.
pHsat the equilibrium pH of a saturated water solution of compound whose solubility is Sw
S solubility, ideally expressed in units of mol/L (M), μg/mL, or mg/mL
S0 “intrinsic” solubility (i.e., the solubility of the uncharged form of the compound)
Sw “water” solubility, defined by dissolving enough pure free acid/base in distilled water (or water containing an inert salt - as ionic strength adjustor) to form a saturated solution. The final pH of the suspension, pHsat, and S0 can be calculated by the HH equation (when valid), provided the true pKa is known. Compound added in salt form may disproportionate into free acid/base, depending on how much solid had been added. Calculation of the pH and S0 of such salt suspensions can be uncertain.
SpH “pH buffer” solubility (i.e., the total solubility of the compound at a measured equilibrated pH)
SSF saturation shake-flask method, the “gold standard” solubility measurement method
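RMSE root-mean-square error: $\mathrm{RMSE} = \left[ \frac{1}{n} \sum_i \left( y_i^{\mathrm{obs}} - y_i^{\mathrm{calc}} \right)^2 \right]^{1/2}$, where $y^{\mathrm{obs}}$ / $y^{\mathrm{calc}}$ = observed/calculated value of log S0 according to model, n = number of measurements of log S0
r2 squared linear correlation coefficient, $r^2 = 1 - \sum_i \left( y_i^{\mathrm{obs}} - y_i^{\mathrm{calc}} \right)^2 / \sum_i \left( y_i^{\mathrm{obs}} - \langle y \rangle \right)^2$, where y = log S0, and $\langle y \rangle$ is the mean value of log S0
SD standard deviation: $\mathrm{SD} = \left[ \frac{1}{n} \sum_i \left( y_i^{\mathrm{obs}} - \langle y \rangle \right)^2 \right]^{1/2}$, where n = number of measurements, $\langle y \rangle$ = mean value of log S0
F F-statistic: $F = \frac{n-p-1}{p} \cdot \sum_i \left( y_i^{\mathrm{obs}} - \langle y \rangle \right)^2 / \sum_i \left( y_i^{\mathrm{obs}} - y_i^{\mathrm{calc}} \right)^2$, where p = number of regression parameters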
MPP Measure of prediction performance [103]. It refers to the percent of ‘correct’ predictions, as defined by the count of absolute residuals |log S0obs – log S0calc| ≤ 0.5 divided by the number of measurements. MPP is represented as a pie chart in the correlation plots.
ntree number of trees specified in the Random Forest regression (RFR) – typically 500
mtry number of descriptors to use in the node splitting process in RFR – typically a third of the descriptors
nodesize minimum number of data points in the terminal node, beyond which no splitting takes place – typically 5 measurements
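
The RMSE, r2, and MPP definitions listed above can be illustrated with a short Python sketch. This is a minimal example and not the code used in the study: the function names (rmse, r_squared, mpp) and variable names are hypothetical, and the four sample values are simply the first four rows of Table A1 (Wiki-pS0 averages and the corresponding RFR predictions).

```python
import math

def rmse(obs, calc):
    """Root-mean-square error, using the 1/n form given in the definitions."""
    n = len(obs)
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(obs, calc)) / n)

def r_squared(obs, calc):
    """r2 = 1 - sum of squared residuals / sum of squared deviations from the mean."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - c) ** 2 for o, c in zip(obs, calc))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def mpp(obs, calc, tol=0.5):
    """Percent of predictions with |log S0 obs - log S0 calc| <= tol."""
    hits = sum(1 for o, c in zip(obs, calc) if abs(o - c) <= tol)
    return 100.0 * hits / len(obs)

# First four rows of Table A1 (illustrative subset only):
# acetylsalicylic acid, antipyrine, atrazine, benzocaine
log_s0_obs = [-1.64, 0.45, -3.69, -2.19]   # Wiki-pS0 averages
log_s0_rfr = [-1.92, -1.18, -3.72, -1.36]  # RFR predictions

if __name__ == "__main__":
    print("RMSE:", round(rmse(log_s0_obs, log_s0_rfr), 2))
    print("r2:  ", round(r_squared(log_s0_obs, log_s0_rfr), 2))
    print("MPP: ", mpp(log_s0_obs, log_s0_rfr), "%")
```

With only four compounds the numbers carry no statistical weight; the point is the arithmetic. Two of the four residuals fall within 0.5 log unit, so the MPP of this subset is 50%.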

Abraham solvation descriptors

A H-bond total acidity
B H-bond total basicity
Sπ dipolarity/polarizability due to solute-solvent interactions between bond dipoles and induced dipoles
E excess molar refraction (dm3 mol-1 / 10); which models dispersion force interaction arising from π- and n-electrons of the solute
V McGowan molar volume (dm3 mol-1 / 100)
A·B acid-base H-bonding product descriptor used in ABSOLV solubility prediction

Most important RDKit descriptors in RFR analysis

Subdivided Surface Area Molecular Descriptors [121]

LabuteVSA sum of atomic contributions [51] to the accessible van der Waals surface area
MolLogP sum of atomic contributions to octanol/water partition coefficient, log P
MolMR sum of atomic contributions to molar refractivity, MR
SlogP_VSAk sum of accessible van der Waals surface area for those atoms with atomic contribution to log P; k refers to a small domain of atomic-contribution to log P; intended to capture hydrophobic/lipophilic effects
SMR_VSAk sum of accessible van der Waals surface area for those atoms with atomic contribution to molar refractivity; k refers to a small domain of atomic-contribution to MR; intended to capture molecular size & polarizability
PEOE_VSAk intended to capture direct electrostatic interactions in a particular range; based on iterative equalization of atomic orbital electronegativities [49].

Complexity descriptors

BertzCT complexity index, based on size, symmetry, branching, rings, multiple bonds, and heteroatoms characteristic of solute [50].
Ipc content information of topological graph [48] - entropy of atomic distribution in solute

Topological and electrotopological connectivity indices

Chi0, Chi0n, Chi0v, Chi1, Chi1n, Chi4n, Chi4v, α – Kier-Hall topological connectivity and shape indices [52, 53, 55] – numerical representations of topology of solute calculated from graphical depiction of the molecule

Atomic and subgroup counts: HeavyAtomCount, NumberAromaticCarbocycles, NumberAromaticRings, RingCount, fr_benzene

Availability of the Wiki-pS0 Database

The entire Wiki-pS0 database is planned to be released in book form: A. Avdeef. Intrinsic Aqueous Solubility Data for Pharmaceutical Research. Wiley-Interscience, Hoboken, NJ (under discussion with publisher). A sampling is presented in Table A5, with citations to the original literature [122-196].

Acknowledgements

We dedicate this study to the memory of Prof. Gilles Klopman (1933-2015). His enthusiastic smile greeting graduate students to his 8 a.m. quantum mechanics class is warmly remembered, as are his many contributions to computational chemistry [32].

Manfred Kansy, Holger Fischer (Hoffmann-La Roche, Basel), and Uko Maran (Univ. of Tartu, Estonia) have provided valuable insights and leads into the literature of chemoinformatics, for which we are grateful. We are greatly indebted to Agustin G. Asuero (Univ. of Seville, Spain) and Tatjana Verbić (University of Belgrade, Serbia) for facilitating access to many important solubility-pH publications. Part of this work was reported at the IAPC-8 meeting in Split, Croatia, 9-11 September 2019 (www.iapchem.org). Test Sets 3 and 4 have been used in the new ‘Solubility Challenge’ [20], a competition which closed 8 September 2019.

Appendix. – Calculated results for the three models and a sampling of the database

Table A1. External Test Set 1 (Yalkowsky & Banerjee, 1992) a.

Columns: NAME, log S0 (avg., 25 °C; Wiki-pS0), SD, n, Tm (°C), log P (RDKit), calculated log S0 (GSE), calculated log S0 (ABSOLV), calculated log S0 (RFR)
Acetylsalicylic_Acid -1.64 0.03 28 135 1.31 -1.91 -1.74 -1.92
Antipyrine 0.45 0.08 9 114 1.48 -1.87 -1.96 -1.18
Atrazine -3.69 0.15 6 173 1.78 -2.76 -2.54 -3.72
Benzocaine -2.19 0.12 14 89 1.45 -1.59 -1.97 -1.36
Chlordane -6.59 0.61 6 25 5.68 -5.18 -5.04 -5.08
Chlorpyrifos -5.70 0.24 5 43 4.72 -4.40 -3.51 -5.61
DDT,4,4'- -7.90 0.69 15 109 6.5 -6.84 -5.29 -6.06
Diazepam -3.81 0.11 10 132 3.15 -3.72 -4.08 -3.68
Diazinon -3.75 0.10 3 25 3.58 -3.08 -2.81 -4.06
Diuron -3.84 0.09 3 159 3.09 -3.93 -2.76 -3.55
Lindane -4.54 0.13 10 113 3.64 -4.02 -3.53 -4.32
Malathion -3.35 0.02 9 25 2.12 -1.62 -2.39 -3.40
Nitrofurantoin -3.33 0.11 13 264 0.07 -1.96 -2.06 -2.77
Parathion -4.27 0.17 12 25 3.27 -2.77 -3.21 -4.08
PCB,2,2',4,5,5'- -7.40 0.20 19 77 6.62 -6.64 -5.47 -5.84
Phenobarbital -2.30 0.08 26 175 0.7 -1.70 -2.32 -2.51
Phenolphthalein -5.08 0.17 2 263 3.56 -5.44 -4.46 -4.15
Phenytoin -4.07 0.13 30 297 1.77 -3.99 -3.34 -3.45
Prostaglandin_E2 -2.40 0.09 5 67 3.25 -3.17 -3.38 -3.35
Testosterone -4.10 0.09 16 155 3.88 -4.68 -3.91 -4.22
Theophylline -1.38 0.09 15 273 -1.04 -0.94 -1.55 -1.79
Min. -7.90 0.02
Max. 0.45 0.69
Mean -3.85 0.17

a Melting points of liquids are set to 25 °C (chlordane, diazinon, malathion, and parathion). The measured log P of antipyrine is 0.38.

SD refers to standard deviation from averaging n interlaboratory reported values.
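
The GSE column of Table A1 can be reproduced from the tabulated Tm and log P values. The sketch below assumes the usual form of the general solubility equation, log S0 = 0.5 - 0.01 (Tm - 25) - log P, with liquids assigned Tm = 25 °C as in footnote a. The function name gse_log_s0 and the row list are illustrative and not part of any published software.

```python
def gse_log_s0(log_p, melting_point_c):
    """General solubility equation (assumed form):
    log S0 = 0.5 - 0.01 * (Tm - 25) - log P, with Tm in degrees Celsius.
    Compounds melting at or below 25 °C are treated as liquids (Tm = 25)."""
    tm = max(melting_point_c, 25.0)
    return 0.5 - 0.01 * (tm - 25.0) - log_p

# (name, Tm in °C, RDKit log P, GSE value listed in Table A1)
TABLE_A1_ROWS = [
    ("Acetylsalicylic acid", 135, 1.31, -1.91),
    ("Atrazine",             173, 1.78, -2.76),
    ("Chlordane",             25, 5.68, -5.18),
    ("Diazepam",             132, 3.15, -3.72),
    ("Malathion",             25, 2.12, -1.62),
]

if __name__ == "__main__":
    for name, tm, log_p, listed in TABLE_A1_ROWS:
        print(f"{name:22s} calculated {gse_log_s0(log_p, tm):6.2f}   listed {listed:6.2f}")
```

The only inputs are a melting point and a log P value, which is why the GSE serves as a convenient benchmark. Note that the table uses the RDKit-calculated log P, so a measured log P (e.g., 0.38 for antipyrine) would give a different GSE value.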

Table A2. External Test Set 2 (Hopfinger et al. 2009).

Columns: NAME, log S0 (avg., 25 °C; Wiki-pS0), SD, n, Tm (°C), log P (RDKit), calculated log S0 (GSE), calculated log S0 (ABSOLV), calculated log S0 (RFR)
Acebutolol -2.56 0.31 3 119 2.37 -2.81 -2.37 -3.14
Amoxicillin -2.12 0.07 11 194 0.02 -1.21 -1.71 -1.80
Bendroflumethiazide -4.30 0.28 6 222 1.63 -3.10 -3.39 -4.31
Benzocaine -2.19 0.12 14 89 1.45 -1.59 -1.99 -1.19
Benzthiazide -4.84 0.22 6 232 2.43 -4.00 -4.42 -4.89
Clozapine -4.60 0.12 4 184 2.03 -3.12 -3.90 -3.57
Dibucaine -4.04 0.35 3 65 3.49 -3.39 -3.71 -4.06
Diethylstilbestrol -4.39 0.35 7 171 4.83 -5.79 -3.92 -4.57
Diflunisal -4.99 0.56 11 214 3.04 -4.43 -3.66 -4.21
Dipyridamole -5.14 0.12 11 163 -0.02 -0.86 -4.91 -2.84
Folic Acid -5.96 0.16 6 250 -0.04 -1.71 -2.50 -3.58
Furosemide -4.47 0.22 22 206 1.89 -3.20 -2.97 -3.58
Hydrochlorothiazide -2.72 0.10 18 274 -0.35 -1.64 -2.15 -2.91
Imipramine -4.30 0.26 11 146 3.88 -4.59 -4.36 -4.47
Indomethacin -5.48 0.22 21 159 3.93 -4.77 -4.72 -5.15
Ketoprofen -3.41 0.23 24 94 3.11 -3.30 -3.48 -4.19
Lidocaine -1.82 0.08 20 69 2.58 -2.52 -2.56 -2.62
Meclofenamic Acid -6.72 0.31 4 257 4.74 -6.56 -4.32 -5.59
Naphthoic Acid,2- -3.81 0.25 6 185 2.54 -3.64 -2.98 -3.30
Probenecid -4.83 0.20 4 197 2.20 -3.42 -2.63 -3.33
Pyrimethamine -4.00 0.47 4 233 2.52 -4.10 -3.93 -3.74
Salicylic Acid -1.88 0.08 21 158 1.09 -1.92 -1.98 -1.61
Sulfamerazine -3.11 0.06 7 237 1.17 -2.79 -3.03 -2.83
Sulfamethizole -2.77 0.12 6 208 1.23 -2.56 -3.29 -2.81
Terfenadine -7.74 0.71 11 150 6.45 -7.20 -5.98 -6.34
Thiabendazole -3.97 0.50 4 305 2.69 -4.99 -3.71 -3.56
Tolbutamide -3.54 0.09 7 129 1.78 -2.32 -2.85 -3.05
Trazodone -3.27 0.20 6 87 2.36 -2.48 -4.23 -4.22
Min. -7.74 0.06
Max. -1.82 0.71
Mean -4.03 0.24

Table A3. External Test Set 3 (Avg. Interlab. SD ~0.17).

Columns: NAME, log S0 (avg., 25 °C; Wiki-pS0), SD, n, Tm (°C), log P (RDKit), calculated log S0 (GSE), calculated log S0 (ABSOLV), calculated log S0 (RFR)
Acetazolamide -2.38 0.18 11 259 -0.86 -0.98 -1.50 -2.29
Acetylsalicylic Acid -1.67 0.15 16 142 1.31 -1.98 -1.71 -1.94
Alclofenac -4.40 0.16 4 92 2.53 -2.70 -2.58 -2.97
Ambroxol -3.87 0.17 3 234 3.19 -4.78 -3.90 -4.34
Aripiprazole -6.64 0.21 3 139 4.86 -5.50 -5.18 -5.30
Atovaquone -6.07 0.18 3 224 5.51 -7.00 -5.13 -6.00
Atrazine -3.69 0.15 6 173 1.78 -2.76 -2.49 -3.83
Baclofen -1.78 0.15 4 208 1.86 -3.19 -1.95 -2.51
Barbital,Buta- -2.22 0.16 10 167 0.79 -1.71 -1.59 -2.30
Benzthiazide -4.84 0.22 6 232 2.43 -4.00 -4.46 -4.65
Bromazepam -3.39 0.13 3 193 2.63 -3.81 -3.60 -3.98
Candesartan cilexetil -6.79 0.15 6 167 6.32 -7.24 -7.78 -6.37
Carbamazepine -3.22 0.16 15 192 3.39 -4.56 -3.83 -3.96
Carbazole -5.19 0.19 3 246 3.32 -5.03 -3.74 -4.12
Carbendazim -4.56 0.19 4 320 1.74 -4.19 -2.39 -3.03
Cefmenoxime -3.27 0.14 7 187 -0.87 -0.25 -3.66 -2.84
Cefprozil -1.68 0.20 4 222 0.71 -2.18 -2.35 -2.49
Celecoxib -5.89 0.18 6 158 3.51 -4.34 -4.77 -4.77
Cephradine -1.18 0.13 8 140 0.35 -1.00 -2.13 -2.07
Chlorpropamide -3.17 0.14 7 128 1.74 -2.27 -2.83 -3.11
Cholic Acid, Deoxy- -4.62 0.15 7 176 4.48 -5.49 -4.44 -4.74
Cilostazol -4.93 0.13 3 160 3.46 -4.31 -4.35 -4.36
Cimetidine -1.52 0.22 8 142 0.6 -1.27 -1.71 -2.44
Ciprofloxacin -3.57 0.18 20 267 1.58 -3.50 -2.97 -3.34
Cisapride -6.78 0.17 6 110 3.36 -3.71 -4.30 -4.72
Corticosterone -3.29 0.17 7 182 2.67 -3.74 -3.80 -3.29
Cortisone Acetate -4.22 0.13 4 222 2.56 -4.03 -3.89 -4.21
Cyclosporine A -5.03 0.16 6 151 3.27 -4.03 -7.00 -4.45
Daidzein -5.23 0.13 5 330 2.87 -5.42 -3.11 -4.47
Desipramine -3.83 0.18 3 100 3.53 -3.78 -4.14 -4.18
Dexamethasone -3.56 0.18 16 263 1.9 -3.78 -3.55 -3.80
Diazoxide -3.43 0.22 4 329 1.87 -4.41 -2.34 -3.16
Diclofenac -5.34 0.18 34 168 4.36 -5.29 -4.15 -5.35
Diflorasone Diacetate -4.82 0.16 3 223 2.99 -4.47 -4.20 -4.98
Difloxacin -3.83 0.21 3 211 2.72 -4.08 -4.05 -4.02
Diltiazem -3.02 0.13 3 210 3.37 -4.72 -4.24 -4.80
Diphenylamine -3.53 0.14 3 54 3.43 -3.22 -3.22 -3.68
DOPA,L- -1.76 0.17 6 270 0.05 -2.00 -1.06 -1.79
Enalapril -1.36 0.21 3 144 1.6 -2.29 -3.01 -2.90
Estradiol,17α- -5.00 0.18 5 215 3.61 -5.01 -3.98 -4.78
Estrone -5.38 0.19 8 255 3.82 -5.62 -4.02 -4.79
Ethoxzolamide -3.76 0.17 3 189 1.34 -2.48 -2.79 -3.00
Etoposide -3.60 0.20 4 244 1.34 -3.03 -4.51 -3.52
Eucalyptol -1.66 0.21 3 37 2.74 -2.36 -2.07 -2.22
Fenbufen -5.18 0.21 10 186 3.4 -4.51 -3.78 -3.72
Flumequine -3.90 0.19 3 253 2.35 -4.13 -2.83 -3.76
Flurbiprofen -4.34 0.20 23 111 3.68 -4.04 -3.64 -4.08
Folic Acid -5.96 0.16 6 250 -0.04 -1.71 -2.53 -3.58
Ganciclovir -1.78 0.13 3 250 -1.97 0.22 -0.81 -1.88
Glipizide -5.61 0.21 9 209 2.08 -3.42 -4.33 -4.68
Griseofulvin -4.52 0.19 15 220 2.69 -4.14 -3.39 -3.56
Haloperidol -5.71 0.17 10 151 4.43 -5.19 -4.24 -4.50
Ibrutinib -4.85 0.19 7 155 4.22 -5.02 -6.43 -5.08
Indinavir -4.53 0.16 5 168 2.87 -3.80 -5.45 -4.84
Indomethacin -5.48 0.22 21 159 3.93 -4.77 -4.72 -5.17
Indoprofen -4.65 0.21 5 214 3.04 -4.43 -3.65 -4.21
Ketoconazole -5.47 0.14 11 146 4.21 -4.92 -5.95 -5.38
Maprotiline -4.62 0.22 5 92 4.21 -4.38 -4.53 -4.95
Metolazone -3.88 0.21 8 256 2.71 -4.52 -4.12 -4.39
Nabumetone -4.40 0.21 3 80 3.37 -3.42 -3.66 -4.04
Naproxen -4.23 0.16 17 153 3.04 -3.82 -3.29 -4.08
Nelfinavir -6.21 0.20 3 350 4.75 -7.50 -5.62 -5.36
Nevirapine -3.41 0.14 6 248 2.65 -4.38 -3.54 -3.90
Nifedipine -4.71 0.15 11 173 2.18 -3.16 -3.22 -4.67
Nimesulide -4.74 0.14 5 144 2.76 -3.45 -3.92 -4.22
Norfloxacin -2.88 0.16 19 221 1.27 -2.73 -2.67 -3.13
Nortriptyline -3.93 0.16 5 214 3.83 -5.22 -4.28 -4.51
Noscapine -4.48 0.14 3 176 2.88 -3.89 -3.95 -3.84
Ofloxacin -2.03 0.13 14 254 1.54 -3.33 -3.04 -1.37
Oxazepam -4.03 0.17 5 206 2.45 -3.76 -3.46 -3.65
Oxyphenbutazone -3.94 0.19 3 96 3.49 -3.70 -3.49 -4.24
Papaverine -4.33 0.19 12 147 3.86 -4.58 -4.32 -4.42
Perphenazine -4.48 0.17 6 97 3.94 -4.16 -4.95 -4.74
Phenacetin -2.30 0.14 10 135 2.04 -2.64 -1.97 -2.14
Phenazopyridine -4.02 0.16 7 139 2.66 -3.30 -3.10 -3.36
Pindolol -3.75 0.15 9 170 1.91 -2.86 -2.45 -2.91
Pravastatin -4.86 0.15 10 326 2.44 -4.95 -3.45 -3.60
Prednisolone, Methyl- -3.33 0.18 5 233 1.8 -3.38 -3.65 -3.45
Primidone -2.53 0.14 4 282 0.54 -2.61 -1.97 -2.31
Probenecid -4.83 0.20 4 197 2.2 -3.42 -2.62 -3.39
Promazine -4.45 0.13 4 33 4.24 -3.82 -4.33 -4.74
Promethazine -4.38 0.19 11 60 4.24 -4.09 -4.29 -4.68
Repaglinide -4.77 0.17 4 131 5.22 -5.78 -5.22 -6.45
Resveratrol, trans- -3.75 0.18 7 254 2.97 -4.76 -3.04 -3.60
Ritonavir -5.17 0.16 5 121 5.91 -6.37 -7.47 -5.80
Rofecoxib -4.61 0.16 5 207 2.56 -3.88 -3.67 -4.11
Spironolactone -4.21 0.16 6 135 4.85 -5.45 -5.12 -5.25
Strychnine -3.38 0.19 6 275 2.09 -4.09 -4.06 -3.30
Sulfasalazine -6.41 0.14 9 220 1.8 -3.25 -3.85 -4.36
Sulfathiazole -2.62 0.22 9 202 1.53 -2.80 -3.10 -2.57
Sulfisomidine -2.16 0.14 3 243 1.48 -3.16 -3.21 -2.84
Sulfisoxazole -3.13 0.14 3 191 1.67 -2.83 -3.09 -2.81
Sulindac -4.96 0.21 7 184 4.37 -5.46 -4.34 -5.10
Tetracaine -3.11 0.11 3 149 2.62 -3.36 -2.61 -2.78
Tetracycline -3.22 0.15 8 165 -0.37 -0.53 -1.68 -2.72
Thiacetazone -3.50 0.16 10 225 0.81 -2.31 -2.38 -2.80
Triamcinolone -3.52 0.21 5 270 0.62 -2.57 -3.10 -3.12
Triamterene -4.11 0.14 9 313 0.83 -3.21 -4.17 -3.42
Warfarin -4.78 0.20 11 161 3.61 -4.47 -3.84 -4.29
Xanthine -3.60 0.21 3 300 -1.06 -1.19 -1.24 -2.69
Min. -6.79 0.11
Max. -1.18 0.22
Mean -4.03 0.17

Table A4. External Test Set 4 (Avg. Interlab. SD ~0.62).

Columns: NAME, log S0 (avg., 25 °C; Wiki-pS0), SD, n, Tm (°C), log P (RDKit), calculated log S0 (GSE), calculated log S0 (ABSOLV), calculated log S0 (RFR)
Amantadine -2.19 0.50 3 180 1.91 -2.96 -1.95 -1.96
Amiodarone -10.40 0.50 5 156 6.94 -7.75 -7.68 -6.77
Amodiaquine -5.49 0.65 3 208 5.18 -6.51 -4.86 -5.57
Bisoprolol -2.09 0.59 3 100 2.37 -2.62 -2.51 -2.50
Bromocriptine -5.50 0.51 5 217 3.19 -4.61 -5.65 -5.12
Buprenorphine -6.07 0.83 3 210 4.41 -5.76 -5.54 -5.20
Chlorprothixene -5.99 0.51 6 98 5.19 -5.42 -4.96 -5.23
Clofazimine -9.05 0.93 5 211 7.49 -8.85 -7.42 -6.88
Curcumin -5.36 0.68 3 177 3.37 -4.39 -4.22 -4.67
Danazol -6.10 0.52 10 229 4.22 -5.76 -5.01 -4.69
Didanosine -1.24 0.54 3 162 -0.21 -0.66 -1.75 -1.34
Diflunisal -4.99 0.56 11 214 3.04 -4.43 -3.70 -4.02
Diphenhydramine -3.21 0.55 4 169 3.35 -4.29 -3.47 -3.41
Etoxadrol -1.96 0.55 3 124 2.81 -3.30 -2.96 -3.14
Ezetimibe -4.94 0.51 4 165 4.89 -5.79 -4.62 -5.55
Fentiazac -5.84 0.65 4 161 4.76 -5.62 -5.40 -4.90
Iopanoic Acid -5.49 0.66 3 155 3.74 -4.54 -5.94 -4.85
Itraconazole -8.98 0.61 3 165 5.58 -6.48 -8.45 -6.50
Miconazole -5.82 0.50 6 161 6.45 -7.31 -5.86 -5.71
Mifepristone -5.22 0.75 4 194 5.41 -6.60 -5.96 -5.82
Omeprazole -3.70 0.50 3 156 2.9 -3.71 -3.70 -3.88
Pioglitazone -6.20 0.66 4 184 3.16 -4.25 -4.15 -4.44
Procaine -2.30 0.60 3 61 1.77 -1.63 -2.27 -2.55
Quinine -3.06 0.57 7 177 3.17 -4.19 -3.74 -2.85
Raloxifene -6.82 0.56 6 145 6.08 -6.78 -6.70 -6.18
Rifabutin -4.09 0.66 3 176 4.62 -5.63 -6.88 -5.22
Saquinavir -5.92 0.58 3 350 3.09 -5.84 -5.95 -4.92
Sulfadimethoxine -3.74 0.70 3 204 0.88 -2.17 -2.89 -3.17
Tamoxifen -7.52 0.72 7 98 6.00 -6.23 -5.55 -6.09
Telmisartan -6.73 0.84 5 262 7.26 -9.13 -9.03 -7.15
Terfenadine -7.74 0.71 11 150 6.45 -7.20 -6.03 -6.61
Thiabendazole -3.97 0.50 4 305 2.69 -4.99 -3.75 -3.62
Min. -10.40 0.50
Max. -1.24 0.93
Mean -5.24 0.62

Table A5. Listing of external test set four solubility values from the Wiki-pS0 database a.

a RN – Registry number (CAS). ΔHsol – calculated [9] heat of solubility, used to adjust data to a standard temperature (25 °C). pKa – calculated for strongest acid and weakest base groups. Num.pH – number of SpH measurements in the log S – pH profile. *,**,*** indicate small, moderate, extensive concentration of aggregate/complex.
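
The temperature adjustment mentioned in the footnote can be illustrated with the integrated van 't Hoff relation. This is only a sketch under stated assumptions: it treats ΔHsol as constant over the temperature interval, and it is not claimed to be the exact procedure of ref. [9]. The function name log_s_at_25c and the numerical inputs are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def log_s_at_25c(log_s_t, temp_c, dh_sol_kj_mol):
    """Shift log S measured at temp_c (°C) to 25 °C, assuming a
    temperature-independent heat of solution (van 't Hoff relation):
    log S(25) = log S(T) + dH / (2.303 R) * (1/T - 1/298.15)."""
    t = temp_c + 273.15
    t_ref = 298.15
    slope = dh_sol_kj_mol * 1000.0 / (math.log(10) * R)  # units: K
    return log_s_t + slope * (1.0 / t - 1.0 / t_ref)

if __name__ == "__main__":
    # Hypothetical measurement: log S = -3.00 at 37 °C, dHsol = 30 kJ/mol
    print(round(log_s_at_25c(-3.00, 37.0, 30.0), 2))
```

For an endothermic dissolution (positive ΔHsol), a value measured above 25 °C is shifted downward, and a value measured below 25 °C is shifted upward.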

Footnotes

Conflict of interest: The author declares no conflict of interest.

References

  • [1] Hörter D, Dressman JB. Influence of physicochemical properties on dissolution of drugs in the gastrointestinal tract. Adv Drug Deliv Rev. 46 (2001) 75–87.
  • [2] Lipinski CA, Lombardo F, Dominy BW, Feeney PJ. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv Drug Deliv Rev. 46 (2001) 3–26.
  • [3] Saal C, Petereit AC. Optimizing solubility: Kinetic versus thermodynamic solubility temptations and risks. Eur J Pharm Sci. 47 (2012) 589–95.
  • [4] Doak AK, Wille H, Prusiner SB, Shoichet BK. Colloid formation by drugs in simulated intestinal fluid. J Med Chem. 53 (2010) 4259–65.
  • [5] Di L, Fish PV, Mano T. Bridging solubility between drug discovery and development. Drug Discov Today. 17 (2012) 486–95.
  • [6] Bergström CAS, Holm R, Jørgensen SA, Andersson SBE, Artursson P, Beato S, et al. Early pharmaceutical profiling to predict oral drug absorption: Current status and unmet needs. Eur J Pharm Sci. 57 (2014) 173–99.
  • [7] Riethorst D, Brouwers J, Motmans J, Augustijns P. Human intestinal fluid factors affecting intestinal drug permeation in vitro. Eur J Pharm Sci. 121 (2018) 338–46.
  • [8] Sun H. A Practical Guide to Rational Drug Design. Elsevier, Amsterdam, 2015, pp. 193-223.
  • [9] Avdeef A. Suggested improvements for measurement of equilibrium solubility-pH of ionizable drugs. ADMET DMPK. 3 (2015) 84–109.
  • [10] Avdeef A. Solubility temperature dependence predicted from 2D structure. ADMET DMPK. 3 (2015) 298–344.
  • [11] Völgyi G, Marosi A, Takács-Novák K, Avdeef A. Salt solubility products of diprenorphine hydrochloride, codeine and lidocaine hydrochlorides and phosphates – Novel method of data analysis not dependent on explicit solubility equations. ADMET DMPK. 1 (2013) 48–62.
  • [12] Avdeef A. STBLTY: methods for construction and refinement of equilibrium models. In: Leggett D.J. (Ed.), Computational Methods for the Determination of Formation Constants, Plenum: New York, 1985, pp. 355-473.
  • [13] Abraham MH, Le J. The correlation and prediction of the solubility of compounds in water using an amended solvation energy relationship. J Pharm Sci. 88 (1999) 868–80.
  • [14] Breiman L. Random forests. Mach Learn. 45 (2001) 5–32.
  • [15] Landrum G, Lewis R, Palmer A, Stiefl N, Vulpetti A. Making sure there's a "give" associated with the "take": Producing and using open-source software in big pharma. J Cheminformatics. 3 (2011) 1-1; cf., http://www.rdkit.org/ (accessed 5 May 2019).
  • [16] Yalkowsky SH, Valvani SC. Solubility and partitioning I: Solubility of nonelectrolytes in water. J Pharm Sci. 69 (1980) 912–22.
  • [17] Alantary D, Yalkowsky S. Comments on prediction of the aqueous solubility using the general solubility equation (GSE) versus a genetic algorithm and a support vector machine model. Pharm Dev Technol. 23 (2018) 739–40.
  • [18] Yalkowsky SH, Banerjee S. Aqueous Solubility: Methods of Estimation for Organic Compounds. Marcel Dekker, Inc., New York, 1992, p. 142.
  • [19] Llinàs A, Glen RC, Goodman JM. Solubility challenge: can you predict solubilities of 32 molecules using a database of 100 reliable measurements? J Chem Inf Model. 48 (2008) 1289–303.
  • [20] Llinàs A, Avdeef A. Solubility Challenge revisited after 10 years, with multi-lab shake-flask data, using tight (SD ~0.17 log) and loose (SD ~0.62 log) test sets. J Chem Inf Model. 59 (2019) 3036–40.
  • [21] Irmann F. A simple correlation between water solubility and the structure of hydrocarbons and halogenated hydrocarbons. Chemieingenieurtechnik (Weinh). 37 (1965) 789–98.
  • [22] Hansch C, Quinlan JE, Lawrence GL. Linear free-energy relationship between partition coefficients and the aqueous solubility of organic liquids. J Org Chem. 33 (1968) 347–50.
  • [23] Ran Y, Yalkowsky SH. Prediction of drug solubility by the General Solubility Equation. J Chem Inf Comput Sci. 41 (2001) 354–7.
  • [24] Jain N, Yalkowsky SH. Estimation of the aqueous solubility I: Application to organic nonelectrolytes. J Pharm Sci. 90 (2001) 234–52.
  • [25] Ran Y, Jain N, Yalkowsky SH. Prediction of aqueous solubility of organic compounds by the General Solubility Equation (GSE). J Chem Inf Comput Sci. 41 (2001) 1208–17.
  • [26] Jain N, Yang G, Machatha SG, Yalkowsky SH. Estimation of the aqueous solubility of weak electrolytes. Int J Pharm. 319 (2006) 169–71.
  • [27] Jain P, Yalkowsky SH. Prediction of aqueous solubility from SCRATCH. Int J Pharm. 385 (2010) 1–5.
  • [28] Dearden JC, Bresnen GM. The measurement of partition coefficients. Quant Struct-Act Relat. 7 (1988) 133–44.
  • [29] Raevsky OA, Raevskaya OE, Schaper K-J. Analysis of water solubility data on the basis of HYBOT descriptors. Part 3. Solubility of solid neutral chemicals and drugs. QSAR Comb Sci. 23 (2004) 327–43.
  • [30] Dearden JC. In silico prediction of aqueous solubility. Expert Opin Drug Discov. 1 (2006) 31–52.
  • [31] Taskinen J, Norinder U. In silico prediction of solubility. In: Testa B., van de Waterbeemd H. (Eds.). Comprehensive Medicinal Chemistry II, Elsevier: Oxford, UK, 2007, pp. 627-648.
  • [32] Klopman G, Wang S, Balthasar DM. Estimation of aqueous solubility of organic molecules by the group contribution approach. Application to the study of biodegradation. J Chem Inf Comput Sci. 32 (1992) 474–82.
  • [33] Kühne R, Ebert R-U, Kleint F, Schmidt G, Schüürmann G. Group contribution methods to estimate water solubility of organic chemicals. Chemosphere. 30 (1995) 2061–77.
  • [34] Huuskonen J, Salo M, Taskinen J. Aqueous solubility prediction of drugs based on molecular topology and neural network modeling. J Chem Inf Comput Sci. 38 (1998) 450–6.
  • [35] Huuskonen J. Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. J Chem Inf Comput Sci. 40 (2000) 773–7.
  • [36] Huuskonen J, Rantanen J, Livingstone D. Prediction of aqueous solubility for a diverse set of organic compounds based on atom-type electrotopological state indices. Eur J Med Chem. 35 (2000) 1081–8.
  • [37] Huuskonen J. Estimation of water solubility from atom-type electrotopological state indices. Environ Toxicol Chem. 20 (2001) 491–7.
  • [38] Livingstone DJ, Ford MG, Huuskonen JJ, Salt DW. Simultaneous prediction of aqueous solubility and octanol/water partition coefficient based on descriptors derived from molecular structure. J Comput Aided Mol Des. 15 (2001) 741–52.
  • [39] Liu R, So S-S. Development of quantitative structure-property relationship models for early ADME evaluation in drug discovery. 1. Aqueous solubility. J Chem Inf Comput Sci. 41 (2001) 1633–9.
  • [40] Tetko IV, Tanchuk VYu, Kasheva TN, Villa AEP. Estimation of aqueous solubility of chemical compounds using E-state indices. J Chem Inf Comput Sci. 41 (2001) 1488–93.
  • [41] Yan A, Gasteiger J. Prediction of aqueous solubility of organic compounds based on a 3D structure representation. J Chem Inf Comput Sci. 43 (2003) 429–34.
  • [42] Wegner JK, Zell A. Prediction of aqueous solubility and partition coefficient optimized by a genetic algorithm based descriptor selection method. J Chem Inf Comput Sci. 43 (2003) 1077–84.
  • [43] Butina D, Gola JMR. Modeling aqueous solubility. J Chem Inf Comput Sci. 43 (2003) 837–41.
  • [44] Yan A, Gasteiger J. Prediction of aqueous solubility of organic compounds by topological descriptors. QSAR Comb Sci. 22 (2003) 821–9.
  • [45] Hou TJ, Xia K, Zhang W, Xu XJ. ADME evaluation in drug discovery. 4. Prediction of aqueous solubility based on atom contribution approach. J Chem Inf Comput Sci. 44 (2004) 266–75.
  • [46] Sun H. A universal molecular descriptor system for prediction of log P, log S, log BB and absorption. J Chem Inf Comput Sci. 44 (2004) 748–57.
  • [47] Delaney JS. ESOL: estimating aqueous solubility directly from molecular structure. J Chem Inf Comput Sci. 44 (2004) 1000–5.
  • [48] Bonchev D, Trinajstić N. Information theory, distance matrix, and molecular branching. J Chem Phys. 67 (1977) 4517–33.
  • [49] Gasteiger J, Marsili M. Iterative partial equalization of orbital electronegativity - a rapid access to atomic charges. Tetrahedron. 36 (1980) 3219–28.
  • [50] Bertz SH. The first general index of molecular complexity. J Am Chem Soc. 103 (1981) 3599–601.
  • [51] Wildman SA, Crippen GM. Prediction of physicochemical parameters by atomic contributions. J Chem Inf Comput Sci. 39 (1999) 868–73.
  • [52] Hall LH, Kier LB. Reviews of Computational Chemistry. In: Boyd D., Lipkowitz K. (Eds.), VCH Publishers, 2 (1991) 367-422.
  • [53] Hall LH, Kier LB. The nature of structure–activity relationships and their relation to molecular connectivity. Eur J Med Chem - Chim Ther. 4 (1997) 307–12.
  • [54] Leach AR, Gillet VJ. An Introduction to Chemoinformatics. Rev. Edn. Springer, 2007, pp. 53-74.
  • [55] Dearden JC. The use of topological indices in QSAR and QSPR modeling. In: Roy K. (Ed.) Advances in QSAR Modeling. Challenges and Advances in Computational Chemistry and Physics, vol 24. Springer, Cambridge, 2017, pp. 57-88.
  • [56] Wang J, Hou T. Recent advances on aqueous solubility prediction. Comb Chem High Throughput Screen. 14 (2011) 328–38.
  • [57] Abraham MH. Scales of hydrogen bonding - their construction and application to physicochemical and biochemical processes. Chem Soc Rev. 22 (1993) 73–83.
  • [58] Platts JA, Butina D, Abraham MH, Hersey A. Estimation of molecular linear free energy relation descriptors using a group contribution approach. J Chem Inf Comput Sci. 39 (1999) 835–45.
  • [59] Liaw A, Wiener M. Classification and regression by Random Forest. R News. 2 (2002) 18–22.
  • [60] https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm (accessed 5 May 2019)
  • [61].Liaw A. Random Forests What, Why, And How. https://www.youtube.com/watch?v=XJnjlpW9w5A. (youtube lecture). https://nyhackr.blob.core.windows.net/presentations/Random-Forests-What-Why-and-How_Andy_Liaw.pdf (slides from above lecture).
  • [62] Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Chapman & Hall/CRC: Boca Raton, 1984.
  • [63] Walters WP. What are our models really telling us? A practical tutorial on avoiding common mistakes when building predictive models. In: Bajorath J. (Ed.). Chemoinformatics for Drug Discovery. John Wiley & Sons, Hoboken, NJ, 2014, pp. 1-31.
  • [64] Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP. Random forest: a classification and regression tool for compound classification and QSAR modelling. J Chem Inf Comput Sci. 43 (2003) 1947–58.
  • [65] Schroeter TS, Schwaighofer A, Mika S, Laak AT, Suelzle D, Ganzer U, et al. Estimating the domain of applicability for machine learning QSAR models: a study on aqueous solubility of drug discovery molecules. J Comput Aided Mol Des. 21 (2007) 485–98.
  • [66] Palmer DS, O’Boyle NM, Glen RC, Mitchell JBO. Random Forest models to predict aqueous solubility. J Chem Inf Model. 47 (2007) 150–8.
  • [67] Howard P, Meylan W. PHYSPROP DATABASE. Syracuse Research Corp., N. Syracuse, NY, Sept. 1999. https://www.srcinc.com/what-we-do/environmental/scientific-databases.html (accessed 3 May 2019).
  • [68] Beilstein CrossFire Database. San Ramon, CA, USA.
  • [69] Yalkowsky SH, He Y. The Handbook of Aqueous Solubility Data. CRC Press, Boca Raton, 2003.
  • [70] MOE. Chemical Computing Group Inc., Montreal, QC H3A 2R7, Canada. http://www.chemcomp.com (accessed 6 May 2019).
  • [71] Guha R, Dexheimer TS, Kestranek AN, Jadhav A, Chervenak AM, Ford MG, et al. Exploratory analysis of kinetic solubility measurements of a small molecule library. Bioorg Med Chem. 19 (2011) 4127–34.
  • [72] Katritzky AR, Wang Y, Sild S, Tamm T, Karelson M. QSPR studies on vapor pressure, aqueous solubility, and the prediction of water-air partition coefficients. J Chem Inf Comput Sci. 38 (1998) 720–5.
  • [73] Jorgensen WL, Duffy EM. Prediction of drug solubility from structure. Adv Drug Deliv Rev. 54 (2002) 355–66.
  • [74] Palmer DS, Mitchell JBO. Is experimental data quality the limiting factor in predicting the aqueous solubility of druglike molecules? Mol Pharm. 11 (2014) 2962–72.
  • [75] Hughes LD, Palmer DS, Nigsch F, Mitchell JBO. Why are some properties more difficult to predict than others? A study of QSPR models of solubility, melting point, and log P. J Chem Inf Model. 48 (2008) 220–32.
  • [76] Avdeef A, Fuguet E, Llinàs A, Ràfols C, Bosch E, Völgyi G, et al. Equilibrium solubility measurement of ionizable drugs – consensus recommendations for improving data quality. ADMET DMPK. 4 (2016) 117–78.
  • [77] Faller B, Ertl P. Computational approaches to determine drug solubility. Adv Drug Deliv Rev. 59 (2007) 533–45.
  • [78] Marković OS, Pešić MP, Shah AV, Serajuddin ATM, Verbić TZ, Avdeef A. Solubility-pH profile of desipramine hydrochloride in saline phosphate buffer: enhanced solubility due to drug-buffer aggregates. Eur J Pharm Sci. 133 (2019) 264–74.
  • [79] Bergström CAS, Avdeef A. Perspectives in solubility measurement and interpretation. ADMET DMPK. 7 (2019) 88–105.
  • [81].Takács-Novák K, Urac M, Horváth P, Völgyi G, Anderson BD, Avdeef A. Equilibrium solubility measurement of compounds with low dissolution rate by Higuchi’s Facilitated Dissolution Method. A validation study. Eur J Pharm Sci. 106 (2017) 133–41. [DOI] [PubMed] [Google Scholar]
  • [82].Avdeef A, Berger CM, Brownell C. pH-metric solubility. 2. Correlation between the acid-base titration and the saturation shake-flask solubility-pH methods. Pharm Res. 17 (2000) 85–9. [DOI] [PubMed] [Google Scholar]
  • [83].Stuart M, Box K. Chasing equilibrium: Measuring the intrinsic solubility of weak acids and bases. Anal Chem. 77 (2005) 983–90. [DOI] [PubMed] [Google Scholar]
  • [84].Lang A.S.I.D., Bradley J.-C.. ONS Melting Point Model 010. QDB archive, DOI: 10.15152/QDB.104. QsarDB content. Property mpC. Steps:Calculate descriptors. SMILES. Calculate. Scroll down to mpC.10.15152/QDB.104
  • [85].Yalkowsky SH, He Y, Jain P. Handbook of Aqueous Solubility Data, Second Edition. CRC Press, Boca Raton, FL, 2010. [Google Scholar]
  • [86].Benet LZ, Broccatelli F, Oprea TI. BDDCS applied to over 900 drugs. AAPS J. 13 (2011) 519–47. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [87].Analytical Profiles of Drug Substances (Analytical Profiles of Drug Substances and Excipients; Profiles of Drug Substances, Excipients and Related Methodology). K. Florey (ed., vols. 1-20), H.G. Brittain (ed., vols. 21-39). Academic Press, San Diego, 1972-2014. [Google Scholar]
  • [88].O’Neil MJ, Heckelman PE, Dobbelaar PH, Roman KJ, editors. The Merck Index: an Encyclopedia of Chemicals, Drugs, and Biologicals, The Royal Society of Chemistry, 15th Ed, 2013. [Google Scholar]
  • [89].Series with J.B. Dressman and colleagues. Biowaiver monographs for immediate-release solid oral dosage forms. J. Pharm. Sci. 94 (2005) through at least 107 (2018). [DOI] [PubMed] [Google Scholar]
  • [90].Freier RK. Aqueous Solutions, Volume 1: Data for Inorganic and Organic Compounds. Walter de Gruyter: New York, 1976. [Google Scholar]
  • [91].Sober HA, editor. Handbook of Biochemistry. 2nd Edition. CRC Press: Cleveland, OH, 1970, pp. B65-B68. [Google Scholar]
  • [92].Mullin JW. Crystallisation. Butterworths, London, pp. 425-426, 1972. [Google Scholar]
  • [93].McFarland JW, Avdeef A, Berger CM, Raevsky OA. Estimating the water solubilities of crystalline compounds from their chemical structures alone. J Chem Inf Comput Sci. 41 (2001) 1355–9. [DOI] [PubMed] [Google Scholar]
  • [94].Bergström CAS, Wassvik CM, Norinder U, Luthman K, Artursson P. Global and local computational models for aqueous solubility prediction of druglike molecules. J Chem Inf Comput Sci. 44 (2004) 1477–88. [DOI] [PubMed] [Google Scholar]
  • [95].Bergström CA, Norinder U, Luthman K, Artursson P. Experimental and computational screening models for prediction of aqueous drug solubility. Pharm Res. 19 (2002) 182–8. [DOI] [PubMed] [Google Scholar]
  • [96].Bergström CAS, Luthman K, Artursson P. Accuracy of calculated pH-dependent aqueous drug solubility. Eur J Pharm Sci. 22 (2004) 387–98. [DOI] [PubMed] [Google Scholar]
  • [97].Wassvik CM, Holmen AG, Bergström CA, Zamora I, Artursson P. Contribution of solid-state properties to the aqueous solubility of drugs. Eur J Pharm Sci. 29 (2006) 294–305. [DOI] [PubMed] [Google Scholar]
  • [98].Bergström CA, Wassvik CM, Johansson K, Hubatsch I. Poorly soluble marketed drugs display solvation limited solubility. J Med Chem. 50 (2007) 5858–62. [DOI] [PubMed] [Google Scholar]
  • [99].Rytting E, Lentz KA, Chen XQ, Qian F, Venkatesh S. A Quantitative Structure–Property Relationship for Predicting Drug Solubility in PEG 400/Water Cosolvent Systems. Pharm Res. 21 (2004) 237–44. [DOI] [PubMed] [Google Scholar]
  • [100].Sköld C, Winiwarter S, Wernevik J, Bergström F, Engström L, Allen R, et al. Presentation of a structurally diverse and commercially available drug data set for correlation and benchmarking studies. J Med Chem. 49 (2006) 6660–71. [DOI] [PubMed] [Google Scholar]
  • [101].Llinàs A, Burley JC, Box KJ, Glen RC, Goodman JM. Diclofenac solubility: independent determination of the intrinsic solubility of three crystal forms. J Med Chem. 50 (2007) 979–83. [DOI] [PubMed] [Google Scholar]
  • [102].Box KJ, Comer JEA. Using measured pKa, log P and solubility to investigate supersaturation and predict BCS class. Curr Drug Metab. 9 (2008) 869–78. [DOI] [PubMed] [Google Scholar]
  • [103].Hopfinger AJ, Esposito EX, Llinàs A, Glen RC, Goodman JM. Findings of the challenge to predict aqueous solubility. J Chem Inf Model. 49 (2009) 1–5. [DOI] [PubMed] [Google Scholar]
  • [104].Narasimham LYS, Barhate VD. Kinetic and intrinsic solubility determination of some β-blockers and antidiabetics by potentiometry. J Pharm Res. 4 (2011) 532–6. [Google Scholar]
  • [105].Hsieh Y-L, Ilevbare GA, van Eerdenbrugh B, Box KJ, Sanchez-Felix MV, Taylor LS. pH-Induced precipitation behavior of weakly basic compounds: determination of extent and duration of supersaturation using potentiometric titration and correlation to solid state properties. Pharm Res. 29 (2012) 2738–53. [DOI] [PubMed] [Google Scholar]
  • [106].Comer J, Judge S, Matthews D, Towes L, Falcone B, Goodman J, et al. The intrinsic aqueous solubility of indomethacin. ADMET DMPK. 2 (2014) 18–32. [Google Scholar]
  • [107].Etherson K, Halbert G, Elliott M. Determination of excipient based solubility increases using the CheqSol method. Int J Pharm. 465 (2014) 202–9. [DOI] [PubMed] [Google Scholar]
  • [108].Schönherr D, Wollatz U, Haznar-Garbacz D, Hanke U, Box KJ, Taylor R, et al. Characterisation of selected active agents regarding pKa values, solubility concentrations and pH profiles by SiriusT3. Eur J Pharm Biopharm. 92 (2015) 155–70. [DOI] [PubMed] [Google Scholar]
  • [109].Fornells E, Fuguet E, Mañéa M, Ruiz R, Box K, Bosch E, et al. Effect of vinylpyrrolidone polymers on the solubility and supersaturation of drugs; a study using the Cheqsol method. Eur J Pharm Sci. 117 (2018) 227–35. [DOI] [PubMed] [Google Scholar]
  • [110].Baek K, Jeon SB, Kim BK, Kang NS. Method validation for equilibrium solubility and determination of temperature effect on the ionization constant and intrinsic solubility of drugs. J. Pharm. Sci. Emerg. Drugs. 6 (2018) 1–6. [Google Scholar]
  • [111].Avdeef A. pH-metric solubility. 1. Solubility-pH profiles from Bjerrum plots. Gibbs buffer and pKa in the solid state. Pharm Pharmacol Commun. 4 (1998) 165–78. [Google Scholar]
  • [112].Avdeef A. Physicochemical profiling (solubility, permeability, and charge state). Curr Top Med Chem. 1 (2001) 277–351. [DOI] [PubMed] [Google Scholar]
  • [113].Avdeef A, Berger CM. pH-metric solubility. 3. Dissolution titration template method for solubility determination. Eur J Pharm Sci. 14 (2001) 281–91. [DOI] [PubMed] [Google Scholar]
  • [114].Faller B, Wohnsland F. Physicochemical parameters as tools in drug discovery and lead optimization. In: Testa B., van de Waterbeemd H., Folkers G., Guy R. (Eds.). Pharmacokinetic Optimization in Drug Research. Verlag Helvetica Chimica Acta: Zürich and Wiley - VCH: Weinheim, pp. 257-274 (2001). [Google Scholar]
  • [115].Bergström CAS, Strafford M, Lazarova L, Avdeef A, Luthman K, Artursson P. Absorption classification of oral drugs based on molecular surface properties. J Med Chem. 46 (2003) 558–70. [DOI] [PubMed] [Google Scholar]
  • [116].Fioritto AF, Bhattachar SN, Wesley JA. Solubility measurement of polymorphic compounds via the pH-metric titration technique. Int J Pharm. 330 (2007) 105–13. [DOI] [PubMed] [Google Scholar]
  • [117].Ottaviani G, Gosling DJ, Patissier C, Rodde S, Zhou L, Faller B. What is modulating solubility in simulated intestinal fluids? Eur J Pharm Sci. 41 (2010) 452–7. [DOI] [PubMed] [Google Scholar]
  • [118].Sun N, Avdeef A. Biorelevant pKa (37 °C) Predicted from the 2D Structure of the Molecule and its pKa at 25 °C. J Pharm Biomed Anal. 56 (2011) 173–82. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [119].Murashov MD, Diaz-Espinosa J, LaLone V, Tan JWY, Laza R, Wang X, et al. Synthesis and characterization of a biomimetic formulation of clofazimine hydrochloride microcrystals for parenteral administration. Pharmaceutics. 10 (2018) 238. doi: http://dx.doi.org/10.3390/pharmaceutics10040238. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [120].Lipinski CA. Drug-like properties and the causes of poor solubility and poor permeability. J Pharmacol Toxicol Methods. 44 (2000) 235–49. [DOI] [PubMed] [Google Scholar]
  • [121].Labute P. A widely applicable set of descriptors. J Mol Graph Model. 18 (2000) 464–77. [DOI] [PubMed] [Google Scholar]
  • [122].Baka E, Comer JEA, Takács-Novák K. Study of equilibrium solubility measurement by saturation shake-flask method using hydrochlorothiazide as model compound. J Pharm Biomed Anal. 46 (2008) 335–41. [DOI] [PubMed] [Google Scholar]
  • [123].Bakatselou V, Oppenheim RC, Dressman JB. Solubilization and wetting effects of bile salts in the dissolution of steroids. Pharm Res. 8 (1991) 1461–9. [DOI] [PubMed] [Google Scholar]
  • [124].Bannigan P, Stokes K, Kumar A, Madden C, Hudson SP. Investigating the effects of amphipathic gastrointestinal compounds on the solution behavior of salt and free base forms of clofazimine: An in vitro evaluation. Int J Pharm. 552 (2018) 180–92. [DOI] [PubMed] [Google Scholar]
  • [125].Bard B, Martel S, Carrupt P-A. High throughput UV method for the estimation of thermodynamic solubility and the determination of the solubility in biorelevant media. Eur J Pharm Sci. 33 (2008) 230–40. [DOI] [PubMed] [Google Scholar]
  • [126].Box KJ, Völgyi G, Baka E, Stuart M, Takács-Novák K. Equilibrium versus kinetic measurement of aqueous solubility, and the ability of compounds to supersaturate in solution - a validation study. J Pharm Sci. 95 (2006) 1298–307. [DOI] [PubMed] [Google Scholar]
  • [127].Bridges JW, Walker SR, Williams RT. Species differences in the metabolism and excretion of sulphasomidine and sulphamethomidine. Biochem J. 111 (1969) 173–9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [128].Cassens J, Prudic A, Ruether F, Sadowski G. Solubility of pharmaceuticals and their salts as a function of pH. Ind Eng Chem Res. 52 (2013) 2721–31. [Google Scholar]
  • [129].Charoo NA, Shamsher AAA, Lian LY, Abrahamsson B, Cristofoletti R, Groot DW, et al. Biowaiver monograph for immediate-release solid oral dosage forms: bisoprolol fumarate. J Pharm Sci. 103 (2014) 378–91. [DOI] [PubMed] [Google Scholar]
  • [130].Chen X-Q, Venkatesh S. Miniature device for aqueous and non-aqueous solubility measurements during drug discovery. Pharm Res. 21 (2004) 1758–61. [DOI] [PubMed] [Google Scholar]
  • [131].Chiarini A, Tartarini A, Fini A. pH-Solubility relationship and partition coefficients for some antiinflammatory arylaliphatic acids. Arch Pharm (Weinheim). 317 (1984) 268–73. [Google Scholar]
  • [132].Clarysse S, Brouwers J, Tack J, Annaert P, Augustijns P. Intestinal drug solubility estimation based on simulated intestinal fluids: comparison with solubility in human intestinal fluid. Eur J Pharm Sci. 43 (2011) 260–9. [DOI] [PubMed] [Google Scholar]
  • [133].Cotton ML, Hux RA. Diflunisal. Anal. Profiles Drug Subst. 14 (1985) 491–526. [Google Scholar]
  • [134].Cutrignelli A, Lopedota A, Denora N, Iacobazzi RM, Fanizza E, Laquintana V, et al. A new complex of curcumin with sulfobutylether-β-cyclodextrin: characterization studies and in vitro evaluation of cytotoxic and antioxidant activity on HepG-2 cells. J Pharm Sci. 103 (2014) 3932–40. [DOI] [PubMed] [Google Scholar]
  • [135].Cantero MM. Solubility determination of compounds of pharmaceutical interest, Bachelor's Degree Final Project, Univ. Barcelona, Jan 2018; http://diposit.ub.edu/dspace/handle/2445/119664; C. Ràfols, private communication.
  • [136].Dezani AB, Dezani TM, Ferreira JCF, Serra CHR. Solubility evaluation of didanosine: a comparison between the equilibrium method and intrinsic dissolution for biopharmaceutics classification purposes. Braz J Pharm Sci. 53 (2017) 1–8. http://dx.doi.org/10.1590/s2175-97902017000216128. [DOI] [Google Scholar]
  • [137].Drewe J, Keck M, Guitard P, Pellet A, Johnston B, Beglinger C. Relevance of pH dependency on in vitro release of bromocriptine from a modified-release formulation. J Pharm Sci. 80 (1991) 160–3. [DOI] [PubMed] [Google Scholar]
  • [138].Eisenbrand J, Picher H. Bestimmung der dissoziationskonstanten, loslichkeiten und verteilungskoeffizienten von pantokain- und novokainbase. Arch Pharm Ber Dtsch Pharm Ges. 276 (1938) 1–17. [Google Scholar]
  • [139].Erlich L, Yu D, Pallister DA, Levinson RS, Gole DG, Wilkinson PA, et al. Relative bioavailability of danazol in dogs from liquid-filled hard gelatin capsules. Int J Pharm. 179 (1999) 49–53. [DOI] [PubMed] [Google Scholar]
  • [140].Fagerberg JH, Tsinman O, Tsinman K, Sun N, Avdeef A, Bergström CAS. Dissolution rate and apparent solubility of poorly soluble compounds in biorelevant dissolution media. Mol Pharm. 7 (2010) 1419–30. [DOI] [PubMed] [Google Scholar]
  • [141].Fini A, Laus M, Orienti I, Zecchi V. Dissolution and partition thermodynamic functions of some non-steroidal anti-inflammatory drugs. J Pharm Sci. 75 (1986) 23–5. [DOI] [PubMed] [Google Scholar]
  • [142].French DL, Mauger JW. Evaluation of the physicochemical properties and dissolution characteristics of mesalamine: relevance to controlled intestinal drug delivery. Pharm Res. 10 (1993) 1285–90. [DOI] [PubMed] [Google Scholar]
  • [143].Fagerberg JH, Al-Tikriti Y, Ragnarsson G, Bergström CAS. Ethanol effects on apparent solubility of poorly soluble drugs in simulated intestinal fluid. Mol Pharm. 9 (2012) 1942–52. [DOI] [PubMed] [Google Scholar]
  • [144].Fagerberg JH, Sjögren E, Bergström CAS. Concomitant intake of alcohol may increase the absorption of poorly soluble drugs. Eur J Pharm Sci. 67 (2015) 12–20. [DOI] [PubMed] [Google Scholar]
  • [145].Garrett ER, Chandran VR. Pharmacokinetics of morphine and its surrogates VI: Bioanalysis, solvolysis kinetics, solubility, pKa' values, and protein binding of buprenorphine. J Pharm Sci. 74 (1985) 515–24. [DOI] [PubMed] [Google Scholar]
  • [146].Glomme A, März J, Dressman JB. Predicting the intestinal solubility of poorly soluble drugs. In: Testa B., Krämer S.D., Wunderli-Allenspach H., Folkers G. (Eds.), Pharmacokinetic Profiling in Drug Research. Wiley-VCH, pp 259-280 (2006). [Google Scholar]
  • [147].Glomme A, März J, Dressman JB. Comparison of a miniaturized shake-flask solubility method with automated potentiometric acid/base titrations and calculated solubilities. J Pharm Sci. 94 (2005) 1–16. [DOI] [PubMed] [Google Scholar]
  • [148].Holcombe IJ, Fusari SA. Diphenhydramine Hydrochloride. Anal. Profiles Drug Subst. 3 (1974) 173–232. [Google Scholar]
  • [149].Jackson MJ, Kestur US, Hussain MA, Taylor LS. Characterization of supersaturated danazol solutions - impact of polymers on solution properties and phase transitions. Pharm Res. 33 (2016) 1276–88. [DOI] [PubMed] [Google Scholar]
  • [150].Janes JO, Loeb PM, Berk RN, Dietschy JM. Intestinal absorption of oral cholecystographic agents. Clin Res. 25 (1977) 312. [PubMed] [Google Scholar]
  • [151].Kalepu S, Nekkanti V, Manthina M, Padavala V. Development and validation of a dissolution method for raloxifene hydrochloride in pharmaceutical dosage forms using RP-HPLC. J Chem Pharm Res. 5 (2013) 981–7. [Google Scholar]
  • [152].Kramer SF, Flynn GL. Solubility of organic hydrochlorides. J Pharm Sci. 61 (1972) 1896–904. [DOI] [PubMed] [Google Scholar]
  • [153].Lestari MLAD, Ardiana F, Indrayanto G. Ezetimibe. Profiles Drug Subst Excip Relat Methodol. 36 (2011) 103–49. [DOI] [PubMed] [Google Scholar]
  • [154].Lin B, Pease JH. A high throughput solubility assay for drug discovery using microscale shake-flask and rapid UHPLC–UV–CLND quantification. J Pharm Biomed Anal. 122 (2016) 126–40. [DOI] [PubMed] [Google Scholar]
  • [155].Loftsson T, Hreinsdóttir D. Determination of aqueous solubility by heating and equilibration: a technical note. AAPS PharmSciTech. 7 (2006) Article 4, E1–4. [DOI] [PubMed] [Google Scholar]
  • [156].Lordi NG, Christian JE. Physical properties and pharmacological activity: antihistamines. J Am Pharm Assoc Am Pharm Assoc. 45 (1956) 300–5. [DOI] [PubMed] [Google Scholar]
  • [157].Marasini N, Tran TH, Poudel BK, Cho HJ, Choi YK, Chi S-C, et al. Fabrication and evaluation of pH-modulated solid dispersion for telmisartan by spray-drying technique. Int J Pharm. 441 (2013) 424–32. [DOI] [PubMed] [Google Scholar]
  • [158].Meylan WM, Howard PH, Boethling RS. Improved method for estimating water solubility from octanol/water partition coefficient. Environ Toxicol Chem. 15 (1996) 100–6. [Google Scholar]
  • [159].Mithani SD, Bakatselou V, TenHoor CN, Dressman JB. Estimation of the increase in solubility of drugs as a function of bile salt concentration. Pharm Res. 13 (1996) 163–7. [DOI] [PubMed] [Google Scholar]
  • [160].Muankaew C, Jansook P, Sigurdsson HH, Loftsson T. Cyclodextrin-based telmisartan ophthalmic suspension: Formulation development for water-insoluble drugs. Int J Pharm. 507 (2016) 21–31. [DOI] [PubMed] [Google Scholar]
  • [161].Nair A, Abrahamsson B, Barends DM, Groot DW, Kopp S, Polli JE, et al. Biowaiver monographs for immediate release solid oral dosage forms: amodiaquine hydrochloride. J Pharm Sci. 101 (2012) 4390–401. [DOI] [PubMed] [Google Scholar]
  • [162].Najib NM, Suleiman MS. The kinetics of dissolution of diflunisal and diflunisal-polyethylene glycol solid dispersion. Int J Pharm. 57 (1989) 197–203. [Google Scholar]
  • [163].O’Driscoll C, Corrigan OI. Clofazimine. Anal. Profiles Drug Subst. Excip. 21 (1992) 75–108. [Google Scholar]
  • [164].O’Reilly JR, Corrigan OI, O’Driscoll CM. The effect of simple micellar systems on the solubility and intestinal absorption of clofazimine (B663) in the anaesthetized rat. Int J Pharm. 105 (1994) 137–46. [Google Scholar]
  • [165].Ottaviani G, Wendelspiess S, Alvarez-Sánchez R. Importance of critical micellar concentration for the prediction of solubility enhancement in biorelevant media. Mol Pharm. 12 (2015) 1171–9. [DOI] [PubMed] [Google Scholar]
  • [166].Pedersen BL, Müllertz A, Brøndsted H, Kristensen HG. A comparison of the solubility of danazol in human and simulated gastrointestinal fluids. Pharm Res. 17 (2000) 891–4. [DOI] [PubMed] [Google Scholar]
  • [167].Peeters J, Neeskens P, Tollenaere JP, Van Remoortere P, Brewster ME. Characterization of the interaction of 2-hydroxypropyl-beta-cyclodextrin with itraconazole at pH 2, 4 and 7. J Pharm Sci. 91 (2002) 1414–22. [DOI] [PubMed] [Google Scholar]
  • [168].Perlovich GL, Kurkov SV, Bauer-Brandl A. The difference between partitioning and distribution from a thermodynamic point of view: NSAIDs as an example. Eur J Pharm Sci. 27 (2006) 150–7. [DOI] [PubMed] [Google Scholar]
  • [169].Pitré D. Iopanoic acid. Anal. Profiles Drug Subst. 14 (1985) 181–206. [Google Scholar]
  • [170].Plöger GF, Hofsäss MA, Dressman JB. Solubility determination of active pharmaceutical ingredients which have been recently added to the list of Essential Medicines in the context of the Biopharmaceutics Classification System - biowaiver. J Pharm Sci. 107 (2018) 1478–88. [DOI] [PubMed] [Google Scholar]
  • [171].Roy SD, Roos E, Sharma K. Transdermal delivery of buprenorphine through cadaver skin. J Pharm Sci. 83 (1994) 126–30. [DOI] [PubMed] [Google Scholar]
  • [172].Seedher N, Kanojia M. Micellar solubilization of some poorly soluble antidiabetic drugs: a technical note. AAPS PharmSciTech. 9 (2008) 431–6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [173].Shoghi E, Fuguet E, Bosch E, Ràfols C. Solubility-pH profile of some acidic, basic and amphoteric drugs. Eur J Pharm Sci. 48 (2013) 290–1. [DOI] [PubMed] [Google Scholar]
  • [174].Sieger P, Cui Y, Scheuerer S. pH-dependent solubility and permeability profiles: a useful tool for prediction of oral bioavailability. Eur J Pharm Sci. 105 (2017) 82–90. [DOI] [PubMed] [Google Scholar]
  • [175].Singh BN. A quantitative approach to probe the dependence and correlation of food-effect with aqueous solubility, dose/solubility ratio, and partition coefficient (log P) for orally active drugs administered as immediate-release formulations. Drug Dev Res. 65 (2005) 55–75. [Google Scholar]
  • [176].Srivalli KMR, Mishra B. Improved aqueous solubility and antihypercholesterolemic activity of ezetimibe on formulating with hydroxypropyl-β-cyclodextrin and hydrophilic auxiliary substances. AAPS PharmSciTech. 17 (2016) 272–83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [177].Strauch S, Dressman JB, Shah VP, Kopp S, Polli JE, Barends DM. Biowaiver monographs for immediate-release solid oral dosage forms: quinine sulfate. J Pharm Sci. 101 (2012) 499–508. [DOI] [PubMed] [Google Scholar]
  • [178].Streng WH, Hsi SK, Helms PE, Tan TGH. General treatment of pH-solubility profiles of weak acids and bases and the effects of different acids on the solubility of a weak base. J Pharm Sci. 73 (1984) 1679–84. [DOI] [PubMed] [Google Scholar]
  • [179].Živanović VS, Pešić MP, Horváth V, Madarász J, Cvijetić IN, Popović GV, et al. Terfenadine solubility studies. IAPC-4 Conference, Red Island, Croatia, 21-24 Sept 2015. [Google Scholar]
  • [180].Taupitz T, Dressman JB, Klein S. New formulation approaches to improve solubility and drug release from fixed dose combinations: case examples pioglitazone/glimepiride and ezetimibe/simvastatin. Eur J Pharm Biopharm. 84 (2013) 208–18. [DOI] [PubMed] [Google Scholar]
  • [181].Teixeira CCC, de Paiva E, Jr, de Freitas LAP. Fluidized bed hot-melt granulation as a tool to improve curcuminoid solubility. AAPS PharmSciTech. 19 (2018) 1061–71. [DOI] [PubMed] [Google Scholar]
  • [182].Thakkar V, Dhankecha R, Gohel M, Shah P, Pandya T, Gandhi T. Enhancement of solubility of artemisinin and curcumin by co-solvency approach for application in parenteral drug delivery system. Int J Drug Deliv. 8 (2016) 77–88. [Google Scholar]
  • [183].Tran PHL, Tran HTT, Lee B-J. Modulation of microenvironmental pH and crystallinity of ionizable telmisartan using alkalizers in solid dispersions for controlled release. J Control Release. 129 (2008) 59–65. [DOI] [PubMed] [Google Scholar]
  • [184].Watari N, Hanano M, Kaneniwa N. Dissolution of slightly soluble drugs. VI. Effect of particle size of sulfadimethoxine on the oral bioavailability. Chem Pharm Bull (Tokyo). 28 (1980) 2221–5. [DOI] [PubMed] [Google Scholar]
  • [185].Wauchope RD, Buttler TM, Hornsby AG, Augustin-Beckers PWM, Burt JP. The SCS/ARS/CES pesticide properties database for environmental decision-making. Rev Environ Contam Toxicol. 123 (1992) 1–155. [PubMed] [Google Scholar]
  • [186].Williams GC, Sinko PJ. Oral absorption of the HIV protease inhibitors: a current update. Adv Drug Deliv Rev. 39 (1999) 211–38. [DOI] [PubMed] [Google Scholar]
  • [187].Wuyts B, Brouwers J, Mols R, Tack J, Annaert P, Augustijns P. Solubility profiling of HIV protease inhibitors in human intestinal fluids. J Pharm Sci. 102 (2013) 3800–7. [DOI] [PubMed] [Google Scholar]
  • [188].Woldemichael T, Keswani RK, Rzeczycki PM, Murashov MD, LaLone V, Gregorka B, et al. Reverse engineering the intracellular self-assembly of a functional mechanopharmaceutical device. Sci Rep. 8 (2018) 2934. doi: http://dx.doi.org/10.1038/s41598-018-21271-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [189].Yamashita S, Fukunishi A, Higashino H, Kataoka M, Wada K. Design of supersaturable formulation of telmisartan with pH modifier: in vitro study on dissolution and precipitation. J Pharm Investig. 47 (2017) 163–71. [Google Scholar]
  • [190].Alelyunas YW, Liu R, Pelosi-Kilby L, Shen C. Application of a dried-DMSO rapid throughput 24-h equilibrium solubility in advancing discovery candidates. Eur J Pharm Sci. 37 (2009) 172–82. [DOI] [PubMed] [Google Scholar]
  • [191].Al Omari MM, Zughul MB, Davies JED, Badwan AA. Effect of buffer species on the complexation of basic drug terfenadine with β-cyclodextrin. J Incl Phenom Macrocycl Chem. 58 (2007) 227–35. [Google Scholar]
  • [192].Avdeef A, Kansy M, Bendels S, Tsinman K. Absorption-excipient-pH classification gradient maps: sparingly-soluble drugs and the pH partition hypothesis. Eur J Pharm Sci. 33 (2008) 29–41. [DOI] [PubMed] [Google Scholar]
  • [193].Application materials for Pioglitazone Tablet 30 mg Sawai; Sawai Pharmaceutical Co., Ltd. (cited by: Sugita M, Kataoka M, Sugihara M, Takeuchi S, Yamashita S. Effect of excipients on the particle size of precipitated pioglitazone in the gastrointestinal tract: impact on bioequivalence. AAPS J 16 (2014) 1119-1127.) [DOI] [PMC free article] [PubMed]
  • [194].Andersson SBE, Alvebratt C, Bevernage J, Bonneau D, da Costa Mathews C, Dattani R, et al. Interlaboratory validation of small-scale solubility and dissolution measurements of poorly water-soluble drugs. J Pharm Sci. 105 (2016) 2864–72. [DOI] [PubMed] [Google Scholar]
  • [195].Anderson BD, Wygant MB, Xiang T-X, Waugh WA, Stella VJ. Preformulation solubility and kinetic studies of 2′,3′-dideoxypurine nucleosides: potential anti-AIDS agents. Int J Pharm. 45 (1988) 27–37. [Google Scholar]
  • [196].Ahad A, Shakeel F, Alfaifi OA, Raish M, Ahmad A, Al-Jenoobi FI, et al. Solubility determination of raloxifene hydrochloride in ten pure solvents at various temperatures: Thermodynamics-based analysis and solute–solvent interactions. Int J Pharm. 544 (2018) 165–71. [DOI] [PubMed] [Google Scholar]

Articles from ADMET & DMPK are provided here courtesy of International Association of Physical Chemists
