2025 Jul 29;125(15):7057–7098. doi: 10.1021/acs.chemrev.4c00855

Physics-Based Solubility Prediction for Organic Molecules

Daniel J Fowles , Benedict J Connaughton , James W Carter , John B O Mitchell , David S Palmer ‡,*
PMCID: PMC12355698  PMID: 40728940

Abstract

Accurate prediction of aqueous solubility for organic molecules is of great importance across a range of fields, from the design and manufacturing of energy materials, to assessing the environmental impact of potential pollutants. It is of particular significance to the pharmaceutical industry, in which problems with low aqueous solubility frequently hamper the development of new drugs. Experimental measurements of solubility are used extensively, but are often time-consuming, resource intensive and only applicable to already synthesized molecules. As such, there is a need for the development of computational approaches to predict solubility. In recent years, there have been considerable advances in physics-based methods, with several contrasting techniques able to give accurate predictions of solubility and a wealth of thermodynamic data for structural optimization. Here, we provide the reader with a thorough understanding of the theoretical background and practical applications of these physics-based methods to predict solubility. This includes discussions of the various advantages and disadvantages of each approach, and an indication of areas of continuing research. Experimental and data-driven methods to assess solubility are also discussed to provide context.



1. Introduction

1.1. Importance of Solubility

Solubility is a fundamental physicochemical property that is important in many areas of academic and industrial chemical research. Knowing how much of a solute will dissolve in a solvent, and how that value can be modulated by changes in structure, solution composition, or environmental factors, is valuable at all stages of discovery, development, and production of new chemical entities. A case in point is the pharmaceutical industry, where it is estimated that 70% of candidate molecules in development have solubility issues, which have directly contributed to the slowdown in the production of new drugs. Similar issues are observed in other industries. For example, solubility directly impacts the effectiveness, safety, and environmental behavior of agrochemicals like pesticides and herbicides. For chemical manufacturing, where solubility can cause problems with synthesis, purification and characterization of molecules, understanding solubility behavior can be invaluable for process optimization.

Since experimental measurements of solubility are often time-consuming, resource intensive, and only applicable to already synthesized molecules, there is a demand for accurate methods to predict solubility, to be used to guide the selection of solutes, solvents, or environmental conditions, thereby lowering costs and time spent on unnecessary synthesis and testing. Predictive methods can also help to understand the structure–property relationships that determine solubility, thereby aiding molecular design.

The prediction of solubility from molecular structure has been an active area of academic research in theoretical and computational physical chemistry for over a century, from early structure–property relationships, to semiempirical models, to physics-based approaches, to deep learning algorithms. The majority of predictive methods are data-driven approaches, which use statistical models to learn a relationship between the physical property of interest (i.e., solubility) and an appropriate computational representation of the molecule. Although these methods are convenient to use and fast, most are only able to make predictions at fixed environmental and chemical conditions, and are often unreliable for molecules dissimilar to those in their training set. Furthermore, as these methods are not based on any fundamental chemical theory, little information is provided about the underlying chemistry, and as such they are difficult to systematically improve.

The calculation of the solubility of crystalline organic molecules from theory and simulation without parametrization against empirical solubility data has been a longstanding goal in computational chemistry. It has proven to be a significant challenge because of the need to accurately simulate both the solid crystalline and dissolved solution phases, while accounting for a variety of factors that influence the solubility, including crystalline polymorphic form. One significant advantage of physics-based methods is that they offer a more rigorous approach for predicting aqueous solubility than data-driven methods, with a clearly defined theoretical basis, which means they can provide a wealth of structural and thermodynamic data for chemical and process optimization. A considerable number of approaches have been proposed, with significant progress in modeling solvation and simulating crystalline polymorphs for organic molecules having helped contribute to the success of a wide range of them.

This review will provide the reader with a thorough understanding of the theoretical background and practical applications of these physics-based methods to predict solubility. This will include a discussion of the various advantages and disadvantages of each approach, and an indication of areas of continuing research. In Section , fundamental concepts are introduced and methods for measuring experimental solubility are reviewed, with particular focus on how the variability in experimental data influences the development and validation of computational methods. To provide context, Section presents the current state-of-the-art in data-driven methods to predict solubility, from traditional QSPR approaches to deep learning. In Section , physics-based methods to predict solubility are reviewed. The focus is on methods that do not require parametrization against solubility data, but some semiempirical methods are included where they have a strong theoretical background. Guided by the current literature, Sections and primarily focus on the prediction of intrinsic aqueous solubility, but in Section the scope is broadened to consider prediction of solubility in other solvents and under other conditions. Sections and provide Conclusions and Future Perspectives that summarize recent progress and highlight promising avenues for further work. The scope of the review encompasses crystalline small organic solutes, including druglike molecules and other small functional organic molecules, but not large polymers, which often require different computational methods to model accurately. The review begins in the next section with a description of the relevant theoretical background and key concepts.

1.2. Solvation, Dissolution, and Solubility

Solvation describes the process through which a dissolved substance and solvent interact. As a solid solute dissolves, particles will shift from the bulk solid to solution with the formation of solute–solvent complexes, and an equilibrium will emerge with dissolution and precipitation as opposing processes. If the amount of solute added to the solution exceeds the solubility limit, dissolution will continue until the concentration of dissolved solute exceeds that which can be favorably held in solution (“supersaturation”), at which point the rate of precipitation will exceed the rate of dissolution and dissolved solute will revert to the bulk solid. Solubility represents the point at which a stable solute–solvent thermodynamic equilibrium has been achieved, with the rate of dissolution equal to the rate of precipitation. At thermodynamic equilibrium, the solute has the same chemical potential in the solid phase as in the solution phase, and the two phases coexist. From this, solubility can be generally defined as the maximum amount of substance that can dissolve in a specified amount of solvent at thermodynamic equilibrium under specified environmental conditions.

1.3. Factors That Influence Solubility

The solubility of a compound can be influenced by many factors, ranging from changes in environmental conditions to the introduction of new chemical species. A selection of these factors will be discussed over the next few sections in order to provide relevant background for the discussion of computational methods to predict solubility.

1.3.1. Temperature

The solubility of a solute in a solvent is a function of the temperature. The direction in which the solubility shifts depends on whether the dissolution process is endothermic ($\Delta H_{\mathrm{dissolution}} > 0$), in which case solubility rises with temperature, or exothermic ($\Delta H_{\mathrm{dissolution}} < 0$), in which case it falls. For most solid compounds, a higher temperature will further induce a breakdown of the crystal lattice, resulting in greater mobility from the bulk to solution, and increased solubility. There are exceptions, such as sodium sulfate, which forms less soluble hydrate complexes at higher temperatures. Gaseous compounds typically trend in the opposite direction to solid species, with higher temperatures leading to greater degasification of the solvent. Changes to temperature and dissolution enthalpy can be related to a change in solubility through the van’t Hoff equation.

$$\ln\frac{K_2}{K_1} = \frac{\Delta_r H}{R}\left(\frac{1}{T_1} - \frac{1}{T_2}\right) \tag{1}$$

where $K_1$ and $K_2$ are the solubility products at temperatures $T_1$ and $T_2$, respectively, $\Delta_r H$ is the standard dissolution enthalpy and $R$ is the molar gas constant. The solubility product defines the concentration of dissolved species in a saturated solution. The van’t Hoff equation contains no solvent-specific terms, and so it cannot by itself predict how solubility differs between solvents.
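As a minimal sketch of how the two-point van’t Hoff relation is applied in practice, the Python snippet below estimates the solubility product at a second temperature from a known value at a first temperature. The function name and the numerical inputs are hypothetical illustrations, not data from the literature, and the dissolution enthalpy is assumed to be constant over the temperature range.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def k_at_new_temperature(k1, t1, t2, delta_h):
    """Two-point van't Hoff estimate of K2 at temperature t2.

    k1      : solubility product at t1 (any consistent units)
    t1, t2  : absolute temperatures in K
    delta_h : standard dissolution enthalpy in J mol^-1, assumed constant
    """
    return k1 * math.exp((delta_h / R) * (1.0 / t1 - 1.0 / t2))


# Hypothetical endothermic dissolution (delta_h > 0): K rises on heating.
k1 = 1.0e-4
k2 = k_at_new_temperature(k1, t1=298.15, t2=310.15, delta_h=25_000.0)
print(f"K2/K1 = {k2 / k1:.3f}")
```

For an exothermic dissolution (negative `delta_h`) the same function returns a ratio below one, in line with the qualitative behavior described above.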

The linear form of the van’t Hoff equation can be used to determine whether a compound will exhibit a higher or lower solubility as the temperature changes, by estimating the enthalpy and entropy of dissolution. By measuring the solubility product at different temperatures, the sign of the enthalpy change can be used to ascertain whether the dissolution is endothermic or exothermic.

$$\ln K_{sp} = -\frac{\Delta_r H}{RT} + \frac{\Delta_r S}{R} \tag{2}$$

where $K_{sp}$ is the solubility product at a given temperature and $\Delta_r S$ is the standard dissolution entropy.
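The linear van’t Hoff analysis can be sketched as a straight-line fit of ln K against 1/T, where the slope gives the dissolution enthalpy and the intercept gives the dissolution entropy. The helper below is an illustrative implementation using ordinary least squares in plain Python; the function name and the synthetic data are assumptions made for this example only.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def fit_vant_hoff(temps_k, k_values):
    """Fit ln K = slope * (1/T) + intercept by ordinary least squares.

    Returns (delta_h, delta_s) in J mol^-1 and J mol^-1 K^-1, using
    slope = -delta_h / R and intercept = delta_s / R.
    """
    x = [1.0 / t for t in temps_k]
    y = [math.log(k) for k in k_values]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -R * slope, R * intercept


# Synthetic (hypothetical) data generated from known values, then refitted.
temps = [288.15, 298.15, 308.15, 318.15]
ks = [math.exp(-20_000.0 / (R * t) + 10.0 / R) for t in temps]
delta_h, delta_s = fit_vant_hoff(temps, ks)
print(f"delta_h = {delta_h:.1f} J/mol, delta_s = {delta_s:.2f} J/(mol K)")
```

With real measurements the points will scatter about the line, and visible curvature is a sign that the constant-enthalpy assumption is breaking down.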

The van’t Hoff equation includes an implicit assumption that enthalpy and entropy do not change with temperature. This assumption does not hold for all species, and so an additional term must be introduced to correct for this.

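$$\ln K_{sp} = a + \frac{b}{T} + \frac{c}{T^2} \tag{3}$$

where

$$\Delta_r H = -R\left(b + \frac{2c}{T}\right) \tag{4}$$

and

$$\Delta_r S = R\left(a - \frac{c}{T^2}\right) \tag{5}$$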
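As a brief sketch of where the temperature-dependent enthalpy and entropy come from, the derivation below assumes that the extended expression is a quadratic in $1/T$ with fitted coefficients $a$, $b$ and $c$ (an assumption made for this note), and applies the standard relations between $\ln K_{sp}$, $\Delta_r G$, $\Delta_r H$ and $\Delta_r S$.

```latex
\begin{aligned}
\ln K_{sp} &= a + \frac{b}{T} + \frac{c}{T^{2}} \\
\Delta_r H &= -R\,\frac{\partial \ln K_{sp}}{\partial (1/T)}
            = -R\left(b + \frac{2c}{T}\right) \\
\Delta_r G &= -RT \ln K_{sp}
            = -R\left(aT + b + \frac{c}{T}\right) \\
\Delta_r S &= -\frac{\partial \Delta_r G}{\partial T}
            = R\left(a - \frac{c}{T^{2}}\right)
\end{aligned}
```

Setting $c = 0$ recovers the linear form, with $b = -\Delta_r H / R$ and $a = \Delta_r S / R$, and the expressions satisfy $\Delta_r G = \Delta_r H - T\Delta_r S$ term by term.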

1.3.2. Pressure

The effect of pressure on solubility only needs to be considered for gaseous species. The pressure dependence for condensed phases is often insignificant and can be neglected in practice. Gaseous solutes form an equilibrium between gas present above the solvent and that which has dissolved within the solvent. The solubility product, and so the solubility of a given gaseous compound, can be quantified with Henry’s law.

$$\rho = k_H c \tag{6}$$

where $\rho$ is the partial pressure, $k_H$ is the Henry’s law constant for a given gaseous solute and $c$ is the concentration of dissolved gas in solution. Henry’s law states that the concentration of dissolved gas in solution is directly proportional to the gaseous partial pressure above the solution. By altering the pressure of a system, the partial pressure will be directly altered. An increase in pressure will cause the gas above the solvent to compress, leading to an increase in partial pressure and so greater solubility. Henry’s law is only an approximation for systems where the gaseous species is present at a low concentration, and in solutions where no chemical reaction is taking place.
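A minimal sketch of Henry’s law in use is given below: the dissolved concentration is obtained by dividing the partial pressure by the Henry’s law constant. The function name is hypothetical and the constant used in the example is a placeholder of a plausible order of magnitude for a sparingly soluble gas in water, not a recommended literature value.

```python
def dissolved_gas_concentration(partial_pressure, k_h):
    """Henry's law in the form p = k_H * c, rearranged to c = p / k_H.

    partial_pressure : partial pressure of the gas (e.g. atm)
    k_h              : Henry's law constant in pressure per concentration
                       (e.g. atm L mol^-1); must be positive
    """
    if k_h <= 0:
        raise ValueError("Henry's law constant must be positive")
    return partial_pressure / k_h


# Placeholder constant (assumption for illustration only).
K_H_EXAMPLE = 770.0  # atm L mol^-1

c_low = dissolved_gas_concentration(0.21, K_H_EXAMPLE)
c_high = dissolved_gas_concentration(0.42, K_H_EXAMPLE)
print(f"c(0.21 atm) = {c_low:.2e} mol/L, c(0.42 atm) = {c_high:.2e} mol/L")
```

Doubling the partial pressure doubles the computed concentration, which is the proportionality stated above. The sketch inherits the same limits as the law itself: dilute solutions and no chemical reaction of the dissolved gas.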

1.3.3. pH and Buffers

A commonly applied and effective technique for adjusting the solubility of an ionizable solute within a given solvent is by altering the pH of that medium. The relationship between bulk solution pH and the extent of solute solubilization depends upon the number and identity of charged groups found within the solute structure. For example, the solubility of compounds containing basic anions can generally be increased through a reduction in solvent pH. From the addition of an acidic compound, newly introduced H+ ions will react with basic anions present in solution to form water. H3O+ ions will also form from the reaction of H+ ions and water. This shift in equilibrium will thus increase the solubility of the bulk compound.

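$$\mathrm{Mg(OH)_2(s)} \rightleftharpoons \mathrm{Mg^{2+}(aq)} + 2\,\mathrm{OH^-(aq)} \tag{7}$$
$$\mathrm{H^+(aq)} + \mathrm{OH^-(aq)} \rightarrow \mathrm{H_2O(l)} \tag{8}$$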

Buffers are one method used extensively to control the solution pH. A buffer solution is any aqueous based mixture of compounds containing a weak acid and its conjugate base, or vice versa. The purpose of a buffer solution is to minimize any change to the pH of the solution when a strong acid or base is added. Buffer solutions are able to maintain a stable pH because of the chemical equilibrium that forms between the weak acid and its conjugate base. This equilibrium will help to control the concentration of H+ and H3O+ ions present in solution as a strong acid or base is added. No single buffer is effective over the full pH range, and multiple buffers may be needed to obtain pH control over a wide range. For example, a combination of phosphate and citrate is very effective as a buffer over a pH range of 1.6–7.7, and acetic acid acts as a buffer in a pH range of 3.7–5.8 (shown below).

$$\mathrm{CH_3COOH(aq)} + \mathrm{H_2O(l)} \rightleftharpoons \mathrm{CH_3COO^-(aq)} + \mathrm{H_3O^+(aq)} \tag{9}$$

In solution, acetic acid will dissociate into acetate and H+ ions. If a strong acid is added, the added H+ ions are largely consumed by acetate, so the equilibrium position shifts toward the weak acid and the increase in hydrogen ion concentration is limited. Similarly, adding a strong base will not raise the pH appreciably, as OH− ions will instead react with the weak acid.

The Henderson–Hasselbalch equation can be used to estimate the pH of a buffer solution from the concentrations of weak acid and conjugate base in solution.

$$\mathrm{pH} = \mathrm{p}K_a + \log_{10}\left(\frac{[\mathrm{Base}]}{[\mathrm{Acid}]}\right) \tag{10}$$

where $\mathrm{p}K_a$ is the negative base-10 logarithm of the acid dissociation constant and the terms in square brackets denote the concentrations of conjugate base and weak acid, respectively.
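The Henderson–Hasselbalch relation is simple enough to sketch directly. The function below is an illustrative helper (its name is an assumption of this example), and the value of 4.76 is the commonly quoted pKa of acetic acid at 25 °C.

```python
import math


def buffer_ph(pka, base_conc, acid_conc):
    """Henderson-Hasselbalch estimate of buffer pH.

    pka       : -log10 of the acid dissociation constant
    base_conc : concentration of the conjugate base (same units as acid_conc)
    acid_conc : concentration of the weak acid
    """
    if base_conc <= 0 or acid_conc <= 0:
        raise ValueError("concentrations must be positive")
    return pka + math.log10(base_conc / acid_conc)


PKA_ACETIC_ACID = 4.76  # commonly quoted value at 25 degrees C

for base, acid in [(0.01, 0.10), (0.10, 0.10), (0.10, 0.01)]:
    ph = buffer_ph(PKA_ACETIC_ACID, base, acid)
    print(f"[base]/[acid] = {base / acid:5.1f} -> pH = {ph:.2f}")
```

A tenfold change in the base-to-acid ratio moves the estimated pH by only one unit, which is the buffering behavior described above. The estimate uses concentrations in place of activities, so it is most reliable for dilute buffers close to the pKa.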

1.3.4. Cosolvents, Surfactants and Complexation

Many classes of nonpolar compound exhibit poor aqueous solubility and poor miscibility with water. This observed tendency of nonpolar substances to aggregate in an aqueous solution and to be excluded by water is sometimes referred to as the hydrophobic effect. The origin of the hydrophobic effect is still debated, but it arises at least in part from the inability of nonpolar compounds to form strong hydrogen bonds with water. In pure water, there is a dynamic network of hydrogen bonds between solvent molecules. The enthalpic change for hydrating a solute is dominated by an unfavorable contribution for creating a cavity in the solvent for the solute, and favorable contributions from solute–solvent interactions, the latter of which are typically bigger for hydrophilic than hydrophobic compounds. Close to the surface of the solute, hydrogen bonds reorientate, forming a slightly more ordered “cage” around some solutes, with a corresponding loss of translational and rotational entropy of the solvent molecules. The hydrophobic effect is observed when the interplay between enthalpic and entropic factors makes solvation less favorable than aggregation. Aggregation of nonpolar compounds occurs because it reduces the water-accessible surface area, thereby minimizing the disruptive effect.

If the compound under investigation is a weak electrolyte, then altering the pH of the solution may be sufficient to solubilize it, but for many classes of compound a change in pH is not sufficient, and so other techniques must be applied to counteract this behavior. Many such techniques make use of hydrotropes to solubilize hydrophobic compounds, of which three such approaches will be briefly discussed below. Interested readers are directed toward more in-depth reviews covering these topics.

Cosolvents are organic compounds that can be introduced to an aqueous solution to better match the polarity of water with that of the poorly soluble solute. These compounds typically maintain a mutual miscibility with water through the presence of hydrogen bond donor and/or hydrogen bond acceptor groups. Cosolvents typically contain a small hydrocarbon region, reducing the hydrogen bond density within their vicinity. This nonpolar pocket provides space for a previously poorly soluble solute to exist in solution and thus improve its solubility. As the introduction of a nonpolar cosolvent will reduce the polarity of water, improvements in solubility will only be observed for nonpolar solutes, with a reduction in solubility for most polar compounds. A wide range of cosolvents are in common use, and studies directly investigating new cosolvent mixtures, as well as their application in in-silico methods, are regularly released.

Surfactants are amphiphilic molecules, characterized by the presence of both hydrophilic and hydrophobic groups. These opposing groups drive a tendency for amphiphilic molecules to orient themselves at the interface between phases of different polarity. For multiphase systems involving water, the surfactant will orient itself with its more polar region directed toward water and its more nonpolar region toward the less polar phase. As an aqueous solution becomes saturated with surfactant, groups of surfactant molecules will aggregate to form micelles. These typically spherical structures maximize the favorable interactions between the hydrophilic head of each individual surfactant and water while maintaining a nonpolar core. Beyond macroscopic interfaces, surfactants will also orient themselves to the microscopic interface between a solute and water. The micellar surfactant will incorporate the solute within its structure, with the exact location dependent on the polarity of the solute. The greater the difference in polarity between the solute and water, the more likely it is that the solute will be incorporated closer to the core of the micelle. Surfactant micelles provide nonpolar solutes with a favorable position within solution, thus improving their solubility. Improvements in the solubilization of poorly soluble drugs in water are often reported, such as those for erythromycin with the inclusion of nonionic surfactants by Bhat et al., as well as for systems in which solubility is limited by pH.

Solubilization by complexation is a common method for improving the solubility of poorly soluble compounds. Complexing agents work similarly to surfactants, by creating a favorable intermolecular association between water-soluble ligands and poorly soluble substrates within an aqueous environment. There are multiple classes of complexing agent actively used to improve the solubility of organic molecules, including metal coordination complexes, organic molecular complexes, inclusion complexes and pharmacosomes. Coordination complexes, unlike other classes of complexing agent, form through the covalent bonding of ligands with a central metal cation. The formation of water-soluble metal complexes has been shown to improve the solubility and stability of a variety of commercial drugs. The complexation of organic substrates within organic, inclusion and pharmacosome complexes occurs through a range of noncovalent interactions, such as hydrogen bonding and charge-transfer. Organic molecular complexes are often formed between a poorly soluble drug molecule and a similarly sized ligand or polymer. The introduction of caffeine or β-cyclodextrin to an aqueous solution of sulfathiazole leads to a significant increase in the solubility of sulfathiazole. Polyvinylpyrrolidone is a very common polymeric complexing agent, and has been shown to improve the solubility of a range of drug molecules. Inclusion complexes can be generally defined as host–guest complexes, where the substrate is partially or fully enveloped within the ligand superstructure. Several variations of these ligands exist, such as clathrates, channel lattice complexes and supramolecular cyclic structures. Pharmacosomes, which possess the most similarity to surfactants, are amphiphilic phospholipid structures that form concentric lipid bilayers within aqueous solution. Substrate-phospholipid complexes can increase substrate solubility in both aqueous and nonaqueous solutions.

1.3.5. Solid Form and Solubility

Observed solubility values will be dependent on the physical form of the precipitate. If the precipitate is a solid crystal, then the crystalline polymorphs must be considered. Many, perhaps even most, organic compounds are capable of crystallizing into two or more polymorphs with distinct crystal structures. Since by definition it has the lowest free energy, the most stable polymorph is always anticipated to be the least soluble solid form.

Kinetic solubilities can be assigned to any polymorph that is not the most thermodynamically stable crystalline form of the solute. Such polymorphs form according to Ostwald’s rule, which states that less stable polymorphs precipitate more rapidly due to their kinetically favorable disorganized state. Given enough time, kinetic polymorphs will rearrange into more thermodynamically stable states, until the crystal structure with the lowest free energy is found. A kinetic solubility value can typically be assigned to any metastable polymorph, provided it is stable enough to exist as the precipitate without changing form. The precipitate observed is strongly dependent on the conditions and method used to measure solubility and is unlikely to be reproduced under different protocols. Any combination of amorphous, crystalline, salt or cocrystal states could contribute to the physical makeup of the precipitate. Typically, solubilities of polymorphs of the same compound differ by factors of two or less, which corresponds to about 1.8 kJ mol$^{-1}$ at room temperature, though occasionally these differences may be of an order of magnitude or more.
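The link between a solubility ratio and a free energy difference can be sketched with the relation ΔG = RT ln(S_A/S_B). This assumes that both polymorphs dissolve to give the same dilute solution species and that activities can be approximated by concentrations, which is an assumption of this example rather than a statement from the text. The function name is hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def free_energy_gap_kj(solubility_ratio, temperature_k=298.15):
    """Free energy difference (kJ mol^-1) between two solid forms whose
    solubilities differ by the given ratio, assuming ideal dilute solutions."""
    if solubility_ratio <= 0:
        raise ValueError("solubility ratio must be positive")
    return R * temperature_k * math.log(solubility_ratio) / 1000.0


for ratio in (2.0, 10.0):
    gap = free_energy_gap_kj(ratio)
    print(f"ratio {ratio:4.1f} -> {gap:.2f} kJ/mol at 298.15 K")
```

A factor of two gives a gap slightly under 2 kJ mol$^{-1}$ at room temperature, the same order as the figure quoted above, while an order-of-magnitude difference corresponds to a gap roughly three times larger.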

Beyond polymorphic crystal forms of the compound itself, there may also be hydrates, solvates or cocrystals with other species present in the crystal. The existence of different possible solid forms raises questions both of the identity of the solid present in a given solubility experiment, and also of whether the most thermodynamically stable polymorph of the given compound has even been discovered yet. The nature of the solid present may depend on the solubility technique being employed; for instance, dispensing from DMSO typically leads to amorphous solids, while potentiometric methods allow more control. In a particularly impressive example of the latter, Llinas et al. were able to discover a new most stable polymorph of the pharmaceutical compound sulindac, and to measure distinct solubilities for both polymorphs in the same experiment.

1.4. Definitions of Solubility

The many factors that influence solubility have given rise to different definitions of solubility in the published literature. For the purposes of discussing the physics-based (and data-driven) predictive methods in this field, the most useful definition to start from is “intrinsic solubility” because it is clearly defined, can be measured accurately, and has been widely adopted in modeling studies, including two community-wide blind challenges.

1.4.1. Intrinsic Solubility

Intrinsic solubility is defined as the concentration of the un-ionized form of an ionizable molecule in a saturated solution at thermodynamic equilibrium at a given temperature. Several different models make use of intrinsic solubility for the determination of molecule specific solvated behavior. One such model is the Noyes-Whitney equation, used to calculate dissolution rate.

$$\frac{dW}{dt} = \frac{DA\,(C_s - C)}{L} \tag{11}$$

where the left-hand term describes the rate of dissolution, $D$ is the diffusion coefficient specific to a given solute–solvent pair, $A$ is the surface area of the crystalline solute, $C_s$ is the saturation concentration of the solute (its solubility) in the diffusion layer at the solid surface, $C$ is the concentration of the solute in the bulk solution and $L$ is the diffusion layer thickness.

The Noyes-Whitney equation accurately describes the relationship between dissolution rate and solubility, with a poor dissolution rate often mirrored by an equally poor solubility, and vice versa. Many parameters influence a solute’s dissolution rate, and cause the rate to change as dissolution occurs. For example, the particle surface area of a solute does not typically remain constant during dissolution, often decreasing in size over time. This results in a dissolution rate which similarly decreases with time, thus influencing any measurements taken later.
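To illustrate how the Noyes-Whitney rate law behaves over time, the sketch below integrates it with a simple explicit Euler scheme. It takes C_s as the saturation concentration and C as the bulk solution concentration, and it assumes a constant surface area and a fixed solution volume, which the paragraph above notes is a simplification. All names and parameter values are hypothetical.

```python
def simulate_dissolution(diff_coeff, area, c_sat, layer, volume, dt, n_steps):
    """Euler integration of dW/dt = D * A * (C_s - C) / L with C = W / V.

    diff_coeff : diffusion coefficient D (cm^2 s^-1)
    area       : solid surface area A (cm^2), held constant here
    c_sat      : saturation concentration C_s (mg cm^-3)
    layer      : diffusion layer thickness L (cm)
    volume     : solution volume V (cm^3)
    dt         : time step (s)
    Returns the bulk concentration after each step, starting from zero.
    """
    dissolved_mass = 0.0
    concentrations = [0.0]
    for _ in range(n_steps):
        bulk_conc = dissolved_mass / volume
        rate = diff_coeff * area * (c_sat - bulk_conc) / layer
        dissolved_mass += rate * dt
        concentrations.append(dissolved_mass / volume)
    return concentrations


# Hypothetical parameters chosen only to give a readable curve.
profile = simulate_dissolution(
    diff_coeff=5.0e-6, area=10.0, c_sat=1.0,
    layer=3.0e-3, volume=100.0, dt=10.0, n_steps=1000,
)
for step in (0, 250, 500, 1000):
    print(f"t = {step * 10:6d} s  C/C_s = {profile[step] / 1.0:.3f}")
```

The rate is fastest at the start, when the bulk concentration is far below saturation, and decays as the solution approaches C_s. Letting the surface area shrink with the dissolved mass would slow the later stages further.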

1.5. Experimental Methods for Intrinsic Solubility Determination

The shake flask method is a conceptually simple approach to solubility determination, the principle being simply to keep dissolving solute in the solution until it is saturated. Thus, an excess of sample is added to the solution, which often includes a buffer, and the sample is shaken at a given temperature and pressure to dissolve as much sample as possible. This is expected to yield a supersaturated solution, and the solution will require a period of time to reach equilibrium, the length of which may be difficult to estimate. Excess sample is filtered off, and the concentration of the remaining equilibrated solution is measured, usually by a process involving liquid chromatography linked to detection by either UV/visible spectroscopy or mass spectrometry.

Solvent evaporation methods such as LYSA have some similarities with the typical industrial implementations of the shake flask approach, the major difference being that the sample is typically delivered as a solution in an organic solvent, which is then evaporated to leave a solid sample, not necessarily crystalline. Again, spectroscopic or chromatographic methods can be used to measure the saturated concentration, with calibration to known concentrations of the stock solution facilitating quantitative results.

CheqSol (Chasing equilibrium Solubility) is a potentiometric titration approach applicable to compounds with a titratable ionizable group, whether acidic or basic. The procedure uses automated acid–base titrations to shuttle between supersaturated and subsaturated concentrations a little higher and lower than that at which precipitation is observed. In this way, the system is forced toward equilibrium on a time scale of tens of minutes to around an hour. The continued cycling between super- and subsaturated solutions has been used to identify changes in the crystalline polymorphic form of the precipitate in some cases. While CheqSol remains a low-throughput method, it avoids long equilibration times and uncertainty over whether equilibrium has been reached. Similar to CheqSol, the Dissolution Template Titration (DTT) method is also a potentiometric approach. In DTT, the problem is understood in terms of a three-state model, consisting of an initial state before titrant addition, an intermediate state following titrant addition but prior to solute dissolution, and the final equilibrium state. The equilibrium solubility is calculated by analyzing the Bjerrum plot for the titration under the assumption that this three-state model applies.

The synthetic method uses the intensity of a laser passed through a solution to assess the presence of solid matter. The intensity is measured first for pure solvent, then a small quantity of solute is added. The detected laser light intensity will initially drop, but should recover as the solid is dissolved. Saturation is indicated by the lowest concentration at which the signal fails to recover full intensity after addition of a further small quantity of solid. The same approach can be used to measure temperature-dependence of solubility, by varying the experimental temperature and finding the limit where full dissolution no longer occurs at a known concentration.

A contrasting approach is to adapt the high-throughput kinetic solubility assays commonly used by pharmaceutical companies, in the hope of tweaking them to facilitate equilibrium solubility determination. These methods, however, generally rely on compounds being delivered from stock solutions in dimethyl sulfoxide (DMSO), which means that the assayed solution will contain some DMSO effectively acting as a cosolvent. The method is likely to overestimate the equilibrium solubility due to remaining supersaturation and the experimental time scale would also need to be extended to at least several hours.

While all these methods are widely used, each of them has distinct advantages and disadvantages. The shake flask method is the traditional gold standard, offering reliable and accurate thermodynamic solubility measurements. However, it is time-consuming, labor-intensive, and requires a significant amount of material, making it less suitable for early stage drug discovery. Solvent evaporation methods, such as LYSA, provide rapid results using minimal sample quantities, but evaporation-based techniques can lead to nonequilibrium conditions, potentially skewing results. Potentiometric methods, such as CheqSol and Dissolution Template Titration (DTT), offer fast, automated solubility measurements by tracking pH changes. These methods are convenient for ionizable compounds and can provide insight into both kinetic and thermodynamic solubility, and often additional information about acid–base behavior, but they may not be applicable for nonionizable molecules and require precise calibration. High-throughput methods that involve diluting compounds into aqueous solutions from DMSO stock allow for rapid screening with small amounts of material. However, DMSO can affect the solubility of certain compounds, leading to potential inaccuracies, and the method may not capture the desired solubility accurately due to the presence of the DMSO. The techniques most suited to accurate measurement of equilibrium solubilities, the shake flask and CheqSol methods, require effort of at least the order of hours per compound. Higher throughput assays, on the other hand, are more suited to give only an order of magnitude estimate of the equilibrium solubility, which while useful in some contexts does not yield precise or accurate values. Benchmarking computational chemistry methods is not really possible without the best available experimental data, including an understanding of the magnitude of likely experimental errors.

1.6. Errors in Experimental Solubility Data

As noted elsewhere, various kinds of errors can occur in solubility data, especially secondary data in the literature. The first kind are what have been called gross errors, essentially outright mistakes. Sometimes that might be due to a failure to appreciate the difference between the solubility of the neutral form, as required for intrinsic solubility, and that of ionized acids or bases. Another possibility is that the distinction between kinetic and equilibrium solubilities is blurred, either due to misunderstanding or to a rushed experimental time scale leading to a supersaturated value being reported. The solubility determination of indomethacin, as discussed in detail by Comer et al., exemplifies a different type of mistake, where previously researchers had failed to anticipate a chemical reaction occurring under the experimental pH conditions, leading to the solubility of the wrong compound being recorded. A further possibility is typographical errors, which can occur either through mistyping numbers by hand from a literature source, such as writing log S = −3.79 as log S = −7.39, or by accidentally mishandling electronic data.

A second class of error is the systematic error that might occur between sets of solubility determinations which may result from differences in technique or between different laboratories, even if in principle the same technique has been applied. The possibility of repeatable differences between say CheqSol and Shake Flask has been investigated to some extent by Avdeef, see Figure of ref . That analysis does not suggest that regressions between methods have gradients differing from unity by much more than 0.01 or intercepts much larger than 0.15 log S units. Distinct laboratories may also have different practices in terms of the extent to which variables such as temperature and pH are controlled and monitored. Similar considerations affect the compilation of solubility data sets and the extent to which variations in reported experimental temperatures or pH are acceptable in data that may later be used as a training or test set for solubility prediction. High-throughput solubility assays based on multiwell plates may have systematic differences first between different plates, and second between the corresponding positions, that is the same row and column numbers, on different plates. The interplate differences are likely to be a simple consequence of each plate often being loaded with a set of similar compounds from a single source, while the location-based differences may be the result of systematic biases in the dispensing or reading technologies, as suggested by Rohde in the context of the first EUOS/SLAS Joint Challenge: Compound Solubility.

Figure 3. Illustrative example of a probability distribution for a system of solute (gray particles) and solvent (blue particles) as a function of solute fraction ($\chi_i$). At the solubility limit, the solute particles will have an equal probability of being in both the solid phase (the green peak at $\chi_i = 1.0$) and the solution phase (the blue peak). The location (mole fraction) of the solution phase peak is the solubility limit. Reproduced with permission of the Royal Society of Chemistry from ref . Available under a Creative Commons Attribution 3.0 Unported License. Copyright 2018 Royal Society of Chemistry.

Third, there are random errors between repeat runs of the same experiment as conducted by the same experimenters with the same apparatus. For CheqSol, this kind of reproducibility error is asserted to be as small as ±0.05 log S units. Small random differences between runs are unavoidable, but their effects can be mitigated by running multiple repetitions, albeit with a consequent reduction in throughput.
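As a minimal sketch of why repetition helps, assume that run-to-run errors are independent and roughly normally distributed (an assumption made here for illustration, not a claim about any particular assay). Under that assumption the standard error of the mean falls as 1/√n. The function name and the four replicate values below are hypothetical.

```python
import math
import statistics

def summarize_replicates(log_s_values):
    """Return (mean, sample SD, standard error of the mean) for repeat log S runs."""
    if len(log_s_values) < 2:
        raise ValueError("need at least two replicate measurements")
    mean = statistics.mean(log_s_values)
    sd = statistics.stdev(log_s_values)        # sample standard deviation (n - 1)
    sem = sd / math.sqrt(len(log_s_values))    # shrinks as more runs are added
    return mean, sd, sem

# Hypothetical replicate log S values for a single compound
mean, sd, sem = summarize_replicates([-3.80, -3.75, -3.85, -3.80])
print(f"mean = {mean:.2f}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

Halving the standard error in this picture requires four times as many runs, which is the throughput cost referred to above.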

Additional factors may cause errors in solubility assays. A significant fraction of small organic molecules with low aqueous solubility, including some marketed drugs, can form nano- or microscale agglomerates (30 nm to 1000 nm diameter). These colloidal aggregates can skew experimental solubility results, leading to under- or overestimated solubilities. Separately, they also cause problems for some biological assays, where nonspecific binding of the aggregate to proteins can cause local denaturation and apparent inhibition. Algorithmic methods have been developed to identify self-aggregation behavior based on either QSPRs or chemical similarity to known aggregators, though the latter approach is limited by design to known chemotypes. Key descriptors in these models normally capture aspects of hydrophobicity (e.g., log P or hydrophobic surface area), hydrogen bond donor/acceptor characteristics, and molecular structure (e.g., number of sp3 centers, self-complementarity between key interaction sites), but individually few of these descriptors are specific to the aggregation process. The models that have been developed to predict self-aggregation can be used as computational filters to remove molecules that are likely to cause problems in experimental assays. When combined with computational solubility predictions, this makes it possible to prioritize for experimental analysis those molecules that have the desired solubility profile and are expected to be easier to analyze. The inclusion of “self-aggregation” descriptors in data-driven solubility models would not be expected to lead to an improvement in performance because the descriptors are not specific to the aggregation process, as noted above.

One of the most significant works discussing the size and nature of errors in solubility data is by Avdeef. He demonstrates that, by very careful and time-consuming curation of the data, common sources of error can be identified, accounted for, and in some cases corrected. This requires critical analysis and detailed understanding of the experimental procedures used, thus identifying instances where the experimental conditions such as pH, buffers, temperature, technique, purity, crystallinity of the solid and compound stability may require accounting for. By this kind of careful and thorough approach, Avdeef and colleagues were able to prepare the 100-compound ‘tight’ data set for the 2019 Solubility Challenge, with a claimed interlaboratory standard deviation as low as ± 0.17 log S units, a far cry from more commonly cited figures in the region of 0.5–0.7 log S units. This required data selection as well as data curation and correction. An accompanying 32-compound ‘loose’ data set contained compounds where the data could not be corralled into such close alignment, and had a cited interlaboratory standard deviation of ± 0.62 log S units.

Since there is some uncertainty as to how large the errors in experimental solubility data are, it is not clear exactly how much of the observed RMSE of a predictive method is due to experimental error and how much is due to the limitations of predictive models. In practice, teasing these components apart may be problematic, since experimental error impacts the prediction process twice for data-driven models: once in the training data on which the model is built, and a second time in the test data against which the predictive performance is assessed.
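One rough way to picture the test-set half of this problem is to assume that model error and experimental error are independent and add in quadrature, so that the observed RMSE squared is approximately the sum of the two squared components. This is a simplifying assumption introduced here, and it ignores the noise that enters through the training data. The function name and the observed RMSE of 1.0 log S units are hypothetical; the two noise levels are chosen to be of the order of the interlaboratory figures quoted above.

```python
import math

def model_error_estimate(observed_rmse, experimental_sd):
    """Estimate the model-only error, assuming independent errors that add in quadrature."""
    residual = observed_rmse ** 2 - experimental_sd ** 2
    if residual < 0:
        raise ValueError("observed RMSE is smaller than the assumed experimental error")
    return math.sqrt(residual)

# Hypothetical observed RMSE of 1.0 log S units, under two assumed noise levels
for sd in (0.17, 0.6):
    print(f"experimental SD {sd}: model-only error ~ {model_error_estimate(1.0, sd):.2f}")
```

The sketch shows how strongly the apparent quality of a model depends on which experimental uncertainty is assumed, which is why the size of that uncertainty matters for benchmarking.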

2. Data-Driven, Machine Learning, and AI-Based Methods

Recent advances in data-driven modeling and machine learning have had significant impacts on many fields, including chemistry. However, attempts to find quantitative relationships between chemical structure and experimental properties such as solubility have a long history, an early example being the work of Fühner from 1924 in which the solubility of a series of hydrocarbons was related to the number of methylene groups. The development of Quantitative Structure Property Relationship (QSPR) modeling in the 1960s by Hansch provided a new approach for building predictive models and, although initially used to predict biological activity, these techniques were quickly applied to the prediction of other ADMET and physicochemical properties. Many early models were constructed using multilinear regression, similar to the group contribution approach, but approaches have since been expanded to other methods, including Support Vector Machines (SVM), Random Forests (RF) and other tree-based methods, and Neural Networks (NN), benefiting from the advances in machine learning during the 1990s and 2000s.

More recently the advent of deep learning has brought with it expectations for another step-change in prediction accuracy. However, while deep learning has shown dramatic advances in many fields, including in chemistry with the AlphaFold2 model for protein structure prediction, a similarly significant improvement has not yet been demonstrated in property prediction. These techniques bring with them a number of new challenges, particularly the need for large data sets with reliable and consistently measured solubility values. Outstanding questions also remain on the best way to represent molecules within deep learning, with competing descriptor, graph and text based approaches, and use of 2D or 3D information. The more “black-box” nature of deep learning also makes understanding the reasons behind particular predictions significantly more challenging.

A useful measure of the recent progress in data-driven solubility prediction can be made by considering the solubility challenges, the first posed in 2008 and the second released 10 years later. The first Solubility Challenge was proposed to address the large numbers of prediction models published and to try to evaluate the different approaches; however, the results proved inconclusive. In the intervening years, debate has focused on whether the limiting factor is the accuracy and consistency of experimental solubility data or limitations in the methods and algorithms used. The intervening decade has also seen the wider adoption of machine learning in property prediction and the introduction of deep learning techniques to chemistry, and to solubility prediction specifically. By the second challenge, in 2018, all submissions used QSPR or Machine Learning (ML) based models.

2.1. Group Contribution and QSPR Approaches

A conceptually simple way to predict various properties from molecular structure is the Group Contribution (GC) approach, in which the property of interest is predicted based on the presence and counts of certain molecular substructures or functional groups. In its simplest form, the GC approach uses a weighted sum over groups and its success relies on the additivity of the property being predicted. An early application to solubility by Klopman et al. built on earlier work to predict log P, resulting in an RMSD of 1.25. This work was further developed using an expanded fragment set, reducing the RMSD to 0.84 on a larger 120 compound test set. Hou et al. found that an atom-based, rather than fragment-based, approach gave further improvement, with an RMSD of 0.79 on the same 120 compound test set, with the added advantage that atom-based approaches are not necessarily limited to a specific region of fragment space.
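The weighted sum at the heart of the simplest GC models can be sketched as follows. The group names, contribution values and intercept are invented purely for illustration and are not taken from any of the published models; real schemes use fitted parameters and a carefully defined fragment set.

```python
def group_contribution_log_s(group_counts, contributions, intercept=0.0):
    """Predict log S as an intercept plus a weighted sum over group counts."""
    unknown = sorted(set(group_counts) - set(contributions))
    if unknown:
        # A fragment with no fitted contribution cannot be scored by the model
        raise KeyError(f"no contribution available for groups: {unknown}")
    return intercept + sum(contributions[g] * n for g, n in group_counts.items())

# Hypothetical contributions (log S units per group) and intercept
contributions = {"CH3": -0.5, "CH2": -0.4, "OH": 1.2}
butanol_like = {"CH3": 1, "CH2": 3, "OH": 1}

print(round(group_contribution_log_s(butanol_like, contributions, intercept=0.3), 2))
```

Raising an error for an unparameterized group is a design choice in this sketch; it makes explicit that a purely additive model has nothing to say about fragments absent from its training set.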

Group contribution based methods can work well within a series of related molecules or a limited region of chemical space, but suffer limitations when applied to molecules dissimilar to the training set, especially if new fragments are present. Multilevel GC methods go some way to accommodating this and also capturing collective or proximity effects. By including second and third degree corrections, Marrero et al. were able to achieve an accuracy of 0.55 log units.

While the GC approach is relatively successful in capturing additive behavior in the liquid phase, both for solubility and other related properties, such as log P, it is less able to capture the complexity, nonadditive behavior and long-range interactions associated with the crystal phase, a considerable limitation for solid solubility prediction. GC solubility models generally perform better on liquids than solids and additional descriptors or experimental data such as melting points are often needed to improve the treatment of the solid phase.

GC methods are also used to model components in other solubility prediction approaches. For example, GC based log P prediction models are important chemical descriptors for QSPR models and have been used as a substitute for experimental data in semiempirical methods such as the General Solubility Equation.

QSPR modeling can be considered as an expansion on the group contribution approach. The simplest models are based on multilinear regression, mirroring group contribution models, but with variables which indicate the presence of specific functional groups replaced with a broader set of descriptors or molecular fingerprints. Thousands of different descriptors exist, with many freely available in open-source chemoinformatics toolkits. Descriptors include familiar molecular properties such as molecular weight, charge and numbers of hydrogen donors and acceptors, but also values calculated from the topology and connectivity of the molecule to capture the 2D structure. In some cases descriptors are also calculated from 3D conformations.

The ESOL method developed by Delaney uses 9 descriptors derived from chemical formula or 2D chemical structure and fit to 2874 experimental intrinsic solubility values. The model achieves a test set RMSD of 1.01, which represents a similar performance to the GSE, but unlike the GSE, ESOL contains no experimental parameters. Huuskonen, on the other hand, used a set of more abstract descriptors including E-state indices to fit a NN with a single hidden layer. The model performed well on a test set spanning the same chemical space as the training set, with an RMSD of 0.6. Both of these models and the associated data sets have become important benchmarks and are often used to assess the performance of new approaches.

Descriptors can also be developed from separate computationally calculated chemical properties and used as input to a QSPR model to introduce additional physics-based information, such as through the use of molecular simulation. Jorgensen and Duffy used Monte Carlo simulations to calculate interaction energies and molecular surface areas, averaged over 3D conformations. The resulting model contained only 5 descriptors in total, but gave an RMSD of 0.55, comparable to or better than many GC based approaches. Although supplementing QSPR models with data from alternative computational approaches is attractive, the disadvantage of these more complex descriptors is the additional steps and time needed when making predictions on new compounds.

2.2. Deep Learning

In the last 10 to 15 years, the accumulation and availability of huge data sets alongside increases in hardware performance, specifically the use of GPUs, has initiated the field of deep learning. The importance of this field was recognized with the 2018 Turing Award, and in many areas deep learning has led to a significant jump forward in predictive power and accuracy, with landmark achievements made in areas including image recognition (AlexNet) and protein structure prediction. Deep learning shows huge promise and potential and is being applied to a range of prediction tasks within chemistry. However, outstanding questions remain, such as how to represent molecules and which model architecture is optimal.

2.2.1. Deep NNs

Deep learning methods are based on neural network models. However, the first uses of neural networks for solubility prediction predate the recent explosion in interest. In 1991, Bodor et al. compared the performance of a NN and simple regression models, finding that the neural network based model outperformed the simple model even on a small data set of 300 compounds. A more recent study from Boobier et al. also found that a NN with a single hidden layer outperformed other models on a 100 molecule data set. However, whereas Bodor’s model contains 18 neurons, typical deep learning models contain thousands or millions of fitted parameters and often multiple hidden layers, thus requiring much larger training data sets.

A number of deep NNs trained on molecular descriptors or fingerprints, similar to QSPR models, have been developed. Conn et al. looked at the effect that training data set size and different descriptor sets have on model performance and highlighted the importance of not just the size of the training data set, but also the chemical space represented. Cui et al. investigated the impact of network architecture on performance and specifically the network depth. Using a ResNet with convolutional layers trained on PubChem molecular fingerprints, they found that very deep networks with 20 or 26 hidden layers were the best performing on their test data.

2.2.2. Graph-Based NNs

Deep learning has also led to new architectures and new approaches allowing learning directly from molecular structures. Graph based NNs have shown promise in chemical applications and intuitively appear well suited to handling 2D molecular structures, removing the need to first represent the molecule numerically using precomputed descriptors. Graph based NNs for chemistry differ from the Graph Convolutional NN (GCNN) in some other fields, such as spectral GCNNs, as they must be able to accept graphs of various sizes and structures. All spatial graph based NNs use the same basic operations and are related to general message passing NNs. At each layer, information at each node is shared with the neighboring nodes, allowing the influence of each atom in the molecule to gradually spread across the molecule.
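The neighbor-sharing operation described above can be illustrated with a deliberately stripped-down sketch. It uses plain sum aggregation with no learned weights or nonlinearities, both of which a real message passing network would include, and the function names and one-hot atom features are hypothetical choices made for this example.

```python
def message_passing_step(features, neighbors):
    """One round of sum aggregation: each atom adds its neighbors' vectors to its own."""
    updated = []
    for atom, own in enumerate(features):
        total = list(own)
        for nb in neighbors[atom]:
            total = [t + f for t, f in zip(total, features[nb])]
        updated.append(total)
    return updated

def readout(features):
    """Sum atom vectors into one fixed-length molecular vector, whatever the atom count."""
    return [sum(column) for column in zip(*features)]

# Heavy atoms of ethanol as a graph C0-C1-O2; features are one-hot [is_C, is_O]
features = [[1, 0], [1, 0], [0, 1]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}

for _ in range(2):
    features = message_passing_step(features, neighbors)

print(features)            # atom 0 now carries an oxygen component from two bonds away
print(readout(features))   # fixed-length vector that could feed a regression layer
```

After two rounds the terminal carbon has picked up information about the oxygen two bonds away, and the summed readout has the same length for any molecule, which is what allows graphs of different sizes to be handled by one model.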

Graph based networks were used in one of the early applications of deep learning to aqueous solubility prediction by Lusci et al. Molecules were represented as a set of directed graphs, in which for each atom in a molecule, there is a separate graph with each bond in the graph pointing toward that one atom. Atoms are represented as feature vectors and information from each atom is mixed with neighboring nodes at each level of the network following the direction of the bonds. The inclusion of additional, physics-based information in the form of a log P descriptor was found to have no impact on overall performance for most models. This method was tested on a number of common benchmark data sets, with an average accuracy of 0.58 reported on the ESOL data set and an RMSD of 0.90 on the first Solubility Challenge.

Duvenaud et al. took a different approach in defining their GCNN algorithm. Molecules were represented as undirected graphs and the graph convolutional layers were used to generate a learnt molecular fingerprint. The model showed comparable performance to Lusci’s model, with an average error of 0.52 when tested on the ESOL data set, and outperformed other NN models built using precomputed fingerprints, highlighting the advantages of learning directly from the molecular structure. By tracing particular elements in the fingerprint back to the molecular substructures present in the molecule, some information about how the model makes decisions could also be obtained.

The ability of GCNNs to learn molecular representations was further investigated by Yang et al. in relation to a range of physicochemical property prediction tasks, resulting in another variant of the graph based approach, D-MPNN, in which information is placed on bonds rather than on atoms. Tests on the ESOL data set showed that this bond-centered approach outperformed an equivalent atom-centered network and also NNs trained on precomputed descriptors or fingerprints. Ryu et al. developed Bayesian GNNs for solubility prediction, with their test set encompassing 20% of the approximately 10000 compounds in the data sets. Despite the promise they saw in their results, as yet the work appears to have been published only as a preprint.

Wiercioch and Kirchmair utilized a deep transformer model with a graph-based molecular representation to predict solubility for a set of approximately 130 compounds. Their study offers a valuable insight into the limitations that training set size places on such larger ML models. Even though they used transfer learning on a set of 6000 compounds with pKa values to pretrain, their results still showed substantial dependence on data set size. When given over 1000 solubility values to train on, the transformer clearly outperformed a Random Forest benchmark. However, upon repeating that comparison with a reduced training set of 100 compounds, the transformer’s predictivity fell behind that of the Random Forest. As to molecular representations, Zheng et al. found that a graph-based approach required plentiful high-quality data to outperform traditional cheminformatics descriptors. These findings demonstrate the critical need for larger sets of verified high-quality solubility data, which will be essential to these larger models fulfilling their promise.

2.2.3. Other NN Models

In chemoinformatics, molecules are often represented as text strings using the SMILES string format, or more recently SELFIES. This opens up the possibility of applying NN models designed for learning from text, such as transformers, to molecular property prediction. Francoeur and Koes developed SolTranNet for solubility prediction from SMILES strings based on a molecule attention transformer architecture. As a regression model, SolTranNet was found to perform worse than other deep learning approaches on the solubility challenge data set; however, it showed good accuracy at classifying insoluble compounds.

With the diversity of deep learning approaches and architectures available, it is not yet clear if particular architectures are more suited to deep learning, and various comparison studies have been performed. Panapitiya et al. recently performed a thorough comparison of a range of different deep learning architectures. In contrast to earlier studies on GCNNs, they found that a NN trained on precomputed descriptors outperformed methods which learn directly from the chemical structure. Of these methods, though, the graph based approaches were the most promising. Further investigations using consistent training and test sets are needed to determine more conclusively the most promising approaches for predicting physicochemical properties, and specifically solubility.

2.2.4. Large Language Models

The rapid development of large language models (LLMs) in recent years has revolutionized how we access, interact with, and verify information. Those in the chemical sciences have already felt the far-reaching consequences, from integration into educational settings to use in theoretical chemistry. While still rapidly evolving, LLMs present a promising and intuitive method of solubility prediction. Since molecular property prediction is often limited by the expensive and time-consuming process of labeling data, LLM-based approaches typically employ zero-shot or few-shot scenarios where labeled data for use in learning is either limited or absent. There have been some interesting recent developments in chemistry-trained LLMs, for example in the QSAR/QSPR field. Zheng et al. recently presented a predictive LLM pretrained on the academic literature, which they tested over a wide range of scientific problems, including both solubility and hydration free energy prediction, with very promising results. It will be interesting to observe future developments of this and similar models.

In 2023, Seidl et al. presented CLAMP (Contrastive Language-Assay-Molecule Pretraining), a new contrastive learning method which uses both textual and chemical data as input for activity prediction and which can adapt to new prediction tasks by ‘understanding’ the natural language information used to describe the task. CLAMP’s innovative modular architecture encodes language data and structure data separately before embedding the two data sources in a joint module. Seidl et al. evaluated their model’s effectiveness in three tests. The first was a zero-shot transfer learning exercise to investigate whether the model could gain knowledge from textual data alone, with no chemical structure information; while Scientific Language Models (SLM) have shown ability to perform such tasks, previous prediction models fail. The CLAMP model was found to outperform all tested comparator methods, including SLM models, except for the frequency hitter model developed by Schimunek et al., which does not make use of textual input but instead enriches the representation of the test molecule using information about similar molecules. The second test, representation learning, checked whether the molecular representations learned by CLAMP were transferable across data sets. It was found that CLAMP performed better than other models on five of the eight data sets tested, and for those where it did not perform best, results were within one standard deviation of those of the best-performing model. The third test used CLAMP as a retrieval tool, enabling users to search a chemical database and find molecules ranked in priority for potential wet lab applications based on a given bioassay query. In this task, CLAMP outperformed the previously best-performing model, KV-PLM, by a multiple of 50 in its ability to rank highly those molecules which were active toward the query bioassay. The model’s accuracy in performing this task was based on the enrichment factor, which measures the accuracy of the top n results in a retrieval task. Given the success of these test results, CLAMP represents a foundational use of LLMs in molecular property prediction.
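For readers unfamiliar with the metric, one common definition of the enrichment factor is the hit rate among the top n ranked molecules divided by the hit rate in the whole database. The sketch below uses that definition; whether it matches the exact variant used by Seidl et al. is an assumption, and the function name and example ranking are hypothetical.

```python
def enrichment_factor(ranked_labels, top_n):
    """Hit rate in the top_n ranked items divided by the hit rate over the whole list.

    ranked_labels: 1 (active) or 0 (inactive), ordered from best to worst score.
    """
    total = len(ranked_labels)
    actives = sum(ranked_labels)
    if not 0 < top_n <= total:
        raise ValueError("top_n must be between 1 and the number of ranked items")
    if actives == 0:
        raise ValueError("enrichment is undefined when there are no actives")
    hit_rate_top = sum(ranked_labels[:top_n]) / top_n
    hit_rate_all = actives / total
    return hit_rate_top / hit_rate_all

# Hypothetical ranking of ten molecules, three of which are active
ranking = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(round(enrichment_factor(ranking, top_n=2), 2))
```

A value of 1 corresponds to random ranking under this definition, so values well above 1 indicate that actives are concentrated at the top of the list.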

In 2024, Zhao et al. presented GIMLET (Graph Instruction based MolecuLe zEro-shoT learning), another tool for molecular property prediction which made use of a large language model-based approach. With their work, Zhao et al. sought to overcome two key issues with the CLAMP approach. The first was that the graph neural network on which CLAMP relies has only a limited ability to carry structural information. The second was caused by the inclusion of CLAMP’s additional joint embedding module. This additional module is difficult to train as “deep transformers have vanishing gradients in early layers”, which means that the gradients become so small that the network is unable to learn from the data in the additional module. The module also incurs further cost and the need for additional parameters. To circumvent these issues, GIMLET unified graphical and textual data, encoding them without the need for an additional embedding module. The GIMLET method gave promising results in zero-shot scenarios and was applied to both classification and regression problems. While LLM-based approaches tend to struggle with regression problems due to difficulty in formatting numerical outputs, GIMLET generated correctly formatted numerical answers in 98% of its regression tasks. The GIMLET method was used for zero-shot aqueous solubility prediction for a set of molecules, using the following textual instruction:

“Solubility (logS) can be approximated by negative LogP −0.01 * (MPt - 25) + 0.5. Can you approximate the logS of this molecule by its negative logP and MPt?”

GIMLET therefore used textual input to provide the model with the General Solubility Equation. Zhao et al. noted a “strong correlation” between predicted and experimental results for the zero-shot prediction, with an RMSE of 1.132, compared to 1.331 for a supervised graph convolution network and 1.253 for a supervised graph attention network approach. GIMLET was outperformed by the supervised Graphormer method, with an RMSE value of 0.901, or of 0.804 when using a pretrained data set. However, when GIMLET was used to make few-shot solubility predictions, results outperformed the supervised graph isomorphism network (GIN). The GIMLET model represents a significant step forward in solubility prediction, offering results which outperform several other supervised methods, and allowing for straightforward user input in natural language.

However, Liu et al. later noted that CLAMP and GIMLET lacked the extensive generalization that LLMs, such as the GPT models, possess when used in natural language processing. Liu et al. therefore presented MolecularGPT, an LLM which can be generalized to a variety of molecular property prediction tasks in few-shot and zero-shot scenarios. Their work sought to unify data corresponding to molecules of different sizes, densities and chemical space into a single consistent format. MolecularGPT uses SMILES codes as a method of converting graph information to string information. For predicting ESOL water solubility on the same data set as was used to test GIMLET, MolecularGPT predictions gave an RMSE of 1.471, higher than the 1.132 of GIMLET. However, MolecularGPT was more consistent in its answers than GIMLET when the type of instruction was changed; MolecularGPT’s predictions had a standard deviation of 0.007 with respect to its RMSE across five different instruction types, while GIMLET had a standard deviation of 0.020.

Both MolecularGPT and GIMLET represent the cutting edge of natural language processing in the prediction of solubility, as well as having the power of generalized molecular property prediction.

2.3. Empirical “Physics-Inspired” Models

Alongside empirical, data-driven methods for solubility prediction, there are also a number of semiempirical methods which use theoretical, physics-based arguments to construct simple relations between solubility and other calculated or experimentally determined parameters.

2.3.1. General Solubility Equation

In 1980, Yalkowsky and Valvani proposed the first version of the General Solubility Equation (GSE). Subsequently revised by Jain and Yalkowsky, this equation relates solubility, log S, to the octanol–water partition coefficient, log P, and melting point (MP), $T_m$:

$$\log S = 0.5 - \log P - 0.01\,(T_m - 25) \qquad (12)$$

where temperature is measured in degrees Celsius. For liquid solubility, $T_m$ is set to 25 °C so that the MP term is zero. The equation describes the intrinsic solubility of organic compounds, although modifications to extend it to weak electrolytes have also been described.
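As a worked illustration, the GSE in the form log S = 0.5 − log P − 0.01 (T_m − 25) can be written as a short function. The function name is hypothetical, and treating any melting point below 25 °C as a liquid (so that the MP term vanishes) is how the liquid convention described above is interpreted in this sketch.

```python
def gse_log_s(log_p, melting_point_c=25.0):
    """General Solubility Equation: log S = 0.5 - log P - 0.01 * (Tm - 25), Tm in deg C."""
    # Liquids at 25 deg C: clamp Tm to 25 so that the melting point term is zero
    mp_term = max(melting_point_c, 25.0) - 25.0
    return 0.5 - log_p - 0.01 * mp_term

# Hypothetical solid with log P = 2.0 melting at 125 deg C, and a liquid with the same log P
print(gse_log_s(2.0, 125.0))
print(gse_log_s(2.0))
```

In this example the 100 °C of melting point above ambient costs one log unit of solubility, the same penalty as one additional unit of log P, which gives a feel for the relative weighting of the two terms.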

The form of the GSE can be understood by considering the fusion cycle. The melting point term is associated with the free energy of fusion, which must be overcome to melt the solid and produce a supercooled liquid solute phase. This represents the ideal solubility, in which the enthalpy of solute–solvent mixing is zero. The log P term accounts for the solvation of the solute as the supercooled melt mixes with the aqueous phase. log P has been shown to correlate approximately linearly with solubility, a relationship which also explains the prevalence of log P descriptors in QSPR models for solubility prediction. The GSE contains no trainable parameters and the values of the coefficients and intercept term have not been obtained by fitting to experimental data. However, fitting equations of the same general form to experimental solubility data sets has resulted in similar parameters for the coefficients and intercept and only slight improvements in prediction accuracy.

Solubility predictions using the GSE can be made with experimental log P and MP data if available, but calculated log P descriptors, such as the atom contribution approach of Wildman and Crippen, can also be used without too great a loss of accuracy (for example, the RMSD increased from 0.52 to 0.62 on the Jorgensen and Duffy data set). Replacing the experimentally determined MP with a predicted MP value is more difficult due to the limited performance of current MP prediction models, with RMSDs of around 31–40 °C. However, McDonagh et al. demonstrated that useful qualitative predictions can be made using a RF model to predict the MP, especially considering the relatively low weighting of the MP term in the GSE.

Removing the reliance on experimental data enables the use of the GSE to be extended to unsynthesised compounds. Taking inspiration from the form of the GSE, Hill and Young proposed the Solubility Forecast Index (SFI), defined as $\mathrm{SFI} = \mathrm{clog}\,D_{\mathrm{pH\,7.4}} + N_{\mathrm{aromatic\ rings}}$, to estimate the level of solubility for drug-like compounds, with the general guideline that compounds with an SFI < 5 are likely to have reasonable solubility and other physicochemical properties.
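Because the SFI is a simple sum, it lends itself to use as a quick filter. The sketch below assumes that a calculated log D at pH 7.4 and an aromatic ring count are already available from some other tool; the function names and the two example compounds are hypothetical.

```python
def solubility_forecast_index(clogd_ph74, n_aromatic_rings):
    """SFI = calculated log D at pH 7.4 plus the number of aromatic rings."""
    return clogd_ph74 + n_aromatic_rings

def passes_sfi_guideline(sfi, threshold=5.0):
    """Apply the SFI < 5 guideline; the threshold is exposed as a parameter for convenience."""
    return sfi < threshold

# Hypothetical compounds given as (clogD at pH 7.4, aromatic ring count)
for clogd, rings in [(2.5, 2), (3.5, 3)]:
    sfi = solubility_forecast_index(clogd, rings)
    print(sfi, passes_sfi_guideline(sfi))
```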

Jain and Yalkowsky also developed an alternative model to the GSE called SCRATCH, which uses no experimental parameters. Instead SCRATCH combines the aqueous activity coefficient, calculated from the AQUAFAC group contribution approach, with the predicted MP, based on the enthalpy and entropy of melting, calculated using a group contribution approach and descriptor based equation respectively. This showed only slightly reduced performance compared to the GSE.

The GSE has been tested on a number of common benchmark test sets, with RMSDs of 1.23 (first solubility challenge), 1.10 and 1.24 (second solubility challenge tight and loose test sets respectively), using experimental MPs. The GSE works well on small compounds, but is less accurate on the larger compounds found in pharmaceuticals or pesticides. This is in part due to the assumption of Walden’s rule, which states that the entropy of melting is constant, and which breaks down for larger and more flexible compounds.

To address this limitation, Avdeef and Kansy extended the GSE by including two additional parameters to account for molecular flexibility and hydrogen bonding, which modify the log P and MP term coefficients and intercept in the original GSE. The flexibility was accounted for using the Kier index, following the suggestion from Caron et al., while hydrogen bonding was included using the Abraham H-bond acceptor parameter, B. Unlike the original GSE, the contributions of these additional terms were fit to experimental data using a large data set of intrinsic solubility data. Compared to the original GSE, the resulting “Flexible Acceptor” GSE showed considerable improvement on larger, more flexible compounds, specifically “beyond rule of 5” (bRo5) compounds which break Lipinski’s rule of 5 (Ro5), with the RMSD reduced from 3.0 to 1.1, without sacrificing much of the simplicity of the original GSE. A subsequent study found that this “Flexible Acceptor” GSE also outperformed the original GSE and an RF model on a test set comprising newly approved drug compounds covering both Ro5 and bRo5 chemical space.

2.3.2. Hansen Solubility Parameters

The GSE is limited to aqueous solubilities, due to the use of the octanol–water partition coefficient to account for the solute–solvent interactions. However, other approaches have been developed to estimate solubility in different solvents.

In 1950, Hildebrand and Scott introduced the solubility parameter, $\delta^2$, defined as the interaction energy density for a given compound, with the idea that solutes and solvents with similar values of $\delta^2$ should be mutually favorable for solubility. The rationale was that molecules will be soluble if the strength of solvent–solvent and solute–solute interactions is similar to the strength of solvent–solute interactions, so that the enthalpic penalty to mixing is minimized. This idea is often expressed by the rule of thumb that “like dissolves like”. The approach was developed further by Hansen, by splitting the total energy density into three Hansen Solubility Parameters (HSP) to separately describe the contributions of the dispersion, $\delta_D^2$, polarity, $\delta_P^2$, and hydrogen bonding, $\delta_H^2$, to the overall energy density, where $\delta^2 = \delta_D^2 + \delta_P^2 + \delta_H^2$.

The HSP define the coordinate axes of a 3D space in which solvents and solutes can be arranged. The distance between solute i, and solvent j, within this HSP or Hansen space is defined as

$$R_a = \sqrt{4(\delta_{D,i} - \delta_{D,j})^2 + (\delta_{P,i} - \delta_{P,j})^2 + (\delta_{H,i} - \delta_{H,j})^2}$$

(note that the dispersion term is scaled by a factor of 4). Solutes are soluble in solvents which are close in HSP space, and the boundary between soluble and insoluble mixtures is represented by the Hansen sphere, which has radius $R_o$. The location of solvents relative to the edge of the Hansen sphere can be described by the Relative Energy Difference (RED), RED = $R_a/R_o$, where RED < 1 indicates a solvent is inside the Hansen sphere and the solute is likely to be soluble.
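The distance and RED definitions above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the numerical parameter values are hypothetical and chosen to make the arithmetic easy to follow, not taken from any published HSP table.

```python
import math

def hansen_distance(solute, solvent):
    """Return R_a between two (delta_D, delta_P, delta_H) triples (same units)."""
    d_d = solute[0] - solvent[0]
    d_p = solute[1] - solvent[1]
    d_h = solute[2] - solvent[2]
    # The dispersion term carries the factor of 4 noted in the text.
    return math.sqrt(4.0 * d_d**2 + d_p**2 + d_h**2)

def relative_energy_difference(solute, solvent, r_o):
    """RED = R_a / R_o; values below 1 place the solvent inside the Hansen sphere."""
    return hansen_distance(solute, solvent) / r_o

# Hypothetical HSP triples and sphere radius (illustrative values only).
solute = (18.0, 10.0, 8.0)
solvent = (17.0, 7.0, 4.0)
r_a = hansen_distance(solute, solvent)  # sqrt(4*1 + 9 + 16) = sqrt(29)
red = relative_energy_difference(solute, solvent, r_o=8.0)
print(f"R_a = {r_a:.3f}, RED = {red:.3f}")
```

With these made-up inputs the solvent falls inside the sphere (RED < 1), so the solute would be classed as likely soluble.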

Originally, the HSP approach was applied to find suitable solvents for polymers; however, it has been extended to look at solubility in a much wider range of systems, including ionic liquids and lipid bilayers for drug delivery. Interest in HSP methods has also increased with the growing focus on green chemistry and in particular the search for alternative solvents with a lower environmental impact. Modifications to the form of the HSP parameters have also been suggested to better capture the behavior of small molecules and to extrapolate results to different temperatures.

HSP values for new solutes can be found from experiment; however, a variety of methods have also been used to predict HSP, many based on group contribution approaches. For example, Stefanis et al. used a combination of UNIFAC functional groups, supplemented with second-order groups to account for the effect of conjugation. Alternatively, Mathieu suggested a three-parameter equation which takes the molar volume and a GC-based estimate of the molar refractivity, and is fit to experimental $\delta_D$ data. This was combined with a fragment-based model to predict $\delta_H$ and $\delta_P$, based on a more limited subset of fragments which significantly contribute to molecular polarity and hydrogen bonding. The use of the molar refractivity proved to be an improvement for $\delta_D$ prediction; however, the more extensive training set of Stefanis et al. gave better predictions for $\delta_P$.

HSP values have also been calculated using information from other physics-based methods. Jarvas et al. used σ-moments calculated from COSMO-RS as input to a NN with a single hidden layer. They found that they were able to predict HSP for a diverse set of compounds including salts and ionic liquids and that a NN was an appropriate model to capture the nonlinear relationship between σ-moments and HSP. More recently, Sanchez-Lengeling et al. have developed a workflow which combines various techniques including simulation, DFT and COSMO-RS calculations to obtain descriptors to train a Gaussian Process model. This model, named gpHSP, provides both an estimate of the three HSP parameters and the associated prediction uncertainties and has a predictive performance approaching the experimental error in measured HSP values.

3. Physics-Based Methods

Physics-based methods of solubility prediction are largely based on first principles, so they do not require parametrization against experimental solubility data in the way that data-driven methods do, and are capable of providing chemical insight. Over the last few decades, many different physics-based methods for solubility prediction have been developed. These methods fall broadly into two groups, which will be reviewed in detail: (i) direct methods, which compute solubility directly from simulation; and (ii) indirect methods, which calculate thermodynamic parameters that are then combined to obtain solubility, often based on a thermodynamic cycle.

3.1. Direct Calculation of Solubility

Direct solubility calculations rely on simulating the system of interest. Several distinct methodologies exist, including the direct coexistence method, in which solid-phase and solution-phase solute molecules are equilibrated in a molecular dynamics simulation; the chemical potentials route, which leverages the equivalence of the solid-phase and solution-phase chemical potentials at the solubility limit; the density of states approach, which locates the point of phase coexistence from the system’s density of states; and the Einstein crystal method, which uses thermodynamic integration from a hypothetical crystal. These methods have been used to predict aqueous and nonaqueous solubilities of simple salts and small organic molecules. However, they are often limited by high computational cost and the ability of the chosen force field to represent a range of solute molecules in both solid and solution phases in a single simulation. Therefore, the following discussion tracks efforts to improve computational efficiency and provide accurate, generalizable results.

3.1.1. Direct Coexistence

Commonly known as the “brute force” approach, direct coexistence uses molecular dynamics simulations to simulate two adjacent periodic cells, one populated with the solid-phase solute, the other with the solution-phase solute in a selected solvent, at a particular temperature and pressure. Simultaneous dissolution and crystallization processes are then simulated until the overall system reaches equilibrium. The solute’s equilibrium concentration, which represents its solubility limit under the given conditions, is then determined by simply counting the number of solute and solvent molecules in the solution-phase cell. This method benefits from theoretical simplicity, with a relatively straightforward setup and execution. However, the direct coexistence method is limited by high computational cost, which has two root causes.
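The counting step described above amounts to a unit conversion from molecule counts to molality. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the solvent is taken to be water, and the counts are invented rather than drawn from any of the studies discussed here.

```python
WATER_MOLAR_MASS_KG = 0.018015  # kg per mole of water

def molality_from_counts(n_solute, n_solvent,
                         solvent_molar_mass_kg=WATER_MOLAR_MASS_KG):
    """Convert counts in the solution-phase cell to mol of solute per kg of solvent.

    Avogadro's number cancels, so only the count ratio and the solvent
    molar mass are needed. For a salt, n_solute is the number of formula units.
    """
    return n_solute / (n_solvent * solvent_molar_mass_kg)

# Hypothetical equilibrated cell: 100 dissolved formula units in 1000 waters.
molality = molality_from_counts(100, 1000)
print(f"{molality:.2f} mol/kg")
```

Because the result is a ratio of two counted integers, its resolution is set by the number of solvent molecules in the cell, which is one way to see why small systems limit precision.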

The first is that, in order for the solute and solvent molecules to be counted, the molecular dynamics simulation requires an explicit solvation model. This results in high computational costs meaning that, in practice, the molecular dynamics simulations are limited to small systems. This hinders the precision of the method’s predictions, since systems with larger atom counts allow for more precise calculation of the equilibrium concentration. Furthermore, the need for an explicit solvation model is an intrinsic limitation of the method, since less costly representations of the solvent such as a polarizable continuum model, in which the solvent is represented by a single, homogeneous, polarizable field, would not allow for atom counting. The method does, however, benefit from phenomena such as solvation shells being accounted for by the explicit solvation model. The second root cause of the high computational cost is the time taken for the system to equilibrate, which in some cases can be up to several microseconds. Since molecular dynamics simulations typically employ timesteps on the order of femtoseconds, on the order of a billion simulation steps are required for the system to equilibrate.
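The step count quoted above follows from dividing the equilibration time by the timestep. As a worked illustration, assuming round values of 1 μs and 1 fs (both chosen here for convenience within the ranges stated in the text):

```latex
N_{\mathrm{steps}} = \frac{t_{\mathrm{eq}}}{\Delta t}
  \approx \frac{1\ \mu\mathrm{s}}{1\ \mathrm{fs}}
  = \frac{10^{-6}\ \mathrm{s}}{10^{-15}\ \mathrm{s}}
  = 10^{9}
```

Equilibration over several microseconds, or a somewhat longer timestep, changes the prefactor but not the order of magnitude.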

Another limitation of the direct coexistence method is that in order to simulate the dissolution process of a query compound, the crystal structure of the compound must be known. Later in this review, we discuss the feasibility of instead basing the calculation on a model structure derived from Crystal Structure Prediction (CSP). However, the method has so far been applied primarily to simple alkali halides, the structures of which are well documented.

Manzanilla-Granados et al. used the direct coexistence method to predict aqueous sodium chloride solubility, using the JC-TIP4P-Ew models for water, and Joung and Cheatham’s parameters for the sodium chloride species. The group sought to compare four distinct initial systems within the simulation and monitor the impact on computational cost and prediction accuracy. The first system comprised a sodium chloride solution between two solid-phase sodium chloride plates; the second an aqueous sodium chloride solution with ions distributed randomly, and no solid-phase sodium chloride; the third a block of solid-phase sodium chloride submerged in pure water; and the fourth a block of solid-phase sodium chloride submerged in a supersaturated aqueous sodium chloride solution. It was found that the solubility prediction, which averaged 4.2(3) mol kg⁻¹ across the four setups, did not significantly change between setups, within simulation error. It was, however, observed that the fourth setup equilibrated much faster than the other three setups, affording a more tractable computational cost without hindering the prediction accuracy. However, it is not clear if these results are transferable to other systems. The group also tested the first and fourth setups using the JC-SPC/E water model rather than the JC-TIP4P-Ew model, which gave an average prediction of 5.9(3) mol kg⁻¹, highlighting the force field-dependence of solubility predictions. Both of these predictions are in reasonable agreement with an experimental value of 6.15 mol kg⁻¹ (298.15 K, 1 bar).

Later, Kolafa applied the polarizable AH/BK3 model for both the ions and the water molecules, to predict the aqueous solubility of sodium chloride. The AH/BK3 model represents ionic charges using Gaussian charge distributions which are connected to their gas-phase positions by harmonic springs. The mechanical equilibrium between the electrostatic forces and the spring forces determines the polarizability of the ion. The AH/BK ion model also features exponential repulsion and $r^{-6}$ attraction functions (where $r$ is the interatomic/interionic distance). The ion and water force fields are ‘transferable’, meaning that the potential of ion–ion, ion–dipole and dipole–dipole interactions can be described as a simple combination of the two interacting potentials. Kolafa found that this method produced a reliable, albeit inaccurate, result of 0.56(15) mol kg⁻¹, which is a less accurate prediction than those given by Manzanilla-Granados et al. In the same paper, however, Kolafa modified the AH/BK3 force field to create the MAH/BK3 model, retaining some key features such as the Gaussian charge distribution, the charge-on-spring polarizability, and the repulsion and attraction functions, but refitting the alkali halide crystal’s thermomechanical parameters. This refitting rectified an issue of the simulated sodium chloride crystal being too stable, which had resulted in the underestimation of the crystal’s solubility. The MAH/BK3 model gave a more accurate solubility prediction of 3.7(3) mol kg⁻¹. Kolafa also tested the JC-SPC/E ion model, previously used by Manzanilla-Granados et al., which yielded a prediction of 3.6(3) mol kg⁻¹, in reasonable agreement with that of the MAH/BK3 model. Unlike the MAH/BK3 and AH/BK3 models, the JC-SPC/E model is a nonpolarizable force field which represents ions using elementary point charges rather than Gaussian charges.

Perhaps the most accurate results using the direct coexistence method come from Yagasaki et al., who paired alkali halide models with a variety of different water models and compared the results. The group developed Lennard-Jones parameters for sodium (Na⁺), potassium (K⁺) and chloride (Cl⁻) ions and predicted the aqueous solubility of sodium chloride and potassium chloride using three water models: SPC/E, TIP3P and TIP4P/2005. The sodium chloride model performed best when paired with the TIP3P model, though only by a small margin, giving a prediction of 6.1(3) mol kg⁻¹. The potassium chloride model gave a consistent result of about 4.7 mol kg⁻¹ with all three water models, which agrees with the experimental value of 4.76 mol kg⁻¹. Despite yielding reasonably accurate results, the Yagasaki et al. study is limited by several factors. Since the simulations were conducted at ambient conditions, it is unclear if the models can capture the temperature dependence of solubility. Further, it is unclear if the Lennard-Jones parameters developed for Na⁺, K⁺ and Cl⁻ ions are transferable to other alkali metal and halide ions, providing opportunities for further research.

The direct coexistence method offers a theoretically straightforward route to solubility prediction. Its notoriously long equilibration times have been tackled by Manzanilla-Granados et al.; however, it is unclear whether their results are generalizable to other systems. While recent studies have yielded promising results, these results were obtained from specially developed Lennard-Jones parameters which may not transfer to other alkali halides. Perhaps most importantly, studies using the direct coexistence approach have been limited to simple alkali halides. To predict the solubility of organic molecules, more sophisticated approaches are required.

3.1.2. Chemical Potentials Route

The chemical potentials route (CPR) is a method of solubility prediction which leverages the fact that, under given temperature and pressure conditions, the solubility limit occurs when the chemical potentials of the solid-phase solute and solution-phase solute are equal. The chemical potential of the solid-phase solute is defined as the Gibbs free energy per molecule, while the chemical potential of the solution-phase solute, which varies with solution concentration, is defined as the Gibbs free energy change resulting from the addition of one solute molecule into the solution. CPR methods typically use Monte Carlo (MC) and molecular dynamics (MD) simulations to calculate the chemical potential of the solid-phase solute, and determine the chemical potential of the solution-phase solute as a function of the solution concentration. This enables the solubility limit to be predicted by satisfying the equality of the two chemical potentials, where the corresponding solution concentration represents the solubility limit (see Figure 1). While such methods have been used to predict aqueous solubilities of organic compounds, much of the development of the method is concerned with predicting solubilities for simple alkali halides.

Figure 1. Illustrative example of the chemical potentials approach to solubility calculation. The solubility limit can be determined by finding the point at which the chemical potentials of the solid-phase and solution-phase solutes are equal.
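The crossing-point construction can be sketched numerically. The example below is a toy under explicit assumptions: the solution-phase chemical potential follows an ideal-dilute form (which real CPR studies replace with simulated values), all names and numbers are hypothetical, and energies are in kJ mol⁻¹.

```python
import math

R_KJ = 0.0083144626  # molar gas constant in kJ mol^-1 K^-1

def mu_solution_ideal(c, mu_ref, c_ref, temperature):
    """Assumed ideal-dilute model: mu_sol(c) = mu_ref + R T ln(c / c_ref)."""
    return mu_ref + R_KJ * temperature * math.log(c / c_ref)

def solubility_by_bisection(mu_solution, mu_solid, c_low, c_high, tol=1e-10):
    """Find c in [c_low, c_high] where mu_solution(c) equals mu_solid.

    Assumes mu_solution increases monotonically with concentration.
    """
    if mu_solution(c_low) > mu_solid or mu_solution(c_high) < mu_solid:
        raise ValueError("bracket does not contain the crossing point")
    while c_high - c_low > tol * c_high:
        c_mid = math.sqrt(c_low * c_high)  # geometric midpoint suits a log-like curve
        if mu_solution(c_mid) < mu_solid:
            c_low = c_mid
        else:
            c_high = c_mid
    return math.sqrt(c_low * c_high)

# Hypothetical inputs: mu_sol = -10 kJ/mol at c_ref = 1 mol/kg, mu_solid = -12 kJ/mol.
T = 298.15
c_sat = solubility_by_bisection(
    lambda c: mu_solution_ideal(c, mu_ref=-10.0, c_ref=1.0, temperature=T),
    mu_solid=-12.0, c_low=1e-6, c_high=10.0)
print(f"c_sat = {c_sat:.4f} mol/kg")
```

For the ideal-dilute form the answer is also available in closed form, c_sat = c_ref exp[(μ_solid − μ_ref)/RT], which makes a convenient check; the bisection routine itself does not depend on that form.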

In 2015, the CPR method was used with four different ion models, each paired with the SPC/E water model, to predict the aqueous solubility of NaCl at ambient conditions, revealing significant issues. The four ion models were: KBFF (a nonpolarizable force field); RDVH (a force field which represents ions using Lennard-Jones potentials with superimposed elementary point charges); JC-SPC/E (a nonpolarizable force field which represents ions as elementary point charges, as described above for the direct coexistence method); and SD (a polarizable force field which represents ions using Lennard-Jones potentials). The best prediction was made by the RDVH model (5.69(7) mol kg⁻¹), followed by the JC-SPC/E model (3.71(4) mol kg⁻¹), the KBFF model (0.88(2) mol kg⁻¹) and the SD model (0.63(1) mol kg⁻¹). The models were then tested for their transferability to elevated temperatures: the JC-SPC/E and RDVH models predicted a decrease in solubility with increasing temperature, while the KBFF and SD models predicted little influence of temperature on solubility. According to the Gibbs function, the incorrect temperature dependence of the JC-SPC/E and RDVH models suggests that the dissolution process had a negative entropy change. This indicates that the entropy of the modeled crystal was too high, the entropy of the modeled solution was too low, or both, which may have been caused by overfitting the model. In 2022, the solubility of NaCl in a water–methanol mixture was calculated for varying solvent ratios ranging from pure water to pure methanol. Although it had not given the best predictions in the 2015 study, the group used the Joung-Cheatham model for the salt, together with the SPC/E model for the water and the OPLS-AA model for the methanol, and achieved “surprisingly good” agreement with experimental data. The solution chemical potential was calculated by insertion of a single ion into the solution, with a neutralizing background, and this was compared with the solid-state chemical potential calculated in the group’s 2015 paper.

In 2020, Dočkal et al. reparameterised the existing AH/BK3 and MAH/BK3 models (which were previously discussed with regard to their use in the direct coexistence method) and applied the resultant reparameterised model (RM) to a CPR method. The RM gave a good aqueous NaCl solubility prediction of 7.0(4) mol kg⁻¹ (ambient conditions). The RM was tested at elevated temperatures, giving values of 6.7(6) mol kg⁻¹ (373.15 K, 1 bar) and 6.8(6) mol kg⁻¹ (473.15 K, 15.5 bar). While these predictions could reflect the temperature dependence of solubility, within one standard deviation, their ability to do so was not conclusively proven. Dočkal et al. pointed out that their reparameterisation process is generalizable to other simple alkali halide salts, although this is yet to be tested. The group also noted that the inaccuracies in the predictions of their model were likely due to the ion–water interactions still being modeled by the AH/BK3 model. Reparameterising these interactions could yield even more accurate results, but this is yet to be done, highlighting another opportunity for future work. The RM is currently more attractive than other modern models such as the Madrid model, as it does not improve solubility predictions at the expense of predictions of other properties. However, the temperature dependence of the Madrid model has not yet been tested; the ability to capture such trends could make it more advantageous than the RM, providing an important topic for future work. Further, several modern polarizable water models, such as E3B and HBP, have not had compatible ion models designed for them, so their performance in solubility prediction has not yet been tested.

While the chemical potentials route has mostly been reserved for simple alkali halides, it has also been applied to alkaline earth metal halides. This presents a challenge due to the slow dynamics of such systems, and the fact that alkaline earth metal halides are hydrated in the solid phase but not in the liquid phase. The group used three force fields for the molecular dynamics simulations: Mamatkulov et al., DRVH, and ECCR2, with ECCR2 giving the most accurate predictions of solubility.

A common problem with the chemical potentials route to solubility prediction is the need to convert the modeling of the solid-phase solute, which has bound degrees of freedom, to the solution-phase solute, which has additional translational and rotational degrees of freedom. Khanna et al. took first steps to deal with this problem by positing a decoupled approach to calculate chemical potentials at a solid–liquid phase equilibrium which starts from two independent reference systems. The group adopted the Frenkel-Ladd method for obtaining the solid-state chemical potential, in which an Einstein crystal is used as a reference system, its chemical potential is calculated, and it is converted into a fully interacting crystal in a stepwise fashion. The change in chemical potential of each of these steps can be calculated, hence the chemical potential of the fully interacting molecular crystal can be found. The Frenkel-Ladd method is discussed further below, in the section on the Einstein crystal method. However, to avoid the solid-to-liquid transition, the group employed an independent reference system for the liquid-phase compounds, known as the centroid. The goal of the centroid approach to calculating liquid-phase chemical potentials is to give the reference system the same number and types of degrees of freedom as are present in the fully interacting molecule, as the closer the reference system is to the real system, the easier it is to transform the reference system into the fully interacting system; difficulties often arise, for example, when ideal gases are used as reference systems as they possess fewer degrees of freedom than real gases. The centroid is composed of all of the atoms present in the real system, each bound by a spring to the collective center of mass. The chemical potential of the centroid can be computed either analytically or through simulation, and it is then transformed into the fully interacting molecule in a series of steps, first by bonding the atoms together as they are in the real molecule, then by gradually turning on correct angles, dihedrals and Lennard-Jones interactions, and finally by turning on Coulombic interactions (see Figure 2).

Figure 2. Stepwise conversion of a centroid to a fully interacting molecule. 1. The centroid system transforms to bonded atoms; 2. bond angles are turned on; 3. dihedrals are turned on; 4. Lennard-Jones interactions are turned on; 5. Coulombic interactions are turned on. Reproduced with permission from Reference . Copyright 2020 American Institute of Physics (AIP) Publishing.

Each of these steps has an associated chemical potential change, which can be added to the chemical potential of the reference system, hence affording the chemical potential of the fully interacting molecule. While the group originally used this method to compute solid–liquid phase equilibria, it was later extended to solubility calculation in 2023 to predict the aqueous solubilities of naphthalene and β-succinic acid. These calculations required additional solvation free energy calculations at varying concentrations. The predicted solubility of naphthalene as a mole fraction of 3.66(0.2) × 10⁻⁶ is in reasonable agreement with the experimental value of 4.4 × 10⁻⁶, and the calculations display a linear increase in solubility with temperature. However, the predicted solubilities across a temperature range of between 300 and 350 K for succinic acid were around three times larger than experimental values. Given the rigidity of naphthalene and the relative flexibility of succinic acid, these results indicate that the centroid method performs well for rigid compounds but poorly for more flexible compounds. However, the standard error of the solubility calculations overall remains below 10%, making the centroid method of the chemical potentials approach a promising candidate for future development. Khanna et al. concluded that overcoming inaccuracies in the parts of the force field modeling solute–solvent and solvent–solvent interactions would be particularly beneficial.

Another significant step forward in the chemical potentials route came recently from Reinhardt et al., who sought to provide a streamlined molecular dynamics methodology. To model a solution’s chemical potential as a function of concentration, the group used a Debye crystal of a known energy, in which all pairs of atoms were connected by harmonic springs, as a starting point for thermodynamic integration, which was used to switch off the harmonic Debye interactions and switch on the potential, creating an interacting crystal. Then, a Debye solute molecule, in which all atom pairs were connected by harmonic springs and the center of mass and rotational movement were constrained, underwent Hamiltonian thermodynamic integration followed by free-energy perturbation to convert it into a fully interacting molecule. Finally, free-energy perturbation was used to add a softly interacting solute molecule into a very dilute solution of concentration $c_0$, the potential of which was gradually switched on to become a fully interacting solute molecule in solution. The solvation free energy was calculated as the difference in free energy between the solute molecule fully interacting in dilute solution and the same molecule in the gas phase. The $S^0$ method was then used to calculate the chemical potential, $\mu_{\mathrm{sol}}$, as a function of solution concentration, $c$. The $S^0$ method is based on the thermodynamic relationship between the changes in particle numbers and derivatives of the chemical potentials with respect to molar concentration,

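$$\mu_{\mathrm{sol}}(c) = \mu_{\mathrm{sol}}(c_0) + k_B T \ln\left(\frac{c}{c_0}\right) + k_B T \int_{\ln c_0}^{\ln c} \mathrm{d}\ln(c) \left[\frac{1}{S_{MM}^{0} - S_{MS}^{0}\sqrt{c/c_S}} - 1\right] \tag{13}$$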

where the M and S subscripts represent solute and solvent molecules, respectively; $S_{MM}^{0}$ is the static structure factor in the $k \to 0$ limit between solute molecules, $S_{MS}^{0}$ is the same between solute and solvent molecules, and $c_S$ is the solvent concentration. This methodology was used to predict solubilities of sodium chloride in water, urea polymorphs in water at a range of temperatures, and paracetamol in water and in ethanol. The method tended to underestimate solubilities, for example giving a value of (3.5 ± 0.5) × 10⁻⁸ mol kg⁻¹ for aqueous paracetamol at 20 °C, compared to the experimental value of 0.0845 mol kg⁻¹ under the same conditions. This shortcoming is likely due to the breakdown of the harmonic potential approximation, which is key to the Debye crystal, at elevated temperatures. However, the increased computational efficiency of this method, and its ability to predict the solubility of simple alkali halides with reasonable accuracy, make it a promising candidate for future work in solubility prediction for organic molecules.
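The integral in eq 13 can be evaluated numerically once the two structure factors are tabulated against concentration. The sketch below is a minimal illustration with hypothetical names, not the workflow of Reinhardt et al.: it applies the trapezoidal rule in ln c and assumes the solute and solvent concentrations share the same units.

```python
import numpy as np

def mu_relative_to_reference(c, s_mm0, s_ms0, c_solvent, kT):
    """Return mu_sol(c) - mu_sol(c[0]) on the supplied concentration grid (eq 13).

    c          : increasing solute concentrations, c[0] is the reference c_0
    s_mm0      : solute-solute structure factor (k -> 0) at each c
    s_ms0      : solute-solvent structure factor (k -> 0) at each c
    c_solvent  : solvent concentration (scalar or array, same units as c)
    """
    c = np.asarray(c, dtype=float)
    s_mm0 = np.asarray(s_mm0, dtype=float)
    s_ms0 = np.asarray(s_ms0, dtype=float)
    ln_c = np.log(c)
    integrand = 1.0 / (s_mm0 - s_ms0 * np.sqrt(c / c_solvent)) - 1.0
    # Cumulative trapezoidal integration over ln(c), starting from ln(c_0).
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(ln_c)
    integral = np.concatenate(([0.0], np.cumsum(steps)))
    return kT * (ln_c - ln_c[0]) + kT * integral

# Sanity check with made-up ideal-solution inputs: S_MM^0 = 1 and S_MS^0 = 0
# make the integrand vanish, leaving only the k_B T ln(c / c_0) term.
c_grid = np.geomspace(0.01, 1.0, 50)
delta_mu = mu_relative_to_reference(c_grid, np.ones(50), np.zeros(50),
                                    c_solvent=55.5, kT=2.479)
```

In practice the structure factors would come from simulations at each concentration, and the grid spacing would need checking for convergence.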

The chemical potentials route offers a reliable method of solubility prediction which leverages the equivalence between the solid phase and solution phase solute at the solubility limit. Such methods are computationally less intensive than direct coexistence methods, especially with recent steps forward, and can provide accurate solubility predictions for simple alkali halides. While they tend to give predictions of varying quality for small organic molecules, predictions remain heavily dependent on the force field used. Such methods can also become computationally cumbersome when calculating solubilities under a wide set of temperature and pressure conditions. However, a different approach, using the system’s density of states, is able to straightforwardly access solubility predictions at a range of temperatures and pressures.

3.1.3. Thermodynamic Approaches (Density of States)

In response to the abundance of simulation-intense solubility prediction methods such as those discussed above, Boothroyd et al. proposed an efficient way of accessing solubility predictions across a range of temperatures and pressures using only a small number of simulations. Their approach involved calculating a solute–solvent system’s density of states, which is the number of configurations that the system may occupy at a given energy, in order to identify the point of phase coexistence between the solid phase and solution phase solute, thus accessing the solubility. By applying well-established density of states calculations, which have historically been limited to single-component systems, to multicomponent systems, Boothroyd et al. accessed solubility predictions for simple salts and organic molecules.

For a single-component system, the point of equilibrium can be determined from the density of states by first considering the isothermal–isobaric (NpT) partition function in which distinct microstates may be degenerate. This is expressed as

$$Q(N,p,T) = \sum_{E}\sum_{V} \Omega(V,E) \exp\left[-\beta(E + pV)\right] \tag{14}$$

where $\Omega(V,E)$ is the system’s density of states, $E$ and $V$ are the system’s energy and volume, respectively, $\beta = 1/k_B T$, and the summation is over all energy levels. Knowing the system’s density of states allows for scanning of the corresponding probability distribution,

$$P(E,V) = \frac{1}{Q(N,p,T)} \exp\left[\ln\Omega(V,E) - \beta(E + pV)\right] \tag{15}$$

across either temperature or pressure while keeping the other constant, affording access to coexistence conditions. For multicomponent systems containing species $i$ and $j$, where the population of species $i$ ($N_i$) is allowed to change and the population of species $j$ ($N_j$) is constant, the partition function is given by

$$Q(\mu_i,p,T)_{N_j} = \sum_{E}\sum_{V}\sum_{N_i} \Omega(N_i,V,E)_{N_j} \exp\left[-\beta(E + pV - \mu_i N_i)\right] \tag{16}$$

where $\mu_i$ is the chemical potential of species $i$. Again, if the density of states, here $\Omega(N_i,V,E)_{N_j}$, is known, the corresponding probability distribution,

$$P(N_i,E,V)_{N_j} = \frac{1}{Q(\mu_i,p,T)_{N_j}} \exp\left[\ln\Omega(N_i,V,E)_{N_j} - \beta(E + pV - \mu_i N_i)\right] \tag{17}$$

can be used to identify the coexistence point at which the solid phase and solution phase component $i$ are in equilibrium. This probability distribution as a function of $N_i$ displays two peaks of equal area, representing the two coexistence states: one at $x_i = 1$, representing a solid component $i$, and the other at some lower mole fraction, $0 \le x_i < 1$, which corresponds to the saturated solution, i.e., the solubility limit (see Figure ). Therefore, the mole fraction at which the peak corresponding to the saturated solution occurs is the solubility limit.
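The reweighting expressed by eqs 15 and 17 can be sketched generically: given ln Ω on a grid of states, the probability of each state at chosen conditions is the normalized exponential of ln Ω − β(E + pV − μN). The code below is a toy illustration with hypothetical names and a made-up two-state density of states; it is not the implementation of Boothroyd et al.

```python
import numpy as np

def reweighted_probability(ln_omega, energy, beta, p_times_v=0.0, mu_times_n=0.0):
    """Normalized P over tabulated states: exp[ln(Omega) - beta (E + pV - mu N)] / Q.

    All array arguments must be broadcastable to the shape of ln_omega.
    """
    ln_omega = np.asarray(ln_omega, dtype=float)
    exponent = ln_omega - beta * (np.asarray(energy, dtype=float)
                                  + p_times_v - mu_times_n)
    exponent = exponent - exponent.max()  # guard against overflow before exponentiating
    weights = np.exp(exponent)
    return weights / weights.sum()        # dividing by the sum plays the role of 1/Q

# Toy density of states: one state at E = 0 and three degenerate states at E = 1.
ln_omega = np.log([1.0, 3.0])
energy = np.array([0.0, 1.0])

# The same ln(Omega) is reused at two temperatures without any new "simulation".
p_cold = reweighted_probability(ln_omega, energy, beta=2.0)
p_hot = reweighted_probability(ln_omega, energy, beta=0.5)
```

Reusing one tabulated ln Ω at several values of β (or p, or μ) is the feature that lets a single density of states calculation serve a range of conditions.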

This method was first used to predict the aqueous solubility of sodium chloride at ambient conditions; the prediction of 3.77(5) mol kg⁻¹ agreed with values predicted by previously described methods and is comparable to the experimental value of 6.15 mol kg⁻¹. The key advantage of this method is that, since the density of states is independent of temperature and pressure, a system’s solubility limit can theoretically be predicted at a range of temperatures and pressures from only one density of states calculation. This method is therefore efficient in solubility prediction across a variety of conditions. However, in order to plot an accurate probability distribution for solubility prediction, a large number of free energy calculations are required across a wide range of possible discrete mole fractions. Further, although these numerous calculations are necessary, many of them will not be directly used for solubility prediction, as only those corresponding to the relevant peak in the probability distribution directly offer useful information about the solubility limit. However, the group later tackled these inefficiencies.

A subsequent alteration to the method used the density of states to calculate the system’s free energy of solution at a given concentration, rather than to calculate coexistence conditions. Performing several free energy calculations across a range of concentrations allowed for a free energy of solution versus concentration function to be fitted; the system’s chemical potential was then accessible as the first derivative of this function. Leveraging the fact that the solubility limit occurs at the equivalence of the solid-phase and solution-phase chemical potentials, the intersection of this solution-phase chemical potential function with that of the solid-phase chemical potential provided access to the solubility.
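The fit-and-differentiate step can be illustrated with synthetic data. In this sketch the free energy is tabulated against the number of solute molecules at fixed solvent content, a low-order polynomial is fitted, and its derivative is intersected with a constant solid-phase chemical potential. The polynomial form, the function name, and every number are assumptions made for illustration.

```python
import numpy as np

def chemical_potential_from_fit(n_solute, free_energy, degree=2):
    """Fit G(N) with a polynomial and return mu(N) = dG/dN as a numpy poly1d."""
    coefficients = np.polyfit(n_solute, free_energy, degree)
    return np.poly1d(coefficients).deriv()

# Synthetic free energies generated from G(N) = 2 N + 0.05 N^2 (arbitrary units).
n_values = np.arange(0.0, 21.0)
g_values = 2.0 * n_values + 0.05 * n_values**2

mu_solution = chemical_potential_from_fit(n_values, g_values)  # about 2 + 0.1 N
mu_solid = 3.0                                                 # hypothetical solid value

# Solve mu_solution(N) = mu_solid and keep real roots inside the sampled range.
roots = (mu_solution - mu_solid).roots
n_saturation = [r.real for r in roots
                if abs(r.imag) < 1e-9 and n_values[0] <= r.real <= n_values[-1]]
print(n_saturation)
```

Restricting the roots to the sampled range avoids reporting an intersection that is only an extrapolation of the fit.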

This modification was a step forward in efficiency. While the revised method required several free energy calculations in order to produce an accurate free energy function, far fewer calculations were required than in the original method. Moreover, all of the calculations in the modified method were pertinent to the final prediction, making it more information-efficient. The modified approach also retained advantages associated with the original density of states method, specifically the ability to calculate solubilities across a range of temperature and pressure conditions with only one density of states calculation. This method was used to determine the solubility of urea both in water and methanol. The urea and methanol molecules were modeled using the General AMBER Force Field and the water molecules were modeled using the TIP3P force field. The solubility prediction in methanol, 0.85(3) mol kg⁻¹, was somewhat comparable to the experimental value of 4.01 mol kg⁻¹. However, the method failed badly for the aqueous solubility of urea, giving a prediction of 0.46 mol kg⁻¹, which is about 45 times lower than the experimental value of 20.15 mol kg⁻¹. Despite this shortcoming, the method captured the correct temperature dependence of solubility for both solvents, and gave accurate relative solubilities in both cases. According to Boothroyd and Anwar, the failure of the absolute solubility predictions and success of the relative solubility predictions suggests that “something systematic may be missing from the force fields”.

The density of states approach, particularly the updated methodology, provides a method of solubility prediction for organic molecules in both aqueous and nonaqueous solvents which is far more computationally efficient than the direct coexistence and chemical potential routes. The method allows solubility calculation across a range of temperatures and pressures from only one density of states calculation and, while absolute solubility predictions have so far been poor, offers reasonably accurate relative solubility predictions. Like many physics-based methods, the density of states approach may be improved by better-parametrized force fields. However, no further improvements to the method have been reported since.

3.1.4. Einstein Crystal Method

An atomic Einstein crystal is a hypothetical crystal in which each atom is bound to its lattice position by a harmonic spring; all springs in the crystal oscillate at the same frequency. Since the force constant of the springs is known, the Helmholtz free energy of the crystal can be calculated, and hence the chemical potential of the solid can be found analytically. For molecular Einstein crystals, several springs are required per molecule: one central spring binds the molecule’s center of mass to its lattice position, while several more hold the molecule in the same orientation as all other molecules in the crystal. The chemical potential of the molecular Einstein crystal can similarly be found analytically.
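To make the analytic step concrete, the sketch below evaluates the Helmholtz free energy per atom of an atomic Einstein crystal from the spring force constant, using the standard harmonic-oscillator results in both the classical and quantum forms. It treats each atom as three independent oscillators of angular frequency ω = (k/m)^(1/2). The function names are illustrative, and the expressions are textbook formulas rather than the specific working equations of any study cited here.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def einstein_frequency(force_constant, mass):
    """Angular frequency (rad/s) of a spring with force constant in N/m, mass in kg."""
    return math.sqrt(force_constant / mass)


def helmholtz_per_atom_classical(omega, temperature):
    """Classical result: A/N = 3 k_B T ln(hbar*omega / (k_B T)), in joules."""
    x = HBAR * omega / (K_B * temperature)
    return 3.0 * K_B * temperature * math.log(x)


def helmholtz_per_atom_quantum(omega, temperature):
    """Quantum result: A/N = 3 [hbar*omega/2 + k_B T ln(1 - exp(-x))], in joules."""
    x = HBAR * omega / (K_B * temperature)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for small x.
    return 3.0 * K_B * temperature * (0.5 * x + math.log(-math.expm1(-x)))
```

The two expressions agree when ħω is much smaller than k_B T, which is the regime relevant to classical simulations.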

The Einstein Crystal method, developed by Frenkel and Ladd in 1984, used Monte Carlo simulations to compute the free energy of the solute crystal of interest by thermodynamic integration to the real crystal from an atomic Einstein crystal of the same structure. The extended Einstein crystal method (EECM), proposed in 2017, is a straightforward extension of the original method and was used to successfully predict the aqueous solubility of naphthalene. To compute the free energy of the naphthalene crystal, MD simulations were employed to calculate the free energy changes associated with a series of reversible steps which transformed the reference molecular Einstein crystal (the free energy of which was known) into the real naphthalene crystal, hence finding the free energy (and subsequently the chemical potential) of the real naphthalene crystal. The chemical potential of the aqueous naphthalene solution was then determined using an MD simulation to insert a naphthalene molecule into a cavity in simple point-charge (SPC) water, then shrink the cavity away, leaving only the naphthalene molecule in the solution. The predicted solubility value of 4.74 × 10–6 mol kg–1 compared well with the experimental value of 4.40 × 10–6 mol kg–1.

In 2019, the EECM was employed by a different group to predict the solubility of acetaminophen (paracetamol) in ethanol. The predicted solubility of 0.085(14) mol L–1 compared well with the mean experimental value of 0.059(4) mol L–1. Acetaminophen was selected as a test solute due to its pharmaceutical relevance and the availability of experimental data for comparison.

Direct solubility calculations offer a range of varied methodologies. While the direct coexistence method remains computationally costly, recent studies have obtained highly accurate results for simple alkali halides. Further, recent developments to the chemical potentials route have significantly streamlined computational workflows and opened the door for the successful application to organic molecules. Meanwhile, vastly different methods such as the density of states approach have been applied to organic molecules with reasonable success. Despite these improvements and successes, direct solubility calculations remain computationally intensive and offer limited accuracy and generalizability.

3.2. Indirect Calculation of Solubility

3.2.1. Thermodynamic Cycles

Intrinsic solubility can be defined via the Gibbs free energy of solution as

$$\Delta G_{\mathrm{sol}}^{*} = -RT \ln\left(S_{o} V_{m}\right) \tag{18}$$

where ΔG sol is the free energy change of transferring a molecule from the crystalline phase to aqueous solution under standard conditions, R is the molar gas constant, T is the temperature, S o is the intrinsic solubility in moles per liter and V m is the molar volume of the substrate crystal. The activity coefficient for the solute in solution is assumed to be unity. The Ben-Naim definition, which describes the transfer of a molecule between two phases at a fixed center of mass in each phase, is used as the standard state, indicated by the superscript *.
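As a minimal worked illustration of this relationship, the sketch below rearranges it to S o = exp(–ΔG sol /RT)/V m and evaluates the intrinsic solubility and its base-10 logarithm. The function names and the choice of units (kJ mol–1 for the free energy, L mol–1 for the molar volume) are assumptions made for the example.

```python
import math

R = 8.314462618e-3  # molar gas constant, kJ mol^-1 K^-1


def intrinsic_solubility(delta_g_sol, molar_volume, temperature=298.15):
    """Intrinsic solubility S_o (mol/L) from delta_g_sol (kJ/mol) and V_m (L/mol)."""
    return math.exp(-delta_g_sol / (R * temperature)) / molar_volume


def log_intrinsic_solubility(delta_g_sol, molar_volume, temperature=298.15):
    """Base-10 logarithm of the intrinsic solubility in mol/L."""
    return math.log10(intrinsic_solubility(delta_g_sol, molar_volume, temperature))
```

Because the solubility depends exponentially on the free energy, an error of RT ln 10 (about 5.7 kJ mol–1 at 298 K) in ΔG sol shifts the prediction by one log S unit.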

The most obvious route with which to calculate intrinsic aqueous solubility is by computing the Gibbs free energy of solution, ΔG sol . The direct prediction of solution free energy is uncommon, however, as an accurate prediction cannot be easily made from a single simulation. As a result, solution free energy is often determined indirectly through several calculations.

Determination of the solution free energy can be achieved through two separate thermodynamic cycles: the fusion and sublimation cycles. The fusion thermodynamic cycle consists of the transfer of a molecule from crystal to solution via a supercooled molten phase, in which a normally solid solute is considered to behave as a liquid. This hypothetical state cannot be reproduced experimentally, and is introduced to allow for the separation of solid phase interactions from solvated interactions. Alternatively, the sublimation cycle follows a molecule through the gas phase to solution, and is accessible through both experiment and computation.

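$$\Delta G_{\mathrm{sol}}^{*} = \Delta G_{\mathrm{fus}}^{*} + \Delta G_{\mathrm{mix}}^{*} = \Delta G_{\mathrm{sub}}^{*} + \Delta G_{\mathrm{solv}}^{*} \tag{19}$$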

Care must be taken when combining calculations from separate simulations to obtain the free energy of solution or intrinsic solubility. A consistent methodology is required throughout this process to minimize the introduction of error between calculations.

In the following subsections, methods to calculate sublimation, fusion and solvation free energies are discussed separately, before their combination to predict solubility is described.

3.2.2. Sublimation

The indirect modeling of solution free energy through the sublimation thermodynamic cycle sees a given molecule transition from crystal to solution via an isolated state. Of the two steps involved in this process, sublimation and solvation, measuring sublimation and its associated free energy has historically been the more difficult. Experimentally, this difficulty arises due to the fact many approaches, such as calorimetric methods, typically only yield a sublimation enthalpy. Related thermodynamic properties, such as the sublimation entropy, are often back-calculated from experimentally obtained data. This is not always done, however, leading to an incomplete set of properties for many molecules reported in the literature, which can result in the pairing of data from different experimental procedures and may contribute to any experimental noise observed.

Informatics based approaches, such as QSPR models, can provide relatively accurate predictions of solubility and its related thermodynamic properties when provided with high quality data. However, the benefits of such models are limited by their need for large quantities of reliable experimental data, the predictive capabilities of such models beyond the subset of molecules found within their training data, as well as a lack of an underlying theoretical basis with which to interpret the reasoning behind a given prediction. Beyond the accuracy of experimental data, a lack of relevant descriptors related to solid-state contributions has been found to be another limiting factor in the accurate prediction of solubility for quantitative structure–property relationship models.

Modeling solid state behavior via physics-based approaches generally involves the calculation of enthalpic and entropic contributions from the crystalline and gaseous phases toward the sublimation free energy. The calculation of these properties can take a variety of forms, depending on the level of theory applied and approximations made. These various approaches can be summarized into one of two methodologies: Ψ mol and Ψ crys . The first of these two methods, Ψ mol , is a model potential based approach in which a combination of force field and quantum mechanical level calculations are made to determine a crystalline lattice energy and thermodynamic contributions from phase specific molecular degrees of freedom. Crystalline lattice energies are typically calculated as the sum of all interactions between a central molecule and those surrounding it. The atom–atom interactions modeled within this system use an intermolecular potential in which the electrostatic term is generated from a Distributed Multipole Analysis (DMA) of the molecular charge distribution. The lowest energy crystalline structure is then found using this potential. Thermodynamic contributions from phase specific molecular degrees of freedom can be separated into vibrational, rotational and translational molecular motions found in the crystalline and gaseous phases. By making assumptions about the contributions from molecular motion observed in both phases, the equipartition theorem can be applied to calculate each in terms of RT, where R is the molar gas constant and T is the temperature. Intramolecular vibrations are assumed to be equal across gaseous and crystalline structures, which allows for the separation of the crystalline lattice energy into crystalline and gaseous contributions, as well as the calculation of vibrational and entropic contributions through the crystal phonon modes. 
From this, sublimation enthalpy can be approximated as ΔH sub = – U lattice – 2RT, where U lattice is the crystal lattice energy, and 2RT approximates the contribution of the crystalline and gaseous degrees of freedom to the sublimation enthalpy. Sublimation entropy can also be determined from this approximation as ΔS sub = (S trans,gas + S rot,gas ) – S crys , where S crys is the crystalline phonon entropy, S trans,gas is the gaseous translational entropy contribution, and S rot,gas is the gaseous rotational entropy contribution. This 2RT approximation is often applied as the crystal lattice energy is assumed to be the dominant contribution to the sublimation enthalpy. Nonetheless, phonon mode calculations are still needed if one wishes to obtain the entropy and thus the free energy, or indeed for a more accurate computation of the sublimation enthalpy. Abramov et al. reported intrinsic solubility predictions for benzoylphenylurea and benzodiazepine derivatives using sublimation free energies estimated entirely from crystal lattice energies and the 2RT approximation. Where more accuracy is required, entropic contributions are included in sublimation free energy calculations, and have been reported for small organic and drug-like molecules in a number of studies. There are several inaccuracies associated with the Ψ mol approach, namely the 2RT approximation and the use of model potential based calculations. The 2RT approximation applies a universal correction, independent of crystal behavior and structure, to approximate the contribution of molecular degrees of freedom to the sublimation enthalpy. It assumes, among other things, that the molecular vibrations are unaffected by transfer from the crystal to the gas, that they are not coupled to the phonons in the crystal, and that the phonon modes’ contributions follow equipartition. 
This approximation has been found to be inconsistent between crystals of varying size and rigidity, with particularly significant errors introduced for small molecular crystals.
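The arithmetic of the Ψ mol estimate can be sketched as below, assuming the 2RT form of the sublimation enthalpy and taking the sublimation entropy as the gas-phase translational and rotational entropies minus the crystalline phonon entropy. The function names, units, and example numbers are hypothetical and serve only to show how the terms combine.

```python
R = 8.314462618e-3  # molar gas constant, kJ mol^-1 K^-1


def sublimation_enthalpy_2rt(lattice_energy, temperature=298.15):
    """Delta H_sub = -U_lattice - 2RT, with U_lattice (negative) in kJ/mol."""
    return -lattice_energy - 2.0 * R * temperature


def sublimation_free_energy(lattice_energy, s_trans_gas, s_rot_gas, s_crys,
                            temperature=298.15):
    """Delta G_sub = Delta H_sub - T * Delta S_sub, entropies in kJ mol^-1 K^-1.

    Assumed convention: Delta S_sub = (S_trans,gas + S_rot,gas) - S_crys.
    """
    delta_h = sublimation_enthalpy_2rt(lattice_energy, temperature)
    delta_s = (s_trans_gas + s_rot_gas) - s_crys
    return delta_h - temperature * delta_s


# Hypothetical inputs: U_lattice = -100 kJ/mol and illustrative entropies.
example_dg_sub = sublimation_free_energy(-100.0, 0.17, 0.13, 0.10)
```

With these illustrative inputs the 2RT term contributes about 5 kJ mol–1, while the entropic term is roughly an order of magnitude larger, which is why phonon calculations matter once a free energy is required.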

The Ψ crys approach by comparison performs rigorous electronic structure calculations on the full crystal structure with periodic boundary conditions. By replacing the model potential and 2RT approximation with quantum mechanical (QM) calculations, including accurately computed phonon mode calculations, one can explicitly represent solid state contributions to the sublimation free energy. This allows more accurate and detailed modeling of the crystal structure to be carried out. The use of quantum mechanics throughout all crystalline calculations enables the consistent use of a single level of theory across all phases, which cannot be done with Ψ mol . Due to this increased complexity, the Ψ crys approach has become increasingly popular and successful recently. Several independent studies have been reported which investigate the accuracy of DFT and post Hartree–Fock based calculations, as well as comparisons of QM based phonon mode calculations to model potential based approaches. An ab initio fragment-based additive scheme has been proposed for approximating the sublimation enthalpies of small molecular crystals at temperatures nearing 0 K, with a more recent application to sublimation pressures. Blind studies investigating the development of organic crystal structure prediction and current state-of-the-art methods are periodically released. Most recently, the seventh blind study ran until September 2022 with a relevant publication expected to be released in due course. Overviews discussing the current state of CSP can also be found in the literature. However, the Ψ crys approaches most often used for CSP are designed to correctly order the energies of different polymorphs, hence identifying the most stable crystal structure, rather than to give accurate absolute sublimation free energies. 
Recent studies have also highlighted the need for greater accuracy in polymorph ranking methods, which have been reported to produce problematic conformational energies through DFT based delocalization errors. Fowles et al. showed that the periodic dispersion-corrected density functional Ψ crys PBE-TS method, despite its ability to order alternative structures, could not generate accurate absolute sublimation free energies of the kind required for solubility prediction. Thus, it was necessary to use the Ψ mol approach to compute the sublimation leg of the thermodynamic cycle corresponding to solubility. Firaha et al. established an experimental benchmark for crystalline polymorph ranking alongside proposing a single point energy correction for use in lattice energy minimization and phonon mode calculations, with phase transition errors below 2 kJ mol–1 reported for their benchmark data set.

The sublimation free energy measures the relative energy of a real or hypothetical crystal structure versus the corresponding ideal gas, and its computation is therefore essential to the field of CSP. This field has made enormous strides in the last 35 years, from its state being described as scandalous by the then editor of Nature, albeit as a provocative opening line to a broadly constructive article. As evidenced by the periodic “Blind Tests” where the CSP community is challenged with a selection of newly solved and unpublished structures, the field has progressed to reach an encouraging level of success and accuracy in predicting many organic crystal structures, especially those with less conformational flexibility. Up to now, indirect calculations of solubility using CSP methods to model sublimation have generally been attempted only on compounds with known crystal structures. However, since nearly 90% of polymorph pairs differ in energy by less than 6 kJ mol–1 and around 40% to 50% by less than 2 kJ mol–1, equivalent respectively to about 1.0 and 0.4 log S units, the question of first-principles solubility prediction using predicted crystal structures arises. These estimated energy differences are of similar size to the errors in a high-quality solubility prediction, and suggest that such an approach may lie on the cusp of feasibility.
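The conversion between free energy differences and log S units quoted above follows from dividing by RT ln 10. A short sketch, assuming a temperature of 298.15 K (the function name is illustrative):

```python
import math

R = 8.314462618e-3  # molar gas constant, kJ mol^-1 K^-1


def energy_to_log_s_units(delta_g, temperature=298.15):
    """Convert a free energy difference in kJ/mol to base-10 log solubility units."""
    return delta_g / (R * temperature * math.log(10.0))


shift_6 = energy_to_log_s_units(6.0)  # polymorph energy gap of 6 kJ/mol
shift_2 = energy_to_log_s_units(2.0)  # polymorph energy gap of 2 kJ/mol
```

At this temperature the two thresholds correspond to roughly 1.05 and 0.35 log S units, in line with the rounded values of 1.0 and 0.4 given in the text.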

3.2.3. Fusion

Through the fusion thermodynamic cycle, a given molecule is transferred from crystal to solution via a supercooled liquid state. This state simulates a normally solid solute as a liquid, and allows for the separation of solid-state and solvated interactions. Thermodynamic models which simulate this transition are commonly used for understanding solid–liquid phase equilibria, such as determining the solubility of a given solute over a broad range of conditions. A wide range of thermodynamic models exist, the simplest of which are fully empirical models which make use of interpolation and extrapolation to predict solubility at different conditions. At the other extreme, there are the semiempirical or fully theoretical excess Gibbs energy (GE) models, such as NRTL, UNIQUAC, UNIFAC, PC-SAFT, as well as the general solubility equation (GSE). GE models typically include parameters that are related to properties important to phase equilibria, such as molecular size and polarity, which makes them more reliable for property prediction over a range of conditions. However, this requires the use of experimentally derived high quality data which may not always be available. For example, melting point and the enthalpy of fusion are needed for solubility prediction and are determined experimentally using conventional differential scanning calorimetry (DSC) or adiabatic calorimetry, although for biological systems fast scanning calorimetry (FSC) is often required due to sample decomposition. As such models are often applied as a high throughput screening process of virtual libraries or early stage discovery when sufficient quantities of compound may not be available, their use as predictive models is limited. Unlike GE models, the general solubility equation can be derived from the fusion thermodynamic cycle, and so bears some similarity to physics-based approaches. 
The GSE predicts solubility from a compound’s melting temperature and octanol–water partition coefficient (logP), so long as some assumptions are applied to the entropy of fusion. Artursson et al. have investigated the quality of solubility predictions made with the GSE, as well as its entropic assumptions, determining the GSE to give reasonable accuracy. The general solubility equation cannot readily be applied to virtual molecular libraries, however, as the accurate prediction of melting temperature continues to be a difficult task, with predictive errors above 30 °C still common.
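For orientation, the sketch below implements the commonly quoted form of the GSE, log S = 0.5 – 0.01(T m – 25) – log P, with the melting point T m in °C and S in mol L–1. This specific form and its coefficients are not given in the text above and are stated here as an assumption; the function name is illustrative. The melting-point term is set to zero for compounds that are liquid at 25 °C.

```python
def general_solubility_equation(melting_point_c, log_p):
    """Estimate log10 of aqueous solubility (mol/L) from melting point and log P.

    Assumed form: log S = 0.5 - 0.01 * (T_m - 25) - log P, with T_m in deg C.
    """
    # No crystal-lattice penalty is applied to compounds that melt below 25 deg C.
    melting_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * melting_term - log_p


# Hypothetical compound: melting point 125 deg C, log P = 2.0.
example_log_s = general_solubility_equation(125.0, 2.0)
```

The example also shows why melting point errors matter: under this form, a 30 °C error in T m translates directly into a 0.3 log unit error in the predicted solubility.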

The application of molecular simulations to the prediction of amorphous solubility for drug-like molecules has been investigated by Luder et al. in a series of studies. With the use of Monte Carlo simulations and the free energy perturbation (FEP) method, free energies associated with the transfer from gas into water (ΔG hyd ) and gas into pure amorphous phase (ΔG ga ) can be used to estimate the amorphous solubility of drug-like molecules.

$$S_{a} = \frac{\exp\left(-\Delta G_{\mathrm{aw}}/RT\right)}{V_{m,a}} \tag{20}$$

where S a is the amorphous solubility, ΔG aw is the free energy for transferring a molecule from the amorphous phase to aqueous solution (ΔG aw = – ΔG ga + ΔG hyd ), and V m,a is the molar volume of the drug-like molecule in the amorphous phase. The authors determined FEP simulations to be too computationally expensive for routine use with drug-like molecules, and so proposed an approximate theory for determining ΔG ga and ΔG hyd . This new approach, which they refer to as the simplified response (SR) theory, uses linear response theory and mean field theory to approximate electrostatic and Lennard-Jones interactions, respectively. Due to a lack of experimentally measured amorphous solubilities, the SR method was benchmarked on data derived from experimental intrinsic aqueous solubility (S 0 ), entropy of fusion (ΔS m ) and melting point (T m ).

$$S_{a} \approx S_{0} \exp\left(\frac{\Delta S_{m}}{R} \ln\left(\frac{T_{m}}{T}\right)\right) \tag{21}$$
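A minimal sketch of this estimate is given below, reading the relation as S a ≈ S 0 exp[(ΔS m /R) ln(T m /T)], which is equivalent to S 0 (T m /T) raised to the power ΔS m /R. The function name and units are assumptions made for illustration.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def amorphous_solubility(s0, entropy_of_fusion, melting_point, temperature=298.15):
    """Estimate amorphous solubility from intrinsic crystalline solubility.

    s0: intrinsic solubility (any concentration unit; the result uses the same unit)
    entropy_of_fusion: Delta S_m in J mol^-1 K^-1
    melting_point, temperature: in kelvin
    """
    exponent = (entropy_of_fusion / R) * math.log(melting_point / temperature)
    return s0 * math.exp(exponent)
```

The estimate reduces to the crystalline solubility at the melting point and grows as the melting point or the entropy of fusion increases.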

Vinutha and Frenkel used a combination of simulation-based methods to investigate the solubility of amorphous materials. They combined direct coexistence simulations with computation of chemical potential via thermodynamic integration in a study relevant to both glasses and supercooled liquid phases. They noted some significant differences between the numerical predictions of the different approaches, though trends were broadly in line with experiment, including a decrease in amorphous solubility with time.

Amorphous solubilities were obtained with an RMSD of roughly 1 log S unit, although model reparameterisation and the introduction of an empirical correction were necessary for this to be achieved. Predicted values were noted to be sensitive to the choice of force field and atomic partial charges used within the model. Further, experimentally derived data is necessary to calculate intrinsic aqueous solubilities.

The determination of amorphous solubility can be useful for molecules that exhibit poor solubility in their thermodynamically stable form. The higher solubility metastable polymorphs can provide greater bioavailability within drug delivery systems, often achieved through the use of amorphous solid dispersions (ASDs). With this approach a drug molecule is dispersed within the matrix of an often water-soluble polymer, which in turn provides greater solubility than can be achieved with the crystalline form. Several recent reviews have discussed the use of ASD systems for drug-like molecules, with a range of other studies having investigated the amorphous solubilities of various compounds. Two physics-inspired semiempirical models, the Hildebrand solubility parameter and Flory–Huggins (FH) interaction parameter, are commonly applied for predicting the solubility of solid dispersions. These approaches assume the polymer acts as a solvent and the amorphous polymorph as the solute in a mixed system, with the favorability of the ASD controlled by the Gibbs free energy of mixing: ΔG mix = ΔH mix – TΔS mix . The Hildebrand solubility parameter, δ, is the square root of the cohesive energy density (CED), which is the energy needed to vaporise the molecules from the liquid phase per unit volume. The magnitude of δ therefore gives an approximation of the condensed phase intermolecular forces, which can be related to the enthalpy of mixing as

$$\Delta H_{\mathrm{mix}} = V_{t} \left(\delta_{1} - \delta_{2}\right)^{2} \phi_{1} \phi_{2} \tag{22}$$

where V t is the total volume of the mixture, and δ and ϕ represent the solubility parameters and volume fractions of the individual components, respectively. To better model individual contributions to the total solubility parameter, Hansen developed a three-dimensional form of the Hildebrand parameter, in which $\delta^{2} = \delta_{d}^{2} + \delta_{p}^{2} + \delta_{h}^{2}$. The Hansen solubility parameter models solubility explicitly in terms of dispersion forces, δ d , polar forces, δ p , and hydrogen-bonding forces, δ h , which gives greater accuracy for solutions containing polar components. The solubility parameter approach has found use as a screening tool for identifying potential carrier polymers, but is limited by its purely thermodynamic and semiempirical nature. In particular, this approach fails to account for entropic contributions to the free energy of mixing or the impact of kinetic terms on the stability of an ASD, and is often dependent on the availability of experimental data.
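The two solubility-parameter expressions can be evaluated as in the sketch below, which combines Hansen components into a total parameter and computes the Hildebrand enthalpy of mixing. The function names are illustrative, and the units (MPa^0.5 for δ and cm3 for volume, giving an enthalpy in joules) and the example values are assumptions made for the example.

```python
import math


def hansen_total(delta_d, delta_p, delta_h):
    """Total solubility parameter from dispersion, polar and hydrogen-bonding parts."""
    return math.sqrt(delta_d ** 2 + delta_p ** 2 + delta_h ** 2)


def hildebrand_mixing_enthalpy(total_volume, delta_1, delta_2, phi_1):
    """Delta H_mix = V_t (delta_1 - delta_2)^2 phi_1 phi_2 for a binary mixture.

    With delta in MPa^0.5 and total_volume in cm^3 the result is in joules.
    """
    phi_2 = 1.0 - phi_1  # volume fractions of a binary mixture sum to one
    return total_volume * (delta_1 - delta_2) ** 2 * phi_1 * phi_2


# Hypothetical drug/polymer pair with solubility parameters of 20 and 18 MPa^0.5.
example_dh_mix = hildebrand_mixing_enthalpy(100.0, 20.0, 18.0, 0.5)
```

Because the difference in solubility parameters is squared, this expression can only give a zero or positive enthalpy of mixing, which is one reason it is used for ranking candidates rather than for quantitative prediction.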

The FH theory addresses the lack of entropic contributions seen in the solubility parameter approach by specifically considering the entropic effect of the polymer within a system containing compounds with significant size discrepancies.

$$\frac{\Delta G_{\mathrm{mix}}}{RT} = n_{1} \ln \phi_{1} + n_{2} \ln \phi_{2} + \chi n_{1} \phi_{2} \tag{23}$$

Here, subscript 1 refers to the solvent, subscript 2 refers to the solute, ϕ is the volume fraction, n is the number of moles, R is the molar gas constant, T is the temperature, and χ is the Flory–Huggins interaction parameter. As entropic contributions are included in FH theory, a more detailed description of the thermodynamics of ASD systems can be obtained. However, kinetic effects are not considered by FH theory either, which will hinder its accuracy for more complex systems. Potential shortcomings in the FH interaction parameter have also been investigated in a series of publications by Anderson. A more extensive discussion of the solubility and interaction parameters can be found in reviews focused on ASD modeling.
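The Flory–Huggins expression, ΔG mix /RT = n 1 ln ϕ 1 + n 2 ln ϕ 2 + χ n 1 ϕ 2 , can be evaluated directly, as in the following sketch. The function name and the example inputs are illustrative assumptions.

```python
import math


def flory_huggins_mixing(n_1, n_2, phi_2, chi):
    """Dimensionless free energy of mixing, Delta G_mix / (R T).

    n_1, n_2: moles of solvent (1) and solute (2)
    phi_2: volume fraction of the solute, with 0 < phi_2 < 1
    chi: Flory-Huggins interaction parameter
    """
    phi_1 = 1.0 - phi_2
    entropic = n_1 * math.log(phi_1) + n_2 * math.log(phi_2)
    enthalpic = chi * n_1 * phi_2
    return entropic + enthalpic


athermal = flory_huggins_mixing(1.0, 1.0, 0.5, 0.0)     # chi = 0
unfavorable = flory_huggins_mixing(1.0, 1.0, 0.5, 2.0)  # chi = 2
```

The combinatorial entropy terms are always negative, so mixing is favored unless a sufficiently positive χ outweighs them.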

3.2.4. Hydration

The modeling of solvated systems and their associated thermodynamic properties has long been an area of active research, resulting in a wide range of possible approaches for the prediction of hydration free energy. These approaches can generally be separated by the method with which solvent structure is represented, and fall into one of two categories: implicit or explicit solvent models. Implicit solvent models do not include any structural characterization of the bulk solvent system, relying only on a continuous dielectric medium to represent solvent; whereas, explicit solvent models include an atomistic representation of both solute and solvent molecules.

3.2.4.1. Implicit Solvation

Implicit solvent models (sometimes also referred to as continuum models) are a popular and widely applied approach for investigating solvent effects in biomolecular and chemical systems due to their relatively low cost and reasonable accuracy against more fundamental molecular simulations. Cramer et al. have developed a series of solvation models, referred to as the “SMx” models. The final model in this series, SM12, was parametrized for several solvation thermodynamic properties, including solvation free energy prediction across neutral and ionised solutes in aqueous or organic solvent systems, achieving average mean unsigned errors (MUE) of 0.5–0.8 kcal mol–1 and 2.2–7.7 kcal mol–1 for neutral and ionised solutes, respectively. The previous model, SM8, performed comparably when applied to the same data sets; however, SM12 was parametrized against a more diverse training set and has been defined for the entire periodic table. A more theoretically rigorous form of the SMx models, SMD, was later released by Marenich et al., which treats the bulk electrostatic contribution with the nonhomogeneous Poisson equation (NPE), rather than the generalized Born approximation used by the SMx series. SMD reportedly performs comparably to the SM8 and SM12 models, with average MUE of 0.6–1.0 kcal mol–1 for neutral solutes across aqueous and organic solvents. For ionised solutes, an average MUE of 4 kcal mol–1 was achieved, although solvent specific deviations in solvation free energy (SFE) prediction for anions and cations were reported. The authors claimed that the SMD and SM8 models, at the time of publishing, were more accurate than other available methods. The SMx series of models include solute specific macroscopic parameters, including acidity and basicity parameters, which may not be readily available for compounds outside those used for model parametrization. 
The SMD solvent model has been regularly included in SAMPL blind study submissions, with a range of independent studies focusing on its further development. Alternatively, the polarizable continuum model (PCM) proposed by Tomasi et al. has led to the development of several different implementations of the PCM framework. The simplest of these implementations, referred to as the conductor-like PCM (CPCM), includes the conductor-like screening solvation boundary condition, and has been included in a number of studies. A correction of the polarization charge densities is included within CPCM, by introducing a scaling factor x for the solvent dielectric constant, ε: f(ε) = (ε – 1)/(ε + x). An integral equation formalism of the polarizable continuum model (IEF-PCM) was introduced by Cances et al., implementing a more sophisticated treatment of the solvation boundary condition by taking into account anisotropic or isotropic dielectric continuum solvation. IEF-PCM has found success as a relatively accurate model in the prediction of solvation free energy, and has been included in a number of SAMPL submissions.
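The effect of the scaling factor can be seen numerically in the sketch below, which evaluates f(ε) = (ε – 1)/(ε + x) for a water-like dielectric constant. The value ε = 78.4 and the choice x = 0 to represent a CPCM-style scaling are assumptions made for the example; x = 0.5 is the value quoted for COSMO in this section. The function name is illustrative.

```python
def dielectric_scaling(epsilon, x):
    """Conductor-like scaling of the polarization charges, f = (eps - 1) / (eps + x)."""
    return (epsilon - 1.0) / (epsilon + x)


EPS_WATER = 78.4  # assumed static dielectric constant of water near room temperature

f_x0 = dielectric_scaling(EPS_WATER, 0.0)    # x = 0 (assumed CPCM-style choice)
f_x05 = dielectric_scaling(EPS_WATER, 0.5)   # x = 0.5 (COSMO)
f_low = dielectric_scaling(2.0, 0.5)         # weakly polar solvent, eps = 2
```

For high-dielectric solvents such as water the two choices differ by less than 1%, whereas the scaling departs strongly from the ideal-conductor limit of 1 in weakly polar solvents.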

A very similar approach to CPCM, the conductor-like screening model (COSMO), was independently developed by Klamt et al., and includes this same polarization charge density correction. The scaling factor used in the two methods typically differs, however, with a value of 0.5 used for COSMO. The primary benefit of using CPCM or COSMO over the original PCM is the introduction of the boundary condition, which eliminated outlying charge errors caused by solute electron density extending beyond the solute cavity. The COSMO approach has since been extended beyond the typical polarizable continuum solvation procedure with the COSMO model for real solvents (COSMO-RS). This model introduced a statistical mechanics treatment of interacting surfaces, and so is not limited in its description of the solvated environment, as is the case for any model with the polarizable continuum procedure. COSMO-RS has been directly compared to the SM8 solvent model, using the same data set of 2346 solute–solvent pairs across 91 solvents on which the SM8 model was parametrized and tested. This study concluded that the COSMO-RS procedure was capable of solvation free energy predictions to within 0.48 kcal mol–1 of experimental values, compared to 0.59 kcal mol–1 with SM8, without a need for experimental or adjustable parameters. COSMO-RS holds several advantages over more conventional implicit solvent models, enabling its application in mixed solvent and variable temperature systems. First, solvent specific parametrization is not needed in the COSMO workflow, with solute and solvent molecules treated equally. Second, QM calculations are performed on a molecule-by-molecule basis, and need not be repeated for different solvents. More detailed descriptions of the COSMO and COSMO-RS concepts are available in the literature. 
A particular drawback of COSMO-RS, however, is its inability to model the solvent specific response to solute behavior, which prevents the evaluation of properties that can be readily investigated by other continuum models. This drawback was corrected for with the introduction of the direct COSMO-RS (DCOSMO-RS) method, which replaces the solvent dielectric response with the calculated solute surface polarization charge density. The COSMO series of models have become a commonly applied tool for solvated phase property investigation beyond solvation free energy, such as phase equilibria, chemical process optimization, solvent screening and within drug discovery screening processes. There are drawbacks to the application of implicit solvent models, however, despite the clear advantage of low computational cost. The implicit description of the bulk solvent, including that of important solvent–solute interactions and solvent reorientation, is a questionable approximation.

3.2.4.2. Explicit Solvation

Explicit solvent models offer a more rigorous approach to modeling solvated systems than implicit methods, as they more closely represent the underlying physical phenomena. A fully atomistic representation of both solute and solvent enables the consideration of specific solute–solvent interactions and solvent configuration that is not possible with a purely continuum based approach. Explicit solvent models are commonly applied to the prediction of solvation free energy through alchemical free energy methods, such as thermodynamic integration (TI) or free energy perturbation (FEP), applied within atomistic molecular dynamics (MD) or Monte Carlo (MC) simulations. Due to their rigorous theoretical basis, molecular dynamics and Monte Carlo simulations have become a popular molecular modeling approach for investigating biological and chemical systems of interest. Alchemical free energy methods compute solvation free energy through a series of nonphysical intermediary states, which simulate the movement of a solute from vacuum to solution, or vice versa. Further details on their use can be found in the literature. Free energy calculations are routinely performed on a variety of molecular systems and processes, such as protein folding, protein–ligand binding, and small molecule movement across membranes, among other biologically and chemically relevant systems. Sherman et al. reported MD/FEP calculated solvation free energies with an average unsigned error of 1.10 kcal mol–1 for a data set of 239 drug-like molecules. The authors tested several different force fields and water models, concluding that OPLS_2005 and the SPC water model provided the most accurate results, although semiempirical charge assignment was needed for solute molecules containing particularly polar groups. Sadowsky and Arey proposed a combined QM/MM method, involving additive contributions from both levels of theory. 
Ab initio molecular dynamics (AIMD) calculations of ion hydration free energies have been reported in several studies. Leung et al. reported hydration free energies for four monovalent ions to within 4% of experimental measurements through AIMD/TI based calculations. With this approach, individual thermodynamic integration procedures were necessary for each ion investigated. A similar investigation was carried out by Chaudhari et al. for divalent metal ions, with a comparison of quasi-chemical theory and force field based free energy calculations. Li and Wang developed ionic pairwise potentials with the adaptive force matching (AFM) method and combined QM/MM calculations, without the need for empirical parametrization. These AFM models could predict hydration free energies for salts to within 6% of experimental references. Mobley et al. have discussed the use of polarizable iterative atomic charges for biomolecular systems in which the molecular environment undergoes a change in polarization. With this approach, atomic charges were derived from electronic structure calculations and combined with GAFF bonded parameters. Hydration free energies were predicted to within 2 kcal mol–1 of experiment for a data set of 613 organic molecules through free energy perturbation calculations. Fully polarizable force fields, such as AMOEBA, have received a lot of interest for their inherent ability to respond to dynamic environmental changes during simulations which is not possible with traditional pairwise models. Fully atomistic molecular simulations offer a viable alternative to implicit continuum based approaches, however, the explicit representation of complex systems comes at a far greater cost, often requiring significant time investment to model even a modest number of systems.
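The two alchemical estimators named above, TI and FEP, can be illustrated with a minimal sketch. It assumes that the ensemble averages of dU/dλ at each λ window (for TI) and the per-frame energy differences between neighboring states (for FEP, in the one-sided exponential-averaging form usually attributed to Zwanzig) have already been extracted from MD or MC simulations. The function names and the trapezoidal quadrature are illustrative choices, not the protocol of any study cited here.

```python
import math
from typing import Sequence


def thermodynamic_integration(lambdas: Sequence[float],
                              mean_du_dlambda: Sequence[float]) -> float:
    """Trapezoidal estimate of the integral of <dU/dlambda> over lambda."""
    if len(lambdas) != len(mean_du_dlambda) or len(lambdas) < 2:
        raise ValueError("need at least two matching lambda windows")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        if width <= 0.0:
            raise ValueError("lambda values must be strictly increasing")
        total += 0.5 * width * (mean_du_dlambda[i] + mean_du_dlambda[i + 1])
    return total


def fep_exponential_average(delta_u: Sequence[float], kT: float) -> float:
    """One-sided FEP estimate: -kT * ln < exp(-delta_u / kT) >.

    delta_u holds U(target) - U(reference) for frames sampled in the
    reference state, in the same energy units as kT.
    """
    if not delta_u:
        raise ValueError("need at least one sample")
    shift = min(delta_u)  # factor out the largest term for numerical stability
    acc = sum(math.exp(-(du - shift) / kT) for du in delta_u)
    return shift - kT * math.log(acc / len(delta_u))
```

In practice the total free energy change is the sum of such estimates over all neighboring pairs of intermediate states, and more elaborate estimators are often preferred; this sketch only shows the basic arithmetic.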

3.2.4.3. Reference Interaction Site Model

A range of solvation models have been developed from the integral equation theory (IET) of molecular liquids, offering a viable alternative to expensive fully atomistic simulations and approximate continuum models. IET based models include an explicit representation of solute–solvent interactions via an atomistic force field. Unlike molecular simulation methods, however, the solvated system is described through a set of correlation functions, enabling the efficient computation of solvation structure and thermodynamics from statistical mechanics. A detailed review of the molecular integral equation theory and its applications is available in the literature. Some of the most recent developments have involved the reference interaction site model (RISM) and the closely related molecular DFT approach. The latter is based on classical DFT and a six-dimensional representation of positional and orientational solvent density. Recent studies on the MDFT theory have shown the method to accurately predict hydration free energies to within 1 kcal mol–1 of experiment with an average computation time of 2 CPU minutes per molecule. Although several IET based methods have been developed, only the reference interaction site model (RISM) will be discussed in detail here as it is the most widely implemented approach and the only one that has been used to predict solubility.

The RISM theory uses a simplified form of the high-dimensional molecular Ornstein–Zernike (MOZ) equations to model solvent density distribution around a solute molecule through a set of correlation functions. Solvation free energy predictions are made analytically using one of several available free energy functionals. From the RISM framework, two distinct methods have emerged. The simplest of these is 1D-RISM, in which the MOZ equations are approximated as a set of one-dimensional integral equations. 1D-RISM is rarely used in its common form for quantitative calculations of solvation thermodynamics as many of its free energy functionals fail to accurately predict the energetic parameters of chemical systems.

These functionals, such as the Hyper-Netted-Chain (1D-RISM/HNC) and Kovalenko-Hirata (1D-RISM/KH) models, are too inaccurate for routine use and typically achieve absolute prediction errors above an order of magnitude from experimental values. More accurate models have since been developed, including the Gaussian Fluctuations (1D-RISM/GF) and partial wave (1D-RISM/PW) free energy expressions. However, although several studies have since reported reasonable qualitative agreement with experimental data, large predictive errors are still commonly observed for many chemical systems. There have been various efforts to increase the accuracy of 1D-RISM beyond improvements to the RISM theory, such as through the introduction of empirical corrections, or by replacing the free energy functional entirely with a machine learning model. Ratkova et al. introduced a hybrid RISM and cheminformatics model by including empirical corrections to 1D-RISM calculated SFE. By including correction parameters determined from chemical descriptors, this structural descriptors correction (SDC) model was able to lower the prediction error of small organic molecules in aqueous solution at 298 K to 1.2 kcal mol–1. However, the inclusion of data set specific descriptors limits the wider applicability of this approach, with the potential need for reparameterisation when new molecules are introduced. Machine learning based free energy functionals trained on 1D-RISM correlation functions were introduced by Palmer et al. This method, RISM-MOL-INF, could make accurate predictions of hydration free energy with partial least-squares (PLS) models on a limited data set of small organic molecules. The RISM-MOL-INF process has recently been overhauled, with a more robust model proposed by Fowles et al. 
This new method replaced the existing machine learning approach with a deep learning convolutional neural network (CNN), and enabled the accurate prediction of solvation free energy across aqueous and organic solvents beyond 298 K.

The more commonly used form of the RISM theory is 3D-RISM, which approximates the MOZ equations with a set of three-dimensional integral equations. The 3D-RISM equations relate the 3D intermolecular solvent–solute total correlation functions, h α (r), and direct correlation functions, c α (r):

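$$h_{\alpha}(\mathbf{r}) = \sum_{\xi=1}^{n_{\mathrm{solvent}}} \int_{\mathbb{R}^{3}} c_{\xi}(\mathbf{r}-\mathbf{r}')\,\chi_{\xi\alpha}(|\mathbf{r}'|)\,d\mathbf{r}' \tag{24}$$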

where ξ and α denote the indexes of sites in a solvent molecule and n solvent is the number of sites in a solvent molecule. The bulk solvent susceptibility function, χ ξα , describes the mutual correlations of sites ξ and α in solvent molecules in the bulk solvent. This solvent susceptibility function can be obtained from a preparatory 1D-RISM calculation: χ ξα (r) = ω ξα (r) + ρh ξα (r), where ω ξα (r) are intramolecular correlation functions, h ξα (r) are solvent–solvent site total correlation functions, and ρ is the solvent bulk number density.

To solve for h α (r) and c α (r), n solvent closure relations are introduced:

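$$h_{\alpha}(\mathbf{r}) = \exp\left(-\beta u_{\alpha}(\mathbf{r}) + h_{\alpha}(\mathbf{r}) - c_{\alpha}(\mathbf{r}) + B_{\alpha}(\mathbf{r})\right) - 1 \tag{25}$$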

Here, u α (r) is the interaction potential between the solute molecule and solvent site α, B α (r) are bridge functions, β = 1/k B T, k B is the Boltzmann constant, and T is temperature.

Generally, exact solutions cannot be computed for the bridge functions, which must therefore be approximated. The most commonly used closure relation is the KH closure developed by Kovalenko and Hirata, which improves upon the convergence rates obtained with the Hyper-Netted-Chain closure and prevents a possible divergence of the numerical solution of the RISM equations:

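$$h_{\alpha}(\mathbf{r}) = \begin{cases} \exp\left(\Xi_{\alpha}(\mathbf{r})\right) - 1 & \Xi_{\alpha}(\mathbf{r}) \le 0 \\ \Xi_{\alpha}(\mathbf{r}) & \Xi_{\alpha}(\mathbf{r}) > 0 \end{cases} \tag{26}$$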

where Ξ α (r) = – βu α (r) + h α (r) – c α (r).
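The difference between the Hyper-Netted-Chain and KH closures can be made concrete with a small pointwise sketch. It assumes the bridge function is set to zero in the HNC case and evaluates both closures at a single grid point. The function names are hypothetical and the sketch is not taken from any RISM package.

```python
import math


def hnc_closure(beta_u: float, h: float, c: float) -> float:
    """HNC closure at one grid point: h = exp(Xi) - 1 with the bridge term set to 0."""
    xi = -beta_u + h - c
    return math.expm1(xi)


def kh_closure(beta_u: float, h: float, c: float) -> float:
    """KH closure at one grid point.

    Uses the HNC form where Xi <= 0 (density depletion) and a linearized
    form where Xi > 0 (density enrichment), which keeps the update bounded.
    """
    xi = -beta_u + h - c
    if xi <= 0.0:
        return math.expm1(xi)
    return xi
```

For a strongly attractive site, for example Ξ = 5, the HNC expression gives exp(5) – 1 while the KH expression gives 5. This linear growth in the enrichment region is what makes the numerical iteration less prone to divergence.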

There are several approximate functionals available within RISM for the calculation of solvation free energy. These free energy functionals obtain an analytical solution from the total and direct correlation functions. Many of these functionals have been used extensively, but are generally too inaccurate for routine use and provide hydration free energies with a large error from experiment. The KH and GF free energy functionals are provided below.

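$$\Delta G_{\mathrm{solv}}^{\mathrm{KH}} = k_{B}T \sum_{\alpha=1}^{n_{\mathrm{solvent}}} \rho_{\alpha} \int_{\mathbb{R}^{3}} \left[\tfrac{1}{2}h_{\alpha}^{2}(\mathbf{r})\,\theta\left(-h_{\alpha}(\mathbf{r})\right) - c_{\alpha}(\mathbf{r}) - \tfrac{1}{2}c_{\alpha}(\mathbf{r})h_{\alpha}(\mathbf{r})\right] d\mathbf{r} \tag{27}$$

$$\Delta G_{\mathrm{solv}}^{\mathrm{GF}} = k_{B}T \sum_{\alpha=1}^{n_{\mathrm{solvent}}} \rho_{\alpha} \int_{\mathbb{R}^{3}} \left[-c_{\alpha}(\mathbf{r}) - \tfrac{1}{2}c_{\alpha}(\mathbf{r})h_{\alpha}(\mathbf{r})\right] d\mathbf{r} \tag{28}$$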

where ρ α is the number density of solvent sites α and θ is the Heaviside step function. A strong linear correlation was observed between the error in hydration free energies calculated using the 3D-RISM/GF model and the 3D-RISM calculated partial molar volume. From this observation, Palmer et al. proposed the universal correction (3D-RISM/UC) free energy functional.

$$\Delta G_{\mathrm{solv}}^{\mathrm{UC}} = \Delta G_{\mathrm{solv}}^{\mathrm{GF}} + a(\rho V) + b \tag{29}$$

Here, ρV is the dimensionless partial molar volume, a is the scaling coefficient and b is the intercept. The values of the scaling coefficient and intercept are obtained by linear regression against experimental data for simple organic molecules. The 3D-RISM/UC model has been shown to give accurate predictions of HFE to within 1 kcal mol–1 of experiment, with further applications to the calculation of hydration thermodynamics. Other semiempirical free energy functionals have been proposed, such as the partial wave correction functional for 1D-RISM (1D-RISM/PWC), or the cavity corrected functional (3D-RISM/CC). However, neither model has undergone significant validation beyond small organic molecules.
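As a rough illustration of eqs 27 to 29, the sketch below evaluates the KH and GF integrals as Riemann sums over a uniform grid and then fits and applies the universal correction. It assumes the correlation functions are supplied as flat lists of grid values per solvent site and that the dimensionless partial molar volume ρV has been computed elsewhere. All function and argument names are hypothetical, and the regression is ordinary least squares on the GF error, which is one plausible reading of the fitting procedure described above.

```python
from typing import Sequence, Tuple

Grid = Sequence[Sequence[float]]  # grid[a][i]: value for solvent site a at grid point i


def gf_free_energy(h: Grid, c: Grid, rho: Sequence[float],
                   kT: float, voxel_volume: float) -> float:
    """Gaussian fluctuations functional (eq 28) as a Riemann sum."""
    total = 0.0
    for h_a, c_a, rho_a in zip(h, c, rho):
        integrand = sum(-ci - 0.5 * ci * hi for hi, ci in zip(h_a, c_a))
        total += rho_a * integrand * voxel_volume
    return kT * total


def kh_free_energy(h: Grid, c: Grid, rho: Sequence[float],
                   kT: float, voxel_volume: float) -> float:
    """Kovalenko-Hirata functional (eq 27); theta(-h) is 1 only where h < 0."""
    total = 0.0
    for h_a, c_a, rho_a in zip(h, c, rho):
        integrand = sum(
            (0.5 * hi * hi if hi < 0.0 else 0.0) - ci - 0.5 * ci * hi
            for hi, ci in zip(h_a, c_a)
        )
        total += rho_a * integrand * voxel_volume
    return kT * total


def fit_uc(dg_gf: Sequence[float], rho_v: Sequence[float],
           dg_exp: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope a and intercept b of (dg_exp - dg_gf) against rho_v."""
    n = len(rho_v)
    if n < 2 or len(dg_gf) != n or len(dg_exp) != n:
        raise ValueError("need at least two matching training points")
    y = [e - g for e, g in zip(dg_exp, dg_gf)]
    x_mean = sum(rho_v) / n
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in rho_v)
    sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(rho_v, y))
    a = sxy / sxx
    return a, y_mean - a * x_mean


def uc_free_energy(dg_gf: float, rho_v: float, a: float, b: float) -> float:
    """Universal correction (eq 29)."""
    return dg_gf + a * rho_v + b
```

The coefficients a and b depend on the force field, closure, and training set used, so values fitted this way should not be transferred between setups without checking.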

A theoretically rigorous free energy functional was developed by Sergiievskyi et al., initially referred to as the initial state correction, but later renamed the pressure correction (3D-RISM/PC). This functional was originally developed for molecular density functional theory (MDFT), but can also improve solvation free energies calculated with 3D-RISM by correcting for an overestimation of the solvent pressure. A second functional, the advanced pressure correction (3D-RISM/PC+), was also developed by Sergiievskyi et al., and later shown by Misin et al. to be more accurate than the original PC functional for SFE prediction in 3D-RISM. More generally, advances in the 3D-RISM theory have enabled the investigation of a wide range of systems and effects, such as the influence of molecular orientation or solute parameters in the RISM theory, calculation of HFE for molecular ions, salting out effects, and protein–water interactions, as well as the introduction of coarse-grained solvent models. Hayashi et al. have recently proposed a hybrid approach involving the angle-dependent IET and 3D-RISM in which hydration free energies have been reported for large biomolecular systems with accuracies nearing those of molecular dynamics simulations. 3D-RISM derived descriptors have also been used in combination with machine learning to compute solvation free energies in a wide range of different solvents.

3.2.5. Solubility

This section will include a critical discussion of methods to obtain aqueous solubility, from the sublimation cycle, fusion cycle, and closely related methods to data-driven approaches. It is not intended to be an exhaustive list of published methods, but rather a summary of the various approaches. Interested readers are directed toward other reviews covering individual topics, including general introductions to the application of machine learning within computational chemistry. Amaro et al. discuss multiscale modeling of biological systems and its potential applications within drug discovery in their thorough review of the topic, with a recent study investigating its use for gas solubility. Figure 4 provides an illustration of routes that can be included across a range of physics-based methodologies for solubility prediction.

Figure 4. Illustration of routes considered in a sample of physics-based approaches to solubility prediction. Each method links the crystal and aqueous solution through a thermodynamic cycle. Blue = GSE, Green = Direct Coexistence, Black = Chemical Potentials, Red = Chemical Potentials, Pink = Chemical Potentials, Brown = Chemical Potentials, Orange = Density of States, Gray = Simulation Free, Purple = Fusion.

3.2.5.1. Sublimation Cycle

There have been a wide range of studies investigating aqueous solubility prediction through the sublimation cycle, due in part to the common use of techniques for the calculation of solvation free energy. A molecular simulation approach involving free energy perturbation calculations to estimate the intrinsic solubilities of drug-like small molecules was proposed by Mondal et al. With this approach, a crystalline lattice is generated through Monte Carlo simulations, and an averaged sublimation free energy is determined through a series of five individual FEP+ calculations. Intrinsic solubility is then estimated from the averaged sublimation free energy and a hydration free energy determined through a single FEP+ calculation. From a data set of 33 small drug-like compounds, a mean unsigned error of 0.6 log units was obtained. No detail was provided for calculated sublimation and hydration free energies, and so it is not known how individual errors contribute to calculated log S values. The authors also reported that this method was successfully integrated as part of a lead optimization study of 103 compounds, identifying a number of analogs with improved aqueous solubility. Schnieders et al. developed a combined MD and stochastic dynamics approach as an alternative to the standard free energy prediction methods used alongside MD simulations. By combining the AMOEBA polarizable force field with the orthogonal space random walk (OSRW) strategy, Schnieders et al. argued that the crystalline energy landscape could be better replicated than with fixed charge force fields and traditional free energy methods. The OSRW strategy, unlike other free energy methods, is capable of predicting the most favorable crystalline structure; however, a comparison against experimentally observed structures was not included in this study. Similarly to the method proposed by Mondal et al., the AMOEBA/OSRW approach averages calculated sublimation and hydration free energies from five independent simulations. From a data set of seven n-alkylamides, hydration, sublimation and solution free energies were reported alongside log S values. However, no experimental hydration or sublimation free energy data were provided. Experimental solution free energies and intrinsic solubilities were available for four of the seven n-alkylamides, with average errors of 1.1 kcal mol–1 and 0.7 log units, respectively. Both of these simulation based approaches are capable of predicting intrinsic solubility to within 1 log unit; however, a considerable number of simulations are needed for even a single compound, which may limit the applicability of such methods to larger data sets. The AMOEBA/OSRW approach also requires further validation on a larger data set of varied compounds, with a breakdown of component errors, to ensure that a favorable cancellation of errors is not responsible for the accurate log S predictions.

Palmer et al. have reported several methods of predicting intrinsic aqueous solubility for crystalline organic molecules via the sublimation cycle. In an introductory study, a joint direct computation and informatics based approach, using calculated thermodynamic properties combined with molecular descriptors in three-variable linear regression models was tested. These models were trained on a data set of 34 drug-like molecules, with calculated lattice energies, log P and rotatable bonds per compound included as molecular descriptors. An external test set of 26 molecules was used to validate this linear model, with an RMSE of 0.71 log units achieved. Hydration free energies and sublimation free energies were obtained via QM and CSP calculations, respectively. In this study, a focus was put on using affordable levels of theory with which to calculate thermodynamic parameters. As a result, QM calculations were performed using the B3LYP functional and SM5.4 or SCRF continuum solvent models. Similarly, solid phase calculations included a number of approximations. The 2RT approximation was included in the computation of sublimation enthalpy to avoid costly phonon mode calculations, with the crystal lattice energy determined using a model potential. The entropic contribution to sublimation was assumed to only include gaseous rotational and translational components, and the intramolecular crystalline vibrational contribution. The authors noted that intrinsic solubilities calculated directly from sublimation and hydration free energies did not reach sufficient accuracy, which will be partly a result of the wide ranging approximations used. Palmer et al. later demonstrated that the intrinsic solubility of crystalline drug-like molecules could be estimated from the direct computation of sublimation and hydration free energies. This approach used the same thermodynamic cycle, but with improved estimates of the sublimation and hydration terms. 
Model potential based calculations of the crystalline structure were performed at three different levels of theory (MP2, B3LYP, HF). Hydration free energy calculations were also extended beyond QM/implicit methods with the 3D-RISM/UC model. The sublimation and solvation free energies were calculated in different standard states such that the molar volume of the crystal was not required to compute solubility:

$$S = \frac{P^{o}}{RT}\exp\left(-\frac{\Delta G_{\mathrm{sub}}^{o} + \Delta G_{\mathrm{hyd}}^{*}}{RT}\right) \tag{30}$$

Here, P o is the standard atmospheric pressure, R is the molar gas constant, T is temperature, superscripts o and * denote the 1 atm and 1 mol/L standard states, respectively. From a data set of 25 compounds, an RMSE of 1.45 log units was obtained through a combination of 3D-RISM/UC hydration and B3LYP sublimation free energy calculations. The model-potentials derived from MP2 and B3LYP calculations provided almost identical sublimation free energy values, with RMSE of 5.63 and 5.66 kJ mol–1, respectively, suggesting the source of error for solid phase calculations may derive from the overall modeling strategy and the 2RT approximation rather than the level of theory. In the most recent study, Fowles et al. reported a more rigorous physics-based proof-of-concept that built upon previous work. Many of the approximations found in the sublimation free energy model were replaced with the explicit calculation of crystal phonon modes through full DFT based PBE-TS calculations. This captured the contributions of the vibrational modes of the crystal and removed the need for the 2RT approximation, allowing for a more accurate estimate of entropic contributions. By combining these improvements with a model-potential based lattice energy, more accurate estimates of sublimation free energy could be obtained than with the more approximate approaches previously reported. This approach was tested on a small data set of three drug-like compounds for which experimental log S were available. The authors also provided experimental hydration free energies and sublimation enthalpies for two compounds, enabling some comparison of methods. From calculated sublimation data, the authors concluded that the use of the 2RT approximation for molecules beyond small organic compounds does not provide reasonable accuracy, with as much as a 17-fold error in solubility when applied to this data set. 
Experimental and calculated sublimation enthalpies also agreed to within 1.1 kcal mol–1. Hydration free energies were evaluated at several levels of theory, with QM calculations involving the PBE, PBE0, and PBE0-DH functionals, as well as with atomistic MD/FEP; of these, the most accurate values were obtained from FEP calculations, with errors relative to experiment ranging from 0.38 to 0.86 kcal mol–1. The authors concluded that combining sublimation and hydration free energies which had been calculated using the highest levels of theory available for either component led to the most accurate log S predictions, as opposed to maintaining a consistent methodology throughout. From this combination of sublimation and hydration free energies, intrinsic solubilities were predicted with errors ranging from 0.13 to 1.10 log units, which the authors noted exceeded the performance of several machine learning models implemented for prediction on this data set. The improvements made to the sublimation free energy routine within this proof-of-concept provide a more rigorous methodology with which to predict intrinsic solubility through physics-based calculations. Figure 5 illustrates the progress that has been made in predicting intrinsic aqueous solubility from a thermodynamic cycle via the vapor using the methods discussed in this section. To further validate this approach, however, it must be tested on significantly more compounds, for which sufficient experimental thermodynamic data are available.
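To show how eq 30 turns the two free energies of the sublimation cycle into a solubility, a minimal numerical sketch follows. It assumes free energies supplied in kJ mol–1, a standard pressure of 101325 Pa for the 1 atm state, and a result reported as log10 of S in mol L–1. The function name and unit choices are illustrative and not those of the cited studies.

```python
import math

R = 8.314462618   # molar gas constant, J mol^-1 K^-1
P0 = 101325.0     # standard atmospheric pressure, Pa (1 atm)


def log10_intrinsic_solubility(dg_sub_kj: float, dg_hyd_kj: float,
                               temperature: float = 298.15) -> float:
    """log10(S / mol L^-1) from eq 30.

    dg_sub_kj: sublimation free energy in the 1 atm standard state (kJ/mol).
    dg_hyd_kj: hydration free energy in the 1 mol/L standard state (kJ/mol).
    """
    rt = R * temperature                                  # J/mol
    dg_solution = (dg_sub_kj + dg_hyd_kj) * 1000.0        # J/mol
    s_mol_per_m3 = (P0 / rt) * math.exp(-dg_solution / rt)
    return math.log10(s_mol_per_m3 / 1000.0)              # mol/m^3 -> mol/L
```

Because S depends exponentially on the summed free energies, an error of RT ln 10 (about 5.7 kJ mol–1 at 298 K) in either term shifts log S by one unit. By the same arithmetic, a 17-fold error in solubility corresponds to roughly 7 kJ mol–1.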

Figure 5. (a) Early methods to predict intrinsic aqueous solubility from a thermodynamic cycle via the vapor required parametrization against experimental solubility data to provide accurate estimates of solubility. (b, c) Improvements in modeling of sublimation and solvation free energies have enabled physics-based predictions of solubility with improving accuracy at increased computational expense. Reproduced and adapted from the cited references. Copyrights 2008, 2012, and 2021 American Chemical Society.

A straightforward physics-based approach was developed by Abramov et al. for use in guiding the chemical modification of lead compounds with poor solubility. This approach was intended to replace informatics based models which often rely on training data that does not include a sufficient description of the solid phase contribution. Here, intrinsic solubility was estimated from sublimation enthalpy and hydration free energy as log S ≈ – (ΔH sub + ΔG hyd ), where ΔH sub = – U lattice – 2RT. The authors noted the decision to approximate sublimation free energy from sublimation enthalpy was driven by the choice to avoid phonon mode calculations, and so is likely to be the main source of error in this approach, alongside errors introduced by the 2RT approximation. This method was tested on two pharmaceutical series with known poor solubility, benzoylphenylurea and benzodiazepine derivatives. As the main goal of this approach was not to explicitly predict intrinsic solubility, no detailed comparison against experimental values has been given, although experimental values have been provided alongside calculated sublimation and hydration free energies. The authors instead conclude that this physics-based approach provides a useful breakdown of sublimation and hydration contributions with which to guide molecular modification for the improvement of poor solubility.

In addition to predicting solubility, physics-based methods such as those discussed in this section provide valuable thermodynamic information about solute transfer from the crystalline phase to vapor and then to aqueous solution. Given that the solubility of a crystalline solute relies on both the properties of the undissolved crystal and the solution itself, these thermodynamic insights afford a deeper understanding not only of which of two molecules is more soluble, but also of the reasons behind the difference in solubility. For example, the solubility thermodynamic landscape shown in Figure 6 indicates how changes in solute–solute and solute–solvent interactions, as represented by sublimation and hydration free energies, affect solubility. By contrast, data-driven methods, being statistical rather than rooted in first principles, supply only a restricted statistical perspective on the core physicochemical mechanisms. Furthermore, as most data-driven models derive solubility predictions from molecular rather than crystalline structure, they cannot explain or predict varying solubilities across different polymorphs of the same molecule.

Figure 6. Solubility thermodynamic landscape created from data in Tables 2, 3, and 4 and eq 9 of the cited reference. The solubility of a crystalline organic molecule depends on the chemical potential of the solid and the solution. Solutes with similar solubilities may have very different sublimation and hydration thermodynamics. Reproduced and adapted from the cited reference. Copyright 2021 American Chemical Society.

3.2.5.2. Fusion Cycle

A multistage simulation procedure was proposed by Luder et al. for the calculation of amorphous and intrinsic aqueous solubility. With this procedure, solubilities were predicted from the free energy change for transferring a molecule from the pure amorphous phase into aqueous solution. A series of steps are carried out to determine the free energy change associated with the pure amorphous phase, in which the free energy is initially evaluated for a pure melt at 673 K and corrected for the desired free energy change at 298 K with a second set of simulations. The free energy of transfer, amorphous solubility and intrinsic solubility are evaluated with the following equations:

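$$\Delta G_{\mathrm{aw}} = -\Delta G_{\mathrm{va}} + \Delta G_{\mathrm{vw}} \tag{31}$$

$$S_{a} = \exp\left(-\Delta G_{\mathrm{aw}}/RT\right)/V_{m} \tag{32}$$

$$S_{0} = S_{a}\Big/\exp\left(\frac{\Delta S_{m}}{R}\ln\left(\frac{T_{m}}{T}\right)\right) \tag{33}$$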

Here, ΔG aw , ΔG va , ΔG vw are the free energies of transfer from amorphous to water, solvation in pure melt, and hydration, respectively. Subscripts a and 0 refer to amorphous and intrinsic solubility, respectively. ΔS m is the entropy of melting, T m is the melting point, and V m is the molar volume of a given molecule. Two separate methodologies were tested with this procedure: MD/FEP and a semiempirical model involving MC simulations. The semiempirical model included approximations from linear response and mean field theories, with the goal of obtaining accurate solubilities in a shorter time frame than more traditional methods. Across a data set of 46 drug molecules, the authors reported an RMSE of 13 kJ mol–1 between amorphous to pure melt free energies calculated with MD/FEP and their semiempirical model. Amorphous and intrinsic solubilities calculated with the semiempirical model were also compared against experimental values for 8 of the compounds in their data set. No statistical analysis was provided; however, reasonable correlation can be noted from plots included in the study. The authors attributed a large part of the error between simulation methods to uncertainty in the surface tension of the TIP4P water model. Further investigation by the authors concluded that the choice of force field plays a major role in the errors observed, with empirical parameters to correct for systematic errors necessary to achieve reliable predictions.
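The chain of eqs 31 to 33 can be sketched in a few lines. The sketch assumes the sign convention in which the amorphous-to-water transfer free energy is the hydration free energy minus the free energy of solvation in the pure melt, with free energies in J mol–1, the entropy of melting in J mol–1 K–1, and the molar volume in L mol–1, so that solubilities come out in mol L–1. Names and units are illustrative choices, not those of the original study.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def amorphous_solubility(dg_va: float, dg_vw: float,
                         molar_volume: float, temperature: float) -> float:
    """Eqs 31 and 32: solubility of the amorphous form.

    dg_va: free energy of solvation in the pure melt (vapor -> amorphous), J/mol.
    dg_vw: hydration free energy (vapor -> water), J/mol.
    """
    dg_aw = -dg_va + dg_vw  # amorphous -> water transfer free energy
    return math.exp(-dg_aw / (R * temperature)) / molar_volume


def intrinsic_solubility(s_amorphous: float, entropy_of_melting: float,
                         melting_point: float, temperature: float) -> float:
    """Eq 33: correct the amorphous solubility to the crystalline solid."""
    penalty = math.exp(entropy_of_melting / R * math.log(melting_point / temperature))
    return s_amorphous / penalty
```

For a solid below its melting point the correction factor exceeds one, so the predicted crystalline (intrinsic) solubility is always lower than the amorphous solubility, as expected physically.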

3.2.5.3. Closely Related Methods

Thompson et al. proposed an alternative approach for predicting intrinsic solubility from hydration free energies and vapor pressures. Experimental intrinsic aqueous solubilities were compared against values calculated for a data set of 75 liquid solutes and 7 solid solutes through QM calculations involving the SM5.42R continuum solvent model, and the HF, B3LYP or AM1 levels of theory. The relationship between hydration free energy, pure substance vapor pressure, and intrinsic solubility is given as

$$S = \left(\frac{P^{\circ}}{P}\right)\exp\left(-\frac{\Delta G_{\mathrm{hyd}}}{RT}\right) \tag{34}$$

where P is the pressure of an ideal gas at 1 molar concentration and 298 K, and P° is the equilibrium pure substance vapor pressure. An additional calculation is performed to obtain the vapor pressure for a given solute from its free energy of self-solvation. From calculations involving the B3LYP level of theory and SM5.42R solvent model, an MUE of 0.36 log units was achieved across the full data set of liquid and solid solutes. Calculated vapor pressures and hydration free energies were also compared against experimental values, with MUE of 0.31 log units and 0.37 kcal mol–1, respectively. The authors noted that no additional descriptors for solubility prediction were required, beyond those typically applied for the calculation of solvation free energy. As this method was primarily tested on liquid solutes, there is insufficient data with which to determine how this method would perform on a larger data set of crystalline solutes. As such, additional benchmarking is needed to determine the applicability of this approach.
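A short numerical sketch of eq 34 may help to fix the units. It assumes the vapor pressure is supplied in Pa and the hydration free energy in kJ mol–1, and it evaluates the reference pressure P as that of an ideal gas at a concentration of 1 mol L–1, so that S is returned in mol L–1. The function name and unit choices are illustrative.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def solubility_from_vapor_pressure(vapor_pressure_pa: float, dg_hyd_kj: float,
                                   temperature: float = 298.15) -> float:
    """Eq 34: S = (P_vap / P_1M) * exp(-dG_hyd / RT), returned in mol/L."""
    rt = R * temperature
    # Ideal gas at 1 mol/L = 1000 mol/m^3, so P = 1000 * R * T in Pa.
    p_one_molar = 1000.0 * rt
    return (vapor_pressure_pa / p_one_molar) * math.exp(-dg_hyd_kj * 1000.0 / rt)
```

At 298.15 K the reference pressure is about 2.48 MPa (roughly 24.5 atm), so a solute with a vapor pressure far below this value needs a strongly negative hydration free energy to reach molar solubility.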

The COSMO-RS method, originally developed for the prediction of equilibrium thermodynamics, has been expanded to the prediction of solubility for solid compounds. The COSMO-RSol method, proposed by Klamt et al., is a procedure in which fusion free energies are predicted via a multilinear regression model. This process includes three parameters obtained from a training set of 150 drug-like molecules: chemical potential of the compound in a solvated state, cavity volume and the number of ring atoms per solute. Solubilities are calculated from the fusion thermodynamic cycle, with predicted fusion free energies combined with mixing free energies calculated using COSMO-RS. This method was tested on a data set of 107 pesticide molecules, obtaining an RMSD of 0.61 log units, compared to an RMSD of 0.66 log units for the training set of 150 drug-like compounds. The authors also included a comparison against an HQSAR model, which had been trained on 405 pesticide molecules and achieved a prediction accuracy of 0.72 log units. To the best of our knowledge, COSMO-RSol has not undergone any further testing on additional data sets; however, similar methods have since been reported.

Bjelobrk et al. developed an MD approach for modeling the solubility of organic crystals through well-tempered metadynamics simulations. For a given solute–solvent pair, the free energy difference between the solvated solute and its crystallized state positioned at the edge of the crystal was measured. Simulations were performed at different solution concentrations, and the solubility was then determined as the concentration at which the free energies of the two states were equal. This method was tested on a single polymorph of urea and naphthalene in a range of organic solvents, with approximately five simulations per solute–solvent pair. The authors reported mole fraction solubilities with errors relative to experiment ranging from 0.0004 to 0.00507, as well as predicted melting points with errors of 14 K and 25 K for urea and naphthalene, respectively. It was noted that simulation times exceeding 750 ns were necessary to sufficiently sample this dissolution process. Although this method was only tested on organic solvents, it should be straightforward to extend to aqueous solvent. Neha et al. recently carried out a related molecular dynamics study, using both direct coexistence and chemical potential approaches in their investigations of the solubilities of three urea polymorphs.
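The final step of the approach of Bjelobrk et al., locating the concentration at which the free energy difference between the dissolved and crystalline states vanishes, can be sketched as a simple root-finding exercise. The linear interpolation between neighboring simulated concentrations used here is an assumption made for illustration, not necessarily the fitting procedure of the original study, and the function name is hypothetical.

```python
from typing import Sequence


def solubility_from_crossing(concentrations: Sequence[float],
                             delta_f: Sequence[float]) -> float:
    """Concentration at which delta_f (solution minus crystal) crosses zero.

    concentrations must be sorted in increasing order; delta_f holds the
    free energy difference obtained from one simulation per concentration.
    """
    if len(concentrations) != len(delta_f) or len(concentrations) < 2:
        raise ValueError("need at least two matching data points")
    for i in range(len(concentrations) - 1):
        c0, c1 = concentrations[i], concentrations[i + 1]
        f0, f1 = delta_f[i], delta_f[i + 1]
        if f0 == 0.0:
            return c0
        if f0 * f1 < 0.0:
            # Linear interpolation between the two bracketing simulations.
            return c0 + (0.0 - f0) * (c1 - c0) / (f1 - f0)
    if delta_f[-1] == 0.0:
        return concentrations[-1]
    raise ValueError("free energy difference does not change sign in the sampled range")
```

With roughly five simulations per solute–solvent pair, as reported above, the precision of such an estimate is limited by both the statistical error in each free energy difference and the spacing of the sampled concentrations.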

3.2.5.4. Data-Driven Approaches

Boobier et al. investigated the performance of various machine learning models for predicting solubility in aqueous and organic solvents. A range of linear and nonlinear algorithms were each trained on a set of 14 descriptors, of which 11 were derived from quantum mechanical calculations and the remaining 3 were taken from experimental measurements. Five solvent specific data sets were compiled to test this method, covering water, ethanol, benzene and acetone. Two separate data sets were used for water, the first of which included all available data, while the second only included data in the same solubility range as that available for organic solvents. A total of 900 data points were available for the full water data set, 560 for the reduced data set, and 695, 464, and 452 data points for ethanol, benzene, and acetone, respectively. Several results were reported as part of this study. First, nonlinear models all performed comparably, with the extra trees algorithm providing RMSE of 0.54–0.83 log units across all solvent data sets. On average, 70% of solute predictions were reportedly within 0.70 log units of experiment, which is in line with the experimental measurement error. Comparisons were also made against several commonly used prediction tools (AquaSol, EPISuite, and COSMOtherm) using the same evaluation data sets, as well as an external test set obtained from AstraZeneca. The extra trees models outperformed all three benchmarked tools across all data sets. The authors concluded that the availability of high quality experimental data and the choice of descriptors are more limiting than the chosen model for accurate solubility prediction.

A QSPR-based approach was proposed by Abramov, in which solubility is calculated via fusion and mixing free energies predicted from separate models. Each model used the Cubist algorithm, trained on a large set of Dragon and VolSurf+ descriptors. A set of in-house SMARTS keys was also included as a descriptor. Abramov noted that the fusion and mixing free energies were pseudovalues, with fusion values obtained from experimental fusion enthalpies and melting points, and mixing free energies determined using experimental log S and pseudo fusion free energies. For a data set of 62 drug-like molecules, an RMSE of 5.1 kJ mol⁻¹ was reported for both predicted fusion and mixing free energies against pseudoexperimental values, while an RMSE of 0.7 log units was obtained for log S. The author found no statistically significant relationship between the fusion free energy and the descriptors, with an average Pearson correlation coefficient of 0.21, and concluded that a better description of solid state interactions is needed to further improve upon current cheminformatics-based approaches.
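The bookkeeping behind such pseudovalues can be sketched as follows. This is an illustration under stated assumptions, not Abramov's published procedure: the fusion free energy is approximated as ΔH_fus(1 − T/T_m), which assumes ΔS_fus = ΔH_fus/T_m and neglects the heat capacity change on melting, and the solution free energy is taken as −RT ln(10) log S. All function names and input values are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1
T = 298.15          # temperature in K
RT_LN10 = R * T * math.log(10.0)  # free energy per log10 unit, kJ/mol


def pseudo_fusion_free_energy(dh_fus, t_melt):
    """Approximate fusion free energy at T from fusion enthalpy and melting point."""
    return dh_fus * (1.0 - T / t_melt)


def pseudo_mixing_free_energy(log_s, dg_fus):
    """Mixing free energy backed out from experimental log S and dg_fus."""
    dg_solution = -RT_LN10 * log_s
    return dg_solution - dg_fus


def log_s_from_components(dg_fus, dg_mix):
    """Recombine the two free energy terms into log S."""
    return -(dg_fus + dg_mix) / RT_LN10


# Hypothetical compound: fusion enthalpy 27 kJ/mol, melting point 443 K, log S = -3.
dg_fus = pseudo_fusion_free_energy(dh_fus=27.0, t_melt=443.0)
dg_mix = pseudo_mixing_free_energy(log_s=-3.0, dg_fus=dg_fus)
print(f"dG_fus = {dg_fus:.2f} kJ/mol, dG_mix = {dg_mix:.2f} kJ/mol")
print(f"recombined log S = {log_s_from_components(dg_fus, dg_mix):.2f}")

# For scale: a 5.1 kJ/mol error in one free energy term, expressed in log units.
print(f"5.1 kJ/mol corresponds to {5.1 / RT_LN10:.2f} log units at 298.15 K")
```

The final line gives a sense of scale: at room temperature one log unit of solubility corresponds to roughly 5.7 kJ/mol, so errors of a few kJ/mol in either term are significant.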

Vermeire et al. developed a series of deep neural networks capable of predicting a range of thermodynamic properties, including solvation free energy, enthalpy, and solubility of solid solutes at a range of temperatures for organic or aqueous solvent. Each model was trained on a considerable quantity of experimental data, with upward of 11000 data points used for the solvation free energy and solubility models. Solvation enthalpy and free energy models were initially trained on 1 million and 800000 quantum mechanically calculated values via a transfer learning approach, respectively, before fine-tuning with experimental data. Solubility predictions in a given solvent involved the use of reference values, either obtained from experimental measurements in organic solvent or from predictions of the aqueous solubility.

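$$\log(S_{X,298\,\mathrm{K}}) = \log(S_{\mathrm{ref},298\,\mathrm{K}}) - \frac{\Delta G_{X,\mathrm{solv},298\,\mathrm{K}} - \Delta G_{\mathrm{ref},\mathrm{solv},298\,\mathrm{K}}}{R \cdot 298\,\mathrm{K}} \tag{35}$$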

Here, subscript ref refers to a reference value, and subscript X denotes the solvent for which the solubility is being calculated. For solubility predictions beyond 298 K, the dissolution enthalpy at 298 K is introduced as the sum of the sublimation and solvation enthalpies.

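$$\ln\left(\frac{S_T}{S_{298\,\mathrm{K}}}\right) = -\frac{\Delta H_{\mathrm{sub},298\,\mathrm{K}} + \Delta H_{\mathrm{solv},298\,\mathrm{K}}}{R}\left(\frac{1}{T} - \frac{1}{298\,\mathrm{K}}\right) \tag{36}$$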

Here, ΔH_sub,298K is the sublimation enthalpy for a given solute–solvent pair, determined from multilinear regression coefficients. ΔH_solv,298K is the predicted solvation enthalpy. The authors note that the temperature dependence of the dissolution enthalpy can be accounted for through numerical integration for systems nearing the critical temperature of a solvent. For predictions involving aqueous solubility as a reference value, RMSEs of 0.89 log units and 1.49 log units were obtained for 1051 data points at 298 K and 4922 data points over a 243–364 K range, respectively. The authors note that when experimental solubilities are available, these errors reduce further, as is the case for experimental data in ethanol, with which the errors decrease to 0.29 log units for 785 data points and 0.44 log units for 3071 data points, respectively. This procedure relies on a series of submodels to make accurate predictions of solvation thermodynamic parameters before solubility can be determined. If a single model fails within this series, any subsequent parameters may not be obtainable. Moreover, although the inclusion of experimental data may improve the quality of some predictions, it will not always be readily available.
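The two relations above (eqs 35 and 36) can be illustrated with a short sketch. It is not the authors' implementation: the function names and all input values are hypothetical, the signs are those required for a more favorable solvation free energy to increase solubility, and the factor ln(10) is included on the assumption that "log" in eq 35 denotes a base-10 logarithm.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1
T_REF = 298.15      # reference temperature in K (written as 298 K in the text)
LN10 = math.log(10.0)


def log10_s_in_solvent(log10_s_ref, dg_solv_x, dg_solv_ref):
    """Transfer a reference solubility to solvent X at T_REF (eq 35 pattern).

    Free energies are in kJ/mol. Dividing by ln(10) assumes that "log"
    in eq 35 is a base-10 logarithm.
    """
    return log10_s_ref - (dg_solv_x - dg_solv_ref) / (R * T_REF * LN10)


def log10_s_at_temperature(log10_s_ref_temp, dh_sub, dh_solv, temperature):
    """Extrapolate solubility from T_REF to another temperature (eq 36 pattern).

    The dissolution enthalpy (dh_sub + dh_solv, in kJ/mol) is held constant.
    """
    dh_diss = dh_sub + dh_solv
    ln_ratio = -(dh_diss / R) * (1.0 / temperature - 1.0 / T_REF)
    return log10_s_ref_temp + ln_ratio / LN10


# Hypothetical inputs, chosen only to show the direction of each effect.
log_s_water = -2.0  # reference log10 S
log_s_x = log10_s_in_solvent(log_s_water, dg_solv_x=-30.0, dg_solv_ref=-24.0)
log_s_x_warm = log10_s_at_temperature(log_s_x, dh_sub=100.0, dh_solv=-70.0,
                                      temperature=318.15)
print(f"log10 S in solvent X at 298 K: {log_s_x:.2f}")
print(f"log10 S in solvent X at 318 K: {log_s_x_warm:.2f}")
```

In this example the solvation free energy in solvent X is 6 kJ/mol more favorable than in the reference solvent, which raises log S by about one unit, and the endothermic dissolution enthalpy (30 kJ/mol) raises it further on warming.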

With solubility being an important physicochemical property for organic molecules across a variety of fields, many different machine learning approaches have been proposed for its accurate prediction from both physics-based and purely data-driven foundations. As such, new and updated methodologies are regularly released which cannot all be covered in detail. Interested readers are directed toward several more recent articles mentioned here. The relative strengths of graph-based and molecular descriptor featurisation for solubility prediction have been discussed independently by several groups. Similar studies have reported the most effective features from descriptor and fingerprint-based approaches. Francoeur et al. proposed a SMILES-based molecule attention transformer (MAT) model which could obtain accuracies within 1.7 log S units of experiment. Analyses of the publicly available solubility data commonly used for developing and validating machine learning models have been given by Sorkun et al. and Llompart et al., who aim to scrutinize the quality of existing data sets and understand the factors influencing model performance. They conclude that there is not as yet good evidence of advanced neural network models leading to a breakthrough in predictivity for solubility. Ramos et al. developed an ensemble of recurrent neural networks for predicting the solubility of small molecules, which is available for public use via an online application. Several studies have also proposed generalized approaches for solubility prediction in water and a range of organic solvents. Lastly, Llinas et al. began a series of solubility challenges within the data-driven community to provide insight into the established approaches for solubility prediction. The first challenge posed the question of whether a small data set of 100 compounds with high-precision solubility measurements was sufficient to accurately predict the intrinsic solubility of 32 druglike compounds. This blind challenge found significant variation in prediction accuracy across the wide range of proposed models, and spurred later discussion on the limitations of existing molecular descriptors used for solubility prediction. The second challenge provided two test sets of differing quality, one including curated gold standard shake-flask measurements and the other made up of compounds with greater uncertainty, to which contestants were free to apply their own prediction methods. The findings from this second challenge showed minimal improvement in prediction accuracy from the use of more complex machine learning algorithms when compared to the first challenge, and emphasized the need for high quality open source data sets with clearer traceability for experimental errors that may be introduced during model development.

3.2.5.5. Solubility of Metastable Polymorphs

As discussed in Section , many, perhaps even most, organic compounds are capable of crystallizing into two or more polymorphs with distinct crystal structures and different solubilities. This can have important consequences for industrial applications. A good example is provided by the pharmaceutical sector, where polymorphic form and solubility influence dosage formulation; the problem that Abbott Laboratories had with Ritonavir providing a case in point.

Experimental determination of the solubility of a metastable polymorph is challenging because metastable forms may transform into more stable forms during solubility assays, and because identifying the crystalline form that is present in equilibrium with the saturated solution is difficult. As described in Section , the cycling between super- and subsaturated solutions in potentiometric solubility assays can sometimes help to identify changes in crystalline polymorphic form, but these methods are not guaranteed to find any or all of the metastable polymorphs. Recently, there has been growing interest in semiexperimental and computational methods to assess the solubility of metastable polymorphs. The former rely on experimental estimates of the solubility of a stable polymorphic form and the difference in stability of the stable and metastable polymorphic forms; as such, they cannot be used for predictions prior to compound synthesis. Semiexperimental methods to assess polymorph solubility are discussed in a recent review. Conversely, computational methods provide a means to predictively assess the solubility thermodynamic landscape. Considering the relationship between sublimation free energy, solvation free energy and solubility that is expressed in a thermodynamic cycle via the vapor (Section ), it is evident that the relative solubilities of polymorphs do not depend on the solute's solvation free energy. Therefore, calculation of the sublimation free energy (or equivalently the free energy or chemical potential of the solid form) is sufficient to estimate the relative solubility of different polymorphs, and when combined with a calculated solubility for a stable form provides the absolute solubility of each polymorph. Despite recent advances in crystal structure prediction (Section ), recent studies have highlighted the need for greater accuracy in polymorph ranking methods. Physics-based methods have been used to predict polymorph solubility from both predicted and experimental crystal structures. Neha et al. used molecular dynamics and both the direct coexistence and chemical potential methods (Section ) to compute the solubilities of three urea polymorphs. Mortazavi et al. used crystal structure prediction to assess the relative solubilities of polymorphs of rotigotine, leading to the discovery of a new stable form.
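Because the solvation free energy cancels, the ratio of polymorph solubilities follows directly from the free energy difference between the solid forms, S_meta/S_stable = exp[(G_meta − G_stable)/RT]. The sketch below illustrates this, assuming that both forms dissolve to the same dilute solution so that activity coefficients cancel. The function names, polymorph labels and numerical values are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def solubility_ratio(dg_meta_minus_stable, temperature=298.15):
    """S_meta / S_stable from the solid-state free energy difference in kJ/mol."""
    return math.exp(dg_meta_minus_stable / (R * temperature))


def absolute_solubilities(s_stable, relative_free_energies, temperature=298.15):
    """Scale the solubility of the stable form to each polymorph.

    relative_free_energies maps a polymorph label to G_form - G_stable (kJ/mol).
    """
    return {
        label: s_stable * solubility_ratio(dg, temperature)
        for label, dg in relative_free_energies.items()
    }


# Hypothetical lattice free energy differences relative to the stable form I.
forms = {"I": 0.0, "II": 2.0, "III": 4.5}
for label, s in absolute_solubilities(s_stable=1.0e-4, relative_free_energies=forms).items():
    print(f"form {label}: S = {s:.2e} mol/L")
```

With these numbers a metastable form lying only 2 kJ/mol above the stable form is a little more than twice as soluble, which shows why polymorph ranking errors of a few kJ/mol translate into sizeable solubility errors.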

4. Toward Universal Solubility Prediction Using Physics-Based Methods

Although most physics-based methods have focused on the prediction of intrinsic aqueous solubility under ambient conditions, there is a growing literature on their use for solubility prediction in other conditions, including in nonaqueous solvents or as a function of pH. In this section, recent applications of physics-based methods in these areas are summarized, while areas for future work are highlighted. The current state-of-the-art of data-driven (and semiempirical) approaches will also be described to provide context, but the coverage of those areas is not intended to be exhaustive in that respect.

Since physics-based methods do not need to be (re)parametrized on solubility data, unlike data-driven methods, their application to nonaqueous solvents or varying pH should be an opportunity to demonstrate their versatility. In practice though, there has been less work done in these areas than on the prediction of aqueous solubility under ambient conditions. In part this may reflect the importance of water in biological systems, but it also reflects some practical challenges. In moving to nonaqueous solvents, problems may be encountered in accurately modeling the liquid, which requires different intermolecular potentials and, for larger solvent molecules, better configurational sampling. Similarly, the first-principles prediction of pH-dependent solubility requires calculations of acid–base dissociation constants (pKa), which although well-established in the literature, and implemented in some commercial software, remain computationally expensive and prone to errors for some classes of compounds. The solubility prediction field is expected to benefit from ongoing advances in theory, computation and computing power in related fields in the coming years.

4.1. pH-Dependent Solubility

pH can significantly influence the apparent solubility of ionizable compounds, which we define as the total solubility of the neutral and ionized fractions of the molecule. In the literature, a distinction is sometimes made between bulk pH and microenvironmental pH, particularly for systems where localized pH differences exist. Bulk pH refers to the overall pH of the solution, affecting solubility by altering the ionization state of the solute. For instance, weak acids and bases have greater aqueous solubility when ionized, which depends on whether the bulk pH is above or below the pKa values of their titratable groups (as discussed in Section ). The majority of methods to predict pH-dependent solubility implicitly treat the pH as the bulk pH. Microenvironmental pH represents localized pH variations, often near surfaces or within confined regions, such as near solid particles or in biological environments like cells. These localized pH shifts can cause different ionization states of a compound compared to the bulk solution, leading to enhanced or reduced solubility in these localized regions, particularly if diffusion or mixing of these layers is slow. Such changes in local solubility cause issues in several industries, including pharmaceutical development, where the microenvironmental pH in the solvent close to the surface of a dosage form can influence the local solubility and hence the dissolution rate, absorption and bioavailability. A common strategy to improve dosage forms is to include an additive to modify the pH in the microenvironment to control dissolution. The models used to relate microenvironmental pH, solubility and dissolution rate in the vicinity of a surface (such as a dissolving tablet) fall into three broad categories based on their underlying assumptions. First, the composition of the system at the solid/liquid interface can be assumed to be at thermodynamic equilibrium. This assumes that saturation takes place and other processes such as dissolution do not provide a rate-determining step. Second, the composition near the surface can be assumed to be diffusion-controlled, in which case a diffusional boundary layer adjacent to the dissolving surface exists, and models must account for mass transport through this layer. Third, the composition in the vicinity of the surface can be assumed to be controlled by the dissolution rate. In some cases, the dissolution rate can be modeled more accurately by modifying the Noyes–Whitney equation (Section ) to account for the influence of microenvironmental pH on solubility. The study of microenvironmental pH is a large and growing field and readers are directed to several excellent review articles for a comprehensive overview of this topic.

Although changing the pH can be an effective way to control the total apparent solubility of an ionizable molecule, it does not affect the intrinsic solubility (neutral form only), except indirectly if the choice of buffer has other effects on the solute (e.g., salting in/out or promoting aggregation). Hence, one strategy to model pH-dependent solubility is to make separate predictions of intrinsic solubility and pKa and then to apply the Henderson–Hasselbalch equation or a similar relationship. From a physics-based perspective, the acid–base dissociation constant, pKa, can be estimated from quantum mechanics calculations and a thermodynamic cycle via the gas-phase. Conceptually, this approach can be considered to comprise two steps: (i) a gas-phase calculation of the pKa; (ii) correction of the gas-phase pKa by the SFEs of the neutral and ionized species. The SFE of the proton is normally obtained from standard data tables. Since several different values for it have been published, care must be taken in selecting the appropriate one, considering both the accuracy of the experimental measurements and the manner in which the data is reported, including the standard states and, for example, whether the Galvani potential for bringing the proton from the gas-phase into the solvent dielectric has been included. The other terms are obtained from quantum mechanics calculations, where the solvent environment is commonly incorporated using an implicit continuum model, but can also be included by methods based on integral equation theory or classical density functional theory. Explicit solvent models can also be used, but then molecular dynamics simulations are required to sample the configurational space of the system, thereby dramatically increasing the computational expense of the calculations. Since the added computational expense does not necessarily bring greater accuracy, these approaches are currently less relevant to pH-dependent solubility prediction. A challenge in calculating pKa by physics-based methods is obtaining consistency between the SFEs calculated for the neutral and ionized solutes and those obtained from data tables for the proton; since the magnitude of these numbers is typically large, small discrepancies can lead to large errors in pKa. Nonetheless, reliable predictions with accuracies of ± 0.1 to 0.2 in the value of pKa are expected for small organic molecules with single titratable groups.
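The thermodynamic cycle can be sketched as follows for an acid HA → A⁻ + H⁺. This is a minimal illustration under stated assumptions rather than a recommended protocol: the proton SFE of −265.9 kcal/mol is one commonly quoted value for a 1 M standard state, the term RT ln(24.46) converts the gas-phase standard state from 1 atm to 1 M, and the remaining inputs are hypothetical numbers of the magnitude expected for a small carboxylic acid.

```python
import math

R = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T = 298.15       # temperature in K
RT_LN10 = R * T * math.log(10.0)  # free energy per pKa unit, about 1.36 kcal/mol

# Assumed constants. Several proton solvation free energies have been published;
# -265.9 kcal/mol is one commonly quoted value for a 1 M standard state.
DG_SOLV_PROTON = -265.9
# Converts the gas-phase standard state from 1 atm to 1 M (about 1.89 kcal/mol).
DG_1ATM_TO_1M = R * T * math.log(24.46)


def pka_from_cycle(dg_gas_deprot, dg_solv_acid, dg_solv_base,
                   dg_solv_proton=DG_SOLV_PROTON):
    """pKa of HA -> A- + H+ from a gas-phase thermodynamic cycle.

    dg_gas_deprot: gas-phase deprotonation free energy at 1 atm (includes H+).
    dg_solv_acid, dg_solv_base: solvation free energies of HA and A-.
    All energies are in kcal/mol.
    """
    dg_aq = (dg_gas_deprot + DG_1ATM_TO_1M
             + dg_solv_base + dg_solv_proton - dg_solv_acid)
    return dg_aq / RT_LN10


# Hypothetical inputs, not taken from the review.
pka = pka_from_cycle(dg_gas_deprot=341.1, dg_solv_acid=-6.7, dg_solv_base=-77.6)
pka_shifted = pka_from_cycle(dg_gas_deprot=341.1, dg_solv_acid=-6.7, dg_solv_base=-76.6)
print(f"pKa = {pka:.2f}")
print(f"pKa with a 1 kcal/mol error in the anion SFE = {pka_shifted:.2f}")
```

The second call shows the sensitivity discussed above: the final free energy is a small difference between terms of several hundred kcal/mol, and an error of 1 kcal/mol in any one of them shifts the pKa by about 0.7 units.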

The same strategy of predicting intrinsic aqueous solubility and pKa separately and then using them to estimate pH-dependent solubility can also be used by semiempirical and data-driven approaches. In this case, pKa is normally calculated directly from parametrized models rather than indirectly from a thermodynamic cycle via the gas-phase as is done by physics-based approaches. A wide variety of such parametrized models exist to predict pKa, including group contribution methods, descriptor-based methods, and graph-based methods. In principle, any of the QSPR approaches that were described in the context of solubility prediction in Section could be retrained to predict pKa instead. There is no clear consensus about which set of features or supervised learning algorithm is most accurate, with both traditional statistical models and AI performing well for certain classes of molecules. Several methods are expected to predict pKa with an accuracy of ± 0.1 to 0.2 pKa units. Regardless of whether intrinsic solubility and pKa are calculated from physics-based methods, semiempirical approaches or data-driven models, or indeed whether one of the values is obtained from experiment, a combination of the errors from the separate calculations or experiments, and from the assumptions in the Henderson–Hasselbalch equation, can lead to significant errors in solubility pH profiles in unfavorable cases.

The Henderson–Hasselbalch equation can be considered to be an empirical correction to the intrinsic solubility based on the difference between the pKa values of the solute and the experimental pH. For a weak base, it predicts a 10-fold increase in solubility for each unit decrease in pH below the pKa. In reality, this increase in solubility is limited by the solubility of the salt (which is not infinite) and may be affected by other factors such as ion-pair formation or aggregation of the solute in solution that are not captured by the equation. As a consequence, predictions from the Henderson–Hasselbalch equation do not always agree with experiment, even when intrinsic solubility and pKa are known accurately. Bergström et al. demonstrated that pH-dependent solubility profiles deviated from those predicted by the Henderson–Hasselbalch equation for a data set of 25 amines. Bonin et al. reported that the Henderson–Hasselbalch equation gave poor predictions of pH-dependent solubility for compounds at Bayer, though data was not provided. Nonetheless, some successes have been reported. Völgyi et al. showed that the Henderson–Hasselbalch equation was valid for six nitrogen-containing heterocycles provided that accurate pKa and S0 values were employed. Using pKa values derived from the Marvin software, and intrinsic solubility predictions derived from a model trained on 4500 compounds from the PHYSPROP data set, Hansen et al. calculated pH-dependent solubility profiles with an RMSE of 0.79 log units on a set of 22 compounds. Johnson et al. proposed a hybrid model to predict aqueous solubility that accounted for ionization effects and crystal packing. By employing a predicted pKa and the Henderson–Hasselbalch equation, they were able to calculate the pH-dependent solubility with reasonable accuracy for some example drug-like molecules. The method introduced several interesting ideas, but was limited by the need for an experimental or simulated crystal structure and a reliance on molecular simulation, which increased its computational expense. Several commercial companies provide modules for the prediction of pH-dependent solubility based on some form of the Henderson–Hasselbalch equation, including Percepta from ACD/Labs and the ADMET Predictor software from Simulations Plus, but the details of these methods are not disclosed.
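For a monoprotic compound, the Henderson–Hasselbalch relationship gives log S = log S0 + log10(1 + 10^(pKa − pH)) for a base and log S = log S0 + log10(1 + 10^(pH − pKa)) for an acid. The sketch below applies these expressions. The optional `log_s_salt` argument is a crude, hypothetical way of representing the plateau imposed by the finite solubility of the salt, and the compound parameters are invented for illustration.

```python
import math


def log_s_total(log_s0, pka, ph, kind, log_s_salt=None):
    """Total log10 solubility of a monoprotic acid or base at a given pH.

    log_s0: intrinsic solubility (neutral form) as log10 S.
    kind: "acid" or "base".
    log_s_salt: optional upper limit standing in for the salt solubility.
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    log_s = log_s0 + math.log10(1.0 + 10.0 ** exponent)
    if log_s_salt is not None:
        log_s = min(log_s, log_s_salt)
    return log_s


# Hypothetical weak base: log S0 = -5.0, pKa = 9.0, salt plateau at log S = -1.0.
for ph in (2.0, 4.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0):
    uncapped = log_s_total(-5.0, 9.0, ph, "base")
    capped = log_s_total(-5.0, 9.0, ph, "base", log_s_salt=-1.0)
    print(f"pH {ph:4.1f}: log S = {uncapped:6.2f} (uncapped), {capped:6.2f} (capped)")
```

Well below the pKa the uncapped profile rises by close to one log unit per pH unit, as stated above, whereas the capped profile levels off once the assumed salt solubility is reached.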

Due to the challenges faced in estimating pH-dependent solubility from calculated intrinsic solubility and pKa, some data-driven approaches have been developed to predict it directly. These methods can be broadly categorized as those that are trained to predict total solubility at a prespecified pH, and those that are trained to predict total solubility at a user-specified pH. The methods in the first category are often trained on data from a specific experimental assay, using, for example, a given buffer system, which implicitly limits their domain of applicability, but reduces interassay variability. The methods in the latter category accept pH as an input, thereby allowing them to be used to make predictions at different pH values. One of the best examples of both categories is the work by Bonin et al., who collated nearly 300000 data points from 11 solubility assays and showed that a multitask neural network trained to predict solubility at three selected pH values (acidic, neutral, basic) gave more accurate predictions than single-task neural networks or Random Forest models based on ECFP-6 fingerprints. The alternative strategy of adding pH as an input variable to the machine learning models led to less reliable models. A multitask neural network is a machine learning algorithm trained to predict multiple properties at the same time (in this instance, solubility at three pH values). During training, the model for each property benefits from information gleaned from the other tasks. Other work on direct prediction of pH-dependent solubility has generally been validated on smaller data sets or homologous compound families. Sun et al. developed a five-variable multilinear regression model to predict pH-dependent solubility, which showed good correlations between calculated and experimental values for 25 molecules. Galarza and Gomez reported predictions of pH-dependent solubility for 258 compounds using ACD/Labs software. Aleksic et al. compared several models developed at Boehringer Ingelheim for the prediction of solubility at pH 2.2, 4.5 and 6.8. Compared to work on intrinsic solubility prediction, there have been relatively few studies on machine learning approaches to pH-dependent solubility; this is an area where further work is required, which would be supported by the measurement of new accurate pH-dependent solubility data.

4.2. Nonaqueous Solvents

Successful physics-based calculation of solubility in organic solvents has already been demonstrated by several groups, but these studies have often focused on small numbers of solutes or limited ranges of conditions, and larger-scale studies comparing different computational methods on polyfunctional solutes (i.e., drugs, pesticides, etc.) are lacking. Nonetheless, the progress made by physics-based methods is encouraging, and several of the existing methods show promising results.

Several of the physics-based approaches that were discussed in Section X have been applied to the prediction of absolute intrinsic solubility in organic solvents. Based on a methodology that was initially applied by Li and co-workers to compute the solubility of naphthalene in water, Bellucci et al. used the extended Einstein-Crystal methodology to predict the solubility of paracetamol in ethanol, with good agreement between the calculated (0.085 ± 0.014 in mole ratio) and experimental (average 0.0585 ± 0.004) values. Although a customized force field was required to model the solution, the parametrization was not carried out against ethanol solubility data, and the associated methodology provides a general approach to force field development for solubility calculations. Bjelobrk et al. computed the solubility of urea in acetonitrile, ethanol, and methanol, and of naphthalene in ethanol and toluene. Since solubility is measured at thermodynamic equilibrium, it is defined as the concentration at which the chemical potentials of the dissolved and undissolved solutes are identical. Here the authors used molecular dynamics simulations of slow-growing crystalline surfaces, rather than the bulk crystalline phase, to probe equilibrium. The free energy difference between the solvated solute (A) and its crystallized state at a crystal surface kink site (B), ΔF = F_B − F_A, was calculated to indicate whether the solution was undersaturated, ΔF > 0, at solubility, ΔF = 0, or supersaturated, ΔF < 0. Equilibrium solubility was identified by extrapolating between the results of simulations carried out at different solute concentrations. The computed solubility values were not in quantitative agreement with experiment, which the authors attributed to deficiencies in the GAFF force field, but they were of the right order of magnitude and they did reveal the correct trends between solvents.
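Locating the concentration at which ΔF changes sign can be illustrated with a short sketch. Linear interpolation between the two bracketing simulations is an assumption made here for simplicity and is not necessarily the fitting procedure used by the authors. The function name and the data points are hypothetical.

```python
def solubility_from_free_energy(concentrations, delta_f):
    """Concentration at which delta_f crosses zero, by linear interpolation.

    concentrations: solute concentrations of the individual simulations.
    delta_f: free energy of the kink-site state minus the solvated state,
             positive when the solution is undersaturated.
    """
    points = sorted(zip(concentrations, delta_f))
    for (c0, f0), (c1, f1) in zip(points, points[1:]):
        if f0 == 0.0:
            return c0
        if f0 * f1 < 0.0:
            # Straight line through (c0, f0) and (c1, f1), solved for f = 0.
            return c0 + (c1 - c0) * f0 / (f0 - f1)
    if points[-1][1] == 0.0:
        return points[-1][0]
    raise ValueError("delta_f does not change sign over the sampled concentrations")


# Hypothetical results from five simulations (mole fraction, kJ/mol).
mole_fractions = [0.01, 0.02, 0.03, 0.04, 0.05]
free_energy_differences = [3.0, 1.8, 0.6, -0.6, -1.8]

solubility = solubility_from_free_energy(mole_fractions, free_energy_differences)
print(f"estimated solubility: mole fraction {solubility:.3f}")
```

If every simulation lies on the same side of saturation the function raises an error instead of extrapolating, which signals that further concentrations need to be simulated.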

Some groups have focused on the calculation of the relative solubility in different solvents rather than absolute solubility. Assuming the precipitate is in the same solid-form in each solvent, the influence of the solid-state can be ignored. The relative solubility of a solute in two solvents can then be assessed from the difference in the solvation free energies of the solute in each solvent. Consequently, calculating relative solubility is a simpler problem than calculating absolute solubility. Liu et al. used alchemical free energy calculations with molecular dynamics simulations to calculate the relative solubilities of 53 solute–solvent pairs representing 8 small organic solutes and 36 solvents. Using alchemical calculations of solvation free energy, the authors report a good correlation between calculated and experimental relative solubility (R² = 0.85) and a respectable error in ln(S_A/S_B) (RMSE = 1.17). Interestingly, poorer predictions of relative solubility were observed when DFT with an implicit continuum model, the SMD method, was used to compute solvation free energy (R² = 0.1). This observation seems at odds with other studies that have used implicit continuum models successfully for the calculation of solvation free energy in organic solvents, and solubility. Notably, Thompson et al. calculated absolute intrinsic aqueous solubilities of liquids and a small number of solids from solvation free energies and vaporisation energies using the SM5.42R model and obtained a mean unsigned error in log10 S of 0.36.
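The relationship used for relative solubility, and the scale of the error quoted for ln(S_A/S_B), can be written out explicitly. The conversion below assumes a temperature of 298.15 K, which may differ from the temperatures of the individual measurements, and the superscripts A and B label the two solvents.

```latex
% Same solid form in both solvents, so the solid-state term cancels:
\ln\frac{S_A}{S_B} = -\frac{\Delta G_{\mathrm{solv}}^{A} - \Delta G_{\mathrm{solv}}^{B}}{RT}

% Scale of an RMSE of 1.17 in ln(S_A/S_B), assuming T = 298.15 K:
\frac{1.17}{\ln 10} \approx 0.51 \ \text{log}_{10}\ \text{units},
\qquad
1.17 \times RT \approx 1.17 \times 2.479\ \mathrm{kJ\,mol^{-1}} \approx 2.9\ \mathrm{kJ\,mol^{-1}}
```

On this basis the reported error corresponds to roughly half a log unit in the solubility ratio, or about 2.9 kJ mol⁻¹ (0.7 kcal mol⁻¹) in the difference between the two solvation free energies.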

5. Conclusions

Prediction of solubility is essential in fields such as pharmaceutical development, materials science, and environmental chemistry, and ongoing advances in both physics-based and data-driven methods are paving the way for substantial improvements in accuracy and utility. Each approach has its strengths and limitations, and the future of solubility prediction will involve advances in both areas, as well as innovative hybridization of these techniques to overcome their respective challenges.

Physics-based methods, grounded in first-principles, offer a range of potential advantages that make them highly appealing for solubility prediction. These approaches, based on quantum mechanics and/or molecular mechanics calculations, do not rely on parametrization against experimental solubility data, which in theory should allow them to be applied across a broad range of chemical spaces without the constraint of training data availability. Unlike data-driven models, which may struggle to extrapolate beyond the scope of their training sets, physics-based models should theoretically predict solubility in entirely new chemical classes and under different conditions or solvents without requiring extensive reoptimization. Furthermore, these methods yield valuable chemical and thermodynamic insights, such as enthalpic and entropic contributions to solubility, which are useful for optimizing processes like crystallization or solvent selection. Another critical advantage is that these methods can be systematically refined as our understanding of molecular interactions improves or computational power increases, offering a pathway to enhanced accuracy over time.

However, despite their theoretical appeal, physics-based methods are currently less frequently applied in practice compared to data-driven approaches. This disparity stems primarily from several key challenges. First, while physics-based methods provide fundamental insights, they often struggle with accuracy, especially when compared to data-driven methods trained on high-quality solubility data for the given chemical class of interest. When suitable training data is available for the desired chemical domain, machine learning (ML) models can outperform physics-based approaches in terms of accuracy. Additionally, the computational demand of physics-based methods can be prohibitive for some applications, as they often require significant processing power and time to simulate solvation dynamics or crystal structures accurately. Another practical limitation is that physics-based methods frequently depend on the availability of a known or predicted crystal structure, which may not always be accessible, particularly for novel compounds. In this respect, solubility prediction continues to benefit from advances in crystal structure prediction, both in terms of providing more accurate simulated crystal structures, and in terms of improved methods for computing lattice energies and sublimation free energies.

In contrast, data-driven methods, particularly those based on machine learning, have become increasingly popular due to their ability to predict solubility rapidly and with high accuracy, provided that sufficient experimental data is available. These models do best in situations where large, reliable data sets are available for training, allowing them to detect relevant patterns and correlations. However, they can be limited by the scope of the training data, which means their predictions may fail when applied to chemical spaces that deviate from the data they were trained on or to conditions that were not part of the original model development. The key to future success in data-driven methods lies in the continuous accumulation of high-quality solubility data, as well as the development of more sophisticated algorithms that can capture the underlying chemical and physical principles governing solubility.

6. Future Perspectives

Looking ahead, the future of solubility prediction will likely involve advances in both physics-based and data-driven approaches, as well as the development of new hybrid methods that combine the strengths of both approaches.

Recent advances in physics-based methods have generally been limited to the prediction of solubility in aqueous media. Further development of these methods to enable solubility prediction in different solvent compositions and under nonstandard conditions would be of great benefit to the field. The curation of a unified set of industrially relevant solutes, each with accurate experimental measurements of solubility, thermodynamic parameters (e.g., ΔG, ΔH and ΔS of solution, solvation and sublimation) and crystal structures, would drive innovation in this area.

One of the key advantages of physics-based methods is that they can be systematically improved based on a clear theoretical framework. As such many of these methods are expected to benefit from ongoing work to improve quantum mechanics and molecular mechanics methods, including but not limited to better density functionals and basis sets and improved force fields or AI potentials. Moreover, as physics-based calculations move from being applied to small rigid organic solutes in water only, and become more routinely used for larger solutes in mixed solvents, the efficiency with which conformational and configurational space can be sampled will become more important. Improvements in this area will be obtained from advances in enhanced sampling methods and computational power.

Since solubility depends on the physical form of the undissolved precipitate, the perfect solubility model would be able to make accurate predictions for all relevant solid forms, including, for example, amorphous and crystalline states, and different polymorphs, salts, and hydrates. Data-driven models are normally trained to predict solubility for one solid form (e.g., the pure crystal) and small changes (e.g., in polymorphic form) are usually ignored. This introduces a limitation to the model and an implicit error in the predictions. Most physics-based methods do take into account the influence of the solid form, but have to rely on a simulation of the solid form when experimental structural data is not available. As such physics-based solubility prediction will benefit from future advances in solid form prediction, especially crystal structure prediction methods.

As physics-based and data-driven approaches continue to develop, an attractive option to reduce their shortcomings is to combine them. Hybrid models could leverage the accuracy of machine learning where data is available, while using physics-based insights to guide predictions in uncharted chemical spaces or novel solvents. Additionally, machine learning models could be used to improve the efficiency of physics-based simulations, helping to reduce computational costs by identifying the most relevant molecular conformations or improving or replacing force-field parameters. Another promising hybrid approach is to combine data-driven solubility approaches with self-driving laboratories for fully autonomous solubility prediction. Such automated experimentation can accelerate solubility data generation, allowing new areas of chemical space to be explored more efficiently.

The ability of physics-based methods to systematically improve as computational power advances offers long-term promise, particularly as high-performance computing (HPC) and quantum computing become more accessible. These developments could help overcome current limitations in computational demand, making physics-based predictions more feasible for larger and more complex systems. Similarly, advances in machine learning techniques, such as transfer learning and active learning, could help expand the applicability of data-driven methods, allowing them to perform better in low-data or out-of-sample scenarios.

Advances in software and numerical computing resources are also required to enable nonspecialists to benefit from these methods. This is especially important for physics-based methods, which often require calculations using multiple different techniques and significant domain knowledge. Recent work by companies such as Schrödinger and others shows progress in this respect. Applications in pharmaceutical drug discovery and the agrochemical industry make this an important area for ongoing development.

Comparing the performance of various solubility methods becomes problematic when each approach is evaluated on a different, and often small, test set. Over the last few decades, two Blind Challenges for the Prediction of Aqueous Solubility, organized by the American Chemical Society, have helped to benchmark data-driven prediction methods and drive innovation. The physics-based solubility prediction community would now benefit from its own blind solubility prediction challenge, in which physics-based methods are validated against a single, high-quality data set. This experimental benchmark data set would ideally include crystallographic and thermodynamic data in addition to solubility measurements so that all aspects of physics-based predictions could be independently validated. Significant value to current research efforts could be found by combining future blind challenges for physics-based and AI/ML solubility prediction with crystal structure prediction challenges for the same molecules. Reaching this goal would require a concerted effort from both computational researchers, to bring together related but often distinct communities, and experimental researchers, to measure and collate the required physical and thermodynamic data.
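
As a simple illustration of evaluating several methods on one shared test set, the following Python sketch computes the root mean square deviation (RMSD) and mean unsigned error (MUE) of predicted log S values against a common set of reference values. The reference values, the predictions, and the method names are hypothetical placeholders, not results from any challenge or publication.

```python
import math


def rmsd(predicted, reference):
    """Root mean square deviation between predicted and reference values."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mue(predicted, reference):
    """Mean unsigned error between predicted and reference values."""
    absolute = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(absolute) / len(absolute)


# Hypothetical log S values (log10 of mol/L) for the same four solutes.
experimental = [-2.0, -3.5, -4.1, -1.2]
method_a = [-2.5, -3.0, -4.1, -1.7]
method_b = [-1.0, -3.5, -5.1, -1.2]

# Because both methods are scored on identical solutes, the metrics are
# directly comparable.
for name, predictions in [("method_a", method_a), ("method_b", method_b)]:
    print(name,
          f"RMSD = {rmsd(predictions, experimental):.3f}",
          f"MUE = {mue(predictions, experimental):.3f}")
```

Reporting more than one metric can be informative: RMSD penalizes a few large errors more heavily than MUE does, so two methods with similar MUE may still differ in RMSD.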

In conclusion, while physics-based methods are currently underutilized due to their computational intensity and dependence on structural information, they hold considerable long-term potential for improving solubility prediction, particularly in unexplored chemical spaces or novel environments. Data-driven methods will continue to dominate where sufficient data exists, but their future success will depend on how well they can generalize beyond their training data. By integrating these two approaches, future solubility models will likely achieve greater accuracy, robustness, and versatility, ultimately accelerating progress in drug development, materials design, and other critical applications.

Acknowledgments

D.J.F., J.W.C., and D.S.P. thank the EPSRC for funding via Prosperity Partnership EP/S035990/1. B.C. thanks EaSI-CAT for funding.

Glossary

Abbreviations

ADMET

Absorption, Distribution, Metabolism, Elimination, and Toxicity

AFM

Adaptive Force Matching

AIMD

Ab Initio Molecular Dynamics

B3LYP

Becke, 3-parameter, Lee–Yang–Parr

bRo5

Beyond (Lipinski) Rule of Five

CheqSol

Chasing Equilibrium Solubility

CC

Cavity Correction

CNN

Convolutional Neural Network

COSMO

Conductor-like Screening Model

COSMO-RS

Conductor-like Screening Model for Real Solvents

CPCM

Conductor-like Polarizable Continuum Model

CPR

Chemical Potential Route

CSP

Crystal Structure Prediction

DFT

Density Functional Theory

DL

Deep Learning

DMA

Distributed Multipole Analysis

DOS

Density of States

DSC

Differential Scanning Calorimetry

DTT

Dissolution Template Titration

EECM

Extended Einstein Crystal Method

FEP

Free Energy Perturbation

FSC

Fast Scanning Calorimetry

GC

Group Contribution

GCNN

Graph Convolutional Neural Network

GE

Gibbs Energy

GF

Gaussian Fluctuations

GSE

General Solubility Equation

HF

Hartree–Fock

HSP

Hansen Solubility Parameters

HNC

Hyper-Netted Chain

IEFPCM

Integral Equation Formalism Polarizable Continuum Model

IET

Integral Equation Theory

KH

Kovalenko-Hirata

LLM

Large Language Model

logP

Logarithm of the Octanol–Water Partition Coefficient

MC

Monte Carlo

MD

Molecular Dynamics

MDFT

Molecular Density Functional Theory

ML

Machine Learning

MOZ

Molecular Ornstein–Zernike (equation)

MP

Melting Point

MP2

Møller–Plesset Second Order

MUE

Mean Unsigned Error

NN

Neural Networks

OSRW

Orthogonal Space Random Walk

PC

Pressure Correction

PCM

Polarizable Continuum Model

PC+

Pressure Correction Plus

PLS

Partial Least Squares

PW

Partial Wave

PWC

Partial Wave Correction

QM

Quantum Mechanics

QSPR

Quantitative Structure Property Relationship

RED

Relative Energy Difference

RF

Random Forests

RISM

Reference Interaction Site Model

RMSD

Root Mean Square Deviation

Ro5

(Lipinski) Rule of Five

SDC

Structural Descriptor Corrections

SFI

Solubility Forecast Index

SPC

Simple Point Charge (model)

SPC/E

Simple Point Charge Extended (model)

SR

Simplified Response (theory)

SVM

Support Vector Machines

TI

Thermodynamic Integration

UC

Universal Correction

Biographies

Daniel J. Fowles graduated with a Masters in Chemistry from the University of Strathclyde in 2019. He obtained his PhD in Chemistry at the same institute in 2023 studying physics-based and data-driven approaches to compute solvation thermodynamic properties under the supervision of Prof David Palmer.

Benedict J. Connaughton graduated with a Masters in Chemistry with First Class Honours from the University of St Andrews in 2022. His undergraduate research included internships working on solubility prediction with Dr John Mitchell at the University of St Andrews and molecular simulation with Prof David Palmer at the University of Strathclyde. He is currently pursuing a PhD in computational chemistry at the University of St Andrews under the supervision of Dr John Mitchell and Professor Michael Bühl.

James W. Carter graduated with an MSci in Chemistry with First Class Honours from the University of Cambridge in 2013. He then completed his PhD in Computational Biophysics at Imperial College London in 2019 under the supervision of Prof Fernando Bresme. From 2020 to 2023, he was a postdoctoral researcher in the Department of Chemistry at the University of Strathclyde working on computational drug discovery under the supervision of Prof David Palmer. He is currently a postdoctoral researcher at Duke University studying deep learning for molecular property prediction under the supervision of Dr Daniel Reker.

John B. O. Mitchell obtained his PhD in Theoretical Chemistry from Cambridge, studying the energetics of hydrogen bonding with Prof. Sally Price. He then worked with Prof. Janet Thornton at University College London, applying computational chemistry to the growing field of structural bioinformatics. He returned to Cambridge in 2000, taking up a lectureship in Chemistry. He was appointed to a readership at St Andrews in 2009. His research uses theoretical and machine learning techniques in pharmaceutical chemistry, condensed phase modelling, and structural bioinformatics. His group have worked extensively on prediction of bioactivity, solubility, melting point and hydrophobicity from chemical structure, using both informatics and theoretical chemistry methodologies. Recently they have developed novel applications of machine learning in computational biochemistry, such as drug side effect prediction, identifying athletic performance enhancers, and competing against a panel of human experts to predict solubility accurately.

David S. Palmer obtained his Ph.D. in Chemistry from Cambridge University in 2008 for work on computational drug discovery. After completing his doctorate, he carried out postdoctoral research on protein biochemistry with Prof. Frank Jensen and Prof. Birgit Schiott at Aarhus University in Denmark and on solvation thermodynamics with Prof. DSc Maxim V. Fedorov at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany. In 2012 he was awarded a Marie Curie Intra-European Fellowship and moved to the Department of Physics at the University of Strathclyde. He was appointed as a Lecturer in the Department of Chemistry at the same institute in 2014, and promoted to Senior Lecturer in 2019, Reader in 2022, and Professor in 2024. He leads a research group applying molecular simulation and informatics methods to problems in biochemistry and chemical physics.

CRediT: Daniel J. Fowles conceptualization, software, visualization, writing - original draft, writing - review & editing; Benedict J. Connaughton conceptualization, software, visualization, writing - original draft, writing - review & editing; James W. Carter conceptualization, software, visualization, writing - original draft, writing - review & editing; John B. O. Mitchell conceptualization, funding acquisition, project administration, resources, software, supervision, visualization, writing - original draft, writing - review & editing; David S. Palmer conceptualization, funding acquisition, project administration, resources, software, supervision, visualization, writing - original draft, writing - review & editing.

The authors declare no competing financial interest.

References

  1. S B., Betageri G. V. Can Formulation and Drug Delivery Reduce Attrition during Drug Discovery and Development - Review of Feasibility, Benefits and Challenges. APSB. 2014;4:3–17. doi: 10.1016/j.apsb.2013.12.003.
  2. Williams H. D., Trevaskis N. L., Charman S. A., Shanker R. M., Charman W. N., Pouton C. W., Porter C. J. H. Strategies to Address Low Drug Solubility in Discovery and Development. Pharmacol. Rev. 2013;65:315–499. doi: 10.1124/pr.112.005660.
  3. Zhang Y., Lorsbach B. A., Castetter S., Lambert W. T., Kister J., Wang N. X., Klittich C. J. R., Roth J., Sparks T. C., Loso M. R. Physicochemical property guidelines for modern agrochemicals. Pest Manag. Sci. 2018;74:1979–1991. doi: 10.1002/ps.5037.
  4. Mulqueen P. Recent advances in agrochemical formulation. Adv. Colloid Interface Sci. 2003;106:83–107. doi: 10.1016/S0001-8686(03)00106-4.
  5. Llinas A., Avdeef A. Solubility Challenge Revisited after Ten Years, with Multilab Shake-Flask Data, Using Tight (SD ∼ 0.17 Log) and Loose (SD ∼ 0.62 Log) Test Sets. J. Chem. Inf. Model. 2019;59:3036–3040. doi: 10.1021/acs.jcim.9b00345.
  6. Llinas A., Oprisiu I., Avdeef A. Findings of the Second Challenge to Predict Aqueous Solubility. J. Chem. Inf. Model. 2020;60:4791–4803. doi: 10.1021/acs.jcim.0c00701.
  7. Bergström C. A. S., Larsson P. Computational Prediction of Drug Solubility in Water-Based Systems: Qualitative and Quantitative Approaches Used in the Current Drug Discovery and Development Setting. Int. J. Pharm. 2018;540:185–193. doi: 10.1016/j.ijpharm.2018.01.044.
  8. Lipinski C. A., Lombardo F., Dominy B. W., Feeney P. J. Experimental and Computational Approaches to Estimate Solubility and Permeability in Drug Discovery and Development Settings. Adv. Drug Delivery Rev. 1997;23:3–25. doi: 10.1016/S0169-409X(96)00423-1.
  9. Ratkova E. L., Palmer D. S., Fedorov M. V. Solvation Thermodynamics of Organic Molecules by the Molecular Integral Equation Theory: Approaching Chemical Accuracy. Chem. Rev. 2015;115:6312–6356. doi: 10.1021/cr5000283.
  10. Bergazin T. D., Tielker N., Zhang Y., Mao J., Gunner M. R., Francisco K., Ballatore C., Kast S., Mobley D. Evaluation of Log P, pKa and Log D Predictions from the SAMPL7 Blind Challenge. J. Comput. Aided Mol. Des. 2021;35:771–802. doi: 10.1007/s10822-021-00397-3.
  11. Reilly A. M., Cooper R. I., Adjiman C. S., Bhattacharya S., Boese A. D., Brandenburg J. G., Bygrave P. J., Bylsma R., Campbell J. E., Car R., et al. Report on the Sixth Blind Test of Organic Crystal Structure Prediction Methods. Acta Crystallogr. Sect. B Struct. Sci. Cryst. Eng. Mater. 2016;72:439–459. doi: 10.1107/S2052520616007447.
  12. Day G. M. Current approaches to predicting molecular organic crystal structures. Crystallogr. Rev. 2011;17:3–52. doi: 10.1080/0889311X.2010.517526.
  13. Bardwell D. A., Adjiman C. S., Arnautova Y. A., Bartashevich E., Boerrigter S. X. M., Braun D. E., Cruz-Cabeza A. J., Day G. M., Della Valle R. G., Desiraju G. R., et al. Towards crystal structure prediction of complex organic compounds - a report on the fifth blind test. Acta. Cryst. B. 2011;67:535–551. doi: 10.1107/S0108768111042868.
  14. Seidell A. Solubilities of Inorganic and Metal Organic Compounds: A Compilation of Quantitative Solubility Data from the Periodical Literature; van Nostrand, 1965.
  15. van’t Hoff M. J. H. Etudes de Dynamique Chimique. Frederik Muller C. 1884;3:333–336. doi: 10.1002/recl.18840031003.
  16. Grant D., Mehdizadeh M., Fairbrother J. Non-Linear van’t Hoff Solubility-Temperature Plots and Their Pharmaceutical Interpretation. Int. J. Pharm. 1984;18:25–38. doi: 10.1016/0378-5173(84)90104-2.
  17. Nguyen D. L. T., Kim K.-J. Experimental Solubilities of Taltirelin in Water, Ethanol, 1-Propanol and 2-Propanol over Temperatures from 273.2 to 323.2 K. J. Chem. Eng. Jpn. 2018;51:216–221. doi: 10.1252/jcej.17we080.
  18. Henry W. III. Experiments on the Quantity of Gases Absorbed by Water, at Different Temperatures, and under Different Pressures. Philos. Trans. R. Soc. London. 1803;93:29–274. doi: 10.1098/rstl.1803.0004.
  19. Henderson L. J. Concerning the Relationship between the Strength of Acids and Their Capacity to Preserve Neutrality. Am. J. Physiol. 1908;21:173–179. doi: 10.1152/ajplegacy.1908.21.2.173.
  20. Hasselbalch K. A. Die Berechnung Der Wasserstoffzahl Des Blutes Aus Der Freien Und Gebundenen Kohlensäure Desselben, Und Die Sauerstoffbindung Des Blutes Als Funktion Der Wasserstoffzahl. Biochem. Z. 1917;78:112–144.
  21. Pratt L. R., Chandler D. Theory of the hydrophobic effect. J. Chem. Phys. 1977;67:3683–3704. doi: 10.1063/1.435308.
  22. Sun Q. The Hydrophobic Effects: Our Current Understanding. Molecules. 2022;27:7009. doi: 10.3390/molecules27207009.
  23. van der Vegt N. F. A., Nayar D. The Hydrophobic Effect and the Role of Cosolvents. J. Phys. Chem. B. 2017;121:9986–9998. doi: 10.1021/acs.jpcb.7b06453.
  24. Kronberg B. The hydrophobic effect. Curr. Opin. Colloid Interface Sci. 2016;22:14–22. doi: 10.1016/j.cocis.2016.02.001.
  25. Sun Q. The physical origin of hydrophobic effects. Chem. Phys. Lett. 2017;672:21–25. doi: 10.1016/j.cplett.2017.01.057.
  26. Yalkowsky S. H., Joseph T. R. Solubilization by Cosolvents I: Organic Solutes in Propylene Glycol-Water Mixtures. J. Pharm. Sci. 1985;74:416–421. doi: 10.1002/jps.2600740410.
  27. Bhalani D. V., Nutan B., Kumar A., Chandel A. K. S. Bioavailability Enhancement Techniques for Poorly Aqueous Soluble Drugs and Therapeutics. Biomedicines. 2022;10:2055. doi: 10.3390/biomedicines10092055.
  28. Zakharova L. Y., Vasilieva E. A., Mirgorodskaya A. B., Zakharov S. V., Pavlov R. V., Kashapova N. E., Gaynanova G. A. Hydrotropes: Solubilization of nonpolar compounds and modification of surfactant solutions. J. Mol. Liq. 2023;370:120923. doi: 10.1016/j.molliq.2022.120923.
  29. Gao Q., Zhu P., Zhao H., Farajtabar A., Jouyban A., Acree W. E. Jr. Solubility, Solvent Effect, and Solvation Performance of MBQ-167 in Aqueous Cosolvent Solutions. J. Chem. Eng. Data. 2021;66:4725–4739. doi: 10.1021/acs.jced.1c00711.
  30. Rytting E., Lentz K. A., Chen X.-Q., Qian F., Venkatesh S. Aqueous and Cosolvent Solubility Data for Drug-like Organic Compounds. AAPS J. 2005;7:E78–E105. doi: 10.1208/aapsj070110.
  31. Mettou A., Papaneophytou C., Melagraki G., Maranti A., Liepouri F., Alexiou P., Papakyriakou A., Couladouros E., Eliopoulos E., Afantitis A., et al. Aqueous Solubility Enhancement for Bioassays of Insoluble Inhibitors and QSPR Analysis: A TNF-α Study. SLAS Discovery. 2018;23:84–93. doi: 10.1177/2472555217712507.
  32. Kwon H.-C., Kwon J.-H. Measuring Aqueous Solubility in the Presence of Small Cosolvent Volume Fractions by Passive Dosing. Environ. Sci. Technol. 2012;46:12550–12556. doi: 10.1021/es3035363.
  33. Bhat P. A., Dar A. A., Rather G. M. Solubilization Capabilities of Some Cationic, Anionic, and Nonionic Surfactants toward the Poorly Water-Soluble Antibiotic Drug Erythromycin. J. Chem. Eng. Data. 2008;53:1271–1277. doi: 10.1021/je700659g.
  34. Chakraborty S., Shukla D., Jain A., Mishra B., Singh S. Assessment of Solubilization Characteristics of Different Surfactants for Carvedilol Phosphate as a Function of pH. J. Colloid Interface Sci. 2009;335:242–249. doi: 10.1016/j.jcis.2009.03.047.
  35. Desai D., Wong B., Huang Y., Ye Q., Tang D., Guo H., Huang M., Timmins P. Surfactant-Mediated Dissolution of Metformin Hydrochloride Tablets: Wetting Effects versus Ion Pairs Diffusivity. J. Pharm. Sci. 2014;103:920–926. doi: 10.1002/jps.23852.
  36. Loftsson T. Drug Solubilization by Complexation. Int. J. Pharm. 2017;531:276–280. doi: 10.1016/j.ijpharm.2017.08.087.
  37. Sareen R., Jain N., Dhar K. L. Curcumin-Zn(II) Complex for Enhanced Solubility and Stability: An Approach for Improved Delivery and Pharmacodynamic Effects. Pharm. Dev. Technol. 2016;21:630–635. doi: 10.3109/10837450.2015.1041042.
  38. Zakelj S., Berginc K., Ursic D., Kristl A. Influence of Metal Cations on the Solubility of Fluoroquinolones. Pharmazie. 2007;62:318–320.
  39. Higuchi T., Lach J. L. Investigation of Some Complexes Formed in Solution by Caffeine*: IV. Interactions between Caffeine and Sulfathiazole, Sulfadiazine, P-aminobenzoic Acid, Benzocaine, Phenobarbital, and Barbital. J. Am. Pharm. Assoc. 1954;43:349–354. doi: 10.1002/jps.3030430609.
  40. Diez N. M., de la Peña A. M., García M. C. M., Gil D. B., Cañada-Cañada F. Fluorimetric Determination of Sulphaguanidine and Sulphamethoxazole by Host-Guest Complexation in β-Cyclodextrin and Partial Least Squares Calibration. J. Fluoresc. 2007;17:309–318. doi: 10.1007/s10895-007-0174-4.
  41. Garekani H., Sadeghi F., Ghazi A. Increasing the Aqueous Solubility of Acetaminophen in the Presence of Polyvinylpyrrolidone and Investigation of the Mechanisms Involved. Drug Dev. Ind. Pharm. 2003;29:173–179. doi: 10.1081/DDC-120016725.
  42. Charvalos E., Tzatzarakis M. N., Bambeke F. V., Tulkens P. M., Tsatsakis A. M., Tzanakakis G. N., Mingeot-Leclercq M.-P. Water-Soluble Amphotericin B-polyvinylpyrrolidone Complexes with Maintained Antifungal Activity against Candida Spp. and Aspergillus Spp. and Reduced Haemolytic and Cytotoxic Effects. J. Antimicrob. Chemother. 2006;57:236–244. doi: 10.1093/jac/dki455.
  43. Thakral S., Madan A. K. Urea Co-Inclusion Compounds of 13 Cis-Retinoic Acid for Simultaneous Improvement of Dissolution Profile, Photostability and Safe Handling Characteristics. J. Pharm. Pharmacol. 2008;60:823–832. doi: 10.1211/jpp.60.7.0003.
  44. Kolesnichenko I. V., Anslyn E. V. Practical Applications of Supramolecular Chemistry. Chem. Soc. Rev. 2017;46:2385–2390. doi: 10.1039/C7CS00078B.
  45. Li J., Liu P., Liu J.-P., Yang J.-K., Zhang W.-L., Fan Y.-Q., Kan S.-L., Cui Y., Zhang W.-J. Bioavailability and Foam Cells Permeability Enhancement of Salvianolic Acid B Pellets Based on Drug-Phospholipids Complex Technique. Eur. J. Pharm. Biopharm. 2013;83:76–86. doi: 10.1016/j.ejpb.2012.09.021.
  46. Xia H.-j., Zhang Z.-h., Jin X., Hu Q., Chen X.-y., Jia X.-b. A Novel Drug-Phospholipid Complex Enriched with Micelles: Preparation and Evaluation in Vitro and in Vivo. Int. J. Nanomedicine. 2013;8:545–554. doi: 10.2147/IJN.S39526.
  47. van Santen R. A. The Ostwald Step Rule. J. Phys. Chem. 1984;88:5768–5769. doi: 10.1021/j150668a002.
  48. Sou T., Bergström C. A. Automated Assays for Thermodynamic (Equilibrium) Solubility Determination. Drug Discovery Today Technol. 2018;27:11–19. doi: 10.1016/j.ddtec.2018.04.004.
  49. Pudipeddi M., Serajuddin A. T. M. Trends in Solubility of Polymorphs. J. Pharm. Sci. 2005;94:929–39. doi: 10.1002/jps.20302.
  50. Braun D. E., McMahon J. A., Bhardwaj R. M., Nyman J., Neumann M. A., Streek J. V. D., Reutzel-Edens S. M. Inconvenient Truths about Solid Form Landscapes Revealed in the Polymorphs and Hydrates of Gandotinib. Cryst. Growth Des. 2019;19:2947–2962. doi: 10.1021/acs.cgd.9b00162.
  51. Neumann M. A., van de Streek J. How Many Ritonavir Cases Are There Still out There? Faraday Discuss. 2018;211:441–458. doi: 10.1039/C8FD00069G.
  52. Llinàs A., Box K. J., Burley J. C., Glen R. C., Goodman J. M. A New Method for the Reproducible Generation of Polymorphs: Two Forms of Sulindac with Very Different Solubilities. J. Appl. Crystallogr. 2007;40:379–381. doi: 10.1107/S0021889807007832.
  53. Noyes A. A., Whitney W. R. The Rate of Solution of Solid Substances in Their Own Solutions. J. Am. Chem. Soc. 1897;19:930–934. doi: 10.1021/ja02086a003.
  54. Jouyban A., Fakhree M. A. A. Toxicity and Drug Testing; InTech, 2012.
  55. Alsenz J., Kansy M. High Throughput Solubility Measurement in Drug Discovery and Development. Adv. Drug Delivery Rev. 2007;59:546–567. doi: 10.1016/j.addr.2007.05.007.
  56. Avdeef A. Suggested Improvements for Measurement of Equilibrium Solubility-pH of Ionizable Drugs. ADMET DMPK. 2015;3:84–109. doi: 10.5599/admet.3.2.193.
  57. Box K., Comer J. E., Gravestock T., Stuart M. New Ideas about the Solubility of Drugs. Chem. Biodivers. 2009;6:1767–1788. doi: 10.1002/cbdv.200900164.
  58. Stuart M., Box K. Chasing Equilibrium: Measuring the Intrinsic Solubility of Weak Acids and Bases. Anal. Chem. 2005;77:983–990. doi: 10.1021/ac048767n.
  59. Comer J., Judge S., Matthews D., Towers L., Falcone B., Goodman J., Dearden J. The Intrinsic Aqueous Solubility of Indomethacin. ADMET DMPK. 2014;2:18–32. doi: 10.5599/admet.2.1.33.
  60. Avdeef A., Berger C. M. pH-metric Solubility. Eur. J. Pharm. Sci. 2001;14:281–291. doi: 10.1016/S0928-0987(01)00190-7.
  61. Yang Z.-S., Zeng Z.-X., Xue W.-L., Zhang Y. Solubility of Bis(Benzoxazolyl-2-Methyl) Sulfide in Different Pure Solvents and Ethanol + Water Binary Mixtures between (273.25 and 325.25) K. J. Chem. Eng. Data. 2008;53:2692–2695. doi: 10.1021/je8005419.
  62. Hao H.-x., Hou B.-h., Wang J.-k., Zhang M.-j. Solubility of Erythritol in Different Solvents. J. Chem. Eng. Data. 2005;50:1454–1456. doi: 10.1021/je0501033.
  63. Mitchell J. B. Three Machine Learning Models for the 2019 Solubility Challenge. ADMET DMPK. 2018;8:215–250. doi: 10.5599/admet.835.
  64. Rohde B. Plate/Row/Column Model with CLogP for Solubility. https://www.kaggle.com/code/bernhardrohde/plate-row-column-model-with-clogp-for-solubility/notebook, version 4, 2022; [Online; accessed 11-April-2025].
  65. Llinàs A., Glen R. C., Goodman J. M. Solubility Challenge: Can You Predict Solubilities of 32 Molecules Using a Database of 100 Reliable Measurements? J. Chem. Inf. Model. 2008;48:1289–1303. doi: 10.1021/ci800058v.
  66. Kombo D. C., Stepp J. D., Lim S., Elshorst B., Li Y., Cato L., Shomali M., Fink D., LaMarche M. J. Predictions of Colloidal Molecular Aggregation Using AI/ML Models. ACS Omega. 2024;9:28691–28706. doi: 10.1021/acsomega.4c02886.
  67. Seidler J., McGovern S. L., Doman T. N., Shoichet B. K. Identification and Prediction of Promiscuous Aggregating Inhibitors among Known Drugs. J. Med. Chem. 2003;46:4477–4486. doi: 10.1021/jm030191r.
  68. Irwin J. J., Duan D., Torosyan H., Doak A. K., Ziebart K. T., Sterling T., Tumanian G., Shoichet B. K. An Aggregation Advisor for Ligand Discovery. J. Med. Chem. 2015;58:7076–7087. doi: 10.1021/acs.jmedchem.5b01105.
  69. Lee K., Yang A., Lin Y.-C., Reker D., Bernardes G. J. L., Rodrigues T. Combating small-molecule aggregation with machine learning. Cell Rep. Phys. Sci. 2021;2:100573. doi: 10.1016/j.xcrp.2021.100573.
  70. Fühner H. Die Wasserlöslichkeit in homologen Reihen. Ber. dtsch. Chem. Ges. A/B. 1924;57:510–515. doi: 10.1002/cber.19240570326.
  71. Hansch C., Maloney P. P., Fujita T., Muir R. M. Correlation of Biological Activity of Phenoxyacetic Acids with Hammett Substituent Constants and Partition Coefficients. Nature. 1962;194:178–180. doi: 10.1038/194178b0.
  72. Palmer D. S., O’Boyle N. M., Glen R. C., Mitchell J. B. O. Random Forest Models To Predict Aqueous Solubility. J. Chem. Inf. Model. 2007;47:150–158. doi: 10.1021/ci060164k.
  73. Hughes L. D., Palmer D. S., Nigsch F., Mitchell J. B. O. Why Are Some Properties More Difficult To Predict than Others? A Study of QSPR Models of Solubility, Melting Point, and Log P. J. Chem. Inf. Model. 2008;48:220–232. doi: 10.1021/ci700307p.
  74. Jumper J., Evans R., Pritzel A., Green T., Figurnov M., Ronneberger O., Tunyasuvunakool K., Bates R., Žídek A., Potapenko A., et al. Highly Accurate Protein Structure Prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
  75. Palmer D. S., Mitchell J. B. Is Experimental Data Quality the Limiting Factor in Predicting the Aqueous Solubility of Druglike Molecules? Mol. Pharmaceutics. 2014;11:2962–2972. doi: 10.1021/mp500103r.
  76. Klopman G., Wang S., Balthasar D. M. Estimation of Aqueous Solubility of Organic Molecules by the Group Contribution Approach. Application to the Study of Biodegradation. J. Chem. Inf. Comput. Sci. 1992;32:474–482. doi: 10.1021/ci00009a013.
  77. Klopman G., Zhu H. Estimation of the Aqueous Solubility of Organic Molecules by the Group Contribution Approach. J. Chem. Inf. Comput. Sci. 2001;41:439–445. doi: 10.1021/ci000152d.
  78. Hou T. J., Xia K., Zhang W., Xu X. J. ADME Evaluation in Drug Discovery. 4. Prediction of Aqueous Solubility Based on Atom Contribution Approach. J. Chem. Inf. Comput. Sci. 2004;44:266–275. doi: 10.1021/ci034184n.
  79. Marrero J., Gani R. Group-Contribution-Based Estimation of Octanol/Water Partition Coefficient and Aqueous Solubility. Ind. Eng. Chem. Res. 2002;41:6623–6633. doi: 10.1021/ie0205290.
  80. Kühne R., Ebert R. U., Kleint F., Schmidt G., Schüürmann G. Group Contribution Methods to Estimate Water Solubility of Organic Chemicals. Chemosphere. 1995;30:2061–2077. doi: 10.1016/0045-6535(95)00084-L.
  81. Wildman S. A., Crippen G. M. Prediction of Physicochemical Parameters by Atomic Contributions. J. Chem. Inf. Comput. Sci. 1999;39:868–873. doi: 10.1021/ci990307l.
  82. Jain N., Yalkowsky S. H. Estimation of the Aqueous Solubility I: Application to Organic Nonelectrolytes. J. Pharm. Sci. 2001;90:234–252. doi: 10.1002/1520-6017(200102)90:2<234::AID-JPS14>3.0.CO;2-V.
  83. Moriwaki H., Tian Y.-S., Kawashita N., Takagi T. Mordred: A Molecular Descriptor Calculator. J. Cheminform. 2018;10:4. doi: 10.1186/s13321-018-0258-y.
  84. Delaney J. S. ESOL: Estimating Aqueous Solubility Directly from Molecular Structure. J. Chem. Inf. Comput. Sci. 2004;44:1000–1005. doi: 10.1021/ci034243x.
  85. Huuskonen J. Estimation of Aqueous Solubility for a Diverse Set of Organic Compounds Based on Molecular Topology. J. Chem. Inf. Comput. Sci. 2000;40:773–777. doi: 10.1021/ci9901338.
  86. Jorgensen W. L., Duffy E. M. Prediction of Drug Solubility from Monte Carlo Simulations. Bioorg. Med. Chem. Lett. 2000;10:1155–1158. doi: 10.1016/S0960-894X(00)00172-4.
  87. Krizhevsky A., Sutskever I., Hinton G. E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM. 2017;60:84–90. doi: 10.1145/3065386.
  88. Bodor N., Harget A., Huang M. J. Neural Network Studies. 1. Estimation of the Aqueous Solubility of Organic Compounds. J. Am. Chem. Soc. 1991;113:9480–9483. doi: 10.1021/ja00025a009.
  89. Boobier S., Hose D. R. J., Blacker A. J., Nguyen B. N. Machine Learning with Physicochemical Relationships: Solubility Prediction in Organic Solvents and Water. Nat. Commun. 2020;11:5753. doi: 10.1038/s41467-020-19594-z.
  90. Conn J. G. M., Carter J. W., Conn J. J. A., Subramanian V., Baxter A., Engkvist O., Llinas A., Ratkova E. L., Pickett S. D., McDonagh J. L., et al. Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models. J. Chem. Inf. Model. 2023;63:1099–1113. doi: 10.1021/acs.jcim.2c01189.
  91. Cui Q., Lu S., Ni B., Zeng X., Tan Y., Chen Y. D., Zhao H. Improved Prediction of Aqueous Solubility of Novel Compounds by Going Deeper With Deep Learning. Front Oncol. 2020;10:121. doi: 10.3389/fonc.2020.00121.
  92. Gilmer J., Schoenholz S. S., Riley P. F., Vinyals O., Dahl G. E. Neural Message Passing for Quantum Chemistry. arXiv. 2017. doi: 10.48550/arXiv.1704.01212.
  93. Lusci A., Pollastri G., Baldi P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-like Molecules. J. Chem. Inf. Model. 2013;53:1563–1575. doi: 10.1021/ci400187y.
  94. Duvenaud D., Maclaurin D., Aguilera-Iparraguirre J., Gómez-Bombarelli R., Hirzel T., Aspuru-Guzik A., Adams R. P. Convolutional Networks on Graphs for Learning Molecular Fingerprints. arXiv. 2015. doi: 10.48550/arXiv.1509.09292.
  95. Yang K., Swanson K., Jin W., Coley C., Eiden P., Gao H., Guzman-Perez A., Hopper T., Kelley B., Mathea M., et al. Analyzing Learned Molecular Representations for Property Prediction. J. Chem. Inf. Model. 2019;59:3370–3388. doi: 10.1021/acs.jcim.9b00237.
  96. Ryu S., Lee S. Accurate, reliable and interpretable solubility prediction of druglike molecules with attention pooling and Bayesian learning. arXiv. 2022. doi: 10.48550/arXiv.2210.07145.
  97. Wiercioch M., Kirchmair J. Dealing with a Data-limited Regime: Combining Transfer Learning And Transformer Attention Mechanism to Increase Aqueous Solubility Prediction Performance. Artificial Intelligence in the Life Sciences. 2021;1:100021. doi: 10.1016/j.ailsci.2021.100021.
  98. Zheng T., Mitchell J. B. O., Dobson S. Revisiting the Application of Machine Learning Approaches in Predicting Aqueous Solubility. ACS Omega. 2024;9:35209–35222. doi: 10.1021/acsomega.4c06163.
  99. Krenn M., Häse F., Nigam A., Friederich P., Aspuru-Guzik A.. Self-Referencing Embedded Strings (SELFIES): A 100% Robust Molecular String Representation. Mach. Learn.: Sci. Technol. 2020;1:045024. doi: 10.1088/2632-2153/aba947. [DOI] [Google Scholar]
  100. Francoeur P. G., Koes D. R.. SolTranNet-A Machine Learning Tool for Fast Aqueous Solubility Prediction. J. Chem. Inf. Model. 2021;61:2530–2536. doi: 10.1021/acs.jcim.1c00331. [DOI] [PMC free article] [PubMed] [Google Scholar]
  101. Panapitiya G., Girard M., Hollas A., Sepulveda J., Murugesan V., Wang W., Saldanha E.. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS Omega. 2022;7:15695–15710. doi: 10.1021/acsomega.2c00642. [DOI] [PMC free article] [PubMed] [Google Scholar]
  102. Rojas A. J.. An Investigation into ChatGPT’s Application for a Scientific Writing Assignment. J. Chem. Educ. 2024;101:1959–1965. doi: 10.1021/acs.jchemed.4c00034. [DOI] [Google Scholar]
  103. Seidl, P. ; Vall, A. ; Hochreiter, S. ; Klambauer, G. . Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language. arXiv 2023, 10.48550/arXiv.2303.03363. [DOI]
  104. Zhao, H. ; Liu, S. ; Ma, C. ; Xu, H. ; Fu, J. ; Deng, Z.-H. ; Kong, L. ; Liu, Q. . GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning. arXiv 2023, 10.48550/arXiv.2306.13089. [DOI]
  105. Liu, Y. ; Ding, S. ; Zhou, S. ; Fan, W. ; Tan, Q. . MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction. arXiv 2024, 10.48550/arXiv.2406.12950. [DOI]
  106. Jablonka K. M., Schwaller P., Ortega-Guerrero A., Smit B.. Leveraging large language models for predictive chemistry. Nat. Mach. Intell. 2024;6:161–169. doi: 10.1038/s42256-023-00788-1. [DOI] [Google Scholar]
  107. Zheng Y., Koh H. Y., Ju J., Nguyen A. T. N., May L. T., Webb G. I., Pan S.. Large language models for scientific discovery in molecular property prediction. Nat. Mach. Intell. 2025;7:437–447. doi: 10.1038/s42256-025-00994-z. [DOI] [Google Scholar]
  108. Brown, T. ; Mann, B. ; Ryder, N. ; Subbiah, M. ; Kaplan, J. D. ; Dhariwal, P. ; Neelakantan, A. ; Shyam, P. ; Sastry, G. ; Askell, A. . Language Models are Few-Shot Learners. Adv. Neural Inf. Process Syst. 2020, 33, 1877–1901. [Google Scholar]
  109. Wei, J. ; Bosma, M. ; Zhao, V. Y. ; Guu, K. ; Yu, A. W. ; Lester, B. ; Du, N. ; Dai, A. M. ; Le, Q. V. . Finetuned Language Models Are Zero-Shot Learners. arXiv 2022, 10.48550/arXiv.2109.01652. [DOI]
  110. Schimunek, J. ; Seidl, P. ; Friedrich, L. ; Kuhn, D. ; Rippmann, F. ; Hochreiter, S. ; Klambauer, G. . Context-enriched molecule representations improve few-shot drug discovery. arXiv 2023, 10.48550/arXiv.2305.09481. [DOI]
  111. Zeng Z., Yao Y., Liu Z., Sun M.. A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nat. Commun. 2022;13:862. doi: 10.1038/s41467-022-28494-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  112. Bachlechner, T. ; Majumder, H. ; Mao, B. P. ; Cottrell, G. ; McAuley, J. . ReZero is All You Need: Fast Convergence at Large Depth. arXiv 2020, 10.48550/arXiv.2003.04887. [DOI]
  113. Ying, C. ; Cai, T. ; Luo, S. ; Zheng, S. ; Ke, G. ; He, D. ; Shen, Y. ; Liu, T.-Y. . Do transformers really perform bad for graph representation?. Adv. Neural Inf. Process Syst. 2021, 34, 28877–28888. [Google Scholar]
  114. OpenAI; Achiam, J. ; et al. GPT-4 Technical Report. arXiv 2023, 10.48550/arXiv.2303.08774. [DOI]
  115. Weininger D.. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988;28:31–36. doi: 10.1021/ci00057a005. [DOI] [Google Scholar]
  116. Yalkowsky S. H., Valvani S. C.. Solubility and Partitioning I: Solubility of Nonelectrolytes in Water. J. Pharm. Sci. 1980;69:912–922. doi: 10.1002/jps.2600690814. [DOI] [PubMed] [Google Scholar]
  117. Sanghvi T., Jain N., Yang G., Yalkowsky S. H.. Estimation of Aqueous Solubility By The General Solubility Equation (GSE) The Easy Way. QSAR Comb. Sci. 2003;22:258–262. doi: 10.1002/qsar.200390020. [DOI] [Google Scholar]
  118. Hansch C., Quinlan J. E., Lawrence G. L.. Linear Free-Energy Relationship between Partition Coefficients and the Aqueous Solubility of Organic Liquids. J. Org. Chem. 1968;33:347–350. doi: 10.1021/jo01265a071. [DOI] [Google Scholar]
  119. Ran Y., Yalkowsky S. H.. Prediction of Drug Solubility by the General Solubility Equation (GSE) J. Chem. Inf. Comput. Sci. 2001;41:354–357. doi: 10.1021/ci000338c. [DOI] [PubMed] [Google Scholar]
  120. Ali J., Camilleri P., Brown M. B., Hutt A. J., Kirton S. B.. Revisiting the General Solubility Equation: In Silico Prediction of Aqueous Solubility Incorporating the Effect of Topographical Polar Surface Area. J. Chem. Inf. Model. 2012;52:420–428. doi: 10.1021/ci200387c. [DOI] [PubMed] [Google Scholar]
  121. Tetko I. V., Sushko Y., Novotarskyi S., Patiny L., Kondratov I., Petrenko A. E., Charochkina L., Asiri A. M.. How Accurately Can We Predict the Melting Points of Drug-like Compounds? J. Chem. Inf. Model. 2014;54:3320–3329. doi: 10.1021/ci5005288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  122. McDonagh J. L., van Mourik T., Mitchell J. B. O.. Predicting Melting Points of Organic Molecules: Applications to Aqueous Solubility Prediction Using the General Solubility Equation. Mol. Inform. 2015;34:715–724. doi: 10.1002/minf.201500052. [DOI] [PubMed] [Google Scholar]
  123. Hill A. P., Young R. J.. Getting Physical in Drug Discovery: A Contemporary Perspective on Solubility and Hydrophobicity. Drug Discovery Today. 2010;15:648–655. doi: 10.1016/j.drudis.2010.05.016. [DOI] [PubMed] [Google Scholar]
  124. Jain P., Yalkowsky S. H.. Prediction of Aqueous Solubility from SCRATCH. Int. J. Pharm. 2010;385:1–5. doi: 10.1016/j.ijpharm.2009.10.003. [DOI] [PubMed] [Google Scholar]
  125. Myrdal P., Ward G. H., Simamora P., Yalkowsky S. H.. Aquafac: Aqueous Functional Group Activity Coefficients. AR QSAR Environ. Res. 1993;1:53–61. doi: 10.1080/10629369308028816. [DOI] [Google Scholar]
  126. Dannenfelser R.-M., Yalkowsky S. H.. Estimation of Entropy of Melting from Molecular Structure: A Non-Group Contribution Method. Ind. Eng. Chem. Res. 1996;35:1483–1486. doi: 10.1021/ie940581z. [DOI] [Google Scholar]
  127. Avdeef A.. Prediction of Aqueous Intrinsic Solubility of Druglike Molecules Using Random Forest Regression Trained with Wiki-pS0 Database. ADMET DMPK. 2020;8:29–77. doi: 10.5599/admet.766. [DOI] [PMC free article] [PubMed] [Google Scholar]
  128. Avdeef A., Kansy M.. “Flexible-Acceptor” General Solubility Equation for beyond Rule of 5 Drugs. Mol. Pharmaceutics. 2020;17:3930–3940. doi: 10.1021/acs.molpharmaceut.0c00689. [DOI] [PubMed] [Google Scholar]
  129. Kier L. B.. An Index of Molecular Flexibility from Kappa Shape Attributes. Quant. Struct.-Act. Relatsh. 1989;8:221–224. doi: 10.1002/qsar.19890080307. [DOI] [Google Scholar]
  130. Caron G., Digiesi V., Solaro S., Ermondi G.. Flexibility in Early Drug Discovery: Focus on the beyond-Rule-of-5 Chemical Space. Drug Discovery Today. 2020;25:621–627. doi: 10.1016/j.drudis.2020.01.012. [DOI] [PubMed] [Google Scholar]
  131. Avdeef A., Kansy M.. Predicting Solubility of Newly-Approved Drugs (2016–2020) with a Simple ABSOLV and GSE­(Flexible-Acceptor) Consensus Model Outperforming Random Forest Regression. J. Solution Chem. 2022;51:1020–1055. doi: 10.1007/s10953-022-01141-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  132. Hildebrand J. H., Scott R. L.. Solutions of Nonelectrolytes. Annu. Rev. Phys. Chem. 1950;1:75–92. doi: 10.1146/annurev.pc.01.100150.000451. [DOI] [Google Scholar]
  133. Hansen, C. M. The Three Dimensional Solubility Parameter; Danish Technical Press, 1967; Vol. 14. [Google Scholar]
  134. Patterson D.. Role of Free Volume Changes in Polymer Solution Thermodynamics. J. Polym. Sci. Part C Polym. Symp. 1967;16:3379–3389. doi: 10.1002/polc.5070160632. [DOI] [Google Scholar]
  135. Agata Y., Yamamoto H.. Determination of Hansen Solubility Parameters of Ionic Liquids Using Double-Sphere Type of Hansen Solubility Sphere Method. Chem. Phys. 2018;513:165–173. doi: 10.1016/j.chemphys.2018.04.021. [DOI] [Google Scholar]
  136. Ezati N., Roberts M. S., Zhang Q., Moghimi H. R.. Measurement of Hansen Solubility Parameters of Human Stratum Corneum. Iran J. Pharm. Res. 2020;19:572–578. doi: 10.22037/ijpr.2019.112435.13755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  137. Sánchez-Camargo A. d. P., Bueno M., Parada-Alfonso F., Cifuentes A., Ibáñez E.. Hansen Solubility Parameters for Selection of Green Extraction Solvents. TrAC. 2019;118:227–237. doi: 10.1016/j.trac.2019.05.046. [DOI] [Google Scholar]
  138. Louwerse M. J., Maldonado A., Rousseau S., Moreau-Masselon C., Roux B., Rothenberg G.. Revisiting Hansen Solubility Parameters by Including Thermodynamics. ChemPhysChem. 2017;18:2999–3006. doi: 10.1002/cphc.201700408. [DOI] [PMC free article] [PubMed] [Google Scholar]
  139. Stefanis E., Panayiotou C.. Prediction of Hansen Solubility Parameters with a New Group-Contribution Method. Int. J. Thermophys. 2008;29:568–585. doi: 10.1007/s10765-008-0415-z. [DOI] [Google Scholar]
  140. Mathieu D.. Pencil and Paper Estimation of Hansen Solubility Parameters. ACS Omega. 2018;3:17049–17056. doi: 10.1021/acsomega.8b02601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  141. Járvás G., Quellet C., Dallos A.. Estimation of Hansen Solubility Parameters Using Multivariate Nonlinear QSPR Modeling with COSMO Screening Charge Density Moments. Fluid Ph. Equilib. 2011;309:8–14. doi: 10.1016/j.fluid.2011.06.030. [DOI] [Google Scholar]
  142. Sanchez-Lengeling B., Roch L. M., Perea J. D., Langner S., Brabec C. J., Aspuru-Guzik A.. A Bayesian Approach to Predict Solubility Parameters. Adv. Theory Simul. 2019;2:1800069. doi: 10.1002/adts.201800069. [DOI] [Google Scholar]
  143. Manzanilla-Granados H. M., Saint-Martín H., Fuentes-Azcatl R., Alejandre J.. Direct Coexistence Methods to Determine the Solubility of Salts in Water from Numerical Simulations. Test Case NaCl. J. Phys. Chem. B. 2015;119:8389–8396. doi: 10.1021/acs.jpcb.5b00740. [DOI] [PubMed] [Google Scholar]
  144. Joung I. S., Cheatham T. E.. Molecular Dynamics Simulations of the Dynamic and Energetic Properties of Alkali and Halide Ions Using Water-Model-Specific Ion Parameters. J. Phys. Chem. B. 2009;113:13279–13290. doi: 10.1021/jp902584c. [DOI] [PMC free article] [PubMed] [Google Scholar]
  145. Paluch A. S., Jayaraman S., Shah J. K., Maginn E. J.. A Method for Computing the Solubility Limit of Solids: Application to Sodium Chloride in Water and Alcohols. J. Chem. Phys. 2010;133:124504. doi: 10.1063/1.3478539. [DOI] [PubMed] [Google Scholar]
  146. Boothroyd S., Kerridge A., Broo A., Buttar D., Anwar J.. Solubility Prediction from First Principles: A Density of States Approach. Phys. Chem. Chem. Phys. 2018;20:20981–20987. doi: 10.1039/C8CP01786G. [DOI] [PubMed] [Google Scholar]
  147. Boothroyd S., Anwar J.. Solubility Prediction for a Soluble Organic Molecule via Chemical Potentials from Density of States. J. Chem. Phys. 2019;151:184113. doi: 10.1063/1.5117281. [DOI] [PubMed] [Google Scholar]
  148. Li L., Totton T., Frenkel D.. Computational Methodology for Solubility Prediction: Application to the Sparingly Soluble Solutes. J. Chem. Phys. 2017;146:214110. doi: 10.1063/1.4983754. [DOI] [PubMed] [Google Scholar]
  149. Kuentz M., Bergström C. A. S.. Synergistic Computational Modeling Approaches as Team Players in the Game of Solubility Predictions. J. Pharm. Sci. 2021;110:22–34. doi: 10.1016/j.xphs.2020.10.068. [DOI] [PubMed] [Google Scholar]
  150. Thakur T. S., Dubey R., Desiraju G. R.. Crystal Structure and Prediction. Annu. Rev. Phys. Chem. 2015;66:21–42. doi: 10.1146/annurev-physchem-040214-121452. [DOI] [PubMed] [Google Scholar]
  151. Horn H. W., Swope W. C., Pitera J. W., Madura J. D., Dick T. J., Hura G. L., Head-Gordon T.. Development of an improved four-site water model for biomolecular simulations: TIP4P-Ew. J. Chem. Phys. 2004;120:9665–9678. doi: 10.1063/1.1683075. [DOI] [PubMed] [Google Scholar]
  152. Joung I. S., III T. E. C.. Determination of Alkali and Halide Monovalent Ion Parameters for Use in Explicitly Solvated Biomolecular Simulations. J. Phys. Chem. B. 2008;112:9020–9041. doi: 10.1021/jp8001614. [DOI] [PMC free article] [PubMed] [Google Scholar]
  153. Berendsen H. J. C., Grigera J. R., Straatsma T. P.. The Missing Term in Effective Pair Potentials. J. Phys. Chem. 1987;91:6269–6271. doi: 10.1021/j100308a038. [DOI] [Google Scholar]
  154. Kolafa J.. Solubility of NaCl in Water and Its Melting Point by Molecular Dynamics in the Slab Geometry and a New BK3-compatible Force Field. J. Chem. Phys. 2016;145:204509. doi: 10.1063/1.4968045. [DOI] [PubMed] [Google Scholar]
  155. Kiss P. T., Baranyai A.. A New Polarizable Force Field for Alkali and Halide Ions. J. Chem. Phys. 2014;141:114501. doi: 10.1063/1.4895129. [DOI] [PubMed] [Google Scholar]
  156. Kiss P. T., Baranyai A.. A Systematic Development of a Polarizable Potential of Water. J. Chem. Phys. 2013;138:204507. doi: 10.1063/1.4807600. [DOI] [PubMed] [Google Scholar]
  157. Yagasaki T., Matsumoto M., Tanaka H.. Lennard-Jones Parameters Determined to Reproduce the Solubility of NaCl and KCl in SPC/E, TIP3P, and TIP4P/2005 Water. J. Chem. Theory Comput. 2020;16:2460–2473. doi: 10.1021/acs.jctc.9b00941. [DOI] [PubMed] [Google Scholar]
  158. Jorgensen W. L., Chandrasekhar J., Madura J. D., Impey R. W., Klein M. L.. Comparison of Simple Potential Functions for Simulating Liquid Water. J. Chem. Phys. 1983;79:926–935. doi: 10.1063/1.445869. [DOI] [Google Scholar]
  159. Abascal J. L. F., Sanz E., García Fernández R., Vega C.. A Potential Model for the Study of Ices and Amorphous Water: TIP4P/Ice. J. Chem. Phys. 2005;122:234511. doi: 10.1063/1.1931662. [DOI] [PubMed] [Google Scholar]
  160. Khanna V., Doherty M. F., Peters B.. Predicting solubility and driving forces for crystallization using the absolute chemical potential route. Mol. Phys. 2023;121:e2155595. doi: 10.1080/00268976.2022.2155595. [DOI] [Google Scholar]
  161. Mester Z., Panagiotopoulos A. Z.. Temperature-Dependent Solubilities and Mean Ionic Activity Coefficients of Alkali Halides in Water from Molecular Dynamics Simulations. J. Chem. Phys. 2015;143:044505. doi: 10.1063/1.4926840. [DOI] [PubMed] [Google Scholar]
  162. Gee M. B., Cox N. R., Jiao Y., Bentenitis N., Weerasinghe S., Smith P. E.. A Kirkwood-Buff Derived Force Field for Aqueous Alkali Halides. J. Chem. Theory Comput. 2011;7:1369–1380. doi: 10.1021/ct100517z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  163. Reiser S., Deublein S., Vrabec J., Hasse H.. Molecular Dispersion Energy Parameters for Alkali and Halide Ions in Aqueous Solution. J. Chem. Phys. 2014;140:044504. doi: 10.1063/1.4858392. [DOI] [PubMed] [Google Scholar]
  164. Smith D. E., Dang L. X.. Computer Simulations of NaCl Association in Polarizable Water. J. Chem. Phys. 1994;100:3757–3766. doi: 10.1063/1.466363. [DOI] [Google Scholar]
  165. Saravi S. H., Panagiotopoulos A. Z.. Activity Coefficients and Solubilities of NaCl in Water-Methanol Solutions from Molecular Dynamics Simulations. J. Phys. Chem. B. 2022;126:2891–2898. doi: 10.1021/acs.jpcb.2c00813. [DOI] [PubMed] [Google Scholar]
  166. Dočkal J., Lísal M., Moučka F.. Molecular Force Field Development for Aqueous Electrolytes: 2. Polarizable Models Incorporating Crystalline Chemical Potential and Their Accurate Simulations of Halite, Hydrohalite, Aqueous Solutions of NaCl, and Solubility. J. Chem. Theory Comput. 2020;16:3677–3688. doi: 10.1021/acs.jctc.0c00161. [DOI] [PubMed] [Google Scholar]
  167. Benavides A. L., Portillo M. A., Chamorro V. C., Espinosa J. R., Abascal J. L. F., Vega C.. A Potential Model for Sodium Chloride Solutions Based on the TIP4P/2005 Water Model. J. Chem. Phys. 2017;147:104501. doi: 10.1063/1.5001190. [DOI] [PubMed] [Google Scholar]
  168. Tainter C. J., Shi L., Skinner J. L.. Reparametrized E3B (Explicit Three-Body) Water Model Using the TIP4P/2005 Model as a Reference. J. Chem. Theory Comput. 2015;11:2268–2277. doi: 10.1021/acs.jctc.5b00117. [DOI] [PubMed] [Google Scholar]
  169. Jiang H., Moultos O. A., Economou I. G., Panagiotopoulos A. Z.. Hydrogen-Bonding Polarizable Intermolecular Potential Model for Water. J. Phys. Chem. B. 2016;120:12358–12370. doi: 10.1021/acs.jpcb.6b08205. [DOI] [PubMed] [Google Scholar]
  170. Young J. M., Tietz C., Panagiotopoulos A. Z.. Activity Coefficients and Solubility of CaCl2 from Molecular Simulations. J. Chem. Eng. Data. 2020;65:337–348. doi: 10.1021/acs.jced.9b00688. [DOI] [Google Scholar]
  171. Mamatkulov S., Fyta M., Netz R. R.. Force Fields for Divalent Cations Based on Single-Ion and Ion-Pair Properties. J. Chem. Phys. 2013;138:024505. doi: 10.1063/1.4772808. [DOI] [PubMed] [Google Scholar]
  172. Deublein S., Reiser S., Vrabec J., Hasse H.. A Set of Molecular Models for Alkaline-Earth Cations in Aqueous Solution. J. Phys. Chem. B. 2012;116:5448–5457. doi: 10.1021/jp3013514. [DOI] [PubMed] [Google Scholar]
  173. Martinek T., Duboué-Dijon E., Timr Š., Mason P. E., Baxová K., Fischer H. E., Schmidt B., Pluhařová E., Jungwirth P.. Calcium Ions in Aqueous Solutions: Accurate Force Field Description Aided by Ab Initio Molecular Dynamics and Neutron Scattering. J. Chem. Phys. 2018;148:222813. doi: 10.1063/1.5006779. [DOI] [PubMed] [Google Scholar]
  174. Khanna V., Doherty M. F., Peters B.. Absolute chemical potentials for complex molecules in fluid phases: A centroid reference for predicting phase equilibria. J. Chem. Phys. 2020;153:214504. doi: 10.1063/5.0025844. [DOI] [PubMed] [Google Scholar]
  175. Frenkel D., Ladd A. J. C.. New Monte Carlo Method to Compute the Free Energy of Arbitrary Solids. Application to the Fcc and Hcp Phases of Hard Spheres. J. Chem. Phys. 1984;81:3188–3193. doi: 10.1063/1.448024. [DOI] [Google Scholar]
  176. Bellucci M. A., Gobbo G., Wijethunga T. K., Ciccotti G., Trout B. L.. Solubility of Paracetamol in Ethanol by Molecular Dynamics Using the Extended Einstein Crystal Method and Experiments. J. Chem. Phys. 2019;150:094107. doi: 10.1063/1.5086706. [DOI] [PubMed] [Google Scholar]
  177. Reinhardt A., Chew P. Y., Cheng B.. A streamlined molecular-dynamics workflow for computing solubilities of molecular and ionic crystals. J. Chem. Phys. 2023;159:184110. doi: 10.1063/5.0173341. [DOI] [PubMed] [Google Scholar]
  178. Granberg R. A., Rasmuson A. C.. Solubility of Paracetamol in Pure Solvents. J. Chem. Eng. Data. 1999;44:1391–1395. doi: 10.1021/je990124v. [DOI] [Google Scholar]
  179. Wang F., Landau D. P.. Efficient, Multiple-Range Random Walk Algorithm to Calculate the Density of States. Phys. Rev. Lett. 2001;86:2050–2053. doi: 10.1103/PhysRevLett.86.2050. [DOI] [PubMed] [Google Scholar]
  180. Shell M. S., Debenedetti P. G., Panagiotopoulos A. Z.. Generalization of the Wang-Landau method for off-lattice simulations. Phys. Rev. E. Stat. Nonlin. Soft Matter Phys. 2002;66:056703. doi: 10.1103/PhysRevE.66.056703. [DOI] [PubMed] [Google Scholar]
  181. Mastny E. A., de Pablo J. J.. Direct calculation of solid-liquid equilibria from density-of-states Monte Carlo simulations. J. Chem. Phys. 2005;122:124109. doi: 10.1063/1.1874792. [DOI] [PubMed] [Google Scholar]
  182. Wang J., Wolf R. M., Caldwell J. W., Kollman P. A., Case D. A.. Development and testing of a general amber force field. J. Comput. Chem. 2004;25:1157–1174. doi: 10.1002/jcc.20035. [DOI] [PubMed] [Google Scholar]
  183. Ben-Naim A.. Standard Thermodynamics of Transfer. Uses and Misuses. J. Phys. Chem. 1978;82:792–803. doi: 10.1021/j100496a008. [DOI] [Google Scholar]
  184. Ben-Naim A.. Solvation Thermodynamics of Nonionic Solutes. J. Chem. Phys. 1984;81:2016–2027. doi: 10.1063/1.447824. [DOI] [Google Scholar]
  185. Docherty R., Pencheva K., Abramov Y. A.. Low Solubility in Drug Development: De-Convoluting the Relative Importance of Solvation and Crystal Packing. J. Pharm. Pharmacol. 2015;67:847–856. doi: 10.1111/jphp.12393. [DOI] [PubMed] [Google Scholar]
  186. Perlovich G. L., Raevsky O. A.. Sublimation of Molecular Crystals: Prediction of Sublimation Functions on the Basis of HYBOT Physicochemical Descriptors and Structural Clusterization. Cryst. Growth Des. 2010;10:2707–2712. doi: 10.1021/cg1001946. [DOI] [Google Scholar]
  187. Abramov Y. A.. Major Source of Error in QSPR Prediction of Intrinsic Thermodynamic Solubility of Drugs: Solid vs Nonsolid State Contributions? Mol. Pharmaceutics. 2015;12:2126–2141. doi: 10.1021/acs.molpharmaceut.5b00119. [DOI] [PubMed] [Google Scholar]
  188. Stone A.. Distributed Multipole Analysis, or How to Describe a Molecular Charge Distribution. Chem. Phys. Lett. 1981;83:233–239. doi: 10.1016/0009-2614(81)85452-8. [DOI] [Google Scholar]
  189. Price S. L., Leslie M., Welch G. W. A., Habgood M., Price L. S., Karamertzanis P. G., Day G. M.. Modelling Organic Crystal Structures Using Distributed Multipole and Polarizability-Based Model Intermolecular Potentials. Phys. Chem. Chem. Phys. 2010;12:8478–8490. doi: 10.1039/c004164e. [DOI] [PubMed] [Google Scholar]
  190. Gavezzotti, A. ; Filippini, G. . Theoretical Aspects and Computer Modeling of the Molecular Solid State; Wiley and Sons: Chichester, U.K., 1997; pp 61–99. [Google Scholar]
  191. Palmer D. S., Llinàs A., Morao I., Day G. M., Goodman J. M., Glen R. C., Mitchell J. B. O.. Predicting Intrinsic Aqueous Solubility by a Thermodynamic Cycle. Mol. Pharmaceutics. 2008;5:266–279. doi: 10.1021/mp7000878. [DOI] [PubMed] [Google Scholar]
  192. Abramov Y. A., Sun G., Zeng Q., Zeng Q., Yang M.. Guiding Lead Optimization for Solubility Improvement with Physics-Based Modeling. Mol. Pharmaceutics. 2020;17:666–673. doi: 10.1021/acs.molpharmaceut.9b01138. [DOI] [PubMed] [Google Scholar]
  193. Palmer D. S., McDonagh J. L., Mitchell J. B. O., van Mourik T., Fedorov M. V.. First-Principles Calculation of the Intrinsic Aqueous Solubility of Crystalline Druglike Molecules. J. Chem. Theory Comput. 2012;8:3322–3337. doi: 10.1021/ct300345m. [DOI] [PubMed] [Google Scholar]
  194. McDonagh J. L., Palmer D. S., van Mourik T., Mitchell a. J. B. O.. Are the Sublimation Thermodynamics of Organic Molecules Predictable? J. Chem. Inf. Model. 2016;56:2162–2179. doi: 10.1021/acs.jcim.6b00033. [DOI] [PubMed] [Google Scholar]
  195. Buchholz H. K., Hylton R. K., Brandenburg J. G., Seidel-Morgenstern A., Lorenz H., Stein M., Price S. L.. Thermochemistry of Racemic and Enantiopure Organic Crystals for Predicting Enantiomer Separation. Cryst. Growth Des. 2017;17:4676–4686. doi: 10.1021/acs.cgd.7b00582. [DOI] [Google Scholar]
  196. Otero-de-la-Roza A., Johnson E. R.. A Benchmark for Non-Covalent Interactions in Solids. J. Chem. Phys. 2012;137:054103. doi: 10.1063/1.4738961. [DOI] [PubMed] [Google Scholar]
  197. Reilly A. M., Tkatchenko A.. Understanding the Role of Vibrations, Exact Exchange, and Many-Body van Der Waals Interactions in the Cohesive Properties of Molecular Crystals. J. Chem. Phys. 2013;139:024705. doi: 10.1063/1.4812819. [DOI] [PubMed] [Google Scholar]
  198. Hoja J., Tkatchenko A.. First-Principles Stability Ranking of Molecular Crystal Polymorphs with the DFT+MBD Approach. Faraday Discuss. 2018;211:253–274. doi: 10.1039/C8FD00066B. [DOI] [PubMed] [Google Scholar]
  199. Fowles D. J., Palmer D. S., Guo R., Price S. L., Mitchell J. B.. Toward Physics-Based Solubility Computation for Pharmaceuticals to Rival Informatics. J. Chem. Theory Comput. 2021;17:3700–3709. doi: 10.1021/acs.jctc.1c00130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  200. Iuzzolino L., McCabe P., Price S. L., Brandenburg J. G.. Crystal Structure Prediction of Flexible Pharmaceutical-like Molecules: Density Functional Tight-Binding as an Intermediate Optimisation Method and for Free Energy Estimation. Faraday Discuss. 2018;211:275–296. doi: 10.1039/C8FD00010G. [DOI] [PubMed] [Google Scholar]
  201. Brandenburg J. G., Grimme S.. Accurate Modeling of Organic Molecular Crystals by DispersionCorrected Density Functional Tight Binding (DFTB) J. Phys. Chem. Lett. 2014;5:1785–1789. doi: 10.1021/jz500755u. [DOI] [PubMed] [Google Scholar]
  202. Brandenburg J. G., Potticary J., Sparkes H. A., Price S. L., Hall S. R.. Thermal Expansion of Carbamazepine: Systematic Crystallographic Measurements Challenge Quantum Chemical Calculations. J. Phys. Chem. Lett. 2017;8:4319–4324. doi: 10.1021/acs.jpclett.7b01944. [DOI] [PubMed] [Google Scholar]
  203. Heit Y. N., Beran G. J. O.. How Important Is Thermal Expansion for Predicting Molecular Crystal Structures and Thermochemistry at Finite Temperatures? Acta. Crystallogr. B Struct. Sci. Cryst. Eng. Mater. 2016;72:514–529. doi: 10.1107/S2052520616005382. [DOI] [PubMed] [Google Scholar]
  204. Cervinka C., Fulem M., Ružička K.. CCSD­(T)/CBS fragment-based calculations of lattice energy of molecular crystals. J. Chem. Phys. 2016;144:064505. doi: 10.1063/1.4941055. [DOI] [PubMed] [Google Scholar]
  205. Cervinka C., Fulem M.. State-of-the-Art Calculations of Sublimation Enthalpies for Selected Molecular Crystals and Their Computational Uncertainty. J. Chem. Theory Comput. 2017;13:2840–2850. doi: 10.1021/acs.jctc.7b00164. [DOI] [PubMed] [Google Scholar]
  206. Cervinka C., Fulem M.. Probing the Accuracy of First-Principles Modeling of Molecular Crystals: Calculation of Sublimation Pressures. Cryst. Growth Des. 2019;19:808–820. doi: 10.1021/acs.cgd.8b01374. [DOI] [Google Scholar]
  207. Beran G. J. O.. Frontiers of molecular crystal structure prediction for pharmaceuticals and functional organic materials. Chem. Sci. 2023;14:13290. doi: 10.1039/D3SC03903J. [DOI] [PMC free article] [PubMed] [Google Scholar]
  208. Hoja J., Tkatchenko A., Ko H., Car R., Neumann M., DiStasio R.. Reliable and Practical Computational Description of Molecular Crystal Polymorphs. Sci. Adv. 2019;5:eaau3338. doi: 10.1126/sciadv.aau3338. [DOI] [PMC free article] [PubMed] [Google Scholar]
  209. Greenwell C., Beran G. J. O.. Inaccurate Conformational Energies Still Hinder Crystal Structure Prediction in Flexible Organic Molecules. Crys. Growth Des. 2020;20:4875–4881. doi: 10.1021/acs.cgd.0c00676. [DOI] [Google Scholar]
  210. Beran G. J. O., Wright S. E., Greenwell C., Cruz-Cabeza A. J.. The interplay of intra- and intermolecular errors in modeling conformational polymorphs. J. Chem. Phys. 2022;156:104112. doi: 10.1063/5.0088027. [DOI] [PubMed] [Google Scholar]
  211. Firaha D., Liu Y. M., van de Streek J., Sasikumar K., Dietrich H., Helfferich J., Aerts L., Braun D. E., Broo A., DiPasquale A. G.. et al. Predicting crystal form stability under real-world conditions. Nature. 2023;623:324–328. doi: 10.1038/s41586-023-06587-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  212. Maddox J.. Crystals from First Principles. Nature. 1988;335:201–201. doi: 10.1038/335201a0. [DOI] [Google Scholar]
  213. Nordstrom F. L., Rasmuson A. C.. Prediction of solubility curves and melting properties of organic and pharmaceutical compounds. Eur. J. Pharm. Sci. 2009;36:330–344. doi: 10.1016/j.ejps.2008.10.009. [DOI] [PubMed] [Google Scholar]
  214. Cao Z., Hu Y., Li J., Kai Y., Yang W.. Solubility of glycine in binary system of ethanol + water solvent mixtures: Experimental data and thermodynamic modeling. Fluid Ph. Equilib. 2013;360:156–160. doi: 10.1016/j.fluid.2013.09.013. [DOI] [Google Scholar]
  215. Renon H., Prausnitz J. M.. Estimation of Parameters for the NRTL Equation for Excess Gibbs Energies of Strongly Nonideal Liquid Mixtures. Ind. Eng. Chem. Process Des. Dev. 1969;8:413–419. doi: 10.1021/i260031a019. [DOI] [Google Scholar]
  216. Abrams D. S., Prausnitz J. M.. Statistical Thermodynamics of Liquid Mixtures: A New Expression for the Excess Gibbs Energy of Partly or Completely Miscible Systems. AIChE J. 1975;21:116–128. doi: 10.1002/aic.690210115. [DOI] [Google Scholar]
  217. Fredenslund A., Jones R. L., Prausnitz J. M.. Group-Contribution Estimation of Activity Coefficients in Nonideal Liquid Mixtures. AIChE J. 1975;21:1086–1099. doi: 10.1002/aic.690210607. [DOI] [Google Scholar]
  218. Gross J., Sadowski G.. Perturbed-Chain SAFT: An Equation of State Based on a Perturbation Theory for Chain Molecules. Ind. Eng. Chem. Res. 2001;40:1244–1260. doi: 10.1021/ie0003887. [DOI] [Google Scholar]
  219. Gill, P. ; Moghadam, T. T. ; Ranjbar, B. . Differential Scanning Calorimetry Techniques: Applications in Biology and Nanoscience. J. Biomol. Technol. 2010, 21, 167–193. [PMC free article] [PubMed] [Google Scholar]
  220. Chua Y. Z., Do H. T., Schick C., Zaitsau D., Held C.. New Experimental Melting Properties as Access for Predicting Amino-Acid Solubility. RSC Adv. 2018;8:6365–6372. doi: 10.1039/C8RA00334C. [DOI] [PMC free article] [PubMed] [Google Scholar]
  221. Do H. T., Chua Y. Z., Kumar A., Pabsch D., Hallermann M., Zaitsau D., Schick C., Held C.. Melting Properties of Amino Acids and Their Solubility in Water. RSC Adv. 2020;10:44205–44215. doi: 10.1039/D0RA08947H. [DOI] [PMC free article] [PubMed] [Google Scholar]
  222. Wassvik C. M., Holmen A. G., Bergstrom C. A., Zamora I., Artursson P.. Contribution of Solid-State Properties to the Aqueous Solubility of Drugs. Eur. J. Pharm. Sci. 2006;29:294–305. doi: 10.1016/j.ejps.2006.05.013. [DOI] [PubMed] [Google Scholar]
  223. O’Boyle N. M., Palmer D. S., Nigsch F., Mitchell J. B. O.. Simultaneous Feature Selection and Parameter Optimisation Using an Artificial Ant Colony: Case Study of Melting Point Prediction. Chem. Cent. J. 2008;2:21. doi: 10.1186/1752-153X-2-21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  224. Westergren J., Lindfors L., Hoglund T., Luder K., Nordholm S., Kjellander R.. In Silico Prediction of Drug Solubility: 1. Free Energy of Hydration. J. Phys. Chem. B. 2007;111:1872–1882. doi: 10.1021/jp064220w. [DOI] [PubMed] [Google Scholar]
  225. Lüder K., Lindfors L., Westergren J., Nordholm S., Kjellander R.. In Silico Prediction of Drug Solubility: 2. Free Energy of Solvation in Pure Melts. J. Phys. Chem. B. 2007;111:1883–1892. doi: 10.1021/jp0642239. [DOI] [PubMed] [Google Scholar]
  226. Lüder K., Lindfors L., Westergren J., Nordholm S., Kjellander R.. In Silico Prediction of Drug Solubility. 3. Free Energy of Solvation in Pure Amorphous Matter. J. Phys. Chem. B. 2007;111:7303–7311. doi: 10.1021/jp071687d. [DOI] [PubMed] [Google Scholar]
  227. Lüder K., Lindfors L., Westergren J., Nordholm S., Persson R., Pedersen M.. In Silico Prediction of Drug Solubility: 4. Will Simple Potentials Suffice? J. Comput. Chem. 2009;30:1859–1871. doi: 10.1002/jcc.21173. [DOI] [PubMed] [Google Scholar]
  228. Vinutha H., Frenkel D.. Computation of the chemical potential and solubility of amorphous solids. J. Chem. Phys. 2021;154:124502. doi: 10.1063/5.0038955. [DOI] [PubMed] [Google Scholar]
  229. Thakore S. D., Akhtar J., Jain R., Paudel A., Bansal A. K.. Analytical and Computational Methods for the Determination of Drug-Polymer Solubility and Miscibility. Mol. Pharmaceutics. 2021;18:2835–2866. doi: 10.1021/acs.molpharmaceut.1c00141. [DOI] [PubMed] [Google Scholar]
  230. Walden D. M., Bundey Y., Jagarapu A., Antontsev V., Chakravarty K., Varshney J.. Molecular Simulation and Statistical Learning Methods toward Predicting Drug-Polymer Amorphous Solid Dispersion Miscibility, Stability, and Formulation Design. Molecules. 2021;26:182. doi: 10.3390/molecules26010182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  231. Pandi P., Bulusu R., Kommineni N., Khan W., Singh M.. Amorphous solid dispersions: An update for preparation, characterization, mechanism on bioavailability, stability, regulatory considerations and marketed products. Int. J. Pharm. 2020;586:119560. doi: 10.1016/j.ijpharm.2020.119560. [DOI] [PMC free article] [PubMed] [Google Scholar]
  232. Anderson B. D.. Predicting Solubility/Miscibility in Amorphous Dispersions: It Is Time to Move Beyond Regular Solution Theories. J. Pharm. Sci. 2018;107:24–33. doi: 10.1016/j.xphs.2017.09.030. [DOI] [PubMed] [Google Scholar]
  233. Paus R., Ji Y., Vahle L., Sadowski G.. Predicting the Solubility Advantage of Amorphous Pharmaceuticals: A Novel Thermodynamic Approach. Mol. Pharmaceutics. 2015;12:2823–2833. doi: 10.1021/mp500824d. [DOI] [PubMed] [Google Scholar]
  234. Chakravarty P., Lubach J. W., Hau J., Nagapudi K.. A Rational Approach towards Development of Amorphous Solid Dispersions: Experimental and Computational techniques. Int. J. Pharm. 2017;519:44–57. doi: 10.1016/j.ijpharm.2017.01.003. [DOI] [PubMed] [Google Scholar]
  235. Gobbo D., Ballone P., Decherchi S., Cavalli A.. Solubility Advantage of Amorphous Ketoprofen. Thermodynamic and Kinetic Aspects by Molecular Dynamics and Free Energy Approaches. J. Chem. Theory Comput. 2020;16:4126–4140. doi: 10.1021/acs.jctc.0c00166. [DOI] [PubMed] [Google Scholar]
  236. Hancock B. C., York P., Rowe R. C.. The use of solubility parameters in pharmaceutical dosage form design. Int. J. Pharm. 1997;148:1–21. doi: 10.1016/S0378-5173(96)04828-4. [DOI] [Google Scholar]
  237. Xiang T.-X., Anderson B. D.. Effects of Molecular Interactions on Miscibility and Mobility of Ibuprofen in Amorphous Solid Dispersions With Various Polymers. J. Pharm. Sci. 2019;108:178–186. doi: 10.1016/j.xphs.2018.10.052. [DOI] [PubMed] [Google Scholar]
  238. DeBoyace K., Wildfong P. L.. The Application of Modeling and Prediction to the Formation and Stability of Amorphous Solid Dispersions. J. Pharm. Sci. 2018;107:57–74. doi: 10.1016/j.xphs.2017.03.029. [DOI] [PubMed] [Google Scholar]
  239. Marenich A. V., Olson R. M., Kelly C. P., Cramer C. J., Truhlar D. G.. Self-Consistent Reaction Field Model for Aqueous and Nonaqueous Solutions Based on Accurate Polarized Partial Charges. J. Chem. Theory Comput. 2007;3:2011–2033. doi: 10.1021/ct7001418. [DOI] [PubMed] [Google Scholar]
  240. Cramer C. J., Truhlar D. G.. A Universal Approach to Solvation Modeling. Acc. Chem. Res. 2008;41:760–768. doi: 10.1021/ar800019z. [DOI] [PubMed] [Google Scholar]
  241. Marenich A. V., Cramer C. J., Truhlar D. G.. Generalized Born Solvation Model SM12. J. Chem. Theory Comput. 2013;9:609–620. doi: 10.1021/ct300900e. [DOI] [PubMed] [Google Scholar]
  242. Marenich A. V., Cramer C. J., Truhlar D. G.. Universal Solvation Model Based on Solute Electron Density and on a Continuum Model of the Solvent Defined by the Bulk Dielectric Constant and Atomic Surface Tensions. J. Phys. Chem. B. 2009;113:6378–6396. doi: 10.1021/jp810292n. [DOI] [PubMed] [Google Scholar]
  243. Ribeiro R. F., Marenich A. V., Cramer C. J., Truhlar D. G.. Prediction of SAMPL2 Aqueous Solvation Free Energies and Tautomeric Ratios Using the SM8, SM8AD, and SMD Solvation Models. J. Comput. Aided. Mol. Des. 2010;24:317–333. doi: 10.1007/s10822-010-9333-9. [DOI] [PubMed] [Google Scholar]
  244. König G., Mei Y., IV F. C. P., Simmonett A. C., Miller B. T., Herbert J. M., Woodcock H. L., Brooks B. R., Shao Y.. Computation of Hydration Free Energies Using the Multiple Environment Single System Quantum Mechanical/Molecular Mechanical (MESS-QM/MM) Method. J. Chem. Theory Comput. 2016;12:332–344. doi: 10.1021/acs.jctc.5b00874. [DOI] [PMC free article] [PubMed] [Google Scholar]
  245. Ho J.. Are Thermodynamic Cycles Necessary for Continuum Solvent Calculation of pKas and Reduction Potentials? Phys. Chem. Chem. Phys. 2015;17:2859–2868. doi: 10.1039/C4CP04538F. [DOI] [PubMed] [Google Scholar]
  246. Kromann J. C., Steinmann C., Jensen J. H.. Improving Solvation Energy Predictions Using the SMD Solvation Method and Semiempirical Electronic Structure Methods. J. Chem. Phys. 2018;149:104102. doi: 10.1063/1.5047273. [DOI] [PubMed] [Google Scholar]
  247. Zanith C. C., Jr J. R. P.. Performance of the SMD and SM8Models for Predicting Solvation Free Energy of Neutral Solutes in Methanol, Dimethyl Sulfoxide and Acetonitrile. J. Comput. Aided Mol. Des. 2015;29:217–224. doi: 10.1007/s10822-014-9814-3. [DOI] [PubMed] [Google Scholar]
  248. Mirzaei S., Ivanov M. V., Timerghazin Q. K.. Improving Performance of the SMD Solvation Model: Bondi Radii Improve Predicted Aqueous Solvation Free Energies of Ions and pKa Values of Thiols. J. Phys. Chem. A. 2019;123:9498–9504. doi: 10.1021/acs.jpca.9b02340. [DOI] [PubMed] [Google Scholar]
  249. Miguel E. L. M., Santos C. I. L., Silva C. M., Jr J. R. P.. How Accurate Is the SMD Model for Predicting Free Energy Barriers for Nucleophilic Substitution Reactions in Polar Protic and Dipolar Aprotic Solvents? J. Braz. Chem. Soc. 2016;27:2055–2061. doi: 10.5935/0103-5053.20160095. [DOI] [Google Scholar]
  250. Miertus S., Scrocco E., Tomasi J.. Electrostatic Interaction of a Solute with a Continuum. A Direct Utilizaion of AB Initio Molecular Potentials for the Prevision of Solvent Effects. Chem. Phys. 1981;55:117–129. doi: 10.1016/0301-0104(81)85090-2. [DOI] [Google Scholar]
  251. Cossi M., Rega N., Scalmani G., Barone V.. Energies, Structures, and Electronic Properties of Molecules in Solution with the C-PCM Solvation Model. J. Comput. Chem. 2003;24:669–681. doi: 10.1002/jcc.10189. [DOI] [PubMed] [Google Scholar]
  252. Takano Y., Houk K. N.. Benchmarking the Conductor-like Polarizable Continuum Model (CPCM) for Aqueous Solvation Free Energies of Neutral and Ionic Organic Molecules. J. Chem. Theory Comput. 2005;1:70–77. doi: 10.1021/ct049977a. [DOI] [PubMed] [Google Scholar]
  253. Gutowski K. E., Dixon D. A.. Predicting the Energy of the Water Exchange Reaction and Free Energy of Solvation for the Uranyl Ion in Aqueous Solution. J. Phys. Chem. A. 2006;110:8840–8856. doi: 10.1021/jp061851h. [DOI] [PubMed] [Google Scholar]
  254. Ho J., Ertem M. Z.. Calculating Free Energy Changes in Continuum Solvation Models. J. Phys. Chem. B. 2016;120:1319–1329. doi: 10.1021/acs.jpcb.6b00164. [DOI] [PubMed] [Google Scholar]
  255. Cances E., Mennucci B., Tomasi J.. A New Integral Equation Formalism for the Polarizable Continuum Model: Theoretical Background and Applications to Isotropic and Anisotropic Dielectrics. J. Chem. Phys. 1997;107:3032. doi: 10.1063/1.474659. [DOI] [Google Scholar]
  256. Cances E., Mennucci B.. New Applications of Integral Equations Methods for Solvation Continuum Models: Ionic Solutions and Liquid Crystals. J. Math. Chem. 1998;23:309–326. doi: 10.1023/A:1019133611148. [DOI] [Google Scholar]
  257. Meunier A., Truchon J.-F.. Predictions of Hydration Free Energies from Continuum Solvent with Solute Polarizable Models: The SAMPL2 Blind Challenge. J. Comput. Aided Mol. Des. 2010;24:361–372. doi: 10.1007/s10822-010-9339-3. [DOI] [PubMed] [Google Scholar]
  258. Viayna A., Pinheiro S., Curutchet C., Luque F. J., Zamora W. J.. Prediction of N-octanol/Water Partition Coefcients and Acidity Constants (pKa) in the SAMPL7 Blind Challenge with the IEFPCM-MST Model. J. Comput. Aided Mol. Des. 2021;35:803–811. doi: 10.1007/s10822-021-00394-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  259. Pickard F. C. IV, Konig G., Simmonett A. C., Shao Y., Brooks B. R.. An Efficient Protocol for Obtaining Accurate Hydration Free Energies Using Quantum Chemistry and Reweighting from Molecular Dynamics Simulations. Bioorg. Med. Chem. 2016;24:4988–4997. doi: 10.1016/j.bmc.2016.08.031. [DOI] [PMC free article] [PubMed] [Google Scholar]
  260. Genheden S., Mikulskis P., Hu L., Kongsted J., Soderhjelm P., Ryde U.. Accurate Predictions of Nonpolar Solvation Free Energies Require Explicit Consideration of Binding-Site Hydration. J. Am. Chem. Soc. 2011;133:13081–13092. doi: 10.1021/ja202972m. [DOI] [PubMed] [Google Scholar]
  261. Klamt A., Schuurmann G.. COSMO: A New Approach to Dielectric Screening in Solvents with Explicit Expressions for the Screening Energy and Its Gradient. J. Chem. Soc. 1993;2:799–805. doi: 10.1039/P29930000799. [DOI] [Google Scholar]
  262. Klamt A., Diedenhofen M.. Calculation of Solvation Free Energies with DCOSMO-RS. J. Phys. Chem. A. 2015;119:5439–5445. doi: 10.1021/jp511158y. [DOI] [PubMed] [Google Scholar]
  263. Eckert F., Klamt A.. Fast Solvent Screening via Quantum Chemistry:COSMO-RS Approach. AIChE. 2002;48:369–385. doi: 10.1002/aic.690480220. [DOI] [Google Scholar]
  264. Klamt A., Mennucci B., Tomasi J., Barone V., Curutchet C., Orozco M., Luque F. J.. On the Performance of Continuum Solvation Methods. A Comment on “Universal Approaches to Solvation Modeling”. Acc. Chem. Res. 2009;42:489–492. doi: 10.1021/ar800187p. [DOI] [PubMed] [Google Scholar]
  265. Klamt A.. The COSMO and COSMO-RS Solvation Models. WIREs Comput. Mol. Sci. 2018;8:e1338. doi: 10.1002/wcms.1338. [DOI] [Google Scholar]
  266. Sinnecker S., Rajendran A., Klamt A., Diedenhofen M., Neese F.. Calculation of Solvent Shifts on Electronic G-Tensors with the Conductor-like Screening Model (COSMO) and Its Self-Consistent Generalization to Real Solvents (Direct COSMO-RS) J. Phys. Chem. A. 2006;110:2235–3345. doi: 10.1021/jp056016z. [DOI] [PubMed] [Google Scholar]
  267. Klamt A., Eckert F., Reinisch J., Wichmann K.. Prediction of Cyclohexane-Water Distribution Coefficients with COSMO-RS on the SAMPL5 Data Set. J. Comput. Aided Mol. Des. 2016;30:959–967. doi: 10.1007/s10822-016-9927-y. [DOI] [PubMed] [Google Scholar]
  268. Klamt A., Diedenhofen M.. Blind Prediction Test of Free Energies of Hydration with COSMO-RS. J. Comput. Aided Mol. Des. 2010;24:357–360. doi: 10.1007/s10822-010-9354-4. [DOI] [PubMed] [Google Scholar]
  269. Wlazlo M., Alevizou E. I., Voutsas E. C., Domanska U.. Prediction of Ionic Liquids Phase Equilibrium with the COSMO-RS Model. Fluid Ph. Equilib. 2016;424:16–31. doi: 10.1016/j.fluid.2015.08.032. [DOI] [Google Scholar]
  270. Song Z., Wang J., Sundmacher K.. Evaluation of COSMO-RS for Solid-Liquid Equilibria Prediction of Binary Eutectic Solvent Systems. Green Energy Env. 2021;6:371–379. doi: 10.1016/j.gee.2020.11.020. [DOI] [Google Scholar]
  271. Andersson M. P., Bennetzen M. V., Klamt A., Stipp S. L. S.. First-Principles Prediction of Liquid/Liquid Interfacial Tension. J. Chem. Theory Comput. 2014;10:3401–3408. doi: 10.1021/ct500266z. [DOI] [PubMed] [Google Scholar]
  272. Scheffczyk J., Schäfer P., Fleitmann L., Thien J., Redepenning C., Leonhard K., Marquardt W., Bardow A.. COSMO-CAMPD: A Framework for Integrated Design of Molecules and Processes Based on COSMO-RS. Mol. Syst. Des. Eng. 2018;3:645–657. doi: 10.1039/C7ME00125H. [DOI] [Google Scholar]
  273. Garcia-Chavez L. Y., Hermans A. J., Schuur B., de Haan A. B.. COSMO-RS Assisted Solvent Screening for Liquid-Liquid Extraction of Mono Ethylene Glycol from Aqueous Streams. Sep. Purif. Technol. 2012;97:2–10. doi: 10.1016/j.seppur.2011.11.041. [DOI] [Google Scholar]
  274. Pozarska A., da Costa Mathews C., Wong M., Pencheva K.. Application of COSMO-RS as an Excipient Ranking Tool in Early Formulation Development. Eur. J. Pharm. Sci. 2013;49:505–511. doi: 10.1016/j.ejps.2013.04.021. [DOI] [PubMed] [Google Scholar]
  275. Paloncyova M., DeVane R., Murch B., Berka K., Otyepka M.. Amphiphilic Drug-like Molecules Accumulate in a Membrane below the Head Group Region. J. Phys. Chem. B. 2014;118:1030–1039. doi: 10.1021/jp4112052. [DOI] [PubMed] [Google Scholar]
  276. Mey A. S. J. S., Allen B. K., Macdonald H. E. B., Chodera J. D., Hahn D. F., Kuhn M., Michel J., Mobley D. L., Naden L. N., Prasad S.. et al. Best Practices for Alchemical Free Energy Calculations. Living J. Comput. Mol. Sci. 2020;2:18378. doi: 10.33011/livecoms.2.1.18378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  277. Chodera J. D., Mobley D. L., Shirts M. R., Dixon R. W., Bransond K., Pande V. S.. Alchemical free energy methods for drug discovery: Progress and challenges. Curr. Opin. Struct. Biol. 2011;21:150–160. doi: 10.1016/j.sbi.2011.01.011. [DOI] [PMC free article] [PubMed] [Google Scholar]
  278. Shivakumar D., Williams J., Wu Y., Damm W., Shelley J., Sherman W.. Prediction of Absolute Solvation Free Energies Using Molecular Dynamics Free Energy Perturbation and the OPLS Force Field. J. Chem. Theory Comput. 2010;6:1509–1519. doi: 10.1021/ct900587b. [DOI] [PubMed] [Google Scholar]
  279. Sadowsky D., Arey J. S.. Prediction of Aqueous Free Energies of Solvation Using Coupled QM and MM Explicit Solvent Simulations. Phys. Chem. Chem. Phys. 2020;22:8021. doi: 10.1039/D0CP00582G. [DOI] [PubMed] [Google Scholar]
  280. Leung K., Rempe S. B., von Lilienfeld O. A.. Ab Initio Molecular Dynamics Calculations of Ion Hydration Free Energies. J. Chem. Phys. 2009;130:204507. doi: 10.1063/1.3137054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  281. Chaudhari M. I., Pratt L. R., Rempe S. B.. Utility of Chemical Computations in Predicting Solution Free Energies of Metal Ions. Mol. Simul. 2018;44:110–116. doi: 10.1080/08927022.2017.1342127. [DOI] [Google Scholar]
  282. Li J., Wang F.. Accurate Prediction of the Hydration Free Energies of 20 Salts through Adaptive Force Matching and the Proper Comparison with Experimental References. J. Phys. Chem. B. 2017;121:6637–6645. doi: 10.1021/acs.jpcb.7b04618. [DOI] [PMC free article] [PubMed] [Google Scholar]
  283. Riquelme M., Lara A., Mobley D. L., Verstraelen T., Matamala A. R., Vöhringer-Martinez E.. Hydration Free Energies in the FreeSolv Database Calculated with Polarized Iterative Hirshfeld Charges. J. Chem. Inf. Model. 2018;58:1779–1797. doi: 10.1021/acs.jcim.8b00180. [DOI] [PMC free article] [PubMed] [Google Scholar]
  284. Shi Y., Xia Z., Zhang J., Best R., Wu C., Ponder J. W., Ren P.. The Polarizable Atomic Multipole-Based AMOEBA Force Field for Proteins. J. Chem. Theory. Comput. 2013;9:4046–4063. doi: 10.1021/ct4003702. [DOI] [PMC free article] [PubMed] [Google Scholar]
  285. Manzoni F., Soderhjelm P.. Prediction of Hydration Free Energies for the SAMPL4 Data Set with the AMOEBA Polarizable Force Field. J. Comput. Aided Mol. Des. 2014;28:235–244. doi: 10.1007/s10822-014-9733-3. [DOI] [PubMed] [Google Scholar]
  286. Bradshaw R. T., Essex J. W.. Evaluating Parametrization Protocols for Hydration Free Energy Calculations with the AMOEBA Polarizable Force Field. J. Chem. Theory Comput. 2016;12:3871–3883. doi: 10.1021/acs.jctc.6b00276. [DOI] [PubMed] [Google Scholar]
  287. Walker B., Liu C., Wait E., Ren P.. Automation of AMOEBA Polarizable Force Field for Smallmolecules: Poltype 2. J. Comput. Chem. 2022;43:1530–1542. doi: 10.1002/jcc.26954. [DOI] [PMC free article] [PubMed] [Google Scholar]
  288. Luukkonen S., Belloni L., Borgis D., Levesque M.. Predicting Hydration Free Energies of the FreeSolv Database of Drug-like Molecules with Molecular Density Functional Theory. J. Chem. Inf. Model. 2020;60:3558–3565. doi: 10.1021/acs.jcim.0c00526. [DOI] [PubMed] [Google Scholar]
  289. Luukkonen S., Levesque M., Belloni L., Borgis D.. Hydration Free Energies and Solvation Structures with Molecular Density Functional Theory in the Hypernetted Chain Approximation. J. Chem. Phys. 2020;152:064110. doi: 10.1063/1.5142651. [DOI] [PubMed] [Google Scholar]
  290. Borgis D., Luukkonen S., Belloni L., Jeanmairet G.. Simple Parameter-Free Bridge Functionals for Molecular Density Functional Theory. Application to Hydrophobic Solvation. J. Phys. Chem. B. 2020;124:6885–6893. doi: 10.1021/acs.jpcb.0c04496. [DOI] [PubMed] [Google Scholar]
  291. Borgis D., Luukkonen S., Belloni L., Jeanmairet G.. Accurate Prediction of Hydration Free Energies and Solvation Structures Using Molecular Density Functional Theory with a Simple Bridge Functional. J. Chem. Phys. 2021;155:024117. doi: 10.1063/5.0057506. [DOI] [PubMed] [Google Scholar]
  292. Chandler D., Andersen H. C.. Optimized Cluster Expansions for Classical Fluids. II. Theory of Molecular Liquids. J. Chem. Phys. 1972;57:1930–1937. doi: 10.1063/1.1678513. [DOI] [Google Scholar]
  293. Singer S. J., Chandler D.. Free Energy Functions in the Extended RISM Approximation. Mol. Phys. 1985;55:621–625. doi: 10.1080/00268978500101591. [DOI] [Google Scholar]
  294. Hirata, F. Molecular Theory of Solvation; Springer: Dordrecht, The Netherlands, 2003. [Google Scholar]
  295. Sato K., Chuman H., Ten-no S.. Comparative Study on Solvation Free Energy Expressions in Reference Interaction Site Model Integral Equation Theory. J. Phys. Chem. B. 2005;109:17290–17295. doi: 10.1021/jp053259i. [DOI] [PubMed] [Google Scholar]
  296. Chandler D., Singh Y., Richardson D. M.. Excess Electrons in Simple Fluids. I. General Equilibrium Theory for Classical Hard Sphere Solvents. J. Chem. Phys. 1984;81:1975. doi: 10.1063/1.447820. [DOI] [Google Scholar]
  297. Ten-no S., Iwata S.. On the Connection between the Reference Interaction Site Model Integral Equation Theory and the Partial Wave Expansion of the Molecular Ornstein-Zernike Equation. J. Chem. Phys. 1999;111:4865. doi: 10.1063/1.479746. [DOI] [Google Scholar]
  298. Palmer D. S., Sergiievskyi V. P., Jensen F., Fedorov M. V.. Accurate Calculations of the Hydration Free Energies of Druglike Molecules Using the Reference Interaction Site Model. J. Chem. Phys. 2010;133:044104. doi: 10.1063/1.3458798. [DOI] [PubMed] [Google Scholar]
  299. Palmer D. S., Chuev G. N., Ratkova E. L., Fedorov M. V.. In Silico Screening of Bioactive and Biomimetic Solutes Using Molecular Integral Equation Theory. Curr. Pharm. Des. 2011;17:1695–1708. doi: 10.2174/138161211796355065. [DOI] [PubMed] [Google Scholar]
  300. Chuev G. N., Fedorov M. V., Crain J.. Improved Estimates for Hydration Free Energy Obtained by the Reference Interaction Site Model. Chem. Phys. Lett. 2007;448:198–202. doi: 10.1016/j.cplett.2007.10.003. [DOI] [Google Scholar]
  301. Frolov A. I., Ratkova E. L., Palmer D. S., Fedorov M. V.. Hydration Thermodynamics Using the Reference Interaction Site Model: Speed or Accuracy? J. Phys. Chem. B. 2011;115:6011–6022. doi: 10.1021/jp111271c. [DOI] [PubMed] [Google Scholar]
  302. Ratkova E. L., Chuev G. N., Sergiievskyi V. P., Fedorov M. V.. An Accurate Prediction of Hydration Free Energies by Combination of Molecular Integral Equations Theory with Structural Descriptors. J. Phys. Chem. B. 2010;114:12068–12079. doi: 10.1021/jp103955r. [DOI] [PubMed] [Google Scholar]
  303. Palmer D. S., Misin M., Fedorov M. V., Llinas A.. Fast and General Method to Predict the Physicochemical Properties of Druglike Molecules Using the Integral Equation Theory of Molecular Liquids. Mol. Pharmaceutics. 2015;12:3420–3432. doi: 10.1021/acs.molpharmaceut.5b00441. [DOI] [PubMed] [Google Scholar]
  304. Fowles D. J., McHardy R. G., Ahmad A., Palmer D. S.. Accurately Predicting Solvation Free Energy in Aqueous and Organic Solvents beyond 298 K by Combining Deep Learning and the 1D Reference Interaction Site Model. Digit. Discovery. 2023;2:177–188. doi: 10.1039/D2DD00103A. [DOI] [Google Scholar]
  305. Fowles D. J., Palmer D. S.. Solvation Entropy, Enthalpy and Free Energy Prediction Using a Multi-Task Deep Learning Functional in 1D-RISM. Phys. Chem. Chem. Phys. 2023;25:6944–6954. doi: 10.1039/D3CP00199G. [DOI] [PubMed] [Google Scholar]
  306. Kovalenko A., Hirata F.. Hydration Free Energy of Hydrophobic Solutes Studied by a Reference Interaction Site Model with a Repulsive Bridge Correction and a Thermodynamic Perturbation Method. J. Chem. Phys. 2000;113:2793. doi: 10.1063/1.1305885. [DOI] [Google Scholar]
  307. Genheden S., Luchko T., Gusarov S., Kovalenko A., Ryde U.. An MM/3D-RISM Approach for Ligand Binding Affinities. J. Phys. Chem. B. 2010;114:8505–8516. doi: 10.1021/jp101461s. [DOI] [PubMed] [Google Scholar]
  308. Drabik P., Gusarov S., Kovalenko A.. Microtubule Stability Studied by Three-Dimensional Molecular Theory of Solvation. Biophys. J. 2007;92:394–403. doi: 10.1529/biophysj.106.089987. [DOI] [PMC free article] [PubMed] [Google Scholar]
  309. Blinov N., Dorosh L., Wishart D., Kovalenko A.. Association Thermodynamics and Conformational Stability of B-Sheet Amyloid b(17–42) Oligomers: Effects of E22Q (Dutch) Mutation and Charge Neutralization. Biophys. J. 2010;98:282–296. doi: 10.1016/j.bpj.2009.09.062. [DOI] [PMC free article] [PubMed] [Google Scholar]
  310. Ten-no S.. Free Energy of Solvation for the Reference Interaction Site Model: Critical Comparison of Expressions. J. Chem. Phys. 2001;115:3724. doi: 10.1063/1.1389851. [DOI] [Google Scholar]
  311. Ten-no S., Junga J., Chuman H., Kawashima Y.. Assessment of Free Energy Expressions in RISM Integral Equation Theory: Theoretical Predictionof Partition Coefficients Revisited. Mol. Phys. 2010;108:327–332. doi: 10.1080/00268970903451848. [DOI] [Google Scholar]
  312. Ratkova E. L., Fedorov M. V.. Combination of RISM and Cheminformatics for Efficient Predictions of Hydration Free Energy of Polyfragment Molecules: Application to a Set of Organic Pollutants. J. Chem. Theory Comput. 2011;7:1450–1457. doi: 10.1021/ct100654h. [DOI] [PubMed] [Google Scholar]
  313. Palmer D. S., Frolov A. I., Ratkova E. L., Fedorov M. V.. Towards a Universal Method for Calculating Hydration Free Energies: A 3D Reference Interaction Site Model with Partial Molar Volume Correction. J. Phys.: Condens. Matter. 2010;22:492101. doi: 10.1088/0953-8984/22/49/492101. [DOI] [PubMed] [Google Scholar]
  314. Palmer D. S., Frolov A. I., Ratkova E. L., Fedorov M. V.. Toward a Universal Model to Calculate the Solvation Thermodynamics of Druglike Molecules: The Importance of New Experimental Databases. Mol. Pharmaceutics. 2011;8:1423–1429. doi: 10.1021/mp200119r. [DOI] [PubMed] [Google Scholar]
  315. Sergiievskyi V. P., Fedorov M. V.. 3DRISM Multigrid Algorithm for Fast Solvation Free Energy Calculations. J. Chem. Theory Comput. 2012;8:2062–2070. doi: 10.1021/ct200815v. [DOI] [PubMed] [Google Scholar]
  316. Palmer D. S., Sørensen J., Schiøtt B., Fedorov M. V.. Solvent Binding Analysis and Computational Alanine Scanning of the Bovine Chymosin-Bovine κ-Casein Complex Using Molecular Integral Equation Theory. J. Chem. Theory Comput. 2013;9:5706–5717. doi: 10.1021/ct400605x. [DOI] [PubMed] [Google Scholar]
  317. Huang W., Blinov N., Kovalenko A.. Octanol-Water Partition Coefficient from 3D-RISM-KH Molecular Theory of Solvation with Partial Molar Volume Correction. J. Phys. Chem. B. 2015;119:5588–5597. doi: 10.1021/acs.jpcb.5b01291. [DOI] [PubMed] [Google Scholar]
  318. Truchon J.-F., Pettitt B. M., Labute P.. A Cavity Corrected 3D-RISM Functional for Accurate Solvation Free Energies. J. Chem. Theory Comput. 2014;10:934–941. doi: 10.1021/ct4009359. [DOI] [PMC free article] [PubMed] [Google Scholar]
  319. Sergiievskyi V. P., Jeanmairet G., Levesque M., Borgis D.. Fast Computation of Solvation Free Energies with Molecular Density Functional Theory: Thermodynamic-ensemble Partial Molar Volume Corrections. J. Phys. Chem. Lett. 2014;5:1935–1942. doi: 10.1021/jz500428s. [DOI] [PubMed] [Google Scholar]
  320. Sergiievskyi V., Jeanmairet G., Levesque M., Borgis D.. Solvation Free-Energy Pressure Corrections in the Three Dimensional Reference Interaction Site Model. J. Chem. Phys. 2015;143:184116. doi: 10.1063/1.4935065. [DOI] [PubMed] [Google Scholar]
  321. Misin M., Fedorov M. V., Palmer D. S.. Communication: Accurate Hydration Free Energies at a Wide Range of Temperatures from 3D-RISM. J. Chem. Phys. 2015;142:091105. doi: 10.1063/1.4914315. [DOI] [PubMed] [Google Scholar]
  322. Tanimoto S., Yoshida N., Yamaguchi T., Ten-no S. L., Nakano H.. Effect of Molecular Orientational Correlations on Solvation Free Energy Computed by Reference Interaction Site Model Theory. J. Chem. Inf. Model. 2019;59:3770–3781. doi: 10.1021/acs.jcim.9b00330. [DOI] [PubMed] [Google Scholar]
  323. Roy D., Kovalenko A.. Performance of 3D-RISM-KH in Predicting Hydration Free Energy: Effect of Solute Parameters. J. Phys. Chem. A. 2019;123:4087–4093. doi: 10.1021/acs.jpca.9b01623. [DOI] [PubMed] [Google Scholar]
  324. Misin M., Fedorov M. V., Palmer D. S.. Hydration Free Energies of Molecular Ions from Theory and Simulation. J. Phys. Chem. B. 2016;120:975–983. doi: 10.1021/acs.jpcb.5b10809. [DOI] [PubMed] [Google Scholar]
  325. Misin M., Vainikka P. A., Fedorov M. V., Palmer D. S.. Salting-out Effects by Pressure-Corrected 3D-RISM. J. Chem. Phys. 2016;145:194501. doi: 10.1063/1.4966973. [DOI] [PubMed] [Google Scholar]
  326. Fusani L., Wall I., Palmer D., Cortes A.. Optimal water networks in protein cavities with GAsol and 3D-RISM. Bioinform. 2018;34:1947–1948. doi: 10.1093/bioinformatics/bty024. [DOI] [PubMed] [Google Scholar]
  327. Misin M., Palmer D. S., Fedorov M. V.. Predicting Solvation Free Energies Using Parameter-Free Solvent Models. J. Phys. Chem. B. 2016;120:5724–5731. doi: 10.1021/acs.jpcb.6b05352. [DOI] [PubMed] [Google Scholar]
  328. Hikiri S., Hayashi T., Inoue M., Ekimoto T., Ikeguchi M., Kinoshita M.. An accurate and rapid method for calculating hydration free energies of a variety of solutes including proteins. J. Chem. Phys. 2019;150:175101. doi: 10.1063/1.5093110. [DOI] [PubMed] [Google Scholar]
  329. Kinoshita M., Hayashi T.. Accurate and rapid calculation of hydration free energy and its physical implication for biomolecular functions. Biophys. Rev. 2020;12:469–480. doi: 10.1007/s12551-020-00686-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  330. Hayashi T., Kawamura M., Miyamoto S., Yasuda S., Murata T., Kinoshita M.. An accurate and rapid method for calculating hydration free energies of solutes including small organic molecules, peptides, and proteins. J. Mol. Liq. 2024;406:124989. doi: 10.1016/j.molliq.2024.124989. [DOI] [Google Scholar]
  331. Subramanian V., Ratkova E., Palmer D., Engkvist O., Fedorov M., Llinas A.. Multisolvent Models for Solvation Free Energy Predictions Using 3D-RISM Hydration Thermodynamic Descriptors. J. Chem. Inf. Model. 2020;60:2977–2988. doi: 10.1021/acs.jcim.0c00065. [DOI] [PubMed] [Google Scholar]
  332. Skyner R. E., McDonagh J. L., Groom C. R., van Mourik T., Mitchell J. B. O.. A Review of Methods for the Calculation of Solution Free Energies and the Modelling of Systems in Solution. Phys. Chem. Chem. Phys. 2015;17:6174. doi: 10.1039/C5CP00288E. [DOI] [PubMed] [Google Scholar]
  333. Palmer, D. S. ; Fedorov, M. V. . Computational Pharmaceutical Solid State Chemistry; John Wiley & Sons, Ltd, 2016; pp 263–286, Section 11. [Google Scholar]
  334. McDonagh, J. L. ; Mitchell, J. B. ; Palmer, D. S. ; Skyner, R. E. In Solubility in Pharmaceutical Chemistry; Saal, C. , Nair, A. , Eds.; De Gruyter: Berlin, 2020; pp 71–112. [Google Scholar]
  335. Keith J. A., Vassilev-Galindo V., Cheng B., Chmiela S., Gastegger M., Müller K.-R., Tkatchenko A.. Combining Machine Learning and Computational Chemistry forPredictive Insights Into Chemical Systems. Chem. Rev. 2021;121:9816–9872. doi: 10.1021/acs.chemrev.1c00107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  336. Shi Y.-F., Yang Z.-X., Ma S., Kang P.-L., Shang C., Hu P., Liu Z.-P.. Machine Learning for Chemistry: Basics and Applications. Engineering. 2023;27:70–83. doi: 10.1016/j.eng.2023.04.013. [DOI] [Google Scholar]
  337. Amaro R. E., Mulholland A. J.. Multiscale methods in drug design bridge chemical and biological complexity in the search for cures. Nat. Rev. Chem. 2018;2:0148. doi: 10.1038/s41570-018-0148. [DOI] [PMC free article] [PubMed] [Google Scholar]
  338. Atiq O., Ricci E., Baschetti M. G., Angelis M. G. D.. Multi-scale modeling of gas solubility in semi-crystalline polymers: bridging Molecular Dynamics with Lattice Fluid Theory. Fluid Ph. Equilib. 2023;570:113798. doi: 10.1016/j.fluid.2023.113798. [DOI] [Google Scholar]
  339. Li L., Totton T., Frenkel D.. Computational Methodology for Solubility Prediction: Application to Sparingly Soluble Organic/Inorganic Materials. J. Chem. Phys. 2018;149:054102. doi: 10.1063/1.5040366. [DOI] [PubMed] [Google Scholar]
  340. Moučka F., Nezbeda I., Smith W. R.. Chemical Potentials, Activity Coefficients, and Solubility in Aqueous NaCl Solutions: Prediction by Polarizable Force Fields. J. Chem. Theory Comput. 2015;11:1756–1764. doi: 10.1021/acs.jctc.5b00018. [DOI] [PubMed] [Google Scholar]
  341. Sanz E., Vega C.. Solubility of KF and NaCl in Water by Molecular Simulation. J. Chem. Phys. 2007;126:014507. doi: 10.1063/1.2397683. [DOI] [PubMed] [Google Scholar]
  342. Mondal, S. ; Tresadern, G. ; Greenwood, J. ; Kim, B. ; Kaus, J. ; Wirtala, M. ; Steinbrecher, T. ; Wang, L. ; Masse, C. ; Farid, R. ; et al. A Free Energy Perturbation Approach to Estimate the Intrinsic Solubilities of Drug-like Small Molecules. ChemRvix 2019, 10.26434/chemrxiv.10263077. [DOI]
  343. Schnieders M. J., Baltrusaitis J., Shi Y., Chattree G., Zheng L., Yang W., Ren P.. The Structure, Thermodynamics, and Solubility of Organic Crystals from Simulation with a Polarizable Force Field. J. Chem. Theory Comput. 2012;8:1721–1736. doi: 10.1021/ct300035u. [DOI] [PMC free article] [PubMed] [Google Scholar]
  344. Thompson J. D., Cramer C. J., Truhlar D. G.. Predicting Aqueous Solubilities from Aqueous Free Energies of Solvation and Experimental or Calculated Vapor Pressures of Pure Substances. J. Chem. Phys. 2003;119:1661–1670. doi: 10.1063/1.1579474. [DOI] [Google Scholar]
  345. Klamt A., Eckert F., Hornig M., Beck M. E., Bürger T.. Prediction of Aqueous Solubility of Drugs and Pesticides with COSMO-RS. J. Comput. Chem. 2002;23:275–281. doi: 10.1002/jcc.1168. [DOI] [PubMed] [Google Scholar]
  346. Hyttinen N., Heshmatnezhad R., Elm J., Kurtén T., Prisle N. L.. Estimating Aqueous Solubilities and Activity Coefficients of Mono- and α, ω-Dicarboxylic Acids Using COSMOtherm. Atmos. Chem. Phys. 2020;20:13131–13143. doi: 10.5194/acp-20-13131-2020. [DOI] [Google Scholar]
  347. Loschen C., Klamt A.. Prediction of Solubilities and Partition Coefficients in Polymers Using COSMO-RS. Ind. Eng. Chem. Res. 2014;53:11478–11487. doi: 10.1021/ie501669z. [DOI] [Google Scholar]
  348. Bjelobrk Z., Mendels D., Karmakar T., Parrinello M., Mazzotti M.. Solubility Prediction of Organic Molecules with Molecular Dynamics Simulations. Cryst. Growth Des. 2021;21:5198–5205. doi: 10.1021/acs.cgd.1c00546. [DOI] [Google Scholar]
  349. Neha, Aggarwal M., Soni A., Karmakar T.. Polymorph-Specific Solubility Prediction of Urea Using Constant Chemical Potential Molecular Dynamics Simulations. J. Phys. Chem. B. 2024;128:8477–8483. doi: 10.1021/acs.jpcb.4c02027. [DOI] [PubMed] [Google Scholar]
  350. Vermeire F. H., Chung Y., Green W. H.. Predicting Solubility Limits of Organic Solutes for a Wide Range of Solvents and Temperatures. J. Am. Chem. Soc. 2022;144:10785–10797. doi: 10.1021/jacs.2c01768. [DOI] [PubMed] [Google Scholar]
  351. Lee S., Lee M., Gyak K.-W., Kim S. D., Kim M.-J., Min K.. Novel Solubility Prediction Models: Molecular Fingerprints and Physicochemical Features vs Graph Convolutional Neural Networks. ACS Omega. 2022;7:12268–12277. doi: 10.1021/acsomega.2c00697. [DOI] [PMC free article] [PubMed] [Google Scholar]
  352. Tayyebi A., Alshami A. S., Rabiei Z., Yu X., Ismail N., Talukder M. J., Power J.. Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models. J. Cheminf. 2023;15:99. doi: 10.1186/s13321-023-00752-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  353. Maziarka, Ł.; Danel, T.; Mucha, S.; Rataj, K.; Tabor, J.; Jastrzebski, S. Molecule Attention Transformer. arXiv 2020, 10.48550/arXiv.2002.08264. [DOI]
  354. Sorkun M. C., Koelman J. V. A., Er S.. Pushing the limits of solubility prediction via quality-oriented data selection. iScience. 2021;24:101961. doi: 10.1016/j.isci.2020.101961. [DOI] [PMC free article] [PubMed] [Google Scholar]
  355. Llompart P., Minoletti C., Baybekov S., Horvath D., Marcou G., Varnek A.. Will we ever be able to accurately predict solubility? Sci. Data. 2024;11:303. doi: 10.1038/s41597-024-03105-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  356. Ramos M. C., White A. D.. Predicting small molecules solubility on endpoint devices using deep ensemble neural networks. Digital Discovery. 2024;3:786. doi: 10.1039/D3DD00217A. [DOI] [PMC free article] [PubMed] [Google Scholar]
  357. Ye Z., Ouyang D.. Prediction of Small-Molecule Compound Solubility in Organic Solvents by Machine Learning Algorithms. J. Cheminf. 2021;13:98. doi: 10.1186/s13321-021-00575-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  358. Vassileiou A. D., Robertson M. N., Wareham B. G., Soundaranathan M., Ottoboni S., Florence A. J., Hartwig T., Johnston B. F.. A unified ML framework for solubility prediction across organic solvents. Digital Discovery. 2023;2:356–367. doi: 10.1039/D2DD00024E. [DOI] [Google Scholar]
  359. Hopfinger A. J., Esposito E. X., Llinàs A., Glen R. C., Goodman J. M.. Findings of the Challenge to Predict Aqueous Solubility. J. Chem. Inf. Model. 2009;49:1–5. doi: 10.1021/ci800436c. [DOI] [PubMed] [Google Scholar]
  360. Nicoud L., Licordari F., Myerson A. S.. Estimation of the Solubility of Metastable Polymorphs: A Critical Review. Cryst. Growth Des. 2018;18:7228–7237. doi: 10.1021/acs.cgd.8b01200. [DOI] [Google Scholar]
  361. Mortazavi, M.; Hoja, J.; Aerts, L.; Quéré, L.; van de Streek, J.; Neumann, M. A.; Tkatchenko, A. Computational polymorph screening reveals late-appearing and poorly-soluble form of rotigotine. Commun. Chem. 2019, 2, 10.1038/s42004-019-0171-y. [DOI] [Google Scholar]
  362. Badawy S. I. F., Hussain M. A.. Microenvironmental pH Modulation in Solid Dosage Forms. J. Pharm. Sci. 2007;96:948–959. doi: 10.1002/jps.20932. [DOI] [PubMed] [Google Scholar]
  363. Taniguchi C., Kawabata Y., Wada K., Yamada S., Onoue S.. Microenvironmental pH-modification to improve dissolution behavior and oral absorption for drugs with pH-dependent solubility. Expert Opin. Drug Delivery. 2014;11:505–516. doi: 10.1517/17425247.2014.881798. [DOI] [PubMed] [Google Scholar]
  364. Uekusa T., Avdeef A., Sugano K.. Is equilibrium slurry pH a good surrogate for solid surface pH during drug dissolution? Eur. J. Pharm. Sci. 2022;168:106037. doi: 10.1016/j.ejps.2021.106037. [DOI] [PubMed] [Google Scholar]
  365. Higuchi W. I., Parrott E. L., Wurster D. E., Higuchi T.. Investigation of Drug Release from Solids II. J. Am. Pharm. Assoc. 1958;47:376–383. doi: 10.1002/jps.3030470522. [DOI] [PubMed] [Google Scholar]
  366. Krieg B. J., Taghavi S. M., Amidon G. L., Amidon G. E.. In Vivo Predictive Dissolution: Comparing the Effect of Bicarbonate and Phosphate Buffer on the Dissolution of Weak Acids and Weak Bases. J. Pharm. Sci. 2015;104:2894–2904. doi: 10.1002/jps.24460. [DOI] [PubMed] [Google Scholar]
  367. Paus R., Ji Y., Braak F., Sadowski G.. Dissolution of Crystalline Pharmaceuticals: Experimental Investigation and Thermodynamic Modeling. Ind. Eng. Chem. Res. 2015;54:731–742. doi: 10.1021/ie503939w. [DOI] [Google Scholar]
  368. Dejmek M., Ward C. A.. A statistical rate theory study of interface concentration during crystal growth or dissolution. J. Chem. Phys. 1998;108:8698–8704. doi: 10.1063/1.476298. [DOI] [Google Scholar]
  369. Casalini T., Mann J., Pepin X.. Predicting Surface pH in Unbuffered Conditions for Acids, Bases, and Their Salts - A Review of Modeling Approaches and Their Performance. Mol. Pharmaceutics. 2024;21:513–534. doi: 10.1021/acs.molpharmaceut.3c00661. [DOI] [PubMed] [Google Scholar]
  370. Hasselbalch, K. A. Die Berechnung der Wasserstoffzahl des Blutes aus der freien und gebundenen Kohlensäure desselben, und die Sauerstoffbindung des Blutes als Funktion der Wasserstoffzahl; Julius Springer, 1916. [Google Scholar]
  371. Bergström C. A. S., Luthman K., Artursson P.. Accuracy of calculated pH-dependent aqueous drug solubility. Eur. J. Pharm. Sci. 2004;22:387–398. doi: 10.1016/j.ejps.2004.04.006. [DOI] [PubMed] [Google Scholar]
  372. Bonin A., Montanari F., Niederführ S., Göller A. H.. pH-dependent solubility prediction for optimized drug absorption and compound uptake by plants. J. Comput.-Aided Mol. Des. 2023;37:129–145. doi: 10.1007/s10822-023-00496-3. [DOI] [PubMed] [Google Scholar]
  373. Völgyi G., Baka E., Box K. J., Comer J. E. A., Takács-Novák K.. Study of pH-dependent solubility of organic bases. Revisit of Henderson-Hasselbalch relationship. Anal. Chim. Acta. 2010;673:40–46. doi: 10.1016/j.aca.2010.05.022. [DOI] [PubMed] [Google Scholar]
  374. Hansen N. T., Kouskoumvekaki I., Jørgensen F. S., Brunak S., Jónsdóttir S. Ó.. Prediction of pH-Dependent Aqueous Solubility of Druglike Molecules. J. Chem. Inf. Model. 2006;46:2601–2609. doi: 10.1021/ci600292q. [DOI] [PubMed] [Google Scholar]
  375. Johnson S. R., Chen X.-Q., Murphy D., Gudmundsson O.. A Computational Model for the Prediction of Aqueous Solubility That Includes Crystal Packing, Intrinsic Solubility, and Ionization Effects. Mol. Pharmaceutics. 2007;4:513–523. doi: 10.1021/mp070030+. [DOI] [PubMed] [Google Scholar]
  376. Advanced Chemistry Development, Inc., Toronto, ON, Canada, ACD/Percepta. https://www.acdlabs.com. [Google Scholar]
  377. Simulations Plus, Inc., Lancaster, CA, ADMET Predictor. https://www.simulations-plus.com. [Google Scholar]
  378. Sun F., Yu Q., Zhu J., Lei L., Li Z., Zhang X.. Measurement and ANN prediction of pH-dependent solubility of nitrogen-heterocyclic compounds. Chemosphere. 2015;134:402–407. doi: 10.1016/j.chemosphere.2015.04.092. [DOI] [PubMed] [Google Scholar]
  379. Galarza, L. M.; Gomez, L. A. T. Prediction of pH-dependent aqueous solubility of druglike molecules of different chemical behavior. MOL2NET 03, International Conference Series on Multidisciplinary Sciences; 2017. [Google Scholar]
  380. Aleksić S., Seeliger D., Brown J. B.. ADMET Predictability at Boehringer Ingelheim: State-of-the-Art, and Do Bigger Datasets or Algorithms Make a Difference? Mol. Inform. 2022;41:2100113. doi: 10.1002/minf.202100113. [DOI] [PubMed] [Google Scholar]
  381. Stojaković J., Baftizadeh F., Bellucci M. A., Myerson A. S., Trout B. L.. Angle-Directed Nucleation of Paracetamol on Biocompatible Nanoimprinted Polymers. Cryst. Growth Des. 2017;17:2955–2963. doi: 10.1021/acs.cgd.6b01093. [DOI] [Google Scholar]
  382. Tom G., Schmid S. P., Baird S. G., Cao Y., Darvish K., Hao H., Lo S., Pablo-García S., Rajaonson E. M., Skreta M., et al. Self-Driving Laboratories for Chemistry and Materials Science. Chem. Rev. 2024;124:9633–9732. doi: 10.1021/acs.chemrev.4c00055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  383. Liang Y., Job H., Feng R., Parks F., Hollas A., Zhang X., Bowden M., Noh J., Murugesan V., Wang W.. High-throughput solubility determination for data-driven materials design and discovery in redox flow battery research. Cell Rep. Phys. Sci. 2023;4:101633. doi: 10.1016/j.xcrp.2023.101633. [DOI] [Google Scholar]

Articles from Chemical Reviews are provided here courtesy of American Chemical Society
