Abstract
The physicochemical properties of molecular crystals, such as solubility, stability, compactability, melting behaviour and bioavailability, depend on their crystal form1. In silico crystal form selection has recently come much closer to realization because of the development of accurate and affordable free-energy calculations2–4. Here we redefine the state of the art, primarily by improving the accuracy of free-energy calculations, constructing a reliable experimental benchmark for solid–solid free-energy differences, quantifying statistical errors for the computed free energies and placing both hydrate crystal structures of different stoichiometries and anhydrate crystal structures on the same energy landscape, with defined error bars, as a function of temperature and relative humidity. The calculated free energies have standard errors of 1–2 kJ mol−1 for industrially relevant compounds, and the method to place crystal structures with different hydrate stoichiometries on the same energy landscape can be extended to other multi-component systems, including solvates. These contributions reduce the gap between the needs of the experimentalist and the capabilities of modern computational tools, transforming crystal structure prediction into a more reliable and actionable procedure that can be used in combination with experimental evidence to direct crystal form selection and establish control5.
Subject terms: Computational methods, Drug discovery and development, Materials chemistry
Accuracy of free-energy calculations can be improved by constructing an experimental benchmark for solid–solid free-energy differences, quantifying statistical errors for the computed free energies and placing both hydrate and anhydrate crystal structures on the same energy landscape.
Main
Molecular crystals are important components of food products6, semiconductors7, explosives8, agrochemicals9 and pharmaceuticals10–12. Their physicochemical properties depend on an interplay of chemical composition and molecular packing within a crystal structure, known as a crystal form or polymorph when more than one arrangement exists10,11,13,14. The development of an undesirable crystal form can have harmful consequences, as shown in the cases of ritonavir and rotigotine12,15,16. Crystal form selection remains a challenge because of the implications for physical and chemical stability, solubility, dissolution, nucleation barriers, mechanical properties, filtration, powder flow and, under consideration here, the formation of hydrates and solvates stable within accessible temperature and humidity ranges1,17–19. To complement experimental efforts, computational methods, in particular crystal structure prediction (CSP), are becoming important for polymorph risk assessment and control3,5,20–23. The capabilities of CSP have improved greatly in recent years by the inclusion of temperature-dependent free-energy calculations, yet without rigorously assessing the accuracy of such predictions2,4,24–31. CSP can be used to identify the most stable, possibly still to be discovered, crystal form of pharmaceutically relevant molecules20,21,32 and has started to be applied early in the molecular and materials design cycles by balancing accuracy and computational efficiency33. Furthermore, CSP is used in the prediction of stoichiometric hydrates34 and solvates35,36, but without explicitly considering relative humidity or solvent activity. Free-energy calculations have been used to construct hydrate–anhydrate phase diagrams, still requiring experimental calibration for every compound and pair of crystal forms37. 
A data-driven and topological algorithm38 has been recently used in conjunction with CSP for the prediction of fractional or nonstoichiometric hydrates to evaluate the ever-present risk of hydrate formation for industrially relevant compounds, due in part to the ubiquity of water vapour in the atmosphere. The current study addresses four of the most salient open issues: the need for more accurate and affordable free-energy calculations, the lack of a reliable free-energy benchmark, the quantification of computational errors and the prediction of stability relationships between hydrates and anhydrates as a function of temperature and relative humidity without a need for compound-dependent experimental calibration. These advances bridge the gap between the capabilities of in silico crystal form selection and the needs of bench practitioners, as demonstrated by case studies on two pharmaceutical compounds.
Composite free-energy calculations
Ideally, free-energy calculations would be carried out using a single, accurate ab initio method and the standard machinery of statistical thermodynamics. In practice, however, such an approach is economically unfeasible. Time and cost are vital elements for innovators, whether they use purely experimental methods or complement with CSP. The requirement to treat many crystal forms in a single CSP study limits the acceptable amount of central processing unit (CPU) time required for a single free-energy calculation to about one day on 1,000 cores. An alternative is to combine a variety of affordable calculations that capture various physical effects and, when united, correct for the shortcomings of each other.
As shown in Extended Data Fig. 1, our free-energy calculation method TRHu(ST) 23 (an acronym for temperature- and relative-humidity-dependent free-energy calculations with standard deviations) combines the composite PBE0 + MBD + Fvib approach4,39 (where PBE0 is a hybrid functional composed of the Perdew–Burke–Ernzerhof (PBE) functional with 25% Hartree–Fock exchange energy, MBD is the many-body dispersion energy, and Fvib is the free energy of phonons at finite temperature) with an additional single-molecule correction29 and reduces the CPU time requirements of the phonon calculations by blending force field and ab initio calculations. Moreover, imaginary and very soft vibrational modes, hydrogen-bond stretch vibrations and methyl-group rotations are explicitly sampled (Methods).
Free-energy benchmark
For calculated energies to be used in process design and risk assessment, knowledge of the associated errors is as important as the predicted values themselves. The quantification of these errors has received almost no attention in CSP because an extensive and reliable benchmark for solid–solid free-energy differences of industrially relevant compounds was not available. The scientific literature describes solid–solid phase transformations and lists stability relationships, but usually the published data do not constitute the determination of a free-energy difference between two determined crystal structures. For example, the free-energy differences between polymorphs can be obtained from their solubility ratio in infinite dilution, but a dataset is only complete when the crystal structures of both polymorphs have been solved and the solubilities of both polymorphs have been measured in a common solvent, where both forms are only moderately soluble. Literature information is often incomplete, possibly because there was no incentive for the authors to publish all measurements, but more likely because the experiments are usually very challenging for metastable polymorphs. Assuming all data were complete for a compound, it may be split over several publications by different authors, complicating matters further in case the naming of polymorphs or the assignment of structural and thermodynamic properties is open to interpretation.
For this work, complete data for a chemically diverse and industrially relevant set of compounds have been collected from the literature and from several experimental contributors working in academia and industry. Apart from the 12 free-energy differences obtained from solubility ratios, four reversible (enantiotropic) phase transitions between polymorphs and 21 reversible hydrate–anhydrate phase transitions as a function of relative humidity have been used to determine free-energy differences. At the phase-transition temperature that separates the stability domains of two polymorphs, the free energies are equal by definition. Therefore, every experimental observation of a reversible phase transition constitutes the measurement of a free-energy difference that is zero. The case of hydrate–anhydrate phase transitions is discussed separately below. Our free-energy benchmark is described in detail in the Supplementary Information. The performance of our free-energy calculations on the benchmark is documented in Extended Data Tables 1 and 2.
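The solubility-ratio route to a free-energy difference can be sketched in a few lines. This is a minimal illustration, assuming ideal dilute-solution behaviour so that the activity ratio equals the solubility ratio; the function name and the example solubilities are hypothetical and not taken from the benchmark.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def free_energy_difference_from_solubility(s_a, s_b, temperature_k):
    """Return G_A - G_B in kJ/mol from solubilities measured in a common solvent.

    Assumption: both forms are dilute enough that activity ~ concentration,
    so G_A - G_B = R T ln(s_A / s_B). The less soluble form is the more stable.
    """
    if s_a <= 0 or s_b <= 0 or temperature_k <= 0:
        raise ValueError("solubilities and temperature must be positive")
    return R_KJ * temperature_k * math.log(s_a / s_b)


# Hypothetical example: form A is twice as soluble as form B at 298.15 K,
# so form A is the metastable form and the difference is positive.
delta_g = free_energy_difference_from_solubility(2.0, 1.0, 298.15)
```

A reversible (enantiotropic) transition is the special case in which the two solubilities coincide, so the function returns zero at the transition temperature.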
Extended Data Table 1.
σΔF represents one standard deviation of the error for the free-energy difference between two crystal structures. The table also reports the deviation between the calculated and the experimental free-energy difference. Further information about experimental data and validation results can be found in the Supplementary Information.
*Experimental structure contains disorder.
†Experimental solubility values are not measured at infinite dilution conditions.
‡Experimental phase transition does not have measured upper and lower bounds.
Extended Data Table 2.
σΔF represents one standard deviation of the error for the free-energy difference between two crystal structures. The table also reports the deviation between the calculated and the experimental free-energy difference. Further information about experimental data and validation results can be found in the Supplementary Information.
*Experimental structure contains disorder.
†Reference critical water activity is not reversibly determined.
‡Experimental structure was corrected based on calculations, further details are provided in the Supplementary Information.
Transferable error estimation
To apply a quantitative risk assessment to compounds, it is necessary to express the deviation between the experimental and the computed free-energy differences in terms of a small set of parameters that enable extrapolation of the observed errors to chemical compounds not part of the benchmark, accounting for molecular size and chemical variability. We rationalize the observed energy discrepancies in terms of the standard deviation of the energy error per water molecule, σH2O, and the standard deviation of the energy error per atom, σat = 0.191 kJ mol−1, for non-water atoms in the compound (Supplementary Information). Standard deviations of free energies and their differences can be derived from these basic values for any chemical compound and water content using the formulae presented in the Methods based on Gaussian error propagation. For example, the standard error for a molecule consisting of N atoms is √N times larger than σat, and using this relationship the standard error per water molecule can be translated to a standard error per atom in a water molecule of σH2O/√3 = 0.379 kJ mol−1. When compounds from the benchmark did not fulfil all quality criteria (Supplementary Information), they were excluded from the determination of σH2O and σat. In general, one would expect from a well-calibrated error estimation model that the deviation between the computational and experimental free-energy differences normalized by the expected standard error should follow a Gaussian distribution with a standard deviation of one. Extended Data Fig. 2 shows that this expectation is fulfilled for the benchmark compounds.
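As a worked illustration of the √N scaling, consider a hypothetical molecule with N = 50 atoms and one molecule per asymmetric unit (both are assumptions chosen for illustration, not a benchmark compound). The per-water-molecule value is back-calculated here from the per-atom water error of 0.379 kJ mol−1 reported in the Discussion, so it is a derived rather than a quoted number:

```latex
\begin{align*}
\sigma_{\text{molecule}} &= \sqrt{N}\,\sigma_{\mathrm{at}}
  = \sqrt{50}\times 0.191\ \mathrm{kJ\,mol^{-1}} \approx 1.35\ \mathrm{kJ\,mol^{-1}},\\
\sigma_{\mathrm{H_2O}} &= \sqrt{3}\times 0.379\ \mathrm{kJ\,mol^{-1}} \approx 0.656\ \mathrm{kJ\,mol^{-1}}.
\end{align*}
```

For a molecule of this size, the single-structure error already falls in the 1–2 kJ mol−1 range quoted for industrially relevant compounds.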
Hydrate–anhydrate phase transitions
Hydrate–anhydrate phase transitions are different from the other experimental sources of free-energy information discussed above. Water molecules leave the solid state on dehydration and have to be modelled explicitly in their liquid or gas phase. Calculating the gas-phase free energy of water (Supplementary Information), we established that a systematic underestimation of the phase-transition relative humidity can be avoided by adding an empirical correction to the computed gas-phase chemical potential of water. The correction was fitted to the hydrate–anhydrate benchmark data (Supplementary Information). Figure 1 shows the experimental and calculated phase-transition relative humidities (top, right scale), which are related to the pressure-dependent part of the chemical potential (bottom, left scale). The experimental relative humidities are reproduced to within a factor of 1.7 on average over all compounds in the validation set. Without the chemical potential correction for water, calculation and experiment still agree within a factor of 2.4 (Extended Data Fig. 3), showing that the chemical potential correction is not strictly required, which makes our approach directly applicable to other solvents.
Pharmaceutical case studies
We present two pharmaceutical case studies—radiprodil and upadacitinib—to illustrate the predictive power of our method featuring the new standard graphical representation of CSP results, with defined error bars, as a function of temperature and relative humidity. The structures of the two molecules are shown in Fig. 2.
Radiprodil is an NR2B-negative allosteric modulator initially developed for the treatment of neuropathic pain40. More recently, it has shown potential in the treatment of infantile spasms41 and may provide therapeutic benefits in paediatric epileptic disorders42.
The crystal-energy landscape of radiprodil, computed at defined temperatures and relative humidities, is shown in Fig. 3. The full symbols correspond to a predicted anhydrate, monohydrate or dihydrate crystal structure characterized in terms of its free energy and density.
The experimental anhydrate, monohydrate and dihydrate forms, indicated by open symbols in Fig. 3, correspond to the most stable predicted crystal structures for each of the respective stoichiometries, demonstrating the accuracy of our composite energy calculation method. Error bars are provided for each predicted crystal structure. The energy differences between the structures on the lower part of the crystal-energy landscape are of the same order of magnitude as the error bars. At the top of the crystal-energy landscape, the error bars are larger, because not all energies are calculated at the highest level of theory to preserve CPU time.
The temperature dependence of selected anhydrate structures, including the experimental forms A and C, is shown in Fig. 4.
The error bars are translated into stripes in Fig. 4, with the accuracy in the prediction of the phase-transition temperature being shown by the extent of the crossover region of the stripes. Here the predicted phase-transition temperature of 481.60 K differs from the experimental value of 343.15 K by 138.45 K, which is less than 1σ of 183 K. At every temperature, all free energies have been shifted such that they average to zero. Without this temperature-dependent shift, the free-energy curves would be hardly distinguishable (Extended Data Fig. 4).
Upadacitinib is a Janus kinase inhibitor that works by blocking certain signals causing inflammation. Upadacitinib is being developed for a range of auto-immune diseases43, and to date it has been approved by the US Food and Drug Administration for rheumatoid arthritis, psoriatic arthritis, ulcerative colitis, atopic dermatitis and ankylosing spondylitis44.
The crystal-energy landscape of the anhydrate, hemihydrate and monohydrate structures of upadacitinib is shown in Extended Data Fig. 5. The two experimentally observed phases, form I (hemihydrate) and form III (anhydrate) again correspond to the most stable predicted structures at their respective stoichiometries. The free energies of the most stable predicted anhydrate, hemihydrate and monohydrate at 25 °C as a function of relative humidity are shown in Fig. 5.
No monohydrate has been observed experimentally, consistent with the prediction that the monohydrate is less stable than both the anhydrate and the hemihydrate at 25 °C over the full range of relative humidities. The phase transition from form III to form I is predicted at 7.8% relative humidity and found experimentally at 14% relative humidity, well within the 1σ confidence interval ranging from 2.4% relative humidity to 26% relative humidity. Varying both temperature and relative humidity, the solid–solid phase diagram indicating the most stable form as a function of the thermodynamic variables can be constructed as shown in Fig. 6. The more complex solid–solid phase diagram of radiprodil is presented in Supplementary Fig. 15.
Discussion
Impact on crystal form selection
Both case studies are examples of well-controlled polymorphic systems, in which the respective stable crystal forms at specified temperatures and relative humidities have been prepared experimentally. However, it has been suggested45 that for about 30% of the compounds in late-phase pharmaceutical development, the most stable crystal form does not readily crystallize. Modern crystal-energy landscapes reveal these missing forms, and using relative energy differences and calculated errors, we can assess the risk that a more stable predicted form is actually more stable in the real world, with the predicted magnitude of decrease in solubility should the new form crystallize. If the missing form is perceived as a risk and further efforts are made to bring it into existence, the a priori predicted phase diagram shows the experimental conditions at which the missing form is thermodynamically favoured. If, on the contrary, it has not been possible to obtain the missing form despite intensified experimental screening, the same predicted phase diagram provides a map for the design of a robust process to mitigate the risk of encountering the missing form.
Current versus required accuracy
If CSP is to be universally incorporated into industrial processes in, for example, pharmaceutical development, properties such as temperature and relative humidity of phase transitions must be predicted with an accuracy similar to or better than the experimental values. This is especially true for phenomena occurring within the range of conditions likely to be encountered in processing and storage. However, for many thermodynamic properties, even small free-energy errors give rise to relatively large confidence intervals in predictions because of, for example, the exponential dependence of those measurable quantities on free-energy differences or the rather similar slopes of the temperature-dependent free-energy curves of organic crystals. For most of the reference compounds, the standard error of the free-energy difference, σΔF, is between 1 kJ mol−1 and 2 kJ mol−1. At the current level of accuracy, hydrate–anhydrate phase-transition relative humidities are predictable to within a factor of 1.7 and the 1σ error of 183 K obtained for the anhydrate–anhydrate phase-transition temperature of radiprodil is representative of what has been observed in numerous confidential contract research studies. Therefore, at present, our method enables the prediction of a probable phase transition between two crystal forms, although not the precise conditions at which it will occur. The prediction of phase-transition temperatures to within 10 K requires a further improvement in accuracy by a challenging factor of 20.
Rigorous quantification of uncertainty
With error bars representing a standard deviation under normal distribution, Gaussian statistics enable us to quantify the reliability of the predictions. For example, for a pair of polymorphs with a temperature-dependent reversible phase transition (enantiotropic relationship), the overlap of the 1σ free-energy error can be translated into a confidence interval of the transition temperature.
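One way to picture how a free-energy error turns into a temperature interval is to linearize the free-energy difference around the transition. The sketch below assumes ΔF(T) = ΔH − TΔS with temperature-independent ΔH, ΔS and standard error; these simplifications, the function name and the example numbers are illustrative assumptions, not the procedure used to produce the figures.

```python
def transition_temperature_interval(delta_h, delta_s, sigma_df):
    """Estimate a transition temperature and its 1-sigma half-width.

    delta_h  : enthalpy-like intercept of the free-energy difference (kJ/mol)
    delta_s  : slope of the free-energy difference with sign flipped (kJ/mol/K)
    sigma_df : standard error of the free-energy difference (kJ/mol)

    With dF(T) = delta_h - T * delta_s, the curves cross at T = delta_h / delta_s,
    and |dF(T)| <= sigma_df holds within sigma_df / |delta_s| of that point.
    """
    if delta_s == 0:
        raise ValueError("parallel free-energy curves never cross")
    t_transition = delta_h / delta_s
    half_width = sigma_df / abs(delta_s)
    return t_transition, half_width


# Hypothetical inputs: a slope difference of 8 J/mol/K and a 1.5 kJ/mol error.
t_tr, dt = transition_temperature_interval(2.0, 0.008, 1.5)
```

Because the slopes of the free-energy curves of organic polymorphs are similar, the denominator is small and errors of 1–2 kJ mol−1 readily translate into intervals of the order of 100 K or more.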
Comparing the magnitudes of the errors σH2O and σat on a per-atom basis, we see that the error per atom of the water molecule at 0.379 kJ mol−1 is approximately two times larger than the error per non-water atom at 0.191 kJ mol−1. Both values are affected by the computational and the experimental errors, and as such represent the upper limits of the actual computational error. The larger error per atom for water is consistent with the fact that water is generally considered difficult to model. By contrast, 70% of the atoms of the compounds in our test set are carbon atoms or their covalently bonded neighbours that are much easier to describe. The difference of a factor of two between the two errors suggests a substantial dependence of the error per atom on the atomic species. Therefore, the derived value for σat is an average that applies only to compounds that are close to the average chemical composition of the benchmark.
Outlook
Borrowing some words from a famous quote of John Maddox46, despite substantial recent progress, it is “one of the continuing scandals in the physical sciences that it remains in general impossible to predict” the solid–solid phase-transition temperatures of pharmaceutical compounds to within a few kelvins.
The availability of diverse and reliable data is a prerequisite for further improvements. To establish the benchmark presented in this work, a collaboration of a substantial number of companies and academic groups was required. It is important that academic groups are able to apply for funding related to the measurement of accurate structural and thermodynamic properties of organic solids, and that funding agencies do not regard such work as outdated thermodynamics. Likewise, it would be desirable that pharmaceutical companies all around the world declassify more of their internal solid-state data when compatible with intellectual property requirements. With more data, the present work could be extended to all relevant crystallization solvents, and atom-species-dependent atomic errors could be determined, thus enabling a finer description of the computational errors.
For a further reduction of the computational error, two broad approaches can be identified. On the one hand, two important physical phenomena have not been taken into account in the present work—namely, thermal lattice expansion and a full treatment of the contribution of anharmonicity to the lattice free energy. Because ab initio methods are expensive and the force fields are not accurate enough for such an endeavour, it can be expected that machine learning force fields will provide a cost-accuracy compromise to capture these effects47,48. On the other hand, there is the accuracy of the ab initio calculations themselves. The single-point energy-correction scheme of our method can, in principle, provide consistent energies and forces for lattice-energy minimization and vibrational sampling. The single-point energy-correction scheme itself may then be taken to a higher level of theory, calculating lattice energies directly at the CCSD(T)49 level or performing monomer, dimer50 and, potentially, trimer corrections at that level of theory.
We hope that our work will inspire others to tackle the organic solid–solid phase-transition temperature challenge.
Methods
Computational details
Naming
The name of our energy calculation method is TRHu(ST) 23, an abbreviation of temperature- and relative-humidity-dependent free-energy calculations with standard deviations, and it should be pronounced as Trust 23. The name is meant to encompass both the actual energy calculations and the model for transferable error estimates calibrated on a specific benchmark. Because the energy calculations, the error estimation model and the benchmark will continue to evolve, we have added the year of publication to refer to our implementation and to define a naming scheme for improvements to come.
Electronic structure energy corrections
All experimental crystal structures were minimized with the Perdew–Burke–Ernzerhof (PBE) functional51 augmented with Neumann–Perrin dispersion correction (PBE-NP)52 with the light basis set with the 2010 species defaults using the FHI-aims ab initio package53–57. Subsequently, a set of single-point energy corrections and a monomer correction were applied to the minimized structures. Single-point energy corrections included a light-to-tight basis-set correction, a functional correction from PBE to PBE058,59 and a correction from PBE-NP to PBE with a non-local many-body dispersion correction (PBE-MBD-NL)39,60,61. The monomer correction followed the same protocol as described elsewhere29 and was performed in addition to the aforementioned single-point corrections with second-order Møller–Plesset theory with a correction for van der Waals dispersion (MP2D)62 using the NAO-VCC-4Z63 basis set in FHI-aims and the anaconda psi4/mp2d module64,65. The performance of the method with various components removed, such as the single-point corrections or the single-molecule MP2D correction, is shown in Supplementary Table 24 and Supplementary Figs. 16–19 and demonstrates that the combined method improves performance substantially over previously published methods. All calculations, including those with third-party code, were carried out in GRACE66.
Phonon calculations
Before the second-order dynamic matrix (Hessian) calculation, a cell replication to a supercell was carried out for all structures to guarantee at least a distance of 8 Å between each atom and its nearest symmetry copy. Next, a Cartesian displacement of 0.01 Å was applied in six Cartesian directions to compute the forces at the PBE-NP level and to derive the Hessian from finite differences.
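The finite-difference step can be illustrated with a small NumPy sketch. It assumes a user-supplied `forces` callable (hypothetical; in practice this would be an ab initio force evaluation on the supercell) and uses central differences over the positive and negative displacement along each Cartesian axis.

```python
import numpy as np


def finite_difference_hessian(forces, positions, step=0.01):
    """Build a Hessian from forces at +/- displacements of every coordinate.

    forces    : callable, (n_atoms, 3) positions -> (n_atoms, 3) forces
    positions : reference geometry, shape (n_atoms, 3), in angstrom
    step      : displacement length in angstrom
    """
    x0 = np.asarray(positions, dtype=float)
    n = x0.size
    hessian = np.zeros((n, n))
    for j in range(n):
        shift = np.zeros(n)
        shift[j] = step
        f_plus = forces((x0.ravel() + shift).reshape(x0.shape)).ravel()
        f_minus = forces((x0.ravel() - shift).reshape(x0.shape)).ravel()
        # H_ij = d2E / dx_i dx_j = -dF_i / dx_j (central difference)
        hessian[:, j] = -(f_plus - f_minus) / (2.0 * step)
    # Symmetrize to remove numerical noise.
    return 0.5 * (hessian + hessian.T)


def mass_weight(hessian, masses):
    """Divide each element H_ij by sqrt(m_i * m_j), one mass per atom."""
    m = np.repeat(np.asarray(masses, dtype=float), 3)
    return hessian / np.sqrt(np.outer(m, m))
```

Each atom contributes six force evaluations (plus and minus along x, y and z), which is why the cost of this step scales with the number of atoms in the supercell.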
For the determination of eigenvalues and eigenmodes, a k-point-dependent Hessian of the original cell was first derived from the supercell Hessian for every k-point compatible with the periodic boundary conditions of the supercell. Subsequently, the mass-weighted Hessian was used to obtain eigenvalues, from which the vibrational contribution to the free energies was computed using the harmonic approximation. Imaginary modes corresponding to double-well potentials were observed at the PBE-NP level for only one form of gaboxadol HCl and one form of verubecestat. The handling of imaginary modes is described below.
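For reference, the harmonic vibrational free energy of a set of modes can be written as the sum of zero-point and thermal terms. The sketch below uses the textbook expression and takes wavenumbers in cm−1; the function name is hypothetical, and averaging over k-points and the treatment of acoustic, soft and imaginary modes are deliberately left out.

```python
import math

PLANCK = 6.62607015e-34      # J s
LIGHT_SPEED = 2.99792458e10  # cm s^-1
BOLTZMANN = 1.380649e-23     # J K^-1
AVOGADRO = 6.02214076e23     # mol^-1


def harmonic_free_energy(wavenumbers_cm, temperature_k):
    """Return the harmonic vibrational free energy in kJ/mol.

    F = sum_i [ h*nu_i/2 + k_B*T * ln(1 - exp(-h*nu_i / (k_B*T))) ]
    Only real, positive wavenumbers are accepted.
    """
    kt = BOLTZMANN * temperature_k
    total = 0.0
    for nu in wavenumbers_cm:
        if nu <= 0:
            raise ValueError("imaginary or zero modes need separate handling")
        quantum = PLANCK * LIGHT_SPEED * nu  # energy of one quantum in J
        total += 0.5 * quantum + kt * math.log1p(-math.exp(-quantum / kt))
    return total * AVOGADRO / 1000.0
```

The explicit rejection of non-positive wavenumbers mirrors the point made in the text: modes that are imaginary or very soft cannot be handled by this formula and need explicit sampling.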
Eigenmode following corrections
A few more corrections to the harmonic approximation were added to the free energies. The corrections are termed the imaginary mode correction and the very soft mode correction and include explicit mode sampling from modes with eigenvalues less than 0 cm−1 and 25 cm−1, respectively. These modes were explicitly sampled with at least three points in each direction (positive and negative) and approximated to a fourth-order polynomial potential. Next, the free energy of each mode was calculated by explicitly solving the one-dimensional Schrödinger equation for the polynomial potential. The free-energy contribution was splined out for modes between 15 cm−1 and 25 cm−1 with a fourth-order polynomial. The same procedure without splining was applied to imaginary modes, explicitly sampling the double-well potential and solving its Schrödinger equation.
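The mode-following idea (sample a mode, fit a quartic, solve the one-dimensional Schrödinger equation, sum over levels) can be sketched as follows. This is an illustrative finite-difference solver in reduced units (ħ = m = 1); the grid size, box width and function names are assumptions and do not describe the authors' implementation.

```python
import numpy as np


def fit_quartic(displacements, energies):
    """Least-squares fit V(q) = c0 + c1*q + ... + c4*q**4; returns [c0, ..., c4]."""
    return np.polynomial.polynomial.polyfit(displacements, energies, 4)


def mode_levels(coeffs, q_max=8.0, n_grid=1201):
    """Energy levels of a particle in the polynomial potential (hbar = m = 1).

    The kinetic operator -1/2 d2/dq2 is discretized with a three-point stencil
    on a uniform grid; the wavefunction vanishes at the box edges.
    """
    q = np.linspace(-q_max, q_max, n_grid)
    h = q[1] - q[0]
    potential = np.polynomial.polynomial.polyval(q, coeffs)
    diagonal = 1.0 / h**2 + potential
    off_diagonal = np.full(n_grid - 1, -0.5 / h**2)
    hamiltonian = (np.diag(diagonal)
                   + np.diag(off_diagonal, 1)
                   + np.diag(off_diagonal, -1))
    return np.linalg.eigvalsh(hamiltonian)


def mode_free_energy(levels, kt):
    """F = -kT ln sum_i exp(-E_i / kT), evaluated relative to the ground state."""
    shifted = levels - levels[0]
    return levels[0] - kt * np.log(np.sum(np.exp(-shifted / kt)))


# Seven sampled points (three in each direction plus the minimum), here taken
# from a purely harmonic test potential so the result can be checked analytically.
q_samples = np.linspace(-3.0, 3.0, 7)
coefficients = fit_quartic(q_samples, 0.5 * q_samples**2)
levels = mode_levels(coefficients)
free_energy = mode_free_energy(levels, kt=1.0)
```

The same solver applies unchanged to a double-well potential (negative quadratic and positive quartic coefficient), which is the situation encountered for imaginary modes.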
Methyl top correction
For the methyl top correction, we applied a standard approach: the methyl top rotation potential was sampled explicitly inside the crystal, and the Schrödinger equation was then solved for a particle on the computed potential.
Hydrogen-bond correction
The hydrogen-bond correction aimed to correct the zero-point vibration energy for hydrogen atoms in X–H···Y, where X, Y = N, O, F. Each hydrogen-bonded hydrogen atom was displaced along six Cartesian directions by 0.01 Å to compute a local mass-weighted Hessian matrix. Solving the eigenvalue problem for this matrix gives three eigenvalues. The eigenvector with the largest eigenvalue of the hydrogen-bonded hydrogen atom was followed in the positive and negative directions to sample the potential, with at least two points in each direction. Energy levels used in the free-energy calculations were obtained from a solution of the Schrödinger equation for a particle on the quartic potential fitted to the sampled points.
Large-cell correction
To conserve CPU time, phonon calculations at the ab initio level are limited to very small supercells that are not large enough to capture the effect of phonon band structure on the lattice free energy, in particular for acoustic modes. Therefore, a large-cell correction was carried out using tailor-made force fields67 reparametrized with additional reference data at the PBE-NP level of theory for the force calculations. Using the phonon calculations described above, including imaginary mode and soft mode corrections, force field lattice free energies were computed for the small supercell already used in the ab initio phonon calculations and a larger supercell with minimal distances of 24 Å between symmetry copies of an atom. The difference between the two lattice free-energy calculations was used as a correction. As already described above, eigenvalues and eigenmodes were explicitly calculated for every k-point compatible with the periodic boundary conditions of the supercell. This way the band structure of all modes, including acoustic modes, is explicitly taken into account. The contribution of the three acoustic modes at the gamma point was approximated by 3RT. It is important to note that imaginary modes can only be explicitly sampled and evaluated if an explicit supercell is available for the corresponding k-point.
For the gas-phase water calculations, the large-cell correction was not applied.
Statistical model of error
The ultimate aim of crystal structure prediction as part of solid form selection is the derisking of the developed form to prevent the recurrence of a disaster such as that of ritonavir. As such, the uncertainty or expected error of the calculated values must be carefully quantified relative to experimental observables. The error in our calculations can be attributed to myriad sources, such as errors from the density functional, the dispersion correction, the choice of basis set, the harmonic approximation for the computation of the vibrational partition function and basis-set superposition errors. Because we wanted to evaluate if the interactions of the water molecule within the crystal give rise to larger errors than other atoms on average, we separated the total error into two contributions: one contribution per atom for the main compound in the asymmetric unit cell and one contribution per water molecule in the asymmetric unit. Each error is assumed to be normally distributed and statistically independent, such that the error for a single free energy per organic molecule would also be normally distributed, with a variance found by Gaussian error propagation:
$$
F - \hat{F} \sim \mathcal{N}\!\left(0,\ \sigma_F^2\right), \qquad \sigma_F^2 = \frac{m\,\sigma_{\mathrm{at}}^2 + n\,\sigma_{\mathrm{H_2O}}^2}{N} \tag{1}
$$

Here, $F$ and $\hat{F}$ are the observed and predicted free energies of the crystal structure per organic molecule, $m$ is the number of atoms per organic molecule, $n$ is the number of water molecules per organic molecule and $N$ is the number of organic molecules per asymmetric unit. $\sigma_{\mathrm{at}}$ and $\sigma_{\mathrm{H_2O}}$ are the standard deviations of the per-organic-molecule-atom and per-water-molecule errors, respectively. Further details are provided in the Supplementary Information.

Likewise, the total error on the calculated energy difference between two crystal structures is also normally distributed with

$$
\Delta F - \Delta\hat{F} \sim \mathcal{N}\!\left(0,\ \sigma_{\Delta F}^2\right), \qquad \sigma_{\Delta F}^2 = \sum_{i=1}^{2} \frac{m\,\sigma_{\mathrm{at}}^2 + n_i\,\sigma_{\mathrm{H_2O}}^2}{N_i} \tag{2}
$$

$\Delta F$ and $\Delta\hat{F}$ are the observed and predicted free-energy differences between two crystal structures of the same organic molecule, indexed by $i$. The parameters $N$ and $n$ may now be different for each crystal structure.

$\sigma_F$ and $\sigma_{\Delta F}$ refer to 1σ of the error for the free energy of a single crystal structure and the free-energy difference between two crystal structures, respectively. Both quantities are per organic molecule.
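The error propagation described above can be sketched in Python. The sketch assumes that the per-structure variance takes the form (m σat² + n σH2O²)/N, which follows from treating every atom and every water molecule of the asymmetric unit as an independent error source and normalizing to one organic molecule. The per-water value is back-calculated from the per-atom water error of 0.379 kJ mol−1 quoted in the Discussion, and the function names and the 50-atom example are hypothetical.

```python
import math

SIGMA_AT = 0.191                    # kJ/mol per non-water atom (from the text)
SIGMA_H2O = 0.379 * math.sqrt(3.0)  # kJ/mol per water molecule (derived value)


def sigma_free_energy(m_atoms, n_water, n_molecules=1):
    """1-sigma error of the free energy per organic molecule.

    m_atoms     : atoms per organic molecule
    n_water     : water molecules per organic molecule (may be fractional)
    n_molecules : organic molecules per asymmetric unit
    """
    variance = (m_atoms * SIGMA_AT**2 + n_water * SIGMA_H2O**2) / n_molecules
    return math.sqrt(variance)


def sigma_free_energy_difference(m_atoms, structures):
    """1-sigma error of a free-energy difference between crystal structures.

    structures : iterable of (n_water, n_molecules) pairs, one per structure;
                 the two errors are assumed independent, so variances add.
    """
    variance = sum(sigma_free_energy(m_atoms, n, z) ** 2 for n, z in structures)
    return math.sqrt(variance)


# Hypothetical 50-atom molecule: monohydrate versus anhydrate, both with one
# organic molecule per asymmetric unit.
sigma_pair = sigma_free_energy_difference(50, [(1, 1), (0, 1)])
```

Note that a structure with two molecules in the asymmetric unit has a smaller per-molecule error than one with a single molecule, because the independent atomic errors partially average out.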
Chemical potential correction of water
A single fitted correction on the chemical potential of water is needed to account for the reference state of the free-energy calculations and other factors such as the neglect of basis-set superposition errors and thermal lattice expansion. The performance of the method without water chemical potential correction is shown in Extended Data Fig. 3.
Standard error calculation
A validation dataset of anhydrate solubility ratios and enantiotropic phase transitions (Supplementary Data) is used to independently solve for σat. A validation dataset of hydrate–anhydrate phase-transition systems (Supplementary Data) is used to solve for σH2O and the chemical potential correction of water simultaneously. Further details on the error model and characterizing the error distributions are provided in the Supplementary Information.
Anhydrate free-energy differences
Starting from solubility ratios and reversibly determined anhydrate phase transitions, a reference free-energy difference was calculated at the experimental temperature. This reference free-energy difference was compared with the predicted free-energy difference at the same temperature. The corresponding standard error was calculated using Gaussian error propagation. Further details are provided in Supplementary Table 22 and Supplementary Fig. 12.
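As an illustration of how a solubility ratio maps to a reference free-energy difference, the following Python sketch uses the standard relation G_B − G_A = RT ln(s_B/s_A). This is an assumption for illustration, valid when both forms are measured in the same solvent at the same temperature and the activity ratio can be approximated by the concentration ratio; the exact procedure used in this work is given in the Supplementary Information. The function names and the example solubilities and uncertainties are hypothetical.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1

def delta_g_from_solubility(s_a, s_b, temperature):
    """G_B - G_A in kJ/mol from the solubilities of forms A and B (same solvent and units)."""
    # Assumption: the activity ratio equals the concentration ratio.
    return R * temperature * math.log(s_b / s_a)

def delta_g_sigma(s_a, s_b, sigma_a, sigma_b, temperature):
    """First-order Gaussian propagation of independent solubility uncertainties."""
    return R * temperature * math.hypot(sigma_a / s_a, sigma_b / s_b)

# Hypothetical measurement: form B is half as soluble as form A at 298.15 K.
dg = delta_g_from_solubility(2.0, 1.0, 298.15)
err = delta_g_sigma(2.0, 1.0, 0.1, 0.05, 298.15)
print(f"G_B - G_A = {dg:.2f} +/- {err:.2f} kJ/mol")
```

The less soluble form comes out lower in free energy, as expected. For a reversibly determined phase transition, the reference free-energy difference is zero at the transition temperature.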
Hydrate–anhydrate phase transitions
For the hydrate systems, the reference data point used was the relative humidity at hydrate–anhydrate coexistence, or critical water activity, measured at 298.15 K. To compute the standard error on the predicted coexistence relative humidity, the standard deviation of the error of the free-energy difference, $\sigma_{\Delta F}$, is normalized to one molecule of water ($\Delta n$ is the difference in the number of water molecules between the two structures):

$$\sigma_{\Delta F}^{\mathrm{H_2O}} = \frac{\sigma_{\Delta F}}{|\Delta n|} \qquad (3)$$

Note the difference between $\sigma_{\Delta F}^{\mathrm{H_2O}}$ in equation (3) and the per-water-molecule error $\sigma_{\mathrm{H_2O}}$ from equations (1) and (2); the latter is a constant, whereas the former is system-dependent. Further equations for the calculation of the predicted phase transition from calculated hydrate–anhydrate free energies are provided in the Supplementary Information.
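The normalization to one water molecule can be turned into an error band on the coexistence relative humidity, as in the minimal Python sketch below. It is not the authors' implementation. It assumes the textbook equilibrium condition for anhydrate + Δn H2O ⇌ hydrate, a_w = exp(ΔF/(Δn·RT)), where ΔF is the free energy of the hydrate minus that of the anhydrate plus Δn water molecules at unit activity, per organic molecule, and Δn is positive. The function name `critical_rh` and the example numbers are hypothetical.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1

def critical_rh(delta_f, delta_n, sigma_delta_f, temperature=298.15):
    """Return (RH_low, RH, RH_high) in % at hydrate-anhydrate coexistence.

    delta_f       free energy of hydration per organic molecule, in kJ/mol (see lead-in)
    delta_n       waters gained per organic molecule (assumed positive)
    sigma_delta_f 1-sigma error of delta_f, in kJ/mol
    """
    rt = R * temperature
    # Error normalized to one water molecule: sigma_dF / |dn|.
    sigma_per_water = sigma_delta_f / abs(delta_n)
    a_w = math.exp(delta_f / (delta_n * rt))
    # A +/- 1-sigma shift in free energy per water multiplies a_w by exp(+/- sigma/RT).
    spread = math.exp(sigma_per_water / rt)
    return 100 * a_w / spread, 100 * a_w, 100 * a_w * spread

# Hypothetical monohydrate: delta_f = -3.0 kJ/mol with a 1.5 kJ/mol standard error.
low, mid, high = critical_rh(-3.0, 1, 1.5)
print(f"critical RH = {mid:.1f}% (1-sigma band {low:.1f}% to {high:.1f}%)")
```

Because the error enters through an exponential, the band is asymmetric in relative humidity, and the same error on the free-energy difference gives a narrower band for a dihydrate than for a monohydrate.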
Crystal structure prediction
Crystal structure predictions were carried out with GRACE 2.7 following the procedure described previously32.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-023-06587-3.
Acknowledgements
D.E.B. acknowledges funding from the Austrian Science Fund (FWF projects T593-N19 and V436-N34). M.A.N., D.F., K.S., J.H., H.D., J.v.d.S. and Y.M.L. acknowledge funding from the German “Zentrales Innovationsprogramm Mittelstand” (projects EP180192 and ZF4716901DF9).
Author contributions
M.A.N. and D.F. conceived the project and developed the method. D.F. developed the validation protocol. D.F., J.H. and M.A.N. wrote the software infrastructure to carry out the calculations, with contributions from K.S. and H.D. J.v.d.S. compiled the initial validation set and provided the crystallography, including disorder analysis and inspecting and correcting crystal structures. D.F., Y.M.L., J.v.d.S. and K.S. ran most of the calculations. Y.M.L. and D.F. analysed the validation results. A.G.D. contributed to the calculations. A.T. contributed to the method development. L.A., D.E.B., A.B., A.Y.L., S.L.M., S.O.N.L., W.J.L., A.M., P.M., O.D.P., M.R., S.R., A.Y.S. and G.R.W. contributed to the experimental data. S.M.R.-E. assisted with the writing of the paper. Y.M.L. wrote the first version of the paper with input from all authors. M.A.N. rewrote the text after the reviewers’ comments.
Peer review
Peer review information
Nature thanks Graeme Day and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
Crystal structure data, crystal solubilities, phase-transition temperatures and relative humidity values of the phase transitions used in this paper are provided in the Supplementary Information and Supplementary Data.
Code availability
The free-energy calculations described in this work were originally implemented in and carried out with v.3.0 of the commercial GRACE software package that can be licensed from Avant-garde Materials Simulation (AMS). As agreed with the editor, the actual free-energy calculations have been extracted from GRACE and collected as source code or pseudo-code into a library that is available from AMS upon request. The library requires the user to provide a SuperCellManager object that performs the actual single-point energy calculations. Testing has confirmed that the library gives the same results as the corresponding code in GRACE.
Competing interests
A.M. and A.Y.S. are employees of AbbVie and may own AbbVie stock. G.R.W. is an employee of Novartis and may own Novartis stock. M.A.N. is the owner of AMS. D.F., Y.M.L., J.v.d.S., K.S., H.D. and J.H. are or have been employees of AMS and have no other competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Dzmitry Firaha, Email: dzmitry.firaha@avmatsim.eu.
Yifei Michelle Liu, Email: yifei.michelle@gmail.com.
Marcus A. Neumann, Email: marcus.neumann@avmatsim.eu.
Extended data is available for this paper at 10.1038/s41586-023-06587-3.
Supplementary information
The online version contains supplementary material available at 10.1038/s41586-023-06587-3.
References
- 1. Saal C. Selection of solid-state forms: challenges, opportunities, lessons learned and adventures from recent years. J. Pharm. Pharmacol. 2015;67:755–756. doi: 10.1111/jphp.12435.
- 2. Yang M, et al. Prediction of the relative free energies of drug polymorphs above zero kelvin. Cryst. Growth Des. 2020;20:5211–5224. doi: 10.1021/acs.cgd.0c00422.
- 3. Abramov YA, Sun G, Zeng Q. Emerging landscape of computational modeling in pharmaceutical development. J. Chem. Inf. Model. 2022;62:1160–1171. doi: 10.1021/acs.jcim.1c01580.
- 4. Hoja J, et al. Reliable and practical computational description of molecular crystal polymorphs. Sci. Adv. 2019;5:eaau3338. doi: 10.1126/sciadv.aau3338.
- 5. Price SL, Reutzel-Edens SM. The potential of computed crystal energy landscapes to aid solid-form development. Drug Discov. Today. 2016;21:912–923. doi: 10.1016/j.drudis.2016.01.014.
- 6. Hartel RW. Advances in food crystallization. Annu. Rev. Food Sci. Technol. 2013;4:277–292. doi: 10.1146/annurev-food-030212-182530.
- 7. Yang J, et al. Large-scale computational screening of molecular organic semiconductors using crystal structure prediction. Chem. Mater. 2018;30:4361–4371. doi: 10.1021/acs.chemmater.8b01621.
- 8. Cady HH, Larson AC, Cromer DT. The crystal structure of α-HMX and a refinement of the structure of β-HMX. Acta Crystallogr. 1963;16:617–623. doi: 10.1107/S0365110X63001651.
- 9. Lamberth C, Jeanmart S, Luksch T, Plant A. Current challenges and trends in the discovery of agrochemicals. Science. 2013;341:742–746. doi: 10.1126/science.1237227.
- 10. Lee EH. A practical guide to pharmaceutical polymorph screening & selection. Asian J. Pharm. Sci. 2014;9:163–175. doi: 10.1016/j.ajps.2014.05.002.
- 11. Censi R, Di Martino P. Polymorph impact on the bioavailability and stability of poorly soluble drugs. Molecules. 2015;20:18759–18776. doi: 10.3390/molecules201018759.
- 12. Bauer J, et al. Ritonavir: an extraordinary example of conformational polymorphism. Pharm. Res. 2001;18:859–866. doi: 10.1023/A:1011052932607.
- 13. Yokoyama T, Umeda T, Kuroda K, Sato K, Takagishi Y. Studies on drug nonequivalence. VII. Bioavailability of acetohexamide polymorphs. Chem. Pharm. Bull. 1979;27:1476–1478. doi: 10.1248/cpb.27.1476.
- 14. Aguiar AJ, Zelmer JE. Dissolution behavior of polymorphs of chloramphenicol palmitate and mefenamic acid. J. Pharm. Sci. 1969;58:983–987. doi: 10.1002/jps.2600580817.
- 15. Wolff, H.-M., Quéré, L. & Riedner, J. Polymorphic form of rotigotine. European patent 2215072 B1 (2015).
- 16. Newman A, Wenslow R. Solid form changes during drug development: good, bad, and ugly case studies. AAPS Open. 2016;2:2. doi: 10.1186/s41120-016-0003-4.
- 17. Braun DE, et al. Inconvenient truths about solid form landscapes revealed in the polymorphs and hydrates of gandotinib. Cryst. Growth Des. 2019;19:2947–2962. doi: 10.1021/acs.cgd.9b00162.
- 18. Peresypkin A, et al. Discovery of a stable molecular complex of an API with HCl: a long journey to a conventional salt. J. Pharm. Sci. 2008;97:3721–3726. doi: 10.1002/jps.21264.
- 19. Chekal BP, et al. The challenges of developing an API crystallization process for a complex polymorphic and highly solvating system. Part I. Org. Process Res. Dev. 2009;13:1327–1337. doi: 10.1021/op9001559.
- 20. Neumann MA, van de Streek J, Fabbiani FPA, Hidber P, Grassmann O. Combined crystal structure prediction and high-pressure crystallization in rational pharmaceutical polymorph screening. Nat. Commun. 2015;6:7793. doi: 10.1038/ncomms8793.
- 21. Taylor CR, et al. Minimizing polymorphic risk through cooperative computational and experimental exploration. J. Am. Chem. Soc. 2020;142:16668–16680. doi: 10.1021/jacs.0c06749.
- 22. Bhardwaj RM, et al. A prolific solvate former, galunisertib, under the pressure of crystal structure prediction, produces ten diverse polymorphs. J. Am. Chem. Soc. 2019;141:13887–13897. doi: 10.1021/jacs.9b06634.
- 23. Andrews JL, et al. Derisking the polymorph landscape: the complex polymorphism of mexiletine hydrochloride. Cryst. Growth Des. 2021;21:7150–7167. doi: 10.1021/acs.cgd.1c01009.
- 24. Dybeck EC, McMahon DP, Day GM, Shirts MR. Exploring the multi-minima behavior of small molecule crystal polymorphs at finite temperature. Cryst. Growth Des. 2019;19:5568–5580. doi: 10.1021/acs.cgd.9b00476.
- 25. Francia NF, Price LS, Nyman J, Price SL, Salvalaglio M. Systematic finite-temperature reduction of crystal energy landscapes. Cryst. Growth Des. 2020;20:6847–6862. doi: 10.1021/acs.cgd.0c00918.
- 26. Sun G, et al. Current state-of-the-art in-house and cloud-based applications of virtual polymorph screening of pharmaceutical compounds: a challenging case of AZD1305. Cryst. Growth Des. 2021;21:1972–1983. doi: 10.1021/acs.cgd.0c01266.
- 27. Bowskill DH, Sugden IJ, Konstantinopoulos S, Adjiman CS, Pantelides CC. Crystal structure prediction methods for organic molecules: state of the art. Annu. Rev. Chem. Biomol. Eng. 2021;12:593–623. doi: 10.1146/annurev-chembioeng-060718-030256.
- 28. Dudek MK, Drużbicki K. Along the road to crystal structure prediction (CSP) of pharmaceutical-like molecules. CrystEngComm. 2022;24:1665–1678. doi: 10.1039/D1CE01564H.
- 29. Greenwell C, et al. Overcoming the difficulties of predicting conformational polymorph energetics in molecular crystals via correlated wavefunction methods. Chem. Sci. 2020;11:2200–2214. doi: 10.1039/C9SC05689K.
- 30. Beran GJO, et al. How many more polymorphs of ROY remain undiscovered. Chem. Sci. 2022;13:1288–1297. doi: 10.1039/D1SC06074K.
- 31. Zhang P, et al. Harnessing cloud architecture for crystal structure prediction calculations. Cryst. Growth Des. 2018;18:6891–6900. doi: 10.1021/acs.cgd.8b01098.
- 32. Mortazavi M, et al. Computational polymorph screening reveals late-appearing and poorly-soluble form of rotigotine. Commun. Chem. 2019;2:70. doi: 10.1038/s42004-019-0171-y.
- 33. Mattei A, et al. Efficient crystal structure prediction for structurally related molecules with accurate and transferable tailor-made force fields. J. Chem. Theory Comput. 2022;18:5725–5738. doi: 10.1021/acs.jctc.2c00451.
- 34. Braun DE, Karamertzanis PG, Price SL. Which, if any, hydrates will crystallise? Predicting hydrate formation of two dihydroxybenzoic acids. Chem. Commun. 2011;47:5443–5445. doi: 10.1039/C1CC10762C.
- 35. Cruz-Cabeza AJ, et al. Predicting stoichiometry and structure of solvates. Chem. Commun. 2010;46:2224–2226. doi: 10.1039/b922955h.
- 36. Cruz-Cabeza AJ, Day GM, Jones W. Towards prediction of stoichiometry in crystalline multicomponent complexes. Chem. Eur. J. 2008;14:8830–8836. doi: 10.1002/chem.200800668.
- 37. Dybeck EC, et al. A comparison of methods for computing relative anhydrous–hydrate stability with molecular simulation. Cryst. Growth Des. 2023;23:142–167. doi: 10.1021/acs.cgd.2c00832.
- 38. Hong RS, Mattei A, Sheikh AY, Tuckerman ME. A data-driven and topological mapping approach for the a priori prediction of stable molecular crystalline hydrates. Proc. Natl Acad. Sci. USA. 2022;119:e2204414119. doi: 10.1073/pnas.2204414119.
- 39. Hermann J, Tkatchenko A. Density functional model for van der Waals interactions: unifying many-body atomic approaches with nonlocal functionals. Phys. Rev. Lett. 2020;124:146401. doi: 10.1103/PhysRevLett.124.146401.
- 40. Mony L, Kew JN, Gunthorpe MJ, Paoletti P. Allosteric modulators of NR2B-containing NMDA receptors: molecular mechanisms and therapeutic potential. Br. J. Pharmacol. 2009;157:1301–1317. doi: 10.1111/j.1476-5381.2009.00304.x.
- 41. Auvin S, et al. Radiprodil, a NR2B negative allosteric modulator, from bench to bedside in infantile spasm syndrome. Ann. Clin. Transl. Neurol. 2020;7:343–352. doi: 10.1002/acn3.50998.
- 42. Mullier B, et al. GRIN2B gain of function mutations are sensitive to radiprodil, a negative allosteric modulator of GluN2B-containing NMDA receptors. Neuropharmacology. 2017;123:322–331. doi: 10.1016/j.neuropharm.2017.05.017.
- 43. Mohamed M-EF, Zeng J, Marroum PJ, Song I-H, Othman AA. Pharmacokinetics of upadacitinib with the clinical regimens of the extended‐release formulation utilized in rheumatoid arthritis phase 3 trials. Clin. Pharmacol. Drug Dev. 2019;8:208–216. doi: 10.1002/cpdd.462.
- 44. Duggan S, Keam SJ. Upadacitinib: first approval. Drugs. 2019;79:1819–1828. doi: 10.1007/s40265-019-01211-z.
- 45. Neumann MA, van de Streek J. How many ritonavir cases are there still out there? Faraday Discuss. 2018;211:441–458. doi: 10.1039/C8FD00069G.
- 46. Maddox J. Crystals from first principles. Nature. 1988;335:201. doi: 10.1038/335201a0.
- 47. Poltavsky I, Tkatchenko A. Machine learning force fields: recent advances and remaining challenges. J. Phys. Chem. Lett. 2021;12:6551–6564. doi: 10.1021/acs.jpclett.1c01204.
- 48. Unke OT, et al. Machine learning force fields. Chem. Rev. 2021;121:10142–10186. doi: 10.1021/acs.chemrev.0c01111.
- 49. Lee, T. J. & Scuseria, G. E. in Quantum Mechanical Electronic Structure Calculations with Chemical Accuracy Vol. 13 (ed. Langhoff, S. R.) 47–108 (Springer, 1995).
- 50. Beran GJO, Wright SE, Greenwell C, Cruz-Cabeza AJ. The interplay of intra- and intermolecular errors in modeling conformational polymorphs. J. Chem. Phys. 2022;156:104112. doi: 10.1063/5.0088027.
- 51. Perdew JP, Burke K, Ernzerhof M. Generalized gradient approximation made simple. Phys. Rev. Lett. 1996;77:3865–3868. doi: 10.1103/PhysRevLett.77.3865.
- 52. Neumann MA, Perrin M-A. Energy ranking of molecular crystals using density functional theory calculations and an empirical van der Waals correction. J. Phys. Chem. B. 2005;109:15531–15541. doi: 10.1021/jp050121r.
- 53. Blum V, et al. Ab initio molecular simulations with numeric atom-centered orbitals. Comput. Phys. Commun. 2009;180:2175–2196. doi: 10.1016/j.cpc.2009.06.022.
- 54. Knuth F, Carbogno C, Atalla V, Blum V, Scheffler M. All-electron formalism for total energy strain derivatives and stress tensor components for numeric atom-centered orbitals. Comput. Phys. Commun. 2015;190:33–50. doi: 10.1016/j.cpc.2015.01.003.
- 55. Togo, A., Seto, Y. & Pashov, D. Spglib. GitHub https://github.com/spglib/spglib (2008).
- 56. Yu VW, et al. ELSI: A unified software interface for Kohn–Sham electronic structure solvers. Comput. Phys. Commun. 2018;222:267–285. doi: 10.1016/j.cpc.2017.09.007.
- 57. Havu V, Blum V, Havu P, Scheffler M. Efficient O(N) integration for all-electron electronic structure calculation using numeric basis functions. J. Comput. Phys. 2009;228:8367–8379. doi: 10.1016/j.jcp.2009.08.008.
- 58. Perdew JP, Ernzerhof M, Burke K. Rationale for mixing exact exchange with density functional approximations. J. Chem. Phys. 1996;105:9982–9985. doi: 10.1063/1.472933.
- 59. Adamo C, Barone V. Toward reliable density functional methods without adjustable parameters: the PBE0 model. J. Chem. Phys. 1999;110:6158–6170. doi: 10.1063/1.478522.
- 60. Tkatchenko A, DiStasio RA, Jr, Car R, Scheffler M. Accurate and efficient method for many-body van der Waals interactions. Phys. Rev. Lett. 2012;108:236402. doi: 10.1103/PhysRevLett.108.236402.
- 61. Ambrosetti A, Reilly AM, DiStasio RA, Jr, Tkatchenko A. Long-range correlation energy calculated from coupled atomic response functions. J. Chem. Phys. 2014;140:18A508. doi: 10.1063/1.4865104.
- 62. Řezáč J, Greenwell C, Beran GJO. Accurate noncovalent interactions via dispersion-corrected second-order Møller–Plesset perturbation theory. J. Chem. Theory Comput. 2018;14:4711–4721. doi: 10.1021/acs.jctc.8b00548.
- 63. Zhang IY, Ren X, Rinke P, Blum V, Scheffler M. Numeric atom-centered-orbital basis sets with valence-correlation consistency from H to Ar. New J. Phys. 2013;15:123033. doi: 10.1088/1367-2630/15/12/123033.
- 64. psi4. Anaconda.org. https://anaconda.org/psi4/repo.
- 65. Smith DGA, et al. Psi4 1.4: open-source software for high-throughput quantum chemistry. J. Chem. Phys. 2020;152:184108. doi: 10.1063/5.0006002.
- 66. Neumann MA, Leusen FJJ, Kendrick J. A major advance in crystal structure prediction. Angew. Chem. Int. Ed. 2008;47:2427–2430. doi: 10.1002/anie.200704247.
- 67. Neumann MA. Tailor-made force fields for crystal-structure prediction. J. Phys. Chem. B. 2008;112:9810–9829. doi: 10.1021/jp710575h.