Abstract
Approximately 40% of marketed drugs exhibit suboptimal pharmacokinetic profiles. Co‐crystallization, where pairs of molecules form a multicomponent crystal, constitutes a promising strategy to enhance physicochemical properties without compromising the pharmacological activity. However, finding promising co‐crystal pairs is resource‐intensive due to the large and diverse range of possible molecular combinations. We present DeepCocrystal, a novel deep learning approach designed to predict co‐crystal formation by processing the “chemical language” from a supramolecular vantage point. Rigorous validation of DeepCocrystal showed a balanced accuracy of 78% in realistic scenarios, outperforming existing models. Explainable AI approaches uncovered the decision‐making process of DeepCocrystal, showing its capability to learn chemically relevant aspects of the “supramolecular language” that match experimental co‐crystallization patterns. By leveraging properties of molecular string representations, DeepCocrystal can also estimate the uncertainty of its predictions. We harnessed this capability in a challenging prospective study and successfully discovered two novel co‐crystals of diflunisal, an anti‐inflammatory drug. This study underscores the potential of deep learning—and in particular of chemical language processing—to accelerate co‐crystallization and ultimately drug development, in both academic and industrial contexts. DeepCocrystal is available as an easy‐to‐use web application at https://deepcocrystal.streamlit.app/.
Keywords: Chemical language processing, Co‐crystallization, Deep learning, Explainable AI, Supramolecular chemistry
DeepCocrystal is a deep learning model that predicts co‐crystal formation by learning supramolecular interactions from molecular string representations. It achieves high accuracy, estimates uncertainty, and offers interpretable insights into its predictions. Validated prospectively, DeepCocrystal discovered novel diflunisal co‐crystals and is available as a user‐friendly web application.
Introduction
Co‐crystallization enables the optimization of the pharmacokinetic properties of active pharmaceutical ingredients (APIs).[ 1 , 2 ] Via co‐crystallization, supramolecular interactions between the API and another molecule (coformer) are established to form a multicomponent crystal[ 3 ] (Figure 1a). The resulting co‐crystal preserves the bioactivity of the lead molecule while enhancing desirable properties, such as solubility and stability. Owing to the high number of possible combinations, finding the optimal coformer for a given API is far from trivial and ultimately relies on a labor‐ and time‐intensive process based on trial and error.[ 4 , 5 ]
Figure 1.

Overview of key elements of DeepCocrystal for co‐crystal prediction. a) The co‐crystallization between an active pharmaceutical ingredient (API) and a coformer involves the formation of a multicomponent crystalline structure (co‐crystal), in which the API and coformer are held together by non‐covalent interactions. b) SMILES strings, which encode a molecular graph as a single string. One molecule can be represented by many different SMILES strings, depending on the starting (non‐hydrogen) atom and the chosen direction for graph traversal. c) DeepCocrystal represents API and coformers via augmented SMILES strings and passes them through 1‐dimensional (1D) convolutions. Fully connected layers are then used to predict the co‐crystallization output as a continuous number between 0 and 1, which can then be discretized (with a cut‐off of 0.5) to perform a prediction (“negative” pair if below, and “positive” pair otherwise).
Machine learning—which automatically extracts relevant information from chemical datasets[ 6 ]—can aid in prioritizing API‐coformer pairs for co‐crystallization.[ 7 , 8 , 9 ] Several approaches have been used for co‐crystallization prediction, ranging from “human‐engineered” molecular descriptors with tree‐based models and neural networks[ 10 , 11 ] to deep representation learning with graph convolutional networks.[ 12 , 13 ] In co‐crystallization prediction, machine learning might struggle to generalize to previously unseen molecules.[ 9 ] This is in part due to limitations of training datasets, which are unrealistically imbalanced toward “positive” API‐coformer pairs.[ 9 , 14 ] In fact, unlike positive results, negative co‐crystallization results may stem from either the inability of two molecules to co‐crystallize or suboptimal experimental conditions, leading to a general under‐reporting of negative data.[ 15 ] To counterbalance this issue, random labeling of untested molecule pairs as negatives has been proposed as a solution,[ 9 ] at the risk of introducing mislabeled data. Similarly, undersampling approaches could rebalance positive and negative molecule pairs, but they might distort the data distribution relative to its real‐world occurrence and lead to overfitting.[ 16 , 17 ] Hence, developing machine learning approaches that are less sensitive to the limitations of current co‐crystallization datasets and perform well on previously unseen molecular pairs bears great potential to advance API‐coformer prioritization.
Here, we introduce DeepCocrystal, a novel deep learning approach designed to learn the “supramolecular language” of co‐crystallization while counteracting dataset biases and improving generalization. Supramolecular chemistry can be viewed as a language[ 18 , 19 , 20 ]: atoms (“letters”) form molecules (“words”), whose combinations give rise to supramolecular interactions (“sentences”). Building on this analogy, we extend current chemical language processing techniques[ 21 , 22 , 23 , 24 ]—which predict molecular properties from single string representations[ 25 , 26 ]—to predicting supramolecular interactions between pairs of molecules (i.e., co‐crystallization).
DeepCocrystal represents single molecules (API and coformer) as SMILES (Simplified Molecular Input Line Entry System[ 25 ]) strings (Figure 1b), whose chemical information is combined to predict whether they form co‐crystals.
DeepCocrystal exploits an intriguing property of the “chemical language” of SMILES strings, whereby a given molecule can be represented via multiple SMILES strings.[ 27 ] DeepCocrystal leverages such non‐univocity to achieve two goals: a) mitigate the class imbalance (by using different numbers of SMILES strings for positive and negative molecular pairs), and b) estimate the uncertainty of its predictions.
DeepCocrystal shows better performance and generalization capacity than existing approaches.[ 13 , 28 , 29 , 30 , 31 , 32 ] Moreover, model explanation revealed DeepCocrystal's ability to capture key substructures in molecular pairs involved in supramolecular interactions. When applied prospectively to identify coformer candidates, all high‐certainty predictions of DeepCocrystal were confirmed experimentally—leading to the identification of two previously unreported diflunisal co‐crystals. SMILES augmentation and uncertainty estimation were pivotal to achieving DeepCocrystal's performance, both in our in silico analyses and experimentally. To the best of our knowledge, this is the first application of “supramolecular language” processing to predict co‐crystallization—opening novel opportunities in supramolecular chemistry.
Results and Discussion
DeepCocrystal Architecture
DeepCocrystal represents molecules as SMILES strings.[ 25 ] SMILES are obtained by traversing a molecular graph starting from a non‐hydrogen atom, and by annotating atoms and bonds with specific symbols (Figure 1b). Different SMILES strings for the same molecule can be obtained by starting from a different non‐hydrogen atom and/or following a different path for graph traversal (Figure 1b). DeepCocrystal leverages convolutional neural networks (CNNs)[ 33 ] for “chemical language” processing. CNNs are a class of deep learning models commonly used for processing sequences of text.[ 34 ] Via convolution—which involves sliding a filter (kernel) over the input text—CNNs capture features at different levels of abstraction and progressively aggregate them to provide a prediction. CNNs have been previously applied to predict the properties of single molecules from their SMILES strings,[ 21 , 22 , 23 ] and have shown excellent performance compared to other SMILES‐processing architectures.[ 35 ]
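The non‐univocity of SMILES can be illustrated with a few lines of RDKit. The sketch below is an illustrative snippet rather than the DeepCocrystal training code; the `randomized_smiles` helper is hypothetical. It enumerates distinct randomized SMILES strings for a single molecule by varying the graph‐traversal order:

```python
# Minimal sketch: generating multiple valid SMILES for the same molecule with RDKit.
from rdkit import Chem

def randomized_smiles(smiles: str, n: int = 5, max_tries: int = 100) -> list[str]:
    """Return up to n distinct randomized SMILES strings for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Invalid SMILES: {smiles}")
    variants, tries = set(), 0
    while len(variants) < n and tries < max_tries:
        # doRandom=True starts the graph traversal from a random atom,
        # yielding a different (but equivalent) string each time.
        variants.add(Chem.MolToSmiles(mol, canonical=False, doRandom=True))
        tries += 1
    return sorted(variants)

print(randomized_smiles("CC(=O)Oc1ccccc1C(=O)O", n=3))  # aspirin, three variants
```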
DeepCocrystal extends traditional chemical language processing approaches beyond the “one‐molecule‐one‐property” paradigm to learn simultaneously from the SMILES strings of pairs of molecules (i.e., API‐coformer pairs). In particular, DeepCocrystal uses two separate CNNs to learn “latent representations” of the input molecular structures (of each API and coformer) and then aggregates this information via a fully connected neural network to predict the potential co‐crystallization of the input pair (Figure 1c). Via DeepCocrystal, the co‐crystallization potential of any molecular pair is predicted as a number between 0 (negative) and 1 (positive). Every API and coformer pair is represented with multiple SMILES strings.[ 27 ] In particular, given a co‐crystallization dataset, SMILES from the “negative” pairs are augmented proportionally to the percentage of “positive” pairs, and vice versa. This strategy aims to alleviate class imbalance without artificially incorporating potentially mislabeled information or losing important information by undersampling.
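To make the dual‐encoder layout concrete, the following PyTorch sketch mirrors the architecture described above: two 1D‐convolutional encoders whose outputs are concatenated and passed to fully connected layers with a sigmoid output. The layer widths, embedding size, and number of convolutional blocks are placeholder assumptions; only the overall structure and the kernel size of four reflect the text, and the actual hyperparameters are reported in the Supporting Information.

```python
# Illustrative sketch of a dual-CNN co-crystal classifier (not the released model).
import torch
import torch.nn as nn

class SmilesEncoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 64, channels: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Sequential(
            nn.Conv1d(emb_dim, channels, kernel_size=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=4), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # collapse the sequence dimension
        )

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        return self.conv(x).squeeze(-1)                # (batch, channels)

class CocrystalClassifier(nn.Module):
    def __init__(self, vocab_size: int, channels: int = 128):
        super().__init__()
        self.api_encoder = SmilesEncoder(vocab_size, channels=channels)
        self.coformer_encoder = SmilesEncoder(vocab_size, channels=channels)
        self.head = nn.Sequential(
            nn.Linear(2 * channels, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),  # continuous output in [0, 1]
        )

    def forward(self, api_ids, coformer_ids):
        z = torch.cat([self.api_encoder(api_ids),
                       self.coformer_encoder(coformer_ids)], dim=-1)
        return self.head(z).squeeze(-1)

# Dummy forward pass: a batch of two token-index sequences of length 40 per molecule.
model = CocrystalClassifier(vocab_size=40)
scores = model(torch.randint(1, 40, (2, 40)), torch.randint(1, 40, (2, 40)))
```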
Performance of DeepCocrystal
To train DeepCocrystal, we collected and manually curated a dataset of experimentally determined co‐crystal structures from a) the Cambridge Structural Database[ 36 ] and b) the existing co‐crystal literature.[ 13 , 37 , 38 , 39 , 40 , 41 ] Moreover, a set of in‐house experiments was conducted to measure the co‐crystallization of additional molecular pairs. The collected dataset comprises a total of 6632 API‐coformer pairs, of which 5240 (79%) are co‐crystals (“positive”) and 1392 (21%) are physical mixtures (“negative”, i.e., no observed co‐crystallization). The dataset was randomly divided into training, validation, and “internal” test sets (80%, 10%, and 10%, respectively) using class‐based stratification. This splitting procedure was repeated 10 times. Notably, curating the data by removing non‐informative molecular pairs helped avoid model “shortcuts” and memorization (see Supporting Information, Materials and Methods). An “external” test set comprising 364 molecular pairs (129 co‐crystals and 235 non‐co‐crystals) was curated from both the literature[ 42 , 43 , 44 , 45 , 46 , 47 , 48 , 49 , 50 , 51 , 52 , 53 , 54 ] (63%) and in‐house data (37%). The external test set has lower substructure similarity[ 55 ] to the training set (Figure S1)—constituting a more challenging prediction scenario.
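The repeated, class‐stratified 80/10/10 split can be reproduced along the following lines. This is a sketch using scikit‐learn; the seeds and the split implementation are assumptions, not taken from the released code:

```python
# Sketch: class-stratified 80/10/10 splitting, repeated with different random seeds.
from sklearn.model_selection import train_test_split

def stratified_split(pairs, labels, seed):
    x_train, x_tmp, y_train, y_tmp = train_test_split(
        pairs, labels, test_size=0.2, stratify=labels, random_state=seed)
    x_val, x_test, y_val, y_test = train_test_split(
        x_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)

# Toy data mimicking the ~79%/21% class ratio; repeat over 10 seeds as in the text.
pairs = [("CCO", "c1ccccc1")] * 80 + [("CC(=O)O", "c1ccncc1")] * 20
labels = [1] * 80 + [0] * 20
splits = [stratified_split(pairs, labels, seed) for seed in range(10)]
```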
Given the proportion of “positive” and “negative” pairs, for DeepCocrystal we performed a 2:7 (positive:negative) SMILES augmentation, i.e., we represented each molecule belonging to a positive pair with two randomized SMILES strings, and each molecule belonging to a negative pair with seven. This “augmented” dataset was used to predict co‐crystallization with CNNs. To quantify the efficacy of the chosen augmentation, we performed two controls using the same architecture (Figure 1c) and different SMILES versions: a) “SMI‐CNN(can)”, relying on a single canonical (unique) SMILES string[ 56 ] for each molecule, and b) “SMI‐CNN(1:4)”, which uses a 1:4 augmentation. SMI‐CNN(can) allows us to inspect the effect of using multiple SMILES strings for the same molecule as in DeepCocrystal, while SMI‐CNN(1:4) controls for the chosen augmentation level.
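The class‐dependent augmentation can be sketched as follows. This is our own illustration: the tuple‐based data layout and the `augment_pair` helper are assumptions, while the 2:7 ratio follows the text.

```python
# Sketch: augment each molecule of a pair with a class-dependent number of
# randomized SMILES (2 for positives, 7 for negatives), as described above.
from rdkit import Chem

RATIOS = {1: 2, 0: 7}  # label -> number of randomized SMILES per molecule

def augment_pair(api: str, coformer: str, label: int) -> list[tuple]:
    rand = lambda m: Chem.MolToSmiles(m, canonical=False, doRandom=True)
    api_mol = Chem.MolFromSmiles(api)
    cof_mol = Chem.MolFromSmiles(coformer)
    return [(rand(api_mol), rand(cof_mol), label) for _ in range(RATIOS[label])]

pairs = [("CC(=O)O", "c1ccncc1", 1), ("CCO", "c1ccccc1", 0)]
augmented = [row for p in pairs for row in augment_pair(*p)]
print(len(augmented))  # 2 + 7 = 9 augmented pairs
```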
Models were evaluated in terms of Recall (ability to correctly classify positive pairs, Equation S1), Precision (accuracy of positive predictions, Equation S2), Specificity (ability to correctly classify negative pairs, Equation S3), and Balanced Accuracy[ 57 ] (overall performance, Equation S4). These metrics were computed by labeling predictions below 0.5 as “negative” and “positive” otherwise (Table 1). All SMILES‐based models reached a Balanced Accuracy of at least 88%, with DeepCocrystal (2:7 augmentation) performing best. Different trends emerge at the class level. In identifying “positive” pairs, canonical SMILES led to the best performance (up to a 5% increase in Recall), while SMI‐CNN(1:4) and DeepCocrystal showed comparable performance. In contrast, DeepCocrystal achieved the highest Precision, significantly different from SMI‐CNN(can) (Wilcoxon signed‐rank test, p < 0.05, preceded by a Friedman test), and significantly improved the identification of negative pairs (Friedman and Wilcoxon signed‐rank tests, p < 0.05), resulting in an 8% increase in Specificity compared to the canonical version. This evidence highlights how SMILES augmentation on the negative class can aid in mitigating the data imbalance.
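The four metrics follow their standard definitions (Equations S1–S4); a minimal sketch of their computation from continuous predictions with a 0.5 cut‐off is shown below (using scikit‐learn is an assumption about tooling, and the arrays are toy values):

```python
# Sketch: thresholding continuous predictions at 0.5 and computing the four metrics.
import numpy as np
from sklearn.metrics import balanced_accuracy_score, precision_score, recall_score

y_true = np.array([1, 1, 0, 0, 1, 0])                    # toy ground-truth labels
y_prob = np.array([0.91, 0.40, 0.12, 0.55, 0.78, 0.30])  # toy model outputs
y_pred = (y_prob >= 0.5).astype(int)                     # "positive" if >= 0.5

recall = recall_score(y_true, y_pred)                    # performance on positives
precision = precision_score(y_true, y_pred)              # accuracy of positive calls
specificity = recall_score(y_true, y_pred, pos_label=0)  # performance on negatives
bacc = balanced_accuracy_score(y_true, y_pred)           # (recall + specificity) / 2
print(f"Recall {recall:.2f}, Precision {precision:.2f}, "
      f"Specificity {specificity:.2f}, BAcc {bacc:.2f}")
```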
Table 1.
Performance of DeepCocrystal. DeepCocrystal was tested on two test sets, one internal and one external. The internal test set comprised 664 molecular pairs. The external test set comprised 364 pairs collected in a second phase of the project and containing more structurally diverse molecular pairs. The test sets were used to benchmark DeepCocrystal against a) approaches using the same architecture (CNN) and different SMILES representations (“SMI‐CNN”), and b) models from the literature. Balanced Accuracy (BAcc, global performance), Recall (performance on “positive” pairs), Precision (accuracy of positive predictions), and Specificity (performance on “negative” pairs) are reported for each set and each model (the closer to 100%, the better). The best performance per metric is highlighted in boldface for each test set.
| Test set | Model | Representation | BAcc | Recall | Precision | Specificity |
|---|---|---|---|---|---|---|
| Internal | DeepCocrystal | SMILES (2:7) | 89% ± 2% | 92% ± 2% | **97% ± 1%** | **87% ± 3%** |
| | SMI‐CNN(can) | SMILES (1:1) | 88% ± 2% | **96% ± 1%** | 95% ± 1% | 79% ± 6% |
| | SMI‐CNN(1:4) | SMILES (1:4) | 88% ± 2% | 91% ± 2% | 96% ± 1% | 86% ± 3% |
| | Descriptor‐DNN[ 28 ] | Descriptors | 89% ± 1% | **96% ± 1%** | 96% ± 1% | 83% ± 2% |
| | Fingerprint‐DNN[ 29 ] | Fingerprints | **91% ± 1%** | **96% ± 1%** | 96% ± 1% | 85% ± 2% |
| External | DeepCocrystal | SMILES (2:7) | **78%** | 75% | **68%** | **81%** |
| | SMI‐CNN(can) | SMILES (1:1) | 59% | **93%** | 41% | 26% |
| | SMI‐CNN(1:4) | SMILES (1:4) | 69% | 71% | 54% | 66% |
| | Descriptor‐DNN[ 28 ] | Descriptors | 63% | 84% | 44% | 41% |
| | Fingerprint‐DNN[ 29 ] | Fingerprints | 57% | 90% | 40% | 25% |
| | CCGNet[ 13 ] | Molecular graph | 60% | 51% | 47% | 69% |
| | CC‐Descriptor‐ML a) [ 30 ] | Descriptors | 63% | 79% | 45% | 48% |
| | MC[ 31 ] | Structural descriptors | 45% | 69% | 32% | 21% |
| | HBP[ 32 ] | Structural descriptors | 43% | 7% | 15% | 78% |

a) Performance computed by excluding five molecular pairs that were used for model training.
On the more challenging external set, SMILES augmentation increased the Balanced Accuracy by 10% (or more) compared to using canonical SMILES strings. This indicates that learning from different SMILES versions of the same molecule improves generalization. When comparing augmentation levels, DeepCocrystal (2:7 augmentation) outperforms SMI‐CNN(1:4), both in terms of global performance (9% higher Balanced Accuracy) and minimization of false positives (14% higher Precision)—indicating a greater potential for prospective applications.
Model Benchmarking
DeepCocrystal was benchmarked against six existing approaches. Four models are based on machine learning (ML) algorithms: i) CCGNet,[ 13 ] which relies on graph neural networks to perform a prediction; ii) CC‐Descriptor‐ML, which relies on an array of “classical” machine learning models trained on co‐crystal descriptors;[ 30 ] iii) Descriptor‐DNN, based on a fully connected neural network trained on molecular descriptors;[ 28 ] and iv) Fingerprint‐DNN, a fully connected neural network trained on extended connectivity fingerprints.[ 29 , 58 ] To ensure comparability and account for the lack of provided code, data, and/or hyperparameters, we re‐implemented and trained Descriptor‐DNN and Fingerprint‐DNN, using the same dataset as DeepCocrystal (see Supporting Information, Materials and Methods). Furthermore, we considered two statistical tools developed by the Cambridge Crystallographic Data Centre (CCDC): i) Molecular Complementarity (MC), which evaluates geometric and polarity factors for molecular compatibility, and ii) Hydrogen Bond Propensity (HBP), which estimates the likelihood of hydrogen bonding between different functional groups, considering both homomeric and heteromeric interactions. All six approaches were benchmarked for their performance on the internal and external test sets, whenever possible. CCGNet and CC‐Descriptor‐ML were excluded from the internal evaluation due to data overlap between the training and test sets (CCGNet), and the significant time required to calculate descriptors for thousands of data points (CC‐Descriptor‐ML). Similarly, CCDC tools were applied only to the external test set, as HBP requires extensive computation for some API‐coformer pairs, while MC is not yet implemented as an automated approach, requiring manual entry for each case.
On the internal set, very small differences are observed across methods, with no single model consistently outperforming the others on every metric (Friedman test, α = 0.05). DeepCocrystal performs marginally better than the rest in minimizing false positives (as visible from Precision and Specificity), and Fingerprint‐DNN has a marginally better global performance (Balanced Accuracy) and identification of true positives (Recall).
On the external set—the most challenging scenario—differences between models are more evident. DeepCocrystal consistently outperformed the benchmarks (Table 1), achieving 15%–35% higher Balanced Accuracy, 21%–53% higher Precision, and 12%–60% higher Specificity, albeit with a moderate reduction in Recall (up to 15% lower). HBP and MC exhibited the lowest Balanced Accuracy values, making them less informative than a random selection. Moreover, when analyzing different ranges of similarity (see Table S4), the advantages of DeepCocrystal become particularly evident for molecules with a similarity below 0.6, where Precision and Specificity are respectively 21%–35% and 27%–69% higher than the benchmarks. These results indicate that DeepCocrystal finds a better trade‐off between correctly identifying positive and negative pairs than the benchmarks, even when applied to more diverse molecular pairs. The benchmarks predict positives more frequently (higher Recall), at the cost of more false positives (lower Precision and Specificity). This might be due to the skewed distribution of their training data and the unrealistic proportion of positive pairs. On the contrary, DeepCocrystal shows superior performance in minimizing false positives, thanks to SMILES augmentation (as visible from the SMI‐CNN controls), making it a more effective tool for accelerating co‐crystallization screening.
Uncertainty Estimation
Uncertainty quantification[ 59 ] is crucial whenever model predictions are used to steer experimental design, since unforeseen prediction errors can lead to wasted time and resources. Although many approaches to estimate uncertainty exist,[ 60 , 61 , 62 ] this aspect has not been systematically incorporated into co‐crystallization prediction to date. To extend the applicability of DeepCocrystal to real‐world scenarios, we equipped it with an estimate of its (un)certainty. Here we use the term “uncertainty” in its broad sense, i.e., as a measure of confidence in the model's predictions.[ 63 , 64 ] We represented each molecular pair with 10 different (pairs of) SMILES strings, and used the corresponding DeepCocrystal predictions to estimate uncertainty. Specifically, whenever a given molecular pair received conflicting predictions across its SMILES repetitions, we considered the model's prediction as potentially unreliable. This was substantiated by the observation that the average prediction across SMILES repetitions differed between correctly and incorrectly classified pairs (Figure 2a).
Figure 2.

Dissecting DeepCocrystal predictions. a) Relationship between prediction and classification performance. The SMILES of external test set samples were augmented 10 times and the average prediction was computed per API‐coformer pair. This average prediction was used to classify the molecular pairs based on a cut‐off of 0.5 (negative if below, and positive otherwise). Molecular pairs were evaluated by comparing their true class with the predicted class, and counting the frequency of each assignment: TP = true positive; FP = false positive; FN = false negative; TN = true negative. Box plots depict the distribution of DeepCocrystal's predictions for each group (central line: median; box: inter‐quartile range; whiskers: minimum and maximum values). The median predictions of DeepCocrystal were significantly different between true and false classifications (i.e., TP vs. FP, and TN vs. FN; Kruskal–Wallis H‐test, p < 0.05). b) N‐gram relevance for DeepCocrystal predictions. Top and bottom n‐grams derived from layer‐wise relevance propagation (LRP). Functional groups commonly involved in supramolecular interactions, such as nitrogen‐ and oxygen‐containing motifs, appear among the most relevant, paired in complementary groups. c) Relevance of DeepCocrystal's prediction (as obtained by LRP) on selected crystal structures. Colors indicate the relevance for the final prediction (blue: positive contribution to co‐crystal formation; red: negative contribution to co‐crystal formation). (1) 2‐phenylpropionic acid and 1,2‐bis(4‐pyridyl)ethane (CSD: FOBDUZ); (2) prothionamide and benzenesulfonic acid (CSD: LISKEH); (3) ibuprofen and piperazine (CSD: FAGKAD); (4) methyl gallate and proline (CSD: ZAMZEW); (5) baicalein and pyrazinamide (CSD: SIFYUF); (6) vinpocetine and malic acid.[ 53 ]
We tested two ways of estimating DeepCocrystal's uncertainty from its predictions on the SMILES repetitions of each molecular pair: a) majority voting, whereby the number of repetitions agreeing on the predicted class (per molecular pair) is used as a measure of confidence (the higher, the better), and b) standard deviation, computed across the predictions on the augmented SMILES of each pair (the lower, the better). For each approach, several uncertainty thresholds (i.e., on the standard deviation or on the number of agreeing predictions) were evaluated for their effect on performance and on the number of molecular pairs retained for prediction (Table 2).
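Both estimates can be computed directly from the 10 per‐pair predictions, as in the following sketch (illustrative values; the acceptance threshold of 0.10 on the standard deviation is the one later used for the prospective study):

```python
# Sketch: uncertainty estimation from repeated predictions of one molecular pair.
import numpy as np

preds = np.array([0.82, 0.91, 0.77, 0.88, 0.85, 0.79, 0.93, 0.81, 0.86, 0.90])

# a) Majority voting: fraction of SMILES repetitions agreeing on the predicted class.
classes = preds >= 0.5
agreement = max(classes.mean(), 1 - classes.mean())  # 1.0 means all repetitions agree

# b) Standard deviation across repetitions: lower spread means higher certainty.
spread = preds.std()

label = "positive" if preds.mean() >= 0.5 else "negative"
accept = spread <= 0.10  # threshold chosen case-by-case (see Table 2)
print(f"{label}, agreement={agreement:.0%}, std={spread:.3f}, accepted={accept}")
```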
Table 2.
Uncertainty estimation with DeepCocrystal. External test set molecules were represented as 10 SMILES strings each before prediction (using DeepCocrystal). Two approaches were considered to estimate uncertainty: majority voting, which picks the most frequent class among the predictions of each molecular pair, and the standard deviation computed on the individual model predictions of each pair. Different uncertainty thresholds for each approach were analyzed for their effect on model performance, as well as on the number of molecular pairs predicted. The number and percentage of predicted pairs (i.e., predictions below the considered thresholds), Balanced Accuracy (BAcc), Recall, Precision, and Specificity are reported. DeepCocrystal on canonical SMILES (which is invariant to augmentation and cannot be used for uncertainty estimation) served as a performance baseline. The best performing models per metric are highlighted in boldface.
| SMILES input | Method | Thr. | No. Pairs (%) | BAcc | Recall | Precision | Specificity |
|---|---|---|---|---|---|---|---|
| Canonical | – | – | 364 (100%) | 78% | 75% | 68% | 81% |
| Augmented (10‐fold) | Major. | ≥ 50% | 364 (100%) | 76% | 75% | 64% | 77% |
| | Major. | ≥ 60% | 348 (96%) | 77% | 75% | 67% | 79% |
| | Major. | ≥ 70% | 313 (86%) | 79% | 77% | 72% | 82% |
| | Major. | ≥ 80% | 287 (79%) | 82% | 79% | 74% | 84% |
| | Major. | ≥ 90% | 254 (70%) | 84% | 82% | 77% | 86% |
| | Major. | = 100% | 218 (60%) | 87% | **86%** | 83% | 89% |
| | St. dev. | ≤ 0.50 | 364 (100%) | 76% | 75% | 64% | 77% |
| | St. dev. | ≤ 0.40 | 351 (96%) | 77% | 76% | 66% | 78% |
| | St. dev. | ≤ 0.30 | 275 (76%) | 82% | 80% | 73% | 83% |
| | St. dev. | ≤ 0.20 | 227 (62%) | 86% | 85% | 80% | 87% |
| | St. dev. | ≤ 0.10 | 191 (52%) | **88%** | **86%** | **86%** | 90% |
| | St. dev. | ≤ 0.05 | 161 (44%) | **88%** | 84% | **86%** | **91%** |
For both uncertainty estimation strategies, DeepCocrystal's performance consistently increases when using stricter thresholds (up to a 10% improvement across metrics), with a progressively smaller number of predicted pairs (Table 2). Both approaches have their merits and drawbacks. Standard deviation outperforms majority voting in classification performance (up to 2% improvement), at the expense of the number of predicted molecular pairs (57 fewer pairs). The uncertainty estimation approach to use, and its respective threshold, should be chosen on a case‐by‐case basis, depending on the ultimate goal. For the prospective application, we used the standard deviation with a threshold of 0.10, since it provides the highest Recall (recognition of true positives) together with a high Specificity (recognition of true negatives), albeit with a sizable fraction of molecular pairs left unpredicted.
Model Explainability
Although deep learning excels at capturing complex non‐linear relationships between input molecules and their co‐crystallization behavior, it lacks the interpretability of simpler machine learning models.[ 65 ] Understanding how DeepCocrystal generates its predictions is essential for ensuring that the results derive from structure–property relationships and not spurious correlations. Furthermore, adding elements of explainability[ 65 ] can further enhance the understanding of co‐crystallization processes and generate new knowledge. To achieve explainability for DeepCocrystal, we applied layer‐wise relevance propagation (LRP).[ 66 ] LRP explains deep learning predictions by propagating relevance backward through the network, identifying the most influential parts of the input (i.e., molecule SMILES) that contribute to the predicted outcome (i.e., co‐crystallization). Positive relevance values indicate a positive contribution to co‐crystallization, and vice versa for negative values. To analyze the obtained relevance values, the SMILES strings were divided into n‐grams—contiguous sequences of n SMILES tokens. In this work, n = 4 was chosen as it a) matches the information seen by the model (kernel size = 4), and b) allows capturing substructures. We identified the top five and bottom five n‐grams based on their relevance for each API‐coformer pair (Figure 2b).
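For illustration, a SMILES string can be tokenized and grouped into contiguous 4‐grams as sketched below (the regular‐expression tokenizer is a common, assumed choice rather than the exact tokenizer of DeepCocrystal); the LRP relevance scores are then aggregated over such n‐grams:

```python
# Sketch: splitting a SMILES string into tokens and contiguous 4-grams,
# matching the kernel size of the convolutional filters (n = 4).
import re

TOKEN_RE = re.compile(
    r"(\[[^\]]+\]|Br|Cl|@@|[BCNOSPFIbcnosp]|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%\d{2}|\d)"
)

def smiles_ngrams(smiles: str, n: int = 4) -> list[str]:
    tokens = TOKEN_RE.findall(smiles)
    return ["".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(smiles_ngrams("OC(=O)c1ccncc1"))  # e.g., 'OC(=', 'C(=O', '(=O)', ...
```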
Our analysis revealed complementary functional groups appearing at the top of the ranking. Specifically, oxygen‐containing groups are consistently linked with amine or aromatic nitrogen groups, suggesting that DeepCocrystal recognizes key supramolecular interactions driven by the complementarity between hydrogen bond donors and acceptors. In contrast, non‐interacting molecular substructures appear at the bottom, where oxygen‐containing groups are typically paired with carbon atoms, or two hydrogen bond acceptors are paired together (e.g., aromatic nitrogen with a thio group), indicating the absence of detectable supramolecular synthons.
To compare the provided explanations with an experimentally determined ground truth, we selected the top six predicted co‐crystals of different APIs from the external test set, for which crystal structures were publicly available[ 43 , 47 , 51 , 53 , 54 , 67 ] (Figure 2c). In addition to recognizing complementary functional groups in the explanations, DeepCocrystal correctly identifies key interactions present in the crystal structures. In cases where a single synthon drives co‐crystallization (e.g., Figure 2c, structures 1, 3), the model consistently assigns high relevance to the interacting hydrogen bond donor and acceptor groups. When multiple interactions contribute to crystal packing (e.g., Figure 2c, structures 2, 5, 6), at least one is correctly identified. These findings highlight that DeepCocrystal not only predicts co‐crystallization outcomes with high accuracy, but it also autonomously learns fundamental principles of supramolecular interactions—corroborating its potential as a valuable tool for guiding co‐crystal design.
Prospective Experimental Application
We applied DeepCocrystal prospectively, to previously unseen molecular pairs. Diflunisal (Figure 3a), an anti‐inflammatory drug,[ 37 ] was selected as the API due to its poor water solubility, which renders co‐crystallization a viable strategy to enhance bioavailability.[ 68 ] As coformer candidates, we selected 12 natural products containing polyphenolic or purine moieties (Table S11), due to their co‐administrability and health benefits such as central nervous system stimulation, reduced risk of neurodegenerative diseases, and anti‐inflammatory properties.[ 69 , 70 , 71 , 72 ]
Figure 3.

Prospective experimental validation. a) Coformer candidates for diflunisal (API), selected for the prospective experimental validation. DeepCocrystal was used to select two “positive” predictions (adenine and caffeine), two “negative” predictions (rosmarinic acid and riboflavin), and two high‐uncertainty predictions (theobromine and xanthine) for experimental testing. The experimental tests confirmed DeepCocrystal predictions (Table 3). b) Relevance of DeepCocrystal's prediction (as obtained by LRP) on (7) diflunisal and adenine, and (8) diflunisal and caffeine. Colors indicate the relevance for the final prediction (blue: positive contribution to co‐crystal formation; red: negative contribution to co‐crystal formation).
A 10‐fold SMILES augmentation was performed for each molecule, and the co‐crystallization potential of the resulting 12 API‐coformer pairs was predicted with DeepCocrystal (Table S11). For experimental validation, three categories of predictions were considered (Table 3): a) the top‐two high‐certainty positive predictions (adenine and caffeine), b) the top‐two high‐certainty negative predictions (rosmarinic acid and riboflavin), and c) the two most uncertain predictions (theobromine and xanthine). Each selected pair was tested in the lab via well‐established protocols, i.e., grinding, liquid‐assisted grinding, and slurry methods.[ 5 , 73 , 74 , 75 ] The co‐crystallization outcome was determined on the obtained powder samples via infrared spectroscopy and solid‐state nuclear magnetic resonance (see Supporting Information, Materials and Methods).
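The triage of candidates into high‐certainty positives, high‐certainty negatives, and uncertain cases follows directly from the per‐pair mean and standard deviation of the repeated predictions, as in this sketch (the prediction values are illustrative, not those reported in Table S11):

```python
# Sketch: ranking coformer candidates by mean prediction and flagging uncertain ones.
import numpy as np

STD_CUTOFF = 0.10  # uncertainty threshold; chosen case-by-case (see main text)

candidates = {  # 10 repeated predictions per API-coformer pair (illustrative values)
    "adenine":    np.array([0.92, 0.95, 0.90, 0.93, 0.94, 0.91, 0.96, 0.92, 0.93, 0.95]),
    "xanthine":   np.array([0.30, 0.75, 0.55, 0.20, 0.80, 0.45, 0.65, 0.35, 0.70, 0.50]),
    "riboflavin": np.array([0.10, 0.08, 0.12, 0.09, 0.11, 0.07, 0.13, 0.10, 0.09, 0.08]),
}

for name, preds in candidates.items():
    mean, std = preds.mean(), preds.std()
    if std > STD_CUTOFF:
        call = "uncertain (not predicted)"
    else:
        call = "positive" if mean >= 0.5 else "negative"
    print(f"{name:<11} mean={mean:.2f} std={std:.2f} -> {call}")
```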
Table 3.
Prospective validation of DeepCocrystal. DeepCocrystal was used to predict the co‐crystallization potential of 12 coformer candidates with the API diflunisal, and six candidates were selected for lab experiments. The experimental outcome after lab validation is reported: × = negative outcome; ? = uncertain outcome (not predicted, st. dev. > 0.30); ✓ = positive outcome. Classification performance and the percentage of pairs not predicted (Pairsout) are reported. Mean and standard deviation of DeepCocrystal's predictions can be found in Table S11. The performance of the benchmark models, evaluated retrospectively on the same set of experiments, is also reported.
| Coformer | Experimental | DeepCocrystal (prospective) | SMI‐CNN(can) | CCGNet | CC‐Descriptor‐ML | Fingerprint‐DNN | Descriptor‐DNN | MC | HBP |
|---|---|---|---|---|---|---|---|---|---|
| Adenine | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | × | × |
| Caffeine | ✓ | ✓ | ✓ | × | ✓ | ✓ | ✓ | ✓ | × |
| Theobromine | × | ? | ✓ | × | ✓ | ✓ | ✓ | × | × |
| Xanthine | × | ? | ✓ | ✓ | ✓ | × | ✓ | × | × |
| Rosmarinic acid | × | × | ✓ | ✓ | × | ✓ | × | ✓ | × |
| Riboflavin | × | × | × | × | × | × | × | ✓ | × |
| BAcc (%) | n.a. | 100% | 63% | 42% | 75% | 67% | 75% | 50% | 50% |
| Recall (%) | n.a. | 100% | 100% | 50% | 100% | 100% | 100% | 50% | 0% |
| Precision (%) | n.a. | 100% | 40% | 33% | 50% | 50% | 50% | 33% | 0% |
| Specificity (%) | n.a. | 100% | 25% | 33% | 50% | 33% | 50% | 50% | 100% |
| Pairsout (%) | n.a. | 33% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
All four high‐certainty predictions of DeepCocrystal (adenine and caffeine as “positive” predictions, and rosmarinic acid and riboflavin as “negative” predictions) were confirmed experimentally (Table 3). To the best of our knowledge, the use of adenine and caffeine as coformers for diflunisal has not been previously reported. Future dissolution studies and activity assays will be needed to investigate whether these co‐crystals improve the solubility and pharmacokinetic profile of diflunisal, as observed for other caffeine‐based systems.[ 76 , 77 , 78 ] Furthermore, neither of the selected high‐uncertainty pairs (theobromine and xanthine) formed a co‐crystal (Table 3), indicating the usefulness of our uncertainty estimation approach for ruling out false predictions. This experimental validation confirms the potential of DeepCocrystal to accelerate the discovery of novel co‐crystal pairs, even within the structurally similar set of coformer candidates chosen in this study.
LRP analysis on the two identified co‐crystals highlighted key API‐coformer substructures used for prediction (Figure 3b). In particular, the hydroxyl and carboxyl groups of diflunisal, and the nitrogen atoms of caffeine and adenine showed high relevance, confirming DeepCocrystal's ability to capture electrostatic complementarity for decision‐making. Interestingly, adenine shows the highest positive relevance on its pyridinic ring and amino group, suggesting that supramolecular interactions with diflunisal likely occur on the pyridinic side rather than the imidazolic ring, where atoms have a lower relevance, as seen in the salicylic acid–adenine system (see Supporting Information, Figure S35). For the negative predictions (Figure S35), no complementary hydrogen bond donor and acceptor groups in the API and coformer exhibited high relevance. Here, the fluorine atoms of diflunisal showed low relevance, revealing a starkly different behavior compared to their role in positive pairs. Taken together, these aspects confirm that DeepCocrystal captures supramolecular information by modeling the interaction patterns between API and coformer, rather than relying solely on isolated molecular features.
The obtained experimental results were also used to evaluate the performance of the benchmark models retrospectively (Table 3). Although this evaluation is somewhat limited, since the experiments were selected with DeepCocrystal, it offers additional insights into model comparison on external data. SMILES augmentation proved crucial to achieving these results: SMI‐CNN(can) predicted all purine‐derived coformers as “positive” for co‐crystallization, yielding 60% false positives overall. These findings indicated that a) SMILES augmentation allowed DeepCocrystal to better capture small structural changes that might be relevant for co‐crystallization, and b) the uncertainty estimation allowed a further reduction in the number of false positives (Table S11). All literature benchmarks struggled to differentiate molecules with similar chemical structures (e.g., all purines were classified as positive), leading to many false positives (Precision consistently below 50%).
Although the number of co‐crystallization pairs we tested is limited, these results underscore DeepCocrystal's capacity to correctly identify positive pairs while minimizing false positives—thereby reducing the experimental efforts needed for co‐crystal identification.
Conclusions
Optimizing the pharmacokinetic properties of active compounds is a long‐standing challenge in drug discovery, and co‐crystallization is an attractive strategy to address this issue. However, identifying suitable co‐crystallization partners for active compounds is both resource‐ and time‐intensive. To accelerate this process, we developed DeepCocrystal, a deep chemical language processing approach designed to predict the co‐crystallization of any selected molecular pair.
This study shows the potential of DeepCocrystal to advance the state of the art. DeepCocrystal owes its performance to the intriguing properties of the SMILES language, which allowed us to mitigate data imbalance and estimate prediction uncertainty. By learning (and then combining) single‐molecule information, DeepCocrystal learns elements of the “supramolecular language”[ 18 , 19 , 20 ] of co‐crystal formation. Compared to existing approaches, DeepCocrystal is particularly effective at minimizing the number of false positives, which is crucial for prospective screening campaigns. The experimental validation of DeepCocrystal further corroborated its potential and identified adenine and caffeine as two previously unreported coformers of diflunisal. Taken together, these results underscore the potential of DeepCocrystal to accelerate the discovery of co‐crystallization partners. Additionally, explainability analysis showed that DeepCocrystal identifies functional groups that facilitate or hinder co‐crystallization, demonstrating that it learns chemistry‐rooted elements of supramolecular synthon formation.
This first adoption of a “supramolecular language” perspective based on SMILES strings shows its potential for co‐crystallization prediction. Although this study focused only on “two‐word sentences” (i.e., molecule pairs), our approach could be extended to supramolecular interactions among multiple molecular partners. As the complexity of the supramolecular tasks increases, we anticipate model “shortcuts”[ 79 ] to become more prominent, making explainable AI approaches[ 65 , 79 ] relevant for exposing those issues. Moreover, extensive datasets with thorough annotations of stereochemistry might further expand the prediction ability of “supramolecular language” processing approaches based on SMILES strings. Ultimately, extensions of DeepCocrystal might open unexplored opportunities in supramolecular chemistry, e.g., for drug development,[ 80 ] materials discovery,[ 81 ] and beyond.
Author Contributions
Conceptualization: F.G., R.Ö., R.B.; Methodology: R.Ö., F.G., R.B.; Software: R.Ö., R.B.; Validation: R.B.; Formal analysis and Investigation: R.B., R.Ö., F.G.; Resources: F.G., M.R.C.; Data Curation: R.B.; Supervision: F.G.; Writing—Original Draft: R.B., R.Ö., F.G.; Writing—Review and Editing: R.B., R.Ö., F.G. with contributions from R.G. and M.R.C. All authors approved the content of this manuscript before submission.
Conflict of Interests
The authors declare no conflict of interest.
Supporting information
Supporting Information
Acknowledgements
F.G. and R.Ö. acknowledge the support from the Centre for Living Technologies (CLT) and from the Irène Curie Fellowship. This research was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO, grant no. EINF‐7239 to F.G.) and by the European Union (ERC, ReMINDER, 101077879 to F.G.). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. M.R.C., R.G., and R.B. acknowledge the support from Project CH4.0 under the MUR program “Dipartimenti di Eccellenza 2023–2027” (CUP: D13C22003520001) and Project “Predire” under the program NODES (CUP: D17G22000150001). The authors acknowledge Roberta Beccaria, Filippo Turchi, Natalia Velonia Bellonia, Elena Amadio, Chiara Sabena, Chiara Rosso, and Rossella Zeolla for their lab experimental contributions to the external dataset, as well as Derek van Tilborg and Luke Rossen for technical suggestions, and Andrea Gardin for additional technical investigations.
Birolo R., Özçelik R., Aramini A., Gobetto R., Chierotti M. R., Grisoni F., Angew. Chem. Int. Ed. 2025, 64, e202507835. 10.1002/anie.202507835
Data Availability Statement
The code to apply DeepCocrystal to any API–coformer pair, along with all publicly available data used for training and validation, is available at: https://github.com/molML/deep-cocrystal. Additionally, DeepCocrystal is available as an easy‐to‐use web application at: https://deepcocrystal.streamlit.app/.
References
- 1. Duggirala N. K., Perry M. L., Almarsson Ö., Zaworotko M. J., Chem. Commun. 2016, 52, 640–655.
- 2. Thayyil A. R., Juturu T., Nayak S., Kamath S., Adv. Pharm. Bull. 2020, 10, 203–212.
- 3. Desiraju G. R., Angew. Chem. Int. Ed. Engl. 1995, 34, 2311–2327.
- 4. Ngilirabanga J. B., Samsodien H., Nano Select 2021, 2, 512–526.
- 5. Cappuccino C., Cusack D., Flanagan J., Harrison C., Holohan C., Lestari M., Walsh G., Lusi M., Cryst. Growth Des. 2022, 22, 1390–1397.
- 6. Artrith N., Butler K. T., Coudert F. X., Han S., Isayev O., Jain A., Walsh A., Nat. Chem. 2021, 13, 505–508.
- 7. Sarkar N., Gonnella N. C., Krawiec M., Xin D., Aakeröy C. B., Cryst. Growth Des. 2020, 20, 7320–7327.
- 8. Molajafari F., Li T., Abbasichaleshtori M., ZD M. H., Cozzolino A. F., Fandrick D. R., Howe J. D., CrystEngComm 2024, 26, 1620–1636.
- 9. von Essen C., Luedeker D., Drug Discovery Today 2023, 28, 103763.
- 10. Yang D., Wang L., Yuan P., An Q., Su B., Yu M., Chen T., Hu K., Zhang L., Lu Y., Du G., Chin. Chem. Lett. 2023, 34, 107964.
- 11. Wang D., Yang Z., Zhu B., Mei X., Luo X., Cryst. Growth Des. 2020, 20, 6610–6621.
- 12. Kang Y., Chen J., Hu X., Jiang Y., Li Z., CrystEngComm 2023, 25, 6405–6415.
- 13. Jiang Y., Yang Z., Guo J., Li H., Liu Y., Guo Y., Li M., Pu X., Nat. Commun. 2021, 12, 5950.
- 14. Xiao F., Cheng Y., Wang J. R., Wang D., Zhang Y., Chen K., Mei X., Luo X., Pharmaceutics 2022, 14, 2198.
- 15. Birolo R., Alladio E., Bravetti F., Chierotti M. R., Gobetto R., in Novel Formulations and Future Trends, Elsevier, Amsterdam 2024, pp. 483–512.
- 16. Fernández A., García S., Herrera F., in Hybrid Artif. Intell. Sys.: 6th Int. Conf., Springer, Wroclaw, Poland 2011, pp. 1–10.
- 17. Japkowicz N., in Proc. of the Int'l Conf. on Artificial Intelligence 2000, 56, 111–117.
- 18. Cragg P. J., An Introduction to Supramolecular Chemistry, Springer, Cham 2010.
- 19. Lehn J. M., Angew. Chem. Int. Ed. Engl. 1988, 27, 89–112.
- 20. Brock C. P., Dunitz J. D., Chem. Mater. 1994, 6, 1118–1127.
- 21. Hirohara M., Saito Y., Koda Y., Sato K., Sakakibara Y., BMC Bioinform. 2018, 19, 83–94.
- 22. Kimber T. B., Engelke S., Tetko I. V., Bruno E., Godin G., arXiv preprint arXiv:1812.04439, 2018.
- 23. van Tilborg D., Alenicheva A., Grisoni F., J. Chem. Inf. Model. 2022, 62, 5938–5951.
- 24. Öztürk H., Özgür A., Schwaller P., Laino T., Ozkirimli E., Drug Discovery Today 2020, 25, 689–705.
- 25. Weininger D., J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
- 26. Krenn M., Ai Q., Barthel S., Carson N., Frei A., Frey N. C., Friederich P., Gaudin T., Gayle A. A., Jablonka K. M., Lameiro R. F., Lemm D., Lo A., Moosavi S. M., Nápoles‐Duarte J. M., Nigam A. K., Pollice R., Rajan K., Schatzschneider U., Schwaller P., Skreta M., Smit B., Strieth‐Kalthoff F., Sun C., Tom G., von Rudorff G. F., Wang A., White A. D., Young A., Yu R., et al., Patterns 2022, 3, 100588.
- 27. Li C., Feng J., Liu S., Yao J., Comput. Intell. Neurosci. 2022, 2022, 8464452.
- 28. Mswahili M. E., Lee M. J., Martin G. L., Kim J., Kim P., Choi G. J., Jeong Y. S., Appl. Sci. 2021, 11, 1323.
- 29. Devogelaer J. J., Meekes H., Tinnemans P., Vlieg E., de Gelder R., Angew. Chem. Int. Ed. 2020, 59, 21711–21718.
- 30. Liang X., Liu S., Li Z., Deng Y., Jiang Y., Yang H., Eur. J. Pharm. Biopharm. 2024, 196, 114201.
- 31. Fábián L., Cryst. Growth Des. 2009, 9, 1436–1443.
- 32. Galek P. T., Fábián L., Motherwell W. S., Allen F. H., Feeder N., Acta Crystallogr. Sect. B: Struct. Sci. 2007, 63, 768–782.
- 33. LeCun Y., Bottou L., Bengio Y., Haffner P., Proc. IEEE 1998, 86, 2278–2324.
- 34. Yin W., Kann K., Yu M., Schütze H., arXiv preprint arXiv:1702.01923, 2017.
- 35. Özçelik R., Grisoni F., Digital Discovery 2025, 4, 316–325.
- 36. Groom C. R., Bruno I. J., Lightfoot M. P., Ward S. C., Acta Crystallogr. Sect. B: Struct. Sci. Cryst. Eng. Mater. 2016, 72, 171–179.
- 37. Shen T., Pharmacother.: J. Hum. Pharmacol. Drug Ther. 1983, 3, 3S–8S.
- 38. Aakeröy C. B., Grommet A. B., Desper J., Pharmaceutics 2011, 3, 601–614.
- 39. Grecu T., Hunter C. A., Gardiner E. J., McCabe J. F., Cryst. Growth Des. 2014, 14, 165–171.
- 40. Grecu T., Adams H., Hunter C. A., McCabe J. F., Portell A., Prohens R., Cryst. Growth Des. 2014, 14, 1749–1755.
- 41. Roca‐Paixão L., Correia N. T., Affouard F., CrystEngComm 2019, 21, 6991–7001.
- 42. Buol X., Robeyns K., Caro Garrido C., Tumanov N., Collard L., Wouters J., Leyssens T., Pharmaceutics 2020, 12, 653.
- 43. Dash S. G., Thakur T. S., Cryst. Growth Des. 2020, 21, 449–461.
- 44. Ma X., Chen X., Zhen Y., Zheng X., Shi C., Jin S., Liu B., Chen B., Wang D., J. Mol. Struct. 2024, 1298, 136942.
- 45. Pontes A. G. O., Vidal L. M. T., de Oliveira Y. S., Bezerra B. P., Girão S. B. H., Ayala A. P., J. Mol. Struct. 2024, 1311, 138374.
- 46. Ouyang J., Liu L., Li Y., Chen M., Zhou L., Liu Z., Xu L., Shehzad H., Particuology 2024, 90, 20–30.
- 47. Ji W. J., Jiang J. Y., Hong M., Zhu B., Ren G. B., Qi M. H., Cryst. Growth Des. 2023, 23, 5770–5784.
- 48. Rossi F., Cerreia Vioglio P., Bordignon S., Giorgio V., Nervi C., Priola E., Gobetto R., Yazawa K., Chierotti M. R., Cryst. Growth Des. 2018, 18, 2225–2233.
- 49. Bordignon S., Cerreia Vioglio P., Amadio E., Rossi F., Priola E., Voinovich D., Gobetto R., Chierotti M. R., Pharmaceutics 2020, 12, 818.
- 50. Liu F., Wang L. Y., Yu M. C., Li Y. T., Wu Z. Y., Yan C. W., Eur. J. Pharm. Sci. 2020, 144, 105216.
- 51. Bordignon S., Cerreia Vioglio P., Bertoncini C., Priola E., Gobetto R., Chierotti M. R., Cryst. Growth Des. 2021, 21, 6776–6785.
- 52. D'Abbrunzo I., Bianco E., Gigli L., Demitri N., Birolo R., Chierotti M. R., Skoric I., Keiser J., Häberli C., Voinovich D., Hasa D., Perissutti B., Int. J. Pharm. 2023, 644, 123315.
- 53. D'Abbrunzo I., Birolo R., Chierotti M. R., Bučar D. K., Voinovich D., Perissutti B., Hasa D., Eur. J. Pharm. Biopharm. 2024, 201, 114344.
- 54. Birolo R., Bravetti F., Alladio E., Priola E., Bianchini G., Novelli R., Aramini A., Gobetto R., Chierotti M. R., Cryst. Growth Des. 2023, 23, 7898–7911.
- 55. Rogers D., Hahn M., J. Chem. Inf. Model. 2010, 50, 742–754.
- 56. Schneider N., Sayle R. A., Landrum G. A., J. Chem. Inf. Model. 2015, 55, 2111–2120.
- 57. Ballabio D., Grisoni F., Todeschini R., Chemom. Intell. Lab. Syst. 2018, 174, 33–44.
- 58. Chen J., Li Z., Kang Y., Li Z., Crystals 2024, 14, 313.
- 59. Hirschfeld L., Swanson K., Yang K., Barzilay R., Coley C. W., J. Chem. Inf. Model. 2020, 60, 3770–3780.
- 60. Tyralis H., Papacharalampous G., Artif. Intell. Rev. 2024, 57, 94.
- 61. Vriza A., Sovago I., Widdowson D., Kurlin V., Wood P. A., Dyer M. S., Digital Discovery 2022, 1, 834–850.
- 62. Ghanavati M. A., Rohani S., Cryst. Growth Des. 2025.
- 63. Schwaller P., Vaucher A. C., Laino T., Reymond J. L., ChemRxiv 2020.
- 64. Kimber T. B., Gagnebin M., Volkamer A., Artif. Intell. Life Sci. 2021, 1, 100014.
- 65. Jiménez‐Luna J., Grisoni F., Schneider G., Nat. Mach. Intell. 2020, 2, 573–584.
- 66. Bach S., Binder A., Montavon G., Klauschen F., Müller K. R., Samek W., PLoS One 2015, 10, e0130140.
- 67. Hao S. Y., Li J. Y., Mu C. Q., Liu D. S., Yang Y., Li Y. T., Liu T. M., Wang X. K., Liu F., Cryst. Growth Des. 2023, 23, 885–891.
- 68. Snetkov P., Morozkina S., Olekhnovich R., Uspenskaya M., Materials 2021, 14, 6687.
- 69. Martínez‐Pinilla E., Oñatibia‐Astibia A., Franco R., Front. Pharmacol. 2015, 6, 126866.
- 70. Yahfoufi N., Alsadi N., Jambi M., Matar C., Nutrients 2018, 10, 1618.
- 71. Luo C., Zou L., Sun H., Peng J., Gao C., Bao L., Ji R., Jin Y., Sun S., Front. Pharmacol. 2020, 11, 153.
- 72. Rodak K., Kokot I., Kratz E. M., Nutrients 2021, 13, 3088.
- 73. Weyna D. R., Shattock T., Vishweshwar P., Zaworotko M. J., Cryst. Growth Des. 2009, 9, 1106–1123.
- 74. Charpentier M. D., Devogelaer J. J., Tijink A., Meekes H., Tinnemans P., Vlieg E., de Gelder R., Johnston K., Ter Horst J. H., Cryst. Growth Des. 2022, 22, 5511–5525.
- 75. Guo M., Sun X., Chen J., Cai T., Acta Pharm. Sin. B 2021, 11, 2537–2564.
- 76. Bordignon S., Cerreia Vioglio P., Priola E., Voinovich D., Gobetto R., Nishiyama Y., Chierotti M. R., Cryst. Growth Des. 2017, 17, 5744–5752.
- 77. Kumar G. S., Seethalakshmi P., Bhuvanesh N., Kumaresan S., J. Mol. Struct. 2013, 1050, 88–96.
- 78. Goud N. R., Gangavaram S., Suresh K., Pal S., Manjunatha S. G., Nambiar S., Nangia A., J. Pharm. Sci. 2012, 101, 664–680.
- 79. Lapuschkin S., Wäldchen S., Binder A., Montavon G., Samek W., Müller K. R., Nat. Commun. 2019, 10, 1096.
- 80. Kawakami K., Ebara M., Izawa H., Sanchez‐Ballester N. M., Hill J. P., Ariga K., Curr. Med. Chem. 2012, 19, 2388–2398.
- 81. Stupp S. I., Palmer L. C., Chem. Mater. 2014, 26, 507–518.
- 82. Bruno I. J., Cole J. C., Edgington P. R., Kessler M., Macrae C. F., McCabe P., Pearson J., Taylor R., Acta Crystallogr. Sect. B: Struct. Sci. 2002, 58, 389–397.
- 83. Mohammad M. A., Alhalaweh A., Velaga S. P., Int. J. Pharm. 2011, 407, 63–71.
- 84. Moriwaki H., Tian Y. S., Kawashita N., Takagi T., J. Cheminform. 2018, 10, 1–14.
- 85. Landrum G., Release 2013, 1, 4.
- 86. Macrae C. F., Edgington P. R., McCabe P., Pidcock E., Shields G. P., Taylor R., Towler M., Streek J., J. Appl. Crystallogr. 2006, 39, 453–457.
- 87. CCDC Open Source, “CSD Python API Scripts – Multi Component Hydrogen Bond Propensity,” https://github.com/ccdc-opensource/csd-python-api-scripts/tree/main/scripts/, 2017 (accessed: March 2025).
- 88. Friedman M., J. Am. Stat. Assoc. 1937, 32, 675–701.
- 89. Wilcoxon F., in Breakthroughs in Statistics: Methodology and Distribution, Springer, Cham 1992, pp. 196–202.
- 90. Holm S., Scand. J. Stat. 1979, 6, 65–70.
- 91. Alber M., Lapuschkin S., Seegerer P., Hägele M., Schütt K. T., Montavon G., Samek W., Müller K. R., Dähne S., Kindermans P. J., J. Mach. Learn. Res. 2019, 20, 1–8.
- 92. Riniker S., Landrum G. A., J. Cheminform. 2013, 5, 1–7.
- 93. Birolo R., Bravetti F., Bordignon S., D'Abbrunzo I., Mazzeo P. P., Perissutti B., Bacchi A., Chierotti M. R., Gobetto R., Pharmaceutics 2022, 14, 1754.