Abstract
The development of new drugs addressing serious mental health and other disorders would ideally avoid inducing a psychedelic experience. Analogs of psychedelic drugs can have clinical utility and are termed “psychoplastogens”. These represent promising candidates for treating opioid use disorder and reducing drug dependence, with rarely reported serious adverse effects. This drug-abuse cessation is linked to the induction of neuritogenesis and increased neuroplasticity, a hallmark of psychedelic molecules such as lysergic acid diethylamide. Some, but not all, psychoplastogens may act through the G-protein coupled receptor (GPCR) 5-HT2A, whereas others may display very different polypharmacology, making prediction of hallucinogenic potential challenging. In the process of developing tools to help design new psychoplastogens, we have used artificial intelligence in the form of machine learning classification models for predicting psychedelic effects, using a published in vitro dataset from PsychLight (Support Vector Classification (SVC), Area Under the Curve (AUC) 0.74) and in vivo human data derived from the books of Shulgin and Shulgin (SVC, AUC 0.72), with nested 5-fold cross validation. We have also explored conformal predictors with ECFP6 and electrostatic descriptors in efforts to optimize these models. The models were used to predict known 5-HT2A agonists to assess their potential to act as psychedelics and induce hallucinations, for PsychLight (SVC, AUC 0.97) and Shulgin and Shulgin (random forest, AUC 0.71). We have also tested these models with mouse head-twitch data. This predictive capability is desirable for reliably designing new psychoplastogens that lack in vivo hallucinogenic potential and for assessing existing and future molecules for this potential. These efforts also provide useful insights for understanding the psychedelic structure-activity relationship.
Keywords: Conformal predictors, Hallucinogenic, Machine Learning, Psychedelic, Support Vector Classification
Graphical Abstract

INTRODUCTION
The field of psychiatric medicine has seen a renaissance in recent years with new evidence of psychedelics effectively treating depression, previously treatment-resistant depression (e.g. ketamine) and substance abuse disorders1, 2. Psychedelics are a group of drugs which alter perception, sense of self, and cognition, and are associated with mystical and often reportedly life-changing experiences. Common psychedelics include lysergic acid diethylamide (LSD), the prodrug psilocybin and its active metabolite psilocin, as well as N,N-dimethyltryptamine (DMT). Many psychedelics are thought to act as agonists of the serotonin 5-HT2A receptor, as well as other 5-HT receptors, along with dopaminergic and glutamatergic systems, where they can induce long-term changes in neuronal structure2, 3. Psychedelics can also display polypharmacology; ibogaine, for example, is known to interact with numerous targets (opioid receptors, the dopamine transporter, the serotonin transporter, glutamate receptors, nicotinic receptors and neurotrophic factors)4. Activation of the 5-HT2A receptor is known to facilitate the downstream release of brain-derived neurotrophic factor (BDNF) as well as cause profound changes in the default-mode network (DMN), and it is through these two downstream effects that psychedelics are proposed to offer their therapeutic value5. Changes in resting-state functional connectivity (RSFC) in the DMN have been found in patients with depression taking psilocybin, and these changes have been shown to be predictive of treatment response6. Such effects are also potentially relevant for substance dependence, particularly opioid use disorder (OUD), where abnormal patterns of RSFC in the DMN have been reported across various classes of illicit drugs and are associated with craving and relapse via impaired self-awareness, negative emotions and rumination7. DMN interactions with the salience and executive control networks have also been shown to be disrupted in drug-dependent individuals7. 
Neurons in the prefrontal cortex are suggested to play a key role in OUD, where alteration of opioid signaling can change binging and addiction behaviors5, 8. Recent research has confirmed that serotonergic psychedelics can promote both structural and functional plasticity in the prefrontal cortex, robustly increasing neuritogenesis and spinogenesis, and leading to increased synapse numbers and function. These effects are reported to come about via the Tropomyosin receptor kinase B (TrkB), a mechanistic target of rapamycin (mTOR) and the aforementioned 5-HT2A signaling pathways5, 9.
Recently, a new class of molecules derived from psychedelics has been introduced, known as psychoplastogens10, 11. These are described as various serotonergic psychedelics and entactogens that rapidly display plasticity-promoting properties via activation of mTOR. The ability to induce neuroplasticity is thought to be the end result of increased BDNF release, which stimulates spinogenesis and neuritogenesis12. Disruption of BDNF release or failure to promote plasticity is associated with a failure to reduce drug-seeking behavior in mice.
Critically, the psychedelic experience and induced neuronal plasticity are not entirely linked, and some evidence suggests that the psychedelic experience is separable from the therapeutic role of psychedelics13. Not all neuroplasticity effects are mediated through the 5-HT2A receptor5, 14. While the 5-HT2A receptor seems to play roles in both the psychedelic experience and neuroplasticity, some psychoplastogens do not induce a psychedelic experience11 and could therefore have value for treating OUD1, 2. As an example, tabernanthalog, an analog of the natural product psychedelic ibogaine, was recently synthesized and demonstrated decreased drug-seeking responses and induced neuritogenesis and spinogenesis in rats11. However, it did not induce a head-twitch response, a common hallmark of a psychedelic experience in rodents11. While traditional psychedelics broadly target many serotonin as well as other receptors, tabernanthalog is more selective for 5-HT2A than ibogaine, having antagonistic effects at 5-HT1F, no activity at 5-HT2B, and a better 5-HT2A/5-HT2B selectivity ratio15.
We now describe a starting point for the development of psychoplastogens that avoid psychedelic effects, using readily available public datasets of molecules with clearly described in vitro and in vivo effects and learning from these data using artificial intelligence (AI). We curated a dataset from a recent publication in which a biosensor assay for 5-HT2A called PsychLight16 was described and used in vivo as well as in vitro to generate data in a functional assay. This represents perhaps the largest such in vitro dataset from a single laboratory. In addition, we further curated an in vivo dataset from two well-known books published by Alexander and Ann Shulgin17, 18 describing several classes of psychedelics (phenethylamines and tryptamines) which were synthesized by Alexander and subsequently tested in vivo by the authors and their colleagues over a period of decades. We also introduce state-of-the-art machine learning modeling techniques such as conformal predictors19, develop a novel molecular fingerprint based on electrostatic similarity of fragments of the core structure to investigate whether we can improve the predictive capabilities of these relatively small “psychedelic models”, and fine-tune a pre-trained large language model, which led to similar results. To evaluate the various models’ predictive potential, we curated a set of molecules from the literature which are known to cause a head-twitch response, a common marker of psychedelic activity in mice. We also utilized a set of 5-HT2A agonists described in a recent review to test these models further15. To our knowledge, these represent the first curated and validated machine learning models enabling prediction of in vivo psychedelic effects and hallucinogenic potential from molecular structure alone.
RESULTS AND DISCUSSION
We curated the PsychLight in vitro dataset16 into two separate datasets. The first set was composed only of known hallucinogens and non-hallucinogens reported in Dong et al.16, representing a more conservative, higher quality dataset. We refer to this as PsychLight A. Next, we added compounds with unknown hallucinogenic potential, but which were classified with the PsychLight reporter assay into either hallucinogenic or non-hallucinogenic classes, giving us a larger dataset with higher coverage of chemical space, but which may be error prone. We called this dataset PsychLight B (see Methods for full dataset curation procedure).
We developed classification machine learning models using 8 algorithms in our Assay Central software20 for the PsychLight dataset (N = 54 for PsychLight A, 84 for PsychLight B) to differentiate hallucinogenic from non-hallucinogenic compounds. For features, we compared a number of different descriptor sets, including MACCS Keys, Morgan Fingerprints, and calculated chemical descriptors (see Methods), to determine which features would produce the most predictive models. We used 166 MACCS Keys as calculated through RDKit21. We also used the ChemAxon22 software to generate the chemical feature descriptors. Finally, we used Morgan Fingerprints from RDKit21 with radius 3 and nBits set to 1024 for the model inputs (Figure S1). Morgan Fingerprints produced overall better models (Tables 1A and 1B) than either MACCS Keys (Table S1) or chemical feature descriptors (Table S2). Models trained on PsychLight A overall had better predictive power, based on 5-fold nested cross validation, than models trained on PsychLight B, with the best models having MCC of 0.7 and 0.44, respectively (Tables 1A and 1B). This suggests that inclusion of the predicted data may degrade the model if the predictions are incorrect, and thus we proceeded with the higher quality but smaller PsychLight dataset A for the rest of the analysis, which we subsequently refer to as just the PsychLight dataset. The XGBoost (xgb) model had the highest 5-fold cross validation MCC (0.7), precision (0.75), and recall (0.9) of all the models trained on the PsychLight A data (Table 1A).
Table 1.
PsychLight16 model 5-fold nested cross validation statistics for PsychLight Dataset A (1A) and PsychLight Dataset B (1B). tn = true negative, fp = false positive, fn = false negative, tp = true positive. Confusion matrix numbers represent sums over the cross-validation folds, e.g. tn = sum of all true negatives over all folds. ada = AdaBoosted decision trees, bnb = Naïve Bayesian, knn = k-nearest neighbors, lreg = logistic regression, rf = random forest, DL = deep learning, svc = support vector classification, xgb = xgboost.
| 1A | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
| ada | 0.75 | 0.65 | 0.82 | 0.45 | 0.47 | 0.7 | 0.63 | 0.82 | 27 | 6 | 7 | 11 |
| bnb | 0.75 | 0.65 | 0.76 | 0.45 | 0.46 | 0.7 | 0.63 | 0.81 | 27 | 6 | 7 | 11 |
| knn | 0.76 | 0.73 | 0.84 | 0.53 | 0.57 | 0.62 | 0.9 | 0.7 | 23 | 10 | 2 | 16 |
| lreg | 0.79 | 0.71 | 0.85 | 0.55 | 0.55 | 0.7 | 0.73 | 0.82 | 27 | 6 | 5 | 13 |
| rf | 0.77 | 0.68 | 0.79 | 0.5 | 0.52 | 0.66 | 0.75 | 0.79 | 26 | 7 | 5 | 13 |
| DL | 0.61 | 0.32 | 0.63 | 0.07 | 0.07 | 0.31 | 0.33 | 0.75 | 25 | 8 | 12 | 6 |
| svc | 0.82 | 0.79 | 0.84 | 0.65 | 0.67 | 0.72 | 0.9 | 0.79 | 26 | 7 | 2 | 16 |
| xgb | 0.84 | 0.81 | 0.81 | 0.68 | 0.7 | 0.75 | 0.9 | 0.82 | 27 | 6 | 2 | 16 |
| 1B | ||||||||||||
| Model | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
| ada | 0.70 | 0.40 | 0.62 | 0.22 | 0.21 | 0.43 | 0.38 | 0.84 | 46 | 9 | 15 | 9 |
| bnb | 0.70 | 0.38 | 0.65 | 0.20 | 0.22 | 0.51 | 0.36 | 0.84 | 46 | 9 | 15 | 9 |
| knn | 0.70 | 0.54 | 0.73 | 0.32 | 0.33 | 0.51 | 0.61 | 0.73 | 40 | 15 | 9 | 15 |
| lreg | 0.73 | 0.55 | 0.71 | 0.37 | 0.39 | 0.60 | 0.57 | 0.80 | 44 | 11 | 10 | 14 |
| rf | 0.75 | 0.59 | 0.70 | 0.41 | 0.43 | 0.62 | 0.61 | 0.80 | 44 | 11 | 9 | 15 |
| DL | 0.65 | 0.17 | 0.70 | 0.02 | 0.03 | 0.23 | 0.14 | 0.87 | 48 | 7 | 21 | 3 |
| svc | 0.76 | 0.60 | 0.74 | 0.43 | 0.44 | 0.63 | 0.61 | 0.82 | 45 | 10 | 9 | 15 |
| xgb | 0.68 | 0.52 | 0.64 | 0.29 | 0.30 | 0.49 | 0.57 | 0.73 | 40 | 15 | 10 | 14 |
SHAP Explainability for Psychedelic Activity
Although chemical descriptors from ChemAxon did not produce the most predictive models, they produced more explainable models, and thus we decided to investigate the feature importance of each chemical descriptor for the model predictions. We used SHAP23 analysis to produce a beeswarm plot, which shows the effect of each feature on all predictions over both the PsychLight dataset and the Shulgin dataset using their respective models (Figure S2). We used the random forest model for SHAP analysis as it performed well for both the PsychLight and Shulgin datasets (Table S2). The most important features included pKa and RotBond (rotatable bonds) for both models; AroRing (aromatic rings) and LogD for the PsychLight A model; and TPSA (topological polar surface area) and MW for the Shulgin model. The SHAP values suggest that higher pKa, more aromatic rings, fewer rotatable bonds, higher TPSA and lower LogD are important for discerning molecules with psychedelic activity from those predicted not to induce psychedelic activity. Several of these descriptors may also be relevant for predicting blood-brain barrier penetration, which is required for CNS activity, so it would ideally be important to weigh these requirements against those for avoiding psychedelic activity in parallel.
The comparatively larger combined PIHKAL and TIHKAL dataset (N = 221), representing human in vivo data, indicated that the Naïve Bayes (bnb) model had the highest 5-fold nested cross validation MCC (0.33, Table 2), while Support Vector Classification (svc) had the highest AUC (0.72).
Table 2.
Shulgin and Shulgin PIHKAL17 and TIHKAL18 model 5-fold nested cross validation statistics. tn = true negative, fp = false positive, fn = false negative, tp = true positive. Confusion matrix numbers represent sums over the cross-validation folds, e.g. tn = sum of all true negatives over all folds. ada = AdaBoosted decision trees, bnb = Naïve Bayesian, knn = k-nearest neighbors, lreg = logistic regression, rf = random forest, DL = deep learning, svc = support vector classification, xgb = xgboost.
| Model | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ada | 0.61 | 0.60 | 0.67 | 0.21 | 0.22 | 0.57 | 0.62 | 0.59 | 69 | 48 | 38 | 63 |
| bnb | 0.67 | 0.63 | 0.70 | 0.33 | 0.33 | 0.65 | 0.62 | 0.70 | 82 | 35 | 38 | 63 |
| knn | 0.63 | 0.64 | 0.63 | 0.28 | 0.28 | 0.59 | 0.71 | 0.57 | 66 | 51 | 29 | 72 |
| lreg | 0.64 | 0.64 | 0.68 | 0.29 | 0.30 | 0.61 | 0.68 | 0.61 | 71 | 46 | 32 | 69 |
| rf | 0.62 | 0.62 | 0.69 | 0.25 | 0.25 | 0.59 | 0.67 | 0.58 | 68 | 49 | 34 | 67 |
| DL | 0.65 | 0.60 | 0.68 | 0.30 | 0.30 | 0.65 | 0.56 | 0.73 | 85 | 32 | 44 | 57 |
| svc | 0.66 | 0.63 | 0.72 | 0.31 | 0.32 | 0.64 | 0.63 | 0.68 | 79 | 38 | 37 | 64 |
| xgb | 0.65 | 0.63 | 0.68 | 0.30 | 0.31 | 0.63 | 0.65 | 0.66 | 77 | 40 | 36 | 65 |
These models were then used to predict 5-HT2A agonists curated from a recent review by Duan et al 202415 which were not in the respective models (Table 3). The PsychLight A model was used to predict 22 molecules, with the best AUC achieved by the svc model (0.97, Table 4, Table S3). The Shulgin and Shulgin svc model was used to predict 27 molecules with similar accuracy (0.81, Table 5, Table S4). A t-SNE plot of the training and test sets (using the same molecular descriptors as used for machine learning) suggests the Shulgin and Shulgin dataset is made up of several clusters, whereas the PsychLight dataset falls mainly into two closely-situated clusters (Figure 1). Interestingly, the test set appears to sample the same areas as the two training sets and does not contain any molecules far away from them.
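The chemical-space comparison in Figure 1 can be reproduced in outline with scikit-learn's t-SNE. The fingerprint matrices here are random stand-ins sized to the datasets described in the text (54 PsychLight, 221 Shulgin, 36 test molecules); the real plot used the same molecular descriptors as the machine learning models:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Random stand-ins for the fingerprint matrices of each dataset.
groups = {"PsychLight": 54, "Shulgin": 221, "test set": 36}
fps = np.vstack([rng.integers(0, 2, (n, 1024)) for n in groups.values()]).astype(float)
labels = [name for name, n in groups.items() for _ in range(n)]  # for coloring a scatter plot

# Project the shared fingerprint space into two dimensions for plotting.
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(fps)
print(coords.shape)  # one 2-D point per molecule
```

Coloring `coords` by `labels` (e.g. with matplotlib) then shows whether the test set falls within or outside the regions occupied by the two training sets.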
Table 3.
Test set – Numbering system from Duan et al 202415, class predictions from the best PsychLight and Shulgin Support Vector Classification models. Predictions left blank represent molecules that were in the training set.
| Molecule name | Hallucinogenic activity | Prediction PsychLight Model | Prediction Shulgin and Shulgin Model |
|---|---|---|---|
| Psilocybin (2) | Y | 1 | |
| LSD (3) | Y | ||
| Mescaline (4) | Y | 0 | |
| DMT (5) | Y | ||
| 5-MeO-DMT (6) | Y | ||
| Bufotenine (7) | Y | ||
| Psilocin (8) | Y | ||
| isoDMT (22) | N | 0 | |
| 6-OMe-isoDMT (23) | Y | 0 | |
| 5-OMe-isoDMT (24) | N | 0 | 0 |
| 5-Br-DMT (32) | N | 0 | |
| DET (41) | Y | 1 | |
| 6F-DET (42) | N | 0 | |
| 5-MeO-T (51) | Y | 1 | |
| 5-MeO-DET (52) | Y | 1 | 1 |
| 3-OMe (59) | Y | 0 | 0 |
| 2-OMe (65) | Y | 0 | 0 |
| 2Br-LSD (68) | N | 0 | 0 |
| AL-LAD (69) | Y | 0 | 1 |
| LSZ (70) | Y | 0 | 0 |
| Lisuride (71) | N | 0 | |
| Ariadne (95) | N | 0 | 0 |
| (100) | N | 0 | 0 |
| (101) | N | 0 | 0 |
| (102) | N | 0 | 0 |
| 25CN-NBOH (104) | Y | 0 | |
| DMCPA (123) | N | 0 | 0 |
| (128) | N | 0 | 0 |
| (129) | N | 0 | 0 |
| 25N-N1-Nap (156) | N | 0 | 0 |
| 25N-NBP (157) | N | 0 | 0 |
| Ibogaine (162) | Y | ||
| Tabernanthalog (164) | N | 0 | |
| INCH-7086 (168) | N | 0 | 0 |
| (R)-69 (170) | N | 0 | 0 |
| (R)-70 (171) | N | 0 | 0 |
Table 4.
PsychLight16 model external validation statistics for predicting 5-HT2A agonists from Duan et al 202415. ada = AdaBoosted decision trees, bnb = Naïve Bayesian, knn = k-nearest neighbors, lreg = logistic regression, rf = random forest, DL = deep learning, svc = support vector classification, xgb = xgboost.
| Model | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ada | 0.59 | 0.31 | 0.68 | 0.04 | 0.04 | 0.4 | 0.25 | 0.79 | 11 | 3 | 6 | 2 |
| bnb | 0.73 | 0.67 | 0.79 | 0.44 | 0.45 | 0.6 | 0.75 | 0.71 | 10 | 4 | 2 | 6 |
| knn | 0.5 | 0.48 | 0.62 | 0.05 | 0.05 | 0.38 | 0.63 | 0.43 | 6 | 8 | 3 | 5 |
| lreg | 0.73 | 0.7 | 0.75 | 0.47 | 0.5 | 0.58 | 0.88 | 0.64 | 9 | 5 | 1 | 7 |
| rf | 0.73 | 0.4 | 0.82 | 0.3 | 0.42 | 1 | 0.25 | 1 | 14 | 0 | 6 | 2 |
| DL | 0.82 | 0.67 | 0.89 | 0.56 | 0.62 | 1 | 0.5 | 1 | 14 | 0 | 4 | 4 |
| svc | 0.77 | 0.55 | 0.97 | 0.43 | 0.53 | 1 | 0.38 | 1 | 14 | 0 | 5 | 3 |
| xgb | 0.77 | 0.74 | 0.76 | 0.55 | 0.57 | 0.64 | 0.88 | 0.71 | 10 | 4 | 1 | 7 |
Table 5.
Shulgin and Shulgin PIHKAL17 and TIHKAL18 model external validation statistics for predicting 5-HT2A agonists from Duan et al 202415. ada = AdaBoosted decision trees, bnb = Naïve Bayesian, knn = k-nearest neighbors, lreg = logistic regression, rf = random forest, DL = deep learning, svc = support vector classification, xgb = xgboost.
| Model | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ada | 0.7 | 0.43 | 0.49 | 0.23 | 0.24 | 0.5 | 0.38 | 0.84 | 16 | 3 | 5 | 3 |
| bnb | 0.67 | 0.57 | 0.69 | 0.32 | 0.35 | 0.46 | 0.75 | 0.63 | 12 | 7 | 2 | 6 |
| knn | 0.63 | 0.5 | 0.67 | 0.22 | 0.24 | 0.42 | 0.63 | 0.63 | 12 | 7 | 3 | 5 |
| lreg | 0.52 | 0.24 | 0.57 | −0.11 | −0.11 | 0.22 | 0.25 | 0.63 | 12 | 7 | 6 | 2 |
| rf | 0.67 | 0.47 | 0.71 | 0.23 | 0.23 | 0.44 | 0.5 | 0.74 | 14 | 5 | 4 | 4 |
| DL | 0.67 | 0.4 | 0.68 | 0.17 | 0.17 | 0.43 | 0.38 | 0.79 | 15 | 4 | 5 | 3 |
| svc | 0.81 | 0.55 | 0.67 | 0.46 | 0.54 | 1 | 0.38 | 1 | 19 | 0 | 5 | 3 |
| xgb | 0.63 | 0.29 | 0.66 | 0.04 | 0.04 | 0.33 | 0.25 | 0.79 | 15 | 4 | 6 | 2 |
Figure 1.

A) t-SNE plot comparing the chemical space occupied by compounds from the PsychLight (green), Shulgin and Shulgin (purple) training sets and external test sets (blue, salmon). B) t-SNE of only molecules that are hallucinogenic. Serotonin agonist review = 5-HT2A agonists from Duan et al 202415.
Conformal Predictors improve model performance with uncertainty-calibrated predictions
While these results suggest the machine learning models perform well in terms of cross validation and external validation, the small test dataset size and low molecular diversity suggest that the applicability domain is restricted to a narrow chemical property space. It is often unknown whether one can trust such model predictions, and understanding model uncertainty is just as important as the prediction itself. To improve the value of our psychedelic predictive models, we implemented a conformal prediction framework over the best models for the Shulgin and PsychLight A datasets.
Conformal Prediction19 is a method for converting a scoring function representing a notion of uncertainty into a rigorous uncertainty measure. Supposing a training dataset $D_{\mathrm{train}}$, a test set $D_{\mathrm{test}}$, and a calibration dataset $D_{\mathrm{cal}}$ are independent and identically distributed, and we have a classifier $\hat{f}$ which outputs estimated probabilities (such as a prediction score) $\hat{f}(x)_y$ of $x$ belonging to a class $y$, a quantile $\hat{q}$ can be constructed from the calibration dataset such that the resulting prediction sets $C(x)$ satisfy

$$P\left(y_{\mathrm{test}} \in C(x_{\mathrm{test}})\right) \geq 1 - \alpha$$

for any given selected threshold $\alpha$. In brief, this method allows us to select an accuracy threshold (defined as $1 - \alpha$) and to classify any prediction scores that do not meet the accuracy threshold as out of domain. We implemented conformal predictors for the SVC models for both the Shulgin and PsychLight A datasets as these performed best on the external dataset of 5-HT2A agonists from Duan et al 202415.
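A minimal split-conformal sketch of this procedure, using scikit-learn and a synthetic binary classification task in place of the fingerprint data; the nonconformity score (one minus the probability of the true class) and the 0.2 miscoverage level are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the fingerprint feature matrices.
X, y = make_classification(n_samples=300, n_features=64, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(probability=True, random_state=0).fit(X_fit, y_fit)

alpha = 0.2  # accuracy threshold = 1 - alpha = 0.8
# Nonconformity score on the calibration set: 1 - probability of the true class.
cal_scores = 1.0 - clf.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
n = len(cal_scores)
k = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
q_hat = np.sort(cal_scores)[k]  # calibrated quantile

def conformal_classes(x):
    """Classes whose score passes the calibrated quantile; an empty or
    two-class set flags the prediction as uncertain (out of domain)."""
    probs = clf.predict_proba(x.reshape(1, -1))[0]
    return [c for c in clf.classes_ if 1.0 - probs[c] <= q_hat]

print(conformal_classes(X[0]))
```

Predictions whose set is not a single confident class are rejected as out of domain, which is what drives the precision/coverage trade-off reported below.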
We found that conformal predictors improved the predictive accuracy of our models at the expense of removing predictions with high uncertainty from the test set (Figure 2). As the accuracy threshold (1 − α) increases, the precision, specificity, and AUC improve (Figure 2A), while the fraction of the test set rejected as out of domain increases (Figure 2B). At an accuracy threshold of 0.9, the Shulgin model’s precision increased from 0.64 to 0.86, and its specificity increased from 0.68 to 0.9, at the cost of classifying 68% of the test set as out of domain. The PsychLight model, however, did not show an improvement over the base model using this approach, and the models instead became worse under conformal predictor training (Figure 3). There was still a positive relationship between increasing threshold accuracy and statistical improvement, although with high variance. This is likely because the PsychLight dataset is much smaller and covers less chemical space than the Shulgin dataset (Figure 1), with critical training datapoints being diverted to calibration, which is itself limited in effectiveness due to its size. For the Shulgin model, we chose an accuracy threshold of 0.8 to balance improved prediction results against keeping more molecules in-domain.
Figure 2.

Conformal Predictor Statistics for the Shulgin SVC model. A) The change in statistics calculated for only predictions included in domain based on the accuracy threshold. B) The fraction of the test set that remains in-domain decreases as the accuracy threshold is increased.
Figure 3.

Conformal Predictor Statistics for the PsychLight SVC model. A) The change in statistics calculated for only predictions included in domain based on the accuracy threshold. B) The fraction of the test set that remains in-domain decreases as the accuracy threshold is increased.
To further test the performance of our in vivo psychedelic prediction models, we also curated a set of 75 molecules from the published literature which are known to induce a head-twitch response in mice, a common behavioral marker of psychedelic activity (see Methods and Supplemental References). We used this head-twitch data as an external test set to assess the efficacy of our conformal predictor models. After dropping duplicate molecules between the head-twitch dataset and each model’s training dataset, 56 molecules remained for testing the Shulgin model and 40 for the PsychLight model. The PsychLight model had a precision of 0.62 and specificity of 0.71, but a low recall of 0.15. With the conformal threshold set to 0.8, precision and recall improved to 0.70 and 0.78, respectively (Table 6). However, only 12/40 of the test set molecules were in-domain, and no true negatives were predicted, which most likely arose because the dataset is heavily imbalanced in favor of molecules that cause a head-twitch response and covers a narrow chemical space (Figure 1). The Shulgin model was far more robust, with high precision (0.82) and specificity (0.89), but it similarly suffered from low recall (0.24). Using a conformal predictor threshold of 0.8, 16/56 molecules were in-domain (Table 7). However, precision and specificity were both 1, as no false positives were predicted, and recall slightly improved from 0.24 to 0.27.
Table 6.
PsychLight16 model head-twitch external validation statistics. CP = Conformal Predictors. SVC = support vector classifier. Thr. = Accuracy Threshold for Conformal Predictors.
| Model | Thr. | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVC | - | 0.28 | 0.24 | 0.40 | −0.08 | −0.16 | 0.62 | 0.15 | 0.71 | 12 | 5 | 46 | 8 |
| CP | 0.8 | 0.58 | 0.74 | 0.67 | −0.25 | −0.26 | 0.70 | 0.78 | 0.00 | 0 | 3 | 2 | 7 |
Table 7.
Shulgin and Shulgin PIHKAL17 and TIHKAL18 model head-twitch external validation statistics. CP = Conformal Predictors. SVC = support vector classifier. Thr. = Accuracy Threshold for Conformal Predictors.
| Model | Thr. | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVC | - | 0.45 | 0.37 | 0.50 | 0.09 | 0.15 | 0.82 | 0.24 | 0.89 | 16 | 2 | 29 | 9 |
| CP | 0.80 | 0.50 | 0.43 | 0.67 | 0.19 | 0.32 | 1.00 | 0.27 | 1.00 | 5 | 0 | 8 | 3 |
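The headline statistics in Tables 6 and 7 follow directly from the confusion-matrix counts; as a quick check, the Shulgin SVC row of Table 7 can be recomputed from its tn/fp/fn/tp values:

```python
import math

# Confusion-matrix counts from the Shulgin SVC row of Table 7.
tn, fp, fn, tp = 16, 2, 29, 9

precision = tp / (tp + fp)                  # 9/11
recall = tp / (tp + fn)                     # 9/38
specificity = tn / (tn + fp)                # 16/18
accuracy = (tp + tn) / (tn + fp + fn + tp)  # 25/56
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(round(precision, 2), round(recall, 2), round(specificity, 2),
      round(accuracy, 2), round(mcc, 2))
# -> 0.82 0.24 0.89 0.45 0.15, matching the table row
```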
Psychedelic Substructure Analysis
We used similarity maps24 to investigate the importance of particular atoms in the model predictions. Briefly, similarity maps highlight the strength of a particular atom’s contribution to a positive or negative prediction. The stronger the red color, the more the atom contributes to positive (psychedelic) predictions, and the stronger the green color, the more it contributes toward non-psychedelic predictions (Figure 4). We focused on known molecules that induce a head-twitch response as examples from the ‘head-twitch dataset’. The PsychLight model tended to highlight the aromatic ring, the two-carbon linker, and the nitrogen (tryptamines or phenethylamines, depending on the aromatic ring) for almost every structure, suggesting that other atoms attached to the nitrogen are not as important for a head-twitch response. The Shulgin model highlighted similar atoms, with stronger discrimination (green highlighting) against atoms and substructures attached to the aromatic rings. This tracks with known structure-activity relationships for serotonin 5-HT2A agonists25. As 5-HT2A agonism is the most likely mechanism underlying the in vivo head-twitch response, this suggests the machine learning models are very likely picking up relevant mechanistic features of molecules for this receptor interaction in vivo. Interestingly, the Shulgin model highlighted distinct substructures in comparison to the PsychLight model, suggesting the models are learning different, but potentially complementary, substructural features of psychedelics.
Figure 4.

Atomic contribution Highlighting of test set known psychedelics. Atoms are highlighted based on their contribution towards active (psychedelic) or non-active (not-psychedelic) predictions. A stronger red represents more contribution toward active predictions while green indicates a stronger contribution toward non-active predictions.
Pretrained Large Language Model fine-tuned on Psychedelic datasets outperforms classically trained models
The introduction of state-of-the-art large language models (LLMs) has revolutionized the field of machine learning, and these models have been used to great effect. A common approach to reduce resource cost is to first pre-train a large language model on a large corpus of data before fine-tuning it on smaller but related tasks, utilizing transfer learning to help improve model performance26.
We fine-tuned a Bidirectional and Auto-Regressive Transformer (BART)27 pre-trained on the Simplified Molecular-Input Line-Entry System (SMILES), which acts as a natural language for molecular representation (MolBART)28. Our fine-tuned model was set up as a multi-task model to predict the PsychLight and Shulgin psychedelic activity separately. Due to the small dataset sizes, we performed 5-fold cross validation for the MolBART fine-tuning, ensuring that structures present in both the PsychLight and Shulgin datasets were never split between the training and test sets. We found that the fine-tuned model significantly outperformed the classical models (Table 8), suggesting that pre-training and the related multi-task objective likely improved model performance in this case.
Table 8.
PsychLight16 and Shulgin and Shulgin PIHKAL17 and TIHKAL18 fine-tuned MolBART model nested cross-validation statistics. tn = true negative, fp = false positive, fn = false negative, tp = true positive. Confusion matrix numbers represent sums over the cross-validation folds, e.g. tn = sum of all true negatives over all folds.
| Dataset | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PsychLight_A | 0.93 | 0.86 | 0.88 | 0.78 | 0.81 | 0.93 | 0.80 | 0.97 | 29 | 1 | 2 | 8 |
| Shulgin | 0.91 | 0.90 | 0.90 | 0.81 | 0.81 | 0.91 | 0.89 | 0.92 | 83 | 7 | 9 | 71 |
We then fine-tuned a final MolBART model on all the training data from the Shulgin and PsychLight datasets. The model began overfitting during training (Figure 5), and thus we chose as the final model the checkpoint with the lowest validation loss (epoch 135, training step 1087). Although cross-validation performance improved over the traditional models, the MolBART model performed similarly to them on the external test set and somewhat underperformed on the head-twitch data (Table 9).
Figure 5.

Training and Validation loss of MolBART Fine-tuning of the Shulgin and PsychLight datasets.
Table 9.
PsychLight16 and Shulgin and Shulgin PIHKAL17 and TIHKAL18 fine-tuned MolBART model prediction statistics for the external 5HT2A dataset and the head-twitch dataset. tn = true negative, fp = false positive, fn = false negative, tp = true positive.
| Test Set Predictions | ACC | F1-Score | AUC | Cohen’s Kappa | MCC | Precision | Recall | Specificity | tn | fp | fn | tp |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5HT2A# | 0.73 | 0.40 | 0.63 | 0.30 | 0.42 | 1.00 | 0.25 | 1.00 | 14 | 0 | 6 | 2 |
| 5HT2A& | 0.70 | 0.60 | 0.72 | 0.38 | 0.40 | 0.50 | 0.75 | 0.68 | 13 | 6 | 2 | 6 |
| Head-twitch# | 0.27 | 0.19 | 0.44 | −0.07 | −0.15 | 0.60 | 0.11 | 0.76 | 13 | 4 | 48 | 6 |
| Head-twitch& | 0.38 | 0.39 | 0.42 | −0.12 | −0.15 | 0.58 | 0.29 | 0.56 | 10 | 8 | 27 | 11 |
# = PsychLight dataset; & = Shulgin dataset.
Electrostatic Descriptors
We next developed a novel molecular fingerprint based on the electrostatic similarity of fragments of the core structure of the molecule. Our goal in developing these fingerprints was to create a more generalizable description of the molecule, since electrostatic similarity is an abstract representation of molecular features and we are modeling in vivo effects. The method compares the electrostatic similarity of molecule fragments to that of all fragments present in the training sets to generate a feature vector (see Methods). We trained 8 classical machine learning models in the same nested cross validation training loop as the previous models on both the PsychLight A and Shulgin datasets using the electrostatic descriptors (Tables S6 and S7, respectively). The majority of the PsychLight models had statistics similar to the models built with Morgan Fingerprints, with the notable exception of the SVC model, which produced the best results (MCC 0.72; precision 0.77; recall 0.9). The Shulgin models showed similar metrics, suggesting both feature sets carried sufficient information and that dataset size may be the bottleneck in model performance. The electrostatic fingerprint models performed similarly or worse compared to Morgan Fingerprints on the external test set (Tables S8 and S9). However, the electrostatic fingerprint PsychLight models performed significantly better than the Morgan fingerprint models on the mouse head-twitch dataset (Tables S10 and S11).
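The general idea behind such a fingerprint, scoring a molecule's fragments against every reference fragment seen in training to build a fixed-length feature vector, can be sketched as follows. The cosine similarity and the random descriptor arrays here are illustrative stand-ins, not the actual electrostatic similarity measure described in Methods:

```python
import numpy as np

def similarity_fingerprint(mol_frags, ref_frags):
    """Best-match similarity of a molecule's fragments to each reference fragment.

    mol_frags: (m, d) array of per-fragment descriptors for one molecule.
    ref_frags: (r, d) array of descriptors for all training-set fragments.
    Returns a length-r feature vector (one entry per reference fragment).
    """
    a = mol_frags / np.linalg.norm(mol_frags, axis=1, keepdims=True)
    b = ref_frags / np.linalg.norm(ref_frags, axis=1, keepdims=True)
    sims = a @ b.T           # (m, r) pairwise cosine similarities
    return sims.max(axis=0)  # keep the best-matching fragment per reference

rng = np.random.default_rng(0)
fp = similarity_fingerprint(rng.normal(size=(3, 8)),   # 3 fragments, 8 descriptors each
                            rng.normal(size=(20, 8)))  # 20 reference fragments
print(fp.shape)  # one feature per reference fragment
```

Because the feature vector is indexed by the training-set fragments, fingerprints of different molecules are directly comparable, which is what makes them usable as classifier inputs.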
CONCLUSIONS
Historically, medicinal chemists have generated only limited structure-activity relationships (SAR) to explore the psychedelic effects of various natural products and their derivatives. When SARs were characterized, they tended to focus on specific receptors such as 5-HT2A29, and these efforts have not led to machine learning models that predict this in vivo activity. To remedy this, we searched for datasets that we could use. Unlike in vitro data, which dominate most public bioactivity databases such as ChEMBL30 and PubChem31, curated in vivo biological data are much harder to find. In particular, datasets for psychedelic molecules with hallucinogenic potential do not appear to have been systematically curated to form the basis of machine learning models for predicting this property. We have now demonstrated that such machine learning models can be generated with the limited published in vitro and in vivo data available. In the process, we have described both nested 5-fold cross validation and external validation, using a set of 5-HT2A receptor agonists with known hallucinogenic or non-hallucinogenic activity as well as in vivo data from the mouse head-twitch model.
Of interest, the similarity maps for atom contributions to model predictions suggested that the tryptamine and phenethylamine cores were important for a positive head-twitch response prediction, while other substructures were not relevant or even contributed strongly to a negative prediction. This tracks with the known structure-activity relationships for serotonin 5-HT2A agonists in particular25. This is perhaps not surprising: as 5-HT2A agonism is the most likely mechanism underlying the in vivo head-twitch response, both psychedelic prediction models shown here appear to be picking up relevant mechanistic features. The availability of such predictive machine learning models could also represent accessible tools for future drug design when combined with generative models32 to create new psychoplastogens, analogs of existing psychedelics, or completely novel molecules for other CNS or non-CNS targets that avoid hallucinogenic potential.
Limitations of this study certainly include the small training sets currently publicly available from single laboratory sources (which is outside our control), as well as the small external test sets of known 5-HT2A receptor agonists and other molecules with known head-twitch activity. Further curation of the literature might add more molecules with head-twitch data, and future publications using PsychLight could provide more in vitro data. It is unlikely we would be able to add more human in vivo data on psychedelics without clinical studies, which would be slow and expensive. We explored 8 different classical machine learning classification algorithms as well as large language models28 and observed a wide range in performance, suggesting the benefit of assessing several different machine learning approaches in parallel. In general, the methods used provided at least one algorithm with very good accuracy for the test sets. We could also have assessed many other machine learning methods such as few-shot learning33, 34, graph-based networks35, and other deep learning algorithms such as LSTMs36.
The incorporation of conformal predictors improved the performance of both the Shulgin and PsychLight models on cross-validation and external test set metrics. This suggests conformal predictors are robust and can improve prediction quality at the cost of removing uncertain predictions. The small dataset sizes in this study certainly decreased the quality of the calibration, leading to smaller performance gains than theoretically expected; increasing the calibration and training dataset sizes could improve both the model and threshold accuracy.
Fine-tuning a state-of-the-art, pretrained large language model (MolBART) led to cross-validation performance similar to the other machine learning methods and models tried. However, its predictions on the external test sets were generally not greatly improved over the baseline models. This could be due to misalignment of the tasks: we assume the in vivo Shulgin and PsychLight models are good predictors of the external serotonin receptor agonist and head-twitch phenotypes, but this is not necessarily the case, and direct comparison of models on these external test sets may not be easily interpretable.
The larger Shulgin dataset has higher coverage of the chemical property space around the test sets, while the PsychLight dataset was confined to a smaller region based on the t-SNE plot. This correlates with the decreased performance of the PsychLight model on the external test sets in comparison to the more diverse Shulgin model. It would be of interest to see how future machine learning models perform with more structurally diverse molecules well outside the applicability domains of the models, in a larger chemical property space such as all approved drugs or other classes of molecules. The head-twitch dataset is imbalanced in favor of psychedelics. This suggests a future need to more exhaustively curate the literature, specifically adding negative head-twitch data, and then to test whether this improves the model statistics and prediction capabilities.
In this study we have also used a large language model, MolBART28, fine-tuned on our curated datasets as a comparison; it offered little improvement, likely again because the limited amount of data available is outside the sweet spot for this method relative to other approaches34. There are certainly other large language models, such as RoBERTa37, DeBERTa38, and ELECTRA-DTA39, which could be compared in future work using the training and test datasets described here.
It has also not escaped our notice that such machine learning models could have the potential for dual use40 in the development of illicit psychedelics, and hence it will be important to ensure that they are not misused to help design and synthesize such molecules. We have demonstrated that extracting the data from public sources to enable these models is no longer a limitation, and anyone can access this data. Although synthesis of such AI-designed psychedelic molecules is an additional step, the previously published syntheses for the PIHKAL17 and TIHKAL18 classes of compounds suggest that this is within reach of a skilled medicinal chemist, though it would perhaps warrant caution in going as far as Shulgin in testing them on oneself.
In conclusion, these machine learning models offer an additional novel tool that could be used alongside in vitro and in vivo approaches for predicting psychedelic effects in humans. This would ultimately aid the design of molecules that do not possess this liability but retain useful therapeutic efficacy to treat the important diseases of the central nervous system that plague humankind.
METHODS
Data curation
PsychLight datasets
Molecule structures were obtained from ChemSpider41 or PubChem42. The PsychLight assay included known hallucinogenic and non-hallucinogenic compounds, which were compiled into a classification dataset (N = 36 non-hallucinogenic and 18 hallucinogenic compounds) called “PsychLight A”. A previously published heatmap of ligand scores for the PsychLight assay was used to classify molecules with unknown hallucinogenic potential as hallucinogenic (scores greater than 0, N = 8) or non-hallucinogenic (scores less than 0, N = 22)16. When combined with PsychLight A, we obtain a dataset of 26 predicted or known hallucinogenic compounds and 58 known or predicted non-hallucinogenic compounds, which we call PsychLight B. For the PsychLight SL model, because some ligand scores of unknown compounds span an uncertain scoring region that could be assigned to either the hallucinogenic or non-hallucinogenic class, we assign soft labels to LED-A-9, S-MDDMA, and LED-A-112, giving each a score of 0.25 to represent the uncertainty of the 0 class.
Shulgin dataset
PIHKAL17 and TIHKAL18 represent two key texts consisting of N = 164 and N = 56 molecules, respectively, which describe the synthesis and testing of psychedelics in humans, in some cases at different doses. While the data are not truly quantitative, the ‘patient’ responses are semi-quantified with scores. Qualitative comments and the commentary sections for each molecule were scored from 0 (no psychedelic effect) to 3 (sustained psychedelic effect). When explicit scores were absent from the text, a best judgement was made from reading the patient experiences as described by the authors. To obtain a more balanced dataset, we assigned scores 0 and 1 to Class 0 and scores 2 and 3 to Class 1. This resulted in a dataset of 102 actives (high psychedelic potency) and 119 inactives (no/low psychedelic potency).
Head-twitch response dataset
A PubMed search was performed on the keywords “head twitch”, initially retrieving approximately 200 references. A selection of these papers was used to curate a dataset with reasonable chemical diversity (see Supplemental References and Supplemental Table S5). This file was standardized with our proprietary “E-Clean” software, using open-source RDKit functions, and then used in Assay Central43.
Machine learning
Our software, Assay Central, uses multiple machine learning algorithms integrated in web-based software to build models, as described previously20. Model validation was performed using nested 5-fold cross validation, with a hyperparameter search in the inner loop and statistical evaluation in the outer loop. The final nested 5-fold cross validation scores are the average of the hold-out set metrics across the outer folds.
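The nested cross-validation scheme above can be sketched with scikit-learn. Assay Central itself is proprietary, so this is only a minimal illustration using an SVC with an assumed (purely illustrative) hyperparameter grid and synthetic data in place of the fingerprints.

```python
# Nested 5-fold CV: inner loop tunes hyperparameters, outer loop evaluates.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=64, random_state=0)

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # hyperparameter search
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # statistical evaluation

search = GridSearchCV(
    SVC(), param_grid={"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}, cv=inner
)
# Each outer fold refits a fresh inner search on its training portion and
# scores the held-out fold; the reported metric is the mean over the 5 folds.
scores = cross_val_score(search, X, y, cv=outer, scoring="roc_auc")
print(round(scores.mean(), 3))
```

Because the hyperparameter search never sees the outer hold-out fold, the averaged outer-fold metrics are an unbiased estimate of generalization performance.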
Conformal predictions
For both the PsychLight SVC and Shulgin SVC models, we combine two specialized cases of conformal predictors, Cross-Conformal Predictors and Class-Conditional Conformal Predictors, to achieve a more reliable model. Class-Conditional Conformal Predictors construct a separate quantile q̂_y for each class y, psychedelic (1) and not psychedelic (0), which achieves the following per-class coverage:

P(Y_test ∈ C(X_test) | Y_test = y) ≥ 1 − α, for y ∈ {0, 1}

This ensures predictions are not biased for imbalanced classes. In the Cross-Conformal Predictor procedure, we split the dataset into 5 separate folds and use each fold in turn as a calibration set for a model trained on the remaining folds to compute the class-specific quantiles q̂_y^(k). The final values chosen are the average over the 5 folds:

q̂_y = (1/5) Σ_{k=1..5} q̂_y^(k)

This provides a more stable q̂_y value, calculated from the average of the folds rather than from a single calibration set.
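The class-conditional cross-conformal procedure described above can be sketched as follows. This is a minimal illustration on synthetic data with a logistic regression stand-in for the SVC; the nonconformity score (one minus the predicted probability of the true class) and the finite-sample quantile correction are standard choices, assumed rather than taken from the study.

```python
# Per-class conformal quantiles averaged over 5 calibration folds.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

alpha = 0.1  # target miscoverage per class
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

q_hats = {0: [], 1: []}
for train_idx, cal_idx in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    proba = model.predict_proba(X[cal_idx])
    # nonconformity: 1 - probability assigned to the true class
    scores = 1.0 - proba[np.arange(len(cal_idx)), y[cal_idx]]
    for cls in (0, 1):
        s = np.sort(scores[y[cal_idx] == cls])
        n = len(s)
        # finite-sample-corrected quantile level: ceil((n+1)(1-alpha))/n
        level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
        q_hats[cls].append(np.quantile(s, level))

# final per-class thresholds: average over the 5 calibration folds
q_hat = {cls: float(np.mean(v)) for cls, v in q_hats.items()}
# A test point's prediction set contains class y iff its nonconformity
# score for class y is <= q_hat[y]; empty sets are "uncertain" predictions.
```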
MolBART Architecture and Training
We fine-tuned the base MolBART pretrained model provided by Irwin et al.28 on both the PsychLight and Shulgin datasets. In brief, the pretrained model is a denoising sequence-to-sequence model trained on over 1.7 billion SMILES. The fine-tuning was performed on both the PsychLight and Shulgin datasets together. For the cross-validation, we separately split the PsychLight and Shulgin datasets into 5 folds. As a few molecules overlapped between the PsychLight and Shulgin datasets, we ensured that identical molecules existed only in the training or test set of each fold, so as not to bias the test statistics. Default hyperparameters were chosen for model training, the model was trained for 300 epochs for each training run, and the checkpoint with the lowest validation loss was chosen for all model fine-tuning. For each training run, 4 folds (80%) of the data were used for training and the remaining fold (20%) was split evenly into validation and test sets, so the final splits for each training run were 80:10:10 for train/validation/test, respectively. For the final model, all training data were used to fine-tune the model.
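The overlap-aware splitting described above (identical molecules never straddling a train/test boundary) is equivalent to a grouped K-fold split keyed on the canonical structure. A minimal sketch with scikit-learn's `GroupKFold`, using placeholder SMILES rather than the study's data:

```python
# Group folds by (canonical) SMILES so duplicate molecules share a fold.
from sklearn.model_selection import GroupKFold

smiles = ["CCO", "CCN", "CCO", "c1ccccc1", "CCCN", "CCN", "CCOC", "CCC", "CC", "C"]
labels = [0, 1, 0, 1, 0, 1, 1, 0, 0, 1]
groups = smiles  # identical SMILES -> same group -> same fold

for train_idx, test_idx in GroupKFold(n_splits=5).split(smiles, labels, groups):
    train = {smiles[i] for i in train_idx}
    test = {smiles[i] for i in test_idx}
    assert not train & test  # no molecule appears on both sides of a split
```

In practice the grouping key would be an RDKit-canonicalized SMILES or InChIKey so that differently written representations of the same molecule collapse to one group.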
Chemical descriptors feature generation
For our chemical descriptor feature generation, we used the ChemAxon Calculator to generate a set of 11 descriptors: logP, logD, pKa, Hydrogen Bond Donor count (HBD), Total Polar Surface Area (TPSA), Molecular Weight (MW), sp3, Aromatic Ring Count (AroRing), Rotatable Bond Count (RotBond), Hydrogen Bond Acceptor count (HBA), and Aliphatic Ring Count (AliphRing).
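The descriptors above come from commercial ChemAxon software; as an open-source approximation, most of them (all except pKa and logD, which RDKit does not compute) can be generated with RDKit as sketched below. Here sp3 is interpreted as the fraction of sp3 carbons, an assumption on our part.

```python
# Open-source RDKit approximation of the ChemAxon descriptor set.
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, rdMolDescriptors

def descriptor_row(smiles: str) -> dict:
    mol = Chem.MolFromSmiles(smiles)
    return {
        "logP": Crippen.MolLogP(mol),  # Wildman-Crippen logP estimate
        "HBD": rdMolDescriptors.CalcNumHBD(mol),
        "HBA": rdMolDescriptors.CalcNumHBA(mol),
        "TPSA": rdMolDescriptors.CalcTPSA(mol),
        "MW": Descriptors.MolWt(mol),
        "sp3": rdMolDescriptors.CalcFractionCSP3(mol),
        "AroRing": rdMolDescriptors.CalcNumAromaticRings(mol),
        "RotBond": rdMolDescriptors.CalcNumRotatableBonds(mol),
        "AliphRing": rdMolDescriptors.CalcNumAliphaticRings(mol),
    }

row = descriptor_row("CN(C)CCc1c[nH]c2ccccc12")  # N,N-dimethyltryptamine
```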
Electrostatic descriptors feature generation
To identify core fragments, we first find the Murcko scaffold44 of a molecule, then break the Murcko scaffold into smaller fragments using the BRICS45 algorithm (as implemented in RDKit). Using this process, we identified every unique core fragment in the training and test sets, to be used as a reference set of core fragments. An individual fragment can be assigned a fingerprint vector by computing the electrostatic similarity of the fragment to each fragment in the reference set (scaled to a value between 0 and 1) using the ESPSIM package, and storing the results in a vector whose indices correspond to the reference set of fragments46. Finally, to fingerprint an entire molecule, we compute a fingerprint in this manner for each of its core fragments, and sum over all of the resulting vectors. To account for side groups and connections between fragments, we also compute an FCFP6 fingerprint after removing features associated with the isolated fragments, and we concatenate it onto the electrostatic fragment fingerprint.
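The fragment-extraction and similarity-vector steps above can be sketched with RDKit. For self-containment, a Morgan/Tanimoto similarity stands in here for the ESPSIM electrostatic similarity used in the study, and the FCFP6 concatenation step is omitted; only ring-containing molecules are assumed, since acyclic inputs yield an empty Murcko scaffold.

```python
# Murcko scaffold -> BRICS fragments -> per-fragment similarity vector
# against a reference fragment set, summed over fragments.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, BRICS
from rdkit.Chem.Scaffolds import MurckoScaffold

def core_fragments(smiles):
    """Return the BRICS fragments of a molecule's Murcko scaffold as SMILES."""
    scaffold = MurckoScaffold.GetScaffoldForMol(Chem.MolFromSmiles(smiles))
    return set(BRICS.BRICSDecompose(scaffold))

def fingerprint(smiles, reference):
    """Sum of similarity-to-reference vectors over a molecule's core fragments."""
    ref_fps = [
        AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 3, nBits=2048)
        for s in reference
    ]
    vec = [0.0] * len(reference)
    for frag in core_fragments(smiles):
        fp = AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(frag), 3, nBits=2048)
        # stand-in for electrostatic similarity (ESPSIM in the study)
        for i, ref_fp in enumerate(ref_fps):
            vec[i] += DataStructs.TanimotoSimilarity(fp, ref_fp)
    return vec
```

Swapping the Tanimoto call for an ESPSIM electrostatic similarity (scaled to 0-1) recovers the fingerprint described in the text.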
t-SNE plots
We created t-SNE plots of the training and test sets in this study using the Scikit-learn library of Python (described previously47) with a perplexity of 50, the number of components set to 2, and the remaining parameters set to default.
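The embedding with the stated settings amounts to the following, where a random matrix stands in for the fingerprint features:

```python
# t-SNE with perplexity 50 and 2 components; other parameters default.
import numpy as np
from sklearn.manifold import TSNE

X = np.random.RandomState(0).rand(200, 128)  # placeholder fingerprint matrix
embedding = TSNE(n_components=2, perplexity=50, random_state=0).fit_transform(X)
```

Note that scikit-learn requires the perplexity to be smaller than the number of samples, which holds for the datasets in this study.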
Supplementary Material
Grant information
We kindly acknowledge NIH funding from R43DA055419-01 from NIDA, 2R44ES031038 from NIEHS and R44GM122196-02A1 from NIGMS.
Footnotes
Supporting information
Supporting information includes a figure to provide an overview of the activities in this study and a figure showing SHAP analysis, PsychLight data excel file, Shulgin data excel file, Head twitch data, PsychLight and Shulgin model 5-fold nested cross validation statistics built with MACCS fingerprints, PsychLight and Shulgin model 5-fold nested cross validation statistics built with chemical descriptor fingerprints, PsychLight model 5-fold nested cross validation statistics built with electrostatic fingerprint descriptors, Shulgin and Shulgin PIHKAL and TIHKAL model 5-fold nested cross validation statistics built with electrostatic fingerprint descriptors, PsychLight electrostatic fingerprint model external validation statistics, Shulgin and Shulgin PIHKAL and TIHKAL model external validation, PsychLight electrostatic fingerprint model head-twitch external validation statistics, and Shulgin and Shulgin PIHKAL and TIHKAL electrostatic fingerprint model head-twitch external validation statistics
Conflicts of interest
S.E. is the owner of Collaborations Pharmaceuticals, Inc. and F.U., T.J., J.H., S.S., and T.R.L. are employees of Collaborations Pharmaceuticals, Inc.
Dual use statement
The machine learning models derived in this study have the potential to be used for guiding design of novel psychedelics and we would propose this should be avoided and that we carefully monitor who has access to such tools.
References
- (1).Salinsky LM; Merritt CR; Zamora JC; Giacomini JL; Anastasio NC; Cunningham KA. μ-Opioid receptor agonists and psychedelics: pharmacological opportunities and challenges. Frontiers in Pharmacology 2023, 14. DOI: 10.3389/fphar.2023.1239159. Jones G; Ricard JA; Lipson J; Nock MK. Associations between classic psychedelics and opioid use disorder in a nationally-representative U.S. adult sample. Scientific Reports 2022, 12 (1), 4099. DOI: 10.1038/s41598-022-08085-4.
- (2).Peters J; Olson DE. Engineering Safer Psychedelics for Treating Addiction. Neuroscience Insights 2021, 16, 26331055211033847. DOI: 10.1177/26331055211033847.
- (3).Dos Santos RG; Hallak JEC. Ayahuasca, an ancient substance with traditional and contemporary use in neuropsychiatry and neuroscience. Epilepsy Behav 2019, 106300. DOI: 10.1016/j.yebeh.2019.04.053.
- (4).Ona G; Reverte I; Rossi GN; Dos Santos RG; Hallak JE; Colomina MT; Bouso JC. Main targets of ibogaine and noribogaine associated with its putative anti-addictive effects: A mechanistic overview. J Psychopharmacol 2023, 37 (12), 1190–1200. DOI: 10.1177/02698811231200882.
- (5).Ly C; Greb AC; Cameron LP; Wong JM; Barragan EV; Wilson PC; Burbach KF; Soltanzadeh Zarandi S; Sood A; Paddy MR; et al. Psychedelics Promote Structural and Functional Neural Plasticity. Cell Rep 2018, 23 (11), 3170–3182. DOI: 10.1016/j.celrep.2018.05.022.
- (6).Aday JS; Mitzkovitz CM; Bloesch EK; Davoli CC; Davis AK. Long-term effects of psychedelic drugs: A systematic review. Neurosci Biobehav Rev 2020, 113, 179–189. DOI: 10.1016/j.neubiorev.2020.03.017.
- (7).Zhang R; Volkow ND. Brain default-mode network dysfunction in addiction. Neuroimage 2019, 200, 313–331. DOI: 10.1016/j.neuroimage.2019.06.036.
- (8).Baldo BA. Prefrontal Cortical Opioids and Dysregulated Motivation: A Network Hypothesis. Trends Neurosci 2016, 39 (6), 366–377. DOI: 10.1016/j.tins.2016.03.004. Giacchino JL; Henriksen SJ. Opioid effects on activation of neurons in the medial prefrontal cortex. Prog Neuropsychopharmacol Biol Psychiatry 1998, 22 (7), 1157–1178. DOI: 10.1016/s0278-5846(98)00053-0.
- (9).Perkins D; Sarris J; Rossell S; Bonomo Y; Forbes D; Davey C; Hoyer D; Loo C; Murray G; Hood S; et al. Medicinal psychedelics for mental health and addiction: Advancing research of an emerging paradigm. Australian & New Zealand Journal of Psychiatry 2021. DOI: 10.1177/0004867421998785.
- (10).Olson DE. Psychoplastogens: A Promising Class of Plasticity-Promoting Neurotherapeutics. J Exp Neurosci 2018, 12, 1179069518800508. DOI: 10.1177/1179069518800508.
- (11).Cameron LP; Tombari RJ; Lu J; Pell AJ; Hurley ZQ; Ehinger Y; Vargas MV; McCarroll MN; Taylor JC; Myers-Turnbull D; et al. A non-hallucinogenic psychedelic analogue with therapeutic potential. Nature 2021, 589 (7842), 474–479. DOI: 10.1038/s41586-020-3008-z.
- (12).Cohen-Cory S; Kidane AH; Shirkey NJ; Marshak S. Brain-derived neurotrophic factor and the development of structural neuronal connectivity. Dev Neurobiol 2010, 70 (5), 271–288. DOI: 10.1002/dneu.20774.
- (13).Majić T; Schmidt TT; Gallinat J. Peak experiences and the afterglow phenomenon: when and how do therapeutic effects of hallucinogens depend on psychedelic experiences? J Psychopharmacol 2015, 29 (3), 241–253. DOI: 10.1177/0269881114568040.
- (14).Dunlap LE; Azinfar A; Ly C; Cameron LP; Viswanathan J; Tombari RJ; Myers-Turnbull D; Taylor JC; Grodzki AC; Lein PJ; et al. Identification of Psychoplastogenic N,N-Dimethylaminoisotryptamine (isoDMT) Analogues through Structure-Activity Relationship Studies. J Med Chem 2020, 63 (3), 1142–1155. DOI: 10.1021/acs.jmedchem.9b01404.
- (15).Duan W; Cao D; Wang S; Cheng J. Serotonin 2A Receptor (5-HT(2A)R) Agonists: Psychedelics and Non-Hallucinogenic Analogues as Emerging Antidepressants. Chem Rev 2024, 124 (1), 124–163. DOI: 10.1021/acs.chemrev.3c00375.
- (16).Dong C; Ly C; Dunlap LE; Vargas MV; Sun J; Hwang IW; Azinfar A; Oh WC; Wetsel WC; Olson DE; Tian L. Psychedelic-inspired drug discovery using an engineered biosensor. Cell 2021, 184 (10), 2779–2792.e2718. DOI: 10.1016/j.cell.2021.03.043.
- (17).Shulgin A; Shulgin A. PIHKAL: A Chemical Love Story; Transform Press, 1990.
- (18).Shulgin A; Shulgin A. TIHKAL: The Continuation; Transform Press, 2002.
- (19).Cortes-Ciriano I; Bender A. Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Prediction Errors for Deep Neural Networks. J Chem Inf Model 2019, 59 (3), 1269–1281. DOI: 10.1021/acs.jcim.8b00542. Cortes-Ciriano I; Bender A. Reliable Prediction Errors for Deep Neural Networks Using Test-Time Dropout. J Chem Inf Model 2019, 59 (7), 3330–3339. DOI: 10.1021/acs.jcim.9b00297. Norinder U; Boyer S. Conformal Prediction Classification of a Large Data Set of Environmental Chemicals from ToxCast and Tox21 Estrogen Receptor Assays. Chem Res Toxicol 2016, 29 (6), 1003–1010. DOI: 10.1021/acs.chemrestox.6b00037. Fagerholm U; Hellberg S; Alvarsson J; Arvidsson McShane S; Spjuth O. In silico prediction of volume of distribution of drugs in man using conformal prediction performs on par with animal data-based models. Xenobiotica 2021, 51 (12), 1366–1371. DOI: 10.1080/00498254.2021.2011471. Alvarsson J; Arvidsson McShane S; Norinder U; Spjuth O. Predicting With Confidence: Using Conformal Prediction in Drug Discovery. J Pharm Sci 2021, 110 (1), 42–49. DOI: 10.1016/j.xphs.2020.09.055. Angelopoulos AN; Bates S. A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification. arXiv:2107.07511, 2021.
- (20).Lane T; Russo DP; Zorn KM; Clark AM; Korotcov A; Tkachenko V; Reynolds RC; Perryman AL; Freundlich JS; Ekins S. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 2018, 15 (10), 4346–4360. DOI: 10.1021/acs.molpharmaceut.8b00083.
- (21).RDKit: Open-Source Cheminformatics Software. www.rdkit.org.
- (22).ChemAxon. Calculator Plugins were used for structure property prediction and calculation, Marvin 20.16.0. http://www.chemaxon.com.
- (23).Lundberg S; Lee S-I. A Unified Approach to Interpreting Model Predictions. arXiv:1705.07874, 2017.
- (24).Riniker S; Landrum GA. Similarity maps - a visualization strategy for molecular fingerprints and machine-learning methods. J Cheminform 2013, 5 (1), 43. DOI: 10.1186/1758-2946-5-43.
- (25).Sard H; Kumaran G; Morency C; Roth BL; Toth BA; He P; Shuster L. SAR of psilocybin analogs: discovery of a selective 5-HT2C agonist. Bioorg Med Chem Lett 2005, 15 (20), 4555–4559. DOI: 10.1016/j.bmcl.2005.06.104. Kolaczynska KE; Luethi D; Trachsel D; Hoener MC; Liechti ME. Receptor Interaction Profiles of 4-Alkoxy-3,5-Dimethoxy-Phenethylamines (Mescaline Derivatives) and Related Amphetamines. Front Pharmacol 2021, 12, 794254. DOI: 10.3389/fphar.2021.794254.
- (26).Raffel C; Shazeer N; Roberts A; Lee K; Narang S; Matena M; Zhou Y; Li W; Liu PJ. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv:1910.10683, 2019. Yenduri G; et al. Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions. arXiv:2305.10435, 2023.
- (27).Lewis M; Liu Y; Goyal N; Ghazvininejad M; Mohamed A; Levy O; Stoyanov V; Zettlemoyer L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv:1910.13461, 2019.
- (28).Irwin R; Dimitriadis S; He J; Bjerrum EJ. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 2022, 3 (1), 015022. DOI: 10.1088/2632-2153/ac3ffb.
- (29).Nichols DE. From ‘there’ to ‘here’: psychedelic natural products and their contributions to medicinal chemistry. In Ethnopharmacologic Search for Psychoactive Drugs - 2017; Prance G, McKenna DJ, De Loenen B, Davis W, Eds.; Vol. II; Synergetic Press, 2017; pp 202–217.
- (30).Gaulton A; Hersey A; Nowotka M; Bento AP; Chambers J; Mendez D; Mutowo P; Atkinson F; Bellis LJ; Cibrian-Uhalte E; et al. The ChEMBL database in 2017. Nucleic Acids Res 2017, 45 (D1), D945–D954. DOI: 10.1093/nar/gkw1074.
- (31).Kim S; Thiessen PA; Bolton EE; Chen J; Fu G; Gindulyte A; Han L; He J; He S; Shoemaker BA; et al. PubChem Substance and Compound databases. Nucleic Acids Res 2016, 44 (D1), D1202–1213. DOI: 10.1093/nar/gkv951.
- (32).Urbina F; Lowden CT; Culberson JC; Ekins S. MegaSyn: Integrating Generative Molecular Design, Automated Analog Designer, and Synthetic Viability Prediction. ACS Omega 2022, 7 (22), 18699–18713. DOI: 10.1021/acsomega.2c01404. Zhavoronkov A; Ivanenkov YA; Aliper A; Veselov MS; Aladinskiy VA; Aladinskaya AV; Terentiev VA; Polykovskiy DA; Kuznetsov MD; Asadulaev A; et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat Biotechnol 2019, 37 (9), 1038–1040. DOI: 10.1038/s41587-019-0224-x. Putin E; Asadulaev A; Vanhaelen Q; Ivanenkov Y; Aladinskaya AV; Aliper A; Zhavoronkov A. Adversarial Threshold Neural Computer for Molecular de Novo Design. Mol Pharm 2018, 15 (10), 4386–4397. DOI: 10.1021/acs.molpharmaceut.7b01137. Gomez-Bombarelli R; Wei JN; Duvenaud D; Hernandez-Lobato JM; Sanchez-Lengeling B; Sheberla D; Aguilera-Iparraguirre J; Hirzel TD; Adams RP; Aspuru-Guzik A. Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Cent Sci 2018, 4 (2), 268–276. DOI: 10.1021/acscentsci.7b00572. Kang SG; Morrone JA; Weber JK; Cornell WD. Analysis of Training and Seed Bias in Small Molecules Generated with a Conditional Graph-Based Variational Autoencoder - Insights for Practical AI-Driven Molecule Generation. J Chem Inf Model 2022, 62 (4), 801–816. DOI: 10.1021/acs.jcim.1c01545. Hochreiter S; Schmidhuber J. Long Short-Term Memory. Neural Computation 1997, 9, 1735–1780. Blaschke T; Olivecrona M; Engkvist O; Bajorath J; Chen H. Application of Generative Autoencoder in De Novo Molecular Design. Mol Inform 2018, 37 (1–2). DOI: 10.1002/minf.201700123. Sanchez-Lengeling B; Outeiral C; Guimaraes GL; Aspuru-Guzik A. Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC). ChemRxiv, 2017. Winter R; Montanari F; Steffen A; Briem H; Noé F; Clevert D-A. Efficient multi-objective molecular optimization in a continuous latent space. Chemical Science 2019, 10 (34), 8016–8024. DOI: 10.1039/C9SC01928F.
- (33).Vinyals O; Blundell C; Lillicrap T; Kavukcuoglu K; Wierstra D. Matching Networks for One Shot Learning. arXiv:1606.04080, 2016. Vella D; Ebejer J-P. Few-Shot Learning for Low-Data Drug Discovery. J Chem Inf Model 2023, 63 (1), 27–42. DOI: 10.1021/acs.jcim.2c00779. Altae-Tran H; Ramsundar B; Pappu AS; Pande V. Low Data Drug Discovery with One-Shot Learning. ACS Cent Sci 2017, 3 (4), 283–293. DOI: 10.1021/acscentsci.6b00367.
- (34).Snyder SH; Vignaux PA; Ozalp MK; Gerlach J; Puhl AC; Lane TR; Corbett J; Urbina F; Ekins S. The Goldilocks paradigm: comparing classical machine learning, large language models, and few-shot learning for drug discovery applications. Commun Chem 2024, 7 (1), 134. DOI: 10.1038/s42004-024-01220-4.
- (35).Liu K; Sun X; Jia L; Ma J; Xing H; Wu J; Gao H; Sun Y; Boulnois F; Fan J. Chemi-net: a graph convolutional network for accurate drug property prediction. arXiv:1803.06236, 2018. Yao K; Wang X; Li W; Zhu H; Jiang Y; Li Y; Tian T; Yang Z; Liu Q; Liu Q. Semi-supervised heterogeneous graph contrastive learning for drug-target interaction prediction. Comput Biol Med 2023, 163, 107199. DOI: 10.1016/j.compbiomed.2023.107199.
- (36).Greff K; Srivastava RK; Koutník J; Steunebrink BR; Schmidhuber J. LSTM: A Search Space Odyssey. IEEE Transactions on Neural Networks and Learning Systems 2017, 28 (10), 2222–2232. DOI: 10.1109/TNNLS.2016.2582924. Urbina F; Batra K; Luebke KJ; White JD; Matsiev D; Olson LL; Malerich JP; Hupcey MAZ; Madrid PB; Ekins S. UV-adVISor: Attention-Based Recurrent Neural Networks to Predict UV-Vis Spectra. Anal Chem 2021, 93 (48), 16076–16085. DOI: 10.1021/acs.analchem.1c03741.
- (37).Liu Y; Ott M; Goyal N; Du J; Joshi M; Chen D; Levy O; Lewis M; Zettlemoyer L; Stoyanov V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:1907.11692, 2019.
- (38).He P; Liu X; Gao J; Chen W. DeBERTa: Decoding-enhanced BERT with Disentangled Attention. arXiv:2006.03654, 2020.
- (39).Wang J; Wen N; Wang C; Zhao L; Cheng L. ELECTRA-DTA: a new compound-protein binding affinity prediction model based on the contextualized sequence encoding. Journal of Cheminformatics 2022, 14 (1), 14. DOI: 10.1186/s13321-022-00591-x.
- (40).Urbina F; Lentzos F; Invernizzi C; Ekins S. Dual Use of Artificial Intelligence-powered Drug Discovery. Nat Mach Intell 2022, 4 (3), 189–191. DOI: 10.1038/s42256-022-00465-9.
- (41).Pence HE; Williams AJ. ChemSpider: An Online Chemical Information Resource. J Chem Educ 2010, 87, 1123–1124.
- (42).Wang Y; Xiao J; Suzek TO; Zhang J; Wang J; Bryant SH. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res 2009, 37 (Web Server issue), W623–633.
- (43).Lane TR; Foil DH; Minerali E; Urbina F; Zorn KM; Ekins S. Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery. Mol Pharm 2021, 18 (1), 403–415. DOI: 10.1021/acs.molpharmaceut.0c01013.
- (44).Bemis GW; Murcko MA. The properties of known drugs. 1. Molecular frameworks. J Med Chem 1996, 39, 2887–2893.
- (45).Degen J; Wegscheid-Gerlach C; Zaliani A; Rarey M. On the art of compiling and using ‘drug-like’ chemical fragment spaces. ChemMedChem 2008, 3 (10), 1503–1507. DOI: 10.1002/cmdc.200800178.
- (46).Bolcato G; Heid E; Bostrom J. On the Value of Using 3D Shape and Electrostatic Similarities in Deep Generative Methods. J Chem Inf Model 2022, 62 (6), 1388–1398. DOI: 10.1021/acs.jcim.1c01535.
- (47).Lane TR; Harris J; Urbina F; Ekins S. Comparing LD50/LC50 Machine Learning Models for Multiple Species. J Chem Health Saf 2023, 30 (2), 83–97. DOI: 10.1021/acs.chas.2c00088.
