2024 Sep 18;151(3):280–292. doi: 10.1111/acps.13754

Psychosis Prognosis Predictor: A continuous and uncertainty‐aware prediction of treatment outcome in first‐episode psychosis

Daniël P J van Opstal 1, Seyed Mostafa Kia 1,2,3, Lea Jakob 4,5, Metten Somers 1, Iris E C Sommer 6, Inge Winter‐van Rossum 1,7, René S Kahn 7, Wiepke Cahn 1, Hugo G Schnack 1,8
PMCID: PMC11787921  PMID: 39293941

Abstract

Introduction

Machine learning models have shown promising potential in individual‐level outcome prediction for patients with psychosis, but also have several limitations. To address some of these limitations, we present a model that predicts multiple outcomes, based on longitudinal patient data, while integrating prediction uncertainty to facilitate more reliable clinical decision‐making.

Material and Methods

We devised a recurrent neural network architecture incorporating long short‐term memory (LSTM) units to facilitate outcome prediction by leveraging multimodal baseline variables and clinical data collected at multiple time points. To account for model uncertainty, we employed a novel fuzzy logic approach to integrate the level of uncertainty into individual predictions. We predicted antipsychotic treatment outcomes in 446 first‐episode psychosis patients in the OPTiMiSE study, for six different clinical scenarios. The treatment outcome measures assessed at both week 4 and week 10 encompassed symptomatic remission, clinical global remission, and functional remission.

Results

Using only baseline predictors to predict different outcomes at week 4, leave‐one‐site‐out validation AUC ranged from 0.62 to 0.66; performance improved when clinical data from week 1 was added (AUC = 0.66–0.71). For outcome at week 10, using only baseline variables, the models achieved AUC = 0.56–0.64; using data from more time points (weeks 1, 4, and 6) improved the performance to AUC = 0.72–0.74. After incorporating prediction uncertainties and stratifying the model decisions based on model confidence, we could achieve accuracies above 0.8 for ~50% of patients in five out of the six clinical scenarios.

Conclusion

We constructed prediction models utilizing a recurrent neural network architecture tailored to clinical scenarios derived from a time series dataset. One crucial aspect we incorporated was the consideration of uncertainty in individual predictions, which enhances the reliability of decision‐making based on the model's output. We provided evidence showcasing the significance of leveraging time series data for achieving more accurate treatment outcome prediction in the field of psychiatry.

Keywords: psychosis prognosis prediction, machine learning, uncertainty‐aware decision making, precision psychiatry


Significant outcomes

  • We built a model for predicting multiple outcomes that can handle time series data; the model's performance improved when receiving more information over time.

  • The model incorporates the uncertainty of individual predictions in the decision‐making process; we demonstrated that this results in safer prognostic decisions.

  • Models with these properties (multi‐task, time series based, considering uncertainty) can be translated into prediction tools for clinical practice.

Limitations

  • Due to the design of the OPTiMiSE trial, the sample size for predictions at week 10 was up to four times smaller than that for predictions at week 4.

  • Further validation on external datasets is needed.

  • All patients in the dataset used amisulpride; generalization of our model to patients receiving other treatments needs to be investigated.

1. INTRODUCTION

There is an abundance of research into predictors of outcome in psychosis, but clinicians are still unable to reliably predict either the disease course or the success of (pharmacological) treatment interventions for an individual patient. A possible way forward is the use of machine learning techniques.1, 2 In psychiatry research, machine learning techniques are increasingly being used, particularly in psychotic disorders. 3 Several studies4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 examined illness progression in existing psychotic disorders, each predicting different outcomes. Of these, two studies4, 14 aimed to predict antipsychotic treatment response. In an open‐label randomized clinical trial of five broadly used antipsychotics (N = 334), 4 clinical and sociodemographic variables were used as inputs to a support vector machine to predict the level of functioning at 4 and 52 weeks after the start of antipsychotic treatment in patients with first‐episode psychosis, with an accuracy of 71%–72%. Another study 14 predicted response to asenapine in a double‐blind, placebo‐controlled trial including 532 patients and found that early improvement of several individual symptoms predicted treatment response with an accuracy of 78%–85%.

The aforementioned psychosis prognosis prediction studies have noteworthy limitations that hinder their practical use as prediction tools in day‐to‐day clinical practice. Firstly, the importance of different outcomes may vary for individual patients, and therefore, clinicians and patients should have the ability to choose the relevant outcomes to be predicted. However, most existing prediction models in psychosis research are single‐task models, focusing on predicting only one outcome measure. To overcome this limitation, we employ a multi‐task learning 15 approach in our study, training a model to predict multiple outcome measures simultaneously.

Secondly, in clinical practice, it is crucial to have an adaptive prediction tool that can accommodate the changes in a patient's status and incorporate additional information obtained during each visit. Traditional machine learning methods used in psychosis prognosis prediction lack the ability to accommodate the dynamic nature of patients' status. To address this limitation, we propose employing a machine learning approach capable of making predictions based on multiple assessments over time. One such approach is long short‐term memory (LSTM), a type of recurrent neural network that has been successfully used in various healthcare domains. 16
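To make this mechanism concrete, the following minimal NumPy sketch shows how an LSTM cell carries information forward across successive visits; the class, weight shapes, and initialization are purely illustrative and are not the implementation used in this study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTMCell:
    """Illustrative LSTM cell: gates decide what to keep from past visits."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
        g = np.tanh(g)                                # candidate cell state
        c = f * c + i * g                             # blend old and new information
        h = o * np.tanh(c)                            # hidden state passed onward
        return h, c

def encode_visits(cell, visits):
    """Run the cell over per-visit feature vectors (W0, W1, ...)."""
    h = np.zeros(cell.n_hidden)
    c = np.zeros(cell.n_hidden)
    for x in visits:
        h, c = cell.step(x, h, c)
    return h  # a time-varying summary of the patient's course so far
```

Each call to `step` updates the cell's memory with one visit, so the final hidden state summarizes the whole assessment history rather than a single snapshot.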

Thirdly, as machine learning models, like human beings, can be uncertain about their predictions, the clinician needs to be informed about the uncertainty involved in model predictions. Knowing that the model is (very) sure about a certain prediction, the clinician may more confidently integrate the machine's prediction with their own judgment. On the other hand, when the model is unsure about a certain prediction, the clinician may opt not to use the machine's prediction as a guide to treat the patient. To date, most models used for treatment outcome prediction do not incorporate the uncertainty in the estimated model parameters (i.e., the epistemic uncertainty 17 ) into the model predictions, so it is unclear how far we can trust their predictions. Therefore, it is desirable to integrate the model uncertainty into the predictions to facilitate the clinical usage of the model and to allow for more trustworthy decision‐making.18, 19 Such an improvement eventually results in safer prediction models and will reduce the risk of making wrong decisions, for example, by tapering or switching antipsychotic medication too early or unnecessarily late.

In our study, we present a machine learning framework that predicts multiple outcomes based on longitudinal patient data while integrating prediction uncertainty to facilitate more reliable clinical decision‐making. This prediction model was trained using data from the OPTiMiSE study, an international multicenter prospective clinical research trial. 20 By addressing the aforementioned limitations (which are discussed in detail in the Supporting Information S12), we aim to enhance the applicability and trustworthiness of prediction models in guiding clinical practice and optimizing treatment strategies for patients with psychosis.

2. MATERIALS AND METHODS

2.1. The Psychosis Prognosis Predictor

Treatment of a first‐episode patient is a sequence of (re)evaluations of the patient's status and the effects of treatment thus far, followed by decisions about (changing) treatment. At each time point, the psychiatrist integrates newly available data with information gathered in the past. To be useful in clinical practice, a machine‐learning prediction tool must do the same. In this study, the functioning of the prediction model is evaluated using data from the OPTiMiSE study. 20

2.2. The dataset

The OPTiMiSE study 20 is a large, international, multicenter antipsychotic three‐phase switching study. The study was conducted in 27 sites in 14 European countries and Israel. Patients with first‐episode psychosis were examined at multiple visits and treated with antipsychotic medication that could be changed based on the patient's response. We used data from patients in phase one and phase two. In the first phase, patients (N = 446/371 started/completed) were treated with amisulpride (up to 800 mg/day) for four weeks. Patients who then met the criteria for symptomatic remission did not continue to the next phase. Patients not in remission went on to phase two (N = 93/72 started/completed) and either continued using amisulpride or switched to olanzapine (≤20 mg/day) for six weeks. Patient characteristics at the start of each treatment phase are shown in Table 1.

TABLE 1.

Patient characteristics at the start of each treatment phase.

Phase one (N = 446) Phase two (N = 93)
Age (years) 26.0 (6.0) 25.2 (5.4)
Sex
Women 134 (30%) 23 (25%)
Men 312 (70%) 70 (75%)
Race
White 386 (87%) 86 (92%)
Other 60 (13%) 7 (8%)
Education (years) a 12.3 (3.0) 11.9 (2.7)
Living status
Independently 83 (19%) 20 (22%)
With assistance 363 (81%) 73 (78%)
Employment status
Employed or student 185 (41%) 33 (35%)
Unemployed 261 (59%) 60 (65%)
Disease type b
Schizophreniform disorder 190 (43%) 28 (30%)
Schizoaffective disorder 27 (6%) 2 (2%)
Schizophrenia 229 (51%) 63 (68%)
Comorbid major depressive disorder 34/429 (8%) 9/91 (10%)
Suicidality 55/429 (13%) 10/91 (11%)
Substance abuse or dependence in the past 12 months 75/429 (17%) 9/91 (10%)
Type of care at baseline
Inpatient 276 (62%) 53 (57%)
Outpatient 170 (38%) 40 (43%)
Duration of untreated psychosis (months) 6.3 (6.2) 8.4 (7.3)
Antipsychotic naïve 187 (42%) 54 (58%)
Clinical scores c
PANSS total score 78.2 (18.7) 85.7 (16.4)
PANSS Positive subscale 20.2 (5.5) 21.7 (5.1)
PANSS Negative subscale 19.4 (7.1) 22.4 (7.0)
PANSS General subscale 38.6 (9.8) 41.6 (9.3)
CGI severity d 4.5 (0.9) 4.7 (0.8)
Depression score e 13.5 (4.6) 14.2 (4.8)
BMI 23.4 (5.0) 23.9 (4.3)

Note: Values are mean (sd), n (%), or n/N (%) (because of incomplete data).

Abbreviations: BMI, body‐mass index (kg/m2); CGI, clinical global impression; PANSS, Positive and Negative Syndrome Scale.

a In school from age 6 years onwards.

b According to the Mini International Neuropsychiatric Interview (suicidality: medium to high suicide risk).

c Scores range from 30 to 210 (total score), 7–49 (positive and negative subscales), and 16–112 (general subscale); high scores indicate severe psychopathology.

d Scores range from 1 to 7; high scores indicate increased severity of illness.

e According to the Calgary Depression Scale for Schizophrenia. Scores range from 0 to 27; high scores indicate increased depression.

2.3. Outcome measures and predictors

Our primary outcome measure for prediction was symptomatic remission. Secondary outcome measures were clinical global remission and functional remission. Symptomatic remission was defined in the same way as in the OPTiMiSE study, according to the consensus criteria of Andreasen et al. 21 based on the Positive and Negative Syndrome Scale (PANSS), 22 albeit without the minimum duration criterion of six months. For global illness, we used the Clinical Global Impression (CGI) scale. 23 We considered a CGI score of 4 or lower as clinical global remission. For the functional outcome, we used the Personal and Social Performance (PSP) scale. We considered a global PSP score of 71 points or higher as functional remission, following Morosini's definition, in which a global PSP score from 71 to 100 points refers only to mild difficulties. 24 For an overview of all features from the OPTiMiSE study that are used as predictors in our model, see Table 2.
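The three remission definitions above can be expressed directly as labeling rules. The sketch below assumes the eight commonly cited Andreasen consensus PANSS items (P1, P2, P3, N1, N4, N6, G5, G9, each rated mild or less); these item codes are not listed explicitly in the text and should be checked against the cited criteria.

```python
# The eight PANSS items commonly cited for the Andreasen consensus criteria;
# an assumption here, to be verified against Andreasen et al.
ANDREASEN_ITEMS = ["P1", "P2", "P3", "N1", "N4", "N6", "G5", "G9"]

def symptomatic_remission(panss: dict) -> bool:
    """All eight consensus items rated mild or less (<= 3)."""
    return all(panss[item] <= 3 for item in ANDREASEN_ITEMS)

def clinical_global_remission(cgi_severity: int) -> bool:
    """A CGI severity score of 4 or lower counts as clinical global remission."""
    return cgi_severity <= 4

def functional_remission(psp_global: int) -> bool:
    """A global PSP score of 71+ (only mild difficulties) counts as remission."""
    return psp_global >= 71
```

Note that, as in the text, the six-month duration requirement of the original consensus criteria is deliberately omitted from `symptomatic_remission`.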

TABLE 2.

The type, number, and list of features from the OPTiMiSE study that are used as predictors in our model.

Module Type Number of features Features
Static input features Demographic 20 Age (con), Sex (bin), Race (cat), Immigration status (bin), Marital status (bin), Divorce status (bin), Occupation status (bin), Occupation type (cat), Previous occupation status (bin), Previous occupation type (cat), Father's occupation (cat), Mother's occupation (cat), Years of education (con), Highest education level (cat), Father's highest degree (cat), Mother's highest degree (cat), Living status (bin), Dwelling (cat), Income source (cat), Living environment (cat)
Diagnostic 7 DSM‐IV classification (cat), Duration of the current psychotic episode (con), Current psychiatric treatment (cat), Psychosocial interventions status (bin), Estimated prognosis (cat), Hospitalization status (bin)
Lifestyle 7 Recreational drugs history (bin), Recreational drugs since last visit (bin), Caffeine drinks per day (con), Last caffeine drink (cat), Drink Alcohol (bin), Alcoholic drinks in the last year (cat), Smoking status (bin)
Somatic 11 Height (con), Weight (con), Waist (con), Hip (con), BMI (con), Systolic blood pressure (con), Diastolic blood pressure (con), Pulse (con), ECG abnormality (bin), Last mealtime (cat), Last meal type (cat)
Treatment 1 Average medication dosage (con)
CDSS 9 Calgary Depression Scale for Schizophrenia (con)
SWN 20 Subjective well‐being under Neuroleptic Treatment Scale (con)
MINI 67 Mini International Neuropsychiatric Interview (bin)
Dynamic input features PANSS 30 Positive and Negative Syndrome Scale (con)
PSP 5 Personal and Social Performance Scale (con)
CGI‐S 2 Clinical Global Impression Scale severity and improvement (con)

Abbreviations: bin, binary measure; cat, categorical measure; con, continuous measure.

Patients were assessed at various time points during different phases of the study. These assessments included baseline (week 0, W0), the end of phase one (week four, W4), and the end of phase two (week ten, W10), as well as additional assessments at weeks one, two, six, and eight (W1, W2, W6, and W8, respectively). These frequent assessments allow for a comprehensive evaluation of a patient's status and enable the tracking of changes over time. By incorporating data from these multiple time points, our study aims to capture the dynamic nature of the disease and improve the accuracy of psychosis prognosis prediction.

2.4. The design of the Psychosis Prognosis Predictor

We introduce a multi‐modal, time‐aware, and multi‐task recurrent neural network architecture designed specifically for psychosis prognosis prediction. This architecture is capable of handling multi‐modal data from various sources, capturing the dynamic nature of the data as it evolves over time, and simultaneously predicting multiple outcome measures. The proposed architecture, depicted in Figure 1, comprises four conceptual modules that work synergistically to predict the outcomes (for a detailed description about the model architecture, see Supporting Information S12):

  1. Static module: which receives input features that do not change over time (i.e., the static features, see Table 2) and preprocesses them by imputing missing values, scaling the continuous features, and one‐hot encoding the categorical features.

  2. Dynamic module: which receives input features that change over time (i.e., the dynamic features, see Table 2). This module includes modality‐specific LSTM units, a recurrent neural network architecture 25 that is well suited for making predictions on time series data.26, 27 Each LSTM unit transforms the dynamic features from baseline (W0) up to a user‐defined endpoint t into a time‐varying middle representation.

  3. Regression module: which receives the outputs of the static and dynamic modules to predict the dynamic data at the next time point t + 1. The predicted outputs can be concatenated to the dynamic inputs at time point t and earlier, and fed again to the dynamic module for predicting the measures at t + 2. This recursive procedure can be employed to predict outcomes arbitrarily far into the future.

  4. Classification module: which receives the same inputs as the regression module and predicts the probability of target classes (not‐remitted or remitted) at time t + 1 for three outcome measures (symptomatic remission, clinical global remission, and functional remission).
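The interplay of the four modules, in particular the recursive prediction loop, can be sketched as follows; the `encode`, `regress`, and `classify` functions below are toy stand-ins for the trained LSTM, regression, and classification modules, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_static, n_dyn = 3, 4

# Illustrative stand-ins for trained module weights (assumptions, not fitted):
W_reg = rng.normal(0, 0.1, (n_dyn, n_static + n_dyn))
W_cls = rng.normal(0, 0.1, (3, n_static + n_dyn))

encode = lambda seq: np.mean(seq, axis=0)            # stand-in for the LSTM units
regress = lambda z: W_reg @ z                        # regression module: t -> t + 1
classify = lambda z: 1 / (1 + np.exp(-(W_cls @ z)))  # SR, CR, FR probabilities

def roll_forward(static_vec, dynamic_seq, n_future_steps):
    """Recursive prediction loop: predict the next visit's dynamic measures,
    append them to the input sequence, and repeat until the target time point."""
    seq = list(dynamic_seq)
    for _ in range(n_future_steps):
        h = encode(seq)                          # dynamic module output
        fused = np.concatenate([static_vec, h])  # fusion of static + dynamic
        seq.append(regress(fused))               # feed the prediction back
    fused = np.concatenate([static_vec, encode(seq)])
    return classify(fused)                       # one probability per outcome
```

The loop mirrors the thick yellow arrow in Figure 1: each predicted visit is appended to the observed sequence, so the same model can predict outcomes one or several visits ahead.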

FIGURE 1.

The psychosis prognosis predictor architecture consists of four layers that are organized into four conceptual modules. The layers include (1) the representation learning layer that learns a middle representation for dynamic features; (2) the fusion layer that merges the preprocessed static features with dynamic middle representations; (3) the interaction layer that seeks to benefit from interaction between static features and dynamic features from different modalities; (4) the output layer that predicts the outputs at the next time step. The modules include (1) the static module for preprocessing and merging the static features; (2) the dynamic module that includes LSTM units for learning middle representation for dynamic features from time 0 to time t; (3) the regression module for predicting the dynamic measures at the next time step (t + 1), and (4) classification module for predicting the outcomes (CR, clinical global remission; FR, functional remission; SR, symptomatic remission) at the next time step. The prediction loop from the output of the regression module to the inputs of the dynamic module (the thick yellow arrow) enables the network to predict the outcomes at an arbitrary future point.

2.5. From predictions to uncertainty‐aware clinical decisions

In general, the probabilities predicted by a classifier are used as outcomes for clinical decision‐making, by discretizing the probabilities into classes of decisions by imposing a hard threshold (e.g., 0.5 in binary classification). However, a classifier, like a human being, can sometimes be unsure about its predictions. How sure a model is about its predictions can be quantified by incorporating the epistemic uncertainty,17, 28 that is, the uncertainty in the model parameters, into its predictions. Now, the challenge is to combine the predicted probabilities and their estimated uncertainties into final clinical decisions.

In this paper, we use a fuzzy logic 29 approach for translating the predictions of the model into uncertainty‐aware clinical decisions. Fuzzy logic provides a mathematical framework for representing vague and imprecise information. We employ Mamdani's rule‐based fuzzy inference procedure. 30 Using five fuzzy membership functions, the predicted class probabilities are combined with their associated uncertainties, and then are transformed to one out of seven clinical decisions, namely ‘definite no‐remission (DN)’, ‘probable no‐remission (PN)’, ‘unsure no‐remission (UN)’, ‘unsure (US)’, ‘unsure remission (UR)’, ‘probable remission (PR)’, and ‘definite remission (DR)’; (see Supporting Information S12 for a detailed description of the procedure).
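A simplified stand-in for this mapping is sketched below; the thresholds are illustrative only and do not reproduce the fitted Mamdani membership functions described in the Supporting Information.

```python
def uncertainty_aware_decision(p, u, unsure_u=0.15, definite_u=0.05):
    """Map a predicted remission probability p and its epistemic uncertainty u
    to one of the seven decision categories. The thresholds are illustrative
    assumptions, not the paper's fuzzy membership functions."""
    if u >= unsure_u and 0.35 <= p <= 0.65:
        return "US"   # unsure: the model says "I do not know"
    remission = p >= 0.5
    if u >= unsure_u:
        return "UR" if remission else "UN"   # unsure remission / no-remission
    if u <= definite_u and (p >= 0.85 or p <= 0.15):
        return "DR" if remission else "DN"   # definite categories
    return "PR" if remission else "PN"       # probable categories
```

Even this crude rule set captures the key property of the framework: a confident probability near 0 or 1 yields a "definite" decision, whereas high epistemic uncertainty pushes the same probability into an "unsure" category.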

Figure 2A shows how the fuzzy logic framework modifies the predicted probability of remission based on the estimated model uncertainty. Figure 2B shows how the decision surface is divided between seven categories of decisions. These uncertainty‐aware categorical decisions can serve as meta‐information that aids clinicians in safer AI‐aided decision‐making. For example, if a decision lies in one of the "unsure" categories, the clinician can ignore the model prediction and rely on other sources of information (e.g., a second opinion from a colleague or gathering more information about the patient).

FIGURE 2.

(A) The modified probability of symptomatic remission (color scale) after adjusting the predicted probability (y‐axis) based on model uncertainty (x‐axis). For example, p = 1.00 (i.e., the model predicts 100% remission for a certain patient) can be transferred to a value between 0.60 and 1.00 depending on the level of model uncertainty. This modification results in better‐calibrated predictions of the probability of remission that are thus more suitable for clinical usage. Calibration of prediction models is a critical but often neglected factor,31, 32 especially in identifying the threshold of risks in clinical decision‐making. 33 (B) The span of decision surfaces for seven categories of clinical decisions. The fuzzy logic approach for decision‐making enables the prediction model to also say "I do not know" when the decision lies in the unsure (US) category. This is a crucial feature for safer application of ML models in clinical settings. 18 Furthermore, psychiatrists can refrain from relying on model predictions when the decisions lie in the unsure remission (UR) or unsure no‐remission (UN) categories to reduce the risk of wrong decisions. The other categories are 'definite no‐remission (DN)', 'probable no‐remission (PN)', 'probable remission (PR)', and 'definite remission (DR)'.

2.6. Model training procedure and evaluation

For more robust training of a complex model on small data, we pretrained the model on synthetic data and used data augmentation techniques (see Supporting Information S12). Furthermore, we used dropout in the proposed neural network architecture, with a two‐fold advantage: during training, it prevents the network from overfitting 34 ; in the prediction phase, it enables estimating the uncertainty in the predictions. 35 The estimated uncertainties are used in the proposed decision‐making module (see section From predictions to uncertainty‐aware clinical decisions) to translate the model predictions of outcomes into risk‐aware clinical decisions.
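The dual use of dropout can be illustrated with a Monte Carlo dropout sketch: dropout is kept active at prediction time, and the spread over repeated stochastic forward passes approximates the epistemic uncertainty. The toy model and weights below are illustrative assumptions, not the trained network.

```python
import numpy as np

def mc_dropout_predict(forward, x, n_samples=100, seed=0):
    """Monte Carlo dropout: run many stochastic forward passes with dropout
    still active; the mean is the prediction, the spread its uncertainty."""
    rng = np.random.default_rng(seed)
    preds = np.array([forward(x, rng) for _ in range(n_samples)])
    return preds.mean(), preds.std()

# Toy stochastic model (an assumption): a linear score with input dropout.
w = np.array([0.8, -0.5, 0.3])

def forward(x, rng, p_drop=0.2):
    mask = rng.random(x.shape) >= p_drop   # Bernoulli dropout mask
    z = w @ (x * mask / (1 - p_drop))      # inverted-dropout scaling
    return 1 / (1 + np.exp(-z))            # predicted P(remission)
```

The standard deviation returned here plays the role of the per-patient uncertainty that the fuzzy decision-making module consumes.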

In this study, the classification performance of the proposed architecture was evaluated using 20 repetitions of two cross‐validation strategies: (1) 10‐fold cross‐validation and (2) one‐site‐out cross‐validation. The repeated cross‐validation procedures helped account for variations with data perturbation and ensure reliable estimates of the model's generalization performance. For each repetition of the cross‐validation, evaluation metrics were calculated to measure the classification performance. The metrics used in this study include:

  1. Area Under the Receiver Operating Characteristic Curve (AUC): quantifies the overall discriminative power of the model. It represents the ability of the model to distinguish between the positive and negative classes.

  2. Balanced Accuracy (BAC)

  3. Sensitivity

  4. Specificity
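As an illustration of this evaluation setup, the sketch below implements a rank-based AUC (without tie handling) and a leave-one-site-out loop; the `fit` and `predict` callables are placeholders for the actual model, and all names are illustrative.

```python
import numpy as np

def auc(y_true, scores):
    """Rank-based AUC: the probability that a random positive outscores a
    random negative (ties not handled in this sketch)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def leave_one_site_out(X, y, sites, fit, predict):
    """Train on all sites but one, score the held-out site, and repeat."""
    scores = np.empty(len(y))
    for site in np.unique(sites):
        test = sites == site
        model = fit(X[~test], y[~test])
        scores[test] = predict(model, X[test])
    return auc(y, scores)
```

Leaving out whole sites, rather than random folds, tests whether the model generalizes to patients from centers it has never seen, which is why the one-site-out AUCs in Table 3 tend to be slightly lower than the 10-fold ones.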

2.7. Experimental setup

We use the model to predict the outcomes at four weeks (W4) and ten weeks (W10) following the initiation of treatment (W0). In order to assess the impact of including patient status information obtained during the treatment phase on the accuracy of the predictions, we conducted a performance comparison of the predictor when using different lengths of data points over time, ranging from W1 to W6 (as illustrated in Figure 3). This evaluation was carried out across six distinct clinical scenarios (S1–6): Predicting W4‐outcomes based on data at W0 (S1) or W0 + W1 (S2), and predicting W10‐outcomes based on data at W0 (S3), W0 + W1 (S4), W0 + W1 + W4 (S5), or W0 + W1 + W4 + W6 (S6) (see Figure 3), allowing us to examine the predictive capabilities of the model under various conditions.

FIGURE 3.

Prognosis prediction models in six clinical scenarios. Static and dynamic features are represented as red and black blocks, respectively. (S1) Only information from W0 is used for predicting the outcome at W4. (S2) Prediction at W4 is made after a 1‐week follow‐up by adding the dynamic information from W1. (S3) Only information from W0 is used for predicting the outcome at W10. (S4) Prediction at W10 is made after a 1‐week follow‐up by adding the dynamic information from W1. (S5) All the information from the first phase of the study is used to predict the outcome at W10. (S6) The dynamic information from W6 is also used as input data to the model for prediction at W10.

3. RESULTS

3.1. More data over time results in higher prediction accuracy

As summarized in Table 3 and Figure 4, using one‐site‐out cross‐validation, the AUC for the 4‐week outcomes in scenarios S1 and S2 ranged from 0.66 for functional remission to 0.71 for symptomatic remission. For the 10‐week outcomes, the AUC ranged from 0.72 for functional remission to 0.74 for symptomatic remission (for balanced accuracy, sensitivity, and specificity, see sTables 1–3 in the Supplementary Tables S11 and the supplementary sFigures). Across all outcome measures, the AUC of the 4‐week predictions improved by 0.04–0.05 when not only baseline data (W0) but also data after one week (W1) were used. For 10‐week predictions, the use of all time series data improved the AUC by 0.08–0.17 across all outcome measures.

TABLE 3.

Performance of the prediction models predicting three outcome measures (symptomatic remission, clinical global remission, and functional remission) for six clinical scenarios (S1–S6).

Clinical Scenario N Symptomatic remission Clinical global remission Functional remission
AUC 10‐fold One‐site‐out 10‐fold One‐site‐out 10‐fold One‐site‐out
S1 371 0.701 (0.015) 0.664 (0.014) 0.708 (0.018) 0.677 (0.014) 0.668 (0.021) 0.622 (0.019)
S2 371 0.733 (0.011) 0.706 (0.010) 0.743 (0.009) 0.720 (0.011) 0.712 (0.018) 0.662 (0.018)
S3 72 0.573 (0.031) 0.586 (0.029) 0.560 (0.035) 0.560 (0.028) 0.642 (0.056) 0.643 (0.039)
S4 72 0.640 (0.025) 0.635 (0.043) 0.602 (0.038) 0.598 (0.027) 0.669 (0.045) 0.641 (0.052)
S5 72 0.666 (0.025) 0.663 (0.043) 0.677 (0.028) 0.682 (0.032) 0.691 (0.042) 0.678 (0.074)
S6 72 0.746 (0.030) 0.744 (0.022) 0.747 (0.028) 0.729 (0.024) 0.746 (0.059) 0.720 (0.053)

Note: Performance is measured by the area under the receiver operating characteristic curve (AUC). The values are averaged over 20 repetitions of 10‐fold and one‐site‐out cross‐validation. The values in the parentheses represent the standard deviation over these repetitions. S1 and S2: although 446 subjects entered phase one of the study, due to dropout of 75 subjects during this phase, the number of subjects used in these models is 371. S3–S6: 250 subjects achieved symptomatic remission after phase one (and therefore did not continue to phase two), and there was an additional dropout of 28 subjects between phase one and two. Although thus 93 subjects entered phase two, due to dropout of 21 subjects during this phase, the number of subjects used in these models is 72.

FIGURE 4.

AUCs of the model across three outcome measures (first column: Symptomatic remission, second column: Clinical global remission, and third column: Functional remission) for six clinical scenarios. The x‐axes represent the clinical scenarios in phase one (S1 and S2) and phase two of the study (S3, S4, S5, and S6). The y‐axis shows the AUC. The blue and red lines represent the results for 10‐fold and one‐site‐out cross‐validation, respectively. The error bars show the standard deviation of performance across 20 repetitions. The results in the first row show the AUCs in phase one in a 4‐week prediction. The added use of time point W1 increases the AUC for all outcome measures, with added AUC ranging from 0.03 to 0.05. The second row shows the results of phase two in a 10‐week prediction. Except for one instance (input: W0–W1, outcome: Functional remission; validation: One‐site out), each added time point further increases the prediction performance for all outcome measures.

3.2. Incorporating model uncertainty reduces the risk of decision making

To quantitatively evaluate the advantage of using uncertainty‐aware predictions, we evaluated the prediction accuracy for symptomatic remission in six clinical scenarios at four levels of conservativeness:

  • Level 0, in which the trivial threshold‐based approach is used for decision‐making. A hard threshold of 0.5 is applied to the predicted probability of remission to decide between non‐remission (below the threshold) and remission (equal to or above the threshold); that is, the proposed decision‐making method is not used.

  • Level 1, in which the clinician abstains from utilizing the model's predictions that lie in the ‘unsure (US)’ category (when the model says “I do not know”).

  • Level 2, in which predictions from the three most uncertain prediction categories (US, UR, and UN) are not used for clinical decision‐making.

  • Level 3, the most conservative usage of model predictions, in which only the most certain decisions of the model (the DR and DN categories) are employed by clinicians for decision‐making.
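The four levels above amount to filtering decisions by category before scoring. A minimal sketch, with hypothetical per-patient category labels:

```python
import numpy as np

LEVELS = {  # categories excluded from decision-making at each level
    0: set(),
    1: {"US"},
    2: {"US", "UR", "UN"},
    3: {"US", "UR", "UN", "PR", "PN"},  # keep only DR and DN
}

def decisiveness_and_accuracy(categories, predictions, y_true, level):
    """Proportion of patients with a usable decision, and accuracy among them."""
    keep = np.array([c not in LEVELS[level] for c in categories])
    decisiveness = keep.mean()
    accuracy = (predictions[keep] == y_true[keep]).mean() if keep.any() else np.nan
    return decisiveness, accuracy
```

Raising the level can only shrink the set of retained patients, so decisiveness falls monotonically while accuracy is computed on an increasingly confident subset, which is exactly the trade-off reported in Table 4.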

The results of applying increasing levels of conservativeness are presented in Table 4. An incremental trend in the accuracy of decisions is seen when raising the conservativeness level from 0 to 3 (see also Figure 5), which is, naturally, accompanied by a decrease in the number of patients for whom an ML‐aided decision is made (represented by decisiveness in the table). At level 1, by excluding ~10% of decisions in the US category, the accuracy of the model improves by ~0.06 across all clinical scenarios. At level 2, excluding ~50% of decisions (the uncertain predictions in US, UR, and UN) results in a further increase in accuracy to ~0.86. By restricting decision‐making to the DR and DN categories at level 3, the accuracy of the model increases to ~0.95 for the ~16% of patients with decisions. Thus, clinicians can trust the DR and DN decisions with 0.95 confidence (although without being able to use decisions for ~84% of their patients). This is a crucial feature for more trustworthy decision‐making in clinics because the users (i.e., clinicians) not only receive an ML‐aided, data‐driven recommendation from the machine but are also informed about the risk involved in relying on these predictions. The more confidence in the model's predictions (in the DR and DN categories), the less risk is involved in AI‐aided decision‐making.

TABLE 4.

The decisiveness (the proportion of decided sample to total sample) and the accuracy of decision for symptomatic remission, for six clinical scenarios (rows) and four decision levels (columns).

Clinical scenario Decisiveness Accuracy
Level 0 Level 1 Level 2 Level 3 Level 0 Level 1 Level 2 Level 3
S1 1.00 (0.00) 0.88 (0.02) 0.48 (0.03) 0.15 (0.02) 0.65 (0.02) 0.70 (0.02) 0.86 (0.02) 0.97 (0.01)
S2 1.00 (0.00) 0.90 (0.02) 0.53 (0.03) 0.18 (0.02) 0.69 (0.01) 0.73 (0.01) 0.87 (0.01) 0.97 (0.01)
S3 1.00 (0.00) 0.91 (0.03) 0.57 (0.08) 0.21 (0.06) 0.52 (0.03) 0.56 (0.02) 0.73 (0.05) 0.90 (0.04)
S4 1.00 (0.00) 0.91 (0.02) 0.50 (0.04) 0.16 (0.04) 0.57 (0.03) 0.62 (0.02) 0.81 (0.02) 0.94 (0.02)
S5 1.00 (0.00) 0.89 (0.04) 0.43 (0.06) 0.15 (0.05) 0.61 (0.03) 0.67 (0.04) 0.86 (0.03) 0.95 (0.02)
S6 1.00 (0.00) 0.91 (0.03) 0.48 (0.05) 0.18 (0.04) 0.68 (0.03) 0.72 (0.03) 0.89 (0.03) 0.96 (0.02)
Median 1.00 0.90 0.48 0.16 0.61 0.67 0.86 0.95

Note: The values are averaged over 20 repetitions of 10‐fold cross‐validation. The values in the parentheses represent the standard deviation over these repetitions.

FIGURE 5.

A comparison between the accuracy and decisiveness of ML‐aided decision‐making at four levels of conservativeness (0–3). The boxplots represent the average accuracy (left) and decisiveness (right) over 20 repetitions of 10‐fold cross‐validation across six clinical scenarios. Using more certain predictions for decision‐making results in less decisive but more accurate models.

4. DISCUSSION

This study set out to build a prediction model with the potential to serve as a tool for clinical decision‐making. To the best of our knowledge, this is the first model that fulfills three crucial criteria for use in clinical practice (for a more detailed comparison between our approach and more common machine learning models, see Supporting Information S12). Firstly, by using a recurrent neural network architecture, the model was trained on time series data. Previous studies4, 5, 6, 7, 8 used only baseline data in their prediction models and thus cannot accommodate time series data within the same model. In contrast, the proposed architecture provides the flexibility to add data over time as the status of the patient develops after receiving a certain treatment. Unlike other prediction models, it therefore better fits real‐world data, 36 where a patient is a dynamic entity who is assessed regularly.

Secondly, our architecture is multi‐task, so one prediction model can predict multiple outcome measures simultaneously. This feature was highlighted as important by our panel of patient and doctor advisors, which regularly contributes to our understanding of real‐world patient and doctor needs. The involvement of such panels has been found to be crucial in building trust in AI solutions in healthcare, among both patients and doctors. 37 In previous studies, predicting multiple outcome measures required separate prediction models, one for each outcome measure4, 5, 6, 7, 38, 39, 40, 41, 42, 43, 44, 45; in this study, we predicted symptomatic, clinical global, and functional remission with a single model. The multi‐task model offers clinicians and their patients a way to see the predicted (differential) effects of treatment across several domains.
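The shape of this multi‐task, time‐flexible setup can be illustrated with a minimal sketch: one shared recurrent encoder digests however many visits are available, and three output heads read the three remission outcomes from the same hidden state. A plain tanh RNN cell stands in for the LSTM here, and the feature dimension and all weights are random placeholders, not the trained model.

```python
import numpy as np

# Minimal sketch of a shared recurrent encoder with three outcome heads.
# A tanh RNN cell stands in for the LSTM; weights are random, for shape only.
rng = np.random.default_rng(0)
N_FEAT, N_HID = 12, 16                     # per-visit features, hidden units
Wx = rng.normal(0, 0.3, (N_HID, N_FEAT))   # input-to-hidden weights
Wh = rng.normal(0, 0.3, (N_HID, N_HID))    # hidden-to-hidden weights
heads = {o: rng.normal(0, 0.3, N_HID)      # one linear head per outcome
         for o in ("symptomatic", "clinical_global", "functional")}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(visits):
    """visits: array of shape (n_timepoints, N_FEAT); n_timepoints may vary."""
    h = np.zeros(N_HID)
    for x in visits:                        # unroll over the available visits
        h = np.tanh(Wx @ x + Wh @ h)        # shared recurrent encoder
    return {o: sigmoid(w @ h) for o, w in heads.items()}

# The same model accepts a baseline-only patient and one with three visits.
p1 = predict(rng.normal(size=(1, N_FEAT)))   # baseline only
p3 = predict(rng.normal(size=(3, N_FEAT)))   # baseline plus two follow-ups
print(sorted(p1), [round(float(v), 2) for v in p3.values()])
```

The point of the sketch is structural: because the encoder is unrolled over whatever visits exist, no retraining or second model is needed when a new time point arrives, and all three outcomes come from one forward pass.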

Thirdly, we used the uncertainty of predictions to adjust the prediction accuracy, implemented in a novel decision‐making module. Clinicians and their patients thus receive additional information about how sure the machine is about an individual prediction, which improves the chance of making the right treatment decision. We consider this an important feature, given the potential consequences of wrong treatment decisions: in psychosis, for example, unnecessary side effects of antipsychotic medication or a longer duration of untreated psychosis. Models that incorporate predictive uncertainty will help create more trust with the physicians (and patients) using them. Furthermore, the ability to say “I don't know” when the model is uncertain about an individual prediction is a necessary feature for the safe translation of machine learning models to clinical practice. 18 With this flexible multi‐task recurrent architecture that incorporates the uncertainty of individual predictions, we took a step forward toward improving patient care with the help of machine learning prediction models.
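The actual module uses Gaussian membership functions and Mamdani fuzzy inference (Figures S2 and S3); the crude threshold version below only illustrates the underlying idea: the wider the gap between the worst‐case (p_w) and best‐case (p_b) probability of remission, the less definite the decision category. The thresholds (0.5, 0.2, the 0.4–0.6 band) are invented for this sketch and are not the paper's calibrated rules.

```python
# Crude, threshold-based stand-in for the fuzzy decision module. Maps the
# predicted probability of remission (p) and its worst/best-case bounds
# (p_w, p_b) to the paper's seven categories. Thresholds are illustrative.

def decide(p, p_w, p_b):
    """Return one of DR, PR, UR, US, UN, PN, DN."""
    width = p_b - p_w                       # spread = model uncertainty
    side = "R" if p >= 0.5 else "N"         # remission vs. no-remission
    if p_w >= 0.5 or p_b < 0.5:             # whole interval on one side
        grade = "D" if width < 0.2 else "P"  # definite vs. probable
    else:                                    # interval straddles 0.5
        if 0.4 <= p <= 0.6:
            return "US"                      # plainly unsure
        grade = "U"                          # unsure, leaning to one side
    return grade + side                      # e.g. 'DR', 'PN', 'UN'

print(decide(0.90, 0.80, 0.95))  # tight interval above 0.5
print(decide(0.90, 0.25, 1.00))  # wide interval straddling 0.5
print(decide(0.10, 0.02, 0.18))  # tight interval below 0.5
```

A prediction like p = 0.90 with a tight interval yields a trustworthy "definite remission", while the same point estimate with bounds (0.25, 1.00), the example of Figure S3, falls back to an unsure category, which is exactly what the conservativeness levels of Table 4 then filter out.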

Considering the specific characteristics of our current solution, we found performance of up to AUC = 0.72 for the 4‐week prediction and up to AUC = 0.74 for the 10‐week prediction using leave‐one‐site‐out validation. These results are comparable to those of previously conducted studies.4, 5, 7, 9 We have shown that the use of multiple time points increased the prediction performance for all outcome measures, for both the 4‐week and 10‐week predictions.

Although the accuracy of our models does not exceed the 80% threshold suggested by the APA 46 when all patients are included, we consider that our models could still be clinically relevant once prediction uncertainty is incorporated. When uncertain predictions are discarded (our ‘decision‐making level 2’), across the six different prediction models, predictions remained possible for 43%–57% of the patients with accuracies ranging from 0.73 to 0.89, and five of our six models achieved an accuracy above 0.8. This feature could therefore be an important step toward our goal of building an interactive tool for the individual prediction of prognosis in psychosis.

In this study, we used only clinical and sociodemographic predictor variables, which require no more than a basic medical examination and questionnaires to obtain. Other studies suggest that more advanced medical tests, such as blood serum biomarkers 47 or structural MRI scans, 48 are meaningful in predicting antipsychotic treatment response. Combining these different types of predictor variables in one prediction model might be an essential next step toward higher prediction accuracies, which we will explore in future research. However, the feasibility of such a model in clinical practice must be considered, given the higher burden on patients from more invasive methods and the higher medical costs associated with them. Expanding our model with other kinds of data, specifically potentially important “easy to obtain” clinical predictors not currently available (e.g., family history, somatic comorbidity, traumatic experiences), might therefore be a more desirable way to improve accuracy and validity.

4.1. Limitations

Our LSTM model can use time series data, but currently only when data from all previous time points are available. In clinical practice, this could be a problem when a patient misses an appointment or is unable to complete certain exams or questionnaires at some point. The risk of missing values is greater for models that rely on many features. Feature selection could lower this risk, but could not be reliably implemented here (see the detailed discussion in Supporting Information S12). The problem could be solved by using LSTM models that can handle missing measurements 49 or by incorporating the lengths of the time intervals in the modeling process. 50 We consider these possible future directions to extend our work.
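A naive work‐around for a single missed visit, short of the principled missingness‐aware LSTMs and time‐interval models cited above, is last‐observation‐carried‐forward (LOCF) with an explicit imputation flag, sketched below. This is an illustration of the problem, not a method we used or recommend; the feature names are invented.

```python
# Naive LOCF repair for a missed visit so a fixed-input recurrent model can
# still run. Illustrative only; the cited missingness-aware LSTMs and
# time-aware models are the principled alternatives.

def forward_fill(visits):
    """visits: list of per-visit feature dicts; None marks a missed visit.
    Returns a complete list, copying the last observed visit forward."""
    filled, last = [], None
    for v in visits:
        if v is not None:
            last = v
        elif last is None:
            raise ValueError("baseline visit missing: cannot carry forward")
        filled.append(dict(last))
        # a flag tells the downstream model this visit was imputed
        filled[-1]["imputed"] = v is None
    return filled

visits = [{"panss": 80}, None, {"panss": 62}]   # the week-1 visit was missed
print(forward_fill(visits))
```

The imputation flag matters: without it, the model cannot distinguish a genuinely stable patient from one whose stability is an artifact of the carried‐forward value.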

Considering the data the model was tested on: all patients in the dataset used amisulpride in the first phase and amisulpride or olanzapine in the second phase of their treatment. Therefore, our model applies only to patients using amisulpride. Also, not all potentially relevant predictor variables, such as childhood adverse events, were available in our dataset. A larger dataset with a more diverse and heterogeneous sample would improve the clinical applicability of future models.

5. CONCLUSION

We developed and tested a psychosis prognosis prediction model with the properties required for use in daily clinical practice. Using a flexible multi‐task recurrent neural network architecture optimized for this goal, we showed that the ability to use time series data will be of great importance once prediction models are used in clinical care. By building a multi‐task model, different clinically relevant outcomes can be predicted simultaneously. For more reliable decision‐making, we built a decision‐making module that considers the uncertainty of individual predictions, and we demonstrated its usefulness.

FUNDING INFORMATION

This work was supported by ZonMw (project ID 63631 0011) and by a research grant from the AI for Health working group of the TU/e‐WUR‐UU‐UMCU (EWUU) alliance.

CONFLICT OF INTEREST STATEMENT

RSK reports consulting fees from Alkermes, Sunovion, Gedeon‐Richter, and Otsuka.

PEER REVIEW

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer‐review/10.1111/acps.13754.

ETHICS STATEMENT

All relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

PATIENT CONSENT STATEMENT

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

CLINICAL TRIAL REGISTRATION

All clinical trials and any other prospective interventional studies must be registered with an ICMJE‐approved registry, such as ClinicalTrials.gov. Any such study reported in the manuscript has been registered.

OPTiMiSE dataset: https://www.thelancet.com/journals/lanpsy/article/PIIS2215‐0366(18)30252‐9/fulltext.

Supporting information

Figure S1. The data augmentation process. A set of ten samples with time‐length 2–5 are generated for a sample with the length of five timepoints.

ACPS-151-280-s003.pdf (465.4KB, pdf)

Figure S2. (a) Five Gaussian membership functions for the probability of remission. These functions are used to map the values of the probability of remission (p), the worst‐case probability of remission (p w ), and the best‐case probability of remission (p b ) in the x‐axis to a membership value (between 0 and 1) in the y‐axis for ‘very low’, ‘low’, ‘medium’, ‘high’, and ‘very high’ categories; (b) Gaussian membership functions for seven clinical decisions, ‘definite no‐remission (DN)’, ‘probable no‐remission (PN)’, ‘unsure no‐remission (UN)’, ‘unsure (US)’, ‘unsure remission (UR)’, ‘probable remission (PR)’, ‘definite remission (DR)’.

ACPS-151-280-s001.pdf (113.7KB, pdf)

Figure S3. Seven rules in the proposed fuzzy inference system for translating the predicted probability of remission (p), the worst‐case probability of remission (p w ), and the best‐case probability of remission (p b ) into risk‐aware clinical decisions. The green stars show the value of the corresponding membership function in each rule for an example prediction with p = 0.9, p w  = 0.25, and p b  = 1.00. The orange and blue boxes represent the fuzzy max and min operations, respectively. The gray area in the rightmost column shows the mass under the membership function of each decision. These masses are combined using fuzzy max aggregation. The x‐coordinate of the centroid of the aggregated mass represents the uncertainty‐aware probability of remission (p~) that aggregates the model uncertainty into the final prediction.

ACPS-151-280-s009.pdf (3.8MB, pdf)

Figure S4. Balanced accuracies (BACs) of the model across three outcome measures (first column: symptomatic remission, second column: clinical global remission, and third column: functional remission) for six clinical scenarios. The x‐axes represent the clinical scenarios in phase one (S1 and S2) and phase two of the study (S3, S4, S5, and S6). The y‐axis shows the BAC. The blue and red lines represent the results for 10‐fold and one‐site‐out cross‐validation, respectively. The error bars show the standard deviation of performance across 20 repetitions. The results in the first row show the BAC in phase one in a 4‐week prediction. The added use of time point W1 increases the BAC for all outcome measures. This is mainly due to the increased sensitivity of the model when a new time point is added. The second row shows the results of phase two in a 10‐week prediction. Except for one instance (functional remission), each added time point further increases the BAC for all outcome measures. The increase in the BACs in this case is a byproduct of the increased specificity (see Figures S5 and S6) of the model when a new time point is added.

ACPS-151-280-s010.pdf (748.6KB, pdf)

Figure S5. Sensitivity (SEN) of the model across three outcome measures (first column: symptomatic remission, second column: clinical global remission, and third column: functional remission) for six clinical scenarios. The x‐axes represent the clinical scenarios in phase one (S1 and S2) and phase two of the study (S3, S4, S5, and S6). The y‐axis shows the SEN. The blue and red lines represent the results for 10‐fold and one‐site‐out cross‐validation, respectively. The error bars show the standard deviation of SENs across 20 repetitions. The results in the first row show the SEN in phase one in a 4‐week prediction. The added use of time point W1 increases the SEN for all outcome measures. The second row shows the results of phase two in a 10‐week prediction. In most cases adding a new time point results in a reduced sensitivity of the model.

ACPS-151-280-s004.pdf (684.3KB, pdf)

Figure S6. Specificity (SPC) of the model across three outcome measures (first column: symptomatic remission, second column: clinical global remission, and third column: functional remission) for six clinical scenarios. The x‐axes represent the clinical scenarios in phase one (S1 and S2) and phase two of the study (S3, S4, S5, and S6). The y‐axis shows the SPC. The blue and red lines represent the results for 10‐fold and one‐site‐out cross‐validation, respectively. The error bars show the standard deviation of SPCs across 20 repetitions. The results in the first row show the SPC in phase one in a 4‐week prediction. The added use of time point W1 slightly decreases the SPC for all outcome measures. The second row shows the results of phase two in a 10‐week prediction. In all cases adding a new time point results in higher model specificity.

ACPS-151-280-s012.pdf (699.9KB, pdf)

Figure S7. To handle dynamic patient status in outcome prediction using conventional ML approaches, we need specialized models for data collected at each visit.

ACPS-151-280-s006.pdf (29.3KB, pdf)

Figure S8. When using conventional ML approaches for outcome prediction, due to their fixed input size, we cannot feed them with accumulated data over time. We need a new model for the mixed data.

ACPS-151-280-s007.pdf (40.3KB, pdf)

Figure S9. Using conventional approaches, we should train several specialized models to accurately predict at different time points in the future.

ACPS-151-280-s002.pdf (19.4KB, pdf)

Figure S10. Using conventional single‐task approaches, we need to train one model per outcome. This is while the proposed multi‐task approach can predict several outcomes simultaneously.

ACPS-151-280-s011.pdf (39.9KB, pdf)

Tables S11. Supplementary Tables.

ACPS-151-280-s005.docx (23KB, docx)

Text S12. Supporting Information.

ACPS-151-280-s008.docx (34KB, docx)

van Opstal DPJ, Kia SM, Jakob L, et al. Psychosis Prognosis Predictor: A continuous and uncertainty‐aware prediction of treatment outcome in first‐episode psychosis. Acta Psychiatr Scand. 2025;151(3):280‐292. doi: 10.1111/acps.13754

DATA AVAILABILITY STATEMENT

All data produced in the present study are available upon reasonable request to the authors.

REFERENCES

  • 1. Bzdok D, Meyer‐Lindenberg A. Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatry Cogn Neurosci Neuroimaging. 2018;3(3):223‐230. doi: 10.1016/j.bpsc.2017.11.007 [DOI] [PubMed] [Google Scholar]
  • 2. Chekroud AM, Bondar J, Delgadillo J, et al. The promise of machine learning in predicting treatment outcomes in psychiatry. World Psychiatry. 2021;20(2):154‐170. doi: 10.1002/wps.20882 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Salazar de Pablo G, Studerus E, Vaquerizo‐Serrano J, et al. Implementing precision psychiatry: a systematic review of individualized prediction models for clinical practice. Schizophr Bull. 2021;47(2):284‐297. doi: 10.1093/schbul/sbaa120 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Koutsouleris N, Kahn RS, Chekroud AM, et al. Multisite prediction of 4‐week and 52‐week treatment outcomes in patients with first‐episode psychosis: a machine learning approach. Lancet Psychiatry. 2016;3(10):935‐946. doi: 10.1016/S2215-0366(16)30171-7 [DOI] [PubMed] [Google Scholar]
  • 5. Fond G, Bulzacka E, Boucekine M, et al. Machine learning for predicting psychotic relapse at 2 years in schizophrenia in the national FACE‐SZ cohort. Prog Neuro‐Psychopharmacol Biol Psychiatry. 2019;92:8‐18. doi: 10.1016/j.pnpbp.2018.12.005 [DOI] [PubMed] [Google Scholar]
  • 6. Leighton SP, Krishnadas R, Chung K, et al. Predicting one‐year outcome in first episode psychosis using machine learning. PLoS One. 2019;14(3):e0212846. doi: 10.1371/journal.pone.0212846 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7. De Nijs J, Burger TJ, Janssen RJ, et al. Individualized prediction of three‐ and six‐year outcomes of psychosis in a longitudinal multicenter study: a machine learning approach. NPJ Schizophr. 2021;7(1):34. doi: 10.1038/s41537-021-00162-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Austin JC, Hippman C, Honer WG. Descriptive and numeric estimation of risk for psychotic disorders among affected individuals and relatives: implications for clinical practice. Psychiatry Res. 2012;196(1):52‐56. doi: 10.1016/j.psychres.2012.02.005 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Soldatos RF, Cearns M, Nielsen MØ, et al. Prediction of early symptom remission in two independent samples of first‐episode psychosis patients using machine learning. Schizophr Bull. 2022;48(1):122‐133. doi: 10.1093/schbul/sbab107 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Lin E, Lin CH, Lane HY. Applying a bagging ensemble machine learning approach to predict functional outcome of schizophrenia with clinical symptoms and cognitive functions. Sci Rep. 2021;11(1):6922. doi: 10.1038/s41598-021-86382-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Li Y, Zhang L, Zhang Y, et al. A random Forest model for predicting social functional improvement in Chinese patients with schizophrenia after 3 months of atypical antipsychotic Monopharmacy: a cohort study. Neuropsychiatr Dis Treat. 2021;17:847‐857. doi: 10.2147/NDT.S280757 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Leighton SP, Upthegrove R, Krishnadas R, et al. Development and validation of multivariable prediction models of remission, recovery, and quality of life outcomes in people with first episode psychosis: a machine learning approach. Lancet Digit Health. 2019;1(6):e261‐e270. doi: 10.1016/S2589-7500(19)30121-9 [DOI] [PubMed] [Google Scholar]
  • 13. Basaraba CN, Scodes JM, Dambreville R, et al. Prediction tool for individual outcome trajectories across the next year in first‐episode psychosis in coordinated specialty care. JAMA Psychiatry. 2023;80(1):49‐56. doi: 10.1001/jamapsychiatry.2022.3571 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Ogyu K, Noda Y, Yoshida K, et al. Early improvements of individual symptoms as a predictor of treatment response to asenapine in patients with schizophrenia. Neuropsychopharmacol Rep. 2020;40(2):138‐149. doi: 10.1002/npr2.12103 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Caruana R. Multitask Learning. In: Thrun S, Pratt L, eds. Learning to Learn. Springer; 1998:95‐133. doi: 10.1007/978-1-4615-5529-2_5 [DOI] [Google Scholar]
  • 16. Kaushik S, Choudhury A, Sheron PK, et al. AI in healthcare: time‐series forecasting using statistical, neural, and ensemble architectures. Front Big Data. 2020;3:4. doi: 10.3389/fdata.2020.00004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Kendall A, Gal Y. What uncertainties do we need in Bayesian deep learning for computer vision? arXiv [csCV]. Published online March 15, 2017. Accessed December 9, 2022. https://proceedings.neurips.cc/paper/2017/hash/2650d6089a6d640c5e85b2b88265dc2b.Abstract.html
  • 18. Kompa B, Snoek J, Beam AL. Second opinion needed: communicating uncertainty in medical machine learning. NPJ Digit Med. 2021;4(1):4. doi: 10.1038/s41746-020-00367-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Meijerink L, Cinà G, Tonutti M. Uncertainty estimation for classification and risk prediction on medical tabular data. arXiv [statML] . Published online April 13, 2020. http://arxiv.org/abs/2004.05824
  • 20. Kahn RS, Winter van Rossum I, Leucht S, et al. Amisulpride and olanzapine followed by open‐label treatment with clozapine in first‐episode schizophrenia and schizophreniform disorder (OPTiMiSE): a three‐phase switching study. Lancet Psychiatry. 2018;5(10):797‐807. doi: 10.1016/S2215-0366(18)30252-9 [DOI] [PubMed] [Google Scholar]
  • 21. Andreasen NC, Carpenter WT Jr, Kane JM, Lasser RA, Marder SR, Weinberger DR. Remission in schizophrenia: proposed criteria and rationale for consensus. Am J Psychiatry. 2005;162(3):441‐449. doi: 10.1176/appi.ajp.162.3.441 [DOI] [PubMed] [Google Scholar]
  • 22. Kay SR, Fiszbein A, Opler LA. The positive and negative syndrome scale (PANSS) for schizophrenia. Schizophr Bull. 1987;13(2):261‐276. doi: 10.1093/schbul/13.2.261 [DOI] [PubMed] [Google Scholar]
  • 23. Guy W. ECDEU Assessment Manual for Psychopharmacology: 1976. National Institute of Mental Health; 1976. [Google Scholar]
  • 24. Morosini PL, Magliano L, Brambilla L, Ugolini S, Pioli R. Development, reliability and acceptability of a new version of the DSM‐IV social and occupational functioning assessment scale (SOFAS) to assess routine social functioning. Acta Psychiatr Scand. 2000;101(4):323‐329. https://www.ncbi.nlm.nih.gov/pubmed/10782554 [PubMed] [Google Scholar]
  • 25. Hochreiter S, Schmidhuber J. Long short‐term memory. Neural Comput. 1997;9(8):1735‐1780. doi: 10.1162/neco.1997.9.8.1735 [DOI] [PubMed] [Google Scholar]
  • 26. Lindemann B, Müller T, Vietz H, Jazdi N, Weyrich M. A survey on long short‐term memory networks for time series prediction. Procedia CIRP. 2021;99:650‐655. doi: 10.1016/j.procir.2021.03.088 [DOI] [Google Scholar]
  • 27. Fischer T, Krauss C. Deep learning with long short‐term memory networks for financial market predictions. Eur J Oper Res. 2018;270(2):654‐669. doi: 10.1016/j.ejor.2017.11.054 [DOI] [Google Scholar]
  • 28. Cox DR. Principles of Statistical Inference. Cambridge University Press; 2006. https://play.google.com/store/books/details?id=nRgtGZXi2KkC [Google Scholar]
  • 29. Zadeh LA. Fuzzy logic. Computer. 1988;21(4):83‐93. doi: 10.1109/2.53 [DOI] [Google Scholar]
  • 30. Mamdani EH. Application of fuzzy algorithms for control of simple dynamic plant. Proc Inst Electr Eng. 1974;121(12):1585‐1588. doi: 10.1049/piee.1974.0328 [DOI] [Google Scholar]
  • 31. van Calster B, McLernon DJ, van Smeden M, et al. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17(1):230. doi: 10.1186/s12916-019-1466-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Nixon J, Dusenberry M, Jerfel G, et al. Measuring calibration in deep learning. arXiv [csLG]. Published online April 2, 2019. Accessed December 9, 2022. http://openaccess.thecvf.com/content_CVPRW_2019/papers/Uncertainty_and_Robustness_in_Deep_Visual_Learning/Nixon_Measuring_Calibration_in_Deep_Learning_CVPRW_2019_paper.pdf
  • 33. Van Smeden M, Reitsma JB, Riley RD, et al. Clinical prediction models: diagnosis versus prognosis. J Clin Epidemiol. 2021;132:142‐145. doi: 10.1016/j.jclinepi.2021.01.009 [DOI] [PubMed] [Google Scholar]
  • 34. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15(1):1929‐1958. Accessed December 16, 2022. https://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf [Google Scholar]
  • 35. Gal Y, Ghahramani Z. Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: Balcan MF, Weinberger KQ, eds. Proceedings of The 33rd International Conference on Machine Learning. Vol 48. Proceedings of Machine Learning Research. PMLR; 2016:1050‐1059 https://proceedings.mlr.press/v48/gal16.html [Google Scholar]
  • 36. Makady A, de Boer A, Hillege H, Klungel O, Goettsch W, (on behalf of GetReal Work Package 1) . What is real‐world data? A review of definitions based on literature and stakeholder interviews. Value Health. 2017;20(7):858‐865. doi: 10.1016/j.jval.2017.03.008 [DOI] [PubMed] [Google Scholar]
  • 37. Banerjee S, Alsop P, Jones L, Cardinal RN. Patient and public involvement to build trust in artificial intelligence: a framework, tools, and case studies. Patterns (N Y). 2022;3(6):100506. doi: 10.1016/j.patter.2022.100506 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Albert N, Bertelsen M, Thorup A, et al. Predictors of recovery from psychosis analyses of clinical and social factors associated with recovery among patients with first‐episode psychosis after 5 years. Schizophr Res. 2011;125(2–3):257‐266. doi: 10.1016/j.schres.2010.10.013 [DOI] [PubMed] [Google Scholar]
  • 39. De Wit S, Ziermans TB, Nieuwenhuis M, et al. Individual prediction of long‐term outcome in adolescents at ultra‐high risk for psychosis: applying machine learning techniques to brain imaging data. Hum Brain Mapp. 2017;38(2):704‐714. doi: 10.1002/hbm.23410 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Gasquet I, Haro JM, Tcherny‐Lessenot S, Chartier F, Lépine JP. Remission in the outpatient care of schizophrenia: 3‐year results from the schizophrenia outpatients health outcomes (SOHO) study in France. Eur Psychiatry. 2008;23(7):491‐496. doi: 10.1016/j.eurpsy.2008.03.012 [DOI] [PubMed] [Google Scholar]
  • 41. Koutsouleris N, Kambeitz‐Ilankovic L, Ruhrmann S, et al. Prediction models of functional outcomes for individuals in the clinical high‐risk state for psychosis or with recent‐onset depression: a multimodal, multisite machine learning analysis. JAMA Psychiatry. 2018;75(11):1156‐1172. doi: 10.1001/jamapsychiatry.2018.2165 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Lambert M, Karow A, Leucht S, Schimmelmann BG, Naber D. Remission in schizophrenia: validity, frequency, predictors, and patients' perspective 5 years later. Dialogues Clin Neurosci. 2010;12(3):393‐407. doi: 10.31887/DCNS.2010.12.3/mlambert [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Lambert M, Schimmelmann BG, Naber D, et al. Prediction of remission as a combination of symptomatic and functional remission and adequate subjective well‐being in 2960 patients with schizophrenia. J Clin Psychiatry. 2006;67(11):1690‐1697. doi: 10.4088/jcp.v67n1104 [DOI] [PubMed] [Google Scholar]
  • 44. Malla A, Norman R, Schmitz N, et al. Predictors of rate and time to remission in first‐episode psychosis: a two‐year outcome study. Psychol Med. 2006;36(5):649‐658. doi: 10.1017/S0033291706007379 [DOI] [PubMed] [Google Scholar]
  • 45. Caton CLM, Hasin DS, Shrout PE, et al. Predictors of psychosis remission in psychotic disorders that co‐occur with substance use. Schizophr Bull. 2006;32(4):618‐625. doi: 10.1093/schbul/sbl007 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Botteron K, Carter C, Castellanos FX, et al. Consensus report of the APA work group on neuroimaging markers of psychiatric disorders. Am Psychiatr Assoc. 2012; https://www.researchgate.net/profile/Karen‐Seymour/publication/261507750_Consensus_Report_of_the_APA_Work_Group_on_Neuroimaging_Markers_of_Psychiatric_Disorders/links/0c9605346a4d865d9b000000/Consensus‐Report‐of‐the‐APA‐Work‐Group‐on‐Neuroimaging‐Markers‐of‐Psychiatric‐Disorders.pdf [Google Scholar]
  • 47. Martinuzzi E, Barbosa S, Daoudlarian D, et al. Stratification and prediction of remission in first‐episode psychosis patients: the OPTiMiSE cohort study. Transl Psychiatry. 2019;9(1):20. doi: 10.1038/s41398-018-0366-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Chen Y, Liu S, Zhang B, et al. Baseline symptom‐related white matter tracts predict individualized treatment response to 12‐week antipsychotic monotherapies in first‐episode schizophrenia. Transl Psychiatry. 2024;14(1):23. doi: 10.1038/s41398-023-02714-w [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. Kia SM, Rad NM, van Opstal DPJ, et al. PROMISSING: pruning missing values in neural networks. arXiv [csLG]. Published online June 3, 2022. http://arxiv.org/abs/2206.01640
  • 50. Baytas IM, Xiao C, Zhang X, et al. Patient Subtyping via Time‐Aware LSTM Networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ‘17. Association for Computing Machinery; 2017:65–74. doi: 10.1145/3097983.3097997 [DOI]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Figure S1. The data augmentation process. A set of ten samples with time‐length 2–5 are generated for a sample with the length of five timepoints.

ACPS-151-280-s003.pdf (465.4KB, pdf)

Figure S2. (a) Five Gaussian membership functions for the probability of remission. These functions are used to map the values of the probability of remission (p), the worst‐case probability of remission (p w ), and the best‐case probability of remission (p b ) in the x‐axis to a membership value (between 0 and 1) in the y‐axis for ‘very low’, ‘low’, ‘medium’, ‘high’, and ‘very high’ categories; (b) Gaussian membership functions for seven clinical decisions, ‘definite no‐remission (DN)’, probable no‐remission (PN)’, ‘unsure no‐remission (UN)’, ‘unsure (US)’, ‘unsure remission (UR)’, ‘probable remission (PR)’, ‘definite remission (DR)’.

ACPS-151-280-s001.pdf (113.7KB, pdf)

Figure S3. Seven rules in the proposed fuzzy inference system for translating the predicted probability of remission (p), the worst‐case probability of remission (p w ), and the best‐case probability of remission (p b ) into risk‐aware clinical decisions. The green stars show the value of the corresponding membership function in each rule for an example prediction with p = 0.9, p w  = 0.25, and p b  = 1.00. The orange and blue boxes represent the fuzzy max and min operations, respectively. The gray area in the last right column shows the mass under the membership function of each decision. These masses are combined using fuzzy max aggregation. The x‐coordinate of the centroid of the aggregated mass represents the uncertainty‐aware probability of remission (p~) that aggregates the model uncertainty into the final prediction.

ACPS-151-280-s009.pdf (3.8MB, pdf)

Figure S4. Balanced accuracies (BACs) of the model across three outcome measures (first column: symptomatic remission, second column: clinical global remission, and third column: functional remission) for six clinical scenarios. The x‐axes represent the clinical scenarios in phase one (S1 and S2) and phase 2 of the study (S3, S4, S5, and S6). The y‐axis shows the BAC. The blue and red lines represent the results for 10‐fold and one‐site‐out cross‐validation, respectively. The error bars show the standard deviation of performance across 20 repetitions. The results in the first row show the BAC in phase one in a 4‐week prediction. The added use of time point W1 increases the BAC for all outcome measures. This is mainly due to the increased sensitivity of the model when a new time point is added. The second row shows the results of phase two in a 10‐week prediction. Except for one instance (functional remission), each added time point further increases the BAC for all outcome measures. The increase in the BACs in this case is a byproduct of the increased specificity (see sFigures [Link], [Link]) of the model when a new time point is added.

ACPS-151-280-s010.pdf (748.6KB, pdf)

Figure S5. Sensitivity (SEN) of the model across three outcome measures (first column: symptomatic remission, second column: clinical global remission, and third column: functional remission) for six clinical scenarios. The x‐axes represent the clinical scenarios in phase one (S1 and S2) and phase two of the study (S3, S4, S5, and S6). The y‐axis shows the SEN. The blue and red lines represent the results for 10‐fold and one‐site‐out cross‐validation, respectively. The error bars show the standard deviation of SENs across 20 repetitions. The results in the first row show the SEN in phase one in a 4‐week prediction. The added use of time point W1 increases the SEN for all outcome measures. The second row shows the results of phase two in a 10‐week prediction. In most cases adding a new time point results in a reduced sensitivity of the model.

ACPS-151-280-s004.pdf (684.3KB, pdf)

Figure S6. Specificity (SPC) of the model across three outcome measures (first column: symptomatic remission, second column: clinical global remission, and third column: functional remission) for six clinical scenarios. The x‐axes represent the clinical scenarios in phase one (S1 and S2) and phase two of the study (S3, S4, S5, and S6). The y‐axis shows the SPC. The blue and red lines represent the results for 10‐fold and one‐site‐out cross‐validation, respectively. The error bars show the standard deviation of SPCs across 20 repetitions. The results in the first row show the SPC in phase one for the 4‐week prediction. Adding time point W1 slightly decreases the SPC for all outcome measures. The second row shows the results of phase two for the 10‐week prediction. In all cases, adding a new time point results in higher model specificity.

ACPS-151-280-s012.pdf (699.9KB, pdf)

Figure S7. To handle a patient's changing status in outcome prediction with conventional ML approaches, a specialized model is needed for the data collected at each visit.

ACPS-151-280-s006.pdf (29.3KB, pdf)

Figure S8. When conventional ML approaches are used for outcome prediction, their fixed input size means they cannot be fed data that accumulates over time; a new model is needed for each combination of visit data.

ACPS-151-280-s007.pdf (40.3KB, pdf)

Figure S9. With conventional approaches, several specialized models must be trained to predict accurately at different future time points.

ACPS-151-280-s002.pdf (19.4KB, pdf)

Figure S10. With conventional single‐task approaches, one model must be trained per outcome. In contrast, the proposed multi‐task approach can predict several outcomes simultaneously.

ACPS-151-280-s011.pdf (39.9KB, pdf)
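The schematics in Figures S7–S10 contrast conventional fixed‐input models with the recurrent multi‐task approach: one set of recurrent weights is reused at every visit, so the same model accepts a clinical history of any length and shares its hidden state across several outcome heads. A minimal pure‐Python sketch of this idea (not the study's actual network: the weights here are random and the feature vectors and outcome names merely illustrative):

```python
import math
import random

random.seed(0)

def recurrent_step(h, x, W_h, W_x):
    # One recurrent update: the same weights are reused at every visit,
    # which is why the model can consume a history of any length.
    return [math.tanh(sum(wh * hj for wh, hj in zip(row_h, h)) +
                      sum(wx * xj for wx, xj in zip(row_x, x)))
            for row_h, row_x in zip(W_h, W_x)]

def predict(visits, W_h, W_x, heads):
    # visits: list of per-visit feature vectors (baseline, week 1, ...)
    h = [0.0] * len(W_h)
    for x in visits:
        h = recurrent_step(h, x, W_h, W_x)
    # Multi-task output: one sigmoid head per outcome, all sharing
    # the same final hidden state (cf. Figure S10).
    return {name: 1.0 / (1.0 + math.exp(-sum(w * hj
                                             for w, hj in zip(w_out, h))))
            for name, w_out in heads.items()}

HID, FEAT = 4, 3  # toy sizes, chosen only for the illustration
W_h = [[random.gauss(0, 0.5) for _ in range(HID)] for _ in range(HID)]
W_x = [[random.gauss(0, 0.5) for _ in range(FEAT)] for _ in range(HID)]
heads = {name: [random.gauss(0, 0.5) for _ in range(HID)]
         for name in ("symptomatic", "clinical_global", "functional")}

baseline = [0.2, -1.0, 0.5]   # hypothetical baseline features
week1 = [0.1, -0.8, 0.6]      # hypothetical week-1 features

# The same model handles one visit or several -- no retraining per
# time point, unlike the conventional setups in Figures S7-S9.
print(predict([baseline], W_h, W_x, heads))
print(predict([baseline, week1], W_h, W_x, heads))
```

Each call returns one probability per outcome from the same shared model, regardless of how many visits have been observed; this is the structural property the schematics illustrate.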

Tables S11. Supplementary Tables.

ACPS-151-280-s005.docx (23KB, docx)

Text S12. Supporting Information.

ACPS-151-280-s008.docx (34KB, docx)

Data Availability Statement

All data produced in the present study are available upon reasonable request to the authors.


Articles from Acta Psychiatrica Scandinavica are provided here courtesy of Wiley
