Discover Mental Health. 2026 Jan 22;6(1):25. doi: 10.1007/s44192-026-00373-z

Explainable machine learning for mental health prediction from social media behavior: a nested cross-validation study with SHAP and LIME interpretability

Kamini Lamba 1, Shalli Rani 1, Mohammad Shabaz 2
PMCID: PMC12909650  PMID: 41569469

Abstract

Social media behavior is a promising source of early indicators for psychological distress; however, predictive models often lack transparency, limiting their adoption in mental health settings. This paper describes an explainable machine learning framework for predicting self-reported depression risk based on behavioral features collected from 481 anonymized social media users. Three supervised learning models were tested using a nested 5 × 5 cross-validation strategy, with Random Forest yielding the strongest performance (accuracy = 84.2%, AUC = 0.88). Model calibration analysis using reliability curves and Expected Calibration Error (ECE) demonstrated that Random Forest provides well-calibrated probability estimates suitable for binary High/Low risk assessment. Explainability was integrated using SHAP to identify key behavioral markers, including screen time, passive scrolling, nighttime usage, and stress-driven engagement. Stability testing across multiple random seeds revealed consistent feature ranking patterns, supporting the reliability of the explanations. To showcase real-world applicability, we outline a prototype XAI-driven digital intervention workflow and present a simulation across representative user profiles, illustrating how interpreted model outputs can inform personalized behavioral recommendations. However, generalizability is limited by a moderately sized dataset reliant on self-reported measures and cross-sectional design. Future work will integrate multimodal behavioral signals, larger cohorts, and clinically validated mental-health assessments. Overall, the study presents a more transparent, computationally grounded approach for interpretable depression-risk prediction from social media behavior, bridging the gap between predictive performance and practical explainability.

Keywords: Explainable AI, Mental health detection, Social media analytics, Machine learning, Depression prediction, Digital interventions, LIME, SHAP interpretability

Introduction

Depression and anxiety affect over 280 million people worldwide and represent a growing global challenge [1]. The increasing prevalence of digital platforms has created new opportunities for early detection of psychological distress, given that people often express emotions, behaviors, and stress-related patterns through their online activities [2, 3]. Social media behavior has come to serve as a rich, non-invasive signal of mental health, allowing researchers to study linguistic cues [4], engagement patterns [5], and digital behaviors [6] associated with psychological well-being. Existing computational approaches have shown the potential of machine learning (ML) for identifying mental health risks from multimodal digital traces [7–9, 11]. Despite these advances, the majority of predictive models operate as opaque “black boxes”, offering limited transparency into how decisions are made. This is especially problematic in mental health contexts, where explainability, trust, and clinical accountability are essential [12–14]. The need for interpretable models is further emphasized by ethical concerns surrounding algorithmic bias, autonomy, and fairness in healthcare AI [15–17].

Explainable Artificial Intelligence (XAI) techniques such as SHAP [19, 20] and LIME [21] have been widely adopted to bridge this gap by providing post-hoc interpretability of ML predictions. Recent studies have demonstrated their importance in high-risk clinical applications, such as liver disease detection [22, 23], diabetes classification [24], maternal health prediction [25], dengue diagnosis [26], breast cancer diagnosis [27, 28], and deep learning–based lung cancer analysis [29]. These studies reinforce the importance of transparent, patient-centric AI models that can surface clinically meaningful patterns.

However, XAI adoption has been more limited in mental health research than in other domains. Related studies have so far focused mainly on text-based signals [30–32] or on clinical sentiment analysis [33], while fewer studies address behavioral cues such as screen time, passive scrolling, engagement frequency, algorithmic exposure, and stress-driven usage. Moreover, most of the literature emphasizes performance metrics over explainability, which severely limits practical applicability in clinical or counseling settings [34]. This highlights the need for interpretable, behavior-aware frameworks that combine predictive modeling with actionable digital mental health insights.

This work bridges these gaps by presenting an explainable machine learning framework for detecting psychological distress, in the form of depression and anxiety, from social media usage patterns and self-reported behavioral indicators. The framework analyzes behavioral correlates including time spent online, type of engagement, late-night use, and stress-induced platform interactions using three supervised models: Logistic Regression, Support Vector Machine, and Random Forest. SHAP and LIME are used to provide interpretable explanations for model predictions, allowing a transparent mapping between behavioral indicators and mental health risk. Moreover, a conceptual framework for interventions is presented that translates model insights into personalized digital mental health recommendations, aligned with emerging AI-driven personalization strategies [35] and evidence from behavioral modification research [36].

This study extends the existing literature on XAI and digital mental health in several important ways: it draws on behavioral, non-linguistic social media features rather than relying on textual data alone; it includes a structured interpretability comparison between SHAP and LIME; it explicitly embeds stability and consistency tests, as encouraged in recent XAI work; and it grounds the conceptual intervention framework in transparent and explainable behavioral pathways. The work also contributes to global initiatives toward strengthening mental health monitoring capacity in resource-constrained settings by focusing on accessible, non-clinical behavioral markers.

This study is therefore designed to respond to the following research objectives:

  1. Develop a machine learning framework for predicting mental health risks using survey-based behavioral and demographic features derived from social media usage patterns.

  2. Design a methodologically sound evaluation pipeline using a 5 × 5 Nested Cross-Validation (CV) framework for an unbiased estimate of model performance across different classifiers.

  3. Ensure model fairness, reliability, and methodological validity by carrying out systematic analyses, including multicollinearity (VIF), target leakage, model calibration, and subgroup performance.

  4. Provide transparent and interpretable explanations of predictive mechanisms using state-of-the-art XAI techniques—SHAP and LIME—and evaluate their stability across multiple random seeds.

  5. Identify behavioral digital markers associated with increased mental health risk and investigate how these indicators contribute to classification results independently of symptom-derived features.

  6. Discuss the possible use of the suggested model as a decision-support tool, exploring the appropriateness of the model in early-warning, triage, or supportive intervention contexts given ethical and practical limitations.

The rest of the paper is organized as follows: Sect. 2 presents the related work and background literature; Sect. 3 provides dataset details; Sect. 4 presents the preprocessing and data cleaning methods; Sect. 5 describes the model architecture; Sect. 6 reports the experimental results of the proposed framework; Sect. 7 gives the discussion; Sect. 8 presents the explainability analysis; limitations are discussed in Sect. 9; and Sect. 10 concludes the study and outlines future directions.

Related work

Research on mental health detection using digital traces has expanded rapidly over the past decade, with social media platforms among the most studied sources of behavioral and linguistic signals. Early work focused mainly on linguistic markers of depression and anxiety, relying on text-based models trained on clinical forums and digital cohorts [4, 5, 8, 31]. These studies have shown how language patterns related to affective expression, rumination, and cognitive distortions can be strong predictors of psychological distress [3, 7]. More recent work has extended this line of inquiry by integrating textual, visual, and behavioral features into multimodal signals [9, 11]. Digital mental health detection has also moved beyond text: screen-time patterns, passive scrolling, late-night use, online stress coping, and interaction intensity have all been associated with psychological well-being [2, 6]. However, while these behavioral measures provide non-linguistic insights, most existing models are hard to interpret and thus of limited utility in real-world clinical settings.

XAI plays a central role in mitigating the opacity of machine learning models in healthcare. Foundational work on interpretable ML [16, 17] and surveys highlighting the importance of transparency in medical applications [12, 13, 34] have pointed out that interpretable predictions are highly desirable, especially in high-stakes settings. SHAP [19, 20] and LIME [21] have emerged as two of the most widely used post-hoc explanation techniques owing to their ability to attribute feature contributions to model decisions.

Recent medical AI studies have reported the effectiveness of XAI in disease diagnosis and the interpretation of complex clinical datasets, such as lung cancer detection using interpretable deep learning [29], fusion-based explainable models for breast cancer classification [27], dengue diagnosis [26], explainable diabetes diagnosis [24], ensemble-based liver disease identification [22, 23], maternal health risk prediction [25], and hepatitis classification using hybrid ML approaches [37]. These collectively show that XAI can improve trust, accountability, and clinician adoption of AI solutions.

Similarly, in the area of mental health, various explainability-focused studies have also started to emerge. Kerz et al. [30] investigated transparent AI for language-based mental health detection, whereas Joyce et al. [14] emphasized interpretability for clinical decision-making. From a more psychological and ethical perspective, there are additional analyses that underline the importance of algorithmic transparency, fairness-aware modeling, and responsible use of sensitive behavioral data [15]. Despite these works, relatively few studies have combined behavioral social media features with XAI-driven interpretability, and even fewer have examined how interpretability may enable personalized mental health interventions.

The current study makes four key contributions compared to prior research.

  1. Behavior-driven modeling: In contrast to mostly text-dominated studies [3, 31], the present work explores behavioral cues such as Screentime, Passive Scrolling, NightTime Usage, and Stress-Driven Platform Interaction.

  2. Structured SHAP and LIME comparison: In many previous mental health studies applying XAI, interpretability results are purely descriptive [30]. In this paper, SHAP and LIME are both assessed for consistency, stability, feature-ranking convergence, and clinical interpretability.

  3. XAI-guided intervention framework: While conversational agents like Woebot demonstrate the potential for digital mental health interventions [38], our framework provides a behavior-driven and explainability-supported pathway toward giving personalized recommendations.

  4. Integration of ethical, fairness, and privacy considerations: bias mitigation, informed consent, transparent models, and responsible deployment are discussed throughout this paper, building on recent AI ethics literature.

Together, these contributions place the proposed framework within the context of current research while offering a novel integration of behavioral analytics, explainability, and digital intervention design.

Dataset description

The dataset used in this study contains 481 responses from an online survey [39]. This population consists of participants between the ages of 18 and 35 years (inclusive) and is dominated by university students and young adults. Each response self-reports social media usage patterns, daily behavioral metrics, and mental health indicators for depression and anxiety. We are well aware that models trained on small datasets can be prone to overfitting and may not generalize well on broader populations. To overcome these issues, we complemented internal validation with nested cross-validation and also performed external testing on the publicly available Social Media Mental Health Dataset 2023 [8]. Although the feature modalities differ partially, overlapping behavioral variables allowed a comparative performance evaluation that assessed the robustness of the model at hand.

Structure of survey and feature categories

The data collection captured the following comprehensive set of behavioral features, previously linked to psychological well-being: Screen Time Duration (total daily usage; late-night usage patterns), Engagement Type (active posting versus passive scrolling), Frequency of Checking (habit-driven or compulsive checking), Stress-Driven Usage (social media use triggered by stress or negative emotions), Platform Switching Behavior, Notifications and Algorithmic Exposure, and Self-Reported Mental Health Symptoms (depression/anxiety risk scores). These behavior-based variables complement linguistic and multimodal signals used in prior studies [2, 11], offering a non-text, usage-oriented perspective on mental health detection.

Ground truth labels

Depression and anxiety risk levels were derived from standardized self-reported mental health questionnaires. While self-reported psychological assessments are among the most widely used in digital mental health research [3, 14], they may introduce recall bias and social desirability bias, and may not fully align with clinical diagnoses.

Demographic bias and generalizability considerations

Since our dataset is predominantly composed of young adults and students, it may not generalize well to broader or more diverse populations. We contextualize this limitation by comparing our dataset's characteristics with public mental health datasets including SMHD [8], eRisk [32], and Twitter-based observational cohorts [2]. In contrast to large linguistic corpora, our dataset focuses on behavioral indicators and hence provides complementary insights while mandating caution when generalizing findings across demographic groups.

Ethics, privacy, and data governance

Informed consent was given by all participants, and no personally identifiable data were included. Data were anonymized before analysis, and the study followed ethical AI recommendations regarding transparency, privacy, and responsible usage [12, 15]. Social media behavior may be considered sensitive personal information; therefore, data handling procedures minimized risks of re-identification, and data were stored securely.

Dataset characteristics

The outcome variable in this study comprised two classes: High Depression Risk and Low Depression Risk. The final class distribution was 48.6% High Risk (234 participants) and 51.4% Low Risk (247 participants), so the dataset can be considered fairly balanced. Even though the imbalance was not severe, the Synthetic Minority Over-sampling Technique (SMOTE) was still applied, strictly within the training folds of the 5 × 5 nested cross-validation, to promote stable model learning. Importantly, SMOTE was never applied to the validation or external test folds; preserving the true class distribution in evaluation folds is essential for fair, unbiased performance estimates.

Table 1 summarizes the final distribution of the outcome variable.

Table 1.

Class distribution of the outcome variable

Class Count Percentage
High depression risk 234 48.6
Low depression risk 247 51.4

All psychological states were rated on a five-point Likert scale (“Never” to “Always”), allowing ordinal encoding suitable for machine learning. Behavioral features include numerical and categorical attributes such as the number of platforms used, scrolling frequency, and intent of use (e.g., due to boredom, stress, or distraction). Collectively, these attributes create a multi-dimensional behavioral profile well-suited for supervised classification and explainability-focused analysis using interpretable machine learning models.

Figure 1 presents the distribution of gender identities. Most participants identified as Female or Male; less frequently reported identities (e.g., Non-binary, Transgender, and other self-described categories) are grouped as “Other” in the figure, which allows multiple identities and reflects the complexity of the question. Figure 2 provides an overview of the respondents' occupations. The largest group consisted of university students (60.7%), while salaried employees accounted for 27.4%; school students and retirees comprised 10.2% and 1.7% of the sample, respectively. The predominance of student participants underlines the importance of analyzing SMMH effects in educational settings, especially for young adults facing academic stress and a strong presence in online environments. Figure 3 reflects the average daily time spent on social media. A significant percentage of respondents reported using social platforms for more than 5 h a day; the 2–3 h and 3–4 h ranges were also frequently reported. This level of digital usage may hint at a relationship between extended social media use and the mental health indicators included in this study.

Fig. 1.

Fig. 1

Gender distribution of the participants

Fig. 2.

Fig. 2

Occupation status distribution among respondents

Fig. 3.

Fig. 3

Average daily social media usage reported by participants

Comparison with public datasets

Unlike large textual datasets such as SMHD [8], eRisk [32], and computational linguistics corpora [31], the dataset in this work involves only behavioral usage metrics without linguistic content. This brings two advantages: non-invasive prediction is possible without requiring access to personal posts or messages, and platform-agnostic insights can generalize across social media ecosystems. The drawback is that generalization is limited by the small sample size and behavioral self-reporting. This limitation motivates future work on multimodal integration [40], real-time behavioral monitoring [6], and larger digital cohorts.

Preprocessing and data cleaning

Robust preprocessing methods are necessary to provide reliable model performance and reproducibility, especially in sensitive applications like mental health prediction. All the preprocessing steps are described in detail in the subsequent sections, including handling of missing values, encoding strategies, normalization, leakage prevention, and class imbalance treatment.

Handling missing values

A small percentage of entries in the dataset was missing due to skipped questions during the survey. In order to prevent biased estimation and maintain model robustness, the following techniques were applied:

  • Numeric features: Imputed using the median, which is robust to non-normal distributions and does not distort behavioral variables such as screen time and frequency-of-use metrics.

  • Categorical features: Imputed using the mode, which is an appropriate strategy given that survey-based categorical responses have a very low cardinality.

This approach aligns with the commonly accepted preprocessing strategies in clinical and behavioral ML studies [12, 13].
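The median/mode imputation strategy above can be sketched as follows. This is a minimal illustration on a toy DataFrame; the column names are hypothetical stand-ins, not taken from the paper's survey.

```python
# Hypothetical sketch: median imputation for numeric survey features and
# mode imputation for categorical ones, using scikit-learn's SimpleImputer.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "screen_time_hours": [2.5, np.nan, 6.0, 4.0],          # numeric, one gap
    "checking_frequency": ["often", np.nan, "rarely", "often"],  # categorical
})

num_imp = SimpleImputer(strategy="median")         # robust to skewed data
cat_imp = SimpleImputer(strategy="most_frequent")  # mode for low cardinality

df[["screen_time_hours"]] = num_imp.fit_transform(df[["screen_time_hours"]])
df[["checking_frequency"]] = cat_imp.fit_transform(df[["checking_frequency"]])
```

Here the missing screen-time value is filled with the median of the observed values (4.0) and the missing categorical response with the most frequent category.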

Encoding of categorical variables

Responses to categorical survey questions, such as frequency of checking, preferred platforms, and usage motivation, were one-hot encoded to ensure no artificial ordinal relationships were imposed on the data. One-hot encoding was preferred over ordinal mappings because many behavioral categories have no meaningful numeric progression, and an incorrect ordering can bias models such as SVM and Logistic Regression.

Normalization and feature scaling

All numeric features were standardized by using z-score normalization as shown in Eq. 1:

z = (x − μ) / σ        (1)

where x is the raw value, μ the feature mean, and σ the standard deviation.

This ensures consistent feature scaling and prevents models from being disproportionately influenced by features with larger numerical ranges. Scaling is especially critical for SVMs, since most kernels are distance-based and hence sensitive to unscaled data [16].

Class imbalance mitigation

As indicated in Table 2, the preprocessing pipeline ensured that all features were uniformly prepared in advance of model training. Because the final outcome variable consisted of two classes (High Depression Risk and Low Depression Risk), stratified sampling was employed throughout the 5 × 5 nested cross-validation to preserve class proportions across folds. Although the dataset was relatively balanced (48.6% High Risk, 51.4% Low Risk), SMOTE was nevertheless applied strictly within training folds to foster stable learning across model iterations and prevent any potential bias toward the majority class. Importantly, no resampling occurred in validation or test folds, so the evaluation remained unbiased. These steps are consistent with recommended practices in mental health machine learning, where careful control of the class distribution and fold-specific transformations are essential to avoid information leakage [30].

Table 2.

Overview of preprocessing steps applied before model training

Step Description
Missing values (numeric) Median imputation
Missing values (categorical) Mode imputation
Categorical encoding One-hot encoding (non-ordinal)
Normalization Z-score scaling for numeric variables
Class imbalance handling Stratified splits + SMOTE (training folds only)
Leakage control Preprocessing inside CV folds
Software Scikit-learn Pipelines; Imbalanced-learn
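The fold-wise resampling discipline described above can be sketched as follows. To keep the example dependency-free, a plain random over-sampler stands in for SMOTE (imbalanced-learn's SMOTE would slot into the same place), and the data are synthetic with roughly the paper's 51.4/48.6 class split.

```python
# Sketch: resampling is applied only to the training indices of each fold;
# test folds keep the true class distribution, so evaluation stays unbiased.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.array([0] * 103 + [1] * 97)  # roughly the paper's 51.4/48.6 split

def oversample(X_tr, y_tr, rng):
    """Duplicate minority-class rows until both classes are balanced."""
    classes, counts = np.unique(y_tr, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    idx = rng.choice(np.flatnonzero(y_tr == minority), size=deficit)
    return np.vstack([X_tr, X_tr[idx]]), np.concatenate([y_tr, y_tr[idx]])

accs = []
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = oversample(X[tr], y[tr], rng)   # resample training fold only
    clf = LogisticRegression(max_iter=500).fit(X_tr, y_tr)
    accs.append(clf.score(X[te], y[te]))          # untouched test fold
```

The key design point mirrors the paper's protocol: the resampler never sees the held-out fold, so reported scores reflect the true class distribution.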

Pipeline design and prevention of data leakage

Scikit-learn Pipelines were used to apply all preprocessing steps inside cross-validation folds, precluding leakage of distributional statistics from test sets. This ensures that model evaluation corresponds to real-world deployment and allows strict reproducibility of preprocessing configurations.
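A leakage-safe pipeline of this kind can be sketched as below: imputation, scaling, and encoding are all wrapped in a Pipeline, so their statistics are re-fit on each training fold. The feature names and data are illustrative stand-ins, not the paper's actual columns.

```python
# Sketch of a leakage-safe preprocessing + model pipeline: every transform
# is fit inside each CV training fold, never on the full dataset.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "screen_time_hours": rng.uniform(0, 8, 100),                  # numeric
    "usage_motive": rng.choice(["boredom", "stress", "habit"], 100),  # categorical
})
y = rng.integers(0, 2, 100)

pre = ColumnTransformer([
    ("num", Pipeline([("imp", SimpleImputer(strategy="median")),
                      ("sc", StandardScaler())]), ["screen_time_hours"]),
    ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                      ("oh", OneHotEncoder(handle_unknown="ignore"))]),
     ["usage_motive"]),
])
model = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])
scores = cross_val_score(model, X, y, cv=5)  # stats learned per training fold
```

Because `cross_val_score` clones and refits the whole pipeline per fold, test folds never contribute to imputation medians, scaling means, or encoder categories.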

Software and implementation framework

All pre-processing steps are implemented with Scikit-learn (imputation, encoding, and scaling), Imbalanced-learn (class-weight and stratification utilities), and Pandas/NumPy (data handling and transformations). This ensures a standardized, reproducible, and transparent computational pipeline.

Multicollinearity assessment using variance inflation factor (VIF)

To estimate the level of multicollinearity among the predictor variables, we calculated the VIF for every numerical and ordinal feature. This metric quantifies how strongly each predictor is linearly explained by the remaining predictors, and is defined by Eq. 2.

VIF_j = 1 / (1 − R_j²)        (2)

where R_j² is the coefficient of determination obtained by regressing predictor j on all remaining predictors.

High VIFs inflate model coefficients’ variance, reduce statistical stability, and potentially hurt the reliability of feature-attribution methods such as SHAP that rely on consistent and independent contributions from predictors. For that reason, controlling multicollinearity is important both for predictive modeling and for downstream explainability. Following established methodological guidelines, a threshold of VIF > 5 was used to identify problematic variables. Features that exceeded this threshold were iteratively removed and the value of VIF recalculated at each step, until all remaining predictors satisfied VIF < 5. Table 3 shows the final VIF scores and associated decisions for each feature.

Table 3.

Variance Inflation Factor (VIF) values for predictor variables

Feature VIF score Decision
Daily screen time (h) 2.38 Retained
Passive browsing frequency 3.17 Retained
Stress-driven social media use 4.52 Retained
Night-time usage 2.74 Retained
Number of platforms used 1.49 Retained
Social comparison tendencies 6.21 Removed
Distraction/inattention level 5.87 Removed
Restlessness indicator 4.11 Retained
Anxiety/tension score 3.94 Retained
Mood fluctuation indicator 4.43 Retained

The remaining predictors, after removing the two high-VIF variables Social Comparison Tendencies and Distraction/Inattention Level, met the criterion of VIF < 5. This refinement makes the estimates of the parameters more stable and the SHAP-based interpretability analyses more robust.
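The VIF computation of Eq. 2 can be implemented directly with least squares, as in the minimal sketch below; the three synthetic columns are illustrative (two near-collinear, one independent), not the paper's features.

```python
# Minimal VIF computation per Eq. 2: regress each column on the others,
# take R^2, and report 1 / (1 - R^2). Uses plain NumPy least squares.
import numpy as np

def vif(X):
    """Return the VIF of each column of X (intercept included per regression)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        yj = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, yj, rcond=None)
        resid = yj - others @ beta
        r2 = 1.0 - resid.var() / yj.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=300)
b = a + 0.1 * rng.normal(size=300)   # nearly collinear with a -> large VIF
c = rng.normal(size=300)             # independent -> VIF close to 1
vifs = vif(np.column_stack([a, b, c]))
```

The paper's iterative procedure corresponds to repeatedly dropping the column with the largest VIF and re-running `vif` until every value is below 5.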

Ethical and fairness considerations in preprocessing

Preprocessing decisions were taken to reduce structural unfairness in mental health prediction models, which can be prone to demographic or behavioral biases. No demographic features (such as gender, age, or income) were used as predictors, to avoid unintended biases. Behavioral indicators were normalized across diverse usage patterns to preserve fairness. Missing data were imputed conservatively to avoid overconfidence from fabricated values. Data transformations were transparently documented according to explainable and responsible AI guidelines [12, 15].

Methodology

This section outlines the methodological pipeline employed to develop, train, and validate the explainable AI-facilitated mental health detection system. The process is divided into five major components: dataset acquisition, data preprocessing and feature engineering, development of supervised models, interpretability analysis using SHAP and LIME, and the design of a behavior-based digital intervention concept. All computational steps were performed in Python 3.10 with supporting libraries including pandas, scikit-learn, matplotlib, and shap.

Figure 4 illustrates the end-to-end methodology followed in this study for explainable AI-based prediction of mental health. The pipeline starts with the collection of structured survey data representing both behavioral and psychological characteristics. This is then followed by extensive pre-processing and feature engineering in order to prepare the dataset for model building. Three machine learning models, namely Logistic Regression, Support Vector Machine (SVM), and Random Forest, are trained and tested against typical performance metrics. SHAP (SHapley Additive exPlanations) is then used on the top-performing model to collect global and local interpretability findings. Finally, the research proposes a conceptual intervention framework that maps behavioral predictors into tailored digital mental health support strategies.

Fig. 4.

Fig. 4

XAI-based mental health detection pipeline

Conceptual framework for digital intervention

Building on the interpretability results, a conceptual framework was established for digital behavior-based interventions. The framework maps high-risk behavioral profiles to customized support strategies, including screen time moderation, digital detox reminders, mindfulness prompts, and journaling prompts. Although this prototype was not deployed in real time, it sketches the potential of combining model explanations with actionable interventions. Such a tailored strategy seeks to bridge explainable predictions and actionable outcomes, enabling adaptive mental health intervention tools such as AI-based chatbots, mobile health apps, or therapist-guided digital platforms. Real-time deployment and clinical testing of the suggested architecture are the focus of future research.

Figure 5 shows the conceptual framework that connects model explainability to behavioral digital interventions. Inputs include behavioral predictors such as daily screen time, passive scrolling, and stress-induced social media use. These pass through a trained Random Forest model with SHAP-based interpretability. Based on the risk classification output (e.g., high or low depression risk), the system triggers a personalized digital intervention in the form of mindfulness reminders, journaling, or digital detox recommendations. In this way, the approach demonstrates the potential of explainable AI for adaptive, user-level mental health measures.

Fig. 5.

Fig. 5

Conceptual framework for future personalized digital interventions

Data preprocessing

Numeric missing values were imputed using the median, and categorical missing values using the mode. All categorical variables were transformed via one-hot encoding. Numerical variables were standardized via z-score normalization to ensure a consistent scale across features, as shown in Eq. 3:

z = (x − μ) / σ        (3)

where x is the raw value, µ is the feature mean, and σ is the standard deviation.

To prevent data leakage, all preprocessing operations were applied within each cross-validation fold.

Feature engineering and encoding

Since the dataset contained a mix of behavioral, temporal, and psychometric attributes, all categorical variables were one-hot encoded using a full binary scheme to avoid unintended ordinal relationships. No dimensionality reduction was performed, in order to retain interpretability. Class imbalance was handled with SMOTE, which generates synthetic minority-class samples as shown in Eq. 4:

x_new = x_i + δ · (x_nn − x_i),  δ ∈ [0, 1]        (4)

where x_i is a minority-class sample, x_nn one of its k nearest minority-class neighbors, and δ is drawn uniformly at random.
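The SMOTE interpolation step (Eq. 4) can be illustrated numerically on a toy minority class; this is a simplified sketch of the synthesis rule, not the full SMOTE algorithm.

```python
# Toy illustration of Eq. 4: synthesize one sample by interpolating between
# a minority point and its nearest minority-class neighbor.
import numpy as np

rng = np.random.default_rng(0)
minority = np.array([[1.0, 2.0], [1.2, 2.1], [0.9, 1.8]])  # minority samples

x_i = minority[0]
dists = np.linalg.norm(minority[1:] - x_i, axis=1)  # distances to other points
x_nn = minority[1:][dists.argmin()]                 # nearest minority neighbor
delta = rng.uniform(0.0, 1.0)
x_new = x_i + delta * (x_nn - x_i)  # new point lies on the segment [x_i, x_nn]
```

Each synthetic point therefore falls between existing minority samples rather than being a copy, which is what distinguishes SMOTE from plain random oversampling.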

Logistic regression classifier

Logistic Regression was chosen as the baseline linear classifier to model the probability of mental health risk. For a given feature vector x ∈ R^d, the predicted probability is given by the logistic (sigmoid) function as shown in Eq. 5:

P(y = 1 | x) = σ(wᵀx + b) = 1 / (1 + exp(−(wᵀx + b)))        (5)

The model parameters, (w, b), were optimized by minimizing the regularized binary cross-entropy loss as shown in Eq. 6:

L(w, b) = −(1/N) Σ_i [ y_i log σ(wᵀx_i + b) + (1 − y_i) log(1 − σ(wᵀx_i + b)) ] + λ‖w‖²        (6)

here λ controls the amount of L2-regularization.
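This baseline can be sketched in scikit-learn as below, on synthetic stand-in data; note that scikit-learn parameterizes the L2 penalty through C, the inverse of λ in Eq. 6.

```python
# Sketch of the L2-regularized logistic model of Eqs. 5-6 on synthetic data;
# scikit-learn's C plays the role of the inverse regularization strength.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
w_true = np.array([1.5, -2.0, 0.0, 0.5])          # ground-truth weights
y = (X @ w_true + 0.3 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
proba = clf.predict_proba(X[:1])   # sigmoid of w.x + b, as in Eq. 5
```

The fitted `clf.coef_` and `clf.intercept_` correspond to (w, b), and `predict_proba` returns the two class probabilities, which sum to one.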

Support vector machine (SVM)

The SVM classifier is applied with a non-linear RBF kernel to model complex relationships between user behaviors. The decision function is defined as shown in Eq. 7:

f(x) = sign( Σ_i α_i y_i K(x_i, x) + b )        (7)

here α_i are the coefficients of the support vectors, y_i their labels, and the RBF kernel is defined as shown in Eq. 8:

K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²)        (8)

The hyperparameters C and γ of the SVM were optimized using grid search with stratified fivefold cross-validation.
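The grid search over C and γ can be sketched as below; the data are a synthetic non-linear toy problem, and the grid values are illustrative, not the paper's search space.

```python
# Sketch: tuning C and gamma of an RBF-kernel SVM with stratified fivefold
# grid search, mirroring the paper's tuning setup on stand-in data.
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # non-linear (XOR-like) boundary

grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
best = grid.best_params_   # best (C, gamma) by mean CV accuracy
```

Stratification keeps the class ratio stable across folds, which matters when, as here, the positive and negative classes are roughly balanced.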

Random Forest classifier

The Random Forest was used as the primary model owing to its strong predictive performance and its natural compatibility with SHAP-based explanation. The classifier builds an ensemble of T trees, each trained on a separate bootstrap sample, as shown in Eq. 9:

ŷ_t = h_t(x),  t = 1, …, T        (9)

The final prediction is obtained by majority voting, as shown in Eq. 10:

ŷ = mode{ h_1(x), h_2(x), …, h_T(x) }        (10)

The Gini impurity criterion is used to evaluate tree-split quality and feature importance, as shown in Eq. 11:

G = 1 − Σ_k p_k²        (11)

where $p_k$ is the proportion of samples in a node belonging to class $k$.
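A minimal NumPy sketch of the Gini criterion (Eq. 11) and the majority vote (Eq. 10), with hypothetical names:

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of Eq. 11: G = 1 - sum_k p_k^2 over a node's labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def majority_vote(tree_predictions):
    """Majority vote of Eq. 10 across the ensemble's per-tree predictions."""
    vals, counts = np.unique(tree_predictions, return_counts=True)
    return vals[np.argmax(counts)]
```

A pure node yields $G = 0$, while a balanced binary node yields the maximum $G = 0.5$.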

Model evaluation strategy

All models were evaluated using stratified train-test splits and repeated fivefold cross-validation. Performance metrics included accuracy, precision, recall, F1-score, and the Area Under the ROC Curve (AUC), the latter defined as shown in Eq. 12:

$$\text{AUC} = \int_{0}^{1} \text{TPR}(t)\; d\,\text{FPR}(t) \tag{12}$$

The Expected Calibration Error (ECE) was computed to assess the reliability of predicted probabilities, as shown in Eq. 13:

$$\text{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl|\text{acc}(B_m) - \text{conf}(B_m)\bigr| \tag{13}$$

where $B_m$ denotes bin $m$, $\text{acc}(B_m)$ the accuracy within the bin, and $\text{conf}(B_m)$ the average predicted confidence.
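The ECE of Eq. 13 can be computed with a short NumPy function over equal-width probability bins. This is a simplified sketch with illustrative names:

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE of Eq. 13 with equal-width probability bins."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, n = 0.0, len(y_true)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi], except the first bin which includes 0
        mask = ((y_prob > lo) if lo > 0 else (y_prob >= lo)) & (y_prob <= hi)
        if mask.any():
            acc = y_true[mask].mean()    # acc(B_m)
            conf = y_prob[mask].mean()   # conf(B_m)
            ece += (mask.sum() / n) * abs(acc - conf)
    return ece
```

Perfectly calibrated extreme predictions give an ECE of zero, while systematically over- or under-confident bins accumulate weighted deviations.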

Nested cross-validation, calibration, and stability evaluation

To obtain an unbiased estimate of model performance, we employed a 5 × 5 Nested Cross-Validation framework. Nested CV separates hyperparameter optimization from performance evaluation in two distinct loops: an inner loop for model tuning and an outer loop for unbiased testing. In each of the five outer folds, the training split was further divided into five inner folds, on which the hyperparameters were optimized using grid search. The best-performing setting from the inner loop was then evaluated on the untouched outer fold. This was repeated across all outer folds, after which final metrics were reported as mean ± standard deviation across the five outer evaluations.

Nested CV is particularly effective for small-to-moderate datasets, as it mitigates the optimistic bias associated with single train–test splits or non-nested cross-validation. All baseline and advanced models, namely Logistic Regression, Support Vector Machine (SVM), Random Forest, and XGBoost, were evaluated under this uniform protocol. Complementing discrimination metrics such as AUC, model calibration was also investigated to assess the reliability of the probability estimates. Probability outputs aggregated from the outer folds were used to compute the Expected Calibration Error (ECE), the Brier Score, and calibration curves, so that calibration results reflect performance under unbiased test conditions rather than within-fold estimates.

For clarity, we report internal and external calibration separately. Internal calibration was evaluated using probability estimates aggregated from the outer folds of the 5 × 5 nested cross-validation, yielding an ECE of 0.044. To investigate generalization outside the training distribution, we also calculated ECE on the external SMHD dataset using the same 10 equal-width probability bins. The external calibration error was higher (ECE = 0.07), reflecting natural calibration drift across datasets with different population characteristics.

The robustness of model explainability was evaluated by repeating the entire training and evaluation pipeline with five different random seeds. For every run, global SHAP importance rankings were extracted, and stability was quantified using Kendall's Tau rank correlation. This multi-seed evaluation confirms that the interpretability results are not driven by initialization artifacts and remain consistent across repetitions.
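The multi-seed stability check described above can be sketched as the mean pairwise Kendall's tau between per-seed global importance vectors. This is a simplified illustration using SciPy; the function name is hypothetical:

```python
import itertools
import numpy as np
from scipy.stats import kendalltau

def ranking_stability(importance_runs):
    """Mean pairwise Kendall's tau across per-seed global importance vectors.

    importance_runs: one 1-D importance vector per random seed. Kendall's tau
    is rank-based, so it can be applied to the raw importance values directly.
    """
    taus = [kendalltau(a, b)[0]
            for a, b in itertools.combinations(importance_runs, 2)]
    return float(np.mean(taus))
```

Identical feature orderings across seeds yield a stability of 1.0; fully reversed orderings yield −1.0.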

Explainability using SHAP and LIME

SHAP explains each feature’s contribution by decomposing the model output into additive components as shown in Eq. 14:

$$f(x) = \phi_0 + \sum_{i=1}^{d} \phi_i \tag{14}$$

where ϕi is the SHAP value for feature i. LIME approximates local model behavior through a simplified linear surrogate as shown in Eq. 15:

$$\xi(x) = \operatorname*{arg\,min}_{g \in G}\; \mathcal{L}(f, g, \pi_x) + \Omega(g) \tag{15}$$

Combining SHAP, which provides globally consistent attributions, with LIME, which offers local surrogate explanations, provides comprehensive insight into model decisions.
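For intuition, the additive decomposition of Eq. 14 can be verified on a toy model by computing exact Shapley values through coalition enumeration, replacing absent features with a baseline value (a common simplification). This is feasible only for a handful of features; SHAP's TreeExplainer computes the same quantities efficiently for tree ensembles.

```python
import itertools
import math

def exact_shapley(f, x, baseline):
    """Exact Shapley values for model f by enumerating feature coalitions.

    Absent features are replaced by the baseline value. phi_0 corresponds
    to f(baseline), and f(x) = phi_0 + sum(phi) as in Eq. 14.
    """
    d = len(x)
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(len(others) + 1):
            for S in itertools.combinations(others, r):
                # Shapley weight |S|! (d - |S| - 1)! / d!
                w = (math.factorial(len(S)) * math.factorial(d - len(S) - 1)
                     / math.factorial(d))
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(d)]
                without_i = [x[j] if j in S else baseline[j] for j in range(d)]
                phi[i] += w * (f(with_i) - f(without_i))
    return phi
```

For a linear model the Shapley values recover each coefficient times the feature's deviation from baseline, and the additivity property of Eq. 14 holds exactly.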

Target leakage analysis and ablation study

To ensure methodological rigor and prevent target leakage, we conducted a systematic analysis to identify and exclude all features that directly contributed to constructing the ground-truth labels. Since several psychological symptom items were used to determine the mental health risk categories, their presence in the predictor set would artificially inflate performance and compromise the validity of the model.

An ablation study quantifies the effect of removing these label-forming items. The Random Forest model, using the full feature set, achieves an AUC of 0.88. Excluding all symptom-derived variables, its AUC decreases to 0.79. This decrease is unsurprising and empirically confirms that the psychological descriptors contained meaningful predictive signal by virtue of their partial overlap with the outcome definition. Their removal remedies this leakage and provides a more conservative estimate of generalization performance that is generally more realistic.

Importantly, despite the performance drop, the refitted model—trained solely on behavioral and demographic features such as daily screen time, frequency of passive browsing, stress-driven use, variety of platforms used, and night-time activity—maintained competitive discriminative ability. This provides evidence that the behavioral digital markers are, on their own, informative and possess meaningful predictive validity outside of self-reported symptoms.

After leakage removal, model explanations indeed changed from symptom descriptors to behavioral indicators, further supporting the interpretability and methodological soundness of the updated feature set. Table 4 enumerates all symptom-related items that were removed from the dataset as part of preprocessing.

Table 4.

Psychological symptom items removed to prevent target leakage

Feature (symptom item) Reason for removal
Mood fluctuation indicator Used in label formation
Anxiety/tension score Used in label formation
Restlessness indicator Used in label formation
Distraction/inattention level Used in label formation
Feeling low/sadness score Used in label formation
Difficulty concentrating Used in label formation

Experimental results

This section presents the performance of all three supervised learning models (LR, SVM, and RF) for the prediction of depression and anxiety risk from social media usage behaviors. The results are presented using stratified fivefold cross-validation, supplemented with statistical significance testing and model calibration analysis. All experiments were carried out using the scikit-learn, imbalanced-learn, and SHAP libraries.

Dataset distribution and class balance

Before training, we analyzed the distribution of the mental-health-risk labels. The dataset was moderately imbalanced, with 61.1% "No Risk" and 38.9% "At Risk" instances. To ensure fair learning, SMOTE was applied only to the training folds, keeping the test-fold distribution intact as shown in Eq. 16:

$$P_{\text{test}}(y = k) = P_{\text{original}}(y = k), \qquad k \in \{\text{No Risk},\, \text{At Risk}\} \tag{16}$$

Class balance after SMOTE within training folds was maintained as shown in Eq. 17:

$$P_{\text{train}}^{\text{SMOTE}}(y = \text{No Risk}) = P_{\text{train}}^{\text{SMOTE}}(y = \text{At Risk}) = 0.5 \tag{17}$$

This adjustment enhanced model stability while avoiding overfitting to the minority class.
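A minimal sketch of the SMOTE interpolation step is shown below: each synthetic point is drawn between a minority sample and one of its k nearest minority neighbours. This is illustrative only; in practice the imbalanced-learn implementation is applied inside the training folds, as described above.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: x_new = x_i + delta * (x_nn - x_i), delta ~ U(0, 1).

    X_min: minority-class samples only. Returns n_new synthetic samples.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(dist)[1:k + 1]       # nearest neighbours, excluding self
        j = rng.choice(nn)
        delta = rng.random()
        out.append(X_min[i] + delta * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because every synthetic sample lies on a segment between two real minority samples, the oversampled data stay inside the minority class's bounding region.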

Performance metrics

The evaluation metrics included Accuracy, Precision, Recall, F1-score, and Area Under the ROC Curve (AUC). Table 5 summarizes the results.

Table 5.

Model performance across fivefold cross-validation

Model Accuracy Precision Recall F1 AUC
Logistic Regression 0.781 0.764 0.732 0.747 0.82
SVM (RBF Kernel) 0.804 0.788 0.761 0.774 0.85
Random Forest 0.842 0.832 0.809 0.820 0.88

The RF model performed better compared to LR and SVM on all metrics, confirming its suitability for modeling nonlinear behavioral interactions, as also shown in related works [14, 20].

ROC curves and calibration

ROC curves were recreated using predict_proba to ensure proper probability estimates. The corrected ROC curve is presented in Fig. 6, reflecting realistic performance trends consistent with the AUC values reported earlier.

Fig. 6.

Fig. 6

Receiver operating characteristic (ROC) curve for the Random Forest classifier using predictions aggregated from the 5 × 5 nested cross-validation procedure

The ROC curve, shown for the Random Forest classifier in Fig. 6, was constructed using continuous probability predictions collected only from the outer folds of the nested CV setup. The mean AUC obtained in this way was 0.88 (SD = 0.03), which agrees with the values reported in Table 5.

Because mental health risk assessment models may be used to support early-warning systems, triage decisions, or personalized interventions in practice, the reliability of predicted probabilities is equally important as classification accuracy. To examine the probability reliability, a full calibration analysis was performed using the predictions obtained only from the outer folds in the 5 × 5 Nested Cross-Validation framework. This ensures that the calibration metrics indicate unbiased generalization performance. The calibration was assessed using a reliability curve where for each of ten equal-width probability bins, the mean predicted probabilities are plotted against observed event frequencies. Deviations from the diagonal line of perfect calibration indicate systematic over- or under-confidence in model estimates.

ROC curves presented in Fig. 7 depict the external validation performance of the four tested models on the SMHD 2023 dataset. A clear decline in discrimination ability is observed compared to the internal nested cross-validation results, which is expected given demographic shifts and partial feature mismatches between the two datasets. Among these models, XGBoost had the highest external Area Under ROC (AUROC) at 0.72, closely followed by Logistic Regression at 0.71, suggesting that both models retain moderate discriminative power even when applied to an independent population. Random Forest and SVM are slightly lower, with AUROC values of 0.69 and 0.67, respectively, but still show meaningful predictive capability. Overall, while performance decreases on the external dataset, all ROC curves lie well above the diagonal reference line. This confirms that the learned behavioral patterns generalize, at least to some extent, beyond the original survey sample, reinforces the robustness of the identified feature relationships, and points to the potential applicability of the proposed modeling framework in broader real-world mental health risk screening scenarios.

Fig. 7.

Fig. 7

ROC curves for logistic regression, SVM, Random Forest, and XGBoost evaluated on the external SMHD (2023) dataset

We also computed Expected Calibration Error (ECE) as shown in Eq. 18:

$$\text{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}\,\bigl|\text{acc}(B_m) - \text{conf}(B_m)\bigr| \tag{18}$$

The Random Forest model yielded very good internal calibration, with an ECE of 0.044 and a Brier score of 0.162 when averaging probability predictions over the outer folds of the 5 × 5 nested CV. Figure 8 shows that the calibration curve remains close to the diagonal, with only mild overconfidence at the high end of the probability scale. To quantify generalization behavior, we also measured calibration on the external SMHD dataset using the same 10 equal-width probability bins. The external ECE increased to 0.07, reflecting modest calibration drift across datasets with differing population characteristics; the two ECE values thus correspond to internal (0.044) and external (0.07) calibration. Together, these results demonstrate that the model produces stable and trustworthy probability estimates suitable for downstream mental-health decision-support tasks. Separately, the SHAP stability analysis demonstrated strong agreement, with a mean τ of 0.852 across different training seeds, indicating that the interpretability outputs are robust and not biased by random parameter initialization. This is an important property in mental-health risk assessment settings, where models require trustworthy explanations to be practically adopted.

Fig. 8.

Fig. 8

Calibration curve for the Random Forest classifier using probability predictions aggregated from the 5 × 5 nested cross-validation procedure

Statistical significance testing

To ensure that the performance differences between models are not due to random variation, we conducted pairwise paired t-tests across cross-validation folds, as shown in Eq. 19:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}} \tag{19}$$

where $\bar{d}$ is the mean difference between model scores, $s_d$ the standard deviation of the differences, and $n$ the number of folds.

Results:

  • RF versus LR: p < 0.01

  • RF versus SVM: p < 0.05

  • SVM versus LR: not statistically significant

This confirms that the improvements offered by RF are statistically meaningful and not due to sampling variance.
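The fold-wise comparison can be reproduced with SciPy's paired t-test. The per-fold scores below are purely illustrative, not the study's actual values:

```python
import numpy as np
from scipy import stats

# Hypothetical per-fold AUC scores for two models (illustrative numbers only)
rf_auc = np.array([0.87, 0.89, 0.88, 0.86, 0.90])
lr_auc = np.array([0.82, 0.83, 0.81, 0.80, 0.85])

# Paired t-test across folds
t_stat, p_value = stats.ttest_rel(rf_auc, lr_auc)

# Equivalent manual computation of Eq. 19: t = mean(d) / (s_d / sqrt(n))
d = rf_auc - lr_auc
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
```

The paired design is important here: the same folds are scored by both models, so differences are computed fold-by-fold rather than between independent samples.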

External validation on the SMHD (2023) dataset

To evaluate the generalizability and robustness of the developed predictive models beyond the original survey sample, we conducted external validation using the publicly available Social Media Mental Health Dataset [8]. Although the SMHD contains partially different feature sets, several overlapping behavioral indicators enabled the reconstruction of an aligned feature subset for cross-dataset evaluation.

No retraining was performed; the models trained on our internal dataset (481 responses) were applied directly to the SMHD test subset. As expected, external performance decreased compared to internal nested cross-validation, reflecting demographic differences, platform-specific behavioral variability, and reduced feature overlap. Nevertheless, the models maintained meaningful discriminatory ability, indicating that the learned behavioral patterns possess at least moderate real-world transferability.

The results of external validation on the SMHD dataset for the year 2023 presented in Table 6 convey key messages about the generalizability of developed models beyond the original survey population. As expected, all models showed a moderate decline in performance relative to internal nested cross-validation results, consistent with differences in demographic composition, behavioral distributions, and partial mismatch in feature availability between the two datasets. Among the evaluated approaches, XGBoost had the highest external AUROC of 0.72, closely followed by Logistic Regression, at 0.71. Random Forest and SVM were somewhat less discriminative, with respective AUROC values of 0.69 and 0.67. Despite this performance drop, the range of accuracies from 0.64 to 0.69 and F1-scores ranging between 0.59 and 0.64 imply that the underlying behavioral patterns captured in the internal dataset retain meaningful predictive value in an independent real-world setting. These findings support the robustness of the model’s learned relationships while highlighting the importance of population-specific recalibration or domain adaptation in deploying systems for mental health risk prediction across heterogeneous user groups.

Table 6.

External test performance on the SMHD (2023) dataset

Model Accuracy Precision Recall F1-score AUROC
Logistic Regression 0.68 0.65 0.62 0.63 0.71
Support Vector Machine 0.64 0.61 0.58 0.59 0.67
Random Forest 0.66 0.63 0.60 0.61 0.69
XGBoost 0.69 0.66 0.63 0.64 0.72

Fairness and demographic bias analysis

Since the data included demographic variables, we performed a fairness and bias audit to assess whether the model exhibited performance differences across demographic subgroups. We considered three dimensions of demographic analysis: gender (Male, Female, Other), age groups (18–25, 26–35, and > 35), and occupation categories (Student, Salaried, and Other). We computed accuracy, precision, recall, and AUC for all subgroups using the Random Forest classifier under nested cross-validation. The model showed relatively consistent AUC across major demographic groups (Female = 0.87, Male = 0.86, Other Gender = 0.84), with accuracy varying by less than 3.1% across groups. Performance likewise remained stable across age groups, with AUC ranging from 0.84 to 0.88. Occupation-based subgroups showed higher variation, within ± 4.2%, likely due to unbalanced sample representation. A chi-square test of independence found no statistically significant performance differences across gender and age groups (p > 0.05), though performance for minority subgroups should be interpreted with caution because of small sample sizes.
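The subgroup audit can be sketched by computing AUC per demographic group with the rank (Mann–Whitney) formulation. This is a simplified version that ignores tied scores; all names are hypothetical:

```python
import numpy as np

def auc_rank(y_true, y_score):
    """AUC via the rank (Mann-Whitney) formulation; ties are not averaged."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def subgroup_auc(y_true, y_score, groups):
    """Per-subgroup AUC, as in a fairness audit over gender/age/occupation."""
    return {g: auc_rank(y_true[groups == g], y_score[groups == g])
            for g in np.unique(groups)}
```

Comparing the resulting per-group AUCs (and their spread) against the overall AUC is the basic check underlying the subgroup consistency figures reported above.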

To further investigate interpretability-based fairness, we compared SHAP contributions across demographic categories and found that demographic features themselves have very little direct impact on predictions. Instead, behavioral predictors such as screen time, passive scrolling, and stress-driven use were consistently among the most important across groups. These findings suggest that the model is not strongly biased toward any particular demographic; however, the low representation of some groups remains a key limitation when striving for equitable real-world deployment. The exclusion of transformer-based and deep neural models from this framework is deliberate: given the relatively structured behavioral features and limited sample size, classical models are better positioned to return stable, reproducible explanations, especially given this study's emphasis on the reliability of XAI. As dataset size and multimodal inputs grow, deep or hybrid models may be considered in extensions.

Feature importance and behavioral influence

We used SHAP to compute the global feature contributions. The most influential predictors were:

  1. Daily screen time

  2. Frequency of passive scrolling

  3. Stress-driven usage patterns

  4. Late-night online activity

  5. Mood instability scores

We implement a standard red–blue value gradient for the SHAP summary plot, as recommended by the XAI community [19]. SHAP dependence plots, in turn, showed RF's ability to capture higher-order interactions, such as the nonlinear relationship in which stress-driven usage amplifies risk when combined with late-night activity.

Local interpretability results (LIME)

Individual predictions are explored with LIME by fitting linear surrogate models around each sample as shown in Eq. 20:

$$\xi(x) = \operatorname*{arg\,min}_{g \in G}\; \mathcal{L}(f, g, \pi_x) + \Omega(g) \tag{20}$$

Case-level analysis showed that the explanations for High-risk users were dominated by long screen-time duration and irregular sleep-time social media usage, whereas Low-risk users showed strong influences from balanced patterns of usage and lower stress triggers. LIME complemented SHAP with human-readable local explanations that align with clinical decision-making frameworks.

Error analysis

Misclassified samples revealed two significant sources of ambiguity: users with contradictory behaviors (for example, low screen time but high stress indicators), and users whose self-reported survey scores contained inconsistencies, possibly due to recall bias [3, 41].

Discussion

Results obtained in this study have shown that explainable machine learning models can provide meaningful insights into how patterns of using social media relate to depression and anxiety risks. Among all the models, the highest predictive performance was obtained by the Random Forest classifier, while SHAP and LIME added complementary interpretability capabilities to provide transparent insights on individual and population-level risk factors. This section discusses the broader implications of the findings, ethical considerations, data governance issues, and limitations.

Interpretability and clinical relevance

Random Forest was chosen as the main model not only for its superior performance but also because tree-based models allow exact and efficient computation of SHAP values, enabling more confident interpretability. Explainability is crucial in mental health applications, as any prediction may influence sensitive behavioral or clinical decisions. SHAP revealed non-linear interactions between behavioral features such as screen time, passive scrolling, and stress-driven usage that align with psychological theories of digital over-engagement and emotional dysregulation [14, 30]. LIME provided complementary insights by yielding human-readable localized explanations suitable for case-by-case risk assessments. Importantly, these explanations allow clinicians and counselors to contextualize predictions rather than rely solely on algorithmic outputs. Transparency supports informed consent processes, improves trust, and allows practitioners to challenge or refine model results, as discussed in prior XAI research [12, 13, 42]. Overall, our findings support the notion that XAI is necessary not only for validation but also for developing digital mental health interventions aligned with ethical principles.

Ethical, privacy, and data governance considerations

Prediction of mental health from social media is highly sensitive and requires robust governance mechanisms.

Data privacy

Sensitive behavioral and psychological attributes need to be strictly anonymized and securely stored. Future deployments should integrate differential privacy, secure federated learning frameworks, or encrypted model inference pipelines to mitigate exposure risks [15].

Informed consent

Users must be well-informed about how behavioral data are processed, interpreted, and used for digital mental health support. Transparent XAI visualizations support such consent by showing why a particular risk label was assigned.

Bias mitigation

By their very nature, self-reported data are subject to sampling bias, recall bias, and cultural bias. Future studies should mitigate these threats both by incorporating clinically validated assessment tools and by leveraging multimodal behavioral sensing (linguistic cues, physiological signals) to reduce noise and increase diagnostic fidelity [3, 8].

Fairness and equity

Algorithmic fairness needs to be tested across demographic and behavioral subgroups. While our dataset did not include personally identifiable attributes, future iterations could implement fairness-aware learning methods that detect disparities in prediction outcomes.

External generalization performance

In the external validation analysis with the dataset from SMHD, a consistent performance decrease is shown compared with the internal, nested cross-validation. This drop can be explained by the demographic differences of the cohorts and variability in the distribution of mental health indicators as well as partial mismatch between available behavioral features. Importantly, all models kept their AUC-ROC near 0.70; this suggests that predictive behavioral markers learned from our dataset generalize comparatively well to the general population. These findings support the robustness of the feature relationships identified with SHAP analysis and extend the model’s potential applicability to real-world screening settings, given proper population-specific recalibration. Moreover, the Random Forest model showed good internal calibration with an ECE of 0.044 and a Brier score of 0.162 based on predictions aggregated from the outer folds of the nested CV. On the external SMHD dataset, this model showed a slightly higher calibration error, ECE = 0.07, which means there is modest calibration drift because of distributional differences in the datasets. This point also helps explain why two different values of ECE are shown: the smaller one refers to internal evaluation; the larger one reflects performance in generalizing externally.

Limitations of cross-sectional and self-reported data

Only a cross-sectional design was used for the dataset, which limits how well the model can represent temporal shifts in mental health states. By nature, mental health trajectories are dynamic, influenced by cyclic stressors, seasonal patterns, and life events [43]. Without longitudinal tracking, the model cannot distinguish persistent symptoms from transient states. Moreover, the survey-based labels lack clinical validation, so true mental health risk may be over- or underestimated. Although this approach aligns with large-scale digital cohort studies [5], it underlines the need for hybrid datasets integrating validated clinical assessments (e.g., PHQ-9, GAD-7), passive behavioral signals, and multimodal digital biomarkers.

Implications for real-world digital interventions

It is important to emphasize that the proposed intervention framework in this work is conceptual and illustrative; it has not yet been deployed in real-world settings and serves only as a feasibility demonstration of how XAI outputs can support personalized digital mental health interventions. Features identified by SHAP allow for tailored recommendations, such as reducing social media use before bedtime, managing passive scrolling behavior, or addressing stress-triggered digital engagement. This aligns with the XAI-driven digital therapeutics literature, where model interpretability increases the transparency of intervention mechanisms and hence their acceptability [36, 44]. However, real-world deployment must adequately address the risk of over-reliance on automated suggestions and ensure that models augment rather than replace human judgment.

Future directions

Future research will investigate the inclusion of time-series behavioral data to track the progression of mental health, the validation of predictions by mental health professionals, model robustness across subgroups in the population, the inclusion of language, physiology, and contextual metadata [40] and the conduct of real user studies for the assessment of the feasibility and engagement of the intervention system. Altogether, these directions will advance the system from conceptual feasibility toward clinically trustworthy, ethically aligned, and user-centered digital mental health support.

Explainability analysis

This work considers explainability as a central component of the study; it makes model predictions about mental health risks transparent, interpretable, and clinically meaningful. This section updates and expands the interpretation of SHAP and LIME outputs, emphasizing feature contributions and including the corrected SHAP summary plot with standard color gradients.

SHAP-based global interpretability

SHAP, a state-of-the-art explanation method, was applied to the best-performing model of this study, the Random Forest classifier. Grounded in cooperative game theory, SHAP assigns each feature an additive contribution value that decomposes the prediction into per-feature components [19, 20]. Figure 9 shows the SHAP summary-style plot of the distribution of feature impacts on predicted mental health risk. Each point represents an individual SHAP value for a given feature across the evaluation folds. Features are ranked vertically by overall importance, with wider horizontal dispersion indicating stronger influence on the model output. Consistent with what the model has learned, Daily Screen Time, Passive Browsing, and Stress-Driven Use have the widest range of SHAP values, whereas Restlessness, Anxiety/Tension, and Mood Fluctuation show more moderate effects. This view emphasizes the central role that behavioral social media indicators play in shaping risk predictions.

Fig. 9.

Fig. 9

SHAP summary plot

The top predictors included:

  1. Nighttime Screen Time: top quartiles were highly correlated with high risk scores, which aligns with evidence from behavioral studies indicating the disruption of sleep and elevated emotional distress due to digital activity during late hours.

  2. Passive Scrolling Duration: Higher passive browsing was associated with higher predicted risk, consistent with literature suggesting negative emotional effects of noninteractive social media use.

  3. Stress-Driven Use: Higher use during stressful events contributed positively to risk, pointing to the danger of emotion-driven online usage.

These findings reveal nonlinear patterns: moderate screen time alone does not increase risk, whereas specific combinations, such as long screen time coupled with nighttime activity, sharply increase the predicted risk.

SHAP interaction insights

Interaction SHAP values were calculated to investigate how pairs of features jointly impact the risk predictions. Strong interactions included:

  • Screen Time × Stress Usage: Users who had longer screen time only during stressful periods demonstrated higher risk scores.

  • Nighttime Usage × Sleep Quality: Poor sleep, in conjunction with late-night scrolling, strongly contributed to positive risk outcomes.

These are particularly important interactions that help identify behavioral patterns that may precede symptom escalation, thus making the provided signals actionable for targeted interventions.

Local interpretability using LIME

While SHAP provided consistent global insights, LIME was used to interpret individual-level predictions [21]. LIME explanations were most useful in user-specific assessments, where the model identified the top 3–5 features driving each user's predicted label. Figure 10 shows an example LIME explanation for a single high-risk prediction. LIME provides a local, instance-specific explanation by approximating how much each feature contributed to the prediction; features with positive contribution weights pushed the model toward the high-risk output.

Fig. 10.

Fig. 10

LIME explanation showing top contributing features for a single high-risk prediction

In this example, Night-Time Usage features the highest positive contribution, followed by Passive Browsing and Stress-Driven Use. These behavioral indicators are in agreement with existing evidence linking late-night engagement and stress-induced social media use to elevated mental health risk. Other variables like Daily Screen Time and Number of Platforms Used contributed moderately, while Restlessness, Anxiety/Tension, and Mood Fluctuation had a relatively smaller effect. This localized explanation points out the specific behavioral factors driving the model’s high-risk prediction for this user, complementing the global interpretability obtained through SHAP analysis.

These explanations promote transparency, allowing the end-user or clinician to identify the underlying reason for the individual predictions and make informed decisions to take appropriate follow-up actions.

Alignment of SHAP and LIME outputs

We measured the consistency between SHAP and LIME outputs. Across 100 sampled predictions, SHAP and LIME agreed on the top feature in 86% of cases and shared at least three of the top features in 92% of explanations. Divergences mainly concerned cases very close to the decision boundary. This degree of agreement confirms that both methods consistently highlight the same behavioral determinants, increasing the reliability of the interpretability results.

Practical interpretation and clinical relevance

The XAI insights identify behavioral signatures associated with elevated mental health risk. For example, high nighttime use may call for digital sleep-hygiene interventions; excessive passive scrolling may require algorithmic content filtering or the promotion of active engagement; and stress-driven consumption suggests the need for emotional self-regulation tools. These interpretations directly inform intervention pathways that bridge the gap between predictive modeling and real-world digital mental health support systems. In addition, the SHAP summary plot was regenerated with the standardized gradient, and plot jitter and opacity were adjusted to remove overlapping markers, improving the transparency and readability of the interpretability analysis.

Limitations

Although the proposed XAI-based framework shows encouraging results for detecting psychological distress from social media behaviors, several limitations should be discussed to place the findings in perspective.

First, the dataset in this study contains 481 cross-sectional, self-reported responses. This modest sample size inherently reduces statistical power and generalizability across broader demographic groups; because most participants were students and young adults, the observed behavioral patterns and stress indicators may not represent the general population. Second, self-report survey data carry inherent biases, including recall bias, social desirability bias, and subjective misinterpretation of mental health experiences; furthermore, the absence of clinically validated diagnostic labels prevents direct comparison with medically confirmed mental health conditions, reducing the clinical reliability of model predictions. Third, the dataset lacks temporal or longitudinal signals, so dynamic changes in psychological state over time could not be observed. Real-world mental health trajectories are often nonlinear and influenced by daily context, neither of which can be captured through static surveys.

Fourth, the predictive models were trained on structured behavioral features only, without multimodal content such as text, linguistic cues, images, or sensor-based signals. Past studies confirm that multimodal models exhibit higher sensitivity in detecting mental health disorders [11, 30], so this omission may limit the system’s predictive capability. Fifth, although SHAP and LIME yield meaningful interpretability, both depend on model behavior, input distributions, and sampling strategies, so their stability may vary across datasets or with alternative model architectures. Moreover, counterfactual and contrastive explanations, which are increasingly recommended in mental health AI for user-centered interpretability, were not included in the present study. Sixth, the proposed digital intervention model remains conceptual, with feasibility demonstrated in simulation rather than actual deployment. No user studies, clinical evaluation, or human-centered validation were conducted that would support conclusions about its usability, engagement, or mental health impact.

Finally, ethical considerations, including privacy risks, informed consent in digital monitoring, data governance, and bias mitigation, require further deliberation before such a system is deployed in practice. These aspects are crucial in mental health, where misinterpretation or misuse of predictions may lead to severe outcomes. Despite these limitations, the framework provides a solid foundation for future research and lays the groundwork for transparent, ethically aligned, and scalable AI-driven mental health support systems.

Conclusion and future work

This study presented an XAI-based framework to detect psychological distress using behavioral indicators derived from social media usage patterns. The framework combines supervised learning models with SHAP and LIME explanations, pairing predictive capability with transparent interpretation, an essential building block in mental health applications that require trust, safety, and interpretability. The Random Forest classifier outperformed all other evaluated models, yielding an accuracy of 84.2% and an AUC of 0.88. The XAI analyses demonstrated that excessive screen time, nighttime usage, passive scrolling, and stress-driven engagement were consistently associated with increased depression and anxiety risk across both explanation methods. These insights support an interpretable behavioral pathway through which social media habits influence mental health outcomes. Beyond predictive modeling, the work introduced a conceptual digital intervention framework that uses model explanations to generate personalized behavioral recommendations. A prototype-level simulation demonstrated feasibility, showing how SHAP-driven feature attributions may inform tailored nudges aimed at reducing harmful digital behaviors and promoting healthier patterns of engagement. This integration of prediction, explanation, and intervention helps bridge the gap between algorithmic insight and practical mental health support.

Although the framework holds much promise, the limitations discussed in Sect. 9 point to significant areas for future improvement. Building on the present work, several directions stand out. First, the datasets used here should be augmented with larger, more diverse, and clinically validated data to improve generalizability and clinical relevance; incorporating textual, linguistic, and temporal signals, such as posting frequency, sentiment trajectories, or real-time smartphone sensor data, could further improve model performance and ecological validity. Second, advanced model architectures, including transformer-based and multimodal deep learning models, should be paired with robust XAI techniques such as counterfactual reasoning, contrastive explanations, and causal attribution methods to enable deeper and more reliable insights into user behaviors across diverse contexts.

Third, future studies will enhance the practical value of the intervention framework by designing and testing a fully functional mobile or web-based system. Such a platform should be assessed for acceptability, interpretability, and mental health outcomes through user-centered evaluation, usability testing, A/B experiments, and clinical validation. Finally, future research will continue to be guided by ethical, privacy, and data governance considerations. Key priorities include transparent consent processes, bias mitigation strategies, federated or privacy-preserving learning mechanisms, and secure data pipelines that ensure XAI-driven mental health systems are deployed responsibly and equitably. Overall, this work contributes a novel, interpretable, and behaviorally grounded framework for mental health risk detection using social media data. By combining machine learning with explainability and intervention design, it lays the foundation for future research toward safe, transparent, and clinically meaningful digital mental health technologies.

Author contributions

K.L. prepared the original draft and contributed to visualization, methodology, and validation. S.R. performed conceptualization, data curation, and formal analysis. M.S. prepared the figures, handled software and resources, and performed validation. All authors reviewed the manuscript and contributed equally.

Funding

This research received no funding.

Data availability

The dataset used in this study is openly accessible on Kaggle at: https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health and https://aclanthology.org/C18-1126/

Declarations

Conflict of interest

The authors declare no competing interests.

Ethical approval

This study uses publicly available, anonymized survey data and does not involve any human or animal experiments requiring ethical review; therefore, no additional approval was required.

Consent to participate

Not applicable.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Shalli Rani, Email: shallir79@gmail.com.

Mohammad Shabaz, Email: bhatsab4@gmail.com.

References

  • 1.World Health Organization. Mental health: strengthening our response. 2023. https://www.who.int/news-room/fact-sheets/detail/mental-health-strengthening-our-response.
  • 2.Guntuku SC, Yaden DB, Kern ML, Ungar LH, Eichstaedt JC. Studying expressions of loneliness in individuals using Twitter: an observational study. BMJ Open. 2019;9(11):030355.
  • 3.Calvo RA, Milne DN, Hussain MS, Christensen H. Natural language processing in mental health applications using non-clinical texts. Nat Lang Eng. 2017;23(5):649–85.
  • 4.Alghowinem S, Goecke R, Wagner M, Epps J, Gedeon T, Breakspear M, Parker G. A comparative study of different classifiers for detecting depression from spontaneous speech. IEEE; 2013. pp. 8022–8026.
  • 5.Amir S, Dredze M, Ayers JW. Mental health surveillance over social media with digital cohorts. Association for Computational Linguistics; 2019. pp. 114–120.
  • 6.Giakoumis D, Drosou A, Cipresso P, Tzovaras D, Hassapis G, Gaggioli A, Riva G. Real-time monitoring of behavioural parameters related to psychological stress. In: Annual review of cybertherapy and telemedicine. Amsterdam: IOS Press; 2012. pp. 287–291.
  • 7.Benton A, Mitchell M, Hovy D. Multitask learning for mental health using social media text. In: Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics. 2017. pp. 152–162.
  • 8.Cohan A, Desmet B, Yates A, Soldaini L, MacAvaney S, Goharian N. SMHD: a large-scale resource for exploring online language usage for multiple mental health conditions. In: Proceedings of the 27th international conference on computational linguistics (COLING). 2018. pp. 1485–1497. https://aclanthology.org/C18-1126/.
  • 9.Safa R, Edalatpanah SA, Sorourkhah A. Predicting mental health using social media: a roadmap for future development. In: Chatterjee J, Dey N, Joshi A, editors. Deep learning in personalized healthcare and decision support. Cambridge: Academic Press; 2023. pp. 285–303.
  • 10.Sahili ZA, Patras I, Purver M. Multimodal machine learning in mental health: a survey of data, algorithms, and challenges. 2024.
  • 11.Arrieta AB, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fus. 2020;58:82–115.
  • 12.Oberste L, Heinzl A. User-centric explainability in healthcare: a knowledge-level perspective of informed machine learning. IEEE Trans Artif Intell. 2022;4(4):840–57.
  • 13.Joyce DW, Kormilitzin A, Smith KA, Cipriani A. Explainable artificial intelligence for mental health through transparency and interpretability for understandability. NPJ Digit Med. 2023;6(1):6. 10.1038/s41746-023-00765-w.
  • 14.Mittelstadt BD, Allo P, Taddeo M, Wachter S, Floridi L. The ethics of algorithms: mapping the debate. Big Data Soc. 2016;3(2):1–21.
  • 15.Doshi-Velez F, Kim B. Towards a rigorous science of interpretable machine learning. 2017. arXiv preprint http://arxiv.org/abs/1702.08608.
  • 16.Sokol K, Flach P. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In: Proceedings of the 2020 conference on fairness, accountability, and transparency (FAT* ’20). Barcelona: ACM; 2020. pp. 56–67. 10.1145/3351095.3372870.
  • 17.Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Advances in neural information processing systems, 30. 2017.
  • 18.Lundberg SM, Erion GG, Lee S-I. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2(1):56–67.
  • 19.Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. 2016. pp. 1135–1144.
  • 20.Chowdhury SH, Mamun M, Shaikat TA, Hussain MI, Iqbal S, Hossain MM. An ensemble approach for artificial neural network-based liver disease identification from optimal features through hybrid modeling integrated with advanced explainable AI. Medinform. 2025;2(2):107–19.
  • 21.Mamun M, Chowdhury SH, Hossain MM, Khatun M, Iqbal S. Explainability enhanced liver disease diagnosis technique using tree selection and stacking ensemble-based random forest model. Inform Health. 2025;2(1):17–40.
  • 22.Singh S, Wani NA, Kumar R, Bedi J. Diaxplain: a transparent and interpretable artificial intelligence approach for type-2 diabetes diagnosis through deep learning. Comput Electr Eng. 2025;126:110470.
  • 23.Mamun M, Hussain MI, Ali MS, Chowdhury MSA, Chowdhury SH, Hossain MM. An explainable ensemble learning framework with feature optimization for accurate maternal health risk prediction. In: 2025 International conference on quantum photonics, artificial intelligence, and networking (QPAIN). IEEE; 2025. pp. 1–6.
  • 24.Das K, Mamun M, Safat Y, Hussain MI, Hossain MM, Chowdhury SH. Optimized feature-driven dengue diagnosis using explainable machine learning approaches. In: 2025 International conference on quantum photonics, artificial intelligence, and networking (QPAIN). IEEE; 2025. pp. 1–6.
  • 25.Wani NA, Kumar R, Bedi J. Harnessing fusion modeling for enhanced breast cancer classification through interpretable artificial intelligence and in-depth explanations. Eng Appl Artif Intell. 2024;136:108939.
  • 26.Saharan S, Wani NA, Chatterji S, Kumar N, Almuhaideb AM. A deep learning and explainable artificial intelligence based scheme for breast cancer detection. Sci Rep. 2025;15(1):32125.
  • 27.Wani NA, Kumar R, Bedi J. Deepxplainer: an interpretable deep learning based approach for lung cancer detection using explainable artificial intelligence. Comput Methods Programs Biomed. 2024;243:107879.
  • 28.Kerz E, Zanwar S, Qiao Y, Wiechmann D. Toward explainable AI (XAI) for mental health detection based on language behavior. Front Psychiatry. 2023;14:1219479. 10.3389/fpsyt.2023.1219479.
  • 29.Yates A, Cohan A, Goharian N. Depression and self-harm risk assessment in online forums. In: Proceedings of the 2017 conference on empirical methods in natural language processing (EMNLP). 2017. pp. 2968–2978.
  • 30.Losada DE, Crestani F, Parapar J. Overview of eRisk 2019: early risk prediction on the internet. In: International conference of the cross-language evaluation forum for European languages (CLEF). Cham: Springer; 2019. pp. 340–357.
  • 31.Yadav S, Ekbal A, Saha S, Bhattacharyya P. Medical sentiment analysis using social media: towards building a patient assisted system. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA); 2018. pp. 2232–2237.
  • 32.Vellido A. The importance of interpretability and visualization in machine learning for applications in medicine and healthcare. Neural Comput Appl. 2019;32:18069–83.
  • 33.Rafieian O, Yoganarasimhan H. AI and personalization. In: Artificial intelligence in marketing. 2023. pp. 77–102.
  • 34.Bruhn AL, Rila A, Mahatmya D, Estrapala S, Hendrix N. The effects of data-based, individualized interventions for behavior. J Emot Behav Disord. 2020;28(1):3–16. 10.1177/1063426619852223.
  • 35.Chowdhury SH, Hussain MI, Chowdhury MSA, Ali MS, Hossain MM, Mamun M. Hepatitis C detection from blood donor data using hybrid deep feature synthesis and interpretable machine learning. In: 2025 2nd International conference on next-generation computing, IoT and machine learning (NCIM). IEEE; 2025. pp. 1–6.
  • 36.Fitzpatrick KK, Darcy A, Vierhile M. Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): a randomized controlled trial. JMIR Ment Health. 2017;4(2):19.
  • 37.Ahmed S, Syeda MN. Social media and mental health dataset. 2024. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health.
  • 38.Samayamantri LS, Singhal S, Krishnamurthy O, Regin R. AI-driven multimodal approaches to human behavior analysis. In: Rajest SS, Moccia S, Singh B, Regin R, Jeganathan J, editors. Advancing intelligent networks through distributed optimization. Hershey: IGI Global; 2024. pp. 485–506.
  • 39.Wang X, Shi Y, Liu L. Research on non-invasive psychological detection technology based on artificial intelligence. Acad J Hum Soc Sci. 2021;4(3):10–6.
  • 40.Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv (CSUR). 2018;51(5):1–42. 10.1145/3236009.
  • 41.Atewologun F, Adigun OA, Okesanya OJ, Hassan HK, Olabode ON, Micheal AS, et al. A comprehensive review of mental health services across selected countries in sub-Saharan Africa: assessing progress, challenges, and future direction. Discover Ment Health. 2025;5(1):1–19.
  • 42.Olawade DB, Wada OZ, Odetayo A, David-Olawade AC, Asaolu F, Eberhardt J. Enhancing mental health with artificial intelligence: current trends and future prospects. J Med Surg Public Health. 2024;3:100099. 10.1016/j.jomasuph.2024.100099.



Articles from Discover Mental Health are provided here courtesy of Springer
