Abstract
Purpose
Social networks have become a popular venue for users to communicate with friends and to share opinions, photos, and videos that reflect their moods, feelings and sentiments. This creates an opportunity to analyze social network data for users’ feelings and sentiments, and so to investigate their moods and attitudes when they communicate via these online tools.
Methods
Although the diagnosis of depression from social network data has gained an established position globally, several dimensions remain unexplored. In this study, we perform depression analysis on Facebook data collected from an online public source. To investigate the effectiveness of depression detection, we propose machine learning techniques as an efficient and scalable method.
Results
We report an implementation of the proposed method and evaluate its efficiency using a set of psycholinguistic features. We show that the proposed method can significantly improve accuracy and classification error rate. In addition, the results show that, across the different experiments, Decision Tree (DT) gives higher accuracy than the other ML approaches for detecting depression.
Conclusions
Machine learning techniques provide high-quality solutions for identifying mental health problems among Facebook users.
Keywords: Social network, Emotions, Depression, Sentiment analysis
Introduction
The proliferation of internet and communication technologies, especially online social networks, has transformed how people interact and communicate with each other electronically. Applications such as Facebook, Twitter, Instagram and the like not only host written and multimedia content but also allow their users to express their feelings, emotions and sentiments about a topic, subject or issue online. On the one hand, this lets users of social networking sites contribute and respond openly and freely to any topic online; on the other hand, it creates opportunities for people working in the health sector to gain insight into the mental state of someone who reacted to a topic in a specific manner. To provide such insight, machine learning techniques could potentially offer features that assist in examining the patterns hidden in online communication and in processing them to reveal the mental state (such as ‘happiness’, ‘sadness’, ‘anger’, ‘anxiety’, ‘depression’) of social network users. Moreover, there is a growing body of literature addressing the role of social networks in the structure of social relationships and in issues such as relationship breakup, mental illness (‘depression’, ‘anxiety’, ‘bipolar’ etc.), smoking and drinking relapse, sexual harassment and suicide ideation [1, 2].
In this study, we aim to analyze Facebook data to detect factors that may reflect depression in the relevant Facebook users. Various machine learning techniques are employed for this purpose. Given the key objective of this study, the paper addresses the following research challenges.
What is depression, and what are the common factors contributing to it?
What factors should be looked for to detect depression in Facebook comments?
How can these factors be extracted from Facebook comments?
What is the relationship between these factors and attitudes toward depression?
When is the most influential time to communicate with depression-indicative Facebook users?
What are the most influential machine learning techniques for detection of depression in Facebook comments?
In the context of the above-mentioned challenges, we analyze depression from Facebook users’ data [3, 4]. As users express their feelings in posts or comments on the Facebook platform, their posts and comments sometimes refer to an emotional state such as ‘joy’, ‘sadness’, ‘fear’, ‘anger’, or ‘surprise’ [5, 6]. We analyze various features of Facebook comments using machine learning classification techniques in order to make overall judgements about them. In this study, we used publicly available Facebook data (from bipolar, depression and anxiety Facebook pages) containing users’ comments. Once we had accessed the data, they were cleaned of any inconsistencies and then analyzed with a software application called LIWC [7, 8].
In this study, we examine various linguistic cues that help to detect emotion cause events, such as the position of the cause event and the experiencer relative to the emotion keyword. The cues fall into three groups. Emotional process cues include positive emotion (e.g. ‘happy’, ‘love’, ‘nice’), negative emotion (e.g. ‘worthless’, ‘loser’, ‘hurt’, ‘ugly’, ‘nasty’), sadness (e.g. ‘worry’, ‘crying’, ‘grief’, ‘sad’), anger (e.g. ‘stop’, ‘shit’, ‘hate’, ‘kill’, ‘annoyed’) and anxiety (e.g. ‘worried’, ‘fearful’). Temporal process cues include present focus (e.g. ‘today’, ‘is’, ‘now’), past focus (e.g. ‘ago’, ‘did’, ‘talked’) and future focus (e.g. ‘shall’, ‘may’, ‘will’, ‘soon’). Linguistic style cues include articles (e.g. ‘a’, ‘an’, ‘the’), prepositions (e.g. ‘for’, ‘in’, ‘of’, ‘to’, ‘with’, ‘above’), auxiliary verbs (e.g. ‘do’, ‘have’, ‘am’, ‘will’), conjunctions (e.g. ‘and’, ‘but’, ‘whereas’), personal pronouns (e.g. ‘I’, ‘them’, ‘her’, ‘him’), impersonal pronouns (e.g. ‘it’, ‘it’s’, ‘those’), verbs (e.g. ‘go’, ‘good’) and negation (e.g. ‘deny’, ‘dishonest’, ‘no’, ‘not’, ‘never’).
The main contributions of this paper are listed as follows:
First, we synthesized the literature on various emotion detection techniques to detect depression.
Second, we designate four feature types for our specific research problem and elaborate on the lessons learned from using each type.
Third, our experiments are carried out on datasets of Facebook user comments.
Fourth, we suggest machine learning techniques that utilize all factors and maintain robustness. We also find that a Decision Tree classifier outperforms the other classifiers (SVM, KNN and Ensemble) on our dataset. Finally, our work also shows the importance of depression detection for mental disorder detection.
The remainder of the paper is organized as follows. The “Related work” section reviews related work on detecting depression from social network data. The methodology is explained in the third section. The experimental analysis is presented in the fourth section, and its discussion in the fifth section. Finally, the conclusion and future work are provided in the last section.
Related work
There is a growing body of literature that analyzes the properties of depression [9–12]. Choudhury et al. [13] argue that depression poses a serious challenge to individual and public health. A considerable number of individuals suffer from depression, and only a fraction receive adequate treatment each year. They also investigated the potential of social media to detect and diagnose signs of major depressive disorder in individuals. From users’ social media postings, they measured behavioral attributes relating to social engagement, emotion, language and linguistic styles, ego network, and mentions of antidepressant medications.
Choudhury et al. [14] considered social media as a promising tool for public health, focusing on the use of Twitter posts to build predictive models of the forthcoming influence of childbirth on the behavior and mood of new mothers. Using Twitter posts, they measured postpartum changes in 376 mothers along dimensions of social engagement, emotion, social network, and linguistic style. O’Dea et al. [15] noted that Twitter is increasingly investigated as a means of detecting mental health status, including depression and suicidality, in the population. Their study showed that it is possible to distinguish the level of concern among suicide-related tweets, using both human coders and an automatic machine classifier.
Zhang et al. [16] have shown that if individuals at high risk of suicide can be identified through social media such as microblogs, it is possible to implement an active intervention system to save their lives.
Many researchers have demonstrated that using user-generated content (UGC) appropriately may help determine individuals’ mental health levels. For instance, Aldarwish and Ahmad [17] observed that the use of Social Network Sites (SNS) is increasing, particularly among the younger generations, because the availability of SNS enables users to express their interests and feelings and to share their daily routines [18, 19].
Nguyen et al. [20] used machine learning and statistical methods to discriminate online messages between depression and control groups using mood, psycholinguistic processes and content topics extracted from the posts generated by members of these groups.
Park et al. [21] investigated attitudes and behaviors toward online social media depending on whether one is depressed or not. They conducted semi-structured face-to-face interviews with 14 active Twitter users, half of whom were depressed and the other half non-depressed. In addition, they discussed several design implications for future social networks that could better accommodate users with depression and provide insights towards helping depressed users meet their needs through online social media [22].
Bachrach et al. [23] studied how users’ activity on Facebook relates to their personality, as measured by the standard Five Factor Model. They analyzed relationships between users’ personality and the properties of their Facebook profiles, for instance the size and density of their friendship network, the number of uploaded photos, the number of events attended, the number of group memberships, and the number of times the user has been tagged in photos. Ortigosa et al. [24] presented a new method for sentiment analysis in Facebook that, starting from messages written by users, extracts information about the users’ sentiment polarity (positive, neutral or negative) as transmitted in the messages they write, models the users’ usual sentiment polarity, and detects significant emotional changes.
In the context of Facebook mining, Holleran [9] found initial evidence that depression is a major contributor to the overall global burden of diseases. In other related work, Wang et al. [19] and Shen et al. [25] examined various depression-related features, and built a multimodal depressive model to detect the depressed users.
Although some of the work reported above has discussed emotional process, temporal process and linguistic style for detecting depression, the following shortcomings are observed in the existing literature:
There are few individual studies that have applied SVM, KNN, Decision Tree and Ensemble separately. There are no well-known studies that have combined all these techniques on the same dataset to investigate the variations in technique-based findings.
There is no significant study that has applied the above-mentioned machine learning techniques on Facebook data for depression detection.
To address the above-listed shortcomings, we attempt to detect depression from Facebook comments in the present work. We expand the scope of social media-based depression measures by describing the different features of Facebook user comments, and we apply machine learning approaches that can use those measures to detect individuals who are suffering from depression.
Methodology
In this study, we first focus on four types of factors: emotional process, temporal process, linguistic style, and all three (emotional, temporal, linguistic style) feature types together, for the detection and processing of depressive data received as Facebook posts. We then apply supervised machine learning approaches to study each factor type independently. The classification techniques ‘decision tree’, ‘k-Nearest Neighbor’, ‘Support Vector Machine’, and ‘ensemble’ are deemed suitable for each type (refer to Fig. 1).
Data set exploration
We worked on Facebook users’ comments for depressive behavioral exploration and detection. We collected data from the social network [26]. Preparing social network data, in particular Facebook users’ comments, so as to determine whether or not they contain depression-bearing content is one of the primary challenges. To tackle this issue we use NCapture to collect data from Facebook [27, 28]. NCapture is a powerful tool for qualitative data analysis. It is intended to help organize, analyze and find insights in unstructured data such as open-ended survey responses, social media, interviews, articles and web content. Furthermore, it provides a place to organize and manage material so that insights can be found more efficiently [29].
Data set preparation
After the raw data were collected from Facebook, they were analyzed using the LIWC software [7, 8]. LIWC is the heart of the text analysis strategy and can process text line by line. Our primary dataset contains a total of 21 columns: 13 columns represent linguistic style (articles, prepositions, auxiliary verbs, conjunctions, personal pronoun, impersonal pronouns, verbs, negation etc.), 5 columns represent emotional process (positive, negative, sad, anger and anxiety), and 3 columns represent temporal process (past, present and future). Each column gives individual information about depressive behavior (refer to Table 1).
Table 1.
Characteristic | Quantity |
---|---|
Total number of cells | 150,045 |
Total number of depressive indicative cells based on our manual response (with zero values) | 87,129 |
Total number of non-depressive indicative cells based on our manual response | 62,916 |
Total number of depressive indicative cells based on our manual response for linguistic style (without zero value) | 43,551 |
Articles | 3200 |
Prepositions | 3842 |
Auxiliary verbs | 3813 |
Conjunctions | 3619 |
Personal pronoun | 3875 |
Impersonal pronoun | 3444 |
Verbs | 4004 |
Negations | 2637 |
Pronoun | 3989 |
Adverb | 3407 |
Adjective | 3342 |
Comparative | 2436 |
Interrogative | 1943 |
Total number of depressive indicative cells based on our manual response for emotional process (without zero value) | 13,884 |
Positive | 2676 |
Negative | 4149 |
Sad | 1733 |
Anger | 1177 |
Anxiety | 4149 |
Total number of depressive indicative cells based on our manual response for temporal process (without zero value) | 8237 |
Past | 2527 |
Present | 3909 |
Future | 1801 |
Building ground truth dataset
This section discusses the process employed to construct our dataset with ground truth label information (whether a comment is depression indicative). The Facebook data containing users’ comments were divided into two sets: (a) the positive (YES) class (depression indicative comments) and (b) the negative (NO) class (non-depression indicative comments).
Out of the total of 7145 comments, 58% were labeled ‘YES’ (depression indicative) and 42% ‘NO’ (non-depression indicative). Table 2 summarizes the dataset, and a few examples of labeled comments are given in Table 3.
Table 2.
Data set information | Quantity |
---|---|
Total number of Facebook users comments | 7145 |
Depression indicative comments | 4149 |
Non-depression indicative comments | 2996 |
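The counts in Tables 1 and 2 can be cross-checked with a few lines of arithmetic. The sketch below assumes that one "cell" in Table 1 is one comment-by-feature entry, so that the cell totals equal the comment counts multiplied by the 21 feature columns; the variable names are illustrative only.

```python
# Cross-check of the dataset counts reported in Tables 1 and 2.
# Assumption: one "cell" = one (comment, feature column) entry.

N_COLUMNS = 13 + 5 + 3  # linguistic + emotional + temporal columns
comments = {"YES": 4149, "NO": 2996}

total_comments = sum(comments.values())
cells = {label: n * N_COLUMNS for label, n in comments.items()}
share_yes = comments["YES"] / total_comments

print(total_comments)                      # 7145
print(cells, sum(cells.values()))          # per-class cells and their total
print(f"{share_yes:.0%} YES, {1 - share_yes:.0%} NO")
```

Under this assumption the per-class cell counts come out as 87,129 and 62,916, which sum to the 150,045 cells listed in Table 1, and the YES share rounds to 58%.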
Table 3.
Examples | Response |
---|---|
I am currently having the problem of restlessness and needing to move but I also don’t feel like moving | YES |
I feel sad and con not concentrate in my studies | YES |
I find faults in all the people around me and I feel lonely and alone | YES |
My daughter started on depakote at age 16. She did ok but, when she started lithium things changed for the better. Even she recognized the change and gets upset if a Dr. Wants to take her off lithium. Everyone and every MD is different | YES |
I hate the fact that I know some of my triggers but can’t avoid them…l have to just keep up the exposure as I’ve been told this is better than isolating myself in fear | YES |
I’m having a terrible day. Angry at everyone. Been so depressed now for more than 30 days in a row. Hiding in my bedroom away from people. Pushing my friends away. I’m trying to fix the urge to cut but fear I’m not strong enough to keep ignoring the call of the blade. Please I need help | YES |
Put an alarm on your phone I need to again it works | NO |
I use to use rubbing alcohol and worked whl younger but dint give a rats ass now still get teased by it by insecure men. But they can go fuck themselves | NO |
Story of my life. I struggle with these things daily | NO |
I take Latuda at night because it makes me sleepy and xanax throughout the day for anxiety | NO |
Feature extraction
To characterize and discriminate between depressive and non-depressive posts, we extract features based on psycholinguistic measurements from the users’ posts, as explained briefly below.
Psycholinguistic features. LIWC is a psycholinguistic dictionary package developed by psychology researchers to recognize the various affective, cognitive, and linguistic components present in users’ verbal or written communication. It returns more than 70 different variables grouped into higher-level psycholinguistic features, for example:
Psychological process: affective process, social process, cognitive process, perceptual process, biological process, drives, time orientations, relativity, personal concerns
Linguistic process: word count, word/sentence, pronoun, personal pronoun, articles, prepositions, auxiliary verbs, adverbs, conjunctions, negations
Other grammar: verbs, adjectives, comparisons, interrogatives, number, quantifiers.
These higher-level categories are also divided into subcategories such as
Biological processes: sexual, body, ingestion and health
Affective processes: anxiety, anger, sadness, positive emotion, negative emotion
Time orientations: present, past, future
Social processes: family, friends, male, female
Perceptual processes: see, hear, feel.
For our research, we took 23 of the 70 factors and converted every depressive and non-depressive post into numerical values based on psycholinguistic features. Table 4 shows the different classes of LIWC psycholinguistic processes.
Table 4.
LIWC derived cues | Example word |
---|---|
Emotional process | |
Positive emotion words | Happy, love, nice, sweet |
Negative emotion words | Worthless, loser, hurt, ugly, nasty |
Sadness words | Worry, crying, grief, sad |
Anger words | Stop, shit, hate, kill, annoyed |
Anxiety words | Worried, fearful |
Temporal process | |
Present focus | Today, is, now |
Past focus | Ago, did, talked |
Future focus | Shall, may, will, soon |
Linguistic style | |
Articles | A, an, the |
Prepositions | For, in, of, to, with, above |
Auxiliary verbs | Do, have, am, will |
Adverbs | Quickly, slowly, very, really |
Conjunctions | And, but, whereas |
Total pronouns | I, them, itself |
Personal pronoun | I, them, her |
1st person singular pronoun | I, me, mine |
1st person plural pronoun | We, us, our |
2nd person | You, your |
3rd person singular pronoun | He, she, her, him |
3rd person plural pronoun | They, their, they’d |
Impersonal pronouns | It, it’s, those |
Verbs | Go, good |
Negation | Deny, dishonest, no, not, never |
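To make the conversion from text to numerical features concrete, the following is a minimal dictionary-counting sketch. It is not LIWC itself: the toy dictionary contains only some of the example words from Table 4, the category names and the function `category_percentages` are hypothetical, and scoring each category as a percentage of the words in the comment is an assumption about how such scores are typically expressed.

```python
import re

# Toy dictionary built only from example words in Table 4.
# The real LIWC2015 dictionary is far larger; this is NOT its content.
TOY_DICTIONARY = {
    "positive_emotion": {"happy", "love", "nice", "sweet"},
    "negative_emotion": {"worthless", "loser", "hurt", "ugly", "nasty"},
    "sadness": {"worry", "crying", "grief", "sad"},
    "anger": {"stop", "shit", "hate", "kill", "annoyed"},
    "anxiety": {"worried", "fearful"},
    "past_focus": {"ago", "did", "talked"},
    "present_focus": {"today", "is", "now"},
    "future_focus": {"shall", "may", "will", "soon"},
}


def category_percentages(text):
    """Return, per category, the percentage of words in `text` that match."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in TOY_DICTIONARY}
    return {
        name: 100.0 * sum(word in vocab for word in words) / len(words)
        for name, vocab in TOY_DICTIONARY.items()
    }


# A made-up eight-word comment: two sadness words, one anger word,
# one present-focus word.
scores = category_percentages("I feel sad and I hate crying now")
print(scores["sadness"], scores["anger"], scores["present_focus"])  # 25.0 12.5 12.5
```

Each comment then becomes one row of such scores, which is the kind of numerical representation the classifiers in the following sections take as input.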
Measuring depressive behavior
We present a set of attributes, namely emotional process, temporal process, and linguistic style, that can be used to characterize the depressive behaviors of users. Our dataset consists of five emotional variables (positive, negative, sad, anger, anxiety), three temporal categories (present focus, past focus and future focus), and 9 standard linguistic dimensions (e.g., articles, prepositions, auxiliary verb, adverbs, conjunctions, pronoun, verbs and negations) [30–36]. We calculate their values using the standard LIWC2015 scales. A complete list of the standard LIWC2015 scales, with examples from our dataset, is given in Table 4.
Emotional processes
Emotion is a complex experience of consciousness, bodily sensation, and behaviour that reflects the personal significance of a thing, an event, or a state of affairs. The analysis of the emotional comments in social network data can be leveraged to produce reliable predictions in a variety of circumstances [25]. We use psycholinguistic dimensions to consider five features of the emotional state manifested in the comments: positive affect (PA), negative affect (NA), sadness affect (SA), anger affect (AA), and anxiety affect (AnA) [37–41].
Temporal process
Generally, temporal process words provide information about the past focus, present focus and future focus categories, indicating how people are referencing each other and their degree of emotionality.
Linguistic process
Linguistic process is one of the largest parts of the LIWC psycholinguistic dictionary package. It was designed to quantify word use in psychologically meaningful categories. It has also been used effectively to identify relationships between people in social interactions, including relative status, deception, and the quality of close relationships. In our study we therefore use nine specific linguistic features (articles, prepositions, auxiliary verbs, adverbs, conjunctions, personal pronoun, impersonal pronouns, verbs, and negations) to characterize user comments for our experimental analysis.
Classification model
This stage constructs a prediction model for recognizing depressive posts/comments, taking the psycholinguistic features as input. Consider our training corpus $B = \{p_1, p_2, \ldots, p_n\}$ of $n$ posts/comments, such that each post/comment $p_i$ is labeled with a class from $L = \{l_1, l_2\}$, that is, either depressive or non-depressive. The task of a classifier $f$ is to find the corresponding label for each post/comment.
In this work, we employ four popular classifiers: Support Vector Machine (SVM), Decision Tree, Ensemble, and k-Nearest Neighbor (kNN).
Support Vector Machine (SVM). Support Vector Machines are also known as support vector networks. An SVM is a non-probabilistic binary linear classifier that analyzes data for classification or anomaly detection. It maps the data into a high-dimensional feature space and finds a hyperplane that separates the data into two classes with the largest margin to the nearest training data point of either class.
Decision Tree (DT). A decision tree is a simple and widely used classification approach that builds a hierarchical tree from the training dataset. The idea of a decision tree is to divide the data hierarchically into groups that have different characteristics. For instance, in text document classification, the root and internal nodes are commonly defined by terms, and each internal node may be split into its children according to the presence or absence of a term in the document.
Ensemble. Ensemble methods combine multiple learners (for example, several decision trees) to obtain better predictive performance than a single learner.
K-Nearest Neighbor (KNN). K-Nearest Neighbor is a non-parametric approach that computes the distances from the point of interest to the points in the training set and assigns the label most common among the k nearest ones.
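As an illustration of a classifier $f$ that maps a feature vector to one of the two labels, the following is a minimal k-nearest-neighbour sketch in plain Python. It is not the MATLAB implementation used in this study; the function name `knn_predict`, the choice of Euclidean distance with k = 3, and the two-feature training vectors are assumptions made up for the example.

```python
from collections import Counter
import math


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors: (% sadness words, % positive emotion words).
train_x = [(9.0, 1.0), (8.0, 0.5), (7.5, 2.0),
           (1.0, 8.0), (0.5, 9.0), (2.0, 7.0)]
train_y = ["YES", "YES", "YES", "NO", "NO", "NO"]

print(knn_predict(train_x, train_y, (8.5, 1.5)))  # YES
print(knn_predict(train_x, train_y, (1.5, 8.5)))  # NO
```

The other three classifier families follow the same interface (training vectors and labels in, a predicted label out) and differ only in how the decision rule is learned.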
Experimental analysis
In this study, we examine the performance of various classifiers for depression detection in a shorter time.
Data analysis
The analysis was conducted using MATLAB 2016b. We applied four major classifiers: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT), and Ensemble. Each classifier has sub-classifiers: Decision Tree (Simple DT, Medium DT, and Complex DT); SVM (Linear, Quadratic, Cubic, Fine Gaussian, Medium Gaussian, and Coarse Gaussian); KNN (Fine, Medium, Coarse, Cosine, Cubic and Weighted); and Ensemble (Boosted Tree, Bagged Tree, Subspace Discriminant, Subspace KNN, and RUSBoosted Tree) [42–44].
Using the above classification techniques, we examined detection performance on Facebook user comments. To understand the significance of the different feature types, we applied the four classifier techniques to each of emotional process, linguistic style, temporal process and all features. The results of the analysis are reported in Tables 5 and 6, which suggest Decision Tree as the best-performing model. Although KNN gives high precision, Decision Tree gives the highest recall and F-measure for the class of depression-indicative comments of Facebook users. Similarly, for linguistic style, Decision Tree gives the highest precision, recall and F-measure.
Table 5.
Algorithm | Emotional process: Pr. | Emotional process: Re. | Emotional process: Fm. | Linguistic style: Pr. | Linguistic style: Re. | Linguistic style: Fm. |
---|---|---|---|---|---|---|
Decision Tree | ||||||
Complex Tree | 0.59 | 0.85 | 0.69 | 0.58 | 0.86 | 0.69 |
Medium Tree | 0.58 | 0.96 | 0.73 | 0.58 | 0.97 | 0.73 |
Simple Tree | 0.59 | 0.97 | 0.73 | 0.58 | 0.99 | 0.73 |
K Nearest Neighbors (KNN) | ||||||
Fine KNN | 0.59 | 0.59 | 0.59 | 0.58 | 0.58 | 0.58 |
Medium KNN | 0.59 | 0.59 | 0.59 | 0.59 | 0.55 | 0.57 |
Coarse KNN | 0.59 | 0.88 | 0.71 | 0.59 | 0.80 | 0.70 |
Cosine KNN | 0.58 | 0.59 | 0.58 | 0.59 | 0.60 | 0.60 |
Cubic KNN | 0.59 | 0.59 | 0.59 | 0.60 | 0.54 | 0.57 |
Weighted KNN | 0.58 | 0.62 | 0.60 | 0.59 | 0.65 | 0.62 |
Support Vector Machine (SVM) | ||||||
Linear SVM | 0.58 | 1 | 0.73 | 0.58 | 1 | 0.73 |
Quadratic SVM | 0.57 | 0.81 | 0.67 | 0.58 | 1 | 0.73 |
Cubic SVM | 0.58 | 0.86 | 0.69 | 0.58 | 0.91 | 0.71 |
Fine Gaussian SVM | 0.58 | 0.88 | 0.70 | 0.59 | 0.90 | 0.71 |
Medium Gaussian SVM | 0.58 | 0.99 | 0.73 | 0.58 | 0.99 | 0.73 |
Coarse Gaussian SVM | 0.58 | 1 | 0.73 | 0.58 | 1 | 0.73 |
Ensemble classifiers | ||||||
Ensemble Boosted Tree | 0.58 | 0.96 | 0.73 | 0.58 | 0.99 | 0.73 |
Ensemble Bagged Tree | 0.59 | 0.68 | 0.63 | 0.58 | 0.68 | 0.63 |
Ensemble Subspace Discriminant | 0.58 | 0.99 | 0.73 | 0.58 | 0.99 | 0.73 |
Ensemble Subspace KNN | 0.59 | 0.63 | 0.61 | 0.59 | 0.66 | 0.62 |
Ensemble RUSBoosted Tree | 0.62 | 0.44 | 0.51 | 0.61 | 0.40 | 0.48 |
Pr. = precision, Re. = recall, Fm. = F-measure. The highest values among the sub-classifiers are given in bold
Table 6.
Algorithm | Temporal process: Pr. | Temporal process: Re. | Temporal process: Fm. | All features: Pr. | All features: Re. | All features: Fm. |
---|---|---|---|---|---|---|
Decision Tree | ||||||
Complex Tree | 0.58 | 0.90 | 0.71 | 0.59 | 0.84 | 0.69 |
Medium Tree | 0.58 | 0.97 | 0.73 | 0.58 | 0.90 | 0.70 |
Simple Tree | 0.58 | 0.99 | 0.74 | 0.58 | 0.98 | 0.73 |
K Nearest Neighbors (KNN) | ||||||
Fine KNN | 0.57 | 0.58 | 0.58 | 0.59 | 0.57 | 0.58 |
Medium KNN | 0.58 | 0.57 | 0.57 | 0.59 | 0.53 | 0.56 |
Coarse KNN | 0.58 | 0.89 | 0.70 | 0.59 | 0.77 | 0.67 |
Cosine KNN | 0.59 | 0.58 | 0.59 | 0.60 | 0.60 | 0.60 |
Cubic KNN | 0.58 | 0.57 | 0.57 | 0.59 | 0.52 | 0.55 |
Weighted KNN | 0.57 | 0.59 | 0.58 | 0.59 | 0.64 | 0.61 |
Support Vector Machine (SVM) | ||||||
Linear SVM | 0.58 | 1 | 0.73 | 0.58 | 1 | 0.73 |
Quadratic SVM | 0.58 | 0.76 | 0.66 | 0.58 | 0.99 | 0.73 |
Cubic SVM | 0.57 | 0.81 | 0.67 | 0.58 | 0.70 | 0.63 |
Fine Gaussian SVM | 0.58 | 0.86 | 0.69 | 0.59 | 0.94 | 0.72 |
Medium Gaussian SVM | 0.58 | 0.99 | 0.73 | 0.58 | 0.98 | 0.73 |
Coarse Gaussian SVM | 0.58 | 1 | 0.73 | 0.58 | 1 | 0.73 |
Ensemble Classifiers | ||||||
Ensemble Boosted Tree | 0.58 | 0.97 | 0.73 | 0.58 | 0.96 | 0.72 |
Ensemble Bagged Tree | 0.59 | 0.69 | 0.63 | 0.59 | 0.70 | 0.64 |
Ensemble Subspace Discriminant | 0.58 | 0.99 | 0.73 | 0.58 | 0.99 | 0.73 |
Ensemble Subspace KNN | 0.58 | 0.65 | 0.61 | 0.59 | 0.66 | 0.62 |
Ensemble RUSBoosted Tree | 0.61 | 0.46 | 0.53 | 0.63 | 0.41 | 0.50 |
Pr. = precision, Re. = recall, Fm. = F-measure. The highest values among the sub-classifiers are given in bold
The evaluation metrics (precision, recall and F-measure) were used to assess these classifiers. They are based on four outcomes. True Positive (TP): depression cases that are positive and predicted as positive. True Negative (TN): depression cases that are negative and predicted as negative. False Negative (FN): depression cases that are positive but predicted to be negative. False Positive (FP): depression cases that are actually negative but predicted to be positive.
All the evaluation metrics are defined as follows.
Precision is the proportion of true positives among the cases that are predicted as positive, $\mathrm{Precision} = TP / (TP + FP)$. It is the fraction of selected cases that are correct.
Recall is the proportion of true positives among the cases that are truly positive, $\mathrm{Recall} = TP / (TP + FN)$. It is the fraction of correct cases that are selected.
F-measure is the harmonic mean of Precision and Recall. It takes both false positives and false negatives into account. F-measure is calculated as:
$$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
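The three metrics can be computed directly from the confusion counts. The sketch below takes the F-measure to be the usual harmonic mean of precision and recall (the F1 score), which is consistent with the values in Tables 5 and 6. The function name and the example counts are assumptions: the example is a hypothetical labelling that marks every one of the 7145 comments as depression indicative, chosen only because its counts follow directly from Table 2.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical case: every comment is labelled YES.
# 4149 comments are truly YES (TP), 2996 are truly NO (FP), none are missed.
p, r, f = precision_recall_f(tp=4149, fp=2996, fn=0)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.58 1.0 0.73
```

This all-YES labelling gives a reference point (precision 0.58, recall 1, F-measure 0.73) against which the rows of Tables 5 and 6 can be read.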
The experimental evaluation is done using 10-fold cross-validation over the whole data set. For each classifier, we report the sub-classifier that attains the highest F-measure (refer to Tables 5 and 6). Table 5 shows, for each primary classifier, the sub-classifier with the maximum F-measure value. For example, Coarse Gaussian SVM gives better results than Medium Gaussian SVM. We selected the F-measure for the other classifiers in the same way. Results are illustrated in Fig. 2.
Similarly, we show the results of these prediction models in Table 6. The outcome shows that the best-performing model is Decision Tree. Here, for temporal process and all features, KNN and SVM give almost the same high precision, but Decision Tree gives the highest recall and F-measure for the class of depression-indicative comments of Facebook users (refer to Fig. 2).
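The 10-fold cross-validation used in the evaluation above partitions the comments into ten folds, each of which serves once as the test set while the remaining nine are used for training. The following sketch shows only the index bookkeeping; the shuffling, the seed and the function name `k_fold_indices` are assumptions, since the paper does not describe how MATLAB partitioned the data.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(7145, k=10))
fold_sizes = [len(test_idx) for _, test_idx in splits]
print(fold_sizes)  # five folds of 715 comments and five of 714
```

Precision, recall and F-measure would then be computed on each held-out fold and combined across the ten folds.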
Time series analysis
In general, a time series is a record of a phenomenon that fluctuates unpredictably with time, and time-series data is a kind of temporal data that is normally high dimensional and large in size. In this section, we focus on understanding the time patterns of Facebook users in the AM (00:00:01–11:59:59) and PM (12:00:01–23:59:59). We use the data summarized in Table 7 to study levels of depression detection among Facebook users. In line with Table 7, we see that the AM ‘Yes’ counts are higher than the PM ones in every breakdown.
Table 7.
Time series dataset information | Quantity |
---|---|
Total number of Facebook users comments | 7145 |
Depression indicative comments (Yes) | 4149 |
Non-depression Indicative comments (No) | 2996 |
Total | |
AM | |
Total | 3914 |
Yes | 2247 |
No | 1667 |
PM | |
Total | 3231 |
Yes | 1902 |
No | 1329 |
August | |
AM | |
Total | 3474 |
Yes | 1995 |
No | 1479 |
PM | |
Total | 3038 |
Yes | 1779 |
No | 1259 |
September | |
AM | |
Total | 440 |
Yes | 252 |
No | 188 |
PM | |
Total | 193 |
Yes | 123 |
No | 70 |
From the above-mentioned table, we can see that 2247 out of 7145 comments are depression indicative and were posted in the AM [5]. We observed more depression-indicative comments in the AM than in the PM, possibly because of loneliness, breaks from work, lack of energy, or interactions between light/darkness and the nervous system at that time. Next, we analyze our dataset by month of the year. Our data were collected in August and September. The monthly patterns of depression for Facebook users with a high depression rate (see Table 7 for details) are shown in Fig. 3. In total we have 7145 comments: 91.14% of the comments fall between August 01, 2017 and August 30, 2017, and 8.86% between September 01, 2017 and September 09, 2017.
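The AM/PM and monthly figures can be recomputed from the counts in Table 7. The sketch below copies the (Yes, No) counts from the table; the dictionary layout and the helper `subtotal` are illustrative choices, not part of the original analysis.

```python
# Comment counts copied from Table 7: (YES, NO) per month and half-day.
counts = {
    ("August", "AM"): (1995, 1479),
    ("August", "PM"): (1779, 1259),
    ("September", "AM"): (252, 188),
    ("September", "PM"): (123, 70),
}
total = sum(yes + no for yes, no in counts.values())


def subtotal(key_index, value):
    """Sum (YES, NO) over all cells whose month (0) or half-day (1) matches."""
    rows = [v for k, v in counts.items() if k[key_index] == value]
    return sum(y for y, _ in rows), sum(n for _, n in rows)


for half in ("AM", "PM"):
    yes, no = subtotal(1, half)
    print(half, yes, no, f"{yes / (yes + no):.1%} YES")

for month in ("August", "September"):
    yes, no = subtotal(0, month)
    print(month, f"{(yes + no) / total:.2%} of all comments")
```

The subtotals reproduce the AM (2247 Yes, 1667 No) and PM (1902 Yes, 1329 No) rows of Table 7 and the 91.14% August share. Within each half-day, the share of Yes comments is 57.4% in the AM and 58.9% in the PM, so the AM/PM difference is one of absolute counts.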
We also show an average trend for all comments, combining all times (denoted as “AM” and “PM”). We observed that the hourly depression pattern has its highest rate in the AM. We also observed that the monthly depression pattern shows a seasonal trend, with the highest depression observed in the AM and the lowest in the PM during both August and September. In addition, we observe that depression problems among Facebook users are more associated with personal problems than with others. In summary, we observe consistent rhythms in the expression of depression on social media over the course of the day across individuals. This provides us with a promising mechanism to monitor fine-grained temporal trends of depression across populations and regions.
Discussion
To better understand the general intuition behind depression, in this paper we applied Decision Tree, KNN, SVM and Ensemble classifiers to detect depression from emotional terms. We showed that all of these classification techniques, whether based on linguistic style, emotional process, temporal process or all (linguistic, emotional and temporal) features, are able to extract depressive emotional content. Tables 5 and 6 present the results of the classifiers on these four feature sets. It can be observed that Decision Tree gives the best outcome. We believe that the current study lays the ground for future research on inferring and discovering additional information based on cause-event relations, such as the detection of implicit emotions or causes and the prediction of public opinion based on cause events. Moreover, in this paper we used a total of 21 LIWC attributes for detecting depression, although more than 54 attributes could be applied. Although we achieved accuracy between 60 and 80%, there is still room for improvement. It is important to note that this study does not identify who the sufferers are; it only assesses Facebook comments for signs of depression.
Conclusion and future work
In this paper we have demonstrated the potential of using Facebook as a tool for measuring and detecting major depression among its users. To give a clear understanding of our work, a number of research challenges were stated at the start of this paper. The analytics performed on the selected dataset provide some insight into these research challenges. Below is a summary of our findings:
What is depression, and what are the common factors contributing to it?
While we all feel moody, sad or low from time to time, some people experience these feelings intensely, for long periods of time (weeks, months or even years) and sometimes without any apparent reason. Depression is more than just a low mood; it is a serious condition that affects a person's physical and emotional health.
Depression can affect any of us at any time. However, some phases or events make us more vulnerable to depression. Physical and emotional changes associated with growing up, losing a loved one, starting a family or retiring may trigger emotional turmoil that could lead to depression for some people.
What factors should be looked for when detecting depression in Facebook comments?
It is important to remember that depressive emotions have several signs and symptoms spread across various categories as reported in Table 8.
Table 8.
Signs | Symptoms |
---|---|
Behaviour | Not going out any longer |
Behaviour | Not completing things at work |
Behaviour | Not doing regular enjoyable activities |
Behaviour | Unable to focus |
Feelings | Overwhelmed |
Feelings | Guilty |
Feelings | Irritated |
Feelings | Disappointed |
Feelings | Unlucky |
Feelings | Worried |
Thoughts | He is a winner |
Thoughts | It’s my pleasure |
Thoughts | Nothing good ever happens to me |
Thoughts | He was unlucky |
Thoughts | Life is not a bed of roses |
Thoughts | He would not be able to work without me |
Physical | Tired |
Physical | Illness |
Physical | Headaches |
Physical | Depression problem |
Physical | Misfortune |
Based on the above-mentioned signs and symptoms, we divided our dataset into 5 emotional variables (positive, negative, sad, anger, anxiety), 3 temporal categories (present focus, past focus and future focus), and 9 standard linguistic dimensions (e.g., articles, prepositions, auxiliary verbs, adverbs, conjunctions, pronouns, verbs and negations) (see Section III).
How can these factors be extracted from Facebook comments?
To extract the above-mentioned factors, we applied Linguistic Inquiry and Word Count (LIWC) to our dataset. The LIWC2015 dictionary is the heart of the text analysis strategy. LIWC processes our Facebook comments line by line, within and across the columns of a spreadsheet: it accesses a single text in the spreadsheet, analyzes each line sequentially, and reads one target word at a time.
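The word-by-word dictionary lookup described above can be sketched as follows. This is only an illustration of the general idea of dictionary-based word counting, not the LIWC software itself: `TOY_DICTIONARY` and its word lists are hypothetical stand-ins (the real LIWC2015 dictionary is proprietary and far larger), and the function names are our own assumptions.

```python
import re
from collections import Counter

# Hypothetical toy dictionary used only for illustration; it is NOT the
# LIWC2015 dictionary, whose categories and word lists are much richer.
TOY_DICTIONARY = {
    "sad": {"sad", "lonely", "cry", "hopeless"},
    "anger": {"hate", "angry", "annoyed"},
    "anxiety": {"worried", "afraid", "nervous"},
    "negation": {"not", "never", "no"},
}


def category_percentages(comment):
    """Return, per category, the percentage of words in `comment` matching it."""
    words = re.findall(r"[a-z']+", comment.lower())
    counts = Counter()
    for word in words:  # read one target word at a time
        for category, vocabulary in TOY_DICTIONARY.items():
            if word in vocabulary:
                counts[category] += 1
    total = len(words)
    return {
        category: (100.0 * counts[category] / total if total else 0.0)
        for category in TOY_DICTIONARY
    }


def analyze(comments):
    """Process comments sequentially, like rows of one spreadsheet column."""
    return [category_percentages(text) for text in comments]


if __name__ == "__main__":
    for row in analyze(["I am sad and lonely, never happy", "What a great day"]):
        print(row)
```

Expressing each category as a percentage of the words in a comment keeps short and long comments comparable, which is why the sketch normalizes by the word count rather than returning raw counts.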
What is the relationship between these factors and attitudes toward depression?
The relationship between the above-mentioned factors and attitudes toward depression varies from person to person. Tables 3 and 4 present information on the depressive and non-depressive conditions of Facebook users.
What is the most influential time of communication for depression-indicative Facebook users?
In this study, we analyzed 7145 Facebook comments to identify the most influential time. We found that 54.77% of the comments were posted from midnight to midday (AM) and 45.22% from midday to midnight (PM). In support of these findings, we observed that in the AM most depression-indicative Facebook users experience loneliness, stress, lack of energy, interactions between light/darkness and the nervous system, problems interacting with family members, or physical problems.
What are the most influential machine learning techniques for detection of depression in Facebook comments?
We used different machine learning techniques to assess the performance achieved with three distinct feature types. Tables 5 and 6 present the classification results for these features. First, as shown in Table 5, we analyzed the emotional process and linguistic style features, and it is evident that Decision Tree gives the best outcome. Then, in Table 6, we examined the temporal process features and all features together (emotional, linguistic and temporal). It can be observed that SVM performed substantially better than the other classifiers. However, in terms of precision, recall and F-measure, Decision Tree gives the highest results.
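For readers less familiar with the evaluation measures named above, the sketch below shows how precision, recall, F-measure and accuracy follow from the predicted and true labels of a binary (Yes/No) classifier. It uses the standard textbook definitions; the function name `binary_metrics` and the example labels are hypothetical and do not reproduce the experiments reported in Tables 5 and 6.

```python
def binary_metrics(y_true, y_pred, positive="Yes"):
    """Compute precision, recall, F-measure and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical labels for six comments (not taken from the dataset).
    true_labels = ["Yes", "Yes", "Yes", "Yes", "No", "No"]
    predicted = ["Yes", "Yes", "No", "No", "Yes", "No"]
    print(binary_metrics(true_labels, predicted))
```

Because accuracy and F-measure weight errors differently, one classifier can lead on accuracy while another leads on precision, recall and F-measure, which is the situation described for SVM and Decision Tree.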
In summary, we studied three types of factors (emotional process, temporal process, linguistic style) and trained a model to use each type of factor independently and jointly. We used machine learning techniques to classify the comments based on these features. Our findings show that the accuracy of all classifiers lies roughly between 60 and 80%.
In future work, we plan to use other techniques to extract paraphrases from more types of emotional features. We also plan to use more datasets to verify the efficiency and effectiveness of our techniques. We agree with the existing body of literature, which suggests that more focused studies in depression analysis are needed. Although LIWC has more than 50 attributes, we used a total of 21 attributes for detecting depression among Facebook users; the most significant attributes were found to be those from the emotional process factor.
Footnotes
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Scott J. Social network analysis. Thousand Oaks: Sage; 2017.
- 2. Serrat O. Social network analysis. In: Knowledge solutions. Singapore: Springer Singapore; 2017. p. 39–43.
- 3. Mikal J, Hurst S, Conway M. Investigating patient attitudes towards the use of social media data to augment depression diagnosis and treatment: a qualitative study. In: Proceedings of the fourth workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality. 2017.
- 4. Conway M, O’Connor D. Social media, big data, and mental health: current advances and ethical implications. Curr Opin Psychol. 2016;9:77–82. doi: 10.1016/j.copsyc.2016.01.004.
- 5. Ofek N, Katz G, Shapira B, Bar-Zev Y. Sentiment analysis in transcribed utterances. In: Advances in knowledge discovery and data mining. Cham: Springer International Publishing; 2015. p. 27–38.
- 6. Yang Y, et al. User interest and social influence based emotion prediction for individuals. In: Proceedings of the 21st ACM international conference on Multimedia. New York: ACM; 2013.
- 7. Tausczik YR, Pennebaker JW. The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol. 2010;29(1):24–54. doi: 10.1177/0261927X09351676.
- 8. Pennebaker JW, Francis ME, Booth RJ. Linguistic inquiry and word count: LIWC 2001, vol. 71. Mahwah: Lawrence Erlbaum Associates; 2001. p. 2001.
- 9. Holleran SE. The early detection of depression from social networking sites. Tucson: The University of Arizona; 2010.
- 10. Greenberg LS. Emotion-focused therapy of depression. Per Centered Exp Psychother. 2017;16(1):106–117.
- 11. Haberler G. Prosperity and depression: a theoretical analysis of cyclical movements. London: Routledge; 2017.
- 12. Guntuku SC, et al. Detecting depression and mental illness on social media: an integrative review. Curr Opin Behav Sci. 2017;18:43–49. doi: 10.1016/j.cobeha.2017.07.005.
- 13. De Choudhury M, et al. Predicting depression via social media. In: ICWSM, vol. 13. 2013. p. 1–10.
- 14. De Choudhury M, Counts S, Horvitz E. Predicting postpartum changes in emotion and behavior via social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. New York: ACM; 2013.
- 15. O’Dea B, et al. Detecting suicidality on Twitter. Internet Interv. 2015;2(2):183–188. doi: 10.1016/j.invent.2015.03.005.
- 16. Zhang L, et al. Using linguistic features to estimate suicide probability of Chinese microblog users. In: International conference on human centered computing. Berlin: Springer; 2014.
- 17. Aldarwish MM, Ahmad HF. Predicting depression levels using social media posts. In: 2017 IEEE 13th international symposium on autonomous decentralized system (ISADS). 2017.
- 18. Zhou J, et al. Measuring emotion bifurcation points for individuals in social media. In: 2016 49th Hawaii international conference on system sciences (HICSS). Koloa: IEEE; 2016.
- 19. Wang X, Zhang C, Ji Y, Sun L, Wu L, Bao Z. A depression detection model based on sentiment analysis in micro-blog social network. In: Lecture notes in computer science. Berlin, Heidelberg: Springer Berlin Heidelberg; 2013. p. 201–213.
- 20. Nguyen T, et al. Affective and content analysis of online depression communities. IEEE Trans Affect Comput. 2014;5(3):217–226. doi: 10.1109/TAFFC.2014.2315623.
- 21. Park M, McDonald DW, Cha M. Perception differences between the depressed and non-depressed users in Twitter. In: ICWSM, vol. 9. 2013. p. 217–226.
- 22. Wee J, et al. The influence of depression and personality on social networking. Comput Hum Behav. 2017;74:45–52. doi: 10.1016/j.chb.2017.04.003.
- 23. Bachrach Y, et al. Personality and patterns of Facebook usage. In: Proceedings of the 4th annual ACM web science conference. New York: ACM; 2012.
- 24. Ortigosa A, Martín JM, Carro RM. Sentiment analysis in Facebook and its application to e-learning. Comput Hum Behav. 2014;31:527–541. doi: 10.1016/j.chb.2013.05.024.
- 25. Shen G, et al. Depression detection via harvesting social media: a multimodal dictionary learning solution. In: Proceeding of the twenty-sixth international joint conference on artificial intelligence (IJCAI-17). 2017. p. 3838–3844.
- 26. https://github.com/ranju12345/Depression-Anxiety-Facebook-page-Comments-Text.
- 27. Bazeley P, Jackson K. Qualitative data analysis with NVivo. London: Sage; 2013.
- 28. AlYahmady HH, Alabri SS. Using NVivo for data analysis in qualitative research. Int Interdiscip J Educ. 2013;2(2):181–186. doi: 10.12816/0002914.
- 29. Bandara W. Using Nvivo as a research management tool: a case narrative. In: Quality and impact of qualitative research: proceedings of the 3rd international conference on qualitative research in IT & IT in Qualitative Research. 2006.
- 30. Akkın Gürbüz HG, et al. Use of social network sites among depressed adolescents. Behav Inf Technol. 2017;36(5):517–523. doi: 10.1080/0144929X.2016.1262898.
- 31. Zafarani R, Abbasi MA, Liu H. Social media mining: an introduction. New York: Cambridge University Press; 2014.
- 32. Bilgihan A, et al. Consumer perception of knowledge-sharing in travel-related online social networks. Tour Manag. 2016;52:287–296. doi: 10.1016/j.tourman.2015.07.002.
- 33. Fuchs A, Andrews SS. System, method and computer program product for sharing information in a distributed framework. 2017. Google Patents.
- 34. Weeks BE, et al. Incidental exposure, selective exposure, and political information sharing: integrating online exposure patterns and expression on social media. J Comput Mediat Commun. 2017;22(6):363–379. doi: 10.1111/jcc4.12199.
- 35. Lane DS, et al. From online disagreement to offline action: how diverse motivations for using social media can increase political information sharing and catalyze offline political participation. Soc Media Soc. 2017. doi: 10.1177/2056305117716274.
- 36. Yin J, et al. Using social media to enhance emergency situation awareness. IEEE Intell Syst. 2012;27(6):52–59. doi: 10.1109/MIS.2012.6.
- 37. McDougall MA, et al. The effect of social networking sites on the relationship between perceived social support and depression. Psychiatry Res. 2016;246:223–229. doi: 10.1016/j.psychres.2016.09.018.
- 38. Weedon J, Nuland W, Stamos A. Information operations and Facebook version 1.0. 2017. p. 27.
- 39. Krebs F, et al. Social emotion mining techniques for Facebook posts reaction prediction. arXiv Preprint. arXiv:1712.03249. 2017.
- 40. Laganas C, McLeod K, Lowe E. Political posts on Facebook: an examination of voting, perceived intelligence, and motivations. Pepperdine J Commun Res. 2017;5(1):18.
- 41. Wang Y, Pal A. Detecting emotions in social media: a constrained optimization approach. In: IJCAI. 2015.
- 42. Gilbert E, Karahalios K. Predicting tie strength with social media. In: Proceedings of the SIGCHI conference on human factors in computing systems. New York: ACM; 2009.
- 43. Hosseinifard B, Moradi MH, Rostami R. Classifying depression patients and normal subjects using machine learning techniques. In: 2011 19th Iranian conference on electrical engineering (ICEE). 2011.
- 44. Islam MR, Kamal ARM, Sultana N, Moni MA, Islam R. Depression detection using K-Nearest Neighbors (KNN) classification technique. In: International conference on computer, communication, chemical, materials and electronic engineering, February 8–9, 2018, Rajshahi, Bangladesh.