Abstract
Collective emotion has traditionally been evaluated by questionnaire surveys of a limited number of people. Recently, big data of written texts on the Internet have become available for analyzing collective emotion on very large scales. Although the short-term relationship between collective emotion and real social phenomena has been widely studied, the long-term dynamics of collective emotion have not been studied so far due to the lack of long, persistent data sets. In this study, we extracted collective emotion over a 10-year period from 3.6 billion Japanese blog articles. First, we find that collective emotion shows clear periodic cycles, i.e., weekly and seasonal behaviors, accompanied by pulses caused by natural disasters. For example, April is characterized by high Tension, probably because the school year starts in April in Japan. We also identified long-term memory in the collective emotion, characterized by a power-law decay of the autocorrelation function over several months.
Introduction
Information and Communication Technology enables large amounts of data on human behavior to be collected in milliseconds, opening a novel research area of data-driven social sciences [1–3]. In particular, personal opinions and feelings that cannot be known directly from other sources are archived in blogs. In the past, only a few celebrities were able to express their opinions and feelings, typically in books or magazines. Nowadays, more and more people write articles and share content on the Internet, not only for archival purposes but also to share them in real time. Since the Internet population has already exceeded three billion and many people post their own texts online, various studies of Web-based phenomena have been conducted since the beginning of the twenty-first century.
Diffusion phenomena on microblogging platforms such as Twitter have been well studied in various languages [4–6]. Bursty behaviors [7] and collective attention [8] have been quantified in the Japanese Twitter space. Furthermore, studies predicting real-world phenomena from Internet data are rapidly growing, e.g., stock prices [9, 10], movie box office revenue [11, 12], political polls [13], public health including depressive mood [14, 15], and macroeconomic indices [16].
Studies of collective emotion on the Internet are also growing rapidly. Pioneering work measuring collective emotion in the UK Twitter space has been conducted since 2009 [17]. The diffusion of positive and negative emotions on Twitter has been investigated [18]. One study measured circadian rhythms of positive and negative moods on Twitter over two years [19], and another reported emotional contagion in Facebook posts in 2014 [20]. Collective emotion and methods for detecting it are discussed in detail in [21].
Collective emotion and its relation to real social phenomena have also been studied [9, 16, 22, 23]. Gilbert and Karahalios constructed an ‘Anxiety Index’ using blog data from three periods in 2008 and compared it with S&P stock market prices. They found that a one-standard-deviation increase in the Anxiety Index corresponds to a 0.4% downturn in S&P prices [9]. Bollen et al. measured emotional mood on Twitter for nine months in 2008 and compared it with the Dow Jones Industrial Average. They found that adding the emotion of calm increased prediction accuracy [22]. A United Nations project found that increases in the emotion of confusion preceded increases in the unemployment rate in Ireland by about three months [16]. Furthermore, collective emotion has been found to strongly affect ideology [23] and, in some cases, the spread of misinformation. Extracting and tracing collective emotion on the Internet therefore seems essential for building a safe and secure society.
However, most earlier studies focused on collective emotion over relatively short periods, i.e., three years or less. This is because social media penetrated our daily lives only about 10 years ago; e.g., Facebook opened to the general public in 2006, Twitter began to spread in early 2008, and Instagram was not released until 2010. Therefore, only a few studies on the long-term dynamics of collective emotion have been conducted [24, 25], and in particular, the possibility of long-term memory in collective emotion has attracted very little attention so far.
In the present study, we analyzed 3.6 billion blog articles posted during a 10-year period in Japan, from 2006 to 2016. To the best of our knowledge, this 10-year period is the longest period for which emotions have been extracted from the Internet. We carefully tested our pre-built emotion dictionary to check whether the frequency of each listed word was adequate and whether the listed words actually reflected the emotions of the blog authors.
Our paper is structured as follows. In Materials and methods, we first define collective emotion as used here and compare this definition with those used in earlier studies; we also introduce our data and statistical procedures. We then present our results on the accumulation of collective emotion from blogs. Next, we show the existence of periodic cycles in collective emotion. After removing these periodic cycles, we observe sharp spikes attributed to external events such as natural disasters. Finally, we discuss the long-term memory of collective emotion, which we found using basic statistical methods.
Materials and methods
To quantify collective emotion over the long term, we examined the Japanese blog space, which has been widely used since around 2006. Unlike Twitter, which is currently in widespread use, blogs generally have no character limit and can include long texts. For long texts, dictionary-based methods have been found to classify emotions robustly and accurately [26]. Therefore, we applied a dictionary-based method to 10 years of blog data to determine long-term collective emotion.
Blog data
We employed data from the Japanese blog space between November 1, 2006 and October 31, 2016, retrieved on December 1, 2016 through a fee-based service called ‘Kuchikomi@kakaricho’ (https://kakaricho.jp/: Accessed August 24, 2018). Via an API with a built-in spam filter, this service provides the daily number of blog articles that contain any given target word at least once. We set the spam filter to a high level. As of October 2016, the full database contains more than 3.6 billion blog articles from 43 million independent accounts. The database contains Japanese-language public blog articles posted on major blogging platforms, tweets on Twitter, and posts on a textboard system.
Here we use only public blog articles, in accordance with the terms and conditions of the service. In principle, anyone can use the database under contract with the company (https://www.hottolink.co.jp/: Accessed August 24, 2018). In fact, various studies have already been based on this database [27–29]. Due to the system specification, a blog article that contained the same word multiple times was counted once, whereas an article that contained two different words was counted twice. Since we mainly used word frequencies in the blog space via the API, we had no access to personally identifying information. We read several publicly available blog articles during our study, but they are anonymized, and we cannot identify the authors.
POMS and emotion dictionary
To extract collective emotion from the Internet, one popular method is to categorize articles as expressing either positive or negative emotion, and then to extend these categories to more dimensions representing more complex emotions [25]. The aim of the present study is to analyze long-term periodic cycles and memory of collective emotion extracted from blog texts on the Internet. Here we categorize emotions into six dimensions based on the well-established psychological literature [30]. Because some emotions are difficult to categorize as either positive or negative (e.g., feelings of fatigue may be positive or negative depending on the context), multidimensional emotions may reveal interesting properties of collective emotion from new perspectives.
Extracting multidimensional emotions has historically been done by psychologists using questionnaires on relatively small groups [31]. In self-reported questionnaire surveys, participants passively answer questions. In recent years, attempts have been made to extract emotions from online texts, which have been written actively and spontaneously, based on words contained in traditional question items [32].
There exist various ways to extract multidimensional emotions. The Affective Norms for English Words (ANEW) is an English emotion dictionary containing about 1,000 words [33]. ANEW has three semantic differentials, namely, good-bad, active-passive, and strong-weak. Dodds and Danforth quantified happiness in songs, blogs, and State of the Union addresses using ANEW words [34]. The Positive and Negative Affect Schedule (PANAS) is another well-established English psychometric scale, consisting of two 10-item mood scales [35] and including fear (negative) and joviality (positive). Recently, PANAS was expanded to extract emotions from Twitter [36]. Unlike ANEW, PANAS has been officially translated into a number of languages, including Russian and German. However, the Japanese version of PANAS has only been validated within a limited scope.
Here we study emotion based on the Profile of Mood States (POMS), a psychological rating scale [30]. In this study, we built an original emotion dictionary based on the Japanese version of POMS. POMS was originally developed to measure the effectiveness of pharmacological therapy for veterans in the U.S. POMS measures transient mood states from answers to 65 short questions, which identify the following six emotions: Tension-Anxiety (Tension), Depression-Dejection (Depression), Anger-Hostility (Anger), Vigor, Fatigue, and Confusion. In the following, the POMS emotions are referred to by the names given in parentheses.
The 65 POMS questions are distributed among the six emotions as follows: 9 items for Tension, 15 for Depression, 12 for Anger, 8 for Vigor, 7 for Fatigue, and 7 for Confusion; the remaining 7 are dummy questions. Participants answer each question with a score from zero (fully disagree) to four (fully agree). Note that there are 2 reverse-scored (opposite) question items in Tension and Confusion. For example, the question ‘feel relaxed’ measures Tension through low scores. These 2 opposite questions and the 7 dummy questions were excluded from our procedure.
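As a quick consistency check of these counts, the per-emotion items plus the dummy questions add up to the 65 POMS questions. This is only arithmetic on the numbers stated above; the variable names are ours, and reading "one word per remaining question" as 56 seed words is our own inference from the dictionary-building steps, not a figure reported by the authors.

```python
# Item counts per POMS emotion, as listed in the text above
poms_items = {"Tension": 9, "Depression": 15, "Anger": 12,
              "Vigor": 8, "Fatigue": 7, "Confusion": 7}
n_dummy = 7      # dummy (filler) questions
n_reversed = 2   # reverse-scored ("opposite") items in Tension and Confusion

n_scored = sum(poms_items.values())   # items that load on one of the six emotions
n_total = n_scored + n_dummy          # full questionnaire length
n_used = n_scored - n_reversed        # items left after excluding opposite items
print(n_scored, n_total, n_used)      # prints: 58 65 56
```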
The original purpose of POMS is to measure the transient emotions of individuals. However, since many English POMS questions are simple, with items such as ‘sad’ and ‘angry,’ several researchers have recently used it to measure collective emotion on the Internet. Bollen et al. used POMS to extract emotions from Twitter over about a 1-year period [32]. They found that POMS mood reflected some social and economic phenomena, such as Thanksgiving Day and elections.
POMS was officially translated into Japanese in 1994 by a Japanese psychologist [37]. Since then, it has been used for various purposes, such as measuring the condition of athletes and conducting mental health checks in firms; therefore, POMS is considered reliable in Japanese as well. The Japanese version of POMS has also been used to measure collective emotion in the Japanese Twitter space over 5 months, and it was found to be related to real social phenomena such as the Christmas season [38].
Here we extracted words attributed to the POMS emotions to build our emotion dictionary. An overview of our dictionary-building procedure is as follows (details are described in S1 Appendix):
Parse one word that best expresses the emotion from each POMS question
Add orthographic variants and synonyms for each parsed word
Remove very low and very high frequency words
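The three steps above can be sketched as a small filtering pipeline. This is a hypothetical illustration, not the authors' code: the function name, the toy English words, the made-up article counts, and the frequency cut-offs are all assumptions, since the actual thresholds are not given in this section.

```python
def build_emotion_dictionary(seeds, variants, doc_freq, min_freq, max_freq):
    """Hypothetical sketch of the three dictionary-building steps.

    seeds:    emotion -> list of seed words, one per POMS item (step 1)
    variants: word -> list of orthographic variants and synonyms (step 2)
    doc_freq: word -> number of blog articles containing the word
    min_freq, max_freq: illustrative frequency cut-offs (step 3); assumed values
    """
    dictionary = {}
    for emotion, words in seeds.items():
        candidates = []
        for w in words:
            for cand in [w] + variants.get(w, []):
                if cand not in candidates:      # avoid listing a word twice
                    candidates.append(cand)
        # Step 3: drop very rare and very common words
        dictionary[emotion] = [c for c in candidates
                               if min_freq <= doc_freq.get(c, 0) <= max_freq]
    return dictionary


# Toy usage with made-up English words and counts
seeds = {"Fatigue": ["tired"], "Anger": ["angry"]}
variants = {"tired": ["exhausted", "worn out"], "angry": ["furious", "mad"]}
doc_freq = {"tired": 5_000, "exhausted": 800, "worn out": 3,
            "angry": 4_000, "furious": 600, "mad": 2_000_000}
result = build_emotion_dictionary(seeds, variants, doc_freq, 10, 1_000_000)
print(result)
# {'Fatigue': ['tired', 'exhausted'], 'Anger': ['angry', 'furious']}
```

Here "worn out" is dropped as too rare and "mad" as too common, which mirrors the intent of step 3: no single word should dominate an emotion's time series.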
When building the emotion dictionary, we adjusted the number of listed words so that specific words would not become dominant. Due to our careful procedure, the number and frequency of words were comparable for each emotion. Eventually, 21 words for Tension, 25 for Depression, 25 for Anger, 20 for Vigor, 22 for Fatigue, and 35 for Confusion were included in our emotion dictionary. Our original emotion dictionary and each emotion time series can be found in S2 Appendix.
Collective emotion time series
In previous literature, Bollen et al. [32] obtained collective emotion by averaging the mood vectors of individual tweets, which are limited to 140 characters. However, this method is difficult to apply to blogs, which have no limit on the number of characters. Therefore, to keep the method simple and clear, we defined collective emotion by aggregating the time series of the frequencies of the words listed in our dictionary. We first generate the time series $x_{k,i}(t)$, the number of blog articles posted on day t that contain word i belonging to emotion k, and define the time series of emotion k as follows:
$$ Y_k(t) = \sum_{i=1}^{M_k} x_{k,i}(t), \tag{1} $$
where $M_k$ is the number of words that belong to emotion k. Because the frequency of any single dictionary word can fluctuate strongly due to news and other external factors, summing over several words reduces such fluctuations [28].
Next, to obtain each emotion's dynamics $Z_k(t)$, we first calculate the normalized raw dynamics:
$$ y_k(t) = \frac{Y_k(t)}{X(t)}, \tag{2} $$
where X(t) is the total number of blog articles posted on day t. Then, we standardize them as follows:
$$ Z_k(t) = \frac{y_k(t) - \bar{y}_k}{\sigma_k}, \tag{3} $$
where $\bar{y}_k$ and $\sigma_k$ are the temporal mean and temporal standard deviation of $y_k(t)$ over the whole period. The standardized monthly series of all six emotions and of the total number of blog articles X(t), which does not depend on the dictionary words, are displayed in Fig 1.
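A minimal NumPy sketch of the aggregation, normalization, and standardization steps, assuming the daily word counts are already arranged as a (words × days) array. The function name and toy numbers are ours, and using the population standard deviation (NumPy's default `ddof=0`) for σk is an assumption.

```python
import numpy as np

def emotion_dynamics(word_counts, total_articles):
    """Standardized emotion series Z_k(t) following Eqs (1)-(3).

    word_counts:    shape (M_k, T); row i holds x_{k,i}(t), the daily number of
                    articles containing word i of emotion k
    total_articles: shape (T,); X(t), the number of articles posted on day t
    """
    Y = np.asarray(word_counts, dtype=float).sum(axis=0)  # Eq (1): sum over words
    y = Y / np.asarray(total_articles, dtype=float)       # Eq (2): normalize by X(t)
    return (y - y.mean()) / y.std()                       # Eq (3): standardize (ddof=0)


counts = np.array([[10, 20, 30, 40],    # toy word 1
                   [ 5, 10, 15, 20]])   # toy word 2
X = np.array([1000, 1000, 1000, 1000])
Z = emotion_dynamics(counts, X)
print(np.round(Z, 3))  # approx. [-1.342, -0.447, 0.447, 1.342]
```

Dividing by X(t) before standardizing removes the overall growth or decline of the blog space, so that Zk(t) tracks the share of emotional articles rather than total posting activity.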
Calculation of periodic cycles
We define the periodic cycle of a time series y(t), t = t0, t0 + 1, ⋯, with period L by writing t = t0 + mL + l, where m = 0, 1, ⋯ indexes the cycle and l = 0, 1, ⋯, L − 1 is the phase within the cycle. Thus, the weekly periodicity has l = (Mon., Tue., ⋯, Sun.) with L = 7, the yearly periodicity on a monthly scale has l = (Jan., Feb., ⋯, Dec.) with L = 12, and the yearly periodicity on a daily scale has l running over the days of the year with L = 365.
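The m-th periodicity $p_m(l)$ is calculated as follows:
$$ p_m(l) = \frac{y(t_m + l)}{\bar{y}_m}, \tag{4} $$
where $t_m = t_0 + mL$ and $\bar{y}_m = \frac{1}{L}\sum_{l'=0}^{L-1} y(t_m + l')$ is the mean of y(t) over the m-th cycle. Then, the averaged periodicity p(l) is
$$ p(l) = \frac{1}{M}\sum_{m=0}^{M-1} p_m(l), \tag{5} $$
where M is the total number of periodic cycles in the time series y(t). The standard deviation s(l) over the M ensembles is
$$ s(l) = \sqrt{\frac{1}{M}\sum_{m=0}^{M-1}\bigl(p_m(l) - p(l)\bigr)^2}. \tag{6} $$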
To remove the periodic cycle, we simply divided y(t) = y(t0 + mL + l) by p(l).
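The cycle extraction and removal can be sketched in a few lines of NumPy. This sketch assumes that each cycle is normalized by its own mean (so that p(l) is a ratio around 1, i.e., a rate around 100%), that the series starts at phase l = 0, and that excluded periods have already been cut out as whole cycles; the function names are ours.

```python
import numpy as np

def periodic_profile(y, L):
    """Average periodic cycle p(l) and its spread s(l), as in Eqs (4)-(6).

    Assumes y starts at phase l = 0 and that each cycle is normalized by its own
    mean; an incomplete final cycle is dropped.
    """
    y = np.asarray(y, dtype=float)
    M = len(y) // L
    cycles = y[:M * L].reshape(M, L)                   # row m holds y(t_m + l)
    p_m = cycles / cycles.mean(axis=1, keepdims=True)  # Eq (4)
    p = p_m.mean(axis=0)                               # Eq (5)
    s = p_m.std(axis=0)                                # Eq (6), 1/M normalization
    return p, s

def remove_cycle(y, p):
    """Divide each y(t0 + mL + l) by p(l) to remove the periodic cycle."""
    y = np.asarray(y, dtype=float)
    phase = np.arange(len(y)) % len(p)
    return y / p[phase]


# Toy example with L = 3: the same pattern repeated four times
y = np.tile([2.0, 4.0, 6.0], 4)
p, s = periodic_profile(y, 3)
print(p, s)                # p approx. [0.5, 1.0, 1.5], s approx. [0, 0, 0]
print(remove_cycle(y, p))  # every value becomes 4.0
```

For the weekly cycle, y would start on a Monday with L = 7; for the yearly cycle on a daily scale, leap days would be dropped first so that every year has L = 365 entries.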
Autocorrelation and power spectral density
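The autocovariance function Cov(τ) of a time series z(t) is calculated as follows:
$$ \mathrm{Cov}(\tau) = \bigl\langle (z(t) - \mu)(z(t+\tau) - \mu) \bigr\rangle, \tag{7} $$
where μ is the temporal mean of z(t) and 〈⋅〉 is the ensemble mean. Then the autocorrelation function ρ(τ) is
$$ \rho(\tau) = \frac{\mathrm{Cov}(\tau)}{\mathrm{Cov}(0)}. \tag{8} $$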
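A stationary time series has the long-term memory property when its autocorrelation function is not summable, $\sum_{\tau=0}^{\infty} \rho(\tau) = \infty$. This occurs when $\rho(\tau) \sim \tau^{-\alpha}$ with $0 < \alpha < 1$; such a power-law decay is therefore a clear sign of long-term memory.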
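For illustration, assume the idealized form ρ(τ) ≈ τ^{−α} holds for all lags τ ≥ 1 (in real data the power law holds only over a finite range). The partial sums then grow without bound for 0 < α < 1, whereas a short-memory exponential decay with an assumed correlation time τ0 gives a finite sum:

$$
\sum_{\tau=1}^{T} \tau^{-\alpha} \approx \int_{1}^{T} \tau^{-\alpha}\, d\tau = \frac{T^{1-\alpha} - 1}{1-\alpha} \;\longrightarrow\; \infty \quad (T \to \infty,\ 0 < \alpha < 1),
\qquad
\sum_{\tau=1}^{\infty} e^{-\tau/\tau_0} = \frac{e^{-1/\tau_0}}{1 - e^{-1/\tau_0}} < \infty .
$$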
By the Wiener-Khinchin theorem, the power spectral density S(f) is the Fourier transform of the corresponding autocorrelation function ρ(τ):
$$ S(f) = \int_{-\infty}^{\infty} \rho(\tau)\, e^{-2\pi i f \tau}\, d\tau. \tag{9} $$
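A minimal NumPy sketch of the autocorrelation estimator and a numerical illustration of the Wiener-Khinchin relation. It assumes a single evenly sampled daily series without gaps, and takes the ensemble mean in Cov(τ) over the N − τ available pairs. The discrete check uses the circular autocovariance, for which the identity is exact, rather than the linear estimator in `autocorrelation`. Both function names are ours.

```python
import numpy as np

def autocorrelation(z, max_lag):
    """Eqs (7)-(8): rho(tau) = Cov(tau) / Cov(0) for tau = 0, ..., max_lag.

    Cov(tau) averages (z(t) - mu)(z(t + tau) - mu) over the N - tau available pairs.
    """
    d = np.asarray(z, dtype=float)
    d = d - d.mean()
    n = len(d)
    cov = np.array([np.mean(d[:n - tau] * d[tau:]) for tau in range(max_lag + 1)])
    return cov / cov[0]

def wiener_khinchin_check(z):
    """Discrete Wiener-Khinchin check: the periodogram |DFT(d)|^2 / N equals the
    DFT of the circular autocovariance of d = z - mean(z)."""
    d = np.asarray(z, dtype=float)
    d = d - d.mean()
    n = len(d)
    periodogram = np.abs(np.fft.fft(d)) ** 2 / n
    circ_cov = np.array([np.dot(d, np.roll(d, -tau)) / n for tau in range(n)])
    return np.allclose(periodogram, np.fft.fft(circ_cov).real)


print(autocorrelation(np.tile([1.0, -1.0], 50), 2))  # approx. [1, -1, 1]
print(wiener_khinchin_check(np.random.default_rng(0).normal(size=64)))  # True
```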
Results
Fig 1A shows the monthly time series of each emotional dynamics Zk(t) from November 2006, before removing periodic cycles. Confusion increased during the global financial crisis in 2008. Tension increased sharply after the 3.11 earthquake in 2011. In late 2012, when the Japanese government changed and the economic situation started to improve, Vigor turned upward, and Anger and Fatigue turned downward.
Periodic cycles
Weekly periodicity
Weekly (7-day) periodicities are observed in each of the six emotional dynamics Zk(t). This is clearly indicated by the autocorrelation functions ρk(τ) computed before excluding the periodic cycles, which show weekly periodic correlations, and by sharp peaks in the power spectral densities Sk(f) (shown later in Fig 4A and 4B). To further clarify this periodicity, we averaged the daily amounts of collective emotion for each day of the week, excluding the week of the 3.11 earthquake (March 9 to March 15, 2011) and the 6 days at the end of the data period in October 2016. The weekly periodicity p(l) is clearly seen in Fig 2A.
For example, Fatigue is higher on Mondays. By reading blog articles directly, we found examples of people who went out on the weekend and were still tired on Monday. Depression also increases on Mondays, probably due to a lack of motivation for work and school. Tension increases on Fridays, probably because people worry about the weekend weather.
Somewhat similar weekly periodicities of collective emotion have been observed in the Twitter space in the United Kingdom in 2011 [39] and in the United States between 2009 and 2010 [40]. The U.K. study did not clarify which emotions increased on which day of the week, but it found that joy showed the most clearly periodic behavior and anger the least. The U.S. study found that average happiness is highest on Saturday and lowest on Tuesday. Although we cannot compare these results directly with ours, our results show that Anger has a weekly periodicity, decreasing on weekends and increasing on weekdays. This weekly cycle may correspond to the U.S. result on happiness.
Yearly periodicity
To test for yearly (12-month and 365-day) periodicities of collective emotion, we averaged the amounts of collective emotion for each month and for each day of the year over the ten years. We excluded November 2010 to October 2011 because this span surrounds the 3.11 earthquake. For the 365-day periodicities, we also excluded February 29 in the leap years 2008, 2012, and 2016.
Fig 2B and 2C show the yearly periodicities p(l) of each emotion on a monthly scale, with shaded areas indicating the standard deviations s(l) (see Eq (6) in Materials and methods), and on a daily scale, with points indicating the major peaks listed in Table 1.
Table 1. Major dates in which emotion increased significantly every year.
Date | Emotion | Rate (%, mean ± SD) | Event |
---|---|---|---|
February 14 | Tension | 112.3±13.7 | Valentine’s Day |
April 6 | Tension | 111.2±11.6 | Entrance ceremonies |
April 7 | Tension | 113.0±12.4 | |
April 8 | Tension | 112.4±10.6 | |
May 7 | Fatigue | 127.1±8.9 | After GW holidays |
December 30 | Confusion | 111.9±9.6 | New Year’s Eve |
December 31 | Depression | 115.6±12.6 | |
December 31 | Confusion | 122.7±8.5 | |
By selectively collecting and reading blog articles, we suggest the following reasons for the yearly periodicities on a monthly scale. Fatigue increases in July and August, the summer months in Japan; because people suffer from the hot and humid weather of the Japanese summer, they get tired easily. Because the new school and fiscal years start every April in Japan, many people start at new schools or workplaces, which probably explains the high Tension in April. Depression and Confusion tend to increase slightly in winter, particularly in December and January, which might be caused by the short day length. On the other hand, we did not detect clear monthly trends in the other emotions, Anger and Vigor.
In Table 1, we list the specific dates on which an emotion increased by more than 10% over its temporal average across the 10-year study period, after excluding the weekly cycles and the yearly cycles on a monthly scale. To extract dates that are systematically high every year, Table 1 only includes dates on which the standard deviation of the emotion rate is less than 15%. As expected, the listed dates correspond to typical annual events such as New Year’s Eve and Valentine’s Day.
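The selection rule for Table 1 can be sketched as a simple filter over a (years × calendar days) array of rates. The function name, the day labels, and the toy values below are hypothetical, and using the population standard deviation (`ddof=0`) is an assumption.

```python
import numpy as np

def systematic_peak_days(rates, labels, min_mean=110.0, max_sd=15.0):
    """Select calendar days whose emotion rate is high every year.

    rates:  shape (n_years, n_days); rate (%) of one emotion relative to its
            temporal average, after removing weekly and monthly-scale cycles
    labels: n_days calendar-day labels (hypothetical here)
    Returns (label, mean, sd) for days with mean > min_mean and sd < max_sd.
    """
    rates = np.asarray(rates, dtype=float)
    mean = rates.mean(axis=0)
    sd = rates.std(axis=0)
    keep = (mean > min_mean) & (sd < max_sd)
    return [(labels[j], mean[j], sd[j]) for j in np.flatnonzero(keep)]


# Toy data: three years, three calendar days
rates = [[115, 100, 140],
         [110, 102,  80],
         [120,  98, 125]]
peaks = systematic_peak_days(rates, ["day A", "day B", "day C"])
print(peaks)  # only "day A": mean 115, sd about 4.1
```

Day C has the same mean as day A but a standard deviation of about 25%, so the 15% cut-off removes it; this is what keeps one-off events out of a table of recurring annual dates.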
Fatigue tends to be higher after the end of consecutive holidays. For example, the day after the Golden Week (GW) holidays, a series of consecutive national holidays every spring in Japan, shows increased Fatigue. Although the Fatigue rate does not exceed 110%, the days after the New Year’s holidays (108.8% and 108.5% for January 5 and 6, respectively) and after the traditional Japanese summer holidays (108.7% for August 17) also show higher Fatigue. Interestingly, Depression is slightly higher on the final day of the GW holidays (108.9% for May 6), which suggests that people feel sad about the end of the holidays.
There are also some dates on which emotions consistently decrease every year. For example, on January 1, all emotions except Confusion fall below 90%. Christmas Eve is also a special day on which all emotions except Depression fall below 90%. During the New Year’s days, the GW holidays, and the Christmas days, Tension remains below 90%. These findings suggest that people spend relaxed time on these days (details are in S1 Appendix).
Yearly periodicities of collective emotion on Twitter in the U.K. have recently been investigated over a four-year period from 2010 to 2014, excluding 2012 [41]. In the U.K., anger and sadness peak in the winter months, and anxiety peaks in autumn and spring. Our data did not show seasonal cycles in Anger; however, we also find that Tension peaks in spring. Dzogang et al. [41] did not suggest possible reasons for the anxiety peaks; however, since the new school year in the U.K. starts in autumn, their autumn peak may correspond to our results for Tension(-Anxiety).
Furthermore, happy dates in the U.S. between 2008 and 2010 have been reported using Twitter [40], e.g., Christmas Eve and Day, New Year’s Eve and Day, Valentine’s Day, and Thanksgiving. Some of these days coincide with the dates in Table 1, although the associated emotions differ greatly between the two countries. For example, New Year’s Eve is a happy day in the U.S., whereas Confusion and Depression increase on this day in Japan. This might be due to cultural differences between the U.S. and Japan. In both countries, people expect to spend the New Year’s holidays with family. In Japan, however, people who spend them alone may feel much lonelier and post their feelings on blogs, causing high Depression.
Taken together, yearly periodicities exist independently of language, culture, and social media platform, but their characteristics may differ depending on these factors. There are various contexts behind collective emotion arising from cultural background and platform usage. Since differences in the cyclic behavior of Wikipedia editorial activities have also been observed to depend on cultural background [42], comparing these periodic cycles in collective emotion across countries may be of interest in future studies.
Remaining spikes
After removing the weekly and yearly periodicities, the autocorrelation functions ρk(τ) show no periodic behavior (shown later in Fig 4C), and the daily differences of each emotional dynamics, ΔZk(t) = Zk(t) − Zk(t − 1), follow a normal distribution for every emotion (S1 Appendix). However, we still identify several sharp spikes in each of the emotional dynamics.
Table 2 lists the major spikes, i.e., days on which an emotion rose far above its average over the preceding seven days. We confirmed that these spikes are associated with real events: most spikes were in Tension and coincided with natural disasters, such as earthquakes and typhoons, that occurred during the observation period. Except for the 3.11 earthquake, every increase returned to its original baseline within a week (Fig 3).
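The increase rate used in Table 2 can be sketched as the ratio of each day's value to the mean of the preceding seven days. This sketch assumes a positive, de-seasonalized daily series (for example, normalized word frequencies after dividing by the periodic profile) rather than the zero-mean standardized Zk(t); the function name is ours.

```python
import numpy as np

def spike_rates(y, window=7):
    """Rate (%) of y(t) relative to the mean of the preceding `window` days.

    y is assumed to be a positive, de-seasonalized daily series. The first
    `window` days have no baseline and are returned as NaN.
    """
    y = np.asarray(y, dtype=float)
    rates = np.full(len(y), np.nan)
    for t in range(window, len(y)):
        rates[t] = 100.0 * y[t] / y[t - window:t].mean()
    return rates


r = spike_rates([1.0] * 7 + [6.0])
print(r[-1])  # 600.0: a sixfold jump over the previous week's average
```

Sorting the finite rates in descending order (e.g., `np.argsort(-np.nan_to_num(r))`) would then give a ranked list of candidate spike days like the one in Table 2.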
Table 2. Major spikes in descending order of the increase rate relative to the average of the preceding seven days.
Day | Emotion | Rate (%) | Event |
---|---|---|---|
March 11, 2011 | Tension | 602.6 | the 3.11 earthquake |
March 12, 2011 | Depression | 305.1 | the day after the 3.11 earthquake |
March 12, 2011 | Confusion | 273.6 | |
April 15, 2016 | Tension | 240.3 | Kumamoto earthquake |
September 21, 2011 | Tension | 193.4 | Typhoon Roke |
October 7, 2009 | Tension | 177.1 | Typhoon Melor |
June 14, 2008 | Tension | 167.5 | Iwate earthquake |
January 18, 2016 | Confusion | 157.9 | Heavy snowfall in Tokyo metropolitan area |
September 10, 2015 | Tension | 156.7 | Heavy rain in Tokyo metropolitan area |
August 11, 2009 | Tension | 154.5 | Shizuoka earthquake |
October 15, 2013 | Tension | 153.8 | Typhoon Wipha |
The 3.11 earthquake was a special case in which Tension remained elevated for more than one month (37 days), followed by Depression and Confusion. Interestingly, each emotion peaked on a different day after the 3.11 earthquake: Tension peaked one day after the earthquake, Depression two days after, and Confusion three days after. This reflects how collective emotion changed from day to day. After the 3.11 earthquake, the social mood is widely regarded to have changed qualitatively. At the time, an extraordinary mood, the so-called ‘self-restraint mood,’ prevailed in Japanese society. In line with this mood, many people refrained from going out, for example by not holding or attending annual cherry blossom viewing parties. In addition, fewer corporate TV commercials were broadcast, and movie premieres and new product launches were postponed. To the best of our knowledge, there has been no previous quantitative survey of how long this unusual mood continued. Therefore, the present study is the first attempt to measure this unusual mood quantitatively based on the Internet. The 3.11 earthquake was also found to cause relatively low happiness in the U.S. Twitter space, as was the Chilean earthquake in February 2010 [40].
We note that while events such as the bailout of the U.S. financial system and the Royal Wedding of Prince William caused outlier days of happiness [40], the outlier days we observed in the Japanese blog space could be attributed only to natural disasters.
Long-term memory
Long-term correlation
Fig 4 shows the autocorrelation functions ρk(τ) and power spectral densities Sk(f) of each daily emotional dynamics Zk(t) on log-log scales. We first divided Zk(t) into one-year segments and calculated ρk(τ) for each segment up to a maximum lag of τ = 365 days. The average autocorrelation function ρk(τ) shown in Fig 4C is calculated using only the stationary samples (details are in S1 Appendix) so that the exponent of ρk(τ) can be seen clearly. The power spectral densities Sk(f) are averaged over the 10 years by Welch’s method [43] after removing leap days (February 29). Whereas the raw series (Fig 4A and 4B) show correlation peaks at one week and one month, after removing the periodic cycles (Fig 4C and 4D) we observe clear persistence, i.e., long-term correlations without periodic peaks.
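A simplified NumPy sketch of the Welch-style averaging, assuming non-overlapping one-year segments (365 days, leap days already removed), a Hann window, and per-segment mean removal; these choices and the function names are ours. In practice, `scipy.signal.welch` with `nperseg=365` and `noverlap=0` gives an equivalent estimate up to window details and normalization.

```python
import numpy as np

def yearly_segments(z, seg_len=365):
    """Split a daily series (leap days already removed) into whole one-year segments."""
    z = np.asarray(z, dtype=float)
    n_seg = len(z) // seg_len
    return z[:n_seg * seg_len].reshape(n_seg, seg_len)

def welch_psd(z, seg_len=365):
    """Welch-style PSD: average Hann-windowed periodograms of non-overlapping
    one-year segments. Frequencies are in cycles per day; f = 0 is dropped.
    The scale is correct only up to a constant factor, which does not affect
    log-log slopes."""
    segs = yearly_segments(z, seg_len)
    segs = segs - segs.mean(axis=1, keepdims=True)   # remove each segment's mean
    w = np.hanning(seg_len)
    spec = np.abs(np.fft.rfft(segs * w, axis=1)) ** 2 / np.sum(w ** 2)
    freqs = np.fft.rfftfreq(seg_len, d=1.0)
    return freqs[1:], spec.mean(axis=0)[1:]


# Toy check: ten years of a pure weekly cycle
t = np.arange(3650)
f, S = welch_psd(np.sin(2 * np.pi * t / 7))
print(f[np.argmax(S)])  # close to 1/7 cycles per day
```

Averaging over ten yearly segments trades frequency resolution (1/365 cycles per day) for a much less noisy spectrum, which is what makes the low-frequency slope readable.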
Fig 4E and 4F show the results for the weekly randomized time series, averaged over 10 randomizations. We generated randomized series in three ways: monthly, weekly, and daily randomization. For the monthly randomized series, we kept the order of days within each month and randomly shuffled the months. For the weekly randomized series, we applied the same procedure to weeks, after dropping the 6 days at the end of the data period in October 2016. For the daily randomized series, we fully shuffled the time series on a daily basis.
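All three randomizations are instances of block shuffling: contiguous blocks are permuted while the order of days inside each block is kept. A minimal sketch follows; the function names are ours. Calendar-month start indices would give the monthly version, every 7th day the weekly version, and every day the daily version.

```python
import numpy as np

def block_shuffle(z, boundaries, rng):
    """Randomly permute contiguous blocks of z, keeping the order within each block.

    boundaries: indices where new blocks start (as for np.split).
    """
    blocks = np.split(np.asarray(z), boundaries)
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])

def fixed_boundaries(n, block_len):
    """Block starts for equal-length blocks; n should be a multiple of block_len."""
    return list(range(block_len, n, block_len))


rng = np.random.default_rng(1)
weekly = block_shuffle(np.arange(28), fixed_boundaries(28, 7), rng)
print(weekly.reshape(4, 7))  # four intact weeks in a random order
```

Repeating the shuffle with fresh random permutations (10 times in the text) and averaging ρk(τ) gives the randomized baselines; correlations longer than the block length are destroyed, while those within a block survive.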
In the real data, the autocorrelation functions approximately follow a power-law decay, $\rho_k(\tau) \sim \tau^{-\alpha}$ (Fig 4C). The power-law exponent is close to α ≈ 0.5 for all six emotions, and ρk(τ) ≈ 0 after six months. The long-term persistence is supported by the observation that ρk(τ) decays much more sharply in the randomized samples, depending on the randomization time scale (Fig 4E and S1 Appendix). In particular, the daily randomized series indeed show ρk(τ) ≈ 0 for τ > 0, as expected.
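A minimal sketch of how such an exponent can be estimated: a least-squares line through log ρ versus log τ. The fitting range and the restriction to positive values are our assumptions, since ρk(τ) approaches zero or becomes negative at large lags; the function name is ours.

```python
import numpy as np

def powerlaw_exponent(x, y):
    """Least-squares slope of log y versus log x; returns alpha for y ~ x**(-alpha).

    Only points with x > 0 and y > 0 are used. The fitting range is left to the
    caller, who should restrict x to the scaling region.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = (x > 0) & (y > 0)
    slope, _ = np.polyfit(np.log(x[ok]), np.log(y[ok]), 1)
    return -slope


tau = np.arange(1, 181, dtype=float)
rho = 0.8 * tau ** -0.5                 # synthetic power law with alpha = 0.5
print(powerlaw_exponent(tau, rho))      # approx. 0.5
```

The same function applied to (f, Sk(f)) in the low-frequency range returns the spectral exponent β, which can then be compared with 1 − α.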
For the power spectral densities, all emotions show approximately $S_k(f) \sim f^{-0.5}$ in the low-frequency range (Fig 4D), and white noise is observed for the daily randomized series (S1 Appendix). By the Wiener-Khintchine theorem, Sk(f) is the Fourier transform of its autocorrelation function ρk(τ), which implies a relation between the two exponents: if $\rho_k(\tau) \sim \tau^{-\alpha}$, then $S_k(f) \sim f^{-(1-\alpha)}$. The spectral exponent of 0.5 therefore corresponds to α ≈ 0.5 for all emotions, consistent with the autocorrelation analysis, indicating that each emotional dynamics has a long-term memory on the order of a few months.
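A short scaling argument for this relation substitutes u = fτ in Eq (9), assuming the idealized form ρk(τ) ≈ |τ|^{−α} with 0 < α < 1 for all lags. In the data the power law holds only up to several months, which is why the relation is expected only in the low-frequency range:

$$
S(f) \approx \int_{-\infty}^{\infty} |\tau|^{-\alpha} e^{-2\pi i f \tau}\, d\tau
= 2\int_{0}^{\infty} \tau^{-\alpha} \cos(2\pi f \tau)\, d\tau
= 2 f^{\alpha-1} \int_{0}^{\infty} u^{-\alpha} \cos(2\pi u)\, du
\;\propto\; f^{-(1-\alpha)} .
$$

The remaining integral equals $\Gamma(1-\alpha)\sin(\pi\alpha/2)/(2\pi)^{1-\alpha}$, a finite positive constant for 0 < α < 1. With α = 0.5 this gives S(f) ∼ f^{−0.5}.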
Coarse-grained movement
Since the emotional dynamics remain positively correlated (ρk(τ) > 0) for roughly six months, we performed principal component analysis on each emotional dynamics Zk(t) aggregated over six-month periods.
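A minimal sketch of the six-month principal component analysis. It assumes that "aggregated over six-month periods" means averaging Zk(t) over consecutive 182-day blocks, and that PCA is run on the covariance matrix of these block means. Both choices, and the function name, are our assumptions.

```python
import numpy as np

def pca_six_month(Z, block_len=182):
    """PCA of the six emotion series averaged over consecutive ~six-month blocks.

    Z: shape (T, 6), one column per emotion Z_k(t).
    block_len: days per block (182 is an assumed stand-in for "six months").
    Returns (explained variance ratios, eigenvectors as columns, component scores).
    """
    Z = np.asarray(Z, dtype=float)
    n_blocks = Z.shape[0] // block_len
    B = Z[:n_blocks * block_len].reshape(n_blocks, block_len, Z.shape[1]).mean(axis=1)
    B = B - B.mean(axis=0)                        # center each emotion
    eigval, eigvec = np.linalg.eigh(np.cov(B, rowvar=False))
    idx = np.argsort(eigval)[::-1]                # largest variance first
    eigval, eigvec = eigval[idx], eigvec[:, idx]
    scores = B @ eigvec                           # one row per six-month block
    return eigval / eigval.sum(), eigvec, scores


# Toy data: six columns driven by one common factor, 20 blocks of 182 days
rng = np.random.default_rng(2)
base = rng.normal(size=20 * 182)
Z = np.outer(base, [1.0, 2.0, 3.0, 1.0, 1.0, 1.0])
ratio, vecs, scores = pca_six_month(Z)
print(np.cumsum(ratio)[1])  # cumulative contribution ratio up to PC2, about 1 here
```

Plotting consecutive rows of `scores[:, :2]` as a path is one way to see whether the six-month points move gradually (real data) or jump around (randomized data).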
The first and second eigenvectors, together with the component scores, are shown in Fig 5. The cumulative contribution ratio up to the second principal component was 96.1% for the real data (Fig 5A) and 88.6% for the weekly randomized data (Fig 5B). Thus, the first two principal components capture the dominant part of the six emotional dynamics. Note that the first principal component was mainly Vigor and the second mainly Fatigue for the real time series in Fig 5A. Since time series aggregated over six months contain no periodic cycles, we found no clear difference between the results before and after removing the periodic cycles.
We confirmed that the eigenvectors were almost independent. However, some emotion directions partly overlapped (Fig 5): although no word appears in more than one emotion, the vector directions of different emotions still overlapped. This may be due to the aggregation of the time series over six months; e.g., Depression and Confusion moved in the same direction for a long time in Fig 1.
In the plot of the first and second principal component scores, each point corresponds to a six-month average. In the real data (Fig 5A), the points move gradually from one to the next, rather than jumping between distant points as in the randomized data (Fig 5B). This indicates that the emotional dynamics changed gradually over time. Thus, we captured, for the first time, evidence of the slow dynamics of collective emotion.
Discussion
People are increasingly active on the Internet, and the data now available can provide new perspectives on collective human behavior. Extracting and tracing collective emotion is a challenging new research topic because social media has only become widespread in the past decade. Here we extracted collective emotion from the Japanese blog space over 10 years between 2006 and 2016, analyzing 3.6 billion blog articles with a dictionary-based method.
First, we observed periodic cycles in each of the emotional dynamics after averaging over the 10 years. Weekly and yearly periodicities appeared in each of the emotional dynamics in the Japanese blog space, and they were connected to real phenomena. In particular, Fatigue tends to increase after consecutive holidays. In Japan, suicide numbers are known to increase after consecutive holidays. Since suicide numbers are known to be associated with Google Trends data in England [44] and with Korean blogs [45], measuring collective emotion might help identify early signals of suicide.
Second, after removing these periodic cycles from each of the emotional time series, we found sharp spikes that could be attributed to natural disasters. In particular, collective emotion increased strongly under the influence of the 3.11 earthquake, and this influence remained high for over a month in Tension, Depression, and Confusion. During the 3.11 earthquake period, many rumors spread [5]. It has been argued that feelings of anxiety contribute to the spread of rumors during a disaster [46]. In addition, a psychological study involving 24 introductory psychology students reported that anxious feelings accelerate rumor spreading [47]. Our results are consistent with these findings and are based on much richer data, namely 3.6 billion blog articles.
Finally, our study is the first to shed light on the long-term memory of collective emotion, which has attracted little attention so far. For every emotion in the real data, the autocorrelation showed a power-law decay with an exponent well below one, which suggests the existence of long-term memory. The power spectral densities and the principal component analysis also strongly indicate long-term memory in collective emotion on time scales of several months.
This research has important limitations. Since there are no ground-truth data for collective emotion, our results are estimates, albeit plausible ones. We expect that a broader range of similar studies and data on collective emotion will accumulate for future analysis. Also, owing to the current lack of geolocated data, we could not consider geographical differences in collective emotion. We believe that accounting for geographical differences would provide deeper insights, especially in the case of natural disasters.
To develop the present research further, the following points could be considered. First, we focused only on the Japanese blog space, which may not be representative of other cultures. Nevertheless, compared with previous studies in which a limited number of participants answered questionnaires, our study used rich data from individuals actively writing online, and this larger variety of data is a strength of the present study. Second, our results were limited by our dictionary, which was based on POMS [30]. Because the dictionary was built from a traditional psychological scale, the extracted emotions were restricted to six dimensions: five negative emotions and one positive emotion. These emotions clearly do not cover all dimensions of collective emotion; in particular, it is important to add new positive emotions to the analysis. For example, POMS2 [48], the second edition of POMS, which includes the new positive emotion Friendliness, has been released and recently translated into Japanese. Additionally, we applied a naive summation of the dictionary words, which were checked manually. Word2vec [49] and Doc2vec [50] could offer a new direction for building dictionaries semi-automatically. Furthermore, numerous other psychological measures could be analyzed. Extracting multidimensional emotions is still a challenging task that should attract researchers in the future.
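The naive dictionary summation mentioned above can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: articles are assumed to be already split into words (for Japanese text this requires a morphological analyzer, which is outside the sketch), a toy English dictionary stands in for the POMS-based Japanese one, and the daily score is normalized by the number of articles that day. The function names and the normalization are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def naive_emotion_counts(tokens: Iterable[str],
                         dictionary: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count how many tokens of one article belong to each emotion's word list."""
    counts = Counter(tokens)
    return {emotion: sum(counts[w] for w in words) for emotion, words in dictionary.items()}


def daily_emotion(articles: List[List[str]],
                  dictionary: Dict[str, Set[str]]) -> Dict[str, float]:
    """Sum dictionary hits over all articles of one day, divided by the article count.

    The per-article normalization is a hypothetical choice for this sketch.
    """
    totals: Counter = Counter()
    for tokens in articles:
        totals.update(naive_emotion_counts(tokens, dictionary))
    n = max(len(articles), 1)  # avoid division by zero on days without articles
    return {emotion: totals[emotion] / n for emotion in dictionary}


if __name__ == "__main__":
    toy_dictionary = {"Tension": {"nervous", "tense"}, "Fatigue": {"tired"}}
    toy_articles = [["so", "tired", "tired"], ["nervous", "exam"]]
    print(daily_emotion(toy_articles, toy_dictionary))
```

Every listed word carries equal weight here, which is what makes the summation "naive"; embedding-based dictionary building would mainly change which words enter each list.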
Supporting information
Acknowledgments
The authors give special thanks to Dr. Takeshi Sakaki and Mr. Sakae Mizuki from Hottolink Inc. for carefully checking the words during the construction of our dictionary.
Data Availability
All relevant data is within the paper and its Supporting Information files.
Funding Statement
This work was supported by JST and MOST, SICORP Japan-Israel Cooperative Scientific Research on ICT for a Resilient Society (YS, HT, SH, MT) and METI Development Project of New Indicators Utilizing Big Data and its Analysis Technology (YS, HT, MT). This work was partially supported by JSPS KAKENHI Grant Number 17K12783 (YS). There was no additional external funding received for this study. Sony Computer Science Laboratories, Inc. provided support in the form of salaries for author HT, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific role of HT is given in the ‘author contributions’ section.
References
- 1. Lazer D, Pentland AS, Adamic LA, Aral S, Barabasi AL, Brewer D, et al. Computational social science. Science. 2009;323:721–723. doi:10.1126/science.1167742
- 2. Conte R, Gilbert N, Bonelli G, Cioffi-Revilla C, Deffuant G, Kertész J, et al. Manifesto of computational social science. Eur Phys J-Spec Top. 2012;214(1):325–346. doi:10.1140/epjst/e2012-01697-8
- 3. Mann A. Core concept: Computational social science. Proc Natl Acad Sci (USA). 2016;113(3):468–470. doi:10.1073/pnas.1524881113
- 4. Bakshy E, Hofman JM, Mason WA, Watts DJ. Everyone’s an influencer: Quantifying influence on Twitter. In: Proceedings of the 4th ACM International Conference on Web Search and Data Mining; 2011. p. 65–74.
- 5. Takayasu M, Sato K, Sano Y, Yamada K, Miura W, Takayasu H. Rumor diffusion and convergence during the 3.11 earthquake: A Twitter case study. PLoS ONE. 2015;10(4):e0121443. doi:10.1371/journal.pone.0121443
- 6. Feng L, Hu Y, Li B, Stanley HE, Havlin S, Braunstein LA. Competing for attention in social media under information overload conditions. PLoS ONE. 2015;10(7):e0126090. doi:10.1371/journal.pone.0126090
- 7. Oka M, Ikegami T. Self-organization on social media: Endo-exo bursts and baseline fluctuations. PLoS ONE. 2014;9(10):e109293. doi:10.1371/journal.pone.0109293
- 8. Sasahara K, Hirata Y, Aihara K. Quantifying collective attention from tweet stream. PLoS ONE. 2013;8(4):e61823. doi:10.1371/journal.pone.0061823
- 9. Gilbert E, Karahalios K. Widespread worry and the stock market. In: Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM’10); 2010. p. 59–65.
- 10. Zheludev I, Smith R, Aste T. When can social media lead financial markets? Sci Rep. 2014;4:1–12.
- 11. Asur S, Huberman BA. Predicting the future with social media. In: Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology; 2010. p. 492–499.
- 12. Mestyán M, Yasseri T, Kertész J. Early prediction of movie box office success based on Wikipedia activity big data. PLoS ONE. 2013;8(8):e71226. doi:10.1371/journal.pone.0071226
- 13. O’Connor B, Balasubramanyan R, Routledge BR, Smith NA. From tweets to polls: Linking text sentiment to public opinion time series. In: Proceedings of the 4th International Conference on Weblogs and Social Media (ICWSM’10); 2010. p. 122–129.
- 14. Lampos V, Cristianini N. Tracking the flu pandemic by monitoring the social web. In: 2010 2nd International Workshop on Cognitive Information Processing. IEEE; 2010. p. 411–416.
- 15. De Choudhury M, Gamon M, Counts S, Horvitz E. Predicting depression via social media. In: Proceedings of the 7th International Conference on Weblogs and Social Media (ICWSM’13); 2013. p. 128–137.
- 16. United Nations Global Pulse. Unemployment through the lens of social media; 2011. Available from: http://www.unglobalpulse.org/projects/can-social-media-mining-add-depth-unemployment-statistics.
- 17. Lansdall-Welfare T, Lampos V, Cristianini N. Effects of the recession on public mood in the UK. In: Proceedings of the 21st international conference companion on World Wide Web (WWW’12 Companion); 2012. p. 1221–1226.
- 18. Ferrara E, Yang Z. Measuring emotional contagion in social media. PLoS ONE. 2015;10(11):e0142390. doi:10.1371/journal.pone.0142390
- 19. Golder SA, Macy MW. Diurnal and seasonal mood vary with work, sleep, and daylength across diverse cultures. Science. 2011;333(6051):1878–1881. doi:10.1126/science.1202775
- 20. Kramer ADI, Guillory JE, Hancock JT. Experimental evidence of massive-scale emotional contagion through social networks. Proc Natl Acad Sci (USA). 2014;111(24):8788–8790. doi:10.1073/pnas.1320040111
- 21. Lampos V. Detecting events and patterns in large-scale user generated textual streams with statistical learning methods. University of Bristol; 2012. Available from: http://arxiv.org/abs/1208.2873.
- 22. Bollen J, Mao H, Zeng X. Twitter mood predicts the stock market. J Comput Sci. 2011;2:1–8. doi:10.1016/j.jocs.2010.12.007
- 23. Brady WJ, Wills JA, Jost JT, Tucker JA, Van Bavel JJ. Emotion shapes the diffusion of moralized content in social networks. Proc Natl Acad Sci (USA). 2017;114(28):7313–7318. doi:10.1073/pnas.1618923114
- 24. Giachanou A, Crestani F. Like it or not: A survey of Twitter sentiment analysis methods. ACM Comput Surv. 2016;49(2):1–41. doi:10.1145/2938640
- 25. Tsytsarau M, Palpanas T. Survey on mining subjective data on the web. Data Min Knowl Disc. 2011;24(3):478–514. doi:10.1007/s10618-011-0238-6
- 26. Reagan AJ, Danforth CM, Tivnan B, Williams JR, Dodds PS. Sentiment analysis methods for understanding large-scale texts: a case for using continuum-scored words and word shift graphs. EPJ Data Sci. 2017;6:28. doi:10.1140/epjds/s13688-017-0121-9
- 27. Ishii A, Arakaki H, Matsuda N, Umemura S, Urushidani T, Yamagata N, et al. The ‘hit’ phenomenon: A mathematical model of human dynamics interactions as a stochastic process. New J Phys. 2012;14:063018. doi:10.1088/1367-2630/14/6/063018
- 28. Sano Y. Correlations and fluctuations in the word sets of collective emotions. NOLTA. 2018;9(3):382–390. doi:10.1587/nolta.9.382
- 29. Watanabe H. Empirical observations of ultraslow diffusion driven by the fractional dynamics in languages. Phys Rev E. 2018;98(1):012308. doi:10.1103/PhysRevE.98.012308
- 30. McNair DM, Lorr M, Droppleman LF. Manual for the Profile of Mood States. Educational and Industrial Testing Services; 1971.
- 31. Robinson JP, Shaver PR, Wrightsman LS. Measures of Personality and Social Psychological Attitudes. Academic Press; 1991.
- 32. Bollen J, Mao H, Pepe A. Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. In: Proceedings of the 5th International Conference on Weblogs and Social Media (ICWSM’11); 2011. p. 450–453.
- 33. Bradley MM, Lang PJ. Affective Norms for English Words (ANEW): Instruction Manual and Affective Ratings. Technical Report C-1, The Center for Research in Psychophysiology, University of Florida; 1999.
- 34. Dodds PS, Danforth CM. Measuring the happiness of large-scale written expression: Songs, blogs, and presidents. J Happiness Stud. 2009;11(4):441–456. doi:10.1007/s10902-009-9150-9
- 35. Watson D, Clark LA, Tellegen A. Development and validation of brief measures of positive and negative affect: the PANAS scales. J Pers Soc Psychol. 1988;54(6):1063–1070. doi:10.1037/0022-3514.54.6.1063
- 36. Gonçalves P, Benevenuto F, Cha M. PANAS-t: A pychometric scale for measuring sentiments on Twitter. arXiv:1308.1857. 2013.
- 37. Yokoyama K, Araki S. POMS Japanese version. Kaneko Shobo; 1994.
- 38. Momoi T, Suyari H. Comparison of the mood model generated from Japanese Twitter and the economic index (in Japanese). In: Proceedings of the Japan Workshops of Emergent Intelligence on Network in 2012 (JWEIN12). Japan Society for Software Science and Technology; 2012. p. 12006.
- 39. Lampos V, Lansdall-Welfare T, Araya R, Cristianini N. Analysing mood patterns in the United Kingdom through Twitter content. arXiv:1304.5507. 2013.
- 40. Dodds PS, Harris KD, Kloumann IM, Bliss CA, Danforth CM. Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLoS ONE. 2011;6(12):e26752. doi:10.1371/journal.pone.0026752
- 41. Dzogang F, Lansdall-Welfare T, Cristianini N. Seasonal fluctuations in collective mood revealed by Wikipedia searches and Twitter posts. In: Proceedings of the IEEE 16th International Conference on Data Mining Workshops (ICDMW); 2016. p. 931–937.
- 42. Yasseri T, Sumi R, Kertész J. Circadian patterns of Wikipedia editorial activity: A demographic analysis. PLoS ONE. 2012;7(1):e30091. doi:10.1371/journal.pone.0030091
- 43. Welch P. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics. 1967;15(2):70–73. doi:10.1109/TAU.1967.1161901
- 44. Kristoufek L, Moat HS, Preis T. Estimating suicide occurrence statistics using Google Trends. EPJ Data Sci. 2016;5:32. doi:10.1140/epjds/s13688-016-0094-0
- 45. Won HH, Myung W, Song GY, Lee WH, Kim JW, Carroll BJ, et al. Predicting national suicide numbers with social media data. PLoS ONE. 2013;8(4):e61809. doi:10.1371/journal.pone.0061809
- 46. Allport GW, Postman L. The psychology of rumor. Henry Holt; 1947.
- 47. Walker CJ, Beckerle CA. The effect of state anxiety on rumor transmission. J Soc Behav Pers. 1987;2(3):353–360.
- 48. Heuchert JP, McNair DM. Profile of Mood States, Second Edition. Multi-Health Systems Inc.; 2012.
- 49. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems; 2013. p. 3111–3119.
- 50. Le Q, Mikolov T. Distributed representations of sentences and documents. In: International Conference on Machine Learning; 2014. p. 1188–1196.