Perspectives on Behavior Science. 2024 Jan 24;47(1):283–310. doi: 10.1007/s40614-023-00394-x

Lexicon-Based Sentiment Analysis in Behavioral Research

Ian Cero 1,, Jiebo Luo 2, John Michael Falligant 3,4
PMCID: PMC11035532  PMID: 38660506

Abstract

A complete science of human behavior requires a comprehensive account of the verbal behavior those humans exhibit. Existing behavioral theories of such verbal behavior have produced compelling insight into language’s underlying function, but the expansive program of research those theories deserve has unfortunately been slow to develop. We argue that the status quo’s manually implemented and study-specific coding systems are too resource intensive to be worthwhile for most behavior analysts. These high input costs in turn discourage research on verbal behavior overall. We propose lexicon-based sentiment analysis as a more modern and efficient approach to the study of human verbal products, especially naturally occurring ones (e.g., psychotherapy transcripts, social media posts). In the present discussion, we introduce the reader to principles of sentiment analysis, highlighting its usefulness as a behavior analytic tool for the study of verbal behavior. We conclude with an outline of approaches for handling some of the more complex forms of speech, like negation, sarcasm, and speculation. The appendix also provides a worked example of how sentiment analysis could be applied to existing questions in behavior analysis, complete with code that readers can incorporate into their own work.

Keywords: Data science, Matching, Natural language processing, Sentiment, Text analysis, Verbal behavior


Most human affairs are verbally mediated. A comprehensive account of human verbal behavior is thus necessary for a truly complete science of human behavior. Although existing theories have made substantial progress toward this goal (Hayes et al., 2001; Skinner, 1957), the study of naturally occurring verbal behavior remains limited by at least two practical barriers. First, time-intensive hand coding is often required to wrangle raw verbal data into an analyzable format. Second, the sheer complexity of a real-world verbal scenario often requires a bespoke coding scheme for each study. This lack of methodological standardization in turn leads to duplication of work across different studies and complicates evaluation of the verbal behavioral research literature for its consumers (see Critchfield, Becirevic et al., 2017). The lack of a coherent methodology for examining verbally mediated phenomena may also reflect broader issues within the discipline of behavior analysis with respect to communication of practices and findings with other professionals and consumers (see Becirevic et al., 2016; Critchfield et al., 2016; Critchfield & Doepke, 2018; Critchfield, Doepke et al., 2017).

Fortunately, recent developments in computer and data science have greatly improved the prospects for behavior scientists studying verbal behavior, especially naturally occurring verbal behavior. For example, natural language processing has produced a range of impressive techniques for describing linguistic products and using them to predict aspects of human behavior (Jurafsky & Martin, 2008), including in mental health (De Choudhury et al., 2014, 2016). The most relevant of these techniques for contemporary behavior scientists is sentiment analysis, the subfield of natural language processing focused on measuring the overall “sentiment” of a verbal product (e.g., session transcripts, diary entries, social media posts; Liu, 2020). Most often, the sentiment being estimated is the general positivity or negativity of a verbal product. However, with minor adjustments, the same techniques can be used to quantify a range of emotions, as well as social and intellectual processes (Tausczik & Pennebaker, 2010). Because this approach resembles a scaled-up and standardized version of existing hand-coded behavioral studies of verbal products (e.g., McDowell & Caron, 2010), it is a natural fit for behavior analysts—avoiding many of the “black box” criticisms of other modern computer science models.

This discussion is thus designed to introduce readers to the theory and practice of sentiment analysis, with an emphasis on how those techniques can be applied to the study of verbal behavior. We start by outlining some unsolved practical barriers that arise from the status quo approach for studying verbal products. We then introduce readers to the assumptions and techniques of sentiment analysis. Lastly, we address some of the most common objections to sentiment analysis and outline potential solutions to them. An additional worked example is also given in the appendix.

Status Quo Stagnation

Existing theories of verbal behavior have generated substantial insight into not only the underlying taxonomy of verbal products humans exhibit, but also the functional relations that causally connect those products to stimuli in the surrounding environment (Hayes et al., 2001; Skinner, 1957). A nuanced account of verbal operants is required to understand and change complex, socially significant human behavior. There have been many calls for behavior analysts to broaden the scope of their research and clinical practice to incorporate dimensions of human experience (e.g., emotion, language) often untouched by behavior analysts. As described by Hayes (2001), a comprehensive experimental behavior-analytic account of psychology requires a continuous revitalization of behavior-analytic methodologies, and an openness to new theories, preparations, and concepts. But even as leading theorists have themselves argued (Hayes et al., 2001), an expansive program of research into naturally occurring verbal behavior has been slow to develop. Even giving ample credit to novel laboratory tools like the implicit relational association test (Barnes-Holmes et al., 2008; Hussey et al., 2015) and the function acquisition speed test (O’Reilly et al., 2012), research on the naturally occurring verbal products characterizing most humans’ day-to-day experience (e.g., conversation transcripts, social media posts) has largely failed to emerge (cf. Critchfield & Doepke, 2018; Reed, 2016). The lack of systematic research on these verbal relations also has direct implications for the dissemination of behavior-analytic research, as some have highlighted the iatrogenic potential of behavior-analytic jargon on perceptions of our methodologies and applied practices (Becirevic et al., 2016; Critchfield et al., 2016; Critchfield & Doepke, 2018; Critchfield et al., 2017; Normand & Donohue, 2022).

One reason for the slow development lies with the methods. Traditional methods tend to be time-intensive and unique to a given study. As a case in point, consider McDowell and Caron's (2010) finding that the amount of rule-breaking talk produced by potentially delinquent boys was directly proportional to the amount of positive peer talk that followed those utterances. Even more interesting, the association between rule-break talk and positive peer talk strongly conformed to the a priori predictions of the general matching law (GML; Herrnstein, 1970; McDowell, 2013), explaining an average 90% of the variance in the amount of rule-break talk produced. In fact, the bias parameter (b) of the GML for rule-break talk scaled to a measure of deviancy, illustrating that the verbal products of conversation directly map onto real-world measures of target behavior. This study has many upsides. It is theory-driven, has applied relevance across the social and behavioral sciences, and contains clear implications for a future program of experimental and applied studies (e.g., Luna, 2019; Simon & Baum, 2017). Although the article has generated much conceptual interest, there has been little empirical follow up.

One problem is that, like many other studies investigating naturally occurring verbal behavior (e.g., Critchfield et al., 2016), it uses bespoke manual coding (BMC), in which a custom coding scheme is developed for this (and only this) study and then time-intensively implemented by trained observers. This approach has two major drawbacks.

First, although the BMC approach is easy to conceive, it is resource intensive to implement. In the study above, each of the 210 subject pairs produced 20 min of speech that needed to be hand-coded multiple times by different observers. Assuming only two observers, each coding a transcript at roughly the pace it takes to read one aloud, that yields 2 × 20 × 210 / 60 = 140 person hr just to complete the coding. In a more recent example, another research group hand-coded n = 897 internet forum posts about applied behavior analysis for message type, tone, and factual accuracy (Turgeon & Lanovaz, 2021). For interrater reliability assessment, 25% (n = 224) of these were then rescored by a second rater. Even assuming a generous 30 s per post, that is still 1,121 posts × .50 min ≈ 560 min (9.34 hr) of cognitively intensive person-time. These are big investments just to go from raw text to spreadsheet (not to mention the time and effort spent quantifying interobserver agreement between coders and resolving discrepancies when they arise). Replicating and extending these articles thus comes at a relatively high cost, in turn reducing the likelihood they will ever actually be replicated or extended.

Second, the BMC approach is typically designed to be implemented in a specific study (or related studies from a specific research group) and not reused by other researchers. Although this increases the likelihood that a given coding scheme will be sensitive to the unique features of a given study population, it slows the incremental progress that could have been produced by a collection of studies (and their tools) across research groups that build on one another. Although this collective goods problem is not the fault of any particular researcher or study, the BMC norm has preserved precision at the cost of progress. In response, we argue sentiment analysis is a better trade for behavior analysts studying verbal behavior, especially in naturally occurring contexts. As we show below, it is (1) orders of magnitude more efficient to implement than the BMC approach. Moreover, (2) the use of a standardized coding scheme allows an incremental consensus to develop over time, including even a consensus that we should discard that specific scheme for a new one.

Sentiment Analysis

The goal of sentiment analysis is to quantify the tone or “sentiment” of verbal products. Most often, the sentiment of greatest interest is a document’s polarity or valence, which reflects overall how positive or negative the views expressed in that document are. In addition to the valence of a document, sentiment analysis can also be used to quantify more specific emotional features, like sadness or depression (De Choudhury et al., 2014; Dodds et al., 2011). With slightly more effort, research relying on sentiment analysis techniques has also attempted to quantify even more complex social and intellectual reasoning processes (Tausczik & Pennebaker, 2010).

How can we quantify the valence, emotion, or even reasoning processes expressed in a document? In practice, there are two main approaches used to achieve this task: lexicon-based and machine learning-based. Although head-to-head comparisons show that learning-based methods are often the more accurate of the two approaches (Kotelnikova et al., 2021; Zhang et al., 2014), lexicon-based approaches remain popular and competitive because they are easier to implement and interpret. Lexicon-based methods are also more effective in situations with smaller samples and when there is a focus on specific parts of a document (e.g., praise directed to a specific learner), rather than the overall tone of that document (Khoo & Johnkhan, 2018). Overall, the lexicon-based approach will be more practical, adequate, and insight generating for most behavior analysts than the learning-based approach. We therefore focus exclusively on lexicon-based approaches in this discussion, referring interested readers to more comprehensive works for an introduction to machine learning-based strategies (e.g., Liu, 2020) and their recent application to behavior analysis (Bailey et al., 2021; Lanovaz et al., 2020; Lanovaz & Hranchuk, 2021; Taylor & Lanovaz, 2021; Turgeon & Lanovaz, 2020).

Background Assumptions

Lexicon-based sentiment analysis rests on the twin assumptions that (1) words and other tokens have at least some semantic orientation that they retain across contexts and that (2) these semi-stable orientations can be used to infer the approximate sentiment of a given document, despite the unique context in which the document was created. Phrased more practically, word meanings might fluctuate from context to context, but they have a kind of typical, consensus, or average meaning around which they orbit. Thus, by simply guessing that this typical meaning of the word holds in a given document, we can expect to be approximately right much of the time.1

To see this context-free semantic orientation in action, note most English-fluent readers will accurately sense “murder” has a negative polarity that persists across most contexts. Even overall positive phrases like, “I was ultimately glad he was murdered” imply that the typical reaction to murder is negative. This same assumption extends to larger units of speech too. “Not happy” has at least some context-free negative valence that persists across the different ways that phrase can be used. Those two words, placed in that order, will tend to imply something negative. The same assumption also extends to word qualities beyond just positive–negative polarity. Words like “bonus” almost always imply some degree of uncertainty or surprise. Likewise, it is also plausible that many other words have some context-free semantic orientations representing even more complex processes. For example, “because” and “effect” are related to causal processes and therefore signal at least something about the kind of intellectual orientation in which a speaker was engaged.

If our anecdotal experience with colleagues holds, most readers can immediately think of several ways these assumptions can go awry. The most obvious case is sarcasm, in which a speaker expresses literal agreement or support, but a combination of tone and context indicates their true intention was to communicate disagreement or criticism. These readers have correctly intuited that such creative uses of language are inconsistent with the background assumptions of lexicon-based sentiment analysis and thus the method is often fooled by them. To ensure concerned readers do not immediately abandon us here, please note that we address this and other complex cases (e.g., negation, speculation) directly in the final section—after readers are more familiar with sentiment analysis as a whole. We also note that sarcasm detection remains a challenging open problem at present, even for machine learning models (Joshi et al., 2016).

Conducting Sentiment Analysis

In sentiment analysis, verbal products can be thought of as documents that contain tokens. In this article, we use the term document to refer to any verbal product that could in principle be converted into text. Watercooler conversations, phone calls, and social media posts can all be thought of as “documents” in this sense. In contrast, tokens are the smallest units of verbal behavior under consideration and can be anything from individual letters, to words, to clauses, to sentences, and so on. Individual words or small collections of adjacent words called n-grams are by far the most common. For example, the word “happy” is a unigram, “not happy” is a bigram, “am not happy” is a trigram, and so on. To analyze the verbal content of a document, it must typically be broken up into individual tokens. Documents that are broken up into a list of tokens are thus said to have been tokenized. Sometimes, different forms of the same token are further standardized with a process called stemming (removing frills like suffixes) or lemmatization (a more complex approach for finding the root or lemma of a word), but that will be unnecessary for the approach we take here.
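
As a minimal sketch of tokenization in practice (using the tidytext package that also appears in the Appendix walkthrough), a toy document can be split into unigrams or bigrams like so:

    library(tibble)
    library(tidytext)

    # A single toy "document"
    doc <- tibble(doc_id = 1, text = "I am not happy today")

    # Unigram tokens: one word per row ("i", "am", "not", "happy", "today")
    unnest_tokens(doc, output = word, input = text)

    # Bigram tokens: overlapping two-word windows ("i am", "am not", ...)
    unnest_tokens(doc, output = bigram, input = text, token = "ngrams", n = 2)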

Lastly, a lexicon is a dictionary that links tokens—again, typically words or collections of a few words—to the (one hopes) context-independent categories or numerical values they represent. This is illustrated in Table 1, which contains example entries from three popular lexicons (Hu & Liu, 2004; Mohammad & Turney, 2013; Nielsen, 2011). Not all lexicons cover the same collection of words, resulting in some missing cells in the table, so it is important to consider whether a lexicon covers the kinds of tokens likely to be used in a given population or context. For example, Western internet slang often uses nonalphabetic emoticons or emojis (e.g., ":)", ">:(") to communicate important aspects of tone, and research studies that are likely to encounter many tokens of that kind will benefit from lexicons that include them (e.g., Dodds et al., 2011). Moreover, not all lexicons categorize words in the same way. For example, the NRC lexicon provides eight potential emotions a word might be associated with, along with whether that word has a broadly positive or negative association. In contrast, the Bing lexicon categorizes words only by whether they are positive or negative, which is easier to interpret but less flexible. Different still, the AFINN lexicon provides an integer rating of only the positiveness or negativeness of a word, from -5 to +5.2

Table 1.

Example words and sentiment values from three lexicons

Word        Bing        NRC                                                       AFINN
absolve     -           -                                                         +2
absorbed    -           positive                                                  +1
abundance   positive    anticipation, disgust, joy, negative, positive, trust     -
abundant    positive    -                                                         -
abuse       negative    anger, disgust, fear, negative, sadness                   -3

Dashes indicate words that are not included in a given lexicon. Bing is named after one of its creators, Bing Liu. AFINN is also named after one of its creators, Finn Årup Nielsen. NRC = National Research Council of Canada

The process of conducting a sentiment analysis thus involves applying the background assumptions outlined above to a collection of documents in a systematic way—one that intuitively resembles much of the BMC approach, but sped up with the aid of a computer. These steps are illustrated via an annotated example in Appendix 1 and involve first selecting a prevalidated lexicon (or creating a new one). Once the lexicon is selected, each document is tokenized into chunks that match the chosen lexicon (e.g., a lexicon designed for bigrams implies that each document should be tokenized into bigrams as well). To remove noise, the analysis then excludes any stop words / stop tokens, which are words or phrases that occur so frequently that they are uninformative about the text from which they are drawn (e.g., “and,” “of the”). The remaining tokens are then scored according to the values present in the lexicon and aggregated (e.g., mean positivity per social media post). This results in a familiar dataset with cases (e.g., subjects, blocks of trials) represented in rows and sentiment scores for each of those cases represented in columns. From there, standard analytic methods can be straightforwardly applied.

For example, after computing sentiments for several thousand social media posts, the example in Appendix 1 uses linear regression to evaluate how well the authors of those posts conform to the generalized matching law. That walkthrough also demonstrates several important features of sentiment analysis, including its feasibility for behavioral researchers and its fit with the existing pantheon of methods in behavior analysis. In particular, in only about 100 lines of human-readable code, we were able to collect 18,000 instances of verbal behavior from two research subjects, score them for their use of a previously validated set of trust-related words, and then show that each subject's use of those words occurred in proportion to the rate at which it received putative reinforcement (likes from other users). As additional evidence of feasibility, the first author used a time-tracking application to record exactly how many human hours this analysis took to code and run. Results showed the entire code construction process took 4.82 hr from start to finish. It must be admitted that the lead author already had experience with this and related techniques, making this a somewhat optimistic estimate for initiates. However, it does demonstrate that with practice, the study of naturally occurring verbal behavior via sentiment analysis can be far more efficient than the existing BMC approach.

What to Tokenize?

At this point, thoughtful readers may have followed the general outline of sentiment analysis, but have lingering questions about specifics. Among the most common questions are what unit of text to tokenize (e.g., letters, words, word bigrams) and what goes into making that choice. In response, a behavior analyst should first consider the research question and the nature of the text being analyzed. This will then guide the selection of a lexicon. From there, the best tokenization approach will generally be one that matches what is available in the chosen lexicon. For example, a lexicon that is designed for two-word bigrams will work best when the analyst chooses to tokenize their observed text into bigrams. In contrast, a lexicon that includes only single words will generally be most valid on single-word tokens. In that sense, the choice of how to tokenize depends heavily on the lexicon selected for the project. How, then, does an analyst choose a lexicon? This again depends on the needs of the research question and, frankly, what is available for use. However, we describe several issues related to lexicon selection throughout the rest of this discussion, including in the Limitations and Addressing Limitations sections, as well as in the Appendix.

Validity Checks

Lastly, a sentiment analysis is not fully complete without evidence that its results are valid. Thus, a final step involves hand-checking a random subset of documents that were scored by the machine. This is analogous to existing interobserver agreement procedures, and will help quantify whether the results of the sentiment analysis are sufficiently accurate for drawing scientifically meaningful conclusions. For example, in a previous study interested in suicide-related words (Cero & Witte, 2020), we asked blinded raters to examine a random subset of tweets that had been machine coded as related or not related to suicide. Results showed raters perceived the authors of putatively suicide-related tweets to be at significantly greater risk of a suicide attempt than the authors of nonsuicide-related tweets. Although the scoring procedure was shown to be imperfect, this follow-up check was sufficient to support the remaining claims of the article—in this case that people talking about suicide do so in social clusters. Given the existing commitment of behavior analytic research to interrater reliability with human raters, establishing a strong requirement for evidence of interrater reliability with machine-produced sentiment scores is an intuitive standard to maintain with this new analytic technique. It helps quantify the trustworthiness of the results, often revealing that imperfect instruments are reliable enough to safely draw important scientific conclusions.

But Is It Really Behavior Analysis?

The field of behavior analysis has enjoyed much success from a rigorous application of both tried and true methods, as well as adherence to its underlying philosophy. It is therefore understandable that the reader might wonder, “before I commit, how behavior analytic is the new technique, really?” In response, we note that sentiment analysis is merely a streamlined technique for labeling verbal products. By studying the relationship between those products, the stimuli that preceded them, and the outcomes that followed them, sentiment analysis is as behavior analytic as most of the hand-coding approaches in common use already.

Limitations

With the advantages of lexicon-based sentiment analysis established, we now turn to its limitations. The most common of these involve complex forms of speech, like negation, sarcasm, and speculation. In addition, we emphasize that the effectiveness of lexicon-based sentiment analysis is limited by the quality of preexisting lexicons, which may be more or less available in certain languages.

Negation, Sarcasm, and Speculation

Attentive readers will note that the assumptions underlying sentiment analysis—that words have at least some context-independent semantic orientations that we can estimate—are more plausible for some styles of naturally occurring speech than others. For example, “I’m happy” and “I’m not happy” would mistakenly receive the same positive score with the basic techniques we applied above, even though the second speaker is clearly negating the fact that they are currently happy. Complicating things even more, the phrase “I’m soooo happy today” could imply extra happiness in the event of sincerity or substantial unhappiness in the event of sarcasm. The basic sentiment analysis techniques we have introduced here would likely fail this test. Equally difficult is the case of speculation (e.g., “I wonder if I will be happy today”). These styles of communication are similar both in that they are common and that they modify the usual meaning of the words in our standard lexicons. Attentive readers will note the connection to Skinner’s (1957) analysis of autoclitics, which are instances of verbal behavior designed to modify a listener’s usual reaction to a particular phrase. More recent behavior analytic work on grammar is also likely consistent with this content (Palmer, 2023), suggesting there are multiple avenues for behavior analysis research to engage in reciprocal contributions with NLP methods research.

Lexicons in Other Languages

Most existing lexicons are focused on the English language (Kaity & Balakrishnan, 2020). This introduces at least two problems. First and most obviously, it is not possible to use a lexicon-based approach without a lexicon. So a researcher studying speakers of a language without a preexisting lexicon will either have to make and validate their own, or find an alternative method—possibly going back to the traditional BMC approach above. Second, and more subtly, grammar varies across languages, so sentiment analysis techniques that rely on grammatical assumptions to solve problems cannot be relied upon outside their language of origin. For example, it is possible to handle simple negation in English by analyzing two-word bigrams (e.g., “not happy”). However, negation typically precedes verbs in Spanish (e.g., “No estoy alegre,” which is roughly “Not I’m happy”). So simply analyzing bigrams—a solution designed in English—is unlikely to be effective in Spanish. Researchers working outside English should thus work to find not only lexicons specific to their language of interest, but analysis and scoring methods that have been validated in that language as well.

Document-Level Sentiment

So far, all of the hypothetical examples and real research studies we have described were focused on relatively short speech products (e.g., one or two sentence phrases from an organic conversation, short internet forum posts). And in part because they are so short, it is easy to envision (at least in principle) a single number or label (e.g., “sad”) that could adequately summarize their tone. However, it is sometimes important to quantify the tone of much larger documents (e.g., a long patient history taken as part of a clinical interview, a patient’s extended reflection on their own treatment experience after discharge).

It is unfortunate that existing sentiment analysis techniques—including the lexicon-based techniques emphasized here—are known to be less reliable for long speech products (Cieliebak et al., 2013; Rhanoui et al., 2019). Some of this problem emerges because each word contains at least some “noise” that can distort a summary value, like a sentiment score. Depending on the kind of document being analyzed, more words may often mean more noise, making a reliable sentiment value hard to produce. In addition, longer documents are more difficult for sentiment analysis in part because the attitudes that gave rise to a verbal document can be truly and organically self-contradictory. In turn, the risk of such self-contradiction grows with the number of words in a document (e.g., diary entries after a divorce, a blog post about the experience of autism). For at least these reasons, sentiment scores are typically less trustworthy in documents longer than a few sentences.

Addressing Limitations

Knowing in advance that such problems are likely to arise, how can we compute accurate sentiment scores in spite of them? This section discusses existing strategies for mitigating the limitations of complex speech (e.g., negation, sarcasm, speculation), as well as approaches for working in languages other than English. Although none of these approaches will completely eliminate the problems outlined above, each has the potential to reduce their impact—in turn increasing the validity of research results.

Complex Speech Strategies

At the outset, we are disappointed to tell the reader that handling subtle forms of speech is still an open question in sentiment analysis (Joshi et al., 2016; cf. Dragut & Fellbaum, 2014; Dragut et al., 2014; Emerson & Declerck, 2014; Jia, 2009; Schneider & Dragut, 2015). It is a genuinely hard problem and no one solution has produced universal consensus.

Strategy 1: Choose an Honest Operational Definition

One option is to honestly and transparently expand your operational definition of the sentiment you are trying to study, as we have done in our walkthrough in the Appendix. We were careful to say we were analyzing “trust-related” language or “words with a trust dimension,” rather than “words expressing trust.” Our phrasing focuses on the general category to which a word belongs, and is likely expansive enough to safely include cases of negation, sarcasm, and speculation. In contrast, assuming that communications using trust-related words are reliably “expressions of trust” might be more specific, but is also clearly wrong. This careful choice of operational definition is essential for all researchers employing sentiment analysis and is thus our recommended minimum standard.

Strategy 2: Use a More-Complex Lexicon

The second approach is to perform a more complex kind of sentiment analysis that adjusts the score of a given word (e.g., “happy”) depending on the words around it. This can most easily be done with a lexicon explicitly built for such a purpose. For example, the Sentiment Composition Lexicon of Opposing Polarity Phrases (SCL-OPP) includes collections of emotionally salient words (e.g., “happy”) that have been paired with negators (“not happy”), modals (“would have been happy”), and degree adverbs (“somewhat happy”) and re-rated for positive / negative polarity (Kiritchenko & Mohammad, 2017). Because this lexicon uses multiword tokens that have already been adjusted, a researcher can apply it using essentially the same code we present in the Appendix. However, it is important to remember that there are many more combinations of words than there are individual words. For this reason, multiword lexicons will tend to cover less of the overall space of possible speech a researcher might be interested in studying. The reduced coverage will often be worth trading for the greater accuracy, but it is worth knowing about in advance.

Strategy 3: Use an Adjustment Formula to Re-Weight Negated Words

Another approach has been to mathematically adjust the sentiment of a word that has been negated: if happy = +.75, then not happy = -.75. Although intuitive, this approach is still quite coarse. For example, in contemporary English negating a positive word (“not good”) tends to imply negative sentiment, but negating a negative word (“not bad”) is more likely to imply neutrality than positivity (Hii, 2019; Kiritchenko & Mohammad, 2017). An additional problem comes from negation scoping (e.g., Pröllochs et al., 2015). For example, in the phrase “I have not been happy in 10 years,” the negation sits just one or two words before “happy” and is easy to detect. But in the rearranged case of “happiness is an emotion I have not experienced in 10 years,” the negation follows “happiness” and is much farther away. How can we detect the scope at which a negation signal is active? An intriguing solution to both problems is to develop a simple formula that is mostly right most of the time, a fast and frugal improvement to the approach in our walkthrough that has shown some promise over existing alternatives (e.g., Hii, 2019). However, there is not yet a formula that experts agree is reliable enough to be trusted across all contexts. Researchers employing these adjustments will still have to perform post-hoc validity checks of their results and will still find some (if fewer) cases of misscored words in context.
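
To make the flavor of such formulas concrete, the toy helper below (a hypothetical illustration we provide for exposition, not a published method) simply flips the sign of a word's score whenever the immediately preceding word is a negator:

    # Naive negation re-weighting: a word's score is negated when the
    # immediately preceding word is one of a small set of negators
    adjust_negation <- function(words, scores, negators = c("not", "no", "never")) {
      negated <- c(FALSE, head(words, -1) %in% negators)  # TRUE if the prior word negates
      ifelse(negated, -scores, scores)
    }

    adjust_negation(
      words  = c("i", "am", "not", "happy"),
      scores = c(0, 0, 0, 0.75)
    )
    # returns: 0.00 0.00 0.00 -0.75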

Strategy 4: Utilize a Machine Learning-Based Sentiment Analysis Technique

Recall that all of the sentiment analysis techniques we have discussed here are lexicon-based. They rely on a previously validated dictionary that connects verbal tokens to their associated values. An alternative approach is to use machine learning-based techniques, in which an algorithm attempts to automatically learn how to score new collections of text, based on several (typically large-scale) hand-scored collections it has seen before. This approach has often been shown to be more accurate than lexicon-based approaches alone (e.g., Kotelnikova et al., 2021), but is also beyond the scope of the discussion here. Interested readers are encouraged to consult other related works in this journal (Lanovaz et al., 2020; Turgeon & Lanovaz, 2020) and leading natural language processing conferences (Tang et al., 2019), as well as informative studies that incorporated either lexicon-based sentiment analysis (Yeung et al., 2020; Zhang et al., 2021) or machine-learning based sentiment analysis (Duong et al., 2020; Imtiaz et al., 2022).

Strategies for Languages Other than English

For working in non-English languages with no existing lexicon, there are several approaches that are still actively being investigated. One option is to machine-translate the documents into English, then proceed with an English-based sentiment analysis. On Arabic social media posts, this approach resulted in some loss of accuracy (i.e., some of the sentiment was “lost in translation”), but agreement between human raters and machine-coded sentiment was similar for the translated text and the text in its original language (Salameh et al., 2015). Moreover, in a comprehensive review of several approaches for multilingual sentiment analysis, this “translate to English and then analyze in English” approach was shown to outperform the other commonly used alternatives (Araujo et al., 2016). Taken together, this may be the most workable approach for behavior analysts in the near term, at least until a more comprehensive approach is developed.

Strategies for Document-Level Sentiment

Unlike the short-text complexities described above, document-level sentiment analysis has received much less methodological attention. However, some techniques have shown initially promising results. First, although they are more complicated to implement, machine-learning approaches have been much more effective than the lexicon-based approaches discussed here. Their accuracy scores are admittedly less than perfect, but now fall within the practical range (e.g., 90% or higher) required for most behavioral studies—including in French (Rhanoui et al., 2019).

Conclusion

Rigorous behavior-analytic research on language, emotion, and other topics of interest in mainstream psychology will require an objective and replicable methodology for quantifying complex verbal behavior (Friman et al., 1998). As described above, a number of these methods (viz., NLP) are used in other areas of psychology that (1) enjoy considerable empirical support and (2) are conceptually consistent with Skinnerian and post-Skinnerian accounts of verbal behavior. The methods described in the present discussion are congruent with the foundational dimensions of behavior analysis (Baer et al., 1968), because they rely on direct observation of verbal behavior (and related products; e.g., transcripts), are replicable and well-defined, and have generality to other domains of socially meaningful behavior of interest to behavior analysts. Moreover, this methodology is aligned with calls for additional work in the study of verbal behavior relating to emotion, sentiment, and associated phenomena (Critchfield, Doepke et al., 2017). This method entails the algorithmic operationalization of verbal operants according to their structural and functional properties. As we demonstrate in the Appendix, this method can be used in conjunction with other behavior-analytic concepts and models (e.g., the GML) to further examine and quantify response dynamics typically ignored in the realm of behavior analysis. This methodology also has the potential to be leveraged by behavior analysts to understand how behavior-analytic language influences the dissemination of behavioral research and practices to other providers and consumers (Critchfield, Becirevic et al., 2017). A full translation of sentiment analysis techniques into behavior analytic research will require additional studies, not just on the time trade-off that we have illustrated here (see Appendix), but also on the relative accuracy of sentiment analysis compared to the BMC approach—a worthwhile program of research in its own right. Given the immense potential of NLP methods for analyzing naturally occurring verbal behavior, they are fast becoming essential tools in the behavior analyst’s toolkit.

Appendix 1

For this worked example, we assume only a basic familiarity with the R programming language and the tidyverse suite of packages within it (Wickham & RStudio, 2017). We have intentionally written the code for maximum readability (sometimes at the cost of brevity), so even readers without this background should still be able to read along. Readers interested in brushing up on R and the tidyverse are encouraged to work through any of the excellent and freely available tutorials available online (Wickham & Grolemund, 2017). Readers interested in a more copy/paste-able format of this appendix can find the annotated raw code in our supplemental file here: https://osf.io/sp6mx/?view_only=cdcd6ff0df71417590672e34386e6beb .

A basic behaviorally informed sentiment analysis involves several steps, which we now demonstrate in order.

  1. Select a previously validated lexicon or create a new one

  2. Acquire raw verbal data (documents)

  3. Tokenize your documents and wrangle them into a “tidy” format.

  4. Remove stop words / stop tokens

  5. Use the lexicon to score each token

  6. Compute summary statistics (e.g., proportion of positive words)

  7. Analyze with standard behavior analytic methods (e.g., regression, visual analysis)

We will implement these steps to perform an analysis reminiscent of McDowell and Caron’s (2010) work connecting rule-break talk to received praise, in accordance with the GML. Except in this case, we will be examining whether two U.S. politicians—vice presidents Mike Pence and Kamala Harris—post tweets in accordance with the GML.

  • Step 1: Acquire an Appropriate Lexicon

Technically, steps 1 and 2 can be conducted out of order. We begin with the lexicon in this discussion simply because we needed to begin somewhere. When acquiring a lexicon, a researcher has two options. They can either utilize a prevalidated lexicon from previous research or create a new one. We encourage anyone new to sentiment analysis to use a pre-validated lexicon, which is both safer and faster. The lexicon you choose can come from a range of sources (Khoo & Johnkhan, 2018). The easiest to use will be those already available in an R package like tidytext (Silge & Robinson, 2022), which includes a helper function to download the lexicons displayed in Table 1. For most of the sentiment analysis a researcher would want to conduct in behavior analysis, these will be sufficient because they include several emotional categories that will get a researcher through their first few studies. By the time a researcher completes their first few studies with these well-known lexicons, they should already have a sense of the kind of things they would want in their next lexicon.

One other lexicon behavioral researchers should be aware of right away, however, is the Linguistic Inquiry and Word Count (LIWC; Boyd et al., 2022; Tausczik & Pennebaker, 2010). This lexicon was created for psychological research and has been evaluated and revised several times. It is especially valuable for its comprehensiveness, including many more word categories than is common in other lexicons (e.g., words related to cognitive processes, social processes, hierarchy). For this reason, LIWC has already been used extensively to study the connection between subjects’ linguistic content and a range of psychological topics and in a number of languages (Brandt & Herzberg, 2020; Cutler et al., 2021; Lumontod, 2020). Researchers who find themselves saying “I feel like the basic lexicons aren’t enough, I wish I had a lexicon that covered my niche topic” should immediately check whether LIWC covers their particular case.

In this case, we will use the National Research Council Word-Emotion Association Lexicon (NRC), which was built from a range of sources, including the preexisting WordNet affective lexicon and 8,000 terms from the General Inquirer (Mohammad & Turney, 2010, 2013). Previous work has used it specifically to study tweets, including identifying suicide-related posts, predicting Affordable Care Act enrollment, and evaluating global pandemic reactions (Dubey, 2020; Sarsam et al., 2021; Wong et al., 2015). This diversity of topics, including one study using the NRC to predict overt behavior (insurance enrollment), increases the plausibility that this lexicon tracks behaviorally meaningful verbal content. It has the added advantage of being included in the tidytext R package, so we can load it directly like this.
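
A minimal sketch of that loading step (tidytext's get_sentiments() helper fetches the lexicon through the textdata package on first use):

    library(tidyverse)
    library(tidytext)

    # Load the NRC Word-Emotion Association Lexicon: one row per
    # word-emotion pairing (e.g., word = "abandon", sentiment = "fear")
    nrc <- get_sentiments("nrc")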

With the NRC lexicon loaded into memory, we can narrow down the kind of sentiment we want to study in this analysis. Here, we retain only words that are related to the dimension of trust / mistrust. We expect this dimension is especially relevant to the occupational success of our two subjects, so it is likely to be a function of some salient reinforcer—like the number of “likes” from Twitter followers. Below, we also provide a random sample of the remaining trust-related words.
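
In sketch form, assuming the nrc object created above:

    # Keep only the words tagged with the "trust" dimension
    nrc_trust <- nrc %>%
      filter(sentiment == "trust")

    # Peek at a random sample of the remaining trust-related words
    nrc_trust %>%
      slice_sample(n = 10)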

  • Step 2: Acquire Raw Verbal Data

Most researchers will already be aware of verbal data sources relevant to their research (e.g., intervention session transcripts), so we will avoid repeating the most common sources here. Instead, we point out that there are likely a few data sources that researchers have not previously considered. For example, some video conference platforms (e.g., Zoom) have built-in support for automatic transcription of recorded meetings, and readers will be pleased to learn that these transcriptions are both accurate and arrive in a standardized format. In an as-yet-unpublished study, our own research group has already taken advantage of these resources, finding that the latency, duration, and content of speech are associated with intervention satisfaction, recall, and self-reported adoption at 1-month follow-up (manuscript in preparation).

Another example is Project Gutenberg, which provides digital versions of public domain literature. Although this is outside the scope of most modern behavior studies, we mention it to interested readers who might want to follow in Skinner’s early footsteps, which actually began with an analysis of alliteration in the works of William Shakespeare (Skinner, 1939).

The last approach—and the one we use for our worked example—is to use a REST API.3 Usually shortened to just “API,” this is a system for communicating with a web server via code, rather than a point-and-click interface. This process requires some initial effort, but is often simpler than it sounds and is a quick way to access a substantial amount of data. One of the most well-known APIs in research is the Twitter API, which allows people outside of Twitter to access a substantial amount of granular data on the activity of Twitter’s users. To give readers a sense of the scope, the first author was able to gather 64 million tweets from 17 million different users for a recent study—all for free (Cero & Witte, 2020).

Although a comprehensive introduction to APIs is beyond the scope of the current discussion, Twitter’s own tutorial is a great introduction and will remain up-to-date whenever they implement changes (Twitter, 2022). In practice, the process involves filling out a brief application to Twitter, who will then provide a set of tokens that function like a username and password. Researchers can then pass these tokens and a search query to an R package (“academictwitteR”) that knows how to handle the Twitter API (Barrie et al., 2022), doing most of the work under the hood.

For example, to save the roughly 30,000 posts Pence and Harris have produced from 2016 through 2021, a researcher simply provides their bearer token from Twitter, a formatted search query for tweets from Pence’s and Harris’s accounts, the dates to search through, and a data path (folder) in which to save the results.
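
A sketch of such a call follows; the handles, dates, and folder name are illustrative assumptions, and the n argument is simply a cap set well above the number of tweets we expect to retrieve:

    library(academictwitteR)

    # Bearer token issued when Twitter approves the researcher's application
    bearer <- "YOUR_BEARER_TOKEN"

    # Save every tweet from both accounts across the study window to a local folder
    get_all_tweets(
      users        = c("Mike_Pence", "KamalaHarris"),
      start_tweets = "2016-01-01T00:00:00Z",
      end_tweets   = "2021-12-31T23:59:59Z",
      bearer_token = bearer,
      data_path    = "data/",
      n            = 1000000
    )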

The get_all_tweets() function unfortunately saves tweet and user data in separate places, so we’ll need to load and merge them ourselves. For the loading, we can use the bind_tweets() function from the academictwitteR package to bring the tweets and user data into memory. Along the way, we extract (unnest(public_metrics)) some information about each tweet, including the like_count—our putative reinforcer in this mini matching study. We’ll also filter out retweets (which always start with an “RT”), retaining only the tweets generated by Pence and Harris themselves. By coincidence, this leaves exactly 18,000 tweets in total.

We can then use the left_join() function of the tidyverse package to add user information to each tweet.
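
In sketch form (the column names, like author_id, username, and like_count, are assumptions based on what the Twitter v2 endpoint typically returns):

    # Load tweet-level data, expose like_count, and drop retweets
    tweets_df <- bind_tweets(data_path = "data/") %>%
      unnest(public_metrics) %>%
      filter(!str_detect(text, "^RT"))

    # Load user-level data and keep one row per account
    users_df <- bind_tweets(data_path = "data/", user = TRUE) %>%
      select(author_id = id, username) %>%
      distinct()

    # Attach user information to every tweet
    full_df <- tweets_df %>%
      left_join(users_df, by = "author_id")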

  • Step 3: Tokenize Your Documents and Wrangle Them into a “Tidy” Format

In its current form, our full_df dataframe stores each tweet as a line of text. Although it is easy to read, this makes it hard for our code to access each individual word and compare it to the entries in our NRC lexicon. To get around this, we need to tokenize all of our tweets, so that each row of our dataframe will represent a single word. This is called the tidy format in R. Fortunately, the tidytext package makes this process easy, providing us with the unnest_tokens() function that handles everything automatically. We simply tell it we want a new column named word, which is made up of the individual words from the old text column. Careful readers will thus notice the first several entries in the word column of the tokenized_df now represent the first several words of the first text in the text column of the full_df.
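
A sketch of that tokenization step:

    # One row per word: split each tweet's text into lowercase word tokens
    tokenized_df <- full_df %>%
      unnest_tokens(output = word, input = text)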

  • Step 4: Remove Stopwords from the Dataset

Stop words or stop tokens (in the case of multiword n-grams) are those that occur so often that they are uninformative to the meaning of a text (e.g., “of,” “and,” “the”). Fortunately, just by loading the tidytext package, we have already loaded a precompiled list of stopwords called stop_words in the background. Thus, the quickest way to get these stopwords out of our tokenized_df dataframe is simply to anti_join() them. In an anti-join or anti-merge, only the records from the first dataset (tokenized_df) that DO NOT match anything in the second dataset (stop_words) are retained.

While we are removing unhelpful tokens, we’ll also filter out “t.co” and “https.” Visual inspection of our tokenized_df revealed these are both fragments of web links Pence and Harris posted in some of their tweets, which were accidentally included during the tokenization process (unnest_tokens() thought they were words worth retaining). Because our lexicon does not cover them, we can explicitly filter them out here too.
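
In sketch form:

    # Drop common, uninformative words, plus leftover URL fragments
    tokenized_df <- tokenized_df %>%
      anti_join(stop_words, by = "word") %>%
      filter(!word %in% c("t.co", "https"))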

  • Step 5: Use the Lexicon to Score Each Token

We expect this step sounds as though it will be the most labor-intensive part of sentiment analysis. After all, we estimated that hand-scoring a much smaller sample of text likely took McDowell and Caron’s group over 140 person hr. Scoring all the words from 18,000 tweets must be quite laborious, right? In fact, all of our words are effectively scored with just two lines of code, which join the words from our observed dataset to the values in our NRC trust lexicon.
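
In sketch form, that join might look like this:

    # Words covered by the lexicon receive "trust" in the sentiment column;
    # every other word receives NA
    scored_df <- tokenized_df %>%
      left_join(nrc_trust, by = "word")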

A minor snag is that our nrc_trust lexicon only includes words that are trust-related. It produces missing values for everything on which the lexicon is silent (i.e., nontrust words). To simplify our upcoming analysis, we’ll compute a new true/false column called trust_word, which will indicate whether a given word in our dataset is a trust word, based on the values in the adjacent sentiment column.
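
In sketch form:

    # TRUE when the lexicon tagged the word as trust-related, FALSE otherwise
    scored_df <- scored_df %>%
      mutate(trust_word = !is.na(sentiment))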

As a quick sanity check, we’ll now peek at a random sample of trust and nontrust words from both subjects.
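
A sketch of that check:

    # A few randomly chosen trust and nontrust words for each subject
    scored_df %>%
      group_by(username, trust_word) %>%
      slice_sample(n = 5) %>%
      select(username, word, trust_word)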

Here, we get a quick sense of the kinds of trust- and nontrust-related words each subject might be using. These randomly selected words are overall somewhat banal, but they are consistent with what we would expect. Words like “system” imply something that needs to be relied on, so they exist somewhere along a dimension of trustworthiness. Words like “fair” are morally salient, but do not imply reliance and thus are not scored as trust-related. The same is true of words like “persecution,” which is certainly unfair, but does not indicate a dimension of trust.

  • Step 6: Compute Summary Statistics

For our upcoming matching analysis, we’ll want to know whether each subject produces tweets with trust-related words in proportion to the likes those tweets received. To get this far, we needed to break up (“tokenize”) whole tweets into individual words, so that we could score those words with a lexicon. Now that they have been scored in the trust_word column, we need to start going in reverse. We need to recombine words into tweets and summarize each tweet by whether any of its words is a trust word.
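
In sketch form (assuming id is the tweet identifier returned by the API):

    # Collapse back to one row per tweet: a tweet counts as a "trust tweet"
    # if any of its remaining words was scored as trust-related
    tweet_df <- scored_df %>%
      group_by(username, id, like_count) %>%
      summarize(is_trust_tweet = any(trust_word), .groups = "drop")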

Once individual tweets have been scored in the is_trust_tweet column, we arrange tweets by their ID numbers (which are strictly in order of date produced) and assign them to blocks of 50 tweets. The final line, block = floor((row_number() - 1) / 50), is just a shorthand way of saying “take each tweet’s row number minus 1, divide by 50, round down to the nearest integer, and treat that as the tweet’s block number.”
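
In sketch form:

    # Order each subject's tweets by ID (a proxy for posting date) and
    # assign consecutive blocks of 50 tweets
    tweet_df <- tweet_df %>%
      group_by(username) %>%
      arrange(id, .by_group = TRUE) %>%
      mutate(block = floor((row_number() - 1) / 50)) %>%
      ungroup()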

With blocks assigned, we simply compute the familiar matching statistics. One trick to note is that R will treat TRUE and FALSE values as 1 and 0 when they are forced into mathematical computations. Thus, sum(like_count*is_trust_tweet) can be read “the sum of likes produced when is_trust_tweet is true.” We also proactively filter() to retain only cases where the log_b and log_r are still finite, which in this case is all of them because there were no blocks with 0 trust/non-trust tweets or 0 likes for either of those cases.
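
A sketch of those computations, using base-2 logs to match the figure below:

    # Per block: counts of trust vs. nontrust tweets (behavior), the likes each
    # type earned (putative reinforcement), and the corresponding log ratios
    match_df <- tweet_df %>%
      group_by(username, block) %>%
      summarize(
        b_trust    = sum(is_trust_tweet),
        b_nontrust = sum(!is_trust_tweet),
        r_trust    = sum(like_count * is_trust_tweet),
        r_nontrust = sum(like_count * !is_trust_tweet),
        .groups = "drop"
      ) %>%
      mutate(
        log_b = log2(b_trust / b_nontrust),
        log_r = log2(r_trust / r_nontrust)
      ) %>%
      filter(is.finite(log_b), is.finite(log_r))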

  • Step 7: Analyze with Standard Behavior-Analytic Methods (e.g., Visual Analysis, Regression)

At this point, all that is left to do is perform a matching analysis. Because we have two subjects who will need separate regressions, we use the group-nest-map-tidy-unnest approach. It is probably overkill for only two regressions, but in the common case that a matching analysis includes a half-dozen or more subjects to regress, this strategy is both faster and safer than copy-pasting code.
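
A sketch of that approach, with broom's tidy() and glance() extracting coefficients and fit statistics:

    library(broom)

    # One matching regression per subject: log_b ~ log_r,
    # i.e., log(B1/B2) = s * log(R1/R2) + log(b)
    fits_df <- match_df %>%
      group_by(username) %>%
      nest() %>%
      mutate(
        fit   = map(data, ~ lm(log_b ~ log_r, data = .x)),
        coefs = map(fit, tidy),
        gof   = map(fit, glance)
      )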

Unnesting the regression coefficients reveals some interesting results. Both subjects are highly sensitive to the likes associated with trust-related words.
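
In sketch form:

    # Slope = sensitivity (s); intercept = log bias (log b) for each subject
    fits_df %>%
      unnest(coefs) %>%
      select(username, term, estimate, std.error, p.value)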

What is even more interesting are these substantial bias terms, which suggest that even if trust-related tweets produced likes in equal proportion to nontrust-related tweets, both subjects would still produce trust-related tweets in substantial excess. In particular, even if the likes received for each tweet type were perfectly balanced, Harris would be expected to produce 2^0.278 = 1.21x more trust-related tweets than nontrust related ones—and Pence would produce 1.34x more.

Examining the efficacy of the matching model to explain such behavior, note that the R-squared values for both subjects are significant, but much higher for Pence. Combined with a sensitivity very near 1.0 for this subject, such a finding suggests this learning model is a compelling (if as yet, nonexperimental) account of his verbal behavior over many years.
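
In sketch form:

    # Overall model fit (R-squared) for each subject
    fits_df %>%
      unnest(gof) %>%
      select(username, r.squared, p.value)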

We can see this by visually examining the log behavior and log reinforcement rates for each subject for each block, observing that Pence’s blocks conform much more closely to the theoretically perfect matching (the dashed line) (Fig. 1).
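
A sketch of plotting code that would produce a figure like Fig. 1:

    # Observed log ratios per block, per-subject regression fits, and the
    # theoretically perfect matching line (slope = 1, intercept = 0, dashed)
    ggplot(match_df, aes(x = log_r, y = log_b)) +
      geom_point(alpha = 0.5) +
      geom_smooth(method = "lm", se = TRUE) +
      geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
      facet_wrap(~ username) +
      labs(
        x = "Log reinforcement ratio (likes: trust / nontrust)",
        y = "Log behavior ratio (tweets: trust / nontrust)"
      )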

Fig. 1.

Matching analyses. Note. R output for the log2 behavior and reinforcement rates by block for both subjects. Solid lines represent empirically observed regression slopes and gray ribbons represent confidence bounds. Dashed lines represent theoretically perfect matching

Funding

This work was supported by a grant (KL2 TR001999) from National Center for Advancing Translational Sciences (NCATS) at the National Institutes of Health (NIH). It was also supported by a National Institutes of Health Extramural Loan Repayment Award for Clinical Research (L30 MH120727).

Data Availability

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Declarations

Conflicts of Interest

The authors declare that they have no financial or nonfinancial interests that are directly or indirectly related to the work submitted for publication.

Footnotes

1

Whether these key assumptions hold well enough for a given application is of course an empirical question. However, some version of them must be true in order for humans to communicate at all. For if the meanings of words were totally unique from context to context, verbal communication itself would be impossible.

2

If the reader is having trouble visualizing how all of the components fit together, this is understandable. A sentiment analysis often involves a few different components that come together to produce the desired output. Because understanding that process is often easier with a concrete example, we have prepared a worked illustration with a familiar analysis question (e.g., related to the matching law) in the Appendix.

3

This intimidating acronym stands for Representational State Transfer Application Programming Interface, but that is not especially informative to most readers.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Araujo, M., Reis, J., Pereira, A., & Benevenuto, F. (2016). An evaluation of machine translation for multilingual sentence-level sentiment analysis. In: Proceedings of the 31st Annual ACM Symposium on Applied Computing, (pp. 1140–1145). 10.1145/2851613.2851817
  2. Baer, D. M., Wolf, M. M., & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 1(1), 91. [DOI] [PMC free article] [PubMed]
  3. Bailey JD, Baker JC, Rzeszutek MJ, Lanovaz MJ. Machine learning for supplementing behavioral assessment. Perspectives on Behavior Science. 2021;44(4):605–619. doi: 10.1007/s40614-020-00273-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Barnes-Holmes D, Hayden E, Barnes-Holmes Y, Stewart I. The implicit relational assessment procedure (IRAP) as a response-time and event-related-potentials methodology for testing natural verbal relations: A preliminary study. Psychological Record. 2008;58(4):497–515. doi: 10.1007/BF03395634. [DOI] [Google Scholar]
  5. Barrie, C., Ho, J. C., Chan, C., Rico, N., König, T., & Davidson, T. (2022). academictwitteR: Access the Twitter Academic Research Product Track V2 API Endpoint (0.3.1) [Computer software]. https://CRAN.R-project.org/package=academictwitteR
  6. Becirevic, A., Critchfield, T. S., & Reed, D. D. (2016). On the social acceptability of behavior-analytic terms: Crowdsourced comparisons of lay and technical language. The Behavior Analyst, 39, 305–317. [DOI] [PMC free article] [PubMed]
  7. Becirevic, A., Reed, D. D., Amlung, M., Murphy, J. G., Stapleton, J. L., & Hillhouse, J. J. (2017). An initial study of behavioral addiction symptom severity and demand for indoor tanning. Experimental and Clinical Psychopharmacology, 25(5), 346. [DOI] [PubMed]
  8. Boyd, R. L., Ashokkumar, A., Seraj, S., & Pennebaker, J. W. (2022). The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin, pp 1–47.
  9. Brandt PM, Herzberg PY. Is a cover letter still needed? Using LIWC to predict application success. International Journal of Selection & Assessment. 2020;28(4):417–429. doi: 10.1111/ijsa.12299. [DOI] [Google Scholar]
  10. Cero I, Witte TK. Assortativity of suicide-related posting on social media. American Psychologist. 2020;75(3):365–379. doi: 10.1037/amp0000477. [DOI] [PMC free article] [PubMed] [Google Scholar]
  11. Cieliebak, M., Dürr, O., & Uzdilli, F. (2013). Potential and limitations of commercial sentiment detection tools. In: ESSEM@ AI* IA, (pp. 47–58).
  12. Critchfield, T. S., Becirevic, A., & Reed, D. D. (2016). In Skinner's early footsteps: Analyzing verbal behavior in large published corpora. The Psychological Record, 66, 639–647. 
  13. Critchfield, T. S., & Doepke, K. J. (2018). Emotional overtones of behavior analysis terms in English and five other languages. Behavior Analysis in Practice, 11, 97–105. [DOI] [PMC free article] [PubMed]
  14. Critchfield, T. S., Doepke, K. J., Kimberly Epting, L., Becirevic, A., Reed, D. D., Fienup, D. M., ... & Ecott, C. L. (2017). Normative emotional responses to behavior analysis jargon or how not to use words to win friends and influence people. Behavior Analysis in Practice, 10, 97–106. [DOI] [PMC free article] [PubMed]
  15. Cutler AD, Carden SW, Dorough HL, Holtzman NS. Inferring grandiose narcissism from text: LIWC versus machine learning. Journal of Language & Social Psychology. 2021;40(2):260–276. doi: 10.1177/0261927X20936309. [DOI] [Google Scholar]
  16. De Choudhury, M., Counts, S., Horvitz, E. J., & Hoff, A. (2014). Characterizing and predicting postpartum depression from shared Facebook data. In: Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing—CSCW 14, 626–638. 10.1145/2531602.2531675
  17. De Choudhury, M., Kiciman, E., Dredze, M., Coppersmith, G., & Kumar, M. (2016). Discovering shifts to suicidal ideation from mental health content in social media. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems—CHI Conference, 2016, (pp. 2098–2110). 10.1145/2858036.2858207 [DOI] [PMC free article] [PubMed]
  18. Dodds PS, Harris KD, Kloumann IM, Bliss CA, Danforth CM. Temporal patterns of happiness and information in a global social network: Hedonometrics and Twitter. PLoS ONE. 2011;6(12):1–26. doi: 10.1371/journal.pone.0026752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Dragut, E., & Fellbaum, C. (2014, June). The role of adverbs in sentiment analysis. In Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014) (pp. 38–41).
  20. Dragut, E. C., Wang, H., Sistla, P., Yu, C., & Meng, W. (2014). Polarity consistency checking for domain independent sentiment dictionaries. IEEE Transactions on Knowledge and Data Engineering, 27(3), 838–851. 
  21. Dubey, S., Biswas, P., Ghosh, R., Chatterjee, S., Dubey, M. J., Chatterjee, S., & Lavie, C. J. (2020). Psychosocial impact of COVID-19. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(5), 779–788. [DOI] [PMC free article] [PubMed]
  22. Duong V, Luo J, Pham P, Yang T, Wang Y. The ivory tower lost: How college students respond differently than the general public to the COVID-19 pandemic. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 2020;2020:126–130. [Google Scholar]
  23. Emerson, G., & Declerck, T. (2014, August). SentiMerge: Combining sentiment lexicons in a Bayesian framework. In Proceedings of workshop on lexical and grammatical resources for language processing (pp. 30–38). 
  24. Friman PC, Hayes SC, Wilson KG. Why behavior analysts should study emotion: The example of anxiety. Journal of Applied Behavior Analysis. 1998;31(1):137–156. doi: 10.1901/jaba.1998.31-137. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Hayes SC, Barnes-Holmes D, Roche B, editors. Relational frame theory: A post-Skinnerian account of human language and cognition. Springer; 2001. [DOI] [PubMed] [Google Scholar]
  26. Herrnstein RJ. On the law of effect. Journal of the Experimental Analysis of Behavior. 1970;13(2):243–266. doi: 10.1901/jeab.1970.13-243. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Hii, D. (2019). Using meaning specificity to aid negation handling in sentiment analysis.
  28. Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (pp. 168–177).
  29. Hussey I, Daly T, Barnes-Holmes D. Life is good, but death ain’t bad either: Counter-intuitive implicit biases to death in a normative population. Psychological Record. 2015;65(4):731–742. doi: 10.1007/s40732-015-0142-3. [DOI] [Google Scholar]
  30. Imtiaz, A., Khan, D., Lyu, H., & Luo, J. (2022). Taking sides: Public opinion over the Israel-Palestine Conflict in 2021. arXiv Preprint arXiv:2201.05961.
  31. Jia, J. (2009). An AI framework to teach English as a foreign language: CSIEC. AI Magazine, 30(2), 59–59. 
  32. Joshi, A., Bhattacharyya, P., & Carman, M. J. (2016). Automatic sarcasm detection: A survey (arXiv:1602.03426). arXiv. http://arxiv.org/abs/1602.03426
  33. Jurafsky D, Martin J. Speech and language processing. 2. Prentice Hall; 2008. [Google Scholar]
  34. Kaity M, Balakrishnan V. Sentiment lexicons and non-English languages: A survey. Knowledge & Information Systems. 2020;62(12):4445–4480. doi: 10.1007/s10115-020-01497-6. [DOI] [Google Scholar]
  35. Khoo CS, Johnkhan SB. Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons. Journal of Information Science. 2018;44(4):491–511. doi: 10.1177/0165551517703514. [DOI] [Google Scholar]
  36. Kiritchenko, S., & Mohammad, S. (2017). The effect of negators, modals, and degree adverbs on sentiment composition. arXiv Preprint arXiv:1712.01794.
  37. Kotelnikova, A., Paschenko, D., Bochenina, K., & Kotelnikov, E. (2021). Lexicon-based Methods vs. BERT for Text Sentiment Analysis. arXiv Preprint arXiv:2111.10097.
  38. Lanovaz MJ, Giannakakos AR, Destras O. Machine learning to analyze single-case data: A proof of concept. Perspectives on Behavior Science. 2020;43(1):21–38. doi: 10.1007/s40614-020-00244-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Lanovaz MJ, Hranchuk K. Machine learning to analyze single-case graphs: A comparison to visual inspection. Journal of Applied Behavior Analysis. 2021;54(4):1541–1552. doi: 10.1002/jaba.863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Liu B. Sentiment analysis: Mining opinions, sentiments, and emotions. 2. Cambridge University Press; 2020. [Google Scholar]
  42. Lumontod III, R. Z. (2020). Seeing the invisible: Extracting signs of depression and suicidal ideation from college students' writing using LIWC a computerized text analysis. International Journal of Research Studies in Education, 9, 31–44.
  43. Luna, O. (2019). Matching analyses as an evaluative tool: Characterizing behavior in juvenile residential settings.
  44. McDowell JJ. On the theoretical and empirical status of the matching law and matching theory. Psychological Bulletin. 2013;139(5):1000–1028. doi: 10.1037/a0029924. [DOI] [PubMed] [Google Scholar]
  45. McDowell JJ, Caron ML. Matching in an undisturbed natural human environment. Journal of the Experimental Analysis of Behavior. 2010;93(3):415–433. doi: 10.1901/jeab.2010.93-415. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Mohammad, S., & Turney, P. (2010, June). Emotions evoked by common words and phrases: Using mechanical turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text (pp. 26–34).
  47. Mohammad, S., & Turney, P. D. (2013). NRC emotion lexicon. National Research Council, Canada, 2.
  48. Nielsen, F. Å. (2011). A new ANEW: Evaluation of a word list for sentiment analysis in microblogs (arXiv:1103.2903). arXiv. 10.48550/arXiv.1103.2903
  49. Normand, M. P., & Donohue, H. E. (2022). Behavior analytic jargon does not seem to influence treatment acceptability ratings. Journal of Applied Behavior Analysis, 55(4), 1294–1305. [DOI] [PMC free article] [PubMed]
  50. O’Reilly A, Roche B, Ruiz M, Tyndall I, Gavin A. The function acquisition speed test (fast): A behavior analytic implicit test for assessing stimulus relations. Psychological Record. 2012;62(3):507–528. doi: 10.1007/BF03395817. [DOI] [Google Scholar]
  51. Palmer, D. C. (2023). Toward a behavioral interpretation of English grammar. Perspectives on Behavior Science. 10.1007/s40614-023-00368-z [DOI] [PMC free article] [PubMed]
  52. Pröllochs, N., Feuerriegel, S., & Neumann, D. (2015). Enhancing sentiment analysis of financial news by detecting negation scopes. In: 48th Hawaii International Conference on System Sciences, (pp. 959–968). 10.1109/HICSS.2015.119
  53. Reed, D. D. (2016). Matching theory applied to MLB team-fan social media interactions: An opportunity for behavior analysis.
  54. Rhanoui M, Mikram M, Yousfi S, Barzali S. A CNN-BiLSTM model for document-level sentiment analysis. Machine Learning & Knowledge Extraction. 2019;1(3):832–847. doi: 10.3390/make1030048. [DOI] [Google Scholar]
  55. Salameh, M., Mohammad, S., & Kiritchenko, S. (2015). Sentiment after translation: A case-study on Arabic social media posts. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 767–777. 10.3115/v1/N15-1078
  56. Sarsam, S. M., Al-Samarraie, H., Alzahrani, A. I., Alnumay, W., & Smith, A. P. (2021). A lexicon-based approach to detecting suicide-related messages on Twitter. Biomedical Signal Processing and Control, 65, 102355.
  57. Schneider, A., & Dragut, E. (2015, July). Towards debugging sentiment lexicons. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1024–1034). 
  58. Silge, J., & Robinson, D. (2022). Text mining with R: A tidy approach (2022-05-03 ed.). https://www.tidytextmining.com/
  59. Simon C, Baum WM. Allocation of speech in conversation. Journal of the Experimental Analysis of Behavior. 2017;107(2):258–278. doi: 10.1002/jeab.249. [DOI] [PubMed] [Google Scholar]
  60. Skinner BF. Alliteration in Shakespeare’s sonnets: A study in literary behavior. The Psychological Record. 1939;3:185. [Google Scholar]
  61. Skinner BF. Verbal behavior. Copley Publishing Group; 1957. [Google Scholar]
  62. Tang, R., Lu, Y., Liu, L., Mou, L., Vechtomova, O., & Lin, J. (2019). Distilling task-specific knowledge from BERT into simple neural networks. arXiv Preprint arXiv:1903.12136.
  63. Tausczik YR, Pennebaker JW. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language & Social Psychology. 2010;29(1):24–54. doi: 10.1177/0261927X09351676. [DOI] [Google Scholar]
  64. Taylor T, Lanovaz MJ. Machine learning to support visual inspection of data: A clinical application. Behavior Modification. 2021;46(5):1109–1136. doi: 10.1177/01454455211038208. [DOI] [PubMed] [Google Scholar]
  65. Turgeon S, Lanovaz MJ. Tutorial: Applying machine learning in behavioral research. Perspectives on Behavior Science. 2020;43(4):697–723. doi: 10.1007/s40614-020-00270-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Turgeon S, Lanovaz MJ. Perceptions of behavior analysis in France: Accuracy and tone of posts in an internet forum on autism. Behavior & Social Issues. 2021;30:308–322. doi: 10.1007/s42822-021-00057-z. [DOI] [Google Scholar]
  67. Wickham H, Grolemund G. R for Data Science: Import, tidy, transform, visualize, and model data. O’Reilly Media; 2017. [Google Scholar]
  68. Wickham, H., & RStudio. (2017). tidyverse: Easily install and load the “tidyverse” [Computer software]. https://CRAN.R-project.org/package=tidyverse
  69. Wong, C. A., Sap, M., Schwartz, A., Town, R., Baker, T., Ungar, L., & Merchant, R. M. (2015). Twitter sentiment predicts Affordable Care Act marketplace enrollment. Journal of Medical Internet Research, 17(2), e51. [DOI] [PMC free article] [PubMed]
  70. Yeung N, Lai J, Luo J. Face off: Polarized public opinions on personal face mask usage during the COVID-19 pandemic. IEEE International Conference on Big Data (Big Data) 2020;2020:4802–4810. doi: 10.1109/BigData50022.2020.9378114. [DOI] [Google Scholar]
  71. Zhang, H., Gan, W., & Jiang, B. (2014). Machine learning and lexicon based methods for sentiment classification: A survey. In: 11th Web Information System and Application Conference, (pp. 262–265).
  72. Zhang X, Wang Y, Lyu H, Zhang Y, Liu Y, Luo J. The influence of COVID-19 on the well-being of people: Big data methods for capturing the well-being of working adults and protective factors nationwide. Frontiers in Psychology. 2021;12:2327. doi: 10.3389/fpsyg.2021.681091. [DOI] [PMC free article] [PubMed] [Google Scholar]


