Cognitive Science. 2024 Dec 3;48(12):e70022. doi: 10.1111/cogs.70022

A Constant Error, Revisited: A New Explanation of the Halo Effect

Chris Westbury, Daniel King
PMCID: PMC11614318  PMID: 39625934

Abstract

Judgments of character traits tend to be overcorrelated, a bias known as the halo effect. We conducted two studies to test an explanation of the effect based on shared lexical context and connotation. Study 1 tested whether the context similarity of trait names could explain 39 participants’ ratings of the probability that two traits would co‐occur. Over 126 trait pairs, cosine similarity between the word2vec vectors of the two words was a reliable predictor of the human judgments of trait co‐occurrence probability (cross‐validated r² = .19, p < .001). Two measures related to word similarity increased the variation accounted for in the human judgments to 45%, cross‐validated (p < .001). In Experiment 2, 40 different participants judged similarity of word meaning within the pairs, confirming that the word pairs were not simply synonymous (Average [SD] = 40.8/100 [13.1/100]). Shared lexical context and word connotation play a role in shaping the halo effect.

Keywords: Halo effect, Cognitive bias, Decision‐making, Human judgment, Word embedding models

1. Introduction

The halo effect is a widely studied bias by which presumably independent human ratings are correlated. The name is somewhat misleading, since the effect is usually considered in a simplified form, as a bias to form an overall positive rating of a person or thing based on a high rating of a single trait (e.g., Nisbett & Wilson, 1977). The analogous effect at the low end of the correlation range is sometimes called the horn (or horns) effect1 or the devil's effect. The halo/horns effect is usually construed as an innate cognitive bias. In this paper, we consider an alternate (albeit not incompatible) explanation. We present evidence suggesting that much of the variance in human judgments of trait correlation is predictable from empirically observable properties of the trait labels. To the extent that we can rely on such properties to explain how correlated people judge traits to be (i.e., how likely to co‐occur in an individual), we shift the focus of explanations of this effect from innate cognitive biases to a different level of explanation that might help explain those biases, based on the role of quantifiable elements in the linguistic environment. We thereby use language statistics to partially explain an effect that is usually considered a social‐cognitive effect. The cognitive bias is perhaps not innate, but rather a processing bias that reflects patterns of ordinary language use.

The effect was first documented by Wells (1907), who had students rate different pieces of literature based on 11 qualities familiar to literary critics. He found that the ratings on all the other scales were strongly correlated with the general merit ratings (which of course means they were also correlated with each other). Wells concluded that individuals allow their general impression to influence their ratings of different literary qualities.

Thorndike (1920) published a paper titled A Constant Error in Psychological Ratings, which examined presumed‐independent trait ratings of 137 Aviation Cadets by officers. The correlations between pairs of ratings were consistently “too high and too even” (p. 27), suggesting that even expert judges are unable to treat an individual as a compound of independent qualities. It was Thorndike who dubbed this bias toward the overcorrelation of trait pairs the halo effect.

The effect was initially studied primarily in the field of education, examining its influence on student ratings of teachers (Remmers, 1934; Stalnaker & Remmers, 1928; Starrak, 1934). Bingham (1939) examined the validity of the effect through a semi‐replication of Thorndike's original experiment. Participants rated people on 10 presumably independent traits. Bingham found that the judgments “unconsciously and inevitably manifest a halo effect” (p. 228).

The halo effect has been extensively researched in multiple fields, including social psychology, clinical psychology, behavioral psychology, child psychology, politics, and marketing (e.g., Cao et al., 2023; Fusaro, Corriveau, & Harris, 2011; Hartung et al., 2009; Naquin & Tynan, 2003; Teneva, 2020; Zeigler‐Hill, Besser, & Besser, 2019). This work has demonstrated that the halo effect is not limited to human trait ratings but is also seen in ratings of food, wine, groups, and ideas (e.g., Apaolaza, Hartmann, Echebarria, & Barrutia, 2017; Iles, Pearson, Lindblom, & Moran, 2021; Naquin & Tynan, 2003).

Fisicaro and Lance (1990) proposed three theoretical models as explanations for the halo effect:

  1. The General Impression Model states that the effect arises due to the influence of a rater's general impression, as Thorndike believed and as evidence from Nisbett and Wilson (1977) supported.

  2. The Salient Dimension Model suggests that the effect occurs due to the influence of one trait on the perception or evaluation of other traits.

  3. The Inadequate Discrimination Model proposes that the effect happens due to the inability of a rater to delineate behavior into separate categories, with the result that evaluations along one dimension influence evaluations in other dimensions.

As we have suggested above, these models all assume that the only explanation for the effect is a cognitive bias. The suggestion that ratings are overcorrelated due to an innate cognitive bias does not provide insight into why the bias occurs in any particular context. It fails to propose any mechanism to predict how strongly correlated any two specific traits will be judged to be. We address this gap in the literature by outlining an explanation that allows for well‐motivated estimates of the strength of the halo effect for any pair of traits. Our approach is to predict the judged correlations between individual trait pairs, which are the underlying basis of the effect. We conducted two studies to explore the role of linguistic context in generating and perpetuating those correlation judgments.

2. Study 1

The first study aimed to investigate whether the shared context of words could account for variability in judgments of the likelihood that two traits would co‐occur in an individual.

2.1. Method

The study was ethically reviewed by the Ethical Review Board at the University of Alberta. All participants gave informed consent. Data are available at https://osf.io/h324v/. Data were analyzed using R (version 4.2.2; R Core Team, 2022) and the R packages ggplot2 (version 3.4.0; Wickham, 2016) and mgcv (version 1.8‐41; Wood, 2011).

To calculate the degree of shared context between any two trait words, we used a word‐embedding model that uses a simple neural network to represent the average context in which a particular word appears in a large corpus of language, the word2vec skip‐gram model (Mikolov, Chen, Corrado, & Dean, 2013; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013; Mikolov, Yih, & Zweig, 2013). This model constructs vectors representing each word by moving a word window over a large corpus of text, with the target word being the center word. The model uses a neural network with a single hidden layer to predict the context words based on the target word (hence the name skip‐gram, because the target word is skipped). The hidden‐layer activations for each word (an abstract representation of that word's context) are used as the vector for that word. We used the skip‐gram model with a 300‐unit hidden layer and a two‐word context window on either side. Although the length of the vectors and the choice of a two‐word preceding and following context are arbitrary, these values are commonly used in language research. For a corpus, we used the 100‐billion‐word Google news corpus. The word2vec matrix developed on that corpus is available from: https://code.google.com/archive/p/word2vec/.
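The shared-context measure used throughout the paper is the cosine of the angle between two word vectors. A minimal sketch of that computation follows; the four-dimensional vectors are invented stand-ins for the 300-dimensional word2vec vectors, chosen only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 4-dimensional stand-ins for 300-dimensional word2vec vectors
vec_friendliness = [0.12, -0.40, 0.33, 0.08]
vec_helpfulness = [0.10, -0.35, 0.30, 0.12]

print(round(cosine_similarity(vec_friendliness, vec_helpfulness), 3))  # → 0.995
```

Words that occur in similar contexts receive similar vectors, so their cosine approaches 1; unrelated words have cosines near 0.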

The experiment was conducted online through testable.org. Participants saw two words above a slider that went from 0 to 100. They were asked to use the slider to indicate how likely it was that a person would be rated highly on both the specified traits. The order of the trait names was counterbalanced. The order of the stimuli was randomized for each participant.

2.1.1. Participants

Thirty‐nine participants (21 males; 18 females) participated in this experiment in return for a partial university course credit. Their average [SD] age was 21.7 [12.9] years.

2.1.2. Stimuli

The stimuli were 126 pairs of human traits. These were selected with as little human intervention as possible, in the following way. We began with 106 trait words. For each word, we computed the cosine similarity with all 105 other words and sorted the list by descending cosine similarity. We selected the first, middle, and last pair from each list, unless the words in a pair were judged synonymous, in which case we selected the nearest following pair that was not synonymous (see Experiment 2 for verification). After removing 22 duplicates, we had 293 unique pairs. We sorted these descending by cosine similarity and took the most similar, mid‐similar, and least similar pairs from each of the 42 pair sets in which the initial word appeared at least three times, to end up with the 42 * 3 = 126 pairs, which were composed of 93 distinct words, and which systematically covered the range of cosine similarity (Fig. 1).
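The first step of this selection procedure can be sketched as follows. The trait words, similarity values, and synonym set here are invented for illustration, and the sketch omits the later deduplication and pair-set filtering steps.

```python
def select_pairs(words, cosine, synonyms):
    """For each word, rank the other words by descending cosine similarity
    and keep the most-, mid-, and least-similar pairing, stepping past any
    pair flagged as synonymous (a sketch of the selection procedure)."""
    selected = set()
    for w in words:
        ranked = sorted((o for o in words if o != w),
                        key=lambda o: cosine[frozenset((w, o))],
                        reverse=True)
        for idx in (0, len(ranked) // 2, len(ranked) - 1):
            # step to the nearest following non-synonymous pair
            while frozenset((w, ranked[idx])) in synonyms and idx + 1 < len(ranked):
                idx += 1
            selected.add(frozenset((w, ranked[idx])))
    return selected

# Hypothetical cosine similarities among four trait words
words = ["honesty", "kindness", "cruelty", "wisdom"]
cosine = {frozenset(p): s for p, s in [
    (("honesty", "kindness"), 0.60), (("honesty", "wisdom"), 0.40),
    (("honesty", "cruelty"), 0.10), (("kindness", "wisdom"), 0.50),
    (("kindness", "cruelty"), 0.20), (("cruelty", "wisdom"), 0.05)]}
pairs = select_pairs(words, cosine, synonyms=set())
```

With only three candidate partners per word, every pair survives; with 105 candidates per word, the procedure samples the top, middle, and bottom of each similarity ranking, which is what spreads the stimuli across the cosine range.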

Fig. 1. Distribution of cosine similarities in the 126 stimulus pairs of trait words.

2.2. Results

Data from three participants were removed because they had a very low variance centered on a “round” value (50%, 75%, or 100%), suggesting that these participants had not made genuine judgments. We also removed 407 responses (8.2%) with reaction times above 20,000 ms, as implausibly long for the task. Using 500 ms as a lower cutoff, there were no unreasonably rapid responses.

The remaining data were analyzed using generalized additive modeling (GAM), using the R package mgcv.

We began by splitting the data randomly into two sets, one for model development and one for model cross‐validation. The correlation of the human judgments of relatedness between the two sets was r = .694 (95% confidence interval: 0.590–0.775; r² = .48; p < 2e‐16).

Cosine similarity between the vectors of the two words [model M1] was a reliable predictor of the human judgments of relatedness in the raw data (r = .162 [95% confidence interval: 0.123–0.200]; r² = .026; p = 7.34e‐16; see Fig. 2a). It performed nearly as well on the cross‐validation dataset (r² = .024; p = 8.581e‐15; see Fig. 2a). Collapsing by pairs, the correlation between the model estimates and the human judgments was r = .407 (95% confidence interval: 0.250–0.543; r² = .166, p = 2.26e‐06, see Fig. 2b) in the development set and r = .432 (95% confidence interval: 0.280–0.564; r² = .187, p = 4.271e‐07, see Fig. 2b) in the cross‐validation set. Since the human judgments for the development and cross‐validation datasets account for 48% of the variance in each other, this is 18.7/48 = 39.0% of the possible explicable variance.
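The 39.0% figure is simply the cross-validated variance explained by M1 taken as a share of the split-half ceiling on explicable variance, as the arithmetic below confirms:

```python
# Cross-validated variance explained by the cosine-only model M1 (r = .432)
r2_model = 0.187
# Split-half reliability ceiling on explicable variance (r = .694)
r2_ceiling = 0.48

share = r2_model / r2_ceiling
print(f"{share:.1%}")  # → 39.0%
```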

Fig. 2. Summary of model for predicting human trait relatedness judgments using vector cosine similarity only. (a) Raw data. (b) Data collapsed by stimulus pair.

Although we have shown that vector cosine similarity between the words in characteristic pairs can account for a large proportion (∼39%) of the systematic variance in human judgments of the relationship between trait pairs, there is no reason to believe it is the only relevant predictor. To investigate further, we built a more complex GAM that began by adding measures of logged word frequency of each of the words (from Shaoul & Westbury, 2006) and estimates of valence, arousal, and dominance for each word (from Hollis, Westbury, & Lefsrud, 2017) [Model M2]. After removing predictors that did not contribute with p < .00001, the final model contained just three predictors: cosine similarity, the logged frequency of the most frequent word in the pair, and the valence of the lowest‐valenced word in the pair (see Table 1). Model M2 considerably outperformed the cosine‐similarity‐only model M1. On the human judgments of relatedness in the collapsed data, it accounted for 48.3% of the variance in the development data (r = .695 [95% confidence interval: 0.591–0.776]; p < 2.2e‐16; see Fig. 3b). It performed nearly as well on the cross‐validation dataset (in the collapsed dataset, r = .667; r² = .447; p < 2.2e‐16; Fig. 3b). This is 44.7/48 = 93% of the systematic variance in the human judgments, over 2.3 times as much variance as the cosine‐only model M1. M2 had an Akaike information criterion value (AIC; Akaike, 1974) that was much lower (22,030) than the AIC value of M1 (22,584), suggesting it was much more likely to minimize information loss.
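The AIC comparison can be given a concrete reading. Under the standard relative-likelihood interpretation of AIC differences (an interpretive convention, not a calculation reported in the paper), a difference of 554 implies that M1 is vanishingly unlikely, relative to M2, to be the better model:

```python
import math

# AIC values reported for the two models
aic_m2, aic_m1 = 22030, 22584

# Relative likelihood: M1 is exp((AIC_m2 - AIC_m1) / 2) times as likely
# as M2 to minimize information loss
rel_likelihood = math.exp((aic_m2 - aic_m1) / 2)
print(rel_likelihood)  # effectively zero: M2 is overwhelmingly preferred
```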

Table 1. Summary of GAM smooth terms for predicting relatedness judgments from lexical predictors

Predictor     edf     F       p‐value
COSINE        6.728   7.579   < 2e‐16
MAXLogFreq    8.471   7.271   < 2e‐16
MINValence    4.946   6.827   3.24e‐07

Note. See also Fig. 3.

Fig. 3. Summary of model for predicting relatedness judgments using vector cosine similarity and lexical predictors (see Table 1). (a) Raw data. (b) Data collapsed by stimulus pair.

The relationship between the three predictors in Model M2 and the estimated human judgments is shown in Fig. 4.

Fig. 4. Relationship of predictors in model M2 (x‐axis) to estimated human judgments of characteristic relatedness (y‐axis) across the full dataset. Minimum and maximum values refer to the minimums and maximums within each characteristic pair that was judged. Fits are smoothed by GAM. Shading shows 95% confidence intervals.

Cosine similarity shows a roughly linear positive correlation with estimated relatedness, which supports the original hypothesis that inspired this work. Word pairs have a high cosine similarity when they occur in similar linguistic contexts, and (as Model M1 demonstrated) words sharing context are judged by humans as more closely related.

The maximum logged frequency has an inverted‐U relationship to the human judgments: that is, estimated relatedness is lowest when both words are low frequency (which must be the case when the maximum logged frequency is low; e.g., astuteness‐doggedness, agreeability‐forcefulness, courteousness‐helpfulness) or when at least one of the words is high frequency (e.g., commitment‐determination, ability‐attentiveness, boldness‐health). If word frequency is used by human judges as a proxy for characteristic frequency, the low‐low finding makes sense. Rare characteristics must logically be expected to be less likely to co‐occur with each other than more common characteristics, precisely because they are rare. However, this explanation would predict that characteristics with high frequency should be judged as more likely to be related, which is contrary to the relationship graphed in Fig. 4. A better explanation may be that the relationship between high maximum word frequency and human judgments of relatedness is modulated by the frequency of the other word in the pair. Some support for this interpretation is shown in Fig. 5, which shows the relationship between estimated human judgments of characteristic pair relatedness and the logged difference of the two word frequencies. Characteristic pairs formed of words with a large difference in frequency (i.e., a high‐frequency word paired with a low‐frequency word) do have low estimated relatedness judgments. This is consistent with the explanation of the low‐low frequency characteristic pairs. Rare characteristics must be less likely to co‐occur with any other characteristic by virtue of their rarity. When logged frequency difference was entered as a predictor in place of maximum logged frequency, the resultant model was slightly worse than but very similar to Model M2, by AIC value (Model M2 AIC = 22,030; model with frequency difference AIC = 22,037).

Fig. 5. Relationship of estimated human judgments from Model M2 (y‐axis) to the standardized logged difference between the frequencies of the two words in a characteristic pair (x‐axis). Fits are smoothed by GAM. Shading shows 95% confidence intervals.

We also considered a model that uses the vector values to predict the human judgments, instead of using the estimates of valence, arousal, and dominance for each word. Since this model performed worse than M2, we will not discuss it further here.

2.3. Discussion

This study investigated how well shared lexical context can predict ratings of the likelihood that two traits would co‐occur in an individual. Cosine similarity between the vectors of the two words is a dependable predictor of human judgments of trait co‐occurrence, suggesting that human judgments of the relationship between two human traits are influenced in part by the shared linguistic context of the names of those traits.

It is easy to get a feel for what this means by considering some pairs of words that are judged by humans as highly related (e.g., friendliness‐helpfulness; dedication‐professionalism; humility‐thoughtfulness) and little related (e.g., agreeability‐forcefulness; industry‐kindheartedness; creativity‐seriousness). Although (for example) friendliness does not mean the same thing as helpfulness (a person can be helpful without being friendly, as Tom Hanks's character initially was in the 2022 movie A Man Called Otto), they are nevertheless the same kind of thing, in a way that the unrelated word pairs are not. That similarity of kind is captured by the cosine similarities, suggesting that people use the words in similar contexts.

3. Study 2

Our second experiment was designed to rule out the possibility that word meanings themselves might account for the correlation between cosine similarity of trait names and the probability of those traits co‐occurring in a person. It is predictable that words for human traits that have the same meaning will be judged as co‐occurring in the same person. A person who is pretty is certainly also beautiful, since the two words have similar meanings. As noted above, when selecting our word pairs we excluded by hand any pairs judged to be synonymous. The second experiment was conducted to confirm that we had done this successfully, so the results could not be attributable simply to shared meaning.

3.1. Method

The experiment was again conducted online at testable.org. The study was ethically reviewed by the Ethical Review Board at the University of Alberta. All participants gave informed consent. As in Experiment 1, participants saw two words above a slider that went from 0 to 100. They were asked to use the slider to indicate how similar in meaning the two words were. The order of the words within each pair was counterbalanced. Pairs were randomly ordered for each participant. Participants did not receive any information about the halo effect.

3.1.1. Participants

Forty participants (16 males; 21 females; 3 other) participated in this experiment in return for partial university course credit. Their average [SD] age was 18.9 [1.38] years. All had completed high school. Five had completed some college training.

3.1.2. Stimuli

The stimuli were the same 126 pairs of human trait names that were used in Experiment 1.

3.2. Results

Data from two participants were removed because they had a very low variance around a “round” value (50% and 75%), suggesting they may have responded by rote. We used the remaining responses to get average meaning‐relatedness scores for all word pairs.

As we had expected, since we had selected our stimulus pairs to be distinct, the meaning ratings were generally low (Average [SD] = 40.8/100 [13.1/100]; Median = 39.4/100), confirming that the words within each pair were not synonymous.

The meaning ratings were very strongly correlated with the cosine similarity of their vectors (r = .83, p < 2e‐16), as one might expect, since word embedding models were designed so that cosine similarity between word vectors serves as a measure of meaning similarity. This suggests that human meaning‐relatedness judgments are sensitive to the shared‐context information that cosine similarity captures. Despite being so strongly correlated with the cosine similarity values, the meaning ratings were better predictors of the trait correlation judgments than those cosine similarity values were. After collapsing by trait pair, meaning ratings correlated at r = .68 with the trait correlation judgments, significantly better than the r = .49 for the cosine similarity of the word vectors in each pair (M1: by Fisher's r‐to‐z test, z = 3.08, p = .002). However, the correlation between the meaning judgments and the trait correlation judgments was no different from the correlation of r = .695 with the estimates from the more complete model M2 (z = 0.22, p = .82).
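Fisher's r-to-z test compares correlations on a scale where sampling error is approximately normal. A sketch of the transformation and the test statistic follows; note that the simple formula below assumes two independent correlations, whereas the correlations compared here share a sample, so a dependent-correlation variant (e.g., Steiger's test) is needed to reproduce the reported z values exactly.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transformation: z = atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_diff_independent(r1, n1, r2, n2):
    """z statistic for the difference between two INDEPENDENT correlations.
    A dependent-correlation test (same sample, as in this study) replaces
    the standard error below with one that accounts for the shared data."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se
```

The transformation stretches the correlation scale near ±1, where raw r values are compressed, which is what makes differences between large correlations testable at all.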

Using a GAM to predict the character correlation judgments, the meaning judgments entered as a reliable predictor but the cosine similarity values did not. The predictive value of the meaning judgments was the same as that of the best model considered above, Model M2, which produced estimates that correlated at r = .695 with the human judgments (by Fisher's r‐to‐z test, z = 0.22, p = .82). Note that this is approximately the same as the split‐half reliability of the human judgments (r = .694), which puts a ceiling on how much variance is explicable in those judgments; that is, the meaning judgments account for ∼100% of the explicable variance in the character correlation judgments.

3.3. Discussion

This second study was conducted to ensure that the trait names in our pairs were not synonymous with each other and thereby to confirm that participants’ ratings were not based solely on the similarity of the word meanings. We confirmed that the words in each pair have dissimilar meanings.

Despite the word meaning ratings being low, there was a significant correlation between those ratings and the cosine similarity of their vectors, as would be expected if our experience of lexical meaning is in part a function of similarity of word use. Although the meaning ratings are better predictors of the character correlation judgments than the cosine similarity values in Model M1, they are not useful as predictors since we are only correlating one human judgment (one unknown) with another, which leaves us with no explanatory value (see discussion in Hempel & Oppenheim, 1948; Westbury, 2016). Since the words in pairs are not similar in meaning, the meaning ratings cannot be reflecting meaning per se. The strong correlation between the meaning ratings and the cosine judgments, in the context of average meaning ratings below 50/100, suggests rather that the meaning ratings largely reflect the shared context of words.

One potential criticism of using vector cosine similarity as a predictor of trait co‐occurrence is that cosine similarity may reflect, rather than explain, the halo effect. People may use words in similar contexts precisely because those words have a halo effect association. We are disinclined to accept this argument for two reasons. One is that it is scientifically unhelpful, since it does not offer any alternate explanation of the halo effect, but simply asserts without evidence that an alternate explanation exists. A second, stronger reason is that lexical co‐occurrence is a fundamental level of explanation, in the sense that it cannot be reduced to any consistent lower‐level effects. This is not because there are no lower‐level effects but rather because there are a great many of them that operate largely independently: all the elements that underlie and influence our experience of word meaning and our word use behavior (Bloor, 1983; Wittgenstein, 1953). In the case of our word pairs, there is a multitude of different reasons why two trait names would have a similar context. For example, compassion and devotion (with a relatively high cosine similarity of 0.48) are not necessarily associated by their nature (a person can have either one without the other) but are often associated via religious contexts that view both traits as desirable positive virtues. Similarly, confidence and credibility (with a higher cosine of 0.51) also have no necessary association. Some politicians are absurdly confident but utterly lacking in credibility. However, as a matter of practicality, it is often difficult to convey credibility without having confidence. Honesty and professionalism (with a yet higher cosine of 0.57) are associated by convention and marketplace compulsion, since honesty is one component of the higher‐order construct of professionalism.

4. Conclusion

In this study, we found support for the hypothesis that the halo effect is partly predictable from the similarity of word context and connotation. Word2vec cosine similarity between names for character traits, valence estimates, and word frequency are all statistically significant predictors of the judged probability that the traits will be seen together in the same person. In Experiment 2, we showed that this was not due to similarity of word meanings. Our data suggest that an explanation of the human judgments in terms of word characteristics is nearly as good as any explanation of the trait‐correlation judgments can be, accounting for nearly all of the systematic variance in those judgments. However, the judgments are fairly noisy (split‐half correlation of r = .694), so this is not the final word on explaining all variance in the halo effect.

Previous studies of the halo effect have been limited largely to manipulations of a single relationship or a small set of characteristics used in a real‐world setting. Here, we have accounted for almost all the explicable variance in human judgments of relatedness in a large set of characteristic pairs, allowing for prediction of the strength of the effect for any such pairs. Our models show the interplay between linguistic context and the formation of the halo effect, offering insights into the underlying cognitive process that drives the too‐high correlations between evaluative judgments.

Our study has limitations: it only looked at 126 pairs of English words in the particular context of estimating human traits. Future research could extend this to other languages, other traits, and other contexts where the halo effect is relevant.

In particular, having a calculable estimate of the halo effect has some potential practical applications. By identifying shared context between words, it may be possible to predict how individuals will perceive and evaluate people, products, or ideas based on their impressions. For example, there have been concerns over the comorbidity of diagnoses within the mental health profession, which has been attributed (in part) to the halo effect (Hartung et al., 2009). A calculable measure of the shared context of different diagnostic labels may help mental health professionals remain aware of the possibility of overestimating the labels' true relationship. Another application may be within marketing. There have been questions regarding the ethics of using certain words to advertise products (Iles et al., 2021), since words used in advertisements have been found to improperly influence consumers' impressions of the healthfulness of products. A calculable estimate of the relationship between healthiness and the words used in advertisements may help policymakers judge which words are acceptable in marketing.

These studies provide a novel approach to understanding the underlying mechanisms of the halo effect by investigating its linguistic and contextual basis. The effect is systematically influenced by the shared context and semantic associations of words that can be derived from normal language use.

Acknowledgments

This work was supported by a grant from the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Note

1. The exact origin of the term “horn effect” is unclear, perhaps because it is such an obvious extension of “the halo effect.” The earliest use we have been able to find is in a corporate training manual by Campbell and Knowlton (1956), but they do not make any claim to having originated it.

References

  1. Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723. [Google Scholar]
  2. Apaolaza, V. , Hartmann, P. , Echebarria, C. , & Barrutia, J. M. (2017). Organic label's halo effect on sensory and hedonic experience of wine: A pilot study. Journal of Sensory Studies, 32(1), e12243. 10.1111/joss.12243 [DOI] [Google Scholar]
  3. Bingham, W. V. (1939). Halo, invalid and valid. Journal of Applied Psychology, 23(2), 221–228. [Google Scholar]
  4. Bloor, D. (1983). Wittgenstein: A social theory of knowledge. New York: Columbia University Press. [Google Scholar]
  5. Campbell, R. , & Knowlton, E. R. (1956). Business leadership course for American Airlines: Program summary. Ithaca, NY: New York State School of Industrial and Labor Relations. [Google Scholar]
  6. Cao, S. , Tang, C. , Carboon, I. , Hayward, C. , Capes, H. , Chen, Y. J. M. , Brennan, E. , Dixon, H. , Wakefield, M. , & Haynes, A. (2023). The health halo effect of “low sugar” and related claims on alcoholic drinks: An online experiment with young women. Alcohol & Alcoholism, 58(1), 93–99. [DOI] [PubMed] [Google Scholar]
  7. Fisicaro, S. A. , & Lance, C. E. (1990). Implications of three causal models for the measurement of halo error. Applied Psychological Measurement, 14(4), 419–429. [Google Scholar]
  8. Fusaro, M. , Corriveau, K. H. , & Harris, P. L. (2011). The good, the strong, and the accurate: Preschoolers’ evaluations of informant attributes. Journal of Experimental Child Psychology, 110(4), 561–574. [DOI] [PubMed] [Google Scholar]
  9. Hartung, C. M. , Lefler, E. K. , Tempel, A. B. , Armendariz, M. L. , Sigel, B. A. , & Little, C. S. (2009). Halo effects in ratings of ADHD and ODD: Identification of susceptible symptoms. Journal of Psychopathology and Behavioral Assessment, 32(1), 128–137. [Google Scholar]
  10. Hempel, C. G. , & Oppenheim, P. (1948). Studies in the logic of explanation. Philosophy of Science, 15(2), 135–175. [Google Scholar]
  11. Hollis, G. , Westbury, C. , & Lefsrud, L. (2017). Extrapolating human judgments from skip‐gram vector representations of word meaning. Quarterly Journal of Experimental Psychology, 70(8), 1603–1619. [DOI] [PubMed] [Google Scholar]
  12. Iles, I. A. , Pearson, J. L. , Lindblom, E. , & Moran, M. B. (2021). “Tobacco and water”: Testing the health halo effect of natural American spirit cigarette ads and its relationship with perceived absolute harm and use intentions. Health Communication, 36(7), 804–815. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. Mikolov, T. , Chen, K. , Corrado, G. , & Dean, J. (2013). Efficient estimation of word representations in vector space. In ICLR. Retrieved from https://arxiv.org/abs/1301.3781
  14. Mikolov, T. , Sutskever, I. , Chen, K. , Corrado, G. , & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In NIPS.
  15. Mikolov, T. , Yih, W.‐T. , & Zweig, G. (2013). Linguistic regularities in continuous space word representations. HLT‐NAACL, 13, 746–751. [Google Scholar]
  16. Naquin, C. E. , & Tynan, R. O. (2003). The team halo effect: Why teams are not blamed for their failures. Journal of Applied Psychology, 88(2), 332–340. [DOI] [PubMed] [Google Scholar]
  17. Nisbett, R. E. , & Wilson, T. D. (1977). The halo effect: Evidence for unconscious alteration of judgments. Journal of Personality and Social Psychology, 35(4), 250–256. [Google Scholar]
  18. R Core Team . (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R‐project.org/. [Google Scholar]
  19. Remmers, H. H. (1934). Reliability and halo effect of high school and college students’ judgments of their teachers. Journal of Applied Psychology, 18(5), 619–630. [Google Scholar]
  20. Shaoul, C. , & Westbury, C. (2006). Word frequency effects in high‐dimensional co‐occurrence models: A new approach. Behavior Research Methods, 38(2), 190–195. [DOI] [PubMed] [Google Scholar]
  21. Stalnaker, J. M. , & Remmers, H. H. (1928). Can students discriminate traits associated with success in teaching? Journal of Applied Psychology, 12(6), 602–610. [Google Scholar]
  22. Starrak, J. A. (1934). Student rating of instruction. Journal of Higher Education, 5(2), 88–90. [Google Scholar]
  23. Teneva, E. V. (2020). The halo effect in the political discourse of the English‐language online media. Russian Linguistic Bulletin, 23(3), 106–109. [Google Scholar]
  24. Thorndike, E. (1920). A constant error in psychological ratings. Journal of Applied Psychology, 4(1), 25–29. [Google Scholar]
  25. Wells, F. L. (1907). A statistical study of literary merit; With remarks on some new phases of the method. Archives of Psychology, 7, 5–30. [Google Scholar]
  26. Westbury, C. (2016). Pay no attention to that man behind the curtain: Explaining semantics without semantics. Mental Lexicon, 11(3), 350–374. [Google Scholar]
  27. Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer‐Verlag New York. [Google Scholar]
  28. Wittgenstein, L. (1953). Philosophical investigations. Oxford: Blackwell Publishers Ltd. [Google Scholar]
  29. Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B), 73(1), 3–36. [Google Scholar]
  30. Zeigler‐Hill, V. , Besser, Y. , & Besser, A. (2019). A negative halo effect for stuttering? The consequences of stuttering for romantic desirability are mediated by perceptions of personality traits, self‐esteem, and intelligence. Self and Identity, 19(5), 613–628. [Google Scholar]

Articles from Cognitive Science are provided here courtesy of Wiley
