Author manuscript; available in PMC: 2023 Jul 31.
Published in final edited form as: J Couns Psychol. 2018 Oct 4;66(2):224–233. doi: 10.1037/cou0000312

The Development and Evaluation of a Brief Form of the Normative Male Alexithymia Scale (NMAS-BF)

Ronald F Levant 1, Mike C Parent 2
PMCID: PMC10388695  NIHMSID: NIHMS1913815  PMID: 30284847

Abstract

The current study extended prior work on the Normative Male Alexithymia Scale (NMAS), a unidimensional measure of some men’s limitations in expressing emotion that results from gender-based socialization informed by the masculine norm of restrictive emotionality (RE). Data (N = 505 men) were from Amazon Mechanical Turk participants. First, dimensionality was reassessed using exploratory factor analysis, which supported the unidimensional structure. Second, based on these results, three 6-item models of the NMAS-Brief Form (NMAS-BF) were developed, based on classical test theory (CTT), CTT optimized to avoid item redundancy, and item response theory (IRT). Third, the relative fits of these versions were assessed using confirmatory factor analysis on a separate part of the sample, finding that the IRT version was the best fitting model. Fourth, evidence for reliability for the NMAS-BF items (α = .80) and validity was found. Convergent evidence for validity was supported by a significant, moderate, positive correlation between the latent constructs of the NMAS-BF and Toronto Alexithymia Scale-20 (TAS-20), which measures clinical alexithymia. Concurrent evidence for validity of the latent factor of the NMAS-BF was assessed in a structural regression model which found that the NMAS-BF uniquely predicted RE scores when TAS-20 scores were included in the model. Finally, incremental evidence for validity was examined using hierarchical multiple regression, finding that NMAS-BF scores significantly predicted variance in RE scores above and beyond that predicted by TAS-20 scores. The results are discussed in relation to prior literature, future research directions, applications to counseling practice, and limitations.

Keywords: alexithymia, Normative Male Alexithymia Scale, item response theory, structural equation modeling, incremental validity


In the past 30 years, counseling psychologists have made significant advancements in the measurement of masculinity-related constructs, such as gender role conflict (O’Neil, 2008), conformity to masculine norms (Mahalik et al., 2003), and masculinity ideology (Levant, Hall, & Rankin, 2013; Levant, Hall, Weigold, & McCurdy, 2016). The present paper focuses on one such construct—normative male alexithymia (NMA) and undertakes the development of a brief form of an extant scale designed to assess this construct with improved psychometric properties—the Normative Male Alexithymia Scale-Brief Form (NMAS-BF).

Alexithymia literally means “without words for emotions.” Sifneos (1967) originally coined the term to describe the impediments that certain psychiatric patients had in identifying and describing their feelings and related phenomena. Mild-to-moderate alexithymia symptoms were later observed in nonclinical populations, specifically participants in a fatherhood education course (Levant, 1992). Levant (1992) formulated the “NMA” hypothesis to account for these men’s observed limitations in identifying, describing, and, most particularly, expressing emotions. Levant posited that NMA was due to gender-based socialization practices influenced by the traditional masculine norm of restrictive emotionality (RE). RE discourages boys from showing vulnerability (i.e., boys do not cry or show fear) or their need for/attachment to other people (boys stand on their own two feet and do not need anyone). As a result of such childhood socialization experiences boys are discouraged from expressing and talking about their vulnerable and attachment emotions, and hence do not develop a vocabulary for, or awareness of, many of their emotions. Levant noted that men experiencing such gender-linked, normative, mild-to-moderate alexithymia did not display the severe symptoms associated with clinical alexithymia, such as a wooden facial expression, an inability to recognize even the physiological components of emotions, and a pensee operatoire cognitive style that focuses on the external details of everyday life. Thus such men would likely score in the nonalexithymic range on instruments designed to assess clinical alexithymia, such as the Toronto Alexithymia Scale (TAS-20; Bagby, Parker, & Taylor, 1994).

NMA is negatively correlated with relationship satisfaction and communication quality and positively correlated with fear of intimacy in men in heterosexual relationships (Karakis & Levant, 2012). Furthermore, as Levant (2001) theorized, NMA blocks men who suffer from it from utilizing the most effective means known for dealing with life’s stresses and traumas—namely, identifying, thinking about, and discussing one’s emotional responses to a stressor or trauma with a friend, family member, or counselor. Consequently, it predisposes such men to deal with stress in ways that make certain forms of pathology more likely, such as substance abuse, violent behavior, sexual compulsions, and stress-related illnesses. It also makes it less likely that such men will be able to benefit from counseling, which requires active engagement with one’s emotions.

Levant’s observations were consistent with a central tenet of the gender role strain paradigm (GRSP; Pleck, 1981, 1995), namely, that societal forces differentially shape men according to the degree to which they have been reared as boys to adhere to the norms of traditional masculinity. Levant (1992, 1995, 2001) drew on the GRSP to theorize that mild-to-moderate forms of alexithymia would occur more frequently among men whose socialization as boys was informed to greater degrees by traditional masculinity ideology (TMI). Levant’s (2001) review of relevant developmental psychology research literature on the emotion socialization of boys concluded that the evidence supported the view that boys are socialized to avoid the expression of vulnerable and caring emotions, whereas girls are encouraged to be expressive, which he theorized was likely to produce gender differences in alexithymia.

To assess the extent of these gender differences in alexithymia, Levant et al. (2006) reviewed 45 published studies which examined such gender differences. The investigators noted that few of the 13 studies using clinical samples found gender differences, which made sense because psychological disorders often impact the expression of emotion in people of all genders. However, the 32 studies using nonclinical samples presented a different picture: 17 of these studies found males more alexithymic than females, one found females more alexithymic than males, and 14 found no differences between males and females. The alexithymia literature was next meta-analyzed to further assess the extent of these gender differences (Levant, Hall, Williams, & Hasan, 2009). An effect size estimate based on 41 existing samples found consistent, although expectedly small, differences in mean alexithymia between women and men (Hedges’ d = .22). Men exhibited higher levels of alexithymia. There were no significant moderator effects for clinical versus nonclinical populations or alexithymia measure used, although there were relatively few clinical samples and most studies used the Toronto Alexithymia Scale-20 (TAS-20).

Emotional expression varies with culture. Several studies have examined the relationship between alexithymia and TMI, with a focus on the cultural dimensions of race and ethnicity. Using a large racially/ethnically diverse sample (40.7% Latino/a, 35% White, and 24.3% Black), Levant et al. (2003) found a relationship between TMI and alexithymia in men across these races/ethnicities. After controlling for demographic variables, TMI accounted for unique variance in alexithymia in men. A later analysis of the same diverse sample examined the role of race and gender as moderators of the relationship between TMI and alexithymia (Levant & Wong, 2013). While neither race nor gender moderated the relationship between these two variables, the moderating effect of race on the relationship between TMI and alexithymia was strongly affected by gender: TMI was more strongly related to alexithymia for White men than for racial minority men, whereas TMI was more strongly related to alexithymia for racial minority women than for White women. Finally, Levant, Wong, Karakis, and Welsh (2015) assessed a mediated moderation model of the relationship between RE and alexithymia in men. Conformity to the masculine norm of emotional control mediated the positive relationship between RE and alexithymia. In addition, the positive relationship between RE and alexithymia was stronger for Latino men versus men from other racial groups, but weaker for Asian American men versus men from other racial groups. Finally, the RE by race (Latinos vs. others) moderation effect on alexithymia was mediated through its association with emotional control, providing support for a mediated moderation effect.

To assess some men’s socialized limitations in emotional expression, Levant et al. (2006) developed the Normative Male Alexithymia Scale (NMAS). Exploratory and confirmatory factor analyses using separate samples indicated that the NMAS consisted of a single 20-item factor. Men’s scores on the NMAS displayed very good internal consistency (α = .92) and test–retest reliability (r = .91) over a 1–2 month period. Results of analyses of gender differences, relations of the NMAS with other instruments, and its incremental validity in predicting masculinity ideology provided evidence supporting the validity of the scale. However, the incremental fit indices did not support the factor structure of the original NMAS using contemporary standards (i.e., Tucker-Lewis index [TLI] = .85, comparative fit index [CFI] = .87). Although the root-mean-square error of approximation (RMSEA = .08) met the criterion of ≤.08, Gignac, Palmer, and Stough (2007) pointed out that such absolute fit indices “may erroneously suggest satisfactory levels of model fit simply because the items are only weakly correlated” (p. 248). Hence there is a need for a measure with acceptable fit statistics. Furthermore, the 20-item NMAS is long for a unidimensional scale, potentially creating participant fatigue as the NMAS is likely to be one of several assessments used in batteries related to masculine socialization and emotional expression.

The Present Study

The present study was designed to extend prior work on the NMAS. There were four objectives. The first aim was to conduct another exploratory factor analysis (EFA) of the NMAS. There were several reasons for undertaking this EFA: It has been over 10 years since the first one was conducted, the initial EFA was conducted with college students (Levant et al., 2006) whereas the present sample includes a wider range of ages in participants collected online, and the new EFA results will be used to develop a brief form of the NMAS (NMAS-BF). Based on prior literature, Hypothesis 1 (H1) is advanced, that evidence will be found for one-factor dimensionality of the NMAS. The second aim was to develop candidate models of the NMAS-BF and to compare them using confirmatory factor analysis (CFA). Item selection in scale development is often guided by classical test theory (CTT), in which the highest-loading items from an EFA are chosen to compose the final scale, and one of the candidate models will be developed this way (i.e., the CTT model). However, several NMAS items have similar content. Such content overlap may result in strong correlations among these items and potentially suboptimal content overlap among the highest-loading items (i.e., that the items selected for the NMAS-BF, based solely on factor loading strength, could be redundant or assess a limited range of the construct of interest). Based on this anticipated challenge, we decided a priori that we would also select items based on a combination of item loading and an examination of content to reduce redundancy (i.e., an optimized CTT model). Finally, item response theory (IRT) has more recently been advocated as superior to CTT for item selection (DeVellis, 2016; Mallinckrodt, Miles, & Recabarren, 2016). In using CTT for selection of items on a short form, one would select the highest-loading items for retention in the short form of the measure. 
In contrast, IRT, as applied to the development of measures such as the NMAS-BF, emphasizes item selection based on dispersing items across degrees of “difficulty” (for a Likert-type scale, difficulty reflects whether an item tends to be endorsed or not endorsed). Paradoxically, because items selected via IRT are likely to assess a wider range of the construct of interest than those selected by CTT, the items are likely to be less correlated and the model fit for an IRT-derived model would be superior to that of a CTT-derived model while assessing a broader range of the construct than the CTT-derived items. The objective was thus to generate three candidate models of the NMAS-BF: one based on CTT (Model 1), one based on CTT optimized for item diversity (Model 2), and one based on IRT (Model 3). We hypothesized (H2) that the IRT model would be superior to the other two models.

The third aim was to evaluate the convergent, concurrent, and incremental evidence for the validity of the NMAS-BF using latent variables. The use of latent instead of manifest variables to assess validity is important because prior research has found that many significant correlations calculated from raw scores were not significant when using latent variables (Levant, Alto, McKelvey, Richmond, & McDermott, 2017; Levant et al., 2016). Convergent evidence for validity was assessed by examining the correlations between the latent variables of normative male alexithymia (using the NMAS) and alexithymia (using the TAS-20). These two constructs overlap in terms of difficulties in identifying and describing feelings, but do not overlap on the more severe aspects of alexithymia (i.e., externally oriented thinking), on the one hand, nor on the socialized process of restricting the expression of vulnerable and caring emotions, on the other hand. Structural and hierarchical regression was used to evaluate the concurrent (unique) and incremental evidence for validity, respectively, of the latent factor of the NMAS-BF by examining relationships with alexithymia (TAS-20) and the RE norm of TMI (RE). If we find that the NMAS-BF explains both unique and incremental variance in RE when the TAS-20 is in the model that would suggest that the NMAS-BF may be tapping a form of alexithymia that is more directly related to men’s gender role socialization than the TAS-20.

For this objective the following hypotheses were advanced. Hypothesis 3 (H3): convergent evidence for validity would be supported by finding a significant, moderate-to-strong, positive correlation between the latent constructs of normative male alexithymia and alexithymia. Hypothesis 4 (H4): concurrent evidence for validity would be demonstrated by latent NMAS-BF scores uniquely predicting latent RE scores when latent alexithymia scores are included in the model. Hypothesis 5 (H5): incremental evidence would be found for validity by NMAS-BF scores significantly predicting variance in RE scores above and beyond that predicted by alexithymia scores.

Method

Participants

The present study uses data from a larger project, from which no publications have yet occurred. A total of 505 men were included in the data analysis. Participants ranged in age from 19 to 73 years, with a mean of 35.28 (SD = 11.08, median = 33, mode = 26). In regard to race/ethnicity, a majority of participants who responded to this question identified as White (373, 73.9%), and 57 (11.3%) identified as Asian or Asian American, 26 (5.1%) as Black, 24 (4.8%) as multiracial, 16 (3.2%) as Hispanic, and five (1.0%) as American Indian; four (0.8% of the total sample) did not respond to this question. Regarding sexual orientation identity, most (456, 90.3%) participants reported their sexual orientation as heterosexual, although 21 (4.2%) indicated they were bisexual, 18 (3.6%) indicated they were gay, and five (1.0%) indicated a different identity. Five participants (1.0%) did not respond to this question.

Recruitment and Survey Procedures

The study was approved by the university institutional review board. Community-dwelling participants were recruited using Amazon’s Mechanical Turk (MTurk) service. Data obtained from MTurk have been demonstrated to be valid and reliable when appropriate selection criteria and attention checks are used (Casler, Bickel, & Hackett, 2013; Peer, Vosgerau, & Acquisti, 2014), as was the case in this study. All participants were provided with a link to a Qualtrics website, which hosted the study. After completing the informed consent page, participants filled out the questionnaires and were provided with an educational debriefing. The survey contained two validity check items (e.g., “Please check strongly agree”); 30 participants were removed from the data set for failing to respond correctly to a validity check item and are not included in any analyses. Following completion of the study, credit was granted through an automated link between the Qualtrics survey and MTurk.

Sample Size Considerations

For the EFA, we used the MacCallum, Browne, and Sugawara (1996, Table 4) criteria. With 20 observed variables, there were 210 degrees of freedom, indicating that the minimum N is <178. We randomly selected 247 cases from the full data set of 505, which is more than adequate. For the CFA, we used the remaining 258 cases, and for the validity analyses we used the full data set of 505. Kline (2016) recommended a minimum of 10 participants for every freely estimated parameter. The CFAs had 18 parameters, requiring 180 participants. Our N of 258 exceeds this number. The validity analysis using structural regression had 36 parameters, requiring 360 participants. Our N of 505 exceeds this number.
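The 10-cases-per-parameter rule described above reduces to simple arithmetic. The following is a minimal sketch of that check; the function names are hypothetical, and only the parameter counts and sample sizes come from the text.

```python
def minimum_n(free_parameters: int, cases_per_parameter: int = 10) -> int:
    """Minimum sample size under an N-per-free-parameter rule of thumb."""
    return free_parameters * cases_per_parameter


def is_adequate(sample_size: int, free_parameters: int) -> bool:
    """True when the sample meets or exceeds the rule-of-thumb minimum."""
    return sample_size >= minimum_n(free_parameters)


# Values reported in the text: CFA (18 parameters, N = 258) and
# structural regression (36 parameters, N = 505).
cfa_ok = is_adequate(258, 18)
structural_ok = is_adequate(505, 36)
```

Both checks pass for the sample sizes reported, and the EFA and CFA subsamples (247 and 258) sum to the full sample of 505.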

Table 4.

Raw Score Scale Intercorrelations, Alpha Coefficients, Means, and Standard Deviations

| Scale | 2 | 3 | α | Mean | SD |
|---|---|---|---|---|---|
| 1. NMAS-BF | .49** | .28** | .80 | 4.04 | 1.32 |
| 2. TAS-20 | | .30** | .89 | 2.39 | .62 |
| 3. RE | | | .85 | 2.60 | 1.32 |

Note. Scores for the NMAS-BF and RE range from 1 to 7, with higher scores indicating greater normative male alexithymia, and greater endorsement of the masculine norm of restrictive emotionality. Scores for the TAS-20 range from 1 to 5, with higher scores indicating greater alexithymia. NMAS-BF = Normative Male Alexithymia Scale-Brief Form; TAS-20 = Toronto Alexithymia Scale-20; RE = Restrictive Emotionality subscale of the Male Role Norms Inventory-Short Form.

** p < .01.
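The logic behind the incremental-validity hypothesis (H5) can be illustrated with the three raw-score correlations in Table 4. The sketch below uses the standard two-predictor formula for R² based on correlations. It is an illustration computed from rounded table values, not the hierarchical regression reported by the authors, and the function name is hypothetical.

```python
def r_squared_two_predictors(r_y1: float, r_y2: float, r_12: float) -> float:
    """R-squared for a criterion regressed on two predictors, from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Raw-score correlations from Table 4 (rounded to two decimals in the source).
r_nmas_tas = 0.49  # NMAS-BF with TAS-20
r_nmas_re = 0.28   # NMAS-BF with RE
r_tas_re = 0.30    # TAS-20 with RE

# Step 1: RE predicted by TAS-20 alone.
step1_r2 = r_tas_re ** 2
# Step 2: RE predicted by TAS-20 and NMAS-BF together.
step2_r2 = r_squared_two_predictors(r_tas_re, r_nmas_re, r_nmas_tas)
# Increment in explained variance attributable to adding NMAS-BF.
delta_r2 = step2_r2 - step1_r2
```

With these rounded inputs the increment is roughly .02. Whether an increment of that size is statistically significant depends on the F test for the change in R², which this sketch does not compute.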

Measures

Demographic Questionnaire.

This questionnaire inquired about gender, age, race/ethnicity, and sexual orientation.

NMAS.

The NMAS (Levant et al., 2006) is a 20-item inventory designed to assess normative male alexithymia. Participants answered questions about their own experience of emotions on a 7-point scale (1 = strongly disagree; 7 = strongly agree), with higher scores indicating greater normative male alexithymia. A sample item is “It is difficult for me to reveal my innermost feelings, even to close friends.” Seven items are reverse scored. NMAS scores were derived by taking a mean of the individual item scores, after recoding the reverse-scored items. The scale was constructed using two samples of mostly White university students (sample 1 = 248 men; sample 2 = 407 men and women). EFAs and CFAs indicated that the NMAS consisted of a single 20-item factor. As discussed above, scores on the NMAS displayed evidence of internal consistency, test–retest reliability, and validity, but did not have adequate fit statistics in the CFA.
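The scoring rule described above (recode reverse-scored items, then average) can be sketched as follows. The reverse-keyed item numbers are taken from the "R" markers in Table 1. The function names and the dictionary layout for responses are assumptions made for illustration.

```python
from typing import Mapping, Set

# Item numbers marked "R" in Table 1 (seven reverse-scored NMAS items).
NMAS_REVERSE_ITEMS: Set[int] = {2, 5, 6, 7, 8, 18, 19}


def reverse_score(value: int, low: int = 1, high: int = 7) -> int:
    """Flip a response on a low..high scale (1 becomes 7, 7 becomes 1)."""
    return low + high - value


def scale_mean(responses: Mapping[int, int],
               reverse_items: Set[int] = NMAS_REVERSE_ITEMS,
               low: int = 1, high: int = 7) -> float:
    """Mean item score after recoding the reverse-scored items.

    `responses` maps item number to the raw response on the low..high scale.
    """
    recoded = [
        reverse_score(value, low, high) if item in reverse_items else value
        for item, value in responses.items()
    ]
    return sum(recoded) / len(recoded)
```

For example, `scale_mean({1: 7, 2: 1})` treats item 2 as reverse scored, so both responses count as 7. The same helper would apply to the 5-point TAS-20 by passing `high=5` and that scale's own reverse-keyed item set, which is not listed in this section.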

TAS-20.

The TAS-20 (Bagby et al., 1994) is the most widely used measure of alexithymia, a construct referring to a cluster of characteristics including difficulty identifying and describing feelings, and externally oriented thinking. Participants rated their agreement with 20 statements on a 5-point scale (1 = strongly disagree; 5 = strongly agree), with higher scores indicating greater alexithymia. A sample item is “I am often confused about what emotion I am feeling.” Five items are reverse-scored. TAS-20 total scores were derived by taking a mean of the individual item scores after recoding the reverse-scored items. The TAS-20 was developed using a derivation sample of 965 university students, both men and women, to conduct an EFA, and was confirmed with two samples of men and women: 401 university students and 218 psychiatric outpatients. The scale developers reported total scale coefficient αs from .80 to .83 in the three different samples. Convergent validity has been demonstrated by negative associations with closely related constructs such as psychological mindedness, need-for-cognition, affective orientation, and emotional intelligence (see Taylor, 2004, for a summary of research using a broad array of student, community, and clinical samples).

RE subscale of the Male Role Norms Inventory—Short Form (MRNI-SF).

RE is one of seven three-item subscales of the MRNI-SF (Levant et al., 2013), which measures the endorsement of TMI. The scale was developed using data from 1,017 university men and women, who were mostly White and heterosexual. It was subsequently used in two samples diverse in terms of race/ethnicity and sexual orientation (Levant et al., 2015; McDermott et al., 2017). Participants responded on a 7-point scale (1 = strongly disagree; 7 = strongly agree). RE scores are derived by taking a mean of the individual item scores, with higher scores indicating greater endorsement of the RE norm. A sample item is: “Men should be detached in emotionally charged situations.” No items are reverse scored. Levant et al. (2016) reported an alpha coefficient of .82. Using structural equation modeling (SEM), RE showed significant correlations with a latent NMAS factor, the Conformity to Masculine Norms Inventory-46 Emotional Control specific factor, and Gender Role Conflict Scale-SF RE first-order factor, providing concurrent evidence for validity (Levant et al., 2016).

Data Analytic Procedures

Overview.

EFA of the 20 NMAS items was conducted using principal axis factoring to assess the dimensionality of the scale. We then generated the three candidate models of the NMAS-BF defined above. We planned a priori to generate six-item versions of the NMAS-BF to accomplish two goals. First, we used a multiple of 3 because construction of latent variables in SEM requires use of at least three manifest variables to indicate a latent factor without causing local identification problems (Little, Cunningham, Shahar, & Widaman, 2002). Use of a model with a number of items that is a multiple of 3 also allows for easy construction of balanced item parcels and would be useful to future applications of the NMAS-BF. Second, we intended for the NMAS-BF to be a brief scale because the construct assessed would likely be only one of several assessment instruments included in future studies, and brevity would minimize participant burden. Finally, the candidate models of the NMAS-BF were compared in a separate sample using CFA.

Statistical analyses.

The EFAs, descriptive statistics, and multiple regressions were calculated using SPSS 25. The IRT analysis was conducted using Winsteps (Version 3.92; Linacre, 2016). For the CFAs and testing of hypotheses H2 through H4, Mplus (Version 8; Muthén & Muthén, 1998–2017) was used. The overall fit of all CFA models was assessed with the scaled chi-square goodness-of-fit test. However, because this statistic is dependent on sample size, it is overly sensitive to trivial sources of model misfit when sample sizes are large, as in the current study (Cheung & Rensvold, 2002). Thus, a set of alternative fit indices was consulted to determine whether a model demonstrated adequate fit (Kahn, 2006). These indices and the criteria used to assess their values (Kline, 2016) were: (a) CFI and (b) TLI, for both of which values of ≥.90 indicate reasonable fit and values of ≥.95 indicate good fit; (c) RMSEA, where good model fit is suggested by values of .05 or lower and values between .05 and .08 suggest reasonable fit; and (d) standardized root-mean-square residual (SRMR), for which values of .05 or lower indicate good model fit, and values of less than .10 are considered acceptable.
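The fit criteria listed above can be written as a small lookup, which makes the cutoffs easy to apply consistently across candidate models. This is a sketch only: the function names and the label "outside guidelines" are assumptions, and the example values are the Model 1 and Model 3 indices reported in the Results.

```python
from typing import Dict


def rate_incremental_index(value: float) -> str:
    """CFI or TLI: >= .95 is good, >= .90 is reasonable."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "reasonable"
    return "outside guidelines"


def rate_rmsea(value: float) -> str:
    """RMSEA: <= .05 is good, above .05 up to .08 is reasonable."""
    if value <= 0.05:
        return "good"
    if value <= 0.08:
        return "reasonable"
    return "outside guidelines"


def rate_srmr(value: float) -> str:
    """SRMR: <= .05 is good, below .10 is acceptable."""
    if value <= 0.05:
        return "good"
    if value < 0.10:
        return "acceptable"
    return "outside guidelines"


def rate_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> Dict[str, str]:
    """Apply all four criteria and return one label per index."""
    return {
        "CFI": rate_incremental_index(cfi),
        "TLI": rate_incremental_index(tli),
        "RMSEA": rate_rmsea(rmsea),
        "SRMR": rate_srmr(srmr),
    }


# Indices reported in the Results for Model 1 (CTT) and Model 3 (IRT).
model_1 = rate_fit(cfi=0.915, tli=0.858, rmsea=0.192, srmr=0.041)
model_3 = rate_fit(cfi=0.978, tli=0.963, rmsea=0.063, srmr=0.030)
```

The labels returned for both models agree with the verbal interpretation given in the Results: Model 1 meets the guidelines only on CFI and SRMR, and Model 3 is good on every index except RMSEA, which is reasonable.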

For the concurrent validity hypothesis (H3) the recommendations of Russell, Kahn, Spoth, and Altmaier (1998) and Kline (2016) were followed, and three to four item parcels were created from the manifest variables for the TAS-20 and the NMAS-BF, the only instruments that had six or more observed items. Item parcels were created by performing principal axis EFAs with one-factor solutions for the items comprising each scale. Iterative assignment of items into each one of the parcels was done to ensure that parcel loadings were balanced (Russell et al., 1998). For the three-item RE subscale of the MRNI-SF, the observed items were used to assess the latent factor.

Results

Missing Data, Outliers, and Normality

Thirty participants were removed from the data set for failing to correctly respond to the validity check item. Among the remaining participants, a low level of missing data was observed at the item level (0.2% to 0.6% missing responses per item). The average percentage of missing responses per participant over all items was less than 1%, which is below Parent’s (2013) cutoff of 10% for using available item analysis. In addition, there were no other major complicating concerns (e.g., low sample size, poor internal reliability of scales). Thus we proceeded to analyze the data as recommended by Parent (2013), following the simplest path: no missing values were imputed; rather, all available responses for each item were used in the analysis.
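Available item analysis, as described above, scores each participant from whatever items they answered rather than imputing values. A minimal sketch follows, assuming missing responses are represented as `None` and that responses are already recoded. The function names are hypothetical.

```python
from typing import List, Optional, Sequence

Responses = Sequence[Optional[float]]


def missing_rate(responses: Responses) -> float:
    """Proportion of items a participant left unanswered."""
    return sum(r is None for r in responses) / len(responses)


def mean_missing_rate(dataset: Sequence[Responses]) -> float:
    """Average per-participant missing rate across the whole data set."""
    rates: List[float] = [missing_rate(row) for row in dataset]
    return sum(rates) / len(rates)


def available_item_mean(responses: Responses) -> Optional[float]:
    """Scale score from answered items only; None if nothing was answered."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return sum(answered) / len(answered)


# Hypothetical responses for two participants on a three-item scale.
toy_data = [[5, None, 6], [2, 3, 4]]
overall_missing = mean_missing_rate(toy_data)
scores = [available_item_mean(row) for row in toy_data]
```

In practice `mean_missing_rate` would be compared with the 10% cutoff cited above before relying on this approach. The toy data here are far above that level and serve only to show the mechanics.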

The three scales (NMAS, TAS-20, and RE) evidenced no univariate outliers (i.e., |z| scores > 3.29). However, the same scales yielded eight multivariate outliers (1.58% of the sample), as identified by Mahalanobis distance procedures. Given the relatively small percentage of outliers and the fact that no outliers were extreme in magnitude, we followed recommendations from Myers, Gamst, and Guarino (2013) and did not delete or modify outlier cases. The data were only mildly nonnormally distributed, with values of skew ranging from −0.42 to 1.28 and values of kurtosis ranging from −1.08 to 0.61.
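The univariate screen mentioned above flags any scale score whose standardized value exceeds 3.29 in absolute terms. The sketch below covers only that univariate step, not the Mahalanobis procedure. The data are synthetic, and the function name is hypothetical.

```python
import statistics
from typing import List, Sequence


def univariate_outliers(scores: Sequence[float], cutoff: float = 3.29) -> List[int]:
    """Return the positions of scores whose |z| exceeds the cutoff.

    z scores use the sample mean and the sample standard deviation (n - 1).
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [i for i, x in enumerate(scores) if abs((x - mean) / sd) > cutoff]


# Synthetic scale scores: 200 values near the midpoint plus one extreme case.
typical = [4.0, 4.2, 3.8, 4.1, 3.9] * 40
with_extreme = typical + [7.0]

flagged_typical = univariate_outliers(typical)
flagged_extreme = univariate_outliers(with_extreme)
```

Only the appended extreme value is flagged in the second list. With real scale scores, an empty result corresponds to the "no univariate outliers" finding reported above.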

EFAs of Responses to the NMAS

Prior to conducting the EFAs, the suitability of the data for factor analysis was assessed. The Kaiser-Meyer-Olkin value was .93, which exceeds the suggested value of .60 (Kaiser, 1974). Bartlett’s test of sphericity (Bartlett, 1954) was statistically significant, further supporting the factorability of the correlation matrix. First, Kahn’s (2006) recommendation to use parallel analysis (cf. Hayton, Allen, & Scarpello, 2004) was followed. The analysis indicated that a two-factor structure best represented the dimensionality of the data. To determine which items loaded on the factors, we set the minimum allowable loading at .35 (Tabachnick & Fidell, 2007). Five items that loaded on the second factor did not meet this criterion, which resulted in their removal. We followed Tabachnick and Fidell’s (2007) criterion on cross-loading, namely that items that load .32 or greater on a second factor should be removed, which resulted in the removal of the remaining five items on the second factor, and thus of the second factor. Second, Kaiser’s criterion that factors with eigenvalues greater than 1 should be retained was followed, which resulted in a four-factor solution, from which 10 items on the second, third, and fourth factors were deleted because of cross-loading with the first factor, resulting in the loss of those factors. Finally, based on prior research (Levant et al., 2006) and the scree plot,1 the data were analyzed for a one-factor solution, in which the resulting factor loadings ranged from .40 to .85. Thus a one-factor solution was used to create the three candidate models, as shown in Table 1. H1 had specified one-factor dimensionality and thus was supported.
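The two loading rules cited above (a minimum loading of .35 and a cross-loading threshold of .32) can be expressed as a generic screening function. This is a sketch of the rules themselves, not a reproduction of how the authors applied them to the second factor. The item labels and loadings below are hypothetical.

```python
from typing import Dict, Sequence


def screen_items(loadings: Dict[str, Sequence[float]],
                 min_loading: float = 0.35,
                 cross_loading: float = 0.32) -> Dict[str, str]:
    """Label each item by its largest and second-largest absolute loadings."""
    decisions: Dict[str, str] = {}
    for item, values in loadings.items():
        ranked = sorted((abs(v) for v in values), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        if primary < min_loading:
            decisions[item] = "remove: below minimum loading"
        elif secondary >= cross_loading:
            decisions[item] = "remove: cross-loading"
        else:
            decisions[item] = "retain"
    return decisions


# Hypothetical two-factor loadings (factor 1, factor 2) for three items.
example_loadings = {
    "item_a": (0.72, 0.10),  # clean primary loading
    "item_b": (0.30, 0.12),  # loads below .35 everywhere
    "item_c": (0.55, 0.40),  # loads .32 or higher on a second factor
}
decisions = screen_items(example_loadings)
```

Here `item_a` is retained, `item_b` fails the minimum-loading rule, and `item_c` is flagged for cross-loading.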

Table 1.

Factor Loadings for the Initial EFA

| Item | Factor loading | Model 1 | Model 2 | Model 3 |
|---|---|---|---|---|
| 14. I have difficulty expressing my innermost feelings. | .85 | × | × | |
| 20. I don’t like to talk with others about my feelings. | .84 | × | | × |
| 9. It is difficult for me to reveal my innermost feelings, even to close friends. | .76 | × | | × |
| 13. I have difficulty expressing my emotional needs to my romantic partner, spouse, or best friend. | .76 | × | | × |
| 17. It is too risky to express my emotions to other people. | .75 | × | × | |
| 16. I do not like to show my emotions to other people. | .74 | × | × | |
| 2R. I feel comfortable expressing my affection to family members and friends. | .55 | | | × |
| 12. I have difficulty telling others that I care about them. | .69 | | × | |
| 7R. When someone close to me hurts my feelings, I am able to tell them that I am hurt. | .50 | | | × |
| 10. If someone asks how I am feeling, I typically say what I am not feeling (e.g., “not too bad”). | .40 | | | × |
| 15. Talking about my feelings during sexual relations is difficult for me. | .70 | | × | |
| 6R. I have no trouble putting my feelings into words and discussing them with others. | .63 | | × | |
| 8R. I enjoy discussing my innermost feelings with my romantic partner, spouse, or best friend. | .62 | | | |
| 11. I don’t see much value in talking about feelings. | .60 | | | |
| 3. It does not usually occur to me to deal with my stress by talking about what is bothering me. | .58 | | | |
| 1. If I am upset or worried I don’t like to show it for fear that I will be seen as weak. | .56 | | | |
| 5R. When asked, I can easily give an account of what I am feeling. | .56 | | | |
| 19R. I like my feelings. | .52 | | | |
| 18R. I am comfortable telling someone that I am afraid of something. | .44 | | | |
| 4. I find it is very hard to cry. | .40 | | | |

Note. Model 1 is the model obtained via selection of the highest-loading items from the exploratory factor analysis (EFA). Model 2 is the model obtained by using EFA factor loadings and reducing redundancy. Model 3 is the model obtained using item response theory (IRT) to guide item selection. R = reverse scored.

CFA Comparisons of Candidate Models of the NMAS-BF

The basic CFA model tested was a unidimensional model fit to the second data set of 258 participants, in which responses to the six-item NMAS-BF were used as indicators of the hypothesized latent factor. The purpose was to compare the fits of the three candidate models. Model 1, based on CTT, consisted of the six highest loading items, whose factor loadings ranged from.74 to .85. The chi-square goodness of fit statistic was statistically significant, χ2(9) = 92.34; p < .001, indicating some sources of misfit. Thus the remaining indices were consulted, and while SRMR was within the guidelines for good fit and CFI was within the guidelines for reasonable fit, TLI and RMSEA were not, indicating marginal to poor fit: CFI = .915; TLI = .858; RMSEA = .192 (90% CI [.157, .228]); SRMR = .041. Model 2 was based on a combination of item loading and content, and three high loading items that overlapped with the content of other items were replaced by the next higher loading items. Deleted Item 9 overlapped with retained Item 14, both referencing innermost feelings; deleted Item 20 referenced expressing feelings overlapping with all of the items in the CTT model, so it was replaced with an item tapping difficulty describing feelings (Item 6R); and retained item 15 tapped the sexuality question more directly than deleted Item 13. The chi-square goodness of fit statistic was also statistically significant, χ2(9) = 87.60; p < .001, indicating that the null hypothesis of perfect fit should be rejected. Thus, the remaining indices were consulted, and while SRMR was within the guidelines for good fit and CFI was within the guidelines for reasonable fit, TLI and RMSEA were not, indicating marginal-to-poor fit: CFI = .908; TLI = .847; RMSEA = .186 (90% CI [.152, .223]); SRMR = .049. Finally, Model 3 was based on IRT. All 20 items were entered into the IRT analysis in Winsteps (Table 2). First, we examined infit and outfit values. 
Infit and outfit values under .60 or over 1.40 indicate that the patterns of responses were unexpected (i.e., response patterns appear to be influenced by factors other than the construct underlying the measure; Mallinckrodt et al., 2016). Two items were removed from consideration because responses violated accepted infit/outfit criteria. Second, we assessed item difficulty. For scale responses, difficulty reflects whether an item tends to be endorsed or not (i.e., “easy” items are endorsed by more people while “difficult” items are endorsed by fewer). Items were selected for inclusion by dispersing them based on their difficulty scores so as to capture a range of assessments of the construct of NMA. The items selected, along with threshold values, are presented in Table 3. No items selected to compose the NMAS-BF demonstrated disordered thresholds. For Model 3, the chi-square goodness-of-fit statistic was statistically significant at only the .05 level, χ2(9) = 18.05; p = .035. Most of the remaining indices were within the guidelines for good fit or (in the case of RMSEA) reasonable fit, indicating decent fit: CFI = .978; TLI = .963; RMSEA = .063 (90% CI [.016, .105]); SRMR = .030. We also determined the test information functions for the three competing models. Test information functions are summaries of item information functions, which are curves that indicate the sensitivity of an item across the range of values on the underlying construct. In the case of the NMAS-BF, which is defined as a unidimensional assessment, the ideal test information function would have broad sides, a high peak, and a smooth bell shape to indicate broad assessment of the underlying construct, maximum contributions of the items to identifying the underlying construct, and uniformity in item difficulty, respectively. 
In addition to the superior fit indices, the IRT model demonstrated a superior test information function curve as the test information function was higher and smoother (Mallinckrodt et al., 2016; Figure 1). Hence Model 3 was the preferred model, and H2 was supported.
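
As a rough illustration of how the RMSEA values above relate to the reported chi-square statistics, the following sketch applies one common textbook formula, RMSEA = sqrt(max(χ2 − df, 0) / (df × (N − 1))). The function name `rmsea` is hypothetical, and both the N − 1 form and the use of N = 258 are assumptions; the estimator and effective sample size used by the software can shift the third decimal, so the output should be close to, but not necessarily identical to, the reported values.

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from a model chi-square (textbook N - 1 form)."""
    # A negative (chi2 - df) is truncated at zero, so the estimate never goes below 0.
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Chi-square values reported for the three candidate models (df = 9 each).
# n = 258 is the size of the CFA subsample; treating it as the effective N is an assumption.
models = {
    "Model 1 (CTT)": 92.34,
    "Model 2 (CTT, reduced redundancy)": 87.60,
    "Model 3 (IRT)": 18.05,
}
estimates = {name: rmsea(chi2, df=9, n=258) for name, chi2 in models.items()}

for name, value in estimates.items():
    print(f"{name}: RMSEA ~ {value:.3f}")
```

Because all three models have the same degrees of freedom and were fit to the same subsample, their ordering on RMSEA follows directly from their chi-square values and does not depend on the exact N assumed.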

Table 2.

IRT Results

Item number  Difficulty  Infit  Outfit  Excluded  Retained for IRT item set
2            .41         1.17   1.22    -         ×
19           .30         1.04   1.14    -         -
5            .29         .94    .91     -         -
11           .28         1.21   1.19    -         -
12           .19         .95    .95     -         ×
13           .16         .83    .82     -         -
8            .13         1.08   1.04    -         -
6            .10         .91    .89     -         -
7            .05         1.08   1.13    -         ×
14           −.02        .55    .57     ×         -
18           −.04        1.12   1.19    -         -
15           −.04        1.02   1.06    -         -
17           −.08        .79    .78     -         -
3            −.10        1.20   1.40    -         -
9            −.12        .77    .78     -         ×
4            −.14        1.68   1.87    ×         -
1            −.20        1.15   1.19    -         -
20           −.31        .60    .61     -         ×
16           −.37        .69    .68     -         -
10           −.51        1.39   1.40    -         ×

Note. Excluded items were not considered for the item response theory (IRT) item set because their infit and/or outfit statistics fell outside the accepted range. Retained items, marked with an “×,” were distributed approximately evenly across difficulty values, consistent with IRT procedure. Item numbers in this table correspond to item numbers in Table 1.
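
The infit/outfit screening step described in the text can be sketched as a simple filter over the values in Table 2. This is a minimal illustration, not the Winsteps procedure itself: the function name `misfitting_items` is hypothetical, the cutoffs (.60 and 1.40) are those stated in the text, and the comparison is applied to the rounded values as printed.

```python
# (item number, infit, outfit) copied from Table 2.
TABLE_2 = [
    (2, 1.17, 1.22), (19, 1.04, 1.14), (5, 0.94, 0.91), (11, 1.21, 1.19),
    (12, 0.95, 0.95), (13, 0.83, 0.82), (8, 1.08, 1.04), (6, 0.91, 0.89),
    (7, 1.08, 1.13), (14, 0.55, 0.57), (18, 1.12, 1.19), (15, 1.02, 1.06),
    (17, 0.79, 0.78), (3, 1.20, 1.40), (9, 0.77, 0.78), (4, 1.68, 1.87),
    (1, 1.15, 1.19), (20, 0.60, 0.61), (16, 0.69, 0.68), (10, 1.39, 1.40),
]

LOWER, UPPER = 0.60, 1.40  # cutoffs stated in the text

def misfitting_items(rows, lower=LOWER, upper=UPPER):
    """Return item numbers whose infit or outfit is under `lower` or over `upper`."""
    flagged = []
    for item, infit, outfit in rows:
        # Strict inequalities: values exactly at a cutoff are not flagged.
        if any(value < lower or value > upper for value in (infit, outfit)):
            flagged.append(item)
    return flagged

print(misfitting_items(TABLE_2))
```

With strict inequalities, only Items 14 and 4 are flagged, which agrees with the two exclusions reported. Items 3, 10, and 20 sit exactly on a cutoff in the rounded table values and are therefore not flagged.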

Table 3.

CFA Standardized Loadings for NMAS-BF Items

Item   CFA Loading   CFA SE   CFA Residual   IRT threshold 1   2   3   4   5

N2R. I feel comfortable expressing my affection to family members and friends .570 .050 .675 −1.03 −.58 −.12 .22 .53
N12. I have difficulty telling others that I care about them .705 .042 .502 −1.20 −.74 −.22 .07 .42
N7R. When someone close to me hurts my feelings, I am able to tell them that I am hurt .613 .047 .624 −.93 −.79 −.40 −.09 .53
N9. It is difficult for me to reveal my innermost feelings, even to close friends .658 .044 .566 −1.64 −.91 −.42 −.11 .26
N20. I don’t like to talk with others about my feelings .787 .036 .381 −1.94 −1.05 −.62 −.22 .21
N10. If someone asks how I am feeling, I typically say what I am not feeling (e.g., “not too bad”) .505 .054 .745 −1.46 −.77 −.55 −.34 −.05

Note. All items loaded onto the latent factor at p < .001. CFA = confirmatory factor analysis; NMAS-BF = Normative Male Alexithymia Scale Brief Form; IRT = item response theory; SE = standard error; R = reverse scored.
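
As a quick consistency check on Table 3, the residual column can be compared against the value implied by each standardized loading. The sketch below assumes a standardized single-factor solution with no correlated residuals, in which the residual variance equals 1 minus the squared loading; the function name `residual_variance` is hypothetical.

```python
# (item label, standardized loading, reported residual) copied from Table 3.
TABLE_3 = [
    ("N2R", 0.570, 0.675), ("N12", 0.705, 0.502), ("N7R", 0.613, 0.624),
    ("N9", 0.658, 0.566), ("N20", 0.787, 0.381), ("N10", 0.505, 0.745),
]

def residual_variance(loading: float) -> float:
    """Residual variance implied by a standardized loading (assumes unit item variance)."""
    return 1.0 - loading ** 2

for label, loading, reported in TABLE_3:
    implied = residual_variance(loading)
    print(f"{label}: implied {implied:.3f}, reported {reported:.3f}")
```

Small discrepancies in the third decimal are expected, because the loadings in the table are themselves rounded.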

Figure 1.

Test information curves.

Descriptive Statistics

Raw score-based correlation coefficients, alpha coefficients, means, and standard deviations for the NMAS-BF, TAS-20, and RE scales are presented in Table 4. The reliability coefficient for the NMAS-BF was .80, which, according to Ponterotto and Ruckdeschel’s (2007) criteria, represents good reliability.
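
For readers unfamiliar with how a coefficient alpha such as the one above is obtained, the following is a minimal sketch of the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` is hypothetical, and the responses are made-up toy values for illustration only; they are not data from this study.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)

# Made-up responses from five hypothetical respondents to three items (not study data).
toy_responses = [
    [1, 2, 2],
    [3, 3, 4],
    [4, 5, 4],
    [5, 5, 6],
    [2, 2, 3],
]
print(round(cronbach_alpha(toy_responses), 3))
```

Alpha rises as the items covary more strongly relative to their individual variances, and reaches 1 when all items are identical.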

Validity of the NMAS-BF

First, convergent evidence for validity was assessed. The correlation between the latent constructs of normative male alexithymia and alexithymia was significant, moderate-to-strong, and positive (.57, p < .001), providing convergent evidence for validity and supporting H3. Second, based on Chen, West, and Sousa’s (2006) guidelines, concurrent evidence for the validity of the latent factor of the NMAS-BF was assessed by constructing a structural regression model. The purpose was to assess whether normative male alexithymia (as assessed by the NMAS-BF) accounted for unique variance in the endorsement of the masculine norm of restrictive emotionality (as assessed by RE) when alexithymia (as assessed by the TAS-20) was in the model. In this model, latent factors representing the NMAS-BF and the TAS-20, using parcels as discussed above, were regressed on the latent factor of RE (for which we used observed indicators). The CFA of the measurement model produced reasonable fit to the data, χ2(41) = 152.70, p < .001, CFI = .950, TLI = .933, RMSEA = .073 (90% CI [.061, .086]), SRMR = .038. All of the parcels had significant loadings on their respective factors; the standardized loadings ranged from .81 to .82 for the NMAS-BF, and .70 to .85 for the TAS-20. In addition, where manifest indicators were used (i.e., for RE), all indicators had significant loadings on their factor and ranged from .73 to .85. Next, latent factors representing the NMAS-BF and the TAS-20 were regressed on the latent factor of RE. This model had exactly the same fit statistics as the measurement model and thus showed reasonable fit to the data. The results are shown in Table 5, where it can be seen that the NMAS-BF uniquely predicted RE scores when alexithymia scores were included in the model, supporting H4. Finally, incremental evidence for validity was examined using hierarchical multiple regression. 
The results are shown in Table 6, where it can be seen that NMAS-BF scores significantly predicted variance in RE scores above and beyond that predicted by alexithymia scores, supporting H5.

Table 5.

Structural Regression Paths Between the Latent TAS-20 and NMAS-BF Factors and the RE Factor

Path R2 Unstandardized SE Standardized

TAS-20 on RE .122*** .18*** .03 .35***
NMAS-BF on RE .127*** .31*** .05 .36***

Note. TAS-20 = Toronto Alexithymia Scale-20; NMAS-BF = Normative Male Alexithymia Scale-Brief Form; RE = Restrictive Emotionality subscale of the Male Role Norms Inventory-Short Form; SE = standard error.

*** p < .001.

Table 6.

Hierarchical Multiple Regression Analysis of the Regression of TAS-20 and the NMAS-BF on the RE (Criterion = RE)

Predictor R2 change Unstandardized coefficients SE Beta coefficients (standardized)

Step 1 .088***
 TAS-20 .634*** .091 .297***
Step 2 .025***
 TAS-20 .444*** .103 .189***
 NMAS-BF .212*** .057 .165***

Note. TAS-20 = Toronto Alexithymia Scale-20; NMAS-BF = Normative Male Alexithymia Scale-Brief Form; RE = Restrictive Emotionality subscale of the Male Role Norms Inventory-Short Form; SE = standard error.

*** p < .001.
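
The logic of the hierarchical regression in Table 6 can be sketched with the standard F test for a change in R2. The function name `f_change` is hypothetical, and the sample size passed to it below is a placeholder labeled as such, because the N for this particular analysis is not restated in this section. The R2 values and the Step 1 standardized coefficient are taken from Table 6.

```python
def f_change(r2_reduced: float, r2_full: float, n: int, k_full: int, m: int) -> float:
    """F statistic for the increase in R2 when m predictors are added.

    n is the sample size and k_full the number of predictors in the full model.
    """
    df_residual = n - k_full - 1
    return ((r2_full - r2_reduced) / m) / ((1.0 - r2_full) / df_residual)

r2_step1 = 0.088             # TAS-20 alone (Table 6, Step 1)
r2_step2 = r2_step1 + 0.025  # after adding the NMAS-BF (Step 1 plus the R2 change)

# With a single predictor, R2 equals the squared standardized coefficient,
# so .297 squared should reproduce the Step 1 R2 up to rounding.
print(round(0.297 ** 2, 3))

# Placeholder sample size for illustration only; it is not taken from the study.
n_hypothetical = 250
print(f_change(r2_step1, r2_step2, n_hypothetical, k_full=2, m=1))
```

The same R2 change yields a larger F as the sample size grows, which is why a small increment such as .025 can still be statistically significant in a sample of several hundred.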

Discussion

The purpose of this study was to develop and assess the psychometric properties of the NMAS-BF. EFA was used to generate three candidate six-item unidimensional models, which were compared through CFA. The model based on IRT was superior to models based on CTT and CTT optimized to reduce redundancy. The results also provided evidence for internal consistency reliability, and for convergent, concurrent, and incremental evidence for validity, supporting the use of the NMAS-BF. Future research should further assess the NMAS-BF for test–retest reliability and measurement invariance across age, race/ethnicity, and sexual orientation.

There are also implications of the measurement development undertaken in this study for practitioners. The NMAS-BF may be of use to counseling practitioners who wish to quickly assess their clients’ ability to express their emotions. Elevated scores on the NMAS-BF may speak to the need to address emotional expression through skill building while developing client awareness of the benefits and utility of the ability to access and express emotions in terms of interpersonal relationships (Brackett, Rivers, & Salovey, 2011). Also, men may have experienced direct punishment for expressing some emotions during childhood and adolescence and may benefit from explorations of negative learning experiences related to emotional expression (Levant, 2001). There may be cases where discussing a client’s overall scores and/or responses to specific items may be of value in the counseling process.

Our use of IRT to inform item selection for the NMAS-BF demonstrates the utility of this approach to item selection. The items selected through IRT encompass a wider range of aspects of normative male alexithymia compared with those selected via CTT, making the NMAS-BF more useful as a brief assessment. Another strength of this study was assessing validity in a latent variable context. The hazards of relying on raw scores have been strikingly demonstrated in other research, where as many as half of the significant correlations using raw scores were not significant using latent variables (Levant et al., 2016; Levant et al., 2017).

There are some limitations of the current study that should be acknowledged. First, the self-report nature of the surveys introduces the possibility of socially desirable responding (SDR). SDR was not measured in our study; however, a recent article demonstrated that SDR is not always a problem (Tracey, 2016). Future investigations using the NMAS-BF are encouraged to continue to address these issues. Second, our sample was composed of mostly White heterosexual men from the United States. Explorations of the construct of normative male alexithymia within more diverse samples and contexts may be useful to inform interventions and clinical work with diverse samples of men. Third, we did not collect data on participants’ socioeconomic status and educational attainment. Future research should collect and report these data. Fourth, participants without Internet access were implicitly excluded by the use of MTurk recruitment. Fifth, our evaluation of evidence for validity was limited to just two criterion variables. Future research should expand this to include more of the nomological network.

The NMAS-BF is intended to measure a mild-to-moderate form of alexithymia theorized to be normative for men due to gender-linked socialization practices, that is, NMA. This theoretical formulation is supported by research demonstrating that the requirement to restrict emotional expression is a central aspect of traditional masculine norms, and that boys are normatively socialized to restrict the expression of vulnerable and caring emotions. Although prior research discussed above has found that men experience alexithymia more frequently than do women, that NMAS scores are correlated with the endorsement of TMI (a construct theorized to be related to masculine socialization), and that in the present study the NMAS-BF demonstrates incremental validity beyond that provided by a current measure of clinical alexithymia (i.e., the TAS-20) in predicting endorsement of the traditional masculine norm of RE, we have not demonstrated that these differences are due to socialization, and thus we have not yet tested a central element of the GRSP theory. We thus do not know for certain whether NMA (as measured by the NMAS-BF) is related to masculine socialization. In order to definitively say that NMA is related to masculine socialization, we would have to use a measure of masculine socialization, which currently does not exist. Hence, at this stage of instrument development this question cannot be fully answered.

Finally, the present data focused on a singular assessment of the construct of alexithymia; other approaches, such as multitrait multimethod designs, may be useful in further exploring the nature and implications of normative male alexithymia.

Conclusions

Two main conclusions can be drawn from this study. First, there is evidence supporting the unidimensionality and internal consistency reliability of the NMAS-BF; in addition, there is convergent, concurrent, and incremental evidence for its validity, although additional research is called for in investigating the test–retest reliability and measurement invariance across groups of men defined by race, age, and sexual orientation. Second, the present study demonstrates the advantages of SEM and IRT for assessing the psychometric properties of any scale used in counseling psychology research, in particular for the selection of items, the assessment of dimensionality, and the use of latent variables for evaluating evidence for validity.

Public Significance Statement.

This study supports the use of the brief form of the Normative Male Alexithymia Scale (NMAS-BF). Evidence is also reported for its reliability and validity as a measure of men’s gender-socialized limitations in expressing emotions.

Footnotes

1

The eigenvalue of the first factor was 8.64, accounting for 43.2% of the variance. For comparison’s sake, the eigenvalue of the second factor was 1.90.

Contributor Information

Ronald F. Levant, Counseling Psychology Program, Department of Psychology, University of Akron

Mike C. Parent, Counseling Psychology and Counselor Education, Department of Educational Psychology, University of Texas at Austin.

References

  1. Bagby RM, Parker JDA, & Taylor GJ (1994). The twenty-item Toronto Alexithymia Scale—I. Item selection and cross-validation of the factor structure. Journal of Psychosomatic Research, 38, 23–32. 10.1016/0022-3999(94)90005-1
  2. Bartlett MS (1954). A note on the multiplying factors for various chi square approximations. Journal of the Royal Statistical Society, Series B, 16, 296–298.
  3. Brackett MA, Rivers SE, & Salovey P (2011). Emotional intelligence: Implications for personal, social, academic, and workplace success. Social and Personality Psychology Compass, 5, 88–103. 10.1111/j.1751-9004.2010.00334.x
  4. Casler K, Bickel L, & Hackett E (2013). Separate but equal? A comparison of participants and data gathered via Amazon’s MTurk, social media, and face-to-face behavioral testing. Computers in Human Behavior, 29, 2156–2160. 10.1016/j.chb.2013.05.009
  5. Chen FF, West SG, & Sousa KH (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41, 189–225. 10.1207/s15327906mbr4102_5
  6. Cheung GW, & Rensvold RB (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255. 10.1207/S15328007SEM0902_5
  7. DeVellis RF (2016). Scale development: Theory and applications. Thousand Oaks, CA: SAGE.
  8. Gignac GE, Palmer BR, & Stough C (2007). A confirmatory factor analytic investigation of the TAS-20: Corroboration of a five-factor model and suggestions for improvement. Journal of Personality Assessment, 89, 247–257. 10.1080/00223890701629730
  9. Hayton JC, Allen DG, & Scarpello V (2004). Factor retention decisions in exploratory factor analysis: A tutorial on parallel analysis. Organizational Research Methods, 7, 191–205. 10.1177/1094428104263675
  10. Kahn JH (2006). Factor analysis in counseling psychology research, training, and practice: Principles, advances, and applications. The Counseling Psychologist, 34, 684–718. 10.1177/0011000006286347
  11. Kaiser HH (1974). An index of factorial simplicity. Psychometrika, 39, 31–36. 10.1007/BF02291575
  12. Karakis EN, & Levant RF (2012). Is normative male alexithymia associated with relationship satisfaction, fear of intimacy and communication quality among men in heterosexual relationships. The Journal of Men’s Studies, 20, 179–186. 10.3149/jms.2003.179
  13. Kline RB (2016). Principles and practice of structural equation modeling (4th ed.). New York, NY: Guilford Press.
  14. Levant RF (1992). Toward the reconstruction of masculinity. Journal of Family Psychology, 5, 379–402. 10.1037/0893-3200.5.3-4.379
  15. Levant RF (1995). Toward the reconstruction of masculinity. In Levant RF & Pollack WS (Eds.), A new psychology of men (pp. 229–251). New York, NY: Basic Books.
  16. Levant RF (2001). Desperately seeking language: Understanding, assessing and treating normative male alexithymia. In Brooks GR & Good G (Eds.), The new handbook of counseling and psychotherapy for men (Vol. 1, pp. 424–443). San Francisco, CA: Jossey-Bass.
  17. Levant RF, Alto KM, McKelvey DK, Richmond KA, & McDermott RC (2017). Variance composition, measurement invariance by gender, and construct validity of the Femininity Ideology Scale-Short Form. Journal of Counseling Psychology, 64, 708–723. 10.1037/cou0000230
  18. Levant RF, Good GE, Cook S, O’Neil J, Smalley KB, Owen KA, & Richmond K (2006). The Normative Male Alexithymia Scale: Measurement of a gender-linked syndrome. Psychology of Men & Masculinity, 7, 212–224. 10.1037/1524-9220.7.4.212
  19. Levant RF, Hall RJ, & Rankin TJ (2013). Male Role Norms Inventory-Short Form (MRNI-SF): Development, confirmatory factor analytic investigation of structure, and measurement invariance across gender. Journal of Counseling Psychology, 60, 228–238. 10.1037/a0031545
  20. Levant RF, Hall RJ, Weigold IK, & McCurdy ER (2016). Construct validity evidence for the Male Role Norms Inventory-Short Form: A structural equation modeling approach using the bifactor model. Journal of Counseling Psychology, 63, 534–542. 10.1037/cou0000171
  21. Levant RF, Hall RJ, Williams C, & Hasan NT (2009). Gender differences in alexithymia. Psychology of Men & Masculinity, 10, 190–203. 10.1037/a0015652
  22. Levant RF, Richmond K, Majors RG, Inclan JE, Rossello JM, Heesacker M, . . . Sellers A. (2003). A multicultural investigation of masculinity ideology and alexithymia. Psychology of Men & Masculinity, 4, 91–99. 10.1037/1524-9220.4.2.91
  23. Levant RF, & Wong YJ (2013). Race and gender as moderators of the relationship between the endorsement of traditional masculinity ideology and alexithymia: An intersectional perspective. Psychology of Men & Masculinity, 14, 329–333. 10.1037/a0029551
  24. Levant RF, Wong YJ, Karakis EN, & Welsh MW (2015). Mediated moderation of the relationship between the endorsement of restrictive emotionality and alexithymia. Psychology of Men & Masculinity, 16, 459–467. 10.1037/a0039739
  25. Linacre JM (2016). Winsteps (Version 3.92.0) [Computer software]. Beaverton, OR: Winsteps.com. Retrieved from http://www.winsteps.com/
  26. Little TD, Cunningham WA, Shahar G, & Widaman KF (2002). To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling, 9, 151–173. 10.1207/S15328007SEM0902_1
  27. MacCallum RC, Browne MW, & Sugawara HM (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1, 130–149.
  28. Mahalik JR, Locke BD, Ludlow LH, Diemer MA, Scott RJ, Gottfried M, & Freitas G (2003). Development of the Conformity to Masculine Norms Inventory. Psychology of Men & Masculinity, 4, 3–25. 10.1037/1524-9220.4.1.3
  29. Mallinckrodt B, Miles JR, & Recabarren DA (2016). Using focus groups and Rasch item response theory to improve instrument development. The Counseling Psychologist, 44, 146–194. 10.1177/0011000015596437
  30. McDermott RC, Levant RF, Hammer JH, Hall RJ, McKelvey DK, & Jones Z (2017). Further examination of the factor structure of the Male Role Norms Inventory-Short Form (MRNI-SF): Measurement considerations for women, men of color, and gay men. Journal of Counseling Psychology, 64, 724–738. 10.1037/cou0000225
  31. Muthén LK, & Muthén BO (1998–2015). Mplus user’s guide (7th ed.). Los Angeles, CA: Author.
  32. Myers LS, Gamst G, & Guarino AJ (2013). Applied multivariate research: Design and interpretation (2nd ed.). Thousand Oaks, CA: Sage.
  33. O’Neil JM (2008). Summarizing 25 years of research on men’s gender role conflict using the Gender Role Conflict Scale. The Counseling Psychologist, 36, 358–445. 10.1177/0011000008317057
  34. Parent MC (2013). Handling item-level missing data: Simpler is just as good. The Counseling Psychologist, 41, 568–600. 10.1177/0011000012445176
  35. Peer E, Vosgerau J, & Acquisti A (2014). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior Research Methods, 46, 1023–1031. 10.3758/s13428-013-0434-y
  36. Pleck JH (1981). The myth of masculinity. Cambridge, MA: MIT Press.
  37. Pleck JH (1995). The gender role strain paradigm: An update. In Levant RF & Pollack WS (Eds.), A new psychology of men (pp. 11–32). New York, NY: Basic Books.
  38. Ponterotto JG, & Ruckdeschel DE (2007). An overview of coefficient alpha and a reliability matrix for estimating adequacy of internal consistency coefficients with psychological research measures. Perceptual and Motor Skills, 105, 997–1014. 10.2466/pms.105.3.997-1014
  39. Russell DW, Kahn JH, Spoth R, & Altmaier EM (1998). Analyzing data from experimental studies: A latent variable structural equation modeling approach. Journal of Counseling Psychology, 45, 18–29. 10.1037/0022-0167.45.1.18
  40. Sifneos PE (1967). Clinical observations on some patients suffering from a variety of psychosomatic diseases. Acta Medicina Psychosomatica, 7, 1–10.
  41. Tabachnick BG, & Fidell LS (2007). Using multivariate statistics (5th ed.). Boston, MA: Pearson.
  42. Taylor GJ (2004). Alexithymia: Twenty-five years of theory and research. In Nyklicek I, Temoshok L, & Vingerhoets A (Eds.), Emotional expression and health: Advances in theory, assessment and clinical applications (pp. 137–153). New York, NY: Brunner-Routledge.
  43. Tracey TJG (2016). A note on socially desirable responding. Journal of Counseling Psychology, 63, 224–232. 10.1037/cou0000135
