PLOS One. 2019 Oct 16;14(10):e0223504. doi: 10.1371/journal.pone.0223504

Measurement invariance tests of revisions to archaically worded items in the Mach IV scale

Brian K Miller 1,*, Kay Nicols 1, Robert Konopaske 1
Editor: Angel Blanch2
PMCID: PMC6795441  PMID: 31618223

Abstract

The Machiavellian IV [1] instrument, developed almost 50 years ago to measure trait Machiavellianism and still in wide use in personality research, relies on item wording that is not gender-neutral, includes idiomatic expressions, and makes archaic references. In this two-sample study, exploratory factor analysis (EFA) was conducted on one sample to examine the structure of responses to the Mach IV. In an independent second sample, the resulting EFA structure was analyzed using confirmatory factor analysis-based measurement equivalence/invariance (ME/I) tests in a control group with the original archaic items and a treatment group with eight items rewritten in a more modern vernacular. Specific model testing steps [2] and statistical tests [3] were applied in a bottom-up approach [4] to ME/I tests on these two versions of the Mach IV. The two versions were found to have equal form and equal factor loadings, but unequal indicator error variances. Subsequent item-by-item tests of error invariance resulted in substantial decrements to fit for three revised items, suggesting that the error associated with these items was not equal across the two versions.

Introduction

Machiavellianism is the predisposition to manipulate interpersonal relationships with guile, opportunism, and deceit. Research on this construct began in earnest with the development of a self-report inventory [1] based upon the titular figure of Niccolo Machiavelli's sixteenth-century treatise "The Prince". That figure lacked morality and empathy and deeply distrusted others. The scale authors [1] painstakingly developed items designed to measure this tendency toward amorality, negative views of human nature, and interpersonal tactics designed for personal gain at the expense of others [1]. The result was the Mach IV, a 20-item scale comprised of three subscales: morals, views, and tactics. This instrument is the most frequently used scale to measure Machiavellianism [1].

Interest in Machiavellianism was furthered by its inclusion as one of the three traits comprising the relatively new Dark Triad [5]. The other two traits are narcissism and psychopathy. Machiavellianism is distinct from the others in its focus on the strategic manipulation of others for personal gain [6]. Machiavellianism is the most often studied but least understood of the Dark Triad traits [6], likely due to its psychometric complexity resulting from factor indeterminacy, poorly loading items, and subscale unreliability. Numerous attempts at exploratory factor analysis (EFA) have revealed a factor structure ranging from the original three factors [1] to as many as nine factors [7], due in part to the diversity of populations to which it has been administered. Confirmatory factor analyses (CFA) have required the elimination of items ranging from seven of 20 [8] to ten of 20 [9] in order to achieve adequate model fit and/or model convergence. These problems with the Mach IV are not unlike those of other lengthy scales with obliquely related subscales and have greatly contributed to the limited understanding of Machiavellianism.

It is possible that problems with the Mach IV arise, at least partially, from outdated language not understandable to contemporary survey respondents. The English lexicon is constantly changing as new words are created and outdated ones fall out of favor. Developers of self-report inventories are wise to use the vernacular of the time to make their instruments understandable, but not so colloquial or era-specific that they are appropriate only for contemporaneous respondents. Scales developed decades or generations ago can be particularly problematic for at least three reasons. First, because samples of college students are commonly used in psychological research [10,11], it is imperative that the wording of scale items be understandable to young adults; terms that refer to popular culture, historical events, or figures of the distant past are likely to be confusing or uninterpretable to many of today's college students. Second, from 1970–2017 the percentage of female students enrolled in college rose from 48.5% to 56.4% of all students [12]. With this in mind, items referring only to men, or carrying prototypically masculine inferences, are perhaps not conceptualized the same way by female as by male respondents and are problematic given that the majority of research participants are likely to be female. Third, concurrent with changes in gender demography amongst college students is an increase in the number of international students enrolled in American universities, which has quadrupled since 1976 (the earliest year data were tracked), topping one million in 2016 [13]. The largest numbers of international students are from China (31.5%), India (15.9%), Saudi Arabia (5.9%), and South Korea (5.8%) [14,15]. Some of these non-native English speaking college students are also likely participating in psychological research.
For non-native English speakers, the inclusion of idiomatic expressions whose literal meaning (e.g., s/he is taking a major risk) is not easily discernable from the figurative meaning (e.g., s/he is playing with fire) can lead to misinterpretations or misunderstandings [16,17]. The psychometric properties of scales developed decades ago may therefore suffer lower validity and reliability when administered to current samples.

The current study examines whether a revised Mach IV scale [1] that omits archaic language and outdated references, eliminates idiomatic or unusual expressions, and corrects for non-gender-neutral language will be conceptualized differently than the original version developed almost 50 years ago. To this end, the factor structure of the original Mach IV scale was examined with exploratory factor analysis (EFA) in one sample and the structure was validated with confirmatory factor analysis (CFA) in an independent second sample. In the second sample, CFA-based measurement equivalence/invariance (ME/I) tests were used to compare the original and revised items in order to determine if respondents in the two different groups ascribe the same meaning to items in both versions of the scale [2]. ME/I tests are used to determine if different versions of a scale measure the same construct, in the same manner, in different groups [18]. It is hoped that the psychometric properties of the revised scale are better than those of the original scale developed decades ago.

An alternative version of the Mach IV, known as the Kiddie Mach, was also developed [1] for administration to children. To address that audience, the authors discarded the original item on euthanasia and substantially changed the intent and meaning of other items, but some problematic item wording remained. The adult Mach IV instrument has enjoyed much more widespread use, however, and is thus the focus of this study.

Over the decades, various EFA examinations of the adult Mach IV have been undertaken. The factor structure of the Mach IV has ranged from the original three-factor structure [1] to a nine-factor structure [7]. These disparate solutions have also resulted in items loading on different factors in different solutions. For example, item nine of the Mach IV ("It is wise to flatter important people") loads on the third factor in one solution [1], on the first factor in others [19,20,21], on the second factor in U.S. samples [22,23], on a fourth factor in a Chinese sample [22], and even on a fifth factor [7]. This variety of factor solutions and very different loadings of items on factors suggests that further factor analyses are in order to build an understanding of the underlying factor structure of the Mach IV. See Table 1 for these factor solutions.

Table 1. Previous exploratory factor analytic results for the Machiavellianism IV scale [1].

Item # | Christie & Geis (1970) | Williams et al. (1975) | Kuo & Marsella (1977, Chinese sample) | Kuo & Marsella (1977, US sample) | Ahmed & Stewart (1981) | O'Hair & Cody (1987) | Panitz (1989) | Andreou (2004)
Factor on which item loaded
1. 1 2 4 2 1 2 6 3
2. 1 2 5 5 3 1 4 3
3. 1 4 5 4 1 2 1 4
4. 2 3 1 3 1 -- 2 2
5. 2 -- 1 1,2,5 3 1 3 3
6. 1 1 1 1,3 5 2 2 4
7. 1 1 3 2 1,5 2 1 --
8. 2 2 2 1 4 -- 2 3
9. 3 1 4 2 1 2 5 1
10. 1 1 2 1,2 3 1 3 3
11. 2 -- -- 1 5 -- 3 2
12. 1 -- 2 5 4 -- 7 --
13. 2 -- 5 3 -- -- 6 1
14. 2 3 2, 3 3 2 3 2 1
15. 1 2 2 5 3,4 1 7 --
16. 1 -- 3 4 5 2 5 2
17. 2 -- 2 3,4 2,4 3 1 1
18. 2 -- -- 4 5 2 1 2
19. 3 4 3 3,4 -- 2 1 4
20. 2 2 2 1,2 2 3 4 1
Type of EFA: PCA PAF PAF PCA PCA Unknown PCA

Note: PCA = principal components analysis; PAF = principal axis factoring; Cells with more than one number indicate cross-loadings. Double dash indicates the item failed to load on any factor.

In partial response to these varying solutions to the Mach IV instrument, alternative measures of Machiavellianism have been developed. For example, some researchers have developed a multi-factor measure of Machiavellianism known as the Machiavellianism Personality Scale [24]. It too is not without problems [25,26,27]. Other researchers have developed promising new conceptualizations and measures of Machiavellianism [28,29]. Rather than developing another new measure of Machiavellianism, the current study seeks to update the Mach IV instrument by rewriting some problematic items and examining the factor structure of a revised version of the Mach IV in comparison to the structure of the original version.

Study one method

Procedure

Studies One and Two were approved by the Institutional Review Board (IRB) of Texas State University with documentation from the IRB Regulatory Manager. The approval number was EXP2016H133885Y, and the study was declared exempt from review. Consent from participants was obtained verbally and in writing, with signatures on forms kept separate from the actual surveys. Thus, all data were collected anonymously.

In Study One, data were collected via an anonymous paper-and-pencil self-report inventory in large sections of an upper-level undergraduate course at a large public university in the southwestern U.S. Because of the many different factor structures found in previous research, the purpose of Study One was to use EFA to determine the latent structure of the Mach IV instrument [30]; the resulting factor structure was then validated with CFA in Study Two. Given the many issues that can affect EFA results (e.g. communalities, rotation method, sample size), the choice of an appropriate cutoff for factor loadings is vital to interpreting the factor structure. Despite popular arguments against the use of strict statistical cutoffs [31,32], it is still widely accepted [33,34,35,36] that factor loadings should be at least .30 to be minimally acceptable [30]. This cutoff was applied in a principal axis factoring analysis of the Study One data to extract a factor solution for the Mach IV, with an oblique rotation because the subscales of the instrument should be correlated.
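The |loading| ≥ .30 screening rule described above can be sketched in a few lines of Python. The cutoff logic follows the text; the item names and loading values below are hypothetical illustrations, not Study One data.

```python
def classify_items(loadings, cutoff=0.30):
    """Label each item 'clean', 'cross', or 'none' by counting the
    factors on which its absolute loading reaches the cutoff."""
    labels = {}
    for item, row in loadings.items():
        hits = sum(1 for value in row if abs(value) >= cutoff)
        labels[item] = "clean" if hits == 1 else ("cross" if hits > 1 else "none")
    return labels

example = {
    "item_a": [0.52, 0.10, -0.08],  # loads cleanly on factor 1
    "item_b": [0.41, 0.35, 0.02],   # cross-loads on factors 1 and 2
    "item_c": [0.12, -0.22, 0.18],  # fails to load on any factor
}
print(classify_items(example))
```

Negative loadings count toward the cutoff here because the rule is applied to the loading's magnitude, mirroring the treatment of negatively loading items later in the paper.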

Participants

Complete data were provided for Study One by 295 respondents. Listwise deletion was used so four participants who failed to complete the survey were discarded from the analysis. Slightly more than half of the participants were female (52.0%). The mean age was 21.97 years and self-reported racial or ethnic group membership was 59.7% White, 6.8% Black, 27.8% Hispanic, 2.4% Asian, 0.7% American Indian, and 2.7% other. The mean level of full-time and part-time work experience was 18.19 months and 41.98 months, respectively. Of the nearly 63% of respondents (i.e. 185) who were currently employed, 21% were direct supervisors or managers of other employees. Of those 33 supervisors or managers, the mean number of direct reports was nine with a range of two to 40.

Mach IV instrument

Responses were gathered on the 20-item Mach IV scale [1] using a Likert response scale anchored by 1 = "strongly disagree" and 7 = "strongly agree". Sample items included: "It is hard to get ahead without cutting corners here and there" and "Barnum was very wrong when he said there's a sucker born every minute" (reverse scored). All items were corrected for reverse scoring before the EFA was conducted. Cronbach's coefficient alpha of internal consistency reliability for scores on the Mach IV in Study One was .68. The item-level skewness and kurtosis ranged from -1.12 to 1.87 and from -1.08 to 4.04, respectively. These univariate statistics were within acceptable limits of |2.0| for skewness and |7.0| for kurtosis, respectively [37].
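A minimal sketch of the scale statistics reported above (coefficient alpha and moment-based skewness and excess kurtosis), in plain Python. The toy response matrix (rows = respondents, columns = items) is hypothetical.

```python
from statistics import pvariance, mean

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [pvariance([r[j] for r in rows]) for j in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def skew_kurtosis(x):
    """Moment-based skewness and excess kurtosis, which the text screens
    against cutoffs of |2.0| and |7.0|, respectively [37]."""
    m = mean(x)
    m2 = mean([(v - m) ** 2 for v in x])
    m3 = mean([(v - m) ** 3 for v in x])
    m4 = mean([(v - m) ** 4 for v in x])
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

responses = [[1, 2, 2], [2, 3, 2], [4, 4, 5], [5, 6, 6], [3, 4, 4]]
print(round(cronbach_alpha(responses), 3))
```

Population variances are used throughout for internal consistency; using sample variances on both sides of the ratio gives the same alpha.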

Study one results

A direct oblimin rotation with principal axis factor extraction resulted in a six-factor solution for the Mach IV that explained 50.21% of the variance in the items. The number of factors was based upon a visual inspection of the scree plot, the criterion of eigenvalues greater than one, and parallel analysis. Sixteen of 20 items had factor loadings greater than .30 on one factor only (i.e. clean loadings), two items had cross-loadings greater than .30 on two factors, and two items loaded on no factor (i.e. no loading on any factor > .30). Factors three and six were comprised solely of negatively loading items, and factor four had one negatively and one positively loading item in excess of the cutoff. See Table 2 for the factor loadings.

Table 2. Factor loadings for the Mach IV with principal axis factor analysis and an oblique rotation.

Factors
Items 1 2 3 4 5 6
Mach1 -.003 .283 .065 .141 -.148 .001
Mach2 .314 -.043 .193 .000 -.378 -.288
Mach3 -.032 .020 -.09 -.041 .070 -.566
Mach4 .080 .495 -.206 -.004 -.009 .070
Mach5 .676 .025 -.099 .007 -.016 .002
Mach6 -.044 .407 -.074 -.211 -.143 -.228
Mach7 .119 .401 .146 -.065 .109 -.032
Mach8 .460 -.017 .044 -.071 .010 .032
Mach9 -.004 .101 .100 .428 .124 -.072
Mach10 .318 -.176 -.175 .186 .012 -.425
Mach11 .126 .131 -.080 .158 .042 -.176
Mach12 .085 -.008 .088 .020 .381 -.075
Mach13 .455 .084 -.120 .008 .072 .021
Mach14 -.015 .056 -.605 -.129 .048 -.082
Mach15 .468 .075 -.132 .170 .024 -.061
Mach16 .042 .105 -.010 -.560 .125 -.147
Mach17 .132 -.076 -.497 -.043 -.064 .047
Mach18 -.060 .230 .136 -.147 -.006 -.316
Mach19 -.046 .410 -.082 .081 .137 -.105
Mach20 .093 .076 -.387 .052 -.062 -.087

Note: The strongest factor loading for each item is underlined; factor loadings > .30 are in bold and underlined.
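The factor-retention criteria applied in Study One (eigenvalues greater than one, and parallel analysis) can be sketched as a comparison of observed eigenvalues against eigenvalues expected from random data. The eigenvalue lists below are hypothetical, not the Study One solution.

```python
def retain_factors(observed, random_mean):
    """Return factor numbers retained under (a) the Kaiser criterion
    (eigenvalue > 1) and (b) parallel analysis (observed eigenvalue
    exceeds the mean eigenvalue from comparable random data)."""
    kaiser = [i + 1 for i, ev in enumerate(observed) if ev > 1.0]
    parallel = [i + 1 for i, (obs, rnd) in enumerate(zip(observed, random_mean))
                if obs > rnd]
    return kaiser, parallel

observed = [3.9, 2.1, 1.6, 1.3, 1.1, 1.05, 0.8]      # hypothetical scree
random_mean = [1.6, 1.5, 1.4, 1.3, 1.2, 1.15, 1.1]   # hypothetical random-data means
kaiser, parallel = retain_factors(observed, random_mean)
print(kaiser, parallel)
```

Parallel analysis is typically the more conservative rule, which is why the two criteria are used together with the scree plot rather than singly.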

Study one discussion

The results of Study One suggest that there are some problems with responses to the Mach IV. First, alpha serves as a lower bound for score reliability and is usually higher for scores on a unidimensional scale than on a scale with several subscales [38]; the low alpha of .68 here is perhaps symptomatic of the difficulty of estimating a baseline true score in a multidimensional scale. These subscales may measure only moderately related dimensions. If the items themselves are not unidimensional, then cross-loading items in a multi-factor solution are a likely source of model misfit. On the other hand, if the issue is simply that respondents misunderstand the content of the items, then the multidimensionality of the scale is much less of a problem. Second, the number of negatively loading items is potentially problematic. Items with negative loadings tend to measure the opposite end of the Machiavellianism spectrum, and when they are summed with positively loading items the offsetting scores pull the overall scale score toward the middle of the range, masking important individual differences. The Mach IV is therefore unlikely to be effective in discriminating low or high Machs from mid-range Machs. Whether these negative factor loadings and low internal consistency reliability result from respondents being unsure of the archaic references, uncomfortable with the non-gender-neutral wording, or unfamiliar with the idiomatic/unusual phrasing is not yet known. To gather further evidence, in Study Two CFA tests of the factor structure determined with EFA in Study One were conducted on data from an independent sample administered as an experiment.

Study two method

Procedure

In Study Two, the same method of administration at the same university as above was used with different respondents. However, an experimental framework [39,40] was utilized whereby participants in the control group received questionnaires with the scale's original language [1], while participants in the treatment group received surveys with the same instructions and most of the same items, but with eight of the 20 Mach IV items rewritten to avoid problematic language. This study's three authors independently selected and revised specific items from the entire scale and then came to mutual agreement on the number of items to be reworded as well as the exact rewording. For example, the original item "It is hard to get ahead without cutting corners here and there" was revised to "It is hard to get ahead without taking the easy way here and there." Additionally, the original item "Barnum was very wrong when he said there's a sucker born every minute" (reverse scored) was revised to "It is wrong to think that most people can be easily tricked" (reverse scored). All items were corrected for reverse scoring before the CFA was conducted. See Table 3 for a side-by-side comparison of the original and revised items.

Table 3. Original problematically worded scale items and revised versions examined in study two.

Item number Original Mach IV items Revised Mach IV items
2 When you ask someone to do something for you, it is best to give the real reasons for wanting it rather than giving reasons which might carry more weight When you ask someone to do something for you, it is best to give the real reasons for wanting it rather than giving reasons which might be more influential
4 It is hard to get ahead without cutting corners here and there It is hard to get ahead without taking the easy way here and there
6 It is safest to assume that all people have a vicious streak and it will come out when they are given a chance It is safest to assume that all people can be cruel when they are given a chance
9 It is wise to flatter important people It is wise to be especially nice to important people
11 Barnum was very wrong when he said there’s a sucker born every minute It is wrong to think that most people can be easily tricked
16 Most men forget more easily the death of their father than the loss of their property Most people forget more easily the death of a parent than the loss of their property
18 Generally speaking, men won’t work hard unless they’re forced to do so Generally speaking, people won’t work hard unless they’re forced to do so
20 Most men are brave Most people are brave

Note: Problematic item parts and their revisions are underlined.

In the CFA model tests that follow, the factor structure arising from the previous EFA was imposed, with items loading less than .30 in the EFA forced to zero in the CFA. The goal was to use as parsimonious a model of the data as possible. Therefore, error terms were not allowed to correlate and items were not allowed to cross-load on multiple factors.
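The EFA-to-CFA translation described above amounts to fixing sub-cutoff loadings to zero and estimating each item on at most one factor. A sketch, with hypothetical item names and loadings rather than the Table 2 values:

```python
def cfa_pattern(loadings, cutoff=0.30):
    """Map each factor to the items freely estimated on it; every other
    loading is fixed to zero in the CFA, with no error covariances and
    no cross-loadings."""
    pattern = {}
    for item, row in loadings.items():
        strong = [f for f, v in enumerate(row, start=1) if abs(v) >= cutoff]
        if len(strong) == 1:  # only items with a single clean loading enter
            pattern.setdefault(strong[0], []).append(item)
    return pattern

efa = {"m5": [0.68, 0.03], "m8": [0.46, -0.02], "m4": [0.08, 0.50]}
print(cfa_pattern(efa))
```

The resulting mapping can be handed to any SEM package as the free-parameter specification; everything not listed is constrained to zero.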

Several CFA-based ME/I tests were implemented in Study Two to determine if the revised items were interpreted differently in the treatment group than the original archaically worded items in the control group. This involved a specific sequence of tests of two broad types: measurement-level invariance and latent construct invariance (sometimes referred to as structural invariance). Measurement invariance tests must precede structural invariance tests, although methodologists disagree on the specific order of the tests within these two broad categories [41,42,43]. The sequence of steps in the bottom-up approach to ME/I tests [4] was applied in the following order: (1) tests of equal form, (2) tests of equal factor loadings, (3) tests of equal indicator error variances, (4) tests of latent factor variance, and (5) tests of the covariance between latent factors comprising the scales. Steps one and two are essential for invariance testing, but steps three through five are widely regarded as stringent and not always required [42,2,44]. Evidence of invariance is provided by an examination of changes in model fit from one test to the next.

When researchers examine instruments using ME/I tests, it is recommended that they supplement the chi-square difference test, which is known to be heavily influenced by sample size, with other fit indices [45]. For example, a commonly used rule of thumb for the comparative fit index (CFI) [46] is that each sequentially more restrictive model should lower the CFI by less than .01 in order to indicate invariance across groups [3]. That is, if the CFI decreases by less than .01 in a successively restrictive model (e.g., ΔCFI = -.009), the two models are considered equivalent. Because there is no standard error associated with the ΔCFI, this rule of thumb serves only as a guideline and not as a strict statistical test. To establish model fit in the first test of ME/I, both the chi-square and the CFI were calculated as baselines, supplemented by the Root Mean Squared Error of Approximation (RMSEA) [47] and the Standardized Root Mean Squared Residual (SRMR). Good model fit is indicated when CFI ≥ .95, RMSEA < .06, and SRMR < .08 [48]. More lenient cutoffs hold that CFI ≥ .90, RMSEA < .10, and SRMR < .10 indicate acceptable fit [44], and some use a different cutoff for the RMSEA such that < .08 indicates reasonable fit [47]. With such varying rules of thumb in mind, strict adherence to fit index cutoffs for the rejection of models should be tempered by theoretical or substantive considerations [49,50].
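The fit cutoffs and the ΔCFI guideline above can be collected into a small decision helper. The cutoff values come from the cited sources; the example inputs echo values reported later in Tables 4 and 5, and the helper is a sketch of the screening logic, not a formal test.

```python
def fit_ok(cfi, rmsea, srmr, strict=True):
    """Strict cutoffs per Hu & Bentler-style guidance [48];
    lenient cutoffs per [44]."""
    if strict:
        return cfi >= 0.95 and rmsea < 0.06 and srmr < 0.08
    return cfi >= 0.90 and rmsea < 0.10 and srmr < 0.10

def invariant(cfi_restricted, cfi_baseline, tol=0.01):
    """Invariance is retained when the more restrictive model lowers the
    CFI by less than .01 -- a guideline, not a statistical test [3]."""
    return (cfi_restricted - cfi_baseline) > -tol

print(fit_ok(0.914, 0.050, 0.074, strict=False))  # lenient screen
print(invariant(0.885, 0.882))                    # a CFI increase trivially passes
```

Because the ΔCFI has no standard error, `invariant` should be read alongside the scaled chi-square difference rather than in place of it.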

Participants

Complete data were provided anonymously by 483 respondents. Listwise deletion was used so 15 participants who failed to complete the survey were discarded from the analysis. As in Study One, most of the participants were female (52.0%). The mean age was 21.75 years and self-reported racial or ethnic group membership was 60.4% White, 8.5% Black, 25.6% Hispanic, 2.9% Asian, 0.2% American Indian, and 2.3% other. The mean level of full-time and part-time work experience was 18.88 months and 39.26 months, respectively. Nearly 58% were currently employed. Of those 282 currently employed participants, 17% were the direct supervisor or manager of other employees. Of those 48 managers, the mean number of direct reports was 11 with a range of two to 46.

Mach IV instrument

In the control group (n = 243) with originally worded items, alpha reliability for scores on the Mach IV was .70. The item-level skewness and kurtosis for responses to the original items ranged from -0.84 to 1.86 and from -1.01 to 4.14, respectively. In the treatment group (n = 240) with revised items, alpha was .67. The skewness and kurtosis ranged from -1.01 to 2.06 and from -1.16 to 4.67, respectively. In both groups, item level normality met recommended cutoffs [37].

Study two results

Tests of normality

Univariate normality is a necessary, but not sufficient, condition for multivariate normality [51]. Multivariate normality was assessed with Mardia's normalized coefficient of kurtosis, calculated using a macro [52]. Mardia's normalized coefficient for Study Two was 38.23, which was larger than the recommended cutoff of |3.0| [53,54]. Because the data were not multivariate normal, the Satorra-Bentler scaled chi-square (S-B χ2) and robust standard errors adjustment to the maximum likelihood method of estimation was used in CFA with LISREL 8.8 software [55], and the covariance and asymptotic covariance matrices were used as input.
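Mardia's normalized coefficient of kurtosis can also be computed directly rather than via a macro. This pure-Python sketch uses the common large-sample normalization, z = (b2p − p(p+2)) / sqrt(8p(p+2)/n), which is an assumption about the macro's formula rather than a reproduction of it; the data matrix fed in would be the raw item responses.

```python
def invert(m):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def mardia_kurtosis_z(data):
    """Normalized Mardia kurtosis: b2p is the mean of squared Mahalanobis
    distances; |z| > 3 flags multivariate non-normality [53,54]."""
    n, p = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    cov = [[sum(centered[i][r] * centered[i][c] for i in range(n)) / n
            for c in range(p)] for r in range(p)]
    inv = invert(cov)
    b2p = sum(sum(row[r] * inv[r][c] * row[c]
                  for r in range(p) for c in range(p)) ** 2
              for row in centered) / n
    return (b2p - p * (p + 2)) / (8 * p * (p + 2) / n) ** 0.5
```

For p = 1 this reduces to the ordinary (non-excess) univariate kurtosis centered at 3, which gives a quick sanity check on the implementation.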

CFA tests of the Mach IV

Preliminary single group analysis

In the following CFA we refer to the items of the Mach IV, as well as the factors that emerged from the EFA, by our original numbering system. In the preliminary analysis with the control group examining the six-factor structure of the 20-item Mach IV found via EFA in Study One, the SRMR and RMSEA were good but the CFI was .894 and therefore just slightly below the recommended cutoff [44]. However, Item 12, the only item on Factor 5, was a Heywood case, which resulted in a negative error variance for the item and model non-convergence. In the treatment group the fit indices were also good, with only the CFI again falling a bit short at .860, and the sole item loading on Factor 5 (Item 12) was also a Heywood case. Among the solutions to a Heywood case are forcing the item to load on some other factor or dropping it from the analysis. Item 12 was not a revised item and read as "People suffering from incurable diseases should have the choice of being put painlessly to death" in both groups. To examine Item 12's appropriateness for loading on some other factor, the EFA results from Study One were consulted. The standardized loadings for Item 12 on factors other than Factor 5 were very weak, ranging from -.075 to .088, so the item was dropped from further analysis rather than forced to load on another factor on which it would presumably have loaded weakly. With this in mind, a five-factor solution to the Mach IV was re-analyzed in the control and treatment groups.
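A Heywood case of the kind described above can be flagged mechanically by scanning estimated error variances for inadmissible negative values. The estimates below are hypothetical, not the fitted Item 12 values.

```python
def heywood_cases(error_variances):
    """Return items whose estimated error variance is negative, i.e.
    an inadmissible (Heywood) solution for that indicator."""
    return [item for item, theta in error_variances.items() if theta < 0]

estimates = {"item12": -0.14, "item13": 0.72, "item15": 0.65}
print(heywood_cases(estimates))
```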

In subsequent analysis with the control group examining the now-five-factor structure of the now-19-item Mach IV, the fit indices improved, with the CFI rising above the cutoff at .904. However, the covariance between Factor 4 and Factor 6 was unidentified, indicating that the covariance was unstable and unlikely to yield any particular best estimate from the data. The loadings of the only two items on Factor 4 (Items 9 and 16) were close to zero and negative. In the treatment group the fit indices were similar to those of the last iteration, with only the CFI falling short of the cutoff. There were no problems with model convergence in the treatment group, but Item 16 also loaded very weakly, at only .02, on Factor 4. This item was a revised item. In the control group it read as "Most men forget more easily the death of their father than the loss of their property." In the treatment group the revised item was "Most people forget more easily the death of a parent than the loss of their property." Because of problems with Item 16 in both groups, it was dropped and a now-18-item solution was again re-examined in both groups.

In the five-factor structure of the now-18-item Mach IV, the fit indices for the control group were good and the CFI, at .903, remained above the cutoff. In the treatment group the fit indices were good but the CFI still lagged a bit below the recommended cutoff. However, Item 9 remained problematic, with a negative error variance in both groups so that its standard error could not be calculated. Item 9 was a revised item and read as "It is wise to flatter important people" in the control group and as "It is wise to be especially nice to important people" in the treatment group. Thus, Item 9 was dropped and the model was yet again re-analyzed in both groups.

In the now-four-factor structure of the now-17-item Mach IV, the fit indices in the control group were good and the CFI rose further to .914. In the treatment group the fit indices were also good and, although the CFI was .856, that was an improvement over the previous iteration. There were no problematic items and the model converged in both groups with no Heywood cases. However, because of sequential item elimination these fit statistics are likely biased upward, and subsequent analysis based on these shortened scales should be viewed with caution given the changes to the instrument and its factor structure that were required to achieve model fit. Thus, the best fitting CFA solution to the Mach IV instrument used 17 of the original 20 items in a four-factor solution. It is noteworthy that two of the three discarded items were revised items and that this factor structure is not dissimilar to those of previous factor analyses of the Mach IV instrument, which vary widely from three to seven factors on items ranging in number from 10 to 20. See Table 4 for these fit indices.

Table 4. Study two tests of Mach IV scale separately in the control and treatment groups.
Model S-B χ2 df CFI SRMR RMSEA (90% CI)
Control group
    20 items, 6 factors 248.432 155 .894 .077 .050 (.038; .061)
    19 items, 5 factors 225.507 142 .904 .075 .049 (.037; .061)
    18 items, 5 factors 203.972 125 .903 .074 .051 (.038; .064)
    17 items, 4 factors 180.824 113 .914 .074 .050 (.036; .063)
Treatment group
    20 items, 6 factors 259.667 155 .860 .070 .053 (.042; .064)
    19 items, 5 factors 247.602 142 .856 .071 .056 (.044; .067)
    18 items, 5 factors 266.471 125 .853 .073 .060 (.048; .072)
    17 items, 4 factors 216.963 113 .856 .073 .062 (.049; .074)

Note. S-B χ2 = Satorra-Bentler scaled chi-square; df = degrees of freedom; SRMR = standardized root mean square residual; CFI = comparative fit index; RMSEA = root mean square error of approximation; 90% CI = 90% confidence interval for RMSEA

Multiple group (ME/I) analysis

Before combining the two groups for the ME/I tests, one item on each of the remaining four factors was selected to serve as the referent indicator so as to set the metric of the four latent constructs in the model. The factor variances could not be set to unity (i.e. 1) because setting factor variances to unity in both groups essentially constrains them to equivalency; tests of factor variance equivalency are the fourth step in the ME/I sequence [4,2], and constraining the factor variances to equality in steps one through three would be inappropriate. Therefore, for each factor the loading of the item with the most similar magnitude in both groups was fixed to unity (and therefore to equivalency) in subsequent analyses. These referent items (using the original numbering of the scale) were: Item 15 on Factor 1, Item 1 on Factor 2, Item 14 on Factor 3, and Item 18 on Factor 6 (note that Factors 4 and 5 were eliminated from the model because of problematic items).
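The referent-indicator choice described above (for each factor, the item whose loading is most similar in magnitude across groups) can be sketched as follows; the loading values are hypothetical, not the fitted estimates.

```python
def pick_referents(control, treatment):
    """For each factor, pick the item whose loading differs least in
    magnitude between the two groups; that loading is then fixed to 1
    in both groups to set the latent metric."""
    referents = {}
    for factor in control:
        referents[factor] = min(
            control[factor],
            key=lambda item: abs(control[factor][item] - treatment[factor][item]))
    return referents

control = {"F1": {"m5": 0.71, "m15": 0.48}, "F2": {"m1": 0.33, "m7": 0.44}}
treatment = {"F1": {"m5": 0.58, "m15": 0.50}, "F2": {"m1": 0.35, "m7": 0.61}}
print(pick_referents(control, treatment))
```

Choosing a stable referent matters because a noninvariant referent would contaminate every other loading comparison in the metric invariance step.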

The tests of equal form between the two groups resulted in S-B χ2 = 398.693 (df = 230, p < .001), CFI = .882, and RMSEA = .055 (90% CI: .046, .064). Despite being a bit low, the CFI of .882 served as the baseline model fit statistic to which the second test was compared. The second test of the model was for equal factor loadings (i.e. full metric equivalence) and resulted in a CFI increase of .003, well within the recommended cutoff [3]. The third test, of equal indicator error variances, resulted in ΔCFI = -.010 and therefore did not meet the recommended cutoff. With this in mind, subsequently more stringent ME/I tests were not conducted. See Table 5 for the fit indices.

Table 5. Study two tests for measurement invariance for a four factor solution for 17 of the 20 items on the Mach IV scale.
Model S-B χ2 df S-B χ2diff [56] Δdf CFI ΔCFI RMSEA (90% CI)
1) Equal form 398.693 230 -- -- .882 -- .055 (.046; .064)
2) Equal factor loadingsa 416.919 243 18.223 13 .885 .003 .055 (.046; .063)
3) Equal indicator error variances for all 17 itemsb 449.797 260 33.157* 17 .875 -.010 .055 (.046; .064)
    Equal indicator error variances for …
    3a) …16 items (without revised item 2b) 449.318 259 32.812** 16 .874 -.011 .055 (.047; .064)
    3b) …16 items (without revised item 4b) 444.496 259 27.585* 16 .877 -.008 .055 (.046; .063)
    3c) …16 items (without revised item 6b) 440.547 259 23.370 16 .880 -.005 .054 (.045; .063)
    3d) …16 items (without revised item 11b) 449.318 259 32.713** 16 .874 -.011 .055 (.047; .064)
    3e) …16 items (without revised item 18b) 438.350 259 21.052 16 .881 -.004 .054 (.045; .062)
    3f) …16 items (without revised item 20b) 449.719 259 33.201** 16 .874 -.011 .055 (.047; .064)

a Comparison of Model 2 to Model 1

b Comparison of Models 3, 3a, 3b, 3c, 3d, 3e, and 3f to Model 2

Note. S-B χ2 = Satorra-Bentler scaled chi-square; df = degrees of freedom; S-B χ2diff = nested scaled χ2 difference requiring adjustment to the chi-square change test; CFI = comparative fit index; ΔCFI = change in CFI for nested models; RMSEA = root mean square error of approximation; 90% CI = 90% confidence interval for RMSEA

* p < .05

** p < .01
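The S-B χ2diff values in Table 5 cannot be obtained by simply subtracting scaled chi-squares; they require the scaled difference correction [56]. The following sketch implements the standard scaled difference formula; all input values in the example call are hypothetical, since the scaling corrections from this study's output are not reported here:

```python
from scipy.stats import chi2

def sb_scaled_diff(t0, df0, c0, t1, df1, c1):
    """Satorra-Bentler scaled chi-square difference test.
    t0, df0, c0: ML chi-square, degrees of freedom, and scaling
    correction of the nested (more constrained) model;
    t1, df1, c1: the same for the less constrained model."""
    # Scaling correction for the difference test; note that cd can
    # be near zero or negative in practice, which invalidates the test
    cd = (df0 * c0 - df1 * c1) / (df0 - df1)
    trd = (t0 - t1) / cd          # scaled difference statistic
    ddf = df0 - df1
    return trd, ddf, chi2.sf(trd, ddf)

# Hypothetical fit values for illustration only
trd, ddf, p = sb_scaled_diff(110.0, 50, 1.2, 90.0, 45, 1.1)
```

The statistic is then referred to the chi-square distribution with Δdf degrees of freedom, as in the table above.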

To ascertain the specific sources of model misfit when error variances were constrained to equivalency, the indicator error variances for the revised items were freed one at a time and the fit of each resulting model was compared to that of the equal factor loadings model (Model 2; see the note to Table 5). When the error variance was freed one at a time for Items 2, 4, 6, 11, 18, and 20, the ΔCFI was -.011, -.008, -.005, -.011, -.004, and -.011, respectively. In sum, the error variances for three of the six revised items were not invariant. See Table 5 for the model fit statistics. The error variance was larger in the control group for Items 2 and 11 but not for Item 20. Specifically, the error variance for Item 2 was .93 in the control group and .88 in the treatment group. The error variance for Item 11 was .99 in the control group and .88 in the treatment group. Finally, the error variance for Item 20 was .59 in the control group and .82 in the treatment group. This suggests that the revised Items 2 and 11 were more reliable (i.e., more of their variance was explained by the common latent construct than by unknown sources) than the original Items 2 and 11. However, for each of these three items, error variance accounted for the majority of the item's total variance. The average variance extracted for the reduced-length Mach IV with 17 items and four factors was nearly identical in the control (22.51%) and treatment (22.11%) groups.
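The average variance extracted figures reported above follow from the standardized loadings: with standardized indicators, each item's error variance is the remainder after squaring its loading. A minimal sketch, using hypothetical loadings rather than this study's estimates:

```python
def average_variance_extracted(std_loadings):
    """Mean squared standardized loading: the average share of each
    item's variance explained by its latent factor."""
    return sum(l * l for l in std_loadings) / len(std_loadings)

# Hypothetical standardized loadings for illustration only
ave = average_variance_extracted([0.5, 0.4, 0.6, 0.3])
```

An AVE near .22, as observed here, means error variance dominates the items, consistent with the weak loadings discussed below.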

Study two discussion

Study Two examined the measurement invariance of originally worded and revised items in the Machiavellian IV scale [1] in an experimental framework. Scores on the two versions of the Mach IV showed equal form and equal factor loadings but unequal indicator error variances. Because noninvariance was detected at step three, the fourth and fifth tests, of equal factor variances and equal factor covariances, respectively, were not conducted. The equal form of the two versions of the scale indicates that the pattern of covariances among the items is similar across the control and treatment groups. The finding of equal factor loadings indicates that the items draw similar variance from their focal underlying constructs (the subscales of Machiavellianism) in both groups. Unequal indicator error variances suggest that the unexplained variance in the scale items is of a different magnitude in the two versions of the instrument.

The RMSEA and the SRMR were both within the range of acceptability in all full-length and reduced-length iterations of the model in both groups, but the CFI fell short for the full-length scale in the treatment group as well as in the multiple group analysis. The CFI for the final 17-item four-factor version of the Mach IV was .914 in the control group and .856 in the treatment group. These CFI values are not out of line with previous efforts by other researchers. Previous CFA analyses of the Mach IV have often also required the elimination of underperforming items and, in at least one case, the revision of the wording of some items. Researchers have reported CFIs ranging from .73 for 20 items using a four-point response scale [57], to .82 for 13 items [8], to .85 for all 20 items [58], to .95 and .98 for ten-item versions of the Mach IV [9].

To further isolate the source of error noninvariance, the six single-item error variances were freed one by one and the fit of each model was compared to that of the equal factor loadings model. The error variances for three of the six revised items (2, 11, and 20) were not invariant. The changes to these original items were: (a) the replacement of the idiomatic expression "…carry more weight" with "…be more influential" in Item 2; (b) the complete rewrite of "Barnum was very wrong when he said there's a sucker born every minute" as "It is wrong to think that most people can be easily tricked," correcting both an archaic reference and an idiomatic expression, in Item 11; and (c) the removal of a strictly gendered reference, such that "Most men are brave" became "Most people are brave," in Item 20. These results suggest that the error associated with these three items, and therefore the reliability of the original and revised forms, was not equal. The alpha coefficient of reliability was .708 for the original 17 items and .687 for the revised scale of the same length.
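The alpha coefficients reported above follow the usual variance decomposition across items. A minimal numpy sketch, using toy data rather than the study's responses:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars / total_var)

# Toy data: two perfectly correlated items yield alpha = 1
alpha = cronbach_alpha([[1, 1], [2, 2], [3, 3]])  # → 1.0
```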

Additionally, in an item-by-item comparison of scores in the control and treatment groups, three items had unequal variances according to Levene's test; all three were revised items. Item 6 about vicious streaks (F = 4.234, p < .05), Item 11 about Barnum and suckers (F = 4.221, p < .05), and Item 18 about men working hard (F = 18.130, p < .001) each had a larger spread of scores in the revised version than in the original version. Item variances for the other items did not differ between the control and treatment groups. There were also significant differences in the means for those three items. Item 6 (t = -4.234, df = 472.66, p < .001, Cohen's d = .381) and Item 18 (t = -6.425, df = 466.15, p < .001, d = .580) had higher means in the revised versions, and Item 11 (t = 7.992, df = 476.49, p < .001, d = .727) had a higher mean in the original version. The means for the other items did not differ. See Table 6 for the results for all items.
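The item-level comparisons above combine Levene's test of equal variances, a Welch-adjusted t-test when variances differ, and Cohen's d. A sketch of that sequence using scipy, with toy score vectors (hypothetical, not the study's data):

```python
import numpy as np
from scipy.stats import levene, ttest_ind

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1))
                        / (n1 + n2 - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Toy item scores for illustration only
control = [1, 2, 3, 2, 1]
treatment = [3, 4, 5, 4, 3]

lev_stat, lev_p = levene(control, treatment)             # equality of variances
t_stat, t_p = ttest_ind(control, treatment,
                        equal_var=False)                  # Welch df adjustment
d = cohens_d(control, treatment)
```

Setting `equal_var=False` produces the non-integer degrees of freedom reported in Table 6 for items failing Levene's test.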

Table 6. Tests of equality of variance and means for original versus revised items.

Item Retained or discarded item? Original or revised item? Levene's test of equality of variance (F) t-test of equality of means (t) df Cohen's d
1. retained original .573 1.734 481 .157
2. retained revised 1.946 -.464 481 .000
3. retained original .018 .076 481 .005
4. retained revised 1.116 -.682 481 .065
5. retained original 3.853 -.876 481 .081
6. retained revised 4.234* -4.234*** 472.66 .381
7. retained original .466 -.044 481 .007
8. retained original .365 -1.001 481 .091
9. discarded revised -- -- -- --
10. retained original .015 .630 481 .050
11. retained revised 4.221* 7.992*** 476.49 .727
12. discarded original -- -- -- --
13. retained original .223 1.641 481 .152
14. retained original .488 -.501 481 .041
15. retained original .141 .459 481 .041
16. discarded revised -- -- -- --
17. retained original 1.617 -.555 481 .049
18. retained revised 18.130*** -6.425*** 466.15 .580
19. retained original .058 .335 481 .031
20. retained revised .041 -1.868 481 .173

Note. Non-integer degrees of freedom for some t-tests are an adjustment because of failing Levene’s test of the homogeneity of variance

* p < .05

*** p < .001

General discussion

The fit of the 17-item four-factor model in the single group and multiple group analyses was not altogether poor, with the RMSEA consistently below the cutoff of .06 and the SRMR in the single group analyses below .08. However, the CFI for both the single group and multiple group analyses fell slightly short at .88. The RMSEA indexes complex model misfit, and its strong performance on these data indicates that the pattern of covariances between the latent constructs (i.e., subscales) was reproduced effectively by the model. The CFI indexes simple model misfit, and it is no surprise that the weak factor loadings limited the model's ability to reproduce these data. In ME/I tests a separate SRMR is produced for each group as an indicator of that group's contribution to fit, but an overall SRMR for the multiple group analysis is not obtainable in LISREL 8.8. Because changes in the CFI are the focus of ME/I tests [3], a marginally acceptable starting value may have affected the results of the subsequent tests on the data.

In sum, revising items in the Mach IV produced little change in model fit. Given these results and those of other CFA tests of the Mach IV, it may be time to move on to other, more recently developed measures of Machiavellianism. The wildly fluctuating number of factors in previous work, the shifting loadings of items on different factors across published studies, and our own ME/I tests suggest that the problems with the Mach IV may be insurmountable. We encourage other researchers to continue developing alternative measures of Machiavellianism and support a move toward the measurement of actual Machiavellian behavior using multiple sources of information, which would aid in the collection of validity evidence for the construct.
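The divergence between RMSEA and CFI noted above reflects their formulas: RMSEA penalizes misfit per degree of freedom and per respondent, so it can look acceptable while the CFI lags. A sketch of the RMSEA point estimate (hypothetical inputs; confidence intervals require iterating on the noncentrality parameter and are omitted here):

```python
from math import sqrt

def rmsea(chisq, df, n):
    """Point estimate of the root mean square error of approximation,
    computed from the model chi-square, degrees of freedom, and
    sample size n."""
    return sqrt(max(chisq - df, 0.0) / (df * (n - 1)))

# Hypothetical fit values for illustration only
print(round(rmsea(90.0, 40, 201), 3))  # → 0.079
```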

The results of research on the measurement of Machiavellianism appear to depend greatly on the intended audience for the instrument. However, much if not most initial psychometric analysis of self-report inventories is conducted using college students. These students are rarely the intended respondents of such scales, and this should be noted as a limitation of the current study. Sample-specific instruments that do not rely on college students in the scale development phase do exist, such as the Kiddie Mach, which was developed for children [1]. With the passage of time, the language of scale items periodically requires updating because of demographic changes in typical respondents. Efforts at changing the wording of some of the items in the Mach IV in the current study produced only minor changes in the fit of the models to the data. The measurement of Machiavellianism is likely more complex than originally envisioned in 1970 [1]. Given the inclusion of Machiavellianism in the Dark Triad [5], measurement research on the construct is likely to proliferate in the future.

Strengths

Some strengths of the current research are its experimental design and its use of the stringent measurement tests associated with CFA and measurement invariance. The age of the respondents was also a strength, given the propensity of psychological researchers to use undergraduate students and given this research's main premise that the lexicon of today's students is likely different from that of students approximately 50 years ago when the scale was developed. Another strength is the use of CFA-based ME/I tests [2], given that less stringent guidelines for ME/I steps and tests have been advocated by others [59,60].

Future research

Researchers may want to pursue several related areas of research. First, to gather evidence of the validity of the revisions to the Mach IV proposed here, they might examine the relationships between the revised Mach IV scale and other constructs. Second, they might conduct an item response theory (IRT) analysis of the revised items. Such an analysis would help ascertain item-level discrimination and difficulty, as well as where on the response scale the most item information is found for the revised items of the Mach IV. Dependable, consistent measurement of any construct is paramount, and an IRT-based analysis of item-level reliability would likely aid the effective measurement of Machiavellianism.
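For dichotomized responses, the IRT analysis suggested above would characterize each item by a discrimination and a difficulty parameter (Likert-type items such as the Mach IV's would more naturally use a graded response model, a straightforward extension). A sketch of the two-parameter logistic (2PL) item response function, with illustrative parameter values only:

```python
from math import exp

def irf_2pl(theta, a, b):
    """2PL item response function: probability of item endorsement at
    trait level theta, given discrimination a and difficulty b."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))

# An item of average difficulty (b = 0) is endorsed with probability .5
# by a respondent at the mean trait level (theta = 0)
p = irf_2pl(0.0, a=1.0, b=0.0)  # → 0.5
```

Item information peaks near theta = b, which is what would locate "where the most item information is found" for each revised item.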

Another promising avenue is to extend this framework by revising problematically worded items in other scales. For example, the Protestant Work Ethic (PWE) instrument [61] and the Conscientiousness scale from the International Personality Item Pool (IPIP) [62] have potential wording problems. The PWE items include non-gender-neutral references such as "The self-made man is likely to be more ethical than the man born to wealth" and "Any man who is able and willing to work hard has a good chance of succeeding." A female-dominated sample of college student respondents may conceptualize these items differently than male respondents do. The IPIP Conscientiousness scale includes archaic phrasing: one item reads "I shirk my duties" (reverse scored). Use of the word "shirk" peaked in 1916 [63] and may be completely unfamiliar to a sample of today's college students. Scales developed almost 50 years ago, like the PWE, or even as recently as 20 years ago, as with the IPIP, may suffer from the sort of archaic references, non-gender-neutral wording, and idiomatic expressions examined in this study and are therefore worthy of examination in a ME/I framework as well.

Data Availability

The online Texas Data Repository (dataverse.tdl.org/dataverse/txstate) is used to share datasets through the Texas Digital Library (TDL) and is managed by local Texas State University librarians. The TDL is a consortium of academic libraries in Texas with a proven history of providing shared technology services to support secure, reliable access to digital collections of research and scholarship. The Texas Data Repository is a project of the TDL and its member institutions to develop a consortial statewide research data repository for researchers at Texas institutions of higher learning. Data are curated in the repository following accepted standards (NISO Framework Advisory Group, 2007). The persistent identifier (a DOI) for the data in this study is https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/WPZSAP.

Funding Statement

The authors received no specific funding for this work.

References

  • 1.Christie R, Geis FL. Studies in Machiavellianism. New York: Academic Press; 1970. [Google Scholar]
  • 2.Cheung GW, Rensvold RB. Testing factorial invariance across groups: A reconceptualization and proposed new method. J Mgmt. 1999; 25: 1–27. [Google Scholar]
  • 3.Cheung GW, Rensvold RB. Evaluating goodness-of-fit indexes for testing measurement invariance. Struct Eq Model. 2002; 9: 233–255. [Google Scholar]
  • 4.Brown TA. Confirmatory factor analysis for applied research. New York: The Guilford Press; 2006. [Google Scholar]
  • 5.Paulhus DL, Williams KM. The dark triad of personality: Narcissism, Machiavellianism, and psychopathy. J Res Pers. 2002; 36: 556–563. [Google Scholar]
  • 6.Jones D. The nature of Machiavellianism: Distinct patterns of misbehavior In Zeigler-Hill V, Marcus DK, editors. The dark side of personality: Science and practice in social, personality, and clinical psychology. Washington, DC: American Psychological Association; 2016. [Google Scholar]
  • 7.Panitz E. Psychometric investigation of the Mach IV scale measuring Machiavellianism. Psych Rep. 1989; 64(3): 963–968. [Google Scholar]
  • 8.Hunter JE, Gerbing DW, Boster FJ. Machiavellian beliefs and personality: Construct invalidity of the Machiavellian dimension. J Pers Soc Psych. 1982; 43(6): 1293–1305. [Google Scholar]
  • 9.Monaghan C., Bizumic B, Sellbom M. The role of Machiavellian views and tactics in psychopathology. Pers Indiv Diff. 2016; 94: 72–81. [Google Scholar]
  • 10.Hanel HP, Vione KC. Do student samples provide an accurate estimate of the general public? Plos One; 2016: 11(12). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Henry PJ. College sophomores in the laboratory redux: Influences of a narrow data base on social psychology’s view of the nature of prejudice. Psycho Inq: Inter J Adv Psych Theory; 2008: 19: 49–71. [Google Scholar]
  • 12.National Center for Education Statistics, Table 187: College enrollment rates of high school graduates, by sex from 1960 to 1998. See www.nces.ed.gov, accessed December 6, 2018.
  • 13.National Center for Education Statistics, What are the new back to school statistics for 2017? See www.nces.ed.gov, accessed December 6, 2018.
  • 14.Ruiz NG. New foreign student enrollment at U.S. colleges and universities doubled since Great Recession. Pew Research Center, November 20, 2017, www.pewresearch.org . [Google Scholar]
  • 15.Tara J. International students in U.S. colleges and universities Top 1 million. Time, November 14, 2017. [Google Scholar]
  • 16.Abel B. English idioms in the first language and second language lexicon: A dual representation approach. Sec Lang Res. 2003; 19(4): 329–358. [Google Scholar]
  • 17.Bortfield H. What native and non-native speakers' images for idioms tell us about figurative language. In Heredia RR, Altarriba J, editors. Advances in Psychology: Vol. 134; 2002. [Google Scholar]
  • 18.Reise SP, Widaman KF, Pugh R.H. Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psych Bull. 1993; 114: 552–566. [DOI] [PubMed] [Google Scholar]
  • 19.Williams ML, Hazleton V, Renshaw S. The measurement of Machiavellianism: A factor analytic and correlational study of Mach IV and Mach V. Speech Mon. 1975; 42(2):151–159. [Google Scholar]
  • 20.Ahmed S, Stewart R. Factor analysis of the Machiavellian Scale. Soc Behav Pers: Int J. 1981; 9: 113–116. [Google Scholar]
  • 21.Andreou E. Bully/victim problems and their association with Machiavellianism and self-efficacy in Greek primary school children. Brit J Ed Psych. 2004; 74(2): 297–309. [DOI] [PubMed] [Google Scholar]
  • 22.Kuo HK, Marsella AJ. The meaning and measurement of Machiavellianism in Chinese and America College Students. J Soc Psych. 1977; 101(2): 165–173. [DOI] [PubMed] [Google Scholar]
  • 23.O'Hair D, Cody MJ. Machiavellian beliefs and social influence. West J Speech Comm. 1987; 51(3): 279–303. [Google Scholar]
  • 24.Dahling JJ, Whitaker BG, Levy PE. The development and validation of a new Machiavellian scale. J Mgmt. 2009; 35: 219–257. [Google Scholar]
  • 25.Miller BK, Konopaske R. Dispositional correlates of perceived work entitlement. J Mgmt Psych. 2014; 29: 808–828. [Google Scholar]
  • 26.Miller BK, Smart DL, Rechner PL. Confirmatory factor analysis of the Machiavellian Personality Scale. Pers Indiv Diff. 2015; 82: 120–124. [Google Scholar]
  • 27.Niemi L, Young L. Caring across boundaries versus keeping boundaries intact: Links between moral values and interpersonal orientations. PLoS One. 2013; 8: e81605 10.1371/journal.pone.0081605 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Rauthmann JF, Will T. Proposing a multidimensional Machiavellianism conceptualization. Soc Behav Pers. 2011; 39: 391–404. [Google Scholar]
  • 29.Rauthmann JF. Towards multifaceted Machiavellianism: Content, factorial, and construct validity of a German Machiavellianism Scale. Pers Indiv Diff. 2012; 52: 345–351. [Google Scholar]
  • 30.Hair JE, Anderson RE, Tatham RL, Black WC. Multivariate data analysis, 5th ed Upper Saddle River, NJ: Prentice Hall; 1998. [Google Scholar]
  • 31.Cummings G. Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York: Routledge; 2012. [Google Scholar]
  • 32.Trafimow D, Marks M. Editorial. Basic App Soc Psych. 2015; 37: 1–2. [Google Scholar]
  • 33.Comrey AL. A first course in factor analysis. New York, NY: Academic Press; 1973. [Google Scholar]
  • 34.Gorsuch RL. Factor analysis, 2nd ed Hillsdale, NJ: Erlbaum; 1983. [Google Scholar]
  • 35.Tabachnik BG, Fidell LS. Using Multivariate Statistics, 5th ed Boston: Pearson; 2007. [Google Scholar]
  • 36.Thompson B, Daniel L. Factor analytic evidence for the construct validity of scores: An historical overview and some guidelines. Ed Psych Meas. 1996; 5: 197–208 [Google Scholar]
  • 37.West SG, Finch JD, Curran PJ. Structural equation models with non-normal data. In Hoyle RH, editor. Structural equation modeling. Thousand Oaks, CA: Sage; 1995. [Google Scholar]
  • 38.Cortina J. What is coefficient alpha? An examination of theory and applications. J App Psych 1993; 78(1): 98–104. [Google Scholar]
  • 39.Campbell DT, Stanley JC. Experimental and quasi-experimental designs for research. Boston: Houghton Mifflin Company; 1963. [Google Scholar]
  • 40.Cook TD, Campbell DT. Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin Company; 1979. [Google Scholar]
  • 41.Bollen KA. Structural equation modeling with latent variables. New York: Wiley; 1989. [Google Scholar]
  • 42.Byrne BM. Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications, and programming. Mahwah, NJ: Lawrence Erlbaum Associates; 1998. [Google Scholar]
  • 43.Byrne BM, Shavelson RJ, Muthèn B. Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psych Bull. 1989; 105: 456–466. [Google Scholar]
  • 44.Kline RB. Principles and Practice of Structural Equation Modeling, 3rd ed New York: Guilford Press; 2011. [Google Scholar]
  • 45.Steenkamp JB, Baumgartner H. Assessing measurement invariance in cross-national consumer research. J Consum Res. 1998; 25: 78–90. [Google Scholar]
  • 46.Bentler PM. Comparative fit indexes in structural equation models. Psych Bull. 1990; 107: 238–246. [DOI] [PubMed] [Google Scholar]
  • 47.Browne MW, Cudeck R. Alternative ways of assessing model fit In Bollen KA, Long JS, editors. Testing structural equation models. California: Sage Publications; 1993. [Google Scholar]
  • 48.Hu LT, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Eq Mod. 1999; 6: 1, 1–55. [Google Scholar]
  • 49.Hopwood CJ, Donnellan MB. How should the internal structure of personality inventories be evaluated? Pers Soc Psych Rev. 2010; 14: 332–346. [DOI] [PubMed] [Google Scholar]
  • 50.Marsh HW, Hau K, Wen Z. In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler’s (1999) findings. Struct Eq Mod. 2004; 11: 320–341. [Google Scholar]
  • 51.Henson RK. Multivariate normality: What is it and how is it assessed? In Thompson B, editor. Advances in social science methodology. Stamford, CT: JAI Press; 1999. [Google Scholar]
  • 52.DeCarlo LT. On the meaning and use of kurtosis. Psychological Methods. 1997; 2: 292–307. [Google Scholar]
  • 53.Bentler PM. Re: Kurtosis, residuals, fit indices. [Message posted to SEMNET listserv]. Msg 011264. Archived at http://bama.ua.edu/cgibin/wa?A2=ind9803&L=semnet&P=R10144&I=1. March 10, 1998. [Google Scholar]
  • 54.Bentler PM, Wu EJ. EQS for Windows User's Guide. Encino, CA: Multivariate Software; 2002. [Google Scholar]
  • 55.Jöreskog K. Sörbom D. LISREL 8.80 [Computer Software]. Chicago: Scientific Software International; 2006. [Google Scholar]
  • 56.Satorra A. Scaled and adjusted restricted tests in multi-sample analysis of moment structures In Heijmans RDH, Pollock DSG, Satorra A, editors. Innovations in Multivariate Statistical Analysis. A Festschrift for Heinz Neudecker. London: Kluwer Academic Publishers; 2000. [Google Scholar]
  • 57.Andrew J, Cooke M, Muncer SJ. The relationship between empathy and Machiavellianism: An alternative to empathizing–Systemizing theory. Pers Indiv Diff. 2008; 44: 1203–1211. [Google Scholar]
  • 58.Corral S, Calvete E. Machiavellianism: Dimensionality of the Mach IV and its relation to self-monitoring in a Spanish sample. Span J Psych. 2000; 3(1): 3–13. [DOI] [PubMed] [Google Scholar]
  • 59.Schmitt N, Kuljanin G. Measurement invariance: Review of practice and implications. Hum Res Mgmt. 2008; 18:210–222. [Google Scholar]
  • 60.Vandenberg RJ, Lance CE. A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Org Res Meth. 2000; 3: 4–70. [Google Scholar]
  • 61.Mirels HL, Garrett JB. The Protestant ethic as a personality variable. J Consult Clin Psych. 1971; 36: 40–44. [DOI] [PubMed] [Google Scholar]
  • 62.Goldberg LR. A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In Mervielde I, Deary I, De Fruyt F, Ostendorf F, editors. Personality Psychology in Europe, Vol. 7. Tilburg, Netherlands: Tilburg University Press; 1999. [Google Scholar]
  • 63.Google Ngram Viewer. Retrieved January 02, 2019, from https://books.google.com/ngrams/graph?content=shirk&year_start=1600&year_end=2017&corpus=15&smoothing=3&share=&direct_url=t1%3B%2Cshirk%3B%2Cc0

Decision Letter 0

Angel Blanch

27 Jun 2019

PONE-D-19-15176

Measurement invariance tests of revisions to archaically worded items in the Mach IV scale

PLOS ONE

Dear Dr. Miller,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please see the comments of the three Reviewers appended at the bottom of this letter. I am sorry that the two-line comment provided by Reviewer #2 is quite useless. In contrast, Reviewer #1 has offered you constructive feedback, which I think could improve the presentation of your study. Because this may be considered a major revision, please note that a resubmission will require an additional round of reviews, and that the final outcome of the process cannot be predicted at this point. If you decide to resubmit a revised version of your manuscript, please provide either a proper answer or a rebuttal to each of the suggestions raised by the Reviewers.

We would appreciate receiving your revised manuscript by Aug 11 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Angel Blanch, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

1. When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

3. Please address the following queries related to the Mach IV scale modified in the current study: 1) If the questionnaire is licensed, do you have permission to use the licensed questionnaire for the purposes of the study? 2) As the questionnaire has been published previously, please state whether you have permission to reprint items of the published questionnaire under a CC-BY license?

Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) how the verbal consent was documented and witnessed. If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comment:

The authors conduct exploratory and measurement invariance analyses of a scale which is intended to measure the construct Machiavellianism. More specifically, they examine the impact of a rewording of some of the items on the structure and measurement properties of the scale.

The method of analysis and the rationale underlying it are described clearly. However, I think that some of the conclusions which are drawn from the analysis are debatable. For instance, an initial step of testing weak measurement invariance results in a mediocre model fit. Given this (mis)fit, the subsequently drawn conclusions from the model comparisons do not necessarily seem stringent (see comments below). Moreover, it should be stated that the reported fit indices are likely to be biased upward, given that preliminary model selection was already applied (that is, a preliminary elimination of “bad” items was conducted). Finally, I also think that the usage of a 4-6-factor solution for a test comprised of fewer than 20 items is problematic (see comments below) and should be discussed appropriately.

Specific comments:

Introduction: The paper would benefit from providing some brief review of the theory underlying the construct of Machiavellianism. In addition, providing the content/wording of all items would be helpful in understanding the proposed factor structure.

34: Factor indeterminacy is a general problem prominent in any FA model.

35-38: This suggests large variation between different (study) populations.

51-53: This could be tested via factorial invariance. Are there any corresponding results in the literature? Why does the current study not include such a test of invariance across gender?

The aim seems to be to optimize the MACH for student populations, as a lot of the presented arguments are tied to changes in student populations, i.e. a change in the proportion of female and international students. This should be mentioned as a limitation because I don't think that students are the primary focus for the use of a diagnostic instrument for Machiavellianism.

115: I would avoid the usage of cut-off scores like .3. On the one hand, they are somewhat arbitrary. But more importantly: It can be shown that even small loadings (below .3) can have a large impact on the inference of the factor scores when diagnosing test takers (see e.g. Jordan & Spiess, 2019).

133: It is correctly stated that alpha only provides a lower bound for the overall reliability. However, even if alpha were sufficiently high, it would - in my opinion – still be of limited use because its reference point is a formally defined "overall" true score which in this case is a mixture of multiple (6) dimensions.

141: There are more reliable approaches to the determination of the number of factors like e.g. Horn's parallel analysis (PA) or the more recently developed “deterministic counterpart” based on random matrix theory (see Dobriban & Owen, 2019). The authors already mentioned large variations of the number of extracted factors across different studies. Hence, I wonder if some part of the variation could be explained by "suboptimal" extraction criteria. In any case, I would suggest the use of PA to determine the number of factors.
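The procedure the reviewer recommends can be sketched in a few lines. Below is a minimal, illustrative implementation of Horn's parallel analysis using NumPy; the function name and defaults are hypothetical, and a real analysis would typically use dedicated software (as the authors later did in JASP).

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis: retain factors whose observed
    correlation-matrix eigenvalues exceed the mean eigenvalues
    obtained from random normal data of the same shape.
    (A common simplification: count all eigenvalues exceeding
    the corresponding random-data mean.)"""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # observed eigenvalues, sorted descending
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # average eigenvalues of random data with the same n and p
    rand_eigs = np.zeros(p)
    for _ in range(n_sims):
        r = rng.standard_normal((n, p))
        rand_eigs += np.linalg.eigvalsh(np.corrcoef(r, rowvar=False))[::-1]
    rand_eigs /= n_sims
    return int(np.sum(obs_eigs > rand_eigs))
```

On data simulated with a single strong common factor, the sketch retains exactly one factor, as expected.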

162: The authors should expand their argument on negative loadings. In general, negative loadings are not a problem unless they contradict the factor label (i.e. an item in an intelligence test with negative loadings would contradict the label "IQ" as test takers with lower IQ would score higher on the item).

233: The examination of normality is not necessary. We already know that normality can't hold due to the discrete (1 to 7) format of the responses.

246-250: How was the CFA specified? By usage of the exact loadings from the EFA or by treating loadings below .3 as zero? In addition, Heywood cases should not arise under ML estimation. If the ML method did not converge (to which I think the Heywood case refers), then this points to problems in the specification of the model.

263: I do not understand what is meant by "the covariances of ... were unidentified". Does this mean that the estimated factor covariance matrix was not positive definite?

264: It seems problematic to use so few items per factor. Measuring a factor with 2-5 items is tantamount to producing scores with low reliability.

275: "unidentified": It would be helpful to distinguish an unidentifiable case from a case wherein the numerical optimization did not converge and/or provided inadequate estimates (e.g. negative variances). Which type are the authors referring to?

308: The problem with the approach for determining various sorts of measurement invariance is that the baseline model does not provide a good fit. Hence, although it seems that given the baseline model, further restrictions are possible without substantially lowering the CFI, the fit of the baseline model itself is somewhat questionable. Given these doubts, I think that the subsequent comparison of error variances is problematic as it relies on a good overall model fit (and not just on a good fit relative to a model with mediocre fit).

372: This line of reasoning should be strengthened by computing model-based reliability estimates. In addition, to demonstrate an effect of item wording, I would recommend appending to the paper a test of the treatment effect. That is, are there any differences in the distribution of the responses to an item between treatment and control group?

395: In the general discussion, critique of the MACH is mentioned. I think it would be important to add and discuss the following topics:

- How can we account for (or interpret) the large variations (e.g. number of factors; number of retained items) across studies?

- If the wording of items has an impact on the reliability (or potentially even on the factor structure) of the scale, then this leaves "us" with measurement devices (self-report questionnaires) which are very fragile. Hence, I think a point could be made here in favor of moving towards other (perhaps more costly) measurement devices (e.g. actual observation of behavior; using multiple sources of information etc.).

- Given the relatively large number of underlying factors (4-6), the researcher/practitioner has the option to

a) either compute an inhomogeneous overall score which refers to a (difficult to interpret) mixture of (4-6) constructs

or

b) to compute dimension specific scores (4-6).

However, choosing b) basically boils down to measure a latent construct by only a few items. Hence, the scores are highly unreliable.
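The low-reliability concern in option b) follows directly from classical test theory: the standardized alpha of a k-item scale is the Spearman-Brown function of k and the mean inter-item correlation. A small illustration (function name hypothetical):

```python
def standardized_alpha(k, r_bar):
    """Standardized coefficient alpha (Spearman-Brown form) for a
    k-item scale whose items intercorrelate r_bar on average."""
    return k * r_bar / (1 + (k - 1) * r_bar)
```

With a modest mean inter-item correlation of .25, a two-item subscale reaches an alpha of only .40, a five-item subscale about .63, and it takes roughly ten items to approach .77.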

- Related to the previous point, but broadening the scope: A topic of central importance in the analysis of the replication crisis in psychology is the role of measurement error (see e.g. Loken & Gelman, 2017), i.e. measurements in psychology are in general rather noisy. I think that by using such short (on average 4-5 items per factor) subscales, classical test theory would predict unreliable, noisy measurements. Thus, their subsequent usage might entail all of the problems which were discussed within the context of the low replicability of psychological science. Hence, I regard this as an additional argument against the usage of the scale.

References:

Dobriban, E. & Owen, A. B. (2019). Deterministic parallel analysis: an improved method for selecting factors and principal components. J. R. Stat. Soc. B, 81: 163-183. doi:10.1111/rssb.12301.

Jordan, P. & Spiess, M. (2019). Rethinking the interpretation of item discrimination and factor loadings, Educational and Psychological Measurement.

Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585.

Review is also available in the attached file.

Reviewer #2: The manuscript offers all the statistical information; also, an exhaustive and rigorous analysis process has been made. However, a more consistent theoretical introduction is lacking.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Review MACH.docx

PLoS One. 2019 Oct 16;14(10):e0223504. doi: 10.1371/journal.pone.0223504.r002

Author response to Decision Letter 0


1 Aug 2019

Editorial Comments:

1. When submitting your revision, we need you to address these additional requirements. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

Authors’ response: Thank you. We have done so.

2. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Authors’ response: Thank you. We will be happy to do so.

3. Please address the following queries related to the Mach IV scale modified in the current study: 1) If the questionnaire is licensed, do you have permission to use the licensed questionnaire for the purposes of the study? 2) As the questionnaire has been published previously, please state whether you have permission to reprint items of the published questionnaire under a CC-BY license?

Authors’ response: To our knowledge the instrument is in the public domain. It has been published in its entirety many, many times by numerous researchers and internet web sites. For example, Rauthmann (2013) published the entire scale in his article using IRT to analyze the Mach IV. The web site known as “A conscious rethink” published the entire scale at https://www.aconsciousrethink.com/6299/machiavellian-scale-test/ , for example. Hundreds of sites and articles have republished it in its entirety.

4. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) how the verbal consent was documented and witnessed. If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.”

Authors’ response: The need for consent was waived by our university’s Institutional Review Board who approved the study as exempt. Nevertheless, consent was indeed informed and acknowledged by respondents via signed consent forms. The forms were collected separately from the actual paper-and-pencil surveys so as to maintain the anonymity of respondents and were used to record participation by the professor from whose class respondents were voluntarily solicited so as to award some very, very minor extra credit in the course.

Reviewer #2 Comments:

1. The manuscript offers all the statistical information; also, an exhaustive and rigorous analysis process has been made. However, a more consistent theoretical introduction is lacking.

Authors' reply: We thank the reviewer for their comment on our rigor. We take great pride in our statistical prowess and are happy that the reviewer noticed. Given the broad audience for PlosONE we were remiss in our duties with the original submission regarding the theoretical foundation. The other reviewer made the same suggestion. Thus, we have included an entirely new introductory paragraph starting on line 27 and slightly altered the lead sentence in the now-second paragraph as follows:

“Machiavellianism is the predisposition to manipulate interpersonal relationships with guile, opportunism, and deceit. Research on this set of traits began in earnest with the development of a self-report inventory [1] based upon the lead character in Niccolo Machiavelli’s sixteenth century novel “The Prince”. The title character lacked morality and empathy and had ample distrust of others. The scale authors [1] painstakingly developed items designed to measure the character’s tendency toward amorality, to have negative views of human nature, and to employ interpersonal tactics designed for personal gain at the expense of others [1]. The result was the Mach IV which is a 20-item scale comprised of three subscales: morals, views, and tactics. This instrument has become the most frequently used scale to measure Machiavellianism [1]. Interest in Machiavellianism was furthered by its inclusion as one of the three traits comprising the relatively new Dark Triad [5]. The other two traits are…” We hope that this will suffice but would be happy to add more should the reviewer or the editor require it.

Reviewer #1 Comments:

The authors conduct exploratory and measurement invariance analyses of a scale which is intended to measure the construct Machiavellianism. More specifically, they examine the impact of a rewording of some of the items on the structure and measurement properties of the scale.

1. The method of analysis and the rationale underlying it are described clearly. However, I think that some of the conclusions which are drawn from the analysis are debatable. For instance, an initial step of testing weak measurement invariance results in a mediocre model fit. Given this (mis)fit, the subsequently drawn conclusions from the model comparisons do not necessarily seem stringent (see comments below).

Authors’ reply: We must admit that we see our paper as a sort of last-ditch effort to save the Mach IV scale. As we note in our paper, the number of factors extracted by previous authors has varied wildly, and the need to eliminate many items to achieve model fit is also commonplace. We want to be polite to Christie and Geis, however. With this sort of professional courtesy in mind we treaded lightly on our view of the scale. We thought that by revising some of the problematically worded items we could perhaps save the instrument from its current precipitous decline in usage due in part to other newly developed scales that we note in the paper and the well documented problems with the Mach IV scale. As it turns out, our revised scale shows some differences in the error variances leading one to conclude that its reliability differs from the original. We address the reviewer’s points one-by-one below and thank the reviewer for their insightful comments and suggestions.

2. Moreover, it should be stated that the reported fit-indices are likely to be biased upward, given that preliminary model selection was already applied (that is, a preliminary elimination of “bad” items was conducted).

Authors’ reply: We agree and have added the following text starting on line 294: “However, because of sequential item elimination these fit statistics are likely biased upward and subsequent analysis based on this shortened scale should be viewed with caution given the changes to the instrument and its factor structure that were required to achieve model fit.“ We do ask the reviewer to note that previous efforts at factor analysis of the inventory required very similar item elimination and factor reductions. In essence, our factor analytic results are not very different from those of other researchers on the Mach IV instrument.

3. Finally, I also think that the usage of a 4-6-factor solution for a test comprised of less than 20 items is problematic (see comments below) and should be discussed appropriately.

Authors’ reply: We apologize for not being more transparent in our review of others’ factor analytic work on the Mach IV. The reviewer will hopefully note that Table 1 contains ample previous evidence of EFA/PCA analyses of the Mach IV that found between 3 and 7 factors for the 20 items. This sort of factor indeterminacy seems to be the norm in analysis of the Mach IV. In fact, Christie and Geis (1970) recommended a three-factor solution with one factor comprised of only two items. Williams et al.’s (1975) multi-factor solution had two factors with only two items each loading on them. Kuo and Marsella’s (1977) U.S. sample allowed six items to cross-load on at least two different factors. On page 6 the original text mentions the following: “This variety of factor solutions and very different loadings of items on factors in the Mach IV instrument suggests that further factor analyses are in order to build an understanding of the underlying factor structure of the Mach IV.“ However, we agree with the reviewer that this is not likely to be enough. We have amended the last sentence in the paragraph ending on page 19 to read as follows: "It is noteworthy that two of the three items discarded were revised items and that this factor structure is not dissimilar to that of previous factor analyses of the Mach IV instrument, which vary wildly from three to seven factors on items ranging in number from 10 to 20.“ Please also note that the original text on page 24 reads as follows: “Previous CFA analyses of the Mach IV often also required the elimination of underperforming items and at least once, the revision of the wording for some items. Researchers found that the CFI ranged from .73 for 20 items using a four-point response scale [57], to .82 for 13 items [8], to .85 for all 20 items [58], and to .95 and .98 for ten-item versions of the Mach IV [9].“ We hope the reviewer agrees that we have adequately addressed this very valid concern and that our models are not dissimilar to those found in previous research on the Mach IV.

Specific comments:

4. Introduction: The paper would benefit from providing some brief review of the theory underlying the construct of Machiavellianism.

Authors’ reply: The other reviewer also made this suggestion. Thus, we have included an entirely new introductory paragraph starting on line 27, which reads as follows: “Machiavellianism is the predisposition to manipulate interpersonal relationships with guile, opportunism, and deceit. Research on this set of traits began in earnest with the development of a self-report inventory [1] based upon the lead character in Niccolo Machiavelli’s sixteenth century novel “The Prince”. The title character lacked morality and empathy and had ample distrust of others. The scale authors [1] painstakingly developed items designed to measure the character’s tendency toward amorality, to have negative views of human nature, and to employ interpersonal tactics designed for personal gain at the expense of others [1]. The result was the Mach IV which is a 20-item scale comprised of three subscales: morals, views, and tactics. This instrument has become the most frequently used scale to measure Machiavellianism [1]. Interest in Machiavellianism was furthered by its inclusion as one of the three traits comprising the relatively new Dark Triad [5]. The other two traits are...“

We hope that this helps alleviate the reviewer’s concerns but would be delighted to add more if this is not sufficient.

5. In addition, providing the content/wording of all items would be helpful in understanding the proposed factor structure.

Authors’ reply: We suggest that the items not revised are of less concern than those which did undergo revision in our study. The original and revised items are included in Table 3, and the items not revised are nevertheless still important but play only a minor part in our analysis. The entire scale has been republished in its entirety scores, if not hundreds, of times and is available elsewhere. We are fans of Rauthmann’s (2013) IRT paper published in the Journal of Personality Assessment, which contains the full scale.

6. Line 34: Factor indeterminacy is a general problem prominent in any FA model.

Authors’ response: We agree and have added the following to line 48: “…are not unlike other lengthy scales with obliquely related subscales...“ Thank you for this suggestion.

7. Lines 35-38: This suggests large variation between different (study) populations.

Authors’ response: Thank you. We have added “…due in part to the diversity of populations to which it has been administered“ to those lines. This is important to note.

8. Lines 51:53: This could be tested via factorial invariance. Are there any corresponding results in the literature. Why does the current study not include such a test of invariance across gender?

Authors’ response: To our knowledge this has not been studied yet. However, the current study is about revised versus original items, not males versus females per se. Part of our argument, however, is that some modern women may object to non-gender-neutral items. As the reviewer suggests, the next step is to test whether males and females conceptualize our revised items differently in tests of measurement invariance. It is, perhaps, a study that we will do in the future. We thank the reviewer for this query, but it is beyond the scope of the current study due in part to the sample size requirements associated with statistical power and the very specific focus of the current paper.

9. The aim seems to be to optimize the MACH for student populations, as a lot of the presented arguments are tied to changes in student populations, i.e. a change in the proportion of female and international students. This should be mentioned as a limitation because I don't think that students are the primary focus for the use of a diagnostic instrument for Machiavellianism.

Authors’ response: We understand this point very clearly. It is true that our study focuses on student respondents and the Mach IV was not designed to measure Machiavellianism only in students. However, our focus on students as respondents is because of their prolific use in the initial scale development stage of many if not most self-report inventories. To that end we have added the following to pages 26-27: “However, much, if not most, initial psychometric analysis of self-report inventories is conducted using college students. These students are rarely the intended respondents of such scales and this should be noted as a limitation of the current study. Sample-specific instruments do exist which do not make use of college students in the scale development phase such...“ Thank you very much for this reminder.

10. Line 115: I would avoid the usage of cut-off scores like .3. On the one hand, they are somewhat arbitrary. But more importantly: It can be shown that even small loadings (below .3) can have a large impact on the inference of the factor scores when diagnosing test takers (see e.g. Jordan & Spiess, 2019).

Authors’ response: We apologize for not highlighting more strongly the arbitrariness of such cutoffs in our original submission. We did this because we needed to decide in some manner which items to force onto which factors in the CFA analysis that followed. The CFA analysis did not allow cross-loadings, as is the norm. Using a stricter cutoff of .4 would have resulted in six of the 20 original items not loading at all on any factor. A more lenient cutoff of .2 would have forced us to include six items with cross-loadings in our CFA analysis, which we very much wanted to avoid. Because our intent was to confirm the EFA factor structure with CFA in the next step of our study, we had to use some cutoffs. We hope that our current caution about using cutoffs is enough to satisfy the reviewer. However, we would be delighted to include any additional information should it be absolutely required. We thank the reviewer for this suggestion.
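As a rough sketch of the procedure described here — zeroing EFA loadings below a cutoff and forbidding cross-loadings when specifying the CFA — consider the following illustrative helper (names and example values are hypothetical, not the actual Mach IV loadings):

```python
import numpy as np

def cfa_pattern(loadings, cutoff=0.3):
    """Derive a simple-structure CFA specification from an EFA
    loading matrix (items x factors): loadings below the cutoff in
    absolute value are fixed to zero, and each item keeps only its
    largest surviving loading so no cross-loadings remain."""
    L = np.where(np.abs(loadings) >= cutoff, loadings, 0.0)
    pattern = np.zeros_like(L)
    for i, row in enumerate(L):
        j = int(np.argmax(np.abs(row)))
        if row[j] != 0.0:
            pattern[i, j] = row[j]
    return pattern
```

For example, a hypothetical item loading .45 and .35 on two factors would be assigned only to the first factor; an item whose loadings are all below the cutoff would load nowhere (and be a candidate for elimination).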

11. Line 133: It is correctly stated that alpha only provides a lower bound for the overall reliability. However, even if alpha were sufficiently high, it would - in my opinion – still be of limited use because its reference point is a formally defined "overall" true score which in this case is a mixture of multiple (6) dimensions.

Authors’ response: We agree and have added the following to the middle of the sentence of note on page 12: “…the difficulty of estimating a baseline true score resulting from scale...“ Thank you for this reminder.

12. Line 141: There are more reliable approaches to the determination of the number of factors like e.g. Horn's parallel analysis (PA) or the more recently developed “deterministic counterpart” based on random matrix theory (see Dobriban & Owen, 2019). The authors already mentioned large variations of the number of extracted factors across different studies. Hence, I wonder if some part of the variation could be explained by "suboptimal" extraction criteria. In any case, I would suggest the use of PA to determine the number of factors.

Authors’ response: We thank the reviewer. After reading this suggestion, we ran parallel analysis in JASP software, which easily conducts EFA. The parallel analysis also resulted in a six-factor solution. We have added some text to reflect this and we thank the reviewer for recommending it to us.

13. Line 162: The authors should expand their argument on negative loadings. In general, negative loadings are not a problem unless they contradict the factor label (i.e. an item in an intelligence test with negative loadings would contradict the label "IQ" as test takers with lower IQ would score higher on the item).

Authors’ response: Thank you for this suggestion that we expand upon our findings/assertions. These negative loadings do indeed contradict the factor labels. We have added the following two sentences to pages 12-13 in the hope of further clarifying this issue for future readers: “That is, items with negative loadings tend to measure the opposite end of the Machiavellianism spectrum and when summed with positively loading items the overall scale score is closer to the middle range than either the low or high ranges. Therefore, the Mach IV is unlikely to be effective in discriminating low or high Machs from mid-range Machs.“ We hope that this is what the reviewer had in mind but would be delighted to expand upon this issue further if another revise and resubmit is graciously offered or required.
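The cancellation effect described in these added sentences can be demonstrated with a toy simulation (all values hypothetical): an item loading negatively on the trait, summed with a positively loading item without reverse-scoring, yields a total that barely tracks the trait at all.

```python
import numpy as np

rng = np.random.default_rng(1)
mach = rng.standard_normal(10_000)            # latent trait scores
noise = rng.standard_normal((10_000, 2))
pos_item = 0.8 * mach + 0.6 * noise[:, 0]     # item loading +.8
neg_item = -0.8 * mach + 0.6 * noise[:, 1]    # item loading -.8

# Summed without reverse-scoring, the trait components cancel and
# the total is nearly uncorrelated with the trait itself, pushing
# summed scores toward the middle of the range:
total = pos_item + neg_item
r_total = np.corrcoef(mach, total)[0, 1]
```

Reverse-scoring `neg_item` before summing would instead restore a strong trait-total correlation.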

14. Line 233: The examination of normality is not necessary. We already know that normality can't hold due to the discrete (1 to 7) format of the responses.

Authors’ response: We will politely disagree here. The two issues that determine whether the default maximum likelihood estimation procedure can be used are coarseness of measurement and multivariate normality. Scales with as few as four points have been determined to not be so coarse as to require weighted least squares estimation. Our response scale had seven points. However, the data were not multivariate normal. The first step in the assessment of item normality is the determination of univariate normality. Because univariate normality is a necessary but not sufficient condition for multivariate normality, data that are not univariate normal cannot be multivariate normal. Our data were univariate normal according to commonly accepted cutoffs for skewness and kurtosis. Therefore we then had to test for multivariate normality by calculating Mardia’s coefficient; had the items not been univariate normal, this step would have been unnecessary. Because the items were not coarsely measured but the data were not multivariate normal, we used the Satorra-Bentler corrections. If the data had been more coarsely measured but multivariate normal, we would have used weighted least squares estimation. We think coverage of how we determined the appropriate estimator is critical to our analyses and is sorely lacking from much of the research that we read. We hope the reviewer will allow us to keep this information in the manuscript but will relent and delete it if absolutely required.
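For readers unfamiliar with these checks, the quantities involved can be sketched as follows (illustrative NumPy code with hypothetical function names, not the authors' actual computations; Lisrel and other SEM packages report Mardia's coefficient directly):

```python
import numpy as np

def univariate_skew_kurtosis(x):
    """Sample skewness and excess kurtosis of one item; both are
    near zero for normally distributed responses."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean(), (z ** 4).mean() - 3.0

def mardia_kurtosis(X):
    """Mardia's multivariate kurtosis for an n x p data matrix:
    the average fourth power of each observation's Mahalanobis
    distance. Under multivariate normality its expected value
    is p * (p + 2)."""
    n, p = X.shape
    centered = X - X.mean(axis=0)
    s_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    d2 = np.einsum('ij,jk,ik->i', centered, s_inv, centered)
    return (d2 ** 2).mean()
```

For three truly multivariate-normal variables the statistic hovers around 3 * (3 + 2) = 15; values substantially above that benchmark signal the excess multivariate kurtosis that motivates Satorra-Bentler corrections.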

15. Lines 246-250: How was the CFA specified? By usage of the exact loadings from the EFA or by treating loadings below .3 as zero? In addition, heywood cases should not arise under ML-estimation. If the ML method did not converge (to which I think the heywood case refers), then this points to problems in the specification of the model.

Authors’ response: We regret the omission of details regarding the CFA model specification. We apologize. Factor loadings less than .3 from the EFA test were indeed treated as zero. To use the exact loadings of the EFA model in a cross-loaded CFA test would likely have resulted in an overly complicated model with near perfect fit. Our goal, like that of most CFA tests, was to find as parsimonious a model of the data as possible. To address our omission we have added the following paragraph to page 15: “In the CFA model tests that follow, the factor structure arising from the previous EFA test was imposed on the factor structure with items loading less than .30 in the EFA being forced to zero in the CFA. The goal was to use as parsimonious a model of the data as possible. Therefore error terms were not allowed to correlate and items were not allowed to cross-load on multiple factors.“ Please also note that in the original version of our paper the following text was found in the subsection on "Preliminary single group analysis" (now on page 17) and reads as follows: "In the preliminary analysis with the control group examining the six-factor structure of the 20-item Mach IV found via EFA in Study One."

As to the Heywood cases being evidence of non-convergence, the reviewer is correct, and the fact that they occurred is what forced us to rework the specification of our model by eliminating items and factors. Technically, Heywood cases are not uncommon in ML estimation and certainly arise more often in ML than in either ordinary least squares or generalized least squares estimation procedures.

16. Line 263: I do not understand what is meant by "the covariances of ... were unidentified". Does this mean that the estimated factor covariance matrix was not positive definite?

Authors’ response: Yes, it does. The term “unidentified” is part of the Lisrel error message we received. Another way of describing such an error is to say that the relationship between the two variables is so unstable as to not allow for the estimation of one best covariance. We have added some verbiage to that effect on page 18: “indicating that the covariance was unstable and not likely to result in any particular best estimate of the data.“ We hope this clarifies things a bit for the reviewer.

17. Line 264: It seems problematic to use such a few number of items per factor. Measuring a factor by 2-5 items is tantamount to produce scorings with low reliability.

Authors’ response: We agree and note in Table 1 how common such small items-per-factor counts are, including in the original scale by Christie and Geis. This is just one more of the many problems with this scale.

18. Line 275: "unidentified": It would be helpful to distinguish an unidentifiable case from a case wherein the numerical optimization did not converge and/or provided inadequate estimates (e.g. negative variances). Which type are the authors referring to?

Authors’ response: This is a typographical error of sorts on our part. We have changed the verbiage to refer to the problem with item 9 in the 18-item CFA model as having “negative error variance”. We thank the reviewer for bringing this to our attention.

19. Line 308: The problem with the approach for determining various sorts of measurement invariance is that the baseline model does not provide a good fit. Hence, although it seems that given the baseline model, further restrictions are possible without substantially lowering the CFI, the fit of the baseline model itself is somewhat questionable. Given these doubts, I think that the subsequent comparison of error variances is problematic as it relies on a good overall model fit (and not just on a good fit relative to a model with mediocre fit).

Authors’ response: We thank the reviewer for this keen insight. As part of our response to the reviewer’s general comment #2, we address this here. Previous researchers have had to engage in all sorts of machinations to examine the model fit of the Mach IV. Ours is no different. The scale is a bit of a mess. The comparison of error variances is just the next step in Brown’s (2006) bottom-up approach to measurement invariance tests as codified by Cheung and Rensvold (1999; 2002). It is true that our changes in model fit from step to step are comparing one bad apple to another, but we worked with what we had. Our data collection process was tightly controlled and we trust the responses to our surveys. Even when revised, the scale still has problems. We wish it were different. We really do.

20. Line 372: This line of reasoning should be strengthened by computing model-based reliability estimates. In addition, to demonstrate an effect of item wording, I would recommend appending to the paper a test of the treatment effect. That is, are there any differences in the distribution of the responses to an item between treatment and control group?

Authors’ response: Our original manuscript included a lengthy examination of item-by-item error variances immediately following Table 5. Overall, the two 17-item scales had similar coefficient alphas. We have added the following to the results section for Study Two: "The alpha coefficient of reliability was .708 for the original 17 items and .687 for the revised scale of the same length." We hope that this is what the reviewer had in mind. We also conducted independent-samples t-tests on the items as well as Levene’s test of the homogeneity of variance between the control and treatment groups. The results are rather brief, as few differences exist, and we have added a paragraph detailing them to the end of the Study Two results section. It reads as follows: "Additionally, in an item-by-item comparison of scores in the control and treatment groups, only two items had unequal variances by virtue of Levene’s test. Item 11 about Barnum and suckers resulted in F = 4.86 (p < .05) and item 18 about men and hard work resulted in F = 20.985 (p < .01). For these two items the spread of scores was significantly larger in the treatment group than in the control group. Regarding mean differences in scores on the items, item 6 about vicious streaks resulted in t = -4.19 (p < .001), item 11 resulted in t = 7.76 (p < .001), and item 18 resulted in t = -6.60 (p < .001). For items 6 and 18 the mean score was higher in the treatment group, but the mean for item 11 was higher in the control group. All in all, there were significant differences in either the distribution of scores or the mean scores on three items, each of which was a revised item." We hope that these rather brief item comparisons are acceptable to the reviewer, and we thank the reviewer for this very important suggestion.
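For concreteness, the coefficient alpha calculation mentioned above can be sketched in a few lines. The Python example below computes Cronbach's alpha from a respondents-by-items score matrix; the data are simulated for illustration only and are not the Mach IV responses, so the resulting value is not the .708 or .687 reported in the paper.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix.

    Items are assumed to already be corrected for reverse scoring,
    as described in the paper before the EFA and CFA were run.
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Simulated data for illustration only (not the Mach IV responses):
# five items sharing one latent factor plus independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
scores = latent + rng.normal(size=(200, 5))
print(round(cronbach_alpha(scores), 3))
```

Because each simulated item is the latent factor plus unit-variance noise, alpha here lands near the Spearman-Brown prediction for five items with inter-item correlation of about .5.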

21. Line 395: In the general discussion, critique of the MACH is mentioned. I think it would be important to add and discuss the following topics:

Authors’ response: Thank you. See below.

21A. - How can we account for (or interpret) the large variations (e.g. number of factors; number of retained items) across studies?

Authors’ response: Hmmm. This is a good question. We assume that it is largely, though not completely, due to idiosyncrasies in different groups of respondents. Our analysis shows that the item wording had only a minimal effect on model fit. The problem surely lies at the intersection of the items and the survey respondents; that is, different respondents interpret different items differently. That, of course, reflects a weakness of the original scale. It was a giant leap forward in measuring dark personality traits 50 years ago, but modern psychometric analyses suggest that it may simply be time to retire this scale, as we noted at the end of the first paragraph of the general discussion section. As we mentioned in our response to the reviewer’s general comment #1, we want to be polite to people in general and to our colleagues engaged in this sort of research in particular, so we have added the following to this paragraph based upon the reviewer’s query: “The wildly fluctuating number of factors resulting from previous work, the oddly changing loading of items on different factors in different published studies, and our own ME/I tests suggest that the problems with the Mach IV might be insurmountable.”

21B. - If the wording of items has an impact on the reliability (or potentially even on the factor structure) of the scale, then this leaves "us" with measurement devices (self-report questionnaires) which are very fragile. Hence, I think a point could be made here in favor of moving towards other (perhaps more costly) measurement devices (e.g. actual observation of behavior; using multiple sources of information etc.).

Authors’ response: We agree and have added the following to the aforementioned paragraph above: “We encourage other researchers to continue their development of alternative measures of Machiavellianism and support a move toward the measurement of actual Machiavellian behavior using multiple sources of information, which will surely aid in the collection of validity evidence for the construct.”

21C. - Given the relatively large number of underlying factors (4-6), the researcher/practitioner has the option to either a) compute an inhomogeneous overall score which refers to a (difficult to interpret) mixture of (4-6) constructs, or b) compute dimension-specific scores (4-6). However, choosing b) basically boils down to measuring a latent construct by only a few items. Hence, the scores are highly unreliable.

Authors’ response: We agree and suggest that those administering the scale diagnostically to understand Machiavellian facets are best served by subscale scores. However, as the reviewer notes, these very short measures are inherently unreliable. In contrast, simply summing the scores on the subscales for an overall measure of Machiavellianism is painting with a broad brush. Additionally, as the reviewer notes, adding heterogeneous subscales together to magically achieve an acceptable reliability is problematic in its own right.

21D. - Related to the previous point, but broadening the scope: A topic of central importance in the analysis of the replication crisis in psychology is the role of measurement error (see e.g. Loken & Gelman, 2017), i.e. measurements in psychology are in general rather noisy. I think that by using such short (on average 4-5 items per factor) subscales, classical test theory would predict unreliable, noisy measurements. Thus, their subsequent usage might entail all of the problems which were discussed within the context of the low replicability of psychological science. Hence, I regard this as an additional argument against the usage of the scale.

Authors’ response: We agree and, as noted previously, we want to be polite in our mild condemnation of the scale. We have added the verbiage alluded to above, which we hope is strong enough for the reviewer. Overall, we hope that we have adequately addressed the reviewer's points and made appropriate changes to the paper where needed. Our paper is much better because of the reviewer, and we hope the reviewer and editor agree. Thanks!

References:

Dobriban, E. & Owen, A. B. (2019). Deterministic parallel analysis: an improved method for selecting factors and principal components. J. R. Stat. Soc. B, 81: 163-183. doi:10.1111/rssb.12301.

Jordan, P. & Spiess, M. (2019). Rethinking the interpretation of item discrimination and factor loadings. Educational and Psychological Measurement.

Loken, E., & Gelman, A. (2017). Measurement error and the replication crisis. Science, 355(6325), 584–585.

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Angel Blanch

29 Aug 2019

[EXSCINDED]

PONE-D-19-15176R1

Measurement invariance tests of revisions to archaically worded items in the Mach IV scale

PLOS ONE

Dear Dr. Miller,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Oct 13 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Angel Blanch, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have addressed most of the points which were raised. They also provided convincing arguments for their approach. Some comments/suggestions remain:

General:

I can relate to the authors aim of providing a polite and respectful critique of the MACH. However, in my opinion, in some cases this aim has led to formulations within the paper which do almost give an ambiguous notion about the usefulness of the scale, when in fact the usefulness of the scale could have heavily been called into question. I realize that a direct statement of this critique risks being interpreted as disrespectful – however, I think that the benefit, that readers/researchers get a clear-cut impression on the properties of the scale outweighs this risk. However, this is just a matter of opinion and there is no need for the authors to address this point (in fact, the revised version contained clearer formulations of the central properties of the scale.)

Specific:

l 190+: Maybe I do not understand this correctly, but the argument in terms of the negative loading does not seem convincing to me. If it is just a problem with respect to the computation of the sumscore, then one could resolve this issue by reversing the item (or by scoring it with a negative weight). A more convincing argument would deduce a contradiction between the quantity the factor is supposed to measure and the anticipated sign of the item that loads on this factor.

l 299+: It might be confusing to mention a sixth factor “However, the covariance between Factor 4 and Factor 6 was unidentified” within a five-factor model. Perhaps the authors could add a short statement to clarify this. (I guess, it refers to the elimination of the former fifth factor which was related to a Heywood case)

l 417+: I appreciate the added information on the treatment-control comparison. However, in order to avoid filtering the results in terms of statistical significance, I would recommend to provide a table which contains the results of the comparisons on all relevant items. That is, the table should also include the results of nonsignificant comparisons. Ideally, an estimate of an effect size measure (computed for significant as well as for nonsignificant comparisons) should also be added to enhance the evaluation of the treatment.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2019 Oct 16;14(10):e0223504. doi: 10.1371/journal.pone.0223504.r004

Author response to Decision Letter 1


9 Sep 2019

Response to Reviewers

Reviewer #1: The authors have addressed most of the points which were raised. They also provided convincing arguments for their approach. Some comments/suggestions remain:

General:

I can relate to the authors aim of providing a polite and respectful critique of the MACH. However, in my opinion, in some cases this aim has led to formulations within the paper which do almost give an ambiguous notion about the usefulness of the scale, when in fact the usefulness of the scale could have heavily been called into question. I realize that a direct statement of this critique risks being interpreted as disrespectful – however, I think that the benefit, that readers/researchers get a clear-cut impression on the properties of the scale outweighs this risk. However, this is just a matter of opinion and there is no need for the authors to address this point (in fact, the revised version contained clearer formulations of the central properties of the scale.)

Authors’ response: We thank the reviewer for understanding our plight. We “walked a tightrope” a bit but are glad that the reviewer is pleased with our revisions that do indicate our point of view of the scale.

Specific:

l 190+: Maybe I do not understand this correctly, but the argument in terms of the negative loading does not seem convincing to me. If it is just a problem with respect to the computation of the sumscore, then one could resolve this issue by reversing the item (or by scoring it with a negative weight). A more convincing argument would deduce a contradiction between the quantity the factor is supposed to measure and the anticipated sign of the item that loads on this factor.

Authors’ response: Ah… now we understand. We regret not being clearer in the original version of our paper, where we simply listed "reverse coded" next to some example items. We actually recoded the reverse-scored items before submitting them to the EFA and the CFA. We have added the following to lines 148-149: "All items were corrected for reverse scoring before the EFA was conducted." Additionally, on lines 205-206 we added the following: "All items were corrected for reverse scoring before the CFA was conducted." So, even after reverse coding the reverse-scored items, we found that some items loaded negatively on factors that also had positively loading items. This is problematic because items designed to measure facets of a construct should be positively correlated with that facet as well as with the other items measuring the construct; some items in the Mach IV are not positively correlated with each other. We thank the reviewer for pointing out how we may have skipped by this without a proper explanation, and we hope that our corrections to the paper and our explanation make things a bit clearer now.

l 299+: It might be confusing to mention a sixth factor “However, the covariance between Factor 4 and Factor 6 was unidentified” within a five-factor model. Perhaps the authors could add a short statement to clarify this. (I guess, it refers to the elimination of the former fifth factor which was related to a Heywood case)

Authors' response: This is a very good point. We have added the following to lines 277-278 in the hopes of clarifying our nomenclature: "In the following CFA we refer to the items of the Mach IV as well as the factors that emerged from the EFA by our original numbering system."

l 417+: I appreciate the added information on the treatment-control comparison. However, in order to avoid filtering the results in terms of statistical significance, I would recommend to provide a table which contains the results of the comparisons on all relevant items. That is, the table should also include the results of nonsignificant comparisons. Ideally, an estimate of an effect size measure (computed for significant as well as for nonsignificant comparisons) should also be added to enhance the evaluation of the treatment.

Authors' response: We have added the suggested table. In addition to the information requested, we have added a column indicating whether an item was discarded or retained in the measurement invariance tests and a column indicating whether the item was a revised or an original item. To match this new Table 6 we have rewritten the verbiage on lines 415-425 as follows: "Additionally, in an item-by-item comparison of scores in the control and treatment groups, three items had unequal variances by virtue of Levene’s test. All were revised items. Item 6 about vicious streaks (F = 4.234, p < .05), item 11 about Barnum and suckers (F = 4.221, p < .05), and item 18 about men working hard (F = 18.130, p < .001) each had a larger spread of scores in the revised version than in the original version of the items. Item variances for the other items in the control and treatment groups were not different. There were also significant differences in the means for those three items. Item 6 (t = -4.234, df = 472.66, p < .001, Cohen's d = .381) and item 18 (t = -6.425, df = 466.15, p < .001, d = .580) had higher means in the revised versions of the items, and item 11 (t = 7.992, df = 476.49, p < .001, d = .727) had a higher mean in the original item. Item means for the other items were not different. See Table 6 for the results for all items." Our new table begins on line 426. Thank you very much for your insightful suggestions and keen eye.
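This style of item-by-item comparison can be sketched in code. The Python example below runs Levene's test, a Welch t-test (unequal variances, hence the fractional degrees of freedom quoted above), and a pooled-SD Cohen's d using scipy; the group sizes, means, and variances are simulated assumptions for illustration, not the study's Mach IV data.

```python
import numpy as np
from scipy import stats

def compare_item(control: np.ndarray, treatment: np.ndarray):
    """Item-level comparison across wording groups.

    Returns Levene's test of equal variances, Welch's t-test of equal
    means (unequal variances assumed), and a pooled-SD Cohen's d.
    """
    lev_stat, lev_p = stats.levene(control, treatment)
    t_stat, t_p = stats.ttest_ind(control, treatment, equal_var=False)
    n1, n2 = len(control), len(treatment)
    pooled_sd = np.sqrt(((n1 - 1) * control.var(ddof=1) +
                         (n2 - 1) * treatment.var(ddof=1)) / (n1 + n2 - 2))
    d = (treatment.mean() - control.mean()) / pooled_sd
    return lev_stat, lev_p, t_stat, t_p, d

# Simulated scores for illustration only: the "revised" item is given a
# higher mean and a larger spread than the "original" wording.
rng = np.random.default_rng(1)
control = rng.normal(loc=3.0, scale=1.0, size=240)
treatment = rng.normal(loc=4.0, scale=1.5, size=240)
lev_stat, lev_p, t_stat, t_p, d = compare_item(control, treatment)
print(f"Levene W = {lev_stat:.2f} (p = {lev_p:.3f}), "
      f"t = {t_stat:.2f} (p = {t_p:.3g}), d = {d:.2f}")
```

Note that the sign convention matches the text above: a negative t indicates a higher mean in the treatment (revised-wording) group, while d is computed as treatment minus control.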

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 2

Angel Blanch

24 Sep 2019

Measurement invariance tests of revisions to archaically worded items in the Mach IV scale

PONE-D-19-15176R2

Dear Dr. Miller,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Angel Blanch, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Angel Blanch

2 Oct 2019

PONE-D-19-15176R2

Measurement invariance tests of revisions to archaically worded items in the Mach IV scale

Dear Dr. Miller:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Angel Blanch

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    Attachment

    Submitted filename: Review MACH.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    The online Texas Data Repository (dataverse.tdl.org/dataverse/txstate) is used to share datasets through the Texas Digital Library and managed by local Texas State University librarians. The Texas Digital Library (TDL) is a consortium of academic libraries in Texas with a proven history of providing shared technology services to support secure, reliable access to digital collections of research and scholarship. The Texas Data Repository is a project of the TDL and its member institutions to develop a consortial statewide research data repository for researchers at Texas institutions of higher learning. Data is curated in the repository following accepted standards (NISO Framework Advisory Group, 2007). The persistent identifier, a DOI, used for the data in this study is https://dataverse.tdl.org/dataset.xhtml?persistentId=doi:10.18738/T8/WPZSAP.

