Abstract
Certain diversity among team members is beneficial to the growth of an organization. Multiple measures have been proposed to quantify diversity, although little is known about their psychometric properties. This article proposes several methods to evaluate the unidimensionality and reliability of three measures of diversity. To approximate the interval scale required by the measures of diversity, a transformation on the Likert-item scores is proposed. Ridge maximum likelihood is used to deal with the issue of small sample size, and methods for evaluating the significance of the difference of two reliability estimates with correlated samples are also developed. Results with a real data set on entrepreneurial teams indicate that different measures of diversity may correspond to significantly different estimates of reliability. Results also indicate that diversity measures obtained with the transformed data tend to be more unidimensional than their counterparts obtained from Likert data. However, diversity measures obtained from Likert data tend to yield greater reliability estimates. Among the three examined measures of diversity, the standard deviation is found to yield greater and more efficient reliability estimates than the others and is thus recommended.
Keywords: unidimensionality, reliability, normal-curve transformation, ridge structural equation modeling
Introduction
The compositional diversity of team members within an organization has been shown to affect the performance and growth of the organization (Harrison & Klein, 2007; Van Knippenberg, De Dreu, & Homan, 2004). Among various kinds of diversity (e.g., demographic, informational, experiential, or personality attributes), not all have been determined to be beneficial to the growth of an organization. Some researchers have indicated that it is merely the differences between team members in terms of their skill level, knowledge, and perspectives that are needed to foster creativity and innovation (e.g., Guzzo & Shea, 1992). However, the findings in the extant literature have not been consistent (Jackson, Joshi, & Erhardt, 2003; Stewart, 2006; Van Knippenberg et al., 2004; Webber & Donahue, 2001) and indicate that our understanding of diversity and its role is still relatively limited. To facilitate a better understanding of diversity, Harrison and Klein (2007) classified diversity into three distinctive types: separation, variety, and disparity. Separation is for the difference in position or opinion among team members; variety is used to describe diversity in expertise, knowledge, or experience; and disparity refers to inequality in status or resources held among team members. Such a classification allows researchers to identify different roles of different types of diversity.
A variety of measures have also been proposed to quantify different types of diversity among individuals within a team. According to Harrison and Klein (2007), separation should be measured by either the standard deviation or the average of Euclidean distances, variety should be measured by the so-called Blau’s (1977) index or entropy (Teachman, 1980), and disparity should be measured by the coefficient of variation or the ratio of the average of the absolute differences over the mean. Harrison and Klein (2007) also discussed the type of scales required by each of these diversity measures and emphasized that measuring separation requires the observed data on team members to be at the interval scale, whereas measuring disparity requires the observed data to be at the ratio scale. However, in the study of human behavior within the fields of education, management, psychology, and related social and behavioral sciences, it is extremely difficult to obtain data at the ratio or even interval scales. What are typically obtained are data collected from a survey using questionnaires that are commonly only ordinal or Likert type. Nevertheless, researchers still regularly apply procedures that require interval-scale data to ordinal data. For example, factor analysis is commonly applied to Likert data for item selection or scale development (Raykov & Marcoulides, 2011). Although such a practice may still yield interpretable results, a better method is to factor analyze the polychoric correlation matrix (Babakus, Ferguson, & Jöreskog, 1987). Given that the observed values in Likert data used to determine the above-mentioned diversity measures are somewhat arbitrary, we propose to transform them to avoid the arbitrariness. The transformation is based on threshold values under the normal curve, parallel to those used in estimating polychoric correlations (Olsson, 1979). We can call it the normal-curve (NC) transformation. 
Although the transformed data are still limited in number of values, we argue that they are closer to the conditions required by diversity measures than the commonly used Likert data. To see the effect of the transformation, we will study the psychometric properties of several diversity measures when applied to both Likert and transformed data.
Unidimensionality and reliability are probably the two most basic psychometric properties one has to consider for any scale or instrument. Unidimensionality implies that the statistical dependence among the items can be accounted for by a single underlying latent trait, and reliability informs about the degree to which the observed individual differences are indicative of true individual differences on the latent dimension of interest. Measures of diversity, especially those for measuring separation with Likert data, are also subject to such properties if they aim to properly capture any underlying trait. In particular, when measurements in a scale are not unidimensional, the empirical meaning of the scale will be different from the meaning assigned to it, which will create interpretational confounding (e.g., Anderson & Gerbing, 1982; Bagozzi, 1980; Burt, 1973, 1976). Reliability is equally important because, for measurements with a low reliability index, the observed values of the obtained measurements can be mostly due to random errors. Additionally, because the value of the determined reliability index sets a bound on validity (Allen & Yen, 1979), a high reliability (index) is a necessary condition for high validity (Raykov & Marcoulides, 2011). We hope that by studying the unidimensionality and reliability of different measures of diversity, the inconsistent findings obtained to date on the roles of diversity can be better understood.
The methodological development presented in this article was motivated by the need to study the psychometric properties of diversity measures based on 13 Likert items administered to entrepreneurial teams. Because the number of teams plays the role of sample size, which is not sufficiently large, a method to deal with the issue of small sample sizes was also needed especially when using factor analysis to evaluate the unidimensionality of the diversity measures. For such a purpose, we make use of the ridge maximum likelihood (ML) method originally developed in Yuan and Chan (2008). This method has been shown to yield more accurate parameter estimates than the normal-distribution-based maximum likelihood (NML) even for normally distributed data. We also develop methods for evaluating the significance of the difference of two reliability estimates with correlated samples. This enables us to determine whether different measures of diversity correspond to significantly different reliability estimates. If different diversity measures yield significantly different reliability estimates, then it is better to use the one that corresponds to the greatest reliability.
In the next section, the methodological components for studying the unidimensionality and reliability of different measures of diversity are given, including the formulations of different diversity measures, the NC transformation, ridge ML, and standard error (SE) for difference of reliability estimates. A real data set with Likert scale and its analysis are presented in the following section. We conclude with a discussion and recommendations. It is important to note that our focus is on the psychometric properties (unidimensionality and reliability) of different measures of diversity, not on interrater reliability issues (for further details on interrater reliability, see Algina, 1978; Schuster & Smith, 2002; Shrout & Fleiss, 1979).
Methodology
This section first introduces the three diversity measures that will be used in the analysis of the real data. Then, the NC transformation is described. Ridge ML for factor analysis is reviewed next. Formulas for SE of the difference of two reliability estimates are developed at the end of this section. These measures and techniques will be used to analyze the real data in the subsequent section.
Diversity Measures
Let $x_{ijk}$ be the score of person $i$ on item $j$ within team $k$, where $i = 1, \ldots, n_k$ and $n_k$ is the number of members in team $k$. Three measures of diversity derived from $x_{ijk}$ will be studied. These are the average of absolute distances (aad) among team members,

$$\mathrm{aad}_{jk} = \frac{2}{n_k(n_k-1)} \sum_{i<i'} |x_{ijk} - x_{i'jk}|, \tag{1}$$

the average of absolute deviations from the mean (aadm) of team members,

$$\mathrm{aadm}_{jk} = \frac{1}{n_k-1} \sum_{i=1}^{n_k} |x_{ijk} - \bar{x}_{jk}|, \tag{2}$$

where $\bar{x}_{jk} = \sum_{i=1}^{n_k} x_{ijk}/n_k$; and the standard deviation (sd) among team members,

$$\mathrm{sd}_{jk} = \left[\frac{1}{n_k-1} \sum_{i=1}^{n_k} (x_{ijk} - \bar{x}_{jk})^2\right]^{1/2}. \tag{3}$$

Two measures for separation were recommended by Harrison and Klein (2007). One is the standard deviation in which the denominator is $n_k$ instead of $n_k - 1$. Another is the square root of the average of the squared Euclidean distances in which $i = i'$ is not distinguished from $i \neq i'$. According to Biemann and Kearney (2010), these measures may contain substantial bias due to including terms that are obviously 0 or without correcting for the loss of degrees of freedom. The diversity measure in (1) only includes the absolute distances for different team members, and degrees of freedom loss are accounted for in (2) and (3). Parallel to the average Euclidean distance (aed) in Harrison and Klein (2007) or Biemann and Kearney (2010), we define $\mathrm{aed}_{jk}$ as

$$\mathrm{aed}_{jk} = \left[\frac{2}{n_k(n_k-1)} \sum_{i<i'} (x_{ijk} - x_{i'jk})^2\right]^{1/2}.$$

Because $\mathrm{aed}_{jk}$ is proportional to $\mathrm{sd}_{jk}$ (see Hays, 1981) and any results of reliability and unidimensionality analysis of $\mathrm{aed}_{jk}$ would be identical to those of $\mathrm{sd}_{jk}$, we do not separately examine $\mathrm{aed}_{jk}$ in this article.
As we were not able to locate any references in the literature in which the $\mathrm{aadm}_{jk}$ in (2) or an index that is proportional to $\mathrm{aadm}_{jk}$ has been proposed to measure diversity, $\mathrm{aadm}_{jk}$ can be regarded as a new measure. The psychometric properties of the three measures, $\mathrm{aad}_{jk}$, $\mathrm{aadm}_{jk}$, and $\mathrm{sd}_{jk}$, will be examined through real data analysis in the following section.
Quantities in the form of the average of absolute distances (e.g., $\mathrm{aad}_{jk}$) are not presented as stand-alone measures in either Harrison and Klein (2007) or Biemann and Kearney (2010). Instead, they are divided by the team mean score for measuring disparity. Another measure for disparity recommended by Harrison and Klein (2007) is the coefficient of variation. Since these measures require the observed data to possess the properties of ratio scale, they may not be applicable to Likert data and will not be studied in this article. Similarly, variety will not be measured through Likert data and neither do we study Blau’s index or the entropy.
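The measures described in this subsection can be sketched in a few lines of Python for a single team and a single item. This is only an illustration under stated assumptions: the divisor `n - 1` for aadm is inferred from the remark that degrees of freedom loss is accounted for in (2) and (3), the function names are ours, and the team scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def aad(scores):
    """Average of absolute distances over all pairs of distinct team members."""
    pairs = list(combinations(scores, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def aadm(scores):
    """Average of absolute deviations from the team mean (divisor n - 1, assumed)."""
    n = len(scores)
    mean = sum(scores) / n
    return sum(abs(x - mean) for x in scores) / (n - 1)


def sd(scores):
    """Standard deviation among team members with divisor n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))


def aed(scores):
    """Square root of the average squared distance over pairs of distinct members."""
    pairs = list(combinations(scores, 2))
    return sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


team = [2, 4, 5]  # hypothetical Likert responses of a three-person team on one item
print(aad(team), aadm(team), sd(team), aed(team))
```

For any team, `aed` computed this way equals `sqrt(2) * sd`, which is the proportionality that makes a separate analysis of aed unnecessary.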
Normal-Curve Transformation
Since all three measures of diversity (aad, aadm, sd) are obtained by arithmetic operations, they are ideally applicable to data that are of interval scale (Harrison & Klein, 2007). However, as indicated previously, measurements in the social and behavioral sciences are typically Likert or ordinal scale. To approximate interval scales, we propose a transformation to Likert data in this subsection.
With a total of $n$ individual observations and $m$ categories for a given item, let $p_c$ be the proportion of observations for category $c$, $c = 1, \ldots, m$. Following the convention of polychoric correlations (Olsson, 1979), we may assume that, for each observed $x$, there is an underlying continuous variable $z \sim N(0, 1)$ such that $x = c$ whenever $z$ belongs to the interval $(\tau_{c-1}, \tau_c]$, where $\tau_1 < \tau_2 < \cdots < \tau_{m-1}$ are threshold values to be estimated. This implies that the probability for $x = c$ is given by

$$P(x = c) = \Phi(\tau_c) - \Phi(\tau_{c-1}),$$

where $\Phi(\cdot)$ is the cumulative distribution function of $N(0, 1)$ with $\tau_0 = -\infty$ and $\tau_m = \infty$. Thus, the marginal maximum likelihood estimate of $\tau_c$ is given by

$$\hat{\tau}_c = \Phi^{-1}\left(\sum_{l=1}^{c} p_l\right), \quad c = 1, \ldots, m-1. \tag{4}$$

Based on this underlying NC assumption, we propose a transformation to the Likert $x$ by

$$\tilde{x} = \frac{\hat{\tau}_{c-1} + \hat{\tau}_c}{2} \quad \text{when } x = c. \tag{5}$$

Notice that there are only $m - 1$ finite values of $\hat{\tau}_c$ in (4), and we cannot use $\hat{\tau}_0 = -\infty$ or $\hat{\tau}_m = \infty$ because they will result in $\tilde{x} = -\infty$ when $x = 1$ or $\tilde{x} = \infty$ when $x = m$. We further propose to use

$$\hat{\tau}_0 = \Phi^{-1}(.5/n), \quad \hat{\tau}_m = \Phi^{-1}(1 - .5/n). \tag{6}$$

The proposed values in (6) are equivalent to assigning a value of .5 to cells with zero number of observations in the analysis of contingency tables, because we can think of an extra category below $1$ and another extra category above $m$ and both had zero number of observations. The proposed values in (6) are also similar to the so-called continuity correction in applying the central limit theorem to categorical data (Feller, 1945), where a step of .5 is used when jumping from 1 to the next whole number.
Notice that the correction in (6) is for $\tilde{x}$ to avoid being $-\infty$ or $\infty$ whenever $x = 1$ or $x = m$. If the nominal number of categories is $m$ but only $m - 1$ or fewer categories are observed, we may simply treat the unobserved categories in the middle as having probability of zero by applying the correction only at the two end points.
We need to note that the transformed $\tilde{x}$ do not possess the property of interval scales, although they avoid the arbitrary nature of Likert data that assign consecutive whole numbers to ordered categories. Closely related to polychoric correlation, the rationale of the transformation in (5) depends heavily on the assumption of a normal curve underlying the observed frequencies. If the NC assumption holds, the $\tilde{x}$ obtained by the NC transformation determined by equations (4), (5), and (6) is simply the middle point of the interval to which $z$ belongs, and thus represents the best prediction of the true value of $z$ in the sense of smallest absolute mean difference.
Applying each of the three diversity measures, aad, aadm, and sd, to the transformed $\tilde{x}$ yields three more measures of diversity. In the next section, their reliability and unidimensionality are examined, and the results are contrasted with those obtained based on Likert data.
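A minimal Python sketch of the NC transformation for one item follows, using only the standard library. Several details are assumptions rather than the authors' implementation: the end-point correction is taken to be a cumulative proportion of .5 divided by the number of observations at each end, cumulative proportions are clamped to that range so that unobserved end categories do not produce infinite thresholds, and the function name and example responses are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def nc_transform(responses, n_categories):
    """Return {category: transformed score} for one Likert item.

    Each category is mapped to the midpoint of its estimated threshold
    interval under a standard normal curve.
    """
    n = len(responses)
    counts = Counter(responses)
    inv_cdf = NormalDist().inv_cdf      # standard normal quantile function
    eps = 0.5 / n                       # assumed end-point correction
    thresholds = [inv_cdf(eps)]         # finite stand-in for the lowest threshold
    cumulative = 0
    for c in range(1, n_categories):
        cumulative += counts.get(c, 0)
        # Clamp so that empty end categories never give an infinite threshold.
        p = min(max(cumulative / n, eps), 1 - eps)
        thresholds.append(inv_cdf(p))
    thresholds.append(inv_cdf(1 - eps)) # finite stand-in for the highest threshold
    return {c: (thresholds[c - 1] + thresholds[c]) / 2
            for c in range(1, n_categories + 1)}


item = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]   # hypothetical responses on a 5-point item
scores = nc_transform(item, 5)
transformed = [scores[x] for x in item]  # data passed on to aad, aadm, and sd
```

Because the hypothetical response distribution is symmetric, the transformed scores are symmetric around zero, and the spacing between adjacent categories is no longer forced to be one unit.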
Ridge Maximum Likelihood for Factor Analysis With Small Sample Sizes
As indicated in the previous section, the number of teams, $N$, plays the role of sample size when evaluating the psychometric properties of the diversity measures aad, aadm, and sd. Since it can be expensive to have a large $N$, we use ridge ML for factor analysis of the diversity measures in (1) to (3) when studying their unidimensionality. Unless all the team sizes $n_k$ are sufficiently large, the diversity measures in (1) to (3) cannot be regarded as normally distributed. As such, we expect ridge ML to work better than NML when factor analyzing the diversity measures.
Let $S$ be a sample covariance matrix of size $N$ on $p$ variables, and we are interested in modeling $S$ by a confirmatory factor model

$$\Sigma(\theta) = \Lambda \Phi \Lambda' + \Psi, \tag{7}$$

where $\Lambda$ is a factor loading matrix, $\Phi$ is a factor correlation matrix, and $\Psi$ is a diagonal matrix of measurement errors/uniquenesses. The widely used NML procedure for covariance structure analysis is to minimize

$$F_{ML}(S, \Sigma(\theta)) = \mathrm{tr}[S\Sigma^{-1}(\theta)] - \log|S\Sigma^{-1}(\theta)| - p$$

for parameter estimation. Let $a$ be a small number and $S_a = S + aI$ with $I$ being the identity matrix. The ridge ML developed in Yuan and Chan (2008) is to estimate $\theta$ by minimizing $F_{ML}(S_a, \Sigma(\theta))$ and let the estimates be denoted by $\hat{\theta}_a$. The corresponding estimates $\hat{\theta}$ for $\theta$ are obtained by subtracting $a$ from each of the elements of $\hat{\theta}_a$ corresponding to the diagonal elements of $\Psi$, leaving the other elements of $\hat{\theta}_a$ unchanged. Standard errors of $\hat{\theta}$ are obtained by a sandwich-type covariance matrix, which accounts for the unknown underlying population distribution of the involved diversity measure. As for overall model evaluation, Yuan and Chan (2008) showed that, unless $a = 0$, the statistic $T_{ML}$ does not asymptotically follow the nominal chi-square distribution even if data are normally distributed. They developed a rescaled statistic $T_{RML}$ and an adjusted statistic $T_{AML}$. Parallel to the development for NML in Satorra and Bentler (1994), $T_{RML}$ asymptotically follows a distribution whose mean equals the nominal degrees of freedom $df$, and $T_{AML}$ asymptotically follows a distribution whose mean and variance equal those of the approximating chi-square distribution. Since the details of ridge ML have already been described in Yuan and Chan (2008), no further elaboration is given here. Our purpose is to apply ridge ML to evaluate the unidimensionality of each of the three measures of diversity in (1) to (3) and to determine whether the corresponding sample covariance matrix can be reasonably fitted by a one-factor model. Following the recommendation of Yuan and Chan (2008), $a = p/N$ is used in applying the ridge ML.
In order to fully justify applying a factor analysis to each of the diversity measures, we do not need to assume that each of or is identically distributed across The development in Lee and Shi (1998) implies that the vector does not need to have the same population covariance as varies. Since for both reliability and unidimensionality the analysis is based on the sample covariance matrix of the corresponding diversity measures with the assumption our study of the psychometric properties of is for the population represented by the sample We will further discuss this point in the concluding section.
Standard Error for Difference of Two Reliability Estimates With Correlated Samples
Among the many available estimates of reliability for equally weighted composite scores, coefficient alpha is most widely used in practice even though it can over- or underestimate the population reliability (Raykov, 1997). Another popular estimate is coefficient omega defined through the factor loadings and error variances by fitting the sample covariance matrix to a one-factor model (McDonald, 1999). Both are applicable when evaluating the reliability of the different diversity measures. Our interest is whether different diversity measures will yield significantly different reliability estimates. Thus, we need to have an estimate of the SE of the difference of two estimates of alpha or omega. When the two estimates are independent, the variance of the difference of the two estimates is simply the summation of the variances of the two estimates of alpha or omega. However, with respect to the three diversity measures, the variance or SE of the difference of two estimates of alpha or omega depends on their correlation. Since the SE for the difference of two reliability estimates with correlated samples will facilitate comparison of reliabilities in other contexts, and the literature to date does not contain such a development, we provide more details for obtaining consistent SEs of the difference of two estimates of alpha and omega, respectively. We also present the necessary notation and formulas for calculating the SEs. The complete details leading to the calculation formulas are given in Appendices A and B.
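Let $S = (s_{jl})$ be a sample covariance matrix of size $N$ and $s$ be the vector by stacking the elements in the lower-triangular part of $S$. Then, with $p^* = p(p+1)/2$, $s$ is a vector of $p^* \times 1$ and the sample coefficient alpha is given by

$$\hat{\alpha} = \frac{p}{p-1}\left(1 - \frac{u's}{v's}\right),$$

where $u$ is a $p^* \times 1$ vector whose elements are 1 corresponding to $s_{jj}$ and 0 elsewhere; and $v$ is also a $p^* \times 1$ vector whose elements are 1 corresponding to $s_{jj}$ and 2 corresponding to $s_{jl}$ when $j \neq l$. For example, at $p = 3$, $s = (s_{11}, s_{21}, s_{31}, s_{22}, s_{32}, s_{33})'$, $u = (1, 0, 0, 1, 0, 1)'$, and $v = (1, 2, 2, 1, 2, 1)'$. We need to have the Jacobian matrix or the matrix of derivatives of $\hat{\alpha}$ with respect to the elements of $s$, and it is given by

$$\frac{\partial \hat{\alpha}}{\partial s'} = \frac{p}{p-1} \cdot \frac{(u's)v' - (v's)u'}{(v's)^2}.$$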
With and from two correlated samples, standard error for also involves the variance-covariance matrices of and Denote these by and where These are consistently estimated by their sample counterparts, with details given in Appendix A. With the introduced notation, the result given in Appendix B implies that is asymptotically normally distributed with mean zero and variance consistently estimated by
It follows from (8) that the SE of is consistently estimated by which will be used in the next section when evaluating the significance of the difference Confidence interval (CI) for with confidence level can be obtained as
where is the critical value corresponding to probability under the standard normal curve.
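As an illustration of how these quantities fit together, the following Python sketch computes coefficient alpha from a covariance matrix and forms a normal-theory confidence interval for the difference of two reliability estimates. It is a sketch under assumptions: the standard error of the difference is taken as an input, since its computation requires the covariance estimates described in the appendices, and the function names and numerical values are hypothetical.

```python
from statistics import NormalDist


def coefficient_alpha(cov):
    """Coefficient alpha from a p x p covariance matrix given as nested lists."""
    p = len(cov)
    diagonal_sum = sum(cov[j][j] for j in range(p))  # sum of item variances
    total_sum = sum(sum(row) for row in cov)         # variance of the sum score
    return p / (p - 1) * (1 - diagonal_sum / total_sum)


def difference_ci(estimate_1, estimate_2, se_difference, level=0.95):
    """Normal-theory CI for the difference of two reliability estimates.

    se_difference must already account for the correlation between the
    two samples; it is not computed here.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    difference = estimate_1 - estimate_2
    return difference - z * se_difference, difference + z * se_difference


# Hypothetical 3-item covariance matrix with unit variances and covariances of .5.
cov = [[1.0, 0.5, 0.5],
       [0.5, 1.0, 0.5],
       [0.5, 0.5, 1.0]]
alpha = coefficient_alpha(cov)

# Hypothetical estimates and standard error, for illustration only.
lower, upper = difference_ci(0.80, 0.70, 0.04)
```

If the resulting interval excludes zero, the two reliability estimates differ significantly at the chosen level.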
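We next consider the sample coefficient omega, which is defined through the estimates of a one-factor model. With $p$ items, the covariance structure of the one-factor model can be represented by (7), where $\Lambda = \lambda$ is a $p \times 1$ vector of factor loadings, $\Phi = 1$, and $\Psi$ is a diagonal matrix of error variances. Let $\hat{\lambda}$ and $\hat{\Psi}$ be the ridge ML estimates for the one-factor model. Then the sample coefficient omega is given by

$$\hat{\omega} = \frac{(1'\hat{\lambda})^2}{(1'\hat{\lambda})^2 + 1'\hat{\Psi}1},$$

where $1$ is a vector of $p$ 1s. With two covariance matrices $S_1$ and $S_2$, the ridge tuning parameters $a_1$ and $a_2$ can be different. We set them equal ($a_1 = a_2 = a$) in our study and denote $S_{a1} = S_1 + aI$ and $S_{a2} = S_2 + aI$. Let the parameter estimates by minimizing $F_{ML}(S_{a1}, \Sigma(\theta))$ and $F_{ML}(S_{a2}, \Sigma(\theta))$ be denoted as $\hat{\theta}_{a1}$ and $\hat{\theta}_{a2}$, respectively. We need to introduce additional notation for presenting the SE of $\hat{\omega}_1 - \hat{\omega}_2$.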
Let be the vector of stacking all the columns of and Then there exists a matrix such that and is called the duplication matrix (e.g., Schott, 2005). Notice that the covariance structure in fitting and are the same, the difference between fitting the two samples are in parameter estimates. One is and the other is these are obtained by subtracting from and corresponding to each error variance, respectively. Let the Jacobian matrices of and be denoted by and (see Yuan & Bentler, 2002), respectively; and Then Appendix B shows that is asymptotically normally distributed with mean zero and variance can be consistently estimated by
with , The result in (9) allows us to evaluate the significance of Alternatively, we can obtain a -level confidence interval (CI) for as
Before ending this section, we note that the validity of SEs for and in this subsection does not need the normality assumption.
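The point estimate of coefficient omega can be sketched as follows, together with the step that recovers error-variance estimates from a ridge ML fit by subtracting the ridge constant. This is an illustration only: the loadings, error variances, and ridge constant are hypothetical values, not estimates from the study, and the model fitting itself is not shown.

```python
def ridge_corrected_errors(error_variances_ridge, ridge_constant):
    """Subtract the ridge constant from each error variance estimated from S + aI."""
    return [psi - ridge_constant for psi in error_variances_ridge]


def coefficient_omega(loadings, error_variances):
    """Coefficient omega for an equally weighted composite under a one-factor model."""
    loading_sum_squared = sum(loadings) ** 2
    return loading_sum_squared / (loading_sum_squared + sum(error_variances))


# Hypothetical ridge ML output for three items, with ridge constant a = 0.06.
loadings = [0.7, 0.7, 0.7]
errors_ridge = [0.57, 0.57, 0.57]
errors = ridge_corrected_errors(errors_ridge, 0.06)
omega = coefficient_omega(loadings, errors)
```

With equal loadings and equal error variances, as in this hypothetical case, omega coincides with coefficient alpha computed from the model-implied covariance matrix; the two generally differ otherwise.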
Psychometric Analysis of Diversity Measures With Entrepreneurial Teams
The data are part of a longitudinal study examining the impact of team attributes and team process on team performance (Deng, Ye, & Xie, 2013). Participants for the study are members nested within teams, which are distributed across provinces in the well-developed Eastern part of China and include Beijing. Because diversity is known to affect team performance, 13 items measuring diversity were administered to each team member starting from the first wave. The English version of the 13 diversity items is included in Appendix C, and each participant was asked to endorse each item using a 5-point Likert scale (1 = strongly disagree, 2 = somewhat disagree, 3 = neutral/no opinion, 4 = somewhat agree, 5 = strongly agree). In the design, the first five items are about the information/background of team members and are used to measure information diversity. The last eight items are about their opinions and measure underlying diversity. Following the design, we separate the five information-diversity items from the eight underlying-diversity items when studying their psychometric properties of reliability and unidimensionality. It is also worth noting that, although Items 4, 6, 7, 8, 11, and 12 are phrased in the opposite direction from the other seven items, the values of aad, aadm, and sd are the same whether or not these items are reverse scored, since each of the diversity measures depends only on the magnitudes of centered scores or score differences.
There are a total of four waves of data in the longitudinal study of Deng et al. (2013). However, the majority of the participants showed little change in their answers to the 13 diversity items across the waves. Consequently, our analysis uses data from only the first wave. It should also be noted that there were many teams in which team members provided identical answers when endorsing each of the 13 items (resulting in a diversity value of 0 on all items), and these data are not included in our analysis. In summary, the study in this section is based on individual participants from teams, with the number of participants in a team ranging from 2 to 5.
As described in the second section, diversity measures aad, aadm, and sd are obtained based on the Likert data and NC-transformed data for each of the 13 diversity items, respectively. These are also referred to as samples in the following discussion.
Distribution Properties
Before evaluating the reliability and unidimensionality of these measures, it is informative to check their distribution characteristics. In particular, we want to know whether the NC transformation has any effect on the distribution of the diversity measures. Table 1 contains the sample marginal skewness and excess kurtosis of each diversity measure for each of the 13 items. The absolute averages (aave) of the sample skewness and excess kurtosis across the 13 items are also reported at the bottom of the table. According to Table 34C of Pearson and Hartley (1954), at sample size sample skewness is statistically significant at 2% or 10% level if its absolute value is greater than .787 or .533, respectively. At these critical values are slightly smaller. It is clear that multiple entries of sample skewness in the top panel of Table 1 are greater than .787. This is because each of the diversity measures is obtained using absolute values or the square root of a sum of squared deviations, and such measures tend to have a longer right tail. The values of sample skewness in Table 1 suggest that the NC transformation does make the resulting diversity measures less skewed on average, although not all the values following the transformation become smaller. Comparing among the three diversity measures, the values of sample skewness corresponding to sd are uniformly the smallest while those corresponding to aadm are uniformly the largest, suggesting that different diversity measures have different distributional shapes.
Table 1.
Sample Skewness and Excess Kurtosis of the Three Measures of Diversities: Average of Absolute Distance (aad) Among Team Members, Average of Absolute Deviation From the Mean (aadm) of Team Members, and Standard Deviation (sd) Among Team Members.
Sample skewness

| Item | Likert data: aad | Likert data: aadm | Likert data: sd | Transformed data: aad | Transformed data: aadm | Transformed data: sd |
|---|---|---|---|---|---|---|
| 1 | 1.640 | 1.725 | 1.294 | .832 | .859 | .363 |
| 2 | .827 | 1.024 | .655 | .618 | .780 | .385 |
| 3 | 1.028 | 1.225 | .694 | .681 | .814 | .375 |
| 4 | .425 | .778 | .122 | .505 | .859 | .202 |
| 5 | 1.057 | 1.418 | .616 | 1.189 | 1.493 | .707 |
| 6 | .724 | .812 | .534 | .705 | .826 | .485 |
| 7 | .547 | .646 | .403 | .588 | .758 | .317 |
| 8 | .951 | 1.266 | .613 | .834 | 1.122 | .492 |
| 9 | .614 | .874 | .329 | .349 | .535 | .106 |
| 10 | .603 | .818 | .269 | .543 | .773 | .219 |
| 11 | .262 | .386 | −.034 | .302 | .434 | −.011 |
| 12 | .863 | 1.158 | .587 | .897 | 1.192 | .583 |
| 13 | 1.693 | 1.903 | 1.357 | 1.666 | 1.880 | 1.297 |
| aave | .864 | 1.079 | .577 | .747 | .948 | .426 |

Sample excess kurtosis

| Item | Likert data: aad | Likert data: aadm | Likert data: sd | Transformed data: aad | Transformed data: aadm | Transformed data: sd |
|---|---|---|---|---|---|---|
| 1 | 2.187 | 2.588 | 1.074 | −.071 | .039 | −.984 |
| 2 | .175 | 1.133 | −.515 | −.654 | −.162 | −1.032 |
| 3 | .588 | 1.246 | −.339 | −.356 | −.021 | −.803 |
| 4 | .278 | .913 | −.080 | .394 | 1.092 | −.015 |
| 5 | 2.272 | 3.279 | .966 | 2.962 | 3.751 | 1.689 |
| 6 | −.669 | −.411 | −.844 | −.672 | −.345 | −.936 |
| 7 | −.242 | .032 | −.272 | −.065 | .371 | −.412 |
| 8 | 1.339 | 2.851 | .241 | .954 | 2.044 | .110 |
| 9 | −.300 | .456 | −.874 | −.972 | −.522 | −1.313 |
| 10 | −.252 | .244 | −.625 | −.339 | .220 | −.719 |
| 11 | −.047 | .304 | −.484 | −.036 | .310 | −.470 |
| 12 | .839 | 2.501 | −.122 | .810 | 2.172 | −.144 |
| 13 | 3.240 | 3.907 | 2.257 | 3.223 | 3.917 | 2.145 |
| aave | .956 | 1.528 | .669 | .885 | 1.151 | .829 |
The lower panel of Table 1 contains the marginal sample excess kurtosis of each of the diversity measures. According to Pearson and Hartley (1954, Table 34C), at sample kurtosis is significantly different from that of a normal distribution (whose excess kurtosis equals 0) at 2% or 10% level if its value is outside the interval or At the end values of these intervals slightly move to the center. Similar to skewness, multiple entries of excess sample kurtosis are outside the two intervals. While the kurtosis values of aad and aadm following the NC transformation become smaller on average, the average kurtosis of sd following the NC transformation becomes greater. None of the diversity measures enjoys uniformly smallest excess kurtosis, although the absolute average for sd with the Likert data is the smallest.
The results in Table 1 suggest that, on average, sd has the smallest skewness and kurtosis with either the Likert data or the NC-transformed data. We note that the sample skewness and excess kurtosis for the 13th item are still significant at level .02. Thus, NML-based SEs (see, e.g., Van Zyl, Neudecker, & Nel, 2000) for reliability estimates are not valid even when sample size is large, and SEs based on the sandwich-type covariance matrix are needed. Similarly, we have to rely on the rescaled or adjusted statistics when evaluating the unidimensionality of the diversity measures using factor analysis.
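The entries of Table 1 can be illustrated with a short Python sketch. The moment-based definitions used here (third and fourth central moments standardized by the second, with divisor equal to the sample size) are an assumption, since the exact estimator is not stated in the text; the function names are ours. The last lines reproduce the aave entry for sd with Likert data from the skewness column of Table 1.

```python
def central_moment(values, order):
    """Central moment of the given order, using divisor n."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skewness(values):
    """Moment-based sample skewness."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based sample kurtosis minus 3, the value for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def aave(entries):
    """Average of the absolute values of a column of Table 1."""
    return sum(abs(e) for e in entries) / len(entries)


# Skewness of sd with Likert data for Items 1 to 13, copied from Table 1.
sd_likert_skewness = [1.294, .655, .694, .122, .616, .534, .403,
                      .613, .329, .269, -.034, .587, 1.357]
print(round(aave(sd_likert_skewness), 3))
```

Averaging absolute values rather than signed values prevents positive and negative entries from cancelling, which matters mainly for the excess kurtosis panel.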
Unidimensionality
Because the first five items are designed to measure information diversity and the last eight items are designed to measure underlying diversity, we would like to see whether some or all of the diversity measures on the first five or last eight items follow a one-factor model. If some or all of them follow a one-factor model, then we may choose a measure that is most reliable in future applications. If none of them follow a one-factor model, then we may need to further study the dimensionality of these diversity measures to better understand their factor structure as well as their relationship with the content of the items from which these diversity measures are derived. Only after the factor structures of aad, aadm, or sd are well understood can we make better use of these diversity measures.
Since the number of teams, $N$, plays the role of sample size, which may not be sufficiently large for factor analysis, we use ridge ML for more reliable parameter estimates and overall model evaluation. Following the recommendation of Yuan and Chan (2008), the ridge parameter is chosen as $a = 5/N$ when studying the five items of information diversity, and $a = 8/N$ when studying the eight items of underlying diversity. Fitting the one-factor model to the original Likert data as well as the NC-transformed data with five and eight items by ridge ML, respectively, the rescaled and adjusted test statistics, $T_{RML}$ and $T_{AML}$, together with their associated p values, are obtained and reported in Table 2. The degrees of freedom for the adjusted statistic are also included to better understand the value of $T_{AML}$. The statistic $T_{ML}$ is reported as well for comparison purposes.
Table 2.
Test Statistics by Fitting One-Factor Model to Each of the Three Measures of Information and Underlying Diversities: Average of Absolute Distance (aad) Among Team Members, Average of Absolute Deviation From the Mean (aadm) of Team Members, and Standard Deviation (sd) Among Team Members.
(a) Information diversity (5 items, df = 5)

| Measure | Statistic | Likert data: value | Likert data: p value | Transformed data: value | Transformed data: p value |
|---|---|---|---|---|---|
| aad | $T_{ML}$ | 8.450 | .133 | 5.299 | .380 |
| | $T_{RML}$ | 8.872 | .114 | 5.578 | .349 |
| | $T_{AML}$ | 5.160 | .151 | 3.587 | .343 |
| | df of $T_{AML}$ | 2.908 | | 3.216 | |
| aadm | $T_{ML}$ | 8.663 | .123 | 5.394 | .370 |
| | $T_{RML}$ | 8.240 | .143 | 5.388 | .370 |
| | $T_{AML}$ | 4.606 | .179 | 3.548 | .360 |
| | df of $T_{AML}$ | 2.794 | | 3.293 | |
| sd | $T_{ML}$ | 6.039 | .302 | 3.770 | .583 |
| | $T_{RML}$ | 9.397 | .094 | 5.711 | .335 |
| | $T_{AML}$ | 5.612 | .131 | 3.580 | .331 |
| | df of $T_{AML}$ | 2.986 | | 3.134 | |

(b) Underlying diversity (8 items, df = 20)

| Measure | Statistic | Likert data: value | Likert data: p value | Transformed data: value | Transformed data: p value |
|---|---|---|---|---|---|
| aad | $T_{ML}$ | 24.335 | .228 | 28.521 | .098 |
| | $T_{RML}$ | 31.084 | .054 | 28.682 | .094 |
| | $T_{AML}$ | 6.893 | .177 | 7.546 | .205 |
| | df of $T_{AML}$ | 4.435 | | 5.262 | |
| aadm | $T_{ML}$ | 24.193 | .234 | 28.972 | .088 |
| | $T_{RML}$ | 35.471 | .018 | 29.692 | .075 |
| | $T_{AML}$ | 9.685 | .108 | 8.822 | .180 |
| | df of $T_{AML}$ | 5.461 | | 5.943 | |
| sd | $T_{ML}$ | 13.196 | .869 | 17.696 | .607 |
| | $T_{RML}$ | 30.545 | .061 | 28.402 | .100 |
| | $T_{AML}$ | 6.958 | .183 | 7.806 | .208 |
| | df of $T_{AML}$ | 4.556 | | 5.497 | |
The rescaled and adjusted statistics for information diversity in the top panel of Table 2 suggest that, except for the marginal fit of the sd measure with the Likert data (rescaled statistic, p = .094), all samples are well fitted by the one-factor model. The results also suggest that the fit to each of the three diversity measures with the NC-transformed data is considerably better than its counterpart obtained with the Likert data.
The adjusted statistic in the lower panel of Table 2 suggests that the fit of the one-factor model to each of the samples with underlying diversity is reasonable. However, the rescaled statistic suggests that the fit is marginal, although the p values corresponding to the rescaled statistic under the transformed data are uniformly larger. As in the top panel of Table 2, all the samples under NC transformation are fitted by the one-factor model uniformly better according to the rescaled and adjusted statistics. However, unlike in the top panel, where sd is fitted by the one-factor model least well, in the lower panel aadm is fitted least well.
It is interesting to note that some of the rescaled statistics in Table 2 are several times the corresponding adjusted statistics, and so are their corresponding degrees of freedom. This is because the measures of diversity are not normally distributed. In particular, when data are far from symmetrically distributed, the rescaled statistic may differ substantially from the adjusted statistic, and the adjusted degrees of freedom automatically account for the value of the adjusted statistic due to certain distributional characteristics of the sample. The comparison statistic is very close to the rescaled statistic for some samples and quite different from it for other samples. This is expected because their difference also depends on the distribution of the sample.
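The role of the fractional degrees of freedom can be checked numerically. The sketch below assumes that each p value in Table 2 is the upper-tail probability of a chi-square distribution, with 5 degrees of freedom for the statistic 8.872 and with the reported 2.908 degrees of freedom for the statistic 5.160 (both from the aad rows of the Likert columns in panel (a)). The function name `chi2_sf` is hypothetical, and the series expansion is a generic numerical device rather than anything taken from the article.

```python
import math

def chi2_sf(x, df, terms=500):
    """Upper-tail probability of a chi-square variable; df may be fractional.

    Uses the power series of the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, h = df / 2.0, x / 2.0
    term = 1.0
    total = 1.0
    for n in range(1, terms):
        term *= h / (a + n)          # h^n / ((a+1)(a+2)...(a+n))
        total += term
        if term < 1e-15 * total:
            break
    lower = total * math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    return 1.0 - lower

# Values copied from Table 2, panel (a), aad measure, Likert data.
p_df5 = chi2_sf(8.872, 5)           # Table 2 reports .114
p_fractional = chi2_sf(5.160, 2.908)  # Table 2 reports .151
print(round(p_df5, 3), round(p_fractional, 3))
```

Although 5.160 is much smaller than 8.872, the two p values are of similar size because the second statistic is referred to a distribution with far fewer degrees of freedom.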
In summary, there exist differences among the test statistics regarding the unidimensionality of the three diversity measures, but the differences are not substantial. The results in Table 2 suggest that the fit with NC-transformed data is substantially better than with Likert data. The difference in p values between the rescaled and adjusted statistics on each sample is consistent with the literature when they are applied to NML (Satorra & Bentler, 1994). In particular, Bentler and Yuan (1999), Fouladi (2000), Nevitt and Hancock (2004), and Savalei (2010) found that the rescaled statistic tends to reject correct models too often at smaller sample sizes; and results in Nevitt and Hancock (2004) and Savalei (2010) indicate that Type I errors of the adjusted statistic tend to be lower than the nominal level.
Table 3 contains the ridge ML estimates of factor loadings and error variances for the three diversity measures with Likert data. Like the test statistics in Table 2, there exist noticeable differences among parameter estimates. For example, some factor loadings with aadm in the lower panel of Table 3 (e.g., those of Items 8 and 12) are not statistically significant at the .05 level, whereas they are significant with aad and sd. Other noticeable patterns include (a) estimates of factor loadings and error variances with the sd measure for information diversity are uniformly the smallest and (b) estimates of error variances with sd for underlying diversity are uniformly the smallest. In particular, for all the 13 items, the z-statistics for sd in Table 3 are uniformly the largest, implying that parameter estimates under sd tend to be more efficient.
Table 3.
Ridge ML Estimates of Factor Loadings and Error Variances by Fitting One-Factor Model to Each of the Three Measures of Information (Items 1-5) and Underlying (Items 6-13) Diversities With Likert Data: Average of Absolute Distance (aad) Among Team Members, Average of Absolute Deviation From the Mean (aadm) of Team Members, and Standard Deviation (sd) Among Team Members.
| | aad | | | aadm | | | sd | | |
|---|---|---|---|---|---|---|---|---|---|
| Information diversity (Items 1-5) | Estimate | SE | z | Estimate | SE | z | Estimate | SE | z |
| Loading, Item 1 | .294 | .175 | 1.676 | .307 | .185 | 1.657 | .218 | .108 | 2.013 |
| Loading, Item 2 | .408 | .160 | 2.547 | .375 | .164 | 2.284 | .353 | .109 | 3.253 |
| Loading, Item 3 | .799 | .184 | 4.348 | .760 | .193 | 3.946 | .609 | .109 | 5.593 |
| Loading, Item 4 | .148 | .100 | 1.482 | .154 | .101 | 1.525 | .121 | .072 | 1.678 |
| Loading, Item 5 | .409 | .126 | 3.249 | .391 | .134 | 2.926 | .314 | .081 | 3.858 |
| Error variance, Item 1 | .565 | .148 | 4.484 | .537 | .147 | 4.296 | .303 | .072 | 5.579 |
| Error variance, Item 2 | .371 | .107 | 4.383 | .323 | .097 | 4.335 | .210 | .063 | 4.832 |
| Error variance, Item 3 | .049 | .218 | .668 | .065 | .213 | .756 | .032 | .098 | 1.316 |
| Error variance, Item 4 | .424 | .102 | 5.089 | .399 | .110 | 4.491 | .218 | .049 | 6.379 |
| Error variance, Item 5 | .500 | .160 | 3.737 | .487 | .165 | 3.542 | .271 | .077 | 4.765 |
| Underlying diversity (Items 6-13) | Estimate | SE | z | Estimate | SE | z | Estimate | SE | z |
| Loading, Item 6 | .425 | .105 | 4.059 | .459 | .108 | 4.237 | .317 | .074 | 4.293 |
| Loading, Item 7 | .400 | .085 | 4.679 | .251 | .114 | 2.207 | .315 | .061 | 5.199 |
| Loading, Item 8 | .400 | .139 | 2.871 | .203 | .156 | 1.300 | .340 | .081 | 4.208 |
| Loading, Item 9 | .241 | .063 | 3.816 | .184 | .084 | 2.180 | .202 | .050 | 4.030 |
| Loading, Item 10 | .265 | .077 | 3.452 | .392 | .100 | 3.931 | .208 | .055 | 3.820 |
| Loading, Item 11 | .247 | .060 | 4.130 | .177 | .091 | 1.934 | .224 | .046 | 4.918 |
| Loading, Item 12 | .315 | .131 | 2.411 | .215 | .145 | 1.480 | .271 | .081 | 3.358 |
| Loading, Item 13 | .358 | .084 | 4.271 | .542 | .126 | 4.287 | .256 | .055 | 4.661 |
| Error variance, Item 6 | .326 | .087 | 5.507 | .240 | .091 | 4.345 | .185 | .044 | 7.732 |
| Error variance, Item 7 | .186 | .045 | 7.518 | .244 | .053 | 7.449 | .102 | .025 | 10.363 |
| Error variance, Item 8 | .268 | .082 | 5.162 | .331 | .084 | 5.751 | .133 | .041 | 6.995 |
| Error variance, Item 9 | .285 | .058 | 7.585 | .270 | .066 | 6.370 | .160 | .030 | 10.471 |
| Error variance, Item 10 | .286 | .057 | 7.708 | .180 | .058 | 5.756 | .148 | .028 | 10.634 |
| Error variance, Item 11 | .160 | .040 | 7.740 | .170 | .036 | 9.007 | .085 | .020 | 11.729 |
| Error variance, Item 12 | .353 | .064 | 7.960 | .331 | .076 | 6.401 | .193 | .035 | 9.905 |
| Error variance, Item 13 | .621 | .231 | 3.348 | .441 | .157 | 3.791 | .322 | .117 | 4.074 |
Estimates of factor loadings and error variances with NC-transformed data are reported in Table 4, where again there exist noticeable differences among parameter estimates across the three samples. Error variance estimates for the sd measure with underlying diversity are still uniformly the smallest, but the pattern with information diversity is not so clear. Again, the z-statistics with the sd measure are uniformly the largest across the 13 items. Comparing Tables 3 and 4, except for the factor loading of Item 11, whose estimate with aadm is nonsignificant in Table 3 but significant in Table 4, the transformation does not change the significance status of the parameter estimates across the two tables. However, the transformation does cause fluctuations in the parameter estimates: some estimates are slightly greater in Table 3, whereas others are slightly greater in Table 4.
Table 4.
Ridge ML Estimates of Factor Loadings and Error Variances by Fitting One-Factor Model to Each of the Three Measures of Information (Items 1-5) and Underlying (Items 6-13) Diversities With NC-Transformed Data: Average of Absolute Distance (aad) Among Team Members, Average of Absolute Deviation From the Mean (aadm) of Team Members, and Standard Deviation (sd) Among Team Members.
| | aad | | | aadm | | | sd | | |
|---|---|---|---|---|---|---|---|---|---|
| Information diversity (Items 1-5) | Estimate | SE | z | Estimate | SE | z | Estimate | SE | z |
| Loading, Item 1 | .192 | .141 | 1.362 | .178 | .144 | 1.236 | .187 | .095 | 1.978 |
| Loading, Item 2 | .300 | .132 | 2.279 | .266 | .127 | 2.087 | .269 | .091 | 2.968 |
| Loading, Item 3 | .675 | .206 | 3.283 | .682 | .228 | 2.991 | .484 | .107 | 4.539 |
| Loading, Item 4 | .192 | .129 | 1.485 | .196 | .126 | 1.557 | .154 | .096 | 1.598 |
| Loading, Item 5 | .317 | .122 | 2.604 | .306 | .126 | 2.438 | .244 | .080 | 3.065 |
| Error variance, Item 1 | .583 | .113 | 6.022 | .577 | .111 | 6.046 | .323 | .055 | 7.663 |
| Error variance, Item 2 | .311 | .078 | 5.206 | .288 | .070 | 5.519 | .168 | .046 | 5.719 |
| Error variance, Item 3 | .052 | .236 | .625 | .013 | .269 | .407 | .054 | .085 | 1.765 |
| Error variance, Item 4 | .520 | .138 | 4.472 | .489 | .147 | 3.994 | .267 | .068 | 5.350 |
| Error variance, Item 5 | .439 | .139 | 3.856 | .434 | .143 | 3.696 | .230 | .066 | 4.968 |
| Underlying diversity (Items 6-13) | Estimate | SE | z | Estimate | SE | z | Estimate | SE | z |
| Loading, Item 6 | .473 | .116 | 4.094 | .510 | .120 | 4.255 | .346 | .080 | 4.356 |
| Loading, Item 7 | .490 | .118 | 4.132 | .379 | .146 | 2.590 | .367 | .078 | 4.723 |
| Loading, Item 8 | .416 | .166 | 2.505 | .261 | .184 | 1.417 | .344 | .096 | 3.562 |
| Loading, Item 9 | .305 | .100 | 3.048 | .257 | .107 | 2.409 | .267 | .076 | 3.512 |
| Loading, Item 10 | .348 | .117 | 2.984 | .438 | .133 | 3.292 | .274 | .080 | 3.420 |
| Loading, Item 11 | .333 | .086 | 3.865 | .266 | .115 | 2.323 | .304 | .065 | 4.688 |
| Loading, Item 12 | .356 | .155 | 2.291 | .281 | .170 | 1.646 | .310 | .094 | 3.288 |
| Loading, Item 13 | .386 | .089 | 4.345 | .508 | .123 | 4.124 | .279 | .056 | 4.978 |
| Error variance, Item 6 | .402 | .114 | 4.898 | .306 | .114 | 4.027 | .227 | .056 | 6.778 |
| Error variance, Item 7 | .310 | .083 | 5.575 | .368 | .095 | 5.500 | .169 | .043 | 7.566 |
| Error variance, Item 8 | .409 | .105 | 5.336 | .458 | .105 | 5.807 | .205 | .052 | 6.926 |
| Error variance, Item 9 | .606 | .104 | 7.315 | .553 | .116 | 6.115 | .341 | .054 | 9.227 |
| Error variance, Item 10 | .539 | .101 | 6.843 | .419 | .107 | 5.350 | .282 | .052 | 8.443 |
| Error variance, Item 11 | .332 | .084 | 5.791 | .337 | .077 | 6.396 | .175 | .042 | 7.797 |
| Error variance, Item 12 | .527 | .098 | 6.964 | .488 | .107 | 6.003 | .282 | .051 | 8.555 |
| Error variance, Item 13 | .724 | .262 | 3.354 | .595 | .208 | 3.609 | .376 | .131 | 4.031 |
Reliability
Table 5 contains the estimates of alpha and omega for both information diversity and underlying diversity. Their SEs and corresponding z-statistics are also reported in the table. The differences of reliability estimates for either alpha or omega, together with their SEs and corresponding z-statistics, are reported as well. The results in Table 5 suggest that the estimates of both alpha and omega with the Likert data are uniformly greater than those with the transformed data, and so are the corresponding z-statistics. Except for aad with underlying diversity, all the other are greater than the corresponding
Table 5.
Estimates of Reliabilities Alpha and Omega for Three Measures of Information and Underlying Diversities: Average of Absolute Distance (aad) Among Team Members, Average of Absolute Deviation From the Mean (aadm) of Team Members, and Standard Deviation (sd) Among Team Members.
| (a) Reliability applied to sample covariance matrix | | | | | | |
|---|---|---|---|---|---|---|
| | Likert data | | | Transformed data | | |
| Information diversity (5 items) | Alpha | SE | z | Alpha | SE | z |
| aad | .641 | .092 | 6.988 | .556 | .101 | 5.500 |
| aadm | .644 | .097 | 6.615 | .554 | .101 | 5.488 |
| sd | .669 | .078 | 8.610 | .603 | .095 | 6.333 |
| aadm − aad | .003 | .014 | .232 | −.002 | .014 | −.166 |
| sd − aad | .028 | .028 | .989 | .047 | .028 | 1.666 |
| sd − aadm | .025 | .041 | .606 | .049 | .040 | 1.229 |
| | Likert data | | | Transformed data | | |
| Underlying diversity (8 items) | Alpha | SE | z | Alpha | SE | z |
| aad | .738 | .074 | 9.953 | .714 | .080 | 8.969 |
| aadm | .723 | .080 | 9.015 | .701 | .084 | 8.303 |
| sd | .774 | .063 | 12.356 | .752 | .068 | 11.004 |
| aadm − aad | −.016 | .020 | −.769 | −.014 | .020 | −.693 |
| sd − aad | .036 | .021 | 1.686 | .038 | .023 | 1.665 |
| sd − aadm | .051 | .039 | 1.324 | .052 | .040 | 1.296 |
| (b) Reliability following ridge ML | | | | | | |
| | Likert data | | | Transformed data | | |
| Information diversity (5 items) | Omega | SE | z | Omega | SE | z |
| aad | .689 | .077 | 8.973 | .596 | .080 | 7.483 |
| aadm | .686 | .086 | 7.969 | .595 | .083 | 7.159 |
| sd | .716 | .064 | 11.272 | .632 | .077 | 8.248 |
| aadm − aad | −.003 | .014 | −.255 | −.001 | .016 | −.036 |
| sd − aad | .027 | .027 | 1.018 | .036 | .033 | 1.103 |
| sd − aadm | .031 | .039 | .784 | .037 | .047 | .780 |
| | Likert data | | | Transformed data | | |
| Underlying diversity (8 items) | Omega | SE | z | Omega | SE | z |
| aad | .739 | .078 | 9.520 | .715 | .081 | 8.845 |
| aadm | .727 | .073 | 9.923 | .705 | .080 | 8.851 |
| sd | .774 | .065 | 11.846 | .751 | .070 | 10.761 |
| aadm − aad | −.012 | .025 | −.490 | −.010 | .021 | −.488 |
| sd − aad | .035 | .022 | 1.582 | .036 | .023 | 1.542 |
| sd − aadm | .047 | .038 | 1.236 | .046 | .039 | 1.196 |
Among the three diversity measures (aad, aadm, sd), sd always corresponds to the largest alpha and omega estimates, with either the Likert or the transformed data, and for either information or underlying diversity. Except for the information diversity scale with Likert data in the top left portion of Table 5, aadm always corresponds to the smallest alpha and omega estimates. However, the largest z-statistic for a reliability difference is always that between sd and aad. This is because the SEs of the difference between sd and aad tend to be smaller than those of the difference between sd and aadm, owing to different correlations between the estimates of reliability.
Although none of the differences in reliability estimates are significant at the level of .05, three differences in alpha are marginal at the level of .1, corresponding to those between sd and aad for information diversity with transformed data (z = 1.666), and for underlying diversity with both Likert and NC-transformed data (z = 1.686, 1.665). Two differences in omega, corresponding to those between sd and aad for underlying diversity, are also marginal (z = 1.582, 1.542). Because the analysis includes only a limited number of independent teams, we would expect the difference of reliability estimates between sd and aad to become more pronounced and significant with a larger N.
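To see how z-statistics of this size relate to the .05 and .1 levels, the sketch below converts the three alpha-difference z values quoted from Table 5 into p values. It assumes a two-sided test against the standard normal distribution, which the text does not state explicitly; the function name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p value of a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# z-statistics for sd - aad differences in alpha, copied from Table 5(a).
z_alpha = {
    "information, transformed": 1.666,
    "underlying, Likert": 1.686,
    "underlying, transformed": 1.665,
}

p_alpha = {label: two_sided_p(z) for label, z in z_alpha.items()}
for label, p in p_alpha.items():
    print(f"{label}: z = {z_alpha[label]:.3f}, p = {p:.3f}")
```

Under this assumption each of the three p values falls between .05 and .1, which is the sense in which the differences are marginal.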
Most omega estimates are greater than the corresponding alpha estimates in Table 5. There are also exceptions, for example, the measure sd with the NC-transformed sample for underlying diversity. Such observed differences are expected since the items may not be essentially tau-equivalent or literally unidimensional (see, e.g., Raykov, 1997).
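The link between the parameter estimates and the omega values can be illustrated with a small calculation. The sketch below assumes the usual one-factor form of omega with the factor variance fixed at 1, that is, the squared sum of loadings divided by that quantity plus the sum of error variances. It also assumes that, in the upper panel of Table 3, the first five rows are factor loadings and the last five are error variances. The function name `omega` is hypothetical.

```python
def omega(loadings, error_variances):
    """Coefficient omega for a one-factor model with unit factor variance."""
    common = sum(loadings) ** 2
    return common / (common + sum(error_variances))

# Upper panel of Table 3 (information diversity, Likert data).
sd_loadings = [.218, .353, .609, .121, .314]
sd_errors = [.303, .210, .032, .218, .271]
aad_loadings = [.294, .408, .799, .148, .409]
aad_errors = [.565, .371, .049, .424, .500]

omega_sd = omega(sd_loadings, sd_errors)
omega_aad = omega(aad_loadings, aad_errors)
print(round(omega_sd, 3), round(omega_aad, 3))
```

Under these assumptions the two results round to the omega estimates reported for sd and aad in panel (b) of Table 5 for information diversity with Likert data (.716 and .689).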
Discussion and Conclusion
In this article, we described methodology for studying unidimensionality and reliability of diversity measures with Likert data. Using a real data set, the analyses indicate that the reliability estimates corresponding to sd are the greatest. The z-statistics for the reliability estimates corresponding to sd are the largest, and so are the corresponding z-statistics for estimates of factor loadings and error variances. The SEs for these estimates are also the smallest. These results indicate that the diversity measure sd tends to yield more efficient parameter estimates than both aad and aadm. With respect to unidimensionality, there is little difference in test statistics across the three diversity measures. Thus, among the three measures of diversity, sd is the preferred measure. Comparing the NC-transformed data with the Likert data, the transformed data are closer to the underlying values if the NC assumption holds. The transformed data are less skewed on average; diversity measures with the transformed data also tend to be more unidimensional than those with Likert data. However, reliability estimates following the transformed data tend to be smaller than those following the Likert data. Additional studies are needed to further examine the merit of the transformed data versus Likert data.
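For concreteness, the three diversity measures can be computed for one team on one item as in the sketch below. It assumes that aad averages the absolute differences over all distinct pairs of members, that aadm averages the absolute deviations from the team mean, and that sd uses the n − 1 divisor; the article's own definitions may differ in such details. The scores in `team` are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev

def aad(scores):
    """Average absolute distance over all distinct pairs of team members."""
    pairs = list(combinations(scores, 2))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)

def aadm(scores):
    """Average absolute deviation from the team mean."""
    m = mean(scores)
    return sum(abs(x - m) for x in scores) / len(scores)

def sd(scores):
    """Standard deviation among team members (n - 1 divisor assumed)."""
    return stdev(scores)

team = [1, 2, 4, 5]  # hypothetical scores of four members on one item
print(round(aad(team), 3), round(aadm(team), 3), round(sd(team), 3))
```

All three functions summarize the spread of the same scores, which is why the resulting measures are highly correlated and their reliability estimates must be compared with methods for correlated samples.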
The implication of studying the reliability and unidimensionality of the diversity measures is that there exists a model in which each measure (aad, aadm, or sd) is linearly related to a latent diversity trait plus an error or uniqueness term. In contrast, models for studying rater reliability assume that the observed scores are linearly related to some underlying traits. Clearly, both kinds of models are hypothetical and are motivated by the need to study reliability and/or unidimensionality of the corresponding measurements. The results obtained in this study indicate that each of the three diversity measures is fitted by the one-factor model reasonably well, and each subscale defined by these measures also has decent reliability. More research is clearly needed as to whether the model behind the diversity measures aad, aadm, or sd or the model behind the original item scores is more plausible. This issue might be best addressed through an analysis in which many different sets of real data are examined rather than through analytical or simulation studies.
Appendix A
This appendix gives the formula for calculating consistent estimates of
Let the two different measures of diversity be denoted by and each is a vector, Let be the sample mean of the jth diversity measure, and Then a consistent estimate of is given by
where and are the vectors of sample means of and respectively.
Appendix B
Asymptotic Distributions of and
This appendix shows that both and are asymptotically normally distributed and gives the formulas for calculating consistent estimates of their variances.
In the Methodology section we have introduced the notation and Their population counterparts are given by and It follows from standard asymptotics that
where denotes a term that converges to 0 in probability when . According to the central limit theorem, and are jointly asymptotically normally distributed with asymptotic variance-covariance matrices and It follows from (B1) that
where
For two estimates and of omega, there exists
The development in Yuan and Chan (2008) implies that
where
Combining (B3) and (B4) yields
where
with
Appendix C
Thirteen Items for Measuring Team Diversity
The first five items are for measuring information diversity, and the last eight items are for measuring underlying diversity. Participants were asked to endorse each of the items using a 5-point Likert-type scale.
Overall, the ages of team members are widely distributed.
Overall, team members have diverse background and training.
Overall, knowledge and specialty of team members are complementary.
Overall, team members have similar social experience (reversed).
Overall, team members have different expertise.
Overall, team members have the same value regarding entrepreneurial development (reversed).
If starting a new business, team members will aim to achieve the same goal (reversed).
If starting a new business, team members will have the same ambition (reversed).
Overall, team members have different personality.
Overall, team members have different working styles.
Overall, team members are on the same page regarding the goal of the team (reversed).
Overall, team members have the same consensus regarding the focus of the team (reversed).
Team members have different entrepreneurial philosophy.
We omit the subscripts i, j, and k from q and h to simplify the notation.
Footnotes
Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research was supported in part by a grant from the National Natural Science Foundation of China (71002023) and a grant from China Scholarship Council.
References
- Algina J. (1978). Comment on Bartko’s “On various intraclass correlation reliability coefficients.” Psychological Bulletin, 85, 135-138.
- Allen M. J., Yen W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
- Anderson J. C., Gerbing D. W. (1982). Some methods for respecifying measurement models to obtain unidimensional construct measurement. Journal of Marketing Research, 19, 453-460.
- Babakus E., Ferguson J. C. E., Jöreskog K. G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional assumptions. Journal of Marketing Research, 24, 222-228.
- Bagozzi R. P. (1980). Causal models in marketing. New York: John Wiley.
- Bentler P. M., Yuan K.-H. (1999). Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research, 34, 181-197.
- Biemann T., Kearney E. (2010). Size does matter: How varying group sizes in a sample affect the most common measures of group diversity. Organizational Research Methods, 13, 582-599.
- Blau P. M. (1977). Inequality and heterogeneity. New York, NY: Free Press.
- Burt R. S. (1973). Confirmatory factor-analysis structures and the theory construction process. Sociological Methods & Research, 2, 131-187.
- Burt R. S. (1976). Interpretational confounding of unobserved variables in structural equation models. Sociological Methods & Research, 5, 3-52.
- Deng L., Ye S., Xie L. (2013). A longitudinal study of team trait combinations, team process and performance. Journal of Management Science (manuscript under review).
- Feller W. (1945). On the normal approximation to the binomial distribution. Annals of Mathematical Statistics, 16, 319-329.
- Fouladi R. (2000). Performance of modified test statistics in covariance and correlation structure analysis under conditions of multivariate nonnormality. Structural Equation Modeling, 7, 356-410.
- Guzzo R. A., Shea G. P. (1992). Group performance and intergroup relations in organizations. In Dunnette M. D., Hough L. M. (Eds.), Handbook of industrial and organizational psychology (Vol. 3, 2nd ed., pp. 269-313). Palo Alto, CA: Consulting Psychologists Press.
- Harrison D. A., Klein K. J. (2007). What’s the difference? Diversity constructs as separation, variety, or disparity in organizations. Academy of Management Review, 32, 1199-1228.
- Hays W. L. (1981). Statistics (3rd ed.). New York, NY: Holt, Rinehart & Winston.
- Jackson S. E., Joshi A., Erhardt N. L. (2003). Recent research on team and organizational diversity: SWOT analysis and implications. Journal of Management, 29, 801-830.
- Lee S.-Y., Shi J.-Q. (1998). Analysis of covariance structures with independent and non-identically distributed observations. Statistica Sinica, 8, 543-557.
- McDonald R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.
- Nevitt J., Hancock G. (2004). Evaluating small sample approaches for model test statistics in structural equation modeling. Multivariate Behavioral Research, 39, 439-478.
- Olsson U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44, 443-460.
- Pearson E. S., Hartley H. O. (1954). Biometrika tables for statisticians (Vol. 1). London, England: Biometrika Trust.
- Raykov T. (1997). Scale reliability, Cronbach’s coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329-353.
- Raykov T., Marcoulides G. A. (2011). Introduction to psychometric theory. New York, NY: Routledge.
- Satorra A., Bentler P. M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In von Eye A., Clogg C. C. (Eds.), Latent variables analysis: Applications for developmental research (pp. 399-419). Thousand Oaks, CA: Sage.
- Savalei V. (2010). Small sample statistics for incomplete nonnormal data: Extensions of complete data formulae and a Monte Carlo comparison. Structural Equation Modeling, 17, 241-264.
- Schott J. (2005). Matrix analysis for statistics (2nd ed.). New York, NY: John Wiley.
- Schuster C., Smith D. A. (2002). Indexing systematic rater agreement with a latent-class model. Psychological Methods, 3, 384-395.
- Shrout P. E., Fleiss J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
- Stewart G. L. (2006). A meta-analytic review of relationships between team design features and team performance. Journal of Management, 32, 29-54.
- Teachman J. D. (1980). Analysis of population diversity. Sociological Methods & Research, 8, 341-362.
- Van Knippenberg D., De Dreu C. K. W., Homan A. C. (2004). Work group diversity and group performance: An integrative model and research agenda. Journal of Applied Psychology, 89, 1008-1022.
- Van Zyl J. M., Neudecker H., Nel D. G. (2000). On the distribution of the maximum likelihood estimator of Cronbach’s alpha. Psychometrika, 65, 271-280.
- Webber S. S., Donahue L. M. (2001). Impact of highly and less job-related diversity on work group cohesion and performance: A meta-analysis. Journal of Management, 27, 141-162.
- Yuan K.-H., Bentler P. M. (2002). On robustness of the normal-theory based asymptotic distributions of three reliability coefficient estimates. Psychometrika, 67, 251-259.
- Yuan K.-H., Chan W. (2008). Structural equation modeling with near singular covariance matrices. Computational Statistics & Data Analysis, 52, 4842-4858.
