Author manuscript; available in PMC: 2021 Mar 9.
Published in final edited form as: J Pers. 2015 Oct 8;85(2):123–135. doi: 10.1111/jopy.12224

College Student Samples Are Not Always Equivalent: The Magnitude of Personality Differences Across Colleges and Universities

Katherine S Corker 1, M Brent Donnellan 2, Su Yeong Kim 3, Seth J Schwartz 4, Byron L Zamboanga 5
PMCID: PMC7941735  NIHMSID: NIHMS1664310  PMID: 26331463

Abstract

This research examined the magnitude of personality differences across different colleges and universities to understand (a) how much students at different colleges vary from one another and (b) whether there are site-level variables that can explain observed differences. Nearly 8,600 students at 30 colleges and universities completed a Big Five personality trait measure. Site-level information was obtained from the Integrated Postsecondary Education Data System database (U.S. Department of Education). Multilevel models revealed that each of the Big Five traits showed significant between-site variability, even after accounting for individual-level demographic differences. Some site-level variables (e.g., enrollment size, requiring letters of recommendation) explained between-site differences in traits, but many tests were not statistically significant. Student samples at different universities differed in terms of average levels of Big Five personality domains. This raises the possibility that personality differences may explain differences in research results obtained when studying students at different colleges and universities. Furthermore, results suggest that research that compares findings for only a few sites (e.g., much cross-cultural research) runs the risk of overgeneralizing differences between specific samples to broader group differences. These results underscore the value of multisite collaborative research efforts to enhance psychological research.


In recent years, psychological researchers have taken an increased interest in the reproducibility of research findings. After the Journal of Personality and Social Psychology published an article purporting to show the existence of precognition (i.e., ESP), and after the revelation, in the same year, of data fabrication by a prominent scholar, researchers have intensified efforts to ensure that findings are reproducible (e.g., Asendorpf et al., 2013; Lishner, 2015; Nosek, Spies, & Motyl, 2012). One such effort has been an increased emphasis on replication (Nosek et al., 2012). There have, however, been questions about how to explain discrepancies in replication: when one lab confirms a finding but another fails to replicate it. Of course, there will always be uncertainty in our findings given the impact of measurement error (Fabrigar & Wegener, in press) and the presence of Type I (Ioannidis, 2005) and Type II (Cohen, 1992; Lishner, 2015) errors. However, researchers occasionally allude to systematic (i.e., nonrandom) differences between samples to explain failures to replicate results. These “unmeasured moderators” are rarely well specified, and it is an open question how much variance exists across samples from the same general population when samples are not drawn using probabilistic methods. For example, given that college student convenience samples are widely used in personality research (and, indeed, in the majority of psychological research), we should investigate the extent to which samples drawn from different colleges or universities differ systematically from each other.

This issue regarding between-site differences is also relevant in cross-cultural research whereby students from a single university (perhaps in the United States) are compared with students from another single university in a different country. The validity of such comparisons may be undermined to the extent that there is considerable within-country variation (i.e., selecting a different pair of universities from the same countries would have provided different results). In short, there are compelling reasons to quantify the magnitude of the differences in personality attributes across research settings.

There are good reasons to suspect some differences between students from different settings. Colleges and universities differ in their selectivity (e.g., how high typical standardized test scores are), the gender and ethnic composition of the campus, and the focus of the institution (e.g., research intensive vs. primarily undergraduate), among other things. Given attraction and selection effects—in which students choose different institutions to attend and colleges choose different students to admit—there is reason to expect that the aggregate personality profiles of students at different schools may differ.

Although the possibility of differences seems plausible, it is more difficult to provide expectations for the size of any such differences. Are the differences trivial, and therefore unlikely to be a viable explanation for replicability differences? Or are they large and therefore potentially important to consider? Thus, the current research examines differences in the mean level of Big Five personality traits with the goal of assessing the magnitude of the differences in a sample of colleges and universities. Furthermore, we examine characteristics of these different colleges and universities to try to explain variability in personality traits across campuses.

Attraction, Selection, Attrition, and Socialization

A brief review of evidence for mechanisms that may underlie differences between colleges and universities is warranted before turning to the details of the current study. Attraction effects describe a type of active niche-picking in which individuals choose to enter situations and roles that fit their personality characteristics (Roberts, Donnellan, & Hill, 2012); the converse behavior, in which individuals leave situations and roles that do not fit their personalities, is termed attrition (Schneider, Smith, & Goldstein, 2000). Considering the context of university choice, an extraverted person may look for a college that allows many opportunities for social activities and large parties. If large universities provide more opportunities for such activities than smaller ones, an attraction effect would produce a situation in which larger universities have students with a higher overall level of Extraversion. Thus, attraction effects would reflect a match between the characteristics of students and the characteristics of the schools they choose to attend.

Selection effects1 refer to a complementary set of mechanisms by which individuals are selected into roles and situations by gatekeepers (e.g., admissions officers, hiring committees) because of their personality characteristics. Selection effects reflect a match between the characteristics of a student and the desired characteristics of a student body. That is, the preferences of a college’s admissions officers and its strategic plan should predict the characteristics of its students. For example, institutions that are more selective (i.e., admit a lower percentage of students) perhaps can afford to be choosy not only about their applicants’ academic credentials (e.g., test scores), but also about their applicants’ personalities—how agreeable and conscientious they are, for instance.

A meta-analysis of nine studies (N = 966; Cortina, Goldstein, Payne, Davison, & Gilliland, 2000) found that Conscientiousness correlated .258 with interview performance, suggesting that schools with mandatory interviews should yield students with higher levels of Conscientiousness. Mandatory interviews are relatively uncommon in the undergraduate admissions process, but some institutions do require applicants to submit one or more letters of reference. To the extent that information about personality is conveyed in letters of reference (Paunonen, Jackson, & Oberman, 1987), it may be the case that institutions use that information in their admissions process, opting to admit students with socially desirable characteristics like Agreeableness and Conscientiousness.

An effective (i.e., valid) selection procedure will identify candidates whose traits facilitate success on the job (or in this case, at university), but it should be noted that selection processes need not act only on traits that facilitate success. A campus that fancies itself to be especially unique, for instance, may try to enroll students who are themselves somewhat quirky (perhaps especially high in Openness to Experience). Or perhaps more generally, admissions officers may prefer to deal with candidates who are polite and organized (so candidates with related traits would be more likely to be admitted). The point is that in the context of college admissions, selection processes may be acting on any criteria valued by that organization.

Finally, there is also the possibility that socialization processes2 help shape the personality attributes of students in different university settings, as students adopt the values and social norms of others in the environment. Roberts, Wood, and Caspi (2008) discussed a particular form of socialization known as the corresponsive principle, which highlights the idea that the traits that attract individuals to roles are often developed even further as a result of time spent in those roles. The idea here is that selection, attraction, and socialization processes may act in concert to generate noticeable differences in the personalities of students at different schools.

For instance, previous research found that German young people who went to college were higher in Openness to Experience and lower in Neuroticism compared to those who went straight to work or to a trade school (demonstrating attraction), but those individuals who went to work or trade school showed larger increases in their Conscientiousness over time compared to the college students (suggesting socialization; Lüdtke, Roberts, Trautwein, & Nagy, 2011).

Jackson, Thoemmes, Jonkmann, Lüdtke, and Trautwein (2012) found that German adolescents who chose military service were lower in Agreeableness, Neuroticism, and Openness compared to adolescents who chose civilian-community service (attraction), and young people who entered military service showed much smaller increases in Agreeableness over the 4 years following their military training compared to those who chose community service over the same time span (socialization).

Similarly, German students with initially higher levels of community (i.e., helping) goals were more likely to major in medicine (attraction), and their levels of community goals increased with time in the major (socialization; Hill et al., 2015). Finally, Roberts, Caspi, and Moffitt (2003) found that the traits (e.g., low levels of negative emotionality) that predicted work outcomes (e.g., occupational status attainment) were the same traits that were likely to change over time on the job, demonstrating socialization via the corresponsive principle.

The Current Study

As reviewed above, a number of diverse lines of research suggest that the processes of attraction, attrition, selection, and socialization may work together to produce aggregate (university-level) personality differences in students at different institutions. In the current research, we examined a large sample of individuals from 30 colleges and universities to estimate the magnitude of these site differences.

Furthermore, we examined whether there were site-level characteristics that explained variability in trait levels. The site-level characteristics we considered were total enrollment, percentage of the student body that was female, percentage aged 18–24, percentage of various ethnic groups represented, site location (urban vs. non-urban), percentage admitted, average SAT score, whether recommendation letters were required, site type (public vs. private), whether a site was a land grant institution, cost of attendance, endowment, and first-year retention rates. Site-level moderators were selected based on availability in the national Integrated Postsecondary Education Data System (IPEDS) database and theoretical relevance to the college selection process.

We predicted that there would be detectable differences between sites, but we did not make predictions regarding the magnitude of differences for specific traits. Given the lack of prior work on university-level predictors of personality differences across institutions, we did not make specific predictions regarding the ability of site-level characteristics to explain site differences.

METHOD

Participants

The sample consisted of 8,569 students from the Multi-Site University Study of Identity and Culture (MUSIC; Castillo & Schwartz, 2013; Weisskirch et al., 2013). The MUSIC data represent a total of 30 colleges and universities in 20 U.S. states. Site-level characteristics are listed in Table 1. Variables concerning site-level characteristics were drawn from IPEDS (U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics) statistics from the 2008–2009 academic year (the year of data collection). Participants’ demographic characteristics are provided in Table 2.

Table 1.

Site Characteristics

Site Region Type % 18–24 % Asian % Black % Hispanic % White % Other % Women
Site A Southwest Public 81 5 4 14 64 12 52
Site B Southeast Public 95 2 8 2 83 6 49
Site C Rocky Mtns. Private 87 4 1 4 86 6 48
Site D Far West Public 81 7 4 27 46 16 60
Site E Rocky Mtns. Public 89 3 2 6 76 12 52
Site F Southeast Public 73 4 13 60 17 7 57
Site G Southeast Public 69 3 11 3 73 10 60
Site H Great Lakes Public 94 5 7 3 73 12 54
Site I Mid East Public NR 5 4 3 79 8 45
Site J Far West Private 96 14 8 11 46 19 50
Site K Far West Public 70 18 7 14 42 20 58
Site L New England Private NR 1 22 7 56 13 63
Site M Mid East Private 76 15 13 12 43 17 57
Site N Southwest Public 97 4 3 12 71 10 47
Site O Far West Public 93 35 3 11 37 14 55
Site P Far West Public 92 36 7 25 19 11 51
Site Q Southeast Public 80 5 15 4 72 5 54
Site R Southeast Public 87 3 5 3 80 8 49
Site S New England Public 93 7 5 5 63 20 51
Site T Southeast Public 94 8 8 12 61 10 53
Site U New England Public 91 7 5 4 67 17 50
Site V Plains Public 82 8 4 2 71 15 53
Site W Plains Public 95 2 6 2 81 10 54
Site X Plains Public 92 2 2 3 80 12 47
Site Y Great Lakes Private 100 6 3 8 75 8 44
Site Z Far West Private 86 18 5 12 39 27 62
Site AA Plains Public NR 1 1 1 86 10 62
Site AB Southeast Public 74 6 12 13 65 5 59
Site AC Southwest Public 93 15 4 16 54 10 51
Site AD New England Private 98 11 9 9 62 10 50

Note. NR = Not reported. Data are site characteristics (Level 2) from the Integrated Postsecondary Education Data System (IPEDS; U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics) in 2008–2009. % Other includes other races and unknown race.

Table 2.

Demographic Characteristics of Samples

Site N Women Black White East Asian Hispanic Other Race Age (M) Age (SD)
Site A 515 82.5 6.3 71.1 2.9 17.4 2.4 22.28 0.23
Site B 127 73.9 8.0 84.8 3.6 0.7 2.8 21.43 0.21
Site C 186 59.5 1.4 89.4 3.2 6.0 0.0 21.67 0.30
Site D 55 87.9 1.5 63.6 3.0 27.3 4.5 22.68 0.54
Site E 117 96.5 2.8 89.4 1.4 6.4 0.0 21.47 0.22
Site F 516 74.7 9.4 15.9 2.8 70.4 1.5 20.33 0.16
Site G 134 70.0 14.4 77.5 2.5 5.0 0.6 21.94 0.51
Site H 457 69.1 8.0 82.9 4.9 1.6 2.8 19.66 0.09
Site I 234 84.0 5.4 90.9 1.2 1.2 1.2 20.36 0.15
Site J 24 80.6 6.9 65.5 17.2 6.9 3.4 18.74 0.16
Site K 67 95.0 6.4 57.7 10.3 21.8 3.9 21.58 0.51
Site L 33 91.2 5.9 94.1 0.0 0.0 0.0 20.94 0.51
Site M 147 76.7 17.2 44.8 9.2 17.2 11.6 18.90 0.10
Site N 139 71.1 5.6 70.6 3.1 19.4 1.3 20.37 0.21
Site O 801 71.2 1.5 38.0 37.2 10.6 12.7 19.68 0.07
Site P 334 51.2 9.6 14.5 31.1 32.0 12.8 19.08 0.06
Site Q 188 75.2 26.6 66.3 2.5 3.0 1.5 20.88 0.33
Site R 182 73.2 9.0 80.4 4.8 3.2 2.7 19.52 0.17
Site S 423 53.5 5.4 80.9 6.0 5.2 2.6 18.91 0.05
Site T 982 68.1 15.5 60.6 6.9 14.1 3.0 19.97 0.07
Site U 65 81.7 1.4 64.3 27.1 7.1 0.0 20.73 0.45
Site V 383 67.8 2.8 75.4 13.3 2.3 6.3 20.12 0.13
Site W 410 82.4 8.6 86.0 3.6 1.4 0.4 19.86 0.08
Site X 22 58.3 4.2 75.0 8.3 4.2 8.3 21.58 0.38
Site Y 204 57.0 4.7 65.6 15.3 12.6 1.9 19.32 0.08
Site Z 64 72.8 1.0 34.7 39.6 15.8 8.9 20.86 0.31
Site AA 118 69.0 0.8 94.4 2.4 1.6 0.8 21.64 0.34
Site AB 1,202 77.4 13.6 64.3 3.8 14.0 4.2 20.75 0.11
Site AC 419 88.9 11.0 41.6 13.2 23.6 10.5 20.77 0.13
Site AD 21 54.2 9.1 59.1 9.1 18.2 4.5 19.00 0.24

Note. Values represent participant (Level 1) data in terms of percentages (of nonmissing responses) and age in years.

Procedure

Participants were recruited from courses in psychology, human development, family studies, nutrition, sociology, and business. Students were directed to the common online study Web site using printed or emailed announcements distributed at each site. In psychology departments, students were recruited from participant pools. In other types of departments, students were recruited from specific courses and offered course credit or entry into a prize drawing in exchange for their participation.

The survey took 1–2 hours to complete, and the Institutional Review Board at each college/university approved the study procedures (or deemed the study exempt). Data were collected between September 2008 and October 2009.

Materials

Participants completed the mini International Personality Item Pool (IPIP; Donnellan et al., 2006), along with other measures not analyzed here.3 The mini IPIP is a 20-item measure of the Big Five personality factors: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness. A 5-point response scale (1 = strongly disagree; 5 = strongly agree) was used. Table 3 provides the raw means, standard deviations, alpha reliabilities, and scale intercorrelations for each factor, both at Level 1 (individuals) and Level 2 (site aggregates). All data and analysis scripts are available at https://osf.io/if7ug/.
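To illustrate how Big Five scale scores and the reliabilities in Table 3 are typically computed, a minimal R sketch follows; the file name, item names, and the set of reverse-keyed items are placeholders rather than the actual MUSIC variable names.

```r
# Minimal scoring sketch for a 20-item Big Five measure (hypothetical column
# names e1..o4; which items are reverse keyed is assumed, not taken from MUSIC).
library(psych)

dat <- read.csv("music_minipip.csv")             # hypothetical file
reverse_key <- function(x) 6 - x                 # 1-5 response scale: 6 - x flips it
rev_items <- c("e2", "e4", "a2", "a4", "c2", "c4", "n2", "n4", "o2", "o4")
dat[rev_items] <- lapply(dat[rev_items], reverse_key)

scales <- list(
  extraversion      = c("e1", "e2", "e3", "e4"),
  agreeableness     = c("a1", "a2", "a3", "a4"),
  conscientiousness = c("c1", "c2", "c3", "c4"),
  neuroticism       = c("n1", "n2", "n3", "n4"),
  openness          = c("o1", "o2", "o3", "o4")
)

# Mean-score each scale and report coefficient alpha (cf. Table 3).
for (s in names(scales)) {
  items <- dat[, scales[[s]]]
  dat[[s]] <- rowMeans(items, na.rm = TRUE)
  cat(s, "alpha =", round(psych::alpha(items)$total$raw_alpha, 2), "\n")
}
```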

Table 3.

Descriptive Statistics

M SD α 1 2 3 4 5
Level 1 (N = 8,564–8,569)
  1. Extraversion 3.31 0.91 .77 1.00
  2. Agreeableness 3.93 0.70 .65 .23 1.00
  3. Conscientiousness 3.57 0.81 .65 .02 .15 1.00
  4. Neuroticism 2.80 0.80 .64 −.12 −.01 −.11 1.00
  5. Openness 3.75 0.76 .68 .19 .25 .01 −.10 1.00
Level 2 (k = 30)
  1. Extraversion 3.30 0.14 .88 1.00
  2. Agreeableness 3.96 0.14 .81 .28 1.00
  3. Conscientiousness 3.58 0.13 .77 −.11 .53 1.00
  4. Neuroticism 2.81 0.13 .74 −.10 .02 −.04 1.00
  5. Openness 3.73 0.16 .87 .34 .18 .25 −.36 1.00

RESULTS

Preliminary Analyses

Prior to conducting substantive analyses, we tested whether the measurement of traits was invariant across the different sites. A lack of measurement invariance could render cross-university comparisons meaningless (Chen, 2008). Because measurement invariance tests require large sample sizes, only the 11 sites with sample sizes greater than 300 were included in the test.4 Mplus (Version 7.0; Muthén & Muthén, 1998–2014) was used for all tests.

First, a model was estimated in which all loadings and intercepts were freely estimated across universities. For identification purposes, latent means were constrained to zero, and latent variances were constrained to one. Latent factors were also allowed to freely covary with one another. This baseline model was then compared to subsequent nested models. To test whether participants were using the same range of scale points, we equated loadings across sites (while also freeing all but one of the latent variances, which was fixed for identification). Then we equated intercepts across sites to test whether the origin, or zero point, of items was the same across sites (again, we did this while freeing all but one latent mean, which remained fixed to zero for identification). If these invariance tests are satisfied, we can conclude that the measurement of traits was invariant across sites and proceed with comparing latent variances and covariances, as well as latent means. Comparison of the latent means allows a global test of whether there are differences between sites in the mean level of traits.

Because chi-square difference tests are sensitive to large sample sizes (Cheung & Rensvold, 2002), we compared models using ΔCFI.5 If models differed in terms of CFI by less than .01, we assumed that the model was invariant (Cheung & Rensvold, 2002). Table 4 displays the results of these nested tests. Inspection of the table reveals that fixing the loadings to be equivalent across sites did not significantly decrease model fit (ΔCFI = .004). Fixing the intercepts to equivalence also did not hurt model fit substantially (ΔCFI = .008). These analyses suggest that participants across sites were interpreting items in similar ways. It is interesting to note that RMSEA actually improves across models as constraints are added, presumably because of improved model parsimony.
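The invariance models were fit in Mplus; the sketch below shows a roughly analogous configural/metric/scalar sequence in R with lavaan. The item names, the site variable, and the subset of 11 large sites are assumptions, and lavaan's default identification constraints differ somewhat from the fixed-mean, fixed-variance setup described above.

```r
# Rough lavaan analogue of the reported Mplus invariance tests (item and site
# variable names are placeholders; 'big_sites' is an assumed vector of the 11
# sites with n > 300).
library(lavaan)

model <- '
  E =~ e1 + e2 + e3 + e4
  A =~ a1 + a2 + a3 + a4
  C =~ c1 + c2 + c3 + c4
  N =~ n1 + n2 + n3 + n4
  O =~ o1 + o2 + o3 + o4
'
big <- subset(dat, site %in% big_sites)

configural <- cfa(model, data = big, group = "site")
metric     <- cfa(model, data = big, group = "site", group.equal = "loadings")
scalar     <- cfa(model, data = big, group = "site",
                  group.equal = c("loadings", "intercepts"))

# A drop in CFI of less than .01 between adjacent models is treated as
# supporting invariance (Cheung & Rensvold, 2002).
sapply(list(configural = configural, metric = metric, scalar = scalar),
       fitMeasures, fit.measures = "cfi")
```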

Table 4.

Fit Statistics for Nested Models

χ2 df RMSEA [90% CI] CFI SRMR
All parameters free 6,913.94 1,760 .071 [.069, .072] .806 .060
Loadings equal 7,171.02 1,910 .069 [.067, .070] .802 .063
Intercepts equal 7,533.15 2,060 .067 [.066, .069] .794 .065
Variances equal 7,626.08 2,110 .067 [.065, .068] .793 .070
Covariances equal 7,768.05 2,210 .066 [.064, .067] .791 .072
Latent means equal 8,188.61 2,260 .067 [.065, .068] .777 .079

Note. Total N = 6,444; k = 11.

We next proceeded to examine whether we could constrain latent variances and covariances to be equal across sites. Constraining latent variances did not hurt model fit (ΔCFI = .001), nor did constraining latent covariances (ΔCFI = .002). Finally, we constrained latent means to be equal across sites. This constraint did decrease model fit (ΔCFI = .014), suggesting the presence of meaningful differences across sites in terms of the mean levels of different traits. We therefore further examined these mean differences to evaluate our main research questions. The remaining analyses were conducted using observed scores so that all sites could be included in the analysis.

Primary Analyses

We used multilevel modeling6 to test whether there were differences in levels of personality characteristics across colleges/universities. All analyses were conducted in the nlme package in R (Pinheiro et al., 2015) using maximum likelihood estimation. We estimated separate models for each of the five traits. For each trait, we compared an intercept-only model (with no predictors) to an unconditional means model with college/university as a random factor. This model tests whether there is significant between-sites variability in traits.
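A minimal sketch of this comparison for one trait, using nlme as in the reported analyses, is shown below; the data frame and variable names are placeholders.

```r
# Intercept-only model vs. unconditional means model with a random site intercept.
library(nlme)

m0 <- gls(extraversion ~ 1, data = dat, method = "ML")        # no site effect
m1 <- lme(extraversion ~ 1, random = ~ 1 | site,
          data = dat, method = "ML")                          # random site intercept

anova(m0, m1)                 # likelihood ratio test (cf. LR column of Table 5)

vc     <- as.numeric(VarCorr(m1)[, "Variance"])
tau00  <- vc[1]               # between-site variance
sigma2 <- vc[2]               # residual (within-site) variance
tau00 / (tau00 + sigma2)      # ICC(1): proportion of variance due to site
```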

The results of this set of analyses are displayed in Table 5. For each of the five traits, the unconditional means model fit the data better than an intercept-only model. However, there were some differences from trait to trait regarding the proportion of variability that was due to sites. The largest amount of variability was explained in Agreeableness, with 2.8% of the variability due to site, followed by 2.0% for Extraversion, 1.2% for Conscientiousness, 1.1% for Neuroticism, and 0.9% for Openness. Although seemingly small, these effect sizes are in line with estimates from other large multisite studies (e.g., Rentfrow, Jokela, & Lamb, 2015, had ICCs = 0.2–1.2%; M. Jokela, personal communication, March 30, 2015). The magnitude of these differences becomes more intuitive when expressed in the more familiar Cohen’s d metric as opposed to a percentage of variance metric as discussed below.

Table 5.

Results of Unconditional Means Model

LR (df = 1) Intercept (γ00) Site Var. (τ00) Residual (σ2) ICC (1) ICC (2)
Extraversion 160.99 3.31 (0.03) .016 .810 .020 .730
Agreeableness 142.21 3.95 (0.02) .014 .485 .028 .785
Conscientiousness 40.96 3.58 (0.02) .008 .649 .012 .651
Neuroticism 23.24 2.80 (0.02) .007 .642 .011 .630
Openness 25.13 3.74 (0.02) .005 .566 .009 .560

Note. LR = likelihood ratio; Var. = variance. LR values are results of a test comparing an intercept-only model to an unconditional means model; all LR tests are statistically significant. Values in parentheses are standard errors. ICC (1) is the proportion of variance in traits explained by site. ICC (2) is the average within-site reliability. All models are computed using maximum likelihood estimation.

Figure 1 displays the distribution of means obtained for each trait, along with 95% confidence intervals for each site and meta-analytic estimates. Visual inspection of these plots reveals that some sites appear to deviate quite dramatically from the overall mean level for a given trait. For instance, the Site O sample had a much lower level of Extraversion than the Site W sample. In terms of Cohen’s d (a commonly used standardized effect size metric) and relative to a grand standard deviation of 0.91 for Extraversion, Site O is 0.70 standard deviations lower than Site W on Extraversion. The maximum differences in Cohen’s d units were 0.70 (Extraversion), 0.77 (Agreeableness), 0.85 (Conscientiousness), 1.05 (Neuroticism), and 1.15 (Openness).

Figure 1.


Traits by site. Scale of points represents the contribution of site to the meta-analysis (larger points are weighted more heavily). Intervals are 95% confidence intervals. Figures were produced using mixed effects meta-analysis in the metafor package in R (Viechtbauer, 2010). Individual figures are also available at https://osf.io/if7ug/.
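A sketch of how such a per-site forest plot can be produced with metafor is given below; the raw-data column names are placeholders, and one plot would be generated per trait.

```r
# Site-level means and standard errors for one trait, then a random-effects
# meta-analysis and forest plot (placeholder column names).
library(metafor)

by_site <- split(dat$extraversion, dat$site)
site_stats <- data.frame(
  site = names(by_site),
  m    = sapply(by_site, mean, na.rm = TRUE),
  se   = sapply(by_site, function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x))))
)

res <- rma(yi = m, sei = se, data = site_stats, method = "ML",
           slab = site_stats$site)
forest(res)      # per-site estimates, 95% CIs, and the meta-analytic summary
```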

Rather than comparing extreme sites, it is perhaps more useful to estimate the typical amount of deviation from mean levels that researchers should expect. One measure of this would be the standard deviation of the site-level (Level 2) means (Table 3). These standard deviations ranged from 0.13 (Conscientiousness and Neuroticism) to 0.14 (Extraversion and Agreeableness) to 0.16 (Openness), suggesting small but meaningful deviations from the grand mean.

Alternatively, we could consider the mean absolute distance of each site from the grand mean. To do this, each site’s mean was converted into a Cohen’s d metric by taking the absolute value of the difference from the grand mean and dividing by the Level 1 standard deviation. For instance, Site O’s Extraversion mean of 2.96 was subtracted from the grand mean of 3.31 and scaled relative to the Level 1 standard deviation of 0.91 for a d of 0.38. It is then possible to average these deviations to get an estimate of the distance of the typical site from the grand mean. The full results are displayed in Supplementary Table 1. The average distances from the mean ranged from 0.08 (Neuroticism) to 0.14 (Agreeableness), indicating that a typical sample was about one-tenth of a standard deviation from the grand mean of the trait. Thus, in the aggregate, the expected value for a sample is not drastically far from the grand mean, but some sites did vary quite a bit from the grand mean (and from each other).

Finally, we also considered the average pairwise distance of each site mean from each other site mean for a given trait. This produces (k² − k)/2 unique pairwise comparisons, where k is the number of sites. Thus, for each trait there are 435 comparisons. The average of the absolute value of these 435 comparisons, divided by the Level 1 standard deviation, is the average pairwise distance in Cohen’s d units. These values were 0.18 (Extraversion), 0.22 (Agreeableness), 0.18 (Conscientiousness), 0.16 (Neuroticism), and 0.21 (Openness). Thus, on average, it is expected that levels of a trait at a given site differ by 0.16 to 0.22 standardized units from the same trait at another site.
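Both distance summaries are simple to compute from the observed data; the sketch below illustrates the calculations for one trait (placeholder variable names).

```r
# Average distance of site means from the grand mean, and average pairwise
# distance among site means, both in Cohen's d units.
site_means <- tapply(dat$extraversion, dat$site, mean, na.rm = TRUE)
grand_mean <- mean(dat$extraversion, na.rm = TRUE)
sd_l1      <- sd(dat$extraversion, na.rm = TRUE)     # Level 1 SD (0.91 for E)

mean(abs(site_means - grand_mean) / sd_l1)           # typical distance from grand mean

pairwise_d <- abs(outer(site_means, site_means, "-")) / sd_l1
mean(pairwise_d[lower.tri(pairwise_d)])              # (k^2 - k)/2 = 435 pairs for k = 30
```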

Moderator Analyses

Given meaningful site differences in traits, we attempted to explain these differences using site-level characteristics. First, because we know there are demographic individual-level variables (e.g., gender, race, and age) that predict individual trait levels, and given that samples differed in terms of these demographic variables, we conducted multilevel models separately for each trait with site as a Level 2 random intercept, and gender, ethnicity, and age as Level 1 fixed effects. Ethnicity was dummy coded with four indicator variables (one for Asian, one for Black, one for Hispanic, and one for other races, with White as the comparison group); gender was also dummy coded (1 = male, 0 = female). Age was grand-mean centered prior to analysis.
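A sketch of this random-intercept model with Level 1 demographic covariates, again in nlme, follows; the demographic variable names and codings are placeholders that mirror the dummy coding described above.

```r
# Random site intercept plus Level 1 demographic fixed effects (cf. Table 6).
library(nlme)

dat$male      <- ifelse(dat$gender == "male", 1, 0)               # 1 = male, 0 = female
dat$age_c     <- dat$age - mean(dat$age, na.rm = TRUE)            # grand-mean centered
dat$ethnicity <- relevel(factor(dat$ethnicity), ref = "White")    # White = comparison group

m_demo <- lme(extraversion ~ male + age_c + ethnicity,
              random = ~ 1 | site, data = dat, method = "ML",
              na.action = na.omit)
summary(m_demo)       # fixed effects as in Table 6
VarCorr(m_demo)       # remaining between-site variance (tau00) and residual variance
```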

Results are reported in Table 6. Men scored lower than women on Extraversion, Agreeableness, Conscientiousness, and Neuroticism, and they scored higher on Openness.7 Age was negatively associated with Extraversion and positively associated with Conscientiousness. There was no significant association between the other three traits and age. Students of color had lower scores on Extraversion and Agreeableness than White participants. Ethnic comparisons were less consistent for the other three traits (see Table 6).

Table 6.

Random Effects Model With Level 1 Demographic Predictors

E A C N O
Intercept 3.40 (0.03)* 4.11 (0.02)* 3.63 (0.02)* 2.88 (0.02)* 3.75 (0.02)*
Male −0.11 (0.02)* −0.39 (0.02)* −0.16 (0.02)* −0.37 (0.02)* 0.11 (0.02)*
Age (centered) −0.01 (0.00)* −0.00 (0.00) 0.02 (0.00)* −0.00 (0.00) 0.00 (0.00)
Black −0.21 (0.04)* −0.17 (0.03)* 0.01 (0.03) −0.02 (0.03) −0.04 (0.03)
East Asian −0.28 (0.04)* −0.20 (0.03)* −0.09 (0.03)* 0.09 (0.03)* −0.19 (0.03)*
Hispanic −0.10 (0.03)* −0.13 (0.02)* 0.00 (0.03) 0.06 (0.03)* −0.03 (0.03)
Other Race −0.16 (0.05)* −0.18 (0.04)* −0.04 (0.04) 0.11 (0.04)* −0.15 (0.04)*
Random intercept model LR = 121.67 LR = 634.84 LR = 124.74 LR = 359.42 LR = 77.63
τ00 = .012 τ00 = .010 τ00 = .003 τ00 = .004 τ00 = .006
σ2 = .799 σ2 = .449 σ2 = .641 σ2 = .613 σ2 = .560
% W/I = 1.4 % W/I = 7.0 % W/I = 1.3 % W/I = 4.1 % W/I = 1.0
ICC (1) = 1.5 ICC (1) = 2.1 ICC (1) = 0.5 ICC (1) = 0.7 ICC (1) = 1.1
Random slopes model LR = 22.39 LR = 13.30 LR = 23.32 LR = 0.65 LR = 4.77
p = .72 p = .99 p = .67 p = .99 p = .57

Note. Values in parentheses are standard errors. E = Extraversion; A = Agreeableness; C = Conscientiousness; N = Neuroticism; O = Openness to Experience; LR = likelihood ratio test for random intercept models compared to unconditional models (df = 6; all ps < .05) and random slopes models (df = 25 for E, A, C; df = 6 for N, O). Random slopes models allowed slope-intercept covariances (models for E, A, C) unless this specification produced a model error (models for N and O), in which case the covariances were assumed to be zero. τ00 is the between-site (random intercept) variance, and σ2 is the residual (within-site) variance. % W/I is the percentage of within-site variance explained by this set of predictors. ICC (1) is the percentage of variability explained by site. White individuals and women are the comparison groups.

*

p<.05.

Most importantly, significant between-site variability remained for all traits after accounting for demographic characteristics, implying that site differences are not fully explained by demographic differences between samples. We also did a series of analyses allowing random slopes for the three individual-level demographic variables (i.e., age, sex, and race). This analysis allows the effects of age, sex, and race to vary across sites (e.g., the magnitude of the gender difference could vary from site to site). Nested models comparing these models to the models with only random intercepts did not indicate evidence of random slopes (see Table 6). Subsequent models therefore did not include random slopes for demographic variables.8

Site characteristics were then entered in individual models as fixed effects. Variables were dummy coded if categorical and rescaled to a sensible metric if continuous (to aid in the interpretation of the unstandardized coefficients). For instance, the number of students enrolled was divided by 10,000 so that regression coefficients can be interpreted as the predicted amount of change in each trait for every 10,000 students. All site-level percentages (e.g., percent of the student body that is female) were divided by 10, so regression coefficients are interpreted as the predicted amount of change in a trait for each 10% change in the representation of that group. Rescaling changes only the values of the site-level regression coefficients; nothing else about the analysis (i.e., ICCs, proportion of variability explained, t statistics, and p values) changes.

Each site-level characteristic was entered as a predictor of each trait, and the statistical significance of each predictor was evaluated. Additionally, the proportion of between-site variability explained by each Level 2 predictor was calculated, according to the following formula:

\[
\frac{\tau_{00}(\text{model with only Level 1 predictors}) - \tau_{00}(\text{model adding Level 2 predictor})}{\tau_{00}(\text{model with only Level 1 predictors})}
\]
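As an illustration, the sketch below adds one rescaled Level 2 predictor (enrollment per 10,000 students, a hypothetical column name) to the Level 1 model and applies this formula to the estimated between-site variances.

```r
# Proportion of remaining between-site variance explained by one site-level predictor.
library(nlme)

dat$enroll_10k <- dat$total_enrollment / 10000    # per 10,000 students (assumed column)

m_l1 <- lme(extraversion ~ male + age_c + ethnicity,
            random = ~ 1 | site, data = dat, method = "ML", na.action = na.omit)
m_l2 <- lme(extraversion ~ male + age_c + ethnicity + enroll_10k,
            random = ~ 1 | site, data = dat, method = "ML", na.action = na.omit)

tau00_l1 <- as.numeric(VarCorr(m_l1)["(Intercept)", "Variance"])
tau00_l2 <- as.numeric(VarCorr(m_l2)["(Intercept)", "Variance"])
(tau00_l1 - tau00_l2) / tau00_l1                  # share of between-site variance explained
```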

Attraction Effects.

We first consider variables that could be associated with student self-selection into different campuses. To this end, we focused on number of students enrolled in a campus, student body demographics (i.e., gender, age, and ethnic diversity), and campus location.

Detailed results are displayed in Supplemental Table 2. Perhaps unsurprisingly, the size of a student body was positively correlated with its level of Extraversion. Site enrollment size explained 15.6% of the remaining between-site variability in Extraversion (after accounting for Level 1 demographics). Figure 2 displays a simplified version of this analysis (site size was entered as a dichotomous predictor instead of a continuous predictor; sites were dummy coded as having 20,000 or more students vs. fewer than 20,000 students, consistent with IPEDS classifications). The figure shows that larger sites generally tended to have a higher level of Extraversion. Site enrollment size was unrelated to any of the remaining traits.

Figure 2.


Moderation of campus-level Extraversion by size of campus. Scale of points represents the contribution of site to mixed effects meta-analysis (larger points are weighted more heavily). Intervals are 95% confidence intervals. Meta-analytic estimate is the mean for all 30 sites.

One intriguing finding was that Openness was negatively correlated with the proportion of a student body aged 18–24. That is, the greater the proportion of younger students, the lower a site’s Openness to Experience (Figure 3). The proportion of students aged 18–24 explained 68.1% of the between-site variability in Openness. Openness tends to decline with age (Lucas & Donnellan, 2011), so at first blush, this is a somewhat surprising finding. Then again, correlations at the site level need not be consistent, either directionally or in terms of magnitude, with associations at the individual level (Robinson, 1950). Furthermore, the percentage of students aged 18–24 was negatively correlated (at the site level) with the percentage of Black (r = –.50) and Hispanic (r = –.33) students on campus, as well as the percentage of women (r = –.72). Thus, a younger campus may be a less diverse campus, and these lower levels of diversity may be a contributing factor to this negative association.

Figure 3.


Scatterplot of relation between college-level openness and percentage of student body aged 18 to 24 (r = ‒.36). Size of points corresponds to weight of contribution to analysis. Excluding an apparent outlier (Site AD) reduces the correlation to ‒.29.

Supporting this interpretation, variability in campus Openness levels was explained by percentage of White students (54.7%) and Hispanic students (49.1%). There were also nonsignificant but directionally consistent effects for percentage of Black students (20.7%), Asian students (9.5%), and female students (25.4%). All effects were such that the more diverse a campus was (i.e., more nontraditionally aged students, more women, more students of color), the higher the campus’s level of Openness.

The IPEDS database classifies site locations as city, suburban, town, or rural. Cities and suburbs are located in large metropolitan areas, whereas towns and rural areas are located in more sparsely populated areas. Therefore, we further simplified these classifications into urban (cities and suburbs = 1) and non-urban (towns and rural areas = 0). Consistent with the above findings for campus diversity, campuses located in urban areas were higher on Openness to Experience than campuses located in non-urban areas. The urban campus location variable accounted for 21.1% of the between-site variability in Openness. None of the other traits had a significant association with urban campus location.

Selection Effects.

Next, we considered site-level characteristics that might contribute to campus-level differences via selection mechanisms. These variables were percent of applicants admitted, average SAT9 scores (which were required for admission at the majority of the 30 sites), and a dummy variable indicating whether recommendation letters were required (= 1) or optional/not accepted (= 0).

Full results are displayed in Supplemental Table 3. There were no significant associations between percent of applicants admitted and any of the traits examined. The largest effect was for Agreeableness, such that as the percentage of students admitted increased, campuses had slightly lower levels of Agreeableness. Similarly, for average SAT scores, there were no significant effects. The largest effect was for Openness to Experience, and the effect was such that as average SAT score increased, campuses had somewhat lower levels of Openness. As with age of student body, this effect is counter to typical findings at the individual level (Noftle & Robins, 2007).

Finally, there was a significant effect for requiring recommendation letters. Campuses that required recommendation letters tended to have student bodies with higher levels of Agreeableness. Whether or not recommendation letters were required explained 22.8% of the between-site variability in Agreeableness. There was also a significant effect for Neuroticism, such that campuses that required recommendation letters had student bodies with lower levels of Neuroticism. Requiring recommendation letters explained 43.1% of the between-site variability in Neuroticism. The effects for the three remaining traits were not statistically significant. In sum, we found some evidence for potential selection effects via requiring letters of recommendation, but not for selectivity more generally, nor for average SAT scores.

Additional Site-Level Variables.

We also considered several other site characteristics that are difficult to classify as potential causes of either attraction or selection effects. Nonetheless, these seem like variables of interest to both students and educators. We considered whether a school was public (= 1) versus private (= 0) and whether it was a land grant institution (= 1) or not (= 0). We also considered the cost of attendance for in-state (i.e., resident) students and out-of-state (nonresident) students (scaled in $10,000 increments), as well as the size of an institution’s endowment (in billions of U.S. dollars). Finally, we examined full-time retention rates (scaled in 10% increments), defined as the percentage of full-time students who return after the first year for their second year of study.

Results are displayed in Supplemental Table 4. Public schools had lower levels of Agreeableness than private schools, with school type explaining 20.1% of the between-site variability. Land grant institutions had lower levels of Conscientiousness than non–land grant institutions, with the land grant variable explaining 38.8% of between-site variability in Conscientiousness. More expensive schools had higher levels of Neuroticism. The effect was statistically significant for out-of-state tuition, with price explaining 24.9% of the variability in Neuroticism, and the effect was marginal for in-state tuition, with price explaining 18.3% of the variability in Neuroticism. There were no statistically significant effects for either endowment or retention rates. Potential explanations for these effects are considered in the Discussion, though in general there did not appear to be any obvious themes in these findings.

Regional Differences.

Finally, we considered whether there was evidence for regional differences between sites. Colleges and universities are, of course, situated in geographic regions. However, given the somewhat transient nature of college student residents, and the fact that many colleges attract students from across the country to their campuses, it is unclear to what extent these college campuses will reflect the typical traits of their surrounding regions. Previous research has shown that there are robust regional differences in traits (Rentfrow, Gosling, & Potter, 2008), but it remains an open question whether there are still site differences in traits after accounting for variance due to shared regions.

The IPEDS database classifies institutions according to eight different areas of the United States: New England, Mid East, Great Lakes, Plains, Southeast, Southwest, Rocky Mountains, and Far West. We conducted a series of multilevel analyses, with random intercepts for region only, for site only, and for both regions and sites. The first two models are each nested within the final model, so two chi-square difference tests were conducted to determine whether the model with both random effects offered significantly more explanatory power than each of the models with only one random effect.
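One way to implement these comparisons in nlme is to treat sites as nested within regions, as sketched below; the nesting specification and variable names are assumptions, and the demographic covariates are the same (assumed) columns used in the earlier sketches.

```r
# Region-only, site-only, and region + site random intercepts for one trait.
library(nlme)

m_region <- lme(extraversion ~ male + age_c + ethnicity,
                random = ~ 1 | region, data = dat, method = "ML", na.action = na.omit)
m_site   <- lme(extraversion ~ male + age_c + ethnicity,
                random = ~ 1 | site, data = dat, method = "ML", na.action = na.omit)
m_both   <- lme(extraversion ~ male + age_c + ethnicity,
                random = ~ 1 | region/site, data = dat, method = "ML", na.action = na.omit)

anova(m_region, m_both)    # does adding site improve on region alone?
anova(m_site, m_both)      # does adding region improve on site alone?
```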

The results are displayed in Table 7. On their own, regions explained some variability in trait levels. However, they generally did not explain as much variability as sites, and when both regions and sites were included in the same model, regions generally failed to explain any variability above and beyond sites. This suggests that there are differences in personality traits between colleges and universities that are not reducible to regional differences. We consider the implications of this interpretation further in the Discussion.

Table 7.

Regional Differences

LR (df = 1) Intercept (γ00) Site Var. (τ00) R/S Residual (σ2) ICC (1) R/S
Region only
  Extraversion 44.65* 3.42 (0.03) .007 .806 .009
  Agreeableness 34.65* 4.12 (0.03) .005 .452 .010
  Conscientiousness 1.76 3.63 (0.02) .003 .642 .004
  Neuroticism 5.00* 2.87 (0.02) .003 .615 .005
  Openness 18.60* 3.74 (0.02) .002 .563 .003
Site only
  Extraversion 0.01 3.40 (0.03) .012 .799 .015
  Agreeableness 0.03 4.11 (0.02) .010 .449 .021
  Conscientiousness 0.51 3.63 (0.02) .004 .641 .005
  Neuroticism 0.01 2.88 (0.02) .004 .613 .007
  Openness 0.51 3.75 (0.02) .006 .560 .011
Region + Site
  Extraversion 3.40 (0.03) .0001/.012 .799 .0001/.015
  Agreeableness 4.11 (0.02) .0003/.009 .449 .0006/.020
  Conscientiousness 3.63 (0.02) .0011/.002 .641 .0017/.004
  Neuroticism 2.88 (0.02) .0000/.004 .613 .0000/.007
  Openness 3.74 (0.02) .0013/.005 .560 .0023/.010

Note. Var. = variability; R = region; S = site. Models control for individual-level demographic variables. Likelihood ratio (LR) tests compare model with one random intercept (either region or site) to model with two random intercepts (for both region and site).

*

p<.05.

DISCUSSION

The objective of this study was twofold: (a) to quantify the amount of between-university differences in the Big Five personality traits and (b) to test whether these differences could be explained by site-level variables that may serve as proxies for attraction, selection, and socialization effects. Considering the first goal, we found evidence that site accounted for between 0.9% and 2.9% of the variability in traits. Even after considering individual-level demographic differences, there was persistent variability in trait levels between different colleges and universities. In the Cohen’s d metric, sites differed on average from each other by 0.16 to 0.22 standardized units. Taken together, these results suggest small but meaningful between-site differences, though the majority of variability was within site. This suggests that college student samples from different universities may not be interchangeable in terms of their average personality trait profile. Although the effects we found are somewhat small, they should not be dismissed as necessarily trivial or unimportant.

Regarding site-level variables that explained between-campus trait differences, we found evidence for several potential attraction effects and a few selection effects. Larger campuses had more extraverted students, and more diverse and urban campuses had more open students. Campuses that required letters of recommendation had more agreeable and less neurotic students. These results offer promising avenues to pursue in future research and provide insights into broad strokes differences between college student samples drawn from different kinds of campuses.

According to lifespan developmental theory (Roberts et al., 2012), we would expect that attraction, selection, attrition, and socialization are all processes that would contribute to differences in students’ traits between campuses, but with cross-sectional data, we cannot test the hypothesis that students’ traits change in the direction of their peers over time. Future longitudinal studies should therefore test to what extent observed differences are due mainly to attraction/attrition and selection (as we have suggested here) versus socialization. It should further be noted that our classification of various site-level characteristics as “attraction” versus “selection” effects is face valid but probably too simplistic. For instance, although admissions officers select students based on their standardized test scores, students also choose which schools to apply to based on their perceptions of a school’s selectivity (likely derived from these same test scores). Selection and attraction effects likely work in tandem.

All told, these results suggest there is more variability between students at different colleges and universities than some researchers might have expected. At least in terms of Big Five personality traits, a college student sample from one school may not approximate a sample from another school. It remains to be seen, however, to what extent these trait differences translate into differences in results for studies being conducted at different colleges and universities. In a recent investigation, the Many Labs 3 team (Ebersole et al., in press) examined whether personality (measured with the Ten-Item Personality Inventory; Gosling, Rentfrow, & Swann, 2003) varied over the course of the semester, as well as to what extent any site differences moderated observed experimental results. Unfortunately, the majority of the effects assessed were null, and there was little between-site variation in experimental effect sizes, so a full test of this hypothesis remains to be done. (It should be noted, however, that personality traits did differ from site to site across the 20 Many Labs schools, to a similar extent as they do in the current data set.) Given the size of the differences observed between campuses in personality traits, we urge researchers to be very cautious about invoking Big Five personality traits as moderators of cross-site replicability in the absence of a well-articulated theory about the underlying mechanism and empirical evidence about the size of the potential differences across campuses. The observed effect sizes do not make personality traits a very likely candidate for such a moderator.

There are also interesting parallels between the current research and previous cross-cultural personality studies. For instance, Schmitt, Allik, McCrae, and Benet-Martínez (2007) found sizable mean differences in Big Five traits across 56 nations. Their study, though notable for its large sample and the large number of countries represented, was fairly typical of much cross-cultural work in that one sample per country (except for the United States) was taken to be representative of the whole country. Our results illustrate some of the limitations of this approach. In the current data, we could potentially obtain a very different picture of what a “typical” personality profile of someone from the United States looks like depending on which sample we considered.

However, there are several important limitations of our study to note. First, each of our 30 samples was a convenience sample, consisting mainly of students enrolled in psychology, human development, family studies, and business courses. The results therefore may not generalize to these colleges and universities as a whole. Furthermore, social science majors may not be representative of typical college students. That said, assuming that there are similarities between students in these fields across universities (see, e.g., Skatova & Ferguson, 2014), our estimates of between-university differences may be somewhat conservative. Second, our analyses, particularly regarding the site-level moderators, are likely somewhat underpowered (due to having only 30 sites in our data set). It is also challenging to consider combinations of site-level characteristics in the same analysis. This makes it difficult to draw sweeping conclusions about the explanatory power of these site-level characteristics, and instead, we treat these as promising future directions to pursue in subsequent studies with larger numbers of sites.

Overall, as psychologists are pondering the robustness of the existing empirical literature, it makes sense for all researchers to try to think more carefully about important individual difference characteristics that may moderate a given effect. To the extent that an effect only holds for individuals with a specific profile of characteristics, we would not expect results to replicate in a new sample of individuals who do not have those characteristics. On the other hand, it may be that many of our most central effects are relatively universal—that is, they do not depend on the level of various individual difference variables. To the extent that this is the case, we would not expect to see as much variability in effects from sample to sample, even if there are meaningful differences in mean levels of traits across different samples. That is, mean differences may not imply structural path differences. We therefore conclude with a call for researchers to continue promising work with large-scale collaborations (as was the case for this project and as the Many Labs group is currently pursuing) because such collaboration offers a chance to involve researchers from a large number of organizations, to increase Ns, to test for between-site differences, and to generally increase collaborative work in personality psychology.

Supplementary Material

Tables 1-4

Supplementary Table 1. Absolute Value of Each Site’s Distance from Grand Mean

Supplementary Table 2. Multi-Level Models for Potential Attraction Effects with Site-Level Predictors

Supplementary Table 3. Multi-Level Models for Potential Selection Effects with Site-Level Predictors

Supplementary Table 4. Multi-Level Models for Additional Site-Level Predictors

Table 5

Supplementary Table 5. Fit Statistics from Models with Random Slopes for Traits

Acknowledgments

We wish to acknowledge Michael Vernon, who was the webmaster for the Multi-Site University Study of Identity and Culture (MUSIC). We would also like to acknowledge the following collaborators who were instrumental in collecting the MUSIC data: V. Bede Agocha, Melina Bersamin, Britton Brewer, Elissa Brown, Miguel A. Cano, S. Jean Caraway, Gustavo Carlo, Linda G. Castillo, H. Harrington Cleveland, Matthew J. Davis, Roxanne Donovan, Larry F. Forthun, Anthony D. Greene, Lindsay S. Ham, Sam A. Hardy, Monika Hudson, Eric Hurley, Que-Lam Huynh, Maria Iturbide, Thao N. Le, Richard M. Lee, Irene J. K. Park, Vicky Phares, Stephanie Pituc, Russell D. Ravert, Liliana Rodriguez, Ariz Rojas, Adriana Umaña-Taylor, Alexander T. Vazsonyi, Robert S. Weisskirch, Susan Krauss Whitbourne, Jacquelyn D. Wiersma, Michelle K. Williams, Gloria Wong, and Nolan Zane.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Notes

1.

Researchers sometimes fail to distinguish between active role choices on the part of individuals (i.e., attraction) and selection into a role by other people (i.e., selection), terming both types of behavior selection effects. Of course, both of these processes often work in tandem (a free person cannot be selected into a role by another person unless she was first attracted to that role), but for clarity and consistency with the organizational psychology literature, we distinguish between these processes here.

2.

Roberts (2006) distinguishes between transformation (i.e., changes in personality resulting from experiences in an organization or a role) and manipulation (i.e., changes to an organization stemming from an individual’s behavior), both of which can be considered finer gradations of socialization.

3.

These measures included, but were not limited to, measures of risk-taking, well-being, identity and cultural values, and stress (e.g., Schwartz et al., 2011; Waterman et al., 2013; Zamboanga et al., 2015). We chose to focus on the Big Five traits because of their breadth of coverage and their centrality to personality research. We have not yet applied similar analyses to any other measures in the data set.

4.

Alternatively, the remaining 19 sites with N<300 can be treated as one combined site and included in the analysis. The conclusions from this analysis are the same (a table of fit statistics is on the Open Science Framework project page).

5.

The CFI statistics for our models were below the typical cut-off of .90, but this is typical for personality measurement models (see Hopwood & Donnellan, 2010). Allowing cross-loadings would likely improve model fit, but the purpose of this analysis is to assess relative fit, not maximize global fit.

6.

One could also use random effects meta-analysis to conduct these analyses, but it seemed more appropriate to view these 30 sites as a sample of possible sites drawn from a population and use multilevel modeling (MLM). Regardless, very similar conclusions are drawn using the metafor package (Viechtbauer, 2010) with maximum likelihood estimation in R.

7.

The Cohen’s ds for gender differences in our full sample are seemingly comparable to Schmitt, Realo, Voracek, and Allik’s (2008) estimates for American participants. We found d = 0.14 (E), 0.57 (A), 0.21 (C), 0.45 (N), and ‒0.13 (O), compared to their values of 0.15 (E), 0.19 (A), 0.20 (C), 0.53 (N), and ‒0.22 (O).

8.

We also tested models allowing each trait to predict each other trait, and we allowed random slopes for the trait predictors. Of 20 models, only two showed evidence of random slopes (see Supplementary Table 5). This suggests that for the most part, correlations between traits were consistent from site to site.

9.

Substituting average ACT score, an alternative achievement test, yields very similar results to those presented here. All sites required either ACT or SAT scores for admission.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  1. Asendorpf JB, Conner M, De Fruyt F, De Houwer J, Denissen JJA, Fiedler K, et al. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27, 108–119.
  2. Castillo LG, & Schwartz SJ (2013). Introduction to the special issue on college student mental health. Journal of Clinical Psychology, 69, 291–297.
  3. Chen FF (2008). What happens if we compare chopsticks with forks? The impact of making inappropriate comparisons in cross-cultural research. Journal of Personality and Social Psychology, 95, 1005–1018.
  4. Cheung GW, & Rensvold RB (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
  5. Cohen J (1992). A power primer. Psychological Bulletin, 112, 155–159.
  6. Cortina JM, Goldstein NB, Payne SC, Davison HK, & Gilliland SW (2000). The incremental validity of interview scores over and above cognitive ability and conscientiousness scores. Personnel Psychology, 53, 325–351.
  7. Donnellan MB, Oswald FL, Baird BM, & Lucas RE (2006). The mini-IPIP scales: Tiny-yet-effective measures of the Big Five factors of personality. Psychological Assessment, 18, 192–203.
  8. Ebersole CR, Atherton OE, Belanger AL, Skulborstad HM, Allen JM, Banks JB, et al. (in press). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology.
  9. Fabrigar LR, & Wegener DT (in press). Conceptualizing and evaluating the replication of research results. Journal of Experimental Social Psychology. Advance online publication. doi: 10.1016/j.jesp.2015.07.009
  10. Gosling SD, Rentfrow PJ, & Swann WB Jr. (2003). A very brief measure of the Big Five personality domains. Journal of Research in Personality, 37, 504–528.
  11. Hill PL, Jackson JJ, Nagy N, Nagy G, Roberts BW, Lüdtke O, et al. (in press). Majoring in selection, and minoring in socialization: The role of the college experience on goal change post high school. Journal of Personality.
  12. Hopwood CJ, & Donnellan MB (2010). How should the internal structure of personality inventories be evaluated? Personality and Social Psychology Review, 14, 332–346.
  13. Ioannidis JPA (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
  14. Jackson JJ, Thoemmes F, Jonkmann K, Lüdtke O, & Trautwein U (2012). Military training and personality trait development: Does the military make the man, or does the man make the military? Psychological Science, 23, 270–277.
  15. Lishner DA (2015). A concise set of core recommendations to improve the dependability of psychological research. Review of General Psychology, 19, 52–68.
  16. Lucas RE, & Donnellan MB (2011). Personality development across the life span: Longitudinal analyses with a national sample from Germany. Journal of Personality and Social Psychology, 101, 847–861.
  17. Lüdtke O, Roberts BW, Trautwein U, & Nagy G (2011). A random walk down university avenue: Life paths, life events, and personality trait change at the transition to university life. Journal of Personality and Social Psychology, 101, 620–637.
  18. Muthén LK, & Muthén BO (1998–2014). Mplus user's guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
  19. Noftle EE, & Robins RW (2007). Personality predictors of academic outcomes: Big Five correlates of GPA and SAT scores. Journal of Personality and Social Psychology, 93, 116–130.
  20. Nosek BA, Spies JR, & Motyl M (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615–631.
  21. Paunonen SV, Jackson DN, & Oberman SM (1987). Personnel selection decisions: Effects of applicant personality and the letter of reference. Organizational Behavior and Human Decision Processes, 40, 96–114.
  22. Pinheiro J, Bates D, DebRoy S, Sarkar D, & R Core Team (2015). nlme: Linear and nonlinear mixed effects models. R package version 3.1-120. Retrieved from http://CRAN.R-project.org/package=nlme
  23. Rentfrow PJ, Gosling SD, & Potter J (2008). A theory of the emergence, persistence, and expression of geographic variation in psychological characteristics. Perspectives on Psychological Science, 3, 339–369.
  24. Rentfrow PJ, Jokela M, & Lamb ME (2015). Regional personality differences in Great Britain. PLoS ONE, 10(3), e0122245. doi: 10.1371/journal.pone.0122245
  25. Roberts BW (2006). Personality development and organizational behavior. Research in Organizational Behavior, 27, 1–40.
  26. Roberts BW, Caspi A, & Moffitt TE (2003). Work experiences and personality development in young adulthood. Journal of Personality and Social Psychology, 84, 582–593.
  27. Roberts BW, Donnellan MB, & Hill PL (2012). Personality trait development in adulthood: Findings and implications. In Tennen H & Suls J (Eds.), Handbook of psychology (Vol. 5, 2nd ed., pp. 183–196). Hoboken, NJ: Wiley.
  28. Roberts BW, Wood D, & Caspi A (2008). The development of personality traits in adulthood. In John OP, Robins RW, & Pervin LA (Eds.), Handbook of personality (3rd ed., pp. 375–398). New York: Guilford Press.
  29. Robinson WS (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.
  30. Schmitt DP, Allik J, McCrae RR, & Benet-Martínez V (2007). The geographic distribution of Big Five personality traits: Patterns and profiles of human self-description across 56 nations. Journal of Cross-Cultural Psychology, 38, 173–212.
  31. Schmitt DP, Realo A, Voracek M, & Allik J (2008). Why can't a man be more like a woman? Sex differences in Big Five personality traits across 55 cultures. Journal of Personality and Social Psychology, 94, 168–182.
  32. Schneider B, Smith DB, & Goldstein HW (2000). Attraction, selection, attrition: Toward a person-environment psychology of organizations. In Walsh WB, Craik KH, & Price RH (Eds.), Person-environment psychology (2nd ed., pp. 61–85). Mahwah, NJ: Erlbaum.
  33. Schwartz SJ, Weisskirch RS, Zamboanga BL, Castillo LG, Ham LS, Huynh Q, et al. (2011). Dimensions of acculturation: Associations with health risk behaviors among college students from immigrant families. Journal of Counseling Psychology, 58, 27–41.
  34. Skatova A, & Ferguson E (2014). Why do different people choose different university degrees? Motivation and the choice of degree. Frontiers in Psychology, 5, 1244. doi: 10.3389/fpsyg.2014.01244
  35. Viechtbauer W (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.
  36. Waterman AS, Schwartz SJ, Hardy SA, Kim SY, Lee RM, Armenta BE, et al. (2013). Good choices, poor choices: Relationship between the quality of identity commitments and psychosocial functioning. Emerging Adulthood, 1, 163–174.
  37. Weisskirch RS, Zamboanga BL, Ravert RD, Whitbourne SK, Park IK, Lee RM, et al. (2013). An introduction to the composition of the Multi-Site University Study of Identity and Culture (MUSIC): A collaborative approach to research and mentorship. Cultural Diversity and Ethnic Minority Psychology, 19, 123–130.
  38. Zamboanga BL, Pesigan IA, Tomaso CC, Schwartz SJ, Ham LS, Bersamin M, et al. (2015). Frequency of drinking games participation and alcohol-related problems in a multiethnic sample of college students: Do gender and ethnicity matter? Addictive Behaviors, 41, 112–116.


Supplementary Materials


Supplementary Table 1. Absolute Value of Each Site’s Distance from Grand Mean

Supplementary Table 2. Multi-Level Models for Potential Attraction Effects with Site-Level Predictors

Supplementary Table 3. Multi-Level Models for Potential Selection Effects with Site-Level Predictors

Supplementary Table 4. Multi-Level Models for Additional Site-Level Predictors


Supplementary Table 5. Fit Statistics from Models with Random Slopes for Traits
