Abstract
This paper highlights the importance of examining individual, classroom, and school-level variables simultaneously in early childhood education research. While it is well known that Hierarchical Linear Modeling (HLM) in school-based studies can be used to account for the clustering of students within classrooms or schools, less known is that HLM can use random effects to investigate how higher-level factors (e.g., effects that vary by school) moderate associations between lower-level factors. This possible moderation can be detected even if higher-level data are not collected. Despite this important use of HLM, a clear resource explaining how to test this type of effect is not available for early childhood researchers. This paper demonstrates this use of HLM by presenting three analytic examples using empirical early childhood education data. First, we review the school-level effects literature and HLM concepts to provide the rationale for testing cross-level moderation effects in education research; next we briefly review the literature on the variables used in our three examples (viz., teacher beliefs and student socioemotional behavior); then we describe the dataset to be analyzed; and finally we guide the reader step-by-step through analyses that show the presence and absence of fixed effects of teacher beliefs on student social outcomes and the erroneous conclusions that can occur if school-level moderation (i.e., random effects) tests are excluded from analyses. This paper provides evidence for the importance of testing how teachers and students impact each other as a function of school differences, shows how this can be accomplished, and highlights the need to examine random effects of clustering in educational models to ensure the full context is accounted for when predicting student outcomes.
Keywords: random effects, hierarchical linear modeling, school-level effects, moderation
Children in educational contexts are inherently part of a complex system of interlocking levels of influence. Schools and childcare settings are ecological systems (Bronfenbrenner, 2009) that have a natural hierarchy, such as students clustered within classrooms (or teachers), which are then nested within schools or centers, which are then nested within districts and communities. Each of these settings has its own unique impact on children’s growth, development, and learning (Burstein, 1980; Raudenbush & Willms, 1995). In educational settings, research typically focuses on understanding student cognitive, social, emotional, or physical outcomes. Misinterpretation of relations among these variables can result if the clustering of the data is not accounted for in analyses (Dorman, 2008). Hierarchical linear modeling (HLM; see Raudenbush & Bryk, 1986) is one method to account for nested data. HLM, also known as multilevel modeling or mixed-effects modeling in other fields, can be used with longitudinal data to look at student trajectories (i.e., within-student effects over time) while investigating contextual influences on these trajectories (Raudenbush & Bryk, 2002). The current paper alternatively focuses on utilizing HLM to analyze cross-sectional data (Burstein, 1980) to investigate the moderating influence of school-level factors on the relations among lower-level teacher/classroom and student variables.
School-Level Effects in Multi-Level Education Research
Examining school-level effects on student outcomes is not a new endeavor. School-level impacts on students have typically been investigated as direct paths, for example examining the effect of school resources on student outcomes (Greenwald et al., 1996), or indirect paths via the classroom-level, such as examining principals’ practices that impact classroom teachers who then impact individual student outcomes (Louis et al., 2010).
Examples of investigating direct school-level effects on student outcomes include studies of school size (Lee, 2000), principal leadership (Ross & Gray, 2006; Sebastian & Allensworth, 2012; Tan et al., 2024), type of school, such as public versus private (Raudenbush & Bryk, 1986), and school climate (see Anderson, 1982, and Wang & Degol, 2016, for reviews). Student outcomes studied most often are academically oriented, but also include variables like student aggression, classroom quality, and student bullying (Anderson, 1982; Gage et al., 2014; Rochester et al., 2019). Direct effects of school-level intervention on student outcomes have also been tested (e.g., social competence outcomes in the Conduct Problems Prevention Research Group, 2010).
Some researchers have taken a more nuanced approach by testing mediation model pathways to connect school factors to student outcomes. For example, school climate has been shown to impact teacher stress, which then impacts the learning environment and student socioemotional and academic outcomes (Valiente et al., 2021). Similarly, school leadership has been associated with teachers’ instruction and efficacy, which in turn impacts academic student outcomes (Ross & Gray, 2006; Sebastian & Allensworth, 2012).
This research shows that characteristics at the school level can impact student outcomes, as demonstrated by tests of cross-level effects in which variables at different levels impact each other (e.g., a school-level variable impacts a student-level variable). However, an additional layer not typically investigated in the school-effects literature, particularly in early childhood settings, is cross-level moderation (i.e., how school variables can impact the link between classroom-level and student-level variables; see Figure 1), which can be tested using HLM.
Figure 1.

School-Level Moderation
Higher-level moderation effects—or random effects—of school-level variables on lower-level relations are seldom hypothesized and rarely tested. While the ability to calculate such random effects is not new (see Raudenbush & Bryk, 1986), the importance of testing for these effects has received little attention in educational research. Yet if researchers do not look for such moderation effects, significant associations between teacher and student variables can be missed. HLM is one method that can be used to test such cross-level moderation hypotheses, even if school-level variables were not measured. In the current study, we illustrate how to test for such school-level moderation effects in an early childhood setting. Specifically, we use a three-level model to test whether there are school-level effects on how teacher beliefs about classroom social relations and bullying impact students’ social outcomes.
One study that does test higher-level (school-level) moderation (Neuenschwander et al., 2017) examined whether school poverty affects the relation between teacher stress and student executive function. In that case, the school-level variable tested (viz., poverty) had been measured. However, a moderation effect can be identified even without having measured a school-level variable, an analysis that will be illustrated in the current study. Although moderation models are relatively uncommon in school-effects research, Raudenbush and Bryk (2002) state that one of the general research purposes of utilizing an HLM approach is, “the formulation and testing of hypotheses about cross-level effects (e.g., how varying school size might affect the relationship between social class and academic achievement within schools)” (p. 7). The focus of the current paper is to expand on this purpose of utilizing HLM to test for such cross-level moderation influences between teacher- and student-level variables.
Statistical Approach and Applications to Education Research Concepts
HLM History, Concepts, and Current Directions
Traditionally, before statistical methods could account for the hierarchical nature of data in schools, aggregated variables for the class and school were used in single-level regression models to look at effects on student outcomes, but this type of analysis has many validity concerns (Hill & Rowe, 1996; Raudenbush & Bryk, 1986). Because students in the same classroom or school share the same environmental context, the assumption of independence is violated: students in the same school have a systematic reason that their outcome scores may be more similar to each other than to the scores of students outside their school context (Peugh, 2010). Disaggregation of variables ignores the variance between classrooms and schools by analyzing all variables at the student level no matter what level the variables represent (i.e., classroom, school), while aggregation ignores individual differences by taking the mean of lower-level variables (Woltman et al., 2012). In the past, researchers acknowledged problems with analyzing single-level models for multi-level data; however, many of these practices continued because no practical alternatives were available (Hill & Rowe, 1996; Raudenbush & Bryk, 1986). As statistical methods and software advanced (see Hoffman & Walters, 2022, for a current overview), it became more common to utilize HLM to analyze school-level impacts on student outcomes by investigating the proportion of variance in the outcome attributed to each level of interest (Hill & Rowe, 1996). The HLM method improved on single-level analyses because it represents each level of the model with its own submodel, allowing for comparisons of variables within each level and between levels (Raudenbush & Bryk, 2002).
Three main purposes of utilizing HLM (according to Raudenbush & Bryk, 2002) include improving estimates when sample sizes are small by pulling from a larger group of participants (i.e., collecting data from participants in multiple schools to increase sample size rather than analyzing participants in just one school); investigating differences between environments; and testing for variance at multiple levels of analysis.
In alignment with the last purpose, current methods continue to build on the HLM approach by focusing on more nuanced hypotheses that include within-level comparisons, between-level comparisons, and the variation accounted for at each of these levels (Raudenbush & Bryk, 2002; Woltman et al., 2012).
Two additional concepts that are now being incorporated in many HLM analyses include the computation of the intraclass correlation (ICC) for outcome variables and the splitting and centering of predictor variables. An ICC is an indicator of the amount of variance in the outcome variable attributed to each level of the model. By comparison, splitting and centering predictor variables creates separate variables to investigate the impact of distinct levels of the predictor on the outcome. ICC and splitting/centering comprise Step 1 and Step 2 in the statistical method presented in the current paper, namely, the test for random higher-level effects.
Step 1: Intraclass Correlation Coefficient Analyses for Three-level Model
In a two-level model (e.g., level 1 = student and level 2 = classroom), the intraclass correlation coefficient (Rabe-Hesketh & Skrondal, 2012) represents both the proportion of variance of the student outcome accounted for by between-group (classroom) differences (i.e., clustering; Goldstein et al., 2002) and the correlation within groups (i.e., between students in the same class). However, in a three-level model, there are multiple ICCs because there are multiple within-group and between-group variances. The ICCs are calculated using an unconditional model, in which the multilevel nature of the data is incorporated but no predictors are included. The level-3 ICC for the outcome is calculated as the variance at level 3 divided by the total variance (Hedges et al., 2012); when using student/classroom/school data, this can be interpreted as the proportion of all variance accounted for by schools. The level-2 (e.g., classroom-level) ICC can be calculated in two ways. One is to divide the level-2 variance by the total variance (Hedges et al., 2012). However, in some statistical software (such as Stata; StataCorp, 2019), the level-2 ICC is calculated as the proportion of variance at level 2 and above (i.e., levels 2 and 3) over the total variance (Goldstein et al., 2002; Rabe-Hesketh & Skrondal, 2012). It is important to note which method the statistical software uses because an additional step of splitting the level-2 and level-3 variance by hand or with additional syntax may be needed. ICCs are an integral part of HLM since they partition the variance of the outcome into the levels of interest, in our case addressing how much of the variance in a student outcome variable is attributable to the student, classroom, and school levels.
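To make the two calculation conventions concrete, here is a minimal Python sketch using hypothetical variance components from an unconditional three-level model (the numbers are illustrative, not from our data):

```python
# Hypothetical variance components from an unconditional three-level model
var_school = 0.10    # level-3 (between-school) variance; illustrative value
var_class = 0.15     # level-2 (between-classroom) variance; illustrative value
var_student = 0.75   # level-1 (student/residual) variance; illustrative value
total = var_school + var_class + var_student

# Level-3 ICC: proportion of all variance accounted for by schools
icc_school = var_school / total                        # = .10

# Level-2 ICC, method 1: level-2 variance over the total (Hedges et al., 2012)
icc_class = var_class / total                          # = .15

# Level-2 ICC, method 2 (reported by some software, e.g., Stata):
# variance at level 2 *and above* over the total
icc_class_and_above = (var_class + var_school) / total  # = .25

print(icc_school, icc_class, icc_class_and_above)
```

Note that the two level-2 conventions differ exactly by the level-3 variance share, which is why an extra subtraction step may be needed when software reports the cumulative version.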
Step 2: Splitting and Centering Predictor Variables
A second step to understand in connection with HLM is the splitting of predictor variables into higher-level clusters and then centering them (Brincks et al., 2017; Raudenbush & Bryk, 2002). This allows researchers to determine the separate impact of each level of the predictor variables on the outcome variable. There are two methods of centering: centering at the grand mean and centering within clusters (i.e., group-mean centering; Enders & Tofighi, 2007). This paper focuses on the group-mean centering approach because it eliminates between-cluster correlation, resulting in leveled variables that represent the pure relation between the leveled predictor and the outcome (Enders & Tofighi, 2007). In addition, because group-mean centering eliminates correlations caused by clustering (i.e., between clusters), it is an ideal option for identifying true cross-level moderation (Enders & Tofighi, 2007).
Using the group-mean centering method with school data, a predictor variable measured at level 1 (a student predictor variable) would be transformed (or split) into three new predictor variables: a pure level-1 student variable (centered at the classroom mean), a level-2 classroom variable (the classroom mean, centered at the school mean), and a level-3 school variable (the school mean). The three resulting variables are then used in all analyses to interpret the impact of that predictor from a particular level of variation. This applies not only to continuous variables but to categorical variables as well (Yaremych et al., 2021). For example, consider a binary variable, student eligibility for free or reduced lunch (FRL) services, as a predictor variable in a 3-level model of student, classroom, and school with 0 = does not qualify for FRL and 1 = qualifies for FRL. To split and center this variable, (a) take the classroom mean (which will be between 0 and 1, therefore representing the proportion of students in the class who qualify for FRL) and subtract it from the raw code (0 = does not qualify, 1 = qualifies) to create the student-level variable; then (b) take the school mean (the proportion of students in the school who qualify for FRL [between 0 and 1]) and subtract it from the classroom mean to create the classroom-level variable, with the school mean itself serving as the school-level variable. After splitting and centering, the resulting level-1 variable corresponds to the effect of the student’s own FRL status on the outcome. The level-2 variable represents the effect of a class’s FRL eligibility makeup on the outcome. The school-level average of FRL eligibility represents the effect at level 3. Some or all of the adjusted variables can then be used as predictors in the model. This results in pure estimates of the relations between each level of the predictor and the outcome, as the split predictor levels are no longer correlated (Enders & Tofighi, 2007; see Peceguina et al., 2022, for an example).
However, researchers should be mindful of the data they have collected. In the FRL example, if not all students in the classroom or school are in the study (e.g., due to attrition, subsampling, or non-participation), the level-2 and level-3 variables will represent only the eligibility makeup of those in the study, not of the entire classroom or school.
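The splitting-and-centering procedure for the FRL example can be sketched in Python with pandas; the data frame, its values, and the column names are our illustrative assumptions:

```python
import pandas as pd

# Hypothetical student records: binary FRL eligibility (1 = qualifies),
# with students nested in classrooms nested in schools
df = pd.DataFrame({
    "school":    [1, 1, 1, 1, 2, 2, 2, 2],
    "classroom": [1, 1, 2, 2, 3, 3, 4, 4],
    "frl":       [1, 0, 1, 1, 0, 0, 1, 0],
})

# Cluster means: proportion of sampled students who qualify for FRL
df["class_mean"] = df.groupby(["school", "classroom"])["frl"].transform("mean")
df["school_mean"] = df.groupby("school")["frl"].transform("mean")

# Split the raw predictor into three level-specific pieces
df["frl_L1"] = df["frl"] - df["class_mean"]          # student deviation from class
df["frl_L2"] = df["class_mean"] - df["school_mean"]  # class deviation from school
df["frl_L3"] = df["school_mean"]                     # school composition

print(df[["frl", "frl_L1", "frl_L2", "frl_L3"]])
```

By construction the three pieces sum back to the raw code, and the level-1 piece averages to zero within each classroom, which is what removes the between-cluster correlation described above.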
Random Effects Applied to Early Childhood Education Research
As described earlier, when utilizing an HLM analysis, researchers can test not only the direct impact of higher-level variables on lower-level variables (e.g., a school-level variable impacting a classroom-level or student-level variable) but can also use random effects (or random slopes) to test for the influence of higher-level variables on the relation between lower-level variables (i.e., moderation; Heisig & Schaefer, 2019). A random effect is the variance around a parameter estimate, and random-effects testing examines whether this variance around the parameter could have occurred by chance (Raudenbush & Bryk, 1986). If a random effect is significant, this means that in the population the variance around the fixed effect (e.g., the relation between a teacher predictor and a student outcome; see Figure 2 for an example) is not zero, so the slope of the fixed effect varies across clusters. This suggests that something occurring at the higher level is impacting the lower-level relation; in other words, a significant random effect is akin to a significant moderation effect of the higher-level variable on the association between the lower-level variables. This influence can occur in multiple ways, even in two-level models, such as a classroom variable moderating the relation between two student-level variables. However, we will focus on a three-level model where the moderator is at the school level, moderating the relation between classroom- and student-level variables.
Figure 2.

Theoretical Relationship of a Random School-Level Effect Between Teacher Reason and Student Bullying
The specific variable acting as a moderator does not have to be measured or identified; it can instead be inferred from the distribution of the lower-level slope across higher-level units (analogous to a latent variable being inferred from the statistical properties of observed variables in structural equation modeling). Random effects can be present even if fixed effects are nonsignificant, because the presence of a random effect may conceal the significance of the fixed effect. This can happen because the fixed effect uses the grand mean across the higher-level clusters, meaning the slope between the predictor and outcome is not allowed to vary for different clusters. If some relations between the predictor and outcome are positive and others are negative, the fixed effect may appear to be zero (or nonsignificant) even though moderation is present (see Figure 2). Although champions of HLM urge others to pose hypotheses that test random effects (Peugh, 2010; Raudenbush & Bryk, 2002), we found few empirical articles that follow these suggestions.
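This masking phenomenon can be illustrated with a small simulation (hypothetical data, not from our study): when the slope relating a predictor to an outcome is positive in half the schools and negative in the other half, the pooled fixed effect is near zero even though every school shows a strong relation.

```python
import random
random.seed(1)

def ols_slope(xs, ys):
    """Simple OLS slope of y on x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# Hypothetical scenario: the belief-outcome slope is +0.8 in half the
# schools and -0.8 in the other half (an unmeasured school-level moderator)
all_x, all_y, school_slopes = [], [], []
for s in range(20):                                   # 20 schools
    slope = 0.8 if s % 2 == 0 else -0.8
    xs = [random.gauss(0, 1) for _ in range(30)]      # 30 students per school
    ys = [slope * x + random.gauss(0, 1) for x in xs]
    school_slopes.append(ols_slope(xs, ys))
    all_x += xs
    all_y += ys

pooled = ols_slope(all_x, all_y)  # "fixed effect" ignoring school variation
mean_abs = sum(abs(b) for b in school_slopes) / len(school_slopes)
print(f"pooled slope ~ {pooled:.2f}; mean |within-school slope| ~ {mean_abs:.2f}")
```

The pooled estimate is close to zero while the average magnitude of the within-school slopes is large, which is precisely the pattern a random-slope test (but not a fixed-effect test) would flag.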
School-Level Effects in the Current Study
In the data we use for illustrative purposes in the current study, the classroom-level factor examined is teacher beliefs about social exclusion, and the student-level factors are children’s social experiences (e.g., bullying). In early childhood and elementary classrooms, the teacher is part of the classroom social environment; they are not part of children’s peer relationships, but they do play a role in the overall structure and emotional climate of the classroom (see Hughes, 2012, and Rodkin & Ryan, 2012, for reviews). The teacher’s impact has been viewed as an “invisible hand” shaping peer relationships through its influence on the larger classroom social ecology (Farmer et al., 2011). While research supports the supposition that teachers impact the social environment in the classroom, the extent to which teachers can or should influence classroom peer social interactions is unclear. Even the Developmentally Appropriate Practice (DAP) guidelines of the National Association for the Education of Young Children (NAEYC) are not well-defined regarding the teacher’s role in peer relationships. For example, NAEYC suggests that while teachers should encourage social interactions in primary-aged children, “this is not to say that teachers can or should control all aspects” of children’s social interactions (Copple & Bredekamp, 2009, p. 271). The lack of clear guidance could impact teachers’ beliefs about their role in peer relationships. Teachers may or may not view students’ social experiences as their responsibility or even as something they are able to influence (Farmer et al., 2011). One specific area that has empirical support is the link between teacher beliefs and peer bullying. For example, teacher beliefs about bullying have been found to impact social relations in the classroom (Oldenburg et al., 2014) as well as teachers’ likelihood of intervening (Kochenderfer-Ladd & Pelletier, 2008).
On a more general scale, there is inconsistency among findings about the associations between teachers’ beliefs and their practices (see Buehl & Beck, 2015, for a review; Vartuli, 2005). This inconsistency could be because, in most studies, the school context is not adequately considered. Fives and Buehl (2012) propose that the larger contexts of the school and beyond (i.e., external factors) need to be investigated to understand the complex process of when teacher beliefs align with their behaviors and practices that then affect student outcomes. Their call for more research points to the need to investigate a moderation model that identifies whether school-level factors impact the link between teacher beliefs and student outcomes.
Can Testing for Higher-Order Random Effects Change Findings?
To illustrate how the additional step of random effects testing can be utilized in empirical data analysis, three HLM analytic examples will be presented using our real data. The three exemplar models chosen (Models A, B, and C) focus on teacher beliefs about reasons for student social exclusion and students’ social interaction outcomes. These examples are for illustrative purposes; the concepts presented can be applied to a multitude of content areas within education and to hierarchical data outside of the education field.
Research Questions
To examine the impact of testing for random effects of unmeasured school-level factors, we conducted an HLM analysis using our data, guided by the following research questions:
Do teacher beliefs about reasons for student exclusion directly impact student social outcomes?
Does the school context moderate the relation between teacher beliefs about reasons for student exclusion and student social outcomes?
Can significant random effects be present in a model if fixed effects are non-significant?
Method
Participants
This analysis utilized a subsample of the Families and Schools for Health (FiSH) project, an IRB-approved longitudinal study focused on the psychosocial correlates of child obesity, with data obtained from students, their classmates (peers), and their teachers (see Harrist et al., 2016, and Topham et al., 2021, for detailed descriptions). The sample for this analysis included 932 first-grade students within 101 classrooms, nested within 25 schools.
Measures
Level 1 Student Outcome: Peer Reported Mutual Dislike
Peer reports of mutual dislike (sometimes referred to as mutual antipathy; e.g., Yarbrough et al., 2024) were obtained through a series of peer sociometric ratings (Coie et al., 1982). During the second half of the fall semester of 1st grade, researchers worked one-on-one with each student to answer questions about their classmates, using photographs to aid recall. For every classmate, each student was asked, “Tell me, how much do you like to play with [classmate’s first name].” The student answered by pointing at one of three smiley faces associated with (1) I like to a lot, (2) sometimes I like to, sometimes I don’t, or (3) I don’t like to. This three-point scale was then used to compare across students in the same classroom. A mutual dislike was counted if both students in a dyad chose that they did not like to play with the other. The number of mutual dislikes between classmates participating in the study was tallied for each student and then divided by the number of participants in their class.
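A rough Python sketch of this tally follows; the dyad data, the numeric coding, and the choice to divide by the number of participating classmates are our illustrative assumptions, not the study's exact scoring code:

```python
# Hypothetical sociometric ratings: ratings[i][j] is student i's rating of
# classmate j (1 = like a lot, 2 = sometimes, 3 = don't like to play with)
ratings = {
    "A": {"B": 3, "C": 1},
    "B": {"A": 3, "C": 3},
    "C": {"A": 1, "B": 2},
}

def mutual_dislike_scores(ratings):
    """Tally mutual dislikes per student, scaled by participating classmates."""
    students = list(ratings)
    n = len(students)
    scores = {}
    for i in students:
        # A dyad counts only if BOTH members rated the other "don't like" (3)
        count = sum(
            1 for j in students
            if j != i and ratings[i].get(j) == 3 and ratings[j].get(i) == 3
        )
        # Divide by number of participating classmates (an assumed denominator)
        scores[i] = count / (n - 1)
    return scores

print(mutual_dislike_scores(ratings))
```

Here only the A–B dyad is mutual: B also rated C a 3, but C did not reciprocate, so that dyad does not count.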
Level 1 Student Outcome: Teacher Reported 1) Internalizing Problems and 2) Bullying
Two student-level outcomes were measured using two subscales of the teacher-report Behavior Assessment System for Children, 2nd edition (BASC-II; Reynolds & Kamphaus, 2004). The teacher rated each student in the class on the degree to which they exhibited the described behavior using a 4-point rating scale (never to almost always). The first subscale used was child internalizing problems, a composite measure of the anxiety, depression, and somatization subscales (27 items, α = .91). The second subscale was bullying, which measures the frequency of the child’s engagement in bullying behavior (e.g., teases others) (12 items, α = .93).
Level 2 Teacher Predictor: Social Beliefs
Teacher beliefs about developmentally appropriate practices and student social interactions were measured using an adaptation of the Teacher Beliefs Scale (TBS; Burts, 1991) that included additional questions created by the project team. For each question, the teacher responded to the statement on a 4-point scale from strongly disagree to strongly agree. All subscales (see Table 1) were included in the model analyses. For our purposes of demonstrating how random effects may be identified, only results involving one subscale—teacher beliefs about the reason for exclusion—are presented here. The reason for exclusion subscale includes reasons teachers thought children might be left out of play with peers, such as having a disability or other unchangeable characteristics, like being overweight or belonging to an ethnic minority group.
Table 1.
Teacher Beliefs Scale Subscales
| Subscale | # of Items | Cronbach’s α | Sample Item |
|---|---|---|---|
| Reason for Exclusion | 4 | .75 | Children are sometimes excluded from play because they are from an ethnic minority group |
| Intervention | 5 | .74 | Teachers should intervene when children are left out of play at school |
| Concern | 4 | .74 | If teachers could help decrease social exclusion among children the emotional climate of the classroom would improve |
| Changeable | 6 | .80 | The classroom is an appropriate place to teach children how to get along with one another |
| DAP (Original Scale) | 20 | .83 | First grade activities should be responsive to individual differences in interest |
Note. DAP = Developmentally Appropriate Practice.
Demographic Predictor Variables
Collected at Level 1-Student
Demographic variables collected at the student level and used in analyses included sex (analyzed as two categories based on identification by parent and school staff: female or male) and race (analyzed as three categories: White, Native American, and other minority).
Collected at Level 2-Classroom
The total class size was collected for all participating classrooms.
Collected at Level 3-School
School demographics included the number of 1st grade students in the school as a proxy for the size of the school, the percentage of White students in the school, and the percentage of students who qualified for either free or reduced lunches in the school. These values were obtained for the years of data collection via the Common Core of Data (CCD) available online (National Center for Education Statistics). See Table 2 for demographics and information about the participating classrooms and schools.
Table 2.
Sociodemographic Characteristics of Sample
| Characteristics | n | % | M | SD/Range |
|---|---|---|---|---|
| Level 1: Student | | | | |
| Sex | | | | |
| Female | 456 | 49 | | |
| Male | 476 | 51 | | |
| Race | | | | |
| White | 694 | 74 | | |
| Native American | 162 | 17 | | |
| Other Minorityᵃ | 76 | 8 | | |
| Level 2: Class | | | | |
| Class Size | | | 20 | 3.4 |
| Level 3: School | | | | |
| 1st Grade Students | | | 73 | 27.7 |
| White Students (%) | | | 73 | 10.2 |
| Students who Qualify for Free or Reduced Lunches (%) | | | 61 | 13.5 |
| Participating Classes and Schools | | | | |
| Classes | 101 | | 9.2ᵇ | 3–20ᵇ |
| Schools | 25 | | 37.3ᶜ | 8–104ᶜ |

Note. N = 932. ᵃOther minority included African American, Hispanic, Asian, and multi-ethnic categories, which were combined for analysis purposes. ᵇParticipants per class. ᶜParticipants per school.
Analytic Approach
All analyses were run using Stata 16 (StataCorp, 2019). HLM was chosen to answer the research questions because it can test for potential higher-level moderators of lower-level relations while taking into account the nesting of students within classrooms and of classrooms (i.e., teachers) within schools. Testing for random effects requires the following steps (see Table 3), which were followed for each of the three outcomes (mutual dislike, internalizing problems, and bullying).
Table 3.
Overview of Steps for Testing of Random Effects
| Analysis Step | Description |
|---|---|
| 1. Intraclass correlation (ICC) computation | For each outcome, identifies the amount of variance at each level of interest |
| 2. Preparing the variables for analysis: splitting and centering predictor variables | Allows interpretation of predictor variables at each level of analysis |
| 3. Adding demographics to the model | Creates a baseline for the next step |
| 4. Testing the random intercept model | Tests the random intercept model against the model from Step 3 |
| 5. Testing the random effect model | Tests the random effect model against the model from Step 4 |
| 6. Testing a specific moderator | If the random effect from Step 5 is significant, tests potential moderators (i.e., measured variables that could function as the school-level effect) |
Step 1: ICC Computation
Three unconditional models were created (one for each student outcome variable), and the intraclass correlations were calculated for each. With no predictor variables in the model, the resulting intraclass correlation coefficients indicate how much variation there is in each outcome as a function of level (e.g., how much variance in Model A’s outcome variable is accounted for at the school, classroom, and student levels). Even small amounts of variance at a given level can become statistically significant at a later step, once additional variables added to the model shift how the variance is apportioned.
Step 2: Preparing the Variables for Analysis
Predictor variables collected at level 1 and level 2 (including demographics) were split and centered, resulting in corresponding level-1, -2, and -3 predictors (accounting for variance unique to each level). These new variables were used instead of the original predictor variables in all subsequent analyses. In our analyses, some variables were excluded (such as level-2 and level-3 sex). This is because not all students in each class participated in the study, meaning that sex was not collected for every child in the participating classrooms. In addition, since we only collected data from 1st-grade students, the sex makeup derived from the sample would reflect only the participating 1st-grade classrooms, not the entire school. In this case, the level-2 and level-3 variables would indicate the sex makeup of the sampled students rather than of the entire classroom or school (i.e., the sex makeup of participants may not be representative of the classroom or school). However, the level-1 variable created from splitting and centering is still utilized rather than the raw data variable. See Figure 3 for all variables included in the model.
Figure 3.

Variables in Model
Step 3: Adding Demographics
Next, the demographic variables were added to each model, resulting in a model in which only demographics predict the outcome while accounting for clustering. This model was used as a starting point for the next two steps to verify that these demographic variables were not driving any subsequent findings.
Step 4: Testing Random Intercept Models
Next, random intercept models were tested for each of the three outcome variables. A likelihood ratio (LR) test was used to compare the Step 3 model (split and centered demographics only, with no teacher belief variables) to a model that also included the teacher belief variables. This tested whether the belief variables explained statistically significant additional variance in the outcome, and it established the baseline of comparison for testing random effects in the next step.
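The Step 4 model comparison reduces to a standard likelihood ratio test. A minimal sketch with hypothetical log-likelihoods follows; note that models compared this way should typically be fit by full maximum likelihood rather than REML.

```python
from scipy.stats import chi2

def lr_test(ll_reduced: float, ll_full: float, df_diff: int) -> tuple[float, float]:
    """Likelihood ratio test for nested models fit by maximum likelihood.

    ll_reduced / ll_full are the maximized log-likelihoods of the two
    models; df_diff is the number of extra parameters in the full model.
    Returns the LR statistic and its chi-square p-value.
    """
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2.sf(stat, df_diff)

# Hypothetical log-likelihoods: demographics-only model vs. a model that
# also includes two teacher-belief predictors (df_diff = 2)
stat, p = lr_test(ll_reduced=-512.0, ll_full=-509.0, df_diff=2)
print(f"LR = {stat:.1f}, p = {p:.3f}")  # LR = 6.0, p = 0.050
```

A significant p-value here indicates that the teacher belief variables improve the model beyond demographics alone.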
Step 5: Testing for Random Effects
The teacher belief subscales were then tested to determine whether any varied randomly at the school level. This was accomplished with an LR test comparing two models: (1) the outcome regressed on the demographic and teacher belief predictor variables (see Figure 3 for included predictor variables), versus (2) the same model with one of the teacher belief variables additionally specified as a possible random effect at the school level.
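One caveat when applying the LR test in Step 5: the null hypothesis (a random-slope variance of zero) lies on the boundary of the parameter space, so the naive chi-square p-value is conservative. A common approximation when a single variance component is tested is to halve that p-value. The sketch below uses hypothetical log-likelihoods.

```python
from scipy.stats import chi2

def lr_test_variance_component(ll_reduced: float, ll_full: float) -> tuple[float, float]:
    """Boundary-corrected LR test for adding one random-slope variance.

    Under the null, the LR statistic is approximately a 50:50 mixture of a
    point mass at zero and a chi-square with 1 df, so the p-value is half
    the naive chi-square(1) p-value.
    """
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, 0.5 * chi2.sf(stat, 1)

# Hypothetical log-likelihoods: model without vs. with the random slope
stat, p = lr_test_variance_component(ll_reduced=-509.0, ll_full=-506.5)
print(f"LR = {stat:.1f}, p = {p:.3f}")  # LR = 5.0
```

If the added random slope also introduces a covariance with the random intercept, the usual reference distribution is instead an equal mixture of chi-square distributions with 1 and 2 degrees of freedom.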
If the researcher has not collected school-level data, this is where the researcher would stop and speculate about possible moderators in the discussion. If school-level data were collected, the researcher would move on to Step 6.
Step 6: Testing for Moderator
If a random effect is found in Step 5 and a candidate school-level variable was collected, Step 6 tests that variable as a potential moderator. This is accomplished by conducting an additional LR test comparing the Step 5 model to a model that also includes the candidate variable acting as a potential moderator at the school level.
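A common way to carry out Step 6 is to enter the cross-level interaction between the level-2 predictor and the candidate school-level moderator as an additional fixed effect while retaining the random slope (cf. Heisig & Schaeffer, 2019). The sketch below only constructs that interaction term; the variable names (a hypothetical testing-emphasis moderator) are illustrative, not from our data.

```python
# Hypothetical merged records: one per student, with the classroom-centered
# teacher-belief score (level 2) and a candidate school-level moderator
students = [
    {"belief_l2":  0.5, "test_emphasis": 1.0},
    {"belief_l2": -0.5, "test_emphasis": 1.0},
    {"belief_l2":  1.0, "test_emphasis": 0.5},
    {"belief_l2": -1.0, "test_emphasis": 0.5},
]

# The cross-level interaction term is the product of the lower-level
# predictor and the higher-level moderator; in the Step 6 model it enters
# as an additional fixed effect alongside the random slope
for s in students:
    s["belief_x_emphasis"] = s["belief_l2"] * s["test_emphasis"]

print([s["belief_x_emphasis"] for s in students])  # → [0.5, -0.5, 0.5, -0.5]
```

A significant coefficient on the interaction term, together with a reduction in the random-slope variance, would identify the school-level variable as a moderator.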
Results and Discussion
Results for three models are presented. For Model A, the level-1 outcome variable is student internalizing; for Model B, mutual dislike; and for Model C, bullying. See Figure 3 for all predictors included in the models. Additionally, in each model, the teacher reason for exclusion variable was tested as a potential target for school-level random effects.
ICC Results
Level-2 and level-3 ICCs were calculated for the unconditional models for each of the three outcomes (Step 1). From these values, the amount of variance at each level (see Table 4) was calculated. Despite the small amount of variance at the school level, random effects may still be found.
Table 4.
Percent of Variance at Each Level for Outcomes
| Outcome Variable | Student | Classroom | School |
|---|---|---|---|
| Model A: Internalizing (TR) | 53.6% | 37.3% | 9.1% |
| Model B: Mutual Dislike (PR) | 77.5% | 22.2% | 0.3% |
| Model C: Bullying (TR) | 88.9% | 11.1% | <0.01% |
Note. TR = teacher reported; PR = peer reported.
Model A: Significant Fixed Effect but No Random Effect Found
Results from the analysis of Model A (see Table 5 for random effect model results) indicate that, across all schools, teachers’ reasons for why students are excluded from peer interactions significantly predicted student internalizing behavior (p = .003), as calculated in the Step 4 random intercept model. However, there was no significant random effect of teacher reason for exclusion at the school level (calculated in Step 5), and the fixed effect remained significant (see Table 5). This means that students’ internalizing behavior increases as a function of teachers’ endorsement of the belief that students are excluded by their peers because of unchangeable personal characteristics such as a disability, minority status, or weight, and that this relation is consistent across all schools.
Table 5.
Random Effect Models (Step 5)
| Fixed Effects | Internalizing Behavior (TR) | | | Reciprocated Enemies (PR) | | | Bullying (TR) | | |
|---|---|---|---|---|---|---|---|---|---|
| | Coefficient | SE | p | Coefficient | SE | p | Coefficient | SE | p |
| Level 2 - TBS Reason for Exclusion | 0.16 | 0.06 | .003 | 0.02 | 0.03 | .493 | 0.09 | 0.94 | .362 |
| Level 3 - TBS Reason for Exclusion | 0.16 | 0.09 | .090 | −0.02 | 0.02 | .438 | −0.07 | 0.08 | .394 |
| Constant | 1.00 | 0.66 | .129 | −0.11 | 0.17 | .527 | 1.52 | 0.54 | .005 |

| Random Effects at Level 3 | Internalizing Behavior (TR) | | | | Reciprocated Enemies (PR) | | | | Bullying (TR) | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | SD | p | 95% CI LL | 95% CI UL | SD | p | 95% CI LL | 95% CI UL | SD | p | 95% CI LL | 95% CI UL |
| Level 2 - TBS Reason for Exclusion | <0.01 | .5 | <0.01 | <0.01 | 0.06 | .027 | 0.03 | 0.14 | 0.24 | .023 | 0.09 | 0.61 |
Notes. TR = teacher reported; PR = peer reported; TBS = teacher beliefs scale; LL = lower limit; UL = upper limit; all variables not discussed are left out of the results shown here, but were included in calculations (see Figure 3 for included variables).
Models B and C: No Fixed Effect but Significant Random Effect Found
Results from analyses of both Model B and Model C illustrate that random effects can occur despite non-significant fixed effects (see Table 5). In addition, these findings hold across multiple reporters (viz., peers and teachers). For both mutual dislike and bullying, adding the teacher belief variables in Step 4 did not significantly improve the model (mutual dislike, p = .40; bullying, p = .37). In many cases, researchers may be tempted to stop after Step 4 if no significant result is found. However, even when the fixed effect is not significant, Step 5 should still be conducted because a random effect may be present. This is confirmed in the Step 5 analyses for Models B and C: when the random effect of teacher reason for exclusion is added, the fixed effect remains nonsignificant, but the random effect is significant (see Table 5).
This can happen because the fixed effect is the average of the relation across the higher-level clusters, and that average may be close to zero. In Model C, for example, the fixed effect of level-2 teacher reason for exclusion on bullying is not statistically significant. This is because in some schools there is a positive association between the teacher and student variables (i.e., stronger teacher beliefs correspond with more bullying), while in others the relation is negative (i.e., stronger teacher beliefs correspond with less bullying). Because schools fall on both sides of the average, the average across all schools is not significantly different from zero (i.e., no fixed effect) despite significant school-specific associations between teacher beliefs and bullying. Figure 1 demonstrates this phenomenon. The yellow line represents the fixed effect of level-2 teacher reason on bullying; this is the average across all schools. This line is fairly close to horizontal and thus not statistically different from zero (i.e., no significant fixed effect). However, as shown by the orange lines (and the other corresponding standard deviation lines), in the population, 95% of schools’ relations between level-2 teacher reason and bullying would fall between the two orange lines. This is the random effect, or the distribution around the fixed effect. In this case, some slopes are negative while others are positive, and when averaged they are close to zero. The same pattern occurred between level-2 teacher reasons for exclusion and mutual dislike. These findings indicate that some characteristic measurable at the school level moderates when, and in what direction, level-2 teacher beliefs impact student outcomes. Figure 2 represents this in graphical form.
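A small simulation (hypothetical data, not from our study) makes the point concrete: when school-specific slopes are drawn from a distribution centered at zero, the average (fixed) slope is near zero even though most individual schools show clearly positive or negative relations.

```python
import numpy as np

rng = np.random.default_rng(42)
n_schools, n_students = 40, 30

# School-specific slopes centered on zero: the fixed effect (average
# slope) is ~0, but the slopes vary substantially across schools
true_slopes = rng.normal(loc=0.0, scale=0.5, size=n_schools)

est_slopes = []
for b in true_slopes:
    x = rng.normal(size=n_students)                      # level-2 predictor
    y = b * x + rng.normal(scale=0.5, size=n_students)   # student outcome
    est_slopes.append(np.polyfit(x, y, 1)[0])            # per-school OLS slope

est_slopes = np.asarray(est_slopes)
print(f"mean slope (fixed effect): {est_slopes.mean():+.2f}")
print(f"SD of slopes (random effect): {est_slopes.std():.2f}")
```

The near-zero mean corresponds to the nonsignificant fixed effect, while the sizable standard deviation of slopes corresponds to the significant random effect.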
For Step 6, an optional step (i.e., identifying a random effect in Step 5 is still possible without conducting it), we tested the demographic variables to see if they varied randomly at the school level. No demographic variables were significant, and thus we did not identify a level-3 variable that moderates the relation between teacher beliefs and student outcomes. However, because the level-3 random effect was significant, we know there is an unmeasured school-level variable impacting the relationship, inviting further investigation. These results support Fives and Buehl’s (2012) idea that identifying the set of important moderators is essential to understanding the conditions under which teacher beliefs impact student outcomes and the ways in which beliefs influence teacher practices and students.
Application and Future Research
This paper has the potential to push the early childhood research field forward, given that including tests for random effects in analyses allows more nuanced hypotheses to be developed and tested (Raudenbush & Bryk, 2002). While education researchers have long used HLM, they have generally not taken advantage of the method we present for testing random effects. In our Models B and C, researchers using a traditional approach to HLM might have stopped after determining there were no significant fixed effects in their data (Step 4). However, by following the additional analytic step we outlined (Step 5), we found a significant interaction among the variables, namely, a school-level moderation of the effect of the classroom variable on the student (i.e., a random effect that would have been missed had Step 5 not been conducted). That is, although no specific moderator was identified in our data for Models B and C (Step 6), a random effect was present (Step 5). This means that there is a moderator at the school level, but the set of variables we collected and analyzed in Step 6 did not include it. For the results found, one example of a potential school-level moderator is emphasis on meeting standardized testing criteria (see Henry et al., 2022, for an example of accountability pressure). In some schools, although teachers may observe that students are excluded from peer interactions and may wish to devote more time to supporting student social interactions, constraints on time due to the school’s emphasis on tested subjects may not allow teachers to implement desired practices (Smith & Kovacs, 2011).
When teachers lack the time to support student social interactions, they may believe that students are excluded yet be unable to act on that belief, resulting in negative social outcomes such as bullying and mutual dislike. In other schools with a different degree of testing emphasis, teachers may not feel this pressure and may be able to implement practices based on this belief. As a result, teachers who believe that students are excluded, and that their role is to help change this, would be able to do so, thereby improving student social outcomes (see Dooley & Assaf, 2009, for an example of differing Language Arts practices despite similar beliefs). This is one potential school-level moderator that could be tested in future analyses.
Overall, random effects are important to test even if the fixed effects are null or if higher-level variables were not measured, as the true relations among variables may be obscured by the inability of the regression slopes to vary. The models we presented as examples point out how testing for random effects can help reveal the complex relations among factors that contribute to student outcomes and highlight the important notion that context matters.
Highlights.
Nested or hierarchical school- and center-based data benefit from HLM analysis.
School-level effects can moderate relations between student/classroom variables.
Random effects can indicate a moderator’s presence even when the moderator was not measured.
Acknowledgments
This research was supported in part by the National Institute of Food and Agriculture, U.S. Department of Agriculture, under Agreement No. 05545; Oklahoma Center for the Advancement of Science & Technology, Grant #HR07-044; and Oklahoma Agricultural Experiment Station, Grant #2744, and the Bryan B. Close Professorship in Early Childhood Development. Dr. Swindle is supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number 5P20GM109096 as well as NIH R21CA237984 and R37CA252113. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies. Thank you to the elementary school staff and families who participated in this project.
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Declarations of interest: none.
Causal language is used throughout the paper because of the guiding conceptual model.
References
- Anderson CS (1982). The search for school climate: A review of the research. Review of Educational Research, 52(3), 368–420. 10.3102/00346543052003368 [DOI] [Google Scholar]
- Brincks AM, Enders CK, Llabre MM, Bulotsky-Shearer RJ, Prado G, & Feaster DJ (2017). Centering predictor variables in three-level contextual models. Multivariate Behavioral Research, 52(2), 149–163. 10.1080/00273171.2016.1256753 [DOI] [PubMed] [Google Scholar]
- Bronfenbrenner U (2009). The ecology of human development: Experiments by nature and design. Harvard University Press. [Google Scholar]
- Buehl MM & Beck JS (2015). The relationship between teachers’ beliefs and teachers’ practices. In Fives H & Gill MG (Eds.), International handbook of research on teachers’ beliefs (pp. 66–84). Routledge. [Google Scholar]
- Burstein L (1980). Chapter 4: The analysis of multilevel data in educational research and evaluation. Review of Research in Education, 8(1), 158–233. 10.3102/0091732X008001158 [DOI] [Google Scholar]
- Burts D (1991). Teacher Beliefs Scale. Baton Rouge: School of Human Ecology, Louisiana State University. [Google Scholar]
- Coie JD, Dodge KA, & Coppotelli H (1982). Dimensions and types of status: A cross-age perspective. Developmental Psychology, 18(4), 557–570. 10.1037/0012-1649.18.4.557 [DOI] [Google Scholar]
- Conduct Problems Prevention Research Group (2010). The effects of a multiyear universal social-emotional learning program: The role of student and school characteristics. Journal of Consulting and Clinical Psychology, 78(2), 156–168. 10.1037/a0018607 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Copple C, & Bredekamp S (Eds.). (2009). Developmentally appropriate practice in early childhood programs: Serving children from birth through age 8 (3rd ed.). National Association for the Education of Young Children. [Google Scholar]
- Dooley CM, & Assaf LC (2009). Contexts matter: Two teachers’ Language Arts instruction in this high-stakes era. Journal of Literacy Research, 41, 354–392. 10.1080/10862960903133743 [DOI] [Google Scholar]
- Dorman JP (2008). Conducting statistical tests with data from clustered school samples. International Journal of Research & Method in Education, 31(2), 113–124. 10.1080/17437270802124368 [DOI] [Google Scholar]
- Enders CK, & Tofighi D (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121–138. 10.1037/1082-989X.12.2.121 [DOI] [PubMed] [Google Scholar]
- Farmer TW, Lines MM, & Hamm JV (2011). Revealing the invisible hand: The role of teachers in children’s peer experiences. Journal of Applied Developmental Psychology, 32, 247–256. 10.1016/j.appdev.2011.04.006 [DOI] [Google Scholar]
- Fives H, & Buehl MM (2012). Spring cleaning for the “messy” construct of teachers’ beliefs: What are they? Which have been examined? What can they tell us? In Harris KR, Graham S, & Urdan T (Eds.), APA educational psychology handbook: Individual differences and cultural and contextual factors (Vol. 2, pp. 471–499). American Psychological Association. [Google Scholar]
- Gage NA, Prykanowski DA, & Larson A (2014). School climate and bullying victimization: A latent class growth model analysis. School Psychology Quarterly, 29(3), 256–271. 10.1037/spq0000064 [DOI] [PubMed] [Google Scholar]
- Goldstein H, Browne W, & Rasbash J (2002). Partitioning variation in multilevel models. Understanding Statistics, 1(4), 223–231. 10.1207/S15328031US0104_02 [DOI] [Google Scholar]
- Greenwald R, Hedges LV, & Laine RD (1996). The effect of school resources on student achievement. Review of Educational Research, 66(3), 361–396. 10.3102/00346543066003361 [DOI] [Google Scholar]
- Harrist AW, Swindle TM, Hubbs-Tait L, Topham GL, Shriver LH, & Page MC (2016). The social and emotional lives of overweight, obese, and severely obese children. Child Development, 87(5), 1564–1580. 10.1111/cdev.12548 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hedges LV, Hedberg EC, & Kuyper AM (2012). The variance of intraclass correlations in three- and four-level models. Educational and Psychological Measurement, 72(6), 893–909. 10.1177/0013164412445193 [DOI] [Google Scholar]
- Heisig JP, & Schaeffer M (2019). Why you should always include a random slope for the lower-level variable involved in a cross-level interaction. European Sociological Review, 35(2), 258–279. 10.1093/esr/jcy053 [DOI] [Google Scholar]
- Henry GT, McNeill SM, & Harbatkin E (2022). Accountability-driven school reform: Are there unintended effects on younger children in untested grades? Early Childhood Research Quarterly, 61, 190–208. 10.1016/j.ecresq.2022.07.005 [DOI] [Google Scholar]
- Hill PW, & Rowe KJ (1996). Multilevel modelling in school effectiveness research. School Effectiveness and School Improvement, 7(1), 1–34. 10.1080/0924345960070101 [DOI] [Google Scholar]
- Hoffman L, & Walters RW (2022). Catching up on multilevel modeling. Annual Review of Psychology, 73, 659–689. 10.1146/annurev-psych-020821-103525 [DOI] [PubMed] [Google Scholar]
- Hughes JN (2012). Teachers as managers of students’ peer context. In Ryan AM & Ladd GW (Eds.), Peer relationships and adjustment at school (pp. 189–218). Information Age Press. [Google Scholar]
- Kochenderfer-Ladd B, & Pelletier ME (2008). Teachers’ views and beliefs about bullying: Influences on classroom management strategies and students’ coping with peer victimization. Journal of School Psychology, 46, 431–453. 10.1016/j.jsp.2007.07.005 [DOI] [PubMed] [Google Scholar]
- Lee VE (2000). Using hierarchical linear modeling to study social contexts: The case of school effects. Educational Psychologist, 35(2), 125–141. 10.1207/S15326985EP3502_6 [DOI] [Google Scholar]
- Louis KS, Dretzke B, & Wahlstrom K (2010). How does leadership affect student achievement? Results from a national US survey. School Effectiveness and School Improvement, 21(3), 315–336. 10.1080/09243453.2010.486586 [DOI] [Google Scholar]
- National Center for Education Statistics. Common Core of Data (CCD) [Data set]. Elementary/Secondary Information System. https://nces.ed.gov/ccd/elsi/ [Google Scholar]
- Neuenschwander R, Friedman-Krauss A, Raver C, & Blair C (2017). Teacher stress predicts child executive function: Moderation by school poverty. Early Education and Development, 28(7), 880–900. 10.1080/10409289.2017.1287993 [DOI] [Google Scholar]
- Oldenburg B, van Duijn M, Sentse M, Huitsing G, van der Ploeg R, Salmivalli C, & Veenstra R (2014). Teacher characteristics and peer victimization in elementary schools: A classroom-level perspective. Journal of Abnormal Child Psychology, 43(1), 33–44. 10.1007/s10802-013-9847-4 [DOI] [PubMed] [Google Scholar]
- Peceguina MID, da Graça Daniel JRF, Correia NEFG, & da Mota Aguiar CDR (2022). Teacher attunement to preschool children’s peer preferences: Associations with child and classroom-level variables. Early Childhood Research Quarterly, 60, 150–160. 10.1016/j.ecresq.2022.01.004 [DOI] [Google Scholar]
- Peugh JL (2010). A practical guide to multilevel modeling. Journal of School Psychology, 48, 85–112. 10.1016/j.jsp.2009.09.002 [DOI] [PubMed] [Google Scholar]
- Rabe-Hesketh S, & Skrondal A (2012). Multilevel and longitudinal modeling using Stata. (3rd ed., Vol. 1). Stata press. [Google Scholar]
- Raudenbush S, & Bryk AS (1986). A hierarchical model for studying school effects. Sociology of Education, 59(1), 1–17. 10.2307/2112482 [DOI] [Google Scholar]
- Raudenbush SW, & Bryk AS (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Sage. [Google Scholar]
- Raudenbush SW, & Willms JD (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20(4), 307–335. 10.2307/1165304 [DOI] [Google Scholar]
- Reynolds CR, & Kamphaus RW (2004). BASC-2 Behavioral Assessment System for Children Manual (2nd ed.). Pearson. [Google Scholar]
- Rochester SE, Weiland C, Unterman R, McCormick M, & Moffett L (2019). The little kids down the hall: Associations between school climate, pre-k classroom quality, and pre-k children’s gains in receptive vocabulary and executive function. Early Childhood Research Quarterly, 48, 84–97. 10.1016/j.ecresq.2019.02.008 [DOI] [Google Scholar]
- Rodkin PC, & Ryan AM (2012). Child and adolescent peer relations in educational context. In Harris KR, Graham S, Urdan T, Graham S, Royer JM, & Zeidner M (Eds.), APA educational psychology handbook: Individual differences and cultural and contextual factors (Vol. 2, pp. 363–389). American Psychological Association. 10.1037/13274-015 [DOI] [Google Scholar]
- Ross JA, & Gray P (2006). School leadership and student achievement: The mediating effects of teacher beliefs. Canadian Journal of Education, 29(3), 798–822. 10.2307/20054196 [DOI] [Google Scholar]
- Sebastian J, & Allensworth E (2012). The influence of principal leadership on classroom instruction and student learning: A study of mediated pathways to learning. Educational Administration Quarterly, 48(4), 626–663. 10.1177/0013161X11436273 [DOI] [Google Scholar]
- Smith JM, & Kovacs PE (2011). The impact of standards-based reform on teachers: The case of ‘No Child Left Behind’. Teachers and Teaching: Theory and Practice, 17(2), 201–225. 10.1080/13540602.2011.539802 [DOI] [Google Scholar]
- StataCorp. (2019). Stata Statistical software: Release 16. StataCorp LLC. [Google Scholar]
- Tan CY, Dimmock C, & Walker A (2024). How school leadership practices relate to student outcomes: Insights from a three-level meta-analysis. Educational Management Administration & Leadership, 52(1), 6–27. 10.1177/17411432211061445 [DOI] [Google Scholar]
- Topham GL, Washburn IJ, Hubbs-Tait L, Kennedy TS, Rutledge JM, Page MC, Swindle T, Shriver LH, & Harrist AW (2021). The Families and Schools for Health Project: A longitudinal cluster randomized controlled trial targeting children with overweight and obesity. International Journal of Environmental Research and Public Health, 18(16), 8744. 10.3390/ijerph18168744 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Valiente C, Swanson J, DeLay D, Fraser AM, & Parker JH (2021). Emotion-related socialization in the classroom: Considering the roles of teachers, peers, and the classroom context. Developmental Psychology, 56(3), 578–594. 10.1037/dev0000863 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Vartuli S (2005). Beliefs: The heart of teaching. Young Children, 60(5), 76–86. [Google Scholar]
- Wang M-T, & Degol JL (2016). School climate: A review of the construct, measurement, and impact on student outcomes. Educational Psychology Review, 28, 315–352. 10.1007/s10648-015-9319-1 [DOI] [Google Scholar]
- Woltman H, Feldstain A, MacKay JC, & Rocchi M (2012). An introduction to hierarchical linear modeling. Tutorials in Quantitative Methods for Psychology, 8(1), 52–69. 10.20982/tqmp.08.1.p052 [DOI] [Google Scholar]
- Yarbrough ER, Cohen R, Deptula DP, Ray GE, & Ankney RL (2024). A short-term longitudinal examination of the relation of forms of antipathy relationships to children’s loneliness, peer optimism, and peer sociability behaviors. The Journal of Genetic Psychology, 1–18. 10.1080/00221325.2024.2302813 [DOI] [PubMed] [Google Scholar]
- Yaremych HE, Preacher KJ, & Hedeker D (2021). Centering categorical predictors in multilevel models: Best practices and interpretation. Psychological Methods, 28(3), 613–630. 10.1037/met0000434 [DOI] [PubMed] [Google Scholar]
