Abstract
School value-added models are widely applied to study, monitor, and hold schools to account for school differences in student learning. The traditional model is a mixed-effects linear regression of student current achievement on student prior achievement, background characteristics, and a school random intercept effect. The latter is referred to as the school value-added score and measures the mean student covariate-adjusted achievement in each school. In this article, we argue that further insights may be gained by additionally studying the variance in this quantity in each school. These include the ability to identify both individual schools and school types that exhibit unusually high or low variability in student achievement, even after accounting for differences in student intakes. We explore and illustrate how this can be done via fitting mixed-effects location scale versions of the traditional school value-added model. We discuss the implications of our work for research and school accountability systems.
Keywords: school value-added models, mixed-effects models, mixed-effects location scale models, school effectiveness, school accountability
1. Introduction
School value-added models attempt to estimate school differences in student achievement and are widely applied in educational (Goldstein, 1997; Reynolds et al., 2014; Teddlie & Reynolds, 2000) and statistical research (American Statistical Association, 2014; Braun & Wainer, 2007; McCaffrey et al., 2004; Raudenbush & Willms, 1995; Wainer, 2004). They are also used in school accountability systems in the United States, the United Kingdom, and elsewhere, where the predicted school differences, often referred to as school value-added scores, provide the basis of reward and sanction decisions on schools (Amrein-Beardsley, 2014; Castellano & Ho, 2013; Koretz, 2017; Leckie & Goldstein, 2017; Organization for Economic Cooperation and Development, 2008). In educational and statistical research, there is an additional interest in identifying school policies and practices that predict the school differences and that might therefore prove effective at raising student achievement in schools in general.
The traditional school value-added model is formulated as a mixed-effects (multilevel or hierarchical) linear regression model (Goldstein, 2011; Raudenbush & Bryk, 2002; Snijders & Bosker, 2012) of student current achievement on student prior achievement measured at the start of the value-added period (typically defined as one or more school years or a phase of schooling) and a school random intercept effect to predict the school differences (Aitkin & Longford, 1986; Goldstein et al., 1993; Raudenbush & Bryk, 1986). The adjustment for student prior achievement is fundamental as simpler comparisons of unadjusted school mean achievement would in large part reflect school differences in student achievement present at the start of the value-added period. Such differences are argued to be beyond the control of the school. Student sociodemographic characteristics are often added to adjust for initial school differences in student composition more convincingly (Ballou et al., 2004; Leckie & Goldstein, 2019; Leckie & Prior, 2022; Levy et al., 2023). Schools with higher scores are described as adding more value: producing higher student achievement for any given set of students. The scores are argued to reflect the net influences of differences in the quality of teaching, availability of resources, and other policies and practices across schools, which are typically unobserved to the data analyst. The regression coefficient on student prior achievement is occasionally allowed to vary across schools. The resulting random slope model is sometimes referred to as a “differential school effectiveness” model as this extension allows schools to now have different effects for different types of students (Nuttall et al., 1989; Scherer & Nilsen, 2019; Strand, 2010).
While the traditional school value-added model is widely applied (Levy et al., 2019), it is important to realize that this model is just a regression model fitted to observational data and so the effects attributed to schools may also be caused by other factors that are not captured by the model (American Statistical Association, 2014). That is, while there is consensus that the predicted school effects are fairer and more meaningful measures to compare schools than comparing simple school mean achievement scores, the additional assumptions required to interpret these predicted school effects as causal effects rather than as merely adjusted school mean differences are challenging (Amrein-Beardsley, 2019; Reardon & Raudenbush, 2009; Rubin et al., 2004). For example, the school-level exogeneity assumption (independence of covariates and school random effect) will fail if higher prior achieving students select into more effective schools, perhaps because such students are from more affluent families who are more able to buy into the catchment areas of these schools (Angrist et al., 2021; De Fraine, 2002; Thomas & Mortimore, 1996; Timmermans & Thomas, 2015). The parameter estimates of the school value-added models presented in this article should therefore be viewed as measures of association, and the predicted school effects as descriptive differences in means and variances of student achievement across schools, where inevitably only partial and imperfect adjustments have been made for school differences in student characteristics at intake.
In the traditional school value-added model, the difference between observed and predicted student current achievement defines the total residual, which can be viewed as a covariate-adjusted (residualized) measure of student current achievement (i.e., a controlled comparison of student achievement levels). The total residual is modeled as the summation of the school random intercept effect and the student residual. The school random effect measures the mean student adjusted achievement in each school. In contrast, the constant residual variance implicitly assumes the variance in student adjusted achievement is the same in every school. This inconsistent modeling of the mean and variance does not seem realistic. Any given school policy or practice will have different effects on students as a function of their observed and unobserved characteristics and will therefore contribute to the variance in student adjusted achievement operating in each school. Indeed, this is the motivation for the random slope extension to the traditional value-added model described above. In practice, however, this extension can only be used to account for a limited number of observed student characteristics, not all observed and unobserved student characteristics (Raudenbush & Bryk, 2002). Thus, the different sets of school policies and practices operating in each school will lead the variance in student adjusted achievement to vary across schools, even in random slope models.
Studying the variance in student adjusted achievement in each school may therefore provide valuable new insights into the differences in student learning between schools. Consider two schools that show similarly high levels of mean student adjusted achievement. The traditional school value-added model would describe these two schools as equally effective. Suppose, however, the first school shows higher variance in their student adjusted achievement scores than the second school. Which school should now be viewed more positively? The school with the higher variance will have more students attaining exceptionally high adjusted achievement (a positive) albeit at the expense of more students also attaining unacceptably low adjusted achievement (a negative). All else equal, the school with the higher variance will also show a weaker link between prior and current achievement, and so in this school, low prior achievement students are more able to rise up the achievement distribution (a positive), but equally and necessarily, high prior achievement students are more likely to fall down the distribution (a negative). Thus, in part, how higher variance should be viewed depends on value judgements regarding whether such positives outweigh such negatives. These are not simple questions to answer. Also relevant is the underlying explanation for the difference in variance. For example, if the higher variance seen in the first school is a result of its school policies and practices having greater differential effects on different student groups versus the second school, then higher variance might be viewed as a negative as the explanation implies that the school might not be in sufficient control of the implementation of its policies and practices and is exacerbating inequities in student learning relative to the second school (Nuttall et al., 1989; Scherer & Nilsen, 2019; Strand, 2010). Though, here too, a tension lies around what is the optimal level of control. Again, these are not simple questions to answer. More generally, school differences in the adjusted variances, just like school differences in the adjusted means, may also reflect unmodeled school differences in student intake, and so, it is important to attempt to adjust fully for such differences.
A necessary first step to addressing these bigger questions and debates is to first measure school differences in the variance in student adjusted achievement. Only then can school effectiveness and other researchers follow up individual schools that show unusually high or low variance to try to identify the specific school policies and practices associated with this. Similarly, only then can school accountability systems, via school inspections, ask schools to reflect on any unusual school variance scores and discuss these within the broader context of what is happening in these schools and other schools facing similar challenges. All these discussions should be alert to the descriptive rather than causal nature of the statistics and to the limitations of the data more generally, and these statistics should not be used to make automatic high-stakes judgements on schools.
The aim of this article is therefore to broaden the traditional school value-added model to study the effects of schools on not just mean student current achievement, but the variance in student current achievement. We do this by applying mixed-effects location scale (MELS) models to student current achievement. MELS models are an extension to conventional mixed-effects linear regression models that model the residual variance not as a constant, but as a function of the covariates and a new random effect. Thus, the residual variance is now allowed to vary across schools. Hedeker et al. (2008) illustrated the MELS model in the context of studying intensive longitudinal data on mood. Subsequently, Hedeker and others further developed this class of models and applied it to a range of other longitudinal psychological and health data (e.g., Goldstein et al., 2018; Hedeker et al., 2012; Nordgren et al., 2020; Parker et al., 2021; Rast et al., 2012). Just as mixed-effects models more generally are routinely also applied to clustered cross-sectional data, so can MELS models. Indeed, several such applications have now been published, including in social science research (Brunton-Smith et al., 2017, 2018; Leckie et al., 2014; McNeish, 2021). However, the applicability of MELS models to school value-added studies has not yet been explored. We address this via an application to school value-added models for school accountability in London, England. Specifically, we examine the following research question: How does the variance in student adjusted achievement vary across schools?
This article proceeds as follows. In Section 2, we introduce our application. In Section 3, we present the traditional random-intercept and -slope linear regression school value-added models and their extensions to MELS models. In Section 4, we present the results. In Section 5, we provide a general discussion, including implications of our work for research and school accountability.
2. Application
Background
In England, since 2004, the Government has published school value-added scores derived from school value-added models for all secondary schools in the country in annual school performance tables (https://www.gov.uk/school-performance-tables). These scores aim to measure the value that each school adds to student achievement between the end of primary schooling national Key Stage 2 (KS2) tests (age 11, academic year 6) and the end of compulsory secondary schooling General Certificate of Secondary Education (GCSE) examinations (age 16, academic year 11). The scores play a pivotal role in the national school accountability system, informing school inspections and judgments on schools. They are also promoted to parents as a source of information when choosing schools for their children. Their high-stakes use and public presentation have drawn sustained criticism in the academic literature (Goldstein & Spiegelhalter, 1996; Leckie & Goldstein, 2009, 2017, 2019; Prior, Jerrim, et al., 2021). Nevertheless, these authors also argue that when used carefully and collaboratively with schools in a sensitive and less public manner, there is still an important role for these scores to help identify and understand differences in student outcomes across schools, and it is in this spirit that we have carried out the current research (Goldstein, 2020).
Data, Sample, and Variables
We focus on schools in London and on those students who took their GCSE examinations in 2018 and therefore KS2 tests in 2013. The sample is drawn from the National Pupil Database (Department for Education [DfE], 2023), a census of all students in state education, and consists of 71,321 students in 465 schools (mean = 153 students per school, range = 14–330).
Student current and prior achievement are measured by students’ GCSE examination and KS2 test scores (DfE, 2020). We standardize these scores to have means of 0 and standard deviations (SDs) of 1, so that the measures can be interpreted in SD units. Henceforth, we refer to these standardized scores simply as the student age 16 and 11 scores. Figure 1 shows both scores are approximately normally distributed and linearly related with a strong Pearson correlation of 0.72. There are very slight floor and ceiling effects in age 16 scores.
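To make this step concrete, the standardization might look as follows in R. This is a minimal sketch on simulated data; the data frame and column names are hypothetical, not those of the National Pupil Database, and the correlation is induced artificially for illustration:

```r
# Minimal sketch: standardize raw scores to mean 0 and SD 1 (hypothetical data)
set.seed(1)
n <- 1000
ks2_raw  <- rnorm(n, mean = 30, sd = 5)                    # raw age 11 scores
gcse_raw <- 50 + 1.4 * (ks2_raw - 30) + rnorm(n, sd = 7)   # induce correlation
dat <- data.frame(school = factor(rep(1:50, each = 20)),   # hypothetical school IDs
                  ks2_raw, gcse_raw)
dat$age11 <- as.numeric(scale(dat$ks2_raw))   # student age 11 score (SD units)
dat$age16 <- as.numeric(scale(dat$gcse_raw))  # student age 16 score (SD units)
cor(dat$age16, dat$age11)                     # Pearson correlation, approx. 0.7 here
```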
Figure 1. Histograms and scatterplot of student age 16 and age 11 scores.
Table 1 presents the summary statistics for the student characteristics. Of note, 61% of students are non-White and 35% poor (as measured by receipt of free school meals [FSMs]). The London sample is therefore more ethnically diverse and poorer than the full English sample, where only around 25% of students are non-White and 25% poor (Leckie & Goldstein, 2019).
Table 1. Summary Statistics for the Student Characteristics.
| | N | % |
|---|---|---|
| Age | ||
| Not summer born | 52,957 | 74.3 |
| Summer born | 18,364 | 25.8 |
| Gender | ||
| Boy | 35,338 | 49.6 |
| Girl | 35,983 | 50.5 |
| Ethnicity | ||
| White | 28,070 | 39.4 |
| Black | 15,633 | 21.9 |
| Asian | 14,987 | 21.0 |
| Chinese | 447 | 0.6 |
| Mixed | 5,795 | 8.1 |
| Other | 6,389 | 9.0 |
| Language | ||
| English | 42,789 | 60.0 |
| Not English | 28,532 | 40.0 |
| Special educational need (SEN) | ||
| Not SEN | 61,189 | 85.8 |
| SEN | 10,132 | 14.2 |
| Free school meal (FSM) | ||
| Not FSM | 46,500 | 65.2 |
| FSM | 24,821 | 34.8 |
Note. n = 71,321.
Table 2 presents the summary statistics for the school characteristics. A range of school types operate in London (Leckie & Goldstein, 2019), and we have categorized these into four groups: standard, sponsored academy, converter academy, and other. Standard school type encompasses community, foundation, voluntary aided, voluntary controlled, and city technology colleges. In contrast to standard and other schools, academies receive their funding directly from the government rather than through local authorities (school districts). Sponsored academies are mostly underperforming schools that have been required to change to academy status and are run by sponsors. Converter academies are successfully performing schools that have opted to convert to academy status. Other school type encompasses free schools, studio schools, university technical colleges (UTCs), and further education colleges. These are more technically or vocationally oriented schools.
Table 2. Summary Statistics for the School Characteristics.
| | n | % |
|---|---|---|
| Type | ||
| Standard | 151 | 32.5 |
| Sponsored academy | 93 | 20.0 |
| Converter academy | 184 | 39.6 |
| Other | 37 | 8.0 |
| Admissions | ||
| Comprehensive | 425 | 91.4 |
| Grammar | 19 | 4.1 |
| Secondary modern | 21 | 4.5 |
| School gender | ||
| Mixed | 340 | 73.1 |
| Boys | 50 | 10.8 |
| Girls | 75 | 16.1 |
| Religious | ||
| No | 349 | 75.1 |
| Yes | 116 | 25.0 |
Note. n = 465.
A minority of local authorities operate selective rather than comprehensive admissions. In these areas, grammar schools select students based on high performance in entrance examinations and so by definition have high mean age 11 scores and tend also to be educationally advantaged and homogeneous in terms of student sociodemographic characteristics. Secondary modern schools take those students not admitted to grammar schools.
3. Models
Model 1: Random-Intercept Model
The traditional school value-added model (Aitkin & Longford, 1986; Goldstein et al., 1993; Raudenbush & Bryk, 1986) can be written as the following random-intercept linear regression:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_j + e_{ij} \qquad (1)$$

where yij and x1ij denote current and prior achievement for student i (i = 1, …, nj) in school j (j = 1, …, J), β0 and β1 denote the regression coefficients, uj the school random intercept effect, eij the student residual, and where uj and eij are assumed independent of one another, independent of x1ij, and normally distributed with zero means and constant variances σ²u and σ²e. As discussed in Section 1, the independence assumptions are unlikely to hold, and so in this article, we interpret the school value-added model and the predicted school effects as descriptive rather than causal. Further student and school covariates may be added to this model, and we will explore this in Section 4.
The total residual uj + eij measures covariate-adjusted (residualized) student current achievement. That is, student current achievement having adjusted for prior achievement. The overall average adjusted achievement is 0. The random effect uj therefore measures the mean student adjusted achievement in each school (the traditional school value-added score), while the residual eij measures the adjusted achievement of each student relative to their school mean. The random effect variance σ²u measures the variation in school mean adjusted achievement across schools. The residual variance σ²e measures the average variance in student adjusted achievement within schools. Crucially, this parameter is averaged across all schools. Thus, while the model allows mean student adjusted achievement to vary from school to school via uj, it assumes the variance in student adjusted achievement is the same in every school (homoskedasticity).
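As a concrete illustration, Model 1 can be fitted with standard mixed-effects routines. The following is a minimal R sketch using lme4; the data frame dat with columns age16, age11, and school is assumed (as in the earlier hypothetical sketch), not the actual study data:

```r
# Minimal sketch: the traditional school value-added model (Model 1),
# fitted by maximum likelihood with lme4; 'dat' is assumed as above.
library(lme4)

m1 <- lmer(age16 ~ age11 + (1 | school), data = dat, REML = FALSE)
summary(m1)             # beta0, beta1, sigma2_u (school), sigma2_e (residual)
head(ranef(m1)$school)  # predicted (shrunken) school value-added scores u_j
```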
Figure 2 illustrates the main details of this and subsequent models using hypothetical data on two schools. In each case, yij is plotted against x1ij. In Model 1 (Figure 2A), the two solid lines represent the school-specific relationships β0 + β1x1ij + uj. The dotted line depicts the average relationship between the two variables β0 + β1x1ij. The school lines are parallel to the average line, because in this model, only the intercept uj differs between schools. The line for School 1 lies above the average line, while the line for School 2 lies below it. The vertical deviations of the school lines from the average line correspond to the school-specific uj. In the current example, we have u1 > 0 > u2. Thus, on average, students in School 1 are predicted to score higher compared to students with the same prior achievement in the average school, while students in School 2 are predicted to score lower. The variability in these mean deviations across all schools corresponds to σ²u. The vertical deviation of the student current achievement scores from their relevant school line corresponds to eij. The variability in these deviations corresponds to σ²e. This is constant across x1ij and constant across schools.
Figure 2.
Illustration of different models using hypothetical student current and prior achievement scores data for two schools, School 1 (solid markers) and School 2 (hollow markers). Panel A: Random-intercept model. Panel B: Random-intercept model with random residual variance. Panel C: Random-intercept model with random residual variance function. Panel D: Random-slope model. Panel E: Random-slope model with random residual variance. Panel F: Random-slope model with random residual variance function.
Model 2: Random-Intercept Model With Random Residual Variance
Model 2 extends Model 1 by allowing the variance in student adjusted achievement to vary across schools. We do this by specifying an MELS version of the previous model (Hedeker et al., 2008). The model can be written as
$$\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{1ij} + u_j + e_{ij} \\
\ln\!\big(\sigma^2_{e_{ij}}\big) &= \alpha_0 + v_j
\end{aligned} \qquad (2)$$
where the second line of the equation specifies the residual variance as a log-linear function ln(·) of a new intercept α0 and a new random school effect vj. uj and vj are assumed bivariate normally distributed and independent of the residuals and covariates. The variance function random intercept variance σ²v measures the variation in the log of the residual variance across schools. The random effect covariance σuv measures how uj and vj covary. All other terms are defined as before. The log-linear link function ensures the resulting school-specific residual variances σ²ej = exp(α0 + vj), and therefore the school variances of student adjusted achievement, are positive (Hedeker et al., 2008). Figure 2B illustrates Model 2, where v2 > v1, and so, School 2 shows greater variance in their student adjusted achievements than is the case for School 1.
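One way to fit this model outside MixWILD is via the brms package in R (see the Software subsection below). The following is a minimal sketch, again assuming the hypothetical data frame dat; note that brms specifies the variance function for the log residual SD rather than the log residual variance, so its coefficients are half the size of the α's in Equation 2:

```r
# Minimal sketch: MELS random-intercept model (Model 2) in brms.
# Caveat: brms models log(residual SD), ln(sigma_ij) = a0 + v_j, which is
# half the log-variance scale used in Equation 2, ln(sigma2_ij) = 2 ln(sigma_ij).
library(brms)

f2 <- bf(age16 ~ age11 + (1 | s | school),  # mean function: school effect u_j
         sigma ~ 1 + (1 | s | school))      # variance function: school effect v_j;
                                            # the '| s |' ID correlates u_j and v_j
m2 <- brm(f2, data = dat, chains = 4, warmup = 5000, iter = 15000)
summary(m2)
```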
Model 3: Random-Intercept Model With Random Residual Variance Function
Recall the reason for entering student prior achievement (and potentially further student covariates) into the mean function of the model is that schools should not be held accountable for pre-existing differences in student achievement across schools at the start of the value-added period (Ballou et al., 2004; Leckie & Goldstein, 2019; Leckie & Prior, 2022; Levy et al., 2023). A similar argument applies when comparing the variance in student adjusted achievement across schools. For example, suppose the residual variance increases with increasing student prior achievement. This would suggest that schools with higher mean student prior achievement would in general be expected to show more variable student adjusted achievement than schools with lower mean student prior achievement, and this is even though we have adjusted for student prior achievement in the mean function. However, following the arguments underpinning the traditional value-added model, this should be viewed as a reflection of their school intake rather than reflecting their school policies and practices. By entering student prior achievement into the model for the variance, we adjust for this overall variance trend. Focus then shifts to how schools deviate from this overall trend.
Model 3 therefore extends Model 2 by adding student prior achievement to the residual variance function. The model is written as
$$\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{1ij} + u_j + e_{ij} \\
\ln\!\big(\sigma^2_{e_{ij}}\big) &= \alpha_0 + \alpha_1 x_{1ij} + v_j
\end{aligned} \qquad (3)$$
where α1 is the residual variance function regression coefficient on x1ij. All other terms are defined as before. Where further student and school covariates are added to the mean function, all or a subset of these may also be added to the residual variance function. However, in order to compare school intake-adjusted values of the school variance across schools, we must now calculate the residual variance in each school at a common value of x1ij such as the mean. For example, σ²ej = exp(α0 + α1x̄1 + vj), where x̄1 denotes the mean value for x1ij across all students and schools. Figure 2C illustrates Model 3, where α1 > 0, and so, the vertical scatter in student current achievement around each school line increases with student prior achievement in both schools and this is in addition to School 2 continuing to have greater within-school variance than School 1 (v2 > v1).
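To make this calculation concrete, the following sketch evaluates school-specific variances at the overall mean of prior achievement, using for illustration the point estimates this model yields in our application (Section 4, Table 3) together with hypothetical school effects vj:

```r
# Minimal sketch: intake-adjusted school variances under Model 3,
# evaluated at a common value of the age 11 score (here the overall mean, 0).
alpha0 <- -0.881                 # point estimate (Table 3, Model 3)
alpha1 <-  0.029                 # point estimate (Table 3, Model 3)
v_j    <- c(-0.30, 0.00, 0.25)   # hypothetical school effects
x_bar  <- 0                      # age 11 scores are standardized, so mean = 0
exp(alpha0 + alpha1 * x_bar + v_j)  # school variances at common intake
```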
Model 4: Random-Slope Model
Model 4 is the differential effects version (Nuttal et al., 1989; Scherer & Nilsen, 2019; Strand, 2010) of the traditional school value-added model (Model 1) and can be written as the following random-slope linear regression:
$$y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij} + e_{ij} \qquad (4)$$
where u0j and u1j denote the random intercept and random slope effects, assumed bivariate normally distributed and independent of the residual and covariates. The random intercept variance σ²u0 measures the variation in school mean adjusted achievement across schools when x1ij = 0. The random slope variance σ²u1 measures the variation in the slope adjustment for prior achievement across schools. The random intercept-slope covariance σu01 measures how these two terms covary. All other terms are defined as before. Where the model includes further student covariates, their regression coefficients may also be allowed to vary across schools.
The total residual, now u0j + u1jx1ij + eij, again measures covariate-adjusted student current achievement. However, school mean student adjusted achievement u0j + u1jx1ij now varies not only across schools but also across students as a function of the covariate with the random slope x1ij. Thus, this version of the model allows schools to be potentially more or less effective for students as a function of their prior achievement.
Figure 2D illustrates Model 4, where u1,1 > u1,2, and so, School 1 shows a steeper regression line than the average line, while School 2 shows a shallower line. The school lines are given by β0 + β1x1ij + u0j + u1jx1ij. The vertical deviations of each school line from the average line correspond to u0j + u1jx1ij and so are a linear function of x1ij: The figure shows the school value-added score for School 1 is positive in general, but especially positive for students with high x1ij. In contrast, the school value-added score for School 2 is negative in general, but especially negative for students with high x1ij.
School mean student adjusted achievement, averaging over all students in each school, is given by u0j + u1jx̄1j, where x̄1j denotes the average of x1ij in school j. For the purpose of comparing schools in terms of their means, it is necessary to evaluate this quantity at a common value of x̄1j for all schools. The variance in student adjusted achievement in each school (over all students) is given by u1j²Varj(x1ij) + σ²e, where Varj(x1ij) denotes the variance of x1ij in school j. The first component of this expression captures the variance in student adjusted achievement attributable to interactions between the school effects u1j and the student prior achievement x1ij. The magnitude of this component varies across schools. For the purpose of comparing schools in terms of their variances, it is necessary to evaluate this component at a common value of Varj(x1ij) for all schools, for example, the average within-school variance of x1ij. The second component σ²e is attributable to all other sources of variance in student adjusted achievement. Crucially, this continues to be assumed constant across schools (homoskedasticity). Thus, adding random slopes only partially recognizes that the variance in student adjusted achievement varies across schools.
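For illustration, these two quantities can be computed directly. The school effect and school intake values below are hypothetical; the residual variance uses the point estimate this model yields in our application (Section 4, Table 4):

```r
# Minimal sketch: school mean and variance of student adjusted achievement
# under the random-slope model (Model 4), with illustrative school values.
u0_j     <- 0.20          # hypothetical school random intercept
u1_j     <- 0.05          # hypothetical school random slope
xbar_j   <- 0.10          # school mean of age 11 scores (use a common value
                          # when comparing schools)
varx_j   <- 0.80          # school variance of age 11 scores (likewise)
sigma2_e <- exp(-0.877)   # constant residual variance (Table 4, Model 4)

u0_j + u1_j * xbar_j               # school mean adjusted achievement
u1_j^2 * varx_j + sigma2_e         # school variance: slope part + residual part
```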
Model 5: Random-Slope Model With Random Residual Variance
Model 5 extends Model 4 by allowing the variance in student adjusted achievement to vary across schools. (Equally Model 5 extends Model 2 by adding a random slope in the mean function to student prior achievement.) We do this by specifying an MELS version of the previous model. The model can be written as
$$\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij} + e_{ij} \\
\ln\!\big(\sigma^2_{e_{ij}}\big) &= \alpha_0 + v_j
\end{aligned} \qquad (5)$$
where the second line of the equation specifies the log-linear function for the residual variance (see also Model 2). The three random effects u0j, u1j, and vj are assumed trivariate normally distributed and independent of the residuals and covariates. Figure 2E illustrates Model 5, where v2 > v1, and so, School 2 shows greater variance in their student adjusted achievements than is the case for School 1, as well as a shallower slope (due to u1,1 > u1,2).
School mean student adjusted achievement (averaging over all students) is then given by u0j + u1jx̄1j as it was in the constant residual variance case (Model 4), and so, we will again need to evaluate this at a common value of x̄1j for all schools. The variance in student adjusted achievement in each school (over all their students) is now given by u1j²Varj(x1ij) + exp(α0 + vj), and so differs from the constant residual variance case (Model 4) in that the last term also now varies across schools.
Model 6: Random-Slope Model With Random Residual Variance Function
Model 6 extends Model 5 by adding student prior achievement to the residual variance function. (Equally Model 6 extends Model 3 by adding a random slope to student prior achievement.) The model is written as
$$\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij} + e_{ij} \\
\ln\!\big(\sigma^2_{e_{ij}}\big) &= \alpha_0 + \alpha_1 x_{1ij} + v_j
\end{aligned} \qquad (6)$$
where α1 is the residual variance function regression coefficient on x1ij (see also Model 3). All other terms are defined as before. Figure 2F illustrates Model 6, where α1 > 0, and so, the vertical scatter in student current achievement around each school line increases with student prior achievement and this is in addition to School 2 continuing to have a shallower slope (u1,1 > u1,2) and greater within-school variance than School 1 (v2 > v1).
As in Model 5 (and Model 4), school mean student adjusted achievement (averaging over all students) is once again given by u0j + u1jx̄1j, while the variance in student adjusted achievement in each school (over all students) is now given by u1j²Varj(x1ij) + σ̄²ej, where σ̄²ej = (1/nj)Σi exp(α0 + α1x1ij + vj) is the mean of the student-specific residual variances in school j. Crucially, this mean is free to vary across schools.
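For illustration, the following sketch computes this quantity for a single hypothetical school, using for the α's the point estimates this model yields in our application (Section 4, Table 4):

```r
# Minimal sketch: school variance of student adjusted achievement under
# Model 6, where residual variances are student-specific.
alpha0 <- -0.889; alpha1 <- 0.036   # point estimates (Table 4, Model 6)
u1_j   <- 0.05;  v_j    <- 0.10     # hypothetical school effects
set.seed(2)
x_j <- rnorm(150)                   # hypothetical age 11 scores in school j
sigma2_ij <- exp(alpha0 + alpha1 * x_j + v_j)  # student-specific variances
u1_j^2 * var(x_j) + mean(sigma2_ij)            # school variance of adjusted
                                               # achievement (slope + residual)
```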
Software
The traditional school value-added models (Models 1 and 4) are typically fitted via maximum likelihood estimation using conventional mixed-effects linear regression routines in standard software (R, SAS, SPSS, and Stata). However, the MELS versions of these models (Models 2, 3, 5, and 6) cannot be fitted using these routines, nor can they be fitted in specialized mixed-effects modeling packages (HLM and MLwiN). Hedeker and colleagues have developed the MixWILD software to fit MELS models by maximum likelihood estimation (Dzubur et al., 2020). These models can also be fitted via Markov chain Monte Carlo (MCMC) methods in Stata and Mplus (McNeish, 2021), as well as in dedicated Bayesian software such as Stan (including via the brms package in R; e.g., Parker et al., 2021), WinBUGS, OpenBUGS, and JAGS (including via the R2jags package in R; e.g., Barrett et al., 2019). To support readers wishing to implement these models, we present annotated MixWILD, R, and Stata instructions, syntax, and simulated data (Section S4 of the Supplemental information).
We fit all models using Stata (StataCorp, 2021). Specifically, we use the bayesmh command, which implements an adaptive Metropolis–Hastings MCMC algorithm. We use hierarchical centering reparameterizations to improve mixing. We specify vague (diffuse) normal priors for all regression coefficients and minimally informative inverse Wishart priors for the random effects variance–covariance matrices. We specify overdispersed initial values for all parameters. We fit all models with four chains, each with 5,000 burn-in iterations and 10,000 monitoring iterations. We judge convergence using Gelman–Rubin convergence diagnostics (Gelman & Rubin, 1992) and trace, autocorrelation, and scatter plots. All models converged and all parameters had effective sample sizes > 400. We compare model fit using the deviance information criterion (DIC; Spiegelhalter et al., 2002). Smaller values are preferred.
4. Results
Model 1: Random-Intercept Model
Model 1 (Equation 1) is the traditional school value-added model. In other words, the random-intercept model. For simplicity, and because not all researchers wish to additionally include student sociodemographics (Leckie & Prior, 2022; Levy et al., 2023), we only adjust for student prior achievement in this and subsequent Models 1 through 6, but we do explore the role of further covariates in Models 7 and 8. For the purpose of comparing to subsequent models, we parameterize σ²e as exp(α0).
Table 3 presents the results. The estimated slope coefficient on student age 11 score is β1 = 0.678, and so, a 1 SD difference in age 11 score is associated with a 0.678 SD difference in age 16 score. The estimated residual variance is σ²e = exp(−0.870) = 0.419. The estimated total variance in student adjusted achievement is 0.067 + 0.419 = 0.486 (and so, student age 11 scores account for 51% of the variation in student age 16 scores; Snijders & Bosker, 2012). The estimated between-school variance in school mean adjusted achievement is σ²u = 0.067, and so, 14% of the total variation in student adjusted achievement (0.067/0.486 = 0.14; Snijders & Bosker, 2012) is variation in the school means. The between-school variance implies a 95% plausible values range (PVR) for the school means of 0 ± Φ−1(0.975)√0.067 = (−0.51, 0.51) (where Φ−1(·) denotes the inverse cumulative standard normal distribution; Raudenbush & Bryk, 2002). Thus, students in what would be deemed the most effective schools (operating at the 97.5th percentile of the distribution of all schools) are predicted to score 1.02 SD higher at age 16 than equivalent students in the least effective schools (operating at the 2.5th percentile). In contrast, the estimated student residual variance σ²e = 0.419 is assumed constant, naively implying the variance in student adjusted achievement is the same in every school. Plots confirm that the random effect and residual normality assumptions for this and subsequent models are reasonable (Supplemental information).
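The arithmetic behind these summaries can be reproduced directly from the Table 3 estimates; a minimal R sketch:

```r
# Minimal sketch: Model 1 summary quantities from the Table 3 estimates.
sigma2_u <- 0.067                # between-school variance
sigma2_e <- exp(-0.870)          # residual variance, approx. 0.419
total    <- sigma2_u + sigma2_e  # total variance in adjusted achievement, 0.486
1 - total                        # approx. 0.51: variance explained by age 11
sigma2_u / total                 # approx. 0.14: between-school share (VPC)
qnorm(c(.025, .975)) * sqrt(sigma2_u)  # 95% PVR for school means, approx. +/- 0.51
```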
Table 3. Results for the Random-Intercept Models Adjusting Only for Student Prior Achievement.
| | | Model 1 | | Model 2 | | Model 3 | |
|---|---|---|---|---|---|---|---|
| | | Est. | SE | Est. | SE | Est. | SE |
| Mean Function | | | | | | | |
| β0 | Intercept | −.011 | .012 | −.011 | .013 | −.011 | .012 |
| β1 | Age 11 score | .678 | .003 | .679 | .003 | .679 | .003 |
| σ²u | School intercept effect variance | .067 | .005 | .067 | .005 | .067 | .005 |
| Residual Variance Function | | | | | | | |
| α0 | Intercept | −.870 | .005 | −.881 | .010 | −.881 | .011 |
| α1 | Age 11 score | | | | | .029 | .006 |
| σ²v | School intercept effect variance | | | .037 | .003 | .040 | .004 |
| Assoc. Between Mean and Var. Fn. Random Effects | | | | | | | |
| ρuv | Intercept effects correlation | | | −.472 | .048 | −.484 | .047 |
| Fit Statistics | | | | | | | |
| DIC | | 140,803 | | 139,831 | | 139,796 | |

Note. Est. and SE denote the posterior means and SDs of the parameter chains. DIC denotes the deviance information criterion.
Model 2: Random-Intercept Model With Random Residual Variance
Model 2 (Equation 2) extends the random-intercept model (Model 1, Equation 1) to allow the residual variance, and therefore the variance in student adjusted achievement, to vary across schools. Model 2 shows a reduction in the DIC of 972 points, confirming that this variation in variances is statistically significant. The mean function parameter estimates are largely unchanged. The estimated residual variance function intercept and estimated variance of the new school random effect are α0 = −0.881 and σ²v = 0.037. The model-implied population-averaged school variance in student adjusted achievement is estimated as exp(α0 + σ²v/2) = 0.42 (Hedeker et al., 2008), which, as expected, is close to the Model 1 estimate of 0.419. The estimated population 95% PVR of school variances of student adjusted achievement is exp(−0.881 ± 1.96√0.037) = (0.28, 0.60). This range is substantial. For example, the estimated difference in student adjusted achievement between students performing at the 97.5th and 2.5th percentiles within the most variable schools (variance = 0.60) is 3.05 SD, while in the least variable schools (variance = 0.28), it is 2.09 SD (Raudenbush & Bryk, 2002).
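Again, these quantities follow directly from the Table 3 estimates; a minimal R sketch:

```r
# Minimal sketch: Model 2 school-variance summaries from Table 3 estimates.
alpha0   <- -0.881               # variance function intercept
sigma2_v <-  0.037               # variance of the school effects v_j
exp(alpha0 + sigma2_v / 2)       # population-averaged variance, approx. 0.42
pvr <- exp(alpha0 + qnorm(c(.025, .975)) * sqrt(sigma2_v))  # approx. (0.28, 0.60)
2 * qnorm(.975) * sqrt(pvr)      # within-school 2.5th-to-97.5th percentile
                                 # spreads: approx. 2.09 and 3.05 SD
```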
Figure 3 plots the predicted school means of student adjusted achievement uj (y-axis) against the predicted school variances exp(α0 + vj) (x-axis). The means and variances are posterior mean predictions and so have been shrunk toward their population average values as a function of their sample size (Snijders & Bosker, 2012). The London average values are illustrated by the horizontal and vertical reference lines. The figure visualizes the substantial variation in both school means and variances of student adjusted achievement described above. While the negative correlation is moderate to large (r = −0.54), having a high school mean by no means guarantees having a low variance. Equally, there are many instances where schools show similar means but noticeably different variances.
Figure 3.
Model 2 scatterplot of school means against school variances of student adjusted achievement. London average values are shown by horizontal and vertical reference lines.
Figure 4 presents the “caterpillar plots” of the 465 predicted school means (left panel) and school variances (right panel; Goldstein, 2011). Such plots are routinely used by researchers and accountability systems to identify schools that are significantly different from average (e.g., Prior et al., 2021). The distribution of the school variances is positively skewed, consistent with the school-specific residual variances being modeled as log-normal. Schools with fewer students have wider 95% credible intervals than schools with more students. Only 117 of the 465 schools (25%) can be statistically separated from the overall average in terms of their school variances, compared to 320 schools (69%) when we consider the school means.
Figure 4.
Model 2 caterpillar plots for school means (left) and school variances (right) of student adjusted achievement presented in rank order. Posterior means with 95% credible intervals.
Model 3: Random-Intercept Model With Random Residual Variance Function
Model 3 (Equation 3) further extends the random-intercept model to allow the residual variance to vary not just across schools (Model 2, Equation 2), but additionally as a function of student prior achievement. Model 3 is preferred to Model 2 (ΔDIC = 34), showing the residual variance significantly increases with student age 11 scores (α1 = 0.029, SE = 0.006). Thus, schools with in general higher age 11 scores are predicted to show higher variance in student adjusted achievement. However, this relationship is very weak. The estimated population 95% PVR of school intake-adjusted variances of student adjusted achievement is exp(−0.881 + 0.029x̄1 ± 1.96√0.040) = (0.28, 0.61), effectively the same as in the previous model where we did not adjust for school intake, where x̄1 = 0 denotes the London-wide average value for x1ij. That is, the variation in the variance in student adjusted achievement across schools is not simply explained by some schools showing in general higher age 11 scores and therefore higher variances than others.
Model 4: Random-Slope Model
Model 4 (Equation 4) is the differential effectiveness version of the traditional school value-added model. In other words, the random-slope model. Recall that this model, like the traditional random-intercept model (Model 1, Equation 1), assumes the residual variance σ²e is once again constant across all students and schools. As in Model 1, we parameterize σ²e as exp(α0).
Table 4 presents the results. Model 4 is preferred to Model 1 (ΔDIC = 281), confirming the age 11 slope varies significantly across schools. The estimated mean and variance of the age 11 slope across schools are β1 = 0.675 and σ²u1 = 0.004. The latter implies an estimated 95% PVR of school slopes of 0.675 ± 1.96√0.004 = (0.55, 0.80). Figure 5 visualizes this variation for the sample schools by plotting the predicted school lines based on Model 1 (left panel) and Model 4 (right panel). The plots appear very similar, suggesting that while the random slopes are statistically significant, they are not practically significant. Indeed, moving from Model 1 to Model 4, the residual variance reduces by just 0.70%. Thus, in contrast to the literature, which tends to show larger variation in school effects among low prior achievers versus high prior achievers, we find no such pattern (Nuttall et al., 1989; Scherer & Nilsen, 2019; Strand, 2010).
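The slope PVR and the residual variance reduction follow directly from the Table 4 estimates; a minimal R sketch:

```r
# Minimal sketch: random-slope summaries from the Table 4 estimates.
beta1     <- 0.675               # mean age 11 slope across schools
sigma2_u1 <- 0.004               # variance of the age 11 slope across schools
beta1 + qnorm(c(.025, .975)) * sqrt(sigma2_u1)  # 95% PVR of slopes, (0.55, 0.80)
1 - exp(-0.877) / exp(-0.870)    # residual variance reduction, approx. 0.70%
```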
Table 4. Results for the Random-Slope Models Adjusting Only for Student Prior Achievement.
| | | Model 4 | | Model 5 | | Model 6 | |
|---|---|---|---|---|---|---|---|
| | | Est. | SE | Est. | SE | Est. | SE |
| Mean Function | | | | | | | |
| β0 | Intercept | −.017 | .013 | −.015 | .013 | −.015 | .013 |
| β1 | Age 11 score | .675 | .004 | .673 | .004 | .672 | .004 |
| σ²u0 | School intercept effect variance | .068 | .005 | .069 | .005 | .069 | .005 |
| σ²u1 | School slope effect variance | .004 | .000 | .004 | .000 | .004 | .000 |
| ρu0u1 | Intercept-slope effects correlation | .278 | .064 | .231 | .066 | .229 | .067 |
| Residual Variance Function | | | | | | | |
| α0 | Intercept | −.877 | .005 | −.889 | .010 | −.889 | .011 |
| α1 | Age 11 score | | | | | .036 | .006 |
| σ²v | School intercept effect variance | | | .037 | .003 | .040 | .004 |
| Assoc. Between Mean and Var. Fn. Random Effects | | | | | | | |
| ρu0v | Intercept effects correlation | | | −.476 | .048 | −.494 | .047 |
| ρu1v | Slope-intercept effects correlation | | | −.089 | .075 | −.111 | .076 |
| Fit Statistics | | | | | | | |
| DIC | | 140,522 | | 139,546 | | 139,495 | |
Note. Est. and SE denote the posterior means and SDs of the parameter chains. DIC denotes the deviance information criterion.
Figure 5.
Model 1 and Model 4 school regression lines of predicted age 16 scores against age 11 scores for random-intercept model (left) and random-slope model (right).
Model 5: Random-Slope Model With Random Residual Variance
Model 5 (Equation 5) extends the random-slope model (Model 4, Equation 4) to allow the residual variance to vary across schools. Thus, the move from Model 4 to 5 for the current random-slope model mirrors the move we explored from Model 1 to 2 for the earlier random-intercept versions of these models.
Model 5 allows us to quantify the relative importance of the differential school effects with respect to prior achievement as a component of the overall variance in student adjusted achievement in each school. We calculate the estimated variance for each school in our sample for a common reference distribution of students with student age 11 score variance Var(x1), taken as the mean of the sample school variances of student prior achievement. The resulting expression is u1j²Var(x1) + exp(α0 + vj) (see Section 3, Model 5). The first component gives the variance attributable to the random slope interactions u1jx1ij. The second component captures all remaining variance. The first component is very small, accounting for less than 1% of the variance in nearly all schools. In sum, the inclusion of the random slope on prior achievement has done very little to explain the variance in student adjusted achievement in each school.
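For illustration, the decomposition for a single hypothetical school might be computed as follows, using the Table 4 (Model 5) point estimate for α0:

```r
# Minimal sketch: share of school variance attributable to the random
# slope interactions under Model 5, for one hypothetical school.
alpha0 <- -0.889                # point estimate (Table 4, Model 5)
u1_j   <- 0.05; v_j <- 0.10     # hypothetical school effects
var_x  <- 0.80                  # common (mean) within-school variance of age 11
slope_part <- u1_j^2 * var_x    # variance from u1j * x1ij interactions
resid_part <- exp(alpha0 + v_j) # all remaining variance
slope_part / (slope_part + resid_part)  # well under 1% here
```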
Model 6: Random-Slope Model With Random Residual Variance Function
Model 6 (Equation 6) further extends the random-slope model to allow the residual variance to vary not just across schools (Model 5, Equation 5), but additionally as a function of student prior achievement. Thus, the move from Model 5 to 6 for the current random-slope model mirrors the move we explored from Model 2 to 3 for the earlier random-intercept versions of these models. As with the sequence of random-intercept models, Model 6 shows the residual variance in the random-slope model significantly increases with student age 11 scores (α1 = 0.036, SE = 0.006). However, as with the random-intercept models, this effect is slight and does little to explain the variation in school variances across schools. Given that adding the random slope has little practical importance, and in order to illustrate the subsequent models as simply as possible, we return to the sequence of random-intercept models.
Model 7: Random-Intercept Model With Random Variance Function and Student Characteristics
Model 7 extends Model 3 by adding student age, gender, first language, special educational need (SEN) status, and FSM status into the mean and residual variance functions (Table 1). Adding these characteristics to the mean function implies students are now compared to other students across London who not only share the same age 11 score, but who also share the same sociodemographic characteristics. The aim is to ensure that schools do not appear more or less effective simply as a result of recruiting more or less educationally advantaged students (Leckie & Goldstein, 2019). The resulting improved accuracy of the predicted age 16 scores will in general reduce the student adjusted achievement scores in absolute magnitude (and reorder them), leading the overall variance in student adjusted achievement to decrease. In turn, the school means and variances of student adjusted achievement will also change, again in general reducing in magnitude and reordering. We then further adjust the school variances of student adjusted achievement by including the student characteristics in the student residual variance function. This ensures that if there are any London-wide relationships between the variance in student adjusted achievement and particular student characteristics, these again will not benefit or count against schools with disproportionate numbers of these students.
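For illustration, Model 7 might be specified in brms along the following lines. This is a minimal sketch: the column names are hypothetical, and ethnicity is assumed to be a factor with White as the reference group:

```r
# Minimal sketch: Model 7 in brms, with student characteristics entered
# into both the mean and the residual variance functions.
library(brms)

f7 <- bf(age16 ~ age11 + summer + girl + ethnicity + lang + sen + fsm +
           (1 | s | school),                 # mean function: u_j
         sigma ~ age11 + summer + girl + ethnicity + lang + sen + fsm +
           (1 | s | school))                 # variance function: v_j
m7 <- brm(f7, data = dat, chains = 4, warmup = 5000, iter = 15000)
```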
Table 5 presents the results. Model 7 is preferred to Model 3 (ΔDIC = 7,247), confirming the statistical importance of the student characteristics. First, consider the mean function. The results show that summer born students, girls, all ethnic minority groups except mixed ethnicity students (relative to White), and students who speak English as a second language are all predicted to score higher at age 16 than otherwise equivalent students. SEN and FSM students, in contrast, are predicted to score lower than otherwise equivalent students. These results are established and consistent with the literature (Leckie & Goldstein, 2019). What is not known is whether there are also sociodemographic differences in the variance in student adjusted achievement. The results show that, all else equal, the residual variance and therefore variance in student adjusted achievement again increases with age 11 scores but is now also shown to be higher for SEN and FSM students than for otherwise equal students. Thus, it proves harder to reliably predict the age 16 scores of these student groups relative to other student groups. In contrast, summer born students, girls, Black, and Asian students show lower variance in student adjusted achievement and therefore appear to perform in a more consistent fashion than otherwise equal student groups within schools.
Table 5. Results for the Random-Intercept Models Adjusting for Student Prior Achievement and Student and School Characteristics.
| | | Model 7 | | Model 8 | |
|---|---|---|---|---|---|
| | | Est. | SE | Est. | SE |
| Mean Function | ||||||
| β0 | Intercept | −.129 | .012 | −.235 | .017 | |
| β1 | Age 11 score | .634 | .003 | .632 | .003 | |
| β2 | Summer born | .045 | .005 | .044 | .005 | |
| β3 | Girl | .219 | .005 | .218 | .005 | |
| β4 | Ethnicity: Black | .015 | .006 | .014 | .007 | |
| β5 | Ethnicity: Asian | .152 | .008 | .150 | .008 | |
| β6 | Ethnicity: Chinese | .296 | .028 | .290 | .028 | |
| β7 | Ethnicity: Mixed | .001 | .009 | .000 | .009 | |
| β8 | Ethnicity: Other | .089 | .010 | .088 | .009 | |
| β9 | First language not English | .162 | .006 | .162 | .006 | |
| β10 | Special educational need (SEN) | −.276 | .008 | −.276 | .008 | |
| β11 | Free school meal (FSM) | −.193 | .005 | −.192 | .005 | |
| β12 | School type: Sponsored academy | .055 | .025 | |||
| β13 | School type: Converter academy | .082 | .020 | |||
| β14 | School type: Other | .023 | .038 | |||
| β15 | School admissions: Grammar | .396 | .049 | |||
| β16 | School admissions: Secondary modern | −.118 | .045 | |||
| β17 | School gender: Boys | .053 | .032 | |||
| β18 | School gender: Girls | .064 | .027 | |||
| β19 | School religious | .139 | .022 | |||
| σ²u | School intercept effect variance | .050 | .004 | .037 | .003 |
| Residual Variance Function | ||||||
| α0 | Intercept | −.948 | .015 | −.889 | .024 | |
| α1 | Age 11 score | .077 | .006 | .081 | .006 | |
| α2 | Summer born | −.044 | .012 | −.045 | .012 | |
| α3 | Girl | −.059 | .012 | −.061 | .012 | |
| α4 | Ethnicity: Black | −.154 | .016 | −.156 | .016 | |
| α5 | Ethnicity: Asian | −.105 | .018 | −.106 | .018 | |
| α6 | Ethnicity: Chinese | −.088 | .072 | −.080 | .069 | |
| α7 | Ethnicity: Mixed | −.028 | .022 | −.035 | .021 | |
| α8 | Ethnicity: Other | −.014 | .020 | −.015 | .021 | |
| α9 | First language not English | −.002 | .013 | −.005 | .013 | |
| α10 | SEN | .204 | .016 | .203 | .016 | |
| α11 | FSM | .103 | .012 | .099 | .012 | |
| α12 | School type: Sponsored academy | .011 | .028 | |||
| α13 | School type: Converter academy | −.048 | .023 | |||
| α14 | School type: Other | .053 | .042 | |||
| α15 | School admissions: Grammar | −.280 | .052 | |||
| α16 | School admissions: Secondary modern | −.068 | .044 | |||
| α17 | School gender: Boys | .002 | .034 | |||
| α18 | School gender: Girls | .015 | .029 | |||
| α19 | School religious | −.110 | .023 | |||
| σ²v | School intercept effect variance | .032 | .003 | .026 | .003 |
| Association Between Mean and Variance Function Random Effects | | | | | |
| ρu0v | Intercept effects correlation | −.409 | .050 | −.282 | .057 | |
| Fit Statistics | ||||||
| | Deviance information criterion (DIC) | 132,549 | | 132,539 | |
Note. Est. and SE denote the posterior means and SDs of the parameter chains. Student ethnicity reference group is White. School type reference group is standard. School admissions reference group is comprehensive. School gender reference group is mixed-sex school.
Figure 6 presents the scatterplots of the school means and variances of student adjusted achievement based on the current model, which adjusts for student background, against those based on Model 3, which ignores student background. The purpose of this figure is to explore the sensitivity of the school means and variances to the additional adjustments for student background and to therefore assess the importance of making such adjustments or not (Leckie & Prior, 2022; Levy et al., 2023). We calculate the estimated school variances in each model by plugging the sample mean values for the covariates (Table 1) into the residual variance function, and so, for Model 7, σ²ej = exp(α0 + α1x̄1 + ⋯ + α11x̄11 + vj). The plots show both the school means and the school variances are correlated 0.94 across the two models. Thus, schools that show high mean adjusted achievement when one ignores student background nearly always still show high mean adjusted achievement after adjustment. The same applies for school variances of student adjusted achievement. However, even with such high correlations, the rank ordering of those schools whose social mix differs most markedly from the London-wide average still changes considerably, as shown by schools located furthest away from the 45° line in the bottom plots. Thus, the decision of whether to adjust for student background has a bearing on the manner in which many individual schools are viewed in terms of their school variances as well as their school means.
Figure 6.
Model 7 against Model 3 scatterplots of school means of student adjusted achievement (top left), school variances of student adjusted achievement (top right), ranks of school means of student adjusted achievement (bottom left), and ranks of school variances of student adjusted achievement (bottom right).
Model 8: Random-Intercept Model With Random Variance Function and School Characteristics
We now shift from attempting to best define and measure student adjusted achievement, and therefore the school means and variances of student adjusted achievement, to attempting to explain why some schools show higher mean student adjusted achievement and lower variance in student adjusted achievement than others. Unfortunately, we do not observe school policies and practices in our data. However, we do observe some school characteristics (Table 2). Model 8 extends Model 7 by adding school type, school admissions, school gender (mixed, boys, and girls), and school religion to the mean and residual variance functions.
The results (Table 5) for the existing mean and residual variance function regression coefficients are very similar to before, and so we restrict our interpretation here to the new results. First, consider the mean function. Relative to standard school types, school mean adjusted achievement is somewhat higher in sponsored and converter academies, having adjusted for the other covariates. Similarly, school mean adjusted achievement is higher in girls’ schools and religious schools, all else equal. However, the most sizable differential relates to school admissions: School mean adjusted achievement is considerably higher in grammar schools and lower in secondary modern schools relative to comprehensive schools. These results agree with the literature (Leckie & Goldstein, 2019). With respect to the residual variance function, we see new findings. School variances in student adjusted achievement tend to be lower in converter academies compared to standard school types, lower in grammar schools versus comprehensive schools, and lower in religious schools versus nonreligious schools, and this is after adjusting for London-wide relationships between the variance in student adjusted achievement and student characteristics. Thus, students in converter academies, grammar schools, and religious schools not only tend to show higher student adjusted achievement on average but also tend to show more consistent student adjusted achievement.
5. Discussion
In this article, we have argued that the focus of school value-added models should broaden to measure not just school mean differences in student adjusted achievement (student achievement beyond that predicted by student prior achievement and other student background characteristics), but school variance differences in student adjusted achievement. To study school variance differences, we have proposed extending the traditional school value-added model, a random-intercept mixed-effects linear regression of student current achievement on prior achievement and other student background characteristics, by modeling the residual variance as a log-linear function of the student covariates and a new random school effect. The school random intercept effect and random residual variance in this model measure the school mean and variance in student adjusted achievement. This model can be viewed as an application of the MELS model popular in biostatistics (Hedeker et al., 2008). It is, however, important to reiterate that the school value-added models and their respective predicted school effects should be viewed as descriptive rather than causal since these models do not address the complex processes of selection into schools that will be in play in many school systems.
We have illustrated this extended school value-added model with an application to schools in London. In response to our research question, our results suggest meaningful differences in the variance in student adjusted achievement across schools. We also find a moderate to large negative association between the school mean and variance in student adjusted achievement. Thus, schools that show the highest mean student adjusted achievement also tend to be the schools that show the lowest variance in student adjusted achievement. One process by which school variance differences may arise is if there is a London-wide relationship between the variance in student adjusted achievement and student prior achievement. We adjusted for this by entering student prior achievement into the residual variance function. A second process by which school variance differences may arise is via interaction effects between the different school policies and practices envisaged to be represented by the school random intercept effect and observed and unobserved student characteristics. Previous research has studied this via entering a school random slope on student prior achievement, and this showed schools to be differentially effective for students with low, middle, and high prior achievement. In our application, however, these school-by-student prior achievement interactions are small and explain little of the variation in school variances between schools. We then turned our attention to entering student characteristics into the model, both in the mean and residual variance functions, to better measure student adjusted achievement. In terms of new results, we find that FSM and SEN students show greater variance in student adjusted achievement and therefore less predictable age 16 scores than otherwise equal students. The resulting predicted school means and variances of student adjusted achievement, however, are similar to those based on the model which only adjusts for student prior achievement. Nevertheless, schools whose sociodemographic student mix differs most from the average school still move up and down the London-wide rankings considerably, demonstrating the importance of adjusting for student background at least for some schools (Leckie & Goldstein, 2019; Leckie & Prior, 2022; Levy et al., 2023). Finally, we shifted our emphasis from measuring school means and variances of student adjusted achievement to seeking to explain them. We find converter academies and grammar schools tend to show lower variances in student adjusted achievement than other school types. Importantly, here too we adjusted for any overall relationship between the variance in student adjusted achievement and student prior achievement and background characteristics, and so, these differences in school variances lie beyond this simple explanation.
Future studies might seek to identify whether school variance differences can be predicted by specific school policies and practices. It will also be important to explore the role of school composition covariates, such as the school mean and school SD of student prior achievement (Raudenbush & Bryk, 2002). One issue that such studies should bear in mind is that some measures of student current achievement may exhibit floor or ceiling effects. Where these are pronounced, they may bias the model parameters relative to fitting the models to measures without such effects. Tobit versions of the models might be considered to address this issue (Lu, 2018). Another issue is sample-size requirements. In general, we found that the residual variance function regression coefficients and predicted school effects were less precisely estimated than their analogous quantities in the mean function, suggesting that larger sample sizes are needed for these models than are traditionally used in school value-added studies. Future studies might therefore use power calculations to guide such decisions (Walters et al., 2018).
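As a sketch of how such a simulation-based power calculation might begin, the following Python fragment generates data from a MELS model of the kind considered here. The design sizes and parameter values are illustrative assumptions, not estimates from this article.

import numpy as np

rng = np.random.default_rng(2024)

# Assumed design for illustration: J schools, n students per school.
J, n = 400, 150
school = np.repeat(np.arange(J), n)

# School location (u1) and scale (u2) effects: bivariate normal with a
# negative covariance, mirroring the negative mean-variance association
# (correlation -0.2 under these assumed values).
Sigma_u = np.array([[0.09, -0.012],
                    [-0.012, 0.04]])
u = rng.multivariate_normal([0.0, 0.0], Sigma_u, size=J)

beta = np.array([0.0, 0.6])     # mean function: intercept, prior achievement
alpha = np.array([-0.4, -0.1])  # log residual variance function

x = rng.standard_normal(J * n)  # standardized prior achievement

# Mean and log-linear residual variance functions, then simulated scores.
mu = beta[0] + beta[1] * x + u[school, 0]
log_var = alpha[0] + alpha[1] * x + u[school, 1]
y = rng.normal(mu, np.sqrt(np.exp(log_var)))

Repeating this for many replicates, refitting the model to each simulated data set (e.g., in MixWILD; Dzubur et al., 2020), and recording how often the parameter of interest is detected would give the empirical power for a candidate design.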
More generally, however, expanding the focus of school value-added models to consider school effects on the variance in student achievement raises value judgments and interpretational challenges that future work will need to engage with. Fundamentally, it is not clear whether higher or lower variances should, in general, be viewed positively or negatively. Similarly, where a given school policy or practice is identified as driving school differences in variance via differential effects on students as a function of their observed and unobserved characteristics, it will not typically be clear what the optimal degree of differential impact might be. Even if it is decided that higher variance should be interpreted in a particular way, researchers and school accountability systems, now faced with two summaries of school effects on student learning (mean and variance effects), must make further value judgments as to how best to combine them into an overall summary of school effectiveness for the purpose of making inferences, judgments, and decisions about schools (Prior, Goldstein, et al., 2021). Crucially, it is only by extending the school value-added model to allow for school effects on the variance in student adjusted achievement that such debates are made possible. The extension we have presented paves the way for new substantive research into the reasons behind differences in variability and, therefore, how best such differences should be interpreted and addressed.
The school value-added model presented here can be further extended in various ways beyond simply adding further covariates and random slopes, suggesting avenues for new methodological research. First, in the school effectiveness literature, there is interest in studying the consistency of school effects across academic subjects (Goldstein, 1997; Reynolds et al., 2014; Teddlie & Reynolds, 2000). We can further develop the school value-added model to study this phenomenon with respect to the school variance in student adjusted achievement. Essentially, we would fit a multivariate response version of the model for multiple student achievement scores (Kapur et al., 2015; Leckie, 2018; Pugach et al., 2014). The model would have multiple residual variance functions, one for each academic subject, and we could then study the correlations of the school means and variances of student adjusted achievement across subjects. Second, the same multivariate response version of the model can be used to study the stability of school effects over time. Here, we would fit a multivariate response model to a single achievement score, but for multiple student cohorts (Leckie & Goldstein, 2009). Third, we could include a random slope in the residual variance function (Goldstein et al., 2018; McNeish, 2021) to study whether schools exacerbate or mitigate any overall relationship between the variance in student adjusted achievement and student prior achievement. Fourth, while we have flexibly modeled the residual variance, we have not modeled the random intercept variance (the random slope model relaxed this, but in a rather specific way). It is also possible to model the random intercept variance as a log-linear function of school covariates (Hedeker et al., 2008); these two extensions are sketched in the equations below. For example, the variability of school mean adjusted achievement scores across schools may appear greater for some school groups than others, and this could be tested by introducing the school group variable as a covariate in this second variance function. Fifth, we can expand the model to three levels, incorporating an additional random effect into the mean and residual variance functions relating to, for example, school district, and thereby study school district differences in the mean and variance in student adjusted achievement. This raises the possibility of entering school district random effects into the school random intercept variance function, since school mean adjusted achievement might vary more in some school districts than in others; with this extension, we can potentially study differential school-level inequalities in the education system by school district (Leckie et al., 2012; Leckie & Goldstein, 2015). Alternatively, teacher random effects could be introduced as a new level between the student and school levels. Finally, our focus has been on shifting attention from studying the school mean of student adjusted achievement to additionally studying the school variance of student adjusted achievement. In future work, it would be interesting to explore further ways in which the distribution of student adjusted achievement might vary across schools, for example, with respect to skewness.
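To fix ideas on the third and fourth of these extensions, and continuing the schematic notation introduced earlier, one might write

$$\log\!\left(\sigma_{e_{ij}}^{2}\right) = \mathbf{x}_{ij}^{\prime}\boldsymbol{\alpha} + u_{2j} + u_{3j} x_{1ij},$$

$$\log\!\left(\sigma_{u_{1j}}^{2}\right) = \mathbf{w}_{j}^{\prime}\boldsymbol{\lambda},$$

where $x_{1ij}$ denotes student prior achievement, $u_{3j}$ is a school random slope in the log residual variance function, and $\mathbf{w}_{j}$ collects school covariates such as school group.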
Funding
The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: This research was funded by UK Economic and Social Research Council (ESRC) grants ES/R010285/1 and ES/W000555/1 and UK Medical Research Council (MRC) grants MR/N027485/1 and MC_UU_00032/02.
Biographies
Authors
GEORGE LECKIE is a professor of social statistics at the Centre for Multilevel Modelling and School of Education, University of Bristol, Bristol, United Kingdom; g.leckie@bristol.ac.uk. His research interests are in the development, application, and dissemination of multilevel models to analyze messy and complex clustered cross-sectional and longitudinal data in educational and other social science research.
RICHARD PARKER is a senior research associate in applied statistics/epidemiology at the Bristol Medical School, University of Bristol, Bristol, United Kingdom; richard.parker@bristol.ac.uk. His research interests are in multilevel and mixed-effects location scale models and their applications in epidemiology.
HARVEY GOLDSTEIN died April 9, 2020, aged 80, while this article was in preparation. He was a professor of social statistics at the Centre for Multilevel Modelling and School of Education, University of Bristol, Bristol, United Kingdom. His research interests were in the use of statistical modeling techniques in the construction and analysis of educational tests, educational (school) effectiveness, the methodology of multilevel modeling, and Bayesian modeling methods for handling missing data values and measurement errors.
KATE TILLING is a professor of medical statistics at the Bristol Medical School, University of Bristol, United Kingdom; kate.tilling@bristol.ac.uk. Her research interests are in the development and application of statistical methods to causal problems in epidemiology/health services research. Two particular areas are methods for the analysis of longitudinal data and methods for minimizing bias due to missing data.
Footnotes
Authors’ Note
This work contains statistical data from the Office for National Statistics (ONS), UK, which is Crown Copyright. The use of the ONS statistical data in this work does not imply the endorsement of the ONS in relation to the interpretation or analysis of the statistical data. These data are not publicly accessible, but researchers can apply to analyze them via the ONS Secure Research Service: https://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/secureresearchservice
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
References
- Aitkin M, Longford N. Statistical modelling issues in school effectiveness studies. Journal of the Royal Statistical Society: Series A (General). 1986;149(1):1–43.
- American Statistical Association. ASA statement on using value-added models for educational assessment. 2014. https://www.amstat.org/asa/files/pdfs/POL-ASAVAM-Statement.pdf
- Amrein-Beardsley A. Rethinking value-added models in education: Critical perspectives on tests and assessment-based accountability. Routledge; 2014.
- Amrein-Beardsley A, Holloway J. Value-added models for teacher evaluation and accountability: Commonsense assumptions. Educational Policy. 2019;33(3):516–542.
- Angrist J, Hull P, Pathak PA, Walters C. Credible school value-added with undersubscribed school lotteries. The Review of Economics and Statistics. 2021:1–46.
- Ballou D, Sanders W, Wright P. Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics. 2004;29:37–65.
- Barrett JK, Huille R, Parker RMA, Yano Y, Griswold M. Estimating the association between blood pressure variability and cardiovascular disease: An application using the ARIC Study. Statistics in Medicine. 2019;38(10):1855–1868. doi: 10.1002/sim.8074.
- Braun HI, Wainer H. Value-added assessment. Handbook of Statistics. 2007;27:867–892.
- Brunton-Smith I, Sturgis P, Leckie G. Detecting and understanding interviewer effects on survey data by using a cross-classified mixed effects location–scale model. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2017;180(2):551–568.
- Brunton-Smith I, Sturgis P, Leckie G. How collective is collective efficacy? The importance of consensus in judgments about community cohesion and willingness to intervene. Criminology. 2018;56(3):608–637.
- Castellano KE, Ho AD. A practitioner's guide to growth models. Council of Chief State School Officers; 2013.
- De Fraine B, Van Damme J, Onghena P. Accountability of schools and teachers: What should be taken into account? European Educational Research Journal. 2002;1(3):403–428.
- Department for Education. Secondary accountability measures: Guide for maintained secondary schools, academies, and free schools. Department for Education; 2020.
- Department for Education. National pupil database. Department for Education; 2023. https://www.gov.uk/government/collections/national-pupil-database
- Dzubur E, Ponnada A, Nordgren R, Yang CH, Intille S, Dunton G, Hedeker D. MixWILD: A program for examining the effects of variance and slope of time-varying variables in intensive longitudinal data. Behavior Research Methods. 2020;52(4):1403–1427. doi: 10.3758/s13428-019-01322-1.
- Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7(4):457–472.
- Goldstein H. Methods in school effectiveness research. School Effectiveness and School Improvement. 1997;8:369–395.
- Goldstein H. Multilevel statistical models. 4th ed. Wiley; 2011.
- Goldstein H. Living by the evidence. Significance. 2020;17:38–40.
- Goldstein H, Leckie G, Charlton C, Tilling K, Browne WJ. Multilevel growth curve models that incorporate a random coefficient model for the level 1 variance function. Statistical Methods in Medical Research. 2018;27(11):3478–3491. doi: 10.1177/0962280217706728.
- Goldstein H, Rasbash J, Yang M, Woodhouse G, Pan H, Nuttall D, Thomas S. A multilevel analysis of school examination results. Oxford Review of Education. 1993;19(4):425–433.
- Goldstein H, Spiegelhalter DJ. League tables and their limitations: Statistical issues in comparisons of institutional performance. Journal of the Royal Statistical Society: Series A (Statistics in Society). 1996;159(3):385–409.
- Hedeker D, Mermelstein RJ, Demirtas H. An application of a mixed-effects location scale model for analysis of ecological momentary assessment (EMA) data. Biometrics. 2008;64(2):627–634. doi: 10.1111/j.1541-0420.2007.00924.x.
- Hedeker D, Mermelstein RJ, Demirtas H. Modeling between-subject and within-subject variances in ecological momentary assessment data using mixed-effects location scale models. Statistics in Medicine. 2012;31(27):3328–3336. doi: 10.1002/sim.5338.
- Kapur K, Li X, Blood EA, Hedeker D. Bayesian mixed-effects location and scale models for multivariate longitudinal outcomes: An application to ecological momentary assessment data. Statistics in Medicine. 2015;34:630–651. doi: 10.1002/sim.6345.
- Koretz D. The testing charade: Pretending to make schools better. University of Chicago Press; 2017.
- Leckie G. Avoiding bias when estimating the consistency and stability of value-added school effects using multilevel models. Journal of Educational and Behavioral Statistics. 2018;43(3):440–468.
- Leckie G, French R, Charlton C, Browne W. Modeling heterogeneous variance-covariance components in two-level models. Journal of Educational and Behavioral Statistics. 2014;39(5):307–332.
- Leckie G, Goldstein H. The limitations of using school league tables to inform school choice. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2009;172(4):835–851.
- Leckie G, Goldstein H. A multilevel modelling approach to measuring changing patterns of ethnic composition and segregation among London secondary schools, 2001–2010. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2015;178(2):405–422.
- Leckie G, Goldstein H. The evolution of school league tables in England 1992–2016: "Contextual value-added," "expected progress" and "progress 8". British Educational Research Journal. 2017;43(2):193–212.
- Leckie G, Goldstein H. The importance of adjusting for student background in school value-added models: A study of Progress 8 and school accountability in England. British Educational Research Journal. 2019;45(3):518–537.
- Leckie G, Pillinger R, Jones K, Goldstein H. Multilevel modelling of social segregation. Journal of Educational and Behavioral Statistics. 2012;37(1):3–30.
- Leckie G, Prior L. A comparison of value-added models for school accountability. School Effectiveness and School Improvement. 2022;33(3):431–455.
- Levy J, Brunner M, Keller U, Fischbach A. Methodological issues in value-added modeling: An international review from 26 countries. Educational Assessment, Evaluation and Accountability. 2019;31:257–287.
- Levy J, Brunner M, Keller U, Fischbach A. How sensitive are the evaluations of a school's effectiveness to the selection of covariates in the applied value-added model? Educational Assessment, Evaluation and Accountability. 2023;35(1):129–164. doi: 10.1007/s11092-022-09386-y.
- Lu T. Mixed-effects location and scale Tobit joint models for heterogeneous longitudinal data with skewness, detection limits, and measurement errors. Statistical Methods in Medical Research. 2018;27(12):3525–3543. doi: 10.1177/0962280217704225.
- McCaffrey DF, Lockwood JR, Koretz D, Louis TA, Hamilton L. Models for value-added modeling of teacher effects. Journal of Educational and Behavioral Statistics. 2004;29(1):67–101. doi: 10.3102/10769986029001067.
- McNeish D. Specifying location scale models for heterogeneous variances as multilevel SEMs. Organizational Research Methods. 2021;24(3):630–653.
- Nordgren R, Hedeker D, Dunton G, Yang CH. Extending the mixed-effects model to consider within-subject variance for ecological momentary assessment data. Statistics in Medicine. 2020;39(5):577–590. doi: 10.1002/sim.8429.
- Nuttall DL, Goldstein H, Prosser R, Rasbash J. Differential school effectiveness. International Journal of Educational Research. 1989;13(7):769–776.
- Organization for Economic Cooperation and Development. Measuring improvements in learning outcomes: Best practices to assess the value-added of schools. Organization for Economic Co-operation and Development Publishing & Centre for Educational Research and Innovation; 2008.
- Parker RMA, Leckie G, Goldstein H, Howe LD, Heron J, Hughes AD, Phillippo DM, Tilling K. Joint modeling of individual trajectories, within-individual variability, and a later outcome: Systolic blood pressure through childhood and left ventricular mass in early adulthood. American Journal of Epidemiology. 2021;190(4):652–662. doi: 10.1093/aje/kwaa224.
- Prior L, Goldstein H, Leckie G. School value-added models for multivariate academic and non-academic outcomes: Exploring implications for performance monitoring and accountability. School Effectiveness and School Improvement. 2021;32(3):486–507.
- Prior L, Jerrim J, Thomson D, Leckie G. A review and evaluation of secondary school accountability in England: Statistical strengths, weaknesses, and challenges for "Progress 8". Review of Education. 2021;9(3):1–30. doi: 10.1002/rev3.3299.
- Pugach O, Hedeker D, Mermelstein RJ. A bivariate mixed-effects location-scale model with application to ecological momentary assessment (EMA) data. Health Services and Outcomes Research Methodology. 2014;14(4):194–212. doi: 10.1007/s10742-014-0126-9.
- Rast P, Hofer SM, Sparks C. Modeling individual differences in within-person variation of negative and positive affect in a mixed effects location scale model using BUGS/JAGS. Multivariate Behavioral Research. 2012;47(2):177–200. doi: 10.1080/00273171.2012.658328.
- Raudenbush SW, Bryk AS. A hierarchical model for studying school effects. Sociology of Education. 1986;59(1):1–17.
- Raudenbush SW, Bryk AS. Hierarchical linear models: Applications and data analysis methods. 2nd ed. Sage; 2002.
- Raudenbush SW, Willms JD. The estimation of school effects. Journal of Educational and Behavioral Statistics. 1995;20(4):307–335.
- Reardon SF, Raudenbush SW. Assumptions of value-added models for estimating school effects. Education Finance and Policy. 2009;4(4):492–519.
- Reynolds D, Sammons P, De Fraine B, Van Damme J, Townsend T, Teddlie C, Stringfield S. Educational effectiveness research (EER): A state-of-the-art review. School Effectiveness and School Improvement. 2014;25(2):197–230.
- Rubin DB, Stuart EA, Zanutto EL. A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics. 2004;29(1):103–116.
- Scherer R, Nilsen T. Closing the gaps? Differential effectiveness and accountability as a road to school improvement. School Effectiveness and School Improvement. 2019;30(3):255–260.
- Snijders TAB, Bosker RJ. Multilevel analysis: An introduction to basic and advanced multilevel modeling. 2nd ed. Sage; 2012.
- Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B. 2002;64(4):583–639.
- StataCorp. Stata 17 Bayesian analysis reference manual. Stata Press; 2021.
- Strand S. Do some schools narrow the gap? Differential school effectiveness by ethnicity, gender, poverty, and prior achievement. School Effectiveness and School Improvement. 2010;21(3):289–314.
- Teddlie C, Reynolds D. The international handbook of school effectiveness research. Psychology Press; 2000.
- Thomas S, Mortimore P. Comparison of value-added models for secondary-school effectiveness. Research Papers in Education. 1996;11(1):5–33.
- Timmermans AC, Thomas SM. The impact of student composition on schools' value-added performance: A comparison of seven empirical studies. School Effectiveness and School Improvement. 2015;26(3):487–498.
- Wainer H. Introduction to a special issue of the Journal of Educational and Behavioral Statistics on value-added assessment. Journal of Educational and Behavioral Statistics. 2004;29(1):1–2.
- Walters RW, Hoffman L, Templin J. The power to detect and predict individual differences in intra-individual variability using the mixed-effects location-scale model. Multivariate Behavioral Research. 2018;53(3):360–374. doi: 10.1080/00273171.2018.1449628.