. 2025 Mar 19;20(3):e0302771. doi: 10.1371/journal.pone.0302771

An assessment of the teacher completed ‘Early Years Foundation Stage Profile’ as a routine measure of child developmental health

Kate E Mooney 1,2,*, Charlie Welch 1, Gareth Palliser 1, Rachael W Cheung 1,2, Dea Nielsen 1,2, Lucy H Eddy 3, Sarah L Blower 1,2
Editor: Nhu N Tran4
PMCID: PMC11957556  PMID: 40106470

Abstract

The routine measurement of children’s developmental health varies across educational settings and systems. The Early Years Foundation Stage Profile (EYFSP) is a routinely recorded measure of a child’s development completed at the end of their first school year, for all children attending school in England and Wales. Despite widespread use for research and educational purposes, its measurement properties are unknown. This study examined the internal consistency and structural validity of the EYFSP, investigating whether the summed item-level scores, which we refer to as the ‘total score’, can be used as a summary of children’s developmental health. It also examined predictive validity of the total score with respect to later academic attainment and behavioural, social, and emotional difficulties. The data source was the longitudinal prospective birth cohort, Born in Bradford (BiB), and routine education data were obtained from Local Authorities. The internal consistency and structural validity of the EYFSP total score were investigated using Confirmatory Factor Analysis and a Rasch model. Predictive validity was assessed using linear mixed effects models for Key Stage 2 outcomes (Maths, Reading, Grammar/Punctuation/Spelling), and behavioural, social, and emotional difficulties (Strengths and Difficulties Questionnaire). We found that the EYFSP items demonstrated internal consistency; however, an Item Response model suggested weak structural validity (n = 10,589). Mixed effects regression found the EYFSP total score to predict later academic outcomes (n = 2711), and behavioural, social, and emotional difficulties (n = 984). This study has revealed that whilst caution is needed when using the EYFSP to measure children with close to ‘average’ ability levels, the EYFSP total score is an internally consistent measure with predictive validity.

Introduction

‘Developmental health’ is a broad concept that combines a holistic understanding of physical, mental, social, and emotional wellbeing with core educational abilities such as mathematics and literacy [1]. Measurements of children’s early developmental health can be used to predict later educational performance and health [2–4], which are both, in turn, important predictors of adult social and health outcomes [5,6]. Ensuring that children have strong developmental health in the earliest years of their lives can therefore contribute to their future educational attainment [7] and, consequently, help to close socioeconomic inequalities in educational outcomes [8,9]. It is therefore important to routinely measure children’s developmental health using an accurate and valid measure to identify those who may need extra support [10,11].

At an international level, widely used measures of child developmental health include the Early Childhood Development Index (ECDI) and the Early Development Instrument (EDI) [12]. The ECDI has been in use since 1990 by the United Nations as an indicator of ‘improved access to quality early childhood development, care and pre-primary education’, and consists of 10 caregiver-reported questions for children aged 3–5 years across literacy, numeracy, learning/cognition, physical and socioemotional development [13].

Some countries have utilised existing educational settings to monitor children’s developmental health with teacher-based assessments, such as the EDI and the Teaching Standard GOLD® (TS GOLD) measure. The EDI has been widely implemented across Canada (all provinces except for one since 1998) and Australia (all schools since 2009) [14], and assesses children aged 4–6 years old on their physical health, wellbeing, social competence, emotional maturity, language and cognitive development, communication skills, and general knowledge across 103 items [15,16]. The EDI has generally demonstrated adequate psychometric properties in terms of internal consistency (with all domains except for physical health having Cronbach’s alpha > 0.86). However, it has demonstrated variable model fit (with CFI and TLI values above .80 for all subscales in Canada, Australia, and the USA, and RMSEA values ranging between 0.063 and 0.228) [17], and variable predictive validity (with Pearson’s r correlations ranging between 0.19 and 0.38 between the Language and Cognitive Development domain scores and the Peabody Picture Vocabulary Test, PPVT) [17].

In the US, the Teaching Standard GOLD® (TS GOLD) teacher-report measure has been used as a formative assessment of developmental health in children aged 2–5 years in 19 out of 54 public preschool programmes as of 2012, and is also used by federal Head Start programmes as of 2019 [18]. Teachers rate children across social-emotional, physical, language, cognitive, literacy, and mathematics domains from Level 0 (‘Not Yet’) to Level 9 (‘Exceeds kindergarten expectations’). TS GOLD has demonstrated adequate measurement of each domain as a latent construct (factor loadings ranged between 0.68 and 0.95), but variable model fit (SRMR ranging between 0.38 and 0.50; CFI between 0.90 and 0.92; RMSEA between 0.06 and 0.07). It has demonstrated metric, scalar, and strict invariance across longitudinal measurements, good interrater agreement between teachers and experienced raters (all above 0.80), and poor to moderate concurrent validity with the Bracken School Readiness Scale (ICC ranging between 0.38 and 0.54; Pearson’s r for individual scale scores between 0.27 and 0.74) [19].

Overall, although teacher-based assessments have the advantage of using existing educational settings to assess children’s developmental health, providing much-needed population data on child development [14], as well as limiting the stress children may experience from formalised exam-based assessments [20], further research is needed to understand the psychometric strengths and limitations of teacher-based assessments as general developmental health measures.

Within England and Wales, the Early Years Foundation Stage (EYFS) was introduced in 2008 to provide a research-based framework with information on how children learn and develop, aimed at practitioners to assist them in delivering high quality early years environments [21]. Based on the EYFS framework, the EYFS ‘Profile’ (EYFSP) was introduced as a teacher assessment of children’s development and learning, completed at the end of the academic year in which the child turns five [22]. It was originally introduced with 69 ‘Early Learning Goals’ (ELGs). Following a review which indicated a need to simplify and reduce the number of goals for teachers to complete [21], a new profile consisting of 17 ELGs was introduced in 2012. Whilst detailed information regarding how the specific ELGs were chosen is limited, and the EYFSP was not developed as a robust measurement tool (in comparison to, for instance, the EDI), the ELGs do appear to relate to children’s early developmental health. The ELGs span seven different developmental areas: ‘Communication and language development’, ‘Physical development’, ‘Personal, social and emotional development’, ‘Literacy’, ‘Mathematics’, ‘Understanding the world’, and ‘Expressive arts and design’ [23,24].

In the version of the EYFSP that we analyse in this study (second version, delivered 2012–2021), each ELG is scored according to whether a child meets it as “Emerging”, “Expected” or “Exceeding”. A revised version of the EYFSP has been available since 2021, with changes to the content and focus of the ELGs, and children are scored as only “Emerging” or “Expected” [25] (see Supporting Information File 1 for second and revised versions). The present study investigates the second version of the EYFSP as this was used nationally and routinely for nine years, and cohort studies have utilised it in research, both as an outcome in evaluations of interventions or policies [26], and as a predictor in association studies [27]. Despite the update to the revised version, data from the second version are likely to remain relevant in the future, as there are several studies listed on the ISRCTN that are using the EYFSP as an outcome, and protocols for evaluations which plan to use it as an outcome in the future [28].

The ‘Good Level of Development’ (GLD)

The EYFSP has been predominantly used in research studies and educational monitoring as a binary measure, where children either meet a ‘Good Level of Development’ (GLD), or they do not. Children are scored as having achieved a GLD if they have achieved at least the expected level for the ELGs in the core areas of “communication and language”, “physical development”, “personal, social and emotional development”, “literacy” and “mathematics” [23]. The Department for Education monitors national and regional averages of children reaching a GLD and compares the number of children achieving GLD across different groups according to characteristics such as gender and eligibility for free school meals [29].
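The GLD rule described above can be made concrete with a short sketch. This is an illustrative Python rendering of the rule as stated (the function and variable names are our own, not part of the EYFSP documentation): a child achieves a GLD only if they score at least ‘Expected’ on every ELG in the five core areas (ELGs 1–12 in Table 1).

```python
# Hypothetical sketch of the GLD rule: a child achieves a 'Good Level of
# Development' only if they reach at least 'Expected' on all 12 ELGs in the
# core areas (communication and language, physical development, personal/
# social/emotional development, literacy, and mathematics).
CORE_ELGS = range(1, 13)  # ELGs 1-12 are the GLD-relevant goals
LEVELS = {"Emerging": 0, "Expected": 1, "Exceeding": 2}

def achieved_gld(scores):
    """scores: dict mapping ELG number (1-17) to a category string."""
    return int(all(LEVELS[scores[i]] >= 1 for i in CORE_ELGS))

# A child rated 'Expected' everywhere meets the GLD; a child 'Emerging' on
# a single core goal does not, however close they came.
all_expected = {i: "Expected" for i in range(1, 18)}
one_short = {**all_expected, 1: "Emerging"}
```

The second example illustrates the dichotomisation problem discussed below: `one_short` differs from `all_expected` on one goal only, yet is scored identically to a child meeting no goals at all.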

Further, several research studies have investigated risk factors for not achieving a GLD. Children with ‘English as an Additional Language’ (EAL) status have been found to have lower proportions of GLD achievement in comparison to native English-speaking children [30], and children born later in the academic year are much less likely to achieve a GLD [31–33]. Additionally, children achieving the GLD have higher odds of performing at expected levels on later academic assessments at age 7 [34], and lower odds of later being identified as having Special Educational Needs or Disability (SEND) [35].

Whilst the GLD is a useful benchmark to establish which children are meeting the core components of the EYFSP, it has important limitations. Dichotomising variables (continuous or categorical) is problematic for two key reasons. First, much information is lost, so the statistical power to detect an association using the variable is reduced substantially [36]. In fact, dichotomising a variable can reduce statistical power by the same amount as would discarding a third of the data or more [37]. Second, dichotomisation can lead to an underestimation of the extent of variation in outcome between groups, as individuals close to but on opposite sides of the cut point are characterised as being very different rather than very similar [36].

Applying the GLD method to the EYFSP therefore means discarding potentially valuable information on the number of goals a child meets or exceeds. It means that children very close to, but on opposite sides of, the GLD threshold are characterised as being very different, despite meeting or exceeding a similar number of goals. For instance, children who meet zero goals and children who meet eleven out of twelve GLD goals would both be scored as ‘0’ on the GLD. The GLD also essentially ignores the distinction between children who are “Expected” and “Exceeding” in various goals, as a child who scores “Expected” in all the GLD goals and a child who scores “Exceeding” in all the GLD goals would both be scored as a 1. As children vary considerably across different developmental areas during early childhood [38,39], this simple GLD approach is a very limited assessment of children’s developmental health. In summary, much of the variation in the EYFSP items, and thus the variation in developmental health amongst children, is ignored by the GLD measure.

The ‘total score’

An alternative to the GLD is to instead assign numerical scores to each category in the EYFSP (e.g., 0 for emerging, 1 for expected, and 2 for exceeding in the second version; or 0 for emerging and 1 for expected in the revised version), and sum these scores into a ‘total score’ (resulting in a score ranging between 0–34 for the second version, and 0–17 for the revised version). This approach overcomes the above limitations of the GLD, as it better captures the variation in EYFSP responses.

Nonetheless, the EYFSP total score has been seldom used in research studies in comparison to the GLD. Previous research has considered the impact of early years workforce qualifications on children’s later EYFSP total scores [40,41]. One study found the original version of the EYFSP to be predictive of later language, literacy, and mathematics [42]. Since then, only one study has used the revised version of the EYFSP total score to predict later outcomes, finding it to be a strong predictor of later Autism Spectrum Disorder diagnoses for children within the Born in Bradford cohort [43]. Importantly, there are no studies exploring the psychometric measurement properties of the EYFSP total score.

Subscale scores of the EYFSP

As described earlier, there are seven individual learning areas within the EYFSP. However, associations between the seven individual areas of the EYFSP and later related outcomes have not been extensively explored. This may provide information about the predictive validity (i.e., the extent to which a test is an adequate reflection of a ‘gold standard’) of the specific areas [44]. For instance, do the ‘personal, social and emotional development’ areas have significant predictive associations with a validated measure of children’s social and emotional development? If so, this specific area (with a score ranging between 0-12) could be used as an outcome in isolation, meaning that intervention studies aiming to improve children’s social and emotional development could use this area with the three goals as an outcome. This rationale can be generalised to all seven areas of the EYFSP.

The preliminary evidence on whether the individual areas significantly relate to other outcomes is promising, but very limited. Children with higher language comprehension scores achieved higher scores on the EYFSP writing scale, however, the writing scale is no longer in the current version of the EYFSP [45]. In the Born in Bradford cohort, EYFSP scores relating to literacy and physical development were found to predict total difficulties on the Strengths and Difficulties Questionnaire (SDQ) [27]. However, the EYFSP scores relating to literacy and physical development are not the most relevant subscales for the SDQ total difficulties score, and it was not reported how the EYFSP subscale scores were calculated for this particular study.

The EYFSP is recommended for use in educational settings to assess children’s strengths and weaknesses, and whether they need support in a particular area [23]. Information about the predictive validity of the specific areas will provide confidence in doing this, as it will validate whether the areas are predictive of later outcomes.

Rationale and objectives

The EYFSP total score has huge potential to provide useful information on children’s early developmental health that could be utilised for research and educational purposes, at both a population and individual level. Despite the EYFSP being administered to over 7.5 million children since being introduced [34], there is an absence of any psychometric research on it. Specifically, there is no previous research on the internal consistency or structural validity of the EYFSP ‘total score’, nor any research on its predictive validity for academic outcomes. Research is therefore needed to establish whether the EYFSP ‘total score’ is fit for purpose in both research studies and applied educational settings.

We first investigate the structural validity of the EYFSP total score; that is, the degree to which the total score reflects the dimensionality of the construct to be measured [44]. We achieve this using Item Response Theory (IRT); a set of psychometric models for developing and refining psychological measures [46]. To accompany this, we investigate the internal consistency of the EYFSP; that is, the degree of the interrelatedness among the items which represents the extent to which all items of a test measure the same construct [44,47].

We also investigate the predictive validity of the EYFSP total score, to assess the degree to which it predicts future outcomes [44]. Since it is assumed that measures administered at the start of school can provide an understanding into children’s future attainment, establishing predictive validity is crucial [4]. Whilst the predictive validity of the EYFSP GLD has been investigated [34,35], the predictive validity of the EYFSP total score for academic outcomes has not been investigated. We investigate whether the EYFSP total score is predictive of children’s later academic outcomes at age 10–11 years, and investigate whether specific EYFSP subscales (relating to communication and socioemotional wellbeing), are predictive of children’s behavioural, social, and emotional difficulties.

In summary, we had five aims that assessed two key aspects of using the EYFSP for research and educational purposes:

Internal Consistency/Structural validity of the EYFSP:

  1) Investigate whether the EYFSP items demonstrate internal consistency

  2) Investigate whether the EYFSP items demonstrate structural validity, i.e., that the total scores from the instrument can be used as a summary measurement that represents children’s early school skills

Predictive Validity of the EYFSP:

  3) Investigate if the EYFSP total score predicts children’s later academic attainment (for maths, reading, and grammar/punctuation/spelling)

  4) Investigate if the EYFSP total score predicts children’s later behavioural, social, and emotional difficulties

  5) Investigate if the EYFSP subscales (relating only to communication and socioemotional wellbeing) predict later behavioural, social, and emotional difficulties

Materials and methods

Design

This study comprises secondary data analyses of an observational birth cohort in Bradford, England.

Setting

The data source is the longitudinal cohort study, Born in Bradford (BiB). The BiB cohort recruited pregnant mothers between March 2007 and December 2010 at the Bradford Royal Infirmary. All babies born to these mothers were eligible to participate and more than 80% of women invited agreed to participate [48]. The cohort comprises 12,453 mothers, 13,776 pregnancies and 3,448 fathers. At recruitment, the two largest ethnic groups in the sample were Pakistani heritage (45%) and White British (40%), followed by Indian (4%) and Asian Other (3%) [49].

Mothers completed the BiB baseline questionnaire when they were recruited and reported information on family demographics and socioeconomic indicators. Routine education data relating to personal characteristics and educational outcomes were obtained from the Local Authority every year that the child attends school, starting at age 4 (Reception year). Additional bespoke data were collected by Born in Bradford on children aged 7 to 10 years in 89 Bradford schools between 2016 and 2019, including a teacher reported Strengths and Difficulties Questionnaire (SDQ) (which is the outcome for Research Questions 4-5) [38]. Born in Bradford and the ‘Primary School Years’ wave received ethical approval for the data collection from the NHS Health Research Authority’s Yorkshire and the Humber—Bradford Leeds Research Ethics committee (references: 07/H1302/112, 16/YH/0062). Informed written consent was obtained from all parents recruited.

Internal consistency and structural validity analyses

The analyses were preregistered at osf.io/s6num. Data were combined and cleaned using Stata/MP 18.0. Internal consistency and structural validity analyses were completed using the mirt [50,51] and ggmirt [52] packages in R.

Measurements.

The EYFSP total score was summed from the 17 Early Learning Goals (ELGs) in the profile.

As seen in Table 1, each area of learning contains specific goals. The EYFSP handbook provides a description of each goal and what a child must achieve to meet each level [24]. Practitioners are instructed to review the evidence gathered in order to make a judgement for each child and for each ELG, and then to score each ELG as either:

Table 1. Overview of the ELGs within the EYFSP and the area of learning they relate to.
Area of Learning EYFSP ELGs
(See Section 6 of 2020 EYFSP handbook for further detail [24])
Communication and language development 1. Listening and attention*
2. Understanding*
3. Speaking*
Physical development 4. Moving and handling*
5. Health and self-care*
Personal, social and emotional development 6. Self-confidence and self-awareness*
7. Managing feeling and behaviour*
8. Making relationships*
Literacy 9. Reading*
10. Writing*
Mathematics 11. Numbers*
12. Shape, space and measures*
Understanding the world 13. People and communities
14. The world
15. Technology
Expressive arts and design 16. Exploring and using media and materials
17. Being imaginative

Note: Asterisks indicate the ELGs in which a child must achieve at least the ‘expected’ level in order to achieve a GLD.

  • Emerging: not yet at the level of development expected at the end of the EYFS

  • Expected: best described by the level of development expected at the end of the EYFS

  • Exceeding: beyond the level of development expected at the end of the EYFS

The EYFSP handbook instructs that practitioners must make their final EYFSP assessments based on all their evidence, where ‘evidence’ means any “material, knowledge of the child, anecdotal incident or result of observation, or information from additional sources that supports the overall picture of a child’s development” [24].

The responses to each ELG and how they were coded in this study are as follows: ‘Emerging’ =  0, ‘Expected’ =  1, and ‘Exceeding’ =  2. If children were absent from school for a long period of time, this is marked on their records and these children were dropped from the analyses. The EYFSP total score was summed from the 17 ELGs (see Table 1), and therefore ranged between 0–34.
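The coding just described can be sketched in a few lines. This is an illustrative Python rendering (the study itself used Stata/R; function names here are our own): each of the 17 ELG responses is mapped to 0/1/2 and summed.

```python
# Sketch of the total-score coding described above: each of the 17 ELGs is
# coded 'Emerging' = 0, 'Expected' = 1, 'Exceeding' = 2, and the codes are
# summed, giving a total score between 0 and 34.
CODING = {"Emerging": 0, "Expected": 1, "Exceeding": 2}

def eyfsp_total(elg_responses):
    """elg_responses: list of 17 category strings, one per ELG."""
    if len(elg_responses) != 17:
        raise ValueError("expected one response per ELG (17 in total)")
    return sum(CODING[r] for r in elg_responses)

# The extremes of the scale:
lowest = eyfsp_total(["Emerging"] * 17)    # 0
highest = eyfsp_total(["Exceeding"] * 17)  # 34
```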

Analysis.

We used Item Response Theory (IRT) to assess the structural validity of the EYFSP total score [46]. IRT can be used to assess whether creating a total score from the items is appropriate and to assess the strength of relationships between items and constructs of interest. Item response models assume the latent trait variable is reflected by a unidimensional continuum (i.e., item responses are explained by one latent continuous variable, or single dimension). We fitted a polytomous ‘Rating Scale’ version of the 1-parameter logistic Rasch model, since the items have more than two possible response categories (see further details under ‘Rasch model parameters’) [53]. Under the Rasch model, two test takers who achieve the same number of EYFSP items (for example, 12), but a different set of items, would receive the same ability estimate [54]. This allows us to interrogate the structural validity of the summed ‘total score’.

Rasch model parameters.

Let $Y_{ij}$ denote the response to item $i$ for child $j$, with $Y_{ij}$ taking the values 0 (‘Emerging’), 1 (‘Expected’) or 2 (‘Exceeding’). The polytomous rating scale Rasch model posits that the probabilities of child $j$ with latent ability $\theta_j$ obtaining responses 0, 1 or 2 for item $i$ are given by:

$$\Pr(Y_{ij}=0 \mid b_i,d_1,d_2,\theta_j)=\frac{1}{1+\exp(\theta_j-b_i+d_1)+\exp\{(\theta_j-b_i+d_1)+(\theta_j-b_i+d_2)\}}$$

$$\Pr(Y_{ij}=1 \mid b_i,d_1,d_2,\theta_j)=\frac{\exp(\theta_j-b_i+d_1)}{1+\exp(\theta_j-b_i+d_1)+\exp\{(\theta_j-b_i+d_1)+(\theta_j-b_i+d_2)\}}$$

$$\Pr(Y_{ij}=2 \mid b_i,d_1,d_2,\theta_j)=\frac{\exp\{(\theta_j-b_i+d_1)+(\theta_j-b_i+d_2)\}}{1+\exp(\theta_j-b_i+d_1)+\exp\{(\theta_j-b_i+d_1)+(\theta_j-b_i+d_2)\}}$$

where $b_i$ denotes the overall difficulty of item $i$ and $d_1, d_2$ denote the distances between adjacent response categories (common across all items). Furthermore, it is assumed that $\theta_j \sim N(0, \sigma_\theta^2)$ and that the item discrimination parameters are 1 across all items. This contrasts with the conventional Rasch parameterisation, which constrains the item discrimination parameters to be constant across all items (but not necessarily equal to unity) and assumes the latent ability $\theta_j$ to be distributed $N(0, 1)$.

The item difficulty parameter measures the difficulty of achieving a higher scoring response, whereas the discrimination parameter is a measure of the differential capability of an item (i.e., a high discrimination value suggests an item that has a high ability to differentiate between subjects with similar latent abilities) [55]. In a Rasch model, discrimination is constrained to be equal across all items, and difficulty is estimated separately for all items [54]. The polytomous rating scale version of the Rasch model also includes category threshold parameters which are constrained to be equal across items, and provide a measure of the distances between the difficulties of adjacent levels of response for each item.
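The three category probabilities in the model above can be computed directly. The following is a minimal Python sketch of those formulas (the parameter values passed in are invented for illustration, not estimates from this study); a useful sanity check is that the three probabilities always sum to one.

```python
import math

# Sketch of the rating-scale Rasch category probabilities defined above,
# with the discrimination parameter fixed at 1 for every item, as in the
# paper's parameterisation. theta = child's latent ability, b = item
# difficulty, d1/d2 = category thresholds shared across all items.
def category_probs(theta, b, d1, d2):
    z1 = theta - b + d1            # exponent for the 'Expected' category
    z2 = z1 + (theta - b + d2)     # cumulative exponent for 'Exceeding'
    denom = 1 + math.exp(z1) + math.exp(z2)
    return (1 / denom,             # Pr(Y = 0, 'Emerging')
            math.exp(z1) / denom,  # Pr(Y = 1, 'Expected')
            math.exp(z2) / denom)  # Pr(Y = 2, 'Exceeding')

# Illustrative values only: an average-difficulty item, moderate ability.
p_emerging, p_expected, p_exceeding = category_probs(0.5, 0.0, 0.2, -0.4)
```

As ability $\theta$ increases relative to the item difficulty $b$, the probability mass shifts from ‘Emerging’ towards ‘Exceeding’, which is the behaviour the item characteristic curves display.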

Model fit.

The fit of the Rasch model was assessed using Root Mean Square Error of Approximation (RMSEA), where values < 0.02 with sample sizes of 1000+ indicate that the data do not underfit the model [56]. We also report the Comparative Fit Index (CFI) (values > .90 are acceptable), and Standardised Root Mean Square Residual (SRMR) (values < .08 are acceptable) [57].

Item fit.

Item infit and outfit indicate how well the item responses fit the model [58]. Item fit was assessed using infit/outfit statistics, with values between 0.5 and 1.5 considered to be acceptable [59] and RMSEA as described above.

Local dependence.

Local independence is the assumption that the only influence on an individual’s item response is that of the latent trait variable being measured, and that no other variable (e.g., other items on the EYFSP scale) influences individual item responses; local dependence occurs when this assumption is violated. We used the ‘residuals’ function in the mirt package to examine the standardised local dependency χ2 statistic (where any correlation higher than the average item residual correlation + 0.2 [60] classifies as local dependency).
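The flagging rule just described (residual correlation above the average + 0.2) is simple enough to sketch. The following is an illustrative Python rendering of that rule only, not the mirt implementation, and the correlation values are invented for illustration:

```python
# Sketch of the local-dependence flagging rule described above: flag any
# item pair whose residual correlation exceeds the average residual
# correlation across all pairs plus 0.2.
def flag_local_dependence(residual_corr):
    """residual_corr: dict mapping item-pair tuples to residual correlations."""
    avg = sum(residual_corr.values()) / len(residual_corr)
    return [pair for pair, r in residual_corr.items() if r > avg + 0.2]

# Invented values: here the average is ~0.223, so the threshold is ~0.423
# and only the first pair is flagged as locally dependent.
corrs = {("Reading", "Writing"): 0.45,
         ("Numbers", "Shape"): 0.10,
         ("Speaking", "Understanding"): 0.12}
flagged = flag_local_dependence(corrs)
```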

Item Response Theory visuals.

The test information function shows a measure of the information provided by the total test score across the range of latent ability levels (denoted θ). Information is a statistical concept that refers to the ability of a test (or item) to reliably measure the latent ability θ. The test characteristic curve shows the relationship between the total summed score on the y axis, and latent ability (θ) on the x axis [61]. Plots of item characteristic curves and item information functions are provided at osf.io/s6num/.

Unidimensionality.

We tested unidimensionality with a Confirmatory Factor Analysis (CFA) of a latent trait with all EYFSP items loading onto it and examined McDonald’s hierarchical Omega, which reflects the percentage of variance in the scale score accounted for by a single general factor. This allows us to estimate the extent of internal consistency among the EYFSP items.

Predictive validity analyses

The analysis plan for the predictive validity analyses was preregistered at osf.io/s6num. Following pre-registration, we made two changes to the analytic plan. These were: (1) the inclusion of a binary term for Special Educational Needs and Disabilities (SEND) status as a covariate in all analysis models and (2) inclusion of ‘school at time of outcome’ as a random intercept in all analysis models. Inclusion of SEND status as a control covariate was necessary as children with SEND may have lower EYFSP scores relative to typically developing children [62]. Inclusion of ‘school at time of outcome’ as a random intercept was necessary as EYFSP scores may vary across schools. All analyses for this component of the research were undertaken using Stata/MP 18.0.

Measurements.

There were two separate predictors for this analysis. For the measurement of EYFSP total score, see the above section.

For the measurement of EYFSP Communication and Socioemotional goals (EYFSP-CS), we tested the strength of the association between the ‘communication and language’ and ‘personal, social, and emotional’ ELGs and children’s outcomes. This EYFSP-CS score ranged between 0–12 and was obtained by summing the responses to the six items in the two relevant areas.

There were two separate outcomes for this analysis. For Research Question 3, we measured academic attainment via the Key Stage 2 Assessment, completed by children towards the end of Year 6 at school, when aged 10–11. In educational records, there are separate continuous scaled scores for (1) Maths, (2) Reading, and (3) Grammar/Punctuation/Spelling that range between 0 and 120. Any children who scored ‘0’ were excluded from the analyses, as children with ‘0’s recorded are pupils who achieved too few marks to be awarded a scaled score [63]. Analysed scores therefore ranged between 80 and 120.

For Research Questions 4-5, we used the Strengths and Difficulties Questionnaire (SDQ) to measure children’s behavioural, social, and emotional difficulties [64]. The SDQ was collected once, when children were aged 7–10, in the ‘Primary School Years’ wave. The 25 items in the SDQ comprise five scales of five items each. ‘Somewhat True’ is always scored as 1, but the scoring of ‘Not True’ and ‘Certainly True’ varies with the item. A total difficulties score is generated by summing scores from all the scales (emotional symptoms, conduct problems, hyperactivity, peer relationships) except the prosocial scale, and the resultant score ranges from 0 to 40, where a higher score indicates higher difficulties.
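The total difficulties scoring just described can be sketched briefly. This is an illustrative Python rendering of the summation step only (it assumes the five item-level scores per scale have already been summed into subscale scores of 0–10 each; the scores shown are invented):

```python
# Sketch of the SDQ total difficulties score described above: the four
# difficulty subscales are summed and the prosocial subscale is excluded,
# giving a total between 0 and 40.
DIFFICULTY_SCALES = ("emotional", "conduct", "hyperactivity", "peer")

def sdq_total_difficulties(subscale_scores):
    """subscale_scores: dict of subscale name -> summed subscale score (0-10)."""
    return sum(subscale_scores[s] for s in DIFFICULTY_SCALES)

# Invented example child: prosocial (8) is deliberately not counted.
scores = {"emotional": 3, "conduct": 2, "hyperactivity": 6,
          "peer": 1, "prosocial": 8}
total = sdq_total_difficulties(scores)  # 12
```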

Table 2 below provides an overview of all covariates included in both models. Covariates were included in the regression models if they were thought to be confounders of the association between EYFSP and the outcome, or if they were covariates that would be expected to improve the precision of our estimates.

Table 2. Overview of covariates in all models.
Variable (with evidence for relationship to exposure/outcome) Variable type (scale) Details
EYFSP score (exposure) Continuous (0–34) Modelled via a single linear term
Child English as an Additional Language (EAL) (confounder) [65] Binary (0/1) Coded as 0 =  English is first language, 1 =  English is an Additional Language
Child ethnicity (confounder) [66] Categorical (0/1/2) Coded as 0 =  White British, 1 =  Pakistani, 2 =  Other
Parent immigration status (confounder) [67] Binary (0/1) Coded as 0 =  Born in UK, 1 =  Born outside of UK
Socioeconomic status (confounder) [8,68] Categorical (0/1/2/3/4) Coded as 0 =  “Most economically deprived”, 1 =  “Benefits and not materially deprived”, 2 =  “Employed and no access to money”, 3 =  “Employed and not materially deprived”, 4 =  “Least socioeconomically deprived and most educated”.
Derived from a previously validated measure of socioeconomic position in Born in Bradford [69]. See Supporting Information File 2 (Attachment A) for the characteristics of the socioeconomic groups.
Special educational needs and/or disability (SEND) (confounder) [68] Binary (0/1) Coded as 0 =  No SEND, and 1 =  Any SEND (including children with an EHCP).
Child age at time of outcome (covariate) [68] Continuous Child age in months is recorded for Research Question 1, and child age in years is recorded for Research Question 2 (due to data availability). Both modelled via a single linear term in the respective analyses
School at time of outcome (multilevel variable) Categorical Modelled via a random intercept

Analysis models.

All research questions were answered using linear mixed effects models, with fixed effects of socioeconomic status, parent immigration status, child ethnicity, SEND, child age, and child language as covariates (see Table 2), and a random intercept for school at the time of outcome measurement. The four outcomes were: (1) Reading, (2) Maths, (3) Grammar, Punctuation, and Spelling, and (4) SDQ. The SDQ scores were analysed twice, once using EYFSP total score as a predictor, and once using EYFS-CS subscale as a predictor. The model for each outcome can be described as;

δ_ij = β_0 + β_1·EYFSP score_ij + β_2·Child EAL_ij + β_3–4·Child ethnicity_ij + β_5·Parent immigration status_ij + β_6–9·Socioeconomic status_ij + β_10·SEND_ij + β_11·Child age_ij + u_j + ε_ij

Where δ_ij is each outcome, β_0 is the intercept, each β is a regression coefficient, u_j is the random intercept for school j, and ε_ij is the residual error for individual i within school j. The subscripts identify the levels within the model, where i indexes individuals and j indexes schools. Child ethnicity_ij and Socioeconomic status_ij represent sets of dummy variables.

Unstandardized regression coefficients and Wald method 95% confidence intervals based on variance estimates obtained via Rubin’s rules are reported for all models [70].
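The two-level structure of the model above can be illustrated with a minimal simulation. This is a sketch only, with hypothetical effect sizes and sample sizes (the study analyses were run in Stata); for simplicity the school intercepts u_j are absorbed with school dummy variables and ordinary least squares rather than fitted with a dedicated mixed-model routine, which still recovers the EYFSP fixed-effect slope in this balanced setting.

```python
import numpy as np

rng = np.random.default_rng(42)

n_schools, per_school = 30, 40          # hypothetical cluster sizes
n = n_schools * per_school
school = np.repeat(np.arange(n_schools), per_school)

# Simulate from: outcome_ij = b0 + b1*EYFSP_ij + u_j + e_ij
b0, b1 = 100.0, 0.40                    # hypothetical intercept and EYFSP slope
eyfsp = rng.integers(0, 35, size=n)     # EYFSP total score, range 0-34
u = rng.normal(0, 2.0, size=n_schools)  # school-level random intercepts
e = rng.normal(0, 5.0, size=n)          # individual residual error
outcome = b0 + b1 * eyfsp + u[school] + e

# Fixed-effects approximation: school dummies absorb the u_j terms
X = np.column_stack([eyfsp, np.eye(n_schools)[school]])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
slope_hat = beta[0]                     # estimate of b1; close to the true 0.40
```

A full analysis would also include the confounder fixed effects and estimate the school variance component directly, but the sketch shows how the random-intercept term enters the data-generating process.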

Missing data methods.

We used Multiple Imputation using Chained Equations (MICE) to impute missing data on parent immigration status, socioeconomic position, and SEND (see Fig 1 for numbers of missing values), under the assumption that the missing values are missing at random (MAR). Briefly, data are MAR if the probability of the data being missing does not depend on the unobserved measurements/values, conditional on the observed data [70,71]. While the validity of this approach to analysis rests on assumptions about the nature of the missing data, and indeed the appropriateness of the imputation and substantive analysis models, we believe that these assumptions serve as reasonable approximations to reality in the present context, and are certainly more plausible than the assumptions underpinning the analysis that excludes the incomplete cases.

Fig 1. Flow chart of included study participants.

Fig 1

Every variable that was in the analysis model was included in the imputation model. Eligibility for free school meals (binary, no missing values) was also included in the imputation model. We used Stata’s ‘mi impute chained’ command to generate 25 imputed datasets for each research question. The results section presents the pooled results from the multiply imputed datasets (results from analyses based on complete cases were similar).
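The mechanics of chained-equations imputation under MAR can be illustrated with a deliberately simplified example: a single incomplete variable whose missingness depends only on an observed covariate. This sketch is not the study code — unlike Stata’s ‘mi impute chained’ it does not cycle over several incomplete variables, and proper multiple imputation would also draw the regression coefficients from their posterior — but it shows why MAR-based imputation reduces the bias of a complete-case summary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(0, 1, n)                  # fully observed covariate
y = 0.6 * x + rng.normal(0, 0.8, n)      # hypothetical incomplete variable

# MAR mechanism: y is more likely to be missing when x is low,
# so missingness depends only on observed data
miss = rng.random(n) < 0.5 * (x < 0)
y_obs = np.where(miss, np.nan, y)

def impute_once(x, y_obs, rng):
    """One chained-equations style draw: regress y on x among complete
    cases, then replace missing y with prediction plus residual noise."""
    ok = ~np.isnan(y_obs)
    X = np.column_stack([np.ones(ok.sum()), x[ok]])
    beta, *_ = np.linalg.lstsq(X, y_obs[ok], rcond=None)
    resid_sd = np.std(y_obs[ok] - X @ beta)
    y_imp = y_obs.copy()
    y_imp[~ok] = beta[0] + beta[1] * x[~ok] + rng.normal(0, resid_sd, (~ok).sum())
    return y_imp

# 25 imputed datasets (as in the analysis), pooling the means across them
imputed = [impute_once(x, y_obs, rng) for _ in range(25)]
pooled_mean = np.mean([d.mean() for d in imputed])
# pooled_mean is far less biased than the complete-case mean np.nanmean(y_obs)
```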

Robustness checks.

Model fit was compared between models run with (1) EYFSP modelled as a continuous variable via a single linear term and (2) EYFSP modelled as an unordered categorical variable via a series of dummy variables. Model fit assessed via AIC and BIC was marginally better with EYFSP as a continuous variable, and the continuous model is more parsimonious, so it was selected. A scatter plot of fitted versus residual values showed no evidence of heteroskedasticity.

Effect sizes.

Half of a standard deviation has been previously found to correspond to a minimum clinically important difference [72,73]. We therefore calculated half of a standard deviation in the outcomes, and compared these to our effect estimates. The outcomes, standard deviations, and effect sizes of interest are provided in Table 3.
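The half-standard-deviation calculation is simple arithmetic and can be reproduced directly from the outcome standard deviations in Table 3:

```python
# Half of each outcome's standard deviation as the minimum clinically
# important difference (MCID), reproducing Table 3
outcome_sds = {
    "Maths": 7.05,
    "Reading": 8.16,
    "Grammar/Punctuation/Spelling": 8.09,
    "SDQ": 6.26,
}
mcid = {name: sd / 2 for name, sd in outcome_sds.items()}
# e.g. Maths: 7.05 / 2 = 3.525, reported as 3.52 in Table 3
```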

Table 3. Standard deviations and effect sizes of interest for all outcomes.
Outcome Standard deviation Effect size of interest (unstandardised)
Maths 7.05 3.52
Reading 8.16 4.08
Grammar/Punctuation/Spelling 8.09 4.05
Behavioural, social, and emotional difficulties (SDQ) 6.26 3.13

Results

Participants

Fig 1 shows the total number of recruited BiB children (n = 13,858), and the numbers within each measurement and analyses set.

Descriptive information

Table 4 describes the sample for all children who had EYFSP data (n = 10,589). The mean EYFSP total score in this sample was 15.30 (SD = 8.07), and scores ranged from 0 to 34. The mean EYFSP score among children who achieved a GLD (n = 6,272, 59%) was 20.38 (SD = 4.96), and scores ranged from 12 to 34. The mean EYFSP score among children who did not achieve a GLD (n = 4,317, 41%) was 7.92 (SD = 5.67), and scores ranged from 0 to 27.

Table 4. Descriptive information on sample for all children with complete EYFSP data (n = 10,589).

Variable N (%)
Ethnicity
 White British 3,650 (34%)
 Pakistani 4,874 (46%)
 Other* 2,063 (19%)
 Missing 2 (<1%)
Socioeconomic Position **
 Most deprived 1,414 (13%)
 Benefits but coping 2,649 (25%)
 Employed no access to money 1,354 (13%)
 Employed not materially dep 1,730 (16%)
 Least deprived and most educated 1,529 (14%)
 Missing 1,913 (18%)
Parent immigration status
 Parent born inside UK 5,551 (52%)
 Parent born outside UK 3,168 (30%)
 Missing 1,870 (17%)
English as an Additional Language
 Yes 4,662 (44%)
 No 5,753 (54%)
 Missing 174 (2%)
Special Educational Needs
 No 8,345 (79%)
 Yes 2,132 (20%)
 Missing 112 (1%)
* The most populous ethnic groups within ‘Other’ were Indian (16% of the Other group), Bangladeshi (14%), Other Asian (14%), and Other White (14%).

** Socioeconomic groups are listed in Supporting Information File 2: Attachment A and in Fairley et al. (2014).

Fig 2 further demonstrates that there is considerable overlap in total scores between children who do and do not achieve a GLD. It also demonstrates that there is substantial variability in scores within children who do and do not achieve a GLD.

Fig 2. Kernel density distributions of EYFSP total score for those who do not achieve a GLD (in blue) and do achieve a GLD (in orange) (n = 10,589).

Fig 2

Item response theory analysis

Full analyses with the code, results, and additional sensitivity analyses are provided at https://osf.io/s6num/.

Structural validity: Rasch model parameters, model fit, and item fit.

The model fit values (RMSEA = 0.138, SRMSR = 0.162, CFI = 0.938) indicated poor fit to the overall Rasch model. The maximum likelihood estimates of the category threshold parameters were -3.585 and 3.473 and the maximum likelihood estimate of the variance of the latent ability was 9.532 (discrimination parameter constrained to be equal to 1). We next assessed the item parameters and the item fit values for the overall Rasch model.

Table 5 shows that the easiest item is ‘Moving and handling’ (goal 4), and hardest item is ‘Writing’ (goal 10). The item fit values show that Item 9 has the highest RMSEA value (although other items also have problems with misfit). The item infit/outfit values are provided at osf.io/s6num/ and generally indicated values within the acceptable range.

Table 5. Rating Scale Model parameters and item fit values.
Item | Difficulty | θ such that Pr(Emerging|θ) = Pr(Expected|θ) | θ such that Pr(Expected|θ) = Pr(Exceeding|θ) | RMSEA | P value
1. Communication and language: Listening and attention 0.000 −3.585 3.473 0.040 <.001
2. Communication and language: Understanding 0.037 −3.548 3.510 0.050 <.001
3. Communication and language: Speaking 0.446 −3.139 3.919 0.034 <.001
4. Physical development: Moving and handling −0.147 −3.732 3.326 0.038 <.001
5. Physical development: Health and self-care −0.050 −3.635 3.423 0.035 <.001
6. Personal, social and emotional: Self-confidence and self-awareness 0.125 −3.460 3.598 0.022 <.001
7. Personal, social and emotional: Managing feelings and behaviour 0.317 −3.268 3.790 0.025 <.001
8. Personal, social and emotional: Making relationships 0.146 −3.439 3.619 0.032 <.001
9. Literacy: Reading 1.206 −2.379 4.679 0.070 <.001
10. Literacy: Writing 1.975 −1.610 5.448 0.050 <.001
11. Mathematics: Numbers 1.431 −2.154 4.904 0.041 <.001
12. Mathematics: Shapes, space and measures 1.408 −2.177 4.881 0.032 <.001
13. Understanding the world: People and communities 1.224 −2.361 4.697 0.025 <.001
14. Understanding the world: The world 1.311 −2.274 4.784 0.022 <.001
15. Understanding the world: Technology 0.607 −2.978 4.080 0.063 <.001
16. Expressive arts and design: Exploring and using media and materials 0.869 −2.716 4.342 0.034 <.001
17. Expressive arts and design: Being imaginative 1.110 −2.475 4.583 0.030 <.001

Note: Column two shows the estimated item level difficulty, where higher values indicate greater difficulty. Columns three and four show the values obtained by adding the estimated category threshold parameters (-3.585 and 3.473) to the estimated difficulty parameters. These show the values of the latent ability θ such that the probabilities of a participant achieving adjacent responses are equal.
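The relationship described in the note can be verified numerically. Under the rating scale model with the discrimination fixed at 1, the category probabilities follow an adjacent-categories logit, so the equal-probability points in columns three and four fall at the item difficulty plus each estimated category threshold. A minimal sketch using the Table 5 estimates:

```python
import numpy as np

def category_probs(theta, difficulty, taus=(-3.585, 3.473)):
    """Category probabilities (Emerging/Expected/Exceeding) under the
    rating scale model with discrimination 1: the log-odds of scoring
    category k rather than k-1 is theta - (difficulty + tau_k)."""
    psi = np.cumsum([0.0] + [theta - (difficulty + t) for t in taus])
    p = np.exp(psi - psi.max())      # subtract max for numerical stability
    return p / p.sum()

# At theta = difficulty + tau_1, adjacent categories are equally likely.
# e.g. 'Writing' (difficulty 1.975): Pr(Emerging) = Pr(Expected) at
# theta = 1.975 + (-3.585) = -1.610, matching column three of Table 5.
p_low = category_probs(-1.610, 1.975)
p_high = category_probs(5.448, 1.975)   # column four: 1.975 + 3.473
```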

Local dependence.

The local dependency matrix is presented in osf.io/s6num/. The matrix identifies a local dependence issue between Items 2 & 3 (communication items) (residual = .44); and Item 9 & Item 10 (the literacy items) (residual = .48).

Test information function and test characteristic curve.

Fig 3 demonstrates that most information is provided at the lower and higher ends of ability (i.e., for children with latent abilities at least one standard deviation above or below the mean latent ability). It also shows that less information is provided for children with close to average abilities, shown by the dip in the curve around θ =  0. Fig 4 presents the scale characteristic curve, showing the relationship between the total summed score on the y axis and the overall latent ability (θ) on the x axis. The test shows good discrimination for children with latent abilities that are slightly to moderately higher or lower than average (i.e., θ ∈ [-5, -1]  ∪  [1, 5]), slightly less powerful discrimination for children with close to average abilities (shown by the flattening of the curve around θ =  0), and for children with very high or low abilities (shown by the flattening of the curve at the more extreme values of θ).
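The dip around θ = 0 follows directly from the fitted parameters: with discrimination fixed at 1, the Fisher information contributed by a polytomous Rasch item equals the variance of the item score at θ, which is largest near the category thresholds (roughly ±3.5 either side of the item difficulty) and small in between. A single-item sketch, using the estimated thresholds and a hypothetical item difficulty of 0:

```python
import numpy as np

def category_probs(theta, difficulty, taus=(-3.585, 3.473)):
    """Adjacent-categories (rating scale model) probabilities,
    discrimination fixed at 1, using the fitted thresholds."""
    psi = np.cumsum([0.0] + [theta - (difficulty + t) for t in taus])
    p = np.exp(psi - psi.max())
    return p / p.sum()

def item_information(theta, difficulty):
    """With discrimination 1, item information = variance of the
    item score (0, 1, or 2) at the given ability theta."""
    p = category_probs(theta, difficulty)
    scores = np.arange(len(p))
    return p @ scores**2 - (p @ scores) ** 2

info_mid = item_information(0.0, 0.0)        # near-average ability
info_threshold = item_information(3.473, 0.0)  # at the upper threshold
# info_mid is much smaller than info_threshold: the between-thresholds
# region contributes little information, mirroring the dip in Fig 3
```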

Fig 3. Test Information Curve for θ∈ [-10, 10].

Fig 3

Fig 4. Scale characteristic curves representing total score (0-34) across ability (θ∈ [-10, 10]) based on the fitted model.

Fig 4

Internal consistency.

The CFA indicated high factor loadings (all > .8) onto one construct, and a parallel analysis indicated that a one factor model was a reasonable representation of the data [74]. We assessed internal consistency using McDonald’s hierarchical omega, finding a point estimate of 0.89.
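For a one-factor model, omega can be computed from the standardised loadings and residual variances. The sketch below uses hypothetical uniform loadings consistent with “all > .8” (the actual estimates are in the OSF materials), so its value illustrates the formula rather than reproducing the reported hierarchical omega of 0.89:

```python
import numpy as np

# Omega for a one-factor model:
# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)
loadings = np.full(17, 0.85)        # 17 EYFSP items; assumed uniform loadings
residuals = 1 - loadings**2         # residual variance per standardised item
omega = loadings.sum() ** 2 / (loadings.sum() ** 2 + residuals.sum())
```

With varying loadings and the hierarchical variant used in the study, the estimate is lower; the formula above shows why uniformly high loadings drive internal consistency up.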

Predictive validity analysis

Academic attainment outcome.

The mean scores and standard deviations for the Key Stage 2 outcomes were Maths =  105.08 (7.05), Reading =  103.67 (8.16) and Grammar/Punctuation/Spelling (GPS) =  106.64 (8.09). For full regression results and the key analyses code, please see Technical Appendix File 2 (Attachment C). All models indicated that a higher EYFSP total score was associated with a higher Key Stage 2 outcome. Key results are described and displayed below.

For the Maths outcome (n = 2711), the model explained a significant amount of the variance (unadjusted R2 = .33; F(11,23939.7) =  124.65, p < .001). Higher EYFSP total scores were associated with higher Maths scores (B = 0.356 [0.322 to 0.390], p < .001).

For the Reading outcome (n = 2711), the model explained a significant amount of the variance (unadjusted R2 = .31; F(11,22414.3) =  108.91, p < .001). Higher EYFSP total scores were associated with higher Reading scores (B = 0.424 [0.384 to 0.464], p < .001).

For the GPS outcome (n = 2711), the model explained a significant amount of the variance (unadjusted R2 = .37; F(11,18477.5) =  146.05, p < .001). Higher EYFSP total scores were associated with higher GPS scores (B = 0.427 [0.390 to 0.464], p < .001).

Fig 5 displays the association between a difference in EYFSP goals (ranging from 1 to 10) and the estimated change in the different academic outcomes. For instance, an increment of 1 EYFSP total score point results in a change of between 0.36 and 0.42 in the outcomes, and an increment of 10 results in a change of between 3.56 and 4.24. To reach a minimum clinically important difference (shaded area in Fig 5), the difference in EYFSP total score required is approximately 8 for Reading and Grammar/Punctuation/Spelling, and 10 for Maths.
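The order of magnitude of these required differences can be approximated from the reported point estimates alone (the figure-based values in the paper also reflect the confidence intervals, so they differ slightly):

```python
# Back-of-envelope: difference in EYFSP total score needed to reach the
# half-SD MCID for each Key Stage 2 outcome, from the reported slopes
slopes = {"Maths": 0.356, "Reading": 0.424, "GPS": 0.427}   # B coefficients
mcids = {"Maths": 3.52, "Reading": 4.08, "GPS": 4.05}       # half-SD (Table 3)
required = {k: mcids[k] / slopes[k] for k in slopes}
# All fall within the 8-10 point range described in the Discussion
```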

Fig 5. Increase in ‘EYFSP total score’ and ‘EYFSP-CS score’ associated with change in Academic Outcomes (Maths, Reading, Grammar/Punctuation/Spelling).

Fig 5

Behavioural, social, and emotional difficulties outcome.

The mean EYFSP-CS score was 5.78 (SD = 3.16). The mean SDQ score was 7.31 (SD = 6.26). For full regression results, please see Technical Appendix File 2, Attachment D. Key results are described and displayed below. Note that a higher score on SDQ indicates more socioemotional difficulties.

When we included the EYFSP total score as the predictor (n = 984), the model explained a significant amount of the variance (unadjusted R2 = .25, p < .001; F(11,66324.9) =  27.04). Higher EYFSP total scores were associated with lower SDQ total difficulties scores (B = -0.20 [-0.26 to -0.15], p < .001).

When we included only the EYFSP-CS predictor (n = 984), the model explained a significant amount of the variance (unadjusted R2 = .25, p < .001; F(11,67390.5) =  26.72). The EYFSP-CS scores showed a stronger negative association with SDQ total difficulties (B = -0.48 [-0.61 to -0.37], p < .001).

Fig 6 displays the association between an increase in EYFSP goals (ranging from 1 to 10) and the estimated change in the socioemotional wellbeing measure (SDQ). For instance, a change of 1 in the EYFSP total score results in a change of -0.20 in SDQ, and a change of 1 in EYFSP-CS results in a change of -0.48 in SDQ. A change of 6 in the EYFSP total score results in a change of -1.22 in SDQ, whereas a change of 6 in EYFSP-CS results in a change of -2.90 in SDQ (with the confidence interval crossing the clinically important difference).

Fig 6. Increase in ‘EYFSP total score’ and ‘EYFSP-CS score’ associated with change in SDQ total difficulties score.

Fig 6

Note: estimates were produced using Stata user written command xlincom. The shaded area represents an estimated minimum clinically important difference.

Discussion

The first aim of this study was to investigate the internal consistency of the EYFSP items [44,47]. The EYFSP items demonstrated high internal consistency, with results indicating that the items primarily measure one unidimensional construct. We tentatively suggest that the measured construct is compatible with the definition of children’s ‘developmental health’. The construct of developmental health encompasses a holistic understanding of children’s physical, mental, social, and emotional wellbeing, combined with core educational abilities such as mathematics and literacy [1]. This reflects the EYFSP’s original purpose to operate as a research-based framework of children’s learning and development [21,75].

The second aim was to investigate whether the EYFSP demonstrated structural validity. The IRT analyses indicated a poor fit to the polytomous Rasch model. However, the test information and scale characteristic curves show that the total score provides substantial information across a wide range of underlying ability, with some loss of precision for abilities very close to average. This indicates that whilst the test provides information across a wide range of ability levels, it provides relatively less information for children with ‘average’ latent abilities (e.g., the 40% of children between roughly the 35th and 75th percentiles). This means that two children with equal scores of, for example, 16, may have different ability levels in reality (e.g., one could have slightly below average ability and one slightly above), but the EYFSP total score is not able to precisely discriminate between them. It should also be noted that a Rasch model is extremely restrictive, as it requires all items to be equally discriminating, which is very rarely the case in measurements of person ability [54].

Although IRT has been used to examine measures of, for example, emotional dysregulation [76] and neuropsychological capabilities [77] of young children, we have not been able to locate any previous studies using IRT on a comparable measure of broad child developmental health. The closest study is an investigation of the ‘Denver Developmental Screening Test (DDST)’, though the aim of this study was to apply IRT to develop a scoring method to estimate ‘ability age’ of individual children using the DDST [78]. Hence, IRT has been underused in the context of examining the structural validity of child developmental health measures. Our study is therefore first to investigate the EYFSP’s internal consistency and structural validity using IRT, and one of the first to apply this method to early child developmental health.

Nonetheless, the internal consistency and structural validity of comparable child developmental health measures, namely the EDI and TS GOLD, have been investigated using other psychometric analysis methods. The EYFSP has now demonstrated adequate internal consistency similar to the EDI [17], and demonstrates model fit similar to both the EDI and TS GOLD, which have also shown poor model fit [17,19]. Model fit refers to the ability of a model to reproduce the data, with poor model fit indicating that relationships between variables may be incorrectly specified in the applied statistical model [57]. Given that the EYFSP, EDI, and TS GOLD have all demonstrated poor model fit, this may indicate a general challenge with using teacher-reported measures of overall child developmental health. As this is such a broad construct, it may be challenging to assess in one holistic measure. Indeed, a systematic review of parent-reported measures of child social and emotional wellbeing/behaviour found such measures to have structural validity [79].

The poor model fit may be explained by misspecification of individual items. Item misfit can arise due to multidimensionality (where the item relates to a separate latent trait) and/or poor item quality [58]. The items with the worst fit were two of the ‘Literacy’ items and one of the ‘Understanding the World’ items. This may indicate that these items do not measure the latent trait of children’s developmental health, and that their removal may improve the model fit. However, Literacy is one of the core areas of the EYFSP and is crucial for teachers to be able to assess. Hence, these items could instead be replaced with similarly worded items that reflect the same areas but fit the model better.

The poor model fit may also be due to the less precise estimates of ability evident for children with ‘average’ ability, which may relate to the varying administration of the measure in educational settings [40,41]. The administration is not standardised or moderated, and therefore susceptible to considerable variation. Additionally, the procedures and requirements of the EYFSP may not lend themselves to identification of more nuanced differences in ability for children with generally average levels of development. The high number of children meeting expected levels of development in all 17 goals is potentially indicative of this issue. More guidance for teachers on how to identify differences in children’s abilities, as well as more robust procedures for moderating scores, could potentially address the apparent issues with reduced precision for children with close to average abilities, and increase the information provided by the measure.

Evidence from comparable measures of child developmental health comes from numerous psychometric studies conducted in several different countries over several years [17,19]. Hence, to be able to thoroughly compare the EYFSP to these similar measures, more research on the measurement properties of the EYFSP is needed (see implications and future directions section).

Our third aim was to investigate the predictive validity of the EYFSP total score for academic outcomes. We found that the EYFSP total score strongly and consistently predicts academic outcomes at ages 10-11 in Maths, Reading, and Grammar, Punctuation and Spelling assessments. It has been previously found that the EYFSP GLD is predictive of children’s academic outcomes at ages 6-7 during Key Stage 1 [34], and the present study extends this finding to the EYFSP total score, and to Key Stage 2 assessments at ages 10-11 years. To reach an important change in academic outcomes (considered to be half the standard deviation of the observed Key Stage 2 scores), a difference in EYFSP total score of 8-10 points was required (dependent on the outcome). This information will be useful for researchers who wish to use the EYFSP total score as an outcome for intervention studies. For instance, a difference in EYFSP total score of 8 was required to reach an important difference for the Reading outcome, and this could be used as a benchmark for future educational interventions which aim to improve children’s reading abilities. The estimates reported in Fig 5 could also be used to identify differences in the EYFSP total score that translate to smaller differences in these outcomes, which may serve as more realistic target differences for future intervention studies.

Our fourth and fifth aims were to explore the predictive validity of the EYFSP total score and the EYFSP-CS subscales for children’s behavioural, social, and emotional difficulties at ages 7-10. The relevant EYFSP-CS subscales had a much stronger association with behavioural, social, and emotional difficulties than the EYFSP total score. A difference of 6 points for the EYFSP-CS score was associated with important differences in behavioural, social, and emotional difficulties, whereas no changes in the EYFSP total score were associated with important differences. Again, a difference of 6 (for the EYFSP-CS score) could be used as a benchmark for future interventions which aim to improve children’s behavioural, social, and emotional abilities (or translated for a more realistic target difference). Researchers can more confidently use the communication and social subscales to measure behavioural, social, and emotional difficulties.

There is only one other study that has reported predictive validity of this version of the EYFSP subscales; it found that EYFSP scores relating to literacy and physical development also predicted children’s behavioral, social, and emotional difficulties [27]. In comparison to other measures of child developmental health, the EDI has demonstrated variable predictive validity between the language and cognitive development domain scores and the Peabody Picture Vocabulary Test [17]. The TS GOLD has been found to be associated with children’s assessments throughout the school year [80], and has been found to have variable concurrent validity with the Bracken School Readiness Scale [19]. The EYFSP therefore has demonstrated adequate predictive validity in comparison to other measures of child developmental health, however, there are substantially fewer studies regarding the EYFSP.

Implications and future directions

Although this study has highlighted some limitations of using the EYFSP, we do not suggest the use of a different measure of child developmental health over the EYFSP. Whilst other measures of child developmental health have undergone further psychometric analyses, they too demonstrate some variable psychometric properties (e.g., variable model fit values have been demonstrated in both the EDI [17] and TS GOLD [19]). Replacement of the EYFSP would be a significant overhaul to current educational practice and should be avoided if possible. Hence, once the measurement properties of the EYFSP are investigated as described below, the EYFSP could be used with more confidence than it currently is. Alternatively, if the measurement properties of the EYFSP are found to be significantly lacking, a programme of development should be undertaken to develop it further, or to replace it with an already validated instrument of child developmental health.

However, the findings from this study do support future use of the EYFSP total score over the EYFSP GLD score for research and educational purposes. Although both the GLD and now the total score have been shown to predict future outcomes [34,35], there is substantial variation in total scores within children who do and do not achieve a GLD (see Fig 2). The GLD therefore does not capture the variation in children’s developmental health that the EYFSP total score does, and it has no other evidence regarding its measurement properties. We therefore recommend that researchers use the EYFSP total score instead of the GLD, if it suits their study purpose. For teachers, the GLD is a useful metric for identification of children who may later be diagnosed with special educational needs [35]; however, teachers may wish to also examine a child’s EYFSP total score to gain a more nuanced understanding of a pupil’s development. It is important to note that the EYFSP total score should be used with some caution when making inferences about ‘average’ ability children (those with total scores between approximately 15 and 18).

There is still much to be learnt about the measurement properties of the EYFSP. It would be beneficial to directly compare the measurement properties of the EYFSP total score to the GLD in a future study, explicitly examining whether more valid and accurate conclusions can be made about child developmental health using the EYFSP total score than can be made using the GLD. There are also other measurement properties which could be tested, including the content validity (the degree to which the EYFSP reflects children’s developmental health as a construct) and criterion validity (the degree to which the EYFSP items and summaries derived thereof are an adequate reflection of a gold standard). This would require collection of an additional measure of child development at the same time as the EYFSP, perhaps the EDI, since this is implemented in a comparable way at population level and has undergone substantial development since its inception [15,81]. In terms of predictive validity, we explored associations between the EYFSP-CS and the SDQ, as the Born in Bradford sample facilitated this analysis with a sufficient sample size, providing an ideal test case. However, more research is needed to explore whether each specific goal area is associated with other measurements (e.g., literacy with reading assessments, physical activity with motor skill measurements).

Finally, it will be important to test the measurement invariance of the EYFSP over socio-demographic subgroups. Given the possibility that teacher-based assessment may systematically underassess minority groups [82], we suggest a future study should explore the validity of the measure across various socioeconomic and ethnic groups.

Limitations

This study specifically considered data from the second version of the EYFSP, which was administered from 2012 to 2021. We tentatively suggest that the findings regarding internal consistency and predictive validity will generalise to the revised version of the EYFSP, as the items (which contain similar wording and measure the same areas, see Technical Appendix File 1) should remain internally consistent with one another and predictive of future outcomes, even with less variation in the data. However, as data from the revised EYFSP become widely available, future research will need to test whether the findings in this study generalise to the current version. This is particularly important as the structural validity may be significantly affected by the changes to the content of the ELGs, along with the removal of one of the response categories (the ‘Exceeding’ category). Removal of this category may result in a ‘ceiling’ effect for populations, with most children scoring ‘Expected’ on all categories. While this change may be acceptable for educational purposes, it may have a negative impact on the usefulness of the EYFSP total score for research purposes.

This study included only children in the Born in Bradford cohort, and therefore may only be relevant for comparable populations with high levels of deprivation and a diverse ethnic population. Whilst the ethnic diversity of this sample improves the generalisability of the findings to ethnically diverse populations, we did not explore the measurement properties of the EYFSP within specific ethnic groups.

Conclusions

While the EYFSP has been utilized as a measure of children’s early developmental health, this was not its intended purpose. Despite this, the present study has revealed that whilst caution should be applied for measurement of children with close to ‘average’ ability levels, the EYFSP total score is an internally consistent measure with predictive validity. The EYFSP total score also provides better information for children with very high and very low abilities. Given that the EYFSP was not developed as a robust measurement tool, the EYFSP total score appears to be a reasonable measure of child developmental health for routine use in England and Wales.

Supporting information

S1 File. Overview of the Early Learning Goals (ELGs) in the old and new versions of the Early Years Foundation Stage Profile (EYFSP).

(DOCX)

S2 File. Predictive Validity Analysis.

(DOCX)


Acknowledgments

Born in Bradford is only possible because of the enthusiasm and commitment of the children and parents in BiB. We are grateful to all the participants, health professionals, schools and researchers who have made Born in Bradford happen.

Data Availability

Data cannot be shared publicly as they are available through a system of managed open access. Researchers interested in accessing the data can find details and procedures on the Born in Bradford website (https://borninbradford.nhs.uk/research/how-to-access-data/). Data access is subject to review by the Born in Bradford Executive, who review proposals on a monthly basis. Requests can be submitted to borninbradford@bthft.nhs.uk. Data sharing agreements are established between the researcher and the Bradford Institute for Health Research. The full analysis code for the IRT analyses is publicly available at https://osf.io/s6num/.

Funding Statement

This work was supported by the National Lottery Community Fund (previously the Big Lottery Fund) as part of the A Better Start programme (Ref 10094849). The funder was not involved in the design of the study and collection, analysis, and interpretation of data nor in writing the manuscript. SLB is supported by the NIHR Yorkshire and Humber Applied Research Collaboration (ARC-YH; Ref: NIHR200166, see https://www.arc-yh.nihr.ac.uk,). The views in this publication are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. LHE was supported by an ESRC Postdoctoral Fellowship (ES/X006050/1).

References

  • 1. Keating DP, Hertzman C. Developmental Health and the Wealth of Nations: Social, Biological, and Educational Dynamics. New York, NY: The Guilford Press; 1999.
  • 2. Romano E, Babchishin L, Pagani LS, Kohen D. School readiness and later achievement: replication and extension using a nationwide Canadian survey. Dev Psychol. 2010;46(5):995–1007. doi: 10.1037/a0018880
  • 3. Murrah W. Comparing self-regulatory and early academic skills as predictors of later math, reading, and science elementary school achievement. n.d.
  • 4. Raikes HA. Measuring of child development and learning. Background paper prepared for the 2016 Global Education Monitoring Report, Education for people and planet: creating sustainable futures for all. 2016.
  • 5. Silles MA. The causal effect of education on health: Evidence from the United Kingdom. Economics of Education Review. 2009;28(1):122–8. doi: 10.1016/j.econedurev.2008.02.003
  • 6. Amin V, Behrman JR, Spector TD. Does More Schooling Improve Health Outcomes and Health Related Behaviors? Evidence from U.K. Twins. Econ Educ Rev. 2013;35. doi: 10.1016/j.econedurev.2013.04.004
  • 7. Claessens A, Duncan G, Engel M. Kindergarten skills and fifth-grade achievement: Evidence from the ECLS-K. Economics of Education Review. 2009;28(4):415–27. doi: 10.1016/j.econedurev.2008.09.003
  • 8. Sirin SR. Socioeconomic Status and Academic Achievement: A Meta-Analytic Review of Research. Review of Educational Research. 2005;75(3):417–53. doi: 10.3102/00346543075003417
  • 9. Marmot M, Allen J, Goldblatt P, Boyce T, McNeish D, Grady M, et al. The Marmot Review: Fair society, healthy lives. The Strategic Review of Health Inequalities in England Post-2010. 2010.
  • 10. Measuring in support of early childhood development. Paediatrics and Child Health. 2011;16(10):655. doi: 10.1093/pch/16.10.655
  • 11. Sylva K, Melhuish E, Sammons P, Siraj-Blatchford I, Taggart B. The Effective Provision of Pre-School Education (EPPE) Project: Findings from Pre-school to end of Key Stage 1. 2004 [cited 3 Oct 2023]. Available: www.ioe.ac.uk/projects/eppe
  • 12. Sincovich A, Gregory T, Zanon C, Santos DD, Lynch J, Brinkman SA. Measuring early child development in low and middle income countries: Investigating the validity of the early Human Capability Index. SSM Popul Health. 2020;11:100613. doi: 10.1016/j.ssmph.2020.100613
  • 13. Richter L, Black M, Britto P, Daelmans B, Desmond C, Devercelli A, et al. Early childhood development: an imperative for action and measurement at scale. BMJ Glob Health. 2019;4(Suppl 4):e001302. doi: 10.1136/bmjgh-2018-001302
  • 14. Janus M, Harrison LJ, Goldfeld S, Guhn M, Brinkman S. International research utilizing the Early Development Instrument (EDI) as a measure of early child development: Introduction to the Special Issue. Early Childhood Research Quarterly. 2016;35:1–5. doi: 10.1016/j.ecresq.2015.12.007
  • 15. Janus M, Reid-Westoby C, Raiter N, Forer B, Guhn M. Population-Level Data on Child Development at School Entry Reflecting Social Determinants of Health: A Narrative Review of Studies Using the Early Development Instrument. Int J Environ Res Public Health. 2021;18(7):3397. doi: 10.3390/ijerph18073397
  • 16. Guhn M, Goelman H. Bioecological theory, early child development and the validation of the population-level early development instrument. Social Indicators Research. 2011;103:193–217. doi: 10.1007/s11205-011-9842-5
  • 17. Janus M, Brinkman S, Duku E. Validity and psychometric properties of the early development instrument in Canada, Australia, United States, and Jamaica. Social Indicators Research. 2011;103:283–97. doi: 10.1007/s11205-011-9846-1
  • 18. Vitiello VE, Williford AP. Alignment of teacher ratings and child direct assessments in preschool: A closer look at Teaching Strategies GOLD. Early Childhood Research Quarterly. 2021;56:114–23. doi: 10.1016/j.ecresq.2021.03.004
  • 19. Lambert RG, Kim D-H, Burts DC. The measurement properties of the Teaching Strategies GOLD® assessment system. Early Childhood Research Quarterly. 2015;33:49–63. doi: 10.1016/j.ecresq.2015.05.004
  • 20. Rimfeld K, Malanchini M, Hannigan LJ, Dale PS, Allen R, Hart SA, et al. Teacher assessments during compulsory education are as reliable, stable and heritable as standardized test scores. J Child Psychol Psychiatry. 2019;60(12):1278–88. doi: 10.1111/jcpp.13070
  • 21. Tickell C. The Early Years: Foundations for life, health and learning. An Independent Report on the Early Years Foundation Stage. 2012.
  • 22. Department for Education. Statutory framework for the early years foundation stage: setting the standards for learning, development and care for children from birth to five. 2021.
  • 23. GOV.UK. Early Years Foundation Stage Profile 2023 handbook. 2022.
  • 24. GOV.UK. Early Years Foundation Stage Profile: 2020 handbook. 2019 [cited 10 Jan 2024]. Available: https://dera.ioe.ac.uk/id/eprint/34802/
  • 25. GOV.UK. Changes to the early years foundation stage (EYFS) framework. 2021 [cited 17 Nov 2021]. Available: https://www.gov.uk/government/publications/changes-to-the-early-years-foundation-stage-eyfs-framework/changes-to-the-early-years-foundation-stage-eyfs-framework
  • 26. Robling M, Cannings-John R, Lugg-Widger F. Using multiple routine data sources linked to a trial cohort to establish the longer-term effectiveness of specialist home visiting in England: main results of the BB:2-6 study of the Family Nurse Partnership. Int J Popul Data Sci. 2022 [cited 6 Nov 2023]. Available: https://ijpds.org/article/view/1829/3533
  • 27. Kirby N, Wright B, Allgar V. Child mental health and resilience in the context of socioeconomic disadvantage: results from the Born in Bradford cohort study. Eur Child Adolesc Psychiatry. 2020;29(4):467–77. doi: 10.1007/s00787-019-01348-y
  • 28. Mooney KE, Bywater T, Hinde S, Richardson G, Wright J, Dickerson J, et al. A quasi-experimental effectiveness evaluation of the “Incredible Years Toddler” parenting programme on children’s development aged 5: A study protocol. PLoS One. 2023;18(9):e0291557. doi: 10.1371/journal.pone.0291557
  • 29. GOV.UK. Early years foundation stage profile results: 2018 to 2019. 2019 [cited 17 Nov 2022]. Available: https://www.gov.uk/government/statistics/early-years-foundation-stage-profile-results-2018-to-2019
  • 30. Tracey L, Bowyer-Crane C, Bonetti S, Nielsen D, D’Apice K, Compton S. The Impact of the COVID-19 Pandemic on Children’s Socio-Emotional Wellbeing and Attainment during the Reception Year. Research Report. Education Endowment Foundation. 2022 [cited 28 Mar 2023]. Available: www.educationendowmentfoundation.org.uk
  • 31. Campbell T. Relative age and the Early Years Foundation Stage Profile: How do birth month and peer group age composition determine attribution of a ‘Good Level of Development’—and what does this tell us about how ‘good’ the Early Years Foundation Stage Profile is? British Educational Res J. 2021;48(2):371–401. doi: 10.1002/berj.3771
  • 32. Pettinger KJ, Kelly B, Sheldon TA, Mon-Williams M, Wright J, Hill LJB. Starting school: educational development as a function of age of entry and prematurity. Arch Dis Child. 2020;105(2):160–5. doi: 10.1136/archdischild-2019-317124
  • 33. Campbell T. Special Educational Needs and Disabilities within the English primary school system: What can disproportionalities by season of birth contribute to understanding processes behind attributions and (lack of) provisions? CASE Papers. 2021 [cited 29 Nov 2022]. Available: https://ideas.repec.org/p/cep/sticas/-223.html
  • 34. Atkinson AL, Hill L, Pettinger K, Wright J, Hart A, Dickerson J, et al. Can holistic school readiness evaluations predict academic achievement and special educational needs status? Evidence from the Early Years Foundation Stage Profile. 2021. doi: 10.31234/osf.io/496xt
  • 35. Wood ML, Gunning L, Relins S, Sohal K, Wright J, Mon-Williams M, et al. Potential for England’s statutory school entry assessment to identify special educational needs and reveal structural inequalities: a population-based study. Arch Dis Child. 2023;109(1):52–7. doi: 10.1136/archdischild-2023-325590
  • 36. Altman DG, Royston P. The cost of dichotomising continuous variables. BMJ. 2006;332(7549):1080. doi: 10.1136/bmj.332.7549.1080
  • 37. Cohen J. The Cost of Dichotomization. Applied Psychological Measurement. 1983;7(3):249–53.
  • 38. Hill LJ, Shire KA, Allen RJ, Crossley K, Wood ML, Mason D, et al. Large-scale assessment of 7-11-year-olds’ cognitive and sensorimotor function within the Born in Bradford longitudinal birth cohort study. Wellcome Open Res. 2021;6:53. doi: 10.12688/wellcomeopenres.16429.1
  • 39. Duncan RJ, Duncan GJ, Stanley L, Aguilar E, Halfon N. The Kindergarten Early Development Instrument Predicts Third Grade Academic Proficiency. Early Child Res Q. 2020;53:287–300. doi: 10.1016/j.ecresq.2020.05.009
  • 40. Bonetti S, Blanden J. Early years workforce qualifications and children’s outcomes: An analysis using administrative data. Education Policy Institute/Nuffield Foundation. 2020. Available: www.nuffieldfoundation.org
  • 41. Teager W, McBride T. An initial assessment of the 2-year-old free childcare entitlement: Drivers of take-up and impact on early years outcomes. Early Intervention Foundation. 2018. Available: www.EIF.org.uk
  • 42. Snowling MJ, Lindsay G, Stothard SE, Bailey AM, Hulme C. Better Communication Research Programme: Language and Literacy Attainment of Pupils During Early Years and Through KS2: Does Teacher Assessment at Five Provide a Valid Measure of Children’s Current and Future Educational Attainments? Research Brief DFE-RB172a. 2011.
  • 43. Wright B, Mon-Williams M, Kelly B, Williams S, Sims D, Mushtaq F, et al. Investigating the association between early years foundation stage profile scores and subsequent diagnosis of an autism spectrum disorder: a retrospective study of linked healthcare and education data. BMJ Paediatr Open. 2019;3(1):e000483. doi: 10.1136/bmjpo-2019-000483
  • 44. Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–45. doi: 10.1016/j.jclinepi.2010.02.006
  • 45. Bourke L, Adams A-M. Is it differences in language skills and working memory that account for girls being better at writing than boys? Journal of Writing Research. 2012;3(3):249–77. doi: 10.17239/jowr-2012.03.03.5
  • 46. Reise SP, Waller NG. Item response theory and clinical measurement. Annu Rev Clin Psychol. 2009;5:27–48. doi: 10.1146/annurev.clinpsy.032408.153553
  • 47. Tang W, Cui Y, Babenko O. Internal Consistency: Do We Really Know What It Is and How to Assess It? Journal of Psychology and Behavioral Science. 2014;2:205–220. Available: https://www.researchgate.net/publication/280839401
  • 48. Raynor P, Born in Bradford Collaborative Group. Born in Bradford, a cohort study of babies born in Bradford, and their parents: protocol for the recruitment phase. BMC Public Health. 2008;8:327. doi: 10.1186/1471-2458-8-327
  • 49. Wright J, Small N, Raynor P, Tuffnell D, Bhopal R, Cameron N, et al. Cohort Profile: the Born in Bradford multi-ethnic family cohort study. Int J Epidemiol. 2013;42(4):978–91. doi: 10.1093/ije/dys112
  • 50. Chalmers P. mirt: A Multidimensional Item Response Theory Package for the R Environment. 2012 [cited 5 Jan 2023]. Available: https://www.jstatsoft.org/article/view/v048i06
  • 51. Hao H. Intro to Item Response Modeling in R: A Tutorial on the MIRT Package. 2022 [cited 5 Jan 2023]. Available: https://hanhao23.github.io/project/irttutorial/irt-tutorial-in-r-with-mirt-package/
  • 52. Masur P. masurp/ggmirt: Plotting functions to extend “mirt” for IRT analyses, version 0.1.0, from GitHub. 2022 [cited 10 Jul 2023]. Available: https://rdrr.io/github/masurp/ggmirt/
  • 53. Andrich D. A rating formulation for ordered response categories. Psychometrika. 1978;43(4):561–73.
  • 54. Stemler SE, Naples A. Rasch Measurement v. Item Response Theory: Knowing When to Cross the Line. Practical Assessment, Research and Evaluation. 2021;26:1–16. doi: 10.7275/v2gd-4441
  • 55. An X, Yung Y-F. Item Response Theory: What It Is and How You Can Use the IRT Procedure to Apply It. 2014.
  • 56. Tennant A, Pallant J. The Root Mean Square Error of Approximation (RMSEA) as a supplementary statistic to determine fit to the Rasch model with large sample sizes. 2012 [cited 15 Feb 2023]. Available: https://www.rasch.org/rmt/rmt254d.htm
  • 57. Hooper D, Coughlan J, Mullen M. Evaluating model fit: a synthesis of the structural equation modelling literature. In: 7th European Conference on Research Methodology for Business and Management Studies. 2008. pp. 195–200.
  • 58. Kean J, Brodke DS, Biber J, Gross P. An introduction to Item Response Theory and Rasch Analysis of the Eating Assessment Tool (EAT-10). Brain Impair. 2018;19(Spec Iss 1):91–102. doi: 10.1017/BrImp.2017.31
  • 59. Linacre JM. What do Infit and Outfit, Mean-square and Standardized mean? In: Rasch Measurement Transactions. 2002. p. 878.
  • 60. Chen W-H, Thissen D. Local Dependence Indexes for Item Pairs Using Item Response Theory. Journal of Educational and Behavioral Statistics. 1997;22(3):265–89. doi: 10.3102/10769986022003265
  • 61. Baker F. The Basics of Item Response Theory. Chapter 4: The Test Characteristic Curve. 2001.
  • 62. Parsons S, Platt L. The early academic progress of children with special educational needs. British Educational Res J. 2017;43(3):466–85. doi: 10.1002/berj.3276
  • 63. GOV.UK. Understanding scaled scores at key stage 2. 2019 [cited 23 Feb 2023]. Available: https://www.gov.uk/guidance/understanding-scaled-scores-at-key-stage-2#what-is-a-scaled-score
  • 64. Goodman R. The Strengths and Difficulties Questionnaire: a research note. J Child Psychol Psychiatry. 1997;38(5):581–6. doi: 10.1111/j.1469-7610.1997.tb01545.x
  • 65. Whiteside KE, Gooch D, Norbury CF. English Language Proficiency and Early School Attainment Among Children Learning English as an Additional Language. Child Dev. 2017;88(3):812–27. doi: 10.1111/cdev.12615
  • 66. Parsons C. Ethnicity, gender, deprivation and low educational attainment in England: Political arithmetic, ideological stances and the deficient society. Education, Citizenship and Social Justice. 2016;11(2):160–83. doi: 10.1177/1746197916648282
  • 67. Schnepf SV. Immigrants’ educational disadvantage: an examination across ten countries and three surveys. J Popul Econ. 2006;20(3):527–45. doi: 10.1007/s00148-006-0102-y
  • 68. Education Endowment Foundation. The Attainment Gap 2017. Education Endowment Foundation. 2018:1–18. Available: https://educationendowmentfoundation.org.uk/public/files/Annual_Reports/EEF_Attainment_Gap_Report_2018.pdf
  • 69. Fairley L, Cabieses B, Small N, Petherick ES, Lawlor DA, Pickett KE, et al. Using latent class analysis to develop a model of the relationship between socioeconomic position and ethnicity: cross-sectional analyses from a multi-ethnic birth cohort study. BMC Public Health. 2014;14:835. doi: 10.1186/1471-2458-14-835
  • 70. Rubin DB. Multiple Imputation after 18+ Years. Journal of the American Statistical Association. 1996;91(434):473–89. doi: 10.1080/01621459.1996.10476908
  • 71. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med. 2011;30(4):377–99. doi: 10.1002/sim.4067
  • 72. David HA, Hartley HO, Pearson ES. The Distribution of the Ratio, in a Single Normal Sample, of Range to Standard Deviation. Biometrika. 1954;41(3/4):482. doi: 10.2307/2332728
  • 73. Copay AG, Subach BR, Glassman SD, Polly DW Jr, Schuler TC. Understanding the minimum clinically important difference: a review of concepts and methods. Spine J. 2007;7(5):541–6. doi: 10.1016/j.spinee.2007.01.008
  • 74. Williams B, Onsman A, Brown T. Exploratory Factor Analysis: A Five-Step Guide for Novices. Australasian Journal of Paramedicine. 2010;8(3):1–13. doi: 10.33151/ajp.8.3.93
  • 75. GOV.UK. New early years framework published. 2012 [cited 3 Oct 2023]. Available: https://www.gov.uk/government/news/new-early-years-framework-published
  • 76. Day TN, Mazefsky CA, Yu L, Zeglen KN, Neece CL, Pilkonis PA. The Emotion Dysregulation Inventory-Young Child: Psychometric Properties and Item Response Theory Calibration in 2- to 5-Year-Olds. J Am Acad Child Adolesc Psychiatry. 2024;63(1):52–64. doi: 10.1016/j.jaac.2023.04.021
  • 77. Martins PSR, Barbosa-Pereira D, Valgas-Costa M, Mansur-Alves M. Item analysis of the Child Neuropsychological Assessment Test (TENI): Classical test theory and item response theory. Appl Neuropsychol Child. 2022;11(3):339–49. doi: 10.1080/21622965.2020.1846128
  • 78. Drachler M de L, Marshall T, de Carvalho Leite JC. A continuous-scale measure of child development for population-based epidemiological surveys: a preliminary study using Item Response Theory for the Denver Test. Paediatr Perinat Epidemiol. 2007;21(2):138–53. doi: 10.1111/j.1365-3016.2007.00787.x
  • 79. Gridley N, Blower S, Dunn A, Bywater T, Bryant M. Psychometric Properties of Child (0–5 Years) Outcome Measures as used in Randomized Controlled Trials of Parent Programs: A Systematic Review. Clin Child Fam Psychol Rev. 2019;22:388–405. doi: 10.1007/s10567-019-00277-1
  • 80. Russo JM, Williford AP, Markowitz AJ, Vitiello VE, Bassok D. Examining the validity of a widely-used school readiness assessment: Implications for teachers and early childhood programs. Early Childhood Research Quarterly. 2019;48:14–25. doi: 10.1016/j.ecresq.2019.02.003
  • 81. Guhn M, Gadermann AM, Almas A, Schonert-Reichl KA, Hertzman C. Associations of teacher-rated social, emotional, and cognitive development in kindergarten to self-reported wellbeing, peer relations, and academic test scores in middle childhood. Early Childhood Research Quarterly. 2016;35:76–84. doi: 10.1016/j.ecresq.2015.12.027
  • 82. Burgess S, Greaves E. Test Scores, Subjective Assessment, and Stereotyping of Ethnic Minorities. Journal of Labor Economics. 2013;31(3):535–76. doi: 10.1086/669340

Decision Letter 0

Nhu N Tran

17 Jul 2024

PONE-D-24-14293: An assessment of the teacher completed ‘Early Years Foundation Stage Profile’ as a routine measure of child developmental health. PLOS ONE

Dear Dr. Mooney,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Thank you for your patience with the review process! Please revise and resubmit your manuscript after thoroughly addressing all three reviewers' comments and feedback. These revisions, specifically from reviewer #1, will strengthen your manuscript.

==============================

Please submit your revised manuscript by Aug 31 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Nhu N. Tran

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please ensure that you refer to Figure 6 in your text as, if accepted, production will need this reference to link the reader to the figure.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for the opportunity to review the manuscript, “An assessment of the teacher completed ‘Early Years Foundation Stage Profile’ as a routine measure of child developmental health”, submitted to PLOS One.

The authors report a secondary data analysis, which attempts to establish the internal consistency and structural validity of the Early Years Foundation Stage Profile (EYFS-P) in one ethnically diverse region in England (i.e., Bradford).

In principle, this study has the potential to inform research using this specific administrative data. However, several substantial issues within the current manuscript prevent its publication in its current form and underscore my ‘reject’ recommendation. These issues are outlined in detail below. Overall, I hope this feedback is helpful for future revisions of this work.

‘Developmental health’ definition and implications for structural validity evaluation

Throughout the manuscript, the authors refer to the EYFS-P as a measure of ‘developmental health’ which they define as “a broad concept that combines a holistic understanding of physical, mental, social and emotional wellbeing”.

However, this definition is not well-aligned with the intended purpose of the EYFS-P, which also incorporates “specific areas of learning” (i.e., early academic skills). For example, as highlighted in the DfE (2024) guidance, the EYFS-P is designed to assess the “prime areas of learning (which are: communication and language; personal, social and emotional development; and physical development) and the specific areas of mathematics and literacy” (as well as Understanding the world and Expressive arts and design) (see page 6 https://assets.publishing.service.gov.uk/media/65253bc12548ca000dddf050/EYFSP_2024_handbook.pdf). Moreover, the early learning goals (ELGs) within the EYFS-P are reflective of the early years foundation stage curriculum (see Development Matters documentation) and thus may be better thought of as a curriculum-based measure.

As a consequence of this misalignment, the authors have misattributed the constructs measured within the EYFS-P. This has important implications for their structural validity evaluation. Specifically, they have used a confirmatory factor analysis approach with the a-priori assumption that the EYFS-P loads onto one factor (‘developmental health’).

However, given the design and intended purpose of the EYFS-P, there may be alternative factor structures that should also be considered, for example, a 2-factor structure (prime areas of learning; specific areas of learning) or a 7-factor structure (Communication and language; Personal, social and emotional development; Physical development; Literacy; Mathematics; Understanding the world; Expressive arts and design). As such, an exploratory factor analysis approach would be more appropriate in the current study.
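The dimensionality question the reviewer raises is usually approached empirically before fitting any confirmatory model. The sketch below is illustrative only: it uses simulated data rather than the BiB data (the paper itself used the mirt package in R), and applies the Kaiser eigenvalue-greater-than-one heuristic as a simple first check on how many factors a set of assessment scales might support.

```python
import numpy as np

# Simulate 13 teacher-rated assessment scales all driven by one latent
# dimension (the one-factor 'developmental health' assumption under test).
rng = np.random.default_rng(0)
n_children, n_scales = 500, 13
latent = rng.normal(size=(n_children, 1))
items = 0.7 * latent + 0.7 * rng.normal(size=(n_children, n_scales))

# Eigenvalues of the inter-item correlation matrix: under a one-factor
# generating process, a single eigenvalue dominates and the rest fall below 1.
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_factors_kaiser = int((eigenvalues > 1.0).sum())  # Kaiser criterion
```

A 2- or 7-factor generating process would leave correspondingly more eigenvalues above 1; in practice, parallel analysis is a more defensible retention rule than the raw Kaiser cutoff, and a full EFA would then estimate the loadings themselves.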

Application of the current findings to current educational practice

Although the authors used the 2008-2012 version of the EYFS-P as the basis for their study, they make the following conclusions: “We expect that the results from this study will generalise to the revised version of the EYFSP, particularly the findings regarding internal consistency and predictive validity” and “This study supports future use of the EYFSP total score over the EYFSP GLD score for research and educational purposes”.

These conclusions are not supported and are problematic because the 2008-2012 version of the EYFS-P is significantly different from current educational practice. For example, the most recent statutory EYFS-P introduced in Sept 2021 includes several substantial changes to the ELGs. These changes include the removal of the Shape, Space, and Measure ELG, for the introduction of the Numerical Patterns ELG, as well as distinct changes to the content of a range of the ELGs across the EYFS-P. These changes will have important implications for the psychometric properties that the authors have examined and thus limit the generalisability of the current findings.

Furthermore, it is of concern that the authors do not acknowledge anywhere in their manuscript that the EYFS-P was revised again in 2021 (e.g., on page 4).

The authors should, therefore, clarify that their results are only applicable to research studies that are planning to use the 2008-2012 EYFS-P total score (page 27). Any further application (without further investigation) would be inappropriate.

Fine-grained analytical approach

Although the authors helpfully adopt a more fine-grained approach for considering the relationships between EYFS-P Communication and Socioemotional goals and SDQ difficulties (page 25), there is a missed opportunity to adopt a similar approach for literacy-based ELGs to later literacy outcomes, and maths-based ELGs to later maths outcomes. This is despite the authors’ argument on page 6, which states “this rationale can be generalised to all seven areas of the EYFS-P”.
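The fine-grained approach suggested here amounts to regressing each later outcome on its conceptually matched subscale. A minimal sketch with simulated scores (all variable names hypothetical, and an ordinary least-squares slope standing in for the paper's mixed-effects models):

```python
import numpy as np

# Simulated matched pair: a literacy-type EYFSP subscale at age 5 and a
# KS2 reading score years later, both reflecting a shared latent ability.
rng = np.random.default_rng(2)
n = 400
latent_literacy = rng.normal(size=n)
eyfsp_literacy = latent_literacy + 0.5 * rng.normal(size=n)
ks2_reading = 0.6 * latent_literacy + 0.8 * rng.normal(size=n)

# OLS fit of ks2_reading ~ intercept + eyfsp_literacy; a positive slope
# indicates a predictive association between the matched pair.
X = np.column_stack([np.ones(n), eyfsp_literacy])
beta, *_ = np.linalg.lstsq(X, ks2_reading, rcond=None)
slope = beta[1]
```

Repeating this per matched pair (literacy goals with KS2 Reading, maths goals with KS2 Maths, and so on) gives the subscale-level evidence the reviewer asks for, though the real analysis would need the clustering and covariate adjustments of the paper's mixed models.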

Psychometric terminology

Pages 6-7- The authors write “information about the construct validity (i.e. the extent to which a test measures what it is intended to measure)… For instance, do the ‘personal, social and emotional development’ areas have significant predictive associations with a validated measure of children’s social and emotional development?” This is incorrect, as it is an example of (predictive) criterion validity, not construct validity. I recommend that the authors familiarise themselves with resources, such as the AERA, APA, NCME Standards for Educational and Psychological Testing, which will help ensure that the terminology they use is accurate.

Page 8- The authors write “this [internal consistency] is an essential first step prior to investigating the structural validity of the EYFS-P” – this is not strictly true. Rather the COSMIN taxonomy and the AERA, APA, NCME Standards for Educational and Psychological Testing resources (mentioned above) suggest that structural validity comes first (i.e., how many factors are there?), followed by internal consistency (i.e., are the items interrelated?). Furthermore, the interrelations between items should be considered at both the full-item level (i.e., all EYFS-P items) and (if applicable) at the specific factor level (e.g., if the structural validity evaluation identified 2-factors, then the internal consistency also needs to be reported for these 2 factors). As such, I recommend that the reporting of the results in the current study be re-structured (and re-analysed per the suggestions mentioned).
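The ordering described above, establish the factor structure first and then report internal consistency for the full item set and for each identified factor, can be illustrated with Cronbach's alpha computed at both levels. A minimal sketch on simulated data (the two-subscale split is hypothetical, not the EYFSP's actual structure):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Two hypothetical subscales, each driven by its own latent factor.
rng = np.random.default_rng(1)
n = 300
latent_a = rng.normal(size=(n, 1))
latent_b = rng.normal(size=(n, 1))
subscale_a = 0.8 * latent_a + 0.6 * rng.normal(size=(n, 4))
subscale_b = 0.8 * latent_b + 0.6 * rng.normal(size=(n, 4))

alpha_full = cronbach_alpha(np.hstack([subscale_a, subscale_b]))
alpha_a = cronbach_alpha(subscale_a)  # reported per factor, as advised
alpha_b = cronbach_alpha(subscale_b)
```

When the factors are distinct, the full-scale alpha is typically lower than the per-factor alphas, which is exactly why reporting only one total-scale coefficient can mask a multi-factor structure.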

Other specific issues within the manuscript

Page 3- The authors write “The embedding of standardised measurement of this into educational systems varies greatly across countries”, but then go on to discuss one example (Canada) – are there other examples that could be included here?

Page 3- The authors write “Due to the educational pressures that standardised exam settings can bring, assessments completed by children’s teachers can instead offer a valuable insight” – this is somewhat true, but requires further clarification- e.g., Educational pressures are also not the only reason why teacher-based assessments may be considered optimal with very young children. Likewise, teacher-based assessments can also be ‘high-stakes’, such as those completed in Key Stage 1.

Page 3- The authors write “The EDI… has generally demonstrated adequate psychometric properties in terms of internal consistency (15), and predictive validity”. It would be helpful to report exact figures here, as the term ‘adequate’ is often inconsistently and/or misused across psychometric studies. The COSMIN taxonomy and the AERA, APA, NCME Standards for Educational and Psychological Testing, will provide helpful starting points for the authors in establishing the consistent use of the term ‘adequate psychometric properties’.

Page 4- The authors write “69 ‘early learning goals’ (ELGs)”- this is not strictly true, rather the DfE refer to the structure of the 2008-2012 version of the EYFS-P as “13 assessment scales, with 9 points within each scale (‘scale point’). The 13 assessment scales are grouped into six areas of learning” (e.g., https://assets.publishing.service.gov.uk/media/5a7b875c40f0b62826a0429c/sfr28-2010.pdf and https://dera.ioe.ac.uk/id/eprint/8221/13/Early_FS_Handbook_v11_WO_LR_Redacted.pdf ).

Page 4- The authors write “the original version…has been used nationally and routinely for nine years” – this is also not strictly true, as the 2008-2012 version of the EYFS-P (the focus of the current study) was used for 4 years. The 2012-2021 revised version for a further 9 years, with the latest iteration made statutory in Sept 2021.

Page 6- The authors highlight research evaluating the measurement properties of the EYFS-P, but do not include other relevant literature, e.g., Snowling et al. 2011 https://files.eric.ed.gov/fulltext/ED526910.pdf

Page 6- The authors highlight “there are seven individual learning areas within the EYFSP” – (this refers to the 2012-2021 version), but on page 4 the authors state “the present study investigates the original version” (which I assume to mean the 2008-2012 version), which would include 13 assessment scales (see above point)- further clarity is required here.

It may also be helpful to create a table (perhaps expanding on table 1) that outlines each of the three versions of the EYFS-P, including which learning areas are measured, how they are measured (i.e., meets, exceeds, etc) and when they were implemented (i.e., years).

Page 7- The authors write “In understanding these strengths and weaknesses, a child could then be provided with support in a particular area”- this argument needs to be further developed, as the EYFS-P in its current scoring structure is still designed and used for this purpose by educational practitioners. How is what the authors are suggesting, any different (and superior) from current practice?

Page 15- The authors write “Analysed scores therefore ranged between 80 and 120”- It is not clear how academic attainment was measured: were maths, reading, and grammar considered separately, or combined? What score was used, raw or scaled? It is not clear why 80 is the minimum score.

Pages 15-16- Further clarity is required about the SDQ measure- e.g., were children assessed every year between the ages 7-10? Or just once? What are the five scales? Is the prosocial scale included in the five scales?

Page 16 – What is your evidence to justify the inclusion of these covariates? “thought to be confounders” is not evidence-based.

Page 20- There is a relatively large proportion of missing data within the socioeconomic position variable (18%), which is comparable to the size of other response options. Although missing data methods are applied, it would be beneficial to conduct additional robustness checks with other socioeconomic-proxy variables, such as children’s eligibility for free school meals or pupil premium. These administrative data are likely to be available for a larger proportion of children in the linked sample that the authors are already using.

Page 20- Were there missing data for other predictor and outcome variables (described on pages 15-16)? If so, how were they handled? What proportion of data were missing?

Page 30- The authors helpfully highlight the ethnic diversity of their sample. However, the authors should expand their discussions to consider any potential SES/ethnic inequalities in teacher-based assessment (rather than child-direct measures).

Page 31- The conclusion “total score is an internally consistent measure with predictive validity” should be further clarified/toned down, as it is only for the very low and very high-performing children; the 'average' child is not adequately assessed. Although the authors later add “we caution against using it for measurement of children with very close to ‘average’ ability levels” – the first highlighted sentence is misleading for readers.

Minor points

Page 3 – “can therefore improve” would be better phrased as “can therefore help contribute to” – or something similar. Currently, the text implies a definitive causal relationship, which excludes the possibility of other factors contributing to later, positive child outcomes.

Page 3- “kindergarten teachers [in Canada]”- what age is this?

Page 8- “future outcomes” – which ones?

Page 10- “an observational birth cohort” – additional text is required to clarify the data is from one region in England (Bradford).

Page 10- “for all women recruited” – however the authors also mention fathers in paragraph 2, page 10. Please clarify the text accordingly.

Page 15- “two key changes to this upon starting the analysis” – what was the rationale for these changes?

Page 15 (and throughout the manuscript)- inconsistent use of SEN/ SEND.

Page 15- make it clear that the “Key Stage 2 assessment” is completed by children, and not a teacher-based assessment.

Page 15- “end of Year 6 at school”- what age is this?

Page 16- use APA guidelines when using numbers in text – i.e., five, instead of 5.

Page 19- “Model fit assessed via AIC and BIC was marginally better with EYFSP as a continuous variable, and the continuous modelling provides a more parsimonious estimate, so this model was selected” – please clarify where these results are reported.

Page 23 (and throughout the manuscript)- the information reported on the OSF page should also be included in a Supplementary materials section in the published manuscript.

Page 24- “EYFSP total score was associated with a higher Key Stage 2 outcome” – This text needs to be clarified – e.g., “higher EYFS-P total scores were…”

Page 24- what does GPS stand for? This is the first time the acronym is used.

Page 25- “Note that a higher score on SDQ indicates more socioemotional difficulties” should be included in the text on page 16 (i.e., SDQ method section).

Page 29- “affected by the removal of one of the response categories” – be clear which one.

Reviewer #2: This manuscript examines the internal consistency and structural and predictive validity of the EYFSP measure using linear mixed effects models, a CFA, and an IRT. This is thorough and timely work, and the statistical tests in the manuscript are appropriate and well-described. The feedback that I have is intended to strengthen the manuscript for publication.

1. In the predictive validity analysis on page 15, please provide a brief justification for the changes made to the pre-registered analysis.

2. Please provide a brief justification for the use of MICE over other missing data imputation methods (i.e., why a MAR method over a MCAR method?).

3. I would like if the implications of the misfit of certain items, as demonstrated by the RMSEA scores and discussed on pages 22-23, were more directly addressed in the discussion. Why might these particular items have misfit issues or be particularly impacted by the issues addressed by the authors on pages 27-28?

4. The authors mention several times that caution should be taken when making inferences about 'average' ability children. I wondered if the authors believe that there is an alternative to the EYFSP total score worth exploring for these children, and what this might be if so.

5. In their limitations, the authors mention that the BiB cohort may not be relevant across other kinds of populations. In what ways do the authors believe that this sample may yield different results as opposed to other samples?

6. Along the same lines as the above, in the methods and materials section, the authors should provide more detail about the demographic makeup of the BiB cohort. They mention in the discussion that this is an ethnically diverse sample. Can specific ethnicity and/or SES demographics be included in the main manuscript beyond the inclusion of the two largest ethnicities represented?

Reviewer #3: The authors take on an extraordinary blind spot in the UK public policy with child development and education at the intersection. Millions of teachers complete millions of measures on millions of children, and never has a psychometric property been considered. The authors clearly list their intentions, and the rationale for the study is well articulated. The authors also thoroughly examine the precedent held where data were compressed into Good Level of Development. This reviewer agrees that the dichotomous representation bleaches the data of nuance.

Little was written in the background as to how the EYFSP was developed, though the brief history of its revision is included. Some of the EYFSP administration methods are left to the discussion rather than the background. That the EYFSP is not standardized or moderated leaves the original intent (research) of the tool to collect dust. This reviewer is stunned that a gigantic, incalculable number of person hours is invested in the UK each year on a measure that was never "normed." This is not a criticism of the present manuscript, but rather a reason to accept this manuscript that critiques our good intentions even when they are not evidence-based. I was relieved to learn that there was indeed high internal consistency, but do wonder if the original authors of the measure (or panel, or work group - however it came to pass) developed the EYFSP with some degree of sound methodology.

The authors present novel findings on the extended predictive properties of the total score, now reaching children aged 10-11. Some inconsistent arguments exist in that the total score is discussed as needing to increase by 8-10 in order to achieve 1/2 SD on Key Stage 2 scores. That seems statistically true, but then again the total scores are discussed throughout the manuscript as lacking nuance. Also, the EYFSP is given once, correct? No change in scores at the individual level can be expected; it is a cross-sectional measure of developmental health. If this reviewer has missed a key point and the EYFSP is in fact a longitudinally administered measure, please correct me.

In limitations, I do not follow how the authors conclude that their findings will generalize to the revised version. The revised version seems to compress data even further, limiting total score nuance even further.

Implications and future directions: does this manuscript really support continued use of this measure, or should it be revised to support educational benchmarking and interventions? I felt a firmer stance was being taken in the discussion for the total score instead of the GLD score. This reviewer appreciates the authors' consideration of how else the EYFSP should be examined. How should it change, though? With it being in its teen years after development, if the authors are calling for revisions to the method of examining it, can they also call for revisions to the tool itself?

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes:  Lauren N. Girouard-Hallam

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2025 Mar 19;20(3):e0302771. doi: 10.1371/journal.pone.0302771.r003

Author response to Decision Letter 1


2 Sep 2024

SEE RESPONSE TO REVIEWERS DOCUMENT

We thank all three reviewers for their reviews of the manuscript. We provide our response to each of the reviewers' points below, with our responses written in blue. Where we refer to page numbers where we have made changes in the manuscript, the numbers relate to the manuscript with tracked changes.

Reviewer #1:

Thank you for the opportunity to review the manuscript, “An assessment of the teacher completed ‘Early Years Foundation Stage Profile’ as a routine measure of child developmental health”, submitted to PLOS One.

The authors report a secondary data analysis, which attempts to establish the internal consistency and structural validity of the Early Years Foundation Stage Profile (EYFS-P) in one ethnically diverse region in England (i.e., Bradford).

In principle, this study has the potential to inform research using this specific administrative data. However, several substantial issues within the current manuscript prevent its publication in its current form and underscore my ‘reject’ recommendation. These issues are outlined in detail below. Overall, I hope this feedback is helpful for future revisions of this work.

Response: We thank the reviewer for the thorough and rigorous review. We have made several changes to the manuscript based on their suggestions and feel it has strengthened it substantially.

‘Developmental health’ definition and implications for structural validity evaluation

Throughout the manuscript, the authors refer to the EYFS-P as a measure of ‘developmental health’ which they define as “a broad concept that combines a holistic understanding of physical, mental, social and emotional wellbeing”.

However, this definition is not well-aligned with the intended purpose of the EYFS-P, which also incorporates “specific areas of learning” (i.e., early academic skills). For example, as highlighted in the DfE (2024) guidance, the EYFS-P is designed to assess the “prime areas of learning (which are: communication and language; personal, social and emotional development; and physical development) and the specific areas of mathematics and literacy” (as well as Understanding the world and Expressive arts and design) (see page 6 https://assets.publishing.service.gov.uk/media/65253bc12548ca000dddf050/EYFSP_2024_handbook.pdf). Moreover, the early learning goals (ELGs) within the EYFS-P are reflective of the early years foundation stage curriculum (see Development Matters documentation) and thus may be better thought of as a curriculum-based measure.

As a consequence of this misalignment, the authors have misattributed the constructs measured within the EYFS-P. This has important implications for their structural validity evaluation. Specifically, they have used a confirmatory factor analysis approach with the a-priori assumption that the EYFS-P loads onto one factor (‘developmental health’).

However, given the design and intended purpose of the EYFS-P, there may be alternative factor structures that should also be considered, for example, a 2-factor structure (prime areas of learning; specific areas of learning) or a 7-factor structure (Communication and language; Personal, social and emotional development; Physical development; Literacy; Mathematics; Understanding the world; Expressive arts and design). As such, an exploratory factor analysis approach would be more appropriate in the current study.

Response: We appreciate the Reviewer’s point that the way in which we had defined ‘developmental health’ did not align with the intended purpose of the EYFSP, as it appeared to ignore some aspects of the curriculum that the EYFSP was based on. We revisited the reference for the term ‘developmental health’ (Keating et al., 1999) and found that they do include measures of mathematics and literacy in their definition. We have therefore amended the first sentence in our manuscript to ensure that the definition covers not only physical, social, and emotional wellbeing, but also “core educational abilities such as mathematics and literacy”. We feel that this clarification now better aligns with the original purpose of the EYFSP.

With regard to the issue around the confirmatory factor analysis (CFA) with all EYFSP items loading onto one factor, this was carried out because it is an essential analysis to accompany Item Response modelling. Item response models assume that the latent trait variable is reflected by a unidimensional continuum, so this must be tested using CFA (explained in detail on p.14-15 of the manuscript). We do agree with the reviewer that an exploratory factor analysis (EFA) of the EYFSP items would be interesting, to explore whether the items do appear to relate to the areas they are intended to measure. However, since the intended purpose of our study was to validate the total score of the EYFSP (i.e., not the EYFSP as a whole), an EFA would not answer the objectives of our study.
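For readers less familiar with this step, the unidimensionality assumption behind an Item Response (Rasch) model can be screened in several ways; one common rule of thumb compares the first and second eigenvalues of the item correlation matrix. The sketch below is purely illustrative (simulated data, arbitrary parameters; it is not the study's analysis code, which used CFA fit indices):

```python
import numpy as np

# Simulate item responses driven by a single latent trait, then check
# whether the first eigenvalue of the item correlation matrix dominates
# the second. A first/second ratio well above ~4 is a common (rough)
# indicator that one dimension underlies the items, supporting a
# unidimensional IRT/Rasch model of the summed total score.
rng = np.random.default_rng(0)
n_children, n_items = 1000, 17            # 17 items mirrors the EYFSP ELG count
ability = rng.normal(size=(n_children, 1))             # one latent trait
loadings = rng.uniform(0.5, 0.9, size=(1, n_items))    # hypothetical loadings
responses = ability @ loadings + rng.normal(scale=0.6, size=(n_children, n_items))

# eigvalsh returns eigenvalues in ascending order; reverse to descending
eigvals = np.linalg.eigvalsh(np.corrcoef(responses, rowvar=False))[::-1]
print(f"first/second eigenvalue ratio: {eigvals[0] / eigvals[1]:.1f}")
```

Because the simulated data are generated from one trait, the first eigenvalue dominates; real EYFSP item data would need the equivalent check (here done via CFA) before the Rasch model is defensible.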

Application of the current findings to current educational practice

Although the authors used the 2008-2012 version of the EYFS-P as the basis for their study, they make the following conclusions: “We expect that the results from this study will generalise to the revised version of the EYFSP, particularly the findings regarding internal consistency and predictive validity” and “This study supports future use of the EYFSP total score over the EYFSP GLD score for research and educational purposes”.

These conclusions are not supported and are problematic because the 2008-2012 version of the EYFS-P is significantly different from current educational practice. For example, the most recent statutory EYFS-P introduced in Sept 2021 includes several substantial changes to the ELGs. These changes include the removal of the Shape, Space, and Measure ELG, for the introduction of the Numerical Patterns ELG, as well as distinct changes to the content of a range of the ELGs across the EYFS-P. These changes will have important implications for the psychometric properties that the authors have examined and thus limit the generalisability of the current findings.

Furthermore, it is of concern that the authors do not acknowledge anywhere in their manuscript that the EYFS-P was revised again in 2021 (e.g., on page 4).

The authors should, therefore, clarify that their results are only applicable to research studies that are planning to use the 2008-2012 EYFS-P total score (page 27). Any further application (without further investigation) would be inappropriate.

Response: Thank you to the reviewer for raising concerns around the version of the EYFSP examined. We agree this is important, as the EYFSP has changed several times over the years, and agree it is crucial for the reader to understand which version is used.

We used the version of the EYFSP delivered between 2012-2021, not the version delivered between 2008-2012. We apologise for the confusion and believe this may be because we stated that we used the ‘original version’ of the EYFSP (p.6) – we have now updated this to say we investigate the ‘second version of the EYFSP’ (p.6). Based on the reviewer’s recommendation, we have also included a file (Technical Appendix File 1) which shows the second version of the EYFSP (the one we analysed), and the revised version (the one used in practice now). We have also made our statement clearer that the EYFSP was revised in 2021. We have also stated in the limitations of our discussion that we only considered data from this version of the EYFSP, and state that future research should test if the findings generalise to the newer version (p.34). We hope this alleviates the reviewer's concerns around our conclusions.

Fine-grained analytical approach

Although the authors helpfully adopt a more fine-grained approach for considering the relationships between EYFS-P Communication and Socioemotional goals and SDQ difficulties (page 25), there is a missed opportunity to adopt a similar approach linking literacy-based ELGs to later literacy outcomes, and maths-based ELGs to later maths outcomes. This is despite the authors’ argument on page 6, which states “this rationale can be generalised to all seven areas of the EYFS-P”.

Response: We agree with the reviewer that it will be helpful to examine relationships between specific EYFSP goals and later outcomes. The Born in Bradford dataset facilitated an analysis of the longitudinal association between EYFSP goals and SDQ scores, and this is particularly useful as SDQ is a widely validated measure of children’s socioemotional abilities. However, data relating to all other areas of the EYFSP were not available in the Born in Bradford sample. Our discussion does highlight that ‘more research is needed to explore whether each specific goal area is associated with other measurements’, and we have now added in an explanation of why we selected the SDQ as the test case (p.35-36).

Psychometric terminology

Pages 6-7- The authors write “information about the construct validity (i.e. the extent to which a test measures what it is intended to measure)… For instance, do the ‘personal, social and emotional development’ areas have significant predictive associations with a validated measure of children’s social and emotional development?” This is incorrect, as it is an example of (predictive) criterion validity, not construct validity. I recommend that the authors familiarise themselves with resources, such as the AERA, APA, NCME Standards for Educational and Psychological Testing, which will help ensure that the terminology they use is accurate.

Response: Thank you for highlighting this mistake in our language. We agree with the Reviewer that this paragraph is describing predictive validity, and have updated the language to clarify this (p.8)

Page 8- The authors write “this [internal consistency] is an essential first step prior to investigating the structural validity of the EYFS-P” – this is not strictly true. Rather the COSMIN taxonomy and the AERA, APA, NCME Standards for Educational and Psychological Testing resources (mentioned above) suggest that structural validity comes first (i.e., how many factors are there?), followed by internal consistency (i.e., are the items interrelated?). Furthermore, the interrelations between items should be considered at both the full-item level (i.e., all EYFS-P items) and (if applicable) at the specific factor level (e.g., if the structural validity evaluation identified 2-factors, then the internal consistency also needs to be reported for these 2 factors). As such, I recommend that the reporting of the results in the current study be re-structured (and re-analysed per the suggestions mentioned).

Response: Thank you for raising this important point. We have now rearranged our analysis so that we first present the structural validity, followed by internal consistency.

We do appreciate the reviewer's point around structural validity evaluation (e.g., potentially identifying more than one factor); however, in this case the EYFSP ‘total score’ measure is already being used, hence the structural validity section of our study aims to validate the total score only. We have explained this briefly in the rationale.
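As background to the internal consistency question the reviewer raises ("are the items interrelated?"), Cronbach's alpha at the full-item level can be computed directly from a children-by-items score matrix. The following is a generic numpy sketch on simulated data (illustrative only; the 17-item count echoes the second-version EYFSP ELGs, and all other values are arbitrary):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(size=(500, 1))                       # shared latent trait
items = trait + rng.normal(scale=0.8, size=(500, 17))   # 17 noisy indicators
print(f"alpha = {cronbach_alpha(items):.2f}")
```

If a structural validity evaluation identified more than one factor, the same computation would simply be repeated on the item subset belonging to each factor, as the reviewer suggests.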

Other specific issues within the manuscript

Page 3- The authors write “The embedding of standardised measurement of this into educational systems varies greatly across countries”, but then go on to discuss one example (Canada) – are there other examples that could be included here?

Response: We have now explained internationally used measures of child developmental health (the Early Child Developmental Index, and Early Development Instrument). We have also contextualised how these are embedded into educational systems, with the EDI in Canada and Australia, and the Teaching Standard Gold (TS GOLD) in the US (p.3-4).

Page 3- The authors write “Due to the educational pressures that standardised exam settings can bring, assessments completed by children’s teachers can instead offer a valuable insight” – this is somewhat true, but requires further clarification- e.g., Educational pressures are also not the only reason why teacher-based assessments may be considered optimal with very young children. Likewise, teacher-based assessments can also be ‘high-stakes’, such as those completed in Key Stage 1.

Response: Here we mean to refer to the stress that is caused to children by exam settings, and we have now clarified this. (p.5)

Page 3- The authors write “The EDI… has generally demonstrated adequate psychometric properties in terms of internal consistency (15), and predictive validity”. It would be helpful to report exact figures here, as the term ‘adequate’ is often inconsistently and/or misused across psychometric studies. The COSMIN taxonomy and the AERA, APA, NCME Standards for Educational and Psychological Testing, will provide helpful starting points for the authors in establishing the consistent use of the term ‘adequate psychometric properties’.

Response: We have added in the internal consistency, model fit, and predictive validity values for the measurement instruments that we describe (p.3-4).

Page 4- The authors write “69 ‘early learning goals’ (ELGs)”- this is not strictly true, rather the DfE refer to the structure of the 2008-2012 version of the EYFS-P as “13 assessment scales, with 9 points within each scale (‘scale point’). The 13 assessment scales are grouped into six areas of learning” (e.g., https://assets.publishing.service.gov.uk/media/5a7b875c40f0b62826a0429c/sfr28-2010.pdf and https://dera.ioe.ac.uk/id/eprint/8221/13/Early_FS_Handbook_v11_WO_LR_Redacted.pdf).

Response: We appreciate the reviewer’s comments on this point, and we have revisited the literature on the reforms to the original EYFSP. Our reading of the Tickell Review (reference #17) and the government’s publications on the changes to the EYFSP (see here: https://www.gov.uk/government/news/new-early-years-framework-published) confirms our understanding that the original 69 early learning goals were consolidated into the 17 used in the second version of the EYFSP.

Page 4- The authors write “the original version…has been used nationally and routinely for nine years” – this is also not strictly true, as the 2008-2012 version of the EYFS-P (the focus of the current study) was used for 4 years. The 2012-2021 revised version for a further 9 years, with the latest iteration made statutory in Sept 2021.

Response: This relates to the earlier point regarding which version of the EYFSP was used; we have added clarification that we are looking at the second version of the EYFSP, not the original version.

Page 6- The authors highlight research evaluating the measurement properties of the EYFS-P, but do not include other relevant literature, e.g., Snowling et al. 2011 https://files.eric.ed.gov/fulltext/ED526910.pdf

Response: We have now included reference to this study (p.6).

Page 6- The authors highlight “there are seven individual learning areas within the EYFSP” – (this refers to the 2012-2021 version), but on page 4 the authors state “the present study investigates the original version” (which I assume to mean the 2008-2012 version), which would include 13 assessment scales (see above point)- further clarity is required here.

Response: This relates to an earlier point regarding which version of the EYFSP was used; we have added clarification that we are looking at the second version of the EYFSP, not the original version.

It may also be helpful to create a table (perhaps expanding on table 1) that outlines each of the three versions of the EYFS-P, including which learning areas are measured, how they are measured (i.e., meets, exceeds, etc) and when they were implemented (i.e., years).

Response: We agree with the reviewer that a table is helpful for clarifying the areas of learning, thank you for the recommendation. We have included a table which outlines the second version of the EYFSP (the version that we use) and the revised version of the EYFSP (the version currently used in practice). We have not included the original version of the EYFSP, as this would contain lots of information (69 ELGs), and we do not feel that this version has implications for our findings (uploaded as Technical Appendix File 1).

Page 7- The authors write “In understanding these strengths and weaknesses, a child could then be

Attachment

Submitted filename: Response to reviewers V1.0.docx

pone.0302771.s003.docx (41.6KB, docx)

Decision Letter 1

Nhu N Tran

1 Dec 2024

PONE-D-24-14293R1

An assessment of the teacher completed ‘Early Years Foundation Stage Profile’ as a routine measure of child developmental health

PLOS ONE

Dear Dr. Mooney,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Dear Dr. Mooney, 

Thank you for your patience in the review process and the detailed response to each of the original reviewers' critiques.  These revisions have greatly improved your manuscript.  Please thoroughly respond to each of the new reviewers' critiques and resubmit your manuscript.  Please use titles of the reviewer number, spaces, and numbers to itemize your changes so that it is easier to follow and read.  Please contact our team with any questions.  

==============================

Please submit your revised manuscript by December 29, 2024. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Nhu N. Tran

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

Reviewer #4: (No Response)

Reviewer #5: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: (No Response)

Reviewer #4: Yes

Reviewer #5: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: (No Response)

Reviewer #4: Yes

Reviewer #5: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: (No Response)

Reviewer #4: Yes

Reviewer #5: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: (No Response)

Reviewer #4: No

Reviewer #5: No

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: (No Response)

Reviewer #4: Again, nice work but needs some copy editing and then the results and discussion need to be redone to be more traditional

See the note for more information

Reviewer #5: This article examined the internal consistency, structural validity, and predictive validity of the EYFSP. It is a longitudinal study with ethical clearance obtained from the concerned authority and written consent taken.

Clarifications required - on page 8, this sentence has to be framed differently: "and it was not reported how EYFSP subscale scores were calculated for this study" to be framed as the above-mentioned study (Reference no. 28).

Page 11: "routine educational data was collected.... from the local authority every year the child attended school" - there is no mention of the age at which the first educational data was collected. Next, additional data was collected at 7-10 years, but in the discussion of the third aim of predicting academic outcomes the authors mention outcomes at 10-11 years.

Authors to kindly clarify "overlap in total scores and variability in scores who does and does not meet GLD"; an accompanying table of the overlapping items would help to clarify and assist teachers and researchers.

The research has mentioned limitations and future directions.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean? ). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy .

Reviewer #2: No

Reviewer #4: Yes:  M. Diane Clark

Reviewer #5: Yes:  Shabina Ahmed

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Early Years Foundation Stage Profile.docx

pone.0302771.s004.docx (14KB, docx)
PLoS One. 2025 Mar 19;20(3):e0302771. doi: 10.1371/journal.pone.0302771.r005

Author response to Decision Letter 2


3 Jan 2025

SEE ATTACHED WORD DOCUMENT.

Response to reviewers, 03/01/2025

We thank both Reviewers for their consideration of our study. Their reviews have helped to strengthen the manuscript. We provide responses to each Reviewer’s comments below. The page numbers we refer to relate to the manuscript with tracked changes.

Comments to the Author

Reviewer #5:

This article examined the internal consistency, structural validity, and predictive validity of the EYFSP. It is a longitudinal study with ethical clearance obtained from the concerned authority and written consent taken.

Clarifications required - on page 8, this sentence has to be framed differently: "and it was not reported how EYFSP subscale scores were calculated for this study" to be framed as the above-mentioned study (Reference no. 28).

Response: Thank you for noting this. We have reworded this sentence to make it clear that we are referencing the aforementioned study (see page 8).

Page 11: "routine educational data was collected.... from the local authority every year the child attended school" - there is no mention of the age at which the first educational data was collected. Next, additional data was collected at 7-10 years, but in the discussion of the third aim of predicting academic outcomes the authors mention outcomes at 10-11 years.

Response: We have now explained that “educational outcomes were obtained from the Local Authority every year that the child attends school, starting at age 4 (Reception year)” (page 11).

Thank you for raising that it is not clear at which age our outcomes were collected. Outcomes for RQ3 were collected at age 10-11 through the routine educational data, and outcomes for RQ4-5 were collected in bespoke data collection by Born in Bradford at ages 7-10 (see page 17 for an explanation).

We have made amendments to the language in our discussion to make it clear when the outcomes were collected. In the discussion we refer to “outcomes at ages 10-11 in Maths, Reading, and Grammar…”. We then explain that we “explore the predictive validity of the EYFSP total score and the EYFSP-CS subscales for children’s behavioural, social, and emotional difficulties”. We have now amended this sentence to explain that these were collected “at ages 7-10”, which aligns with the additional data collection explained in the methods (see page 31).

Authors to kindly clarify " overlap in total scores and variability in scores who does and does not meet GLD" accompanied table of overlap items would help to clarify and assist teachers and researchers.

The research has mentioned limitations and future directions.

Response: Thank you for raising this and suggesting a table to clarify the overlap in items. In response to the reviewer’s suggestion, we did explore if it would be possible to include a table here, but we would have to suppress numbers within some cells due to the risk of reidentification. We instead refer to the figure as this demonstrates the overlap in total scores between those who do and do not achieve a GLD, without displaying the actual frequencies and risking reidentification of participants. We hope that this addresses the Reviewer’s concern (see figure below).

Figure 2. Kernel density distributions of EYFSP total score for those who do not achieve a GLD (in blue) and do achieve a GLD (in orange) (n=10,589).

Reviewer #4:

Again, nice work but needs some copy editing and then the results and discussion need to be redone to be more traditional

See the note for more information

Response: We thank the Reviewer for their review of our study. Please see our response below to Reviewer #4.

Attached note from Reviewer #4

Early Years Foundation Stage Profile (EYFSP)

Interesting article that I enjoyed reviewing ---thanks.

Please look up the manuscript guidelines for PLOS ONE.

You need to clean up spacing, and when you indent and when you do not indent the first paragraph after a heading.

Only three levels of headings are permitted

The references need to be reformatted.

Response: Thank you for noting this. We have: (1) cleaned up our spacing throughout, ensuring we are using a spacing of 2.0 for all lines, (2) ensured that we do not indent the first paragraph after each heading, (3) amended our heading levels so that we use a maximum of three levels of headings throughout, and (4) checked the formatting of referencing throughout.

In response to the Reviewer’s earlier suggestion that the manuscript requires some copy-editing, we have re-read the manuscript and made minor changes throughout to improve its readability. Thank you.

Your discussion is really results that should be more clearly related to your hypothesis

Then in the discussion it should bring in the lit review with your findings

Response: In addition to this comment, the Reviewer suggests that the results and discussion need to be redone to be more traditional. In response to these comments, we have examined the overlap between the literature covered in our introduction and discussion. Due to there being no previous psychometric research on the EYFSP total score, there is a lack of literature to bring in for Research Questions 1-2. For Research Questions 3-5 regarding the predictive validity, our discussion already mentions the prior literature (see pages 30-31). However, we have made the following changes to ensure our discussion covers the literature in our introduction as much as possible (with changes highlighted in underlined italics):

• Included the full explanation of child developmental health as mentioned in our introduction: “The construct of developmental health encompasses a holistic understanding of children’s physical, mental, social, and emotional wellbeing, combined with core educational abilities such as mathematics and literacy (1)” (see page 29)

• Included reference to studies which describe the methods we have used: “The first aim was to investigate the internal consistency of the EYFSP items (44,47)” (see page 29)

• Included a reference to two studies from our introduction into our discussion section: “Although both the GLD and now the total score have been shown to predict future outcomes (34,35),” (see page 32)

• Included literature relating to the psychometric properties of other measures of child developmental health on page 34: “We do not suggest the use of a different measure of child developmental health over the EYFSP. Whilst other measures of child developmental health have undergone further psychometric analyses, they too demonstrate some variable psychometric properties (e.g. variable model fit values have been demonstrated in both the EDI (17) and TS-GOLD (19))” (see page 34)

To further address the Reviewer’s comments, we have confirmed that our current format adheres to the PLOS ONE submission guidelines and style recommendations (https://journals.plos.org/plosone/s/submission-guidelines). The discussion section thoroughly addresses each research question and integrates relevant literature wherever possible.

We hope that the changes made outlined here, and the clarifications we have provided on the PLOS ONE guidelines, will effectively address the Reviewer's comments. Whilst we understand that the Reviewer may have intended more significant revisions, we would kindly request further information on the specific changes to ensure that results and discussion sections are traditional, if the Reviewer still deems this necessary for publication of our study.

Small issues

In the marked up version ( there are no track changes) page 25 I found this (45,48). (47)

Not sure why the 47 is outside the period

Response: Thank you for noting this error, we have now amended this and removed the 47 outside of the period (see page 9).

A change in 6 in the EYFSP total score results in a change of -1.22 in SDQ, and a change in 6 in EYFSP-

Seems it should be 'of 6' in the EYFSP (in both places)

On page 44

Response: Thank you. We have now amended this to instead say ‘a change of’ (see page 28).

Despite this, this study

----avoid the double this page 50

Response: Thank you, we have now amended this to instead say ‘despite this, the present study’ (see page 34).

Decision Letter 2

Nhu N Tran

14 Jan 2025

PONE-D-24-14293R2

An assessment of the teacher completed ‘Early Years Foundation Stage Profile’ as a routine measure of child developmental health

PLOS ONE

Dear Dr. Mooney,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Thank you for revising and resubmitting your manuscript! The revisions have strengthened the manuscript. Please consider the revisions detailed below.

  1. Discussion:

This section continues to lack detail and robustness. Please revise this section to review how your study’s results relate to the existing literature. It should not be a repeat of the rationale from the introduction section. If no studies have examined your topics, then explicitly write that; however, other studies may have had similar study designs, etc. If that is the case, those studies’ results should be compared to your findings.

Please copy and paste the changes to your manuscript into the rebuttal/response letter so that they are easier for the reviewers to access.

Please submit your revised manuscript by Feb 28 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Nhu N. Tran

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

PLoS One. 2025 Mar 19;20(3):e0302771. doi: 10.1371/journal.pone.0302771.r007

Author response to Decision Letter 3


21 Jan 2025

Reviewer comment:

1. Discussion:

This section continues to lack detail and robustness. Please revise this section to review how your study’s results relate to the existing literature. It should not be a repeat of the rationale from the introduction section. If no studies have examined your topics, then explicitly write that; however, other studies may have had similar study designs, etc. If that is the case, those studies’ results should be compared to your findings.

Please copy and paste the changes to your manuscript into the rebuttal/response letter so that they are easier for the reviewers to access.

Thank you for helping us revise our discussion regarding how our results relate to the other existing literature on comparable measures of child developmental health. We have made the following changes:

1. We have added two paragraphs explaining previous studies which have used IRT to evaluate comparable measures of child developmental health, and related the EYFSP’s internal consistency and structural validity to comparable measures of child developmental health in previous literature [pages 29-30, version with no tracked changes]:

“Although IRT has been used to examine measures of, for example, emotional dysregulation (76) and neuropsychological capabilities (77) of young children, we have not been able to locate any previous studies using IRT on a comparable measure of broad child developmental health. The closest study is an investigation of the ‘Denver Developmental Screening Test (DDST)’, though the aim of this study was to apply IRT to develop a scoring method to estimate ‘ability age’ of individual children using the DDST (78). Hence, IRT has been underused in the context of examining the structural validity of child developmental health measures. Our study is therefore the first to investigate the EYFSP’s internal consistency and structural validity using IRT, and one of the first to apply this method to early child developmental health.

Nonetheless, the internal consistency and structural validity of comparable child developmental health measures have been investigated using other psychometric analysis methods, namely the EDI and TS GOLD. The EYFSP has now demonstrated adequate internal consistency similar to that of the EDI (17), and demonstrates model fit similar to both the EDI and TS GOLD, which have also shown poor model fit (17,19). Model fit refers to the ability of a model to reproduce the data, with poor model fit indicating that relationships between variables may be incorrectly specified in the applied statistical model (57). Given that the EYFSP, EDI, and TS-GOLD have all demonstrated poor model fit, this may indicate a general challenge with using teacher-reported measures of overall child developmental health. As this is such a broad construct, perhaps it is challenging to assess in one holistic measure. Indeed, a systematic review of parent-reported measures of child social and emotional wellbeing/behavior found such measures to have structural validity (79).”

2. Following the above paragraphs, we have kept our previous discussion section which discusses the potential reasons for poor model fit. We have then added the following paragraph [page 31]:

Evidence from comparable measures of child developmental health comes from numerous psychometric studies conducted in several different countries over several years (17,19). Hence, to be able to thoroughly compare the EYFSP to these similar measures, more research on the measurement properties of the EYFSP is needed (see implications and future directions section).

3. Following the above paragraph, we then discuss the results from the predictive validity analysis. This section already considered the previous studies which have investigated the predictive validity of the EYFSP. To further situate our findings in the existing literature we have added the following paragraph [page 32-33]:

“There is only one other study that has reported the predictive validity of this version of the EYFSP subscales; it found that EYFSP scores relating to literacy and physical development also predicted children’s behavioral, social, and emotional difficulties (27). In comparison to other measures of child developmental health, the EDI has demonstrated variable predictive validity between the language and cognitive development domain scores and the Peabody Picture Vocabulary Test (17). The TS GOLD has been found to be associated with children’s assessments throughout the school year (80), and has been found to have variable concurrent validity with the Bracken School Readiness Scale (19). The EYFSP therefore has demonstrated adequate predictive validity in comparison to other measures of child developmental health, however, there are substantially fewer studies regarding the EYFSP.”

4. Our previous discussion then had a ‘limitations’ section, followed by an ‘implications and future directions’ section. To improve the flow of the new discussion, and to situate the ‘implications and future directions’ within the new literature we have now discussed, we have rearranged the remainder of the discussion so that the sections are ordered as follows:

Implications and future directions

Limitations

Conclusions

5. Related to the above point, we have moved a paragraph to the start of our implications and future directions section to compare the EYFSP to other comparable measures as now discussed earlier [page 34]:

Although this study has highlighted some limitations of using the EYFSP, we do not suggest the use of a different measure of child developmental health over the EYFSP. Whilst other measures of child developmental health have undergone further psychometric analyses, they too demonstrate some variable psychometric properties (e.g. variable model fit values have been demonstrated in both the EDI (17) and TS-GOLD (19)). Replacement of the EYFSP would be a significant overhaul to current educational practice and should be avoided if possible. Hence, once the measurement properties of the EYFSP are investigated as described above, the EYFSP could be used with more confidence than it currently is. Or, if the measurement properties of the EYFSP are found to be significantly lacking, a significant programme of development should be undertaken to develop it further, or replace it with an already validated instrument of child developmental health.

6. Finally, we have made minor copy edits throughout to improve the flow of the discussion. This includes the removal of a paragraph at the start of the introduction that reiterated the rationale for our study to ensure that our discussion is dedicated to discussing the findings in relation to the wider literature:

[Now removed] Embedding routine measurement of children’s developmental health into educational systems is crucial to provide support to those who need it (10,11), and potentially close inequalities in educational outcomes (9). In England and Wales, the EYFSP with 17 goals has been routinely completed by teachers for all children attending school for nearly ten years. Due to the potential use of the EYFSP ‘total score’ for both research studies and applied educational settings, we investigated whether it is fit for purpose as an overall summary of child developmental health. [Now removed]

Decision Letter 3

Nhu N Tran

28 Jan 2025

An assessment of the teacher completed ‘Early Years Foundation Stage Profile’ as a routine measure of child developmental health

PONE-D-24-14293R3

Dear Dr. Mooney,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Nhu N. Tran

Academic Editor

PLOS ONE


Acceptance letter

Nhu N Tran

PONE-D-24-14293R3

PLOS ONE

Dear Dr. Mooney,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Nhu N. Tran

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Overview of the Early Learning Goals (ELGs) in the old and new versions of the Early Years Foundation Stage Profile (EYFSP).

    (DOCX)

    pone.0302771.s001.docx (15.4KB, docx)
    S2 File. Predictive Validity Analysis.

    (DOCX)

    pone.0302771.s002.docx (313.8KB, docx)
    Attachment

    Submitted filename: Response to reviewers V1.0.docx

    pone.0302771.s003.docx (41.6KB, docx)
    Attachment

    Submitted filename: Early Years Foundation Stage Profile.docx

    pone.0302771.s004.docx (14KB, docx)

    Data Availability Statement

    Data cannot be shared publicly as they are available through a system of managed open access. Researchers interested in accessing the data can find details and procedures on the Born in Bradford website (https://borninbradford.nhs.uk/research/how-to-access-data/). Data access is subject to review by the Born in Bradford Executive, who review proposals on a monthly basis. Requests can be submitted to borninbradford@bthft.nhs.uk. Data sharing agreements are established between the researcher and the Bradford Institute for Health Research. The full analysis code for the IRT analyses is publicly available at https://osf.io/s6num/.

