Author manuscript; available in PMC: 2025 Oct 7.
Published in final edited form as: Econ J (London). 2023 Nov 24;134(659):959–984. doi: 10.1093/ej/uead099

THE LIMITATIONS OF ACTIVITY-BASED INSTRUCTION TO IMPROVE THE PRODUCTIVITY OF SCHOOLING

Andreas de Barros 1, Johanna Fajardo-Gonzalez 2, Paul Glewwe 3, Ashwini Sankar 4
PMCID: PMC12499910  NIHMSID: NIHMS2050984  PMID: 41058638

Abstract

There is substantial emphasis on improving classroom practices, primarily through activity-based instruction, to increase the productivity of schooling. We study a large programme that seeks to promote mathematics learning in government primary schools in India. Through a cluster-randomised trial we find that the programme increased activity-based instruction, but yielded only muted impacts on learning. We provide a potential explanation: school value-added models suggest a negative relationship between activity-based instruction and test score gains. Our findings are robust to adding a community-engagement component to the intervention. These results highlight the limitations of activity-based instruction programmes for increasing school productivity.


In recent decades, many developing countries have substantially increased their spending on education, which was followed by increased enrolment in primary education. Despite—or perhaps because of—these developments, student learning levels remain very low, and researchers have shifted their attention to the low academic performance of primary school students.

India exemplifies this phenomenon of increased education spending, high student enrolment rates and low levels of productivity in government primary schools. Government spending on education in India more than doubled between 2006 and 2013 (in constant purchasing power parity dollars; see UNESCO Institute for Statistics, 2018). Alongside this increased spending, India’s primary school enrolment rates have consistently been over 95% for both boys and girls over the past decade (ASER, 2018). Yet, only about half of Indian children enrolled in Grade 5 can read a simple paragraph at Grade 2 level (50.1% of children), or solve a two-digit subtraction problem (52.3% of children) (ASER, 2018). These alarming statistics have opened a serious debate on ‘what works’ to improve learning.

Activity-based instruction is one approach that has recently gained prominence and is starting to be adopted in many developing countries. Activity-based instruction views learning as an active and social process that works best when a child engages in hands-on experiences, often with small groups of other children. Indeed, a recent systematic review by an expert panel of 28 studies from developed countries provides strong support for this pedagogical approach (Fuchs et al., 2021). Moreover, this approach aligns with India’s recent National Education Policy 2020, which promotes activity-based, experiential learning in primary grades (Ministry of Education, 2021).

We evaluate a large, state-wide programme in Karnataka, India. The programme promotes activity-based instruction that aims to enable students to learn mathematical concepts and develop their mathematical thinking through engaging activities that allow them to find creative ways to solve mathematical problems—in marked contrast to conventional chalk-and-talk methods typically used in Indian schools. The programme also conducts community-led contests that convene stakeholders to witness the mathematical performance of school children. It is a collaboration between the state government and an Indian non-governmental organisation that includes a phased scale-up to all 44,000 government primary schools in the state.

We implemented a cluster-randomised trial to estimate the causal effect of this programme on student learning in mathematics. We assigned 98 administrative units (Gram Panchayats1) and their schools to either the programme or a control group. To isolate the effect of the pedagogical intervention, we conducted a second randomisation and removed the community contests from half of the treated Gram Panchayats. Our sample of 292 schools in two districts includes all students in Grade 4 in those schools at the start of the study.

We begin by documenting adherence to the treatment assignment, implementation fidelity and changes in teachers’ instructional practices. We find that: (i) all programme schools received the additional teaching inputs; (ii) almost all of the Grade 4 teachers in the programme schools received the programme’s training; (iii) after programme implementation, there were large differences in the pedagogical methods used by programme school teachers (relative to control group teachers); and (iv) the vast majority of the programme schools assigned to the community contest group participated in the contests. Any lack of programme impact is thus unlikely to be due to failure to implement the programme. We also find support for the study’s internal validity, including experimental balance and the absence of attrition bias.

We then present three sets of results. First, we show the intention-to-treat (ITT) effects of promoting activity-based instruction on student learning. After 13 months of the programme, we estimate an average impact on students’ mathematics skills of 0.12 standard deviations (SD) of the test score distribution, although this estimate is not statistically significant at conventional levels (p = 0.11). The programme raised girls’ maths scores by 0.18 SD (p < 0.05), but had no effect on boys (0.04 SD). We compare our estimates with those of the aforementioned systematic review of other rigorous studies that investigated the same pedagogical approach. We show that, while expert opinion (based on results from developed countries) strongly recommends this particular teaching practice, our findings can rule out the positive effects found in nearly all (26 out of 28) prior studies.

Second, we explore reasons for these muted impacts of activity-based instruction on student learning. Going beyond the experimental design of our study, we calculate school value-added models, both in terms of schools’ effectiveness in raising student learning and their effectiveness in improving student attitudes towards mathematics. We find that the teaching practices promoted by the programme are negatively correlated with school effects on test scores, and we find no association with school effects on student attitudes. We also show that these two types of school effects are orthogonal to each other.

Lastly, to explore whether the intervention’s (lack of) effectiveness depends on complementarities with another input, we investigate whether adding community contests improves impacts on child learning. Contrary to expectations, the estimate for the variant with community contests is almost zero (0.01 SD), and we can rule out that contests added sizeable learning increases over and above the variant with no contest (added effects of 0.06 SD or more are ruled out at 95% confidence). In fact, adding community contests to the intervention led to sizeable negative effects on classroom culture and created a less-supportive learning environment for students in the study periods after the community contests were conducted (−0.40 to −0.49 SD).

Our contribution is threefold. First, there is substantial emphasis on improving classroom practices, especially through activity-based instruction, as a way to improve the productivity of schooling. Our results show the limitations of such activity-based instruction as we rule out the positive effects found by a large body of related efficacy trials from the United States (Fuchs et al., 2021). Our results also diverge from the positive findings from a primary school intervention that aims to increase students’ curiosity in Turkey (Alan and Mumcu, 2022) and the promising results of another intervention that promotes ‘learning to learn’ principles and ‘conceptual learning’ in primary schools in Uganda (Ashraf et al., 2021).

Second, as prior evidence on, and the push for, activity-based instruction primarily builds on research from the United States and Europe, our study also speaks to the pitfalls of generalising across vastly different contexts. Few papers evaluate, at scale, a composite of ‘best practices’ that is widely recommended for adoption by education practitioners, yet has little evidence of ever working at scale outside of high-income settings. It is well known that such educational interventions frequently lose their effectiveness at scale if they are not adopted with fidelity (Vivalt, 2020). For example, Angrist and Meager (2022) review the effectiveness of a targeted instruction intervention (‘Teaching at the Right Level’) and document vastly different impacts depending on the programme’s delivery model and implementation fidelity. Muralidharan and Singh (2020) evaluate a large-scale educational reform that appeared effective based on administrative measures of compliance, but did not affect classroom practices. In contrast, our paper documents a case of high levels of implementation fidelity and impacts on the prescribed dimensions of instructional quality, yet muted impacts on student learning. Moreover, the baseline levels of school productivity and activity-based instruction are low in India, and so the conditions for generalisability were favourable in the given study environment (see Bates and Glennerster, 2017). This suggests that the given intervention failed to generalise to the context of Indian government schools.

Finally, we contribute to the economics of education literature on value-added models. While these models are increasingly common for developed countries, only a handful of published studies have employed them for less-developed countries (Andrabi et al., 2011; Singh, 2015; Araujo et al., 2016; Bau and Das, 2020). Whether in developed or developing countries, even fewer studies have been able to relate schools’ or teachers’ value added to detailed classroom observations of teaching practices (Araujo et al., 2016; Blazar and Kraft, 2017). Our analyses add to evidence that one such practice (activity-based instruction) may reduce student learning (Berlinski and Busso, 2017). Our study also adds to a nascent value-added literature that examines the multi-dimensionality of educational effects on assessment-based versus non-test outcomes (Blazar and Kraft, 2017; Jackson, 2018; Beuermann et al., 2022).

1. Background and Intervention

1.1. Context

Schooling in India is free and compulsory for ages 6 to 14. Elementary education runs from Grades 1–8, of which Grades 1–5 are ‘primary’ education (the focus of this study) and Grades 6–8 are ‘upper primary’. In 2018, India had 1,255,841 schools serving ‘primary’ grades, of which 69% (860,790) were managed by state and local governments (National Institute of Educational Planning and Administration, 2018).

In 2020, India passed its new National Education Policy (NEP), which codified the country’s shift in focus towards foundational numeracy and literacy learning in the primary grades. In terms of mathematics pedagogy, NEP and a subsequent national initiative (National Initiative for Proficiency in Reading with Understanding and Numeracy, ‘NIPUN Bharat’) identified ‘joyful and experiential learning’ as a focus area, advising that classroom interactions include ‘toys, games … to be used extensively for teaching through play/discovery/game/art/activity-based pedagogy’ (Ministry of Education, 2021, p.201). India now promotes such ‘experiential learning’, both nationally (Kumar and Kumari, 2022) and through state initiatives. For example, new programmes in Uttar Pradesh and Madhya Pradesh aim for students to learn concrete mathematical concepts first, before moving to abstract understanding; we investigate a similar state-wide initiative.

We implemented this study in the Indian state of Karnataka, which is an ideal context to conduct a state-wide proof of concept for education interventions before scaling up to the entire country. First, the state is large, ranking sixth in terms of area and eighth in population (Ministry of Home Affairs, 2012). Second, it exemplifies how high enrolment and additional inputs may not raise student learning. It has very high enrolment (over 99% of rural children ages 5–14 are in school), attendance (observed attendance of rural primary students and teachers is over 90%) and infrastructure (e.g., over 99% of rural primary schools have a library or dedicated reading corner) (ASER, 2018; National Institute of Educational Planning and Administration, 2018). Yet, primary students’ arithmetic skills rank Karnataka near the bottom of India’s states; for example, less than 20% of rural government school students in Grade 5 can do basic division (ASER, 2018). Third, other states often mimic Karnataka’s education policies; for example, Odisha recently adopted the programme evaluated in this paper.

1.2. Intervention

We conducted this study in co-operation with the Akshara Foundation, a large non-governmental organisation (NGO) dedicated to ensuring quality preschool and primary education in India. Founded in 2000, the organisation works with several state governments to support primary education in government-led schools. The Akshara Foundation’s Ganitha Kalika Andolana (GKA) intervention combines the provision of new instructional materials, related teacher training and community engagement to improve primary school students’ mathematics abilities. This subsection summarises the programme’s two main components (see Online Appendix B for more detail on the intervention).

The programme was started in 2011 for 249 government primary schools in Bangalore Rural district. Karnataka’s government has since committed to scaling it up to all of the state’s 44,000 government primary schools in a phased manner. In 2017, another Indian state, Odisha, began implementing GKA, expanding it to about 30,000 schools by 2020. We conducted our study during the state-wide scale up in Karnataka; we did not alter the intervention.

1.2.1. Teaching inputs for activity-based instruction and related training

The programme’s first component provides additional teaching inputs and related teacher training. This component seeks to refocus mathematics instruction on conceptual understanding rather than rote learning. Specifically, GKA provides a kit of teaching/learning materials (TLMs), and instructions to teachers to facilitate activity-based pedagogy that follows a ‘concrete-representational-abstract’ (CRA) model.2 The TLM kits include items such as an abacus, a set of shapes and measuring kits. Each item maps into mathematical concepts required by the state curriculum.

Expert teachers provide training to primary school teachers. These off-site training sessions are held during the state’s scheduled in-service teacher training, replacing its content; they are not additional training sessions, which keeps costs neutral. The training focuses on enabling teachers to create activities using the TLM kits. After this initial training, a block-level field coordinator supports the teachers as they implement this new teaching method.

1.2.2. Community contests

The second component is the community contests. These Gram Panchayat Mathematics Contests (‘GP contests’) convene stakeholders to observe students’ mathematical performance. They are intended to encourage parent engagement and community participation, which can pressure teachers to improve their teaching, thereby raising student learning.

Contests start with a maths test for students from any government primary school in the GP. After the test, participants discuss the GKA programme and other education issues, focusing on students’ learning and the quality of instruction. Next, the assessment results are announced, the top three students are recognised, and other education performance statistics are presented to community members. Following the contest, a letter is sent to local leaders and to the school’s School Development and Monitoring Committee (SDMC), summarising test scores for each participating school. GPs are free to decide whether they would like to hold a contest and, while the NGO initiates these contests in participating GPs, the GP and other local sources pay for all operational expenses.3 A GP holds at most one contest in any given school year.

1.2.3. Programme costs

We estimate that the average programme cost is US$7.40 per student per year across the two programme versions. The variant of the programme without GP contests costs US$6.80 per student; the variant with GP contests costs US$8.00 per student. The programme’s cost thus falls in the middle of the costs of interventions that have improved student learning in India’s government schools.4

2. Study Design

2.1. Sample

We implemented the study in two of Karnataka’s 30 districts: Tumkur and Vijayapura. We selected these two districts to maximise the study’s geographic spread and representativeness within the state.5 Our study includes only ‘higher primary schools’, which end in Grades 7 or 8. Before sampling, we excluded schools where the medium of instruction was not Kannada and schools with fewer than five students in Grade 4 in the previous school year (including those schools that did not teach Grade 4 at all). We also excluded Gram Panchayats (GPs) with fewer than three eligible schools. We first randomly sampled 98 GPs from these two districts. Within each GP, we randomly sampled three schools, yielding 294 schools. Two schools were removed, reducing our sample to 292 schools, after we discovered that they had no Grade 4 students; this removal occurred before randomisation into treatment and control schools.

The sampling strategy ensured that half of these GPs and schools were drawn from each of the two districts. In each district, we randomly selected 49 GPs using ‘probability proportional to size’ (PPS) sampling. Next, we randomly selected three schools from each of the 98 GPs. Within each GP, all schools had the same probability of being selected. Finally, we included all Grade 4 students who were enrolled in these sampled schools at baseline.
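The two-stage draw described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: it assumes systematic PPS sampling (the text does not say which PPS variant was used), takes the number of eligible schools as the size measure (also an assumption), and uses a hypothetical sampling frame.

```python
import random

def pps_systematic(sizes, n, rng):
    """Systematic PPS: draw n units with probability proportional to size.

    sizes maps unit id -> positive size measure. Assumes every size is
    smaller than the sampling interval, so no unit is drawn twice.
    """
    units = list(sizes)
    rng.shuffle(units)                    # random order before the systematic pass
    step = sum(sizes.values()) / n        # sampling interval
    start = rng.random() * step           # random start within the first interval
    targets = [start + k * step for k in range(n)]
    chosen, cumulative, idx = [], 0.0, 0
    for unit in units:
        cumulative += sizes[unit]
        while idx < n and targets[idx] < cumulative:
            chosen.append(unit)           # the target falls inside this unit's range
            idx += 1
    return chosen

def sample_schools(schools_by_gp, gps, k, rng):
    """Within each sampled GP, draw k schools with equal probability."""
    return {gp: rng.sample(schools_by_gp[gp], k) for gp in gps}

rng = random.Random(0)
# Hypothetical frame for one district: 200 GPs with 3 to 10 eligible schools each.
frame = {f"GP{j}": [f"GP{j}-S{m}" for m in range(rng.randint(3, 10))]
         for j in range(200)}
sizes = {gp: len(schools) for gp, schools in frame.items()}
sampled_gps = pps_systematic(sizes, 49, rng)              # 49 GPs per district
sampled_schools = sample_schools(frame, sampled_gps, 3, rng)  # 3 schools per GP
```

Running the same draw for the second district would give the 98 GPs and 294 schools reported in the text, before the two schools without Grade 4 students were removed.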

At baseline, 5,227 Grade 4 students were enrolled in the study’s 292 schools, of which 4,026 (77.0%) were present for the baseline data collection. This number is similar to other large-scale studies in India; for example, Goodnight and Bobde (2018) report a 73.1% student attendance rate for India’s government primary schools.

These 4,026 students are the study’s sample. At baseline, they were, on average, 9 years and 2 months old. About 53% were female.6 Of these students, 3,971 (98.6%) took the written baseline test, and 3,881 (96.4%) took the written and oral baseline tests (described in Section 2.3). We focus on students with both tests, but we also present robustness checks for the sample with only the written test and the full sample (with or without baseline tests).

To analyse intermediate outcomes, we interviewed subsamples of students and parents by randomly selecting (up to) eight students and parents per school. We drew new samples of students and parents for each survey round. More specifically, we conducted 1,924 student surveys in the first process monitoring round, 1,875 in the second round and 1,861 in the fourth round (we did not conduct student surveys in the third round). We conducted 1,967 parent interviews in round three (we did not conduct parent surveys in the other rounds).

2.2. Randomisation

To increase statistical power and ensure balance across treatment and control units, we conducted a stratified randomisation to assign the 292 schools to be treatment or control schools. Within each district, we used baseline test scores on the one-on-one test (described below) to create quadruplets of GPs with similar academic performance. For each stratum of four GPs, two were randomly selected to participate in the GKA programme, leaving the other two as ‘controls’. Thus, 49 GPs and their selected schools were assigned to receive the programme; the other 49 and their selected schools were ‘controls’.7 We repeated this randomisation procedure ten times and selected the one with the greatest balance (see Online Appendix D for details).
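A minimal sketch of this stratified assignment with re-randomisation is given below. The balance criterion (absolute difference in mean baseline scores between arms) and the GP scores are assumptions made for illustration; the criterion actually used is described in Online Appendix D. The example uses 48 hypothetical GPs so that every stratum holds exactly four units.

```python
import random
from statistics import mean

def assign_once(scores, rng):
    """Sort GPs by baseline score, form strata of four, treat two per stratum."""
    ordered = sorted(scores, key=scores.get)
    assignment = {}
    for start in range(0, len(ordered), 4):
        stratum = ordered[start:start + 4]
        treated = set(rng.sample(stratum, len(stratum) // 2))
        for gp in stratum:
            assignment[gp] = gp in treated
    return assignment

def imbalance(scores, assignment):
    """Assumed balance metric: absolute gap in mean baseline score between arms."""
    treated = [scores[g] for g in scores if assignment[g]]
    control = [scores[g] for g in scores if not assignment[g]]
    return abs(mean(treated) - mean(control))

def rerandomise(scores, draws, rng):
    """Draw several candidate assignments and keep the best-balanced one."""
    candidates = [assign_once(scores, rng) for _ in range(draws)]
    return min(candidates, key=lambda a: imbalance(scores, a))

rng = random.Random(1)
# Hypothetical GP-level baseline means for 48 GPs (a multiple of four).
scores = {f"GP{j}": rng.gauss(0.0, 1.0) for j in range(48)}
assignment = rerandomise(scores, draws=10, rng=rng)
```

Because the final assignment is selected on balance, inference that treats it as a single simple draw may be off, which is why the analysis later replicates the full procedure when computing randomisation-inference p-values.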

Finally, we randomised all 49 treatment GPs into two arms: one of 24 GPs with community contests, and one of 25 GPs without those contests. Both treatment arms received the kits and related training. Online Appendix Figure A2 depicts the study schools by treatment status. In Section 2.6, we check whether the randomisation led to comparable groups.

2.3. Data

2.3.1. Student achievement

We administered three rounds of standardised maths tests to the students in all sampled schools to obtain baseline, midline and endline assessments. These paper-based tests were administered to students in groups.8 Assessments had 30–35 multiple-choice items, which are mapped to four content domains (number sense, whole number operations, shapes and geometry, and data display, measurement and statistics) and two cognitive domains (knowing, and reasoning and applying). Test items are also mapped to the official state curriculum and include items one or two years below grade level. They have been administered in similar contexts in India for large-scale assessments. From these previous administrations, we used item response theory-based (IRT) characteristics to maximise the assessments’ test information. Students had a one-hour time limit; they typically took about 45 minutes to complete each test.

Due to its salience in India, we also administered the ‘ASER’ test of basic arithmetic skill (see ASER, 2018) to the full sample of students.9 These tablet-based tests were administered by trained enumerators. The tests are adaptive: they begin with two subtraction problems and, based on a student’s performance on these questions, either continue with more difficult (i.e., division) or easier (i.e., number recognition) questions. One-on-one test administration took, at most, ten minutes per student. We followed the ASER’s standard grading procedures, classifying test takers into five ordered ability levels: beginner, recognition of single-digit numbers, recognition of two-digit numbers, two-digit subtraction (with borrowing) and three-digit by one-digit division.

We estimate each student’s ability using a two-parameter logistic (2PL) IRT model. We used anchor items across test rounds (baseline, midline, endline) to link all rounds onto a common, continuous ability scale (Kolen and Brennan, 2004).10 We describe in more detail the test design and related validity evidence in Online Appendix C. The evidence confirms that the tests did not display floor or ceiling effects. It also indicates that our test items discriminate well for student ability and that the tests exhibit low levels of noise.
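For readers unfamiliar with the 2PL model, the sketch below shows its item response function and a simple grid-search maximum-likelihood estimate of ability. It treats the item parameters as known, whereas the study estimates them and links rounds through anchor items; the item parameters and response patterns here are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: P(correct) for ability theta,
    item discrimination a and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern given known item parameters."""
    total = 0.0
    for x, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if x else math.log(1.0 - p)
    return total

def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood ability estimate on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical item parameters (discrimination a, difficulty b).
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 0.5)]
theta_strong = estimate_theta([1, 1, 1, 0], items)   # three of four correct
theta_weak = estimate_theta([1, 0, 0, 0], items)     # only the easiest item correct
```

Items with a higher discrimination parameter carry more weight in the ability estimate, which is one reason a 2PL score can differ from a simple proportion-correct score.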

2.3.2. Intermediate outcomes

We collected data on three types of intermediate outcomes. The first is teachers’ instructional behaviours. After the programme’s implementation, we used unannounced classroom observation visits to measure instructional quality, time-on-task and instructional behaviours in treatment and control schools. These visits were scheduled to follow the study’s sample of students—not a given mathematics teacher—so we focused on the instruction these students actually received, regardless of whether their teachers changed over time. We conducted one round of these visits in the first school year (June 2018 to May 2019) and three additional rounds in the second school year (June 2019 to May 2020).

More specifically, we used a novel, standardised classroom observation instrument, developed by the World Bank, called ‘Teach’. It focuses on three broad domains of instructional quality—classroom culture, instruction and socio-emotional skills—as well as nine narrower subdomains.11 We pre-specified that we would expect to find impacts on three of the nine subdomains: teaching practices related to critical thinking, autonomy, and social and collaborative skills.12 In addition, we complement Teach with two ancillary data sources for teachers’ instructional behaviours: teacher surveys and surveys of subsamples of students.

The second type of intermediate outcomes concerns parental involvement and community engagement. The student interviews included questions on parental involvement in their child’s maths education. The teacher interviews elicited teachers’ perceptions of parental involvement, including when they last communicated with a parent. Interviews with the subsample of parents asked about their involvement in their child’s maths education. To measure community engagement, we asked headmasters about the activities of their schools’ SDMCs; these committees formalise community involvement in school management and school improvement efforts. We also asked headmasters about parents’ meetings with teachers.

The third type of intermediate outcomes is student attitudes towards mathematics. We used surveys of the subsample of students to measure their attitudes towards mathematics learning. We asked whether the student: (i) enjoys learning maths; (ii) is made nervous by maths; (iii) finds maths hard to understand; and (iv) finds maths harder than other subjects.

2.3.3. Implementation fidelity

To capture implementation fidelity in treatment schools, we use two sets of primary and secondary data: (i) data on teacher training and additional teaching inputs; and (ii) data on community contests (‘GP contests’).

Regarding the former, the Akshara Foundation provided us with administrative records on teachers’ participation in GKA training sessions. We augmented these data by surveying teachers on whether they were trained on how to use the teaching and learning materials, the availability and usage of those materials and their perceptions of the programme. Information on the availability and use of the GKA teaching and learning materials was also obtained from classroom observations and the school survey. Finally, we gathered administrative information on the Akshara Foundation’s monitoring efforts and (on-site) teacher re-trainings. Akshara requires its field staff to document all school visits through a mobile app; we used this information to count, for each school, the number of school visits by Akshara staff.

Turning to the community contests data, our research team attended all community contests (GP contests). During the contests, we recorded individual student attendance, including unique student IDs. At each contest, the research team also recorded parents’ attendance. The student survey questionnaire also asked the students whether they had participated in the GP contests.

2.3.4. Additional background information

In addition to measuring students’ skills, we collected their demographic information to use as additional covariates and to track students over the study’s multiple rounds of data collection. We also acquired additional administrative information for each school at baseline. In particular, we obtained data from official school report cards from the District Information System for Education (DISE), as well as data on each school’s village from India’s 2011 Census, using geographic information system (GIS) software to match each school’s location to its respective village.

2.4. Timeline

Online Appendix Figure A4 depicts the study’s timeline, for both programme implementation and data collection. The data collection began with the November 2018 baseline survey, followed by four rounds of process monitoring in 2019 (February, August, November and December), and midline (September 2019) and endline (February 2020) assessments.13 All students started the study in Grade 4, and a new school year began after the first round of process monitoring. We tracked individual students irrespective of whether they moved to Grade 5 (almost all did), yet we focused process monitoring rounds two, three and four on Grade 5 classrooms.

2.5. Empirical Strategy

We use specification (1) to estimate the effects of the programme’s promotion of activity-based instruction, disentangling them from, and comparing them with, the effects of the intervention variant that also includes community contests.

$$Y_{isgrt} = \alpha_r + \beta_{1t} T_{gr} + \beta_{2t} D_{gr} + \gamma_t Y_{isgr,t=0} + \delta X_{isgr,t=0} + \epsilon_{isgrt}. \tag{1}$$

Here, $Y_{isgrt}$ is the outcome of interest for student $i$ in school $s$, GP $g$ and randomisation stratum $r$, at time $t$. In our primary analysis, $Y_{isgrt}$ represents test scores. In our secondary analyses, $Y_{isgrt}$ is either a measure of sub-competencies or a potential mediating variable. The $\alpha_r$ terms are strata fixed effects, $T_{gr}$ is the treatment dummy for the programme variant without community contests, $D_{gr}$ is a dummy indicating the treatment GPs randomly assigned to contests, and $\epsilon_{isgrt}$ is the residual. To increase precision, all specifications include $Y_{isgr,t=0}$ and $X_{isgr,t=0}$ as covariates. Measured at baseline ($t = 0$), $Y_{isgr,t=0}$ is a student’s initial outcome of interest, and $X_{isgr,t=0}$ is a vector of baseline controls selected by a Lasso procedure on student age, gender, school-level DISE data and village-level census data. The $\beta_{1t}$ and $\beta_{2t}$ parameters capture the ITT effect for each programme variant, for follow-up round $t$. We also test whether $\beta_{2t}$ differs from $\beta_{1t}$.
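To make the structure of specification (1) concrete, the sketch below fits a stripped-down version by OLS on noiseless synthetic data. It is an illustration under stated assumptions only: it omits the Lasso-selected controls and the GP-clustered standard errors, and the coefficient values are invented for the example and are not the paper's estimates.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def design_row(stratum, n_strata, T, D, y0):
    """Strata dummies (alpha_r), then T, D and the baseline score."""
    fixed_effects = [1.0 if r == stratum else 0.0 for r in range(n_strata)]
    return fixed_effects + [float(T), float(D), y0]

# Invented parameters for the synthetic data (not the paper's estimates).
alpha, b1, b2, g = [0.0, 0.5, -0.5], 0.30, 0.20, 0.60
X, y = [], []
for r in range(3):
    for T, D in [(0, 0), (1, 0), (0, 1)]:      # control, materials only, contests
        for y0 in (-1.0, 0.0, 0.5, 1.5):
            X.append(design_row(r, 3, T, D, y0))
            y.append(alpha[r] + b1 * T + b2 * D + g * y0)

coef = ols(X, y)
beta1, beta2, gamma = coef[3], coef[4], coef[5]
added_effect_of_contests = beta2 - beta1       # contests arm relative to materials arm
```

Because the two treatment dummies are mutually exclusive, each coefficient compares one arm with the control group, and the difference between them is the quantity tested when asking whether contests add anything beyond the materials-only variant.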

We use specifications that allow for heterogeneous treatment effects by interacting potential moderators (e.g., student gender) with the treatment indicator.

We estimate ordinary least squares (OLS) regressions. For the ASER data, we create binary outcomes, so we estimate linear probability models. We cluster standard errors (SEs) at the GP level (see Abadie et al., 2023). To check robustness, we use randomisation inference to assess whether our re-randomisation procedure led to unexpected consequences (Young, 2019). In particular, we replicate our procedure for each of 5,000 iterations. We describe the statistical methods in more detail in Online Appendix D.
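The logic of randomisation inference can be sketched as follows. This is a simplified illustration: it reshuffles treatment labels within strata at the GP level and uses a difference in means as the test statistic, whereas the study replicates its full re-randomisation procedure (Online Appendix D). The strata, outcomes and effect size are hypothetical.

```python
import random
from statistics import mean

def diff_in_means(outcome, treated):
    """Treated-minus-control difference in mean GP-level outcomes."""
    t = [outcome[g] for g in outcome if treated[g]]
    c = [outcome[g] for g in outcome if not treated[g]]
    return mean(t) - mean(c)

def permute_within_strata(strata, treated, rng):
    """Reshuffle treatment labels across GPs inside each stratum."""
    reassigned = {}
    for members in strata:
        labels = [treated[g] for g in members]
        rng.shuffle(labels)
        reassigned.update(zip(members, labels))
    return reassigned

def ri_p_value(outcome, treated, strata, iterations, rng):
    """Share of placebo assignments with a statistic at least as extreme."""
    observed = abs(diff_in_means(outcome, treated))
    hits = 0
    for _ in range(iterations):
        placebo = permute_within_strata(strata, treated, rng)
        if abs(diff_in_means(outcome, placebo)) >= observed:
            hits += 1
    return hits / iterations

rng = random.Random(2)
# Hypothetical data: 12 strata of four GPs, two treated GPs per stratum.
strata = [[f"GP{4 * s + k}" for k in range(4)] for s in range(12)]
treated, outcome = {}, {}
for members in strata:
    for k, gp in enumerate(members):
        treated[gp] = k < 2
        outcome[gp] = rng.gauss(0.0, 1.0) + (3.0 if treated[gp] else 0.0)
p_value = ri_p_value(outcome, treated, strata, iterations=5000, rng=rng)
```

Permuting at the GP level mirrors the unit of assignment, so the resulting p-value does not rely on large-sample approximations for the clustered standard errors.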

2.6. Experiment Validity

2.6.1. Baseline balance

As shown in Table 1 and Online Appendix Table A1, randomisation led to three groups of schools that are balanced in terms of observable student characteristics at baseline. Of the 84 comparisons across the three experimental groups in these two tables, we detect only four statistically significant differences at the 5% significance level, which is well in line with what can be expected by chance. The main outcome variable (students’ overall maths score) is also balanced at baseline across the three groups.
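As a rough check on the 'expected by chance' statement, the sketch below computes how many rejections one would expect from 84 tests at the 5% level, and the probability of seeing at least four. It assumes the 84 comparisons are independent, which is a simplification since many of the outcomes are correlated.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

comparisons, alpha, observed = 84, 0.05, 4
# Expected number of false rejections under the null: 84 * 0.05 = 4.2.
expected_false_positives = comparisons * alpha
# Probability of at least four rejections if all nulls hold (independence assumed).
prob_at_least_observed = binom_tail(comparisons, alpha, observed)
```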

Table 1.

Student Characteristics at Baseline.

| Variable | N: Control (1) | N: Contests (2) | N: Materials (3) | Mean: Control (4) | Mean: Contests (5) | Mean: Materials (6) | Difference: Contests vs control (7) | Difference: Materials vs control (8) | Difference: Contests vs materials (9) |
|---|---|---|---|---|---|---|---|---|---|
| Student age (as of 31 December 2018) | 1,852 | 999 | 1,008 | 9.14 [0.54] | 9.15 [0.55] | 9.16 [0.58] | −0.00 (0.03) | 0.03 (0.03) | −0.03 (0.03) |
| Female | 1,862 | 1,002 | 1,017 | 0.53 [0.50] | 0.53 [0.50] | 0.52 [0.50] | 0.00 (0.02) | −0.01 (0.02) | 0.02 (0.03) |
| Maths Score (2pl, std.) | 1,862 | 1,002 | 1,017 | 0.01 [0.99] | −0.03 [0.96] | −0.07 [0.98] | −0.00 (0.05) | −0.07 (0.07) | 0.07 (0.07) |
| ASER >= 1-digit | 1,862 | 1,002 | 1,017 | 0.99 [0.10] | 0.98 [0.13] | 0.98 [0.13] | −0.01 (0.01) | −0.01 (0.01) | −0.00 (0.01) |
| ASER >= 2-digit | 1,862 | 1,002 | 1,017 | 0.90 [0.30] | 0.88 [0.33] | 0.91 [0.28] | −0.02 (0.02) | 0.01 (0.01) | −0.03** (0.01) |
| ASER >= subtraction | 1,862 | 1,002 | 1,017 | 0.33 [0.47] | 0.33 [0.47] | 0.32 [0.47] | 0.01 (0.02) | −0.02 (0.02) | 0.02 (0.02) |
| ASER = division | 1,862 | 1,002 | 1,017 | 0.09 [0.29] | 0.09 [0.29] | 0.10 [0.30] | 0.00 (0.01) | 0.01 (0.01) | −0.00 (0.02) |
| Maths, HOTS (2pl, std.) | 1,862 | 1,002 | 1,017 | 0.00 [0.99] | −0.03 [0.98] | −0.07 [0.99] | −0.00 (0.05) | −0.07 (0.08) | 0.07 (0.08) |
| Maths, LOTS (2pl, std.) | 1,862 | 1,002 | 1,017 | 0.01 [0.99] | −0.02 [0.95] | −0.07 [0.98] | 0.00 (0.05) | −0.08 (0.07) | 0.08 (0.06) |
| Data (prop.) | 1,862 | 1,002 | 1,017 | 0.37 [0.21] | 0.35 [0.21] | 0.35 [0.21] | −0.02 (0.01) | −0.02 (0.02) | 0.00 (0.02) |
| Geometry (prop.) | 1,862 | 1,002 | 1,017 | 0.48 [0.29] | 0.48 [0.28] | 0.46 [0.29] | 0.00 (0.02) | −0.02 (0.02) | 0.03 (0.02) |
| Number sense (prop.) | 1,862 | 1,002 | 1,017 | 0.60 [0.27] | 0.59 [0.28] | 0.59 [0.28] | −0.00 (0.01) | −0.01 (0.02) | 0.00 (0.02) |
| Whole number ops. (prop.) | 1,862 | 1,002 | 1,017 | 0.52 [0.29] | 0.52 [0.28] | 0.49 [0.28] | 0.01 (0.01) | −0.03 (0.02) | 0.04** (0.02) |
| Attrition at midline | 1,862 | 1,002 | 1,017 | 0.28 [0.45] | 0.29 [0.45] | 0.29 [0.45] | 0.00 (0.02) | 0.00 (0.02) | 0.00 (0.03) |
| Attrition at endline | 1,862 | 1,002 | 1,017 | 0.19 [0.40] | 0.23 [0.42] | 0.19 [0.39] | 0.03* (0.02) | −0.01 (0.02) | 0.04** (0.02) |

Notes: This table provides descriptive statistics for the study sample, by treatment status. ‘Contests’ refers to the full treatment; ‘Materials’ refers to the treatment without contests; ‘2pl, std.’ refers to the two-parameter logistic item response theory (IRT) model, standardised with respect to the control group at baseline; ‘prop.’ refers to the proportion of test questions answered correctly; ‘HOTS’ and ‘LOTS’ refer to higher- and lower-order thinking skills, respectively. Standard deviations in brackets; standard errors in parentheses (clustered at the Panchayat level). All estimations include randomisation strata fixed effects (FEs).

* p < 0.10, ** p < 0.05.

The overall attrition rate from baseline to midline is 28% for the control group, but this is reduced to 19% from baseline to endline. Although these attrition rates seem high, we find comparable rates in other studies (see Ghanem et al., 2020). At endline, attrition is slightly higher in the experimental group with community contests (by 3.2 percentage points) in comparison to the control group. However, as shown in Online Appendix Table A1, the non-attriting sample continues to be balanced on observable characteristics, across all three groups, at endline.14

2.6.2. Implementation fidelity and programme take-up

We observed virtually full compliance of GPs’ and schools’ random assignment to treatment arms, the only exception being one non-contest GP that received a contest. As shown in Online Appendix Figure A4, the one-week teacher training took place in January 2019, with a one-week refresher training provided in June 2019. Between the initial training and the midline assessment, we estimate an exposure of 19 weeks. The exposure until endline was 37 weeks.15 Our calculations, based on the official school calendar, but removing any days with school closures (e.g., due to local festivals and holidays, or due to floods), indicate that the effective number of days that schools were open over the study period was 215 days.

We consider three dimensions of implementation fidelity and programme take-up: (i) training and teacher perception of the programme; (ii) teaching inputs and take-up of materials; and (iii) community contests. In summary, although there are dimensions that can be improved in the future, we find that the programme was largely implemented as intended. Here, we summarise these indicators for the treatment group without contests (see Figure 1); implementation fidelity and take-up are similar in the group with contests (see Online Appendix Figure A5).

Fig. 1. Implementation Fidelity and Programme Take-up.


Notes: This figure depicts the percentage of teachers or schools that satisfy indicators of implementation fidelity and programme take-up, by experimental group. ‘Treatment’ refers to schools in the treatment group without contests.

For the training and perception dimension, we use both a headmaster and a teacher survey. The headmaster survey shows a high take-up rate: 92% of the treated schools actually participated in the GKA programme, whereas none of the control schools did. Participation in any training and workshops since 2017 was high for both treated (99%) and control (93%) schools, according to our fourth and last teacher survey. This is not surprising because the GKA trainings replace the existing government training schedule; therefore, we do not expect a large difference in the percentage of teachers receiving any type of training. However, specific GKA training was received by 81% of the Grade 4 maths teachers in the treatment group, with no GKA trainings administered to control-group teachers. Similarly, 87% of teachers in treated schools, and no teachers in control schools, reported having received training on how to use the GKA kits. As for on-site follow-up training and monitoring, NGO staff reported visiting 96% of the treatment schools at least once, and 75% of the treatment schools at least twice, over the study period (they did not visit control schools). Overall, 84% of maths teachers in treated schools perceived that the GKA programme had a large impact.

We also report on seven indicators related to teaching inputs and take-up of materials. Almost all (92%) treatment-group teachers reported having received the GKA kit. More teachers in treated schools (37%) conduct group activities on a daily basis than teachers in control schools (15%). Most (59%) treatment-group teachers reported using the GKA kit for maths classes in every class. Classroom observations using the Teach instrument reveal that 40% of teachers in treated schools conduct group activities during class. This is a 29 percentage-point difference compared to control schools. While 13% of teachers in control schools used teaching and learning materials (TLMs) in class, the proportion is much larger for teachers in treated schools (71%). In almost all of these cases, when a treatment teacher used teaching and learning materials, the TLMs had been provided by the GKA programme (68% overall, or 96% of the treatment-group teachers who used TLMs).

Finally, we investigate whether the GP contests were implemented as intended. These events took place between August 2019 and January 2020, with 24 days on which contests were held. In Online Appendix Figure A5, we focus on schools assigned to the kit-plus-contests treatment arm (in comparison to control-group schools). The GP contest survey shows that 86% of the treated-with-contests schools participated in the GP contests. The headmaster survey indicates that 33% of the schools participating in the contests received a report card after the contest. Our last indicators use GP contest data and reveal that only 1.4% (14 out of 1,018) of parents attended the GP contests,16 although 73% of students participated in the contests.

3. Results

Here, we report whether the standalone intervention (without contests) had an impact on intermediate outcomes (especially the pre-registered dimension of instructional quality), whether it improved student learning, and how these effects compare to those in prior studies. We document muted impacts and explore explanations for this finding in Section 4. We investigate whether adding community contests improved the standalone intervention in Section 5.

3.1. Effects on Classroom Instruction and Other Intermediate Outcomes

3.1.1. Effects on classroom instruction

In Figure 2, we present the intervention’s effects on teaching quality.17 We document a positive effect of 0.11 SD on the overall index of teaching quality, but we cannot conclude with confidence that the coefficient is in fact different from zero (p = 0.12). This finding for the overall index masks a positive impact of 0.16 SD on the pre-specified dimension of teaching behaviours related to activity-based instruction (e.g., whether the teacher promotes student autonomy) and a positive impact of 0.17 SD on the dimension of teaching that is expected to promote students’ socio-emotional skills (e.g., whether the teacher promotes collaborative skills). We do not find statistically significant effects on the dimension of teaching related to classroom culture (e.g., whether the teacher creates a supportive learning environment) and instructional quality (e.g., whether the teacher provides high-quality feedback). In summary, we find positive effects on the index of the three dimensions of teaching related to activity-based instruction and no notable impacts on the remaining subdomains.

Fig. 2. ITT Effects on Instruction and Intermediate Outcomes.


Notes: This figure depicts the programme’s ITT effects on classroom instruction and other intermediate outcomes. Panel (a) depicts the ITT effects on instructional quality. Panel (b) depicts the ITT effects on parent self-reported involvement and teacher-reported parental involvement, as well as student attitudes towards maths. Thick and thin horizontal bars show 90% and 95% confidence intervals, respectively.

3.1.2. Effects on student attitudes and parental involvement

In Figure 2, we document small positive effects on the study’s index of student attitudes towards mathematics (e.g., whether students enjoy the subject or, in contrast, whether mathematics makes them nervous). The point estimate shows a 0.08 SD improvement, but the coefficient is not significantly different from zero at conventional levels. The standalone programme (without community contests) did not target parents and, perhaps unsurprisingly, both parent- and teacher-reported parental involvement are similar to those in control schools.

3.2. Effects on Student Learning

3.2.1. Average effects

Panel A of Table 2 summarises results for the study’s main outcome. In the period from baseline to midline, control-school students’ maths scores improved by 0.13 SD (p < 0.10). In the period from baseline to endline, control-school students’ maths scores improved by 0.40 SD (p < 0.01). At midline, the difference across students in treatment schools and control schools is statistically indistinguishable from zero (p > 0.10); that is, conditional on the vector of covariates, we cannot reject that treatment-school students learned an equal amount when compared to their peers in control-group schools. At endline, 13 months after the launch of the intervention, we find positive effects that fall just short of statistical significance at the 10% level (0.12 SD, p = 0.11).

Table 2.

ITT Effects on Student Learning.

| Outcome | Control group: baseline mean (1) | Control group: gain to midline (2) | Control group: gain to endline (3) | ITT effect at midline (4) | ITT effect at endline (5) |
| --- | --- | --- | --- | --- | --- |
| Panel A: Effects on main outcome | | | | | |
| Written test | 0.04 [0.97] | 0.13* (0.07) | 0.40*** (0.08) | −0.02 (0.07) | 0.12 (0.07) |
| Panel B: Effects on ASER test | | | | | |
| ASER >= 1-digit | 0.99 [0.09] | 0.01 (0.00) | 0.01*** (0.00) | −0.00 (0.00) | 0.00 (0.00) |
| ASER >= 2-digit | 0.90 [0.30] | 0.06*** (0.01) | 0.07*** (0.01) | −0.02* (0.01) | −0.02** (0.01) |
| ASER >= subtraction | 0.34 [0.47] | 0.09*** (0.02) | 0.24*** (0.01) | −0.00 (0.03) | 0.02 (0.03) |
| ASER = division | 0.09 [0.29] | 0.01 (0.01) | 0.11*** (0.02) | −0.02 (0.01) | 0.02 (0.02) |
| Panel C: Effects by cognitive domain | | | | | |
| Higher-order | 0.04 [0.98] | 0.08 (0.08) | 0.27*** (0.08) | −0.01 (0.07) | 0.08 (0.07) |
| Lower-order | 0.04 [0.98] | 0.10 (0.07) | 0.31*** (0.07) | −0.01 (0.07) | 0.14** (0.07) |
| Panel D: Effects by content domain | | | | | |
| Data | 0.38 [0.21] | | | −0.01 (0.02) | 0.02 (0.02) |
| Geometry | 0.49 [0.29] | | | 0.01 (0.02) | 0.05*** (0.01) |
| Number sense | 0.61 [0.27] | | | 0.00 (0.02) | 0.03* (0.02) |
| Whole number operations | 0.52 [0.29] | | | −0.00 (0.02) | 0.03 (0.02) |

Notes: This table provides descriptive statistics for the control group (column 1), control-group gains to midline (column 2), control-group gains to endline (column 3), the difference across treatment and control students at midline (column 4) and the difference across treatment and control students at endline (column 5). Outcomes in Panels A and C are standardised with respect to the control group at baseline. Outcomes in Panels B and D are proportions ([0,1]). In column 1, the sample consists of students with a written test score at endline. Standard deviations in brackets; standard errors in parentheses (clustered at the Panchayat level). All estimations include randomisation strata fixed effects (FEs) and a vector of control variables selected via Lasso.

* p < 0.10, ** p < 0.05, *** p < 0.01.

The remaining panels of Table 2 provide secondary results. First, the programme did not lead to notable improvements on the ASER test (see Panel B). Second, the small programme impacts at endline are driven by positive effects on lower-order thinking skills (0.14 SD, p < 0.05; see Panel C) and on geometry questions (5 percentage points, p < 0.01; see Panel D).

3.2.2. Heterogeneous effects

We further investigate whether the effects on students’ maths learning differ for three different (pre-specified) subgroups of students. We provide results by gender, by students’ performance on the written baseline test (by tercile) and by district (Bijapur versus Tumkur). Table 3 provides the intention-to-treat effects on the study’s main outcome measure. Table 3 shows that positive programme effects are entirely driven by improvements among girls. For girls, we find significant improvements of 0.18 SD (p < 0.05). In contrast, for boys, relative to girls, these effects are 0.15 SD lower (p < 0.10). That is, for boys, coefficients are very close to zero, and they are statistically insignificant. We do not observe clear patterns of heterogeneous effects for the remaining two subgroups of students.

Table 3.

Heterogeneity in ITT Effects on Student Learning.

| Subgroup | Control group: baseline mean (1) | Control group: gain to midline (2) | Control group: gain to endline (3) | ITT effect at midline (4) | ITT effect at endline (5) |
| --- | --- | --- | --- | --- | --- |
| Panel A: By gender | | | | | |
| Female | 0.12 [0.98] | 0.09 (0.07) | 0.31*** (0.08) | 0.01 (0.07) | 0.18** (0.09) |
| Male | −0.05 [0.96] | 0.19** (0.09) | 0.51*** (0.08) | −0.06 (0.08) | 0.04 (0.08) |
| Male vs female | −0.13* (0.07) | 0.10* (0.05) | 0.20*** (0.06) | −0.07 (0.06) | −0.15* (0.08) |
| Panel B: By baseline learning level | | | | | |
| Bottom tercile | −1.06 [0.55] | 0.61*** (0.05) | 0.96*** (0.04) | −0.10 (0.10) | 0.12 (0.10) |
| Middle tercile | 0.01 [0.25] | 0.20*** (0.04) | 0.41*** (0.04) | −0.03 (0.09) | 0.06 (0.08) |
| Top tercile | 1.05 [0.49] | −0.32*** (0.04) | −0.10** (0.04) | 0.07 (0.08) | 0.18* (0.10) |
| Top vs bottom tercile | 2.07 (1.48) | −0.94*** (0.09) | −1.06*** (0.11) | 0.17 (0.12) | 0.06 (0.12) |
| Panel C: By district | | | | | |
| Bijapur | −0.04 [0.76] | 0.10** (0.05) | 0.30*** (0.04) | −0.11 (0.09) | 0.16 (0.11) |
| Tumkur | 0.19 [0.04] | 0.18*** (0.06) | 0.58*** (0.05) | 0.10 (0.12) | 0.07 (0.12) |
| Tumkur vs Bijapur | 0.22* (0.13) | 0.08 (0.14) | 0.27** (0.14) | 0.21 (0.15) | −0.09 (0.17) |

Notes: This table provides descriptive statistics for the control group (column 1), control-group growth to midline (column 2), control-group growth to endline (column 3), the difference across treatment and control students at midline (column 4) and the difference across treatment and control students at endline (column 5). The outcome is students’ overall maths score, standardised with respect to the control group at baseline. In column 1, the sample consists of students with a written test score at endline. Standard deviations in brackets; standard errors in parentheses (clustered at the Panchayat level). All estimations include randomisation strata fixed effects (FEs) and a vector of control variables selected via Lasso.

* p < 0.10, ** p < 0.05, *** p < 0.01.

3.2.3. Connecting the results to prior evidence

Findings that can rule out substantial impacts can be particularly informative if they challenge expert opinion. To assess whether this is the case for our paper, we compare our findings with those from a recent systematic review of rigorous education studies conducted in developed countries on how to assist primary school students who struggle with mathematics (Fuchs et al., 2021). This review strongly supports the pedagogical approach promoted by India’s GKA programme, whereby students are prompted to move from concrete and semi-concrete to abstract representations of mathematical concepts. From their assessment of 28 studies, the review’s expert panel found that there is strong evidence in support of this pedagogical approach. They concluded that there is a preponderance of evidence of positive effects, with strong external validity, which provides a high degree of confidence that this pedagogical practice is effective.

In Figure 3, we compare the findings from the 28 studies identified by this systematic review to our findings. The panel on the left plots effect sizes against students’ exposure to an intervention (in weeks), while the panel on the right plots effect sizes against each study’s sample size. This comparison suggests that all but two of the previously reported effects are outside the confidence intervals that we find for the treatment effects of the GKA programme (at midline and at endline). This occurs even though, at endline, this intervention had provided the longest student exposure to the programme. The comparison also shows that previous studies predominantly relied on small-sample efficacy trials (with a median sample size of 180 students). Such small sample sizes will yield significant results only for large impacts. If there is a tendency for insignificant results not to be published, then this could explain why these 28 studies found large impacts: perhaps many other studies found smaller impacts, but were never published.
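A back-of-the-envelope calculation illustrates why a trial with 180 students can reliably detect only large impacts. The sketch below is not taken from the reviewed studies; it assumes individual-level randomisation, two equally sized arms, a unit-variance outcome, no covariate adjustment, a two-sided 5% test and 80% power. The function name `minimum_detectable_effect` is hypothetical.

```python
from statistics import NormalDist


def minimum_detectable_effect(n_total, alpha=0.05, power=0.80):
    """Approximate minimum detectable effect (in SD units) for a two-arm trial.

    Assumes equal allocation, individual-level randomisation, a unit-variance
    outcome and no covariates, so the SE of the difference in means is sqrt(4 / n).
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    standard_error = (4 / n_total) ** 0.5
    return multiplier * standard_error


# Median sample size of the reviewed studies, as reported in the text.
print(round(minimum_detectable_effect(180), 2))  # prints 0.42
```

Under these assumptions, a study of the median size would be powered only for effects of roughly 0.4 SD or more, which is consistent with the concern that smaller estimated impacts from similar trials would rarely reach significance.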

Fig. 3. Comparison With Other Studies.


Notes: This figure compares the study’s treatment effects with those from other elementary-grade studies of the same pedagogical approach, as identified in a recent systematic review by the What Works Clearinghouse (Fuchs et al., 2021). Each dot represents one study; we average effect sizes if a study reports on effects for multiple outcomes (e.g., for whole numbers computation and whole numbers magnitude understanding). Vertical bars show 95% confidence intervals; ‘ruled out’ refers to effect sizes that are greater than the larger (endline) upper bound. The panel to the left shows the exposure time on the x-axis (in weeks); the panel to the right shows the study’s sample size on the x-axis.

4. Exploring Reasons for Muted Impacts

Teaching is multi-dimensional and, in developed countries, some teaching practices that improve students’ performance on written assessments have been found to be unrelated to, or even negatively related to, other non-test outcomes (Blazar and Kraft, 2017; Jackson, 2018; Beuermann et al., 2022). Here, we explore whether schools’ ability to improve test scores is correlated with the particular pedagogical approach promoted by the GKA programme. We also explore whether schools’ ability to increase test scores is correlated with their ability to improve students’ attitudes towards mathematics. These analyses are exploratory, going beyond our pre-registered report and beyond the study’s experimental design. Yet, we believe that they suggest an explanation for why the programme did not lead to substantive impacts on student learning, despite its positive effects on the pre-specified dimensions of teaching practices and its positive (albeit insignificant) effects on student attitudes towards mathematics.

We begin by estimating value-added models of schools’ ability to improve test and non-test outcomes. We estimate (2), a school fixed effects model

$$Y_{isg} = \alpha_{sg} + \delta X_{isg} + \epsilon_{isg}, \qquad (2)$$

where $Y_{isg}$ is an outcome measure (test scores at endline or attitudes towards mathematics) for student $i$ in school $s$ and Gram Panchayat $g$, $\alpha_{sg}$ is a school fixed effect, $X_{isg}$ is a vector of covariates measured at baseline, including baseline test scores, and $\epsilon_{isg}$ is an independent and identically distributed error term.

To account for the potential sorting of students into schools, other studies in the value-added literature usually estimate classroom fixed effects relative to the school average (e.g., Chetty et al., 2011; Araujo et al., 2016; Bau and Das, 2020). In our context, schools usually have only one classroom and one teacher per grade; therefore, we can estimate only school fixed effects relative to the Gram Panchayat average (accounting for the potential sorting of students into Gram Panchayats). This demeaned school effect, denoted by $\lambda_{sg}$, is shown in (3)

$$\lambda_{sg} = \alpha_{sg} - \frac{\sum_{s=1}^{S_g} N_{sg}\,\alpha_{sg}}{\sum_{s=1}^{S_g} N_{sg}}, \qquad (3)$$

where $S_g$ is the number of schools in a Gram Panchayat (usually three schools) and $N_{sg}$ is the number of students in school $s$ in Gram Panchayat $g$.

Let $V(\lambda_{sg})$ be the contribution of (variation in) school quality to (variation in) student-level outcomes. $V(\hat{\lambda}_{sg})$ may overestimate $V(\lambda_{sg})$ due to sampling error. Therefore, following Chetty et al. (2011), we apply the shrinkage procedure shown in (4)

$$V(\lambda_{sg}) = V(\hat{\lambda}_{sg}) - E\left[\frac{\sum_{d=1}^{S_g} N_{dg} - N_{sg}}{N_{sg}\sum_{d=1}^{S_g} N_{dg}}\,\sigma^2\right], \qquad (4)$$

where $\sigma^2$ is the within-school variance of the student-level residual $\epsilon_{isg}$.18
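The demeaning in (3) and the sampling-error correction in (4) can be sketched in a few lines. This is an illustration under stated assumptions rather than the estimation code used in the study: it takes the estimated school effects, school sizes and Gram Panchayat identifiers as given, and it assumes that the raw variance is the unweighted variance across schools. The function names `demean_within_gp` and `corrected_variance` are hypothetical.

```python
from statistics import mean, pvariance


def demean_within_gp(alpha, n_students, gp):
    """Equation (3): school effect minus the student-weighted average of its Gram Panchayat."""
    totals, weighted = {}, {}
    for a, n, g in zip(alpha, n_students, gp):
        totals[g] = totals.get(g, 0) + n
        weighted[g] = weighted.get(g, 0.0) + n * a
    return [a - weighted[g] / totals[g] for a, g in zip(alpha, gp)]


def corrected_variance(lam_hat, n_students, gp, sigma2):
    """Equation (4): raw variance of the demeaned effects minus the mean sampling-error variance.

    Assumption: the raw variance is taken as the unweighted variance across schools.
    """
    totals = {}
    for n, g in zip(n_students, gp):
        totals[g] = totals.get(g, 0) + n
    # Sampling-error variance of each demeaned effect: sigma^2 * (N_g - N_s) / (N_s * N_g).
    noise = mean(
        (totals[g] - n) / (n * totals[g]) * sigma2
        for n, g in zip(n_students, gp)
    )
    return pvariance(lam_hat) - noise
```

The correction term shrinks towards zero as schools become larger, because the estimated school effects are then measured with less sampling error.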

Our results point to substantial differences in schools’ ability to boost test scores. We find that, corrected for sampling error, a 1 SD increase in school quality is associated with a 0.37 SD increase in student learning. In contrast, schools’ value-added for the non-test outcome is smaller; we find that a 1 SD increase in school quality (defined with respect to student attitudes towards mathematics) is associated with a 0.19 SD increase in student attitudes towards mathematics.

Next, we correlate $\lambda_{sg}$ with teacher characteristics and teaching practices (focusing on the index of the three pre-specified subdimensions of practices associated with the GKA programme). Table 4 shows the results of regressing the estimates of $\lambda_{sg}$ for student test scores on a vector of teacher characteristics, the overall Teach index and the subdimensions of teaching practices.19 Column (1) shows that observable teacher characteristics can explain only a very small fraction of the variance of student learning. Column (3) shows that the pre-specified measure of teaching practices is negatively related to student learning (and none of the other subdimensions is distinguishable from zero, at conventional significance levels).20 Online Appendix Table A3 repeats this analysis for student attitudes. We do not find a significant relationship between the pre-specified dimension of teaching and attitudes; however, a more welcoming classroom culture appears to be positively related to student attitudes towards mathematics.

Table 4.

Regressions of School Value-Added on Teacher Characteristics and Instructional Quality.

| Regressor | (1) | (2) | (3) | (4) | (5) | (6) |
| --- | --- | --- | --- | --- | --- | --- |
| Teach global | | −0.03 (0.03) | | | | |
| Pre-specified skills | | | −0.05** (0.02) | | | |
| Classroom culture | | | | −0.02 (0.03) | | |
| Instruction | | | | | −0.02 (0.03) | |
| Socio-emotional skills | | | | | | −0.02 (0.02) |
| Teacher age | 0.00 (0.00) | −0.00 (0.00) | −0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) |
| Teacher is female | 0.01 (0.06) | 0.00 (0.06) | 0.00 (0.06) | 0.01 (0.06) | 0.01 (0.06) | 0.01 (0.06) |
| Teacher years at school | −0.01 (0.00) | −0.00 (0.00) | −0.00 (0.00) | −0.00 (0.00) | −0.00 (0.00) | −0.00 (0.00) |
| Teacher holds teaching degree | 0.04 (0.09) | 0.04 (0.09) | 0.04 (0.09) | 0.04 (0.09) | 0.04 (0.09) | 0.04 (0.09) |
| R² | 0.00 | 0.01 | 0.02 | 0.01 | 0.01 | 0.01 |

Notes: This table shows results from regressions of school value-added on teacher characteristics and measures of instructional quality. The dependent variable is the school’s fixed effect’s deviation from the Gram Panchayat mean, from a regression of student endline scores on school fixed effects, baseline scores and a vector of control variables selected via Lasso (n = 292). Regressions control for the treatment indicators (results not shown). Standard errors in parentheses (clustered at the Gram Panchayat level).

** p < 0.05.

Thereafter, we investigate the correlation between the estimates of $\lambda_{sg}$ for student test scores and $\lambda_{sg}$ for student attitudes. In Online Appendix Figure A6 we show that these two dimensions are unrelated to each other, with a correlation coefficient of 0.03 (p = 0.56).21

Finally, recall that we cannot account for the systematic sorting of better-performing students to particular schools within Gram Panchayats, nor for the adoption of better teaching practices in classrooms with higher baseline scores. Online Appendix Table A4, which shows that several dimensions of teacher quality are positively correlated with students’ baseline test scores, suggests that this phenomenon exists. Yet, note that better baseline scores correlate with better instruction and greater adoption of activity-based instruction. We expect these schools to be more productive than others. Therefore, we interpret our finding of a negative relationship between school value-added and activity-based instruction as a conservative estimate of (the absolute value of) the true association.

5. Do Community Contests Add Value?

Recall that the randomisation of treatment GPs to contests occurred after data collection Round 1, and the launch of contests approximately coincided with data collection in Round 2; only two treatment Gram Panchayats (and their six schools) received the contest prior to Round 2 data collection (see Online Appendix Figure A4). This allows us to observe the causal effect of adding the community contests to the intervention across the study period: in Rounds 1 and 2, we would not expect any differences across the two treatment groups, but effects may materialise after Round 2. As shown in Table 5, negative effects of adding community contests to the intervention coincide with this timeline. In Rounds 3 and 4, we find large negative effects on the overall quality of instruction (between 0.26 and 0.30 SD), and in particular on the dimension of classroom culture (between 0.30 and 0.48 SD). As shown in the table’s even-numbered columns, these results hold when we add Round 1’s school-level Teach scores as a covariate in the estimation of effects in Rounds 3 and 4. Online Appendix Table A5 does the same for each of the nine underlying Teach indicators.

Table 5.

ITT Effects of Adding Community Contests to the Intervention on Instructional Quality.

| Outcome | Round 2: All rounds (1) | Round 2: Round 1 controls (2) | Round 3: All rounds (3) | Round 3: Round 1 controls (4) | Round 4: All rounds (5) | Round 4: Round 1 controls (6) | Rounds 3–4 (pooled): All rounds (7) | Rounds 3–4 (pooled): Round 1 controls (8) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Teach global (standardized) | 0.14 (0.13) | 0.07 (0.14) | −0.30** (0.12) | −0.38*** (0.13) | −0.26** (0.13) | −0.30** (0.12) | −0.28*** (0.09) | −0.34*** (0.08) |
| Pre-specified skills (standardized) | 0.25** (0.12) | 0.26** (0.13) | −0.15 (0.14) | −0.16 (0.14) | −0.02 (0.14) | 0.00 (0.14) | −0.09 (0.11) | −0.08 (0.10) |
| Classroom culture (standardized) | 0.18 (0.15) | 0.10 (0.17) | −0.48*** (0.14) | −0.59*** (0.14) | −0.30* (0.16) | −0.38** (0.16) | −0.40*** (0.09) | −0.49*** (0.09) |
| Instruction (standardized) | −0.05 (0.13) | −0.11 (0.13) | −0.15 (0.12) | −0.21* (0.12) | −0.14 (0.17) | −0.15 (0.15) | −0.15 (0.10) | −0.18* (0.09) |
| Socio-emotional skills (standardized) | 0.19 (0.12) | 0.20* (0.12) | −0.02 (0.15) | −0.03 (0.14) | −0.07 (0.15) | −0.05 (0.15) | −0.04 (0.11) | −0.04 (0.10) |

Notes: This table presents the ITT effects of adding contests to the programme, on instructional quality. The first row reflects effects on the overall Teach index; the following rows reflect effects on the pre-specified index and on the three subdimensions. Randomisation of treatment-group GPs to the variant with versus without contests occurred in July 2019, after Round 1 had been completed; the first contests started around the time of Round 2 (compare with the study timeline depicted in Online Appendix Figure A4). The estimation sample consists of 1,615 classroom observation ratings. Odd-numbered models (‘All rounds’) include all ratings and estimate round-specific effects with interaction terms; even-numbered models (‘Round 1 controls’) drop Round 1 observations and use their ratings as school-level controls.

* p < 0.10, ** p < 0.05, *** p < 0.01.

At the same time, the contests failed to increase parental engagement. Perhaps unsurprisingly, as hardly any parents attended these events, Online Appendix Figure A7 shows that parental engagement remained unaffected. In addition, the effect on student attitudes of the programme with contests is very similar to (and statistically indistinguishable from) the effect of the programme variant without contests.

While the adverse effects on instructional quality did not translate into reduced learning outcomes, we can rule out that adding the contests led to improvements in student learning. Online Appendix Table A6 shows that the ITT effects of being assigned to the augmented programme variant that includes the contests are close to zero. Coefficients for a comparison with the standalone version of the programme are negative (−0.10), and under a directed hypothesis (one-sided test), the 95% confidence interval suggests that we can reject positive effects of 0.06 SD or higher.

6. Robustness Checks

We subject the study’s main findings to a series of robustness checks. Table 6 presents their results. The outcome of interest is students’ overall maths score at endline, standardised with respect to the control group at baseline.

Table 6.

Robustness of Results at Endline.

| Estimate | Attrition: IPW (1) | Attrition: Lee (lower) (2) | Attrition: Lee (upper) (3) | Sample definition: Any BL (4) | Sample definition: Any BL, written EL (5) | Sample definition: No contamination (6) | Sample definition: Complete strata (7) | Outcome: Written only (8) | Re-randomisation: Rand. inference (9) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Average effect | 0.12* (0.07) | 0.08 (0.07) | 0.14** (0.07) | 0.12 (0.07) | 0.12 (0.07) | 0.11 (0.08) | 0.13* (0.07) | 0.12* (0.07) | 0.12 [0.28] |
| Effect among girls | 0.20** (0.09) | 0.14* (0.08) | 0.20** (0.09) | 0.18** (0.09) | 0.18** (0.09) | 0.17* (0.09) | 0.20** (0.09) | 0.18** (0.09) | 0.18 [0.14] |
| Effect among boys | 0.03 (0.08) | 0.02 (0.07) | 0.06 (0.07) | 0.04 (0.08) | 0.04 (0.08) | 0.02 (0.08) | 0.05 (0.08) | 0.05 (0.08) | 0.04 [0.74] |
| Average effect with contests | 0.01 (0.09) | 0.00 (0.09) | 0.01 (0.09) | 0.02 (0.09) | 0.02 (0.09) | 0.02 (0.10) | 0.03 (0.09) | 0.01 (0.09) | 0.01 [0.91] |

Notes: This table presents robustness checks for the paper’s main findings at endline. Columns (1) to (3) investigate robustness to attrition. We present inverse probability-weighted (IPW) estimates and Lee (2009) bounds. Columns (4) to (7) investigate robustness to alternative sample definitions. We present results for the study sample of students with any baseline test, the study sample of students with any baseline test and at least a written endline test (but not necessarily an oral endline test), a sample where we remove a randomisation stratum with contamination (one treatment school of the group without events received an event), and a sample of strata where no schools had to be dropped after baseline (two schools had zero attendance at baseline). The outcome is students’ overall maths score, standardised with respect to the control group at baseline, except in column (8). Column (8) investigates robustness to measurement. We present results for an outcome measure that uses written test items only, ignoring students’ performance on the oral test. Standard errors in parentheses (clustered at the Panchayat level). Column (9) investigates robustness to the re-randomisation procedure used for treatment assignment. We present RI-based p-values (in brackets), where we repeated the same re-randomisation procedure in 5,000 RI iterations. All estimations include randomisation strata fixed effects (FEs) and a vector of control variables selected via Lasso.

* p < 0.10, ** p < 0.05.

Columns (1) to (3) in Table 6 investigate robustness to attrition. We present inverse probability-weighted estimates and Lee (2009) bounds. Columns (4) to (7) investigate robustness to alternative sample definitions. We present results for the study sample of students conditional on participating in any baseline test, the study sample of students conditional on participating in any baseline test and at least a written endline test (but not necessarily an oral endline test), a sample where we remove a randomisation stratum with contamination (one treatment school in the group without community contests received a contest) and a sample that excludes the two strata in which a school was dropped after baseline (two schools had zero attendance at baseline). Column (8) investigates robustness to measurement decisions; we present results for an outcome measure that uses written test items only, ignoring students’ performance on the oral test. Finally, we investigate robustness to the re-randomisation procedure used for treatment assignment. Column (9) presents randomisation inference-based (RI) p-values, where we repeated the same re-randomisation procedure in each of 5,000 RI iterations.
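The trimming idea behind Lee (2009) bounds can be sketched as follows. This is a simplified illustration, not the specification behind Table 6: it bounds an unadjusted difference in means, without strata fixed effects, covariates or clustered standard errors. The function name `lee_bounds` and its arguments are hypothetical.

```python
from statistics import mean


def lee_bounds(treated_obs, n_treated, control_obs, n_control):
    """Trimming bounds on a difference in means, in the spirit of Lee (2009).

    treated_obs / control_obs: outcomes of students who were not lost to attrition.
    n_treated / n_control: number of students originally assigned to each arm.
    """
    keep_t = len(treated_obs) / n_treated
    keep_c = len(control_obs) / n_control
    t, c = sorted(treated_obs), sorted(control_obs)
    if keep_t >= keep_c:
        # The treated arm retains more students: trim its excess share.
        k = round(len(t) * (keep_t - keep_c) / keep_t)
        lower = mean(t[:len(t) - k]) - mean(c)  # drop the k highest treated outcomes
        upper = mean(t[k:]) - mean(c)           # drop the k lowest treated outcomes
    else:
        # The control arm retains more students: trim its excess share.
        k = round(len(c) * (keep_c - keep_t) / keep_c)
        lower = mean(t) - mean(c[k:])           # drop the k lowest control outcomes
        upper = mean(t) - mean(c[:len(c) - k])  # drop the k highest control outcomes
    return lower, upper
```

When both arms retain the same share of students, nothing is trimmed and the two bounds coincide with the simple difference in means.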

In general, our point estimates remain remarkably similar across all robustness checks; we are confident that they are not substantially affected by attrition, the study’s sample definition or our choices in constructing the summary measure of mathematics learning. However, the precision of our findings is somewhat reduced for the RI-based estimate; the p-value on the ITT effect for the standalone programme increases to 0.27, and the one for the same effect among girls increases to 0.15. This reduction in the statistical significance of our results when randomisation inference is used should be interpreted with caution because, as explained in Athey and Imbens (2017, p.89), the sampling variance for the estimated average treatment effect that is calculated using randomisation inference omits the sampling variance of unit-level treatment effects (since it is not possible to estimate the latter variance consistently). Since this latter variance reduces the overall variance of the estimated average treatment effect, omitting it overestimates the overall variance of the average treatment effect obtained by using randomisation inference. Also, we continue to see strong evidence for our conclusion that the addition of community contests did not lead to improvements in mathematics learning over and above the programme variant without the contests.

7. Conclusion

Many developing countries have substantially increased their spending on education, which has led to large increases in student enrolment. Nevertheless, these positive educational outcomes are unlikely to lead to higher economic growth and improvements in the quality of life if students learn much less than the curriculum expects them to master. The academic performance of primary school students in many developing countries is disappointingly low, and India is one of those countries. This state of affairs has opened a serious debate on ‘what works’ to improve learning outcomes in developing countries.

Recent reviews of the literature point to pedagogical interventions and teacher training as among the most effective educational interventions to increase student learning. In India, both national and state-wide initiatives have started to promote one particular pedagogical approach: activity-based instruction of mathematics, following a ‘concrete-to-abstract’ model. Based on a large body of efficacy trials from developed countries, expert opinion concludes that this approach is highly effective. Yet, such interventions have rarely been evaluated at scale.

Our research investigates a large-scale effort to improve teaching and student learning by promoting activity-based instruction. We estimated the causal effects of an innovative programme in the state of Karnataka, India, that promotes activity-based instruction of mathematics at the primary school level through additional teaching inputs, related teacher training and community engagement. The Ganitha Kalika Andolana (GKA) programme is designed to help students learn mathematical concepts and to develop their concrete mathematical understanding through engaging activities (before moving on to representational and abstract learning)—in contrast with the conventional chalk-and-talk teaching commonly used in Indian schools.

To estimate the causal effect of this programme on student learning in mathematics, we implemented a randomised controlled trial in 98 administrative units (Gram Panchayats), dividing these units, and the 292 schools within them, into either the programme group or a control group. To isolate the effect of activity-based instruction from a programme component that aims to increase community engagement, we randomly assigned the treatment group into two sets of Gram Panchayats, one of which received a version of the programme that excluded community contests, while the other received a version that included those contests.

Our analysis shows adherence to this study design: the programme was implemented as intended, and that led to the expected changes in pedagogy. More specifically, almost all of the Grade 4 teachers in programme schools received the GKA training, all programme schools received the additional teaching inputs (GKA kits) and there were substantial differences in the teaching methods used by the programme school teachers (relative to the control group teachers). Therefore, we consider our study to be a successful test of a ‘best practice’ widely recommended for adoption by education practitioners, but outside of its original high-income setting. As in Muralidharan and Singh (2020), we document high implementation fidelity and programme adoption; however, we also show impacts on classroom practices. Combined with favourable generalisability conditions—the baseline levels of school productivity and activity-based instruction are low in the study context—our findings thus shed light on whether the potential of activity-based instruction generalises to developing country settings.

Our primary outcome of interest is learning in mathematics among Grade 4 students (almost all of whom moved to Grade 5 during the study period), as measured by both oral and written mathematics assessments. Thirteen months after the launch of the intervention, we find that, on average, the promotion of activity-based instruction had small (0.12 SD) impacts on students’ mathematics learning. Analysis by gender finds a significant impact of 0.18 SD (p < 0.05) for girls’ maths scores, but no effect for boys. Even so, our estimates can rule out almost all positive estimates from previous efficacy trials, which led expert opinion to strongly recommend the programme’s pedagogical approach. Our findings are robust to adding the community-engagement component to the intervention. The estimate for the programme variant with the added community-engagement component is close to zero, and we can rule out with precision that the addition of contests led to additional learning gains (added effects of 0.06 SD or more are ruled out at 95% confidence). If anything, we find that the addition of community contests led to a less hospitable classroom environment (−0.40 to −0.49 SD) in the remaining study period after the contests were conducted.

Our analysis of value-added models provides one explanation for the muted programme impacts. We find that those schools with greater adherence to activity-based teaching practices exhibited lower productivity in raising student test scores. This (non-experimental) finding may explain why the programme’s impact on instructional practices did not coincide with substantive improvements in test scores. Hence, as governments try to promote activity-based instruction at scale, they cannot expect that these efforts will necessarily lead to test score gains; rather, such efforts may even come at the expense of test score gains. At the same time, we also document how schools’ productivity in terms of student performance on assessments was orthogonal to their ability to improve student attitudes towards mathematics. This suggests that the effects of activity-based instruction may be multi-dimensional. Because we did not measure other, non-test dimensions of student learning, we cannot rule out that the given pedagogical approach has desirable effects on them.

Future research could go in several directions. First, programme impacts may be larger (or smaller) when implemented over a period longer than the 13-month intervention period covered in this evaluation. Second, research on skills other than mathematics, and on socio-emotional skills particularly, would be highly informative. Third, the differences in programme effects by gender merit further investigation, which would probably require a larger sample and more classroom observation data. Finally, given the high proportion of primary school students in India who are enrolled in private schools, an evaluation of this programme’s effectiveness in private schools would likely be very valuable.

Supplementary Material

Supplementary

Acknowledgments

The authors thank two anonymous referees and the editor, Marco Manacorda, for their constructive feedback. We also thank the editor of the Journal of Development Economics and two anonymous reviewers for their helpful comments on a draft version of this article. Abhijit Banerjee, Julián Cristia, Alejandro Ganimian, Andrea Guariso and Tavneet Suri provided additional helpful comments. The authors also thank participants at numerous seminars and conferences. The authors thank J-PAL South Asia and its team of field staff. Sandhya Seetharaman, Prajwal Shenoy and Anuja Venkatachalam provided excellent research assistance (Shenoy) and outstanding research management (Seetharaman and Venkatachalam). The authors are grateful to Jack Cavanagh for an expertly conducted code replication. The authors thank staff at Akshara Foundation for their collaboration, particularly Ashok Kamaths and K. Vaijayanti. The authors are grateful for the collaboration between Akshara Foundation and the government of Karnataka. The study is pre-registered at the AEA RCT Trial Registry (AEARCTR-0003494). The usual disclaimers apply. The findings, interpretations and conclusions expressed in this article are entirely those of the authors. They do not necessarily represent the views of the World Bank Group and its affiliated organizations, or those of the executive directors of the World Bank Group or the governments they represent. The authors have no conflicting interests to declare. The authors gratefully acknowledge the funding provided by the Omidyar Network for this project. The study has been approved by the University of Minnesota Human Research Protection programme (Study 4101) and by the Institute for Financial Management and Research (IFMR) Human Subjects Committee.

Footnotes

The data and codes for this paper are available on the Journal repository. They were checked for their ability to reproduce the results presented in the paper. The authors were granted an exemption to publish parts of their data because access to these data is restricted. However, the authors provided the Journal with temporary access to the data, which enabled the Journal to run their codes. The codes for the parts subject to exemption are also available on the Journal repository. The restricted access data and these codes were also checked for their ability to reproduce the results presented in the paper. The replication package for this paper is available at the following address: https://doi.org/10.5281/zenodo.10038843.

1

The local government system in India, at the village or town level.

2

Under the CRA model, students: first, develop conceptual understanding by manipulating objects; next, learn how pictures, numbers and symbols represent objects; and last, master mathematical problems using only abstract numbers and symbols. CRA is loosely based on a learning theory with three ‘stages of representation’: enactive, iconic and symbolic (Bruner and Kenney, 1965). CRA is sometimes interchangeably referred to as the ‘concrete, pictorial, abstract’ (CPA) approach.

3

Students who are present cannot opt out of participating in the contests, but they may be absent on the day of the contest. Parents are free to decide whether they would like to attend a contest.

4

One example is a study on a remedial education programme that increased maths and language learning at a cost of US$4.50 per student (Banerjee et al., 2007). Another example is Duflo et al. (2012), who find that a programme that provided teachers with incentives to work improved student learning in maths and language at a cost of US$7.50 per student. More recently, Nyqvist and Guariso (2021) document how the combination of targeted instruction with out-of-school study groups increased maths test scores at a cost of US$17.20 per student.

5

Online Appendix Figure A1 shows the study’s two districts. In Section 3, our results suggest that these districts indeed showed different learning levels at baseline; control-group students in these districts also differed in terms of their progress from baseline to endline.

6

These students’ age and gender numbers are approximate, as they are missing for 2.0% of the students.

7

There was one left-over GP in each district (as 49 is not divisible by 4). We paired these two GPs and randomly assigned one to the intervention group and the other to the control group.

8

At baseline, we were concerned that weak students could not answer the paper-based test. Therefore, we administered a subset of seven items both orally (one-on-one) and on paper. We found no floor effects, so our concerns were unwarranted (results available on request). In later rounds, we used only written (group) tests.

9

The ASER is a comprehensive household survey of rural India. For children between 3 and 16 years, it records enrolment status and tests basic reading and arithmetic skills using a common set of testing tools.

10

Our pre-registered report did not discuss how to combine oral and written items. We treat each ASER level as an additional mathematics item, but constrain the written item parameters to match those from a model that uses the written test items only. This follows our pre-registered plan to calculate an IRT-based test score based on written items, but also incorporating the information from the oral test.

11

For definitions of each domain and subdomain, see Molina et al. (2020).

12

Online Appendix Figure A3 provides mean Teach scores for the control group.

13

We used J-PAL’s strict data collection procedures, including double-entry of paper-based tests, high-frequency checks of electronic forms, spot-checks, and weekly monitoring and debriefs for field staff (see Glennerster, 2017).

14

As shown by Ghanem et al. (2020), differential attrition is not a source of concern regarding internal validity—selective attrition is. Following Ghanem et al. (2020), we conduct a formal test using the study’s baseline learning levels. While attritors differed from non-attritors (p < 0.01 at midline and endline), we do not find evidence of selective attrition across experimental groups (p = 0.34 at midline; p = 0.55 at endline; these results are available from the authors on request). This corroborates that our study’s findings are internally valid for the analytical sample.

15

From the random assignment of treatment GPs to community contests (in July 2019), the exposure until endline was approximately 50% shorter than the exposure to the activity-based instruction (the median interval between a contest and the endline assessment was five months). It is not uncommon for education studies to observe short-term effects over a similar time span. For examples, see Duflo et al. (2012) and Banerjee et al. (2007).

16

We also conducted a parent survey at endline, where 11% of parents (self-)reported attending a maths contest. Our preferred data source is our research team’s observations during the contests.

17

Observers also recorded whether teachers spent their time ‘on task’ and whether the observed maths class appeared staged. We do not observe any differences across the treatment and control groups on these indicators.

18

Note that our measure of school value-added includes school effects, teacher effects and idiosyncratic classroom shocks. We observe only one cohort of students and usually do not observe multiple classrooms per school; thus we cannot disentangle these effects from each other (as in Araujo et al., 2016; Bau and Das, 2020).

19

Following Bau and Das (2020), in these regressions that include value-added measures on the left-hand side, we do not use a shrinkage correction.

20

Online Appendix Table A2 shows that the point estimates are similar (albeit slightly more imprecisely estimated) if the value-added estimation focuses on girls’ test scores only. Thus, we are unable to link the gender heterogeneity in treatment effects to a differential relationship between activity-based instruction and test score gains for girls (relative to boys).

21

Observationally, students’ test scores, however, are correlated with their attitudes. In the control group, students with a 1 SD more positive attitude towards mathematics performed 0.21 SD higher on the endline test (p < 0.01).

Contributor Information

Andreas de Barros, University of California, Irvine, USA.

Johanna Fajardo-Gonzalez, The World Bank, USA.

Paul Glewwe, University of Minnesota, USA.

Ashwini Sankar, Minnesota Department of Health, USA.

References

  1. Abadie A, Athey S, Imbens GW and Wooldridge JM (2023). ‘When should you adjust standard errors for clustering?’, The Quarterly Journal of Economics, vol. 138(1), pp. 1–35.
  2. Alan S and Mumcu I (2022). ‘Nurturing childhood curiosity to enhance learning: Evidence from a randomized pedagogical intervention’, Discussion papers DP17601, Centre for Economic Policy Research.
  3. Andrabi T, Das J, Khwaja AI and Zajonc T (2011). ‘Do value-added estimates add value? Accounting for learning dynamics’, American Economic Journal: Applied Economics, vol. 3(3), pp. 29–54.
  4. Angrist N and Meager R (2022). ‘The role of implementation in generalisability: A synthesis of evidence on targeted educational instruction and a new randomised trial’, CEDIL Syntheses paper 4, Centre for Excellence and Development Impact and Learning.
  5. Araujo MC, Carneiro P, Cruz-Aguayo Y and Schady N (2016). ‘Teacher quality and learning outcomes in kindergarten’, Quarterly Journal of Economics, vol. 131(3), pp. 1415–53.
  6. ASER. (2018). Annual Status of Education Report (Rural) 2017, New Delhi: ASER Centre.
  7. Ashraf N, Banerjee A and Nourani V (2021). ‘Learning to teach by learning to learn’, Ms, University of Chicago and Makerere University.
  8. Athey S and Imbens G (2017). ‘The econometrics of randomized experiments’, in (Banerjee A and Duflo E, eds.), Handbook of Economic Field Experiments, vol. 1, pp. 73–140, Amsterdam: Elsevier.
  9. Banerjee A, Cole S, Duflo E and Linden L (2007). ‘Remedying education: Evidence from two randomized experiments in India’, Quarterly Journal of Economics, vol. 122(3), pp. 1235–64.
  10. Bates MA and Glennerster R (2017). ‘The generalizability puzzle’, Stanford Social Innovation Review, vol. 15, pp. 50–4.
  11. Bau N and Das J (2020). ‘Teacher value added in a low-income country’, American Economic Journal: Economic Policy, vol. 12(1), pp. 62–96.
  12. Berlinski S and Busso M (2017). ‘Challenges in educational reform: An experiment on active learning in mathematics’, Economics Letters, vol. 156, pp. 172–5.
  13. Beuermann DW, Jackson CK, Navarro-Sola L and Pardo F (2022). ‘What is a good school, and can parents tell? Evidence on the multidimensionality of school output’, The Review of Economic Studies, article ID rdac025.
  14. Blazar D and Kraft MA (2017). ‘Teacher and teaching effects on students’ attitudes and behaviors’, Educational Evaluation and Policy Analysis, vol. 39(1), pp. 146–70.
  15. Bruner JS and Kenney HJ (1965). ‘Representation and mathematics learning’, Monographs of the Society for Research in Child Development, vol. 30(1), pp. 50–9.
  16. Chetty R, Friedman JN, Hilger N, Saez E, Schanzenbach DW and Yagan D (2011). ‘How does your kindergarten classroom affect your earnings? Evidence from project star’, Quarterly Journal of Economics, vol. 126(4), pp. 1593–660.
  17. Duflo E, Hanna R and Ryan SP (2012). ‘Incentives work: Getting teachers to come to school’, American Economic Review, vol. 102(4), pp. 1241–78.
  18. Fuchs LS, Bucka N, Clarke B, Dougherty B, Jordan NC, Karp KS, Woodward J, Jayanthi M, Gersten R, Newman-Gonchar R, Schumacher R, Haymond K, Lyskawa J, Keating B and Morgan S (2021). Assisting Students Struggling with Mathematics: Intervention in the Elementary Grades, Washington, DC: What Works Clearinghouse.
  19. Ghanem D, Hirshleifer S and Ortiz-Becerra K (2020). ‘Testing attrition bias in field experiments’, Working Paper 202010, University of California at Riverside.
  20. Glennerster R (2017). ‘The practicalities of running randomized evaluations: Partnerships, measurement, ethics, and transparency’, in (Banerjee A and Duflo E, eds.), Handbook of Economic Field Experiments, vol. 1, pp. 175–243, Amsterdam: Elsevier.
  21. Goodnight MR and Bobde S (2018). ‘Missing children in educational research: Investigating school-based versus household-based assessments in India’, Comparative Education, vol. 54(2), pp. 225–49.
  22. Jackson CK (2018). ‘What do test scores miss? The importance of teacher effects on non-test score outcomes’, Journal of Political Economy, vol. 126(5), pp. 2072–107.
  23. Kolen MJ and Brennan RL (2004). Test Equating, Scaling, and Linking, 3rd edn, New York: Springer.
  24. Kumar P and Kumari K (2022). Experiential Learning, New Delhi: Central Board of Secondary Education.
  25. Lee DS (2009). ‘Training, wages, and sample selection: Estimating sharp bounds on treatment effects’, Review of Economic Studies, vol. 76(3), pp. 1071–102.
  26. Ministry of Education. (2021). National Initiative for Proficiency in Reading with Understanding and Numeracy (NIPUN BHARAT), New Delhi: Department of School Education & Literacy, Ministry of Education, Government of India.
  27. Ministry of Home Affairs. (2012). ‘Census 2011: 15th census of India’, Registrar General and Census Commissioner of India.
  28. Molina E, Fatima SF, Ho AD, Melo C, Wilichowski TM and Pushparatnam A (2020). ‘Measuring the quality of teaching practices in primary schools: Assessing the validity of the teach observation tool in Punjab, Pakistan’, Teaching and Teacher Education, vol. 96, article ID 103171.
  29. Muralidharan K and Singh A (2020). ‘Improving public sector management at scale? Experimental evidence on school governance in India’, Working Paper 28129, National Bureau of Economic Research.
  30. National Institute of Educational Planning and Administration. (2018). School Education in India: U-DISE Flash Statistics 2016–17, New Delhi: National Institute of Educational Planning and Administration.
  31. Nyqvist MB and Guariso A (2021). ‘Supporting learning in and out of school: Experimental evidence from India’, Ms., Stockholm School of Economics.
  32. Singh A. (2015). ‘Private school effects in urban and rural India: Panel estimates at primary and secondary school ages’, Journal of Development Economics, vol. 113, pp. 16–32.
  33. UNESCO Institute for Statistics. (2018). ‘Data for the sustainable development goals’, http://uis.unesco.org (last accessed: 21 November 2019).
  34. Vivalt E. (2020). ‘How much can we generalize from impact evaluations?’, Journal of the European Economic Association, vol. 18, pp. 3045–89.
  35. Young A. (2019). ‘Channeling Fisher: Randomization tests and the statistical insignificance of seemingly significant experimental results’, Quarterly Journal of Economics, vol. 134(2), pp. 557–98.
