PLoS One. 2020 Apr 20;15(4):e0231548. doi: 10.1371/journal.pone.0231548

Information feedback in relative grading: Evidence from a field experiment

Shinya Kajitani 1,#, Keiichi Morimoto 2,#, Shiba Suzuki 3,*,#
Editor: Baogui Xin
PMCID: PMC7170246  PMID: 32311001

Abstract

Previous laboratory studies have revealed the role of relative performance information feedback in providing agents with incentives under a relative rewarding scheme. This study examines the impact of relative performance information feedback on students’ examination scores under a relative grading scheme in an actual educational environment. Conducting a randomized controlled trial in a compulsory subject at a Japanese university, we show that relative performance information feedback has a significantly positive impact on students’ examination scores on average, and that this average positive impact is driven by the improvement of low-performing students.

Introduction

Does relative performance information feedback improve a student’s incentive to study under a relative grading scheme? Information feedback tied to the reward environment is widely considered an efficient way of increasing students’ incentives to study, and “relative grading” or “grading on a curve” is widely used to grade students. Conducting a randomized controlled trial in a compulsory subject required for university graduation, we examine the impact of relative performance information feedback on students’ examination scores.

In relative grading, a student’s grade depends on her position in the class score distribution. To understand student incentives under a relative grading scheme, Becker and Rosen [1] extend the rank-order tournament model of Lazear and Rosen [2] and emphasize that a student’s learning effort depends on her position in the distribution of academic attainment. Andreoni and Brownback [3] construct a theoretical model of relative grading based on an all-pay auction and demonstrate that low-skilled subjects decrease effort while high-skilled subjects increase effort as the auction size increases. These findings suggest that relative performance information feedback affects students’ decisions about how much effort to provide. Moreover, in actual schooling environments, students are typically graded through multiple examinations. Aoyagi [4] and Ederer [5] theoretically analyze information feedback in a dynamic tournament context. It is therefore worth considering the relationship between information on a student’s relative position in the distribution of earlier examination scores and her incentive to study for the following examination in actual schooling environments.

How, then, does relative performance information feedback affect students’ incentive to study under a relative grading scheme with multiple examinations? In a relative grading scheme, a student needs a higher score than her opponents to obtain a better grade; an opponent’s score thus serves as a threshold she must exceed. In this grading environment, relative performance information feedback is a signal of the effort she should provide. For example, when the feedback tells a student that her current score is relatively low, she understands that she has to exert greater effort to rise above the threshold. Conversely, she may give up, saving the cost of effort.

Some laboratory experiments reveal the role of relative performance information feedback with respect to relative rewarding. These studies generally suggest that there is no guarantee that the relative performance information feedback has a positive impact on the students’ incentive to study. For example, Eriksson et al. [6] and Freeman and Gelber [7] conclude that relative performance information feedback lowers the performance of subjects whose interim performance is relatively low. However, those subjects whose midterm performance is relatively high do not slacken off. In contrast, Ludwig and Lünser [8] examine the effects of effort information in a two-stage rank-order tournament. They demonstrate that laboratory subjects who lead tend to lower their effort, but those who lag increase it relative to the first stage, while the subjects who lead exert a greater effort than those who lag. Thus, the impact of relative performance information feedback may vary according to the initial level of attainment.

In an actual educational environment, previous studies focus on the impact of relative performance information feedback on student incentives under absolute grading. For example, Azmat and Iriberri [9], using data from Spanish high schools, and Tran and Zeckhauser [10], in a field experiment of Vietnamese university students, demonstrate that relative performance information feedback raises the performance of students when rewarded absolutely. Both these studies argue that if students have competitive preferences, which means that they inherently prefer receiving a higher rank than others, relative performance information has a positive impact on their incentive to study on average.

The question in this paper is whether relative performance information feedback improves student examination scores in an actual relative grading environment where students sit for examinations on multiple occasions. To examine this issue, we conduct a randomized controlled field trial in a compulsory economics course at a Japanese university. In this course, after students sat two examinations (the midterm examination and the final examination), instructors calculated the students’ final raw scores mainly by taking the weighted average of the two examination scores. However, the students’ grades were determined by grading on a curve: the instructors could adjust the final raw scores, in light of the entire final raw score distribution, to obtain a reasonable pass rate. In our experiment, we allocated more than 200 students to a control group and a treatment group immediately following the midterm examination. Only students in the treatment group received feedback on their relative performance in the midterm examination, and we explored the impact of this feedback on their performance in the final examination.

This study investigates the impact of relative performance information feedback on student incentives to study in an actual educational environment with relative grading. We show a significant positive impact of relative performance information feedback on students’ final examination scores on average. Note that because students cannot graduate from the university unless they receive credit in this subject, they care about whether they receive credit. In other words, the threshold between passing and failing the course matters to students. Moreover, the threshold depends not only on the students’ ranks in the distribution of the final raw scores but also on their final raw scores per se. Taking this threshold into account, we demonstrate that the average positive impact on the final examination scores is driven by the improvement of students who performed poorly on the midterm examination.

The remainder of the paper is organized as follows. Section “Materials and methods” describes the experimental design. Section “Results and discussion” presents the empirical framework and reports the estimation results. Section “Discussion” relates the findings to the literature, and Section “Conclusions” concludes.

Materials and methods

This experiment was approved by Meisei University’s research ethics committee on the Use of Human Subjects (Application No. H26-002). Before conducting the studies, we obtained informed consent from all subjects.

Description of the randomized trial

This section provides details of the randomized trial, performed using first-year students in an economics department at a Japanese private university. We begin by describing the flow of interventions in the experiments, which are displayed in Fig 1. The academic year comprised first and second semesters: the first semester began in April 2012 and ended in July 2012; the second semester began in September 2012 and ended in January 2013. We conducted a mathematical achievement test (referred to as the Pretest of Mathematics) immediately following university entrance. Students enrolled in two compulsory introductory economics courses in their first year: Economics I in the first semester and Economics II in the second semester. In Economics I and II, we administered midterm and final examinations to grade students. While the midterm and final examinations in Economics I were in May and July 2012, those in Economics II were in November 2012 and January 2013. We note that the score for the Pretest of Mathematics was independent of the grades for Economics I and II. The dotted vertical lines in Fig 1 represent the timing of the examinations.

Fig 1. The flow of interventions in the experiment.


We evaluated the students in both Economics I and II using the same grading scheme, namely, grading on a curve. The instructors explained this grading scheme in detail to the students in Economics II at the beginning of the second semester. We provide details of the grading on a curve scheme later.

We divided students into four classes. According to their scores on the Pretest of Mathematics, we placed all students with a top-40 score in one small class, hereafter Classroom 1. The designation of Classroom 1 is purely for educational purposes; for example, in teaching economics, we used a different level of mathematics in Classroom 1 than in the other classrooms. We then randomly allocated the remaining students to the other three classes, hereafter Classrooms 2, 3, and 4. We fixed all class enrollments and instructors across both semesters. Although each class had its own instructor, all classes were held in the third period (12:55 p.m. to 2:25 p.m.) on Wednesdays, and all students took the same examination at the same time using a multiple-choice computer-scored answer sheet.

The experimental intervention was implemented immediately after the midterm examination in Economics II, at which point we randomly assigned all students to the treatment or control group. In the first class after the midterm examination, we handed students letters revealing their scores on the midterm examination. The letters given to students in the treatment group also reported their ranks in the midterm examination; we did not include this information in the letters to students in the control group. The letter content is similar to that used by Ashraf et al. [11]. Figs 2 and 3 reproduce the information provided to students in the treatment and control groups. Thus, while students in the treatment group knew their precise rank, students in the control group had only a vague sense of theirs.

Fig 2. The letter to students (treatment group).


Fig 3. The letter to students (control group).


There are several points to note about our randomized controlled trial. First, we exclude from our sample students who did not receive the letter regarding the midterm examination. Because some students were absent from the class immediately after the midterm examination, we could not hand them letters. Therefore, we do not consider these students as subjects in our experiment.

Second, our experimental design cannot exclude the possibility that some students may have exchanged their rank information. Because our experimental design is similar to that of Tran and Zeckhauser [10] except for the grading scheme, we share the same problem: students in both the control and treatment groups sit in the same classroom, making the exchange of rankings a genuine possibility. However, it would generally be difficult for a student in the control group to identify a student in the treatment group with exactly the same score; students in the treatment group know their exact rank, while students in the control group do not. Therefore, the impact of relative performance information feedback, if any, is captured by our experimental design. We discuss this further in Subsection “Balance between the control and treatment groups” and Section “Results and discussion.”

Third, instructors were not blind to the treatment to which a given student had been assigned, because the letters were not placed in envelopes and instructors could see this information. However, aside from the moment of handing out the letters, instructors could not check which students were assigned to treatment or control, and it would have been difficult for them to remember this information. In addition, class attendance and participation, such as the number of times a student spoke in class, were not evaluated at all. Thus, whether instructors were blind to which students were assigned to treatment or control would have little impact on the experimental results.

Finally, students were exogenously assigned to the treatment and control groups in Economics II, and the class assignment is neutral for our randomized controlled trial. We placed students with a top-40 score in the Pretest of Mathematics in Classroom 1 purely for educational purposes. A substantial mathematical background is generally important for understanding economics. However, there was considerable disparity in the mathematical background of freshman students, and many did not have sufficient knowledge of mathematics. In Economics I, to provide these students with remedial education in mathematics, we placed them in Classroom 2, 3, or 4. Consequently, we placed the few students who had a sufficient mathematical background in Classroom 1. Nevertheless, except for the remedial education in mathematics, the course material and examinations in economics were the same across Classrooms 1–4 in Economics I. In Economics II, the course material and examinations were also identical, and no remedial education in mathematics was given in any classroom; instead, students were required to understand the basic concepts of economics. Most of the examination questions in Economics II are not mathematical but multiple-choice. Therefore, initial mathematical background itself has little impact on achievement in Economics II.

The grading scheme

In Economics II, the final raw score was calculated as follows: the maximum raw score was 110 points, of which 100 points were for the two examinations (the midterm and final examinations) and the remaining 10 points for the number of homework submissions. The examination score of 100 points is divided into 40% for the midterm examination score and 60% for the final examination score. The number of homework submissions is based on 10 homework assignments, each worth one point.
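
To make the weighting concrete, the sketch below computes a final raw score under the scheme just described; the function name and the example inputs are our own illustration, not part of the course materials.

```python
def final_raw_score(midterm, final_exam, homework_submissions):
    """Final raw score out of 110: the 100 examination points are split as
    40% of the midterm score plus 60% of the final score, and each of the
    10 homework assignments submitted adds one point."""
    exam_points = 0.4 * midterm + 0.6 * final_exam
    homework_points = min(homework_submissions, 10)
    return exam_points + homework_points

# A hypothetical student scoring 50 on the midterm and 70 on the final,
# with 8 homework submissions: 0.4*50 + 0.6*70 + 8 = 70 points.
print(final_raw_score(50, 70, 8))
```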

It should be noted that the instructors could adjust students’ final raw scores upward in light of the entire final raw score distribution. This upward adjustment introduces uncertainty into the threshold scores between one grade and another, especially the threshold between pass and fail. The university’s official guidelines recommend that instructors assign the first grade (S) to scores of 90 and over, the second grade (A) to scores between 80 and 89, the third grade (B) to scores between 70 and 79, the fourth grade (C) to scores between 60 and 69, and fail (F) to scores below 60. A student who receives an F fails the course. These grading criteria were explained in the guidance for freshman students in April; Fig 4 shows the slide used for this purpose. Students therefore learnt how the official guidelines relate scores to grades. At the same time, the instructors verbally announced in this guidance that raw scores might be adjusted upward to obtain a reasonable pass rate. That is, when the instructors assign grades afterward, they can adjust students’ final raw scores upward depending on the entire final raw score distribution. Meanwhile, no student knows, in advance of taking the final examination, whether the instructors will adjust the final raw scores. For example, because the average final raw score in Economics I was quite low (52.7 points out of a possible 100), the instructors decided to add 9 points to all students’ final raw scores. With this upward adjustment, any student whose final raw score was above 50 received credit for Economics I. In contrast, because the instructors did not ultimately adjust the final raw scores in Economics II, only students whose final raw score was 60 or over received credit for Economics II.
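
A minimal sketch of the grading rule described above, assuming the only intervention is a uniform upward adjustment chosen by the instructors; the function and the example values follow the Economics I episode in the text.

```python
def grade(final_raw_score, upward_adjustment=0):
    """Map a final raw score to the official grade bands (S >= 90, A 80-89,
    B 70-79, C 60-69, F < 60) after any uniform upward adjustment.
    Scores are never adjusted downward."""
    score = final_raw_score + upward_adjustment
    if score >= 90:
        return "S"
    elif score >= 80:
        return "A"
    elif score >= 70:
        return "B"
    elif score >= 60:
        return "C"
    return "F"

# In Economics I the instructors added 9 points to every final raw score,
# so a student with a raw score of 51 reached 60 and passed with a C.
print(grade(51, upward_adjustment=9))  # -> "C"
# In Economics II no adjustment was made, so 60 raw points were required.
print(grade(59))                       # -> "F"
```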

Fig 4. The slide to explain the grading scheme (originally written in Japanese).


There are three points to note in our grading scheme. First, students in Economics II knew the grading scheme described above. The instructors had already explained this grading scheme in detail at the beginning of the second semester, in addition to the guidance given in April. Moreover, this grading scheme had already been employed in Economics I, in which students were graded based on scores in the midterm examination (June) and final examination (July). After the midterm examination, instructors handed a letter in person to the students to inform them of their own absolute score, the average score of all students, and the class average score. Students also received the official grade report from the university in August. This grade report was also available online. Therefore, in Economics I, students could decide how much effort to put into the final examination after they learnt their absolute scores in the midterm examination; after that, they also learnt whether they had passed the course or not. This flow of events was the same in Economics II. Thus, students in Economics II had already experienced the same grading scheme in Economics I. It can be considered that students understood the relationship between effort input in the midterm and final examinations and their grade. In other words, Economics I served as a practice session that familiarized students with the grading scheme used in Economics II.

Second, instructors graded all students in the four classrooms using the same grading criteria in Economics II. All students had to register for the same Economics I and II courses, and after registration, instructors assigned students to Classrooms 1, 2, 3, and 4. As noted, all students took the same examination at the same time using a multiple-choice computer-scored answer sheet. It was announced several times during the course that the grading criteria were the same for all four classes; we therefore believe students considered that they were graded according to the same criteria across the four classes. In addition, the four instructors decided the cutoff scores after consultation, and one instructor registered the grades of all students on behalf of the other three instructors. Instructors could not deviate from the agreed pass scores. Therefore, students compete not only with students in their own classroom but also with those in the other classrooms; whether students pass or fail depends on their relative position in the entire score distribution of more than 200 students. The students in Economics II were thus exposed to uncertainty about the threshold they had to exceed to pass the course.

Finally, in our experiment, when the instructors gave grades to the students afterward, they could adjust the students’ final raw scores upward but never adjusted them downward. Under these circumstances, a student whose final raw score was 60 points or over received credit for Economics II. Accordingly, if a student who scored 60 points on the midterm examination also scores at least 60 points on the final examination, she is assured of credit for Economics II. That is, whether students pass or fail depends not only on their rank in the distribution of the final raw scores but also on their final raw scores per se.

Balance between the control and treatment groups

Table 1 provides the total number of students and the means and standard deviations of the midterm examination scores in Economics II for the control and treatment groups. Table 1 also shows how we randomly divided these students into the control and treatment groups.

Table 1. Randomization checks.

Panel A. Descriptive statistics of the midterm examination scores.

                                  (1)             (2)             (3)          (4)      (5)
                                  Classrooms 1–4  Classrooms 2–4  Classroom 1  Male     Female
I. All                  Obs.      284             244             40           248      36
                        Mean      49.57           46.96           65.48        49.40    50.72
                        S.D.      17.36           16.24           15.54        17.60    15.72
II. Receive a letter    Obs.      255             215             40           221      34
                        Mean      50.67           47.92           65.48        50.77    50.03
                        S.D.      17.02           15.85           15.54        17.22    15.90
  II-i. Control         Obs.      130             106             24           113      17
                        Mean      51.48           48.21           65.96        51.94    48.47
                        S.D.      18.64           17.70           15.84        18.88    17.23
  II-ii. Treatment      Obs.      125             109             16           108      17
                        Mean      49.82           47.63           64.75        49.55    51.59
                        S.D.      15.18           13.90           15.56        15.29    14.82

Panel B. Mean-comparison tests (Welch t-test), control (II-i) vs. treatment (II-ii).

                                          t-value    P-value
(a) Column (1), II-i vs. II-ii            0.781      0.435
(b) Column (2), II-i vs. II-ii            0.264      0.792
(c) Column (3), II-i vs. II-ii            0.239      0.813
(d) Column (4), II-i vs. II-ii            1.037      0.301
(e) Column (5), II-i vs. II-ii            -0.566     0.576

In total, 284 students took midterm examinations, and their mean score was 49.57. We randomly divided these students into control and treatment groups. However, some students failed to receive the letter. Consequently, in our experiment, there are 255 subjects with a mean score of 50.67. There were 130 and 125 students in the control and treatment groups, respectively. The mean scores for the control and treatment groups are 51.48 and 49.82, respectively, and there is no significant difference in the mean scores between the control and treatment groups, as shown in row (a) in Panel B.

One point to note is the differences between classrooms. Table 1 also shows that the randomization into control and treatment groups is balanced once we account for these differences. The mean score in the midterm examination in Classroom 1 is much higher than in Classrooms 2–4 because we enrolled students with a top-40 score in the Pretest of Mathematics in Classroom 1. The number of students who received the letter in Classrooms 2–4 is 215, while that in Classroom 1 is 40. The mean for students who received the letter in Classroom 1 is 65.48, and that in Classrooms 2–4 is 47.92. There is no significant difference in mean scores between the control and treatment groups within Classrooms 2–4: the number of students and the mean for the control group are 106 and 48.21, respectively, while those for the treatment group are 109 and 47.63. We do not reject the null hypothesis that the mean values of the two groups are equal, as shown in row (b) in Panel B. For Classroom 1, the numbers of students in the control and treatment groups are 24 and 16, and the mean scores are 65.96 and 64.75, respectively; we again do not reject the null hypothesis of equal means, as shown in row (c) in Panel B. There is also no significant difference in mean scores between the control and treatment groups by sex (rows (d) and (e) in Panel B).

Another point to note is the possibility of information spillover. Because Classroom 1 contains only 40 students, one may wonder whether students in the control group could learn their rank by communicating with students in the treatment group. However, such communication and potential spillover should be minimal because the information available in Classroom 1 is sparse. Only seven students in the control group could find a student in the treatment group with exactly the same score; the remaining 17 students in the control group could not, meaning that they could at best infer a range for their rank but not their precise rank. For example, suppose there are two students in the treatment group, one with a score of 55 and a rank of 95 and the other with a score of 52 and a rank of 120, and one student in the control group whose midterm score is 53. Because no student in the treatment group scored exactly 53, the control-group student cannot learn her own rank accurately; all she can infer is that her rank lies between 96 and 119. Of course, because there is no guarantee that she will find the treatment-group students with scores of 55 and 52, her estimate of her own rank could be even more ambiguous. Therefore, our experimental procedure captures the effect of the difference in the degree of uncertainty in rank perception on the final examination scores.
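
The rank-bound reasoning in the example above can be made explicit with a small sketch; the scores, ranks, and function below are purely illustrative and assume that a higher score corresponds to a better (numerically smaller) rank.

```python
def rank_bounds(own_score, treated_info):
    """Bounds on a control-group student's rank, given the (score, rank)
    pairs she could learn from treatment-group classmates."""
    better = [rank for score, rank in treated_info.items() if score > own_score]
    worse = [rank for score, rank in treated_info.items() if score < own_score]
    best_possible = max(better) + 1 if better else 1
    worst_possible = min(worse) - 1 if worse else None  # None: no upper bound known
    return best_possible, worst_possible

# Two treatment-group classmates: score 55 with rank 95, score 52 with rank 120.
# A control-group student scoring 53 can only infer a rank between 96 and 119.
print(rank_bounds(53, {55: 95, 52: 120}))  # -> (96, 119)
```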

Results and discussion

The effects of relative performance information feedback

The randomized controlled trial means that we obtain two groups that are statistically equivalent to each other. This study captures the effects of relative performance information feedback on the final examination scores, that is, the average effects of assignment to the treatment group versus assignment to the control group. We simply use OLS to estimate the effects of treatment interventions on test scores (e.g., Levitt et al. [12]) and employ the following empirical framework:

\[ Y_{Fi} = \alpha D_i + X_i \beta + \epsilon_i, \qquad (1) \]

where YFi denotes the scores in the final examination for student i. When student i took the midterm examination but not the final examination, we treat YF for student i as zero. Di is a dummy variable equal to one if student i is given information on her relative rank in the midterm examination (i.e., the student is in the treatment group), and zero if student i is not given this information (i.e., the student is in the control group). Xi denotes the covariates including a constant term, and ϵi are disturbances.

Table 2 provides descriptive statistics for all the variables used in this estimation model. In our experiment, we randomly assigned all students to the treatment or the control group using a random number generator. Although randomization should also balance the associated covariates, namely the midterm examination scores and the classroom assignments, we are concerned about ex post differences in the midterm examination scores. We therefore include the midterm examination score YMi for student i, dummy variables for students in different classrooms, Class1i, Class2i, and Class3i (the classroom fixed effects), and a female dummy variable Femalei in the vector Xi.
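
A sketch of how Eq (1) can be estimated with heteroskedasticity-robust standard errors; the data file name is a placeholder for the dataset in the Supporting Information, and the variable names follow Table 2.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per student; "economics2.csv" is a placeholder file name.
df = pd.read_csv("economics2.csv")

# Eq (1): final examination score on the treatment dummy, the midterm score,
# classroom fixed effects, and a female dummy, with robust (HC1) standard errors.
eq1 = smf.ols("YF ~ D + YM + Class1 + Class2 + Class3 + Female", data=df)
result = eq1.fit(cov_type="HC1")
print(result.summary())
```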

Table 2. Descriptive statistics.

Variable Definition Mean Std. Dev. Min Max
Total (Obs. = 254)
YF Score in the final examination 63.807 20.341 0 100
D = 1 if the student was given information on her relative rank in the midterm examination, = 0 otherwise 0.488 0.501 0 1
YM Score in the midterm examination 50.602 17.020 12 102
H = 1 if YMi ≥ 60, = 0 otherwise 0.291 0.455 0 1
Class1 = 1 if in Classroom 1 (math class), = 0 otherwise 0.157 0.365 0 1
Class2 = 1 if in Classroom 2, = 0 otherwise 0.295 0.457 0 1
Class3 = 1 if in Classroom 3, = 0 otherwise 0.264 0.442 0 1
Female = 1 if the student is female, = 0 otherwise 0.130 0.337 0 1
Treatment (Obs. = 124)
YF 64.927 17.372 0 92
D 1 0 1 1
YM 49.677 15.154 12 95
H 0.250 0.435 0 1
Class1 0.129 0.337 0 1
Class2 0.306 0.463 0 1
Class3 0.266 0.444 0 1
Female 0.129 0.337 0 1
Control (Obs. = 130)
YF 62.738 22.833 0 100
D 0 0 0 0
YM 51.485 18.642 18 102
H 0.331 0.472 0 1
Class1 0.185 0.389 0 1
Class2 0.285 0.453 0 1
Class3 0.262 0.441 0 1
Female 0.131 0.338 0 1

1) Because we exclude from the sample one student whose midterm score was revised, our final sample comprises 254 students.

2) Our experiment was performed using freshman students in an economics department at a private Japanese university. Using the observed standard deviation of 17.02 from the midterm examination scores shown in Table 2, our sample size can detect a 5.5-point treatment effect of relative performance information feedback at the 10% significance level with 80% power (the sample size per group required to detect a significant effect is 120). Thus, our sample is underpowered to detect effect sizes below this cutoff, some of which could nonetheless be positive.
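
The power calculation in the note above can be reproduced, under standard two-sample t-test assumptions, with a short sketch like the following.

```python
from statsmodels.stats.power import TTestIndPower

sd = 17.02                 # observed SD of midterm scores (Table 2)
effect_size = 5.5 / sd     # minimum detectable effect in SD units (~0.32)

# Sample size per group for a two-sided test at the 10% level with 80% power.
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=0.10, power=0.80,
                                          alternative="two-sided")
print(round(n_per_group))  # roughly 120 students per group
```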

Column (1) in Table 3 reports the results of estimating Eq (1). We report robust standard errors that are not clustered by classroom but are adjusted for individual heterogeneity. In our experiment, there is no cluster in the population of interest that is not represented in the sample. Moreover, clustered standard errors with only a few clusters (e.g., just four classrooms in our experiment) can be unreliable, as explained by Angrist and Pischke [13], and there is no need to adjust standard errors for clustering once fixed effects are included, as argued by Abadie et al. [14]. The coefficient for D is 3.517 and statistically significant, indicating that the final examination scores of students who received information on their relative rank in the midterm examination are 3.517 points higher on average than the scores of students who did not receive this information. When we use the difference between the final and midterm examination scores (YFi − YMi) as the dependent variable, we again find a significantly positive impact of the relative rank information from the midterm examination: as shown in Column (2), the magnitude of the coefficient for Di is 3.706. The coefficient of Class2 in Column (1) and those of Class1–Class3 in Column (2) are significant. The classroom fixed effects would absorb instructor fixed effects as well as peer effects and other classroom-level factors, which may be associated with test scores.

Table 3. Estimation results: The effects of relative performance information feedback.

Dependent variable              (1) YF            (2) YF − YM
D                               3.517*            3.706*
                                (2.023)           (2.067)
YM                              0.752***
                                (0.075)
Class1                          -2.775            -7.694***
                                (3.166)           (2.953)
Class2                          -4.589*           -4.947*
                                (2.654)           (2.644)
Class3                          -4.552            -5.970**
                                (2.973)           (2.990)
Female                          0.978             1.268
                                (3.112)           (3.183)
Constant                        26.884***         15.478***
                                (4.773)           (2.416)
Obs.                            254               254
Adjusted R2                     0.38              0.03
F test, H0: all the coefficients except the constant are jointly zero
                                22.59***          2.77**

1) *,** and*** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

2) Standard errors in parentheses are adjusted for heterogeneity.

3) Because we exclude from the sample one student whose midterm examination score was revised, our final sample comprises 254 students.

As in the experimental design employed by Tran and Zeckhauser [10], we divided students into control and treatment groups within each classroom. Because Di was randomly assigned, the coefficient for Di has a causal interpretation. However, the coefficient for Di tells us the causal effect of the offer of treatment, including the possibility that some of those offered shared their ranks with their classmates, even though it would be difficult for a student in the control group to identify a student in the treatment group with precisely the same score. This suggests that the coefficient for Di may be small relative to the average causal effect on those actually treated.

The heterogeneity in the effects due to high- or low-performance in the midterm examination

To understand students’ incentives to study in our experiment, it is useful to identify the causes of our treatment effects. Czibor et al. [15] conduct a field experiment in a Dutch university comparing relative and absolute grading and find no significant differences in examination scores between the two schemes. On this basis, Czibor et al. [15] contend that rank incentives are weak if students adopt just-pass behavior: if students only care about whether they can pass the course, they will not exert effort to obtain a rank higher than the one they aspire to. Even under relative grading, relative performance information feedback may therefore exert different impacts on students’ incentives to study depending on their attitude toward obtaining higher grades. Our grading scheme is close to the classroom setting of just-pass students considered by Czibor et al. [15]. That is, although grades consist of S, A, B, C, and F in our experiment, the threshold between C and F is distinguished from the other thresholds because students must pass the course in order to graduate. In addition, in our experiment, when the instructors gave grades to the students afterward, they never adjusted the final raw scores downward. Under these circumstances, the threshold between C and F depends not only on the students’ ranks in the distribution of the final raw scores but also on their final raw scores. Therefore, in our experiment, the higher a student’s midterm examination score, the lower the score she requires in the final examination to receive credit for Economics II. This implies that rank carries less tangible benefits for students with a higher midterm examination score and more tangible benefits for those with a lower score. If so, relative performance information feedback could affect high- and low-performing students differently.

We can visually observe this heterogeneity in the violin plot of final examination scores by group in Fig 5. Below the median of the final examination scores, the kernel density for the treatment group is narrower than that for the control group. This suggests that relative performance information feedback may raise effort among students whose midterm examination scores were relatively low, while it may decrease effort among students whose midterm examination scores were relatively high. That is, relative performance information feedback could affect high- and low-performing students differently.

Fig 5. The violin plot of the final examination scores.


The violin plot comprises a combination of the box plot and the density trace. This includes a marker for the median scores of the final examination, a box indicating the interquartile range and spikes extending to the upper- and lower-adjacent values, and plots of the estimated kernel density.

To examine the heterogeneous effects, we consider the following two equations:

\[ Y_{Fi} = \alpha_1 D_i + \alpha_2 (D_i \times Y^{mc}_{Mi}) + X_i \beta_1 + u_{1i}, \qquad (2) \]
\[ Y_{Fi} = \alpha_3 D_i + \alpha_4 (D_i \times H_i) + \alpha_5 H_i + X_i \beta_2 + u_{2i}. \qquad (3) \]

In Eq (2), we replace YMi in Eq (1) with YMmci and additionally include the interaction term Di × YMmci, where YMmci denotes the midterm examination score centered at the mean of YMi. In Eq (3), we additionally include Hi and the interaction term Di × Hi in Eq (1), where Hi is a dummy variable equal to one if student i’s performance in the midterm examination was relatively high and zero otherwise. That is, we use the nonbinary midterm performance variable in the interaction term in Eq (2), while we use a dummy variable for high versus low midterm performance in the interaction term in Eq (3). In Eq (3), we set the threshold between high and low performance in the midterm examination at 60 points, the ex ante threshold score for passing or failing the course.
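
A sketch of how Eqs (2) and (3) can be estimated, using the same placeholder data file and Table 2 variable names as in the Eq (1) example; the constraint string at the end is our own illustration of one way to express the hypothesis tested in the last row of Table 4.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("economics2.csv")  # placeholder file name

# Eq (2): treatment interacted with the mean-centered midterm score (YM_mc).
df["YM_mc"] = df["YM"] - df["YM"].mean()
eq2 = smf.ols("YF ~ D + D:YM_mc + YM_mc + Class1 + Class2 + Class3 + Female",
              data=df).fit(cov_type="HC1")

# Eq (3): treatment interacted with the high-performance dummy H (midterm >= 60).
df["H"] = (df["YM"] >= 60).astype(int)
eq3 = smf.ols("YF ~ D + D:H + H + YM + Class1 + Class2 + Class3 + Female",
              data=df).fit(cov_type="HC1")

# Test whether the sum of the coefficients on D and D:H equals the coefficient on H.
print(eq3.t_test("D + D:H - H = 0"))
```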

As shown in Column (1) in Table 4, the estimated coefficient for the interaction term Di × YMmci is negative (−0.197) but insignificant, so we cannot confirm that the impact of relative performance information feedback depends linearly on midterm performance. In contrast, as shown in Column (2), the coefficient for Di is significantly positive, with a magnitude of 5.553: relative performance information feedback for students who performed poorly on the midterm examination has a positive impact on their performance. Moreover, although the coefficient for the interaction term Di × Hi is significantly negative, we do not reject the null hypothesis that the sum of the coefficients for Di and Di × Hi equals the coefficient for Hi in Column (2). This indicates that, for high-performing students, there is no significant difference between the final examination scores of students who received their rank information for the midterm examination and those who did not. That is, relative performance information feedback does not have a significant impact on high-performing students.

Table 4. Estimation results: The heterogeneity in the effects of relative performance information feedback.

Dependent variable              (1) YF            (2) YF
D                               3.686*            5.553**
                                (2.077)           (2.602)
D × YMmc                        -0.197
                                (0.134)
YMmc                            0.827***
                                (0.096)
D × H                                             -8.074**
                                                  (3.550)
H                                                 -2.892
                                                  (3.575)
YM                                                0.875***
                                                  (0.115)
Class1                          -2.640            -1.831
                                (3.136)           (3.140)
Class2                          -4.685*           -3.709
                                (2.641)           (2.690)
Class3                          -4.357            -3.622
                                (2.961)           (2.991)
Female                          1.148             1.363
                                (3.115)           (3.085)
Constant                        63.950***         20.831***
                                (2.396)           (6.137)
Obs.                            254               254
Adjusted R2                     0.39              0.40
F test, H0: all the coefficients except the constant are jointly zero
                                19.11***          18.66***
F test, H0: the sum of the coefficients of D and D × H equals the coefficient of H
                                                  0.01

1) *,** and*** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

2) Standard errors in parentheses are adjusted for heterogeneity.

3) Because we exclude from the sample one student whose midterm examination score was revised, our final sample comprises 254 students.

4) YMmc denotes the midterm examination scores which are centered at the mean of YM.

In Economics I and II, when student i took the midterm examination but not the final examination, we set YF for student i to zero. The official university guidelines also state that a student who does not take the final examination fails the course. In our experiment, 12 students who took the midterm examination did not take the final examination. Table 5 shows the number of students whose final examination score is zero, by control and treatment group. Although the students in both groups who skipped the final examination had relatively low midterm examination scores, there were more such students in the control group (nine) than in the treatment group (three). This suggests that relative performance information feedback could prevent students with lower midterm examination scores from dropping out of the final examination.

Table 5. The number of students whose final examination scores are zero.

Score in the midterm examination Control Treatment
18 2 0
19 0 1
28 4 0
32 1 1
37 0 1
38 2 0
Total 9 3

When we treat the students who did not take the final examination as missing and exclude them from our estimation sample, the coefficients for D in Eqs (1)–(3) are positive but insignificant, as reported in Columns (1)–(4) in Table 6. On the other hand, relative performance information feedback on the midterm examination has a significantly negative impact on dropping out of the final examination. Table 7 reports estimation results from OLS applied to a 0–1 dummy variable Dropouti, which takes the value of one if student i dropped out of the final examination and zero otherwise. Thus, relative performance information feedback prevents low-performing students from dropping out of the final examination.
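
A sketch of the linear probability model behind Table 7, Column (1), using the same placeholder data file as in the earlier examples; here we assume, as in the text, that a final examination score of zero marks a student who did not sit the final.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("economics2.csv")  # placeholder file name

# Dropout = 1 if the student took the midterm but not the final examination
# (recorded as YF = 0 in the data), and 0 otherwise.
df["Dropout"] = (df["YF"] == 0).astype(int)

# Linear probability model with robust (HC1) standard errors.
lpm = smf.ols("Dropout ~ D + YM + Class1 + Class2 + Class3 + Female",
              data=df).fit(cov_type="HC1")
print(lpm.params["D"], lpm.bse["D"])  # treatment effect on the dropout probability
```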

Table 6. Robustness check: The effects of relative performance information feedback (excluding students whose final examination scores are zero).

Dependent variable        (1) YF        (2) YF − YM   (3) YF        (4) YF
D                         0.820         1.734         0.958         2.637
                          (1.536)       (1.792)       (1.602)       (2.012)
D × YMmc                                              -0.074
                                                      (0.103)
YMmc                                                  0.562***
                                                      (0.058)
D × H                                                               -6.300**
                                                                    (3.121)
H                                                                   0.596
                                                                    (3.356)
YM                        0.532***                                  0.572***
                          (0.050)                                   (0.091)
Obs.                      242           242           242           242
Adjusted R2               0.37          0.03          0.37          0.38
F test, H0: all the coefficients except the constant are jointly zero
                          24.54***      2.92**        22.41***      20.15***
F test, H0: the sum of the coefficients of D and D × H equals the coefficient of H
                                                                    0.86

1) *,** and*** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

2) Standard errors in parentheses are adjusted for heterogeneity.

3) Because we exclude from the sample students whose final examination score was zero, as well as one student whose midterm examination score was revised, our final sample comprises 242 students.

4) YMmc denotes the midterm examination scores which are centered at the mean of YM.

5) Coefficients of Class1, Class2, Class3, Female and Constant are not reported.

Table 7. Robustness check: The effects of relative performance information feedback on dropout (linear probability model).

Dependent variable        (1) Dropout     (2) Dropout
D                         -0.050*         -0.052*
                          (0.026)         (0.027)
D × YMmc                                  0.
                                          (0.002)
YM                        -0.004***
                          (0.001)
YMmc                                      -0.005***
                                          (0.002)
Obs.                      254             254
Adjusted R2               0.09            0.09
F test, H0: all the coefficients except the constant are jointly zero
                          2.24**          1.94*

1) *,** and*** indicate statistical significance at the 10%, 5% and 1% levels, respectively.

2) Standard errors in parentheses are adjusted for heterogeneity.

3) Because we exclude from the sample one student whose midterm examination score was revised, our final sample comprises 254 students.

4) YMmc denotes the midterm examination scores which are centered at the mean of YM.

5) Coefficients of Class1, Class2, Class3, Female and Constant are not reported.

In our experiment, we confirm the following key finding: relative performance information feedback elicits the incentive to study among students who performed poorly on the midterm examination, but it has no significant impact on high-performing students.

Discussion

We demonstrate that the significant positive impact of relative performance information feedback is mainly driven by its effect on low-performing rather than high-performing students. What is the reason for this? Suppose that a student aspires to achieve a higher score than another student. Relative performance information feedback tells her about the distribution of student abilities and how much effort she needs to beat her competitor. Thus, a student with feedback faces reduced uncertainty about the threshold needed to pass. However, the impact of this feedback may be asymmetric, depending on the student’s position relative to her rival: if she knows that she is already ahead of her rival, she may slack off; if she knows that she is tied with her rival, she will do her best; and if she knows that she is behind her rival, she may give up. Therefore, relative performance information feedback could, in principle, diminish the incentive to study among students with very low midterm examination scores even in our setting.

However, we did not find such detrimental effects. Instead, we find that relative performance information feedback reduces the number of students who drop out before the final examination. On this basis, our result is closely related to the effect of class size on students’ incentive to study, as demonstrated by Andreoni and Brownback [3] and Brownback [16]. These studies relate an increase in the number of enrollments in an all-pay auction to a decrease in the uncertainty of the threshold needed to pass in a tournament. Their theoretical model predicts that there are aggregately positive but heterogeneously mixed impacts; a decreasing degree of uncertainty as measured by an increasing class size elicits the effort of high-ability students but suppresses the effort of low-ability students. Andreoni and Brownback [3] confirm this theoretical prediction in a laboratory experiment. In contrast, Brownback [16] conducts an actual classroom experiment, and finds that although increasing class size has positive aggregate impacts, there is no negative impact on the effort of low-ability students. This is similar to our findings.

Some other field experiments also demonstrate that treatments regarding relative rewarding or relative performance information feedback have negative impacts on the students with relatively low abilities. Campos-Mercade and Wengstrom [17] conduct field experiments in an actual university program to examine how monetary reward affects the threshold incentive. In their experiments, students in the treatment group received a scholarship if they reached a certain GPA. The authors demonstrate that only the treatment effects on the male students who are just below the thresholds are significantly positive in the short run. This result is also consistent with ours. However, Campos-Mercade and Wengstrom [17] also demonstrate that the treatment effects on the female students whose initial abilities are low are significantly negative in the long run. Bedard and Fischer [18] examine the effect of relative evaluation on students’ examination performance using a field experiment in an actual university classroom. They find that the relative evaluation scheme negatively affects the performance of students who consider they are relatively low in the ability distribution. Ashraf et al. [11] conduct a field experiment in a health assistant training program in Zambia. In their experiment, student rewards are absolute, with some students advised that they will receive a rank-related reward. The authors conclude that the performance of students whose initial achievement level is relatively low is significantly lower when it is announced that they will receive a rank-related reward. These results suggest that relative performance information feedback might have potentially detrimental impacts on students with low ability.

In our experiment, however, information feedback prevents students with low midterm examination scores from dropping out. This may be due to a number of factors specific to our experiment. For example, the midterm examination score distribution may be sufficiently concentrated; if the ability distribution were more dispersed, relative performance information feedback might have a detrimental impact even in our experimental framework. Other factors may also prevent students with low midterm scores from dropping out. Further research is required to identify the determinants of the incentive to study for low-ability students.

Finally, we should note a limitation concerning the generalizability of our results. If students’ primary concern is to obtain the highest possible grade, the results may differ: when there are multiple thresholds that students would like to pass, relative performance information feedback may have a significant positive impact even on high-performing students. Indeed, this may be the case in the study by Tran and Zeckhauser [10]. In their experiment, rank has no tangible benefit, but students may have an inherent preference for rank. Gill et al. [19] conduct a laboratory experiment to examine the impact of relative performance information feedback on subjects’ performance when rewards are absolute, that is, independent of other subjects’ performance. They find that the rank response function is U-shaped: subjects increase their effort most in response to relative performance information feedback when they are ranked first or last. The authors argue that the U-shaped response function is caused by the combination of pride or the “joy of winning” from achieving a high rank and an aversion to a low rank. Further research is needed to identify the most effective grading and information feedback scheme for eliciting the incentive to study among high-ability students.

Conclusions

Our experimental results demonstrate that relative performance information feedback has a positive impact on a student’s examination score in a relative grading environment where she takes examinations multiple times. In particular, this positive impact is indeed significant for low-performing students in the previous examination. As emphasized in Becker and Rosen [1], a student’s position in the distribution of academic attainment is crucial in relative grading.

There are two important areas for future research: one is to identify the determinants of the incentive to study for low-performing students; the second is to identify the most effective grading and information feedback scheme for eliciting the incentive to study for high-performing students.

Supporting information

S1 File. Steps to replicate the tables and figures.

Instructions to replicate the tables and figures in “Information Feedback in Relative Grading: Evidence from a Field Experiment” by Shinya Kajitani, Keiichi Morimoto and Shiba Suzuki.

(PDF)

S2 File. Stata program file.

(DO)

S1 Dataset. This is the CSV file that contains our datasets.

(CSV)

Acknowledgments

We are very grateful to four anonymous reviewers of this journal, Akira Yamazaki, Kentaro Kobayashi, Hayato Nakata, and Masahiro Watabe for invaluable comments and suggestions. We also thank Naohito Abe, Kosuke Aoki, David Gill, Tao Gu, Shigeki Kano, Vu Tuan Kai, Nobuyoshi Kikuchi, Jun-Hyung Ko, Colin McKenzie, Akira Miyaoka, Koyo Miyoshi, Tomoharu Mori, Chang-Min Lee, Masao Nagatsuka, Kengo Nutahara, Fumio Ohtake, Daniela Puzzello, David Reiley, Masaru Sasaki, Michio Suzuki, Kan Takeuchi, Ryuichi Tanaka, Tomoaki Yamada, and participants at the Japanese Economic Association 2013 Spring Meeting (Toyama, Japan), the Economic Science Association European Meeting 2015 (Heidelberg, Germany), the Asian and Australasian Society of Labour Economics 2018 Conference (Seoul National University), and the Kansai Labor Workshop, as well as seminar participants at Meiji University, Meisei University, Seikei University, and the University of Tokyo for their helpful comments.

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Becker WE, Rosen S. The learning effect of assessment and evaluation in high school. Econ Educ Rev. 1992;11(2):107–18. 10.1016/0272-7757(92)90002-K. [DOI] [Google Scholar]
  • 2. Lazear EP, Rosen S. Rank-order tournaments as optimum labor contracts. J Polit Econ. 1981;89(5):841–64. 10.1086/261010. [DOI] [Google Scholar]
  • 3. Andreoni J, Brownback A. All pay auctions and group size: Grading on a curve and other applications. J Econ Behav Organ. 2017;137:361–73. 10.1016/j.jebo.2017.03.017. [DOI] [Google Scholar]
  • 4. Aoyagi M. Information feedback in a dynamic tournament. Games Econ Behav. 2010;70(2):242–60. 10.1016/j.geb.2010.01.013. [DOI] [Google Scholar]
  • 5. Ederer F. Feedback and motivation in dynamic tournaments. J Econ Manag Strat. 2010;19(3):733–69. 10.1111/j.1530-9134.2010.00268.x. [DOI] [Google Scholar]
  • 6. Eriksson T, Poulsen A, Villeval MC. Feedback and incentives: Experimental evidence. Labour Econ. 2009;16(6): 679–88. 10.1016/j.labeco.2009.08.006. [DOI] [Google Scholar]
  • 7. Freeman RB, Gelber AM. Prize structure and information in tournaments: Experimental evidence. Am Econ J Appl Econ. 2010;2(1):149–64. 10.1257/app.2.1.149. [DOI] [Google Scholar]
  • 8. Ludwig S, Lünser GK. Observing your competitor: The role of effort information in two-stage tournaments. J Econ Psychol. 2012;33:166–82. 10.1016/j.joep.2011.09.011. [DOI] [Google Scholar]
  • 9. Azmat G, Iriberri N. The importance of relative performance feedback information: Evidence from a natural experiment using high school students. J Pub Econ. 2010;94(7):435–52. 10.1016/j.jpubeco.2010.04.001. [DOI] [Google Scholar]
  • 10. Tran A, Zeckhauser R. Rank as an inherent incentive: Evidence from a field experiment. J Pub Econ. 2012;96(9):645–50. 10.1016/j.jpubeco.2012.05.004. [DOI] [Google Scholar]
  • 11. Ashraf N, Bandiera O, Lee SS. Awards unbundled: Evidence from a natural field experiment. J Econ. Behav Organ. 2014;100:44–63. 10.1016/j.jebo.2014.01.001. [DOI] [Google Scholar]
  • 12. Levitt SD, List JA, Neckermann S, Sadoff S. The Behavioralist Goes to School: Leveraging Behavioral Economics to Improve Educational Performance. Am Econ J Econ Policy. 2016;8(4):183–219. 10.1257/pol.20130358. [DOI] [Google Scholar]
  • 13. Angrist JD, Pischke JS. Mostly harmless econometrics: An empiricist’s companion. Princeton: Princeton university press; 2008. [Google Scholar]
  • 14. Abadie A, Athey S, Imbens GW, Wooldridge J. When should you adjust standard errors for clustering? National Bureau of Economic Research Working Paper 24003; 2017.
  • 15. Czibor E, Onderstal S, Sloof R, van Praag M. Does relative grading help male students? Evidence from a field experiment in the classroom. Econ Educ Rev. Forthcoming. 10.1016/j.econedurev.2019.101953. [DOI]
  • 16. Brownback A. A classroom experiment on effort allocation under relative grading. Econ Educ Rev. 2018;62:113–28. 10.1016/j.econedurev.2017.11.005. [DOI] [Google Scholar]
  • 17. Campos-Mercade P, Wengström E. Threshold incentives and academic performance. Job market paper. https://www.dropbox.com/s/ejnc8uzz6y30dio/Campos-Mercade_JMP.pdf?dl=0.
  • 18. Bedard K, Fischer S. Does the response to competition depend on perceived ability? Evidence from a classroom experiment. J Econ Behav Organ. 2019;159:146–66. 10.1016/j.jebo.2019.01.014. [DOI] [Google Scholar]
  • 19. Gill D, Kissová Z, Lee J, Prowse V. First-place loving and last-place loathing: How rank in the distribution of performance affects effort provision. Manage Sci. 2019;65(2):494–507. 10.1287/mnsc.2017.2907. [DOI] [Google Scholar]

Decision Letter 0

Baogui Xin

10 Feb 2020

PONE-D-19-35238

Information feedback in relative grading: Evidence from a field experiment

PLOS ONE

Dear Prof. Suzuki,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Though Reviewer 4 rejected PONE-D-19-35238, the reviewer provided many valuable and constructive comments. Considering reviewers’ useful comments and the interesting topic of the manuscript, I would like to give you a chance to revise your manuscript. The revised manuscript will undergo the next round of review by the same reviewers.

We would appreciate receiving your revised manuscript by Mar 26 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Baogui Xin, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors conducted a randomized controlled trial at a Japanese university to examine the impact of relative performance information feedback on students' examination scores under a relative grading scheme. Their results show that relative performance information feedback has a significantly positive impact on the students' final examination score on average, but that the average positive impact is driven by the improvement of low-performing students.

In general, this is not a large sample for an RCT, but I think it is an interesting topic, and the authors present a good study design and data analysis. However, I fear that the paper will still require quite some work.

First, although the authors claim that it would generally be difficult for a student in the control group to identify a student in the treatment group with exactly the same score, so that the ITT effects may underestimate the average causal effects on those actually treated, my major concern remains the information spillover: students in both the control and treatment groups sit in the same classroom and can exchange the information in their letters. In fact, it is easy for students to exchange their score and rank information within the same classroom, especially in Classroom 1 (only 40 students), and students may guess why only some of them receive rank information, which would disturb the treatment effect. The authors should provide more clarification or verification of this problem.

Second, the authors should report more details on the experimental manipulation. For example, how did they avoid different pass standards for exam scores across instructors? Did passing the exam depend on a specific distribution of all the scores, and did students know this? Was there a strict demand from the school on the distribution of every score level (A/B/C/D) for instructors?

Third, the authors can do more work on the data analysis. The paper gives some reasons why the authors do not report robust standard errors clustered at the classroom level. I would like to know whether they can try to cluster at another level, such as dormitory. The paper also does not report the variables controlled for in the regression. In addition, could they try to use the difference between the midterm exam score and the final exam score as the explained variable?

Reviewer #2: Review overview:

In this paper, the authors look at how effort responds to information about relative position in a classroom graded on a curve. The primary treatment arm randomly assigns students to receive information about their midterm exam grade relative to the population of students. Then, the authors measure how this affects performance on the final exam. As a result of this information intervention, higher-ability students reduce their motivation relative to lower-ability students.

This paper has chosen an important topic and an ambitious approach to studying it. I have no concerns about this being of broad interest.

Please see attached file for the remainder of the review.

Reviewer #3: Report on PONE-D-19-35238: Information feedback in relative grading: Evidence from a field experiment by Shinya Kajitani, Keiichi Morimoto, & Shiba Suzuki

Summary: The authors report the results from a field experiment with economics students testing the impact of providing relative performance feedback in an intermediate exam on students' performance in the final exam. They find that the intermediate relative performance feedback has a significantly positive effect on students' performance in the final exam on average, and that this effect is mainly driven by students who performed badly in the intermediate exam.

Comments:

This paper reports interesting results from a well-conducted field experiment in a relevant setting with real incentives. I am in general quite positive about the paper, however, I do have some comments and questions that the authors should or could address in a revision.

Major comments

1) Generalizability of the result that relative performance feedback mainly affects low performers: One of the main results of the experiment, which the authors mention already in the abstract, is that the relative performance feedback had an effect mainly by increasing the performance of students who performed badly in the intermediate exam. While this effect is interesting and is supported by the data analysis of the current experiment, it seems important to state that this effect cannot simply be generalized to other settings in which relative performance feedback is given. From a theoretical perspective, on which groups relative performance feedback has an effect likely depends on the distribution of performance or skills differences in the population (and, of course, people's knowledge about or perception of these distributions) and on the exact incentive structure. If the incentive structure is such that only very top performers are rewarded or if initial low performers have much lower skills (higher effort costs) than their higher-skilled competitors, then it may well be the case that relative performance feedback actually demotivates low performers and has a more motivating effect on people in the middle of the skill / performance distribution. Thus, while I believe the result the authors find in the current setting is interesting, I think it should be made clear that one cannot automatically expect it to generalize to other settings where relative performance feedback is used.

2) Observations of students who did not take the final exam: Currently the authors set the scores of students who did not take the final exam to zero. This seems to be a strong assumption that is not necessarily warranted. Another approach would be to simply treat these observations as missing and exclude them from the analyses. As a robustness check, I would like to see the results of the regression analyses if this way of dealing with these cases is used. Moreover, I find the authors' use of the "Intention to Treat" (ITT) effect in the paper confusing. To me an ITT effect would much rather be including those students who were randomized into a certain group but did not receive the treatment (i.e., the feedback).

3) Estimators used in regression analyses: what estimator was used in the regressions? I assume the authors used a Tobit-estimation, as they speak about left-censoring. If this is the case, I doubt whether it is necessary. It seems perfectly fine to simply use OLS to estimate the effects of treatment interventions on test scores with a lower bound at zero (see, e.g., Levitt et al., 2016). Using OLS would also allow dropping the marginal effects columns, as the coefficients can be directly interpreted. In any case, the authors should be clear about what estimator they used and why.

Minor comments

4) Information / knowledge about incentive structure: The authors write that the "students in Economics II knew the grading scheme … because the instructors already explained the grading scheme in detail at the beginning of the second semester and this grading scheme had already been employed in Economics I" (p. 8). Given that students' knowledge about the relative performance incentive structure created by the grading scheme is a key element of the experiment and the paper, I think it would be nice to provide more details on the exact information that students received. Could, for instance, slides or other materials that were used to explain the grading scheme to students be shown in an Appendix? Or, even better, do the authors have any data on students' understanding of the grading system and the incentive structure it creates? The system seems relatively complicated with quite some discretion on the part of the graders, so it is not completely obvious that the students would have understood how the system works exactly and what incentives it creates. Any additional evidence that can be provided in this regard would therefore be helpful and make the interpretation of the experimental results more convincing.

5) Were the graders blind to the treatment a certain student had been assigned to? Ideally, they would have been. In any case, this information should be added.

6) If available, please provide some more information on demographics for the randomization checks in Table 1 (e.g., gender or any other available data).

7) I think the paragraph on how randomization avoids pitfalls of regression to the mean (p. 10) can be deleted. This point is obvious to anybody who has understood how randomization works, and the paragraph does little more than divert the reader's attention.

8) I don't understand how Table 5 allows addressing the point that "relative performance information feedback could positively affect high-performing students more than low-performing students, even if the rank for the high-performing students has less tangible benefits" (p. 15). The only thing that the regressions reported in Table 5 do differently compared to those in Table 4 is to use a non-binary intermediate performance variable (the score in the intermediate exam) in the interaction term. This is a relevant robustness check, as the dummy for top-half vs. bottom-half in the intermediate exams contains less information than the quasi-continuous score variable. I would suggest that the authors motivate Table 5 that way, or alternatively, explain better how this analysis addresses the point quoted above.

9) Moreover, the authors could consider mean-centering Y_M (the score in the intermediate exam) in the regressions reported in Table 5. Without mean-centering (as is currently the case) the effect of the treatment Dummy D is estimated at Y_M = 0, which is a very special case.

10) Wording / language:

P. 2: "efficient way of eliciting the incentives": "eliciting" doesn't seem to be the right word to me here. Maybe "increasing" would be better?

P. 3: First sentence of the second paragraph (starting with "revealing the role of…") needs to be rewritten.

P. 6: First sentence in second full paragraph: "The randomized controlled trial was conducted immediately after…": I think it would be more appropriate to write something like "The experimental intervention was implemented immediately after…."

P. 9: I would suggest calling Table 1 "Randomization Checks" instead of "Confirmation of randomness"

Congratulations to the authors on a very nice paper and all the best for their future work!

References:

Levitt, S. D., List, J. A., Neckermann, S., & Sadoff, S. (2016). The behavioralist goes to school: Leveraging behavioral economics to improve educational performance. American Economic Journal: Economic Policy, 8(4), 183-219.

Reviewer #4: This paper reports results of an RCT study on the effect of relative performance information feedback on students’ examination score in a compulsory subject in a Japanese university. Authors find a significant positive effect but the average treatment effect is driven by improvement of scores of low-performing students but not high-performing students. Though there are a few studies on effect of relative grading feedback on scores, there is a dearth of studies on effect of relative grading feedback on scores in a real educational environment. In this context, the study contributes to vast empirical literature on grade incentives. However, there are major issues that need to be addressed. Please find below my specific and general comments.

Specific Comments:

1. The design of the experiment has not been explained clearly. For instance, what was the logic behind having Classroom 1 with higher-ability students selected into it? Is it because they are deemed to be more motivated than others, or was the stratification driven by other aspects? What were the variations for the three other classrooms? Was there a neutral framing in the group assignment?

2. Any insights on the power calculations would be useful to understand whether the cell sizes are statistically justified.

3. The experiment procedure suggests several threats to internal validity. Of particular concern are the spill over effects. The authors have raised concerns about both treatment and control groups sitting in the same classroom. Was there any control over communications among students about the different grading patterns?

4. How does the treatment result in the observed impact? There is a debate on how relative grading influences motivation that has not been discussed. Aspects of direct competition among peers are not evident.

5. Does the relative performance feedback affect self-perceived ability, cognitive tactics, strategies, reinforcing cues, or identity rather than directly affecting effort? How do the effort incentives differ for those with low and high grades?

6. Theoretically relevant interaction effects and robustness checks could be examined. E.g. past academic history and relative grading.

7. Are there gender gaps in performance such that the sex-ratios in classrooms need to be controlled for?

8. The role of teachers is not clear in this experiment. There is no information on teacher ability differences that may impact students' scores. This is particularly disconcerting because feedback and instructions have interactions that have been shown to impact performance.

General Comments:

1. The paper is generally well written but there are a few typos that can be addressed.

2. The review of literature can improve, and major theoretical aspects relevant to the paper can be developed.

3. Please verify the claim “By contrast, no existing study examines the impact of relative performance information feedback on student incentives under relative grading in an actual educational environment.”

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jun Luo

Reviewer #2: No

Reviewer #3: Yes: Manuel Grieder

Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PLOS_One_Review.docx

PLoS One. 2020 Apr 20;15(4):e0231548. doi: 10.1371/journal.pone.0231548.r002

Author response to Decision Letter 0


24 Mar 2020

Responses to Editor and Reviewers

Dear Professor Baogui Xin and Reviewers,

Thank you very much for giving us an opportunity to submit a revised version of our manuscript entitled “Information feedback in relative grading: Evidence from a field experiment” (Manuscript ID: PONE-D-19-35238). We would like to thank the referees for their insightful comments, all of which have contributed to improving our manuscript. We have followed the referees’ comments and we summarize our responses to their comments below. The major revised portions are marked in red in the revised manuscript. Our detailed responses to all of the comments are in “Reply_Letter.pdf”.

We hope that this version of our paper is now deemed suitable for publication and we look forward to hearing from you in due course.

Sincerely,

Shiba Suzuki, PhD

Professor of Economics, Seikei University

==================== 

Responses to reviewers’ comments

Reviewer #1

1.1) First, although the authors claim that it would generally be difficult for a student in the control group to identify a student in the treatment group with exactly the same score, so that the ITT effects may underestimate the average causal effects on those actually treated, my major concern remains the information spillover: students in both the control and treatment groups sit in the same classroom and can exchange the information in their letters. In fact, it is easy for students to exchange their score and rank information within the same classroom, especially in Classroom 1 (only 40 students), and students may guess why only some of them receive rank information, which would disturb the treatment effect. The authors should provide more clarification or verification of this problem.

In response to this comment, we confirm that most of the students in the control group cannot know their own rank exactly even if students in Classroom 1 aggregate the rank information they received. Thus, the degree of uncertainty that students in the control group face is much higher than that for students in the treatment group. This point is now added as a separate paragraph in lines 271–288 in the revised paper.

1.2) Second, the authors should report more details on the experimental manipulation. For example, how did they avoid different pass standards for exam scores across instructors? Did passing the exam depend on a specific distribution of all the scores, and did students know this? Was there a strict demand from the school on the distribution of every score level (A/B/C/D) for instructors?

In response to this comment, the slide that was used to explain the grading criteria in the guidance for freshman students in April has been added as Fig. 4 in the revised paper (p. 8). In addition, we also explain that instructors verbally announced the possibility of raw scores being adjusted upward (lines 185–190 in the revised paper). There was no strict demand from the school on the distribution of every score level for instructors.

In addition, we have added an explanation (lines 219–227) in the revised paper to stress that each instructor could not deviate from the agreed pass scores.

1.3) Third, the authors can do more work on the data analysis. The paper gives some reasons why the authors do not report robust standard errors clustered at the classroom level. I would like to know whether they can try to cluster at another level, such as dormitory. The paper also does not report the variables controlled for in the regression. In addition, could they try to use the difference between the midterm exam score and the final exam score as the explained variable?

There is no other level of clustering (such as dormitory) in our experimental environment. However, we had not adjusted the standard errors for individual heterogeneity. We have rewritten the text on this issue (lines 305–307) and now show robust standard errors in all tables in the revised paper.

The variables we controlled in the regression were D, YM, Class1, Class2, and Class3, as shown in Table 3 in the previous version (p. 13). In the revised paper, we additionally control for the sex dummy Female in the regression. We highlight this in lines 302–304 in the revised paper.

Moreover, as the reviewer suggested, we use the difference between the final examination score and the midterm examination score (Y_F − Y_M) as the explained variable instead of including Y_M as an independent variable. As shown in Table 3 in the revised paper (p. 14), the magnitude of the coefficient of D in Column (2) (3.706) is similar to that in Column (1) (3.517). We point this out in lines 315–319 in the revised paper.
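For illustration, a minimal Stata sketch of the two specifications described above might look as follows; the variable names (YF, YM, D, Class1–Class3, Female) follow the notation in the paper but are assumptions here, and this is not the authors' actual do-file.

    * Baseline specification: final exam score on treatment, midterm score,
    * classroom dummies, and the female dummy, with robust standard errors
    regress YF D YM Class1 Class2 Class3 Female, vce(robust)

    * Alternative specification suggested by the reviewer: use the difference
    * between the final and midterm scores as the explained variable
    generate Ydiff = YF - YM
    regress Ydiff D Class1 Class2 Class3 Female, vce(robust)

In a sketch like this, the coefficient on D in the second regression can be compared directly with the coefficient on D in the first, which is the comparison reported above.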

Reviewer #2:

2.1) By separating the top 40 students, the interpretation of information feedback becomes challenging. Moreover, the treatment assignment weights are different between classroom 1 and classrooms 2-4.

- I’m a bit confused about Table 3, where there are class-level indicator variables. In this case, dropping Class 1 should only change the precision of the estimates of other covariates, but shouldn’t change the estimation of the treatment effects, because there is no within-class-1 variation in treatment. So, I think you could even drop columns (2a) and (2b) and say it’s taken care of by the indicator variables.

- When you think about the treatment on the treated, it is also challenging to think about how this classroom affects that. If I’m not in Class 1, then I already know something about my relative score, so it’s a little strange. I would like to see this explained a bit in the text.

Thank you for your insightful suggestion that “dropping Classroom 1 should only change the precision of the estimates of other covariates, but should not change the estimation of the treatment effects.” We have now deleted the estimation results shown in Columns (2a) and (2b) in Tables 3–5 in the previous paper.

2.2) The fact that the courses are not explicitly graded on a curve is a little strange. The curving is more important for the lower performing students because it can only move scores upwards. I would like to see the paper mention how an asymmetric curve may affect lower ability students differently from higher ability ones.

A few papers that I would cite are:

- Gill, David, et al. "First-place loving and last-place loathing: How rank in the distribution of performance affects effort provision." Management Science 65.2 (2019): 494-507.

-This could help inform the discussion of how rankings matter for students.

-Bedard, Kelly, and Stefanie Fischer. "Does the response to competition depend on perceived ability? Evidence from a classroom experiment." Journal of Economic Behavior & Organization 159 (2019): 146-166.

- "Threshold Incentives and Academic Performance," Pol Campos-Mercade and Erik Wengström

- Brownback, Andy. "A classroom experiment on effort allocation under relative grading." Economics of Education Review 62 (2018): 113-128.

These three can address the existing literature on how perceptions or realities about relative ranking affect effort choices.

We appreciate this comment because these suggested papers have substantially improved our manuscript. In response, we added the new subsection “Discussion” (pages 19-22) in the revised paper to relate our experimental results to those of existing research. In particular, the first and the second paragraphs in the subsection “Discussion” (p. 19) state the possible asymmetric impact of relative performance information feedback.

Reviewer #3:

3.1) Generalizability of the result that relative performance feedback mainly affects low performers: One of the main results of the experiment, which the authors mention already in the abstract, is that the relative performance feedback had an effect mainly by increasing the performance of students who performed badly in the intermediate exam. While this effect is interesting and is supported by the data analysis of the current experiment, it seems important to state that this effect cannot simply be generalized to other settings in which relative performance feedback is given. From a theoretical perspective, on which groups relative performance feedback has an effect likely depends on the distribution of performance or skills differences in the population (and, of course, people's knowledge about or perception of these distributions) and on the exact incentive structure. If the incentive structure is such that only very top performers are rewarded or if initial low performers have much lower skills (higher effort costs) than their higher-skilled competitors, then it may well be the case that relative performance feedback actually demotivates low performers and has a more motivating effect on people in the middle of the skill / performance distribution. Thus, while I believe the result the authors find in the current setting is interesting, I think it should be made clear that one cannot automatically expect it to generalize to other settings where relative performance feedback is used.

We really appreciate this comment. In response, we have added the new subsection “Discussion” (pages 19-22) in the revised paper to discuss the generalizability of our results. In particular, we relate our experimental results to those of existing research that stress the importance of students’ ability distribution. For example, the paragraphs starting line 443 and 464 state the possible negative impact of relative performance information feedback on students with low midterm scores. The paragraph starting line 472 suggests the possible positive impact on students with high midterm scores.

3.2) Observations of students who did not take the final exam: Currently the authors set the scores of students who did not take the final exam to zero. This seems to be a strong assumption that is not necessarily warranted. Another approach would be to simply treat these observations as missing and exclude them from the analyses. As a robustness check, I would like to see the results of the regression analyses if this way of dealing with these cases is used. Moreover, I find the authors' use of the "Intention to Treat" (ITT) effect in the paper confusing. To me an ITT effect would much rather be including those students who were randomized into a certain group but did not receive the treatment (i.e., the feedback).

This is an insightful suggestion for revising our paper. In both Economics I and II, when students did not take the examination, we usually set their examination score to zero. Table 5 in the revised paper shows that, although students whose final examination score is zero had lower midterm examination scores in both the control and treatment groups, the number of students in the control group who did not take the final examination was greater than in the treatment group. This suggests that relative performance information feedback could prevent students with lower midterm examination scores from dropping out of the final examination. When we exclude these students from our estimation sample, the coefficients for D are positive but insignificant, as reported in Columns (1)–(4) in Table 6 in the revised paper. On the other hand, relative performance information feedback on the midterm examination has a significantly negative impact on the indicator for dropping out of the final examination (Dropout), as shown in Table 7 in the revised paper. We thus have suggestive evidence that relative performance information feedback prevents low-performing students from dropping out of the final examination. We point out these issues in the revised paper (lines 391–401 and 402–410). Moreover, as the reviewer rightly notes, it was inappropriate to use the "intention-to-treat" (ITT) effect in the earlier version of the paper. We have rewritten the relevant text in the revised paper.
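A minimal sketch of the dropout analysis described here, assuming Dropout is a 0/1 indicator for not taking the final examination and that the other variable names match the paper's notation (the exact model in Table 7 may differ), could be:

    * Indicator for not sitting the final examination (assumed coding:
    * non-takers have their score set to zero)
    generate Dropout = (YF == 0)

    * Effect of the feedback treatment D on the probability of dropping out;
    * a probit with robust standard errors is one natural choice
    probit Dropout D YM Class1 Class2 Class3 Female, vce(robust)
    margins, dydx(D)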

3.3) Estimators used in regression analyses: what estimator was used in the regressions? I assume the authors used a Tobit-estimation, as they speak about left-censoring. If this is the case, I doubt whether it is necessary. It seems perfectly fine to simply use OLS to estimate the effects of treatment interventions on test scores with a lower bound at zero (see, e.g., Levitt et al., 2016). Using OLS would also allow dropping the marginal effects columns, as the coefficients can be directly interpreted. In any case, the authors should be clear about what estimator they used and why.

We have rewritten the estimation model (first paragraph in the section “Results and Discussion” on p. 12 of the revised paper) and show the revised estimation results in Column (1) in Table 3 in the revised paper.
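For concreteness, the two estimators the reviewer contrasts could be run as follows in Stata; this is only a sketch with assumed variable names, not the code used in the revised paper.

    * Tobit treating final exam scores as left-censored at zero
    tobit YF D YM Class1 Class2 Class3 Female, ll(0) vce(robust)
    * average marginal effect of D on the expected (censored) score
    margins, dydx(D) predict(ystar(0,.))

    * Plain OLS, as the reviewer suggests; coefficients are directly interpretable
    regress YF D YM Class1 Class2 Class3 Female, vce(robust)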

3.4) Information / knowledge about incentive structure: The authors write that the "students in Economics II knew the grading scheme … because the instructors already explained the grading scheme in detail at the beginning of the second semester and this grading scheme had already been employed in Economics I" (p. 8). Given that students' knowledge about the relative performance incentive structure created by the grading scheme is a key element of the experiment and the paper, I think it would be nice to provide more details on the exact information that students received. Could, for instance, slides or other materials that were used to explain the grading scheme to students be shown in an Appendix? Or, even better, do the authors have any data on students' understanding of the grading system and the incentive structure it creates? The system seems relatively complicated with quite some discretion on the part of the graders, so it is not completely obvious that the students would have understood how the system works exactly and what incentives it creates. Any additional evidence that can be provided in this regard would therefore be helpful and make the interpretation of the experimental results more convincing.

In response to this comment, we have added the slide to explain the grading scheme for freshman students in April as Fig. 4 on p. 9. Although the slide only states the relationship between scores and grades, we also verbally explained that there is a possibility that raw scores are adjusted upward to obtain a reasonable pass rate. In addition, we explain the grading scheme of Economics I in more detail in lines 201–217.

3.5) Were the graders blind to the treatment a certain student had been assigned to? Ideally, they would have been. In any case, this information should be added.

In response to this comment, we have explained the grading scheme of Economics I in more detail; the added text appears in lines 146–153 of the revised paper.

3.6) If available, please provide some more information on demographics for the randomization checks in Table 1 (e.g., gender or any other available data).

We additionally provide a randomization check (Table 1 in the revised paper). There is no significant difference in the mean scores between the control and treatment groups by sex, as clarified in lines 269–270.

3.7) I think the paragraph on how randomization avoids pitfalls of regression to the mean (p. 10) can be deleted. This point is obvious to anybody who has understood how randomization works, and the paragraph does little more than divert the reader's attention.

As the reviewer suggested, we have deleted this point in the revised paper.

3.8) I don't understand how Table 5 allows addressing the point that "relative performance information feedback could positively affect high-performing students more than low-performing students, even if the rank for the high-performing students has less tangible benefits" (p. 15). The only thing that the regressions reported in Table 5 do differently compared to those in Table 4 is to use a non-binary intermediate performance variable (the score in the intermediate exam) in the interaction term. This is a relevant robustness check, as the dummy for top-half vs. bottom-half in the intermediate exams contains less information than the quasi-continuous score variable. I would suggest that the authors motivate Table 5 that way, or alternatively, explain better how this analysis addresses the point quoted above.

We reported the estimation results shown in Table 5 in the previous paper as the relevant robustness check for the estimation results shown in Table 4 in the previous paper. We have rewritten this information (lines 365–390) in the revised paper.

3.9) Moreover, the authors could consider mean-centering Y_M (the score in the intermediate exam) in the regressions reported in Table 5. Without mean-centering (as is currently the case) the effect of the treatment Dummy D is estimated at Y_M = 0, which is a very special case.

We now include the mean-centered Y_M instead of Y_M as the independent variable. As shown in Table 4 in the revised paper (p. 14), the magnitude of the coefficient of D in Column (1) (3.686) is similar to that in Column (1) of Table 3 in the revised paper (3.517). We discuss this in lines 365–390 in the revised paper.
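A sketch of the mean-centering step in Stata, again with assumed variable names rather than the authors' actual code:

    * Mean-center the midterm score so that the coefficient of D is evaluated
    * at the average midterm performance rather than at Y_M = 0
    summarize YM, meanonly
    generate YM_c = YM - r(mean)

    * Interaction specification with the centered midterm score
    regress YF i.D##c.YM_c Class1 Class2 Class3 Female, vce(robust)

With this centering, the main-effect coefficient on D can be read as the treatment effect for a student with an average midterm score, which is why it becomes comparable to the baseline estimate cited above.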

3.10) Wording / language:

P. 2: "efficient way of eliciting the incentives": "eliciting" doesn't seem to be the right word to me here. Maybe "increasing" would be better?

P. 3: First sentence of the second paragraph (starting with "revealing the role of…") needs to be rewritten.

P. 6: First sentence in second full paragraph: "The randomized controlled trial was conducted immediately after…": I think it would be more appropriate to write something like "The experimental intervention was implemented immediately after…."

P. 9: I would suggest calling Table 1 "Randomization Checks" instead of "Confirmation of randomness"

As the reviewer suggested, we have corrected all sentences.

Reviewer #4:

4.1) The design of the experiment has not been explained clearly. For instance, what was the logic behind having Classroom 1 with higher-ability students selected into it? Is it because they are deemed to be more motivated than others, or was the stratification driven by other aspects? What were the variations for the three other classrooms? Was there a neutral framing in the group assignment?

We have rewritten the design of our experiment in the last paragraph on p. 7 in the revised paper. In particular, we have added the reason why we placed students with a top-40 score in the Pretest of Mathematics in Classroom 1 (lines 154–170).

4.2) Any insights on the power calculations would be useful to understand whether the cell sizes are statistically justified.

This is a very insightful suggestion and we mention this issue in footnote 2 in Table 2 in the revised paper.
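As an illustration of the kind of power calculation the reviewer asks about, Stata's power command can report either the required group sizes or the minimum detectable difference; the means, standard deviation, and group sizes below are purely hypothetical placeholders, not values from the study.

    * Required group sizes to detect a 5-point difference in mean final scores
    * (hypothetical means and SD, 5% two-sided test, 80% power)
    power twomeans 60 65, sd(15) power(0.8) alpha(0.05)

    * Alternatively, the detectable alternative mean for fixed group sizes
    power twomeans 60, n1(150) n2(150) sd(15) power(0.8)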

4.3) The experiment procedure suggests several threats to internal validity. Of particular concern are the spill over effects. The authors have raised concerns about both treatment and control groups sitting in the same classroom. Was there any control over communications among students about the different grading patterns?

In response to this comment, we confirm that most of the students in the control group cannot know their own rank exactly even if students in Classroom 1 aggregate the rank information they received. Thus, the degree of uncertainty that students in the control group face is much higher than that for students in the treatment group. This point has been added as a separate paragraph (lines 271–288) in the revised paper.

4.4) How does the treatment result in the observed impact? There is a debate on how relative grading influences motivation that has not been discussed. Aspects of direct competition among peers are not evident.

As the reviewer pointed out, the relative grading scheme is itself an important issue and it is worth considering further. A recent paper (Czibor et al. [15]) reported that relative grading could not elicit the incentive to study compared with absolute grading when students are less interested in achieving a higher grade. As our experimental framework is close to theirs, we have added some explanation (lines 334–350) in the revised version.

4.5) Does the relative performance feedback affect self-perceived ability, cognitive tactics, strategies, reinforcing cues, or identity rather than directly affecting effort? How do the effort incentives differ for those with low and high grades?

We really appreciate this comment. In response, we have added the new subsection “Discussion” (pages 19-22) in the revised paper to discuss the issues raised. In particular, we relate our experimental results to those of existing research that argue for the importance of students’ ability distribution. For example, in the paragraphs starting line 443 and 464, we note the possible negative impact of relative performance information feedback on students with low midterm scores; the paragraph starting line 472 suggests the possible positive impact on students with high midterm scores.

4.6) Theoretically relevant interaction effects and robustness checks could be examined. E.g. past academic history and relative grading.

We demonstrate that the significant positive impact of relative performance information feedback is mainly caused by the impact on low-performing students rather than that on high-performing students. We discuss these results from a theoretical perspective in the subsection “Discussion” (pages 19-22) in the revised paper.

In response to the reviewer's suggestion, we examine whether the effect of relative performance information feedback depends not on the distribution of performance in the midterm examination (the midterm examination score Y_M) but on students' skill differences, as assessed from their past academic history: the final score in Economics I (Econ1) and the score in the Pretest of Mathematics (Math). We interact D with dummies based on Econ1 and Math instead of with Y_M:

Y_Fi = α_1 D_i + α_2 (D_i × H_Econ1,i) + α_3 H_Econ1,i + X_i β_1 + u_1i   (A)

Y_Fi = α_4 D_i + α_5 (D_i × H_Math,i) + α_6 H_Math,i + X_i β_2 + u_2i   (B)

where H_Econ1,i in Equation (A) denotes a dummy variable equal to one if student i's performance in Economics I was relatively high (above the average score), and H_Math,i in Equation (B) denotes a dummy variable equal to one if student i's performance in the Pretest of Mathematics was relatively high (above the average score).

Both coefficients α_2 and α_5 are insignificant. Compared with the results shown in Column (2) of Table 4 in the revised paper, these results suggest that the effect of relative performance information feedback depends on the distribution of performance in the midterm examination itself rather than on prior skill differences. This is not surprising once we consider students' incentive to study: because students have to pass the course, the midterm examination score matters directly to them. We believe that this analysis is very helpful in interpreting our results. However, due to time and space constraints, we did not include these results in the revised manuscript.
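A sketch of how Equations (A) and (B) could be estimated in Stata; the construction of the above-average dummies, the variable names, and the set of controls are assumptions based on the description above, not the authors' actual code.

    * Above-average dummies for past academic performance (assumed construction)
    summarize Econ1, meanonly
    generate H_Econ1 = (Econ1 > r(mean)) if !missing(Econ1)
    summarize Math, meanonly
    generate H_Math  = (Math > r(mean)) if !missing(Math)

    * Equation (A): feedback treatment interacted with Economics I performance
    regress YF i.D##i.H_Econ1 Class1 Class2 Class3 Female, vce(robust)

    * Equation (B): feedback treatment interacted with Pretest of Mathematics performance
    regress YF i.D##i.H_Math Class1 Class2 Class3 Female, vce(robust)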

4.7) Are there gender gaps in performance such that the sex-ratios in classrooms need to be controlled for?

Following this suggestion, we have additionally used the female dummy Female in the estimation model in the revised paper. The coefficients of Female in all the estimation models are insignificant. We additionally provide a randomization check (Table 1 in the revised paper). There is no significant difference in the mean scores between the control and treatment groups by sex. We have added this information in lines 302–304.
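A minimal sketch of the randomization check by sex mentioned above (variable names assumed):

    * Compare mean midterm scores between control and treatment, separately by sex
    ttest YM if Female == 1, by(D)
    ttest YM if Female == 0, by(D)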

4.8) The role of teachers is not clear in this experiment. There is no information on teacher ability differences that may impact students' scores. This is particularly disconcerting because feedback and instructions have interactions that have been shown to impact performance.

This is an insightful suggestion. In Table 3 in the revised paper, the coefficient of Class2 in Column (1) and those of Class1–Class3 in Column (2) are significant. The classroom fixed effects would absorb instructor fixed effects as well as peer effects and other classroom-level factors. We point out that these effects and factors may be associated with test scores in lines 319–322 in the revised paper.

In addition, it is noted that instructors were not technically blind regarding which students had been assigned to the treatment group. However, apart from handing letters to students, instructors could not confirm which students were assigned to treatment or control, and it would be difficult for instructors to remember this information. In addition, class attendance or participation, such as the number of times a student spoke in class, was not evaluated at all. Thus, whether instructors were blind to which students were assigned to treatment or control would have little impact on the experimental results. We address this issue in lines 146–153 in the revised paper.

4.9) The paper is generally well written but there are a few typos that can be addressed.

We have now carefully corrected typos.

4.10) The review of literature can improve, and major theoretical aspects relevant to the paper can be developed.

In response to this comment, we added the new subsection “Discussion” (pp. 19–22) in the revised paper to relate our experimental results to those of existing research.

4.11) Please verify the claim “By contrast, no existing study examines the impact of relative performance information feedback on student incentives under relative grading in an actual educational environment.”

To our knowledge, there is no previous study that examined the impact of relative performance information feedback on student incentives under relative grading in an actual educational environment. However, in response to this comment, we have deleted the expression noted.

Attachment

Submitted filename: Reply_Letter.pdf

Decision Letter 1

Baogui Xin

26 Mar 2020

Information feedback in relative grading: Evidence from a field experiment

PONE-D-19-35238R1

Dear Dr. Suzuki,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Baogui Xin, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Baogui Xin

1 Apr 2020

PONE-D-19-35238R1

Information feedback in relative grading: Evidence from a field experiment

Dear Dr. Suzuki:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Baogui Xin

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Steps to replicate the tables and figures.

    The instruction to replicate the tables and figures in “Information Feedback in Relative Grading: Evidence from a Field Experiment” by Shinya Kajitani, Keiichi Morimoto and Shiba Suzuki.

    (PDF)

    S2 File. Stata program file.

    (DO)

    S1 Dataset. This is the CSV file that contains our datasets.

    (CSV)

    Attachment

    Submitted filename: PLOS_One_Review.docx

    Attachment

    Submitted filename: Reply_Letter.pdf

    Data Availability Statement

    All relevant data are within the paper and its Supporting Information files.

