PLOS ONE. 2020 Jul 1;15(7):e0235396. doi: 10.1371/journal.pone.0235396

Validation of survey effort measures of grit and self-control in a sample of high school students

Gema Zamarro 1,*, Malachi Nichols 1, Angela L Duckworth 2, Sidney K D’Mello 3
Editor: Frantisek Sudzina
PMCID: PMC7329102  PMID: 32609785

Abstract

Personality traits such as grit and self-control are important determinants of success in life. However, most measures of these traits rely on self-reports and might be biased when used to evaluate education policies or interventions. Recent research has shown the potential of survey effort (in particular, item non-response and careless answering) as a proxy measure of these traits. The current investigation uses a dataset of high school seniors (N = 513) to investigate survey effort measures in relation to teacher reports, performance task measures, high school academic outcomes, and college attendance. Our results show promise for the use of survey effort as a proxy measure of grit and self-control.

Introduction

Though the importance of personality traits such as grit (passion and perseverance for long-term goals) and self-control (the ability to regulate attention, emotion, and behavior despite temptations) to life outcomes including educational attainment, career success, health, and criminal behavior is well established [1], [2], [3], researchers have struggled to find unbiased measures of these traits for use in evaluating education policies and interventions [4]. Further, many existing datasets lack any measures of them at all. As a result, research on how to support and develop these important traits is limited by an inability to measure them.

Recent literature has proposed the use of survey effort as a proxy measure of grit and self-control, either to supplement information obtained through self-reports, which might be affected by multiple types of bias (e.g., reference group bias and social desirability bias; see [4]), or to complement datasets that lack measures of these traits [5], [6], [7], [8], [9], [10]. Why? Surveys take effort to complete. For students, in particular, surveys administered in classrooms can feel like schoolwork or homework. Therefore, by studying how much effort students put into surveys, we can obtain proxy measures of a student’s grit and self-control.

Two measures, in particular, have shown promise: item non-response rates and careless answering. We define item non-response as the percentage of questions a respondent skips on a survey. Taking advantage of longitudinal, nationally representative samples of adolescents and adults in the United States and Germany, Hedengren and Stratmann [11] found item non-response to be correlated with self-reported conscientiousness, a personality trait related to grit and self-control (a one standard deviation increase in response rates was associated with a statistically significant 0.3 standard deviation increase in self-reported conscientiousness), and to be a significant predictor of earnings and mortality risk. Furthermore, Hitt, Trivitt, and Cheng [12] used six longitudinal, nationally representative samples of American youth to estimate the relationship between the percentage of questions skipped and desirable self-reported adult outcomes known to be related to individuals’ levels of grit, self-control, and related traits [1]. They found that item non-response was a significant predictor of self-reported educational attainment and labor market outcomes, independent of available measures of cognitive ability (a one standard deviation increase in item non-response was associated with completing between 0.1 and 0.3 fewer years of education).

In addition, some respondents might show low survey effort by answering randomly and carelessly [8], [9]. Using two national longitudinal surveys, Hitt [6] found that careless answering, measured as the presence of haphazard and inconsistent responses, among adolescent respondents was associated with fewer self-reported years of completed education and a decreased probability of high school completion, independent of cognitive ability (a one standard deviation increase in careless answering was associated with about a 0.1-year decrease in self-reported completed education and an almost two percentage point decrease in the probability of graduating from high school). Similarly, using data from a nationally representative internet panel of American adults, Zamarro et al. [10] found that repeated careless answering among adults was negatively correlated with self-reported grit [13] and self-reported conscientiousness [14] (partial correlations (rxy,z) of about -0.15 after controlling for cognitive ability and demographic information) and positively correlated with neuroticism, shedding light on its validity as a measure. They also found that careless answering was a significant negative predictor of self-reported total years of education and of self-reported income and career success.

Although recent research has shown that survey effort measures are promising proxies for personality traits related to grit and self-control [6], [10], [11], [12], these validation exercises have relied on self-reported measures of personality traits and outcome variables and have lacked external sources of information. The sole exception is the work of Hedengren and Stratmann [11], which used information on earnings and mortality risk from administrative sources. We aim to fill this gap in the literature by studying the relationship between survey effort measures and teacher evaluations of traits, performance task measures, and external outcome measures.

We use data on a sample of 513 high school seniors attending a public school in the Northeastern United States. Although our dataset is a relatively small convenience sample, it is a unique one: it collates a diverse set of measures of students’ personality traits, including self-reported measures, teacher reports, performance measures from two validated tasks, and administrative records. First, complementing the work of Hitt, Trivitt, and Cheng [12] and Hitt [6], we study the correlation of survey effort measures with students’ self-reported grit and self-control, academic outcomes at the end of high school, college attendance one year after graduation, and, more importantly, teacher reports on these traits. Second, we study the relationship between survey effort measures and performance task measures designed to capture related traits: academic diligence, the effort students put into tedious school-related tasks [15], and frustration tolerance, the ability to overcome frustration arising from challenges that block goals [16]. Our results suggest that survey effort can serve as a proxy measure of grit and self-control.

Materials and methods

Participants

This study was approved by the University of Pennsylvania IRB (Protocol 814991) and the University of Arkansas IRB (Protocol 16-10-164). The data come from a study on college persistence led by a research team at the University of Pennsylvania. In the spring of 2014, the team collected data from 513 high school seniors attending a public high school in the Northeastern United States. The research team recruited participants through opt-out parental consent forms distributed by the school administration. If parents did not wish for their child to be part of the study, they could indicate so by signing the provided form and sending it back to the school; alternatively, they could call or email the principal investigator. In addition, students who were not opted out were given a child assent form at the beginning of the first session, through which they could also opt out of the study themselves. A total of 154 students opted out of the study. One year later, the research team used the National Student Clearinghouse (a non-profit organization offering nationwide college enrollment and degree attainment data; see https://www.studentclearinghouse.org/) to track the college enrollment status of as many participants as possible, which resulted in a study with adequate power to detect small to medium effects [16].

According to demographic information obtained from school records (see Table 1), 41% of students were African American, 36% White, 20% Asian, and 3% Hispanic; 54% were female. Half (51%) qualified for Free and Reduced-price Lunch (FRL).

Table 1. Summary statistics for demographic and outcome variables.

  Measure Mean Standard Deviation Minimum Maximum
Demographic
Age 17.93 0.53 16 21
Female 0.54 0.50 0 1
Asian 0.20 0.40 0 1
African American 0.41 0.49 0 1
Hispanic 0.03 0.16 0 1
Caucasian 0.36 0.48 0 1
ELL 0.14 0.35 0 1
SPED 0.14 0.35 0 1
FRL 0.51 0.50 0 1
Median Household Income ($) 52,530 22,915 9,471 128,618
KBIT Scaled Score 94.26 21.43 40 132
Outcome
HS GPA Senior 85.07 7.66 55 100
HS Graduate 0.95 0.22 0 1
End of Year Math Test 1529.32 54.49 1363 1698
End of Year Reading Test 1528.77 48.32 1385 1706
Attempted SAT 0.51 0.50 0 1
Mean SAT 1414.80 254.25 820 2060
College Enrollment for 1 Year 0.64 0.48 0 1
4-year College Enrollment for 1 Year 0.43 0.50 0 1
4-year College Enrollment for 1 Year (Full-Time) 0.40 0.49 0 1

N = 513 students. ELL, English Language Learner students; SPED, Special Education students; FRL, students eligible for Free or Reduced-price Lunch.

Assessments and measures

In a first session, in November 2012, students completed the assent forms and a vocabulary test during planning periods in school (37-minute sessions). A large make-up session with about 300 students was held on the final day of testing in the library computer lab.

In a second session, in January 2013, students completed the Matrix Reasoning subtest of the Kaufman Brief Intelligence Test (KBIT) [17], followed by an online questionnaire covering student autonomy, purpose for applying to college, growth mindset, locus of control, trust and belonging, feelings toward math, Big Five personality questions, positive and negative affect, beliefs about the role of effort, and life satisfaction. Afterward, students completed the Academic Diligence Task (ADT), described in more detail below. Finally, students answered 10 questions on socio-economic status, participation in extracurricular activities, and self-description. This 2.5-hour session took place in the cafeteria (50 to 200 students per day) or in individual classrooms in school (about 30 students per class per day).

In a final session, in May 2013, students completed the Mirror Tracing Frustration Task (MTFT; described below) during senior planning periods at the school library. Students were tested during four periods per day, with two classes of students per period (about 30 to 60 students per period).

Separately, three teachers provided overall ratings about all their participating students’ levels of grit and self-control. Participating students and teachers were compensated for their time with small, non-monetary rewards (e.g., credit to the school library coffee house). Teachers’ compensation was less than $25 in value, and students’ compensation was less than $5 in value.

Survey effort measures

Item non-response. Following Hitt, Trivitt, and Cheng [12], we parametrized survey effort by computing two measures of survey item non-response. We determined item non-response by dividing the total number of questions left blank by the number of answerable questions, accounting for legitimate skips; that is, we excluded questions left blank because the survey’s routing, given prior answers, did not ask the student to answer them. Relatedly, we also computed a dichotomous item non-response measure: a binary indicator for whether the student left any answerable question blank, again accounting for legitimate skips. We computed this second measure because almost half of our sample (47%) completed the entire survey. Fig 1 shows the distribution of survey item non-response rates in our sample, among those who left at least one question blank.
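To make the construction concrete, here is a minimal sketch of how both non-response measures could be computed, assuming a pandas DataFrame of raw responses (one row per student, one column per item) in which NaN marks a blank, and a same-shaped boolean frame flagging the items each student was actually routed to; all names are illustrative, not taken from the study’s code.

```python
import pandas as pd

def item_nonresponse(responses: pd.DataFrame, answerable: pd.DataFrame) -> pd.DataFrame:
    """Percent of answerable items left blank, plus a dichotomous indicator."""
    blanks = responses.isna() & answerable   # blank only where an answer was expected
    n_blank = blanks.sum(axis=1)
    n_answerable = answerable.sum(axis=1)    # denominator excludes legitimate skips
    return pd.DataFrame({
        "item_nonresponse_pct": 100 * n_blank / n_answerable,
        "any_nonresponse": (n_blank > 0).astype(int),
    })
```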

Fig 1. Distribution of item non-response rates among those leaving questions blank.


Careless answering. Following Hitt [6], the second way we parametrized survey effort is through measures of careless answering. The idea behind this measure is as follows: consider a reliable, validated scale with a number of items. If the scale is reliable, each item consistently measures the same underlying construct, so individual responses to each item should be closely predicted by responses to the other items in the same scale. Thus, we interpreted deviations of responses from the values predicted by the other items in the scale as a measure of careless answering.

In practice, we first identified reliable scales within the student survey, defined as those with Cronbach’s alpha reliability coefficients of at least 0.7 [18]. We excluded the self-reported scales of grit and self-control used to validate the survey effort measures in this paper. In total, we identified seven scales: trust, belonging, interest in school, academic self-efficacy, distress tolerance, purpose, and brief self-control. For each item in each scale, we regressed responses on the average score of the remaining items in that scale. We then computed residuals from each of these regression models to capture the extent to which the response to a particular item is unpredictable given the response patterns of the individual student and others in the analytic sample. We standardized the absolute values of these residuals to account for differences across items within the same scale, averaged the standardized residuals within scales, and standardized them again to account for differences across scales (e.g., different numbers of items and answer options). Finally, we calculated a composite careless answering score by averaging these standardized scale-level residuals at the student level, with higher values indicating higher levels of carelessness or unpredictability in responding. Table A.1 in S1 Appendix displays the Cronbach’s alpha reliability coefficient for each scale included in our careless answering measure, the items in each scale, and the average absolute residual associated with each item following the regression analysis described above.
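The construction above can be summarized in a short sketch. The code below assumes a DataFrame of numeric, complete responses and a dictionary mapping each of the seven scales to its item columns (all names illustrative); handling of missing responses is omitted for brevity.

```python
import numpy as np
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

def careless_answering(df: pd.DataFrame, scales: dict) -> pd.Series:
    """Composite careless answering score; higher = more careless."""
    scale_scores = []
    for items in scales.values():
        item_resids = []
        for item in items:
            # Regress the item on the mean of the other items in its scale
            others = df[items].drop(columns=item).mean(axis=1)
            X = np.column_stack([np.ones(len(df)), others.to_numpy()])
            beta, *_ = np.linalg.lstsq(X, df[item].to_numpy(), rcond=None)
            resid = pd.Series(df[item].to_numpy() - X @ beta, index=df.index)
            # Standardize absolute residuals to compare items within a scale
            item_resids.append(zscore(resid.abs()))
        # Average within the scale, then standardize across scales
        scale_scores.append(zscore(pd.concat(item_resids, axis=1).mean(axis=1)))
    # Student-level composite across the seven scales
    return pd.concat(scale_scores, axis=1).mean(axis=1)
```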

Fig 2 shows the distribution of careless answering in our sample. Because the careless answering measure is standardized by construction, its mean and standard deviation are not informative. However, we note considerable variation across students in the sample, as shown in the summary statistics in Table 2.

Fig 2. Distribution of careless answering measures.


Table 2. Summary statistics for measures of character traits.
  Measure M SD Minimum Maximum
Survey Effort
Item Non-Response (%) 2.41 5.35 0.00 37.18
Dichotomous Item Non-Response 0.53 0.50 0.00 1.00
Careless Answering 0.00 1.00 -2.24 3.93
Performance Task Measures
Diligence Task, Percentage Time Spent on Math 0.64 0.30 0.00 1.00
Frustration Task, Percentage Time Spent Tracing 0.55 0.27 0.00 1.00
Self-Reported Measures
Grit 3.76 0.71 1.00 5.00
Locus of Control 4.57 0.75 2.50 6.00
Self-Control Combined (Work and Interpersonal) 3.61 0.60 1.00 5.00
Teacher-Reported Measures
Work Self-Control 3.72 0.88 1.00 5.00
Interpersonal Self-Control 4.21 0.77 1.00 5.00
Grit 3.53 0.87 1.00 5.00
Redirection 0.92 1.16 0.00 5.00
  Homework Completion 77.69 21.63 0.00 100.00

N = 513 students. The statistics reported for the Frustration Task are from an analytical sample of n = 391. Following Meindl et al. [16], we removed participants if they failed to complete a practice trial preceding the actual task, fully completed tracing the shape, experienced technical problems within the task, or were not allowed an adequate amount of time to complete the task.

Item non-response and careless answering appear to be distinct ways of exerting low survey effort. On any given item, the two are mutually exclusive: a skipped item cannot also be answered carelessly. The participant-level Pearson correlation between the two measures is 0.17, suggesting they capture complementary information.

Teacher reports. Three teachers (homeroom, English, and social studies) provided an overall rating of each of their participating students on grit and self-control and answered additional questions about classroom behavior and work ethic. To minimize burden, teachers were shown the items from the grit scale students were asked to complete [13], described in more detail below, and asked to rate how well these items as a whole described each student on a 5-point Likert-type scale. Teachers were also asked to report on students’ self-control using the 8-item Brief Self-Control Scale [19], which students had also been asked to complete and which is also described in detail below. This approach of using a single overall assessment of a personality trait has been shown to have adequate convergent and discriminant validity, test-retest reliability, and convergence between self- and observer ratings [20], [21]. Since three teachers reported on each child, the individual z-scores were averaged for each student, giving each student a single construct score to increase validity [22]. Higher scores represent higher levels of the trait.
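As a sketch of that aggregation step, assuming complete ratings and illustrative names, the three teachers’ ratings can be z-scored and averaged per student:

```python
import pandas as pd

def teacher_composite(ratings: pd.DataFrame) -> pd.Series:
    """ratings: one row per student, one column per teacher (1-5 scale)."""
    z = (ratings - ratings.mean()) / ratings.std()  # z-score each teacher's ratings
    return z.mean(axis=1)                           # average across the three teachers
```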

To measure classroom behavior and work ethic, we asked teachers to report on students’ redirection (reminders to stay on task or follow rules) and homework completion. For redirection, teachers estimated the number of times the student required redirection within the last week, with options ranging from 0 to 5 or more times. The three teachers’ scores were averaged to give each student a redirection score; a high number of redirections could reflect a lack of diligence, among other factors. Finally, we asked teachers to rate students on homework completion, giving the percentage of assignments (from 0 to 100) the student completed on time and for which they received a passing grade. The three teachers’ scores were averaged to give each student a homework completion score; a higher percentage indicates a stronger work ethic.

Direct performance task measures

Academic Diligence Task (ADT). The ADT is a computer-based task designed to measure academic diligence [15]. It gives students the option either to solve simple math problems, after being told about the benefits of this type of exercise, or to consume media by watching online video clips or playing online games. We measured academic diligence as the percentage of the total task time (12 minutes) a student spent completing math problems rather than consuming media. Higher percentages represent higher levels of academic diligence.

In a sample of over 900 high school students, Galla et al. [15], found that measures of student engagement in the ADT were correlated with self-reported measures of conscientiousness (rxy,z = 0.09), self-control (rxy,z = 0.15), and grit (rxy,z = 0.17). Performance on the ADT was also predictive of the student’s high school Grade Point Average (GPA), standardized test scores, high school graduation, and college enrollment, even after controlling for potential confounds including cognitive ability and sociodemographic characteristics.

Mirror Tracing Frustration Task (MTFT). Participants were also asked to complete the Mirror Tracing Frustration Task (MTFT) [16], which measures frustration tolerance. During this task, students were given the option to trace a shape using their computer mouse or to consume media by watching online videos. However, mouse movements were mirrored, producing movement in the opposite direction, and a random drift was added to each movement, so perfect control was not possible. This required students to concentrate intensely while performing the task and induced frustration. If the student stopped tracing or traced off the shape, the task automatically restarted. To motivate the tracing task, students were informed about the importance of developing perceptual-motor skills for various real-world tasks, but they had the option to switch between the task and media as often as they desired. Frustration tolerance was measured as the percentage of the total assigned task time (5 minutes) a student spent tracing. Using this same data, Meindl and colleagues [16] showed that higher frustration tolerance was significantly associated with self-reported and teacher-reported grit and self-control measures (rS = 0.11 to 0.22), as well as with high school GPA, standardized test scores, and college persistence.

Self-reported measures

We also study the relationship between survey effort measures and the following self-reported measures collected in the study.

Grit. Following Duckworth and Quinn [13], students rated how accurately five statements described them on a 5-point Likert-type scale from 1 = not at all true to 5 = completely true. These statements included, for example, “I finish whatever I begin” and “I stay committed to my goals.” We averaged each student’s item scores to create a grit score for each respondent. Possible grit scores range from one to five, with a high score representing high levels of grit. This scale showed high reliability in our sample, with a Cronbach’s alpha of 0.8.

Self-control. Students were also asked to complete eight items from the Brief Self-Control Scale [19]. This scale consisted of four questions pertaining to schoolwork and four pertaining to interpersonal situations. Students rated how true the eight statements were of themselves using a 5-point Likert-type scale from 1 = not at all true to 5 = completely true. The work-related statements included “I come to class prepared” and “I get to work right away, instead of waiting until the last minute,” while the interpersonal statements included “I allow others to speak without interruption” and “I control my temper.” Scores were averaged to create a combined self-control score for each student, and average scores were also computed separately for work-related and interpersonal self-control. Scores range from 1 to 5, with a high score indicating high levels of self-control. The combined self-control scale showed high reliability in our sample, with a Cronbach’s alpha of 0.8.

Locus of control. Finally, students were also asked to complete a four-item locus of control scale [23] using a 6-point reporting scale. This scale captures how strongly students believe they have control over the situations and experiences that affect their educational outcomes. Its items include, for example, “Getting good grades is a matter of luck” and “If you get bad grades, it’s not your fault.”

Outcome measures

We also studied the relationship between survey effort measures and other outcome variables to further assess the criterion validity of survey effort measures of grit and self-control. Our outcome measures included: high school GPA, ranging from 0 to 100 (high schools in the state where our data come from vary in how they calculate and scale GPAs, so we converted GPA to a 100-point scale with the help of district-provided handbooks and information from the College Board); a binary variable indicating whether the student graduated from high school; a binary variable indicating whether the student attempted the Scholastic Aptitude Test (SAT); and the total SAT score on the first attempt (for those who attempted the test), ranging from 600 to 2400 as the sum of the critical reading, math, and writing scores. Furthermore, we constructed three binary variables indicating whether the student was continuously enrolled in college for one year after graduating high school, whether the institution was a four-year college, and whether the student was continuously enrolled full-time in that four-year college. Finally, we also studied the relationship between survey effort measures and performance on students’ final senior-year assessments in math and reading, which are part of the state’s high school graduation requirements; scores ranged from 1200 to 1800.

Cognitive ability and other data

To control for cognitive ability, we used students’ performance on the Matrix Reasoning subtest of the Kaufman Brief Intelligence Test (KBIT) [17]; scaled scores ranged from 40 to 132. Our analysis also includes controls for age, gender, ethnicity, English Language Learner (ELL) status, Special Education (SPED) status, Free and Reduced-price Lunch (FRL) status, and household income.

Empirical strategy for validation of measures

For survey effort measures to be valid proxy measures of grit and self-control, they should be correlated with other measures of these character skills (convergent validity) as well as with other outcome variables known to be correlated with the same latent skills (criterion validity). Accordingly, we computed Spearman correlations and partial rank correlations (controlling for cognitive ability and socio-demographic information) between our measures of survey effort (i.e., non-response rates and careless answering) and self- and teacher-reported grit and self-control, with the expectation that these correlations would be negative. We also expected negative correlations between both survey effort measures and teacher-reported homework completion, and positive correlations with redirection. Finally, we expected negative correlations between survey effort and diligence and frustration tolerance as measured by the corresponding performance tasks.
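As a rough sketch of these two correlation analyses on synthetic stand-in data (the real dataset is confidential, and all names here are illustrative): a partial rank correlation can be computed by rank-transforming all variables, residualizing on the controls, and correlating the residuals.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 200
Z = pd.DataFrame({"kbit": rng.normal(94, 21, n), "age": rng.normal(18, 0.5, n)})
x = pd.Series(rng.normal(size=n))   # stand-in for a survey effort measure
y = pd.Series(rng.normal(size=n))   # stand-in for teacher-reported grit

def partial_rank_corr(x: pd.Series, y: pd.Series, Z: pd.DataFrame) -> float:
    """Rank-transform, residualize on the controls, correlate the residuals."""
    def resid(v: pd.Series) -> np.ndarray:
        r = v.rank().to_numpy()
        X = np.column_stack([np.ones(len(Z)), Z.rank().to_numpy()])
        beta, *_ = np.linalg.lstsq(X, r, rcond=None)
        return r - X @ beta
    return float(np.corrcoef(resid(x), resid(y))[0, 1])

rho, p = spearmanr(x, y)                 # zero-order Spearman correlation
r_partial = partial_rank_corr(x, y, Z)   # partial rank correlation given controls
```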

The last set of analyses looked at criterion validity of item non-response and careless answering measures of survey effort. To do so, we estimated linear regression models and linear probability models to predict (from survey effort measures) each of the following academic outcomes: high school GPA, high school graduation, attempt to take the SAT, SAT scores if attempted, end-of-senior-year math and reading test scores, college enrollment in the first year after high school graduation, enrollment in a four-year college, and full-time enrollment in a four-year college. For binary outcomes, we also estimated discrete choice logit models. Results were similar to those in the linear probability models presented here. We estimated separate models for item non-response rates, dichotomous non-response, and careless answering measures as specified below:

\[
\mathrm{AcademicOutcome}_i = \beta_0 + \beta_1\,\mathrm{SurveyEffort}_i + \beta_2\,\mathrm{CognitiveAbility}_i + \beta_3 X_i + \varepsilon_i \qquad (1)
\]

Our models controlled for cognitive ability using the KBIT scaled score. X_i represents a vector of student socio-demographic controls, including age, ethnicity, gender, English Language Learner (ELL) status, Free and Reduced-price Lunch (FRL) status, Special Education (SPED) status, and parental income. We report estimated coefficients along with standardized regression coefficients for all models. For comparison, we also estimated models including the direct performance task measures of academic diligence and frustration tolerance, as well as teacher-reported and self-reported measures of related traits.
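A minimal sketch of Eq (1) using statsmodels on synthetic data follows; variable names are illustrative, and the bracketed standardized coefficients reported in Table 4 correspond to refitting on z-scored variables. For binary outcomes, the same OLS call serves as a linear probability model.

```python
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 450
df = pd.DataFrame({
    "hs_gpa": rng.normal(85, 8, n),              # academic outcome
    "nonresponse_pct": rng.exponential(2.4, n),  # survey effort measure
    "kbit": rng.normal(94, 21, n),               # cognitive ability control
    "female": rng.integers(0, 2, n),             # example demographic controls
    "frl": rng.integers(0, 2, n),
})

# Eq (1): outcome on survey effort plus cognitive ability and demographics
model = smf.ols("hs_gpa ~ nonresponse_pct + kbit + female + frl", data=df).fit()
print(model.params["nonresponse_pct"], model.bse["nonresponse_pct"])

# Standardized coefficients: refit after z-scoring all variables
zdf = (df - df.mean()) / df.std()
std_model = smf.ols("hs_gpa ~ nonresponse_pct + kbit + female + frl", data=zdf).fit()
print(std_model.params["nonresponse_pct"])  # comparable to bracketed values in Table 4
```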

Results

Descriptive statistics

Table 1 includes summary statistics for our demographic and outcome variables. On a scale of 0 to 100, students had an average high school GPA of 85, and 95% of our sample of high school seniors graduated from high school. Only half of the sample attempted the SAT, but 64% enrolled in college after graduation; 43% of the sample enrolled in a four-year college, and 40% did so full time.

On average, students left unanswered about 2% of the items they were asked to complete (see Table 2), and forty-seven percent of students answered all the questions in the survey. These item non-response rates are similar to those found by Hitt, Trivitt, and Cheng [12] in multiple nationally representative samples of adolescents. Our careless answering measure, which captures inconsistent responses, ranges from -2.2 to 3.9. This indicates considerable variation in the degree of care that students put into completing the surveys, with some being more careful than average (negative values) and some less careful (positive values). For the performance task measures, students spent an average of 64% of the assigned time (almost 8 of the 12 minutes) engaged in the math exercises in the diligence task, and an average of 55% of the assigned time (almost 3 of the 5 minutes) tracing instead of engaging with the distractors in the frustration task.

The average self-reported grit score in our sample was almost 4 (out of 5). Similarly, students scored an average of almost 4 on the combined self-control scale and about 4.6 on the locus of control scale. Teachers reported an average of 3.5 for students’ grit, 3.7 for work-related self-control, and 4.2 for interpersonal self-control. Teachers also reported that students needed redirection about once during the previous week, on average, and completed about 78% of assigned homework on time and with a passing grade.

Relationship among character trait measures

Table 3 presents Spearman’s correlations among our proposed survey effort measures and student self-reported and teacher-reported measures of character traits. As expected, item non-response rates and careless answering were negatively correlated with self-reported grit and self-control as well as teacher-reported grit, self-control, and homework completion. Additionally, both survey effort measures were positively correlated with teacher redirection. Importantly, they were both negatively correlated with performance on both the diligence and frustration tasks, which corresponds with what we expected (i.e., lower levels of effort on the survey correspond with lower levels of performance in these tasks).

Table 3. Spearman and partial rank correlations between performance task measures, self-reports, and teacher reports.

    Item Non-response Careless Answering
    (1)b (2)c (1)b (2)c
Self-Reported Measures
Grit -0.118* -0.155* -0.024 -0.066
Locus of Control -0.093* -0.091 0.038 -0.022
Self-Control Combined -0.135* -0.148* -0.104* -0.114*
Self-Control Work -0.081* -0.101* -0.127* -0.153*
Self-Control Interpersonal -0.144* -0.155* -0.042 -0.035
Teacher-Reported Measures
Teacher-Reported Grit -0.216* -0.184* -0.170* -0.131*
Teacher-Reported Work Self-Control -0.201* -0.164* -0.165* -0.107*
Teacher-Reported Interpersonal Self-Control -0.147* -0.122* -0.092* -0.065
Teacher-Reported Redirection 0.112* 0.111* 0.133* 0.091
Teacher-Reported HW Completion -0.157* -0.122* -0.105* -0.058
Performance Task Measures
Diligence Task, Percentage Time Spent on Math -0.152* -0.084 -0.163* -0.125*
  Frustration Task, Percentage Time Spent Tracinga -0.104* -0.067 -0.134* -0.102*

* represents p-value < 0.05. Total sample of 513 students.

a The statistics reported for the Frustration Task are from a sample of 391 students.

b corresponds to Spearman correlations

c corresponds to partial correlations controlling for KBIT Scaled Score, Age, Ethnicity, Gender, FRL, SPED, ELL, and household income.

Table 3 also shows partial rank correlations among these measures after controlling for students’ cognitive ability and socio-demographic information. We observed a pattern similar to the zero-order correlations, although the partial correlations with teacher reports and performance task measures were smaller. Although the magnitudes of the correlations between survey effort and self-reported measures may appear small, they are at least as large as the correlations reported in prior literature validating other behavioral-task measures of conscientiousness, grit, and self-control [15], [16], [24].

Relationship between survey effort measures and academic outcomes

We find evidence of criterion validity with respect to the predictive power of survey effort measures for high school and college academic outcomes. For comparison, we also examined the predictive power of the performance task measures and of teacher-reported and self-reported measures. Table 4 presents the results of linear regression models for student academic outcomes, following the specification in Eq (1), when the different survey effort measures and performance task measures were included as explanatory variables. Regressions using SAT scores as the dependent variable were limited to students who attempted the SAT. Sample sizes varied depending on the information available for each regression model, ranging from 392 to 458 observations, and from 216 to 240 for the SAT score models. Similarly, following Meindl et al. [16], results for the frustration task excluded data from students who failed to complete a practice trial preceding the actual task, fully completed tracing the shape, experienced technical problems during the task, or were not allowed adequate time to complete the task due to data collection constraints. As a robustness check, we also performed estimates on the full dataset (i.e., N = 513), and the main results were comparable to those presented here.

Table 4. Estimated coefficients of linear regression models predicting academic outcomes.

  High School GPA High School Graduation Attempt SAT SAT End of Year Math End of Year Read College Enroll 1 year 4yr College Enroll 1 year 4yr College Enroll Full Time 1 year
Item Non-Response (%) -0.271*** 0.0007 -0.021*** -12.116** -1.917*** -1.848*** -0.020*** -0.019*** -0.017***
[-0.196] (0.060) [0.024] (0.001) [-0.236] (0.004) [-0.139] (4.956) [-0.193] (0.394) [-0.197] (0.388) [-0.238] (0.004) [-0.213] (0.004) [-0.192] (0.004)
Adj R-squared 0.240 0.077 0.161 0.273 0.374 0.316 0.126 0.167 0.159
Dichotomous Item Non-response -2.220*** -0.021 -0.271*** -64.868** -17.890*** -22.212*** -0.239*** -0.241*** -0.212***
[-0.144] (0.664) [-0.065] (0.015) [-0.271] (0.044) [-0.125] (29.650) [-0.164] (4.311) [-0.228] (3.993) [-0.249] (0.043) [-0.242] (0.044) [-0.214] (0.044)
Adj R-squared 0.224 0.080 0.178 0.270 0.365 0.329 0.132 0.180 0.168
Dichotomous Item Non-response -1.195* -0.029* -0.210*** -40.470 -11.218*** -17.194*** -0.175*** -0.186*** -0.162***
[-0.078] (0.719) [-0.090] (0.016) [-0.209] (0.048) [-0.078] (33.11) [-0.103] (4.670) [-0.176] (4.360) [-0.183] (0.047) [-0.186] (0.047) [-0.164] (0.047)
Item Non-Response (%) -0.227*** 0.002 -0.014*** -9.053 -1.491*** -1.156*** -0.014*** -0.012** -0.011**
[-0.163] (0.065) [0.061] (0.001) [-0.151] (0.004) [-0.104] (5.548) [-0.150] (0.430) [-0.123] (0.420) [-0.163] (0.004) [-0.137] (0.004) [-0.125] (0.004)
Adj R-squared 0.243 0.081 0.194 0.275 0.381 0.340 0.151 0.193 0.179
Careless Answering -1.967*** -0.007 -0.131*** 19.536 -15.749*** -10.331** -0.085** -0.062 -0.050
[-0.119] (0.724) [-0.021] (0.016) [-0.122] (0.049) [0.033] (34.598) [-0.133] (4.781) [-0.097] (4.535) [-0.083] (0.048) [-0.058] (0.049) [-0.047] (0.048)
Adj R-squared 0.217 0.076 0.122 0.255 0.355 0.287 0.079 0.127 0.126
Diligence Task, Percentage Time Spent on Math 3.708*** -0.012 0.039 96.553* 25.956*** 20.649*** 0.1485* 0.108 0.094
[0.145] (1.207) [-0.021] (0.028) [0.023] (0.084) [0.114] (53.832) [0.144] (7.962) [0.126] (7.513) [0.093] (0.081) [0.064] (0.082) [0.057] (0.082)
Adj R-squared 0.232 0.077 0.110 0.242 0.352 0.304 0.091 0.147 0.127
Frustration Task, Percentage Time Spent Tracing 3.707*** 0.030 0.214** 41.058 34.872*** 20.138** 0.145 0.119 0.098
[0.132] (1.414) [0.052] (0.030) [0.118] (0.096) [0.046] (61.836) [0.196] (8.702) [0.117] (8.684) [0.086] (0.092) [0.065] (0.098) [0.054] (0.097)
Adj R-squared 0.208 0.114 0.117 0.171 0.278 0.223 0.068 0.095 0.100

Standardized coefficients in brackets. Standard errors of estimated coefficients in parentheses. Additional controls included in the model are: KBIT Scaled Score, Age, Ethnicity, Gender, FRL, SPED, ELL, and household income.

* Indicates P-values<0.1

** Indicates P-values<0.05, and

*** Indicates P-values<0.01.

Turning to the survey effort measures, we found that a one standard deviation increase in item non-response was associated with an almost 0.2 standard deviation decrease in high school GPA, a roughly 0.2 standard deviation decrease in the probability of attempting the SAT, a 0.14 standard deviation decrease in SAT scores among those who attempted the test, an almost 0.2 standard deviation decrease in end-of-senior-year math and reading scores, and a roughly 0.2 standard deviation decrease in the probability of being enrolled in college one year after graduation, holding cognitive ability and demographic information fixed (see Table 4). We also estimated models that included both item non-response rates and the binary indicator for leaving any question blank to see whether each behavior was independently related to academic outcomes. We found that this was generally the case: both were significant predictors of these academic outcomes. Similarly, a one standard deviation increase in careless answering was associated with a 0.12 standard deviation decrease in GPA, a comparable decrease in the probability of attempting the SAT, about a 0.1 standard deviation decrease in end-of-senior-year math and reading scores, and a 0.08 standard deviation decrease in the probability of being enrolled in college one year after graduation, all else being equal. Finally, none of the survey effort measures generally predicted high school graduation; only dichotomous item non-response was marginally significant when included alongside item non-response rates. This could be because the great majority of students in our sample (95 percent) graduated from high school.

We found that both the academic diligence and frustration tasks significantly predicted GPA and end-of-senior-year math and reading test scores, with estimated effects comparable in size to those found for the survey effort measures. A one standard deviation increase in performance on the diligence task was associated with a 0.14 standard deviation increase in GPA, a 0.11 standard deviation increase in SAT scores, and about a 0.14 standard deviation increase in end-of-senior-year math and reading test scores. Performance on the diligence task also predicted SAT scores and college enrollment, but only marginally. Finally, performance on the frustration task significantly predicted the probability of attempting the SAT. These findings confirm the work of Galla et al. [15] and Meindl et al. [16], who found that performance on the academic diligence task and the frustration task predicted high school academic outcomes and college enrollment.

For comparison, Tables 5 and 6 show the predictive power of teacher-reported and student self-reported measures. Student self-reported grit and self-control appear to be predictors comparable to the survey effort measures, but teacher reports appear to be better predictors of student academic outcomes than either survey effort or the performance tasks. All of the teacher reports considered significantly predict senior-year GPA, the probability of high school graduation, attempting the SAT, performance on the Keystone reading and math tests, and college enrollment. The only significant predictor of SAT scores, among those who took the test, is teacher-reported interpersonal self-control. Effect sizes are also generally larger than those found for survey effort or the performance tasks. It should be stressed, however, that, as with self-reports, teacher reports are subject to similar biases and manipulation if used for evaluation purposes, and they are often unavailable in researchers’ datasets. When available, they appear to be good measures of students’ character traits. Survey effort measures, on the other hand, still showed predictive power and concurrent validity and so are potentially good proxy measures of grit and self-control when other measures are unavailable or suspected of being affected by manipulation or other sources of bias.

Table 5. Estimated coefficients of linear regression models predicting academic outcomes.

  High School GPA High School Graduation Attempt SAT SAT End of Year Math End of Year Read College Enroll 1 year 4yr College Enroll 1 year 4yr College Enroll Full Time 1 year
Teacher-Reported Grit 4.648*** [0.505] 0.037*** [0.206] 0.146*** [0.253] 29.751 [0.090] 17.450*** [0.274] 16.976*** [0.295] 0.141*** [0.257] 0.116*** [0.202] 0.115*** [0.202]
(0.334) (0.008) (0.026) (18.957) (2.464) (2.320) (0.025) (0.026) (0.025)
Adj R-squared 0.449 0.093 0.166 0.262 0.407 0.362 0.132 0.161 0.162
Teacher Reported Work Self-Control 4.588*** [0.500] 0.040*** [0.230] 0.161*** [0.283] 25.883 [0.078] 15.435*** [0.244] 16.126*** [0.288] 0.132*** [0.243] 0.118*** [0.208] 0.120*** [0.215]
(0.341) (0.008) (0.025) (19.410) (2.528) (2.312) (0.025) (0.026) (0.025)
Adj R-squared 0.438 0.101 0.179 0.260 0.390 0.355 0.123 0.162 0.165
Teacher Reported Interpersonal Self-Control 2.768*** [0.262] 0.030*** [0.150] 0.126*** [0.193] 70.297*** [0.172] 12.181*** [0.169] 15.042*** [0.231] 0.127*** [0.203] 0.128*** [0.197] 0.120*** [0.186]
(0.454) (0.009) (0.030) (23.502) (2.941) (2.756) (0.029) (0.030) (0.030)
Adj R-squared 0.266 0.073 0.140 0.283 0.362 0.327 0.107 0.158 0.154
Teacher Reported Redirection -2.575*** [-0.381] -0.016** [-0.123] -0.086*** [-0.201] -11.119 [-0.044] -8.558*** [-0.182] -10.410*** [-0.252] -0.063*** [-0.154] -0.057*** [-0.134] -0.050*** [-0.119]
(0.270) (0.006) (0.019) (14.37) (1.900) (1.718) (0.019) (0.019) (0.019)
Adj R-squared 0.341 0.067 0.142 0.254 0.367 0.338 0.091 0.141 0.138
Teacher Reported Homework Completion 0.107*** [0.289] 0.001*** [0.194] 0.004*** [0.172] -0.762 [-0.055] 0.313*** [0.120] 0.253** [0.106] 0.005*** [0.209] 0.003*** [0.120] 0.002** [0.111]
(0.015) (0.0003) (0.001) (0.804) (0.107) (0.103) (0.001) (0.001) (0.001)
Adj R-squared 0.284 0.092 0.130 0.255 0.338 0.286 0.106 0.134 0.134

Standardized coefficients in brackets. Standard errors of estimated coefficients in parentheses. Additional controls included in the model are: KBIT Scaled Score, Age, Ethnicity, Gender, FRL, SPED, ELL, and household income.

* Indicates P-values<0.1

** Indicates P-values<0.05, and

*** Indicates P-values<0.01.

Table 6. Estimated coefficients of linear regression models predicting academic outcomes.

  High School GPA High School Graduation Attempt SAT SAT End of Year Math End of Year Read College Enroll 1 year 4yr College Enroll 1 year 4yr College Enroll Full Time 1 year
Self Reported Grit 2.663*** [0.244] 0.015 [0.068] 0.079** [0.112] -19.963 [-0.053] 6.808** [0.887] 8.324*** [0.121] 0.088*** [0.131] 0.065** [0.093] 0.067** [0.097]
(0.451) (0.010) (0.031) (21.732) (3.041) (2.852) (0.030) (0.031) (0.031)
Adj R-squared 0.262 0.080 0.120 0.257 0.346 0.293 0.089 0.133 0.134
Self Reported Locus of Control 1.486*** [0.142] 0.024* [0.112] 0.056* [0.084] -7.319 [-0.021] 6.717** [0.091] 3.001 [0.045] 0.094*** [0.147] 0.051* [0.076] 0.031 [0.047]
(0.446) (0.010) (0.030) (19.780) (2.922) (2.817) (0.029) (0.030) (0.030)
Adj R-squared 0.223 0.088 0.115 0.255 0.347 0.280 0.093 0.130 0.126
Self-Control Work 2.792*** [0.252] 0.013 [0.058] 0.102*** [0.144] -33.992 [-0.089] 1.129 [0.014] 4.178 [0.060] 0.091*** [0.134] 0.063** [0.089] 0.047 [0.067]
  (0.456) (0.010) (0.031) (21.531) (3.157) (2.956) (0.031) (0.031) (0.031)
Adj R-squared 0.267 0.079 0.128 0.262 0.338 0.282 0.090 0.132 0.129
Self-Control Interpersonal 1.532*** [0.138] 0.016 [0.068] 0.112*** [0.157] -39.898* [-0.102] 6.624** [0.085] 12.624*** [0.179] 0.077** [0.113] 0.079** [0.111] 0.052* [0.074]
  (0.477) (0.010) (0.032) (21.924) (3.112) (2.907) (0.031) (0.032) (0.031)
Adj R-squared 0.222 0.080 0.132 0.265 0.345 0.310 0.085 0.136 0.130

Standardized coefficients in brackets. Standard errors of estimated coefficients in parentheses. Additional controls included in the model are: KBIT Scaled Score, Age, Ethnicity, Gender, FRL, SPED, ELL, and household income.

* Indicates P-values<0.1

** Indicates P-values<0.05, and

*** Indicates P-values<0.01.

Discussion and conclusions

Using data from a study of high school seniors (N = 513), we considered the potential of survey effort measures as proxy measures of character traits. Surveys often resemble routine paperwork and tasks that people have to complete in their everyday lives. For students, in particular, surveys completed at school can resemble schoolwork or homework. Therefore, we hypothesized that measuring the effort students put into surveys can provide relevant information about their grit and self-control, two character traits that correlate with academic and life success.

Two survey effort measures have shown recent promise: item non-response and careless answering. We contribute to previous research in two ways. First, we complement the work of Hitt, Trivitt, and Cheng [12] and Hitt [6] on the validity of survey effort measures in adolescents by studying their correlation with teacher reports of students’ skills, academic outcomes at the end of high school, and college attendance. Second, we examine the relationship between survey effort measures and performance task measures of academic diligence and frustration tolerance.

Our results showed the promise of survey effort measures as proxy measures of grit and self-control. Both item non-response and careless answering showed convergent validity via negative correlations with self-reported and teacher-reported measures of grit and self-control. Although the magnitudes of the correlations between survey effort and self-reported measures appear small, and these results need further replication, they are at least as large as the correlations reported in prior literature validating other behavioral-task measures of traits related to grit and self-control. Item non-response demonstrated criterion validity through significant negative correlations with high school GPA, the probability of attempting the SAT, SAT scores, performance on end-of-senior-year math and reading tests, and the probability of being enrolled in college one year after graduation. Careless answering also showed significant correlations with senior-year GPA, attempting the SAT, end-of-senior-year math and reading test scores, and college enrollment. We acknowledge that one of our outcome measures, high school graduation, had limited variability, raising a restriction-of-range concern; this was not a concern for the other eight academic outcome measures.

We note one key limitation of our study: we only used a convenience sample of high school students in the United States. We encourage further replication work using other samples and settings to corroborate our results.

We believe this study adds evidence on the potential of survey effort measures to provide meaningful information about students’ character traits related to grit and self-control. These measures give researchers and evaluators a relatively easy source of information on these traits that is not affected by the biases that can affect self-reported or teacher-reported measures, as respondents are usually unaware that their survey effort is being monitored. In addition, they open the opportunity to gain further insights into character traits using previously collected data that lack direct measures of these skills [25]. We acknowledge, however, that these measures could also be biased and manipulated if used in higher-stakes educational decisions or if students become aware that their survey behavior is being observed.

Supporting information

S1 Appendix

(DOCX)

Acknowledgments

We would like to thank Julie Trivitt for her help in the early stages of this paper and Albert Cheng and Collin Hitt for their comments and feedback on our results. We also thank conference participants at the 42nd AEFP Annual Conference and the University of Arkansas Department of Education Reform Brownbag Seminar Series for all their feedback. Any errors are our own.

Data Availability

All data collected in this study are considered confidential. Our IRB protocol at the University of Pennsylvania only allows sharing a fully de-identified dataset with collaborators directly involved in this project for research analysis, after approval by the PI of the project (Angela Lee Duckworth), who assumed responsibility for how and where the data will be stored and analyzed.

Funding Statement

Angela Duckworth and Sidney D'Mello received funding from the Bill and Melinda Gates Foundation (https://www.gatesfoundation.org/) and the Walton Family Foundation (https://www.waltonfamilyfoundation.org/) for this work. The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Almlund M, Duckworth AL, Heckman JJ, Kautz TD. Personality psychology and economics. Handbook of the Economics of Education. 2011; 4: 1–181. [Google Scholar]
  • 2.Weel B. The noncognitive determinants of labor market and behavioral outcomes: Introduction to the symposium. Journal of Human Resources. 2008; 43(4): 729–737. [Google Scholar]
  • 3.Heckman JJ, Stixrud J, Urzua S. The effects of cognitive and noncognitive abilities on labor market outcomes and social behavior. Journal of Labor Economics. 2006; 24(3): 411–482. [Google Scholar]
  • 4.Duckworth AL, Yeager D. Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher. 2015; 44(4): 237–251. 10.3102/0013189X15584327 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Marcus B, Schütz A. Who are the people reluctant to participate in research? Personality correlates of four different types of nonresponse as inferred from self- and Observer Ratings. Journal of Personality. 2005; 73: 959–984. 10.1111/j.1467-6494.2005.00335.x [DOI] [PubMed] [Google Scholar]
  • 6.Hitt C. Just filling in the bubbles: Using careless answer patterns on surveys as a proxy measure of noncognitive skills. EDRE working paper 2013–05. 2015. Fayetteville, AR: Department of Education Reform, University of Arkansas.
  • 7.Huang J, Curran P, Keeney J, Poposki E, DeShon R. Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology. 2012; 27(1): 99–114. [Google Scholar]
  • 8.Johnson JA. Ascertaining the validity of individual protocols from web-based personality inventories. Journal of Research in Personality. 2005; 39: 103–129. [Google Scholar]
  • 9.Meade A, Craig S. Identifying careless responses in survey data. Psychological Methods. 2012; 17(3): 437–55. 10.1037/a0028085 [DOI] [PubMed] [Google Scholar]
  • 10.Zamarro G, Cheng A, Shakeel D, Hitt C. Comparing and validating measures of non-cognitive traits: Performance task measures and self-reports from a nationally representative internet panel. Journal of Behavioral and Experimental Economics. 2018; 72, 51–60. [Google Scholar]
  • 11.Hedengren D, Stratmann T. The dog that didn’t bark: What item nonresponse shows about cognitive and noncognitive ability. Unpublished Manuscript. 2012. Retrieved from http://ssrn.com/abstract=2194373 [Google Scholar]
  • 12.Hitt C, Trivitt J, Cheng A. When you say nothing at all: The predictive power of student effort on surveys. Economics of Education Review. 2016; 52: 105–119. [Google Scholar]
  • 13.Duckworth AL, Quinn PD. Development and validation of the Short Grit Scale (Grit-S). Journal of Personality Assessment. 2009; 91: 166–174. 10.1080/00223890802634290 [DOI] [PubMed] [Google Scholar]
  • 14.John OP, Donahue EM, Kentle RL. The Big Five Inventory–Versions 4a and 54. 1991. Berkeley, CA: University of California. [Google Scholar]
  • 15.Galla BM, Plummer BD, White RE, Meketon D, D’Mello SK, Duckworth AL. The Academic Diligence Task (ADT): Assessing individual differences in effort on tedious but important schoolwork. Contemporary Educational Psychology. 2014; 39(4): 314–325. 10.1016/j.cedpsych.2014.08.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Meindl P, Yu A, Galla B, Quirk A, Haeck C, Goyer P, et al. A brief behavioral measure of frustration tolerance predicts academic achievement immediately and two years later. Emotion. 2019; 19(6): 1081–1092. 10.1037/emo0000492 [DOI] [PubMed] [Google Scholar]
  • 17.Kaufman AS, Kaufman NL. Kaufman Brief Intelligence Test. 1990. John Wiley and Sons, Inc. [Google Scholar]
  • 18.Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychological Bulletin. 1955; 52(4): 174–203. [DOI] [PubMed] [Google Scholar]
  • 19.Tangney JP, Baumeister RF, Boone AL. High self-control predicts good adjustment, less pathology, better grades, and interpersonal success. Journal of Personality. 2004; 72: 271–322. 10.1111/j.0022-3506.2004.00263.x [DOI] [PubMed] [Google Scholar]
  • 20.Gosling SD, Rentfrow PJ, Swann WB Jr. A very brief measure of the big five personality domains. Journal of Research in Personality. 2003; 37: 504–528. [Google Scholar]
  • 21.Rammstedt B, John OP. Measuring personality in one minute or less: A 10-item short version of the Big Five Inventory in English and German. Journal of Research in Personality. 2007; 41: 203–212. [Google Scholar]
  • 22.Eid ME, Diener EE. Handbook of Multimethod Measurement in Psychology. American Psychological Association; 2006. [Google Scholar]
  • 23.Turner LA, Pickering S, Burke JR. The relationship of attributional beliefs to self-esteem. Adolescence. 1998; 33(130): 477–484. [PubMed] [Google Scholar]
  • 24.Duckworth AL, Kern ML. A meta-analysis of the convergent validity of self-control measures. Journal of Research in Personality. 2011; 4: 259–268. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Cheng A, Zamarro G. Measuring teacher non-cognitive skills and its impact on students: Insight from the Measures of Effective Teaching Longitudinal Database. Economics of Education Review. 2018; 64: 251–260. [Google Scholar]

Decision Letter 0

Frantisek Sudzina

24 Dec 2019

PONE-D-19-29452

Further Validation of Survey Effort Measures of Relevant Character Skills: Results from a Sample of High School Students

PLOS ONE

Dear Dr. Zamarro,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by Feb 07 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Frantisek Sudzina

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent.

In the ethics statement in the Methods and online submission information, please ensure that you have specified (i) whether consent was informed and (ii) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed).

If your study included minors, state whether you obtained consent from parents or guardians.

If the need for consent was waived by the ethics committee, please include this information.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: 1. The authors do not sufficiently acknowledge that their small effect sizes should be interpreted with caution.

2. The authors seem to conduct the appropriate analyses (partial correlations, regression) to answer their research questions.

3. The data were not made public; however, the authors do acknowledge in the Data Availability Statement (not in the body of the manuscript itself) that they cannot make the data open due to IRB restrictions.

4. This manuscript contained several grammatical, sentence structure, and APA formatting errors, all of which significantly hindered comprehension of its content.

Reviewer #2: Thank you for the opportunity to review this interesting research. This article aims to further validate survey effort measures as potential proxy measures for characteristics like grit and self-control. Strengths include integrating multiple sources of information (self-reported information, teacher reports, performance task measures, and administrative records). The concept of effort spent on tasks as a proxy for different personality or character traits is an interesting premise.

My greatest concerns are about the details that are provided, the language used, and consistency throughout the manuscript, which together make it unclear what was done and how it led to the reported results. Below are comments related to the main criteria, which I hope can be useful to the authors in considering how to refine their work and improve its contribution to the literature.

1. While the article is written in standard English, the aims and contributions are inconsistently presented. Some parts provide extra details that seem irrelevant, and then insufficient detail is provided in other parts. As a whole, it’s not clear from the narrative what this study shows and how it contributes to existing knowledge (that information is there, but the focus and use of terms is inconsistent, adding confusion). Careful consideration of the structure, how details can be clearly yet concisely conveyed, and consistency across parts of the narrative would be beneficial. In addition, there are numerous grammatical errors throughout, suggesting a lack of care (that is a bit amusing with a paper on aspects related to conscientiousness).

2. In terms of sharing data, while it's understandable that data cannot be released, it would be useful to include the code files, if possible. In addition, for each of the scales that the careless answering is calculated on, indicate the number of items included in those scales and reliability information, which speaks to the extent to which inconsistencies might be due to the person versus to the scale. It would also be helpful to have a supplemental file that includes the extent to which a particular item is unpredictable.

3. The introduction immediately assumes that the reader views conscientiousness as a character skill. That is debatable, as most uses of the word clearly place it in the personality space, but not necessarily as a skill. Grit is defined here as persistence in long-term tasks, but generally this has involved not only persistence but also passion. Rationale for using these terms as stated would be useful to provide context to the reader. And then in terms of use, this seems to be using these terms more from a personality than a character skills perspective, so rationale for taking this lens (and what is meant by this lens) is needed. In addition, clear definitions of terms such as survey effort measures, parametrization, and careless answering are needed.

4. Method:

a) p. 4 notes that data were collected on as many students as resources allowed. Meaning what? How many were included? What were the resources here? While pointing to Meindl et al., 2019, what is the reader supposed to see that citation for? Greater specificity about the students involved would be useful. This vaguely notes 513 high school seniors from a public school. Some indication of the socioeconomic and ethnic makeup of students would be useful.

b) What is the National Student Clearinghouse? Provide a citation or website.

c) In describing what students completed, indicate the exact number of items, not “about 100” and “about 10 more final questions”.

d) When did the sessions occur? Was this during school? In class or outside of class? Did all students in a class complete, or only some? When was the second session? How close in time? How many students did teachers report on? The description of this comes across as quite vague and hard to make sense of what was really done by whom.

e) What is meant by the number of answerable questions to which a student should have responded? Without knowing the measures being used, it is not clear what is meant by this.

f) Figure 1 is hard to read, with the large percentage of complete cases. It might be useful to break this into two parts: one indicating response or not, then indicating the distribution of the 53% with missing responses (with the axis adjusted accordingly).

g) In describing careless answering, a reliable scale will not necessarily be consistent (at least in the psychological sciences), as the items can only approximate the underlying construct. While on average across a sample there is consistency, for any individual a variety of factors can cause an item to be less consistent. It’s a strong assumption to say that variance is due to carelessness (i.e., a problem with the person), and not due, for instance, to the wording of the question or the person carefully discriminating between two option choices.

h) p. 7, noting the estimated correlation coefficient, is this the Pearson r? Say this directly. And be careful about making interpretations about what the correlation does or does not indicate about what students do.

i) Did teachers complete the 6 and 8 items for grit and self-control for each student, or did they read through the items and make a judgment call about the extent to which those represent the student?

j) For the questions being asked of teachers, it seems odd that a homeroom teacher is assessing this, as I would think they would have less experience redirecting attention and determining homework completed (for readers less aware of how the school structure works). Rationale for choosing the 3 subjects would be useful.

k) I’m not clear why details on the ADT correlations in a different study are reported in the text, whereas similar information is not included for all the other measures.

l) Spell out acronyms on first use (GPA, SAT).

m) I find it hard to follow the tests that are planned and the expectations. A table or figure could be a useful way to convey the analytic strategy.

5. Results:

a) Information about the participants is finally provided in the results. This would be useful much earlier. Tables should also be numbered in the order they are mentioned in the text (Table 2 is noted before Table 1). The description of the table could be briefer to be less repetitious with the table (or the info could be descriptively given in the text and then just summarise the measures in the table). The description of the measure responses could also be clearer and more concise.

b) Consider combining tables 3 and 4, so it’s easier to directly compare the direct and partial correlations.

c) For comparison, it would be useful to also predict the outcomes with the self-report and teacher reported measures, to see if a similar pattern to the frustration measures occurs. This is especially necessary as the conclusion indicates that they are tested as a proxy measure of character skills – so need to directly see that they are capturing the same thing.

6. Conclusions

a) The conclusion notes that this can give relevant information about conscientiousness. But the focus is on grit and self-control, not on conscientiousness (though that is measured in the Big 5) – if that’s the goal, then it should be addressed directly, with more consideration of how the different self, teacher, and performance measures intersect with conscientiousness. (The third paragraph then instead speaks of skills related to conscientiousness. It would be useful to be consistent in the narrative throughout.)

b) The overlap with both the self-reported measures and teacher measures is quite small, suggesting they are not a particularly good proxy (perhaps they are capturing different variance). Some discussion of this should be included.

c) At some point, the assumptions being made with the survey effort measures should be discussed. The last paragraph suggests these are not affected by the biases that affect self- and teacher-reported measures, but they instead reflect biases and assumptions of the researchers, which should be explicitly acknowledged.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: PLOS November 2019.docx

PLoS One. 2020 Jul 1;15(7):e0235396. doi: 10.1371/journal.pone.0235396.r002

Author response to Decision Letter 0


16 Jun 2020

RESPONSE TO COMMENTS FROM REVIEWER 1

1) The authors do not sufficiently acknowledge that their small effect sizes should be interpreted with caution.

This is now better acknowledged in the text, both in the introduction and the conclusion of the paper. Please see pages 3 and 4.

2) The data were not made public; however, the authors do acknowledge in the Data Availability Statement (not in the body of the manuscript itself) that they cannot make the data open due to IRB restrictions.

All data collected in this study are considered confidential. After further study and discussion with our IRB at the University of Pennsylvania, we learned that our IRB protocol only allows sharing a fully de-identified dataset with collaborators directly involved in this project, and only after approval by the PI of the project, who assumed responsibility for how and where the data will be stored and analyzed. The reason is that this research involves minors and confidentiality was promised to participants. Therefore, we regret that our IRB does not allow us to share the data. We hope you understand, given the nature of the data and the fact that our research involves minors.

3) This manuscript contained several grammatical, sentence structure, and APA formatting errors, all of which significantly hindered comprehension of its content. This manuscript includes several grammatical errors and would benefit substantially from thorough copy editing. Moreover, the organization of this manuscript (e.g., the subhead “Data” should be “Materials and Methods”) does not fit with APA guidelines and those of PLOS (https://journals.plos.org/plosone/s/submission-guidelines). Finally, as per APA guidelines, please report measures and results in past tense. These errors and issues limit the manuscript's readability quite a bit.

We have revised the text to follow APA style and the text has also been copy-edited for grammatical errors.

4) Use of the term “character skills” to describe the personality trait conscientiousness and its facets is odd. Decades of research has identified conscientiousness as a personality trait. Please include rationale for why conscientiousness and its facets are termed as a character skill and not a personality trait in this context. Alternatively, given that conscientiousness is not measured directly, I suggest focusing on grit and self-control throughout the manuscript, especially the methods and results sections.

There is a lack of consensus across disciplines on the use of these terms, and hence the confusion, as our research team is multidisciplinary. Economists use the term non-cognitive skills, while education policy researchers refer to these traits as character skills. We have revised the text to use terminology more in line with the psychology literature.

5) Relatedly, you introduce the manuscript’s focus on validation by stating that “researchers have struggled to find valid measures of these skills with many existing datasets lacking any measures at all”. In fact, measures to assess trait level conscientiousness have been very well validated (see BFI, BFI2, NEO, etc.) If you decide to focus on conscientiousness and its facets, this sentence is problematic.

Available self-reported measures, despite being validated, can be problematic when used to evaluate the effects of education policies or interventions (see Duckworth and Yeager, 2015); that is the point we were trying to make with this sentence. We have revised the text to make this clearer. See the revised abstract and the first paragraph of the Introduction, page 1.

6) Please include reference to the measures you used as well as example items for each.

References to the measures and example items are included in the text. We have also added Table A.1 to the appendix, which describes all the scales included in our careless answering measures.

7) In the Survey Effort Measures section, please clarify what constitutes “legitimate skips” with regards to item non-responding.

These are questions that, given a student's previous responses, he or she is not required to complete in the survey. We have now clarified this in the text. See pages 6 and 7.
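For illustration, here is a minimal sketch of how an item non-response rate could be computed once legitimate skips are flagged; the data layout and variable names are our assumptions for exposition, not the study's actual code:

    import numpy as np
    import pandas as pd

    # Illustrative coding: 1 = answered, 0 = skipped (true non-response),
    # NaN = legitimate skip due to survey routing (excluded from the denominator).
    responses = pd.DataFrame({
        "q1": [1, 1, 0],
        "q2": [1, 0, 0],
        "q3": [np.nan, 1, 1],  # student 0 was routed past q3
    })

    answerable = responses.notna().sum(axis=1)  # questions each student should answer
    skipped = responses.eq(0).sum(axis=1)       # answerable questions left blank
    nonresponse_rate = skipped / answerable     # share of answerable items skipped
    print(nonresponse_rate)                     # 0.00, 0.33, 0.67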

8) Please clarify why teachers rated the grit measure items as a composite instead of each item individually. This approach limits the measures’ predictive power as well as its ability to assess the measures’ reliability. Also, please report the number of students each teacher gave ratings for.

Teachers provided an overall rating of their students' levels of grit and self-control to minimize burden, as teachers reported on all of their current students participating in the study. This measurement approach, using a single overall assessment of personality traits, has been shown to have adequate levels of convergent and discriminant validity, test-retest reliability, and convergence between self- and observer ratings (see Gosling, Rentfrow, & Swann, 2003; Rammstedt & John, 2007).

9) Clarify what including self-report measures “for completeness” means in the context of this study.

The point we wanted to make here was that, even though others have already studied the relationship between survey effort measures and self-reports, we also study this relationship as it provides evidence of convergent validity. We have rewritten this sentence on page 10 and eliminated “for completeness”.

10) Please explain how and why the measures for grit and self-control were adapted from the original scales.

Most studies of grit include at least 8 items. For the sake of brevity, we followed Duckworth and Quinn (2009) and used only the 5 items below:

1) I finish whatever I begin.

2) I work independently with focus.

3) I try very hard even after experiencing failure.

4) I stay committed to my goals.

5) I keep working hard even when I feel like quitting.

Source: Duckworth, A. L., & Quinn, P. D. (2009). Development and validation of the short grit scale (Grit-S). Journal of Personality Assessment, 91, 166-174.

Similarly, we included the following 8 items from the Brief Self-Control Scale to get at the overarching construct of self-control.

Self-control at work:

1) I come to class prepared.

2) I pay attention and resist distractions in class.

3) I remember and follow directions.

4) I get to work right away, instead of waiting until the last minute.

Self-control (interpersonal):

5) I allow others to speak without interruption.

6) I am polite to adults and classmates.

7) I can control my temper.

8) I can remain calm even when criticized or otherwise provoked.

Source: Tangney, J. P., Baumeister, R. F., & Boone, A. L. (2004). High self-control predicts good adjustment, less pathology, better grades, and interpersonal success. Journal of Personality, 72, 271-322.

Overall, both modified scales continued to capture the underlying constructs as intended.
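As a small illustration of how such composites are typically scored (the scoring rule below is a common convention we assume here, not a detail quoted from the paper), the scale score is simply the mean of a respondent's item ratings:

    import pandas as pd

    # Hypothetical item columns; responses assumed on a Likert scale.
    GRIT_ITEMS = ["grit_1", "grit_2", "grit_3", "grit_4", "grit_5"]
    SELF_CONTROL_ITEMS = [f"sc_{i}" for i in range(1, 9)]

    def scale_score(df: pd.DataFrame, items: list) -> pd.Series:
        # Composite = mean of the item responses for each respondent.
        return df[items].mean(axis=1)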

11) It is my understanding that GPA was on a scale from 1.0-4.0. Please explain why your measure of GPA is on a scale from 1 to 100.

In the state where our data come from, high schools vary in how they calculate and scale their GPAs. Therefore, we converted GPA to a 100-point scale with the help of district-provided handbooks and information from the College Board.
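Purely to illustrate the mechanics (the anchor points below are invented; the study's actual mapping came from district handbooks and College Board materials and is not reproduced here), such a conversion might interpolate between a few agreed anchor values:

    import numpy as np

    # Hypothetical anchor points for a 4.0-scale to 100-point conversion.
    ANCHORS_4PT = [0.0, 1.0, 2.0, 3.0, 4.0]
    ANCHORS_100 = [50.0, 65.0, 75.0, 85.0, 95.0]  # illustrative values only

    def gpa_to_100(gpa: float) -> float:
        # Linear interpolation between the assumed anchors.
        return float(np.interp(gpa, ANCHORS_4PT, ANCHORS_100))

    print(gpa_to_100(3.5))  # -> 90.0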

12) Please report effect sizes of regression models throughout results sections instead of or in addition to standard deviation increases or decreases.

We have added estimated coefficients and their standard errors, in addition to standardized coefficients, to the current regression result tables. See the new Table 4.
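For readers moving between the two sets of estimates, the standard textbook relationship between a raw coefficient b and its standardized counterpart (a general identity, not a formula quoted from the paper) is:

    \beta_{\text{std}} = b \cdot \frac{\sigma_x}{\sigma_y}

where \sigma_x and \sigma_y are the standard deviations of the predictor and the outcome, respectively.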

13) Please include a note for Table 1 explaining what ELL, SPED etc. mean.

Done

14) In the discussion section, address limitations for the lack of variation in outcome measures (e.g., graduating high school). Relatedly, please address the limitations of your item non-response measure: (1) a continuous measure when 47% of participants completed all questions, and (2) a dichotomous measure of non-responding, which is problematic (i.e., not answering 1 question is not the same as not answering 10 questions), as well as the implications of the non-normal distribution of this measure (e.g., that careless responding may be a better predictor).

In the discussion section, we now acknowledge that one of our outcome measures – high school graduation – had limited variability, suggesting a restriction-of-range concern. That said, this was not a concern for the other eight academic outcome measures.

15) While you do recognize the small effect sizes (i.e., “Although the magnitudes of the correlations between survey effort and survey self-reported measures may appear small, they are at least as large as the correlations reported in prior literature validating other behavioral-task measures of conscientiousness, grit, and self-control.”), please address this limitation in the discussion section and caution readers against over-interpreting small effect sizes. Specifically, we may trust some effects more than others. Likewise, some effects may require replication more than others.

This is now better acknowledged in the discussion section.

RESPONSE TO COMMENTS FROM REVIEWER 2

1) While the article is written in standard English, the aims and contributions are inconsistently presented. Some parts provide extra details that seem irrelevant, and then insufficient detail is provided in other parts. As a whole, it’s not clear from the narrative what this study shows and how it contributes to existing knowledge (that information is there, but the focus and use of terms is inconsistent, adding confusion). Careful consideration of the structure, how details can be clearly yet concisely conveyed, and consistency across parts of the narrative would be beneficial. In addition, there are numerous grammatical errors throughout, suggesting a lack of care (that is a bit amusing with a paper on aspects related to conscientiousness).

We have considerably revised the text to address these concerns. We have also used the services of a copy editor to eliminate any possible grammatical errors.

2) In terms of sharing data, while it's understandable that data cannot be released, it would be useful to include the code files, if possible. In addition, for each of the scales that the careless answering is calculated on, indicate the number of items included in those scales and reliability information, which speaks to the extent to which inconsistencies might be due to the person versus to the scale. It would also be helpful to have a supplemental file that includes the extent to which a particular item is unpredictable.

We have created an appendix table describing all this information. See Table A.1 on page 33. In terms of the code, if the editor considers it necessary, we could document and provide the code should the paper be accepted for publication.

3) The introduction immediately assumes that the reader views conscientiousness as a character skill. That is debatable, as most uses of the word clearly place it in the personality space, but not necessarily as a skill. Grit is defined here as persistence in long-term tasks, but generally this has involved not only persistence but also passion. Rationale for using these terms as stated would be useful to provide context to the reader. And then in terms of use, this seems to be using these terms more from a personality than a character skills perspective, so rationale for taking this lens (and what is meant by this lens) is needed. In addition, clear definitions of terms such as survey effort measures, parametrization, and careless answering are needed.

Thank you for this comment. We acknowledge that these terms might be confusing, and adding to the confusion is the fact that different disciplines (e.g., economics, psychology, education policy) use them in different ways. We have clarified the definitions of these terms in the text and better justified how we use them to avoid misunderstandings.

4) Method:

a) p. 4 notes that data were collected on as many students as resources allowed. Meaning what? How many were included? What were the resources here? While pointing to Meindl et al., 2019, what is the reader supposed to see that citation for? Greater specificity about the students involved would be useful. This vaguely notes 513 high school seniors from a public school. Some indication of the socioeconomic and ethnic makeup of students would be useful.

Following this comment, we have added more details about the sample. A description of the makeup of students in our sample can be found at the end of page 4.

b) What is the National Student Clearinghouse? Provide a citation or website.

We have described and added a link for the National Student Clearinghouse; see page 4.

c) In describing what students completed, indicate the exact number of items, not “about 100” and “about 10 more final questions”.

Different students would complete a different total number of questions because some questions are only required if they choose certain options in prior questions. That is why we did not give the exact number of questions, as it would vary slightly by student. We changed this part of the text and eliminated the reference to the number of questions to avoid the confusion.

d) When did the sessions occur? Was this during school? In class or outside of class? Did all students in a class complete, or only some? When was the second session? How close in time? How many students did teachers report on? The description of this comes across as quite vague and hard to make sense of what was really done by whom.

Those students whose parents did not opt out were surveyed in three sessions:

Session 1: Students were administered the assent forms and Mill Hill Vocabulary test during senior planning periods in school (37-minute sessions). A large make-up session (n ~300) was held on the final day of testing in the library computer lab. This session occurred in November 2012.

Session 2: Students were administered a cognitive battery, a non-cognitive battery, and the Academic Diligence Task during one 2.5-hour session. Students were administered the tasks in either the cafeteria (50-200 students per day) or individual classrooms on Macbook laptop computers (~30 students/class per day). This session occurred in January 2013.

Session 3: Students were administered the Frustration Task during senior planning periods. Students were administered the task in the library on Macbook laptop computers. For four periods per day, two classes of students were tested during each period (approximately 30-60 students per period). This session occurred in May 2013.

Teachers reported on all of their participating students in the study.

e) What is meant by the number of answerable questions to which a student should have responded? Without knowing the measures being used, it is not clear what is meant by this.

This refers to all questions that a student was supposed to answer, not counting legitimate skips due to survey routing as non-response. We have tried to clarify this better in the text. See pages 6 and 7.

f) Figure 1 is hard to read, with the large percentage of complete cases. It might be useful to break this into two parts: one indicating response or not, then indicating the distribution of the 53% with missing responses (with the axis adjusted accordingly).

Following the reviewer’s advice we have eliminated prior Figure 1 and opted for discussing the results in the text and added a new figure representing the distribution of item non-response rates for those with missing responses (new Figure 1).

g) In describing careless answering, a reliable scale will not necessarily be consistent (at least in the psychological sciences), as the items can only approximate the underlying construct. While on average across a sample there is consistency, for any individual a variety of factors can cause an item to be less consistent. It’s a strong assumption to say that variance is due to carelessness (i.e., a problem with the person), and not due, for instance, to the wording of the question or the person carefully discriminating between two option choices.

We acknowledge that our survey effort measures, including careless answering, are proxy measures for student effort and, as such, contain some noise. Previous literature has suggested their promise as proxy measures for personality traits related to grit and self-control (Hedengren and Stratmann, 2012; Hitt, Trivitt and Cheng, 2016; Hitt, 2015; Zamarro et al., 2018). Therefore, we believe that, despite being potentially noisy, they contain relevant information about an individual's diligence, which relates to their levels of grit and self-control.
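To make the construction concrete, one residual-based parametrization discussed in this literature can be sketched as follows; the estimator and scale handling here are our illustrative assumptions rather than the paper's exact implementation:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    def careless_answering_index(scale_items: pd.DataFrame) -> pd.Series:
        # Predict each item from the remaining items of the same scale and
        # average each respondent's absolute residuals; higher values indicate
        # less internally consistent (more careless-looking) answering.
        # Sketch only: assumes complete responses on the scale.
        resids = pd.DataFrame(index=scale_items.index)
        for item in scale_items.columns:
            X = scale_items.drop(columns=item).to_numpy()
            y = scale_items[item].to_numpy()
            fitted = LinearRegression().fit(X, y).predict(X)
            resids[item] = np.abs(y - fitted)
        idx = resids.mean(axis=1)
        return (idx - idx.mean()) / idx.std()  # standardized index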

h) p. 7, noting the estimated correlation coefficient, is this the Pearson r? Say this directly. And be careful about making interpretations about what the correlation does or does not indicate that students do.

Yes, this is a Pearson correlation coefficient. We have now stated this in the text and rewritten the paragraph so as not to make interpretations about what this correlation indicates about student behaviors.

i) Did teachers complete the 6 and 8 items for grit and self-control for each student, or did they read through the items and make a judgment call about the extent to which those represent the student?

Because teachers reported on all of their participating students, they provided an overall rating of each student's levels of grit and self-control to minimize their burden. This measurement approach, using a single overall assessment of personality traits, has been shown to have adequate levels of convergent and discriminant validity, test-retest reliability, and convergence between self- and observer ratings (see Gosling, Rentfrow, & Swann, 2003; Rammstedt & John, 2007).

j) For the questions being asked of teachers, it seems odd that a homeroom teacher is assessing this, as I would think they would have less experience redirecting attention and determining homework completed (for readers less aware of how the school structure works). Rationale for choosing the 3 subjects would be useful.

Homeroom, English, and social science teachers were asked to do so. We believe that these three teachers can provide meaningful information about students. Given our experience in the field, we also think that homeroom teachers are capable of reporting on these items.

k) I’m not clear why details on the ADT correlations in a different study are reported in the text, whereas similar information is not included for all the other measures.

Details on correlations for the ADT and MTFT are provided for comparison with our estimated correlations for measures of survey effort. We believe this comparison is meaningful because these are direct task measures, and effort on a survey can similarly be viewed as a behavioral task.

l) Spell out acronyms on first use (GPA, SAT).

This is now done

m) I find it hard to follow the tests that are planned and the expectations. A table or figure could be a useful way to convey the analytic strategy.

We have rewritten and reorganized the paper, including our empirical strategy section (pages 15 and 16), and we hope this is clearer in this version.

5) Results

a) Information about the participants is finally provided in the results. This would be useful much earlier. Tables should also be numbered in the order they are mentioned in the text (Table 2 is noted before Table 1). The description of the table could be briefer to be less repetitious with the table (or the info could be descriptively given in the text and then just summarise the measures in the table). The description of the measure responses could also be clearer and more concise.

We have moved the information about participants earlier in the text and addressed the issue with the numbering of tables. We have also revised the text to describe the tables more clearly.

b) Consider combining tables 3 and 4, so it’s easier to directly compare the direct and partial correlations.

We have done this.

c) For comparison, it would be useful to also predict the outcomes with the self-report and teacher reported measures, to see if a similar pattern to the frustration measures occurs. This is especially necessary as the conclusion indicates that they are tested as a proxy measure of character skills – so need to directly see that they are capturing the same thing.

We have added new estimates predicting the outcomes with the self-reported and teacher-reported measures. See the new Tables 5 and 6, described in the text on page 24.

6) Conclusions

a) The conclusion notes that this can give relevant information about conscientiousness. But the focus is on grit and self-control, not on conscientiousness (though that is measured in the Big 5) – if that’s the goal, then it should be addressed directly, with more consideration of how the different self, teacher, and performance measures intersect with conscientiousness. (The third paragraph then instead speaks of skills related to conscientiousness. It would be useful to be consistent in the narrative throughout.)

We revised the text to be more consistent in our use of these terms and to make it clearer to the reader.

b) The overlap with both the self-reported measures and teacher measures is quite small, suggesting they are not a particularly good proxy (perhaps they are capturing different variance). Some discussion of this should be included.

As pointed out in the text, our observed correlations are of the same magnitude as those observed with other designed behavioral tasks like the Academic Diligence Task and the Frustration Task, so we disagree that these correlations are as small as this reviewer suggests. In any case, this is now better acknowledged in the conclusions, where we point out the need for more replication.

c) At some point, the assumptions being made with the survey effort measures should be discussed. The last paragraph suggests these are not affected by the biases that affect self- and teacher-reported measures, but they instead reflect biases and assumptions of the researchers, which should be explicitly acknowledged.

We now better acknowledge in the conclusions section that our survey effort measures are not free of potential biases:

“We acknowledge, however, that these measures could also be biased and manipulated if used in higher stakes educational decisions or if students become aware of the fact that their survey behavior is being observed.”

Attachment

Submitted filename: Response_Reviewers.docx

Decision Letter 1

Frantisek Sudzina

16 Jun 2020

Validation of survey effort measures of grit and self-control in a sample of high school students

PONE-D-19-29452R1

Dear Dr. Zamarro,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Frantisek Sudzina

Academic Editor

PLOS ONE

Acceptance letter

Frantisek Sudzina

22 Jun 2020

PONE-D-19-29452R1

Validation of survey effort measures of grit and self-control in a sample of high school students

Dear Dr. Zamarro:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Frantisek Sudzina

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Appendix

    (DOCX)

    Attachment

    Submitted filename: PLOS November 2019.docx

    Attachment

    Submitted filename: Response_Reviewers.docx

    Data Availability Statement

All data collected in this study are considered confidential. Our IRB protocol at the University of Pennsylvania only allows sharing a fully de-identified dataset with collaborators for research analysis directly involved in this project after receiving approval by the PI of the project (Angela Lee Duckworth), who assumed responsibility for how and where the data will be stored and analyzed.

