Abstract
The ability of a human to retrospectively estimate the amount of time spent on a task is well understood only when the period of time is seconds- or minutes-long. The lack of research into estimation of longer periods can be attributed, in part, to the difficulty of measuring ground truth durations when the task is broken up by other activities in a natural, day-to-day setting. An empirically based model of engagement was recently proposed that statistically estimates time-on-task for computer programming assignments in an introductory computer programming course. Computer programming assignments can be completed in many sessions across days or weeks and, based on recorded keystroke data, an objective ground truth of task duration can be measured. In this work, we take advantage of this new measurement method to explore duration estimation of tasks lasting hours that are spread out over multiple days in a natural setting. Subjects in our study overestimated time-on-task 78% of the time and reported a median of 1.45 hours worked for every actual hour spent on task. We find that self-reports are more accurate when students score higher on their assignments, suggesting that the accuracy of estimated time is correlated with task performance.
Introduction
Time duration estimation, where a person estimates how much time elapsed during an interval or while completing a task, is much-studied because of its impacts on daily life and on our psychological and physiological understanding of related phenomena, such as memory. Time estimation studies generally take the form of comparing a subject’s estimate of time duration with an established actual time duration, measuring how much the subject over- or underestimates duration.
There are two broad classes of time duration studies: those in which duration estimates range from seconds to minutes, and those in which estimates are in hours. Studies that use durations at the second and minute levels are generally called interval timing studies [1,2]. These studies merge the theory of time awareness with empirical studies and with physiology. They are generally conducted in highly controlled laboratory conditions and often seek to isolate specific mechanisms, so few (usually one) tasks or stimuli are allowed to occur in the estimated time interval. Most studies use durations no longer than a few seconds, but some extend to as long as 60 minutes. Ground truth time duration is simply clock time.
The other broad class of time estimation study is of interest to the labor and organizational psychology communities. These studies are often conducted in the context of understanding employee effort and performance and treat much longer time durations, with subjects estimating time spent on a task over the course of a day or a work week. Ground truth time-on-task is difficult to measure, so most studies use time diaries, where subjects recount all tasks done during the day, as a subjective stand-in for actual time spent on a given task.
A gap exists between these two methodologies. On the one hand, we have highly controlled studies with trustworthy ground truth, but they study only relatively short time intervals and are generally conducted in laboratory settings. On the other hand, we have loosely controlled studies with subjective ground truth and little theoretical or physiological explanation, but they span longer time intervals and are conducted in natural settings.
In this paper, we report a study that is a step in bridging this gap. We extend task lengths to durations of hours and allow task switching. Our data is collected in a natural (not laboratory) setting while using an objective measure of actual time duration to which we can compare subjects’ estimates. We build on work recently published in the computing education literature that proposes a way to accurately measure how long computer programming students work on an hours-long assignment using keystroke logs [3,4]. The method works by recording students’ keystrokes while they work on programming assignments whenever and wherever they choose, then using a probabilistic model to determine how much time students spent working on their assignments. With these keystroke-derived time-on-task measurements we can compare students’ retrospective estimates of time-on-task for a given assignment to objective measurements. Using this methodology we gain benefits from both classes of duration estimation studies: we use objective ground truth measurements like interval (short duration) timing studies, and we are able to measure longer periods of time in natural settings, as in time diary studies. Furthermore, the task need not be completed in one continuous session. We incorporate an additional variable, performance as measured by graded scores on assignments, in our analysis.
Our overarching research question is: how accurately do humans estimate time-on-task after completing the task, and how does this correlate with performance? Our contribution in this paper is to take the first measurements in a natural setting where the baseline time estimates do not rely on self-reports, but are objective estimates based on a statistical model of behavior. Our specific research questions in this study are:
RQ1 How accurately do students estimate time spent on computer programming assignments?
RQ2 Does a correlation exist between performance and the accuracy of self-reported task duration?
We find that subjects overestimate their assignment duration by a median of 45%, reporting 1.45 hours for every hour worked, confirming the impression of CS1 instructors that student estimates were inflated [3]. The 45% error is much higher than that found in time diary studies. Despite the large overestimates, the majority of student participants nevertheless believed themselves able to accurately estimate time-on-task according to a survey. We also find that performance on assignments is moderately correlated with the accuracy of time-on-task estimates. This study is the first to compare estimated time-on-task to a baseline obtained using an objective, empirically based method measured in a natural, non-laboratory setting. The experiment is done in an educational setting, but we suggest that the results are potentially generalizable and important enough to contribute to the general time-on-task estimation literature. This paper contributes to time-estimation research by demonstrating a methodology for obtaining objective measures of long-duration tasks in a case study comparing subjective time estimates to objective time estimates.
Background
Interval time estimation
Interval timing is tracking elapsed time in the seconds to minutes range [1]. It is typically done in a laboratory setting where actual time duration is simply clock time, since the intervals are short enough that subjects don’t have time to task switch. There are two paradigms of duration estimation, prospective and retrospective. In prospective estimation, participants are told beforehand that they will be asked about time, while retrospective estimates are made without prior knowledge that a time estimate will be made [5]. When participants know they will be asked about time they tend to overestimate more [6], although Walker et al. [7] only found this effect at 8 minutes and not at longer durations. Balcı et al. [1] found, in a retrospective study with approximately 24,500 participants and intervals from 5 minutes to 90 minutes, that intervals less than 15 minutes tended to be overestimated and intervals over 15 minutes tended to be underestimated.
Tobin et al. [6] found that gamers overestimate their time spent gaming at all intervals tested: 12 minutes, 35 minutes, and 58 minutes. It is well known that “time flies when you’re having fun” [8], but gamers did not underestimate their time, possibly because they are aware that their perceived time is less than their actual time [9] and increase their estimates to compensate. If the task chosen for the interval were not as engaging (e.g., homework), overestimation could be higher because the perception of time is inflated when tasks are not engaging [10].
An important feature of interval timing is the existence of the objective ground truth interval duration to compare to estimates, yet there is a “paucity” [7] of prospective vs. retrospective interval timing studies that consider intervals longer than two minutes [6].
Time diary studies
Related to interval timing are studies that measure time on tasks lasting hours spread over multiple days. These studies are important to labor researchers and stakeholders; however, an accurate ground truth is rarely available. Time estimation studies where durations are in the range of hours use various measurement and estimation methods. The most common is the time diary study. In these studies, participants keep a daily “time diary,” where they log each activity done sequentially during the day. This is used as a stand-in for actual time duration for a specific task. The estimated time comes after a number of days (usually a week) and is compared to the time diaries. Time estimates generally exceed estimates derived from time diaries. Other measurement methods at this time scale exist, but time diaries are generally accepted as a “richer and more contextualized source of information about people’s activities than any present alternative” [11].
Time diaries, provided by various universities starting with the University of Michigan in 1965 and, since 2003, by the US Census Bureau through the Bureau of Labor Statistics’ American Time Use Survey, have been used as ground truth [11]. Time diaries have their issues: people can distort, embellish, lie, forget, or “substitute a habitual activity for what actually took place” [11]. Despite their issues, time diaries are used because they are widely available, have a long history, encompass almost every profession, and are the most accurate measures available for their broad contexts [12–14].
It is well documented that people overestimate their time spent working when comparing time estimates to time diaries [15–17], by 5% to 10% [11], especially for mundane tasks like housework [18,19], although there exists some debate about the interpretation of these overestimates (e.g., are respondents including housework or commute time when asked about work in surveys?) [20].
Measuring time-on-task using keystroke logs
In an educational context, the amount of time spent working on a task has been researched since the 1970s [21]. It is an important metric that contributes to learning and achievement [22]. To measure time-on-task, computing education researchers often use software that records events when students work on their computer code. Coarse-grained measures, such as measuring time-on-task as the time between when a student started working on their assignment and when they finished, are not as accurate as fine-grained measures, such as using keystrokes to determine whether a student was working or not [22]. The time between two keystrokes is known as the keystroke latency.
A straightforward way to estimate time-on-task using keystroke logs is to simply sum the keystroke latencies. However, if a latency of, say, two hours appears, then we intuitively consider that the student was taking a break at that time and don’t include that latency duration in the time-on-task estimate. Threshold-based time estimation methods define a threshold latency value such that any latency greater than the threshold is excluded from the time-on-task estimate. For years, ad hoc thresholds were used, which ranged from five minutes to one hour [23–27]. This lack of consistency in choice of threshold has been a major issue [3,4,21]. Without a validated way of determining which latencies represented disengagement from the task, time estimates from event logs could not be trusted.
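To make the threshold approach concrete, the following is a minimal sketch, not the implementation used in any of the cited studies, assuming a hypothetical list of keystroke latencies in seconds and a five-minute threshold (one of the ad hoc values mentioned above).

```python
# Minimal sketch of threshold-based time-on-task estimation.
# `latencies_s` is a hypothetical list of keystroke latencies in seconds;
# the 300-second (five-minute) threshold is one of the ad hoc values cited above.
def threshold_time_on_task(latencies_s, threshold_s=300):
    """Sum only the latencies at or below the threshold, in seconds."""
    return sum(x for x in latencies_s if x <= threshold_s)

latencies = [2.1, 0.8, 45.0, 900.0, 3.5]   # the 900 s pause is treated as a break
print(threshold_time_on_task(latencies))    # -> 51.4 seconds of estimated work
```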
Very recently, Edwards et al. [3] and Hart et al. [4] developed a statistical model that probabilistically determines how likely a given latency is to indicate that the student was engaged (e.g., short latencies are more likely to indicate engagement than long ones). Using this model, an accurate estimate of how much time a student worked on an assignment can be made by summing the keystroke latencies that the model determines to be engaged.
During the study developing the statistical model of engagement, Edwards et al. noted that when students are asked how long an assignment took to complete, their responses seemed inflated to the instructor [3]. This work investigates the validity of that notion.
Bridging the gap
With our methodology, we start to explore the space between interval timing and time diary studies. For example, in one interval timing study using a retrospective estimation paradigm, subjects tended to overestimate when the time interval is less than 15 minutes and were found to underestimate when the interval was between 15 and 90 minutes [1]. This is in contrast to a time diary study in which time diaries (the surrogate for ground truth) showed more working hours when subjects estimated fewer than 25 hours worked in a week but showed fewer working hours when weekly estimates exceeded 25 hours [11]. A direct comparison between these two studies is not possible because of the major methodological and contextual differences.
We view our time-on-task estimation using the lens of retrospective timing processes. Technically, the experiment is one of prospective timing, since for the majority of assignments, participants know that they will be asked to estimate afterwards how long they took to complete a programming assignment. However, because of the scale and length of task, and because the task is not purposed to the time estimation study, but rather, estimating time-on-task is more of an after-the-fact add-on, we suggest that the processes involved in time estimation are more closely aligned with retrospective timing. This is supported by our results: if we make the assumption that lower-performing students, in terms of assignment score, experience higher cognitive loads, then, according to [28], the duration judgment ratio (subjective duration to objective duration) should increase in a retrospective setting. In other words, if subjects are not informed ahead of time that they will be estimating the time taken, they tend to overestimate the time when under cognitive load. However, this could also be explained by the effect of prospective/retrospective estimation paradigm becoming less pronounced as length of task increases [7].
Methods
Thirty-three students in an introductory computer programming course (CS1) at Utah State University, a mid-sized, research-intensive university in the western United States, participated in our study. The CS1 course is a semester-long (14 weeks) course that teaches students the fundamentals of writing computer programs in the Python programming language. As a general science elective, the course attracts a mix of computer science majors and non-majors as well as a mix of students with and without prior programming experience. Students were given a short, 2-minute description of the study and then an informed consent document to sign and return to the research team if they decided to participate. Students were recruited from May 8, 2023 to October 13, 2023. Data were collected during the Summer (May 8 to August 18) and Fall (August 28 to December 15) semesters of 2023. Our university’s ethics review board reviewed and approved this study (IRB #13514).
In the CS1 course, students must complete approximately one programming assignment each week. In this study, we report data on seven of the assignments. Four of the assignments have two parts, or “tasks." See Table 1 for brief descriptions of assignments, each of which targets a specific concept or concepts in computer programming (e.g., iteration, conditionals, etc). We collected data on 106 assignment submissions, but removed one submission because there was not enough keystroke data collected during the assignment; we used the remaining 105 submissions for all analyses. Students write their code using a specialized piece of software called an integrated development environment (IDE). IDEs provide a convenient environment that includes text coloring, debugging tools, code suggestions, and a user interface organized around the file structure of the code files. Students in our study were encouraged to use the PyCharm IDE.
Table 1. Descriptions of the programs students wrote for their assignments.
| Assignment | Task | Description |
|---|---|---|
| A1 | 1 | Type a year into the command prompt and the program reports if the year is a leap year or not |
| A1 | 2 | The program prompts the user for information about animals seen then generates a summary with indenting for legibility. |
| A2 | 1 | The program asks simple addition problems for the user to solve and animates a score for answers with more points awarded or lost based on how quickly the answer was given. |
| A2 | 2 | The program prints a properly formatted number pyramid when given the number of rows. |
| A3 | 1 | Draw an 8x8 black and white grid at the user's specified width and height. |
| A3 | 2 | Draw random numbers of rectangles or circles evenly spaced and rotated in circular patterns at random positions. |
| A4 | 1 | A game where the player must move around to collect treasure and avoid a moving enemy. |
| A4 | 2 | Draw a yellow smiley face that can be changed to a frown or a different color. |
| A5 | 1 | A Duck Hunt-like game where the player must catch butterflies and kill wasps. |
| A6 | 1 | A Snake-like game with two players. A player loses when they run into a wall or the path left by themselves or the other player. |
| A7 | 1 | Given a program that makes a deck of cards and deals them correctly, find and fix errors in the given sorting and searching code. |
Measurement of actual time duration
The primary data used in this study is keystroke data. Study participants were asked to install a plugin to their PyCharm IDE called ShowYourWork [29,30]. This plugin logs keystrokes, pastes, and other events to a file. Each entry in the file contains a timestamp, the action type (e.g., keystroke), any text that was inserted or deleted as a result of the action, and other data. This file was updated any time a student worked on their assignment and was included in their final submission.
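The exact on-disk schema of the ShowYourWork log is not reproduced here; the sketch below assumes a hypothetical JSON-lines layout with a timestamp and an action type per entry, and shows how keystroke latencies could be derived from consecutive timestamps.

```python
import json
from datetime import datetime

# Hypothetical log excerpt; the real ShowYourWork schema may differ.
raw_log = """\
{"timestamp": "2023-09-14T19:02:11.250", "action": "keystroke", "insert": "p"}
{"timestamp": "2023-09-14T19:02:11.930", "action": "keystroke", "insert": "r"}
{"timestamp": "2023-09-14T19:02:45.100", "action": "keystroke", "insert": "i"}
"""

events = [json.loads(line) for line in raw_log.splitlines()]
times = [datetime.fromisoformat(e["timestamp"])
         for e in events if e["action"] == "keystroke"]

# A keystroke latency is the elapsed time between two consecutive keystrokes.
latencies_s = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(latencies_s)  # -> [0.68, 33.17]
```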
The key to this study is our ability to objectively measure how long students took to complete their programming assignments without requiring them to work in a laboratory setting. Students could work on their assignments on campus or at home at any time that they chose. They could take breaks and even work on the assignment across multiple days (most did, in fact). Our time-on-task measurement uses keystroke log data obtained using the ShowYourWork plugin. As discussed in the Background section, recent work in the computing education literature has established an empirically based statistical model that gives probabilities that students are on task at any given moment based on the amount of time elapsed since their last keystroke [3,4]. Using these probabilities, we can estimate, in the aggregate, the amount of time spent on the assignment. This is done with the student in a natural setting. Furthermore, the Hawthorne effect [31] is minimized by the fact that roughly half of the students forgot that their keystrokes were being logged [30].
To estimate how much time a student spent working on an assignment, we used the equation from the work of Hart et al. [4], with the parameter values fitted in that work and x equal to the keystroke latency in minutes. This equation gives the probability that a student was working on their assignment during the time between two keystrokes. With this equation, each submission was probabilistically sampled many times to create a range of possible assignment durations. The median sampled assignment duration was then used as the true assignment duration.
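A minimal sketch of this sampling procedure follows, assuming keystroke latencies in minutes; the engagement curve below is only a placeholder standing in for the fitted model of Hart et al. [4].

```python
import random

def estimate_duration_hours(latencies_min, p_engaged, n_samples=1000, seed=0):
    """Probabilistically sample assignment durations and return the median.

    For each sample, every latency is independently counted as engaged
    (included) or disengaged (excluded) with probability p_engaged(latency).
    """
    rng = random.Random(seed)
    samples = sorted(
        sum(x for x in latencies_min if rng.random() < p_engaged(x)) / 60.0
        for _ in range(n_samples)
    )
    return samples[n_samples // 2]

# Placeholder engagement curve: short latencies are almost surely engaged,
# long ones almost surely not. The actual fitted curve and parameters are
# those reported by Hart et al. [4]; this stand-in only illustrates sampling.
def p_engaged_placeholder(latency_min):
    return 1.0 if latency_min <= 0.75 else max(0.0, 1.0 - latency_min / 120.0)
```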
Time estimation
The course had a post-assignment reflection questionnaire for each assignment that asked students questions about the assignment for instructor feedback but also included a question for our research: “How many hours did you spend working on this assignment? (example: 2.5, 1.25, 4)”. Students were required to enter an estimate to receive credit on the questionnaire.
Duration judgment ratio
Our main analysis measure is the so-called duration judgment ratio, or simply ratio, which is a measure of the quality of a duration estimate [28]. The duration judgment ratio is defined as ED/TD, where ED is the estimated duration and TD is the actual task duration. The range of the ratio is (0, ∞); ratios greater than 1 indicate overestimation and ratios less than 1 indicate underestimation. For example, a student who reports 3 hours for an assignment measured at 2 hours has a ratio of 1.5. Estimated durations come from the post-assignment surveys and actual task durations are derived from the keystroke data.
Pre-survey
Additionally, participants were given a survey at the beginning of the course:
I am capable of remembering how much time I worked on an assignment. [Strongly Disagree, Slightly Disagree, Neither Agree nor Disagree, Slightly Agree, Strongly Agree]
A 3-hour assignment is: [Significantly Shorter Than Most, Slightly Shorter Than Most, About Average, Slightly Longer Than Most, Significantly Longer Than Most]
Results and discussion
Estimated time
Our first research question RQ1 is: How accurately do students estimate time spent on computer programming assignments? In our data, students over-reported their assignment durations 78% of the time with an average ratio of 2.2, and a median ratio of 1.45. This means that students reported an average of 2.2 hours worked for every hour actually worked. As seen in Fig 1, outliers, such as the student who reported 8 hours for each hour actually worked, skewed the distribution. The median of measured assignment time-on-task in our data is 5.1 hours, with a median error of 3.2 hours, i.e., the median reported assignment time was 8.3 hours for a 5.1-hour assignment.
Fig 1. Distribution of duration judgment ratios between estimated duration and actual duration for each assignment submission.

Most estimates resulted in a ratio greater than one, meaning the student overestimated how long they took on the assignment.
The size of our subjects’ estimation error is significantly larger than the 5% to 10% found by Robinson et al. [11] in a time diary study. This may be because students can work on assignments with extreme flexibility, and greater errors and exaggerations have been found when people have nonstandard or irregular schedules [32]. Homework may also be more stressful to students because of mismanaged time, difficult deadlines, and inexperience, leading to higher error. Estimates in industry may be more accurate because there may be a known expected answer, such as a typical 40-hour workweek, whereas an assignment has no expected duration. It is also likely that students did not account for time lost to short-duration breaks such as phone notifications. Time diary study ground-truth durations likely include short breaks [15], while our actual duration measures do not count breaks. This leads to what we suggest as the most likely cause of the difference in estimate error between our study and time diary studies: we use an objective measure of actual time duration while time diaries rely on self-report. The implication is that the time estimation error in time diary studies may be greatly underestimated.
Task performance vs. Estimate error
Our second research question RQ2 is: Does a correlation exist between performance and the accuracy of self-reported task duration? We analyzed two measures of performance. The first measure is assignment score. The time duration ratio had a moderate negative correlation with assignment score (Spearman’s rank correlation r = −0.40, p < 0.001; Table 2). See Fig 2. In other words, when students received a higher grade, their self-reported duration was more accurate. The average score for an assignment where a student underestimated their time was 94%, and the average score for overestimated submissions was 81%. As a second measure of performance, we considered total time-on-task: we correlated the total amount of time a student took to complete an assignment with the time duration ratio. We found a similar moderate negative correlation (Spearman’s rank correlation r = −0.46, p = 1.8e−6; Table 2). See Fig 3. If we assume that students getting lower grades or those taking longer experience more cognitive load, then these findings are consistent with previous work showing that perceived duration depends on cognitive load [28,33]. Difficulty has also been shown to disrupt time estimation [34] and frustration also inflates time estimates [10].
Table 2. Correlation tests.
| Variable | Variable | Test | Coefficient | p-Value |
|---|---|---|---|---|
| Error Ratio | Assignment Score | Spearman | –0.4 | <0.001 |
| Error Ratio | Time-On-Task | Spearman | –0.46 | <0.001 |
| Error Ratio | Break Ratio | Spearman | 0.15 | =0.13 |
| Error Ratio | Estimation Confidence | Kendall | 0.01 | =0.93 |
| Error Ratio | 3-Hour Perception | Kendall | 0.03 | =0.85 |
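For reproducibility, correlation tests like those summarized in Table 2 can be computed with standard routines; the sketch below uses toy stand-in arrays rather than our study data.

```python
from scipy.stats import spearmanr, kendalltau

# Toy stand-ins for the per-submission arrays behind Table 2; in the study
# each list holds one value per assignment submission.
ratios = [1.4, 2.0, 0.9, 1.8, 1.1, 3.0]        # duration judgment ratios
scores = [92, 78, 99, 80, 95, 70]              # assignment scores (%)
time_on_task = [4.5, 7.0, 3.2, 6.1, 4.0, 9.5]  # measured hours

rho, p = spearmanr(ratios, scores)        # Table 2 reports rho = -0.40, p < 0.001
print(rho, p)
rho, p = spearmanr(ratios, time_on_task)  # Table 2 reports rho = -0.46, p < 0.001
print(rho, p)
tau, p = kendalltau(ratios, scores)       # Kendall's tau is used for the ordinal survey items
print(tau, p)
```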
Fig 2. Assignment score vs. Duration judgment ratio.

Fig 3. Total measured time vs. Ratio.

Since some of the students with very small measured times may have worked outside of the PyCharm IDE that we used to capture keystrokes, we performed the test after excluding submissions with less than one hour of measured working time.
As a related measure, we looked at a possible correlation between “break ratio” and the time duration ratio. The break ratio is the percentage of keystroke latencies that are greater than 60 seconds. Based on work in interval timing suggesting that retrospective duration judgments lengthen with more remembered context changes [33], we hypothesized that the correlation would be positive, since a higher break ratio implies more breaks and thus more context changes. While the test suggests a possible weak correlation, the finding is not statistically significant (Spearman’s rank correlation r = 0.15, p = 0.13; Table 2). See Fig 4.
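A minimal sketch of the break ratio computation, assuming keystroke latencies in seconds:

```python
def break_ratio(latencies_s, break_threshold_s=60):
    """Fraction of keystroke latencies longer than the 60-second threshold."""
    long_pauses = sum(1 for x in latencies_s if x > break_threshold_s)
    return long_pauses / len(latencies_s)

print(break_ratio([2.0, 0.5, 75.0, 30.0, 600.0]))  # -> 0.4 (2 of 5 latencies)
```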
Fig 4. Breaks vs. Ratio.

Analysis of error in actual durations
An issue with estimating total assignment duration from keystroke data is that individuals may act very differently from each other, but we use only one model of engagement for all students [4]. While the model is robust for a group of students, it is not validated at an individual level. To account for error due to individual differences we made two confidence intervals for each assignment submission. The first interval we built was the 100% range, meaning the absolute minimum and maximum duration supported by the data. The minimum time an assignment could take, based on Hart et al.’s [4] model, was the sum of the latencies of 45 seconds or less. Similarly, the maximum time an assignment could take was the sum of the latencies of 2 hours or less. This yielded an average interval width of 6.67 hours. Surprisingly, we found that about half (49%) of self-reported assignment durations fell outside this 100% confidence interval, giving strong supporting evidence that students are, in fact, overestimating time-on-task.
The second confidence interval we made was the 90% interval. We chose this range because it is still broad enough to allow for individual differences in behavior, but does not include the extremely unlikely cases that the 100% interval does. It is impractical to find the true 90% confidence interval because there are on average 115 latencies that must be probabilistically classified as engaged or disengaged, and thus 2^115 possible durations for each assignment. To estimate the 90% confidence interval we probabilistically sampled each assignment’s duration 1,000 times, then found the 90% interval of those 1,000 samples. We chose a sample size of 1,000 by testing sample sizes from 100 to 2,000 and investigating the variance between samples; a sample size of 1,000 provides reasonably small variance and can be computed quickly. See Fig 5. We found an average range of 1.05 hours for the 90% confidence interval and considered an estimate accurate if it fell within this interval. With this smaller range of uncertainty, we found that only 9.5% of assignment submissions had reported times within the 90% confidence interval. See Fig 6. In our data, we find that self-reported assignment durations are not only inaccurate but inflated beyond reason, with only 9.5% considered accurate.
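A minimal sketch of both intervals follows, reusing the sampling idea from the Methods section; the 45-second and 2-hour bounds come from the model of Hart et al. [4] as described above, and the engagement function is again only a placeholder.

```python
import random

def interval_100(latencies_min):
    """Absolute bounds on assignment duration (hours): at minimum only
    latencies of 45 s (0.75 min) or less count as work; at maximum every
    latency up to 2 hours (120 min) counts."""
    lo = sum(x for x in latencies_min if x <= 0.75) / 60.0
    hi = sum(x for x in latencies_min if x <= 120.0) / 60.0
    return lo, hi

def interval_90(latencies_min, p_engaged, n_samples=1000, seed=0):
    """Approximate 90% interval from n_samples probabilistically sampled durations."""
    rng = random.Random(seed)
    samples = sorted(
        sum(x for x in latencies_min if rng.random() < p_engaged(x)) / 60.0
        for _ in range(n_samples)
    )
    return samples[int(0.05 * n_samples)], samples[int(0.95 * n_samples)]
```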
Fig 5. Variance vs. Sample size for probabilistically sampled assignment durations.

We ran 50 iterations of each sample size and computed the variance across iterations. For example, in sample size 1000 we estimated each assignment duration 1000 times, then calculated the percentiles. We repeated this 50 times and computed the average variance in all assignments.
Fig 6. Measured vs. Reported assignment duration with 90% confidence interval for assignment submission.

The center line indicates where estimated duration equals actual duration.
Pre-survey
In response to the question “I am capable of remembering how much time I worked on an assignment,” 79% of students selected either “Slightly Agree” or “Strongly Agree,” indicating confidence in their ability to track time (Fig 7). Students think they can keep track of time spent working, but rarely can. This implies that the remembered time spent working on assignments, and possibly the recollected effort and lost free time, is greater than reality. The perception of doing homework could be worse than the reality of doing homework. Understanding this could help motivate students to get their work done because it is not as bad as they think it is. Students’ confidence in their ability to track time was not correlated with the estimation ratio (Kendall rank correlation τ = 0.01, p = 0.93; Table 2).
Fig 7. Average ratio and survey responses for the survey question “I am capable of remembering how much time I worked on an assignment.”.

Where 1 = Strongly Disagree, ..., 5 = Strongly Agree.
In response to the question asking about the duration of a 3-hour assignment, 48% of students indicated that a 3-hour assignment was slightly or significantly longer than most, and 48% indicated that a 3-hour assignment was about average (Fig 8). The median assignment time in our data was 5.1 hours, 70% longer than the 3-hour assignment students were asked about. It is likely that students knew their assignments were taking longer than what they considered normal, causing some overestimation because the assignments felt longer than most. Students’ perception of the duration of a 3-hour assignment was not correlated with the estimation ratio (Kendall rank correlation τ = 0.03, p = 0.85; Table 2). Future work could investigate higher-level courses (sophomore, junior, and senior); estimates may improve because students are accustomed to the college environment and have learned better time management skills. Overestimating assignment durations may be a good thing for students because they will plan more time to get their work done, but it is also possible that they do not plan enough time due to the planning fallacy [35].
Fig 8. Average estimation ratio and survey responses for the survey question “A 3-hour assignment is: 1 = Significantly Shorter Than Most, ..., 5 = Significantly Longer Than Most.”.

Threats to validity
A possible threat to validity is that the model of engagement does not account for work done before a student’s first keystroke. Additionally, students make their time duration estimate in a post-assignment survey, which may be completed in a setting that itself affects their estimate [36].
A shortcoming of any self-reported data is “that self-report and objective measures provide information on distinct, different aspects of work performance” [14] (e.g., counting the commute to work as work time). In our study, students may count the time spent downloading starter files, setting up their IDE, and submitting an assignment, whereas we consider such tasks to be unrelated to the learning objectives of the assignment and therefore do not measure them as time-on-task. (A member of the research team timed themselves doing these steps at under 3 minutes.) This discrepancy could explain some of the error between self-reported and measured assignment durations.
Students also exhibited a rounding bias, rounding estimates to whole or half hour increments, while our measured time-on-task has millisecond resolution. We controlled for this with the confidence interval analysis, which gives students a generous range within which their estimates are considered accurate.
This work was done in a single university’s CS1 course. The results of this study may not hold in other contexts.
Conclusions
This paper reports results of a study in which CS1 students estimated how much time they spent working on their assignments. Students overestimate time-on-task in CS1 programming tasks, even across multiple assignments and the varied contexts in which students make the duration estimation, all of which should factor into the complex interactions affecting duration judgments [36]. We found that students think they can accurately remember how much time they spend working on their assignments, yet students overestimated time-on-task 78% of the time, with a median overestimate of 45%, which is much higher than the overestimates found in working time studies. We also find that when a student scores higher on an assignment or takes less time to complete the assignment, their self-report is more likely to be accurate.
This work is a first step in bridging the gap between second- and minute-level interval time estimation studies and hour- and day-level time diary studies. Our ability to objectively measure ground truth durations while subjects perform a task in a natural setting at times of their choosing further emphasizes the validity of our results.
Our study also enables research into whether additional theories of time awareness apply at larger time scales. For example, it has been suggested that the ability of a human to make retrospective duration judgments isn’t based just on how well they are able to recall individual events, but rather on a heuristic based on how easily they can retrieve the events of the time period [37]. Our methodology enables a next step toward validating the heuristic hypothesis and describing the heuristic itself. Some work suggests that time estimate error may behave similarly to the senses such as vision and hearing [38,39]. Those studies were conducted in the second and minute range. Using our high temporal resolution ground truthing could uncover whether this behavior scales to tasks lasting hours.
Estimates are influenced by expected completion time [11]. For future work, a study varying expected completion time would be important. This could be done by splitting student participants into three conditions and telling one group that the assignment would be expected to be completed in one time duration, telling another group another duration, and not giving the third group any time estimate. In other future work, performing similar studies in other contexts is needed. This would require a context in which high temporal resolution events (e.g., keystrokes) can be collected in a natural setting. Candidate contexts might be tasks requiring typing or tasks for which a wearable (e.g., watch) could be trained to detect on- or off-task activity.
Data Availability
Data cannot be shared publicly because it is identifiable. De-identification of programming process data is difficult and anonymization is nearly impossible to prove. This is because the keystroke logs contain a record of every edit a student made to a document, which includes their name. Students could add any type of identifiable information to their homework and later remove it, but the log will still contain that information. We can remove such information from known locations, but students may have typed their name or other identifying information elsewhere, and even careful inspection can miss potentially identifiable information. USU’s institutional review board can be contacted at irb@usu.edu or 435-797-1821.
Funding Statement
The author(s) received no specific funding for this work.
References
- 1. Balcı F, Ünübol H, Grondin S, Sayar GH, van Wassenhove V, Wittmann M. Dynamics of retrospective timing: a big data approach. Psychon Bull Rev. 2023;30(5):1840–7. doi: 10.3758/s13423-023-02277-3
- 2. Buhusi CV, Meck WH. What makes us tick? Functional and neural mechanisms of interval timing. Nat Rev Neurosci. 2005;6(10):755–65. doi: 10.1038/nrn1764
- 3. Edwards J, Hart K, Warren C. A practical model of student engagement while programming. In: Proceedings of the 53rd ACM Technical Symposium on Computer Science Education, Volume 1. 2022. p. 558–64.
- 4. Hart K, Warren CM, Edwards J. Accurate estimation of time-on-task while programming. In: Proceedings of the 54th ACM Technical Symposium on Computer Science Education. 2023. p. 708–14.
- 5. Hicks RE, Miller GW, Kinsbourne M. Prospective and retrospective judgments of time as a function of amount of information processed. Am J Psychol. 1976;89(4):719–30.
- 6. Tobin S, Bisson N, Grondin S. An ecological approach to prospective and retrospective timing of long durations: a study involving gamers. PLoS One. 2010;5(2):e9271. doi: 10.1371/journal.pone.0009271
- 7. Walker JA, Aswad M, Lacroix G. The impact of cognitive load on prospective and retrospective time estimates at long durations: an investigation using a visual and memory search paradigm. Mem Cognit. 2022;50(4):837–51. doi: 10.3758/s13421-021-01241-7
- 8. Simen P, Matell M. Why does time seem to fly when we’re having fun? Science. 2016;354(6317):1231–2. doi: 10.1126/science.aal4021
- 9. Wood RTA, Griffiths MD, Parke A. Experiences of time loss among videogame players: an empirical study. Cyberpsychol Behav. 2007;10(1):38–44. doi: 10.1089/cpb.2006.9994
- 10. Soares S, Atallah BV, Paton JJ. Midbrain dopamine neurons control judgment of time. Science. 2016;354(6317):1273–7. doi: 10.1126/science.aah5234
- 11. Robinson JP, Martin S, Glorieux I, Minnens J. The overestimated workweek revisited. Monthly Lab Rev. 2011;134:43.
- 12. Allen HM Jr, Bunn WB 3rd. Validating self-reported measures of productivity at work: a case for their credibility in a heavy manufacturing setting. J Occup Environ Med. 2003;45(9):926–40. doi: 10.1097/01.jom.0000090467.58809.5c
- 13. Tangen S. An overview of frequently used performance measures. Work Study. 2003;52(7):347–54.
- 14. Pransky G, Finkelstein S, Berndt E, Kyle M, Mackell J, Tortorice D. Objective and self-report work performance measures: a comparative analysis. International Journal of Productivity and Performance Management. 2006;55(5):390–9.
- 15. Robinson JP, Bostrom A. The overestimated workweek: what time diary measures suggest. Monthly Lab Rev. 1994;117:11.
- 16. Sundstrom WA. The overworked American or the overestimated workweek? Trend and bias in recent estimates of weekly work hours in the United States. 1999.
- 17. Carlos VS, Rodrigues RG. Development and validation of a self-reported measure of job performance. Social Indicators Research. 2016;126:279–307.
- 18. Marini MM, Shelton BA. Measuring household work: recent experience in the United States. Social Science Research. 1993;22(4):361–82.
- 19. Press JE, Townsley E. Wives’ and husbands’ housework reporting: gender, class, and social desirability. Gender & Society. 1998;12(2):188–218.
- 20. Frazis H. Is the workweek really overestimated? Monthly Lab Rev. 2014;137:1.
- 21. Kovanovic V, Gašević D, Dawson S, Joksimovic S, Baker R. Does time-on-task estimation matter? Implications on validity of learning analytics findings. Journal of Learning Analytics. 2015;2(3):81–110.
- 22. Leinonen J, Castro FEV, Hellas A. Time-on-task metrics for predicting performance. In: Proceedings of the 53rd ACM Technical Symposium on Computer Science Education; 2022. p. 871–7.
- 23. Murphy C, Kaiser G, Loveland K, Hasan S. Retina: helping students and instructors based on observed programming activities. In: Proceedings of the 40th ACM Technical Symposium on Computer Science Education; 2009. p. 178–82.
- 24. Price TW, Brown NC, Lipovac D, Barnes T, Kölling M. Evaluation of a frame-based programming editor. In: Proceedings of the 2016 ACM Conference on International Computing Education Research; 2016. p. 33–42.
- 25. Kazerouni AM, Edwards SH, Shaffer CA. Quantifying incremental development practices and their relationship to procrastination. In: Proceedings of the 2017 ACM Conference on International Computing Education Research; 2017. p. 191–9.
- 26. Leinonen J, Leppänen L, Ihantola P, Hellas A. Comparison of time metrics in programming. In: Proceedings of the 2017 ACM Conference on International Computing Education Research; 2017. p. 200–8.
- 27. Leinonen A, Nygren H, Pirttinen N, Hellas A, Leinonen J. Exploring the applicability of simple syntax writing practice for learning programming. In: Proceedings of the 50th ACM Technical Symposium on Computer Science Education; 2019. p. 84–90.
- 28. Block RA, Hancock PA, Zakay D. How cognitive load affects duration judgments: a meta-analytic review. Acta Psychol (Amst). 2010;134(3):330–43. doi: 10.1016/j.actpsy.2010.03.006
- 29. Edwards J, Hart K, Shrestha R. Review of CSEDM data and introduction of two public CS1 keystroke datasets. Journal of Educational Data Mining. 2023;15(1):1–31.
- 30. Hart K, Mano C, Edwards J. Plagiarism deterrence in CS1 through keystroke data. In: Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1; 2023. p. 493–9.
- 31. Landsberger HA. Hawthorne Revisited: Management and the Worker, Its Critics, and Developments in Human Relations in Industry. ERIC; 1958.
- 32. Americans A. Measuring time at work: are self-reports accurate? Monthly Labor Review. 1998;43.
- 33. Block RA, Grondin S, Zakay D. Prospective and retrospective timing processes: theories, methods, and findings. In: Timing and Time Perception: Procedures, Measures, & Applications. Brill; 2018. p. 32–51.
- 34. Brown SW. Attentional resources in timing: interference effects in concurrent temporal and nontemporal working memory tasks. Percept Psychophys. 1997;59(7):1118–40. doi: 10.3758/bf03205526
- 35. Kahneman D, Tversky A. Intuitive prediction: biases and corrective procedures. Decision Research. 1977.
- 36. Block RA. Experiencing and remembering time: affordances, context, and cognition. Elsevier; 1989. p. 333–63.
- 37. Block RA, Zakay D. Retrospective and prospective timing: memory, attention, and consciousness. In: Time and Memory: Issues in Philosophy and Psychology. 2001. p. 59–76.
- 38. Gibbon J, Malapani C, Dale CL, Gallistel C. Toward a neurobiology of temporal cognition: advances and challenges. Curr Opin Neurobiol. 1997;7(2):170–84. doi: 10.1016/s0959-4388(97)80005-0
- 39. Coull JT, Cheng R-K, Meck WH. Neuroanatomical and neurochemical substrates of timing. Neuropsychopharmacology. 2011;36(1):3–25. doi: 10.1038/npp.2010.113
