PLOS One. 2021 Sep 17;16(9):e0257500. doi: 10.1371/journal.pone.0257500

Time-of-day changes in physician clinical decision making: A retrospective study

Peter Trinh 1,*, Donald R Hoover 2, Frank A Sonnenberg 3
Editor: Paola Iannello 4
PMCID: PMC8448311  PMID: 34534247

Abstract

Background

Time of day has been associated with variations in certain clinical practices such as cancer screening rates. In this study, we assessed how more general process measures of physician activity, particularly rates of diagnostic test ordering and diagnostic assessments, might be affected by time of day.

Methods

We conducted a retrospective chart review of 3,342 appointments by 20 attending physicians at five outpatient clinics, matching appointments by physician and comparing the average diagnostic tests ordered and average diagnoses assessed per appointment in the first hour of the day versus the last hour of the day. Statistical analyses used sign tests, two-sample t-tests, Wilcoxon tests, Kruskal Wallis tests, and multivariate linear regression.

Results

Examining physicians individually, four and six physicians, respectively, had statistically significant first- versus last-hour differences in the number of diagnostic tests ordered and number of diagnoses assessed per patient visit (p ≤ 0.04). As a group, 16 of 20 physicians ordered more tests on average in the first versus last hour (p = 0.012 for equal chance to order more in each time period). Substantial intra-clinic heterogeneity was found in both outcomes for four of five clinics (p ≤ 0.01).

Conclusions

There is some statistical evidence on an individual and group level to support the presence of time-of-day effects on the number of diagnostic tests ordered per patient visit. These findings suggest that time of day may be a factor influencing fundamental physician behavior and processes. Notably, many physicians exhibited significant variation in the primary outcomes compared to same-specialty peers. Additional work is necessary to clarify temporal and inter-physician variation in the outcomes of interest.

Introduction

Decision-making is fundamental to patient care, and patients and fellow healthcare professionals expect physicians to make clinical decisions in a consistent and deliberative evidence-based manner. However, a growing body of evidence suggests that physicians are not as consistent as expected and may be susceptible to decision fatigue, which is the depletion of self-control and the reduced ability to make decisions and regulate behavior as a result of frequent and recurrent acts of decision making [1, 2].

The study of decision fatigue is derived from psychology studies assessing the nature of self-control. A prominent model in the field is the Strength Model of Self-Control, which posits that self-control originates from an unknown internal resource that functions similarly to a muscle in that it fatigues over time, particularly due to sequential tasks that require self-control [3]. When this internal resource is exhausted or fatigued, one becomes ego-depleted, which is a “state of diminished [mental] resources following exertion of self-control” [4]. In this state, one is more likely to perform worse on subsequent tasks requiring self-control, a phenomenon known as the sequential task paradigm. Another model, the mechanistic Process Model of Ego Depletion, posits that ego depletion is less the result of the exhaustion of some internal resource and more the product of subconscious shifts in motivation and attention over time that detract from one’s ability to regulate impulses and exercise self-control [5]. While the two models each have their merits, decreased cognitive performance over time is a central feature of both.

Decreased cognitive performance over time is particularly pertinent in the context of decision making and decision fatigue. Frequent and recurrent decisions over time have been shown to impair the ability to exercise self-control and make appropriate decisions in ensuing tasks. For instance, in a randomized controlled trial in which the degree of pain tolerance functioned as a proxy for self-control, Vohs and colleagues found that shoppers who had to make many shopping decisions at a grocery store subsequently demonstrated decreased pain tolerance compared to shoppers who only thought about the grocery choices without making decisions [2]. A well-known 2011 study by Danziger et al. examined 1000+ parole decisions and found that the percentage of favorable decisions for defendants gradually decreased from ~65% to near zero as decision making sessions progressed over the course of a day, and this pattern reset and repeated after meal breaks [6]. Both studies arguably demonstrate a temporal effect with decision making and decision fatigue.

In medicine, several studies have shown temporal variations in clinical outcomes that are potentially explained by decision fatigue. For example, in one study by Persson et al., the probability of orthopedic surgeons deciding to operate on a patient was strongly associated with a patient’s appointment time, with probabilities steadily declining throughout the day [7]. Additionally, rates of influenza vaccination and clinician ordering of cancer screening tests have been shown to significantly decline over the course of a day [8, 9], and similar patterns were observed in studies examining primary care providers’ antibiotic and opioid prescribing practices as well as hand-washing compliance in hospitals [10–12]. These findings suggest that physicians’ ability to make rational, evidence-based patient care decisions may suffer as a day goes on, and patients seen later in a clinic day may experience suboptimal care compared to those seen at the start of the day. Such temporal variations in clinical outcomes have significant implications for healthcare quality.

To our knowledge, studies examining temporal variations and the potential influence of decision fatigue on clinical decision making have been limited to assessing specific clinical decisions. Therefore, we aimed to investigate variations in more general process measures of clinical activity that could potentially reflect fundamental physician behavior on a more generalizable dimension and spur greater thought and action to mitigate variation in healthcare quality.

We examined variations in two general process measures: the number of diagnostic tests ordered and the number of diagnoses assessed per patient encounter. We are not aware of any prior studies that have utilized these outcomes in the assessment of temporal variation and decision fatigue. Both were chosen because unlike outcomes in prior studies, these are generalizable to most physicians; almost all clinicians must go through the decision-making process of ordering tests and assessing diagnoses in an electronic medical record (EMR) system in order to properly care for patients. Also, given that those experiencing decision fatigue tend to exhibit avoidant behaviors like procrastination, deferment, or complete avoidance of decisions during subsequent decision making [1, 2, 7, 13], temporal variations in the number of these decisions could be a suitable proxy for decision fatigue.

Moreover, these process measures can be used as markers of healthcare quality. Diagnostic test ordering has important implications for the general practice of medicine, healthcare costs, and clinical outcomes. Estimates of the rates of unnecessary tests range from 10% to 50% of all orders [14, 15], and such inappropriate testing can have significant downstream effects on patients as a result of follow-up tests, prolonged hospital stays, patient dissatisfaction, and unnecessary referrals or procedures [16, 17]. Conversely, a landmark study showed that patients seen by their physicians receive only approximately 55% of recommended care, including diagnostic tests, for preventive, acute, and chronic conditions [18]. In considering the number of diagnoses assessed per patient appointment, research has suggested that quality of care typically rises as a patient’s number of medical conditions rises [19]. Assuming that the number of diagnoses assessed by a physician rises with the number of medical conditions a patient has, the number of diagnoses may be an indicator of healthcare quality, possibly as a reflection of the thoroughness of a clinical encounter. How thoroughly clinicians evaluate their patients may vary due to time of day and decision fatigue.

Based on the behavioral theories of ego depletion and evidence both that repeated decision-making leads to decision fatigue and that those who are depleted are more likely to exhibit avoidant behavior like deferring decisions [2, 7], we conducted a retrospective analysis of patient records to assess two outcomes: the number of diagnostic tests ordered per patient appointment and the number of diagnoses assessed per patient appointment. We hypothesized that physicians ordered fewer diagnostic tests and assessed fewer diagnoses per patient appointment for patients seen during the last hour versus the first hour of the day.

Methods

This study was approved by the institutional review board of New Brunswick Health Sciences, Rutgers, the State University of New Jersey. A waiver of informed consent was granted due to the study’s minimal risk and infeasibility of informed consent given the study’s retrospective design.

Setting and participants

Patient data from 5,354 unique appointments by 39 attending physicians from January 1 to December 31, 2017 were gathered retrospectively from the AthenaFlow EMR (formerly known as GE Centricity) used by outpatient clinics in the Department of Medicine of Rutgers Robert Wood Johnson Medical School. The study authors captured all encounters from six different specialty clinics in order to control for inter-specialty differences in diagnostic test ordering patterns. The cardiology, endocrinology, hematology, nephrology, general internal medicine, and rheumatology clinics were chosen because they tend to differ in the number and type of diagnoses managed and tests ordered. To preserve provider confidentiality, clinic identities were masked during data extraction and analysis, and physicians were designated by numbers.

Outpatient clinic sessions generally occur in two 4-hour sessions, typically either morning (8AM-12PM) or afternoon (1PM-5PM). Some physicians see outpatients only in the morning or only in the afternoon, and some see patients for a full day. Providers and the associated patient visit data were included in the study if 1) the providers held full clinic days (both morning and afternoon), since prior studies have demonstrated significant temporal differences in decision outcomes between the start and end of days [6–8], 2) the clinicians had at least 25 patient-visits in the first hour (~8-9AM) and at least 25 patient-visits in the last hour (~4-5PM) of the above-described days, cumulative over the entire year, to ensure adequate sample size, and 3) the providers were attending-level physicians. Patient visits were excluded if patients were seen by residents and fellows and if patients were younger than 18. Using these criteria, data from 3,342 unique patient appointments by 20 attending physicians at five clinics were included; no doctors from Clinic Two were included due to lack of physicians who saw patients for a full day.

Data collection

Patient encounters were matched by physician, and to maximize potential for differential cumulative work fatigue, only data from the morning’s first hour (~8-9AM) and the afternoon’s last hour (~4-5PM) were analyzed. The start time of an encounter was defined as the EMR time stamp on the first entry in the History of Present Illness section in the EMR. Appointments were counted as first-hour appointments if the start time was within one hour of the start of the first appointment of the day. For example, if the first appointment of the day started at 8:00am, then an appointment that started at 8:59am was counted as occurring in the first hour, while an appointment starting at or after 9:00am was not. Appointments were identified as last-hour appointments if, according to the clinic schedule, they ended less than one hour prior to the end of the last appointment of the afternoon. For instance, if the last appointment ended at 5:15pm, an appointment that started at 3:50pm and ended at 4:20pm was categorized as being in the last hour, but one that started at 3:50pm and ended at or before 4:15pm was not.
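The two classification rules above can be expressed as a short sketch. This is an illustration only, not the code used in the study: the function names, the `at` helper, and the arbitrary calendar date are assumptions introduced here, and the checks simply reproduce the worked examples in the preceding paragraph.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)

def is_first_hour(start, first_start_of_day):
    """True if the visit started less than one hour after the day's first visit started."""
    return first_start_of_day <= start < first_start_of_day + ONE_HOUR

def is_last_hour(end, last_end_of_day):
    """True if the visit ended less than one hour before the day's last visit ended."""
    return last_end_of_day - ONE_HOUR < end <= last_end_of_day

def at(hhmm):
    """Hypothetical helper: a datetime on an arbitrary date from 'HH:MM' (24-hour clock)."""
    return datetime.strptime("2017-01-03 " + hhmm, "%Y-%m-%d %H:%M")

# Worked examples from the text.
print(is_first_hour(at("08:59"), at("08:00")))  # first visit at 8:00am, start at 8:59am
print(is_first_hour(at("09:00"), at("08:00")))  # start at 9:00am
print(is_last_hour(at("16:20"), at("17:15")))   # last visit ended 5:15pm, end at 4:20pm
print(is_last_hour(at("16:15"), at("17:15")))   # end at 4:15pm
```

Note the asymmetry in the definitions: the first-hour rule is keyed to an appointment's start time, whereas the last-hour rule is keyed to its end time.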

The number of diagnoses assessed during each encounter was determined by the number of ICD-10 codes associated with the Evaluation and Management (E&M) billing code for the encounter. In the AthenaFlow EMR, all diagnoses recorded in the Assessment and Plan portion of EMR notes are automatically associated with the E&M code. Additional data collected included patient age, sex, race, ethnicity, health insurance, number of diagnostic tests ordered, number of diagnoses evaluated, and patient primary and secondary diagnoses via ICD-10 codes. Patients’ active problems were collected to calculate the Charlson comorbidity index [20].

Statistical analysis

ANOVA models with physician as a covariate assessed the overall differences (first and last hour combined) in means between physicians at the same clinic sites, and two-sample t-tests assessed the differences in first- versus last-hour means for the same physician for the primary outcomes. As the distributions of numbers of diagnoses made and diagnostic tests ordered per visit were often skewed, nonparametric Kruskal Wallis and Wilcoxon tests, respectively, were also used to obtain p-values for the same-clinic site physician differences and the same-physician first versus last hour differences. To confirm, to the best of our ability, that statistically significant Wilcoxon and Kruskal Wallis p-values from unadjusted analyses remained statistically significant (p < 0.05) in case mix-adjusted analyses, multivariate linear regression was used. Linear models of the primary outcomes comparing i) all physicians at the same site, and ii) within each individual physician, first versus last hour, were fit and adjusted for patient age, sex, race, ethnicity, health insurance, and Charlson comorbidity index. No confirmatory case mix-adjusted comparisons were made for physicians with nonsignificant Wilcoxon and Kruskal Wallis p-values. For the number of diagnostic tests ordered per appointment, the multivariate models also adjusted for number of diagnoses made per appointment, and the overall comparisons of physicians within clinics were adjusted for first versus last hour.
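As an illustration of the unadjusted same-physician comparison, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction. It is not the SAS procedure used in the study and may differ from it in details such as continuity correction or exact p-values. The function name and the per-visit counts are hypothetical and do not come from the study data; both samples are assumed to be non-empty.

```python
from math import erfc, sqrt

def rank_sum_p_value(first_hour, last_hour):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(first_hour), len(last_hour)
    n = n1 + n2
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in first_hour] + [(v, 1) for v in last_hour])
    rank_sum_first = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j for tied values
        rank_sum_first += midrank * sum(1 for _, g in pooled[i:j] if g == 0)
        t = j - i
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_first - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation identical
        return 1.0
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Hypothetical tests-per-visit counts for one physician (illustration only).
first = [3, 4, 5, 6, 7, 8, 9, 10]
last = [0, 0, 1, 1, 2, 2, 0, 1]
print(rank_sum_p_value(first, last) < 0.05)
```

Because count outcomes such as tests per visit contain many tied values, the tie correction in the variance matters more here than it would for continuous measurements.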

It should be noted that as a result of i) our belief that first versus last hour differences varied from physician to physician, ii) skewness of the data and limited numbers of observations for some physicians, and iii) other violations of needed assumptions, such as homogeneity of variance between physicians, we did not find it feasible or informative to fit more complicated, repeated measures linear models that pooled all physicians together. However, we did use the sign test to compare numbers of physicians that had more diagnoses made (or more laboratory tests ordered) in the first versus last hour to see if a directional, across-physician trend existed. All p-values reported are two-sided. Analyses were conducted using SAS Version 9.4 (Cary, NC), and statistical tests considered p < 0.05 to be statistically significant.

Results

The study included 20 physicians in five practices with 3,342 total patient appointments: 2,013 in the first hour and 1,329 in the last hour. Patient characteristics, except perhaps for sex, were similar between patients seen in the first versus the last hour of the day (Table 1). About 55% of first-hour appointments and 61% of last-hour appointments were with female patients, but this difference varied by physician (Breslow-Day test of homogeneity across physicians for the proportion of female patients in the first versus last hour, p = 0.025). Fourteen physicians had a greater proportion of female patients in last-hour visits. Eighteen physicians had more patient encounters in the first hour of the day versus the last hour compared to only one clinician with the opposite trend (sign test p < 0.001) (Table 2). A single physician had the same number of appointments in each time period.

Table 1. Sample demographics of patients.

| Characteristic | First Hour, No. (%) | Last Hour, No. (%) |
| --- | --- | --- |
| Patients, No. | 2013 | 1329 |
| Age, mean (SD), years | 56.4 (16.6) | 56.3 (17.7) |
| Gender | | |
| Male | 908 (45.1) | 515 (38.8) |
| Female | 1104 (54.8) | 814 (61.2) |
| Unspecified | 1 (0.0) | 0 (0.0) |
| Ethnicity | | |
| Hispanic or Latino | 317 (15.7) | 192 (14.4) |
| Not Hispanic or Latino | 1663 (82.6) | 1120 (84.3) |
| Unspecified | 33 (1.6) | 17 (1.3) |
| Race | | |
| American Indian or Alaska Native | 4 (0.2) | 7 (0.5) |
| Asian | 176 (8.7) | 138 (10.4) |
| Black or African American | 373 (18.5) | 276 (20.8) |
| Native Hawaiian or Other Pacific Islander | 6 (0.3) | 6 (0.5) |
| White | 1102 (54.7) | 685 (51.5) |
| Unspecified | 352 (17.5) | 217 (16.3) |
| Insurance | | |
| Private | 1224 (60.8) | 770 (57.9) |
| Medicare | 735 (36.5) | 509 (38.3) |
| Medicaid | 6 (0.3) | 2 (0.2) |
| Not Recorded | 48 (2.4) | 48 (3.6) |
| Charlson Comorbidity Index | | |
| 0 | 679 (33.7) | 394 (29.6) |
| 1 | 535 (26.6) | 386 (29.0) |
| 2+ | 799 (39.7) | 549 (41.3) |

Table 2. Differences in mean number of laboratory tests ordered.

| Clinic Location | Provider | Appointments, First Hour | Appointments, Last Hour | Mean ± Std-Dev, First Hour | Mean ± Std-Dev, Last Hour | Difference | Wilcoxon P Value | P-Genmod P Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| One | 2 | 27 | 19 | 3.3 ± 2.13 | 2.95 ± 2.2 | -0.35 | 0.58 | ÷ |
| One | 6 | 33 | 21 | 0.52 ± 0.83 | 0.33 ± 0.91 | -0.19 | 0.18 | ÷ |
| One | 7 | 143 | 80 | 1.05 ± 1.38 | 0.81 ± 1.23 | -0.24 | 0.18 | ÷ |
| One | 8 | 21 | 16 | 0.38 ± 0.97 | 0.75 ± 1.48 | 0.37 | 0.55 | ÷ |
| One | 10 | 294 | 172 | 0.59 ± 1.15 | 0.53 ± 1.02 | -0.06 | 0.98 | ÷ |
| Three | 17 | 29 | 29 | 4.59 ± 4.57 | 4.48 ± 4.01 | -0.11 | 0.87 | ÷ |
| Three | 18 | 43 | 22 | 8.44 ± 2.6 | 7.95 ± 2.42 | -0.49 | 0.74 | ÷ |
| Four | 20 | 27 | 19 | 7.81 ± 6.14 | 6.96 ± 5.73 | -0.85 | 0.46 | ÷ |
| Four | 22 | 48 | 27 | 5.39 ± 2.91 | 1.63 ± 2.16 | -3.76 | <0.001* | <0.001* |
| Five | 25 | 197 | 126 | 3.07 ± 2.58 | 1.83 ± 2.46 | -1.24 | <0.001* | 0.012* |
| Five | 26 | 18 | 16 | 1.67 ± 2.2 | 1.81 ± 2.43 | 0.14 | 0.74 | ÷ |
| Five | 27 | 21 | 31 | 1.48 ± 2.04 | 1.29 ± 2.05 | -0.17 | 0.74 | ÷ |
| Five | 29 | 32 | 19 | 3.09 ± 3.33 | 2.32 ± 2.5 | -0.77 | 0.56 | ÷ |
| Five | 30 | 290 | 215 | 2.91 ± 3.14 | 1.84 ± 2.31 | -1.07 | 0.0002* | 0.002* |
| Five | 31 | 61 | 29 | 2.28 ± 2.41 | 2.14 ± 2.66 | -0.14 | 0.58 | ÷ |
| Five | 32 | 64 | 36 | 2.52 ± 2.29 | 1.08 ± 1.73 | -1.44 | <0.001* | 0.24 |
| Six | 34 | 188 | 130 | 8.15 ± 8.18 | 8.4 ± 9.22 | 0.25 | 0.68 | ÷ |
| Six | 35 | 213 | 132 | 5.45 ± 4.05 | 5.2 ± 3.61 | -0.25 | 0.9 | ÷ |
| Six | 36 | 109 | 83 | 11.88 ± 9.25 | 11.82 ± 8.79 | -0.06 | 0.98 | ÷ |
| Six | 37 | 141 | 93 | 14.96 ± 9.54 | 21.81 ± 10.76 | 6.85 | <0.001* | <0.001* |

÷ Confirmatory adjusted p-values not computed due to low sample size and/or non-statistically significant unadjusted p-values.

* Statistically significant p-values.

Primary outcomes for individual physicians

Tables 2 and 3 display results of the primary outcomes when comparing each individual physician to themselves at the beginning versus the end of the day. Medians for both primary outcomes for each individual clinician are provided in S1 and S2 Tables.

Table 3. Differences in mean number of diagnoses assessed.

| Clinic Location | Provider | Appointments, First Hour | Appointments, Last Hour | Mean ± Std-Dev, First Hour | Mean ± Std-Dev, Last Hour | Difference | Wilcoxon P Value | P-Genmod P Value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| One | 2 | 27 | 19 | 6.37 ± 1.9 | 6.11 ± 2.11 | -0.26 | 0.78 | ÷ |
| One | 6 | 33 | 21 | 2.12 ± 0.78 | 1.71 ± 0.84 | -0.41 | 0.06 | ÷ |
| One | 7 | 143 | 80 | 3.12 ± 1.32 | 2.66 ± 1.19 | -0.46 | 0.006* | 0.01* |
| One | 8 | 21 | 16 | 1.29 ± 0.46 | 1.38 ± 0.72 | 0.09 | 1 | ÷ |
| One | 10 | 294 | 172 | 1.94 ± 0.94 | 1.92 ± 0.87 | -0.02 | 0.19 | ÷ |
| Three | 17 | 29 | 29 | 3.76 ± 0.51 | 3.9 ± 0.41 | 0.14 | 0.33 | ÷ |
| Three | 18 | 43 | 22 | 4.95 ± 1.56 | 5.41 ± 1.76 | 0.46 | 0.27 | ÷ |
| Four | 20 | 27 | 19 | 2.73 ± 1.3 | 3.11 ± 1.12 | 0.38 | 0.13 | ÷ |
| Four | 22 | 48 | 27 | 2.85 ± 1.44 | 2.73 ± 1.44 | -0.12 | 0.69 | ÷ |
| Five | 25 | 197 | 126 | 3.93 ± 1.65 | 2.95 ± 1.65 | -0.98 | <0.001* | <0.001* |
| Five | 26 | 18 | 16 | 3.61 ± 2.5 | 2.12 ± 1.89 | -1.49 | 0.05* | 0.02* |
| Five | 27 | 21 | 31 | 2.67 ± 1.11 | 2.39 ± 0.95 | -0.28 | 0.48 | ÷ |
| Five | 29 | 32 | 19 | 3.19 ± 1.4 | 3.74 ± 1.88 | 0.55 | 0.45 | ÷ |
| Five | 30 | 290 | 215 | 2.83 ± 1.39 | 2.28 ± 1.34 | -0.55 | <0.001* | <0.001* |
| Five | 31 | 61 | 29 | 3.16 ± 1.69 | 3.17 ± 1.83 | 0.01 | 0.92 | ÷ |
| Five | 32 | 64 | 36 | 3.98 ± 1.54 | 2.78 ± 1.57 | -1.2 | <0.001* | 0.007* |
| Six | 34 | 188 | 130 | 2.89 ± 1.67 | 2.78 ± 1.02 | -0.11 | 0.34 | ÷ |
| Six | 35 | 213 | 132 | 3.55 ± 1.21 | 3.86 ± 1.22 | 0.31 | 0.004* | 0.04* |
| Six | 36 | 109 | 83 | 2.94 ± 1.09 | 2.87 ± 1.12 | -0.07 | 0.5 | ÷ |
| Six | 37 | 141 | 93 | 4.59 ± 1.69 | 4.68 ± 1.91 | 0.09 | 0.52 | ÷ |

÷ Confirmatory adjusted p-values not computed due to low sample size and/or non-statistically significant unadjusted p-values.

* Statistically significant p-values.

Table 2 displays the within-physician time-of-day differences in the mean numbers of diagnostic tests ordered per patient appointment. For example, Physician 2 in Clinic 1 ordered an average of 3.3 ± 2.13 diagnostic tests per patient visit in the first hour compared to 2.95 ± 2.2 in the last hour, corresponding to a mean difference of -0.35 tests per patient visit (p = 0.58 for equality by the Wilcoxon test). Overall, for this outcome, sixteen physicians had no statistically significant differences between the first and last hour. Only four physicians had both statistically significant, unadjusted (Wilcoxon) and adjusted multivariate linear regression differences in diagnostic tests ordered per patient encounter between the first and last hours of their day. Of these four, three had ordered more diagnostic tests per encounter on average in the first hour compared to the last hour of the workday (adjusted p-values ranging from 0.012 to < 0.001). Conversely, Physician 37 had fewer mean tests per encounter in the first hour (14.96 tests vs. 21.81 tests, adjusted p < 0.001).

Table 3 displays similar data for the number of diagnoses assessed per appointment. For instance, Physician 7 in Clinic 1 assessed 3.12 ± 1.32 diagnoses per patient visit in the first hour compared to 2.66 ± 1.19 in the last hour, corresponding to a mean difference of -0.46 diagnoses per patient visit (p = 0.006 for equality by the Wilcoxon test). Because this p-value was ≤ 0.05, a confirmatory case mix-adjusted comparison was made, yielding a p-value of 0.01. Overall, for this outcome, 14 physicians had no statistically significant unadjusted or adjusted differences between the first and last hour of their day, but six physicians did. Five of these six physicians made more unadjusted and adjusted mean diagnoses per encounter in the first hour of their day compared to the last hour (adjusted p-values ranging from 0.02 to <0.001), while there was again one physician with the opposite pattern (Physician 35: 3.55 diagnoses assessed in the first hour vs. 3.86 in the last hour, adjusted p = 0.04).

Assessing for a collective temporal trend in the primary outcomes

Although only a handful of doctors had statistically significant time-of-day differences in the primary outcomes, these findings do not statistically rule out time-of-day differences for other physicians. Thus, we were interested in assessing post hoc if there was a statistically significant group-level trend toward more tests ordered or more diagnoses assessed in the first versus the last hour of the day. We found that 80% of the clinicians (16 of the 20) ordered more diagnostic tests per appointment on average in the first hour of the day compared to 20% (4 out of 20) who ordered more in the last hour of the day (two-sided p = 0.012 by exact test for each physician to have equal probability to order more during each time period). For the diagnoses assessed outcome, 60% of doctors (12 out of 20) assessed more diagnoses on average per patient encounter in the first hour compared to 40% (8 out of 20) who assessed more diagnoses on average in the last hour (two-sided p = 0.50 by exact test).
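The group-level p-values above can be reproduced with a two-sided exact sign test, in which each physician is treated as equally likely to have the higher mean in either time period. The sketch below is an illustration rather than the SAS code used in the study; the function name is an assumption, and ties are assumed to have been dropped before counting.

```python
from math import comb

def exact_sign_test(n_first_higher, n_last_higher):
    """Two-sided exact binomial p-value under a 50/50 null hypothesis."""
    n = n_first_higher + n_last_higher
    k = max(n_first_higher, n_last_higher)
    # Probability of a split at least this lopsided in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)

print(round(exact_sign_test(16, 4), 3))  # diagnostic tests ordered: 0.012
print(round(exact_sign_test(12, 8), 2))  # diagnoses assessed: 0.5
```

A 16-to-4 split corresponds to 6,196 of the 1,048,576 equally likely outcomes in each tail, which is why it reaches significance while the 12-to-8 split does not.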

Same-specialty variation in the primary outcomes

Finally, while not part of the original study objective, we noticed considerable variation in the primary outcome data from physician to physician, prompting us to assess how physicians performed relative to their same-specialty peers. Tables 4 and 5 display the aggregate mean and median data for the primary outcomes for all physicians. Physicians in Clinics One, Three, Five, and Six all had statistically significant within-clinic differences in their practice patterns for each primary outcome compared to their same-specialty peers (p-values for equality ranging from 0.01 to <0.001). An evident example in Clinic Six is Physician 35 who had a mean of 3.88 lab tests ordered per appointment while his or her peer, Physician 37, had a much larger mean of 17.68 tests per appointment (Table 4). Only Clinic Four had statistically nonsignificant within-clinic physician differences for both outcomes, except in the unadjusted analysis of diagnostic test orders.

Table 4. Overall number of laboratory tests ordered by providers in first and last hour combined.

| Clinic Location (a) | Provider | Total Appointments (First + Last Hour) | Mean Tests Ordered Per Appointment ± Std-Dev | Median (95% CI) | Q3 (95% CI) | Kruskal Wallis p-value (b) | P-Genmod p-value (b) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| One | 2 | 46 | 3.15 ± 2.14 | 3 (2,4) | 5 (4,6) | <0.001 | <0.001 |
| One | 6 | 54 | 0.44 ± 0.86 | 0 (0,0) | 1 (0,2) | | |
| One | 7 | 223 | 0.96 ± 1.33 | 0 (0,1) | 1 (1,2) | | |
| One | 8 | 37 | 0.54 ± 1.22 | 0 (0,0) | 0 (0,2) | | |
| One | 10 | 466 | 0.57 ± 1.10 | 0 (0,0) | 1 (1,1) | | |
| Three | 17 | 58 | 4.53 ± 4.25 | 4 (2,5) | 7 (5,10) | <0.001 | <0.001 |
| Three | 18 | 65 | 8.28 ± 2.53 | 8 (8,9) | 10 (9,11) | | |
| Four | 20 | 75 | 5.97 ± 1.94 | 7 (5,8) | 10 (8,12) | <0.001 | 0.22 |
| Four | 22 | 74 | 3.72 ± 3.19 | 4 (2,5) | 6 (6,7) | | |
| Five | 25 | 323 | 2.59 ± 2.60 | 2 (2,2) | 4 (4,5) | 0.006 | 0.01 |
| Five | 26 | 34 | 1.74 ± 2.27 | 1 (0,2) | 3 (1,5) | | |
| Five | 27 | 52 | 1.38 ± 2.03 | 0 (0,1) | 3 (1,4) | | |
| Five | 29 | 51 | 2.80 ± 3.05 | 1 (1,4) | 6 (3,6) | | |
| Five | 30 | 505 | 2.46 ± 2.86 | 1 (1,2) | 4 (3,5) | | |
| Five | 31 | 90 | 2.33 ± 2.48 | 1.5 (1,2) | 4 (3,5) | | |
| Five | 32 | 100 | 2.21 ± 1.20 | 1 (1,2) | 2 (3,4) | | |
| Six | 34 | 318 | 8.25 ± 1.98 | 5 (4,7) | 11 (10,14) | <0.001 | <0.001 |
| Six | 35 | 345 | 3.88 ± 0.64 | 6 (5,6) | 8 (7,9) | | |
| Six | 36 | 192 | 11.85 ± 9.03 | 10 (8,12) | 18 (16,21) | | |
| Six | 37 | 234 | 17.68 ± 10.57 | 16 (15,18) | 25 (23,28) | | |

(a) Each Clinic Location represents a specific specialty.

(b) For within-clinic equality of providers; reported once per clinic.

Table 5. Overall number of diagnoses assessed by providers in first and last hour combined.

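| Clinic Location (a) | Provider | Total Appointments (First + Last Hour) | Mean Diagnoses Assessed Per Appointment ± Std-Dev | Median (95% CI) | Q3 (95% CI) | Kruskal Wallis p-value (b) | P-Genmod p-value (b) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| One | 2 | 46 | 6.26 ± 2.02 | 5 (8,17) | 8 (8,29) | <0.001 | <0.001 |
| One | 6 | 54 | 1.96 ± 0.75 | 2 (2,2) | 2 (2,3) | | |
| One | 7 | 223 | 2.96 ± 1.29 | 3 (3,3) | 4 (4,4) | | |
| One | 8 | 37 | 1.32 ± 0.58 | 1 (1,1) | 2 (1,2) | | |
| One | 10 | 466 | 1.90 ± 0.91 | 2 (2,2) | 2 (2,3) | | |
| Three | 17 | 58 | 3.83 ± 0.46 | 4 (4,4) | 4 (4,4) | <0.001 | <0.001 |
| Three | 18 | 65 | 5.11 ± 1.63 | 5 (5,6) | 6 (6,7) | | |
| Four | 20 | 75 | 2.87 ± 1.24 | 3 (2,3) | 4 (3,4) | 0.49 | 0.66 |
| Four | 22 | 74 | 2.80 ± 1.43 | 3 (2,3) | 3 (3,4) | | |
| Five | 25 | 323 | 3.55 ± 1.69 | 4 (3,4) | 5 (3,5) | <0.001 | <0.001 |
| Five | 26 | 34 | 2.91 ± 2.33 | 2 (1,4) | 4 (3,7) | | |
| Five | 27 | 52 | 2.50 ± 1.02 | 3 (2,3) | 3 (3,3) | | |
| Five | 29 | 51 | 3.39 ± 1.60 | 3 (3,4) | 4 (4,5) | | |
| Five | 30 | 505 | 2.59 ± 1.39 | 2 (2,3) | 4 (3,4) | | |
| Five | 31 | 90 | 3.17 ± 1.72 | 3 (2,4) | 4 (4,5) | | |
| Five | 32 | 100 | 3.55 ± 1.65 | 4 (3,4) | 4.5 (4,5) | | |
| Six | 34 | 318 | 2.84 ± 1.05 | 3 (3,3) | 4 (3,4) | <0.001 | <0.001 |
| Six | 35 | 345 | 3.67 ± 1.22 | 4 (4,4) | 4 (4,4) | | |
| Six | 36 | 192 | 2.90 ± 0.78 | 3 (3,3) | 3.5 (3,4) | | |
| Six | 37 | 234 | 4.62 ± 1.78 | 5 (4,5) | 6 (5,6) | | |

(a) Each Clinic Location represents a specific specialty.

(b) For within-clinic equality of providers; reported once per clinic.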

Discussion

Among a group of 20 outpatient physicians, there is statistical evidence to support the existence of time-of-day effects on diagnostic test ordering and diagnostic assessments, the study’s proxies for physician decision making. Statistically significant time-of-day differences in the average number of tests ordered and diagnoses assessed per patient visit were found in a non-negligible minority of physicians, and the directional trend at the group level for diagnostic test orders supports our hypothesis that clinicians would order more tests per patient in the first hour compared to the last hour of the day. Interestingly, there was also substantial variation in the primary outcomes between physicians of the same specialty. Altogether, these findings demonstrate the need for further investigation.

If time of day affects decision making related to test orders and diagnostic assessment for at least some doctors, we believe that based on observations from prior clinical studies [7–11], decision fatigue and ego depletion may be mediating factors. Tasks like ordering lab tests and assessing diagnoses require executive cognitive function, and as decision fatigue progresses as a day goes on, physicians may become ego-depleted and subconsciously exhibit avoidant behavior by forgoing additional tasks and decisions [2, 7]. This would lead to fewer actions performed per patient appointment. The statistically significant within-physician data and the group-level data are largely consistent with this hypothesis. For each outcome, all but one of the minority of physicians with statistically significant differences ordered more tests or assessed more diagnoses per patient in the first hour compared to the last hour of the day. The group-level data also demonstrated that most clinicians ordered more diagnostic tests per patient encounter at the start versus the end of the day.

The fact that most individual physicians had statistically nonsignificant time-of-day differences can be interpreted in several ways. First, from a behavioral standpoint, it could suggest that those doctors with no significant differences consistently engaged in behavior that mitigated time-of-day effects and ego depletion. Studies have shown that ego depletion of self-control can interestingly be countered by personal factors such as beliefs about self-control, moods, and self-affirmations [21–23]. It is entirely plausible that most study physicians, who by default are an accomplished group of individuals, held positive self-affirming thoughts about their abilities and self-control, whereas a few doctors lacked those perspectives, leaving them particularly susceptible to time-of-day effects. Alternatively, it is possible that the use of EMR decision support tools, which aim to reduce the number of decisions and actions for physicians, varied amongst study physicians to the point that those who did not use such tools were more susceptible to temporal effects. Indeed, a study by Kim et al. demonstrated a substantial role for EMR decision support in reducing same-day, temporal disparities in influenza vaccination rates [9].

Second, from a statistical perspective, the lack of statistically significant temporal differences for the majority of study physicians does not necessarily mean that such differences do not exist for those clinicians. Instead, this result could be due to Type II error, as many of the clinicians had small numbers of patient visits in the data set. Despite the inability to pinpoint with statistical certainty a temporal difference in primary outcomes for every individual doctor, the collective finding that i) some individual physicians had significant temporal differences, and ii) there was a significant group-level temporal trend for diagnostic tests ordered, is considerable statistical evidence for an association between time of day and decision making. While the group-level trend was statistically significant only for the diagnostic test outcome, this underscores that we cannot rule out potential time-of-day effects on physician decision making.

Alternative explanations for time-of-day, decision fatigue-mediated associations include the tendency for doctors to fall behind schedule as the clinic day progresses, compressing the time available for assessment and task completion for each last-hour patient. This idea holds less weight considering that collectively there were fewer visits in the last hour (1329 visits) compared to the first hour (2013 visits), and only one doctor saw more patients in the last hour compared to the first hour. Another alternative is that both patients and clinicians may experience a desire to leave sooner at the end of the day [8], leaving less time for patients to participate in further evaluation and for clinicians to perform additional tasks. Lastly, purposeful scheduling of more or less complex patients for certain parts of the day is a possibility, although the temporal differences for the minority of physicians persisted after case-mix adjustment.

It is important to note that in both subsets of physicians with statistically significant differences, one physician exhibited contrasting behavior by ordering more tests or assessing more diagnoses per patient on average in the last hour of the day. This result may support an alternative or complementary hypothesis on the effect of decision fatigue on decision making: that individuals in an ego-depleted state may sometimes act more impulsively [24]. When physicians generate a diagnostic workup plan for patient care, an initial step is to think of what diagnostic tests to order. To hone this plan, a second cognitive step involves assessing if each test is truly necessary. This second step is arguably often skipped or forgotten, as evidenced by the large estimated rates of unnecessary tests [14, 15]. From an ego-depletion perspective, Physician 37’s end-of-day behavior of ordering more tests per patient on average may represent either an impulsive tendency to order more tests than necessary as a form of defensive medicine, or an avoidance of the follow-up cognitive step to evaluate test necessity. For Physician 35, who assessed more diagnoses per patient at the end of the day, the behavior could possibly represent impulsive entry of unconfirmed differential diagnoses into the EMR or procrastination in the removal of disproved diagnoses from the EMR.

While the underlying causative factors are unclear, time-of-day differences in the primary outcomes, and by extension physician decision making, would suggest that quality of clinical care varies by time of day. Such variation carries important implications for healthcare operations and practices. However, this study was unable to determine how quality of care was temporally affected since defining and assessing quality of care, including whether study physicians were over or undertesting or over or underassessing patients by time of day, was beyond the scope of the study data.

Interestingly, unanticipated nontemporal variation was observed between same-specialty physicians with respect to the primary outcomes. Clinicians at four clinic locations had considerable case mix-adjusted differences among themselves and their same-specialty peers in both the numbers of diagnostic tests ordered and the number of diagnoses assessed per patient appointment. While these differences likely reflect factors such as age, experience, within-specialty expertise around complex disorders, and different backgrounds in residency or fellowship training [25], they do imply variation in quality of care delivered within singular outpatient clinics. Whether one doctor delivers better or worse care compared to his or her same-specialty peer at the same clinic location is unclear, but such variation alone merits further investigation.

This work has several important limitations. First, the study is observational, and consequently, the results are subject to unmeasured confounders, one of which is true duration of appointment. Because there was no reliable data point in the EMR system that accurately reflected actual appointment end times, we made the assumption that the duration of the appointment was the scheduled duration. Additionally, literature has suggested that the greater the difficulty of the decision, the more decision fatigue an individual may face [26]. We attempted to control for decision complexity using the Charlson comorbidity index as a proxy, but this may be an imperfect measure. Second, because the initial EMR data extraction was limited to a one-year period and many physicians at the study site do not often work full outpatient days in the same clinic, this comparison of first versus last hour of a full day was significantly underpowered for many of the original 39 physicians, resulting in fewer data points and limited generalizability of the study. The single-site design of the study also contributed to its limited generalizability. Additionally, no demographic data was collected on providers per IRB concerns. While such data could have provided key insights into same-specialty physician differences observed, possible unmeasured confounders of first versus last hour within-physician differences, and physician susceptibility to decision fatigue, the provider population at the study institution is sufficiently small that collecting and reporting clinician demographic data could jeopardize provider confidentiality. Importantly, this study did not attempt to measure physician decision fatigue directly, and as a result, relationships between the outcomes of interest, decision fatigue, and time of day are limited to inferences. 
Lastly, some of the within-physician mean time-of-day differences, while statistically significant, are small enough in magnitude that they may not be clinically significant for individual patients. It should be noted, however, that small differences can aggregate into large differences over time and patient visit volume, which from a health systems perspective, may shed valuable light on the quality of care delivered by physicians as a whole.

Conclusion

There is some statistical evidence on an individual and group level to support the existence of time-of-day effects on clinician decision making, particularly on the number of diagnostic tests ordered per patient. These findings suggest that time of day may be a factor influencing fundamental physician behavior and processes. Notably, many physicians also exhibited significant variation in the primary outcomes compared to same-specialty peers. Additional work is necessary to clarify time-of-day effects and inter-physician variation in the outcomes of interest.

Supporting information

S1 Table. Differences in median number of laboratory tests ordered.

(TIF)

S2 Table. Differences in median number of diagnoses assessed.

(TIF)

Acknowledgments

Contributors: The authors would like to acknowledge John Francis for his key contribution to the acquisition of data for the study.

Data Availability

The dataset generated and analyzed during the current study is available in the Mendeley Data repository, http://dx.doi.org/10.17632/v27rr3zpws.1.

Funding Statement

This study is supported, in part, by Grant Number 1UL1TR003017-01 from the National Center for Advancing Translational Science (https://ncats.nih.gov/). The author associated with this Grant is FAS. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

Decision Letter 0

Paola Iannello

26 May 2021

PONE-D-21-13464

Time-of-Day Changes in Physician Clinical Decision Making: A Retrospective Study

PLOS ONE

Dear Dr. Trinh,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 10 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Paola Iannello

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper deals with the decision fatigue in the medical decision making context. The authors discussed the time-of-day changes in daily physicians’ decision making. They compared the impact of work fatigue between first hour in the morning and last hour in the afternoon on number of diagnostic tests ordered and number of diagnosis assessed per patient appointment.

One main concern regarding the paper is related to the fact that a theoretical background is completely missing. One theory that may be useful could be the Strength Model of Self-Control (Baumeister et al., 1998); another one the Process Model of Ego Depletion (Inzlicht, Schmeichel, 2012). I suggest the authors to better frame their introduction with reference to the most update theories on decision fatigue.

A definition of the decision fatigue has been given, but I guess in a wrong way. The authors use Hsiang et al., 2019 as reference paper, but Hsiang et al quote Vohs et al 2008 when they give the definition of decision fatigue.

(Vohs KD, Baumeister RF, Schmeichel BJ, Twenge JM, Nelson NM, Tice DM. Making choices impairs subsequent self-control: a limited-resource account of decision making, self-regulation, and active initiative. J Pers Soc Psychol. 2008;94(5):883-898. doi:10.1037/0022-3514.94.5.883)

There are several variables that the authors did not consider in their study. For example, they did not control for the complexity or difficulty of decisions. Literature* suggests that the more difficult a decision is, the more decision fatigue an individual experiences. In their study, we do not know anything about the kind of decisions the physicians took, and we cannot be sure the decisions can be compared with each other.

*Oto, B (2012) When thinking is hard: Managing decision fatigue. EMS World 41(5): 46–50.

The authors should better clarify in the introduction what is the expected direction in the relationship between time of day and decision fatigue.

One of the most important flaw of the paper is that it does not measure the decision fatigue itself. The time-of-day during which the decisions are taken is used as proxy of decision fatigue, but the authors did not measure the decision fatigue levels of the doctors.

The fact that the time-of-day influences the decision fatigue is already known in literature. What is the novelty of this study?

Moreover, there are studies in literature in which results demonstrated that those who experience decision fatigue may be either passive/avoidant or impulsive. Then, in some cases it seems that decision fatigue acts by increasing procrastination, passive behavior, low persistence, and the choice of a default option; whereas, in others, individuals can act impulsively. All of this may impact the medical decision making either in the way the authors hypothesized (low number of diagnostic tests ordered and diagnoses assessed) or in the opposite way.

Regarding the method, I have a question for the authors: how can we know that number of diagnostic tests ordered and number of diagnosis assessed were lower at the end of the day just because those patients needed less tests than the patients visited in the early morning? Another thing is: is there any way to know the physicians’ characteristics? We do not know if they have different characteristics that may explain the results. A similar consideration can be made for the patients.

I have one question again for the authors regarding the results: in Table 2 and 3, they reported the number of appointments divided for first hour and last hour. It seems to me that, for example, in clinic number 1, the provider number 2 had 27 appointments in the first hour and 19 in the last hour. Perhaps I do not understand very well this point, but it seems quite odd.

The tables are cut on their right sides.

Thank you very much for the opportunity to read and revise this paper.

Reviewer #2: Thank you for giving me the opportunity to review the paper entitled “Time-of-day changes in physician clinical decision making: a retrospective study”, submitted to PlosOne. I think this manuscript addresses a paramount topic and I agree with the importance of analyze factors affecting clinical decision and in particular decision fatigue in clinical context. However, I would like to suggest some changes that I hope the authors will consider.

Introduction

The introduction is clear and there is a concise description of the concept of the measures used within the project. However, I don’t understand some fundamental points

• there is no/little reference framework, which is supported in the literature, on the use of these measures to identify the “decision fatigue” in physician clinical decision making context. Are there references in the literature that justify the choice of these two measurement indices (the number of diagnostic tests ordered and the number of diagnoses assessed during a clinical encounter)? "The study of variations in more general measures of clinical activity, such as frequency of diagnostic test ordering, could potentially reflect physician behavior on a wider dimension, including capturing the impact of fatigue on decision making." (line 83-86). Please specify the references or argue more about the choice of indicators.

• It may be useful to broaden the analysis of the literature on the importance of time-of-day in clinical practice (see Shuchman, M. (2019). Does time of day matter in clinical practice?).

• For the purpose of understanding and linearity of reading, I would consider it useful to add in the final part of the introduction the hypotheses of research and the questions to be answered in an orderly manner in the section dedicated to results and discussion.

• The "primary outcomes" paragraph is not very clear, in relation to its position in the manuscript text. Is it possible to move it and integrate it with the hypothesis, at the end of the introduction?

Setting and Participants

Regarding clinic identities (line 122-127), is it possible to have information about the association between the number with which the clinic is identified in the study and the specialty? While respecting the privacy and anonymity of participants and preserving provider confidentiality, it would be useful to know the association between the clinic and their specific field of work.

Please specify more clearly and in a more structured way which were the criteria for inclusion and exclusion in the study design. Furthermore, it is not clear which inclusion/exclusion criteria have not been met by physicians from “Clinic 2”, which has not been considered in the results sections.

I wonder whether the authors could be a little more specific in what they considered as “Cumulative Work Fatigue” and “decision fatigue” and whether they can discuss (in the Introduction and/or Discussion section) their study (and their results) in light with literature works on repeated decision making process and associated decision fatigue. The analysis of the pros and cons in the decision-making process requires cognitive commitment; this is one of the reasons why, when you are more tired, you tend to avoid a reasoning that requires cognitive commitment (See for example Persson, E., Barrafrem, K., Meunier, A., & Tinghög, G. (2019). The effect of decision fatigue on Surgeons' clinical decision making. Health economics, 28(10), 1194-1203; Vohs, K. D., Baumeister, R. F., Schmeichel, B. J., Twenge, J. M., Nelson, N. M., & Tice, D. M. (2008). Making Choices Impairs Subsequent Self-control: A Limited-Resource Account of Decision Making, Self-regulation, and Active Initiative. Journal of Personality and Social Psychology, 94(5), 883-898. https://doi.org/10.1037/0022-3514.94.5.883.).

Furthermore, it would be useful to understand why you choose to analyze the first and last hour of the day in order to maximize potential for differential cumulative work fatigue. With respect to work fatigue, the time of the day between 1pm to 3 pm are not taken into account, unless someone consider them to be the time-of-day in which major at work-accidents occur precisely because of fatigue and sleep. What studies have you referred to considering the first and last hour of the day as being more impacted by fatigue at work?

Results

In Table 2 and Table 3 I suggest to put a star next to the statistically significant values in the table, to make them quickly identifiable

Discussion

Line 300-301: Can you give examples of behaviors that could mitigate time-of-day effects?

Line 303-304: Can you suggest some explanations to why some doctors might be more susceptible to the time-of-day effect on their clinical performance than others? Could they be more susceptible to cognitive bias? Could depend on personal characteristics such as seniority of service or personality characteristics? Please report some references to your suggestion or hypothesis.

Line 330-336: It is interesting to note that in the discussion is inserted and contemplated the possibility of a different/inverse effect of fatigue on the decision making process. Are there any references/other studies that have detected this cognitive effect? Please argue.

Conclusions

I think conclusions should be more cautious given the different limitations identified in the study.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Giulia Ongaro

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Sep 17;16(9):e0257500. doi: 10.1371/journal.pone.0257500.r002

Author response to Decision Letter 0


5 Jul 2021

Paola Iannello

Academic Editor

PLOS ONE

Dear Dr. Iannello and Reviewers,

On behalf of my co-authors, I’d like to thank you very much for your thoughtful comments about our manuscript. Incorporating your constructive feedback has enabled us to produce a more robust manuscript. Below is a point-by-point response to each of the reviewers. The reviewer comments are in quotations, and our response follows each quotation block. I hope that we have addressed and clarified each point fittingly, and we look forward to your response.

Sincerely,

Peter Trinh, MD, MBA

Peter.trinh1@gmail.com

+1-973-747-5664

Reviewer #1:

“One main concern regarding the paper is related to the fact that a theoretical background is completely missing. One theory that may be useful could be the Strength Model of Self-Control (Baumeister et al., 1998); another one the Process Model of Ego Depletion (Inzlicht, Schmeichel, 2012). I suggest the authors to better frame their introduction with reference to the most update theories on decision fatigue.”

Thank you for this crucial feedback. We fully acknowledge this shortcoming and have updated the Introduction to incorporate a theoretical foundation. Please see Page 3, Lines 81-120.

“A definition of decision fatigue has been given, but I guess in a wrong way. The authors use Hsiang et al., 2019 as reference paper, but Hsiang et al quote Vohs et al 2008 when they give the definition of decision fatigue.

(Vohs KD, Baumeister RF, Schmeichel BJ, Twenge JM, Nelson NM, Tice DM. Making choices impairs subsequent self-control: a limited-resource account of decision making, self-regulation, and active initiative. J Pers Soc Psychol. 2008;94(5):883-898. doi:10.1037/0022-3514.94.5.883)”

Thank you. We have updated the manuscript with the appropriate citation.

“There are several variables that the authors did not consider in their study. For example, they did not control for the complexity or difficulty of decisions. Literature* suggests that the more difficult a decision is, the more decision fatigue an individual experiences. In their study, we do not know anything about the kind of decisions the physicians took, and we cannot be sure the decisions can be compared with each other.

*Oto, B (2012) When thinking is hard: Managing decision fatigue. EMS World 41(5): 46–50.”

The reviewer makes a great point about the possible effects of the complexity/difficulty of decisions. Making a choice about whether to order a diagnostic test or assess a diagnosis is ultimately a binary decision. In the context of clinical practice, what makes these binary decisions more difficult is arguably the patient’s medical complexity, which is represented in this study by the Charlson comorbidity index (CCI) (Reference #20). We believe that by controlling for this index, similar to other studies (Reference #8-9), and by comparing early encounters in a given specialty only with late encounters in the same specialty, we have, by proxy, controlled for the difficulty of the decisions. According to Table 1, there were no remarkable differences in CCI between patients in the first and last hours. Please see Page 21, Lines 606-609 for our in-text edits.

“The authors should better clarify in the introduction what is the expected direction in the relationship between time of day and decision fatigue.”

To address this point, we have edited the Introduction to delineate our hypothesis more clearly. Please see Page 6, Lines 206-212.

“One of the most important flaws of the paper is that it does not measure the decision fatigue itself. The time-of-day during which the decisions are taken is used as proxy of decision fatigue, but the authors did not measure the decision fatigue levels of the doctors.”

Thank you for this feedback. This is indeed a critical point, and we have edited the limitations section of the Discussion to acknowledge this. Please see Page 22, Line 575. This is a potential subject for future study.

“The fact that the time-of-day influences the decision fatigue is already known in literature. What is the novelty of this study?”

The novelty of the study is to expand the literature on the temporal effect of decision fatigue in medicine beyond the very clinically specific outcomes that presently exist in the literature. Studies in the medical field have only looked at specific decisions/outcomes that are particular to their medical specialty. Please see References #7-11. Our outcomes (diagnostic tests ordered and diagnoses assessed per patient) are much more general process measures that can be applied to many more medical specialties. We have edited the Introduction to hopefully delineate this point more clearly to readers. Please see Page 4, Lines 133-162.

“Moreover, there are studies in literature in which results demonstrated that those who experience decision fatigue may be either passive/avoidant or impulsive. Then, in some cases it seems that decision fatigue acts by increasing procrastination, passive behavior, low persistence, and the choice of a default option; whereas, in others, individuals can act impulsively. All of this may impact the medical decision making either in the way the authors hypothesized (low number of diagnostic tests ordered and diagnoses assessed) or in the opposite way.”

This is an excellent point. We have integrated these findings from the literature into the Introduction and our interpretation of the results in the Discussion section to address this point of feedback. Please see Page 5, Line 169 (Introduction), Page 18, Line 457 and Page 20, Lines 519-529 (Discussion).

“Regarding the method, I have a question for the authors: how can we know that number of diagnostic tests ordered and number of diagnosis assessed were lower at the end of the day just because those patients needed less tests than the patients visited in the early morning? Another thing is: is there any way to know the physicians’ characteristics? We do not know if they have different characteristics that may explain the results. A similar consideration can be made for the patients.”

This is a great question. To try to address this, we used the Charlson comorbidity index (CCI), which is a measure of medical complexity of patients (Ref #20). According to our analysis, patients in the morning versus the afternoon seemed to be no different from each other, especially in the CCI score, so it is less likely that patients at the end of the day had fewer tests just because those patients needed fewer tests. This is based on the assumption that patients with similar CCI scores would likely have similar numbers of diagnostic tests ordered and numbers of diagnoses assessed. Unfortunately, we cannot definitively rule out that there were other patient differences as well.

With regard to the physicians, we stratified the analyses comparing the first to the last hour within the same physician, so in effect, our comparisons were within-physician (i.e. what he or she did in the afternoon compared to what he/she did in the morning). Since the characteristics of the physician remain the same in both time periods, these cancel out in the comparison. We would have liked to study further these associations for interactions with physician characteristics but were unable to collect this data. Due to the small size of the study site, the IRB felt it best to mask physician demographics and which specialties were associated with each clinic in order to protect provider confidentiality. We acknowledge how provider characteristics could certainly influence these results and have edited language in the limitation section of the Discussion section (Page 22, Lines 570-575) to communicate this message more effectively.
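The cancellation argument above can be written out under a simple additive model. This is an illustrative assumption and not the regression reported in the manuscript: here $y_{i,t}$ denotes the mean outcome per appointment for physician $i$ in period $t$, $\alpha_i$ collects that physician's fixed characteristics, $\delta_i$ is the physician's time-of-day effect, and $\varepsilon_{i,t}$ is residual variation.

```latex
\begin{align*}
y_{i,\text{first}} &= \alpha_i + \varepsilon_{i,\text{first}} \\
y_{i,\text{last}}  &= \alpha_i + \delta_i + \varepsilon_{i,\text{last}} \\
y_{i,\text{last}} - y_{i,\text{first}}
  &= \delta_i + \left(\varepsilon_{i,\text{last}} - \varepsilon_{i,\text{first}}\right)
\end{align*}
```

In this sketch the fixed term $\alpha_i$ drops out of the within-physician difference. The time-of-day effect $\delta_i$ may still differ between physicians, which is the kind of interaction with physician characteristics that could not be examined without provider demographic data.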

“I have one question again for the authors regarding the results: in Table 2 and 3, they reported the number of appointments divided for first hour and last hour. It seems to me that, for example, in clinic number 1, the provider number 2 had 27 appointments in the first hour and 19 in the last hour. Perhaps I do not understand very well this point, but it seems quite odd.”

We are not sure why the number of appointments in the last hour of the day was smaller than in the first hour, but it was a consistent empirical finding in the outpatient practices we studied. In fact, of 20 providers, 18 had more appointments in the first hour than in the last hour, one had the same number of first- and last-hour appointments, and one had more appointments in the last hour (p < 0.0001). This difference may relate to scheduling strategies (e.g. scheduling new or extended appointments near the end of the day) or to a greater tendency for appointments later in the day to be cancelled. We would also like to note that although the number of appointments in the first and last hours was not the same for each physician, our outcomes of interest were purposely “per appointment” outcomes and not “per hour” outcomes that could have been influenced by the number of patient appointments the physicians had.

“The tables are cut on their right sides.”

Thank you for catching this. The Tables have been edited to address this issue.

Reviewer #2:

“The introduction is clear and there is a concise description of the concept of the measures used within the project. However, I don’t understand some fundamental points.

• there is no/little reference framework, which is supported in the literature, on the use of these measures to identify the “decision fatigue” in physician clinical decision making context. Are there references in the literature that justify the choice of these two measurement indices (the number of diagnostic tests ordered and the number of diagnoses assessed during a clinical encounter)? "The study of variations in more general measures of clinical activity, such as frequency of diagnostic test ordering, could potentially reflect physician behavior on a wider dimension, including capturing the impact of fatigue on decision making." (line 83-86). Please specify the references or argue more about the choice of indicators.”

Thank you very much for this crucial feedback. We have edited the Introduction to incorporate a theoretical foundation and more thorough reference framework behind decision fatigue. Please see Page 3, Lines 81-120.

In our literature search, we could not find any studies that used the number of diagnostic tests ordered and the number of diagnoses assessed to examine time-of-day effects. One reason we chose these parameters was their wide applicability within medicine. Please see Page 4, Lines 133-205 for our edited Introduction, which reflects our rationale for choosing these parameters more clearly.

“• It may be useful to broaden the analysis of the literature on the importance of time-of-day in clinical practice (see Shuchman, M. (2019). Does time of day matter in clinical practice?).”

In conducting our literature search, we were able to find only a handful of studies that analyzed the importance of time of day in clinical practice. We have cited these studies in our Introduction (Page 4, Lines 121-128). Coincidentally, these are the same studies referred to by M. Shuchman in her 2019 review article. Please see References #8-11.

“• For the purpose of understanding and linearity of reading, I would consider it useful to add in the final part of the introduction the hypotheses of research and the questions to be answered in an orderly manner in the section dedicated to results and discussion.”

Thank you for this feedback. We have edited the Introduction as the reviewer suggested to communicate our hypothesis more clearly. Please see Page 6, Lines 206-212.

“• The "primary outcomes" paragraph is not very clear, in relation to its position in the Manuscript text. Is it possible to move it and integrate it with the hypothesis, at the end of the introduction?”

Thank you. We agree and have edited the Introduction and Methods to address this point and improve the flow of the paper. Please see Page 6, Lines 206-212.

“Setting and Participants

Regarding clinic identities (line 122-127), is it possible to have information about the association between the number with which the clinic is identified in the study and the specialty? While respecting the privacy and anonymity of participants and preserving provider confidentiality, it would be useful to know the association between the clinic and their specific field of work.”

Due to the size of the study site, the Institutional Review Board (IRB) felt it best to mask both physician demographics and which specialties were associated with each clinic in order to best protect provider confidentiality. We thus did not collect the physician characteristics. We made sure to acknowledge how provider characteristics could certainly have influenced these results in the limitations paragraph of our Discussion (Page 21, Lines 570-575). In a future large-scale study, we hope to capture provider characteristics, including specialty, without compromising confidentiality.

“Please specify more clearly and in a more structured way which were the criteria for inclusion and exclusion in the study design. Furthermore, it is not clear which inclusion/exclusion criteria have not been met by physicians from “Clinic 2”, which has not been considered in the results sections.”

We have edited the Methods section to address this feedback and convey the exclusion/inclusion criteria more clearly. Please see Page 7, Lines 240-250.

“I wonder whether the authors could be a little more specific in what they considered as “Cumulative Work Fatigue” and “decision fatigue” and whether they can discuss (in the Introduction and/or Discussion section) their study (and their results) in light with literature works on repeated decision making process and associated decision fatigue. The analysis of the pros and cons in the decision-making process requires cognitive commitment; this is one of the reasons why, when you are more tired, you tend to avoid a reasoning that requires cognitive commitment (See for example Persson, E., Barrafrem, K., Meunier, A., & Tinghög, G. (2019). The effect of decision fatigue on Surgeons' clinical decision making. Health economics, 28(10), 1194-1203; Vohs, K. D., Baumeister, R. F., Schmeichel, B. J., Twenge, J. M., Nelson, N. M., & Tice, D. M. (2008). Making Choices Impairs Subsequent Self-control: A Limited-Resource Account of Decision Making, Self-regulation, and Active Initiative. Journal of Personality and Social Psychology, 94(5), 883-898. https://doi.org/10.1037/0022-3514.94.5.883.).”

We very much appreciate the reviewer’s feedback on this point and the provided citations. To address this, we have modified the Introduction to incorporate a theoretical foundation (Lines 81-120) and to reference prior studies of decision fatigue within medicine (Lines 121-128). We have also edited the Discussion to place our study in the context of the literature and theory on ego depletion and decision fatigue (Page 18, Lines 453-474; Page 20, Lines 519-529).

“Furthermore, it would be useful to understand why you chose to analyze the first and last hour of the day in order to maximize potential for differential cumulative work fatigue. With respect to work fatigue, the time of the day between 1pm to 3 pm are not taken into account, unless someone consider them to be the time-of-day in which major at work-accidents occur precisely because of fatigue and sleep. What studies have you referred to considering the first and last hour of the day as being more impacted by fatigue at work?”

Our idea was to maximize the potential to detect same-day differences by comparing the two extremes: one in which the physician is presumably fresh and has just started the workday, and the other a timepoint at which the physician has accumulated the most decision fatigue, or is the most ego-depleted, owing to the work and decisions accumulated over the course of the day (as one would infer according to the sequential task paradigm). This study design was, in part, inspired by the results of the following studies (Ref #6-8):

Danziger S, Levav J, Avnaim-Pesso L. Extraneous factors in judicial decisions. Proc Natl Acad Sci U S A [Internet]. 2011 Apr 26 [cited 2018 Apr 16];108(17):6889–92. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21482790

Persson E, Barrafrem K, Meunier A, Tinghög G. The effect of decision fatigue on surgeons’ clinical decision making. Heal Econ (United Kingdom) [Internet]. 2019 Oct 1 [cited 2021 Jun 10];28(10):1194–203. Available from: /pmc/articles/PMC6851887/

Hsiang EY, Mehta SJ, Small DS, Rareshide CAL, Snider CK, Day SC, et al. Association of Primary Care Clinic Appointment Time With Clinician Ordering and Patient Completion of Breast and Colorectal Cancer Screening. JAMA Netw Open [Internet]. 2019 May 10 [cited 2019 May 20];2(5):e193403. Available from: http://jamanetworkopen.jamanetwork.com/article.aspx?doi=10.1001/jamanetworkopen.2019.3403

Each of these studies tracked their respective outcomes by hour over the course of a day, and each demonstrated significant differences in their outcomes between the first and last hours of the day. With consistent first versus last hour differences found in these prior studies, we thought it sufficient to simplify our analysis to the first versus last hour of a day as well.

“Results

In Table 2 and Table 3 I suggest to put a star next to the statistically significant values in the table, to make them quickly identifiable.”

We have modified the Tables with an asterisk to demarcate the statistically significant values.

“Discussion

Line 300-301: Can you give examples of behaviors that could mitigated time-of-day effects?”

To address this point, we have edited the Discussion (please see Page 18, Line 468-470) and referenced appropriate studies that demonstrate examples of mitigating factors to time-of-day effects/ego depletion, such as personal beliefs about self-control, self-affirmations, and emotional mood. The studies referenced are below (Ref #21-23):

Schmeichel BJ, Vohs KD. Self-Affirmation and Self-Control: Affirming Core Values Counteracts Ego Depletion. J Pers Soc Psychol [Internet]. 2009 Apr [cited 2021 Jun 19];96(4):770–82. Available from: https://pubmed.ncbi.nlm.nih.gov/19309201/

Tice DM, Baumeister RF, Shmueli D, Muraven M. Restoring the self: Positive affect helps improve self-regulation following ego depletion. J Exp Soc Psychol. 2007 May 1;43(3):379–84.

Job V, Dweck CS, Walton GM. Ego depletion-is it all in your head? implicit theories about willpower affect self-regulation. Psychol Sci [Internet]. 2010 Nov [cited 2021 Jun 19];21(11):1686–93. Available from: https://pubmed.ncbi.nlm.nih.gov/20876879/

“Line 303-304: Can you suggest some explanations to why some doctors might be more susceptible to the time-of-day effect on their clinical performance than others? Could they be more susceptible to cognitive bias? Could depend on personal characteristics such as seniority of service or personality characteristics? Please report some references to your suggestion or hypothesis.”

Susceptibility to and mitigation of time-of-day effects could be two sides of the same coin. The studies referenced above demonstrated how mood, self-affirmation, and views on self-control could mitigate ego depletion and decision fatigue. Individuals who are more susceptible to time-of-day effects could simply be those who do not exhibit these mitigating behaviors. Alternatively, it is possible that physicians vary in their use of decision support tools in the electronic medical record system, such that those who do not use decision support tools may be more susceptible to time-of-day effects and decision fatigue. The use of decision support and its effect on decision fatigue is a topic of interest to us for further study. We have edited the Discussion to incorporate these possible explanations (please see Page 18, Lines 466-479).

“Line 330-336: It is interesting to note that in the discussion is inserted and contemplated the possibility of a different/inverse effect of fatigue on the decision making process. Are there any references/other studies that have detected this cognitive effect? Please argue.”

Thank you for this feedback. We have edited the manuscript and referenced a 2011 article by Tierney (Ref #34), who has cowritten books with Dr. Baumeister, to better ground our discussion with the extant literature. This article explains the possibilities of impulsive versus avoidant behaviors that may result from decision fatigue and thus could possibly explain the relative outlier physicians in our study. Please see Page 20, Lines 515-529.

“Conclusions

I think conclusions should be more cautious given the different limitations identified in the study.”

We agree that we should be cautious and conservative with our conclusions. We have further hedged the conclusions by changing the wording to “some statistical evidence.” Overall, we tried to be as conservative as possible with our study conclusions, using less firm language such as “some,” “suggest,” and “may.”

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Paola Iannello

3 Sep 2021

Time-of-Day Changes in Physician Clinical Decision Making: A Retrospective Study

PONE-D-21-13464R1

Dear Dr. Trinh,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Paola Iannello

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have answered to all my questions. Thank you for their work, I think the paper is now publishable

Reviewer #2: I thank the author very much for their thoughtful response to this round of reviews. In my opinion, this revised version was much improved. I suggest only authors to pay more attention to the responses to the reviewers in particular with reference to the indications for lines and pages that are not at any point correct. This makes it more difficult to identify corrections and changes made to the Manuscript.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Acceptance letter

Paola Iannello

8 Sep 2021

PONE-D-21-13464R1

Time-of-day changes in physician clinical decision making: a retrospective study

Dear Dr. Trinh:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Paola Iannello

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Table. Differences in median number of laboratory tests ordered.

    (TIF)

    S2 Table. Differences in median number of diagnoses assessed.

    (TIF)

    Data Availability Statement

    The dataset generated and analyzed during the current study is available in the Mendeley Data repository, http://dx.doi.org/10.17632/v27rr3zpws.1.

