Annals of Surgery. 2005 Jun;241(6):847–860. doi: 10.1097/01.sla.0000164075.18748.38

Resident Work Hour Limits and Patient Safety

Benjamin K Poulose, Wayne A Ray, Patrick G Arbogast, Jack Needleman, Peter I Buerhaus, Marie R Griffin, Naji N Abumrad, R Daniel Beauchamp, Michael D Holzman
PMCID: PMC1357165  PMID: 15912034

Abstract

Objective:

This study evaluates the effect of resident physician work hour limits on surgical patient safety.

Background:

Resident work hour limits have been enforced in New York State since 1998 and nationwide from 2003. A primary assumption of these limits is that these changes will improve patient safety. We examined effects of this policy in New York on standardized surgical Patient Safety Indicators (PSIs).

Methods:

An interrupted time series analysis was performed using 1995 to 2001 Nationwide Inpatient Sample data. The intervention studied was resident work hour limit enforcement in New York teaching hospitals. PSIs included rates of accidental puncture or laceration (APL), postoperative pulmonary embolus or deep venous thrombosis (PEDVT), foreign body left during procedure (FB), iatrogenic pneumothorax (PTX), and postoperative wound dehiscence (WD). PSI trends were compared pre- versus postintervention in New York teaching hospitals and in 2 control groups: New York nonteaching hospitals and California teaching hospitals.

Results:

A mean of 2.6 million New York discharges per year were analyzed with cumulative events of 33,756 (APL), 36,970 (PEDVT), 1,447 (FB), 10,727 (PTX), and 2,520 (WD). Increased rates over time (expressed per 1000 discharges each quarter) were observed in both APL (0.15; 95% confidence interval, 0.09–0.20; P < 0.05) and PEDVT (0.43; 95% confidence interval, 0.03–0.83; P < 0.05) after policy enforcement in New York teaching hospitals. No changes were observed in either control group for these events or in New York teaching hospital rates of FB, PTX, or WD.

Conclusions:

Resident work hour limits in New York teaching hospitals were not associated with improvements in surgical patient safety measures, with worsening trends observed in APL and PEDVT corresponding with enforcement.


Resident physician work hour limits have been implemented in New York State at great cost and effort with a primary assumption of patient safety improvement. We present the first population-based evaluation of this policy using standardized surgical Patient Safety Indicators. Worsening trends in accidental puncture or lacerations and postoperative pulmonary embolus or deep venous thrombosis were observed in New York teaching hospitals after work hour limit enforcement.

Few topics evoke more heightened emotions and controversy at academic medical centers than resident physician work hour limits. From first-year intern to seasoned hospital administrator, the policy change instituted on a nationwide basis in July 2003 by the Accreditation Council for Graduate Medical Education (ACGME) has impacted trainees, reshaped residency training programs, and demanded shifts in hospital resources to compensate for the reduction in resident work hours. A primary goal of these work hour limits is an improvement in patient safety.1 This goal arose largely from publicity about errors at teaching institutions (eg, the Libby Zion case in New York) and public reaction to controversial reports of increased medical errors possibly linked to provider fatigue.2–4

Empirical evaluation of far-reaching policy changes, such as resident work hour limits, is critical to appraising the predicted and unforeseen consequences that inevitably accompany such interventions. Because New York was the first state to enact and enforce resident work hour limits, examination of its experience provides a unique opportunity to evaluate a policy that now affects the entire nation. Although important in their own right, most evaluations of resident work hour limits have been limited to nonstandardized surveys or single institutions.5–7 Furthermore, recent statewide evaluations have not focused on the groups most impacted by work hour regulations, namely, surgical specialties.8

In this study, we test the hypothesis that resident work hour limits are associated with surgical patient safety measure improvement in New York teaching hospitals. In addition, we use standardized Patient Safety Indicators (PSIs) developed by the Agency for Healthcare Research and Quality (AHRQ) as outcome measures sensitive to surgical training or technical performance.9 This study provides critical, timely data in the appraisal of resident work hour limits and gives a measure of PSI effect sizes to design further evaluations at the state and national levels.

MATERIALS AND METHODS

Design Overview

The purpose of this study was to evaluate the association between resident physician work hour limits and changes in patient safety using data from the 1995 to 2001 Nationwide Inpatient Sample (NIS). Analyses were conducted using an interrupted time series design. This method is particularly useful in evaluating how policy change (ie, an “interruption”) affects outcome measures over time.10 The policy change evaluated in this study was the enforcement of resident work hour limits in New York teaching hospitals. The outcomes measured were rates per 1000 discharges per quarter of PSIs developed by AHRQ specifically for use in administrative datasets. PSIs sensitive to surgical training or technical performance evaluated in this study included accidental puncture or laceration (APL), postoperative pulmonary embolus or deep venous thrombosis (PEDVT), foreign body left during procedure (FB), iatrogenic pneumothorax (PTX), and postoperative wound dehiscence (WD). Two concurrent control groups were also evaluated as groups not affected by the work hour limit policy: New York nonteaching hospitals and California teaching hospitals. The study was approved by the Vanderbilt University Institutional Review Board.

Data Source

Data were obtained from the AHRQ Healthcare Cost and Utilization Project NIS for the years 1995 to 2001. The NIS is the largest all-payor database publicly available, representing 20% of nonfederal discharges within the United States for a given year.11 Data for 17 states are included in the 1995 NIS encompassing 6,714,935 raw discharges. By 2001, data were available from 33 states with 7,452,727 raw discharges. Accounting for NIS sampling weights and the stratified, complex database structure allowed evaluation of national cohorts ranging in size from 34,791,998 (1995) to 37,187,641 (2001). Data were available for all study years (1995–2001) for New York and California. Final statewide PSI rates were calculated for each analysis group at each yearly quarter by appropriate subpopulation of the national datasets. Teaching hospitals were defined as hospitals with American Medical Association-approved residency teaching programs, membership in the Council of Teaching Hospitals, or a ratio of full-time equivalent interns and residents to beds of at least 0.25.11

Intervention

Resident physician work hour limits became law in New York State on July 1, 1989.12 These regulations limited resident work hours to no more than 80 hours per week (averaged over 4 weeks) with no single work period exceeding 24 hours. In addition, residents were to have at least one 24-hour period of nonworking time each week. Notably, physicians in training were also to be supervised 24 hours a day by in-house attending staff or senior resident physicians. However, essentially no hospitals complied with these regulations upon enactment.13 On March 5, 1998, the state of New York began inspecting teaching hospitals for work hour limit compliance and hired a third party, the Island Peer Review Organization, to perform surveillance.13,14 Based on compliance assessment, teaching hospitals were fined for lack of conformity with the regulations. In this study, we defined the intervention (ie, policy change) as the enforcement of resident work hour limits in New York State with inspections and fines. For the time series analysis, the beginning of the postintervention period was designated as the second quarter of 1998, which was the quarter immediately following the start date of enforcement.

Concurrent Control Groups

To improve inferential validity in this study, PSI rates were calculated for 2 groups not affected by resident work hour regulations: New York nonteaching hospitals and California teaching hospitals. New York nonteaching hospitals were selected because of their existence in the same state as the hospitals of interest (New York teaching hospitals) and for their lack of regulation or enforcement of any resident work hour limits. Thus, New York nonteaching hospitals were similar to teaching hospitals, but without the intervention. California teaching hospitals were selected as another control group from a large, populous state geographically “removed” from the work hour regulations enforced in New York. As such, the California hospitals evaluated in this study were similar to the New York teaching hospitals in their academic status but without the intervention of resident work hour limit enforcement.

Outcome Measures: Patient Safety Indicators

The PSIs developed by AHRQ consist of 26 empirically tested measures designed as screening tools to identify potential problems in quality of care.9 Based on the International Classification of Diseases, 9th revision, Clinical Modification (ICD-9-CM) codes, these measures are intended for analysis using administrative datasets, such as the NIS. PSIs were selected based on 2 criteria. First, candidate PSIs were each required to have a minimum of 20 events per quarter for stability of rates. Second, focus groups were conducted with experts in the field of surgical education at our institution to select PSIs for analysis. After an iterative review process, 5 PSIs were selected as outcomes sensitive to surgical training or technical performance, which included the following rates: APL per 1000 discharges (excluding obstetric admissions); PEDVT per 1000 surgical discharges (excluding obstetric admissions); FB per 1000 discharges; PTX per 1000 discharges (excluding trauma, cardiothoracic, and obstetric admissions); and WD per 1000 abdominopelvic operations (excluding obstetric admissions). PTX was included as an outcome measure for technical proficiency, recognizing that nonsurgical trainees would also contribute to the variation in this particular PSI. To improve precision in PSI computation, software code (version 2.1) provided by AHRQ and NIS sampling weights were used to derive final PSI rates from the NIS ICD-9-CM data. The coding structure for these PSIs has been previously published.9 In addition, preintervention rates were compared with empirical national rates published by AHRQ to ensure consistency. All rates were calculated for each group (New York teaching and nonteaching hospitals, and California teaching hospitals) per quarter, which served as the unit of time for the interrupted time series analysis.
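The weighted per-quarter rate computation described above can be sketched as follows. This is a minimal illustration, not the AHRQ PSI software itself (the paper used AHRQ's version 2.1 code); the function name and the toy weights standing in for NIS discharge sampling weights are hypothetical.

```python
import numpy as np

def quarterly_psi_rate(events, weights):
    """Weighted PSI rate per 1000 discharges for one quarter.

    events  : 0/1 flags marking discharges with the PSI event
    weights : discharge sampling weights (stand-ins for NIS weights)
    """
    events = np.asarray(events, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weighted_events = np.sum(events * weights)
    weighted_discharges = np.sum(weights)
    return 1000.0 * weighted_events / weighted_discharges

# toy quarter: 3 of 8 sampled discharges flagged, unequal weights
rate = quarterly_psi_rate([1, 0, 0, 1, 0, 0, 1, 0],
                          [5, 5, 5, 5, 10, 10, 10, 10])
```

In this toy example the flagged discharges carry 20 of the 60 weighted discharges, so the rate works out to roughly 333 per 1000; with real NIS weights the denominator-specific exclusions (obstetric, trauma, and so on) would also be applied before weighting.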

Comorbidity Adjustment

Critical to any analysis of adverse events is proper accounting of differences in patient acuity between study groups. In this study, risk adjustment in the time series model was based on comorbidity evaluation methods appropriate to single hospitalization datasets (eg, the NIS) as described by Elixhauser et al.15 In this method, 30 possible comorbid conditions were identified for each patient in the NIS from 1995 to 2001, including congestive heart failure, cardiac arrhythmia or valvular disease, pulmonary circulation disorders, peripheral vascular disorders, hypertension, paralysis, other neurologic disorders, chronic lung disease, diabetes, hypothyroidism, renal failure, liver disease, peptic ulcer disease, acquired immunodeficiency syndrome, lymphoma, metastatic cancer, solid tumor without metastasis, rheumatoid arthritis or collagen vascular disease, coagulopathy, obesity, weight loss, fluid and electrolyte disorder, blood loss anemia, deficiency anemia, alcohol abuse, drug abuse, psychoses, and depression. A comorbidity score was estimated for each patient in which a complex sample, multivariate logistic regression model was fit relating these comorbid conditions to probability of death. The comorbidity score was the linear predictor estimated from this regression model. The mean comorbidity score was computed separately for each quarter and study group and entered as a covariate in the time series model for risk adjustment.
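The comorbidity score described above, the linear predictor from a mortality logistic regression, can be sketched as follows. The coefficient values here are purely illustrative; the paper estimated them by fitting a complex-sample logistic regression of death on the full set of comorbidity indicators in the NIS.

```python
import numpy as np

def comorbidity_score(X, beta, intercept):
    """Linear predictor (log-odds of death) from a fitted logistic model.

    X         : (n_patients, n_comorbidities) 0/1 indicator matrix
    beta      : per-comorbidity log-odds coefficients (illustrative values)
    intercept : model intercept (illustrative value)
    """
    return intercept + np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)

# toy example: 2 comorbidity indicators, 3 patients
X = np.array([[1, 0],
              [1, 1],
              [0, 0]])
beta = np.array([0.8, 1.2])                     # hypothetical coefficients
scores = comorbidity_score(X, beta, intercept=-4.0)
quarter_mean = scores.mean()  # entered as a covariate in the time series model
```

Averaging the patient-level scores within each quarter and study group, as in the last line, yields the single covariate used for risk adjustment.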

Statistical Analysis

Interrupted time series analyses using autoregressive integrated moving average models were used to estimate the effect of resident work hour limit enforcement on PSI rates per 1000 discharges each quarter, adjusting for secular trends preintervention and postintervention as well as comorbidity score. The series consisted of 28 quarterly rates: 13 quarters before enforcement (from first quarter of 1995 through first quarter of 1998) and 15 quarters after enforcement (including second quarter of 1998 through fourth quarter of 2001). To account for varying levels of compliance immediately after enforcement, a 4-quarter transition period (second quarter of 1998 through second quarter of 1999) was included in the time series models. Autocorrelation was assessed for each model to determine if data points were independent of each other. Both serial autocorrelation (relatedness of data due to proximity in time) and seasonal autocorrelation (relatedness of data due to a time index such as seasons or years) were evaluated by constructing correlograms and calculating Portmanteau Q statistics (which tested for significant autocorrelation at a 2-tailed alpha level of 0.05). Our data assumed first-order and seasonal autocorrelation as appropriate, which were incorporated into the time series models. Time series models assume that the random error component of an individual model is normally distributed with a mean of zero and has homogeneous (constant) variance. Normality was evaluated by creating histograms of the time series model residual values; in addition, residuals were plotted against time to assess centrality around zero and mean residual values were calculated. Of the 15 models constructed (5 PSIs and 3 groups of interest), 2 had slight skewing (PTX in New York teaching hospitals and APL in California teaching hospitals). However, the residual means were close to zero. Homogeneity of variance was evaluated by plotting residual values versus fitted values. 
All plots had a random appearance, confirming constant variance.
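A simplified segmented-regression version of this interrupted time series analysis can be sketched in Python. This is a plain OLS sketch on invented data: the paper's actual models were ARIMA fits in STATA with autocorrelated error terms, a 4-quarter transition period, and the comorbidity covariate, all of which are omitted here.

```python
import numpy as np

def segmented_fit(rates, cut):
    """Fit a two-segment trend to quarterly rates via ordinary least squares.

    rates : quarterly PSI rates per 1000 discharges
    cut   : index of the first post-intervention quarter
    Returns (pre_slope, post_slope), the change in rate per quarter
    before and after the intervention.
    """
    n = len(rates)
    t = np.arange(n, dtype=float)
    post = (t >= cut).astype(float)            # level shift at the intervention
    t_post = np.where(t >= cut, t - cut, 0.0)  # slope change after the intervention
    X = np.column_stack([np.ones(n), t, post, t_post])
    coef, *_ = np.linalg.lstsq(X, np.asarray(rates, dtype=float), rcond=None)
    pre_slope = coef[1]
    post_slope = coef[1] + coef[3]
    return pre_slope, post_slope

# invented series mimicking the study layout: 13 flat pre-enforcement
# quarters, then 15 quarters rising by 0.15 per quarter
rates = [1.0] * 13 + [1.0 + 0.15 * (k + 1) for k in range(15)]
pre, post = segmented_fit(rates, cut=13)
```

On this noiseless toy series the fit recovers a pre-intervention slope of 0 and a post-intervention slope of 0.15 per quarter, the same magnitude reported for APL in New York teaching hospitals; real quarterly rates would additionally require the autocorrelation structure the authors modeled.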

For each regression performed, the risk-adjusted slope (change in PSI rate per quarter) of the pre, post, and immediate intervention PSI trend was compared with a slope of zero (no change in trend) with 2-tailed significance achieved at an alpha level of 0.05. Final results reflected the change in PSI rate per quarter. An improvement in PSI rate was determined if a statistically significant decline in these adverse event rates was observed after resident work hour limits were enforced. A possible effect was noted if PSI rates showed a significant preintervention trend (either improving or worsening) but no significant change after enforcement. Worsening PSI rates were apparent if a statistically significant rise in rate was discovered after enforcement. Finally, no effect was interpreted if resident work hour limit enforcement had no discernible impact on PSI rates over time. Graphical results are presented as risk-adjusted PSI rates. In addition, smoothed risk-adjusted rates are presented for visual clarity in discerning trends. All analyses and graphical processing were performed using STATA version 8.2 (STATA Corporation, College Station, TX).
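The labeling rule laid out above can be written as a small decision function. The function name and the example inputs are illustrative; the significance flags stand for the 2-tailed tests at alpha = 0.05 described in the text.

```python
def interpret_trend(pre_sig, pre_slope, post_sig, post_slope):
    """Label a PSI time series model per the paper's interpretation rule.

    pre_sig, post_sig     : True if the slope differs from zero (alpha = 0.05)
    pre_slope, post_slope : change in PSI rate per 1000 discharges per quarter
    """
    if post_sig and post_slope < 0:
        return "improvement"          # significant decline after enforcement
    if post_sig and post_slope > 0:
        return "worsening"            # significant rise after enforcement
    if pre_sig and not post_sig:
        return "possible effect"      # significant pre trend, flat afterward
    return "no effect"

# e.g. a significant post-enforcement rise of 0.15 per quarter, as
# observed for APL in New York teaching hospitals
label = interpret_trend(pre_sig=False, pre_slope=0.0,
                        post_sig=True, post_slope=0.15)  # -> "worsening"
```

Under this rule, the California FB and PTX findings (significant preintervention declines, no postintervention change) fall into the "possible effect" category, matching the conservative reading given in the Discussion.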

RESULTS

General Characteristics of Study Groups

Study group characteristics (by state, teaching status, and year) are summarized in Table 1. In general, New York and California teaching hospitals each accounted for more discharges per year than New York nonteaching hospitals. The mean age of patients was highest in New York nonteaching institutions, followed by New York and California teaching hospitals, respectively. No clear pattern of distribution was discernible with respect to the proportion of women by state, teaching status, or year.

TABLE 1. General Characteristics of Study Groups 1995–2001


Comparison of Calculated PSI Rates to Published Empiric Performance

Empiric performance of national PSI rates is published by AHRQ.9 These published national rates are compared with calculated mean preintervention rates obtained in this study and are summarized in Table 2. Both the mean preintervention rates and published rates were in general agreement. Total New York State cumulative events observed over the entire time period for each PSI were 33,756 APLs, 36,970 PEDVTs, 1447 FBs, 10,727 PTXs, and 2520 WDs.

TABLE 2. Comparison of Calculated Preintervention Patient Safety Indicator Rates to Published Empiric Performance


PSI Rates Over Time and Interrupted Time Series Regression Modeling Results

No immediate effect of the resident work hour limit enforcement was observed in any study group. Figures 1 to 5 depict risk-adjusted PSI rates per 1000 discharges over time, time series smoothed graphs, and risk-adjusted changes in PSI rate per 1000 discharges per quarter from time series modeling. A worsening in APL rate of 0.15 per 1000 discharges per quarter was observed (95% confidence interval, 0.09–0.20, P < 0.05) after enforcement of work hour limits in New York, with no change in either concurrent control group. Similarly, a worsening in postoperative PEDVT rate of 0.43 per 1000 discharges per quarter was evident (95% confidence interval, 0.03–0.83, P < 0.05) after New York resident work hour limit enforcement. The work hour limit enforcement was not associated with changes in rates of FB, PTX, or WD in either New York State teaching or nonteaching hospitals. Rates of FB and PTX showed possible effects for the worse in California teaching hospitals corresponding with work hour limit enforcement in New York State. Interpretations of interrupted time series regression models are summarized in Table 3.


FIGURE 1. Accidental puncture or laceration rates for New York (NY) teaching hospitals, NY nonteaching hospitals (concurrent control group), and California (CA) teaching hospitals (concurrent control group) plotted over time (year and first quarter “q1” are indicated). Dashed vertical line indicates intervention time of resident work hour limit enforcement (second quarter of 1998) in NY teaching hospitals. The top graph in each figure plots risk-adjusted rates, and the bottom graph shows time series smoothed risk-adjusted rates for clarity. Changes in PSI rate per 1000 discharges per quarter (risk-adjusted) are shown for each study group in the preintervention and postintervention periods with 95% confidence intervals and asterisk (*) indicating P < 0.05 by regression modeling.


FIGURE 2. Postoperative pulmonary embolus (PE) or deep venous thrombosis (DVT) rates for New York (NY) teaching hospitals, NY nonteaching hospitals (concurrent control group), and California (CA) teaching hospitals (concurrent control group) plotted over time. Graphing convention and data presentation are described in Figure 1; asterisk (*) indicates P < 0.05 by regression modeling.


FIGURE 3. Foreign body left during procedure rates for New York (NY) teaching hospitals, NY nonteaching hospitals (concurrent control group), and California (CA) teaching hospitals (concurrent control group) plotted over time. Graphing convention and data presentation are described in Figure 1; asterisk (*) indicates P < 0.05 by regression modeling.


FIGURE 4. Iatrogenic pneumothorax rates for New York (NY) teaching hospitals, NY nonteaching hospitals (concurrent control group), and California (CA) teaching hospitals (concurrent control group) plotted over time. Graphing convention and data presentation are described in Figure 1; asterisk (*) indicates P < 0.05 by regression modeling.


FIGURE 5. Postoperative wound dehiscence rates for New York (NY) teaching hospitals, NY nonteaching hospitals (concurrent control group), and California (CA) teaching hospitals (concurrent control group) plotted over time. Graphing convention and data presentation are described in Figure 1; asterisk (*) indicates P < 0.05 by regression modeling.

TABLE 3. Interpretation of Interrupted Time Series Regression Models


DISCUSSION

This study provides initial data that resident work hour limits enforced in New York State were not associated with surgical patient safety measure improvement in teaching hospitals. Furthermore, worsening trends in rates of APL and postoperative PEDVT were observed after enforcement of resident work hour regulations in New York teaching hospitals, which were not seen in New York nonteaching hospitals or California teaching institutions. Strengths of the approach used in this study include its population-based nature allowing the analysis of multiple institutions, use of time series analysis methodology to reduce bias in hypothesis testing, utilization of standardized measures of patient safety, and evaluation of resident work hour limit effects in the group most affected by these regulations: surgical specialties.

To date, evaluation of resident work hour limits’ impact on patient safety consists chiefly of nonstandard surveys and the examination of a variety of clinical outcomes. Survey data have produced conflicting results. Conigliaro et al administered surveys to 345 senior medical residents in New York State 2 years after enactment of resident work hour limits; resident physicians perceived less fatigue and had more spare time while attending staff were concerned about “shift mentality” and continuity of care.6 In a survey of 4510 obstetrics and gynecology residents, fatigue (78%), increased personal time (76%), and fear of compromising care (60%) were cited as legitimate reasons for reducing work hours.16 Based on survey data, other investigators have promoted work hour regulations as a means to improve resident education.17–19

Scant data exist on the effects of resident work hour limits and clinical outcomes. In an evaluation of different housestaff schedules, Petersen et al20 examined 3146 medical patients and reported that preventable adverse events (defined as unexpected complications of medical therapy that resulted in increased length of stay or disability at discharge) were 3.5 times more likely to occur when patients were under the care of a “cross covering” physician as compared with patients’ primary physicians. Griffith et al21 found that an increased clinical workload affected decisions to perform procedures on infants in a neonatal intensive care unit. In recent work, Howard et al8 examined the effect that New York resident work hour limits had on mortality (for diagnoses of congestive heart failure, acute myocardial infarction, or pneumonia) before the era of regulation (1988) and after (1991). This study evaluated 170,214 teaching hospital patients and 143,455 nonteaching hospital patients from a New York administrative dataset. No differences in mortality were observed for the studied diagnoses between teaching and nonteaching hospitals. Landrigan et al,22 in a prospective randomized fashion, recently evaluated the effect of a reduced work load schedule compared with a traditional “every third night” call schedule for first year medical residents on intensive care unit rotations. This study of 634 admissions found that residents on the traditional work schedule had 36% more serious medical errors (defined by diagnostic, procedural, and medication related errors) as compared with their reduced work schedule counterparts. These studies provide important information regarding resident work hour limits and patient care outcomes but have some limitations. “Before-after” studies do not capture secular trends that may occur over time (regardless of the effects of work hour limits). 
Our study assesses secular trends in the time series analysis to help evaluate other influences that may have affected the outcome measures. In addition, studies evaluating medical residents and services may not fully capture the variation in outcome created by work hour limits. Residents in surgical and obstetrics-gynecologic specialties tend to work more hours per week (average of 96 hours) than their medical counterparts.23 Thus, work hour limits would have the most impact on surgical specialties. For this reason, we studied outcome measures sensitive to surgical training.

Our data revealed worsening trends of APL and PEDVT in New York teaching hospitals after resident work hour limits were enforced. Because of the study design limitations, the reasons behind these trends remain elusive. Specifically, this study was designed to examine changes in adverse events over time and test if these changes were associated with the designated intervention. Our study cannot discern cause and effect at the patient level. Other factors may have coincided with the intervention evaluated and might have impacted the outcome measures; these include changes in nurse staffing, change in health care technology, and a variety of other issues unrelated to resident work hour limits. However, these factors should affect teaching and nonteaching institutions alike. More physician extenders were hired in New York teaching hospitals to compensate for the decreased number of hours worked by residents, which also may have impacted the outcome measures.24 However, this shift in workforce is a direct result of the resident work hour limit policy and would be one component of the “intervention” evaluated in this study. Our results suggest that unintended consequences of resident work hour limits may have occurred in teaching hospitals only. Possibly, more technical errors during procedures (APL) are occurring as surgical trainees have fewer hours with which to become technically proficient during their training. In addition, the increased number of shifts made necessary by shorter work schedules may increase the number of informational exchanges by caregivers; this scenario might lead to higher rates of postoperative PEDVT if appropriate and timely antithrombotic measures are not communicated and instituted. These and other possibilities require further evaluation to help determine the causes of the increased adverse events. 
A critical finding of this study, nevertheless, is that PSI adverse event rates of APL and PEDVT do not improve (and, as shown here, worsen) after enforcement of resident work hour limits. These results are bolstered by the lack of an effect in either concurrent control group (New York nonteaching hospitals or California teaching hospitals). Although no improvement (or worsening) was observed in the other measures examined (FB, iatrogenic PTX, and WD) in any study group, significant preintervention declines in rates of FB and PTX were observed in California teaching hospitals (Table 3; Figs. 3 and 4). These were conservatively interpreted as “possible effects” because it would be difficult to determine if the postintervention “stabilization” of these rates (ie, no postintervention increase or decrease) was due, in part, to the attainment of the natural nadir of these rates or due to any effect of resident work hour limits. Conversely, we cannot exclude the possibility that resident work hour limit enforcement in New York teaching hospitals in some way impacted care in California teaching hospitals (even though these states are geographically separated).

Several limitations deserve mention in this analysis. First, this study can be characterized as an “ecological” evaluation in which group outcomes are correlated with the intervention (resident work hour limit enforcement), yet the effect of the intervention is difficult to assess at the individual patient level. As such, this study is subject to the “ecologic fallacy” in which erroneous conclusions about individuals are derived from large group data. Because of this limitation, causality is difficult to establish given the study design and its retrospective nature. Other events may have occurred that could affect any or all of the study groups and may have had an impact on the outcome measures. By using concurrent control groups, however, we sought to improve the inferential validity of this study with its particular intervention. In addition, designing prospective evaluations to assess the impact of this policy change in the United States may be impossible due to nationwide implementation of resident work hour limits in all teaching hospitals in 2003. Second, the chance of committing a type II error (concluding that no difference exists between 2 groups when one actually does) is increased in the PSIs with lower total cumulative counts (FB, PTX, WD). Indeed, post hoc power analyses revealed that our study had at least 80% power to detect the resultant differences in all study groups except outcomes of FB in New York teaching hospitals and PTX/WD in New York nonteaching hospitals. Based on our results, adequately powered studies can be performed to further evaluate PSIs with lower cumulative events, now that effect sizes for this intervention are known. We were unable to examine additional hospital-level effects (such as size or volume) due to the increased chance of committing a type II error in hypothesis testing with further stratification. 
We felt that accounting for patient-level acuity in this study of patient safety events was the most critical task in adjustment while maximizing statistical power. A third limitation of this study is the use of administrative data to detect adverse events. Coding errors and underestimation of true event rates can increase the chance of type II error. In addition, differential changes in coding over time and between study groups could have impacted the outcome measures. However, the detection of surgical patient safety events coded in administrative data has a higher corroboration with actual events (73%–81%) than for medical events (32%–70%).25,26 Because of this limitation, we focused on PSIs sensitive to surgical patient safety or technical performance only. Fourth, not all teaching hospitals in New York State complied with resident work hour limits after enforcement. However, the third party organization hired to assess New York teaching hospital compliance with these regulations noted no "egregious" violations immediately after the second quarter of 1998.13 In addition, we attempted to account for this variable level of compliance by modeling a 1-year transition period after the first quarter of 1998 in our regression analyses. Our finding that there was no immediate effect of work hour limit enforcement on any of the PSI rates in any study group is consistent with more gradual compliance with these regulations even after enforcement. Finally, the work hour restrictions implemented in New York differed slightly from the regulations instituted by the ACGME; this may impact generalizations of this study's results to other states. Most notably, ACGME regulations allow a grace period after 24 hours on call, and programs can apply for a 10% work hour per week increase.
However, this study is more of a global evaluation designed to assess the systems implemented by hospitals to compensate for the reduction in resident work hours rather than a study examining the direct effects of fatigue on patient safety. Because teaching hospitals around the nation today are using very similar measures to make up for the reduction in resident work hours as those undertaken in New York (eg, increased ancillary staff and physician extenders, changes in call schedule structure), we believe that these results are generalizable to other states given this limitation.

In this study, we demonstrated results contrary to the hypothesis that resident work hour limits enforced in New York teaching institutions were associated with improvement in surgical patient safety measures. In addition, effect sizes were established for the 5 PSI measures with respect to the intervention of work hour regulation enforcement. These data can be used to design further studies examining both the singular New York experience with resident work hour limits and the national implementation of work hour regulations mandated in 2003 by the ACGME.

Discussions

Dr. L. D. Britt (Norfolk, Virginia): I commend the authors for their effort to evaluate the effect of resident work hour limits on patient safety. I just have some general comments and a few questions.

The authors mention a less familiar aspect of the New York regulation, which stipulates that physicians-in-training must be supervised 24 hours a day by in-house attending staff or senior residents. The authors stated that essentially no hospital complied with these regulations upon enactment. However, the reported noncompliance rate for the resident work hour requirements, particularly during this period, was extremely high, with some reports estimating a 75% to 90% noncompliance rate even with compliance surveillance and threats of penalties. If this is indeed the case, would this not invalidate your conclusions?

Also, the New York regulations, as a result of the Bell Commission, are more stringent than the ACGME requirement, which provides a 10% increase in the duty hour threshold and a grace period after being on call. With New York not requiring this, will this severely limit any generalization of the findings to any other area outside of New York?

Particularly with respect to the statistical analysis, I have concerns about the complex modeling of data as illustrated in this manuscript. Such modeling makes many assumptions: normality, independence of the data, and homogeneity of variance. However, there is no mention in the manuscript of what assumptions were made or whether they were checked. Conclusions reached from invoking statistical methodology in the presence of assumption violations are essentially invalid. Were the assumptions made actually checked?

Also, because the outcome variable in the data analysis was a quarterly rate computed from all the hospitals lumped together in each of the 3 groups, any effect due to an individual hospital is missed; therefore, if rates differ from hospital to hospital, this would not be detected in the analysis. Why did you not compute the yearly rate for each of the 3 years pre- and post-intervention? Such an analysis would have determined whether hospital and year effects were significant; if years are not significant, the intervention had no effect on the outcome measure. Why did the authors choose an analysis that would ignore the hospital effect?

Again, I want to commend the authors for a gallant attempt to address a very puzzling dilemma.

Dr. Kirby I. Bland (Birmingham, Alabama): It is a pleasure to discuss this important contemporary paper organized by the Section of Surgical Sciences at Vanderbilt.

In this interrupted time series analysis of 1995 to 2001 Nationwide Inpatient Sample data, a careful analysis of a mean 2.6 million discharges per year in New York, the authors have confirmed no changes in either the control groups or the New York teaching hospital rates of 1) retention of foreign bodies during procedures, 2) iatrogenic pneumothorax, or 3) frequency of wound dehiscence. The authors concluded that resident work hour limits in these New York hospitals, with restriction to an 80-hour work week prior to the mandate of the ACGME Residency Review Committee (RRC) in Surgery, did not improve surgical patient safety compared with nonteaching New York hospitals and California teaching hospitals.

These results raise critical, timely concerns about the New York resident work hour limits and provide the authors a measure of “PSI effect” to allow design of efficient evaluations of national work hour regulations. The primary goal of work hour limits (eg, the 80-hour limit) has been “an improvement in patient safety.” This largely surfaced from the highly publicized, and in my view erroneous, interpretation of the errors incurred in a New York State teaching hospital in the management of Libby Zion. The public reaction to controversial reports linked to the Zion case has implied that an increase in medical errors is highly probable and strongly linked to resident fatigue (Samkoff and Jacques, Academic Medicine 1991; Kohn LT, et al, National Academy Press, 2000).

This work is important in that most prior evaluations of resident work hour limits have been limited to nonstandardized surveys and to single institutions. In this study, the authors have not been able to confirm the hypothesis “that resident work hour limits have improved measures of patient safety in New York teaching hospitals.” Rather, the authors note that not only did these limits fail to improve measures of surgical patient safety, but worsening trends were evident in 2 of the 5 standardized surgical PSIs evaluated.

This study raises several issues. With the enactment of the 80-hour work week, the RRC Council of Surgical Chairs (10 in number) became very concerned that, in the highly intense specialties (General Surgery, Cardiothoracic, Neurosurgery, Vascular, Trauma-Critical Care), limitations in work hours would potentially impact resident operative experience accredited for index cases in the primary and secondary components of Surgery recorded by the Boards of these specialties. From the perspective of the RRC, this is currently being evaluated in great detail. Should work hour limits reduce primary and secondary procedures, the restriction of the total number of index cases performed by a finishing chief resident could negatively impact surgical experience and, thus, patient management competence. This has yet to be confirmed but remains an issue for deliberation by the RRC.

Many program directors and future employers of surgical residents have suggested that the 80-hour work week may create a “shift mentality,” as all residents (effective July 2003) are required to abide by and document 80-hour work week profiles. Will this concern of performance limitation for duty hours be carried forward into the job market and into academic practice? These issues are to be evaluated in future studies when more experience is available on the ultimate impact of 80-hour duty issues on surgical resident education for cognitive and technical skills. The RRC-Surgery has also made an attempt to provide dispensation to the senior/chief resident work hours and received unanimous approval to do so from the RRC Council of Chairs of the ACGME. However, because of concern for public safety and the short duration (9 months) in which the restriction in duty hours had been in effect, the ACGME Board of Directors denied this opportunity to give the chief resident more latitude in patient care without restriction of duty hours. This will be revisited within the next 6 to 12 months by the RRC Council.

Finally, I would like to ask the authors to expand upon the obvious concern as to why the 2 PSI indicators confirmed worsening trends for accidental puncture or laceration as well as for postoperative pulmonary emboli and DVT. These worsening trends were evident only in teaching hospitals in New York that had resident work hour limitations. 1) Is it possible that an increase in pulmonary emboli and DVT is related to the lack of patient contact, the lack of communication, or the restriction in opportunity to be inquisitive in patient management issues, which would otherwise have been avoided if proper physical examination and evaluation had been completed? 2) Is there any probability that you could confirm from your large database whether a greater frequency of technical errors occurs as restriction of hours leaves residents less technically proficient in the conduct of these procedures?

I have enjoyed this paper very much. We must all be concerned that we continue to keep very high on our agenda the training objectives and the quality of the product (the resident) that we produce, as the future of American Surgery will depend on the competency of our trainees.

Dr. Josef E. Fischer (Boston, Massachusetts): I would like to congratulate the authors for addressing this issue from the standpoint of surgical training programs because heretofore all of the data that we have seen have been from either single institutions or have been associated with medical programs.

I think you are all aware that recently in the New England Journal of Medicine, Czeisler and his group from Brigham and Women's Hospital published a paper concerning a medical ICU in which interns seemed to have less than adequate supervision, and the number of errors they made after a certain period of time on duty.

I was interviewed for the Wall Street Journal by Laura Landrau, one of the very well-respected assistant managing editors. I pointed out to her that I thought the authors had a vested interest and a conflict, basically because all of the previous work was based on the premise that there was such a problem. What the outcome really reflected, I thought, was that in a largely unsupervised medical ICU setting with a raw, new medical intern, this was an expected outcome without senior supervision, and that on a surgical service, in which there is senior supervision at most or all times, the results might be different.

Unfortunately, the only thing that was quoted of our fairly long interview was the fact that I thought that 90 to 92 hours was a necessary period of time for continuity of care, and this is the result of the studies that Bob Bower carried out in Cincinnati, and that was our data. I got into a fair amount of trouble with that quote in the People's Republic of Massachusetts, which seems to be somewhat different from the rest of the country.

And yet this paper, I think, asks several cogent questions. But I have a little difficulty with parameters such as those chosen, because I think the basic issue, at least for a surgical service, is continuity of care. And, as does Dr. Bland, I have difficulty transposing continuity of care into the observed outcome.

So my questions for the authors are: Are there other parameters that are more closely linked to continuity of care? Or are there other questions that the authors asked that were not particularly answered by the types of parameters that were chosen? Because I think certainly for a surgical service, continuity of care is the basic issue, an issue that is still out there and needs to be answered.

Dr. Thomas R. Russell (Chicago, Illinois): I really enjoyed this paper by Dr. Beauchamp's group and appreciate your attempt to put science into something that has been imposed on our profession.

I think many of us who have come from another model, the immersion model, where we simply lived in hospitals, have questions, and are aware of some of the potential difficulties this could create.

I think, though, that the public's view of this is very important. And that was one of your first comments. The public view is something that you have to really consider as we try to put science into this. Obviously, the College had a strong position on this for years, but I think the decision has been made that work hours are going to be imposed, and I don't think, from my analysis of the situation, that we are going to go back on that.

So I would like to ask a question about the need for the science in trying to determine whether it is safer or not. This model is unique and complicated. But perhaps we would be better off devoting our attention to making this system that has been put upon us work better, with the use of technology, ways of signing out, and learning from each other, such as through our web page, where we get several thousand hits a month from various interested parties around the country. Is there anything that we could be doing to address the issue to make this work? I just don't see us going back to the old model. And yet we could try to be scientific and never prove its value in safety. But my question is: Should our efforts be better put in a different direction?

Dr. Thomas R. Gadacz (Augusta, Georgia): There are several issues about the 80-hour work week. First, the 80-hour work week is mandated for all residency training programs. Second, what is an 80-hour work week and how are those 80 hours spent? Can we have an 80-hour plus work week and structure it so that patient care is not fragmented and fatigue and medical errors avoided? Third, what work effort can residents expect once they enter practice?

Although we haven't analyzed all the data, we are completing a survey from the Southeastern Surgical Congress on the work hours of physicians in practice. A real surprise is the finding that the mean work hours of surgeons in full-time practice are around 65 hours per week, and 20% of the surgeons in practice exceed 80 hours. I would be very surprised if that is going to change just because the 80-hour work week has been mandated for residents. Surgery residents going into practice may have to work longer than 80 hours per week.

We need to design a work hour week that is not totally dependent upon total hours and allow sufficient rest to prevent impairment because of sleep deprivation. Surgeons in practice seem to have better control of their hours to avoid exhaustion. We need to design systems within the residency program that can control the fatigue component and simply not just focus on the 80-hour work week.

Do you have any sense of other components that may have contributed to the higher rate of medical errors that occurred in the 2 instances that you quoted?

Dr. James T. Evans (Buffalo, New York): I would just like to make a comment on databases and some of the outcomes. First, it comes from the same institution that recently published an article about using professional coders to upgrade your reimbursement.

During the period the data were collected, there were 2 changes in New York. First, hospitals went from having a long discharge field capability to having both truncated and long formats, and each institution was given the opportunity to choose which discharge category it would use. So some hospitals only transmit the top 5 discharge diagnoses rather than the whole long list. At the same time, more and more professional coders were coming in to help upgrade coding.

Two or 3 of the items listed in this discharge component will help reimbursement if they are listed in your discharge summary. So what you see is what the hospitals in New York were doing: at the same time the work hour limits were coming about, they were more actively coding their top 5 diagnoses in order to affect their reimbursement.

I commend the authors on their effort in looking at this, and unfortunately I think you will probably get the most frequent requests from those people in our state who champion 60-hour work weeks.

Dr. Benjamin K. Poulose (Nashville, Tennessee): To address Dr. Britt's questions, we will first start with the reported high noncompliance rate even after the enforcement of resident work hour limits in New York State. The best information we could obtain from the third-party organization hired to oversee New York State compliance was that no “hideous” violations existed just after enforcement. Hospitals labeled as noncompliant ranged from programs with very small infractions to those with more gross violations of the work rules. The largest proportion of noncompliance, 45%, represented hospitals that had residents working in excess of 24 hours per work period, while 80-hour work week infringements accounted for 28%.

Second, Dr. Britt asked if New York regulations were more stringent than the current regulations promulgated by the Accreditation Council for Graduate Medical Education, referring specifically to the additional 10% increase in work hours per week for selected programs. This is certainly a difference between the 2 sets of regulations, but we feel the generalizability to the current national experience is preserved because we are really evaluating the net effect of the systems involved to compensate for the reduction in work hours rather than the work hour restrictions themselves.

Third, Dr. Britt asked about the necessary assumptions required for time series analysis. We did assess whether these assumptions were met before applying our time series models. The most important are the following. Number 1 is checking the data for all manifestations of autocorrelation, including first-, second-, or higher-order serial correlation and seasonal correlation. As Dr. Beauchamp mentioned, this is done to determine whether outcomes in a given year are more related to outcomes in adjacent years than to years more distant from that time point, and to assess whether outcomes correlate by some time index such as month or season. Number 2 is assuming the random error in the model (apart from the autocorrelation) is normally distributed and has constant variance. Number 3 is ensuring a minimum baseline raw event count of at least 20 events per quarter for reliable rates. We tested and met these assumptions before our analysis; these same issues are the reasons why “traditional” methods of statistical analysis may lead to biased conclusions when applied to time-dependent data.

Dr. Britt's final comment concerned why we chose a cumulative measure of a quarterly rate as opposed to individual hospital-level rates. I refer to my previous point about maintaining a reliable baseline rate for analysis. We really had no way with our dataset to look at individual hospitals while also maintaining enough events per unit time for robust analyses.

Dr. Bland asked about some possible causes of the increase in rates of pulmonary embolus/deep venous thrombosis and accidental puncture or laceration. Do they occur because of less patient contact or a number of other reasons? I would like to stress that this study was designed to test an association and not causality. If I could use an aeronautical analogy: with administrative datasets, we are flying at 10,000 feet and can see the ships in the ocean, can count how many exist, and can see where they are headed, but we don't know exactly what is happening on the decks of those ships. The same is true of our analysis using administrative data. Our upcoming goals are to confirm these analyses in subsequent datasets using the State Inpatient Databases and to perform root-cause analyses if these results are reproduced.

Dr. Bland's second question asked if we can confirm a greater frequency of technical errors with these data. The 2 outcome measures that help answer this question on a system-wide level are the rate of accidental puncture or laceration and rate of iatrogenic pneumothorax. However, the mentioned limitation still applies: we have no idea about causality at the individual patient level.

Dr. Fischer mentioned Dr. Charles Czeisler's work recently published in the New England Journal of Medicine regarding resident work hours and errors in the intensive care unit, and the issue of measuring continuity of care outcomes. Dr. Czeisler's well-designed ongoing studies home in on some of the direct effects of fatigue on clinical outcomes. We, on the other hand, are examining less of the fatigue component and more of the “systems compensation” that has occurred as a result of resident work hour limits. Obtaining better outcomes to measure continuity of care is something we are exploring. In this study, the increased rates of pulmonary embolus/deep venous thrombosis found after resident work hour limit enforcement may reflect inefficiencies in provider-to-provider communication as more informational exchanges occur in patient care, but this needs further exploration.

Dr. Russell mentioned that the decision has already been made, so why really look at this? We agree fully that there is no turning back the clock on this issue. Our intent was a critical systems evaluation of the processes put into place to compensate for the reduction in resident work hours. To put it simply, we wanted to obtain data on the possible effects that this policy change had on clinical outcomes. The questions really become: Do we need wider use of computerized order entry systems to decrease the number of adverse events that are occurring? Do we need to institute surgical simulation initiatives that make training more efficient and serve as adjuncts to clinical surgical experiences? All these questions remain to be answered.

Dr. Gadacz asked the question of how the 80 hours per week are being spent. This is a very important issue as 80 hours of performing scut is quite different from sleeping for 40 of those 80 hours. Single institution experiences can contribute greatly to this topic by detailing how residents are spending time. Possibly, some of the techniques of calculating surgical work described in another paper to be presented during this meeting could be used.

And finally, there was a comment from the floor about coding issues and whether there could be differential coding between 2 of the study groups. Yes, this is absolutely possible. One way we tried to compensate for this was to use multiple concurrent control groups. I will add that we examined these same outcomes in the state of Illinois, another large state where work hour limits were not mandated on a statewide level, and we did not observe a change in these rates over the examined time period.

I would like to thank Dr. Holzman, my mentor, for his invaluable guidance in completing this study and I thank the Southern Surgical Association for the privilege of closing this discussion.

Footnotes

Supported by a grant from the Agency for Healthcare Research and Quality (T32 HS 13833-01).

Reprints: Benjamin K. Poulose, MD, D-5203 Medical Center North, 1161 21st Avenue South, Vanderbilt University Medical Center, Nashville, TN 37232. E-mail: benjamin.poulose@vanderbilt.edu.

REFERENCES

1. Bell BM. The new hospital code and the supervision of residents. N Y State J Med. 1988;88:617–619.
2. Brensilver JM, Smith L, Lyttle CS. Impact of the Libby Zion case on graduate medical education in internal medicine. Mt Sinai J Med. 1998;65:296–300.
3. Samkoff JS, Jacques CH. A review of studies concerning effects of sleep deprivation and fatigue on residents' performance. Acad Med. 1991;66:687–693.
4. Kohn LT, Corrigan JM, Donaldson MS. To Err Is Human: Building a Safer Health System. Washington, DC: National Academy Press, 2000.
5. Whang EE, Mello MM, Ashley SW, et al. Implementing resident work hour limitations: lessons from the New York State experience. Ann Surg. 2003;237:449–455.
6. Conigliaro J, Frishman WH, Lazar EJ, et al. Internal medicine housestaff and attending physician perceptions of the impact of the New York State Section 405 regulations on working conditions and supervision of residents in two training programs. J Gen Intern Med. 1993;8:502–507.
7. Laine C, Goldman L, Soukup JR, et al. The impact of a regulation restricting medical house staff working hours on the quality of patient care. JAMA. 1993;269:374–378.
8. Howard DL, Silber JH, Jobes DR, et al. Do regulations limiting residents' work hours affect patient mortality? J Gen Intern Med. 2004;19:1–7.
9. Agency for Healthcare Research and Quality. Quality Indicators-Patient Safety Indicators: Software Documentation, version 2.1-SAS [AHRQ Publication 03-R203]. Rockville, MD: Agency for Healthcare Research and Quality, 2003.
10. Matowe LK, Leister CA, Crivera C, et al. Interrupted time series analysis in clinical research. Ann Pharmacother. 2003;37:1110–1116.
11. Agency for Healthcare Research and Quality. Healthcare Cost and Utilization Project: Nationwide Inpatient Sample. Rockville, MD: Agency for Healthcare Research and Quality.
12. New York State Department of Health, Medical Staff. New York Codes, Rules, and Regulations. 1988;10:405.4.
13. Hallam K. N.Y. investigators checking up on residents' training. Mod Healthcare. 1998;28:20.
14. State Health Department Cites 54 Teaching Hospitals for Resident Working Hour Violations, 2002. Available at http://www.health.state.ny.us/nysdoh/commish/2002/resident_working_hours.htm. Accessed November 11, 2004.
15. Elixhauser A, Steiner C, Harris DR, et al. Comorbidity measures for use with administrative data. Med Care. 1998;36:8–27.
16. Defoe DM, Power ML, Holzman GB, et al. Long hours and little sleep: work schedules of residents in obstetrics and gynecology. Obstet Gynecol. 2001;97:1015–1018.
17. Kelly A, Marks F, Westhoff C, et al. The effect of the New York State restrictions on resident work hours. Obstet Gynecol. 1991;78:468–473.
18. Carey JC, Fishburne JI. A method to limit working hours and reduce sleep deprivation in an obstetrics and gynecology residency program. Obstet Gynecol. 1989;74:668–672.
19. Ruby ST, Allen L, Fielding LP, et al. Survey of residents' attitudes toward reform of work hours. Arch Surg. 1990;125:764–767; discussion 767–768.
20. Petersen LA, Brennan TA, O'Neil AC, et al. Does housestaff discontinuity of care increase the risk for preventable adverse events? Ann Intern Med. 1994;121:866–872.
21. Griffith CH 3rd, Desai NS, Wilson JF, et al. Housestaff experience, workload, and test ordering in a neonatal intensive care unit. Acad Med. 1996;71:1106–1108.
22. Landrigan CP, Rothschild JM, Cronin JW, et al. Effect of reducing interns' work hours on serious medical errors in intensive care units. N Engl J Med. 2004;351:1838–1848.
23. Gaba DM, Howard SK. Patient safety: fatigue among clinicians and the safety of patients. N Engl J Med. 2002;347:1249–1255.
24. Spencer FC. A surgical program: director's view. Bull N Y Acad Med. 1991;67:344–350.
25. Lawthers AG, McCarthy EP, Davis RB, et al. Identification of in-hospital complications from claims data: is it valid? Med Care. 2000;38:785–795.
26. McCarthy EP, Iezzoni LI, Davis RB, et al. Does clinical evidence support ICD-9-CM diagnosis coding of complications? Med Care. 2000;38:868–876.
